在安装了PostGis 2.2.0的Postgres 9.5数据库中,我有两个带有几何数据(点)的表,我想将一个表中的点分配给另一个表中的点,但我不想要分配两次.一旦分配了一个,就不应该将其分配给另一个.buildings.gid
buildings.gid
pvanlagen.buildid
buildings
:
CREATE TABLE public.buildings ( gid numeric NOT NULL DEFAULT nextval('buildings_gid_seq'::regclass), osm_id character varying(11), name character varying(48), type character varying(16), geom geometry(MultiPolygon,4326), centroid geometry(Point,4326), gembez character varying(50), gemname character varying(50), krsbez character varying(50), krsname character varying(50), pv boolean, gr numeric, capac numeric, instdate date, pvid numeric, dist numeric, CONSTRAINT buildings_pkey PRIMARY KEY (gid) ); CREATE INDEX build_centroid_gix ON public.buildings USING gist (st_transform(centroid, 31467)); CREATE INDEX buildings_geom_idx ON public.buildings USING gist (geom);
pvanlagen
:
CREATE TABLE public.pvanlagen ( gid integer NOT NULL DEFAULT nextval('pv_bis2010_bayern_wgs84_gid_seq'::regclass), tso character varying(254), tso_number numeric(10,0), system_ope character varying(254), system_key character varying(254), location character varying(254), postal_cod numeric(10,0), street character varying(254), capacity numeric, voltage_le character varying(254), energy_sou character varying(254), beginning_ date, end_operat character varying(254), id numeric(10,0), kkz numeric(10,0), geom geometry(Point,4326), gembez character varying(50), gemname character varying(50), krsbez character varying(50), krsname character varying(50), buildid numeric, dist numeric, trans boolean, CONSTRAINT pv_bis2010_bayern_wgs84_pkey PRIMARY KEY (gid), CONSTRAINT pvanlagen_buildid_fkey FOREIGN KEY (buildid) REFERENCES public.buildings (gid) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT pvanlagen_buildid_uni UNIQUE (buildid) ); CREATE INDEX pv_bis2010_bayern_wgs84_geom_idx ON public.pvanlagen USING gist (geom);
我的想法是在表中添加一个boolean
列pv
,该列在分配buildings
时设置buildings.gid
:
UPDATE pvanlagen SET buildid=buildings.gid, dist='50' FROM buildings WHERE buildid IS NULL AND buildings.pv is NULL AND pvanlagen.gemname=buildings.gemname AND ST_Distance(ST_Transform(pvanlagen.geom,31467) ,ST_Transform(buildings.centroid,31467))<50; UPDATE buildings SET pv=true FROM pvanlagen WHERE buildings.gid=pvanlagen.buildid;
我测试了50行,buildings
但申请所有这些都需要很长时间.我有3.200.000建筑物和260.000 PV.
在gid
最近的建筑物应进行分配.如果是关系,那么gid
分配哪个并不重要.如果我们需要制定规则,我们可以采用较低的建筑物gid
.
50米意味着作为限制.我ST_Distance()
之所以使用,是因为它返回的最小距离应该在50米以内.后来我多次提出它,直到每个PV Anlage都被分配.
建筑物和PV被分配到各自的区域(gemname
).这应该使分配更便宜,因为我知道最近的建筑必须在同一区域内(gemname
).
我在下面的反馈后尝试了这个查询:
UPDATE pvanlagen p1 SET buildid = buildings.gid , dist = buildings.dist FROM ( SELECT DISTINCT ON (b.gid) p.id, b.gid, b.dist::numeric FROM ( SELECT id, ST_Transform(geom, 31467) FROM pvanlagen WHERE buildid IS NULL -- not assigned yet ) p , LATERAL ( SELECT b.gid, ST_Distance(ST_Transform(p1.geom, 31467), ST_Transform(b.centroid, 31467)) AS dist FROM buildings b LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid WHERE p1.buildid IS NULL AND b.gemname = p1.gemname ORDER BY ST_Transform(p1.geom, 31467) <-> ST_Transform(b.centroid, 31467) LIMIT 1 ) b ORDER BY b.gid, b.dist, p.id -- tie breaker ) x, buildings WHERE p1.id = x.id;
但它回归0 rows affected in 234 ms execution time
.
我哪里错了?
要强制执行您的规则,只需声明pvanlagen.buildid
UNIQUE
:
ALTER TABLE pvanlagen ADD CONSTRAINT pvanlagen_buildid_uni UNIQUE (buildid);
building.gid
是你的更新透露的PK.为了还强制引用完整性的添加FOREIGN KEY
约束来buildings.gid
.
你现在已经实现了.但是在添加这些约束之前运行UPDATE
下面的大数据会更有效.
您的表定义中还有许多内容需要改进.一个,buildings.gid
也pvanlagen.buildid
应该是类型integer
(或者bigint
如果你燃烧了很多 PK值).numeric
是昂贵的废话.
让我们关注核心问题:
案件并不像看起来那么简单.这是一个"最近邻居"问题,具有独特任务的额外复杂性.
此查询查找每个PV 最近的一个建筑物(PV Anlage的缩写 - 排入pvanlagen
),其中既没有分配,但是:
SELECT pv_gid, b_gid, dist FROM ( SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467 FROM pvanlagen WHERE buildid IS NULL -- not assigned yet ) p , LATERAL ( SELECT b.gid AS b_gid , round(ST_Distance(p.geom31467 , ST_Transform(b.centroid, 31467))::numeric, 2) AS dist -- see below FROM buildings b LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid -- also not assigned ... WHERE p1.buildid IS NULL -- ... yet -- AND p.gemname = b.gemname -- not needed for performance, see below ORDER BY p.geom31467 <-> ST_Transform(b.centroid, 31467) LIMIT 1 ) b;
为了使这个查询快,你需要在空间,功能性的GiST指数buildings
,使其多快:
CREATE INDEX build_centroid_gix ON buildings USING gist (ST_Transform(centroid, 31467));
不知道你为什么不这样做
相关答案以及更多解释:
具有多个自连接执行缓慢的大型表上的空间查询
如何查询坐标5英里范围内的所有行?
进一步阅读:
http://workshops.boundlessgeo.com/postgis-intro/knn.html
http://www.postgresonline.com/journal/archives/306-KNN-GIST-with-a-Lateral-twist-Coming-soon-to-a-database-near-you.html
在索引到位的情况下,我们不需要将匹配限制为相同gemname
的性能.只有这是强制执行的实际规则才能执行此操作.如果必须始终观察,请在FK约束中包含该列:
将外键关系限制为相关子类的行
我们可以在UPDATE
语句中使用上面的查询.每个PV仅使用一次,但不止一个PV可能仍然找到最接近的同一建筑物.每栋建筑只允许一个 PV.那你怎么解决这个问题?
换句话说,你会如何在这里分配对象?
一个简单的解决方案是:
UPDATE pvanlagen p1 SET buildid = sub.b_gid , dist = sub.dist -- actual distance FROM ( SELECT DISTINCT ON (b_gid) pv_gid, b_gid, dist FROM ( SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467 FROM pvanlagen WHERE buildid IS NULL -- not assigned yet ) p , LATERAL ( SELECT b.gid AS b_gid , round(ST_Distance(p.geom31467 , ST_Transform(b.centroid, 31467))::numeric, 2) AS dist -- see below FROM buildings b LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid -- also not assigned ... WHERE p1.buildid IS NULL -- ... yet -- AND p.gemname = b.gemname -- not needed for performance, see below ORDER BY p.geom31467 <-> ST_Transform(b.centroid, 31467) LIMIT 1 ) b ORDER BY b_gid, dist, pv_gid -- tie breaker ) sub WHERE p1.gid = sub.pv_gid;
我使用每个建筑物DISTINCT ON (b_gid)
减少到恰好一排,选择最短距离的PV.细节:
选择每个GROUP BY组中的第一行?
对于最接近一个PV的任何建筑物,仅分配最近的PV.PK列gid
(别名pv_gid
)如果两个同样接近,则作为决胜局.在这种情况下,某些PV将从更新中删除并保持未分配状态.重复查询,直到分配完所有PV.
不过,这仍然是一种简单的算法.看看上面的图表,这将建筑物4分配给PV 4,将建筑物5分配给PV 5,而4-5和5-4可能是一个更好的解决方案......
dist
列的类型目前你用numeric
它.您的原始查询分配了一个常量integer
,没有任何一点numeric
.
在我的新查询中ST_Distance()
返回以米为单位的实际距离double precision
.如果我们只是指定我们在numeric
数据类型中得到15个左右的小数位数,那么这个数字就不那么精确了.我严重怀疑你想浪费存储空间.
我宁愿double precision
从计算中保存原文.或者,更好的是,根据需要进行舍入.如果仪表足够精确,只需投射并保存integer
(自动舍入数字).或者首先乘以100以保存cm:
(ST_Distance(...) * 100)::int