-
-
Notifications
You must be signed in to change notification settings - Fork 475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non moving node updates should not update related way/relation geometries #1046
Comments
I've implemented the test for location change, and it shows a 10x speedup in the node parsing step for nodes that did not move. In the diffs I've used for my tests, a large portion of nodes had new coordinates, so the speedup is reduced but still there for the others (from 10 to 50% in my tests). The benefit is not limited to the node parsing... but cascades to ways and relations as fewer pending ones need to be checked and updated. |
Do you like to do a quick PR? I can't promise that we take it as is but it would be easier to discuss the implications with the actual code. |
I'll do. It's only a few lines change. Be aware... I've not written much C++ in my life, it may hurt purists ;) |
Just pushed the PR: #1059 |
@cquest Can you elaborate on what you tested exactly and how the numbers changed with and without this patch? There are several issues here that could affect timing: With the patch fewer ways (and relations) will be marked as (possibly) changed. On the other hand we have to lookup the old location of all nodes, which is cheap if we are using flat node store, but possibly expensive when using the database. All of this, of course, depends on what kind of node changes there are usually. To figure out the best way to do this we need some numbers not from isolated queries, but from real-world update runs. Does this, on avarage, speed up things considerably or not? How does the flat node factor in here? How does the bucket index? @cquest You speek of a 10% speed up which would be well worth it, but I am not sure what you have been measuring there. |
When a new version of a node has the same location as the old version of that node, we don't need to re-process ways and relations this node is a member of. See osm2pgsql-dev#1046 This is some preliminary experiments. It is not tested well. Currently only works with flat node file. Does this work in all cases? Can we be shure that the state reflected in the middle tables/the flat node file is the same as in the output tables? What happens if we re-start updating with an older diff file? What happens if a nodes is deleted and then re-surrected?
I did some statistics on node changes using the planet history file. Of the about 2 billion node changes (i.e. new versions of an existing non-deleted node) about one quarter did not change the location. Many of these will be for POIs which are not in a way or relation, so this is not so relevant here. But even if you only look at untagged nodes (no tags in old and new version), in about 10% of cases the location did not change. Presumably these are all members of a way or relation so they will trigger further processing unnecessarily. I did some experimentation in a branch, but it is not totally straightforward to change the current code, so this would need a lot more testing and especially benchmarking. It makes the code more complex, so this has to be evaluated thoroughly. If anybody want to play around with it, go ahead. At the moment I don't have the time to work on that. This is also part of a larger issue: How do we handle changed objects in a better and more flexible way instead of just deleting and inserting. And how do we detect early that changes in the OSM data do not actually result in changes in the database, thus allowing us to elide further processing through the processing chain from nodes to ways to relations, to generalization to tile invalidation. This needs a lot more thoughts. |
When a node is modified, osm2pgsql does not make a difference if it is just a tag change or a lat/lon change.
When the lat/lon is the same as the actual one in the database, there is no impact on ways and relations refering to that node and their geometry do not need to be updated.
I suggest to add a test in node_modify to check if the lat/lon has changed or not.
The actual code is very basic:
It could be more efficient:
The text was updated successfully, but these errors were encountered: