This repository was archived by the owner on Feb 16, 2022. It is now read-only.

Tile Generation - TTL Exceeded #4

Closed
d3mon187 opened this issue Apr 6, 2020 · 8 comments

Comments

@d3mon187

d3mon187 commented Apr 6, 2020

Hey @thesocialdev, I've managed to get zoom levels 0-12 generating OK at speeds comparable to the first section on this page. On zoom 13+, however, it fails with TTL exceeded. I see on that page that it was a problem there too, but I haven't seen anyone else mentioning it. Do you have any more information on this issue since then? I've got heap size, TTL, and heartbeat turned up, but maybe I'm still not at the levels needed?
Also, that page claims a much quicker way of generating was found, but the link doesn't really seem to show anything new. My main goal is to render the planet to zoom 15 within a month, and then hopefully run off of planet diffs after that. I have one server dedicated to rendering, and two production servers that I would like to copy the data to and serve the maps from. Any info you can send my way on that would be hugely appreciated as well.
Thanks, and stay safe!

@thesocialdev
Owner

thesocialdev commented Apr 6, 2020

@d3mon187 I suggest you tweak tileTimeOut, see. TBH we gave up on full planet tile generation because of this and other unknown issues that were happening; since our resources on maps are constrained, we copied a dump of the existing tiles from the old server and generated those tiles in order to have parity. On another note, I was sketching a possible change for tilerator to avoid unnecessary processing in future iterations, see.
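
Roughly where such a tweak might go, assuming tileTimeOut is read from the tilerator service conf block; the key placement, value, and units here are placeholders, so check them against the tilerator source and your own config:

```yaml
# /etc/tilerator/config.yaml (sketch; key placement and value are placeholders)
services:
  - name: tilerator
    conf:
      # Per-tile generation timeout; raise it if high-zoom tiles hit "TTL exceeded".
      tileTimeOut: 120   # placeholder value; check the expected units in the tilerator source
```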

I hope this helps. Stay safe!

@d3mon187
Author

d3mon187 commented Apr 8, 2020

Thanks for the heads up. I managed to get zoom 13 rendered in about 24 hours, and will start on 14 soon. I think the main issue might be down to heap size. It seems like upping that fixes my problems with the TTL exceeded errors, but I haven't done any definitive tests. Will see how 14 goes, and then of course 15 will take a while to check.

Your idea of using rendering history is exactly what I would expect. I'd think you would want to make a list of all tiles at zooms 13+ that were requested more than once, and then load that into a database. Maybe you could get good statistics off the Varnish servers, if they log that sort of stuff? I would think avoiding pre-rendering all the fields, and even just long stretches of roads that no one will ever zoom in on, would be a huge help. One question though: I haven't messed with the front end of Kartotherian much yet, but when you zoom into a section of tiles that hasn't been pre-rendered to Cassandra, can Kartotherian request that the tiles be rendered through genview and saved to Cassandra? Playing with it today, it only seems to work with data from pre-rendered zoom levels.

@d3mon187
Author

d3mon187 commented Apr 9, 2020

Oh man @thesocialdev, I finally feel like my brain is clicking this stuff into place. Sorry to keep bothering you!

So stop me if I'm wrong. From what I understand, currently Wikimedia starts with a full render down to level 14, using checkzoom -1 to speed things up. For updating, zoom 14's tile list is dumped to a text file and zooms 10-15 are redone using this tile list and fileZoomOverride. Osmosis is used to keep the OSM data up to date, and shp2pgsql water polygons are re-imported once a month. It appears the Kartotherian service can be run with access only to Cassandra, and doesn't need Tilerator or a PostgreSQL connection. So you could run one server that runs Postgres, generates tiles, and updates a Cassandra cluster, and another server that serves Kartotherian and uses data from the Cassandra cluster. Does that sound right, and like a reasonable plan?
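
For my own notes, here's a rough shell sketch of that update cycle; the osmosis and osm2pgsql flags are real, but the paths, database name, and the tileshell.js job parameters other than filepath/fileZoomOverride are guesses on my part, so treat it as illustrative only:

```bash
#!/bin/bash
# Sketch of one update cycle. Paths, database name, and the tileshell.js
# job parameters (other than filepath/fileZoomOverride) are assumptions.
set -euo pipefail

WORKDIR=/srv/osmosis                 # osmosis replication state
EXPIRED=/srv/osm/expired_tiles.txt   # dirty-tile list written by osm2pgsql

# 1. Pull the latest OSM diffs tracked by osmosis.
osmosis --read-replication-interval workingDirectory=$WORKDIR \
        --simplify-change \
        --write-xml-change file=$WORKDIR/changes.osc.gz

# 2. Apply the diff to Postgres and record which z15 tiles it dirtied.
osm2pgsql --append --slim --database gis \
          --expire-tiles 15 --expire-output $EXPIRED \
          $WORKDIR/changes.osc.gz

# 3. Re-render the dirtied tiles into Cassandra via tilerator.
node /srv/tilerator/scripts/tileshell.js --config /etc/tilerator/config.yaml \
     -j.filepath $EXPIRED -j.fileZoomOverride 15 \
     -j.generatorId gen -j.storageId v4
```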

If so, then just a few more questions.

  1. Do you happen to know how often Wikimedia is regenerating tiles? Weekly, monthly, or are they updating off of the changed-tiles list from osmosis?
  2. By running fileZoomOverride on zooms 10-15 from z14's list, is that still giving full detail to z15, and is there any real benefit to rendering zooms lower than 15?
  3. Does Cassandra ever need to be purged, or will data just be overwritten?

@thesocialdev
Owner

thesocialdev commented Apr 9, 2020

> Oh man @thesocialdev, I finally feel like my brain is clicking this stuff into place. Sorry to keep bothering you!

It's not bothering at all, keep going!

> So stop me if I'm wrong. From what I understand, currently Wikimedia starts with a full render down to level 14, using checkzoom -1 to speed things up. For updating, zoom 14's tile list is dumped to a text file and zooms 10-15 are redone using this tile list and fileZoomOverride.

For the last full planet regeneration we triggered regeneration from a dumped text file for each zoom; it avoids no-data tiles.

> Osmosis is used to keep the OSM data up to date, and shp2pgsql water polygons are re-imported once a month.

Yep.

> It appears the Kartotherian service can be run with access only to Cassandra, and doesn't need Tilerator or a PostgreSQL connection.

Yep, as long as you have a vector-tile storage. One possible upgrade for Kartotherian would be generating tiles on the fly by querying vector tiles directly from the OSM DB, and Tilerator could be sunset; credits to Yuri Astrakhan.
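
For illustration, a sketch of what a Cassandra-backed source might look like in the Kartotherian sources file, assuming the Cassandra storage module; the keyspace, contact points, credentials, and parameter names here are placeholders from memory, so check them against your own deployment:

```yaml
# sources.yaml (sketch; keyspace, hosts, and credentials are placeholders)
v4:
  public: true
  formats: [pbf]
  uri: cassandra://
  params:
    keyspace: v4
    cp: ["cassandra1.example.org", "cassandra2.example.org"]
    username: kartotherian
    password: secret
    maxzoom: 15
```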

> So you could run one server that runs Postgres, generates tiles, and updates a Cassandra cluster, and another server that serves Kartotherian and uses data from the Cassandra cluster. Does that sound right, and like a reasonable plan?

That's a reasonable plan. We keep everything on the same server for simplicity, but the critical production point of maps is the Kartotherian tile server.

> 1. Do you happen to know how often Wikimedia is regenerating tiles? Weekly, monthly, or are they updating off of the changed-tiles list from osmosis?

We regenerate at the same rate we run osmosis, currently every 12 hours (it used to be 24h), although this is still under test.

> 2. By running fileZoomOverride on zooms 10-15 from z14's list, is that still giving full detail to z15, and is there any real benefit to rendering zooms lower than 15?

I guess it depends on your need for accuracy. In high-density areas there is a lot of data, so it can be heavy; z15 data should be enough, but you would definitely have fewer details with only z14 data, see.

> 3. Does Cassandra ever need to be purged, or will data just be overwritten?

Data is overwritten; its size can increase in two situations:

  1. An existing tile is replaced with a bigger one because someone added more info to some specific OSM relation.
  2. You generated a tile that wasn't generated before (it can happen).

I would love to see your work with maps in the form of a blog post sometime, please do it! Be safe.

@d3mon187
Author

Excellent info @thesocialdev, glad to finally feel like I have a grasp.

Would you guys happen to have those dumped text files for the different zooms available anywhere?

I've mostly got updates working, aside from still needing to do some testing. I ran into a huge problem with the updates initially where a day's update was taking 22hrs to run. Turns out it was another issue with PG 12 and osm2pgsql, and was fixed by turning off jit AND setting max parallel workers to 0. If you guys play around with version 11 or 12, make sure to keep those two settings in mind. I had already turned jit off, but the max workers was a new one. More info here - osm2pgsql-dev/osm2pgsql#1045
I may revert back to 10, but it seems a shame to not take advantage of years of development. :\
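
For anyone else who hits this, these are the two settings I mean (I'm assuming the parallel-worker knob in question is max_parallel_workers_per_gather, which is what the linked issue discusses):

```bash
# Instance-wide settings that worked around the PG 11/12 + osm2pgsql slowdown.
psql -U postgres -c "ALTER SYSTEM SET jit = off;"
psql -U postgres -c "ALTER SYSTEM SET max_parallel_workers_per_gather = 0;"
psql -U postgres -c "SELECT pg_reload_conf();"
```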

I'd be happy to post up some of the stuff I've found and figured out. If Wikimedia has a good spot for it then I can certainly type it up there.

Thanks again!

@d3mon187
Author

Looks like updating daily shouldn't be a problem. It's taking about 45 min for the osmosis import, and then I estimate maybe 4 or 5 hours to generate tiles. Not absolutely sure on that though.

Do you know if adding checkZoom would help when feeding it an osm2pgsql dirty-tiles dump through filepath? I have trouble passing checkZoom through tileshell.js because it complains that "-1" is a boolean, and for some reason it said it had to be a max of 0 when I tried to pass it 9 for zoom 10 when using filepath. In the past I was able to use one less than the zoom for checkZoom, so maybe it's not compatible with tile dumps?
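
If it's only an argument-parsing quirk, maybe the = form would get the negative value through (assuming tileshell.js uses a yargs-style parser; the -j. prefix here is a guess on my part):

```bash
# Hypothetical: pass -1 with '=' so the parser doesn't read it as a flag.
node scripts/tileshell.js --config /etc/tilerator/config.yaml \
     -j.filepath /srv/osm/expired_tiles.txt \
     "-j.checkZoom=-1"
```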

@wikiocity

A large amount of RAM is required to generate zoom 12+. There appears to be a memory leak with some of the promises not being cleared; they accumulate until the job is finished and a new one is started under the same process. I haven't been able to pinpoint the bad promise yet, but if you have enough RAM you can increase memory limits in Node.js to get around it.
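
For example, if tilerator is started directly with node, something like this raises the V8 old-space limit to roughly 8 GB (the script path and config flag are assumptions; adjust to however your service is launched):

```bash
# Raise the Node.js heap ceiling for the tile-generation process.
node --max-old-space-size=8192 /srv/tilerator/server.js -c /etc/tilerator/config.yaml
```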

@thesocialdev
Owner

@wikiocity and @d3mon187, could you please move this issue to the official Phabricator board?

Also, I would like to mention that we are planning to move away from Tilerator and use Tegola, and integrate it with Kartotherian.
