Description
What's the problem this feature will solve?
Currently, the PyPI logged-in UI shows the user the IP address that performed certain actions.
I don't know my own IP offhand. Especially if there are multiple different IPs listed here, I would need to manually look up the approximate location where these came from to get an idea of whether they were actually me or not.
Describe the solution you'd like
It would be nice if PyPI also showed me an (approximate) location for each IP address, so I could quickly spot the ones that seem incorrect, e.g.:
Event | Date / time | IP address |
---|---|---|
Logged in | less than 10 seconds ago | 11.22.11.22 (Austin, TX, USA) |
Logged in | June 22, 2020 | 22.33.22.33 (Austin, TX, USA) |
Logged in | June 19, 2020 | 44.55.44.55 (Timbuktu, Mali) |
Logged in | June 19, 2020 | 66.77.66.77 (Austin, TX, USA) |
Additional context
This shouldn't require external API calls. Using something like https://pypi.org/project/geoip2/ with an embedded database like https://dev.maxmind.com/geoip/geoip2/geolite2/ would probably work.
Ideally this would be determined on the fly and not stored anywhere (e.g. along with the IP address), so if we someday replaced the mechanism with something more precise (or just updated the embedded DB) the updates would be immediately reflected.
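As a rough sketch of the on-the-fly approach described above, the location could be resolved at render time from a local database and never persisted. The function names `describe_ip` and `geolite2_lookup` and the database path are hypothetical; the `geoip2` calls assume a GeoLite2 City database has already been downloaded locally.

```python
import ipaddress
from typing import Callable, Optional


def describe_ip(ip: str, lookup: Callable[[str], Optional[str]]) -> str:
    """Return 'ip (location)' if the lookup knows the address, else just 'ip'.

    Nothing is stored: the location is computed each time the row is rendered.
    """
    ipaddress.ip_address(ip)  # raises ValueError on malformed input
    location = lookup(ip)
    return f"{ip} ({location})" if location else ip


def geolite2_lookup(db_path: str) -> Callable[[str], Optional[str]]:
    """Build a lookup backed by a local GeoLite2 City database (assumed at db_path)."""
    # Imported lazily so the formatting helper above works without geoip2 installed.
    import geoip2.database
    import geoip2.errors

    reader = geoip2.database.Reader(db_path)

    def lookup(ip: str) -> Optional[str]:
        try:
            response = reader.city(ip)
        except geoip2.errors.AddressNotFoundError:
            return None
        parts = [
            response.city.name,
            response.subdivisions.most_specific.iso_code,
            response.country.iso_code,
        ]
        # Any of these fields may be None for a given address.
        return ", ".join(p for p in parts if p) or None

    return lookup
```

In a view this might be used as `describe_ip(event_ip, geolite2_lookup("GeoLite2-City.mmdb"))`, where the file name is only an example. Because the lookup is injected, swapping the embedded database for a more precise mechanism later would not touch the rendering code.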
Todo list
- Use Fastly's geolocation services to determine geographic location at edge
- Hash & salt IP addresses at edge, pass those to our backends/logs (populate an X-PyPI-Hashed-IP header)
- Replace IP addresses in gunicorn logs
- Begin storing hashed IPs everywhere for all events
- Replace IP addresses in the user-facing UI (user/project events) with corresponding geolocation data
- Replace IP addresses in journals with corresponding hashed IP
  - Drop submitted_from column from journals table (duplicated in ProjectEvent)
- For all events tables, change IP columns to CIText, retroactively hash IP addresses (geoIP data will be missing)
- For Admin IP addresses table, data migration to retroactively hash the IP addresses.
- Drop X-Fastly-IP header at edge
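The "hash & salt" step in the list above would run at the edge rather than in Python, and the list does not specify an algorithm. Purely to illustrate the idea, here is a minimal sketch assuming SHA-256 over a secret salt plus the normalized address; `hash_ip` and the salt value are hypothetical.

```python
import hashlib
import ipaddress


def hash_ip(ip: str, salt: bytes) -> str:
    """Return a hex digest of the salted IP address (hypothetical scheme).

    The address is normalized first so that equivalent spellings of the
    same IPv6 address produce the same hash.
    """
    normalized = str(ipaddress.ip_address(ip))
    return hashlib.sha256(salt + normalized.encode("utf-8")).hexdigest()


# Example: the value that might populate the X-PyPI-Hashed-IP header.
hashed = hash_ip("11.22.11.22", salt=b"example-secret-salt")
```

The same function could serve the retroactive data migrations, provided they use the same salt as the edge so that old and new rows remain comparable.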