Skip to content

Geolocate user IP addresses when presenting them in UI #8158

Open
@di

Description

@di

What's the problem this feature will solve?
Currently in the PyPI logged-in UI, we show the IP address that performed certain actions to the user:

Screen Shot 2020-06-24 at 6 37 55 PM

I don't know my own IP offhand. Especially if there are multiple different IPs listed here, I would need to manually look up the approximate location where these came from to get an idea of whether they were actually me or not.

Describe the solution you'd like
It would be nice if PyPI also showed me an (approximate) location for any given IP address as well, so I could easily visually filter ones that seem incorrect, e.g.:

Event Date / time IP address
Logged in less than 10 seconds ago 11.22.11.22 (Austin TX USA)
Logged in June 22, 2020 22.33.22.33 (Austin TX USA)
Logged in June 19, 2020 44.55.44.55 (Timbuktu, Mali)
Logged in June 19, 2020 66.77.66.77 (Austin TX, USA)

Additional context
This shouldn't require external API calls. Using something like https://pypi.org/project/geoip2/ with an embedded database like https://dev.maxmind.com/geoip/geoip2/geolite2/ would probably work.

Ideally this would be determined on the fly and not stored anywhere (e.g. along with the IP address), so if we someday replaced the mechanism with something more precise (or just updated the embedded DB) the updates would be immediately reflected.

Todo list

  • Use Fastly's geolocation services to determine geographic location at edge
  • Hash & salt IP addresses at edge, pass those to our backends/logs (populate a X-PyPI-Hashed-IP header.
  • Replace IP addresses in gunicorn logs
  • Begin storing hashed IPs everywhere for all events
  • Replace IP addresses in the user-facing UI (user/project events) with corresponding geolocation data
  • Replace IP addresses in journals with corresponding hashed IP
  • Drop submitted_from column from journals table (duplicated in ProjectEvent)
  • For all events tables, change IP columns to CIText, retroactively hash IP addresses (geoIP data will be missing)
  • For Admin IP addresses table, data migration to retroactively hash the IP addresses.
  • Drop X-Fastly-IP header at edge

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions