Functional Specification
This is the functional specification for the geocoder project. Some of the items described here may not be built yet; the document serves as a guideline for building them.
The long-term vision for the project is a standardized geocoding tool that loads the data to be geocoded and then provides a server framework for performing geocode searches.
Initially, the project will focus on correcting improperly stored geocoding data.
Improper Addresses
Bill is trying to build a geocoder with decent-quality data. Unfortunately for Bill, his only source of complete data is a severely understaffed organization known for hiring monkeys to throw darts at maps on the wall to assign GPS coordinates to addresses, a task the regular staff does correctly but doesn't have time to do. As a result, many of these points are severely incorrect, so Bill has written a fixing script that moves each bad GPS coordinate to the correct point based on the other fields in that database record.
- Gather and load data into database
- Fix bad data
- Output data (to SOLR)
- Track process performance
A variety of data sources will be used, and the raw data will be obtained from the organization(s) that publish it. The current short list of sources is OpenStreetMap data, master address files, address exceptions, address ranges, and interfaces to database tables containing GPS data. Each data source will have an import step and a load step that can be run separately or together.
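Below is a minimal Python sketch of how separate import and load steps per data source might be wired to one command-line entry point. The source names, the `SOURCES` registry, the `--import-only`/`--load-only` flags, and the stub step functions are all hypothetical; real steps would download and parse data (import) and write it into the database (load) instead of returning strings.

```python
import argparse
from typing import Callable, Dict, List, Optional, Tuple

# Stub steps (hypothetical): real versions would fetch/parse data or write to the database.
def import_osm() -> str:
    return "imported osm"

def load_osm() -> str:
    return "loaded osm"

def import_tiger() -> str:
    return "imported tiger"

def load_tiger() -> str:
    return "loaded tiger"

# Hypothetical registry mapping each data source to its (import, load) steps.
SOURCES: Dict[str, Tuple[Callable[[], str], Callable[[], str]]] = {
    "osm": (import_osm, load_osm),
    "tiger": (import_tiger, load_tiger),
}

def run(source: str, do_import: bool, do_load: bool) -> List[str]:
    """Run the import step, the load step, or both (import first) for one source."""
    import_step, load_step = SOURCES[source]
    results = []
    if do_import:
        results.append(import_step())
    if do_load:
        results.append(load_step())
    return results

def main(argv: Optional[List[str]] = None) -> List[str]:
    parser = argparse.ArgumentParser(description="Import and/or load one data source.")
    parser.add_argument("source", choices=sorted(SOURCES))
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--import-only", action="store_true")
    group.add_argument("--load-only", action="store_true")
    args = parser.parse_args(argv)
    # With neither flag set, both steps run in order.
    return run(args.source, do_import=not args.load_only, do_load=not args.import_only)

if __name__ == "__main__":
    print(main())
```

If saved as a hypothetical `pipeline.py`, running `python pipeline.py tiger --load-only` would rerun only the TIGER load step against previously imported files.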
After all of the data has been gathered, this data will be loaded into various database tables in a standardized format.
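One possible shape for the standardized format is a single record type that every source is converted into before loading. The sketch below assumes a hypothetical `addresses` table with `source`, `address`, `lat`, and `lon` columns, plus a simple whitespace-and-case address normalization; it uses SQLite only so it runs without a database server, and the project's actual schema and database may differ.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Iterable

@dataclass(frozen=True)
class AddressRecord:
    """Hypothetical standardized row shared by all data sources."""
    source: str
    address: str
    lat: float
    lon: float

SCHEMA = """
CREATE TABLE IF NOT EXISTS addresses (
    source  TEXT NOT NULL,
    address TEXT NOT NULL,
    lat     REAL NOT NULL,
    lon     REAL NOT NULL
)
"""

def normalize_address(raw: str) -> str:
    """Collapse runs of whitespace and uppercase, so addresses from different sources compare equal."""
    return " ".join(raw.split()).upper()

def load_records(conn: sqlite3.Connection, records: Iterable[AddressRecord]) -> int:
    """Create the table if needed, insert the records, and return how many were loaded."""
    rows = [astuple(r) for r in records]
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO addresses VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```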
There will be a script that downloads OpenStreetMap data for a bounding box defined in the importing configuration. Although the OSM dataset offers many other features, for now only intersections will be used in the geocoding result data.
There will be a script that downloads a master address file. For this project, the master address file comes from Metro, the regional agency that publishes one for the Portland, OR metro area.
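Once the OSM data for the bounding box has been downloaded and parsed, intersections can be found as nodes shared by two or more ways. This minimal sketch assumes the ways have already been parsed (and filtered to streets) into lists of node IDs, that node coordinates are in a dict, and that the bounding box comes from a hypothetical `IMPORT_SETTINGS` entry ordered south, west, north, east (the order the Overpass API uses for its bbox filter). Downloading and parsing are omitted.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lat, lon)

# Hypothetical importing configuration: roughly the Portland, OR area.
IMPORT_SETTINGS = {"osm_bbox": (45.4, -122.8, 45.6, -122.5)}

@dataclass(frozen=True)
class BoundingBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, point: Point) -> bool:
        lat, lon = point
        return self.south <= lat <= self.north and self.west <= lon <= self.east

def find_intersections(
    ways: Dict[int, List[int]], nodes: Dict[int, Point], bbox: BoundingBox
) -> Dict[int, Point]:
    """Return nodes referenced by at least two distinct ways and lying inside bbox."""
    counts: Counter = Counter()
    for node_ids in ways.values():
        counts.update(set(node_ids))  # a way that revisits a node counts only once
    return {
        node_id: nodes[node_id]
        for node_id, n_ways in counts.items()
        if n_ways >= 2 and node_id in nodes and bbox.contains(nodes[node_id])
    }
```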
There will be a designated file or folder for address exceptions; entries placed there override addresses imported from other sources. The data fixing step also writes its corrections to address exception files.
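A minimal sketch of the override rule, assuming exception files are CSVs with hypothetical `address`, `lat`, and `lon` columns stored in one folder, and that addresses are normalized to a common form before comparison. Imported sources are merged first and the exceptions are applied last, so an exception always replaces whatever coordinates another source supplied.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]  # (lat, lon)

def read_exceptions(folder: Path) -> Dict[str, Point]:
    """Read every *.csv file in folder (hypothetical columns: address, lat, lon)."""
    exceptions: Dict[str, Point] = {}
    for path in sorted(folder.glob("*.csv")):  # sorted, so later file names win deterministically
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                key = " ".join(row["address"].split()).upper()
                exceptions[key] = (float(row["lat"]), float(row["lon"]))
    return exceptions

def merge_with_exceptions(
    sources: Iterable[Dict[str, Point]], exceptions: Dict[str, Point]
) -> Dict[str, Point]:
    """Merge imported sources in order, then let the exceptions override them."""
    merged: Dict[str, Point] = {}
    for records in sources:
        merged.update(records)
    merged.update(exceptions)
    return merged
```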
There will be a script that downloads address ranges from TIGER for a configured area.
There will be a helper script that connects to various databases and extracts GPS data from the tables specified in the importing configuration.
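Address ranges are typically used by interpolating a house number's position along the street segment the range belongs to. The sketch below shows plain linear interpolation by distance along the segment's polyline, assuming the from/to house numbers and the segment geometry have already been read from the TIGER files. It treats latitude/longitude as planar coordinates, a rough approximation acceptable only for short segments, and it ignores details such as separate left/right (odd/even) ranges.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (lat, lon)

def interpolate_address(
    house_number: int, from_hn: int, to_hn: int, line: List[Point]
) -> Optional[Point]:
    """Place house_number along line, assuming numbers run linearly from from_hn at
    line[0] to to_hn at line[-1]. Returns None if the number is outside the range."""
    low, high = sorted((from_hn, to_hn))
    if not low <= house_number <= high or not line:
        return None
    fraction = 0.0 if from_hn == to_hn else (house_number - from_hn) / (to_hn - from_hn)
    segments = list(zip(line, line[1:]))
    lengths = [math.dist(a, b) for a, b in segments]  # planar approximation
    remaining = fraction * sum(lengths)
    for (a, b), length in zip(segments, lengths):
        if length > 0 and remaining <= length:
            t = remaining / length
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= length
    return line[-1]  # degenerate geometry or floating-point overshoot
```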
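A minimal sketch of such a helper, assuming each configured table is described by a hypothetical `TableConfig` naming the table and its address, latitude, and longitude columns. It uses Python's DB-API cursor interface, demonstrated with `sqlite3`; other DB-API drivers are used the same way, though identifier quoting (standard SQL double quotes here) differs in some databases, such as MySQL's default backticks. The identifiers are assumed to come from the trusted configuration, not from user input.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class TableConfig:
    """Hypothetical importing-configuration entry for one source table."""
    table: str
    address_col: str
    lat_col: str
    lon_col: str

def quote_ident(name: str) -> str:
    """Quote an SQL identifier using standard double-quote rules."""
    return '"' + name.replace('"', '""') + '"'

def extract_points(conn, cfg: TableConfig) -> List[Tuple[str, float, float]]:
    """Return (address, lat, lon) for every row that has both coordinates."""
    addr, lat, lon = (quote_ident(c) for c in (cfg.address_col, cfg.lat_col, cfg.lon_col))
    sql = (
        f"SELECT {addr}, {lat}, {lon} FROM {quote_ident(cfg.table)} "
        f"WHERE {lat} IS NOT NULL AND {lon} IS NOT NULL"
    )
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return [(str(a), float(y), float(x)) for a, y, x in cur.fetchall()]
    finally:
        cur.close()

if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")  # stand-in for a real source database
    demo.execute("CREATE TABLE permits (site_addr TEXT, y REAL, x REAL)")
    demo.execute("INSERT INTO permits VALUES ('1 A ST', 45.5, -122.6)")
    print(extract_points(demo, TableConfig("permits", "site_addr", "y", "x")))
```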
A script will pull data from the database, analyze it, and correct the bad records. Each bad record will also be noted in an address exception file.
The corrected data is loaded into the geocode search engine (Solr).
For now, a flat text file will be written detailing how long each process took, when it started and finished, how many records it processed, and, where applicable, how many errors were found.
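The spec leaves the correction rule open. One plausible rule, sketched here as an assumption rather than the project's actual method: compare each stored point with a reference point for the same address (for example, one derived from the master address file), and if the two are farther apart than a threshold, replace the stored point and record an address exception. The `Record` type, the `reference` lookup, and the 100 m default threshold are hypothetical; distances use the haversine great-circle formula.

```python
import math
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))  # clamp rounding error

@dataclass(frozen=True)
class Record:
    """Hypothetical database record with a stored GPS point."""
    record_id: int
    address: str
    lat: float
    lon: float

def fix_records(
    records: List[Record], reference: Dict[str, Point], max_error_m: float = 100.0
) -> Tuple[List[Record], List[Tuple[str, float, float]]]:
    """Snap records that are too far from their reference point; return (fixed, exceptions)."""
    fixed: List[Record] = []
    exceptions: List[Tuple[str, float, float]] = []
    for rec in records:
        ref = reference.get(rec.address)
        if ref is not None and haversine_m((rec.lat, rec.lon), ref) > max_error_m:
            rec = replace(rec, lat=ref[0], lon=ref[1])
            exceptions.append((rec.address, rec.lat, rec.lon))  # rows for an exception file
        fixed.append(rec)
    return fixed, exceptions
```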
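Solr accepts a JSON array of documents posted to a core's `/update` handler. This sketch only builds the HTTP request with the standard library; the base URL, the core name `geocoder`, and the field names `id`, `address`, and `location` are assumptions that would have to match the project's actual Solr schema (a `"lat,lon"` string is the input format Solr's spatial point fields accept). Committing on every request is convenient for small batches but slow for large loads.

```python
import json
import urllib.request
from typing import Iterable, List, Tuple

# Hypothetical Solr location and core name.
SOLR_BASE_URL = "http://localhost:8983/solr"
SOLR_CORE = "geocoder"

def build_solr_docs(rows: Iterable[Tuple[int, str, float, float]]) -> List[dict]:
    """Turn (id, address, lat, lon) rows into Solr documents (field names are assumptions)."""
    return [
        {"id": str(record_id), "address": address, "location": f"{lat},{lon}"}
        for record_id, address, lat, lon in rows
    ]

def build_update_request(
    docs: List[dict], base_url: str = SOLR_BASE_URL, core: str = SOLR_CORE
) -> urllib.request.Request:
    """Build (but do not send) a POST of the documents to the core's /update handler."""
    url = f"{base_url.rstrip('/')}/{core}/update?commit=true"
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running Solr instance, the request would then be sent with `urllib.request.urlopen(req)`.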
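A minimal sketch of the timing log, assuming one tab-separated line per process with the hypothetical column order name, start, finish, elapsed seconds, records processed, errors. A context manager keeps the bookkeeping out of each step, and the line is written even if the step raises, so failed runs are recorded too.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Iterator, Union

@dataclass
class StepStats:
    records: int = 0
    errors: int = 0  # stays 0 for steps where errors do not apply

@contextmanager
def track_step(name: str, log_path: Union[str, Path]) -> Iterator[StepStats]:
    """Time the enclosed block and append one tab-separated line to log_path."""
    stats = StepStats()
    started = datetime.now()
    t0 = time.perf_counter()
    try:
        yield stats
    finally:
        elapsed = time.perf_counter() - t0
        finished = datetime.now()
        fields = [
            name,
            started.isoformat(timespec="seconds"),
            finished.isoformat(timespec="seconds"),
            f"{elapsed:.3f}",
            str(stats.records),
            str(stats.errors),
        ]
        with Path(log_path).open("a", encoding="utf-8") as f:
            f.write("\t".join(fields) + "\n")
```

For example, a fixing step could wrap its loop in `with track_step("fix_bad_data", "process_log.txt") as stats:` and increment `stats.records` and `stats.errors` as it goes.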