Functional Specification

djstroky edited this page Jan 2, 2014 · 3 revisions

This is the functional specification for the geocoder project. Some of the items listed here may not be built yet; this document serves as a guideline for building them.

Long Term Vision

The long-term vision for the project is to create a standardized geocoding tool that loads the data to be geocoded and then provides a server framework for performing geocode searches.

Current Work Effort

The project will initially focus on correcting improperly stored geocoding data.

Use Case Scenario

Improper Addresses

Bill is trying to build a geocoder with decent-quality data. Unfortunately for Bill, his only source of complete data is a severely understaffed organization that is known for hiring monkeys to throw darts at maps on the wall as a method of assigning GPS coordinates to addresses. The regular staff assigns coordinates correctly but does not have time to cover every address. As a result, many of these points are severely incorrect, so Bill has written a fixing script that moves each bad point to the correct location based on the other fields in the same database record.

Core Functionality

  • Gather and load data into database
  • Fix bad data
  • Output data (to Solr)
  • Track process performance

Gathering Raw Data

A variety of data sources will be used, and the raw data will be obtained from the organization(s) that publish it. The current short list of sources is Open Street Map data, master address files, address exceptions, address ranges, and interfaces to database tables with GPS data. Each data source will have an importing step and a loading step that can be called separately or together.
After all of the data has been gathered, it will be loaded into database tables in a standardized format.
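One way to keep the importing and loading steps callable separately or together is sketched below. This is only an illustration: the `DataSource` class, the `import_sample` and `load_into_table` functions, and the in-memory `address_table` are hypothetical names that do not come from the project.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class DataSource:
    """One raw data source with separately callable import and load steps."""
    name: str
    import_step: Callable[[], List[Record]]     # gathers raw records
    load_step: Callable[[List[Record]], int]    # stores records, returns the count

    def run(self) -> int:
        """Run the import step and then the load step together."""
        return self.load_step(self.import_step())


# Hypothetical in-memory stand-in for a standardized database table.
address_table: List[Record] = []


def import_sample() -> List[Record]:
    # A real importer would download or read a published file here.
    return [{"address": "123 Main St", "lat": 45.5, "lon": -122.6}]


def load_into_table(records: List[Record]) -> int:
    address_table.extend(records)
    return len(records)


source = DataSource("sample", import_sample, load_into_table)
loaded = source.run()  # or call source.import_step() and source.load_step() separately
```

Keeping the two steps as plain callables means a slow download can be run once and the load step repeated against the saved result.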

Open Street Map Data

There will be a script that downloads data from Open Street Map for a bounding box defined in the settings of the importing configuration. Although the OSM dataset contains many kinds of features, for now only intersections will be used in the geocoding result data.
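As a rough sketch of how intersections could be derived once the OSM data is downloaded: in OSM, streets are ways made of ordered node references, so a node shared by two or more differently named ways can be treated as an intersection. The function name, the input shape (street name mapped to node ids), and the sample street names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_intersections(ways: Dict[str, List[int]]) -> Dict[int, Set[str]]:
    """Return node ids that belong to two or more differently named ways."""
    names_by_node: Dict[int, Set[str]] = defaultdict(set)
    for name, node_ids in ways.items():
        for node_id in node_ids:
            names_by_node[node_id].add(name)
    # Keep only the nodes touched by at least two street names.
    return {n: names for n, names in names_by_node.items() if len(names) >= 2}


# Hypothetical sample: node 2 is shared by two streets.
ways = {
    "SW 5th Ave": [1, 2, 3],
    "SW Main St": [7, 2, 8],
    "SW 6th Ave": [4, 5, 6],
}
intersections = find_intersections(ways)
```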

Master Address File

There will be a script that downloads a master address file. For this project, the file comes from Metro, a regional agency that provides a master address file for the Portland, OR metro area.

Address Exceptions

There will be a designated file or folder for address exceptions. Entries placed there override addresses imported from other sources. Address exception files are also produced by this program during the data fixing step.
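The override behavior might look like the following minimal sketch. It assumes addresses are keyed by a normalized address string and stored as `(lat, lon)` tuples; the key format and the `apply_exceptions` name are illustrative assumptions, not part of the project.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (lat, lon)


def apply_exceptions(addresses: Dict[str, Point],
                     exceptions: Dict[str, Point]) -> Dict[str, Point]:
    """Return a copy of addresses in which exception entries take precedence."""
    merged = dict(addresses)
    merged.update(exceptions)  # exceptions win on key collisions
    return merged


imported = {"123 Main St": (45.5, -122.6), "456 Oak St": (45.52, -122.68)}
exceptions = {"123 Main St": (45.6, -122.7)}
merged = apply_exceptions(imported, exceptions)
```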

Address Ranges

There will be a script that downloads address ranges from TIGER for a defined applicable area.

Database Tables

There will be a helper script that connects to various databases and extracts GPS data from tables specified in the importing configuration.
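A minimal sketch of such a helper follows, using Python's built-in `sqlite3` module as a stand-in for whichever databases the project ends up connecting to. The `extract_gps` function, the `stops` table, and its column names are hypothetical. Because table and column names come from configuration and cannot be passed as SQL parameters, the sketch validates them before building the query.

```python
import sqlite3
from typing import List, Tuple


def extract_gps(conn: sqlite3.Connection, table: str,
                lat_col: str, lon_col: str) -> List[Tuple[float, float]]:
    """Read (lat, lon) pairs from a table named in the importing configuration."""
    for ident in (table, lat_col, lon_col):
        # Identifiers cannot be bound as parameters, so reject anything unusual.
        if not ident.isidentifier():
            raise ValueError(f"unsafe identifier: {ident!r}")
    cursor = conn.execute(f'SELECT "{lat_col}", "{lon_col}" FROM "{table}"')
    return cursor.fetchall()


# Hypothetical in-memory database standing in for a real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stops (name TEXT, lat REAL, lon REAL)")
conn.execute("INSERT INTO stops VALUES ('Main St', 45.5, -122.6)")
rows = extract_gps(conn, "stops", "lat", "lon")
```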

Fixing Bad Data

A script will pull data from the database, analyze it, and correct the bad records. Each bad record will also be noted in an address exception file.
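The specification does not say how bad records are detected, so the following is only one possible sketch. It assumes a trusted lookup from address to coordinates (for example, built from the master address file) and treats a stored point as bad when it lies more than a threshold distance from the trusted point. The `fix_records` function, the 100 m threshold, and the record layout are all assumptions.

```python
import math
from typing import Dict, List, Tuple


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def fix_records(records: List[dict], trusted: Dict[str, Tuple[float, float]],
                max_error_m: float = 100.0) -> Tuple[List[dict], List[dict]]:
    """Return (corrected records, exception entries for the records that changed)."""
    fixed, exceptions = [], []
    for rec in records:
        ref = trusted.get(rec["address"])
        if ref is not None and haversine_m(rec["lat"], rec["lon"], *ref) > max_error_m:
            # Note both the old and the new point so the exception can be audited.
            exceptions.append({"address": rec["address"],
                               "old": (rec["lat"], rec["lon"]), "new": ref})
            rec = {**rec, "lat": ref[0], "lon": ref[1]}
        fixed.append(rec)
    return fixed, exceptions


trusted = {"123 Main St": (45.5, -122.6), "456 Oak St": (45.52, -122.68)}
records = [
    {"address": "123 Main St", "lat": 45.5001, "lon": -122.6},  # close enough
    {"address": "456 Oak St", "lat": 46.0, "lon": -121.0},      # far off
]
fixed, exceptions = fix_records(records, trusted)
```

The exception entries could then be written to the address exception file so that later imports pick up the corrected point.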

Outputting Data (to Solr)

The corrected data is loaded into the geocode search engine (Solr).
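As an illustration, Solr's update handler accepts a JSON array of documents posted to a core's `/update` path. The sketch below only builds such a request with the standard library and does not send it. The base URL, the core name `geocoder`, and the field names (`id`, `address`, `location`) are assumptions; the real schema is not defined in this document.

```python
import json
import urllib.request
from typing import List


def build_solr_update(base_url: str, core: str,
                      records: List[dict]) -> urllib.request.Request:
    """Build (but do not send) a JSON update request for a Solr core."""
    docs = [
        {
            "id": str(rec["id"]),
            "address": rec["address"],
            # Solr lat/lon point fields take a "lat,lon" string.
            "location": f'{rec["lat"]},{rec["lon"]}',
        }
        for rec in records
    ]
    return urllib.request.Request(
        url=f"{base_url}/{core}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server location and core name.
request = build_solr_update(
    "http://localhost:8983/solr", "geocoder",
    [{"id": 1, "address": "456 Oak St", "lat": 45.52, "lon": -122.68}],
)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```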

Tracking Performance

For now, a flat text file will be written detailing when each process started, when it finished, how long it took, how many records were processed, and, if applicable, how many errors were found.
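A minimal sketch of this kind of tracking is a context manager that appends one tab-separated line per process to a text file. The `track` name, the line layout, and the `performance.txt` file name are assumptions for illustration only.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def track(name: str, log_path: str):
    """Time a process and append one summary line to a flat text file."""
    stats = {"records": 0, "errors": 0}  # the caller updates these counts
    started = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    try:
        yield stats
    finally:
        elapsed = time.perf_counter() - t0
        finished = datetime.now(timezone.utc)
        with open(log_path, "a", encoding="utf-8") as fh:
            fh.write(
                f"{name}\tstart={started.isoformat()}\tend={finished.isoformat()}"
                f"\telapsed_s={elapsed:.3f}\trecords={stats['records']}"
                f"\terrors={stats['errors']}\n"
            )


# Example run that writes to a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "performance.txt")
with track("load_master_address_file", log_path) as stats:
    stats["records"] = 3

with open(log_path, encoding="utf-8") as fh:
    line = fh.read()
```

Writing the line in a `finally` clause means a process that fails partway through still leaves a timing record.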