first commit
anerv committed Jan 18, 2023
1 parent b25c663 commit aea64df
Showing 201 changed files with 994,234 additions and 1 deletion.
29 changes: 29 additions & 0 deletions .gitignore
@@ -0,0 +1,29 @@
.DS_Store
*.ipynb_checkpoints
src/__pycache__/
*.pyc
__pycache__/
scripts/cache
src/cache
.vscode/settings.json

data/OSM/
data/REFERENCE/cph_geodk/processed

results/OSM/cph_geodk/data/
results/REFERENCE/cph_geodk/data/
results/COMPARE/cph_geodk/data/

exports/cph_geodk/pdf/1a.pdf
exports/cph_geodk/pdf/1b.pdf
exports/cph_geodk/pdf/2a.pdf
exports/cph_geodk/pdf/2b.pdf
exports/cph_geodk/pdf/3a.html
exports/cph_geodk/pdf/preamble.pdf
exports/cph_geodk/pdf/titlepage.pdf
exports/cph_geodk/pdf/3a.pdf
exports/cph_geodk/pdf/3b.pdf
exports/cph_geodk/pdf/appendix_a.pdf
exports/cph_geodk/pdf/report_lowres.pdf


661 changes: 661 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

240 changes: 239 additions & 1 deletion README.md

Large diffs are not rendered by default.

173 changes: 173 additions & 0 deletions config.yml
@@ -0,0 +1,173 @@
# Provide the name of the study area and the name of the reference data, if available, as a human-readable string (max 22 characters).
# This will be used for:
# - plot labelling
# - result labelling
# - exported reports

area_name: "Copenhagen"
reference_name: "GeoDanmark"

# Provide the name of the study area/project as a slug.
# (Use https://you.tools/slugify/ if unsure)
# This will be used for:
# - folder and subfolder structure setup
study_area: "cph_geodk"

# Provide the CRS which will be used throughout the analysis.
# This must be a projected CRS with meters as unit length.
study_crs: 'EPSG:25832' # The CRS you want to use for the analysis.

# Choose whether plots should be saved in low or high resolution.
# If 'low', plots are saved as png. If 'high', plots are saved as svg (this will lead to significantly larger files)
plot_resolution: 'low'

# Queries used to retrieve the network edges with dedicated bicycle infrastructure from OSM street network data. Update as needed.
bicycle_infrastructure_queries:
  A:
    "highway == 'cycleway'"
  B:
    "cycleway in ['lane','track','opposite_lane','opposite_track','designated','crossing']" #'shared_lane',
  C:
    "cycleway_left in ['lane','track','opposite_lane','opposite_track','designated','crossing']" # should shared_busway be included? 'shared_lane'
  D:
    "cycleway_right in ['lane','track','opposite_lane','opposite_track','designated','crossing']" # 'shared_lane'
  E:
    "cycleway_both in ['lane','track','opposite_lane','opposite_track','designated','crossing']" # 'shared_lane'

osm_bicycle_infrastructure_type:
  'protected':
    - "highway == 'cycleway'"
    - "cycleway in ['track','opposite_track']"
    - "cycleway_left in ['track','opposite_track']"
    - "cycleway_right in ['track','opposite_track']"
    - "cycleway_both in ['track','opposite_track']"

  'unprotected':
    - "cycleway in ['lane','opposite_lane','crossing']" # 'shared_lane'
    - "cycleway_left in ['lane','opposite_lane','crossing']" # 'shared_lane'
    - "cycleway_right in ['lane','opposite_lane','crossing']" # 'shared_lane'
    - "cycleway_both in ['lane','opposite_lane','crossing']" # 'shared_lane'

  'unknown':
    - "cycleway in ['designated']"
    - "cycleway_left in ['designated']"
    - "cycleway_right in ['designated']"
    - "cycleway_both in ['designated']"


# Define tags to be downloaded from OSM here. Note that any non-standard tags used in the custom filter must be included here.
osm_way_tags:
- "access"
- "barrier"
- "bridge"
- "bicycle"
- "bicycle_road"
- "crossing"
- "cycleway"
- "cycleway:left"
- "cycleway:right"
- "cycleway:both"
- "cycleway:buffer"
- "cycleway:left:buffer"
- "cycleway:right:buffer"
- "cycleway:both:buffer"
- "cycleway:width"
- "cycleway:left:width"
- "cycleway:right:width"
- "cycleway:both:width"
- "cycleway:surface"
- "foot"
- "footway"
- "highway"
- "incline"
- "junction"
- "layer"
- "lit"
- "maxspeed"
- "maxspeed:advisory"
- "moped"
- "motor_vehicle"
- "motorcar"
- "name"
- "oneway"
- "oneway:bicycle"
- "osm_id"
- "segregated"
- "surface"
- "tracktype"
- "tunnel"
- "width"


# Define tags to be analysed when evaluating the number of existing/missing tags.
# Must be in the form of a nested dictionary.
# The first keys indicate the overall attribute analysed. The sub-keys indicate which columns to look at depending on whether the OSM feature is mapped as a feature on a centerline or as an individual geometry.
# For example, if highway = 'cycleway' the feature is mapped as an individual geometry. In this case, the tag 'width', if filled out, describes the width of the cycleway.
# On the other hand, if highway = 'primary' and cycleway = 'track', the bicycle infrastructure is mapped as an attribute to a road centerline and the 'width' tag refers to the main road. In this instance only the 'cycleway_width' columns are of interest.
existing_tag_analysis:
  surface:
    true_geometries:
      - surface
      - cycleway_surface
    centerline:
      - cycleway_surface
  width:
    true_geometries:
      - width
      - cycleway_width
      - cycleway_left_width
      - cycleway_right_width
      - cycleway_both_width
    centerline:
      - cycleway_width
      - cycleway_left_width
      - cycleway_right_width
      - cycleway_both_width
  speedlimit:
    all:
      - maxspeed
  lit:
    all:
      - lit

# Define tags that are considered incompatible and a sign of errors in the OSM data.
# For example, if an element has been defined as bicycle_infrastructure = 'yes' earlier in the analysis, it should not have bicycle = 'no' or bicycle = 'dismount' or car = 'yes'
incompatible_tags_analysis:
  bicycle_infrastructure:
    'yes':
      - ['bicycle','no']
      - ['bicycle','dismount']
      - ['car','yes']


# Define the desired grid cell width in meters for the grid used for local summaries of analysis results.
# When evaluating the quality of road network data, using cell sizes of 1 km is usually the default (see e.g. Koukoletsos et al, 2011; Haklay, 2010; Neis et al., 2011)
# Smaller cell sizes give a better granularity, but will make some elements of the analysis slower to compute
grid_cell_size: 300

# Specify whether the bicycle infrastructure in the reference data has been mapped as centerlines or true geometries.
# Describes whether bicycle infrastructure is digitised as one line per road segment (regardless of whether there is bicycle infrastructure along both sides)
# or if there are two distinct geometries mapped in situations with a bike path/track on both sides of a street.
# Can be a value describing the whole dataset or the name of the column describing the situation for each row.
# Valid values are: 'centerline' or 'true_geometries' or a string with the name of the column with either True or False for each geometry.
reference_geometries: true_geometries # Alternative: centerline.

# Specify whether the infrastructure geometries are designed for travelling in both directions (i.e. bidirectional) or only one way.
# This information is used to assess the true extent of the network, so that e.g. the length of a broad bidirectional bike lane is counted twice, since it represents the same infrastructure as two narrow lanes on each side of the road.
# If geometries are mapped as centerlines but represent infrastructure on both sides of the street (see 'reference_geometries'), this setting should also be set to True.
# Can be a value describing the whole dataset or the name of the column describing the situation for each row.
# Valid values are: True or False or a string with the name of the column with either True or False for each geometry.
bidirectional: False

# Specify a dictionary used for classifying segments of bicycle infrastructure as protected or unprotected.
# For protected, unprotected or mixed (protected on one side, unprotected on the other side) specify the query defining the type
ref_bicycle_infrastructure_type:
  protected:
    - "vejklasse == 'Cykelsti langs vej'"
  unprotected:
    - "vejklasse == 'Cykelbane langs vej'"
  # mixed: # Only provide mixed query if relevant
  #   - ""

# Specify the column name (string) of the column in the reference data with the unique ID for each row/feature.
reference_id_col: 'fot_id'
Binary file added data/REFERENCE/cph_geodk/raw/reference_data.gpkg
Binary file not shown.
93 changes: 93 additions & 0 deletions datasetrequirements.md
@@ -0,0 +1,93 @@
# Data set requirements for BikeDNA

## Study area input requirements

- The study area must be defined by a **polygon** in `gpkg` format. **Note**: If a different file name or file extension is used, the file paths in notebooks 1a and 2a must be updated. The file must be in a format readable by [GeoPandas](https://geopandas.org/en/stable/docs/user_guide/io.html) (e.g., GeoPackage, GeoJSON, Shapefile etc.).
- The polygon must be placed in the folder structure as follows: `/data/study_area_polygon/'my_study_area'/study_area_polygon.gpkg`
- The polygon must be in a projected CRS with meters as unit length

## OSM settings

### Custom filter

The queries in `config.yml` provide one way of getting the designated bicycle infrastructure from OSM data. What is considered bicycle infrastructure - and how it is tagged in OSM - is however highly contextual. If you want to use your own filter for retrieving bicycle infrastructure, set `use_custom_filter` to *True* and provide the custom filter under the `custom_filter` variable. For an example of how it should be formatted, see the provided filter `bicycle_infrastructure_queries`.

*Please note that all ':' in OSM column names are replaced with '_' in the preprocessing of the data to enable using pandas.query without errors.*
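A minimal sketch of why the renaming matters, assuming the OSM edges are held in a pandas DataFrame. The sample rows are made up, and the renaming line only mirrors the behaviour described above; it is not taken from BikeDNA's own preprocessing code.

```python
import pandas as pd

# Hypothetical sample of OSM edges with raw OSM-style column names.
edges = pd.DataFrame(
    {
        "highway": ["cycleway", "primary", "residential"],
        "cycleway:left": [None, "track", None],
    }
)

# Replace ':' with '_' so column names can be used as identifiers in query strings.
edges.columns = [col.replace(":", "_") for col in edges.columns]

# A query in the style used in config.yml now works on the renamed column.
matches = edges.query("cycleway_left in ['lane', 'track']")
print(matches.index.tolist())
```

Any custom filter should therefore refer to `cycleway_left`, not `cycleway:left`.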

### OSM infrastructure type

Similarly, `config.yml` contains a dictionary with queries used to classify all OSM edges as either protected, unprotected, or mixed (if there is protected infrastructure on one side and unprotected on the other side). Update if needed, but note that it should correspond to the queries used to define the bicycle infrastructure - i.e., all edges must be classified as either protected, unprotected, or mixed.

### Missing tag analysis

In the intrinsic analysis, one element is to analyze how many edges have values for attributes commonly considered important for evaluating bike friendliness. If you want to change which tags are analyzed, modify the dictionary `existing_tag_analysis` in `config.yml`. Please note that the relevant tags might depend on the geometry type (i.e. center line or true geometry, see below).
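The idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not BikeDNA's implementation: the sample edges are invented, the helper names are hypothetical, and the rule "`highway == 'cycleway'` means true geometry" is taken from the comments in `config.yml`.

```python
# Hypothetical edges: each dict is one OSM edge after ':' has been replaced with '_'.
edges = [
    {"highway": "cycleway", "surface": "asphalt", "cycleway_surface": None},
    {"highway": "primary", "surface": "asphalt", "cycleway_surface": None},
    {"highway": "primary", "surface": None, "cycleway_surface": "paved"},
]

# Columns to inspect per geometry type, following the 'surface' entry in config.yml.
surface_columns = {
    "true_geometries": ["surface", "cycleway_surface"],
    "centerline": ["cycleway_surface"],
}


def geometry_type(edge):
    # Assumption: highway == 'cycleway' means the edge is mapped as its own geometry.
    return "true_geometries" if edge["highway"] == "cycleway" else "centerline"


def has_tag(edge, columns_by_type):
    """Return True if any relevant column holds a value for this edge."""
    columns = columns_by_type[geometry_type(edge)]
    return any(edge.get(col) is not None for col in columns)


tagged = sum(has_tag(edge, surface_columns) for edge in edges)
print(f"{tagged} of {len(edges)} edges have a surface value")
```

Note that the second edge is not counted: its `surface` value describes the main road, not the cycle track mapped on the centerline.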

### Incompatible tags analysis

OSM has guidelines, but no restrictions on how tags can be combined. This sometimes results in contradictory information, for example when a path is both tagged as *'highway=cycleway'* and *'bicycle=dismount'*. The default configuration includes a dictionary with a few examples of tag combinations that we consider incompatible, but more entries can be added.

The dictionary is nested. The first key is the name of a column, e.g., *'bicycle_infrastructure'*. Its value is a sub-dictionary keyed by a value of that column (e.g., *'yes'*). Each of those keys maps to a list of column-value pairs that are considered incompatible with it.
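As a rough illustration of how such a nested dictionary could be applied, the sketch below checks a few invented edges against the default entries from `config.yml`. The function `find_conflicts` and the sample data are hypothetical and only meant to show the structure.

```python
# The default entries from config.yml, written as a Python dictionary.
incompatible_tags = {
    "bicycle_infrastructure": {
        "yes": [["bicycle", "no"], ["bicycle", "dismount"], ["car", "yes"]],
    }
}

# Hypothetical edges, one dict per edge.
edges = [
    {"id": 1, "bicycle_infrastructure": "yes", "bicycle": "dismount"},
    {"id": 2, "bicycle_infrastructure": "yes", "bicycle": "yes"},
    {"id": 3, "bicycle_infrastructure": "no", "bicycle": "no"},
]


def find_conflicts(edge, rules):
    """Return the (column, value) pairs on this edge that contradict a rule."""
    conflicts = []
    for column, value_rules in rules.items():
        for value, pairs in value_rules.items():
            if edge.get(column) != value:
                continue  # the rule only applies when the edge has this value
            for other_column, other_value in pairs:
                if edge.get(other_column) == other_value:
                    conflicts.append((other_column, other_value))
    return conflicts


flagged = {}
for edge in edges:
    conflicts = find_conflicts(edge, incompatible_tags)
    if conflicts:
        flagged[edge["id"]] = conflicts
print(flagged)
```

Edge 3 is not flagged because the rule only applies to edges where `bicycle_infrastructure` is `'yes'`.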

## Reference data input requirements

If the extrinsic analysis is to be performed:

- The reference dataset must be a GeoPackage called `reference_data.gpkg`. If a different file name or file extension is used, the file path in notebook 2a must be updated. The file must be in a format readable by [GeoPandas](https://geopandas.org/en/stable/docs/user_guide/io.html) (e.g., GeoPackage, GeoJSON, Shapefile etc.).
- The reference dataset must be placed in the folder structure as follows: `/data/reference/'my_study_area'/raw/reference_data.gpkg`

For the code and the analysis to run without errors, the data must:

- only contain **bicycle infrastructure** (i.e. not also the regular street network)
- have all geometries as **LineStrings** (not MultiLineStrings)
- have **all intersections** represented as LineString endpoints
- be in a **CRS** recognized by GeoPandas
- contain a column describing the **type of bicycle infrastructure**, i.e. whether each feature is a physically **protected**/separated infrastructure or if it is **unprotected** (*feature* refers to a network edge - each row in the network edge GeoDataFrames thus represents one feature)
- contain a column describing whether each feature is **bidirectional** or not (see below for details)
- contain a column describing how features have been digitized (**'geometry type'**) (see below for details)
- contain a column with a unique **ID** for each feature

For an example of how a municipal dataset with bicycle infrastructure can be converted to the above format, see the notebooks [reference_data_preparation_01](scripts/examples/reference_data_preparation_01.ipynb) and [reference_data_preparation_02](scripts/examples/reference_data_preparation_02.ipynb) for workflows for preprocessing two different public Danish datasets on bicycle infrastructure.

### Reference Geometries

In the *config.yml*, the setting `reference_geometries` refers to how the bicycle infrastructure has been digitized. The analysis operates with two different scenarios: either the bicycle infrastructure has been mapped as an attribute to the center line of the road (this is often done when the bicycle infrastructure runs along or is part of a street with car traffic) *or* it has been digitized as its own geometry.
In the first scenario you will only have one line, even in situations with a cycle track on each side of the street, while a cycle track on each side will result in two lines in the second scenario.

If a dataset only includes one type of mapping bicycle infrastructure, you can simply set `reference_geometries` to either *'centerline'* or *'true_geometries'*.

If the data, like OSM, includes a variation of both, the data must contain a column named *'reference_geometries'* with values being either *'centerline'* or *'true_geometries'*, specifying the digitization method for each feature.

The illustration below shows a situation where the same bicycle infrastructure has been mapped in two different ways. The blue line is a center line mapping, while the red lines are from a dataset that digitizes all bicycle infrastructure as individual geometries.

<p align="center"><img src='images/geometry_types_illustration.png' width=500/></p>

### Cycling directions

Due to the different ways of mapping geometries described above, datasets of the same area will have vastly different lengths if you do not consider that the blue line on the illustration above is bidirectional, while the red lines are not. To enable more accurate comparisons of length differences, the data must contain a column *'bidirectional'* with values of either True or False, indicating whether each feature allows cycling in both directions.
Alternatively, if all features in the reference dataset have the same value, you can simply set `bidirectional` to either *True* or *False* in the `config.yml`.

<p align="center"><img src='images/bidirectional_illustration.png' width=500/></p>
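A minimal sketch of the length adjustment, assuming that bidirectional edges are counted twice as described in the comments in `config.yml`. The function name and the `length_m` key are hypothetical. The `bidirectional` argument accepts either a single boolean or the name of a per-edge column, mirroring the two configuration options.

```python
def infrastructure_length(edges, bidirectional):
    """Sum edge lengths, counting bidirectional edges twice.

    `bidirectional` is either a bool applied to every edge or the name of a
    per-edge key holding True/False.
    """
    total = 0.0
    for edge in edges:
        if isinstance(bidirectional, str):
            both_ways = edge[bidirectional]  # per-edge value from a column
        else:
            both_ways = bidirectional  # one value for the whole dataset
        total += edge["length_m"] * (2 if both_ways else 1)
    return total


# One centerline representing tracks on both sides of a 100 m street ...
centerline = [{"length_m": 100.0, "bidirectional": True}]
# ... versus the same street digitized as two separate one-way geometries.
true_geometries = [{"length_m": 100.0}, {"length_m": 100.0}]

print(infrastructure_length(centerline, "bidirectional"))
print(infrastructure_length(true_geometries, False))
```

Both calls return the same total, which is what makes the two mapping styles comparable.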

### Bicycle infrastructure type

The 'bicycle infrastructure' type simply refers to whether infrastructure is protected (i.e. physically separated from car traffic) or unprotected (e.g. a bike path only marked with paint).

The setting requires a dictionary, `ref_bicycle_infrastructure_type`, with two entries: `protected` and `unprotected`. Each entry must provide a list of queries that return the protected or unprotected infrastructure, respectively.

For example, the query `"vejklasse == 'Cykelsti langs vej'"` returns all the protected bicycle infrastructure in the test data from GeoDanmark available in the repository.
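The sketch below shows one way such queries could be applied with `pandas.DataFrame.query`. It is an illustration only: the sample rows and the output column name `infrastructure_type` are assumptions, while the queries and the `fot_id` and `vejklasse` column names come from `config.yml`.

```python
import pandas as pd

# Queries as provided in config.yml.
ref_bicycle_infrastructure_type = {
    "protected": ["vejklasse == 'Cykelsti langs vej'"],
    "unprotected": ["vejklasse == 'Cykelbane langs vej'"],
}

# Hypothetical reference features; real data would be read from the GeoPackage.
reference = pd.DataFrame(
    {
        "fot_id": [1, 2, 3],
        "vejklasse": [
            "Cykelsti langs vej",
            "Cykelbane langs vej",
            "Cykelsti langs vej",
        ],
    }
)

# 'infrastructure_type' is an assumed column name for the classification result.
reference["infrastructure_type"] = None
for label, queries in ref_bicycle_infrastructure_type.items():
    for query in queries:
        matched = reference.query(query).index
        reference.loc[matched, "infrastructure_type"] = label

print(reference["infrastructure_type"].tolist())
```

Rows that match none of the queries keep the value `None`, which makes unclassified features easy to spot.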

<p align="center">

<img src='images/track_illustration.jpeg' width=250/>

*Protected cycle track. Attribution: [wiki.openstreetmap](https://wiki.openstreetmap.org/wiki/File:Sciezki_wroclaw_wyspianskiego_1.jpg)*

</p>

<p align="center">

<img src='images/cycle_lane_illustration.jpeg' width=380/>

*Unprotected cycle lane. Attribution: [wiki.openstreetmap](https://wiki.openstreetmap.org/wiki/File:Fietsstrook_Herenweg_Oudorp.jpg)*

</p>