Configuration options

Bart Noordervliet edited this page Jan 24, 2022 · 23 revisions

This page documents all the configuration items available in the xml-to-postgres YAML format. It is expected that you'll have a separate config for each type of XML document that you'll be converting.

  • name [required] Sets the name of the main SQL table for this dataset
  • path [required] Sets the path to the repeating tag that contains the row entries
  • file Sets the output filename for the main table; if not present the data will be sent to stdout
  • skip Defines a sub-path to skip entirely (purely for performance reasons)
  • cols [required] Defines the columns for the main table
    • name [required] Sets the name of this column
    • path [required] Sets the path to the data for this column
    • attr Causes the data for this column to be taken from the named attribute rather than the text node of the element
    • find Sets a text string to find in the data for this column; each match is replaced with the value of repl
    • repl Sets the replacement string to be used with find
    • conv Enables a conversion function to be run on the data for this column; currently available functions are:
      • xmltotext Converts all child nodes to a single text column, including XML tags and attributes
      • gmltoewkb Interprets the data as GML and converts it into EWKB for fast importing into PostGIS (WARNING: experimental)
    • mult Forces the use of multitype (e.g. MultiPolygon) with EWKB output (only relevant with conv: gmltoewkb)
    • bbox Limits the output to features within the requested bounding box (format: "minx,miny maxx,maxy", only relevant with conv: gmltoewkb)
    • cons Enables a consolidation function to deal with repeating elements; currently available functions are:
      • first Takes the first occurrence (in XML document order) and ignores any others
      • append Concatenates all occurrences into a single string separated by commas
    • incl Filters the result set by a regex match on this column; a non-match will cause the entire row to be skipped
    • excl Filters the result set (after incl above) by a regex match on this column; a match will cause the entire row to be skipped
    • hide Omits the column from the output; enable this to use a column with incl/excl but not include it in the output
    • cols Defines this path as the start of a 'subtable', a nested set of columns that will be saved into a separate table with a one-to-many relationship to the parent table. The value of the first column of the parent table will be saved as the first column of the subtable, to be used as a foreign key. This set of columns can use any option documented above. The subtable column itself is required to have a file field, because the subtable output cannot be mixed with the main table.
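
To show how the options above fit together, here is a minimal sketch of a config file. It is illustrative only: the XML layout, the table, column and file names are hypothetical, and the path syntax (a leading slash, with column paths written relative to the row path and subtable column paths relative to the subtable path) and the boolean form of hide are assumptions rather than something this page confirms.

```yaml
# Hypothetical input document (not taken from the xml-to-postgres project):
#   <library>
#     <book>
#       <isbn>...</isbn>
#       <title>...</title>
#       <price currency="EUR">...</price>
#       <status>...</status>
#       <author>...</author>                                     (may repeat)
#       <review><rating>...</rating><text>...</text></review>    (may repeat)
#     </book>
#   </library>

name: book                 # main SQL table
path: /library/book        # repeating tag; one row per <book> (path syntax assumed)
file: book.dump            # omit to send the main table to stdout
cols:
  - name: isbn             # first column; its value is reused as the subtable foreign key
    path: /isbn
  - name: title
    path: /title
    find: "_"              # replace every underscore...
    repl: " "              # ...with a space
  - name: currency
    path: /price
    attr: currency         # read the attribute instead of the element's text node
  - name: authors
    path: /author
    cons: append           # join repeated <author> elements with commas
  - name: status
    path: /status
    excl: "^withdrawn$"    # skip the whole row when the status matches
    hide: true             # filter on this column but keep it out of the output (boolean form assumed)
  - name: review           # subtable: one-to-many with the book table
    path: /review
    file: book_review.dump # required for a subtable column
    cols:
      - name: rating
        path: /rating
      - name: text
        path: /text
```

With a config like this, the main table would receive the columns isbn, title, currency and authors, while the subtable output would start with the parent's isbn value followed by rating and text.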