
Configuration options

Bart Noordervliet edited this page Jan 24, 2022 · 23 revisions

This page documents all the configuration items available in the xml-to-postgres YAML format. You will typically need a separate config for each type of XML document that you convert.

Path specifications

A path is a slash-separated series of nested tags that you traverse in the XML document to reach the required node, much like the most basic form of XPath. So in a document like this:

<A>
  <B>
    <C>id1</C>
  </B>
  <B>
    <C>id2</C>
  </B>
</A>

Use path /A/B for the row entries and then path /C for the id column; the column path is relative to the row element.
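
As a minimal sketch, a config for the document above could look like the following. The table name `example` is a placeholder chosen for illustration; the keys are the ones documented under Options below.

```yaml
# Minimal config sketch for the example document above.
# The table name "example" is a placeholder, not something the tool requires.
name: example
path: /A/B        # repeating element; each <B> becomes one row
cols:
  - name: id
    path: /C      # text of <C> within each <B>
```

Because no file is set, the rows would be sent to stdout, and the two <B> elements should yield one row each for id1 and id2.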

Options

  • name <string> [required] Sets the name of the main SQL table for this dataset
  • path <string> [required] Sets the path to the repeating element that contains the row entries
  • file <string> Sets the output filename for the main table; if not present the data will be sent to stdout
  • skip <string> Defines a sub-path to skip entirely (purely for performance reasons)
  • cols <array of objects> [required] Defines the columns for the main table
    • name <string> [required] Sets the name of this column
    • path <string> [required] Sets the path to the data for this column
    • attr <string> Causes the data for this column to be taken from the named attribute rather than the text node of the element
    • find <string> Sets a text string to search for in the data for this column; matches are replaced with the value of repl
    • repl <string> Sets the replacement string to be used with find
    • conv <string> Enables a conversion function to be run on the data for this column; currently available functions are:
      • xmltotext Converts all child nodes to a single text column, including XML tags and attributes
      • gmltoewkb Interprets the data as GML and converts it into EWKB for fast importing into PostGIS (WARNING)
    • mult <boolean> Forces the use of multitype (e.g. MultiPolygon) with EWKB output (only relevant with conv: gmltoewkb)
    • bbox <string> Limits the output to features within the requested bounding box (format: "minx,miny maxx,maxy", only relevant with conv: gmltoewkb)
    • cons <string> Enables a consolidation function to deal with repeating elements; currently available functions are:
      • first Takes the first occurrence (in XML document order) and ignores any others
      • append Concatenates all occurrences into a single string separated by commas
    • incl <regex string> Filters the result set by a regex match on this column; a non-match will cause the entire row to be skipped
    • excl <regex string> Filters the result set (after incl above has been applied) by a regex match on this column; a match will cause the entire row to be skipped
    • hide <boolean> Omits the column from the output; enable this to use a column with incl/excl but not include it in the output
    • cols <array of objects> Defines this path as the start of a 'subtable', a nested set of columns that will be saved into a separate table with a one-to-many relationship to the parent table. The value of the first column of the parent table will be saved as the first column of the subtable, to be used as a foreign key. This set of columns can use any option documented above. The subtable column itself is required to have a file field, because the subtable output cannot be mixed with the main table.
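
To show how these options combine, here is a hedged sketch of a config with a filter column, a consolidated column and a subtable. The element names (/D, /E, /F, /G), the attribute name `code`, and the table, column and file names are all hypothetical. It also assumes that column paths inside a subtable are relative to the subtable path, mirroring how main-table columns relate to the row path.

```yaml
# Sketch only: element, attribute, table, column and file names are hypothetical.
name: items
path: /A/B
file: items.txt           # omit to send the main table to stdout
cols:
  - name: id
    path: /C
  - name: status
    path: /D
    attr: code            # read the "code" attribute instead of the text node
    incl: "^active$"      # rows where this does not match are skipped
    hide: true            # used for filtering only, omitted from the output
  - name: labels
    path: /E
    cons: append          # repeated <E> elements are joined with commas
  - name: parts           # subtable: one-to-many with the main table
    path: /F
    file: item_parts.txt  # required on a subtable column
    cols:
      - name: part_name
        path: /G          # assumed to be relative to /F
```

Following the description above, the value of the first parent column (id) would be written as the first column of item_parts.txt, where it can serve as a foreign key.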