Configuration options

Bart Noordervliet edited this page Feb 18, 2022 · 23 revisions

This page documents all the configuration items available in the xml-to-postgres YAML format. You will typically need a separate config for each type of XML document you convert.

Path specifications

A path is a slash-separated series of nested tags that must be traversed in the XML document to reach the required node. It works like the most basic form of XPath. For example, given a document like this:

<A>
  <B>
    <C>id1</C>
  </B>
  <B>
    <C>id2</C>
  </B>
</A>

Use path /A/B for the row entries and then path /C for the id column. As the example shows, column paths are given relative to the row element.

No other XPath features are currently available, but they may be added if a good use case turns up.
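
As a minimal sketch, a config for the sample document above could look like the following. The table name "items" is a hypothetical placeholder; the paths are taken from the example, and the output goes to stdout because no file is set.

```yaml
# Minimal sketch for the sample document above.
# The table name "items" is a hypothetical placeholder.
name: items
path: /A/B        # repeating element: one row per <B>
cols:
- name: id
  path: /C        # text of <C> inside each <B>
```

With the two B elements in the sample, this should yield two rows, with ids id1 and id2.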

Options

  • name <string> [required] Sets the name of the main SQL table for this dataset
  • path <string> [required] Sets the path to the repeating element that contains the row entries
  • file <string> Sets the output filename for the main table; if not present the data will be sent to stdout
  • emit <string> A comma-separated string of additional SQL statements to be included in the output
    Currently available are:
    • copy_from Adds a "COPY <table> FROM stdin" statement
      This allows the data to be preceded/followed by other SQL statements
      As such all other emit options imply this option
    • create_table Adds a "CREATE TABLE" statement that defines a table based on the columns specified below
      The datatype of each column will be 'text' unless it has a specific type set
    • drop_table Adds a "DROP TABLE" statement
    • truncate Adds a "TRUNCATE" statement
    • start_trans Wraps all statements in an explicit transaction
  • skip <string> Defines a sub-path to skip entirely (purely for performance reasons)
  • cols <array of objects> [required] Defines the columns for the main table
    • name <string> [required] Sets the name of this column
    • path <string> [required] Sets the path to the data for this column
    • attr <string> Causes the data for this column to be taken from the named attribute rather than the text node of the element
    • find <string> Sets a text string to find in the data for this column, to be replaced with the repl string below
    • repl <string> Sets the replacement string to be used with find
    • trim <boolean> Collapses any linebreaks and surrounding whitespace in text nodes into a single space character
    • conv <string> Enables a conversion function to be run on the data for this column
      Currently available functions are:
      • xmltotext Converts all child nodes to a single text column, including XML tags and attributes
      • gmltoewkb Interprets the data as GML and converts it into EWKB for fast importing into PostGIS (WARNING)
    • mult <boolean> Forces the use of multitype (e.g. MultiPolygon) with EWKB output (only relevant with conv: gmltoewkb)
    • bbox <string> Limits the output to features within the requested bounding box (format: "minx,miny maxx,maxy", only relevant with conv: gmltoewkb)
    • cons <string> Enables a consolidation function to deal with repeating elements
      Currently available functions are:
      • first Takes the first occurrence (in XML document order) and ignores any others
      • append Concatenates all occurrences into a single string separated by commas
    • incl <regex string> Filters the result set by a regex match on this column; a non-match will cause the entire row to be skipped
    • excl <regex string> Filters the result set (after incl above) by a regex match on this column; a match will cause the entire row to be skipped
    • hide <boolean> Omits the column from the output
      Enable this to use a column with incl/excl but not include it in the output
    • cols <array of objects> Defines this path as the start of a 'subtable', a nested set of columns that will be saved into a separate table with a one-to-many relationship to the parent table
      The value of the first column of the parent table will be saved as the first column of the subtable, to be used as a foreign key
      This set of columns can use any option documented above
      The subtable column itself is required to have a file field, because the subtable output cannot be mixed with the main table
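
To show how several of the options above combine, here is an illustrative sketch of a fuller config. The XML layout it assumes (a /library/book row element with isbn, title, price, keyword, status and author children) and all table, column and file names are hypothetical; only the option keys come from the list above. Writing the emit value without spaces after the comma is an assumption about the accepted format, as is the subtable's column paths being relative to the subtable path.

```yaml
# Illustrative sketch only: the XML layout and all names are hypothetical.
name: book
path: /library/book
file: book.sql
emit: create_table,start_trans   # both imply copy_from
cols:
- name: isbn                     # first column: reused as the foreign key in subtables
  path: /isbn
- name: language
  path: /title
  attr: lang                     # read the lang attribute instead of the text node
- name: price
  path: /price
  find: ","                      # find a decimal comma...
  repl: "."                      # ...and replace it with a decimal point
- name: keywords
  path: /keyword
  cons: append                   # join repeated <keyword> elements with commas
- name: status
  path: /status
  incl: '^published$'            # rows whose status does not match are skipped
  hide: true                     # used for filtering only, omitted from the output
- name: author
  path: /author
  file: book_author.sql          # a subtable column must set its own file
  cols:
  - name: name
    path: /name                  # assumed to be relative to the subtable path
```

With this sketch, the main table output would go to book.sql and the author rows to book_author.sql, with each author row starting with the isbn value of its parent book.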