|
| 1 | +* [Programmability of Omniparser](#programmability-of-omniparser) |
| 2 | + * [Out\-of\-Box Basic Use Case](#out-of-box-basic-use-case) |
| 3 | + * [Add A New custom\_func](#add-a-new-custom_func) |
| 4 | + * [Add A New custom\_parse](#add-a-new-custom_parse) |
| 5 | + * [Add A New File Format](#add-a-new-file-format) |
| 6 | + * [Add A New Schema Handler](#add-a-new-schema-handler) |
| 7 | + * [Put All Together](#put-all-together) |
| 8 | + * [In Non\-Golang Environment](#in-non-golang-environment) |
| 9 | +* [Programmability of Some Components without Omniparser](#programmability-of-some-components-without-omniparser) |
| 10 | + * [Functions](#functions) |
| 11 | + * [IDR](#idr) |
| 12 | + * [CSV Reader](#csv-reader) |
| 13 | + * [Fixed\-Length Reader](#fixed-length-reader) |
| 14 | + * [EDI Reader](#edi-reader) |
| 15 | + * [JSON Reader](#json-reader) |
| 16 | + * [XML Reader](#xml-reader) |
| 17 | + |
| 18 | +# Programmability of Omniparser |
| 19 | + |
| 20 | +There are many ways to use omniparser in your code/service/app programmatically. |
| 21 | + |
| 22 | +## Out-of-Box Basic Use Case |
| 23 | + |
| 24 | +This is covered in [Getting Started](./gettingstarted.md#using-omniparser-programmatically) and is copied |
| 25 | +here for completeness. |
| 26 | +``` |
| 27 | +schema, err := omniparser.NewSchema("your schema name", strings.NewReader("your schema content")) |
| 28 | +if err != nil { ... } |
| 29 | +transform, err := schema.NewTransform("your input name", strings.NewReader("your input content"), &transformctx.Ctx{}) |
| 30 | +if err != nil { ... } |
| 31 | +for { |
| 32 | + output, err := transform.Read() |
| 33 | + if err == io.EOF { |
| 34 | + break |
| 35 | + } |
| 36 | + if err != nil { ... } |
| 37 | + // output contains a []byte of the ingested and transformed record. |
| 38 | +} |
| 39 | +``` |
| 40 | +Note this out-of-box omniparser setup contains only the `omni.2.1` schema handler, meaning only schemas |
| 41 | +whose `parser_settings.version` is `omni.2.1` are supported. The `omni.2.1` schema handler's supported file |
| 42 | +formats include: delimited (CSV, TSV, etc.), EDI, XML, JSON, fixed-length. The `omni.2.1` schema handler's |
| 43 | +supported built-in `custom_func`s are listed [here](./customfuncs.md). |
| 44 | + |
| 45 | +## Add A New `custom_func` |
| 46 | + |
| 47 | +If the built-in `custom_func`s aren't enough, you can add your own custom functions by |
| 48 | +[doing this](../extensions/omniv21/samples/customfileformats/jsonlog/sample_test.go) (note the linked |
| 49 | +sample does more than just adding a new `custom_func`): |
| 50 | +``` |
| 51 | +schema, err := omniparser.NewSchema( |
| 52 | + "your schema name", |
| 53 | + strings.NewReader("your schema content"), |
| 54 | + omniparser.Extension{ |
| 55 | + CreateSchemaHandler: omniv21.CreateSchemaHandler, |
| 56 | + CustomFuncs: customfuncs.Merge( |
| 57 | + customfuncs.CommonCustomFuncs, // global custom_funcs |
| 58 | + v21.OmniV21CustomFuncs, // omni.2.1 custom_funcs |
| 59 | + customfuncs.CustomFuncs{ |
| 60 | + "normalize_severity": normalizeSeverity, // <====== your own custom_funcs |
| 61 | + })}) |
| 62 | +if err != nil { ... } |
| 63 | +transform, err := schema.NewTransform("your input name", strings.NewReader("your input content"), &transformctx.Ctx{}) |
| 64 | +if err != nil { ... } |
| 65 | +for { |
| 66 | + output, err := transform.Read() |
| 67 | + if err == io.EOF { |
| 68 | + break |
| 69 | + } |
| 70 | + if err != nil { ... } |
| 71 | + // output contains a []byte of the ingested and transformed record. |
| 72 | +} |
| 73 | +``` |
| 74 | + |
| 75 | +Each `custom_func` must be a Golang function with the first param being `*transformctx.Ctx`. The remaining |
| 76 | +params can be of any type, as long as they match the types of data that are fed into the function |
| 77 | +in `transform_declarations`. |
| 78 | + |
| 79 | +## Add A New `custom_parse` |
| 80 | + |
| 81 | +There are several ways to customize transform logic, one of which is using the almighty `custom_func` |
| 82 | +`javascript` (or its sibling `javascript_with_context`); see details |
| 83 | +[here](./use_of_custom_funcs.md#javascript-and-javascript_with_context). |
| 84 | + |
| 85 | +However, we don't currently support multi-line javascript, which makes complex transform |
| 86 | +logic written as single-line javascript difficult to read and debug. Also, there are situations where schema |
| 87 | +writers want the following: |
| 88 | +- native Golang code transform logic |
| 89 | +- logging/stats |
| 90 | +- better/thorough test coverage |
| 91 | +- more complex operations such as RPC calls, encryption, etc., which javascript isn't really suited for or |
| 92 | +capable of handling. |
| 93 | + |
| 94 | +`custom_parse` provides an in-code transform plugin mechanism. In addition to a number of built-in |
| 95 | +transforms, such as `field`, `const`, `external`, `object`, `template`, `array`, and `custom_func`, |
| 96 | +`custom_parse` allows a schema writer to provide a Golang function that takes in the |
| 97 | +`*idr.Node` at the current IDR cursor (see more about IDR cursoring |
| 98 | +[here](./xpath.md#data-context-and-anchoring)), does whatever processing and transforms it sees |
| 99 | +fit, and returns the desired result to be embedded in place of the `custom_parse`. |
| 100 | + |
| 101 | +[This sample](../extensions/omniv21/samples/customparse/sample_test.go) gives a very detailed demo |
| 102 | +of how `custom_parse` works. |
| 103 | + |
| 104 | +## Add A New File Format |
| 105 | + |
| 106 | +While the built-in `omni.2.1` schema handler already supports the most popular file formats in a typical |
| 107 | +ETL pipeline, new file formats can be added to the schema handler, so it can ingest new formats |
| 108 | +while using the same extensible/capable transform (`transform_declarations`) logic. |
| 109 | + |
| 110 | +On a high level, a [`FileFormat`](../extensions/omniv21/fileformat/fileformat.go) is a component |
| 111 | +that knows how to ingest a data record, in streaming fashion, from a certain file format, and |
| 112 | +convert it into an `idr.Node` tree, for later processing and transform. |
| 113 | + |
| 114 | +Typically, a new [`FileFormat`](../extensions/omniv21/fileformat/fileformat.go) may require some |
| 115 | +additional information in a schema (usually in a `file_declaration` section), thus `omni.2.1` schema |
| 116 | +handler will give a new custom [`FileFormat`](../extensions/omniv21/fileformat/fileformat.go) a |
| 117 | +chance to validate a schema. Then the schema handler will ask |
| 118 | +the new [`FileFormat`](../extensions/omniv21/fileformat/fileformat.go) to create a format specific |
| 119 | +reader, whose job is to consume input stream, and convert each record into the IDR format. |
| 120 | + |
| 121 | +See [this example](../extensions/omniv21/samples/customfileformats) for how to add a new |
| 122 | +[`FileFormat`](../extensions/omniv21/fileformat/fileformat.go). |
| 123 | + |
| 124 | +## Add A New Schema Handler |
| 125 | + |
| 126 | +To complete omniparser's full extensibility picture, we allow adding completely new schema handlers, |
| 127 | +whether they're for major schema version upgrades that break backward-compatibility, or for brand-new |
| 128 | +parsing/transform paradigms. In fact, we use this capability ourselves to |
| 129 | +integrate legacy omniparser schema support (schema versions that are older than `omni.2.1` |
| 130 | +and are not compatible with `omni.2.1`); take a glimpse at https://github.com/jf-tech/omniparserlegacy. |
| 131 | + |
| 132 | +## Put All Together |
| 133 | + |
| 134 | +The most canonical use case of omniparser would be a (micro)service that is part of a larger ETL |
| 135 | +pipeline that gets different input files/streams from different external integration influx points, |
| 136 | +performs schema driven (thus codeless) parsing and transform to process and standardize the inputs |
| 137 | +into internal formats for the later loading (L) stage of ETL. |
| 138 | + |
| 139 | +Because omniparser's parsing and transform is schema driven and involves little/no coding, it enables |
| 140 | +faster and at-scale ETL integration, possibly done by non-coding engineers or support staff. |
| 141 | + |
| 142 | + |
| 143 | + |
| 144 | +First, in your service, there needs to be a schema cache component that loads and refreshes all the |
| 145 | +schemas from a schema repository (could be a REST API, or a database, or some storage). These schemas |
| 146 | +are parsed, validated (by [`omniparser.NewSchema`](../schema.go) calls) and cached. |
| 147 | + |
| 148 | +As different integration partners' input streams are coming in, the service will, based on some |
| 149 | +criteria, such as partner IDs, select which schema to use for a particular input. Once schema |
| 150 | +selection is completed, the service calls [`schema.NewTransform`](../schema.go) to create an |
| 151 | +instance of a transform operation for this particular input, performs the parsing and transform, and |
| 152 | +sends the standardized output into a later stage in the ETL pipeline. |
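
The schema cache described above can be sketched roughly as follows. This is only a minimal illustration using the standard library: `compiledSchema`, `CompileFunc`, `SchemaCache` and `fakeCompile` are hypothetical names, and `fakeCompile` merely stands in for a wrapper around `omniparser.NewSchema` so the sketch runs on its own.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// compiledSchema is a hypothetical stand-in for a parsed and validated schema.
type compiledSchema struct {
	Name string
}

// CompileFunc parses and validates raw schema content. In a real service it
// would wrap omniparser.NewSchema (an assumption of this sketch).
type CompileFunc func(name, content string) (*compiledSchema, error)

// SchemaCache maps a selection key (here a partner ID) to a validated schema.
type SchemaCache struct {
	mu      sync.RWMutex
	compile CompileFunc
	schemas map[string]*compiledSchema
}

func NewSchemaCache(compile CompileFunc) *SchemaCache {
	return &SchemaCache{compile: compile, schemas: map[string]*compiledSchema{}}
}

// Refresh compiles every schema in raw (partner ID -> schema content) and
// swaps the whole set in at once. If any schema fails, the old set is kept.
func (c *SchemaCache) Refresh(raw map[string]string) error {
	fresh := make(map[string]*compiledSchema, len(raw))
	for id, content := range raw {
		s, err := c.compile(id, content)
		if err != nil {
			return fmt.Errorf("schema for partner %q: %w", id, err)
		}
		fresh[id] = s
	}
	c.mu.Lock()
	c.schemas = fresh
	c.mu.Unlock()
	return nil
}

// Get returns the cached schema for a partner ID, if one exists.
func (c *SchemaCache) Get(partnerID string) (*compiledSchema, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.schemas[partnerID]
	return s, ok
}

// fakeCompile is a placeholder validator: it only rejects empty content.
func fakeCompile(name, content string) (*compiledSchema, error) {
	if content == "" {
		return nil, errors.New("empty schema")
	}
	return &compiledSchema{Name: name}, nil
}

func main() {
	cache := NewSchemaCache(fakeCompile)
	if err := cache.Refresh(map[string]string{"partner-a": "schema content"}); err != nil {
		fmt.Println("refresh failed:", err)
		return
	}
	if s, ok := cache.Get("partner-a"); ok {
		fmt.Println("selected schema:", s.Name)
	}
}
```

Swapping the whole map under a write lock keeps lookups cheap and means a failed refresh never leaves the service with a partially updated schema set.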
| 153 | + |
| 154 | +## In Non-Golang Environment |
| 155 | + |
| 156 | +Omniparser is currently only implemented in Golang (we do want to port it to other languages, at least |
| 157 | +Java, in the near future). If your service or environment is not in Golang, the only way to utilize it |
| 158 | +is to sidecar it, by either making it a standalone service or shell-exec'ing omniparser, both of which |
| 159 | +involve omniparser's CLI. |
| 160 | + |
| 161 | +Recall in [Getting Started](./gettingstarted.md#cli-command-line-interface) we demonstrated omniparser |
| 162 | +CLI's `transform` command. You can shell-exec it from your service. Keep in mind the following if you |
| 163 | +want to go down this path: |
| 164 | +- you will have to pre-compile the omniparser CLI binary (which needs to be platform/OS specific) and ship it with |
| 165 | +your service, and |
| 166 | +- you will need to copy the input file down locally in your service before invoking the CLI, and then |
| 167 | +intercept `stdout`/`stderr` from the CLI and its exit code in order to get the results. |
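
The capture step is the same in any host language. It is sketched here in Go only to keep one language across this doc's examples. `runCLI` and `cliResult` are hypothetical names, and the binary path and arguments are left to the caller on purpose, since the exact `transform` flags are documented in Getting Started rather than here.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// cliResult holds everything the caller needs to interpret a CLI run.
type cliResult struct {
	Stdout   string
	Stderr   string
	ExitCode int
}

// runCLI executes bin with args and captures stdout, stderr and the exit code.
// A non-zero exit is reported through ExitCode, not as an error; the returned
// error is reserved for failures to start or wait on the process.
func runCLI(bin string, args ...string) (cliResult, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(bin, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	err := cmd.Run()
	res := cliResult{Stdout: stdout.String(), Stderr: stderr.String()}

	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		res.ExitCode = exitErr.ExitCode()
		return res, nil
	}
	return res, err
}

func main() {
	// The binary and its arguments come from the command line, so this sketch
	// makes no assumption about the omniparser CLI's flags.
	if len(os.Args) < 2 {
		fmt.Println("usage: runcli <path-to-cli-binary> [args...]")
		return
	}
	res, err := runCLI(os.Args[1], os.Args[2:]...)
	if err != nil {
		fmt.Println("failed to run CLI:", err)
		return
	}
	fmt.Printf("exit=%d\nstdout:\n%s\nstderr:\n%s\n", res.ExitCode, res.Stdout, res.Stderr)
}
```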
| 168 | + |
| 169 | +Omniparser CLI has another command, `server`, which simply launches the CLI as an HTTP listening service |
| 170 | +that exposes a REST API: |
| 171 | +- `POST` |
| 172 | +- request `Content-Type`: `application/json` |
| 173 | +- request JSON: |
| 174 | + ``` |
| 175 | + { |
| 176 | + "schema": "... the schema content, required ...", |
| 177 | + "input": "... the input to be parsed and transformed, required ...", |
| 178 | + "properties": { ... JSON string map used for `external` transforms, optional ...} |
| 179 | + } |
| 180 | + ``` |
| 181 | +Keep in mind the following if you want to go down this path: |
| 182 | +- you will need to host this CLI-turned omniparser service somewhere accessible to your service, |
| 183 | +- you lose the benefit of omniparser stream processing, which enables parsing infinitely large input, |
| 184 | +because now you need to send the input as a single string in the `input` field of the HTTP POST request. |
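
As a rough sketch of a client for this API, the request described above could be built as follows. `transformRequest` and `newTransformRequest` are hypothetical names, and the URL passed in `main` is only a placeholder assumption, since the host, port and path depend on how you launch and host the service.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// transformRequest mirrors the request JSON described above.
type transformRequest struct {
	Schema     string            `json:"schema"`
	Input      string            `json:"input"`
	Properties map[string]string `json:"properties,omitempty"`
}

// newTransformRequest builds the POST request. Note that the entire input
// travels as one string, which is why streaming is lost on this path.
func newTransformRequest(url, schema, input string, props map[string]string) (*http.Request, error) {
	body, err := json.Marshal(transformRequest{Schema: schema, Input: input, Properties: props})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder URL: replace with wherever your omniparser service is hosted.
	req, err := newTransformRequest(
		"http://localhost:8080/transform",
		"your schema content",
		"your input content",
		map[string]string{"source": "partner-a"},
	)
	if err != nil {
		fmt.Println("failed to build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Content-Type"))
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```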
| 185 | +
|
| 186 | +# Programmability of Some Components without Omniparser |
| 187 | +
|
| 188 | +Many components inside omniparser can be useful in your code, even if you don't want to |
| 189 | +use omniparser as a whole for parsing and transforming input file/data. Here is a selected list of |
| 190 | +these components: |
| 191 | +
|
| 192 | +## Functions |
| 193 | +
|
| 194 | +- [`DateTimeToRFC3339()`, `DateTimeLayoutToRFC3339()`, `DateTimeToEpoch()`, `EpochToDateTimeRFC3339()`](../customfuncs/datetime.go) |
| 195 | +
|
| 196 | + Parsing and formatting date/time stamps isn't trivial at all, especially when time zones are |
| 197 | + involved. These functions can be used independent of omniparser and are very useful when your |
| 198 | + Golang code deals with date/time a lot. |
| 199 | +
|
| 200 | +- [`JavaScript()`](../extensions/omniv21/customfuncs/javascript.go): |
| 201 | +
|
| 202 | + Omniparser uses github.com/dop251/goja as the native Golang javascript engine. Yes, you can use `goja` |
| 203 | + directly, but you'll have to deal with performance-related VM caching and error handling. Instead, |
| 204 | + you can directly use the `JavaScript` function. |
| 205 | +
|
| 206 | +## IDR |
| 207 | +
|
| 208 | +We have an in-depth [doc](./idr.md) talking about IDR, which proves to be really useful in many document |
| 209 | +parsing situations, even outside the omniparser realm. This `idr` package contains the IDR node/tree |
| 210 | +definitions, creation, caching, recycling and releasing mechanisms, serialization helpers, XPath |
| 211 | +assisted navigation and querying, and two powerful stream readers for JSON and XML inputs. |
| 212 | +
|
| 213 | +Particularly, the [JSON](../idr/jsonreader.go)/[XML](../idr/xmlreader.go) readers are two powerful |
| 214 | +parsers, capable of ingesting JSON/XML data in streaming fashion assisted by XPath style target |
| 215 | +filtering, thus enabling processing arbitrarily large inputs. |
| 216 | +
|
| 217 | +## CSV Reader |
| 218 | +
|
| 219 | +Use [`NewReader()`](../extensions/omniv21/fileformat/csv/reader.go) to create a CSV reader that does |
| 220 | +- header column validation |
| 221 | +- header/data row jumping |
| 222 | +- XPath based data row filtering |
| 223 | +- mis-escaped quote replacement |
| 224 | +- context-aware error messages |
| 225 | +
|
| 226 | +For more reader specific settings/configurations, check the |
| 227 | +[CSV in Depth](./csv_in_depth.md#csv-file_declaration) page. |
| 228 | +
|
| 229 | +## Fixed-Length Reader |
| 230 | +
|
| 231 | +Use [`NewReader()`](../extensions/omniv21/fileformat/fixedlength/reader.go) to create a fixed-length |
| 232 | +reader that does |
| 233 | +- row based or header/footer based envelope parsing |
| 234 | +- XPath based data row filtering |
| 235 | +- context-aware error messages |
| 236 | +
|
| 237 | +For more reader specific settings/configurations, check the |
| 238 | +[Fixed-Length in Depth](./fixedlength_in_depth.md) page. |
| 239 | +
|
| 240 | +## EDI Reader |
| 241 | +
|
| 242 | +Use [`NewReader()`](../extensions/omniv21/fileformat/edi/reader.go) to create an EDI reader that does |
| 243 | +- segment min/max validation |
| 244 | +- XPath based data row filtering |
| 245 | +- context-aware error messages |
| 246 | +
|
| 247 | +Future TO-DO: create a non-validating version of the EDI reader for users who are only interested in |
| 248 | +getting the raw segment data, without any validation. |
| 249 | +
|
| 250 | +## JSON Reader |
| 251 | +See [IDR](#idr) notes about the JSON/XML readers above. |
| 252 | +
|
| 253 | +## XML Reader |
| 254 | +See [IDR](#idr) notes about the JSON/XML readers above. |