This page is a step-by-step introduction of how to write an omniparser schema (specifically tailor
for the latest `"omni.2.1"` schema version) and how to ingest and transform inputs programmatically
- and by the cli tool.
+ and by the CLI tool.

## Prerequisites and Notes

@@ -47,7 +47,6 @@ transform each of the data line into the following JSON output:
        "wind": "South East 4.97 mph"
    }
]
-
```
As you can see, in the desired output, we'd like to standardize all the input temperatures into the
same fahrenheit unit; we'd also like to do some translation such that the wind direction and wind
@@ -56,7 +55,7 @@ into [RFC-3339](https://tools.ietf.org/html/rfc3339) standard format.

## CLI (command line interface)

- Before we get into schema writing, let's first get familiar with omniparser cli so that we can easily
+ Before we get into schema writing, let's first get familiar with omniparser CLI so that we can easily
and incrementally test our schema writing.

Assuming you have the git repo cloned at `~/dev/jf-tech/omniparser/`, simply run this bash script:
@@ -77,7 +76,7 @@ $ touch input.csv
$ touch schema.json
```
Use any editor to cut & paste the CSV content from [The Input](#the-input) into `input.csv`, and
- now run omniparser cli from `~/Downloads/omniparser/guide/`:
+ now run omniparser CLI from `~/Downloads/omniparser/guide/`:
```
$ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
Error: unable to perform schema validation: EOF
@@ -99,7 +98,7 @@ This is the common part of all omniparser schemas, the header `parser_settings`:
    }
}
```
- It's self-explanatory. Now let's run the cli again:
+ It's self-explanatory. Now let's run the CLI again:
```
$ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
Error: schema 'schema.json' validation failed: (root): transform_declarations is required
@@ -121,7 +120,7 @@ transformation. Let's add an empty `transform_declarations` for now:
    "transform_declarations": {}
}
```
- Run the cli we get another error:
+ Run the CLI we get another error:
```
$ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
Error: schema 'schema.json' validation failed: transform_declarations: FINAL_OUTPUT is required
@@ -143,7 +142,7 @@ the output. Given the section is called `transform_declarations` you might have
multiple templates defined in it. Each template can reference other templates. There must be one
and only one template called `FINAL_OUTPUT`.

- Run the cli we get a new error:
+ Run the CLI we get a new error:
```
$ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
Error: schema 'schema.json' validation failed: (root): file_declaration is required
@@ -193,7 +192,7 @@ Let's add these:
}
```

- Run the cli again:
+ Run the CLI again:
```
$ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
[
@@ -279,7 +278,7 @@ Let's make small modifications to our schema:
}
```

- Rerun the cli to ensure everything is still working. Now the IDR and its imaginary converted XML
+ Rerun the CLI to ensure everything is still working. Now the IDR and its imaginary converted XML
equivalent look like this:
```
<>
@@ -339,7 +338,7 @@ Remember for the first data line, its corresponding IDR (or the IDR's equivalent
Thus, an XPath query `"xpath": "DATE"` on the root of the IDR would return `01/31/2019 12:34:56-0800`, which is
used as the value for the field `date`. So on and so forth for all other fields.

- Run the cli, we have:
+ Run the CLI, we have:
```
$ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
[
@@ -388,7 +387,7 @@ built-in function to achieve this:
}
```

- Run cli we have:
+ Run CLI we have:
```
$ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
[
@@ -508,7 +507,7 @@ Here we introduce two new things: 1) template and 2) custom_func `javascript`.
value `10.5`, `"type": "float"` is used. However when the script is done, the result is already
in float, there is no need to specify `"type": "float"` for the `custom_func` directive.

- Now let's run cli:
+ Now let's run CLI:
```
$ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
[
@@ -562,7 +561,7 @@ numeric value. That should be an easy fix:
Basically changing `"low_temperature_fahrenheit": { "xpath": "LOW_TEMP_F" }` to
`"low_temperature_fahrenheit": { "xpath": "LOW_TEMP_F", "type": "float" }`.

- Run cli again, we have:
+ Run CLI again, we have:
```
$ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
[
@@ -804,3 +803,21 @@ code snippet of showing how to achieve this:
    // output contains a []byte of the ingested and transformed record.
}
```
+
+ ### The Output
+ ```
+ [
+     {
+         "date": "2019-01-31T12:34:56-08:00",
+         "high_temperature_fahrenheit": 50.9,
+         "low_temperature_fahrenheit": 30.2,
+         "wind": "North 20.5 mph"
+     },
+     {
+         "date": "2020-07-31T01:23:45-05:00",
+         "high_temperature_fahrenheit": 102.2,
+         "low_temperature_fahrenheit": 95,
+         "wind": "South East 4.97 mph"
+     }
+ ]
+ ```