@@ -90,7 +90,7 @@ Now we're ready to go!
### `parser_settings`
- This is the comment part of all omniparser schemas, the header `parser_settings`:
+ This is the common part of all omniparser schemas, the header `parser_settings`:
```
{
"parser_settings": {
@@ -141,7 +141,7 @@ Let's add an empty `FINAL_OUTPUT` in:
`FINAL_OUTPUT` is the special name reserved for the transform template that will be used for
the output. Given the section is called `transform_declarations`, you might have guessed we can have
multiple templates defined in it. Each template can reference other templates. There must be one
- and only one templated called `FINAL_OUTPUT`.
+ and only one template called `FINAL_OUTPUT`.
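For context (this hunk sits right after the doc's "Let's add an empty `FINAL_OUTPUT` in:" step), a minimal sketch of what that empty declaration plausibly looks like — reconstructed from the surrounding text, not copied from the diff:
```
"transform_declarations": {
    "FINAL_OUTPUT": {}
}
```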

Run the cli and we get a new error:
```
@@ -158,11 +158,11 @@ data into the desired output format, we still owe the parser the instructions ho
stream, there comes `file_declaration`. (Note not all input formats require a `file_declaration`
section, e.g. JSON and XML inputs need no `file_declaration` in their schemas.)

- For CSV, we need to define the following common settings:
+ For CSV, we need to define the following settings:
- What's the delimiter character, comma or something else?
- Is there a header in the CSV input that defines the names of each column? If not, what should each
column be called during ingestion and transformation?
- - Where does the actual data lines begin?
+ - Where do the actual data lines begin?
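To make these settings concrete, here is a rough sketch of a CSV `file_declaration` for the sample input. The column names mirror the sample data shown later on this page; the exact key names (`header_row_index`, `data_row_index`) are assumptions to verify against omniparser's csv file format docs:
```
"file_declaration": {
    "delimiter": "|",
    "header_row_index": 1,
    "data_row_index": 2,
    "columns": [
        { "name": "DATE" },
        { "name": "HIGH TEMP C" },
        { "name": "LOW TEMP F" },
        { "name": "WIND DIR" },
        { "name": "WIND SPEED KMH" }
    ]
}
```
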
For this guide example, the settings are:
- delimiter is `|`
@@ -220,7 +220,7 @@ all input formats. If you're interested in more technical details, check the IDR
CSV has a very simple IDR representation: each data line is mapped to an IDR tree, where each column
is mapped to the tree's leaf nodes. So for our sample input csv here, the first data line would be
- represented like the following IDR:
+ represented by the following IDR:
```
|
+--"DATE"
@@ -240,7 +240,7 @@ represented like the following IDR:
```

You can imagine converting the IDR into XML, which helps you understand the extensive use of XPath
- later in transformation:
+ queries later in transformation:
```
<>
<DATE>01/31/2019 12:34:56-0800</DATE>
@@ -250,9 +250,9 @@ later in transformation:
<WIND SPEED KMH> 33</WIND SPEED KMH>
</>
```
- Note XML/XPath don't like element name containing spaces. While IDR doesn't care of names containing
- spaces, XPath queries used in transforms will later break. So we'd like to **assign some
- XPath-friendly column name aliases in our schema, if the raw column names containing special chars**:
+ Note XML/XPath don't like element names containing spaces. While IDR doesn't care about names with
+ spaces, XPath queries used in transforms do care and will break. So we'd like to **assign some
+ XPath-friendly column name aliases in our schema, if the raw column names contain special chars**:
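In spirit, the modification below is just attaching an XPath-safe alias to each space-containing column; a sketch (the `alias` key name is an assumption, inferred from the aliased IDR node names like `HIGH_TEMP_C` that appear later):
```
"columns": [
    { "name": "DATE" },
    { "name": "HIGH TEMP C", "alias": "HIGH_TEMP_C" },
    { "name": "LOW TEMP F", "alias": "LOW_TEMP_F" },
    { "name": "WIND DIR", "alias": "WIND_DIR" },
    { "name": "WIND SPEED KMH", "alias": "WIND_SPEED_KMH" }
]
```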

Let's make small modifications to our schema:
```
@@ -280,7 +280,7 @@ Let's make small modifications to our schema:
```

Rerun the cli to ensure everything is still working. Now the IDR and its imaginary converted XML
- equivalent looks like this:
+ equivalent look like this:
```
<>
<DATE>01/31/2019 12:34:56-0800</DATE>
@@ -427,11 +427,11 @@ of the function, in this case `dateTimeToRFC3339`, and a list of arguments the f
The first argument here is `{ "xpath": "DATE" },` basically providing the function the input datetime
string. The second argument of `dateTimeToRFC3339` specifies what time zone the input datetime
- string is in. Since the datetime strings in the guide sample CSV contain time zone offsets (`-0800`,
- `-0500`), an empty string is supplied to the input time zone argument. The third argument is the
- desired output time zone. If, say, we want to standardize all the `date` fields in the output to be
- in time zone of `America/Los_Angeles`, we can specify it in the third argument, and the `custom_func`
- will perform the correct time zone shifts for us.
+ string is in. Since the datetime strings in the guide sample CSV already contain time zone offsets
+ (`-0800`, `-0500`), an empty string is supplied to the input time zone argument. The third argument is
+ the desired output time zone. If, say, we want to standardize all the `date` fields in the output to
+ be in time zone of `America/Los_Angeles`, we can specify it in the third argument, and the
+ `custom_func` will perform the correct time zone shifts for us.
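Assembled from the description above, the whole `date` transform plausibly reads like this (a sketch: the `{ "const": ... }` form for the two time zone arguments is an assumption to verify):
```
"date": {
    "custom_func": {
        "name": "dateTimeToRFC3339",
        "args": [
            { "xpath": "DATE" },
            { "const": "" },
            { "const": "America/Los_Angeles" }
        ]
    }
}
```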

### Fix `FINAL_OUTPUT.high_temperature_fahrenheit`

@@ -495,11 +495,11 @@ Here we introduce two new things: 1) template and 2) custom_func `javascript`.
}
```
custom_func `javascript` takes a number of arguments: the first one is the actual script string,
- and all remaining arguments to are to provide values for all the variables declared in the script
+ and all remaining arguments are to provide values for all the variables declared in the script
string, in this particular case, only one variable `temp_c`. All remaining arguments come in
pairs. The first in each pair always declares what variable the second in the pair is about. And the
second in each pair provides the actual value for the variable. In this example, we see variable
- `temp_c` should have the value based on the XPath query `"."` and converted into `float` type.
+ `temp_c` should have a value based on the XPath query `"."` and converted into `float` type.
Remember this template's invocation is anchored on the IDR node `<HIGH_TEMP_C>`, thus XPath query
`"."` returns its text value `"10.5"`, after which it is converted into the numeric value `10.5`
before the math computation starts.
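Reconstructed from that description, the `args` list plausibly reads as follows; the Fahrenheit formula is the standard `F = C * 9 / 5 + 32` (with the page's usual `Math.floor` rounding), not quoted from the diff:
```
"args": [
    { "const": "Math.floor((temp_c * 9 / 5 + 32) * 100) / 100" },
    { "const": "temp_c" },
    { "xpath": ".", "type": "float" }
]
```
With the anchored value `10.5`, this evaluates to `50.9`.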
@@ -581,7 +581,7 @@ $ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
]
```

- Almost there, but not quite! The `wind` field is a bit tricky to fix.
+ Almost there! The `wind` field is a bit tricky to fix...

### Fix `wind`
@@ -602,13 +602,13 @@ Recall the first data line's IDR (XML equivalent) looks like:
<WIND_SPEED_KMH> 33</WIND_SPEED_KMH>
</>
```
- So `wind` value needs to be derived from two columns in the input CSV data line. Let's look at them
- one by one.
+ So `wind` value needs to derive from two columns in the input CSV data line. Let's look at them one
+ by one.

1) Wind Direction
In the input, the wind direction is abbreviated (such as `"N"`, `"E"`, `"SW"`, etc). In the
- desired output we want it read English. So we need some mapping, for which again we resort to the
+ desired output we want it to be English. So we need some mapping, for which again we resort to the
almighty custom function `javascript`:
```
"wind_acronym_mapping": {
@@ -621,7 +621,8 @@ one by one.
}
}
```
- A giant/long `? :` ternary operator maps wind direction abbreviations into English phrases.
+ A giant/long `? :` ternary-operator-infested javascript line maps wind direction abbreviations
+ into English phrases.

2) Wind Speed
@@ -630,6 +631,8 @@ one by one.
```
Math.floor(kmh * 0.621371 * 100) / 100
```
+ (The several uses of `Math.floor(... * 100) / 100` throughout this page are there to limit the
+ number of decimal places, to be more human readable.)
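As a quick sanity check with the sample's `33` km/h: `33 * 0.621371 = 20.505243`; times `100` gives `2050.5243`; `Math.floor` truncates that to `2050`; dividing by `100` yields `20.5` mph — at most two decimal places, as intended.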

Putting 1) and 2) together, we can have the new transform schema look like this:
```
@@ -712,7 +715,7 @@ code snippet of showing how to achieve this:
if err == io.EOF {
break
}
- // output contains the []byte of the ingested and transformed record.
+ // output contains a []byte of the ingested and transformed record.
}
```
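For readers landing on this hunk without the surrounding doc, here is a minimal, self-contained version of the loop being edited, following the omniparser README; the file names and `panic`-style error handling are illustrative only:
```
package main

import (
    "bytes"
    "fmt"
    "io"
    "os"

    "github.com/jf-tech/omniparser"
    "github.com/jf-tech/omniparser/transformctx"
)

func main() {
    schemaJSON, err := os.ReadFile("schema.json") // the guide's schema
    if err != nil {
        panic(err)
    }
    inputCSV, err := os.ReadFile("input.csv") // the guide's sample input
    if err != nil {
        panic(err)
    }
    // Parse/validate the schema once; it can be reused across many inputs.
    schema, err := omniparser.NewSchema("guide-schema", bytes.NewReader(schemaJSON))
    if err != nil {
        panic(err)
    }
    // Bind the schema to one input stream.
    transform, err := schema.NewTransform("guide-input", bytes.NewReader(inputCSV), &transformctx.Ctx{})
    if err != nil {
        panic(err)
    }
    for {
        output, err := transform.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err)
        }
        // output contains a []byte of the ingested and transformed record.
        fmt.Println(string(output))
    }
}
```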
@@ -798,6 +801,6 @@ code snippet of showing how to achieve this:
if err == io.EOF {
break
}
- // output contains the []byte of the ingested and transformed record.
+ // output contains a []byte of the ingested and transformed record.
}
```