Commit 2c975b9

getting started - some wording change (#122)
1 parent f057be3 commit 2c975b9

File tree

1 file changed

+27
-24
lines changed


doc/gettingstarted.md

@@ -90,7 +90,7 @@ Now we're ready to go!
 
 ### `parser_settings`
 
-This is the comment part of all omniparser schemas, the header `parser_settings`:
+This is the common part of all omniparser schemas, the header `parser_settings`:
 ```
 {
 "parser_settings": {
@@ -141,7 +141,7 @@ Let's add an empty `FINAL_OUTPUT` in:
 `FINAL_OUTPUT` is the special name reserved for the transform template that will be used for
 the output. Given the section is called `transform_declarations` you might have guessed we can have
 multiple templates defined in it. Each template can reference other templates. There must be one
-and only one templated called `FINAL_OUTPUT`.
+and only one template called `FINAL_OUTPUT`.
 
 Run the cli we get a new error:
 ```
@@ -158,11 +158,11 @@ data into the desired output format, we still owe the parser the instructions ho
 stream, there comes `file_declaration`. (Note not all input formats require a `file_declaration`
 section, e.g. JSON and XML inputs need no `file_declaration` in their schemas.)
 
-For CSV, we need to define the following common settings:
+For CSV, we need to define the following settings:
 - What's the delimiter character, comma or something else?
 - Is there a header in the CSV input that defines the names of each column? If no, what each column
 should be called during ingestion and transformation?
-- Where does the actual data lines begin?
+- Where do the actual data lines begin?
 
 For this guide example, the settings are:
 - delimiter is `|`
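The settings in that list boil down to ordinary delimited-text parsing: split each data line on the delimiter and pair the pieces with column names. A toy sketch of the idea (column names and order here are assumed from the guide's sample data, not authoritative; omniparser does this splitting for you):

```javascript
// Toy illustration of '|'-delimited parsing with named columns.
// Column order is an assumption for illustration, not the guide's exact layout.
const header = "DATE|HIGH TEMP C|LOW TEMP C|WIND DIR|WIND SPEED KMH".split("|");
const line = "01/31/2019 12:34:56-0800|10.5|-2.2|SW| 33".split("|");

// Pair each column name with its value for this data line.
const record = Object.fromEntries(header.map((name, i) => [name, line[i]]));
```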
@@ -220,7 +220,7 @@ all input formats. If you're interested in more technical details, check the IDR
 
 CSV has a very simple IDR representation: each data line is mapped to an IDR tree, where each column
 is mapped to the tree's leaf nodes. So for our sample input csv here, the first data line would be
-represented like the following IDR:
+represented by the following IDR:
 ```
 |
 +--"DATE"
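To make the tree shape concrete, one rough way to picture that IDR is as a plain object: an unnamed root whose children are leaf nodes keyed by column name (illustrative only; omniparser's IDR is an in-memory node tree in Go, not JSON):

```javascript
// Rough object-form picture of the first data line's IDR:
// the root is the unnamed node, each column is a leaf keyed by its column name.
const idr = {
  "DATE": "01/31/2019 12:34:56-0800",
  "WIND SPEED KMH": " 33",
  // ...remaining columns omitted
};
```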
@@ -240,7 +240,7 @@ represented like the following IDR:
 ```
 
 You can imaginarily convert the IDR into XML which helps you understand the extensive use of XPath
-later in transformation:
+queries later in transformation:
 ```
 <>
 <DATE>01/31/2019 12:34:56-0800</DATE>
@@ -250,9 +250,9 @@ later in transformation:
 <WIND SPEED KMH> 33</WIND SPEED KMH>
 </>
 ```
-Note XML/XPath don't like element name containing spaces. While IDR doesn't care of names containing
-spaces, XPath queries used in transforms will later break. So we'd like to **assign some
-XPath-friendly column name aliases in our schema, if the raw column names containing special chars**:
+Note XML/XPath don't like element name containing spaces. While IDR doesn't care about names with
+spaces, XPath queries used in transforms do care and will break. So we'd like to **assign some
+XPath friendly column name aliases in our schema, if the raw column names containing special chars**:
 
 Let's make small modifications to our schema:
 ```
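The aliasing above amounts to a simple name-sanitizing rule. A sketch of the idea (this regex illustrates the rule, not anything omniparser computes — you write the aliases by hand in the schema):

```javascript
// Illustrative only: turning an XPath-unfriendly raw column name into the kind
// of alias the schema assigns by hand (spaces/special chars become '_').
const rawName = "WIND SPEED KMH";
const alias = rawName.replace(/[^A-Za-z0-9_]/g, "_");
// "WIND_SPEED_KMH" is a valid XML element name, so XPath queries work on it.
```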
@@ -280,7 +280,7 @@ Let's make small modifications to our schema:
 ```
 
 Rerun the cli to ensure everything is still working. Now the IDR and its imaginary converted XML
-equivalent looks like this:
+equivalent look like this:
 ```
 <>
 <DATE>01/31/2019 12:34:56-0800</DATE>
@@ -427,11 +427,11 @@ of the function, in this case `dateTimeToRFC3339`, and a list of arguments the f
 
 The first argument here is `{ "xpath": "DATE" },` basically providing the function the input datetime
 string. The second argument `dateTimeToRFC3339` requires specifies what time zone the input datetime
-string is in. Since the datetime strings in the guide sample CSV contain time zone offsets (`-0800`,
-`-0500`), an empty string is supplied to the input time zone argument. The third argument is the
-desired output time zone. If, say, we want to standardize all the `date` fields in the output to be
-in time zone of `America/Los_Angeles`, we can specify it in the third argument, and the `custom_func`
-will perform the correct time zone shifts for us.
+string is in. Since the datetime strings in the guide sample CSV already contain time zone offsets
+(`-0800`, `-0500`), an empty string is supplied to the input time zone argument. The third argument is
+the desired output time zone. If, say, we want to standardize all the `date` fields in the output to
+be in time zone of `America/Los_Angeles`, we can specify it in the third argument, and the
+`custom_func` will perform the correct time zone shifts for us.
 
 ### Fix `FINAL_OUTPUT.high_temperature_fahrenheit`
 
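To see what those time zone arguments amount to, here is a hand-rolled sketch of the conversion for the first sample value (plain JavaScript, not the `dateTimeToRFC3339` implementation):

```javascript
// "01/31/2019 12:34:56-0800" carries its own offset: 12:34:56 at UTC-8 is
// 20:34:56 UTC, which is why the input-time-zone argument can be left "".
const utcMillis = Date.UTC(2019, 0, 31, 12 + 8, 34, 56); // month is 0-based
const rfc3339 = new Date(utcMillis).toISOString().replace(".000Z", "Z");
// Rendering that same instant in "America/Los_Angeles" instead of UTC is the
// kind of shift the third argument asks the custom_func to perform.
```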
@@ -495,11 +495,11 @@ Here we introduce two new things: 1) template and 2) custom_func `javascript`.
 }
 ```
 custom_func `javascript` takes a number of arguments: the first one is the actual script string,
-and all remaining arguments to are to provide values for all the variables declared in the script
+and all remaining arguments are to provide values for all the variables declared in the script
 string, in this particular case, only one variable `temp_c`. All remaining arguments come in
 pairs. The first in each pair always declares what variable the second in pair is about. And the
 second in each pair provides the actual value for the variable. In this example, we see variable
-`temp_c` should have the value based on the XPath query `"."` and converted into `float` type.
+`temp_c` should have a value based on the XPath query `"."` and converted into `float` type.
 Remember this template's invocation is anchored on the IDR node `<HIGH_TEMP_C>`, thus XPath query
 `"."` returns its text value `"10.5"`, after which it was converted into numeric value `10.5`
 before the math computation starts.
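Traced through with the sample value, the mechanics look like this (a sketch only; the exact script string lives in the schema):

```javascript
// The argument pair binds temp_c to the XPath query "." result, cast to float:
const nodeText = "10.5";             // text of the anchored <HIGH_TEMP_C> node
const temp_c = parseFloat(nodeText); // the "float" type conversion -> 10.5
// A Celsius-to-Fahrenheit script over that variable then computes:
const temp_f = temp_c * 9 / 5 + 32;  // ~50.9, modulo floating point
```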
@@ -581,7 +581,7 @@ $ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
 ]
 ```
 
-Almost there, but not quite! The `wind` field is a bit tricky to fix.
+Almost there! The `wind` field is a bit tricky to fix...
 
 ### Fix `wind`
 
@@ -602,13 +602,13 @@ Recall the first data line's IDR (XML equivalent) looks like:
 <WIND_SPEED_KMH> 33</WIND_SPEED_KMH>
 </>
 ```
-So `wind` value needs to be derived from two columns in the input CSV data line. Let's look at them
-one by one.
+So `wind` value needs to derive from two columns in the input CSV data line. Let's look at them one
+by one.
 
 1) Wind Direction
 
 In the input, the wind direction is abbreviated (such as `"N"`, `"E"`, `"SW"`, etc). In the
-desired output we want it read English. So we need some mapping, for which again we resort to the
+desired output we want it to be English. So we need some mapping, for which again we resort to the
 all mighty custom function `javascript`:
 ```
 "wind_acronym_mapping": {
@@ -621,7 +621,8 @@ one by one.
 }
 }
 ```
-A giant/long `? :` ternary operator maps wind direction abbreviations into English phrases.
+A giant/long `? :` ternary operator infested javascript line maps wind direction abbreviations
+into English phrases.
 
 2) Wind Speed
 
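The shape of that ternary chain, shortened to a few directions (the abbreviations and phrasing here are illustrative, not the schema's full list):

```javascript
// Illustrative shortened version of the wind-direction ternary chain.
const dir = "SW";
const wind_dir =
  dir === "N" ? "North" :
  dir === "S" ? "South" :
  dir === "E" ? "East" :
  dir === "W" ? "West" :
  dir === "SW" ? "South West" :
  "Unknown";
```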
@@ -630,6 +631,8 @@ one by one.
 ```
 Math.floor(kmh * 0.621371 * 100) / 100
 ```
+(Several uses of `Math.floor(...*100/100)` throughout this page is to limit the number of decimal
+places to be more human readable.)
 
 Put 1) and 2) together, we can have the new transform schema look like this:
 ```
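Worked through for the sample wind speed (` 33` km/h):

```javascript
// km/h -> mph, truncated to 2 decimal places via the floor(x*100)/100 trick.
const kmh = parseFloat(" 33");                      // leading space is fine for parseFloat
const mph = Math.floor(kmh * 0.621371 * 100) / 100; // 33 * 0.621371 = 20.505243 -> 20.5
```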
@@ -712,7 +715,7 @@ code snippet of showing how to achieve this:
 if err == io.EOF {
 break
 }
-// output contains the []byte of the ingested and transformed record.
+// output contains a []byte of the ingested and transformed record.
 }
 ```
 
@@ -798,6 +801,6 @@ code snippet of showing how to achieve this:
 if err == io.EOF {
 break
 }
-// output contains the []byte of the ingested and transformed record.
+// output contains a []byte of the ingested and transformed record.
 }
 ```
