
Commit 2061104

Typo repair and PEP8 cleanup (Netflix#1190)
* Fixed typos, spelling, and grammar
* Fixed several simple PEP warnings
* Reverted changes to _vendor folder
* Black formatting
* Black linting
* Reset _vendor folder after black formatting
1 parent 6764209 commit 2061104


112 files changed (+360, -336 lines)

Some content is hidden: large commits have some content hidden by default.


R/inst/tutorials/02-statistics/README.md

+1-1
````diff
@@ -1,6 +1,6 @@
 # Episode 02-statistics: Is this Data Science?

-**Use metaflow to load the movie metadata CSV file into a data frame and compute some movie genre specific statistics. These statistics are then used in
+**Use metaflow to load the movie metadata CSV file into a data frame and compute some movie genre-specific statistics. These statistics are then used in
 later examples to improve our playlist generator. You can optionally use the
 Metaflow client to eyeball the results in a Markdown Notebook, and make some simple
 plots.**
````

R/inst/tutorials/02-statistics/stats.Rmd

+1-1
````diff
@@ -5,7 +5,7 @@ output:
 df_print: paged
 ---

-MovieStatsFlow loads the movie metadata CSV file into a Pandas Dataframe and computes some movie genre specific statistics. You can use this notebook and the Metaflow client to eyeball the results and make some simple plots.
+MovieStatsFlow loads the movie metadata CSV file into a Pandas Dataframe and computes some movie genre-specific statistics. You can use this notebook and the Metaflow client to eyeball the results and make some simple plots.

 ```{r}
 suppressPackageStartupMessages(library(metaflow))
````

R/inst/tutorials/05-statistics-redux/README.md

+1-1
````diff
@@ -6,7 +6,7 @@ running on remote compute. In this example we re-run the 'stats.R' workflow
 adding the '--with batch' command line argument. This instructs Metaflow to run
 all your steps on AWS batch without changing any code. You can control the
 behavior with additional arguments, like '--max-workers'. For this example,
-'max-workers' is used to limit the number of parallel genre specific statistics
+'max-workers' is used to limit the number of parallel genre-specific statistics
 computations.
 You can then access the data artifacts (even the local CSV file) from anywhere
 because the data is being stored in AWS S3.**
````

docs/Environment escape.md

+9-9
````diff
@@ -20,7 +20,7 @@ but *some* can execute in another Python environment.
 At a high-level, the environment escape plugin allows a Python interpreter to
 forward calls to another interpreter. To set semantics, we will say that a
 *client* interpreter escapes to a *server* interpreter. The *server* interpreter
-operates in a slave-like mode with regards to the *client*. To give a concrete
+operates in a slave-like mode with regard to the *client*. To give a concrete
 example, imagine a package ``data_accessor`` that is available in the base
 environment you are executing in but not in your Conda environment. When
 executing within the Conda environment, the *client* interpreter is the Conda
````
````diff
@@ -69,7 +69,7 @@ identifier to find the correct stub. There is therefore a **one-to-one mapping
 between stub objects on the client and backing objects on the server**.

 The next method called on ```job``` is ```wait``` which returns ```None```. In
-this system, by design, only certain objects are able to be transferred between
+this system, by design, only certain objects may be transferred between
 the client and the server:
 - any Python basic type; this can be extended to any object that can be pickled
 without any external library;
````
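The one-to-one mapping between stubs and backing objects, and the rule that only certain values cross the boundary by value, can be illustrated with a minimal in-process sketch. `ToyServer`, `Stub`, `ToyClient`, and `Job` are hypothetical names, not Metaflow's environment escape classes, and the sketch assumes a simplified transfer rule (a few built-in types by value, everything else by reference). The real plugin talks to a separate interpreter over a connection, which is omitted here.

```python
import itertools
from typing import Any, Dict, Tuple

# Assumption for illustration: only these types travel by value. The prose
# above says the real rule extends to objects picklable without external libraries.
BASIC_TYPES = (type(None), bool, int, float, str, bytes)


class ToyServer:
    """Hypothetical server that owns the real ("backing") objects."""

    def __init__(self) -> None:
        self._objects: Dict[int, Any] = {}  # identifier -> backing object
        self._key_of: Dict[int, int] = {}   # id(backing object) -> identifier
        self._next_key = itertools.count(1)

    def encode(self, value: Any) -> Tuple[str, Any]:
        if isinstance(value, BASIC_TYPES):
            return ("value", value)
        key = self._key_of.get(id(value))
        if key is None:  # first time this object is exposed: assign one identifier
            key = next(self._next_key)
            self._objects[key] = value
            self._key_of[id(value)] = key
        return ("ref", key)

    def call_method(self, key: int, name: str, *args: Any) -> Tuple[str, Any]:
        result = getattr(self._objects[key], name)(*args)
        return self.encode(result)


class Stub:
    """Client-side proxy that forwards method calls to the server."""

    def __init__(self, client: "ToyClient", key: int) -> None:
        self._client = client
        self._key = key

    def __getattr__(self, name: str) -> Any:
        if name.startswith("_"):
            raise AttributeError(name)

        def remote_method(*args: Any) -> Any:
            reply = self._client.server.call_method(self._key, name, *args)
            return self._client.decode(reply)

        return remote_method


class ToyClient:
    def __init__(self, server: ToyServer) -> None:
        self.server = server
        self._stubs: Dict[int, Stub] = {}  # identifier -> the single stub for it

    def decode(self, reply: Tuple[str, Any]) -> Any:
        kind, payload = reply
        if kind == "value":
            return payload
        if payload not in self._stubs:
            self._stubs[payload] = Stub(self, payload)
        return self._stubs[payload]


class Job:
    """Hypothetical backing class that only exists on the server side."""

    def wait(self) -> None:
        return None

    def same(self) -> "Job":
        return self


server = ToyServer()
client = ToyClient(server)
job = client.decode(server.encode(Job()))
print(job.wait())         # None (a basic type, sent by value)
print(job.same() is job)  # True (same backing object, same stub)
```

Keeping the identifier-to-stub dictionary on the client is what makes `job.same() is job` hold: the same backing object always comes back as the same stub.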
````diff
@@ -224,9 +224,9 @@ everything to the server:
 performs computations at the request of the client when the client is unable
 to do so.

-The server is thus started by the client and the client is responsible for
-terminating it when it dies. A big part of the client and server code consist
-in loading the configuration for the emulated module, particularly the
+The server is thus started by the client, and the client is responsible for
+terminating the server when it dies. A big part of the client and server code
+consist in loading the configuration for the emulated module, particularly the
 overrides.

 The steps to bringing up the client/server connection are as follows:
````
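The lifecycle in this hunk (the client starts the server and is responsible for stopping it) can be sketched with `subprocess`. This assumes the server is simply a child Python process that runs until its stdin reaches EOF; `ServerHandle` and `SERVER_CODE` are hypothetical, and Metaflow's configuration loading and connection setup are not modeled.

```python
import atexit
import subprocess
import sys

# Hypothetical "server": a child interpreter that serves until stdin is closed.
SERVER_CODE = "import sys\nfor line in sys.stdin:\n    pass\n"


class ServerHandle:
    def __init__(self) -> None:
        # The client starts the server...
        self.proc = subprocess.Popen(
            [sys.executable, "-c", SERVER_CODE], stdin=subprocess.PIPE
        )
        # ...and registers cleanup so the server does not outlive a normal
        # client exit (atexit handlers do not run if the client is killed).
        atexit.register(self.shutdown)

    def shutdown(self) -> None:
        if self.proc.poll() is None:  # still running
            self.proc.stdin.close()   # EOF tells the server to exit
            try:
                self.proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.proc.terminate()
                self.proc.wait()


handle = ServerHandle()
print(handle.proc.poll() is None)  # True while the server runs
handle.shutdown()
```

Stopping on EOF has a useful side effect: if the client is killed outright and `atexit` never runs, the operating system closes the client's end of the pipe and the server still exits.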
````diff
@@ -274,7 +274,7 @@ used).

 ## Defining an emulated module

-To define an emulated module, you need to create a sub directory in
+To define an emulated module, you need to create a subdirectory in
 ```plugins/env_escape/configurations``` called ```emulate_<name>``` where
 ```<name>``` is the name of the library you want to emulate. It can be a "list"
 where ```__``` is the list separator; this allows multiple libraries to be
````
````diff
@@ -286,9 +286,9 @@ create two files:
 - ```EXPORTED_CLASSES```: This is a dictionary of dictionary describing the
 whitelisted classes. The outermost key is either a string or a tuple of
 strings and corresponds to the "module" name (it doesn't really have to be
-the module but the prefix of the full name of the whitelisted class)). The
+the module but the prefix of the full name of the whitelisted class). The
 inner key is a string and corresponds to the suffix of the whitelisted
-class. Finally, the value is the class that the class maps to internally. If
+class. Finally, the value is the class to which the class maps internally. If
 the outermost key is a tuple, all strings in that tuple will be considered
 aliases of one another.
 - ```EXPORTED_FUNCTIONS```: This is the same structure as
````
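To make the shape of `EXPORTED_CLASSES` concrete, here is a hypothetical configuration and a small resolver that expands tuple keys into aliases. The module names, the `resolve_exported_classes` helper, and the use of strings as the internal class values are assumptions for illustration only.

```python
from typing import Dict, Tuple, Union

# Outer key: prefix of the full class name (a tuple means aliased prefixes).
# Inner key: suffix of the whitelisted class name.
# Value: what the class maps to internally (strings here, by assumption).
EXPORTED_CLASSES: Dict[Union[str, Tuple[str, ...]], Dict[str, str]] = {
    ("data_accessor", "data_accessor_legacy"): {
        "Job": "data_accessor.Job",
        "client.Session": "data_accessor.client.Session",
    },
    "data_accessor.errors": {"AccessError": "data_accessor.errors.AccessError"},
}


def resolve_exported_classes(config) -> Dict[str, str]:
    """Flatten the nested mapping into {full_class_name: internal_class}."""
    flat: Dict[str, str] = {}
    for prefixes, classes in config.items():
        if isinstance(prefixes, str):
            prefixes = (prefixes,)
        # Every prefix in a tuple is treated as an alias of the others.
        for prefix in prefixes:
            for suffix, target in classes.items():
                flat[f"{prefix}.{suffix}"] = target
    return flat


flat = resolve_exported_classes(EXPORTED_CLASSES)
print(flat["data_accessor_legacy.Job"])  # data_accessor.Job
```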
````diff
@@ -324,7 +324,7 @@ create two files:
 define how attributes are accessed. Note that this is not restricted to
 attributes accessed using the ```getattr``` and ```setattr``` functions but
 any attribute. Both of these functions take as arguments ```stub```,
-```name``` and ```func``` which is the function to call to call the remote
+```name``` and ```func``` which is the function to call in order to call the remote
 ```getattr``` or ```setattr```. The ```setattr``` version takes an additional
 ```value``` argument. The remote versions simply take the target object and
 the name of the attribute (and ```value``` if it is a ```setattr``` override)
````

docs/cards.md

+9-9
````diff
@@ -29,7 +29,7 @@ Metaflow cards can be created by placing an [`@card` decorator](#@card-decorator

 Since the cards are stored in the datastore we can access them via the `view/get` commands in the [card_cli](#card-cli) or by using the `get_cards` [function](../metaflow/plugins/cards/card_client.py).

-Metaflow ships with a [DefaultCard](#defaultcard) which visualizes artifacts, images, and `pandas.Dataframe`s. Metaflow also ships custom components like `Image`, `Table`, `Markdown` etc. These can be added to a card at `Task` runtime. Cards can also be edited from `@step` code using the [current.card](#editing-metaflowcard-from-@step-code) interface. `current.card` helps add `MetaflowCardComponent`s from `@step` code to a `MetaflowCard`. `current.card` offers methods like `current.card.append` or `current.card['myid']` to helps add components to a card. Since there can be many `@card`s over a `@step`, `@card` also comes with an `id` argument. The `id` argument helps disambigaute the card a component goes to when using `current.card`. For example, setting `@card(id='myid')` and calling `current.card['myid'].append(x)` will append `MetaflowCardComponent` `x` to the card with `id='myid'`.
+Metaflow ships with a [DefaultCard](#defaultcard) which visualizes artifacts, images, and `pandas.Dataframe`s. Metaflow also ships custom components like `Image`, `Table`, `Markdown` etc. These can be added to a card at `Task` runtime. Cards can also be edited from `@step` code using the [current.card](#editing-metaflowcard-from-@step-code) interface. `current.card` helps add `MetaflowCardComponent`s from `@step` code to a `MetaflowCard`. `current.card` offers methods like `current.card.append` or `current.card['myid']` to helps add components to a card. Since there can be many `@card`s over a `@step`, `@card` also comes with an `id` argument. The `id` argument helps disambiguate the card a component goes to when using `current.card`. For example, setting `@card(id='myid')` and calling `current.card['myid'].append(x)` will append `MetaflowCardComponent` `x` to the card with `id='myid'`.

 ### `@card` decorator
 The `@card` [decorator](../metaflow/plugins/cards/card_decorator.py) is implemented by inheriting the `StepDecorator`. The decorator can be placed over `@step` to create an HTML file visualizing information from the task.
````
````diff
@@ -75,7 +75,7 @@ if __name__ == "__main__":


 ### `CardDatastore`
-The [CardDatastore](../metaflow/plugins/cards/card_datastore.py) is used by the the [card_cli](#card-cli) and the [metaflow card client](#access-cards-in-notebooks) (`get_cards`). It exposes methods to get metadata about a card and the paths to cards for a `pathspec`.
+The [CardDatastore](../metaflow/plugins/cards/card_datastore.py) is used by the [card_cli](#card-cli) and the [metaflow card client](#access-cards-in-notebooks) (`get_cards`). It exposes methods to get metadata about a card and the paths to cards for a `pathspec`.

 ### Card CLI
 Methods exposed by the [card_cli](../metaflow/plugins/cards/.card_cli.py). :
````
````diff
@@ -142,12 +142,12 @@ class CustomCard(MetaflowCard):

 The class consists of the `_get_mustache` method that returns [chevron](https://github.com/noahmorrison/chevron) object ( a `mustache` based [templating engine](http://mustache.github.io/mustache.5.html) ). Using the `mustache` templating engine you can rewrite HTML template file. In the above example the `PATH_TO_CUSTOM_HTML` is the file that holds the `mustache` HTML template.
 #### Attributes
-- `type (str)` : The `type` of card. Needs to ensure correct resolution.
-- `ALLOW_USER_COMPONENTS (bool)` : Setting this to `True` will make the a card be user editable. More information on user editable cards can be found [here](#editing-metaflowcard-from-@step-code).
+- `type (str)` : The `type` of card. Needs to ensure correct resolution.
+- `ALLOW_USER_COMPONENTS (bool)` : Setting this to `True` will make the card be user editable. More information on user editable cards can be found [here](#editing-metaflowcard-from-@step-code).

 #### `__init__` Parameters
 - `components` `(List[str])`: `components` is a list of `render`ed `MetaflowCardComponent`s created at `@step` runtime. These are passed to the `card create` cli command via a tempfile path in the `--component-file` argument.
-- `graph` `(Dict[str,dict])`: The DAG associated to the flow. It is a dictionary of the form `stepname:step_attributes`. `step_attributes` is a dictionary of metadata about a step , `stepname` is the name of the step in the DAG.
+- `graph` `(Dict[str,dict])`: The DAG associated to the flow. It is a dictionary of the form `stepname:step_attributes`. `step_attributes` is a dictionary of metadata about a step , `stepname` is the name of the step in the DAG.
 - `options` `(dict)`: helps control the behavior of individual cards.
 - For example, the `DefaultCard` supports `options` as dictionary of the form `{"only_repr":True}`. Here setting `only_repr` as `True` will ensure that all artifacts are serialized with `reprlib.repr` function instead of native object serialization.

````
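The `components` and `options` parameters can be illustrated with a small round trip: rendered components are `json.dump`ed to a temporary file (the kind of path passed via `--component-file`) and read back, and an `only_repr`-style option switches artifact serialization to `reprlib.repr`. `ToyMarkdown`, `write_component_file`, and `serialize_artifact` are hypothetical helpers, and plain `repr` stands in for "native object serialization".

```python
import json
import os
import reprlib
import tempfile
from typing import Any, Dict, List


class ToyMarkdown:
    """Hypothetical component whose render() returns a JSON-serializable dict."""

    def __init__(self, text: str) -> None:
        self.text = text

    def render(self) -> Dict[str, str]:
        return {"type": "markdown", "source": self.text}


def write_component_file(components: List[Any]) -> str:
    """Render every component and json.dump the list to a temp file; return its path."""
    rendered = [c.render() for c in components]
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(rendered, f)
    return path


def serialize_artifact(obj: Any, options: Dict[str, Any]) -> str:
    # reprlib.repr abbreviates large containers; plain repr stands in for
    # the full ("native") serialization in this sketch.
    if options.get("only_repr"):
        return reprlib.repr(obj)
    return repr(obj)


path = write_component_file([ToyMarkdown("# Hello"), ToyMarkdown("world")])
with open(path) as f:
    components = json.load(f)  # what a card-creating subprocess would read
os.remove(path)

print(components[0])  # {'type': 'markdown', 'source': '# Hello'}
print(serialize_artifact(list(range(100)), {"only_repr": True}))  # [0, 1, 2, 3, 4, 5, ...]
```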
````diff
@@ -201,7 +201,7 @@ class CustomCard(MetaflowCard):
 ```

 ### `DefaultCard`
-The [DefaultCard](../metaflow/plugins/cards/card_modules/basic.py) is a default card exposed by metaflow. This will be used when the `@card` decorator is called without any `type` argument or called with `type='default'` argument. It will also be the default card used with cli. The card uses a [HTML template](../metaflow/plugins/cards/card_modules/base.html) along with a [JS](../metaflow/plugins/cards/card_modules/main.js) and a [CSS](../metaflow/plugins/cards/card_modules/bundle.css) files.
+The [DefaultCard](../metaflow/plugins/cards/card_modules/basic.py) is a default card exposed by metaflow. This will be used when the `@card` decorator is called without any `type` argument or called with `type='default'` argument. It will also be the default card used with cli. The card uses an [HTML template](../metaflow/plugins/cards/card_modules/base.html) along with a [JS](../metaflow/plugins/cards/card_modules/main.js) and a [CSS](../metaflow/plugins/cards/card_modules/bundle.css) files.

 The [HTML](../metaflow/plugins/cards/card_modules/base.html) is a template which works with [JS](../metaflow/plugins/cards/card_modules/main.js) and [CSS](../metaflow/plugins/cards/card_modules/bundle.css).

````
````diff
@@ -237,17 +237,17 @@ def train(self):
 )
 self.next(self.end)
 ```
-In the above scenario there are two `@card` decorators which are being customized by `current.card`. The `current.card.append`/ `current.card['a'].append` methods only accepts objects which are subclasses of `MetaflowCardComponent`. The `current.card.append`/ `current.card['a'].append` methods only add a component to **one** card. Since there can be many cards for a `@step`, a **default editabled card** is resolved to disambiguate which card has access to the `append`/`extend` methods within the `@step`. A default editable card is a card that will have access to the `current.card.append`/`current.card.extend` methods. `current.card` resolve the default editable card before a `@step` code gets executed. It sets the default editable card once the last `@card` decorator calls the `task_pre_step` callback. In the above case, `current.card.append` will add a `Markdown` component to the card of type `default`. `current.card['a'].append` will add the `Markdown` to the `blank` card whose `id` is `a`. A `MetaflowCard` can be user editable, if `ALLOW_USER_COMPONENTS` is set to `True`. Since cards can be of many types, **some cards can also be non editable by users** (Cards with `ALLOW_USER_COMPONENTS=False`). Those cards won't be eligible to access the `current.card.append`. A non user editable card can be edited through expicitly setting an `id` and accessing it via `current.card['myid'].append` or by looking it up by its type via `current.card.get(type=’pytorch’)`.
+In the above scenario there are two `@card` decorators which are being customized by `current.card`. The `current.card.append`/ `current.card['a'].append` methods only accepts objects which are subclasses of `MetaflowCardComponent`. The `current.card.append`/ `current.card['a'].append` methods only add a component to **one** card. Since there can be many cards for a `@step`, a **default editable card** is resolved to disambiguate which card has access to the `append`/`extend` methods within the `@step`. A default editable card is a card that will have access to the `current.card.append`/`current.card.extend` methods. `current.card` resolve the default editable card before a `@step` code gets executed. It sets the default editable card once the last `@card` decorator calls the `task_pre_step` callback. In the above case, `current.card.append` will add a `Markdown` component to the card of type `default`. `current.card['a'].append` will add the `Markdown` to the `blank` card whose `id` is `a`. A `MetaflowCard` can be user editable, if `ALLOW_USER_COMPONENTS` is set to `True`. Since cards can be of many types, **some cards can also be non-editable by users** (Cards with `ALLOW_USER_COMPONENTS=False`). Those cards won't be eligible to access the `current.card.append`. A non-user editable card can be edited through explicitly setting an `id` and accessing it via `current.card['myid'].append` or by looking it up by its type via `current.card.get(type=’pytorch’)`.

 #### `current.card` (`CardComponentCollector`)

 The `CardComponentCollector` is the object responsible for resolving a `MetaflowCardComponent` to the card referenced in the `@card` decorator.

-Since there can be many cards, `CardComponentCollector` has a `_finalize` function. The `_finalize` function is called once the **last** `@card` decorator calls `task_pre_step`. The `_finalize` function will try to find the **default editable card** from all the `@card` decorators on the `@step`. The default editable card is the card that can access the `current.card.append`/`current.card.extend` methods. If there are multiple editable cards with no `id` then `current.card` will throw warnings when users call `current.card.append`. This is done because `current.card` cannot resolve which card the component belongs.
+Since there can be many cards, `CardComponentCollector` has a `_finalize` function. The `_finalize` function is called once the **last** `@card` decorator calls `task_pre_step`. The `_finalize` function will try to find the **default editable card** from all the `@card` decorators on the `@step`. The default editable card is the card that can access the `current.card.append`/`current.card.extend` methods. If there are multiple editable cards with no `id` then `current.card` will throw warnings when users call `current.card.append`. This is done because `current.card` cannot resolve which card the component belongs.

 The `@card` decorator also exposes another argument called `customize=True`. **Only one `@card` decorator over a `@step` can have `customize=True`**. Since cards can also be added from CLI when running a flow, adding `@card(customize=True)` will set **that particular card** from the decorator as default editable. This means that `current.card.append` will append to the card belonging to `@card` with `customize=True`. If there is more than one `@card` decorator with `customize=True` then `current.card` will throw warnings that `append` won't work.

-One important feature of the `current.card` object is that it will not fail. Even when users try to access `current.card.append` with multiple editable cards, we throw warnings but don't fail. `current.card` will also not fail when a user tries to access a card of a non-existing id via `current.card['mycard']`. Since `current.card['mycard']` gives reference to a `list` of `MetaflowCardComponent`s, `current.card` will return a non-referenced `list` when users try to access the dictionary inteface with a non existing id (`current.card['my_non_existant_card']`).
+One important feature of the `current.card` object is that it will not fail. Even when users try to access `current.card.append` with multiple editable cards, we throw warnings but don't fail. `current.card` will also not fail when a user tries to access a card of a non-existing id via `current.card['mycard']`. Since `current.card['mycard']` gives reference to a `list` of `MetaflowCardComponent`s, `current.card` will return a non-referenced `list` when users try to access the dictionary interface with a nonexistent id (`current.card['my_non_existant_card']`).

 Once the `@step` completes execution, every `@card` decorator will call `current.card._serialize` (`CardComponentCollector._serialize`) to get a JSON serializable list of `str`/`dict` objects. The `_serialize` function internally calls all [component's](#metaflowcardcomponent) `render` function. This list is `json.dump`ed to a `tempfile` and passed to the `card create` subprocess where the `MetaflowCard` can use them in the final output.

````
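The resolution rules described for `current.card` can be condensed into a toy collector. This is a simplified reading of the prose above, not Metaflow's `CardComponentCollector`: `CardSpec` and `ToyCollector` are hypothetical, strings stand in for components, and the fallback rule (when no card sets `customize=True`, only editable cards without an `id` are default candidates) is an assumption drawn from the example in the text.

```python
import warnings
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class CardSpec:
    """Hypothetical stand-in for the settings of one @card decorator."""

    card_type: str
    id: Optional[str] = None
    customize: bool = False
    editable: bool = True  # plays the role of ALLOW_USER_COMPONENTS


class ToyCollector:
    """Simplified sketch of the current.card resolution rules described above."""

    def __init__(self, cards: List[CardSpec]) -> None:
        self._cards = cards
        self._components: Dict[int, List[Any]] = {i: [] for i in range(len(cards))}
        self._default: Optional[int] = None
        self._finalize()

    def _finalize(self) -> None:
        # Only user-editable cards can become the default editable card.
        editable = [i for i, c in enumerate(self._cards) if c.editable]
        customized = [i for i in editable if self._cards[i].customize]
        # customize=True takes precedence; otherwise consider cards without an id.
        candidates = customized or [i for i in editable if self._cards[i].id is None]
        # Several candidates (ambiguous) or none: no default card.
        self._default = candidates[0] if len(candidates) == 1 else None

    def append(self, component: Any) -> None:
        if self._default is None:
            warnings.warn("No unique default editable card; component not added.")
            return
        self._components[self._default].append(component)

    def __getitem__(self, card_id: str) -> List[Any]:
        for i, card in enumerate(self._cards):
            if card.id == card_id:
                return self._components[i]  # works even for non-editable cards
        return []  # unknown id: a throwaway list, so callers never fail


cc = ToyCollector([CardSpec("default"), CardSpec("blank", id="a")])
cc.append("markdown-1")
cc["a"].append("markdown-2")
cc["missing"].append("ignored")  # no error, nothing stored
print(cc._components)  # {0: ['markdown-1'], 1: ['markdown-2']}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ToyCollector([CardSpec("default"), CardSpec("default")]).append("x")
print(len(caught))  # 1
```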
docs/concurrency.md

+3-3
````diff
@@ -29,7 +29,7 @@ Concurrency is practically never needed during the first two phases.

 We divide the concurrency constructs into two categories: Primary and
 Secondary. Whenever possible, you should prefer the constructs in
-the first category. The patterns are well established and they have
+the first category. The patterns are well established and have
 been used successfully in the core Metaflow modules, `runtime.py`
 and `task.py`. The constructs in the second category can be used in
 subprocesses, outside the core code paths in `runtime.py` and `task.py`.
````
````diff
@@ -109,7 +109,7 @@ delay, to avoid the parent from blocking.

 The sidecar subprocess may die for various reasons, in which case
 messages sent to it by the parent may be lost. To keep communication
-essentially non-blocking and fast, there is no blocking acklowdgement of
+essentially non-blocking and fast, there is no blocking acknowledgement of
 successful message processing by the sidecar. Hence the communication is
 lossy. In this sense, communication with a sidecar is more akin to UDP
 than TCP.
````
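The lossy, non-blocking send described here can be sketched on POSIX with a non-blocking pipe: if the sidecar is not draining its end (or has died), the parent drops the message instead of waiting. `send_nonblocking` is a hypothetical helper and the newline-terminated messages are an assumption; this is not Metaflow's sidecar implementation.

```python
import os


def send_nonblocking(fd: int, message: bytes) -> bool:
    """Write one small message without waiting; return False if it was dropped."""
    try:
        os.write(fd, message)
        return True
    except (BlockingIOError, BrokenPipeError):
        # Pipe full (sidecar too slow) or read end closed (sidecar died):
        # the message is lost, much like a dropped UDP datagram.
        return False


read_fd, write_fd = os.pipe()
os.set_blocking(write_fd, False)

# Nothing reads from read_fd, simulating a stalled sidecar. Messages are kept
# well under PIPE_BUF so each write is all-or-nothing rather than partial.
message = b"x" * 100 + b"\n"
sent = dropped = 0
for _ in range(10_000):
    if send_nonblocking(write_fd, message):
        sent += 1
    else:
        dropped += 1

os.close(read_fd)
os.close(write_fd)
print(sent > 0, dropped > 0)  # True True
```

The parent never blocks, which is the point: it trades delivery guarantees for a bounded cost per message, matching the best-effort role the text assigns to sidecars.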
````diff
@@ -139,7 +139,7 @@ Use a sidecar if you need a task that runs during scheduling or
 execution of user code. A sidecar task can not perform any critical
 operations that must succeed in order for a task or a run to be
 considered valid. This makes sidecars suitable only for opportunistic,
-best effort tasks.
+best-effort tasks.

 ### 3. Data Parallelism

````