Commit

Grammer and language (#1024)
* Edit grammer and count

* update runtime.txt

* updated the file

* fixed some typos, formatted document
ArshErgon authored Jul 28, 2022
1 parent 7ed7ac7 commit bcb5667
Showing 22 changed files with 123 additions and 80 deletions.
26 changes: 13 additions & 13 deletions CODE_OF_CONDUCT.md
@@ -14,22 +14,22 @@ appearance, race, religion, or sexual identity and orientation.
Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
- The use of sexualized language or imagery and unwelcome sexual attention or
advances
- Trolling, insulting/derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or electronic
address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a
professional setting

## Our Responsibilities

34 changes: 21 additions & 13 deletions CONTRIBUTING.md
@@ -1,19 +1,22 @@
# Contributing to examples

We want to make contributing to this project as easy and transparent as
possible.

## Pull Requests

We actively welcome your pull requests.

If you're new we encourage you to take a look at issues tagged with [good first issue](https://github.com/pytorch/examples/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)

### For new examples
0. Create a github issue proposing a new example and make sure it's substantially different from an existing one

0. Create a GitHub issue proposing a new example and make sure it's substantially different from an existing one.
1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests to `run_python_examples.sh`
2. If you've added code that should be tested, add tests to `run_python_examples.sh`.
3. Create a `README.md`.
4. Add a card with a brief description of your example and link to the repo to
the `docs/source/index.rst` file and build the docs by running:
the `docs/source/index.rst` file and build the docs by running:

```
cd docs
@@ -22,34 +25,39 @@ If you're new we encourage you to take a look at issues tagged with [good first
pip install -r requirements.txt
make html
```

When done working with `virtualenv`, run `deactivate`.

5. Verify that there are no issues in your doc build. You can check preview locally
5. Verify that there are no issues in your doc build. You can check the preview locally
by installing [sphinx-serve](https://pypi.org/project/sphinx-serve/) and
then running `sphinx-serve -b build`.

5. Ensure your test passes locally.
6. If you haven't already, complete the Contributor License Agreement ("CLA").
7. Address any feedback in code review promptly.
6. Ensure your test passes locally.
7. If you haven't already, complete the Contributor License Agreement ("CLA").
8. Address any feedback in code review promptly.

## For bug fixes

1. Fork the repo and create your branch from `main`.
2. Make sure you have a GPU-enabled machine, either locally or in the cloud. `g4dn.4xlarge` is a good starting point on AWS.
3. Make your code change.
2. Make sure you have a GPU-enabled machine, either locally or in the cloud. `g4dn.4xlarge` is a good starting point on AWS.
3. Make your code change.
4. First, install all dependencies with `./run_python_examples.sh "install_deps"`.
5. Then make sure that `./run_python_examples.sh` passes locally by running script end to end.
5. Then make sure that `./run_python_examples.sh` passes locally by running the script end to end.
6. If you haven't already, complete the Contributor License Agreement ("CLA").
7. Address any feedback in code review promptly.


## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need

To accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Facebook's open source projects.

Complete your CLA here: <https://code.facebook.com/cla>

## Issues

We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.

## License

By contributing to examples, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
12 changes: 6 additions & 6 deletions README.md
@@ -4,13 +4,13 @@

https://pytorch.org/examples/

`pytorch/examples` is a repository showcasing examples of using [PyTorch](https://github.com/pytorch/pytorch). The goal is to have curated, short, few/no dependencies *high quality* examples that are substantially different from each other that can be emulated in your existing work.
`pytorch/examples` is a repository showcasing examples of using [PyTorch](https://github.com/pytorch/pytorch). The goal is to have curated, short, few/no dependencies _high quality_ examples that are substantially different from each other that can be emulated in your existing work.

* For tutorials: https://github.com/pytorch/tutorials
* For changes to pytorch.org: https://github.com/pytorch/pytorch.github.io
* For a general model hub: https://pytorch.org/hub/ or https://huggingface.co/models
* For recipes on how to run PyTorch in production: https://github.com/facebookresearch/recipes
* For general Q&A and support: https://discuss.pytorch.org/
- For tutorials: https://github.com/pytorch/tutorials
- For changes to pytorch.org: https://github.com/pytorch/pytorch.github.io
- For a general model hub: https://pytorch.org/hub/ or https://huggingface.co/models
- For recipes on how to run PyTorch in production: https://github.com/facebookresearch/recipes
- For general Q&A and support: https://discuss.pytorch.org/

## Available models

Expand Down
2 changes: 1 addition & 1 deletion cpp/autograd/README.md
@@ -12,7 +12,7 @@ $ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
$ make
```

where `/path/to/libtorch` should be the path to the unzipped *LibTorch*
where `/path/to/libtorch` should be the path to the unzipped _LibTorch_
distribution, which you can get from the [PyTorch
homepage](https://pytorch.org/get-started/locally/).

6 changes: 4 additions & 2 deletions cpp/custom-dataset/README.md
@@ -20,9 +20,10 @@ $ make

where /path/to/libtorch should be the path to the unzipped LibTorch distribution, which you can get from the [PyTorch homepage](https://pytorch.org/get-started/locally/).

if you see an error like ```undefined reference to cv::imread(std::string const&, int)``` when running the ```make``` command, you should build LibTorch from source using the instructions [here](https://github.com/pytorch/pytorch#from-source), and then set ```CMAKE_PREFIX_PATH``` to that PyTorch source directory.
If you see an error like `undefined reference to cv::imread(std::string const&, int)` when running the `make` command, you should build LibTorch from source using the instructions [here](https://github.com/pytorch/pytorch#from-source), and then set `CMAKE_PREFIX_PATH` to that PyTorch source directory.

The build directory should look like this:

```
.
├── custom-dataset
@@ -38,9 +39,10 @@ The build directory should look like this:
└── ...
```

```info.txt``` file gets copied from source directory during build.
The `info.txt` file gets copied from the source directory during the build.

Execute the compiled binary to train the model:

```shell
./custom-dataset
Running on: CUDA
2 changes: 1 addition & 1 deletion cpp/dcgan/README.md
@@ -15,7 +15,7 @@ $ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
$ make
```

where `/path/to/libtorch` should be the path to the unzipped *LibTorch*
where `/path/to/libtorch` should be the path to the unzipped _LibTorch_
distribution, which you can get from the [PyTorch
homepage](https://pytorch.org/get-started/locally/).

1 change: 0 additions & 1 deletion cpp/distributed/README.md
@@ -23,4 +23,3 @@ To run the code,
```shell
mpirun -np {NUM-PROCS} ./dist-mnist
```

2 changes: 1 addition & 1 deletion cpp/mnist/README.md
@@ -15,7 +15,7 @@ $ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
$ make
```

where `/path/to/libtorch` should be the path to the unzipped *LibTorch*
where `/path/to/libtorch` should be the path to the unzipped _LibTorch_
distribution, which you can get from the [PyTorch
homepage](https://pytorch.org/get-started/locally/).

2 changes: 1 addition & 1 deletion cpp/regression/README.md
@@ -12,7 +12,7 @@ $ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
$ make
```

where `/path/to/libtorch` should be the path to the unzipped *LibTorch*
where `/path/to/libtorch` should be the path to the unzipped _LibTorch_
distribution, which you can get from the [PyTorch
homepage](https://pytorch.org/get-started/locally/).

2 changes: 1 addition & 1 deletion cpp/transfer-learning/README.md
@@ -17,4 +17,4 @@ For **prediction**:
1. `cd build`
2. `./classify <path_image> <path_to_resnet18_model_without_fc_layer> <model_linear_trained>` : `./classify <path_image> ../resnet18_without_last_layer.pt model_linear.pt`

Detailed blog on applying Transfer Learning using Libtorch: https://krshrimali.github.io/Applying-Transfer-Learning-Dogs-Cats/.
Detailed blog on applying Transfer Learning using Libtorch: https://krshrimali.github.io/Applying-Transfer-Learning-Dogs-Cats/.
3 changes: 3 additions & 0 deletions dcgan/README.md
@@ -10,12 +10,15 @@ with the samples from the generative model.
After every epoch, models are saved to: `netG_epoch_%d.pth` and `netD_epoch_%d.pth`
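
As a quick, hypothetical sanity check (not part of the example itself), a saved generator checkpoint can be reloaded and inspected. The sketch below assumes the file holds a `state_dict`, which is how the checkpoints above are written; the epoch number in the path is a placeholder.

```py
import torch

# Hypothetical file name; substitute the epoch you actually trained to.
state = torch.load("netG_epoch_24.pth", map_location="cpu")

# A state_dict maps parameter names to tensors, so the layers and their
# shapes can be listed without rebuilding the model.
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
```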

## Downloading the dataset

You can download the LSUN dataset by cloning [this repo](https://github.com/fyu/lsun) and running

```
python download.py -c bedroom
```

## Usage

```
usage: main.py [-h] --dataset DATASET --dataroot DATAROOT [--workers WORKERS]
[--batchSize BATCHSIZE] [--imageSize IMAGESIZE] [--nz NZ]
24 changes: 22 additions & 2 deletions distributed/ddp/README.md
@@ -6,7 +6,8 @@ multiple nodes, each with multiple GPUs using PyTorch's distributed
[launcher script](https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py).

# Prerequisites
We assume you are familiar with [PyTorch](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html), the primitives it provides for [writing distributed applications](https://pytorch.org/tutorials/intermediate/dist_tuto.html) as well as training [distributed models](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html).

We assume you are familiar with [PyTorch](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html), the primitives it provides for [writing distributed applications](https://pytorch.org/tutorials/intermediate/dist_tuto.html) as well as training [distributed models](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html).

The example program in this tutorial uses the
[`torch.nn.parallel.DistributedDataParallel`](https://pytorch.org/docs/stable/nn.html#distributeddataparallel) class for training models
@@ -20,6 +21,7 @@ application but each one operates on different portions of the
training dataset.
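
One common way to give each process its own portion of the data is `torch.utils.data.distributed.DistributedSampler`. The sketch below is a hypothetical illustration, not part of this example's code, and assumes the process group has already been initialized.

```py
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset standing in for real training data.
dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))

# DistributedSampler reads the rank and world size from the initialized
# process group and gives each process a disjoint shard of indices.
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(2):
    sampler.set_epoch(epoch)  # reshuffle the shards differently every epoch
    for features, labels in loader:
        pass  # the DDP model's forward/backward pass would go here
```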

# Application process topologies

A Distributed Data Parallel (DDP) application can be executed on
multiple nodes where each node can consist of multiple GPU
devices. Each node in turn can run multiple copies of the DDP
@@ -49,6 +51,7 @@ computational costs. In the rest of this tutorial, we assume that the
application follows this heuristic.

# Preparing and launching a DDP application

Independent of how a DDP application is launched, each process needs a
mechanism to know its global and local ranks. Once this is known, all
processes create a `ProcessGroup` that enables them to participate in
@@ -66,26 +69,32 @@ python -c "from os import path; import torch; print(path.join(path.dirname(torch
```

This will print something like this:

```sh
/home/username/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/distributed/launch.py
```

When the DDP application is started via `launch.py`, it passes the world size, global rank, master address and master port via environment variables and the local rank as a command-line parameter to each instance.
To use the launcher, an application needs to adhere to the following convention:

1. It must provide an entry-point function for a _single worker_. For example, it should not launch subprocesses using `torch.multiprocessing.spawn`
2. It must use environment variables for initializing the process group.

For simplicity, the application can assume each process maps to a single GPU but in the next section we also show how a more general process-to-GPU mapping can be performed.
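
As a rough, hypothetical sketch of this convention (the actual sample follows in the next section), a worker entry point can read the rank and world size from the environment variables set by `launch.py`, while `init_process_group` picks up `MASTER_ADDR`/`MASTER_PORT` from the environment as well; the function name `worker_main` is illustrative only.

```py
import argparse
import os

import torch.distributed as dist


def worker_main():
    # launch.py passes the local rank as a command-line argument...
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args()

    # ...and the rendezvous information as environment variables.
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])

    # The default env:// init method also reads MASTER_ADDR and MASTER_PORT.
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes
    print(f"rank {rank}/{world_size} up, local_rank={args.local_rank}")
    dist.destroy_process_group()


if __name__ == "__main__":
    worker_main()
```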

# Sample application

The sample DDP application in this repo is based on the "Hello, World" [DDP tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html).

## Argument passing convention

The DDP application takes two command-line arguments:

1. `--local_rank`: This is passed in via `launch.py`
2. `--local_world_size`: This is passed in explicitly and is typically either $1$ or the number of GPUs per node.

The application parses these and calls the `spmd_main` entrypoint:

```py
if __name__ == "__main__":
parser = argparse.ArgumentParser()
@@ -94,7 +103,9 @@
args = parser.parse_args()
spmd_main(args.local_world_size, args.local_rank)
```

In `spmd_main`, the process group is initialized with just the backend (NCCL or Gloo). The rest of the information needed for rendezvous comes from environment variables set by `launch.py`:

```py
def spmd_main(local_world_size, local_rank):
# These are the parameters used to initialize the process group
@@ -116,6 +127,7 @@ def spmd_main(local_world_size, local_rank):
```

Given the local rank and world size, the training function `demo_basic` initializes the `DistributedDataParallel` model across a set of GPUs local to the node via `device_ids`:

```py
def demo_basic(local_world_size, local_rank):

@@ -144,10 +156,13 @@ def demo_basic(local_world_size, local_rank):
```
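
A minimal sketch of what such a function can look like is shown below (simplified, not the repository's exact code); it assumes the number of visible GPUs divides evenly by `local_world_size`.

```py
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def demo_basic_sketch(local_world_size, local_rank):
    # Carve the node's GPUs into equal, contiguous slices, one per process;
    # with one process per GPU this reduces to a single-element list.
    n = torch.cuda.device_count() // local_world_size
    device_ids = list(range(local_rank * n, (local_rank + 1) * n))

    # Place the model on the first GPU of this process's slice and wrap it.
    model = nn.Linear(10, 10).to(device_ids[0])
    ddp_model = DDP(model, device_ids=device_ids)

    # One dummy optimization step exercising forward, backward and all-reduce.
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.001)
    optimizer.zero_grad()
    outputs = ddp_model(torch.randn(20, 10).to(device_ids[0]))
    outputs.sum().backward()
    optimizer.step()
```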

The application can be launched via `launch.py` as follows on an 8-GPU node with one process per GPU:

```sh
python /path/to/launch.py --nnode=1 --node_rank=0 --nproc_per_node=8 example.py --local_world_size=8
```

and produces an output similar to the one shown below:

```sh
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
@@ -177,16 +192,21 @@ Setting OMP_NUM_THREADS environment variable for each process to be 1 in default
[238631] rank = 4, world_size = 8, n = 1, device_ids = [4]
[238627] rank = 0, world_size = 8, n = 1, device_ids = [0]
```
Similarly, it can be launched with a single process that spans all 8 GPUs using:
```sh
python /path/to/launch.py --nnode=1 --node_rank=0 --nproc_per_node=1 example.py --local_world_size=1
```
which in turn produces the following output:
```sh
[262816] Initializing process group with: {'MASTER_ADDR': '127.0.0.1', 'MASTER_PORT': '29500', 'RANK': '0', 'WORLD_SIZE': '1'}
[262816]: world_size = 1, rank = 0, backend=nccl
[262816] rank = 0, world_size = 1, n = 8, device_ids = [0, 1, 2, 3, 4, 5, 6, 7]
```
# Conclusions
As the author of a distributed data parallel application, your code needs to be aware of two types of resources: compute nodes and the GPUs within each node. The process of setting up bookkeeping to track how the set of GPUs is mapped to the processes of your application can be tedious and error-prone. We hope that by structuring your application as shown in this example and using the launcher, the mechanics of setting up distributed training can be significantly simplified.
As the author of a distributed data parallel application, you need to make your code aware of two types of resources: compute nodes and the GPUs within each node. The process of setting up bookkeeping to track how the set of GPUs is mapped to the processes of your application can be tedious and error-prone. We hope that by structuring your application as shown in this example and using the launcher, the mechanics of setting up distributed training can be significantly simplified.
22 changes: 11 additions & 11 deletions distributed/rpc/batch/README.md
@@ -3,15 +3,15 @@
This folder contains two examples for [`@rpc.functions.async_execution`](https://pytorch.org/docs/master/rpc.html#torch.distributed.rpc.functions.async_execution):

1. Synchronized Batch Update Parameter Server: uses `@rpc.functions.async_execution`
for parameter update and retrieving. This serves as a simple starter example
for batch RPC.
```
pip install -r requirements.txt
python parameter_server.py
```
for parameter update and retrieving. This serves as a simple starter example
for batch RPC.
```
pip install -r requirements.txt
python parameter_server.py
```
2. Multi-Observer with Batch-Processing Agent: uses `@rpc.functions.async_execution`
to run multiple observed states through the policy to get actions.
```
pip install -r requirements.txt
python reinforce.py
```
to run multiple observed states through the policy to get actions.
```
pip install -r requirements.txt
python reinforce.py
```
2 changes: 1 addition & 1 deletion distributed/rpc/parameter_server/README.md
@@ -1,6 +1,6 @@
### RPC-based distributed training

This is a basic example of RPC-based training that uses several trainers remotely train a model hosted on a server.
This is a basic example of RPC-based training that uses several trainers to remotely train a model hosted on a server.

To run the example locally, run the following command for the server and for each worker you wish to spawn, in separate terminal windows:
`python rpc_parameter_server.py --world_size=WORLD_SIZE --rank=RANK`. For example, for a master node with a world size of 2, the command would be `python rpc_parameter_server.py --world_size=2 --rank=0`. The trainer can then be launched with the command `python rpc_parameter_server.py --world_size=2 --rank=1` in a separate window, and this will begin training with one server and a single trainer.
2 changes: 1 addition & 1 deletion distributed/rpc/rnn/README.md
@@ -1,6 +1,6 @@
Distributed RNN Model Parallel Example

This example shows how to build an RNN model using RPC where different
This example shows how to build an RNN model using RPC where different
components of the RNN model can be placed on different workers.

```
3 changes: 0 additions & 3 deletions distributed/sharded_tensor/README.md
@@ -8,13 +8,10 @@ PyTorch native sharding APIs, which include:
3. An E2E demo of tensor parallelism for a given toy model (forward/backward + optimization).
4. API to optimize parameters when they are `ShardedTensor`s.


More details about the design can be found at:
https://github.com/pytorch/pytorch/issues/72138


```
pip install -r requirements.txt
python main.py
```
