diff --git a/README.md b/README.md
index 2d0d85d7..2e8a5054 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@ unzip the zipfile and put all `.pt` files into `bert_data`
 ### Option 2: process the data yourself
 
 #### Step 1 Download Stories
-Download and unzip the `stories` directories from [here](http://cs.nyu.edu/~kcho/DMQA/) for both CNN and Daily Mail. Put all `.stroy` files in one directory (e.g. `../raw_stories`)
+Download and unzip the `stories` directories from [here](http://cs.nyu.edu/~kcho/DMQA/) for both CNN and Daily Mail. Put all `.story` files in one directory (e.g. `../raw_stories`)
 
 #### Step 2. Download Stanford CoreNLP
 We will need Stanford CoreNLP to tokenize the data. Download it [here](https://stanfordnlp.github.io/CoreNLP/) and unzip it. Then add the following command to your bash_profile:
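
The hunk ends just before the command the README tells you to add to your bash_profile. For context only (not part of this patch), a minimal sketch of what that line typically looks like, assuming a CoreNLP archive unzipped somewhere on disk; the exact directory and jar version depend on the release you downloaded:

```bash
# Hypothetical example: put the CoreNLP jar on Java's CLASSPATH so the
# tokenizer can be invoked. Adjust the path and version to your download.
export CLASSPATH=/path/to/stanford-corenlp-full-2018-10-05/stanford-corenlp-3.9.2.jar
```

After adding it, start a new shell (or `source ~/.bash_profile`) so the tokenization step can find the jar.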