diff --git a/README.md b/README.md
index 61ba162..ba4b9a5 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@ Data and code for this research project are distributed across different places.
 ## The Lexicon
-We created emotion lexicons for 91 languages, each one covers eight emotional variables and comprises over 100k word entries. There are several versions of the lexicons, the difference being the choice of the expansion model: There is a linear regression baseline and three versions of neural network models. The *main* version of our lexicons (the version we refer to in the main experiments of our paper and the one we would recommend to use) is referred to as as **MTL_grouped** (applying multi-task learning within two groups of our target variables). **If you are mainly interested in our lexicons, download [this](https://zenodo.org/record/3756607/files/MTL_grouped.zip?download=1) zip file.** It contains 91 tsv files which are named `.tsv` . Please refer to the [description of the Zenodo record](https://doi.org/10.5281/zenodo.3756606) for more details.
+We created emotion lexicons for 91 languages, each one covers eight emotional variables and comprises over 100k word entries. There are several versions of the lexicons, the difference being the choice of the expansion model: There is a linear regression baseline and three versions of neural network models. The *main* version of our lexicons (the version we refer to in the main experiments of our paper and the one we would recommend to use) is referred to as as **MTL_grouped** (applying multi-task learning within two groups of our target variables). **If you are mainly interested in our lexicons, download [this](https://zenodo.org/record/3756607/files/MTL_grouped.zip?download=1) zip file.** It contains 91 tsv files which are named `.tsv`. Please refer to the [description of the Zenodo record](https://doi.org/10.5281/zenodo.3756606) for more details.
@@ -60,7 +60,7 @@ Recreating the lexicons from scratch requires the Source lexicon, data splits, a
 * Get the file [Ratings_Warriner_et_al.csv](https://github.com/JULIELab/XANEW/blob/master/Ratings_Warriner_et_al.csv) (commit b1ed97e from 11 Nov 2019) and place it in `/memolon/data/Source`.
 * Get the file [Warriner_BE.tsv](https://github.com/JULIELab/EmoMap/blob/master/coling18/main/lexicon_creation/lexicons/Warriner_BE.tsv) (commit dbfa3b9 from 15 Jun 2018) and place it in ``/memolon/data/Source``.
-
+
 The python scripts for creating the lexicons can be found in `/memolon/src`. You can either `cd` there and simply run `run_all.sh` or follow the more detailed instructions below. Please take note that the whole process may take several hours. **You do not have to have a GPU to run our code in a reasonable amount of time.**
@@ -96,23 +96,23 @@ Running the gold evaluation and related analyses requires you to manually collec
  and place it in `/memolon/data/TargetGold`.
  * es1. Get the file `Redondo(2007).xls` from [Redondo et al. (2007)](https://doi.org/10.3758/BF03193031) and place it `/memolon/data/TargetGold`.
  * es2. Get the file `13428_2015_700_MOESM1_ESM.csv` from [Stadthagen-Gonzalez et al. (2017)](https://doi.org/10.3758/BF03192999) and save it as `/memolon/data/TargetGold/Stadthagen_VA.csv`
- * es3. Get the file `Hinojosa et al_Supplementary materials.xlsx` from [Hinojosa et al., (2015)](https://link.springer.com/article/10.3758%2Fs13428-015-0572-5) and place it in `/memolon/data/TargetGold`.
+ * es3. Get the file `Hinojosa et al_Supplementary materials.xlsx` from [Hinojosa et al. (2015)](https://link.springer.com/article/10.3758%2Fs13428-015-0572-5) and place it in `/memolon/data/TargetGold`.
  * es4. Included in the download for es3.
  * es5. Get the file `13428_2017_962_MOESM1_ESM.csv` from [Stadthagen-Gonzalez et al. (2018)](https://doi.org/10.3758/s13428-017-0962-y) and save it as `/memolon/data/TargetGold/Stadthagen_BE.csv`.
- * es6. Get the file `13428_2016_768_MOESM1_ESM.xls` from [Ferre et al. (2017)](https://doi.org/10.3758/s13428-016-0768-3) ad save it as `/memolon/data/TargetGold/Ferre.xlsx`.
- * de1. Get the file `13428_2013_426_MOESM1_ESM.xlsx` from [Schmidtke et al., 2014](https://doi.org/10.3758/s13428-013-0426-y) and save it as `/memolon/data/TargetGold/Schmidtke.xlsx`
+ * es6. Get the file `13428_2016_768_MOESM1_ESM.xls` from [Ferré et al. (2017)](https://doi.org/10.3758/s13428-016-0768-3) ad save it as `/memolon/data/TargetGold/Ferre.xlsx`.
+ * de1. Get the file `13428_2013_426_MOESM1_ESM.xlsx` from [Schmidtke et al. (2014)](https://doi.org/10.3758/s13428-013-0426-y) and save it as `/memolon/data/TargetGold/Schmidtke.xlsx`
  * de2. Get the file `BAWL-R.xls` from [Vo et al. (2009)](https://doi.org/10.3758/BRM.41.2.534) which is currently available [here](https://www.ewi-psy.fu-berlin.de/einrichtungen/arbeitsbereiche/allgpsy/Download/BAWL/index.html). You will need to request a password from the authors. Save the file **without password** as `/memolon/data/TargetGold/BAWL-R.xls`. We had to run an automatic file repair when oping it with Excel for the first time.
  * de3. Get the file `LANG_database.txt` from [Kaske and Kotz (2010)](https://doi.org/10.3758/BRM.42.4.987) and place it `/memolon/data/TargetGold`.
  * de4. Get de2 (see above). Then, get the file `13428_2011_59_MOESM1_ESM.xls` from [Briesemeister et al. (2011)](https://doi.org/10.3758/s13428-011-0059-y) and save it as `/memolon/data/TargetGold/Briesemeister.xls`.
- * pl1. Get the file `data sheet 1.xlsx` from [Imbir 2016](https://doi.org/10.3389/fpsyg.2016.01081) and save it as `/memolon/data/TargetGold/Imbir.xlsx`.
+ * pl1. Get the file `data sheet 1.xlsx` from [Imbir (2016)](https://doi.org/10.3389/fpsyg.2016.01081) and save it as `/memolon/data/TargetGold/Imbir.xlsx`.
  * pl2. Get the file `13428_2014_552_MOESM1_ESM.xlsx` from [Riegel et al. (2015)](https://doi.org/10.3758/s13428-014-0552-1) and save it as `/memolon/data/TargetGold/Riegel.xlsx`
  * pl3. Get pl2 (see above). Then, get the file `S1 Dataset` from [Wierzba et al. (2015)](https://doi.org/10.1371/journal.pone.0132305) and save it as `/memolon/data/TargetGold/Wierzba.xlsx`.
- * zh1. Get CVAW 2.0 from [Yu et al. 2016](https://doi.org/10.18653/v1/N16-1066) which is distributed via
+ * zh1. Get CVAW 2.0 from [Yu et al. (2016)](https://doi.org/10.18653/v1/N16-1066) which is distributed via
  [this website](http://nlp.innobic.yzu.edu.tw/resources/cvaw.html). Use Google Translate to 'translate' the words in `cvaw2.csv` from traditional to simplified Chinese characters (you can batch-translate by copy-pasting multiple words separated by newline directly from the file). Save the modified file as `/memolon/data/TargetGold/cvaw2_simplied.csv`.
- * zh2. Get the file `13428_2016_793_MOESM2_ESM.pdf` from [Yao et al. 2017](https://doi.org/10.3758/s13428-016-0793-2). Convert PDF to Excel (there are online tools for that but check the results for correctness) and save as `/memolon/data/TargetGold/Yao.xlsx`.
+ * zh2. Get the file `13428_2016_793_MOESM2_ESM.pdf` from [Yao et al. (2017)](https://doi.org/10.3758/s13428-016-0793-2). Convert PDF to Excel (there are online tools for that but check the results for correctness) and save as `/memolon/data/TargetGold/Yao.xlsx`.
  * it. Get the data from [Montefinese et al. (2014)](https://doi.org/10.3758/s13428-013-0405-3). The website offers a PDF version of the ratings. However, the formatting makes it very difficult to process automatically. Instead, the first author Maria Montefinese provided us with an Excel version. Save the ratings as `/memolon/data/TargetGold/Montefinese.xls`.
  * pt. Get the file `13428_2011_131_MOESM1_ESM.xls` from [Soares et al. (2012)](https://doi.org/10.3758/s13428-011-0131-7).
@@ -120,7 +120,7 @@ Running the gold evaluation and related analyses requires you to manually collec
  * nl. Get the file `13428_2012_243_MOESM1_ESM.xlsx` from [Moors et al. (2013)](https://doi.org/10.3758/s13428-012-0243-8). Save it as `/memolon/data/TargetGold/Moors.xlsx`.
  * id. Get the file `Data Sheet 1.XLSX` from [Sianipar et al. (2016)](https://doi.org/10.3389/fpsyg.2016.01907). Save it as `/memolon/data/TargetGold/Sianipar.xlsx`
- * el. Get the data from [Palogiannidi et a. (2016)](https://www.aclweb.org/anthology/L16-1458): We downloaded the ratings via the [link](www.telecom.tuc.gr/~epalogiannidi/docs/resources/greek_affective_lexicon.zip)
+ * el. Get the data from [Palogiannidi et al. (2016)](https://www.aclweb.org/anthology/L16-1458): We downloaded the ratings via the [link](www.telecom.tuc.gr/~epalogiannidi/docs/resources/greek_affective_lexicon.zip)
  provided in the paper on March 13, 2018. The link pointed to zip containing a single file `greek_affective_lexicon.csv` which we saved under `/memolon/data/TargetGold`. However, the original link does not work anymore (as of April 22, 2020). We recommend contacting the authors for a replacement.
  * tr1. Get the file `TurkishEmotionalWordNorms.csv` from [Kapucu et al. (2018)](https://doi.org/10.1177/0033294118814722) which is available [here](https://osf.io/rxtdm/). Place it under `/memolon/data/TargetGold`.
diff --git a/memolon/data/TargetPred/.gitignore b/memolon/data/TargetPred/.gitignore
index c96a04f..35f0812 100644
--- a/memolon/data/TargetPred/.gitignore
+++ b/memolon/data/TargetPred/.gitignore
@@ -1,2 +1,6 @@
 *
-!.gitignore
\ No newline at end of file
+!.gitignore
+!MTL_all
+!MTL_grouped
+!ridge
+!STL
diff --git a/memolon/data/TargetPred/MTL_all/.gitignore b/memolon/data/TargetPred/MTL_all/.gitignore
new file mode 100755
index 0000000..c96a04f
--- /dev/null
+++ b/memolon/data/TargetPred/MTL_all/.gitignore
@@ -0,0 +1,2 @@
+*
+!.gitignore
\ No newline at end of file
diff --git a/memolon/data/TargetPred/MTL_grouped/.gitignore b/memolon/data/TargetPred/MTL_grouped/.gitignore
new file mode 100755
index 0000000..c96a04f
--- /dev/null
+++ b/memolon/data/TargetPred/MTL_grouped/.gitignore
@@ -0,0 +1,2 @@
+*
+!.gitignore
\ No newline at end of file
diff --git a/memolon/data/TargetPred/STL/.gitignore b/memolon/data/TargetPred/STL/.gitignore
new file mode 100644
index 0000000..c96a04f
--- /dev/null
+++ b/memolon/data/TargetPred/STL/.gitignore
@@ -0,0 +1,2 @@
+*
+!.gitignore
\ No newline at end of file
diff --git a/memolon/data/TargetPred/ridge/.gitignore b/memolon/data/TargetPred/ridge/.gitignore
new file mode 100644
index 0000000..c96a04f
--- /dev/null
+++ b/memolon/data/TargetPred/ridge/.gitignore
@@ -0,0 +1,2 @@
+*
+!.gitignore
\ No newline at end of file
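
Note on the README hunks above: the gold data has to be collected by hand and saved under specific names in `/memolon/data/TargetGold`, so a quick presence check before running the evaluation may save a failed run. The following is only a minimal sketch and not part of the repository. The helper name `missing_gold_files`, the constant `EXPECTED_GOLD_FILES`, and the relative path used in the `__main__` block are assumptions made for illustration. The list contains only the target file names spelled out in the hunks shown in this diff (the `pt` entry is omitted because its target name is cut off at the hunk boundary), so it is probably incomplete.

```python
from pathlib import Path

# File names taken from the README lines visible in this diff.
# Assumption: the full README lists further files not shown in these hunks.
EXPECTED_GOLD_FILES = [
    "Redondo(2007).xls",
    "Stadthagen_VA.csv",
    "Hinojosa et al_Supplementary materials.xlsx",
    "Stadthagen_BE.csv",
    "Ferre.xlsx",
    "Schmidtke.xlsx",
    "BAWL-R.xls",
    "LANG_database.txt",
    "Briesemeister.xls",
    "Imbir.xlsx",
    "Riegel.xlsx",
    "Wierzba.xlsx",
    "cvaw2_simplied.csv",
    "Yao.xlsx",
    "Montefinese.xls",
    "Moors.xlsx",
    "Sianipar.xlsx",
    "greek_affective_lexicon.csv",
    "TurkishEmotionalWordNorms.csv",
]


def missing_gold_files(gold_dir, expected=EXPECTED_GOLD_FILES):
    """Return the expected file names that are not present in gold_dir."""
    gold_dir = Path(gold_dir)
    return [name for name in expected if not (gold_dir / name).is_file()]


if __name__ == "__main__":
    # Hypothetical location: run from the repository root.
    missing = missing_gold_files("memolon/data/TargetGold")
    if missing:
        print("Missing gold files:")
        for name in missing:
            print("  " + name)
    else:
        print("All listed gold files are in place.")
```

The check only tests for file presence under the expected name. It does not verify file contents, for example whether `BAWL-R.xls` was saved without its password or whether the PDF-to-Excel conversion for `Yao.xlsx` is correct.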