Commit d8dbd51 (1 parent: 43afc5a). Showing 1 changed file with 23 additions and 22 deletions.
@@ -12,6 +12,27 @@ The core functionality of JEmAS is also implemented in a [UIMA Analysis Engine](
## Installation
JEmAS was written for Java 7. If you only want to _use_ JEmAS, I recommend downloading the precompiled JAR files from [here](https://github.com/JULIELab/JEmAS/releases). They are ready to use and come without prerequisites (other than a Java installation). If you are a developer and would like to build on top of JEmAS, you will need Maven for compilation.

## Output
The output of the tool is printed to standard output (your terminal window) in TSV format (TAB-separated values), structured like this:

| File Name | Valence | Arousal  | Dominance | StdDev Valence | StdDev Arousal | StdDev Dominance | Tokens | Alphabetic Token | Non-Stopword Tokens | Recognized Tokens | NumberCount |
|-----------|---------|----------|-----------|----------------|----------------|------------------|--------|------------------|---------------------|-------------------|-------------|
| test.txt  | 0.51901 | -0.82562 | 0.7281    | 1.17961        | 0.82111        | 0.85398          | 612    | 362              | 274                 | 121               | 25          |

You can copy and paste this output into Excel or Calc to get proper formatting (or redirect the output into a file right from the start).

The columns have the following meanings:

- File Name: The file you analyzed.
- Valence, Arousal, Dominance: The three-dimensional emotion value of the document as determined by JEmAS (this is the most important piece of information in the output).
- StdDev Valence, StdDev Arousal, StdDev Dominance: Standard deviation (SD) of the individual Valence, Arousal and Dominance ratings of all the _words_ in the document.
- Tokens: The number of tokens (i.e., individual words, numbers, punctuation marks, ...) in your document.
- Alphabetic Token: The number of tokens that start with a letter (thus excluding numbers and punctuation).
- Non-Stopword Tokens: The number of tokens left after removing stopwords (mostly non-content words).
- Recognized Tokens: The number of tokens recognized as emotionally relevant according to the emotion lexicon.
- NumberCount: The number of numeric expressions (numbers, currency, ...).
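
If you would rather post-process the output in a script than in a spreadsheet, the following is a minimal Python sketch of how the TSV could be parsed. It is not part of JEmAS: the function names are made up for illustration, the sample text simply repeats the example row from the table, and accepting a decimal comma is an assumption about locale-dependent number formatting.

```python
import csv
import io

# Sample output repeating the example row from the table above. In practice
# this would be the text JEmAS printed, e.g. read from a redirected file.
SAMPLE_OUTPUT = (
    "File Name\tValence\tArousal\tDominance\tStdDev Valence\tStdDev Arousal\t"
    "StdDev Dominance\tTokens\tAlphabetic Token\tNon-Stopword Tokens\t"
    "Recognized Tokens\tNumberCount\n"
    "test.txt\t0.51901\t-0.82562\t0.7281\t1.17961\t0.82111\t0.85398\t"
    "612\t362\t274\t121\t25\n"
)

FLOAT_COLUMNS = [
    "Valence", "Arousal", "Dominance",
    "StdDev Valence", "StdDev Arousal", "StdDev Dominance",
]
INT_COLUMNS = [
    "Tokens", "Alphabetic Token", "Non-Stopword Tokens",
    "Recognized Tokens", "NumberCount",
]


def to_float(text):
    # Assumption: depending on the locale, numbers may be printed with a
    # decimal comma ("0,51901"), so both separators are accepted here.
    return float(text.replace(",", "."))


def parse_jemas_output(text):
    """Turn TAB-separated output (with header line) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for raw in reader:
        row = {"File Name": raw["File Name"]}
        for name in FLOAT_COLUMNS:
            row[name] = to_float(raw[name])
        for name in INT_COLUMNS:
            row[name] = int(raw[name])
        rows.append(row)
    return rows


rows = parse_jemas_output(SAMPLE_OUTPUT)
print(rows[0]["File Name"], rows[0]["Valence"], rows[0]["Recognized Tokens"])
```

With the rows in this form, derived figures such as the share of recognized tokens among non-stopword tokens are a one-line computation per document.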

## Usage
JEmAS has two distinct operation modes, a default mode and an advanced mode. Using the advanced mode, you can manually choose the word emotion lexicon, the term weighting function for constructing the bag-of-words (BOW) representation (absolute frequency or TF-IDF) and the preprocessing mode (either no lexical normalization, or lemmatization). Using the default mode, JEmAS will run with default settings:
- lexical normalization: lemmatization

@@ -22,7 +43,7 @@ JEmAS has two distinct operation mode, a default mode and an advanced mode. Usin
````
java -jar NAME_OF_JAR INPUT (AUXILIARY_FOLDER)
````
INPUT is either the path to a folder, in which case every file in it (with a .txt suffix or without any suffix) is processed, or the path to a file, in which case each line is processed individually (generating emotion scores for each separate line). In the latter case, the "File Name" column of the output table contains line numbers (starting with 1) instead (see above). AUXILIARY_FOLDER is the path to an existing folder where auxiliary output files (such as the vocabulary) will be saved. If you omit this argument, a new folder will be created in your working directory.
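
For scripted runs, the default-mode call can be wrapped in a few lines of Python. This is only a sketch under assumptions: `jemas.jar` and `documents/` are placeholder names, the helper functions are hypothetical, and the only thing taken from this README is the argument order `java -jar NAME_OF_JAR INPUT (AUXILIARY_FOLDER)`.

```python
import subprocess


def build_command(jar_path, input_path, auxiliary_folder=None):
    """Return the argument list for a default-mode run.

    The optional auxiliary folder is appended only when it is given,
    mirroring the optional third argument on the command line.
    """
    command = ["java", "-jar", jar_path, input_path]
    if auxiliary_folder is not None:
        command.append(auxiliary_folder)
    return command


def run_jemas(jar_path, input_path, output_path, auxiliary_folder=None):
    """Run the JAR and redirect its standard output into output_path."""
    command = build_command(jar_path, input_path, auxiliary_folder)
    with open(output_path, "wb") as out:
        # check=True raises CalledProcessError if java exits with an error.
        subprocess.run(command, stdout=out, check=True)


# Placeholder paths, for illustration only; nothing is executed here.
print(build_command("jemas.jar", "documents/"))
```

Calling `run_jemas("jemas.jar", "documents/", "results.tsv")` would then correspond to redirecting the output into a file right from the start.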

### Advanced mode:
````

@@ -35,28 +56,8 @@ You will specify your desired settings in a dialog-like fashion.
The lexicon has to be CSV-formatted with TAB as the delimiter and without column headers. Each entry (consisting of a word and an associated VAD value) must therefore be formatted like this:
````
WORD TAB VALENCE TAB AROUSAL TAB DOMINANCE
````
Where VALENCE, AROUSAL and DOMINANCE are numerical values.
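
To check a custom lexicon before handing it to the advanced mode, a small script along the following lines may help. It is a sketch, not JEmAS code: the function names are hypothetical and the example entry uses invented values that do not come from any real emotion lexicon.

```python
def format_entry(word, valence, arousal, dominance):
    """Build one lexicon line: WORD TAB VALENCE TAB AROUSAL TAB DOMINANCE."""
    return f"{word}\t{valence}\t{arousal}\t{dominance}"


def parse_entry(line):
    """Parse one lexicon line and fail loudly if it is malformed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 TAB-separated fields, got {len(fields)}")
    word, *values = fields
    # float() raises ValueError if a rating is not numerical.
    valence, arousal, dominance = (float(value) for value in values)
    return word, valence, arousal, dominance


# Invented example values, for illustration only.
line = format_entry("sunshine", 2.5, 1.0, 0.5)
print(parse_entry(line))
```

Running `parse_entry` over every line of a lexicon file surfaces missing fields and non-numeric ratings before the file reaches the tool.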

## Contact
I am happy to give additional information or get feedback on this tool via email: [email protected]