GitHub - RekhaMandyaRaju/Text-Entity-recogniser: Error detection and correction: dealing with homophone confusion

CSCI 544 Homework 3

Error detection and correction: dealing with homophone confusion

Training Data

Creation of Training file

After extracting the texts from Project Gutenberg and some free e-books, use mergeFiles.py to read them and write them to a single text file.
Use collect_data1.py to POS-tag that text file.
Use collect_data2.py to obtain tagged words from the NLTK corpus.
Append both sets of tagged words to a single file to form the training data.
- python2.7 mergeFiles.py train.dat training.txt
- python2.7 collect_data1.py training.txt training_tag
- python2.7 collect_data2.py training_corpus

After obtaining the tagged words as a training file, run hw3train.py on it with the following command:

* python3 hw3train.py TRAININGFILE train_format
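
The layout of train_format is not described in this README. As a rough sketch only, assuming the tagged input is a list of (word, POS tag) pairs and that each training line is a label followed by context features, the conversion could look something like this. The confusion sets, the feature names, and the function `make_examples` are illustrative assumptions and are not taken from hw3train.py.

```python
# Hypothetical confusion sets; the real script may use different ones.
CONFUSION_SETS = [{"its", "it's"}, {"your", "you're"}, {"to", "too"}]
CONFUSABLE = set().union(*CONFUSION_SETS)


def make_examples(tagged):
    """Yield one 'LABEL feat feat ...' line per confusable word.

    tagged: list of (word, pos_tag) pairs for one sentence.
    """
    # Pad so that every token has a left and a right neighbour.
    padded = [("<s>", "BOS")] + list(tagged) + [("</s>", "EOS")]
    for i in range(1, len(padded) - 1):
        word = padded[i][0].lower()
        if word not in CONFUSABLE:
            continue
        prev_w, prev_t = padded[i - 1]
        next_w, next_t = padded[i + 1]
        feats = [
            "pw:" + prev_w.lower(),  # previous word
            "pt:" + prev_t,          # previous POS tag
            "nw:" + next_w.lower(),  # next word
            "nt:" + next_t,          # next POS tag
        ]
        yield word + " " + " ".join(feats)


sentence = [("The", "DT"), ("dog", "NN"), ("wagged", "VBD"),
            ("its", "PRP$"), ("tail", "NN")]
examples = list(make_examples(sentence))
print(examples)
```

Using the word itself as the label means correctly written text supplies training examples without any manual annotation.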

Execution of Perceptron

To learn from the training file and create MODELFILE, execute the following command:

python3 perceplearn.py train_format MODELFILE
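
For orientation, a minimal multiclass perceptron can be sketched as below. It assumes each input line has the form `LABEL feat feat ...` and stores the model as JSON; both the line format and the JSON model are assumptions, and the functions `parse`, `predict`, and `train_perceptron` are hypothetical names rather than the contents of perceplearn.py.

```python
import json
import random
from collections import defaultdict


def parse(lines):
    """Split 'LABEL feat feat ...' lines into (label, features) pairs."""
    data = []
    for line in lines:
        parts = line.split()
        if parts:
            data.append((parts[0], parts[1:]))
    return data


def predict(weights, feats):
    # Highest-scoring label; ties go to the alphabetically first label.
    return max(sorted(weights),
               key=lambda lab: sum(weights[lab].get(f, 0) for f in feats))


def train_perceptron(lines, epochs=10, seed=0):
    data = parse(lines)
    labels = sorted({label for label, _ in data})
    weights = {label: defaultdict(int) for label in labels}
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for gold, feats in data:
            guess = predict(weights, feats)
            if guess != gold:
                # Standard perceptron update: reward gold, penalise the guess.
                for f in feats:
                    weights[gold][f] += 1
                    weights[guess][f] -= 1
    return weights


lines = [
    "its pw:wagged nw:tail",
    "it's pw:think nw:raining",
    "its pw:lost nw:way",
    "it's pw:said nw:late",
]
weights = train_perceptron(lines)
model_text = json.dumps(weights)   # what could be written to MODELFILE
restored = json.loads(model_text)  # what a tagger could load back
print(predict(restored, ["pw:wagged", "nw:tail"]))
```

Averaging the weights over all updates is a common refinement that tends to generalise better; it is omitted here to keep the sketch short.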

To classify the TEST_DATA read from STDIN, execute the following command:

cat hw3.test.err.txt | python3 hw3tag.py MODELFILE hw3.output.txt

where MODELFILE is the model file generated by perceplearn.py.

Here the input is the file containing the errors, and the output is the corrected file.
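
To illustrate the correction step, the sketch below rewrites one line of text, replacing a confusable word only when another member of its confusion set scores strictly higher in context. The confusion set, the toy weights, the feature names, and the function `correct_line` are all hypothetical; hw3tag.py would load its weights from MODELFILE and may use different features.

```python
import re

# Hypothetical confusion set and toy weights; a real run would load MODELFILE.
CONFUSION = {"its": ("its", "it's"), "it's": ("its", "it's")}
WEIGHTS = {
    "its": {"nw:tail": 2, "nw:raining": -2},
    "it's": {"nw:tail": -2, "nw:raining": 2},
}

# Words (with an optional apostrophe part), whitespace runs, or single
# characters, so that joining the tokens reproduces the line exactly.
TOKEN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\s+|.")


def correct_line(line):
    tokens = TOKEN.findall(line)
    words = [i for i, t in enumerate(tokens) if t[0].isalpha()]
    for pos, i in enumerate(words):
        key = tokens[i].lower()
        if key not in CONFUSION:
            continue
        prev_w = tokens[words[pos - 1]].lower() if pos > 0 else "<s>"
        next_w = (tokens[words[pos + 1]].lower()
                  if pos + 1 < len(words) else "</s>")
        feats = ["pw:" + prev_w, "nw:" + next_w]
        scores = {lab: sum(WEIGHTS[lab].get(f, 0) for f in feats)
                  for lab in CONFUSION[key]}
        best = max(scores, key=scores.get)
        # Keep the original word unless an alternative is strictly better.
        if scores[best] > scores[key]:
            # Preserve an initial capital letter.
            tokens[i] = best.capitalize() if tokens[i][0].isupper() else best
    return "".join(tokens)


fixed = correct_line("Its raining, and the dog wagged it's tail.\n")
print(fixed, end="")
```

Leaving the word unchanged on a tie keeps punctuation, spacing, and all unaffected words identical to the input, which matters when the output is compared against a reference file.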

Third Party Software Used
