The use of natural language processing (NLP) has exploded over the last decade. Applications that require machines to understand natural human speech patterns are abundant, and substantial improvements in these systems have increased their utility. Within the educational space, NLP is used to interpret human speech for the purpose of understanding human problems, and recently an online tutor passed a limited version of the Turing Test when it was indistinguishable from teaching assistants in a college class.
The purpose of this project is to process a set of documents, run a sentiment analysis on them, and then generate topic models by applying Latent Dirichlet Allocation (LDA). The documents consist of student notes from the graduate-level Core Methods in Data Mining course. The analysis uses the following files:
- week-list.csv
- class-notes (containing CSV files with student notes)
- negative-words.txt
- positive-words.txt
First, the files in the class-notes folder were bound together into a single dataframe. Then the student notes were cleaned and processed with the tm package. A word cloud was generated, which is shown below.
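The cleaning step can be illustrated with a minimal sketch. The project itself used R's tm package; the Python below is only a standard-library stand-in for the same idea (lowercase, strip punctuation and digits, drop stop words, count terms). The stop-word list, the function names `clean_note` and `word_frequencies`, and the sample notes are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on", "with"}

def clean_note(text):
    """Lowercase a note, replace non-letters with spaces, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [word for word in text.split() if word not in STOP_WORDS]

def word_frequencies(notes):
    """Count cleaned tokens across all notes (the input a word cloud needs)."""
    counts = Counter()
    for note in notes:
        counts.update(clean_note(note))
    return counts

# Made-up notes, for illustration only.
notes = [
    "Clustering groups similar students; k-means is the example.",
    "The k-means clustering example was confusing in week 3!",
]
freqs = word_frequencies(notes)
print(freqs.most_common(4))
```

The resulting term counts are what a word cloud scales its words by: more frequent terms are drawn larger.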
Then, the dataframe of student notes was merged with the week-list, and a sentiment analysis was performed. In particular, a visualization of the total sentiment score per week was generated, which is shown below.
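A lexicon-based score of this kind can be sketched as follows, assuming the score of a note is the number of positive words minus the number of negative words, and that each note is matched to a week through a shared key. Both the scoring rule and the join key (a note title) are assumptions, since the write-up does not specify them. The small word sets stand in for positive-words.txt and negative-words.txt, and all names and sample data are hypothetical.

```python
import re
from collections import defaultdict

# Stand-ins for positive-words.txt and negative-words.txt (hypothetical contents).
POSITIVE = {"good", "clear", "helpful", "great"}
NEGATIVE = {"confusing", "bad", "difficult", "boring"}

def sentiment_score(text):
    """Positive-word count minus negative-word count for one note."""
    words = re.findall(r"[a-z]+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return positive - negative

def weekly_sentiment(notes, week_of_title):
    """Sum note scores per week; notes whose title has no week are skipped."""
    totals = defaultdict(int)
    for title, text in notes:
        week = week_of_title.get(title)
        if week is None:
            continue
        totals[week] += sentiment_score(text)
    return dict(sorted(totals.items()))

# Hypothetical week list (title -> week) and notes (title, text).
week_of_title = {"Intro to clustering": 1, "Prediction": 2}
notes = [
    ("Intro to clustering", "A clear and helpful overview."),
    ("Intro to clustering", "Good examples, but the math was confusing."),
    ("Prediction", "Difficult and confusing reading."),
]
print(weekly_sentiment(notes, week_of_title))
```

Plotting the returned totals against the week number gives the kind of per-week sentiment chart described above.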
LDA topic modelling was then performed on the documents to generate topics. A visualization was created displaying the sentiment for each week alongside one important topic for that week, which can be seen below.
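For readers unfamiliar with LDA, the sketch below fits a small model with collapsed Gibbs sampling using only the Python standard library. It is not the project's implementation, which was done in R; the hyperparameters `alpha` and `beta`, the iteration count, the function names, and the toy documents are assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenised documents with collapsed Gibbs sampling.

    Returns (vocab, topic_word_counts, doc_topic_counts).
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(n_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[word]] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return vocab, topic_word, doc_topic

def top_words(vocab, topic_word, topic, n=3):
    """Return the n most frequently assigned words for one topic."""
    ranked = sorted(range(len(vocab)), key=lambda v: topic_word[topic][v], reverse=True)
    return [vocab[v] for v in ranked[:n]]

def dominant_topic(doc_topic_row):
    """Index of the topic with the most tokens in one document."""
    return max(range(len(doc_topic_row)), key=lambda k: doc_topic_row[k])

# Toy tokenised documents, for illustration only.
docs = [
    ["cluster", "kmeans", "distance", "cluster"],
    ["regression", "predict", "error", "regression"],
    ["cluster", "distance", "kmeans"],
    ["predict", "error", "regression"],
]
vocab, topic_word, doc_topic = lda_gibbs(docs, n_topics=2)
for k in range(2):
    print(k, top_words(vocab, topic_word, k))
print([dominant_topic(row) for row in doc_topic])
```

If each document here were the pooled notes for one week, `dominant_topic` would give one candidate for that week's "important topic"; how the project actually selected its topic is not stated in this write-up.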


