This project presents an overview of topic modelling, a classical unsupervised learning problem in Natural Language Processing (NLP), by studying and comparing two algorithms for uncovering latent topics: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Both techniques are applied to a public dataset, ‘A Million News Headlines’, a corpus of more than one million news headlines published by ABC (Australian Broadcasting Corporation) News over a period of 17 years.
Dataset link: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SYBGZL