A web portal that uncovers the most important factor in movie making from historical IMDb data.
• Designed a web-scraping tool in Python that crawls over 10,000 reviews of The Shawshank Redemption (title, content, rating, vote and date), using Selenium to simulate page scrolling and click actions.
• Pre-processed the data with Pandas, tokenized the text and removed stopwords using the NLTK stopword corpus.
• Analyzed the processed data by calculating keyword frequency in four dimensions (the cast, the director, the writer and the film company) using NumPy, and visualized the results with Matplotlib.
• Interpreted the visual results and published the reports on webpages using Bokeh.
Yaqing-Peng/MovieMetrics-Insight-Engine
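The tokenization and stopword-removal step can be sketched roughly as below. This is a minimal, stdlib-only illustration, not the project's code: the regex tokenizer and the small `STOPWORDS` set are hypothetical stand-ins for NLTK's tokenizer and its English stopword list (`nltk.corpus.stopwords.words("english")`), which needs a one-time corpus download.

```python
import re

# Hypothetical stand-in for NLTK's English stopword list; the project itself
# uses the NLTK corpus, which requires nltk.download("stopwords") first.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it", "this", "was"}

# Lowercase alphabetic runs, keeping one inner apostrophe (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def preprocess(text: str) -> list[str]:
    """Lowercase a review, split it into word tokens, and drop stopwords."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOPWORDS]
```

For example, `preprocess("The acting of Morgan Freeman is superb, and the ending was perfect.")` returns `['acting', 'morgan', 'freeman', 'superb', 'ending', 'perfect']`. In a Pandas pipeline the same function could be applied column-wise, e.g. `df["content"].apply(preprocess)`, assuming a `content` column holds the review text.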
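The keyword-frequency analysis over the four dimensions might look something like the sketch below, assuming the reviews have already been tokenized into lowercase word lists. The `DIMENSIONS` mapping and the `keyword_counts` function are hypothetical. The keywords are illustrative surnames and studio names for this film, not the project's actual keyword lists, and each keyword is assumed to belong to exactly one dimension.

```python
import numpy as np

# Hypothetical, illustrative keyword lists for the four dimensions
# (cast, director, writer, film company); not the project's real lists.
DIMENSIONS = {
    "cast": ["freeman", "robbins"],
    "director": ["darabont"],
    "writer": ["king"],
    "company": ["castle", "columbia"],
}


def keyword_counts(reviews, dimensions):
    """Count keyword mentions across tokenized reviews.

    reviews: list of token lists (already pre-processed).
    Returns (per_keyword, per_dimension) dictionaries of total mentions.
    Assumes no keyword appears in more than one dimension.
    """
    keywords = [kw for kws in dimensions.values() for kw in kws]
    index = {kw: j for j, kw in enumerate(keywords)}

    # counts[i, j] = number of times keyword j occurs in review i
    counts = np.zeros((len(reviews), len(keywords)), dtype=int)
    for i, tokens in enumerate(reviews):
        for tok in tokens:
            j = index.get(tok)
            if j is not None:
                counts[i, j] += 1

    # Column sums give total mentions of each keyword over all reviews.
    per_keyword = dict(zip(keywords, counts.sum(axis=0).tolist()))
    # Aggregate keyword totals into their dimension.
    per_dimension = {
        dim: sum(per_keyword[kw] for kw in kws) for dim, kws in dimensions.items()
    }
    return per_keyword, per_dimension
```

Keeping a reviews × keywords matrix rather than a single counter also makes related questions cheap, such as how many reviews mention each keyword at least once (`(counts > 0).sum(axis=0)`); the per-dimension totals are what a Matplotlib bar chart would then plot.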