We built a web app with Streamlit and deployed a custom machine learning model that predicts whether a GitHub repository is safe to consume, given only its URL. The model generates a score based on data we scraped from a mix of well-known and randomly chosen repositories on GitHub.
- Folder `Notebooks` contains the data, the scripts that extract it, the data analysis, and the model creation code.
  - We used the GitHub API and Kaggle to collect the GitHub data, stored in `github_api.csv` and `kaggle_data.csv` respectively. Both files have the columns `repo_name`, `star`, `fork`, `watch`, `issue`, `tags`, `most_used_lang`, `discription`, `contributors`, `license`, and `repo_url`.
  - `data_extraction.ipynb` contains the script that extracts the information from repositories.
  - `analysis.ipynb` contains the cleaning and visualization operations on the dataset.
  - `model.ipynb` builds a machine learning model that predicts how many `stars` a repository will gain in the future. 😃
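
To show how one row with the columns listed above could be assembled from a repository URL, here is a minimal sketch. It is not the code in `data_extraction.ipynb`. The helper names (`parse_repo_url`, `to_row`, `fetch_repo`) are hypothetical, and the mapping from GitHub REST API fields to our column names is an assumption (for example, `watch` is taken from `subscribers_count` and `tags` from `topics`). The `contributors` column is left empty because the repository endpoint does not return it.

```python
import json
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Column names as they appear in github_api.csv and kaggle_data.csv
# (spelling kept as-is, including "discription").
COLUMNS = ["repo_name", "star", "fork", "watch", "issue", "tags",
           "most_used_lang", "discription", "contributors", "license",
           "repo_url"]


def parse_repo_url(repo_url):
    """Split https://github.com/<owner>/<repo> into (owner, repo)."""
    parts = urlparse(repo_url).path.strip("/").split("/")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"not a repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return owner, repo


def to_row(payload, contributors=None):
    """Map a GET /repos/{owner}/{repo} JSON payload to one dataset row.

    Assumption: this field-to-column mapping is illustrative only.
    """
    license_info = payload.get("license") or {}  # license may be null
    return {
        "repo_name": payload.get("name"),
        "star": payload.get("stargazers_count"),
        "fork": payload.get("forks_count"),
        "watch": payload.get("subscribers_count"),
        "issue": payload.get("open_issues_count"),
        "tags": payload.get("topics", []),
        "most_used_lang": payload.get("language"),
        "discription": payload.get("description"),
        "contributors": contributors,  # not part of this endpoint
        "license": license_info.get("name"),
        "repo_url": payload.get("html_url"),
    }


def fetch_repo(repo_url):
    """Fetch repository metadata (unauthenticated, so rate-limited)."""
    owner, repo = parse_repo_url(repo_url)
    request = Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json",
                 "User-Agent": "repo-metadata-sketch"},
    )
    with urlopen(request, timeout=10) as response:
        return to_row(json.load(response))
```

Calling `fetch_repo("https://github.com/streamlit/streamlit")` would return a dictionary keyed by the column names, which can be appended to a list and written out with `csv.DictWriter` or pandas.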
