This project classifies research papers based on their content and predicts their publishability. The model determines whether a paper is suitable for publication and, if so, suggests the most relevant conference.
- Uses SciBERT for embedding paper content
- Self-training classifier for publishability prediction
- Sentence-BERT for similarity-based conference classification
- Supports multiple conferences, including CVPR, NeurIPS, EMNLP, TMLR, and KDD
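
The self-training step can be pictured as a pseudo-labeling loop: fit on the labeled papers, label the unlabeled papers the model is confident about, add them to the training pool, and repeat. The following is a minimal sketch under stated assumptions, not the code in main.py. Toy 2-D vectors stand in for SciBERT embeddings, a nearest-centroid rule stands in for whatever classifier the project actually uses, and the function names, the 0/1 label encoding, and the 0.8 threshold are all hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fit_centroids(labeled):
    """Build one centroid per class from (vector, label) pairs."""
    by_label = {}
    for vec, label in labeled:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict_with_confidence(vec, centroids):
    """Return (label, confidence) for the nearest class centroid.

    Confidence is 1 minus the winner's share of the summed distances,
    so it grows as vec moves closer to the winning centroid.
    """
    dists = {label: math.dist(vec, c) for label, c in centroids.items()}
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    confidence = 1.0 if total == 0 else 1.0 - dists[label] / total
    return label, confidence

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Pseudo-labeling loop; returns (centroids, labeled_pool, still_unlabeled)."""
    pool = list(labeled)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(pool)
        confident, uncertain = [], []
        for vec in remaining:
            label, conf = predict_with_confidence(vec, centroids)
            (confident if conf >= threshold else uncertain).append((vec, label))
        if not confident:
            break  # nothing new to learn from, so stop early
        pool.extend(confident)
        remaining = [vec for vec, _ in uncertain]
    # Refit so the returned centroids reflect every pseudo-labeled point.
    return fit_centroids(pool), pool, remaining

# Toy data (assumed encoding: 1 = publishable, 0 = not publishable).
labeled = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1), ([0.9, 1.1], 1)]
unlabeled = [[0.1, 0.1], [1.0, 0.9], [0.5, 0.5]]

centroids, pool, remaining = self_train(labeled, unlabeled)
# The two points near a cluster are pseudo-labeled; [0.5, 0.5] sits between
# the clusters, never reaches the threshold, and stays unlabeled.
```

The threshold trades coverage for label quality: a higher value adds fewer pseudo-labels per round but makes it less likely that a wrong label is fed back into training.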
Ensure you have the following installed:
- Python 3.8+
- Required dependencies (install via requirements.txt)
- Clone the repository:
git clone https://github.com/Ayush-Sharma23/pathway-hackathon-resources.git
cd pathway-hackathon-resources
- Install dependencies:
pip install -r requirements.txt
- Prepare CSV files:
  - data/labeled_data.csv (contains labeled paper content and labels for training)
  - data/unlabeled_data.csv (contains paper content for classification)
- Run the script:
python main.py
- The results will be saved in results/output.csv.
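
For the "Prepare CSV files" step, the two input files might be laid out as in the sketch below. The column names `content` and `label`, the 1/0 label encoding, and the sample rows are assumptions made for illustration; check main.py for the headers it actually expects. The demo writes into a temporary directory so that nothing in the repository is overwritten.

```python
import csv
import tempfile
from pathlib import Path

def write_csv(path, header, rows):
    """Write a header row plus data rows, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

def read_csv(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def build_inputs(base_dir):
    """Create data/labeled_data.csv and data/unlabeled_data.csv under base_dir."""
    data_dir = Path(base_dir) / "data"
    # Assumed schema: paper text plus a binary publishability label.
    write_csv(
        data_dir / "labeled_data.csv",
        ["content", "label"],
        [
            ["We propose a method, with ablations and baselines ...", 1],
            ["This paper has no experiments, and the claims ...", 0],
        ],
    )
    # Assumed schema: paper text only; labels are what the script predicts.
    write_csv(
        data_dir / "unlabeled_data.csv",
        ["content"],
        [["A new benchmark for ..."], ["We study the problem of ..."]],
    )
    return data_dir

with tempfile.TemporaryDirectory() as tmp:
    data_dir = build_inputs(tmp)
    labeled_rows = read_csv(data_dir / "labeled_data.csv")
    unlabeled_rows = read_csv(data_dir / "unlabeled_data.csv")
```

The `csv` module quotes fields automatically, so commas and line breaks inside paper text survive the round trip.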
research-paper-classifier/
│── main.py              # Main script for training and classification
│── requirements.txt     # Dependencies
│── data/                # Folder containing input data
│   ├── labeled_data.csv
│   ├── unlabeled_data.csv
│── results/             # Folder to store outputs
│   ├── output.csv
│── README.md            # Project documentation
- pandas
- numpy
- transformers
- sentence-transformers
- scikit-learn
- tqdm
- Fine-tune SciBERT for better embeddings
- Add more conferences and their topic embeddings
- Improve classification accuracy using advanced ML techniques
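
To illustrate what adding a conference could involve, the sketch below keeps one topic vector per conference and assigns a paper to the conference with the highest cosine similarity. It assumes toy 3-D vectors in place of real Sentence-BERT embeddings. The `conference_vecs` registry, the helper names, and the `ExampleConf` entry are hypothetical and do not come from main.py.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_conference(paper_vec, conference_vecs):
    """Return (best_name, scores) where scores maps each conference to its similarity."""
    scores = {
        name: cosine_similarity(paper_vec, vec)
        for name, vec in conference_vecs.items()
    }
    return max(scores, key=scores.get), scores

# Toy topic vectors; in practice these would be Sentence-BERT embeddings
# of text describing each conference's scope.
conference_vecs = {
    "CVPR": [0.9, 0.1, 0.0],
    "EMNLP": [0.1, 0.9, 0.0],
    "KDD": [0.1, 0.1, 0.9],
}

paper_vec = [0.5, 0.1, 0.6]  # toy embedding of one paper
before, _ = best_conference(paper_vec, conference_vecs)

# Adding a conference is one more registry entry (hypothetical name and vector).
conference_vecs["ExampleConf"] = [0.5, 0.0, 0.5]
after, scores = best_conference(paper_vec, conference_vecs)
```

Because each paper is compared against every registry entry independently, a new conference needs no retraining in this setup. The quality of the match depends entirely on how well its topic vector represents the venue.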