# AI Academic Conference Hotspot Analysis Framework

This framework provides an automated, configuration-driven pipeline for mining research hotspots from papers submitted to top AI conferences (e.g., ICLR, NeurIPS, ICML). Driven by YAML configuration files, it executes data fetching, topic modeling, and results visualization.
Below are example results generated by analyzing ICLR 2025 conference papers using this framework.
Note: Running `main.py` will generate the latest analysis results based on your configuration.
├── configs/ # YAML configuration files for analysis tasks
├── data/ # (Git ignored) Stores raw (.jsonl) and processed (.csv) data
├── docs/ # Documentation and related resources (e.g., README images)
├── LICENSE # Project license file
├── main.py # Main entry point script (runs the analysis)
├── models/ # (Git ignored) Stores downloaded machine learning models
├── notebooks/ # Jupyter Notebooks (tutorials, exploratory analysis)
├── README_cn.md # Project description in Chinese
├── README.md # Project description in English (this file)
├── requirements.txt # Python dependency list
├── results/ # (Git ignored) Stores analysis results (plots, tables, models)
├── src/ # Core Python functional modules
│ ├── analyze.py # Analysis and visualization logic
│ ├── get_papers.py # Data fetching logic
│ ├── run_topic_modeling.py # Topic modeling logic
│ └── utils.py # (Optional) Common utility functions
└── .gitignore # Specifies intentionally untracked files that Git should ignore
It is recommended to use Conda for environment creation and pip for installing dependencies.
```bash
# Clone the repository
git clone https://github.com/zhihengli-casia/AI-Paper-Trends.git
cd AI-Paper-Trends

# 1. Create a new Conda environment (Python 3.10 recommended)
conda create --name ai-trend-analysis python=3.10

# 2. Activate the newly created environment
conda activate ai-trend-analysis

# 3. Install all required libraries from requirements.txt
pip install -r requirements.txt
```

The analysis pipeline is defined by `.yaml` files in the `configs/` directory.
- Navigate to the `configs/` directory.
- Duplicate an existing `.yaml` file or create a new one.
- Modify the parameters within the file to specify your analysis target.
Example (`configs/iclr_2025_analysis.yaml`):
```yaml
conference_id: 'ICLR.cc/2025/Conference'  # Target conference ID
fetch_reviews: True                       # Whether to fetch detailed review info
limit: null                               # Upper limit on papers to process (null = unlimited)

topic_modeling:
  enabled: True                           # Whether to perform topic modeling
  min_topic_size: 30                      # BERTopic minimum topic size

analysis:
  enabled: True                           # Whether to perform analysis and visualization
  tasks:                                  # List of analysis tasks to execute
    - 'plot_paper_count'                  # Plot ranking by paper count
    - 'plot_avg_rating'                   # Plot ranking by average score
    - 'plot_decision_breakdown'           # Plot decision composition breakdown
    - 'generate_summary_table'            # Generate statistics table

output_folder_name: 'iclr_2025_analysis'  # Output directory name under results/
```

Execute `main.py` from the project root, specifying the configuration file:

```bash
python main.py --config configs/iclr_2025_analysis.yaml
```

The script will run the data fetching, topic modeling, and results generation steps according to the configuration. Outputs are written to the `data/` and `results/` directories.
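For reference, a config like the one above can be loaded in a few lines. This is a minimal sketch assuming PyYAML (the `yaml` package), a common choice for `.yaml` configs that this project likely lists in `requirements.txt`:

```python
import yaml  # PyYAML; assumed to be installed via requirements.txt

# A trimmed version of the example config above, inlined as a string
cfg_text = """
conference_id: 'ICLR.cc/2025/Conference'
fetch_reviews: True
limit: null
topic_modeling:
  enabled: True
  min_topic_size: 30
"""

cfg = yaml.safe_load(cfg_text)
print(cfg["conference_id"])                     # ICLR.cc/2025/Conference
print(cfg["limit"])                             # None (null = unlimited)
print(cfg["topic_modeling"]["min_topic_size"])  # 30
```

Note that YAML's `null` becomes Python's `None`, and `True`/`False` become Python booleans, so downstream code can check `if cfg["limit"] is None:` directly.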
The `notebooks/` directory provides a Jupyter environment for more in-depth or customized exploratory analysis based on the processed data (`data/processed/*.csv`) generated by `main.py`.
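As a starting point for such a notebook, here is a minimal, hypothetical sketch: it uses pandas with made-up column names (`topic`, `avg_rating`) standing in for whatever the processed CSVs actually contain, so adjust the names to match the real data:

```python
import pandas as pd

# Toy stand-in for pd.read_csv("data/processed/....csv");
# the real column names may differ
df = pd.DataFrame({
    "topic": ["LLM agents", "LLM agents", "diffusion", "diffusion"],
    "avg_rating": [6.0, 7.0, 5.0, 6.0],
})

# Paper count and mean rating per topic, highest-rated topics first
summary = (
    df.groupby("topic")
      .agg(papers=("avg_rating", "size"),
           mean_rating=("avg_rating", "mean"))
      .sort_values("mean_rating", ascending=False)
)
print(summary)
```

The same `groupby`/`agg` pattern extends naturally to decision breakdowns or per-year comparisons once the real columns are known.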
Usage flow:
- Ensure the Conda environment is activated: `conda activate ai-trend-analysis`
- Start Jupyter Lab from the project root: `jupyter lab`
- Open the `.ipynb` files within the `notebooks/` directory in your browser.
Modify the `conference_id` in the configuration file. Common ID examples:
- ICLR: `ICLR.cc/2025/Conference`
- NeurIPS: `NeurIPS.cc/2023/Conference`
- ICML: `ICML.cc/2024/Conference`
Suggestion: Verify the exact ID of the target conference on the OpenReview website.
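The IDs above follow a `<group>/<year>/<track>` pattern. A small helper like the one below (hypothetical, not part of this project) can build such strings, though the exact ID should still be confirmed on OpenReview, since track names and venue structures vary:

```python
def make_venue_id(group: str, year: int, track: str = "Conference") -> str:
    """Build an OpenReview-style venue ID such as 'ICLR.cc/2025/Conference'.

    Hypothetical helper based on the common pattern; always verify the
    resulting ID against the OpenReview website before using it.
    """
    return f"{group}/{year}/{track}"

print(make_venue_id("NeurIPS.cc", 2023))  # NeurIPS.cc/2023/Conference
print(make_venue_id("ICML.cc", 2024))     # ICML.cc/2024/Conference
```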
To quickly validate the pipeline or configuration, set the `limit` parameter in the config file to process only a subset of papers:

```yaml
limit: 100  # Process only the first 100 papers
```

Contributions are welcome! Please feel free to report issues, suggest features, or submit code via Issues or Pull Requests.
This project is released under the MIT License.



