This repository contains the code and resources for my diploma thesis. The project explores the use of Large Language Models (LLMs) for analyzing tabular data with natural-language queries through the generation and execution of Python code. The thesis includes a comprehensive literature review, the development of an LLM-based agent, and performance evaluations of both fine-tuned and state-of-the-art models.
The project is organized as follows:
```
TableQA-LLMAgent/             # Root directory
│
├── .github/                  # CI/CD workflows
│   └── workflows/
│
├── README.md                 # This README file
├── main.py                   # Agent usage example
├── poetry.lock               # Poetry dependency management
├── pyproject.toml            # Readable dependencies
│
├── tableqallmagent/          # Source code of the package
│   ├── __init__.py
│   ├── agent.py              # Constructor and the main interface
│   ├── code_manipulation.py  # Processing generated code
│   ├── coder_llms.py         # Forward passes for coding LLMs
│   ├── llms.py               # Higher-level methods for LLMs
│   ├── logger.py             # Color constants for readability
│   └── prompts.py            # Prompting strategies and formatting
│
├── dataset/                  # Multiple datasets and preprocessing
├── dist/                     # PyPI versions
├── evaluation/               # LLM-as-evaluator
├── finetuning/               # LoRA training scripts and configs
├── plots/                    # Directory to store generated images
└── tests/                    # pytest
```
To get started with the project, follow these steps:

- Clone the repository:

  ```shell
  git clone https://github.com/poludmik/TableQA-LLMAgent.git
  cd TableQA-LLMAgent
  ```

- Install the dependencies:

  ```shell
  poetry install
  ```

You can then run the main script to see the agent's basic functionality:

```shell
python main.py
```
- Fine-Tuning: Fine-tuning LLMs using LoRA and QLoRA techniques.
- Code Generation: Generating Python code to analyze tabular data.
- Model Evaluation: Rigorous benchmarks for evaluating LLM Agents.
- MLOps: Tracking experiments using MLOps tools to ensure reproducibility.
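The fine-tuning work above relies on LoRA (Low-Rank Adaptation): instead of updating a full pretrained weight matrix, training learns a small low-rank correction to it. This is a minimal NumPy sketch of that idea only, not the project's actual training code (the real scripts live in `finetuning/`):

```python
import numpy as np

# LoRA idea: keep the pretrained weight matrix W (d x k) frozen and
# learn only a low-rank update B @ A with rank r << min(d, k).
rng = np.random.default_rng(0)
d, k, r = 64, 64, 4

W = rng.standard_normal((d, k))          # frozen pretrained weights
A = rng.standard_normal((r, k)) * 0.01   # trainable, shape (r, k)
B = np.zeros((d, r))                     # trainable, zero-initialized so
                                         # training starts from the base model

x = rng.standard_normal(k)

# Forward pass with the adapted layer: y = W x + B (A x)
y = W @ x + B @ (A @ x)

# Trainable parameters shrink from d*k to r*(d + k)
full_params = d * k          # 4096
lora_params = r * (d + k)    # 512
print(full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially matches the frozen one exactly; only the small `A`/`B` matrices receive gradient updates, which is what makes LoRA (and its quantized variant QLoRA) cheap enough to fine-tune a 7B model.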
The fine-tuning experiments significantly improved the performance of the Code Llama 7B Python model from 35.3% to 60.3% on the proposed evaluation benchmark.
For any questions or feedback, please reach out to me:
Mikhail Poludin
michael.poludin@gmail.com
This project is licensed under the MIT License.