Source code for LLM-based Agents for my diploma thesis project. Natural language driven data analysis. LoRA and QLoRA for quality enhancement.

poludmik/TableQA-LLMAgent

Latest commit: fe044ef · Jul 10, 2024
Optimizing LLM-Powered Agents for Tabular Data Analytics 🚀

Overview 📋

This repository contains the code and resources for my diploma thesis. The project explores how Large Language Models (LLMs) can answer natural-language questions about tabular data by generating and executing Python code. The thesis includes a comprehensive literature review, the development of an LLM-based Agent program, and performance evaluations of both fine-tuned and state-of-the-art models.
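
As a rough illustration of the generate-and-execute idea described above, the sketch below replaces the LLM with a hypothetical `fake_llm` function that returns a hard-coded snippet, then runs that snippet against a small in-memory table. The names `fake_llm` and `run_generated_code`, and the convention of storing the answer in a variable called `result`, are assumptions made for this example; they are not taken from this repository, whose actual interface lives in `tableqallmagent/agent.py`.

```python
# Minimal sketch of "generate Python code, then execute it on a table".
# All names here are hypothetical; this is not the package's API.


def fake_llm(question: str, columns: list[str]) -> str:
    """Stand-in for a real LLM call; always returns the same code string."""
    return "result = sum(row['price'] for row in table) / len(table)"


def run_generated_code(code: str, table: list[dict]) -> object:
    """Execute generated code in a fresh namespace that contains `table`."""
    namespace = {"table": table}
    # NOTE: exec runs arbitrary code; a real system should sandbox this step.
    exec(code, namespace)
    # By the convention assumed here, the generated code stores its answer
    # in a variable named `result`.
    return namespace.get("result")


table = [
    {"item": "apple", "price": 1.0},
    {"item": "pear", "price": 2.0},
    {"item": "plum", "price": 3.0},
]
code = fake_llm("What is the average price?", list(table[0].keys()))
answer = run_generated_code(code, table)
print(answer)
```

Keeping generation and execution in separate functions makes it possible to inspect or repair the generated code before it runs, which is one plausible reason for a dedicated code-processing step in an agent like this.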

Project Structure 🗂️

The project is organized as follows:

TableQA-LLMAgent/       # Root directory
│
├── .github/            # CI/CD workflows
│   └── workflows/
│
├── README.md           # This README file
├── main.py             # Agent usage example
├── poetry.lock         # Poetry dependency management
├── pyproject.toml      # Readable dependencies
│
├── tableqallmagent/    # Source code of the package
│   ├── __init__.py
│   ├── agent.py        # Constructor and the main interface
│   ├── code_manipulation.py # Processing generated code
│   ├── coder_llms.py   # Forward passes for coding LLMs
│   ├── llms.py         # Higher level methods for LLMs
│   ├── logger.py       # Color constants for readability
│   └── prompts.py      # Prompting strategies and formatting
│
├── dataset/            # Multiple datasets and preprocessing
├── dist/               # PyPI versions
├── evaluation/         # LLM-as-evaluator
├── finetuning/         # LoRA training scripts and configs
├── plots/              # Directory to store generated images
└── tests/              # pytest
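
The tree above describes `code_manipulation.py` only as "processing generated code". One common step of that kind is pulling the code out of a chat-style LLM response that wraps it in a Markdown fence. The sketch below shows that step under the assumption that responses look like this; the function name `extract_python_code` is hypothetical and is not claimed to exist in the package.

```python
import re

# Built programmatically so this example contains no literal Markdown fence.
FENCE = "`" * 3

# Matches an opening fence with an optional "python" tag, captures everything
# up to the next fence. DOTALL lets "." span multiple lines.
_CODE_BLOCK = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_python_code(response: str) -> str:
    """Return the first fenced code block, or the whole text if none exists."""
    match = _CODE_BLOCK.search(response)
    return (match.group(1) if match else response).strip()


response = (
    "Here is the code:\n"
    + FENCE + "python\n"
    + "result = df['price'].mean()\n"
    + FENCE + "\nHope this helps."
)
print(extract_python_code(response))
```

Falling back to the raw text when no fence is found is a design choice for this sketch: some models return bare code, and rejecting such responses outright would discard usable output.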

Installation 🔧

To get started with the project, follow these steps:

  1. Clone the repository:

    git clone https://github.com/poludmik/TableQA-LLMAgent.git
    cd TableQA-LLMAgent
  2. Install the dependencies:

    poetry install

Usage 🚀

Run the main script to see a basic example of what the agent can do:

python main.py

Features ✨

  • Fine-Tuning: Fine-tuning LLMs using LoRA and QLoRA techniques.
  • Code Generation: Generating Python code to analyze tabular data.
  • Model Evaluation: Rigorous benchmarks for evaluating LLM Agents.
  • MLOps: Tracking experiments using MLOps tools to ensure reproducibility.
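
To give a sense of why LoRA makes fine-tuning affordable: instead of updating a full weight matrix of shape d × k, LoRA freezes it and trains two low-rank factors, B (d × r) and A (r × k), so the number of trainable parameters drops from d·k to r·(d + k). QLoRA additionally keeps the frozen base weights in a quantized form. The sketch below just does this arithmetic; the dimensions and rank are illustrative assumptions and are not the configuration used in the thesis.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameter count of the low-rank factors B (d x r) and A (r x k)."""
    return r * (d + k)


# Illustrative values only, not the thesis configuration.
d, k, r = 4096, 4096, 16

full = d * k                            # parameters in the frozen matrix
lora = lora_trainable_params(d, k, r)   # parameters actually trained

print(f"full matrix: {full:,}")
print(f"LoRA factors: {lora:,}")
print(f"trainable fraction: {lora / full:.2%}")
```

The fraction shrinks as the matrix grows or the rank decreases, which is why a small rank is usually enough to keep memory use manageable on a single GPU.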

Results 📊

Fine-tuning raised the score of the Code Llama 7B Python model on the proposed evaluation benchmark from 35.3% to 60.3%, a gain of 25.0 percentage points.

Contact 📫

For any questions or feedback, please reach out to me:

Mikhail Poludin
michael.poludin@gmail.com

License 📜

This project is licensed under the MIT License.
