damiangilgonzalez1995/TalkDocument


Project Name: Talk with Document using LLMs

Diagram of how the web application works 📷:

Schema

Description

🤖 The Interactive Document Query System is a web application that lets users ask questions about documents supplied as PDF files, TXT files, or URLs. It combines text embeddings, a vector store, similarity search with FAISS, and a large language model that handles the interaction between the user and the document.
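
The retrieval idea behind this description can be illustrated with a minimal, standard-library-only sketch. It is not the project's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the brute-force ranking stands in for a FAISS index, and all function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (brute-force search)."""
    query_vec = embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical document chunks and question, for illustration only.
chunks = [
    "The invoice is due on the first day of each month.",
    "Support is available by email on weekdays.",
    "Late invoice payments incur a five percent fee.",
]
question = "When is the invoice due?"
context = top_k_chunks(question, chunks, k=1)

# The retrieved chunk would then be placed in the prompt sent to the LLM.
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: {question}"
```

In a real pipeline the count vectors would be dense vectors from an embedding model, and the sorted scan would be replaced by a nearest-neighbour lookup in a vector index.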

Demo 🎥

Demo Video

Project Structure

The project is organized as follows:

TalkDocument
├─ .streamlit
│  └─ config.toml
├─ data
├─ example
├─ README.md
├─ requirements.txt
├─ resources
├─ setup.py
└─ src
   ├─ Home.py
   ├─ pages
   │  ├─ 1_Step 1️⃣ Create Data Base.py
   │  ├─ 2_Step 2️⃣ Ask to the document.py
   │  └─ __init__.py
   ├─ qa_tool.py
   ├─ style.py
   ├─ utils
   │  ├─ util.py
   │  └─ __init__.py
   └─ __init__.py

  • The .streamlit directory contains the Streamlit configuration file config.toml for customizing the web application's behavior.
  • The data directory holds sample documents (test.pdf and test.txt) that will be used for creating the database and querying.
  • The resources directory is intended for documentation-related assets, such as images.
  • README.md (this file) is the project's main documentation file.
  • requirements.txt lists the Python packages required to set up the project environment.
  • The src directory contains the main source code for the project.
    • Home.py is the main application entry point and landing page.
    • The pages directory contains the implementation of each step (page) of the application.
    • qa_tool.py defines the TalkDocument class, which is responsible for creating the database and handling queries.
    • style.py and utils/util.py provide styling and helper functions, respectively.
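
The numeric prefixes on the files in pages follow Streamlit's multipage naming convention, where a leading number sets the order in the sidebar and the rest of the name becomes the page label. The sketch below approximates that convention for illustration; it is not Streamlit's own code, and `page_order_and_label` is a hypothetical helper.

```python
import re


def page_order_and_label(filename: str) -> tuple[int, str]:
    """Split 'N_Label.py' into (N, 'Label'); files without a number sort last."""
    stem = filename.removesuffix(".py")  # requires Python 3.9+
    match = re.match(r"^(\d+)[_ -]*(.*)$", stem)
    if match is None:
        return (10**9, stem.replace("_", " "))
    return (int(match.group(1)), match.group(2).replace("_", " "))


# The two page files listed in the project tree above.
page_files = [
    "2_Step 2️⃣ Ask to the document.py",
    "1_Step 1️⃣ Create Data Base.py",
]
ordered_pages = sorted(page_order_and_label(name) for name in page_files)
```

Sorting the resulting tuples puts the database-creation page before the question page, which matches the order in which the two steps are meant to be used.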

Requirements

To use the Interactive Document Query System, you need the following:

Warning: A free API key (access token) from Hugging Face Hub is required. The system uses Hugging Face models to compute the embeddings kept in the vector store. Obtain your key by registering on the Hugging Face website.

Warning: An OpenAI API key is optional and only needed if you choose OpenAI's embedding model. Register on the OpenAI platform to obtain your key.
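
One way to check these prerequisites before launching is sketched below. It assumes the keys are exported as environment variables named `HUGGINGFACEHUB_API_TOKEN` and `OPENAI_API_KEY`, which are the names commonly read by LangChain and the OpenAI client. Whether this application reads them from the environment or asks for them in the UI is not stated here, and `missing_api_keys` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def missing_api_keys(
    use_openai: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the names of required API key variables that are unset or empty."""
    env = os.environ if env is None else env
    # Assumed variable names; adjust them to however the app expects its keys.
    required = ["HUGGINGFACEHUB_API_TOKEN"]
    if use_openai:
        required.append("OPENAI_API_KEY")
    return [name for name in required if not env.get(name)]


missing = missing_api_keys(use_openai=False)
if missing:
    print("Missing API keys:", ", ".join(missing))
```

Passing `use_openai=True` adds the OpenAI key to the check, which mirrors the optional requirement above.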

Getting Started

To launch the application, follow these steps:

  1. Clone the repository to your local machine.
  2. Open your terminal and navigate to the project's root directory.
  3. Install the required dependencies using pip install -r requirements.txt.
  4. Run the following command:
streamlit run yourpath/TalkDocument/src/Home.py

Credits

This project was developed by Damián Gil González.

About

This project shows how to talk with a document using an LLM.
