This application processes documents using AWS Textract, restructures the content, and allows users to ask questions about the document using Claude, an AI model from Anthropic.
- Document upload and processing using AWS Textract
- Document content restructuring
- Question-answering capability using Claude AI
- User-friendly interface built with Streamlit
- Python 3.7+
- AWS account with access to S3 and Textract services
- Anthropic API access for Claude
-
Clone the repository:
git clone https://github.com/yourusername/document-qa-app.git cd document-qa-app -
Install the required dependencies:
pip install -r requirements.txt -
Set up your AWS credentials:
- Create a file named
~/.aws/credentials(on Linux/Mac) orC:\Users\YourUsername\.aws\credentials(on Windows) - Add your AWS access key and secret key:
[default] aws_access_key_id = YOUR_ACCESS_KEY aws_secret_access_key = YOUR_SECRET_KEY
- Create a file named
-
Run the Streamlit app:
streamlit run app.py -
Open your web browser and go to
http://localhost:8501 -
Upload a document (PDF, PNG, JPG, or JPEG)
-
Wait for the document to be processed and restructured
-
Enter your question about the document and click "Submit"
-
View the AI-generated answer
app.py: Main Streamlit applicationtextract_processing.py: Handles document processing with AWS Textractdocument_restructuring.py: Restructures the processed document contentclaude_qa.py: Manages interaction with the Claude AI model for question-answeringrequirements.txt: Lists all Python dependencies