rsingh135/ProteinArchitect


🧬 Protein Architect - Expressibility-Aware Designer

Hack Princeton Fall 2025

An agentic, closed-loop generative platform that designs and optimizes therapeutic protein sequences for function, stability, and large-scale industrial manufacturability.

📚 Quick Links

  • 🚀 START HERE: See START_HERE.md for quick setup
  • 🔧 Fix PyTorch Error: See QUICK_FIX.md or FIX_TORCH_ERROR.md
  • 📖 Complete Setup: See COMPLETE_STEP_BY_STEP.md
  • 🔒 Security: See SECURITY.md

⚙️ Configuration & Setup (REQUIRED BEFORE RUNNING)

🔐 Step 1: Configure API Keys and Environment Variables

CRITICAL: You must set up API keys before running the application. Follow these steps:

  1. Create Environment File:

    cd backend
    cp .env.example .env
  2. Get API Keys (Required):

    Gemini API Key (Required for Protein Search):

    OpenAI API Key (Optional - for LLM refinement):

    AWS Credentials (Required for SageMaker deployment):

  3. Edit .env File:

    # Open .env in your editor (you are already in backend/ after step 1)
    nano .env
    # or
    code .env
  4. Fill in Your Keys:

    # Required
    GEMINI_API_KEY=your_actual_gemini_api_key_here
    
    # Optional (for LLM refinement)
    OPENAI_API_KEY=your_actual_openai_api_key_here
    
    # Required for SageMaker (if deploying)
    AWS_ACCESS_KEY_ID=your_aws_access_key_id
    AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key
    AWS_REGION=us-east-1
    SAGEMAKER_PPI_ENDPOINT=protein-ppi-endpoint
    
    # Development mode (uses local service instead of SageMaker)
    USE_LOCAL_PPI=true
  5. Verify Configuration:

    # Check that .env is in .gitignore (should be)
    cat .gitignore | grep .env
    
    # Verify .env file is NOT tracked by git
    git status | grep .env
    # Should return nothing (file should not appear)
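
A quick way to catch a missing or placeholder key before starting the server is to parse `.env` yourself. The sketch below is a minimal, standard-library example and is not part of the repository: the function names are hypothetical, it treats only `GEMINI_API_KEY` as required (per the list above), and it assumes that any value starting with `your_` is still the sample placeholder.

```python
from pathlib import Path

# Assumption: only GEMINI_API_KEY is unconditionally required (see step 4).
REQUIRED_KEYS = ("GEMINI_API_KEY",)
# Assumption: sample values such as your_actual_gemini_api_key_here start with "your_".
PLACEHOLDER_PREFIX = "your_"


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required=REQUIRED_KEYS) -> list:
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key for key in required
        if not values.get(key) or values[key].startswith(PLACEHOLDER_PREFIX)
    ]


def check_env_file(path: str = ".env") -> list:
    """Read an env file (run from backend/) and report problem keys."""
    env_path = Path(path)
    if not env_path.is_file():
        raise FileNotFoundError(f"{path} not found; run: cp .env.example .env")
    return missing_keys(parse_env(env_path.read_text()))
```

Calling `check_env_file()` from the `backend/` directory returns an empty list when every required key has a real value.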

🚨 Security Checklist

Before committing code, verify:

  • .env file is in .gitignore
  • .env file is NOT committed to git
  • API keys are kept secret and never shared
  • Different keys used for development vs production
  • Keys are rotated regularly

Quick Security Check:

# Run the security check script
./check_security.sh

# Or manually check:
git status | grep .env  # Should return nothing
git ls-files | grep .env  # Should only show .env.example

📋 Quick Setup Summary

# 1. Clone repository
git clone <your-repo-url>
cd ProteinArchitect  # the directory created by git clone; adjust if your clone uses another name

# 2. Set up backend environment
cd backend
cp .env.example .env
# Edit .env with your API keys (see above)

# 3. Install backend dependencies
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# 4. Set up frontend
cd ../frontend
npm install

# 5. Start backend (in one terminal)
cd ../backend
source venv/bin/activate
uvicorn main:app --reload --port 8000

# 6. Start frontend (in another terminal)
cd frontend
npm run dev

🆘 Troubleshooting

Error: "Gemini service not available"

  • Check that GEMINI_API_KEY is set in .env
  • Verify the API key is correct
  • Check that .env file is in the backend/ directory

Error: "SageMaker endpoint not found"

  • Set USE_LOCAL_PPI=true for local development
  • Or deploy model to SageMaker and set SAGEMAKER_PPI_ENDPOINT
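
The two options above amount to a simple switch on environment variables. The following sketch shows one way such a switch could look. It is an illustration only: the function names, the accepted truthy strings, and the `sagemaker:` prefix are assumptions, and the backend's actual logic is not shown in this README.

```python
import os


def use_local_ppi(env=os.environ) -> bool:
    """True when USE_LOCAL_PPI holds a truthy string such as "true"."""
    # Assumption: "1", "true", and "yes" (any case) all count as enabled.
    return env.get("USE_LOCAL_PPI", "").strip().lower() in {"1", "true", "yes"}


def select_ppi_backend(env=os.environ) -> str:
    """Return "local" or a SageMaker endpoint label; fail if neither is configured."""
    if use_local_ppi(env):
        return "local"
    endpoint = env.get("SAGEMAKER_PPI_ENDPOINT", "").strip()
    if not endpoint:
        raise RuntimeError(
            "Set USE_LOCAL_PPI=true or SAGEMAKER_PPI_ENDPOINT in backend/.env"
        )
    return f"sagemaker:{endpoint}"
```

Failing early with a clear message, as in `select_ppi_backend`, makes a misconfiguration visible at startup rather than on the first prediction request.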

Error: "Module not found"

  • Make sure virtual environment is activated
  • Run pip install -r requirements.txt again

Environment variables not loading

  • Ensure .env file is in backend/ directory
  • Check file is named exactly .env (not env or .env.txt)
  • Restart the backend server after changing .env

🎯 Problem Solved: The Expressibility Cliff

A major bottleneck in biologics development is that AI-designed proteins often misfold, are unstable, or are impossible to produce at high yield in a bioreactor. This system is designed to steer candidates toward sequences that remain viable for commercial production by scoring stability and manufacturability during design.

✨ Key Features

  • Generative Design-for-Expression: RL-based AI generates novel protein sequences optimized for your constraints
  • Expressibility Oracle: GNN/Transformer model (deployed on AWS SageMaker) predicts stability and manufacturability
  • Interactive Design Dialogue: LLM agent enables conversational protein refinement
  • 3D Structural Visualization: Interactive 3D protein structure viewer using Three.js
  • Host Organism View: Visual representation of protein expression in host cells (E. coli/CHO)
  • Manufacturing Protocol Generation: Automatic generation of industrial production recipes with cost/yield predictions

🏗️ Architecture

┌─────────────────┐
│  React Frontend │
│  (Three.js 3D)  │
└────────┬────────┘
         │
         │ HTTP/REST
         │
┌────────▼─────────────────────────┐
│     FastAPI Backend              │
│  ┌──────────────────────────┐    │
│  │  Protein Generator       │    │
│  │  (Mock RL-based)         │    │
│  └──────────────────────────┘    │
│  ┌──────────────────────────┐    │
│  │  Expressibility Oracle   │    │
│  │  └─> AWS SageMaker       │    │
│  │      (Mock endpoint)     │    │
│  └──────────────────────────┘    │
│  ┌──────────────────────────┐    │
│  │  Manufacturing Agent     │    │
│  └──────────────────────────┘    │
│  ┌──────────────────────────┐    │
│  │  LLM Agent (OpenAI)      │    │
│  └──────────────────────────┘    │
└──────────────────────────────────┘

🚀 Quick Start

Prerequisites

  • Python 3.9+ (Python 3.11 recommended for better compatibility)
  • Node.js 18+
  • npm or yarn

Backend Setup

  1. Navigate to the backend directory:
cd backend
  2. Create and activate a virtual environment:
# Option A: Use installation script (recommended)
./install.sh

# Option B: Manual installation
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt

Note: If you encounter Python 3.13 compatibility issues, use Python 3.11:

python3.11 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
  3. Configure Environment Variables (REQUIRED):

    # Copy the example file
    cp .env.example .env
    
    # Edit .env and add your API keys
    # See "Configuration & Setup" section at top of README for details

    Required API Keys:

    Important: Never commit .env to git! It's already in .gitignore.

  4. Start the backend server:

uvicorn main:app --reload --port 8000

The API will be available at http://localhost:8000

Frontend Setup

  1. Navigate to the frontend directory:
cd frontend
  2. Install dependencies:
npm install
  3. Start the development server:
npm run dev

The frontend will be available at http://localhost:3000

📖 Usage

  1. Design a Protein: Fill in the target name and constraints in the Design tab
  2. View 3D Structure: See the generated protein structure in the 3D Structure tab
  3. Check Manufacturing: View the production protocol and cost estimates in the Manufacturing tab
  4. Refine Design: Use the "Refine Design" button to interactively improve your protein using natural language

🧪 API Endpoints

POST /generate_protein

Generate a novel protein sequence with expressibility optimization.

Request:

{
  "target_name": "Anti-TNF-alpha Antibody",
  "max_length": 200,
  "max_cysteines": 5,
  "functional_constraint": "Must bind to receptor X",
  "additional_constraints": "Optimize for stability"
}

Response:

{
  "sequence": "MKTAYIAKQR...",
  "length": 150,
  "oracle_results": {
    "instability_index": 35.2,
    "stability_score": 64.8,
    "yield_prediction": 0.8,
    "host_cell": "E. coli",
    "cost_per_gram": 105.2,
    "is_stable": true
  },
  "manufacturing_protocol": {...},
  "retraining_triggered": false
}
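
When scripting against this endpoint, it can help to build the request body in one place and to check the returned sequence against the limits you asked for. The sketch below uses the field names from the examples above. The helper names are hypothetical, and the README does not state whether the backend enforces `max_length` or `max_cysteines`, so the check is written as client-side verification only.

```python
def build_generate_request(target_name, max_length, max_cysteines,
                           functional_constraint="", additional_constraints=""):
    """Assemble the JSON body using the field names from the request example."""
    return {
        "target_name": target_name,
        "max_length": max_length,
        "max_cysteines": max_cysteines,
        "functional_constraint": functional_constraint,
        "additional_constraints": additional_constraints,
    }


def constraint_violations(request, response):
    """Compare the returned sequence with the limits that were requested."""
    sequence = response["sequence"]
    problems = []
    if len(sequence) > request["max_length"]:
        problems.append(
            f"length {len(sequence)} exceeds max_length {request['max_length']}"
        )
    cysteines = sequence.count("C")  # cysteine is "C" in one-letter amino acid code
    if cysteines > request["max_cysteines"]:
        problems.append(
            f"{cysteines} cysteines exceed max_cysteines {request['max_cysteines']}"
        )
    return problems
```

An empty list from `constraint_violations` means the sequence respects both limits. The check counts residues in the `sequence` field itself rather than trusting the reported `length`.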

POST /refine_protein

Refine protein design using conversational LLM agent.

Request:

{
  "sequence": "MKTAYIAKQR...",
  "refinement_prompt": "Reduce predicted immunogenicity by 20%"
}
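
A minimal client for this endpoint, using only the Python standard library, might look like the sketch below. It assumes the backend is running at `http://localhost:8000` as described in Backend Setup. The function names and the 60-second timeout are illustrative choices, and because the response format for this endpoint is not documented here, the parsed JSON is returned as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address from the Backend Setup section


def make_refine_request(sequence, refinement_prompt, base_url=BASE_URL):
    """Build a POST request carrying the documented JSON body."""
    body = json.dumps(
        {"sequence": sequence, "refinement_prompt": refinement_prompt}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/refine_protein",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def refine(sequence, refinement_prompt, timeout=60):
    """Send the request and return the decoded JSON response unchanged."""
    request = make_refine_request(sequence, refinement_prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect or log the exact payload without a running server.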

🎨 Features Demo

3D Visualization

  • Interactive protein structure viewer
  • Color-coded amino acids by type
  • Rotate and zoom controls

Host Organism View

  • 3D visualization of E. coli or CHO cells
  • Highlighted protein expression sites
  • Animated ribosomes

Expressibility Oracle

  • Real-time stability predictions
  • Cost penalty calculations
  • Yield predictions

🔧 Technology Stack

Backend:

  • FastAPI (Python web framework)
  • OpenAI API (LLM agent)
  • AWS SageMaker (mock for hackathon)
  • NumPy, scikit-learn (ML utilities)

Frontend:

  • React 18
  • Three.js (3D visualization)
  • Vite (build tool)
  • Axios (HTTP client)

📝 Notes for Hackathon

  • AWS SageMaker Integration: Currently uses a mock implementation. In production, this would connect to an actual SageMaker endpoint hosting a trained GNN/Transformer model.
  • Protein Generation: Uses mock RL-based generation. In production, this would use a trained reinforcement learning model.
  • 3D Structure: Uses simplified visualization. In production, this would integrate with AlphaFold/ESMFold API for accurate structure prediction.

🎯 Hackathon Highlights

  1. Closed-Loop System: Demonstrates the complete pipeline from design to manufacturing
  2. AWS Cloud Integration: Shows SageMaker deployment architecture (mock)
  3. Interactive 3D Visualization: Engaging user experience with Three.js
  4. LLM-Powered Refinement: Natural language protein design refinement
  5. Cost Optimization: Automatic cost penalty system guides design optimization

📄 License

See LICENSE file for details.

🙏 Acknowledgments

Built for Hack Princeton Fall 2025. Inspired by the need to bridge the gap between AI-designed proteins and industrial manufacturability.
