Hack Princeton Fall 2025
An Agentic, Closed-Loop Generative Platform that designs and optimizes therapeutic protein sequences for function, stability, and large-scale industrial manufacturability.
- START HERE: See START_HERE.md for quick setup
- Fix PyTorch Error: See QUICK_FIX.md or FIX_TORCH_ERROR.md
- Complete Setup: See COMPLETE_STEP_BY_STEP.md
- Security: See SECURITY.md
CRITICAL: You must set up API keys before running the application. Follow these steps:
- Create Environment File:
cd backend
cp .env.example .env
- Get API Keys (Required):
Gemini API Key (Required for Protein Search):
- Visit: https://makersuite.google.com/app/apikey
- Create a new API key
- Copy the key
OpenAI API Key (Optional - for LLM refinement):
- Visit: https://platform.openai.com/api-keys
- Create a new API key
- Copy the key
AWS Credentials (Required for SageMaker deployment):
- Visit: https://console.aws.amazon.com/iam/
- Create access keys
- Copy Access Key ID and Secret Access Key
- Edit .env File:
# Open backend/.env in your editor
nano backend/.env
# or
code backend/.env
- Fill in Your Keys:
# Required
GEMINI_API_KEY=your_actual_gemini_api_key_here
# Optional (for LLM refinement)
OPENAI_API_KEY=your_actual_openai_api_key_here
# Required for SageMaker (if deploying)
AWS_ACCESS_KEY_ID=your_aws_access_key_id
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key
AWS_REGION=us-east-1
SAGEMAKER_PPI_ENDPOINT=protein-ppi-endpoint
# Development mode (uses local service instead of SageMaker)
USE_LOCAL_PPI=true
- Verify Configuration:
# Check that .env is in .gitignore (should be)
cat .gitignore | grep .env
# Verify .env file is NOT tracked by git
git status | grep .env
# Should return nothing (file should not appear)
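The key requirements above can also be checked from Python before starting the server. The following is a minimal sketch, not the project's own loader: `parse_env` and `missing_keys` are hypothetical helpers, and treating `AWS_REGION` as required whenever `USE_LOCAL_PPI` is not `true` is an assumption drawn from the comments in the example file.

```python
from typing import Dict, List

# Key groups taken from the example .env; the grouping itself is an assumption.
REQUIRED_ALWAYS = ["GEMINI_API_KEY"]
REQUIRED_FOR_SAGEMAKER = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "SAGEMAKER_PPI_ENDPOINT",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: Dict[str, str]) -> List[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    required = list(REQUIRED_ALWAYS)
    if env.get("USE_LOCAL_PPI", "").lower() != "true":
        required += REQUIRED_FOR_SAGEMAKER
    # Placeholder values in the example file all start with "your_".
    return [k for k in required if not env.get(k) or env[k].startswith("your_")]


sample = "GEMINI_API_KEY=abc123\nUSE_LOCAL_PPI=true\n"
print(missing_keys(parse_env(sample)))  # prints []
```

To check a real file, read `backend/.env` as text and pass its contents to `parse_env`.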
Before committing code, verify:
- .env file is in .gitignore
- .env file is NOT committed to git
- API keys are kept secret and never shared
- Different keys used for development vs production
- Keys are rotated regularly
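The "not committed to git" item in the checklist above can be automated. This is a rough sketch that inspects the output of `git ls-files`; the helper names are hypothetical, and it is not a reimplementation of any script shipped with the project. It flags any tracked file whose name starts with `.env`, except `.env.example`.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List

# Only the example file is expected to be tracked.
ALLOWED = {".env.example"}


def tracked_secret_files(ls_files_output: str) -> List[str]:
    """Return tracked paths whose file name looks like a real .env file."""
    offenders = []
    for line in ls_files_output.splitlines():
        path = line.strip()
        name = PurePosixPath(path).name
        if name.startswith(".env") and name not in ALLOWED:
            offenders.append(path)
    return offenders


def check_repo() -> List[str]:
    """Run `git ls-files` in the current repository and report offenders."""
    result = subprocess.run(
        ["git", "ls-files"], capture_output=True, text=True, check=True
    )
    return tracked_secret_files(result.stdout)
```

Calling `check_repo()` from the repository root returns an empty list when no secret file is tracked.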
Quick Security Check:
# Run the security check script
./check_security.sh
# Or manually check:
git status | grep .env # Should return nothing
git ls-files | grep .env # Should only show .env.example
# 1. Clone repository
git clone <your-repo-url>
cd GenLab
# 2. Set up backend environment
cd backend
cp .env.example .env
# Edit .env with your API keys (see above)
# 3. Install backend dependencies
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# 4. Set up frontend
cd ../frontend
npm install
# 5. Start backend (in one terminal)
cd ../backend
source venv/bin/activate
uvicorn main:app --reload --port 8000
# 6. Start frontend (in another terminal)
cd frontend
npm run dev
Error: "Gemini service not available"
- Check that GEMINI_API_KEY is set in .env
- Verify the API key is correct
- Check that .env file is in the backend/ directory
Error: "SageMaker endpoint not found"
- Set USE_LOCAL_PPI=true for local development
- Or deploy model to SageMaker and set SAGEMAKER_PPI_ENDPOINT
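The choice between the two fixes above comes down to a small decision on the environment variables. The sketch below illustrates that decision only; `resolve_ppi_backend` is a hypothetical helper, and how the backend actually interprets these variables is not shown here.

```python
from typing import Mapping


def resolve_ppi_backend(env: Mapping[str, str]) -> str:
    """Pick the local service or a SageMaker endpoint from environment values."""
    if env.get("USE_LOCAL_PPI", "").strip().lower() == "true":
        return "local"
    endpoint = env.get("SAGEMAKER_PPI_ENDPOINT", "").strip()
    if not endpoint:
        # Neither option is configured, which matches the error case above.
        raise RuntimeError("Set USE_LOCAL_PPI=true or SAGEMAKER_PPI_ENDPOINT")
    return f"sagemaker:{endpoint}"


print(resolve_ppi_backend({"USE_LOCAL_PPI": "true"}))  # prints local
```

Passing `os.environ` instead of a literal dict checks the current shell's settings.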
Error: "Module not found"
- Make sure virtual environment is activated
- Run pip install -r requirements.txt again
Environment variables not loading
- Ensure .env file is in backend/ directory
- Check file is named exactly .env (not env or .env.txt)
- Restart the backend server after changing .env
A major bottleneck in biologics development is that AI-designed proteins often misfold, are unstable, or cannot be produced at high yield in a bioreactor. This system screens and optimizes designs for stability and manufacturability so that they are more likely to be viable for commercial production.
- Generative Design-for-Expression: RL-based AI generates novel protein sequences optimized for your constraints
- Expressibility Oracle: GNN/Transformer model (deployed on AWS SageMaker) predicts stability and manufacturability
- Interactive Design Dialogue: LLM agent enables conversational protein refinement
- 3D Structural Visualization: Interactive 3D protein structure viewer using Three.js
- Host Organism View: Visual representation of protein expression in host cells (E. coli/CHO)
- Manufacturing Protocol Generation: Automatic generation of industrial production recipes with cost/yield predictions
┌─────────────────┐
│ React Frontend  │
│ (Three.js 3D)   │
└────────┬────────┘
         │
         │ HTTP/REST
         │
┌────────▼────────────────────────┐
│ FastAPI Backend                 │
│ ┌────────────────────────────┐  │
│ │ Protein Generator          │  │
│ │ (Mock RL-based)            │  │
│ └────────────────────────────┘  │
│ ┌────────────────────────────┐  │
│ │ Expressibility Oracle      │  │
│ │ └─> AWS SageMaker          │  │
│ │     (Mock endpoint)        │  │
│ └────────────────────────────┘  │
│ ┌────────────────────────────┐  │
│ │ Manufacturing Agent        │  │
│ └────────────────────────────┘  │
│ ┌────────────────────────────┐  │
│ │ LLM Agent (OpenAI)         │  │
│ └────────────────────────────┘  │
└─────────────────────────────────┘
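The backend components in the diagram can be read as a generate, score, and retry loop. The sketch below shows that control flow only. `design_loop` and its callables are hypothetical names, the toy oracle just counts cysteines, and none of this is the project's actual generator or oracle.

```python
from typing import Callable, Dict, Optional, Tuple

OracleResults = Dict[str, float]


def design_loop(
    generate: Callable[[Optional[OracleResults]], str],
    oracle: Callable[[str], OracleResults],
    is_acceptable: Callable[[OracleResults], bool],
    max_rounds: int = 5,
) -> Optional[Tuple[str, OracleResults]]:
    """Generate candidates until the oracle accepts one or rounds run out."""
    feedback: Optional[OracleResults] = None
    for _ in range(max_rounds):
        sequence = generate(feedback)   # stands in for the Protein Generator
        results = oracle(sequence)      # stands in for the Expressibility Oracle
        if is_acceptable(results):
            return sequence, results    # would be handed to the Manufacturing Agent
        feedback = results              # feed scores back into the next round
    return None


# Toy stand-ins: a fixed candidate list and an oracle that counts cysteines.
candidates = iter(["MKTCCCCAYIAKQR", "MKTAYIAKQR"])
winner = design_loop(
    generate=lambda feedback: next(candidates),
    oracle=lambda seq: {"cysteines": float(seq.count("C"))},
    is_acceptable=lambda results: results["cysteines"] <= 1,
)
print(winner)  # prints ('MKTAYIAKQR', {'cysteines': 0.0})
```

Passing the components as callables keeps the loop independent of whether the oracle is a local mock or a remote endpoint.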
- Python 3.9+ (Python 3.11 recommended for better compatibility)
- Node.js 18+
- npm or yarn
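The Python requirement above can be checked before creating the virtual environment. This is a small sketch with a hypothetical helper name; the warning for 3.13 and later reflects the compatibility note in the backend setup steps and is not a tested support matrix.

```python
import sys
from typing import Tuple


def python_version_status(version: Tuple[int, int]) -> str:
    """Classify a (major, minor) version against the stated prerequisites."""
    if version < (3, 9):
        return "too old: Python 3.9+ is required"
    if version >= (3, 13):
        return "may hit compatibility issues: consider Python 3.11"
    return "ok"


print(python_version_status(sys.version_info[:2]))
```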
- Navigate to the backend directory:
cd backend
- Create and activate virtual environment:
# Option A: Use installation script (recommended)
./install.sh
# Option B: Manual installation
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
Note: If you encounter Python 3.13 compatibility issues, use Python 3.11:
python3.11 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
- Configure Environment Variables (REQUIRED):
# Copy the example file
cp .env.example .env
# Edit .env and add your API keys
# See "Configuration & Setup" section at top of README for details
Required API Keys:
- GEMINI_API_KEY - Get from https://makersuite.google.com/app/apikey
- OPENAI_API_KEY - Get from https://platform.openai.com/api-keys (optional)
- AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY - For SageMaker deployment
Important: Never commit .env to git! It's already in .gitignore.
- Start the backend server:
uvicorn main:app --reload --port 8000
The API will be available at http://localhost:8000
- Navigate to the frontend directory:
cd frontend
- Install dependencies:
npm install
- Start the development server:
npm run dev
The frontend will be available at http://localhost:3000
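Once both servers are started, a quick reachability check can confirm that they are listening on the ports given above. This is a minimal sketch using only the standard library. `is_up` is a hypothetical helper that counts any HTTP response, including an error status, as "up", since the routes served at the root URLs are not specified here.

```python
import http.client
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, e.g. 404 on an undefined route
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        return False  # refused, timed out, or not speaking HTTP


if __name__ == "__main__":
    for name, url in [
        ("backend", "http://localhost:8000"),
        ("frontend", "http://localhost:3000"),
    ]:
        print(name, "up" if is_up(url) else "down")
```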
- Design a Protein: Fill in the target name and constraints in the Design tab
- View 3D Structure: See the generated protein structure in the 3D Structure tab
- Check Manufacturing: View the production protocol and cost estimates in the Manufacturing tab
- Refine Design: Use the "Refine Design" button to interactively improve your protein using natural language
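The design and refine steps above can also be scripted against the backend. The sketch below assumes the request fields documented in the API section. The route paths `/api/generate` and `/api/refine` are placeholders, not confirmed routes, so substitute the paths the backend actually exposes.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"
GENERATE_PATH = "/api/generate"  # hypothetical route, adjust to the real one
REFINE_PATH = "/api/refine"      # hypothetical route, adjust to the real one


def build_design_request(
    target_name: str,
    max_length: int,
    max_cysteines: int,
    functional_constraint: str = "",
    additional_constraints: str = "",
) -> Dict[str, Any]:
    """Assemble a design request body, rejecting obviously invalid limits."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if max_cysteines < 0:
        raise ValueError("max_cysteines must not be negative")
    return {
        "target_name": target_name,
        "max_length": max_length,
        "max_cysteines": max_cysteines,
        "functional_constraint": functional_constraint,
        "additional_constraints": additional_constraints,
    }


def post_json(path: str, payload: Dict[str, Any], timeout: float = 60.0) -> Dict[str, Any]:
    """POST a JSON body to the backend and decode the JSON response."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    design = post_json(
        GENERATE_PATH,
        build_design_request(
            "Anti-TNF-alpha Antibody", 200, 5,
            "Must bind to receptor X", "Optimize for stability",
        ),
    )
    refined = post_json(
        REFINE_PATH,
        {
            "sequence": design["sequence"],
            "refinement_prompt": "Reduce predicted immunogenicity by 20%",
        },
    )
    print(refined)
```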
Generate a novel protein sequence with expressibility optimization.
Request:
{
  "target_name": "Anti-TNF-alpha Antibody",
  "max_length": 200,
  "max_cysteines": 5,
  "functional_constraint": "Must bind to receptor X",
  "additional_constraints": "Optimize for stability"
}
Response:
{
  "sequence": "MKTAYIAKQR...",
  "length": 150,
  "oracle_results": {
    "instability_index": 35.2,
    "stability_score": 64.8,
    "yield_prediction": 0.8,
    "host_cell": "E. coli",
    "cost_per_gram": 105.2,
    "is_stable": true
  },
  "manufacturing_protocol": {...},
  "retraining_triggered": false
}
Refine protein design using conversational LLM agent.
Request:
{
  "sequence": "MKTAYIAKQR...",
  "refinement_prompt": "Reduce predicted immunogenicity by 20%"
}
- Interactive protein structure viewer
- Color-coded amino acids by type
- Rotate and zoom controls
- 3D visualization of E. coli or CHO cells
- Highlighted protein expression sites
- Animated ribosomes
- Real-time stability predictions
- Cost penalty calculations
- Yield predictions
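The predictions listed above are returned in the `oracle_results` object of the design response. The sketch below reads that object into a typed structure, with field names taken from the example response. The `OracleResults` class and its `summary` method are hypothetical, and no units are assumed for yield or cost.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class OracleResults:
    instability_index: float
    stability_score: float
    yield_prediction: float
    host_cell: str
    cost_per_gram: float
    is_stable: bool

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "OracleResults":
        """Build from a decoded `oracle_results` JSON object."""
        return cls(
            instability_index=float(data["instability_index"]),
            stability_score=float(data["stability_score"]),
            yield_prediction=float(data["yield_prediction"]),
            host_cell=str(data["host_cell"]),
            cost_per_gram=float(data["cost_per_gram"]),
            is_stable=bool(data["is_stable"]),
        )

    def summary(self) -> str:
        """One-line description using the raw values as reported."""
        verdict = "stable" if self.is_stable else "unstable"
        return (
            f"{verdict} in {self.host_cell}: "
            f"yield_prediction={self.yield_prediction}, "
            f"cost_per_gram={self.cost_per_gram}"
        )


sample = {
    "instability_index": 35.2,
    "stability_score": 64.8,
    "yield_prediction": 0.8,
    "host_cell": "E. coli",
    "cost_per_gram": 105.2,
    "is_stable": True,
}
print(OracleResults.from_dict(sample).summary())
```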
Backend:
- FastAPI (Python web framework)
- OpenAI API (LLM agent)
- AWS SageMaker (mock for hackathon)
- NumPy, scikit-learn (ML utilities)
Frontend:
- React 18
- Three.js (3D visualization)
- Vite (build tool)
- Axios (HTTP client)
- AWS SageMaker Integration: Currently uses a mock implementation. In production, this would connect to an actual SageMaker endpoint hosting a trained GNN/Transformer model.
- Protein Generation: Uses mock RL-based generation. In production, this would use a trained reinforcement learning model.
- 3D Structure: Uses simplified visualization. In production, this would integrate with AlphaFold/ESMFold API for accurate structure prediction.
- Closed-Loop System: Demonstrates the complete pipeline from design to manufacturing
- AWS Cloud Integration: Shows SageMaker deployment architecture (mock)
- Interactive 3D Visualization: Engaging user experience with Three.js
- LLM-Powered Refinement: Natural language protein design refinement
- Cost Optimization: Automatic cost penalty system guides design optimization
See LICENSE file for details.
Built for Hack Princeton Fall 2025. Inspired by the need to bridge the gap between AI-designed proteins and industrial manufacturability.