Fix: Resolve Pipeline 500 Error and Improve Data Processing #1

Merged
ODORA0 merged 1 commit into main from fix/pipeline-500-error on Oct 16, 2025

Conversation


@ODORA0 ODORA0 commented Oct 16, 2025

🐛 Bug Fix: Pipeline 500 Error Resolution

The 'Run Pipeline' button was returning a 500 Internal Server Error, preventing the data pipeline from executing successfully.

Root Causes Identified

  1. Worker Service Not Running: the Python/FastAPI worker was not being started properly
  2. dbt Port Mismatch: dbt was configured for port 5432, but PostgreSQL was running on 5433
  3. CSV Column Name Issues: numeric column names like "1958" caused PostgreSQL syntax errors
  4. dbt Model Generation: inconsistent column naming between CSV loading and the dbt models

Solutions Implemented

1. Fixed dbt Port Configuration

  • Updated dbt/relayboard/profiles.yml to use port 5433 instead of 5432
  • Ensures dbt can connect to the PostgreSQL Docker container
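The relevant part of `profiles.yml` would look roughly like this. Only the port change is from this PR; the profile name follows the `dbt/relayboard` path, and the host, credentials, database, and schema values are assumptions for illustration:

```yaml
relayboard:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5433        # was 5432; the Docker container exposes PostgreSQL on 5433
      user: postgres    # assumed credentials
      password: postgres
      dbname: relayboard
      schema: staging
```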

2. Enhanced CSV Column Name Handling

  • Added a robust column-name cleaning function:

    ```python
    def clean_column_name(name):
        # Strip whitespace, drop quote characters, and replace spaces with underscores.
        cleaned = str(name).strip().replace('"', '').replace("'", "").replace(" ", "_")
        # PostgreSQL identifiers cannot start with a digit, so prefix those names.
        if cleaned.isdigit() or (cleaned and cleaned[0].isdigit()):
            cleaned = f"col_{cleaned}"
        # Guard against headers that clean down to an empty string.
        if not cleaned:
            cleaned = "unnamed_column"
        return cleaned.lower()
    ```
  • Numeric column names like "1958" are now converted to col_1958
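Downstream, the cleaned names can be used directly when creating the staging table. A minimal sketch; the DDL builder below, the all-text column typing, and the sample table name are assumptions for illustration, not the PR's actual loader code:

```python
def build_create_table(schema, table, cleaned_columns):
    """Build a CREATE TABLE statement from already-cleaned column names.

    Loading every column as text and letting dbt cast types downstream is an
    assumed loader convention, not taken from the PR.
    """
    cols = ",\n    ".join(f'"{c}" text' for c in cleaned_columns)
    return f'create table "{schema}"."{table}" (\n    {cols}\n)'

# Cleaned names like col_1958 are now valid, quotable identifiers:
ddl = build_create_table("staging", "airtravel", ["month", "col_1958", "col_1959"])
print(ddl)
```

Because the names were cleaned before this point, the generated DDL never contains a bare numeric identifier like `"1958"` at the start of a column name.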

3. Improved Error Handling

  • Added detailed logging and traceback information in the worker
  • Better error messages for debugging pipeline issues
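The pattern described can be sketched in plain Python as follows. The worker is FastAPI, but nothing framework-specific is needed to show the idea; `run_step` and the result shape are illustrative names, not the PR's actual code:

```python
import logging
import traceback

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("worker")

def run_step(name, fn, *args, **kwargs):
    """Run one pipeline step, logging a full traceback on failure."""
    try:
        result = fn(*args, **kwargs)
        logger.info("step %s succeeded", name)
        return {"step": name, "status": "ok", "result": result}
    except Exception as exc:
        tb = traceback.format_exc()
        logger.error("step %s failed: %s\n%s", name, exc, tb)
        # Return a structured error instead of surfacing an opaque 500.
        return {"step": name, "status": "error", "error": str(exc), "traceback": tb}
```

A failing step then yields a payload whose `traceback` field pinpoints the cause, rather than a bare 500 response.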

4. Fixed dbt Model Generation

  • Applied consistent column cleaning logic to dbt model generation
  • Ensures staging and warehouse tables have matching column names
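The consistency fix can be illustrated like this. The generator below and its model template are a sketch under assumptions (the PR's actual generator is not shown here); the key point is that it consumes the same cleaned names the CSV loader produced:

```python
def generate_staging_model(table_name, cleaned_columns):
    """Emit a dbt staging model selecting the already-cleaned column names,
    so staging and warehouse tables agree on identifiers like col_1958."""
    select_list = ",\n    ".join(f'"{col}"' for col in cleaned_columns)
    return (
        f"select\n    {select_list}\n"
        f"from {{{{ source('staging', '{table_name}') }}}}\n"
    )

print(generate_staging_model("airtravel", ["month", "col_1958", "col_1959"]))
```

Feeding both the loader and the generator from one cleaning function is what guarantees the names match end to end.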

5. Updated .gitignore

  • Added dbt build artifacts (target/, .user.yml) to prevent committing build files
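The added entries would look like this (the exact path prefix inside the repo is an assumption based on the `dbt/relayboard` project layout):

```gitignore
# dbt build artifacts
dbt/relayboard/target/
dbt/relayboard/.user.yml
```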

Testing Results

Pipeline now works successfully!

(screenshot: successful pipeline run)

What's Working Now

  1. **CSV Registration**: Upload CSV files to MinIO
  2. **Slack Configuration**: Set up Slack webhook destinations
  3. **Pipeline Execution**: The complete data pipeline runs successfully:
    • Downloads CSV from MinIO
    • Loads data to PostgreSQL staging schema
    • Generates dbt models automatically
    • Runs dbt transformations
    • Dispatches results to Slack
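The end-to-end flow above can be sketched as a simple orchestrator. All stage names and the callable-per-stage structure are illustrative, not the PR's actual code:

```python
def run_pipeline(csv_key, steps):
    """Run the five stages in order, threading each stage's output into the next.

    `steps` maps stage name -> callable; a failure in any stage should surface
    a clear error (see the worker's error handling) rather than a bare 500.
    """
    order = ["download_csv", "load_staging", "generate_models",
             "run_dbt", "dispatch_slack"]
    payload = csv_key
    completed = []
    for name in order:
        payload = steps[name](payload)
        completed.append(name)
    return completed
```

For example, wiring each name to a real function (MinIO download, staging load, model generation, `dbt run`, Slack webhook) reproduces the sequence listed above.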

Video demo:

Screen.Recording.Oct.16.2025.1.1.mov

- Fix dbt port configuration (5432 → 5433) to match PostgreSQL Docker setup
- Add robust CSV column name cleaning to handle numeric column names
- Improve error handling in worker with detailed logging and traceback
- Fix dbt model generation to use consistent column naming
- Add dbt build artifacts to .gitignore

Fixes the 'Run Pipeline' 500 error by:
1. Starting worker service properly with uvicorn
2. Correcting database port mismatch
3. Handling edge cases in CSV column names (e.g., '1958' → 'col_1958')
4. Ensuring dbt models use cleaned column names consistently

Pipeline now successfully:
- Downloads CSV from MinIO
- Loads data to PostgreSQL staging
- Generates and runs dbt transformations
- Dispatches results to Slack

Resolves: Pipeline execution errors and data processing issues
@ODORA0 ODORA0 merged commit 4e00aa8 into main Oct 16, 2025
3 checks passed