Current Architecture - CaseStrainer Extraction System

Last Updated: October 16, 2025
Status: ✅ Consolidated & Optimized

🎯 ACTIVE CODE - Use These

Primary Extraction & Cleaning

src/unified_case_extraction_master.py

Status: ✅ ACTIVE - Single source of truth
Purpose: ALL case name extraction and cleaning
Key Functions:
- extract_case_name_and_date_unified_master() - Main extraction
- get_master_extractor() - Get singleton instance
- UnifiedCaseExtractionMaster._clean_case_name() - Cleaning logic
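get_master_extractor() is described as returning a singleton instance. The sketch below shows one common way to build such an accessor. It assumes a lazily created module-level instance protected by a lock. The class body and its whitespace rule are placeholders, not the real cleaning logic.

```python
import threading
from typing import Optional


class UnifiedCaseExtractionMaster:
    """Placeholder: the real class holds the full extraction and cleaning logic."""

    def _clean_case_name(self, name: str) -> str:
        # Hypothetical rule for illustration: collapse runs of whitespace.
        return " ".join(name.split())


_master: Optional[UnifiedCaseExtractionMaster] = None
_master_lock = threading.Lock()


def get_master_extractor() -> UnifiedCaseExtractionMaster:
    """Return the shared extractor, creating it on first use."""
    global _master
    if _master is None:  # fast path: no locking once the instance exists
        with _master_lock:
            # Re-check under the lock so two threads cannot both create one.
            if _master is None:
                _master = UnifiedCaseExtractionMaster()
    return _master
```

Every caller that goes through get_master_extractor() shares the same object. Compiled patterns and other setup therefore happen once per process.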

Citation Finding (Production Pipeline)

src/clean_extraction_pipeline.py

Status: ✅ ACTIVE - Production citation finding
Purpose: Find citations in documents (uses eyecite + regex)
Delegates to: unified_case_extraction_master.py for cleaning
Used by: URL uploads, async processing
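The delegation from the pipeline to the master can be pictured as follows. This sketch makes two assumptions: that _clean_eyecite_case_name is a method on CleanExtractionPipeline, and that the master is reached through get_master_extractor(). The cleaning rules shown are hypothetical stand-ins.

```python
class UnifiedCaseExtractionMaster:
    """Placeholder for the master; only _clean_case_name is sketched."""

    def _clean_case_name(self, name: str) -> str:
        # Hypothetical rules: collapse whitespace, drop a trailing comma.
        return " ".join(name.split()).rstrip(",")


_MASTER = UnifiedCaseExtractionMaster()


def get_master_extractor() -> UnifiedCaseExtractionMaster:
    return _MASTER


class CleanExtractionPipeline:
    def _clean_eyecite_case_name(self, raw_name: str) -> str:
        # No cleaning rules live here: forwarding to the master means a fix
        # made there also applies to the URL-upload/async path.
        return get_master_extractor()._clean_case_name(raw_name)
```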

Production Endpoint

src/citation_extraction_endpoint.py

Status: ✅ ACTIVE - Production entry point
Purpose: Main API endpoint for citation extraction
Uses: clean_extraction_pipeline.py

⚠️ DEPRECATED BUT STILL FUNCTIONAL

Legacy Wrapper (Delegates to Master)

src/unified_case_name_extractor_v2.py

Status: ⚠️ DEPRECATED (shows warnings)
Purpose: Backwards compatibility wrapper
Delegates to: unified_case_extraction_master.py
Migration: Replace calls with direct master calls
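A backwards-compatibility wrapper of this kind typically emits a DeprecationWarning and forwards its arguments unchanged. In this sketch, the legacy name extract_case_name_and_date is hypothetical. The master function is a stub, and its real parameters and return value may differ.

```python
import warnings
from typing import Any


def extract_case_name_and_date_unified_master(*args: Any, **kwargs: Any) -> dict:
    # Stub for the master function; its real parameters and return type may differ.
    return {"args": args, "kwargs": kwargs}


def extract_case_name_and_date(*args: Any, **kwargs: Any) -> dict:
    """Hypothetical legacy entry point kept only for backwards compatibility."""
    warnings.warn(
        "unified_case_name_extractor_v2 is deprecated; call "
        "extract_case_name_and_date_unified_master() directly",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    # Forward arguments untouched so the wrapper cannot drift from the master.
    return extract_case_name_and_date_unified_master(*args, **kwargs)
```

Migrating a call site then just means replacing the legacy name with the master function. Because the wrapper forwards arguments untouched, the two calls accept identical arguments.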

Legacy Processor (Being Phased Out)

src/unified_citation_processor_v2.py

Status: ⚠️ PARTIALLY DEPRECATED
Has deprecation notice: Line 50-57
Still used by: Some legacy code paths
Future: Will be replaced by clean pipeline

❌ DO NOT USE - Deprecated

src/unified_extraction_architecture.py
src/case_name_extraction_core.py (47+ duplicate functions)
src/enhanced_sync_processor.py
src/processors/citation_extractor.py
src/processors/name_year_extractor.py
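One way to enforce the DO NOT USE list is a check that parses source files and flags any imports of those modules. It could run in CI or as a test. The sketch below uses only the standard ast module. It assumes these modules are imported as src.<module> or through relative imports. It will not catch dynamic imports such as importlib.import_module().

```python
import ast

# Modules from the "DO NOT USE" list above.
DEPRECATED_MODULES = {
    "src.unified_extraction_architecture",
    "src.case_name_extraction_core",
    "src.enhanced_sync_processor",
    "src.processors.citation_extractor",
    "src.processors.name_year_extractor",
}


def _is_deprecated(name: str) -> bool:
    # Accept both "src.pkg.mod" and the prefix-less form used by relative imports.
    return any(name in (dep, dep.removeprefix("src.")) for dep in DEPRECATED_MODULES)


def find_deprecated_imports(source: str) -> list[str]:
    """Return the deprecated module names imported by a Python source string."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            base = node.module or ""
            # "from pkg import mod" may import a submodule, so test pkg.mod too.
            candidates = [base] + [
                f"{base}.{alias.name}" if base else alias.name for alias in node.names
            ]
        else:
            continue
        hits.update(name for name in candidates if _is_deprecated(name))
    return sorted(hits)
```

In a test, this could be applied to every file returned by pathlib.Path("src").rglob("*.py"), excluding the deprecated files themselves.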

📊 Code Flow - Current

URL Upload (Async):

User uploads URL
    ↓
progress_manager.py
    ↓
citation_extraction_endpoint.extract_citations_production()
    ↓
clean_extraction_pipeline.CleanExtractionPipeline.extract_citations()
    ↓
clean_extraction_pipeline._clean_eyecite_case_name()
    ↓ DELEGATES TO
unified_case_extraction_master._clean_case_name()
    ✅ SINGLE SOURCE OF TRUTH

Text Paste (Sync):

User pastes text
    ↓
unified_sync_processor.py
    ↓
Intermediate steps vary by code path, but all of them end at:
    ↓
unified_case_extraction_master.extract_case_name_and_date_unified_master()
    ✅ SINGLE SOURCE OF TRUTH
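Both flows are meant to end at the master, so a test can spy on the master cleaner and confirm that each entry point reaches it. The sketch below uses unittest.mock with wraps= so the real method still runs. async_path_clean and sync_path_clean are hypothetical stand-ins for the URL-upload and text-paste entry points. A real test would call those entry points instead.

```python
from unittest import mock


class UnifiedCaseExtractionMaster:
    def _clean_case_name(self, name: str) -> str:
        return " ".join(name.split())  # placeholder rule


MASTER = UnifiedCaseExtractionMaster()


def async_path_clean(raw: str) -> str:
    # Hypothetical stand-in for endpoint -> CleanExtractionPipeline -> master.
    return MASTER._clean_case_name(raw)


def sync_path_clean(raw: str) -> str:
    # Hypothetical stand-in for unified_sync_processor -> ... -> master.
    return MASTER._clean_case_name(raw)


def both_paths_reach_master(raw: str) -> bool:
    """Spy on the master cleaner; True if each path calls it and results match."""
    with mock.patch.object(
        MASTER, "_clean_case_name", wraps=MASTER._clean_case_name
    ) as spy:
        from_url = async_path_clean(raw)
        from_text = sync_path_clean(raw)
    return spy.call_count == 2 and from_url == from_text
```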

🔧 Recent Consolidation (Oct 16, 2025)

Before:

❌ Duplicate cleaning logic in 2 files
❌ 80+ lines of code duplicated
❌ Bug fixes needed in multiple places

After:

✅ Single source of truth for cleaning
✅ 51 lines of duplicate code eliminated
✅ Bug fixes apply everywhere automatically

📝 For Developers

Adding a New Fix:

✅ DO: Edit unified_case_extraction_master._clean_case_name()
✅ RESULT: Fix automatically applies to ALL code paths
❌ DON'T: Edit clean_extraction_pipeline._clean_eyecite_case_name() (it only delegates to the master)
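For example, a new rule added inside the master's _clean_case_name is picked up by every caller. The rule shown here strips a leading citation signal such as "See" or "Cf.". It is a hypothetical illustration, not one of the existing fixes, and the other rules are placeholders.

```python
import re


class UnifiedCaseExtractionMaster:
    # Hypothetical new fix: leading citation signals, longest alternative first
    # so "see also" is not matched as just "see".
    _LEADING_SIGNAL = re.compile(r"^(?:see also|see|cf\.|accord)\s+", re.IGNORECASE)

    def _clean_case_name(self, name: str) -> str:
        name = " ".join(name.split())              # placeholder rule: collapse whitespace
        name = self._LEADING_SIGNAL.sub("", name)  # new rule, applies to every caller
        return name.rstrip(" ,;")                  # drop trailing punctuation
```

Pair each new rule with a regression case, including a near-miss such as a party name that merely starts with "See", so the fix does not over-match.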

Deprecating More Code:

1. Add deprecation warnings to the old module
2. Update DEPRECATION_NOTICE.md
3. Make the old functions delegate to the master
4. Test both the old and new code paths
5. Remove the old code after 2-3 versions
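You can automate the "test both code paths" step with a helper. It feeds the same inputs to the legacy entry point and to the master, then reports any input where the results differ or where no DeprecationWarning was raised. The sketch assumes both callables take a single string. The helper name compare_code_paths is hypothetical.

```python
import warnings
from typing import Callable, Iterable


def compare_code_paths(
    legacy: Callable[[str], str],
    master: Callable[[str], str],
    samples: Iterable[str],
) -> list[str]:
    """Return samples where legacy and master disagree or legacy fails to warn."""
    problems = []
    for sample in samples:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # don't let the default filter hide repeats
            old = legacy(sample)
        warned = any(issubclass(w.category, DeprecationWarning) for w in caught)
        if old != master(sample) or not warned:
            problems.append(sample)
    return problems
```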

🎯 Future Work

Complete Migration: Fully migrate unified_citation_processor_v2.py to use master
Remove Wrappers: Eventually remove unified_case_name_extractor_v2.py
Single Pipeline: Consolidate sync and async paths to use the same pipeline
Archive Old Code: Move deprecated files to src/deprecated/ folder

✅ Quality Metrics

Code Duplication: 73 fewer duplicate lines (51 + 22 from imports)
Single Source of Truth: 100% (all cleaning goes through master)
Test Coverage: All fixes tested with URL and text inputs
Documentation: Deprecation notices in place
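The duplicate-line counts above do not say how they were measured. One rough, hypothetical way to track a similar figure is to count the stripped, non-trivial lines that appear in both of two files. It ignores renamed variables and reordered logic, so treat it as a trend indicator rather than an exact count.

```python
from collections import Counter


def _significant_lines(source: str, min_length: int) -> Counter:
    counts: Counter = Counter()
    for line in source.splitlines():
        text = line.strip()
        # Skip blanks, short lines such as "else:", and comments.
        if len(text) >= min_length and not text.startswith("#"):
            counts[text] += 1
    return counts


def count_shared_lines(a: str, b: str, min_length: int = 10) -> int:
    """Rough duplication metric: stripped lines present in both sources."""
    # Counter "&" keeps the minimum count of each line across the two files.
    common = _significant_lines(a, min_length) & _significant_lines(b, min_length)
    return sum(common.values())
```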
