Last Updated: October 16, 2025
Status: ✅ Consolidated & Optimized
src/unified_case_extraction_master.py
- Status: ✅ ACTIVE - Single source of truth
- Purpose: ALL case name extraction and cleaning
- Key Functions:
  - extract_case_name_and_date_unified_master() - Main extraction
  - get_master_extractor() - Get singleton instance
  - UnifiedCaseExtractionMaster._clean_case_name() - Cleaning logic
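A minimal sketch of a singleton accessor like get_master_extractor(). It assumes the accessor takes no arguments and that the master class needs no constructor arguments. The cleaning rule inside _clean_case_name() is a placeholder, not the project's real logic.

```python
import threading
from typing import Optional


class UnifiedCaseExtractionMaster:
    """Hypothetical stand-in: only the singleton shape is illustrated here."""

    def _clean_case_name(self, name: str) -> str:
        # Placeholder rule (assumed): collapse whitespace, trim trailing commas/semicolons.
        return " ".join(name.split()).strip(" ,;")


_instance: Optional[UnifiedCaseExtractionMaster] = None
_lock = threading.Lock()


def get_master_extractor() -> UnifiedCaseExtractionMaster:
    """Return the one shared master instance, creating it on first use."""
    global _instance
    if _instance is None:
        with _lock:
            # Re-check inside the lock so two threads cannot both create an instance.
            if _instance is None:
                _instance = UnifiedCaseExtractionMaster()
    return _instance
```

Because every caller receives the same object, any fix made in _clean_case_name() is seen by all of them.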
src/clean_extraction_pipeline.py
- Status: ✅ ACTIVE - Production citation finding
- Purpose: Find citations in documents (uses eyecite + regex)
- Delegates to: unified_case_extraction_master.py for cleaning
- Used by: URL uploads, async processing
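A sketch of the delegation, assuming _clean_eyecite_case_name() takes and returns a plain string (the real signatures may differ). The stand-in master class and its whitespace/punctuation rule are hypothetical; the point is that the pipeline method holds no cleaning rules of its own.

```python
class UnifiedCaseExtractionMaster:
    """Hypothetical stand-in for the master extractor."""

    def _clean_case_name(self, name: str) -> str:
        # Placeholder cleaning rule, assumed for illustration only.
        return " ".join(name.split()).strip(" ,;")


_MASTER = UnifiedCaseExtractionMaster()


def get_master_extractor() -> UnifiedCaseExtractionMaster:
    # Simplified accessor returning one module-level instance.
    return _MASTER


class CleanExtractionPipeline:
    def _clean_eyecite_case_name(self, raw_name: str) -> str:
        # No local cleaning logic: forward straight to the single source of truth.
        return get_master_extractor()._clean_case_name(raw_name)
```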
src/citation_extraction_endpoint.py
- Status: ✅ ACTIVE - Production entry point
- Purpose: Main API endpoint for citation extraction
- Uses: clean_extraction_pipeline.py
src/unified_case_name_extractor_v2.py
- Status: ⚠️ DEPRECATED (shows warnings)
- Purpose: Backwards-compatibility wrapper
- Delegates to: unified_case_extraction_master.py
- Migration: Replace calls with direct calls to the master
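One way a wrapper like this can emit its warnings is a small deprecation decorator. This is a sketch: the deprecated decorator, the extract_case_name_v2 entry point, and the stand-in cleaning function are hypothetical names, not the module's actual API.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def deprecated(replacement: str) -> Callable[[F], F]:
    """Hypothetical decorator: warn on every call, pointing at the replacement."""

    def decorator(func: F) -> F:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller, not this wrapper
            )
            return func(*args, **kwargs)

        return cast(F, wrapper)

    return decorator


def _master_clean_case_name(name: str) -> str:
    # Stand-in for UnifiedCaseExtractionMaster._clean_case_name() (assumed signature).
    return " ".join(name.split()).strip(" ,;")


@deprecated("unified_case_extraction_master")
def extract_case_name_v2(name: str) -> str:
    # Hypothetical legacy entry point: no logic of its own, only delegation.
    return _master_clean_case_name(name)
```

Python hides DeprecationWarning by default outside __main__, so running the test suite with python -W error::DeprecationWarning turns any remaining legacy call into a failure.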
src/unified_citation_processor_v2.py
- Status: ⚠️ PARTIALLY DEPRECATED
- Has deprecation notice: lines 50-57
- Still used by: Some legacy code paths
- Future: Will be replaced by clean pipeline
src/unified_extraction_architecture.py
src/case_name_extraction_core.py (47+ duplicate functions)
src/enhanced_sync_processor.py
src/processors/citation_extractor.py
src/processors/name_year_extractor.py
User uploads URL
↓
progress_manager.py
↓
citation_extraction_endpoint.extract_citations_production()
↓
clean_extraction_pipeline.CleanExtractionPipeline.extract_citations()
↓
clean_extraction_pipeline._clean_eyecite_case_name()
↓ DELEGATES TO
unified_case_extraction_master._clean_case_name()
✅ SINGLE SOURCE OF TRUTH
User pastes text
↓
unified_sync_processor.py
↓
May use various paths, but ultimately:
↓
unified_case_extraction_master.extract_case_name_and_date_unified_master()
✅ SINGLE SOURCE OF TRUTH
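Both entry points are meant to end at the same master function, so a parity check over sample case names is a cheap way to confirm it. In this sketch the two path functions are hypothetical simplifications that keep only the final hop into the master, and master_clean_case_name is a placeholder for the real _clean_case_name().

```python
from typing import Callable, Dict, List


def master_clean_case_name(name: str) -> str:
    # Hypothetical stand-in for unified_case_extraction_master._clean_case_name().
    return " ".join(name.split()).strip(" ,;")


def url_upload_path(raw_name: str) -> str:
    # Mirrors: extract_citations_production -> CleanExtractionPipeline
    # -> _clean_eyecite_case_name -> master (only the final hop is modeled).
    return master_clean_case_name(raw_name)


def pasted_text_path(raw_name: str) -> str:
    # Mirrors: unified_sync_processor -> master (only the final hop is modeled).
    return master_clean_case_name(raw_name)


def find_path_mismatches(
    samples: List[str], paths: Dict[str, Callable[[str], str]]
) -> List[str]:
    """Return the samples on which the entry paths disagree (empty list means parity)."""
    mismatches = []
    for sample in samples:
        outputs = {path(sample) for path in paths.values()}
        if len(outputs) > 1:
            mismatches.append(sample)
    return mismatches
```

Usage: find_path_mismatches(samples, {"url": url_upload_path, "text": pasted_text_path}) should return an empty list whenever both paths route through the master.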
- ❌ Duplicate cleaning logic in 2 files
- ❌ 80+ lines of code duplicated
- ❌ Bug fixes needed in multiple places
- ✅ Single source of truth for cleaning
- ✅ 51 lines of duplicate code eliminated
- ✅ Bug fixes apply everywhere automatically
- ✅ DO: Edit unified_case_extraction_master._clean_case_name()
- ✅ RESULT: The fix automatically applies to ALL code paths
- ❌ DON'T: Edit clean_extraction_pipeline._clean_eyecite_case_name() (it only delegates to the master)
- Add deprecation warnings
- Update DEPRECATION_NOTICE.md
- Create delegation to master
- Test both code paths
- Remove after 2-3 versions
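The "Test both code paths" step can be reduced to a single check: the deprecated wrapper must warn and must return exactly what the master returns. All function names below are hypothetical stand-ins, not the project's real API.

```python
import warnings
from typing import Callable


def master_clean_case_name(name: str) -> str:
    # Hypothetical stand-in for the master cleaning function.
    return " ".join(name.split()).strip(" ,;")


def legacy_clean_case_name(name: str) -> str:
    # Hypothetical deprecated wrapper: warn, then delegate to the master.
    warnings.warn("legacy_clean_case_name is deprecated", DeprecationWarning, stacklevel=2)
    return master_clean_case_name(name)


def check_deprecated_wrapper(
    wrapper: Callable[[str], str], master: Callable[[str], str], sample: str
) -> bool:
    """True only if the wrapper emits a DeprecationWarning and matches the master's output."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record the warning even if already shown once
        wrapped = wrapper(sample)
    warned = any(issubclass(w.category, DeprecationWarning) for w in caught)
    return warned and wrapped == master(sample)
```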
- Complete Migration: Fully migrate unified_citation_processor_v2.py to use the master
- Remove Wrappers: Eventually remove unified_case_name_extractor_v2.py
- Single Pipeline: Consolidate the sync and async paths to use the same pipeline
- Archive Old Code: Move deprecated files to the src/deprecated/ folder
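Before removing the wrappers or archiving files to src/deprecated/, it helps to know which modules still import them. The helper below is a hypothetical sketch that scans a source tree with a regular expression; it only recognizes lines that begin with from or import and name a deprecated module directly, so an AST-based scan would be more thorough.

```python
import re
from pathlib import Path
from typing import Dict, List

# Module names taken from the migration plan; extend as more files are deprecated.
DEPRECATED_MODULES = ("unified_case_name_extractor_v2", "unified_citation_processor_v2")

# Matches "import X" / "from X import ..." where X may carry a dotted or relative
# prefix (e.g. "src." or "."); "import os, X" style lines are NOT caught.
_IMPORT_RE = re.compile(
    r"^\s*(?:from|import)\s+(?:[\w.]*\.)?("
    + "|".join(map(re.escape, DEPRECATED_MODULES))
    + r")\b",
    re.MULTILINE,
)


def find_deprecated_imports(root: Path) -> Dict[str, List[str]]:
    """Map each .py file under root (relative path) to the deprecated modules it imports."""
    hits: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.py")):
        found = _IMPORT_RE.findall(path.read_text(encoding="utf-8", errors="ignore"))
        if found:
            hits[str(path.relative_to(root))] = sorted(set(found))
    return hits
```

Usage: find_deprecated_imports(Path("src")) returns an empty dict once no file imports either deprecated module.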
- Code Duplication: 73 fewer duplicate lines (51 + 22 from imports)
- Single Source of Truth: 100% (all cleaning goes through master)
- Test Coverage: All fixes tested with URL and text inputs
- Documentation: Deprecation notices in place