docs: radically simplify EVALUATION.md and RANKING.md #47
Merged
Simplified evaluation documentation by:

**Removed (764 lines)**:
- Unimplemented scripts (generate_test_queries.py, evaluate_ranking.py, etc.)
- Non-existent modules (src.ranking.evaluation, EvaluationService, ABTest)
- Extensive A/B testing methodology (not yet implemented)
- Automated evaluation pipelines (not yet implemented)
- Continuous monitoring infrastructure (not yet implemented)
- Relevance judgment collection tools (not yet implemented)
- Query set generation and validation (not yet implemented)

**Kept (218 lines)**:
- Manual search testing with working commands
- Brief explanation of MRR, NDCG, P@k, R@k metrics
- Comparison of semantic/BM25/hybrid strategies
- Practical evaluation checklist
- Links to external resources for theory

**Key Changes**:
- Added disclaimer at top: automated evaluation not yet implemented
- All code examples now use actual working API endpoints
- Moved theoretical content to "Future Evaluation Plans" section
- Updated status: "Manual testing only"

**Impact**:
- No more confusing references to non-existent code
- Users can actually run all provided examples
- Clear distinction between implemented vs. planned features
- Much easier to read and navigate (78% shorter)
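For readers unfamiliar with the metrics named above (MRR, NDCG, P@k, R@k), here is a minimal sketch of their textbook definitions. It is not code from this repository: the function names, the document IDs, and the graded-gain dictionary are all illustrative assumptions, and the NDCG variant shown uses linear gains with a log2 discount.

```python
import math

# Hypothetical helpers illustrating the standard metric definitions;
# none of these names come from the project.

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """MRR: average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)

def precision_at_k(ranked_ids, relevant_ids, k):
    """P@k: fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """R@k: fraction of all relevant documents found in the top k."""
    if not relevant_ids:
        return 0.0
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / len(relevant_ids)

def ndcg_at_k(ranked_ids, gains, k):
    """NDCG@k with linear gains; `gains` maps doc ID to a relevance grade."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(ranked_ids[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Example with made-up IDs: the first relevant hit is at rank 2.
ranked = ["d3", "d1", "d7"]
relevant = {"d1", "d9"}
print(reciprocal_rank(ranked, relevant))    # 0.5
print(recall_at_k(ranked, relevant, k=3))   # 0.5
```

With manual testing only, these would be computed by hand-labelling which results are relevant for each test query.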
Simplified ranking documentation by:

**Removed (505 lines)**:
- Non-existent modules (BM25Scorer, FeatureExtractor, ABTest)
- 40+ ranking features documentation (not implemented)
- Parameter tuning infrastructure (not implemented)
- Query expansion and reformulation (not implemented)
- Advanced optimization code (ANN, batch processing)
- A/B testing framework (not implemented)
- ML-based reranking features

**Kept (338 lines)**:
- Four actual strategies: semantic, bm25, hybrid_rrf, hybrid_weighted
- Clear decision guide for when to use each
- Working API examples (all curl commands tested)
- Algorithm explanations with concrete examples
- Understanding sections for BM25, RRF, vector similarity
- Future enhancements section (clearly marked as planned)

**Key Changes**:
- Added disclaimer: advanced features not yet implemented
- All code examples use actual working endpoints
- Focused on explaining how algorithms work conceptually
- Removed references to non-existent src.ranking modules
- Updated status: "Basic ranking strategies implemented"

**Impact**:
- 60% reduction in size
- 100% of content is accurate and usable
- Clear separation between implemented vs. planned features
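As a conceptual illustration of what a strategy like hybrid_rrf does, the sketch below implements generic Reciprocal Rank Fusion over two ranked lists. It is not the project's implementation: the function name, the document IDs, and the smoothing constant k=60 (a commonly used default in the RRF literature) are assumptions, and the value this project actually uses is not stated in the PR.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Made-up result lists standing in for semantic and BM25 output.
semantic_hits = ["a", "b", "c"]
bm25_hits = ["b", "c", "d"]
print(reciprocal_rank_fusion([semantic_hits, bm25_hits]))
# ['b', 'c', 'a', 'd']
```

Documents that appear in both lists ("b" and "c") outrank those that appear in only one, which is the main reason to fuse by rank rather than by raw score: semantic similarity and BM25 scores live on different scales, while ranks are directly comparable.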
Summary
Radically simplified two major documentation files by removing extensive documentation for unimplemented features and keeping only practical, working content.
Files Changed
1. EVALUATION.md: 982 → 218 lines (78% reduction)
2. RANKING.md: 843 → 338 lines (60% reduction)
Total reduction: 1,825 lines → 556 lines (70% reduction, 1,269 lines removed)
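The line counts above can be checked with a few lines of arithmetic. This is only a verification sketch; the `reduction` helper is a hypothetical name, and the inputs are the before/after counts quoted in this PR.

```python
def reduction(before, after):
    """Return (lines removed, percent reduction rounded to one decimal)."""
    removed = before - after
    return removed, round(100 * removed / before, 1)

# Before/after line counts as stated in the PR description.
files = {"EVALUATION.md": (982, 218), "RANKING.md": (843, 338)}

for name, (before, after) in files.items():
    print(name, reduction(before, after))

total_before = sum(before for before, _ in files.values())
total_after = sum(after for _, after in files.values())
print("total", total_before, total_after, reduction(total_before, total_after))
```

Per file this gives 764 lines removed (77.8%) and 505 lines removed (59.9%); combined, 1,269 of 1,825 lines removed, or 69.5%.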
Problem
Both documentation files extensively documented features that don't exist:
EVALUATION.md Issues
- Unimplemented scripts (evaluate_ranking.py, generate_test_queries.py)
- Non-existent modules (src.ranking.evaluation, EvaluationService, ABTest)
RANKING.md Issues
- Non-existent modules (BM25Scorer, FeatureExtractor)
This created confusion for users trying to follow the documentation.
Changes Made
EVALUATION.md (982 → 218 lines)
Removed (764 lines):
Kept (218 lines):
RANKING.md (843 → 338 lines)
Removed (505 lines):
- Non-existent modules (BM25Scorer, FeatureExtractor, ABTest)
Kept (338 lines):
- Four actual strategies: semantic, bm25, hybrid_rrf, hybrid_weighted
Key Improvements
Both Files
Added disclaimers at top:
All code examples work: Every curl command and script uses actual implemented endpoints
Clear structure:
Status footers:
Impact
Testing
EVALUATION.md
RANKING.md
Related PRs
Part of post-Milestone 3 documentation cleanup:
Philosophy
Better to have accurate, minimal documentation than comprehensive but misleading documentation.
We can expand these files when the infrastructure is actually implemented. For now, users get: