folder2md4llms converts folder structures and file contents into LLM-friendly Markdown files. It's designed to help developers share codebases with AI assistants like Claude.
- CLI (cli.py): Entry point using rich-click for enhanced help
- Processor (processor.py): Main orchestrator for repository processing
- Analyzers: Code analysis and condensing logic
- Converters: Document format conversion (PDF, DOCX, etc.)
- Engine: Smart anti-truncation engine for token management
- Utils: File handling, patterns, and configuration
- Streaming Processing: Handles large files efficiently with parallel processing
- Smart Condensing: Progressive condensing based on token budgets
- Hierarchical Ignore Patterns: Supports .folder2md_ignore at multiple levels
- Platform Agnostic: Uses python-magic-bin on Windows, python-magic elsewhere
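The hierarchical ignore behaviour can be illustrated with a small sketch. It is an assumption-laden example, not the project's implementation: the helper names `load_patterns` and `is_ignored` are hypothetical, and matching uses the standard library's `fnmatch` rather than whatever pattern syntax folder2md4llms actually supports. The idea shown is only that each directory between the root and a file may contribute its own .folder2md_ignore, with patterns tested against the path relative to that directory.

```python
from fnmatch import fnmatch
from pathlib import Path

IGNORE_FILE = ".folder2md_ignore"


def load_patterns(directory: Path) -> list[str]:
    """Read non-empty, non-comment lines from one directory's ignore file."""
    ignore_file = directory / IGNORE_FILE
    if not ignore_file.is_file():
        return []
    lines = (ln.strip() for ln in ignore_file.read_text(encoding="utf-8").splitlines())
    return [ln for ln in lines if ln and not ln.startswith("#")]


def is_ignored(path: Path, root: Path) -> bool:
    """Test a path against the ignore files of every directory from root to its parent."""
    parts = path.relative_to(root).parts
    current = root
    for depth in range(len(parts)):
        # Path of the file relative to the directory being inspected.
        rest = Path(*parts[depth:])
        for pattern in load_patterns(current):
            if fnmatch(rest.as_posix(), pattern) or fnmatch(rest.name, pattern):
                return True
        current = current / parts[depth]
    return False
```

With this sketch, a pattern placed in a subdirectory's ignore file only affects files under that subdirectory, while root-level patterns apply everywhere.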
src/folder2md4llms/
├── cli.py # Command-line interface
├── processor.py # Main processing logic
├── analyzers/ # Code analysis modules
│ ├── priority_analyzer.py # File importance scoring
│ ├── progressive_condenser.py # Smart code condensing
│ └── binary_analyzer.py # Binary file analysis
├── converters/ # Document converters
│ ├── converter_factory.py # Central converter registry
│ └── [format]_converter.py # Format-specific converters
├── engine/ # Smart processing engine
│ └── smart_engine.py # Token budget management
├── formatters/ # Output formatting
│ └── markdown.py # Markdown generation
└── utils/ # Utilities
    ├── config.py # Configuration management
    ├── file_strategy.py # File processing strategies
    ├── streaming_processor.py # Parallel file processing
    └── token_utils.py # Token counting
# Clone and navigate
git clone https://github.com/henriqueslab/folder2md4llms
cd folder2md4llms
# Install with uv (recommended)
uv sync --all-extras
# Or traditional pip
pip install -e ".[dev]"
# Run tests with coverage
just test
# Format code and fix lint issues
just fix
# Run all static analysis (format, lint, types)
just check
# Build package
just build
- Tests use pytest with parallel execution
- Mock heavy operations (file I/O, network)
- Test cross-platform compatibility
- Coverage target: >80%
- Test Coverage: Increase coverage from 67% to >80%
- Error Handling: Improve error messages and recovery
- Performance: Optimize for large repositories
- Add support for more file formats
- Implement custom token counting models
- Add progress bars for long operations
- Keep docs simple and focused
- Avoid unnecessary complexity in installation guides
- Focus on common use cases, not edge cases
- token_limit: Maximum tokens for output (e.g., 80000)
- smart_condensing: Enable intelligent code condensing
- condense_languages: Languages to condense
- max_file_size: Skip files larger than this
- token_budget_strategy: How to allocate tokens (balanced/aggressive/conservative)
- FOLDER2MD_CONFIG: Path to custom config file
- FOLDER2MD_UPDATE_CHECK: Disable update checks
- Python 3.11+ with type hints
- Ruff for linting and formatting
- Line length: 88 characters
- Docstrings for public APIs
- NO comments unless necessary
- Update version in __version__.py
- Update CHANGELOG.md
- Create PR to main
- Tag release triggers PyPI publication
- Update Homebrew formula (see below)
folder2md4llms is distributed via PyPI and Homebrew. After the PyPI release is live, update the Homebrew formula.
The Homebrew formula is maintained in a separate repository located at ../homebrew-formulas:
- Repository: ../homebrew-formulas/
- Formula file: Formula/folder2md4llms.rb
- Automation: Managed via justfile commands
After the PyPI release is live and verified at https://pypi.org/project/folder2md4llms/:
cd ../homebrew-formulas
# Option 1: Full automated release workflow (recommended)
# This will update, test, commit, and push in one command
just release folder2md4llms
# Option 2: Manual step-by-step workflow
just update folder2md4llms # Updates to latest PyPI version
just test folder2md4llms # Tests the formula installation
just commit folder2md4llms VERSION # Commits with standardized message
git push # Push to remote
# Utility commands
just list # List all formulas with current versions
just check-updates # Check for available PyPI updates
just sha256 folder2md4llms VERSION # Get SHA256 for a specific version
- Always verify PyPI first: The formula update pulls package info from PyPI, so the release must be live
- Automatic metadata: The just update command automatically fetches the version, download URL, and SHA256 checksum from PyPI
- Full automation: The just release command runs the complete workflow: update → test → commit → push
- Standardized commits: Formula updates use a consistent commit message format
- Testing: The just test command uninstalls and reinstalls the formula to verify it works correctly
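For readers who want to check the same metadata by hand, the version, download URL, and SHA256 can be read from PyPI's public JSON endpoint. The sketch below shows one way to do that. It is not the justfile's actual implementation, which is not shown here, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_release_json(project: str) -> dict:
    """Download the JSON metadata PyPI publishes for a project's latest release."""
    with urlopen(f"https://pypi.org/pypi/{project}/json", timeout=10) as response:
        return json.load(response)


def sdist_metadata(payload: dict) -> dict[str, str]:
    """Extract version, source-distribution URL, and SHA256 from a PyPI payload."""
    version = payload["info"]["version"]
    for artifact in payload["urls"]:
        if artifact["packagetype"] == "sdist":
            return {
                "version": version,
                "url": artifact["url"],
                "sha256": artifact["digests"]["sha256"],
            }
    raise LookupError(f"no sdist published for version {version}")
```

Calling `sdist_metadata(fetch_release_json("folder2md4llms"))` requires network access and only returns the new version once the PyPI release is live, which is why the release must be verified first.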
- Sanitize file paths to prevent traversal
- Limit file sizes to prevent DoS
- No execution of analyzed code
- Safe handling of binary files
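The first two points can be sketched with the standard library alone. This is a minimal illustration under the assumption that every processed path must stay inside a single root directory. The helper names are hypothetical and do not come from the project's code.

```python
from pathlib import Path


def resolve_inside(root: Path, candidate: str) -> Path:
    """Resolve candidate under root, rejecting anything that escapes root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (root / candidate).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"path escapes root: {candidate}")
    return target


def within_size_limit(path: Path, max_bytes: int) -> bool:
    """Return True when the file is small enough to be read safely."""
    return path.stat().st_size <= max_bytes
```

Resolving before comparing matters: a purely textual prefix check would accept a path such as `docs/../../etc/passwd`.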
- Adding File Formats: Extend converter_factory.py
- New Analyzers: Inherit from BaseCodeAnalyzer
- Token Counting: Use token_utils.py for consistency
- Cross-platform: Test on Windows, macOS, Linux
- GitHub Actions for CI/CD
- Coverage reports with codecov
- Performance benchmarks in tests
- Error tracking via GitHub issues
- Package Managers: PyPI, pipx
- IDEs: VS Code extension planned
- CI/CD: GitHub Actions examples
- Cloud: AWS Lambda, Google Cloud Functions
- Comprehensive: Process diverse file types and formats
- Intelligent: Use AST parsing and smart analysis for quality output
- Configurable: Extensive options for different use cases
- Reliable: Consistent output across platforms
- LLM-Friendly: Optimized for AI consumption
- Added upgrade workflow: --upgrade and --upgrade-check commands using the centralized henriqueslab-updater library
- Automatic installation method detection (homebrew, pipx, uv, pip)
- Rich-formatted upgrade notifications
- GitHub release notes integration
- Simplified documentation across all channels
- Removed legacy version support references
- Streamlined installation instructions (Python package only)
- Reduced troubleshooting complexity
- Enhanced document converters with binary content validation
- Improved error handling in converters
- Supports 15+ document formats
- Multi-language AST parsing capabilities
- Configurable token/character limits
- Smart condensing with priority analysis