A CLI tool for analyzing research software repositories with a focus on workflow languages. Mudag helps researchers and developers count code, comments, and blank lines in various workflow language files to understand code complexity and documentation levels.
- Count code lines, comment lines, and blank lines in files
- Focus on workflow languages used in scientific research:
  - Common Workflow Language (CWL)
  - Snakemake
  - Nextflow
  - Galaxy
  - KNIME
- Support for scanning individual files or entire directories
- Multiple output formats (table, JSON, CSV)
- Automatic file and directory exclusion using .mudagignore patterns (similar to .gitignore)
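To make the code/comment/blank distinction concrete, here is a minimal sketch of a line classifier. It is not Mudag's actual implementation: the function name `count_lines` and the single `#` comment prefix are assumptions chosen for illustration, and block comments or inline trailing comments are not handled.

```python
def count_lines(text, comment_prefix="#"):
    """Classify each line as code, comment, or blank (illustrative only)."""
    counts = {"code": 0, "comment": 0, "blank": 0, "total": 0}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(comment_prefix):
            # Assumption: a comment is a line whose first non-space
            # characters are the prefix; real languages have more cases.
            counts["comment"] += 1
        else:
            counts["code"] += 1
        counts["total"] += 1
    return counts


SAMPLE = """# Align reads
rule align:
    input: "reads.fq"

    output: "aligned.bam"
"""

print(count_lines(SAMPLE))
```

A language with a different comment syntax, such as `//` in Nextflow, would need its own prefix passed as `comment_prefix`.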
# Clone the repository
git clone https://github.com/aaronstrachardt/mudag.git
cd mudag
# Create and activate a virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate
# Install the package in development mode
pip install -e .
- Python 3.7+
- No external dependencies required
# Analyze workflow files in a directory
mudag analyze path/to/directory
# Analyze a specific file
mudag analyze path/to/file.cwl
# Choose output format
mudag analyze path/to/directory --format json
# Save output to a file
mudag analyze path/to/directory --output results.json
# List all workflow files in a directory
mudag list-workflows path/to/directory
Language | Extensions |
---|---|
Common Workflow Language | .cwl |
Snakemake | Snakefile, .smk, .snake, .snakefile, .snakemake, .rules, .rule |
Nextflow | .nf, .nextflow, .config |
Galaxy | .ga, .galaxy, .gxwf |
KNIME | .knwf, .workflow.knime, .knar |
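The table above can be read as a lookup from file name to language. The following sketch shows one way such a lookup might work; `EXTENSIONS` and `detect_language` are hypothetical names, and the case-insensitive suffix matching is an assumption rather than documented Mudag behavior.

```python
from pathlib import Path

# Suffixes copied from the table above; the structure itself is illustrative.
EXTENSIONS = {
    "Common Workflow Language": (".cwl",),
    "Snakemake": (".smk", ".snake", ".snakefile", ".snakemake", ".rules", ".rule"),
    "Nextflow": (".nf", ".nextflow", ".config"),
    "Galaxy": (".ga", ".galaxy", ".gxwf"),
    "KNIME": (".knwf", ".workflow.knime", ".knar"),
}


def detect_language(path):
    """Return the workflow language for a path, or None if unrecognized."""
    name = Path(path).name
    # "Snakefile" is a bare file name, not a suffix, so check it first.
    if name == "Snakefile":
        return "Snakemake"
    lowered = name.lower()
    for language, suffixes in EXTENSIONS.items():
        # str.endswith accepts a tuple, which also covers compound
        # suffixes such as ".workflow.knime".
        if lowered.endswith(suffixes):
            return language
    return None


for candidate in ["pipeline/Snakefile", "main.nf", "model.workflow.knime", "notes.txt"]:
    print(candidate, "->", detect_language(candidate))
```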
File Path | Code | Comment | Blank | Total
------------------------------------------------
path/to/file1.cwl | 10 | 5 | 2 | 17
path/to/file2.smk | 20 | 10 | 5 | 35
------------------------------------------------
TOTAL | 30 | 15 | 7 | 52
{
"summary": {
"total_files": 2,
"total_code": 30,
"total_comment": 15,
"total_blank": 7,
"total_lines": 52
},
"files": {
"path/to/file1.cwl": {
"code": 10,
"comment": 5,
"blank": 2,
"total": 17
},
"path/to/file2.smk": {
"code": 20,
"comment": 10,
"blank": 5,
"total": 35
}
}
}
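If you post-process the JSON report, a quick consistency check is to recompute the summary from the per-file entries. The sketch below uses the sample report shown above; `summarize` is a hypothetical helper, and the key names are taken from that sample only.

```python
import json

REPORT_JSON = """
{
  "summary": {"total_files": 2, "total_code": 30, "total_comment": 15,
              "total_blank": 7, "total_lines": 52},
  "files": {
    "path/to/file1.cwl": {"code": 10, "comment": 5, "blank": 2, "total": 17},
    "path/to/file2.smk": {"code": 20, "comment": 10, "blank": 5, "total": 35}
  }
}
"""


def summarize(files):
    """Rebuild a summary dict from the per-file counts."""
    entries = list(files.values())
    return {
        "total_files": len(entries),
        "total_code": sum(e["code"] for e in entries),
        "total_comment": sum(e["comment"] for e in entries),
        "total_blank": sum(e["blank"] for e in entries),
        "total_lines": sum(e["total"] for e in entries),
    }


report = json.loads(REPORT_JSON)
recomputed = summarize(report["files"])
print("summary consistent:", recomputed == report["summary"])
```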
File Path,Code Lines,Comment Lines,Blank Lines,Total Lines
path/to/file1.cwl,10,5,2,17
path/to/file2.smk,20,10,5,35
TOTAL,30,15,7,52
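The CSV output can be loaded with Python's standard `csv` module. This sketch assumes the column headers shown in the sample above and a final `TOTAL` row; `read_report` and `comment_ratio` are hypothetical helper names.

```python
import csv
import io

CSV_TEXT = """File Path,Code Lines,Comment Lines,Blank Lines,Total Lines
path/to/file1.cwl,10,5,2,17
path/to/file2.smk,20,10,5,35
TOTAL,30,15,7,52
"""


def read_report(text):
    """Split the CSV into per-file rows and the TOTAL row (if present)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    files = [row for row in rows if row["File Path"] != "TOTAL"]
    total = next((row for row in rows if row["File Path"] == "TOTAL"), None)
    return files, total


def comment_ratio(row):
    """Fraction of lines in a row that are comments."""
    return int(row["Comment Lines"]) / int(row["Total Lines"])


files, total = read_report(CSV_TEXT)
for row in files:
    print(f'{row["File Path"]}: {comment_ratio(row):.1%} comments')
```

To read a saved report instead, pass the contents of the file (for example `Path("results.csv").read_text()`) to `read_report`.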
Mudag automatically uses .mudagignore files to exclude files and directories from analysis. This allows you to specify patterns of files and directories to exclude, similar to how .gitignore works.
You can create a .mudagignore file in the following locations:
- Project-specific: In the root directory of your project
- Global: In your home directory (~/.mudagignore)
Example .mudagignore file:
# Default directories to ignore
.git/
__pycache__/
node_modules/
venv/
env/
# Ignore all .log files
*.log
# Ignore specific directories
temp/
old_workflows/
# Ignore specific files
broken_workflow.cwl
test_data.fa
The ignore patterns support glob patterns similar to .gitignore:
- * matches any number of characters
- ? matches a single character
- [abc] matches any character in the set
- Lines starting with # are treated as comments
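The pattern rules above can be approximated with Python's `fnmatch` module. This is only a sketch of the idea, not Mudag's matching logic: `parse_patterns` and `is_ignored` are hypothetical names, and treating a trailing `/` as "match any parent directory with this name" is an assumption.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_patterns(text):
    """Return the patterns in an ignore file, skipping blanks and # comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path, patterns):
    """Check a relative POSIX-style path against the ignore patterns."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: compare against parent directories only.
            name = pattern.rstrip("/")
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # File pattern: compare against every path component.
            return True
    return False


IGNORE_TEXT = """# Default directories to ignore
.git/
*.log

temp/
broken_workflow.cwl
"""

patterns = parse_patterns(IGNORE_TEXT)
for candidate in ["temp/run.cwl", "logs/run.log", "workflows/main.cwl"]:
    print(candidate, "ignored" if is_ignored(candidate, patterns) else "kept")
```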
To run the tests for Mudag:
# Run all tests
python3 -m pytest tests -v
# Run specific tests
python3 -m pytest tests/unit/test_analyzer.py -v
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.