| 1 | +# dpdata - Atomistic Data Format Manipulation |
| 2 | + |
| 3 | +dpdata is a Python package for manipulating atomistic data from computational science software. It supports format conversion between various atomistic simulation packages including VASP, DeePMD-kit, LAMMPS, GROMACS, Gaussian, ABACUS, and many others. |
| 4 | + |
| 5 | +Always reference these instructions first, and fall back to search or bash commands only when you encounter unexpected information that does not match the info here. |
| 6 | + |
| 7 | +## Working Effectively |
| 8 | + |
| 9 | +- **Bootstrap and install the repository:** |
| 10 | + - `cd /home/runner/work/dpdata/dpdata` (or wherever the repo is cloned) |
| 11 | + - `uv pip install -e .` -- installs dpdata in development mode with core dependencies (numpy, scipy, h5py, monty, wcmatch) |
| 12 | + - Test installation: `dpdata --version` -- should print a version string such as "dpdata v0.1.dev2+..." |
| 13 | + |
| 14 | +- **Run tests:** |
| 15 | + - `cd tests && python -m unittest discover` -- runs all 1826 tests in ~10 seconds. NEVER CANCEL. |
| 16 | + - `cd tests && python -m unittest test_<module>.py` -- run specific test modules (individual modules take ~0.5 seconds) |
| 17 | + - `cd tests && coverage run --source=../dpdata -m unittest discover && coverage report` -- run tests with coverage |
| 18 | + |
| 19 | +- **Linting and formatting:** |
| 20 | + - Install ruff: `uv pip install ruff` |
| 21 | + - `ruff check dpdata/` -- lint the main package (takes ~1 second) |
| 22 | + - `ruff format dpdata/` -- format code according to project style |
| 23 | + - `ruff check --fix dpdata/` -- auto-fix linting issues where possible |
| 24 | + |
| 25 | +- **Pre-commit hooks:** |
| 26 | + - Install: `uv pip install pre-commit` |
| 27 | + - `pre-commit run --all-files` -- run all hooks on all files |
| 28 | + - Hooks include: ruff linting/formatting, trailing whitespace, end-of-file-fixer, yaml/json/toml checks |
| 29 | + |
| 30 | +## Validation |
| 31 | + |
| 32 | +- **Always test CLI functionality after making changes:** |
| 33 | + - `dpdata --help` -- ensure CLI still works |
| 34 | + - `dpdata --version` -- verify version is correct |
| 35 | + - Test a basic conversion if sample data is available |
| 36 | + |
| 37 | +- **Always run linting before committing:** |
| 38 | + - `ruff check dpdata/` -- ensure no new linting errors |
| 39 | + - `ruff format dpdata/` -- ensure code is properly formatted |
| 40 | + |
| 41 | +- **Run relevant tests for your changes:** |
| 42 | + - For format-specific changes: `cd tests && python -m unittest test_<format>*.py` |
| 43 | + - For core system changes: `cd tests && python -m unittest test_system*.py test_multisystems.py` |
| 44 | + - For CLI changes: `cd tests && python -m unittest test_cli.py` (if it exists) |
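The checks in this section can be chained in a small script. The sketch below is one possible arrangement, not something the repository ships: `run_checks` and `CHECKS` are hypothetical names, and the commands are copied from the bullets above on the assumption that the script is started from the repository root.

```python
import subprocess
import sys

# (command, working directory) pairs taken from the checklist above.
# Assumption: the current directory is the repository root.
CHECKS = [
    (["ruff", "check", "dpdata/"], "."),
    (["ruff", "format", "dpdata/"], "."),
    ([sys.executable, "-m", "unittest", "discover"], "tests"),
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each command in order; return the first failing command, or None."""
    for cmd, cwd in checks:
        result = runner(cmd, cwd=cwd)
        if result.returncode != 0:
            return cmd
    return None
```

Calling `run_checks()` from the repository root runs the three commands in order and stops at the first failure. The `runner` parameter exists only so the function can be exercised without invoking the real tools.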
| 45 | + |
| 46 | +## Build and Documentation |
| 47 | + |
| 48 | +- **Documentation:** |
| 49 | + - `cd docs && make help` -- see all available build targets |
| 50 | + - `cd docs && make html` -- build HTML documentation (requires additional dependencies) |
| 51 | + - Documentation source is in `docs/` directory using Sphinx |
| 52 | + - **NOTE:** Full docs build requires additional dependencies like `deepmodeling-sphinx` that may not be readily available |
| 53 | + |
| 54 | +- **Package building:** |
| 55 | + - Uses setuptools with pyproject.toml configuration |
| 56 | + - `uv pip install build && python -m build` -- create source and wheel distributions |
| 57 | + - Version is managed by setuptools_scm from git tags |
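Because the version is derived from git tags at build time, it ends up in the installed package metadata and can be read from Python as well as from `dpdata --version`. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and it returns `None` when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist="dpdata"):
    """Return the installed version string of `dist`, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


print(installed_version())
```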
| 58 | + |
| 59 | +## Common Tasks |
| 60 | + |
| 61 | +The following are outputs from frequently run commands. Reference them instead of re-running to save time. |
| 62 | + |
| 63 | +### Repository structure |
| 64 | +``` |
| 65 | +/home/runner/work/dpdata/dpdata/ |
| 66 | +├── dpdata/ # Main package code |
| 67 | +│ ├── __init__.py |
| 68 | +│ ├── cli.py # Command-line interface |
| 69 | +│ ├── system.py # Core System classes |
| 70 | +│ ├── format.py # Format registry |
| 71 | +│ ├── abacus/ # ABACUS format support |
| 72 | +│ ├── amber/ # AMBER format support |
| 73 | +│ ├── deepmd/ # DeePMD format support |
| 74 | +│ ├── vasp/ # VASP format support |
| 75 | +│ ├── xyz/ # XYZ format support |
| 76 | +│ └── ... # Other format modules |
| 77 | +├── tests/ # Test suite (91 test files) |
| 78 | +├── docs/ # Sphinx documentation |
| 79 | +├── plugin_example/ # Example plugin |
| 80 | +├── pyproject.toml # Project configuration |
| 81 | +└── README.md |
| 82 | +``` |
| 83 | + |
| 84 | +### Key dependencies |
| 85 | +- Core: numpy>=1.14.3, scipy, h5py, monty, wcmatch |
| 86 | +- Optional: ase (ASE integration), parmed (AMBER), pymatgen (Materials Project), rdkit (molecular analysis) |
| 87 | +- Testing: unittest (built-in), coverage |
| 88 | +- Linting: ruff |
| 89 | +- Docs: sphinx with various extensions |
| 90 | + |
| 91 | +### Test timing expectations |
| 92 | +- Full test suite: ~10 seconds (1826 tests). NEVER CANCEL. |
| 93 | +- Individual test modules: ~0.5 seconds |
| 94 | +- Linting with ruff: ~1 second |
| 95 | +- Documentation build: ~30 seconds |
| 96 | + |
| 97 | +### Common workflows |
| 98 | +1. **Adding a new format:** |
| 99 | + - Create a module in `dpdata/<format>/` |
| 100 | + - Implement format classes that inherit from the appropriate base classes |
| 101 | + - Add tests in `tests/test_<format>*.py` |
| 102 | + - Register the format in the plugin system (the format registry lives in `dpdata/format.py`) |
| 103 | + |
| 104 | +2. **Fixing bugs:** |
| 105 | + - Write a test that reproduces the bug first |
| 106 | + - Make the minimal fix that passes the test |
| 107 | + - Run the full test suite to ensure no regressions |
| 108 | + - Run linting to ensure code style compliance |
| 109 | + |
| 110 | +3. **CLI changes:** |
| 111 | + - Modify `dpdata/cli.py` |
| 112 | + - Test with `dpdata --help` and specific commands |
| 113 | + - Add/update tests if needed |
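The test-first approach from the bug-fixing workflow can be sketched with `unittest`, the framework the test suite already uses. Everything named here is hypothetical: `parse_atom_names` is not a dpdata function, and the whitespace bug is invented purely to show the shape of a regression test next to a minimal fix.

```python
import unittest


def parse_atom_names(line):
    """Hypothetical helper (not part of dpdata): split a species line into names.

    Minimal fix under test: str.split() with no argument ignores repeated and
    surrounding whitespace, where a naive line.split(" ") yields empty names.
    """
    return line.split()


class TestParseAtomNames(unittest.TestCase):
    def test_extra_whitespace(self):
        # Written first: reproduces the (invented) bug with a padded line.
        self.assertEqual(parse_atom_names("  O   H \n"), ["O", "H"])

    def test_plain_line(self):
        # Guards the input that already worked before the fix.
        self.assertEqual(parse_atom_names("O H"), ["O", "H"])
```

Placed in a file matching `tests/test_*.py`, a case like this is picked up by `python -m unittest discover` along with the rest of the suite.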
| 114 | + |
| 115 | +## Troubleshooting |
| 116 | + |
| 117 | +- **Installation timeouts:** Network timeouts during `uv pip install` are common. If one occurs, try: |
| 118 | + - Installing packages individually: `uv pip install numpy scipy h5py monty wcmatch` |
| 119 | + - Raising the HTTP timeout (in seconds) through the `UV_HTTP_TIMEOUT` environment variable: `UV_HTTP_TIMEOUT=300 uv pip install -e .` |
| 120 | + - Verifying that the existing installation works: `dpdata --version` should succeed even if the reinstall fails |
| 121 | + |
| 122 | +- **Optional dependency errors:** Many tests will skip or fail if optional dependencies (ase, parmed, pymatgen, rdkit) are not installed. This is expected. Core functionality will work with just the basic dependencies. |
| 123 | + |
| 124 | +- **Documentation build failures:** The docs build requires specific dependencies like `deepmodeling-sphinx` that may not be readily available. Use `make help` to see available targets, but expect build failures without full doc dependencies. |
| 125 | + |
| 126 | +- **Test artifacts:** The test suite generates temporary files (`tests/data_*`, `tests/tmp.*`, `tests/.coverage`). These are excluded by `.gitignore` and should not be committed. |
| 127 | + |
| 128 | +- **Import errors:** If you see import errors for specific modules, check if the corresponding optional dependency is installed. For example, ASE functionality requires `uv pip install ase`. |
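Before chasing an import error or an unexpected test skip, it can help to list which optional packages are actually importable. The sketch below uses only the standard library; `OPTIONAL_DEPS` and `missing_optional_deps` are hypothetical names, and the package list is the one given under "Key dependencies".

```python
import importlib.util

# Optional packages named in this guide, mapped to the feature they enable.
OPTIONAL_DEPS = {
    "ase": "ASE integration",
    "parmed": "AMBER",
    "pymatgen": "Materials Project",
    "rdkit": "molecular analysis",
}


def missing_optional_deps(deps=OPTIONAL_DEPS):
    """Return the names in `deps` that cannot be found in this environment."""
    # find_spec locates a top-level package without importing it.
    return [name for name in deps if importlib.util.find_spec(name) is None]


for name in missing_optional_deps():
    print(f"{name} not installed ({OPTIONAL_DEPS[name]} features unavailable)")
```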
| 129 | + |
| 130 | +## Critical Notes |
| 131 | + |
| 132 | +- **NEVER CANCEL** test runs or builds - they complete quickly (10 seconds for tests, 30 seconds for docs) |
| 133 | +- Always run `ruff check` and `ruff format` before committing |
| 134 | +- Test artifacts in `tests/` directory are excluded by `.gitignore` - don't commit them |
| 135 | +- Optional dependencies are required for some formats but core functionality works without them |
| 136 | +- The CLI tool `dpdata` is the main user interface for format conversion |
| 137 | + |
| 138 | +## Commit and PR Guidelines |
| 139 | + |
| 140 | +- **Use semantic commit messages** for all commits and PR titles following the format: `type(scope): description` |
| 141 | + - **Types:** `feat` (new feature), `fix` (bug fix), `docs` (documentation), `style` (formatting), `refactor` (code restructuring), `test` (testing), `chore` (maintenance) |
| 142 | + - **Examples:** |
| 143 | + - `feat(vasp): add support for POSCAR format` |
| 144 | + - `fix(cli): resolve parsing error for multi-frame files` |
| 145 | + - `docs: update installation instructions` |
| 146 | + - `test(amber): add tests for trajectory parsing` |
| 147 | +- **PR titles** must follow semantic commit format |
| 148 | +- **Commit messages** should be concise but descriptive of the actual changes made |
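The `type(scope): description` format can be checked mechanically. The following is a minimal sketch, not a hook the repository provides: `is_semantic` is a hypothetical helper, the type list is the one given above, and the scope is treated as optional because the `docs:` example omits it.

```python
import re

# Commit types listed in the guideline above.
TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# type, optional "(scope)", then ": " and a non-empty description.
PATTERN = re.compile(rf"(?:{'|'.join(TYPES)})(?:\([^()\s]+\))?: \S.*")


def is_semantic(title):
    """Return True if `title` follows `type(scope): description`."""
    return PATTERN.fullmatch(title) is not None


print(is_semantic("feat(vasp): add support for POSCAR format"))
print(is_semantic("Fix bug"))
```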