docs: add comprehensive GitHub Copilot instructions with uv and semantic commit guidelines (#873)

Copilot · njzjz · pre-commit-ci[bot] · web-flow · commit ebb20ad10b63 · 2025-08-29T11:49:43.000+08:00
This PR adds a comprehensive `.github/copilot-instructions.md` file that provides GitHub Copilot coding agents with detailed guidance on how to work effectively in the dpdata repository. The instructions include:

**Package Management with uv:**
- Uses `uv pip install` for all package installations instead of pip
- Provides faster and more reliable dependency management
- Includes troubleshooting for network timeouts and installation issues

**Semantic Commit Guidelines:**
- Enforces semantic commit message format: `type(scope): description`
- Defines commit types: feat, fix, docs, style, refactor, test, chore
- Requires semantic format for both commits and PR titles
- Provides clear examples for different scenarios

**Core Development Workflow:**
- Step-by-step installation process using uv
- Test execution commands with timing expectations (~10 seconds for full suite)
- Linting and formatting using ruff with "NEVER CANCEL" warnings
- Documentation build process using Sphinx

**Validated Commands:**
All commands have been tested and validated:
- CLI functionality (`dpdata --version`, `dpdata --help`)
- Test suite execution (`python -m unittest discover`)
- Individual test modules (`python -m unittest test_<module>.py`)
- Code quality tools (`ruff check`, `ruff format`)
- Coverage reporting

**Key Features:**
- Imperative tone throughout with clear directives
- Timing expectations for all operations
- Comprehensive troubleshooting section
- Repository structure overview
- Format-specific development workflows
- Clear validation scenarios for testing changes

**Additional Improvements:**
- Updated `.gitignore` to exclude test artifacts (`tests/data_*`, `tests/tmp.*`, `tests/.coverage`)
- Documented core vs optional dependency relationships
- Included guidance for plugin development

The instructions follow GitHub Copilot's expected format with firm directives to reference them first before falling back to search or bash commands.

Fixes #872.
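The new instructions separate core dependencies from optional ones (ase, parmed, pymatgen, rdkit). Before running the suite, you may want to know which optional packages are importable. The following standard-library sketch reports that. The import names come from the instructions file. The `check_optional` helper is hypothetical and is not part of dpdata or of this PR.

```python
import importlib.util

# Optional dependencies named in the instructions, mapped to what they enable.
# For these four packages the import name matches the PyPI name.
OPTIONAL_DEPS = {
    "ase": "ASE integration",
    "parmed": "AMBER",
    "pymatgen": "Materials Project",
    "rdkit": "molecular analysis",
}


def check_optional(names=tuple(OPTIONAL_DEPS)):
    """Map each top-level module name to whether it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_optional().items():
        status = "installed" if found else f"missing (try: uv pip install {name})"
        print(f"{name:10} {OPTIONAL_DEPS[name]:20} {status}")
```

A missing entry only explains skipped or failing format-specific tests. Core functionality needs just the core dependencies.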
---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: njzjz <9496702+njzjz@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,148 @@
+# dpdata - Atomistic Data Format Manipulation
+
+dpdata is a Python package for manipulating atomistic data from computational science software. It supports format conversion between various atomistic simulation packages including VASP, DeePMD-kit, LAMMPS, GROMACS, Gaussian, ABACUS, and many others.
+
+Always reference these instructions first, and fall back to search or bash commands only when you encounter unexpected information that does not match the info here.
+
+## Working Effectively
+
+- **Bootstrap and install the repository:**
+  - `cd /home/runner/work/dpdata/dpdata` (or wherever the repo is cloned)
+  - `uv pip install -e .` -- installs dpdata in development mode with core dependencies (numpy, scipy, h5py, monty, wcmatch)
+  - Test installation: `dpdata --version` -- should show version like "dpdata v0.1.dev2+..."
+
+- **Run tests:**
+  - `cd tests && python -m unittest discover` -- runs all 1826 tests in ~10 seconds. NEVER CANCEL.
+  - `cd tests && python -m unittest test_<module>.py` -- run specific test modules (individual modules take ~0.5 seconds)
+  - `cd tests && coverage run --source=../dpdata -m unittest discover && coverage report` -- run tests with coverage
+
+- **Linting and formatting:**
+  - Install ruff: `uv pip install ruff`
+  - `ruff check dpdata/` -- lint the main package (takes ~1 second)
+  - `ruff format dpdata/` -- format code according to project style
+  - `ruff check --fix dpdata/` -- auto-fix linting issues where possible
+
+- **Pre-commit hooks:**
+  - Install: `uv pip install pre-commit`
+  - `pre-commit run --all-files` -- run all hooks on all files
+  - Hooks include: ruff linting/formatting, trailing whitespace, end-of-file-fixer, yaml/json/toml checks
+
+## Validation
+
+- **Always test CLI functionality after making changes:**
+  - `dpdata --help` -- ensure CLI still works
+  - `dpdata --version` -- verify version is correct
+  - Test a basic conversion if sample data is available
+
+- **Always run linting before committing:**
+  - `ruff check dpdata/` -- ensure no new linting errors
+  - `ruff format dpdata/` -- ensure code is properly formatted
+
+- **Run relevant tests for your changes:**
+  - For format-specific changes: `cd tests && python -m unittest test_<format>*.py`
+  - For core system changes: `cd tests && python -m unittest test_system*.py test_multisystems.py`
+  - For CLI changes: `cd tests && python -m unittest test_cli.py` (if it exists)
+
+## Build and Documentation
+
+- **Documentation:**
+  - `cd docs && make help` -- see all available build targets
+  - `cd docs && make html` -- build HTML documentation (requires additional dependencies)
+  - Documentation source is in `docs/` directory using Sphinx
+  - **NOTE:** Full docs build requires additional dependencies like `deepmodeling-sphinx` that may not be readily available
+
+- **Package building:**
+  - Uses setuptools with pyproject.toml configuration
+  - `uv pip install build && python -m build` -- create source and wheel distributions
+  - Version is managed by setuptools_scm from git tags
+
+## Common Tasks
+
+The following are outputs from frequently run commands. Reference them instead of re-running to save time.
+
+### Repository structure
+```
+/home/runner/work/dpdata/dpdata/
+├── dpdata/           # Main package code
+│   ├── __init__.py
+│   ├── cli.py        # Command-line interface
+│   ├── system.py     # Core System classes
+│   ├── format.py     # Format registry
+│   ├── abacus/       # ABACUS format support
+│   ├── amber/        # AMBER format support
+│   ├── deepmd/       # DeePMD format support
+│   ├── vasp/         # VASP format support
+│   ├── xyz/          # XYZ format support
+│   └── ...           # Other format modules
+├── tests/            # Test suite (91 test files)
+├── docs/             # Sphinx documentation
+├── plugin_example/   # Example plugin
+├── pyproject.toml    # Project configuration
+└── README.md
+```
+
+### Key dependencies
+- Core: numpy>=1.14.3, scipy, h5py, monty, wcmatch
+- Optional: ase (ASE integration), parmed (AMBER), pymatgen (Materials Project), rdkit (molecular analysis)
+- Testing: unittest (built-in), coverage
+- Linting: ruff
+- Docs: sphinx with various extensions
+
+### Test timing expectations
+- Full test suite: ~10 seconds (1826 tests). NEVER CANCEL.
+- Individual test modules: ~0.5 seconds
+- Linting with ruff: ~1 second
+- Documentation build: ~30 seconds
+
+### Common workflows
+1. **Adding a new format:**
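+   - Create a module in `dpdata/<format>/`
+   - Implement a format class that subclasses `Format` from `dpdata/format.py` and overrides the relevant `from_*`/`to_*` methods (e.g. `from_system`, `to_system`)
+   - Add tests in `tests/test_<format>*.py`
+   - Register the format with the `@Format.register("<name>")` decorator (`plugin_example/` shows a complete plugin)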
+
+2. **Fixing bugs:**
+   - Write a test that reproduces the bug first
+   - Make a minimal fix that makes the test pass
+   - Run full test suite to ensure no regressions
+   - Run linting to ensure code style compliance
+
+3. **CLI changes:**
+   - Modify `dpdata/cli.py`
+   - Test with `dpdata --help` and specific commands
+   - Add/update tests if needed
+
+## Troubleshooting
+
+- **Installation timeouts:** Network timeouts during `uv pip install` are common. If this occurs, try:
+  - Individual package installation: `uv pip install numpy scipy h5py monty wcmatch`
+  - Raise uv's HTTP timeout (in seconds) with the `UV_HTTP_TIMEOUT` environment variable: `UV_HTTP_TIMEOUT=300 uv pip install -e .`
+  - Verify existing installation works: `dpdata --version` should work even if reinstall fails
+
+- **Optional dependency errors:** Many tests will skip or fail if optional dependencies (ase, parmed, pymatgen, rdkit) are not installed. This is expected. Core functionality will work with just the basic dependencies.
+
+- **Documentation build failures:** The docs build requires specific dependencies like `deepmodeling-sphinx` that may not be readily available. Use `make help` to see available targets, but expect build failures without full doc dependencies.
+
+- **Test artifacts:** The test suite generates temporary files (`tests/data_*`, `tests/tmp.*`, `tests/.coverage`). These are excluded by `.gitignore` and should not be committed.
+
+- **Import errors:** If you see import errors for specific modules, check if the corresponding optional dependency is installed. For example, ASE functionality requires `uv pip install ase`.
+
+## Critical Notes
+
+- **NEVER CANCEL** test runs or builds; they complete quickly (~10 seconds for tests, ~30 seconds for docs)
+- Always run `ruff check` and `ruff format` before committing
+- Test artifacts in `tests/` directory are excluded by `.gitignore` - don't commit them
+- Optional dependencies are required for some formats but core functionality works without them
+- The CLI tool `dpdata` is the main user interface for format conversion
+
+## Commit and PR Guidelines
+
+- **Use semantic commit messages** for all commits and PR titles following the format: `type(scope): description`
+  - **Types:** `feat` (new feature), `fix` (bug fix), `docs` (documentation), `style` (formatting), `refactor` (code restructuring), `test` (testing), `chore` (maintenance)
+  - **Examples:**
+    - `feat(vasp): add support for POSCAR format`
+    - `fix(cli): resolve parsing error for multi-frame files`
+    - `docs: update installation instructions`
+    - `test(amber): add tests for trajectory parsing`
+- **PR titles** must follow semantic commit format
+- **Commit messages** should be concise but descriptive of the actual changes made
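The `type(scope): description` rule in the guidelines above can be checked mechanically. Below is a minimal sketch. It assumes exactly the seven types listed in the guidelines, an optional lowercase scope, and a non-empty description after `": "`. The `is_semantic` helper and its regex are hypothetical. They are not something this PR adds, and they are not part of dpdata's CI.

```python
import re

# Hypothetical checker for the `type(scope): description` convention.
# ALLOWED_TYPES mirrors the seven types listed in the guidelines.
ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

SEMANTIC_RE = re.compile(
    r"^(?:" + "|".join(ALLOWED_TYPES) + r")"  # commit type
    r"(?:\([a-z0-9_./-]+\))?"                 # optional lowercase scope
    r": \S.*$"                                # ": " then a non-empty description
)


def is_semantic(message: str) -> bool:
    """Return True if the first line of `message` follows `type(scope): description`."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return SEMANTIC_RE.match(first_line) is not None


if __name__ == "__main__":
    for title in (
        "feat(vasp): add support for POSCAR format",
        "docs: update installation instructions",
        "Fixed the parser",
    ):
        print(f"{str(is_semantic(title)):5}  {title}")
```

Because the scope group is optional, `docs: update installation instructions` passes. A title with an unlisted type, such as `feature(cli): ...`, is rejected.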
diff --git a/.gitignore b/.gitignore
@@ -29,3 +29,8 @@ docs/minimizers.csv
 docs/api/
 docs/formats/
 .DS_Store
+# Test artifacts
+tests/data_*.h5
+tests/data_*/
+tests/tmp.*
+tests/.coverage
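The following sketch shows what the new `.gitignore` entries cover. It approximates Git's matching for these four patterns only, using three rules:

- A pattern containing a slash is anchored at the repository root.
- `*` never matches `/`.
- A trailing `/` restricts the pattern to directories, and everything inside an ignored directory is ignored too.

This is a simplification that skips negation, `**`, and character classes. The helper names are hypothetical, and `git check-ignore -v <path>` remains the authoritative check.

```python
import re

# The four patterns this commit adds to .gitignore.
PATTERNS = ("tests/data_*.h5", "tests/data_*/", "tests/tmp.*", "tests/.coverage")


def _to_regex(pattern):
    """Translate a root-anchored glob into a regex in which `*` never matches `/`."""
    escaped = (re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile("^" + "[^/]*".join(escaped) + "$")


def is_ignored(path, is_dir=False):
    """Approximate Git's verdict for a repository-relative `path` under PATTERNS."""
    segments = path.strip("/").split("/")
    for pattern in PATTERNS:
        dir_only = pattern.endswith("/")  # trailing slash: directories only
        regex = _to_regex(pattern.rstrip("/"))
        # Test the path and each of its ancestors: everything inside an
        # ignored directory is ignored as well.
        for i in range(1, len(segments) + 1):
            candidate = "/".join(segments[:i])
            candidate_is_dir = i < len(segments) or is_dir
            if dir_only and not candidate_is_dir:
                continue
            if regex.match(candidate):
                return True
    return False


if __name__ == "__main__":
    for path, is_dir in [
        ("tests/data_0.h5", False),
        ("tests/data_sys/type.raw", False),
        ("tests/tmp.deepmd", True),
        ("tests/test_cli.py", False),
    ]:
        print(f"{path:28} ignored={is_ignored(path, is_dir)}")
```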
diff --git a/tests/.coverage b/tests/.coverage