Gztarchiver is a Python tool built to archive, categorise, and collect metadata on government documents.
| Feature | Description |
|---|---|
| Document Categorization | Categorize documents based on their content. |
| Smart Filtering | Filter by year, month, day, and language. |
| Organized Storage | Files saved in structured folders: year/month/day/gazette_id/ |
| Get New Updates | Fetch newly published documents from the source. |
| Resume Capability | If interrupted, run the same command again to resume downloads. |
| Progress Tracking | Real-time download progress with statistics. |
| File Validation | Automatic validation of downloaded PDF files. |
| Comprehensive Logging | Detailed logs for successful, failed, unavailable and categorised documents. |
| Error Handling | Automatically retries failed downloads, and rechecks and retries documents that were previously unavailable. |
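
To make the storage layout, file validation, and resume rows above more concrete, here is a minimal sketch of how they could fit together. It is not Gztarchiver's actual code: the function names `gazette_dir`, `looks_like_pdf`, and `needs_download`, the zero-padded month and day, the header/trailer validation rule, and the sample gazette ID are all assumptions made for illustration.

```python
from datetime import date
from pathlib import Path


def gazette_dir(root: Path, published: date, gazette_id: str) -> Path:
    """Build root/year/month/day/gazette_id (zero-padding is an assumption)."""
    return (
        root
        / f"{published.year:04d}"
        / f"{published.month:02d}"
        / f"{published.day:02d}"
        / gazette_id
    )


def looks_like_pdf(data: bytes) -> bool:
    """Cheap structural check: PDF header at the start, EOF marker near the end."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def needs_download(target: Path) -> bool:
    """Assumed resume rule: skip files that already exist and pass validation."""
    return not (target.is_file() and looks_like_pdf(target.read_bytes()))


# Hypothetical gazette ID, used only to show the resulting folder structure.
folder = gazette_dir(Path("archive"), date(2024, 3, 7), "2375-12")
print(folder.as_posix())
```

With a rule like `needs_download`, re-running the same command only fetches files that are missing or fail validation, which is one simple way to get resumable downloads without keeping separate state.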
Please see our Getting Started Guide.
Please see our Contributing Guide.
Please see our Code of Conduct.
Please see our Security Policy.
Distributed under the Apache 2.0 License. See License for more information.
Check out our Archives application, a document archive we built using this tool.