LDFLK/gztarchiver

Gztarchiver

Gztarchiver is a Python tool for archiving and categorizing government documents and collecting their metadata.

Features

| Feature | Description |
| --- | --- |
| Document Categorization | Categorizes documents based on their content. |
| Smart Filtering | Filters by year, month, day, and language. |
| Organized Storage | Saves files in structured folders: year/month/day/gazette_id/ |
| Get New Updates | Fetches new updates from the source. |
| Resume Capability | If a run is interrupted, running the same command again resumes the downloads. |
| Progress Tracking | Shows real-time download progress with statistics. |
| File Validation | Automatically validates downloaded PDF files. |
| Comprehensive Logging | Writes detailed logs for successful, failed, unavailable, and categorized documents. |
| Error Handling | Automatically retries failed downloads, and re-checks and retries documents that were unavailable. |
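
To make the storage layout, file validation, and resume behaviour listed above concrete, here is a minimal sketch. It is not Gztarchiver's actual code: the function names, the zero-padded month and day, the PDF check (header and end-of-file marker only), and the example gazette ID are all assumptions made for illustration.

```python
from datetime import date
from pathlib import Path


def gazette_dir(root: Path, published: date, gazette_id: str) -> Path:
    """Build root/year/month/day/gazette_id (zero-padding is an assumption)."""
    return (
        root
        / f"{published.year:04d}"
        / f"{published.month:02d}"
        / f"{published.day:02d}"
        / gazette_id
    )


def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check: a PDF header plus an end-of-file marker near the tail."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def needs_download(path: Path) -> bool:
    """Resume helper: skip files that already exist and pass the PDF check."""
    return not (path.is_file() and looks_like_pdf(path.read_bytes()))


if __name__ == "__main__":
    # "2374-15" is a made-up gazette ID used only for this example.
    target = gazette_dir(Path("archive"), date(2024, 3, 5), "2374-15")
    print(target.as_posix())  # archive/2024/03/05/2374-15
```

Re-running a download loop that calls a check like `needs_download` before each request is one simple way to get resume behaviour, because complete files are skipped and missing or truncated ones are fetched again.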

Getting Started

Please see our Getting Started Guide.

Contributing

Please see our Contributing Guide.

Code of Conduct

Please see our Code of Conduct.

Security

Please see our Security Policy.

License

Distributed under the Apache 2.0 License. See License for more information.

References

Check out our Archives application, a document archive built with this tool.

