The project is quite simple, with all sources in /src.
You can start with /src/cli to see which commands are available and go from there.
I tried to make adding a new product as easy as possible via a config-first approach.
The base config is at vietlott.config.products.ProductConfig, with settings that mostly work for all Vietlott products.
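
To illustrate the config-first idea, here is a minimal sketch. It does not reproduce the real ProductConfig: the class `ExampleProductConfig`, its fields (`name`, `url`, `use_cookies`, `page_size`), and the URLs are all hypothetical, chosen only to show how a new product could be declared as data rather than as new crawler code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExampleProductConfig:
    """Hypothetical stand-in for a per-product config (not the real ProductConfig)."""

    name: str
    url: str
    use_cookies: bool = False  # assumed default: cookies disabled
    page_size: int = 8         # assumed number of rows per result page


# Shared defaults live in one base config.
base = ExampleProductConfig(name="base", url="https://example.invalid/base")

# Adding a product means declaring one more entry that overrides
# only the fields that differ from the base.
product_a = replace(base, name="product_a", url="https://example.invalid/a")
product_b = replace(base, name="product_b", url="https://example.invalid/b", page_size=6)

EXAMPLE_PRODUCTS = {cfg.name: cfg for cfg in (product_a, product_b)}
```

With this shape, the crawler only needs to look up a config by product name; the actual field names and defaults should be checked against ProductConfig itself.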
Key points:
- cookies used to be needed to crawl, but not anymore (disabled for all products)
- data on the website is paginated, so fetching is designed around that mechanism (as is the mechanism for detecting and back-filling missing data in missing.py)
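
The page-based detection of missing data can be sketched roughly as follows. This is not the code in missing.py: the function names are hypothetical, and the sketch assumes draws have consecutive integer ids and that page 0 of the website holds the newest `page_size` draws.

```python
def find_missing_ids(seen_ids, first_id, last_id):
    """Return the ids in [first_id, last_id] that are absent from seen_ids."""
    seen = set(seen_ids)
    return [i for i in range(first_id, last_id + 1) if i not in seen]


def pages_to_refetch(missing_ids, last_id, page_size):
    """Map missing ids to page indices.

    Assumption: page 0 holds the newest `page_size` ids, page 1 the next
    `page_size` older ones, and so on.
    """
    return sorted({(last_id - i) // page_size for i in missing_ids})


# Example: ids 3, 5 and 6 were never stored, with 2 draws per page.
missing = find_missing_ids([1, 2, 4, 7], first_id=1, last_id=7)
pages = pages_to_refetch(missing, last_id=7, page_size=2)
```

Back-filling then reduces to re-crawling only the listed pages instead of the whole history.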
The project uses GitHub Actions, configured to run daily on a schedule, crawl, and push the results back to this repository, so no server is required.
To make development easier (for me), the binary file sets PYTHONPATH to /src, but it can and should use the installed CLI:
[project.scripts]
vietlott-crawl = "vietlott.cli.crawl:crawl"
vietlott-missing = "vietlott.cli.crawl:detect_missing_data"