All contributions are welcome. This dataset is being fact-checked file by file — every correction, addition, or pair of eyes helps.
No CLA, no contributor agreement. Public regulatory data, open contribution.
Contributions wanted, in priority order:
- Fact-corrections — you spotted a wrong allocation, power limit, modulation, or regulatory reference for your country
- Country additions — currently 9 covered (FR, US, UK, CN, DE, RU, ES, IT, CH). Adding JP, KR, BR, IN, AU, CA, others = huge value
- Device examples with model numbers — real-world devices that actually transmit on a band in your region
- Regulatory reference enrichment — pin a specific FCC Part / ECC Decision / national gazette to an entry
- HF datasets to cross-reference — point at community RF / rtl_433 datasets the pipeline should triangulate against
- Pipeline improvements — new sources, better confidence weighting, additional cross-check backends
Even a single-line correction is valuable — that's one less hallucination in someone's fine-tuned LLM.
Open an issue for:
- Factual errors (please include the source: regulator URL, doc reference, etc.)
- Missing devices / services / countries
- Pipeline bugs
- Suggestions
- Edit the relevant JSON in `enriched_data/` or `merged_dataset/`
- Add a `correction_note` field referencing the source you fact-checked against
- Update the relevant report in `factcheck_reports/` if applicable
- PR description: what changed, why, source consulted
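
The checklist above can be sanity-checked locally before opening a PR. The following is a minimal sketch, not part of the pipeline: `check_correction` and `REQUIRED_FIELDS` are hypothetical names, and the set of required fields is an assumption based on the field names used in this guide.

```python
import json

# Assumed minimum set of fields for a correction entry (hypothetical;
# the real schema may require more or fewer).
REQUIRED_FIELDS = ("freq_low_mhz", "freq_high_mhz", "country_code", "correction_note")


def check_correction(entry: dict) -> list[str]:
    """Return a list of problems found in a correction entry (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in entry:
            problems.append(f"missing field: {field}")

    low = entry.get("freq_low_mhz")
    high = entry.get("freq_high_mhz")
    if isinstance(low, (int, float)) and isinstance(high, (int, float)) and low >= high:
        problems.append("freq_low_mhz must be below freq_high_mhz")

    # A correction note is only useful if it names what was consulted.
    if "correction_note" in entry and "source" not in str(entry["correction_note"]).lower():
        problems.append("correction_note should name the source consulted")
    return problems


if __name__ == "__main__":
    raw = (
        '{"freq_low_mhz": 868.0, "freq_high_mhz": 868.6, "country_code": "DE", '
        '"correction_note": "Source: BNetzA Vfg 60/2019"}'
    )
    print(check_correction(json.loads(raw)))
```

An empty list means the entry passes these basic checks; it does not replace verifying the value against the regulator's document.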
Example minimal correction:
{
  "freq_low_mhz": 868.0,
  "freq_high_mhz": 868.6,
  "country_code": "DE",
  "service": "SRD",
  "regulatory_ref": "BNetzA Vfg 60/2019, ETSI EN 300 220-2",
  "correction_note": "Power limit corrected from 25 mW ERP to 25 mW ERP with ≤1% duty cycle (was missing duty cycle constraint). Source: BNetzA Allgemeinzuteilung Vfg 60/2019 §4.2."
}

- PRs against `Data_Process/scripts/`
- Keep cross-check sources public / API-free where possible (DDG fallback, Wikipedia, HF public datasets)
- Add tests / smoke checks if you change confidence math
- Update `Data_Process/README.md` if you change behavior
- Add the country to the 5 relevant enriched JSON files (one entry per allocation × country)
- Cite the national regulator at minimum
- Add the regulator to `STANDARDS.md`
- Add to the `CROSS_CHECK_SOURCES.websearch.trusted_domains` list in `00_config.py` if they have an official `.gov` / national TLD
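
As an illustration of the last step, here is a hedged sketch of what a trusted-domain list and lookup could look like. The dictionary shape, the listed domains, and the `is_trusted` helper are assumptions for illustration only; check `00_config.py` for the actual structure before editing it.

```python
from urllib.parse import urlparse

# Hypothetical shape of the config entry; the real 00_config.py may differ.
CROSS_CHECK_SOURCES = {
    "websearch": {
        # Illustrative regulator domains only, not the project's actual list.
        "trusted_domains": ["fcc.gov", "bundesnetzagentur.de", "ofcom.org.uk"],
    }
}


def is_trusted(url: str, trusted: list[str]) -> bool:
    """True if the URL's host is a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)


if __name__ == "__main__":
    domains = CROSS_CHECK_SOURCES["websearch"]["trusted_domains"]
    # Adding a new country's regulator (here Japan's MIC) is a one-line change.
    domains.append("soumu.go.jp")
    print(is_trusted("https://www.soumu.go.jp/english/", domains))
```

Matching on the full host or a dot-prefixed suffix, rather than a plain substring, keeps look-alike hosts from passing as a regulator.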
- Python 3.10+
- No formatter enforced; PEP8-ish
- Keep network calls behind env flags (e.g. `CROSS_CHECK=1`) so the pipeline stays offline-first
- Typo fixes ✅
- Single-line factual corrections with a source ✅
- New device examples ✅
- New regulatory citations ✅
- Removing entries (might need a `disputed_note` instead)
- Changing the confidence scoring weights
- Adding new cross-check sources (good — let's pick the right ones together)
- Schema changes (need to migrate existing files)
Commit history is the credit log. If you want a different name in commits, configure your git locally before pushing — I won't rewrite history.
Open an issue with the question label. Discussion is welcome before opening a big PR.
Thanks for helping make this dataset more accurate. RF allocations are genuinely complex and per-country edge cases are everywhere — community knowledge is the only way this gets to the ×210 production target with real quality.