Description
Purpose
Expanding the database with data from more sources (example sentences, pitch accent data, estimated JLPT levels) will make it more versatile and valuable. If this information is integrated directly into the database, downstream developers won’t need to build separate processes later to fetch or merge external data.
Implementation Idea
Treat enrichment as a post-processing step that runs after JMdict is processed. Create a separate processor for each dataset (pitch accent, JLPT lists, example sentence corpora). Each processor attaches its supplemental data to the existing entries in a structured, consistent way. JLPT levels would be unofficial estimates, derived from vocabulary used in past exams or from publicly available frequency lists.
This approach ensures modularity (each processor handles its own data source) while keeping the final dataset unified and easy to query.
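As a rough illustration of the idea above, here is a minimal Python sketch of the post-processing step. Everything in it is an assumption made for the example: the `Entry` shape, the `Enricher` interface, the `extra` field, and the sample lookup tables are hypothetical and do not reflect the project's actual data model or any real dataset format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


@dataclass
class Entry:
    """Simplified stand-in for a processed JMdict entry (hypothetical shape)."""
    id: int
    kanji: list[str]
    readings: list[str]
    # Supplemental data lives under one key per processor.
    extra: dict[str, Any] = field(default_factory=dict)


class Enricher(Protocol):
    """Interface each dataset-specific processor is assumed to implement."""
    name: str

    def enrich(self, entry: Entry) -> Optional[Any]: ...


class PitchAccentEnricher:
    name = "pitch_accent"

    def __init__(self, table: dict[tuple[str, str], list[int]]):
        # Assumed lookup format: (written form, reading) -> accent positions.
        self.table = table

    def enrich(self, entry: Entry) -> Optional[list[int]]:
        for written in entry.kanji:
            for reading in entry.readings:
                if (written, reading) in self.table:
                    return self.table[(written, reading)]
        return None


class JlptEnricher:
    name = "jlpt_estimate"

    def __init__(self, levels: dict[str, int]):
        # Assumed lookup format: word -> estimated level (5 = N5 ... 1 = N1).
        self.levels = levels

    def enrich(self, entry: Entry) -> Optional[int]:
        found = [self.levels[w] for w in entry.kanji + entry.readings if w in self.levels]
        # If several forms match, keep the easiest level (largest N number).
        return max(found) if found else None


def run_enrichment(entries: list[Entry], enrichers: list[Enricher]) -> list[Entry]:
    """Run every processor over every entry after the base JMdict pass."""
    for entry in entries:
        for enricher in enrichers:
            value = enricher.enrich(entry)
            if value is not None:
                entry.extra[enricher.name] = value
    return entries


# Usage with made-up sample data.
entries = [
    Entry(id=1, kanji=["食べる"], readings=["たべる"]),
    Entry(id=2, kanji=[], readings=["ああ"]),
]
enrichers = [
    PitchAccentEnricher({("食べる", "たべる"): [2]}),
    JlptEnricher({"食べる": 5}),
]
run_enrichment(entries, enrichers)
```

Keeping each processor's output under its own key in `extra` means a new source can be added or dropped without touching the others, and entries with no match are simply left unchanged.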