Sharing what I've learned while prototyping a startup idea. All code here is open source under the MIT license.
Prototype: http://143.198.229.7/
Changelog/Work: http://143.198.229.7/changelog/
Main repo at the moment: https://gitlab2.cip.ifi.lmu.de/hemminger/ER (not MIT; I will move it into this one over time)
Brainstorming
KPI-BERT
- German only
- Bundesanzeiger
- Not open source
- Small paper
- Bad
FiNER-ORD
- Good for standard tag recognition?
- Good F1s for ORG, PER, LOC
- Trained on free news articles, no SEC context
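To make the per-tag F1 comparison above concrete, here is a minimal sketch of entity-level F1 over BIO-tagged sequences. It assumes the usual B-/I-/O tagging scheme; the function names (`extract_spans`, `f1_per_label`) and the toy tag sequences are hypothetical and only for illustration. This is not the dataset's official evaluation script.

```python
from collections import defaultdict


def extract_spans(tags):
    """Return a set of (label, start, end) spans from a BIO tag sequence."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        # Close the open span if the current tag does not continue it.
        if label is not None and tag != f"I-{label}":
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # A stray I- tag without a preceding B- is ignored in this sketch.
    return spans


def f1_per_label(gold, pred):
    """Entity-level F1 per label; a span counts only on an exact match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for span in pred_spans:
        counts[span[0]]["tp" if span in gold_spans else "fp"] += 1
    for span in gold_spans - pred_spans:
        counts[span[0]]["fn"] += 1
    scores = {}
    for lab, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        scores[lab] = 2 * c["tp"] / denom if denom else 0.0
    return scores


# Toy example (made-up tags, not real dataset output).
gold = ["B-ORG", "I-ORG", "O", "B-PER", "O", "B-LOC"]
pred = ["B-ORG", "I-ORG", "O", "B-PER", "I-PER", "B-LOC"]
scores = f1_per_label(gold, pred)
print(scores)
```

Exact-match scoring is strict: in the toy example the PER prediction covers one token too many, so it counts as both a false positive and a false negative.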
FiNER-139
- Looks good
Finance-instruct-500k
- Reasoning; Conversations; Entity Recognition; Sentiment; Multilingual; Address parsing
Open Source Datasets/Training:
- Webz.io (FiNER-ORD; financial news articles)
- SEC