There's some lack of standardization in the way the JSON word lists are formatted, which makes using them programmatically a tad more difficult than it should be. For example, "term": "blackhat-whitehat" is a combined entry for two terms, whereas "term": "blast-radius" is a single term.
I've written a Semgrep rules generator that uses your word list and data to scan code projects for these terms and report each use as a finding ranked according to its tier. To do this I also had to clean up the JSON. You can see the revamped word list here: https://gitlab.com/SuperTeece/inclusive-language-semgrep-rules/-/blob/595cb5cf1d27c9006f59832c3aa864a7827e947c/data/word-lists.json
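To illustrate the kind of cleanup I mean, here is a minimal sketch. It is not the code from my generator. The `tier` field name, the sample values, and the `COMBINED` lookup table are assumptions made up for this example. The point is that a hyphen alone can't tell a combined entry from a single hyphenated term, so the combined ones have to be listed explicitly:

```python
import json

# Hypothetical sample mirroring the two entries mentioned above.
# The "tier" key and its values are assumptions for illustration only.
RAW = json.loads("""
[
  {"term": "blackhat-whitehat", "tier": 1},
  {"term": "blast-radius", "tier": 2}
]
""")

# Assumption: combined entries are listed by hand, since splitting on "-"
# would also (wrongly) break "blast-radius" into two terms.
COMBINED = {"blackhat-whitehat": ["blackhat", "whitehat"]}


def normalize(entries):
    """Return a new list with exactly one term per entry."""
    out = []
    for entry in entries:
        # Fall back to the original term when the entry is not combined.
        for term in COMBINED.get(entry["term"], [entry["term"]]):
            out.append({**entry, "term": term})
    return out


normalized = normalize(RAW)
print([e["term"] for e in normalized])
```

With one term per entry, a rules generator can loop over the list without special-casing any names.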