4 new bulk lists from Gary Price @ InfoDocket #37

Merged
merged 18 commits into from
Feb 3, 2025
Binary file added seed-lists/DEI_FAA-20240201.xlsx
Binary file not shown.
Binary file added seed-lists/EPA 2024-PDF-20250201.xlsx
Binary file not shown.
Binary file added seed-lists/GENDER_ID-20250201.xlsx
Binary file not shown.
8 changes: 8 additions & 0 deletions seed-lists/README.md
@@ -28,6 +28,10 @@ Seeds supplied by Dorothy Bower of the U.S. Government Publishing Office:
* PURL_server_domains_20240214.csv - report of all target domains from the PURL server; some determined to be out of scope were not included in the Nomination Tool.
* PURL_server_domains_20240214_non_gov_mil.csv - non .gov/.mil seeds from the PURL_server_domains_20240214.csv list that were determined to be in scope by Mark Phillips of UNT.

### Harvard Law School's Library Innovation Lab

* data_20250130_catalog_urls_empty-harvard-LiL.txt. List of URLs of data.gov metadata records that do NOT include links to data files but ONLY have links to federal agency landing pages. LiL collected all of the records that included data files.

### infoDOCKET seeds
Seed lists produced by Gary Price, editor of infoDOCKET:

@@ -51,6 +55,10 @@ Seed lists produced by Gary Price, editor of infoDOCKET:
* Diversity-DEI-20250119.xlsx. 2199 PDFs (with a few exceptions) from several agencies. These docs focus on DEI topics and issues.
* pclob-20250122.xlsx. 600 URLs (PDFs and HTML) from the U.S. Privacy and Civil Liberties Oversight Board.
* MSPB-20250128.xlsx. 844 URLs (HTML and PDF) from the Merit Systems Protection Board.
* USDA_ClimateChange-20250201.xlsx. 1633 seeds from USDA. Topic: Climate Change. Includes approx. 1000 URLs (HTML and PDF) from Climatehubs.usda.gov.
* EPA 2024-PDF-20250201.xlsx. 2300+ EPA seeds. Most are PDFs from 2024 to present.
* GENDER_ID-20250201.xlsx. 3553 lines with PDFs (from various agencies and some .mil domains) that contain the phrase "gender identity".
* DEI_FAA-20240201.xlsx. 758 lines of PDFs from FCC.gov. Terms: DEI and related terms.

### Internet Archive seeds
Seeds supplied by Antoine McGrath of Internet Archive:
Binary file added seed-lists/USDA_ClimateChange-20250201.xlsx
Binary file not shown.
18,019 changes: 18,019 additions & 0 deletions seed-lists/data_20250130_catalog_urls_empty-harvard-LiL.txt

Large diffs are not rendered by default.