Several libraries and library networks publish their data as "open data".
Péter Király created a list of international open MARC 21 data sets at <https://github.com/pkiraly/metadata-qa-marc#datasources>.
The Internet Archive's Open Library project makes thousands of library records freely available for anyone's use; see <https://archive.org/details/ol_data>.
You can download the data sets via the command line, e.g.:
$ wget http://ered.library.upenn.edu/data/opendata/pau.zip
$ unzip pau.zip
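After unpacking, a quick sanity check is to count the records in a file. The following is a minimal sketch, assuming the archive contains binary MARC 21 (ISO 2709) files, in which every record ends with the record terminator byte 0x1D. The function name `count_marc_records` is hypothetical and not part of any toolkit.

```shell
# Count the records in a binary MARC 21 (ISO 2709) file.
# Assumption: the file is binary MARC, where each record ends with
# the record terminator byte 0x1D (octal 035).
count_marc_records() {
    # delete every byte except the terminator, then count what is left
    LC_ALL=C tr -cd '\035' < "$1" | wc -c | tr -d ' '
}
```

Called with the path of an unpacked file, e.g. `count_marc_records records.mrc` (a hypothetical file name), the function prints the number of records. It does not work for MARC XML or line-oriented formats.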
Many libraries offer MARC 21 data via public APIs such as Z39.50, SRU, and OAI-PMH.
Z39.50 is a standard (ANSI/NISO Z39.50-2003) that defines a client/server-based service and protocol for information retrieval. Like MARC 21, Z39.50 has a long history (Lynch, 1997) and is maintained by the Library of Congress.
Many libraries offer access to their Online Public Access Catalogues (OPAC) via Z39.50 servers, e.g. the Library of Congress or KOBV.
To retrieve data from Z39.50 servers you need client software such as yaz-client from Index Data, which is part of the free, open-source toolkit "YAZ":
# open client
$ yaz-client
# connect to database
Z> open lx2.loc.gov/LCDB
# set format to MARC
Z> format 1.2.840.10003.5.10
# set element set
Z> element F
# append retrieved records to file
Z> set_marcdump loc.mrc
# find records by subject
Z> find @attr 5=100 @attr 1=21 "Perl"
# get first 50 records
Z> show 1+50
# close client
Z> exit
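The interactive session can also be scripted. The sketch below writes the same commands to a file and assumes that the installed yaz-client supports the `-f` option for reading commands from a file. The function name `write_yaz_commands` and the file names in the usage comment are hypothetical.

```shell
# Write the yaz-client commands of the session above to a command file.
# Arguments: $1 = command file to create, $2 = file for the MARC dump.
write_yaz_commands() {
    cat > "$1" <<EOF
open lx2.loc.gov/LCDB
format 1.2.840.10003.5.10
element F
set_marcdump $2
find @attr 5=100 @attr 1=21 "Perl"
show 1+50
exit
EOF
}

# Usage (needs network access and an installed yaz-client):
#   write_yaz_commands loc.cmd loc.mrc
#   yaz-client -f loc.cmd
```

Keeping the commands in a file makes the retrieval repeatable, for example when the same query is run against several servers.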
The Catmandu toolkit provides a Z39.50 client "Catmandu::Importer::Z3950":
$ catmandu convert -v Z3950 \
--host z3950.kobv.de \
--port 210 \
--databaseName k2 \
--preferredRecordSyntax usmarc \
--queryType PQF \
--query '@attr 1=1016 code4lib' \
--handler USMARC \
to MARC > code4lib.mrc
SRU (Search/Retrieve via URL) is another standard protocol for information retrieval. It uses HTTP as its application-layer protocol and XML for data serialization. Search queries are expressed in CQL (Contextual Query Language), a formal language for representing queries.
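Because an SRU request is an ordinary HTTP URL, it can be assembled by hand. The following sketch builds a `searchRetrieve` URL from a base URL, a record schema and a CQL query. It assumes that the server accepts SRU version 1.1 and that the query contains only ASCII characters. The helper names `urlencode` and `sru_url` and the limit of 10 records are illustrative choices.

```shell
# Percent-encode a string for use in a URL query (ASCII input only).
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}            # everything after the first character
        c=${s%"$rest"}         # the first character itself
        s=$rest
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;                # unreserved: keep
            *) out=$out$(printf '%%%02X' "'$c") ;;        # others: %XX
        esac
    done
    printf '%s\n' "$out"
}

# Build an SRU searchRetrieve URL (assumes SRU version 1.1).
# Arguments: $1 = base URL, $2 = record schema, $3 = CQL query.
sru_url() {
    printf '%s?version=1.1&operation=searchRetrieve&recordSchema=%s&maximumRecords=10&query=%s\n' \
        "$1" "$2" "$(urlencode "$3")"
}
```

The resulting URL can be passed to any HTTP client, e.g. `curl "$(sru_url https://services.dnb.de/sru/zdb MARC21-xml 'dnb.iss = 1940-5758')"`. The response is an XML document that wraps the matching records.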
You can use yaz-client to search and retrieve data from an SRU server:
# open client
$ yaz-client
# connect to database
Z> open http://sru.k10plus.de/gvk
# append retrieved records to file
Z> set_marcdump gvk.mrc.xml
# find records by subject
Z> find pica.sw=Perl
# get first 50 records
Z> show 1+50
# close client
Z> exit
The Catmandu toolkit also provides an SRU client "Catmandu::Importer::SRU":
$ catmandu convert -v SRU \
--base https://services.dnb.de/sru/zdb \
--recordSchema MARC21-xml \
--query 'dnb.iss = 1940-5758' \
--parser marcxml \
to MARC --type XML > code4lib.xml
OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is a protocol for metadata replication and distribution. Data providers host metadata records and their changes over time, so service providers can harvest them. Like SRU, it uses HTTP as its application-layer protocol and XML for data serialization.
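A harvest is a sequence of plain HTTP requests. The sketch below builds the URL of an initial `ListRecords` request for a date range and the follow-up URL used when the server returns a partial list with a `resumptionToken`. The function names are hypothetical, and the token is assumed to contain only URL-safe characters.

```shell
# Build the URL of an initial OAI-PMH ListRecords request.
# Arguments: $1 = base URL, $2 = metadataPrefix, $3 = from, $4 = until.
oai_list_records_url() {
    printf '%s?verb=ListRecords&metadataPrefix=%s&from=%s&until=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Build the follow-up request for a partial result list. In OAI-PMH,
# resumptionToken is an exclusive argument: only the verb accompanies it.
# Assumption: the token contains only URL-safe characters.
oai_resume_url() {
    printf '%s?verb=ListRecords&resumptionToken=%s\n' "$1" "$2"
}
```

For example, `oai_list_records_url https://lib.ugent.be/oai marcxml 2021-02-01 2021-02-01` prints a URL that can be fetched with any HTTP client. A harvester repeats the follow-up request until the response no longer carries a token.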
The Catmandu toolkit provides an OAI-PMH harvester client "Catmandu::Importer::OAI":
$ catmandu convert -v OAI \
--url https://lib.ugent.be/oai \
--metadataPrefix marcxml \
--from 2021-02-01 \
--until 2021-02-01 \
--handler marcxml \
to MARC > gent.mrc