Commit
1. Updated handling of the connection check. Fall back to an unauthenticated connection if the API key is invalid.
2. Moved header definitions to the `MetaDataCrawler` class.
3. Added example.ipynb for running the crawler on mybinder.org.
4. Updated README, CITATION.cff and pyproject.toml.
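The fallback described in item 1 can be pictured with a minimal sketch. This is not the code from this commit: `build_headers` and the `key_is_valid` callback are hypothetical names introduced here for illustration, and the only assumption about Dataverse itself is that API tokens are sent in the `X-Dataverse-key` request header.

```python
from typing import Callable, Dict, Optional


def build_headers(
    api_key: Optional[str],
    key_is_valid: Callable[[str], bool],
) -> Dict[str, str]:
    """Return request headers, dropping the API key if it fails validation.

    `key_is_valid` is a hypothetical callback (for example, a function that
    sends a test request to the repository) so the sketch stays offline.
    """
    headers = {"Accept": "application/json"}
    if api_key and key_is_valid(api_key):
        # Authenticated connection: attach the token.
        headers["X-Dataverse-key"] = api_key
    # Otherwise fall back to an unauthenticated connection (no token header).
    return headers


if __name__ == "__main__":
    # An invalid key yields headers without the token.
    print(build_headers("not-a-real-key", lambda key: False))
```

Passing the validity check in as a callback keeps the decision logic separate from the network call, which makes the fallback easy to test without a live repository.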
Showing 8 changed files with 182 additions and 38 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,10 @@ | ||
- cff-version: 0.1.1 | ||
+ cff-version: 0.1.2 | ||
message: "If you use this software, please cite it as below." | ||
authors: | ||
- family-names: "Lui" | ||
given-names: "Lok Hei" | ||
orcid: "https://orcid.org/0000-0001-5077-1530" | ||
title: "Dataverse Metadata Crawler" | ||
- version: 0.1.1 | ||
+ version: 0.1.2 | ||
date-released: 2025-01-28 | ||
url: "https://github.com/scholarsportal/dataverse-metadata-crawler" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Step 1: Setting environment variables\n", | ||
"Replace the values inside the quotes for BASE_URL and API_KEY.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Replace the placeholders with your own values and run this cell to create a .env file\n", | ||
"BASE_URL = 'TARGET_REPO_URL' # Base URL of the repository; e.g., \"https://demo.borealisdata.ca/\"\n", | ||
"API_KEY = 'YOUR_API_KEY' # Found in your Dataverse account settings. Optional. Delete this line if you do not plan to use it.\n", | ||
"\n", | ||
"\n", | ||
"# Write the .env file\n", | ||
"with open('.env', 'w', encoding='utf-8') as file:\n", | ||
"    if locals().get('API_KEY') is None:\n", | ||
"        file.write(f'BASE_URL = \"{BASE_URL}\"\\n')\n", | ||
"    else:\n", | ||
"        file.write(f'BASE_URL = \"{BASE_URL}\"\\n')\n", | ||
"        file.write(f'API_KEY = \"{API_KEY}\"\\n')\n", | ||
"    print('Successfully created the .env file!')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Step 2: Running the command line tool\n", | ||
"The following cell runs the command line tool.\n", | ||
"\n", | ||
"**Configuration**:\n", | ||
"1. Replace the COLLECTION_ALIAS with your desired value. See [here](https://github.com/scholarsportal/dataverse-metadata-crawler/wiki/Guide:-How-to-find-the-COLLECTION_ALIAS-of-a-Dataverse-collection) for getting your collection alias.\n", | ||
"2. Replace the VERSION with your desired value. It can be 'latest', 'latest-published', or a version number 'x.y' (such as '1.0').\n", | ||
"3. Add the optional flags. See the following table for your reference:\n", | ||
" \n", | ||
"\n", | ||
"| **Option** | **Short** | **Type** | **Description** | **Default** |\n", | ||
"|----------------------|-----------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------|\n", | ||
"| --auth | -a | TEXT | Authentication token to access the Dataverse repository. <br/> | None |\n", | ||
"| --log <br/> --no-log | -l | | Output a log file. <br/> Use `--no-log` to disable logging. | `log` (unless `--no-log`) |\n", | ||
"| --dvdfds_metadata | -d | | Output a JSON file containing metadata of Dataverses, Datasets, and Data Files. | |\n", | ||
"| --permission | -p | | Output a JSON file that stores permission metadata for all Datasets in the repository. | |\n", | ||
"| --emptydv | -e | | Output a JSON file that stores all Dataverses which do **not** contain Datasets (though they might have child Dataverses which have Datasets). | |\n", | ||
"| --failed | -f | | Output a JSON file of Dataverses/Datasets that failed to be crawled. | |\n", | ||
"| --spreadsheet | -s | | Output a CSV file of the metadata of Datasets. | |\n", | ||
"| --help | | | Show the help message. | |\n", | ||
"\n", | ||
"Examples:\n", | ||
"1. Export the metadata of the latest version of all datasets under collection 'demo' to JSON\n", | ||
"\n", | ||
" `!python3 dvmeta/main.py -c demo -v latest -d`\n", | ||
"\n", | ||
"2. Export the metadata of version 1.0 of all datasets under collection 'demo' to JSON and CSV\n", | ||
"\n", | ||
" `!python3 dvmeta/main.py -c demo -v 1.0 -d -s`\n", | ||
"\n", | ||
"3. Export the metadata and permission metadata of the latest-published version of all datasets under collection 'toronto' to JSON and CSV. Also export the empty dataverses and the datasets that failed to be crawled\n", | ||
"\n", | ||
" `!python3 dvmeta/main.py -c toronto -v latest-published -d -s -p -e -f`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Run the command line interface\n", | ||
"# Replace 'COLLECTION_ALIAS' and 'VERSION' with your values\n", | ||
"# Modify the flags as needed referring to the table above\n", | ||
"!python3 dvmeta/main.py -c 'COLLECTION_ALIAS' -v 'VERSION' -d -s -p -e -f" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": ".venv", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.12.3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
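The two notebook steps can also be sketched as plain Python functions. This is an illustrative sketch, not part of the commit: `write_env` and `build_command` are hypothetical helper names, and the version check simply mirrors the three forms the notebook describes ('latest', 'latest-published', or 'x.y'). Only the short flags that appear in the notebook (`-c`, `-v`, `-d`, `-s`, `-p`, `-e`, `-f`) are used.

```python
import re
from pathlib import Path
from typing import Iterable, List, Optional


def write_env(path: Path, base_url: str, api_key: Optional[str] = None) -> None:
    """Write a .env file with BASE_URL and, if provided, API_KEY."""
    lines = [f'BASE_URL = "{base_url}"']
    if api_key:
        # The API key is optional; omit the line entirely when it is not set.
        lines.append(f'API_KEY = "{api_key}"')
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_command(
    collection_alias: str,
    version: str,
    flags: Iterable[str] = (),
) -> List[str]:
    """Build the argument list for the crawler CLI shown in the notebook."""
    named_versions = ("latest", "latest-published")
    if version not in named_versions and not re.fullmatch(r"\d+\.\d+", version):
        raise ValueError(
            "version must be 'latest', 'latest-published', or a number like '1.0'"
        )
    # Flags are passed through unchanged; see the option table in the notebook.
    return ["python3", "dvmeta/main.py", "-c", collection_alias, "-v", version, *flags]


if __name__ == "__main__":
    print(build_command("demo", "1.0", ["-d", "-s"]))
```

The returned list can be handed to `subprocess.run` from the repository root, which avoids shell quoting issues when the collection alias contains unusual characters.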
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[tool.poetry] | ||
name = "dataverse-metadata-crawler" | ||
- version = "0.1.1" | ||
+ version = "0.1.2" | ||
description = "A Python CLI tool for bulk extracting and exporting metadata from Dataverse repositories' collections to JSON and CSV formats." | ||
authors = ["Ken Lui <[email protected]>"] | ||
license = "MIT" | ||
|