Initial commit
kenlhlui committed Jan 20, 2025
1 parent 833f8ba commit bcd202f
Showing 4 changed files with 80 additions and 8 deletions.
51 changes: 51 additions & 0 deletions .github/workflows/jekyll-gh-pages.yml
@@ -0,0 +1,51 @@
# Sample workflow for building and deploying a Jekyll site to GitHub Pages
name: Deploy Jekyll with GitHub Pages dependencies preinstalled

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["main"]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Build job
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Pages
        uses: actions/configure-pages@v5
      - name: Build with Jekyll
        uses: actions/jekyll-build-pages@v1
        with:
          source: ./
          destination: ./_site
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3

  # Deployment job
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -4,7 +4,7 @@ authors:
 - family-names: "Lui"
   given-names: "Lok Hei"
   orcid: "https://orcid.org/0000-0001-5077-1530"
-title: "Dataverse metadata Crawler"
+title: "Dataverse Metadata Crawler"
 version: 0.1.0
 date-released: 2025-01-16
-url: "https://github.com/kenlhlui/dataverse-metadata-crawler-p"
+url: "https://github.com/scholarsportal/dataverse-metadata-crawler"
14 changes: 8 additions & 6 deletions README.md
@@ -26,7 +26,7 @@ A Python CLI tool for extracting and exporting metadata from [Dataverse](https:/

2. Change to the project directory
```sh
-cd ~/dataverse-metadata-export-p
+cd ./dataverse-metadata-crawler
```

3. Create an environment file (.env)
@@ -65,13 +65,15 @@ A Python CLI tool for extracting and exporting metadata from [Dataverse](https:/
python3 dvmeta/main.py [-a AUTH] [-l] [-d] [-p] [-f] [-e] [-s] -c COLLECTION_ALIAS -v VERSION
```
**Required arguments:**

| **Option** | **Short** | **Type** | **Description** | **Default** |
|--------------------|-----------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|
| --collection_alias | -c | TEXT | Name of the collection to crawl. <br/> **[required]** | None |
| --version | -v | TEXT | The Dataset version to crawl. Options include: <br/> • `draft` - The draft version, if any <br/> • `latest` - Either a draft (if exists) or the latest published version <br/> • `latest-published` - The latest published version <br/> • `x.y` - A specific version <br/> **[required]** | None (required) |


**Optional arguments:**

| **Option** | **Short** | **Type** | **Description** | **Default** |
|----------------------|-----------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------|
| --auth | -a | TEXT | Authentication token to access the Dataverse repository. <br/> If | None |
@@ -96,6 +98,7 @@ python3 dvmeta/main.py -c demo -v 1.0 -d -s -p -a xxxxxxxx-xxxx-xxxx-xxxx-xxxxxx
```
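
The example invocation above passes `-v 1.0`, one of the forms the README's `--version` description allows (`draft`, `latest`, `latest-published`, or a specific `x.y`). The following is a minimal sketch of how such a value could be checked before crawling. The helper name `is_valid_version` and the exact `x.y` pattern are assumptions for illustration; this is not the repository's own code.

```python
import re

# Keywords taken from the README's --version description.
VERSION_KEYWORDS = {"draft", "latest", "latest-published"}

# Assumption: "x.y" means two dot-separated integers, e.g. "1.0".
VERSION_NUMBER = re.compile(r"\d+\.\d+")


def is_valid_version(value: str) -> bool:
    """Return True if value is a known keyword or an x.y version number."""
    return value in VERSION_KEYWORDS or VERSION_NUMBER.fullmatch(value) is not None
```

A check like this would accept `1.0` and `latest-published` and reject inputs such as `1` or `v1.0`.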

## 📂Output Structure

| File | Description |
|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| ds_metadata_yyyymmdd-HHMMSS.json | Datasets' their data files' metadata in JSON format. |
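
The `yyyymmdd-HHMMSS` suffix in the file name above corresponds to a `strftime` pattern of `%Y%m%d-%H%M%S`. The following is a minimal sketch of how such a name could be generated. The function `timestamped_name` is hypothetical, and the crawler may build its file names differently.

```python
from datetime import datetime
from typing import Optional


def timestamped_name(prefix: str, ext: str, now: Optional[datetime] = None) -> str:
    """Build a name like ds_metadata_20250120-130509.json (hypothetical helper)."""
    # Use the supplied time when given, otherwise the current local time.
    moment = now if now is not None else datetime.now()
    return f"{prefix}_{moment.strftime('%Y%m%d-%H%M%S')}.{ext}"
```

Passing an explicit `now` keeps the result reproducible, which is convenient when testing.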
@@ -145,21 +148,20 @@ If you use this software in your work, please cite it using the following metada

APA:
```
-Lui, L. H. (2025). Dataverse metadata Crawler (Version 0.1.0) [Computer software]. https://github.com/kenlhlui/dataverse-metadata-crawler-p
+Lui, L. H. (2025). Dataverse Metadata Crawler (Version 0.1.0) [Computer software]. https://github.com/scholarsportal/dataverse-metadata-crawler
```

BibTeX:
```
@software{Lui_Dataverse_metadata_Crawler_2025,
author = {Lui, Lok Hei},
month = jan,
-title = {{Dataverse metadata Crawler}},
-url = {https://github.com/kenlhlui/dataverse-metadata-crawler-p},
+title = {{Dataverse Metadata Crawler}},
+url = {https://github.com/scholarsportal/dataverse-metadata-crawler},
version = {0.1.0},
year = {2025}
}
```

## ✍️Authors
-Ken Lui - Data Curation Specialist, Map and Data Library, University of Toronto - [email protected]

+Ken Lui - Data Curation Specialist, Map and Data Library, University of Toronto - [[email protected]](mailto:[email protected])
19 changes: 19 additions & 0 deletions _config.yml
@@ -0,0 +1,19 @@
# Site settings
title: Dataverse Metadata Crawler
description: A Python CLI tool for extracting and exporting metadata from Dataverse repositories to JSON and CSV formats.
baseurl: "/dataverse-metadata-crawler" # Base URL (leave blank for root deployment)
url: "https://scholarsportal.github.io" # Your GitHub Pages URL

remote_theme: pages-themes/primer
plugins:
  - jekyll-remote-theme # add this line to the plugins list if you already have one
  - jekyll-seo-tag # Required by primer theme

# Markdown settings
markdown: kramdown
kramdown:
  input: GFM # Enables GitHub Flavored Markdown (GFM)

# Build settings
source: ./
destination: ./_site
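
Jekyll builds absolute links by prepending `url` and `baseurl` to each page path, so the two settings above together determine the site's root address. The following is a minimal sketch of that composition. The helper `site_root` is hypothetical and only illustrates the joining rule; it is not part of Jekyll or of this repository.

```python
def site_root(url: str, baseurl: str) -> str:
    """Join Jekyll's `url` and `baseurl` settings into the site's root address."""
    root = url.rstrip("/")
    path = baseurl.strip("/")
    # An empty baseurl means the site is served from the domain root.
    return f"{root}/{path}" if path else root


print(site_root("https://scholarsportal.github.io", "/dataverse-metadata-crawler"))
```

With the values in this `_config.yml`, the result is `https://scholarsportal.github.io/dataverse-metadata-crawler`.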
