Commit

1. Added GitHub Actions workflows for Jekyll deployment and Poetry dependency export

2. Updated CITATION.cff & README
kenlhlui committed Jan 28, 2025
1 parent e5c7a3b commit f207835
Showing 5 changed files with 154 additions and 9 deletions.
51 changes: 51 additions & 0 deletions .github/workflows/jekyll-gh-pages.yml
@@ -0,0 +1,51 @@
# Sample workflow for building and deploying a Jekyll site to GitHub Pages
name: Deploy Jekyll with GitHub Pages dependencies preinstalled

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["main"]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Build job
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Pages
        uses: actions/configure-pages@v5
      - name: Build with Jekyll
        uses: actions/jekyll-build-pages@v1
        with:
          source: ./
          destination: ./_site
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3

  # Deployment job
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
73 changes: 73 additions & 0 deletions .github/workflows/poetry-export_dependencies.yml
@@ -0,0 +1,73 @@
name: Poetry export requirements.txt
on:
  push:
    branches:
      - '*' # Trigger on any push to any branch
    paths:
      - 'requirements.txt'
      - 'pyproject.toml'
      - 'poetry.lock'
jobs:
  poetry-export_dependencies:
    strategy:
      fail-fast: false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: 3.12
      - name: Install poetry
        uses: abatilo/actions-poetry@v4
        with:
          poetry-version: 'latest'
      - name: Install the poetry-plugin-export
        run: poetry self add poetry-plugin-export
      - name: Update poetry lock file
        run: poetry lock
      - name: Export the project dependencies to requirements.txt
        run: |
          poetry export -f requirements.txt --output requirements.txt
      - name: Get branch name
        shell: bash
        run: echo "BRANCH_NAME=${GITHUB_REF#refs/heads/}" >> $GITHUB_ENV
      - name: Check for changes
        id: check_changes
        run: |
          if [[ -n "$(git status --porcelain requirements.txt poetry.lock)" ]]; then
            echo "changes=true" >> $GITHUB_OUTPUT
          else
            echo "changes=false" >> $GITHUB_OUTPUT
          fi
      - name: Configure Git
        run: |
          git config --local user.email "github-actions[bot]@users.noreply.github.com"
          git config --local user.name "github-actions[bot]"
      - name: Commit and push if changed
        if: steps.check_changes.outputs.changes == 'true'
        run: |
          # Pull with rebase to get latest changes
          git pull --rebase origin ${{ env.BRANCH_NAME }}
          # Stage and commit changes
          git add requirements.txt poetry.lock
          git commit -m "chore: update requirements.txt and poetry.lock [skip ci]"
          # Push with retry logic
          max_attempts=3
          attempt=1
          while [ $attempt -le $max_attempts ]; do
            if git push origin ${{ env.BRANCH_NAME }}; then
              break
            else
              if [ $attempt -eq $max_attempts ]; then
                echo "Failed to push after $max_attempts attempts"
                exit 1
              fi
              echo "Push failed, attempt $attempt of $max_attempts. Pulling and retrying..."
              git pull --rebase origin ${{ env.BRANCH_NAME }}
              attempt=$((attempt + 1))
            fi
          done
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
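The push-with-retry loop in the last step of the workflow above can be read as the following Python sketch. It is only an illustration of the control flow, not part of the commit: `push_with_retry`, `push`, and `pull_rebase` are hypothetical names, and the two callables stand in for `git push` and `git pull --rebase`.

```python
from typing import Callable


def push_with_retry(
    push: Callable[[], bool],
    pull_rebase: Callable[[], None],
    max_attempts: int = 3,
) -> int:
    """Call push() until it succeeds, rebasing between failed attempts.

    Returns the number of the attempt that succeeded and raises RuntimeError
    when every attempt fails. Both callables are hypothetical stand-ins for
    the `git push` and `git pull --rebase` commands used in the workflow.
    """
    for attempt in range(1, max_attempts + 1):
        if push():
            return attempt
        if attempt == max_attempts:
            raise RuntimeError(f"Failed to push after {max_attempts} attempts")
        # Mirror the workflow: fetch the latest remote commits, then try again.
        pull_rebase()
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    # Simulate a remote that rejects the first two pushes.
    outcomes = iter([False, False, True])
    rebases = []
    winner = push_with_retry(lambda: next(outcomes), lambda: rebases.append("rebase"))
    print(f"pushed on attempt {winner} after {len(rebases)} rebases")
    # prints "pushed on attempt 3 after 2 rebases"
```

As in the shell loop, a rebase runs only after a failed push that still has attempts left, so the final failure exits without another pull.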
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -4,7 +4,7 @@ authors:
- family-names: "Lui"
  given-names: "Lok Hei"
  orcid: "https://orcid.org/0000-0001-5077-1530"
-title: "Dataverse metadata Crawler"
+title: "Dataverse Metadata Crawler"
version: 0.1.0
date-released: 2025-01-16
-url: "https://github.com/kenlhlui/dataverse-metadata-crawler-p"
+url: "https://github.com/scholarsportal/dataverse-metadata-crawler"
16 changes: 9 additions & 7 deletions README.md
@@ -26,7 +26,7 @@ A Python CLI tool for extracting and exporting metadata from [Dataverse](https:/

2. Change to the project directory
```sh
-cd ~/dataverse-metadata-export-p
+cd ./dataverse-metadata-crawler
```

3. Create an environment file (.env)
@@ -65,13 +65,15 @@ A Python CLI tool for extracting and exporting metadata from [Dataverse](https:/
python3 dvmeta/main.py [-a AUTH] [-l] [-d] [-p] [-f] [-e] [-s] -c COLLECTION_ALIAS -v VERSION
```
**Required arguments:**
+
| **Option** | **Short** | **Type** | **Description** | **Default** |
|--------------------|-----------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|
| --collection_alias | -c | TEXT | Name of the collection to crawl. <br/> **[required]** | None |
| --version | -v | TEXT | The Dataset version to crawl. Options include: <br/> • `draft` - The draft version, if any <br/> • `latest` - Either a draft (if exists) or the latest published version <br/> • `latest-published` - The latest published version <br/> • `x.y` - A specific version <br/> **[required]** | None (required) |


**Optional arguments:**
+
| **Option** | **Short** | **Type** | **Description** | **Default** |
|----------------------|-----------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------|
| --auth | -a | TEXT | Authentication token to access the Dataverse repository. <br/> If | None |
@@ -96,6 +98,7 @@ python3 dvmeta/main.py -c demo -v 1.0 -d -s -p -a xxxxxxxx-xxxx-xxxx-xxxx-xxxxxx
```

## 📂Output Structure
+
| File | Description |
|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| ds_metadata_yyyymmdd-HHMMSS.json | Datasets' their data files' metadata in JSON format. |
@@ -145,21 +148,20 @@ If you use this software in your work, please cite it using the following metada

APA:
```
-Lui, L. H. (2025). Dataverse metadata Crawler (Version 0.1.0) [Computer software]. https://github.com/kenlhlui/dataverse-metadata-crawler-p
+Lui, L. H. (2025). Dataverse Metadata Crawler (Version 0.1.0) [Computer software]. https://github.com/scholarsportal/dataverse-metadata-crawler
```

BibTeX:
```
-@software{Lui_Dataverse_metadata_Crawler_2025,
+@software{Lui_Dataverse_Metadata_Crawler_2025,
author = {Lui, Lok Hei},
month = jan,
-title = {{Dataverse metadata Crawler}},
-url = {https://github.com/kenlhlui/dataverse-metadata-crawler-p},
+title = {{Dataverse Metadata Crawler}},
+url = {https://github.com/scholarsportal/dataverse-metadata-crawler},
version = {0.1.0},
year = {2025}
}
```

## ✍️Authors
-Ken Lui - Data Curation Specialist, Map and Data Library, University of Toronto - [email protected]
-
+Ken Lui - Data Curation Specialist, Map and Data Library, University of Toronto - [[email protected]](mailto:[email protected])
19 changes: 19 additions & 0 deletions _config.yml
@@ -0,0 +1,19 @@
# Site settings
title: Dataverse Metadata Crawler
description: A Python CLI tool for extracting and exporting metadata from Dataverse repositories to JSON and CSV formats.
baseurl: "/dataverse-metadata-crawler" # Base URL (leave blank for root deployment)
url: "https://scholarsportal.github.io" # Your GitHub Pages URL

remote_theme: pages-themes/primer
plugins:
  - jekyll-remote-theme # add this line to the plugins list if you already have one
  - jekyll-seo-tag # Required by primer theme

# Markdown settings
markdown: kramdown
kramdown:
  input: GFM # Enables GitHub Flavored Markdown (GFM)

# Build settings
source: ./
destination: ./_site
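
To see how the `url` and `baseurl` settings above combine into the published address, here is a rough Python sketch. It only approximates the behaviour of Jekyll's `absolute_url` filter and is not Jekyll code; `site_absolute_url` is a hypothetical helper name, and the asset path in the example is an assumption for illustration.

```python
def site_absolute_url(url: str, baseurl: str, path: str = "/") -> str:
    """Join a site URL, an optional base URL, and a page path.

    Simplified approximation of how a Jekyll site resolves links; the
    function name and the exact joining rules are illustrative assumptions.
    """
    parts = [url.rstrip("/")]
    if baseurl.strip("/"):
        # A blank baseurl means the site is served from the domain root.
        parts.append(baseurl.strip("/"))
    return "/".join(parts) + "/" + path.lstrip("/")


if __name__ == "__main__":
    print(site_absolute_url(
        "https://scholarsportal.github.io",
        "/dataverse-metadata-crawler",
    ))
    # prints "https://scholarsportal.github.io/dataverse-metadata-crawler/"
```

With an empty `baseurl`, the same call yields the domain root, which is the "root deployment" case mentioned in the config comment.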
