feat: add GitHub Actions workflows for Jekyll deployment and Poetry dependency export
kenlhlui committed Jan 20, 2025
1 parent e5c7a3b commit 834a3ab
Showing 5 changed files with 154 additions and 9 deletions.
51 changes: 51 additions & 0 deletions .github/workflows/jekyll-gh-pages.yml
@@ -0,0 +1,51 @@
# Sample workflow for building and deploying a Jekyll site to GitHub Pages
name: Deploy Jekyll with GitHub Pages dependencies preinstalled

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["main"]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Build job
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Pages
        uses: actions/configure-pages@v5
      - name: Build with Jekyll
        uses: actions/jekyll-build-pages@v1
        with:
          source: ./
          destination: ./_site
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3

  # Deployment job
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
73 changes: 73 additions & 0 deletions .github/workflows/poetry-export_dependencies.yml
@@ -0,0 +1,73 @@
name: Poetry export requirements.txt
on:
  push:
    branches:
      - '**' # Trigger on a push to any branch ('*' alone does not match names containing '/')
    paths:
      - 'requirements.txt'
      - 'pyproject.toml'
      - 'poetry.lock'
jobs:
  poetry-export_dependencies:
    strategy:
      fail-fast: false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12' # Quoted so YAML keeps the version as a string
      - name: Install poetry
        uses: abatilo/actions-poetry@v4
        with:
          poetry-version: 'latest'
      - name: Install the poetry-plugin-export
        run: poetry self add poetry-plugin-export
      - name: Update poetry lock file
        run: poetry lock
      - name: Export the project dependencies to requirements.txt
        run: |
          poetry export -f requirements.txt --output requirements.txt
      - name: Get branch name
        shell: bash
        run: echo "BRANCH_NAME=${GITHUB_REF#refs/heads/}" >> "$GITHUB_ENV"
      - name: Check for changes
        id: check_changes
        run: |
          if [[ -n "$(git status --porcelain requirements.txt poetry.lock)" ]]; then
            echo "changes=true" >> "$GITHUB_OUTPUT"
          else
            echo "changes=false" >> "$GITHUB_OUTPUT"
          fi
      - name: Configure Git
        run: |
          git config --local user.email "github-actions[bot]@users.noreply.github.com"
          git config --local user.name "github-actions[bot]"
      - name: Commit and push if changed
        if: steps.check_changes.outputs.changes == 'true'
        run: |
          # Stage and commit first: `git pull --rebase` refuses to run with unstaged changes
          git add requirements.txt poetry.lock
          git commit -m "chore: update requirements.txt and poetry.lock [skip ci]"
          # Rebase the new commit onto the latest remote changes
          # (the branch name is read from the shell environment, not interpolated into the script)
          git pull --rebase origin "$BRANCH_NAME"
          # Push with retry logic
          max_attempts=3
          attempt=1
          while [ $attempt -le $max_attempts ]; do
            if git push origin "$BRANCH_NAME"; then
              break
            else
              if [ $attempt -eq $max_attempts ]; then
                echo "Failed to push after $max_attempts attempts"
                exit 1
              fi
              echo "Push failed, attempt $attempt of $max_attempts. Pulling and retrying..."
              git pull --rebase origin "$BRANCH_NAME"
              attempt=$((attempt + 1))
            fi
          done
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
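
The last step of this workflow uses a bounded-retry pattern: try the push, re-sync with the remote on failure, and give up after a fixed number of attempts. A minimal Python sketch of that control flow follows. It assumes hypothetical `push` and `resync` callables standing in for `git push` and `git pull --rebase`; these names are illustrative and not part of the workflow.

```python
from typing import Callable


def push_with_retry(
    push: Callable[[], bool],
    resync: Callable[[], None],
    max_attempts: int = 3,
) -> int:
    """Call push() until it succeeds, re-syncing after each failed attempt.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError once max_attempts pushes have failed. `push` and `resync`
    are hypothetical stand-ins for `git push` and `git pull --rebase`.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if push():
            return attempt
        if attempt == max_attempts:
            break  # no point re-syncing after the final failure
        resync()
    raise RuntimeError(f"Failed to push after {max_attempts} attempts")
```

As in the shell loop, the re-sync runs only between attempts, so a run that fails three times performs two re-syncs before exiting with an error.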
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -4,7 +4,7 @@ authors:
  - family-names: "Lui"
    given-names: "Lok Hei"
    orcid: "https://orcid.org/0000-0001-5077-1530"
-title: "Dataverse metadata Crawler"
+title: "Dataverse Metadata Crawler"
version: 0.1.0
date-released: 2025-01-16
-url: "https://github.com/kenlhlui/dataverse-metadata-crawler-p"
+url: "https://github.com/scholarsportal/dataverse-metadata-crawler"
16 changes: 9 additions & 7 deletions README.md
@@ -26,7 +26,7 @@ A Python CLI tool for extracting and exporting metadata from [Dataverse](https:/

2. Change to the project directory
```sh
-cd ~/dataverse-metadata-export-p
+cd ./dataverse-metadata-crawler
```

3. Create an environment file (.env)
@@ -65,13 +65,15 @@ A Python CLI tool for extracting and exporting metadata from [Dataverse](https:/
python3 dvmeta/main.py [-a AUTH] [-l] [-d] [-p] [-f] [-e] [-s] -c COLLECTION_ALIAS -v VERSION
```
**Required arguments:**

| **Option** | **Short** | **Type** | **Description** | **Default** |
|--------------------|-----------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|
| --collection_alias | -c | TEXT | Name of the collection to crawl. <br/> **[required]** | None |
| --version | -v | TEXT | The Dataset version to crawl. Options include: <br/> • `draft` - The draft version, if any <br/> • `latest` - Either a draft (if exists) or the latest published version <br/> • `latest-published` - The latest published version <br/> • `x.y` - A specific version <br/> **[required]** | None (required) |
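
The four accepted forms of `--version` listed in the table above can be checked with a small validator. The sketch below is illustrative only: `is_valid_version` is a hypothetical helper, not a function from the tool, and it assumes that `x.y` means two dot-separated non-negative integers.

```python
import re

# Keyword values listed in the README table.
_KEYWORDS = {"draft", "latest", "latest-published"}
# Assumption: "x.y" means major.minor, both plain decimal integers.
_NUMBERED = re.compile(r"\d+\.\d+")


def is_valid_version(value: str) -> bool:
    """Return True if value is a listed keyword or looks like 'x.y'."""
    return value in _KEYWORDS or _NUMBERED.fullmatch(value) is not None
```

Under these assumptions, `is_valid_version("1.0")` and `is_valid_version("latest-published")` are accepted, while `"1"` and `"v1.0"` are rejected.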


**Optional arguments:**

| **Option** | **Short** | **Type** | **Description** | **Default** |
|----------------------|-----------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------|
| --auth | -a | TEXT | Authentication token to access the Dataverse repository. <br/> If | None |
@@ -96,6 +98,7 @@ python3 dvmeta/main.py -c demo -v 1.0 -d -s -p -a xxxxxxxx-xxxx-xxxx-xxxx-xxxxxx
```

## 📂Output Structure

| File | Description |
|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| ds_metadata_yyyymmdd-HHMMSS.json | Datasets' their data files' metadata in JSON format. |
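
The `yyyymmdd-HHMMSS` suffix in the file name corresponds to the `strftime` pattern `%Y%m%d-%H%M%S`. A minimal sketch of building such a name, assuming a hypothetical helper `timestamped_name` (how the tool itself generates the name is not shown in this diff):

```python
from datetime import datetime


def timestamped_name(prefix: str, when: datetime, ext: str = "json") -> str:
    """Build '<prefix>_<yyyymmdd-HHMMSS>.<ext>' from a given timestamp."""
    return f"{prefix}_{when.strftime('%Y%m%d-%H%M%S')}.{ext}"


print(timestamped_name("ds_metadata", datetime(2025, 1, 20, 13, 5, 9)))
# prints "ds_metadata_20250120-130509.json"
```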
@@ -145,21 +148,20 @@ If you use this software in your work, please cite it using the following metada

APA:
```
-Lui, L. H. (2025). Dataverse metadata Crawler (Version 0.1.0) [Computer software]. https://github.com/kenlhlui/dataverse-metadata-crawler-p
+Lui, L. H. (2025). Dataverse Metadata Crawler (Version 0.1.0) [Computer software]. https://github.com/scholarsportal/dataverse-metadata-crawler
```

BibTeX:
```
-@software{Lui_Dataverse_metadata_Crawler_2025,
+@software{Lui_Dataverse_Metadata_Crawler_2025,
author = {Lui, Lok Hei},
month = jan,
-title = {{Dataverse metadata Crawler}},
-url = {https://github.com/kenlhlui/dataverse-metadata-crawler-p},
+title = {{Dataverse Metadata Crawler}},
+url = {https://github.com/scholarsportal/dataverse-metadata-crawler},
version = {0.1.0},
year = {2025}
}
```

## ✍️Authors
-Ken Lui - Data Curation Specialist, Map and Data Library, University of Toronto - [email protected]

+Ken Lui - Data Curation Specialist, Map and Data Library, University of Toronto - [[email protected]](mailto:[email protected])
19 changes: 19 additions & 0 deletions _config.yml
@@ -0,0 +1,19 @@
# Site settings
title: Dataverse Metadata Crawler
description: A Python CLI tool for extracting and exporting metadata from Dataverse repositories to JSON and CSV formats.
baseurl: "/dataverse-metadata-crawler" # Base URL (leave blank for root deployment)
url: "https://scholarsportal.github.io" # Your GitHub Pages URL

remote_theme: pages-themes/primer
plugins:
  - jekyll-remote-theme # add this line to the plugins list if you already have one
  - jekyll-seo-tag # Required by primer theme

# Markdown settings
markdown: kramdown
kramdown:
  input: GFM # Enables GitHub Flavored Markdown (GFM)

# Build settings
source: ./
destination: ./_site
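
Jekyll's `url` and `baseurl` settings combine to form the address the site is served from. A small sketch of that composition, assuming a hypothetical `site_root` helper (not part of Jekyll) that joins the two values with exactly one slash:

```python
def site_root(url: str, baseurl: str) -> str:
    """Join a site URL and a base path; an empty baseurl means root deployment."""
    base = baseurl.strip("/")
    root = url.rstrip("/")
    return f"{root}/{base}" if base else root


print(site_root("https://scholarsportal.github.io", "/dataverse-metadata-crawler"))
# prints "https://scholarsportal.github.io/dataverse-metadata-crawler"
```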
