174 changes: 174 additions & 0 deletions README.md
@@ -0,0 +1,174 @@
# FAIR Data Fund — Developer Overview

This repository contains a small Python web application that powers the FAIR Data Fund application & review workflow. It’s a classic WSGI app built on **Werkzeug** and **Jinja2**, with data stored in / read from a **SPARQL 1.1 endpoint** via **RDFLib**.

## Tech stack (at a glance)

- **Language**: Python 3.8+
- **Web**: Werkzeug (WSGI), Jinja2 templates, static assets (HTML/CSS/JS, jQuery, Quill, Dropzone)
- **Data**: RDF and SPARQL via `rdflib` (uses `SPARQLStore`); defaults to a Virtuoso-style endpoint at `http://127.0.0.1:8890/sparql`
- **Auth**: Optional SAML settings are supported via XML config (no external SAML lib; parsed & validated with `defusedxml`)
- **Email**: SMTP (configurable via XML)
- **Build tooling**: GNU Autotools (`configure.ac`, `Makefile.am`) is used to generate `pyproject.toml` and Makefiles; packaging via `setuptools`
- **CLI entry point**: `fair-data-fund` → `fair_data_fund.ui:main`
- **Key deps** (from `requirements.txt`): Jinja2, rdflib, requests, urllib3, Werkzeug, defusedxml

## Project structure

```
fair-data-fund/
├─ configure.ac # Autotools: defines Python version & outputs Makefiles/pyproject
├─ Makefile.am # Autotools: top-level build/dist instructions
├─ pyproject.toml.in # Template → becomes pyproject.toml after ./configure
├─ requirements.txt # Runtime dependencies
└─ src/
├─ Makefile.am # Autotools: package layout & installables
└─ fair_data_fund/
├─ *.py # Application modules (wsgi, ui, database, email, validator, rdf, ...)
└─ resources/
├─ html_templates/ # Jinja2 templates (home, application form, review UI, etc.)
├─ static/ # CSS, JS (jquery, quill, dropzone), fonts, images
└─ sparql_templates/ # Jinja2-templated SPARQL queries
```

### Notable modules

- `fair_data_fund.wsgi`: WSGI app; routing via `werkzeug.routing.Map/Rule`; static files via `SharedDataMiddleware`.
- `fair_data_fund.ui`: CLI entry; starts the dev server with `werkzeug.serving`. Handles `--config-file` and `--initialize`.
- `fair_data_fund.database`: SPARQL client built on `rdflib.plugins.stores.sparqlstore.SPARQLStore`. Uses Jinja2-rendered SPARQL templates. Default endpoint: `http://127.0.0.1:8890/sparql`. SPARQL Update can be configured separately.
- `fair_data_fund.validator`, `formatter`, `email_handler`, `rdf`, `cache`, `convenience`: support logic.
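To illustrate what "a classic WSGI app" means for `fair_data_fund.wsgi`, here is a minimal sketch using only the standard library. It is not the project's code: the `ROUTES` table, the handler names, and the `call` helper are hypothetical, and a plain dictionary stands in for `werkzeug.routing.Map/Rule`.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical route table (path -> handler name). The real app uses
# werkzeug.routing.Map/Rule; a plain dict stands in for it here.
ROUTES = {
    "/": "home",
    "/apply": "application_form",
}


def application(environ, start_response):
    """Minimal WSGI callable: look up the request path and return a text body."""
    path = environ.get("PATH_INFO", "/")
    handler = ROUTES.get(path)
    if handler is None:
        status, body = "404 Not Found", b"Not found"
    else:
        status, body = "200 OK", f"handler: {handler}".encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call(path):
    """Invoke the app in-process (no server) and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

Calling `call("/")` exercises the app without opening a socket, which is also a convenient way to unit-test a WSGI callable.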

## Prerequisites

- Python **3.8+**
- A **SPARQL 1.1 endpoint** with update enabled (e.g. Virtuoso at `:8890` or Jena Fuseki). You can run one locally using Docker; set the endpoint URLs in the XML config below.
- (Optional) SMTP account for outgoing email notifications.
- (Optional) SAML Identity Provider metadata if you want SSO in dev.

## Local setup (fast path for development)

You can run the app directly from source without a full Autotools build:

```bash
# 1) Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 2) Install Python dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 3) Create a config file (see example below), then start the server
python -m fair_data_fund.ui --config-file ./dev-config.xml --initialize
# or, after installation via setuptools (see "Autotools build" below):
# fair-data-fund --config-file ./dev-config.xml --initialize
```

This starts a dev server (defaults to `127.0.0.1:8080`). The `--initialize` flag will populate the RDF store with default triples (institutions, etc.).
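The two flags used above could be parsed along these lines. This is a sketch with `argparse`, not the actual contents of `fair_data_fund.ui`; the `parse_args` function, the destination names, and the help texts are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags named in this README (hypothetical implementation)."""
    parser = argparse.ArgumentParser(prog="fair-data-fund")
    parser.add_argument(
        "--config-file",
        dest="config_file",
        default=None,
        help="Path to the XML configuration file.",
    )
    parser.add_argument(
        "--initialize",
        action="store_true",
        help="Populate the RDF store with default triples before serving.",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.config_file, args.initialize)
```

Because `--initialize` is a `store_true` flag, it stays `False` on later runs unless you pass it again.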

> **Tip:** The package lives under `src/`, so `python -m fair_data_fund.ui` only works from the repo root if Python can find that tree. If you hit `ModuleNotFoundError`, either set `PYTHONPATH=src`, or run `pip install -e .` after generating `pyproject.toml` (see "Autotools build" below).

## Minimal config (XML)

Create a file `dev-config.xml` in the repo root with something like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fair-data-fund>
<!-- Web server -->
<address>127.0.0.1</address>
<port>8080</port>
<base-url>http://127.0.0.1:8080</base-url>
<debug-mode>1</debug-mode>
<live-reload>1</live-reload>

<!-- Caching (optional) -->
<cache-root clear-on-start="1">.cache</cache-root>

<!-- Storage & RDF/SPARQL -->
<storage-root>.fdf_storage</storage-root>
<rdf-store>
<state-graph>default://graph</state-graph>
<sparql-uri>http://127.0.0.1:8890/sparql</sparql-uri>
<sparql-update-uri>http://127.0.0.1:8890/sparql</sparql-update-uri>
</rdf-store>

<!-- Email (optional) -->
<email>
<server>smtp.example.com</server>
<port>587</port>
<username>user</username>
<password>pass</password>
<from>[email protected]</from>
<subject-prefix>[FDF]</subject-prefix>
<starttls>1</starttls>
</email>

<!-- SAML SSO (optional) -->
<saml>
<entity-id>https://fdf.local/</entity-id>
<certificate-file>/path/to/sp.pem</certificate-file>
<private-key-file>/path/to/sp.key</private-key-file>

<identity-provider>
<entity-id>https://idp.example.org/</entity-id>
<x509-certificate>...PEM...</x509-certificate>
<single-signon-service>
<url>https://idp.example.org/sso</url>
<binding>urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect</binding>
</single-signon-service>
</identity-provider>
</saml>
</fair-data-fund>
```

Only the SPARQL settings are required for a functional dev run; email & SAML are optional.
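As a rough illustration of how such a file might be read, the sketch below extracts a few settings and applies defaults. It uses the standard library's `xml.etree.ElementTree` so that it runs without extra packages; the project itself parses XML with `defusedxml`, whose `defusedxml.ElementTree` module offers a compatible `fromstring`. The `load_settings` function, the returned dictionary keys, and the fallback of the update URI to the query URI are assumptions, not the application's actual behaviour.

```python
import xml.etree.ElementTree as ET

# Default endpoint named in this README.
DEFAULT_SPARQL_URI = "http://127.0.0.1:8890/sparql"


def load_settings(xml_text):
    """Read a few settings from the config XML (hypothetical helper)."""
    root = ET.fromstring(xml_text)

    def text(path, default=None):
        # Return the stripped element text, or the default if missing/empty.
        value = root.findtext(path)
        if value is None or not value.strip():
            return default
        return value.strip()

    sparql_uri = text("rdf-store/sparql-uri", DEFAULT_SPARQL_URI)
    return {
        "address": text("address", "127.0.0.1"),
        "port": int(text("port", "8080")),
        "sparql_uri": sparql_uri,
        # Assumption: fall back to the query endpoint when no update URI is set.
        "sparql_update_uri": text("rdf-store/sparql-update-uri", sparql_uri),
        "email_enabled": root.find("email") is not None,
        "saml_enabled": root.find("saml") is not None,
    }
```

Treating the `<email>` and `<saml>` blocks as optional mirrors the note above: a config containing only `<rdf-store>` still yields a usable settings dictionary.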

## Running a local SPARQL endpoint (example)

**Virtuoso (quick start):**
```bash
docker run -it --name virtuoso -p 8890:8890 -e DBA_PASSWORD=dba openlink/virtuoso-opensource-7:latest
# After it starts, ensure SPARQL Update is enabled and use the endpoints in dev-config.xml.
```

**Jena Fuseki (alternative):**
```bash
docker run -it --name fuseki -p 3030:3030 stain/jena-fuseki
# Create a dataset in the UI, then set:
# <sparql-uri>http://127.0.0.1:3030/your-dataset/sparql</sparql-uri>
# <sparql-update-uri>http://127.0.0.1:3030/your-dataset/update</sparql-update-uri>
```
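
Before starting the app, it can help to confirm that the query endpoint answers at all. The following sketch sends a trivial `ASK` query over HTTP GET using only the standard library. The function names are hypothetical, and the check covers the query endpoint only; it does not prove that SPARQL Update is enabled.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_ask_url(endpoint):
    """Return the endpoint URL with a trivial ASK query attached."""
    query = "ASK { ?s ?p ?o }"
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def endpoint_is_reachable(endpoint, timeout=5.0):
    """Return True if the endpoint answers the ASK query with HTTP 200."""
    request = urllib.request.Request(
        build_ask_url(endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print(endpoint_is_reachable("http://127.0.0.1:8890/sparql"))
```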

## Autotools build (optional, for packaging)

This repo uses GNU Autotools to generate the packaging metadata (`pyproject.toml`) and Makefiles. If you want to build & install the package the “classic” way, do:

```bash
# Install autotools if needed (autoconf, automake, libtool)
autoreconf -i # generates ./configure and friends
./configure # writes Makefile(s) and pyproject.toml from pyproject.toml.in
make # builds dist artifacts (and assembles extra resources)
pip install . # installs the package (exposes the 'fair-data-fund' CLI)
# (Optional) make distcheck / make install
```

Once installed:
```bash
fair-data-fund --config-file ./dev-config.xml --initialize
```

## Common issues

- **SPARQL endpoint not reachable**: The app will log failures if it cannot connect. Check `rdf-store` URIs, port mappings, and that SPARQL Update is enabled.
- **Missing fonts/static**: Static files are served by the WSGI app via `SharedDataMiddleware`; ensure you’re running from an installed layout or from the source tree so resource paths resolve.
- **SAML misconfiguration**: Start without SAML in dev; add it later with correct IdP metadata.
- **Email sending fails**: Verify SMTP credentials; set `<starttls>1</starttls>` only if your SMTP server supports it.
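
To isolate the email issue from the rest of the app, you can try sending a message with a short standalone script. The sketch below uses `smtplib` and mirrors the `<email>` settings from the config; `build_message` and `send_message` are hypothetical helpers, not functions from `fair_data_fund.email_handler`.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body, subject_prefix="[FDF]"):
    """Build a plain-text message with the configured subject prefix."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"{subject_prefix} {subject}".strip()
    message.set_content(body)
    return message


def send_message(message, server, port=587, username=None, password=None,
                 starttls=True, timeout=10):
    """Send the message, upgrading to TLS only when starttls is set."""
    with smtplib.SMTP(server, port, timeout=timeout) as smtp:
        if starttls:
            smtp.starttls()  # raises SMTPNotSupportedError if unsupported
        if username:
            smtp.login(username, password)
        smtp.send_message(message)
```

If `send_message` fails at `starttls()`, the server does not advertise STARTTLS, which points at the `<starttls>` setting rather than at the credentials.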

## License

AGPL-3.0-or-later (see `LICENSE` if included in your dist).

---

_This README was generated from the uploaded repository contents to give you a working starting point._
@@ -50,7 +50,14 @@ <h2>Basic information</h2>
<label for="datatype">Data types handled in the project</label><div class="fas fa-question-circle help-icon"><span class="help-text">For example: Quantitative data, qualitative data, research software.</span></div>
<input type="text" id="datatype" name="datatype" value="{{application.datatype}}" />

<label for="size">What is the volume (in gigabytes) of the total dataset?</label>&nbsp;<span class="required-field">&#8727;</span>
<label for="size">
What is the volume (in gigabytes) of the total dataset?
<br>
<small style="font-weight: normal; color: #555;">
💡 Every user has an initial storage space of 10GB assigned.
If you want to upload a larger dataset, you can request more storage space via the repository website.
</small>
</label>&nbsp;<span class="required-field">&#8727;</span>
<input type="text" id="size" name="size" value="{{application.size}}" />

<label>Briefly describe the content of the data and file formats before obtaining funding</label><div class="fas fa-question-circle help-icon"><span class="help-text">Include a description of the data (raw, processed, analyzed) and research software as these are currently available. In later questions, you can elaborate on how the FAIR principles will be implemented. </span></div>