first rework of project setup chapter #106
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| /.quarto/ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,11 +2,165 @@ | |
| format: "html" | ||
| --- | ||
|
|
||
| # Project setup | ||
| # Overview | ||
|
|
||
| ### Video | ||
| | Questions | Objectives | Key Concepts / Tools | | ||
| | - | - | - | | ||
| | How do I set up my project to be reproducible? | Create a project directory and subdirectories following best practices | Research Compendium | | ||
| | How should I name my files? | Update file names as necessary. | Naming Conventions | | ||
| | How do I link the different components of my project? | Update folder & file paths as necessary | Absolute vs. Relative Paths | | ||
|
|
||
| {{< video https://vimeo.com/462773031 >}} | ||
|
|
||
| ## The Project Directory | ||
|
|
||
| The first step in making your code reproducible is setting up your project in a self-contained directory. This directory - which can also be described as a [_research compendium_](https://book.the-turing-way.org/reproducible-research/compendia/) - should contain all the (digital) components of the project. These components should be structured in such a way that reproducing all results is straightforward. | ||
|
|
||
| ## Getting Started | ||
|
|
||
| - Begin by creating a single, recognizable folder (directory) named after your project. | ||
|
|
||
| - Create subfolders (subdirectories) that distinguish the _type_ of files depending on their content or nature. For example: | ||
|
|
||
| - `data` (RO) | ||
| - `src` / `scripts` / `R` (HW) | ||
| - `output` (PG) | ||
|
|
||
| Where: | ||
|
|
||
| - _read-only (RO)_: not edited by either code or researcher | ||
| - _human-writeable (HW)_: edited by the researcher only. | ||
| - _project-generated (PG)_: folders generated when running the code; these folders can be deleted or emptied and will be completely reconstituted as the project is run. | ||
|
|
||
| - Initialize the following files: | ||
| - README.md | ||
| - LICENSE.md | ||
| - CITATION.cff | ||
|
|
||
| - Initialize version control (if not done earlier) | ||
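The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the workshop's official template: the function name `create_project` and the exact folder list are assumptions based on the example folders named above, and version control still has to be initialized separately (for example with `git init`).

```python
from pathlib import Path

# Hypothetical layout, based on the example folders listed above.
SUBDIRS = ["data", "src", "output"]
FILES = ["README.md", "LICENSE.md", "CITATION.cff"]


def create_project(root):
    """Create the project folder, its subfolders, and empty starter files."""
    root = Path(root)
    for sub in SUBDIRS:
        # parents=True also creates the project folder itself if needed
        (root / sub).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates the file if it is missing and keeps existing content
        (root / name).touch(exist_ok=True)
    return root
```

Calling `create_project("my_project")` from the folder where you keep your projects creates the skeleton. Running it a second time is harmless, because existing folders and files are left untouched.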
|
|
||
| ## A Good Enough Project | ||
|
|
||
| The tabset below outlines examples of a 'good enough project' in R & Python. These projects are available as [templates](https://utrechtuniversity.github.io/workshop-computational-reproducibility/chapters/version-control.html#simple-repository-templates-provided-by-utrecht-university) which you will (re)use in the next sections. | ||
|
|
||
| ::: {.panel-tabset} | ||
|
|
||
| ### R | ||
|
|
||
| ``` | ||
| . | ||
| ├── .gitignore | ||
| ├── CITATION.cff | ||
| ├── LICENSE.md | ||
| ├── README.md | ||
| ├── data <- All project data, ignored by git | ||
| │ ├── processed <- The final, canonical data sets for modeling. (PG) | ||
| │ ├── raw <- The original, immutable data dump. (RO) | ||
| │ └── temp <- Intermediate data that has been transformed. (PG) | ||
| ├── docs <- Documentation notebook for users (HW) | ||
| │ ├── manuscript <- Manuscript source, e.g., LaTeX, Markdown, etc. (HW) | ||
| │ └── reports <- Other project reports and notebooks (e.g. Jupyter, .Rmd) (HW) | ||
| ├── results | ||
| │ ├── figures <- Figures for the manuscript or reports (PG) | ||
| │ └── output <- Other output for the manuscript or reports (PG) | ||
| ├── R <- Source code for this project (HW) | ||
| └── MyProject.Rproj <- R Project File (PG) | ||
| ``` | ||
|
|
||
| ### Python | ||
|
|
||
| ``` | ||
| . | ||
| ├── .gitignore | ||
| ├── CITATION.cff | ||
| ├── LICENSE.md | ||
| ├── README.md | ||
| ├── requirements.txt | ||
| ├── data <- All project data, ignored by git | ||
| │ ├── processed <- The final, canonical data sets for modeling. (PG) | ||
| │ ├── raw <- The original, immutable data dump. (RO) | ||
| │ └── temp <- Intermediate data that has been transformed. (PG) | ||
| ├── docs <- Documentation notebook for users (HW) | ||
| │ ├── manuscript <- Manuscript source, e.g., LaTeX, Markdown, etc. (HW) | ||
| │ └── reports <- Other project reports and notebooks (e.g. Jupyter, .Rmd) (HW) | ||
| ├── results | ||
| │ ├── figures <- Figures for the manuscript or reports (PG) | ||
| │ └── output <- Other output for the manuscript or reports (PG) | ||
| └── src <- Source code for this project (HW) | ||
| ``` | ||
|
|
||
| ::: | ||
|
|
||
|
nehamoopen marked this conversation as resolved.
|
||
| ## Names & Naming Conventions | ||
|
|
||
| All files (and folders) should be named to reflect their content or function. These names should be immediately understandable to you and others. | ||
|
|
||
| A naming convention is a set of rules for naming things, particularly so that they're machine-readable. You can apply it to things like folders, files, and variables. Here are some popular naming conventions: | ||
|
|
||
| | Naming Convention | Example | Description | | ||
| | ---------------- | ----------------- | ----------- | | ||
| | original name | `an awesome name` | N/A | | ||
| | snake_case | `an_awesome_name` | All words are lowercase and separated by an underscore ( `_` ) | | ||
| | kebab-case | `an-awesome-name` | All words are lowercase and separated by a hyphen ( `-` ) | | ||
| | PascalCase | `AnAwesomeName` | All words are capitalized. Spaces are not used. | | ||
| | camelCase | `anAwesomeName` | The first word is lowercase, the remaining words are capitalized. Spaces are not used. | | ||
|
|
||
| If you want to retroactively apply a naming convention, you can use your programming language of choice or the command line. | ||
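As a rough sketch of what retroactive renaming could look like in Python, the helper below converts file names to snake_case. The function names `to_snake_case` and `rename_to_snake_case` are hypothetical, and the rules (lowercase everything, skip hidden files, never overwrite an existing file) are assumptions you may want to adapt.

```python
import re
from pathlib import Path


def to_snake_case(name):
    """Convert a name such as 'an awesome name' or 'AnAwesomeName' to snake_case."""
    # Insert a space at lowercase/digit -> uppercase boundaries (camelCase, PascalCase)
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Split on runs of anything that is not a letter or digit (spaces, hyphens, underscores)
    words = re.split(r"[^A-Za-z0-9]+", name)
    return "_".join(word.lower() for word in words if word)


def rename_to_snake_case(folder, dry_run=True):
    """Plan (or perform) snake_case renames for the files directly inside folder."""
    planned = []
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and hidden files such as .gitignore
        if not path.is_file() or path.name.startswith("."):
            continue
        target = path.with_name(to_snake_case(path.stem) + path.suffix)
        # Never overwrite a file that already exists under the new name
        if target != path and not target.exists():
            planned.append((path.name, target.name))
            if not dry_run:
                path.rename(target)
    return planned
```

With the default `dry_run=True` the function only returns the planned `(old, new)` pairs, so you can review them before calling it again with `dry_run=False`.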
|
|
||
| ## Absolute vs. Relative Paths | ||
|
Member
I am not sure whether this chapter fits here or better in book/chapters/reusability.qmd
Collaborator
Author
I revisited the reusability chapter and I think we could keep the stuff about paths here, so people can focus on the concept of functionalizing/modularizing code in that section. But you can let me know if you think otherwise! |
||
|
|
||
| When linking files, directories, or scripts in your project, use _relative paths_ to ensure your project remains reproducible and portable. | ||
|
|
||
| - **Absolute paths** specify the full location of a file or directory from the root of the filesystem (e.g., `C:/Users/name/project/data/file.csv`). While absolute paths always point to the exact location, they are not portable — these paths will break if the project is moved or shared across different machines. This can also happen on your own computer if you rename or change any part of the path before your project directory. | ||
|
|
||
| - **Relative paths** specify the location of a file or directory relative to the current working directory (e.g., `./data/file.csv`). If you’ve structured your project as a self-contained directory, the root of that directory should be your working directory. Relative paths are portable and reproducible, provided the working directory remains consistent. | ||
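To make the difference concrete, here is a small sketch of how a relative path is interpreted against a working directory. The helper `resolve_from` and both project locations are hypothetical; the code only manipulates path strings and does not touch the filesystem.

```python
from pathlib import PurePosixPath


def resolve_from(working_dir, relative_path):
    """Return the location a relative path points to, given a working directory."""
    relative_path = PurePosixPath(relative_path)
    if relative_path.is_absolute():
        raise ValueError(f"{relative_path} is already absolute")
    return PurePosixPath(working_dir) / relative_path


# The same relative path works wherever the project folder lives,
# as long as the project root is the working directory.
print(resolve_from("/home/alice/my_project", "./data/file.csv"))
print(resolve_from("/mnt/shared/my_project", "./data/file.csv"))
```

An absolute path such as `/home/alice/my_project/data/file.csv` only matches the first location, which is why it breaks once the project is moved or shared.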
|
|
||
| ### Example | ||
|
|
||
| Let's inspect the following example file structure and some best practices for Python, R, and MATLAB: | ||
|
|
||
| ``` | ||
| my_project/ | ||
| │ | ||
| ├ data/ | ||
| │ file.csv | ||
| │ | ||
| └ script.py | ||
| ``` | ||
|
|
||
| #### Python | ||
|
|
||
| Open the folder `my_project` in your integrated development environment (IDE), e.g. Jupyter Notebooks or PyCharm (see some suggestions below). You can address the csv file like this: | ||
|
|
||
| ```py | ||
| import pandas as pd | ||
|
|
||
| df = pd.read_csv("data/file.csv") | ||
| ``` | ||
| Since the working directory of your IDE is your project folder, the Python interpreter will resolve the relative path and find the data automatically. | ||
|
|
||
| #### R | ||
|
|
||
| In R, you can use `RStudio` as your IDE. Create a project via `File → New Project → New Directory` _or_ `File → New Project → Existing Directory`, depending on whether you are creating a new project or working with an existing one. You can then read the csv file like this: | ||
|
|
||
| ```r | ||
| data <- read.csv("data/file.csv") | ||
| ``` | ||
|
|
||
| #### MATLAB | ||
|
|
||
| As in R, you can set up a project folder in MATLAB and use: | ||
|
|
||
| ```matlab | ||
| data = readtable("data/file.csv"); | ||
| ``` | ||
|
|
||
| In general, IDEs help you to make use of relative paths: | ||
|
|
||
| Language | Beginner IDE | Feature | ||
| -- | -- | -- | ||
| Python | Visual Studio Code | workspace folder | ||
| R | RStudio | built-in projects | ||
| MATLAB | MATLAB | current folder | ||
|
|
||
| ### Slides | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,38 +8,20 @@ format: | |
| --- | ||
|
|
||
|
|
||
| # Welcome!{data-background-color="#FFCD00"} | ||
| ## Welcome! Who are you? | ||
|
|
||
| - What is your role/position? | ||
|
|
||
| ## Thanks: | ||
| - What is your faculty/background? | ||
|
|
||
| ::: {.theme-section} | ||
|
|
||
| - To you for being here! | ||
|
|
||
| - To the [Open Science Community Utrecht (OSCU)](https://openscience-utrecht.com) and [Research Data Management Support (RDM Support)](https://www.uu.nl/en/research/research-data-management) for supporting the development of this workshop | ||
|
|
||
| - To Armel Lefebvre, Bianca Kramer, Cedric Thieulot, Erik van Sebille, Jeroen Bosman, Jeroen Ooms, Jonathan de Bruin, Lukas van de Wiel, Mateusz Kuzak, Menno Fraters, Neha Moopen, Philippe Delandmeter, and Renato Alves, for helping develop this workshop | ||
|
|
||
| - To your teachers and helpers today! | ||
|
|
||
| <br> | ||
|
|
||
| #### Who are you? What brings you here? | ||
|
|
||
| - E.g. your name, pronouns, background, and motivation for this workshop. | ||
|
|
||
| ::: | ||
| - What brought you to this workshop? | ||
|
|
||
| ## General guidelines (and advice!) | ||
| ## Before We Start... | ||
|
|
||
| ::: {.theme-section} | ||
|
|
||
| * Ask for help when you need it! | ||
| + Raise your hand | ||
| + Use a sticky note | ||
| * Our helpers can help you with technical issues | ||
|
|
||
| * Our helpers can help you with technical issues. | ||
| * Take a computer break when we take a break! | ||
|
|
||
| <br> | ||
|
|
@@ -49,7 +31,9 @@ You can find all workshop information at [tinyurl.com/repcopilot](https://utrech | |
|
|
||
| ::: | ||
|
|
||
| ## Being reproducible | ||
| ## Let's Git Started! | ||
|
Member
Pun intended ;) |
||
|
|
||
| ## Being Reproducible | ||
|
|
||
| {width=75%} | ||
|
|
||
|
|
@@ -111,7 +95,7 @@ Automated analyses trace your steps, and prevent human error (or at the very lea | |
| ::: {.column width="40%"} | ||
| {width=60%} | ||
| ::: | ||
| :::: | ||
| ::: | ||
|
|
||
| ## What will we do in this workshop? | ||
|
|
||
|
|
@@ -131,21 +115,12 @@ We will take you through a workflow (in a broad sense!) | |
|
|
||
| ::: | ||
|
|
||
| ## What do we expect from you? | ||
|
|
||
| ::: {.theme-section} | ||
|
|
||
| - Our group has many different abilities and experiences. We hope you will value this as much as we do! | ||
| - We have done our best to make the workshop asynchronous so you can work at your own pace. | ||
| - However, we have incorporated several moments for shared discussion and questions. Use them! | ||
| - Feel free and safe to share your expertise and experiences. | ||
| ## What do we want to achieve? | ||
|
|
||
| #### Our objectives for you | ||
| <br> | ||
| We want to teach you **good habits** that will make your work more accessible, trustworthy, and reproducible by others. In doing so, we have tried to identify those habits that are a **good return on investment**: meaning, they save you time in the not-so-long run. | ||
| We want to develop **good habits** that will make our code more accessible, trustworthy, and reproducible by others. We try to focus on habits that are a **good return on investment**: meaning, they save you time in the not-so-long run. | ||
|
|
||
| <br> | ||
| <br> | ||
|
|
||
| #### And finally: we hope you enjoy the workshop | ||
|
|
||
| ::: | ||
| ### Enjoy the workshop! | ||
I am not so sure about that. Data is usually updated, extended or even fully revised (e.g. an experiment that did not go well).
I like the distinction between human-writeable and project-generated. Although I would also rename the latter to "output files" or "generated files" ...
I think it's more about the principle, in that there are some folders that may only be read-only? It doesn't necessarily have to be the data directory. It can be another one. Or perhaps the raw data directory.