diff --git a/a_sample_module_template/a_sample_module_template.md b/a_sample_module_template/a_sample_module_template.md
deleted file mode 100644
index 659dc2a4e..000000000
--- a/a_sample_module_template/a_sample_module_template.md
+++ /dev/null
@@ -1,285 +0,0 @@
-
-
-# Title
-
-(Note that the title is the only level-1 header in the document)
-
-To see how to use this template, you'll need to look at this file in its [raw format](https://raw.githubusercontent.com/arcus/education_r25/main/working_documentation/template_modules.md?token=ACEVZUTXZ6BTRFIIBXPN4SDBD3FR6).
-To see what it looks like rendered via LiaScript, [click here](https://liascript.github.io/course/?https://raw.githubusercontent.com/arcus/education_r25/main/working_documentation/template_modules.md?token=ACEVZUTXZ6BTRFIIBXPN4SDBD3FR6#1) or go to [https://liascript.github.io/](https://liascript.github.io/) and paste the link to the **raw** file into the box on that page and click "load course".
-
-
-
-## Overview
-@comment
-
-**Is this module right for me?** @long_description
-
-**Estimated time to completion:**
-
-**Pre-requisites**
-
-List any skills and knowledge needed to do this module here. When available, include links to resources, especially other modules we've made (to show learners where this falls within our catalog).
-
-* one skill we have [another module for, linked here](https://education.arcus.chop.edu)
-* some familiarity with [a topic](https://education.arcus.chop.edu)
-* understanding of [one thing](https://education.arcus.chop.edu) and [another](https://education.arcus.chop.edu)
-
-If relevant, you can include recommendations for somewhere else to start if the learner doesn't have these prereqs. For example: If you are brand new to R or python (or want a refresher) consider starting with [Intro to R](link) or [Intro to python](link) first and then coming back here.
-
-**Learning Objectives**
-
-@learning_objectives
-
-For help articulating learning objectives, see [this guide to learning objectives, including lots of example verbs](https://cft.vanderbilt.edu/guides-sub-pages/blooms-taxonomy/).
-
-
-
-## Lesson Preparation
-
-If your module includes code learners may want to run, then give links to a pangeo binder here so they can start it up now. Also provide a link to the raw code so learners can download the code itself and run it on their own machines or copy it into a cloud server.
-
-This module makes use of [pangeo binder](https://binder.pangeo.io/) for interactive code examples in R and python. You don't need to install anything or set up an account, but you need a modern web browser like Chrome and a moderately good wifi connection. If you have R and/or python already installed on your computer and you prefer to work through code examples there, you can download the code for this module to run offline.
-
-If you intend to do the hands-on activities in this module with pangeo binder, we have a bit of preparation for you to do now. Because it can take a few minutes for the environment to be created, we suggest you click the link below to start up the activity. We recommend using right-click to open it in a new tab or window, and then returning here to continue learning while the environment finishes loading. Here is the link:
-
-[](https://binder.pangeo.io/v2/gh/arcus/education_r_environment/main?urlpath=rstudio) **Click the "launch binder" button!**
-
-You don't have to do anything except come back here once the link opens in a new tab or window.
-
-## Module Content
-
-Note that liascript will create a new page at each level 1, 2, or 3 header, so to avoid a page with only a header and no content, include text after each header before the next.
-
-Text after level 2 headers provides a good opportunity to give a sentence or two of overview, explain the structure of the coming content, and/or get preliminaries out of the way.
-
-## Including Media
-
-
-
-
-
-You can link to images online with their url, or locally with the file path, e.g. ``
-
-If you want to provide several images in a gallery, just make a "paragraph" of image links and LiaScript will render it as a gallery:
-
-  
-
-
-!?[This video is hosted on youtube.](https://www.youtube.com/watch?v=iIAO4Htzn8M)
-
-You can also embed local videos, just as with images: `!?[An embedded video.](vid/intro.mp4 "This is its subtitle")`
-
-In theory, you should be able to embed just about anything. Read more [here](https://liascript.github.io/course/?https://raw.githubusercontent.com/LiaScript/docs/master/README.md#24).
-
-You can also include movies, audio, and any other embedded content in galleries just by putting the links for them all in a paragraph.
-
-## Including highlight boxes
-
-Paragraph text
-
-
-Include special notes with different formatting. This style is for important points and key ideas.
-
-
-More text
-
-
-This style is for a quote.
-
-— Maya Angelou, And Still I Rise
-
-
-More text
-
-
-This style alerts users to potential pitfalls.
-
-
-More text
-
-
-This style alerts users to resources for further learning, especially links to a more in-depth discussion of an issue that might be touched on only briefly in the module.
-
-
-More text
-
-
-This style is for an aside to let learners know there's another possible approach (e.g. "You could also skip setting up an OSF account completely and just use github to publish and share your research products, but many people prefer to have OSF links available" or "To do this in R instead of python, see this other module")
-
-
-There's an additional style of highlight, "answer", that is used in quizzes.
-
-## Including math
-
-I want to include a math statement here: $ 1 + \beta = 2 $
-
-## Including code
-
-Next comes some code. This code won't do anything (it's not interactive).
-
-```r
-# You only need to install it once
-install.packages("ggplot2")
-
-# You'll need to load the library anew for each R session
-library("ggplot2")
-```
-You don't have to specify the programming language, but you can, and it should help you get appropriate syntax highlighting.
-
-```python
-print("This is python code")
-```
-
-It is possible to include interactive code, too! See [the Rextester template for LiaScript](https://github.com/LiaTemplates/Rextester).
-
-## Quiz 1
-
-Quizzes are just more markdown text, so if you want it to show up on its own page, put a new header before it. Otherwise you can include quiz questions at the end of a section, or even interspersed with the rest of your content.
-
-Quizzes should connect directly to your learning objectives. Each quiz question should connect to one learning objective, and every learning objective should have at least one quiz question associated with it somewhere in the module.
-
-Here is the first question. It's multiple choice.
-
-[(X)] This answer is right
-[( )] This is wrong
-[( )] Also wrong
-[[?]] Hint: Provide a hint here if you like. Hints are marked with the ?
-[[?]] Hint: You can include as many hints as you want.
-
-You can have questions with multiple correct answers. Select all of the following correct choices:
-
-[[ ]] Not this one
-[[X]] This is one of the correct ones
-[[X]] Here's another correct one
-[[ ]] This one is wrong, though
-[[?]] Hint: Remember to select ALL of the correct choices.
-
-True or False: This statement is NOT true. ;)
-
-[( )] TRUE
-[(X)] FALSE
-
-Short answer/text response. Note that, without any additional script, to get it marked "correct" the learner has to enter it exactly as you do.
-
-[[right answer]]
-[[?]] Hint: The answer is "right answer"
-***
-
-This is extra text that will show up after the learner clicks to have the correct answer revealed. It can be as long as you like, and allows any markdown formatting (you can embed pictures or videos, links, etc.).
-
-Use the `"answer"` highlight style to mark these sections with special styling, so that they're visually distinct from the rest of the quiz. The style for `"answer"` is defined in the css file.
-
-For this context to show up automatically when the learner answers the question correctly or clicks to have the right answer revealed, it needs to be surrounded by `***` (at least three, but you can use more if you want a more visually distinct horizontal marker in your md file).
-
-***
-
-We can allow some flexibility in what we accept as correct answers for text by adding a little script after the answer, though. For the following, either "right answer" or "correct answer" (not case sensitive) will be accepted:
-
-[[right answer]]
-
-***
-
-For this question, either "right answer" or "correct answer" (not case sensitive) counts as correct.
-
-***
-
-This question accepts any of several items from a list of possible correct answers. It is not case sensitive (that's the little `i` at the end of the regex).
-
-[[this text will never show up if they type a right answer and click "Check", only if they click the checkmark button to reveal the answer]]
-[[?]] Hint: The answers are like "item1", "item2", etc.
-
-***
-
-With flexible answers like this, it's definitely a good idea to include a follow-up to help the learner put their answer in context.
-
-For example, if the question was "Name one or more colors" with acceptable answers including red, orange, yellow, green, blue, and purple, and they wrote "red, green, and the center of a black hole" that would be marked as correct because it contains at least one string from the acceptable list. Similarly, "hammered metal" would be marked as correct because it contains the string "red" ([you can prevent this if you want](https://www.w3schools.com/jsref/jsref_regexp_begin.asp)). On the other hand "teal, scarlet, indigo" would be marked wrong.
-
-Reiterate what the correct answer or answers should be, and try to anticipate likely wrong answers so you can explain why they're not correct.
-
-***
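-
-The substring matching described above can be illustrated outside of LiaScript. The sketch below uses R's `grepl()` purely as an illustration (LiaScript quiz scripts run in the browser, so this is not the quiz code itself); the pattern and the example answers are hypothetical.
-
-```r
-# hypothetical list of acceptable answers, joined into one pattern
-colors <- "red|orange|yellow|green|blue|purple"
-
-# unanchored, case-insensitive match: any answer containing one of the strings passes
-grepl(colors, "hammered metal", ignore.case = TRUE)          # TRUE, because of "red"
-grepl(colors, "teal, scarlet, indigo", ignore.case = TRUE)   # FALSE
-
-# word boundaries (\\b) require the color to appear as a whole word
-whole_word <- paste0("\\b(", colors, ")\\b")
-grepl(whole_word, "hammered metal", ignore.case = TRUE, perl = TRUE)   # FALSE
-grepl(whole_word, "red, green, and the center of a black hole",
-      ignore.case = TRUE, perl = TRUE)                                 # TRUE
-```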
-
-There are also questions that allow you to select from a drop down, but I don't know why that would be preferable over regular multiple choice. [Read more about quiz syntax here.](https://liascript.github.io/course/?https://raw.githubusercontent.com/liaScript/docs/master/README.md#quizzes)
-
-Note that you can use any markdown formatting you want in quizzes, including bold, links, math, etc.
-
-Surveys (ungraded questions)
----
-
-You can ask questions with no graded answer as well. LiaScript calls these [surveys](https://liascript.github.io/course/?https://raw.githubusercontent.com/LiaScript/docs/master/README.md#111).
-
-Here's an ungraded question with a text box three lines long:
-
-[[___ ___ ___]]
-
-Here's one that's just one line long:
-
-[[___]]
-
-Here's a multiple choice with no correct answer. What is your favorite Beatles album?
-
-[(rev)] Revolver
-[(wa)] The White Album
-[(ar)] Abbey Road
-[(sgtp)] Sgt. Pepper's Lonely Hearts Club Band
-
-Here's a survey multiple choice that lets you select more than one response. Which Beatles albums do you love super hard?
-
-[[rev]] Revolver
-[[wa]] The White Album
-[[ar]] Abbey Road
-[[sgtp]] Sgt. Pepper's Lonely Hearts Club Band
-
-Hints and follow-up explanations don't work for survey questions.
-
-
-## Additional Resources
-
-The last section of the module content should be a list of additional resources, both ours and outside sources, including links to other modules that build on this content or are otherwise related.
-
-## Feedback
-
-In the beginning, we stated some goals.
-
-**Learning Objectives:**
-
-@learning_objectives
-
-We ask you to fill out a brief (5 minutes or less) survey to let us know:
-
-* If we achieved the learning objectives
-* If the module difficulty was appropriate
-* If we gave you the experience you expected
-
-We gather this information in order to iteratively improve our work. Thank you in advance for filling out [our brief survey](https://redcap.chop.edu/surveys/?s=KHTXCXJJ93&module_name=%22Module+Template%22)!
-
-Remember to change the redcap link so that the module name is correct for this module!
diff --git a/data_visualization_in_ggplot2/data_visualization_ggplot2.md b/data_visualization_in_ggplot2/data_visualization_ggplot2.md
deleted file mode 100644
index 32026c96f..000000000
--- a/data_visualization_in_ggplot2/data_visualization_ggplot2.md
+++ /dev/null
@@ -1,858 +0,0 @@
-
-# Data Visualization in ggplot2
-
-
-## Overview
-@comment
-
-**Is this module right for me?** @long_description
-
-**Estimated time to completion:** 1 hr
-
-**Pre-requisites**
-
-This module assumes some familiarity with principles of data visualizations as applied in the ggplot2 library. If you've used ggplot2 (or python's seaborn) a little already and are just looking to extend your skills, this module should be right for you. If you are brand new to ggplot2 and seaborn, start with the overview of [data visualizations in open source software](https://liascript.github.io/course/?https://raw.githubusercontent.com/arcus/education_modules/main/data_visualization_in_open_source_software/data_visualization.md) first, and then come back here.
-
-This module also assumes some basic familiarity with R, including
-
-* [installing and loading packages](https://r4ds.had.co.nz/data-visualisation.html#prerequisites-1)
-* [reading in data](https://r4ds.had.co.nz/data-import.html)
-* manipulating data frames, including [calculating new columns](https://r4ds.had.co.nz/transform.html#add-new-variables-with-mutate), and [pivoting from wide format to long](https://r4ds.had.co.nz/tidy-data.html#longer)
-* some [statistical tests](https://liascript.github.io/course/?https://raw.githubusercontent.com/arcus/education_modules/main/statistical_tests/statistical_tests.md), especially linear regression
-
-If you are brand new to R (or want a refresher) consider starting with [Intro to R](https://liascript.github.io/course/?https://raw.githubusercontent.com/arcus/education_modules/main/intro_to_r_rstudio/intro_to_r_rstudio.md) first.
-
-**Learning Objectives**
-
-@learning_objectives
-
-
-
-
-## Lesson Preparation
-
-This module makes use of [pangeo binder](https://binder.pangeo.io/) for interactive code examples in R. You don't need to install anything or set up an account, but you need a modern web browser like Chrome and a moderately good wifi connection. If you have R already installed on your computer and you prefer to work through code examples there, you can download the code for this module to run offline.
-
-If you intend to do the hands-on activities in this module with pangeo binder, we have a bit of preparation for you to do now. Because it can take a few minutes for the environment to be created, we suggest you click the link below to start up the activity now. Use right-click to open it in a new tab or window, and you can simply return here to continue learning while the environment finishes loading. Here is the link:
-
-[](https://binder.pangeo.io/v2/gh/arcus/education_r_environment/roseh-data-viz-module?urlpath=rstudio) **Click the "launch binder" button!**
-
-You don't have to do anything except come back here once the link opens in a new tab or window.
-
-## Making plots in ggplot2
-
-This module is a practical, hands-on guide to making data visualizations in R's ggplot2. Snippets of code are included throughout the text here, but you are strongly encouraged to try running the code yourself instead of just reading it. Better yet, try to modify the code for each of the example plots to use with your own data!
-
-
-If you are using the [pangeo binder instance we prepared](#lesson-preparation), then all of the R packages you need will already be installed and you're all set.
-
-If you are using R on your own machine, though, then you may need to run the following code in R before continuing with the code examples here:
-
-```r
-# pass the package names as a single character vector
-install.packages(c("ggplot2", "readr", "dplyr"))
-
-```
-
-
-### How ggplot2 works
-
-If you've already used R for other tasks, you may feel like the R code for ggplot2 is a little different. Usually, when you want R to do something, you use a single function, or for more complicated tasks, a set of nested or [piped functions](https://style.tidyverse.org/pipes.html). For example, if you want to [create a scatterplot in base R](https://www.statmethods.net/graphs/scatterplot.html), you might run something like this:
-
-```r
-# the formula interface (y ~ x) lets plot() look up both columns in mtcars
-plot(mpg ~ wt, data = mtcars)
-```
-
-In ggplot2, you use the ggplot() function to generate an empty base plot, and then you **add** each of the elements of the plot as a layer. For example, to generate a scatterplot, you start with a command like `ggplot(mtcars, mapping = aes(x=mpg, y=wt))` to set up the basic information for the plot, and then you add a layer saying what kind of plot you want to draw, like `+ geom_point()` to indicate a scatterplot.
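-
-As a minimal sketch of that layering, the two pieces described above can be combined into one runnable command. It uses only the built-in `mtcars` data; the object name `p` is just an illustrative choice.
-
-```r
-library(ggplot2)
-
-# step 1: set up the data and the aesthetic mapping (no points are drawn yet)
-p <- ggplot(mtcars, mapping = aes(x = mpg, y = wt))
-
-# step 2: add a layer that says how to draw the data, here as points
-p <- p + geom_point()
-
-# printing the plot object draws it
-print(p)
-```
-
-Saving the plot to an object is optional, but it makes it easy to keep adding layers one at a time.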
-
-At first, this seems like more work than using a single command in another plotting system, and it is true that ggplot2 visualizations often take more lines of code. Breaking a plot into pieces and applying each as a layer has advantages, though: it makes it easier to tweak plots to get exactly what you want. This matters both for generating plots that adhere to formatting rules (like for a journal article submission) and because tweaking a plot and re-visualizing the data is a powerful way to explore trends and patterns when you're analyzing a data set.
-
-
-To learn more about the theory behind ggplot2, read [Hadley Wickham's article, "A Layered Grammar of Graphics"](http://vita.had.co.nz/papers/layered-grammar.pdf)
-
-
-## Working through interactive coding examples
-
-Hopefully your [binder instance](#lesson-preparation) is done loading now! If not, be patient --- it can sometimes take as long as 20 or 30 minutes if the files haven't been used recently.
-
-When it is ready, you should see the RStudio application running in your browser. In the Files pane in the lower right corner, there is a list of subfolders available. Open the one called "data_viz_in_ggplot", and open the .r file in that subfolder.
-
-All of the example code in this module is in that data_visualization_ggplot2.r file. While you read through this module, we recommend you keep returning to the binder instance to try running the code for yourself. Even better, try changing the code and see what happens.
-
-
-Note that binder instances aren't stable. When you close the window or if it idles too long, it may erase all of your work. If you want to save any code or output you come up with while working in binder, you need to copy-paste the code to a new file to save it on your computer.
-
-
-## Scatterplots
-
-Scatterplots show the relationship between two continuous variables, one on the x-axis and one on the y-axis. Because they show each individual data point as a marker, they also provide a handy way to check visually for outliers. For more background on scatterplots, watch [this Khan Academy series](https://www.khanacademy.org/math/cc-eighth-grade-math/cc-8th-data/cc-8th-scatter-plots/v/constructing-scatter-plot).
-
-### The data
-
-First, we need to load the libraries we'll be using:
-
-```r
-# the libraries we'll be using
-library(readr)
-library(dplyr)
-library(ggplot2)
-
-```
-
-
-The readr and dplyr packages, like ggplot2, are part of the [tidyverse](https://www.tidyverse.org/), a set of R packages for data science. Check out the free R for Data Science book online to learn more about both readr (the [data import](https://r4ds.had.co.nz/data-import.html) chapter) and dplyr (the [data transformation](https://r4ds.had.co.nz/transform.html) chapter).
-
-
-And then read in the data set:
-
-```r
-breast_cancer_data <- read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00451/dataR2.csv")
-
-```
-
-
-Run the above code yourself in binder (see [lesson preparation](#lesson-preparation) for links to start the binder instance) or on your own computer.
-
-In the data_visualization_ggplot2.r file, the code at the top of the file includes these library commands and the command to read the csv file for the data. Before you will be able to generate the plots in the rest of the module, you should run those lines of code.
-
-
-
-### Basic scatterplot
-
-To make any plot using ggplot2, we start by specifying the variables we'll be using and how --- this information is called the [aesthetic mapping](https://ggplot2.tidyverse.org/reference/aes.html), and is included by using the aes() function in the mapping argument of the ggplot() command. Here, the important aesthetics are just x and y. After setting the aesthetics with the ggplot() command, we add a layer specifying how we want ggplot2 to plot this information, in our case as a scatterplot, which is specified using geom\_point().
-
-```r
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point()
-
-```
-
-
-
-### Using color for continuous variables
-
-Let's try adding information about a third variable by using color. ggplot2 uses color differently depending on whether the variable is continuous ("numeric" in R) or categorical (a "factor" in R). First we'll look at a continuous variable.
-
-```r
-# use color to add information about a continuous variable
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age, color = BMI)) +
- geom_point()
-
-```
-
-
-
-Note that when you add an aesthetic for color (or shape, line type, alpha, or size), it will automatically add a legend to your plot.
-
-### Using color to show groups
-
-Now let's look at using color for a categorical variable. In this case, the variable is a categorical one (Classification, 1 or 2), but it isn't properly coded as a factor in the data frame. We'll fix that first and then send the corrected dataframe to the ggplot command.
-
-
-Tip: It's generally much easier to make any necessary changes to the dataframe, such as mutating variables, before sending it to the plotting command.
-
-
-```r
-# the Classification variable is currently treated as numeric,
-# so convert it to a factor
-breast_cancer_data <- breast_cancer_data %>%
- mutate(Class_factor = factor(Classification,
- levels = c(1,2),
- labels = c("Class 1", "Class 2")))
-
-# use color to add information about a categorical variable
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age, color = Class_factor)) +
- geom_point()
-
-```
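-
-To confirm that a conversion like this worked, it can help to check the column's class before plotting. The sketch below uses a small made-up data frame (`toy_data` and its values are hypothetical, not the module's data) so it runs without downloading anything.
-
-```r
-library(dplyr)
-
-# hypothetical stand-in for a data frame with a numeric group code
-toy_data <- data.frame(Classification = c(1, 2, 2, 1))
-class(toy_data$Classification)  # numeric, so ggplot2 would treat it as continuous
-
-# same conversion pattern as above
-toy_data <- toy_data %>%
-  mutate(Class_factor = factor(Classification,
-                               levels = c(1, 2),
-                               labels = c("Class 1", "Class 2")))
-
-class(toy_data$Class_factor)   # factor, so ggplot2 will treat it as categorical
-levels(toy_data$Class_factor)  # the labels supplied to factor()
-```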
-
-
-
-### Distinguish groups more clearly with custom colors and shape
-
-So far, we've been sticking to ggplot's default color scheme, but you can control what colors are used in your plots. There are a number of excellent tutorials available about how to control the colors in your ggplot visualizations (here is one [color in ggplot tutorial](https://blogs.uoregon.edu/rclub/2015/02/17/picking-pretty-plot-palates/), and [another color in ggplot tutorial](https://www.r-graph-gallery.com/ggplot2-color.html)). Here we'll just show one approach, using colors you specify by hand.
-
-```r
-# save the colors you want to use as a vector
-# you can specify colors by name (e.g. "blue"),
-# or use HTML codes, as from https://htmlcolorcodes.com/color-picker/
-class_colors <- c(`Class 1` = "#FEB648", `Class 2` = "#3390FF")
-
-# add a layer with scale_color_manual to specify the colors you want to use
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age, color = Class_factor)) +
- geom_point() +
- scale_color_manual(values = class_colors)
-
-```
-
-
-
-
-Tip: Don't use color alone to convey important information in your plots because if your end users are unable to distinguish the colors, the plot loses its value. Instead, double-up color information with another element, such as shape, to make the different groups easier to distinguish. For help selecting colors that are most likely to work for users with color vision deficiencies, [read about the colorspace package in R](https://arxiv.org/abs/1903.06490).
-
-
-We'll improve this plot by using shape and color together to mark the Classification groups.
-
-```r
-# add shape as a second signal to distinguish Classification
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age, color = Class_factor,
- shape = Class_factor)) +
- geom_point() +
- scale_color_manual(values = class_colors)
-
-```
-
-
-
-### Changing background color with theme
-
-Finally, we can control the background (and other aspects of the plot's general appearance) by adjusting the theme. There are quite a lot of [pre-made themes available](https://ggplot2-book.org/polishing.html#themes), or you can [specify your own](https://ggplot2-book.org/polishing.html#modifying-theme-components). I'll use the theme called theme_bw().
-
-
-Some journals or other organizations have pre-made ggplot2 themes available that you can apply to make your plots adhere to their aesthetic guidelines. It's worth asking! For more information about using themes and modifying them to meet journal requirements, see the [figures chapter in Writing Papers with R and Friends](https://bookdown.org/content/43652694-3819-41d2-9e70-8cfc8dd25fd1/figures.html).
-
-
-```r
-# change the theme to theme_bw()
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age, color = Class_factor,
- shape = Class_factor)) +
- geom_point() +
- scale_color_manual(values = class_colors) +
- theme_bw()
-
-```
-
-
-
-### Custom colors for continuous variables
-
-We can also manually control the color for a continuous variable, such as BMI. To do that, we'll use `scale_color_gradient()` instead of `scale_color_manual()`. We'll use theme_bw() again here, as well.
-
-```r
-# manually adjust color for a continuous variable
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age, color = BMI)) +
- geom_point() +
- scale_color_gradient(low = "lightgrey", high ="darkred") +
- theme_bw()
-```
-
-
-
-### Quiz: Scatterplots
-
-True or False: The only two crucial aesthetics for a ggplot2 scatterplot are x and y.
-
-[(X)] TRUE
-[( )] FALSE
-***********************************************************************
-
-While x and y are the only two **crucial** aesthetics, you may want to include others, such as color and shape, to communicate information about additional variables in the data.
-
-***********************************************************************
-
-What is the geom command for a scatterplot in ggplot2?
-
-[[geom_point()]]
-
-***********************************************************************
-
-Every ggplot2 visualization starts with the `ggplot()` command first to set which data will be used and how (i.e., the aesthetics), and then one or more "geoms" that control what type of plot will be created.
-
-***********************************************************************
-
-
-Which of the following can be used to manually set the color for a **numeric** variable in ggplot2?
-
-[(X)] `scale_color_gradient()`
-[( )] `scale_color_manual()`
-[( )] `theme_color()`
-[( )] Any of the above
-***********************************************************************
-
-`scale_color_gradient()` is for continuous variables, and `scale_color_manual()` is for categorical variables (factors).
-
-***********************************************************************
-
-Modify the code from the example above under [changing the background color with theme](#changing-background-color-with-theme) to apply a different theme. Add another layer to the plot with `theme(legend.position = "bottom")`. (Note: see the ggplot2 website for more on [modifying the legend for a plot](https://ggplot2.tidyverse.org/reference/guides.html).)
-
-```r -Solution
-# Note this is just one possible solution!
-# If your code generates a plot you like, that's a good solution.
-
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age, color = Class_factor,
- shape = Class_factor)) +
- geom_point() +
- scale_color_manual(values = class_colors) +
- theme_classic() +
- theme(legend.position = "bottom")
-
-```
-
-## Histograms
-
-Histograms show the distribution of a continuous variable. The values of the variable are shown along the x-axis, and data are grouped into bins, with the height of each bin corresponding to the number of data points in that bin. In other words, it communicates where your data for a given variable fall within its range. It is a great way to quickly assess for symmetry vs skew, outliers, and less common issues like multimodality.
-
-We'll continue using the same data we explored to make scatter plots.
-
-### Basic histogram
-
-Because histograms show just one variable, the only aesthetic they require is x. The y-axis of the plot will just show the counts of observations in each bin on the x-axis. (Note that it is possible to provide ggplot2 with y instead of x, in which case it will generate a sideways histogram.)
-
-```r
-# a basic histogram
-ggplot(breast_cancer_data, mapping = aes(x=Glucose)) +
- geom_histogram() +
- theme_bw()
-```
-
-
-
-Note that we can use theme_bw() again to make a white background, as we did with the scatter plots.
-
-### Change the number of bins
-
-The appearance of a histogram can change a lot depending on the number of bins you use along the x-axis. It's a good idea to try a few different sets of bins to see what works well for communicating this distribution.
-
-```r
-# try fewer bins
-ggplot(breast_cancer_data, mapping = aes(x=Glucose)) +
- geom_histogram(bins=10) +
- theme_bw()
-
-```
-
-
-
-```r
-# try more bins
-ggplot(breast_cancer_data, mapping = aes(x=Glucose)) +
- geom_histogram(bins=100) +
- theme_bw()
-
-```
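-
-Another way to control binning is by bin width rather than bin count. `geom_histogram()` also accepts a `binwidth` argument, given in the units of the x variable. The sketch below uses made-up values (`toy_data` is hypothetical) and a width of 5 chosen only for illustration.
-
-```r
-library(ggplot2)
-
-# hypothetical values standing in for a continuous variable
-toy_data <- data.frame(value = c(70, 74, 82, 85, 88, 90, 91, 95, 103, 118))
-
-# each bar now covers a range of 5 units on the x-axis
-p <- ggplot(toy_data, mapping = aes(x = value)) +
-  geom_histogram(binwidth = 5) +
-  theme_bw()
-print(p)
-```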
-
-
-
-### Using color to show groups
-
-As with scatterplots, we can add information about an additional variable by using color. Let's add the Classification factor to our aesthetics so we can see how the distribution of glucose values differs in the two groups.
-
-
-Tip: ggplot2 thinks about color differently for points and lines vs. filled in objects like bars. To adjust the color of the bars in a histogram, we need to use the fill aesthetic, not color.
-
-
-Note that we can set fill manually, just as we did with color in the scatterplots. In fact, we can use the same vector of colors we specified there to control fill now, giving our plots a coherent appearance.
-
-```r
-# use color to show Classification as well
-ggplot(breast_cancer_data, mapping = aes(x=Glucose, fill = Class_factor)) +
- geom_histogram(bins=30) +
- scale_fill_manual(values = class_colors) +
- theme_bw()
-
-```
-
-
-
-You may be noticing that the distribution for Class 1 appears to be stacked on top of the distribution for Class 2. This makes it easy to still see the overall distribution across both classification groups, but it makes it hard to see what the Class 1 distribution is like on its own. You can have ggplot display both groups as if they were each their own histogram instead of stacking by changing the position argument in the geom\_histogram() function. It defaults to "stack", but if you set it to "identity", then it will plot the height of each bin within each group rather than both groups added together.
-
-If we do this, we'll also need to control the transparency in the plot --- otherwise one distribution will obscure the other where they overlap. Alpha is a value between 0 (totally transparent) and 1 (totally opaque), and it defaults to 1. We'll try it at .5 here to see if that lets us see both distributions well enough.
-
-
-Tip: When your data overlap on a plot, use alpha to make them more transparent.
-
-
-```r
-# plot as two overlapping histograms, rather than stacked bins
-# use alpha to control transparency
-ggplot(breast_cancer_data, mapping = aes(x=Glucose, fill = Class_factor)) +
- geom_histogram(bins=30, alpha = .5, position = "identity") +
- scale_fill_manual(values = class_colors) +
- theme_bw()
-```
-
-
-
-### Transforming axes
-
-Let's take a look at another variable in the breast cancer data, Insulin:
-
-```r
-# a histogram of a positively skewed variable
-ggplot(breast_cancer_data, mapping = aes(x=Insulin)) +
- geom_histogram(bins=30) +
- theme_bw()
-```
-
-
-
-This is a variable with positive skew. That means that the bulk of the data are clustered at the left end of the range between 0 and 10, with a long tail extending up toward 60.
-
-You may want to adjust the scale so that you can see more of the detail in the 0-10 range. One way to do that would be to statistically transform the Insulin variable in the data and then re-plot the transformed variable, but ggplot2 has built-in functions to transform an axis, so there's no need to modify the data. We'll use a common log (base 10) transformation here, but there are [many more transformations](https://ggplot2.tidyverse.org/reference/scale_continuous.html) available.
-
-```r
-# transform the x-axis to show more detail at lower values
-ggplot(breast_cancer_data, mapping = aes(x=Insulin)) +
- geom_histogram(bins=30) +
- scale_x_continuous(trans = "log10") +
- theme_bw()
-```
-
-
-
-Note the scale on the x-axis. Fully half of the plot now shows the 0-10 range, allowing us to better see what the distribution there is like.
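-
-To get a feel for what the log10 axis is doing, it can help to look at a few transformed values directly. The sketch below uses made-up values (the `example_values` vector is hypothetical, not taken from the breast cancer data) to show that each tenfold increase becomes one equal step on the transformed scale.
-
-```r
-# hypothetical values chosen for illustration, not from breast_cancer_data
-example_values <- c(1, 10, 100)
-
-# log10 maps each tenfold increase to one equal step: 0, 1, 2
-log10(example_values)
-```
-
-This is why the lower values get so much more room on the transformed axis: the distance from 1 to 10 is drawn the same width as the distance from 10 to 100.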
-
-### Quiz: Histograms
-
-What is the geom function for creating a histogram in ggplot2?
-
-[[geom_histogram()]]
-
-
-Which of the following aesthetics can be used to plot a histogram in ggplot2?
-
-[( )] x only
-[(X)] either x or y
-[( )] y only
-[( )] both x and y
-***********************************************************************
-
-Histograms can only make use of one dimension of data (x or y, but never both) because the other dimension will always be the count of observations in each bin. If you try to provide both x and y as aesthetics, ggplot2 will give you an error.
-
-In all of our examples, we used the x aesthetic for our histograms, but it is possible to provide a y aesthetic instead. As an experiment, try generating one of the plots above, but substitute y for x and see what happens!
-
-***********************************************************************
-
-
-What do you use to control transparency in ggplot2?
-
-[[alpha]]
-***********************************************************************
-
-See the second plot in the [Using color to show groups](#using-color-to-show-groups) section, which includes an alpha adjustment.
-
-***********************************************************************
-
-True or False: Many common scale transformations are available in ggplot2, so you don't have to transform the data itself before plotting if you want to correct skew in your visualization.
-
-[(X)] True
-[( )] False
-***********************************************************************
-
-For a review, see [transforming axes](#transforming-axes).
-
-There are a few common transformations with their own ggplot2 functions, but there are many more available, and you can even [write your own transformation to use in ggplot2](https://scales.r-lib.org/reference/trans_new.html) if you like.
-
-***********************************************************************
-
-## Line Plots
-
-Line plots are especially useful when you want to show data points that are connected in a meaningful way. The most common application is repeated measures over time (also called time series data), such as when patients are measured on a given variable (plotted on the y-axis) at several times (plotted along the x-axis), and each line would represent one patient, or a summary across a group of patients.
-
-
-A word of caution: You may see line plots where the data points don't actually share a meaningful theoretical connection (e.g. all being from the same patient, or the same group). Although it's not uncommon, this is generally not considered good practice and you may receive pushback from reviewers or readers.
-
-
-
-For more detail, see this [tutorial on plotting time series in ggplot2](https://www.r-graph-gallery.com/279-plotting-time-series-with-ggplot2.html), which includes options for changing the way dates display along the x-axis.
-
-
-### Data for line plots
-
-The breast cancer data we've used for the other examples doesn't include any variables that would make sense for a line plot. Instead, for this example we'll use one of the [datasets that come built-in with R](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html), called `Seatbelts`. It contains some publicly available data about road deaths in the UK from 1969 to 1984. The wearing of seatbelts was made compulsory in the UK in Jan 1983, so this is an interesting period over which to measure driver safety.
-
-This dataset comes already installed in R, so you don't need to download it to be able to use it.
-
-To learn more about the Seatbelts data, pull up its help documentation in R by running the following command:
-
-```r
-?Seatbelts
-```
-
-This data file is actually a special time series object in R, rather than being a regular dataframe. Before we use it for visualization, we'll turn it into a dataframe, and add a column that gives the date of each record. This is not a step you would likely need to take with your own data since most data will import into R as a dataframe by default, not a time series object.
-
-```r
-seatbelt_data <- Seatbelts %>%
- as.data.frame() %>%
- # add a column specifying the date
- mutate(date = seq(from = as.Date("1969-01-01"), to = as.Date("1984-12-01"), by="month"))
-
-```
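-
-As a quick sanity check on the new date column, the sketch below confirms that the monthly date sequence has exactly one entry per row of `Seatbelts`. It uses only base R; the `dates` variable name is introduced here just for illustration.
-
-```r
-# Seatbelts is a monthly time series; inspect where it starts and ends
-start(Seatbelts)
-end(Seatbelts)
-frequency(Seatbelts)
-
-# build the same monthly sequence used for the date column
-dates <- seq(from = as.Date("1969-01-01"), to = as.Date("1984-12-01"), by = "month")
-
-# there should be one date per row of the time series
-length(dates) == nrow(Seatbelts)
-```
-
-If the lengths did not match, `mutate` would stop with an error rather than silently misaligning the dates, but checking up front makes the assumption explicit.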
-
-### Basic line plot
-
-```r
-ggplot(seatbelt_data, mapping = aes(x = date, y=drivers)) +
- geom_line() +
- theme_bw()
-```
-
-
-
-### Using color and line type to show groups
-
-If you want multiple lines on one plot, there are two ways to achieve that.
-
-The first is to add more `geom_line` layers, each with one variable you want plotted. The second is to convert the dataframe into long format, so the variables you want to plot are represented as two columns, one indicating the variable name and a second indicating the value.
-
-#### Method 1: Multiple layers
-
-```r
-ggplot(seatbelt_data, mapping = aes(x = date)) +
- geom_line(mapping = aes(y=drivers), color = "red", linetype = 1) +
- geom_line(mapping = aes(y=front), color = "blue", linetype = 2) +
- geom_line(mapping = aes(y=rear), color = "darkgreen", linetype = 3) +
- theme_bw()
-```
-
-Note that we've added an aesthetic mapping to each of the `geom_line` layers. Plot layers will inherit the aesthetic mapping set in the original `ggplot` command, or you can override it for just that layer by putting a new aesthetic mapping in. In this case, each layer inherits the same x-axis, but gets its own y-axis variable.
-
-
-
-There are a couple things about this plot that aren't quite ideal:
-
-- The y-axis is labeled "drivers" because that was the first geom layer we put in. The other two lines aren't for drivers, though, they're for front and rear seat passengers. It would be better to have an axis label that described all three lines well, like "deaths".
-- We have to set color and line type individually for each layer. In some cases, that may be your preference, but in others it's more convenient to have colors and line types selected automatically for you without you having to set them.
-- There is no legend showing which variable goes with each color and line type. We could [create a legend manually](https://community.rstudio.com/t/adding-manual-legend-to-ggplot2/41651/3), or we could provide that information in notes below the chart, but it's usually more convenient to have a legend generated automatically.
-
-#### Method 2: Convert data to long format
-
-The second way to create a line plot with several variables is to [convert the data from wide format to long](https://r4ds.had.co.nz/tidy-data.html#longer). This means we'll take the three variables we want to plot (drivers, front, and rear), and convert them into two columns: one that indicates which variable it is (drivers, front, or rear), and another that gives how many deaths occurred.
-
-This is called long format because it will make the data longer and narrower (more rows, fewer columns). In this case, since we have three variables for each time point, the new dataframe will have three times as many rows as before. This format is generally harder for humans to read, but it's usually much easier to work with for plotting. There are many cases where you'll find [converting data to a long format makes plotting it easier](https://scc.ms.unimelb.edu.au/resources-list/simple-r-scripts-for-analysis/r-scripts).
-
-To convert the seatbelt dataframe to long format, we'll use the `pivot_longer` function:
-
-```r
-seatbelt_data_long <- seatbelt_data %>%
- pivot_longer(cols = c(drivers, front, rear), names_to = "seat", values_to = "deaths")
-
-```
-
-In the code above, we are taking the three specified columns (drivers, front, rear), and pivoting them into two columns, one called seat and one called deaths.
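-
-One way to confirm the conversion did what we expect is to compare row counts before and after pivoting. The sketch below rebuilds both dataframes from the steps above so that it can run on its own, assuming the dplyr and tidyr packages are installed.
-
-```r
-library(dplyr)
-library(tidyr)
-
-# rebuild the wide dataframe with its date column
-seatbelt_data <- Seatbelts %>%
-  as.data.frame() %>%
-  mutate(date = seq(from = as.Date("1969-01-01"), to = as.Date("1984-12-01"), by = "month"))
-
-# pivot the three death-count columns into seat/deaths pairs
-seatbelt_data_long <- seatbelt_data %>%
-  pivot_longer(cols = c(drivers, front, rear), names_to = "seat", values_to = "deaths")
-
-# the long version should have three times as many rows as the wide version
-nrow(seatbelt_data_long) == 3 * nrow(seatbelt_data)
-
-# each seat category should appear once per month
-count(seatbelt_data_long, seat)
-```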
-
-Then we can use the new long dataframe to plot the data with separate lines for each seat.
-
-```r
-ggplot(seatbelt_data_long, mapping = aes(x=date, y=deaths, color = seat, linetype = seat)) +
- geom_line() +
- theme_bw()
-```
-
-Note that now there is only one `geom_line` layer, but the aesthetic mapping is more complex --- it includes both x and y axes, but also color and linetype are set to vary by seat.
-
-
-
-### Adding a reference line
-
-It is possible to add a reference line to any ggplot visualization by adding a layer. We'll add a reference line here to indicate when the seatbelt law was passed (Jan 1983), since we might expect to see the number of deaths decrease after that date.
-
-There are two different functions for adding reference lines in ggplot2: `geom_hline` for horizontal reference lines, and `geom_vline` for vertical reference lines. We'll use `geom_vline` here since we want the line to mark a specific date on the x-axis.
-
-```r
-ggplot(seatbelt_data, mapping = aes(x = date, y=drivers)) +
- geom_line() +
- geom_vline(xintercept = as.Date("1983-01-31"), linetype = 2, color = "red") +
- theme_bw()
-```
-
-
-
-### Quiz: Line Plots
-
-What is the geom function for creating a line plot in ggplot2?
-
-[[geom_line()]]
-
-
-True or False: Line plots are usually appropriate as an alternative to scatterplots.
-
-[( )] TRUE
-[(X)] FALSE
-***********************************************************************
-
-Line plots and scatterplots are generally used for different kinds of data, so a line plot is usually not a good alternative to a scatterplot. Line plots are used to communicate **meaningfully connected** data, most often repeated observations over time.
-
-***********************************************************************
-
-Modify the code from the first example, the [basic line plot](#basic-line-plot), to add a horizontal reference line at 1000 deaths. Make it a dashed line.
-
-```r -Solution
-ggplot(seatbelt_data, mapping = aes(x = date, y=drivers)) +
- geom_line() +
- geom_hline(yintercept=1000, linetype = 2) +
- theme_bw()
-```
-
-Modify the code from the [long data version of the plot with multiple lines](#method-2:-convert-data-to-long-format) so that you control the colors used for the three lines. You can set them to anything you like. (Hint: You can use the same approach we used in the scatterplot section to [distinguish groups more clearly with custom colors and shape](#distinguish-groups-more-clearly-with-custom-colors-and-shape).)
-
-```r -Solution
-colors <- c(drivers = "#FEB648", front = "#7B085D", rear = "#3390FF")
-
-ggplot(seatbelt_data_long, mapping = aes(x=date, y=deaths, color = seat, linetype = seat)) +
- geom_line() +
- scale_color_manual(values = colors) +
- theme_bw()
-```
-
-## Trend Lines
-
-Trend lines look like line plots, but they are different in one key way: They show a **summary** of other data (usually a linear model), rather than plotting data directly.
-
-Trend lines are used to show the overall trend in a scatterplot. Sometimes, the scatterplot points themselves are omitted and just the trend lines are shown to keep the visualization as clean as possible, but they're still implied.
-
-### Method 1: geom\_smooth
-
-Because trend lines are such a useful visual summary, ggplot2 provides several built-in functions to add trend lines to your plots quickly and easily. These functions are available in the geom\_smooth() command. If you add a geom\_smooth layer to a scatterplot, it will show a trend line.
-
-```r
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point() +
- geom_smooth() +
- theme_bw()
-
-```
-
-
-
-By default[^1](Actually, the default is only a loess curve for fewer than 1000 observations; it switches to a generalized additive model if you have more.), geom\_smooth() uses a loess curve to summarize your data, rather than a straight linear regression best fit line. To see a linear trend line, set the `method` argument to "lm" (short for linear model) in the geom\_smooth command.
-
-```r
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point() +
- geom_smooth(method = "lm") +
- theme_bw()
-
-```
-
-
-
-What happens to the trend line if you make the underlying scatterplot more complicated? Let's try using geom\_smooth to put a trend line on the scatterplot we made [distinguishing groups with custom colors and shape](#distinguish-groups-more-clearly-with-custom-colors-and-shape).
-
-```r
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age,
- color = Class_factor,
- shape = Class_factor)) +
- geom_point() +
- geom_smooth(method = "lm") +
- scale_color_manual(values = class_colors) +
- theme_bw()
-
-```
-
-
-
-Note that the color aesthetic affects the line drawn by geom\_smooth as well. Each layer inherits the aesthetic mappings from the original ggplot command, so geom\_smooth is drawn using x, y, and Class (set by color, which affects lines, and shape which has no effect on lines) --- that means we automatically get a trend line fit within each Class.
-
-The grey confidence interval around the trend lines doesn't change, though, because that is controlled by fill, not color (for a reminder about color vs fill in ggplot2, see how we changed color in the [histograms](#histograms)). To make the confidence interval match the Class color as well, add a fill aesthetic. You can add it either to the ggplot command at the top, or you could put an `aes()` mapping into the geom\_smooth layer itself.
-
-```r
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age,
- color = Class_factor,
- fill = Class_factor,
- shape = Class_factor)) +
- geom_point() +
- geom_smooth(method = "lm") +
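- scale_color_manual(values = class_colors) +
- # apply the same custom colors to fill so the ribbons match the lines
- scale_fill_manual(values = class_colors) +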
- theme_bw()
-
-```
-
-
-
-### Method 2: geom\_abline
-
-Rather than having ggplot2 run a linear regression for you, you may prefer to use the coefficients from your model to draw the line of best fit directly.
-
-This is especially handy if your model differs from ggplot2's, or if you're not sure what model ggplot2 is running and you want to make sure your plotted trend line matches the model you ran.
-
-The geom\_abline() function allows you to specify a y-intercept and slope (which you can pull from your model coefficients) and it will draw a line accordingly.
-
-The first step is to run the linear model you want to get trend lines from. We won't step through how to run linear models in R here.
-
-
-For a more in-depth example of linear regression models in R, including using geom\_abline to draw linear trend lines, see this [Arcus Education post on ordinary linear regression](https://education.arcus.chop.edu/ordinary_linear_regression/).
-
-
-```r
-# run a linear model
-model <- lm(Glucose ~ Age, data = breast_cancer_data)
-# print the coefficients estimated from the model, so you can see what they are
-model$coefficients
-
-```
-
-Briefly, this code estimates a linear model from the breast_cancer_data dataframe, with Glucose as the outcome and Age as the predictor and saves it as an object named `model`. We can see the coefficients estimated from the model by running `model$coefficients`.
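-
-The coefficients are stored as a named vector, so you can pull them out by name as well as by position. The sketch below illustrates this with R's built-in `cars` dataset as a stand-in (an assumption made so the example runs on its own); the names `cars_model`, `intercept`, and `slope` are hypothetical.
-
-```r
-# stand-in model on the built-in cars dataset, for illustration only
-cars_model <- lm(dist ~ speed, data = cars)
-
-# coef() returns the same named vector as cars_model$coefficients
-coefs <- coef(cars_model)
-names(coefs)
-
-# pull each coefficient by name rather than by position
-intercept <- coefs[["(Intercept)"]]
-slope <- coefs[["speed"]]
-```
-
-Selecting by name can be a little safer than `[1]` and `[2]` once a model has several predictors, since it doesn't depend on the order of terms in the formula.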
-
-We can then use those coefficients in a geom\_abline layer on our scatterplot to draw this trend line.
-
-```r
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point() +
- geom_abline(intercept = model$coefficients[1],
- slope = model$coefficients[2]) +
- theme_bw()
-
-```
-
-
-
-Note that unlike geom\_smooth, geom\_abline doesn't draw a confidence interval around the trend line. Also note that since we're drawing the trend line manually, changing the underlying scatterplot won't affect the line drawn by geom\_abline --- if we were to add color and shape aesthetics, it would still just draw this same single black line.
-
-### Method 3: geom\_line
-
-The third approach for drawing trend lines uses the same geom we used for line plots, geom\_line. It takes advantage of the fact that a model fit with `lm` stores its fitted (predicted) values as part of the model object. That means you can use those fitted values to plot a line, which will draw the trend line produced by your model.
-
-Just as with [the geom\_abline approach](#method-2:-geom_abline), first you need a model object. We can use the same model we saved for the geom\_abline plot. This time, though, instead of pulling out the coefficients for the intercept and slope to draw a line that way, we'll use the fitted values.
-
-The fitted values are the expected outcome (Glucose) value for each observation in the data, based on the predictor(s) (in this case, just Age). If we connect all of the fitted values, that is the trend line from the model.
-
-Recall from the section on [line plots](#line-plots) that the geom\_line function connects points in a line. So we'll add a geom\_line layer and set its aesthetic mapping to use Age as the x-axis and `model$fitted.values` for the y-axis. It will inherit `x=Age` from the aesthetic mapping in the original ggplot command, but we want to override the y mapping and replace it with `y=model$fitted.values` for just the geom\_line layer.
-
-```r
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point() +
- geom_line(mapping = aes(y=model$fitted.values)) +
- theme_bw()
-
-```
-
-
-
-
-There's a second way to get predicted values from a model, the `predict` function. So `predict(model)` will give you the same set of fitted values as `model$fitted.values`. The `predict` function gives you a little more flexibility (for example, to get fitted values from a different dataset from the one you originally trained the model on), but at the default settings it will give you the exact same results as `model$fitted.values`.
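-
-To see the relationship between the two, here is a minimal sketch that again uses the built-in `cars` dataset as a stand-in (an assumption, so the example is self-contained). The names `cars_model` and `new_speeds` are hypothetical.
-
-```r
-# stand-in model on the built-in cars dataset, for illustration only
-cars_model <- lm(dist ~ speed, data = cars)
-
-# at default settings, predict() matches the stored fitted values
-all.equal(predict(cars_model), cars_model$fitted.values)
-
-# predict() can also return fitted values for predictor values you supply
-new_speeds <- data.frame(speed = c(10, 20))
-predict(cars_model, newdata = new_speeds)
-```
-
-The `newdata` dataframe needs a column for each predictor in the model, named exactly as in the model formula.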
-
-
-Note that the results from the geom\_abline approach and the geom\_line approach look very similar, but there's one important difference: The geom\_abline approach will draw lines that extend all the way from one edge of your plot to the other, while geom\_line (like geom\_smooth) will only draw the trend line across the range of the observed data. This means geom\_abline can imply predictions outside of the observed range, which is often okay but sometimes not what you want.
-
-### Quiz: Trend Lines
-
-What is the geom function for creating a trend line in ggplot2?
-
-[[geom_smooth(), geom_line(), geom_abline() all work!]]
-
-***********************************************************************
-
-This is a little bit of a trick question (sorry!) since there is no single correct answer: geom\_smooth(), geom\_line(), geom\_abline() all work to create trend lines!
-
-If you use geom\_abline or geom\_line, you need to first run the model, and then use the model results in your ggplot commands. The geom\_smooth function actually runs a model for you behind the scenes. Another difference is that geom\_smooth can print a confidence interval around your trend line, but geom\_line and geom\_abline just draw the line itself.
-
-***********************************************************************
-
-True or False: If you wanted to, you could use geom\_abline or geom\_line to draw a totally unrelated trend line on a scatterplot (e.g. one derived from different data).
-
-[(X)] TRUE
-[( )] FALSE
-***********************************************************************
-
-True! And this is an important point, because this can happen by accident. If you run several models in your code, always double check to make sure you're referencing the correct model to get the coefficients (for geom\_abline) or fitted values (for geom\_line) when you create the trend line.
-
-One good strategy is to generate trend lines a couple of different ways while you're working and check to make sure they all look the same. For example, add a trend line using geom\_smooth, and then run that model yourself and try to generate the same trend line using geom\_line or geom\_abline. That way you can confirm that you know the details of the model that geom\_smooth is using.
-
-***********************************************************************
-
-Modify the code from [the first example in the geom\_smooth approach](#method-1:-geom_smooth) to change the appearance of the trend line it draws. Make it black, and make it a dashed line. (Hint: See the examples in the [line plots](#line-plots) section for a reminder of setting color and line type.)
-
-```r -Solution
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point() +
- geom_smooth(color = "black", linetype = 2) +
- theme_bw()
-```
-
-Write code to draw a linear trend line showing the relationship between Age and Glucose, but create a plot with just the line, no scatterplot underneath. Try it each of the three ways, using geom\_smooth, geom\_abline, and geom\_line.
-
-```r -Solution using geom_smooth
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_smooth(method = "lm") +
- theme_bw()
-```
-```r -Solution using geom_abline
-# note that this doesn't actually plot a line, since there are no observations to set the x and y scales
-# you'll see a blank plot
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_abline(intercept = model$coefficients[1], slope = model$coefficients[2]) +
- theme_bw()
-
-# you can set the x and y scales yourself manually by adding a layer for each
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_abline(intercept = model$coefficients[1], slope = model$coefficients[2]) +
- scale_y_continuous(limits = c(min(breast_cancer_data$Glucose), max(breast_cancer_data$Glucose))) +
- scale_x_continuous(limits = c(min(breast_cancer_data$Age), max(breast_cancer_data$Age))) +
- theme_bw()
-
-```
-```r -Solution using geom_line
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_line(mapping = aes(y=model$fitted.values)) +
- theme_bw()
-```
-```r -Another solution, using alpha
-# you can also make any element of a plot invisible by setting its alpha to 0
-# in this case, we can make the dots of the scatterplot disappear from any of the plots we made above
-# for example:
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point(alpha = 0) +
- geom_line(mapping = aes(y=model$fitted.values)) +
- theme_bw()
-
-# this has the advantage of keeping the scales for the plot consistent
-# and it means you don't have to set the scales manually when using geom_abline
-```
-
-## Additional Resources
-
-For an excellent quick reference, see the [ggplot2 cheatsheet](https://ggplot2.tidyverse.org/#cheatsheet). It includes a tremendous amount of information in a very compact format, so it's not great for people just getting started with ggplot2, but it's a valuable reference to keep on hand for when you start making plots for your own analyses.
-
-For more detail on controlling color in ggplot2, refer to the [ggplot2 book](https://ggplot2-book.org/scale-colour.html), available for free online.
-
-To learn how to make plots in python using seaborn, see [data visualization in seaborn](https://liascript.github.io/course/?https://raw.githubusercontent.com/arcus/education_modules/main/data_viz_seaborn/data_visualization_in_seaborn.md).
-
-## Feedback
-
-In the beginning, we stated some goals.
-
-**Learning Objectives:**
-
-@learning_objectives
-
-We ask you to fill out a brief (5 minutes or less) survey to let us know:
-
-* If we achieved the learning objectives
-* If the module difficulty was appropriate
-* If we gave you the experience you expected
-
-We gather this information in order to iteratively improve our work. Thank you in advance for filling out [our brief survey](https://redcap.chop.edu/surveys/?s=KHTXCXJJ93&module_name=%22Data+visualizations+in+ggplot2%22)!
diff --git a/data_visualization_in_ggplot2/data_visualization_ggplot2.r b/data_visualization_in_ggplot2/data_visualization_ggplot2.r
deleted file mode 100644
index c46d29076..000000000
--- a/data_visualization_in_ggplot2/data_visualization_ggplot2.r
+++ /dev/null
@@ -1,205 +0,0 @@
-# ---------------
-# Scatterplots
-# ---------------
-
-# the libraries we'll be using
-library(readr)
-library(dplyr)
-library(ggplot2)
-library(tidyr) # provides pivot_longer(), used in the line plots section
-
-breast_cancer_data <- read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00451/dataR2.csv")
-
-# basic scatter plot
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point()
-
-# use color to add information about a continuous variable
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age,
- color = BMI)) +
- geom_point()
-
-# the Classification variable is currently treated as numeric,
-# so convert it to a factor
-breast_cancer_data <- breast_cancer_data %>%
- mutate(Class_factor = factor(Classification,
- levels = c(1,2),
- labels = c("Class 1", "Class 2")))
-
-# use color to add information about a categorical variable
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age,
- color = Class_factor)) +
- geom_point()
-
-# save the colors you want to use as a vector
-# you can specify colors by name (e.g. "blue"),
-# or use HTML codes, as from https://htmlcolorcodes.com/color-picker/
-class_colors <- c(`Class 1` = "#FEB648", `Class 2` = "#3390FF")
-
-# add a layer with scale_color_manual to specify the colors you want to use
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age,
- color = Class_factor)) +
- geom_point() +
- scale_color_manual(values = class_colors)
-
-# add shape as a second signal to distinguish Classification
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age,
- color = Class_factor,
- shape = Class_factor)) +
- geom_point() +
- scale_color_manual(values = class_colors)
-
-# change the theme to theme_bw()
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age,
- color = Class_factor,
- shape = Class_factor)) +
- geom_point() +
- scale_color_manual(values = class_colors) +
- theme_bw()
-
-# manually adjust color for a continuous variable
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age, color = BMI)) +
- geom_point() +
- scale_color_gradient(low = "lightgrey", high ="darkred") +
- theme_bw()
-
-# ---------------
-# Histograms
-# ---------------
-
-# a basic histogram
-ggplot(breast_cancer_data, mapping = aes(x=Glucose)) +
- geom_histogram() +
- theme_bw()
-
-# try fewer bins
-ggplot(breast_cancer_data, mapping = aes(x=Glucose)) +
- geom_histogram(bins=10) +
- theme_bw()
-
-# try more bins
-ggplot(breast_cancer_data, mapping = aes(x=Glucose)) +
- geom_histogram(bins=100) +
- theme_bw()
-
-# use color to show Classification as well
-ggplot(breast_cancer_data, mapping = aes(x=Glucose, fill = Class_factor)) +
- geom_histogram(bins=30) +
- scale_fill_manual(values = class_colors) +
- theme_bw()
-
-# plot as two overlapping histograms, rather than stacked bins
-# use alpha to control transparency
-ggplot(breast_cancer_data, mapping = aes(x=Glucose, fill = Class_factor)) +
- geom_histogram(bins=30, alpha = .5, position = "identity") +
- scale_fill_manual(values = class_colors) +
- theme_bw()
-
-# a histogram of a positively skewed variable
-ggplot(breast_cancer_data, mapping = aes(x=Insulin)) +
- geom_histogram(bins=30) +
- theme_bw()
-
-# transform the x-axis to show more detail at lower values
-ggplot(breast_cancer_data, mapping = aes(x=Insulin)) +
- geom_histogram(bins=30) +
- scale_x_continuous(trans = "log10") +
- theme_bw()
-
-# ---------------
-# Line plots
-# ---------------
-
-# to learn more about the Seatbelts dataset
-?Seatbelts
-
-# make the time series data into a dataframe, so we can use it for plotting
-seatbelt_data <- Seatbelts %>%
- as.data.frame() %>%
- # add a column specifying the date
- mutate(date = seq(from = as.Date("1969-01-01"),
- to = as.Date("1984-12-01"),
- by="month"))
-
-# basic line plot
-ggplot(seatbelt_data, mapping = aes(x = date, y=drivers)) +
- geom_line() +
- theme_bw()
-
-
-# using color and line type to show groups
-# option 1: multiple geom_line layers
-ggplot(seatbelt_data, mapping = aes(x = date)) +
- geom_line(mapping = aes(y=drivers), color = "red", linetype = 1) +
- geom_line(mapping = aes(y=front), color = "blue", linetype = 2) +
- geom_line(mapping = aes(y=rear), color = "darkgreen", linetype = 3) +
- theme_bw()
-
-# option 2: convert data to long format
-seatbelt_data_long <- seatbelt_data %>%
- pivot_longer(cols = c(drivers, front, rear), names_to = "seat", values_to = "deaths")
-ggplot(seatbelt_data_long, mapping = aes(x=date, y=deaths,
- color = seat,
- linetype = seat)) +
- geom_line() +
- theme_bw()
-
-
-# add a reference line
-ggplot(seatbelt_data, mapping = aes(x = date, y=drivers)) +
- geom_line() +
- geom_vline(xintercept = as.Date("1983-01-31"), linetype = 2, color = "red") +
- theme_bw()
-
-# ---------------
-# Trend lines
-# ---------------
-
-# Method 1: geom_smooth
-
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point() +
- geom_smooth() +
- theme_bw()
-
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point() +
- geom_smooth(method = "lm") +
- theme_bw()
-
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age,
- color = Class_factor,
- shape = Class_factor)) +
- geom_point() +
- geom_smooth(method = "lm") +
- scale_color_manual(values = class_colors) +
- theme_bw()
-
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age,
- color = Class_factor,
- fill = Class_factor,
- shape = Class_factor)) +
- geom_point() +
- geom_smooth(method = "lm") +
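- scale_color_manual(values = class_colors) +
- # apply the same custom colors to fill so the ribbons match the lines
- scale_fill_manual(values = class_colors) +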
- theme_bw()
-
-# Method 2: geom_abline
-
-# run a linear model
-model <- lm(Glucose ~ Age, data = breast_cancer_data)
-# print the coefficients estimated from the model, so you can see what they are
-model$coefficients
-
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point() +
- geom_abline(intercept = model$coefficients[1],
- slope = model$coefficients[2]) +
- theme_bw()
-
-
-# Method 3: geom_line
-
-ggplot(breast_cancer_data, mapping = aes(y=Glucose, x=Age)) +
- geom_point() +
- geom_line(mapping = aes(y=model$fitted.values)) +
- theme_bw()
diff --git a/data_visualization_in_ggplot2/environment.yml b/data_visualization_in_ggplot2/environment.yml
deleted file mode 100644
index 3f003bfc8..000000000
--- a/data_visualization_in_ggplot2/environment.yml
+++ /dev/null
@@ -1,9 +0,0 @@
-channels:
- - conda-forge
-dependencies:
- - r-base=3.6
- - r-readr
- - r-ggplot2
- - r-dplyr
- - r-tidyr
- - r-remotes
\ No newline at end of file
diff --git a/data_visualization_in_ggplot2/media/ggplot_hist_1.png b/data_visualization_in_ggplot2/media/ggplot_hist_1.png
deleted file mode 100644
index 432f9bd7b..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_hist_1.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_hist_2.png b/data_visualization_in_ggplot2/media/ggplot_hist_2.png
deleted file mode 100644
index 252fefa9f..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_hist_2.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_hist_3.png b/data_visualization_in_ggplot2/media/ggplot_hist_3.png
deleted file mode 100644
index 98efd28f7..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_hist_3.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_hist_4.png b/data_visualization_in_ggplot2/media/ggplot_hist_4.png
deleted file mode 100644
index 4e69102a4..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_hist_4.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_hist_5.png b/data_visualization_in_ggplot2/media/ggplot_hist_5.png
deleted file mode 100644
index b214d312e..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_hist_5.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_hist_6.png b/data_visualization_in_ggplot2/media/ggplot_hist_6.png
deleted file mode 100644
index 846d80ba3..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_hist_6.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_hist_7.png b/data_visualization_in_ggplot2/media/ggplot_hist_7.png
deleted file mode 100644
index 4325ca2eb..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_hist_7.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_line_1.png b/data_visualization_in_ggplot2/media/ggplot_line_1.png
deleted file mode 100644
index d5cd5c3cb..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_line_1.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_line_2.png b/data_visualization_in_ggplot2/media/ggplot_line_2.png
deleted file mode 100644
index 46fdaa7c6..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_line_2.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_line_3.png b/data_visualization_in_ggplot2/media/ggplot_line_3.png
deleted file mode 100644
index a482fde07..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_line_3.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_line_4.png b/data_visualization_in_ggplot2/media/ggplot_line_4.png
deleted file mode 100644
index bea90e324..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_line_4.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_scatter_1.png b/data_visualization_in_ggplot2/media/ggplot_scatter_1.png
deleted file mode 100644
index 73a33f802..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_scatter_1.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_scatter_2.png b/data_visualization_in_ggplot2/media/ggplot_scatter_2.png
deleted file mode 100644
index cdf87bba6..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_scatter_2.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_scatter_3.png b/data_visualization_in_ggplot2/media/ggplot_scatter_3.png
deleted file mode 100644
index e954e3d65..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_scatter_3.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_scatter_4.png b/data_visualization_in_ggplot2/media/ggplot_scatter_4.png
deleted file mode 100644
index 86392a3ba..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_scatter_4.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_scatter_5.png b/data_visualization_in_ggplot2/media/ggplot_scatter_5.png
deleted file mode 100644
index d48b04f73..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_scatter_5.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_scatter_6.png b/data_visualization_in_ggplot2/media/ggplot_scatter_6.png
deleted file mode 100644
index 4be2cd023..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_scatter_6.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_scatter_7.png b/data_visualization_in_ggplot2/media/ggplot_scatter_7.png
deleted file mode 100644
index 909b5bacc..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_scatter_7.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_trend_1.png b/data_visualization_in_ggplot2/media/ggplot_trend_1.png
deleted file mode 100644
index 3ccd0feb0..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_trend_1.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_trend_2.png b/data_visualization_in_ggplot2/media/ggplot_trend_2.png
deleted file mode 100644
index 1c7ee74ab..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_trend_2.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_trend_3.png b/data_visualization_in_ggplot2/media/ggplot_trend_3.png
deleted file mode 100644
index 5d7d80dbb..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_trend_3.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_trend_4.png b/data_visualization_in_ggplot2/media/ggplot_trend_4.png
deleted file mode 100644
index 70d91ea7c..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_trend_4.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_trend_5.png b/data_visualization_in_ggplot2/media/ggplot_trend_5.png
deleted file mode 100644
index 74d4fbb81..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_trend_5.png and /dev/null differ
diff --git a/data_visualization_in_ggplot2/media/ggplot_trend_6.png b/data_visualization_in_ggplot2/media/ggplot_trend_6.png
deleted file mode 100644
index 038f5daef..000000000
Binary files a/data_visualization_in_ggplot2/media/ggplot_trend_6.png and /dev/null differ
diff --git a/assets/media/chop-icon.png b/git/assets/media/chop-icon.png
similarity index 100%
rename from assets/media/chop-icon.png
rename to git/assets/media/chop-icon.png
diff --git a/git/assets/media/collaborating_images/github-add-collaborators-01.png b/git/assets/media/collaborating_images/github-add-collaborators-01.png
new file mode 100644
index 000000000..529477905
Binary files /dev/null and b/git/assets/media/collaborating_images/github-add-collaborators-01.png differ
diff --git a/git/assets/media/collaborating_images/github-collaboration-02.svg b/git/assets/media/collaborating_images/github-collaboration-02.svg
new file mode 100644
index 000000000..2e79e1750
--- /dev/null
+++ b/git/assets/media/collaborating_images/github-collaboration-02.svg
@@ -0,0 +1,767 @@
+
+
+
+
\ No newline at end of file
diff --git a/git/assets/media/conflicts_images/conflict-01.svg b/git/assets/media/conflicts_images/conflict-01.svg
new file mode 100644
index 000000000..ce24496a2
--- /dev/null
+++ b/git/assets/media/conflicts_images/conflict-01.svg
@@ -0,0 +1,309 @@
+
+
+
+
\ No newline at end of file
diff --git a/assets/media/favicon.ico b/git/assets/media/favicon.ico
similarity index 100%
rename from assets/media/favicon.ico
rename to git/assets/media/favicon.ico
diff --git a/git/assets/media/remotes_step_01_images/git-freshly-made-github-repo-05.svg b/git/assets/media/remotes_step_01_images/git-freshly-made-github-repo-05.svg
new file mode 100644
index 000000000..2d70ffd17
--- /dev/null
+++ b/git/assets/media/remotes_step_01_images/git-freshly-made-github-repo-05.svg
@@ -0,0 +1,315 @@
+
+
+
+
\ No newline at end of file
diff --git a/git/assets/media/remotes_step_01_images/git-staging-area-04.svg b/git/assets/media/remotes_step_01_images/git-staging-area-04.svg
new file mode 100644
index 000000000..c74c2987e
--- /dev/null
+++ b/git/assets/media/remotes_step_01_images/git-staging-area-04.svg
@@ -0,0 +1,93 @@
+
+
+
+
diff --git a/git/assets/media/remotes_step_01_images/github-create-repo-01.png b/git/assets/media/remotes_step_01_images/github-create-repo-01.png
new file mode 100644
index 000000000..6dc6bf219
Binary files /dev/null and b/git/assets/media/remotes_step_01_images/github-create-repo-01.png differ
diff --git a/git/assets/media/remotes_step_01_images/github-create-repo-02.png b/git/assets/media/remotes_step_01_images/github-create-repo-02.png
new file mode 100644
index 000000000..5981881cd
Binary files /dev/null and b/git/assets/media/remotes_step_01_images/github-create-repo-02.png differ
diff --git a/git/assets/media/remotes_step_01_images/github-create-repo-03.png b/git/assets/media/remotes_step_01_images/github-create-repo-03.png
new file mode 100644
index 000000000..ebce87d5e
Binary files /dev/null and b/git/assets/media/remotes_step_01_images/github-create-repo-03.png differ
diff --git a/git/assets/media/remotes_step_02_images/github-change-repo-string-02.png b/git/assets/media/remotes_step_02_images/github-change-repo-string-02.png
new file mode 100644
index 000000000..7ffe1bfda
Binary files /dev/null and b/git/assets/media/remotes_step_02_images/github-change-repo-string-02.png differ
diff --git a/git/assets/media/remotes_step_02_images/github-find-repo-string-01.png b/git/assets/media/remotes_step_02_images/github-find-repo-string-01.png
new file mode 100644
index 000000000..97d339bd7
Binary files /dev/null and b/git/assets/media/remotes_step_02_images/github-find-repo-string-01.png differ
diff --git a/git/assets/media/remotes_step_03_images/github-repo-after-first-push_01.svg b/git/assets/media/remotes_step_03_images/github-repo-after-first-push_01.svg
new file mode 100644
index 000000000..700339541
--- /dev/null
+++ b/git/assets/media/remotes_step_03_images/github-repo-after-first-push_01.svg
@@ -0,0 +1,483 @@
+
+
+
+
\ No newline at end of file
diff --git a/assets/styles.css b/git/assets/styles.css
similarity index 100%
rename from assets/styles.css
rename to git/assets/styles.css
diff --git a/git/collaborating.md b/git/collaborating.md
new file mode 100644
index 000000000..34dcb968c
--- /dev/null
+++ b/git/collaborating.md
@@ -0,0 +1,246 @@
+
+
+# Collaborating
+
+
+
+
+## Overview
+@comment
+
+**Is this module right for me?** @long_description
+
+**Estimated time to completion:** 30-45 minutes
+
+**Pre-requisites**
+
+* A [GitHub](https://github.com/) account
+* At least one GitHub repository associated with your username
+* (Preferred) Completion of git modules 1-6
+
+**Learning Objectives**
+
+@learning_objectives
+
+
+
+
+## Lesson Preparation
+
+- Open a web browser, like Chrome or Firefox
+- Open a command line terminal
+
+## Collaboration
+
+
+**Practicing by yourself**
+
+If you’re working through this lesson on your own, you can carry on by opening a second terminal window. This window will represent your partner, working on another computer. You won’t need to give anyone access on GitHub, because both ‘partners’ are you.
+
+
+
+For the next step, get into pairs. One person will be the “Owner” and the other will be the “Collaborator”. The goal is for the Collaborator to add changes to the Owner’s repository. We will switch roles at the end, so both people will play Owner and Collaborator.
+
+The Owner needs to give the Collaborator access. On GitHub, click the settings button on the right, select Manage access, click Invite a collaborator, and then enter your partner’s username.
+
+
+
+To accept access to the Owner’s repo, the Collaborator needs to go to https://github.com/notifications or check for an email notification. Once there, she can accept access to the Owner’s repo.
+
+Next, the Collaborator needs to download a copy of the Owner’s repository to her machine. This is called “cloning a repo”.
+
+The Collaborator doesn’t want to overwrite her own version of `planets.git`, so she needs to clone the Owner’s repository to a different location from her own repository with the same name.
+
+To clone the Owner’s repo into her Desktop folder, the Collaborator enters:
+```console
+$ git clone git@github.com:vlad/planets.git ~/Desktop/vlad-planets
+```
+Replace ‘vlad’ with the Owner’s username.
+
+If you omit the clone path (`~/Desktop/vlad-planets`) at the end, Git clones into a new `planets` folder inside your current directory, which could be your own planets folder! Make sure to navigate to the Desktop folder first.
+
+
+
+The Collaborator can now make a change in her clone of the Owner’s repository, exactly the same way as we’ve been doing before:
+
+```console
+$ cd ~/Desktop/vlad-planets
+$ nano pluto.txt
+$ cat pluto.txt
+```
+```output
+It is so a planet!
+```
+
+```console
+$ git add pluto.txt
+$ git commit -m "Add notes about Pluto"
+```
+```output
+ 1 file changed, 1 insertion(+)
+ create mode 100644 pluto.txt
+```
+
+Then push the change to the Owner’s repository on GitHub:
+
+```console
+$ git push origin main
+```
+```output
+Enumerating objects: 4, done.
+Counting objects: 4, done.
+Delta compression using up to 4 threads.
+Compressing objects: 100% (2/2), done.
+Writing objects: 100% (3/3), 306 bytes, done.
+Total 3 (delta 0), reused 0 (delta 0)
+To https://github.com/vlad/planets.git
+ 9272da5..29aba7c main -> main
+```
+Note that we didn’t have to create a remote called `origin`: Git uses this name by default when we clone a repository. (This is why `origin` was a sensible choice earlier when we were setting up remotes by hand.)
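+
+As a quick check (a sketch, assuming the clone command above was run with the `vlad` placeholder username), the Collaborator can list the remotes that `git clone` configured in the new copy:
+
+```console
+$ git remote -v
+```
+```output
+origin	git@github.com:vlad/planets.git (fetch)
+origin	git@github.com:vlad/planets.git (push)
+```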
+
+Take a look at the Owner’s repository on GitHub again, and you should be able to see the new commit made by the Collaborator. You may need to refresh your browser to see the new commit.
+
+To download the Collaborator’s changes from GitHub, the Owner now enters:
+```console
+$ git pull origin main
+```
+```output
+remote: Enumerating objects: 4, done.
+remote: Counting objects: 100% (4/4), done.
+remote: Compressing objects: 100% (2/2), done.
+remote: Total 3 (delta 0), reused 3 (delta 0), pack-reused 0
+Unpacking objects: 100% (3/3), done.
+From https://github.com/vlad/planets
+ * branch main -> FETCH_HEAD
+ 9272da5..29aba7c main -> origin/main
+Updating 9272da5..29aba7c
+Fast-forward
+ pluto.txt | 1 +
+ 1 file changed, 1 insertion(+)
+ create mode 100644 pluto.txt
+```
+
+Now the three repositories (Owner’s local, Collaborator’s local, and Owner’s on GitHub) are back in sync.
+
+### Some more notes about remotes
+
+In this episode and the previous one, our local repository has had a single “remote”, called `origin`. A remote is a copy of the repository that is hosted somewhere else, that we can push to and pull from, and there’s no reason that you have to work with only one. For example, on some large projects you might have your own copy in your own GitHub account (you’d probably call this `origin`) and also the main “upstream” project repository (let’s call this `upstream` for the sake of examples). You would pull from `upstream` from time to time to get the latest updates that other people have committed.
+
+Remember that the name you give to a remote only exists locally. It’s an alias that you choose - whether `origin`, or `upstream`, or `fred` - and not something intrinsic to the remote repository.
+
+The `git remote` family of commands is used to set up and alter the remotes associated with a repository. Here are some of the most useful ones:
+
+- `git remote -v` lists all the remotes that are configured (we already used this in the last episode)
+- `git remote add [name] [url]` is used to add a new remote
+- `git remote remove [name]` removes a remote. Note that it doesn’t affect the remote repository at all - it just removes the link to it from the local repo.
+- `git remote set-url [name] [newurl]` changes the URL that is associated with the remote. This is useful if it has moved, e.g. to a different GitHub account, or from GitHub to a different hosting service. Or, if we made a typo when adding it!
+- `git remote rename [oldname] [newname]` changes the local alias by which a remote is known - its name. For example, one could use this to change upstream to fred.
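+
+The commands above can be tried in sequence, as in the sketch below. The `example-org` account and both URLs are hypothetical placeholders rather than real project addresses, so substitute a repository you actually have access to:
+
+```console
+$ git remote add upstream https://github.com/example-org/planets.git
+$ git remote -v
+$ git remote rename upstream fred
+$ git remote set-url fred https://github.com/example-org/planets-moved.git
+$ git remote remove fred
+```
+
+None of these commands contacts or changes the remote repository itself; they only edit the list of aliases stored in the local repository.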
+
+### A Basic Collaborative Workflow
+
+In practice, it is good to be sure that you have an updated version of the repository you are collaborating on, so you should `git pull` before making your changes. The basic collaborative workflow would be:
+
+- update your local repo with `git pull origin main`,
+- make your changes and stage them with `git add`,
+- commit your changes with `git commit -m`, and
+- upload the changes to GitHub with `git push origin main`
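+
+Put together, one pass through this workflow might look like the sketch below. It reuses the `pluto.txt` file and commit message from earlier in this lesson purely as an illustration; your own file names and messages will differ:
+
+```console
+$ git pull origin main
+$ nano pluto.txt
+$ git add pluto.txt
+$ git commit -m "Add notes about Pluto"
+$ git push origin main
+```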
+
+It is better to make many commits with smaller changes rather than one commit with massive changes: small commits are easier to read and review.
+
+
+### Exercise: Switch Roles
+
+Switch roles and repeat the whole process.
+
+[[GitHub GUI]]
+***
+
+Switching roles will allow you to practice both sides of the change process.
+
+***
+
+
+### Quiz: Review changes
+
+The Owner pushed commits to the repository without giving any information to the Collaborator. How can the Collaborator find out what has changed from the command line? And on GitHub?
+
+[[Review changes]]
+***
+
+On the command line, the Collaborator can use `git fetch origin main` to get the remote changes into the local repository, but without merging them. Then by running `git diff main origin/main` the Collaborator will see the changes output in the terminal.
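+
+As a sketch of that sequence, the Collaborator could run the commands below. The middle `git log` step is an optional addition that is not part of the answer above; it lists the commits that exist on `origin/main` but not yet on the local `main`:
+
+```console
+$ git fetch origin main
+$ git log --oneline main..origin/main
+$ git diff main origin/main
+```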
+
+On GitHub, the Collaborator can go to the repository and click on “commits” to view the most recent commits pushed to the repository.
+
+***
+
+### Comment Changes in GitHub
+
+The Collaborator has some questions about one line change made by the Owner and has some suggestions to propose.
+
+With GitHub, it is possible to comment on the diff of a commit. When you hover over the line of code you want to comment on, a blue comment icon appears; clicking it opens a comment window.
+
+The Collaborator posts her comments and suggestions using the GitHub interface.
+
+
+### Quiz: Version History, Backup, and Version Control
+
+Some backup software can keep a history of the versions of your files. It also allows you to recover specific versions. How is this functionality different from version control? What are some of the benefits of using version control, Git, and GitHub?
+
+[[Version History, Backup, and Version Control]]
+***
+
+Version control provides a richer history of who made which changes than backup software. In version control, you are in control of what is committed when. You get to decide what the versions look like. Other people are also able to work within version control and you can all view and work with each other’s changes.
+
+
+***
+
+## Wrap-Up
+- `git clone` copies a remote repository to create a local repository with a remote called `origin` automatically set up.
+
+
+
+## Additional Resources
+
+To learn more about SSH and its setup, refer to the Software Carpentry episode [here](https://swcarpentry.github.io/git-novice/07-github/index.html#3-ssh-background-and-setup).
+
+
+## Feedback
+
+In the beginning, we stated some goals.
+
+**Learning Objectives:**
+
+@learning_objectives
+
+We ask you to fill out a brief (5 minutes or less) survey to let us know:
+
+* If we achieved the learning objectives
+* If the module difficulty was appropriate
+* If we gave you the experience you expected
+
+We gather this information in order to iteratively improve our work. Thank you in advance for filling out [our brief survey](https://redcap.chop.edu/surveys/?s=KHTXCXJJ93&module_name=%22Module+Template%22)!
+
+Remember to change the redcap link so that the module name is correct for this module!
diff --git a/git/conflicts.md b/git/conflicts.md
new file mode 100644
index 000000000..cd6298f8e
--- /dev/null
+++ b/git/conflicts.md
@@ -0,0 +1,485 @@
+
+
+# Conflicts
+
+
+
+
+## Overview
+@comment
+
+**Is this module right for me?** @long_description
+
+**Estimated time to completion:** 15-30 minutes
+
+**Pre-requisites**
+
+* A [GitHub](https://github.com/) account
+* At least one GitHub repository associated with your username
+* (Preferred) Completion of git modules 1-8
+
+**Learning Objectives**
+
+@learning_objectives
+
+
+
+
+## Lesson Preparation
+
+- Open a web browser, like Chrome or Firefox
+- Open a command line terminal
+
+## Conflicts
+
+As soon as people can work in parallel, they’ll likely step on each other’s toes. This will even happen with a single person: if we are working on a piece of software on both our laptop and a server in the lab, we could make different changes to each copy. Version control helps us manage these conflicts by giving us tools to resolve overlapping changes.
+
+To see how we can resolve conflicts, we must first create one. The file `mars.txt` currently looks like this in both partners’ copies of our `planets` repository:
+
+```console
+$ cat mars.txt
+```
+```output
+Cold and dry, but everything is my favorite color
+The two moons may be a problem for Wolfman
+But the Mummy will appreciate the lack of humidity
+```
+
+Let’s add a line to the collaborator’s copy only:
+
+```console
+$ nano mars.txt
+$ cat mars.txt
+```
+```output
+Cold and dry, but everything is my favorite color
+The two moons may be a problem for Wolfman
+But the Mummy will appreciate the lack of humidity
+This line added to Wolfman's copy
+```
+
+and then push the change to GitHub:
+
+```console
+$ git add mars.txt
+$ git commit -m "Add a line in our home copy"
+```
+```output
+[main 5ae9631] Add a line in our home copy
+ 1 file changed, 1 insertion(+)
+```
+
+```console
+$ git push origin main
+```
+```output
+Enumerating objects: 5, done.
+Counting objects: 100% (5/5), done.
+Delta compression using up to 8 threads
+Compressing objects: 100% (3/3), done.
+Writing objects: 100% (3/3), 331 bytes | 331.00 KiB/s, done.
+Total 3 (delta 2), reused 0 (delta 0)
+remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
+To https://github.com/vlad/planets.git
+ 29aba7c..dabb4c8 main -> main
+```
+
+Now let’s have the owner make a different change to their copy _without_ updating from GitHub:
+
+```console
+$ nano mars.txt
+$ cat mars.txt
+```
+```output
+Cold and dry, but everything is my favorite color
+The two moons may be a problem for Wolfman
+But the Mummy will appreciate the lack of humidity
+We added a different line in the other copy
+```
+
+We can commit the change locally:
+
+```console
+$ git add mars.txt
+$ git commit -m "Add a line in my copy"
+```
+```output
+[main 07ebc69] Add a line in my copy
+ 1 file changed, 1 insertion(+)
+```
+
+but Git won’t let us push it to GitHub:
+
+```console
+$ git push origin main
+```
+```output
+To https://github.com/vlad/planets.git
+ ! [rejected] main -> main (fetch first)
+error: failed to push some refs to 'https://github.com/vlad/planets.git'
+hint: Updates were rejected because the remote contains work that you do
+hint: not have locally. This is usually caused by another repository pushing
+hint: to the same ref. You may want to first integrate the remote changes
+hint: (e.g., 'git pull ...') before pushing again.
+hint: See the 'Note about fast-forwards' in 'git push --help' for details.
+```
+
+
+
+Git rejects the push because it detects that the remote repository has new updates that have not been incorporated into the local branch. What we have to do is pull the changes from GitHub, merge them into the copy we’re currently working in, and then push that. Let’s start by pulling:
+
+```console
+$ git pull origin main
+```
+```output
+remote: Enumerating objects: 5, done.
+remote: Counting objects: 100% (5/5), done.
+remote: Compressing objects: 100% (1/1), done.
+remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
+Unpacking objects: 100% (3/3), done.
+From https://github.com/vlad/planets
+ * branch main -> FETCH_HEAD
+ 29aba7c..dabb4c8 main -> origin/main
+Auto-merging mars.txt
+CONFLICT (content): Merge conflict in mars.txt
+Automatic merge failed; fix conflicts and then commit the result.
+```
+
+The `git pull` command updates the local repository to include those changes already included in the remote repository. After the changes from the remote branch have been fetched, Git detects that changes made to the local copy overlap with those made to the remote repository, and therefore refuses to merge the two versions to stop us from trampling on our previous work. The conflict is marked in the affected file:
+
+```console
+$ cat mars.txt
+```
+```output
+Cold and dry, but everything is my favorite color
+The two moons may be a problem for Wolfman
+But the Mummy will appreciate the lack of humidity
+<<<<<<< HEAD
+We added a different line in the other copy
+=======
+This line added to Wolfman's copy
+>>>>>>> dabb4c8c450e8475aee9b14b4383acc99f42af1d
+```
+
+Our change is preceded by `<<<<<<< HEAD`. Git has then inserted `=======` as a separator between the conflicting changes and marked the end of the content downloaded from GitHub with `>>>>>>>`. (The string of letters and digits after that marker identifies the commit we’ve just downloaded.)
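+
+Before editing, it can help to confirm which files are in conflict. The sketch below is an optional aside rather than a required step: `git status` reports unmerged paths, and the `git diff` filter shown lists only the files that still have conflicts. The last command is only for the case where you would rather abandon the merge and return to the state before the pull:
+
+```console
+$ git status
+$ git diff --name-only --diff-filter=U
+$ git merge --abort   # optional: give up on this merge instead of resolving it
+```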
+
+It is now up to us to edit this file to remove these markers and reconcile the changes. We can do anything we want: keep the change made in the local repository, keep the change made in the remote repository, write something new to replace both, or get rid of the change entirely. Let’s replace both so that the file looks like this:
+
+```console
+$ cat mars.txt
+```
+```output
+Cold and dry, but everything is my favorite color
+The two moons may be a problem for Wolfman
+But the Mummy will appreciate the lack of humidity
+We removed the conflict on this line
+```
+
+To finish merging, we add `mars.txt` to the changes being made by the merge and then commit:
+
+
+```console
+$ git add mars.txt
+$ git status
+```
+```output
+On branch main
+All conflicts fixed but you are still merging.
+ (use "git commit" to conclude merge)
+
+Changes to be committed:
+
+ modified: mars.txt
+```
+
+
+```console
+$ git commit -m "Merge changes from GitHub"
+```
+```output
+[main 2abf2b1] Merge changes from GitHub
+```
+
+Now we can push our changes to GitHub:
+
+```console
+$ git push origin main
+```
+```output
+Enumerating objects: 10, done.
+Counting objects: 100% (10/10), done.
+Delta compression using up to 8 threads
+Compressing objects: 100% (6/6), done.
+Writing objects: 100% (6/6), 645 bytes | 645.00 KiB/s, done.
+Total 6 (delta 4), reused 0 (delta 0)
+remote: Resolving deltas: 100% (4/4), completed with 2 local objects.
+To https://github.com/vlad/planets.git
+ dabb4c8..2abf2b1 main -> main
+```
+
+Git keeps track of what we’ve merged with what, so we don’t have to fix things by hand again when the collaborator who made the first change pulls again:
+
+```console
+$ git pull origin main
+```
+```output
+remote: Enumerating objects: 10, done.
+remote: Counting objects: 100% (10/10), done.
+remote: Compressing objects: 100% (2/2), done.
+remote: Total 6 (delta 4), reused 6 (delta 4), pack-reused 0
+Unpacking objects: 100% (6/6), done.
+From https://github.com/vlad/planets
+ * branch main -> FETCH_HEAD
+ dabb4c8..2abf2b1 main -> origin/main
+Updating dabb4c8..2abf2b1
+Fast-forward
+ mars.txt | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+```
+
+We get the merged file:
+
+```console
+$ cat mars.txt
+```
+```output
+Cold and dry, but everything is my favorite color
+The two moons may be a problem for Wolfman
+But the Mummy will appreciate the lack of humidity
+We removed the conflict on this line
+```
+
+We don’t need to merge again because Git knows someone has already done that.
+
+Git’s ability to resolve conflicts is very useful, but conflict resolution costs time and effort, and can introduce errors if conflicts are not resolved correctly. If you find yourself resolving a lot of conflicts in a project, consider these technical approaches to reducing them:
+
+- Pull from upstream more frequently, especially before starting new work
+- Use topic branches to segregate work, merging to main when complete
+- Make smaller, more atomic commits
+- Where logically appropriate, break large files into smaller ones so that it is less likely that two authors will alter the same file simultaneously
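+
+A topic branch workflow could look like the sketch below. The branch name `pluto-notes` is a hypothetical example, and the file and commit message are borrowed from the collaboration lesson for illustration only:
+
+```console
+$ git pull origin main
+$ git checkout -b pluto-notes
+$ nano pluto.txt
+$ git add pluto.txt
+$ git commit -m "Add notes about Pluto"
+$ git checkout main
+$ git pull origin main
+$ git merge pluto-notes
+$ git push origin main
+```
+
+Pulling `main` again just before merging keeps the window in which someone else can change the same lines as short as possible.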
+
+Conflicts can also be minimized with project management strategies:
+
+- Clarify who is responsible for what areas with your collaborators
+- Discuss what order tasks should be carried out in with your collaborators so that tasks expected to change the same lines won’t be worked on simultaneously
+- If the conflicts are stylistic churn (e.g. tabs vs. spaces), establish a governing project convention and, if necessary, use code style tools (e.g. `htmltidy`, `perltidy`, `rubocop`, etc.) to enforce it
+
+
+### Exercise: Solving conflicts that you create
+
+Clone the repository created by your instructor. Add a new file to it, and modify an existing file (your instructor will tell you which one). When asked by your instructor, pull her changes from the repository to create a conflict, then resolve it.
+
+### Conflicts on Non-textual files
+
+#### Code changes
+Git is clever enough to recognize when two people have made changes to the same lines of text files. However, Git is NOT smart enough to merge changes to images or other binary files, nor to detect any logical errors in code. Git does not lint or check code for functionality or logical consistency, even though code files are text files. That still needs to be a manual process!
+
+#### Images
+
+What does Git do when there is a conflict in an image or some other non-textual file that is stored in version control?
+
+Let’s try it. Suppose Dracula takes a picture of the Martian surface and calls it `mars.jpg`.
+
+If you do not have an image file of Mars available, you can create a dummy binary file like this:
+
+```console
+$ head -c 1024 /dev/urandom > mars.jpg
+$ ls -lh mars.jpg
+```
+```output
+-rw-r--r-- 1 vlad 57095 1.0K Mar 8 20:24 mars.jpg
+```
+
+`ls` shows us that this created a 1-kilobyte file. It is full of random bytes read from the special file, `/dev/urandom`.
+
+Now, suppose Dracula adds `mars.jpg` to his repository:
+
+```console
+$ git add mars.jpg
+$ git commit -m "Add picture of Martian surface"
+```
+```output
+[main 8e4115c] Add picture of Martian surface
+ 1 file changed, 0 insertions(+), 0 deletions(-)
+ create mode 100644 mars.jpg
+```
+
+Suppose that Wolfman has added a similar picture in the meantime. His is a picture of the Martian sky, but it is also called `mars.jpg`. When Dracula tries to push, he gets a familiar message:
+
+```console
+$ git push origin main
+```
+```output
+To https://github.com/vlad/planets.git
+ ! [rejected] main -> main (fetch first)
+error: failed to push some refs to 'https://github.com/vlad/planets.git'
+hint: Updates were rejected because the remote contains work that you do
+hint: not have locally. This is usually caused by another repository pushing
+hint: to the same ref. You may want to first integrate the remote changes
+hint: (e.g., 'git pull ...') before pushing again.
+hint: See the 'Note about fast-forwards' in 'git push --help' for details.
+```
+
+We’ve learned that we must pull first and resolve any conflicts:
+
+```console
+$ git pull origin main
+```
+
+When there is a conflict on an image or other binary file, Git prints a message like this:
+
+```output
+remote: Counting objects: 3, done.
+remote: Compressing objects: 100% (3/3), done.
+remote: Total 3 (delta 0), reused 0 (delta 0)
+Unpacking objects: 100% (3/3), done.
+From https://github.com/vlad/planets.git
+ * branch main -> FETCH_HEAD
+ 6a67967..439dc8c main -> origin/main
+warning: Cannot merge binary files: mars.jpg (HEAD vs. 439dc8c08869c342438f6dc4a2b615b05b93c76e)
+Auto-merging mars.jpg
+CONFLICT (add/add): Merge conflict in mars.jpg
+Automatic merge failed; fix conflicts and then commit the result.
+```
+
+The conflict message here is mostly the same as it was for `mars.txt`, but there is one key additional line:
+
+```output
+warning: Cannot merge binary files: mars.jpg (HEAD vs. 439dc8c08869c342438f6dc4a2b615b05b93c76e)
+```
+
+Git cannot automatically insert conflict markers into an image as it does for text files. So, instead of editing the image file, we must check out the version we want to keep. Then we can add and commit this version.
+
+On the key line above, Git has conveniently given us commit identifiers for the two versions of `mars.jpg`. Our version is `HEAD`, and Wolfman’s version is `439dc8c0...`. If we want to use our version, we can use `git checkout`:
+
+```console
+$ git checkout HEAD mars.jpg
+$ git add mars.jpg
+$ git commit -m "Use image of surface instead of sky"
+```
+```output
+[main 21032c3] Use image of surface instead of sky
+```
+If instead we want to use Wolfman’s version, we can use `git checkout` with Wolfman’s commit identifier, `439dc8c0`:
+
+
+```console
+$ git checkout 439dc8c0 mars.jpg
+$ git add mars.jpg
+$ git commit -m "Use image of sky instead of surface"
+```
+```output
+[main da21b34] Use image of sky instead of surface
+```
+
+We can also keep both images. The catch is that we cannot keep them under the same name. But, we can check out each version in succession and rename it, then add the renamed versions. First, check out each image and rename it:
+
+
+```console
+$ git checkout HEAD mars.jpg
+$ git mv mars.jpg mars-surface.jpg
+$ git checkout 439dc8c0 mars.jpg
+$ mv mars.jpg mars-sky.jpg
+```
+
+Then, remove the old mars.jpg and add the two new files:
+
+```console
+$ git rm mars.jpg
+$ git add mars-surface.jpg
+$ git add mars-sky.jpg
+$ git commit -m "Use two images: surface and sky"
+```
+```output
+[main 94ae08c] Use two images: surface and sky
+ 2 files changed, 0 insertions(+), 0 deletions(-)
+ create mode 100644 mars-sky.jpg
+ rename mars.jpg => mars-surface.jpg (100%)
+```
+
+Now both images of Mars are checked into the repository, and `mars.jpg` no longer exists.
+
+### Quiz: A typical work session
+
+You sit down at your computer to work on a shared project that is tracked in a remote Git repository. During your work session, you take the following actions, but not in this order:
+
+- Make changes by appending the number 100 to a text file numbers.txt
+- Update remote repository to match the local repository
+- Celebrate your success with some fancy beverage(s)
+- Update local repository to match the remote repository
+- Stage changes to be committed
+- Commit changes to the local repository
+
+In what order should you perform these actions to minimize the chances of conflicts? Put the actions above in order in the action column of the table below. When you have the order right, see if you can write the corresponding commands in the command column. A few steps are populated to get you started.
+
+| order | action | command |
+|-------|--------|---------|
+| 1 | | |
+| 2 | | `echo 100 >> numbers.txt` |
+| 3 | | |
+| 4 | | |
+| 5 | | |
+| 6 | Celebrate! | AFK |
+
+[[A typical work session]]
+***
+
+***
+
+## Wrap-Up
+- Conflicts occur when two or more people change the same lines of the same file.
+- The version control system does not allow people to overwrite each other’s changes blindly, but highlights conflicts so that they can be resolved.
+- Git is clever enough to recognize when two people have made changes to the same lines of a text file. Git is NOT smart enough to merge changes to images or other binary files, nor to detect logical errors in code.
+
+## Additional Resources
+
+To learn more about SSH and its setup, refer to the Software Carpentry episode [here](https://swcarpentry.github.io/git-novice/07-github/index.html#3-ssh-background-and-setup).
+
+## Feedback
+
+In the beginning, we stated some goals.
+
+**Learning Objectives:**
+
+@learning_objectives
+
+We ask you to fill out a brief (5 minutes or less) survey to let us know:
+
+* If we achieved the learning objectives
+* If the module difficulty was appropriate
+* If we gave you the experience you expected
+
+We gather this information in order to iteratively improve our work. Thank you in advance for filling out [our brief survey](https://redcap.chop.edu/surveys/?s=KHTXCXJJ93&module_name=%22Module+Template%22)!
+
+Remember to change the redcap link so that the module name is correct for this module!
diff --git a/git/interim/git_existing_resources_notes.md b/git/interim/git_existing_resources_notes.md
new file mode 100644
index 000000000..c7239ec0c
--- /dev/null
+++ b/git/interim/git_existing_resources_notes.md
@@ -0,0 +1,44 @@
+
+# Git Notes
+
+
+Notes for JP^2 to gather necessary resources to prepare the git education module.
+
+
+## Existing Resources
+
+* [Git 101](https://education.arcus.chop.edu/git-101/)
+* [Git 102](https://education.arcus.chop.edu/git-102/)
+* [Arcus Cataloging Manual](https://docs.google.com/document/d/1XVSnZSAYWpml4i5SpFtPiIl4tWWwUdkVhDYmbZDVkFc/edit#heading=h.anoqvlic7g4a)
+ * Some sections include git commands within the mms-ide.
+ * This may be of limited help since it has very strict context.
+* [Github published docs](https://docs.github.com/en/get-started)
+* [Joy's slides for CI Fellows](https://docs.google.com/presentation/d/1NPeWZzSEeE3wCynYXLzAN5WRkX63qvKX/edit#slide=id.p19)
+
+
+## Module Prep Notes 20211209
+
+### Gap Analysis
+
+* Are there any major concepts missing from the existing resources? Can we find other materials to cover those gaps or do we need to create materials?
+* Don't be afraid of overlap and reminders
+* Use lens of personas for outline?
+ * individual workers who want to keep all their steps
+ * groups that need to coordinate change
+* 3 modules to start. Each module will be on its own branch in education_modules. Each JP will work on each subset of each module.
+ * Demystifying Git/Github
+ * What is it? Why use it?
+ * Setting up Git
+ * installation, signing up, various companies, BB GH, etc
+ * Good Enough Git
+ * "Doing git badly"
+ * focus on streamlined process for getting off the ground without necessarily using "best practices" yet
+ * Set expectations for gatekeeping and bro culture in Git/Github and bracing yourself for stack overflow
+ * mostly working in main, not branching all the time
+ * not working with PRs
+ * majority solo work
+  * Other concepts/modules to cover or ideas to incorporate
+ * git repos and the file system... what is git DOING?
+ * git concepts: commits and branches -- when code needs to be updated and changed!
+ * other git tools (VS Code, RStudio, Atom)
+    * Aside -- we're opinionated about GH and teach this first....
diff --git a/git/remotes_in_git.md b/git/remotes_in_git.md
new file mode 100644
index 000000000..efe51dfaa
--- /dev/null
+++ b/git/remotes_in_git.md
@@ -0,0 +1,464 @@
+
+
+# Remotes in GitHub
+
+
+
+
+## Overview
+@comment
+
+**Is this module right for me?** @long_description
+
+**Estimated time to completion:** 45 minutes-1 hour
+
+**Pre-requisites**
+
+List any skills and knowledge needed to do this module here. When available, include links to resources, especially other modules we've made (to show learners where this falls within our catalog).
+
+* A [GitHub](https://github.com/) account
+* (Preferred) Completion of git modules 1-6
+
+**Learning Objectives**
+
+@learning_objectives
+
+
+
+
+## Lesson Preparation
+
+Open a web browser, such as Chrome or Firefox, and navigate to github.com.
+
+## Create a remote repository
+
+We will be creating a remote version of the `planets` repository we worked with in the previous lessons. We will then connect our remote version to our local version of the `planets` repository.
+
+Log in to [GitHub](https://github.com), then click on the icon in the top right corner to create a new repository:
+
+
+
+Name your repository “planets” and then click “Create Repository”.
+
+
+Since this repository will be connected to a local repository, it needs to be empty. Leave “Initialize this repository with a README” unchecked, and keep “None” as options for both “Add .gitignore” and “Add a license.” See the “GitHub License and README files” exercise below for a full explanation of why the repository needs to be empty.
+
+
+
+
+
+As soon as the repository is created, GitHub displays a page with a URL and some information on how to configure your local repository:
+
+
+
+This effectively does the following on GitHub’s servers:
+
+```console
+$ mkdir planets
+$ cd planets
+$ git init
+```
+
+If you remember back to the earlier episode where we added and committed our earlier work on mars.txt, we had a diagram of the local repository which looked like this:
+
+
+
+Now that we have two repositories, we need a diagram like this:
+
+
+
+
+Note that our local repository still contains our earlier work on mars.txt, but the remote repository on GitHub appears empty as it doesn’t contain any files yet.
+
+
+### Quiz: GitHub license and README files
+
+In this module, we learned about creating a remote repository on GitHub, but when you initialized your GitHub repo, you didn’t add a README.md or a license file. If you had, what do you think would have happened when you tried to link your local and remote repositories?
+
+[[GitHub license and README files]]
+***
+
+In this case, we’d see a merge conflict due to unrelated histories. When GitHub creates a README.md file, it performs a commit in the remote repository. When you try to pull the remote repository to your local repository, Git detects that they have histories that do not share a common origin and refuses to merge.
+
+```console
+$ git pull origin main
+```
+
+```output
+warning: no common commits
+remote: Enumerating objects: 3, done.
+remote: Counting objects: 100% (3/3), done.
+remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
+Unpacking objects: 100% (3/3), done.
+From https://github.com/vlad/planets
+ * branch main -> FETCH_HEAD
+ * [new branch] main -> origin/main
+fatal: refusing to merge unrelated histories
+```
+
+***
+
+## Connect local to remote repository
+
+Now we connect the two repositories. We do this by making the GitHub repository a remote for the local repository. The home page of the repository on GitHub includes the URL string we need to identify it:
+
+
+
+Click on the ‘SSH’ link to change the protocol from HTTPS to SSH.
+
+
+
+Copy that URL from the browser, go into the local planets repository, and run this command:
+
+
+```console
+$ git remote add origin git@github.com:vlad/planets.git
+```
+
+Make sure to use the URL for your repository rather than Vlad’s: the only difference should be your username instead of `vlad`.
+
+`origin` is a local name used to refer to the remote repository. It could be called anything, but `origin` is a convention that is often used by default in git and GitHub, so it’s helpful to stick with this unless there’s a reason not to.
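+
+To see that `origin` really is just a local nickname, here is a minimal sketch that runs in a throwaway repository. The scratch directory and the alternative name `github` are illustrative assumptions, and nothing is contacted over the network because Git only stores the URL at this point.
+
+```shell
+# Work in a scratch directory so the real planets repository is untouched.
+demo_dir=$(mktemp -d)
+cd "$demo_dir"
+git init -q
+
+# Register the remote under the conventional name, then inspect it.
+git remote add origin git@github.com:vlad/planets.git
+git remote -v
+
+# The name is only a label: rename it and the stored URL stays the same.
+git remote rename origin github
+remote_url=$(git remote get-url github)
+echo "$remote_url"
+```
+
+Renaming changes only how the local repository refers to the remote, which is why sticking with `origin` is a matter of convention rather than necessity.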
+
+We can check that the command has worked by running `git remote -v`:
+
+```console
+$ git remote -v
+```
+
+```output
+origin git@github.com:vlad/planets.git (fetch)
+origin git@github.com:vlad/planets.git (push)
+```
+
+We’ll discuss remotes in more detail in the next episode, while talking about how they might be used for collaboration.
+
+### SSH
+
+Before Dracula can connect to a remote repository, he needs to set up a way for his computer to authenticate with GitHub so it knows it’s him trying to connect to his remote repository. Additionally, we use SSH here because, while it requires some additional configuration, it is a security protocol widely used by many applications.
+
+We are going to set up the method that is commonly used by many different services to authenticate access on the command line. This method is called Secure Shell Protocol (SSH). SSH is a cryptographic network protocol that allows secure communication between computers using an otherwise insecure network.
+
+SSH uses what is called a key pair: two keys that work together to validate access. One key, the public key, can be shared openly; the other, the private key, must be kept secret. Very descriptive names.
+
+You can think of the public key as a padlock, and only you have the key (the private key) to open it. You use the public key where you want a secure method of communication, such as your GitHub account. You give this padlock, or public key, to GitHub and say “lock the communications to my account with this so that only computers that have my private key can unlock communications and send git commands as my GitHub account.”
+
+What we will do now is the minimum required to set up the SSH keys and add the public key to a GitHub account.
+
+
+GitHub stopped accepting account passwords for Git operations on August 13, 2021. If you try to authenticate over HTTPS with your password, you will encounter this error:
+`remote: Support for password authentication was removed on August 13, 2021. Please use a personal access token instead.
+remote: Please see https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/ for more information.
+fatal: Authentication failed for 'https://github.com/vlad/planets.git/'`
+
+
+### SSH Setup
+
+The first thing we are going to do is check whether this has already been done on the computer you’re on. Generally speaking, this setup only needs to happen once, and then you can forget about it.
+
+
+**Keeping your keys secure**
+You shouldn’t really forget about your SSH keys, since they keep your account secure. It’s good practice to audit your secure shell keys every so often, especially if you are using multiple computers to access your account.
+
+
+We will run the list command to check what key pairs already exist on your computer.
+
+```console
+ls -al ~/.ssh
+```
+Your output is going to look a little different depending on whether or not SSH has ever been set up on the computer you are using.
+
+Dracula has not set up SSH on his computer, so his output is
+```output
+ls: cannot access '/c/Users/Vlad Dracula/.ssh': No such file or directory
+```
+
+If SSH has been set up on the computer you’re using, the public and private key pairs will be listed. The file names are either `id_ed25519/id_ed25519.pub` or `id_rsa/id_rsa.pub` depending on how the key pairs were set up.
+
+
+#### Create an SSH key pair
+
+To create an SSH key pair Vlad uses this command, where the -t option specifies which type of algorithm to use and -C attaches a comment to the key (here, Vlad’s email):
+
+```console
+ssh-keygen -t ed25519 -C "vlad@tran.sylvan.ia"
+```
+
+If you are using a legacy system that doesn’t support the Ed25519 algorithm, use: `$ ssh-keygen -t rsa -b 4096 -C "your_email@example.com"`
+
+
+```output
+Generating public/private ed25519 key pair.
+Enter file in which to save the key (/c/Users/Vlad Dracula/.ssh/id_ed25519):
+```
+
+We want to use the default file, so just press `Enter`.
+
+```output
+Created directory '/c/Users/Vlad Dracula/.ssh'.
+Enter passphrase (empty for no passphrase):
+```
+
+Now, it is prompting Dracula for a passphrase. Since he is using his lab’s laptop that other people sometimes have access to, he wants to create a passphrase. Be sure to use something memorable or save your passphrase somewhere, as there is no “reset my password” option.
+```output
+Enter same passphrase again:
+```
+
+After entering the same passphrase a second time, we receive the confirmation
+
+```output
+Your identification has been saved in /c/Users/Vlad Dracula/.ssh/id_ed25519
+Your public key has been saved in /c/Users/Vlad Dracula/.ssh/id_ed25519.pub
+The key fingerprint is:
+SHA256:SMSPIStNyA00KPxuYu94KpZgRAYjgt9g4BA4kFy3g1o vlad@tran.sylvan.ia
+The key's randomart image is:
++--[ED25519 256]--+
+|^B== o. |
+|%*=.*.+ |
+|+=.E =.+ |
+| .=.+.o.. |
+|.... . S |
+|.+ o |
+|+ = |
+|.o.o |
+|oo+. |
++----[SHA256]-----+
+```
+
+The “identification” is actually the private key. You should never share it. The public key is appropriately named. The “key fingerprint” is a shorter version of a public key.
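+
+The relationship between the two keys and the fingerprint can be checked directly. The following is a minimal sketch that assumes OpenSSH's `ssh-keygen` is installed; the scratch directory, the file name `demo_key`, and the email address are made up for illustration, and the key has no passphrase only because it is thrown away afterwards.
+
+```shell
+# Generate a throwaway, passphrase-less key pair in a scratch directory
+# (illustration only; real keys belong in ~/.ssh, ideally with a passphrase).
+key_dir=$(mktemp -d)
+ssh-keygen -q -t ed25519 -N "" -C "demo@example.com" -f "$key_dir/demo_key"
+
+# The public key can be re-derived from the private key at any time.
+# Compare only the first two fields (key type and key data), not the comment.
+derived_pub=$(ssh-keygen -y -f "$key_dir/demo_key" | cut -d ' ' -f 1,2)
+stored_pub=$(cut -d ' ' -f 1,2 "$key_dir/demo_key.pub")
+[ "$derived_pub" = "$stored_pub" ] && echo "public key matches private key"
+
+# The fingerprint is a short hash computed from the public key.
+fingerprint=$(ssh-keygen -l -f "$key_dir/demo_key.pub")
+echo "$fingerprint"
+```
+
+Because the public key can always be regenerated from the private key, the private key is the file that must be protected and backed up.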
+
+Now that we have generated the SSH keys, we will find the SSH files when we check.
+
+```console
+ls -al ~/.ssh
+```
+```output
+drwxr-xr-x 1 Vlad Dracula 197121 0 Jul 16 14:48 ./
+drwxr-xr-x 1 Vlad Dracula 197121 0 Jul 16 14:48 ../
+-rw-r--r-- 1 Vlad Dracula 197121 419 Jul 16 14:48 id_ed25519
+-rw-r--r-- 1 Vlad Dracula 197121 106 Jul 16 14:48 id_ed25519.pub
+```
+
+#### Copy public key to GitHub
+
+Now we have an SSH key pair, and we can run this command to check whether GitHub accepts our authentication.
+
+```console
+ssh -T git@github.com
+```
+```output
+The authenticity of host 'github.com (192.30.255.112)' can't be established.
+RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
+This key is not known by any other names
+Are you sure you want to continue connecting (yes/no/[fingerprint])? y
+Please type 'yes', 'no' or the fingerprint: yes
+Warning: Permanently added 'github.com' (RSA) to the list of known hosts.
+git@github.com: Permission denied (publickey).
+```
+
+Right, we forgot that we need to give GitHub our public key!
+
+First, we need to copy the public key. Be sure to include the `.pub` at the end, otherwise you’re looking at the private key.
+
+```console
+cat ~/.ssh/id_ed25519.pub
+```
+```output
+ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDmRA3d51X0uu9wXek559gfn6UFNF69yZjChyBIU2qKI vlad@tran.sylvan.ia
+```
+
+Now, going to GitHub.com, click on your profile icon in the top right corner to get the drop-down menu. Click “Settings,” then on the settings page, click “SSH and GPG keys,” on the left side “Account settings” menu. Click the “New SSH key” button on the right side. Now, you can add the title (Dracula uses the title “Vlad’s Lab Laptop” so he can remember where the original key pair files are located), paste your SSH key into the field, and click the “Add SSH key” to complete the setup.
+
+Now that we’ve set that up, let’s check our authentication again from the command line.
+
+```console
+$ ssh -T git@github.com
+```
+```output
+Hi Vlad! You've successfully authenticated, but GitHub does not provide shell access.
+```
+
+## Push local changes to a remote
+
+Now that authentication is set up, we can return to the remote. This command will push the changes from our local repository to the repository on GitHub:
+
+```console
+git push origin main
+```
+
+Since Dracula set up a passphrase, it will prompt him for it. If you completed advanced settings for your authentication, it will not prompt for a passphrase.
+
+
+```output
+Enumerating objects: 16, done.
+Counting objects: 100% (16/16), done.
+Delta compression using up to 8 threads.
+Compressing objects: 100% (11/11), done.
+Writing objects: 100% (16/16), 1.45 KiB | 372.00 KiB/s, done.
+Total 16 (delta 2), reused 0 (delta 0)
+remote: Resolving deltas: 100% (2/2), done.
+To https://github.com/vlad/planets.git
+ * [new branch] main -> main
+```
+
+
+
+**Proxy**
+
+If the network you are connected to uses a proxy, there is a chance that your last command failed with “Could not resolve hostname” as the error message. To solve this issue, you need to tell Git about the proxy:
+
+`$ git config --global http.proxy http://user:password@proxy.url`
+`$ git config --global https.proxy https://user:password@proxy.url`
+
+When you connect to another network that doesn’t use a proxy, you will need to tell Git to disable the proxy using:
+
+`$ git config --global --unset http.proxy`
+`$ git config --global --unset https.proxy`
+
+
+
+**Password Managers**
+
+If your operating system has a password manager configured, git push will try to use it when it needs your username and password. For example, this is the default behavior for Git Bash on Windows. If you want to type your username and password at the terminal instead of using a password manager, type:
+
+ ```console
+$ unset SSH_ASKPASS
+```
+
+in the terminal, before you run `git push`. Despite the name, Git uses `SSH_ASKPASS` for all credential entry, so you may want to unset `SSH_ASKPASS` whether you are using Git via SSH or HTTPS.
+
+You may also want to add `unset SSH_ASKPASS` at the end of your `~/.bashrc` to make Git default to using the terminal for usernames and passwords.
+
+
+
+Our local and remote repositories are now in this state:
+
+
+
+
+**the -u flag**
+
+You may see a `-u` option used with `git push` in some documentation. This option is synonymous with the `--set-upstream-to` option for the `git branch` command, and is used to associate the current branch with a remote branch so that the `git pull` command can be used without any arguments. To do this, simply use `git push -u origin main` once the remote has been set up.
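+
+The effect of `-u` can be tried without touching GitHub. This is a minimal sketch in which a local bare repository stands in for the remote; the scratch paths and the commit contents are assumptions made for the demonstration.
+
+```shell
+# Build a scratch "remote" (a bare repository) and a local repository.
+work=$(mktemp -d)
+git init -q --bare "$work/remote.git"
+git init -q "$work/local"
+cd "$work/local"
+git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
+git config user.name "Vlad Dracula"
+git config user.email "vlad@tran.sylvan.ia"
+
+# Make one commit so there is something to push.
+echo "Cold and dry" > mars.txt
+git add mars.txt
+git commit -q -m "Start notes on Mars"
+
+# Push with -u to record origin/main as the upstream of the local main branch.
+git remote add origin "$work/remote.git"
+git push -q -u origin main
+
+# With an upstream recorded, a bare `git pull` knows where to pull from.
+upstream=$(git rev-parse --abbrev-ref 'main@{upstream}')
+echo "$upstream"
+git pull -q
+```
+
+After the upstream is recorded, plain `git push` and `git pull` on this branch use `origin` and `main` without being told.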
+
+
+We can pull changes from the remote repository to the local one as well:
+
+ ```console
+$ git pull origin main
+```
+
+ ```output
+From https://github.com/vlad/planets
+ * branch main -> FETCH_HEAD
+Already up-to-date.
+```
+
+Pulling has no effect in this case because the two repositories are already synchronized. If someone else had pushed some changes to the repository on GitHub, though, this command would download them to our local repository.
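+
+To watch a pull actually bring in someone else's work, the round trip can be simulated on one machine. In this minimal sketch a local bare repository stands in for GitHub, and the two clone directories (`vlad` and `collaborator`) are hypothetical stand-ins for two people's computers.
+
+```shell
+# A bare repository plays the role of GitHub; two clones play two people.
+work=$(mktemp -d)
+git init -q --bare "$work/remote.git"
+git clone -q "$work/remote.git" "$work/vlad" 2>/dev/null
+git clone -q "$work/remote.git" "$work/collaborator" 2>/dev/null
+
+# Vlad commits locally, then pushes the commit to the shared remote.
+cd "$work/vlad"
+git symbolic-ref HEAD refs/heads/main
+git config user.name "Vlad Dracula"
+git config user.email "vlad@tran.sylvan.ia"
+echo "Cold and dry" > mars.txt
+git add mars.txt
+git commit -q -m "Start notes on Mars"
+git push -q origin main
+
+# The collaborator's copy only sees the change after pulling.
+cd "$work/collaborator"
+git symbolic-ref HEAD refs/heads/main
+git pull -q origin main
+pulled_text=$(cat mars.txt)
+echo "$pulled_text"
+```
+
+Until the final `git pull`, the collaborator's directory contains no `mars.txt` at all, even though the commit already exists on the remote.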
+
+### Quiz: Push vs. Commit
+
+In this module, we introduced the “git push” command. How is “git push” different from “git commit”?
+
+[[Push vs. Commit]]
+***
+
+When we push changes, we’re interacting with a remote repository to update it with the changes we’ve made locally (often this corresponds to sharing the changes we’ve made with others). Commit only updates your local repository.
+
+***
+
+
+## GitHub GUI
+
+The GitHub GUI, or Graphical User Interface, in the browser allows you to perform many tasks you are also able to do on the command line.
+
+**Uploading files directly in GitHub browser**
+
+GitHub also allows you to skip the command line and upload files directly to your repository without having to leave the browser. There are two options. First, you can click the “Upload files” button in the toolbar at the top of the file tree. Or, you can drag and drop files from your desktop onto the file tree. You can read more about this on [this GitHub page](https://docs.github.com/en/repositories/working-with-files/managing-files/adding-a-file-to-a-repository).
+
+### Quiz: GitHub GUI
+
+Browse to your `planets` repository on GitHub. Under the Code tab, find and click on the text that says “XX commits” (where “XX” is some number). Hover over, and click on, the three buttons to the right of each commit. What information can you gather/explore from these buttons? How would you get that same information in the shell?
+
+[[GitHub GUI]]
+***
+
+The left-most button (with the picture of a clipboard) copies the full identifier of the commit to the clipboard. In the shell, `git log` will show you the full commit identifier for each commit.
+
+When you click on the middle button, you’ll see all of the changes that were made in that particular commit. Green shaded lines indicate additions and red ones removals. In the shell we can do the same thing with `git diff`. In particular, `git diff ID1..ID2` where ID1 and ID2 are commit identifiers (e.g. `git diff a3bf1e5..041e637`) will show the differences between those two commits.
+
+The right-most button lets you view all of the files in the repository at the time of that commit. To do this in the shell, we’d need to check out the repository at that particular time. We can do this with `git checkout ID`, where ID is the identifier of the commit we want to look at. If we do this, we need to remember to put the repository back to the right state afterwards!
+
+***
+
+
+### Quiz: GitHub Timestamp
+
+Create a remote repository on GitHub. Push the contents of your local repository to the remote. Make changes to your local repository and push these changes. Go to the repo you just created on GitHub and check the timestamps of the files. How does GitHub record times, and why?
+
+[[GitHub Timestamp]]
+***
+
+GitHub displays timestamps in a human-readable relative format (e.g. “22 hours ago” or “three weeks ago”). However, if you hover over the timestamp, you can see the exact time at which the last change to the file occurred.
+
+***
+
+## Wrap-Up
+- A local Git repository can be connected to one or more remote repositories.
+- Use the SSH protocol to connect to remote repositories.
+- `git push` copies changes from a local repository to a remote repository.
+- `git pull` copies changes from a remote repository to a local repository.
+
+
+
+## Additional Resources
+
+To learn more about SSH and its setup, refer to the Software Carpentry episode [here](https://swcarpentry.github.io/git-novice/07-github/index.html#3-ssh-background-and-setup).
+
+
+
+
+
+
+## Feedback
+
+In the beginning, we stated some goals.
+
+**Learning Objectives:**
+
+@learning_objectives
+
+We ask you to fill out a brief (5 minutes or less) survey to let us know:
+
+* If we achieved the learning objectives
+* If the module difficulty was appropriate
+* If we gave you the experience you expected
+
+We gather this information in order to iteratively improve our work. Thank you in advance for filling out [our brief survey](https://redcap.chop.edu/surveys/?s=KHTXCXJJ93&module_name=%22Module+Template%22)!
+
+Remember to change the redcap link so that the module name is correct for this module!
diff --git a/how_to.md b/how_to.md
deleted file mode 100644
index b0601b5a2..000000000
--- a/how_to.md
+++ /dev/null
@@ -1,182 +0,0 @@
-# How to Create an Educational Module
-
-This document describes how to create and submit an educational module for inclusion in the Children's Hospital of Philadelphia / Drexel University "Educational Pathways in Biomedical Data Science" program, financed by NIH and institutional funding by recipient organizations.
-
-Contents of this document include:
-
-* [Choose a Topic](#choose-a-atopic)
-* [Begin Writing](#begin-writing)
-* [Use GitHub](#use-github)
-* [Checklists and Reports](#checklists-and-reports)
-
-## Choose a Topic
-
-It's tempting to jump in and start creating, but a few preliminary plans about how you'd like to proceed may save you a lot of time. Please read this entire document and view some of the existing modules to understand how module materials are organized and what topics are appropriate. To start, here are some priorities we keep in mind when we make a judgement on approving the inclusion of a module:
-
-
-FAIR Principles
-
-The educational modules in this project both promote [FAIR principles](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175/) as described by NIH and attempt to embody these principles. Our materials should be Findable, Accessible, Interoperable, and Reusable, and advocate these overarching priorities in the application of data science to research.
-
-As part of our FAIR methodology, we give our materials away free of charge and free of limits on their use, as long as proper citation is given. You may not include proprietary data or intellectual property in your module, limit your module's use to only your institution, or promote commercial tools or resources that you have financial interest in. In our use of the FAIR principles we prioritize whenever possible the use of and instruction in free, open-source software (FOSS) rather than commercial products. If you would like to create a module teaching the use of a commercial resource, consider whether FOSS products can provide as good or better a research experience, with as good or greater FAIR data outcomes. For example, Microsoft Excel is paid software (perhaps less accessible in lower resourced settings) in which data analytics is not particularly reusable (a script in R or Python can capture all of the steps of an analysis while an Excel spreadsheet cannot). Instead of describing how to do a task in Excel, describe how to do it in a FOSS alternative instead, such as R within the RStudio IDE.
-
-
-
-Inclusive Design
-
-We expect authors to design with a variety of ["edge" users](https://guide.inclusivedesign.ca/activities/inclusive-design-mapping/) in mind: those with limited access to visual content or auditory content, those with barriers related to attention, cognition, sensory processing, or language, and those with limited technology access and/or financial resources. Wherever possible we encourage a multi-modal approach to education, such that no instruction relies solely on a single type of communication (text, video, audio, images, code) but provides several ways to engage with materials. Our [inclusivity guidelines can be found below](#Inclusivity_Guidelines).
-
-
-
-
-Topic, Audience, and Scope
-
-We aim to provide data analytics and closely related skills to researchers at all levels. We hew closely to the [NIH Strategic Plan for Data Science](https://datascience.nih.gov/sites/default/files/NIH_Strategic_Plan_for_Data_Science_Final_508.pdf), and the modules we write should be linked to a particular objective and/or tactic of this Strategic Plan. The topic should be relevant to the conduct of research and should clearly state both what pre-requisite knowledge or skills are necessary prior to engaging with the material, and some contextualization of why the topic matters for researchers.
-
-A module in this project will be brief (one hour or less) and therefore of limited, well-described scope. Consider not only what you teach about, but what you intentionally exclude. It may be useful to map a plan of 2, 3, or more modules that teach related skills and together provide broad coverage, but remember that modules are freestanding. Modules can describe pre-requisite skills, but cannot ask learners to have acquired the necessary skills from other modules. For example, a module covering the use of n-grams in NLP should not include a pre-requisite such as "previous completion of the module Introduction to NLTK in Python", but could include "previous experience with NLTK and the ability to ingest text data in Python", perhaps with a link to a related module for learners who would like to acquire such experience there.
-
-
-
-## Begin Writing
-
-### Name Files and Folders
-
-Make a folder with an expressive name, using lower case and underscores, like `lasso_and_ridge_ml_in_R` or `bayesian_stats_in_python`. This will hold all of the files that are unique to your module.
-
-To start writing the main file that will make up your module, use [a sample module template](a_sample_module_template/a_sample_module_template.md) as your basis. Make a copy of this and save it within your new folder with an almost identical title to the folder, one that ends in `.md`, like `lasso_and_ridge_ml_in_R.md` or `bayesian_stats_in_python.md`. This template includes some boilerplate text to show you how to write in Liascript flavored markdown (it's pretty similar to other markdown flavors, but with a few things added in).
-
-Importantly, starting with your overview text will help you scope your topic. Three to five learning objectives are plenty for a module of about one hour's duration.
-
-### Save Assets Well
-
-Images, videos, and other audio-visual assets that you want to include should be saved in a folder within your main folder, called `media`. Code samples that you want to include should go in a folder named `code`.
-Consider the following directory tree as a sample showing you how your lesson might look:
-
-```
-└── lesson_name
- ├── lesson_name.md
- ├── media
- │ ├── some_video.mp4
- │ └── some_image.png
- └── code
- ├── some_markdown.Rmd
- ├── python_sample.ipynb
- └── to_be_completed.R
-```
-
-## Use GitHub
-
-New module submissions belong on their own branch while they are still in progress. Only once a module has been approved and passed all quality checks may a PR (pull request) be merged to the main branch. Create a new branch that is descriptive, and commit your changes to that branch and publish it to the repository (either the canonical repository, if you are a collaborator, or to your fork of the canonical repository, if you are an offsite potential collaborator). Use descriptive commit messages and leave all changes on your new branch until you are ready to ask for module approval.
-
-When you're ready to request approval, create a Pull Request in GitHub, asking to merge your branch into main. Include a comment describing the module. An administrator of the repository will create a new Issue in the repository, citing your PR. The administrator will apply the [module checklist included below](#module-review-checklist) and may reply with requests for updates or improvements in the form of a [Module Quality Assurance Report](#module-quality-assurance-report) within issue comments. Feel free to conduct a conversation through the comments on the issue, which will endure in time, while your PR will not. Comments on the issue are a better practice for historical records.
-
-Once any outstanding improvements are addressed, the administrator checking your module will approve the PR with a squash and merge and delete your branch. Congratulations, your module is now part of the portfolio of educational modules for the project!
-
-## Checklists and Reports
-
-### Inclusivity Guidelines
-
-This is a working document articulating our goals and standards for creating content that will be valuable and accessible to as wide a range of users as possible. Specifically, we are designing with a variety of ["edge" users](https://guide.inclusivedesign.ca/activities/inclusive-design-mapping/) in mind: those with limited access to visual content or auditory content, those with barriers related to attention, cognition, sensory processing, or language, and those with limited technology access and/or financial resources. A core principle is to allow users as much flexibility as possible to configure their learning experiences to meet their own needs and preferences.
-
-The guidelines presented here are inspired by several other guides including the [Web Content Accessibility Guidelines WCAG21](https://www.w3.org/WAI/WCAG21/quickref) and [The Inclusive Learning Design Handbook](https://handbook.floeproject.org/approachesoverview). Recommendations specific to educational videos can be found [here](https://ctl.wiley.com/how-to-ensure-accessibility-for-educational-videos/).
-
-#### Guidelines
-
-* Maximize opportunities for users to customize their own learning experience
- - Relevant information should be provided for each module upfront to help users decide which ones to do and which to skip: time estimate, expected learning outcomes, etc.
- - Provide content in a variety of forms and styles: screencasts, text, webinars/lectures, practical exercises, etc. Whenever possible, multiple forms/styles should be incorporated in each module so learners have multiple avenues to the content.
-* Maximize opportunities for users to configure pages to meet their own needs/preferences
- - Customization options should be available for the visual appearance: Text size, contrast, colors, etc.
- - Customization options should be available for content/organization support: Optional table of contents
-* Make text alternatives available by default for all visual and audio content
- - Subtitles available for every video with audio.
- - Alt text available for every image.
-* Provide audio description for visuals presented in video
- - For example, in a screencast, instead of just, "And then click here," provide description that could help scaffold someone without visual access like, "And then click on the button that says 'Run' in the top-right corner of the screen". Be sure to make use of text cues when available (e.g. button labels), not just visual signals like color or location.
- - When important content is conveyed in a visual, describe the key elements. For example, "Running this query produces the table below. It displays the first 5 rows by default, and columns for ID, encounter ID, diagnosis, and outcome."
- - When including a data visualization, verbally describe important features, such as both axis labels and visible trends in the data. For example, "Here's a scatterplot showing number of encounters on the y-axis and age on the x-axis. All 183 patients from our sample are represented here, and it looks like a weak positive trend, with older patients being more likely to have had more encounters. There are a few important outliers, though, such as this patient at about 6 months old with more than 20 encounters already."
- - When visual information is repeated with minimal changes, it's fine to indicate that without providing a full description again. For example, "And here's the updated table, filtered to only show patients who have been seen in the last 2 years."
- - When important visual information is too complex to include sufficient audio description (i.e. it would slow the content down so much as to impair its utility), an alternative video file should be provided with audio descriptions included.
-* Use color choices that maximize accessibility
- - Color should never be the sole method for distinguishing visual content
-* Provide clear, consistent organization and content structure to reduce cognitive load for users
- - Use clear, informative headers for each subsection. The resulting table of contents should give a good sense of the flow and structure of the module. Avoid titles and headers that sacrifice clarity for playfulness.
- - Use parallel structure across modules covering similar content.
- - Use consistent formatting (e.g. consistent page headers and footers, consistent style controlled by css file)
-* Reduce language barriers
- - The language of each page is identified in the HTML (e.g. `<html lang="en">`)
- - Unusual words, or words taking on a very specific meaning in context, should be defined for the user, either on the page (e.g. using footnotes) or with links to a definition/glossary
- - Provide pronunciation guides for especially unusual words of particular importance
-* Take proactive steps to be welcoming to a diverse group of potential users
- - Avoid unnecessarily gendered language (e.g. use "they" singular rather than "he or she" for an unknown person)
- - Intentionally represent diversity in our examples and images
- - Strive for diverse voices in the people presenting our content (e.g. webinars), and in the sources we direct users to
-
-#### Testing
-
-https://ux.princeton.edu/sites/ux2020/files/resource-links/testing_for_common_accessibility_errors-final.docx
-
-Our pages should work on a variety of platforms. Check that material renders well on Mac and PC, desktop and mobile, Chromebook, and multiple browsers (Chrome, Safari, Firefox, Internet Explorer). Try in a variety of conditions (public wifi at a coffee shop or library, non-CHOP machines, older computers, etc.) to identify fragile pieces.
-
-### Module Quality Assurance Report
-
-Module Quality Assurance Report
-Date:
-
-Name of Module :
-
-URL :
-
-Current Version of Module (use the commit value):
-
-Checklist Report*:
-
-Technical Issues (e.g. links work, media plays):
-
-General Suggestions for Improvement:
-
-* Apply any checklists / template rules and report on missing / incomplete elements
-
-### Module Review Checklist
-
-This is a template for creating issues to review new modules. To create a review issue for the module, copy the checklist below into a new GitHub issue linked to the PR for the new module.
-
-#### Content
-
-* [ ] Good amount of content, both in terms of the complexity/usefulness of the material covered and the time estimate
-* [ ] Clearly defined learning objectives using strong, descriptive verbs. (See [Bloom's taxonomy](https://cft.vanderbilt.edu/guides-sub-pages/blooms-taxonomy/) for ideas.)
-* [ ] Every learning objective is covered in the module content.
-* [ ] There are no tangents or mission creep in the module content, straying from the learning objectives.
-* [ ] No betrayal of expectations: The module title, description, learning objectives, time estimate, and overview all accurately reflect the content of the module. A learner should be able to make an informed decision about whether or not to complete the module.
-* [ ] Avoids unclear language: unexplained idioms or references, unexplained acronyms, unnecessary technical language.
-* [ ] Unusual words, or words taking on a very specific meaning in context, are always defined for the user, either on the page (e.g. using footnotes) or with links to a definition/glossary. Provides pronunciation guides for especially unusual words of particular importance.
-* [ ] Provides content in a variety of forms and styles: screencasts, text, webinars/lectures, practical exercises, etc. Whenever possible, multiple forms/styles should be incorporated in each module so learners have multiple avenues to the content.
-* [ ] Avoids unnecessarily gendered language (e.g. uses "they" singular rather than "he or she" for an unknown person).
-* [ ] Informative link text (e.g. instead of "To learn more about python, click [here](www.example.com)", say "Read this article to [learn more about python](www.example.com).")
-* [ ] Includes accurately formatted and functional link to feedback form.
-
-#### Organization
-
-* [ ] Clear, informative headers and sensible hierarchical structure (the TOC in the left margin should give a good overview of the content covered)
-* [ ] Adheres to the module template structure
-* [ ] Uses specially formatted highlight boxes consistently and appropriately
-* [ ] Short, digestible pieces --- avoids long paragraphs and breaks long sections up with sub-headers
-
-#### Formative assessment
-
-* [ ] Frequent [formative assessment](https://carpentries.github.io/instructor-training/02-practice-learning/#identifying-and-correcting-misconceptions) in the form of knowledge checks and/or hands-on exercises
-* [ ] Clear explanations available after questions unless the nature of the question itself or answer options makes it unnecessary (e.g. a T/F question may not always require follow-up explanation)
-* [ ] Knowledge check questions and hands-on exercises relate directly to learning objectives
-
-#### Videos and images
-
-* [ ] Screencasts cover a single coherent task so the recording is as short as is feasible. To demonstrate more than one related task, include several short screencasts in succession rather than recording one long screencast.
-* [ ] Subtitles available for every recording with audio.
-* [ ] Alt text available for every image.
-* [ ] Important visuals (in video, image, or gif) are always described in the audio or in accompanying text.
- - For example, in a screencast, instead of just, "And then click here," provide description that could help scaffold someone without visual access like, "And then click on the button that says 'Run' in the top-right corner of the screen". Be sure to make use of text cues when available (e.g. button labels), not just visual signals like color or location.
- - When important content is conveyed in a visual, describe the key elements. For example, "Running this query produces the table below. It displays the first 5 rows by default, and columns for ID, encounter ID, diagnosis, and outcome."
- - When including a data visualization, describe important features, such as both axis labels and visible trends in the data. For example, "Here's a scatterplot showing number of encounters on the y-axis and age on the x-axis. All 183 patients from our sample are represented here, and it looks like a weak positive trend, with older patients being more likely to have had more encounters. There are a few important outliers, though, such as this patient at about 6 months old with more than 20 encounters already."
- - When visual information is repeated with minimal changes, it's fine to indicate that without providing a full description again. For example, "And here's the updated table, filtered to only show patients who have been seen in the last 2 years."
- - When important visual information in a video is too complex to include sufficient audio description (i.e. it would slow the content down so much as to impair its utility), an alternative video file should be provided with audio descriptions included.
-* [ ] Color is never the sole method for distinguishing visual content (including in data visualizations).
diff --git a/media/issue_comment.png b/media/issue_comment.png
deleted file mode 100644
index b7fdc8b97..000000000
Binary files a/media/issue_comment.png and /dev/null differ
diff --git a/media/new_issue.png b/media/new_issue.png
deleted file mode 100644
index 2081cc9ff..000000000
Binary files a/media/new_issue.png and /dev/null differ
diff --git a/media/pr_with_multiple_commits.png b/media/pr_with_multiple_commits.png
deleted file mode 100644
index 9e6a07a55..000000000
Binary files a/media/pr_with_multiple_commits.png and /dev/null differ
diff --git a/media/task_counter.png b/media/task_counter.png
deleted file mode 100644
index 2d4facbc9..000000000
Binary files a/media/task_counter.png and /dev/null differ
diff --git a/quality_assurance.md b/quality_assurance.md
deleted file mode 100644
index b5b7ce70e..000000000
--- a/quality_assurance.md
+++ /dev/null
@@ -1,105 +0,0 @@
-# Quality Assurance for modules
-
-When a module creator is ready to request that their module be included, they will create a Pull Request (PR). This begins the work of quality assurance. As someone who is reviewing the modules created by others, it's important to use a consistent method for evaluating content. This work is exacting and can be tedious. It's probably worthwhile to look at other QA issues that have successfully been closed to get a sense of the level of detail other reviewers provide. To see a sample issue, look at https://github.com/arcus/education_modules/issues/9. To see a completed review, check out https://github.com/arcus/education_modules/issues/11.
-
-## Step 1: Create an Issue
-
-* Click on "Issues" or go to https://github.com/arcus/education_modules/issues.
-* Choose "New Issue" (or if your screen is small, just "New"). This is a green button on the right side:
-
-* Give your issue a good title: "QA" plus the proposed directory name from the PR. For example, if the PR includes a new module with the directory named "reproducibility", the title would be "QA reproducibility".
-* In the "Write" tab area, paste the following (from `# Module Quality Assurance Report for PR #[put in the PR number here] ` to `* [ ] description or quote, line ___ in file ____`). Where square brackets appear, remove the square brackets and their contents and replace with the appropriate values. To see a completed review, check out https://github.com/arcus/education_modules/issues/11.
-
-```
-# Module Quality Assurance Report for PR #[PR number here]
-----
-Date: [yyyy-mm-dd]
-Reviewer: [your name]
-Name of Module: [take from the title of the main markdown in the PR]
-Current Liascript URL: [makes it easy for reviewers and authors to look at content as learners will]
-Current Version of Module (use the latest commit value): [click on the PR and get the clickable short link to the latest commit -- add screenshot here]
-
-# Checklist Reports:
-
-## Structural elements
-
-* [ ] YAML top section filled in with name, email, language, narrator, title, and comment (blurb) filled out appropriately.
-* [ ] YAML top section includes proper link to CSS (currently https://chop-dbhi-arcus-education-website-assets.s3.amazonaws.com/css/modules.css).
-* [ ] YAML top section has a version of at least 1.0.0 (first public version).
-* [ ] Title is the first line and is the only level-1 header in the document.
-* [ ] Overview section immediately follows Title, surrounded in div with class overview, and has filled in sections including an intro blurb, Estimated time to completion, Pre-requisites, Learning objectives, and Contents.
-* [ ] Contents within Overview reflect accurately the sections and the links in the contents section work.
-* [ ] Sections following Overview all have content (no pages with just header and no additional text / media material).
-* [ ] Final section is Feedback.
-
-## Content
-
-* [ ] Good amount of content, both in terms of the complexity/usefulness of the material covered and the time estimate
-* [ ] Clearly defined learning objectives using strong, descriptive verbs. (See [Bloom's taxonomy](https://cft.vanderbilt.edu/guides-sub-pages/blooms-taxonomy/) for ideas.)
-* [ ] Every learning objective is covered in the module content.
-* [ ] There are no tangents or mission creep in the module content, straying from the learning objectives.
-* [ ] No betrayal of expectations: The module title, description, learning objectives, time estimate, and overview all accurately reflect the content of the module. A learner should be able to make an informed decision about whether or not to complete the module.
-* [ ] Avoids unclear language: unexplained idioms or references, unexplained acronyms, unnecessary technical language.
-* [ ] Unusual words, or words taking on a very specific meaning in context, are always defined for the user, either on the page (e.g. using footnotes) or with links to a definition/glossary. Provides pronunciation guides for especially unusual words of particular importance.
-* [ ] Provides content in a variety of forms and styles: screencasts, text, webinars/lectures, practical exercises, etc. Whenever possible, multiple forms/styles should be incorporated in each module so learners have multiple avenues to the content.
-* [ ] Avoids unnecessarily gendered language (e.g. uses "they" singular rather than "he or she" for an unknown person).
-* [ ] Informative link text (e.g. instead of "To learn more about python, click [here](www.example.com)", say "Read this article to [learn more about python](www.example.com).")
-* [ ] Includes accurately formatted and functional link to feedback form.
-* [ ] Spelling and grammar are correct.
-
-## Organization
-
-* [ ] Clear, informative headers and sensible hierarchical structure (the TOC in the left margin should give a good overview of the content covered)
-* [ ] Uses specially formatted highlight boxes consistently and appropriately
-* [ ] Short, digestible pieces --- avoids long paragraphs and breaks long sections up with sub-headers
-
-## Formative assessment
-
-* [ ] Frequent [formative assessment](https://carpentries.github.io/instructor-training/02-practice-learning/#identifying-and-correcting-misconceptions) in the form of knowledge checks and/or hands-on exercises
-* [ ] Clear explanations available after questions unless the nature of the question itself or answer options makes it unnecessary (e.g. a T/F question may not always require follow-up explanation)
-* [ ] Knowledge check questions and hands-on exercises relate directly to learning objectives
-
-## Videos and images
-
-* [ ] Screencasts cover a single coherent task so the recording is as short as is feasible. To demonstrate more than one related task, include several short screencasts in succession rather than recording one long screencast.
-* [ ] Subtitles available for every recording with audio.
-* [ ] Alt text available for every image.
-* [ ] Important visuals (in video, image, or gif) are always described in the audio or in accompanying text.
- - For example, in a screencast, instead of just, "And then click here," provide description that could help scaffold someone without visual access like, "And then click on the button that says 'Run' in the top-right corner of the screen". Be sure to make use of text cues when available (e.g. button labels), not just visual signals like color or location.
- - When important content is conveyed in a visual, describe the key elements. For example, "Running this query produces the table below. It displays the first 5 rows by default, and columns for ID, encounter ID, diagnosis, and outcome."
- - When including a data visualization, describe important features, such as both axis labels and visible trends in the data. For example, "Here's a scatterplot showing number of encounters on the y-axis and age on the x-axis. All 183 patients from our sample are represented here, and it looks like a weak positive trend, with older patients being more likely to have had more encounters. There are a few important outliers, though, such as this patient at about 6 months old with more than 20 encounters already."
- - When visual information is repeated with minimal changes, it's fine to indicate that without providing a full description again. For example, "And here's the updated table, filtered to only show patients who have been seen in the last 2 years."
- - When important visual information in a video is too complex to include sufficient audio description (i.e. it would slow the content down so much as to impair its utility), an alternative video file should be provided with audio descriptions included.
-* [ ] Color is never the sole method for distinguishing visual content (including in data visualizations).
-
-## Branch References to Change prior to PR
-
-List here any internal references (stated or hyperlinked) that work now because they refer to the named branch, but will not work once this is on the main branch and the named branch is deleted.
-
-* [ ] description or quote, line ___ in file ____
-* [ ] description or quote, line ___ in file ____
-* [ ] description or quote, line ___ in file ____
-```
-
-* Click on the "Preview" tab to see if everything is rendering nicely and there are at least two clickable links -- one to the PR (the top line in the issue) and one to the commit version (6th line).
-* Click "Submit new issue".
-
-## Step 2: Go through checklists
-
-Once you have created the issue, work through the checklists and evaluate each item.
-
-If you're convinced a checklist item is complete, you can click the checkbox without editing the text of the issue -- simply click in the checkbox as if it were a checkbox on any web page. Helpfully, you will see the number of "tasks" at the top of the issue reflect what's been marked as complete.
-
-
-
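To make the task counter concrete, here is a minimal sketch that tallies checked and unchecked task-list items in the markdown body of an issue. It only approximates what GitHub displays; the function name `task_progress` and the matching rule are assumptions for illustration, not GitHub's implementation.

```python
import re

# A task-list item: optional indent, a "*" or "-" bullet, then "[ ]" or "[x]".
TASK_ITEM = re.compile(r"^\s*[*-] \[( |x|X)\] ", re.MULTILINE)


def task_progress(issue_body):
    """Return (completed, total) for the task-list items in a markdown string."""
    marks = TASK_ITEM.findall(issue_body)
    completed = sum(1 for mark in marks if mark.lower() == "x")
    return completed, len(marks)
```

Plain bullets without a checkbox, such as the nested examples under "Videos and images", are not counted as tasks in this sketch.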
-If there are problems to resolve before a checklist item is complete, communicate with the author using comments on the issue and @ the author. Be as precise as possible (e.g. what file, what line, what problem are you referring to?). If it's clear that the module has many glaring issues, it's okay to stop work on review, close the PR, and simply ask the author to review the checklist and resubmit. It's not worth a lot of effort to do QA on a module that needs a substantial rework.
-
-
-
-The author may make fixes to their code and commit to the branch. This will simply update their PR with newer commits. This means that you will want to change the version in the top part of the issue (show screenshot of how to edit the main issue) with the commit hash for the now current version. In GitHub, when you look at the PR, the most recent commit is the lowest one down on the page:
-
-
-
-Once you are satisfied with the quality of the module (don't worry, it can always be improved; this is a best effort only, with no perfection expected or implied), the last thing to do before approving the PR is to make sure that any references within the material that depend on the branch name have been updated (see the last section of the issue template above). Post a final comment reminding the author to handle this in a new commit. Check that commit and, if all is well, approve the PR and close the issue.
-
-In general we should not delete issues or comments on issues, because they provide a useful history of the project.
diff --git a/reproducibility/media/git_commit_2x.png b/reproducibility/media/git_commit_2x.png
deleted file mode 100644
index 4b3e6a451..000000000
Binary files a/reproducibility/media/git_commit_2x.png and /dev/null differ
diff --git a/reproducibility/media/horse.png b/reproducibility/media/horse.png
deleted file mode 100644
index f049912bb..000000000
Binary files a/reproducibility/media/horse.png and /dev/null differ
diff --git a/reproducibility/media/r_console.gif b/reproducibility/media/r_console.gif
deleted file mode 100644
index 8be91deec..000000000
Binary files a/reproducibility/media/r_console.gif and /dev/null differ
diff --git a/reproducibility/media/r_console.png b/reproducibility/media/r_console.png
deleted file mode 100644
index 1c8f60045..000000000
Binary files a/reproducibility/media/r_console.png and /dev/null differ
diff --git a/reproducibility/media/rstudio.gif b/reproducibility/media/rstudio.gif
deleted file mode 100644
index 004af4c85..000000000
Binary files a/reproducibility/media/rstudio.gif and /dev/null differ
diff --git a/reproducibility/media/rstudio.png b/reproducibility/media/rstudio.png
deleted file mode 100644
index 8c1693da5..000000000
Binary files a/reproducibility/media/rstudio.png and /dev/null differ
diff --git a/reproducibility/media/ten_thousand_2x.png b/reproducibility/media/ten_thousand_2x.png
deleted file mode 100644
index 79784ed6a..000000000
Binary files a/reproducibility/media/ten_thousand_2x.png and /dev/null differ
diff --git a/reproducibility/reproducibility.md b/reproducibility/reproducibility.md
deleted file mode 100644
index 66aea1c48..000000000
--- a/reproducibility/reproducibility.md
+++ /dev/null
@@ -1,470 +0,0 @@
-
-
-# Reproducibility, Generalizability, and Reuse: How Technology Can Help
-
-
-
-## Overview
-
-This module provides learners with an approachable introduction to the concepts and impact of **research reproducibility**, **generalizability**, and **data reuse**, and how technical approaches can help make these goals more attainable.
-
-### Is this module right for me?
-
-**If you currently conduct research or expect to in the future**, the concepts we talk about here are important to grasp. This material will help you understand much of the current literature and debate around how research should be conducted, and will provide you with a starting point for understanding why some practices (like writing code, even for researchers who have never programmed a computer) are gaining traction in the research field. **If research doesn't form part of your future plans, but you want to *use* research** (for example, as a clinician or public health official), this material will help you form criteria for what research to consider the most rigorous and useful and help you understand why science can seem to vacillate or be self-contradictory.
-
-### Details
-
-**Estimated time to completion**: 1 hour
-
-**Pre-requisites**: It is helpful if learners have conducted research, are familiar with -- by reading or writing -- peer-reviewed literature, and have experience using data and methods developed by other people. There is no need to have any specific scientific or medical domain knowledge or technical background.
-
-**Format**: This module uses text and video and is intended to accompany an in-person or otherwise synchronous presentation. Materials contained here will allow for review after a live session.
-
-**Learning Objectives**: After completion of this module, learners will be able to:
-
-* Explain the importance of conducting research that is **reproducible** (can be re-done by a different, unaffiliated scientist)
-* Argue in support of a data analysis method that helps research be more reproducible
-* Argue in support of a method in the organization and description of documents, datasets, and other files that helps research be more reproducible
-
-
-
-
-Contents
-========
-
-* [Before We Get Started](#Before-We-Get-Started)
-* [Concepts](#Concepts)
-
- * [Reproducibility](#Reproducibility)
- * [Generalizability](#Generalizability)
- * [Reuse](#Reuse)
- * [A Data Management and Sharing Snafu](#A-Data-Management-and-Sharing-Snafu)
-* [Tools for Better Practices](#Tools-for-Better-Practices)
-
- * [Scripts](#Scripts)
- * [Data management and metadata](#Data-management-and-metadata)
- * [Version Control](#Version-Control)
- * [Dependency Management](#Dependency-Management)
-* [Additional Materials](#Additional-Materials)
-
- * [Center for Open Science](#Center-for-Open-Science)
- * [John Oliver](#John-Oliver)
- * [For Excel Users](#For-Excel-Users)
- * [Mentioned in This Module](#Mentioned-in-This-Module)
-* [Feedback](#Feedback)
-
-## Before We Get Started
-
-Non-technical tools that you'll need along the way, in this module and elsewhere, include a tolerance for ambiguity, the willingness to be a beginner, and practiced rebuttals to the self-critic / impostor syndrome.
-
-The tools we're describing in this material are often complex and are used differently by different constituencies. It can be intimidating to learn how to use **git**, for example, from people who work in large software development teams. Users like these have years of experience with established **agile** project management practices. They may refer to **milestones**, each related to one or more **sprints**. They may use short acronyms (**LGTM!**) within **issues**, including linking to **commits**, and have specific naming conventions for each **branch**, along with demands to "always **squash and merge**!". Teams with established norms may make inside jokes or references that make it seem like any mistake or deviation from procedure is a huge problem.
-
-Do ***you*** have to learn all of this? What's centrally important, and what's optional or ancillary? It can be daunting even to simply get started when you run into scenarios like this one.
-
-Jargon abounds in the tech field, and unfortunately, so does **gatekeeping.**
-
-By gatekeeping, we mean that with or without intent, some people with greater technical experience can suggest that people with less experience don't belong or shouldn't participate. This can take a lot of forms, such as:
-
-* Using lots of TLAs (Three Letter Acronyms) without context or explanation.
-* Snarky, unhelpful comments on Stack Overflow when a new user poses a question that doesn't meet the standards of a well-crafted, reproducible example.
-* Condescension which can mask insecurity, with words like "well, clearly you should..." or "oh, just...", which add nothing to the conversation except a display of dominance.
-
-We propose an approach that is more gate **opening** than gate **keeping** and includes positive regard such as delight at sharing knowledge. We're not alone:
-
-
-
-
-
-(Image used under a Creative Commons Attribution-NonCommercial 2.5 License. Original post at https://xkcd.com/1053.)
-
-
-
-As learners, we ask that you build your core competencies and non-technical skills by:
-
-* Asking instructors to repeat ourselves, spell out terms, and expand acronyms.
-* Asking for help with self-education. Ask your peers and instructors for additional resources as well as criteria for evaluating whether a page you found is useful or not.
-* Calling out your instructors if we act as gatekeepers. We were beginners once, too!
-* Pushing back on your inner critic. You belong here.
-* Having a sense of humor around failing code. Your instructors have to Google syntax, too!
-* Becoming aware of the physical and psychological markers of fatigue, frustration, and the need for a break.
-* Recalling your own ability to master difficult material. The fact that something feels staggeringly difficult today doesn't mean it will always be so challenging.
-
-
-
-True or False: If you're a driven, intelligent researcher, you're unlikely to experience failure as you learn skills like writing code.
-
-[( )] TRUE
-[(X)] FALSE
-
-
-Click to see an explanation of the answer.
-
-FALSE! Failure is something that people who work a lot with technology have to become comfortable with. You can even think of the process of writing code as failing a lot until you get things right, then moving on to failing on the next project. Error codes, mistakes, and confusion with new methods can be frustrating, especially if you have a lot of confidence and competence in your current way of working. It's easier said than done, but you might find it helpful to recall that failure is a critically important tool in science, even if it's a tool we don't love to talk about.
-
-
-
-
-## Concepts
-
-The concepts of **reproducibility**, **generalizability**, and **reuse** will frame the problem space that we'll describe in this module. We'll define these terms and give some examples.
-
-These concepts will be illustrated in a (charming if rage-inducing) YouTube video, *A Data Management and Sharing Snafu*.
-
-After exploring these concepts, we'll go over some methods to address these challenges, using technology.
-
-### Reproducibility
-
-You may hear the terms **"reproducibility"** and/or **"replicability"**, depending on context. Jargon varies by field and you may see either or both terms used to refer to similar goals: the ability (1) to precisely redo analyses on original data to check the original findings or (2) to carefully apply the original methods to new data to test findings in a different dataset. Here, we usually follow what is becoming more customary and use the term "reproducible" to refer to both efforts.
-
-The **"reproducibility crisis"** refers to the problem in peer-reviewed research in which studies *cannot* be reproduced or replicated because of insufficient information, or in which studies *fail* to be reproduced or replicated because of preventable problems in the initial research. This is problematic because it means wasted time and money, reduced public trust in science, unverifiable claims, and lost chances for scientific consensus.
-
-
-
-
If you've ever tried to reproduce an analysis or study procedure from just the methods section of a paper, you probably experienced it as something like "drawing a horse" as shown here.
-
-Providing vague methods that can't be easily reproduced can be a product of many factors influencing manuscript authors, such as:
-
-* Preserving word count for other sections
-* Assuming that implicit steps will be understood by others
-* Feeling vulnerable about close scrutiny
-* Not having methods well documented to begin with
-* Obscuring practices that might prompt manuscript rejection or unfavorable review
-
-When describing a multi-step task that others should be able to carry out, explicit step-by-step instructions are key.
-
-
-
-
-
-Courtesy of artist. [Original work](https://oktop.tumblr.com/post/15352780846)
-
-
-
-
-
-Examples of reproducibility problems exist at small and large scale. Importantly, reproducibility affects not just interaction between scientific collaborators (or rivals), but also between "current you" and "you six months ago". Perhaps you have felt the impact of non-reproducible research:
-
-* Experiencing dread at trying to reproduce your own findings a few months after doing it the first time
-* Being stymied by a collaborator's cryptic notes that don't explain how to do a particular analysis step
-* Being unable to perform required computation because of reliance on expensive, deprecated, or proprietary software or hardware
-* Results that don't replicate due to poor statistical practices, such as "p-hacking", "HARKing", convenient outlier selection, or multiple tests without correction
-
-
-
-Think about it: when have you been frustrated by a process or study that had poor reproducibility? Have you ever put **yourself** in a bad situation because you didn't think ahead to how you'd need to replicate your actions?
-
-
-
-
-Who benefits from reproducible research practices? Choose all that apply.
-
-[[X]] The original authors of novel research
-[[X]] Patient populations
-[[X]] Journal editors and peer reviewers
-[[X]] Taxpayers
-[[X]] Authors of meta-analyses
-[[?]] There are multiple correct answers!
-
-
-Click to see an explanation of the answer.
-
-All of these groups benefit!
-
-Researchers who publish novel studies that can be reproduced will benefit from having more of their peers use their methods, data, and statistical approaches, which means more citations and greater influence. Researchers may also be able to get their manuscripts into higher reputation journals than if their research was not reproducible.
-
-Patient populations benefit from evidence based research that is robust and demonstrated in multiple settings by various teams. Reproducible research is a key part of translating findings to clinical care.
-
-Journal editors and peer reviewers benefit when submitters use reproducible methods because they can more quickly assess the quality of the research, test the statistical assumptions, and ensure that work can be tested by future investigation.
-
-Taxpayers benefit when government funded research has the greatest generalizability and highest quality. Reproducible methods allow a single funded study to have a ripple effect that will continue to influence scientific knowledge well into the future.
-
-Authors of meta-analyses benefit from reproducible practices like data and script sharing, because it allows them to check the findings asserted within a manuscript, compare its findings to those of other manuscripts, and discover differences between analyses that may point to the best methods to use in the practice of science in a given area.
-
-
-
-
-### Generalizability
-
-Research is **generalizable** if findings can be applied to a broad population.
-
-Historically, biomedical and social science research projects have struggled with generalizability due to unrepresentative data. For example, the acronym **"WEIRD"** refers to the tendency of psychological studies to rely on subjects (often undergraduate students) who are disproportionately from **W**estern, **E**ducated, **I**ndustrialized, **R**ich, and **D**emocratic societies -- societies which, compared with the global population as a whole, are indeed weird.
-
-
-
-Read more: in 2010 Joseph Henrich and others published a brief in *Nature* coining "WEIRD" to describe skewed participation in psychological studies. The citation below includes a link to this piece in Penn libraries.
-
-
-
-Henrich, Joseph, et al. "Most people are not WEIRD: to understand human psychology, behavioural scientists must stop doing most of their experiments on Westerners." *Nature*, vol. 466, no. 7302, 2010, p. 29. *Gale In Context: Science*, https://link.gale.com/apps/doc/A230766048/SCIC?u=upenn_main&sid=summon&xid=b438bdf6.
-
-
-
-
-
-Until recently, many biomedical studies were conducted on disproportionately male populations and ignored disease presentation, physiology, and pharmacodynamics in women and girls (or even female lab animals). In 1993, the [NIH Revitalization Act](https://www.ncbi.nlm.nih.gov/books/NBK236531/) began requiring NIH-funded clinical research to include women as subjects.
-
-This mandate did not require the same inclusivity in bench research, but NIH encouraged adoption of sex-balanced research outside of human subjects. Two decades after the 1993 legislation, Janine Clayton and Francis Collins wrote a [pointed call to action in *Nature*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5101948/), indicating that bench researchers had not willingly followed best practices and that NIH apparently needed to require the use of female animals and cells:
-
-> There has not been a corresponding revolution in experimental design and analyses in cell and animal research — despite multiple calls to action. Publications often continue to neglect sex-based considerations and analyses in preclinical studies. Reviewers, for the most part, are not attuned to this failure. The over-reliance on male animals and cells in preclinical research obscures key sex differences that could guide clinical studies. And it might be harmful: women experience higher rates of adverse drug reactions than men do. Furthermore, inadequate inclusion of female cells and animals in experiments and inadequate analysis of data by sex may well contribute to the troubling rise of irreproducibility in preclinical biomedical research, which the NIH is now actively working to address.
-
-In early 2016, a policy requiring the consideration of sex as a biological variable (SABV) went into effect, and applications for NIH funding were required to comply with best practices related to sex-inclusive experimental design. Progress in NIH's SABV efforts was [reported in a 2020 article](https://pubmed.ncbi.nlm.nih.gov/31971851/).
-
-
-
-[Listen to Janine Clayton speak about scientific rigor and female animal inclusion (5 minute listen)](https://www.wbur.org/hereandnow/2014/05/20/nih-female-animals). A partial transcript accompanies the audio.
-
-
-
-Human bias doesn't just lead to potentially misleading studies, but to potentially misleading research tools as well. For example, in wearable sensor and computer vision development, engineers using skewed samples failed to realize that the optical sensors and computer vision algorithms they created may perform less well on dark skin. See, for example, [a STAT piece about Fitbits](https://www.statnews.com/2019/07/24/fitbit-accuracy-dark-skin/) and [a New York Times opinion piece about bias in facial analysis](https://www.nytimes.com/2018/06/21/opinion/facial-analysis-technology-bias.html).
-
-The challenge of generalizability is closely linked to reproducibility. For example, a study that demonstrates the effectiveness of exercise to improve functioning in depressed suburban teenagers may not generalize to city-dwelling adults. In order to gain broader generalizability, this promising experiment on a limited population should be reproduced in a broader or different population. If the original study is difficult to reproduce, however, such broader application may prove impossible.
-
-Technological solutions alone cannot correct human problems such as recruitment bias or white overrepresentation in research personnel. Careful use of technology can, however, add to research transparency and reproducibility and promote honest disclosure of challenges to generalizability.
-
-
-
-Can research bias be quantified and disclosed using technology? How can bias be reduced and generalizability improved in your research area?
-
-
-
-
-### Reuse
-
-In addition to reproducibility, another important element of research is the ability to **reuse** assets such as data and methods in related research that may not be a direct replication. Researchers may hypothesize that a computer vision approach used to analyze moles might also be useful in other areas of medicine that need edge detection, such as tumor classification. Longitudinal data that provides rich phenotyping of a cohort of patients with hypermobility syndromes may be useful not just to the original orthopedic research community but also to cardiologists interested in comorbid vascular and ANS conditions. The reuse of research data and methods allows researchers to collaborate in ways that advance cross-domain knowledge and professional interaction. It also honors the time and energy of human subjects, whose data can be leveraged to get as much scientific value as possible.
-
-The reuse of data and other research assets has numerous challenges. You may have experienced problems in this area such as:
-
-* Encountering resistance to sharing data, methods, scripts, or other artifacts
-* Data that is not well described or labeled, or is stored in a "supplemental materials" page without context
-* Overly strict informed consent documents that prevent researchers from reusing their own data or sharing it with colleagues
-
-Survey results from a recent poll conducted by Arcus librarians and archivists (experts in data reusability) appear to indicate that CHOP researchers generally want to make their data reusable, but report (possibly incorrectly) that regulatory, ethical, or technical constraints prevent them from doing so. Planning ahead for data reuse is an important part of grant writing, experimental design, IRB interaction, subject consent, and documentation of data and methods.
-
-
-
-Read more: Want to get a quick overview of some of the privacy practices that regulate responsible data sharing? Check out [a brief article with an overview of privacy practices related to data sharing in research](https://education.arcus.chop.edu/privacy-overview/).
-
-
-
-### A Data Management and Sharing Snafu
-
-This video is an approachable and humorous introduction to the downstream consequences of poor research practices.
-
-
-
-As you listen to the video, try to identify problematic research practices which could have been prevented by more careful use of technology. Which of these mistakes have you encountered personally? Which have you committed?
-
-
-
-!?[A Data Management and Sharing Snafu, which depicts an interaction between two scientists, drawn as cartoon animals](https://www.youtube.com/watch?v=66oNv_DJuPc?cc_load_policy=1)
-
-## Tools for Better Practices
-
-Here we aim to provide a broad overview of how some tools and practices (scripts, data management and metadata, version control, and dependency management) can ameliorate some of the challenges we outlined earlier. Technology alone cannot solve the reproducibility crisis, but tools can support researchers who are trying to apply rigor and clarity to their research efforts.
-
-Areas we won't cover here, but are critical to the consistent production of reproducible science, include researcher bias, research incentivization, publication bias, research culture, mentorship, and more. While we assert that proper use of technology is a **necessary** part of reproducible science, technology alone is not **sufficient**.
-
-
-
-In the "Data Management and Sharing Snafu" on the previous page, we hear some common researcher errors that are hard to prevent with technology, such as a tendency to gloss over questions with a bit of arrogance: "Everything you need to know is in the article...". However, technology would have helped solve some of the problems that Dr. Judy Benign had to deal with. Which of these problems may have a technological fix?
-
-[[ ]] Unwillingness to share data
-[[X]] Lack of clarity about what variable names mean
-[[X]] Not remembering where data is located
-[[X]] Software becoming unavailable
-[[ ]] Mentors relying on postdocs to do most of the work
-[[?]] Hint: We consider three of these to be problems with potential technological fixes!
-
-
-Click to see an explanation of the answer.
-
-While technology alone can't motivate researchers to change some behavior, like being skeptical about data sharing in general or turfing much of the hard work of analysis to junior researchers, technology can help save us from ourselves in other ways.
-
-For example, it's understandable that short variable names may be hard to connect to their full meaning, and the creation of a **data dictionary** might have helped avoid the problem of trying to decode the meaning of "SAM1" and "SAM2".
-
-Researchers are busy, lab churn is a fact of life, and staff like research assistants and postdocs can move on in the middle of a project. That's why consistently applying **data management** best practices such as shared drives, version controlled repositories, or automated backup can be helpful in preventing misplaced files.
-
-The careful listing of **dependencies** like software can help quantify the risk of data becoming unusable, and data management practices can include saving plain text versions of encoded data, so that if a proprietary data format is no longer easily usable, a version of the data exists that can still have some utility.
-
-
-
-
-### Scripts
-
-**Scripts**, in this context, are a series of computer code instructions that handle elements of research such as:
-
-* Ingesting data (for example, accessing a .csv file or downloading the latest data from REDCap)
-* Reshaping and cleaning data (such as removing rows that don't meet given conditions for completeness or correctness, combining data from two or more sources, or creating a new field using two or more existing fields)
-* Reporting statistical characteristics (for example, finding quartiles, median values, or standard deviations)
-* Conducting statistical tests and estimating effect sizes (e.g. ANOVA, two-sample t-tests, Cohen's *d*)
-* Creating models (such as a linear or logistic model or a more complex machine learning algorithm like clustering or random forest classification)
-* Saving interim datasets (e.g. storing a "cleaned" version of data for use in later steps, or creating a deidentified version of data)
-* Creating data visualizations (such as boxplots, Q-Q plots, ROC curves, and many more)
-* Communicating methods and findings in a step-by-step way (e.g. writing a methods section from within the steps of analysis)
-* And more....
-
-Scripts may be written in free, open source tools like R, Python, Julia, and Octave, or, with care, can be extracted from commercial tools (for instance, by using a syntax file). It's important to realize that good scripts are complete and don't rely on human memory of steps that aren't recorded in the script. For example, using a point-and-click solution like Excel to clean data prior to analyzing it using code relies on human memory of what was done in Excel.
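-
To make the idea of a "complete" script concrete, here is a minimal Python sketch that ingests, cleans, summarizes, and saves a small dataset without any manual steps. The column names, the sample values, and the cleaning rules are all hypothetical and chosen only for illustration. A real project would read from a file or a REDCap export rather than from text embedded in the script.

```python
import csv
import io
import statistics

# Hypothetical raw data, embedded here so the example is self-contained.
RAW_CSV = """subject_id,weight_kg,age_years
S01,61.5,34
S02,,41
S03,72.0,29
S04,58.2,-1
"""


def load_rows(text):
    """Ingest: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Clean: keep rows with a recorded weight and a non-negative age."""
    cleaned = []
    for row in rows:
        if row["weight_kg"] == "":
            continue  # incomplete: weight was not recorded
        if int(row["age_years"]) < 0:
            continue  # incorrect: age cannot be negative
        cleaned.append({
            "subject_id": row["subject_id"],
            "weight_kg": float(row["weight_kg"]),
            "age_years": int(row["age_years"]),
        })
    return cleaned


def summarize(rows):
    """Report: simple descriptive statistics for weight."""
    weights = [row["weight_kg"] for row in rows]
    return {"n": len(weights), "median_weight_kg": statistics.median(weights)}


def to_csv(rows):
    """Save an interim dataset: serialize the cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["subject_id", "weight_kg", "age_years"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


cleaned = clean_rows(load_rows(RAW_CSV))
summary = summarize(cleaned)
print(summary)
print(to_csv(cleaned))
```

Because every exclusion rule lives in `clean_rows`, a collaborator (or you, six months from now) can see exactly which rows were dropped and why, and can rerun the whole analysis from the raw data.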
-
-We can contrast scripts with tools that don't record every step explicitly. Excel is one example we've already touched on. You may also have been exposed to SAS, SPSS, and Stata, all of which have a point-and-click element as well as the possibility of scripted analysis. However, many users of these tools depend on un-scripted actions such as cleaning data beforehand in a separate program and the use of point-and-click, menu driven selections. For this reason, we suggest the use of R and Python for most research purposes. These are widely used, well-documented and tested, and have a scientific and medical user base that is friendly for beginners. Additionally, these tools are free and open-source, which allows for greater reproducibility, including in lower-resourced settings. However, learning these tools requires an investment of time and energy that can be difficult for a busy clinician or scientist to justify, especially when one has already developed considerable experience in point-and-click analysis.
-
-It's worth considering the words of an archaeological team that wrote an article about reproducible research for a lay audience in a [2017 *Slate* article](https://slate.com/technology/2017/07/how-to-make-a-study-reproducible.html):
-
->However, while many researchers do this work by pointing and clicking using off-the-shelf software, we tried as much as possible to write scripts in the R programming language.
Pointing and clicking generally leaves no traces of important decisions made during data analysis. Mouse-driven analyses leave the researcher with a final result, but none of the steps to get that result is saved. This makes it difficult to retrace the steps of an analysis, and check the assumptions made by the researcher.
...
It's easy to understand why many researchers prefer point-and-click over writing scripts for their data analysis. Often that's what they were taught as students. It's hard work and time-consuming to learn new analysis tools among the pressures of teaching, applying for grants, doing fieldwork and writing publications. Despite these challenges, there is an accelerating shift away from point-and-click toward scripted analyses in many areas of science.
-
-
-### Data management and metadata
-
-**Data management** includes the organization, annotation, and preservation of data and metadata related to your research or clinical project.
-
-Data management is a critical pain point for many data users. What's the best way to wrangle the data files needed to carry out a project? Should documents be stored in a common drive? Using what kinds of subfolders? How should researchers deal with emails that are sent back and forth between researchers to define a specific cohort? What is the long-term storage strategy for this data? Are there ways to save the data in multiple formats to accommodate unknown future needs? Even a small project organized by a single researcher can be complex, and when a team of several researchers and supporting staff is involved, individual data management practices can collide. A few topics that fall under the category of data management include:
-
-* File naming standards for project files
-* The format in which data is collected, and where it is stored
-* How data is backed up and kept private
-* Where regulatory files such as protocols are kept
-* How processes and procedures are stored and kept up to date
-* Who has access to what assets and when that access expires
-
-Importantly, NIH will require a data management and sharing plan for all grants starting January 2023, and it's worth practicing the skills for developing a robust plan.
-
-**Metadata** is, in its simplest definition, data about data. Some examples of metadata might include:
-
-* Who collected the data?
-* When was the data collected?
-* What units does the data use?
-* What kind of thing is the data measuring?
-* What are the expected values?
-* Are there any codes for specific cases (e.g. missing vs. unwilling to answer vs. does not apply)?
-
-Metadata can be found in many places. Sometimes it's implicit, as when it appears in variable names. The variable "weight_kg", for example, discloses both the measure and the units. Often metadata is found more fully explained in **data dictionaries** or **codebooks**, where variables in a dataset are described more completely. Sometimes metadata can be found almost in passing, mentioned in an abstract or in the methods section of a paper, or in some descriptive text that accompanies a data download or a figure.
-
-Creating useful metadata is a crucial step in reproducible science. It's essential in helping run an efficient project involving multiple people, since helpful metadata can help reduce incorrect data collection, recording, and storage. Metadata can help explain or contextualize some findings (e.g. when the time of day of a blood draw affects lab results). It can also support the use, discovery, and access of data over time.
-
-Metadata can exist at various levels of a project. For example, some metadata is overarching and describes an entire project (e.g. the institution that oversaw the data collection), while other metadata adheres to a specific data field (e.g. the make and model of a medical device that gave a certain measurement).
-
-REDCap is one example of software that explicitly creates a data dictionary that includes information such as the name and description of a variable, the kind of input that can appear there (alphanumeric, numeric, email format, etc.), and whether a field could identify a subject or not.
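-
A data dictionary does not have to come from specialized software. The sketch below shows one possible way to record metadata for a single variable in Python and to use it to check incoming values. The field name, range, and missing-value codes are hypothetical, and this is not REDCap's data dictionary format.

```python
# Hypothetical data dictionary: the field, units, range, and codes are
# illustrative assumptions, not taken from any real study.
DATA_DICTIONARY = {
    "weight_kg": {
        "description": "Body weight at enrollment",
        "units": "kilograms",
        "expected_range": (1.0, 300.0),
        "missing_codes": {-99: "not measured", -98: "declined to answer"},
        "identifying": False,
    },
}


def check_value(field, value):
    """Compare one value with its data dictionary entry and describe the result."""
    entry = DATA_DICTIONARY[field]
    if value in entry["missing_codes"]:
        return "missing: " + entry["missing_codes"][value]
    low, high = entry["expected_range"]
    if low <= value <= high:
        return "ok"
    return f"out of range: expected {low} to {high} {entry['units']}"


for value in (61.5, -99, 6150):
    print(value, "->", check_value("weight_kg", value))
```

Keeping the units, expected values, and missing-value codes in one place means the same dictionary can both document the dataset for readers and catch data entry problems, such as a weight recorded in grams instead of kilograms.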
-
-
-
-Discover: The Arcus program at Children's Hospital of Philadelphia has a team of librarians and archivists who have created materials to help CHOP scientists with data management. Their [Research Data Management Resources](https://www.research.chop.edu/arcus/resources) tend to be very practical and can be used right away to improve data management!
-
-
-
-### Version Control
-
-Version control is the discipline of tracking changes to files in a way that captures important information such as who made the change and why. It also allows for reversion to previous versions of a file.
-
-Many of us use "home grown" version control systems, such as using file names to capture who most recently added comments to a grant proposal or the date of a particular data download. The difficulty with this is that each member of a team or lab may use different file naming protocols, and within a few months, the number of files can proliferate wildly. Collaborators may feel unsure about deleting old versions of files, and data and file hoarding leads to delays and confusion.
-
-Technological solutions for version control have been around for decades, and one system in particular has won the bulk of the market share in version control -- **git**. Git is free, open source version control software.
-
-We won't go into details at this point, but one of the helpful aspects of git is that it allows you to see what changed, when, and by whom, in your files, along with a helpful (we hope) comment describing the change. See below for a humorous interpretation (which isn't that far off the mark).
-
-
-
-
-
-(Image used under a Creative Commons Attribution-NonCommercial 2.5 License. Original post at https://xkcd.com/1296 .)
-
-
-
-Git and GitHub are not the same thing. Git is a free, open-source version control system, and GitHub is a company that provides services to make it easier to use git for software development and related uses. GitHub has a free tier as well as paid services.
-
-Some institutions, [including Children's Hospital of Philadelphia](https://github.research.chop.edu), pay for an enterprise version of GitHub that may be accessible only to institutional users on a secure network, and not available to the general public. For science that can be more broadly shared, many researchers use the publicly available [GitHub.com](https://github.com), which is a website run by GitHub. As an example of another GitHub resource, many git users find that the [GitHub Desktop](https://desktop.github.com/) software is useful for working with git on their local computers.
-
-### Dependency Management
-
-If you've ever created a slide show on one computer only to have it look terrible on another, you know the problems that **dependencies** can cause. Dependencies are requirements such as (in our slide show example) having the same fonts installed, having a default group of settings turned on, having the same version of PowerPoint or other software running, and having access to particular image or sound files. Dependencies that are well documented and understood will help make research more reproducible. Dependencies that are undocumented or unknown will inevitably cause problems. Sometimes it isn't clear whether something is a hard dependency (this value or program *must* be the same as what you used) or just a circumstance (you used a particular version of Python but there's no reason to think that previous or subsequent versions wouldn't work just as well). For this reason, recording both known and possible dependencies is a helpful practice. Common dependencies in research and data analytics include:
-
-* Operating system: does your use of particular software require the use of Microsoft Windows 10 or later?
-* Regional data formatting: does your analysis assume that decimal values use a period, not a comma, to set off the decimal value?
-* Program versions: did you conduct your analysis in R 3.6? Have you tried it in 4.0?
-* Technical data formatting: does your analysis expect a .csv of data with specified columns holding certain measures?
-* Access to reference files: are you aligning to hg38 or to a previous reference genome?
-* Hardware requirements: does your research paradigm require a particular kind of hardware for generating stimuli or recording response?
-
-Dependency management is an approach that makes it easier to determine the precise set of tools and inputs required by your data collection and analysis. Every research effort should document which tools were used and which version of each was employed. This can be as simple as a text file, or could include installation instructions or even a file that includes the exact versions of software used in the original research, in order to create a computer environment that can perform the analysis under the original conditions.
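-
As a minimal illustration of the "simple text file" approach, the Python sketch below collects the interpreter version, the operating system, and the installed versions of a list of packages. The function name `describe_environment` and the package names passed to it are placeholders for this example. Dedicated tools exist for recreating an environment exactly, and this sketch only records what was used.

```python
import platform
from importlib import metadata


def describe_environment(package_names):
    """Return a plain-text record of the Python version, platform, and package versions."""
    lines = [
        f"python=={platform.python_version()}",
        f"platform=={platform.system()} {platform.release()}",
    ]
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Record the gap rather than failing, so the report is always produced.
            lines.append(f"{name}==NOT INSTALLED")
    return "\n".join(lines)


# The package names here are placeholders; list the ones your analysis imports.
report = describe_environment(["pip", "a-package-that-does-not-exist"])
print(report)
```

Saving this output next to your results (for example, in a text file committed alongside the analysis script) gives future readers a starting point for rebuilding the conditions under which the analysis originally ran.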
-
-## Additional Materials
-
-Enjoy some supplemental materials that you might find useful, and feel free to suggest additions!
-
-
-### Center for Open Science
-
-The Center for Open Science is one of the foremost thought leaders in reproducible science. Their materials provide rich reading material and practical support for researchers and explore topics that are beyond the scope of this training, including how incentivization is contributing to ineffective science.
-
->The mission of the Center for Open Science (COS) is to increase openness, integrity, and reproducibility of research. We envision a future scholarly community in which the process, content, and outcomes of research are openly accessible by default. All scholarly content is preserved and connected and transparency is an aspirational good for scholarly services. All stakeholders are included and respected in the research lifecycle and share pursuit of truth as the primary incentive and motivation for scholarship. Achieving the mission requires culture change in the incentives that drive researchers’ behavior, the infrastructure that supports research, and the business models that dominate scholarly communication. This Strategic Plan is the result of collective effort by the COS team, board, and community stakeholders.
-
-COS Website
-
-### John Oliver
-
-A 20-minute long clip from John Oliver's infotainment show "Last Week Tonight" includes off-color language and adult references but is a great introduction to topics including:
-
-* p-hacking
-* non-publication of null findings (the "file drawer" problem)
-* replication studies
-* incentivization problems in research
-* scientific communication in mass media
-* generalizability
-* public support of science
-* industry funding
-
-Among many other quotable moments, Oliver drives home the point of how research methods have important public funding and public policy implications.
-
->"In science, you don't just get to cherry-pick the parts that justify what you were going to do anyway! ... And look, this is dangerous... that is what leads people to think that manmade climate change isn't real or that vaccines cause autism...."
-
-To watch this (intermittently NSFW) segment, [watch it directly in YouTube](https://www.youtube.com/watch?v=0Rnq1NpHdmw).
-
-### For Excel Users
-
-* An educator for the Arcus program at Children's Hospital of Philadelphia [shares some harm reduction techniques for Excel users](https://education.arcus.chop.edu/excel-caveats/)
-* A former user of Excel [shares why he's moved on to using scripted code and gives helpful hints to those still using Excel](https://education.arcus.chop.edu/the-spreadsheet-betrayal/)
-
-### Mentioned in This Module
-
-* [How to Draw a Horse](https://oktop.tumblr.com/post/15352780846)
-* [Henrich, Joseph, et al. "Most people are not WEIRD: to understand human psychology, behavioural scientists must stop doing most of their experiments on Westerners."](https://link.gale.com/apps/doc/A230766048/SCIC?u=upenn_main&sid=summon&xid=b438bdf6)
-* [NIH Revitalization Act](https://www.ncbi.nlm.nih.gov/books/NBK236531/)
-* [NIH to balance sex in cell and animal studies](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5101948/)
-* [Sex as a Biological Variable: A 5-Year Progress Report and Call to Action](https://pubmed.ncbi.nlm.nih.gov/31971851/)
-* [NIH: Scientists Must Include Female Animals In Testing](https://www.wbur.org/hereandnow/2014/05/20/nih-female-animals)
-* [Fitbits and other wearables may not accurately track heart rates in people of color](https://www.statnews.com/2019/07/24/fitbit-accuracy-dark-skin/)
-* [When the Robot Doesn’t See Dark Skin](https://www.nytimes.com/2018/06/21/opinion/facial-analysis-technology-bias.html)
-* [Data Sharing and Privacy: A Very Cursory Overview](https://education.arcus.chop.edu/privacy-overview/)
-* [A Data Management and Sharing Snafu](https://www.youtube.com/watch?v=66oNv_DJuPc?cc_load_policy=1)
-* [Here’s How We Made Our Study Reproducible](https://slate.com/technology/2017/07/how-to-make-a-study-reproducible.html)
-* [Arcus Resources](https://www.research.chop.edu/arcus/resources)
-* [Git Commit](https://xkcd.com/1296)
-* [Enterprise GitHub](https://github.research.chop.edu)
-* [GitHub.com](https://github.com)
-* [GitHub Desktop](https://desktop.github.com/)
-
-## Feedback
-
-In the beginning, we stated some learning objectives:
-
-After completion of this module, learners will be able to:
-
-* Explain the importance of conducting research that is **reproducible** (can be re-done by a different, unaffiliated scientist)
-* Argue in support of a data analysis method that helps research be more reproducible
-* Argue in support of a method in the organization and description of documents, datasets, and other files that helps research be more reproducible
-
-Now that you've completed this module, we ask you to fill out a brief (5 minutes or less) survey to let us know:
-
-* If we achieved the learning objectives
-* If the module difficulty was appropriate
-* If we gave you the experience you expected
-
-We gather this information in order to iteratively improve our work. Thank you in advance for filling out [our brief survey](https://redcap.chop.edu/surveys/?s=KHTXCXJJ93&module_name=%22Reproducibility,+Generalizability,+and+Reuse:+How+Technology+Can+Help%22)!