https://pm.arcus.chop.edu/browse/EDU-2056
Some ideas for new pathways that would be useful to Arcus researchers. Note that many of the modules and resources listed below would need to be updated or broken down into smaller modules, and those that don't exist yet would need to be created from scratch:
Good enough practices in scientific computing
Just the basics, focused on visual/interactive tools rather than CLI.
- reproducibility
- good enough practices in scientific computing
- ten simple rules for biologists learning to program
- Intro to version control
- git cli vs gui
- literate statistical programming
- demystifying CLI
- Three (3) things to do when starting out in Data Science
Take charge of your files: Version control, data sharing, and publishing code for researchers
Focused on building useful habits to create high-quality code and data that will accelerate your own (and others'!) research. This could be general, as proposed here, or (probably better) we could tailor it to different tools. As long as we're brainstorming, I'd love this to start with a brief quiz like "Which of the following tools do you already use or would like to use, and what's your computer's OS?"
- ten simple rules for taking advantage of git and github
- getting started with git on github
- git setup on unix/windows
- creating a git repository
- Arcus guide New to GitHub
- exploring the history of a git repo
- how to write a good commit message
- how to write a good readme
- version control your writing
- getting more from rmarkdown (probably update this to quarto and/or add a specific example working through writing a manuscript)
- rstudio projects
- using git within rstudio
- conda environments for reproducibility
- getting started with docker for research
- demystifying online data repositories (I'm thinking maybe CHOP Dataverse, Zenodo, maybe GitHub?)
- best practices for publishing your research code (maybe https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011031#pcbi.1011031.ref005)
- ten simple rules for working with other people's code
Machine learning concepts for biomedical researchers
This is all language-agnostic, focused on understanding the important underlying considerations for ML, learning about specific tests and what they're for, etc. Would not include hands-on practice, although I would love for there to be "Applied Machine Learning in R" and "Applied Machine Learning in Python" pathways as well.
- demystifying ML
- bias variance tradeoff
- cross validation
- Logistic Regression
- Logistic Regression, Details Part 1: Coefficients
- Logistic Regression, Details Part 2: Maximum Likelihood
- Logistic Regression, Details Part 3: R-squared and its p-value
- Saturated Models and Deviance Statistics
- Deviance Residuals
- ROC and AUC (and/or ROC and AUC in R)
- assessing ML models (accuracy, precision, recall, etc.)
- prediction with regression models
- prediction with classification models
- hierarchical clustering
- k-means clustering
- k nearest neighbors
- principal components analysis
- lasso regression
- gradient descent
- Stochastic Gradient Descent
- One-Hot, Label, Target and K-Fold Target Encoding
- Decision and Classification Trees
- Decision Trees Part 2: Feature Selection and Missing Data
- Regression Trees
- How to Prune Trees (Cost Complexity Pruning)
- Random Forests Part 1: Building, using and evaluating
- Random Forests Part 2: Missing data and clustering
- AdaBoost
- Support Vector Machines (SVM)
- data exploration for machine learning
- garbage in garbage out / how to do high quality ML work
- demystifying LLM