SCERC2 data analysis

Code required to reproduce [Cheng and Morris et al., eClinicalMedicine 2025].

Folder structure

DEQA_templates = downloaded version of the data extraction and quality assessment (DEQA) table that our team members see on Covidence.
code = all code, including bash, Python and R scripts and notebooks.
data = the .csv files downloaded from Covidence.
- After download, rename the data extraction table so that its file name ends in DE.csv. The unique suffix makes the file easy to find.
- Please AVOID editing these files manually. Read from and write to them only through scripts, to maximise reproducibility. When reading a file into R, remember to set read.csv(check.names = FALSE) so that the column names are not altered.
- /lineage contains the data used for lineage imputation.
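
For team members working in Python rather than R, the two points above (locating the export by its DE.csv suffix and reading it without altering the column names) can be sketched as follows. This is a minimal illustration and not code from this repository: `find_de_file` and `read_rows` are hypothetical helper names, and the `*DE.csv` pattern and the UTF-8 encoding are assumptions about the export.

```python
import csv
from pathlib import Path


def find_de_file(data_dir):
    """Return the single file in data_dir whose name ends in 'DE.csv'.

    Hypothetical helper: assumes the export was renamed as described above.
    """
    matches = sorted(Path(data_dir).glob("*DE.csv"))
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one *DE.csv in {data_dir}, found {len(matches)}"
        )
    return matches[0]


def read_rows(path):
    """Read a CSV into a list of dicts, keeping header text exactly as written."""
    # utf-8-sig drops a byte-order mark if the export has one (an assumption
    # about the export; it is harmless when there is none).
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))
```

Calling `read_rows(find_de_file("data"))` from the repository root would then return one dictionary per row, keyed by the original column names. The file is only read, never modified.
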
output
- Contains figures, plus intermediate and final tables.
- Tables that are intrinsic to the Data Extraction template are in /output/processed_tables. They include:
  - Baseline characteristics (age and sex distribution of the cancer and non-cancer cohorts)
  - Co-morbidities
  - Outcome tables, which include:
    - Primary Cancer Site
    - Metastasis status
    - Treatment type
    - COVID variants
    - COVID vaccination status
- The exception is /data/review_241469_20240110223959_DE_knownDx_peerReviewed.csv, which is generated by data/0.1_Cleaning.Rmd, as that is only a cleaning script.
team_members_automated.xlsx
