MSigDF

The Molecular Signatures Database (MSigDB) in a tidy data frame.

This is an updated version of the archived repository by @stephenturner.

Current version: v2026.1.

Installation

# Install devtools if you don't already have it
install.packages("devtools")

# Just get the data
devtools::install_github("toledoem/msigdf")

# Get the data and build the vignette (requires tidyverse, knitr, rmarkdown)
devtools::install_github("toledoem/msigdf", build_vignettes = TRUE)

Example usage

See the package vignette for more examples.

library(dplyr)
library(msigdf)
# vignette("msigdf")
msigdf.human %>%
  filter(category_code == "h") %>%
  head()
# A tibble: 6 x 4
  category_code category_subcode geneset                          symbol 
  <chr>         <chr>            <chr>                            <chr>  
1 h             all              HALLMARK_TNFA_SIGNALING_VIA_NFKB JUNB
2 h             all              HALLMARK_TNFA_SIGNALING_VIA_NFKB CXCL2
3 h             all              HALLMARK_TNFA_SIGNALING_VIA_NFKB ATF3
4 h             all              HALLMARK_TNFA_SIGNALING_VIA_NFKB NFKBIA
5 h             all              HALLMARK_TNFA_SIGNALING_VIA_NFKB TNFAIP3
6 h             all              HALLMARK_TNFA_SIGNALING_VIA_NFKB PTGS2
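
msigdf.human %>%
  filter(geneset == "KEGG_NON_HOMOLOGOUS_END_JOINING") %>%
  group_by(category_subcode) %>%
  top_n(n = 10)

Because there are now both legacy and current KEGG gene sets, the same gene set name appears under two category_subcode values (cp.kegg_legacy and cp):

Selecting by symbol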
# A tibble: 20 × 4
# Groups:   category_subcode [2]
   category_code category_subcode geneset                         symbol
   <chr>         <chr>            <chr>                           <chr> 
 1 c2            cp.kegg_legacy   KEGG_NON_HOMOLOGOUS_END_JOINING LIG4  
 2 c2            cp.kegg_legacy   KEGG_NON_HOMOLOGOUS_END_JOINING MRE11 
 3 c2            cp.kegg_legacy   KEGG_NON_HOMOLOGOUS_END_JOINING NHEJ1 
 4 c2            cp.kegg_legacy   KEGG_NON_HOMOLOGOUS_END_JOINING POLL  
 5 c2            cp.kegg_legacy   KEGG_NON_HOMOLOGOUS_END_JOINING POLM  
 6 c2            cp.kegg_legacy   KEGG_NON_HOMOLOGOUS_END_JOINING PRKDC 
 7 c2            cp.kegg_legacy   KEGG_NON_HOMOLOGOUS_END_JOINING RAD50 
 8 c2            cp.kegg_legacy   KEGG_NON_HOMOLOGOUS_END_JOINING XRCC4 
 9 c2            cp.kegg_legacy   KEGG_NON_HOMOLOGOUS_END_JOINING XRCC5 
10 c2            cp.kegg_legacy   KEGG_NON_HOMOLOGOUS_END_JOINING XRCC6 
11 c2            cp               KEGG_NON_HOMOLOGOUS_END_JOINING LIG4  
12 c2            cp               KEGG_NON_HOMOLOGOUS_END_JOINING MRE11 
13 c2            cp               KEGG_NON_HOMOLOGOUS_END_JOINING NHEJ1 
14 c2            cp               KEGG_NON_HOMOLOGOUS_END_JOINING POLL  
15 c2            cp               KEGG_NON_HOMOLOGOUS_END_JOINING POLM  
16 c2            cp               KEGG_NON_HOMOLOGOUS_END_JOINING PRKDC 
17 c2            cp               KEGG_NON_HOMOLOGOUS_END_JOINING RAD50 
18 c2            cp               KEGG_NON_HOMOLOGOUS_END_JOINING XRCC4 
19 c2            cp               KEGG_NON_HOMOLOGOUS_END_JOINING XRCC5 
20 c2            cp               KEGG_NON_HOMOLOGOUS_END_JOINING XRCC6 
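
Many enrichment tools expect gene sets as a named list rather than a long data frame. The following is a minimal sketch of that conversion in base R. It uses a small hand-made data frame (`toy`, a hypothetical stand-in with the same four columns and a few of the rows shown above) so that it runs without the package installed; with msigdf loaded, `msigdf.human` would take its place.

```r
# Toy stand-in for msigdf.human: same column names, only a few rows.
toy <- data.frame(
  category_code    = c("h", "h", "h", "c2", "c2", "c2", "c2"),
  category_subcode = c("all", "all", "all",
                       "cp.kegg_legacy", "cp.kegg_legacy", "cp", "cp"),
  geneset          = c(rep("HALLMARK_TNFA_SIGNALING_VIA_NFKB", 3),
                       rep("KEGG_NON_HOMOLOGOUS_END_JOINING", 4)),
  symbol           = c("JUNB", "CXCL2", "ATF3",
                       "LIG4", "XRCC4", "LIG4", "XRCC4"),
  stringsAsFactors = FALSE
)

# Keep a single subcode per gene set name, so a set that exists under
# both "cp.kegg_legacy" and "cp" does not contribute its genes twice.
keep <- toy[toy$category_subcode %in% c("all", "cp.kegg_legacy"), ]

# Named list: one character vector of gene symbols per gene set.
genesets <- split(keep$symbol, keep$geneset)

# Number of genes in each set.
lengths(genesets)
```

Filtering on category_subcode before splitting is a design choice for this sketch: splitting on geneset alone would merge the legacy and current versions of a same-named set into one vector with repeated symbols.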

Building

The code for building this data is in data-raw/. To move to a new MSigDB release, update data-raw/data_url.yml with the new version and URLs; the scripts read the top-level version: key from that file.

  1. Download GMT files: bash data-raw/get_gmt.sh

  2. Build the data frames and save to data/: Rscript data-raw/msigdf.R

The build script reads the version from data-raw/data_url.yml, so you no longer need to edit version strings inside the R code.
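
For orientation, GMT files are tab-delimited with one gene set per line: the set name, a description field, then the gene symbols. Below is a minimal sketch of turning such a file into a long data frame in base R. It is not the code in data-raw/msigdf.R; the helper name `read_gmt_long` and the sample file contents are hypothetical and used for illustration only.

```r
# Hypothetical helper: read a GMT file into a long (tidy) data frame.
# GMT layout per line: set name <TAB> description <TAB> gene1 <TAB> gene2 ...
read_gmt_long <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  set_names <- vapply(fields, function(x) x[1], character(1))
  # Drop the name and description fields, keeping only the gene symbols.
  genes <- lapply(fields, function(x) x[-(1:2)])
  data.frame(
    geneset = rep(set_names, lengths(genes)),
    symbol  = unlist(genes),
    stringsAsFactors = FALSE
  )
}

# Small made-up GMT file, for demonstration only.
gmt <- tempfile(fileext = ".gmt")
writeLines(c(
  "SET_A\tdescription_a\tJUNB\tATF3",
  "SET_B\tdescription_b\tLIG4\tXRCC4\tXRCC5"
), gmt)

# One row per (gene set, gene symbol) pair.
long <- read_gmt_long(gmt)
long
```

The real build additionally attaches category_code and category_subcode columns, which this sketch leaves out.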

See the package vignette for more.

License

MSigDF by US is marked CC0 1.0
