gepi features

SchSascha edited this page Oct 20, 2016 · 6 revisions

GePi features

GePi is tailored towards accelerating the search for regulatory interactions described in the scientific literature (PubMed and PMC). GePi runs in batch mode: all user entries are processed at once, and results are delivered reasonably fast. Results are presented as a table or as summary plots such as bar or pie charts. All results are downloadable.

Key features:

frontend

  • accept one or two arbitrarily long input lists
    • minimum: Entrez Gene IDs
    • optional: further ID types (Swiss-Prot, Ensembl, ...)
  • two modes
    • one list: find gepis between all given items
    • two lists: find all gepis between each item of list 1 and each item of list 2
  • provide summarized results
    • spreadsheet-like output
      • tentative column headers: Source ID; Source name; Target ID; Target name; Publication ID; Sentence
    • graphical output
      • basic charts (bar, pie), optional: sankey chart for m:n relationships
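
The two input modes and the tentative table layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GePi code: the names `candidate_pairs`, `ResultRow` and `count_by_source` are hypothetical, IDs are treated as plain strings, and the one-list mode is assumed to consider each unordered pair once and to skip self-pairs.

```python
from collections import Counter
from itertools import combinations, product
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class ResultRow(NamedTuple):
    """One table row, following the tentative column headers (hypothetical type)."""
    source_id: str
    source_name: str
    target_id: str
    target_name: str
    publication_id: str
    sentence: str


def candidate_pairs(
    list_a: Iterable[str], list_b: Optional[Iterable[str]] = None
) -> Iterator[Tuple[str, str]]:
    """Yield the ID pairs to look up for the one-list or two-list mode."""
    # Drop duplicate IDs while keeping the user's input order.
    a = list(dict.fromkeys(list_a))
    if list_b is None:
        # One list: every unordered pair of distinct items (assumption).
        yield from combinations(a, 2)
    else:
        # Two lists: each item of list 1 paired with each item of list 2.
        b = list(dict.fromkeys(list_b))
        yield from product(a, b)


def count_by_source(rows: Iterable[ResultRow]) -> Counter:
    """Count result rows per source ID, e.g. as input for a bar chart."""
    return Counter(row.source_id for row in rows)


if __name__ == "__main__":
    print(list(candidate_pairs(["1", "2", "3"])))
    print(list(candidate_pairs(["1", "2"], ["3"])))
```

In practice the backend would more likely send the whole lists to the index in a single query than enumerate pairs one by one; the sketch only makes the semantics of the two modes explicit.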

backend

  • process Entrez Gene IDs, optional: process Swiss-Prot IDs
  • use GeNo to find / tag genes and proteins
  • use BioSem to find / tag gepis

Current status:

Which modules are fully or partially present, which need to be developed anew?

We need:

  1. A database listing all genes with homolog links known by GePi (exists: https://github.com/khituras/gepi/wiki/Gene-database; Sascha)
  2. Event analysis of PubMed (exists: https://github.com/khituras/gepi/wiki/jules-preprocessing-pipelines; Franz)
  3. Event analysis of PMC (exists: https://github.com/khituras/gepi/wiki/jules-preprocessing-pipelines; Franz)
  4. Export of the UIMA analysis results into a database in a way that corresponds to the gene database. We use Elasticsearch. (exists: https://github.com/khituras/gepi/wiki/semedico-app; Franz)
  5. A web application backend that communicates with the Elasticsearch index, i.e. sends queries and retrieves results, and possibly provides more complex functionality (e.g. finding common interaction partners). (we should be able to use major portions of the semedico-core: https://www.coling.uni-jena.de/wiki/index.php/Semedico-core; Erik)
  6. A web application frontend that lets end users easily run batch queries and visualizes the results in a comprehensible way. (the code for this is in this GitHub repository; charts and a connection to the yet-to-be-built backend are still missing; Erik, Sascha)
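
As a rough illustration of item 5, the backend could translate the user's lists into a single query body for the index. The sketch below only builds that body as a plain Python dictionary using the standard `bool` / `filter` / `terms` query syntax; the function name `build_interaction_query` and the field names `source_id` and `target_id` are assumptions, since the actual index mapping is not described here. It also ignores interaction direction, which a real backend would probably need to handle.

```python
import json
from typing import Any, Dict, List, Optional


def build_interaction_query(
    list_a: List[str], list_b: Optional[List[str]] = None, size: int = 100
) -> Dict[str, Any]:
    """Build a query body matching events between the given ID lists.

    Field names are hypothetical placeholders for the real index mapping.
    """
    # One-list mode: look for interactions among the items of the same list.
    targets = list_b if list_b is not None else list_a
    return {
        "size": size,
        "query": {
            "bool": {
                # Filters do not affect scoring; both must match.
                "filter": [
                    {"terms": {"source_id": list(list_a)}},
                    {"terms": {"target_id": list(targets)}},
                ]
            }
        },
    }


if __name__ == "__main__":
    # Print the body that would be sent to the search endpoint.
    print(json.dumps(build_interaction_query(["1", "2"], ["3"]), indent=2))
```

Keeping query construction separate from the client call makes the two modes easy to test without a running index.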

Action items / main responsibilities:

  • Sascha: 1, 6
  • Franz: 2, 3, 4
  • Erik: 5, 6