Omnipath processor #1158
Merged
100 commits
| Commit | Message | Author |
|---|---|---|
| c3eaa06 | Initial commit of Omnipath PTM query | johnbachman |
| c4bb529 | Add func to hgnc_client to get db_refs from hgnc name | johnbachman |
| 2dc97c7 | Get mod statements from omnipath | johnbachman |
| f1ea514 | Add docs to funcs | johnbachman |
| 91395c3 | Build statements for other mod types | johnbachman |
| a21bb52 | Move omnipath_client to sources | johnbachman |
| fdd3897 | Add function to get receptor-ligand interactions | kkaris |
| e2492f2 | WIP Add prototype generator of rec-lig statements | kkaris |
| 3935b28 | Formatting | kkaris |
| f248842 | Better docstring | kkaris |
| 39cf353 | Get agents contained in op complex string | kkaris |
| 8298e4c | Handle more cases of up_id_strings | kkaris |
| e83667f | Add helper for making text refs | kkaris |
| 13c7e56 | Use PyPath instead of web client | kkaris |
| 65a7263 | Manually rebase api.py | kkaris |
| 705046d | Manually rebase omnipath_processor | kkaris |
| 59aba06 | Re-rename omnipath_client -> processor | kkaris |
| 4a34e24 | Update init | kkaris |
| b2ef37a | Update with new classes | kkaris |
| d60b554 | Clean up imports, global scope variables | kkaris |
| 7aae009 | Fix docstring, add function to delete cache | kkaris |
| 29077a8 | Improve hidden method description | kkaris |
| 2c328ae | Add more annotations | kkaris |
| 80891ae | Only add cellphonedb info for cellphonedb stmts | kkaris |
| 21203f0 | Comments, info, docstrings | kkaris |
| 1ff1a54 | Add check if pypath can be loaded | kkaris |
| 5c74b2e | Inform user PyPath is not available | kkaris |
| d141209 | Bubble up the api functions to __init__ | kkaris |
| ecd9d9b | Try import in processor | kkaris |
| 6b8062b | Bubble up processors in __init__ | kkaris |
| e0d844e | Add pypath to extras_require in setup.py | kkaris |
| 16057be | Add omnipath to packages, format list | kkaris |
| c809c7d | Make warning more informative | kkaris |
| a8601d8 | Update imports, test general web api | kkaris |
| b760c4a | Add mods test | kkaris |
| 05cc18f | Test import PyPath | kkaris |
| e11046f | Start test for ligand-receptor processor | kkaris |
| 5cd47f9 | Update docstring, fix args bug | kkaris |
| 0eccf07 | Add more tests for ligand-receptor | kkaris |
| a9628a7 | Fix bug testing web api | kkaris |
| e77103f | Update travis with PyPath dependencies | kkaris |
| a60e287 | Sudo all setup commands for igraph | kkaris |
| 3f5790e | Add dependency url for PyPath | kkaris |
| ba2daef | Try listing pypath as installable from github | kkaris |
| 502dabf | Try explicit pip-install for pypath | kkaris |
| 0f9e435 | Add dependency for pycurl | kkaris |
| fea1e55 | Add more import tests for pypath | kkaris |
| 67a5455 | Fix libigraph.so.0 linking issue | kkaris |
| cea2b5d | Try hpmr as test instead of cellphonedb | kkaris |
| 054237f | Mark PyPyPath test as no-travis | kkaris |
| 79d252f | Catch import errors in pypath network test | kkaris |
| 443f405 | Test stmts were produced | kkaris |
| 41c753c | Revert to one class, move preprocessing to api.py | kkaris |
| d41f40b | Comments, docstrings | kkaris |
| 4db186f | Fix bug in _delete_cache | kkaris |
| 7a4483e | Update tests | kkaris |
| b72c728 | Update __init__ | kkaris |
| dc70526 | Add comments | kkaris |
| b8c3787 | Add helper getting interactions | kkaris |
| 0dd266c | WIP First pass at new json format | kkaris |
| 9de7699 | Remove pypath dependencies and functions | kkaris |
| 1e98867 | Put info in annotations | kkaris |
| a56ed97 | Build single stmt with multiple evidences | kkaris |
| 1605d50 | Update exposed functions | kkaris |
| 600ab58 | Count skips instead of logging | kkaris |
| 0220f79 | Update fields, remove genesymbols | kkaris |
| 84e3d55 | Replace ; with , in docstring | kkaris |
| 2b092a8 | Add description to docstring | kkaris |
| d05e464 | Skip of protmapper is only source | kkaris |
| 13aa147 | Remove unused code | kkaris |
| d218b02 | Remove unused fields, update docstring | kkaris |
| cfa7666 | Add test for ligrec from web | kkaris |
| 0d30be8 | Update ptm processing | kkaris |
| 3e9089d | Set source_db as source_sub_id in annotations | kkaris |
| 080f47b | Remove unused imports in omnipath processor | kkaris |
| 4821636 | Remove pypath and dependencies from setup.py | kkaris |
| a6d80a3 | Remove docstring overflow | kkaris |
| 5c58d36 | Rename helper method | kkaris |
| 409c371 | Log count of bad pmids (len>8) | kkaris |
| 5e150ce | Add bound condition regulations | kkaris |
| 0ceca9e | Add docstring for process_from_web | kkaris |
| 85528bb | Add omnipath to docs | kkaris |
| b56702c | Remove pypath dependencies in travis config | kkaris |
| 8c3d2ef | Add sources/omnipath/index.rst | kkaris |
| b08c3a5 | Reverse removed newline | kkaris |
| 0a58335 | Fix grammar | kkaris |
| 4b10355 | Write short blurb about module and its usage | kkaris |
| 5dacf51 | Standardize logger name | kkaris |
| 61f9260 | Directly use name standardization | kkaris |
| 59266c7 | Do not initialize the processing from init | kkaris |
| 6464daa | Initialize agent name with up_id as safeguard | kkaris |
| 8ece227 | Switch to ontology standardizing in tests | kkaris |
| 45fa569 | Add omnipath to list of sources in README | kkaris |
| 8f9713c | Add skips for already known sources | kkaris |
| 82a13df | Remove logger warning | kkaris |
| 74acab5 | Add omnipath to belief json | bgyori |
| 943b66d | Add omnipath to HTML assembler | bgyori |
| 59e3a0a | Remove print | bgyori |
| 087987e | Touch up documentation | bgyori |
| fd5cef4 | Add omnipath test UP ids | kkaris |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| OmniPath (:py:mod:`indra.sources.omnipath`) | ||
| =========================================== | ||
|
|
||
| .. automodule:: indra.sources.omnipath | ||
| :members: | ||
|
|
||
| OmniPath API (:py:mod:`indra.sources.omnipath.api`) | ||
| --------------------------------------------------- | ||
|
|
||
| .. automodule:: indra.sources.omnipath.api | ||
| :members: | ||
|
|
||
| OmniPath Processor (:py:mod:`indra.sources.omnipath.processor`) | ||
| --------------------------------------------------------------- | ||
|
|
||
| .. automodule:: indra.sources.omnipath.processor | ||
| :members: |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| """ | ||
| The OmniPath module accesses biomolecular interaction data from various | ||
| curated databases using the OmniPath API (see | ||
| https://saezlab.github.io/pypath/html/index.html#webservice) and processes | ||
| the returned data into statements using the OmniPathProcessor. | ||
|
|
||
| Currently, the following data is collected: | ||
| - Modifications from the PTMS endpoint https://saezlab.github.io/pypath/html/index.html#enzyme-substrate-interactions | ||
| - Ligand-Receptor data from the interactions endpoint https://saezlab.github.io/pypath/html/index.html#interaction-datasets | ||
|
|
||
| To process all statements, use the function `process_from_web`: | ||
|
|
||
| >>> from indra.sources.omnipath import process_from_web | ||
| >>> omnipath_processor = process_from_web() | ||
| >>> stmts = omnipath_processor.statements | ||
| """ | ||
| from .api import process_from_web | ||
|
|
||
| from .processor import OmniPathProcessor | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,75 @@ | ||
| import logging | ||
| import requests | ||
| from .processor import OmniPathProcessor | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| op_url = 'http://omnipathdb.org' | ||
|
|
||
|
|
||
| def process_from_web(): | ||
| """Query the OmniPath web API and return an OmniPathProcessor. | ||
|
|
||
| Returns | ||
| ------- | ||
| OmniPathProcessor | ||
| An OmniPathProcessor object which contains a list of extracted | ||
| INDRA Statements in its statements attribute. | ||
| """ | ||
| ptm_json = _get_modifications() | ||
| ligrec_json = _get_interactions() | ||
| op = OmniPathProcessor(ptm_json=ptm_json, ligrec_json=ligrec_json) | ||
| op.process_ptm_mods() | ||
| op.process_ligrec_interactions() | ||
| return op | ||
|
|
||
|
|
||
| def _get_modifications(): | ||
| """Get all PTMs from Omnipath in JSON format. | ||
|
|
||
| Returns | ||
| ------- | ||
| JSON content for PTMs, or None if the request failed. | ||
| """ | ||
| params = {'format': 'json', | ||
| 'fields': ['curation_effort', 'isoforms', 'references', | ||
| 'resources', 'sources']} | ||
| ptm_url = '%s/ptms' % op_url | ||
| res = requests.get(ptm_url, params=params) | ||
| if res.status_code != 200 or not res.text: | ||
| return None | ||
| else: | ||
| return res.json() | ||
|
|
||
|
|
||
| def _get_interactions(datasets=None): | ||
| """Wrapper for calling the omnipath interactions API | ||
|
|
||
| See full list of query options here: | ||
| https://omnipathdb.org/queries/interactions | ||
|
|
||
| Parameters | ||
| ---------- | ||
| datasets : list of str, optional | ||
| A list of dataset names. Options are: | ||
| dorothea, kinaseextra, ligrecextra, lncrna_mrna, mirnatarget, | ||
| omnipath, pathwayextra, tf_mirna, tf_target, tfregulons. | ||
| Default: ['ligrecextra'] | ||
|
|
||
| Returns | ||
| ------- | ||
| list | ||
| JSON content of the interactions response, one entry per interaction | ||
| """ | ||
| interactions_url = '%s/interactions' % op_url | ||
| params = { | ||
| 'fields': ['curation_effort', 'entity_type', 'references', | ||
| 'resources', 'sources', 'type'], | ||
| 'format': 'json', | ||
| 'datasets': datasets or ['ligrecextra'] | ||
| } | ||
| res = requests.get(interactions_url, params=params) | ||
| res.raise_for_status() | ||
|
|
||
| return res.json() |
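To make the shape of the request in `_get_interactions` concrete, here is a minimal standard-library sketch of the query string it produces. It assumes only that list-valued parameters are serialized as repeated keys, which is how `requests` encodes sequences passed through `params`. The helper `build_query_url` is hypothetical and not part of this PR, and the field list is shortened for readability.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

OP_URL = 'http://omnipathdb.org'  # same base URL as op_url in the diff


def build_query_url(endpoint, fields, datasets=None):
    """Hypothetical helper: build the URL for an OmniPath query."""
    params = {'fields': fields, 'format': 'json'}
    if datasets:
        params['datasets'] = datasets
    # doseq=True expands each list into repeated key=value pairs
    return '%s/%s?%s' % (OP_URL, endpoint, urlencode(params, doseq=True))


url = build_query_url('interactions',
                      ['curation_effort', 'references', 'sources'],
                      datasets=['ligrecextra'])
print(url)
# Round-trip the query string to confirm the list survived intact
print(parse_qs(urlsplit(url).query)['fields'])
```

Printing the URL before sending it is a cheap way to check which fields and datasets a call will request.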
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,205 @@ | ||
| from __future__ import unicode_literals | ||
| import logging | ||
| from collections import Counter | ||
| from indra.ontology.standardize import standardize_agent_name | ||
| from indra.statements import modtype_to_modclass, Agent, Evidence, Complex, \ | ||
| get_statement_by_name as stmt_by_name, BoundCondition | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| ignore_srcs = [db.lower() for db in ['NetPath', 'SIGNOR', 'ProtMapper', | ||
| 'BioGRID', 'HPRD-phos', 'phosphoELM']] | ||
|
|
||
|
|
||
| class OmniPathProcessor(object): | ||
| """Class to process OmniPath JSON into INDRA Statements.""" | ||
| def __init__(self, ptm_json=None, ligrec_json=None): | ||
| self.statements = [] | ||
| self.ptm_json = ptm_json | ||
| self.ligrec_json = ligrec_json | ||
|
|
||
| def process_ptm_mods(self): | ||
| """Process ptm json if present""" | ||
| if self.ptm_json: | ||
| self.statements += self._stmts_from_op_mods(self.ptm_json) | ||
|
|
||
| def process_ligrec_interactions(self): | ||
| """Process ligand-receptor json if present""" | ||
| if self.ligrec_json: | ||
| self.statements += self._stmt_from_op_lr(self.ligrec_json) | ||
|
|
||
| def _stmts_from_op_mods(self, ptm_json): | ||
| """Build Modification Statements from a list of Omnipath PTM entries | ||
| """ | ||
| ptm_stmts = [] | ||
| unhandled_mod_types = [] | ||
| annot_ignore = {'enzyme', 'substrate', 'residue_type', | ||
| 'residue_offset', 'references', 'modification'} | ||
| if ptm_json is None: | ||
| return [] | ||
| for mod_entry in ptm_json: | ||
| # Skip entries without references | ||
| if not mod_entry['references']: | ||
| continue | ||
| enz = self._agent_from_up_id(mod_entry['enzyme']) | ||
| sub = self._agent_from_up_id(mod_entry['substrate']) | ||
| res = mod_entry['residue_type'] | ||
| pos = mod_entry['residue_offset'] | ||
| evidence = [] | ||
| for source_pmid in mod_entry['references']: | ||
| source_db, pmid = source_pmid.split(':', 1) | ||
| # Skip evidence from already known sources | ||
| if source_db.lower() in ignore_srcs: | ||
| continue | ||
| if 'pmc' in pmid.lower(): | ||
| text_refs = {'PMCID': pmid.split('/')[-1]} | ||
| pmid = None | ||
| else: | ||
| text_refs = None | ||
| evidence.append(Evidence( | ||
| source_api='omnipath', | ||
| source_id=source_db, | ||
| pmid=pmid, | ||
| text_refs=text_refs, | ||
| annotations={k: v for k, v in mod_entry.items() if k not | ||
| in annot_ignore} | ||
| )) | ||
| mod_type = mod_entry['modification'] | ||
| modclass = modtype_to_modclass.get(mod_type) | ||
| if modclass is None: | ||
| unhandled_mod_types.append(mod_type) | ||
| continue | ||
| else: | ||
| # All evidences filtered out | ||
| if not evidence: | ||
| continue | ||
| stmt = modclass(enz, sub, res, pos, evidence) | ||
| ptm_stmts.append(stmt) | ||
| return ptm_stmts | ||
|
|
||
| def _stmt_from_op_lr(self, ligrec_json): | ||
| """Make ligand-receptor Complexes from Omnipath API interactions db""" | ||
| ligrec_stmts = [] | ||
| ign_annot = {'source_sub_id', 'source', 'target', 'references'} | ||
| no_refs = 0 | ||
| bad_pmid = 0 | ||
| no_consensus = 0 | ||
| if ligrec_json is None: | ||
| return ligrec_stmts | ||
|
|
||
| for lr_entry in ligrec_json: | ||
| if not lr_entry['references']: | ||
| no_refs += 1 | ||
| continue | ||
| if len(lr_entry['sources']) == 1 and \ | ||
| lr_entry['sources'][0].lower() in ignore_srcs: | ||
| continue | ||
|
|
||
| # Assemble evidence | ||
| evidence = [] | ||
| for source_pmid in lr_entry['references']: | ||
| source_db, pmid = source_pmid.split(':', 1) | ||
| # Skip evidence from already known sources | ||
| if source_db.lower() in ignore_srcs: | ||
| continue | ||
| if len(pmid) > 8: | ||
| bad_pmid += 1 | ||
| continue | ||
| annot = {k: v for k, v in lr_entry.items() if k not in | ||
| ign_annot} | ||
| annot['source_sub_id'] = source_db | ||
| evidence.append(Evidence(source_api='omnipath', pmid=pmid, | ||
| annotations=annot)) | ||
|
|
||
| # Get statements if we have evidences | ||
| if evidence: | ||
| # Get complexes | ||
| ligrec_stmts.append(self._get_op_complex(lr_entry['source'], | ||
| lr_entry['target'], | ||
| evidence)) | ||
|
|
||
| # On consensus, make Activations or Inhibitions as well | ||
| if bool(lr_entry['consensus_stimulation']) ^ \ | ||
| bool(lr_entry['consensus_inhibition']): | ||
| activation = bool( | ||
| lr_entry['consensus_stimulation']) | ||
| ligrec_stmts.append(self._get_ligrec_regs( | ||
| lr_entry['source'], lr_entry['target'], evidence, | ||
| activation=activation)) | ||
| elif lr_entry['consensus_stimulation'] and \ | ||
| lr_entry['consensus_inhibition']: | ||
| no_consensus += 1 | ||
| # All evidences were filtered out | ||
| else: | ||
| no_refs += 1 | ||
|
|
||
| if no_refs: | ||
| logger.warning(f'{no_refs} entries without references were ' | ||
| f'skipped') | ||
| if bad_pmid: | ||
| logger.warning(f'{bad_pmid} references with bad pmids were ' | ||
| f'skipped') | ||
| if no_consensus: | ||
| logger.warning(f'{no_consensus} entries with conflicting ' | ||
| f'regulation were skipped') | ||
|
|
||
| return ligrec_stmts | ||
|
|
||
| @staticmethod | ||
| def _agent_from_up_id(up_id): | ||
| """Build an Agent object from a Uniprot ID. Adds db_refs for both | ||
| Uniprot and HGNC where available.""" | ||
| db_refs = {'UP': up_id} | ||
| ag = Agent(up_id, db_refs=db_refs) | ||
| standardize_agent_name(ag) | ||
| return ag | ||
|
|
||
| def _bc_agent_from_up_list(self, up_id_list): | ||
| # Return the first agent with the remaining agents as a bound condition | ||
| agents_list = [self._agent_from_up_id(up_id) for up_id in up_id_list] | ||
| agent = agents_list[0] | ||
| agent.bound_conditions = \ | ||
| [BoundCondition(a, True) for a in agents_list[1:]] | ||
| return agent | ||
|
|
||
| def _complex_agents_from_op_complex(self, up_id_str): | ||
| """Return a list of agents from a string containing multiple UP ids | ||
| """ | ||
| # Get agents | ||
| if 'complex' in up_id_str.lower(): | ||
| up_id_list = up_id_str.split(':')[1].split('_') | ||
| else: | ||
| up_id_list = [up_id_str] | ||
|
|
||
| return [self._agent_from_up_id(up_id) for up_id in up_id_list] | ||
|
|
||
| def _get_op_complex(self, source, target, evidence_list): | ||
| ag_list = self._complex_agents_from_op_complex(source) + \ | ||
| self._complex_agents_from_op_complex(target) | ||
| return Complex(members=ag_list, | ||
| evidence=evidence_list) | ||
|
|
||
| def _get_ligrec_regs(self, source, target, evidence_list, activation=True): | ||
| # Check if any of the agents is a complex | ||
| # Source | ||
| if 'complex' in source.lower(): | ||
| # Make bound condition agent | ||
| up_id_list = source.split(':')[1].split('_') | ||
| subj = self._bc_agent_from_up_list(up_id_list) | ||
| else: | ||
| subj = self._agent_from_up_id(source) | ||
| # Target | ||
| if 'complex' in target.lower(): | ||
| # Make bound condition agent | ||
| up_id_list = target.split(':')[1].split('_') | ||
| obj = self._bc_agent_from_up_list(up_id_list) | ||
| else: | ||
| obj = self._agent_from_up_id(target) | ||
|
|
||
| # Regular case: | ||
| Regulation = stmt_by_name('activation') if activation else \ | ||
| stmt_by_name('inhibition') | ||
|
|
||
| regulation = Regulation(subj=subj, obj=obj, evidence=evidence_list) | ||
| return regulation |
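The string handling in the processor is easier to follow in isolation. Below is a minimal sketch that mirrors the parsing rules in the diff without importing INDRA. The helper names (`split_reference`, `text_refs_for`, `up_ids_from_entity`, `consensus_sign`) are hypothetical and not part of this PR, and all reference strings and UniProt-style IDs are made-up placeholders.

```python
# Lower-cased source names skipped by the processor (ignore_srcs in the diff)
IGNORE_SRCS = {'netpath', 'signor', 'protmapper',
               'biogrid', 'hprd-phos', 'phosphoelm'}


def split_reference(ref):
    """Split 'SOURCE:ID' into (source, id); None if the source is ignored."""
    # maxsplit=1 keeps any further colons inside the ID part
    source_db, ref_id = ref.split(':', 1)
    if source_db.lower() in IGNORE_SRCS:
        return None
    return source_db, ref_id


def text_refs_for(ref_id):
    """Return (pmid, text_refs), treating PMC-style IDs as PMCIDs."""
    if 'pmc' in ref_id.lower():
        return None, {'PMCID': ref_id.split('/')[-1]}
    return ref_id, None


def up_ids_from_entity(entity):
    """Return the IDs in 'P11111' or 'COMPLEX:P11111_P22222'."""
    if 'complex' in entity.lower():
        return entity.split(':')[1].split('_')
    return [entity]


def consensus_sign(stimulation, inhibition):
    """Return 'activation', 'inhibition', or None.

    None covers both conflicting flags and no flags at all.
    """
    if bool(stimulation) ^ bool(inhibition):
        return 'activation' if stimulation else 'inhibition'
    return None


refs = ['HPMR:10000001', 'SIGNOR:10000002']  # placeholder references
kept = [r for r in map(split_reference, refs) if r is not None]
print(kept)                                          # [('HPMR', '10000001')]
print(text_refs_for('PMC1234567'))                   # (None, {'PMCID': 'PMC1234567'})
print(up_ids_from_entity('COMPLEX:P11111_P22222'))   # ['P11111', 'P22222']
print(consensus_sign(1, 0), consensus_sign(1, 1))    # activation None
```

Pure functions like these could be unit-tested without network access, whereas `process_from_web` needs the live service.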