
Commit

🐛 fix "-" as psm-fdr setting (#320)
* 🐛 fix "-" as psm-fdr setting

- allow a string value (e.g. "-") for the ProlineStudio PSM FDR

* 📝 improve description of uploading a submission

* 📝 update changelog reference

* ✨ parse minimum and maximum charge

- ⚠️ as currently implemented, this only works for positive charges.

* 🎨 fix changelog website
Henry Webel authored Jul 12, 2024
1 parent d6a0baf commit f8597c5
Showing 9 changed files with 82 additions and 21 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -1,4 +1,4 @@
-# Changelog
+# Changelog - pre 0.2.10

All notable changes to this project will be documented in this file.

11 changes: 6 additions & 5 deletions docs/available-modules/2-DDA-Quantification-ion-level.md
@@ -43,9 +43,10 @@ The module is flexible in terms of what workflow the participants can run. Howev

When you have successfully uploaded and visualized a benchmark run, we strongly encourage you to add the result to the online repository. This way, your run will be available to the entire community and can be compared to all other uploaded benchmark runs. By doing so, your workflow outputs, parameters and calculated metrics will be stored and publicly available.

-To submit your run for public usage, you need to upload the parameter file associated with your run in the field `Meta data for searches`. Currently, we accept outputs from MaxQuant, FragPipe, Proline and i2MassChroQ (see below for more tool-specific details). Please fill in the `Comments for submission` field if needed, and confirm that the metadata is correct (corresponds to the benchmark run) before pressing the button `I really want to upload it`.
+To submit your run for public usage, you need to upload the parameter file associated with your run in the field `Meta data for searches`. Currently, we accept outputs from MaxQuant, FragPipe, Proline and i2MassChroQ (see below for more tool-specific details). Please fill in the `Comments for submission` field if needed, and confirm that the metadata is correct (corresponds to the benchmark run) by checking the box `I confirm that the metadata is correct`. The button
+`I really want to upload it` will then appear; press it to trigger the submission.

-After upload, you will get a link to the pull request associated with your data. Please copy and save it. With this link, you can get the unique identifier of your run (for example "Proline__20240106_141919"), follow the progress of your submission, and add comments to communicate with the ProteoBench maintainers. If everything looks good, your submission will be reviewed and accepted (this takes a few working days). Then, your benchmark run will be added to the public runs of this module and plotted alongside all other benchmark runs in the figure.
+After upload, you will get a link to the pull request associated with your data. Please copy and save it. With this link, you can get the unique identifier of your run (for example `Proline__20240106_141919`), follow the progress of your submission, and add comments to communicate with the ProteoBench maintainers. If everything looks good, your submission will be reviewed and accepted (this takes a few working days). Then, your benchmark run will be added to the public runs of this module and plotted alongside all other benchmark runs in the figure.

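The run identifier quoted above (`Proline__20240106_141919`) appears to join the tool name and a submission timestamp with a double underscore. The following Python sketch splits such an identifier; the `<tool>__<YYYYMMDD>_<HHMMSS>` layout is an assumption inferred from this single example, and `parse_run_id` is a hypothetical helper, not part of ProteoBench.

```python
from datetime import datetime


def parse_run_id(run_id: str) -> tuple[str, datetime]:
    """Split e.g. 'Proline__20240106_141919' into ('Proline', datetime).

    Assumes the hypothetical layout '<tool>__<YYYYMMDD>_<HHMMSS>'.
    """
    # Split at the last '__' so that the tool name itself may contain underscores
    tool, sep, stamp = run_id.rpartition("__")
    if not sep or not tool:
        raise ValueError(f"unexpected run identifier: {run_id!r}")
    return tool, datetime.strptime(stamp, "%Y%m%d_%H%M%S")


tool, submitted = parse_run_id("Proline__20240106_141919")
print(tool, submitted.isoformat())
```

If the identifier format differs from this assumption, `strptime` raises `ValueError` rather than returning a wrong timestamp.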
## Important Tool-specific settings

@@ -82,9 +83,9 @@ Some older versions of MaxQuant do not provide the option to change fasta header

### Proline
Use the raw file names as sample names. The "LFQ_Orbitrap_" prefix is removed automatically in the output.
-For this module, use the Excel exports. Make sure that the Quantified peptide ions tab contains the columns "samesets_accessions" and "subsets_accessions". The accessions in these two fields are combined to determine which species a peptide sequence matches.
-The "Quantified peptide ions" tab reports validated PSMs, so a precursor ion's quantity (retrieved from its XIC) is repeated for each of its PSMs. This redundancy is removed before metric calculation.
-For public submission, you can upload the same Excel export; just make sure it contains the tabs "Search settings and infos", "Import and filters" and "Quant config".
+For this module, use the Excel exports. Make sure that the `Quantified peptide ions` tab contains the columns `samesets_accessions` and `subsets_accessions`. The accessions in these two fields are combined to determine which species a peptide sequence matches.
+The `Quantified peptide ions` tab reports validated PSMs, so a precursor ion's quantity (retrieved from its XIC) is repeated for each of its PSMs. This redundancy is removed before metric calculation.
+For public submission, you can upload the same Excel export; just make sure it contains the tabs `Search settings and infos`, `Import and filters` and `Quant config`.

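The two Proline rules above (combining `samesets_accessions` and `subsets_accessions` to assign species, and removing precursor quantities repeated across PSMs) can be illustrated with a small plain-Python sketch. It is not ProteoBench's implementation: the example rows, the `;` accession separator, the `_HUMAN`/`_YEAST`/`_ECOLI` species suffixes and the `(sequence, charge)` deduplication key are all assumptions for illustration (a real precursor key would likely also include modifications).

```python
# Hypothetical rows from the "Quantified peptide ions" tab: each validated PSM
# repeats the quantity of its precursor ion.
rows = [
    {"sequence": "PEPTIDEK", "charge": 2, "samesets_accessions": "P1_HUMAN",
     "subsets_accessions": "", "abundance": 1000.0},
    {"sequence": "PEPTIDEK", "charge": 2, "samesets_accessions": "P1_HUMAN",
     "subsets_accessions": "", "abundance": 1000.0},
    {"sequence": "SHAREDR", "charge": 3, "samesets_accessions": "Q1_YEAST",
     "subsets_accessions": "Q2_YEAST; P9_ECOLI", "abundance": 50.0},
]

SPECIES_SUFFIXES = ("_HUMAN", "_YEAST", "_ECOLI")  # assumed species flags


def deduplicate_precursors(rows):
    """Keep the first row per (sequence, charge): one quantity per precursor ion."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["sequence"], row["charge"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def species_of(row):
    """Combine both accession columns and return the set of species they map to."""
    accessions = []
    for column in ("samesets_accessions", "subsets_accessions"):
        # ';' is the assumed separator; empty cells contribute no accessions
        accessions += [a.strip() for a in row[column].split(";") if a.strip()]
    return {s.lstrip("_") for s in SPECIES_SUFFIXES for a in accessions if a.endswith(s)}


for row in deduplicate_precursors(rows):
    print(row["sequence"], sorted(species_of(row)))
```

The sketch only reports the set of species per precursor; how peptides that map to more than one species are handled is left out.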
### Sage

14 changes: 14 additions & 0 deletions docs/changelog.rst
@@ -1,2 +1,16 @@
Changelog
=========
+
+The recent changelog can be viewed on the Releases page on GitHub:
+
+`GitHub Releases <https://github.com/Proteobench/ProteoBench/releases>`_
+
+The old changelog is also available in the ``CHANGELOG.md`` file in the root of the repository
+and displayed below.
+
+Changelog - pre 0.2.10
+----------------------
+
+.. include:: ../CHANGELOG.md
+   :parser: myst_parser.sphinx_
+   :start-line: 1
40 changes: 29 additions & 11 deletions proteobench/io/params/proline.py
@@ -7,6 +7,7 @@
 - "Import and filters"
 - "Quant config"
 """

 import re

 import pandas as pd
@@ -34,6 +35,14 @@

 PATTERN_MIN_PEP_LENGTH = r"\[threshold_value=([0-9].*)\]"

+PATTERN_CHARGE = r"[\d+]+"
+
+
+def find_charge(string):
+    charges = re.findall(PATTERN_CHARGE, string)
+    charges = [int(c[:-1]) for c in charges]
+    return charges
+

 def find_min_pep_length(string):
     min_length = re.findall(PATTERN_MIN_PEP_LENGTH, string)[0]
@@ -64,6 +73,11 @@ def extract_params(fname) -> ProteoBenchParameters:
     params.precursor_mass_tolerance = sheet.loc[0, "peptide_mass_error_tolerance"]
     params.fragment_mass_tolerance = sheet.loc[0, "fragment_mass_error_tolerance"]

+    # Extract allowed minimum and maximum charge states
+    charges = find_charge(sheet.loc[0, "peptide_charge_states"])
+    params.min_precursor_charge = min(charges)
+    params.max_precursor_charge = max(charges)
+
     # ! Second sheet contains information about the import and filters
     sheet_name = "Import and filters"
     cols = use_columns[sheet_name]
@@ -73,7 +87,10 @@ def extract_params(fname) -> ProteoBenchParameters:
     assert all(stats.loc["unique", cols] == 1), "Not all columns are unique"
     sheet = sheet[cols].drop_duplicates().reset_index(drop=True)
     # Extract
-    params.ident_fdr_psm = int(sheet.loc[0, "psm_filter_expected_fdr"]) / 100
+    try:
+        params.ident_fdr_psm = int(sheet.loc[0, "psm_filter_expected_fdr"]) / 100
+    except ValueError:
+        params.ident_fdr_psm = sheet.loc[0, "psm_filter_expected_fdr"]
     params.min_peptide_length = find_min_pep_length(sheet.loc[0, "psm_filter_2"])

     # ! Third sheet only contains match between runs (MBR) information indirectly
@@ -87,13 +104,14 @@ def extract_params(fname) -> ProteoBenchParameters:
 if __name__ == "__main__":
     from pathlib import Path

-    file = Path("../../../test/params/Proline_example_w_Mascot_wo_proteinSets.xlsx")
-    params = extract_params(file)
-    data_dict = params.__dict__
-    series = pd.Series(data_dict)
-    series.to_csv(file.with_suffix(".csv"))
-    file = Path("../../../test/params/Proline_example_2.xlsx")
-    params = extract_params(file)
-    data_dict = params.__dict__
-    series = pd.Series(data_dict)
-    series.to_csv(file.with_suffix(".csv"))
+    files = [
+        "../../../test/params/Proline_example_w_Mascot_wo_proteinSets.xlsx",
+        "../../../test/params/Proline_example_2.xlsx",
+        "../../../test/params/ProlineStudio_withMBR.xlsx",
+    ]
+    for file in files:
+        file = Path(file)
+        params = extract_params(file)
+        data_dict = params.__dict__
+        series = pd.Series(data_dict)
+        series.to_csv(file.with_suffix(".csv"))
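
The `try`/`except ValueError` added to `extract_params` keeps the raw cell when `psm_filter_expected_fdr` cannot be converted with `int()`, which is how `ident_fdr_psm` ends up as `-` in `ProlineStudio_withMBR.csv`. A standalone sketch of that conversion follows, using a hypothetical helper `parse_fdr_percent`. Two deliberate differences from the committed code: it uses `float()` instead of `int()`, so a fractional percentage such as `0.5` becomes `0.005` rather than being truncated (numeric cell) or kept as a string (text cell), and it also catches `TypeError` (e.g. for `None`).

```python
def parse_fdr_percent(value):
    """Convert an FDR given in percent (e.g. 1 or "1") to a fraction (0.01).

    Values that cannot be read as a number, such as Proline Studio's "-",
    are returned unchanged, as in the committed except branch.
    """
    try:
        return float(value) / 100
    except (TypeError, ValueError):
        return value


for cell in (1, "5", "0.5", "-"):
    print(repr(cell), "->", parse_fdr_percent(cell))
```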
20 changes: 20 additions & 0 deletions test/params/ProlineStudio_withMBR.csv
@@ -0,0 +1,20 @@
+,0
+software_name,Proline
+software_version,
+search_engine,Mascot
+search_engine_version,2.8.3
+ident_fdr_psm,-
+ident_fdr_peptide,
+ident_fdr_protein,
+enable_match_between_runs,True
+precursor_mass_tolerance,10.0 ppm
+fragment_mass_tolerance,0.02 Da
+enzyme,Trypsin/P
+allowed_miscleavages,2
+min_peptide_length,7
+max_peptide_length,
+fixed_mods,Carbamidomethyl (C)
+variable_mods,Acetyl (Protein N-term); Oxidation (M)
+max_mods,
+min_precursor_charge,2
+max_precursor_charge,3
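
Expected-parameter files such as the one above are two-column CSVs written by `pd.Series.to_csv` in the `__main__` block of `proline.py`: a `,0` header row, then one `name,value` row per parameter, with empty cells for parameters that were not found. A minimal standard-library sketch for loading one into a dictionary; `read_params_csv` is a hypothetical helper (the Proline test module uses `pd.read_csv`), and mapping empty cells to `None` is a choice made here, not ProteoBench behavior.

```python
import csv
import io


def read_params_csv(text: str) -> dict:
    """Read a two-column 'name,value' parameter CSV into a dict.

    The first row is the ',0' header written by pandas.Series.to_csv;
    empty cells are mapped to None.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the ',0' header row
    return {name: (value if value != "" else None) for name, value in reader}


sample = """,0
software_name,Proline
ident_fdr_psm,-
ident_fdr_peptide,
min_precursor_charge,2
max_precursor_charge,3
"""
params = read_params_csv(sample)
print(params["ident_fdr_psm"], params["ident_fdr_peptide"], params["min_precursor_charge"])
```

All values stay strings; converting numeric fields is left to the caller.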
Binary file added test/params/ProlineStudio_withMBR.xlsx
Binary file not shown.
4 changes: 2 additions & 2 deletions test/params/Proline_example_2.csv
@@ -16,5 +16,5 @@ max_peptide_length,
fixed_mods,Carbamidomethyl (C)
variable_mods,Acetyl (Protein N-term); Gln->pyro-Glu (Any N-term Q); Ammonia-loss (Any N-term C); Glu->pyro-Glu (Any N-term E); Oxidation (M)
max_mods,
-min_precursor_charge,
-max_precursor_charge,
+min_precursor_charge,1
+max_precursor_charge,4
4 changes: 2 additions & 2 deletions test/params/Proline_example_w_Mascot_wo_proteinSets.csv
@@ -16,5 +16,5 @@ max_peptide_length,
fixed_mods,Carbamidomethyl (C)
variable_mods,Acetyl (Protein N-term); Oxidation (M)
max_mods,
-min_precursor_charge,
-max_precursor_charge,
+min_precursor_charge,2
+max_precursor_charge,3
8 changes: 8 additions & 0 deletions test/test_parse_params_proline.py
@@ -11,6 +11,7 @@
 fnames = [
     "Proline_example_w_Mascot_wo_proteinSets.xlsx",
     "Proline_example_2.xlsx",
+    "ProlineStudio_withMBR.xlsx",
 ]
 fnames = [TESTDATA_DIR / f for f in fnames]

@@ -42,3 +43,10 @@ def test_extract_params(file):
     actual = pd.Series(actual.__dict__)
     actual = pd.read_csv(io.StringIO(actual.to_csv()), index_col=0).squeeze("columns")
     assert expected.equals(actual)
+
+
+def test_find_charges():
+    assert proline_params.find_charge("2+ and 3+") == [2, 3]
+    assert proline_params.find_charge("2+") == [2]
+    assert proline_params.find_charge("3+") == [3]
+    assert proline_params.find_charge("30+ and 14+") == [30, 14]

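`PATTERN_CHARGE = r"[\d+]+"` matches runs of digits and `+` signs, and `find_charge` then drops the last character of each match. This is why, as the commit message notes, only positive charges work: in `"2- and 3-"` the matches are `"2"` and `"3"`, and stripping their last character leaves an empty string that `int()` rejects. A sign-aware alternative is sketched below; `find_charge_signed` and `PATTERN_SIGNED_CHARGE` are hypothetical names, not part of ProteoBench, and the sketch assumes charge states are written as a number followed by `+` or `-`.

```python
import re

# Hypothetical sign-aware pattern: one or more digits followed by '+' or '-'
PATTERN_SIGNED_CHARGE = re.compile(r"(\d+)([+-])")


def find_charge_signed(string: str) -> list[int]:
    """Return charge states, e.g. '2+ and 3+' -> [2, 3] and '2-' -> [-2]."""
    charges = []
    for digits, sign in PATTERN_SIGNED_CHARGE.findall(string):
        value = int(digits)
        charges.append(value if sign == "+" else -value)
    return charges


charges = find_charge_signed("2+ and 3+")
print(min(charges), max(charges))
```

Like the committed `find_charge`, this returns an empty list when no charge state is found, in which case `min(charges)` in `extract_params` would raise `ValueError`.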