diff --git a/docs/available-modules/2-quant-lfq-ion-dda.md b/docs/available-modules/2-quant-lfq-ion-dda.md index a0108729..e61bae1e 100644 --- a/docs/available-modules/2-quant-lfq-ion-dda.md +++ b/docs/available-modules/2-quant-lfq-ion-dda.md @@ -64,7 +64,7 @@ The module is flexible in terms of what workflow the participants can run. Howev When you have successfully uploaded and visualized a benchmark run, we strongly encourage you to add the result to the online repository. This way, your run will be available to the entire community and can be compared to all other uploaded benchmark runs. By doing so, your workflow outputs, parameters and calculated metrics will be stored and publicly available. -To submit your run for public usage, you need to upload the parameter file associated to your run in the field `Meta data for searches`. Currently, we accept outputs from MaxQuant, FragPipe, Proline Studio, and i2MassChroQ (see bellow for more tool-specific details). Please fill the `Comments for submission` if needed, and confirm that the metadata is correct (correspond to the benchmark run) before checking the button `I confirm that the metadata is correct`. Then the button +To submit your run for public usage, you need to upload the parameter file associated with your run in the field `Meta data for searches`. Currently, we accept outputs from MaxQuant, FragPipe, Proline Studio, AlphaPept, PEAKS and i2MassChroQ (see below for more tool-specific details). Please fill in the `Comments for submission` field if needed, and confirm that the metadata is correct (that it corresponds to the benchmark run) before checking the button `I confirm that the metadata is correct`. Then the button `I really want to upload it` will appear to trigger the submission. After upload, you will get a link to the pull request associated with your data. Please copy it and save it. 
With this link, you can get the unique identifier of your run (for example `ProlineStudio__20240106_141919`), and follow the advancement of your submission and add comments to communicate with the ProteoBench maintainers. If everything looks good, your submission will be reviewed and accepted (it will take a few working days). Then, your benchmark run will be added to the public runs of this module and plotted alongside all other benchmark runs in the figure. @@ -81,6 +81,7 @@ Table 2 provides an overview of the required input files for public submission. |MaxQuant|evidence.txt|mqpar.xml| |Proline Studio|.xlsx|.xlsx| |Sage|lfq.tsv|results.json| +|PEAKS|lfq_features.csv|parameters.txt| ### AlphaPept 1. Load folder that contains the data files. @@ -129,6 +130,12 @@ For public submission, you can upload the same excel export, just make sure to h MSAngel allows to build piplenes for bottom-up MS analysis with a choice of search engines, validation strategy and the Proline quantification. More information can be found [here](https://www.profiproteomics.fr/ms-angel/) +### PEAKS (work in progress) +When starting a new project and selecting the .RAW files, there is no need to modify the sample names given by PEAKS. Just make sure that Samples 1 to 3 are assigned to Condition "A" and Samples 4 to 6 to Condition "B". +Make sure to set Enzyme as trypsin, Instrument as Orbitrap (Orbi-Orbi), Fragment as HCD and Acquisition as DDA. +In the workflow section, use the PEAKS Q (de novo assisted search quantification) option. Set the different parameters in "Data refine" and "DB search". In the "Quantification" tab, use the "Label Free" option, followed by either adding all samples individually or grouping samples according to their respective condition. In the "Report" tab, make sure both Peptide FDR and Protein Group FDR are set to 1%. 
+Once the workflow has run successfully, make sure to check the "All Search Parameters" and the "Feature Vector CSV" from the Label Free Quantification Exports in the "Export" tab. + ### Sage 1. Convert .raw files into .mzML using MSConvert or ThermoRawFileParser **(do not change the file names)** diff --git a/docs/available-modules/4-quant-lfq-ion-dia-aif.md b/docs/available-modules/4-quant-lfq-ion-dia-aif.md index 10a69565..81cc0ba6 100644 --- a/docs/available-modules/4-quant-lfq-ion-dia-aif.md +++ b/docs/available-modules/4-quant-lfq-ion-dia-aif.md @@ -77,7 +77,8 @@ Table 2 provides an overview of the required input files for public submission. |DIA-NN|*_report.tsv|*report.log.txt| |FragPipe|*_report.tsv|fragpipe.workflow| |MaxDIA|evidence.txt|mqpar.xml| -|Spectronaut|*.tsv|*.txt +|Spectronaut|*.tsv|*.txt| +|PEAKS|lfq.dia.peptides.csv|parameters.txt| After upload, you will get a link to the pull request associated with your data. Please copy it and save it. With this link, you can get the unique identifier of your run (for example `Proline__20240106_141919`), and follow the advancement of your submission and add comments to communicate with the ProteoBench maintainers. If everything looks good, your submission will be reviewed and accepted (it will take a few working days). Then, your benchmark run will be added to the public runs of this module and plotted alongside all other benchmark runs in the figure. @@ -113,6 +114,13 @@ By default, MaxDIA uses a contaminants-only fasta file that is located in the so For this module, use the "evidence.txt" output in the "txt" folder of MaxQuant search outputs. For public submission, please upload the "mqpar.xml" file associated with your search. +### [PEAKS](https://www.bioinfor.com/) (work in progress) +When starting a new project and selecting the .RAW files, there is no need to modify the sample names given by PEAKS. Just make sure that Samples 1 to 3 are assigned to Condition "A" and Samples 4 to 6 to Condition "B". 
+Make sure to set Enzyme as trypsin, Instrument as Orbitrap (Orbi-Orbi), Fragment as HCD and Acquisition as DIA. +In the workflow section, use the Quantification option. While we do not suggest using a custom spectral library, one could be defined in the "Spectral library" tab. Define the search parameters in the "DB search" tab. +In the "Quantification" tab, use the "Label Free" option, followed by either adding all samples individually or grouping samples according to their respective condition. In the "Report" tab, make sure both Peptide FDR and Protein Group FDR are set to 1%. +Once the workflow has run successfully, make sure to check the "All Search Parameters" and the "Peptide CSV" from the Label Free Quantification Exports in the "Export" tab. + #### Troubleshooting: Since the Thermo DIA data .raw files were acquired using a staggered window approach it is highly recommended to convert and demultiplex the .RAW files first into .mzML using MSConvert. diff --git a/proteobench/io/parsing/io_parse_settings/Quant/lfq/ion/DDA/parse_settings_peaks.toml b/proteobench/io/parsing/io_parse_settings/Quant/lfq/ion/DDA/parse_settings_peaks.toml new file mode 100644 index 00000000..cbcd7cc7 --- /dev/null +++ b/proteobench/io/parsing/io_parse_settings/Quant/lfq/ion/DDA/parse_settings_peaks.toml @@ -0,0 +1,37 @@ +[mapper] +"Accession" = "Proteins" +"Peptide" = "Sequence" +"z" = "Charge" + +[condition_mapper] +"Sample 1 Normalized Area" = "A" +"Sample 2 Normalized Area" = "A" +"Sample 3 Normalized Area" = "A" +"Sample 4 Normalized Area" = "B" +"Sample 5 Normalized Area" = "B" +"Sample 6 Normalized Area" = "B" + +[run_mapper] +"Sample 1 Normalized Area" = "Condition_A_Sample_Alpha_01" +"Sample 2 Normalized Area" = "Condition_A_Sample_Alpha_02" +"Sample 3 Normalized Area" = "Condition_A_Sample_Alpha_03" +"Sample 4 Normalized Area" = "Condition_B_Sample_Alpha_01" +"Sample 5 Normalized Area" = "Condition_B_Sample_Alpha_02" +"Sample 6 Normalized Area" = 
"Condition_B_Sample_Alpha_03" + +[species_mapper] +"_YEAST" = "YEAST" +"_ECOLI" = "ECOLI" +"_HUMAN" = "HUMAN" + +[modifications_parser] +"parse_column" = "Sequence" +"before_aa" = false +"isalpha" = true +"isupper" = true +"pattern"="(?<=\\().+?(?=\\))" +"modification_dict" = {"+57.02" = "Carbamidomethyl", "+15.99" = "Oxidation", "-17.026548" = "Gln->pyro-Glu", "-18.010565" = "Glu->pyro-Glu", "+42.01" = "Acetyl"} + +[general] +"contaminant_flag" = "Cont_" +"decoy_flag" = false diff --git a/proteobench/io/parsing/io_parse_settings/Quant/lfq/ion/DIA/AIF/parse_settings_peaks.toml b/proteobench/io/parsing/io_parse_settings/Quant/lfq/ion/DIA/AIF/parse_settings_peaks.toml new file mode 100644 index 00000000..b128bea6 --- /dev/null +++ b/proteobench/io/parsing/io_parse_settings/Quant/lfq/ion/DIA/AIF/parse_settings_peaks.toml @@ -0,0 +1,37 @@ +[mapper] +"Accession" = "Proteins" +"Peptide" = "Sequence" +"z" = "Charge" + +[condition_mapper] +"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_01.raw Normalized Area" = "A" +"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_02.raw Normalized Area" = "A" +"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_03.raw Normalized Area" = "A" +"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_01.raw Normalized Area" = "B" +"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_02.raw Normalized Area" = "B" +"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_03.raw Normalized Area" = "B" + +[run_mapper] +"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_01.raw Normalized Area" = "Condition_A_Sample_Alpha_01" +"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_02.raw Normalized Area" = "Condition_A_Sample_Alpha_02" +"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_03.raw Normalized Area" = "Condition_A_Sample_Alpha_03" +"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_01.raw Normalized Area" = "Condition_B_Sample_Alpha_01" +"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_02.raw Normalized Area" = "Condition_B_Sample_Alpha_02" +"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_03.raw Normalized Area" = 
"Condition_B_Sample_Alpha_03" + +[species_mapper] +"_YEAST" = "YEAST" +"_ECOLI" = "ECOLI" +"_HUMAN" = "HUMAN" + +[modifications_parser] +"parse_column" = "Sequence" +"before_aa" = false +"isalpha" = true +"isupper" = true +"pattern"="(?<=\\().+?(?=\\))" +"modification_dict" = {"+57.02" = "Carbamidomethyl", "+15.99" = "Oxidation", "-17.026548" = "Gln->pyro-Glu", "-18.010565" = "Glu->pyro-Glu", "+42.01" = "Acetyl"} + +[general] +"contaminant_flag" = "Cont_" +"decoy_flag" = false diff --git a/proteobench/io/parsing/io_parse_settings/Quant/lfq/peptidoform/DDA/parse_settings_peaks.toml b/proteobench/io/parsing/io_parse_settings/Quant/lfq/peptidoform/DDA/parse_settings_peaks.toml new file mode 100644 index 00000000..32620ebf --- /dev/null +++ b/proteobench/io/parsing/io_parse_settings/Quant/lfq/peptidoform/DDA/parse_settings_peaks.toml @@ -0,0 +1,36 @@ +[mapper] +"Accession" = "Proteins" +"Peptide" = "Sequence" + +[condition_mapper] +"Area Sample 1" = "A" +"Area Sample 2" = "A" +"Area Sample 3" = "A" +"Area Sample 4" = "B" +"Area Sample 5" = "B" +"Area Sample 6" = "B" + +[run_mapper] +"Area Sample 1" = "Condition_A_Sample_Alpha_01" +"Area Sample 2" = "Condition_A_Sample_Alpha_02" +"Area Sample 3" = "Condition_A_Sample_Alpha_03" +"Area Sample 4" = "Condition_B_Sample_Alpha_01" +"Area Sample 5" = "Condition_B_Sample_Alpha_02" +"Area Sample 6" = "Condition_B_Sample_Alpha_03" + +[species_mapper] +"_YEAST" = "YEAST" +"_ECOLI" = "ECOLI" +"_HUMAN" = "HUMAN" + +[modifications_parser] +"parse_column" = "Sequence" +"before_aa" = false +"isalpha" = true +"isupper" = true +"pattern"="(?<=\\().+?(?=\\))" +"modification_dict" = {"+57.02" = "Carbamidomethyl", "+15.99" = "Oxidation", "-17.026548" = "Gln->pyro-Glu", "-18.010565" = "Glu->pyro-Glu", "+42.01" = "Acetyl"} + +[general] +"contaminant_flag" = "Cont_" +"decoy_flag" = false diff --git a/proteobench/io/parsing/io_parse_settings/parse_settings_files.toml b/proteobench/io/parsing/io_parse_settings/parse_settings_files.toml index 
9ab36a55..2b25755a 100644 --- a/proteobench/io/parsing/io_parse_settings/parse_settings_files.toml +++ b/proteobench/io/parsing/io_parse_settings/parse_settings_files.toml @@ -6,10 +6,12 @@ "ProlineStudio" = "parse_settings_proline.toml" "MSAngel" = "parse_settings_msangel.toml" "Sage" = "parse_settings_sage.toml" +"PEAKS" = "parse_settings_peaks.toml" "Custom" = "parse_settings_custom.toml" [quant_lfq_peptidoform_DDA] "WOMBAT" = "parse_settings_wombat.toml" +"PEAKS" = "parse_settings_peaks.toml" "Proteome Discoverer" = "parse_settings_proteomediscoverer.toml" "Custom" = "parse_settings_custom.toml" @@ -21,6 +23,7 @@ "Spectronaut" = "parse_settings_spectronaut.toml" "AlphaDIA" = "parse_settings_alphadia.toml" "MSAID" = "parse_settings_msaid.toml" +"PEAKS" = "parse_settings_peaks.toml" "Custom" = "parse_settings_custom.toml" [quant_lfq_ion_DIA_diaPASEF] diff --git a/proteobench/io/parsing/parse_ion.py b/proteobench/io/parsing/parse_ion.py index c520ca8e..459db26e 100644 --- a/proteobench/io/parsing/parse_ion.py +++ b/proteobench/io/parsing/parse_ion.py @@ -107,6 +107,8 @@ def load_input_file(input_csv: str, input_format: str) -> pd.DataFrame: input_data_frame["PG.ProteinGroups"] = input_data_frame["PG.ProteinGroups"].str.join(";") elif input_format == "MSAID": input_data_frame = pd.read_csv(input_csv, low_memory=False, sep="\t") + elif input_format == "PEAKS": + input_data_frame = pd.read_csv(input_csv, low_memory=False, sep=",") return input_data_frame diff --git a/proteobench/io/parsing/parse_peptidoform.py b/proteobench/io/parsing/parse_peptidoform.py index aa79a496..a031fcb3 100644 --- a/proteobench/io/parsing/parse_peptidoform.py +++ b/proteobench/io/parsing/parse_peptidoform.py @@ -29,6 +29,8 @@ def load_input_file(input_csv: str, input_format: str) -> pd.DataFrame: elif input_format == "Custom": input_data_frame = pd.read_csv(input_csv, low_memory=False, sep="\t") input_data_frame["proforma"] = input_data_frame["Modified sequence"] + elif input_format == 
"PEAKS": + input_data_frame = pd.read_csv(input_csv, low_memory=False, sep=",") return input_data_frame diff --git a/proteobench/plotting/plot_quant.py b/proteobench/plotting/plot_quant.py index d87e16ab..b714e189 100644 --- a/proteobench/plotting/plot_quant.py +++ b/proteobench/plotting/plot_quant.py @@ -88,6 +88,7 @@ def plot_metric( "FragPipe (DIA-NN quant)": "#ff7f00", "MSAID": "#afff57", "Proteome Discoverer": "#8c564b", + "PEAKS": "#f781bf", }, mapping: Dict[str, int] = {"old": 10, "new": 20}, highlight_color: str = "#d30067",