Merged
3 changes: 2 additions & 1 deletion docs/config-fields-supported.md
@@ -51,6 +51,7 @@ These are the supported v2 config keys used by the main workflow. Unknown keys a
| `modules.qc.enabled` | boolean | no | `false` | Enable QC module |
| `modules.qc.clusters` | integer | no | `3` | `>= 1` |
| `modules.qc.min_depth` | number | no | `2` | `>= 0` |
| `modules.qc.max_sample_missingness` | number | no | `0.49` | `0..1` |
| `modules.qc.google_api_key` | string | no | `""` | Optional for map panel |
| `modules.qc.exclude_scaffolds` | string | no | `""` | Comma-separated scaffold list |
| `modules.postprocess.enabled` | boolean | no | `false` | Enable postprocess module |
@@ -78,7 +79,7 @@ When running module Snakefiles directly (outside the main workflow module-import

### QC module (`workflow/modules/qc/Snakefile`)

`samples`, `sample_metadata`, `vcf`, `fai`, `qc_report`, `clusters`, `min_depth`, `google_api_key`, `exclude_scaffolds`
`samples`, `sample_metadata`, `vcf`, `fai`, `qc_report`, `clusters`, `min_depth`, `max_sample_missingness`, `google_api_key`, `exclude_scaffolds`

### Postprocess module (`workflow/modules/postprocess/Snakefile`)

1 change: 1 addition & 0 deletions docs/modules.md
@@ -17,6 +17,7 @@ The quality control module aggregates various statistics from the workflow and p
|`modules.qc.clusters`| Number of clusters for PCA visualization.| `int`|
|`modules.qc.google_api_key`| Google Maps API key for the terrain panel (optional).| `str`|
|`modules.qc.min_depth`| Samples with average depth below this will be excluded for QC analysis.| `int`|
|`modules.qc.max_sample_missingness`| Samples with >49% missing genotypes in the pruned QC SNP set are excluded before PLINK PCA/GRM.| `float`|
|`modules.qc.exclude_scaffolds`| Comma-separated scaffolds to exclude from QC SNP sampling.| `str`|

```{note}
27 changes: 27 additions & 0 deletions tests/tests.py
@@ -1470,6 +1470,33 @@ def test_qc_dashboard_helper_preserves_numeric_like_ids():
assert result.returncode == 0, (result.stdout + result.stderr).strip()


@pytest.mark.dry_run
def test_qc_plink_filters_sparse_samples_before_pca(request):
    """QC PLINK step should pass a per-sample missingness filter to avoid NaN GRMs."""
    no_conda = request.config.getoption("--no-conda")
    with tempfile.TemporaryDirectory() as tmpdir:
        smk = SnakemakeRunner(
            Path(tmpdir),
            use_conda=not no_conda,
            snakefile=WORKFLOW_DIR / "modules" / "qc" / "Snakefile",
        )
        result = smk.dry_run(
            target="results/qc/plink.bed",
            configfile=WORKFLOW_DIR / "modules" / "qc" / "config" / "config.yaml",
            config_overrides={
                "samples": str(TEST_DATA_DIR / "qc" / "samples.csv"),
                "sample_metadata": str(TEST_DATA_DIR / "qc" / "sample_metadata.csv"),
                "vcf": str(TEST_DATA_DIR / "qc" / "raw.vcf.gz"),
                "fai": str(TEST_DATA_DIR / "qc" / "ref.fai"),
                "max_sample_missingness": "0.49",
            },
        )
        result.assert_success()

        output = result.stdout + result.stderr
        assert "--mind 0.49" in output


@pytest.mark.full_run
def test_qc_standalone_full_run(request):
"""Full execution of QC module as standalone workflow against test fixtures."""
1 change: 1 addition & 0 deletions workflow/Snakefile
@@ -60,6 +60,7 @@ if config["modules"]["qc"]["enabled"]:
"qc_report": "results/qc_metrics/qc_report.tsv",
"clusters": config["modules"]["qc"]["clusters"],
"min_depth": config["modules"]["qc"]["min_depth"],
"max_sample_missingness": config["modules"]["qc"]["max_sample_missingness"],
"google_api_key": config["modules"]["qc"]["google_api_key"],
"exclude_scaffolds": config["modules"]["qc"]["exclude_scaffolds"],
}
7 changes: 7 additions & 0 deletions workflow/modules/qc/Snakefile
@@ -18,6 +18,9 @@ Config keys (flat):
qc_report - path to BAM summary stats TSV (optional; empty = skip mapping rate panel)
clusters - number of k-means clusters for PCA visualization (default 3)
min_depth - minimum depth to include a sample (default 2)
max_sample_missingness - maximum genotype missingness allowed per sample
in the pruned QC SNP set before PLINK drops it
(default 0.49)
google_api_key - Google Maps API key for terrain map (default "")
exclude_scaffolds - comma-separated scaffolds to exclude (default "")
"""
Expand All @@ -37,6 +40,7 @@ _QC_DEFAULTS = {
"qc_report": "",
"clusters": 3,
"min_depth": 2,
"max_sample_missingness": 0.49,
"google_api_key": "",
"exclude_scaffolds": "",
}
@@ -285,16 +289,19 @@ rule plink:
        king="results/qc/plink.king",
    params:
        prefix="results/qc/plink",
        max_sample_missingness=config["max_sample_missingness"],
    conda:
        "envs/plink.yml"
    log:
        "logs/qc/plink.txt",
    shell:
        """
        plink2 --vcf {input.vcf} --pca 10 --out {params.prefix} \
            --mind {params.max_sample_missingness} \
            --allow-extra-chr --autosome-num 95 --make-bed --make-king square \
            --const-fid --bad-freqs &> {log}
        plink --vcf {input.vcf} --out {params.prefix} \
            --mind {params.max_sample_missingness} \
            --allow-extra-chr --autosome-num 95 --distance square \
            --const-fid &>> {log}
        """
1 change: 1 addition & 0 deletions workflow/modules/qc/config/config.yaml
@@ -11,5 +11,6 @@ qc_report: "" # optional: path to BAM summary stats TSV (empty

clusters: 3 # k-means clusters for PCA visualization
min_depth: 2 # minimum depth to include a sample
max_sample_missingness: 0.49 # drop sparse samples before PLINK PCA/GRM
google_api_key: "" # Google Maps API key for terrain map
exclude_scaffolds: "" # comma-separated scaffolds to exclude
1 change: 1 addition & 0 deletions workflow/rules/common.smk
@@ -103,6 +103,7 @@ DEFAULTS = {
"enabled": False,
"clusters": 3,
"min_depth": 2,
"max_sample_missingness": 0.49,
"google_api_key": "",
"exclude_scaffolds": "",
},
5 changes: 5 additions & 0 deletions workflow/schemas/config.schema.yaml
@@ -235,6 +235,11 @@ properties:
  type: number
  minimum: 0
  default: 2
max_sample_missingness:
  type: number
  minimum: 0
  maximum: 1
  default: 0.49
google_api_key:
  type: string
  default: ""
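
The constraints added to the schema (`type: number`, `minimum: 0`, `maximum: 1`) amount to a simple range check. The following is a minimal sketch of that check in plain Python, assuming the usual JSON Schema reading in which both bounds are inclusive; `check_max_sample_missingness` is a hypothetical helper for illustration and does not appear in the workflow, which relies on its schema file instead.

```python
def check_max_sample_missingness(value) -> float:
    """Validate a value against the schema's number type and 0..1 range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"max_sample_missingness must be a number, got {type(value).__name__}")
    if not 0 <= value <= 1:
        raise ValueError(f"max_sample_missingness must be within 0..1, got {value}")
    return float(value)


print(check_max_sample_missingness(0.49))  # → 0.49

try:
    check_max_sample_missingness(1.5)
except ValueError as err:
    print(f"rejected: {err}")
```

Both endpoints are accepted in this sketch: 0 drops any sample with a missing call, and 1 effectively disables the filter.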