diff --git a/02_activities/assignments/QQ_plot.png b/02_activities/assignments/QQ_plot.png
new file mode 100644
index 0000000..c0f7648
Binary files /dev/null and b/02_activities/assignments/QQ_plot.png differ
diff --git a/02_activities/assignments/assignment_1.html b/02_activities/assignments/assignment_1.html
new file mode 100644
index 0000000..afff0c1
--- /dev/null
+++ b/02_activities/assignments/assignment_1.html
@@ -0,0 +1,1177 @@
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head>
+
+<meta charset="utf-8">
+<meta name="generator" content="quarto-1.8.27">
+
+<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
+
+
+<title>Assignment #1</title>
+<style>
+code{white-space: pre-wrap;}
+span.smallcaps{font-variant: small-caps;}
+div.columns{display: flex; gap: min(4vw, 1.5em);}
+div.column{flex: auto; overflow-x: auto;}
+div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+ul.task-list{list-style: none;}
+ul.task-list li input[type="checkbox"] {
+  width: 0.8em;
+  margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ 
+  vertical-align: middle;
+}
+/* CSS for syntax highlighting */
+html { -webkit-text-size-adjust: 100%; }
+pre > code.sourceCode { white-space: pre; position: relative; }
+pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
+pre > code.sourceCode > span:empty { height: 1.2em; }
+.sourceCode { overflow: visible; }
+code.sourceCode > span { color: inherit; text-decoration: inherit; }
+div.sourceCode { margin: 1em 0; }
+pre.sourceCode { margin: 0; }
+@media screen {
+div.sourceCode { overflow: auto; }
+}
+@media print {
+pre > code.sourceCode { white-space: pre-wrap; }
+pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
+}
+pre.numberSource code
+  { counter-reset: source-line 0; }
+pre.numberSource code > span
+  { position: relative; left: -4em; counter-increment: source-line; }
+pre.numberSource code > span > a:first-child::before
+  { content: counter(source-line);
+    position: relative; left: -1em; text-align: right; vertical-align: baseline;
+    border: none; display: inline-block;
+    -webkit-touch-callout: none; -webkit-user-select: none;
+    -khtml-user-select: none; -moz-user-select: none;
+    -ms-user-select: none; user-select: none;
+    padding: 0 4px; width: 4em;
+  }
+pre.numberSource { margin-left: 3em;  padding-left: 4px; }
+div.sourceCode
+  {   }
+@media screen {
+pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
+}
+</style>
+
+
+<script src="assignment_1_files/libs/clipboard/clipboard.min.js"></script>
+<script src="assignment_1_files/libs/quarto-html/quarto.js" type="module"></script>
+<script src="assignment_1_files/libs/quarto-html/tabsets/tabsets.js" type="module"></script>
+<script src="assignment_1_files/libs/quarto-html/axe/axe-check.js" type="module"></script>
+<script src="assignment_1_files/libs/quarto-html/popper.min.js"></script>
+<script src="assignment_1_files/libs/quarto-html/tippy.umd.min.js"></script>
+<script src="assignment_1_files/libs/quarto-html/anchor.min.js"></script>
+<link href="assignment_1_files/libs/quarto-html/tippy.css" rel="stylesheet">
+<link href="assignment_1_files/libs/quarto-html/quarto-syntax-highlighting-ed96de9b727972fe78a7b5d16c58bf87.css" rel="stylesheet" id="quarto-text-highlighting-styles">
+<script src="assignment_1_files/libs/bootstrap/bootstrap.min.js"></script>
+<link href="assignment_1_files/libs/bootstrap/bootstrap-icons.css" rel="stylesheet">
+<link href="assignment_1_files/libs/bootstrap/bootstrap-d6a003b94517c951b2d65075d42fb01b.min.css" rel="stylesheet" append-hash="true" id="quarto-bootstrap" data-mode="light">
+
+
+</head>
+
+<body class="fullcontent quarto-light">
+
+<div id="quarto-content" class="page-columns page-rows-contents page-layout-article">
+
+<main class="content" id="quarto-document-content">
+
+<header id="title-block-header" class="quarto-title-block default">
+<div class="quarto-title">
+<h1 class="title">Assignment #1</h1>
+</div>
+
+
+
+<div class="quarto-title-meta">
+
+    
+  
+    
+  </div>
+  
+
+
+</header>
+
+
+<section id="assignment-1" class="level2">
+<h2 class="anchored" data-anchor-id="assignment-1">Assignment 1</h2>
+<p>You only need to write lines of code for each question. When answering questions that ask you to identify or interpret something, the length of your response doesn’t matter. For example, if the answer is just ‘yes,’ ‘no,’ or a number, you can just give that answer without adding anything else.</p>
+<p>We will go through comparable code and concepts in the live learning session. If you run into trouble, start by using the <code>help()</code> function in R to get information about the datasets and functions in question. The internet is also a great resource when coding (though note that no outside searches are required by the assignment!). If you do incorporate code from the internet, please cite the source within your code (providing a URL is sufficient).</p>
+<p>Please bring questions that you cannot work out on your own to office hours or work periods, or share them with your peers on Slack. We will work through the issue with you.</p>
+<p>You will need to install PLINK and run the analyses. Please follow the OS-specific setup guide in <a href="../../SETUP.md"><code>SETUP.md</code></a>. PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.</p>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Set up a consistent path for all chunks of code</span></span>
+<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>knitr<span class="sc">::</span>opts_knit<span class="sc">$</span><span class="fu">set</span>(<span class="at">root.dir =</span> <span class="fu">normalizePath</span>(<span class="st">"../../"</span>))</span>
+<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(data.table)</span>
+<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(ggplot2)</span>
+<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(seqminer)</span>
+<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(HardyWeinberg)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Loading required package: mice</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning: package 'mice' was built under R version 4.5.2</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>
+Attaching package: 'mice'</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>The following object is masked from 'package:stats':
+
+    filter</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>The following objects are masked from 'package:base':
+
+    cbind, rbind</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Loading required package: nnet</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Loading required package: Rsolnp</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Loading required package: shape</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(dplyr)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stderr">
+<pre><code>
+Attaching package: 'dplyr'</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>The following object is masked from 'package:HardyWeinberg':
+
+    recode</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>The following objects are masked from 'package:data.table':
+
+    between, first, last</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>The following objects are masked from 'package:stats':
+
+    filter, lag</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>The following objects are masked from 'package:base':
+
+    intersect, setdiff, setequal, union</code></pre>
+</div>
+</div>
+<section id="question-1-data-inspection" class="level4">
+<h4 class="anchored" data-anchor-id="question-1-data-inspection">Question 1: Data inspection</h4>
+<p>Before fitting any models, it is essential to understand the data. Use R or bash code to answer the following questions about the <code>gwa.qc.A1.fam</code>, <code>gwa.qc.A1.bim</code>, and <code>gwa.qc.A1.bed</code> files, available at the following Google Drive link: <a href="https://drive.google.com/drive/folders/11meVqGCY5yAyI1fh-fAlMEXQt0VmRGuz?usp=drive_link" class="uri">https://drive.google.com/drive/folders/11meVqGCY5yAyI1fh-fAlMEXQt0VmRGuz?usp=drive_link</a>. Please download all three files from this link and place them in <code>02_activities/data/</code>.</p>
+<ol type="i">
+<li>Read the .fam file. How many samples does the dataset contain?</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="co">#Read the .fam file</span></span>
+<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a><span class="fu">wc</span> <span class="at">-l</span> ./02_activities/data/gwa.qc.A1.fam</span>
+<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a><span class="co">#The dataset contains 4000 samples.</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>    4000 ./02_activities/data/gwa.qc.A1.fam</code></pre>
+</div>
+</div>
+<ol start="2" type="i">
+<li>What is the ‘variable type’ of the response variable (i.e., continuous or binary)?</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span> ./02_activities/data/gwa.qc.A1.fam</span>
+<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a><span class="co">#The response variable is continuous.</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>0   A2001   0   0   1   -0.694438129641973
+1   A2002   0   0   1   1.85384536141856
+2   A2003   0   0   1   2.08263677761584
+3   A2004   0   0   1   2.73871473943968
+4   A2005   0   0   1   1.34114035564636
+5   A2006   0   0   1   0.416778586749647
+6   A2007   0   0   1   2.38297123290054
+7   A2008   0   0   1   1.51429928826958
+8   A2009   0   0   1   0.718686390529039
+9   A2010   0   0   1   2.08904136245205</code></pre>
+</div>
+</div>
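+<p>The check in part ii can also be scripted. The following is a minimal sketch and not part of the assignment: the helper name <code>pheno_type</code> and the rule that at most two distinct non-missing values means binary are assumptions made for illustration, and the demo file reuses three rows from the output above.</p>

```shell
# Sketch: classify the phenotype column (column 6) of a PLINK .fam file.
# Assumption: at most two distinct non-missing values means "binary",
# anything else means "continuous". -9 and NA are treated as missing.
pheno_type() {
  # $1 = path to a whitespace-delimited .fam file
  awk '$6 != "-9" && $6 != "NA" { seen[$6] = 1 }
       END {
         n = 0
         for (v in seen) n++
         if (n <= 2) print "binary"; else print "continuous"
       }' "$1"
}

# Demo on three rows copied from the head output above.
demo_fam=$(mktemp)
printf '%s\n' \
  '0 A2001 0 0 1 -0.694438129641973' \
  '1 A2002 0 0 1 1.85384536141856' \
  '2 A2003 0 0 1 2.08263677761584' > "$demo_fam"
pheno_type "$demo_fam"
rm -f "$demo_fam"
```

+<p>To apply it to the real data, the call would be <code>pheno_type ./02_activities/data/gwa.qc.A1.fam</code>. A quantitative trait that happens to take only two values would be misclassified, so the rule is a heuristic only.</p>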
+<ol start="3" type="i">
+<li>Read the .bim file. How many SNPs does the dataset contain?</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="fu">wc</span> <span class="at">-l</span> ./02_activities/data/gwa.qc.A1.bim</span>
+<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a><span class="co">#There are 101083 SNPs in the dataset.</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>  101083 ./02_activities/data/gwa.qc.A1.bim</code></pre>
+</div>
+</div>
+</section>
+<section id="question-2-allele-frequency-estimation" class="level4">
+<h4 class="anchored" data-anchor-id="question-2-allele-frequency-estimation">Question 2: Allele Frequency Estimation</h4>
+<ol type="i">
+<li>Load the genotype matrix for SNPs rs1861, rs3813199, rs3128342, and rs11804831 using additive coding. What are the allele frequencies (AFs) for these four SNPs?</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="co">#Create SNP list file with 4 SNPs: rs1861, rs3813199, rs3128342, and rs11804831</span></span>
+<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a><span class="bu">printf</span> <span class="st">"%s\n"</span> rs1861 rs3813199 rs3128342 rs11804831 <span class="op">&gt;</span> ./02_activities/data/A1snplist.txt</span>
+<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a><span class="co">#Subset SNPs and output allele frequencies</span></span>
+<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.qc.A1 <span class="dt">\</span></span>
+<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a>      <span class="at">--extract</span> ./02_activities/data/A1snplist.txt <span class="dt">\</span></span>
+<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a>      <span class="at">--recode</span> A <span class="dt">\</span></span>
+<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a>      <span class="at">--freq</span> <span class="dt">\</span></span>
+<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a>      <span class="at">--out</span> ./02_activities/data/gwa_qc_A1_freq</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/gwa_qc_A1_freq.log.
+Options in effect:
+  --bfile ./02_activities/data/gwa.qc.A1
+  --export A
+  --extract ./02_activities/data/A1snplist.txt
+  --freq
+  --out ./02_activities/data/gwa_qc_A1_freq
+
+Start time: Wed Mar 18 12:26:12 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.qc.A1.fam.
+101083 variants loaded from ./02_activities/data/gwa.qc.A1.bim.
+1 quantitative phenotype loaded (4000 values).
+--extract: 4 variants remaining.
+Calculating allele frequencies... done.
+--freq: Allele frequencies (founders only) written to
+./02_activities/data/gwa_qc_A1_freq.afreq .
+4 variants remaining after main filters.
+
+--export A pass 1/1: loading... writing... done.
+--export A: ./02_activities/data/gwa_qc_A1_freq.raw written.
+End time: Wed Mar 18 12:26:12 2026</code></pre>
+</div>
+</div>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb24"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="co">#View allele frequencies</span></span>
+<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>A1freq <span class="ot">&lt;-</span> <span class="fu">fread</span>(<span class="st">"./02_activities/data/gwa_qc_A1_freq.afreq"</span>)</span>
+<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(A1freq)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>   #CHROM         ID    REF    ALT PROVISIONAL_REF? ALT_FREQS OBS_CT
+    &lt;int&gt;     &lt;char&gt; &lt;char&gt; &lt;char&gt;           &lt;char&gt;     &lt;num&gt;  &lt;int&gt;
+1:      1  rs3813199      G      A                Y 0.0569126   7942
+2:      1 rs11804831      T      C                Y 0.1543410   7924
+3:      1  rs3128342      C      A                Y 0.3051210   7928
+4:      1     rs1861      C      A                Y 0.0539859   7928</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="co">#The allele frequencies for the alternative allele of rs1861, rs3813199, rs3128342, and rs11804831 are as follows: 0.0539859, 0.0569126, 0.3051210, and 0.1543410.</span></span>
+<span id="cb26-2"><a href="#cb26-2" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb26-3"><a href="#cb26-3" aria-hidden="true" tabindex="-1"></a>A1freq<span class="sc">$</span>REF_FREQ <span class="ot">&lt;-</span> <span class="dv">1</span> <span class="sc">-</span> A1freq<span class="sc">$</span>ALT_FREQS</span>
+<span id="cb26-4"><a href="#cb26-4" aria-hidden="true" tabindex="-1"></a>A1freq</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>   #CHROM         ID    REF    ALT PROVISIONAL_REF? ALT_FREQS OBS_CT  REF_FREQ
+    &lt;int&gt;     &lt;char&gt; &lt;char&gt; &lt;char&gt;           &lt;char&gt;     &lt;num&gt;  &lt;int&gt;     &lt;num&gt;
+1:      1  rs3813199      G      A                Y 0.0569126   7942 0.9430874
+2:      1 rs11804831      T      C                Y 0.1543410   7924 0.8456590
+3:      1  rs3128342      C      A                Y 0.3051210   7928 0.6948790
+4:      1     rs1861      C      A                Y 0.0539859   7928 0.9460141</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a><span class="co">#The allele frequencies for the reference allele of rs1861, rs3813199, rs3128342, and rs11804831 are as follows: 0.9460141, 0.9430874, 0.6948790, and 0.8456590.</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+</div>
+<ol start="2" type="i">
+<li>What are the minor allele frequencies (MAFs) for these four SNPs?</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a>A1freq<span class="sc">$</span>MAF <span class="ot">&lt;-</span> <span class="fu">pmin</span>(A1freq<span class="sc">$</span>ALT_FREQS, <span class="dv">1</span> <span class="sc">-</span> A1freq<span class="sc">$</span>ALT_FREQS)</span>
+<span id="cb29-2"><a href="#cb29-2" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb29-3"><a href="#cb29-3" aria-hidden="true" tabindex="-1"></a>A1freq[, .(ID, <span class="at">AF =</span> ALT_FREQS, MAF)]</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>           ID        AF       MAF
+       &lt;char&gt;     &lt;num&gt;     &lt;num&gt;
+1:  rs3813199 0.0569126 0.0569126
+2: rs11804831 0.1543410 0.1543410
+3:  rs3128342 0.3051210 0.3051210
+4:     rs1861 0.0539859 0.0539859</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><a href="#cb31-1" aria-hidden="true" tabindex="-1"></a><span class="co">#The minor allele frequencies for rs1861, rs3813199, rs3128342, and rs11804831 are as follows: 0.0539859, 0.0569126, 0.3051210, and 0.1543410.</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+</div>
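+<p>The <code>pmin()</code> step above takes the smaller of the ALT frequency and its complement. As a hedged sketch of the same rule outside R, the hypothetical helper <code>maf</code> below applies it to the four <code>ALT_FREQS</code> values reported above, plus one made-up value above 0.5 to show the case where the ALT allele is the major allele.</p>

```shell
# Sketch: minor allele frequency as min(ALT_FREQ, 1 - ALT_FREQ).
maf() {
  # $1 = alternate-allele frequency between 0 and 1
  awk -v p="$1" 'BEGIN { q = 1 - p; m = (p < q) ? p : q; printf "%.7f\n", m }'
}

# ALT_FREQS values reported above for the four SNPs.
for pair in rs1861:0.0539859 rs3813199:0.0569126 rs3128342:0.3051210 rs11804831:0.1543410; do
  printf '%s\t%s\n' "${pair%%:*}" "$(maf "${pair#*:}")"
done

# Hypothetical frequency above 0.5: the MAF is the complement.
maf 0.8
```

+<p>For all four SNPs the ALT frequency is below 0.5, so the MAF equals the ALT frequency, which agrees with the table above.</p>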
+</section>
+<section id="question-3-hardyweinberg-equilibrium-hwe-test" class="level4">
+<h4 class="anchored" data-anchor-id="question-3-hardyweinberg-equilibrium-hwe-test">Question 3: Hardy–Weinberg Equilibrium (HWE) Test</h4>
+<ol type="i">
+<li>Conduct the Hardy–Weinberg Equilibrium (HWE) test for all SNPs in the .bim file. Then, load the file containing the HWE p-value results and display the first few rows of the resulting data frame.</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb32"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb32-1"><a href="#cb32-1" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.qc.A1 <span class="at">--hardy</span> <span class="at">--out</span> ./02_activities/data/gwa_qc_A1_hwe</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/gwa_qc_A1_hwe.log.
+Options in effect:
+  --bfile ./02_activities/data/gwa.qc.A1
+  --hardy
+  --out ./02_activities/data/gwa_qc_A1_hwe
+
+Start time: Wed Mar 18 12:26:12 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.qc.A1.fam.
+101083 variants loaded from ./02_activities/data/gwa.qc.A1.bim.
+1 quantitative phenotype loaded (4000 values).
+Calculating allele frequencies... done.
+--hardy: Autosomal Hardy-Weinberg report (founders only) written to
+./02_activities/data/gwa_qc_A1_hwe.hardy .
+End time: Wed Mar 18 12:26:12 2026</code></pre>
+</div>
+</div>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb34"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1"><a href="#cb34-1" aria-hidden="true" tabindex="-1"></a>A1hwe <span class="ot">&lt;-</span> <span class="fu">fread</span>(<span class="st">"./02_activities/data/gwa_qc_A1_hwe.hardy"</span>)</span>
+<span id="cb34-2"><a href="#cb34-2" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(A1hwe)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>   #CHROM         ID     A1     AX HOM_A1_CT HET_A1_CT TWO_AX_CT O(HET_A1)
+    &lt;int&gt;     &lt;char&gt; &lt;char&gt; &lt;char&gt;     &lt;int&gt;     &lt;int&gt;     &lt;int&gt;     &lt;num&gt;
+1:      1  rs3737728      G      A      1713      1841       428  0.462330
+2:      1  rs1320565      C      T      3368       589        19  0.148139
+3:      1  rs3813199      G      A      3531       428        12  0.107781
+4:      1 rs11804831      T      C      2820      1061        81  0.267794
+5:      1  rs3766178      T      C      2391      1378       214  0.345970
+6:      1  rs3128342      C      A      1927      1655       382  0.417508
+   E(HET_A1)         P
+       &lt;num&gt;     &lt;num&gt;
+1:  0.447932 0.0437892
+2:  0.145262 0.2734290
+3:  0.107347 1.0000000
+4:  0.261040 0.1133540
+5:  0.350629 0.4158770
+6:  0.424044 0.3302730</code></pre>
+</div>
+</div>
+<ol start="2" type="i">
+<li>What are the HWE p-values for SNPs rs1861, rs3813199, rs3128342, and rs11804831?</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb36"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb36-1"><a href="#cb36-1" aria-hidden="true" tabindex="-1"></a><span class="co">#Subset SNPs and run the HWE test</span></span>
+<span id="cb36-2"><a href="#cb36-2" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.qc.A1 <span class="dt">\</span></span>
+<span id="cb36-3"><a href="#cb36-3" aria-hidden="true" tabindex="-1"></a>      <span class="at">--extract</span> ./02_activities/data/A1snplist.txt <span class="dt">\</span></span>
+<span id="cb36-4"><a href="#cb36-4" aria-hidden="true" tabindex="-1"></a>      <span class="at">--hardy</span> <span class="dt">\</span></span>
+<span id="cb36-5"><a href="#cb36-5" aria-hidden="true" tabindex="-1"></a>      <span class="at">--out</span> ./02_activities/data/gwa_qc_A1_hwe_snplist</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/gwa_qc_A1_hwe_snplist.log.
+Options in effect:
+  --bfile ./02_activities/data/gwa.qc.A1
+  --extract ./02_activities/data/A1snplist.txt
+  --hardy
+  --out ./02_activities/data/gwa_qc_A1_hwe_snplist
+
+Start time: Wed Mar 18 12:26:12 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.qc.A1.fam.
+101083 variants loaded from ./02_activities/data/gwa.qc.A1.bim.
+1 quantitative phenotype loaded (4000 values).
+--extract: 4 variants remaining.
+Calculating allele frequencies... done.
+--hardy: Autosomal Hardy-Weinberg report (founders only) written to
+./02_activities/data/gwa_qc_A1_hwe_snplist.hardy .
+End time: Wed Mar 18 12:26:12 2026</code></pre>
+</div>
+</div>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb38"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1"><a href="#cb38-1" aria-hidden="true" tabindex="-1"></a><span class="co">#View HWE test results</span></span>
+<span id="cb38-2"><a href="#cb38-2" aria-hidden="true" tabindex="-1"></a>A1hwesnplist <span class="ot">&lt;-</span> <span class="fu">fread</span>(<span class="st">"./02_activities/data/gwa_qc_A1_hwe_snplist.hardy"</span>)</span>
+<span id="cb38-3"><a href="#cb38-3" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(A1hwesnplist)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>   #CHROM         ID     A1     AX HOM_A1_CT HET_A1_CT TWO_AX_CT O(HET_A1)
+    &lt;int&gt;     &lt;char&gt; &lt;char&gt; &lt;char&gt;     &lt;int&gt;     &lt;int&gt;     &lt;int&gt;     &lt;num&gt;
+1:      1  rs3813199      G      A      3531       428        12  0.107781
+2:      1 rs11804831      T      C      2820      1061        81  0.267794
+3:      1  rs3128342      C      A      1927      1655       382  0.417508
+4:      1     rs1861      C      A      3551       398        15  0.100404
+   E(HET_A1)        P
+       &lt;num&gt;    &lt;num&gt;
+1:  0.107347 1.000000
+2:  0.261040 0.113354
+3:  0.424044 0.330273
+4:  0.102143 0.274719</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb40"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1"><a href="#cb40-1" aria-hidden="true" tabindex="-1"></a><span class="co">#The HWE p-values for rs1861, rs3813199, rs3128342, and rs11804831 are the following: 0.274719, 1.000000, 0.330273, and 0.113354.</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+</div>
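+<p>The <code>O(HET_A1)</code> and <code>E(HET_A1)</code> columns can be reproduced by hand from the three genotype counts. The sketch below is illustrative only: the helper name <code>het_summary</code> is hypothetical, and it computes the observed heterozygote fraction and the Hardy–Weinberg expectation 2pq, not the p-value in the <code>P</code> column, which PLINK obtains from its own test.</p>

```shell
# Sketch: observed vs expected heterozygosity from three genotype counts.
# Arguments: the HOM_A1_CT, HET_A1_CT and TWO_AX_CT columns, in that order.
het_summary() {
  awk -v hom="$1" -v het="$2" -v alt="$3" 'BEGIN {
    n = hom + het + alt                 # genotyped samples
    q = (het + 2 * alt) / (2 * n)       # frequency of the non-A1 allele
    p = 1 - q                           # frequency of the A1 allele
    printf "obs_het=%.6f exp_het=%.6f\n", het / n, 2 * p * q
  }'
}

# Genotype counts reported above for rs1861.
het_summary 3551 398 15
# prints: obs_het=0.100404 exp_het=0.102143
```

+<p>For rs1861 this matches the table: 398 / 3964 = 0.100404 observed, and the non-A1 allele frequency (398 + 2 × 15) / 7928 = 0.0539859 equals the <code>ALT_FREQS</code> value from Question 2.</p>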
+</section>
+<section id="question-4-genetic-association-test" class="level4">
+<h4 class="anchored" data-anchor-id="question-4-genetic-association-test">Question 4: Genetic Association Test</h4>
+<ol type="i">
+<li>Conduct a linear regression to test the association between SNP rs1861 and the phenotype. What is the p-value?</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb41"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb41-1"><a href="#cb41-1" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.qc.A1 <span class="dt">\</span></span>
+<span id="cb41-2"><a href="#cb41-2" aria-hidden="true" tabindex="-1"></a>      <span class="at">--extract</span> ./02_activities/data/A1snplist.txt <span class="dt">\</span></span>
+<span id="cb41-3"><a href="#cb41-3" aria-hidden="true" tabindex="-1"></a>      <span class="at">--recode</span> A <span class="dt">\</span></span>
+<span id="cb41-4"><a href="#cb41-4" aria-hidden="true" tabindex="-1"></a>      <span class="at">--out</span> ./02_activities/data/geno.A1.additive</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/geno.A1.additive.log.
+Options in effect:
+  --bfile ./02_activities/data/gwa.qc.A1
+  --export A
+  --extract ./02_activities/data/A1snplist.txt
+  --out ./02_activities/data/geno.A1.additive
+
+Start time: Wed Mar 18 12:26:12 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.qc.A1.fam.
+101083 variants loaded from ./02_activities/data/gwa.qc.A1.bim.
+1 quantitative phenotype loaded (4000 values).
+--extract: 4 variants remaining.
+4 variants remaining after main filters.
+
+--export A pass 1/1: loading... writing... done.
+--export A: ./02_activities/data/geno.A1.additive.raw written.
+End time: Wed Mar 18 12:26:12 2026</code></pre>
+</div>
+</div>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb43"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb43-1"><a href="#cb43-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Load genotype data</span></span>
+<span id="cb43-2"><a href="#cb43-2" aria-hidden="true" tabindex="-1"></a>geno_A1_subset <span class="ot">&lt;-</span> <span class="fu">fread</span>(<span class="st">"./02_activities/data/geno.A1.additive.raw"</span>)</span>
+<span id="cb43-3"><a href="#cb43-3" aria-hidden="true" tabindex="-1"></a>geno_A1_subset<span class="ot">=</span>geno_A1_subset[<span class="sc">!</span><span class="fu">is.na</span>(rs1861_C), ]</span>
+<span id="cb43-4"><a href="#cb43-4" aria-hidden="true" tabindex="-1"></a>geno_A1_subset<span class="sc">$</span>PHENOTYPE<span class="ot">=</span>geno_A1_subset<span class="sc">$</span>PHENOTYPE<span class="dv">-1</span></span>
+<span id="cb43-5"><a href="#cb43-5" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(geno_A1_subset)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>     FID    IID   PAT   MAT   SEX PHENOTYPE rs3813199_G rs11804831_T
+   &lt;int&gt; &lt;char&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt;     &lt;num&gt;       &lt;int&gt;        &lt;int&gt;
+1:     1  A2002     0     0     1   0.85385           2            2
+2:     2  A2003     0     0     1   1.08264           2            1
+3:     3  A2004     0     0     1   1.73871           2            2
+4:     4  A2005     0     0     1   0.34114           2            1
+5:     6  A2007     0     0     1   1.38297           2            2
+6:     7  A2008     0     0     1   0.51430           2            2
+   rs3128342_C rs1861_C
+         &lt;int&gt;    &lt;int&gt;
+1:           2        2
+2:           2        2
+3:           1        2
+4:           1        2
+5:           2        2
+6:           1        2</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb45"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb45-1"><a href="#cb45-1" aria-hidden="true" tabindex="-1"></a>geno_rs1861 <span class="ot">&lt;-</span> geno_A1_subset <span class="sc">%&gt;%</span></span>
+<span id="cb45-2"><a href="#cb45-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">select</span>(FID, IID, PHENOTYPE, rs1861_C) <span class="sc">%&gt;%</span></span>
+<span id="cb45-3"><a href="#cb45-3" aria-hidden="true" tabindex="-1"></a>  <span class="fu">na.omit</span>()</span>
+<span id="cb45-4"><a href="#cb45-4" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb45-5"><a href="#cb45-5" aria-hidden="true" tabindex="-1"></a><span class="co">#Conduct a linear regression model between rs1861 and the phenotype</span></span>
+<span id="cb45-6"><a href="#cb45-6" aria-hidden="true" tabindex="-1"></a>lm_rs1861 <span class="ot">&lt;-</span> <span class="fu">lm</span>(PHENOTYPE <span class="sc">~</span> rs1861_C, <span class="at">data =</span> geno_rs1861)</span>
+<span id="cb45-7"><a href="#cb45-7" aria-hidden="true" tabindex="-1"></a><span class="fu">summary</span>(lm_rs1861)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>
+Call:
+lm(formula = PHENOTYPE ~ rs1861_C, data = geno_rs1861)
+
+Residuals:
+    Min      1Q  Median      3Q     Max 
+-3.5439 -0.6850  0.0021  0.6993  3.3268 
+
+Coefficients:
+            Estimate Std. Error t value Pr(&gt;|t|)    
+(Intercept) -0.94762    0.09486  -9.989   &lt;2e-16 ***
+rs1861_C     0.97382    0.04943  19.703   &lt;2e-16 ***
+---
+Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+
+Residual standard error: 1.003 on 3962 degrees of freedom
+Multiple R-squared:  0.08924,   Adjusted R-squared:  0.08901 
+F-statistic: 388.2 on 1 and 3962 DF,  p-value: &lt; 2.2e-16</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb47"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb47-1"><a href="#cb47-1" aria-hidden="true" tabindex="-1"></a><span class="co">#The p-value is less than 2 × 10⁻¹⁶. There is a statistically significant relationship between rs1861 and the phenotype.</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+</div>
+<ol start="2" type="i">
+<li>How would you interpret the beta coefficient from this regression?</li>
+</ol>
+<pre><code>We performed a linear regression to test the association between the rs1861 genotype and the continuous phenotype. The beta coefficient is 0.97382. Under the additive model, this means that each additional copy of the C allele at rs1861 is associated with an average increase of 0.97382 units in the phenotype.</code></pre>
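+<p>To make the interpretation concrete, the fitted line can be evaluated at each possible genotype. The sketch below is illustrative: it copies the intercept and slope from the regression summary above, and the variable names are hypothetical.</p>
+<pre><code># Coefficients copied from summary(lm_rs1861)
+intercept &lt;- -0.94762
+beta      &lt;-  0.97382
+
+# Number of C alleles carried at rs1861 (additive coding)
+c_alleles &lt;- 0:2
+
+# Fitted mean phenotype for each genotype
+predicted &lt;- intercept + beta * c_alleles
+data.frame(c_alleles, predicted)</code></pre>
+<p>The fitted means are -0.94762, 0.02620, and 1.00002, so each step from 0 to 1 to 2 copies of C adds the same 0.97382 units.</p>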
+<ol start="3" type="i">
+<li>Plot the scatterplot of phenotype versus the genotype of SNP rs1861. Add the regression line to the plot.</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb49"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb49-1"><a href="#cb49-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a new data frame containing a sequence of genotype values</span></span>
+<span id="cb49-2"><a href="#cb49-2" aria-hidden="true" tabindex="-1"></a> geno_range <span class="ot">&lt;-</span> <span class="fu">data.frame</span>(</span>
+<span id="cb49-3"><a href="#cb49-3" aria-hidden="true" tabindex="-1"></a>   <span class="at">rs1861_C =</span> <span class="fu">seq</span>(<span class="dv">0</span>, <span class="dv">2</span>, <span class="at">length.out =</span> <span class="dv">100</span>)</span>
+<span id="cb49-4"><a href="#cb49-4" aria-hidden="true" tabindex="-1"></a> )</span>
+<span id="cb49-5"><a href="#cb49-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Use the fitted linear regression model to predict the mean phenotype</span></span>
+<span id="cb49-6"><a href="#cb49-6" aria-hidden="true" tabindex="-1"></a><span class="co"># for each genotype value in geno_range</span></span>
+<span id="cb49-7"><a href="#cb49-7" aria-hidden="true" tabindex="-1"></a> geno_range<span class="sc">$</span>predicted_prob <span class="ot">&lt;-</span> <span class="fu">predict</span>(lm_rs1861, <span class="at">newdata =</span> geno_range, <span class="at">type =</span> <span class="st">"response"</span>)</span>
+<span id="cb49-8"><a href="#cb49-8" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb49-9"><a href="#cb49-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Start from the observed genotype dataset</span></span>
+<span id="cb49-10"><a href="#cb49-10" aria-hidden="true" tabindex="-1"></a>count_data <span class="ot">&lt;-</span> geno_A1_subset <span class="sc">%&gt;%</span></span>
+<span id="cb49-11"><a href="#cb49-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Remove rows with missing genotype or missing phenotype</span></span>
+<span id="cb49-12"><a href="#cb49-12" aria-hidden="true" tabindex="-1"></a>  <span class="fu">filter</span>(<span class="sc">!</span><span class="fu">is.na</span>(rs1861_C), <span class="sc">!</span><span class="fu">is.na</span>(PHENOTYPE)) <span class="sc">%&gt;%</span></span>
+<span id="cb49-13"><a href="#cb49-13" aria-hidden="true" tabindex="-1"></a> <span class="co"># Group data by genotype value and phenotype value</span></span>
+<span id="cb49-14"><a href="#cb49-14" aria-hidden="true" tabindex="-1"></a>  <span class="fu">group_by</span>(rs1861_C, PHENOTYPE) <span class="sc">%&gt;%</span></span>
+<span id="cb49-15"><a href="#cb49-15" aria-hidden="true" tabindex="-1"></a><span class="co"># Count how many observations fall into each genotype-phenotype combination</span></span>
+<span id="cb49-16"><a href="#cb49-16" aria-hidden="true" tabindex="-1"></a>  <span class="fu">summarise</span>(<span class="at">count =</span> <span class="fu">n</span>(), <span class="at">.groups =</span> <span class="st">"drop"</span>)</span>
+<span id="cb49-17"><a href="#cb49-17" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb49-18"><a href="#cb49-18" aria-hidden="true" tabindex="-1"></a><span class="co"># Start building the plot</span></span>
+<span id="cb49-19"><a href="#cb49-19" aria-hidden="true" tabindex="-1"></a>p1<span class="ot">=</span><span class="fu">ggplot</span>() <span class="sc">+</span></span>
+<span id="cb49-20"><a href="#cb49-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Add points showing observed genotype-phenotype combinations</span></span>
+<span id="cb49-21"><a href="#cb49-21" aria-hidden="true" tabindex="-1"></a>  <span class="fu">geom_point</span>(</span>
+<span id="cb49-22"><a href="#cb49-22" aria-hidden="true" tabindex="-1"></a>    <span class="co"># Use the summarized count_data for the points</span></span>
+<span id="cb49-23"><a href="#cb49-23" aria-hidden="true" tabindex="-1"></a>    <span class="at">data =</span> count_data,</span>
+<span id="cb49-24"><a href="#cb49-24" aria-hidden="true" tabindex="-1"></a>    <span class="co"># Map genotype to x-axis, phenotype to y-axis, and count to point size</span></span>
+<span id="cb49-25"><a href="#cb49-25" aria-hidden="true" tabindex="-1"></a>    <span class="fu">aes</span>(<span class="at">x =</span> rs1861_C, <span class="at">y =</span> PHENOTYPE, <span class="at">size =</span> count),</span>
+<span id="cb49-26"><a href="#cb49-26" aria-hidden="true" tabindex="-1"></a>    <span class="at">color =</span> <span class="st">"blue"</span></span>
+<span id="cb49-27"><a href="#cb49-27" aria-hidden="true" tabindex="-1"></a>  ) <span class="sc">+</span></span>
+<span id="cb49-28"><a href="#cb49-28" aria-hidden="true" tabindex="-1"></a>  <span class="co"># Add a regression line showing the predicted phenotype across genotype values</span></span>
+<span id="cb49-29"><a href="#cb49-29" aria-hidden="true" tabindex="-1"></a>  <span class="fu">geom_line</span>(<span class="co"># Use the prediction data frame for the line</span></span>
+<span id="cb49-30"><a href="#cb49-30" aria-hidden="true" tabindex="-1"></a>    <span class="at">data =</span> geno_range,</span>
+<span id="cb49-31"><a href="#cb49-31" aria-hidden="true" tabindex="-1"></a>    <span class="fu">aes</span>(<span class="at">x =</span> rs1861_C, <span class="at">y =</span> predicted_prob),</span>
+<span id="cb49-32"><a href="#cb49-32" aria-hidden="true" tabindex="-1"></a>    <span class="at">color =</span> <span class="st">"red"</span>,</span>
+<span id="cb49-33"><a href="#cb49-33" aria-hidden="true" tabindex="-1"></a>    <span class="co"># Set line thickness</span></span>
+<span id="cb49-34"><a href="#cb49-34" aria-hidden="true" tabindex="-1"></a>    <span class="at">linewidth =</span> <span class="fl">1.2</span></span>
+<span id="cb49-35"><a href="#cb49-35" aria-hidden="true" tabindex="-1"></a>  ) <span class="sc">+</span></span>
+<span id="cb49-36"><a href="#cb49-36" aria-hidden="true" tabindex="-1"></a>  <span class="co"># Control the size range of the points</span></span>
+<span id="cb49-37"><a href="#cb49-37" aria-hidden="true" tabindex="-1"></a>  <span class="fu">scale_size_continuous</span>(<span class="at">range =</span> <span class="fu">c</span>(<span class="dv">2</span>, <span class="dv">10</span>)) <span class="sc">+</span></span>
+<span id="cb49-38"><a href="#cb49-38" aria-hidden="true" tabindex="-1"></a>  <span class="fu">labs</span>(</span>
+<span id="cb49-39"><a href="#cb49-39" aria-hidden="true" tabindex="-1"></a>    <span class="at">title =</span> <span class="st">"Linear Regression: PHENOTYPE ~ Genotype"</span>,</span>
+<span id="cb49-40"><a href="#cb49-40" aria-hidden="true" tabindex="-1"></a>    <span class="at">x =</span> <span class="st">"Genotype (0/1/2)"</span>,</span>
+<span id="cb49-41"><a href="#cb49-41" aria-hidden="true" tabindex="-1"></a>    <span class="at">y =</span> <span class="st">"Phenotype "</span>,</span>
+<span id="cb49-42"><a href="#cb49-42" aria-hidden="true" tabindex="-1"></a>    <span class="at">size =</span> <span class="st">"Count"</span></span>
+<span id="cb49-43"><a href="#cb49-43" aria-hidden="true" tabindex="-1"></a>  ) <span class="sc">+</span></span>
+<span id="cb49-44"><a href="#cb49-44" aria-hidden="true" tabindex="-1"></a>  <span class="co"># Restrict the visible x-axis range</span></span>
+<span id="cb49-45"><a href="#cb49-45" aria-hidden="true" tabindex="-1"></a>  <span class="fu">coord_cartesian</span>(<span class="at">xlim =</span> <span class="fu">c</span>(<span class="sc">-</span><span class="fl">0.5</span>, <span class="fl">2.5</span>)) <span class="sc">+</span></span>
+<span id="cb49-46"><a href="#cb49-46" aria-hidden="true" tabindex="-1"></a>   <span class="co"># Apply a minimal theme for a clean appearance</span></span>
+<span id="cb49-47"><a href="#cb49-47" aria-hidden="true" tabindex="-1"></a>  <span class="fu">theme_minimal</span>()</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb51"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb51-1"><a href="#cb51-1" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(p1)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output-display">
+<div>
+<figure class="figure">
+<p><img src="assignment_1_files/figure-html/unnamed-chunk-13-1.png" class="img-fluid figure-img" width="672"></p>
+</figure>
+</div>
+</div>
+</div>
+<ol start="4" type="i">
+<li>Convert the genotype coding for rs1861 to recessive coding.</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb52"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb52-1"><a href="#cb52-1" aria-hidden="true" tabindex="-1"></a>geno_A1_subset_rec <span class="ot">&lt;-</span> geno_A1_subset <span class="co"># make a copy</span></span>
+<span id="cb52-2"><a href="#cb52-2" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb52-3"><a href="#cb52-3" aria-hidden="true" tabindex="-1"></a><span class="co">#change into recessive coding</span></span>
+<span id="cb52-4"><a href="#cb52-4" aria-hidden="true" tabindex="-1"></a>geno_A1_subset_rec[, <span class="dv">7</span><span class="sc">:</span><span class="fu">ncol</span>(geno_A1_subset_rec)] <span class="ot">&lt;-</span> <span class="fu">as.data.frame</span>(</span>
+<span id="cb52-5"><a href="#cb52-5" aria-hidden="true" tabindex="-1"></a>  <span class="fu">lapply</span>(geno_A1_subset[, <span class="dv">7</span><span class="sc">:</span><span class="fu">ncol</span>(geno_A1_subset)], <span class="cf">function</span>(x) <span class="fu">ifelse</span>(x <span class="sc">==</span> <span class="dv">2</span>, <span class="dv">1</span>, <span class="dv">0</span>))</span>
+<span id="cb52-6"><a href="#cb52-6" aria-hidden="true" tabindex="-1"></a>)</span>
+<span id="cb52-7"><a href="#cb52-7" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb52-8"><a href="#cb52-8" aria-hidden="true" tabindex="-1"></a><span class="co">#Subset to just rs1861</span></span>
+<span id="cb52-9"><a href="#cb52-9" aria-hidden="true" tabindex="-1"></a>geno_rs1861_rec <span class="ot">&lt;-</span> geno_A1_subset_rec <span class="sc">%&gt;%</span></span>
+<span id="cb52-10"><a href="#cb52-10" aria-hidden="true" tabindex="-1"></a>  <span class="fu">select</span>(FID, IID, PHENOTYPE, rs1861_C) <span class="sc">%&gt;%</span></span>
+<span id="cb52-11"><a href="#cb52-11" aria-hidden="true" tabindex="-1"></a>  <span class="fu">na.omit</span>()</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+</div>
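+<p>The recoding above maps individuals carrying two copies of the counted allele (here C, as indicated by the <code>rs1861_C</code> column name) to 1 and everyone else to 0. A minimal sketch on a made-up genotype vector, with hypothetical variable names, shows the mapping and that missing genotypes stay missing:</p>
+<pre><code># Hypothetical additive genotypes (copies of the counted allele), with one missing value
+additive &lt;- c(0L, 1L, 2L, NA, 2L)
+
+# Recessive coding: 1 only when both copies are the counted allele
+recessive &lt;- as.integer(additive == 2)
+recessive</code></pre>
+<p>The result is 0, 0, 1, NA, 1: heterozygotes are grouped with individuals carrying no copies, and the missing value is preserved rather than silently set to 0.</p>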
+<ol start="5" type="i">
+<li>Conduct a linear regression to test the association between the recessive-coded rs1861 and the phenotype. What is the p-value?</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb53"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb53-1"><a href="#cb53-1" aria-hidden="true" tabindex="-1"></a><span class="co">#Conduct a linear regression model between rs1861 and the phenotype under recessive coding</span></span>
+<span id="cb53-2"><a href="#cb53-2" aria-hidden="true" tabindex="-1"></a>lm_rs1861_rec <span class="ot">&lt;-</span> <span class="fu">lm</span>(PHENOTYPE <span class="sc">~</span> rs1861_C, <span class="at">data =</span> geno_rs1861_rec)</span>
+<span id="cb53-3"><a href="#cb53-3" aria-hidden="true" tabindex="-1"></a><span class="fu">summary</span>(lm_rs1861_rec)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>
+Call:
+lm(formula = PHENOTYPE ~ rs1861_C, data = geno_rs1861_rec)
+
+Residuals:
+    Min      1Q  Median      3Q     Max 
+-3.5437 -0.6892  0.0015  0.7016  3.3270 
+
+Coefficients:
+             Estimate Std. Error t value Pr(&gt;|t|)    
+(Intercept) -0.007686   0.049446  -0.155    0.876    
+rs1861_C     1.007543   0.052242  19.286   &lt;2e-16 ***
+---
+Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+
+Residual standard error: 1.005 on 3962 degrees of freedom
+Multiple R-squared:  0.08582,   Adjusted R-squared:  0.08559 
+F-statistic:   372 on 1 and 3962 DF,  p-value: &lt; 2.2e-16</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb55"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb55-1"><a href="#cb55-1" aria-hidden="true" tabindex="-1"></a><span class="co">#The p-value is less than 2 × 10⁻¹⁶. There is a statistically significant relationship between the recessive-coded rs1861 genotype and the phenotype.</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+</div>
+<ol start="6" type="i">
+<li>Plot the scatterplot of phenotype versus the recessive-coded genotype of rs1861. Add the regression line to the plot.</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb56"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb56-1"><a href="#cb56-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a new data frame containing a sequence of genotype values</span></span>
+<span id="cb56-2"><a href="#cb56-2" aria-hidden="true" tabindex="-1"></a> geno_range_rec <span class="ot">&lt;-</span> <span class="fu">data.frame</span>(</span>
+<span id="cb56-3"><a href="#cb56-3" aria-hidden="true" tabindex="-1"></a>   <span class="at">rs1861_C =</span> <span class="fu">seq</span>(<span class="dv">0</span>, <span class="dv">2</span>, <span class="at">length.out =</span> <span class="dv">100</span>)</span>
+<span id="cb56-4"><a href="#cb56-4" aria-hidden="true" tabindex="-1"></a> )</span>
+<span id="cb56-5"><a href="#cb56-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Use the fitted linear regression model to predict the mean phenotype</span></span>
+<span id="cb56-6"><a href="#cb56-6" aria-hidden="true" tabindex="-1"></a><span class="co"># for each genotype value in geno_range_rec</span></span>
+<span id="cb56-7"><a href="#cb56-7" aria-hidden="true" tabindex="-1"></a> geno_range_rec<span class="sc">$</span>predicted_prob <span class="ot">&lt;-</span> <span class="fu">predict</span>(lm_rs1861_rec, <span class="at">newdata =</span> geno_range_rec, <span class="at">type =</span> <span class="st">"response"</span>)</span>
+<span id="cb56-8"><a href="#cb56-8" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb56-9"><a href="#cb56-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Start from the observed genotype dataset</span></span>
+<span id="cb56-10"><a href="#cb56-10" aria-hidden="true" tabindex="-1"></a>count_data_rec <span class="ot">&lt;-</span> geno_A1_subset_rec <span class="sc">%&gt;%</span></span>
+<span id="cb56-11"><a href="#cb56-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Remove rows with missing genotype or missing phenotype</span></span>
+<span id="cb56-12"><a href="#cb56-12" aria-hidden="true" tabindex="-1"></a>  <span class="fu">filter</span>(<span class="sc">!</span><span class="fu">is.na</span>(rs1861_C), <span class="sc">!</span><span class="fu">is.na</span>(PHENOTYPE)) <span class="sc">%&gt;%</span></span>
+<span id="cb56-13"><a href="#cb56-13" aria-hidden="true" tabindex="-1"></a> <span class="co"># Group data by genotype value and phenotype value</span></span>
+<span id="cb56-14"><a href="#cb56-14" aria-hidden="true" tabindex="-1"></a>  <span class="fu">group_by</span>(rs1861_C, PHENOTYPE) <span class="sc">%&gt;%</span></span>
+<span id="cb56-15"><a href="#cb56-15" aria-hidden="true" tabindex="-1"></a><span class="co"># Count how many observations fall into each genotype-phenotype combination</span></span>
+<span id="cb56-16"><a href="#cb56-16" aria-hidden="true" tabindex="-1"></a>  <span class="fu">summarise</span>(<span class="at">count =</span> <span class="fu">n</span>(), <span class="at">.groups =</span> <span class="st">"drop"</span>)</span>
+<span id="cb56-17"><a href="#cb56-17" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb56-18"><a href="#cb56-18" aria-hidden="true" tabindex="-1"></a><span class="co"># Start building the plot</span></span>
+<span id="cb56-19"><a href="#cb56-19" aria-hidden="true" tabindex="-1"></a>p2<span class="ot">=</span><span class="fu">ggplot</span>() <span class="sc">+</span></span>
+<span id="cb56-20"><a href="#cb56-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Add points showing observed genotype-phenotype combinations</span></span>
+<span id="cb56-21"><a href="#cb56-21" aria-hidden="true" tabindex="-1"></a>  <span class="fu">geom_point</span>(</span>
+<span id="cb56-22"><a href="#cb56-22" aria-hidden="true" tabindex="-1"></a>    <span class="co"># Use the summarized count_data_rec for the points</span></span>
+<span id="cb56-23"><a href="#cb56-23" aria-hidden="true" tabindex="-1"></a>    <span class="at">data =</span> count_data_rec,</span>
+<span id="cb56-24"><a href="#cb56-24" aria-hidden="true" tabindex="-1"></a>    <span class="co"># Map genotype to x-axis, phenotype to y-axis, and count to point size</span></span>
+<span id="cb56-25"><a href="#cb56-25" aria-hidden="true" tabindex="-1"></a>    <span class="fu">aes</span>(<span class="at">x =</span> rs1861_C, <span class="at">y =</span> PHENOTYPE, <span class="at">size =</span> count),</span>
+<span id="cb56-26"><a href="#cb56-26" aria-hidden="true" tabindex="-1"></a>    <span class="at">color =</span> <span class="st">"blue"</span></span>
+<span id="cb56-27"><a href="#cb56-27" aria-hidden="true" tabindex="-1"></a>  ) <span class="sc">+</span></span>
+<span id="cb56-28"><a href="#cb56-28" aria-hidden="true" tabindex="-1"></a>  <span class="co"># Add a regression line showing the predicted phenotype across genotype values</span></span>
+<span id="cb56-29"><a href="#cb56-29" aria-hidden="true" tabindex="-1"></a>  <span class="fu">geom_line</span>(<span class="co"># Use the prediction data frame for the line</span></span>
+<span id="cb56-30"><a href="#cb56-30" aria-hidden="true" tabindex="-1"></a>    <span class="at">data =</span> geno_range_rec,</span>
+<span id="cb56-31"><a href="#cb56-31" aria-hidden="true" tabindex="-1"></a>    <span class="fu">aes</span>(<span class="at">x =</span> rs1861_C, <span class="at">y =</span> predicted_prob),</span>
+<span id="cb56-32"><a href="#cb56-32" aria-hidden="true" tabindex="-1"></a>    <span class="at">color =</span> <span class="st">"red"</span>,</span>
+<span id="cb56-33"><a href="#cb56-33" aria-hidden="true" tabindex="-1"></a>    <span class="co"># Set line thickness</span></span>
+<span id="cb56-34"><a href="#cb56-34" aria-hidden="true" tabindex="-1"></a>    <span class="at">linewidth =</span> <span class="fl">1.2</span></span>
+<span id="cb56-35"><a href="#cb56-35" aria-hidden="true" tabindex="-1"></a>  ) <span class="sc">+</span></span>
+<span id="cb56-36"><a href="#cb56-36" aria-hidden="true" tabindex="-1"></a>  <span class="co"># Control the size range of the points</span></span>
+<span id="cb56-37"><a href="#cb56-37" aria-hidden="true" tabindex="-1"></a>  <span class="fu">scale_size_continuous</span>(<span class="at">range =</span> <span class="fu">c</span>(<span class="dv">2</span>, <span class="dv">10</span>)) <span class="sc">+</span></span>
+<span id="cb56-38"><a href="#cb56-38" aria-hidden="true" tabindex="-1"></a>  <span class="fu">labs</span>(</span>
+<span id="cb56-39"><a href="#cb56-39" aria-hidden="true" tabindex="-1"></a>    <span class="at">title =</span> <span class="st">"Linear Regression: PHENOTYPE ~ Genotype"</span>,</span>
+<span id="cb56-40"><a href="#cb56-40" aria-hidden="true" tabindex="-1"></a>    <span class="at">x =</span> <span class="st">"Genotype (0/1)"</span>,</span>
+<span id="cb56-41"><a href="#cb56-41" aria-hidden="true" tabindex="-1"></a>    <span class="at">y =</span> <span class="st">"Phenotype "</span>,</span>
+<span id="cb56-42"><a href="#cb56-42" aria-hidden="true" tabindex="-1"></a>    <span class="at">size =</span> <span class="st">"Count"</span>) <span class="sc">+</span></span>
+<span id="cb56-43"><a href="#cb56-43" aria-hidden="true" tabindex="-1"></a>  <span class="co"># Restrict the visible x-axis range</span></span>
+<span id="cb56-44"><a href="#cb56-44" aria-hidden="true" tabindex="-1"></a>  <span class="fu">coord_cartesian</span>(<span class="at">xlim =</span> <span class="fu">c</span>(<span class="sc">-</span><span class="fl">0.5</span>,<span class="fl">1.5</span>)) <span class="sc">+</span></span>
+<span id="cb56-45"><a href="#cb56-45" aria-hidden="true" tabindex="-1"></a>  <span class="fu">scale_x_continuous</span>(<span class="at">breaks =</span> <span class="fu">c</span>(<span class="dv">0</span>, <span class="dv">1</span>))<span class="sc">+</span></span>
+<span id="cb56-46"><a href="#cb56-46" aria-hidden="true" tabindex="-1"></a>   <span class="co"># Apply a minimal theme for a clean appearance</span></span>
+<span id="cb56-47"><a href="#cb56-47" aria-hidden="true" tabindex="-1"></a>  <span class="fu">theme_minimal</span>()</span>
+<span id="cb56-48"><a href="#cb56-48" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb56-49"><a href="#cb56-49" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(p2)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output-display">
+<div>
+<figure class="figure">
+<p><img src="assignment_1_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid figure-img" width="672"></p>
+</figure>
+</div>
+</div>
+</div>
+<ol start="7" type="i">
+<li>Which model fits better? Justify your answer.</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb57"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb57-1"><a href="#cb57-1" aria-hidden="true" tabindex="-1"></a><span class="fu">AIC</span>(lm_rs1861, lm_rs1861_rec)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>              df      AIC
+lm_rs1861      3 11276.91
+lm_rs1861_rec  3 11291.74</code></pre>
+</div>
+</div>
+<pre><code># The AIC is higher for the recessive model (AIC=11291.74) than for the additive model (AIC=11276.91), a difference of about 14.8. This indicates that the additive model fits slightly better overall, by capturing a more gradual allele-phenotype effect.
+
+# However, it is also important to note that few individuals in this dataset carry the homozygous variant genotype. That may be why the recessive and additive models fit similarly.</code></pre>
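+<p>One way to put the AIC gap on an interpretable scale is the relative likelihood exp(-ΔAIC/2) of the higher-AIC model. The sketch below copies the two AIC values from the output above; the variable names are hypothetical, and this is offered as a supplementary illustration rather than part of the original analysis.</p>
+<pre><code># AIC values copied from AIC(lm_rs1861, lm_rs1861_rec)
+aic_additive  &lt;- 11276.91
+aic_recessive &lt;- 11291.74
+
+# Difference in AIC (recessive minus additive)
+delta_aic &lt;- aic_recessive - aic_additive
+
+# Relative likelihood of the recessive model compared with the additive model
+rel_lik &lt;- exp(-delta_aic / 2)
+c(delta_aic = delta_aic, rel_lik = rel_lik)</code></pre>
+<p>The difference is about 14.83 AIC units, and the relative likelihood comes out well below 0.01. Both models have the same number of parameters, so this gap reflects the difference in residual fit alone.</p>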
+</section>
+<section id="criteria" class="level3">
+<h3 class="anchored" data-anchor-id="criteria">Criteria</h3>
+<table class="caption-top table">
+<colgroup>
+<col style="width: 33%">
+<col style="width: 33%">
+<col style="width: 33%">
+</colgroup>
+<thead>
+<tr class="header">
+<th>Criteria</th>
+<th>Complete</th>
+<th>Incomplete</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td><strong>Data Inspection</strong></td>
+<td>Correct sample/SNP counts and variable type identified.</td>
+<td>Missing or incorrect counts or variable type.</td>
+</tr>
+<tr class="even">
+<td><strong>Allele Frequency Estimation</strong></td>
+<td>Correct allele and minor allele frequencies computed.</td>
+<td>Frequencies missing or wrong.</td>
+</tr>
+<tr class="odd">
+<td><strong>Hardy–Weinberg Equilibrium Test</strong></td>
+<td>Correct PLINK command and p-value extraction in R.</td>
+<td>PLINK command or extraction incorrect/missing.</td>
+</tr>
+<tr class="even">
+<td><strong>Genetic Association Test</strong></td>
+<td>Correct regressions, plots, coding, and interpretation.</td>
+<td>Regression, plots, or interpretation missing/incomplete.</td>
+</tr>
+</tbody>
+</table>
+</section>
+<section id="submission-information" class="level3">
+<h3 class="anchored" data-anchor-id="submission-information">Submission Information</h3>
+<p>📌 Please review our <a href="https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md">Assignment Submission Guide</a> for detailed instructions on how to format, branch, and submit your work. Following these guidelines is crucial for your submissions to be evaluated correctly.</p>
+<section id="note" class="level4">
+<h4 class="anchored" data-anchor-id="note">Note:</h4>
+<p>If you like, you may collaborate with others in the cohort. If you choose to do so, please indicate whom you worked with in your pull request by tagging their GitHub usernames. Separate submissions are required.</p>
+<hr>
+</section>
+<section id="submission-parameters" class="level4">
+<h4 class="anchored" data-anchor-id="submission-parameters">Submission Parameters</h4>
+<ul>
+<li><p>Submission Due Date: <code>11:59 PM – 16/03/2026</code></p></li>
+<li><p>Branch name for your repo should be: <code>assignment-1</code></p></li>
+<li><p>What to submit for this assignment:</p>
+<ul>
+<li>Populate this Quarto document (<code>assignment_1.qmd</code>).</li>
+<li>Render the document with Quarto: <code>quarto render assignment_1.qmd</code>.</li>
+<li>Submit both <code>assignment_1.qmd</code> and the rendered HTML file <code>assignment_1.html</code> in your pull request.</li>
+</ul></li>
+<li><p>What the pull request link should look like for this assignment: <code>https://github.com/&lt;your_github_username&gt;/gen_data/pull/&lt;pr_id&gt;</code></p>
+<ul>
+<li>Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. This helps the technical facilitator and learning support team review your submission easily.</li>
+</ul></li>
+</ul>
+<hr>
+<p>Checklist:</p>
+<ul>
+<li>Created a branch with the correct naming convention.</li>
+<li>Ensured that the repository is public.</li>
+<li>Reviewed the PR description guidelines and adhered to them.</li>
+<li>Verified that the link is accessible in a private browser window.</li>
+<li>Confirmed that both <code>assignment_1.qmd</code> and <code>assignment_1.html</code> are included in the pull request.</li>
+</ul>
+<p>If you encounter any difficulties or have questions, please don’t hesitate to reach out to our team via our Slack help channel. Our technical facilitators and learning support team are here to help you navigate any challenges.</p>
+</section>
+</section>
+</section>
+
+</main>
+<!-- /main column -->
+<script id="quarto-html-after-body" type="application/javascript">
+  window.document.addEventListener("DOMContentLoaded", function (event) {
+    const icon = "";
+    const anchorJS = new window.AnchorJS();
+    anchorJS.options = {
+      placement: 'right',
+      icon: icon
+    };
+    anchorJS.add('.anchored');
+    const isCodeAnnotation = (el) => {
+      for (const clz of el.classList) {
+        if (clz.startsWith('code-annotation-')) {                     
+          return true;
+        }
+      }
+      return false;
+    }
+    const onCopySuccess = function(e) {
+      // button target
+      const button = e.trigger;
+      // don't keep focus
+      button.blur();
+      // flash "checked"
+      button.classList.add('code-copy-button-checked');
+      var currentTitle = button.getAttribute("title");
+      button.setAttribute("title", "Copied!");
+      let tooltip;
+      if (window.bootstrap) {
+        button.setAttribute("data-bs-toggle", "tooltip");
+        button.setAttribute("data-bs-placement", "left");
+        button.setAttribute("data-bs-title", "Copied!");
+        tooltip = new bootstrap.Tooltip(button, 
+          { trigger: "manual", 
+            customClass: "code-copy-button-tooltip",
+            offset: [0, -8]});
+        tooltip.show();    
+      }
+      setTimeout(function() {
+        if (tooltip) {
+          tooltip.hide();
+          button.removeAttribute("data-bs-title");
+          button.removeAttribute("data-bs-toggle");
+          button.removeAttribute("data-bs-placement");
+        }
+        button.setAttribute("title", currentTitle);
+        button.classList.remove('code-copy-button-checked');
+      }, 1000);
+      // clear code selection
+      e.clearSelection();
+    }
+    const getTextToCopy = function(trigger) {
+      const outerScaffold = trigger.parentElement.cloneNode(true);
+      const codeEl = outerScaffold.querySelector('code');
+      for (const childEl of codeEl.children) {
+        if (isCodeAnnotation(childEl)) {
+          childEl.remove();
+        }
+      }
+      return codeEl.innerText;
+    }
+    const clipboard = new window.ClipboardJS('.code-copy-button:not([data-in-quarto-modal])', {
+      text: getTextToCopy
+    });
+    clipboard.on('success', onCopySuccess);
+    if (window.document.getElementById('quarto-embedded-source-code-modal')) {
+      const clipboardModal = new window.ClipboardJS('.code-copy-button[data-in-quarto-modal]', {
+        text: getTextToCopy,
+        container: window.document.getElementById('quarto-embedded-source-code-modal')
+      });
+      clipboardModal.on('success', onCopySuccess);
+    }
+      var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//);
+      var mailtoRegex = new RegExp(/^mailto:/);
+        var filterRegex = new RegExp('/' + window.location.host + '/');
+      var isInternal = (href) => {
+          return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href);
+      }
+      // Inspect non-navigation links and adorn them if external
+     var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool):not(.about-link)');
+      for (var i=0; i<links.length; i++) {
+        const link = links[i];
+        if (!isInternal(link.href)) {
+          // undo the damage that might have been done by quarto-nav.js in the case of
+          // links that we want to consider external
+          if (link.dataset.originalHref !== undefined) {
+            link.href = link.dataset.originalHref;
+          }
+        }
+      }
+    function tippyHover(el, contentFn, onTriggerFn, onUntriggerFn) {
+      const config = {
+        allowHTML: true,
+        maxWidth: 500,
+        delay: 100,
+        arrow: false,
+        appendTo: function(el) {
+            return el.parentElement;
+        },
+        interactive: true,
+        interactiveBorder: 10,
+        theme: 'quarto',
+        placement: 'bottom-start',
+      };
+      if (contentFn) {
+        config.content = contentFn;
+      }
+      if (onTriggerFn) {
+        config.onTrigger = onTriggerFn;
+      }
+      if (onUntriggerFn) {
+        config.onUntrigger = onUntriggerFn;
+      }
+      window.tippy(el, config); 
+    }
+    const noterefs = window.document.querySelectorAll('a[role="doc-noteref"]');
+    for (var i=0; i<noterefs.length; i++) {
+      const ref = noterefs[i];
+      tippyHover(ref, function() {
+        // use id or data attribute instead here
+        let href = ref.getAttribute('data-footnote-href') || ref.getAttribute('href');
+        try { href = new URL(href).hash; } catch {}
+        const id = href.replace(/^#\/?/, "");
+        const note = window.document.getElementById(id);
+        if (note) {
+          return note.innerHTML;
+        } else {
+          return "";
+        }
+      });
+    }
+    const xrefs = window.document.querySelectorAll('a.quarto-xref');
+    const processXRef = (id, note) => {
+      // Strip column container classes
+      const stripColumnClz = (el) => {
+        el.classList.remove("page-full", "page-columns");
+        if (el.children) {
+          for (const child of el.children) {
+            stripColumnClz(child);
+          }
+        }
+      }
+      stripColumnClz(note)
+      if (id === null || id.startsWith('sec-')) {
+        // Special case sections, only their first couple elements
+        const container = document.createElement("div");
+        if (note.children && note.children.length > 2) {
+          container.appendChild(note.children[0].cloneNode(true));
+          for (let i = 1; i < note.children.length; i++) {
+            const child = note.children[i];
+            if (child.tagName === "P" && child.innerText === "") {
+              continue;
+            } else {
+              container.appendChild(child.cloneNode(true));
+              break;
+            }
+          }
+          if (window.Quarto?.typesetMath) {
+            window.Quarto.typesetMath(container);
+          }
+          return container.innerHTML
+        } else {
+          if (window.Quarto?.typesetMath) {
+            window.Quarto.typesetMath(note);
+          }
+          return note.innerHTML;
+        }
+      } else {
+        // Remove any anchor links if they are present
+        const anchorLink = note.querySelector('a.anchorjs-link');
+        if (anchorLink) {
+          anchorLink.remove();
+        }
+        if (window.Quarto?.typesetMath) {
+          window.Quarto.typesetMath(note);
+        }
+        if (note.classList.contains("callout")) {
+          return note.outerHTML;
+        } else {
+          return note.innerHTML;
+        }
+      }
+    }
+    for (var i=0; i<xrefs.length; i++) {
+      const xref = xrefs[i];
+      tippyHover(xref, undefined, function(instance) {
+        instance.disable();
+        let url = xref.getAttribute('href');
+        let hash = undefined; 
+        if (url.startsWith('#')) {
+          hash = url;
+        } else {
+          try { hash = new URL(url).hash; } catch {}
+        }
+        if (hash) {
+          const id = hash.replace(/^#\/?/, "");
+          const note = window.document.getElementById(id);
+          if (note !== null) {
+            try {
+              const html = processXRef(id, note.cloneNode(true));
+              instance.setContent(html);
+            } finally {
+              instance.enable();
+              instance.show();
+            }
+          } else {
+            // See if we can fetch this
+            fetch(url.split('#')[0])
+            .then(res => res.text())
+            .then(html => {
+              const parser = new DOMParser();
+              const htmlDoc = parser.parseFromString(html, "text/html");
+              const note = htmlDoc.getElementById(id);
+              if (note !== null) {
+                const html = processXRef(id, note);
+                instance.setContent(html);
+              } 
+            }).finally(() => {
+              instance.enable();
+              instance.show();
+            });
+          }
+        } else {
+          // See if we can fetch a full url (with no hash to target)
+          // This is a special case and we should probably do some content thinning / targeting
+          fetch(url)
+          .then(res => res.text())
+          .then(html => {
+            const parser = new DOMParser();
+            const htmlDoc = parser.parseFromString(html, "text/html");
+            const note = htmlDoc.querySelector('main.content');
+            if (note !== null) {
+              // This should only happen for chapter cross references
+              // (since there is no id in the URL)
+              // remove the first header
+              if (note.children.length > 0 && note.children[0].tagName === "HEADER") {
+                note.children[0].remove();
+              }
+              const html = processXRef(null, note);
+              instance.setContent(html);
+            } 
+          }).finally(() => {
+            instance.enable();
+            instance.show();
+          });
+        }
+      }, function(instance) {
+      });
+    }
+        let selectedAnnoteEl;
+        const selectorForAnnotation = ( cell, annotation) => {
+          let cellAttr = 'data-code-cell="' + cell + '"';
+          let lineAttr = 'data-code-annotation="' +  annotation + '"';
+          const selector = 'span[' + cellAttr + '][' + lineAttr + ']';
+          return selector;
+        }
+        const selectCodeLines = (annoteEl) => {
+          const doc = window.document;
+          const targetCell = annoteEl.getAttribute("data-target-cell");
+          const targetAnnotation = annoteEl.getAttribute("data-target-annotation");
+          const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation));
+          const lines = annoteSpan.getAttribute("data-code-lines").split(",");
+          const lineIds = lines.map((line) => {
+            return targetCell + "-" + line;
+          })
+          let top = null;
+          let height = null;
+          let parent = null;
+          if (lineIds.length > 0) {
+              //compute the position of the single el (top and bottom and make a div)
+              const el = window.document.getElementById(lineIds[0]);
+              top = el.offsetTop;
+              height = el.offsetHeight;
+              parent = el.parentElement.parentElement;
+            if (lineIds.length > 1) {
+              const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]);
+              const bottom = lastEl.offsetTop + lastEl.offsetHeight;
+              height = bottom - top;
+            }
+            if (top !== null && height !== null && parent !== null) {
+              // cook up a div (if necessary) and position it 
+              let div = window.document.getElementById("code-annotation-line-highlight");
+              if (div === null) {
+                div = window.document.createElement("div");
+                div.setAttribute("id", "code-annotation-line-highlight");
+                div.style.position = 'absolute';
+                parent.appendChild(div);
+              }
+              div.style.top = top - 2 + "px";
+              div.style.height = height + 4 + "px";
+              div.style.left = 0;
+              let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter");
+              if (gutterDiv === null) {
+                gutterDiv = window.document.createElement("div");
+                gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter");
+                gutterDiv.style.position = 'absolute';
+                const codeCell = window.document.getElementById(targetCell);
+                const gutter = codeCell.querySelector('.code-annotation-gutter');
+                gutter.appendChild(gutterDiv);
+              }
+              gutterDiv.style.top = top - 2 + "px";
+              gutterDiv.style.height = height + 4 + "px";
+            }
+            selectedAnnoteEl = annoteEl;
+          }
+        };
+        const unselectCodeLines = () => {
+          const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"];
+          elementsIds.forEach((elId) => {
+            const div = window.document.getElementById(elId);
+            if (div) {
+              div.remove();
+            }
+          });
+          selectedAnnoteEl = undefined;
+        };
+          // Handle positioning of the toggle
+      window.addEventListener(
+        "resize",
+        throttle(() => {
+          elRect = undefined;
+          if (selectedAnnoteEl) {
+            selectCodeLines(selectedAnnoteEl);
+          }
+        }, 10)
+      );
+      function throttle(fn, ms) {
+      let throttle = false;
+      let timer;
+        return (...args) => {
+          if(!throttle) { // first call gets through
+              fn.apply(this, args);
+              throttle = true;
+          } else { // all the others get throttled
+              if(timer) clearTimeout(timer); // cancel #2
+              timer = setTimeout(() => {
+                fn.apply(this, args);
+                timer = throttle = false;
+              }, ms);
+          }
+        };
+      }
+        // Attach click handler to the DT
+        const annoteDls = window.document.querySelectorAll('dt[data-target-cell]');
+        for (const annoteDlNode of annoteDls) {
+          annoteDlNode.addEventListener('click', (event) => {
+            const clickedEl = event.target;
+            if (clickedEl !== selectedAnnoteEl) {
+              unselectCodeLines();
+              const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active');
+              if (activeEl) {
+                activeEl.classList.remove('code-annotation-active');
+              }
+              selectCodeLines(clickedEl);
+              clickedEl.classList.add('code-annotation-active');
+            } else {
+              // Unselect the line
+              unselectCodeLines();
+              clickedEl.classList.remove('code-annotation-active');
+            }
+          });
+        }
+    const findCites = (el) => {
+      const parentEl = el.parentElement;
+      if (parentEl) {
+        const cites = parentEl.dataset.cites;
+        if (cites) {
+          return {
+            el,
+            cites: cites.split(' ')
+          };
+        } else {
+          return findCites(el.parentElement)
+        }
+      } else {
+        return undefined;
+      }
+    };
+    var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]');
+    for (var i=0; i<bibliorefs.length; i++) {
+      const ref = bibliorefs[i];
+      const citeInfo = findCites(ref);
+      if (citeInfo) {
+        tippyHover(citeInfo.el, function() {
+          var popup = window.document.createElement('div');
+          citeInfo.cites.forEach(function(cite) {
+            var citeDiv = window.document.createElement('div');
+            citeDiv.classList.add('hanging-indent');
+            citeDiv.classList.add('csl-entry');
+            var biblioDiv = window.document.getElementById('ref-' + cite);
+            if (biblioDiv) {
+              citeDiv.innerHTML = biblioDiv.innerHTML;
+            }
+            popup.appendChild(citeDiv);
+          });
+          return popup.innerHTML;
+        });
+      }
+    }
+  });
+  </script>
+</div> <!-- /content -->
+
+
+
+
+</body></html>
\ No newline at end of file
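The submission parameters in the rendered page above ask for both `assignment_1.qmd` and `assignment_1.html` to be included in the pull request. A minimal sketch of a pre-submission check follows; the helper name `check_submission_files` is hypothetical and not part of the course materials, and only the two file names come from the assignment text.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the course repo): report
# whether both files named in the submission parameters exist in a directory.
check_submission_files() {
  local dir="${1:-.}"   # directory to inspect; defaults to the current one
  local missing=0
  local f
  for f in assignment_1.qmd assignment_1.html; do
    if [ -f "$dir/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  # Return 0 only when both files are present
  return "$missing"
}
```

Calling `check_submission_files 02_activities/assignments` from the repository root before opening the pull request would flag a missing rendered file; it does not check the branch name or repository visibility.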
diff --git a/02_activities/assignments/assignment_1.qmd b/02_activities/assignments/assignment_1.qmd
index 550af3d..064ee7e 100644
--- a/02_activities/assignments/assignment_1.qmd
+++ b/02_activities/assignments/assignment_1.qmd
@@ -13,98 +13,298 @@ Please bring questions that you cannot work out on your own to office hours, wor
 
 You will need to install PLINK and run the analyses. Please follow the OS-specific setup guide in [`SETUP.md`](../../SETUP.md). PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
 
+```{r setup}
+# Set up a consistent path for all chunks of code
+knitr::opts_knit$set(root.dir = normalizePath("../../"))
+
+library(data.table)
+library(ggplot2)
+library(seqminer)
+library(HardyWeinberg)
+library(dplyr)
+```
+
+
 #### Question 1: Data inspection
 
 Before fitting any models, it is essential to understand the data. Use R or bash code to answer the following questions about the `gwa.qc.A1.fam`, `gwa.qc.A1.bim`, and `gwa.qc.A1.bed` files, available at the following Google Drive link: <https://drive.google.com/drive/folders/11meVqGCY5yAyI1fh-fAlMEXQt0VmRGuz?usp=drive_link>. Please download all three files from this link and place them in `02_activities/data/`.
 
 (i) Read the .fam file. How many samples does the dataset contain?
 
-```         
-# Your answer here...
+```{bash} 
+#Read the .fam file
+wc -l ./02_activities/data/gwa.qc.A1.fam
+
+#The dataset contains 4000 samples.
 ```
 
 (ii) What is the 'variable type' of the response variable (i.e.Continuous or binary)?
 
-```         
-# Your answer here...
+```{bash}         
+head ./02_activities/data/gwa.qc.A1.fam
+
+#The response variable is continuous.
 ```
 
 (iii) Read the .bim file. How many SNPs does the dataset contain?
 
-```         
-# Your answer here...
+```{bash}         
+wc -l ./02_activities/data/gwa.qc.A1.bim
+
+#There are 101083 SNPs in the dataset.
 ```
 
 #### Question 2: Allele Frequency Estimation
 
 (i) Load the genotype matrix for SNPs rs1861, rs3813199, rs3128342, and rs11804831 using additive coding. What are the allele frequencies (AFs) for these four SNPs?
 
-```         
-# Your code here...
+```{bash}         
+#Create SNP list file with 4 SNPs: rs1861, rs3813199, rs3128342, and rs11804831
+#(one SNP ID per line, to be read by --extract)
+printf "%s\n" rs1861 rs3813199 rs3128342 rs11804831 > ./02_activities/data/A1snplist.txt
+
+#Subset SNPs and output allele frequencies
+plink2 --bfile ./02_activities/data/gwa.qc.A1 \
+      --extract ./02_activities/data/A1snplist.txt \
+      --recode A \
+      --freq \
+      --out ./02_activities/data/gwa_qc_A1_freq
+```
+```{r}
+#View allele frequencies
+A1freq <- fread("./02_activities/data/gwa_qc_A1_freq.afreq")
+head(A1freq)
+
+#The allele frequencies for the alternative allele of rs1861, rs3813199, rs3128342, and rs11804831 are as follows: 0.0539859, 0.0569126, 0.3051210, and 0.1543410.
+
+A1freq$REF_FREQ <- 1 - A1freq$ALT_FREQS
+A1freq
+
+#The allele frequencies for the reference allele of rs1861, rs3813199, rs3128342, and rs11804831 are as follows: 0.9460141, 0.9430874, 0.6948790, and 0.8456590.
 ```
 
 (ii) What are the minor allele frequencies (MAFs) for these four SNPs?
 
-```         
-# Your code here...
+```{r}         
+A1freq$MAF <- pmin(A1freq$ALT_FREQS, 1 - A1freq$ALT_FREQS)
+
+A1freq[, .(ID, AF = ALT_FREQS, MAF)]
+
+#The minor allele frequencies for rs1861, rs3813199, rs3128342, and rs11804831 are as follows: 0.0539859, 0.0569126, 0.3051210, and 0.1543410.
 ```
 
+
 #### Question 3: Hardy–Weinberg Equilibrium (HWE) Test
 
 (i) Conduct the Hardy–Weinberg Equilibrium (HWE) test for all SNPs in the .bim file. Then, load the file containing the HWE p-value results and display the first few rows of the resulting data frame.
 
-```         
-# Your code here...
+```{bash}         
+plink2 --bfile ./02_activities/data/gwa.qc.A1 --hardy --out ./02_activities/data/gwa_qc_A1_hwe
+```
+
+```{r}
+A1hwe <- fread("./02_activities/data/gwa_qc_A1_hwe.hardy")
+head(A1hwe)
 ```
 
 (ii) What are the HWE p-values for SNPs rs1861, rs3813199, rs3128342, and rs11804831?
 
-```         
-# Your code here...
+```{bash}         
+#Subset SNPs and output HWE test results
+plink2 --bfile ./02_activities/data/gwa.qc.A1 \
+      --extract ./02_activities/data/A1snplist.txt \
+      --hardy \
+      --out ./02_activities/data/gwa_qc_A1_hwe_snplist
+```
+
+```{r}
+#View HWE test results
+A1hwesnplist <- fread("./02_activities/data/gwa_qc_A1_hwe_snplist.hardy")
+head(A1hwesnplist)
+
+#The HWE p-values for rs1861, rs3813199, rs3128342, and rs11804831 are the following: 0.274719, 1.000000, 0.330273, and 0.113354.
 ```
 
 #### Question 4: Genetic Association Test
 
 (i) Conduct a linear regression to test the association between SNP rs1861 and the phenotype. What is the p-value?
 
-```         
-# Your code here...
+```{bash}
+plink2 --bfile ./02_activities/data/gwa.qc.A1 \
+      --extract ./02_activities/data/A1snplist.txt \
+      --recode A \
+      --out ./02_activities/data/geno.A1.additive
+```
+
+```{r}
+# Load genotype data
+geno_A1_subset <- fread("./02_activities/data/geno.A1.additive.raw")
+geno_A1_subset=geno_A1_subset[!is.na(rs1861_C), ]
+geno_A1_subset$PHENOTYPE=geno_A1_subset$PHENOTYPE-1
+head(geno_A1_subset)
+
+geno_rs1861 <- geno_A1_subset %>%
+  select(FID, IID, PHENOTYPE, rs1861_C) %>%
+  na.omit()
+
+#Conduct a linear regression model between rs1861 and the phenotype
+lm_rs1861 <- lm(PHENOTYPE ~ rs1861_C, data = geno_rs1861)
+summary(lm_rs1861)
+
+#The p-value is less than 2 × 10⁻¹⁶. There is a statistically significant relationship between rs1861 and the phenotype.
 ```
 
 (ii) How would you interpret the beta coefficient from this regression?
 
 ```         
-# Your answer here...
+We performed a linear regression to test the association between the rs1861 genotype and the continuous phenotype. The beta coefficient is 0.97382. This means that each additional C allele at rs1861 is associated with an average increase of 0.97382 units in the phenotype.
 ```
 
 (iii) Plot the scatterplot of phenotype versus the genotype of SNP rs1861. Add the regression line to the plot.
 
-```         
-# Your code here...
+```{r}         
+# Create a new data frame containing a sequence of genotype values
+ geno_range <- data.frame(
+   rs1861_C = seq(0, 2, length.out = 100)
+ )
+# Use the fitted linear regression model to predict the phenotype value
+# for each genotype value in geno_range
+ geno_range$predicted_prob <- predict(lm_rs1861, newdata = geno_range, type = "response")
+
+# Start from the observed genotype dataset
+count_data <- geno_A1_subset %>%
+# Remove rows with missing genotype or missing phenotype
+  filter(!is.na(rs1861_C), !is.na(PHENOTYPE)) %>%
+ # Group data by genotype value and phenotype value
+  group_by(rs1861_C, PHENOTYPE) %>%
+# Count how many observations fall into each genotype-phenotype combination
+  summarise(count = n(), .groups = "drop")
+
+# Start building the plot
+p1=ggplot() +
+# Add points showing observed genotype-phenotype combinations
+  geom_point(
+    # Use the summarized count_data for the points
+    data = count_data,
+    # Map genotype to x-axis, phenotype to y-axis, and count to point size
+    aes(x = rs1861_C, y = PHENOTYPE, size = count),
+    color = "blue"
+  ) +
+  # Add a regression line showing the predicted phenotype across genotype values
+  geom_line(# Use the prediction data frame for the line
+    data = geno_range,
+    aes(x = rs1861_C, y = predicted_prob),
+    color = "red",
+    # Set line thickness
+    size = 1.2
+  ) +
+  # Control the size range of the points
+  scale_size_continuous(range = c(2, 10)) +
+  labs(
+    title = "Linear Regression: PHENOTYPE ~ Genotype",
+    x = "Genotype (0/1/2)",
+    y = "Phenotype ",
+    size = "Count"
+  ) +
+  # Restrict the visible x-axis range
+  coord_cartesian(xlim = c(-0.5, 2.5)) +
+   # Apply a minimal theme for a clean appearance
+  theme_minimal()
+
+print(p1)
 ```
 
 (iv) Convert the genotype coding for rs1861 to recessive coding.
 
-```         
-# Your code here...
+```{r}
+geno_A1_subset_rec <- geno_A1_subset # make a copy
+
+#change into recessive coding
+geno_A1_subset_rec[, 7:ncol(geno_A1_subset_rec)] <- as.data.frame(
+  lapply(geno_A1_subset[, 7:ncol(geno_A1_subset)], function(x) ifelse(x == 2, 1, 0))
+)
+
+#Subset to just rs1861
+geno_rs1861_rec <- geno_A1_subset_rec %>%
+  select(FID, IID, PHENOTYPE, rs1861_C) %>%
+  na.omit()
+
 ```
 
 (v) Conduct a linear regression to test the association between the recessive-coded rs1861 and the phenotype. What is the p-value?
 
-```         
-# Your code here...
+```{r}
+#Conduct a linear regression model between rs1861 and the phenotype under recessive coding
+lm_rs1861_rec <- lm(PHENOTYPE ~ rs1861_C, data = geno_rs1861_rec)
+summary(lm_rs1861_rec)
+
+#The p-value is less than 2 × 10⁻¹⁶. There is a statistically significant relationship between rs1861 and the phenotype.
 ```
 
 (vi) Plot the scatterplot of phenotype versus the recessive-coded genotype of rs1861. Add the regression line to the plot.
 
-```         
-# Your code here...
+```{r}         
+# Create a new data frame containing a sequence of recessive-coded genotype values (0 to 1)
+ geno_range_rec <- data.frame(
+   rs1861_C = seq(0, 1, length.out = 100)
+ )
+# Use the fitted linear regression model to predict the phenotype value
+# for each genotype value in geno_range_rec
+ geno_range_rec$predicted_prob <- predict(lm_rs1861_rec, newdata = geno_range_rec, type = "response")
+
+# Start from the observed genotype dataset
+count_data_rec <- geno_A1_subset_rec %>%
+# Remove rows with missing genotype or missing phenotype
+  filter(!is.na(rs1861_C), !is.na(PHENOTYPE)) %>%
+ # Group data by genotype value and phenotype value
+  group_by(rs1861_C, PHENOTYPE) %>%
+# Count how many observations fall into each genotype-phenotype combination
+  summarise(count = n(), .groups = "drop")
+
+# Start building the plot
+p2=ggplot() +
+# Add points showing observed genotype-phenotype combinations
+  geom_point(
+    # Use the summarized count_data for the points
+    data = count_data_rec,
+    # Map genotype to x-axis, phenotype to y-axis, and count to point size
+    aes(x = rs1861_C, y = PHENOTYPE, size = count),
+    color = "blue"
+  ) +
+  # Add a regression line showing the predicted phenotype across genotype values
+  geom_line(# Use the prediction data frame for the line
+    data = geno_range_rec,
+    aes(x = rs1861_C, y = predicted_prob),
+    color = "red",
+    # Set line thickness
+    size = 1.2
+  ) +
+  # Control the size range of the points
+  scale_size_continuous(range = c(2, 10)) +
+  labs(
+    title = "Linear Regression: PHENOTYPE ~ Genotype",
+    x = "Genotype (0/1)",
+    y = "Phenotype ",
+    size = "Count") +
+  # Restrict the visible x-axis range
+  coord_cartesian(xlim = c(-0.5,1.5)) +
+  scale_x_continuous(breaks = c(0, 1))+
+   # Apply a minimal theme for a clean appearance
+  theme_minimal()
+
+print(p2)
 ```
 
 (vii) Which model fits better? Justify your answer.
 
+```{r}
+AIC(lm_rs1861, lm_rs1861_rec)
+```
+
 ```         
-# Your answer here...
+# The AIC is higher for the recessive model (AIC = 11291.74) than for the additive model (AIC = 11276.91), indicating that the additive model may fit slightly better overall, likely because it captures a more gradual allele-phenotype effect.
+
+# However, few individuals in this dataset carry the homozygous variant genotype, which may be why the recessive and additive models fit similarly.
 ```
 
 ### Criteria
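The AIC comparison in Question 4 (vii) of the `.qmd` diff above can be worked through by hand. This is a sketch, assuming the standard definition $\mathrm{AIC} = 2k - 2\ln\hat{L}$ with $k$ equal to the `df` column reported by `AIC()`. The relative-likelihood step is an added illustration, not part of the submitted answer.

$$
\begin{aligned}
\Delta\mathrm{AIC} &= \mathrm{AIC}_{\text{rec}} - \mathrm{AIC}_{\text{add}} = 11291.74 - 11276.91 = 14.83 \\
k_{\text{rec}} = k_{\text{add}} = 3 \;&\Rightarrow\; \Delta\mathrm{AIC} = -2\left(\ln\hat{L}_{\text{rec}} - \ln\hat{L}_{\text{add}}\right) \\
\exp\!\left(-\Delta\mathrm{AIC}/2\right) &= \exp(-7.415) \approx 6.0 \times 10^{-4}
\end{aligned}
$$

Because both models have the same number of parameters, the AIC difference comes entirely from the difference in log-likelihood. The comparison also presumes that both models were fit to the same set of observations.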
diff --git a/02_activities/assignments/assignment_2.html b/02_activities/assignments/assignment_2.html
new file mode 100644
index 0000000..057a085
--- /dev/null
+++ b/02_activities/assignments/assignment_2.html
@@ -0,0 +1,1018 @@
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head>
+
+<meta charset="utf-8">
+<meta name="generator" content="quarto-1.8.27">
+
+<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
+
+
+<title>Assignment #2</title>
+<style>
+code{white-space: pre-wrap;}
+span.smallcaps{font-variant: small-caps;}
+div.columns{display: flex; gap: min(4vw, 1.5em);}
+div.column{flex: auto; overflow-x: auto;}
+div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+ul.task-list{list-style: none;}
+ul.task-list li input[type="checkbox"] {
+  width: 0.8em;
+  margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ 
+  vertical-align: middle;
+}
+/* CSS for syntax highlighting */
+html { -webkit-text-size-adjust: 100%; }
+pre > code.sourceCode { white-space: pre; position: relative; }
+pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
+pre > code.sourceCode > span:empty { height: 1.2em; }
+.sourceCode { overflow: visible; }
+code.sourceCode > span { color: inherit; text-decoration: inherit; }
+div.sourceCode { margin: 1em 0; }
+pre.sourceCode { margin: 0; }
+@media screen {
+div.sourceCode { overflow: auto; }
+}
+@media print {
+pre > code.sourceCode { white-space: pre-wrap; }
+pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
+}
+pre.numberSource code
+  { counter-reset: source-line 0; }
+pre.numberSource code > span
+  { position: relative; left: -4em; counter-increment: source-line; }
+pre.numberSource code > span > a:first-child::before
+  { content: counter(source-line);
+    position: relative; left: -1em; text-align: right; vertical-align: baseline;
+    border: none; display: inline-block;
+    -webkit-touch-callout: none; -webkit-user-select: none;
+    -khtml-user-select: none; -moz-user-select: none;
+    -ms-user-select: none; user-select: none;
+    padding: 0 4px; width: 4em;
+  }
+pre.numberSource { margin-left: 3em;  padding-left: 4px; }
+div.sourceCode
+  {   }
+@media screen {
+pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
+}
+</style>
+
+
+<script src="assignment_2_files/libs/clipboard/clipboard.min.js"></script>
+<script src="assignment_2_files/libs/quarto-html/quarto.js" type="module"></script>
+<script src="assignment_2_files/libs/quarto-html/tabsets/tabsets.js" type="module"></script>
+<script src="assignment_2_files/libs/quarto-html/axe/axe-check.js" type="module"></script>
+<script src="assignment_2_files/libs/quarto-html/popper.min.js"></script>
+<script src="assignment_2_files/libs/quarto-html/tippy.umd.min.js"></script>
+<script src="assignment_2_files/libs/quarto-html/anchor.min.js"></script>
+<link href="assignment_2_files/libs/quarto-html/tippy.css" rel="stylesheet">
+<link href="assignment_2_files/libs/quarto-html/quarto-syntax-highlighting-ed96de9b727972fe78a7b5d16c58bf87.css" rel="stylesheet" id="quarto-text-highlighting-styles">
+<script src="assignment_2_files/libs/bootstrap/bootstrap.min.js"></script>
+<link href="assignment_2_files/libs/bootstrap/bootstrap-icons.css" rel="stylesheet">
+<link href="assignment_2_files/libs/bootstrap/bootstrap-d6a003b94517c951b2d65075d42fb01b.min.css" rel="stylesheet" append-hash="true" id="quarto-bootstrap" data-mode="light">
+
+
+</head>
+
+<body class="fullcontent quarto-light">
+
+<div id="quarto-content" class="page-columns page-rows-contents page-layout-article">
+
+<main class="content" id="quarto-document-content">
+
+<header id="title-block-header" class="quarto-title-block default">
+<div class="quarto-title">
+<h1 class="title">Assignment #2</h1>
+</div>
+
+
+
+<div class="quarto-title-meta">
+
+    
+  
+    
+  </div>
+  
+
+
+</header>
+
+
+<p>You will need <strong>PLINK2</strong> installed and available in your PATH. Please follow the OS-specific setup guide in <a href="../../SETUP.md"><code>SETUP.md</code></a>. The dataset for this assignment consists of the following binary PLINK files: <code>gwa.A2.bed</code>, <code>gwa.A2.bim</code>, <code>gwa.A2.fam</code>, available at the following Google Drive link: <a href="https://drive.google.com/drive/folders/1rHoy3z52Yyj985ukjjtLhfIBxchpyYtZ?usp=drive_link" class="uri">https://drive.google.com/drive/folders/1rHoy3z52Yyj985ukjjtLhfIBxchpyYtZ?usp=drive_link</a>. Please download all three files and save them in <code>02_activities/data/</code>.</p>
+<section id="question-1-data-inspection" class="level4">
+<h4 class="anchored" data-anchor-id="question-1-data-inspection">Question 1: Data inspection</h4>
+<p>Before you run any models, first get familiar with the dataset. You may find <code>data.table::fread()</code> in R helpful for reading <code>.bim</code> and <code>.fam</code> files.</p>
+<ol type="i">
+<li>Read the <code>.fam</code> file. How many samples does the dataset contain?</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co">#Read the .fam file</span></span>
+<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>fam <span class="ot">&lt;-</span> <span class="fu">fread</span>(<span class="st">"./02_activities/data/gwa.A2.fam"</span>, <span class="at">header =</span> <span class="cn">FALSE</span>)</span>
+<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(fam)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>[1] 4000</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co">#The dataset contains 4000 samples.</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+</div>
+<ol start="2" type="i">
+<li>Read the <code>.bim</code> file. How many SNPs does the dataset contain?</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co">#Read the .bim file</span></span>
+<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>bim <span class="ot">&lt;-</span> <span class="fu">fread</span>(<span class="st">"./02_activities/data/gwa.A2.bim"</span>, <span class="at">header =</span> <span class="cn">FALSE</span>)</span>
+<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(bim)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>[1] 306102</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co">#The dataset contains 306102 SNPs.</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+</div>
+<ol start="3" type="i">
+<li>Make a histogram (or density plot) of the phenotype. Does it look roughly Gaussian?</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="co">#Check the header</span></span>
+<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(fam)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>      V1     V2    V3    V4    V5         V6
+   &lt;int&gt; &lt;char&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt;      &lt;num&gt;
+1:     0  A2001     0     0     1 -0.2754205
+2:     1  A2002     0     0     1 -1.9958434
+3:     2  A2003     0     0     1 -0.7587786
+4:     3  A2004     0     0     1  0.5372893
+5:     4  A2005     0     0     1  1.1367670
+6:     5  A2006     0     0     1 -0.2051836</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="co">#Name the columns</span></span>
+<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="fu">colnames</span>(fam) <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="st">"FID"</span>,<span class="st">"IID"</span>,<span class="st">"PID"</span>,<span class="st">"MID"</span>,<span class="st">"SEX"</span>,<span class="st">"PHENOTYPE"</span>)</span>
+<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="co">#Double-check that the column names were applied correctly</span></span>
+<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="fu">names</span>(fam)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>[1] "FID"       "IID"       "PID"       "MID"       "SEX"       "PHENOTYPE"</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(fam)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>     FID    IID   PID   MID   SEX  PHENOTYPE
+   &lt;int&gt; &lt;char&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt;      &lt;num&gt;
+1:     0  A2001     0     0     1 -0.2754205
+2:     1  A2002     0     0     1 -1.9958434
+3:     2  A2003     0     0     1 -0.7587786
+4:     3  A2004     0     0     1  0.5372893
+5:     4  A2005     0     0     1  1.1367670
+6:     5  A2006     0     0     1 -0.2051836</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="co">#Create a histogram</span></span>
+<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="fu">hist</span>(fam<span class="sc">$</span>PHENOTYPE,</span>
+<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>     <span class="at">main =</span> <span class="st">"Histogram of Phenotype"</span>,</span>
+<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a>     <span class="at">xlab =</span> <span class="st">"PHENOTYPE"</span>,</span>
+<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a>     <span class="at">breaks =</span> <span class="dv">30</span>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output-display">
+<div>
+<figure class="figure">
+<p><img src="assignment_2_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid figure-img" width="672"></p>
+</figure>
+</div>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="co">#Yes, the distribution of the phenotype looks roughly Gaussian.</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+</div>
+</section>
+<section id="question-2-quality-control-qc" class="level4">
+<h4 class="anchored" data-anchor-id="question-2-quality-control-qc">Question 2: Quality control (QC)</h4>
+<p>Now we will perform QC using PLINK2 for the genotype files in <code>gwa.A2</code>.</p>
+<ol type="i">
+<li>Using PLINK2 from the command line (bash), perform basic QC with the following filters: MAF ≥ 0.05, SNP missingness (<code>--geno</code>) ≤ 0.01, individual missingness (<code>--mind</code>) ≤ 0.10, and HWE p-value ≥ 0.00005, and output the QC’ed dataset as <code>gwa.qc.A2</code>.</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.A2 <span class="at">--maf</span> 0.05 <span class="at">--geno</span> 0.01 <span class="at">--mind</span> 0.1 <span class="at">--hwe</span> 0.00005 <span class="at">--make-bed</span> <span class="at">--out</span> ./02_activities/data/gwa.qc.A2</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/gwa.qc.A2.log.
+Options in effect:
+  --bfile ./02_activities/data/gwa.A2
+  --geno 0.01
+  --hwe 0.00005
+  --maf 0.05
+  --make-bed
+  --mind 0.1
+  --out ./02_activities/data/gwa.qc.A2
+
+Start time: Tue Mar 31 14:11:15 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.A2.fam.
+306102 variants loaded from ./02_activities/data/gwa.A2.bim.
+1 quantitative phenotype loaded (4000 values).
+Calculating sample missingness rates... 0%21%42%64%85%done.
+0 samples removed due to missing genotype data (--mind).
+4000 samples (2000 females, 2000 males; 4000 founders) remaining after main
+filters.
+4000 quantitative phenotype values remaining after main filters.
+Calculating allele frequencies... 0%21%42%64%85%done.
+--geno: 196578 variants removed due to missing genotype data.
+--hwe: 6 variants removed due to Hardy-Weinberg exact test (founders only).
+8435 variants removed due to allele frequency threshold(s)
+(--maf/--max-maf/--mac/--max-mac).
+101083 variants remaining after main filters.
+Writing ./02_activities/data/gwa.qc.A2.fam ... done.
+Writing ./02_activities/data/gwa.qc.A2.bim ... done.
+Writing ./02_activities/data/gwa.qc.A2.bed ... 0%21%43%65%87%done.
+End time: Tue Mar 31 14:11:16 2026</code></pre>
+</div>
+</div>
+</section>
+<section id="question-3-relatedness" class="level4">
+<h4 class="anchored" data-anchor-id="question-3-relatedness">Question 3: Relatedness</h4>
+<p>In this question, you will use <strong>PLINK2’s built-in KING-robust kinship</strong> (<code>--king-cutoff</code>) to detect and remove related individuals.</p>
+<ol type="i">
+<li>Perform LD pruning on <code>gwa.qc.A2</code> using PLINK2 with the following parameters: <code>--indep-pairwise 500 50 0.05</code>, and then generate a new dataset containing only the pruned SNPs.</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="co">#Create a subset of approximately independent SNPs</span></span>
+<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.qc.A2 <span class="at">--indep-pairwise</span> 500 50 0.05 <span class="at">--out</span> ./02_activities/data/gwa.qc.A2_pruned</span>
+<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a><span class="co">#Build a new dataset using only pruned SNPs</span></span>
+<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.qc.A2 <span class="at">--extract</span> ./02_activities/data/gwa.qc.A2_pruned.prune.in <span class="at">--make-bed</span> <span class="at">--out</span> ./02_activities/data/gwa.qc.A2_pruned</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/gwa.qc.A2_pruned.log.
+Options in effect:
+  --bfile ./02_activities/data/gwa.qc.A2
+  --indep-pairwise 500 50 0.05
+  --out ./02_activities/data/gwa.qc.A2_pruned
+
+Start time: Tue Mar 31 14:11:16 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.qc.A2.fam.
+101083 variants loaded from ./02_activities/data/gwa.qc.A2.bim.
+1 quantitative phenotype loaded (4000 values).
+Calculating allele frequencies... 0%64%done.
+--indep-pairwise (9 compute threads): 0%50%79169/101083 variants removed.
+Writing...
+Variant lists written to ./02_activities/data/gwa.qc.A2_pruned.prune.in and
+./02_activities/data/gwa.qc.A2_pruned.prune.out .
+End time: Tue Mar 31 14:11:16 2026
+PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/gwa.qc.A2_pruned.log.
+Options in effect:
+  --bfile ./02_activities/data/gwa.qc.A2
+  --extract ./02_activities/data/gwa.qc.A2_pruned.prune.in
+  --make-bed
+  --out ./02_activities/data/gwa.qc.A2_pruned
+
+Start time: Tue Mar 31 14:11:16 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.qc.A2.fam.
+101083 variants loaded from ./02_activities/data/gwa.qc.A2.bim.
+1 quantitative phenotype loaded (4000 values).
+--extract: 21914 variants remaining.
+21914 variants remaining after main filters.
+Writing ./02_activities/data/gwa.qc.A2_pruned.fam ... done.
+Writing ./02_activities/data/gwa.qc.A2_pruned.bim ... done.
+Writing ./02_activities/data/gwa.qc.A2_pruned.bed ... 0%61%done.
+End time: Tue Mar 31 14:11:16 2026</code></pre>
+</div>
+</div>
+<ol start="2" type="i">
+<li>Use PLINK2 on the LD-pruned dataset to identify a set of unrelated individuals up to (approximately) 2nd-degree relatives (use a kinship cutoff of 0.0884).</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.qc.A2_pruned <span class="at">--king-cutoff</span> 0.0884 <span class="at">--out</span> ./02_activities/data/gwa.A2_king</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/gwa.A2_king.log.
+Options in effect:
+  --bfile ./02_activities/data/gwa.qc.A2_pruned
+  --king-cutoff 0.0884
+  --out ./02_activities/data/gwa.A2_king
+
+Start time: Tue Mar 31 14:11:16 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.qc.A2_pruned.fam.
+21914 variants loaded from ./02_activities/data/gwa.qc.A2_pruned.bim.
+1 quantitative phenotype loaded (4000 values).
+--king-cutoff pass 1/1: Scanning for rare variants... 0%done.
+0 variants handled by initial scan (21914 remaining).
+
+--king-cutoff pass 1/1: 0 variants complete.
+--king-cutoff pass 1/1: 1536 variants complete.
+--king-cutoff pass 1/1: 3072 variants complete.
+--king-cutoff pass 1/1: 4608 variants complete.
+--king-cutoff pass 1/1: 6144 variants complete.
+--king-cutoff pass 1/1: 7680 variants complete.
+--king-cutoff pass 1/1: 9216 variants complete.
+--king-cutoff pass 1/1: 10752 variants complete.
+--king-cutoff pass 1/1: 12288 variants complete.
+--king-cutoff pass 1/1: 13824 variants complete.
+--king-cutoff pass 1/1: 15360 variants complete.
+--king-cutoff pass 1/1: 16896 variants complete.
+--king-cutoff pass 1/1: 18432 variants complete.
+--king-cutoff pass 1/1: 19968 variants complete.
+--king-cutoff pass 1/1: 21504 variants complete.
+--king-cutoff pass 1/1: Condensing...                 done.
+--king-cutoff: 21914 variants processed.
+--king-cutoff: Excluded sample IDs written to
+./02_activities/data/gwa.A2_king.king.cutoff.out.id , and 4000 remaining sample
+IDs written to ./02_activities/data/gwa.A2_king.king.cutoff.in.id .
+End time: Tue Mar 31 14:11:17 2026</code></pre>
+</div>
+</div>
+<ol start="3" type="i">
+<li>Using the unrelated individual list from part (ii), create a dataset containing only unrelated individuals and name it <code>gwa.qc_unrelated.A2</code>.</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.qc.A2 <span class="at">--keep</span> ./02_activities/data/gwa.A2_king.king.cutoff.in.id <span class="at">--make-bed</span> <span class="at">--out</span> ./02_activities/data/gwa.qc_unrelated.A2</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/gwa.qc_unrelated.A2.log.
+Options in effect:
+  --bfile ./02_activities/data/gwa.qc.A2
+  --keep ./02_activities/data/gwa.A2_king.king.cutoff.in.id
+  --make-bed
+  --out ./02_activities/data/gwa.qc_unrelated.A2
+
+Start time: Tue Mar 31 14:11:17 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.qc.A2.fam.
+101083 variants loaded from ./02_activities/data/gwa.qc.A2.bim.
+1 quantitative phenotype loaded (4000 values).
+--keep: 4000 samples remaining.
+4000 samples (2000 females, 2000 males; 4000 founders) remaining after main
+filters.
+4000 quantitative phenotype values remaining after main filters.
+Writing ./02_activities/data/gwa.qc_unrelated.A2.fam ... done.
+Writing ./02_activities/data/gwa.qc_unrelated.A2.bim ... done.
+Writing ./02_activities/data/gwa.qc_unrelated.A2.bed ... 0%64%done.
+End time: Tue Mar 31 14:11:17 2026</code></pre>
+</div>
+</div>
+</section>
+<section id="question-4-principal-component-analysis-pca" class="level4">
+<h4 class="anchored" data-anchor-id="question-4-principal-component-analysis-pca">Question 4: Principal component analysis (PCA)</h4>
+<ol type="i">
+<li><p>Using PLINK2, perform PCA on the unrelated, LD-pruned dataset to obtain principal components.</p>
+<p>(Hint: Refer to the PCA section in the GWAS Tutorial II. You should use a <code>*.prune.in</code> file to select a set of independent SNPs before performing PCA.)</p></li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.qc_unrelated.A2 <span class="at">--extract</span> ./02_activities/data/gwa.qc.A2_pruned.prune.in <span class="at">--make-bed</span> <span class="at">--out</span> ./02_activities/data/gwa.qc_unrelated_pruned.A2</span>
+<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.qc_unrelated_pruned.A2 <span class="at">--pca</span> 20 <span class="at">--out</span> ./02_activities/data/gwa_unrel_PCA.A2</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/gwa.qc_unrelated_pruned.A2.log.
+Options in effect:
+  --bfile ./02_activities/data/gwa.qc_unrelated.A2
+  --extract ./02_activities/data/gwa.qc.A2_pruned.prune.in
+  --make-bed
+  --out ./02_activities/data/gwa.qc_unrelated_pruned.A2
+
+Start time: Tue Mar 31 14:11:17 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.qc_unrelated.A2.fam.
+101083 variants loaded from ./02_activities/data/gwa.qc_unrelated.A2.bim.
+1 quantitative phenotype loaded (4000 values).
+--extract: 21914 variants remaining.
+21914 variants remaining after main filters.
+Writing ./02_activities/data/gwa.qc_unrelated_pruned.A2.fam ... done.
+Writing ./02_activities/data/gwa.qc_unrelated_pruned.A2.bim ... done.
+Writing ./02_activities/data/gwa.qc_unrelated_pruned.A2.bed ... 0%61%done.
+End time: Tue Mar 31 14:11:17 2026
+PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/gwa_unrel_PCA.A2.log.
+Options in effect:
+  --bfile ./02_activities/data/gwa.qc_unrelated_pruned.A2
+  --out ./02_activities/data/gwa_unrel_PCA.A2
+  --pca 20
+
+Start time: Tue Mar 31 14:11:17 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.qc_unrelated_pruned.A2.fam.
+21914 variants loaded from ./02_activities/data/gwa.qc_unrelated_pruned.A2.bim.
+1 quantitative phenotype loaded (4000 values).
+Calculating allele frequencies... 0%done.
+Constructing GRM: done.
+Correcting for missingness... done.
+Extracting eigenvalues and eigenvectors... done.
+--pca: Eigenvectors written to ./02_activities/data/gwa_unrel_PCA.A2.eigenvec ,
+and eigenvalues written to ./02_activities/data/gwa_unrel_PCA.A2.eigenval .
+End time: Tue Mar 31 14:11:22 2026</code></pre>
+</div>
+</div>
+</section>
+<section id="question-5-gwas-analyses" class="level4">
+<h4 class="anchored" data-anchor-id="question-5-gwas-analyses">Question 5: GWAS analyses</h4>
+<p>We now test for association between each SNP and the continuous phenotype, adjusting for population structure using the top 3 PCs.</p>
+<p>Assume that:</p>
+<ul>
+<li><p>You will run GWAS on the unrelated, QC’ed dataset <code>gwa.qc_unrelated.A2</code>.</p></li>
+<li><p>You have a covariate file (e.g., the <code>*.eigenvec</code> output from the PCA analysis) containing PC1–PC3 for each individual.</p></li>
+</ul>
+<ol type="i">
+<li>Using PLINK2, run a linear regression GWAS using <code>gwa.qc_unrelated.A2</code> as the input dataset. Adjust for PCs 1–3 as covariates.</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ex">plink2</span> <span class="at">--bfile</span> ./02_activities/data/gwa.qc.A2 <span class="at">--ci</span> 0.95 <span class="at">--glm</span> <span class="at">--covar</span> ./02_activities/data/gwa_unrel_PCA.A2.eigenvec <span class="at">--covar-name</span> PC1,PC2,PC3 <span class="at">--adjust</span> <span class="at">--out</span> ./02_activities/data/gwa.qc_assoc.A2</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>PLINK v2.00a5.12 M1 (25 Jun 2024)              www.cog-genomics.org/plink/2.0/
+(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
+Logging to ./02_activities/data/gwa.qc_assoc.A2.log.
+Options in effect:
+  --adjust
+  --bfile ./02_activities/data/gwa.qc.A2
+  --ci 0.95
+  --covar ./02_activities/data/gwa_unrel_PCA.A2.eigenvec
+  --covar-name PC1,PC2,PC3
+  --glm
+  --out ./02_activities/data/gwa.qc_assoc.A2
+
+Start time: Tue Mar 31 14:11:22 2026
+16384 MiB RAM detected; reserving 8192 MiB for main workspace.
+Using up to 10 threads (change this with --threads).
+4000 samples (2000 females, 2000 males; 4000 founders) loaded from
+./02_activities/data/gwa.qc.A2.fam.
+101083 variants loaded from ./02_activities/data/gwa.qc.A2.bim.
+1 quantitative phenotype loaded (4000 values).
+3 covariates loaded from ./02_activities/data/gwa_unrel_PCA.A2.eigenvec.
+Calculating allele frequencies... 0%64%done.
+--glm linear regression on phenotype 'PHENO1': 0%64%done.
+Results written to ./02_activities/data/gwa.qc_assoc.A2.PHENO1.glm.linear .
+--adjust: Genomic inflation est. lambda (based on median chisq) = 1.01779.
+--adjust values (101083 tests) written to
+./02_activities/data/gwa.qc_assoc.A2.PHENO1.glm.linear.adjusted .
+End time: Tue Mar 31 14:11:23 2026</code></pre>
+</div>
+</div>
+<ol start="2" type="i">
+<li>Create a Manhattan plot of the GWAS results.</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb27"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Load PLINK2 linear results</span></span>
+<span id="cb27-2"><a href="#cb27-2" aria-hidden="true" tabindex="-1"></a>assocA2 <span class="ot">&lt;-</span> data.table<span class="sc">::</span><span class="fu">fread</span>(<span class="st">"./02_activities/data/gwa.qc_assoc.A2.PHENO1.glm.linear"</span>)</span>
+<span id="cb27-3"><a href="#cb27-3" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb27-4"><a href="#cb27-4" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(assocA2)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>   #CHROM     POS        ID    REF    ALT PROVISIONAL_REF?     A1 OMITTED
+    &lt;int&gt;   &lt;int&gt;    &lt;char&gt; &lt;char&gt; &lt;char&gt;           &lt;char&gt; &lt;char&gt;  &lt;char&gt;
+1:      1 1011278 rs3737728      G      A                Y      A       G
+2:      1 1011278 rs3737728      G      A                Y      A       G
+3:      1 1011278 rs3737728      G      A                Y      A       G
+4:      1 1011278 rs3737728      G      A                Y      A       G
+5:      1 1109721 rs1320565      C      T                Y      T       C
+6:      1 1109721 rs1320565      C      T                Y      T       C
+     A1_FREQ   TEST OBS_CT       BETA        SE        L95       U95    T_STAT
+       &lt;num&gt; &lt;char&gt;  &lt;int&gt;      &lt;num&gt;     &lt;num&gt;      &lt;num&gt;     &lt;num&gt;     &lt;num&gt;
+1: 0.3386490    ADD   3982  0.0182086 0.0240795 -0.0289864 0.0654035  0.756187
+2: 0.3386490    PC1   3982 -0.3779240 1.0015500 -2.3409200 1.5850800 -0.377340
+3: 0.3386490    PC2   3982  0.9818050 1.0015100 -0.9811150 2.9447300  0.980326
+4: 0.3386490    PC3   3982  0.4832040 1.0015300 -1.4797500 2.4461600  0.482467
+5: 0.0788481    ADD   3976 -0.0153297 0.0420641 -0.0977739 0.0671145 -0.364436
+6: 0.0788481    PC1   3976 -0.4762890 1.0037500 -2.4436000 1.4910200 -0.474511
+          P ERRCODE
+      &lt;num&gt;  &lt;char&gt;
+1: 0.449582       .
+2: 0.705941       .
+3: 0.326985       .
+4: 0.629501       .
+5: 0.715552       .
+6: 0.635162       .</code></pre>
+</div>
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Keep only the additive test (one row per SNP)</span></span>
+<span id="cb29-2"><a href="#cb29-2" aria-hidden="true" tabindex="-1"></a>assoc_addA2 <span class="ot">&lt;-</span> assocA2[TEST <span class="sc">==</span> <span class="st">"ADD"</span>]</span>
+<span id="cb29-3"><a href="#cb29-3" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb29-4"><a href="#cb29-4" aria-hidden="true" tabindex="-1"></a><span class="co"># qqman expects columns named CHR, BP, SNP, and P.</span></span>
+<span id="cb29-5"><a href="#cb29-5" aria-hidden="true" tabindex="-1"></a><span class="co"># In PLINK2 output they are #CHROM, POS, ID, and P.</span></span>
+<span id="cb29-6"><a href="#cb29-6" aria-hidden="true" tabindex="-1"></a><span class="fu">setnames</span>(assoc_addA2, <span class="fu">c</span>(<span class="st">"#CHROM"</span>,<span class="st">"POS"</span>,<span class="st">"ID"</span>), <span class="fu">c</span>(<span class="st">"CHR"</span>,<span class="st">"BP"</span>,<span class="st">"SNP"</span>))</span>
+<span id="cb29-7"><a href="#cb29-7" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb29-8"><a href="#cb29-8" aria-hidden="true" tabindex="-1"></a><span class="fu">png</span>(<span class="st">"./02_activities/assignments/manhattan_plot.png"</span>, <span class="at">width =</span> <span class="dv">1200</span>, <span class="at">height =</span> <span class="dv">800</span>, <span class="at">res =</span> <span class="dv">150</span>)</span>
+<span id="cb29-9"><a href="#cb29-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Manhattan plot</span></span>
+<span id="cb29-10"><a href="#cb29-10" aria-hidden="true" tabindex="-1"></a><span class="fu">manhattan</span>(assoc_addA2,</span>
+<span id="cb29-11"><a href="#cb29-11" aria-hidden="true" tabindex="-1"></a>          <span class="at">chr =</span> <span class="st">"CHR"</span>,</span>
+<span id="cb29-12"><a href="#cb29-12" aria-hidden="true" tabindex="-1"></a>          <span class="at">bp  =</span> <span class="st">"BP"</span>,</span>
+<span id="cb29-13"><a href="#cb29-13" aria-hidden="true" tabindex="-1"></a>          <span class="at">snp =</span> <span class="st">"SNP"</span>,</span>
+<span id="cb29-14"><a href="#cb29-14" aria-hidden="true" tabindex="-1"></a>          <span class="at">p   =</span> <span class="st">"P"</span>,</span>
+<span id="cb29-15"><a href="#cb29-15" aria-hidden="true" tabindex="-1"></a>          <span class="at">xlab =</span> <span class="st">"Chromosome"</span>,</span>
+<span id="cb29-16"><a href="#cb29-16" aria-hidden="true" tabindex="-1"></a>          <span class="at">ylab =</span> <span class="st">"-log10(P)"</span>,</span>
+<span id="cb29-17"><a href="#cb29-17" aria-hidden="true" tabindex="-1"></a>          <span class="at">suggestiveline =</span> <span class="cn">FALSE</span>,</span>
+<span id="cb29-18"><a href="#cb29-18" aria-hidden="true" tabindex="-1"></a>          <span class="at">cex.axis =</span> <span class="fl">1.5</span>,</span>
+<span id="cb29-19"><a href="#cb29-19" aria-hidden="true" tabindex="-1"></a>          <span class="at">col =</span> <span class="fu">c</span>(<span class="st">"lightblue"</span>, <span class="st">"lightslateblue"</span>))</span>
+<span id="cb29-20"><a href="#cb29-20" aria-hidden="true" tabindex="-1"></a><span class="fu">dev.off</span>()</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>quartz_off_screen 
+                2 </code></pre>
+</div>
+</div>
+<ol start="3" type="i">
+<li>Create a QQ plot of the GWAS p-values.</li>
+</ol>
+<div class="cell">
+<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><a href="#cb31-1" aria-hidden="true" tabindex="-1"></a><span class="fu">png</span>(<span class="st">"./02_activities/assignments/QQ_plot.png"</span>, <span class="at">width =</span> <span class="dv">1200</span>, <span class="at">height =</span> <span class="dv">800</span>, <span class="at">res =</span> <span class="dv">150</span>)</span>
+<span id="cb31-2"><a href="#cb31-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Q-Q plot</span></span>
+<span id="cb31-3"><a href="#cb31-3" aria-hidden="true" tabindex="-1"></a><span class="fu">qq</span>(assoc_addA2<span class="sc">$</span>P, <span class="at">main =</span> <span class="st">"Q-Q plot of GWAS p-values"</span>)</span>
+<span id="cb31-4"><a href="#cb31-4" aria-hidden="true" tabindex="-1"></a><span class="fu">dev.off</span>()</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<div class="cell-output cell-output-stdout">
+<pre><code>quartz_off_screen 
+                2 </code></pre>
+</div>
+</div>
+</section>
+<section id="criteria" class="level3">
+<h3 class="anchored" data-anchor-id="criteria">Criteria</h3>
+<table class="caption-top table">
+<colgroup>
+<col style="width: 20%">
+<col style="width: 47%">
+<col style="width: 31%">
+</colgroup>
+<thead>
+<tr class="header">
+<th style="text-align: left;">Criteria</th>
+<th style="text-align: left;">Complete</th>
+<th style="text-align: left;">Incomplete</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td style="text-align: left;"><strong>Data inspection</strong></td>
+<td style="text-align: left;">Correct sample and SNP counts; phenotype plot with a brief comment on Gaussianity.</td>
+<td style="text-align: left;">Counts or phenotype description/plot missing or incorrect.</td>
+</tr>
+<tr class="even">
+<td style="text-align: left;"><strong>QC &amp; LD pruning</strong></td>
+<td style="text-align: left;">Correct PLINK2 QC command and thresholds.</td>
+<td style="text-align: left;">QC/pruning commands, thresholds, or output datasets missing or incorrect.</td>
+</tr>
+<tr class="odd">
+<td style="text-align: left;"><strong>Relatedness &amp; PCA</strong></td>
+<td style="text-align: left;">Correct use of PLINK2 command to obtain unrelated samples and PCA run on pruned SNPs.</td>
+<td style="text-align: left;">Relatedness step, unrelated dataset, or PCA analysis missing or incorrect.</td>
+</tr>
+<tr class="even">
+<td style="text-align: left;"><strong>GWAS &amp; visualisation</strong></td>
+<td style="text-align: left;">Linear regression GWAS with PCs as covariates; Manhattan and QQ plots produced.</td>
+<td style="text-align: left;">GWAS command, or Manhattan/QQ plots missing or clearly incorrect.</td>
+</tr>
+</tbody>
+</table>
+</section>
+<section id="submission-information" class="level3">
+<h3 class="anchored" data-anchor-id="submission-information">Submission Information</h3>
+<p>🚨 <strong>Please review our <a href="https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md">Assignment Submission Guide</a></strong> 🚨 for detailed instructions on how to format, branch, and submit your work. Following these guidelines is crucial for your submissions to be evaluated correctly.</p>
+<hr>
+<section id="submission-parameters" class="level4">
+<h4 class="anchored" data-anchor-id="submission-parameters">Submission Parameters</h4>
+<ul>
+<li><p>Submission Due Date: 11:59 PM – 01/04/2026</p></li>
+<li><p>The branch name for your repo should be: <code>assignment-2</code></p></li>
+<li><p>What to submit for this assignment:</p>
+<ul>
+<li>Populate this Quarto document (<code>assignment_2.qmd</code>).</li>
+<li>Render the document with Quarto: <code>quarto render assignment_2.qmd</code>.</li>
+<li>Submit both <code>assignment_2.qmd</code> and the rendered HTML file <code>assignment_2.html</code> in your pull request.</li>
+</ul></li>
+<li><p>What the pull request link should look like for this assignment: <code>https://github.com/&lt;your_github_username&gt;/gen_data/pull/&lt;pr_id&gt;</code></p>
+<ul>
+<li>Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. This helps the technical facilitator and learning support team review your submission easily.</li>
+</ul></li>
+</ul>
+<hr>
+<p>Checklist:</p>
+<ul>
+<li>Create a branch called <code>assignment-2</code>.</li>
+<li>Ensure that your repository is public.</li>
+<li>Review <a href="https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md#guidelines-for-pull-request-descriptions">the PR description guidelines</a> and adhere to them.</li>
+<li>Verify that your link is accessible in a private browser window.</li>
+<li>Confirm that both <code>assignment_2.qmd</code> and <code>assignment_2.html</code> are included in the pull request.</li>
+</ul>
+</section>
+</section>
+
+</main>
+<!-- /main column -->
+<script id="quarto-html-after-body" type="application/javascript">
+  window.document.addEventListener("DOMContentLoaded", function (event) {
+    const icon = "";
+    const anchorJS = new window.AnchorJS();
+    anchorJS.options = {
+      placement: 'right',
+      icon: icon
+    };
+    anchorJS.add('.anchored');
+    const isCodeAnnotation = (el) => {
+      for (const clz of el.classList) {
+        if (clz.startsWith('code-annotation-')) {                     
+          return true;
+        }
+      }
+      return false;
+    }
+    const onCopySuccess = function(e) {
+      // button target
+      const button = e.trigger;
+      // don't keep focus
+      button.blur();
+      // flash "checked"
+      button.classList.add('code-copy-button-checked');
+      var currentTitle = button.getAttribute("title");
+      button.setAttribute("title", "Copied!");
+      let tooltip;
+      if (window.bootstrap) {
+        button.setAttribute("data-bs-toggle", "tooltip");
+        button.setAttribute("data-bs-placement", "left");
+        button.setAttribute("data-bs-title", "Copied!");
+        tooltip = new bootstrap.Tooltip(button, 
+          { trigger: "manual", 
+            customClass: "code-copy-button-tooltip",
+            offset: [0, -8]});
+        tooltip.show();    
+      }
+      setTimeout(function() {
+        if (tooltip) {
+          tooltip.hide();
+          button.removeAttribute("data-bs-title");
+          button.removeAttribute("data-bs-toggle");
+          button.removeAttribute("data-bs-placement");
+        }
+        button.setAttribute("title", currentTitle);
+        button.classList.remove('code-copy-button-checked');
+      }, 1000);
+      // clear code selection
+      e.clearSelection();
+    }
+    const getTextToCopy = function(trigger) {
+      const outerScaffold = trigger.parentElement.cloneNode(true);
+      const codeEl = outerScaffold.querySelector('code');
+      for (const childEl of codeEl.children) {
+        if (isCodeAnnotation(childEl)) {
+          childEl.remove();
+        }
+      }
+      return codeEl.innerText;
+    }
+    const clipboard = new window.ClipboardJS('.code-copy-button:not([data-in-quarto-modal])', {
+      text: getTextToCopy
+    });
+    clipboard.on('success', onCopySuccess);
+    if (window.document.getElementById('quarto-embedded-source-code-modal')) {
+      const clipboardModal = new window.ClipboardJS('.code-copy-button[data-in-quarto-modal]', {
+        text: getTextToCopy,
+        container: window.document.getElementById('quarto-embedded-source-code-modal')
+      });
+      clipboardModal.on('success', onCopySuccess);
+    }
+      var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//);
+      var mailtoRegex = new RegExp(/^mailto:/);
+        var filterRegex = new RegExp('/' + window.location.host + '/');
+      var isInternal = (href) => {
+          return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href);
+      }
+      // Inspect non-navigation links and adorn them if external
+     var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool):not(.about-link)');
+      for (var i=0; i<links.length; i++) {
+        const link = links[i];
+        if (!isInternal(link.href)) {
+          // undo the damage that might have been done by quarto-nav.js in the case of
+          // links that we want to consider external
+          if (link.dataset.originalHref !== undefined) {
+            link.href = link.dataset.originalHref;
+          }
+        }
+      }
+    function tippyHover(el, contentFn, onTriggerFn, onUntriggerFn) {
+      const config = {
+        allowHTML: true,
+        maxWidth: 500,
+        delay: 100,
+        arrow: false,
+        appendTo: function(el) {
+            return el.parentElement;
+        },
+        interactive: true,
+        interactiveBorder: 10,
+        theme: 'quarto',
+        placement: 'bottom-start',
+      };
+      if (contentFn) {
+        config.content = contentFn;
+      }
+      if (onTriggerFn) {
+        config.onTrigger = onTriggerFn;
+      }
+      if (onUntriggerFn) {
+        config.onUntrigger = onUntriggerFn;
+      }
+      window.tippy(el, config); 
+    }
+    const noterefs = window.document.querySelectorAll('a[role="doc-noteref"]');
+    for (var i=0; i<noterefs.length; i++) {
+      const ref = noterefs[i];
+      tippyHover(ref, function() {
+        // use id or data attribute instead here
+        let href = ref.getAttribute('data-footnote-href') || ref.getAttribute('href');
+        try { href = new URL(href).hash; } catch {}
+        const id = href.replace(/^#\/?/, "");
+        const note = window.document.getElementById(id);
+        if (note) {
+          return note.innerHTML;
+        } else {
+          return "";
+        }
+      });
+    }
+    const xrefs = window.document.querySelectorAll('a.quarto-xref');
+    const processXRef = (id, note) => {
+      // Strip column container classes
+      const stripColumnClz = (el) => {
+        el.classList.remove("page-full", "page-columns");
+        if (el.children) {
+          for (const child of el.children) {
+            stripColumnClz(child);
+          }
+        }
+      }
+      stripColumnClz(note)
+      if (id === null || id.startsWith('sec-')) {
+        // Special case sections, only their first couple elements
+        const container = document.createElement("div");
+        if (note.children && note.children.length > 2) {
+          container.appendChild(note.children[0].cloneNode(true));
+          for (let i = 1; i < note.children.length; i++) {
+            const child = note.children[i];
+            if (child.tagName === "P" && child.innerText === "") {
+              continue;
+            } else {
+              container.appendChild(child.cloneNode(true));
+              break;
+            }
+          }
+          if (window.Quarto?.typesetMath) {
+            window.Quarto.typesetMath(container);
+          }
+          return container.innerHTML
+        } else {
+          if (window.Quarto?.typesetMath) {
+            window.Quarto.typesetMath(note);
+          }
+          return note.innerHTML;
+        }
+      } else {
+        // Remove any anchor links if they are present
+        const anchorLink = note.querySelector('a.anchorjs-link');
+        if (anchorLink) {
+          anchorLink.remove();
+        }
+        if (window.Quarto?.typesetMath) {
+          window.Quarto.typesetMath(note);
+        }
+        if (note.classList.contains("callout")) {
+          return note.outerHTML;
+        } else {
+          return note.innerHTML;
+        }
+      }
+    }
+    for (var i=0; i<xrefs.length; i++) {
+      const xref = xrefs[i];
+      tippyHover(xref, undefined, function(instance) {
+        instance.disable();
+        let url = xref.getAttribute('href');
+        let hash = undefined; 
+        if (url.startsWith('#')) {
+          hash = url;
+        } else {
+          try { hash = new URL(url).hash; } catch {}
+        }
+        if (hash) {
+          const id = hash.replace(/^#\/?/, "");
+          const note = window.document.getElementById(id);
+          if (note !== null) {
+            try {
+              const html = processXRef(id, note.cloneNode(true));
+              instance.setContent(html);
+            } finally {
+              instance.enable();
+              instance.show();
+            }
+          } else {
+            // See if we can fetch this
+            fetch(url.split('#')[0])
+            .then(res => res.text())
+            .then(html => {
+              const parser = new DOMParser();
+              const htmlDoc = parser.parseFromString(html, "text/html");
+              const note = htmlDoc.getElementById(id);
+              if (note !== null) {
+                const html = processXRef(id, note);
+                instance.setContent(html);
+              } 
+            }).finally(() => {
+              instance.enable();
+              instance.show();
+            });
+          }
+        } else {
+          // See if we can fetch a full url (with no hash to target)
+          // This is a special case and we should probably do some content thinning / targeting
+          fetch(url)
+          .then(res => res.text())
+          .then(html => {
+            const parser = new DOMParser();
+            const htmlDoc = parser.parseFromString(html, "text/html");
+            const note = htmlDoc.querySelector('main.content');
+            if (note !== null) {
+              // This should only happen for chapter cross references
+              // (since there is no id in the URL)
+              // remove the first header
+              if (note.children.length > 0 && note.children[0].tagName === "HEADER") {
+                note.children[0].remove();
+              }
+              const html = processXRef(null, note);
+              instance.setContent(html);
+            } 
+          }).finally(() => {
+            instance.enable();
+            instance.show();
+          });
+        }
+      }, function(instance) {
+      });
+    }
+        let selectedAnnoteEl;
+        const selectorForAnnotation = ( cell, annotation) => {
+          let cellAttr = 'data-code-cell="' + cell + '"';
+          let lineAttr = 'data-code-annotation="' +  annotation + '"';
+          const selector = 'span[' + cellAttr + '][' + lineAttr + ']';
+          return selector;
+        }
+        const selectCodeLines = (annoteEl) => {
+          const doc = window.document;
+          const targetCell = annoteEl.getAttribute("data-target-cell");
+          const targetAnnotation = annoteEl.getAttribute("data-target-annotation");
+          const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation));
+          const lines = annoteSpan.getAttribute("data-code-lines").split(",");
+          const lineIds = lines.map((line) => {
+            return targetCell + "-" + line;
+          })
+          let top = null;
+          let height = null;
+          let parent = null;
+          if (lineIds.length > 0) {
+              //compute the position of the single el (top and bottom and make a div)
+              const el = window.document.getElementById(lineIds[0]);
+              top = el.offsetTop;
+              height = el.offsetHeight;
+              parent = el.parentElement.parentElement;
+            if (lineIds.length > 1) {
+              const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]);
+              const bottom = lastEl.offsetTop + lastEl.offsetHeight;
+              height = bottom - top;
+            }
+            if (top !== null && height !== null && parent !== null) {
+              // cook up a div (if necessary) and position it 
+              let div = window.document.getElementById("code-annotation-line-highlight");
+              if (div === null) {
+                div = window.document.createElement("div");
+                div.setAttribute("id", "code-annotation-line-highlight");
+                div.style.position = 'absolute';
+                parent.appendChild(div);
+              }
+              div.style.top = top - 2 + "px";
+              div.style.height = height + 4 + "px";
+              div.style.left = 0;
+              let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter");
+              if (gutterDiv === null) {
+                gutterDiv = window.document.createElement("div");
+                gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter");
+                gutterDiv.style.position = 'absolute';
+                const codeCell = window.document.getElementById(targetCell);
+                const gutter = codeCell.querySelector('.code-annotation-gutter');
+                gutter.appendChild(gutterDiv);
+              }
+              gutterDiv.style.top = top - 2 + "px";
+              gutterDiv.style.height = height + 4 + "px";
+            }
+            selectedAnnoteEl = annoteEl;
+          }
+        };
+        const unselectCodeLines = () => {
+          const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"];
+          elementsIds.forEach((elId) => {
+            const div = window.document.getElementById(elId);
+            if (div) {
+              div.remove();
+            }
+          });
+          selectedAnnoteEl = undefined;
+        };
+          // Handle positioning of the toggle
+      window.addEventListener(
+        "resize",
+        throttle(() => {
+          elRect = undefined;
+          if (selectedAnnoteEl) {
+            selectCodeLines(selectedAnnoteEl);
+          }
+        }, 10)
+      );
+      function throttle(fn, ms) {
+      let throttle = false;
+      let timer;
+        return (...args) => {
+          if(!throttle) { // first call gets through
+              fn.apply(this, args);
+              throttle = true;
+          } else { // all the others get throttled
+              if(timer) clearTimeout(timer); // cancel #2
+              timer = setTimeout(() => {
+                fn.apply(this, args);
+                timer = throttle = false;
+              }, ms);
+          }
+        };
+      }
+        // Attach click handler to the DT
+        const annoteDls = window.document.querySelectorAll('dt[data-target-cell]');
+        for (const annoteDlNode of annoteDls) {
+          annoteDlNode.addEventListener('click', (event) => {
+            const clickedEl = event.target;
+            if (clickedEl !== selectedAnnoteEl) {
+              unselectCodeLines();
+              const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active');
+              if (activeEl) {
+                activeEl.classList.remove('code-annotation-active');
+              }
+              selectCodeLines(clickedEl);
+              clickedEl.classList.add('code-annotation-active');
+            } else {
+              // Unselect the line
+              unselectCodeLines();
+              clickedEl.classList.remove('code-annotation-active');
+            }
+          });
+        }
+    const findCites = (el) => {
+      const parentEl = el.parentElement;
+      if (parentEl) {
+        const cites = parentEl.dataset.cites;
+        if (cites) {
+          return {
+            el,
+            cites: cites.split(' ')
+          };
+        } else {
+          return findCites(el.parentElement)
+        }
+      } else {
+        return undefined;
+      }
+    };
+    var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]');
+    for (var i=0; i<bibliorefs.length; i++) {
+      const ref = bibliorefs[i];
+      const citeInfo = findCites(ref);
+      if (citeInfo) {
+        tippyHover(citeInfo.el, function() {
+          var popup = window.document.createElement('div');
+          citeInfo.cites.forEach(function(cite) {
+            var citeDiv = window.document.createElement('div');
+            citeDiv.classList.add('hanging-indent');
+            citeDiv.classList.add('csl-entry');
+            var biblioDiv = window.document.getElementById('ref-' + cite);
+            if (biblioDiv) {
+              citeDiv.innerHTML = biblioDiv.innerHTML;
+            }
+            popup.appendChild(citeDiv);
+          });
+          return popup.innerHTML;
+        });
+      }
+    }
+  });
+  </script>
+</div> <!-- /content -->
+
+
+
+
+</body></html>
\ No newline at end of file
diff --git a/02_activities/assignments/assignment_2.qmd b/02_activities/assignments/assignment_2.qmd
index fd93722..a5399bf 100644
--- a/02_activities/assignments/assignment_2.qmd
+++ b/02_activities/assignments/assignment_2.qmd
@@ -5,6 +5,14 @@ format: html
 
 You will need **PLINK2** installed and available in your PATH. Please follow the OS-specific setup guide in [`SETUP.md`](../../SETUP.md). The dataset for this assignment consists of the following binary PLINK files: `gwa.A2.bed`, `gwa.A2.bim`, `gwa.A2.fam` , available at the following Google Drive link: <https://drive.google.com/drive/folders/1rHoy3z52Yyj985ukjjtLhfIBxchpyYtZ?usp=drive_link>. Please download all three files and save them in `02_activities/data/`.
 
+```{r setup, include=FALSE}
+# Input files are expected in ./02_activities/data/.
+#knitr::opts_chunk$set(echo = TRUE)
+knitr::opts_knit$set(root.dir = normalizePath("../../"))
+library(data.table)
+library(qqman)
+```
+
 #### Question 1: Data inspection
 
 Before you run any models, first get familiar with the dataset. You may find `data.table::fread()` in R helpful for reading `.bim` and `.fam` files.
@@ -12,19 +20,45 @@ Before you run any models, first get familiar with the dataset. You may find `da
 (i) Read the `.fam` file. How many samples does the dataset contain?
 
 ```{r}
-# Your code here...
+#Read the .fam file
+
+fam <- fread("./02_activities/data/gwa.A2.fam", header = FALSE)
+nrow(fam)
+
+#The dataset contains 4000 samples.
 ```
 
 (ii) Read the `.bim` file. How many SNPs does the dataset contain?
 
 ```{r}
-# Your code here...
+#Read the .bim file
+bim <- fread("./02_activities/data/gwa.A2.bim", header = FALSE)
+nrow(bim)
+
+#The dataset contains 306102 SNPs.
 ```
 
 (iii) Make a histogram (or density plot) of the phenotype. Does it look roughly Gaussian?
 
 ```{r}
-# Your code here...
+#Check the header
+head(fam)
+
+#Name the columns
+colnames(fam) <- c("FID","IID","PID","MID","SEX","PHENOTYPE")
+
+# Confirm that the column names were applied as intended
+names(fam)
+head(fam)
+
+#Create a histogram
+hist(fam$PHENOTYPE,
+     main = "Histogram of Phenotype",
+     xlab = "PHENOTYPE",
+     breaks = 30)
+
+#Yes, the distribution of the phenotype looks roughly Gaussian.
+
 ```
 
 #### Question 2: Quality control (QC)
@@ -34,7 +68,7 @@ Now we will perform QC using PLINK2 for the genotype files in `gwa.A2`.
 (i) Using PLINK2 from the command line (bash), perform basic QC with the following filters: MAF ≥ 0.05, SNP missingness (`--geno`) ≤ 0.01, individual missingness (`--mind`) ≤ 0.10, and HWE p-value ≥ 0.00005, and output the QC’ed dataset as `gwa.qc.A2`.
 
 ```{bash}
-# Your code here...
+plink2 --bfile ./02_activities/data/gwa.A2 --maf 0.05 --geno 0.01 --mind 0.1 --hwe 0.00005 --make-bed --out ./02_activities/data/gwa.qc.A2
 ```
 
 #### Question 3: Relatedness
@@ -44,19 +78,25 @@ In this question, you will use **PLINK2’s built-in KING-robust kinship** (`--k
 i)  Perform LD pruning on `gwa.qc.A2` using PLINK2 with the following parameters: `--indep-pairwise 500 50 0.05`, and then generate a new dataset containing only the pruned SNPs.
 
 ```{bash}
-# Your code here...
+#Create a subset of approximately independent SNPs
+plink2 --bfile ./02_activities/data/gwa.qc.A2 --indep-pairwise 500 50 0.05 --out ./02_activities/data/gwa.qc.A2_pruned
+
+#Build a new dataset using only pruned SNPs
+plink2 --bfile ./02_activities/data/gwa.qc.A2 --extract ./02_activities/data/gwa.qc.A2_pruned.prune.in --make-bed --out ./02_activities/data/gwa.qc.A2_pruned
 ```
 
 (ii) Use PLINK2 on the LD-pruned dataset to identify a set of unrelated individuals up to (approximately) 2nd-degree relatives (use a kinship cutoff of 0.0884).
 
 ```{bash}
-# Your code here...
+
+plink2 --bfile ./02_activities/data/gwa.qc.A2_pruned --king-cutoff 0.0884 --out ./02_activities/data/gwa.A2_king
+
 ```
 
 (iii) Using the unrelated individual list from part (ii), create a dataset containing only unrelated individuals and name it `gwa.qc_unrelated.A2`.
 
 ```{bash}
-# Your code here...
+plink2 --bfile ./02_activities/data/gwa.qc.A2 --keep ./02_activities/data/gwa.A2_king.king.cutoff.in.id --make-bed --out ./02_activities/data/gwa.qc_unrelated.A2
 ```
 
 #### Question 4: principal component analysis (PCA)
@@ -66,7 +106,9 @@ i)  Perform LD pruning on `gwa.qc.A2` using PLINK2 with the following parameters
     (Hint: Refer to the PCA section in the GWAS Tutorial II. You should use a `*.prune.in` file to select a set of independent SNPs before performing PCA.)
 
 ```{bash}
-# Your code here...
+plink2 --bfile ./02_activities/data/gwa.qc_unrelated.A2 --extract ./02_activities/data/gwa.qc.A2_pruned.prune.in --make-bed --out ./02_activities/data/gwa.qc_unrelated_pruned.A2
+
+plink2 --bfile ./02_activities/data/gwa.qc_unrelated_pruned.A2 --pca 20 --out ./02_activities/data/gwa_unrel_PCA.A2
 ```
 
 #### Question 5: GWAS analyses
@@ -82,19 +124,46 @@ Assume that:
 (i) Using PLINK2, run a linear regression GWAS using `gwa.qc_unrelated.A2` as the input dataset. Adjust for PCs 1–3 as covariates.
 
 ```{bash}
-# Your code here...
+plink2 --bfile ./02_activities/data/gwa.qc.A2 --ci 0.95 --glm --covar ./02_activities/data/gwa_unrel_PCA.A2.eigenvec --covar-name PC1,PC2,PC3 --adjust --out ./02_activities/data/gwa.qc_assoc.A2
 ```
 
 (ii) Create a Manhattan plot of the GWAS results.
 
 ```{r}
-# Your code here...
+# Load PLINK2 linear results
+assocA2 <- data.table::fread("./02_activities/data/gwa.qc_assoc.A2.PHENO1.glm.linear")
+
+head(assocA2)
+
+# Keep only the additive test (one row per SNP)
+assoc_addA2 <- assocA2[TEST == "ADD"]
+
+# qqman expects columns named CHR, BP, SNP, and P.
+# In PLINK2 output they are #CHROM, POS, ID, and P.
+setnames(assoc_addA2, c("#CHROM","POS","ID"), c("CHR","BP","SNP"))
+
+png("./02_activities/assignments/manhattan_plot.png", width = 1200, height = 800, res = 150)
+# Manhattan plot
+qqman::manhattan(assoc_addA2,
+          chr = "CHR",
+          bp  = "BP",
+          snp = "SNP",
+          p   = "P",
+          xlab = "Chromosome",
+          ylab = "-log10(P)",
+          suggestiveline = FALSE,
+          cex.axis = 1.5,
+          col = c("lightblue", "lightslateblue"))
+dev.off()
 ```
 
 (iii) Create a QQ plot of the GWAS p-values.
 
 ```{r}
-# Your code here...
+png("./02_activities/assignments/QQ_plot.png", width = 1200, height = 800, res = 150)
+# Q-Q plot
+qqman::qq(assoc_addA2$P, main = "Q-Q plot of GWAS p-values")
+dev.off()
 ```
 
 ### Criteria
diff --git a/02_activities/assignments/manhattan_plot.png b/02_activities/assignments/manhattan_plot.png
new file mode 100644
index 0000000..71b3b30
Binary files /dev/null and b/02_activities/assignments/manhattan_plot.png differ

| Criteria | Complete | Incomplete |
|---|---|---|
| Data Inspection | Correct sample/SNP counts and variable type identified. | Missing or incorrect counts or variable type. |
| Allele Frequency Estimation | Correct allele and minor allele frequencies computed. | Frequencies missing or wrong. |
| Hardy–Weinberg Equilibrium Test | Correct PLINK command and p-value extraction in R. | PLINK command or extraction incorrect/missing. |
| Genetic Association Test | Correct regressions, plots, coding, and interpretation. | Regression, plots, or interpretation missing/incomplete. |

| Criteria | Complete | Incomplete |
|---|---|---|
| Data inspection | Correct sample and SNP counts; phenotype plot with a brief comment on Gaussianity. | Counts or phenotype description/plot missing or incorrect. |
| QC & LD pruning | Correct PLINK2 QC command and thresholds. | QC/pruning commands, thresholds, or output datasets missing or incorrect. |
| Relatedness & PCA | Correct use of PLINK2 command to obtain unrelated samples and PCA run on pruned SNPs. | Relatedness step, unrelated dataset, or PCA analysis missing or incorrect. |
| GWAS & visualisation | Linear regression GWAS with PCs as covariates; Manhattan and QQ plots produced. | GWAS command, or Manhattan/QQ plots missing or clearly incorrect. |
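
As a quick sanity check on the Question 5 output before plotting, the additive-test rows can also be filtered from the shell. The sketch below is not part of the assignment: `glm_hits` is a hypothetical helper name, the input is assumed to be a tab-delimited PLINK2 `--glm` result file with `TEST` and `P` header columns, and the default cutoff of 5e-8 is the conventional genome-wide threshold (the same value `qqman` uses for its default genome-wide line), not a value specified in the questions.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the assignment): print the header plus every
# additive-test row whose p-value is below a threshold.
# Assumes a tab-delimited PLINK2 --glm result file; columns are located by
# header name because the column set varies with PLINK2 version and options.
glm_hits() {
  local file="$1"
  local threshold="${2:-5e-8}"   # assumed conventional genome-wide cutoff
  awk -v thr="$threshold" '
    BEGIN { FS = "\t"; OFS = "\t" }
    NR == 1 {
      # Map each header name to its column index.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("TEST" in col) || !("P" in col)) {
        print "missing TEST or P column" > "/dev/stderr"
        exit 1
      }
      print
      next
    }
    # Keep ADD rows only; skip rows where PLINK2 reported P as NA.
    $col["TEST"] == "ADD" && $col["P"] != "NA" && ($col["P"] + 0) < (thr + 0)
  ' "$file"
}
```

For example, `glm_hits ./02_activities/data/gwa.qc_assoc.A2.PHENO1.glm.linear` would list any SNPs that sit above the genome-wide line in the Manhattan plot; a second argument such as `1e-5` loosens the cutoff.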