---
output:
  html_document:
    fig_caption: yes
    number_sections: yes
    theme: readable
    toc: yes
    toc_depth: 3
editor_options:
  chunk_output_type: console
---
```{r setup0, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r message=FALSE, warning=FALSE, paged.print=FALSE, echo=FALSE, include=FALSE}
# load packages for chapter
options(digits = 10)
library(bookdown)
library(emmeans)
library(ggplot2)
library(dplyr)
library(kableExtra)
library(knitr)
library(tables)
library(plyr)
library(pander)
library(multcomp)
library(agricolae)
library(nlme)
library(car)
library(tidyr)
library(latex2exp)
```
# Two Population Means {#ch2pops}
## Learning Objectives for Chapter {#LearnObj8}
1. Compare two population means based on two samples
1. Determine if samples are paired or unpaired ("independent") when comparing two means
1. State the null and alternative hypotheses for a two-sample t-test
1. Calculate the sample averages and the difference of the sample averages
1. Perform an F-test of homogeneity of variance by calculating sample variances, and determine if you can "pool" the sample variances
1. Calculate the standard error of the difference
1. Describe the three cases for comparing two population means and determine when each one is appropriate
1. Calculate the t-statistic and identify the appropriate critical t-value
1. Interpret the results of a t-test in terms of the original scientific question
1. Sketch the t distribution, with the following parts labeled:
- the critical t value
- the test statistic value for the sample
- the p value area under the curve
- the alpha level area under the curve
- where the CI bounds would lie on the X axis (approximately)
## Two Populations
We have learned in the previous chapters how to determine if the mean of a sampled population is significantly different from a hypothesized value. However, researchers often want to know whether two populations differ from each other when both populations are sampled and neither mean is known. In research experiments, population means are hypothesized to differ when the populations receive different treatments. For example, one could conduct an experiment comparing a fertilized field to an unfertilized field to determine if they differ in yield, or an experiment comparing a high-carbohydrate diet to a high-protein diet to determine if diet affects the quality of cows' milk. These experiments are conducted by taking samples from each of the populations and using the difference between the two samples to make inferences about the difference between the two treatments or the two populations.
```{block, type = 'stattip'}
# Steps to Test Two Population Means
1. State the null and alternative hypotheses about the two populations.
2. Collect samples from each population.
3. Calculate the sample averages, $\bar{Y}_1$ and $\bar{Y}_2$, the difference between the averages, $\bar{d}$, and the sample variances, $S^2_1$ and $S^2_2$.
4. If you do not know whether the two population variances are equal ($\sigma_1^2 = \sigma_2^2$), perform an F-test to determine if you can pool the sample variances.
5. Calculate the standard error of the difference between the two samples, $S_{\bar{d}}$, accordingly.
6. Calculate the t-statistic.
7. Compare the calculated t-statistic to the critical t-value, and decide whether to reject or fail to reject the null hypothesis.
```
## Hypothesis Testing
Before starting calculations for the test of hypothesis, it is important to understand the scientific question being asked. The purpose of testing two population means is to determine if there is a statistically significant difference between two populations or two treatments. Thus, our null and alternative hypotheses can be stated as
<br>
$$\text{Null hypothesis: the mean of population 1 is equal to the mean of population 2} \\[15pt]$$
$$H_0 : \mu_1 = \mu_2 \quad \text{or} \quad \mu_{\bar{d}} = 0 \\[15pt]$$
$$\text{Alternative hypothesis: the mean of population 1 is not equal to the mean of population 2} \\[15pt]$$
$$H_1 : \mu_1 \neq \mu_2 \quad \text{or} \quad \mu_{\bar{d}} \neq 0$$
<br>
As it turns out, it is a lot better to pose hypotheses as inequalities, for example:
<br>
$$H_0 : \mu_2 \le \mu_1 + PSD \quad \text{or} \quad \mu_2 - \mu_1 \le PSD$$
$$H_1 : \mu_2 \gt \mu_1 + PSD \quad \text{or} \quad \mu_2 - \mu_1 \gt PSD$$
<br>
The reason for this is that just testing whether two means are different is not very informative. Presumably, all means are different, even if by an infinitesimal amount, so we already know the answer and it is not informative to test it. If we have large enough samples, the difference will always be detected. But by posing the hypothesis as a statement that says in which direction the difference goes and whether it is larger than a Practically Significant Difference (PSD), we have the opportunity to learn much more. In addition, we can use the PSD and the variances from the samples to calculate the *power* of the test: the probability of detecting a difference that large with the sample size available. If power is low, the inability to reject the null hypothesis does not mean much. The main point here is that we should pose hypotheses that give us the chance to learn more than what we already know. We know the means are different; but how different are they, and which one is larger?
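To make this concrete, base R's `power.t.test()` estimates the power to detect a given PSD before the experiment is run. The PSD of 20 units, standard deviation of 25, and sample size of 10 below are hypothetical values chosen only for illustration:
```{r}
# power of a two-sample t-test to detect a hypothetical PSD of 20 units,
# assuming a within-group standard deviation of 25 and 10 observations per group
power.t.test(n = 10, delta = 20, sd = 25, sig.level = 0.05,
             type = "two.sample", alternative = "one.sided")
```
If the reported power is low, a failure to reject the null hypothesis says little about the populations.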
## Types of samples to compare two populations
There are two different methods of sampling to compare treatment means: **independent** observations and **paired** or dependent observations. It is important to understand how the two populations are sampled to decide which equations are appropriate for the experiment. If observations are paired, where each pair consists of one observation from one population and one observation from the other population, it is as if each pair were a block.
### Two samples of Independent Observations
Samples are considered **independent** when there is no relationship between the observations in one treatment and the observations in the other treatment, and the experimental units are randomly and independently assigned to a given treatment. For example, suppose that 8 pigs are available to determine the effects of two diets on weight gain. Individual pigs can be randomly assigned to one of the two different diets and their weight gain recorded after some time. In such case, each pig would be an experimental unit. There would be no reason to link the weight gain of any pig in one of the diets with any specific pig in the other diet. Weight gains in the two treatments are independent from each other.
Another way to understand the idea of independence of observations in this case is to think that the actual weight gain of each individual pig will have two important causes or sources of variation among pigs: diet and individual pig differences. Individual pigs will differ from each other because of their "personal" body type and composition and their "personal" level of activity and food use efficiency. These individual effects can potentially be different for all 8 pigs. The number of "individual pig" effects and the number of observations would be the same. Assuming that the pigs were randomly selected from a relevant population, their individual effects would not be related to each other in any way (Figure \@ref(fig:IndSamples)).
<br>
```{r IndSamples, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '80%', fig.align='center', echo=FALSE, fig.cap ="A set of 8 independent pigs is randomly split into two treatments. Half of the eight pigs received Treatment 1 and half of the eight pigs received Treatment 2, where $r = 4$ is the sample size per treatment, and $i = 1, 2, 3, 4$ is the observation index for a given treatment. In this case, the random variable that estimates the difference between means is the difference between averages, where each average comes from 4 observations."}
knitr::include_graphics("images/CH8IndSamp.png")
```
<br>
### One sample of Paired Observations
Observations are considered **paired** or dependent when there is a relationship between observations in one treatment and observations in the other treatment. Each observation in one treatment is associated with one and only one observation in the other treatment by some grouping or pairing factor that can potentially affect weight gain. For example, observations that are made on the same experimental unit are obviously paired by the experimental unit, and observations made in the same area could also be considered paired. Many factors can group observations, such as class, size, age, paddock, pen, furrow, etc.
Following the example of pig diets, instead of using 8 pigs measured once each, we could have used 8 pigs and measured their performance under each of the diets during two different periods. Suppose that in period 1, pigs 1-4 received the high protein diet and pigs 5-8 received the high carbohydrate diet. Then, after a period of equalization, diets were switched during period 2. In this case we would obtain a sample with 16 weight gains, but there would be only 8 individual pig effects, and each observation in one diet would be paired to an observation in the other diet because they were both obtained from the same pig. The number of observations would be twice the number of individual pig effects, which means that each pig effect would enter into two observations, thus introducing a correlation or lack of independence between observations.
<br>
```{r PairedMeasurements, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '90%', fig.align='center', echo=FALSE, fig.cap ="Paired observations of eight pigs under two treatments. Each of the eight pigs has one measurement recorded for each treatment, where $r = 8$ is the number of observations per treatment. In this case, the random variable that estimates the difference between means is the average of differences between members of each pair, which comes from 8 pairs of observations."}
knitr::include_graphics("images/CH8PairSamp2Measurements.png")
```
<br>
In the above example, the samples were paired because the same individual pigs were used for each treatment. Paired sampling is not limited to before-and-after experiments on the same individuals, however. It can be extended to situations where there is some relationship or dependency between the replicates across treatment groups. For example, pigs that are from the same pen share genetic and environmental effects that are unique to that pen, and we can therefore treat the pens (and the pigs that come from them) as our experimental units. Imagine there are three different pig pens. From each pen, two pigs are randomly selected and assigned to different diets. The pigs that come from the same pen are "paired" and the differences between weights from each of the paired samples are measured. In this example, the experimental unit is not individual pigs but the pens that the pair of pigs come from.
Pairing of observations can result from other experimental procedures, and those pairings may have to be incorporated in the analysis. Pigs that are from the same litter and pen share genetic and environmental effects that are unique to that pen, and we can treat pairs of individuals from the same pen as associated by the common pen effect. Observations on twins are paired due to common genetic makeup, common womb environment and potentially common rearing environment.
<br>
```{r PairedSamples, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '90%', fig.align='center', echo=FALSE, fig.cap ="Paired observations of pigs from the same pen under two treatments. Only one pig from each of the three pens received treatment 1, while the remaining pig from each of the three pens received treatment 2. Each pen has two measurements recorded, one for each treatment, where $r = 3$ is the number of observations per treatment, and $i = 1,2,3$ is the observation number for a given treatment."}
knitr::include_graphics("images/CH8PairSamp.png")
```
<br>
### Advantages and disadvantages of paired observations
Why use independent or paired procedures? Each has advantages and disadvantages, and the best choice depends on many factors, but particularly on the cost of using each experimental unit, on the amount of variation among units (e.g. pigs) and the degree to which individual unit effects remain constant from one observation to another.
- Independent observations may require more units (pigs) than paired ones for the same total number of observations.
- Paired observations remove the individual effects from the errors: each unit is observed under both treatments and the two values are subtracted from each other, so the error due to the individual (pig) cancels out.
- If individual effects are small relative to other sources of error, treating the observations as pairs is inefficient and reduces the power of the test, as the short simulation below illustrates.
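The following simulation sketches this trade-off with made-up numbers: when individual (pig) effects are large, the paired analysis gives a much smaller p-value than an analysis that ignores the pairing.
```{r}
# hypothetical simulation: 8 pigs, each with its own large "pig effect";
# diet 2 adds a true gain of 5 units over diet 1
set.seed(42)
pig.effect <- rnorm(8, mean = 0, sd = 10)    # large pig-to-pig variation
diet1 <- 50 + pig.effect + rnorm(8, sd = 2)  # each pig measured on diet 1
diet2 <- 55 + pig.effect + rnorm(8, sd = 2)  # same pigs measured on diet 2
# paired analysis: the pig effects cancel in the differences
t.test(diet2, diet1, paired = TRUE)$p.value
# ignoring the pairing leaves the pig-to-pig variation in the error
t.test(diet2, diet1, var.equal = TRUE)$p.value
```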
## Comparing Sample Variances {#compare2Variances}
When observations are independent, we will have two estimated variances, one for each sample. In some cases we may have previous information indicating that the population variances are equal or unequal, and we use that information to decide whether we can pool the estimates of the variance from each sample. When there is no information about the equality of the variances, we can use an F-test to determine if the variance estimates can be pooled. Pooling variance estimates is very desirable because it increases the degrees of freedom and results in tests that have a lower probability of Type II error.
Population variances are estimated with the sample variances, $S^2_i$, for each of the populations or treatments.
<br>
$$S^2_i = \frac{\sum (Y_{ij} - \bar{Y}_i)^2}{r_i - 1}$$
<br>
To perform a test of equality of variances we need a statistic that has a known distribution, so that we can associate probabilities with sample results. It turns out that, as seen in the [F Distribution section of Chapter 6](#FDist), a quotient of independent estimates of the same variance has an F distribution. Therefore, we use the F distribution to determine the probability of obtaining variance estimates at least as different as those observed when the population variance is in fact the same. If the probability is too small (we use $\alpha = 0.10$ for this test), we reject the equality of variances.
When the variances are found to be different, we clearly need to estimate two different variances. However, when we cannot reject the hypothesis of equality, it may simply be due to lack of power, which usually results from small sample size. Therefore, the decision to pool variances because of a failure to reject the proposed equality may not be a good decision. In the end, what really matters is how much the variances differ. For practical purposes, in this course we test the equality of variances with $\alpha = 0.10$, and if we fail to reject the equality, we pool the variance estimates from the two samples.
<br>
$$\text{Null hypothesis: the variance of population 1 is equal to the variance of population 2} \\[20pt]$$
$$H_0 : \sigma_1^2 = \sigma_2^2 \\[20pt]$$
$$\text{Alternative hypothesis: the variance of population 1 is not equal to the variance of population 2} \\[25pt]$$
$$H_1 : \sigma_1^2 \neq \sigma_2^2 \\[20pt]$$
<br>
**Calculated F**
<br>
```{r FCurve, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '80%', fig.align='center', echo=FALSE, fig.cap ="F-Distribution for testing the equality of variances. The shaded area on the right is $\\alpha/2$. Calculated F values in the right tail lead to rejection of the hypothesis of equality of variances."}
# F distribution with df1 = df2 = 15
df1 <- 15
df2 <- 15
# draw the F density curve
curve(df(x, df1 = df1, df2 = df2),
from = 0,
to = 5,
lwd = 2.5,
xlab = "",
ylab = "",
ylim = c(-0.05, 0.9))
abline(h = 0)
# Add the shaded area.
cord.x <- c(qf(0.95, df1 = df1, df2 = df2),
seq(qf(0.95, df1 = df1, df2 = df2), 5, 0.02),
qf(0.95, df1 = df1, df2 = df2))
cord.y <- c(0,
df(seq(qf(0.95, df1 = df1, df2 = df2), 5, 0.02),
df1 = df1, df2 = df2), 0)
polygon(cord.x, cord.y, col = "skyblue")
curve(df(x, df1 = df1, df2 = df2),
from = 0,
to = 5,
lwd = 2.5,
xlab = "",
ylab = "",
add = TRUE,
ylim = c(-0.05, 0.9))
text(y = -0.04, x = qf(0.95, df1 = df1, df2 = df2), pos = 4,
labels = "Reject equality of variances", col = "red")
lines(x = c(2.5, 2.8), y = c(0.03, 0.20))
text(y = 0.19, x = 2.80, pos = 4,
     labels = TeX("$\\frac{\\alpha}{2}$"), col = "blue")
```
<br>
In the calculation, the larger of the two variances is used as the numerator and the smaller of the two variances as the denominator.
<br>
$$F = \frac { \text{larger} \ S^2}{\text{smaller} \ S^2} \qquad \text{with} \qquad df_{\text{numerator}} = r_{\text{larger}} -1 , df_{\text{denominator}} = r_{\text{smaller}} -1$$
<br>
This calculated F-value is compared to the critical F-value from the F-distribution table. The degrees of freedom for the two samples are needed to identify the critical F-value and determine whether the calculated F-value is significant, that is, whether the sample variances can be treated as estimates of a common variance. Since this is a two-tailed F-test, the $\alpha$ value used to determine the critical F-value, $F_{crit}$, is divided by two, $\frac{\alpha}{2}$.
The results of the F-test will help identify which case to use for calculating the standard error of the difference for our samples.
**Decision rule**
If $F_{calc} > F_{crit, \frac{\alpha}{2}}$, then the null hypothesis, $H_0 : \sigma^2_1 = \sigma_2^2$, is rejected and the population variances are not equal.
If $F_{calc} < F_{crit, \frac{\alpha}{2}}$, then the null hypothesis, $H_0 : \sigma_1^2 = \sigma_2^2$, is not rejected and the population variances are assumed to be equal.
<br>
```{r FTestDec, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '80%', fig.align='center', echo=FALSE, fig.cap ="F-Test Decision Table from www.statistics4u.info"}
knitr::include_graphics("images/CH8FTestDec.png")
```
<br>
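In R, the critical F-value for this two-tailed test can be obtained with `qf()`; for example, with $\alpha = 0.10$ and 3 numerator and 3 denominator degrees of freedom:
```{r}
# critical F for a two-tailed test of equal variances;
# alpha = 0.10 is split between the two tails
qf(p = 0.10 / 2, df1 = 3, df2 = 3, lower.tail = FALSE)
```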
## Cases to test the difference between two population means
There are 3 cases for performing this test. When observations are independent, we have two cases. Case 1 applies when the variances are known to be equal or the test does not reject equality of variances. Case 2 applies when the variances are known to be unequal or the test rejects equality of variances. Paired observations are called Case 3, where there is only one variance, because the original set of paired observations reduces to a single set of differences whose mean is tested against 0. The equations for Cases 1 and 2 are the same up to the point where we either pool the variances or not. The main subdivision between Cases 1 & 2 vs. Case 3 is depicted with formulas and a numerical example in Figure \@ref(fig:PairedIndepCalcFig).
<br>
```{r PairedIndepCalcFig, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '90%', fig.align='center', echo=FALSE, fig.cap ="Test of differences between two population means when observations are paired (top) and independent (bottom). The independent case is further divided into Case 1 and Case 2 depending on whether the variance estimates can be pooled or not, respectively. When variances can be pooled, the equation for the variance of the difference between averages is further simplified by using the same pooled variance for both samples."}
knitr::include_graphics("images/PairedIndepCalcFig.png")
```
<br>
### Case 1: Independent samples with equal population variances {#Case1}
When samples are independent, the random variable of interest is the difference between two independent means. The sample averages, $\bar{Y}_i$, for each of the populations or treatments, and the difference between the two averages, $\bar{d}$, are
<br>
$$\bar{Y}_i = \frac{1}{r_i} \sum_{j=1}^{r_i} Y_{ij} \qquad \text{where} \ i \ \text{is the population, 1 or 2, and } \ j \ \text{is the observation number} \\[20pt]$$
$$\bar{d} = \bar{Y}_2-\bar{Y}_1$$
<br>
We apply the [Properties of Mean and Variance] of random variables to obtain an estimate of the variance of $\bar{d}$. Because the averages are independently obtained, the variance of the difference between averages is the sum of the variances of each average. Using the symbol "$V\{\text{random variable}\}$" to represent the variance of a random variable, we have:
<br>
$$V\{\bar{d}\} = V\{\bar{Y_2} - \bar{Y_1}\} = V\{\bar{Y_2}\} + V\{\bar{Y_1}\} \\[20pt]
= \frac{V\{Y_2\}}{r_2} + \frac{V\{Y_1\}}{r_1}$$
<br>
Those variances are estimated from the samples, so we use capital S to refer to the estimates:
<br>
$$S_{\bar{d}}^2 = \frac{S_{Y_2}^2}{r_2} + \frac{S_{Y_1}^2}{r_1}$$
<br>
When the two population variances are equal, the two sample variances are estimates of the same population variance. To get a better estimate of this single population variance we can pool the deviations about each average and obtain a pooled sample variance $(S_Y^2)$ that has more degrees of freedom.
<br>
$$S_Y^2 = \frac{S_{Y_1}^2 \ (r_1-1) + S_{Y_2}^2 \ (r_2-1)}{(r_1 + r_2-2)} \qquad \text{with} \qquad df = r_1+r_2-2$$
<br>
The pooled variance is now applied to the equation for $S_{\bar{d}}^2$ to obtain
$$S_{\bar{d}}^2 = \frac{S_Y^2}{r_1} + \frac{S_Y^2}{r_2} \qquad \text{with} \qquad df = r_1+r_2-2$$
<br>
The test statistic for the hypothesis $H_0: \mu_{\bar{d}} = 0$ can be calculated as
<br>
$$t_{calc} = \frac{\bar{d}-\mu_{\bar{d}}}{S_{\bar{d}}} = \frac{\bar{d}}{S_{\bar{d}}}$$
<br>
The decision rule is as before, for a single population when variance was unknown (Figure \@ref(fig:NormalTails)).
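Before working through an example, the Case 1 formulas can be collected into a small helper function. This is an illustrative sketch; the function name and interface are hypothetical, not part of the original analysis:
```{r}
# pooled two-sample t-statistic "by hand", assuming equal population variances;
# y1 and y2 are numeric vectors of independent observations
pooled.t <- function(y1, y2) {
  r1 <- length(y1)
  r2 <- length(y2)
  # the pooled variance weights each sample variance by its degrees of freedom
  s2.pooled <- (var(y1) * (r1 - 1) + var(y2) * (r2 - 1)) / (r1 + r2 - 2)
  se.dbar <- sqrt(s2.pooled / r1 + s2.pooled / r2)
  c(t = (mean(y2) - mean(y1)) / se.dbar, df = r1 + r2 - 2)
}
```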
### Bean Drought Example: independent observations, equal variances
Below are data collected on common bean plots that were randomly assigned to two different irrigation treatments: drought and normal irrigation. Yield data were collected from these plots and recorded in the table below.
<br>
<br>
| Treatment | Yield 1 | Yield 2 | Yield 3 | Yield 4 | Total | Mean |
|------------:|:-------:|:-------:|:-------:|:-------:|:-------:|:------:|
| Drought | 590 | 720 | 720 | 190 | 2220 | 555 |
| Irrigated | 2990 | 2950 | 2660 | 2120 | 10720 | 2680 |
<br>
```{r}
bean.yield <- data.frame(
yield = c("yield 1", "yield 2", "yield 3", "yield 4"),
drought = c(590, 720, 720, 190 ),
irrigated = c(2990, 2950, 2660, 2120))
```
<br>
**Hypotheses**
This experiment features two distinctly defined treatments: drought and irrigated. Researchers want to determine if there is a significant difference between the two treatments, therefore, the null and alternative hypotheses can be stated as
<br>
$$\text{Null hypothesis:} \\[25pt]
\text{the mean of the drought treatment is equal to the mean of the irrigated treatment, or} \\[25pt]
\text{there is no difference between drought and irrigated treatments on common bean yield} \\[25pt]$$
$$\mu_1 = \mu_2 \quad \text{or} \quad \mu_{\bar{d}} = 0 \\[25pt]$$
$$\text{Alternative hypothesis: } \\[15pt]
\text{the mean of the drought treatment is not equal to the mean of the irrigated treatment, or } \\[15pt]
\text{there is a difference between drought and irrigated treatments on common bean yield} \\[15pt]$$
$$\mu_1 \neq \mu_2 \quad \text{or} \quad \mu_{\bar{d}} \neq 0$$
<br>
**Sampling Method**
Since the plots were randomly assigned to the treatments and there is no additional information linking observations across treatments, we can assume that the observations are independent of one another.
**Difference between averages**
As the calculations begin on the two treatments of this experiment, we will designate **Sample 1** as the **drought** treatment and **Sample 2** as the **irrigated** treatment to simplify labeling. Be careful to maintain this designation consistently throughout the complete set of calculations.
Since the observations are independent, the average of each sample is calculated as
<br>
\begin{equation}
\bar{Y}_1 = \frac{\sum Y_{1i}}{r_1} \\[20pt]
= \frac{590 + 720 + 720 + 190}{4} \\[20pt]
= 555 \\[25pt]
\bar{Y}_2 = \frac{\sum Y_{2i}}{r_2} \\[20pt]
= \frac{2990 + 2950 + 2660 + 2120}{4} \\[20pt]
= 2680
\end{equation}
<br>
and the average difference is calculated as the difference between the two averages.
<br>
$$\bar{d} = \bar{Y}_2 - \bar{Y}_1 = 2680 - 555 = 2125$$
<br>
```{r, message=FALSE, warning=FALSE}
#the sum of the observations in each treatment
sum.Y1 <- sum(bean.yield$drought)
sum.Y2 <- sum(bean.yield$irrigated)
#the number of observations in each treatment
r1 <- length(bean.yield$drought)
r2 <- length(bean.yield$irrigated)
# calculate the sample averages by dividing the sample sums by the number
# of observations for each treatment "by hand"
Ybar1 <- sum.Y1 / r1
Ybar2 <- sum.Y2 / r2
# a simpler method to calculate the sample averages
Ybar1 <- mean(bean.yield$drought)
Ybar2 <- mean(bean.yield$irrigated)
# calculate the average of the difference between the two treatments
dbar <- Ybar2 - Ybar1
```
<br>
**Sample Variances**
Before we determine if we can pool our sample variances, we need to determine if the population variances are equal. Information on the equality of the population variances may be provided for a given question. In the absence of this information, however, the equality of the population variances can be tested using the sample variances, as follows.
<br>
$$S^2_{Y_1} = \frac{\sum (Y_{1i} - \bar{Y}_1)^2}{r_1 - 1}\\[20pt]
= \frac{(590 - 555)^2 + (720 - 555)^2 + (720-555)^2 + (190-555)^2}{4 - 1} = 62966.67$$
<br>
$$S^2_{Y_2} = \frac{\sum (Y_{2i} - \bar{Y}_2)^2}{r_2 - 1}\\[20pt]
= \frac{(2990 - 2680)^2 + (2950 - 2680)^2 + (2660 - 2680)^2 + (2120 - 2680)^2}{4 -1} = 161000$$
<br>
```{r}
# calculate the sample variances for each treatment "by hand"
var1 <- sum((bean.yield$drought - Ybar1)^2) / (r1 - 1)
var2 <- sum((bean.yield$irrigated - Ybar2)^2) / (r2 - 1)
# a simpler way to calculate the sample variances
var1 <- var(bean.yield$drought)
var2 <- var(bean.yield$irrigated)
# both methods yield the same answers
```
**Test of equality of variances**
The purpose of an F-test is to determine if the population variances are different, therefore the null hypothesis is that the population variance of the drought treatment is equal to the population variance of the irrigated treatment. If the population variances are not different, sample variances can be pooled to provide a more accurate estimate of the common variance.
The calculated F-value is simply the ratio of the larger sample variance divided by the smaller sample variance, so the critical F value will always be in the right tail. The test should be performed with $\alpha$ equal to 0.05 or 0.10 to have greater power.
<br>
$$F_{calc} = \frac{ \text{larger} \ S^2 }{\text{smaller} \ S^2} = \frac{161000}{62966.67} = 2.56$$
<br>
```{r}
# since var.2 is greater than var.1, var.2 will be used as the numerator
# and var.1 will be used as the denominator
Fcalc <- var2 / var1
```
Next, compare the calculated F-value, $F_{calc}$, to the critical F-value, $F_{crit}$, which can be found in the F-distribution table or computed in R. With $df_{denominator} = 3$, $df_{numerator} = 3$, and $\alpha = 0.10$, it is determined that $F_{crit} = `r qf(p = 0.95, df1 = 3, df2 = 3)`$
<br>
```{r, message=FALSE}
# calculate the critical F-value using alpha = 0.10 / 2, since the F-test is two-tailed
p <- 0.10 / 2
Fcrit.upper <- qf(p, r1 - 1, r2 - 1, lower.tail = FALSE)
```
<br>
Since the calculated F-statistic falls below the upper critical F-value (and above the lower one), we fail to reject the null hypothesis that the two population variances are equal.
<br>
$$F_{calc} = 2.56 < F_{crit, \frac{\alpha}{2}} = 9.28$$
Therefore, the population variances can be treated as equal and the sample variances can be pooled.
<br>
```{r, echo=FALSE, message=FALSE, warning=FALSE}
x <- seq(0.000001, 16, 0.25)
pd <- df(x, df1 = 3, df2 = 3, ncp = 0)
plot(x, pd,
     type = "l",
     xlab = "F-Value", ylab = "Probability Density",
     main = "The F-Distribution Curve",
     lwd = 1.5)
# lower and upper critical values for the two-tailed test
Fcrit.lower <- qf(p, r1 - 1, r2 - 1, lower.tail = TRUE)
abline(v = Fcrit.lower, col = "chartreuse3", lty = 3)
abline(v = Fcrit.upper, col = "chartreuse3", lty = 3)
# mark the calculated F on the density curve
points(Fcalc, df(Fcalc, df1 = 3, df2 = 3), pch = 4, col = "cornflowerblue")
text(Fcalc + 0.5, df(Fcalc, df1 = 3, df2 = 3) + 0.05, "F-calc", col = "cornflowerblue")
text(1.4, 0.05, "lower F-crit", col = "chartreuse3")
text(14, 0.05, "upper F-crit", col = "chartreuse3")
text(8, 0.25, "fail to reject H0")
```
<br>
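As a check, base R's `var.test()` performs the same F-test directly; it uses its first argument as the numerator and reports a two-sided p-value, which can be compared with $\alpha = 0.10$:
```{r}
# two-sided F-test of equal variances; compare the p-value to alpha = 0.10
var.test(bean.yield$irrigated, bean.yield$drought)
```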
**Pooling Sample Variances**
Since the F-test did not reject equality of variances, sample variances are pooled to provide a more accurate estimate of the true variance. Since the two sample sizes are the same, $r_1 = 4$ and $r_2 = 4$, the following equation is used to pool the sample variances:
<br>
$$S_Y^2 = \frac{S_{Y_1}^2 \ (r_1-1) + S_{Y_2}^2 \ (r_2-1)}{(r_1 + r_2-2)}= \frac{62966.67 \times 3 + 161000 \times 3}{4 + 4 - 2} = 111983.33$$
<br>
```{r}
# Because sample sizes are the same, pooled variance can be calculated
# by averaging the two sample variances
pooled.var <- ( var1 + var2 ) / 2
```
<br>
Since our two sample sizes are equal, the degrees of freedom are
<br>
$$df = 2(r-1) = 6$$
<br>
```{r}
# since r1 = r2 = 4, we can just use r = 4 for continued calculations
r <- length(bean.yield$yield)
# the degrees of freedom are calculated as
df <- 2 * (r - 1)
```
**Standard Error of the Difference**
The standard error of the difference is calculated using the pooled variance $S_Y^2$ and our sample size $r$, where $r_1 = r_2 = r$.
<br>
$$S_{\bar{d}} = \sqrt{\frac{2S_Y^2}{r}} = \sqrt{\frac{2 \times 111983.33}{4}} = 236.63$$
<br>
```{r}
# the standard error of the difference is calculated by taking the square root
# after multiplying the pooled variance by 2 and dividing by r
se.dbar <- sqrt( (2*pooled.var) / r)
```
<br>
**Calculating the t-statistic**
The t-statistic is used to test the null hypothesis that the population means are equal, $H_0: \mu_1 = \mu_2$, or equivalently that there is no difference between the population means, $H_0: \mu_{\bar{d}} = 0$.
The t-statistic, $t_{calc}$, is calculated by using the difference of the averages, $\bar{d}$ and the standard error of the difference, $S_{\bar{d}}$.
<br>
$$t_{calc} = \frac{\bar{d} - \mu_{\bar{d}}}{S_{\bar{d}}} = \frac{2125 - 0}{236.63} = 8.98$$
<br>
```{r}
# calculate the t-statistic by subtracting our hypothesized mean, which is 0,
# from the average difference, and dividing by the standard error of the difference
t.calc <- (dbar - 0) / se.dbar
# a quicker method to calculate the t-statistic
(bean.test.equal <- t.test(bean.yield$irrigated, bean.yield$drought,
alternative = "two.sided", paired = FALSE,
var.equal = TRUE))
# both calculations yield the same value
```
<br>
The calculated t-statistic is compared to the critical t-value, $t_{crit}$, which can be found in the Student's t-Distribution Table. With the degrees of freedom of our pooled samples, $df = 2(r-1) = 6$, and $\alpha_{two-tailed} = \frac{0.05}{2}$, it is determined that $t_{crit} = `r qt(0.975, df = 6)`$
Since $t_{calc} = 8.98 > t_{crit} = 2.447$, we reject the null hypothesis that there is no difference between the drought treatment and the irrigated treatment.
<br>
```{r, echo=FALSE, message=FALSE, warning=FALSE}
y <- seq(-6, 10, 0.25)
pfd <- dt(y, df = 6, ncp = 0)
plot(y, pfd,
     type = "l",
     xlab = "", ylab = "")
criticalt1 <- qt(p = 1 - 0.025, df = 6)
criticalt2 <- qt(p = 0.025, df = 6)
abline(v = criticalt1,
       col = "chartreuse3",
       lty = 3)
abline(v = criticalt2,
       col = "chartreuse3",
       lty = 3)
points(8.98, 0.0001, pch = 4, col = "cornflowerblue")
text(8.98, 0.03, "t-calc", col = "cornflowerblue")
text(-4, 0.35, "lower t-crit", col = "chartreuse3")
text(4, 0.35, "upper t-crit", col = "chartreuse3")
text(0, 0.05, "fail to reject H0")
text(6, 0.05, "reject H0")
```
<br>
### Case 2: Independent samples with unequal population variances {#Case2}
When the population variances are unequal, we calculate the standard error of the difference using the known information about the sample variances and sample sizes. This equation is the same as for Case 1, except that it cannot be further simplified because $\sigma_{Y_1}$ and $\sigma_{Y_2}$ are different and their estimates cannot be pooled.
<br>
$$S_{\bar{d}} = \left ( \frac{S_{Y_1}^2}{r_1}+\frac{S_{Y_2}^2}{r_2} \right) ^ {1/2}$$
<br>
The test statistic is calculated as
<br>
$$t_{calc} = \frac{\bar{d}-\mu_{\bar{d}}}{S_{\bar{d}}}$$
<br>
Since the population variances are unequal, the test statistic does not have an exact Student's t distribution, so we cannot simply read the critical t-value ($t_\alpha$) from the table. Instead, the critical t-value is calculated using Satterthwaite's approximation, which is a weighted average of the t-values for $r_1$ and $r_2$, using the corresponding estimated variances of the averages as weights:
<br>
$$t_{critical} = \frac{t_1 S_{\bar{Y_1}}^2+t_2 S_{\bar{Y_2}}^2}{S_{\bar{Y_1}}^2+S_{\bar{Y_2}}^2}$$
<br>
where $t_1$ and $t_2$ are the values "from the table" with $r_1 -1$ and $r_2 -1$ degrees of freedom, respectively. For $\alpha = 0.05$ in R code $t_1$ is `qt(p = 0.975, df = r1 - 1)` and $t_2$ is `qt(p = 0.975, df = r2 - 1)`.
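As a sketch, the weighted-average critical t can be computed with the bean-example objects `var1`, `var2`, `r1`, and `r2` defined earlier in this chapter:
```{r}
# weighted-average critical t for alpha = 0.05, two-sided, weighting the
# per-sample t-values by the estimated variances of the two averages
s2.Ybar1 <- var1 / r1
s2.Ybar2 <- var2 / r2
t1 <- qt(p = 0.975, df = r1 - 1)
t2 <- qt(p = 0.975, df = r2 - 1)
(t.critical <- (t1 * s2.Ybar1 + t2 * s2.Ybar2) / (s2.Ybar1 + s2.Ybar2))
```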
### Bean Drought Example: independent observations with unequal variances
We use the same data as above, but we decide NOT to pool the variances. Therefore, the standard error of the difference between averages is:
<br>
$$S_{\bar{d}} = \left ( \frac{S_{Y_1}^2}{r_1}+\frac{S_{Y_2}^2}{r_2} \right) ^ {1/2} = \left ( \frac{62966.67}{4}+\frac{161000}{4} \right) ^ {1/2} = 236.63 \\[30pt]
t_{calc} = \frac{2125 - 0}{236.63} = 8.98 \\[30pt]
t_1 = t_{0.975, df = 3} = 3.182446 \qquad \qquad t_2 = t_{0.975, df = 3} = 3.182446 \\[30pt]
t_{critical} = \frac{3.182446 \times \frac{62966.67}{4} + 3.182446 \times \frac{161000}{4}} {\frac{62966.67}{4} + \frac{161000}{4}} = 3.182446$$
<br>
Because the sample sizes are equal, the unpooled standard error happens to equal the pooled one, and the weighted-average critical t reduces to the common critical t for $r - 1 = 3$ degrees of freedom; the full calculations are shown to remind the reader how they must be done when sample sizes differ. The calculated t is larger than the critical t, so the null hypothesis is still rejected. Note, however, that the critical t is larger than in Case 1 (3.18 vs. 2.45) because each variance estimate contributes only 3 degrees of freedom, so the unpooled test is less powerful.
In R, the code to compute the t-test when variances cannot be pooled and samples are independent is:
```{r, message=FALSE, warning=FALSE}
(bean.test.unequal <- t.test(bean.yield$irrigated, bean.yield$drought,
alternative = "two.sided", paired = FALSE,
var.equal = FALSE))
```
The t-statistic agrees with the hand calculation, but the degrees of freedom and p-value differ because R uses Welch's method, which, instead of calculating a weighted average of t-values, calculates approximate degrees of freedom for the sum of the variances. In general, Welch's test is preferable to Student's t-test because it has equal or better power and achieves Type I error rates closer to the nominal $\alpha$.
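For reference, Welch's approximate degrees of freedom can be reproduced by hand; a sketch using the objects defined earlier:
```{r}
# Welch-Satterthwaite approximate degrees of freedom, which should match
# the df reported by t.test(..., var.equal = FALSE)
v1 <- var1 / r1
v2 <- var2 / r2
(df.welch <- (v1 + v2)^2 / (v1^2 / (r1 - 1) + v2^2 / (r2 - 1)))
```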
### Case 3: Paired samples {#Case3}
When observations are made on the same experimental unit (adjacent plots in the same field, pigs from the same pen) and are assigned to different treatments, they are considered **paired samples**. In this case, we treat the difference between the paired observations as the single variable of interest and we proceed as in the test for a single mean as seen in the previous chapter. The difference between paired averages, $\bar{d}$, the variance of the difference, $S_d^2$, the standard error of the difference, $S_\bar{d}$, and the test statistic, $t$, are calculated as
<br>
$$\bar{d} = \frac{\sum{d_i}}{r}$$
<br>
$$S_d^2 = \frac{\sum(d_i-\bar{d})^2}{r-1} \qquad \text{with} \qquad df = r-1$$
<br>
$$S_\bar{d} = (\frac{S_d^2}{r})^{1/2}$$
<br>
$$t_{calc} = \frac{\bar{d}-\mu_{\bar{d}}}{S_{\bar{d}}}$$
<br>
If the null hypothesis is true, the calculated t has a Student's t distribution with $r-1$ degrees of freedom.
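As noted above, the paired case reduces to a one-sample test on the differences; this is easy to verify in R. The two small vectors below are made-up values used only for illustration:
```{r}
# hypothetical paired observations on four units
y1 <- c(10, 12, 9, 11)
y2 <- c(12, 15, 10, 14)
# the paired t-test and the one-sample t-test on the differences give the same t
t.test(y2, y1, paired = TRUE)$statistic
t.test(y2 - y1, mu = 0)$statistic
```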
### Two Populations Equation Summary
```{block}
Table: (\#tab:CaseEquations) Summary of equations for the three cases.
| Case | Pop Variances | Sample Size | Paired / Independent | Pooled Sample Variance | Standard Error of the Difference | df | t-statistic | Confidence Interval |
|-----:|:-------------:|:-------------:|:------------------:|:---------------------------------------------------------------:|:-------------------------------------------------:|:-----------:| :-------------------------------------------:|:----------------:|
| 1 | Equal |Equal/ Not Equal| Independent | $\frac{({{S_1}^2})({r_1-1})+({{S_2}^2})({r_2-1})}{(r_1+r_2-2)}$ | $\left (\frac{S^2}{r_1} + \frac{S^2}{r_2} \right )^{1/2}$ | $r_1+r_2-2$ | $\frac{\bar{d}-\mu_{\bar{d}}}{S_{\bar{d}}}$ | ${L\atop U} = \bar{d} \pm t_\alpha S_\bar{d}$ |
| 1 | Equal | Equal | Independent | $\frac{{{S_1}^2}+{{S_2}^2}}{2}$ | $(\frac{{2S}^2}{r})^{1/2}$ | $2(r-1)$ | $\frac{(\bar{d}-\mu_{\bar{d}})}{S_{\bar{d}}}$| ${L\atop U} = \bar{d} \pm t_\alpha S_\bar{d}$ |
| 2 | Not Equal |Equal/ Not Equal| Independent | not pooled | $(\frac{{S_1}^2}{r_1}+\frac{{S_2}^2}{r_2})^{1/2}$ | weighted $t_{crit}$ (Satterthwaite) | $\frac{\bar{d}-\mu_{\bar{d}}}{S_{\bar{d}}}$ | ${L\atop U} = \bar{d} \pm t_\alpha S_\bar{d}$ |
| 3 | Equal | Equal | Paired | $S_d^2=\frac{\sum(d_i-\bar{d})^2}{r-1}$ | $(\frac{S_d^2}{r})^{1/2}$ | $r-1$ | $\frac{(\bar{d}-\mu_{\bar{d}})}{S_{\bar{d}}}$| ${L\atop U} = \bar{d} \pm t_\alpha S_\bar{d}$ |
```
## Confidence Intervals
After identifying the difference between the two samples averages ($\bar{d}$) and the standard error of the difference ($S_{\bar{d}}$), a confidence interval for the mean difference can be calculated to understand the level of confidence associated with the estimated mean difference.
<br>
$${U\atop L} = \bar{d} \pm \ t_{\alpha/2} \ S_\bar{d}$$
<br>
Note that for Case 2 you will need to calculate the critical t-value with Satterthwaite's weighted average, whereas for Cases 1 and 3 you can identify this value from the Student's t-table with the appropriate degrees of freedom (see the corresponding case for the equation). Thus, depending on which case is used to calculate the standard error of the difference, the degrees of freedom used to identify the critical t-value ($t_{critical}$) for the confidence interval will vary.
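A sketch for the Case 1 bean example, reusing `dbar`, `se.dbar`, and `df` from the earlier chunks:
```{r}
# 95% confidence interval for the mean difference (pooled-variance case)
t.crit <- qt(p = 0.975, df = df)
c(lower = dbar - t.crit * se.dbar,
  upper = dbar + t.crit * se.dbar)
```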
## Decision to Reject or Fail to Reject the Null Hypothesis
After all of the steps and calculations, the calculated t-value and the critical t-value, $t_{\alpha}$, are used in the final decision to reject or fail to reject the null hypothesis. **The null hypothesis is never "accepted"; rather, the decision is to fail to reject it.** Remember, the null hypothesis when testing two population means is that the two population means are equal, $H_0 : \mu_1 = \mu_2 \ \ \text{or} \ \ \mu_{\bar{d}} = 0$
<br>
$$\text{If} \quad |t_{calc}| > t_{critical} \quad \text{and} \ P (|t| > |t_{calc}|) < \alpha \quad \text{ we reject the null hypothesis} $$
$$\text{If} \quad |t_{calc}| < t_{critical} \quad \text{and} \ P (|t| > |t_{calc}|) > \alpha \quad \text{ we fail to reject the null hypothesis}$$
<br>
```{r DecisionMaking, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '70%', fig.align='center', echo=FALSE, fig.cap ="General procedure for sampling two populations to make a final decision to reject or fail to reject the null hypothesis"}
knitr::include_graphics("images/CH8Strategy.png")
```
<br>
## Bean Drought Example - Paired
Below are the same data collected on common bean plots from the previous example. *However, it is now revealed that there are 4 randomly selected varieties of common bean that were each tested under both drought and normal irrigation*. The yields are grouped by variety. Yield data were collected from these plots and recorded in the table below.
<br>
Table: (\#tab:BeanDrought) Common bean yield (kg/ha) measured under drought treatment and irrigation in four varieties in Quilichao, Colombia (Sponchiado, 1985). The difference is calculated by subtracting the drought yield from the irrigated yield (sample 2 - sample 1).
<br>
| Treatment | Var 1 | Var 2 | Var 3 | Var 4 | Total | Mean |
|------------:|:------:|:------:|:-------:|:------:|:-------:|:------:|
| Drought | 590 | 720 | 720 | 190 | 2220 | 555 |
| Irrigated | 2990 | 2950 | 2660 | 2120 | 10720 | 2680 |
| Difference | 2400 | 2230 | 1940 | 1930 | 8500 | 2125 |
<br>
```{r BeanDroughtPaired, message=FALSE, echo=FALSE}
bean.paired <- data.frame(
"variety" = c("var1", "var2", "var3", "var4"),
"drought" = c(590, 720, 720, 190 ),
"irrigated" = c(2990, 2950, 2660, 2120))
```
<br>
### Stating the Hypotheses
**Our null and alternative hypotheses will be the same as before**
<br>
$$\text{Null hypothesis: there is no difference in yield between drought and irrigated treatments on common bean} \\[15pt]$$
$$\mu_1 = \mu_2 \quad \text{or} \quad \mu_{\bar{d}} = 0$$
$$\text{Alternative hypothesis: there is a difference in yield between drought and irrigated treatments on common bean}\\$$
$$\mu_1 \neq \mu_2 \quad \text{or} \quad \mu_{\bar{d}} \neq 0$$
<br>
### Sampling Method
**The sampling method is now different**. Since there are four known common bean varieties planted in both the drought and irrigated treatments, each variety is considered an experimental unit, and the observations can be considered paired.
### Calculating Sample Averages and the Average of the Difference
For paired samples, it is not necessary to calculate the sample averages; instead, we directly calculate the difference for each experimental unit (i.e., each variety) between the two treatments and then average those differences.
<br>
$$\bar{d} = \frac{\sum d_i}{r} =\frac{\sum(Y_{2i} - Y_{1i})}{r} = \frac{(2990-590) + (2950 -720) + (2660-720) + (2120-190)}{4} = 2125$$
<br>
```{r}
# create a new column of the difference between the drought
# and irrigated treatment columns
bean.paired$d_i <- bean.paired$irrigated - bean.paired$drought
# sum the differences between the two treatments
sum.d_i <- sum(bean.paired$d_i)
# calculate r as the number of varieties (pairs of treatments)
r.pair <- length(bean.paired$variety)
# calculate d_bar
dbar.pair <- mean(bean.paired$d_i)
```
<br>
### Calculating the Variance of the Difference
We do not need to calculate individual sample variances, since we will directly calculate the variance of the paired differences:
<br>
$$S^2_{d} = \frac{\sum( d_i - \bar{d})^2}{r-1} \\[30pt]
=\frac{(2400-2125)^2 + (2230-2125)^2 + (1940-2125)^2 + (1930-2125)^2}{3} = 52966.67$$
<br>
```{r}
# calculate the variance of the differences: the sum of squared deviations of the
# paired differences from the average difference, divided by r - 1
(var.d.pair <- sum((bean.paired$d_i - dbar.pair) ^ 2) / (r.pair - 1))
# alternatively:
var(bean.paired$d_i)
```
<br>
### Calculating the Standard Error of the Difference
<br>
$$S_{\bar{d}} = \sqrt{\frac{S_d^2}{r}} = \sqrt{\frac{52966.67}{4}} = 115.07$$
<br>
```{r}
# calculate the standard error of the difference by taking the square root
# of the variance of the difference divided by r
se.dbar.pair <- sqrt(var.d.pair / r.pair)
```
### Calculating the t-statistic
The t-statistic is calculated using the same equation as in the independent sampling example
<br>
$$t_{calc} = \frac{(\bar{d} - \mu_{\bar{d}})}{S_{\bar{d}}} = \frac{2125 - 0}{115.07} = 18.47$$
<br>
```{r}
# calculate the t-statistic by dividing the difference of the averages
# by the standard error of the difference
t.calc.pair <- (dbar.pair - 0) / se.dbar.pair
# a simpler way to calculate the t-statistic
(t.test.pair <- t.test(bean.paired$irrigated,
bean.paired$drought,
alternative = "two.sided",
paired = TRUE))
```