ANOVA.Rmd

---
title: "ANOVA"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

If you want to compare means from two or more groups/populations, you can use the ANalysis Of VAriance (ANOVA). In R, this function is aov() and gives you a F-test value to show if there are differences between the means. If so, we can do mutliple comparison tests to figure out how they are different.

Say we have some simple data about weight loss from 3 drugs:

```{r}
weightLoss=c(2,5,4,0,1,7,8,8,6,9,3,1,7,4,5)
drug=c(rep("A",5),rep("B",5),rep("C",5))
weightData=data.frame(weightLoss, drug)
weightData
```

Let's plot the data using a boxplot format:
plot(response ~ factor, data=data_name)
```{r}
plot(weightLoss ~ drug, data = weightData)
```

Looking at the plot, it seems that the mean for drug A is the lowest, B is the highest, and C is in the middle. What about the variance for each drug? 

Let's run the ANOVA now to look at differences in the means between the 3 groups.

We will use the formula:

aov(response ~ factor, data=data_name)  
-->seem familiar??

Then we can use the summary() function to see our ANOVA table.

```{r}
results=aov(weightLoss ~ drug, data=weightData)
summary(results)
```

From the table, we find our F value of 10.04 with a significance of 0.0027.

We can reject the hypothesis that all 3 drug means are equal.

Now what about figuring out how the means are different? We can now do multiple comparison tests. We'll show you the pairwise t-test with Bonferroni correction and the Tukey method. Use the Bonferroni method when you are selecting which means to compare, and the Tukey method when comparing every mean to every other mean.

pairwise.t.test will perform pair-wise comparisons between group means, and we can correct for multiple comparisons with using the Bonferroni method:

pairwise.t.test(response, factor, p.adjust=method, alternative=c("two.sided","less","greater"))

```{r}
pairwise.t.test(weightLoss, drug, p.adjust="bonferroni")
```

From the output of this test, we can see the comparisons of the means between A&B and C&B are significantly different (p=0.0027, 0.0315, respectively) but the means between A&C are not significantly different (p=0.6097).

If you want to run a Tukey, use this formula:

TukeyHSD(x, conf.level=0.95) 

where x is a fitted model object (like what we got from aov()).

```{r}
results=aov(weightLoss ~ drug, data=weightData)
TukeyHSD(results, conf.level=0.95)
```

This table shows us that the means between A&B and C&B are significantly different (p=0.0024, 0.026, respectively), but the means between A&C are not sig different (p=0.398). This confirms our Bonferroni test!