
---
title: "Lab05 CRD Anova"
author: "YourFirstName YourLastName"
date: "enter date here"
output: html_document
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

library(pander)

```


#### Instructions

For this lab you will modify this file and submit this file with the file name changed so it has your email ID (the part before @) in lower case instead of "email." Do not add spaces to the file name.

This is a markdown document. You will type your code and run it one line at a time as you add it in the lines indicated below. Add code **ONLY** in the areas between ```` ```{r} ```` and ```` ``` ````. These areas are highlighted with a light grey color. Text outside those bookends will be interpreted as simple text, not code. Experiment by running each line and modifying it until you get the desired result. Keep the lines that work and move on. At any time you can check whether your document "knits" by clicking on the Knit to HTML icon at the top. Once you have completed all work, knit your document and save the html file produced with the same file name but with an html extension (Lab05email.html).

**Submit BOTH files for your lab report using the appropriate Canvas tool**

For each part and question below, type your code in the grey area below, between the sets of back-ticks (```) to perform the desired computation and get output. Type your answers **below** the corresponding grey area.

In this exercise we analyze the data resulting from an experiment at the UC Davis Plant Sciences research fields. The data and description of the experimental setup have been simplified to facilitate understanding. For the purpose of this lab we will consider that the experiment was designed as a Completely Randomized Design in which 12 treatments were applied randomly to 13 plots each. We will also assume that the variances of the errors or residuals are the same in all treatments. One medusahead plant was grown in each plot and its total seed production was recorded at maturity (seedMass.g).

Treatments resulted from combining two levels of nitrogen fertilization (n: no fertilizer added and N: nitrogen added), two levels of watering (w: no water other than rain, W: with water added) and three environments (s: areas without addition of seed representing the typical California Annual Grassland, S: areas where native perennial grasses were seeded, E: edge between seeded and unseeded areas). Based on previous experience and plant biology, we expect medusahead seed production to be lowest without water or fertilizer and when exposed to competition by native perennial grasses.

This exercise has 4 parts. First, we read in and inspect the data using histograms, box plots, and a table of averages and standard errors. Second, data are analyzed as if they came from a Completely Randomized Design, ignoring the blocking: each observation is partitioned into the components indicated by the model, and then the corresponding sums of squares (SS) and degrees of freedom (df) are computed using basic functions for calculations. Third, we obtain the same ANOVA using the R functions aov(), anova() and oneway.test(). Fourth, we compute 95% confidence intervals for the treatment means and back-transform them to the original scale.

#### Part 1. Inspection and summary of data [25 points]

Get a histogram of the data. Make a graph showing boxplots of the data for each treatment. Notice that the data have a highly skewed distribution, so a logarithmic transformation is necessary. Create a new column called logSmass that contains the log of seed mass after adding 0.3 to seed mass. We add 0.3 to avoid problems with the zeroes, because log(0) is not defined. Plot histograms and boxplots of the new transformed variable. Inspect the boxplots and determine if there appear to be any effects of treatments on the seed production of the invasive exotic weed. Create a table of averages and standard errors by treatment.
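
As a small illustration of why the offset is needed, the sketch below applies the transformation to a made-up vector that contains a zero. The name `toy.mass` and its values are hypothetical and are not part of the lab data.

```r
# Hypothetical seed masses (g), including a zero like some plots in the data
toy.mass <- c(0, 0.5, 1.2, 8.7)

# Without an offset, R returns -Inf for the zero
log(toy.mass)

# Adding 0.3 before taking the log keeps every value finite
toy.log <- log(toy.mass + 0.3)
toy.log

# Exponentiating and subtracting 0.3 recovers the original values
exp(toy.log) - 0.3
```

The same back-transformation is used in Part 4 to express results in grams.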


```{r}

# seed <- read.csv(file = "Lab05SeedMassData.txt", header = TRUE)

seed <- read.table(header = TRUE, text = "
id block Treatment seedMass.g
1 1 WNE 0.8452
2 1 WNE 1.628599896
3 1 WNE 1.71330012
5 1 WNE 0.605599925
63 2 WNE 1.478700073
64 2 WNE 1.655799925
65 2 WNE 3.579000285
66 2 WNE 2.52810012
67 2 WNE 0.676500068
119 3 WNE 0.84060003
120 3 WNE 0.455200004
121 3 WNE 0.642700076
122 3 WNE 0
6 1 wNE 2.39799978
7 1 wNE 3.630199812
8 1 wNE 3.19679986
9 1 wNE 1.579800105
10 1 wNE 2.524800075
73 2 wNE 1.628599875
74 2 wNE 5.239899819
75 2 wNE 2.374499816
76 2 wNE 2.957700192
128 3 wNE 0.39629997
129 3 wNE 1.11970005
130 3 wNE 2.129599824
131 3 wNE 1.958600049
13 1 wnE 0.049800007
14 1 wnE 0.46560003
15 1 wnE 0.358300019
68 2 wnE 0.616600044
69 2 wnE 1.018299904
70 2 wnE 0.402999948
71 2 wnE 0.600599976
72 2 wnE 0.420299968
132 3 wnE 1.267800064
133 3 wnE 0.722599913
134 3 wnE 0.6414
135 3 wnE 1.559600013
136 3 wnE 0.301799972
16 1 WnE 0.49499996
17 1 WnE 0.42920001
18 1 WnE 0.52509996
19 1 WnE 0.314599977
78 2 WnE 1.070900032
79 2 WnE 0.523999944
80 2 WnE 0.879599924
81 2 WnE 1.015500081
123 3 WnE 0.745699944
124 3 WnE 0.804199928
125 3 WnE 0.633399966
126 3 WnE 0.965900001
127 3 WnE 1.932200036
20 1 wnS 0.3136
21 1 wnS 0.43789998
22 1 wnS 0.326600029
23 1 wnS 2.688000014
24 1 wnS 0.438300029
83 2 wnS 0.184299984
85 2 wnS 0.3579
86 2 wnS 0.2591
152 3 wnS 0.299800035
153 3 wnS 0.713900075
154 3 wnS 0.384100024
155 3 wnS 0.67930006
156 3 wnS 0.631599974
25 1 WNS 2.424700152
26 1 WNS 0.5808
27 1 WNS 0.584400068
28 1 WNS 0.67930005
91 2 WNS 0.7138
92 2 WNS 0.057500004
93 2 WNS 0.18880002
94 2 WNS 2.87149987
95 2 WNS 0.408300021
147 3 WNS 0.62479995
148 3 WNS 0.471400036
150 3 WNS 1.432499985
151 3 WNS 1.666099926
30 1 wNS 0.81229995
31 1 wNS 3.78009968
32 1 wNS 2.014800174
33 1 wNS 1.697400132
34 1 wNS 1.161500076
87 2 wNS 0.997699956
88 2 wNS 2.960099866
89 2 wNS 1.23570003
90 2 wNS 1.034799964
138 3 wNS 1.05819988
139 3 wNS 2.100399994
140 3 wNS 1.347200127
141 3 wNS 2.373300001
35 1 WnS 0.455399956
36 1 WnS 0.258800003
38 1 WnS 0.30340002
39 1 WnS 0.2277
96 2 WnS 0.28540002
97 2 WnS 0.915400058
98 2 WnS 0.3118
99 2 WnS 0.35759997
142 3 WnS 0.209499996
143 3 WnS 0.355199988
144 3 WnS 0.479600032
145 3 WnS 0.195100021
146 3 WnS 0.455800025
40 1 WNs 2.977100165
41 1 WNs 0
42 1 WNs 0
43 1 WNs 0.93639996
105 2 WNs 0.717100059
106 2 WNs 2.6535999
107 2 WNs 4.149200016
108 2 WNs 5.559399812
109 2 WNs 2.09089998
167 3 WNs 6.93320061
168 3 WNs 4.72049974
169 3 WNs 8.660299999
171 3 WNs 2.437699887
44 1 Wns 0.650700024
45 1 Wns 0.93370002
46 1 Wns 2.126800006
47 1 Wns 1.311599922
48 1 Wns 2.114299946
115 2 Wns 2.113199912
116 2 Wns 7.767500208
117 2 Wns 3.437700174
118 2 Wns 2.931799789
172 3 Wns 0.635800038
173 3 Wns 1.607500132
174 3 Wns 0.947599979
175 3 Wns 1.075900105
49 1 wns 1.70000015
51 1 wns 1.504
52 1 wns 0.94400004
53 1 wns 3.488500337
100 2 wns 2.2786001
101 2 wns 1.578499845
102 2 wns 0.764299998
103 2 wns 0.474700014
104 2 wns 1.31689998
157 3 wns 1.770300075
159 3 wns 2.36320013
160 3 wns 4.271699849
161 3 wns 4.158900105
54 1 wNs 5.17460008
55 1 wNs 3.88629957
56 1 wNs 0.592299989
57 1 wNs 2.063099988
58 1 wNs 1.889700111
110 2 wNs 3.202000016
111 2 wNs 5.395399962
112 2 wNs 3.959800395
114 2 wNs 6.27350022
163 3 wNs 2.42960022
164 3 wNs 4.8853
165 3 wNs 1.079000118
166 3 wNs 4.73389994
")

seed$Treatment <- factor(as.character(seed$Treatment), levels = c("wns", "wnE", "wnS", "wNs", "wNE", "wNS", "Wns", "WnE", "WnS", "WNs", "WNE", "WNS"))

# Look at the data

str(seed) # Note the column that has our dependent variable

boxplot(seedMass.g ~ Treatment, seed)

# Obtain a histogram of the data for seed mass

hist(seed$seedMass.g) # Distribution is too far from Normal

seed$logSmass <- log(seed$seedMass.g + 0.3)

# Obtain a histogram of the transformed data by adding the column name

hist(seed$logSmass)

boxplot(logSmass ~ Treatment, 
        data = seed, 
        ylab = 'log Seed mass (g)')


# Read help about aggregate() function.

smeans <- aggregate(seedMass.g ~ Treatment, 
                    data = seed, 
                    FUN = mean)

smedians <- aggregate(seedMass.g ~ Treatment, 
                      data = seed, 
                      FUN = median)

# Use the aggregate function to get the standard error of the average for each treatment. Note that the function applied gets the standard deviation and divides by the square root of the sample size to get the standard error.

sserrors <- aggregate(seedMass.g ~ Treatment, 
                      data = seed, 
                      FUN = function(x) sd(x)/sqrt(length(x)))


table.seed.mass <- cbind(smeans, sserrors, smedians)[, c(1, 2, 4, 6)]

names(table.seed.mass) <- c("Treatment", "Mean", "SE", "Median")

pander(table.seed.mass)


```
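
The pattern used above to build the summary table can be tried on a much smaller data set. The sketch below uses a made-up data frame (`toy`, with hypothetical treatments A and B) so that the means and standard errors are easy to check by hand.

```r
# Hypothetical toy data: two treatments with three plots each
toy <- data.frame(Treatment = rep(c("A", "B"), each = 3),
                  y = c(1, 2, 3, 4, 6, 8))

# One row per treatment with the average of y
toy.means <- aggregate(y ~ Treatment, data = toy, FUN = mean)

# One row per treatment with the standard error: sd / sqrt(n)
toy.se <- aggregate(y ~ Treatment, data = toy,
                    FUN = function(x) sd(x) / sqrt(length(x)))

# Put both results side by side, then keep the treatment label (column 1),
# the mean (column 2) and the SE (column 4)
toy.table <- cbind(toy.means, toy.se)[, c(1, 2, 4)]
names(toy.table) <- c("Treatment", "Mean", "SE")

toy.table
```

Selecting columns by position works here only because both aggregate() calls return the treatments in the same order.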


ANSWER THE FOLLOWING QUESTIONS:

Inspect the boxplot. What treatments appear to differ? What would you expect to see based on the hypotheses and previous knowledge? Do the data appear to support your expectations?


Explain what cbind() does.


Explain what aggregate() does.


#### Part 2. Partition of Sum of Squares and Degrees of Freedom [30 points]


Use basic functions to partition the total sum of squares of logSmass into treatments and residual or error. Start by creating a column with the treatment averages for each observation. Then, add columns for total deviation of observation from the overall average, deviation of treatment average from the overall average, and deviation of observation from treatment average. Calculate the corresponding degrees of freedom and prepare a complete analysis of variance table with columns for Source, SS, df, and MS. Make it into a data frame and then print it nicely with pander(). Calculate the F test and the critical F to test the Ho: mean seed production is the same in all treatments. Interpret the results.


```{r}

slog.means <- aggregate(logSmass ~ Treatment, 
                        data = seed, 
                        FUN = mean)

names(slog.means)[2] <- "AvgLogMass"

slog.means

seed <- merge(seed, 
              slog.means, 
              by = "Treatment", 
              all = TRUE)

head(seed)

ssTot <- sum((seed$logSmass - mean(seed$logSmass)) ^ 2) # Total sum of squares: total deviation of observations from the overall average

ssTrt <- sum((seed$AvgLogMass - mean(seed$logSmass)) ^ 2) # Treatment sum of squares: deviation of treatment average from the overall average

ssRes <- sum((seed$logSmass - seed$AvgLogMass) ^ 2) # Residual sum of squares: deviation of observations from treatment average

# calculate the degrees of freedom by completing the code below

df.Trt <- length(levels(seed$Treatment)) - 1

dfe <- length(seed$AvgLogMass) - df.Trt - 1

df.Tot <- df.Trt + dfe

# Now calculate the mean squares

MSTrt <- ssTrt / df.Trt

MSE <- ssRes / dfe

# Calculate the F ratio and F critical value

(Fcalc <- MSTrt / MSE)

(Fcrit <- qf(0.05, 
             df1 = df.Trt, 
             df2 = dfe, 
             lower.tail = FALSE))


```
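
One possible way to assemble the analysis of variance table requested above is sketched below on a made-up data set. The object names (`toy`, `toy.anova`, `Fcalc.toy`, `p.toy`) and the data values are hypothetical; the same steps should apply to the quantities computed from logSmass.

```r
# Hypothetical toy data: three treatments with three plots each
toy <- data.frame(Treatment = factor(rep(c("A", "B", "C"), each = 3)),
                  y = c(1, 2, 3, 4, 6, 8, 5, 5, 8))

grand <- mean(toy$y)                  # overall average
trt.avg <- ave(toy$y, toy$Treatment)  # treatment average repeated for each observation

# Sums of squares for each source of variation
ss <- c(Treatment = sum((trt.avg - grand) ^ 2),
        Residual  = sum((toy$y - trt.avg) ^ 2),
        Total     = sum((toy$y - grand) ^ 2))

# Degrees of freedom in the same order: treatments, residual, total
df <- c(nlevels(toy$Treatment) - 1,
        nrow(toy) - nlevels(toy$Treatment),
        nrow(toy) - 1)

toy.anova <- data.frame(Source = names(ss),
                        SS = unname(ss),
                        df = df,
                        MS = unname(ss / df))

toy.anova$MS[3] <- NA  # a mean square is usually not reported for the total

# F ratio and its p-value from the upper tail of the F distribution
Fcalc.toy <- toy.anova$MS[1] / toy.anova$MS[2]
p.toy <- pf(Fcalc.toy, df1 = df[1], df2 = df[2], lower.tail = FALSE)

toy.anova
```

A useful check is that the treatment and residual SS add up to the total SS, and likewise for the df. The resulting data frame can be passed to pander() for a nicer printout.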


ANSWER THE FOLLOWING QUESTIONS:

Are there differences among treatments? Report the F statistic and result. State your conclusions.


#### Part 3. ANOVA using R functions [20 points]


Use the R functions aov() and anova() to obtain the same tests of the null hypothesis that all means are equal. Report the results of using each function and compare to the previous results. The purpose of this part is for you to become familiar with different R functions that accomplish the same task. Read the help about the function oneway.test() and use it, making sure that all arguments have proper values. Pay particular attention to the assumption of equal variances. Note that this function allows you to do the test even when variances are not equal, but we assume that the variances are equal.


```{r}

# read help about function aov(), anova(), oneway.test()

summary(aov(formula = logSmass ~ Treatment, 
            data = seed))

linear.model1 <- lm(formula = logSmass ~ Treatment, 
                    data = seed)

anova(linear.model1)

oneway.test(logSmass ~ Treatment, 
            data = seed, 
            var.equal = "something here") # replace the placeholder with a logical value (TRUE or FALSE)

```
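
To see how the three functions relate, the sketch below extracts the F statistic from each of them for a made-up data set (`toy` and the `F.*` names are hypothetical). It also shows the Welch version of oneway.test(), which does not assume equal variances, for contrast.

```r
# Hypothetical toy data: three treatments with three plots each
toy <- data.frame(Treatment = factor(rep(c("A", "B", "C"), each = 3)),
                  y = c(1, 2, 3, 4, 6, 8, 5, 5, 8))

# F statistic from aov(): first row of the summary table
F.aov <- summary(aov(y ~ Treatment, data = toy))[[1]][["F value"]][1]

# F statistic from anova() applied to a linear model
F.lm <- anova(lm(y ~ Treatment, data = toy))[["F value"]][1]

# F statistic from oneway.test() assuming equal variances
F.oneway <- unname(oneway.test(y ~ Treatment, data = toy,
                               var.equal = TRUE)$statistic)

# Welch's test (unequal variances) generally gives a different statistic
F.welch <- unname(oneway.test(y ~ Treatment, data = toy,
                              var.equal = FALSE)$statistic)

c(aov = F.aov, lm = F.lm, oneway = F.oneway, welch = F.welch)
```

With equal variances assumed, the first three values should agree up to rounding.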


ANSWER THE FOLLOWING QUESTION:

Do the results differ among functions? Compare the results from Part 2 and Part 3.


#### Part 4. Confidence intervals for treatment means [25 points]

Create 95% confidence intervals for all the treatment means, back-transform to see mass in g by exponentiating and subtracting 0.3, then add them to the table created in Part 1. Explain how the line to add the CI's to the ci.data works.


```{r}

ci.data <- data.frame(Treatment = levels(seed$Treatment))

ci.data <- cbind(ci.data, predict(linear.model1, 
                                  newdata = ci.data, 
                                  interval = "confidence")) 

str(ci.data)

# Complete the code below to do the back transformation of the CI's and make a table with them

ci.data.bt <- ci.data[,2:4]

# Note that we will be doing a back-transformation on 3 columns from the ci.data table: the estimated mean values for each of 12 treatments, the lower 95% CIs, and the upper 95% CIs

# Back-transformation code goes here ...


library(plotrix)
plotCI(ci.data.bt$fit, 
       uiw = ci.data.bt$upr - ci.data.bt$fit, 
       liw = ci.data.bt$fit - ci.data.bt$lwr, 
       ylab = "Seed Mass (g)")


```
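
The back-transformation described above can be sketched on a small made-up table of intervals. The data frame `toy.ci` and its values are hypothetical; only the column names (fit, lwr, upr) follow what predict() returns with interval = "confidence".

```r
# Hypothetical estimates and 95% CI limits on the log(seedMass.g + 0.3) scale
toy.ci <- data.frame(fit = c(-0.2, 0.5),
                     lwr = c(-0.6, 0.1),
                     upr = c(0.2, 0.9))

# exp() and the subtraction are applied to every column at once,
# undoing log(x + 0.3) to give values in grams
toy.ci.bt <- exp(toy.ci) - 0.3

toy.ci.bt
```

Because the transformation is monotonic, the lower and upper limits keep their order after back-transformation, but the intervals are no longer symmetric around the estimate.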


ANSWER THE FOLLOWING QUESTIONS.

(1) Can you conclude that any of the treatments are effective at controlling medusahead seed production, if your threshold for control is 2 g? In other words, do any treatments result in an expected seed mass production that is significantly less than 2 g?


(2) Explain how the line to add the CI's to the ci.data works (i.e., explain the cbind() function).


### Animal Sciences