Commit 2a6844a

committed
beta diversity revised
1 parent 4cb299c commit 2a6844a

3 files changed

+1559
-286
lines changed

Betadiversity.Rmd

Lines changed: 73 additions & 43 deletions
@@ -23,11 +23,14 @@ output:
2323

2424

2525

26-
## Beta diversity
26+
## Beta diversity
2727

28-
Some examples on calculating beta diversity and using it to quantify community divergence within a given sample set.
28+
Beta diversity quantifies dissimilarity in community composition between samples. Dissimilarity can also be quantified by _distance_ or _divergence_. These measures are widely used in statistical data analysis.
29+
30+
The [vegan R package](https://cran.r-project.org/web/packages/vegan/index.html) and the [phyloseq R package](https://bioconductor.org/packages/release/bioc/html/phyloseq.html) implement a number of standard ecological dissimilarity measures; in vegan these are available through the 'vegdist' function.
31+
32+
Here, we show brief examples on how to compare sample heterogeneity between groups and over time.
2933

30-
See [Community comparisons](Comparisons.html) page for examples on group-level comparisons based on beta diversity measures, including [limma](limma.html), [PERMANOVA](PERMANOVA.html), [mixed models](Mixedmodels.html), and [negative binomial](Negativebinomial.html).
3134

3235
Load example data
3336

@@ -38,37 +41,15 @@ data(peerj32)
3841
pseq <- peerj32$phyloseq
3942
```
4043

41-
42-
## Quantifying group divergence / spread
43-
44-
Divergence of a given sample set can be quantified as the average dissimilarity of each sample from the group mean; the dissimilarity can be quantified by beta diversity, for instance. This was applied in group-level comparisons for instance in [Salonen et al. ISME J 2014](http://www.nature.com/ismej/journal/v8/n11/full/ismej201463a.html) (they focused on homogeneity using inverse correlation, whereas here we focus on divergence using correlation but the measure is essentially the same).
45-
46-
Calculate group divergences within the LGG (probiotic) and Placebo groups
47-
48-
```{r divergence-example2bb, message=FALSE}
49-
b.pla <- divergence(subset_samples(pseq, group == "Placebo"))
50-
b.lgg <- divergence(subset_samples(pseq, group == "LGG"))
51-
```
52-
53-
Use these to compare microbiota divergence within each group. The LGG group tends to have smaller values, indicating that the samples are more similar to the group mean, and the LGG group is less heterogeneous (has smaller spread / is more homogeneous):
54-
55-
```{r divergence-example2bbb, message=FALSE, out.width="300px"}
56-
boxplot(list(LGG = b.lgg, Placebo = b.pla))
57-
```
58-
59-
The **inter- and intra-individual stability** (or homogeneity) measures are obtained as 1-b where b is the group divergence with the anticorrelation method ([Salonen et al. ISME J 2014](http://www.nature.com/ismej/journal/v8/n11/full/ismej201463a.html)).
60-
61-
62-
6344
## Intra-individual divergence
6445

65-
Quantify beta diversity within subjects over time (as in [Salonen et al. ISME J 2014](http://www.nature.com/ismej/journal/v8/n11/full/ismej201463a.html) for intra-individual stability)
46+
Divergence within subjects may increase following intervention.
6647

67-
```{r homogeneity-example2c, message=FALSE, warning=FALSE, out.width="300px"}
48+
```{r divergence2c, message=FALSE, warning=FALSE, out.width="300px"}
6849
betas <- list()
6950
groups <- as.character(unique(meta(pseq)$group))
7051
for (g in groups) {
71-
#df <- meta(subset_samples(pseq, group == g))
52+
7253
df <- subset(meta(pseq), group == g)
7354
beta <- c()
7455
@@ -80,9 +61,9 @@ for (g in groups) {
8061
s <- as.character(dfs$sample)
8162
# Here with just two samples we can calculate the
8263
# beta diversity directly
83-
beta[[subj]] <- 1-cor(abundances(pseq)[, s[[1]]],
64+
beta[[subj]] <- divergence(abundances(pseq)[, s[[1]]],
8465
abundances(pseq)[, s[[2]]],
85-
method = "spearman")
66+
method = "bray")
8667
}
8768
}
8869
betas[[g]] <- beta
@@ -92,40 +73,89 @@ boxplot(betas)
9273
```
9374

9475

95-
## Beta diversity within individual over time
76+
## Divergence within individual over time
9677

97-
Calculate change in beta diversity (community dissimilarity) over time within a single individual
78+
Community divergence within an individual often increases over time with respect to the baseline sample.
9879

9980
```{r homogeneity-example2d, message=FALSE, warning=FALSE, out.width="300px"}
100-
data(atlas1006)
101-
pseq <- atlas1006
102-
103-
# Identify subject with the longest time series (most time points)
104-
s <- names(which.max(sapply(split(meta(pseq)$time, meta(pseq)$subject), function (x) {length(unique(x))})))
81+
library(MicrobeDS)
82+
library(microbiome)
83+
data(MovingPictures)
10584
10685
# Pick the metadata for this subject and sort the
10786
# samples by time
10887
library(dplyr)
109-
df <- meta(pseq) %>% filter(subject == s) %>% arrange(time)
88+
89+
# Pick the data and modify variable names
90+
pseq <- MovingPictures
91+
s <- "F4" # Selected subject
92+
b <- "UBERON:feces" # Selected body site
93+
94+
# Let us pick a subset
95+
pseq <- subset_samples(MovingPictures, host_subject_id == s & body_site == b)
96+
97+
# Rename variables
98+
sample_data(pseq)$subject <- sample_data(pseq)$host_subject_id
99+
sample_data(pseq)$sample <- sample_data(pseq)$X.SampleID
100+
101+
# Tidy up the time point information (convert from dates to days)
102+
sample_data(pseq)$time <- as.numeric(as.Date(gsub(" 0:00", "", as.character(sample_data(pseq)$collection_timestamp)), "%m/%d/%Y") - as.Date("10/21/08", "%m/%d/%Y"))
103+
104+
# Order the entries by time
105+
df <- meta(pseq) %>% arrange(time)
110106
111107
# Calculate the beta diversity between each time point and
112108
# the baseline (first) time point
113-
beta <- c(0, 0) # Baseline similarity
109+
beta <- c() # Collects one (time, beta) row per time point
114110
s0 <- subset(df, time == 0)$sample
111+
# Let us transform to relative abundance for Bray-Curtis calculations
112+
a <- abundances(transform(pseq, "compositional"))
115113
for (tp in df$time[-1]) {
116114
# Pick the samples for this subject
117115
# If the same time point has more than one sample,
118116
# pick one at random
119117
st <- sample(subset(df, time == tp)$sample, 1)
120-
a <- abundances(pseq)
121-
b <- 1 - cor(a[, s0], a[, st], method = "spearman")
118+
# Beta diversity between the current time point and baseline
119+
b <- vegan::vegdist(rbind(a[, s0], a[, st]), method = "bray")
120+
# Add to the list
122121
beta <- rbind(beta, c(tp, b))
123122
}
124123
colnames(beta) <- c("time", "beta")
125124
beta <- as.data.frame(beta)
126125
126+
theme_set(theme_bw(20))
127127
library(ggplot2)
128128
p <- ggplot(beta, aes(x = time, y = beta)) +
129-
geom_point() + geom_line()
130-
print(p)
129+
geom_point() +
130+
geom_line() +
131+
geom_smooth() +
132+
labs(x = "Time (Days)", y = "Beta diversity (Bray-Curtis)")
133+
print(p)
131134
```
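
The baseline comparison in the chunk above relies on the Bray-Curtis dissimilarity. As a minimal sketch of what that index computes, the following base R example uses made-up relative abundances; `bray_curtis` is a hypothetical helper written for illustration only and is not a function of vegan or microbiome.

```r
# Bray-Curtis dissimilarity between two abundance vectors (base R only).
# `bray_curtis` is a hypothetical helper name used for illustration.
bray_curtis <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(abs(x - y)) / sum(x + y)
}

# Toy relative abundances (made-up values): rows are taxa, columns are time points
a <- cbind(t0  = c(0.5, 0.3, 0.2),
           t10 = c(0.4, 0.4, 0.2),
           t20 = c(0.1, 0.2, 0.7))

# Dissimilarity of each later time point from the baseline column
beta <- data.frame(
  time = c(10, 20),
  beta = c(bray_curtis(a[, "t0"], a[, "t10"]),
           bray_curtis(a[, "t0"], a[, "t20"]))
)
print(beta)
```

With compositional data each column sums to one, so the index reduces to half the sum of absolute differences between the two profiles.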
135+
136+
137+
## Inter-individual divergence / spread
138+
139+
Divergence within a sample set quantifies the overall heterogeneity in community composition across samples or individuals. It is sometimes quantified as the average dissimilarity of each sample from the group mean, where dissimilarity can be measured by beta diversity, as in [Salonen et al. ISME J 2014](http://www.nature.com/ismej/journal/v8/n11/full/ismej201463a.html) (they focused on homogeneity, the inverse of divergence, but the measure is essentially the same).
140+
141+
Calculate divergences within the LGG (probiotic) and Placebo groups with respect to the median profile within each group.
142+
143+
```{r divergence-example2bb, message=FALSE}
144+
pseq <- peerj32$phyloseq
145+
146+
b.pla <- divergence(subset_samples(pseq, group == "Placebo"),
147+
apply(abundances(subset_samples(pseq, group == "Placebo")), 1, median))
148+
149+
b.lgg <- divergence(subset_samples(pseq, group == "LGG"),
150+
apply(abundances(subset_samples(pseq, group == "LGG")), 1, median))
151+
```
152+
153+
154+
The group with larger values has a more heterogeneous community composition.
155+
156+
```{r divergence-example2bbb, message=FALSE, out.width="300px"}
157+
boxplot(list(LGG = b.lgg, Placebo = b.pla))
158+
```
159+
160+
See [Community comparisons](Comparisons.html) for examples on group-level comparisons based on beta diversity.
161+
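
The idea of measuring each sample's divergence from a group median profile can be illustrated without any packages. This is only a sketch under stated assumptions: the counts are made up, `divergence_from_median` is a hypothetical helper, and it uses Bray-Curtis as the dissimilarity. The `divergence` function in the microbiome package may differ in its details.

```r
# Divergence of each sample from the taxon-wise median profile of its group.
# `divergence_from_median` is a hypothetical helper, not the microbiome API.
divergence_from_median <- function(a) {
  # a: taxa x samples abundance matrix
  ref <- apply(a, 1, median)  # median abundance of each taxon across samples
  # Bray-Curtis dissimilarity between each sample (column) and the reference
  apply(a, 2, function(x) sum(abs(x - ref)) / sum(x + ref))
}

# Toy counts (made-up values): 3 taxa x 3 samples
grp <- cbind(s1 = c(10, 5, 0),
             s2 = c(8, 6, 2),
             s3 = c(2, 9, 7))

d <- divergence_from_median(grp)
print(d)
```

Larger values indicate samples further from the group's median profile; comparing the distributions of these values between groups mirrors the boxplot comparison above.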
