12-Intro_Geospatial_Vector/12-Intro-Geospatial-R.qmd at master · VT-Hydroinformatics/12-Intro_Geospatial_Vector · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
---
title: "Chapter 12: Intro Geospatial - Vector"
date: today
author: JP Gannon
format: gfm
execute:
  enabled: true
---


# Geospatial data in R - Vector

The following activity is available as a template github repository at the following link: <https://github.com/VT-Hydroinformatics/12-Intro_Geospatial_Vector>

For more: <https://datacarpentry.org/r-raster-vector-geospatial/>

Much of this is adapted from: <https://geocompr.robinlovelace.net/index.html> Chapter 8

This demo uses data from the CAMELS dataset (full ref below) <https://ral.ucar.edu/solutions/products/camels>

A. Newman; K. Sampson; M. P. Clark; A. Bock; R. J. Viger; D. Blodgett, 2014. A large-sample watershed-scale hydrometeorological dataset for the contiguous USA. Boulder, CO: UCAR/NCAR. <https://dx.doi.org/10.5065/D6MW2F4D>

```{r, message=FALSE, warning=FALSE}
library(tidyverse)
library(sf)
library(tmap)      # for static and interactive maps
library(leaflet)   # for interactive maps
library(webshot2)  # for saving maps
library(htmltools) # for saving maps


theme_set(theme_classic())
```

## Goals

Our goals for this chapter are just to see some of the ways we can wrangle and plot vector spatial data using R. This is by no means the only way and is not an exhaustive demonstration of the packages loaded, but it'll get us started.

First, we need to define raster and vector spatial data.

Check out the images below for two examples of the same data represented as raster data or vector data.

Vector: Points, lines, polygons, boundaries are crisp regardless of scale Raster: Grid of same sized cells, vales in cells, cell size = resolution (smaller cells, higher resolution)

![](images/rastervector1.png "Raster vs. Vector 1")

_Raster vs. Vector 1_

![](images/rastervector2.png "Raster vs. Vector 2")

_Raster vs. Vector 2_

**Questions from these two images:**

*What are the advantages/disadvantages of raster/vector for each?* Which is best to show on a map for each?  \*For elevation, which would be better for calculating slope?

So, today we are sticking to vector data, but then we will be deal primarily with raster elevation data.


## Intro to tmap

We are going to make maps mostly with tmap. But there are several other options (ggplot, leaflet, etc).

Let's look at how tmap works. It uses the same syntax as ggplot: the grammar of graphics.

First we want to set tmap to static map mode. This is what we would want if we were making maps for a manuscript or slides. You can also make interactive maps with tmap, which we will show later. We will also set the check.and.fix option in tmap_options to TRUE, we need to do this for the data we are using, but it isn't always necessary.

Second, we will read in our data. We'll read in the "smallerws" shapefile from the CAMELS dataset and another shapefile of the outline of US states. To read in the shapefiles we will use st_read() from the sf package.

Note that each of these shapefiles is in a separate folder and contains several files. You must have all of those files for the shapefile to work. \*yes: "A shapefile" is actually several files. Silly? Maybe. The cause of much confusion when emailing someone "a shapefile"? Definitely.

Finally we will read in a csv called gauge information that has some extra info we will join to the watershed shapefile later.

Once that is all done, we will look at the watershed data to see what is available in the shapefile attribute table.

*What extra information does the data have beyond a regular R object?* Play around with it, can you reference columns in the table the same way you would with a regular object?

```{r}
#make sure tmap is in static map mode
tmap_mode("plot")
```

```{r}
#the CAMELS shapefile throws an error about having
#invalid polygons, this line allows it to plot
tm_check_fix()

#Read shapefiles
watersheds <- st_read("small_ws/smallerws.shp")
```

```{r}
states <- st_read("cb_2018_us_state_20m/cb_2018_us_state_20m.shp")
```

```{r}
gageinfo <- read_csv("gauge information.csv")
```

```{r}
#look at the watersheds shapefile data
print(head(watersheds))
```

Let's make a map showing the watersheds data. Each watershed has coordinates to draw it in the dataset, and tmap knows how to deal with that. It uses the same format as ggplot, but instead of ggplot() you will use tm_shape(). Then the geoms are prefixed tm\_, so we will use tm_fill to show a map of the watersheds filled in with a color.

```{r}
# Pass the watershed data to tmap and fill the polygons
tm_shape(watersheds) +
  tm_fill()
```

If we use tm_borders instead, it will just outline the watersheds.

```{r}
# Add border layer to shape
tm_shape(watersheds) +
  tm_borders()
```

That's fun, and maybe you can guess what country we are in based on the distribution of watersheds, but it would be better to show some geopolitical boundaries to get a sense of where we are.

To do this, we will use a second tm_shape and show the states data we read in on the plot as well. Just like in ggplot, you can use multiple tm\_ functions to show multiple datsets on the same map.

```{r}
# Add border layer to shape
tm_shape(watersheds) +
  tm_fill() +
  tm_shape(states)+
  tm_borders()
```

You can also show the SAME data with multiple geoms. Let's add tm_borders under the watershed portion of the map and before the states portion so we get fill and borders on out watersheds.

```{r}
# Add fill and border layers to shape
tm_shape(watersheds) +
  tm_fill() +
  tm_borders()+
  tm_shape(states)+
  tm_borders()
```

Okay, this is starting to look like a map! But we need to add elements to make it better.

We could do this all in one statement, but you can also save your existing map as a kind of "basemap" and then add to it later, just like with a ggplot object. We will save the above map as *usa*.

Then we can use several built in geometries in tmap to add a compass, scale, and title. Note the syntax for specifying the position of the objects. Again, you could do this all in one statement too if you wanted.

```{r}
#Save basic map object as "usa"
usa <- tm_shape(watersheds) +
  tm_fill(col = "lightblue")+
  tm_shape(states)+
  tm_borders()

usa +
  tm_compass(type = "8star", position = c("right", "bottom")) +
  tm_scale_bar(position = c("left", "bottom"))+
  tm_layout(title = "CAMELS Watersheds", title.position = c("right", "TOP"))
```

Below is an example of how to edit the "symbology" of the map. In other words, we want to color each of the polygons depending on a variable. Here we make the watersheds darker blue if they have a higher elevation.

The syntax below is basically (pseudo code):

> Represent watersheds as shapes +\
> color the shapes based on elev_mean, use 10 colors, use the Blues palette\
> add a legend in the bottom right, add some space for the title, define the title, position the title\
> add a compass at the bottom left\
> add a scale bar at the bottom left\

```{r}
tm_shape(watersheds) +
  tm_fill(col = "elev_mean", n = 10, palette = "Blues")+
  tm_borders(lwd = 0.2)+
  tm_shape(states)+
  tm_borders()+
  tm_layout(legend.position = c("right", "bottom"),
            inner.margins = 0.1,
            title = "Mean Elevation",
            title.position = c("center", "TOP"))+
  tm_compass(type = "8star", position = c("left", "bottom")) +
  tm_scale_bar(position = c("left", "bottom"))

```

## Data wrangling with tidyverse principles

You can use the same techniques as with other data to change or filter the spatial data. Below we filter to show just watershed number 3164000, which will be in quotes because the column is a character datatype. Note when we looked at the watersheds object above there is a column called hru_id for the watershed ids.

```{r}
watersheds |> filter(hru_id == "3164000") |>
  tm_shape() +
  tm_fill() +
  tm_borders()
```

## Add non-spatial data to spatial data with a join

We have been using the watersheds shapefile. The other dataset we read in "gageinfo" has more data, but it is just a regular tibble, not a geospatial file.

We need to attach data from gageinfo object to the watersheds geospatial object based on watershed ID. How in the world will we do that?

A JOIN!!

In the watersheds shapefile the watershed ids are in a column called *hru_id* and in gageinfo tibble they are in a column called *GAGE_ID*. So when we do the join, we need to tell R that these columns are the same and we want to use them to match the values. We will do a left join to accomplish this.

BUT. TWIST!

Let's look at the first value in hru_id in watersheds: "1013500" Now the same one in GAGE_ID in gageinfo: "01013500"

They're both the same IDs, but one has a leading zero. And because they are character format, "1013500" does not equal "01013500". We can address this a couple of ways, but the best is probably to keep them as characters, but add a leading 0 to the hru_id column.

We will do this by creating a new column with a mutate and using the paste0() function to add a leading 0. To make the join easier, we will also give this new column the same name as in the gageinfo tibble: "GAGE_ID".

To review joins, check out chapter \@ref(getdata)

```{r}
#add leading 0 to hru_id column and save it as GAGE_ID
watersheds <- watersheds |>
  mutate(GAGE_ID = paste0("0", hru_id))

#join watersheds and gageinfo using the GAGE_ID column as the key
watersheds_info <- watersheds |>
  left_join(gageinfo, by = "GAGE_ID")

print(head(watersheds_info))
```

And now we can plot this formerly non-spatial data on our map.

In this case, we can now add the name of the watershed to the map rather than the number.

```{r}
#now the gageinfo columns are available for us to use when mapping
watersheds_info |> filter(hru_id == "3164000") |>
  tm_shape() +
  tm_fill() +
  tm_borders() +
  tm_text("GAGE_NAME", size = 1)
```

We can also subset vector data to create new datasets or plot. Below we will use filter statements to grab all the polygons in Virginia from the watersheds shapefile and then the Virginia state outline from the states shapefile.

The method we will use for getting the Virginia watersheds is a little new. We are going to use the grepl() function to grab any gages that include the text ", VA" since there isn't a state column, but the watershed names include the state.

Why are we doing ", VA" and not just "VA"? Good question!

", VA" will most likely only show up in gage names that are in Virginia because they'll be in the format "Gage name, VA". If we just say "VA" we might get some gages that have "VA" as part of their name but are not in Virginia.

Once we successfully filter to Virginia, we will then make a nice map of the CAMELS watersheds in Virginia!

```{r}
va_watersheds <- filter(watersheds_info, grepl(", VA", GAGE_NAME))
va_outline <- filter(states, NAME == "Virginia")

va_outline |>
  tm_shape() +
  tm_borders() +
  tm_shape(va_watersheds) +
  tm_fill() +
  tm_borders() +
  tm_layout(inner.margins = 0.1,
            title = "VA CAMELS Watersheds",
            title.position = c("left", "TOP"))+
  tm_compass(type = "8star", position = c("right", "top")) +
  tm_scale_bar(position = c("left", "top"))
```

In addition to filtering, we can use the data in the attribute table to calculate additional parameters, just like with a normal object. Below we calculate the ratio of Area to Perimeter by changing the perimeter datatype to numeric and then dividing area by perimeter.

```{r}
va_watersheds <- va_watersheds |>
  mutate(PerimeterNum = as.numeric(Perimeter),
         Area_Perimeter = AREA / PerimeterNum)

print(head(va_watersheds))

```

Now we can plot our newly calculated data by controlling color with that new column name.

```{r}
va_outline |>
  tm_shape() +
  tm_borders() +
  tm_shape(va_watersheds) +
  tm_fill(col = "Area_Perimeter", n = 10, palette = "Reds") +
  tm_borders()+
  tm_layout(inner.margins = 0.12,
            title = "VA CAMELS Watersheds",
            title.position = c("left", "TOP"))+
  tm_compass(type = "8star", position = c("right", "top")) +
  tm_scale_bar(position = c("right", "bottom"))
```

## Plot maps side by side

Just like we can use facets in ggplot, we can use facets to show multiple maps. Below we color our map by AREA and elev_mean and put them next to each other using tm_facets.

```{r}
facets = c("AREA", "elev_mean")

tm_shape(va_watersheds) +
  tm_polygons(facets) +
  tm_facets(ncol = 2, sync = FALSE)

```

## Built in styles, like themes in ggplot

Tmap also has built in styles, which are like themes in ggplot. We can use these styles with tm_style. Try "classic", "cobalt", or "col_blind" below.

```{r}
va_outline |>
  tm_shape() +
  tm_borders() +
  tm_shape(va_watersheds) +
  tm_fill(col = "Area_Perimeter", n = 10, palette = "Reds") +
  tm_borders()+
  tm_layout(inner.margins = 0.12,
            title = "VA CAMELS Watersheds",
            title.position = c("left", "TOP"))+
  tm_compass(type = "8star", position = c("right", "top")) +
  tm_scale_bar(position = c("right", "bottom"))+
  tm_style("classic") #try cobalt, bw, col_blind

```

## Interactive Maps

### tmap

You can also generate maps that you can interact with, as opposed to static maps, that we have been using before. If you are generating a map for an app or webpage, this may be a good choice. But for a pdf report, the static maps are more appropriate.

In tmap all you have to do is run tmap_mode("view") and it will create an interactive map with the exact same syntax! To switch back to a static map, run tmap_mode("plot")

Also in this chunk we see how to add a basemap to a tmap object, using tm_basemap.

```{r}
tmap_mode("view")

usa <- tm_shape(watersheds_info) +
  tm_fill(col = "lightblue", alpha = 0.3)+
  tm_borders()+
  tm_shape(states)+
  tm_borders() +
  tm_layout(title = "CAMELS Watersheds", title.position = c("right", "TOP"))

usa + tm_basemap(server = "OpenTopoMap")
```
**[View the interactive map here](images/tmap_plot.html)**

### Leaflet

Leaflet is another way to make interactive maps. It's syntax is very different, as you can see below. But depending on what functionality you need, it could be a better choice.

```{r}
leaflet(watersheds_info) |>
  addTiles() |>
  addPolygons(color = "#444444", weight = 1, smoothFactor = 0.5,
    opacity = 1.0, fillOpacity = 0.5,
    fillColor = ~colorQuantile("YlOrRd", elev_mean)(elev_mean))
```
**[View the interactive map here](images/leaflet_plot.html)**