01-intro.Rmd
+3
@@ -66,3 +66,6 @@ Statistical thinking is about understanding our world by modeling the variation
66
66
67
67
Computational statistical thinking is an exciting new way of doing statistics that makes use of the computational tools of today. To understand randomness, we sample, re-sample, simulate or generate values from a model. We use these to learn how the problem might look if we'd collected different data, or if particular conditions hold. It allows us to create a sandbox to play in, a virtual world to examine randomness and variation.
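As a minimal sketch of what re-sampling looks like in practice, the following base R code simulates a small data set and then re-samples it with replacement (a bootstrap). The data, the seed, the sample size and the number of re-samples are all illustrative assumptions, not values taken from this book.

```r
# Hypothetical data: 30 observations generated from a normal model
set.seed(2024)
x <- rnorm(30, mean = 10, sd = 2)

# Re-sample the observed values with replacement many times,
# recording the mean of each re-sample
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

# The spread of the re-sampled means suggests how much the sample mean
# might vary if we had collected different data
sd(boot_means)
quantile(boot_means, c(0.025, 0.975))
```

Changing the model in `rnorm()` or the sample size is one way to explore how the variation would look under different conditions.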
The book [Spreadsheet Munging Strategies](https://nacnudus.github.io/spreadsheet-munging-strategies/index.html) by Duncan Garmonsway is a very good resource for dealing with complicated Excel spreadsheets. We recommend working through the examples in this book to become familiar with ways to handle messy spreadsheets, and to incorporate information such as special formatting into the data.
105
107
106
-
### Your turn: Australian Bureau of Statistics data
108
+
The [case study](https://nacnudus.github.io/spreadsheet-munging-strategies/vaccinations.html#vaccinations) developed from Bob Rudis's post on CDC vaccination data is especially recommended.
109
+
110
+
<!--
111
+
### Your turn: New Zealand Census data
112
+
113
+
StatsNZ makes [tables of data from the five-yearly censuses](http://nzdotstat.stats.govt.nz/) publicly available. Take a look at the 2018 Census data, the population and migration data. You need to expand the cells, and select the levels to use, to have the numbers broken down by age group, sex and ethnicity. (A sample file `NZ_census.xlsx` is provided as an example.) In Excel format, the variables and their levels are in the header names, with a twist: there is a header row for each variable, in a multicolumn format. (The R package `tidyxl` has the capacity to deal with multiple headers like this, but requires `xlsx` format. The sample file has been opened and saved in this format, and the first two blank lines in the original file were also manually removed.)
114
+
115
+
Luckily, choosing the `csv` format will provide the data in tidy long form.
116
+
117
+
GIVING UP ON THE XLS FORMAT - IT'S JUST REALLY IRREGULAR - AND EVEN tidyxl CANNOT HANDLE IT.
03a-tidying-data.Rmd
+13
@@ -113,6 +113,19 @@ fly
113
113
114
114
What are the variables?
115
115
116
+
## ABS Datapack
117
+
118
+
The Australian Bureau of Statistics (ABS) collects, maintains and delivers data and official statistics on a wide range of economic, social, population and environmental matters of importance to Australia. There are many different access points for the data, but aggregated data is the main type available. Examples of accessing the Census data from the ABS can be found in the `eechidna` package.
119
+
120
+
1. The individual `csv` files must be held locally. They come from a zip file and can be downloaded from: https://datapacks.censusdata.abs.gov.au/datapacks/
121
+
2. Select: 2016 Census Datapacks, General Community Profile, Commonwealth Electoral Divisions
122
+
3. Download for all of Australia
123
+
4. Unzip the package. This is necessary because the data is delivered in many small csv files. The package also contains the license information detailing appropriate usage, and detailed information about the formats.
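The unzip step above can also be done from R. The following is a minimal sketch using only base R. The zip file name, the output directory and the helper name `list_datapack_csvs()` are hypothetical, so adjust them to match your own download.

```r
# Hypothetical location of the downloaded zip file and where to extract it
zip_path <- file.path("data", "2016_GCP_CED_for_AUS.zip")
out_dir <- file.path("data", "datapack")

# Unzip only if the download is actually present
if (file.exists(zip_path)) {
  unzip(zip_path, exdir = out_dir)
}

# Hypothetical helper: list every csv file below a directory,
# including those in sub-folders
list_datapack_csvs <- function(dir) {
  list.files(dir, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
}

# Count the csv files found and show the first few file names
csv_files <- list_datapack_csvs(out_dir)
length(csv_files)
head(basename(csv_files))
```

Listing the files first gives a quick check of how many tables the datapack contains before reading any of them.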
124
+
125
+
```{r eval=FALSE}
# Read the G01 table for Commonwealth Electoral Divisions from the unzipped datapack
G1_Main <- readr::read_csv(here::here("data/2016 Census GCP Commonwealth Electoral Divisions for AUST/", "2016Census_G01_AUS_CED.csv"))
```
128
+
116
129
## Messy vs tidy
117
130
118
131
Messy data is messy in its own way. You can craft a unique solution for one data set, but then another data set comes along, and you have to craft a unique solution all over again.