r.rmd

---
layout: default
title: R code
output: bookdown::html_chapter
---

# R code

The most important part of most packages is the `R/` directory because it contains all of your R code. We'll start with `R/`, because even if you do nothing else, putting your R files in this directory gives you some useful tools.

In this chapter you'll learn:

* How to organise the R code in your package.
* Your first package development workflow.
* What happens when you install a package.
* The difference between a library and package.

## Starting a package

There are two ways to get started with a new package:

* Firstly, you can just create an R directory and move in your R code.
  Files containing R code should always have the extension `.R` (you can also 
  use `.r`, but I recommend sticking to `.R` for consistency).

* Alternatively, use `devtools::create("pkgname")`: this will create a
  directory called `pkgname`, containing an empty `R/` directory and 
  some useful files that you'll learn about in the course of this book.

__Never__ use `package.skeleton()`: it's designed for a bygone era of package development, and mostly serves to make your life harder, not easier.

## Loading code

The first advantage of using a package is that it's now easier to reload all of your code. There are two options: 

* `load_all()`, Cmd + Shift + L, reloads all code in the package. It's 
  fast but takes some shortcuts that occassionally backfire. In RStudio,
  this also saves all open files, saving you a key press.

* Build & reload, Cmd + Shift + B. This is only available in RStudio, because
  it installs the package, then restarts R, then loads the package with 
  `library()` (If you don't use RStudio, you'll have to do this by hand;
  that's a bit painful.)

These commands make it very easy to iterate between writing code in your editor and trying it out in the R console.

## Organisation

It's up to you how you arrange your functions into files. There are two
possible extremes: all functions in one file, and each function in its own
file. I think these are both too extreme, and I suggest grouping related
functions into a single file. My rule of thumb is that if I can't remember
which file a function lives in, I probably need to split them up into more
files: having only one function in a file is perfectly reasonable,
particularly if the functions are large or have a lot of documentation. As
you'll see in the next chapter, often the code for the function is small
compared to its documentation (it's much easier to do something than it is to
explain to someone else how to do it.)

You can not use subdirectories. If you have many files with related purposes, I recommend using a common prefix: `abc-*.R`.

Learning how to use F2 (jump to defintion) and Ctrl + . (jump to function) in RStudio considerably weakens the need to have a very organised directory.

## Working directories & RStudio projects

All devtools functions assume that your current working directory is the package. If you need to override that, you can supply the full path in the first argument, but you're better off learning how to use working directories or RStudio projects effectively.

Every R session has a "working directory" associated with. Whenever you use a relative path, it's relative to this directory. It's a good practice to get into.

You can see the current working directory with `getwd()`. You can also change it with `setwd()`, but seeing this in a script is a bad sign. An explicit `setwd()` makes code fragile because it will now only work on a computer set up exactly the same way as yours. Instead, you should assume that when someone uses your code, they're already set up the working directory correctly. The easiest way to do that is to use RStudio projects.

### RStudio projects

If you use RStudio, I highly recommend using an RStudio project for your package. Projects are a very thin wrapper around working directories - they remember what working directory you are in and what files you have open.

* Projects isolate state so that unrelated things stay unrelated.

* You get handy code navigation tools like `F2` to jump to a function
  definition and `Ctrl + .` to look up functions by name.

* You get useful keyboard shortcuts for common package development tasks.
  See Help | Keyboard shortcuts, or press Alt + Shift + K to see them all.

* Projects are a light weight text file, and devtools makes sure that 
  this file is not included in binary packages that you share with others 
  (by including in `.Rbuildignore`).

* Package development projects have a number of very useful keyboard 
  shortcuts that we'll use throughout the book.

`create()` will automatically create a `.Rproj` file for you. To create one after the fact, run `devtools::use_rstudio()` in the package directory, or use the "New Project..." command in the project menu.

```{r, echo = FALSE}
bookdown::embed_png("screenshots/new-project.png", dpi = 220)
```

`.Rproj` files are just text files. For example, if you looked at the project file associated with this book, you'd see:

```
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: XeLaTeX

AutoAppendNewline: Yes
StripTrailingWhitespace: Yes
```

Never modify this file by hand. Instead, use the friendly project options dialog, accessible from the projects menu in the top-right corner.

```{r, echo = FALSE}
bookdown::embed_png("screenshots/project-options-1.png", dpi = 220)
bookdown::embed_png("screenshots/project-options-2.png", dpi = 220)
```

## Development cycles

The package development cycle describes the sequence of operations that you use when developing a package. You probably already have a sequence of operations that you're comfortable with when developing a single file of R code. It might be:

1. Try something out on the command line.

1. Modify it until it works and then copy and paste the command into an R file.

1. Every now and then restart R and reload the complete file to make sure
   you've ordered all the code correctly.

Or maybe you:

1. Write all your functions in an R file and save it.

1. `source()` the file into your current session.

1. Interactively try out the functions and see if they return the correct
  results.

1. Repeat the above steps until the functions work the way you expect.

With devtools and RStudio, you can:

1. Edit R files in the editor.

1. Press Cmd + Shift + L

1. Explore the code in the console

1. Rinse and repeat.

You might also be a little bit more worried about checking that your functions work, not only now, on your computer, but also in the future and on other computers. You'll learn more about more sophisticated workflows that prevent that from happening in [testing](#tests).


## What is a package?

To develop packages, it's good to have a good understand of what a package is, and the various forms that it comes in.

### Source, binary and bundled packages

So far we've just worked with a __source__ package: the development version of a package that lives on your computer. There are also three other types of package: bundled, binary and installed.

A package __bundle__ is a compressed version of a package in a single file. By convention, package bundles in R use the extension `.tar.gz`. This is Linux convention indicating multiple files have been collapsed into a single file (`.tar`) and then compressed using gzip (`.gz`). The package bundle is useful if you want to manually distribute your package to another R package developer. It is not OS specific. You can use `devtools::build()` to make a package bundle. If you decompress a package bundle, you'll see it looks almost the same as your source package. (The only difference is that it excludes files listed in `.Rbuildignore`).

If you want to distribute your package to another R user (i.e. someone who doesn't necessarily have the development tools installed) you need to make a __binary__ package. Like a package bundle, a binary package is a single file, but if you uncompress it, you'll see that the internal structure is a little different to a source package: 

* a `Meta/` directory contains a number of `Rds` files. These contain cached
  metadata about the package, like what topics the help files cover and
  parsed versions of the `DESCRIPTION` files. (If you want to look at what's
  in these files you can use `readRDS()`). The goal of these files is to make
  packages load as quickly as possible.

* a `html/` directory contains some files needed for the HTML help.

* there are no `.R` files in the `R/` directory - instead there are three
  files that store the parsed functions in an efficient format. This is
  basically the result of loading all the R code and then saving the
  functions with `save()`, but with a little extra metadata to make things as
  fast as possible.

* If you had any code in the `src/` directory there will now be a `libs/`
  directory that contains the results of compiling that code for 32 bit
  (`i386/`) and 64 bit (`x64/`)
  
* The contents of `inst/` have been moved into the top-level directory.

Binary packages are platform specific: you can't install a Windows binary package on a Mac or vice versa. You can use `devtools::build(binary = TRUE)` to make a binary package.  Mac binary packages end in `.tgz` and Windows binary packages end in `.zip`.

The following table summarises the files present in the root directory for the source, built and binary versions of devtools.

```{r, results = "asis", echo = FALSE}
knitr::kable(readRDS("extras/pkg-paths.rds"))
```

(Needs diagram instead of table?)

* `cran-comments.md` and `devtools.Rproj` are included in `src` but nothing 
  else. That's because they're only need for development, and are listed in
  `.Rbuildignore`.
  
* `inst/` goes away and `templates/` appears - that's because building a 
  binary copies the contents of `inst/` into the root directory. You'll learn
  more about that in XXX.

An __installed__ package is just a binary package that's been uncompressed into a package library, described next.

### Libraries

In R, to load a package, you use the `library()` function. That's confusing, and some people use the terms library and package interchangeable. But libraries are different to packages, and it's useful to understand the difference.

A library is a collection of installed packages. You can have multiple libraries on your computer and most people have at least two: one for the recommended packages that come with a base R install (like `base`, `stats` etc), and one library where the packages you've installed live. The default is to make that directory dependent on which version of R you have installed - that's why you normally lose all your packages when you reinstall R. If you want to avoid this behaviour, you can manually set the `R_LIBS` environmental variable to point somewhere else. `.libPaths()` tells you where your current libraries live.

My `.libPaths()` has two entries: the first is where packages that I've installed R live, and the second is where packages that come with R live:

```{r}
.libPaths()
lapply(.libPaths(), dir)
```

When you use `library(pkg)` to load a package, R looks through each path in `.libPaths()` to see if a directory called `pkg` exists. 

Packrat, which we'll learn about in Chapter XXX, automates the process of managing project specific libraries. This means that when you upgrade a package in one project, it only affects that project, not every project on your computer. This is useful because it means you can play around with cutting-edge packages in one place, but all your other projects continue to use the old reliable packages. This is also useful when you're both developing and using a package. 

### Side effects

You should not call functions that cause side-effects at the top-level of the package:

* `library()`, `require()`
* `options()`, `par()`
* `write()`, `write.csv()`, `saveRDS()`, etc. which save output to disk.

When you're developing a package with `load_all()` they will appear to work. When you turn your code into a package (e.g. with "build and reload") they will not work, because they will be run only once when the package is built (affecting your R session), not every time it is loaded. 

It is almost always best to make changes to the users environment explicit. They shouldn't happen automatically, but instead should be requested by executing a function. 

Occassionally, there are important side-effects that you a package does need. This might be create an temporary file somewhere or similar. To run that code every time the package is loaded, use the special functions `.onLoad()` and `onAttach()`. These are called when the package is loaded or attached. You'll learn about the distinction between the two in [Namespaces](#namespace). Unless you know otherwise use `.onAttach()`. NEVER use `library()` or `require()` in these functions.

Whenever you use `.onLoad()` or `.onAttach()`, make sure to define `.onUnload()` or `.onDetach()` to clean up when the package is unload/detached. A common use of `.onLoad()` is to dynamically load a DLL file. This is old-fashioned practice and there are better alternatives available. You'll learn about these in [Namespaces](#namespace).  Can still be necessary if you're connecting to external C/C++ libraries that need set up and tear down. For example, devtools uses `.onAttach()` to run some code that locates Rtools on windows; shiny uses it to set up PRNG correctly.

One type of side-effect that's not so harmful is displaying a message when the package is loaded. This might make clear licensing conditions, or important usage tips. Startup messages should go in `.onAttach()`. To display startup messages, always use `packageStartupMessage()`, and not `message()`. (This allows `suppressPackageStartupMessages()` to selectively suppress package startup messages).

Can also use to set up hooks that run when other packages are loaded. This can be useful if you want to warn when packages are loaded in sub-optimal manner (e.g. dplyr before plyr), or when you need to add S3 methods to a suggested package. That's discussed in more detail in [namespaces](#namespace).

### What happens when you install a package?

Package installation is the process whereby a source package gets converted into a binary package and then installed into a local library.  There are a number of tools that automate this process:

* `install.packages()` installs a package from CRAN. Here CRAN takes care of
  making the binary package and so installation from CRAN basically is
  equivalent to downloading the binary package value and unzipping it in
  `.libPaths()[1]`.

* `devtools::install()` installs a source package from a directory on your
  computer. It does this by first `build()`ing a package bundle, and then
  asking `install.packages()` to install it.

* `devtools::install_github()` installs a package that someone has published
  to their [github](http://github.com) account. There are a number of similar
  functions that make it easy to install packages from other internet
  locations: `install_url()`, `install_gitorious()`, `install_bitbucket()`, 
  and so on. You'll learn more about these in [release](release.html).
  

## Exercises

* Go to CRAN and download the source and binary for XXX. Unzip and compare.
  How do they differ?

* Download the __source__ packages for XXX, YYY, ZZZ. What directories do they
  contain?

* Where is your default library? What happens if you install a new package
  from CRAN?

## CRAN notes

If you're planning on submitting your package to CRAN, it's best to use only ASCII characters. If you need unicode characters in strings, use `"\u1234"` format:

```{r}
# TODO figure out how to display x as y
x <- "•"
y <- "\u2022"
identical(x, y)
```

Your R directory should not include any other files. Subdirectories will be ignored.