| 1 | +--- |
| 2 | +title: Using tryCatch for robust R scripts |
| 3 | +author: Rahul S |
| 4 | +date: '2018-12-20' |
| 5 | +slug: try-catch |
| 6 | +categories: |
| 7 | + - R |
| 8 | +tags: |
| 9 | + - R |
| 10 | + - programming-practices |
| 11 | +output: |
| 12 | + blogdown::html_page: |
| 13 | + toc: false |
| 14 | +--- |
| 15 | + |
| 16 | +```{r include=FALSE} |
| 17 | +library(tidyverse) |
| 18 | +library(knitr) |
| 19 | +``` |
| 20 | + |
| 21 | +Using `tryCatch` to write robust R code can be a bit confusing, and I found the help file dry to read. A few resources that explore `tryCatch` are linked at the end of this post. Over the years, I have settled on a few patterns that I've repeatedly found useful. Below is a quick introduction to `tryCatch`, followed by three use-cases I rely on regularly. |
| 22 | + |
| 23 | +# Syntax |
| 24 | + |
| 25 | +`tryCatch` has a slightly complex syntax structure. However, once we understand the 4 parts which constitute a complete `tryCatch` call as shown below, it becomes easy to remember: |
| 26 | + |
| 27 | +* `expr` : [Required] R code to be evaluated |
| 28 | +* `error` : [Optional] What should run if an error occurred while evaluating the code in `expr` |
| 29 | +* `warning` : [Optional] What should run if a warning occurred while evaluating the code in `expr` |
| 30 | +* `finally` : [Optional] What should run just before quitting the `tryCatch` call, irrespective of whether `expr` ran successfully, with an error, or with a warning |
| 31 | + |
| 32 | +```{r eval=FALSE} |
| 33 | +tryCatch( |
| 34 | + expr = { |
| 35 | + # Your code... |
| 36 | + # goes here... |
| 37 | + # ... |
| 38 | + }, |
| 39 | + error = function(e){ |
| 40 | + # (Optional) |
| 41 | + # Do this if an error is caught... |
| 42 | + }, |
| 43 | + warning = function(w){ |
| 44 | + # (Optional) |
| 45 | + # Do this if a warning is caught... |
| 46 | + }, |
| 47 | + finally = { |
| 48 | + # (Optional) |
| 49 | + # Do this at the end before quitting the tryCatch structure... |
| 50 | + } |
| 51 | +) |
| 52 | +``` |
| 53 | + |
| 54 | +## Hello World example |
| 55 | + |
| 56 | +This is a toy example showing how a function can use `tryCatch` to handle execution. |
| 57 | + |
| 58 | +```{r} |
| 59 | +log_calculator <- function(x){ |
| 60 | + tryCatch( |
| 61 | + expr = { |
| 62 | + message(log(x)) |
| 63 | + message("Successfully executed the log(x) call.") |
| 64 | + }, |
| 65 | + error = function(e){ |
| 66 | + message('Caught an error!') |
| 67 | + print(e) |
| 68 | + }, |
| 69 | + warning = function(w){ |
| 70 | + message('Caught a warning!') |
| 71 | + print(w) |
| 72 | + }, |
| 73 | + finally = { |
| 74 | + message('All done, quitting.') |
| 75 | + } |
| 76 | + ) |
| 77 | +} |
| 78 | +``` |
| 79 | + |
| 80 | +If `x` is a positive number, `expr` and `finally` are executed: |
| 81 | + |
| 82 | +```{r collapse=TRUE} |
| 83 | +log_calculator(10) |
| 84 | +``` |
| 85 | + |
| 86 | +If `x` is negative, `log(x)` signals a warning (zero and `NA` do not: `log(0)` is `-Inf` and `log(NA)` is `NA`, both returned silently). `expr` is attempted, and `warning` and `finally` are executed: |
| 87 | + |
| 88 | +```{r collapse=TRUE} |
| 89 | +log_calculator(-10) |
| 90 | +``` |
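
One detail worth noting about the call above: once `tryCatch` catches a warning, the rest of `expr` is abandoned, so the "Successfully executed" message never runs. If the goal is to note the warning and keep going, base R's `withCallingHandlers()` together with the `muffleWarning` restart is one option. A minimal sketch, where `log_with_note` is a hypothetical helper name that is not part of the original post:

```r
# Hypothetical helper (not from the original post): record the warning
# message, then resume evaluation instead of abandoning it.
log_with_note <- function(x) {
  notes <- character(0)
  value <- withCallingHandlers(
    expr = log(x),
    warning = function(w) {
      notes <<- c(notes, conditionMessage(w))  # remember what was signalled
      invokeRestart("muffleWarning")           # continue where log() left off
    }
  )
  list(value = value, notes = notes)
}

res <- log_with_note(-10)
is.nan(res$value)   # the NaN result is still returned
length(res$notes)   # number of warning messages recorded
```

Unlike `tryCatch`, the handler here runs while `log()` is still on the call stack, which is why evaluation can resume afterwards.
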
| 91 | + |
| 92 | +If x is an invalid entry which raises an error, `expr` is attempted, and `error` and `finally` are executed: |
| 93 | + |
| 94 | +```{r collapse=TRUE} |
| 95 | +log_calculator("log_me") |
| 96 | +``` |
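
`tryCatch` is also an expression: it returns the value of `expr` when nothing goes wrong, and the return value of whichever handler ran otherwise. A small sketch of using that to supply a default, where `safe_log` is a hypothetical name and returning `NA_real_` is just one possible choice of default:

```r
# Hypothetical wrapper: return NA instead of failing or warning
safe_log <- function(x) {
  tryCatch(
    expr = log(x),
    error = function(e) NA_real_,    # e.g. non-numeric input
    warning = function(w) NA_real_   # e.g. negative input
  )
}

safe_log(10)
safe_log(-10)
safe_log("log_me")
```
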
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +# More useful examples |
| 101 | + |
| 102 | +## Use `tryCatch` within loops |
| 103 | + |
| 104 | +At work, I often have quite large datasets to pre-process before model building can begin. These data come from varied sources, so their quality varies too. While each dataset _should_ conform to our data quality standards (datatypes, data dictionaries, other domain-specific constraints), very often this isn't the case. As a result, common data preprocessing functions might fail on a few datasets. We can use `tryCatch` within the `for` loop to catch errors without breaking the loop. |
| 105 | + |
| 106 | +_Another toy example:_ Say, we have a nested dataframe of the `mtcars` data, nested on the cylinder numbers, and say, we had a few character values in `mpg` which is our response variable. |
| 107 | + |
| 108 | +```{r message=FALSE, warning=FALSE, paged.print=FALSE, collapse=TRUE} |
| 109 | +# Example nested dataframe |
| 110 | +(df_nested <- mtcars %>% as_tibble() %>% tidyr::nest(data = -cyl)) |
| 111 | + |
| 112 | +df_nested$data[[2]]$mpg[c(4, 8)] <- "Missing" # coerces the whole mpg column to character |
| 113 | +``` |
| 114 | + |
| 115 | +We wish to run a few custom preprocessors, including taking the log of `mpg`. |
| 116 | + |
| 117 | +```{r} |
| 118 | +convert_gear_to_factors <- function(df){ df %>% mutate(gear = factor(gear, levels = 1:5, labels = paste0("Gear_",1:5))) } |
| 119 | +transform_response_to_log <- function(df){ df %>% mutate(log_mpg = log(mpg)) %>% select(-mpg) } |
| 120 | +``` |
| 121 | + |
| 122 | +How do we run our preprocessors over all the rows without error-ing out? |
| 123 | + |
| 124 | +```{r collapse=TRUE} |
| 125 | +for (indx in seq_len(nrow(df_nested))) { |
| 126 | + tryCatch( |
| 127 | + expr = { |
| 128 | + df_nested[[indx, "data"]] <- df_nested[[indx, "data"]] %>% |
| 129 | + convert_gear_to_factors() %>% |
| 130 | + transform_response_to_log() |
| 131 | + message("Iteration ", indx, " successful.") |
| 132 | + }, |
| 133 | + error = function(e){ |
| 134 | + message("* Caught an error on iteration ", indx) |
| 135 | + print(e) |
| 136 | + } |
| 137 | + ) |
| 138 | +} |
| 139 | +``` |
| 140 | + |
| 141 | +We're able to handle the error on iteration 2, let the user know, and run the remaining iterations. |
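
When there are many datasets, it can help to record which iterations failed rather than only printing them. A base-R sketch of that idea, using a made-up list of inputs (`inputs`, `results`, and `failures` are illustrative names, not from the code above):

```r
# Made-up inputs: the second element contains text and cannot be logged
inputs <- list(a = c(1, 10, 100), b = c("1", "Missing"), c = c(2, 4, 8))
results <- vector(mode = "list", length = length(inputs))
names(results) <- names(inputs)
failures <- character(0)

for (nm in names(inputs)) {
  tryCatch(
    expr = {
      results[[nm]] <- log(inputs[[nm]])
    },
    error = function(e) {
      # Record the failing name and the reason, then carry on with the loop
      failures[nm] <<- conditionMessage(e)
    }
  )
}

names(failures)   # which inputs failed
```

After the loop, `failures` holds one named message per failed input, which is easier to act on than scrolling back through console output.
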
| 142 | + |
| 143 | +## Catch issues early, log progress often |
| 144 | + |
| 145 | +An important component of preparing 'development' code to be 'production' ready is implementation of good defensive programming and logging practices. I won't go into details of either here, except to showcase the style of programs I have been writing to prepare code before it goes to our production cluster. |
| 146 | + |
| 147 | +```{r eval=FALSE} |
| 148 | +preprocess_data <- function(df, x, b, ...){ |
| 149 | + message("-- Within preprocessor") |
| 150 | + df %>% |
| 151 | + assertive::assert_is_data.frame() %>% |
| 152 | + assertive::assert_is_non_empty() |
| 153 | + x %>% |
| 154 | + assertive::assert_is_numeric() %>% |
| 155 | + assertive::assert_all_are_greater_than(3.14) |
| 156 | + b %>% |
| 157 | + assertive::assert_is_a_bool() |
| 158 | + |
| 159 | + # Code here... |
| 160 | + # .... |
| 161 | + # .... |
| 162 | + |
| 163 | + return(df) |
| 164 | +} |
| 165 | +build_model <- function(...){message("-- Building model...")} |
| 166 | +eval_model <- function(...) {message("-- Evaluating model...")} |
| 167 | +save_model <- function(...) {message("-- Saving model...")} |
| 168 | + |
| 169 | +main_executor <- function(df, x, b, ...){ |
| 170 | + tryCatch( |
| 171 | + expr = { |
| 172 | + preprocess_data(df, x, b, ...) %>% |
| 173 | + build_model() %>% |
| 174 | + eval_model() %>% |
| 175 | + save_model() |
| 176 | + }, |
| 177 | + error = function(e){ |
| 178 | + message('** ERR at ', Sys.time(), " **") |
| 179 | + print(e) |
| 180 | + write_to_log_file(e, logger_level = "ERR") #Custom logging function |
| 181 | + }, |
| 182 | + warning = function(w){ |
| 183 | + message('** WARN at ', Sys.time(), " **") |
| 184 | + print(w) |
| 185 | + write_to_log_file(w, logger_level = "WARN") #Custom logging function |
| 186 | + }, |
| 187 | + finally = { |
| 188 | + message("--- Main Executor Complete ---") |
| 189 | + } |
| 190 | + ) |
| 191 | +} |
| 192 | +``` |
| 193 | + |
| 194 | +Each utility function starts with checking arguments. There are plenty of packages which allow run-time testing. My favorite one is [assertive](https://cran.r-project.org/web/packages/assertive/index.html). It's easy to read the code, and it's pipe-able. Errors and warnings are handled using `tryCatch` - they are printed to the console if running in interactive mode, and then written to log files as well. I have written my own custom logging functions, but there are packages like [logging](https://cran.r-project.org/web/packages/logging/logging.pdf) and [log4r](https://cran.r-project.org/web/packages/log4r/index.html) which work perfectly fine. |
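
The `write_to_log_file()` calls above refer to a custom function that isn't shown in this post. Purely as an illustration, a minimal version could look something like the sketch below; the `log_path` argument, the default file name, and the line format are all assumptions made for this example, not the actual implementation:

```r
# Hypothetical stand-in for the custom logger; not the actual implementation.
write_to_log_file <- function(cond, logger_level = "INFO",
                              log_path = "pipeline.log") {
  line <- sprintf(
    "%s [%s] %s",
    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    logger_level,
    conditionMessage(cond)
  )
  # Append one line per condition so earlier entries are kept
  cat(line, "\n", sep = "", file = log_path, append = TRUE)
  invisible(line)
}

# Usage: log an error caught by tryCatch to a temporary file
log_file <- tempfile(fileext = ".log")
tryCatch(
  expr = stop("input data frame is empty"),
  error = function(e) {
    write_to_log_file(e, logger_level = "ERR", log_path = log_file)
  }
)
readLines(log_file)
```
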
| 195 | + |
| 196 | +## Use `tryCatch` while model building |
| 197 | + |
| 198 | +`tryCatch` is invaluable during model building. This is an actual piece of code I wrote for a Kaggle competition as part of my midterm work at school ([GitHub link here](https://github.com/rsangole/413_midterm_kaggle)). The details of what's going on aren't important. At a high level, I was fitting `stlf` models using [`forecast`](https://www.rdocumentation.org/packages/forecast/versions/8.4/topics/forecast.stl) for each shop, among 60 unique shop-ID numbers. For various reasons, an `stlf` model could not be fit for some shops, in which case a default seasonal naive model using `snaive` was to be used. `tryCatch` is a perfect way to handle such exceptions, as shown below. I used a similar approach while building models at an "item" level: the number of unique items was in the 1000s, so manually debugging one at a time was impossible. `tryCatch` allows us to programmatically handle such situations. |
| 199 | + |
| 200 | +```{r eval=FALSE, message=FALSE, warning=FALSE} |
| 201 | +stlf_yhats <- vector(mode = 'list', length = length(unique_shops)) |
| 202 | +for (i in seq_along(unique_shops)) { |
| 203 | + cat('\nProcessing shop', unique_shops[i]) |
| 204 | + tr_data <- c6_tr %>% filter(shop_id == unique_shops[i]) |
| 205 | + tr_data_ts <- |
| 206 | + dcast( |
| 207 | + formula = yw ~ shop_id, |
| 208 | + data = tr_data, |
| 209 | + fun.aggregate = sum, |
| 210 | + value.var = 'total_sales', |
| 211 | + fill = 0 |
| 212 | + ) |
| 213 | + tr_data_ts <- ts(tr_data_ts[, -1], frequency = 52) |
| 214 | + |
| 215 | + ################## |
| 216 | + # <--Look here --> |
| 217 | + fit <- tryCatch( |
| 218 | + expr = {tr_data_ts %>% stlf(lambda = 'auto')}, |
| 219 | + error = function(e) { tr_data_ts %>% snaive()} |
| 220 | + ) |
| 221 | + ################## |
| 222 | + |
| 223 | + fc <- fit %>% forecast(h = h) |
| 224 | + stlf_yhats[[i]] <- as.numeric(fc$mean) |
| 225 | + stlf_yhats[[i]] <- ifelse(stlf_yhats[[i]] < 0, 0, stlf_yhats[[i]]) |
| 226 | +} |
| 227 | +``` |
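
Stripped of the forecasting details, the pattern is "try the preferred model, fall back to a simpler one on error". A self-contained sketch using only base R, where `fit_with_fallback` is a hypothetical helper and `lm()` and `mean()` merely stand in for `stlf` and `snaive`:

```r
# Hypothetical helper: try `primary`, use `fallback` if it raises an error
fit_with_fallback <- function(y, primary, fallback) {
  tryCatch(
    expr = list(model = "primary", fit = primary(y)),
    error = function(e) {
      message("Primary model failed: ", conditionMessage(e))
      list(model = "fallback", fit = fallback(y))
    }
  )
}

# Stand-in "models": the primary one refuses series shorter than 10 points
primary_fit <- function(y) {
  if (length(y) < 10) stop("series too short")
  lm(y ~ seq_along(y))
}
fallback_fit <- function(y) mean(y)

series <- list(
  long  = as.numeric(1:24) + rep(c(0.5, -0.5), times = 12),
  short = c(3, 5, 4)
)
fits <- lapply(series, fit_with_fallback,
               primary = primary_fit, fallback = fallback_fit)
sapply(fits, function(f) f$model)   # which model was used for each series
```

Returning the model label alongside the fit makes it easy to count afterwards how many series needed the fallback.
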
| 228 | + |
| 229 | +Hope this is useful to others learning `tryCatch`. Cheers. |
| 230 | + |
| 231 | +# Links |
| 232 | + |
| 233 | + - https://www.rdocumentation.org/packages/R.oo/versions/1.2.7/topics/trycatch |
| 234 | + - https://www.r-bloggers.com/careful-with-trycatch/ |
| 235 | + - http://adv-r.had.co.nz/Exceptions-Debugging.html |
| 236 | + |