Possible memory leak involving warnings #7649

Open
dkutner opened this issue Jan 31, 2025 · 0 comments

After updating dplyr (late to the party), I'm seeing increased memory usage, which I've traced back to how warnings are handled. Beginning with dplyr 1.1.1, I get the following output:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

identity <- function(x, warn) {
    if (warn) {
        warning("fake warning")
    }
    x
}

df <- tibble::tibble(e = rep(1, 1e8))

print(gc())
#>             used  (Mb) gc trigger   (Mb)  max used  (Mb)
#> Ncells    620941  33.2    1306337   69.8   1306337  69.8
#> Vcells 101049658 771.0  148096356 1129.9 101084366 771.3

df <- df %>% mutate(e = identity(e, warn = TRUE))
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `e = identity(e, warn = TRUE)`.
#> Caused by warning in `identity()`:
#> ! fake warning

print(gc())
#>             used  (Mb) gc trigger   (Mb)  max used  (Mb)
#> Ncells    729780  39.0    1306337   69.8   1306337  69.8
#> Vcells 101287706 772.8  148096356 1129.9 102369359 781.1

rm(df)
print(gc())
#>             used  (Mb) gc trigger   (Mb)  max used  (Mb)
#> Ncells    729742  39.0    1306337   69.8   1306337  69.8
#> Vcells 101287654 772.8  148096356 1129.9 102369359 781.1

Created on 2025-01-31 with reprex v2.0.2

If I restart R and rerun with warn = FALSE, the final Vcells usage reported by gc() is only 7.9 MB rather than 772.8 MB. If I instead rewrite the mutate without a pipe, as df <- mutate(df, e = identity(e, warn = TRUE)), the final usage is only 8.8 MB. Switching the pipe from %>% to |> also yields low memory usage. Under dplyr 1.1.0, the reprex above yields 18.8 MB.
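
For anyone who wants to repeat the comparison, here is a rough sketch of how the variants could be measured. The names vcells_used_mb, variants and measure_variant are made up for this example and are not part of dplyr. It assumes R >= 4.1 for the native pipe, uses a smaller n than the reprex, and wraps each variant in a function, which may not behave exactly like the top-level calls above.

```r
library(dplyr)

# Same helper as in the reprex (it masks base::identity on purpose).
identity <- function(x, warn) {
  if (warn) warning("fake warning")
  x
}

# Vcells "used" in Mb: row "Vcells", second column of the matrix from gc().
vcells_used_mb <- function() unname(gc()["Vcells", 2])

# Hypothetical list of the four variants described in the text.
variants <- list(
  magrittr_pipe = function(df) df %>% mutate(e = identity(e, warn = TRUE)),
  native_pipe   = function(df) df |> mutate(e = identity(e, warn = TRUE)),
  direct_call   = function(df) mutate(df, e = identity(e, warn = TRUE)),
  no_warning    = function(df) df %>% mutate(e = identity(e, warn = FALSE))
)

# Build the data, run one variant, drop the data, then report Vcells (Mb).
measure_variant <- function(name, n = 1e6) {
  df <- tibble::tibble(e = rep(1, n))
  df <- variants[[name]](df)
  rm(df)
  vcells_used_mb()
}

# Run a single variant per fresh R session so that anything retained by an
# earlier run cannot affect the number.
measure_variant("magrittr_pipe")
```

Changing the name passed to measure_variant() and restarting R between runs mirrors the way the numbers above were collected.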

I don't fully understand whether warnings can capture my environment, but I wonder whether that is happening either in base R or in dplyr's own record of warnings.
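
To illustrate the kind of retention I have in mind, here is a base R sketch that does not use dplyr at all. The store warning_store and the functions record_warning and run_step are hypothetical, and keeping the frame next to the condition is purely an assumption for the sake of the example. I am not claiming this is what dplyr or base R actually does.

```r
# Hypothetical store standing in for any long-lived record of warnings.
warning_store <- new.env()

record_warning <- function(cnd, frame) {
  # Assumption being illustrated: the record keeps a reference to the frame
  # in which the warning was raised, not just the condition object.
  warning_store$last <- list(condition = cnd, frame = frame)
  invisible(NULL)
}

run_step <- function() {
  big <- numeric(1e6)     # about 8 MB of doubles, local to this call
  frame <- environment()  # the frame that holds `big`
  withCallingHandlers(
    {
      warning("fake warning")
      sum(big)
    },
    warning = function(cnd) {
      record_warning(cnd, frame)
      invokeRestart("muffleWarning")
    }
  )
}

result <- run_step()
rm(result)
invisible(gc())

# `big` was local to run_step(), but it is still reachable through the
# stored frame, so gc() cannot free it.
length(warning_store$last$frame$big)

# Dropping the record removes the last reference and lets gc() reclaim it.
rm("last", envir = warning_store)
invisible(gc())
```

If something similar happens with the frame that %>% creates, it could explain why only the magrittr pipe variant keeps the data alive, but that is a guess on my part.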
