I was getting timeout errors after the function passed to apply/2 had finished. It turned out I needed to increase the timeout in stop_trace/2. It would be nice if that timeout could be customized by the user.
The generated stacks.out file in my case was 81 MB. Admittedly, the resulting flame graph is noisy enough that it's difficult to drill into, but the high-level picture is still somewhat insightful. The noise and large file size are due in part to my testing a recursive algorithm.