SNOW-1927513: Shaded vs non-shaded Arrow + Netty memory manager, potential memory leak #2076

Open
aiguofer opened this issue Feb 13, 2025 · 2 comments
@aiguofer

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of JDBC driver are you using?

3.22.0

  2. What operating system and processor architecture are you using?

Linux x64

  3. What version of Java are you using?

amazoncorretto:17.0.7

  4. What did you do?

We have an Arrow Flight SQL service (Arrow 14.0.0) that allows users to execute SQL queries. We're currently using the (shaded) Snowflake JDBC driver to execute each query, convert the results into Arrow, and stream them back (a rough sketch of the pipeline is below). We run with -Dio.netty.maxDirectMemory=0 -Xms4G -Xmx4G -XX:MaxDirectMemorySize=2G and limit our pods to 9G of RAM, but we're still hitting the k8s OOM killer due to excessive use of direct memory.

  5. What did you expect to see?

Our total memory usage should stay at or below 6G (4G heap + 2G direct memory), comfortably under the 9G pod limit.

  6. Can you set logging to DEBUG and collect the logs?

I did this but didn't see any logs related to Snowflake's Arrow memory handling


I know our "local" non-shaded Arrow + Netty are respecting our max direct memory limit, so my current hypothesis is that the problem lies in Snowflake's Arrow memory management.
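For reference, the pipeline looks roughly like this minimal sketch (the account URL, credentials, query, and the Flight SQL plumbing are all placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

public class QueryToArrow {
    public static void main(String[] args) throws Exception {
        // Our "local" allocator, capped at 2 GiB. Only the non-shaded Arrow
        // goes through it; the shaded Arrow/Netty inside the Snowflake driver
        // manages its direct memory independently.
        try (BufferAllocator allocator = new RootAllocator(2L * 1024 * 1024 * 1024);
             Connection conn = DriverManager.getConnection(
                     "jdbc:snowflake://<account>.snowflakecomputing.com/"); // placeholder URL
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) { // placeholder query
            // ... read rows from rs, populate Arrow vectors allocated from
            // `allocator`, and stream the batches back over Flight SQL ...
        }
    }
}
```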

Questions:

  • Does the shaded Netty library respect -Dio.netty.maxDirectMemory=0?
  • Could Arrow + Netty be competing against their shaded counterparts for direct memory?
  • Is there a way to pass our BufferAllocator instance to the JDBC driver so all memory management is centralized?
    • I see that SnowflakeResultSetSerializableV1 has a setRootAllocator method, but it's not immediately obvious how to use this
@aiguofer aiguofer added the bug label Feb 13, 2025
@github-actions github-actions bot changed the title Shaded vs non-shaded Arrow + Netty memory manager, potential memory leak SNOW-1927513: Shaded vs non-shaded Arrow + Netty memory manager, potential memory leak Feb 13, 2025
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Feb 18, 2025
@sfc-gh-sghosh
Contributor

Hello @aiguofer ,

Thanks for raising the issue.

When a Netty parameter needs to apply to the driver's shaded code, it must also be specified with the shaded package prefix,
which means: -Dio.netty.maxDirectMemory=0 -> -Dnet.snowflake.client.jdbc.internal.io.netty.maxDirectMemory=0
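For example, assuming both limits from the original report should stay in effect, the launch flags would look something like:

```
-Dio.netty.maxDirectMemory=0
-Dnet.snowflake.client.jdbc.internal.io.netty.maxDirectMemory=0
-Xms4G -Xmx4G
-XX:MaxDirectMemorySize=2G
```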

Please try and let us know.

Regarding the third question: setRootAllocator is going to be considered a private API in the future, so it should not be used.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added the status-triage_done (Initial triage done, will be further handled by the driver team) label Feb 18, 2025
@aiguofer
Author

Thanks! I tried this in dev but never saw any logs indicating the option had been picked up so I wasn't sure it was working.

Curious: with this setting, would the shaded Netty track its own memory usage separately from the non-shaded Netty?

For example, if I have -XX:MaxDirectMemorySize=2G, would each Netty use up to 2G for a total of 4G, or would they both somehow be able to tell how much of the direct memory is being used by each other and respect the 2G limit regardless?
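In case it helps with debugging, one way I know of to observe the JDK-tracked portion of direct memory is the BufferPoolMXBean. A sketch (note this only sees cleaner-based direct ByteBuffers, which I believe is what Netty falls back to with maxDirectMemory=0, not memory Netty reserves through its own counter):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectMemoryProbe {
    public static void main(String[] args) {
        // The "direct" pool reports direct ByteBuffers allocated through the
        // JDK (ByteBuffer.allocateDirect), which count against
        // -XX:MaxDirectMemorySize. Buffers Netty allocates through its own
        // reserved-memory path do not appear here.
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: used=%,d bytes, capacity=%,d bytes%n",
                    pool.getName(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
```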
