Add spill related configs to system configs #24726

Open
wants to merge 1 commit into
base: master
101 changes: 73 additions & 28 deletions presto-docs/src/main/sphinx/presto_cpp/properties.rst
Expand Up @@ -12,8 +12,8 @@ For information on catalog configuration properties, see :doc:`Connectors </conn

For information on Presto C++ session properties, see :doc:`properties-session`.

NOTE: Although some of the configuration properties below have "-gb" in their names,
which suggests gigabytes (GB; 1 GB equals 1000000000 B), their values are actually
interpreted as gibibytes (GiB; 1 GiB equals 1073741824 B).
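
As a small illustration of the note above, the following sketch converts a "-gb" property value to bytes using the gibibyte interpretation. The helper name ``gbPropertyToBytes`` is hypothetical and is not part of Presto; it only shows the arithmetic.

```cpp
#include <cstdint>

// Number of bytes in one gibibyte (2^30), as stated in the note.
constexpr std::uint64_t kBytesPerGiB = 1073741824ULL;

// Hypothetical helper (an assumption for illustration, not a Presto API):
// interprets the numeric value of a "-gb" property as gibibytes.
constexpr std::uint64_t gbPropertyToBytes(std::uint64_t valueGb) {
  return valueGb * kBytesPerGiB;
}
```

For example, a value of 57 would correspond to 61203283968 bytes under this reading, rather than 57000000000.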

.. contents::
Expand Down Expand Up @@ -137,8 +137,8 @@ The configuration properties of Presto C++ workers are described here, in alphab
1) Memory used by the queries as specified in ``query-memory-gb``; 2) Memory used by the
system, such as disk spilling and cache prefetch.

Set ``system-memory-gb`` to about 90% of the available machine memory of the deployment.
This leaves some buffer room for unaccounted memory and helps prevent out-of-memory conditions.
The default value of 57 GB is calculated based on available machine memory of 64 GB.
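
The sizing guideline above can be sketched as simple integer arithmetic. The function name ``suggestedSystemMemoryGb`` is hypothetical, and rounding down is an assumption made here so that 64 yields the documented default of 57.

```cpp
#include <cstdint>

// Hypothetical helper (not a Presto API): suggests a system-memory-gb value
// as roughly 90% of the machine memory. Integer division rounds down, which
// is an assumption chosen to match the documented 64 -> 57 example.
constexpr std::uint64_t suggestedSystemMemoryGb(std::uint64_t machineMemoryGb) {
  return machineMemoryGb * 9 / 10;
}
```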


Expand All @@ -162,6 +162,51 @@ The configuration properties of Presto C++ workers are described here, in alphab
storage used for spilling. If it is zero, then there is no limit and spilling
might exhaust the storage or take too long to run.


``spill-enabled``
^^^^^^^^^^^^^^^^^

* **Type:** ``boolean``
* **Default value:** ``false``

Try spilling memory to disk to avoid exceeding memory limits for the query.

Spilling works by offloading memory to disk. This process can allow a query with a large memory
footprint to pass at the cost of slower execution times. Currently, spilling is supported only for
aggregations and joins (inner and outer), so this property will not reduce memory usage required for
window functions, sorting and other join types.


``join-spill-enabled``
^^^^^^^^^^^^^^^^^^^^^^

* **Type:** ``boolean``
* **Default value:** ``true``

When ``spill-enabled`` is ``true``, this determines whether Presto tries to spill memory to disk for joins to
avoid exceeding memory limits for the query.


``aggregation-spill-enabled``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* **Type:** ``boolean``
* **Default value:** ``true``

When ``spill-enabled`` is ``true``, this determines whether Presto tries to spill memory to disk for aggregations to
avoid exceeding memory limits for the query.


``order-by-spill-enabled``
^^^^^^^^^^^^^^^^^^^^^^^^^^

* **Type:** ``boolean``
* **Default value:** ``true``

When ``spill-enabled`` is ``true``, this determines whether Presto tries to spill memory to disk for order by operations to
avoid exceeding memory limits for the query.
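
One way to read the four spill properties above is that the global flag gates the per-operator flags. The sketch below encodes that reading with the documented defaults. The struct and function names are hypothetical; how the execution engine actually combines these flags is not shown in this change, so treat this as an assumption.

```cpp
// Hypothetical container for the four properties, using the documented defaults.
struct SpillConfig {
  bool spillEnabled{false};
  bool joinSpillEnabled{true};
  bool aggregationSpillEnabled{true};
  bool orderBySpillEnabled{true};
};

// Assumed semantics: an operator may spill only when the global flag and
// its own operator-specific flag are both true.
inline bool canSpillJoin(const SpillConfig& c) {
  return c.spillEnabled && c.joinSpillEnabled;
}

inline bool canSpillAggregation(const SpillConfig& c) {
  return c.spillEnabled && c.aggregationSpillEnabled;
}

inline bool canSpillOrderBy(const SpillConfig& c) {
  return c.spillEnabled && c.orderBySpillEnabled;
}
```

Under this reading, the defaults disable spilling everywhere, and setting only ``spill-enabled`` to ``true`` turns it on for joins, aggregations, and order by at once.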


``shared-arbitrator.reserved-capacity``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Expand Down Expand Up @@ -390,32 +435,32 @@ The configuration properties of AsyncDataCache and SSD cache are described here.
^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``string``
* **Default value:** ``/mnt/flash/async_cache.``

The path of the directory that is mounted onto the SSD.

``async-cache-max-ssd-write-ratio``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``double``
* **Default value:** ``0.7``

The maximum ratio of the number of in-memory cache entries written to the SSD cache
over the total number of cache entries. Use this to control the SSD cache write rate:
once the ratio exceeds this threshold, writing to the SSD cache stops.

``async-cache-ssd-savable-ratio``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``double``
* **Default value:** ``0.125``

The minimum ratio of SSD savable (in-memory) cache space over the total cache space.
Once the ratio exceeds this limit, SSD savable cache entries start being written
into the SSD cache.

``async-cache-min-ssd-savable-bytes``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``integer``
* **Default value:** ``16777216``

The minimum SSD savable (in-memory) cache space required to start writing SSD savable cache entries into the SSD cache.

The default value ``16777216`` is 16 MB.
Expand All @@ -427,61 +472,61 @@ The configuration properties of AsyncDataCache and SSD cache are described here.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``string``
* **Default value:** ``0s``

The interval for persisting in-memory cache to SSD. Set this configuration to a non-zero value to
activate periodic cache persistence.

The following time units are supported:

ns, us, ms, s, m, h, d

``async-cache-ssd-disable-file-cow``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``bool``
* **Default value:** ``false``

In file systems such as btrfs that support cow (copy on write), the SSD cache can use all of the SSD
space and stop working. To prevent that, use this option to disable cow for cache files.

``ssd-cache-checksum-enabled``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``bool``
* **Default value:** ``false``

When enabled, a CRC-based checksum is calculated for each cache entry written to SSD.
The checksum is stored in the next checkpoint file.

``ssd-cache-read-verification-enabled``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``bool``
* **Default value:** ``false``

When enabled, the checksum is recalculated and verified against the stored value when
cache data is loaded from the SSD.

``cache.velox.ttl-enabled``
^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``bool``
* **Default value:** ``false``

Enable TTL for AsyncDataCache and SSD cache.

``cache.velox.ttl-threshold``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``string``
* **Default value:** ``2d``

TTL duration for AsyncDataCache and SSD cache entries.

The following time units are supported:

ns, us, ms, s, m, h, d

``cache.velox.ttl-check-interval``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``string``
* **Default value:** ``1h``

The periodic duration to apply cache TTL and evict AsyncDataCache and SSD cache entries.
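
The duration-valued properties above (for example ``0s``, ``2d``, ``1h``) use the unit suffixes ns, us, ms, s, m, h, d. The following is a minimal sketch of how such strings could be parsed. ``parseDuration`` is a hypothetical helper, not the parser Presto C++ uses, and it assumes a non-negative integer count followed directly by a unit; the real parser may accept other forms.

```cpp
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical parser (an assumption for illustration): accepts strings of
// the form "<digits><unit>" where unit is one of ns, us, ms, s, m, h, d.
// Returns std::nullopt when the digits or the unit are missing or unknown.
std::optional<std::chrono::nanoseconds> parseDuration(const std::string& text) {
  using namespace std::chrono;
  std::size_t pos = 0;
  long long count = 0;
  // Consume the leading decimal digits.
  while (pos < text.size() && text[pos] >= '0' && text[pos] <= '9') {
    count = count * 10 + (text[pos] - '0');
    ++pos;
  }
  if (pos == 0) {
    return std::nullopt; // No numeric part.
  }
  // Everything after the digits is treated as the unit suffix.
  const std::string unit = text.substr(pos);
  if (unit == "ns") return nanoseconds(count);
  if (unit == "us") return nanoseconds(microseconds(count));
  if (unit == "ms") return nanoseconds(milliseconds(count));
  if (unit == "s") return nanoseconds(seconds(count));
  if (unit == "m") return nanoseconds(minutes(count));
  if (unit == "h") return nanoseconds(hours(count));
  if (unit == "d") return nanoseconds(hours(24 * count));
  return std::nullopt; // Unknown or missing unit.
}
```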

Memory Checker Properties
Expand All @@ -508,9 +553,9 @@ server is under low memory pressure.

Specifies the system memory limit that triggers the memory pushback or heap dump if
the server memory usage is beyond this limit. A value of zero means no limit is set.
This only applies if ``system-mem-pushback-enabled`` is ``true``.
Set ``system-mem-limit-gb`` to be greater than or equal to ``system-memory-gb`` but not
higher than the available machine memory of the deployment.
The default value of 60 GB is calculated based on available machine memory of 64 GB.

``system-mem-shrink-gb``
Expand Up @@ -45,7 +45,15 @@ void updateFromSystemConfigs(
{core::QueryConfig::kQueryMaxMemoryPerNode,
std::string(SystemConfig::kQueryMaxMemoryPerNode)},
{core::QueryConfig::kSpillFileCreateConfig,
std::string(SystemConfig::kSpillerFileCreateConfig)},
{core::QueryConfig::kSpillEnabled,
std::string(SystemConfig::kSpillEnabled)},
{core::QueryConfig::kJoinSpillEnabled,
std::string(SystemConfig::kJoinSpillEnabled)},
{core::QueryConfig::kOrderBySpillEnabled,
std::string(SystemConfig::kOrderBySpillEnabled)},
{core::QueryConfig::kAggregationSpillEnabled,
std::string(SystemConfig::kAggregationSpillEnabled)}};

for (const auto& configNameEntry : sessionSystemConfigMapping) {
const auto& sessionName = configNameEntry.first;
20 changes: 20 additions & 0 deletions presto-native-execution/presto_cpp/main/common/Configs.cpp
Expand Up @@ -250,6 +250,10 @@ SystemConfig::SystemConfig() {
BOOL_PROP(kPlanValidatorFailOnNestedLoopJoin, false),
STR_PROP(kPrestoDefaultNamespacePrefix, "presto.default"),
STR_PROP(kPoolType, "DEFAULT"),
BOOL_PROP(kSpillEnabled, false),
BOOL_PROP(kJoinSpillEnabled, true),
BOOL_PROP(kAggregationSpillEnabled, true),
BOOL_PROP(kOrderBySpillEnabled, true),
};
}

Expand Down Expand Up @@ -313,6 +317,22 @@ std::string SystemConfig::poolType() const {
return value;
}

bool SystemConfig::spillEnabled() const {
return optionalProperty<bool>(kSpillEnabled).value();
}

bool SystemConfig::joinSpillEnabled() const {
return optionalProperty<bool>(kJoinSpillEnabled).value();
}

bool SystemConfig::aggregationSpillEnabled() const {
return optionalProperty<bool>(kAggregationSpillEnabled).value();
}

bool SystemConfig::orderBySpillEnabled() const {
return optionalProperty<bool>(kOrderBySpillEnabled).value();
}

bool SystemConfig::mutableConfig() const {
return optionalProperty<bool>(kMutableConfig).value();
}
17 changes: 17 additions & 0 deletions presto-native-execution/presto_cpp/main/common/Configs.h
Expand Up @@ -712,6 +712,14 @@ class SystemConfig : public ConfigBase {

// Specifies the type of worker pool
static constexpr std::string_view kPoolType{"pool-type"};

// Spill related configs
static constexpr std::string_view kSpillEnabled{"spill-enabled"};
static constexpr std::string_view kJoinSpillEnabled{"join-spill-enabled"};
static constexpr std::string_view kAggregationSpillEnabled{
"aggregation-spill-enabled"};
static constexpr std::string_view kOrderBySpillEnabled{
"order-by-spill-enabled"};

SystemConfig();

Expand Down Expand Up @@ -963,9 +971,18 @@ class SystemConfig : public ConfigBase {
bool enableRuntimeMetricsCollection() const;

bool prestoNativeSidecar() const;

std::string prestoDefaultNamespacePrefix() const;

std::string poolType() const;

bool spillEnabled() const;

bool joinSpillEnabled() const;

bool aggregationSpillEnabled() const;

bool orderBySpillEnabled() const;
};

/// Provides access to node properties defined in node.properties file.
Expand Up @@ -108,34 +108,54 @@ TEST_F(QueryContextManagerTest, defaultSessionProperties) {
EXPECT_EQ(queryConfig.maxSpillLevel(), defaultQC->maxSpillLevel());
EXPECT_EQ(
queryConfig.spillCompressionKind(), defaultQC->spillCompressionKind());
EXPECT_EQ(queryConfig.spillEnabled(), defaultQC->spillEnabled());
EXPECT_EQ(queryConfig.aggregationSpillEnabled(), defaultQC->aggregationSpillEnabled());
EXPECT_EQ(queryConfig.joinSpillEnabled(), defaultQC->joinSpillEnabled());
EXPECT_EQ(queryConfig.orderBySpillEnabled(), defaultQC->orderBySpillEnabled());
EXPECT_EQ(
queryConfig.validateOutputFromOperators(),
defaultQC->validateOutputFromOperators());
EXPECT_EQ(
queryConfig.spillWriteBufferSize(), defaultQC->spillWriteBufferSize());
}

TEST_F(QueryContextManagerTest, overridingSessionProperties) {
protocol::TaskId taskId = "scan.0.0.1.0";
const auto& systemConfig = SystemConfig::instance();
{
protocol::SessionRepresentation session{.systemProperties = {}};
auto queryCtx =
taskManager_->getQueryContextManager()->findOrCreateQueryCtx(
taskId, session);
// When session properties are not explicitly set, they should be set to
// system config values.
EXPECT_EQ(
queryCtx->queryConfig().queryMaxMemoryPerNode(),
systemConfig->queryMaxMemoryPerNode());
EXPECT_EQ(
queryCtx->queryConfig().spillFileCreateConfig(),
systemConfig->spillerFileCreateConfig());
EXPECT_EQ(
queryCtx->queryConfig().spillEnabled(),
systemConfig->spillEnabled());
EXPECT_EQ(
queryCtx->queryConfig().aggregationSpillEnabled(),
systemConfig->aggregationSpillEnabled());
EXPECT_EQ(
queryCtx->queryConfig().joinSpillEnabled(),
systemConfig->joinSpillEnabled());
EXPECT_EQ(
queryCtx->queryConfig().orderBySpillEnabled(),
systemConfig->orderBySpillEnabled());
}
{
protocol::SessionRepresentation session{
.systemProperties = {
{"query_max_memory_per_node", "1GB"},
{"spill_file_create_config", "encoding:replica_2"},
{"spill_enabled", "true"},
{"aggregation_spill_enabled", "false"},
{"join_spill_enabled", "true"}}};
auto queryCtx =
taskManager_->getQueryContextManager()->findOrCreateQueryCtx(
taskId, session);
Expand All @@ -144,6 +164,28 @@ TEST_F(QueryContextManagerTest, overrdingSessionProperties) {
1UL * 1024 * 1024 * 1024);
EXPECT_EQ(
queryCtx->queryConfig().spillFileCreateConfig(), "encoding:replica_2");
// Override with different value
EXPECT_EQ(
queryCtx->queryConfig().spillEnabled(), true);
EXPECT_NE(
queryCtx->queryConfig().spillEnabled(),
systemConfig->spillEnabled());
// Override with different value
EXPECT_EQ(
queryCtx->queryConfig().aggregationSpillEnabled(), false);
EXPECT_NE(
queryCtx->queryConfig().aggregationSpillEnabled(),
systemConfig->aggregationSpillEnabled());
// Override with same value
EXPECT_EQ(
queryCtx->queryConfig().joinSpillEnabled(), true);
EXPECT_EQ(
queryCtx->queryConfig().joinSpillEnabled(),
systemConfig->joinSpillEnabled());
// No override
EXPECT_EQ(
queryCtx->queryConfig().orderBySpillEnabled(),
systemConfig->orderBySpillEnabled());
}
}
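
The behavior exercised by the test above (session properties win when set, otherwise the system config value is used) can be sketched with plain maps. This is a simplified stand-in, not the code in this change: ``ConfigMap`` and ``applySystemDefaults`` are hypothetical names, and the real implementation works with ``core::QueryConfig`` and ``SystemConfig`` keys rather than raw strings.

```cpp
#include <string>
#include <unordered_map>

using ConfigMap = std::unordered_map<std::string, std::string>;

// Hypothetical helper (an assumption for illustration): for every
// (sessionName, systemName) pair, keep the session value when it is present;
// otherwise fall back to the system config value, if one exists.
ConfigMap applySystemDefaults(
    ConfigMap sessionProps,
    const ConfigMap& systemConfig,
    const ConfigMap& sessionToSystemName) {
  for (const auto& [sessionName, systemName] : sessionToSystemName) {
    if (sessionProps.count(sessionName) > 0) {
      continue; // Explicit session override takes precedence.
    }
    const auto it = systemConfig.find(systemName);
    if (it != systemConfig.end()) {
      sessionProps.emplace(sessionName, it->second);
    }
  }
  return sessionProps;
}
```

With a mapping of ``spill_enabled`` to ``spill-enabled`` and ``join_spill_enabled`` to ``join-spill-enabled``, a session that sets only ``spill_enabled`` keeps its own value and inherits the join setting from the system config.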
