-
Notifications
You must be signed in to change notification settings - Fork 305
Add a weigher to the EntityCache based on approximate entity size #490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
f0ca27b
e28c519
a7556b9
fd351a1
d98be9c
78725c5
93a5de1
5efb1c6
4e9d192
ce7b2e5
ed42b8c
94c3fe3
1a26018
230c95e
eb0a817
8782774
fc9daee
f399043
ca620aa
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.apache.polaris.core.persistence.cache; | ||
|
||
import com.github.benmanes.caffeine.cache.Weigher; | ||
import org.apache.polaris.core.persistence.ResolvedPolarisEntity; | ||
import org.checkerframework.checker.index.qual.NonNegative; | ||
|
||
/** | ||
* A {@link Weigher} implementation that weighs {@link ResolvedPolarisEntity} objects by the | ||
* approximate size of the entity object. | ||
*/ | ||
public class EntityWeigher implements Weigher<Long, ResolvedPolarisEntity> { | ||
|
||
/** The amount of weight that is expected to roughly equate to 1MB of memory usage */ | ||
public static final long WEIGHT_PER_MB = 1024 * 1024; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need this constant anymore? Why not inline its value in the config default? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To me the constant seems somewhat coupled to the weigher -- i.e. another weigher implementation could treat 1 string character as 1000 weight units, and would have a very different There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this constant is specifically mean for the config default? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Pretty much yep. I also thought it was a helpful constant to have for anyone trying to read this code rather than us just setting the default to e.g. On the other hand I can see how it's potentially misleading given the extended discussion around the amount of heap we expect a given weight to use, so I'm OK with yanking the constant if you think it's problematic. edit: We could also consider renaming There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Re: constant, how about renaming it to |
||
|
||
/* Represents the approximate size of an entity beyond the properties */ | ||
private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Where does the value 1000 come from? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See this old comment -- the apparent assumption was that entities are generally ~1KB. This means in the worst case, we will not get a larger number of entries due to this change. It should strictly be smaller. |
||
|
||
/* Represents the amount of bytes that a character is expected to take up */ | ||
private static final int APPROXIMATE_BYTES_PER_CHAR = 3; | ||
|
||
/** Singleton instance */ | ||
private static final EntityWeigher instance = new EntityWeigher(); | ||
|
||
private EntityWeigher() {} | ||
|
||
/** Gets the singleton {@link EntityWeigher} */ | ||
public static EntityWeigher getInstance() { | ||
return instance; | ||
} | ||
|
||
/** | ||
* Computes the weight of a given entity. The unit here is not exactly bytes, but it's close. | ||
* | ||
* @param key The entity's key; not used | ||
* @param value The entity to be cached | ||
* @return The weight of the entity | ||
*/ | ||
@Override | ||
public @NonNegative int weigh(Long key, ResolvedPolarisEntity value) { | ||
return APPROXIMATE_ENTITY_OVERHEAD | ||
+ (value.getEntity().getName().length() * APPROXIMATE_BYTES_PER_CHAR) | ||
+ (value.getEntity().getProperties().length() * APPROXIMATE_BYTES_PER_CHAR) | ||
+ (value.getEntity().getInternalProperties().length() * APPROXIMATE_BYTES_PER_CHAR); | ||
Comment on lines
+59
to
+62
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The constants in this expression do not "approximate" any actual sizes. I suggest: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, they do approximate actual sizes. The testing described above details that approximately There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we can leave this "as is" for now. Putting a more general comment in the main thread. |
||
} | ||
|
||
/** Factory method to provide a typed Weigher */ | ||
public static Weigher<Long, ResolvedPolarisEntity> asWeigher() { | ||
return getInstance(); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.apache.polaris.core.persistence.cache; | ||
|
||
import java.util.List; | ||
import java.util.Optional; | ||
import org.apache.iceberg.catalog.TableIdentifier; | ||
import org.apache.polaris.core.PolarisDefaultDiagServiceImpl; | ||
import org.apache.polaris.core.PolarisDiagnostics; | ||
import org.apache.polaris.core.entity.table.IcebergTableLikeEntity; | ||
import org.apache.polaris.core.persistence.ResolvedPolarisEntity; | ||
import org.assertj.core.api.Assertions; | ||
import org.junit.jupiter.api.Test; | ||
|
||
public class EntityWeigherTest { | ||
|
||
private PolarisDiagnostics diagnostics; | ||
|
||
public EntityWeigherTest() { | ||
diagnostics = new PolarisDefaultDiagServiceImpl(); | ||
} | ||
|
||
private ResolvedPolarisEntity getEntity( | ||
String name, | ||
String metadataLocation, | ||
String properties, | ||
Optional<String> internalProperties) { | ||
var entity = | ||
new IcebergTableLikeEntity.Builder(TableIdentifier.of(name), metadataLocation).build(); | ||
entity.setProperties(properties); | ||
internalProperties.ifPresent(p -> entity.setInternalProperties(p)); | ||
return new ResolvedPolarisEntity(diagnostics, entity, List.of(), 1); | ||
} | ||
|
||
@Test | ||
public void testBasicWeight() { | ||
int weight = EntityWeigher.getInstance().weigh(1L, getEntity("t", "", "", Optional.empty())); | ||
Assertions.assertThat(weight).isGreaterThan(0); | ||
} | ||
|
||
@Test | ||
public void testNonZeroWeight() { | ||
int weight = EntityWeigher.getInstance().weigh(1L, getEntity("t", "", "", Optional.of(""))); | ||
Assertions.assertThat(weight).isGreaterThan(0); | ||
} | ||
|
||
@Test | ||
public void testWeightIncreasesWithNameLength() { | ||
int smallWeight = | ||
EntityWeigher.getInstance().weigh(1L, getEntity("t", "", "", Optional.empty())); | ||
int largeWeight = | ||
EntityWeigher.getInstance().weigh(1L, getEntity("looong name", "", "", Optional.empty())); | ||
Assertions.assertThat(smallWeight).isLessThan(largeWeight); | ||
} | ||
|
||
@Test | ||
public void testWeightIncreasesWithMetadataLocationLength() { | ||
int smallWeight = | ||
EntityWeigher.getInstance().weigh(1L, getEntity("t", "", "", Optional.empty())); | ||
int largeWeight = | ||
EntityWeigher.getInstance() | ||
.weigh(1L, getEntity("t", "looong location", "", Optional.empty())); | ||
Assertions.assertThat(smallWeight).isLessThan(largeWeight); | ||
} | ||
|
||
@Test | ||
public void testWeightIncreasesWithPropertiesLength() { | ||
int smallWeight = | ||
EntityWeigher.getInstance().weigh(1L, getEntity("t", "", "", Optional.empty())); | ||
int largeWeight = | ||
EntityWeigher.getInstance() | ||
.weigh(1L, getEntity("t", "", "looong properties", Optional.empty())); | ||
Assertions.assertThat(smallWeight).isLessThan(largeWeight); | ||
} | ||
|
||
@Test | ||
public void testWeightIncreasesWithInternalPropertiesLength() { | ||
int smallWeight = | ||
EntityWeigher.getInstance().weigh(1L, getEntity("t", "", "", Optional.of(""))); | ||
int largeWeight = | ||
EntityWeigher.getInstance() | ||
.weigh(1L, getEntity("t", "", "", Optional.of("looong iproperties"))); | ||
Assertions.assertThat(smallWeight).isLessThan(largeWeight); | ||
} | ||
|
||
@Test | ||
public void testExactWeightCalculation() { | ||
int preciseWeight = | ||
EntityWeigher.getInstance() | ||
.weigh(1L, getEntity("name", "location", "{a: b}", Optional.of("{c: d, e: f}"))); | ||
Assertions.assertThat(preciseWeight).isEqualTo(1066); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -904,7 +904,7 @@ void dropEntity(List<PolarisEntityCore> catalogPath, PolarisBaseEntity entityToD | |
} | ||
|
||
/** Grant a privilege to a catalog role */ | ||
void grantPrivilege( | ||
public void grantPrivilege( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The changes here look unrelated? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. They are not; the |
||
PolarisBaseEntity role, | ||
List<PolarisEntityCore> catalogPath, | ||
PolarisBaseEntity securable, | ||
|
@@ -1291,7 +1291,7 @@ PolarisBaseEntity createTestCatalog(String catalogName) { | |
* | ||
* @return the identity we found | ||
*/ | ||
PolarisBaseEntity ensureExistsByName( | ||
public PolarisBaseEntity ensureExistsByName( | ||
List<PolarisEntityCore> catalogPath, | ||
PolarisEntityType entityType, | ||
PolarisEntitySubType entitySubType, | ||
|
@@ -1337,7 +1337,7 @@ PolarisBaseEntity ensureExistsByName( | |
* | ||
* @return the identity we found | ||
*/ | ||
PolarisBaseEntity ensureExistsByName( | ||
public PolarisBaseEntity ensureExistsByName( | ||
List<PolarisEntityCore> catalogPath, PolarisEntityType entityType, String name) { | ||
return this.ensureExistsByName( | ||
catalogPath, entityType, PolarisEntitySubType.NULL_SUBTYPE, name); | ||
|
@@ -1352,7 +1352,7 @@ PolarisBaseEntity ensureExistsByName( | |
* @param internalProps updated internal properties | ||
* @return updated entity | ||
*/ | ||
PolarisBaseEntity updateEntity( | ||
public PolarisBaseEntity updateEntity( | ||
List<PolarisEntityCore> catalogPath, | ||
PolarisBaseEntity entity, | ||
String props, | ||
|
@@ -1844,7 +1844,7 @@ void validateBootstrap() { | |
this.ensureGrantRecordExists(principalRole, principal, PolarisPrivilege.PRINCIPAL_ROLE_USAGE); | ||
} | ||
|
||
void testCreateTestCatalog() { | ||
public void testCreateTestCatalog() { | ||
// create test catalog | ||
this.createTestCatalog("test"); | ||
|
||
|
@@ -2418,7 +2418,7 @@ public void testPrivileges() { | |
* @param newCatPath new catalog path | ||
* @param newName new name | ||
*/ | ||
void renameEntity( | ||
public void renameEntity( | ||
List<PolarisEntityCore> catPath, | ||
PolarisBaseEntity entity, | ||
List<PolarisEntityCore> newCatPath, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify: What is our expected benefit from using soft values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my testing, it improved performance and prevented situations where a large cache lead to OOMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this feedback! I'm glad to see that it had positive impact in you env 👍
However, my experience with soft references in cache is not so bright and I'd advise caution, especially when Parallel GC is used. It may be able to recover from close-to-OOM conditions, but API response times are likely to suffer due to GC pauses.
Given that the default is off (not using soft values). I'm fine with this PR in general.