fix_: cache read-only communities to reduce memory pressure #6519


Merged
merged 1 commit into develop from fix/community-extensive-memory-allocations
May 15, 2025

Conversation


@osmaczko osmaczko commented Apr 11, 2025

Full database reads, especially on message receipt, caused repeated allocations and high RAM usage due to unmarshaling full community objects. This change introduces a lightweight cache (up to 5 entries, 1-minute TTL) to avoid redundant DB access and deserialization for commonly used communities.
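For illustration, here is a minimal sketch of the read-through caching approach (not the exact PR code). It assumes the jellydator/ttlcache library visible in the diff further down, a trimmed-down ReadonlyCommunity interface, and a hypothetical loadFromDB helper standing in for the real persistence layer.

```go
package communities

import (
	"time"

	"github.com/jellydator/ttlcache/v3"
)

// ReadonlyCommunity is a stand-in for the read-only interface introduced by this PR.
type ReadonlyCommunity interface {
	IDString() string
}

type Manager struct {
	cache *ttlcache.Cache[string, ReadonlyCommunity]
}

func NewManager() *Manager {
	return &Manager{
		// Up to 5 entries with a 1-minute TTL, matching the parameters in this PR.
		cache: ttlcache.New(
			ttlcache.WithCapacity[string, ReadonlyCommunity](5),
			ttlcache.WithTTL[string, ReadonlyCommunity](time.Minute),
		),
	}
}

// GetByIDReadonly serves a cached community when available and only falls back
// to the database (and a full unmarshal) on a cache miss.
func (m *Manager) GetByIDReadonly(id []byte) (ReadonlyCommunity, error) {
	if item := m.cache.Get(string(id)); item != nil {
		return item.Value(), nil
	}
	community, err := m.loadFromDB(id)
	if err != nil {
		return nil, err
	}
	m.cache.Set(string(id), community, ttlcache.DefaultTTL)
	return community, nil
}

// loadFromDB is a placeholder for the real read: a full database fetch plus
// protobuf deserialization of the community.
func (m *Manager) loadFromDB(id []byte) (ReadonlyCommunity, error) {
	_ = id
	return nil, nil
}
```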

Issue found during investigation of status-im/status-mobile#22463

CPU and memory usage are slightly better:
[screenshot: cache_fix_metrics]

Total memory allocation (not to be confused with memory in use) during a 4-minute app run dropped from 10 GB to less than 2 GB.
[screenshot: cache_fix_mem_allocation]

Interestingly, the Go runtime is quite greedy and reluctant to return unused memory to the operating system. See below:

HeapAlloc: 177 MB
HeapSys:   337 MB
HeapInuse: 199 MB
HeapIdle: 137 MB
HeapReleased: 12 MB
HeapToReturnToOS: 125 MB
StackInuse: 6 MB
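
For context, numbers like these can be gathered with runtime.ReadMemStats. A small sketch (note that HeapToReturnToOS is not a runtime field; it is derived here as HeapIdle minus HeapReleased):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	mb := func(b uint64) uint64 { return b / (1 << 20) }
	fmt.Printf("HeapAlloc: %d MB\n", mb(m.HeapAlloc))
	fmt.Printf("HeapSys:   %d MB\n", mb(m.HeapSys))
	fmt.Printf("HeapInuse: %d MB\n", mb(m.HeapInuse))
	fmt.Printf("HeapIdle: %d MB\n", mb(m.HeapIdle))
	fmt.Printf("HeapReleased: %d MB\n", mb(m.HeapReleased))
	// Idle-but-unreleased pages are what the runtime could still hand back to the OS.
	fmt.Printf("HeapToReturnToOS: %d MB\n", mb(m.HeapIdle-m.HeapReleased))
	fmt.Printf("StackInuse: %d MB\n", mb(m.StackInuse))
}
```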


status-im-auto commented Apr 11, 2025

Jenkins Builds

Older builds (25):
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 1124215 #1 2025-04-11 10:39:47 ~2 min ios 📦zip
✔️ 1124215 #1 2025-04-11 10:40:03 ~3 min android 📦aar
✔️ 1124215 #1 2025-04-11 10:41:28 ~4 min macos 📦zip
✔️ 1124215 #1 2025-04-11 10:42:03 ~5 min macos 📦zip
✔️ 1124215 #1 2025-04-11 10:42:33 ~5 min windows 📦zip
✔️ 1124215 #1 2025-04-11 10:43:01 ~6 min linux 📦zip
✔️ 1124215 #1 2025-04-11 10:47:04 ~10 min tests-rpc 📄log
✔️ 1124215 #1 2025-04-11 11:13:51 ~36 min tests 📄log
✔️ 98457f8 #2 2025-04-11 10:52:44 ~3 min ios 📦zip
✔️ 98457f8 #2 2025-04-11 10:52:50 ~3 min android 📦aar
✔️ 98457f8 #2 2025-04-11 10:54:01 ~4 min windows 📦zip
✔️ 98457f8 #2 2025-04-11 10:54:17 ~4 min macos 📦zip
✔️ 98457f8 #2 2025-04-11 10:54:51 ~5 min macos 📦zip
✔️ 98457f8 #2 2025-04-11 10:55:57 ~6 min linux 📦zip
✖️ 98457f8 #2 2025-04-11 10:58:45 ~8 min tests-rpc 📄log
✔️ 98457f8 #2 2025-04-11 11:49:57 ~35 min tests 📄log
✔️ 98457f8 #3 2025-04-11 18:30:29 ~5 min tests-rpc 📄log
✔️ 0bb489f #3 2025-04-14 20:28:37 ~2 min android 📦aar
✔️ 0bb489f #3 2025-04-14 20:28:59 ~3 min ios 📦zip
✔️ 0bb489f #3 2025-04-14 20:30:43 ~4 min windows 📦zip
✔️ 0bb489f #3 2025-04-14 20:31:05 ~5 min macos 📦zip
✔️ 0bb489f #3 2025-04-14 20:31:06 ~5 min macos 📦zip
✔️ 0bb489f #3 2025-04-14 20:31:47 ~5 min linux 📦zip
✔️ 0bb489f #4 2025-04-14 20:35:09 ~9 min tests-rpc 📄log
✔️ 0bb489f #3 2025-04-14 21:01:49 ~35 min tests 📄log
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 37e5e84 #4 2025-05-13 13:59:28 ~2 min android 📦aar
✔️ 37e5e84 #4 2025-05-13 14:00:26 ~3 min ios 📦zip
✔️ 37e5e84 #4 2025-05-13 14:00:57 ~4 min macos 📦zip
✔️ 37e5e84 #4 2025-05-13 14:02:18 ~5 min macos 📦zip
✔️ 37e5e84 #4 2025-05-13 14:02:26 ~5 min linux 📦zip
✔️ 37e5e84 #4 2025-05-13 14:03:20 ~6 min windows 📦zip
✔️ 37e5e84 #5 2025-05-13 14:08:16 ~11 min tests-rpc 📄log
✖️ 37e5e84 #4 2025-05-13 14:32:39 ~35 min tests 📄log
✖️ 37e5e84 #5 2025-05-13 15:24:28 ~36 min tests 📄log
✖️ 37e5e84 #6 2025-05-13 16:03:37 ~32 min tests 📄log
✖️ 37e5e84 #7 2025-05-13 19:09:31 ~34 min tests 📄log
✖️ 37e5e84 #8 2025-05-14 07:08:08 ~33 min tests 📄log
✖️ 37e5e84 #9 2025-05-14 09:18:19 ~33 min tests 📄log
✖️ 37e5e84 #10 2025-05-14 13:40:29 ~31 min tests 📄log
✔️ eeac6d9 #5 2025-05-14 13:47:25 ~3 min android 📦aar
✔️ eeac6d9 #5 2025-05-14 13:47:34 ~3 min ios 📦zip
✔️ eeac6d9 #5 2025-05-14 13:49:04 ~4 min macos 📦zip
✔️ eeac6d9 #5 2025-05-14 13:49:37 ~5 min macos 📦zip
✔️ eeac6d9 #5 2025-05-14 13:50:00 ~5 min windows 📦zip
✔️ eeac6d9 #5 2025-05-14 13:50:46 ~6 min linux 📦zip
✖️ eeac6d9 #6 2025-05-14 13:54:47 ~10 min tests-rpc 📄log
✔️ eeac6d9 #11 2025-05-14 14:20:29 ~35 min tests 📄log
✖️ eeac6d9 #7 2025-05-14 18:38:03 ~7 min tests-rpc 📄log
✔️ eeac6d9 #8 2025-05-14 18:49:48 ~7 min tests-rpc 📄log

@osmaczko osmaczko force-pushed the fix/community-extensive-memory-allocations branch from 1124215 to 98457f8 Compare April 11, 2025 10:49
@osmaczko osmaczko changed the title fix_: introduce read-only community cache to reduce memory pressure fix_: cache read-only communities to reduce memory pressure Apr 11, 2025

@igor-sirotin igor-sirotin left a comment


noice!

@@ -3993,6 +3996,10 @@ func (m *Manager) GetByID(id []byte) (*Community, error) {
return community, nil
}

func (m *Manager) GetByIDReadonly(id []byte) (ReadonlyCommunity, error) {
Collaborator


Can you please add a function description for both GetByIDReadonly and GetByID?
And perhaps mention that GetByIDReadonly should be used wherever possible.


@qfrank qfrank left a comment


Thank you for the improvement!

IsControlNode() bool
CanPost(pk *ecdsa.PublicKey, chatID string, messageType protobuf.ApplicationMetadataMessage_Type) (bool, error)
IsBanned(pk *ecdsa.PublicKey) bool
}
Contributor


Nice to know what ReadonlyCommunity can do 👍

Contributor Author


This is just a subset; it should be extended to cover all read-only functions. To iterate quickly, I only included the ones that pprof showed were called frequently.


codecov bot commented Apr 11, 2025

Codecov Report

Attention: Patch coverage is 77.14286% with 8 lines in your changes missing coverage. Please review.

Project coverage is 60.48%. Comparing base (ae86dd5) to head (eeac6d9).
Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
protocol/communities/manager.go 75.75% 4 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6519      +/-   ##
===========================================
+ Coverage    60.41%   60.48%   +0.07%     
===========================================
  Files          841      841              
  Lines       104917   104930      +13     
===========================================
+ Hits         63388    63471      +83     
+ Misses       33940    33899      -41     
+ Partials      7589     7560      -29     
Flag Coverage Δ
functional 25.93% <54.28%> (+0.43%) ⬆️
unit 58.30% <77.14%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
protocol/communities/community.go 75.00% <ø> (ø)
protocol/messenger.go 64.98% <100.00%> (+0.30%) ⬆️
protocol/messenger_handler.go 59.87% <100.00%> (+0.08%) ⬆️
protocol/communities/manager.go 65.76% <75.75%> (+0.08%) ⬆️

... and 29 files with indirect coverage changes


@jrainville jrainville left a comment


Nice. It's very cleanly done

@@ -432,6 +434,7 @@ func NewManager(
communityLock: NewCommunityLock(logger),
mediaServer: mediaServer,
communityImageVersions: make(map[string]uint32),
cache: ttlcache.New(ttlcache.WithCapacity[string, ReadonlyCommunity](5), ttlcache.WithTTL[string, ReadonlyCommunity](time.Minute)),
Contributor


Interesting solution @osmaczko. Have you tried other combinations of caching parameters before settling on these?

Around the time the TTL expires, do you see any potential risk that different parts of the code might receive stale cached data while others get fresh data? I've run into similar timing issues in the past, which is why I'm asking. It might be a point of concern when used from multiple goroutines.

Contributor Author


Interesting solution @osmaczko. Have you tried other combinations of caching parameters before settling on these?

I haven’t experimented with other parameter combinations. I set them based on the results observed in the pprof output. The test scenario involved joining the Status community, fetching historical messages, and passively observing activity. The 1-minute TTL is aligned with the duration of history batch processing, which takes approximately one minute—as indicated by the CPU spike between 30s and 80s in the second screenshot. The choice of 5 communities is somewhat arbitrary. A fully deserialized Status community consumes roughly 18MB of RAM, so five communities amount to about 90MB, which I considered a reasonable threshold, rounding it to ~100MB.

Around the time the TTL expires, do you see any potential risk that different parts of the code might receive stale cached data while others get fresh data? I've run into similar timing issues in the past, which is why I'm asking. It might be a point of concern when used from multiple goroutines.

Good question. The cache is invalidated in a thread-safe way each time a community is saved (see the sketch below). In theory, the behavior remains identical to the previous implementation, except that data is now read from the cache instead of directly from the database. Unless there's a subtle edge case I've overlooked, I don't see any risks with this approach.
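
To make the invalidation concrete, a small sketch extending the one in the description above (the persistence interface, Community type, and saveCommunity name are illustrative, not the exact PR code); ttlcache's Delete is safe for concurrent use:

```go
// Community is a placeholder for the full, mutable community type.
type Community struct{ id string }

func (c *Community) IDString() string { return c.id }

// persistence stands in for the real storage layer.
type persistence interface {
	SaveCommunity(*Community) error
}

// saveCommunity writes the community and drops any cached read-only view, so
// the next GetByIDReadonly call reads fresh data from the database.
func (m *Manager) saveCommunity(db persistence, community *Community) error {
	if err := db.SaveCommunity(community); err != nil {
		return err
	}
	m.cache.Delete(community.IDString())
	return nil
}
```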


osmaczko commented Apr 11, 2025

Interestingly, the Go runtime is quite greedy and reluctant to return unused memory to the operating system. See below:

HeapAlloc: 177 MB
HeapSys:   337 MB
HeapInuse: 199 MB
HeapIdle: 137 MB
HeapReleased: 12 MB
HeapToReturnToOS: 125 MB
StackInuse: 6 MB

Regarding this, we could try using GOMEMLIMIT to set a soft memory limit for mobile builds. This will prompt the runtime to stay within the specified memory budget and may cause it to return memory more eagerly.

CC: @ilmotta @qfrank
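
For reference, a minimal sketch of the soft limit, assuming an illustrative 250 MiB budget; the same effect can be had without code changes by exporting GOMEMLIMIT=250MiB for the build:

```go
package main

import "runtime/debug"

func main() {
	// Soft memory limit of 250 MiB (illustrative value only). As the heap
	// approaches the limit, the runtime collects more aggressively and returns
	// memory to the OS sooner, instead of growing the heap based on GOGC alone.
	debug.SetMemoryLimit(250 << 20)

	// ... application startup
}
```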

@osmaczko
Contributor Author

"Out-of-memory errors (OOMs) have been a pain-point for Go applications. A class of these errors come from the same underlying cause: a temporary spike in memory causes the Go runtime to grow the heap, but it takes a very long time (on the order of minutes) to return that unneeded memory back to the system."
source: golang/go#30333

"Both of these situations, dealing with out-of-memory errors and homegrown garbage collection tuning, have a straightforward solution that other platforms (like Java and TCMalloc) already provide its users: a configurable memory limit, enforced by the Go runtime. A memory limit would give the Go runtime the information it needs to both respect users' memory limits, and allow them to optionally use that memory always, to cut back the cost of garbage collection."
source: https://github.com/golang/proposal/blob/master/design/48409-soft-memory-limit.md

@osmaczko
Contributor Author

Be careful: setting the soft memory limit too low makes the CPU go bonkers:
[screenshot: mem_usage_new_account]


ilmotta commented Apr 15, 2025

"Out-of-memory errors (OOMs) have been a pain-point for Go applications. A class of these errors come from the same underlying cause: a temporary spike in memory causes the Go runtime to grow the heap, but it takes a very long time (on the order of minutes) to return that unneeded memory back to the system." source: golang/go#30333

@osmaczko Would it help to manually trigger the GC when the mobile app goes to the background to force reclaim memory? Should we entertain the idea of manipulating the GC, like affecting the GOGC value with SetGCPercent at specific checkpoints in the code? I have no experience manipulating the GC in Go, but these ideas could be fruitful. Or perhaps have we tried a bit of this already and it didn't bring meaningful improvements?

@osmaczko
Contributor Author

@osmaczko Would it help to manually trigger the GC when the mobile app goes to the background to force reclaim memory?

I think that's a good approach. The only concern is how long it takes the Go runtime to release memory, and whether it's fast enough before the OS terminates the application. If we decide to go this route, we could use runtime/debug.FreeOSMemory.

Relevant paragraph on that from golang/go#30333:

"The second example of such tuning is calling runtime/debug.FreeOSMemory at some regular interval, forcing a garbage collection to trigger sooner, usually to respect some memory limit. This case is much more dangerous, because calling it too frequently can lead a process to entirely freeze up, spending all its time on garbage collection. Working with it takes careful consideration and experimentation to be both effective and avoid serious repercussions."

We need to be cautious, but I believe triggering this only when the app moves to the background is relatively safe.
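
A minimal sketch of that idea, assuming a hypothetical onAppBackgrounded hook wired up to the mobile lifecycle callbacks:

```go
package app

import "runtime/debug"

// onAppBackgrounded is a hypothetical lifecycle hook, not an existing status-go
// function. FreeOSMemory forces a garbage collection and returns as much memory
// to the OS as possible, so it is deliberately called only on backgrounding.
func onAppBackgrounded() {
	debug.FreeOSMemory()
}
```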

Should we entertain the idea of manipulating the GC, like affecting the GOGC value with SetGCPercent at specific checkpoints in the code? I have no experience manipulating the GC in Go, but these ideas could be fruitful. Or perhaps have we tried a bit of this already and it didn't bring meaningful improvements?

I don't have experience with that. Based on the excerpt below, I don't think we should:

"This out-of-memory avoidance led to the Go community developing its own homegrown garbage collector tuning.

The first example of such tuning is the heap ballast. In order to increase their productivity metrics while also avoiding out-of-memory errors, users sometimes pick a low GOGC value, and fool the GC into thinking that there's a fixed amount of memory live. This solution elegantly scales with GOGC: as the real live heap increases, the impact of that fixed set decreases, and GOGC's contribution to the heap size dominates. In effect, GOGC is larger when the heap is smaller, and smaller (down to its actual value) when the heap is larger. Unfortunately, this solution is not portable across platforms, and is not guaranteed to continue to work as the Go runtime changes. Furthermore, users are forced to do complex calculations and estimate runtime overheads in order to pick a heap ballast size that aligns with their memory limits."


ilmotta commented Apr 16, 2025

Thanks @osmaczko! Indeed, we should always be careful with GC tuning. For mobile there's such a wide variety of devices that it's nearly impossible to find a single hard-coded value that works optimally for everybody.

We have yet to verify the impact of forcefully freeing up memory on different devices. There's also the scenario where a user switches between apps multiple times in a row; in that case it would be better not to call FreeOSMemory every time the app is backgrounded, but to debounce the call instead (see the sketch below). To decide how long to debounce, though, we would need a clearer picture of how long it takes for the GC to do its thing under casual usage, and how much CPU pressure we would add by cleaning up (potentially) too frequently. Complexity 😅
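
To sketch the debouncing idea (the lifecycle hooks and the delay value are assumptions, not measurements): repeated app switches keep pushing the release back, and foregrounding cancels it entirely.

```go
package app

import (
	"runtime/debug"
	"sync"
	"time"
)

// memoryReleaser debounces FreeOSMemory: the release runs only after the app has
// stayed in the background for `delay`, so rapid app switching never triggers it.
type memoryReleaser struct {
	mu    sync.Mutex
	timer *time.Timer
	delay time.Duration // e.g. 30 * time.Second (illustrative)
}

func (r *memoryReleaser) OnAppBackgrounded() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.timer != nil {
		r.timer.Stop()
	}
	r.timer = time.AfterFunc(r.delay, debug.FreeOSMemory)
}

func (r *memoryReleaser) OnAppForegrounded() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.timer != nil {
		r.timer.Stop()
		r.timer = nil
	}
}
```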

@jrainville
Member

@osmaczko can we merge this?

@osmaczko
Contributor Author

@osmaczko can we merge this?

Need to resolve status-im/status-desktop#17781 first. Let me take a look.

@osmaczko osmaczko force-pushed the fix/community-extensive-memory-allocations branch from 0bb489f to 37e5e84 Compare May 13, 2025 13:56
Full database reads, especially on message receipt, caused repeated
allocations and high RAM usage due to unmarshaling full community
objects. This change introduces a lightweight cache (up to 5 entries,
1-minute TTL) to avoid redundant DB access and deserialization for
commonly used communities.
@osmaczko osmaczko force-pushed the fix/community-extensive-memory-allocations branch from 37e5e84 to eeac6d9 Compare May 14, 2025 13:44
@osmaczko osmaczko merged commit b76b2bc into develop May 15, 2025
21 checks passed
@osmaczko osmaczko deleted the fix/community-extensive-memory-allocations branch May 15, 2025 07:40
@github-project-automation github-project-automation bot moved this from Code Review to Done in Status Desktop/Mobile Board May 15, 2025
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Cache read-only communities to reduce memory pressure
6 participants