Add support for files larger than 2GB #289
Conversation
This looks like a good start! I had a number of minor comments. In terms of performance, I believe the bounds checks on the methods should be unnecessary given their usage and eliminating them should help reduce the overhead a bit.
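For illustration, this is the kind of per-access check meant here; a minimal hypothetical sketch, not the PR's actual code:

// Hypothetical guarded accessor on a chunked buffer. If every caller
// derives its offsets from already-validated database metadata, the range
// check below never fires and is a candidate for removal.
public byte get(long index) {
    if (index < 0 || index >= this.capacity) {
        throw new IndexOutOfBoundsException("index: " + index);
    }
    ByteBuffer chunk = this.chunks[(int) (index / this.chunkSize)];
    return chunk.get((int) (index % this.chunkSize));
}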
Made the requested changes; this is the new benchmark.
I'll keep working on the …
@oschwald should I update the script in https://github.com/maxmind/MaxMind-DB to generate a big DB too? I guess it will be useful for tests and benchmarks.
Thanks! I'll try to take a look soon.
I don't think we would want a large database in that repo. There are quite a few other projects pulling that in, and a large database would impact them. In terms of testing, it might make sense to focus on good unit-test coverage of …
Ah ok, I thought it was only used for your libraries.
Sounds good, I can cover both buffers like that. 👍
I've only had a chance to do a cursory review, but I noticed a few things.
    throw new IllegalArgumentException("File channel has no data");
}

MultiBuffer buf = new MultiBuffer(size);
Won't this allocate a bunch of ByteBuffers that we will immediately replace with the mmap-backed ones? I think this problem exists in several other places as well, e.g., duplicate().
Yeah, I added the private constructor that works with buffers after this and forgot to change this call.
Isn't this still an issue? I'd expect something like this:
// Number of full DEFAULT_CHUNK_SIZE chunks, plus one partial chunk for
// any remainder.
int fullChunks = (int) (size / DEFAULT_CHUNK_SIZE);
int remainder = (int) (size % DEFAULT_CHUNK_SIZE);
int totalChunks = fullChunks + (remainder > 0 ? 1 : 0);

// Map each region of the file read-only instead of allocating buffers.
ByteBuffer[] buffers = new ByteBuffer[totalChunks];
long remaining = size;
for (int i = 0; i < totalChunks; i++) {
    long chunkPos = (long) i * DEFAULT_CHUNK_SIZE;
    long chunkSize = Math.min(DEFAULT_CHUNK_SIZE, remaining);
    buffers[i] = channel.map(
        FileChannel.MapMode.READ_ONLY,
        chunkPos,
        chunkSize
    );
    remaining -= chunkSize;
}
return new MultiBuffer(buffers, DEFAULT_CHUNK_SIZE);
I thought I saw this fixed last time, but either it was lost in the rebase or I overlooked it.
Think I missed it completely, fixed it now.
      throw new NullPointerException("Unable to use a NULL InputStream");
  }
- final int chunkSize = Integer.MAX_VALUE;
+ final int chunkSize = Integer.MAX_VALUE / 2;
What was the motivation behind this change?
Mostly because of the changes in aafbae6. I made the same change in MultiBuffer too. I was getting allocation errors trying to allocate byte[Integer.MAX_VALUE]; as far as I understood, when allocating an array some memory is reserved for various metadata. I noticed that Integer.MAX_VALUE - 8 does the trick, at least on my machine, though I don't know whether every platform would have been fine, so I went with half of max int to be safe.
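For illustration only, since the exact limit varies by JVM and heap settings:

// On HotSpot, an allocation request very close to Integer.MAX_VALUE can
// fail with "Requested array size exceeds VM limit" because the object
// header occupies a few words of the array's addressable size.
byte[] a = new byte[Integer.MAX_VALUE];     // may throw OutOfMemoryError
byte[] b = new byte[Integer.MAX_VALUE - 8]; // within the VM limit, but still
                                            // needs ~2 GB of contiguous heap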
Whatever threshold we use, we should probably use it for the decision to use a single buffer on lines 23 and 35 as well. Presumably the allocation there would have the same issue. From what I can tell, Integer.MAX_VALUE - 8 should be safe. We should probably just define this as a constant in the class.
Actually, you should just set DEFAULT_CHUNK_SIZE to this and then use that here.
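A sketch of that suggestion (the placement is an assumption; the JDK uses the same 8-byte headroom for constants like ArrayList's internal MAX_ARRAY_SIZE):

// Shared cap for any single allocation or mapping.
static final int DEFAULT_CHUNK_SIZE = Integer.MAX_VALUE - 8;

// ...and at the call site discussed above:
final int chunkSize = DEFAULT_CHUNK_SIZE;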
@Test
public void testWrapValidChunks() {
    ByteBuffer[] chunks = new ByteBuffer[] {
        ByteBuffer.allocateDirect(MultiBuffer.DEFAULT_CHUNK_SIZE),
        ByteBuffer.allocateDirect(500)
    };

    MultiBuffer buffer = MultiBuffer.wrap(chunks);
    assertEquals(MultiBuffer.DEFAULT_CHUNK_SIZE + 500, buffer.capacity());
}

@Test
public void testWrapInvalidChunkSize() {
    ByteBuffer[] chunks = new ByteBuffer[] {
        ByteBuffer.allocateDirect(500),
        ByteBuffer.allocateDirect(MultiBuffer.DEFAULT_CHUNK_SIZE)
    };

    assertThrows(IllegalArgumentException.class, () -> MultiBuffer.wrap(chunks));
}
I guess these tests might be causing the failure? Quite strange, as the chunk size is not max int. One possible solution is to move the chunk-size check from wrap to the constructor and test it there using a small chunk size. At that point wrap is just a one-liner. Sounds good?
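Something along these lines, as a sketch (the names and the exact message are assumptions):

public static MultiBuffer wrap(ByteBuffer[] chunks) {
    return new MultiBuffer(chunks, DEFAULT_CHUNK_SIZE);
}

MultiBuffer(ByteBuffer[] chunks, int chunkSize) {
    // Every chunk except the last must have exactly chunkSize capacity;
    // otherwise offset arithmetic across chunk boundaries breaks.
    for (int i = 0; i < chunks.length - 1; i++) {
        if (chunks[i].capacity() != chunkSize) {
            throw new IllegalArgumentException(
                "All chunks but the last must have capacity " + chunkSize);
        }
    }
    this.chunks = chunks;
    this.chunkSize = chunkSize;
}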
I think all the tests allocating buffers of MultiBuffer.DEFAULT_CHUNK_SIZE will need to be adjusted, as we are likely hitting the MaxDirectMemorySize limit set on the JVM. This also includes testDecodeStringTooLarge below, I believe.

Your approach for wrap makes sense.
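With a constructor that accepts the chunk size, as in the sketch above, the failing tests could allocate only small direct buffers, e.g. (hypothetical sizes):

@Test
public void testWrapInvalidChunkSize() {
    ByteBuffer[] chunks = new ByteBuffer[] {
        ByteBuffer.allocateDirect(500),  // too small for a non-final chunk
        ByteBuffer.allocateDirect(1024)
    };
    assertThrows(IllegalArgumentException.class,
        () -> new MultiBuffer(chunks, 1024));
}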
Sorry, I have been pretty busy, but here is some preliminary feedback.
I think I fixed everything you pointed out. The current test failures, though, have me quite stumped. I tried bumping the Surefire JVM heap size, but there are some conflicts with …
The suggested test change may help with the CI, although I haven't gone through all the tests closely.
}

- int readNode(ByteBuffer buffer, int nodeNumber, int index)
+ int readNode(Buffer buffer, long nodeNumber, int index)
I think we should return long from this function. We will also need to update it to replace the static decodeInteger with a decodeLong. The issue is that for 32-bit nodes, we could overflow.
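A minimal sketch of the concern, assuming a Buffer.get(long) accessor: a 32-bit record with its high bit set turns negative when decoded into an int, but decodes correctly into a long.

static long decodeLong(Buffer buffer, long offset, int size) {
    long value = 0;
    for (int i = 0; i < size; i++) {
        // Mask each byte to keep the accumulating value non-negative.
        value = (value << 8) | (buffer.get(offset + i) & 0xFF);
    }
    return value;
}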
This reverts commit d6d7acd.
I rebased to increase the Surefire JVM heap size but that doesn't seem to work, though I managed to reproduce the issue locally with this:

$ docker run --rm -m 5g --memory-swap 5g \
    -v "$PWD":/ws -w /ws maven:3.9-eclipse-temurin-21 \
    bash -lc 'export MAVEN_OPTS="-Xms512m -Xmx1g"; mvn -B -e clean test \
      -Dsurefire.argLine="-Xms512m -Xmx1024m -XX:MaxMetaspaceSize=192m -XX:MaxDirectMemorySize=256m -XX:+ExitOnOutOfMemoryError" \
      -Dsurefire.forkCount=1'

The issue seems to be here when running …, though then it fails in … I think I'll go with a similar approach to the one we used for other tests and create methods and a constructor to set the chunk size, and test those.
I managed to make the tests pass: I added some protected methods, and the public ones are simple wrappers, like we'd done for other methods. I removed the … All in all, I'm quite satisfied with the current state of the PR.
  @Test
  public void testNoIpV4SearchTreeStream() throws IOException {
-     this.testReader = new Reader(getStream("MaxMind-DB-no-ipv4-search-tree.mmdb"));
+     this.testReader = new Reader(getStream("MaxMind-DB-no-ipv4-search-tree.mmdb"), 2048);
It would be nice to parameterize the tests (and others) so that we are testing both SingleBuffer and MultiBuffer.

Also, we might get better test coverage of edge cases if we used a lower value for the MultiBuffer cases.
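For example, with JUnit 5 parameterized tests (a sketch; the chunk-size values are assumptions, while the two-argument Reader constructor comes from the diff above):

@ParameterizedTest
@ValueSource(ints = {16_384, 2048})
public void testNoIpV4SearchTreeStream(int chunkSize) throws IOException {
    // A chunk size larger than the test file exercises the single-chunk
    // path; the smaller value forces chunk-boundary reads.
    this.testReader = new Reader(
        getStream("MaxMind-DB-no-ipv4-search-tree.mmdb"), chunkSize);
}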
This is an attempt to make the library work with DBs larger than 2GB.

As discussed in #154, I started out creating a private Buffer interface that defines all the ByteBuffer methods used by the library, though using long instead of int where necessary.

I also implemented it with SingleBuffer, which just wraps a single ByteBuffer and dispatches most method calls to it.

I obviously had to make some changes to use Buffer and long where necessary.
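For reference, a rough sketch of the shape such an interface might take (the method names mirror ByteBuffer and are assumptions, not the PR's exact API):

interface Buffer {
    long capacity();
    long position();
    Buffer position(long newPosition);
    byte get();             // relative read, advances the position
    byte get(long index);   // absolute read
    Buffer get(byte[] dst); // bulk read
    int getInt();
    double getDouble();
    Buffer duplicate();     // independent position over the same data
}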
These are the benchmarks before and after the change. There seems to be a small impact on performance; I would consider it negligible, but I'd love your feedback @oschwald. If this is good for you, I'll keep going with this approach.
Before:
After: