feat: separate Dockerfile for Hadoop #1186

Open · wants to merge 2 commits into base: main
Conversation

@dervoeti (Member) commented Jun 20, 2025

Description

Follow up for #1173

This PR looks like a lot, but it really does just one thing:
We previously had a single Dockerfile for Hadoop. This PR separates the build of Hadoop itself from the image build (which also includes things like hdfs-utils). We already do the same for HBase, where we have one Dockerfile for the image and another one just to build the HBase JARs.

For Hadoop, this has two advantages:

  • Currently, our HDFS image includes /stackable/patched-libs, which contains the Hadoop libraries other products use as dependencies. These libraries are not required in the HDFS image itself, but because other products depend on Hadoop and need to COPY these libraries from the image in order to use them as dependencies, they had to be part of the HDFS image. Now products can depend on hadoop/hadoop instead, i.e. on just the Java build itself.
  • Builds of products that depend on Hadoop (HBase, Druid, Spark, Hive) should be faster, because things like hdfs-utils no longer need to be built, just Hadoop itself.
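The dependency flow above can be sketched as a multi-stage Dockerfile. This is a hypothetical illustration rather than the actual Dockerfiles in this PR; the image references and builder-stage layout are assumptions (only /stackable/patched-libs is taken from the description above).

```dockerfile
# Hypothetical sketch of the new dependency flow; stage names and image
# references are illustrative assumptions, not the exact contents of this PR.

# Dedicated Hadoop build stage: just the Java build of Hadoop itself,
# without hdfs-utils or other HDFS-image-only components.
FROM stackable/image/hadoop/hadoop AS hadoop-builder

# A downstream product build (e.g. HBase) copies the patched Hadoop
# libraries from the lean hadoop/hadoop build instead of the full HDFS
# image, so the HDFS image no longer needs to carry them.
FROM stackable/image/java-devel AS hbase-builder
COPY --from=hadoop-builder /stackable/patched-libs /stackable/patched-libs
```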

In the future we could even think about not copying everything from hadoop/hadoop into hadoop, but just the components required to run HDFS.

I tested building Hadoop 3.3.6 and 3.4.1, as well as Spark 3.5.6, Hive 4.0.0 and Druid 33.0.0. I also ran the smoke tests for the built images. Everything succeeded.

Definition of Done Checklist

Note

Not all of these items are applicable to all PRs; the author should update this template to leave in only the boxes that are relevant.

Please make sure all these things are done and tick the boxes

  • Changes are OpenShift compatible
  • All added packages (via microdnf or otherwise) have a comment on why they are added
  • Things not downloaded from Red Hat repositories should be mirrored in the Stackable repository and downloaded from there
  • All packages should have (if available) signatures/hashes verified
  • Add an entry to the CHANGELOG.md file
  • Integration tests ran successfully
TIP: Running integration tests with a new product image

The image can be built and uploaded to the kind cluster with the following commands:

bake --product <product> --image-version <stackable-image-version>
kind load docker-image <image-tagged-with-the-major-version> --name=<name-of-your-test-cluster>

See the output of bake to retrieve the image tag for <image-tagged-with-the-major-version>.
