feat: separate Dockerfile for Hadoop #1186

Open · wants to merge 2 commits into base: main
Conversation

@dervoeti (Member) commented Jun 20, 2025

Description

Follow up for #1173

This PR looks like a lot, but it really does just one thing:
We previously had a single Dockerfile for Hadoop. This PR separates the build of Hadoop itself from the image build (which also includes things like hdfs-utils). We already do the same for HBase, where we have one Dockerfile for the image and another one just to build the HBase JARs.

For Hadoop, this has two advantages:

  • Currently, our HDFS image includes /stackable/patched-libs, which contains the Hadoop libraries other products use as dependencies. These libraries are not required in the HDFS image itself, but because other products depend on Hadoop and need to COPY these libraries from the image in order to use them as dependencies, they had to be part of the HDFS image. Now products can depend on hadoop/hadoop instead, i.e. on just the Java build itself.
  • Builds of products that depend on Hadoop (HBase, Druid, Spark, Hive) should be faster, because things like hdfs-utils no longer need to be built, just Hadoop itself.
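The dependency flow above can be sketched as a multi-stage Dockerfile. This is a hypothetical illustration rather than the actual Dockerfiles in this PR; the image references and builder-stage layout are assumptions (only /stackable/patched-libs is taken from the description above).

```dockerfile
# Hypothetical sketch of the new dependency flow; stage names and image
# references are illustrative assumptions, not the exact contents of this PR.

# Dedicated Hadoop build stage: just the Java build of Hadoop itself,
# without hdfs-utils or other HDFS-image-only components.
FROM stackable/image/hadoop/hadoop AS hadoop-builder

# A downstream product build (e.g. HBase) copies the patched Hadoop
# libraries from the lean hadoop/hadoop build instead of the full HDFS
# image, so the HDFS image no longer needs to carry them.
FROM stackable/image/java-devel AS hbase-builder
COPY --from=hadoop-builder /stackable/patched-libs /stackable/patched-libs
```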

In the future we could even think about not copying everything from hadoop/hadoop into hadoop, but just the components required to run HDFS.

I tested building Hadoop 3.3.6 and 3.4.1, as well as Spark 3.5.6, Hive 4.0.0 and Druid 33.0.0. I also ran the smoke tests for the built images. Everything succeeded.

Definition of Done Checklist

Note

Not all of these items are applicable to all PRs; the author should update this template to leave in only the boxes that are relevant.

Please make sure all these things are done and tick the boxes

  • Changes are OpenShift compatible
  • All added packages (via microdnf or otherwise) have a comment on why they are added
  • Things not downloaded from Red Hat repositories should be mirrored in the Stackable repository and downloaded from there
  • All packages should have (if available) signatures/hashes verified
  • Add an entry to the CHANGELOG.md file
  • Integration tests ran successfully
TIP: Running integration tests with a new product image

The image can be built and uploaded to the kind cluster with the following commands:

bake --product <product> --image-version <stackable-image-version>
kind load docker-image <image-tagged-with-the-major-version> --name=<name-of-your-test-cluster>

See the output of bake to retrieve the image tag for <image-tagged-with-the-major-version>.
