If you want to learn how to run Kafka Connect, create a custom SMT, and test a Connector is working end to end in your local environment with automated tests asserting on data in the Database...*gasps for air* ๐ฎ... this project is for you; read on...
Tip
Run ./gradlew build
to build the project including running the integration tests. Afterward you can check the code
coverage and container logs, more details about where to find them below.
Note
The Docker Compose stack used in this project is a variant of the Confluent cp-all-in-one
stack found
at https://github.com/confluentinc/cp-all-in-one/tree/7.6.0-post
The purpose of this project is to help a developer understand how to use:
- Kafka Connect :: Run a Docker Compose stack with all dependant services a Kafka Connect instance needs
- Single Message Transforms :: Develop custom Kafka Connect SMTs and build a jar library
- Kafka Connect Docker Image :: Build a Docker image
based on
confluentinc/cp-server-connect-base
- Auto Connector creation :: Add Connector configs to docker image, created at startup with a custom script
- JDBCSinkConnector :: Create a Kafka Connect Connector to move data from a Kafka Topic to a Database
- Gradle Multi-Project structure :: Separate our SMT library code and the Spring Boot application/tests
- Spring Boot and @SpringBootTest :: Produce a Kafka record on a topic and read from a Database with JPA
- Integration Tests :: Run a single Gradle command to ensure our Connector/SMTs are working end-to-end
- gradle-docker-compose-plugin :: Start and stop the full stack of Docker Compose services using Gradle
- JaCoCo Report Aggregation Plugin :: Get code coverage stats from subprojects in a single HTML/XML report
Every commit pushed to git is built by Circle CI to ensure the integration tests still pass,
and SonarCloud analysis is run if possible (see SONAR_TOKEN
requirement below). You can see the
Circle CI build configuration in the .circleci/config.yml file.
The Gradle dependencies are cached based on a checksum of all the build.gradle
files, which can speed up the builds
significantly; see more at the Caching dependencies page.
After the build has finished, the test results can be found in the Tests
tab of the build job; see more at
the Collect test data page.
The test html reports, JaCoCo aggregate report and connect container logs can be found in the Artifacts
tab of the
build job. The log file can be useful to check if integration tests fail; see more at
the Storing build artifacts page.
SonarCloud helps to keep the codebase clean by reporting on several metrics such as issues (problems in code where rules are violated), code coverage, and technical debt.
The sonar
Gradle task will only run if the SONAR_TOKEN
environment
variable is present, which makes the build more robust in different
environments. Circle CI has the token applied, so it will run there.
Important
If you fork this repo and want to be able to run SonarCloud analysis you will need to change the projectKey
and
organization
sonar properties to point to your own project, they can be found in the root
build.gradle.
The code in this project is split into a few Gradle Multi-Projects with dependencies between them. Best practises from the Gradle User Manual have been followed:
Project Directory | Description |
---|---|
Root project ย |
Contains Java Toolchain plugin in settings.gradle , and plugin definitions in build.gradle which are applied to subprojects (this centralises the version numbers). |
buildSrc |
Contains Gradle convention plugins to share build logic in sub projects. |
connect-smt-lib |
Contains Java code and tests for custom SMTs that are exported to a jar library and added to the Kafka Connect Docker image at build time. |
connect-spring-boot-app |
Contains Java code which uses Spring Boot, Spring Kafka, Spring Data JPA, and @SpringBootTest to run end to end integration tests with the help of some Grade plugins to control the Docker Compose stack. |
jacoco-report-aggregation |
Contains Gradle build logic to run a plugin which aggregates all JaCoCo execution data and creates a consolidated HTML/XML report with coverage data for all subprojects. |
connect-connector-configs | Contains Kafka Connect connector configurations in JSON format. Each JSON file is processed by the Kafka Connect container at startup to create a connector automatically once the REST API is ready to receive requests. |
connect-scripts | Contains custom bash scripts to start the Connect service with the Confluent startup script, wait for the REST API to be available, then create connectors automatically. |
db | Contains SQL scripts run by the postgres service on startup; we usually add CREATE TABLE definitions in these scripts. |
The objective of this project is to allow you to easily add your own SMTs and use them in Connectors.
If you want to do this follow these steps:
- Add a new SMT to the connect-smt-lib module
- ๐ don't forget to add a unit test too!
- Add a new connector config to the connect-connector-configs directory
- ๐It will automatically get detected at container startup and the connector will be created
- ๐ก Keep an eye on the container logs at startup to ensure your connector config is valid
- ๐ก Or use the Confluent Control Center UI to add a connector by uploading the json file
- Add an AVRO schema for the new record/topic here
- Add an AVRO JSON data file for the new record/topic here
- If you are adding a new
JDBCSinkConnector
- Add a new Creator class to map AVRO data to a
GenericData.Record
based on FFVIIAllyUpdateCreator - Add a
@SpringBootTest
integration test for the new connector based on FFVIIAllyUpdateTest - Run the integration tests
./gradlew integrationTest
- Java 17 - Used by Gradle to build the Java code/tests, and it is the JRE running inside the Connect container
- Gradle Wrapper - The project contains
a Gradle Wrapper to execute the build
- The Gradle installation will be downloaded the first time you run the wrapper
- Docker & Docker Compose - to build Docker images and run containers in the stack
- Postgres Database - to store data from Kafka Connect
Important
Please ensure you have Java 17 set as your Java Home, otherwise you will get errors like this:
FAILURE: Build failed with an exception.
* What went wrong:
A problem occurred configuring root project 'kafka-connect-stack'.
> Could not resolve all artifacts for configuration ':classpath'.
> Could not resolve org.springframework.boot:spring-boot-gradle-plugin:3.2.4.
Required by:
project : > org.springframework.boot:org.springframework.boot.gradle.plugin:3.2.4
> No matching variant of org.springframework.boot:spring-boot-gradle-plugin:3.2.4 was found.
The consumer was configured to find a library for use during runtime, compatible with
Java 11, packaged as a jar, and its dependencies declared externally, as well as attribute
'org.gradle.plugin.api-version' with value '8.7' but:
- Variant 'apiElements' declares a library, packaged as a jar, and its dependencies declared
externally:
- Incompatible because this component declares a component for use during compile-time,
compatible with Java 17 and the consumer needed a component for use during runtime,
compatible with Java 11
- Other compatible attribute:
- Doesn't say anything about org.gradle.plugin.api-version (required '8.7')
The Docker Compose stack used in this project is a variant of the Confluent cp-all-in-one
stack found
at https://github.com/confluentinc/cp-all-in-one/tree/7.6.0-post
Note
image source: https://github.com/confluentinc/cp-all-in-one/blob/7.6.0-post/images/cp-all-in-one-community.png
Tip
You can start the stack by running the docker compose command below, but we recommend you use
the gradle-docker-compose-plugin
task ./gradlew intTestComposeUp
to start it as it will also build the Connect image if the SMT library or
connector configs have changed in any way.
docker compose up -d --build
To connect to services in Docker, refer to the following ports:
Service | Port | Notes |
---|---|---|
ZooKeeper | 2181 | |
Kafka broker | 9092 | |
Kafka broker JMX | 9101 | |
Confluent Schema Registry | 8081 | Schema Registry API Reference |
Kafka Connect | 8083 | Kafka Connect Rest API documentation |
Confluent Control Center | 9021 | Confluent Control Center documentation |
ksqlDB | 8088 | |
Confluent REST Proxy | 8082 | API Reference for Confluent REST Proxy |
The most useful service is the Confluent Control Center which exposes a graphical user interface at http://localhost:9021/clusters/ to manage various services including:
- the Kafka Broker
- Topics (including producing and consuming records)
- Consumers/Consumer groups
- Connectors running on the Kafka Connect service
- Schemas associated with topics (via the Schema Registry under the hood)
- use ksqlDB Streams, Tables, and queries
An integrationTest
Gradle sourceSet has been added so that we can isolate the unit tests and the integration tests
(the former completing in a much shorter time by running the standard ./gradlew test
task).
Running the ./gradlew check
gradle task will invoke the integrationTest
task which has the following steps:
- bring the Docker Compose stack up in a project named
kafka-connect-stack_inttest
- run the integration tests
- bring the Docker Compose stack down
- remove all the containers
Tip
This behaviour can be changed by using the various configuration properties of
the gradle-docker-compose-plugin such
as stopContainers
and removeContainers
After running the integration tests, you can view the logs for all the containers in the connect-spring-boot-app/build directory.
If the integration tests fail ๐ , you might want to check the Kafka Connect container logs for errors on startup when automatic connector creation happens, or when a connector tried to process a record and failed.
As mentioned above all containers are stopped and removed after the integration tests have finished running, which is useful when running them on a CI server. It is possible to run the integration tests without stopping and removing all the Docker containers in the stack which is more suited to local development.
This is useful if you are making frequent changes to the tests, the underlying Spring Boot application, or even the
SMTs in the connect-smt-lib
library/connector configurations (making changes to the latter will force the
connect
container to be recreated, which is desirable).
- To do this run
./gradlew integrationTestRun
Tip
Exclude the intTestComposeUp
task to run the integration tests even faster by not checking if all containers are
responding on the correct port before the tests are run.
Skipping this check can be useful if we are only making changes to the tests, and not the SMTs, connector configs or the Kafka Connect Dockerfile.
- To do this run
./gradlew integrationTestRun -x intTestComposeUp
Note
making changes to any SMT library code or connector configs will recreate the Kafka Connect docker image via
the intTestComposeBuild
task, which is a dependant of integrationTestRun
.
After running the ./gradlew check
task (included in build
), you can find the aggregate report for
code in all the subprojects in the
jacoco-report-aggregation/build
directory.
Some outstanding tasks to make the project more complete can be found here