| 1 | +# CLAUDE.md — Apache DolphinScheduler |
| 2 | + |
| 3 | +Apache DolphinScheduler is a distributed, visual DAG workflow-scheduling platform. This is the monorepo: backend servers (master / worker / api / alert), a Vue 3 frontend, plugin families for tasks / datasources / storage / alerting / scheduling, and the release tooling. |
| 4 | + |
| 5 | +**This file is an index.** Each module has its own `CLAUDE.md` with the details — do not duplicate module contents here. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Tech stack (project-wide) |
| 10 | + |
| 11 | +- **Java 1.8** (do not assume 11+ APIs; `dolphinscheduler-api-test` is the only Java 11 island). |
| 12 | +- **Spring Boot 2.6.1** across all servers, running on embedded **Jetty** (the transitive Tomcat dependency is excluded). |
| 13 | +- **MyBatis-Plus** for ORM; **HikariCP** for the metadata DB pool, **Druid** inside user-facing datasource plugins. |
| 14 | +- **Quartz** for cron scheduling (via `scheduler-plugin`). |
| 15 | +- **Netty / gRPC** for inter-server RPC (see `extract-base`). |
| 16 | +- **Vue 3 + Vite + TypeScript + Naive UI** for the frontend. |
| 17 | +- **Maven** multi-module reactor (26 modules in root `pom.xml` + 2 test modules). |
| 18 | +- **Zookeeper 3.8** by default for the registry (Etcd and JDBC also supported). |
| 19 | + |
| 20 | +## Runnable services |
| 21 | + |
| 22 | +A production deployment runs **four independent services** (plus an external registry and metadata DB). A fifth entry point — `StandaloneServer` — embeds all four in one JVM for development. |
| 23 | + |
| 24 | +| Service | Module | Main class | Default ports | |
| 25 | +|---------|--------|------------|---------------| |
| 26 | +| **API** | [`dolphinscheduler-api`](dolphinscheduler-api/CLAUDE.md) | `org.apache.dolphinscheduler.api.ApiApplicationServer` | `12345` (HTTP / UI + REST) | |
| 27 | +| **Master** | [`dolphinscheduler-master`](dolphinscheduler-master/CLAUDE.md) | `org.apache.dolphinscheduler.server.master.MasterServer` | `5679` (HTTP), `5678` (RPC) | |
| 28 | +| **Worker** | [`dolphinscheduler-worker`](dolphinscheduler-worker/CLAUDE.md) | `org.apache.dolphinscheduler.server.worker.WorkerServer` | `1235` (HTTP), `1234` (RPC) | |
| 29 | +| **Alert** | [`dolphinscheduler-alert`](dolphinscheduler-alert/CLAUDE.md) (→ `-alert-server`) | `org.apache.dolphinscheduler.alert.AlertServer` | `50053` (HTTP), `50052` (RPC) | |
| 30 | +| Standalone (dev only) | [`dolphinscheduler-standalone-server`](dolphinscheduler-standalone-server/CLAUDE.md) | `org.apache.dolphinscheduler.StandaloneServer` | `12345` + `50052` (API + alert; master/worker use in-JVM calls) | |
| 31 | + |
| 32 | +Every service is a `@SpringBootApplication` on Jetty and implements `IStoppable`. Scale Master / Worker / Alert horizontally; coordination happens via the registry (Zookeeper by default). API is stateless and also scales horizontally behind a load balancer. |
| 33 | + |
| 34 | +Ports are overridable via `server.port` / service-specific keys in each service's `application.yaml`. |
| 35 | + |
| 36 | +## Build & run |
| 37 | + |
| 38 | +```bash |
| 39 | +# Full build (release profile; produces dist tarball) |
| 40 | +./mvnw clean install -Prelease |
| 41 | + |
| 42 | +# Zookeeper 3.4 legacy |
| 43 | +./mvnw clean install -Prelease -Dzk-3.4 |
| 44 | + |
| 45 | +# Skip UI build (faster iteration on backend only) |
| 46 | +./mvnw -pl '!dolphinscheduler-ui' clean install |
| 47 | + |
| 48 | +# Build one module (+ its required siblings) |
| 49 | +./mvnw -pl dolphinscheduler-master -am clean install |
| 50 | + |
| 51 | +# Format (Spotless is configured) |
| 52 | +./mvnw spotless:apply |
| 53 | + |
| 54 | +# Standalone server (after building) |
| 55 | +cd dolphinscheduler-standalone-server/target && ./bin/start.sh |
| 56 | +``` |
| 57 | + |
| 58 | +Binary artifact: `dolphinscheduler-dist/target/apache-dolphinscheduler-*-bin.tar.gz`. |
| 59 | + |
| 60 | +## Test |
| 61 | + |
| 62 | +```bash |
| 63 | +# Unit tests for one module |
| 64 | +./mvnw -pl dolphinscheduler-master test |
| 65 | + |
| 66 | +# API integration tests (separate reactor, requires Docker) |
| 67 | +./mvnw -f dolphinscheduler-api-test/pom.xml -pl dolphinscheduler-api-test-case -am test |
| 68 | + |
| 69 | +# E2E browser tests (Selenium + Docker) |
| 70 | +./mvnw -f dolphinscheduler-e2e/pom.xml -pl dolphinscheduler-e2e-case -am test |
| 71 | + |
| 72 | +# Apple Silicon: add -Dm1_chip=true to the Docker-driven suites |
| 73 | +``` |
| 74 | + |
| 75 | +--- |
| 76 | + |
| 77 | +## Module index |
| 78 | + |
| 79 | +Click into a module's `CLAUDE.md` for details. Each description is one line here on purpose. |
| 80 | + |
| 81 | +### Core execution |
| 82 | + |
| 83 | +- [`dolphinscheduler-master`](dolphinscheduler-master/CLAUDE.md) — workflow orchestration engine; consumes `Command`s, runs the DAG state machine, dispatches to workers. |
| 84 | +- [`dolphinscheduler-worker`](dolphinscheduler-worker/CLAUDE.md) — runs physical tasks dispatched from master; hosts task plugins. |
| 85 | +- [`dolphinscheduler-task-executor`](dolphinscheduler-task-executor/CLAUDE.md) — reusable task-lifecycle framework embedded by the worker. |
| 86 | +- [`dolphinscheduler-alert`](dolphinscheduler-alert/CLAUDE.md) — alert server + channel plugins (email, Feishu, DingTalk, …). |
| 87 | + |
| 88 | +### API layer |
| 89 | + |
| 90 | +- [`dolphinscheduler-api`](dolphinscheduler-api/CLAUDE.md) — REST API server (entry point for UI, Python SDK, external clients). |
| 91 | +- [`dolphinscheduler-api-test`](dolphinscheduler-api-test/CLAUDE.md) — integration tests against the REST API (Docker Compose + Testcontainers). |
| 92 | +- [`dolphinscheduler-authentication`](dolphinscheduler-authentication/CLAUDE.md) — Actuator-endpoint auth + AWS credential helpers (NOT the main login path). |
| 93 | + |
| 94 | +### Shared libraries |
| 95 | + |
| 96 | +- [`dolphinscheduler-common`](dolphinscheduler-common/CLAUDE.md) — foundation utilities (everything depends on this). |
| 97 | +- [`dolphinscheduler-dao`](dolphinscheduler-dao/CLAUDE.md) — MyBatis DAO layer + SQL migration scripts. |
| 98 | +- [`dolphinscheduler-service`](dolphinscheduler-service/CLAUDE.md) — business logic between DAO and the servers. |
| 99 | +- [`dolphinscheduler-spi`](dolphinscheduler-spi/CLAUDE.md) — Service-Provider Interface root (every plugin depends on this). |
| 100 | +- [`dolphinscheduler-extract`](dolphinscheduler-extract/CLAUDE.md) — RPC interface contracts between servers. |
| 101 | +- [`dolphinscheduler-eventbus`](dolphinscheduler-eventbus/CLAUDE.md) — in-process event-bus abstractions. |
| 102 | +- [`dolphinscheduler-registry`](dolphinscheduler-registry/CLAUDE.md) — pluggable registry (Zookeeper / Etcd / JDBC). |
| 103 | +- [`dolphinscheduler-meter`](dolphinscheduler-meter/CLAUDE.md) — metrics (Prometheus) + server load-protection primitives. |
| 104 | + |
| 105 | +### Plugin families |
| 106 | + |
| 107 | +- [`dolphinscheduler-task-plugin`](dolphinscheduler-task-plugin/CLAUDE.md) — task-type plugins (shell, SQL, Spark, Flink, K8s, EMR, …). 33 concrete plugins. |
| 108 | +- [`dolphinscheduler-datasource-plugin`](dolphinscheduler-datasource-plugin/CLAUDE.md) — user-facing datasource plugins (MySQL, Hive, Trino, Snowflake, …). 28 concrete plugins. |
| 109 | +- [`dolphinscheduler-storage-plugin`](dolphinscheduler-storage-plugin/CLAUDE.md) — resource storage (S3, HDFS, OSS, GCS, ABS, OBS, COS). |
| 110 | +- [`dolphinscheduler-scheduler-plugin`](dolphinscheduler-scheduler-plugin/CLAUDE.md) — cron scheduler (Quartz today). |
| 111 | +- [`dolphinscheduler-dao-plugin`](dolphinscheduler-dao-plugin/CLAUDE.md) — metadata-DB dialect support (MySQL / PostgreSQL / H2). |
| 112 | + |
| 113 | +### Build, ops, tools |
| 114 | + |
| 115 | +- [`dolphinscheduler-bom`](dolphinscheduler-bom/CLAUDE.md) — Maven BOM; central dependency version pinning. |
| 116 | +- [`dolphinscheduler-dist`](dolphinscheduler-dist/CLAUDE.md) — assembles the release tarball + Docker images. |
| 117 | +- [`dolphinscheduler-standalone-server`](dolphinscheduler-standalone-server/CLAUDE.md) — all-in-one JVM with H2 (dev / smoke tests). |
| 118 | +- [`dolphinscheduler-tools`](dolphinscheduler-tools/CLAUDE.md) — CLIs for schema upgrade + resource / lineage migration. |
| 119 | +- [`dolphinscheduler-microbench`](dolphinscheduler-microbench/CLAUDE.md) — JMH micro-benchmarks. |
| 120 | +- [`dolphinscheduler-yarn-aop`](dolphinscheduler-yarn-aop/CLAUDE.md) — AspectJ weaver capturing YARN ApplicationIds. |
| 121 | + |
| 122 | +### Frontend & E2E |
| 123 | + |
| 124 | +- [`dolphinscheduler-ui`](dolphinscheduler-ui/CLAUDE.md) — Vue 3 frontend. |
| 125 | +- [`dolphinscheduler-e2e`](dolphinscheduler-e2e/CLAUDE.md) — Selenium browser tests. |
| 126 | + |
| 127 | +--- |
| 128 | + |
| 129 | +## Architecture overview (one paragraph) |
| 130 | + |
| 131 | +A **user** works in the UI, which calls the API server. The API server writes to the **metadata DB** and, for runtime operations (start / kill / pause workflow), talks to the **master** over RPC. The master consumes `t_ds_command` rows, runs the workflow state machine, and dispatches tasks to **workers**. Workers execute task plugins (shell, SQL, Spark, …) and stream lifecycle events back to the master. Failures and SLA breaches flow to the **alert server**, which fans out through alert plugins. The **registry** (Zookeeper / Etcd / JDBC) provides service discovery, leader election, and distributed locks. **Storage plugins** back the resource center and distributed-task artifacts. **Quartz** (via the scheduler plugin) fires scheduled workflows, which become new `Command` rows. |
| 132 | + |
| 133 | +## Where things live (quick lookup) |
| 134 | + |
| 135 | +| Looking for… | Start here | |
| 136 | +|--------------|------------| |
| 137 | +| A REST endpoint | `dolphinscheduler-api/src/main/java/.../api/controller/` | |
| 138 | +| Workflow execution logic | `dolphinscheduler-master/src/main/java/.../server/master/engine/` | |
| 139 | +| Task execution logic | `dolphinscheduler-worker` + the specific `task-plugin/<type>` | |
| 140 | +| How "X" is stored | `dolphinscheduler-dao/src/main/java/.../dao/entity/` | |
| 141 | +| SQL schema / upgrade | `dolphinscheduler-dao/src/main/resources/sql/` | |
| 142 | +| RPC contract between servers | `dolphinscheduler-extract/dolphinscheduler-extract-<role>` | |
| 143 | +| UI page source | `dolphinscheduler-ui/src/views/<feature>/` | |
| 144 | +| API call in the UI | `dolphinscheduler-ui/src/service/modules/<resource>.ts` | |
| 145 | +| Version of a dependency | `dolphinscheduler-bom/pom.xml` | |
| 146 | + |
| 147 | +## Project-wide conventions |
| 148 | + |
| 149 | +- **Formatting**: `./mvnw spotless:apply`. CI will fail PRs that aren't formatted. Java imports are ordered; license headers are enforced. |
| 150 | +- **Commit style**: `[Type-ISSUE_ID] [Scope] Subject`, e.g. `[Fix-18168] [Worker] ...`. Scopes match module names. |
| 151 | +- **Branching**: `dev` is the main integration branch (not `main`/`master`). |
| 152 | +- **PRs must link a GitHub issue** and keep their scope tight — one module / one concern. |
| 153 | +- **Do not break wire / DB compatibility** silently. Changes to `extract-*` RPC interfaces, `dao` entities, enum values, and `spi.DbType` ripple to deployed clusters mid-upgrade. |
| 154 | +- **Only one registry / storage / DB dialect is active at runtime**. Code paths that check "which one" belong inside the plugin SPI, not sprinkled through services. |
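The commit-style convention above can be checked mechanically. The following is a minimal sketch, not something shipped in the repository: the function name `check_commit_subject` is hypothetical, and the pattern assumes an alphabetic type, a numeric issue ID, and a single bracketed scope made of letters, digits, or hyphens.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): returns 0 when a commit
# subject looks like "[Type-ISSUE_ID] [Scope] Subject", non-zero otherwise.
# Assumptions: Type is alphabetic, ISSUE_ID is numeric, and Scope is one
# bracketed token of letters, digits, or hyphens.
check_commit_subject() {
  local subject="$1"
  # Extended regex: "[Type-123] [Scope] " followed by a non-empty subject.
  local pattern='^\[[A-Za-z]+-[0-9]+\] \[[A-Za-z0-9-]+\] .+$'
  [[ "$subject" =~ $pattern ]]
}
```

One possible use is a local Git `commit-msg` hook, which receives the path to the message file as its first argument; the hook would pass that file's first line to the function and exit non-zero on a mismatch.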
| 155 | + |
| 156 | +## External references |
| 157 | + |
| 158 | +- Release docs (version-specific): https://dolphinscheduler.apache.org/en-us/docs |
| 159 | +- GitHub issues: https://github.com/apache/dolphinscheduler/issues |
| 160 | +- Python SDK: https://dolphinscheduler.apache.org/python/main/index.html |
| 161 | +- Contribution guide: [`docs/docs/en/contribute/join/contribute.md`](docs/docs/en/contribute/join/contribute.md) |