
Add buildkite-webhook-handler lambda to ingest webhook events from Buildkite #6998


Merged
merged 3 commits on Aug 14, 2025

@huydhn (Contributor) commented Aug 12, 2025

This lambda receives Buildkite webhook events from vLLM CI so that we can build a HUD-like dashboard there. https://app.hex.tech/533fe68e-dcd8-4a52-a101-aefba762f581/app/030kdEgDv6lSlh1UPYOkWP is an early example from Simon.

Testing

I manually created a buildkite-webhook-handler-debug lambda a few weeks ago, and it has been writing into the vllm-buildkite-* DynamoDB tables since then.

I also created a test release of the lambda at https://github.com/pytorch/test-infra/actions/runs/16928003507/job/47967490767
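The lambda's source isn't part of this conversation, so the following is only a hypothetical sketch of the ingestion flow described above, not the code merged in this PR. It checks the `X-Buildkite-Token` header that Buildkite can send with webhook requests, picks a DynamoDB table from the event name (e.g. `job.finished`), and hands the payload to an injected `put_item` callable so it runs without AWS. The function names and the `vllm-buildkite-agent-events` table name are assumptions.

```python
import base64
import hmac
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical routing table. The build/job table names appear in this PR;
# the agent table name is an assumption (agent events are future work).
TABLE_BY_PREFIX = {
    "build.": "vllm-buildkite-build-events",
    "job.": "vllm-buildkite-job-events",
    "agent.": "vllm-buildkite-agent-events",
}


def route_event(event_name: str) -> Optional[str]:
    """Return the table for a Buildkite event name, or None if it is not stored."""
    for prefix, table in TABLE_BY_PREFIX.items():
        if event_name.startswith(prefix):
            return table
    return None


def handle(
    event: Dict[str, Any],
    expected_token: str,
    put_item: Callable[[str, Dict[str, Any]], None],
) -> Dict[str, Any]:
    """Handle one HTTP-triggered Lambda invocation carrying a Buildkite webhook."""
    # HTTP header names are case-insensitive, so normalize them to lowercase.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}

    # Constant-time comparison of the shared webhook token.
    token = headers.get("x-buildkite-token", "")
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return {"statusCode": 401, "body": "invalid token"}

    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    payload = json.loads(body)

    # Prefer the event header; fall back to the "event" field in the payload.
    event_name = headers.get("x-buildkite-event") or payload.get("event", "")
    table = route_event(event_name)
    if table is None:
        # Acknowledge unhandled events (e.g. "ping") so they are not retried.
        return {"statusCode": 200, "body": f"ignored {event_name}"}

    put_item(table, payload)
    return {"statusCode": 200, "body": f"stored {event_name} in {table}"}
```

In a deployed lambda, `put_item` could wrap boto3's `Table.put_item(Item=...)`. boto3 rejects Python floats in items, so a real handler would need to convert numbers to `Decimal` first, for example with `json.loads(body, parse_float=Decimal)`.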

…ildkite

This is used on vLLM CI for the time being

Signed-off-by: Huy Do <[email protected]>
@huydhn huydhn requested review from clee2000 and yangw-dev August 12, 2025 22:44
vercel bot commented Aug 12, 2025

The latest updates on your projects.

1 Skipped Deployment
Project | Deployment | Preview | Updated (UTC)
torchci | ⬜️ Ignored | Preview | Aug 14, 2025 8:43am

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 12, 2025
Signed-off-by: Huy Do <[email protected]>
@yangw-dev (Contributor)

optional: I know it might be a bit much to ask, but do you think it would be good to add unit tests for this?

@yangw-dev (Contributor)

@huydhn do you mind providing more context on the purpose of storing the Buildkite jobs and events in our DB? This would be helpful info.

@huydhn (Contributor Author) commented Aug 13, 2025

optional: I know it might be a bit much to ask, but do you think it would be good to add unit tests for this?

Let me ask Claude to write some unit tests; the lambda code mostly came from Claude anyway ;)

@huydhn (Contributor Author) commented Aug 13, 2025

@huydhn do you mind providing more context on the purpose of storing the Buildkite jobs and events in our DB? This would be helpful info.

Let me add that to the README. The high-level context is that all Buildkite webhook events for jobs running on vLLM CI will be queryable in the same way as GitHub webhook events.

  • In the near term, it lets vLLM maintainers explore their CI data, such as time to signal and queueing time.
  • In the longer term, this will provide the foundation for future UX projects on vLLM, e.g. a vLLM HUD, CI failure notifications, etc. There are many opportunities here.
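As an illustration of the CI metrics mentioned above, here is a hypothetical sketch that derives queueing time, run time, and a per-job time to signal from a Buildkite job object. It assumes the job carries ISO 8601 `created_at`, `runnable_at`, `started_at`, and `finished_at` timestamps, and it defines queueing time as `started_at - runnable_at` and time to signal as `finished_at - created_at`; both definitions and all function names are assumptions, not something this PR specifies.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as '2025-08-12T22:44:00.000Z'."""
    if not value:
        return None
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so rewrite it as an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_between(start: Optional[datetime], end: Optional[datetime]) -> Optional[float]:
    """Elapsed seconds, or None if either timestamp is missing."""
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


def job_timings(job: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Compute timing metrics for one job (hypothetical metric definitions)."""
    created = parse_ts(job.get("created_at"))
    runnable = parse_ts(job.get("runnable_at"))
    started = parse_ts(job.get("started_at"))
    finished = parse_ts(job.get("finished_at"))
    return {
        # Time spent waiting for an agent once the job became runnable.
        "queue_seconds": seconds_between(runnable, started),
        # Time the job spent executing on an agent.
        "run_seconds": seconds_between(started, finished),
        # Time from job creation until its result is known.
        "time_to_signal_seconds": seconds_between(created, finished),
    }
```

Jobs that are still queued or running simply report `None` for the metrics whose end timestamp is missing, so they can be filtered out before aggregating.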

@huydhn huydhn requested a review from seemethere August 14, 2025 08:31
@huydhn huydhn merged commit 83f58f3 into main Aug 14, 2025
5 checks passed
@huydhn huydhn deleted the add-buildkite-webhook-lambda branch August 14, 2025 19:29
huydhn added a commit that referenced this pull request Aug 16, 2025
A follow-up to #6998 and
#7001 to sync these DynamoDB
tables to ClickHouse. I will need to find a permanent home for these
tables instead of using `fortesting`.

### Testing

This is already running and ingesting data (I updated the lambda
manually on AWS). I am also backfilling data from DynamoDB, since I have
been running a debug version of #6998 since
mid-July.

```
python dynamo2ch.py --dynamodb-table vllm-buildkite-build-events --clickhouse-table fortesting.vllm_buildkite_builds --stored-data vllm_buildkite_builds.json
```

```
python dynamo2ch.py --dynamodb-table vllm-buildkite-job-events --clickhouse-table fortesting.vllm_buildkite_jobs --stored-data vllm_buildkite_jobs.json
```
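The internals of `dynamo2ch.py` aren't shown here. One step any DynamoDB-to-ClickHouse sync needs is flattening DynamoDB's typed attribute values (`{"S": ...}`, `{"N": ...}`, `{"M": ...}`, and so on) into plain JSON. The following hypothetical sketch does that and emits newline-delimited rows suitable for ClickHouse's `JSONEachRow` input format; the function names are illustrative, and boto3's `TypeDeserializer` is a library alternative (it returns numbers as `Decimal`).

```python
import json
from decimal import Decimal
from typing import Any, Dict, List


def from_dynamo(value: Dict[str, Any]) -> Any:
    """Convert one DynamoDB attribute value (e.g. {"N": "42"}) to plain Python."""
    [(kind, inner)] = value.items()  # each attribute value has exactly one type key
    if kind == "S":
        return inner
    if kind == "N":
        # Numbers arrive as strings; keep integral values as int, others as float.
        num = Decimal(inner)
        return int(num) if num == num.to_integral_value() else float(num)
    if kind == "BOOL":
        return bool(inner)
    if kind == "NULL":
        return None
    if kind == "M":
        return {k: from_dynamo(v) for k, v in inner.items()}
    if kind == "L":
        return [from_dynamo(v) for v in inner]
    if kind == "SS":
        return list(inner)
    if kind == "NS":
        return [from_dynamo({"N": n}) for n in inner]
    # Binary types (B, BS) are not handled in this sketch.
    raise ValueError(f"unsupported DynamoDB type: {kind}")


def to_json_each_row(items: List[Dict[str, Any]]) -> str:
    """Render DynamoDB items as newline-delimited JSON (one object per row)."""
    rows = []
    for item in items:
        row = {k: from_dynamo(v) for k, v in item.items()}
        rows.append(json.dumps(row, sort_keys=True))
    return "\n".join(rows)
```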

---------

Signed-off-by: Huy Do <[email protected]>
huydhn added a commit that referenced this pull request Aug 18, 2025
…obs tables (#7001)

There are now some events in DynamoDB, so I used them to create
the schema for the `vllm_buildkite_builds` and `vllm_buildkite_jobs` tables
on ClickHouse. There is another table, `vllm_buildkite_agents`, that records
events from Buildkite agents, but I will add it later in a separate PR
once there are records in DynamoDB that I can use to create its
schema.
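The PR doesn't say how the schemas were derived from the stored events. A hypothetical way to bootstrap one is to scan sample events, map each top-level field to a ClickHouse type, and mark fields that are missing or null in some events as `Nullable`. This sketch assumes nested objects are stored as JSON `String` columns and that fields with conflicting types fall back to `String`; the function names and the `MergeTree` engine choice are illustrative assumptions.

```python
from typing import Any, Dict, List, Set


def clickhouse_type(value: Any) -> str:
    """Map a JSON value to a ClickHouse column type (simplified)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Bool"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    # Strings, nested objects, and arrays all become String here; nested
    # values would need json.dumps() before insertion.
    return "String"


def infer_columns(events: List[Dict[str, Any]]) -> Dict[str, str]:
    """Infer column types from sample events; absent or null fields become Nullable."""
    all_keys: Set[str] = set().union(*(e.keys() for e in events))
    types: Dict[str, Set[str]] = {}
    nullable: Set[str] = set()
    for event in events:
        for key in all_keys:
            value = event.get(key)
            if value is None:
                nullable.add(key)
            else:
                types.setdefault(key, set()).add(clickhouse_type(value))
    columns: Dict[str, str] = {}
    for key in sorted(all_keys):
        seen = types.get(key, set())
        # One consistent type wins; conflicts or all-null fields fall back to String.
        base = next(iter(seen)) if len(seen) == 1 else "String"
        columns[key] = f"Nullable({base})" if key in nullable else base
    return columns


def create_table_sql(table: str, columns: Dict[str, str], order_by: str) -> str:
    """Render a CREATE TABLE statement; the ORDER BY column should be non-Nullable."""
    cols = ",\n".join(f"    `{name}` {ctype}" for name, ctype in columns.items())
    return f"CREATE TABLE {table}\n(\n{cols}\n)\nENGINE = MergeTree\nORDER BY {order_by}"
```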

### Testing

Ran the two `CREATE TABLE` queries on the playground database
`fortesting`. The two tables are also being backfilled with data from
DynamoDB, because I just realized that I had left the ingestion
lambda #6998 running since
around July 16th.

cc @simon-mo

---------

Signed-off-by: Huy Do <[email protected]>