
Add Automated Testing with Microservices Demo Workload #48

Open · 4 tasks
yonch opened this issue Jan 31, 2025 · 26 comments
Labels
good first issue · help wanted

Comments

@yonch
Contributor

yonch commented Jan 31, 2025

Motivation: To validate different collection strategies, we need automated testing with realistic workloads. The Google Microservices Demo provides a good initial test environment with multiple languages and garbage collectors (Go, C#, Node.js, Python, Java). Building on our existing GitHub Actions infrastructure, we want to automatically deploy Kubernetes and run this demo workload.

The result will be a GitHub Action that spins up an AWS instance, deploys Kubernetes and the workload, and tears down the instance.

Tasks:

  • Install Kubernetes (k3s or KinD) on the CI instance
  • Deploy Google Microservices Demo (github.com/GoogleCloudPlatform/microservices-demo)
  • Configure and run the included load-generator for 1 minute with non-trivial load
  • Collect and store load-generator statistics

This infrastructure will allow testing different collection strategies and can be extended to other workloads in the future.
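A rough sketch of what the job body could look like once the runner instance is up; the manifest URL, the `sudo k3s kubectl` usage, and the artifact names are assumptions rather than a final design:

```yaml
# Sketch only: assumes a self-hosted EC2 runner is already registered by an earlier job.
jobs:
  demo-workload-test:
    runs-on: self-hosted
    steps:
      - name: Install k3s
        run: curl -sfL https://get.k3s.io | sh -

      - name: Deploy Google Microservices Demo
        run: |
          # Manifest path follows the microservices-demo repo layout; k3s bundles its own kubectl
          sudo k3s kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/main/release/kubernetes-manifests.yaml
          sudo k3s kubectl wait --for=condition=Available --timeout=300s deployment --all -n default

      - name: Run the bundled load generator for 1 minute
        run: sleep 60   # the demo's loadgenerator deployment drives traffic continuously once running

      - name: Collect load-generator statistics
        run: sudo k3s kubectl logs deployment/loadgenerator --tail=-1 > loadgenerator-stats.txt

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: loadgenerator-stats
          path: loadgenerator-stats.txt
```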

@yonch added the good first issue and help wanted labels Jan 31, 2025
@Chanaka1200

Hi @yonch

I will take on this task if no one else is working on it.

Thanks

@atimeofday
Contributor

Hi @Chanaka1200

There was some discussion about me learning a few things to take this one on, but I ended up busy and then sick - your contribution would be greatly appreciated.

@Chanaka1200

Hi @atimeofday

I completely understand how things can get busy, and I hope you're feeling better now. I'm happy to help and will do my best to contribute!

@Chanaka1200

Hi, I've almost completed this, but I have an issue. I am using a single GitHub Action for both creating and destroying the VM in AWS. To avoid this, I am trying to use AWS SSM Parameter Store, so I am requesting permission for it from @yonch.

@yonch
Contributor Author

yonch commented Mar 9, 2025

Great news! Can you say more about why we need to split into two actions? It's no problem adding permissions to SSM.

I'm asking to control complexity and potential leaked resources: if we keep everything in the same GitHub Action, it's easier to make cleanup more airtight. When we want to run multiple tests, each one is self-contained...

@Chanaka1200

Hi @yonch

I'm currently using a single action file to initiate a VM, set up k3s, and deploy the microservices, with the VM being destroyed at the end. I suggest splitting this into two separate actions: one for setup and deployment, and another for VM destruction. I found a way to pass the VM ID to the destroy action, but I welcome any other solutions!

@yonch
Contributor Author

yonch commented Mar 10, 2025

I think you have the right workflow:

  • launch VM
  • start k3s
  • deploy Microservices
  • run a test or multiple tests
  • upload results as an artifact
  • terminate the instance

I don't have a use-case for separating the VM launch from termination; keeping the VMs around (e.g., so we can manually change things) runs the risk of drift, where the system would not be at the state we expect and so experiment results would be invalid.

Do you have a strong use-case that requires separating the creation and termination? If not, let's leave that capability to future work.
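A sketch of how a single workflow can keep teardown airtight, assuming the machulav/ec2-github-runner action (which comes up later in this thread); the AMI, subnet, security group, and secret names are placeholders:

```yaml
# Sketch: one workflow owns the full lifecycle; stop-runner runs even if tests fail.
jobs:
  start-runner:
    runs-on: ubuntu-latest
    outputs:
      label: ${{ steps.start.outputs.label }}
      ec2-instance-id: ${{ steps.start.outputs.ec2-instance-id }}
    steps:
      # AWS credentials must be configured before this step (e.g. aws-actions/configure-aws-credentials)
      - id: start
        uses: machulav/ec2-github-runner@v2
        with:
          mode: start
          github-token: ${{ secrets.GH_RUNNER_PAT }}   # placeholder secret name
          ec2-image-id: ami-0123456789abcdef0          # placeholder
          ec2-instance-type: c5.9xlarge
          subnet-id: subnet-0123456789abcdef0          # placeholder
          security-group-id: sg-0123456789abcdef0      # placeholder

  run-tests:
    needs: start-runner
    runs-on: ${{ needs.start-runner.outputs.label }}
    steps:
      - run: echo "k3s install, demo deployment, load test, artifact upload go here"

  stop-runner:
    needs: [start-runner, run-tests]
    runs-on: ubuntu-latest
    if: always()   # terminate the instance even when tests fail
    steps:
      - uses: machulav/ec2-github-runner@v2
        with:
          mode: stop
          github-token: ${{ secrets.GH_RUNNER_PAT }}
          label: ${{ needs.start-runner.outputs.label }}
          ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}
```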

@Chanaka1200

Chanaka1200 commented Mar 12, 2025

Cool, my mistake. I initially thought we needed to keep this VM up and running to collect the metrics. If I understand correctly now, we run this single action once, invoke a microservice to collect the metrics, and then push the artifact within the same action. Apologies for the confusion—I’ll update this to align with the provided workflow.

@yonch
Contributor Author

yonch commented Mar 12, 2025

Great @Chanaka1200 sounds good! I'd love to start playing with this when you're done!

For a load generator, I am not sure if the Microservices demo bundles one; I've used Locust before, and a Grafana contributor told me over the weekend they use K6 (Grafana labs acquired the company that developed it, so not a huge surprise).
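For reference, Locust can be driven headlessly from the CLI with standard flags; the locustfile path and frontend address below are placeholders:

```bash
# Standard Locust CLI flags; the locustfile and target host are placeholders
locust -f locustfile.py --headless \
  --users 200 --spawn-rate 10 --run-time 1m \
  --host http://frontend.default.svc.cluster.local
```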

@Chanaka1200

Thank you for your patience. I’m currently working on it and apologize for the slight delay. Regarding the load generator, there isn’t one available for that microservice at the moment, but I’ll generate it—no issue. Both Locust and K6 are fine, and we can explore what we can do. I have one question: Should I add steps to collect metrics from test-kernel-module.yaml to the current YAML I’m working on?

@yonch
Contributor Author

yonch commented Mar 16, 2025

If it's not too much trouble, it would help to get the parquet file that is produced by test-ebpf-module -- that is what we need.

But once there is a Kubernetes cluster with a workload and load generator, I can add that too. So if you think you won't be doing that for a while, PR what you have and I'll continue it on Monday.

@Chanaka1200

Chanaka1200 commented Mar 16, 2025

No worries, I will complete this as soon as possible. It seems there is a load generator within the project, which I am currently reviewing. Once I finish, I will implement the workflow and submit the PR.

loadgenerator

@yonch
Contributor Author

yonch commented Mar 19, 2025

Hi @Chanaka1200 I really appreciate you taking on this ticket. I'd like to get a trial run going ahead of Kubecon EU to show some data in the talk. Not expecting any extra work -- it's volunteer work and on your schedule -- happy to pick up where you left off if you want to point me to it.

@Chanaka1200

Chanaka1200 commented Mar 19, 2025

Hi @yonch, I'm stuck on a small issue—I tried to pass inputs when running the action, but they are not being passed correctly. Once I resolve this, the action file will be almost complete. The reason is that the Locust load generator picks up the user and rate counts via environment variables:

```yaml
workflow_dispatch: # Manual trigger for testing
  inputs:
    users:
      description: 'Number of concurrent users for load generator'
      required: false
      default: '200'
      type: string
    rate:
      description: 'Requests per second'
      required: false
      default: '1'
      type: string

microservice-deployment:
  needs: [start-runner, init-ebpf, k3-deployment]
  runs-on: ${{ needs.start-runner.outputs.label }}
  steps:
    - name: Use input
      run: echo "Input was ${{ github.event.inputs.users }}"
```
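One possible way to forward these inputs to the bundled Locust deployment, assuming it reads the user and rate counts from its environment as described above (the USERS/RATE variable names are assumptions):

```yaml
- name: Configure load generator from workflow inputs
  run: |
    # USERS and RATE are assumed variable names on the loadgenerator deployment
    kubectl set env deployment/loadgenerator \
      USERS="${{ github.event.inputs.users }}" \
      RATE="${{ github.event.inputs.rate }}"
    kubectl rollout status deployment/loadgenerator --timeout=120s
```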

@Chanaka1200

Chanaka1200 commented Mar 19, 2025

It has been resolved for now using environment variables. I am still verifying, and I will send a PR once the eBPF test actions are implemented. I expect to complete this by the end of the day today.

@Chanaka1200

Chanaka1200 commented Mar 19, 2025

Hi @yonch

I'm encountering an issue again after deploying with eBPF and the microservices. The microservices are not running because there isn't enough disk space on the two VM types we're using: m7i.metal-24xl for RDT and c5.9xlarge. I believe the machulav/ec2-github-runner action is using the default disk size, and there seems to be no option to configure a larger disk.

Currently, this is part of a pull request that has not been merged yet:
PR #220.

Would you recommend using a VM with a larger default disk size, or do you have any suggestions to resolve this issue?

Sample event logs:

LAST SEEN   TYPE      REASON                           OBJECT                                        MESSAGE
2m26s       Warning   FailedScheduling                 pod/adservice-997b6fc95-5xxgl                 0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
2m58s       Normal    Scheduled                        pod/adservice-997b6fc95-d5wdt                 Successfully assigned default/adservice-997b6fc95-d5wdt to ip-172-31-11-249
2m46s       Normal    Pulling                          pod/adservice-997b6fc95-d5wdt                 Pulling image "us-central1-docker.pkg.dev/google-samples/microservices-demo/adservice:v0.10.2"
2m57s       Warning   Failed                           pod/adservice-997b6fc95-d5wdt                 Failed to pull image "us-central1-docker.pkg.dev/google-samples/microservices-demo/adservice:v0.10.2": pull QPS exceeded
2m57s       Warning   Failed                           pod/adservice-997b6fc95-d5wdt                 Error: ErrImagePull
2m57s       Normal    BackOff                          pod/adservice-997b6fc95-d5wdt                 Back-off pulling image "us-central1-docker.pkg.dev/google-samples/microservices-demo/adservice:v0.10.2"
2m57s       Warning   Failed                           pod/adservice-997b6fc95-d5wdt                 Error: ImagePullBackOff
2m43s       Normal    Pulled                           pod/adservice-997b6fc95-d5wdt                 Successfully pulled image "us-central1-docker.pkg.dev/google-samples/microservices-demo/adservice:v0.10.2" in 2.975s (2.975s including waiting). Image size: 100256966 bytes.
2m42s       Normal    Created                          pod/adservice-997b6fc95-d5wdt                 Created container: server
2m41s       Normal    Started                          pod/adservice-997b6fc95-d5wdt                 Started container server
2m27s       Warning   Evicted                          pod/adservice-997b6fc95-d5wdt                 The node was low on resource: ephemeral-storage. Threshold quantity: 406608697, available: 385580Ki. Container server was using 28Ki, request is 0, has larger consumption of ephemeral-storage.
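To confirm this is ephemeral-storage pressure rather than an image-pull or scheduling problem, a few generic checks can help; nothing below is project-specific:

```bash
# Check node conditions for DiskPressure, list eviction events, and inspect root volume usage
kubectl describe node | grep -A5 -i 'Conditions:'
kubectl get events --field-selector reason=Evicted -A
df -h /
```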

@yonch
Contributor Author

yonch commented Mar 19, 2025

@tverghis has used a fork of that GitHub action with features we needed, and it worked fine until the maintainer merged those features. We could do that here as well with that branch. You can see the `uses` statement here.

@Chanaka1200

Let me review this, and I will implement and test it as soon as possible.

@Chanaka1200

Chanaka1200 commented Mar 20, 2025

Hi @yonch

I tried @tverghis's fork, but I noticed that it does not include the disk size modifications. To address this, I used devin-purple's feature branch for the runner, and it is working fine.

Would it be okay to use this branch directly, or would you recommend forking it into my own repository before proceeding? Please let me know if any changes are needed.

If this approach is fine, I can send a PR, and we can review whether the eBPF implementation is correct.

Additionally, I have a question—while running the load generator, the eBPF collector runs for only a few seconds to gather metrics. Is this behavior acceptable, or should any adjustments be made?

@yonch
Contributor Author

yonch commented Mar 20, 2025

I think it should be fine to use their branch. Just please use a SHA to refer to the exact commit. I'm pretty sure Devin-purple will not maliciously add code to that branch, but it's the Internet! To be on the safe side let's refer to a known specific revision.
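For reference, pinning a third-party action to an exact commit looks like the following; the owner/repo and SHA are placeholders, not the actual fork or revision:

```yaml
# Placeholder owner/repo and SHA; pin to the exact commit that was reviewed
- uses: example-fork/ec2-github-runner@0123456789abcdef0123456789abcdef01234567
  with:
    mode: start
```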

@Chanaka1200

Sure! I will make the changes and send the PR today. Let me know if any further modifications are needed. I will inform you once the PR is sent.

@Chanaka1200

Microservices deployment with Test eBPF Metrics collector

I have submitted the pull request. Please review it and let me know if any changes are needed. I’m happy to contribute to this project!

@yonch
Contributor Author

yonch commented Mar 22, 2025

Awesome! Taking a look now

@Chanaka1200

Please let me know if there's anything I should do or change. I'm happy to help!

@yonch
Contributor Author

yonch commented Mar 23, 2025

  • The results look like there isn't much stress on the system; I'm wondering if the rate is too low. I also saw the load generator is limited to a low number of millicores (maybe 300?), which might limit it (a possible tweak is sketched after this list).

  • For waiting for the system to become ready, we can watch for those events. I haven't tried it, but something like this might work:
    ```
    kubectl wait --for=condition=Available --timeout=300s deployment --all -n default
    ```
    https://kubernetes.io/docs/reference/kubectl/generated/kubectl_wait/

  • I took a look at the opentelemetry demo repo; is it a more recently maintained version of the microservices demo?
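A possible tweak for the first point, assuming the loadgenerator deployment already defines a CPU limit at the usual path (the value is only illustrative):

```bash
# Raise the loadgenerator CPU limit so it can generate more load (illustrative value)
kubectl patch deployment loadgenerator --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "2"}
]'
```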

@Chanaka1200

I initially used the default resource values. Let me adjust them further to apply more load to the node. I used `kubectl wait --for=condition=Available --timeout=300s deployment --all -n default`, but sometimes a few pods take longer to become ready. I will try again. The OpenTelemetry demo repo and Google's microservices repo were both recently updated. I believe the issue was due to the default values. My apologies—I will correct this.
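If the deployment-level wait keeps timing out on a few stragglers, one option is to fall back to a pod-level readiness wait; this is an untested sketch:

```bash
# Wait on deployments first; if that times out, wait on individual pods becoming Ready
kubectl wait --for=condition=Available --timeout=300s deployment --all -n default || \
  kubectl wait --for=condition=Ready --timeout=300s pod --all -n default
```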
