[Feature Request] Debug Shell for dungbeetle #46
Comments
Hi @knadh, I’ve been thinking about a few approaches for implementing metrics collection, each with its own advantages. Here are the three options I’ve come up with (apologies in advance for the long read):

1. Push-based metric collection (Quickest to implement)
We already have some metrics available via API calls like
This approach doesn’t require much work on the database metrics front, since we already have observable metrics available through plugins.

2. CLI Client (More intuitive)
This would be similar to how
While intuitive, this option involves a lot of work to develop and maintain, especially when you have distributed machines over an IP network.

3. Generic Debug Shell (A bit complex, but versatile)
This might be overkill, but I’ve seen an implementation (not open source) and thought about incorporating something similar. The idea is to have a generic debug shell, essentially a thread, that responds to debug commands from any application. The key benefit here is that each application wouldn’t need to build its own CLI for metrics. For example, say you have a command
The debug shell would then standardize the way this information is fetched, making it reusable across applications without each needing its own dedicated CLI.

I could work on these suggestions if one suits the purpose. I am unsure whether something like this is even useful in real-world projects or just a luxury to have.
A debug shell built into the core doesn't make sense. The correct approach is to have HTTP APIs that expose all necessary stats. Then someone can build a CLI shell or a web app etc. that uses the APIs.
Got it. Can I contribute to this in any way if it is already in the pipeline?
DungBeetle doesn't expose a metrics endpoint. We can integrate the VictoriaMetrics lib (like here: https://github.com/zerodha/kaf-relay/blob/main/main.go) and expose useful metrics for job statuses, task names etc.
Yes, makes sense! @knadh
+1 to HTTP APIs for stats. @RohitKumarGit, if you are interested in adding support, you can refer to the example @knadh has shared, e.g. source_pool.go and target.go. Separate entities maintain their own metrics sets, and those are written to a single common metrics HTTP server.
Off the top of my head, I think for
I am also of the opinion that metrics per registered task could be useful (though adding this can be up for discussion). We can use parameterized metrics for this, e.g.: https://github.com/zerodha/kaf-relay/blob/b5059eb0e05a38a3c0fcd2f133f6359f27259ec4/internal/relay/common.go#L29
Hi @kalbhor, kaf-relay gathers metrics from nodes that each have some ID. dungbeetle is like one node, and kaf-relay makes some sort of HTTP calls to gather metrics and ingests them into some metrics server. Am I correct? Why are we clubbing metric gathering with kaf-relay? @knadh said we could go with some CLI as
@kalbhor I guess we could achieve this by creating a separate project: a metrics CLI that takes input in YAML/JSON/TOML like this
And yes, this config file is similar to what we define for metrics scraping in
When we run the CLI tool, it will read this config file, make the calls to gather metrics, and handle the printing of all metrics as defined in the
This design is extensible and can be used for any future projects like dungbeetle!
Hm, the suggestion was to refer to the kaf-relay implementation as an example and simply replicate it here in a similar manner. It does not make sense to create a separate CLI or custom handlers. It has to be the standard Prometheus-style metrics like in kaf-relay.
Okay, got it. But then how will we print the output in a CLI when we want to? We will have to use some GUI like Grafana, I guess? Anyway, for the metrics part, yes, I understand. You are suggesting that we expose
I could figure out these metrics to begin with. Is there anything you would like to add?
Exposing Prometheus-style metrics is definitely the way to go, since it lets monitoring systems like VictoriaMetrics scrape and manage them efficiently. Instead of printing metrics in a CLI, I would suggest using tools like Grafana for real-time visibility and querying. They integrate easily with Prometheus or VictoriaMetrics and provide powerful dashboards to track metrics such as errored jobs, queue sizes, worker health, and priorities. This method is more scalable and needs less manual intervention. What do you think?
dungbeetle currently has a client API and HTTP endpoints for doing things like
Use Case:
Let's consider a case where one of the queues starts reporting errors, or where a system is running 40 workers but we still see delays in requests.
Now if we want to see
Then our only option is to make HTTP calls, which is not the most efficient way and requires technical knowledge of the endpoints etc.
What if we had a tool where we could run a command like this
dungeecli workers --wide
Suggested output
This would make debugging so easy.
And as you may have guessed, it's very similar to the oc command for OpenShift or kubectl for Kubernetes.
Extra info:
In our projects we pin processes to particular cores, e.g. process1 runs only on core0 and core1.
But sometimes the core changes for some reason, such as a crash. In this case the debug shell tells us which process is running on which core and when it was last changed. Doing this with Linux commands was not efficient, hence the debug shell.
@knadh What is your opinion on this? Is it something that could be explored?