@HeriLFIU
Collaborator

@HeriLFIU HeriLFIU commented Jul 16, 2025

Closes #222

Summary

The Interface Utilization Graph is fixed.

image

Local Tests

The Graphs displayed properly and no errors were thrown.

Related PR

kytos-ng/kytos_stats#25
kytos-ng/topology#275

@HeriLFIU HeriLFIU requested a review from a team as a code owner July 16, 2025 23:40
@HeriLFIU
Collaborator Author

@viniarck @rmotitsuki
It is done

@HeriLFIU
Collaborator Author

You need the related PRs to run it

@HeriLFIU
Collaborator Author

Also, I blew up the unit tests with a change. I know what it was; I just need to rewrite them.

@HeriLFIU
Collaborator Author

@rmotitsuki I just found what I believe to be a very clean solution for the dynamic resizing of the graph. I'll maybe have the changes by today.

@HeriLFIU
Collaborator Author

I don't think it's a good idea for the chart in its current state to have a dynamic width. It should have a base width, and then the SVG can scale up and down while maintaining its aspect ratio.
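A minimal sketch of the "base width, then scale" idea, assuming the chart is drawn against a fixed coordinate system and the browser scales it through the SVG `viewBox`. The helper names and the 600x300 base size are hypothetical, not taken from this PR.

```javascript
// Hypothetical helper: attributes for an SVG that is drawn at a fixed base
// size and scaled by the browser to fit its container.
function scalableSvgAttrs(baseWidth, baseHeight) {
  return {
    // All scales and axes are computed against this coordinate system.
    viewBox: `0 0 ${baseWidth} ${baseHeight}`,
    // Scale uniformly and center the drawing, so the aspect ratio is kept.
    preserveAspectRatio: "xMidYMid meet",
    width: "100%",
  };
}

// Height the drawing ends up with at a given container width.
function scaledHeight(baseWidth, baseHeight, containerWidth) {
  return (containerWidth * baseHeight) / baseWidth;
}

const attrs = scalableSvgAttrs(600, 300);
console.log(attrs.viewBox); // "0 0 600 300"
console.log(scaledHeight(600, 300, 300)); // 150
```

With this approach the D3 scales never need to be recomputed on resize, since only the rendered size changes.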

@HeriLFIU
Collaborator Author

I just made those changes and fixed the unit tests.
I'm now going to remove the D3 request library and replace it with Axios. Then maybe add a width prop.

@HeriLFIU
Collaborator Author

But it seems to be working well now. It even scales. The x-axis ticks sometimes make no sense and squish together, but the D3 functions I am using are supposed to dynamically take care of the ticks as they see fit while it updates and receives new data.

@HeriLFIU
Collaborator Author

The data does not persist when you exit the info panel; I may need to add local storage.

@HeriLFIU
Collaborator Author

image image

@HeriLFIU
Collaborator Author

I think Vue 3 has a feature called keep-alive that can cache the data from components, but I think I would have to enable it for all of the NApp components, and that seems like it may cause performance issues. I haven't read much into it, so I need to test it and read the full documentation to understand it properly.

Member

@viniarck viniarck left a comment

@HeriLFIU, fantastic to have the interface utilization graphs again. They look great.

First, I don't have answers to all questions/points you raised, so we'll need to keep researching and reassessing. What I found though regarding the graphs being displayed:

  1. At some point it was displaying 2 Tbps on a 10 Gbps interface:
20250721_153729
  2. At some point it was not displaying the lines of the chart although it had numeric values:
20250721_153329

I'm not sure if all of these details are due to axis scaling, but they seem to need adjusting.

How to reproduce

I was using this EPL with a ring topology (but a linear,3 should suffice too):

{
    "name": "epl",
    "service_level": 6,
    "dynamic_backup_path": true,
    "uni_a": {
        "interface_id": "00:00:00:00:00:00:00:01:1"
    },
    "uni_z": {
        "interface_id": "00:00:00:00:00:00:00:03:1"
    }
}

And then I was generating traffic with iperf on mininet:

mininet> iperf h11 h3
*** Iperf: testing TCP bandwidth between h11 and h3 
*** Results: ['45.7 Gbits/sec', '45.7 Gbits/sec']
mininet> 

@HeriLFIU could you explore this scenario and check out the charts again? Also, let me know about the timestamp value; let's see if Italo, I, or someone else can help unblock the kytos_stats PR that you need.

@HeriLFIU
Collaborator Author

@viniarck That other issue with the interfaces could be the scaling, but the most likely culprit is an old conversion function that I left in and didn't check.
image

@viniarck
Member

> @viniarck That other issue with the interfaces, it could be the scaling, but the most likely issue which I think could be the culprit is an old conversion function that I left in and didn't check. image

Right. Let's look into it. Also, btw, I observed the same behavior when the charts panel was maximized too.

@HeriLFIU
Collaborator Author

@viniarck The numeric values not displaying is a strange one. Did you leave it on all the time? Currently, when you open and close the info panel, it resets the table, and you need to wait for it to re-update. I'm working on a fix for that with local storage.

That's strange because if the table receives bad data, it should just ignore it by default, and it shouldn't stop displaying the values since the table itself isn't updated or changed while it's active; it only collects data from the backend and displays it.

@viniarck
Member

viniarck commented Jul 21, 2025

@HeriLFIU

> Did you leave it on all the time?

I observed that behavior when opening a new panel or when switching to another switch (opening the panel again). It stays for several seconds without a line chart sometimes.

20250721_161317 20250721_161254

Also @HeriLFIU, even when tx_bytes and rx_bytes aren't incrementing much anymore, it's displaying incorrect interface utilization. For instance, in the prior graphs, besides 229 Gbps being incoherent (far more than the interface speed), traffic was no longer being sent in the data plane, so the counters were barely incrementing. The UI needs to plot the delta between the samples. I haven't reviewed that part yet, but that should give some clue. Try to look into it, in addition to the part that you shared in the prior reply.
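A rough sketch of plotting the delta between samples rather than the raw counters. The sample shape (`updatedAt` in seconds, `txBytes`, `rxBytes`) is an assumption made for illustration; it mirrors the `tx_bytes`, `rx_bytes`, and `updated_at` fields mentioned in this thread but is not the actual kytos_stats payload.

```javascript
// Hypothetical sample shape: { updatedAt: <seconds>, txBytes, rxBytes }.
// Returns the rate between two consecutive samples, or null when no
// meaningful rate can be derived.
function bitsPerSecond(prev, curr) {
  const dt = curr.updatedAt - prev.updatedAt; // seconds between samples
  if (!(dt > 0)) return null; // same or out-of-order timestamp, avoids 0/0
  const dTx = curr.txBytes - prev.txBytes;
  const dRx = curr.rxBytes - prev.rxBytes;
  if (dTx < 0 || dRx < 0) return null; // counter reset or wrap
  return { txBps: (dTx * 8) / dt, rxBps: (dRx * 8) / dt };
}

const first = { updatedAt: 0, txBytes: 0, rxBytes: 0 };
const second = { updatedAt: 60, txBytes: 75_000_000_000, rxBytes: 0 };
console.log(bitsPerSecond(first, second)); // { txBps: 10000000000, rxBps: 0 }
console.log(bitsPerSecond(second, second)); // null
```

When the counters stop incrementing, the deltas drop to zero and so does the plotted rate, which is the behavior expected once traffic stops.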

@HeriLFIU
Collaborator Author

@viniarck Oh, then yes, that's the updating issue; it only runs when its menu/info panel is currently open and resets its data if it switches menus.

@HeriLFIU
Collaborator Author

Ok, I'll look into the other issue and see what may be causing it.

@HeriLFIU
Collaborator Author

HeriLFIU commented Dec 4, 2025

@viniarck
image
There is an issue with the local storage that I'm fixing, but it seems to be working for the most part.
Should I round the decimals to two places or leave it as is?
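For the rounding question, one possible approach is to scale the value to a readable unit first and then fix it to two decimals. The function name and unit list below are illustrative assumptions, not code from this PR.

```javascript
// Hypothetical formatter: picks a decimal (SI) unit and rounds to `digits`.
function formatBps(bps, digits = 2) {
  const units = ["bps", "Kbps", "Mbps", "Gbps", "Tbps"];
  let value = bps;
  let unitIndex = 0;
  while (value >= 1000 && unitIndex < units.length - 1) {
    value /= 1000;
    unitIndex += 1;
  }
  return `${value.toFixed(digits)} ${units[unitIndex]}`;
}

console.log(formatBps(10_000_000_000)); // "10.00 Gbps"
console.log(formatBps(999)); // "999.00 bps"
```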

@HeriLFIU
Collaborator Author

HeriLFIU commented Dec 5, 2025

@viniarck @rmotitsuki It should all be working. I did some testing and the new calculations for bps were working great and the local storage as well.
image
This one was running for a while.

@HeriLFIU
Collaborator Author

HeriLFIU commented Dec 5, 2025

@viniarck @rmotitsuki There are actually only 2 issues. The first is that the charts only run while open. A fix for this is to have a sort of manager or store that handles all the data and constantly fetches it, so that when the time series chart is opened, it can get the data from the store.

Another issue is that when no topology is present, the time series chart throws an error.

@HeriLFIU
Collaborator Author

HeriLFIU commented Dec 5, 2025

Also currently the local storage just keeps pushing new data till infinity; I don't know if that could be a future issue. Maybe a "clear data" button would be nice and a limit to the amount of data stored within the chart. What do you all think?

@viniarck
Member

viniarck commented Dec 7, 2025

@HeriLFIU, great to see the latest commits and fixes. I think we're overusing and overstretching what local storage is typically for, especially considering the volume of data and the other concerns you raised too (QuotaExceededError and managing all of that data cleanup).

Can we get rid of local storage and only display the data while the chart is open? Regarding the prior issues/requirements mentioned on #228 (comment), when it doesn't have enough data yet, we could simply show that it's loading data.

These charts are expected to be a quick-glance summary over a reasonable time window (up to 15 minutes tops?), which would also simplify what we'd need to maintain in the front-end. What do you think @HeriLFIU and @rmotitsuki? Any other suggestions?
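The bounded time window could be sketched as below: every new sample evicts anything older than the window, so memory use stays flat. The `time` field (milliseconds) and the function name are assumptions for illustration.

```javascript
const WINDOW_MS = 15 * 60 * 1000; // 15-minute window suggested above

// Appends a sample and drops samples older than the window.
// Assumes samples arrive in chronological order.
function pushWithinWindow(samples, sample, windowMs = WINDOW_MS) {
  samples.push(sample);
  const cutoff = sample.time - windowMs;
  while (samples.length > 0 && samples[0].time < cutoff) {
    samples.shift();
  }
  return samples;
}

const series = [];
pushWithinWindow(series, { time: 0, bps: 1 });
pushWithinWindow(series, { time: 10 * 60 * 1000, bps: 2 });
pushWithinWindow(series, { time: 20 * 60 * 1000, bps: 3 });
console.log(series.length); // 2, the sample at time 0 was evicted
```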

@HeriLFIU
Collaborator Author

@viniarck That sounds good. Should I also implement a store so that the chart updates while closed, or should it only update while open? Also, if it should only update while open, should I also shorten the time to fetch new data?

@HeriLFIU
Collaborator Author

@viniarck It's a bit tricky because the component gets removed whenever the tab is not open. When you close the tab/accordion, the data gets deleted as well. It's currently loading in all of the data because of local storage. I need to somehow keep the data even when the tab/accordion is closed, and I think I can do that with a store, but moving a lot of the code and data from the component, especially since it receives a lot of it from other components, is a bit tricky. Maybe I can think of another simpler solution.

@viniarck
Member

@HeriLFIU, the stats are part of switch details, and a single request to api/amlight/kytos_stats/v1/port/stats/ can fetch the data for all dpids, so that's relatively cheap. Storing a few samples over time in memory won't be much data either (we won't have MBs of data). So I think a periodic request every 50s, pre-computing this even before switch details is open, is acceptable (in the future we can make this configurable, so users can pick a value that's aligned with the of_core stats interval). Only computing while it's open is fine too, although it will tend to have poorer UX, since it'll always have a cold start before getting the data.

@HeriLFIU, yes, please think about it and let's see which one we go with. It looks like pre-computing/pre-fetching and fetching periodically wouldn't be too far from what you currently have here, right? Except it would use the browser heap memory instead of localStorage.
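A sketch of the pre-fetching idea under stated assumptions: an in-memory store keeps a capped number of samples per interface, and a poller feeds it every 50 seconds. `PortStatsStore`, `startPolling`, and the `{ counters, updatedAt }` shape returned by `fetchStats` are all hypothetical; the real endpoint response would need an adapter that maps it to this shape.

```javascript
// In-memory (heap) store: a capped list of samples per interface.
class PortStatsStore {
  constructor(maxSamples = 18) {
    this.maxSamples = maxSamples; // 18 samples x 50 s = 15 minutes
    this.samplesByInterface = new Map();
  }

  // `counters` is assumed to look like { "<interface id>": { txBytes, rxBytes } }.
  ingest(counters, updatedAt) {
    for (const [interfaceId, c] of Object.entries(counters)) {
      const samples = this.samplesByInterface.get(interfaceId) ?? [];
      samples.push({ updatedAt, txBytes: c.txBytes, rxBytes: c.rxBytes });
      if (samples.length > this.maxSamples) samples.shift();
      this.samplesByInterface.set(interfaceId, samples);
    }
  }

  // Returns copies so charts cannot mutate the stored samples.
  history(interfaceId) {
    const samples = this.samplesByInterface.get(interfaceId) ?? [];
    return samples.map((s) => ({ ...s }));
  }
}

// `fetchStats` is a caller-supplied async function (for example an Axios
// call plus response mapping). Returns a function that stops the polling.
function startPolling(store, fetchStats, intervalMs = 50_000) {
  const tick = async () => {
    try {
      const { counters, updatedAt } = await fetchStats();
      store.ingest(counters, updatedAt);
    } catch (err) {
      console.warn("port stats fetch failed:", err);
    }
  };
  tick(); // fetch once immediately to shorten the cold start
  const timer = setInterval(tick, intervalMs);
  return () => clearInterval(timer);
}
```

A chart opened later would call `store.history(interfaceId)` and render whatever samples are already there, instead of starting from an empty series.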

@HeriLFIU
Collaborator Author

dear god,
everything seems to be in order and working.

@HeriLFIU
Collaborator Author

@viniarck I was able to apply all of the latest changes, and they all seem to be functioning pretty well. There is only one sad caveat. The store I made for the interfaces is excellent and can be reused for many other components; I have seen no issues with it so far (although it gave me more than enough issues while writing it). I had this idea in the past of having a store that fetches and constantly refreshes the data for all of the components, since it's all very similar, and the code from this store can be recycled to do just that. I believe it just needs a ton of getters to safely retrieve the data and clone it if it's an object or array. However, I just discovered something very sad. It could have a clean solution, but that's hard to know, especially since it will require a lot of testing.
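The "getters that clone" idea could look roughly like this. It is a plain-object sketch rather than a Pinia store, and `createInterfaceStore` and its methods are hypothetical names. It relies on the standard `structuredClone` global, available in current browsers and recent Node.js versions; with Vue reactive state, the raw object would likely need to be unwrapped (for example with `toRaw`) before cloning.

```javascript
// Hypothetical store: callers only ever receive deep copies of its state.
function createInterfaceStore() {
  const state = { interfaces: {} };
  return {
    setInterface(id, data) {
      // Clone on the way in so later changes by the caller do not leak in.
      state.interfaces[id] = structuredClone(data);
    },
    getInterface(id) {
      const found = state.interfaces[id];
      // Clone on the way out so consumers cannot mutate the stored state.
      return found === undefined ? undefined : structuredClone(found);
    },
  };
}

const ifaceStore = createInterfaceStore();
ifaceStore.setInterface("00:00:00:00:00:00:00:01:1", { speed: 10e9, samples: [1, 2] });
const copy = ifaceStore.getInterface("00:00:00:00:00:00:00:01:1");
copy.samples.push(3);
console.log(ifaceStore.getInterface("00:00:00:00:00:00:00:01:1").samples.length); // 2
```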

Currently, I have set up the store to also fetch the data from switch_info within the topology NApp. Since the store required that data anyway for the interfaces, it would have been great for switch_info to fetch all of its data from the store, removing a ton of redundant code and an API request. But since it's a NApp UI that is loaded by vue3-sfc-loader, it does not have direct access to the stores. I have been reading about it online, but I have yet to find a way to do so. This is probably a very specific issue; I don't think there are a lot of people dynamically loading components with vue3-sfc-loader who also want those components to have access to Pinia.

@HeriLFIU
Collaborator Author

HeriLFIU commented Jan 9, 2026

@viniarck I checked the code for the port stats endpoint, and it seems like it should work. Every time it's called, it should get a new datetime.utcnow().

@HeriLFIU
Collaborator Author

HeriLFIU commented Jan 9, 2026

The reason I wasn't getting this issue before is because I was using a linear topo, and everything seemed to run very smoothly, but now that I run a ring topo, it's very laggy and takes a few minutes to load.

@HeriLFIU
Collaborator Author

HeriLFIU commented Jan 9, 2026

I see a bunch of overlapping stats requests. I don't know much about the backend, but that seems related.

@viniarck
Member

viniarck commented Jan 9, 2026

> I see a bunch of overlapping stats requests, I don't know much about the backend but that seems related.

@HeriLFIU that's typically a symptom of either the controller or mininet being overloaded. Usually it ends up being mininet, especially when running in a VM. Can you double-check that you've given it sufficient CPUs/vCPUs and RAM?

> I don't know what the right course of action would be for this, so I'll just switch back to new Date() and add in a check for NaN and set that point on the graph to 0 if it is detected.

You should be able to rely on updated_at per switch. It'll be unique, and it is expected to be the correct source of truth for that switch's stats data point. For sure, if switches are overloaded and taking too long to respond, it might start drifting, but it'll still be correct for the given switch.

We can also consider ignoring a sample if the front-end ends up seeing the same timestamp per switch, but let's debug here and check that it's not being truncated. Even when things get super overloaded, it's still expected to have unique timestamps per switch.

@HeriLFIU
Collaborator Author

HeriLFIU commented Jan 9, 2026

image @viniarck I've got 4 CPUs and 8 GB of RAM. Oh, it could also be because I haven't pulled the latest changes from kytos in the longest time. Let me get to it real quick.

@HeriLFIU
Collaborator Author

HeriLFIU commented Jan 9, 2026

and it could be caching the first request

@viniarck
Member

viniarck commented Jan 9, 2026

> image @viniarck I've got 4 CPUs and 8 Gb of ram. Oh, it could also be because I haven't pulled the latest changes from kytos in the longest time. Let me get to it real quick.

Right. Yes, that's a pretty good config/resource allocation; kytosd with a ring topology should run without stats overlap warnings in this env.

@viniarck
Member

viniarck commented Jan 9, 2026

> and it could be caching the first request

That might be it @HeriLFIU, at least for the part that was contributing to the same timestamp. Regarding the overlapping requests and overall slowness, we'd have to see. As mentioned before, that usually happens when the server doesn't have much CPU/RAM available, but in your case it seems sufficient. Try rebooting the VM too, and see if you're able to run just a ring topo without any log warnings. If you still see warnings, we can try to debug this together.

@HeriLFIU
Collaborator Author

HeriLFIU commented Jan 9, 2026

@viniarck I deleted my virtual environment, pulled the latest changes from all repositories, and redid a pip install editable in each one.
I'm still getting the overlapping stats request issue even after rebooting.
They look like this:

2026-01-09 17:22:00,522 - INFO [kytos.napps.kytos/of_core] (Thread-1147 (request_stats)) Overlapping stats request: switch 00:00:00:00:00:00:00:03 flows_xid 1040849432 ports_xid 3878065281
2026-01-09 17:22:06,841 - INFO [kytos.napps.kytos/of_core] (Thread-1148 (request_stats)) Overlapping stats request: switch 00:00:00:00:00:00:00:01 flows_xid 2675405323 ports_xid 1928944852
2026-01-09 17:22:14,192 - INFO [kytos.napps.kytos/of_core] (Thread-1152 (request_stats)) Overlapping stats request: switch 00:00:00:00:00:00:00:02 flows_xid 2124966190 ports_xid 954779010
2026-01-09 17:22:14,550 - INFO [kytos.napps.kytos/of_core] (Thread-1153 (request_stats)) Overlapping stats request: switch 00:00:00:00:00:00:00:03 flows_xid 3168261271 ports_xid 2183271043

I also did a lot of testing and I couldn't find any signs of caching.
I tried using Postman and Chrome to send a lot of requests to the same endpoint, and sometimes they would just repeat for a certain amount of time without getting a new set of data.
I can only think of one last thing and that's a reference issue with the response.

@HeriLFIU
Collaborator Author

HeriLFIU commented Jan 9, 2026

image

@HeriLFIU
Collaborator Author

@viniarck It took me some time, but I believe I have finally identified the root cause of the issue; it was related to references and JS objects. I was pushing an object into an array; I forgot it would be a reference and not a copy. I pushed the response data into an array, and whenever I modified the array, it would in turn modify the response data, which meant that the response data would always be the same.
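The reference behavior described above can be reproduced in a few lines, independent of whether it was the actual cause here. The object fields are illustrative only.

```javascript
// Pushing an object stores a reference, not a copy.
const response = { updated_at: 100, tx_bytes: 1 };
const aliased = [];
aliased.push(response);
aliased[0].tx_bytes = 999; // also changes `response`
console.log(response.tx_bytes); // 999

// Pushing a shallow copy keeps the original untouched.
const freshResponse = { updated_at: 100, tx_bytes: 1 };
const copied = [];
copied.push({ ...freshResponse });
copied[0].tx_bytes = 999;
console.log(freshResponse.tx_bytes); // 1
```

The spread copy is shallow, so nested objects or arrays inside the response would still be shared and would need a deep copy.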

At least that's what I got from the code; let me test it out.

@HeriLFIU
Collaborator Author

@viniarck I actually don't think the reference issue affected the response.
I did one last test, and I was able to take these images:
image
image
image
I performed an HTTP request with Postman, Chrome, and Axios at different times, and they all gave me the same data and the same updated_at property, which resulted in the 0/0 division.

I also checked for caching and saw none of the headers or signs in Chrome DevTools or Postman that show that the data was retrieved from a cache. Could it be being cached in the backend, or is something else going on here?

I'm stumped.

@HeriLFIU
Collaborator Author

@viniarck Also, this seems to be out of whack, because I sometimes restart Kytos and the VM, and I do get different timestamps or updated_at values, but then I reload and it stays the same throughout. Which led me to a bit of confusion.

@HeriLFIU
Collaborator Author

Maybe it's an issue on my environment.

@HeriLFIU
Collaborator Author

@viniarck David helped me locate the issue, but it's still a bit strange. I created an issue on kytos_stats for it, but it seems to span both kytos_stats and of_core.
kytos-ng/kytos_stats#28

@HeriLFIU
Collaborator Author

HeriLFIU commented Jan 12, 2026

@viniarck The date does not update on request because the function it lives in only runs when of_core retrieves stats data from a switch. For some reason, of_core doesn't seem to return data every 7 seconds as it should and just fails, but it does work at 20 seconds, and there are no overlapping stats requests at 20 seconds.
image

@HeriLFIU
Collaborator Author

I'll add an additional check to make sure the data is fresh before putting it into the graph.

@HeriLFIU
Collaborator Author

@viniarck Actually, raising the interval to 20 seconds fixed it for a bit, but the overlapping requests started again, and now it no longer updates.

@HeriLFIU
Collaborator Author

image

@HeriLFIU
Collaborator Author

Even if I get a timestamp from the frontend, or the backend sends a new timestamp on every request instead of only when it receives new data from of_core, it would still be kind of broken, because the other data isn't being updated either and the graphs would remain unchanged.

@viniarck
Member

@HeriLFIU thanks for the updates.

  • Yes, your of_core STATS_INTERVAL is too low; by default it should be 60 secs. Whenever you see that overlapping requests warning, it's a sign that the stats are taking too long to process, and either the switches are overwhelmed or the interval is too low (which was the case here). Either way, it should never keep warning for long. One or two spikes might happen, but if it persists, the user is expected to set a higher STATS_INTERVAL.

  • Also, yes, updated_at being different assumes that the requests were made in different stats interval periods. But as you noticed, the front-end also has to be robust enough not to crash if the interval is configured too low or if the back-end is experiencing temporary overload, and we should also try to let users know somehow, rather than showing misleading or incorrect data.

> Even if I get a timestamp from the frontend or the backend sends a new timestamp on every request it receives instead of when it receives new data from of_core, then it would still be kind of broken because the other data isn't also being updated and the graphs would remain without change.

  • Yes, maybe we can still plot the repeated data point, and whenever updated_at hasn't changed we could signal on each interface chart that it might have "stale" data? I was thinking of maybe adding a label displaying "stale". Or maybe the left bar that turns red with high traffic could use another color to represent stale data? Think about it @HeriLFIU, and see what the best UX you can come up with is:
20260114_130847
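One way the stale-data signal could be sketched: the repeated point is still appended, but flagged when its `updatedAt` has not advanced, and the chart label reflects the flag. The function names, the point shape, and the label text are assumptions for illustration.

```javascript
// Appends the point and flags it as stale when the backend timestamp
// has not advanced since the previous point.
function appendPoint(points, sample) {
  const last = points[points.length - 1];
  const stale = last !== undefined && sample.updatedAt <= last.updatedAt;
  points.push({ ...sample, stale });
  return stale;
}

// Label shown next to the chart; adds a "(stale)" suffix when needed.
function chartLabel(interfaceName, points) {
  const last = points[points.length - 1];
  return last !== undefined && last.stale
    ? `${interfaceName} (stale)`
    : interfaceName;
}

const points = [];
appendPoint(points, { updatedAt: 100, txBps: 5 });
appendPoint(points, { updatedAt: 100, txBps: 5 }); // same timestamp again
console.log(chartLabel("s1-eth1", points)); // "s1-eth1 (stale)"
```

The same `stale` flag could also drive the bar color, so the label and the color always agree.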

@viniarck viniarck self-requested a review January 14, 2026 21:12
Member

@viniarck viniarck left a comment

I've just re-reviewed the code too, and it looks good @HeriLFIU. The only thing remaining is dealing with potential stale data points, and then I think this can land. Once again, thanks for all the effort fixing these charts and enabling this feature to work again.

Also, a reminder to collect a good-looking screenshot for us to add to the upcoming release notes that will be announced with the 2025.2 release.

@HeriLFIU
Collaborator Author

HeriLFIU commented Jan 14, 2026

@viniarck I think adding the little stale data option and some color would be the best way to go, because it is the most direct. I think just showing a color to represent stale could be interpreted in different ways and could lead to some confusion, so I'll maybe choose an orange or brown color for stale and add the name.

I'll also send you some nice screenshots in a bit. I was installing WSL2 and Docker on my PC to see if I could work on some projects directly on Windows without booting up a VM. I had some drivers installed, from lab equipment I worked with for an engineering lab, that triggered something and crashed my PC. Since all of those lab programs are old anyway and I'll probably never use them again, I'm uninstalling them all and will then try to reinstall WSL.

@viniarck
Member

viniarck commented Jan 14, 2026

> @viniarck I think adding the little stale data option and some color would be the best way to go, because it is the most direct. I think just showing a color to represent stale could be interpreted in different ways and could lead to some confusion, so I'll maybe choose an orange or brown color for stale and add the name.

Sounds good @HeriLFIU, I agree, just a color might not be obvious for whoever is reading the chart in this case, yes, if you can add a name/label that'd be great.

> I'll also send you some nice screenshots in a bit. I was installing WSL2 and Docker on my PC to see if I could do some projects directly on Windows without booting up a VM, and I had some drivers, from some lab equipment I worked with for an engineering lab, installed that triggered something and crashed my PC, since all of those lab programs are old anyways and I'll probably never use them again. I'm uninstalling them all to then try and reinstall WSL.

Thanks, you can upload them here for the record, and then whoever is writing the release notes (me or someone else) can use them. Damn, it sucks to hear that you had this crash with WSL2. Yes, not using a VM is nice; with Linux I get to skip the VM and things run pretty smoothly. Good luck uninstalling them all and trying WSL again. If you succeed with WSL, that's good to know, since it can be helpful for other teammates or new contributors in the future.

Development

Successfully merging this pull request may close these issues.

Fix the interface utilization graph

2 participants