| 1 | +# Aggregate VOQ Counters in SONiC # |
| 2 | +#### Rev 1.0 |
| 3 | + |
| 4 | +## Table of Contents |
| 5 | + * [Revision](#revision) |
| 6 | + * [Overview](#overview) |
| 7 | + * [Requirements](#requirements) |
| 8 | + * [Architecture Design](#architecture-design) |
| 9 | + * [High-Level Design](#high-level-design) |
| 10 | + * [Repositories that need to be changed](#repositories-that-need-to-be-changed) |
| 11 | + * [Configuration and management](#configuration-and-management) |
| 12 | + * [Testing Requirements/Design](#testing-requirementsdesign) |
| 13 | + * [System Test cases](#system-test-cases) |
| 14 | + * [Limitations and future work](#limitations-and-future-work) |
| 15 | + |
| 16 | +### Revision |
| 17 | +| Rev | Date | Author | Change Description | |
| 18 | +|:---:|:-----------:|:----------------------------------------------------------------------------------:|-----------------------------------| |
| 19 | +| 1.0 | 19-Nov-2024 | Harsis Yadav, Pandurangan R S, Vivek Kumar Verma (Arista Networks) | Initial public version | |
| 20 | + |
| 21 | +### Overview |
| 22 | + |
| 23 | +In a [distributed VOQ architecture](https://github.com/sonic-net/SONiC/blob/master/doc/voq/architecture.md), for each output VOQ present on an ASIC there are corresponding VOQs on every ASIC in the system. Each ASIC has its own set of VOQ stats maintained in the FSI. These stats have to be gathered independently per ASIC, which makes them hard to visualize and gives a non-cohesive experience. |
| 24 | + |
| 25 | +### Requirements |
| 26 | + |
| 27 | +Provide aggregate VOQ counters in a distributed VOQ architecture. |
| 28 | + |
| 29 | +### Architecture Design |
| 30 | + |
| 31 | +No new architecture changes are required to SONiC. |
| 32 | + |
| 33 | +### High-Level Design |
| 34 | + |
| 35 | +On multi-ASIC systems (fixed system or linecard), the redis databases corresponding to each namespace are exposed on the docker network (non-loopback IP): https://github.com/sonic-net/sonic-buildimage/blob/master/dockers/docker-database/docker-database-init.sh. When the IP to be bound to a redis instance is not the loopback IP, the redis server is started in unprotected mode: https://github.com/sonic-net/sonic-buildimage/blob/master/dockers/docker-database/supervisord.conf.j2#L38 |
| 36 | + |
| 37 | +On a chassis-based system, we can leverage this property to expose the redis instance of each ASIC over the midplane IP address, in addition to the docker network. The shell script docker-database.sh does this with two simple redis commands: |
| 38 | + |
| 39 | +``` |
| 40 | +redis-cli -h $ip -p $port config set bind "$bound_ips $midplane_ip" |
| 41 | +redis-cli -h $ip -p $port config rewrite |
| 42 | +``` |
| 43 | + |
| 44 | +This gives the supervisor access to each ASIC's redis instance. The queuestat script can then read the counter data and present the user with an aggregated view of the VOQ counters. |
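
The bind update above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the logic in docker-database.sh: `build_bind_commands` is a hypothetical helper, the IP addresses in the usage are made up, and skipping an already-bound midplane IP is a design choice of this sketch. The helper only builds the `redis-cli` argument lists and does not execute them.

```python
import shlex


def build_bind_commands(ip, port, bound_ips, midplane_ip):
    """Return redis-cli argument lists that add midplane_ip to the bind list.

    Hypothetical helper for illustration; it does not run redis-cli itself.
    bound_ips is a space-separated string of the addresses already bound.
    """
    addresses = bound_ips.split()
    # Assumption of this sketch: do not list the midplane IP twice.
    if midplane_ip not in addresses:
        addresses.append(midplane_ip)
    base = ["redis-cli", "-h", ip, "-p", str(port)]
    return [
        base + ["config", "set", "bind", " ".join(addresses)],
        base + ["config", "rewrite"],
    ]


# Illustrative addresses only; real values come from the platform.
for command in build_bind_commands("192.0.2.10", 6379, "192.0.2.10", "198.51.100.5"):
    print(shlex.join(command))
```

Building argument lists instead of a single shell string keeps the space-separated bind list as one argument, which is what the quoting in the shell commands above achieves.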
| 45 | + |
| 46 | +#### sonic-buildimage changes |
| 47 | +`docker_image_ctl.j2` needs to be modified in order to expose the namespace redis instances on linecards over the midplane network. |
| 48 | + |
| 49 | +PR: https://github.com/sonic-net/sonic-buildimage/pull/20803 |
| 50 | + |
| 51 | +#### sonic-swss-common changes |
| 52 | +A new API will be added to sonicv2connector.cpp that takes the db name and host IP as arguments and connects to the corresponding redis instance. The existing [API](https://github.com/sonic-net/sonic-swss-common/blob/202411/common/sonicv2connector.cpp#L18-L30) takes the db_name, and the host_ip and port (or unix_socket) are resolved from [database_config.json](https://github.com/sonic-net/sonic-buildimage/blob/master/dockers/docker-database/database_config.json.j2). That API is tailored to connecting to the namespace redis instances from the same device. Our use case involves connecting to redis instances over the midplane IP, hence the new API is needed. |
| 53 | + |
| 54 | +PR: https://github.com/sonic-net/sonic-swss-common/pull/1003 |
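
The difference between the two lookups can be sketched as below. This is a sketch under assumptions and not the API from the PR: `resolve_endpoint` is a hypothetical name, the config dictionary only mirrors the general shape of database_config.json with illustrative values, and taking the port and database id from the local config while overriding only the host is an assumption of this example.

```python
# Shape modeled on database_config.json; the values are illustrative.
DATABASE_CONFIG = {
    "INSTANCES": {
        "redis": {
            "hostname": "127.0.0.1",
            "port": 6379,
            "unix_socket_path": "/var/run/redis/redis.sock",
        }
    },
    "DATABASES": {
        "COUNTERS_DB": {"id": 2, "separator": ":", "instance": "redis"},
    },
}


def resolve_endpoint(db_name, host=None, config=DATABASE_CONFIG):
    """Return (host, port, db_id) for db_name (hypothetical helper).

    Without host, everything is resolved from the local config, as the
    existing API does. With host, only the hostname is overridden, which
    is the remote (midplane) use case described above.
    """
    database = config["DATABASES"][db_name]
    instance = config["INSTANCES"][database["instance"]]
    return (host or instance["hostname"], instance["port"], database["id"])


# Local lookup, then the same database on a remote host (made-up address).
print(resolve_endpoint("COUNTERS_DB"))
print(resolve_endpoint("COUNTERS_DB", host="198.51.100.5"))
```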
| 55 | + |
| 56 | +#### sonic-py-swsssdk changes |
| 57 | +The same rationale as for sonic-swss-common applies here. We would also like to keep dbconnector.py in swsscommon and swsssdk in parity, both so that we can write a unit test for this feature and for consistency in general. |
| 58 | + |
| 59 | +PR: https://github.com/sonic-net/sonic-py-swsssdk/pull/147 |
| 60 | + |
| 61 | +#### sonic-utilities changes |
| 62 | +We leverage the existing `show queue counters --voq` command on the supervisor: it connects to the database instances of the various forwarding ASICs and sums the VOQ counters corresponding to each system port. |
| 63 | + |
| 64 | +PR: https://github.com/sonic-net/sonic-utilities/pull/3617 |
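
The summation can be sketched as follows. This is a minimal illustration and not the queuestat implementation: `aggregate` and the dictionary layout are assumptions of this example, and the sample values are the VOQ0 to VOQ2 packet and byte counters from the CLI output under "Configuration and management".

```python
from collections import defaultdict

PORT = "cmp217-5|asic1|Ethernet256"

# (system port, VOQ) -> (packets, bytes), as seen by each forwarding ASIC.
ASIC0 = {
    (PORT, "VOQ0"): (123, 6150),
    (PORT, "VOQ1"): (12, 600),
    (PORT, "VOQ2"): (1000, 50000),
}
ASIC1 = {
    (PORT, "VOQ0"): (1111, 55550),
    (PORT, "VOQ1"): (45, 2250),
    (PORT, "VOQ2"): (9, 450),
}


def aggregate(per_asic_counters):
    """Sum packet and byte counters per (system port, VOQ) across ASICs."""
    totals = defaultdict(lambda: [0, 0])
    for counters in per_asic_counters:
        for key, (packets, nbytes) in counters.items():
            totals[key][0] += packets
            totals[key][1] += nbytes
    return {key: tuple(value) for key, value in totals.items()}


for (port, voq), (packets, nbytes) in sorted(aggregate([ASIC0, ASIC1]).items()):
    print(port, voq, packets, nbytes)
```

Keying on the pair of system port and VOQ means an ASIC that reports no entry for a queue simply contributes nothing to that total.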
| 65 | + |
| 66 | +#### Repositories that need to be changed |
| 67 | + * sonic-buildimage |
| 68 | + * sonic-swss-common |
| 69 | + * sonic-py-swsssdk |
| 70 | + * sonic-utilities |
| 71 | + |
| 72 | +### Configuration and management |
| 73 | +#### CLI |
| 74 | + |
| 75 | +From linecard - cmp217-5 for asic0 (existing CLI) |
| 76 | +``` |
| 77 | +admin@cmp217-5:~$ show queue counters -n asic0 "cmp217-5|asic1|Ethernet256" --voq |
| 78 | +For namespace asic0: |
| 79 | + Port Voq Counter/pkts Counter/bytes Drop/pkts Drop/bytes Credit-WD-Del/pkts |
| 80 | +-------------------------- ----- -------------- --------------- ----------- ------------ -------------------- |
| 81 | +cmp217-5|asic1|Ethernet256 VOQ0 123 6150 0 0 0 |
| 82 | +cmp217-5|asic1|Ethernet256 VOQ1 12 600 0 0 0 |
| 83 | +cmp217-5|asic1|Ethernet256 VOQ2 1000 50000 0 0 0 |
| 84 | +cmp217-5|asic1|Ethernet256 VOQ3 456 22800 0 0 0 |
| 85 | +cmp217-5|asic1|Ethernet256 VOQ4 211 10550 0 0 0 |
| 86 | +cmp217-5|asic1|Ethernet256 VOQ5 45 2250 0 0 0 |
| 87 | +cmp217-5|asic1|Ethernet256 VOQ6 24 1200 0 0 0 |
| 88 | +cmp217-5|asic1|Ethernet256 VOQ7 0 0 0 0 0 |
| 89 | +``` |
| 90 | + |
| 91 | +From linecard - cmp217-5 for asic1 (existing CLI) |
| 92 | +``` |
| 93 | +admin@cmp217-5:~$ show queue counters -n asic1 "cmp217-5|asic1|Ethernet256" --voq |
| 94 | +For namespace asic1: |
| 95 | + Port Voq Counter/pkts Counter/bytes Drop/pkts Drop/bytes Credit-WD-Del/pkts |
| 96 | +-------------------------- ----- -------------- --------------- ----------- ------------ -------------------- |
| 97 | +cmp217-5|asic1|Ethernet256 VOQ0 1111 55550 0 0 0 |
| 98 | +cmp217-5|asic1|Ethernet256 VOQ1 45 2250 0 0 0 |
| 99 | +cmp217-5|asic1|Ethernet256 VOQ2 9 450 0 0 0 |
| 100 | +cmp217-5|asic1|Ethernet256 VOQ3 91 4550 0 0 0 |
| 101 | +cmp217-5|asic1|Ethernet256 VOQ4 55 2750 0 0 0 |
| 102 | +cmp217-5|asic1|Ethernet256 VOQ5 88 4400 0 0 0 |
| 103 | +cmp217-5|asic1|Ethernet256 VOQ6 21 1050 0 0 0 |
| 104 | +cmp217-5|asic1|Ethernet256 VOQ7 48 14437 0 0 0 |
| 105 | + |
| 106 | +``` |
| 107 | + |
| 108 | +From supervisor (same command, extended for the supervisor) |
| 109 | + |
| 110 | +``` |
| 111 | +admin@cmp217:~$ show queue counters "cmp217-5|asic1|Ethernet256" --voq |
| 112 | + Port Voq Counter/pkts Counter/bytes Drop/pkts Drop/bytes Credit-WD-Del/pkts |
| 113 | +-------------------------- ----- -------------- --------------- ----------- ------------ -------------------- |
| 114 | +cmp217-5|asic1|Ethernet256 VOQ0 1234 61700 0 0 0 |
| 115 | +cmp217-5|asic1|Ethernet256 VOQ1 57 2850 0 0 0 |
| 116 | +cmp217-5|asic1|Ethernet256 VOQ2 1009 50450 0 0 0 |
| 117 | +cmp217-5|asic1|Ethernet256 VOQ3 547 27350 0 0 0 |
| 118 | +cmp217-5|asic1|Ethernet256 VOQ4 266 13300 0 0 0 |
| 119 | +cmp217-5|asic1|Ethernet256 VOQ5 133 6650 0 0 0 |
| 120 | +cmp217-5|asic1|Ethernet256 VOQ6 45 2250 0 0 0 |
| 121 | +cmp217-5|asic1|Ethernet256 VOQ7 52 15809 0 0 0 |
| 122 | + |
| 123 | +``` |
| 124 | + |
| 125 | +### Testing Requirements/Design |
| 126 | +#### System Test cases |
| 127 | +Send traffic across different ASICs and ensure aggregate counters are correctly displayed. |
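
One way such a check could be automated is sketched below. It is an illustration only: `parse_voq_table` and `mismatches` are hypothetical helpers, the parser assumes the seven-column layout shown in the CLI examples, and the sample text reuses the VOQ0 and VOQ1 rows from those examples.

```python
def parse_voq_table(text):
    """Map (port, voq) to a tuple of the five counter columns."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have seven columns and a second column such as "VOQ0".
        if len(fields) == 7 and fields[1].startswith("VOQ"):
            rows[(fields[0], fields[1])] = tuple(int(f) for f in fields[2:])
    return rows


def mismatches(linecard_outputs, supervisor_output):
    """Return {key: (expected, actual)} where the supervisor row differs
    from the column-wise sum of the linecard rows."""
    expected = {}
    for text in linecard_outputs:
        for key, values in parse_voq_table(text).items():
            previous = expected.get(key, (0,) * len(values))
            expected[key] = tuple(a + b for a, b in zip(previous, values))
    actual = parse_voq_table(supervisor_output)
    return {k: (v, actual.get(k)) for k, v in expected.items() if actual.get(k) != v}


ASIC0_OUT = """\
cmp217-5|asic1|Ethernet256  VOQ0  123  6150  0  0  0
cmp217-5|asic1|Ethernet256  VOQ1  12  600  0  0  0
"""
ASIC1_OUT = """\
cmp217-5|asic1|Ethernet256  VOQ0  1111  55550  0  0  0
cmp217-5|asic1|Ethernet256  VOQ1  45  2250  0  0  0
"""
SUP_OUT = """\
cmp217-5|asic1|Ethernet256  VOQ0  1234  61700  0  0  0
cmp217-5|asic1|Ethernet256  VOQ1  57  2850  0  0  0
"""

print(mismatches([ASIC0_OUT, ASIC1_OUT], SUP_OUT))
```

Counters on a live system keep moving between commands, so a check like this presumably needs traffic to be stopped before the outputs are sampled.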
| 128 | + |
| 129 | +### Limitations and future work |
| 130 | +1. Currently the redis instance is not exposed over the midplane IP on single-ASIC linecards, because redis runs in protected mode there. |
| 131 | +2. Clear functionality is not yet supported for aggregate VOQ counters. |