Cleanup of invalid VIDs is missing for the FlexCounter bulk operation only; it is present for the individual operation #1682

@rajshekhar-nexthop

Summary

When processing flex-counter SET events, some VIDs in the event payload may no longer translate to RIDs (for example after a port was removed or split). When translation fails for one or more VIDs in a SET event, the system can end up with stale entries in the flex-counter database and in the counter manager, and bulk counter operations may be invoked with mismatched or invalid inputs.

Missing VID cleanup in the code

File: syncd/Syncd.cpp — code snippet where invalid-VID cleanup is missing

if (fromAsicChannel && op == SET_COMMAND && (!vidStringVector.empty()))
{
    std::vector<sai_object_id_t> vids;
    std::vector<sai_object_id_t> rids;
    std::vector<std::string> keys;
    ...
    for (auto &strVid: vidStringVector)
    {
        sai_object_id_t vid, rid;
        sai_deserialize_object_id(strVid, vid);
        vids.emplace_back(vid);

        if (!m_translator->tryTranslateVidToRid(vid, rid))
        {
            SWSS_LOG_ERROR("port VID %s, was not found (probably port was removed/splitted)",
                           sai_serialize_object_id(vid).c_str());
            continue;
        }

        rids.emplace_back(rid);
        keys.emplace_back(groupName + ":" + strVid);
    }

    m_manager->bulkAddCounter(vids, rids, groupName, values);

    for (auto &singleKey: keys)
    {
        m_flexCounterTable->set(singleKey, values);
    }

    if (fromAsicChannel)
    {
        sendApiResponse(SAI_COMMON_API_SET, SAI_STATUS_SUCCESS);
    }
}

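In this block, VIDs that fail translation are logged and skipped, but they are not removed from FLEX_COUNTER_TABLE or from the counter manager before bulkAddCounter is called. Each VID is also appended to vids before translation is attempted, so vids can hold more entries than rids when the bulk call is made.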
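One possible shape for the missing cleanup is sketched below. It is a self-contained illustration, not the syncd implementation: ObjectId, SplitResult, tryTranslate, splitByTranslation and cleanupStale are hypothetical names, the translator is replaced by a plain map, and FLEX_COUNTER_TABLE and the counter manager are replaced by in-memory sets. Keys are built with std::to_string for brevity, whereas the real code uses the serialized VID string. The idea is to separate VIDs that translate from those that do not, keep vids and rids index-aligned for the bulk call, and erase the stale entries.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for sai_object_id_t.
using ObjectId = std::uint64_t;

// Result of splitting a SET event's VIDs by whether they still translate.
struct SplitResult
{
    std::vector<ObjectId> vids;          // VIDs that translated successfully
    std::vector<ObjectId> rids;          // RIDs, index-aligned with vids
    std::vector<std::string> keys;       // "<group>:<vid>" keys to write
    std::vector<ObjectId> staleVids;     // VIDs that failed translation
    std::vector<std::string> staleKeys;  // keys belonging to staleVids
};

// Stand-in for a VID-to-RID translator: a plain map lookup (assumption).
inline bool tryTranslate(const std::unordered_map<ObjectId, ObjectId>& vidToRid,
                         ObjectId vid, ObjectId& rid)
{
    auto it = vidToRid.find(vid);
    if (it == vidToRid.end())
    {
        return false;
    }
    rid = it->second;
    return true;
}

// Split the requested VIDs so that only translatable ones reach the bulk call.
inline SplitResult splitByTranslation(const std::string& groupName,
                                      const std::vector<ObjectId>& requested,
                                      const std::unordered_map<ObjectId, ObjectId>& vidToRid)
{
    SplitResult out;
    for (ObjectId vid : requested)
    {
        const std::string key = groupName + ":" + std::to_string(vid);
        ObjectId rid = 0;
        if (!tryTranslate(vidToRid, vid, rid))
        {
            // Remember the stale VID instead of silently skipping it.
            out.staleVids.push_back(vid);
            out.staleKeys.push_back(key);
            continue;
        }
        // Append the VID only after translation succeeds.
        out.vids.push_back(vid);
        out.rids.push_back(rid);
        out.keys.push_back(key);
    }
    // Invariant a bulk-add call would rely on: one RID per VID.
    assert(out.vids.size() == out.rids.size());
    return out;
}

// Erase stale entries from the stand-in table and the stand-in manager state.
inline void cleanupStale(const SplitResult& split,
                         std::unordered_set<std::string>& table,
                         std::unordered_set<ObjectId>& managed)
{
    for (const auto& key : split.staleKeys)
    {
        table.erase(key);
    }
    for (ObjectId vid : split.staleVids)
    {
        managed.erase(vid);
    }
}
```

In the real code path, the two erase loops would map onto whatever removal calls the counter manager and the flex-counter table actually provide; those calls are deliberately not named here.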

Observed symptoms

  • Stale flex-counter keys (groupName:VID) remain in the DB for VIDs that no longer exist or no longer translate.
  • The counter manager retains entries for removed ports, leading to a resource/state leak.
  • Bulk counter operations may be called with mismatched inputs (e.g., VIDs that have no corresponding valid RID), potentially causing incorrect bookkeeping or errors further downstream.
  • Flex-counter processing logs failed-translation messages for port VIDs when events arrive that reference removed or split ports.
