
What are the relevant data of the error logs? #4

Open
jualvespereira opened this issue Apr 25, 2019 · 5 comments

Comments

@jualvespereira
Collaborator

Some relevant information that should be considered for clustering:

  • 'make.: *.+'
  • '. 1:.+'
  • '. error:.+'
  • '. undefined reference.+'
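Applied with re.search, these patterns can act as a line filter over a build log; a minimal sketch in Python (the sample log lines below are invented for illustration):

```python
import re

# Patterns proposed above for extracting the relevant error lines
PATTERNS = [
    r'make.: *.+',
    r'. 1:.+',
    r'. error:.+',
    r'. undefined reference.+',
]

def relevant_lines(log):
    """Keep only the log lines matching at least one pattern."""
    return [line for line in log.splitlines()
            if any(re.search(p, line) for p in PATTERNS)]

# Hypothetical compilation log for illustration
log = ("CC      kernel/fork.o\n"
       "kernel/fork.c:312:5: error: 'foo' undeclared\n"
       "ld: fork.o: undefined reference to `bar'")
print(relevant_lines(log))
```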

@FAMILIAR-project
Contributor

There are two radical approaches for clustering:

  • fully automated with the usual challenges (dealing with too much noise, with almost similar yet different content, with clusters that may "subsume" other clusters, etc.)
  • manual with pattern matching: yes, but you need to know what to search

Of course, the solution is neither completely black nor completely white...
A hybrid approach is to find "generic" patterns (like @jualvespereira proposes).
Another is to use knowledge we gather throughout the review of failures.

For instance, I have come across some patterns and I implemented some ad-hoc regexes, something like:

    for err in err_logs_configuration(cid).splitlines():
        if "read_overflow2" in err:
            print(err)

maybe we can have pre-defined regexes for labelling failures... and fully automated techniques for the rest.

Final remark: we may have more than one cluster attached to a failure -- see this failure #1 (comment)
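A minimal sketch of such pre-defined labelling, where one failure can receive several labels (the label names and most regexes here are invented for illustration; only "read_overflow2" comes from the snippet above):

```python
import re

# Hypothetical label -> regex table; extend as new failure patterns are reviewed
LABELS = {
    "read_overflow": r"read_overflow2",
    "undefined_ref": r"undefined reference",
    "compiler_error": r"\berror:",
}

def label_failure(err_log):
    """Return every label whose regex matches: a failure may belong to several clusters."""
    return {name for name, rx in LABELS.items() if re.search(rx, err_log)}

# Hypothetical error log that triggers two labels at once
log = "fork.c:1: error: detected read_overflow2 in copy_from_user()"
print(sorted(label_failure(log)))
```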

@jualvespereira
Collaborator Author

I extracted the four pieces of information above and then clustered using brute force.
I should optimize the script, since I have too many clusters, which makes such an algorithm unfeasible. Some ways that come to mind to make it feasible:

  • Detect noisy information by investigating the clusters automatically generated from a sample of cids for each config option in the decision tree, and then ignore that information.
  • Ignore the error order.
  • Sort the errors first before grouping.
  • Compute clusters of a small sample of randomly chosen configs.
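Ignoring the error order (or, equivalently, sorting the errors before grouping) can be sketched by keying clusters on the *set* of errors per configuration; the cids and error strings below are invented for illustration:

```python
from collections import defaultdict

def cluster_ignoring_order(errors_by_cid):
    """Group configurations whose error sets are equal, regardless of order."""
    clusters = defaultdict(list)
    for cid, errors in errors_by_cid.items():
        # frozenset discards both order and duplicates
        clusters[frozenset(errors)].append(cid)
    return clusters

# Hypothetical cid -> extracted errors mapping
errors_by_cid = {
    41: ["error: A", "undefined reference: B"],
    42: ["undefined reference: B", "error: A"],  # same errors, other order
    43: ["error: C"],
}
clusters = cluster_ignoring_order(errors_by_cid)
print(len(clusters))  # 41 and 42 collapse into a single cluster
```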

@FAMILIAR-project
Contributor

Interesting ideas, go ahead!

@jualvespereira
Collaborator Author

I used the data frame created in issue #5 to cluster the errors and got 32 clusters.
I removed the search for 'make.: *.+' errors (which may not be very significant), and 166 cids were left unclassified (i.e., I couldn't narrow down their relevant error information using just '. 1:.+', '. error:.+', '. undefined reference.+'). I'll try TF-IDF and k-means to discover the top terms for clustering.

@jualvespereira
Collaborator Author

jualvespereira commented May 12, 2019

I'm able to cover all error logs after using k-means to discover the top terms for clustering.
For the clustering, I used 4 terms ('.* error:.+', 'undefined reference.+', '.* 1:.+', '.*aicasm.+') and considered the error that comes first in each error message.
We have a total of 16 clusters, and I was able to classify 12 of them by looking at the issues (TuxML/ProjetIrma), a qualitative analysis of the bug, and the decision tree.
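Taking the error that comes first in each message can be sketched by comparing the match positions of the four terms (the sample log is invented for illustration):

```python
import re

# The four terms used for clustering, as listed above
TERMS = [r'.* error:.+', r'undefined reference.+', r'.* 1:.+', r'.*aicasm.+']

def first_error(log):
    """Return the term whose first match appears earliest in the log."""
    hits = []
    for term in TERMS:
        m = re.search(term, log)
        if m:
            hits.append((m.start(), term))
    return min(hits)[1] if hits else None

# Hypothetical log: the 'error:' line precedes the 'undefined reference'
log = "drivers/foo.c:10: error: bar undeclared\nld: undefined reference to baz"
print(first_error(log))
```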
You can find attached a file with further details. For each error log, we have:

  • configuration options responsible for the error
  • number of directly related errors
  • number of indirectly related errors
  • which errors dominate this one
  • cause of the error

logErr_detail.xlsx
