Implement Unified Line Coverage Formula #898

Kartikayy007 · 2025-03-19T12:32:51Z

This PR standardizes how line coverage is calculated throughout the system to ensure consistent and mathematically correct measurements.

Reference: Issue #727

Solution

Implement Standardized Formulas

As defined in the issue:

Single Target Coverage:
Cov(f1) / Linked(f1)
Coverage Improvement:
[Cov(f1) - Cov(f0)] / [Linked(f1 ∪ f0)]
- Improvement is the difference in covered lines divided by the union of all linked lines.

Implementation Details

New Coverage Module

Created: experiment/coverage.py with standardized formula implementations.
Implemented:
- calculate_coverage() for single targets.
- calculate_coverage_improvement() for comparing targets.

Updated Calculation Points

stage/execution_stage.py: Updated line coverage diff calculation.
run_all_experiments.py: Fixed project coverage gain calculation.
report/aggregate_coverage_diff.py: Standardized coverage diff computation.
experiment/evaluator.py: Aligned with the unified formula.

Non-Destructive Operations

Use the copy() method so that the original coverage objects are not modified.
This ensures subtraction operations do not affect the original data.

Kartikayy007 · 2025-03-19T12:57:19Z

CC: @DonggeLiu please review

DonggeLiu · 2025-03-20T01:00:32Z

experiment/coverage.py

+
+
+def calculate_coverage(cov: textcov.Textcov, linked_lines: int) -> float:
+  """Calculate coverage according to formula: Cov(f) / Linked(f)."""


nit: 'Calculates' for consistency, same below.

DonggeLiu · 2025-03-20T01:47:33Z

experiment/evaluator.py

@@ -465,7 +467,9 @@ def check_target(self, ai_binary, target_path: str) -> Result:
          run_result.coverage.subtract_covered_lines(existing_textcov)

        if total_lines and run_result.coverage:
-          coverage_diff = run_result.coverage.covered_lines / total_lines
+          union_linked_lines = max(run_result.coverage.total_lines, total_lines)


I used the term "union" in the mathematical sense, where each line represents an element of a set. Each set can correspond to covered lines or linked lines in the project.

For example, suppose our fuzz target links lines {1,2,3,4} of the project at compile time and covers lines {1,2,3} during fuzzing, while an older fuzz target links lines {1,2,5} and covers {2,5}. Then:

The union of linked lines is {1,2,3,4,5}.

The new fuzz target's coverage is 3/5 ({1,2,3} out of {1,2,3,4,5}).

The old fuzz target's coverage is 2/5 ({2,5} out of {1,2,3,4,5}).

The coverage increase is 2/5 (newly covered lines {1,3} out of {1,2,3,4,5}).
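The arithmetic in this example can be checked with Python's built-in sets. This is only an illustrative sketch: the variable names are made up here, and the real Textcov objects do not store bare line-number sets like this.

```python
from fractions import Fraction

# Line sets taken from the example above (names are illustrative only).
new_linked, new_covered = {1, 2, 3, 4}, {1, 2, 3}
old_linked, old_covered = {1, 2, 5}, {2, 5}

# Linked(f1 ∪ f0): every line linked by either fuzz target.
union_linked = new_linked | old_linked

# Cov(f1) - Cov(f0): lines covered by the new target but not the old one.
newly_covered = new_covered - old_covered

new_coverage = Fraction(len(new_covered), len(union_linked))
old_coverage = Fraction(len(old_covered), len(union_linked))
increase = Fraction(len(newly_covered), len(union_linked))

print(new_coverage, old_coverage, increase)  # 3/5 2/5 2/5
```

Note that the increase is a set difference, not a difference of counts: subtracting the counts (3 - 2) would give 1/5 rather than 2/5.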

Ideally, the denominator should represent "the total reachable lines" or "the total number of lines" in the project. However, they are difficult to determine accurately if certain files are not linked at compile time.

Given that this is a complicated task, please feel free to prioritize writing and refining your proposal : )
That's the most important factor for your application.
If you have a draft, I am more than happy to provide general feedback to ensure you are on the right track.

You are correct, sir. I'll prioritise my proposal first, then come back to this and resolve it completely. Thank you!

DonggeLiu · 2025-03-20T01:54:57Z

experiment/textcov.py

+                                       hit_count=line.hit_count)
+      new_cov.files[name] = new_file
+
+    return new_cov


Could we use python's builtin deepcopy?
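A minimal sketch of what a deepcopy-based copy() could look like. The dataclasses below are simplified stand-ins written for this illustration, not the actual definitions in experiment/textcov.py.

```python
import copy
import dataclasses


@dataclasses.dataclass
class Line:
    """Simplified stand-in for a covered line."""
    hit_count: int = 0


@dataclasses.dataclass
class File:
    """Simplified stand-in for a per-file coverage record."""
    name: str = ''
    lines: dict[str, Line] = dataclasses.field(default_factory=dict)


@dataclasses.dataclass
class Textcov:
    """Simplified stand-in for the coverage container."""
    files: dict[str, File] = dataclasses.field(default_factory=dict)

    def copy(self) -> 'Textcov':
        # deepcopy recursively clones nested dicts and dataclass instances,
        # so no field-by-field copying is needed.
        return copy.deepcopy(self)


original = Textcov(files={'a.py': File('a.py', {'1': Line(hit_count=3)})})
clone = original.copy()

# Mutating the clone leaves the original untouched.
clone.files['a.py'].lines['1'].hit_count = 0
print(original.files['a.py'].lines['1'].hit_count)  # 3
```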

DonggeLiu · 2025-03-20T02:01:00Z

run_all_experiments.py


    if total_lines:
      coverage_gain[project] = {
          'language':
              oss_fuzz_checkout.get_project_language(project),
          'coverage_diff':
-              total_cov.covered_lines / total_lines,
+              coverage_utils.calculate_coverage_improvement(
+                  total_cov, existing_textcov, union_linked_lines),


nit: I think we have subtracted existing_textcov above:

oss-fuzz-gen/run_all_experiments.py

Lines 504 to 506 in 546036d

total_existing_lines = sum(lines)

total_cov_covered_lines_before_subtraction = total_cov.covered_lines

total_cov.subtract_covered_lines(existing_textcov)

While this action should be idempotent, repeated actions add unnecessary complexity and may confuse readers.
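The idempotence can be illustrated with a set model of covered lines. This is a sketch under the assumption that subtraction behaves like set difference; the real subtract_covered_lines works in place on Textcov objects, and the helper name below is hypothetical.

```python
def subtract_covered(covered: set[int], existing: set[int]) -> set[int]:
    """Hypothetical helper: returns lines in covered that are not in existing."""
    return covered - existing


total_covered = {1, 2, 3, 5}
existing_covered = {2, 5}

once = subtract_covered(total_covered, existing_covered)
twice = subtract_covered(once, existing_covered)

# Subtracting the same baseline a second time changes nothing.
print(once == twice)  # True
```

The result is the same either way, so the second subtraction only adds work and a step a reader has to reason about.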

DonggeLiu · 2025-03-20T02:02:28Z

As always, thanks sooo much for looking into this!

…rmula

Changes:
- Add dedicated coverage.py module with standardized implementations
- Fix double-subtraction issue in evaluator.py
- Use non-destructive operations to preserve original coverage objects
- Replace custom copy implementation with Python's built-in deepcopy

Kartikayy007 · 2025-03-24T22:23:52Z

Hello @DonggeLiu!
I have been thinking about the mathematical union of sets of lines, and I have drafted a set-based solution that would actually implement the mathematical union you suggested.

# Changes to experiment/textcov.py

@dataclasses.dataclass
class Textcov:
   """Textcov with set-based line tracking."""
   # Original properties (keep for backward compatibility)
   functions: dict[str, Function] = dataclasses.field(default_factory=dict)
   files: dict[str, File] = dataclasses.field(default_factory=dict)
   language: str = 'c++'
   
   # New set-based tracking
   covered_line_numbers: set[int] = dataclasses.field(default_factory=set)
   linked_line_numbers: set[int] = dataclasses.field(default_factory=set)
   
   def subtract_covered_lines(self, other: 'Textcov'):
       """Remove lines covered in other from self using set operations."""
       # Original function implementation (keep for backward compatibility)
       if self.language == 'python':
           for file in other.files.values():
               if file.name in self.files:
                   self.files[file.name].subtract_covered_lines(file)
       else:
           for function in other.functions.values():
               if function.name in self.functions:
                   self.functions[function.name].subtract_covered_lines(
                       function, self.language)
       
       # New set-based implementation
       self.covered_line_numbers -= other.covered_line_numbers
   
   @property
   def covered_lines(self):
       """Return count of covered lines."""
       # Prefer set-based count if available
       if self.covered_line_numbers:
           return len(self.covered_line_numbers)
           
       # Fall back to original implementation
       if self.language == 'python':
           return sum(f.covered_lines for f in self.files.values())
       return sum(f.covered_lines for f in self.functions.values())


# Changes to experiment/coverage.py

def calculate_coverage_improvement(new_cov: textcov.Textcov,
                                  existing_cov: textcov.Textcov,
                                  union_linked_lines: int | None = None) -> float:
   """Calculates coverage improvement: [Cov(f1) - Cov(f0)] / [Linked(f1 ∪ f0)]."""
   # Make a copy to avoid modifying the original
   diff_cov = new_cov.copy()
   diff_cov.subtract_covered_lines(existing_cov)
   
   # If set-based line numbers are available, use them for true mathematical union
   if new_cov.linked_line_numbers and existing_cov.linked_line_numbers:
       # Calculate true union of linked lines
       true_union = new_cov.linked_line_numbers.union(existing_cov.linked_line_numbers)
       union_size = len(true_union)
       return diff_cov.covered_lines / union_size if union_size else 0.0
   
   # Fall back to approximation if sets aren't available
   if not union_linked_lines:
       return 0.0
   return diff_cov.covered_lines / union_linked_lines

It would store actual line numbers in sets and perform the corresponding set operations, which is closer to your example. What do you think? I am unsure about it: I think parsing and backward compatibility aren't fully addressed here, and it mixes the old and new approaches.
Should we implement this for improved accuracy?

…alculation This change eliminates the redundant subtraction of covered lines from the total coverage calculation, streamlining the process and ensuring accurate coverage metrics.

Kartikayy007 · 2025-03-24T22:36:37Z

Please take a look whenever you are free. Thank you!

DonggeLiu · 2025-03-31T07:18:04Z

Hi @Kartikayy007,
I am not very confident about this new solution.
For example, how does it handle line numbers across multiple files, or different files with the same name but different paths?

I reckon this issue will likely be rather complicated, feel free to leave it for now.
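The concern can be made concrete with a small sketch. Bare line numbers collide across files, whereas keying each line by (file path, line number) keeps them distinct. The paths and the LineKey alias below are hypothetical and only illustrate the problem; they are not part of the PR.

```python
# Hypothetical key type: (full file path, line number).
LineKey = tuple[str, int]

# Two different files named util.c, each with a linked line 10.
new_linked: set[LineKey] = {('src/a/util.c', 10), ('src/b/util.c', 10)}
old_linked: set[LineKey] = {('src/a/util.c', 10), ('src/a/util.c', 11)}

# Path-qualified union keeps the two util.c files apart.
qualified_union = new_linked | old_linked

# Bare line numbers merge unrelated lines and undercount the union.
bare_union = {line for _, line in new_linked} | {line for _, line in old_linked}

print(len(qualified_union), len(bare_union))  # 3 2
```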

demoncoder-crypto · 2025-04-03T13:46:41Z

@DonggeLiu I have tried an alternative approach; please let me know if it is up to expectations.

Kartikayy007 · 2025-04-04T03:42:34Z

@DonggeLiu I have done a alternative approach please do let me know If its up to expectation

Can you explain?

DonggeLiu · 2025-04-11T06:02:16Z

I will close this PR if you don't mind? @Kartikayy007

Kartikayy007 · 2025-04-11T06:18:47Z

I will close this PR if you don't mind? @Kartikayy007

Please put this in draft.

Kartikayy007 added 2 commits March 19, 2025 17:29

Implement unified line coverage formula

6a480c4

formater/linted the code and revreted the log message

546036d

DonggeLiu reviewed Mar 20, 2025

DonggeLiu marked this pull request as draft March 20, 2025 11:04

Kartikayy007 and others added 3 commits March 23, 2025 14:23

Merge branch 'main' into unified-coverage-formula

33e42b3

Merge remote-tracking branch 'upstream/main' into unified-coverage-fo…

3469348

…rmula

Kartikayy007 marked this pull request as ready for review March 24, 2025 22:16

fix: Remove unnecessary line coverage subtraction in total coverage c…

ebe46e4

…alculation This change eliminates the redundant subtraction of covered lines from the total coverage calculation, streamlining the process and ensuring accurate coverage metrics.

DonggeLiu mentioned this pull request Apr 3, 2025

Fix(experiment): Update coverage calculation logic for #727 #960

Open

DonggeLiu marked this pull request as draft April 14, 2025 06:24
