Introduced a script to capture leaks from malloc / free #44

Open
wants to merge 4 commits into
base: master
15 changes: 15 additions & 0 deletions leak_detector/README.md
@@ -0,0 +1,15 @@
# Leak detector

Matches all calls to `malloc` and `free` and shows any unmatched `malloc` call
with a bbcount to jump to.

## Usage
```
ugo start
mleak
```

Before using the script, it must be loaded into the debugger:
```
source PATHTOADDONS/leak_detector/leak_detector.py
```
79 changes: 79 additions & 0 deletions leak_detector/leak_detector.py
@@ -0,0 +1,79 @@
"""
Find memory leaks when malloc and free are used.
Starting from the current position it matches all calls to malloc
with calls to free with the same pointer and
keeps track of the calls that have no corresponding call to free.
It is recommended to go to the start of time and then use the command.
It prints a full list of unmatched calls at the end.
Usage: mleak

Contributors: Emiliano Testa
Copyright (C) 2022 Undo Ltd
"""

import copy
import gdb

from undodb.debugger_extensions import (
    udb,
)


ALLOC_FN = "malloc"
FREE_FN = "free"
Collaborator

I don't think you need these as globals if they are only used in one place.

all_allocs = []
Collaborator

It would be nicer if the state were not a global. See below.



class MemAlloc:
    def __init__(self, addr, size, bbcount):
        self.addr = addr
        self.size = size
        self.bbcount = bbcount


def handle_alloc_fn():
    frame = gdb.selected_frame()
    size = frame.read_register("rdi")
    bbcount = udb.time.get().bbcount
    gdb.execute("finish")
    frame = gdb.selected_frame()
    addr = frame.read_register("rax")
    all_allocs.append(MemAlloc(addr, size, bbcount))


def handle_free_fn():
    frame = gdb.selected_frame()
    addr = frame.read_register("rdi")
    for alloc in copy.copy(all_allocs):
        if alloc.addr == addr:
            all_allocs.remove(alloc)
Collaborator

Use a set and then you can just do .remove(alloc). This requires MemAlloc to be hashable and not modifiable so using a frozen data class as I suggested above will make it work.
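
A minimal sketch of the set-based approach, runnable outside the debugger. The field names follow the PR's `MemAlloc`; the addresses, sizes, and bbcounts are made-up sample values. Because `free` only reports the pointer, the matching record still has to be looked up before it can be removed from the set. Keying a dict by address, shown at the end, is an alternative assumed here and not something proposed in the review. In the real script the register values would probably need converting with `int()` before being stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemAlloc:
    """One malloc call: returned address, requested size, and when it ran."""

    addr: int
    size: int
    bbcount: int


# A frozen dataclass with the default eq=True is hashable, so it fits in a set.
allocations = set()
allocations.add(MemAlloc(addr=0x1000, size=32, bbcount=10))
allocations.add(MemAlloc(addr=0x2000, size=64, bbcount=25))

# free() only gives us the pointer, so find the matching record first.
freed_addr = 0x1000
match = next((a for a in allocations if a.addr == freed_addr), None)
if match is not None:
    allocations.remove(match)

# Alternative (an assumption, not from the review): key a dict by address.
live = {0x1000: MemAlloc(0x1000, 32, 10), 0x2000: MemAlloc(0x2000, 64, 25)}
live.pop(freed_addr, None)  # no error if the pointer was never tracked
```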



def handle_bp_event(event):
Collaborator

You can achieve the same by defining a stop method in a class derived from gdb.Breakpoint.

class AllocBreakpoint(gdb.Breakpoint):
    def __init__(self, allocations: set[MemAlloc]) -> None:
        super().__init__("malloc", internal=True)
        self.allocations = allocations

    def stop(self) -> bool:
        [... logic from handle_alloc_fn ...]
        alloc = MemAlloc(addr, size, bbcount)
        assert alloc not in self.allocations, f"XXX something helpful here"
        self.allocations.add(alloc)

        return False

And similar for free.

Returning False from stop prevents GDB from stopping at this breakpoint, so you don't need to issue continue repeatedly.

Also use internal=True so that the breakpoint doesn't modify the number of the next BP created by the user.

Contributor Author

I thought about using this approach but the documentation states:

You should not alter the execution state of the inferior (i.e., step, next, etc.), alter the current frame context (i.e., change the current active frame), or alter, add or delete any breakpoint. As a general rule, you should not alter any data within GDB or the inferior at this time. 

Because I need to get to the end of malloc(), I decided not to use this technique and went for the stop handlers instead.

    if hasattr(event, "breakpoints"):
        for bp in event.breakpoints:
            if bp.location == ALLOC_FN:
                handle_alloc_fn()
            elif bp.location == FREE_FN:
                handle_free_fn()


class LeakDetect(gdb.Command):
    def __init__(self):
        super().__init__("mleaks", gdb.COMMAND_USER)

    @staticmethod
    def invoke(arg, from_tty):
        gdb.Breakpoint(ALLOC_FN)
        gdb.Breakpoint(FREE_FN)
Collaborator

You should remove the breakpoints you create after you are done. If you follow my suggestions from above you will have:

allocations: set[MemAlloc] = set()
try:
    alloc_bp = AllocBreakpoint(allocations)
    free_bp = FreeBreakpoint(allocations)
    [... rest of the function ...]
finally:
    alloc_bp.delete()
    free_bp.delete()
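
One caveat with a `try`/`finally` of this shape: if a constructor raises, the `finally` clause refers to a name that was never bound. A possible way around that, sketched here with `contextlib.ExitStack`, is to register each cleanup as soon as the object exists. `FakeBreakpoint` is a hypothetical stand-in so the sketch runs outside GDB; only its `delete()` method mirrors the real `gdb.Breakpoint` API.

```python
from contextlib import ExitStack


class FakeBreakpoint:
    """Hypothetical stand-in for a gdb.Breakpoint subclass."""

    def __init__(self, location):
        self.location = location
        self.deleted = False

    def delete(self):
        self.deleted = True


created = []
try:
    with ExitStack() as stack:
        for location in ("malloc", "free"):
            bp = FakeBreakpoint(location)
            created.append(bp)
            # Register cleanup immediately, so it runs however the block exits.
            stack.callback(bp.delete)
        # Simulate something going wrong while the recording is replayed.
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

# Both breakpoints were deleted even though the body raised.
print([bp.deleted for bp in created])
```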

        gdb.events.stop.connect(handle_bp_event)
        end_of_time = udb.get_event_log_extent().max_bbcount
        gdb.execute("continue")
        while udb.time.get().bbcount < end_of_time:
            gdb.execute("continue")
        print("Calls to allocator fn that don't have a corresponding free")
        for alloc in all_allocs:
            print(f"{hex(alloc.addr)} - {hex(alloc.size)} - {alloc.bbcount}")
Collaborator

If you use a set then the results will be in an arbitrary order. You can do this:

for alloc in sorted(allocations, key=lambda ma: ma.bbcount):
    ...
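
A small illustration of the ordering point, using a `NamedTuple` as a stand-in for `MemAlloc` and made-up sample values. Iterating over the set directly gives no guaranteed order, while `sorted` with a `bbcount` key gives the allocations in execution order.

```python
from typing import NamedTuple


class MemAlloc(NamedTuple):
    """Stand-in record; the field names follow the PR's MemAlloc."""

    addr: int
    size: int
    bbcount: int


allocations = {
    MemAlloc(0x3000, 16, 90),
    MemAlloc(0x1000, 32, 10),
    MemAlloc(0x2000, 64, 25),
}

# Sort by bbcount so the report follows the order in which the calls happened.
ordered = sorted(allocations, key=lambda ma: ma.bbcount)
bbcounts = [ma.bbcount for ma in ordered]
print(bbcounts)
```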

Collaborator

There are a few possible improvements here:

  • Show the bbcount as the first element, as it's what you are sorting on
  • Show a range of addresses, not just the start
  • Don't use hex for the size
  • Use commas for long bbcounts and sizes
  • Pad numbers so they are aligned

Maybe something like this:

print(f"{alloc.bbcount:18,}: {alloc.addr:#018x} - {alloc.addr + alloc.size:#018x} (size={alloc.size:,})")

The various bits after the : characters mean:

  • 18: pad to 18 characters with spaces
  • 018: pad to 18 characters with zeroes
  • ,: format numbers with commas every 3 digits
  • x: use hexadecimal
  • #: add 0x
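
To see what those format specifiers produce, here is the same f-string applied to plain Python integers. The values are made up for illustration, and in the real script the `gdb.Value` objects would likely need converting with `int()` first (an assumption, not something stated in the review).

```python
# Sample values standing in for one unmatched allocation.
bbcount = 1234567
addr = 0x7FFFF7A00010
size = 4096

# bbcount: right-aligned in 18 columns with thousands separators.
# Addresses: "0x" prefix plus zero padding to 18 characters in total.
line = f"{bbcount:18,}: {addr:#018x} - {addr + size:#018x} (size={size:,})"
print(line)
```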



LeakDetect()