Add malloc free leak detector scripts #59
base: master
Changes from all commits
fa19d0e
59de618
617dba1
f9244d6
099e850
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
memchecker: memchecker.c | ||
gcc -g -o $@ $< | ||
|
||
memchecker.undo: memchecker | ||
live-record -o $@ ./$< | ||
|
||
run: memchecker.undo | ||
./malloc-free.py $< | ||
|
||
all: memchecker.undo | ||
|
||
clean: | ||
rm *.undo *.pyc memchecker | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# Memory leak detection example | ||
This example implements a simple memory leak detector with the Undo Automation API. | ||
|
||
The `memchecker.c` example application has a single unmatched `malloc()` call (not counting a few allocations made before the application has actually started, including the 1KB buffer for `printf()`, which the detector also reports). | ||
|
||
Using the Undo Automation API, the Python scripts process the recording to find all `malloc()` and `free()` calls. The scripts ignore every `malloc()` call that has a matching `free()` call and, after parsing the entire recording, jump back in time to each of the unmatched `malloc()` calls. For each such call, the scripts: | ||
* Output the backtrace at the time of the call. | ||
* Continue execution until `malloc()` returns. | ||
* Output the source code for the calling function (if available) and locals. | ||
|
||
For the example program, this is enough to show the root cause of the deliberate leak. For other recordings it should at least give a good hint, and the output includes the timestamp of each unmatched `malloc()` call, so you can open the recording and jump directly to the leaking allocation to start debugging from there. | ||
|
||
These scripts can be used as a starting point to implement other kinds of analysis related to the standard allocation functions, such as producing a profile of how much memory is being used during execution. | ||
|
||
## How to run the demo | ||
Simply enter the directory and run: | ||
|
||
`make run` | ||
|
||
## How to use the scripts on other recordings | ||
Simply run the `malloc-free.py` script, passing the recording as the parameter: | ||
|
||
`./malloc-free.py <recording.undo>` | ||
|
||
## Enhancement ideas | ||
* Provide some way to filter out library code. | ||
* Add verbosity controls. | ||
* Support recordings without symbols (provide address for `malloc()` & `free()` at command line). | ||
* Automatically trace the use of leaking memory to identify the last read or write access to the memory. |
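The README suggests the scripts could be a starting point for profiling how much memory is in use during execution. A minimal sketch of that idea follows, in plain Python and independent of UDB. The `Event` record and `usage_profile` function are hypothetical names introduced for illustration; the sketch assumes the allocator calls have already been collected from the recording as (time, kind, address, size) records.

```python
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """One allocator call. Hypothetical record type, not part of the Undo API."""

    time: int  # position in the recording (for example a bbcount)
    kind: str  # "malloc" or "free"
    addr: int  # pointer returned by malloc() or passed to free()
    size: int = 0  # bytes requested; only meaningful for "malloc"


def usage_profile(events: Iterable[Event]) -> list[tuple[int, int]]:
    """Return (time, live_bytes) after each event that changes the total."""
    live: dict[int, int] = {}  # address -> size of the live allocation
    total = 0
    profile = []
    for ev in events:
        if ev.kind == "malloc" and ev.addr:
            live[ev.addr] = ev.size
            total += ev.size
        elif ev.kind == "free" and ev.addr in live:
            total -= live.pop(ev.addr)
        else:
            # Failed malloc(), free(NULL) or free() of an unknown pointer.
            continue
        profile.append((ev.time, total))
    return profile


events = [
    Event(10, "malloc", 0x1000, 40),
    Event(20, "malloc", 0x2000, 1024),
    Event(30, "free", 0x1000),
    Event(40, "free", 0x3000),  # unknown pointer: ignored
]
profile = usage_profile(events)
peak = max(total for _, total in profile)
```

In a real extension the events would come from the breakpoint loop rather than a hand-written list; the bookkeeping itself would stay the same.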
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
import pathlib | ||
|
||
|
||
def maybe_install_script(script_path: pathlib.Path, script_name: str) -> None: | ||
""" | ||
Ask for permission and then install a script into ~/.local/bin. | ||
""" | ||
local_bin = pathlib.Path.home() / ".local" / "bin" | ||
install_path = local_bin / script_name | ||
|
||
choice = input(f"Do you want to install {script_name} to {local_bin}? [y/N] ") | ||
if choice.lower() not in ("y", "yes"): | ||
return | ||
|
||
try: | ||
install_path.symlink_to(script_path) | ||
install_path.chmod(0o755) | ||
except OSError as e: | ||
print(f"Failed to install the script: {e}") | ||
|
||
|
||
script = pathlib.Path(__file__).resolve().parent / "malloc_free_check.py" | ||
|
||
print( | ||
f"""\ | ||
The {script.name!r} script can be run outside of UDB: | ||
Review comment: You could use |
||
|
||
$ {script} <recording-file> | ||
""" | ||
) | ||
|
||
maybe_install_script(script, "malloc-free-check") |
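`Path.symlink_to()` raises `FileNotFoundError` when the parent directory is missing and `FileExistsError` when the link path already exists, so the installer above reports a failure in both cases. A hedged sketch of a more forgiving variant is shown below. `install_symlink` is a hypothetical helper, not part of this PR, and the demo runs in a temporary directory rather than `~/.local/bin`.

```python
import pathlib
import tempfile


def install_symlink(
    script_path: pathlib.Path, bin_dir: pathlib.Path, name: str
) -> pathlib.Path:
    """Link bin_dir/name to script_path, creating bin_dir if needed."""
    bin_dir.mkdir(parents=True, exist_ok=True)
    link = bin_dir / name
    if link.is_symlink():
        # Replace a link left behind by an earlier install.
        link.unlink()
    # A regular file with the same name is deliberately not overwritten:
    # symlink_to() raises FileExistsError in that case.
    link.symlink_to(script_path)
    return link


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    script = root / "malloc_free_check.py"
    script.write_text("#! /usr/bin/env udb-automate\n")
    script.chmod(0o755)  # make the target itself executable

    first = install_symlink(script, root / "bin", "malloc-free-check")
    second = install_symlink(script, root / "bin", "malloc-free-check")
    same_target = second.resolve() == script.resolve()
```

Running the install twice succeeds here because the stale link is removed first; whether silently replacing a link is desirable is a design choice the PR does not settle.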
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
#! /usr/bin/env udb-automate | ||
""" | ||
Undo Automation command-line script for tracking calls to malloc() and free() and checking for | ||
leaked memory. | ||
|
||
This script only supports the x86-64 architecture. | ||
|
||
Contributors: Chris Croft-White, Magne Hov | ||
""" | ||
|
||
import sys | ||
import textwrap | ||
|
||
from undo.udb_launcher import REDIRECTION_COLLECT, UdbLauncher | ||
|
||
|
||
def main(argv: list[str]) -> None: | ||
# Get the arguments from the command line. | ||
try: | ||
recording = argv[1] | ||
except IndexError: | ||
# Wrong number of arguments. | ||
print(f"{sys.argv[0]} RECORDING_FILE", file=sys.stderr) | ||
raise SystemExit(1) | ||
|
||
# Prepare for launching UDB. | ||
launcher = UdbLauncher() | ||
# Make UDB run with our recording. | ||
launcher.recording_file = recording | ||
# Make UDB load the malloc_free_check_extension.py file from the current directory. | ||
launcher.add_extension("malloc_free_check_extension") | ||
# Finally, launch UDB! | ||
# We collect the output as, in normal conditions, we don't want to show it | ||
# to the user but, in case of errors, we want to display it. | ||
res = launcher.run_debugger(redirect_debugger_output=REDIRECTION_COLLECT) | ||
|
||
if not res.exit_code: | ||
# All good as UDB exited with exit code 0 (i.e. no errors). | ||
# The result_data attribute is used to pass information from the extension to this script. | ||
unmatched = res.result_data["unmatched"] | ||
print(f"The recording failed to free allocated memory {unmatched} time(s).") | ||
else: | ||
# Something went wrong! Print a useful message. | ||
print( | ||
textwrap.dedent( | ||
f"""\ | ||
Error! | ||
UDB exited with code {res.exit_code}. | ||
|
||
The output was: | ||
|
||
{res.output} | ||
""" | ||
), | ||
file=sys.stderr, | ||
) | ||
# Exit this script with the same error code as UDB. | ||
raise SystemExit(res.exit_code) | ||
|
||
|
||
if __name__ == "__main__": | ||
main(sys.argv) |
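Indexing `argv[1]` on a list that is too short raises `IndexError`. One alternative to indexing by hand is the standard `argparse` module, sketched below. The `parse_args` helper and the `prog` name are assumptions made for illustration; note that `argparse` exits with status 2 on a usage error, whereas the script above exits with 1.

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse the command line (without the program name)."""
    parser = argparse.ArgumentParser(
        prog="malloc_free_check.py",  # assumed script name
        description="Check a recording for unmatched malloc() calls.",
    )
    parser.add_argument("recording", metavar="RECORDING_FILE")
    return parser.parse_args(argv)


args = parse_args(["memchecker.undo"])

# A missing argument makes argparse print a usage message to stderr
# and raise SystemExit.
try:
    parse_args([])
    missing_exit_code = None
except SystemExit as exc:
    missing_exit_code = exc.code
```

In the launcher script this would be called as `parse_args(sys.argv[1:])`, with `args.recording` assigned to `launcher.recording_file`.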
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,135 @@ | ||
""" | ||
Undo Automation extension module for tracking calls to malloc() and free() and checking for | ||
leaked memory. | ||
|
||
This script only supports the x86-64 architecture. | ||
|
||
Contributors: Chris Croft-White, Magne Hov | ||
""" | ||
|
||
import collections | ||
import re | ||
|
||
import gdb | ||
|
||
from undodb.debugger_extensions import udb | ||
from undodb.debugger_extensions.debugger_io import redirect_to_launcher_output | ||
Review comment: These two should be |
||
|
||
|
||
def leak_check() -> int: | ||
""" | ||
Implements breakpoints and stops on all calls to malloc() and free(), capturing the | ||
timestamp, size and returned pointer for malloc(), then confirms the address pointer is later | ||
seen in a free() call. | ||
|
||
If a subsequent free() is not seen, then at the end of execution, output the timestamp and | ||
details of the memory which was never freed. | ||
|
||
Returns the number of unmatched allocations found. | ||
""" | ||
# Set breakpoints on malloc() and free(). | ||
gdb.Breakpoint("malloc") | ||
gdb.Breakpoint("free") | ||
|
||
# Declare allocations dictionary structure. | ||
allocations = collections.OrderedDict() | ||
Review comment: You can just use a |
||
|
||
# Do "continue" until we have gone through the whole recording, potentially | ||
# hitting the breakpoints several times. | ||
end_of_time = udb.get_event_log_extent().end | ||
while True: | ||
Review comment: This is likely to break if there are signals in the recording. I guess it doesn't particularly matter, but maybe we should expose |
||
gdb.execute("continue") | ||
|
||
# Rather than having the check directly in the while condition we have | ||
# it here as we don't want to print the backtrace when we hit the end of | ||
# the recording but only when we stop at a breakpoint. | ||
if udb.time.get().bbcount >= end_of_time: | ||
break | ||
|
||
# Use the $PC output to get the symbol and idenfity whether execution has stopped | ||
Review comment: Typo |
||
# at a malloc() or free() call. | ||
mypc = format(gdb.parse_and_eval("$pc")) | ||
if re.search("malloc", mypc): | ||
Review comment: I think Or maybe |
||
# In malloc(), set a FinishBreakpoint to capture the pointer returned later. | ||
mfbp = gdb.FinishBreakpoint() | ||
|
||
# For now, capture the timestamp and size of memory requested. | ||
time = udb.time.get() | ||
size = int(gdb.parse_and_eval("$rdi")) | ||
|
||
gdb.execute("continue") | ||
|
||
# Should stop at the finish breakpoint, so capture the pointer. | ||
assert mfbp.return_value is not None, "Expected to see a return value." | ||
addr = int(mfbp.return_value) | ||
|
||
if addr: | ||
# Store details in the dictionary. | ||
allocations[hex(addr)] = time, size | ||
Review comment: It would be cleaner if the address was stored as an integer rather than its string representation. |
||
else: | ||
print(f"-- INFO: Malloc called for {size} byte(s) but null returned.") | ||
|
||
print(f"{time}: malloc() called: {size} byte(s) allocated at {addr:#x}.") | ||
|
||
elif re.search("free", mypc): | ||
Review comment: Similarly here. |
||
# In free(), get the pointer address. | ||
addr = int(gdb.parse_and_eval("$rdi")) | ||
|
||
time = udb.time.get() | ||
|
||
# Delete entry from the dictionary as this memory was released. | ||
if addr > 0: | ||
if hex(addr) in allocations: | ||
del allocations[hex(addr)] | ||
else: | ||
print("--- INFO: Free called with unknown address") | ||
else: | ||
print("--- INFO: Free called with null address") | ||
|
||
# with redirect_to_launcher_output(): | ||
Review comment: Delete this commented out line? |
||
print(f"{time}: free() called for {addr:#x}") | ||
|
||
# If allocations has any entries remaining, they were not released. | ||
with redirect_to_launcher_output(): | ||
print() | ||
print(f"{len(allocations)} unmatched memory allocation(s):") | ||
print() | ||
|
||
total = 0 | ||
|
||
# Increase the amount of source from default (10) to 16 lines for more context. | ||
gdb.execute("set listsize 16") | ||
for location, (time, size) in allocations.items(): | ||
total += size | ||
print("===============================================================================") | ||
print(f"{time}: {size} byte(s) were allocated at {location}, but never freed.") | ||
Review comment: And then here location can be printed as |
||
print("===============================================================================") | ||
udb.time.goto(time) | ||
print("Backtrace:") | ||
gdb.execute("backtrace") | ||
print() | ||
print("Source (if available):") | ||
gdb.execute("finish") | ||
gdb.execute("list") | ||
print() | ||
print("Locals (after malloc returns):") | ||
gdb.execute("info locals") | ||
print() | ||
print() | ||
print("===============================================================================") | ||
print(f" In total, {total} byte(s) were allocated and not released") | ||
print() | ||
|
||
return len(allocations) | ||
|
||
|
||
# UDB will automatically load the modules passed to UdbLauncher.add_extension and, if present, | ||
# automatically execute any function (with no arguments) called "run". | ||
def run() -> None: | ||
# Needed to allow GDB to fixup breakpoints properly after glibc has been loaded. | ||
gdb.Breakpoint("main") | ||
|
||
unmatched = leak_check() | ||
|
||
# Pass the number of unmatched allocations back to the outer script. | ||
udb.result_data["unmatched"] = unmatched |
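The matching logic in `leak_check()` can be exercised without UDB. The sketch below is a simplified stand-in, not the extension's code: `find_unmatched` is a hypothetical function that takes pre-collected (time, kind, address, size) tuples. It keys the table by integer address, as suggested in review, and relies on a plain `dict` preserving insertion order (guaranteed since Python 3.7).

```python
def find_unmatched(events):
    """Return (allocations, notes) for an iterable of (time, kind, addr, size).

    allocations maps each never-freed address to (time, size), in the order
    the allocations were made. notes collects the informational messages.
    """
    allocations: dict[int, tuple[int, int]] = {}
    notes: list[str] = []
    for time, kind, addr, size in events:
        if kind == "malloc":
            if addr:
                allocations[addr] = (time, size)
            else:
                notes.append(f"{time}: malloc({size}) returned null")
        elif kind == "free":
            if addr == 0:
                notes.append(f"{time}: free() called with null address")
            elif addr in allocations:
                # Membership test first: indexing a missing key would
                # raise KeyError.
                del allocations[addr]
            else:
                notes.append(f"{time}: free() called with unknown address {addr:#x}")
    return allocations, notes


events = [
    (1, "malloc", 0x1000, 40),
    (2, "malloc", 0x2000, 40),
    (3, "free", 0x1000, 0),
    (4, "free", 0, 0),       # free(NULL)
    (5, "free", 0x9999, 0),  # pointer never returned by malloc()
    (6, "malloc", 0x1000, 8),  # the allocator reuses a freed address
    (7, "free", 0x1000, 0),
]
unmatched, notes = find_unmatched(events)
leaked_bytes = sum(size for _, size in unmatched.values())
```

Because the keys are integers, a report line can format them directly, for example `f"{addr:#x}"`.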
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
#include <stdio.h> | ||
#include <stdlib.h> | ||
|
||
int | ||
main(void) | ||
{ | ||
int i; | ||
|
||
for (i = 1; i < 20; ++i) | ||
{ | ||
int *addr = (int *)malloc(10 * sizeof(int)); | ||
printf("Address allocated: %p\n", addr); | ||
|
||
if (!(i % 10 == 0)) | ||
Review comment: Why A comment explaining what this is doing could also be useful to users. |
||
{ | ||
printf("Address freed: %p\n", addr); | ||
free(addr); | ||
} | ||
} | ||
} |
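To make explicit what the loop in `memchecker.c` does, the sketch below replays its control flow in Python. It assumes a 4-byte `int` (typical on x86-64 Linux, but an assumption here), so each allocation is 40 bytes. Only the iteration where `i` is a multiple of 10 skips the `free()` call.

```python
SIZEOF_INT = 4  # assumption: 4-byte int

leaked_iterations = []
for i in range(1, 20):  # mirrors: for (i = 1; i < 20; ++i)
    allocated = 10 * SIZEOF_INT  # mirrors: malloc(10 * sizeof(int))
    if not (i % 10 == 0):
        continue  # this iteration calls free(addr)
    # i is a multiple of 10: the allocation is never freed.
    leaked_iterations.append((i, allocated))
```

With the loop bounds in the C file, `i == 10` is the only multiple of 10 reached, which is why the README describes a single deliberate leak.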
Review comment: I don't think any recent Python would create `.pyc` files in the same directory. They should go in `__pycache__`.
Removing all recordings is a bit scary as it may remove things customers put in there and didn't want deleted. Maybe just remove `memchecker` and `memchecker.undo`?