Skip to content
This repository was archived by the owner on Nov 26, 2021. It is now read-only.
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 10 additions & 1 deletion trappy/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -174,9 +174,11 @@ def append_data(self, time, comm, pid, cpu, data):
def generate_parsed_data(self):

# Get a rough idea of how much memory we have to play with
CHECK_MEM_COUNT = 10000
kb_free = _get_free_memory_kb()
starting_maxrss = getrusage(RUSAGE_SELF).ru_maxrss
check_memory_usage = True
check_memory_count = 1

for (comm, pid, cpu, data_str) in zip(self.comm_array, self.pid_array,
self.cpu_array, self.data_array):
Expand All @@ -200,13 +202,20 @@ def generate_parsed_data(self):
# When running out of memory, Pandas has been observed to segfault
# rather than throwing a proper Python error.
# Look at how much memory our process is using and warn if we seem
# to be getting close to the system's limit.
# to be getting close to the system's limit, check it only once
# in the beginning and then every CHECK_MEM_COUNT events
check_memory_count -= 1
if check_memory_count != 0:
Copy link
Copy Markdown
Contributor

@bjackman bjackman Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this confusing - IMO this would be much clearer (no continue):

        if check_memory_usage and check_memory_count == 0:
             kb_used = (getrusage(RUSAGE_SELF).ru_maxrss - starting_maxrss)
             if  kb_free and kb_used > kb_free * 0.9:
                 warnings.warn("TRAPpy: Appear to be low on memory. "
                               "If errors arise, try providing more RAM")
                 check_memory_usage = False
             check_memory_count = CHECK_MEM_COUNT
        
        yield data_dict

Not sure if this is just me..

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS this getrusage stuff was originally added (by me) to get around the fact that Pandas was segfaulting, instead of raising a proper Python error, when it ran out of memory. It might be worth checking that the latest Pandas still has that problem; if it doesn't maybe we can drop this stuff entirely as it's pretty ugly.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brendan, the memory limit has never been a problem for me even for large data frames, I don't mind ripping it out altogether unless anyone else has an objection?

Not sure how much RAM you had, but why not rip it out yourself and see if you still hit it with new pandas? I agree its pretty ugly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll give it a try (I was using a VM with 1Gb memory).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I tested it and pandas still segfaults when you run out of memory, so we'll need to keep this hack.

yield data_dict
continue

kb_used = (getrusage(RUSAGE_SELF).ru_maxrss - starting_maxrss)
if check_memory_usage and kb_free and kb_used > kb_free * 0.9:
warnings.warn("TRAPpy: Appear to be low on memory. "
"If errors arise, try providing more RAM")
check_memory_usage = False

check_memory_count = CHECK_MEM_COUNT
yield data_dict

def create_dataframe(self):
Expand Down