Add Stack Frame Collection (#83)
* Add StackFrameTable

* Really add StackFrameTable

* Initial support for stackframes. (#2)

* Initial support for stackframes.

Hidden behind a define (default off) for now.

* Make stackframe support compile-time.

Document that it requires editing Makefile.

* Switch to at-runtime enable/disable.

* Remove outdated comment

* Remove include.

* Remove wrongly added line

* Document in FEATURES.md

* Track more APIs in stackframe.

* Redo build.  Use a submodule.  Refactor to uncouple unwind

* Fix cpptrace linkage.  Add a stackframe view

* Fix build. (#3)

* Fix build.

Use cmake, install, point to installed cpptrace.

* Also correct README

* Point to directory

* Remove system dwarf.

* Add doc.

* Remove debug, add playbook (#4)

* Minor improvements. (#5)

---------

Co-authored-by: iotamudelta <[email protected]>
mwootton and iotamudelta authored Jan 28, 2025
1 parent 60cb12e commit 51b0daf
Showing 21 changed files with 736 additions and 10 deletions.
3 changes: 3 additions & 0 deletions .gitmodules
@@ -4,3 +4,6 @@
[submodule "autocoplite"]
path = autocoplite
url = https://github.com/brieflynn/autocoplite.git
[submodule "cpptrace"]
path = cpptrace
url = https://github.com/jeremy-rifkin/cpptrace.git
30 changes: 30 additions & 0 deletions FEATURES.md
@@ -12,6 +12,7 @@ Contents:
- [Graph Subclass](#graph-subclass)
- [Pytorch Autograd Subclass](#pytorch-autograd-subclass)
- [Call Stack Accounting](#call-stack-accounting)
- [Stackframe recording](#stackframe-recording)
- [Rpd_tracer Start/stop](#rpd_tracer-startstop)
- [Schema v2](#schema-v2)

@@ -180,6 +181,35 @@ for row in connection.execute("select args, avg(cpu_time), avg(gpu_time) from ca
```


--------------------------------------------------------------------------------
## Stackframe recording
Stackframe recording requires initialization of the `cpptrace` submodule. Additionally, `RPDT_STACKFRAMES=1` must be set as an environment variable at profiling time to record stack traces for every HIP API call. The data is recorded in the `rocpd_stackframe` table:
```
> select * from rocpd_stackframe limit 20;
id|api_ptr_id|depth|name_id
1|2|0|5
2|2|1|6
3|2|2|7
4|2|3|8
5|2|4|9
6|2|5|10
7|3|0|12
8|3|1|13
9|3|2|14
10|3|3|15
11|3|4|16
12|6|0|20
13|6|1|21
14|6|2|14
15|6|3|15
16|6|4|16
17|9|0|20
18|9|1|21
19|9|2|14
20|9|3|15
```
`api_ptr_id` maps to the HIP API correlation ID (the `id` of the call in `rocpd_api`), `depth` is the position in the stack trace starting at 0, and `name_id` references the `rocpd_string` entry that describes the stack frame.
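
To turn these rows back into readable stacks, `name_id` has to be resolved through `rocpd_string` and the frames grouped per API call. A minimal sketch, assuming an in-memory database with cut-down versions of the two tables; the helper `call_stacks` and all frame names are made up for illustration:

```python
import sqlite3
from collections import defaultdict


def call_stacks(conn):
    """Return {api_ptr_id: [frame strings ordered by depth]}."""
    query = """
        SELECT F.api_ptr_id, F.depth, S.string
        FROM rocpd_stackframe F
        JOIN rocpd_string S ON S.id = F.name_id
        ORDER BY F.api_ptr_id, F.depth
    """
    stacks = defaultdict(list)
    for api_ptr_id, _depth, frame in conn.execute(query):
        stacks[api_ptr_id].append(frame)
    return dict(stacks)


# Stand-in for a real .rpd file: only the columns used above are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rocpd_string (id INTEGER PRIMARY KEY, string TEXT NOT NULL);
    CREATE TABLE rocpd_stackframe (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        api_ptr_id INTEGER NOT NULL,
        depth INTEGER NOT NULL,
        name_id INTEGER NOT NULL REFERENCES rocpd_string (id)
    );
""")
# Hypothetical frame names; a real trace stores whatever the unwinder reports.
conn.executemany(
    "INSERT INTO rocpd_string (id, string) VALUES (?, ?)",
    [(5, "hip_api_wrapper"), (6, "allocate_buffers"), (7, "main"),
     (12, "launch_kernels")],
)
# Inserted out of order on purpose; the query sorts by depth.
conn.executemany(
    "INSERT INTO rocpd_stackframe (api_ptr_id, depth, name_id) VALUES (?, ?, ?)",
    [(2, 2, 7), (2, 0, 5), (2, 1, 6), (3, 1, 7), (3, 0, 12)],
)

stacks = call_stacks(conn)
for api_ptr_id, frames in stacks.items():
    print(api_ptr_id, " <- ".join(frames))
```

For a real trace, replace `":memory:"` with the path to the `.rpd` file and drop the setup statements.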


--------------------------------------------------------------------------------

16 changes: 14 additions & 2 deletions Makefile
@@ -1,7 +1,7 @@
PYTHON ?= python3

.PHONY:
all: rpd rocpd remote
all: cpptrace rpd rocpd remote

.PHONY: install
install: all
@@ -16,7 +16,7 @@ uninstall:
$(MAKE) uninstall -C remote

.PHONY: clean
clean:
clean: cpptrace-clean
$(MAKE) clean -C rocpd_python
$(MAKE) clean -C rpd_tracer
$(MAKE) clean -C remote
@@ -30,3 +30,15 @@ rocpd:
.PHONY: remote
remote:
$(MAKE) -C remote
.PHONY: cpptrace

CPPTRACE_MAKE?= $(wildcard cpptrace/Makefile)
ifneq ($(CPPTRACE_MAKE),)
cpptrace:
cd cpptrace; cmake -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../cpptrace_install; cmake --build build; cmake --install build
cpptrace-clean:
$(MAKE) clean -C cpptrace
else
cpptrace:
cpptrace-clean:
endif
18 changes: 17 additions & 1 deletion README.md
@@ -46,7 +46,6 @@ make; make install
This will install python modules that are used to manipulate trace files.
It will also build and install the native tracer, rpd_tracer.


## Quickstart

+ Install per the [Installation](#installation) section.
@@ -133,3 +132,20 @@ make clean
```

Follow the README.md file within the autocoplite submodule for additional instructions and examples for how to run.

### cpptrace submodule setup

The cpptrace submodule adds the ability to capture stacktraces for every HIP API invocation. The module needs to be initialized and updated for this:
```sh
git submodule update --init --recursive
```

This command initializes, fetches, and checks out the submodule at the commit specified in the main repository.

To update the submodule at any time and pull the latest changes, run:

```sh
git submodule update --remote
```

`make` will subsequently build `cpptrace` and link `rpd_tracer` against it. Enabling stacktrace capture requires setting `RPDT_STACKFRAMES=1`.
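
The `Logger::init` change in this commit reads `RPDT_STACKFRAMES` with `atoi` and enables recording for any non-zero result, so values such as `yes` or `true` leave it disabled. A rough Python sketch of that check, useful for validating a launch script before profiling; the function name `stackframes_enabled` is hypothetical, and the regular expression only approximates `atoi` (leading whitespace, optional sign, decimal digits):

```python
import os
import re

# Approximation of what atoi() consumes: whitespace, optional sign, digits.
_LEADING_INT = re.compile(r"\s*([+-]?\d+)")


def stackframes_enabled(environ=os.environ):
    """Mirror the tracer's check: unset or atoi(value) == 0 means disabled."""
    value = environ.get("RPDT_STACKFRAMES")
    if value is None:
        return False
    match = _LEADING_INT.match(value)
    # atoi() returns 0 when no digits can be parsed, e.g. for "yes".
    return match is not None and int(match.group(1)) != 0


if __name__ == "__main__":
    print("stack frame recording enabled:", stackframes_enabled())
```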
1 change: 1 addition & 0 deletions cpptrace
Submodule cpptrace added at 6689d1
3 changes: 2 additions & 1 deletion docs/perf_playbooks/README.md
@@ -7,4 +7,5 @@ Available currently:
- Documentation of rpd tables and their use in performance analysis [RPD tables](rpd-tables.md).
- Capturing and analyzing GPU frequencies during workload execution in [frequency capture](freq-capture.md).
- Extracting collectives and benchmarking them standalone in [collective tuning](collective-tuning.md).
- Variability analysis to gauge performance impact in [variability-analysis.md](variability-analysis.md) and example SQL commands for [matched](variability-analysis.sql) and [unmatched](variability-analysis_nolaunch.sql) rpds.
- Variability analysis to gauge performance impact in [variability-analysis](variability-analysis.md) and example SQL commands for [matched](variability-analysis.sql) and [unmatched](variability-analysis_nolaunch.sql) rpds.
- Call stack analysis to analyze where HIP APIs were called from in [stackframe-analysis](stackframe-analysis.md).
355 changes: 355 additions & 0 deletions docs/perf_playbooks/stackframe-analysis.md

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions install.sh
@@ -1,3 +1,5 @@
#!/bin/bash
apt-get install -y sqlite3 libsqlite3-dev libfmt-dev
apt-get install -y libzstd-dev

make; make install
12 changes: 11 additions & 1 deletion rocpd/models.py
@@ -74,7 +74,7 @@ class KernelApi(Api):
groupSegmentSize = models.IntegerField(default=0)
privateSegmentSize = models.IntegerField(default=0)
codeObject = models.ForeignKey(KernelCodeObject, on_delete=models.PROTECT)
kernelName = models.ForeignKey(String, on_delete=models.PROTECT)
kernelName = models.ForeignKey(String, related_name='+', on_delete=models.PROTECT)
kernelArgAddress = models.CharField(max_length=18) #64 bit int
aquireFence = models.CharField(max_length=8) #(none, agent, system)
releaseFence = models.CharField(max_length=8) #(none, agent, system)
@@ -93,6 +93,11 @@ class CopyApi(Api):
sync = models.BooleanField()
pinned = models.BooleanField()

class AnnotationApi(Api):
domain = models.ForeignKey(String, related_name='+', on_delete=models.PROTECT)
category = models.ForeignKey(String, related_name='+', on_delete=models.PROTECT)
data = models.CharField(max_length=8)

class BarrierOp(Op):
#op = models.OneToOneField(Ops, on_delete=models.PROTECT, primary_key=True)
signalCount = models.IntegerField()
@@ -118,6 +123,11 @@ class MonitorType(models.TextChoices):
end = models.IntegerField(default=0)
value = models.CharField(max_length=255)

class StackFrame(models.Model):
api = models.ForeignKey(Api, related_name='+', on_delete=models.PROTECT)
depth = models.IntegerField(default=0)
name = models.ForeignKey(String, related_name='+', on_delete=models.PROTECT)

#class InputSignal(models.Model)
# op = models.ForeignKey(Ops, on_delete=models.PROTECT)
# inputOp = models.ForeignKey(Ops, on_delete=models.PROTECT)
1 change: 1 addition & 0 deletions rocpd_python/rocpd/schema_data/tableSchema.cmd
@@ -9,6 +9,7 @@ CREATE TABLE IF NOT EXISTS "rocpd_api_ops" ("id" integer NOT NULL PRIMARY KEY AU
CREATE TABLE IF NOT EXISTS "rocpd_kernelapi" ("api_ptr_id" integer NOT NULL PRIMARY KEY REFERENCES "rocpd_api" ("id") DEFERRABLE INITIALLY DEFERRED, "stream" varchar(18) NOT NULL, "gridX" integer NOT NULL, "gridY" integer NOT NULL, "gridZ" integer NOT NULL, "workgroupX" integer NOT NULL, "workgroupY" integer NOT NULL, "workgroupZ" integer NOT NULL, "groupSegmentSize" integer NOT NULL, "privateSegmentSize" integer NOT NULL, "kernelArgAddress" varchar(18) NOT NULL, "aquireFence" varchar(8) NOT NULL, "releaseFence" varchar(8) NOT NULL, "codeObject_id" integer NOT NULL REFERENCES "rocpd_kernelcodeobject" ("id") DEFERRABLE INITIALLY DEFERRED, "kernelName_id" integer NOT NULL REFERENCES "rocpd_string" ("id") DEFERRABLE INITIALLY DEFERRED);
CREATE TABLE IF NOT EXISTS "rocpd_metadata" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "tag" varchar(4096) NOT NULL, "value" varchar(4096) NOT NULL);
CREATE TABLE IF NOT EXISTS "rocpd_monitor" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "deviceType" varchar(16) NOT NULL, "deviceId" integer NOT NULL, "monitorType" varchar(16) NOT NULL, "start" integer NOT NULL, "end" integer NOT NULL, "value" varchar(255) NOT NULL);
CREATE TABLE IF NOT EXISTS "rocpd_stackframe" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "api_ptr_id" integer NOT NULL REFERENCES "rocpd_api" ("id") DEFERRABLE INITIALLY DEFERRED, "depth" integer NOT NULL, "name_id" integer NOT NULL REFERENCES "rocpd_string" ("id") DEFERRABLE INITIALLY DEFERRED);


INSERT INTO "rocpd_metadata"(tag, value) VALUES ("schema_version", "2")
3 changes: 3 additions & 0 deletions rocpd_python/rocpd/schema_data/utilitySchema.cmd
@@ -13,3 +13,6 @@ CREATE VIEW copy AS SELECT B.id, pid, tid, start, end, C.string AS apiName, stre

-- Async copies (op timing)
CREATE VIEW copyop AS SELECT B.id, gpuId, queueId, sequenceId, B.start, B.end, (B.end-B.start) AS duration, stream, size, width, height, kind, dst, src, dstDevice, srcDevice, sync, pinned, E.string AS apiName FROM rocpd_api_ops A JOIN rocpd_op B ON B.id = A.op_id JOIN rocpd_copyapi C ON C.api_ptr_id = A.api_id JOIN rocpd_api D on D.id = A.api_id JOIN rocpd_string E ON E.id = D.apiName_id;

-- Stack Frames
CREATE VIEW stackframe AS SELECT B.id, C.string, depth, D.string FROM rocpd_stackframe A JOIN rocpd_api B ON B.id = A.api_ptr_id JOIN rocpd_string C ON C.id = B.apiname_id JOIN rocpd_string D ON D.id = A.name_id;
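
A hedged usage sketch for the new `stackframe` view, in Python with the standard `sqlite3` module. The view definition is copied from the line above; the three tables are cut down to the columns the view touches, and all row values are invented. Because the view selects two columns that are both called `string`, the sketch reads columns by position rather than by name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real rocpd tables.
    CREATE TABLE rocpd_string (id INTEGER PRIMARY KEY, string TEXT NOT NULL);
    CREATE TABLE rocpd_api (id INTEGER PRIMARY KEY, apiName_id INTEGER NOT NULL);
    CREATE TABLE rocpd_stackframe (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        api_ptr_id INTEGER NOT NULL,
        depth INTEGER NOT NULL,
        name_id INTEGER NOT NULL
    );

    -- View definition as added to utilitySchema.cmd.
    CREATE VIEW stackframe AS SELECT B.id, C.string, depth, D.string FROM rocpd_stackframe A JOIN rocpd_api B ON B.id = A.api_ptr_id JOIN rocpd_string C ON C.id = B.apiname_id JOIN rocpd_string D ON D.id = A.name_id;

    -- Invented sample data: two API calls with two frames each.
    INSERT INTO rocpd_string (id, string) VALUES
        (1, 'hipMalloc'), (2, 'hipFree'),
        (10, 'main'), (11, 'setup_buffers'), (12, 'teardown');
    INSERT INTO rocpd_api (id, apiName_id) VALUES (100, 1), (101, 2);
    INSERT INTO rocpd_stackframe (api_ptr_id, depth, name_id) VALUES
        (100, 0, 11), (100, 1, 10), (101, 0, 12), (101, 1, 10);
""")

# Columns by position: 1 = API id, 2 = API name, 3 = depth, 4 = frame text.
rows = conn.execute("SELECT * FROM stackframe ORDER BY 1, 3").fetchall()

# Group frames per API call and render one line per call.
stacks = {}
for api_id, api_name, _depth, frame in rows:
    stacks.setdefault((api_id, api_name), []).append(frame)

lines = [
    f"{api_name} <- " + " <- ".join(frames)
    for (_api_id, api_name), frames in stacks.items()
]
for line in lines:
    print(line)
```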
1 change: 1 addition & 0 deletions rpd_tracer/CuptiDataSource.cpp
@@ -608,6 +608,7 @@ void CUPTIAPI CuptiDataSource::api_callback(void *userdata, CUpti_CallbackDomain
break;
}
logger.apiTable().insert(row);
unwind(logger, name, row.api_id);
}
}
else if (domain == CUPTI_CB_DOMAIN_NVTX) {
11 changes: 11 additions & 0 deletions rpd_tracer/Logger.cpp
@@ -135,6 +135,7 @@ void Logger::rpdflush()
m_opTable->flush();
m_apiTable->flush();
m_monitorTable->flush();
m_stackFrameTable->flush();

const timestamp_t cb_end_time = clocktime_ns();
createOverheadRecord(cb_begin_time, cb_end_time, "rpdflush", "");
@@ -200,6 +201,7 @@ void Logger::init()
m_opTable = new OpTable(filename);
m_apiTable = new ApiTable(filename);
m_monitorTable = new MonitorTable(filename);
m_stackFrameTable = new StackFrameTable(filename);

// Offset primary keys so they do not collide between sessions
sqlite3_int64 offset = m_metadataTable->sessionId() * (sqlite3_int64(1) << 32);
@@ -209,6 +211,7 @@ void Logger::init()
m_copyApiTable->setIdOffset(offset);
m_opTable->setIdOffset(offset);
m_apiTable->setIdOffset(offset);
m_stackFrameTable->setIdOffset(offset);

// Create one instance of each available datasource
std::list<std::string> factories = {
@@ -259,6 +262,13 @@ void Logger::init()
m_worker = new std::thread(&Logger::autoflushWorker, this);
}
}

// Enable stack frame recording
const char *stackframe = getenv("RPDT_STACKFRAMES");
if (stackframe != nullptr) {
int val = atoi(stackframe);
m_writeStackFrames = (val != 0);
}
}

static bool doFinalize = true;
@@ -286,6 +296,7 @@ void Logger::finalize()
m_kernelApiTable->finalize();
m_copyApiTable->finalize();
m_monitorTable->finalize();
m_stackFrameTable->finalize();
m_writeOverheadRecords = false; // Don't make any new overhead records (api calls)
m_apiTable->finalize();
m_stringTable->finalize(); // String table last
5 changes: 4 additions & 1 deletion rpd_tracer/Logger.h
@@ -45,7 +45,7 @@ class Logger
CopyApiTable &copyApiTable() { return *m_copyApiTable; }
ApiTable &apiTable() { return *m_apiTable; }
MonitorTable &monitorTable() { return *m_monitorTable; }

StackFrameTable &stackFrameTable() { return *m_stackFrameTable; }

// External control to start/stop logging
void rpdstart();
@@ -66,6 +66,7 @@ class Logger
static void rpdFinalize() __attribute__((destructor));

const std::string filename() { return m_filename; };
bool writeStackFrames() { return m_writeStackFrames; };

private:
int m_activeCount {0};
@@ -80,12 +81,14 @@ class Logger
CopyApiTable *m_copyApiTable {nullptr};
ApiTable *m_apiTable {nullptr};
MonitorTable *m_monitorTable {nullptr};
StackFrameTable *m_stackFrameTable {nullptr};

void init();
void finalize();

std::string m_filename;
bool m_writeOverheadRecords {true};
bool m_writeStackFrames {false};

bool m_done {false};
int m_period{1};
10 changes: 9 additions & 1 deletion rpd_tracer/Makefile
@@ -3,14 +3,15 @@ PREFIX = /usr/local

HIP_PATH?= $(wildcard /opt/rocm/)
CUDA_PATH?= $(wildcard /usr/local/cuda/)
CPPTRACE_INCLUDE_PATH?= $(wildcard ../cpptrace_install/include)

HIPCC=$(HIP_PATH)/bin/hipcc

TARGET=hcc

RPD_LIBS = -lsqlite3 -lfmt
RPD_INCLUDES =
RPD_SRCS = Table.cpp BufferedTable.cpp OpTable.cpp KernelApiTable.cpp CopyApiTable.cpp ApiTable.cpp StringTable.cpp MetadataTable.cpp MonitorTable.cpp ApiIdList.cpp DbResource.cpp Logger.cpp
RPD_SRCS = Table.cpp BufferedTable.cpp OpTable.cpp KernelApiTable.cpp CopyApiTable.cpp ApiTable.cpp StringTable.cpp MetadataTable.cpp MonitorTable.cpp StackFrameTable.cpp ApiIdList.cpp DbResource.cpp Logger.cpp Unwind.cpp

ifneq (,$(HIP_PATH))
$(info Building with roctracer)
@@ -27,6 +28,13 @@ ifneq ($(CUDA_PATH),)
RPD_SRCS += CuptiDataSource.cpp
endif

ifneq ($(CPPTRACE_INCLUDE_PATH),)
$(info Building with cpptrace)
RPD_INCLUDES += -DRPD_STACKFRAME_SUPPORT
RPD_LIBS += -L$(CPPTRACE_INCLUDE_PATH)/../lib -lcpptrace -ldwarf -lz -lzstd -ldl
RPD_INCLUDES += -I $(CPPTRACE_INCLUDE_PATH)
endif

RPD_OBJS = $(RPD_SRCS:.cpp=.o)


6 changes: 4 additions & 2 deletions rpd_tracer/RoctracerDataSource.cpp
@@ -104,6 +104,7 @@ namespace {
int mapDeviceId(int id) { return id - deviceOffset; };
} // namespace


void RoctracerDataSource::api_callback(
uint32_t domain,
uint32_t cid,
@@ -138,12 +139,12 @@ void RoctracerDataSource::api_callback(
std::snprintf(buff, 4096, "ptr=%p | size=0x%x",
*data->args.hipMalloc.ptr,
(uint32_t)(data->args.hipMalloc.size));
row.args_id = logger.stringTable().getOrCreate(std::string(buff));
row.args_id = logger.stringTable().getOrCreate(std::string(buff));
break;
case HIP_API_ID_hipFree:
std::snprintf(buff, 4096, "ptr=%p",
data->args.hipFree.ptr);
row.args_id = logger.stringTable().getOrCreate(std::string(buff));
row.args_id = logger.stringTable().getOrCreate(std::string(buff));
break;

case HIP_API_ID_hipLaunchCooperativeKernelMultiDevice:
@@ -746,6 +747,7 @@ void RoctracerDataSource::api_callback(
}
#endif
logger.apiTable().insert(row);
unwind(logger, name, row.api_id);
}
}

2 changes: 1 addition & 1 deletion rpd_tracer/RoctracerDataSource.h
@@ -29,6 +29,7 @@

#include "DataSource.h"
#include "ApiIdList.h"
#include "Logger.h"

class RocmApiIdList : public ApiIdList
{
@@ -54,5 +55,4 @@ class RoctracerDataSource : public DataSource
roctracer_pool_t *m_hccPool{nullptr};
static void api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void* arg);
static void hcc_activity_callback(const char* begin, const char* end, void* arg);

};