Improve and rename gpuinfo, simplify cl_hot_functions sample
anton-v-gorshkov committed Apr 14, 2021
1 parent 0f88019 commit df968c9
Showing 18 changed files with 488 additions and 631 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -48,7 +48,7 @@ You may obtain a copy of the License at https://opensource.org/licenses/MIT
- [onetrace](tools/onetrace) - host and device tracing tool for OpenCL(TM) and Level Zero backends with support of DPC++ (both for CPU and GPU) and OpenMP* GPU offload;
- [ze_tracer](tools/ze_tracer) - "Swiss army knife" for Level Zero API call tracing and profiling (former ze_intercept);
- [cl_tracer](tools/cl_tracer) - "Swiss army knife" for OpenCL(TM) API call tracing and profiling;
- [gpu_info](tools/gpu_info) - provides basic information about the GPUs installed in a system, and the list of HW metrics one can collect for it;
- [gpuinfo](tools/gpuinfo) - provides basic information about the GPUs installed in a system, and the list of HW metrics one can collect for it;

## Sample Tools & Utilities
- tools for OpenCL(TM), DPC++ (with OpenCL(TM) backend) and OpenMP* GPU offload (with OpenCL(TM) backend):
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.20.1
0.21.0
2 changes: 1 addition & 1 deletion chapters/device_activity_tracing/OpenCL.md
@@ -20,7 +20,7 @@ Intel(R) Xeon(R) Processor / Intel(R) Core(TM) Processor (CPU) Runtimes use `Que
- Windows

**Supported HW**:
- Intel(R) Processor Graphics GEN9+
- Any

**Needed Headers**:
- OpenCL(TM) [headers](https://github.com/KhronosGroup/OpenCL-Headers)
6 changes: 4 additions & 2 deletions chapters/metrics_collection/MetricsDiscoveryAPI.md
@@ -295,5 +295,7 @@ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_libmd.so> ./<application>
[Compute Architecture Manuals](https://software.intel.com/en-us/articles/intel-graphics-developers-guides) to learn more on Intel(R) Processor Graphics architecture

## Samples
- [GPU Info](../../tools/gpu_info)
- [GPU Metrics for OpenCL(TM)](../../samples/cl_gpu_metrics)
- [GPU Metrics for OpenCL(TM)](../../samples/cl_gpu_metrics)

## Tools
- [GPU Info](../../tools/gpuinfo)
8 changes: 6 additions & 2 deletions chapters/runtime_api_tracing/OpenCL.md
@@ -49,7 +49,7 @@ cl_int CL_API_CALL clGetTracingStateINTEL(
- Windows
**Supported HW**:
- Any
- Intel(R) Processor Graphics, Intel(R) Xeon(R) Processor and Intel(R) Core(TM) Processor
**Needed Headers**:
- OpenCL(TM) [headers](https://github.com/KhronosGroup/OpenCL-Headers)
@@ -115,4 +115,8 @@ void Callback(cl_function_id fid,
- [OpenCL(TM) Hot Functions](../../samples/cl_hot_functions)
- [OpenCL(TM) Hot Kernels](../../samples/cl_hot_kernels)
- [OpenCL(TM) Debug Info](../../samples/cl_debug_info)
- [OpenCL(TM) GPU Metrics](../../samples/cl_gpu_metrics)
- [OpenCL(TM) GPU Metrics](../../samples/cl_gpu_metrics)

## Tools
- [OpenCL(TM) Tracer](../../tools/cl_tracer)
- [Tracing and Profiling Tool for Data Parallel C++ (DPC++)](../../tools/onetrace)
66 changes: 25 additions & 41 deletions samples/cl_hot_functions/README.md
@@ -1,43 +1,30 @@
# OpenCL(TM) Hot Functions
## Overview
This sample is a simple LD_PRELOAD based tool that allows to collect all called OpenCL(TM) API functions within an application along with their total execution time and call count.
This is a simple LD_PRELOAD based tool that allows to collect all called OpenCL(TM) API functions within an application along with their total execution time and call count for GPU device.

As a result, table like the following will be printed.
```
=== API Timing Results: ===
Total Execution Time (ns): 363687486
Total API Time for CPU backend (ns): 524
Total API Time for GPU backend (ns): 355355363
== CPU Backend: ==
Function, Calls, Time (ns), Time (%), Average (ns), Min (ns), Max (ns)
clGetDeviceIDs, 1, 524, 100.00, 524, 524, 524
== GPU Backend: ==
Function, Calls, Time (ns), Time (%), Average (ns), Min (ns), Max (ns)
clBuildProgram, 1, 173888026, 48.93, 173888026, 173888026, 173888026
clFinish, 4, 172908147, 48.66, 43227036, 42711785, 44318785
clEnqueueWriteBuffer, 8, 4636256, 1.30, 579532, 207825, 1864890
clEnqueueReadBuffer, 4, 2051244, 0.58, 512811, 498662, 542971
clEnqueueNDRangeKernel, 4, 1623139, 0.46, 405784, 236120, 609050
clReleaseMemObject, 12, 95182, 0.03, 7931, 3525, 16436
clCreateBuffer, 12, 81056, 0.02, 6754, 2511, 16990
clSetKernelArg, 16, 24515, 0.01, 1532, 141, 7038
clGetEventProfilingInfo, 8, 13139, 0.00, 1642, 103, 3288
clCreateContext, 1, 12680, 0.00, 12680, 12680, 12680
clReleaseProgram, 1, 9503, 0.00, 9503, 9503, 9503
clCreateProgramWithSource, 1, 3880, 0.00, 3880, 3880, 3880
clCreateKernel, 1, 2941, 0.00, 2941, 2941, 2941
clReleaseKernel, 1, 1679, 0.00, 1679, 1679, 1679
clGetKernelInfo, 4, 1617, 0.00, 404, 190, 552
clCreateCommandQueueWithProperties, 1, 1388, 0.00, 1388, 1388, 1388
clGetDeviceIDs, 2, 311, 0.00, 155, 138, 173
clReleaseCommandQueue, 1, 270, 0.00, 270, 270, 270
clGetDeviceInfo, 2, 227, 0.00, 113, 103, 124
clReleaseContext, 1, 163, 0.00, 163, 163, 163
Function, Calls, Time (ns), Average (ns)
clBuildProgram, 1, 183549198, 183549198
clCreateBuffer, 12, 108285, 9023
clCreateCommandQueueWithProperties, 1, 1265, 1265
clCreateContext, 1, 9322, 9322
clCreateKernel, 1, 3428, 3428
clCreateProgramWithSource, 1, 3219, 3219
clEnqueueNDRangeKernel, 4, 2237845, 559461
clEnqueueReadBuffer, 4, 2358133, 589533
clEnqueueWriteBuffer, 8, 5719781, 714972
clFinish, 4, 174064236, 43516059
clGetDeviceIDs, 2, 362, 181
clGetDeviceInfo, 2, 354, 177
clGetEventProfilingInfo, 8, 14198, 1774
clGetKernelInfo, 4, 2411, 602
clReleaseCommandQueue, 1, 1046, 1046
clReleaseContext, 1, 173, 173
clReleaseKernel, 1, 2741, 2741
clReleaseMemObject, 12, 110922, 9243
clReleaseProgram, 1, 11561, 11561
clSetKernelArg, 16, 75282, 4705
```
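The four-column layout above (function name, call count, total time, average) can be produced by a simple per-function accumulator. The following is a minimal C++ sketch under stated assumptions, not the sample's actual collector: `HotFunctionTable`, `FunctionStats`, `AddCall` and `Report` are hypothetical names introduced for illustration, and the alphabetical ordering is inferred from the example output rather than taken from the tool's source.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical per-function record; not the sample's real data structure.
struct FunctionStats {
  uint64_t calls = 0;     // number of completed calls
  uint64_t total_ns = 0;  // accumulated execution time in nanoseconds
};

// Hypothetical accumulator that mimics the table layout shown above.
class HotFunctionTable {
 public:
  // Record one completed API call and its duration in nanoseconds.
  void AddCall(const std::string& name, uint64_t duration_ns) {
    FunctionStats& stats = table_[name];
    stats.calls += 1;
    stats.total_ns += duration_ns;
  }

  // Render the table as comma-separated rows, ordered by function name.
  std::string Report() const {
    std::ostringstream out;
    out << "Function, Calls, Time (ns), Average (ns)\n";
    for (const auto& entry : table_) {
      const FunctionStats& stats = entry.second;
      // Entries are only created by AddCall, so calls is at least 1.
      assert(stats.calls > 0);
      out << entry.first << ", " << stats.calls << ", " << stats.total_ns
          << ", " << stats.total_ns / stats.calls << "\n";
    }
    return out.str();
  }

 private:
  // std::map iterates in key order, which gives the alphabetical rows.
  std::map<std::string, FunctionStats> table_;
};
```

The average is computed with integer division, which is consistent with the rows above (for example, 108285 ns over 12 calls is reported as 9023 ns). A tracing callback would call `AddCall` once per completed API call and `Report` once at exit.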
## Supported OS
- Linux
@@ -48,8 +35,7 @@ clCreateCommandQueueWithProperties, 1, 1388, 0.00,
- [Git](https://git-scm.com/) (version 1.8 and above)
- [Python](https://www.python.org/) (version 2.7 and above)
- [OpenCL(TM) ICD Loader](https://github.com/KhronosGroup/OpenCL-ICD-Loader)
- [Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver](https://github.com/intel/compute-runtime) to run on GPU
- [Intel(R) Xeon(R) Processor / Intel(R) Core(TM) Processor (CPU) Runtimes](https://software.intel.com/en-us/articles/opencl-drivers#cpu-section) to run on CPU
- [Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver](https://github.com/intel/compute-runtime)

## Build and Run
### Linux
@@ -65,10 +51,9 @@ Use this command line to run the tool:
```sh
./cl_hot_functions <target_application>
```
One may use [cl_gemm](../cl_gemm) or [dpc_gemm](../dpc_gemm) as target application:
One may use [cl_gemm](../cl_gemm) as target application:
```sh
./cl_hot_functions ../../cl_gemm/build/cl_gemm
./cl_hot_functions ../../dpc_gemm/build/dpc_gemm cpu
```
### Windows
Use Microsoft* Visual Studio x64 command prompt to run the following commands and build the sample:
@@ -83,8 +68,7 @@ Use this command line to run the tool:
```sh
cl_hot_functions.exe <target_application>
```
One may use [cl_gemm](../cl_gemm) or [dpc_gemm](../dpc_gemm) as target application:
One may use [cl_gemm](../cl_gemm) as target application:
```sh
cl_hot_functions.exe ..\..\cl_gemm\build\cl_gemm.exe
cl_hot_functions.exe ..\..\dpc_gemm\build\dpc_gemm.exe cpu
```
205 changes: 0 additions & 205 deletions samples/cl_hot_functions/cl_api_collector.h

This file was deleted.
