-
-
Notifications
You must be signed in to change notification settings - Fork 541
Preliminary support for NVIDIA Jetson boards #1692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
I fear the option of |
|
Another problem is the conflict with #1620, which is an attempt to unify the GPU meter structure to one interface. |
linux/NvidiaJetson.c
Outdated
| RichString_appendAscii(out, CRT_colors[METER_VALUE], buffer); | ||
|
|
||
| RichString_appendAscii(out, CRT_colors[METER_TEXT], " temp:"); | ||
| xSnprintf(buffer, sizeof(buffer), "%.1f°C", this->values[JETSON_GPU_TEMP]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use CRT_degreeSign rather than hard-code a degree sign here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Additionally supported Fahrenheit.
I looked through 'main' branch implementation of the GpuMeter. If I understand correctly, it collects information about the GPU usage from each running process. NVIDIA Jetson has a different approach - it provides a separate GPU statistics via sysfs / custom nvgpu driver. Since all the NVIDIA Jetson specific code is hidden under the C define 'NVIDIA_JETSON', there should be no code collisions. Semantically, the switch 'NVIDIA_JETSON' for GPU might turn off all the future code in #1620 and turning on the Jetson specific GPU code (anyway, all the data is already collected by the nvgpu driver). You could merge the final version of the #1620 first, then I'll figure out how to reuse it correctly, on the next big holidays :) |
The main purpose of the commit was to minimize the interference with the major code base for the default x86_64 platform. Honestly, I do not want to compile in the nvidia jetson board specific code anywhere else. What approach would you recommend here? |
53914a0 to
68ddb34
Compare
There are two ideas that came in my mind.
Update: Oh no. Nvidia didn't use a unique machine type for their GCC cross-toolchain. Reference |
@BenBE , as a maintainer, are you agree? If so, I will fix according to the idea №2. |
|
There's some internal discussion still going on. We're still discussing which direction we'd like to move forward in. |
0f0f553 to
109046f
Compare
|
Changes:
Added the Action for this functionality. Pressing 'g' hot key the main screen shows only the processes which uses GPU right now. Having the GPU_MEM field, you see the current GPU load per process. Useful, I guess. Hope, you'll utilize the same approach in your future development. |
|
I've left all the deep details in both: the commit message and the NvidiaJetson.c file. Have a look, please. @BenBE Finally, with "Jetson GPU" Meter and "g" hot key applied, with GPU_MEM field, the "htop" looks like this: |
CPUMeter.c
Outdated
| #else | ||
| #ifdef BUILD_WITH_CPU_TEMP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe flatten into one #if block as:
#if defined(BUILD_WITH_CPU_TEMP) && !defined(NVIDIA_JETSON)
…
#elif defined(NVIDIA_JETSON)
…
#endifThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the relationship between the macro NVIDIA_JETSON and BUILD_WITH_CPU_TEMP? At a glance they don't seem to be mutually exclusive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICS, BUILD_WITH_CPU_TEMP is set, whenever you are building with temperature sensor support (i.e. libsensors or similar source), It's also set, when building with NVIDIA Jetson support.
You also could check for libsensors in the first block AFAICS, but that should be properly evaluated it that hits all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The CPU temperature showing code is rewritten.
- The duplication is removed.
- Additional settings showCPUTemperatureFractional is added.
- All decisions are explained in comments.
linux/LinuxMachine.c
Outdated
| if (settings->showCPUFrequency | ||
| #ifdef HAVE_SENSORS_SENSORS_H | ||
| || settings->showCPUTemperature | ||
| || settings->showCPUTemperature /* TODO: looks like this line in the condition might be removed */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain your comment on this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Removed
/* TODO: ... */comment. Let's keep backward compatibility. - Explanation.
The semantics is broken. If showCPUTemperature is activated, the application starts to read CPU frequencies from the system via LinuxMachine_scanCPUFrequency function. There's no connection with the CPU temperature. CPU frequency is not shown in case if showCPUFrequency is turned off. No commentary why it's done. Looks like some old forgotten hack (maybe there was a connection cpu_frequency with cpu_temperature some time ago, or other platforms have/had these connection).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code inside the linux directory only affects Linux itself, thus there should be no connection with other platforms here. And if the code guarded by this condition no longer has any references to CPU temperatures, then this condition can be removed.
IIRC libsensors is used for CPU temperature stuff, but this was refactored some time ago. I thus think it's more likely a remnant from that refactoring.
106efc9 to
af4c327
Compare
af4c327 to
212c424
Compare
|
Fixed issues, @BenBE, have a look, please. |
Action.c
Outdated
| #ifdef NVIDIA_JETSON | ||
| #include "NvidiaJetson.h" | ||
| #include "ProcessTable.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Includes should be kept near the top of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
CRT.h
Outdated
| #ifdef NVIDIA_JETSON | ||
| GPU_FILTER, | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to define this conditional. Also, probably group near the GPU_ENGINE_x fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
CRT.c
Outdated
| #ifdef NVIDIA_JETSON | ||
| [GPU_FILTER] = A_BOLD | ColorPair(Red, Cyan), | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be inserted for every color scheme. Also see the note regarding conditional defines/ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
XUtils.h
Outdated
| return r; | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one blank line between functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
linux/NvidiaJetson.c
Outdated
| char CPU_TEMP_SENSOR_PATH[64]; | ||
| char GPU_TEMP_SENSOR_PATH[64]; | ||
| char GPU_FREQ_SENSOR_PATH[64]; | ||
| char GPU_LOAD_SENSOR_PATH[64]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only pre-processor macros should have uppercase names.
Also not quite happy with fixed-size buffers for filenames here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed, changed to the xMalloc'ed strings
212c424 to
82c18be
Compare
XUtils.c
Outdated
| char *xSnprintfDup(char *buf, const size_t len, const char *fmt, ...) { | ||
| assert(buf != NULL); | ||
| assert(len > 0); | ||
|
|
||
| va_list vl; | ||
| va_start(vl, fmt); | ||
| int n = vsnprintf(buf, len, fmt, vl); | ||
| va_end(vl); | ||
|
|
||
| if (n < 0 || (size_t)n >= len) { | ||
| fail(); | ||
| } | ||
|
|
||
| /* vsnprintf returns number of bytes without terminating null character */ | ||
| size_t slen = n + 1; | ||
| void *data = malloc(slen); | ||
| if (!data) { | ||
| fail(); | ||
| } | ||
|
|
||
| memcpy(data, buf, slen); | ||
| return data; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a case for xAsprintf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Really, haven't seen the asprintf() before :)
Non POSIX, but pretty useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You rarely ever see it in code, but when you see it, it's quite helpful.
There's often the question regarding whether or not to force that one additional allocation, but when not inside a tight loop, that's usually fine.
Machine.c
Outdated
|
|
||
| #ifdef NVIDIA_JETSON | ||
| NvidiaJetson_LoadGpuProcessTable(this->activeTable->table); | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the Jetson code is placed in the Linux platform directory, I'd've expected this call somewhere inthe LinuxMachine code …
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to the LinuxProcessTable::ProcessTable_goThroughEntries.
82c18be to
f096e35
Compare
NVIDIA Jetson device is an insdustrial Linux based embedded aarch64
platfrom with powerful builtin GPU, which is used for AI tasks,
mostly for CV purposes.
The support is provided via --enable-nvidia-jetson switch in the
configure script.
All the source code related to the NVIDIA Jetson is placed in the
linux/NvidiaJetson.{h,c} source files and hidden by 'NVIDIA_JETSON'
C preprocessor define. So, for x86_64 platforms the
source code stays unchanged.
Additional functionality added by this commit:
1. Fix for the CPU temperature reading. The Jetson device is not
supported by libsensors. The CPU has 8 cores with only one CPU
temperature sensor for all of them located in the thermal zone file.
libsensor might be compiled in or turned off. The additional care was
taken to provide successfull build with/without libsensors.
2. The Jetson GPU Meter was added: current load, frequency and
temperature.
3. The exact GPU memory allocated by each process is loaded from the
nvgpu kernel driver via sysfs and merged to the LinuxProcess data
(field LinuxProcess::gpu_mem). The field "GPU_MEM" visualizes this
field. For root user only.
4. Additional filter for processes which use GPU right now via hot
key 'g', the help is supplied. For root user only.
== Technical details ==
The code tries to find out the correct sensors during the application
startup. As an example, the sensors location for NVIDIA Jetson Orin
are the following:
- CPU temperature: /sys/devices/virtual/thermal/thermal_zone0/type
- GPU temperature: /sys/devices/virtual/thermal/thermal_zone1/type
- GPU frequency: /sys/class/devfreq/17000000.gpu/cur_freq
- GPU curr load: /sys/class/devfreq/17000000.gpu/device/load
Measure:
- The GPU frequency is provided in Hz, shown in MHz.
- The CPU/GPU temperatures are provided in Celsius multipled by 1000
(milli Celsius), shown in Cesius
P.S. The GUI shows all temperatures for NVIDIA Jetson with additional
precision comparing to the default x86_64 platform.
If htop starts with root privileges (effective user id is 0), the
experimental code activates. It reads the fixed sysfs file
/sys/kernel/debug/nvmap/iovmm/clients with the following content, e.g.:
```
CLIENT PROCESS PID SIZE
user gpu_burn 7979 23525644K
user gnome_shell 8119 5800K
user Xorg 2651 17876K
total 23549320K
```
Unfortunately, the /sys/kernel/debug/* files are allowed to read only for
the root user, that's why the restriction applies.
The patch also adds a separate field 'GPU_MEM', which reads data from
the added LinuxProcess::gpu_mem field. The field stores memory allocated for GPU
in kilobytes. It is populated by the function NvidiaJetson_LoadGpuProcessTable
(the implementation is located in NvidiaJetson.c), which is called at the end of
the function Machine_scanTables.
Additionally, the new Action is added: actionToggleGpuFilter, which is activated by
'g' hot key (the help is updated appropriately). The GpuFilter shows only the
processes which currently utilize GPU (i.e. highly extended nvmap/iovmm/clients table).
It is achieved by the filtering machinery associated with ProcessTable::pidMatchList.
The code below constructs GPU_PID_MATCH_LIST hash table, then actionToggleGpuFilter
either stores it to the ProcessTable::pidMatchList or restores old value of
ProcessTable::pidMatchList.
The separate LinuxProcess's PROCESS_FLAG_LINUX_GPU_JETSON (or something ...) flag isn't
added for GPU_MEM, because currently the functionality of population LinuxProcess::gpu_mem
is shared with the GPU consumers filter construction.
So, even if GPU_MEM field is not activated, the filter showing GPU consumers should work.
This kind of architecture is chosen intentially since it saves memory for the hash table
GPU_PID_MATCH_LIST (which is now actually a set), and therefore increases performance.
All other approaches convert GPU_PID_MATCH_LIST to a true key/value storage (key = pid,
value = gpu memory allocated) with further merge code.
== NVIDIA Jetson models ==
Tested for NVIDIA Jetson Orin and Xavier boards.
f096e35 to
bd4285e
Compare
|
Polite ping, @BenBE |
NP. Haven't forgotten you, just busy right now (day job). Will take a look as things calm down again; this will take a few weeks though. |

NVIDIA Jetson device is an insdustrial Linux based embedded aarch64 platfrom with powerful builtin GPU, which is used for AI tasks, mostly for CV purposes.
The support is provided via --enable-nvidia-jetson switch in the configure script.
All the source code related to the NVIDIA Jetson is placed in the linux/NvidiaJetson.{h,c} source files and hidden by 'NVIDIA_JETSON' C preprocessor define. So, for x86_64 platforms the source code stays unchanged.
Additional functionality added by this commit:
== Technical details ==
The code tries to find out the correct sensors during the application startup. As an example, the sensors location for NVIDIA Jetson Orin are the following:
Measure:
P.S. The GUI shows all temperatures for NVIDIA Jetson with additional precision comparing to the default x86_64 platform.
== NVIDIA Jetson models ==
Tested for NVIDIA Jetson Orin and Xavier boards.