Variorium Connector: kernel granularity and JSON#265
Conversation
|
Please explain the behavior of the connector tool including options. What is the minimum variorum version required? We should check for that when trying to find |
masterleinad
left a comment
There was a problem hiding this comment.
Please provide sample output.
| #include <inttypes.h> | ||
| #include <chrono> | ||
| #include <cstdlib> | ||
| #include <cstring> | ||
| #include <vector> | ||
| #include <unordered_set> | ||
| #include <string> | ||
| #include <regex> | ||
| #include <ctime> | ||
| #include <cxxabi.h> | ||
| #include <dlfcn.h> | ||
| #include <ctime> | ||
| #include <chrono> | ||
| #include <iostream> | ||
| #include <fstream> | ||
|
|
||
| #include <inttypes.h> | ||
| #include <iostream> | ||
| #include <regex> | ||
| #include <stdio.h> | ||
| #include <string> | ||
| #include <unordered_set> | ||
| #include <vector> |
There was a problem hiding this comment.
Which of these header files are actually required? Please make sure to only use what you need.
| int power = -1; | ||
| int power1 = 0; | ||
| int power2 = 0; | ||
| int filemake = -1; | ||
| long long time1 = 0; | ||
| long long time2 = 0; | ||
| uint32_t gdevID = -1; | ||
| std::vector<float> gpu_powers; | ||
| std::vector<float> gpu_powers2; | ||
| std::string output = ""; |
There was a problem hiding this comment.
What are all these global variables used for? Please add comments in the code explaining their usage.
| int type_of_profiling = | ||
| 0; // 0 is for both print power & json, 1 is for print power, 2 is for json | ||
| bool usingMPI = false; | ||
| bool verbosePrint = false; | ||
| bool mpiOutPut = false; | ||
| 0; // 0 is for both print power & json, 1 is for print power, 2 is for json | ||
| bool usingMPI = false; | ||
| bool verbosePrint = true; | ||
| bool mpiOutPut = false; |
There was a problem hiding this comment.
Are all of these still used/necessary?
| /*void printFile() { | ||
|
|
||
| std::ofstream file("variorumoutput.txt", std::ios::app); // Open in append | ||
| mode if (!file) { std::cerr << "Error creating the file!" << std::endl; } else { | ||
| std::cout << "File created or opened successfully." << std::endl; | ||
| } | ||
| file.close(); // Close the file | ||
| i}*/ |
There was a problem hiding this comment.
Please remove unused code.
| file.close(); // Close the file | ||
| i}*/ | ||
| void printFile() { | ||
| const std::string filename = "variorumoutput.txt"; |
There was a problem hiding this comment.
Maybe make this configurable via an environment variable.
| // std::cout << "Number of Sockets: " << num_sockets << | ||
| // std::endl; |
There was a problem hiding this comment.
| // std::cout << "Number of Sockets: " << num_sockets << | |
| // std::endl; |
| char *num_sockets_pos = strstr(s, "\"num_gpus_per_socket\":"); | ||
| if (num_sockets_pos != nullptr) { | ||
| num_sockets_pos += strlen("\"num_gpus_per_socket\":"); | ||
|
|
||
| char *num_sockets_end_pos = strchr(num_sockets_pos, ','); | ||
| if (num_sockets_end_pos == nullptr) { | ||
| num_sockets_end_pos = strchr(num_sockets_pos, '}'); | ||
| } | ||
| if (num_sockets_end_pos != nullptr) { | ||
|
|
||
| std::string num_sockets_str(num_sockets_pos, | ||
| num_sockets_end_pos - num_sockets_pos); | ||
|
|
||
| num_sockets = std::stoll(num_sockets_str); |
There was a problem hiding this comment.
Please try using jansson here as well (and below).
| std::cout << " Gpu start power: " << gpu_powers2[gdevID] | ||
| << " Gpu end power: " << gpu_powers[gdevID] | ||
| << " Energy Estimation: " | ||
| << ((gpu_powers[gdevID] + gpu_powers2[0]) / 2) * | ||
| ((temp - time1) * .001) | ||
| << std::endl; |
There was a problem hiding this comment.
Let's just not print anything to stdout but only to the file.
| /* std::cout << "kokkos library call\n" << std::endl; | ||
| if (usingMPI) { | ||
| variorum_call_mpi(); | ||
| } else { | ||
| variorum_call(); | ||
| } | ||
| time_t total_time; | ||
| time_t end_time; | ||
| time(&end_time); | ||
| std::cout << "End Time: " << end_time << "\nStart Time: " << start_time | ||
| << "\n"; | ||
| total_time = end_time - start_time; | ||
|
|
||
| std::cout << "The kokkos library was alive for " << total_time << " | ||
| seconds." | ||
| << std::endl;*/ |
There was a problem hiding this comment.
| /* std::cout << "kokkos library call\n" << std::endl; | |
| if (usingMPI) { | |
| variorum_call_mpi(); | |
| } else { | |
| variorum_call(); | |
| } | |
| time_t total_time; | |
| time_t end_time; | |
| time(&end_time); | |
| std::cout << "End Time: " << end_time << "\nStart Time: " << start_time | |
| << "\n"; | |
| total_time = end_time - start_time; | |
| std::cout << "The kokkos library was alive for " << total_time << " | |
| seconds." | |
| << std::endl;*/ |
There was a problem hiding this comment.
@twilk10 Is this suggestion OK with you? If so, can you commit it?
It is good with me.
| std::ostream &operator<<( | ||
| std::ostream &os, | ||
| const Kokkos::Tools::Experimental::ExecutionSpaceIdentifier &identifier) { | ||
|
|
||
| os << " Type: " << identifier.type << ""; | ||
| os << " Device ID: " << identifier.device_id << ""; | ||
| gdevID = identifier.device_id; | ||
| os << " Instance ID: " << identifier.instance_id; | ||
| // output += " Device ID: " + std::to_string(identifier.device_id) + | ||
| // " Instance ID: " + | ||
| // std::to_string(identifier.instance_id); | ||
| writeToFile("variorumoutput.txt", output); | ||
| return os; | ||
| } |
There was a problem hiding this comment.
Don't overload operator<< but just write to the output file directly.
|
Please use |
+1 @twilk10 Thanks for submitting this PR! To provide an explanation of this connector tool’s options and a high-level behavior from the Kokkos user perspective: maybe you can add |
Thank you I will work on getting that information up as soon as possible. |
Thanks! Be sure to run clang-format-8 on each code file that you changed in this PR, as @masterleinad mentions in one of his comments. This is a prereq for your PR to be merged. Note that this is the CI check that doesn't pass. |
I ran clang-format 8 on it I will have to evaluate what went wrong and try it again. |
@twilk10 OK. You want to do something like:
You can diff in between to see whether and/or how the file has indeed changed and then If that doesn't work, can you send the error output with the flag |
|
@twilk10 Can you look at @masterleinad 's comments particularly on printing output and parsing json with jansson? I think they are good suggestions and maybe you can commit those before committing other changes with clang-format. Also, make sure to note to not include any unneeded headers. |
I will take a look and try updating withthe help of these suggestions. |
I am having trouble getting access to clang-format-8 but I can get other versions. |
I have solved this issue. |
|
Great! All checks have passed for clang-format-8. Can you address the comments by @masterleinad ? (You can commit suggestions if you agree). I also updated the title to capture the changes you made in your code better. @masterleinad Are there any other things to resolve here? |
I am currently working on updating the connector to address the comments made by @masterleinad |
| @@ -1,4 +1,5 @@ | |||
| //@HEADER | |||
| //@HEADER140 | |||
There was a problem hiding this comment.
| //@HEADER140 | |
| //@HEADER |
There was a problem hiding this comment.
You still have HEADER140 here.
| std::ostream &operator<<( | ||
| std::ostream &os, | ||
| const Kokkos::Tools::Experimental::DeviceType &deviceType) { | ||
| switch (deviceType) { | ||
| case Kokkos::Tools::Experimental::DeviceType::Serial: | ||
| os << "CPU"; | ||
| output += "Type: CPU"; | ||
| break; | ||
| case Kokkos::Tools::Experimental::DeviceType::OpenMP: | ||
| os << "OpenMP"; | ||
| output += "Type: OpenMP"; | ||
| break; | ||
| case Kokkos::Tools::Experimental::DeviceType::Cuda: | ||
| os << "cuda"; | ||
| output += "Type: CUDA"; | ||
| break; | ||
| case Kokkos::Tools::Experimental::DeviceType::HIP: | ||
| os << "hip"; | ||
| output += "Type: HIP"; | ||
| break; | ||
| case Kokkos::Tools::Experimental::DeviceType::OpenMPTarget: | ||
| os << "openmptarget"; | ||
| output += "Type: openmptarget"; | ||
| break; | ||
| case Kokkos::Tools::Experimental::DeviceType::HPX: | ||
| os << "hpx"; | ||
| output += "Type: hpx"; | ||
| break; | ||
| case Kokkos::Tools::Experimental::DeviceType::Threads: | ||
| os << "threads"; | ||
| output += "Type: threads"; | ||
| break; | ||
| case Kokkos::Tools::Experimental::DeviceType::SYCL: | ||
| os << "sycl"; | ||
| output += "Type: SYCL"; | ||
| break; | ||
| case Kokkos::Tools::Experimental::DeviceType::OpenACC: | ||
| os << "openacc"; | ||
| output += "Type: OPENACC"; | ||
| break; | ||
|
|
||
| default: | ||
| os << "Unknown Device Type"; | ||
| output += "Type: Uknown Device Type"; | ||
| break; | ||
| } | ||
| time_t total_time; | ||
| time_t end_time; | ||
| time(&end_time); | ||
| std::cout << "End Time: " << end_time << "\nStart Time: " << start_time | ||
| << "\n"; | ||
| total_time = end_time - start_time; | ||
|
|
||
| std::cout << "The kokkos library was alive for " << total_time << " seconds." | ||
| << std::endl; | ||
| return os; |
There was a problem hiding this comment.
Define a function such as
kokkos-tools/profiling/space-time-stack/kp_space_time_stack.cpp
Lines 54 to 67 in 5e895a3
Kokkos::Tools::Experimental::DeviceType implicitly printable.
|
This is an update to the Kokkos-tool variorum connector and can be used to get the power data an Compatible with variorum version 6.0 |
|
You pushed your build directory. You need to remove it. |
| @@ -1,4 +1,5 @@ | |||
| //@HEADER | |||
| //@HEADER140 | |||
| //#include <cxxabi.h> | ||
| //#include <dlfcn.h> |
There was a problem hiding this comment.
| //#include <cxxabi.h> | |
| //#include <dlfcn.h> |
If you don't need it, remove it.
| if (gdevID == 0) { | ||
| switch (gdevID) { | ||
| case 1: { | ||
| json_t* gpu_value = json_object_get(power_gpu_watts, "GPU_1"); |
There was a problem hiding this comment.
| json_t* gpu_value = json_object_get(power_gpu_watts, "GPU_1"); | |
| json_t* gpu_value = json_object_get(power_gpu_watts, "GPU_" + std::to_string(gdevID)); |
and remove all the switch cases.
| output = " Energy Estimation " + std::to_string(((power1 + power2) / 2) * | ||
| ((temp - time1) * .001)); |
| std::cout << "No MPI Option provided, not using per rank output" | ||
| << std::endl; | ||
| usingMPI = false; | ||
| // usingMPI = false; |
There was a problem hiding this comment.
| // usingMPI = false; |
| std::ostream& operator<<( | ||
| std::ostream& os, | ||
| const Kokkos::Tools::Experimental::ExecutionSpaceIdentifier& identifier) { | ||
| gdevID = identifier.device_id; | ||
|
|
||
| output += | ||
| " Device ID: " + std::to_string(identifier.device_id) + | ||
| " Instance ID: " + std::to_string(identifier.instance_id) + | ||
| " DeviceType: " + | ||
| deviceTypeToString(static_cast<Kokkos::Tools::Experimental::DeviceType>( | ||
| identifier.device_id)); | ||
|
|
||
| writeToFile(filename, output); | ||
| return os; | ||
| } |
There was a problem hiding this comment.
Please don't introduce an overload for operator<< but use a stringstream or write to the file immediately instead.
| writeToFile(filename, "name: " + std::string(name)); | ||
| gdevID = devID; | ||
| auto result = Kokkos::Tools::Experimental::identifier_from_devid(devID); | ||
| std::cout << result; |
There was a problem hiding this comment.
Please don't print to cout anywhere but only to the output file.
Please provide output from the incremental tests or so. |
| // Initial and final power values for a kernel | ||
| double global_power[2] = {0, 0}; | ||
| // Initial and final time value for a kernel | ||
| long long global_time[2] = {0, 0}; | ||
| uint32_t global_device_id = -1; | ||
| uint32_t global_instance_id = -1; |
There was a problem hiding this comment.
Obvisously, this tool is not thread-safe.
| uint32_t global_instance_id = -1; | ||
| std::string global_filename; | ||
| std::string global_kernel_name; | ||
| std::string global_device_type; |
There was a problem hiding this comment.
why do we need so many global variables?
There was a problem hiding this comment.
We need to record a bunch of information in the "begin kernel" calls that we are using in the "end kernel" calls. We can't pass that through the API interfaces directly.
|
Might be a good idea to move forward with this PR |
A review would help. 🙂 |
dalg24
left a comment
There was a problem hiding this comment.
replying to #265 (comment)
I will defer to you.
| variorum_call(); | ||
| } | ||
| auto result = Kokkos::Tools::Experimental::identifier_from_devid(devID); | ||
| global_kernel_name = std::string(name); |
There was a problem hiding this comment.
| global_kernel_name = std::string(name); | |
| global_kernel_name = name; |
Same comment everywhere
| json_t* my_power_obj = nullptr; | ||
| my_power_obj = json_object(); |
There was a problem hiding this comment.
| json_t* my_power_obj = nullptr; | |
| my_power_obj = json_object(); | |
| json_t* my_power_obj = json_object(); |
There was a problem hiding this comment.
Fragment from previous version, remove my_power_obj. We're using root now instead.
| if (!root) { | ||
| fprintf(stderr, "Error parsing JSON: %s\n", error.text); | ||
| } |
|
Thanks @masterleinad for adding the CI testing. Seems to compile fine with Variorum for me. |
|
I am ok with this if CI with Variorum succeeds. |
| cmake --install build-with-${{ matrix.preset }} --prefix=${Kokkos_ROOT} | ||
| - name: Build Kokkos Tools, enabling examples | ||
| run: | | ||
| export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH};${VARIORUM_ROOT} |
There was a problem hiding this comment.
I expect the environment variable already adds it to the search path
There was a problem hiding this comment.
Apparently it is spelled with an uppercase
V.Please adjust your env variable accordingly and remove the line I highlighted.
There was a problem hiding this comment.
The previous commit had Variorum_ROOT as environemnt variable and didn't find variorum.
kokkos-tools/cmake/configure_variorum.cmake
Lines 8 to 10 in dd08b98
There was a problem hiding this comment.
That configure cmake file logic is absurd and need fixing.
(On my cell can't navigate easily but in short it should not write to Variorum_ROOT that is nonsensical)
Updates to the current connector with updates such as json updates, energy estimation, device information. It has also been altered to get power information at the kernel level.