Refactor providers into separate libraries #1190

Open
RyanUnderhill wants to merge 31 commits into main

Conversation

RyanUnderhill (Member):

This removes most of the #if USE_CUDA and #if USE_DML blocks from the model-handling code. Device memory management is now handled through the DeviceSpan structure, and all data copying is done in a device-independent manner.

It's a huge change, and there will be some rough edges when it's submitted. The goal is to unblock other people who need these changes and then to make larger improvements in future PRs.
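To make the shape of the change concrete, here is a minimal, self-contained sketch of the pattern the description refers to: model code calls an abstract device interface instead of branching on #if USE_CUDA / #if USE_DML. All type and method names below are illustrative; they are not the exact DeviceSpan/DeviceBuffer API in this repository.

```cpp
// Minimal sketch (illustrative names only): compile-time #if device branches are
// replaced by a runtime device interface, so model code is device independent.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <memory>
#include <vector>

struct DeviceInterface {
  virtual ~DeviceInterface() = default;
  // Copy size_in_bytes from source to dest, however this device does it.
  virtual void Copy(void* dest, const void* source, size_t size_in_bytes) = 0;
};

struct CpuInterface : DeviceInterface {
  void Copy(void* dest, const void* source, size_t size_in_bytes) override {
    std::memcpy(dest, source, size_in_bytes);  // CPU path: plain memcpy
  }
};

// A CUDA or DML implementation would live in its own library and use its native
// copy primitives; the calling code below would not change.

int main() {
  std::unique_ptr<DeviceInterface> device = std::make_unique<CpuInterface>();
  std::vector<int32_t> next_tokens{1, 2, 3, 4}, input_ids(4);
  device->Copy(input_ids.data(), next_tokens.data(), next_tokens.size() * sizeof(int32_t));
  std::cout << input_ids[2] << "\n";  // prints 3
}
```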

std::string CurrentModulePath();

namespace Generators {
namespace Dml { // If this was in a shared library it wouldn't need to be in its own namespace
Contributor:

Why isn't the CUDA one in its own namespace? If we build with both DML and CUDA, wouldn't GpuMemory overlap (if it weren't for the namespace)?

Member Author:

CUDA is built as a separate shared library, so it shouldn't overlap at all. Though I'm not 100% sure I'm using the right dlopen options, as symbol visibility on non-Windows platforms can behave differently.
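For reference, loading each provider as its own shared library with RTLD_LOCAL keeps its symbols out of the process-wide namespace on Linux, which is what prevents two providers' internal types from colliding. A sketch under assumed names (the library path and entry-point symbol below are placeholders, not the actual ones in this PR):

```cpp
// Sketch: load a provider library without exporting its symbols globally.
// RTLD_LOCAL (the opposite of RTLD_GLOBAL) keeps the library's symbols private,
// so e.g. a GpuMemory type inside one provider can't clash with another's.
#include <dlfcn.h>
#include <stdexcept>

void* LoadCudaProviderEntryPoint() {
  // Placeholder library name; the real provider library name may differ.
  void* handle = dlopen("libonnxruntime-genai-cuda.so", RTLD_NOW | RTLD_LOCAL);
  if (!handle)
    throw std::runtime_error(dlerror());
  // Resolve one explicit entry point instead of relying on global symbol lookup.
  void* entry = dlsym(handle, "GetInterface");  // placeholder symbol name
  if (!entry)
    throw std::runtime_error(dlerror());
  return entry;
}
```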

#endif
}

value_ = OrtValue::CreateTensor<int32_t>(*model_.allocator_device_, shape_);
Contributor:

Where did the static buffer go?

Member Author:

It went away; the existing logic was overly complicated, and I couldn't figure out why we still needed it. Do you know?

Collaborator:

Likely related to the graph-capture logic, but I don't know the details.

}
}
}
// Update input_ids with next tokens
Contributor:

Nit: the comment made me think WrapTensor was doing the update, when it's actually the following code.

Member Author:

Ah, I can see that. I'm not sure what would make it clearer. Maybe an extra newline?

@natke self-requested a review January 27, 2025 18:31
@natke (Contributor) left a comment:

Can you enumerate the places where any #ifdefs remain and explain why they need to be there, please?

And what impact will the rough edges have, and can they be smoothed out before you merge this PR?

@baijumeswani (Collaborator) left a comment:

First pass through the code

@@ -31,7 +31,6 @@ TEST(CAPITests, Config) {
config->AppendProvider("cuda");
#endif
}

Collaborator:

Why remove this line?

Member Author:

Accidental

@@ -19,7 +19,7 @@ void CapturedGraphInfoRecycler::operator()(CapturedGraphInfo* captured_graph_inf
}

CapturedGraphInfoPtr CapturedGraphPool::ReserveCapturedGraph(const Model& model, const GeneratorParams& params) const {
if (!params.use_cuda_graph || (model.device_type_ != DeviceType::CUDA && model.device_type_ != DeviceType::DML)) {
if (!params.use_cuda_graph || (model.device_type_ != DeviceType::CUDA)) {
Collaborator:

Does this mean that DML will no longer use graph capture?

@@ -160,6 +167,8 @@ std::shared_ptr<GeneratorParams> CreateGeneratorParams(const Model& model);
std::shared_ptr<GeneratorParams> CreateGeneratorParams(const Config& config); // For benchmarking purposes only
std::unique_ptr<Generator> CreateGenerator(const Model& model, const GeneratorParams& params);

void CopyThroughCpu(DeviceBuffer& dest, size_t begin_dest, DeviceBuffer& source, size_t begin_source, size_t size_in_bytes);
Collaborator:

A comment explaining what CopyThroughCpu means here would be helpful.

Collaborator:

I see the comment in the cpp. :)
Could we move it here?

Member Author:

Yep, I put it in both places as it was small.

@@ -96,77 +92,73 @@ OrtEnv& GetOrtEnv() {
return *GetOrtGlobals()->env_;
}

// Fallback to copy between two separate device buffers by going through CPU memory (slow unless we're the CPU device)
void CopyThroughCpu(DeviceBuffer& dest, size_t begin_dest, DeviceBuffer& source, size_t begin_source, size_t size_in_bytes) {
Collaborator:

Should this go inside the device interface file as a free function, as opposed to generators.cpp?

Member Author:

It's used by the CUDA provider directly, so having it inside the CPU interface made that difficult. I might change it going forward once we have more shared providers and base-class implementations (inheritance doesn't work across shared libraries, so passing in the CPU provider as the base interface might be the solution).
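For readers following along, the in-source comment above describes the intent: stage the bytes through host memory so any two devices can copy between each other. A minimal sketch of that fallback, with CopyToHost/CopyFromHost as assumed names rather than the repository's actual DeviceBuffer API:

```cpp
// Sketch of a CPU-staged copy between two device buffers. The member functions
// CopyToHost/CopyFromHost are assumed names used only to illustrate the idea.
#include <cstddef>
#include <vector>

template <typename Buffer>
void CopyThroughCpuSketch(Buffer& dest, size_t begin_dest,
                          Buffer& source, size_t begin_source, size_t size_in_bytes) {
  std::vector<std::byte> staging(size_in_bytes);
  source.CopyToHost(staging.data(), begin_source, size_in_bytes);  // device -> CPU
  dest.CopyFromHost(staging.data(), begin_dest, size_in_bytes);    // CPU -> device
  // Slow for device<->device transfers, but it works for any pair of devices,
  // and on the CPU device both calls reduce to plain memcpy.
}
```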

Comment on lines +122 to +123
void DumpSpan(std::ostream& stream, std::span<const float> values) override { return Generators::DumpSpan(stream, values); }
void DumpSpan(std::ostream& stream, std::span<const int> values) override { return Generators::DumpSpan(stream, values); }
Collaborator:

Why is this scope needed?

}
throw std::runtime_error("Unknown device type");
Collaborator:

Are the static analysis tools we have smart enough to detect that control will never reach the end of this function? Do we need a dummy return std::string()?

Member Author:

It was actually a build warning on Android or iOS that spurred me to fix that. I didn't see any other issues after doing it.
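For context, the pattern being discussed looks roughly like this generic sketch (not the exact function in this PR): with a throw after an exhaustive switch, every control path returns or throws, so no dummy return std::string() is needed and the end-of-function warning disappears.

```cpp
// Generic sketch: exhaustive switch plus trailing throw means no path falls off
// the end of the function, so compilers/analyzers don't ask for a dummy return.
#include <stdexcept>
#include <string>

enum class DeviceType { CPU, CUDA, DML };

std::string DeviceTypeName(DeviceType type) {
  switch (type) {
    case DeviceType::CPU:  return "CPU";
    case DeviceType::CUDA: return "CUDA";
    case DeviceType::DML:  return "DML";
  }
  throw std::runtime_error("Unknown device type");  // replaces 'return std::string();'
}
```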

case DeviceType::CUDA:
return GetCudaInterface();
#if USE_DML
Collaborator:

Why do we need this #if and not one for CUDA?

Member Author:

DML isn't built as a shared library yet; only CUDA is. So we can't just try loading it and fail gracefully if it doesn't exist; it's either built in or it isn't. Once it's a shared library, there will be no #ifdef.

I could have the function always exist, but its definition would then be inside another #if !USE_DML, so it's just moving the problem around.
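In other words, the dispatch ends up shaped roughly like the sketch below (self-contained and simplified; it is not the exact code in this PR). The CUDA case can always be compiled because GetCudaInterface() resolves the provider from a shared library at runtime; the DML case is statically linked, so its declaration and case label only exist when built with USE_DML.

```cpp
// Simplified sketch of the dispatch discussed above; names are illustrative.
#include <stdexcept>

struct DeviceInterface;
enum class DeviceType { CPU, CUDA, DML };

DeviceInterface* GetCpuInterface();
DeviceInterface* GetCudaInterface();  // loads the CUDA provider library at runtime
#if USE_DML
DeviceInterface* GetDmlInterface();   // statically linked, only exists in DML builds
#endif

DeviceInterface* GetDeviceInterface(DeviceType type) {
  switch (type) {
    case DeviceType::CPU:
      return GetCpuInterface();
    case DeviceType::CUDA:
      return GetCudaInterface();
#if USE_DML
    case DeviceType::DML:
      return GetDmlInterface();
#endif
  }
  throw std::runtime_error("Unknown device type");
}
```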

@@ -1,3 +1,4 @@
#include "models/onnxruntime_api.h"
Collaborator:

Nit: the license header is missing at the top of the file.

@@ -42,7 +38,7 @@ Whisper_State::Whisper_State(const Whisper_Model& model, DeviceSpan<int32_t> seq
}

if (inputs.alignment_heads != nullptr) {
#if USE_CUDA
#if 0 // USE_CUDA
Collaborator:

Will these USE_CUDA blocks be removed?

Member Author:

Yep, that will happen when Kunal merges his Whisper branch.

@RyanUnderhill (Member Author):

> Can you enumerate the places where any #ifdefs remain and explain why they need to be there, please?
>
> And what impact will the rough edges have, and can they be smoothed out before you merge this PR?

There are some #if USE_CUDA blocks in our tests; those shouldn't be a problem.

There are two #if USE_DML blocks: one in generators.cpp because DML isn't a shared library yet, and a second in model.cpp for a similar reason. Making DML into a shared library should factor those out and remove the #ifs (the shared library's existence takes the place of the #if; when it's statically linked, the build fails without the #if).

The rough edges are just expected simple bugs that we'll find and easily fix, but that I can't track down in advance.


auto& device = GetOrtGlobals()->allocator_device_[static_cast<int>(type)];
if (!device) {
static const char* device_type_names[static_cast<int>(DeviceType::MAX)] = {"CPU - SEE ABOVE", "Cuda", "DML", "WebGPU Buffer"};
Contributor:

Member Author:

Good catch on WebGPU_Buffer.

Doesn't QNN use CPU memory? It doesn't have a device allocator "QnnWithSharedMemory".

Member Author:

Actually, you gave me an idea: this should fail to compile if new providers are added. I fixed it.
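One common way to get that compile-time failure is to declare the name table without an explicit size and static_assert its length against DeviceType::MAX, so adding a provider without a matching name breaks the build. A sketch of the idea under an illustrative enum; the actual fix in the PR may differ:

```cpp
// Sketch: adding a DeviceType entry without a matching name fails the static_assert.
#include <cstddef>
#include <iterator>

enum class DeviceType { CPU, CUDA, DML, WEBGPU, MAX };

static const char* device_type_names[] = {"CPU", "CUDA", "DML", "WebGPU_Buffer"};
static_assert(std::size(device_type_names) == static_cast<std::size_t>(DeviceType::MAX),
              "every DeviceType needs an entry in device_type_names");
```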
