Skip to content
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
217 changes: 207 additions & 10 deletions sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -340,6 +340,8 @@ enum class node_type {
memadvise,
ext_oneapi_barrier,
host_task,
async_malloc,
async_free
};

class node {
Expand Down Expand Up @@ -724,6 +726,8 @@ public:
void update(node& node);
void update(const std::vector<node>& nodes);
void update(const command_graph<graph_state::modifiable>& graph);

size_t get_required_mem_size() const noexcept;
};

} // namespace sycl::ext::oneapi::experimental
Expand Down Expand Up @@ -938,6 +942,8 @@ Both the source and target graphs for the update must satisfy the following
conditions:

* Both graphs must have been created with the same device and context.
* Neither graph may have any nodes of type `node_type::async_malloc` or
`node_type::async_free`.
* Both graphs must be topologically identical. The graphs are considered
topologically identical when:

Expand All @@ -946,7 +952,9 @@ conditions:
** Nodes must be added in the same order in the two graphs. Nodes may be added
via `command_graph::add`, or for a recorded queue via `queue::submit` or
queue shortcut functions.
** Corresponding nodes in each graph must be kernels that have the same type:
** Corresponding nodes in each graph must have the same `node_type`.
** Corresponding nodes of type `node_type::kernel` must have kernels with
identical types:

*** When the kernel is defined as a lambda, the lambda must be the same.
*** When the kernel is defined as a named function object, the kernel class
Expand All @@ -970,6 +978,147 @@ If a node containing a dynamic parameter is updated through the whole graph
update API, then any previous updates to the dynamic parameter will be reflected
in the new graph.

==== Graph-Owned Memory Allocations [[graph-memory-allocations]]

:async_alloc_spec: xref:../proposed/sycl_ext_oneapi_async_memory_alloc.asciidoc

It can be desirable for a graph to own and manage memory allocations for memory
associated with commands in the graph. This is made possible by using the
`async_<malloc/malloc_from_pool>` and `async_free` commands from the
{async_alloc_spec}[sycl_ext_oneapi_async_memory_alloc] extension. These
commands can be added to a graph either via queue recording or explicit graph
creation, which will create allocations which are owned and managed by that
specific `command_graph`, and who's lifetimes are tied to the lifetime of that
graph.

Pointers returned from allocation nodes can be used in other graph nodes in the
same way as regular USM pointers.

===== API Usage

Malloc and free nodes can be added to a graph via both the explicit and queue
recording graph APIs using the `async_<malloc/malloc_from_pool/free>` free
functions inside a command-group:

[source,c++]
----
void* Ptr = nullptr;
size_t AllocSize = 1024;
auto CGF = [&](handler &CGH){
Ptr = async_malloc(CGH, usm::alloc::device, AllocSize);
}

// Explicit graph creation
Graph.add(CGF);

Graph.add([&](handler &CGH){
async_free(CGH, Ptr);
});

// Queue recording
Graph.begin_recording(Queue);
Queue.submit(CGF);
Queue.submit([&](handler &CGH){
async_free(CGH, Ptr);
});
Graph.end_recording(Queue);
----

The `async_*` functions which take a queue can also be used with queue
recording, particularly when recording an in-order queue to specify dependencies
as no SYCL event is returned.

[source,c++]
----
void* Ptr = nullptr;
size_t AllocSize = 1024;
queue Queue {syclContext, syclDevice, {property::queue::in_order{}}};

Graph.begin_recording(Queue);
Ptr = async_malloc(Queue, usm::alloc::device, AllocSize);
async_free(Queue, Ptr);
Graph.end_recording(Queue);
----

===== Supported Features [[allocation-supported-features]]

Currently only device allocations are supported. Attempting to add allocations
of any other type to a graph will result in synchronous errors being thrown
with error code `feature_not_supported`.

===== Restrictions

The following restrictions apply to any graph containing async malloc or free
nodes:

* Only one executable graph instance for a given modifiable graph (created by
finalizing the modifiable graph) can be alive at any time, and all copies of
that instance (created via the {crs}[common reference semantics] of the
`command_graph` class) must be destroyed before the graph can be finalized
again.
* The graph cannot be used as a sub-graph in another graph.
* Graph memory allocation nodes cannot be updated, and graphs containing these
nodes cannot be updated via <<whole-graph-update, Whole Graph Update>>.

Attempting to perform any of the above operations will result in a sychronous
error being thrown with error code `invalid`.

===== Allocation Lifetime

The lifetime of graph-owned allocations are tied to the lifetime of the graph
itself.

It is only valid to use the pointers returned from graph allocation nodes inside
the graph in which they were allocated. Any nodes using these allocations must
be ordered after the allocation node and before the free node for that
allocation. Failure to do so will result in undefined behavior.

It is invalid to use these pointers outside of the owning graph and doing so
will result in undefined behavior.

===== Behaviour

The semantics of `async_malloc` and `async_free` within a graph differ from the
non-graph usage described in the
{async_alloc_spec}[sycl_ext_oneapi_async_memory_alloc] extension.

* Graph memory allocations are not made directly from any default or
user-provided Memory Pool. Each graph containing async mlloc/free nodes
maintains its own pool of memory from which allocations are made.
The following properties of a default or user-provided memory pool provided in
calls to `async_<malloc/malloc_from_pool>` will be respected for the associated
graph allocations, all other properties will be ignored:

** The allocation type specified when creating the pool with
`usm::alloc::<host/device/shared>`, subject to the limitations in the
<<allocation-supported-features, supported features>> section.

** `property::memory_pool::zero_init` - Allocated memory will be
zero-initialized only once when first allocated. It will not be zero-initialized
again before or during any subsequent executions of the graph. If that is
required by the application it is the responsibility of the user to add the
appropriate commands to the graph to do this.

* `node_type::async_malloc` nodes within a graph will return a pointer to an
allocation of the provided size. This pointer can then be used in other graph
nodes ordered after that node in the same way any USM pointer would be.

* `node_type::async_free` nodes within a graph indicate that a given allocation
is no longer in use. They must be ordered after the associated allocation node.
The pointer provided to `async_free` must be the address of a memory allocation
allocated by an async malloc node in the same graph.
Violating these preconditions will result in undefined behavior.

* Other nodes which use a given graph allocation must be ordered via
dependencies such that they are ordered after the allocation node and before the
free node for a given allocation. It is the user's responsibility to ensure that
dependencies are correct. Using a pointer in a graph command ordered after it
has been freed via an `async_free` node results in undefined behavior.

The total amount of memory required for graph allocations by an executable graph
can be queried using the `command_graph::get_required_mem_size()` member
function.

==== Graph Properties [[graph-properties]]

===== No-Cycle-Check Property
Expand Down Expand Up @@ -1153,8 +1302,12 @@ _Throws:_
and this command uses a buffer. See the
<<assume-buffer-outlives-graph-property, Assume-Buffer-Outlives-Graph>>
property for more information.
* An `exception` with error code `invalid` if the type of the command-group is
not a kernel execution and a `dynamic_parameter` was registered inside `cgf`.
* An `exception` with error code `invalid` if the type of the command contained
in the command-group is `async_malloc` and the `usm::alloc` type of the
associated memory pool is not `usm::alloc::device`.
* An `exception` with error code `invalid` if the type of the command contained
in the command-group is not a kernel execution and a `dynamic_parameter` was
registered inside `cgf`.

[source,c++]
----
Expand Down Expand Up @@ -1222,18 +1375,25 @@ finalize(const property_list& propList = {}) const;

_Effects:_ Synchronous operation that creates a new graph in the executable state with a
fixed topology that can be submitted for execution on any queue sharing the
context associated with the graph. It is valid to call this method multiple times
to create subsequent executable graphs. It is also valid to continue to add new
nodes to the modifiable graph instance after calling this function. It is valid
to finalize an empty graph instance with no recorded commands.
context associated with the graph. It is valid to call this member function
multiple times to create subsequent executable graphs, unless the graph contains
<<graph-memory-allocations, graph-owned memory allocations>>. It is also valid
to continue to add new nodes to the modifiable graph instance after calling this
function. It is valid to finalize an empty graph instance with no recorded
commands.

_Constraints:_ This member function is only available when the `command_graph`
state is `graph_state::modifiable`.

_Returns:_ A new executable graph object which can be submitted to a queue.

_Throws:_ Synchronously `exception` with error code `feature_not_supported` if
the graph contains a command that is not supported by the device.
_Throws:_

* Synchronously `exception` with error code `feature_not_supported` if
the graph contains a command that is not supported by the device.
* An `exception` with error code `invalid` if the graph contains
<<graph-memory-allocations, graph-owned memory allocations>> and any instance
of an executable graph created from this modifiable graph is still alive.

[source,c++]
----
Expand Down Expand Up @@ -1267,6 +1427,17 @@ std::vector<node> get_root_nodes() const;

_Returns:_ A list of all nodes in the graph which have no dependencies.

[source,c++]
----
size_t get_required_mem_size() const noexcept;
----

_Constraints:_ This member function is only available when the `command_graph` state is
`graph_state::executable`.

_Returns:_ The total size in bytes of the memory required for
<<graph-memory-allocations, graph-owned memory allocations>> in this graph.

===== Member functions of the `command_graph` class for graph update

[source,c++]
Expand All @@ -1293,6 +1464,8 @@ _Throws:_
created.
* An `exception` with error code `invalid` if `node` is not part of the
graph.
* An `exception` with error code `invalid` if the type of `node` is either
`node_type::async_malloc` or `node_type::async_free`.
* If any other `exception` is thrown the state of the graph node is undefined.

[source,c++]
Expand All @@ -1318,6 +1491,8 @@ _Throws:_
`property::graph::updatable` was not set when the executable graph was created.
* An `exception` with error code `invalid` if any node in `nodes` is not part of the
graph.
* An `exception` with error code `invalid` if the type of any node in `nodes` is
either `node_type::async_malloc` or `node_type::async_free`.
* If any other `exception` is thrown the state of the graph nodes is undefined.

[source, c++]
Expand Down Expand Up @@ -1361,6 +1536,9 @@ _Throws:_
`property::graph::updatable` was not set when the executable graph was
created.

* Synchronous `exception` with error code `invalid` if the graph contains any
<<graph-memory-allocations, graph-owned memory allocations>>.

* If any other `exception` is thrown the state of the graph nodes is undefined.

===== Member functions of the `command_graph` class for queue recording
Expand Down Expand Up @@ -1637,6 +1815,9 @@ as the device and context used on creation of the graph.

_Returns:_ An event which represents the command which is submitted to the queue.

_Throws:_ Synchronous `exception` with error code `invalid` if `graph` contains
any <<graph-memory-allocations, graph-owned memory allocations>>.

[source,c++]
----
event
Expand All @@ -1657,6 +1838,9 @@ are the same as the device and context used on creation of the graph.

_Returns:_ An event which represents the command which is submitted to the queue.

_Throws:_ Synchronous `exception` with error code `invalid` if `graph` contains
any <<graph-memory-allocations, graph-owned memory allocations>>.

[source,c++]
----
event
Expand All @@ -1677,6 +1861,9 @@ are the same as the device and context used on creation of the graph.

_Returns:_ An event which represents the command which is submitted to the queue.

_Throws:_ Synchronous `exception` with error code `invalid` if `graph` contains
any <<graph-memory-allocations, graph-owned memory allocations>>.

==== New Handler Member Functions

===== Additional member functions of the `sycl::handler` class
Expand All @@ -1692,9 +1879,12 @@ execute at any time. If `graph` is submitted multiple times, dependencies
are automatically added by the runtime to prevent concurrent executions of
an identical graph.

_Throws:_ Synchronously `exception` with error code `invalid` if the handler
_Throws:_
* Synchronously `exception` with error code `invalid` if the handler
is submitted to a queue which is associated with a device or context that is
different from the device and context used on creation of the graph.
* Synchronous `exception` with error code `invalid` if `graph` contains
any <<graph-memory-allocations, graph-owned memory allocations>>.

[source,c++]
----
Expand Down Expand Up @@ -1888,6 +2078,13 @@ recording mode, as opposed to throwing.
This section defines the interaction of `sycl_ext_oneapi_graph` with other
extensions.

==== sycl_ext_oneapi_async_memory_alloc

The APIs defined in
link:../proposed/sycl_ext_oneapi_async_memory_alloc.asciidoc[sycl_ext_oneapi_async_memory_alloc]
are supported for use in graphs. For further details see the section on
<<graph-memory-allocations, Graph-owned memory allocations>>.

==== sycl_ext_codeplay_enqueue_native_command

The new methods defined by
Expand Down
Loading