-
Notifications
You must be signed in to change notification settings - Fork 430
Initial refactoring of edge storage #1085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
28c4331 to
42693d6
Compare
| iterator operator--() { | ||
| value_ -= 1; | ||
| return *this; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the corresponding operators (also != below)? They don't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry what? This comment doesn't make a lot of sense.
vpr/src/route/rr_node_storage.h
Outdated
| t_edge_size num_edges(const RRNodeId& id) const { | ||
| auto first_id = first_edge_[id]; | ||
| auto second_id = (&first_edge_[id])[1]; | ||
| return (size_t)second_id - (size_t)first_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dangerous to use auto followed by a cast. I'd prefer an explicit type here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't actually a cast, it's a operator size_t.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could consider using size_t(second_id) - size_t(first_id) to make this more explicit. That's how its done elsewhere in VPR.
e6267aa to
87d0ef8
Compare
|
There are many uses of |
|
Sanity check with Before: After: This jump in ~1.2 GiB is basically the cost of storing the source rr node explicitly. I'll noddle over alternative strategies. |
87d0ef8 to
25f946f
Compare
So in an unexpected turn, while the max_rss after create device is higher than before, because the heap is not thrashed, the final max_rss is much better: Before: After: I'll need to do a full |
kmurray
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial thoughts after looking through this is that its looking good!
It is interesting that some parts of it are looking structurally more like the proposed RRGraph in #1046 (e.g. SoA form edge storage). I think ultimately moving towards that style of interface (pass an ID to get an attribute from an element in the graph) is where we want to go.
I have a couple more detailed comments below including a possible approach to avoid storing the edge_src_nodes_ after edge partitioning but still allowing (moderatly fast) access to an edge's source node.
| vtr::vector<RRNodeId, RREdgeId> first_edge_; | ||
| vtr::vector<RRNodeId, t_edge_size> fan_in_; | ||
|
|
||
| vtr::vector<RREdgeId, RRNodeId> edge_src_node_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you noted, the additional storage of the source node is non-trivial (since there are more edges than nodes).
It's nice to have this information, but it isn't currently used (e.g. in the router). So if we really want to focus on memory usage, we could consider dropping it from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more, given the sorted structure of the edges and that we know the start/end of each node's out-going edges, it seems like we should be able to get the source node for a given edge ID in log(num_nodes) time via binary search against the (sorted) t_rr_node_data::first_edge_s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary reason for the edge_src_node_ data is to sort the underlying data. After that is complete, we can discard the array.
vpr/src/route/rr_node_storage.h
Outdated
| t_edge_size num_edges(const RRNodeId& id) const { | ||
| auto first_id = first_edge_[id]; | ||
| auto second_id = (&first_edge_[id])[1]; | ||
| return (size_t)second_id - (size_t)first_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could consider using size_t(second_id) - size_t(first_id) to make this more explicit. That's how its done elsewhere in VPR.
vpr/src/route/rr_node_storage.h
Outdated
| }; | ||
|
|
||
| // RR node and edge storage class. | ||
| class t_rr_node_storage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point this struct is really now representing the RR graph itself. Perhaps it should be renamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed
| return; | ||
| } | ||
|
|
||
| edges_read_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on clear on what this variable is for, and how it differs from partitioned_. Likely need comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments added
| const auto& device_ctx = g_vpr_ctx.device(); | ||
| std::stable_sort( | ||
| edge_sort_iterator(this, 0), | ||
| edge_sort_iterator(this, edge_src_node_.size()), | ||
| edge_compare_src_node_and_configurable_first(device_ctx.rr_switch_inf)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need some comments describing what's going on here.
I think its accomplishing:
- Ordering edges by src_node (so edges with a shared source node are contiguous)
- Partitioning the edges for each src_node by configurable/non-configurable
with a single sort. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your understanding is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added
vpr/src/route/rr_node_storage.h
Outdated
|
|
||
| // Take allocated edges in edge_src_node_/ edge_dest_node_ / edge_switch_ | ||
| // sort, and assign the first edge for each | ||
| void assign_edges(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename assign_first_edges()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| * the edge data is directly usable for each node by simply slicing the arrays. | ||
| * | ||
| * */ | ||
| struct edge_swapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice way to keep the swapping clean in Struct-of-Arrays form!
| RREdgeId edge_id(const RRNodeId& id, t_edge_size iedge) const { | ||
| RREdgeId first_edge = first_edge_[id]; | ||
| RREdgeId ret((size_t)first_edge + iedge); | ||
| VTR_ASSERT_SAFE(ret < (&first_edge_[id])[1]); | ||
| return ret; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That we can do this easily given the edge ordering is a really nice feature, which I expect will make porting/refactoring downstream code much easier.
|
I think I've identified a way to regain some of the lost CPU performance. I've started a vtr_reg_weekly QoR run, but it won't return results for a day or two. I'll push a new branch today with a rebase onto of master, along with fixes to review comments. |
25f946f to
aca9c27
Compare
|
I believe all feedback has been addressed. I've kicked off a QoR run, results will be ready in a couple days. Preliminary results show the CPU cost is gone, with the 4% memory increase remaining. #1096 resolves the memory increase, and changes it to a 35-60% memory reduction. |
2b055d6 to
bbf1d04
Compare
Signed-off-by: Keith Rothman <[email protected]>
This should have a negliable performance impact, but this enables future changes to modify how rr nodes and rr edges are storaged. Signed-off-by: Keith Rothman <[email protected]>
This changes edge storage from an allocation array of struct per node to struct of array for all edge data. Several algorithms over edges that were previous per node per edge, but were actually just iteration over edges are now part of rr_node_storage. Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
This enables 16-byte alignment (4 nodes per cache line). Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
bbf1d04 to
e96a3ba
Compare
Description
This changes edge storage from an allocation array of struct per node to
struct of array for all edge data.
Several algorithms over edges that were previous per node per edge, but
were actually just iteration over edges are now part of rr_node_storage.
This PR is built on top #1084 .
Related Issue
#1079
#1081
#1084
Motivation and Context
This changes the number of heap allocations on preallocated array (e.g. loading rr graph from file) from
Nnodesto3.This changes reduces the number of heap allocations during rr graph building from
Nnodesto3 * ceil(log2(Nedges) - log2(10*Nodes)).When the number of edges is known the max memory usage for edges is now
Nedge * (2*sizeof(int)+sizeof(short)). When the number of edges is not known, the max memory usage for edges is now:How Has This Been Tested?
Types of changes
Checklist: