
Finite Lifetime for IO Tensors #51


Open: wants to merge 18 commits into devel from pr/finite-lifetime-io
Conversation

Victor-Jung
Member

@Victor-Jung Victor-Jung commented Mar 17, 2025

This PR is based on the currently open PR #44; please don't review it before PR #44 is merged.

Added

  • Two attributes, is_input and is_output, to VariableBuffer, indicating that a VariableBuffer is an IO of the network. IO buffers have to be treated differently from normal buffers: while they live in the global scope, inputs need to be alive at the beginning of the computation (lifetime[0] = 0), and outputs have to be alive at the end of the computation (lifetime[-1] = inf).
  • One comprehensive test of the memory map generated by the tiler.
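The IO lifetime convention described above can be sketched as follows. This is a minimal illustration with hypothetical names (VariableBuffer here is a stand-in, and io_lifetime is not an actual Deeploy function); it only shows how the is_input/is_output flags clamp a buffer's lifetime interval.

```python
import math

class VariableBuffer:
    """Minimal stand-in for a tensor buffer; names are illustrative."""

    def __init__(self, name, is_input=False, is_output=False):
        self.name = name
        self.is_input = is_input    # must be alive from the start of the computation
        self.is_output = is_output  # must stay alive until the end of the computation

def io_lifetime(buffer, first_use, last_use):
    """Clamp a buffer's (start, end) lifetime according to its IO role."""
    start = 0 if buffer.is_input else first_use
    end = math.inf if buffer.is_output else last_use
    return (start, end)

print(io_lifetime(VariableBuffer("x", is_input=True), 3, 7))   # (0, 7)
print(io_lifetime(VariableBuffer("y", is_output=True), 3, 7))  # (3, inf)
print(io_lifetime(VariableBuffer("tmp"), 3, 7))                # (3, 7)
```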

Changed

  • The _calculateLifetimes method of the MemoryScheduler now computes proper lifetimes for VariableBuffers from the global scope.
  • Memory arena buffers are now added at the beginning of the OrderedDict representing the global context. This is necessary because other global buffers can depend on these arenas for their definition.
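The ordering constraint on the global context can be illustrated with a plain OrderedDict. The buffer names below are made up for illustration; the point is only that a later-registered arena can be moved to the front so that buffers depending on it come after it in definition order.

```python
from collections import OrderedDict

# A global context in which ordinary buffers were registered first.
global_ctxt = OrderedDict()
global_ctxt["weights_0"] = "ConstantBuffer"
global_ctxt["output_0"] = "VariableBuffer"

# The arena buffer must come first in the emitted global context,
# because later buffer definitions may reference it (e.g. pointers
# into the arena).
global_ctxt["L2_arena"] = "ArenaBuffer"
global_ctxt.move_to_end("L2_arena", last=False)

print(list(global_ctxt))  # ['L2_arena', 'weights_0', 'output_0']
```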

Fixed

  • Aligned the memory allocation test so that it fails properly; its previous behavior was broken by the optimization of the memory footprint.
  • Fixed generateBufferAllocationCode for the PULP Deployer. Previously, the output was loaded into L3 at the beginning of the computation. This is unnecessary, breaks when the outputs don't have an infinite lifetime, and is prone to error. Outputs are therefore no longer loaded into L3.
  • The Softmax_fp32 kernel for Snitch has a memory leak when used with 8 cores. Added a comment and constrained its execution to a single core.
  • In _calculateLifetimes: the lifetime of aliased buffers was not correctly computed.
  • Some type hints were written as forward references (represented by strings); this is unnecessary and they have been adapted to proper type hints.
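The aliased-buffer fix mentioned above amounts to merging each alias's usage interval into that of its target, instead of giving the alias an independent lifetime. A minimal sketch (hypothetical names and data layout, assuming lifetimes are (start, end) tuples; not the actual _calculateLifetimes code):

```python
def merge_lifetimes(a, b):
    """Union of two (start, end) intervals: an aliased pair must be
    alive whenever either name is in use."""
    return (min(a[0], b[0]), max(a[1], b[1]))

lifetimes = {"buf": (2, 5)}
alias_of = {"buf_view": "buf"}    # buf_view aliases the storage of buf
alias_usage = {"buf_view": (4, 9)}

# Fold each alias's interval into its target buffer's lifetime.
for alias, target in alias_of.items():
    lifetimes[target] = merge_lifetimes(lifetimes[target], alias_usage[alias])

print(lifetimes["buf"])  # (2, 9)
```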

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, change its link back after review.

@Victor-Jung Victor-Jung requested a review from Xeratec as a code owner March 17, 2025 13:22
@Victor-Jung Victor-Jung self-assigned this Mar 17, 2025
@Victor-Jung Victor-Jung force-pushed the pr/finite-lifetime-io branch from 0084047 to dd1370c Compare March 18, 2025 10:32
@Victor-Jung Victor-Jung marked this pull request as draft March 18, 2025 13:50
@Victor-Jung Victor-Jung marked this pull request as ready for review March 19, 2025 08:01
@Xeratec Xeratec changed the title DRAFT: Finite Lifetime for IO Tensors Finite Lifetime for IO Tensors Mar 19, 2025
Member

@Xeratec Xeratec left a comment

LGTM and I already know how important it is ;) I also like the testMemoryMapCorrectness you added.

No major comments, just some small questions.

Comment on lines -50 to +57
- uint32_t compute_num = snrt_cluster_compute_core_num();
+ uint32_t compute_num = 1; //snrt_cluster_compute_core_num();
  int32_t ldI = compute_num * ${input_samples};
  int32_t batch_offset = ${seq_len} * ${input_samples};

- ${kernelName}(${data_in}, ${data_out}, ldI, batch_offset, batch_size, ${seq_len}, ${input_samples});
+ // JUNGVI: This implementation is broken and has memory leak.
+ if (snrt_hartid() == 0){
+     ${kernelName}(${data_in}, ${data_out}, ldI, batch_offset, batch_size, ${seq_len}, ${input_samples});
+ }

Should this be part of this PR? How hard is it to fix it or did Run already fix it?

Comment on lines 305 to +306
  # SCHEREMO: ConstantBuffers are assigned and allocated at compile time, Global Var Buffers are assigned at init time
- if isinstance(ctxt.lookup(tensorMemoryConstraint.tensorName), ConstantBuffer) or ctxt.is_global(
-         tensorMemoryConstraint.tensorName):
+ if isinstance(ctxt.lookup(tensorMemoryConstraint.tensorName), ConstantBuffer):

Please adapt the comment according to your changes.

Comment on lines +140 to +141
+ if hasattr(ctxt.lookup(buffer.name), "_alias"):
+     continue

Could it be helpful to mark aliased buffers with a special color or some other visual cue?

Comment on lines -378 to +383
- def setupModel(self, ctxt: NetworkContext, schedule: Schedule, layerBinding: OrderedDict[str, ONNXLayer],
+ def setupModel(self, ctxt: NetworkContext, schedule: Schedule, layerBinding: 'OrderedDict[str, ONNXLayer]',

Why this change? The ONNXLayer is imported above.
