[AUDIO_WORKLET] Added support for MEMORY64 and 2GB heap (including tests) #23508
base: main
Changes from 80 commits
@@ -29,14 +29,47 @@ function createWasmAudioWorkletProcessor(audioParams) {

       // Capture the Wasm function callback to invoke.
       let opts = args.processorOptions;
-      this.callbackFunction = Module['wasmTable'].get(opts['cb']);
-      this.userData = opts['ud'];
+      this.callbackFunction = Module['wasmTable'].get({{{ toIndexType("opts['cb']") }}});
+      this.userData = {{{ toIndexType("opts['ud']") }}};

       // Then the samples per channel to process, fixed for the lifetime of the
-      // context that created this processor. Note for when moving to Web Audio
-      // 1.1: the typed array passed to process() should be the same size as this
-      // 'render quantum size', and this exercise of passing in the value
-      // shouldn't be required (to be verified).
+      // context that created this processor. Even though this 'render quantum
+      // size' is fixed at 128 samples in the 1.0 spec, it will be variable in
+      // the 1.1 spec. It's passed in now, just to prove it's settable, but will
+      // eventually be a property of the AudioWorkletGlobalScope (globalThis).
       this.samplesPerChannel = opts['sc'];
+      this.bytesPerChannel = this.samplesPerChannel * {{{ getNativeTypeSize('float') }}};
+
+      // Create up-front as many typed views for marshalling the output data as
+      // may be required (with an arbitrary maximum of 10, for the case where a
+      // multi-MB stack is passed), allocated at the *top* of the worklet's
+      // stack (and whose addresses are fixed). The 'minimum alloc' firstly
+      // stops STACK_OVERFLOW_CHECK failing (since the stack will be full, with
+      // 16 being the minimum allocation size due to alignments) and leaves room
+      // for a single AudioSampleFrame as a minimum.
+      this.maxBuffers = Math.min(((Module['sz'] - /*minimum alloc*/ 16) / this.bytesPerChannel) | 0, /*sensible limit*/ 10);
+#if ASSERTIONS
+      console.assert(this.maxBuffers > 0, `AudioWorklet needs more stack allocating (at least ${this.samplesPerChannel * 4})`);
+#endif
+      // These are still alloc'd to take advantage of the overflow checks, etc.
+      var oldStackPtr = stackSave();
+      var viewDataIdx = {{{ getHeapOffset('stackAlloc(this.maxBuffers * this.bytesPerChannel)', 'float') }}};
+#if WEBAUDIO_DEBUG
+      console.log(`AudioWorklet creating ${this.maxBuffers} buffer one-time views (for a stack size of ${Module['sz']} at address 0x${(viewDataIdx * 4).toString(16)})`);
+#endif
+      this.outputViews = [];
+      for (var i = this.maxBuffers; i > 0; i--) {
+        // Added in reverse so the lowest indices are closest to the stack top
+        this.outputViews.unshift(
+          HEAPF32.subarray(viewDataIdx, viewDataIdx += this.samplesPerChannel)
+        );
+      }
+      stackRestore(oldStackPtr);
+
+#if ASSERTIONS
+      // Explicitly verify this later in process()
+      this.ctorOldStackPtr = oldStackPtr;
+#endif
     }

     static get parameterDescriptors() {

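For readers unfamiliar with typed-array views over the Emscripten heap, the sketch below is a standalone illustration (plain JS, no Emscripten runtime; all names are invented for the example) of what the constructor above sets up: subarray views carved out of a fixed region alias that memory, so later writes into the region are visible through the views without any copying.

```js
// Minimal sketch, not from the PR: a Float32Array standing in for HEAPF32,
// with fixed subarray views carved from the 'top' of a pretend stack region.
const samplesPerChannel = 128;            // the Web Audio 1.0 render quantum
const stackWords = 1024;                  // pretend worklet stack size, in floats
const heap = new Float32Array(stackWords);
const maxBuffers = 4;
const views = [];
let idx = stackWords - maxBuffers * samplesPerChannel;
for (let i = maxBuffers; i > 0; i--) {
  // unshift mirrors the real code: the last-created (highest-address) view
  // ends up at views[0]
  views.unshift(heap.subarray(idx, idx += samplesPerChannel));
}
heap[stackWords - 1] = 0.5;                   // write through the 'heap'...
console.log(views[0][samplesPerChannel - 1]); // ...and 0.5 is visible via the view
```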
@@ -51,75 +84,122 @@ function createWasmAudioWorkletProcessor(audioParams) {
       let numInputs = inputList.length,
         numOutputs = outputList.length,
         numParams = 0, i, j, k, dataPtr,
-        bytesPerChannel = this.samplesPerChannel * 4,
+        outputViewsNeeded = 0,
         stackMemoryNeeded = (numInputs + numOutputs) * {{{ C_STRUCTS.AudioSampleFrame.__size__ }}},
         oldStackPtr = stackSave(),
-        inputsPtr, outputsPtr, outputDataPtr, paramsPtr,
+        inputsPtr, outputsPtr, paramsPtr,
         didProduceAudio, paramArray;

-      // Calculate how much stack space is needed.
-      for (i of inputList) stackMemoryNeeded += i.length * bytesPerChannel;
-      for (i of outputList) stackMemoryNeeded += i.length * bytesPerChannel;
+      // Calculate how much stack space is needed
+      for (i of inputList) stackMemoryNeeded += i.length * this.bytesPerChannel;
+      for (i of outputList) outputViewsNeeded += i.length;
+      stackMemoryNeeded += outputViewsNeeded * this.bytesPerChannel;
       for (i in parameters) stackMemoryNeeded += parameters[i].byteLength + {{{ C_STRUCTS.AudioParamFrame.__size__ }}}, ++numParams;

-      // Allocate the necessary stack space.
-      inputsPtr = stackAlloc(stackMemoryNeeded);
+#if ASSERTIONS
+      console.assert(oldStackPtr == this.ctorOldStackPtr, 'AudioWorklet stack address has unexpectedly moved');
+      console.assert(outputViewsNeeded <= this.outputViews.length, `Too many AudioWorklet outputs (need ${outputViewsNeeded} but have stack space for ${this.outputViews.length})`);
+#endif
+
+      // Allocate the necessary stack space (dataPtr is always in bytes, and
+      // advances as space for structs and data is taken, but note the switching
+      // between bytes and indices into the various heaps, usually in 'k'). This
+      // will be 16-byte aligned (from _emscripten_stack_alloc()), as were the
+      // output views, so we round up and advance the required bytes to ensure
+      // the addresses all work out at the end.
+      i = (stackMemoryNeeded + 15) & ~15;
+      dataPtr = stackAlloc(i) + (i - stackMemoryNeeded);

       // Copy input audio descriptor structs and data to Wasm
-      k = inputsPtr >> 2;
-      dataPtr = inputsPtr + numInputs * {{{ C_STRUCTS.AudioSampleFrame.__size__ }}};
+      // Note: filling the structs was tried with makeSetValue() but it creates
+      // minor overhead (adds and shifts) that we can avoid (and no combination
+      // of optimisations will fold).
+      inputsPtr = dataPtr;
+      k = {{{ getHeapOffset('inputsPtr', 'u32') }}};
+      dataPtr += numInputs * {{{ C_STRUCTS.AudioSampleFrame.__size__ }}};
       for (i of inputList) {
         // Write the AudioSampleFrame struct instance
-        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.numberOfChannels / 4 }}}] = i.length;
-        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.samplesPerChannel / 4 }}}] = this.samplesPerChannel;
-        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.data / 4 }}}] = dataPtr;
-        k += {{{ C_STRUCTS.AudioSampleFrame.__size__ / 4 }}};
+        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.numberOfChannels / getNativeTypeSize('u32') }}}] = i.length;
+        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.samplesPerChannel / getNativeTypeSize('u32') }}}] = this.samplesPerChannel;
+        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.data / getNativeTypeSize('u32') }}}] = dataPtr;
+#if MEMORY64
+        // See the note in the constructor for dealing with 64-bit addresses
+        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.data / getNativeTypeSize('u32') + 1 }}}] = dataPtr / 0x100000000;
+#endif
+        k += {{{ C_STRUCTS.AudioSampleFrame.__size__ / getNativeTypeSize('u32') }}};
         // Marshal the input audio sample data for each audio channel of this input
         for (j of i) {
-          HEAPF32.set(j, dataPtr>>2);
-          dataPtr += bytesPerChannel;
+          HEAPF32.set(j, {{{ getHeapOffset('dataPtr', 'float') }}});
+          dataPtr += this.bytesPerChannel;
         }
       }

-      // Copy output audio descriptor structs to Wasm
-      outputsPtr = dataPtr;
-      k = outputsPtr >> 2;
-      outputDataPtr = (dataPtr += numOutputs * {{{ C_STRUCTS.AudioSampleFrame.__size__ }}}) >> 2;
-      for (i of outputList) {
-        // Write the AudioSampleFrame struct instance
-        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.numberOfChannels / 4 }}}] = i.length;
-        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.samplesPerChannel / 4 }}}] = this.samplesPerChannel;
-        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.data / 4 }}}] = dataPtr;
-        k += {{{ C_STRUCTS.AudioSampleFrame.__size__ / 4 }}};
-        // Reserve space for the output data
-        dataPtr += bytesPerChannel * i.length;
-      }
-
       // Copy parameters descriptor structs and data to Wasm
       paramsPtr = dataPtr;
-      k = paramsPtr >> 2;
+      k = {{{ getHeapOffset('paramsPtr', 'u32') }}};
       dataPtr += numParams * {{{ C_STRUCTS.AudioParamFrame.__size__ }}};
       for (i = 0; paramArray = parameters[i++];) {
         // Write the AudioParamFrame struct instance
-        HEAPU32[k + {{{ C_STRUCTS.AudioParamFrame.length / 4 }}}] = paramArray.length;
-        HEAPU32[k + {{{ C_STRUCTS.AudioParamFrame.data / 4 }}}] = dataPtr;
-        k += {{{ C_STRUCTS.AudioParamFrame.__size__ / 4 }}};
+        HEAPU32[k + {{{ C_STRUCTS.AudioParamFrame.length / getNativeTypeSize('u32') }}}] = paramArray.length;
+        HEAPU32[k + {{{ C_STRUCTS.AudioParamFrame.data / getNativeTypeSize('u32') }}}] = dataPtr;
+#if MEMORY64
+        HEAPU32[k + {{{ C_STRUCTS.AudioParamFrame.data / getNativeTypeSize('u32') + 1 }}}] = dataPtr / 0x100000000;
+#endif
+        k += {{{ C_STRUCTS.AudioParamFrame.__size__ / getNativeTypeSize('u32') }}};
         // Marshal the audio parameters array
-        HEAPF32.set(paramArray, dataPtr>>2);
-        dataPtr += paramArray.length*4;
+        HEAPF32.set(paramArray, {{{ getHeapOffset('dataPtr', 'float') }}});
+        dataPtr += paramArray.length * {{{ getNativeTypeSize('float') }}};
       }

+      // Copy output audio descriptor structs to Wasm (note that dataPtr after
+      // the struct offsets should now be 16-byte aligned).
+      outputsPtr = dataPtr;
+      k = {{{ getHeapOffset('outputsPtr', 'u32') }}};
+      dataPtr += numOutputs * {{{ C_STRUCTS.AudioSampleFrame.__size__ }}};
+      for (i of outputList) {
+        // Write the AudioSampleFrame struct instance
+        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.numberOfChannels / getNativeTypeSize('u32') }}}] = i.length;
+        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.samplesPerChannel / getNativeTypeSize('u32') }}}] = this.samplesPerChannel;
+        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.data / getNativeTypeSize('u32') }}}] = dataPtr;
+#if MEMORY64
+        HEAPU32[k + {{{ C_STRUCTS.AudioSampleFrame.data / getNativeTypeSize('u32') + 1 }}}] = dataPtr / 0x100000000;
+#endif
+        k += {{{ C_STRUCTS.AudioSampleFrame.__size__ / getNativeTypeSize('u32') }}};
+        // Advance the output pointer to the next output (matching the pre-allocated views)
+        dataPtr += this.bytesPerChannel * i.length;
+      }

+#if ASSERTIONS
+      // If all the maths worked out, we arrived at the original stack address
+      console.assert(dataPtr == oldStackPtr, `AudioWorklet stack mismatch (audio data finishes at ${dataPtr} instead of ${oldStackPtr})`);
+
+      // Sanity checks. If these trip, the most likely cause, beyond unforeseen
+      // stack shenanigans, is that the 'render quantum size' changed.
+      if (numOutputs) {
+        // First, that the output view addresses match the stack positions.
+        k = dataPtr - this.bytesPerChannel;
+        for (i = 0; i < outputViewsNeeded; i++) {
+          console.assert(k == this.outputViews[i].byteOffset, 'AudioWorklet internal error in addresses of the output array views');
+          k -= this.bytesPerChannel;
+        }
+        // And that the views' sizes match the passed-in output buffers
+        for (i of outputList) {
+          for (j of i) {
+            console.assert(j.byteLength == this.bytesPerChannel, `AudioWorklet unexpected output buffer size (expected ${this.bytesPerChannel} got ${j.byteLength})`);
+          }
+        }
+      }
+#endif

       // Call out to Wasm callback to perform audio processing
-      if (didProduceAudio = this.callbackFunction(numInputs, inputsPtr, numOutputs, outputsPtr, numParams, paramsPtr, this.userData)) {
+      if (didProduceAudio = this.callbackFunction(numInputs, {{{ toIndexType('inputsPtr') }}}, numOutputs, {{{ toIndexType('outputsPtr') }}}, numParams, {{{ toIndexType('paramsPtr') }}}, this.userData)) {
         // Read back the produced audio data to all outputs and their channels.
-        // (A garbage-free function TypedArray.copy(dstTypedArray, dstOffset,
-        // srcTypedArray, srcOffset, count) would sure be handy.. but web does
-        // not have one, so manually copy all bytes in)
+        // The preallocated 'outputViews' already have the correct offsets and
+        // sizes into the stack (recall from the ctor that they run backwards).
+        k = outputViewsNeeded - 1;
         for (i of outputList) {
           for (j of i) {
-            for (k = 0; k < this.samplesPerChannel; ++k) {
-              j[k] = HEAPF32[outputDataPtr++];
-            }
+            j.set(this.outputViews[k--]);
           }
         }
       }

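As an aside for readers following the `#if MEMORY64` branches above: under MEMORY64 a data pointer no longer fits in a single 32-bit heap slot, so the struct writes store it as two consecutive HEAPU32 words. The snippet below is a standalone sketch of that lo/hi split (plain JS; the names are invented for the example, only the `/ 0x100000000` idiom is taken from the diff).

```js
// Minimal sketch, not from the PR: storing a 64-bit address through a
// 32-bit unsigned view by writing the low and high words separately.
const heap = new Uint32Array(4);     // stand-in for HEAPU32
const dataPtr = 0x123456789;         // an address above 4GB, held as a JS Number
heap[0] = dataPtr;                   // low word: the store keeps dataPtr mod 2**32
heap[1] = dataPtr / 0x100000000;     // high word: the store truncates the fraction
console.log(heap[0].toString(16), heap[1].toString(16)); // '23456789' '1'
```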
@@ -193,14 +273,9 @@ class BootstrapMessages extends AudioWorkletProcessor {
       // 'cb' the callback function
       // 'ch' the context handle
       // 'ud' the passed user data
-      p.postMessage({'_wsc': d['cb'], 'x': [d['ch'], 1/*EM_TRUE*/, d['ud']] });
+      p.postMessage({'_wsc': {{{ toIndexType("d['cb']") }}}, 'x': [d['ch'], 1/*EM_TRUE*/, {{{ toIndexType("d['ud']") }}}] });
Review comment on this line: Are the changes on this line necessary? I would hope that

Reply: Originally the type conversions were in the calling code, here for example: emscripten/src/lib/libwebaudio.js, line 269 in 0c3df92. But you suggested moving them to where they're needed. Without these it results in type errors, e.g.:
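(The actual error output was truncated from the page. As a rough standalone illustration of the class of failure being described, and an assumption rather than the PR's real message: under MEMORY64, pointer-sized values cross the JS boundary as BigInt, and using them where a Number is expected throws until they are converted.)

```js
// Hedged sketch, not the PR's actual error: mixing BigInt pointer values with
// Number arithmetic or Number-typed APIs throws until converted explicitly.
const ud = 16n;                 // a pointer-sized value received as a BigInt
try {
  const bad = ud + 4;           // TypeError: Cannot mix BigInt and other types
} catch (e) {
  console.log(e.constructor.name, '-', e.message);
}
const ok = Number(ud) + 4;      // an explicit conversion avoids the error; the
console.log(ok);                // toIndexType() macro centralises such conversions
```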

       } else if (d['_wsc']) {
-#if MEMORY64
-        var ptr = BigInt(d['_wsc']);
-#else
-        var ptr = d['_wsc'];
-#endif
-        Module['wasmTable'].get(ptr)(...d['x']);
+        Module['wasmTable'].get({{{ toIndexType("d['_wsc']") }}})(...d['x']);
       };
     }
   }
This file seems to have changed a lot more than I would expect just for the MEMORY64 change. Am I missing something?
It's built on #22753 from last October, which was never merged; besides the performance improvements, it contained the groundwork for the struct offsets that made 2GB and wasm64 support straightforward. The diff between the old PR and this one isn't that large, with most of the work going into the tests (which went into their own #23659).
I'll break this down into a series of smaller PRs on the current main, e.g. move to the C_STRUCTS offsets and makeGetValue, then introduce wasm64, then the performance changes. I couldn't review what I wrote myself in its current state. I don't know when this will be, though, since we're shipping it and it's done what I set out to do (we have millions of users putting in hour-long sessions per day, so a few millis per second saved is a big deal on low-end school hardware when juggling 3D content with audio, and if it didn't work we'd really know about it!).
We appreciate all the work you have done here.
I'm excited to see these changes land. If you can find time to split the PR, that would be amazing. If not, we can circle back in a few weeks.