A better place for this would be a wiki page as this isn't an issue.
This is meant to show the process of sampling and adding a new sound to be recognised as per the design here.
Adding a new class of sounds
The MicroBitSoundRecogniser class contains all the code to recognise sounds but has no sound samples of its own - hence it's made abstract by having the constructor private. For now, only one class inherits from it: EmojiRecogniser, which is meant to recognise the emoji class of sounds.
To add a new recogniser, create a new class that inherits from MicroBitSoundRecogniser and add each sound that should be recognised, as described below. An alternative is to replace MicroBitSoundRecogniser in the pipeline altogether with a custom component that analyses the frequencies of each time frame to determine the sound being played. This would be preferable if the sounds are very long and constant.
Sampling a sound
Preparing the micro:bit
To sample a sound, the micro:bit needs to output the dominant frequency of each time frame. This can be done either by creating a component that writes only the dominant frequency to serial as it comes from the MicroBitAudioProcessor, or by just using the .hex attached.
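For readers writing their own streaming component, here is a minimal sketch of how a dominant frequency can be derived from a magnitude spectrum. It is not taken from MicroBitAudioProcessor: the function name `dominantFrequency` and its parameters are hypothetical, and it assumes the spectrum covers bins 0 to fftSize/2 of a real FFT.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the micro:bit runtime): returns the
// frequency in Hz of the strongest bin of a magnitude spectrum.
// Assumes magnitudes[i] is the magnitude of FFT bin i.
int dominantFrequency(const std::vector<float>& magnitudes,
                      int sampleRate, int fftSize)
{
    if (magnitudes.empty() || fftSize <= 0) return 0;

    std::size_t best = 0;
    for (std::size_t i = 1; i < magnitudes.size(); ++i)
        if (magnitudes[i] > magnitudes[best]) best = i;

    // Bin i is centred on i * sampleRate / fftSize Hz.
    return static_cast<int>(static_cast<long long>(best) * sampleRate / fftSize);
}
```

A streaming component would then print one such value per time frame, one per line, so the capture is easy to paste into a spreadsheet.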
Preparing the host machine
The micro:bit needs to be connected to a host machine running a serial monitor - the default baud rate is 115200.
A good serial monitor is CoolTerm, and it should be configured with the following settings:
The actual sampling
A sound can be sampled by clearing the serial monitor, playing the sound and then disconnecting the serial monitor. The result - e.g. a sample of the happy emoji sound is shown below - is then copied into a spreadsheet to be graphed.
If using the .hex provided, play the sound at a higher volume or closer to the micro:bit, as its noise thresholds are higher than usual in order to filter out more noise - this makes it easier to find where the sound started and ended.
Multiple samples are needed to find which parts of the sound are consistent across plays, because the sounds are generated with some randomness.
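If the capture is saved to a text file instead of being pasted straight into a spreadsheet, it could be parsed with something like the sketch below. The format is an assumption (one integer frequency in Hz per line), and `parseCapture` is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parses a serial capture, assuming one integer frequency (Hz) per line.
// Lines that do not start with an integer (blank lines, stray text) are skipped.
std::vector<int> parseCapture(const std::string& text)
{
    std::vector<int> frequencies;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int value;
        if (fields >> value) frequencies.push_back(value);
    }
    return frequencies;
}
```

Because extraction stops at the first non-digit character, a trailing carriage return from the serial monitor does not affect the parsed value.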
Analysing the results
Identifying a consistent part and aligning the samples
Once a few samples of the sound are in the spreadsheet, they can be graphed to see the sound's shape. Graphing all of them would look like below.
Although the first half seems random, the sound can be recognised by its final part - which seems less random. Aligning the samples to match the final part (moving a couple of columns up or down), it should look like:
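Outside the spreadsheet, the alignment step could be approximated in code. The sketch below is an assumption about one reasonable approach, not the procedure used for the happy sound: it tries every row shift in a range and keeps the one with the smallest mean absolute difference over the overlapping rows. `bestShift` and its parameters are hypothetical names.

```cpp
#include <cstdlib>
#include <limits>
#include <vector>

// Returns the row shift (positive = move `sample` down) in [-maxShift, maxShift]
// that minimises the mean absolute difference against `reference`.
// Shifts that leave fewer than `minOverlap` overlapping rows are ignored.
int bestShift(const std::vector<int>& reference, const std::vector<int>& sample,
              int maxShift, int minOverlap)
{
    const int refSize = static_cast<int>(reference.size());
    const int sampleSize = static_cast<int>(sample.size());
    int best = 0;
    double bestCost = std::numeric_limits<double>::max();

    for (int shift = -maxShift; shift <= maxShift; ++shift) {
        long total = 0;
        int overlap = 0;
        for (int i = 0; i < sampleSize; ++i) {
            const int row = i + shift;
            if (row < 0 || row >= refSize) continue;
            total += std::abs(reference[row] - sample[i]);
            ++overlap;
        }
        if (overlap < minOverlap || overlap == 0) continue;
        const double cost = static_cast<double>(total) / overlap;
        if (cost < bestCost) { bestCost = cost; best = shift; }
    }
    return best;
}
```

To align on the final part only, as described above, pass just the tail of each sample rather than the whole capture.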
To mark where the first sequence starts, it's a good idea to add an empty row there. This would make it look like this:
To allow for deviations from these samples, the sound is further broken down at a "checkpoint" - some frequency that all samples reach. This would look like this:
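In code, breaking a sample at a checkpoint could be sketched as follows. The name `splitAtCheckpoint` is hypothetical. The sketch also assumes that the checkpoint row starts the second sequence, which the text above does not specify.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Splits a sample at the first row equal to `checkpoint`.
// Assumption: the checkpoint row belongs to the second sequence.
// If the checkpoint is never reached, the second sequence is empty.
std::pair<std::vector<int>, std::vector<int>>
splitAtCheckpoint(const std::vector<int>& sample, int checkpoint)
{
    auto it = std::find(sample.begin(), sample.end(), checkpoint);
    return { std::vector<int>(sample.begin(), it),
             std::vector<int>(it, sample.end()) };
}
```

A sample that never reaches the checkpoint ends up with an empty second sequence, which is a quick way to spot a poor checkpoint choice.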
The columns would now look like:
Removing redundant samples
Some of the samples are quite similar, so the redundant ones can be removed. To do this, it is useful to first copy each sequence to other columns.
For the first sequence, the first 2 samples are the same, so one of them can be removed. The 5th is the same as the first two but one data point shorter, so the other of the first two can be removed as well. When choosing which samples to remove, it is usually better to keep the shorter one, as the algorithm for matching the sequences tries to match the frequencies exactly one after the other, or with another frequency in between that can be anything.
For the second sequence, the last 2 samples are the same, so one of them can be removed. Furthermore, when graphing the rest of the samples - see below - most of them are quite similar, with only ~20 Hz deviation. This can be accommodated by setting a threshold >= 25 Hz for this sequence, although a threshold of ~70-80 Hz is safer and works better when there is more noise. In this case, only the 3rd and any one of the other samples would do.
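The pruning described above could also be automated. The sketch below is a simplification under stated assumptions: two samples count as redundant only when they have the same length and every data point is within the threshold. It does not handle the "one shorter" case mentioned for the first sequence. `withinThreshold` and `removeRedundant` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

using Sample = std::vector<int>;

// True when both samples have the same length and every pair of
// corresponding frequencies differs by at most `threshold` Hz.
bool withinThreshold(const Sample& a, const Sample& b, int threshold)
{
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::abs(a[i] - b[i]) > threshold) return false;
    return true;
}

// Keeps a sample only if no previously kept sample already covers it.
std::vector<Sample> removeRedundant(const std::vector<Sample>& samples, int threshold)
{
    std::vector<Sample> kept;
    for (const Sample& candidate : samples) {
        bool redundant = false;
        for (const Sample& existing : kept) {
            if (withinThreshold(candidate, existing, threshold)) {
                redundant = true;
                break;
            }
        }
        if (!redundant) kept.push_back(candidate);
    }
    return kept;
}
```

With a threshold of 0 only exact duplicates are dropped. Raising it towards the threshold planned for the sequence drops near-duplicates too.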
After removing the redundant samples, the spreadsheet would look like:
Adding the sound to the recogniser
The code used for adding the happy sound (in the EmojiRecogniser class) is:
The constants are:
happy_sequences - the number of sequences in the sound
happy_max_deviations - the maximum number of deviations allowed (i.e. data-points that can be more than the threshold away from the sampled frequency)
happy_samples - the samples from the spreadsheet
happy_thresholds - the threshold for each sequence (i.e. how many Hz off the sampled frequency is allowed)
happy_deviations - the maximum number of deviations allowed for each sequence. The deviations should satisfy both this and happy_max_deviations.
happy_nr_samples - the number of samples in each sequence
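To illustrate how the thresholds and the two deviation limits interact, here is a simplified sketch. It is not the MicroBitSoundRecogniser algorithm: it compares point by point and ignores the in-between frequencies the real matcher tolerates. `SequenceSpec`, `countDeviations` and `matchesSound` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// One sampled sequence with its own threshold (Hz) and deviation limit.
struct SequenceSpec {
    std::vector<int> sampled;
    int threshold;
    int maxDeviations;
};

// Counts data points more than `threshold` Hz away from the sample.
// Assumes both vectors have the same length.
int countDeviations(const std::vector<int>& heard,
                    const std::vector<int>& sampled, int threshold)
{
    int deviations = 0;
    for (std::size_t i = 0; i < sampled.size(); ++i)
        if (std::abs(heard[i] - sampled[i]) > threshold) ++deviations;
    return deviations;
}

// A sound matches when every sequence stays within its own deviation
// limit and the total stays within the limit for the whole sound.
bool matchesSound(const std::vector<std::vector<int>>& heard,
                  const std::vector<SequenceSpec>& specs,
                  int maxTotalDeviations)
{
    if (heard.size() != specs.size()) return false;
    int total = 0;
    for (std::size_t i = 0; i < specs.size(); ++i) {
        if (heard[i].size() != specs[i].sampled.size()) return false;
        const int d = countDeviations(heard[i], specs[i].sampled, specs[i].threshold);
        if (d > specs[i].maxDeviations) return false;
        total += d;
    }
    return total <= maxTotalDeviations;
}
```

In this sketch the per-sequence limit plays the role of happy_deviations and the total plays the role of happy_max_deviations. A sound can pass every per-sequence check and still be rejected on the total.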
To help copy the data from the spreadsheet into happy_samples, a formula that builds the array initialiser for each column can be used. For Google Sheets that would be: = CONCATENATE("{ ",COUNT(J$6:J), ", ", textjoin(", ", 1, J$6:J), "}, ")
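For samples that are already held in code rather than in a sheet, the same text can be produced with a small helper. This is a sketch mirroring the output of the formula above (count first, then the frequencies). `toInitializer` is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Formats a sample as "{ count, f1, f2, ...}, ", matching the text the
// spreadsheet formula produces for a non-empty column.
std::string toInitializer(const std::vector<int>& sample)
{
    std::string out = "{ " + std::to_string(sample.size());
    for (int frequency : sample)
        out += ", " + std::to_string(frequency);
    out += "}, ";
    return out;
}
```

The leading count is the value that the description of happy_nr_samples refers to, so the two can be cross-checked when pasting.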
Attachments
Attached here are the .hex that streams the frequencies from the micro:bit and the spreadsheet (Google Sheets, actually) I used to sample the happy sound.
MICROBIT-STREAM_FEQUENCIES.hex.zip
happy-sound-sample