Sampling sounds for recognition (should be a wiki page?) #100

vladturcuman opened this issue May 2, 2021
A better place for this would be a wiki page, as this isn't an issue.

This is meant to show the process of sampling and adding a new sound to be recognised as per the design here.

Adding a new class of sounds

The class MicroBitSoundRecogniser contains all the code to recognise sounds, but doesn't have any sound samples added - hence it's made abstract by keeping the constructor protected. For now, there is only one class that inherits it: EmojiRecogniser, which is meant to recognise the emoji class of sounds.

To add a new recogniser, create a new class that inherits MicroBitSoundRecogniser and add each sound that should be recognised, as described below and sketched right after this paragraph. An alternative is to replace MicroBitSoundRecogniser in the pipeline altogether with a custom component that analyses the frequencies of each time frame to determine the sound being played - preferable when the sounds are very long and constant.
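As a rough sketch, a new recogniser could look like the snippet below. The names AnimalRecogniser and addDogSound are hypothetical, and the base-class constructor arguments are an assumption - check MicroBitSoundRecogniser.h for the actual signature:

class AnimalRecogniser : public MicroBitSoundRecogniser
{
    public:
    // Assumed constructor shape: forward whatever the base class actually needs.
    AnimalRecogniser(MicroBitAudioProcessor& processor)
        : MicroBitSoundRecogniser(processor)
    {
        // Register each sound, following the pattern of addHappySound() below.
        addDogSound();
    }

    private:
    void addDogSound();
};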

Sampling a sound

Preparing the micro:bit

To sample a sound, one needs to output the dominant frequency in each time frame. This can be done either by creating a component that outputs to serial only the dominant frequency as it comes from the MicroBitAudioProcessor (a sketch follows), or by simply using the .hex attached.
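For the first option, a component along the lines of the sketch below could work. It is only illustrative: the way the dominant frequency is pulled out of the MicroBitAudioProcessor (the DataSink connection and the getFrequency() call) is an assumption and has to be checked against the actual headers:

#include "MicroBit.h"

extern MicroBit uBit;

class FrequencyStreamer : public DataSink
{
    MicroBitAudioProcessor& audio;

    public:
    FrequencyStreamer(MicroBitAudioProcessor& processor) : audio(processor)
    {
        // Assumes the processor is a DataSource we can attach to.
        audio.connect(*this);
    }

    // Called once per time frame: print only the dominant frequency.
    virtual int pullRequest()
    {
        uBit.serial.printf("%d\r\n", (int) audio.getFrequency());
        return DEVICE_OK;
    }
};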

Preparing the host machine

The micro:bit needs to be connected to a host machine running a serial monitor - the default baud rate is 115200.

A good serial monitor is CoolTerm, and it should be configured with the following settings:

[image: CoolTerm connection settings]

The actual sampling

A sound can be sampled by clearing the serial monitor, playing the sound, and disconnecting the serial monitor. The result - e.g. a sample of the happy emoji sound is shown below - is then copied into a spreadsheet (Excel or Google Sheets) to be graphed.

If using the .hex provided, play the sound at a higher volume or closer to the micro:bit: its noise thresholds are higher than usual to filter out more noise, which makes it easier to find where the sound starts and ends.

Multiple samples are needed to find which parts of the sound are consistent across plays - that's because of the randomness in the generation of the sounds.

[image: serial capture of the happy emoji sound]

Analysing the results

Identifying a consistent part and aligning the samples

Once a couple of samples of the sound are in the spreadsheet, they can be graphed to see the sound's shape. Graphing all of them looks like this:

[image: all samples graphed, before alignment]

Although the first half seems random, the sound can be recognised by its final part, which is less random. After aligning the samples so that their final parts match (shifting each column up or down by a few rows), it should look like:

[image: samples aligned on the final part]

To mark where the first sequence starts, it's a good idea to insert an empty row there, which makes it look like this:

[image: aligned samples with an empty row marking the start of the first sequence]

To allow for deviations from these samples, the sound is further broken down at a "checkpoint" - a frequency that all samples reach. This looks like:

[image: samples split at the checkpoint]

The columns would now look like:

[image: the resulting spreadsheet columns, one group per sequence]
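In code, these two sequences end up as the two top-level groups of the happy_samples array shown in the last section.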

Removing redundant samples

As some of the samples are quite similar, they can be removed. To do this, it is useful to first copy each sequence into separate columns.

[image: each sequence copied into separate columns]

For the first sequence, the first two samples are the same, so one of them can be removed. The fifth is the same as those two but one data point shorter, so the remaining one of the first two can be removed as well (keeping the fifth). When choosing which samples to remove, it is usually better to keep the shorter one, because the matching algorithm tries to match the sampled frequencies either exactly one after another or with one extra frequency (which can be anything) in between.

For the second sequence, the last two samples are the same, so one of them can be removed. Furthermore, when graphing the rest of the samples - see below - most of them are quite similar, deviating by only ~20 Hz. This can be accommodated by setting a threshold of at least 25 Hz for this sequence - although a threshold of ~70-80 Hz would be better in cases where there's more noise, since a larger threshold is safer. In this case, only the 3rd sample and any one of the other samples would do.

[image: graphs of the remaining second-sequence samples]
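To make the numbers concrete: with a 25 Hz threshold, a sampled data point of 2394 Hz accepts any heard frequency between 2369 Hz and 2419 Hz, so samples that differ by only ~20 Hz are covered by a single representative.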

After removing the redundant samples, the spreadsheet looks like:

[image: the spreadsheet after removing redundant samples]

Adding the sound to the recogniser

The code used to add the happy sound (in the EmojiRecogniser class) is:

const uint8_t happy_sequences = 2;
const uint8_t happy_max_deviations = 2;

// The samples from the spreadsheet; the first value of each row is the number
// of data points that follow.
uint16_t happy_samples[happy_sequences][2][8] = {
    {
        { 4, 2121, 2394, 2646, 2646},
        { 5, 2121, 2373, 2373, 2646, 2646}
    },
    {
        { 7, 2646, 2835, 2646, 2646, 2394, 2394, 2394},
        { 7, 2646, 2835, 2835, 2646, 2394, 2373, 2394}
    }
};

uint16_t happy_thresholds[happy_sequences] = {
    40,
    50
};

uint8_t happy_deviations[happy_sequences] = {
    1,
    2
};

uint8_t happy_nr_samples[happy_sequences] = {
    2,
    2
};


void EmojiRecogniser::addHappySound() {

    // Register the sound under the next free slot.
    uint8_t it = sounds_size;
    sounds_size++;
    sounds_names[it] = new ManagedString("happy");

    // The history buffer has to fit the longest sample (whose length is the
    // first value of its row) plus some slack.
    uint8_t history = 0;
    for(uint8_t i = 0; i < happy_sequences; i++)
        for(uint8_t j = 0; j < happy_nr_samples[i]; j++)
            history = max(history, happy_samples[i][j][0] + 4);

    sounds[it] = new Sound(happy_sequences, happy_max_deviations, history, true);

    for(uint8_t i = 0; i < happy_sequences; i++){
        sounds[it]->sequences[i] = new SoundSequence(happy_nr_samples[i], happy_thresholds[i], happy_deviations[i]);
        for(uint8_t j = 0; j < happy_nr_samples[i]; j++)
            // Skip the leading length value: the data points start at index 1.
            sounds[it]->sequences[i]->samples[j] = new SoundSample(happy_samples[i][j] + 1, happy_samples[i][j][0]);
    }
}
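For context, a sketch of how such a method gets called - the constructor below is illustrative, and addSadSound() is a hypothetical sibling of addHappySound():

EmojiRecogniser::EmojiRecogniser(MicroBitAudioProcessor& processor)
    : MicroBitSoundRecogniser(processor)
{
    addHappySound();
    addSadSound();   // one such method per emoji sound
}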

The constants are:

  • happy_sequences - the number of sequences in the sound
  • happy_max_deviations - the maximum number of deviations allowed across the whole sound (a deviation is a data point that is more than the threshold away from the sampled frequency)
  • happy_samples - the samples from the spreadsheet
  • happy_thresholds - the threshold for each sequence (i.e. how many Hz off the sampled frequency a data point is allowed to be)
  • happy_deviations - the maximum number of deviations allowed for each sequence; the deviations have to satisfy both this and happy_max_deviations (see the sketch after this list)
  • happy_nr_samples - the number of samples in each sequence
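To make the thresholds and deviations concrete, here is a simplified matcher. It is illustrative only - as noted above, the real algorithm can also accept one extra in-between frequency - but the counting of deviations against the threshold is the idea:

#include <stdint.h>

// A data point deviates if it is more than `threshold` Hz away from the
// sampled frequency; the sample still matches as long as at most
// `maxDeviations` points deviate.
bool matchesSample(const uint16_t* heard, const uint16_t* sample,
                   uint8_t length, uint16_t threshold, uint8_t maxDeviations)
{
    uint8_t deviations = 0;
    for (uint8_t i = 0; i < length; i++) {
        uint16_t diff = heard[i] > sample[i] ? heard[i] - sample[i]
                                             : sample[i] - heard[i];
        if (diff > threshold)
            deviations++;
    }
    return deviations <= maxDeviations;
}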

To help copy the data from the spreadsheet into happy_samples, a formula that builds each row of the array can be used - in Google Sheets that would be = CONCATENATE("{ ",COUNT(J$6:J), ", ", textjoin(", ", 1, J$6:J), "}, "):

[image: the formula applied in Google Sheets]
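For the happy sound, one column produces e.g. { 4, 2121, 2394, 2646, 2646}, - a row that can be pasted straight into happy_samples.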

Attachments

Attached here are the .hex to stream the frequencies from the micro:bit and the spreadsheet - Google Sheets, actually - that I used to sample the happy sound.

MICROBIT-STREAM_FEQUENCIES.hex.zip

happy-sound-sample
