lr_speech

Sep 18, 2024

06e0f05 · Sep 18, 2024

Name	Name	Last commit message	Last commit date
parent directory ..
Hillenbrand	Hillenbrand	whoops	Sep 10, 2024
README.md	README.md	Update README.md	Sep 16, 2024
lr_speech.py	lr_speech.py	Update lr_speech.py	Sep 18, 2024
requirements.txt	requirements.txt	Add files via upload	Sep 10, 2024
tests.py	tests.py	Update tests.py	Sep 17, 2024

README.md

Logistic Regression Redux: Pytorch

Overview

In this homework you'll implement a stochastic gradient descent for logistic regression and you'll apply it to vowel classification, for vowels spoken in 'hVd' contexts. The dataset has the following vowels:

ae, ah, aw, eh, ei, er, ih, iy, oa, oo, uh, uw

and the user will input a list of two of these, separated by a comma.

The dataset is described in Hillenbrand et al. (1995), and is contained in the 'Hillenbrand' directory, with subdirectories for men, women, and children's utterances. You'll be using the librosa package to read in the wav files and compute 13-dimensional MFCCs. The basic code for this is given.

You'll be running logistic regression on pairs of vowels, in order to classify them. Some pairs of vowels (like iy vs. ah) are easy to tell apart, and classification accuracy will be at ceiling, whereas other pairs of vowels (like eh vs. ae) are more difficult to tell apart. You should expect your regressions to yield test accuracies ranging between 0.5 and 1, depending on which pair of vowels you are classifying.

What you have to do

Setup:

You may need to create a virtual environment and install Python packages:

python3 -m venv venv
./venv/bin/pip3 install -r requirements.txt

Coding:

Create a dataset in the create_dataset function. You'll be extracting MFCCs from each wav file and taking the middle frame of the utterance as your feature vector. The matrix needs to have a row for each utterance and a column for each feature. Once you've extracted the middle frame of each utterance, you'll z-score each feature across all of the utterances.
Create a logistic regression model with a softmax/sigmoid activation function, using a network with one linear layer. To make unit tests work, we had to initialize a member of the SimpleLogreg class. Replace the none object with an appropriate nn.Module.
Optimize the function (remember to zero out gradients) in the step function.

Extra credit:

MFCCs computed at the midpoint frame may not be the best way of classifying vowels. Write an alternative feature extractor function, create_alt_dataset, that gets better classification performance on the pairs of vowels that weren't already at ceiling.

References:

Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97(5):3099– 3111.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Files

lr_speech

lr_speech

README.md

Overview

What you have to do

Files

lr_speech

Directory actions

More options

Directory actions

More options

Latest commit

History

lr_speech

Folders and files

parent directory

README.md

Overview

What you have to do