Commit 249cff1

Merge branch 'master' into 75-howto100m-preprocessing
2 parents: 1e2d5fd + 90cc6de

45 files changed: +392 -186 lines. This is a large commit, so the page hides most file diffs by default; only a subset of the 45 files is shown below.

.github/pull_request_template.md (+8)

```diff
@@ -0,0 +1,8 @@
+<!-- Describe your PR here -->
+
+<!-- Please, make sure the following items are checked -->
+Checklist before merging:
+
+- [ ] Does this PR warrant a version bump? If so, make sure you add a git tag and GitHub release.
+- [ ] Did you update the changelog with a user-readable summary? Please, include references to relevant issues or PR discussions. See [CII [release_notes] criterion](https://bestpractices.coreinfrastructure.org/en/criteria/0#0.release_notes_vulns).
+- [ ] When adding new functionality: did you also add a test for it?
```

CHANGELOG.md (+30)

```diff
@@ -0,0 +1,30 @@
+# Changelog
+<!--
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+-->
+
+## [Unreleased]
+<!-- track upcoming changes here; move to new versioned section at release time -->
+
+## [1.0] - 9 December 2020
+
+### Added
+- Introducing an attention-based encoder-decoder architecture for speech recognition.
+- Multitask training with multiple objectives (e.g. cross-modality retrieval and speech transcription) is also possible now.
+
+<!--
+### Removed
+
+### Changed
+-->
+
+## [0.9] - 20 January 2020
+
+State of the repo before @bhigy's merge leading to version 1.0.
+
+[Unreleased]: https://github.com/spokenlanguage/platalea/compare/v1.0...HEAD
+[1.0]: https://github.com/spokenlanguage/platalea/releases/tag/v1.0
+[0.9]: https://github.com/gchrupala/platalea/releases/tag/v0.9
```

CONTRIBUTING.md (+41)

```diff
@@ -0,0 +1,41 @@
+# Contributing guidelines
+
+We welcome any kind of contribution to our software, from a simple comment or question to a full-fledged [pull request](https://help.github.com/articles/about-pull-requests/).
+
+A contribution can be one of the following cases:
+
+1. you have a question;
+1. you think you may have found a bug (including unexpected behavior);
+1. you want to make some kind of change to the code base (e.g. to fix a bug, to add a new feature, to update documentation);
+1. you want to make a new release of the code base.
+
+The sections below outline the steps in each case.
+
+## You have a question
+
+1. use the search functionality [here](https://github.com/spokenlanguage/platalea/issues) to see if someone already filed the same issue;
+2. if your issue search did not yield any relevant results, make a new issue;
+3. apply the "Question" label; apply other labels when relevant.
+
+## You think you may have found a bug
+
+1. use the search functionality [here](https://github.com/spokenlanguage/platalea/issues) to see if someone already filed the same issue;
+1. if your issue search did not yield any relevant results, make a new issue, making sure to provide enough information to the rest of the community to understand the cause and context of the problem. Depending on the issue, you may want to include:
+    - the [SHA hashcode](https://help.github.com/articles/autolinked-references-and-urls/#commit-shas) of the commit that is causing your problem;
+    - some identifying information (name and version number) for dependencies you're using;
+    - information about the operating system;
+1. apply relevant labels to the newly created issue.
+
+## You want to make some kind of change to the code base
+
+1. (**important**) announce your plan to the rest of the community *before you start working*. This announcement should be in the form of a (new) issue;
+1. (**important**) wait until some kind of consensus is reached about your idea being a good idea;
+1. if needed, fork the repository to your own GitHub profile and create your own feature branch off of the latest master commit. While working on your feature branch, make sure to stay up to date with the master branch by pulling in changes, possibly from the 'upstream' repository (follow the instructions [here](https://help.github.com/articles/configuring-a-remote-for-a-fork/) and [here](https://help.github.com/articles/syncing-a-fork/));
+1. make sure the existing tests still work by running ``tox``;
+1. also make sure you do not diverge from PEP8 standards by fixing any issues flake8 reports in the ``tox`` run; if you do diverge, make sure you clearly explain why;
+1. add your own tests (if necessary);
+1. update or expand the documentation;
+1. push your feature branch to (your fork of) the platalea repository on GitHub;
+1. create the pull request, e.g. following the instructions [here](https://help.github.com/articles/creating-a-pull-request/).
+
+In case you feel like you've made a valuable contribution, but you don't know how to write or run tests for it, or how to generate the documentation: don't let this discourage you from making the pull request; we can help you! Just go ahead and submit the pull request, but keep in mind that you might be asked to append additional commits to your pull request.
```

README.md (+23 -1)

````diff
@@ -2,15 +2,29 @@
 Understanding visually grounded spoken language via multi-tasking
 
 [![DOI](https://zenodo.org/badge/239750248.svg)](https://zenodo.org/badge/latestdoi/239750248)
-![install and run tests](https://github.com/egpbos/platalea/workflows/install%20and%20run%20tests/badge.svg?branch=master)
+[![install and run tests](https://github.com/egpbos/platalea/workflows/install%20and%20run%20tests/badge.svg?branch=master)](https://github.com/spokenlanguage/platalea/actions/workflows/pythonapp.yml)
 [![codecov](https://codecov.io/gh/spokenlanguage/platalea/branch/master/graph/badge.svg)](https://codecov.io/gh/spokenlanguage/platalea)
 
 ## Installation
 
+Clone this repo and cd into it:
+
+```sh
+git clone https://github.com/spokenlanguage/platalea.git
+cd platalea
+```
+
+To install in a conda environment, assuming conda has already been installed, run the following to download and install dependencies:
+
 ```sh
 conda create -n platalea python==3.8 pytorch -c conda-forge -c pytorch
 conda activate platalea
 pip install torchvision
+```
+
+Then install platalea with:
+
+```sh
 pip install .
 ```
 
@@ -93,6 +107,14 @@ If you don't want to use cloud logging of learning curves using wandb, you can
 disable it by running:
 ```wandb disabled```
 
+## Contributing
+
+If you want to contribute to the development of platalea, have a look at the [contribution guidelines](CONTRIBUTING.md).
+
+## Changelog
+
+We keep track of what is added, changed and removed in releases in the [changelog](CHANGELOG.md).
+
 ## References
 
 [1] Hodosh, M., Young, P., & Hockenmaier, J. (2013). Framing Image Description
````

platalea/asr.py (+4 -2)

```diff
@@ -82,6 +82,7 @@ def cost(self, item):
 
 def experiment(net, data, config, slt=False):
     _device = platalea.hardware.device()
+
    def val_loss():
        with torch.no_grad():
            net.eval()
@@ -114,10 +115,10 @@ def val_loss():
            average_loss = cost['cost'] / cost['N']
            if 'opt' not in config.keys() or config['opt'] == 'adam':
                scheduler.step()
-           if j % 100 == 0:
+           if j % config['loss_logging_interval'] == 0:
                logging.info("train {} {} {}".format(
                    epoch, j, average_loss))
-           if j % 400 == 0:
+           if j % config['validation_interval'] == 0:
                logging.info("valid {} {} {}".format(epoch, j, val_loss()))
                with torch.no_grad():
                    net.eval()
@@ -154,6 +155,7 @@ def val_loss():
    torch.save(net, 'net.best.pt')
    return results
 
+
 def get_default_config(hidden_size_factor=1024):
    fd = D.Flickr8KData
    hidden_size = hidden_size_factor * 3 // 4
```
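
The two hard-coded reporting intervals (every 100 and 400 batches) are now read from the experiment config instead. A minimal sketch of a config that supplies them; only the two interval keys come from this diff, the surrounding keys and values are illustrative assumptions:

```python
# Hypothetical training config for platalea.asr.experiment().
# 'loss_logging_interval' and 'validation_interval' are the keys this
# commit introduces; the other entries are placeholders for illustration.
config = {
    'opt': 'adam',                 # matches the optimizer check in experiment()
    'epochs': 32,                  # assumed; not shown in this diff
    'loss_logging_interval': 100,  # reproduces the old hard-coded `j % 100`
    'validation_interval': 400,    # reproduces the old hard-coded `j % 400`
}
```

Setting the two intervals to 100 and 400 recovers the previous behavior exactly.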

platalea/attention.py (+2 -1)

```diff
@@ -18,6 +18,7 @@ def forward(self, input):
         # return the resulting embedding
         return x
 
+
 class MeanPool(nn.Module):
     def __init__(self):
         super(MeanPool, self).__init__()
@@ -75,7 +76,7 @@ def __init__(self, in_size_enc, in_size_state, hidden_size):
         self.U_a = nn.Linear(in_size_enc, hidden_size, bias=False)
         self.W_a = nn.Linear(in_size_state, hidden_size, bias=False)
         self.v_a = nn.Linear(hidden_size, 1, bias=True)
-        self.prev_enc_out= None
+        self.prev_enc_out = None
 
     def forward(self, hidden, encoder_outputs):
         # Calculate energies for each encoder output
```
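
For context, the U_a/W_a/v_a triple is the standard parameterization of additive (Bahdanau-style) attention. A self-contained sketch of the energy computation such layers perform; the tensor shapes, sizes, and the softmax/context steps are assumptions for illustration, not code from this repo:

```python
import torch
import torch.nn as nn

# Additive attention: e_t = v_a^T tanh(U_a h_t + W_a s), normalized over time.
in_size_enc, in_size_state, hidden_size = 1024, 1024, 128
U_a = nn.Linear(in_size_enc, hidden_size, bias=False)
W_a = nn.Linear(in_size_state, hidden_size, bias=False)
v_a = nn.Linear(hidden_size, 1, bias=True)

encoder_outputs = torch.randn(8, 50, in_size_enc)  # (batch, time, features)
hidden = torch.randn(8, 1, in_size_state)          # current decoder state

energies = v_a(torch.tanh(U_a(encoder_outputs) + W_a(hidden)))  # (batch, time, 1)
alpha = torch.softmax(energies, dim=1)          # attention weights over time
context = (alpha * encoder_outputs).sum(dim=1)  # weighted sum of encoder states
```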

platalea/audio/features.py (+28 -23)

```diff
@@ -5,38 +5,40 @@
 
 @author: danny
 """
-from platalea.audio.preproc import four,pad,preemph, hamming, notch
-from platalea.audio.filters import apply_filterbanks,filter_centers, create_filterbanks
+from platalea.audio.preproc import four, pad, preemph, hamming, notch
+from platalea.audio.filters import apply_filterbanks, filter_centers, create_filterbanks
 from scipy.fftpack import dct
 import numpy
 import math
 
 # this file contains the main bulk of the actual feature creation functions
 
-def delta (data, N):
-        # calculate delta features, n is the number of frames to look forward and backward
+
+def delta(data, N):
+    # calculate delta features, n is the number of frames to look forward and backward
 
     # create a delta array of the right shape
     dt = numpy.zeros(data.shape)
     # pad data with first and last frame for size of n
-    for n in range (N):
-        data = numpy.row_stack((data[0,:],data, data[-1,:]))
+    for n in range(N):
+        data = numpy.row_stack((data[0, :], data, data[-1, :]))
     # calc n*c[x+n] + c[x-n] for n in N and sum them
-    for n in range (1, N + 1):
-        dt += numpy.array([n * (data[x+n,:] - data[x-n,:]) for x in range (N, len(data) - N)])
+    for n in range(1, N + 1):
+        dt += numpy.array([n * (data[x+n, :] - data[x-n, :]) for x in range(N, len(data) - N)])
     # normalise the deltas for the size of N
-    normalise = 2* sum([numpy.power(x,2) for x in range (1, N+1)])
+    normalise = 2 * sum([numpy.power(x, 2) for x in range(1, N+1)])
 
     dt = dt/normalise
 
     return (dt)
 
+
 def raw_frames(data, frame_shift, window_size):
-        # this function cuts the data into frames and calculates each frame's energy
+    # this function cuts the data into frames and calculates each frame's energy
 
-    #determine the number of frames to be extracted
+    # determine the number of frames to be extracted
     nframes = math.floor(data.size/frame_shift)
-    #apply notch filter
+    # apply notch filter
     notched_data = notch(data)
     # pad the data
     data = pad(notched_data, window_size, frame_shift)
@@ -46,8 +48,8 @@ def raw_frames(data, frame_shift, window_size):
     frames = []
     energy = []
 
-    for f in range (0, nframes):
-        frame = data[f * frame_shift : f * frame_shift + window_size]
+    for f in range(0, nframes):
+        frame = data[(f * frame_shift):(f * frame_shift + window_size)]
         energy.append(numpy.log(numpy.sum(numpy.square(frame), 0)))
         frames.append(frame)
 
@@ -59,43 +61,46 @@ def raw_frames(data, frame_shift, window_size):
 
     return (frames, energy)
 
+
 def get_freqspectrum(frames, alpha, fs, window_size):
-        # this function prepares the raw frames for conversion to frequency spectrum
-        # and applies fft
+    # this function prepares the raw frames for conversion to frequency spectrum
+    # and applies fft
 
     # apply preemphasis
     frames = preemph(frames, alpha)
     # apply hamming windowing
     frames = hamming(frames)
     # apply fft
-    freq_spectrum = four(frames,fs,window_size)
+    freq_spectrum = four(frames, fs, window_size)
 
     return freq_spectrum
 
+
 def get_fbanks(freq_spectrum, nfilters, fs):
-        # this function calculates the filters and creates filterbank features from
-        # the fft features
+    # this function calculates the filters and creates filterbank features from
+    # the fft features
 
     # get the frequencies corresponding to the bins returned by the fft
     xf = numpy.linspace(0.0, fs/2, numpy.shape(freq_spectrum)[1])
     # get the filter frequencies
-    fc = filter_centers (nfilters,fs,xf)
+    fc = filter_centers(nfilters, fs, xf)
     # create filterbanks
     filterbanks = create_filterbanks(nfilters, xf, fc)
     # apply filterbanks
     fbanks = apply_filterbanks(freq_spectrum, filterbanks)
 
     return fbanks
 
+
 def get_mfcc(fbanks):
-        # this function creates mfccs from the fbank features
+    # this function creates mfccs from the fbank features
 
     # apply discrete cosine transform to get mfccs. According to convention,
     # we discard the first filterbank (which is roughly equal to the method
     # where we only space filters from 1000hz onwards)
-    mfcc = dct(fbanks[:,1:])
+    mfcc = dct(fbanks[:, 1:])
     # discard the first coefficient of the mfcc as well and take the next 13
     # coefficients.
-    mfcc = mfcc[:,1:13]
+    mfcc = mfcc[:, 1:13]
 
     return mfcc
```
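
As a reading aid, delta(data, N) implements the usual regression formula for dynamic features, delta_c(t) = sum_{n=1..N} n * (c(t+n) - c(t-n)) / (2 * sum_{n=1..N} n^2). A usage sketch; the input shape and N=2 are illustrative assumptions:

```python
import numpy
from platalea.audio.features import delta

mfcc = numpy.random.randn(100, 12)      # 100 frames x 12 coefficients (fake data)
d = delta(mfcc, 2)                      # first-order deltas over +/-2 frames
dd = delta(d, 2)                        # second-order deltas (delta-deltas)
features = numpy.hstack([mfcc, d, dd])  # common 36-dim MFCC+delta+delta-delta stack
```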

platalea/audio/filters.py (+19 -16)

```diff
@@ -9,24 +9,25 @@
 from platalea.audio.melfreq import freq2mel, mel2freq
 import numpy
 
-def create_filterbanks (nfilters,freqrange,fc):
+
+def create_filterbanks(nfilters, freqrange, fc):
     # function to create filter banks. takes as input
     # the number of filters to be created, the frequency range and the
     # filter centers
     filterbank = []
     # for the desired # of filters do
-    for n in range (0,nfilters):
+    for n in range(0, nfilters):
         # set the begin center and end frequency of the filters
         begin = fc[n]
-        center= fc[n+1]
+        center = fc[n+1]
         end = fc[n+2]
         f = []
         # create triangular filters
         for x in freqrange:
             # 0 for f outside the filter
             if x < begin:
                 f.append(0)
-            #increasing to 1 towards the center
+            # increasing to 1 towards the center
             elif begin <= x and x <= center:
                 f.append((x-begin)/(center-begin))
             # decreasing to 0 upwards from the center
@@ -36,27 +37,29 @@ def create_filterbanks (nfilters,freqrange,fc):
             elif x > end:
                 f.append(0)
         filterbank.append(f)
-
+
     return filterbank
-
+
+
 def filter_centers(nfilters, fs, xf):
     # calculates the center frequencies for the mel filters
-
-    #space the filters equally in mels
+
+    # space the filters equally in mels
     spacing = numpy.linspace(0, freq2mel(fs/2), nfilters+2)
-    #back from mels to frequency
+    # back from mels to frequency
     spacing = mel2freq(spacing)
     # round the filter frequencies to the nearest available fft bin frequencies
-    # and return the centers for the filters.
-    filters = [xf[numpy.argmin(numpy.abs(xf-x))] for x in spacing]
-
+    # and return the centers for the filters.
+    filters = [xf[numpy.argmin(numpy.abs(xf-x))] for x in spacing]
+
     return filters
-
+
+
 def apply_filterbanks(data, filters):
     # function to apply the filterbanks and take the log of the filterbanks
-    filtered_freq = numpy.log(numpy.dot(data, numpy.transpose(filters)))
+    filtered_freq = numpy.log(numpy.dot(data, numpy.transpose(filters)))
     # same as with energy, taking the log of a filter bank with 0 power results in -inf
     # we approximate 0 power with -50 the log of 2e-22
-    filtered_freq[filtered_freq == numpy.log(0)] = -50
-
+    filtered_freq[filtered_freq == numpy.log(0)] = -50
+
     return filtered_freq
```
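
Together with features.py, these helpers form a conventional MFCC front end. A sketch of how the pieces chain; the sample rate and framing parameters are assumptions chosen to illustrate a typical 25 ms window with a 10 ms shift:

```python
import numpy
from platalea.audio.features import raw_frames, get_freqspectrum, get_fbanks, get_mfcc

fs = 16000                     # assumed sample rate
data = numpy.random.randn(fs)  # one second of fake audio
frames, energy = raw_frames(data, 160, 400)     # 10 ms shift, 25 ms window at 16 kHz
freq = get_freqspectrum(frames, 0.97, fs, 400)  # pre-emphasis, Hamming, FFT
fbanks = get_fbanks(freq, 40, fs)               # 40 log mel filterbank energies
mfcc = get_mfcc(fbanks)                         # DCT, keep coefficients 1-13
```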

platalea/audio/melfreq.py (+8 -6)

```diff
@@ -6,14 +6,16 @@
 @author: danny
 """
 import numpy
-#provides simple functions to convert a frequency to mel and vice versa
+# provides simple functions to convert a frequency to mel and vice versa
+
 
 def freq2mel(f):
-    #converts a frequency to mel
-    mel=1125*numpy.log(1+f/700)
+    # converts a frequency to mel
+    mel = 1125*numpy.log(1+f/700)
     return (mel)
 
+
 def mel2freq(m):
-    #converts mel to frequency
-    f=700*(numpy.exp(m/1125)-1)
-    return (f)
+    # converts mel to frequency
+    f = 700*(numpy.exp(m/1125)-1)
+    return f
```
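
The two conversions are exact inverses (m = 1125 ln(1 + f/700) and f = 700 (e^(m/1125) - 1)), so a round trip should recover its input; a quick sanity check, with outputs rounded:

```python
from platalea.audio.melfreq import freq2mel, mel2freq

print(freq2mel(1000))           # ~998.2 mel = 1125 * ln(1 + 1000/700)
print(mel2freq(freq2mel(440)))  # ~440.0 Hz: round trip recovers the input
```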
