Commit 249cff1

Merge branch 'master' into 75-howto100m-preprocessing
2 parents: 1e2d5fd + 90cc6de

45 files changed: +392 -186 lines. This is a large commit, so the page hides most file diffs by default; only a subset of the 45 files is shown below.

.github/pull_request_template.md (+8)

```diff
@@ -0,0 +1,8 @@
+<!-- Describe your PR here -->
+
+<!-- Please, make sure the following items are checked -->
+Checklist before merging:
+
+- [ ] Does this PR warrant a version bump? If so, make sure you add a git tag and GitHub release.
+- [ ] Did you update the changelog with a user-readable summary? Please, include references to relevant issues or PR discussions. See [CII [release_notes] criterion](https://bestpractices.coreinfrastructure.org/en/criteria/0#0.release_notes_vulns).
+- [ ] When adding new functionality: did you also add a test for it?
```

CHANGELOG.md (+30)

```diff
@@ -0,0 +1,30 @@
+# Changelog
+<!--
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+-->
+
+## [Unreleased]
+<!-- track upcoming changes here; move to new versioned section at release time -->
+
+## [1.0] - 9 December 2020
+
+### Added
+- Introducing an attention-based encoder-decoder architecture for speech recognition.
+- Multitask training with multiple objectives (e.g. cross-modality retrieval and speech transcription) is also possible now.
+
+<!--
+### Removed
+
+### Changed
+-->
+
+## [0.9] - 20 January 2020
+
+State of the repo before @bhigy's merge leading to version 1.0.
+
+[Unreleased]: https://github.com/spokenlanguage/platalea/compare/v1.0...HEAD
+[1.0]: https://github.com/spokenlanguage/platalea/releases/tag/v1.0
+[0.9]: https://github.com/gchrupala/platalea/releases/tag/v0.9
```

CONTRIBUTING.md (+41)

```diff
@@ -0,0 +1,41 @@
+# Contributing guidelines
+
+We welcome any kind of contribution to our software, from a simple comment or question to a full-fledged [pull request](https://help.github.com/articles/about-pull-requests/).
+
+A contribution can be one of the following cases:
+
+1. you have a question;
+1. you think you may have found a bug (including unexpected behavior);
+1. you want to make some kind of change to the code base (e.g. to fix a bug, to add a new feature, to update documentation);
+1. you want to make a new release of the code base.
+
+The sections below outline the steps in each case.
+
+## You have a question
+
+1. use the search functionality [here](https://github.com/spokenlanguage/platalea/issues) to see if someone already filed the same issue;
+2. if your issue search did not yield any relevant results, make a new issue;
+3. apply the "Question" label; apply other labels when relevant.
+
+## You think you may have found a bug
+
+1. use the search functionality [here](https://github.com/spokenlanguage/platalea/issues) to see if someone already filed the same issue;
+1. if your issue search did not yield any relevant results, make a new issue, making sure to provide enough information to the rest of the community to understand the cause and context of the problem. Depending on the issue, you may want to include:
+    - the [SHA hashcode](https://help.github.com/articles/autolinked-references-and-urls/#commit-shas) of the commit that is causing your problem;
+    - some identifying information (name and version number) for dependencies you're using;
+    - information about the operating system;
+1. apply relevant labels to the newly created issue.
+
+## You want to make some kind of change to the code base
+
+1. (**important**) announce your plan to the rest of the community *before you start working*. This announcement should be in the form of a (new) issue;
+1. (**important**) wait until some kind of consensus is reached about your idea being a good idea;
+1. if needed, fork the repository to your own GitHub profile and create your own feature branch off of the latest master commit. While working on your feature branch, make sure to stay up to date with the master branch by pulling in changes, possibly from the 'upstream' repository (follow the instructions [here](https://help.github.com/articles/configuring-a-remote-for-a-fork/) and [here](https://help.github.com/articles/syncing-a-fork/));
+1. make sure the existing tests still work by running ``tox``;
+1. also make sure you do not diverge from PEP8 standards by fixing any issues flake8 reports in the ``tox`` run; if you do diverge, make sure you clearly explain why;
+1. add your own tests (if necessary);
+1. update or expand the documentation;
+1. push your feature branch to (your fork of) the platalea repository on GitHub;
+1. create the pull request, e.g. following the instructions [here](https://help.github.com/articles/creating-a-pull-request/).
+
+In case you feel like you've made a valuable contribution, but you don't know how to write or run tests for it, or how to generate the documentation: don't let this discourage you from making the pull request; we can help you! Just go ahead and submit the pull request, but keep in mind that you might be asked to append additional commits to your pull request.
```

README.md (+23 -1)

````diff
@@ -2,15 +2,29 @@
 Understanding visually grounded spoken language via multi-tasking
 
 [![DOI](https://zenodo.org/badge/239750248.svg)](https://zenodo.org/badge/latestdoi/239750248)
-![install and run tests](https://github.com/egpbos/platalea/workflows/install%20and%20run%20tests/badge.svg?branch=master)
+[![install and run tests](https://github.com/egpbos/platalea/workflows/install%20and%20run%20tests/badge.svg?branch=master)](https://github.com/spokenlanguage/platalea/actions/workflows/pythonapp.yml)
 [![codecov](https://codecov.io/gh/spokenlanguage/platalea/branch/master/graph/badge.svg)](https://codecov.io/gh/spokenlanguage/platalea)
 
 ## Installation
 
+Clone this repo and cd into it:
+
+```sh
+git clone https://github.com/spokenlanguage/platalea.git
+cd platalea
+```
+
+To install in a conda environment, assuming conda has already been installed, run the following to download and install dependencies:
+
 ```sh
 conda create -n platalea python==3.8 pytorch -c conda-forge -c pytorch
 conda activate platalea
 pip install torchvision
+```
+
+Then install platalea with:
+
+```sh
 pip install .
 ```
 
@@ -93,6 +107,14 @@ If you don't want to use cloud logging of learning curves using wandb, you can
 disable it by running:
 ```wandb disabled```
 
+## Contributing
+
+If you want to contribute to the development of platalea, have a look at the [contribution guidelines](CONTRIBUTING.md).
+
+## Changelog
+
+We keep track of what is added, changed and removed in releases in the [changelog](CHANGELOG.md).
+
 ## References
 
 [1] Hodosh, M., Young, P., & Hockenmaier, J. (2013). Framing Image Description
````

platalea/asr.py (+4 -2)

```diff
@@ -82,6 +82,7 @@ def cost(self, item):
 
 def experiment(net, data, config, slt=False):
     _device = platalea.hardware.device()
+
    def val_loss():
        with torch.no_grad():
            net.eval()
@@ -114,10 +115,10 @@ def val_loss():
            average_loss = cost['cost'] / cost['N']
            if 'opt' not in config.keys() or config['opt'] == 'adam':
                scheduler.step()
-           if j % 100 == 0:
+           if j % config['loss_logging_interval'] == 0:
                logging.info("train {} {} {}".format(
                    epoch, j, average_loss))
-           if j % 400 == 0:
+           if j % config['validation_interval'] == 0:
                logging.info("valid {} {} {}".format(epoch, j, val_loss()))
                with torch.no_grad():
                    net.eval()
@@ -154,6 +155,7 @@ def val_loss():
    torch.save(net, 'net.best.pt')
    return results
 
+
 def get_default_config(hidden_size_factor=1024):
    fd = D.Flickr8KData
    hidden_size = hidden_size_factor * 3 // 4
```
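
The two hard-coded reporting intervals (every 100 and 400 batches) are now read from the experiment config instead. A minimal sketch of a config that supplies them; only the two interval keys come from this diff, the surrounding keys and values are illustrative assumptions:

```python
# Hypothetical training config for platalea.asr.experiment().
# 'loss_logging_interval' and 'validation_interval' are the keys this
# commit introduces; the other entries are placeholders for illustration.
config = {
    'opt': 'adam',                 # matches the optimizer check in experiment()
    'epochs': 32,                  # assumed; not shown in this diff
    'loss_logging_interval': 100,  # reproduces the old hard-coded `j % 100`
    'validation_interval': 400,    # reproduces the old hard-coded `j % 400`
}
```

Setting the two intervals to 100 and 400 recovers the previous behavior exactly.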

platalea/attention.py (+2 -1)

```diff
@@ -18,6 +18,7 @@ def forward(self, input):
         # return the resulting embedding
         return x
 
+
 class MeanPool(nn.Module):
     def __init__(self):
         super(MeanPool, self).__init__()
@@ -75,7 +76,7 @@ def __init__(self, in_size_enc, in_size_state, hidden_size):
         self.U_a = nn.Linear(in_size_enc, hidden_size, bias=False)
         self.W_a = nn.Linear(in_size_state, hidden_size, bias=False)
         self.v_a = nn.Linear(hidden_size, 1, bias=True)
-        self.prev_enc_out= None
+        self.prev_enc_out = None
 
     def forward(self, hidden, encoder_outputs):
         # Calculate energies for each encoder output
```
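
For context, the U_a/W_a/v_a triple is the standard parameterization of additive (Bahdanau-style) attention. A self-contained sketch of the energy computation such layers perform; the tensor shapes, sizes, and the softmax/context steps are assumptions for illustration, not code from this repo:

```python
import torch
import torch.nn as nn

# Additive attention: e_t = v_a^T tanh(U_a h_t + W_a s), normalized over time.
in_size_enc, in_size_state, hidden_size = 1024, 1024, 128
U_a = nn.Linear(in_size_enc, hidden_size, bias=False)
W_a = nn.Linear(in_size_state, hidden_size, bias=False)
v_a = nn.Linear(hidden_size, 1, bias=True)

encoder_outputs = torch.randn(8, 50, in_size_enc)  # (batch, time, features)
hidden = torch.randn(8, 1, in_size_state)          # current decoder state

energies = v_a(torch.tanh(U_a(encoder_outputs) + W_a(hidden)))  # (batch, time, 1)
alpha = torch.softmax(energies, dim=1)          # attention weights over time
context = (alpha * encoder_outputs).sum(dim=1)  # weighted sum of encoder states
```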

platalea/audio/features.py (+28 -23)

```diff
@@ -5,38 +5,40 @@
 
 @author: danny
 """
-from platalea.audio.preproc import four,pad,preemph, hamming, notch
-from platalea.audio.filters import apply_filterbanks,filter_centers, create_filterbanks
+from platalea.audio.preproc import four, pad, preemph, hamming, notch
+from platalea.audio.filters import apply_filterbanks, filter_centers, create_filterbanks
 from scipy.fftpack import dct
 import numpy
 import math
 
 # this file contains the main bulk of the actual feature creation functions
 
-def delta (data, N):
-        # calculate delta features, n is the number of frames to look forward and backward
+
+def delta(data, N):
+    # calculate delta features, n is the number of frames to look forward and backward
 
     # create a delta array of the right shape
     dt = numpy.zeros(data.shape)
     # pad data with first and last frame for size of n
-    for n in range (N):
-        data = numpy.row_stack((data[0,:],data, data[-1,:]))
+    for n in range(N):
+        data = numpy.row_stack((data[0, :], data, data[-1, :]))
     # calc n*c[x+n] + c[x-n] for n in N and sum them
-    for n in range (1, N + 1):
-        dt += numpy.array([n * (data[x+n,:] - data[x-n,:]) for x in range (N, len(data) - N)])
+    for n in range(1, N + 1):
+        dt += numpy.array([n * (data[x+n, :] - data[x-n, :]) for x in range(N, len(data) - N)])
     # normalise the deltas for the size of N
-    normalise = 2* sum([numpy.power(x,2) for x in range (1, N+1)])
+    normalise = 2 * sum([numpy.power(x, 2) for x in range(1, N+1)])
 
     dt = dt/normalise
 
     return (dt)
 
+
 def raw_frames(data, frame_shift, window_size):
-        # this function cuts the data into frames and calculates each frame's energy
+    # this function cuts the data into frames and calculates each frame's energy
 
-    #determine the number of frames to be extracted
+    # determine the number of frames to be extracted
     nframes = math.floor(data.size/frame_shift)
-    #apply notch filter
+    # apply notch filter
     notched_data = notch(data)
     # pad the data
     data = pad(notched_data, window_size, frame_shift)
@@ -46,8 +48,8 @@ def raw_frames(data, frame_shift, window_size):
     frames = []
     energy = []
 
-    for f in range (0, nframes):
-        frame = data[f * frame_shift : f * frame_shift + window_size]
+    for f in range(0, nframes):
+        frame = data[(f * frame_shift):(f * frame_shift + window_size)]
         energy.append(numpy.log(numpy.sum(numpy.square(frame), 0)))
         frames.append(frame)
 
@@ -59,43 +61,46 @@ def raw_frames(data, frame_shift, window_size):
 
     return (frames, energy)
 
+
 def get_freqspectrum(frames, alpha, fs, window_size):
-        # this function prepares the raw frames for conversion to frequency spectrum
-        # and applies fft
+    # this function prepares the raw frames for conversion to frequency spectrum
+    # and applies fft
 
     # apply preemphasis
     frames = preemph(frames, alpha)
     # apply hamming windowing
     frames = hamming(frames)
     # apply fft
-    freq_spectrum = four(frames,fs,window_size)
+    freq_spectrum = four(frames, fs, window_size)
 
     return freq_spectrum
 
+
 def get_fbanks(freq_spectrum, nfilters, fs):
-        # this function calculates the filters and creates filterbank features from
-        # the fft features
+    # this function calculates the filters and creates filterbank features from
+    # the fft features
 
     # get the frequencies corresponding to the bins returned by the fft
     xf = numpy.linspace(0.0, fs/2, numpy.shape(freq_spectrum)[1])
     # get the filter frequencies
-    fc = filter_centers (nfilters,fs,xf)
+    fc = filter_centers(nfilters, fs, xf)
     # create filterbanks
     filterbanks = create_filterbanks(nfilters, xf, fc)
     # apply filterbanks
     fbanks = apply_filterbanks(freq_spectrum, filterbanks)
 
     return fbanks
 
+
 def get_mfcc(fbanks):
-        # this function creates mfccs from the fbank features
+    # this function creates mfccs from the fbank features
 
     # apply discrete cosine transform to get mfccs. According to convention,
     # we discard the first filterbank (which is roughly equal to the method
     # where we only space filters from 1000hz onwards)
-    mfcc = dct(fbanks[:,1:])
+    mfcc = dct(fbanks[:, 1:])
     # discard the first coefficient of the mfcc as well and take the next 13
     # coefficients.
-    mfcc = mfcc[:,1:13]
+    mfcc = mfcc[:, 1:13]
 
     return mfcc
```
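
As a reading aid, delta(data, N) implements the usual regression formula for dynamic features, delta_c(t) = sum_{n=1..N} n * (c(t+n) - c(t-n)) / (2 * sum_{n=1..N} n^2). A usage sketch; the input shape and N=2 are illustrative assumptions:

```python
import numpy
from platalea.audio.features import delta

mfcc = numpy.random.randn(100, 12)      # 100 frames x 12 coefficients (fake data)
d = delta(mfcc, 2)                      # first-order deltas over +/-2 frames
dd = delta(d, 2)                        # second-order deltas (delta-deltas)
features = numpy.hstack([mfcc, d, dd])  # common 36-dim MFCC+delta+delta-delta stack
```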

platalea/audio/filters.py (+19 -16)

```diff
@@ -9,24 +9,25 @@
 from platalea.audio.melfreq import freq2mel, mel2freq
 import numpy
 
-def create_filterbanks (nfilters,freqrange,fc):
+
+def create_filterbanks(nfilters, freqrange, fc):
     # function to create filter banks. takes as input
     # the number of filters to be created, the frequency range and the
     # filter centers
     filterbank = []
     # for the desired # of filters do
-    for n in range (0,nfilters):
+    for n in range(0, nfilters):
         # set the begin center and end frequency of the filters
         begin = fc[n]
-        center= fc[n+1]
+        center = fc[n+1]
         end = fc[n+2]
         f = []
         # create triangular filters
         for x in freqrange:
             # 0 for f outside the filter
             if x < begin:
                 f.append(0)
-            #increasing to 1 towards the center
+            # increasing to 1 towards the center
             elif begin <= x and x <= center:
                 f.append((x-begin)/(center-begin))
             # decreasing to 0 upwards from the center
@@ -36,27 +37,29 @@ def create_filterbanks (nfilters,freqrange,fc):
             elif x > end:
                 f.append(0)
         filterbank.append(f)
-
+
     return filterbank
-
+
+
 def filter_centers(nfilters, fs, xf):
     # calculates the center frequencies for the mel filters
-
-    #space the filters equally in mels
+
+    # space the filters equally in mels
     spacing = numpy.linspace(0, freq2mel(fs/2), nfilters+2)
-    #back from mels to frequency
+    # back from mels to frequency
     spacing = mel2freq(spacing)
     # round the filter frequencies to the nearest available fft bin frequencies
-    # and return the centers for the filters.
-    filters = [xf[numpy.argmin(numpy.abs(xf-x))] for x in spacing]
-
+    # and return the centers for the filters.
+    filters = [xf[numpy.argmin(numpy.abs(xf-x))] for x in spacing]
+
     return filters
-
+
+
 def apply_filterbanks(data, filters):
     # function to apply the filterbanks and take the log of the filterbanks
-    filtered_freq = numpy.log(numpy.dot(data, numpy.transpose(filters)))
+    filtered_freq = numpy.log(numpy.dot(data, numpy.transpose(filters)))
     # same as with energy, taking the log of a filter bank with 0 power results in -inf
     # we approximate 0 power with -50 the log of 2e-22
-    filtered_freq[filtered_freq == numpy.log(0)] = -50
-
+    filtered_freq[filtered_freq == numpy.log(0)] = -50
+
     return filtered_freq
```
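
Together with features.py, these helpers form a conventional MFCC front end. A sketch of how the pieces chain; the sample rate and framing parameters are assumptions chosen to illustrate a typical 25 ms window with a 10 ms shift:

```python
import numpy
from platalea.audio.features import raw_frames, get_freqspectrum, get_fbanks, get_mfcc

fs = 16000                     # assumed sample rate
data = numpy.random.randn(fs)  # one second of fake audio
frames, energy = raw_frames(data, 160, 400)     # 10 ms shift, 25 ms window at 16 kHz
freq = get_freqspectrum(frames, 0.97, fs, 400)  # pre-emphasis, Hamming, FFT
fbanks = get_fbanks(freq, 40, fs)               # 40 log mel filterbank energies
mfcc = get_mfcc(fbanks)                         # DCT, keep coefficients 1-13
```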

platalea/audio/melfreq.py (+8 -6)

```diff
@@ -6,14 +6,16 @@
 @author: danny
 """
 import numpy
-#provides simple functions to convert a frequency to mel and vice versa
+# provides simple functions to convert a frequency to mel and vice versa
+
 
 def freq2mel(f):
-    #converts a frequency to mel
-    mel=1125*numpy.log(1+f/700)
+    # converts a frequency to mel
+    mel = 1125*numpy.log(1+f/700)
     return (mel)
 
+
 def mel2freq(m):
-    #converts mel to frequency
-    f=700*(numpy.exp(m/1125)-1)
-    return (f)
+    # converts mel to frequency
+    f = 700*(numpy.exp(m/1125)-1)
+    return f
```
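
The two conversions are exact inverses (m = 1125 ln(1 + f/700) and f = 700 (e^(m/1125) - 1)), so a round trip should recover its input; a quick sanity check, with outputs rounded:

```python
from platalea.audio.melfreq import freq2mel, mel2freq

print(freq2mel(1000))           # ~998.2 mel = 1125 * ln(1 + 1000/700)
print(mel2freq(freq2mel(440)))  # ~440.0 Hz: round trip recovers the input
```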
