Conversation
…ted histogram processor
…and add numpy-compatible slicing access
This variant is slower than the previous one but is more flexible.
65a0c80 to
0b9de3c
Compare
superbock
left a comment
There was a problem hiding this comment.
I haven't had the chance to test everything, thus only some general comments.
madmom/features/beats.py
Outdated
| ---------- | ||
| max_agents : int | ||
| Max number of agents. | ||
| tempi : int |
There was a problem hiding this comment.
To be in line with other beat trackers, this should be num_tempi.
Additionally, all these parameters should be available in __init__, otherwise they cannot be controlled. If they are optional (i.e. initialised with default values) the should be marked accordingly.
| Child agents will inherit a percentage of their parents. | ||
|
|
||
| """ | ||
| # TODO: Should we use @classmethod to set all values from outside? |
There was a problem hiding this comment.
@classmethod is probably not the right choice for this. Usually these parameters are just added to __init__.
Since you documented them here (as numpydoc requires the init-params to be documented at class-level), these should be available to __init__. Vice versa, the params of init must be documented here as well.
| TEMPI = 3 | ||
| INDUCTION_TIME = 2. | ||
|
|
||
| class Agent(object): |
There was a problem hiding this comment.
I'd rather make this a class at module level. Although it is only used by MultiAgentBeatTrackingProcessor it is IMHO cleaner to move this class one level up. I'd rename it to BeatTrackingAgent(Processor) or something like that, maybe add a leading underscore if it is not usable otherwise.
| CORRECTION_FACTOR = 0.25 | ||
| INHERIT_SCORE_FACTOR = 0.9 | ||
|
|
||
| # used for normalizing the score, set from outside |
There was a problem hiding this comment.
If these are set from outside, normal class variables are just fine (lowercase without the leading _).
| interval=self.interval + int(error * 0.5), | ||
| prediction=self.prediction + int(error * 0.5), | ||
| beats=self.beats) | ||
| ] |
There was a problem hiding this comment.
This could be done in less lines of code with a simple loop.
Maybe a stupid question, but why is the error always added and not subtracted as well?
There was a problem hiding this comment.
Hmm, since all three objects created here have different intervals and predictions the only loop-thingy coming to my mind right now would be something like that:
predictions = [self.prediction + error, self.prediction + error, self.prediction + int(error * 0.5)]
intervals = [self.interval, self.interval + error, self.interval + int(error * 0.5)]
for p, i in zip(predictions, intervals):
self.__class__(prediction=p, interval=i, ...)
Is that would you mean? I also want to test whether all three child-agents are really necessary to make the results better (they are used like that in the paper).
Regarding the error variable: It can also be a negative number (if the max event occurred before the prediction), so adding should work ;)
There was a problem hiding this comment.
Yes, something like this, but you could even simplify it to:
prediction_errors = [1, 1, 0.5]
interval_errors = [0, 1, 0.5]
for p, i in zip(prediction_errors, interval_errors):
self.__class__(prediction=self.prediction + int(p * error),
interval=self.interval + int(i * error), ...)
| # penalize agent for not detecting the beat | ||
| self.score -= (abs(error) / frames) * normalization * act | ||
| # return new child agents | ||
| return new_agents |
There was a problem hiding this comment.
This could probably be refactored a bit by moving common code outside the if/else clause and then decide later on whether new agents need to be created or not.
| **kwargs) | ||
| self.tempo_estimator = tempo_estimator | ||
| self.agents = [] | ||
| # TODO: Not sure if thats nice? |
There was a problem hiding this comment.
No, it's not 😏 . I think these values should be passed to the agents when instantiating them.
| # induct agents after induction time has passed | ||
| if self.counter == self.INDUCTION_TIME * self.fps: | ||
| # create histogram of induction window | ||
| histogram = self.tempo_estimator.interval_histogram(buffer) |
There was a problem hiding this comment.
I am not sure, but shouldn't activation and not the buffer be passed to the tempo estimator? In the latter case you compute the histogram for more than the last activation which is probably not what you want.
There was a problem hiding this comment.
Ah yeah, thats a good point and actually a larger topic ;)
Well, the concept for both offline and online is the same: Always use the first N seconds to estimate the tempo. Then introduce the agents (for offline at the beginning of the song, for online after those N seconds). From that point in time there is no more tempo estimation going on and the agents are themselves responsible to adapt to the changing speed of the music. That's why in the code the online tempo estimator is not used at all, only the offline tempo estimator is used once after those N seconds (Btw. thats also done in a similar way in all papers I found). That also means that for online the first N seconds don't have agents and therefore cannot produce beat predictions which probably leads to a worse result. For that reason in some papers they just ignore those first N induction seconds when evaluating - but I think that would be unfair to all the other tracking algorithms which are capable of producing predictions from the beginning.
One other problem we are facing here is that the online networks tend to give worse predictions in (approx.) the first 5 seconds of a song. And that is exactly the section used to estimate the base-tempi for the agents. So I already thought about how agents could get their tempo updated during runtime but it's difficult because each agent has its own tempo already from the beginning (since we are using multiple tempo estimations) and updates it on the fly.
Also connected to that issue is the handling of abrupt tempo changes. Some solutions I found use a kind of "overseer" module which is only looking for sudden changes and starts another agent induction when necessary. I don't know yet whether we will need something like this since it also introduces additional complexity.
So yeah, right now for offline I get 90% and for online 80% F-Score for the Ballroom set (didn't play around with the parameters yet). If we have some more ideas I will try them out ofc. :)
There was a problem hiding this comment.
Yes, tempo estimation at the very beginning is indeed a problem. Relying on those estimations is thus probably not a very good idea. How about simply starting a fixed number of agents at the beginning of the song and initialise them with tempi distributed uniformly on a logarithmic tempo scale? Soon afterwards the "bad" agents will be killed/die anyways, right?
For example for downbeat tracking I limit the number of modelled tempi to 60 in order to prevent an excessively bloated state space. 60 tempi is enough to cover the whole tempo range from 40..250bpm. Please note, that the downbeat tracker is not able to use any other tempi.
I guess the agents are able to adapt their tempo quite quickly, so maybe this is a good workaround/extension and would render the need for tempo estimation completely.
There was a problem hiding this comment.
Ah, yeah interesting idea! I will test that (maybe also for offline) ;)
| # return beat(s) | ||
| return np.array(beats_) / self.fps | ||
|
|
||
| def induct_agents(self, activations, interval, start=0): |
There was a problem hiding this comment.
Probably make start a required parameter. Guess it is less error-prone then.
madmom/features/beats.py
Outdated
| # if no maxima could be found just use max value | ||
| if len(maxima) == 0: | ||
| maxima = np.array([activations.argmax()]) | ||
| # pick N maxima indizes where activation is highest |
492a4d5 to
3776eec
Compare
3776eec to
b4d5381
Compare
b2eaafa to
dcdbb8b
Compare
dcdbb8b to
1c640e5
Compare
1c640e5 to
be97b36
Compare
This PR adds another Beat Tracker which is based on Multiple Agents. It already works for offline and online but some improvements are necessary. Ideas and suggestions are highly welcome :)
Remaining TODOs before this pull request can be merged