`--processors` / `n_processors` greater than 1 produces incorrect token timestamps

I initially assumed the `--processors` / `n_processors` option was related to core count, but it actually seems to split the audio into chunks and process them in parallel.

When `n_processors` is set to `1`, the correct token timestamps are produced for each segment:
```json
{
	"timestamps": {
		"from": "00:00:20,000",
		"to": "00:00:24,560"
	},
	"offsets": {
		"from": 20000,
		"to": 24560
	},
	"text": " actually took a watch out of its waistcoat pocket and look at it and then hurried on,",
	"tokens": [
		{
			"text": " actually",
			"timestamps": {
				"from": "00:00:20,020",
				"to": "00:00:20,530"
			},
			"offsets": {
				"from": 20020,
				"to": 20530
			},
			"id": 1682,
			"p": 0.998559,
			"t_dtw": -1
		},
		{
			"text": " took",
			"timestamps": {
				"from": "00:00:20,530",
				"to": "00:00:20,760"
			},
			"offsets": {
				"from": 20530,
				"to": 20760
			},
			"id": 1718,
			"p": 0.999518,
			"t_dtw": -1
		},
		{
			"text": " a",
			"timestamps": {
				"from": "00:00:20,810",
				"to": "00:00:20,850"
			},
			"offsets": {
				"from": 20810,
				"to": 20850
			},
			"id": 257,
			"p": 0.999047,
			"t_dtw": -1
		},

```


When `n_processors` is set to `2`, the token timestamps reset to zero in the segment that starts at `20.930s` (possibly the split point used), while the segment-level timestamps remain correct:
```json
{
	"timestamps": {
		"from": "00:00:20,930",
		"to": "00:00:25,970"
	},
	"offsets": {
		"from": 20930,
		"to": 25970
	},
	"text": " watch out of its waistcoat pocket and look at it and then hurry on, Alice started to her feet,",
	"tokens": [
		{
			"text": "[_BEG_]",
			"timestamps": {
				"from": "00:00:00,000",
				"to": "00:00:00,000"
			},
			"offsets": {
				"from": 0,
				"to": 0
			},
			"id": 50363,
			"p": 0.99157,
			"t_dtw": -1
		},
		{
			"text": " watch",
			"timestamps": {
				"from": "00:00:00,000",
				"to": "00:00:00,330"
			},
			"offsets": {
				"from": 0,
				"to": 330
			},
			"id": 2342,
			"p": 0.641946,
			"t_dtw": -1
		},
		{
			"text": " out",
			"timestamps": {
				"from": "00:00:00,330",
				"to": "00:00:00,530"
			},
			"offsets": {
				"from": 330,
				"to": 530
			},
			"id": 503,
			"p": 0.99322,
			"t_dtw": -1
		},
		{
			"text": " of",
			"timestamps": {
				"from": "00:00:00,530",
				"to": "00:00:00,660"
			},
			"offsets": {
				"from": 530,
				"to": 660
			},
			"id": 286,
			"p": 0.997212,
			"t_dtw": -1
		},
```
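
The mismatch in the output above can be checked mechanically: in the broken case, the token offsets fall far outside the range of the segment that contains them. Below is a minimal sketch of such a check, assuming each segment is a parsed JSON object shaped like the excerpts above. The function name `tokens_outside_segment` and the `tolerance_ms` parameter are hypothetical, chosen only for illustration.

```python
def tokens_outside_segment(segment, tolerance_ms=500):
    """Return the tokens whose offsets lie outside the segment's own range.

    `segment` is one parsed segment object, shaped like the JSON excerpts
    above. `tolerance_ms` is an arbitrary slack value (an assumption), since
    token times may differ slightly from the segment boundaries.
    """
    seg_from = segment["offsets"]["from"]
    seg_to = segment["offsets"]["to"]
    return [
        tok
        for tok in segment.get("tokens", [])
        if tok["offsets"]["to"] < seg_from - tolerance_ms
        or tok["offsets"]["from"] > seg_to + tolerance_ms
    ]


# Values copied from the n_processors = 2 excerpt above.
broken_segment = {
    "offsets": {"from": 20930, "to": 25970},
    "tokens": [
        {"text": "[_BEG_]", "offsets": {"from": 0, "to": 0}},
        {"text": " watch", "offsets": {"from": 0, "to": 330}},
        {"text": " out", "offsets": {"from": 330, "to": 530}},
        {"text": " of", "offsets": {"from": 530, "to": 660}},
    ],
}

suspicious = tokens_outside_segment(broken_segment)
print(len(suspicious), "token(s) fall outside the segment range")
```

For the `n_processors = 1` excerpt, the same check returns an empty list, because every token offset lies within `20000` to `24560`.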

As a workaround for the `processors > 1` case, one could track the `[_BEG_]` tokens, which appear to 'zero out' the time, and add offsets relative to them, but that makes things more complex than they need to be.
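
To illustrate how much guesswork such a workaround involves, here is a rough sketch of it. It rests on assumptions that the output above only suggests and does not confirm: that a `[_BEG_]` token at offset `0` inside a segment with a non-zero start marks the beginning of a new chunk, and that the chunk starts exactly at that segment's `from` offset. The names `rebase_token_offsets` and `format_ms` are hypothetical helpers, not part of any library.

```python
import copy

BEG_TOKEN = "[_BEG_]"


def format_ms(ms):
    """Format milliseconds as HH:MM:SS,mmm, the style used in the JSON output."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def rebase_token_offsets(segments):
    """Return a copy of `segments` with token times shifted to absolute time.

    Assumption: a `[_BEG_]` token at offset 0 in a segment that starts later
    than 0 marks a new chunk, and that chunk begins at the segment's start.
    """
    fixed = copy.deepcopy(segments)
    base = 0  # guessed start (in ms) of the chunk currently being processed
    for seg in fixed:
        seg_start = seg["offsets"]["from"]
        for tok in seg.get("tokens", []):
            is_reset = tok["text"] == BEG_TOKEN and tok["offsets"]["from"] == 0
            if is_reset and seg_start > 0:
                base = seg_start
            for key in ("from", "to"):
                tok["offsets"][key] += base
                tok["timestamps"][key] = format_ms(tok["offsets"][key])
    return fixed


# Values copied from the n_processors = 2 excerpt above.
segments = [
    {
        "offsets": {"from": 20930, "to": 25970},
        "tokens": [
            {
                "text": "[_BEG_]",
                "timestamps": {"from": "00:00:00,000", "to": "00:00:00,000"},
                "offsets": {"from": 0, "to": 0},
            },
            {
                "text": " watch",
                "timestamps": {"from": "00:00:00,000", "to": "00:00:00,330"},
                "offsets": {"from": 0, "to": 330},
            },
        ],
    }
]

for tok in rebase_token_offsets(segments)[0]["tokens"]:
    print(repr(tok["text"]), tok["timestamps"]["from"], tok["timestamps"]["to"])
```

The sketch keeps the guessed chunk start until the next reset is seen, so later segments of the same chunk are shifted by the same amount. If the real split point differs from the segment start, every shifted timestamp will be off by that difference.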

In general, it would be more natural and easier to have the tokens timed correctly relative to the source audio, rather than relying on complex hacks that try to guess the intended timestamps.
