WebVTT does not parse in rsubs-lib 0.3.1 but did in 0.1.9 #58

Open
rbozan opened this issue Jun 7, 2024 · 5 comments
Labels
bug Something isn't working

Comments


rbozan commented Jun 7, 2024

https://gist.github.com/samdutton/ca37f3adaf4e23679957b8083e061177

pub fn parse_vtt(content: String) -> anyhow::Result<Vec<VTTLine>> {
    let result = VTT::parse(content)?.lines;
    Ok(result)
}

#[cfg(test)]
mod tests {
    use super::parse_vtt;

    #[tokio::test]
    async fn it_parses_vtt() {
        let subs = parse_vtt(include_str!("../fixtures/test.vtt").to_string());

        insta::assert_debug_snapshot!(subs);
    }

}

0.1.9:

Ok(
    [
        VTTLine {
            line_number: "",
            style: Some(
                "Default",
            ),
            line_start: Time {
                h: 0,
                m: 0,
                s: 2,
                ms: 500,
                frames: 0,
                fps: 0.0,
            },
            line_end: Time {
                h: 0,
                m: 0,
                s: 4,
                ms: 300,
                frames: 0,
                fps: 0.0,
            },
            position: Some(
                VTTPos {
                    pos: 0,
                    pos_align: None,
                    size: 0,
                    line: 0,
                    line_align: None,
                    align: "center",
                },
            ),
            line_text: "and the way we access it is changing\\N",
        },
    ],
)

0.3.1:

Err(
    VTTError {
        line: 1,
        kind: InvalidFormat,
    },
)

Contributor

bytedream commented Jul 3, 2024

The formatting probably got screwed up when you copy-pasted the lines from the gist (at least that's what happened to me). The WEBVTT header MUST be followed by a blank line, otherwise the file is invalid according to the spec. Since 0.3.0 this library only parses a file successfully if it is 100% spec compliant.

Invalid:

WEBVTT
00:00:00.500 --> 00:00:02.000
The Web is always changing

00:00:02.500 --> 00:00:04.300
and the way we access it is changing

Valid:

WEBVTT

00:00:00.500 --> 00:00:02.000
The Web is always changing

00:00:02.500 --> 00:00:04.300
and the way we access it is changing
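
A quick way to tell the two cases apart before handing a file to the parser is to look at the line right after the header. The following is only a minimal sketch using the standard library; `header_followed_by_blank_line` is a hypothetical helper, not part of rsubs-lib, and treating a file that ends right after the header as acceptable is an assumption.

```rust
/// Hypothetical helper (not part of rsubs-lib): returns true if the first
/// line starts with "WEBVTT" and the line after it is blank.
fn header_followed_by_blank_line(content: &str) -> bool {
    let mut lines = content.lines();
    let first = match lines.next() {
        Some(line) => line,
        None => return false, // empty input has no header at all
    };
    // Ignore an optional byte order mark in front of the header.
    if !first.trim_start_matches('\u{feff}').starts_with("WEBVTT") {
        return false;
    }
    // Assumption: a file that ends right after the header is accepted here.
    matches!(lines.next(), None | Some(""))
}
```

With the two samples above, the "Invalid" text yields `false` and the "Valid" text yields `true`.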

Author

rbozan commented Jul 3, 2024

Hm, I guess you are right about that: I retried with the blank line and it worked. My original subtitles still do not parse, though. I'm not sure whether that is because of this library or because their format does not follow the spec. I suppose they are not following the spec, if this library is now 100% spec compliant.

https://invidious.fdn.fr/api/v1/captions/Tnpe6aoJOA0?label=English

Their header is like this:

WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:05.820
My friends are from Morocco, are you from Morocco too? 
- Morocco, of course. Yes.
Look here, how interesting, it says

00:00:05.820 --> 00:00:11.700
 "House of Pakistan".
It's truly being in one territory but feeling entirely in another.  
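
Until the parser accepts such headers, one possible workaround is to normalize the text before parsing. This is a rough sketch, not something the library provides: `strip_header_metadata` is a hypothetical name, it assumes the header metadata is everything between the WEBVTT line and the first blank line, and it rewrites line endings to `\n`.

```rust
/// Hypothetical workaround (not part of rsubs-lib): drop the lines between
/// the WEBVTT line and the first blank line, e.g. "Kind: captions".
fn strip_header_metadata(content: &str) -> String {
    let mut out = String::with_capacity(content.len());
    let mut lines = content.lines();
    // Keep the first line (the WEBVTT header) unchanged.
    if let Some(first) = lines.next() {
        out.push_str(first);
        out.push('\n');
    }
    let mut in_header = true;
    for line in lines {
        if in_header {
            if line.is_empty() {
                // The blank line ends the header; it is kept below.
                in_header = false;
            } else if line.contains("-->") {
                // A cue timing directly after the header: add the missing blank line.
                in_header = false;
                out.push('\n');
            } else {
                // Header metadata line: skip it.
                continue;
            }
        }
        out.push_str(line);
        out.push('\n');
    }
    out
}
```

The result could then be passed to `VTT::parse` as in the original report. Note that a cue identifier placed directly after the header without a blank line would be dropped by this sketch.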

Owner

adracea commented Jul 3, 2024

No, this is a valid concern regarding the comments section of the spec. I believe I might have been ignoring them previously. But this is indeed something that I don't think is currently being accounted for.

From the specs: https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#examples_2

adracea added the bug label on Jul 3, 2024
Owner

adracea commented Jul 3, 2024

if !line.starts_with("WEBVTT") || block_lines.next().is_some() {

At first glance I believe the issue is in this condition. We could probably just remove the || ... part, @bytedream, right?

Contributor

bytedream commented

if !line.starts_with("WEBVTT") || block_lines.next().is_some() {

At first glance I believe the issue is in this condition. We could probably just remove the || ... part, @bytedream, right?

Yes
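
To make the proposed change concrete, here is a small standalone sketch comparing the quoted condition with the relaxed one. It is not the library's actual code: `InvalidFormat` is a hypothetical stand-in for the real error type, both function names are made up, and the sketch assumes that `block` is the text from the start of the file up to the first blank line.

```rust
/// Hypothetical stand-in for the library's error type.
#[derive(Debug, PartialEq)]
struct InvalidFormat;

/// Strict variant, mirroring the condition quoted above: any second line
/// in the header block is rejected.
fn check_header_strict(block: &str) -> Result<(), InvalidFormat> {
    let mut block_lines = block.lines();
    let line = block_lines.next().unwrap_or("");
    if !line.starts_with("WEBVTT") || block_lines.next().is_some() {
        return Err(InvalidFormat);
    }
    Ok(())
}

/// Relaxed variant with the `|| ...` part removed: only the first line
/// of the header block is checked.
fn check_header_relaxed(block: &str) -> Result<(), InvalidFormat> {
    let line = block.lines().next().unwrap_or("");
    if !line.starts_with("WEBVTT") {
        return Err(InvalidFormat);
    }
    Ok(())
}
```

With the header block "WEBVTT", "Kind: captions", "Language: en" from the captions above, the strict variant returns an error while the relaxed one accepts it. Under the same assumption, a cue placed directly after the header without a blank line would also end up in the header block, so the relaxed check alone would no longer flag that case.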
