Add video models + functions #814

Closed
wants to merge 61 commits into from

@dreadatour dreadatour commented Jan 13, 2025

See #797

Video models added

class VideoFile(File):
    """`DataModel` for reading video files."""

    def get_info(self) -> "Video":
        """Returns video file information."""

    def get_frame_np(self, frame: int) -> "ndarray":
        """Reads video frame from a file."""

    def get_frame(self, frame: int, format: str = "jpg") -> bytes:
        """Reads video frame from a file and returns as image bytes."""

    def save_frame(self, frame: int, output_file: str, format: Optional[str] = None) -> "VideoFrame":
        """Saves video frame as an image file."""

    def get_frames_np(self, start_frame: int = 0, end_frame: Optional[int] = None, step: int = 1) -> "Iterator[ndarray]":
        """Reads video frames from a file."""

    def get_frames(self, start_frame: int = 0, end_frame: Optional[int] = None, step: int = 1, format: str = "jpg") -> "Iterator[bytes]":
        """Reads video frames from a file and returns as bytes."""

    def save_frames(self, output_dir: str, start_frame: int = 0, end_frame: Optional[int] = None, step: int = 1, format: str = "jpg") -> "Iterator[VideoFrame]":
        """Saves video frames as image files."""

    def save_fragment(self, start_time: float, end_time: float, output_file: str) -> "VideoFragment":
        """Saves video interval as a new video file."""

    def save_fragments(self, intervals: list[tuple[float, float]], output_dir: str) -> "Iterator[VideoFragment]":
        """Saves video intervals as new video files."""


class VideoFragment(VideoFile):
    """`DataModel` for reading video fragments."""

    start: float = Field(default=-1.0)
    end: float = Field(default=-1.0)


class VideoFrame(VideoFile):
    """`DataModel` for reading video frames."""

    frame: int = Field(default=-1)
    timestamp: float = Field(default=-1.0)
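
As a rough illustration of how the `start_frame`, `end_frame` and `step` parameters above could select frames, here is a minimal sketch. The helper `frame_indices` and its `total_frames` argument are hypothetical and not part of this PR. Treating `end_frame` as exclusive is an assumption, since the description does not say whether the bound is inclusive.

```python
from typing import Iterator, Optional


def frame_indices(
    total_frames: int,
    start_frame: int = 0,
    end_frame: Optional[int] = None,
    step: int = 1,
) -> Iterator[int]:
    """Yield the frame numbers selected by start/end/step (hypothetical helper)."""
    if start_frame < 0:
        raise ValueError("start_frame must be a non-negative integer")
    if step < 1:
        raise ValueError("step must be a positive integer")
    # Assumption: end_frame is exclusive and defaults to the end of the video.
    stop = total_frames if end_frame is None else min(end_frame, total_frames)
    yield from range(start_frame, stop, step)
```

For a 10-frame video, `list(frame_indices(10, step=3))` selects frames 0, 3, 6 and 9.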

Meta models added

class Image(DataModel):
    """`DataModel` for image file meta information."""

    width: int = Field(default=-1)
    height: int = Field(default=-1)
    format: str = Field(default="")


class Video(DataModel):
    """`DataModel` for video file meta information."""

    width: int = Field(default=-1)
    height: int = Field(default=-1)
    fps: float = Field(default=-1.0)
    duration: float = Field(default=-1.0)
    frames: int = Field(default=-1)
    format: str = Field(default="")
    codec: str = Field(default="")


class Frame(DataModel):
    """`DataModel` for video frame image meta information."""

    frame: int = Field(default=-1)
    timestamp: float = Field(default=-1.0)
    width: int = Field(default=-1)
    height: int = Field(default=-1)
    format: str = Field(default="")
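
The `fps`, `frames`, `frame` and `timestamp` fields above are related. The sketch below shows one way a frame number could be mapped to a timestamp. `VideoInfo` is a stand-in dataclass (not the PR's `Video` model) and `frame_timestamp` is a hypothetical helper. It assumes a constant frame rate and that the `-1` defaults mean "unknown", neither of which is stated in the PR.

```python
from dataclasses import dataclass


@dataclass
class VideoInfo:
    """Stand-in for the `Video` meta model (only the fields used here)."""

    fps: float = -1.0
    duration: float = -1.0
    frames: int = -1


def frame_timestamp(info: VideoInfo, frame: int) -> float:
    """Return the timestamp in seconds of a zero-based frame number."""
    if info.fps <= 0:
        # Assumption: a non-positive fps (such as the -1.0 default) means unknown.
        raise ValueError("fps is unknown")
    if frame < 0 or (info.frames >= 0 and frame >= info.frames):
        raise ValueError("frame is out of range")
    # Assumption: constant frame rate, so frame N starts at N / fps seconds.
    return frame / info.fps
```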

Usage examples

Can be found here: iterative/datachain-examples#28

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 90.80460% with 16 lines in your changes missing coverage. Please review.

Project coverage is 87.81%. Comparing base (7f757b3) to head (4098e8b).
Report is 8 commits behind head on main.

Files with missing lines      Patch %   Lines
src/datachain/lib/video.py    88.42%    8 Missing and 6 partials ⚠️
src/datachain/lib/file.py     96.22%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #814      +/-   ##
==========================================
+ Coverage   87.74%   87.81%   +0.07%     
==========================================
  Files         129      130       +1     
  Lines       11462    11633     +171     
  Branches     1545     1567      +22     
==========================================
+ Hits        10057    10216     +159     
- Misses       1017     1023       +6     
- Partials      388      394       +6     
Flag Coverage Δ
datachain 87.74% <90.80%> (+0.07%) ⬆️

Member

@dmpetrov dmpetrov left a comment

Amazing PR!

It would be great to use concise and minimalistic naming and API because we are going to have many file types for multiple domains.

  1. Naming

Keywords like Meta will make it hard for users to remember and use the classes - users have their own meta 🙂

How about this renaming:
VideoFile -> BaseVideo (I assume people won't use this often)
VideoMeta -> Video (the most used class)
VideoClip -> Clip (also, shouldn't it be based on Video with meta?)
VideoFrame -> FrameBase
VideoFrameMeta -> Frame

start_time --> start
end_time --> end
frames_count --> count

Image -> BaseImage
ImageMeta -> Image

FileTypes can also be extended: image (read meta), base_image (do not read meta), video (read meta), base_video (do not read meta), video_clip, base_video_clip, ...

  2. Do we need dummy classes?

I assume that people prefer working with meta information when dealing with images and videos. A follow-up question: do we really need BaseImage and BaseVideo without any logic? Why don't we clean up the API and keep only the meta-enriched versions? Users can still work with videos as File if meta is not needed.

  3. Do we need singular methods?

save_video_clips() and save_video_clip(): how much extra code does a user need if we drop the singular form? If the plural method alone is enough, let's avoid the singular version.

The same question applies to video_frames() and video_frames_np().

I assume we can add methods and classes later if there is a need. But I would not start with such a rich API for now, and would try my best to keep it minimalistic.

WDYT?
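
To make the point about singular methods concrete, here is a minimal sketch of how a caller could get the singular behaviour from a plural, iterator-returning method alone. `save_clips` is a toy stand-in that only yields labels; it is not the PR's `save_video_clips()`, and its name and return values are assumptions made for illustration.

```python
from typing import Iterator


def save_clips(intervals: list[tuple[float, float]]) -> Iterator[str]:
    """Toy plural method: yields one label per (start, end) interval.

    A real implementation would cut and upload video files instead.
    """
    for start, end in intervals:
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval: ({start}, {end})")
        yield f"clip_{start:.1f}_{end:.1f}"


# The singular case needs only one extra expression on the caller's side:
single = next(save_clips([(2.5, 5.0)]))
```

Under these assumptions the cost of dropping the singular form is a one-element list plus a `next()` call.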

cloudflare-workers-and-pages bot commented Jan 14, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 4098e8b
Status: ✅  Deploy successful!
Preview URL: https://4d43370e.datachain-documentation.pages.dev
Branch Preview URL: https://video-models.datachain-documentation.pages.dev

@dreadatour
Contributor Author

  1. Naming

> Keywords like Meta will make it hard for users to remember and use the classes - users have their own meta 🙂

👍

> How about this renaming:
> VideoFile -> BaseVideo (I assume people won't use this often)
> VideoMeta -> Video (the most used class)
> VideoClip -> Clip (also, shouldn't it be based on Video with meta?)
> VideoFrame -> FrameBase
> VideoFrameMeta -> Frame

So far our naming uses the File suffix: TextFile, ImageFile and File itself. I kept VideoFile for now, but renamed the others:

  • ImageMeta -> Image
  • VideoClipFile -> VideoClip (I can rename it to Clip as you suggested, I am just not sure yet; see the next line)
  • VideoFrameFile -> VideoFrame (I could rename it to Frame to be consistent with Clip, but Frame is already taken, see below)
  • VideoMeta -> Video
  • VideoFrameMeta -> Frame

> start_time --> start
> end_time --> end
> frames_count --> count

Done. Only frames_count became frames, because I am not sure about count; it is too general, IMO.

> Image -> BaseImage
> ImageMeta -> Image

We don't have an Image model, we have an ImageFile model, so I left it as is for now. ImageMeta -> Image is done.

> FileTypes can also be extended: image (read meta), base_image (do not read meta), video (read meta), base_video (do not read meta), video_clip, base_video_clip, ...

That's a good suggestion, but for now we use FileTypes only in the from_storage method. I am not sure we want to change it to download files and read meta 🤔 Even with an additional param.

  2. Do we need dummy classes?

> I assume that people prefer working with meta information when dealing with images and videos. A follow-up question: do we really need BaseImage and BaseVideo without any logic? Why don't we clean up the API and keep only the meta-enriched versions? Users can still work with videos as File if meta is not needed.

Good question. I added VideoFile only because we already have ImageFile, just to be consistent. It is also useful when we use from_storage with type=video, because then we can use the VideoFile type in mappers, like this:

def video_meta(file: "VideoFile") -> Video:
    """
    Returns video file meta information.

    Args:
        file (VideoFile): VideoFile object.

    Returns:
        Video: Video file meta information.
    """

  3. Do we need singular methods?

> save_video_clips() and save_video_clip(): how much extra code does a user need if we drop the singular form? If the plural method alone is enough, let's avoid the singular version.
> The same question applies to video_frames() and video_frames_np().

Sounds reasonable to me 👍 I will update the code (not done yet).

  4. Default values

Done.

> WDYT?

Those are great comments! Love the discussion ❤️


    def save(self, destination: str):
        """Writes its content to destination"""
        self.read().save(destination)


class Image(DataModel):
Member

why do we need this separate model?

Contributor Author

Same as for video info (Video model). I can remove it from this PR 🤔

Member

it's just a bit weird that we have ImageFile and Image (that contains only some basic metadata) 🤔

Contributor Author

It was VideoMeta (and ImageMeta) before, but Dmitry asked to rename these models here. I agree that having a Video (Image) model with just meta looks odd. I think you're right and we should inherit this model from VideoFile (ImageFile) to extend files with meta; then it will make sense. If not, what do you think about VideoInfo (and ImageInfo)?


return video_frame(self, frame, format)

def save_frame(self, frame: int, output_file: str) -> "VideoFrame":
Member

does VideoFrame have information that requires it to be a File?

Contributor Author

Good question. VideoFrame class looks like this:

class VideoFrame(VideoFile):
    """`DataModel` for reading video frames."""

    frame: int = Field(default=-1)
    timestamp: float = Field(default=-1.0)
    orig: File

It is inherited from VideoFile, which is inherited from File. I now think this VideoFrame model should be inherited from ImageFile instead, because it is an image.

Also, TBH, I am not sure we need orig: File here (and should it be orig: VideoFile?).

Also, what if we want a "virtual video frame" model? It would be inherited from VideoFile, so the model holds a video file plus additional frame and timestamp fields. That way we would not need to store the frame in storage; we could use the original video file and extract the frame from it when needed.

There are many questions about the API and many possible answers, but it is way too much for this PR, and we need real-world use cases and examples to make it all work in the best way.

@dreadatour
Contributor Author

> @dreadatour is the PR description up to date?

It is now 👍

codec: str = Field(default="")


class Frame(DataModel):
Member

I would suggest renaming those back to SomethingMeta if we keep this approach

it is super confusing - VideoFrame and Frame - which one is the main class?

@dreadatour
Contributor Author

In this Video Models PR, we now have models for:

  • VideoFile – based on the File model with additional methods added to work with video: get info, get frames, get fragments.
  • VideoFragment – based on the VideoFile model with additional fields: start time and end time.
  • VideoFrame – based on ImageFile with additional fields: frame number and timestamp.

There are some questions about this implementation:

  1. When we are splitting VideoFile into fragments, we are uploading fragment video clips into storage and creating a new VideoFragment model with this uploaded video file. In this model, we have start time and end time signals, but there is no link by default to the original video file. Do we need this link (for example, an orig field with VideoFile type)? Do we always need this link? Do we need these signals (start and end time) without a link to the original video file?

  2. Same for frames: when we are splitting VideoFile into frames, we are uploading frame images into storage and creating a new VideoFrame model with this uploaded image file. We do have frame and timestamp signals, but no link to the original video. Do we need these signals? Do we need the link to the original video?

  3. What about virtual video fragments and lightweight frame models? It is an original video file model with additional signals (start and end time for fragments and timestamp for frames), but the original file is still the same—no physical file split and no upload happens. It looks like these virtual fragment and virtual frame models are required and will be used often, but we haven’t implemented them yet. Do we need them? How should we organize an API to work with these models? Do we need a set of additional methods in the VideoFile model class?
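
One possible shape for the "virtual" fragment raised in question 3 is sketched below. Everything here is hypothetical: `VirtualFragment`, its `source` field and `frame_range` are illustrative names, not part of this PR or of DataChain. The sketch only shows that start and end signals plus a reference to the original file are enough to address frames without cutting or uploading a new file, assuming a constant frame rate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualFragment:
    """Hypothetical model: a time window over an existing video, no new file."""

    source: str   # path or URI of the original video (stays untouched)
    start: float  # window start in seconds
    end: float    # window end in seconds

    def __post_init__(self) -> None:
        if self.start < 0 or self.end <= self.start:
            raise ValueError("expected 0 <= start < end")

    def frame_range(self, fps: float) -> range:
        """Frame numbers whose timestamp n / fps falls in [start, end)."""
        if fps <= 0:
            raise ValueError("fps must be positive")
        # Assumption: constant frame rate and a half-open window.
        return range(math.ceil(self.start * fps), math.ceil(self.end * fps))
```

Because the model keeps `source`, it also answers the "link to the original video" concern from questions 1 and 2 for this virtual case.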

@dreadatour dreadatour mentioned this pull request Feb 3, 2025
@dreadatour dreadatour closed this Feb 6, 2025
@dreadatour
Contributor Author

Work continues in #890

@dreadatour dreadatour deleted the video-models branch February 6, 2025 16:33
Successfully merging this pull request may close these issues.

Support Video file and Video clip, Video frame models and operations with them
9 participants