StarDoc model training #5

Draft
wants to merge 8 commits into base: main

akshaykalkunte (Contributor)

WIP StarDoc model integration into FastLLM

tscholak (Collaborator) commented Nov 11, 2024

Hi @jlamypoirier! @akshaykalkunte and I talked, and we want to push this PR over the finish line. There's a lot going on here, so we should review the approach top-down to decide how it needs to be refactored before it can go into main. Off the top of my head, these are the separate concerns:

  1. Model architecture: Are VLMs GPTs from the point of view of Fast-LLM? I think they aren't because too much is different. We should add a new model architecture (e.g. "vlm") to Fast-LLM.
  2. Data preprocessing: Related to "Add prepare command" (#38). We should factor out data preprocessing and introduce an offline preprocessing step, `fast-llm prepare_data vlm --config stardoc.yaml`, that builds VLMMemmapDatasets and stores them on disk.
  3. Vision encoder implementation: Right now it's a monolithic wrapper layer that uses a HF auto model. We should discuss if and when we reimplement this in Fast-LLM. This can be a separate effort and, as a side effect, would give us yet another model class, vision_encoder, that we could also train from scratch if we wanted to.
  4. Cross-attention instead of adapter layer: StarDoc is moving towards a special form of cross-attention between the vision encoder and the LM decoder. This likely has implications for parallelization.
  5. Llama 3 support: StarDoc will use pre-trained Llama 3.2 (text-only?) models, so we need to be able to load them. See also "[feat] Llama 3.x rope scaling support" (#39).
  6. YAML configs: This PR currently doesn't support Fast-LLM's new YAML-based configs.
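
For context on item 5, the frequency rescaling that "Llama 3.x rope scaling" usually refers to can be sketched as below. This is a minimal, dependency-free sketch modeled on the publicly documented Llama 3.1 scheme, not Fast-LLM's implementation. The function name and all default values are illustrative assumptions; the real values should be read from the checkpoint's config.

```python
import math


def llama3_scaled_inv_freq(
    head_dim: int,
    rope_theta: float = 500_000.0,       # assumed default, read from config in practice
    factor: float = 8.0,                 # assumed default
    low_freq_factor: float = 1.0,        # assumed default
    high_freq_factor: float = 4.0,       # assumed default
    original_max_position: int = 8192,   # assumed default
) -> list[float]:
    """Return RoPE inverse frequencies after Llama 3.x style rescaling."""
    # Standard RoPE inverse frequencies, one per pair of head dimensions.
    inv_freq = [rope_theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

    low_freq_wavelen = original_max_position / low_freq_factor
    high_freq_wavelen = original_max_position / high_freq_factor

    scaled = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            # Short wavelengths (high frequencies) are left untouched.
            scaled.append(f)
        elif wavelen > low_freq_wavelen:
            # Long wavelengths (low frequencies) are divided by the factor.
            scaled.append(f / factor)
        else:
            # In between, interpolate smoothly between the two regimes.
            smooth = (original_max_position / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * f / factor + smooth * f)
    return scaled
```

Loading a Llama 3.x checkpoint would then mean building the rotary embeddings from these rescaled frequencies instead of the unscaled ones; how that maps onto Fast-LLM's rotary config is left open here.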

I think we can divide and conquer here.
