Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultiModal does not work with Next.js Frontend and FastAPI backend #65

Open
Laktus opened this issue Apr 25, 2024 · 10 comments
Open

MultiModal does not work with Next.js Frontend and FastAPI backend #65

Laktus opened this issue Apr 25, 2024 · 10 comments
Assignees

Comments

@Laktus
Copy link

Laktus commented Apr 25, 2024

Hi,

I wanted to implement custom evaluating logic. Realizing that only the python implemention of LLamaIndex supports QuestionGenerator i thought that it would be more reasonable to the FastAPI backend + Next.js Frontend setup.

I managed to pass the data for images to the backend extending the handleSubmit of useChat for vercel/ai#725. I however don't know how to duplicate the functionality of StreamData in the FastAPI backend.

Can you make this example work out of the box or provide some further documentation of how to implement this? Currently the multi modality does not work, without multiple changes.

Thanks for taking your time and reading my request.

@marcusschiesser
Copy link
Collaborator

@Laktus by coincidence, I just added a vercel/ai compatible StreamingResponse in the last release, see https://github.com/run-llama/create-llama/blob/main/templates/types/streaming/fastapi/app/api/routers/vercel_response.py

@Laktus
Copy link
Author

Laktus commented Apr 26, 2024

@marcusschiesser That looks awesome thanks! I think someone still needs to modify the template to work with it though. Or is it already integrated in the last release? The template is missing being able to handle the incoming data from useChat's handleSubmit as well as passing it back from server to frontend using the StreamingTextResponse (or your vercel_response in FastAPI).
Who would be responsible for integrating the changes of the latest FastAPI into the starter-template?

@marcusschiesser
Copy link
Collaborator

@Laktus The template is part of create-llama since npx [email protected] - It was just updated in npx [email protected] - what are you missing?

@Laktus
Copy link
Author

Laktus commented May 5, 2024

@marcusschiesser I will try integrating my changes to the backend for handling the data parameter and then will add a message if it works, thanks for the update!

@Laktus
Copy link
Author

Laktus commented May 9, 2024

@marcusschiesser
Hi Marcus, i don't see any inherent integration with images in the FastAPI backend. Do you know how i can add this? How do i pass image information into the ChatMessage object when im using a image capable model like GPT-4 or GPT-4-Vision? (I already managed to pass the image from front-to-backend and back to the frontend for the display)

Thanks for any help.

@marcusschiesser
Copy link
Collaborator

@Laktus, the problem is that in Python, you have to use the MultiModalVectorStoreIndex to use images

So I would start replacing VectorStoreIndex with this class.

Details about using it are here:
https://docs.llamaindex.ai/en/stable/examples/evaluation/multi_modal/multi_modal_rag_evaluation/?h=multimodalvectorstoreindex

If you like, you're welcome to post a diff of your code here.

@Laktus
Copy link
Author

Laktus commented May 10, 2024

@marcusschiesser But this saves the images into the vector DB or not? I don't want to populate the vector DB with the image information, but only want to attach the image to one message.

In the Vision Docs of OpenAI (https://platform.openai.com/docs/guides/vision) you can see the following possibilities of the completion API

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4-turbo",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0])

The astream_chat only accepts the message as a raw str. If we could directly pass a ChatMessage object then weit should be possible to add this additional information to the API call below or not? Why is this not supported out of the box? I think the TS version also solves it in this way.

@marcusschiesser
Copy link
Collaborator

@Laktus yes, this is a current issue of the Python version. We're working on aligning the multi-modal capabilities of the Python and the Typescript version. Once that's done, we will add image upload support to the FastAPI backend

@Laktus
Copy link
Author

Laktus commented Jul 28, 2024

@marcusschiesser Is there any update on when this will be implemented?

@marcusschiesser
Copy link
Collaborator

Not yet, we'll need multi-modal support in the Python framework first

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants