start gemini module

autodistill · Dec 13, 2023 · 8a0842c
1 parent 2fcc32b
commit 8a0842c
Showing 8 changed files with 162 additions and 136 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 Roboflow
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -3,64 +3,67 @@
     <a align="center" href="" target="_blank">
       <img
         width="850"
-        src="https://media.roboflow.com/open-source/autodistill/autodistill-banner.png?3"
+        src="https://media.roboflow.com/open-source/autodistill/autodistill-banner.png"
       >
     </a>
   </p>
 </div>
 
-# Autodistill Base Model Template
+# Autodistill Gemini Module
 
-**⚠️ Note: Before you start building a Base Model, check out our [Available Models](https://docs.autodistill.com/#available-models) directory to see if a model is already being implemented. If your desired model is being implemented, check the [Autodistill](https://github.com/autodistill/autodistill) GitHub Issues for progress. We encourage you to offer support to models you want to see in Autodistill if work is already being done on them.**
+This repository contains the code supporting the Gemini base model for use with [Autodistill](https://github.com/autodistill/autodistill).
 
-This repository contains a template for use in creating a Base Model for [Autodistill](https://github.com/autodistill/autodistill).
+[Gemini](https://deepmind.google/technologies/gemini/), developed by Google, is a multimodal computer vision model that allows you to ask questions about images. You can use Gemini with Autodistill for image classification.
 
-A Base Model is a large model that you can use for automatically labeling data. Autodistill enables you to connect Base Models to a smaller Target Model. A new model is trained using the Target Model architecture and your labeled data. This model will be smaller and thus more cost effective to run.
-
-Autodistill is an ecosystem of Base and Target Models, with the main [Autodistill](https://github.com/autodistill/autodistill) repository acting as the bridge between the two.
-
-This repository contains a starter template from which you can create a Base Model extension.
+> [!NOTE]
+> Using this project will incur billing charges for calls to the Gemini API.
+> Refer to the [Google Cloud pricing](https://cloud.google.com/pricing/) page to estimate your expected costs. This package makes one API call per image you want to label.
 
 Read the full [Autodistill documentation](https://autodistill.github.io/autodistill/).
-## Steps to Build a Base Model
-
-To build a base model, first rename the `src` directory to the name of the model you want to implement:
-
-```
-mkdir autodistill_model_name
-```
-
-Use underscores to separate words in the folder name.
 
-Next, open the `model.py` file. This is the file where your model loading and inference code will be stored. If you need to write helper functions for use with your model -- for example, long methods for loading data, processing extensions -- you may opt to create new files to store the helper scripts.
+## Installation
 
-In `model.py`, replace the `Model` class name with the name of your model.
+To use Gemini with autodistill, you need to install the following dependency:
 
-Next, implement the following functions:
-
-1. `__init__`: Code for loading the model.
-2. `predict`: A function that takes in an image name, runs inference, and returns a `supervision` Detections object (object detection) or a `supervision` Classifications object (classification).
-
-Replace the import statement in the `__init__.py` file in your model directory to point to your model. You only need to import the model, such as:
 
+```bash
+pip3 install autodistill-gemini
 ```
-from autodistill_clip.clip_model import CLIP
-```
-
-Your version should be set in the `__init__.py` file as `0.1.0` before submitting your model for review.
 
-Update the `setup.py` file to use the name of your model where appropriate. Add all of the requisite dependencies to the `install_requires` section.
-
-Your Base Model should feature a README that shows a minimal example of how to use the base model. This should only be a few lines of code. Refer to `README_EXAMPLE.md` for an example of an Autodistill Base Model README. Feel free to copy this example and replace all parts as required.
-
-Your package must be licensed under the same license as the model you are using (i.e. if your model uses an Apache 2.0 license, your Autodistill extension must use the same license). Your license should be in a file called `LICENSE`, stored in the root directory of your Autodistill extension GitHub repository.
+## Quickstart
+
+```python
+from autodistill.detection import CaptionOntology
+from autodistill_gemini import Gemini
+
+# define an ontology that maps Gemini prompts to class names
+# the ontology dictionary has the format {caption: class}, where caption is the prompt
+# sent to the base model and class is the label saved in the generated annotations
+# then, load the model
+base_model = Gemini(
+    ontology=CaptionOntology(
+        {
+            "person": "person",
+            "a forklift": "forklift"
+        }
+    ),
+    api_key="api-key",
+    gcp_region="us-central1",
+    gcp_project="project-name",
+)
+
+result = base_model.predict("image.jpg")
+
+print(result)
+
+# label a folder of images
+base_model.label("./context_images", extension=".jpeg")
+```
 
-Update your README to note the license applied to your package.
+## License
 
-When your Autodistill extension is ready for testing, open an Issue in the main [Autodistill](https://github.com/autodistill/autodistill) repository with a link to a public GitHub repository that contains your code.
+This project is licensed under an [MIT license](LICENSE).
 
-An Autodistill maintainer will review your code. If accepted, we will:
+## 🏆 Contributing
 
-1. Add your package to the [Autodistill documentation](https://docs.autodistill.com).
-2. Package your project up to PyPi and publish it as an official `autodistill` extension.
-3. Announce your project on social media.
+We love your input! Please see the core Autodistill [contributing guide](https://github.com/autodistill/autodistill/blob/main/CONTRIBUTING.md) to get started. Thank you 🙏 to all our contributors!
diff --git a/README_EXAMPLE.md b/README_EXAMPLE.md
diff --git a/autodistill_base_model/__init__.py b/autodistill_base_model/__init__.py
diff --git a/autodistill_base_model/model.py b/autodistill_base_model/model.py
diff --git a/autodistill_gemini/__init__.py b/autodistill_gemini/__init__.py
@@ -0,0 +1,3 @@
+from autodistill_gemini.gemini_model import Gemini
+
+__version__ = "0.1.0"
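The `__version__` string added here is what `setup.py` in this commit extracts with a regular expression. As a minimal sketch of that lookup, the snippet below reuses the same pattern but wraps it in a hypothetical helper, `read_version`, which is not part of the commit. The explicit error for a missing assignment is an assumption added for illustration; the original calls `.group(1)` directly on the search result.

```python
import re

# Same pattern as in setup.py: matches __version__ = "x.y.z" with either quote style.
VERSION_PATTERN = re.compile(r'__version__\s*=\s*[\'"]([^\'"]*)[\'"]')


def read_version(source: str) -> str:
    """Extract the version string from the text of a module (hypothetical helper)."""
    match = VERSION_PATTERN.search(source)
    if match is None:
        # Fail with a clear message instead of an AttributeError on None.
        raise ValueError("no __version__ assignment found")
    return match.group(1)


if __name__ == "__main__":
    module_text = 'from autodistill_gemini.gemini_model import Gemini\n\n__version__ = "0.1.0"\n'
    print(read_version(module_text))
```

Reading the version from the source text avoids importing the package at build time, when its dependencies may not yet be installed.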
diff --git a/autodistill_gemini/gemini_model.py b/autodistill_gemini/gemini_model.py
@@ -0,0 +1,85 @@
+import os
+from dataclasses import dataclass
+
+import requests
+import supervision as sv
+from autodistill.detection import CaptionOntology, DetectionBaseModel
+
+HOME = os.path.expanduser("~")
+
+
+@dataclass
+class Gemini(DetectionBaseModel):
+    ontology: CaptionOntology
+    api_key: str
+    gcp_region: str
+    gcp_project: str
+
+    def __init__(
+        self, ontology: CaptionOntology, api_key: str, gcp_region: str, gcp_project: str
+    ) -> None:
+        self.ontology = ontology
+        self.api_key = api_key
+        self.gcp_region = gcp_region
+        self.gcp_project = gcp_project
+
+    def predict(self, input: str, prompt: str, confidence: float = 0.5) -> sv.Classifications:
+        payload = {
+            "contents": {
+                "role": "user",
+                "parts": [
+                    {
+                        "fileData": {
+                            "mimeType": "image/png",
+                            "fileUri": input,
+                        }
+                    },
+                    {"text": prompt},
+                ],
+            },
+            "safety_settings": {
+                "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
+                "threshold": "BLOCK_LOW_AND_ABOVE",
+            },
+            "generation_config": {
+                "temperature": 0.4,
+                "topP": 1.0,
+                "topK": 32,
+                "maxOutputTokens": 2048,
+            },
+        }
+
+        response = requests.post(
+            f"https://{self.gcp_region}-aiplatform.googleapis.com/v1/projects/{self.gcp_project}/locations/{self.gcp_region}/publishers/google/models/gemini-pro-vision:streamGenerateContent",
+            json=payload,
+            headers={"Authorization": f"Bearer {self.api_key}"},
+        )
+
+        # Expected response body (abridged):
+        #
+        # {
+        #   "candidates": [
+        #     {"content": {"parts": [{"text": string}]}}
+        #   ]
+        # }
+        #
+        # The text of the first part of the first candidate is read below.
+
+        if not response.ok:
+            raise Exception(response.text)
+
+        response_body = response.json()
+
+        text_response = response_body["candidates"][0]["content"]["parts"][0]["text"]
+
+        prompts = self.ontology.prompts()
+
+        is_in = []
+
+        for prompt in prompts:
+            is_in.append(prompt in text_response)
+
+        return sv.Classifications(
+            class_ids=self.ontology.class_ids(),
+            confidence=[1 if i else 0 for i in is_in],
+        )
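The matching step at the end of `predict` can be sketched on its own. The snippet below is a minimal, standard-library-only illustration, not the module's implementation. `match_prompts` is a hypothetical helper, and the case-insensitive comparison is an assumption (the diff compares case-sensitively). To the best of my knowledge, `sv.Classifications` expects `class_id` and `confidence` as NumPy arrays, so the lists returned here would still need converting before the result object is built.

```python
from typing import List, Tuple


def match_prompts(text_response: str, prompts: List[str]) -> Tuple[List[int], List[float]]:
    """Return (class_ids, confidences), one entry per prompt, in ontology order.

    A prompt counts as present when it occurs as a substring of the model's
    text response. Matching is case-insensitive here, which is an assumption
    and differs from the case-sensitive check in the diff.
    """
    lowered = text_response.lower()
    # Class ids are simply the positions of the prompts in the ontology.
    class_ids = list(range(len(prompts)))
    # Confidence is binary: 1.0 if the prompt text was found, otherwise 0.0.
    confidences = [1.0 if p.lower() in lowered else 0.0 for p in prompts]
    return class_ids, confidences


if __name__ == "__main__":
    ids, conf = match_prompts("The image shows a forklift.", ["person", "a forklift"])
    print(ids, conf)
```

Plain substring matching also fires on partial words (for example, `person` inside `personal`), so matching on word boundaries may be preferable for short prompts.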
diff --git a/setup.py b/setup.py
@@ -1,27 +1,26 @@
+import re
+
 import setuptools
 from setuptools import find_packages
-import re
 
-with open("./autodistill_base_model/__init__.py", 'r') as f:
+with open("./autodistill_gemini/__init__.py", "r") as f:
     content = f.read()
     # from https://www.py4u.net/discuss/139845
     version = re.search(r'__version__\s*=\s*[\'"]([^\'"]*)[\'"]', content).group(1)
-    
+
 with open("README.md", "r") as fh:
     long_description = fh.read()
 
 setuptools.setup(
-    name="autodistill-base-model",
+    name="autodistill-gemini",
     version=version,
-    author="",
-    author_email="",
+    author="Roboflow",
+    author_email="[email protected]",
     description="Model for use with Autodistill",
     long_description=long_description,
     long_description_content_type="text/markdown",
-    url="",
-    install_requires=[
-        # list your requires
-    ],
+    url="https://github.com/autodistill/autodistill-gemini",
+    install_requires=["autodistill", "supervision"],
     packages=find_packages(exclude=("tests",)),
     extras_require={
         "dev": ["flake8", "black==22.3.0", "isort", "twine", "pytest", "wheel"],