Influencer Performance System

A resource-intensive project that processes video data, clusters influencers, and generates performance insights. It relies on OpenAI's CLIP model for embeddings, face extraction and matching, and clustering techniques.


Architecture Overview

  1. Input:

    • Video URLs provided as raw data.
  2. Output:

    • A comprehensive report showcasing influencers’ performance with clustered data, visualizations, and insights.
    • See the full output: Influencer Performance System

      Influencer Performance Report

      | Influencer Label | Influencer Pic | Average Performance | Video URL  |
      |------------------|----------------|---------------------|------------|
      | 30               | Influencer 30  | 1.5304              | Video Link |
      | 25               | Influencer 25  | 1.12256666666667    | Video Link |
      | 3                | Influencer 3   | 1.02478604992305    | Video Link |
      | 55               | Influencer 55  | 0.9830456907        | Video Link |
      | 45               | Influencer 45  | 0.917806254725      | Video Link |
      | 15               | Influencer 15  | 0.8273821321        | Video Link |
      | 7                | Influencer 7   | 0.80381331575       | Video Link |
      | 22               | Influencer 22  | 0.7929559845        | Video Link |
      | 34               | Influencer 34  | 0.5907609883        | Video Link |

  1. Process Flow:

    • Generate vector embeddings for each video using OpenAI CLIP.

    • Cluster videos based on their embeddings to identify unique videos.

    • Calculate the average performance score for each unique video.

    • Extract human faces from the videos using OpenCV Haar cascades.

    • Identify the best-captured face from extracted images.

    • Save images to a GitHub raw content repository.

    • Match influencer faces across clusters using OpenAI CLIP, and merge clusters whose faces are sufficiently similar.

    • Calculate the average performance score for each unique influencer.

      [Process flow diagram: diagram-export-12-1-2024-9_08_38-PM]
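
The clustering and averaging steps above can be sketched in a few lines. This is a minimal illustration, not the notebook's actual code: the README does not say which clustering algorithm is used, so the greedy cosine-similarity grouping, the `threshold` value, the function names, and the toy 3-dimensional vectors (standing in for real CLIP embeddings) are all assumptions.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_embeddings(embeddings, threshold=0.9):
    """Greedy clustering (assumed technique): each vector joins the first
    cluster whose representative is similar enough, else starts a new one.
    Returns one cluster label per input vector."""
    representatives = []
    labels = []
    for vec in embeddings:
        for label, rep in enumerate(representatives):
            if cosine_similarity(vec, rep) >= threshold:
                labels.append(label)
                break
        else:
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


def average_by_cluster(labels, scores):
    """Mean performance score for each cluster label."""
    grouped = {}
    for label, score in zip(labels, scores):
        grouped.setdefault(label, []).append(score)
    return {label: mean(values) for label, values in grouped.items()}


# Toy stand-ins for CLIP embeddings and per-video performance scores.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],
]
scores = [1.0, 2.0, 0.5, 1.5]

labels = cluster_embeddings(embeddings)
print(labels)                              # [0, 0, 1, 1]
print(average_by_cluster(labels, scores))  # {0: 1.5, 1: 1.0}
```

A greedy pass like this depends on input order and on the chosen threshold; a library clustering routine may behave differently on real embeddings.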

  2. Visualization:

    • Accessible via a Streamlit app or as an HTML file.
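
The face-matching step in the process flow merges video clusters that show the same influencer. One way to express that merge is a union-find over cluster IDs, sketched below. The pairwise similarity values, the `threshold`, the function names, and the choice to average over all videos of an influencer (rather than over cluster averages) are assumptions made for illustration; the README does not specify them.

```python
from statistics import mean


def find(parent, x):
    """Return the root of x, shortening the path as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merge_clusters(cluster_ids, face_similarity, threshold=0.85):
    """Map each cluster to an influencer ID by joining every pair of
    clusters whose face similarity reaches the threshold."""
    parent = {c: c for c in cluster_ids}
    for (a, b), similarity in face_similarity.items():
        if similarity >= threshold:
            root_a, root_b = find(parent, a), find(parent, b)
            if root_a != root_b:
                # Keep the smaller ID as the influencer label.
                parent[max(root_a, root_b)] = min(root_a, root_b)
    return {c: find(parent, c) for c in cluster_ids}


def influencer_averages(cluster_to_influencer, cluster_scores):
    """Mean score per influencer over all videos in its merged clusters."""
    grouped = {}
    for cluster, scores in cluster_scores.items():
        grouped.setdefault(cluster_to_influencer[cluster], []).extend(scores)
    return {influencer: mean(s) for influencer, s in grouped.items()}


# Hypothetical face similarities between the best face of each cluster.
face_similarity = {(0, 2): 0.93, (1, 3): 0.40, (2, 3): 0.88}
cluster_scores = {0: [1.0, 2.0], 1: [0.5], 2: [3.0], 3: [2.0]}

mapping = merge_clusters([0, 1, 2, 3], face_similarity)
print(mapping)                                       # {0: 0, 1: 1, 2: 0, 3: 0}
print(influencer_averages(mapping, cluster_scores))  # {0: 2.0, 1: 0.5}
```

Because the merge is transitive, clusters 0 and 3 end up together even though they were never compared directly, which is why a strict threshold matters for accuracy.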

System Used

Machine Specifications:

  • CPU: 4 vCPUs (Intel Xeon Scalable, 3.5 GHz, Sapphire Rapids)
  • Memory: 16 GiB (4 GiB per vCPU)
  • Architecture: x86_64
  • Environment: LightningAI-equivalent system

Dependencies:

All dependencies are listed in the requirements.txt file.


Setup Guide

Clone the Repository

git clone https://github.com/Ansumanbhujabal/Influencer_Performance_System.git
cd Influencer_Performance_System

Create and Activate Virtual Environment

# Create a virtual environment
python3 -m venv venv

# Activate the environment
source venv/bin/activate    # For Linux/Mac
venv\Scripts\activate       # For Windows

Install Dependencies

Install all required packages:

pip install -r requirements.txt

Manually install additional dependencies:

pip install ftfy regex tqdm --quiet
pip install git+https://github.com/openai/CLIP.git --quiet
pip install matplotlib --quiet
pip install opencv-python-headless --quiet
pip install torch torchvision torchaudio --quiet
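
After installing, a quick check that the packages are visible to the active environment can save a failed notebook run. The sketch below uses only the standard library; the list of import names is an assumption derived from the pip commands above (for example, opencv-python-headless is imported as `cv2`, and the CLIP repository installs a module named `clip`).

```python
import importlib.util

# Import names (not pip package names) for the packages installed above.
REQUIRED_MODULES = [
    "ftfy", "regex", "tqdm", "clip", "matplotlib",
    "cv2", "torch", "torchvision", "torchaudio",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found
    in the current environment (nothing is actually imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```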

Run the Jupyter Notebook

  1. Navigate to the notebooks directory:
    cd notebooks
  2. Start Jupyter Notebook:
    jupyter notebook
  3. Open and run Data_Processor_and_Insights.ipynb to process the data and generate insights.

Visualization and Report Access

Online Access:

The performance report can be visualized remotely:

Local Access:

  1. Open the HTML report in any browser:
    influencer_report_up.html
  2. Run the Streamlit app:
    streamlit run app.py
  3. For the underlying numbers, refer to the spreadsheet in the output directory:
    cd /output
    Final_Influencer_Data_insights_up_dec1_t2.xlsx
  4. Open the generated report in any browser.

Future Scope

  • Real-time Face Matching: Optimize the workflow by introducing real-time face extraction and matching, eliminating redundant processes to save resources and reduce execution time.
  • Enhanced Clustering: Improve clustering mechanisms for better influencer detection and performance accuracy.
  • Scalability: Adapt the system to process larger datasets more efficiently.

Challenges Faced

  • Resource-Intensive Processing: Managing vector generation, face matching, and clustering on non-GPU systems.
  • Data Cleaning: Ensuring unique identification of videos and influencers from raw, unstructured data.
  • Cluster Matching: Efficiently combining clusters to avoid duplicates while maintaining accuracy.
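
On a CPU-only machine, one common way to keep memory use bounded during embedding generation is to process inputs in fixed-size batches. The helper below is a generic sketch of that idea; whether the notebook batches its work this way is not stated, and the `batched` name, the batch size, and the placeholder URLs are illustrative assumptions.

```python
def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


# Placeholder inputs; real runs would iterate over the video URLs.
video_urls = ["video_a", "video_b", "video_c", "video_d", "video_e"]

for batch in batched(video_urls, 2):
    # An embedding step would run here on one small batch at a time.
    print(len(batch), batch)
```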

Made Over a Weekend

Created with passion and dedication in a short span to showcase influencer performance analytics.


Author: Ansuman Bhujabal
GitHub Repository: Influencer Performance System
