A resource-intensive project that processes video data, clusters influencers, and generates performance insights. It uses OpenAI's CLIP model for embeddings, together with face detection and clustering techniques.
Input:
- Video URLs provided as raw data.
Output:
- Generate a comprehensive report showcasing influencers’ performance with clustered data, visualizations, and insights.
- See the full output: Influencer Performance System
| Influencer Label | Influencer Pic | Average Performance | Video URL |
|---|---|---|---|
| 30 | ![]() | 1.5304 | Video Link |
| 25 | ![]() | 1.12256666666667 | Video Link |
| 3 | ![]() | 1.02478604992305 | Video Link |
| 55 | ![]() | 0.9830456907 | Video Link |
| 45 | ![]() | 0.917806254725 | Video Link |
| 15 | ![]() | 0.8273821321 | Video Link |
| 7 | ![]() | 0.80381331575 | Video Link |
| 22 | ![]() | 0.7929559845 | Video Link |
| 34 | ![]() | 0.5907609883 | Video Link |
Process Flow:
1. Generate a vector embedding for each video using OpenAI CLIP.
2. Cluster the videos by their embeddings to identify unique videos.
3. Calculate the average performance score for each unique video.
4. Extract human faces from the videos using OpenCV Haar cascades.
5. Identify the best-captured face among the extracted images.
6. Save the face images to a GitHub repository and serve them as raw content.
7. Match influencer faces across clusters using OpenAI CLIP, combining clusters whose faces are similar.
8. Calculate the average performance score for each unique influencer.
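The first step of the process flow above generates a vector embedding for each video with OpenAI CLIP. It could look roughly like the sketch below, which illustrates the idea under stated assumptions and is not the notebook's code. The assumptions are: sampling one frame per second, using the `ViT-B/32` checkpoint, and mean-pooling the frame vectors into a single video vector. `sample_frames`, `encode_frames`, and `video_embedding` are hypothetical helper names. OpenCV and CLIP are imported inside the functions that need them, so the pooling helper runs with NumPy alone.

```python
import numpy as np


def sample_frames(video_path, every_n_seconds=1.0):
    """Yield RGB frames roughly every `every_n_seconds` seconds (needs opencv-python)."""
    import cv2  # lazy import so video_embedding() below works with NumPy alone

    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback if the file reports no FPS
    step = max(int(round(fps * every_n_seconds)), 1)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream, or the video could not be opened
            break
        if index % step == 0:
            yield cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV decodes to BGR
        index += 1
    cap.release()


def encode_frames(frames, model_name="ViT-B/32"):
    """Return an (n_frames, dim) array of CLIP image embeddings (needs torch, clip, Pillow)."""
    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load(model_name, device=device)
    batch = torch.stack([preprocess(Image.fromarray(f)) for f in frames]).to(device)
    with torch.no_grad():
        features = model.encode_image(batch)
    return features.float().cpu().numpy()  # .float() handles fp16 output on GPU


def video_embedding(frame_embeddings):
    """Mean-pool per-frame embeddings into a single unit-length video vector."""
    X = np.asarray(frame_embeddings, dtype=np.float64)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # give every frame equal weight
    mean = X.mean(axis=0)
    return mean / np.linalg.norm(mean)
```

For example, `video_embedding(encode_frames(list(sample_frames("video.mp4"))))` returns one unit vector per video, where `video.mp4` is a placeholder path. Because `encode_frames` reloads the model on every call, it is cheaper to call `clip.load` once and reuse the model when processing many videos.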
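The next two steps of the process flow above cluster the video embeddings to find unique videos and then average each unique video's performance score. They might be sketched as follows. The README does not say which clustering algorithm is used, so this sketch swaps in a simple greedy cosine-similarity threshold in which near-duplicate uploads of the same video share a label. The names `cluster_by_similarity` and `average_by_label` are hypothetical, and the 0.95 threshold is an assumption to tune on real data.

```python
from collections import defaultdict

import numpy as np


def cluster_by_similarity(embeddings, threshold=0.95):
    """Greedy clustering: each vector joins the most similar existing cluster if that
    cosine similarity is >= threshold; otherwise it starts a new cluster."""
    X = np.asarray(embeddings, dtype=np.float64)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    representatives = []  # one unit vector per cluster (its first member)
    labels = []
    for vec in X:
        if representatives:
            sims = np.stack(representatives) @ vec  # cosine similarity to each cluster
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                labels.append(best)
                continue
        representatives.append(vec)
        labels.append(len(representatives) - 1)
    return labels


def average_by_label(labels, scores):
    """Average performance score per cluster label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, score in zip(labels, scores):
        totals[label] += score
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}
```

A greedy pass depends on input order. A library method such as agglomerative clustering with a distance threshold avoids that dependence, at a higher computational cost.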
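Face extraction with OpenCV Haar cascades and the choice of the best-captured face can be sketched as below. The cascade file `haarcascade_frontalface_default.xml` ships with the pip OpenCV packages. The `detectMultiScale` parameters are assumptions, however. The README also does not define "best-captured", so this sketch scores each crop by Laplacian variance (a sharpness measure) multiplied by its area. That scoring rule is a hypothetical heuristic, and the helper names are hypothetical as well.

```python
import numpy as np


def detect_faces(frame_rgb):
    """Return face crops found by OpenCV's bundled frontal-face Haar cascade."""
    import cv2  # lazy import: the scoring helpers below only need NumPy

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2GRAY)
    # Assumed parameters; tune for the resolution of the source videos.
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(60, 60))
    return [frame_rgb[y:y + h, x:x + w] for (x, y, w, h) in boxes]


def sharpness(crop):
    """Variance of a 4-neighbour Laplacian; blurry crops score low."""
    g = np.asarray(crop, dtype=np.float64)
    if g.ndim == 3:  # collapse colour channels to a grey level
        g = g.mean(axis=2)
    lap = 4 * g[1:-1, 1:-1] - g[:-2, 1:-1] - g[2:, 1:-1] - g[1:-1, :-2] - g[1:-1, 2:]
    return float(lap.var())


def best_face(crops):
    """Index of the crop with the highest sharpness * area score, or None if there are none."""
    if len(crops) == 0:
        return None
    scores = [sharpness(c) * c.shape[0] * c.shape[1] for c in crops]
    return int(np.argmax(scores))
```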
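When face images are saved to a GitHub repository, a report can embed them by their raw-content URL. GitHub serves raw files at `https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>`. The README does not say where the images are stored, so the branch name, the `faces/` folder, and the helper names below are all assumptions.

```python
from urllib.parse import quote


def raw_github_url(owner, repo, branch, path):
    """Build the raw.githubusercontent.com URL for a file committed to a repository."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{quote(path)}"


def markdown_image(url, alt=""):
    """Markdown image syntax for embedding the face picture in a report table."""
    return f"![{alt}]({url})"
```

For example, `markdown_image(raw_github_url("Ansumanbhujabal", "Influencer_Performance_System", "main", "faces/30.jpg"))` yields an image cell for a hypothetical `faces/30.jpg` on a `main` branch.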
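The final steps of the process flow above merge clusters that show the same influencer and then average performance per influencer. These can be sketched as a union-find over the best-face CLIP embedding of each cluster. The README only states that clusters are combined based on face similarity using CLIP. Three details below are assumptions: the cosine threshold of 0.9, averaging over unique videos rather than over every raw upload, and the names `merge_clusters_by_face` and `influencer_performance`.

```python
import numpy as np


def merge_clusters_by_face(face_embeddings, threshold=0.9):
    """Union video clusters whose best-face embeddings have cosine similarity >= threshold.

    face_embeddings: dict mapping video-cluster label -> embedding of that cluster's best face.
    Returns a dict mapping each cluster label -> influencer id (smallest label in its group).
    """
    labels = sorted(face_embeddings)
    parent = {lab: lab for lab in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    vecs = np.array([face_embeddings[lab] for lab in labels], dtype=np.float64)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T  # pairwise cosine similarities
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if sims[i, j] >= threshold:
                ri, rj = find(labels[i]), find(labels[j])
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)  # root stays the smallest label
    return {lab: find(lab) for lab in labels}


def influencer_performance(video_scores, cluster_to_influencer):
    """Average the per-video scores of every cluster mapped to the same influencer."""
    grouped = {}
    for cluster, score in video_scores.items():
        grouped.setdefault(cluster_to_influencer[cluster], []).append(score)
    return {inf: sum(s) / len(s) for inf, s in grouped.items()}
```

Union-find makes the merge transitive. If cluster A matches B and B matches C, all three become one influencer even when A and C fall below the threshold. That behaviour handles chains of similar photos well but can over-merge look-alikes, so the threshold matters.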
Visualization:
- Accessible via a Streamlit app or as an HTML file.
System Requirements:
- CPU: 4 vCPUs (Intel Xeon Scalable, 3.5 GHz, Sapphire Rapids)
- Memory: 16 GiB (4 GiB per vCPU)
- Architecture: x86_64
- Environment: LightningAI or an equivalent system
Setup:

All dependencies are listed in the `requirements.txt` file.

Clone the repository:

```bash
git clone https://github.com/Ansumanbhujabal/Influencer_Performance_System.git
cd Influencer_Performance_System
```

Create and activate a virtual environment:

```bash
# Create a virtual environment
python3 -m venv venv
# Activate the environment on Linux/macOS
source venv/bin/activate
# Activate the environment on Windows
venv\Scripts\activate
```

Install all required packages:

```bash
pip install -r requirements.txt
```

Manually install additional dependencies:

```bash
pip install ftfy regex tqdm --quiet
pip install git+https://github.com/openai/CLIP.git --quiet
pip install matplotlib --quiet
pip install opencv-python-headless --quiet
pip install torch torchvision torchaudio --quiet
```
Usage:
- Navigate to the `notebooks` directory: `cd notebooks`
- Start Jupyter Notebook: `jupyter notebook`
- Open and run `Data_Processor_and_Insights.ipynb` to process the data and generate insights.
The performance report can be viewed in several ways:
- Open the HTML file `influencer_report_up.html` in any browser.
- Run the Streamlit app with `streamlit run app.py`, then open the local URL it prints in a browser.
- For the underlying numbers, refer to `Final_Influencer_Data_insights_up_dec1_t2.xlsx` in the `output` directory.
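The README does not document how `influencer_report_up.html` is produced. As a hypothetical sketch, the four columns of the Output table can be rendered as an HTML table using only the standard library. `render_report` and its row keys are illustrative names, not the project's code.

```python
from html import escape


def render_report(rows):
    """Render influencer rows as an HTML table, best average performance first.

    rows: iterable of dicts with keys "label", "pic_url", "avg", "video_url".
    """
    lines = [
        "<table>",
        "<tr><th>Influencer Label</th><th>Influencer Pic</th>"
        "<th>Average Performance</th><th>Video URL</th></tr>",
    ]
    for r in sorted(rows, key=lambda r: r["avg"], reverse=True):
        # escape() also escapes quotes, so URLs are safe inside attribute values
        lines.append(
            f"<tr><td>{escape(str(r['label']))}</td>"
            f"<td><img src=\"{escape(r['pic_url'])}\" width=\"80\"></td>"
            f"<td>{r['avg']:.4f}</td>"
            f"<td><a href=\"{escape(r['video_url'])}\">Video Link</a></td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)
```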
Future Improvements:
- Real-time Face Matching: Optimize the workflow by introducing real-time face extraction and matching, eliminating redundant processes to save resources and reduce execution time.
- Enhanced Clustering: Improve clustering mechanisms for better influencer detection and performance accuracy.
- Scalability: Adapt the system to process larger datasets more efficiently.

Challenges:
- Resource-Intensive Processing: Managing vector generation, face matching, and clustering on non-GPU systems.
- Data Cleaning: Ensuring unique identification of videos and influencers from raw, unstructured data.
- Cluster Matching: Efficiently combining clusters to avoid duplicates while maintaining accuracy.
Created with passion and dedication in a short span to showcase influencer performance analytics.
Author: Ansuman Bhujabal
GitHub Repository: Influencer Performance System