Objective Personality AI (OPAI) is a project aimed at developing AI models that classify personality types based on video transcripts. It uses datasets gathered from various sources, including YouTube, and applies machine learning techniques to achieve this goal.
To run the scripts effectively, especially those that compute embeddings with models like GritLM/GritLM-7B, ensure your system meets the following requirements:
- GPU Memory: At least 27 GB of GPU memory is required to compute embeddings with the GritLM/GritLM-7B model.
- Processing Time: It takes approximately 5 seconds to compute embeddings per dataset entry.
Note: These requirements are crucial for performance and for avoiding runtime errors due to insufficient resources.
For alternative models, see the MTEB leaderboard.
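Since the per-entry cost adds up quickly on larger datasets, it can help to estimate the total embedding time before starting a run. The following is a minimal sketch that only multiplies the approximate 5 seconds per entry quoted above by the dataset size; the function names are hypothetical and not part of this repository, and real timings will vary with hardware and transcript length.

```python
# Approximate per-entry cost quoted above for GritLM/GritLM-7B.
SECONDS_PER_ENTRY = 5


def estimate_embedding_time(num_entries, seconds_per_entry=SECONDS_PER_ENTRY):
    """Return the estimated total embedding time in seconds."""
    if num_entries < 0:
        raise ValueError("num_entries must be non-negative")
    return num_entries * seconds_per_entry


def format_duration(seconds):
    """Format a number of seconds as 'Hh MMm SSs'."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    # A dataset of 1000 entries at roughly 5 seconds each.
    print(format_duration(estimate_embedding_time(1000)))  # 1h 23m 20s
```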
The success of AI typing relies heavily on the quality and variety of data it can access. Currently, there are two methods for gathering data:
- Utilizing the dataset created by Tom Aylott (subtlegradient)
- Scraping data from YouTube videos using this project
The repository expects the dataset at the path set in the .env file as TRANSCRIPTS_CSV=<path_to_dataset.csv>, in the following CSV format:
name,ops_type,ModalitySensory,ModalityDe,ObserverDecider,DiDe,OiOe,SN,TF,SleepPlay,BlastConsume,InfoEnergy,IntroExtro,FlexFriends,GeneralisationSpecialisation,transcript_tokens_length,transcript
Field descriptions:
- name: Normalized person's name using utils#normalise_name(name)
- ops_type: Full ops type (e.g. MF-Ni/Fi-SB/P(C) [2])
- ModalitySensory: 'F' | 'M' | None. Sexual modality of the sensory function
- ModalityDe: 'F' | 'M' | None. Sexual modality of the extroverted decider function
- ObserverDecider: 'Observer' | 'Decider' | None
- DiDe: 'Di' | 'De' | None
- OiOe: 'Oi' | 'Oe' | None
- ...
- transcript_tokens_length: Number of tokens computed with tiktoken.get_encoding("cl100k_base")
- transcript: Transcript of a person
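Before computing embeddings, it may be worth checking that a dataset actually matches this format. The sketch below is one possible check using only the standard library: it verifies that the header contains the columns listed above and that the documented fields hold one of their allowed values. The function name validate_transcripts is hypothetical, and treating an empty cell as None is an assumption, since the source does not say how None is serialized.

```python
import csv
import io

# Column names taken from the CSV header documented above.
EXPECTED_COLUMNS = [
    "name", "ops_type", "ModalitySensory", "ModalityDe", "ObserverDecider",
    "DiDe", "OiOe", "SN", "TF", "SleepPlay", "BlastConsume", "InfoEnergy",
    "IntroExtro", "FlexFriends", "GeneralisationSpecialisation",
    "transcript_tokens_length", "transcript",
]

# Allowed values for the fields described above.
# ASSUMPTION: None is stored as an empty cell.
ALLOWED_VALUES = {
    "ModalitySensory": {"F", "M", ""},
    "ModalityDe": {"F", "M", ""},
    "ObserverDecider": {"Observer", "Decider", ""},
    "DiDe": {"Di", "De", ""},
    "OiOe": {"Oi", "Oe", ""},
}


def validate_transcripts(csv_file):
    """Return a list of human-readable problems found in the CSV file object."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [column for column in EXPECTED_COLUMNS if column not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for row_number, row in enumerate(reader, start=1):
        for column, allowed in ALLOWED_VALUES.items():
            if row[column] not in allowed:
                problems.append(
                    f"data row {row_number}: unexpected {column}={row[column]!r}"
                )
    return problems


if __name__ == "__main__":
    # A header-only file that lacks most columns is reported as such.
    print(validate_transcripts(io.StringIO("name,transcript\n")))
```

To check a real file, open it with open(path, newline="", encoding="utf-8") and pass the file object to validate_transcripts; an empty list means no problems were found.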
Clone the repository and navigate to the project directory:
git clone https://github.com/stanbar/objectivepersonality.ai.git
cd objectivepersonality.ai
Install dependencies:
poetry install
To compute embeddings for the transcripts in TRANSCRIPTS_CSV and write the output to TRANSCRIPTS_WITH_EMBEDDINGS_CSV:
poetry run append_embeddings.py
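Both paths come from the .env file, so a missing or misspelled variable is an easy way for this step to fail. As an illustration, the sketch below parses .env-style text with the standard library and checks that both variables are set. It is not how the repository itself loads its configuration (that is not shown here); parse_env and resolve_paths are hypothetical helpers, and the example paths are made up.

```python
REQUIRED_KEYS = ("TRANSCRIPTS_CSV", "TRANSCRIPTS_WITH_EMBEDDINGS_CSV")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_paths(env):
    """Return (input_csv, output_csv) or raise KeyError if one is unset."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing in .env: " + ", ".join(missing))
    return env["TRANSCRIPTS_CSV"], env["TRANSCRIPTS_WITH_EMBEDDINGS_CSV"]


if __name__ == "__main__":
    # Hypothetical .env content; the paths are examples only.
    example = (
        "# dataset locations\n"
        "TRANSCRIPTS_CSV=data/transcripts.csv\n"
        "TRANSCRIPTS_WITH_EMBEDDINGS_CSV=data/transcripts_with_embeddings.csv\n"
    )
    print(resolve_paths(parse_env(example)))
```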
To run benchmarks for all classifiers:
./benchmark.sh
This project is licensed under the PolyForm Perimeter License 1.0.1 - see the LICENSE file for details.
For support, raise an issue in the GitHub issue tracker or contact the maintainers by email.
To compute the values of the people in the interviews, start the Ollama server and then run the script:
ollama serve
python3 values.py
To compute people's demons and saviours, start the Ollama server and then run the script:
ollama serve
python3 saviours_demons.py
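Both scripts above depend on a running Ollama server, and ollama serve keeps running in the foreground, so the scripts are typically started from a second terminal. The sketch below shows one way to confirm the server is reachable before launching them. It assumes the server listens on Ollama's default local address, http://localhost:11434; the helper name ollama_is_running is hypothetical and not part of this repository.

```python
import urllib.request

# ASSUMPTION: the server uses Ollama's default local address.
OLLAMA_URL = "http://localhost:11434"


def ollama_is_running(base_url=OLLAMA_URL, timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    if ollama_is_running():
        print("Ollama server is reachable.")
    else:
        print("Ollama server not reachable; start it with `ollama serve`.")
```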