OriNachum/autonomous-intelligence

Tau - The Autonomous, Understanding robot

This is Tau!
Tau is inspired by Pi.AI, and if you haven't tried Pi yet, I strongly encourage you to try it.
Like Pi, Tau holds one continual conversation, unlike chat-based bots, which feature many conversations and threads.
This is by design: Tau has a single conversation, like speaking to a human.
This is also reflected in development, where I consult Tau on decisions such as the order of features and the voice type.

Tau is a personal fun project.
I released it as open source for anyone to experiment with (fork), or just follow. (A star is appreciated!)
If you fork, delete the history and facts to reset Tau's knowledge and embark on the journey anew!

Update status

  • System Prompt: Speech-actions speak conversation structure.
  • Conversation loop: A continuous conversation with ongoing context.
  • Immediate memory: Reduce context by summarizing it to key points. Inject memory to System prompt.
  • Long term memory: Save the running memory to vector database.
  • Speech: Voice based conversation with hearing and speaking. (Whisper and OpenAI TTS)
  • Vision infra: Set up Hailo-8L as an internal vision webservice.
    • Setup Hailo-8L on Raspberry Pi, validate examples work.
    • Look for best practices and options for integrating Hailo in your application.
    • Find a suitable, working architecture to wrap hailo as a service
    • Implement and improve the wrapper
    • Pending Hailo review (update: will be integrated as community-examples, confirmed by Hailo)
    • Integrate in the system, allow Tau to recognize faces
    • Add more than one model to be used serially, or use different devices (Coral, Sony AI Camera x2, Jetson)
  • Long term fetching: Pull from long term memory into context.
  • Auto-start on device startup.
  • Long term memory archiving support.
  • Entity based memory: Add GraphRAG based memory.
    • Learn about GraphRAG, how to implement, etc.
    • Use or implement GraphRAG
  • Design a further split into applications and event communication
  • Setup Nvidia Jetson Orin Nano Super 8GB
    • Local LLM on Jetson
      • Ollama (Llama 3.2 3B)
      • Move to use jetson-containers
      • TensorRT
      • Transformers
    • Local Speech to text (faster-whisper) on Jetson
      • WebRTC VAD
      • Silero VAD
    • Implement Text to speech
      • piperTTS
      • kokoroTTS
      • israwave
  • Write a setup guide for Nvidia Jetson Orin Nano Super 8GB
  • Build every component as a single event-based app
    • Communication infra with websocket or unix domain socket (Global)
    • Configuration infra, local configuration per device (Global)
    • Detect the main component and connect the secondary device to the main device (Global)
    • LLM as a service (Jetson)
    • Speech detection as a service (Jetson)
    • Speech as a service (Jetson)
    • Memory as a service (Jetson)
    • Vision as a service (Raspberry Pi)
    • Face as a service (Raspberry Pi)
    • Main loop (Jetson)
  • Integrate Nvidia Jetson Orin Nano Super 8GB
  • Integrate Hailo 10 as inference station (Llama 3.2 3b)
  • Advanced voice: Move to ElevenLabs advanced voices.
  • Tool use
    • Add framework for actions:
    • Open live camera feed action
    • Snap a picture
  • Add AEC (acoustic echo cancellation) for voice recognition from https://gist.github.com/thewh1teagle/929af1c6b05d5f96ceef01130e758471
  • Introspection: Add Introspection agent for active and background thinking and processing.
  • Growth: Add nightly finetuning, move to smaller model.
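
As an illustration of the "Immediate memory" item above (reduce context by summarizing it to key points, then inject the memory into the system prompt), here is a minimal sketch. It is not Tau's actual implementation: the function names, the `keep_last` parameter, and the placeholder summarizer are all hypothetical. In practice the summarizer would presumably be an LLM call.

```python
from typing import Callable, List, Tuple


def compact_history(
    turns: List[str],
    summarize: Callable[[List[str]], str],
    keep_last: int = 4,
) -> Tuple[str, List[str]]:
    """Return (summary of older turns, recent turns kept verbatim).

    Hypothetical helper: `keep_last` controls how many of the most recent
    turns stay in the context unchanged.
    """
    split = max(len(turns) - keep_last, 0)
    older, recent = turns[:split], turns[split:]
    summary = summarize(older) if older else ""
    return summary, recent


def build_system_prompt(base_prompt: str, memory: str) -> str:
    """Inject the running memory into the system prompt (assumed layout)."""
    if not memory:
        return base_prompt
    return f"{base_prompt}\n\nMemory (key points so far):\n{memory}"


def naive_summarize(turns: List[str]) -> str:
    """Placeholder summarizer: keeps the first sentence of each turn.

    A real system would likely ask a language model for key points instead.
    """
    return "\n".join(f"- {turn.split('.')[0].strip()}" for turn in turns)
```

For example, `compact_history(history, naive_summarize, keep_last=2)` returns a bullet summary of everything except the last two turns, which can then be passed to `build_system_prompt`.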

Prerequisites

Tau should be able to run on any Linux machine with internet access, but was tested only on a Raspberry Pi 5 8GB with the official 64-bit OS.
The Raspberry Pi AI Kit is needed for vision (it can be disabled in code; configuration support is available on request or in the future).

Keys

All needed keys are in .env_sample.
Copy it to .env and add your keys.
Currently, the main key is OpenAI (Chat, Speech, Whisper), while VoyageAI + Pinecone are used for the vector database.

I plan on moving back to Anthropic (3.5 Sonnet only).

Groq was used for a fast action-understanding use case.

Installation

  1. Cloning Git repositories
1.1. Clone this repository to your Raspberry Pi:

git clone https://github.com/OriNachum/autonomous-intelligence.git

1.2. Clone the forked Hailo examples repository to your Raspberry Pi:

git clone https://github.com/OriNachum/hailo-rpi5-examples.git

I have a pending PR to integrate this into the main repo.

https://github.com/hailo-ai/hailo-rpi5-examples/pull/50

If you clone it, set up your machine for the Hailo-8L chip per Hailo's instructions.

  2. Copy .env_sample to .env and add all keys:
  • ANTHROPIC_API_KEY: used for Claude based text completion and vision. Currently unused.
  • OPENAI_API_KEY: Used for Speech, Whisper, vision and text.
  • GROQ_API_KEY: Used for super-quick action understanding. May be replaced with embeddings.
  • VOYAGE_API_KEY: VoyageAI is recommended by Anthropic. They offered the best embeddings at the time I selected them, and offer a great option for innovators.
  • PINECONE_API_KEY: API key for Pinecone. Serverless is a great option.
  • PINECONE_DIMENSION: Dimension of the embeddings generated by Voyage. Used for the setup of Pinecone
  • PINECONE_INDEX_NAME: Name of the index in Pinecone, for memory
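
Before starting Tau, it can help to check that .env actually contains the keys listed above. The following is a minimal sketch of such a check, not part of the repository. The helper names are hypothetical, and treating every key except ANTHROPIC_API_KEY (described above as currently unused) as required is an assumption. It parses simple KEY=VALUE lines by hand to avoid any extra dependency.

```python
from typing import Dict, List, Mapping, Sequence

# Assumption: all keys except ANTHROPIC_API_KEY are needed at runtime.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "GROQ_API_KEY",
    "VOYAGE_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_DIMENSION",
    "PINECONE_INDEX_NAME",
]


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_KEYS
) -> List[str]:
    """Return required keys that are absent or empty, in declaration order."""
    return [key for key in required if not env.get(key)]


def pinecone_dimension(env: Mapping[str, str]) -> int:
    """PINECONE_DIMENSION must be an integer matching the embedding size."""
    return int(env["PINECONE_DIMENSION"])
```

A typical use would be `missing_keys(parse_env_text(open(".env").read()))`, which returns an empty list when every required key has a value.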

Usage

There are five programs to run, in this order, from the two repositories:

  1. hailo-rpi5-examples:
    • basic-pipelines/detection_service.py: this runs the camera and emits events when detections change
  2. autonomous-intelligence:
    • services/face_service.py: this starts the face app, and reacts when speech occurs
    • tau.py: this is the main LLM conversation loop
    • tau_speech.py: this consumes speech events, and produces actual speech
    • services/microphone_listener.py: this listens to your speech and emits events to tau.py as input
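
The start order above could be scripted along these lines. This is only a sketch under several assumptions that the README does not confirm: both repositories are cloned side by side under one base directory, each script is started with the plain Python interpreter and no arguments, and a short fixed delay between launches is enough for each service to come up. The names `STARTUP_ORDER`, `build_commands`, and `launch_all` are hypothetical.

```python
import subprocess
import sys
import time
from pathlib import Path
from typing import List, Tuple

# (repository directory, script path inside it), in the order listed above.
STARTUP_ORDER: List[Tuple[str, str]] = [
    ("hailo-rpi5-examples", "basic-pipelines/detection_service.py"),
    ("autonomous-intelligence", "services/face_service.py"),
    ("autonomous-intelligence", "tau.py"),
    ("autonomous-intelligence", "tau_speech.py"),
    ("autonomous-intelligence", "services/microphone_listener.py"),
]


def build_commands(
    base_dir: Path, python: str = sys.executable
) -> List[Tuple[Path, List[str]]]:
    """Return (working directory, argv) pairs in start order."""
    return [(base_dir / repo, [python, script]) for repo, script in STARTUP_ORDER]


def launch_all(base_dir: Path, delay_seconds: float = 2.0) -> List[subprocess.Popen]:
    """Start each program in order, pausing briefly between launches.

    Assumption: a fixed delay is sufficient; the real services may need
    a proper readiness check instead.
    """
    processes = []
    for cwd, argv in build_commands(base_dir):
        processes.append(subprocess.Popen(argv, cwd=cwd))
        time.sleep(delay_seconds)
    return processes
```

Calling `launch_all(Path.home())` would start all five programs if both repositories live in the home directory; running each program in its own terminal remains the simpler option while debugging.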

Acknowledgements

There are several people I want to acknowledge for their part in this development.
Of them, these are the people who confirmed that I may mention them:

  • @Sagigamil

License

This project is licensed under the MIT License.
