Simply make AI models faster, cheaper, smaller, greener!

Twitter · GitHub · LinkedIn · Discord

Pruna AI makes any AI model faster, cheaper, smaller, and greener on any hardware, in one line of code. It covers computer vision, NLP, audio, and graph models, for both predictive and generative AI.

We provide two packages. You can read their documentation to learn more here.

  • pruna: A package to smash your AI model, making it more efficient without losing quality. All you need is a call to pruna.smash(). If you want to compress models on your side, you can request access here.
  • pruna_engine: A package to run your smashed AI model more efficiently without changing your pipeline. If you want to run models publicly shared on HuggingFace, you can install it from here.
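The smashing workflow above can be sketched as follows. This is a minimal illustration, not an official snippet: the model choice and the exact SmashConfig options are assumptions, so check the pruna documentation for the current API. It requires `pip install pruna`.

```python
# Minimal sketch of the one-line "smash" workflow described above.
# The model and config details are illustrative assumptions; see the
# pruna documentation for the exact API. Requires: pip install pruna

def smash_resnet():
    # Imports live inside the function so this sketch stays importable
    # even on machines without pruna or torchvision installed.
    import torchvision.models as models
    from pruna import SmashConfig, smash

    model = models.resnet18(weights=None)   # any PyTorch model should work

    config = SmashConfig()                  # default compression settings
    # Optionally select specific algorithms, e.g. (assumed config key):
    # config["quantizer"] = "half"

    # smash() returns a compressed model intended as a drop-in replacement
    smashed_model = smash(model=model, smash_config=config)
    return smashed_model
```

The returned model keeps the original inference interface, so downstream pipeline code should not need to change.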

Popular repositories

  1. stable-fast

    Forked from chengzeyi/stable-fast

    Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

    Python

  2. fast-hadamard-transform

    Forked from Dao-AILab/fast-hadamard-transform

    Fast Hadamard transform in CUDA, with a PyTorch interface

    C

  3. .github

  4. flute

    Forked from HanGuo97/flute

    Fast Matrix Multiplications for Lookup Table-Quantized LLMs

    Cuda

  5. tritonserver

    This repository describes how to use pruna with NVIDIA Triton Inference Server.

    Python

  6. replicate-example

    Python


