I have a question about the https://replicate.com/meta/llama-2-70b-chat deployment. I read the code in models/llama-2-70b-chat/config.py, and it appears to use exllama. Which model weights does it use? Is it [TheBloke/Llama-2-70B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-70B-Chat-GPTQ)?