LPU-accelerated inference that serves Llama-70B at 500 tokens/sec. The choice when latency matters more than absolute capability.
From Wikipedia
Groq, Inc. is an American artificial intelligence (AI) company that builds an AI accelerator application-specific integrated circuit (ASIC). The architecture was originally introduced as a Tensor Streaming Processor (TSP) but was later rebranded as a Language Processing Unit (LPU) following the widespread adoption of large language models after the breakthrough of ChatGPT. The company also develops related computer hardware and software to accelerate AI inference performance.