Vladimir Malinovsky, a member of the Yandex Research team, has developed a service that enables users to run a language model with 8 billion parameters on their computers or smartphones through any web browser. This approach can significantly reduce computational costs for corporations, startups, and researchers, making the deployment and use of LLMs more affordable. The project's source code is publicly available on GitHub.
By leveraging the AQLM neural network compression technology developed by the Yandex Research team in collaboration with ISTA and KAUST universities during the summer of 2024, he successfully offloaded all computations to the user’s device, eliminating the need for expensive, high-performance GPUs.
Users can try out the demo online. When a user accesses the platform, the Llama3.1-8B model is downloaded from the cloud to their device. Compressed to just 2.5 GB, the model is about six times smaller than the original. Once downloaded, it can be used offline without an internet connection. The response speed of the neural network depends on the device's processing power; for instance, on a MacBook Pro M1 it generates about 1.5 tokens per second, or approximately 3–4 characters per second.
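The size and speed figures above can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes the uncompressed model stores each parameter as a 16-bit float, which the release does not state, and it treats 1 GB as 10^9 bytes. The function names are hypothetical.

```python
# Back-of-the-envelope check of the figures quoted in the release.
# Assumption (not stated in the release): the uncompressed model stores
# each parameter as a 16-bit float, and 1 GB = 1e9 bytes.

PARAMS = 8e9               # "8 billion parameters"
BASELINE_BITS = 16         # assumed baseline precision
COMPRESSED_GB = 2.5        # compressed download size
TOKENS_PER_SECOND = 1.5    # MacBook Pro M1 figure


def size_gb(params: float, bits_per_param: float) -> float:
    """Model size in gigabytes for a given average bit width."""
    return params * bits_per_param / 8 / 1e9


def effective_bits(params: float, size_in_gb: float) -> float:
    """Average number of bits stored per parameter."""
    return size_in_gb * 1e9 * 8 / params


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time needed to generate a reply of the given length."""
    return tokens / tokens_per_second


baseline_gb = size_gb(PARAMS, BASELINE_BITS)
ratio = baseline_gb / COMPRESSED_GB
bits = effective_bits(PARAMS, COMPRESSED_GB)
reply_seconds = seconds_for(100, TOKENS_PER_SECOND)

print(f"baseline size:      {baseline_gb:.1f} GB")
print(f"compression ratio:  {ratio:.1f}x")
print(f"average bits/param: {bits:.1f}")
print(f"100-token reply:    {reply_seconds:.0f} s")
```

Under these assumptions the baseline comes to 16 GB, which matches the "about six times smaller" figure, and the compressed model averages roughly 2.5 bits per parameter. At 1.5 tokens per second, a 100-token reply takes a little over a minute.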
The program is written in Rust and compiled to WebAssembly, allowing it to run in any web browser. The model has been compressed using a combination of the AQLM and PV-tuning methods. AQLM dramatically reduces the model's size (by up to eight times) and speeds it up, while PV-tuning corrects errors that arise during compression, keeping the loss in quality to a minimum. As a result, despite the size reduction, Llama3.1-8B retains about 80% of its original response quality.
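To give a sense of how this kind of compression can reach such small sizes, here is a minimal sketch of the decoding side of additive quantization, the general idea behind AQLM: each small group of weights is stored as a few short integer codes, and the weights are rebuilt by summing the codebook entries those codes point to. This is not the project's implementation. The group size, number of codebooks, and codebook size are illustrative assumptions, the random codebooks stand in for learned ones, and the search for good codes and the PV-tuning step are not shown.

```python
import random

# Illustrative parameters (assumptions, not taken from the release).
GROUP_SIZE = 8        # weights reconstructed per group
NUM_CODEBOOKS = 2     # codewords summed for each group
CODEBOOK_SIZE = 256   # entries per codebook, so each code fits in 8 bits


def make_codebooks(seed: int = 0) -> list:
    """Build random stand-in codebooks; a real method would learn them."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 0.02) for _ in range(GROUP_SIZE)]
         for _ in range(CODEBOOK_SIZE)]
        for _ in range(NUM_CODEBOOKS)
    ]


def decode_group(codes: list, codebooks: list) -> list:
    """Reconstruct one weight group as the sum of the selected codewords."""
    group = [0.0] * GROUP_SIZE
    for book, code in zip(codebooks, codes):
        for i, value in enumerate(book[code]):
            group[i] += value
    return group


def bits_per_weight() -> float:
    """Storage cost of the codes alone, ignoring the codebooks themselves."""
    bits_per_code = (CODEBOOK_SIZE - 1).bit_length()
    return NUM_CODEBOOKS * bits_per_code / GROUP_SIZE


codebooks = make_codebooks()
weights = decode_group([3, 200], codebooks)
print(len(weights), "weights from", NUM_CODEBOOKS, "codes")
print(bits_per_weight(), "bits per weight")
```

With these illustrative settings, eight weights are described by two 8-bit codes, or 2 bits per weight before counting the codebooks and any parts of the model kept at higher precision.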
About Yandex Research
Yandex Research is a team focused on exploring fundamental questions in artificial intelligence. Research engineers specialize in natural language processing, computer vision, neural networks, and more. The Yandex Research team develops solutions integrated into the company’s products, bringing tangible benefits to people. Thanks to their work, Yandex has become one of the leading tech companies in scientific publications at NeurIPS, ICML, and other major international machine learning conferences.
Contacts
Yandex Press Office
pr@yandex-team.com