Nous Research trains an artificial intelligence model using machines distributed over the Internet

A team of artificial intelligence researchers known as Nous Research is currently doing something unique in the fast-moving space of generative AI (at least as far as I know): Nous is in the middle of pre-training a new 15-billion-parameter large language model (LLM) using machines distributed across the internet and around the world, avoiding the need to concentrate model development, as has traditionally been the case, in expensive, power-hungry AI data centers and "superclusters" of graphics processing units (GPUs), such as the one recently completed by Elon Musk's xAI in Memphis, Tennessee.

In addition, Nous is live-streaming the pre-training process on a dedicated website, distro.nousresearch.com, showing how the model is performing on evaluation benchmarks as training progresses, along with a simple map of the locations of the training hardware behind the run, including several sites in the US and Europe.

As of this writing, roughly 57 hours (2.3 days) remained in the pre-training run, with more than 75% of the process complete.

Pre-training is the first of two phases of LLM training, and arguably the most fundamental, as it involves training the model on a massive corpus of text data to learn the statistical properties and structures of language. The model processes large text datasets, capturing patterns, grammar, and contextual relationships between words. This phase equips the model with a broad understanding of language, enabling it to produce coherent text and perform a variety of language-related tasks.
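As a toy illustration of the next-token-prediction objective that pre-training optimizes, the sketch below fits a bigram count model to a tiny made-up corpus and measures its cross-entropy loss. Real LLMs learn the same kind of conditional next-token distribution, but with neural networks and gradient descent at vastly larger scale; everything here is a simplification for intuition only.

```python
from collections import Counter, defaultdict
import math

# Toy illustration of what pre-training optimizes: the model learns
# statistical patterns (here, bigram counts) from a text corpus and is
# scored by how well it predicts each next token (cross-entropy loss).
corpus = "the cat sat on the mat the cat ran".split()

# Count bigram transitions to estimate P(next_token | current_token).
transitions = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    transitions[cur][nxt] += 1

def next_token_prob(cur, nxt):
    """Empirical probability of `nxt` following `cur` in the corpus."""
    counts = transitions[cur]
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0

# Average negative log-likelihood over the corpus: the quantity a real
# LLM's pre-training loop drives down with gradient descent.
nll = [-math.log(next_token_prob(c, n)) for c, n in zip(corpus, corpus[1:])]
loss = sum(nll) / len(nll)
print(f"cross-entropy loss: {loss:.3f}")
```

A lower loss means the model assigns higher probability to the tokens that actually occur, which is exactly the signal tracked on the loss curves Nous is streaming.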

After pre-training, the model is fine-tuned on a more specific dataset tailored to particular tasks or domains.

If successful, Nous will have demonstrated that it is possible to train frontier-class LLMs without expensive superclusters or low-latency interconnects, using a novel, open-source training method. It could usher in a new era of distributed AI training as a major, or even dominant, source of new AI models, shifting the balance of power in AI away from well-funded big tech companies and toward smaller groups and non-corporate actors.

DisTrO: the technology behind the training run

Nous, which made headlines earlier this year for the release of its permissive and existentially conflicted Hermes 3 variant of Meta's Llama 3.1, and for its overall mission to make AI development personal and unrestricted, is using its open-source distributed training technology called Nous DisTrO (Distributed Training Over-the-Internet), which Nous first described in a research paper in August 2024.

According to a recent Nous Research publication, DisTrO reduces the bandwidth requirements of inter-GPU communication by up to 10,000x during pre-training. This innovation allows models to be trained over slower, more widely accessible internet connections, potentially as modest as 100 Mbps download and 10 Mbps upload, while maintaining competitive convergence rates and loss curves.

The core of DisTrO lies in its ability to efficiently compress data exchanged between GPUs without sacrificing model performance.
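To make the idea of compressing inter-GPU traffic concrete, the sketch below uses top-k gradient sparsification, a classic generic compression trick in which each worker transmits only the largest-magnitude gradient entries plus their indices. This is an illustration only; DisTrO's actual compression scheme is different and more sophisticated, and nothing here reflects Nous's implementation.

```python
import random

# Generic gradient-compression sketch (NOT the DisTrO algorithm):
# send only the top-k largest-magnitude entries of a gradient vector,
# with their indices, instead of the full dense vector.

def compress_topk(grad, k):
    """Return (indices, values) for the k largest-magnitude entries."""
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return idx, [grad[i] for i in idx]

def decompress(idx, vals, n):
    """Rebuild a dense gradient: zeros everywhere except the sent entries."""
    dense = [0.0] * n
    for i, v in zip(idx, vals):
        dense[i] = v
    return dense

random.seed(0)
grad = [random.gauss(0, 1) for _ in range(1000)]  # stand-in gradient
idx, vals = compress_topk(grad, k=10)             # ship 10 of 1000 entries
restored = decompress(idx, vals, len(grad))       # ~100x less data on the wire
```

The trade-off is fidelity: the dropped entries are approximated as zero, so practical schemes pair compression with techniques (such as error feedback or smarter transforms) that keep training convergence intact.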

As described in an August 2024 VentureBeat article, this method reduced communication requirements from 74.4 gigabytes to just 86.8 megabytes during a test with the Llama 2 architecture, an efficiency increase of nearly 857x. This dramatic improvement paves the way for a new era of decentralized, collaborative AI research.
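The reported figures can be sanity-checked with quick arithmetic (decimal gigabytes and megabytes assumed, since the article does not specify), which also shows what the compression means on the 100 Mbps connections mentioned above:

```python
# Sanity-check of the reported Llama 2 test figures:
# 74.4 GB of communication reduced to 86.8 MB (decimal units assumed).
full_bytes = 74.4e9
compressed_bytes = 86.8e6

ratio = full_bytes / compressed_bytes
print(f"~{ratio:.0f}x reduction")  # roughly 857x, matching the article

# Time to move each payload over a 100 Mbps (100e6 bits/s) link:
seconds_compressed = compressed_bytes * 8 / 100e6  # about 7 seconds
seconds_full = full_bytes * 8 / 100e6              # about 99 minutes
```

At consumer bandwidths, the difference between minutes-long and seconds-long exchanges per communication round is what makes training over the open internet plausible at all.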

DisTrO builds on earlier work on Decoupled Momentum Optimization (DeMo), an algorithm designed to reduce communication between GPUs by several orders of magnitude while maintaining training performance comparable to traditional methods.

Both the DeMo algorithm and the DisTrO stack are part of Nous Research’s ongoing mission to decentralize AI capabilities and bring advanced AI development to a wider audience.

The team also made the DeMo algorithm open source on GitHub and invited researchers and developers around the world to experiment and build on their findings.

Hardware partners

Pre-training of Nous Research's 15-billion-parameter language model drew on contributions from several notable partners, including Oracle, Lambda Labs, Northern Data Group, Crusoe Cloud, and Andromeda Cluster.

Together, they provided the heterogeneous hardware necessary to test DisTrO’s capabilities in a real distributed environment.

Profound implications for future AI model development

DisTrO’s implications go beyond technical innovation. By reducing reliance on centralized data centers and specialized infrastructure, DisTrO offers a path to a more inclusive and collaborative AI research ecosystem.

Smaller institutions, independent researchers, and even hobbyists with internet access and consumer-grade GPUs can potentially train large models—a feat previously reserved for companies with significant capital and expertise.

Diederik P. Kingma, co-author of the research paper and co-inventor of the Adam optimizer, has joined Nous Research as a collaborator on the development of DeMo and DisTrO. Kingma's contributions, along with those of Nous Research co-founders Bowen Peng and Jeffrey Quesnelle, lend credibility to the project and signal its potential impact on the broader AI community.

Next steps

Nous Research has opened the door to a future where AI development is no longer dominated by a handful of corporations. Their work on DisTrO shows that with the right optimizations, large-scale artificial intelligence models can be efficiently trained in a decentralized manner.

While the current demonstration used high-end GPUs such as the Nvidia H100, the scalability of DisTrO to less specialized hardware remains an area for further investigation.

As Nous Research continues to refine its methods, the technology’s potential applications—from decentralized federated learning to training diffusion models for image generation—could redefine the boundaries of AI innovation.
