Useful information
Prime News delivers timely, accurate news and insights on global events, politics, business, and technology
Hugging Face has achieved a remarkable advance in AI, presenting vision-language models that run on devices as small as smartphones while outperforming predecessors that require massive data centers.
The company's new SmolVLM-256M model, which requires less than a gigabyte of GPU memory, surpasses the performance of its Idefics 80B model from just 17 months ago, a system 300 times larger. This drastic reduction in size and improvement in capability marks a decisive moment for practical AI deployment.
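As a quick sanity check on the headline figure, the parameter counts implied by the model names alone (256M vs. 80B) yield roughly the 300-fold ratio the article cites:

```python
# Back-of-the-envelope check of the "300 times larger" claim,
# using only the parameter counts implied by the model names.
smolvlm_params = 256e6   # SmolVLM-256M
idefics_params = 80e9    # Idefics 80B

ratio = idefics_params / smolvlm_params
print(f"Idefics 80B is ~{ratio:.0f}x larger than SmolVLM-256M")
# ~312x, i.e. roughly the 300-fold difference cited
```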
"When we released Idefics 80B in August 2023, we were the first company to open-source a video language model," said Andrés Marafioti, a machine learning research engineer at Hugging Face, in an exclusive interview with VentureBeat. "By achieving a 300x size reduction while improving performance, SmolVLM marks a breakthrough in vision-language models."
The advance arrives at a crucial moment for companies struggling with the astronomical computing costs of deploying AI systems. The new SmolVLM models, available in 256M and 500M parameter sizes, process images and understand visual content at speeds previously unattainable in their size class.
The smallest version processes 16 examples per second using only 15 GB of RAM with a batch size of 64, which makes it particularly attractive for companies that need to process large volumes of visual data. "For a mid-sized company processing 1 million images per month, this translates into substantial annual savings in compute costs," Marafioti told VentureBeat. "The reduced memory footprint means companies can deploy on cheaper cloud instances, cutting infrastructure costs."
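Taking the throughput figure quoted above (16 examples per second) at face value, a rough estimate of the wall-clock compute time for the 1-million-image monthly workload Marafioti mentions, ignoring I/O and batching overhead, looks like this:

```python
# Rough wall-clock estimate for the workload mentioned in the article,
# assuming the quoted 16 images/sec throughput is sustained end to end
# (real pipelines also pay for I/O, decoding, and batching overhead).
images_per_month = 1_000_000
throughput = 16  # images per second, per the article

seconds = images_per_month / throughput
hours = seconds / 3600
print(f"~{hours:.1f} hours of compute per month")
# ~17.4 hours on a single instance at this rate
```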
The development has already caught the attention of major technology players. IBM has partnered with Hugging Face to integrate the 256M model into Docling, its document processing software. "While IBM certainly has access to substantial compute resources, using smaller models like these lets them efficiently process millions of documents at a fraction of the cost," Marafioti said.
The efficiency gains come from technical innovations in both the vision processing and language components. The team switched from a 400M parameter vision encoder to a 93M parameter version and implemented more aggressive token compression techniques. These changes maintain high performance while drastically reducing computational requirements.
For startups and smaller companies, these advances could be transformative. "Startups can now launch sophisticated computer vision products in weeks instead of months, with infrastructure costs that would have been prohibitive just a few months ago," Marafioti said.
The impact extends beyond cost savings to enabling entirely new applications. The models power advanced document search capabilities through ColPali, an algorithm that creates searchable databases from document files. "They achieve performance very close to that of models 10 times their size while significantly increasing the speed at which the database is created and searched," Marafioti explained.
The advance challenges conventional wisdom about the relationship between model size and capability. While many researchers have assumed that larger models are needed for advanced vision-language tasks, SmolVLM demonstrates that smaller, efficient architectures can achieve similar results. The 500M parameter version achieves 90% of the performance of its 2.2B parameter sibling on key benchmarks.
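Put in per-parameter terms, and again taking the article's figures at face value, the 500M model retains 90% of the larger sibling's benchmark performance with less than a quarter of the parameters:

```python
# Illustrative size-vs-performance comparison using the article's figures.
small_params = 0.5e9   # 500M parameter version
large_params = 2.2e9   # 2.2B parameter sibling
performance_retained = 0.90  # 500M scores 90% of the 2.2B model

size_ratio = large_params / small_params
print(f"The 2.2B model is {size_ratio:.1f}x larger,")          # 4.4x
print(f"yet the 500M model keeps {performance_retained:.0%} "
      f"of its benchmark performance")
```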
Rather than suggesting an efficiency plateau, Marafioti sees these results as evidence of untapped potential: "Until today, the standard was to release VLMs starting at 2B parameters; we thought smaller models weren't useful. We are demonstrating that, in fact, models one-tenth the size can be extremely useful for businesses."
This development comes amid growing concerns about AI's environmental impact and computing costs. By drastically reducing the resources needed for vision-language AI, Hugging Face's innovation could help address both problems while making advanced AI capabilities accessible to a broader range of organizations.
The models are available open source, continuing Hugging Face's tradition of broadening access to AI technology. This accessibility, combined with the models' efficiency, could accelerate the adoption of vision-language AI across industries from healthcare to retail, where processing costs have previously been prohibitive.
In a field where bigger has long meant better, Hugging Face's achievement points toward a future of efficient models that run directly on our devices. As the industry grapples with questions of scale and sustainability, these smaller models could represent its biggest advance yet.