In short
- Multiverse Computing's CompactifAI technology reportedly cuts parameter counts by 70% and model memory by 93% while retaining 97-98% accuracy.
- The company just closed a $215 million Series B round backed by Bullhound Capital, HP Tech Ventures, and Toshiba.
- The method uses tensor networks from quantum physics to compress models and then "heals" them with brief retraining, a process the company claims runs 50% faster than training the original models.
A Spanish AI startup has just convinced investors to hand over $215 million on the strength of a bold claim: it can shrink large language models by up to 95% without compromising their performance.
Multiverse Computing's innovation hinges on its CompactifAI technology, a compression method that borrows mathematical concepts from quantum physics to shrink AI models down to smartphone size.
The San Sebastián-based company says its compressed Llama-2 7B model runs 25% faster at inference while using 70% fewer parameters, with accuracy dropping by only 2-3%.
If validated at scale, this could tackle AI's elephant-sized problem: models so enormous that they require specialized data centers just to run.
"For the first time in history, we are able to profile the inner workings of a neural network to eliminate billions of spurious correlations to truly optimize all sorts of AI models," said Román Orús, Chief Scientific Officer of Multiverse, in a blog post on Thursday.
Bullhound Capital led the $215 million Series B round with support from HP Tech Ventures and Toshiba.
The physics behind the compression
Applying quantum-inspired concepts to tackle one of AI's most pressing problems may sound improbable, but the research holds up; it's real.
Unlike traditional compression, which simply cuts neurons or reduces numerical precision, CompactifAI uses tensor networks: mathematical structures that physicists developed to track particle interactions without drowning in data.
The process works like origami for AI models: weight matrices are folded into smaller, interconnected structures called matrix product operators (MPOs).
Instead of storing every connection between neurons, the system retains only the meaningful correlations and discards redundant patterns, such as information or relationships that repeat over and over.
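Multiverse's actual MPO decomposition isn't published in this article, but the underlying idea of keeping strong correlations and dropping weak ones can be sketched with a plain truncated SVD on a toy weight matrix (all sizes here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "weight matrix" with mostly low-rank structure plus small noise,
# standing in for a real LLM layer (hypothetical sizes, not Llama-2's).
d_in, d_out, true_rank = 512, 512, 16
W = rng.standard_normal((d_in, true_rank)) @ rng.standard_normal((true_rank, d_out))
W += 0.01 * rng.standard_normal((d_in, d_out))

# Truncated SVD: keep only the strongest correlations, discard the rest.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 16                        # retained rank; a real system picks this per layer
A = U[:, :k] * s[:k]          # shape (d_in, k)
B = Vt[:k, :]                 # shape (k, d_out)

original_params = W.size
compressed_params = A.size + B.size
reduction = 1 - compressed_params / original_params
error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

print(f"parameter reduction: {reduction:.0%}")   # 94% for these toy sizes
print(f"relative error:      {error:.4f}")
```

Storing the two thin factors instead of the full matrix is what delivers the parameter savings; an MPO generalizes this by chaining several such factorizations.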
Multiverse discovered that AI models are not uniformly compressible. Early layers turn out to be fragile, while deeper layers, which prove less critical to performance, can withstand aggressive compression.
This selective approach lets them achieve dramatic size reductions where other methods fail.
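The article doesn't disclose Multiverse's actual per-layer schedule; a minimal sketch of the selective idea, with entirely made-up numbers, could look like this:

```python
# Hypothetical depth-dependent compression schedule: keep more rank in
# fragile early layers, compress deeper layers more aggressively.
def retained_rank(layer_idx: int, n_layers: int,
                  full_rank: int = 4096, min_rank: int = 64) -> int:
    depth = layer_idx / max(n_layers - 1, 1)   # 0.0 (first) .. 1.0 (last)
    keep = 1.0 - 0.9 * depth                   # early layers keep ~100% of rank
    return max(min_rank, int(full_rank * keep))

n_layers = 32  # e.g. a 7B-class transformer; illustrative only
schedule = [retained_rank(i, n_layers) for i in range(n_layers)]
print(schedule[0], schedule[-1])  # first layer keeps far more rank than the last
```

The real system presumably measures each layer's sensitivity empirically rather than using a fixed curve like this.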
After compression, models briefly undergo "healing": retraining that takes less than one epoch thanks to the reduced parameter count. The company claims this restoration process runs 50% faster than training the original models, owing to reduced GPU-CPU transfer loads.
Long story short, per the company's claims: you start with a model, run the CompactifAI magic, and end up with a compressed version that has less than 50% of the original parameters, can run at twice the inference speed, costs much less, and is nearly as capable as the original.
In its research, the team shows it can reduce the Llama-2 7B model's memory needs by 93%, cut its parameter count by 70%, speed up training by 50%, and accelerate inference by 25%, while losing only 2-3% accuracy.
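A back-of-the-envelope check puts the claimed 93% memory reduction in concrete terms, assuming 16-bit (2-byte) weights; the paper's exact storage format may differ:

```python
# Rough arithmetic on the claimed 93% memory reduction for a 7B model,
# assuming fp16/bf16 (2 bytes per parameter) as a baseline.
params = 7_000_000_000
bytes_per_param = 2
original_gb = params * bytes_per_param / 1e9
compressed_gb = original_gb * (1 - 0.93)

print(f"original:   {original_gb:.1f} GB")    # 14.0 GB
print(f"compressed: {compressed_gb:.2f} GB")  # 0.98 GB
```

Under that assumption the model drops from roughly 14 GB to about 1 GB, which is indeed smartphone territory.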
Traditional shrinking methods such as quantization (reducing numerical precision, like using fewer decimal places), pruning (cutting out less important neurons, like trimming dead branches from a tree), and distillation (training a smaller model to mimic a larger one's behavior) don't come close to these numbers.
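For comparison, the "fewer decimal places" idea behind quantization can be shown in a few lines; this is a minimal symmetric int8 sketch, whereas production quantizers use per-channel scales and calibration:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(1000).astype(np.float32)  # toy float32 weights

scale = np.abs(w).max() / 127.0                   # map the largest weight to ±127
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_restored = q.astype(np.float32) * scale         # dequantize for use

memory_saving = 1 - q.nbytes / w.nbytes           # int8 vs float32
max_error = np.abs(w - w_restored).max()          # bounded by scale / 2

print(f"memory saving: {memory_saving:.0%}")      # 75%
print(f"max round-off: {max_error:.4f}")
```

Quantization alone tops out at a fixed 75% saving when going from 32-bit to 8-bit, which is one reason it can't match the figures claimed above on its own.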
Multiverse already serves more than 100 customers, including Bosch and the Bank of Canada, applying its quantum-inspired algorithms beyond AI to energy optimization and financial modeling.
The Spanish government co-invested €67 million in March, pushing total funding past $250 million.
Currently offering compressed versions of open-source models such as Llama and Mistral via AWS, the company plans to expand to DeepSeek R1 and other reasoning models.
Proprietary systems from OpenAI or Anthropic's Claude remain off-limits, of course, because they aren't available for tinkering or study.
The technology's promise extends beyond cost savings. HP Tech Ventures' involvement signals interest in edge AI deployment: running advanced models locally instead of on cloud servers.
"Multiverse's innovative approach has the potential to bring AI benefits of enhanced performance, personalization, privacy, and cost efficiency to companies of any size," said Tuan Tran, President of Technology and Innovation at HP.
So if you ever find yourself running DeepSeek R1 on your smartphone, these folks may be the ones to thank.
Published by Josh Quittner and Sebastian Sinclair