MarketAlert – Real-Time Market & Crypto News, Analysis & Alerts
© Market Alert News. All Rights Reserved.
Xiaomi announces Xiaomi-Robotics-0, its first-generation robot large-scale model – Gizmochina

Last updated: February 12, 2026 10:55 am
Published: 13 hours ago

Xiaomi is best known for smartphones, smart home gear, and the occasional electric vehicle update. Now it wants a place in robotics research too.

The company has announced Xiaomi-Robotics-0, an open-source vision-language-action (VLA) model with 4.7 billion parameters. It’s designed to combine visual understanding, language comprehension, and real-time action execution, which Xiaomi says are the core of “physical intelligence.” And according to the company, it’s already setting multiple state-of-the-art records in both simulations and real-world tests.

At a high level, robotics models like this solve a closed loop: perception, decision, and execution. A robot needs to see the world, understand what it’s being asked to do, decide on a plan, and then carry it out smoothly. Xiaomi says Robotics-0 was built specifically to balance broad understanding with fine motor control.
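The perception-decision-execution loop described above can be sketched as plain code. This is a toy illustration of the control-loop concept only; every function name here is a hypothetical stand-in, not Xiaomi's actual API.

```python
# Toy sketch of a perception -> decision -> execution closed loop.
# All names are illustrative assumptions, not Xiaomi's real interfaces.

def perceive(world_state):
    # A real robot would process camera frames; here we read a dict.
    return {"objects": world_state["objects"]}

def decide(observation, instruction):
    # Pick the first object mentioned in the instruction.
    for obj in observation["objects"]:
        if obj in instruction:
            return ("grasp", obj)
    return ("wait", None)

def execute(plan, world_state):
    action, target = plan
    if action == "grasp":
        world_state["held"] = target
    return world_state

def control_step(world_state, instruction):
    obs = perceive(world_state)
    plan = decide(obs, instruction)
    return execute(plan, world_state)

state = control_step({"objects": ["towel", "block"], "held": None},
                     "Please fold the towel")
```

Each cycle closes the loop: the robot's next perception reflects the effect of its last execution.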

To do that, the model uses what’s known as a Mixture-of-Transformers (MoT) architecture. It splits responsibilities between two main components.

The first is a Visual Language Model (VLM), which acts as the “brain.” It’s trained to interpret human instructions — including vague ones like “Please fold the towel” — and understand spatial relationships from high-resolution visual input. This part handles object detection, visual question answering, and logical reasoning.

The second component is what Xiaomi calls the Action Expert. This is built around a multi-layer Diffusion Transformer (DiT). Instead of producing a single action at a time, it generates an “Action Chunk” — a sequence of movements — using flow-matching techniques to keep motion accurate and smooth.
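The two-component split can be pictured as a tiny numeric sketch: one module encodes the instruction into conditioning features, and a second module maps those features to a whole chunk of actions at once. Shapes, names, and the linear maps are illustrative assumptions, not the real 4.7B-parameter model.

```python
import numpy as np

# Minimal sketch of the VLM / Action Expert split, under assumed toy
# shapes. Real components are large transformers; these are stand-ins.
rng = np.random.default_rng(0)

def vlm_encode(instruction_tokens, d=16):
    # Stand-in "VLM": average of random token embeddings.
    table = rng.normal(size=(100, d))
    return table[instruction_tokens].mean(axis=0)

def action_expert(cond, horizon=8, action_dim=7):
    # Stand-in "DiT": one linear map from the conditioning vector to a
    # whole action chunk of shape (horizon, action_dim).
    W = rng.normal(size=(cond.shape[0], horizon * action_dim)) * 0.1
    return (cond @ W).reshape(horizon, action_dim)

chunk = action_expert(vlm_encode([3, 14, 15]))
```

The point of chunked output is that the expert commits to several future steps per inference call, rather than one action per call.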

One common issue with VLA models is that when they learn to perform physical actions, they tend to lose some of their original understanding capabilities. Xiaomi says it avoided that by co-training the model on both multimodal data and action data. The result, at least in theory, is a system that can still reason about the world while learning how to move within it.
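Co-training on both data types amounts, at its simplest, to optimizing a mixed objective. The weighting below is an assumption for illustration; Xiaomi has not disclosed its exact loss formulation.

```python
# Hedged sketch of co-training: one scalar objective mixing an
# "understanding" loss and an "action" loss, so learning to act does
# not erase visual-language capability. alpha is an assumed weight.

def cotrain_loss(vqa_loss, action_loss, alpha=0.5):
    return alpha * vqa_loss + (1 - alpha) * action_loss

total = cotrain_loss(vqa_loss=0.8, action_loss=0.4)
```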

The training process happens in stages. First, an “Action Proposal” mechanism forces the VLM to predict possible action distributions while interpreting images. This aligns its internal representation of what it sees with how actions are performed. After that, the VLM is frozen, and the DiT is trained separately to generate accurate action sequences from noise, relying on key-value features rather than discrete language tokens.
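The second stage, generating action sequences from noise, can be sketched with the standard flow-matching setup: a straight-line path between noise and data, whose constant velocity is the regression target. Assuming this textbook formulation (Xiaomi's exact recipe is not public):

```python
import numpy as np

# Standard flow-matching target along a linear noise-to-data path:
# x_t = (1 - t) * x0 + t * x1, with velocity target v = x1 - x0.
rng = np.random.default_rng(1)

def flow_matching_target(x0_noise, x1_action, t):
    x_t = (1 - t) * x0_noise + t * x1_action
    v_target = x1_action - x0_noise
    return x_t, v_target

noise = rng.normal(size=(8, 7))   # noisy action chunk (toy shape)
clean = np.zeros((8, 7))          # "ground truth" actions (toy)
x_t, v = flow_matching_target(noise, clean, t=0.5)
```

The DiT is trained to predict `v` given `x_t`, `t`, and the frozen VLM's features; at inference, integrating the predicted velocity carries noise to a clean action chunk.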

Xiaomi also tackled a practical problem: inference latency, where delays between model predictions and physical movement can create awkward pauses or unstable behavior.

Xiaomi says it implemented asynchronous inference, decoupling model computation from robot operation, so movements remain continuous even if the model takes extra time to think.
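The decoupling idea is essentially a producer/consumer pattern: an inference thread keeps refilling a queue of actions while the control loop drains it, so the robot is not blocked on each model call. This is a generic sketch of the pattern, not Xiaomi's implementation.

```python
import queue
import threading
import time

# Generic async-inference sketch: the "model" thread produces action
# chunks into a queue; the "robot" loop consumes them continuously.
actions = queue.Queue()

def model_thread(n_chunks=3, chunk=4):
    for i in range(n_chunks):
        time.sleep(0.01)            # pretend inference takes time
        for j in range(chunk):
            actions.put((i, j))     # enqueue one chunk of actions
    actions.put(None)               # sentinel: no more chunks

executed = []
t = threading.Thread(target=model_thread)
t.start()
while True:
    a = actions.get()               # consume actions as they arrive
    if a is None:
        break
    executed.append(a)
t.join()
```

Because the queue buffers a whole chunk, the consumer side sees a steady stream of actions even while the next chunk is still being computed.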

To improve stability, Xiaomi is using a “Clean Action Prefix” technique, which feeds the previously predicted action back into the model to ensure smooth, jitter-free motion over time.
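A minimal way to picture an action prefix is stitching consecutive chunks so they agree where they overlap: the tail of the previous chunk is pinned as the head of the next one. Prefix length and the lack of blending here are assumptions for illustration.

```python
import numpy as np

# Toy "clean action prefix": pin the start of the new chunk to the
# end of the previous chunk so motion stays jitter-free across chunks.
def with_clean_prefix(prev_chunk, new_chunk, prefix_len=2):
    out = new_chunk.copy()
    out[:prefix_len] = prev_chunk[-prefix_len:]  # fix the overlap
    return out

prev = np.ones((8, 7))      # previously predicted chunk (toy)
new = np.zeros((8, 7))      # freshly generated chunk (toy)
stitched = with_clean_prefix(prev, new)
```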

Meanwhile, a Λ-shaped attention mask biases the model toward current visual input instead of relying too heavily on past states. The goal is to make the robot more responsive to sudden environmental changes.
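A Λ-shaped mask can be sketched as a boolean attention pattern: each query position sees a few global "sink" positions at the start of the sequence plus a local window of recent positions, and nothing in between. The sink and window sizes below are illustrative assumptions.

```python
import numpy as np

# Sketch of a Λ-shaped attention mask: global prefix + recent window,
# which biases attention toward current input over stale history.
def lambda_mask(seq_len, n_sink=2, window=3):
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for q in range(seq_len):
        mask[q, :n_sink] = True           # always-visible prefix
        lo = max(0, q - window + 1)
        mask[q, lo:q + 1] = True          # local recent window
    return mask

m = lambda_mask(6)
```

Mid-sequence history outside the window is masked out, which is what keeps the model from leaning too heavily on past states.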

In benchmark testing, Xiaomi-Robotics-0 reportedly achieved state-of-the-art results in LIBERO, CALVIN, and SimplerEnv simulations, outperforming around 30 other models.

More interestingly, Xiaomi deployed it on a dual-arm robot platform in real-world experiments. In long-horizon tasks like folding towels and disassembling building blocks, Xiaomi says the robot demonstrated steady hand-eye coordination and handled both rigid and flexible objects without obvious breakdowns.

Unlike earlier VLA systems that often sacrificed multimodal reasoning once action training began, the Robotics-0 model retains strong visual and language capabilities, especially in tasks that blend perception with physical interaction.

Read more on Gizmochina

This news is powered by Gizmochina.
