World’s first fully agentic AI smartphone: Is this China’s second DeepSeek moment? | Technology News


China is moving ahead briskly in the AI arms race. While the rest of the world has been seeing an influx of AI-driven smartphone features, mainly voice assistants and app-by-app interactions, China has taken a major leap. ZTE, a Shenzhen-based multinational telecom company, has introduced a smartphone powered by an AI agent. Built in collaboration with ByteDance, the device features an agent that doesn’t just live inside apps but is integrated directly into the operating system. Its most striking capability is that it can operate the smartphone the same way a human would.

Taylor Ogan, an entrepreneur from Shenzhen, took to his X (formerly Twitter) account to share the prototype named Nubia M153. The smartphone runs on a customised version of Android integrated with ByteDance’s Doubao AI agent. For the uninitiated, Doubao is ByteDance’s proprietary large-scale general-purpose AI model ecosystem that is widely deployed across China as a chatbot and tool for productivity.

This prototype is much more than a typical on-device assistant. Ogan’s demo showed that the AI has full control of the phone: it can see the user interface, open apps, download apps, tap and type on screen, make calls, and execute multi-step tasks without the user having to know which apps are required. In simple words, the AI uses the phone the way a human user would, rather than through app-specific integrations.
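That observe-decide-act loop is the core of any GUI agent. As a rough illustration only, here is a minimal Python sketch of such a loop; every class, function, and screen-element name below is hypothetical, since ZTE and ByteDance have not published an API for this system.

```python
from dataclasses import dataclass

# Hypothetical sketch of a GUI agent's observe -> reason -> act loop.
# None of these names come from ZTE or ByteDance; the real system is unpublished.

@dataclass
class Action:
    kind: str            # "tap", "type", or "done"
    target: str = ""     # UI element the action applies to
    text: str = ""       # text to type, if any

def plan_next_action(screen: list[str], goal: str, history: list[Action]) -> Action:
    """Stand-in for the reasoning model: maps what the agent 'sees' to an action."""
    if "search_box" in screen and not any(a.kind == "type" for a in history):
        return Action("type", target="search_box", text=goal)
    if "confirm_button" in screen:
        return Action("tap", target="confirm_button")
    return Action("done")

def run_agent(goal: str, screens: list[list[str]]) -> list[Action]:
    """Step through successive 'screenshots', acting until the model says done."""
    history: list[Action] = []
    for screen in screens:
        action = plan_next_action(screen, goal, history)
        history.append(action)
        if action.kind == "done":
            break
    return history

# Example: a search screen, a confirmation screen, then nothing left to do.
trace = run_agent("queue service at hospital", [["search_box"], ["confirm_button"], []])
```

The point of the sketch is the shape of the loop, not the logic inside it: a real agent would replace `plan_next_action` with a vision-language model reading actual screenshots.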

What does the Agentic AI smartphone do?

Ogan began his thread by asking the AI to find someone to wait in line for him. While this is not yet the norm in India, China’s gig-economy apps commonly offer queue-standing services at hospitals, government offices, and other high-demand venues. Ogan asks the AI in English, and it responds immediately. The AI can be seen choosing which local service app to use, configuring the task, filling in the necessary fields, and presenting a final confirmation screen. In the short video, Ogan admits he would not have known which app handled that job or how to set it up; the AI agent completes the entire process autonomously.

This is groundbreaking, as most current AI assistants on smartphones can reason about tasks but cannot navigate third-party apps on a user’s behalf. Although Samsung, Apple, and other tech giants have been experimenting with AI actions, these are largely permission-gated and limited to partner apps. The ZTE-ByteDance prototype is far ahead, as it allows its AI to act directly within the graphical user interface (GUI) as if it were a human.

The hardware behind the Agentic AI

Ogan, in his thread, revealed that the prototype is powered by Qualcomm’s new Snapdragon 8 Elite Gen 5 chipset with 16 GB of RAM. This matters because the agent divides its workload between cloud-based semantic reasoning and on-device screen control. According to Ogan, running the ‘vision of the screen’ locally allows the AI to move quickly and maintain privacy for sensitive UI interactions such as payment flows and passwords.

When it comes to the AI model, ByteDance’s Doubao is currently used by over 175 million people in China. It is essentially a large, sparse Mixture-of-Experts model with multimodal (text and vision) support. In a second instance, Ogan takes a picture of a NIO battery-swap station and asks, “What is this thing?” The model identifies the station from the image, links it to NIO’s national EV-charging network, and explains how it works.

 

Cloud + on-device architecture

Perhaps the most striking demonstration is the hotel booking. Ogan takes a single picture of the hotel entrance and says nothing more than that he wants to book a stay. The AI understands the assignment and divides the workload.


Firstly, Doubao (cloud) handles the semantics: which hotel it is, that he wants to book for tonight, and that pet policies matter. Secondly, Nebula-GUI (on-device), reportedly a 7-billion-parameter model trained by ZTE, handles the physical actions: opening Ctrip (a Chinese booking app), entering dates, locating the best rate, searching the app for pet policies, and telling Ogan whether dogs are allowed.

Based on the demo, this two-layer architecture is what allows the task to run smoothly. In simple terms, Doubao plans and Nebula-GUI executes it.
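That planner/executor split can be sketched in a few lines of Python. To be clear, this is an illustrative sketch of the division of labour the demo describes, not ZTE’s or ByteDance’s actual interface; every function name and step string below is an assumption.

```python
# Hypothetical sketch of the two-layer split shown in the hotel demo:
# a cloud "planner" (Doubao's role) produces abstract steps, and an
# on-device "executor" (Nebula-GUI's role) turns each step into UI actions.
# All names and structures here are illustrative, not a real API.

def cloud_planner(intent: str) -> list[str]:
    """Semantic reasoning: break a user intent into app-level steps."""
    if "hotel" in intent:
        return ["open_booking_app", "enter_dates", "check_pet_policy", "confirm"]
    return []

def on_device_executor(step: str) -> list[str]:
    """Screen control: expand one abstract step into concrete taps and keystrokes."""
    actions = {
        "open_booking_app": ["tap app_icon"],
        "enter_dates": ["tap date_field", "type tonight"],
        "check_pet_policy": ["tap policies_tab", "read pet_policy"],
        "confirm": ["tap confirm_button"],
    }
    return actions.get(step, [])

def handle(intent: str) -> list[str]:
    """Planner proposes; executor performs each step on the device."""
    performed = []
    for step in cloud_planner(intent):
        performed.extend(on_device_executor(step))
    return performed

trace = handle("book a hotel for tonight, dogs allowed?")
```

The design choice the demo hints at is that only the abstract intent crosses the network, while every pixel-level interaction stays on the handset, which is where the speed and privacy claims come from.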

App-level knowledge and interaction with other bots

In another demo, the agent is asked to book a robotaxi. Doubao uses GPS data and scans local ride-hailing apps to decide which operator serves the route. On Ogan’s phone, Nebula-GUI opens the Baidu Apollo app, navigates its menus, selects pickup points, and confirms the trip. Sometime later, Ogan asks it to change the drop-off location mid-ride. Again, the AI recognises the active Apollo session, opens the correct screen, changes the destination, and triggers a confirmation both on the phone and inside the robotaxi itself. This is a fine demonstration of the AI’s app-specific knowledge.

During the demo, when Ogan forgets the phone number linked to his Apollo account, the AI navigates the app’s settings and retrieves the last four digits. This is something most AI assistants cannot do unless they have deep, OS-level access and visibility.

Meanwhile, in another test, Ogan uses Meituan, a Chinese tech company that offers on-demand drone delivery services. He asks the agent to order two drinks, and it updates his cart, makes the payment, and arranges delivery to a nearby locker. And when Meituan’s automated system makes a confirmation call, Doubao answers on his behalf and speaks to Meituan’s bot; the two bots complete the exchange without any user intervention. This is an example of how agents can negotiate with other agents on behalf of a user.

Ogan notes that throughout his walk, he uses the device as a passive layer of intelligence: identifying whether a store is part of a Shenzhen brand network, checking trademark and business-registry data, and evaluating whether a passerby wearing an NYPD jacket is an actual police officer. In the demo, the system correctly contextualises the location (Shenzhen) and identifies the jacket as a civilian fashion item.


The demo also shows ByteDance’s image-generation tools modifying only the clothes in a photo while leaving the rest of the scene intact, which allows the agent to re-render the person in a Chinese police uniform or an FBI jacket on request.

What does this mean for us?

This device is essentially an OS-native GUI agent trained on Chinese mobile UI flows and backed by a large, multimodal reasoning model. It eliminates the need to understand apps, menus, or workflows: simply give the phone your intent, and it handles the execution.

As of today, nothing in the global smartphone market demonstrates this level of autonomy. It remains to be seen if this becomes a commercial product, but the prototype clearly shows how agentic smartphones may change our lives. It also shows that the first true agentic smartphones may not come from Silicon Valley, but from China’s integrated AI and mobile ecosystem.

