Jun 4, 2026

Doubao vs Qwen's 618 Shopping Test: Why This Generation of AI Still Can't Sell

Key Takeaways

  1. When Doubao and Qwen were asked identical shopping questions during 618, language understanding held up but product matching broke down: in one case, Qwen answered an 8,000-yuan camera budget with toys priced as low as about 7 yuan
  2. The AI decision layer and the platform's product retrieval layer are disconnected, so budget constraints vanish the moment a user leaves the recommendation card
  3. Recommendation neutrality structurally collides with ad revenue, so even as the tech matures, designing AI that sells comes down to a problem of trust for e-commerce operators

What 618 Revealed About AI Shopping's Real Capabilities

China's "618" is a major e-commerce sale held around June 18 each year, and one of the country's two biggest annual shopping festivals, second only to Singles' Day. The 2026 edition became a different kind of proving ground. ByteDance's Doubao and Alibaba's Qwen both rolled out conversational shopping features in quick succession. Could a purchase really be completed through dialogue instead of search? To probe that question, Chinese outlet 36Kr ran a hands-on test, posing the same four sets of questions to both assistants.

The results offered a clear-eyed snapshot of where agentic shopping actually stands. This article uses that test as a starting point to explore, from an e-commerce operator's perspective, how far we can currently trust AI to shop for us, and what stands in the way.

Four Questions Split "the AI That Decides" From "the AI That Opens a Window"

The test spanned four scenarios of differing character. The first was a basic recommendation: choose a laptop under 3,000 yuan. Doubao narrowed the requirements, presented specific product cards complete with purchase warnings, and built a path straight to checkout in a click. Qwen, by contrast, merely routed users to a product results page where they had to filter for themselves, with matching accuracy closer to keyword search.

The second question contained a false premise: do Dyson and Xiaomi vacuum cleaners differ in performance as much as in price? Both refused to follow the flawed framing and answered correctly. Doubao, however, pushed products aggressively throughout the conversation, inserting real-time product cards. Qwen stuck to organizing information in a comparison table, behaving more like a pure information tool.

The problem surfaced in the third scenario. For a high-value, complex decision, "choose a camera for photographing children with an 8,000-yuan budget," Qwen's brand judgment was sound. But the product cards it attached collapsed entirely. Its "new mirrorless plan" linked to a 53-yuan Sanrio children's toy camera, and its "second-hand full-frame plan" linked to a 7.78-yuan toy. The language understanding was right, while the product layer had gone completely off the rails. Returning a toy priced around 7 yuan against an 8,000-yuan budget captures the structural defect of this generation of AI in a single image.
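A mismatch like this could in principle be caught by a plausibility check between the stated budget and the price on each attached card. The sketch below is a hypothetical illustration only, not a description of how Qwen or Taobao work. The `ProductCard` type, the `plausible_for_budget` function, and the 20% price floor are all assumptions made for the example; the two prices are the ones reported in the test.

```python
from dataclasses import dataclass


@dataclass
class ProductCard:
    """Hypothetical product card attached to an AI answer."""
    title: str
    price_yuan: float


def plausible_for_budget(card: ProductCard, budget_yuan: float,
                         min_fraction: float = 0.2) -> bool:
    """Return True if the price is within budget and not absurdly below it.

    min_fraction is an assumed heuristic: a card priced under 20% of the
    stated budget is treated as a likely category mismatch.
    """
    return budget_yuan * min_fraction <= card.price_yuan <= budget_yuan


cards = [
    ProductCard("Sanrio children's toy camera", 53.0),
    ProductCard("Toy camera", 7.78),
]
kept = [c for c in cards if plausible_for_budget(c, budget_yuan=8000)]
print(kept)  # both toy listings fall below the floor, so this prints []
```

A fixed fraction is a crude filter. A real system would more likely compare the card's category against the category implied by the question, but even this simple check rejects both toy listings.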

When a "Selling Lie" Is More Dangerous Than an "Honest Refusal"

The fourth scenario, a cross-platform comparison asking where the same AirPods 4 were cheapest across JD.com, Taobao, and Pinduoduo, produced the most revealing result. Qwen candidly admitted that, as a Taobao AI assistant, it could not check rival platforms' prices, then honestly offered money-saving tactics within Taobao using real-time data.

Doubao, by contrast, confidently presented a three-platform price comparison table. It looked highly professional. Yet Doubao is connected to neither JD.com nor Pinduoduo. That comparison data was generated by the model from related information, not retrieved in real time. The "636 yuan" lowest price it cited was a theoretical figure stacking 88VIP, coupons, and national subsidies, not something an ordinary user could actually reach. Worse, the answer ended with a product card for its own Douyin Mall, unrelated to the three platforms in the question.

Here lies a crucial lesson. When users see a neat price comparison table, they naturally believe it to be accurate, real-time data. In a shopping decision, a fabricated answer is more dangerous than no answer at all. Qwen honestly conceded its limits; Doubao played the expert and returned a hallucination. That contrast cuts to the heart of trust design in AI commerce.
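The difference between the two behaviors can be expressed as a simple rule: show a price only when it comes from a live lookup, and mark everything else as unavailable. The following is a minimal sketch under that assumption. The `compare_prices` function, the `connected` mapping, the SKU string, and the 999.0 price are all invented for illustration and do not describe either assistant's implementation.

```python
from typing import Callable, Dict, List, Optional

# A hypothetical live price lookup: takes a SKU, returns a price in yuan.
PriceLookup = Callable[[str], float]


def compare_prices(sku: str,
                   requested_platforms: List[str],
                   connected: Dict[str, PriceLookup]) -> Dict[str, Optional[float]]:
    """Return a live price per platform, or None where no live source exists."""
    result: Dict[str, Optional[float]] = {}
    for platform in requested_platforms:
        lookup = connected.get(platform)
        # Never fill the gap with a model-generated figure.
        result[platform] = lookup(sku) if lookup is not None else None
    return result


# Placeholder lookup; the 999.0 figure is invented for the example.
connected = {"Taobao": lambda sku: 999.0}
table = compare_prices("example-sku", ["JD.com", "Taobao", "Pinduoduo"], connected)
for platform, price in table.items():
    print(platform, "unavailable" if price is None else f"{price:.2f} yuan")
```

The table that reaches the user then states its own gaps instead of hiding them behind plausible-looking numbers.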

Three Walls Technology Cannot Tear Down

The issues the test exposed are structural, not one-off bugs.

First, the basis of the recommendations may not align with users' interests. The core business model of Taobao and Tmall is advertising and bid ranking. If Qwen truly sorted by "what suits the user best," the investments of countless paying merchants would evaporate and the ecosystem's commercial logic would break. Indeed, observers note that Qwen's recommendations concentrate on merchants that pay more for placement, while high-value products with tens of thousands of sales sink dozens of slots lower. Doubao is no different. It is no coincidence that a live-streaming room surfaces first when you click a product card; that reflects ByteDance's traffic-distribution logic. Traditional search results carry an "Ad" label, whereas AI recommendations claim to "select for you," leaving users unable to tell algorithmic judgment from paid promotion.

Second, AI makes decisions but fails to control the whole process. Doubao filters well at the recommendation-card stage. Yet press "view more," and the budget constraint disappears, surfacing 3,739-yuan and 4,499-yuan models against a 3,000-yuan request. This exposes a shared engineering flaw of current AI shopping products. The AI decision layer is not connected to the platform's product retrieval layer. The moment users leave the recommendation card, conventional e-commerce logic, sorting by sales volume, ad weight, and platform interest, takes over. AI influences only the first step of the purchase, none of the steps that follow.
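One way to picture the missing connection is a constraint object that the conversation produces and that every later retrieval step is required to honor. The sketch below assumes such a design. `ShoppingConstraints`, `Product`, and `view_more` are hypothetical names, and the catalog entries and sales figures are made up, with two prices chosen to echo the over-budget models seen in the test.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ShoppingConstraints:
    """Constraints extracted from the dialogue (hypothetical structure)."""
    category: str
    max_price_yuan: Optional[float] = None


@dataclass
class Product:
    title: str
    category: str
    price_yuan: float
    sales: int


def view_more(catalog: List[Product],
              constraints: ShoppingConstraints) -> List[Product]:
    """Re-apply the conversation's constraints first, then rank by sales."""
    matches = [
        p for p in catalog
        if p.category == constraints.category
        and (constraints.max_price_yuan is None
             or p.price_yuan <= constraints.max_price_yuan)
    ]
    return sorted(matches, key=lambda p: p.sales, reverse=True)


catalog = [
    Product("Laptop A", "laptop", 2899.0, 1200),   # made-up listing
    Product("Laptop B", "laptop", 3739.0, 9800),   # over budget
    Product("Laptop C", "laptop", 4499.0, 15000),  # over budget
]
constraints = ShoppingConstraints(category="laptop", max_price_yuan=3000.0)
results = view_more(catalog, constraints)
print([p.title for p in results])  # ['Laptop A']
```

Ranking by sales alone would put the 4,499-yuan model first. Filtering before ranking is what keeps the 3,000-yuan limit alive after the user leaves the recommendation card.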

Third, the efficiency of conversational shopping has not yet surpassed search. The premise that stating needs in natural language is more efficient holds only where "needs are clear, products are standardized, and decisions are simple." Real-world shopping is rarely like that: needs are vague, comparisons span many dimensions, and trust takes time to build. When a user doubts whether a product card is trustworthy, they instinctively open another app to verify. At that moment, AI adds an extra confirmation step rather than improving efficiency.

The Paradox of Massive Spending and Absent Revenue

Behind this test lies a larger economic picture. China's tech giants are reportedly spending an estimated $42 million a month on AI shopping tools with no clear monetization strategy. In a roughly 15-trillion-yuan Chinese e-commerce market, whoever controls the AI shopping interface could redirect traffic across an entire digital ecosystem.

User scale signals how serious the investment is. According to QuestMobile, monthly active users in Q1 2026 stood at roughly 345 million for Doubao and 166 million for Qwen. Their design philosophies diverge sharply. A CIW analysis frames Qwen as keeping "the assistant visible," while Doubao bets on making "the transaction feel effortless."

Yet both run into the same contradiction. Users expect neutral recommendations, but paid ranking works in the opposite direction. A sponsored AI assistant risks losing the very trust that made users try it. That is precisely why neither company discussed monetization at launch. Technically sophisticated, yet commercially self-contradictory: that is the real face of this generation of AI shopping.

What Japanese E-Commerce Operators Can Learn From 618

This Chinese case study is a living textbook for Japanese e-commerce operators facing agentic commerce. In Japan too, Yahoo! Shopping launched its "Yahoo! Shopping Agent" in February 2026, and in-chat checkout, exemplified by OpenAI's Instant Checkout, is spreading. The same tensions will eventually surface in the Japanese market.

The first thing to grasp is that AI can handle proxy purchasing only in "clear, standardized, low-decision categories" for now. High-value categories with strong comparison needs, such as appliances, smartphones, PCs, and apparel, have not reached the reliability where AI can take over. Judging which type your flagship products resemble is the starting point.

Next comes the real-time integrity of product data. The recurring problem in this test was that model knowledge cannot keep pace with an e-commerce environment where prices, inventory, and promotions change by the minute. To be handled correctly by AI agents, you need infrastructure that supplies structured product data accurately and instantly. This is the unglamorous but decisive investment area of agent-ready product data.
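As a rough illustration of what "structured and instant" might mean in practice, the sketch below attaches an update timestamp to each product record and lets an agent refuse to quote data that is too old. The `ProductRecord` fields, the `is_fresh` function, and the five-minute threshold are assumptions for the example, not an established standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ProductRecord:
    """Hypothetical structured product record exposed to AI agents."""
    sku: str
    price_yuan: float
    in_stock: bool
    updated_at: datetime  # timezone-aware time of the last price/stock sync


def is_fresh(record: ProductRecord, now: datetime,
             max_age: timedelta = timedelta(minutes=5)) -> bool:
    """Return True if the record was synced within max_age of `now`."""
    return now - record.updated_at <= max_age


# Fixed clock so the example is reproducible.
now = datetime(2026, 6, 18, 12, 0, tzinfo=timezone.utc)
recent = ProductRecord("sku-1", 2899.0, True, now - timedelta(minutes=2))
stale = ProductRecord("sku-2", 3739.0, True, now - timedelta(minutes=30))

print(is_fresh(recent, now))  # True
print(is_fresh(stale, now))   # False
```

The right threshold depends on how quickly prices and promotions move in a given category. The point is that freshness becomes an explicit, checkable property of the data.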

Finally, trust is the greatest differentiator. Doubao's "selling lie" showed that an AI straining to sell ultimately erodes user trust. The more commercial intent you bury beneath a seemingly neutral recommendation, the more quietly brand trust is chipped away. To keep being chosen in AI commerce, operators must place verifiable, honest information at the center of their design, not a superficial conversational interface.

Conclusion

What the 618 test revealed is the sober fact that this generation of AI still cannot master selling. Language understanding has advanced, yet product matching breaks down, the decision and retrieval layers stay disconnected, and neutrality and revenue remain in conflict. The symbolic gap of returning a toy priced around 7 yuan against an 8,000-yuan budget speaks to both immature technology and structural contradiction. The takeaway for Japanese e-commerce operators is clear: judge soberly which domains AI can be trusted with, build real-time product data infrastructure, and above all place honesty at the center of trust design. The main battlefield of agentic commerce is still to come.