Alibaba launches Qwen 3.5 small model series with sub-1B edge options

Written by Joseph Nordqvist · March 2, 2026 at 2:57 PM UTC

4 min read

Alibaba's Qwen team has released four new small language models — Qwen3.5-0.8B, 2B, 4B, and 9B — alongside their base model counterparts, extending the Qwen3.5 architecture to compact, resource-efficient deployments.

The release, announced March 2 on X, positions the series as a practical option for edge devices, lightweight agents, and industrial research. All four models share the same foundation as the larger Qwen3.5 family: native multimodal capability, an improved architecture, and training with scaled reinforcement learning.

The launch of the Qwen 3.5 Small Model Series adds to a growing ecosystem of models designed for local deployment, where cost, latency, and data privacy are priorities.

Screenshot of an X post by @Alibaba_Qwen announcing the Qwen 3.5 Small Model Series (0.8B, 2B, 4B, 9B), with benchmark charts comparing the models against GPT-OSS-120B, GPT-5 nano, and Gemini 2.5 Flash-Lite across eight evaluations including IFBench, GPQA Diamond, and Video-MME
Alibaba's Qwen team announced the Qwen 3.5 small model series on X, March 2, 2026. (Source: @Alibaba_Qwen / X)

What Was Released

  • Qwen3.5-0.8B and 2B — Designed for speed and minimal compute, with edge hardware deployments as the primary use case.

  • Qwen3.5-4B — Described by the team as "a surprisingly strong multimodal base for lightweight agents," indicating competitive capability at a size that can run on consumer hardware.

  • Qwen3.5-9B — The largest of the series, benchmarked against significantly larger models.

Base model weights for all four sizes are also being released, which the team said is intended to better support "research, experimentation, and real-world industrial innovation."

  • Dual Operating Modes: The models support two modes of operation (see the usage sketch after this list):

    • Non-Thinking: Optimized for speed, general chat, and standard visual tasks.

    • Thinking: Engages a Chain-of-Thought reasoning process to solve complex logic, math, and coding problems.

  • Dense Architecture: While the larger 35B+ versions of Qwen3.5 use a Mixture-of-Experts (MoE) design, the models in the small series (0.8B–9B) are dense: every parameter is active for every response, maximizing capability for a given memory footprint.
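For developers, the thinking toggle in Qwen3 is exposed through the tokenizer's chat template, and Qwen3.5 is expected to follow the same convention. The sketch below assumes that convention carries over and that the Hugging Face model ID matches the announced naming; both are assumptions rather than confirmed details from the release.

```python
# Minimal sketch: switching between Thinking and Non-Thinking modes.
# Assumes Qwen3.5 keeps Qwen3's chat-template convention (the
# enable_thinking flag) and that the model ID "Qwen/Qwen3.5-4B"
# matches the announced naming; neither is confirmed by the release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.5-4B"  # hypothetical repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is 37 * 43?"}]

# enable_thinking=True engages chain-of-thought reasoning for hard
# logic, math, and coding tasks; False favors fast, direct answers.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
))
```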

Benchmark comparison table showing Qwen3.5-4B and 9B models versus GPT-OSS-120B and 20B across Knowledge, Reasoning, Long Context, Instruction Following, General Agent, and Multilingual categories
Alibaba Qwen3.5-9B benchmark scores compared to GPT-OSS-120B across knowledge, reasoning, and multilingual tasks. (Source: Alibaba / Qwen)

Benchmark Performance

According to benchmarks self-reported by Alibaba alongside the release, the Qwen3.5-9B shows particularly strong results in instruction following (91.5 on IFEval) and long-context tasks (63.0 on AA-LCR), outperforming several larger models in those categories. On knowledge benchmarks like MMLU-Pro (82.5) and MMLU-Redux (91.1), the 9B also posts competitive scores relative to the GPT-OSS-120B baseline included in the comparison.

Benchmark table comparing Qwen3.5-9B and 4B against GPT-OSS-120B, GPT-OSS-20B, Qwen3-Next-80B-A3B-Thinking, and Qwen3-30B-A3B-Thinking-2507 across 25 evaluations in six categories: Knowledge & STEM, Instruction Following, Long Context, Reasoning & Coding, General Agent, and Multilingualism
Qwen3.5-9B and 4B benchmark results across 25 evaluations, compared against GPT-OSS-120B and Qwen3 predecessors. General Agent benchmarks were not reported for GPT-OSS models. (Source: Alibaba / Qwen)

The Qwen3.5-4B demonstrates solid multilingual performance across 29-language and 55-language evaluations, suggesting it may be viable for international deployment scenarios despite its compact size.

On reasoning and coding tasks, both the 4B and 9B lag behind the larger GPT-OSS-120B on HMMT benchmarks — an expected gap given the parameter difference — but remain competitive on LiveCodeBench v6.

General agent benchmarks (BFCL-V4, TAU2-Bench, VITA-Bench, DeepPlanning) were only reported for the Qwen3.5 models in this comparison, not for GPT-OSS variants, limiting direct comparison on agentic tasks.

Context

The release continues a pattern of frontier labs publishing small model series that punch above their weight class, particularly for deployment scenarios where cost, latency, and local execution matter more than absolute benchmark rank. Alibaba has made Qwen3.5 models available via Hugging Face and ModelScope.
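For anyone wanting a quick local test, the smallest checkpoint should run through the standard Transformers text-generation pipeline. The snippet below is a minimal sketch; the model ID Qwen/Qwen3.5-0.8B is inferred from the announced naming and may differ from the published repository name.

```python
# Minimal quick-start sketch: running the smallest Qwen3.5 model
# locally via the Transformers pipeline. The model ID is inferred
# from the announced naming and is not a confirmed repo name.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen3.5-0.8B",  # hypothetical ID; edge-oriented size
    device_map="auto",
)

result = generator(
    [{"role": "user", "content": "Give three reasons small dense models suit edge devices."}],
    max_new_tokens=128,
)
# Chat-format input returns the full conversation; print the reply.
print(result[0]["generated_text"][-1]["content"])
```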


AI News Home covers AI industry developments with primary source verification. Benchmark data is self-reported by Alibaba and has not been independently verified.

Written by

Joseph Nordqvist

Joseph founded AI News Home in 2026. He studied marketing and later completed a postgraduate program in AI and machine learning (business applications) at UT Austin’s McCombs School of Business. He is now pursuing an MSc in Computer Science at the University of York.


This article was written by the AI News Home editorial team with the assistance of AI-powered research and drafting tools. All analysis, conclusions, and editorial decisions were made by human editors. Read our Editorial Guidelines.

References

  1. Qwen 3.5 Small Model Series announcement. Alibaba Qwen, X (formerly Twitter), March 2, 2026. (Primary)
