Gemma 4 on an iPhone: here's what a 2B model can actually do
Google shipped a real LLM to your phone. We tested Gemma 4 E2B fully offline (math, code, ethics, and history) to find where on-device AI shines and where it breaks.

Joseph founded AI News Home in 2026. He studied marketing and later completed a postgraduate program in AI and machine learning (business applications) at UT Austin’s McCombs School of Business. He is now pursuing an MSc in Computer Science at the University of York.
71 articles
Netflix's first open-source model isn't a chatbot — it's a production VFX tool that understands physics. We ran it on an H100 to see if the claims hold up.
53 practical tests — vision benchmarks, 14 UI recreations, and 30 text challenges covering math, logic, essays, creative writing, and translation. All local, all free.
Google launched Veo 3.1 Lite today at $0.05 per second. I ran 7 text-to-video prompts and one image-to-video test on both models to find out where Lite holds up and where it falls apart.
I recorded 43 seconds of audio on a MacBook, sent it to Mistral's Voxtral API, and got back natural-sounding voice clones in minutes. Then I ran the same tests on the 6-bit quantized local version and Kokoro 82M to see how they compare.
Cohere's first speech recognition model tops the Open ASR Leaderboard as well as our own independent benchmark of four providers.
Google's expansion of Search Live to more than 200 countries this week is easy to frame as a product update. A voice feature got wider availability. But zoom out and the move looks more significant: it is the first time any company has deployed conversational, multimodal search at truly global scale.
Google released Gemini 3.1 Flash Live on March 26, positioning it as the company's most capable real-time audio model to date.
Researchers have developed a machine learning model that predicts an individual's risk of hepatocellular carcinoma using routine clinical information already collected during standard medical visits.
In a pair of posts on X on March 25, Andrej Karpathy noted that the memory feature has an issue: models can treat stale, one-off queries as deep ongoing interests.
Reflection AI, a startup backed by Nvidia that is building freely available AI models, is in talks to raise $2.5 billion at a pre-money valuation of $25 billion, the Wall Street Journal reported on March 25.
Anthropic has introduced auto mode for Claude Code, a new permissions setting that lets it approve or block actions on the developer's behalf.
Google DeepMind's Lyria 3 Pro music generation model can create complete tracks up to three minutes long, a sixfold increase over the 30-second clips produced by the standard Lyria 3 model that launched in February.
OpenAI is discontinuing Sora, its AI video generation platform, just six months after the standalone app launched.
Cursor released a technical report on March 24 detailing how Composer 2 was trained, formally crediting Moonshot AI's Kimi K2.5 as the base model for the first time in an official document.
A new compression technique from Google Research could make AI models cheaper to run, faster to respond, and capable of handling much longer conversations.
A developer discovered that Cursor's Composer 2 model was built on top of Kimi K2.5, an open-source model from Beijing-based Moonshot AI, a detail Cursor had not mentioned anywhere in its announcement.
Nvidia DLSS 5 introduces a real-time neural rendering model that uses generative AI to enhance game visuals with photorealistic lighting and materials.
Moonshot AI's Kimi Team has published a technical report proposing a change to a fundamental component of modern AI models called residual connections.
A Sydney tech entrepreneur with no background in biology used ChatGPT, Google DeepMind's AlphaFold, and his own machine learning expertise to design a custom mRNA cancer vaccine for his rescue dog.
Elon Musk set a March 21 date for what may well be one of the most ambitious semiconductor ventures ever attempted by a company outside the established chipmaking industry.
Anthropic announced on March 12 that it is investing $100 million in 2026 to launch the Claude Partner Network, a program designed to help consulting firms, professional services companies, and AI specialists deploy Claude across enterprise customers.
Anthropic has moved the 1 million token context window for Claude Opus 4.6 and Sonnet 4.6 from beta to general availability, and eliminated the long-context pricing premium entirely.
Google DeepMind CEO Demis Hassabis published a warm retrospective about AlphaGo's 10th anniversary in what may very well be an act of narrative positioning at a moment when the rest of the industry is consumed by the question of who AI serves.
Cursor's new Automations let AI agents run in the background, triggered by PRs, PagerDuty alerts, Slack messages, and timers, with no prompt required.
OpenAI released GPT-5.4 today, billing it as its most capable model to date for professional work, coding, and automated tasks.
Google's AI research tool upgrades from narrated slides to source-grounded cinematic video using a three-model pipeline of Gemini 3, Nano Banana Pro, and Veo 3. Ultra-tier access is required, with a dedicated 20/day Cinematic limit.
Google has made its Canvas feature available to all users in the United States who access AI Mode in Search, the company announced on March 4, 2026.
Google has released a preview of Gemini 3.1 Flash-Lite, positioning it as the fastest and most cost-efficient model in the Gemini 3 series.
Google's research AI ships ten style presets, a sign the competition for AI tool loyalty is moving beyond accuracy.
Alibaba's Qwen team has released four new small language models — Qwen3.5-0.8B, 2B, 4B, and 9B — alongside their base model counterparts, extending the Qwen3.5 architecture to compact, resource-efficient deployments.
In one week, Anthropic dismantled its hardest safety promise and absorbed a federal blacklisting rather than cross its ethical red lines. Both things are true. That's the story.
After the Pentagon banned Anthropic for refusing to allow Claude to be used for mass surveillance and autonomous weapons, users responded by making Claude the #1 app on the US App Store, overtaking ChatGPT for the first time.
OpenAI published excerpts of its agreement with the Department of Defense on Saturday morning. The language is more detailed than expected, yet more ambiguous than it first appears.
OpenAI CEO Sam Altman announced a deal to deploy AI on the Pentagon's classified network just hours after the administration blacklisted Anthropic for requesting the same restrictions OpenAI says it secured.
OpenAI announced $110 billion in new investment at a $730 billion pre-money valuation, anchored by $50 billion from Amazon, $30 billion from SoftBank, and $30 billion from NVIDIA. The Amazon commitment comes with a broad strategic partnership covering cloud infrastructure, custom silicon, and enterprise AI distribution.
The U.S. Department of War gave Anthropic a Friday deadline to remove AI safety guardrails from Claude or face contract cancellation, a supply chain risk designation, and invocation of the Defense Production Act. Anthropic refused.
Anthropic has rolled out an automatic memory feature for Claude Code that writes persistent notes about project patterns, debugging discoveries, and user preferences to a MEMORY.md file, carrying context between sessions without manual effort.
Meta and AMD announced a five-year, approximately $60 billion agreement to deploy up to six gigawatts of AMD Instinct GPUs, with Meta gaining the option to acquire roughly 10% of AMD through performance-based warrants.
Google DeepMind releases Nano Banana 2 (Gemini 3.1 Flash Image), delivering Pro-tier image generation quality at Flash speeds. The model rolls out across the Gemini app, Google Search, Flow, and Google Ads with advanced world knowledge, subject consistency, and improved text rendering.
Claude Code's new 'Remote Control' feature, launched February 24, 2026, allows developers to continue local terminal sessions from their phone, tablet, or any web browser.
Anthropic announced Claude Code Security on February 20, 2026, a tool that scans codebases for security vulnerabilities and suggests patches for human review.
Google has released Gemini 3.1 Pro, a new version of its Pro model line that it says upgrades the core reasoning capabilities behind recent Gemini 3 advances.
Google has begun rolling out Lyria 3 in the Gemini app, adding a built-in tool for generating short 30-second music tracks.
Meta and NVIDIA announced a multiyear strategic partnership deploying millions of Blackwell and Rubin GPUs, standalone Grace CPUs, and Spectrum-X networking across Meta's hyperscale data centers.
Anthropic released Claude Sonnet 4.6, upgrading its mid-tier AI model with improved coding, computer use, and long-context reasoning capabilities.
OpenAI introduced Lockdown Mode in ChatGPT and added “Elevated Risk” labels to reduce the risk of data exfiltration through prompt injection.
Manus has launched “Manus Agents,” a way to use its AI agent directly inside messaging apps, starting with Telegram.
The Vatican is rolling out an AI-assisted live translation service intended to help attendees follow principal celebrations at St. Peter’s Basilica in their preferred language.
Peter Steinberger, the developer behind OpenClaw, announced in a blog post that he is joining OpenAI to work on “brin...
Developments in early 2026 show a change in how AI companies compete. Response speed appears to be moving from a background engineering concern to a visible product differentiator.
Manus has introduced a feature called Project Skills that lets teams curate and lock sets of reusable AI workflows at the project level.
Google is rolling out a new Gemini feature in Google Docs that lets you listen to a short audio summary of a document.
Airbnb CEO Brian Chesky said the company’s AI agent is now handling nearly 30% of English-language customer support t...
OpenAI has launched a research preview of GPT-5.3-Codex-Spark, a new version of its Codex coding model designed to respond quickly enough for real-time software work.
Gemini Deep Think is being used to assist with research-level problems in mathematics, computer science, physics, and economics.
OpenAI has rolled out a set of upgrades to Deep Research in ChatGPT, including a new backend model a...
Alibaba’s Qwen team has launched Qwen-Image-2.0, a new image generation model positioned as a single system for both text-to-image generation and image editing.
A new study from MIT researchers argues that some popular platforms used to rank large language models can be far mor...
OpenAI said on Feb. 9, 2026 that it is beginning a U.S. test of ads in ChatGPT, positioning ads as a way to support broader access while keeping answers independent.
A new "fast mode" for Claude Opus delivers up to 2.5 times higher output token generation speed at a significant premium: six times the standard price.
A product update billed as a modest expansion triggered the worst week for enterprise software in years, then markets snapped back. Here's what actually happened, and what it means.
Goldman Sachs has been working with Anthropic for the past six months to co-develop autonomous AI agents for back-off...
OpenAI on Feb. 5, 2026, released GPT-5.3-Codex...
Anthropic announced Claude Opus 4.6, the latest...
A major international report released this week finds that artificial intelligence systems are advancing faster than the methods used to test, monitor, and govern them.
Roblox has launched the beta release of 4D generation, a new AI system that allows creators to generate functional, interactive 3D objects from text prompts inside Roblox experiences.
On January 15, 2026, OpenAI removed the Voice feature from the ChatGPT macOS desktop app. While voice remains available on the web, mobile, and Windows, its absence on macOS has changed how the tool fits into daily academic and professional work.
OpenAI released a dedicated macOS application for Codex, its AI-powered coding assistant. The app is designed to serve as a "command center" that makes it easy for software developers to manage multiple AI agents at once.
A look at OpenAI’s in-house data agent, including how it uses context, access controls, and evals to support internal analytics.
Microsoft’s Maia 200 is not trying to be the fastest chip on paper. It is trying to be the cheapest place to run AI at scale. That distinction matters. While most attention still goes to training massive models, the real cost of AI now sits in inference, the endless stream of everyday queries powering copilots, chatbots, and search tools. Maia 200 is Microsoft’s attempt to own that layer of the stack, quietly reducing dependence on Nvidia while reshaping how Azure prices and delivers AI. If it works, the biggest shift will not be technical; it will be economic.