In 2025, Apple and Google took a key step by enabling generative models that run directly on mobile devices. Thanks to technologies like TFLite and CoreML, it's now possible to interact with ChatGPT-like assistants offline. This represents a revolution in privacy, accessibility, and user experience.
A before and after
Previously, generative models relied on remote servers, which meant latency, data consumption, and the risk of sensitive-information leaks. Now, compact models like Gemini Nano or Claude Instant let users write, summarize, or translate directly on the device.
Key use cases
- Students and professionals on the move can use AI on planes or trains without a signal.
- Medical staff write reports without compromising privacy.
- Users in regions with low connectivity have seamless access to artificial intelligence.
How does it work?
The models are downloaded only once (approx. 200–500 MB), updated over Wi-Fi, and invoked for tasks such as writing, translation, or summarizing, without requiring an external API.
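The download-once behavior described above can be sketched as a simple cache check: fetch the model file on first use, then reuse the local copy on every later launch. This is a minimal illustration in Python; the function name `ensure_model` and the injectable `fetch` callable are assumptions for the sketch, not any vendor's actual API.

```python
from pathlib import Path
import urllib.request

def ensure_model(url: str, cache_dir: Path,
                 fetch=urllib.request.urlretrieve) -> Path:
    """Download the model once; later calls reuse the cached copy.

    `fetch` is injectable so the network step (the one-time
    ~200-500 MB download, ideally over Wi-Fi) can be swapped out.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    local_path = cache_dir / url.rsplit("/", 1)[-1]
    if not local_path.exists():
        # First run only: fetch the model weights.
        fetch(url, local_path)
    # All later inference reads this file fully offline.
    return local_path
```

On real devices the same idea is handled by the platform's model-delivery machinery rather than hand-rolled code, but the contract is the same: one download, then offline reuse.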
Limitations
- Lower capacity compared to cloud models.
- Functions such as web browsing or access to documents still require a connection.
- They are not updated in real time; for recent data, online models such as GPT-4o are still superior.
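The split above suggests a routing rule: handle the offline-capable tasks locally and fall back to the cloud only for work that genuinely needs a connection. A minimal sketch, assuming a hypothetical `route` helper and an illustrative task vocabulary (none of this is a real SDK):

```python
# Tasks the compact on-device model can handle without a connection,
# per the article: writing, summarizing, translating.
OFFLINE_TASKS = {"write", "summarize", "translate"}

def route(task: str, online: bool) -> str:
    """Decide where a request runs: on-device if possible, else cloud."""
    if task in OFFLINE_TASKS:
        return "on-device"   # compact local model handles it, no network
    if online:
        return "cloud"       # browsing / document access / fresh data
    raise RuntimeError(f"task '{task}' requires a connection")
```

A real assistant would fold in battery, model size, and privacy policy, but the core decision stays this simple: capability of the local model first, connectivity second.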
Real impact
This advance democratizes access to AI. It enables greater adoption in sensitive sectors such as banking and healthcare, and paves the way for devices like smartwatches with integrated generative capabilities.
We recommend reading our article on GEO, the new discipline of “Generative Engine Optimization,” for visibility in generative AI.