Here’s a strange reality: I asked ChatGPT to set a timer for me, and it described how to do it. That’s it. No action, no alert, just words explaining words. After watching a critical breakdown of the current state of AI, I spent a week testing exactly where this line breaks—and why the biggest AI companies haven’t crossed it despite $800 billion in investment. Most guides skip this part entirely because it’s uncomfortable to admit that the world’s most famous AI still can’t do what your $15 alarm clock can.
The Timer Test: Why One Simple Action Reveals Everything
Ask ChatGPT to set a timer for 15 minutes. Watch what happens.
What Actually Happens When You Ask for a Timer
You get text. Instructions, explanations, maybe even a helpful ASCII art diagram of a countdown. But no timer. The AI responds with language because that’s the only tool it has—it generates text the way a typewriter creates letters. There’s no motor, no clock chip, no connection to your device’s operating system pulling the strings.
Here’s the thing that still gets me: a $15 alarm clock from a dollar store can do what an $800 billion AI company cannot. That 1990s Casio on your wrist has outperformed every advanced language model ever built at this one specific task. It sounds absurd until you think about what’s actually required.
The Difference Between Describing and Doing
The gap comes down to something deceptively simple: language generation versus task execution. The model predicts the next token based on probability distributions learned from text. That makes it remarkable at writing, analysis, and conversation. But setting a timer requires triggering a state change in your device’s operating system, and no sequence of generated tokens can do that on its own.
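To make that concrete, here’s a minimal sketch in plain Python (standard library only) of what “doing” actually requires: a live process, access to a real clock, and a callback that fires. A stream of tokens supplies none of these.

```python
import threading

def ring():
    # This side effect is the part no language model can produce by
    # emitting text alone: it needs a running process and a real clock.
    print("⏰ Timer finished!")

# An LLM can describe these two lines perfectly; executing them is a
# different capability entirely.
timer = threading.Timer(15 * 60, ring)  # 15 minutes, expressed in seconds
timer.start()
```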
This isn’t a bug. It’s architecture. Every design choice that makes LLMs impressive at language tasks is the same choice that prevents them from directly controlling systems. Function calling and plugins don’t change this fundamental constraint—they just add intermediaries, verbose and fallible ones, doing what a cheap clock handles natively.
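For illustration, here’s roughly what that intermediary layer looks like, sketched with the OpenAI Python SDK’s function-calling interface. The `set_timer` tool is hypothetical, and notice who does the real work: the model only returns a structured request, and your own code has to carry it out.

```python
from openai import OpenAI

client = OpenAI()

# A hypothetical tool definition; implementing set_timer is our job.
tools = [{
    "type": "function",
    "function": {
        "name": "set_timer",
        "description": "Start a countdown timer on the user's device",
        "parameters": {
            "type": "object",
            "properties": {"minutes": {"type": "integer"}},
            "required": ["minutes"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Set a timer for 15 minutes"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # the model may or may not ask for the tool
    call = message.tool_calls[0]
    # Nothing has happened yet. The "action" is still just structured
    # text until our code parses it and calls the operating system.
    print(call.function.name, call.function.arguments)
```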
Why This One Example Matters More Than Any Benchmark
This is why the “timer test” has become shorthand among AI researchers for measuring the gap between hype and reality. It’s immediate, binary (the timer either fires or it doesn’t), and requires no specialized knowledge to understand.
When Sam Altman publicly acknowledged we’d need “another year” before basic task execution works reliably, he was essentially validating this critique. The timer sits there, mocking $800 billion in investment, reminding everyone that describing something and doing it remain stubbornly different.
Sound familiar?
Understanding Why ChatGPT Has These Limitations
Here’s something that took me a while to fully grasp: when you interact with ChatGPT, you’re not actually running a program. You’re requesting a response from a system that’s trained to predict what text should come next—not execute actions or change anything in the real world.
The architecture problem: prediction vs. execution
This is where most people’s mental model breaks down. LLMs (large language models) are built to do one thing exceptionally well: predict the next token in a sequence. Think of it like the most sophisticated autocomplete you’ve ever used. When you ask ChatGPT to set a timer, it can describe setting a timer in perfect detail—but it’s just generating text that sounds like instructions, not actually running code that touches your device’s clock.
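If it helps to see the autocomplete framing literally, here’s a deliberately simplified sketch. The `model` object is hypothetical (real systems operate on token IDs and logits), but the loop is structurally what’s happening:

```python
# A toy picture of LLM generation: predict a distribution over the
# next token, append the likeliest one, repeat.
def generate(model, prompt_tokens, max_new_tokens=50):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model.next_token_probs(tokens)    # hypothetical call
        tokens.append(max(probs, key=probs.get))  # pick the likeliest token
    return tokens  # text about timers, never a ticking timer
```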
The architecture that makes LLMs remarkable at language tasks is structurally incompatible with many real-world actions. This isn’t a bug waiting to be patched; it’s a fundamental design choice. And honestly? That’s okay—but more on that in a bit.
Why training data creates a text-only world
There’s a concept called training data cutoff that most people never think about. ChatGPT doesn’t know what’s happening right now. It can’t browse live websites, check today’s news, or see what’s on your screen unless you specifically paste it in. The model learned from a snapshot of the internet taken at a specific moment in time.
So when someone says “ChatGPT doesn’t know current events,” they’re describing a literal technical constraint—not a failure of will. Without external tools like web browsing, you’re interacting with a very smart, very informed snapshot of the past.
The session state mystery: why ChatGPT forgets
Here’s one that frustrates a lot of users: start a conversation today, close the tab, come back tomorrow—and it’s like you never spoke before. That’s not ChatGPT being forgetful. It’s because every conversation starts fresh, with no persistent memory, no saved variables, no system access between sessions.
You’re essentially renting a brilliant mind that resets completely with each new chat. The model can’t “remember” you across sessions because there’s no persistent state inside the model itself; any memory feature a product offers is external bookkeeping layered on top.
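You can see that statelessness directly in the API: there’s no session object on the server, so any “memory” is just your code resending the transcript with every turn. A sketch, again assuming the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI()
history = []  # the "memory" lives here, on your machine, not the model's

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=history,  # the full transcript travels with every call
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

# Delete the history list and, as far as the model is concerned,
# you never existed.
```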
This architecture serves a purpose—it makes LLMs excellent at language tasks. But it does mean ChatGPT is closer to a sophisticated autocomplete than a true agent. Sound familiar? That gap between “understands” and “does” is exactly why Sam Altman himself acknowledged it might be “another year” before basic task execution capability becomes reality.
The $800B Disconnect: Investment vs. What We Actually Got
Here’s something that keeps me up at night: the AI industry has raised over $800 billion in total investment, with OpenAI alone securing more than $20 billion—yet ask any of these systems to set a timer and watch it fumble. That’s not a talking point. That’s a literal demonstration of the gap between capital and capability.
Where all that money actually went
The honest answer? Mostly into three buckets: scaling up existing architectures, safety research, and making language models slightly better at generating language. Those are real priorities, but they’re fundamentally about improving what these systems already do well—predicting the next token.
What we haven’t seen is proportional investment in action systems: the kind that would let an AI actually interact with your calendar, your devices, or the physical world.
Why capability benchmarks don’t match consumer experience
This is where most industry reporting gets it wrong. Reports point to benchmarks showing AI passing medical exams or writing competent code and conclude the technology is mature. But here’s the catch: those benchmarks measure text prediction performance.
The moment you need something done—a reminder that actually fires, an email that actually sends, a task that spans three apps—suddenly you’re dealing with a fundamentally different problem. The architecture remains text-prediction-based underneath, and no amount of polishing that core changes what’s missing at the foundation.
Sam Altman’s own timeline admission
Maybe the most telling moment came when Sam Altman publicly acknowledged we’re “about a year away” from AI systems that can reliably execute multi-step tasks. That admission, from the CEO of the world’s leading AI company, carries weight.
What it reveals is that the gap everyone feels between impressive demos and reliable daily use isn’t a temporary bug—it’s a structural acknowledgment that the current architecture has real limits. Companies now face a genuine fork: layer tools and agents on top of existing systems, or rebuild from scratch. Most are choosing the quicker path, which means we’re likely stuck with this disconnect for a while longer.
What ChatGPT Actually Does Well (And Where It Falls Short)
Core Competencies: Where LLMs Genuinely Excel
Here’s the thing nobody talks about enough: ChatGPT is genuinely brilliant at thinking tasks. I’ve spent enough time with it to know that when you need something to wrestle with an idea, synthesize information, or help you debug code, it delivers. Text generation, summarization, analysis, and coding assistance remain impressive and useful.
What surprises people is how good it is at explaining complex topics in ways that actually make sense. It can take a dense technical paper and distill it into something actionable. When I’m stuck on a problem, I’ll often describe it to ChatGPT not because I expect it to solve it for me, but because articulating the problem often leads somewhere unexpected. That’s a thinking tool at its best.
Honest Assessment of Current AI Agent Tools
But here’s where things get uncomfortable. Every AI assistant—Claude, Gemini, ChatGPT—faces the same fundamental limitation: they’re text prediction systems at heart. They describe actions rather than execute them. Function calling exists, sure, but it requires extensive setup, often executes unreliably, and doesn’t match the native integration you’d get from a system-level tool.
I’ve tried the third-party agent tools and automation platforms. They extend capabilities in theory, but in practice you’re adding layers of complexity, cost, and brittleness. A timer or alarm? That sounds simple, but it requires system-level integration that none of these tools have natively.
Why Expectations Matter More Than Features
This is where most tutorials get it wrong: they sell you on what these tools could do with enough engineering, rather than what they actually do today. The $800 billion in AI investment hasn’t changed the fundamental architecture—these are still systems trained on static data, generating text one token at a time.
Sam Altman himself admitted they’re probably “another year” away from reliable basic task execution. That’s coming from the person running the company. When you set realistic expectations—ChatGPT as a thinking tool, not a doing tool—you avoid a lot of frustration and actually use it where it shines.
The Path From Chatbots to AI Agents: What’s Actually Needed
Here’s something that stuck with me after watching the video: the AI industry has poured over $800 billion into these systems, and yet the most requested feature from users is something a basic smartphone app could do in 2009 — setting a timer. That gap between investment and capability isn’t an accident. It reveals something fundamental about where AI actually is versus where we assume it is.
What True AI Agents Require That LLMs Lack
The LLMs we use today are text-prediction engines dressed up as thinking machines. They generate responses token by token, with no memory of your previous session, no ability to open your calendar, and no pathway to actually do things in the world. Setting a timer seems trivial, but it requires persistent state, system-level access, and cross-application coordination — capabilities that live in a completely different architectural layer than language generation.
True AI agents need five things current models simply don’t have: persistent state across interactions, direct system access, reliable tool execution, error recovery when something fails, and the ability to coordinate across multiple applications. Right now, even sophisticated tasks require humans as the connective tissue.
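To see how different that layer is from language generation, here’s a toy sketch of an agent loop. Everything in it is a stand-in (the `llm_decide` planner and the tool registry are hypothetical), but it shows three of the five requirements in miniature: tool execution, error recovery, and persistent state.

```python
import json

TOOLS = {"set_timer": lambda minutes: f"timer set for {minutes} min"}
state = {"completed": []}  # persistent state the bare model lacks

def run_agent(goal: str, llm_decide, max_retries: int = 3):
    """llm_decide is a hypothetical planner: (goal, state) -> JSON action."""
    for _ in range(max_retries):
        action = json.loads(llm_decide(goal, state))
        tool = TOOLS.get(action["tool"])
        if tool is None:
            continue  # recovery: the model asked for a tool we don't have
        try:
            result = tool(**action["args"])    # actual tool execution
            state["completed"].append(result)  # state that outlives the call
            return result
        except Exception:
            continue  # recovery: retry instead of giving up
    raise RuntimeError(f"could not complete: {goal}")
```

Even this toy version forces a dozen decisions (what counts as failure, when to retry, where state lives) that a pure language model never has to make.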
Industry Developments to Watch in 2025
Companies are racing to build what’s being called “agentic” layers — essentially scaffolding that wraps around LLMs to give them tool use, retry logic, and session memory. This is clever, but it’s a workaround. The video makes a point I hadn’t considered: this transition requires entirely different system architectures, not just better models. We’re not upgrading a house; we’re trying to add a second story to a foundation built for a bungalow.
Sam Altman publicly admitted the timeline is “about one year” for reliable, consumer-ready task execution. I’ve learned to take his estimates with a grain of salt — but even if he’s right, that’s one year until basic automation, not the capable agents we imagine.
The Honest Implication
For the foreseeable future, AI will assist your thinking while human hands still do the doing. That’s not pessimism — it’s the realistic gap between where we are and where the industry needs to go.
Frequently Asked Questions
What can ChatGPT actually do in 2025?
ChatGPT excels at text generation, analysis, and coding assistance in 2025, but it’s fundamentally a prediction engine—it generates likely next words, not real-world actions. What I’ve found is that tasks like drafting emails, explaining concepts, debugging code, and summarizing documents work well because they stay within the text domain. The moment you need something to actually happen in your physical or digital environment—setting a reminder, sending a message, moving a file—ChatGPT hits a wall.
Why can’t AI assistants like ChatGPT set timers or take real actions?
ChatGPT was designed purely as a language model with no access to your operating system, device hardware, or installed applications. When you ask it to set a timer, it can only generate text describing a timer—it has no pathway to send a command to your phone or computer. In my experience, this isn’t a bug that needs fixing; it’s a fundamental architectural choice. To actually execute timers and alerts, you need what I call ‘system-level hooks’—direct integration points into iOS, Android, Windows, or macOS that simply don’t exist in standard LLM architectures.
What is the difference between ChatGPT and AI agents?
Think of it this way: ChatGPT predicts what to say next, while an AI agent predicts what to do next and then does it. A chatbot receives ‘Write me a Python function’ and outputs text. An AI agent receives the same request, writes the code, then opens your IDE, saves the file, runs tests, and reports back. The gap is massive—roughly 80% of what people expect from AI today requires agentic capabilities that don’t reliably exist yet. Agents need tool use, memory persistence, and real-time feedback loops that pure LLMs weren’t built for.
When will ChatGPT be able to reliably execute tasks for me?
Sam Altman himself said ‘another year’ before basic task execution works reliably—and that was in late 2024. What I’ve observed is that the industry is funneling much of that $800 billion toward exactly this capability, but execution is hard. Builders need to solve security sandboxing (preventing AI from doing harmful things), permission systems (determining what AI can access), and rollback mechanisms (undoing mistakes). My prediction: you’ll see incremental improvements starting in late 2025, but truly reliable agentic behavior across all platforms is probably 2-3 years away for enterprise use, longer for consumer.
Are there AI tools that can actually take actions instead of just generating text?
Yes, but they’re niche and come with caveats. Tools like Browser-use, OpenAI’s Operator, and various RPA (Robotic Process Automation) integrations can actually click buttons, fill forms, and navigate interfaces. What I’ve found is they work best in constrained environments—like booking flights, managing calendar invites, or controlling a single application. The moment tasks require cross-app coordination or handling unexpected UI changes, failure rates spike. These aren’t replacements for general-purpose assistants yet; they’re specialized tools for specific, repetitive workflows.
If you’re trying to automate workflows or reduce manual tasks, understanding what AI can and cannot do natively will save you hours of frustration—start with what ChatGPT actually excels at before building around its limitations.
Onur
AI Content Strategist & Tech Writer
Covers AI, machine learning, and enterprise technology trends.