Having just read a paper - https://arxiv.org/pdf/2412.13501 - a survey on GUIs in the context of AI agents, I'm thinking about why people (read: developers) are excited about agents/agentic AI. On some level this feels like developers rushing to solutions without thinking like designers, e.g. they're not asking the simple question: what problem are we solving here? Or, put another way, what jobs would people hire an agentic AI to do?
Staying on this trajectory puts Agentic AI on the path of being a solution in search of a problem.
The paper's premise, at a high level, is that in order to use websites (or presumably apps) the way people do, AI agents need to "think" and "behave" like people, i.e. they require perception, reasoning, planning, and acting capabilities.
This feels backwards to me.
- Let's start with the problems people need solved, and what they might hire an AI agent to do
- Instead of constraining the solution to the current paradigm of navigating to a webpage or opening an app (and having an AI agent do this for you), let's explore alternative approaches: abstracting the familiar UIs away and delivering the solution programmatically, powered by APIs and supported by a UI better suited to this new interaction model (see the sketch after this list)
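To make the contrast concrete, here's a minimal sketch in Python. Everything in it is hypothetical - the helper names, the retailer, the API endpoint - and it isn't drawn from the paper or any real product; it just illustrates the same job ("check the status of my order") done the GUI-agent way versus programmatically.

```python
import requests

# --- Today's GUI-agent approach (hypothetical helpers, shown as comments) ---
# The agent has to do what a person would do, step by step:
#
#   screenshot = capture_screen()                      # perception
#   next_step  = plan("find order status", screenshot) # reasoning + planning
#   click(next_step.target)                            # acting
#   ...repeat until the order status is visible on screen...

# --- Programmatic alternative (hypothetical API, no GUI in the loop) ---
def get_order_status(order_id: str, api_key: str) -> str:
    """Fetch an order's status directly; no page navigation required."""
    resp = requests.get(
        f"https://api.example-retailer.com/v1/orders/{order_id}",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["status"]
```

The difference is the interaction model: the first path spends its effort interpreting a UI built for humans, the second goes straight to the data and leaves the UI question open for something better suited to agents.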
I get that there's already so much out there that could be done by AI agents simply using websites and apps on our behalf, versus having to redesign everything that already exists to be "programmatic", but, again, this somehow feels very inefficient.
Perhaps this is an interim approach, a steppingstone (and a way to train AI agents on the dimensions of perception, reasoning, planning, and acting).
It reminds me a bit of how Shoprunner went to market by having bots place orders on multiple seller websites on behalf of their members, so that they could offer a unified interface (which also helped members discover brands/sellers they could shop from) without the need to manually navigate to multiple websites. It also enabled Shoprunner to scale faster, as it removed the dependency on sellers adding Shoprunner code to their websites or integrating with an API.
Maybe this is similar.
However, I still think it's crucial to start with the underlying jobs to be done.
What are the most common things people do on the web and in apps that would lend themselves to automation by AI agents, ultimately (or maybe even directly) via a programmatic solution delivered through an abstracted UI?
So much of what we do on the web and in apps could count as "entertainment" even if it isn't explicitly positioned or served up as such, e.g. social is entertainment, messaging is entertainment, shopping is entertainment, etc. I use "entertainment" in the loosest sense here: it offers distraction, is fun, or helps pass the time.
What happens to this broad entertainment in the context of agentic AI? Does it lend itself to agentic use-cases? Does agentic AI become its own form of entertainment? To some extent we're already seeing this with generative AI products like ChatGPT, Anthropic's Claude, MidJourney, Sora, and Meta AI (entertainment is an explicit use-case for Meta AI), which are fun to use, offer a distraction, and help pass the time, and therefore count as entertainment.