Most people’s experience of AI so far has been underwhelming at best. Copilot summarises a meeting and gets the action items wrong. ChatGPT confidently tells you strawberry has two r’s. You ask a customer service bot a real question and get a menu, or a decision tree clearly designed around the organisation’s internal logic, not yours. You give up, you call the number, you wait on hold. The AI was supposed to help, but it just got in the way.
I’m betting this is as familiar to you as it is to me. It’s the same pattern that has cursed the reputation of government digital services for decades. A service gets shoved out the door because the minister wants it live yesterday. It’s scoped as an IT project, built to a spec, technically functional, and completely indifferent to the people who’ll actually use it. No one designed the experience. No one tested it with real users in real contexts. It works—in the sense that it exists and does things—but it doesn’t work for anyone.
AI is heading down the same path, faster.
Without good design around them, AI-powered services aren’t better than what came before—they’re just worse in new ways. Instead of “I’m sorry, I didn’t understand that,” you get a confident, plausible answer that happens to be wrong. Instead of a rigid decision tree, you get a system that’ll happily help you bake an orange cake when you came to make a complaint about a government agency. The failure mode has changed from deflection to hallucination, from “can’t help” to “will help with absolutely anything, whether or not it should.” This is a design and implementation problem, not an AI problem.
Most of the fear about AI in services—and in government especially—sits right here. The model might say the wrong thing, make something up, or go somewhere it shouldn’t. These are legitimate concerns. But they’re symptoms of what happens when you put a powerful language model in front of people without designing the experience around it. Every one of those risks is addressable through design.
Hallucination drops when you connect the model to your own verified knowledge base, so it’s synthesising information you’ve already validated rather than improvising from its training data. Going off-piste is contained by guardrails that define what the system should and shouldn’t discuss. Sensitive situations are caught by content detection that triggers escalation to a person. Accountability comes from monitoring that gives staff visibility into every conversation. This is the harness. It’s where the overwhelming majority of the design work lives.
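To make the harness concrete, here’s a minimal sketch of how those four layers might compose around a model. Everything in it is illustrative, not the system we shipped: the patterns, the toy knowledge base, and the `retrieve` and `handle` functions are hypothetical stand-ins for the real retrieval, classification, and audit infrastructure.

```python
import re
from dataclasses import dataclass

@dataclass
class Reply:
    text: str
    escalated: bool = False

# Illustrative patterns only; a real deployment would develop these with the
# team accountable for what the system says.
DISTRESS = re.compile(r"\b(distress(ed)?|unsafe|self-harm)\b", re.I)
OFF_TOPIC = re.compile(r"\b(recipe|bake|cake)\b", re.I)

# Stand-in for a verified knowledge base: topic -> content you've validated.
KNOWLEDGE_BASE = {
    "complaint": "You can make a complaint online or by phone. Here's how...",
    "timeframe": "Most complaints receive an initial response within 10 days.",
}

def retrieve(message: str) -> str | None:
    """Toy retrieval: return validated content that matches the question."""
    for topic, content in KNOWLEDGE_BASE.items():
        if topic in message.lower():
            return content
    return None

def handle(message: str) -> Reply:
    # 1. Content detection: distressing situations go straight to a person.
    if DISTRESS.search(message):
        return Reply("I'm connecting you with a member of our team.", escalated=True)
    # 2. Guardrails: stay in lane; decline what the service shouldn't discuss.
    if OFF_TOPIC.search(message):
        return Reply("I can only help with questions about making a complaint.")
    # 3. Retrieval grounding: synthesise from verified content, don't improvise.
    context = retrieve(message)
    if context is None:
        return Reply("I don't have verified information on that. "
                     "Would you like to talk to someone?")
    reply = Reply(context)  # in production, a model would synthesise this
    # 4. Monitoring: staff get visibility into every conversation.
    print(f"[audit] user={message!r} reply={reply.text!r}")
    return reply

if __name__ == "__main__":
    print(handle("Can you help me bake an orange cake?").text)
    print(handle("How do I make a complaint about an agency?").text)
```

The shape is the point: the model only ever sees questions that pass the checks around it, it can only draw on content someone has validated, and everything it says is visible to the people accountable for it.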
Design here means more than a lick of paint on the interface.
It means humanity, feel, care, usefulness and usability designed into every layer of the system—the retrieval architecture, the conversational tone, the escalation logic, the choice of which model to use for which task. It means collaborative workshops with the teams who’ll be accountable for what the system says. Content strategy that encodes plain language and appropriate tone, because people using these services need to feel heard and oriented, not processed. Co-design with the people who’ll actually use the thing. Testing with real people in real contexts, because the edge cases that matter only surface when you’re close enough to the complexity. The material is new. The methods are not.
This matters more with AI than it ever has with any technology we’ve worked with. The gap between a carelessly deployed AI service and a thoughtfully designed one is enormous: it’s the difference between something that actually works for the human on the end of it and another moment of frustration.
It’s 11pm and you need to understand how to make a complaint about a government agency. You’re frustrated, maybe anxious, and there’s no one to call. You type your question in plain language and get a clear, accurate answer drawn from the organisation’s actual knowledge base. When you go somewhere the system shouldn’t follow, it tells you. When your situation gets complicated or distressing, it connects you to a person. It feels like talking to someone who knows what they’re talking about, stays in their lane, and wants to help you find your way through.
That experience exists. We built it for the NSW Ombudsman, and the distance between that and “sure, let’s bake an orange cake” is design—applied all the way through, not bolted on at the end.
The techniques are proven and in production.
Retrieval architectures, content guardrails, escalation logic, human-in-the-loop monitoring—they work within existing governance frameworks like the NSW AI Assessment Framework. They can be piloted at a scale proportionate to your risk appetite, and expanded as your confidence grows. The models themselves will keep getting cheaper, more energy efficient, and more capable. Any organisation can access the same engine.
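One way to picture “piloted at a scale proportionate to your risk appetite” is as explicit, reviewable configuration. This is a hypothetical sketch, not drawn from the NSW AI Assessment Framework or any product; the field names are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class PilotConfig:
    # Start narrow: one topic, a sliver of traffic, full human review.
    topics_in_scope: list[str] = field(default_factory=lambda: ["complaints"])
    traffic_fraction: float = 0.05    # serve 5% of visitors during the pilot
    review_sample_rate: float = 1.0   # staff review every conversation at first
    escalate_on_uncertainty: bool = True

# As confidence grows, widen scope and shift human review to spot-checking.
EXPANDED = PilotConfig(
    topics_in_scope=["complaints", "feedback", "status-updates"],
    traffic_fraction=0.5,
    review_sample_rate=0.1,
)
```

The design choice is that scope, traffic, and oversight are settings you loosen deliberately, not properties that drift on their own.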
We can do so much better than the AI services most people are experiencing. The tools are here, right now. What’s been missing is good design—the same discipline of starting with people, understanding context, and caring about the quality of the experience that separates good services from bad ones in every other domain. AI just raises the stakes.