Why We Need a Voice OS: Beyond the Cursor Interface
For the past couple of weeks I’ve been actively working on a speech recognition and dictation app. Along the way I keep getting ideas for new features. For example, I immediately added an output processing option where you can enter a prompt and it’ll run your transcript through GPT.
Turning all these ideas over in my head and seasoning them with content from my info feed, I came to the conclusion that the concept of a voice OS, a voice operating system, isn’t actually crazy!
I’ve noticed that many people are starting to use Cursor as the central hub for organizing their life and business. They store information about themselves and their projects there, connect various MCPs and agents, and manage their lives entirely through this interface. Example 1 and example 2.
But for this kind of use, most of Cursor’s current interface is unnecessary. It shouldn’t be like this. Users don’t need all that scaffolding. You shouldn’t have to launch a separate project in Cursor to get started; it rips you out of context.
When I started dictating more, I got the urge to press fewer buttons and got lazy about opening files and programs. That led to the idea of a voice assistant built into the system. If you develop that thought, it becomes clear that we don’t really need the whole interface, except in situations that require visual interaction.
As a user, I want to always have access to my database and knowledge. A simple dictation and speech recognition app can easily become a tool where you can give commands and connect MCPs and agents. You can say “in such-and-such project add such-and-such task”, and the agent generally has enough intelligence and tools to find your project and make the change there.
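To make the “in such-and-such project add such-and-such task” idea a bit more concrete, here is a minimal sketch of how a transcript could be turned into an action. Everything in it is an assumption for illustration: the `Project` and `Command` types, the `parse_command` and `handle_transcript` functions, and the in-memory dictionary standing in for a real knowledge base are all hypothetical. A regular expression stands in for the agent’s language understanding, and no MCP server or LLM is actually called; a real agent would be far more flexible about phrasing.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Project:
    """Hypothetical stand-in for a project in the user's knowledge base."""
    name: str
    tasks: list[str] = field(default_factory=list)


@dataclass
class Command:
    """Structured form of a spoken 'add task' request."""
    project: str
    task: str


# Assumed phrasing: "in [the] <project> project add <task> [task]".
# A real agent would use an LLM here; the regex is only a placeholder.
ADD_TASK = re.compile(
    r"in\s+(?:the\s+)?(?P<project>.+?)\s+project,?\s+add\s+"
    r"(?:a\s+|the\s+)?(?P<task>.+?)(?:\s+task)?[.!]?$",
    re.IGNORECASE,
)


def parse_command(transcript: str) -> Optional[Command]:
    """Return a Command if the transcript matches the assumed phrasing."""
    match = ADD_TASK.match(transcript.strip())
    if match is None:
        return None
    return Command(project=match["project"], task=match["task"])


def handle_transcript(transcript: str, projects: dict[str, Project]) -> str:
    """Find the named project and add the task, replying in plain text."""
    command = parse_command(transcript)
    if command is None:
        return "Sorry, I did not understand that."
    # Projects are keyed by lowercase name so spoken casing does not matter.
    project = projects.get(command.project.lower())
    if project is None:
        return f"No project named '{command.project}'."
    project.tasks.append(command.task)
    return f"Added '{command.task}' to {project.name}."


if __name__ == "__main__":
    store = {"voice app": Project("Voice App")}
    print(handle_transcript(
        "In the Voice App project add fix microphone permissions task.", store
    ))
```

The point of the sketch is the shape of the pipeline: transcript in, structured command in the middle, a tool call against your own data at the end. Swapping the regex for a model and the dictionary for an MCP-connected tool would leave that shape unchanged.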
I want to automate managing various interfaces and agents and make it accessible through natural human interfaces. We can’t create universal physical buttons to control everything with our hands, but we can already control many tasks with our voice.
For example, I’d like to reply to emails by voice. Sure, I’d still open the email client with the mouse, but after that I wouldn’t need to click through folders or search for emails. It would be enough to tell the voice agent to open an email from a specific person and draft a reply or schedule a meeting. That would be perfect.
The voice OS concept fits this vision of the future. Curious what you think about this: cool or sketchy?