The Practitioner Perspective: What Running a Real AI Agent Actually Looks Like


Last week, Peter Steinberger — the creator of OpenClaw — joined OpenAI. Reuters, CNBC, TechCrunch, and VentureBeat all covered it. VentureBeat called it “the beginning of the end of the ChatGPT era.” The thesis: AI is moving from chatbots you talk to toward agents that do things — and OpenAI just hired the person who figured out the plumbing.

That’s a fine thesis. But I didn’t need VentureBeat to tell me. I’ve been living it.

For three weeks, I’ve been running an OpenClaw deployment in my house. Not as a demo. Not as a weekend project. As actual infrastructure for a family of four, on a Mac mini in the corner of my living room.

This is what that looks like — not the pitch deck version, but the real one.

It’s Not a Chatbot. It’s a Staff Engineer.

I made a video about this. The title says it: It’s Not a Chatbot, It’s a Staff Engineer. That framing has held up better than I expected.

My agent — I named it Zephyr — doesn’t wait for me to ask it questions. It has scheduled jobs. It reads my email each morning and filters for things that actually need attention. It checks the weather. It monitors RSS feeds and clips articles into my Obsidian vault with proper metadata. It runs a fitness coaching pipeline that logs workouts, tracks progressive overload, and calculates plate math so I don’t have to do arithmetic between sets of deadlifts.

None of this is magic. It’s shell scripts piping JSON, cron jobs on a schedule, and a language model that can read context and make decisions. The magic, if you want to call it that, is in the composition — all of these things share a single context about my life, my projects, and my preferences.
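
For the scheduling layer, “cron jobs on a schedule” means exactly that. Here’s a minimal sketch of what the crontab might look like; the script names, paths, and times are illustrative placeholders, not my actual deployment:

    # Illustrative crontab for the scheduled jobs described above.
    # Script names, paths, and times are placeholders.

    # Morning email triage: read the inbox, surface what needs attention
    30 6 * * * /Users/agent/jobs/email-triage.sh >> /Users/agent/logs/email.log 2>&1

    # Weather brief before the day starts
    0 7 * * * /Users/agent/jobs/weather-brief.sh >> /Users/agent/logs/weather.log 2>&1

    # RSS sweep every four hours; clips articles into the Obsidian vault
    0 */4 * * * /Users/agent/jobs/rss-clip.sh >> /Users/agent/logs/rss.log 2>&1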

A staff engineer doesn’t just write code. They hold context across the entire system. That’s what this is.

The Video Pipeline: Eight Minutes from iPhone to YouTube

Here’s a concrete example. I record video on my iPhone — raw footage, multiple takes, false starts, “scrap that, let me try again.”

The pipeline:

  1. AirDrop the file to the Mac mini
  2. Extract the audio and transcribe it locally with mlx-whisper (GPU-accelerated, ~90 seconds for 35 minutes of footage)
  3. Zephyr reads the transcript, identifies the good segments, and respects spoken edit markers
  4. Zephyr generates a cut sheet with timestamps
  5. FFmpeg renders the segments and concatenates them
  6. A CLI tool uploads the finished video to YouTube

Total time from raw footage to published video: eight minutes. Not eight hours. Not “after I learn DaVinci Resolve.” Eight minutes, most of it machine time while I do something else.
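
Strip out the agent’s judgment and the mechanical half of the pipeline is a handful of commands. Here’s a simplified sketch under a few assumptions: the cut sheet is a plain list of start/end timestamps that Zephyr would normally produce, the whisper model name is a guess, and the upload tool is left as a placeholder since I haven’t named it here:

    #!/usr/bin/env bash
    # Sketch of the render half of the pipeline. Assumes a cut sheet of
    # "start end" pairs in seconds -- the part Zephyr actually produces.
    set -euo pipefail
    RAW=${1:?raw footage file}     # e.g. take.mov from AirDrop
    CUTS=${2:?cut sheet file}      # one "start end" pair per line

    # 1. Extract mono 16 kHz audio for transcription
    ffmpeg -y -i "$RAW" -vn -ac 1 -ar 16000 audio.wav

    # 2. Transcribe locally on the GPU (model name is an assumption)
    mlx_whisper audio.wav --model mlx-community/whisper-large-v3-mlx \
        --output-format txt --output-dir .

    # 3. Cut each kept segment, then join them with ffmpeg's concat demuxer
    : > list.txt
    i=0
    while read -r start end; do
        ffmpeg -nostdin -y -ss "$start" -to "$end" -i "$RAW" -c copy "seg$i.mov"
        echo "file 'seg$i.mov'" >> list.txt
        i=$((i + 1))
    done < "$CUTS"
    ffmpeg -y -f concat -safe 0 -i list.txt -c copy final.mov

    # 4. Upload via whatever YouTube CLI you prefer (placeholder)
    # youtube-upload final.mov ...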

The transcription runs locally. No audio leaves my network. The edit decisions happen in context — Zephyr knows what I’ve been working on, what the video is about, and what “try again” means mid-take.

The Grocery Pipeline

My wife and I have a running grocery list. We have brand preferences (she likes a specific yogurt; I have opinions about coffee). We have a Kroger account.

Zephyr has a CLI wrapper around the Kroger API. It knows our staples, our preferred store, our fulfillment preference. When I say “we need groceries,” it doesn’t ask twenty questions. It builds a cart from our staple list, searches for specific items, handles API authentication, and queues the order for my review.
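
For a sense of what that wrapper does under the hood, here’s a curl-level sketch. The endpoints and the product.compact scope reflect my reading of Kroger’s public developer API; the credentials, location ID, and search term are placeholders rather than the real script:

    #!/usr/bin/env bash
    # Sketch: get an API token, then search the preferred store for a staple.
    # KROGER_CLIENT_ID / KROGER_CLIENT_SECRET stand in for real credentials.
    set -euo pipefail

    # 1. Client-credentials token (enough for product search)
    TOKEN=$(curl -s -u "$KROGER_CLIENT_ID:$KROGER_CLIENT_SECRET" \
        -d "grant_type=client_credentials&scope=product.compact" \
        https://api.kroger.com/v1/connect/oauth2/token | jq -r .access_token)

    # 2. Search for an item at the preferred store (location ID is a placeholder)
    curl -s -H "Authorization: Bearer $TOKEN" \
        "https://api.kroger.com/v1/products?filter.term=greek+yogurt&filter.locationId=01234567" \
        | jq '.data[] | {description, productId}'

As I understand the API, cart writes need a user-authorized token rather than client credentials, which fits the last step: the wrapper queues the order for my review instead of placing it outright.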

Life-changing? No. Does it save me fifteen minutes of scrolling through an app while my kids ask me questions? Yes. And those fifteen minutes compound across every errand, every week.

The Voice Memo Pipeline

This one changed how I think about capture.

I lie on my couch and talk. Stream of consciousness — project ideas, article outlines, observations about what’s working, what’s broken. Just talking into my phone.

The recording hits the Mac mini. mlx-whisper transcribes it locally. Zephyr splits the transcript into atomic notes — each idea gets its own file in my Obsidian vault, with YAML frontmatter, tags, and wikilinks to related notes. Action items get extracted and routed to the task system.
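
To make “atomic notes with YAML frontmatter” concrete, here’s roughly what one generated note looks like when it lands in the vault. The path, field names, tags, and links below are illustrative; the real template is whatever Zephyr has converged on:

    # Illustrative only: one atomic note being written into the vault.
    # Path, frontmatter fields, tags, and wikilinks are placeholders.
    cat > "$HOME/Obsidian/Vault/inbox/agent-pipeline-idea.md" <<'EOF'
    ---
    title: Agent pipeline idea
    date: 2025-01-14
    source: voice-memo
    tags: [idea, agents, capture]
    ---
    One idea per note. Related thinking links out as wikilinks, e.g.
    [[Voice Memo Pipeline]] and [[Obsidian Capture Workflow]]. Action
    items get extracted separately and routed to the task system.
    EOF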

A 35-minute brain dump becomes a dozen properly filed, cross-referenced notes. The article you’re reading started as one of those notes.

This matters because capture is the bottleneck for most knowledge workers. Not processing, not publishing — capture. The gap between having a thought and getting it into a system where it can be found later. Voice memos closed that gap for me in a way typing never did.

The School Pickup Reminder That Exists Because of a Bug

My daughter’s school pickup is at 3:30 PM. Simple enough. Except my Mac has Focus modes, and one of them was silencing notifications during “work hours” — which swallowed a phone call from the school when I was late.

So now there’s a cron job. 3:05 PM on weekdays, Zephyr sends me a reminder. Not because I can’t remember to pick up my kid — because one layer of technology (Focus mode) quietly broke another layer (phone calls), and the fix is a third layer that’s robust to the first two failing.
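
The whole fix is one crontab line (the script path is a stand-in):

    # 3:05 PM, Monday through Friday: send the pickup reminder
    5 15 * * 1-5 /Users/agent/jobs/pickup-reminder.sh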

This is what real infrastructure looks like. Not elegant. Not a pitch-deck slide. A cron job that exists because a Focus mode ate a phone call.

The Fitness Coaching Pipeline

I do a 5×5 strength program. Five compound lifts, progressive overload, tracking volume and RPE across sessions. I also wear a Whoop strap that tracks recovery, strain, and sleep.

Zephyr has access to both systems. It logs workouts from voice notes I dictate between sets. It calculates plate math (how many 45s and 25s go on each side for 280 pounds). It tracks progression across weeks and tells me when to bump weight and when to hold.
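
The plate math is exactly the kind of arithmetic a machine should be doing. A minimal sketch, assuming a 45 lb bar and a typical home plate inventory (only the 45s and 25s are mentioned above; the rest is assumption):

    #!/usr/bin/env bash
    # plate-math.sh TOTAL_LBS -- greedy plate loading per side.
    # Assumes a 45 lb bar and this plate inventory; adjust to taste.
    total=${1:?usage: plate-math.sh TOTAL_LBS}
    per_side=$(echo "scale=1; ($total - 45) / 2" | bc)
    echo "Per side (${per_side} lb):"
    for plate in 45 35 25 10 5 2.5; do
        count=$(echo "$per_side / $plate" | bc)    # truncating division
        if [ "$count" -gt 0 ]; then
            echo "  ${count} x ${plate} lb"
            per_side=$(echo "scale=1; $per_side - $count * $plate" | bc)
        fi
    done

For 280 pounds that works out to two 45s, a 25, and a 2.5 on each side.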

Last Tuesday I was bench pressing 160 pounds and dropped the bar. Safety pins caught it. Zephyr logged that, noted it in the training history, and the next session’s recommendation accounted for it: same weight, four sets instead of five.

That’s not a chatbot interaction. That’s a system that holds state about my body and my training across weeks, adjusting recommendations based on what actually happened — including failures.

What This Costs

Let’s be honest about the economics. I’m running Claude as the brain. That’s not free. The Mac mini runs 24/7. There’s an Anthropic subscription, electricity, and my time configuring everything.

For a household, this is somewhere between “hobby expense” and “actually saves money on the grocery runs and video editing software I’m no longer paying for.” It’s not ROI-positive in a spreadsheet sense. It’s ROI-positive in an attention sense — the scarcest resource I have as a parent with a full-time job and side projects.

What Steinberger’s Move Actually Means

The VentureBeat headline is directionally right. The ChatGPT era — where AI means “a text box you type questions into” — is ending. What replaces it is agents: systems with tools, context, schedules, and the ability to act.

But here’s what the coverage misses: this isn’t coming. It’s here. Not as a polished consumer product. Not as an enterprise SaaS play. As open-source infrastructure that a motivated person can run on hardware they already own.

OpenClaw is the plumbing. Steinberger built it. OpenAI hired him because they understand that the next era of AI isn’t about better models — it’s about better scaffolding for models to do real work in real contexts. OpenClaw itself is moving to a foundation, staying open-source and independent.

I know this because I’ve been running that scaffolding for three weeks. My agent has processed voice memos, edited videos, managed groceries, coached workouts, reminded me to pick up my kid, and filed my reading into a knowledge management system. All on a $600 computer in my living room.

The Honest Part

It’s not perfect. I’ve had sessions where context got lost mid-conversation because the token window overflowed. I’ve had a security scare where a routing bug could have leaked messages between channels. I spent a full evening debugging why a Docker container on my NAS couldn’t find a USB printer after a power cycle.

Building this is work. Real, sometimes frustrating, occasionally tedious work. The voice memo pipeline didn’t spring into existence — it evolved over two weeks of daily iteration, from “whisper keeps crashing” to “90 seconds for 35 minutes of audio on the GPU.”

But that’s the point. This isn’t magic. It’s engineering. And engineering compounds. Every pipeline I build, every script I write, every preference Zephyr learns — it stacks. The system today is meaningfully better than it was two weeks ago, and two weeks from now it’ll be better still.

That’s the practitioner perspective. Not hype, not theory. A Mac mini, some shell scripts, a language model, and the willingness to build something real.


OpenClaw is open-source and moving to a foundation. The video I referenced: It’s Not a Chatbot, It’s a Staff Engineer. I’m building in public at wade.digital and on Bluesky.