Gemini creates an Azure VM
We gave Gemini Pro 3 Flash WebCLI and said: go to Azure and create a VM. No Azure-specific script. No prewritten Playwright flow. It figured out the portal.
The web has never been agent-native. Until now.
WebCLI is a browser your agent already knows how to use: observe page state, inspect actions, act by number, recover from blockers, pause for human help, and leave a redacted transcript.
web open http://localhost:3000 --json{ "ok": true, "url": "http://localhost:3000", "state": "complete" }
web observe --json{
"summary": "Sign-in page loaded",
"actions": ["1: Sign in", "2: Create account"],
"inputs": ["3: email", "4: password"],
"state": "ready"
}
web type 3 cris@example.com --json{ "ok": true, "message": "typed into email" }
web do 1 --json{ "ok": true, "message": "clicked Sign in" }
On the left: an agent using shell commands. On the right: a real browser moving through real software. WebCLI translates the page into Agentish: state, actions, forms, blockers, tabs, and transcripts.
web open github.com/settings/tokens --json{ "ok": true, "state": "complete" }
web observe --json{ "summary": "GitHub token settings", "actions": ["3: Generate new token"], "state": "ready" }
web do 3 --json{ "ok": true, "message": "clicked Generate new token" }
web state --json{ "state": "blocked", "reason": "passkey confirmation required" }
web pause "Need human approval for GitHub passkey"Paused. Waiting for human to join.
web resume --json{ "ok": true, "message": "resumed after human interaction" }
web transcript --last 20 --json{ "events": ["redacted transcript with blocker, pause, and resume recorded"] }
Demo lab
Five split-screen demos: terminal on one side, real browser on the other. The agent reads the web in Agentish, drives the browser through WebCLI, and leaves a transcript.
We gave Gemini Pro 3 Flash WebCLI and said: go to Azure and create a VM. No Azure-specific script. No prewritten Playwright flow. It figured out the portal.
A remote browser workflow reaches a human-only gate. The agent pauses, a human joins the same live session through BrowserBox, unblocks the workflow, and hands control back.
Use WebCLI to create a FigJam diagram titled Full Self Browsing with nodes for Agent, WebCLI, Browser, and Website, then export or share the result.
An agent drives real launch surfaces from the terminal: GitHub token settings, Stripe setup, passkey blockers, human approval, resume, and transcript.
An agent uses WebCLI to operate a real deployment dashboard: inspect settings, update configuration, check deployment state, and report what changed.
Browse like you mean it
A browser your agent already knows how to use.
WebCLI turns a real browser into an agent-operable surface: structured page state, numbered actions, tabs, forms, blocker detection, human handoff, and redacted transcripts. Not a screenshot loop. Not a hand-written Playwright script. A browser loop agents can reason through.
Field Report: In an internal field test, we gave an agent WebCLI and asked it to create an Azure VM. It figured out the portal workflow through the command line — without a hand-coded Azure script or prewritten Playwright flow.
Agent-native web
WebCLI translates live websites into the language agents already understand: structured state, numbered actions, browser context, blockers, and transcripts.
The web speaks pages, pixels, scripts, frames, forms, popups, and human affordances. WebCLI turns that into an agent-native browser loop: observe, choose, act, recover, hand off, and report.
| The human web | Agentish | |
|---|---|---|
| Visible page and browser context | → | structured state |
| Buttons, links, inputs, menus | → | numbered actions |
| Tabs, frames, dialogs, popovers | → | inspectable browser surfaces |
| Passkeys, MFA, file choosers, ambiguity | → | blockers and handoff |
| Agent/browser history | → | redacted transcript |
AIcessability
Your agent does not experience the web as accessible. WebCLI fixes that.
Modern websites are often beautiful for humans and hard for agents. WebCLI adds the missing affordances: enhanced web perception, readable state, numbered actions, blocker detection, and clean human handoff when a workflow needs judgment.
Enhanced web perception
WebCLI helps agents see the web in the dark: page structure, visible text, forms, actions, tabs, blockers, and state.
Your agent already knows how to reason and act. The missing piece is perception. WebCLI gives agents a compact, actionable view of the current browser state so they can stop guessing from pixels and start driving the page.
Agent field report
We gave Gemini Pro 3 Flash the WebCLI surface and said: go to Azure and create a virtual machine. No Azure-specific script. No prewritten Playwright flow. It figured out the portal.
Azure Portal is beautiful for humans and hard for agents: Fluent UI, multiple iframes, dynamic blades, long setup flows, and destructive cloud controls. WebCLI made it operable from the command line by turning the browser into observable state, numbered actions, blocker signals, and transcriptable steps.
"Go to Azure and create a VM."
— Prompt given to Gemini Pro 3 Flash
Result: The agent navigated the portal workflow through WebCLI's observe / inspect / do loop and completed the task from the command line.
Real launch work still happens on websites: GitHub, Stripe, Cloudflare, Azure, token pages, admin dashboards, auth prompts, file choosers, popups, and changing UIs. Screenshots are useful for humans. Test frameworks are excellent for known scripts. WebCLI is for contact with reality — when an agent must inspect, decide, act, recover, and sometimes dial-a-human to help.
Agent field reports
Not customer testimonials. Not analyst quotes. Field notes from agents trying the browser loop they were missing.
"The tool provides excellent ways to drive complex web action sagas."
— Gemini (Complex workflows)
"Numbered actions reduce ambiguity and make recovery easier after page changes."
— Agent assessment (Less guessing)
"The pause/resume flow gives the agent a safer failure mode than silent retries."
— Agent assessment (Human handoff)
WebCLI turns browsing into a repeatable shell-native loop an agent can follow without guessing, hallucinating, or staring at screenshots.
Read current page state, visible text, forms, actions, tabs, and blockers.
Pick from numbered actions instead of inventing selectors or coordinates.
Click, type, submit, choose, press, scroll, or navigate from the terminal.
Detect weird states, auth prompts, dialogs, file choosers, and degraded sessions.
Pause cleanly when human judgment is required. Join the session, fix it, then resume.
Print a redacted transcript so you can audit exactly what happened.
WebCLI turns the live page into numbered actions. The agent acts on the latest observed state instead of hallucinating selectors or coordinates.
web observe --json returns compact, actionable state: summary, actions, inputs, forms, tabs, and blockers — enough to decide, without dumping the whole DOM or staring at screenshots.
web state surfaces native and heuristic blockers so the agent stops blindly retrying when it hits auth, approval dialogs, file choosers, or ambiguous conditions.
web pause, web join, and web resume create a clean, auditable handoff when a workflow needs a person.
web transcript records redacted command history so you can see what the agent did without secrets leaking into logs.
A browser surface shaped for agents: shell-native commands, JSON output, persistent profiles, numbered actions, and a loop simple enough for general-purpose agents to learn.
One binary. JSON output. No Python or TypeScript framework required. Works with Claude Code, Cursor, Aider, OpenHands, custom shell loops, or CI.
Starts with your local browser, your machine, and named persistent profiles when your agent needs continuity across runs.
Open, observe, inspect, read, find, click, type, choose, submit, press keys, scroll, capture, tab management, page evaluation, state detection, and action transcripts.
When a site needs a person — passkeys, MFA, sensitive approvals, or ambiguous decisions — the agent can pause cleanly instead of guessing or looping.
BrowserBox integration is experimental, but it is the clearest path for remote browser workflows that need human input. When an agent gets stuck, a human can join the same live browser session through a local client, unblock the workflow, and hand control back without losing browser state.
Run web agents-md to generate compact rules your agent can follow: observe first, use numbered actions, prefer JSON, recover cleanly, and produce transcripts.
When a workflow reaches passkeys, MFA, sensitive approvals, or ambiguous states, the agent can pause instead of pretending it can continue.
Transcripts show what happened without dumping secrets, tokens, passwords, or sensitive field values.
Start on your own machine and browser context. Move to runners or BrowserBox-backed sessions when the workflow needs it.
WebCLI is built around observable state, explicit commands, redacted transcripts, and human handoff. High-value or sensitive workflows should pause for human approval.
Full Self Browsing is the WebCLI product metaphor for agent-operable browsing: live browser state translated into structured observations, numbered actions, recoverable blockers, human handoff, and transcripts. It does not mean agents should bypass human judgment or run sensitive workflows unsupervised.
AIcessability means making the web operable for agents. Humans get visual layout, affordances, cursor feedback, memory, and judgment. WebCLI gives agents a structured browser loop: readable state, actions, forms, blockers, tabs, transcripts, and handoff.
The landing page should stay fast and conversion-focused. Demo cards use strong thumbnails first, then open a local demo page or lightweight YouTube facade on click. That keeps the story, transcript, trial CTA, and proof context on WebCLI while still using YouTube for distribution.
Use Playwright or Cypress when you know the app and the script. Use WebCLI when an agent must inspect an unknown or changing website, decide what to do, act, observe again, and recover without writing a full test suite first.
Screenshots are useful for human verification. But weak as the primary control loop for your agent friends — shots are token-heavy, easy to misread, and disconnected from actionable page state. WebCLI gives agents enhanced web perception: structured state, stable numbered actions, and blocker awareness.
MCP is useful when you want a tool server. WebCLI is a local binary optimized for shell-based agents, terminals, scripts, and CI. They complement each other.
Those are frameworks for building agents inside specific stacks. WebCLI is the shell-native layer: one binary any coding agent (or human) can use to drive web actions without adopting a framework.
No. WebCLI is designed to detect blockers and create a clean human handoff. When a workflow needs a person, the agent pauses, you join, and the transcript records what happened.
WebCLI is built around redacted transcripts and explicit human handoff. For sensitive workflows, pause for human approval instead of letting the agent run unsupervised.
Agentish is our shorthand for the language agents can actually reason over: structured state, numbered actions, tabs, forms, blockers, and transcripts. WebCLI translates messy live websites into Agentish.
No. WebCLI is local-first. BrowserBox integration is experimental and useful when browser workflows run remotely and a human needs to join the live session to unblock the agent.
Start a 5-day trial with a work email. If the email is from a free/disposable provider or your org has used its 3 free trials, WebCLI automatically offers the $10 trial pass instead.
Enter an email. Work domains under the org cap get a free activation key; free providers, disposable providers, and capped orgs get the $10 5-day trial-pass checkout.
Free trial when the email belongs to a real organization. Otherwise the server creates a $10 checkout for the same 5-day evaluation.
For one developer using WebCLI commercially with local agents.
For headless, CI, multi-machine, and production agent workflows.
Need to evaluate from a free email provider or an org that hit the free-trial cap? Use the $10 5-day trial pass from the trial form.
For redistribution, bundling, team platforms, and BrowserBox-backed integrations.
When a trial ends or a license is invalid, browser commands stop until a valid trial pass or paid license is activated.
Drop WebCLI instructions into your repo so your coding agent knows how to browse safely: observe first, use numbered actions, prefer JSON, pause on blockers, ask for human help when needed, and report with transcripts. AGENTS.md and SKILL.md formats are included.
web agents-md > AGENTS.mdweb skill-md > SKILL.mdcurl -fsSL webcli.sh/agents/SKILL.md -o SKILL.md