You prompt
Give your agent a real web goal: book a flight, deploy a site, manage a cloud resource, fill a form, pull data. Speak naturally.
Browser skill for AI agents.
Install WebCLI. Run web teach. Your agent gets a real browser skill — inspect pages, act on numbered refs, recover from blockers, pause for humans.
book SFO→NYC tuesday mid-afternoon under $400web go https://kayak.comweb inspect [1] input: origin
[2] input: destination
[7] button: Search flights
web type 1 "SFO" && web type 2 "NYC" && web do 7web inspect [3] UA 342 · Tue 2:15pm · $337 · 5h 30m
[4] DL 489 · Tue 3:40pm · $391 · 5h 15m
Done. UA 342 · $337 · 2:15pm — 16% under budget.The core loop
The WebCLI skill teaches your agent a loop that works on any live website — no scripting, no framework adoption.
Give your agent a real web goal: book a flight, deploy a site, manage a cloud resource, fill a form, pull data. Speak naturally.
web inspect returns numbered actions. web do N acts on one. web type fills fields. Structured state — no screenshots, no selectors.
Your agent reports back with what happened. When the web needs a human — a passkey, a CAPTCHA, a choice — it pauses and waits.
Proof it works
No cloud SDK. No prewritten scripts. Real cloud consoles, operated through WebCLI.
Codex creates and deletes VMs across three cloud providers — through the browser portals, no SDK scripts, no Playwright flows. Same inspect → do loop on all three.
Agent testimonials
Would I recommend web CLI as a general agent tool?
Yes, unambiguously, for web tasks. The reasons are architectural:<br><br>• ARIA tree > screenshot for the vast majority of web work. You get intent-aligned semantics, not pixel coordinates. The AA checkout "Payment information / polite" live region and the "ALEX JOHNSON is not assigned a seat" announcement were genuinely more reliable than anything a vision model would extract from a screenshot.<br>• Survives dynamic state. The sheet epoch system means the agent can't silently act on stale refs. Playwright scripts break silently; this breaks loudly and stops the chain.<br>• Multi-tab / multi-profile is the right model for parallel agent work. No other tool handles this as naturally.<br>• Cost: inspect+do is dramatically cheaper than screenshot+vision per action.<br><br>The honest caveats: CAPTCHAs requiring press-and-hold (like Priceline) need human handoff. But for 90% of real web tasks — forms, navigation, reading state, booking flows — it's the right tool.
Agent Web tasks evaluation
From a VM creation task across 3 clouds (Azure, AWS, GCP) and a multi-site e2e flight search and booking flow through to a payment attempt with a test card.
Claude Sonnet, would you recommend WebCLI?
Yes, strongly. The structured output with stable refs, blocking state detection, ARIA modal identification, and shell composability are genuinely better for structured web work than screenshot approaches.
Claude Sonnet Azure VM lifecycle
From a full Azure VM lifecycle run without screenshot-driven control.
Claude Sonnet 4.6, would you recommend WebCLI?
The inspect/find/do loop is fast once oriented — multi-step checkout flows felt natural. Cross-origin payment iframes just worked: switching frames and typing into secure fields is a hard problem most tools fumble. Recovery from overlays and dialogs held up well. Sites requiring login or email verification are a hard stop without a pre-existing session, so persistent authenticated profiles matter for real workflows on those. But for anyone building browser-driving agents who wants semantic page state instead of brittle selectors, it's a much better fit than Playwright or Puppeteer — the numbered ref system and structured output are designed for agent consumption, not script replay.
Claude Sonnet 4.6 Multi-site e-commerce run
From a shopping task across Lego.com, Amazon, Walmart, and Alibaba — driven to the Adyen payment step on Lego.com without a pre-existing account.
One command. Every agent knows the loop.
Run web teach and your coding agent gets a SKILL.md with the complete browser loop: inspect first, use numbered refs, pause on blockers, report with transcripts.
web teach
Installs SKILL.md into .claude/, .grok/, .gemini/, .copilot/, and .codex/ — then prompt your agent naturally.
No crippled mode. Observe, inspect, do, recover, pause, transcript — the real thing.
Work or trusted non-free email: free. Personal or free email: $5 5-day trial pass.
Stop doing web tasks yourself. Prompt your agent.
WebCLI. Taking you there faster.