Everything you need to master RocketMouse AI
Detailed documentation for all 196 operations across 24 categories.
Each AI block explained with parameters, use cases, and example prompts.
AI sees the screen and clicks any UI element by natural language description. Uses 2-pass refinement for screen-edge accuracy.
Parameters: prompt (required), button (Left/Right/Middle), clickType (Single/Double), provider, model, outputX, outputY
Prompt examples:
"Click the Save button""Click the File menu in Finder""Click the red close button on the top-left of the window""Click the search field in Safari"Tip: Be specific about location and appearance. "the blue Submit button in the bottom right" works better than "Submit".
AI monitors the screen at regular intervals and waits until a described condition is met. Replaces fixed-time waits with intelligent visual checks.
Parameters: prompt (required), timeoutMs (default: 30000), pollingMs (default: 2000, min: 500), provider, model
Prompt examples:
"The download progress dialog has closed""The Notes app window is visible on screen""The loading spinner has disappeared""A confirmation dialog with an OK button is showing"Tip: Use specific, visually verifiable conditions. "The Notes app window is visible on screen" is better than "Notes is open".
AI reads text from the screen and stores it in a variable. Works with any font, any language, any app.
Parameters: prompt (required), outputVariable (default: AiOcrResult), provider, model
Prompt examples:
"Read the total price shown on screen""Read all text in the Notes app""Read the error message in the dialog""Read the title of the active window"AI checks the screen and returns true/false. Ideal for verifying that an action succeeded before proceeding.
Parameters: prompt (required), outputVariable (default: AiValidateResult), provider, model
Prompt examples:
"The file was saved successfully""The login page is showing""An error message is displayed""The checkbox is checked"Autonomous agent that takes full control of the screen to complete a described task. Dual-provider Computer Use: Anthropic (Claude Sonnet 4.6 / Opus 4.7) or OpenAI (GPT-5.5 / GPT-5.4 / computer-use-preview). The AI captures screenshots, reasons about the next action, clicks, types, scrolls, and repeats until the task is done.
Parameters: prompt (required), maxSteps (default: 30, max: 200), timeoutSeconds (default: 120, max: 600), outputVariable, provider, model
Prompt examples:
"Open the Calculator app and compute 2 + 5""Open TextEdit, type Hello World, and save to Desktop""Open System Settings and enable Dark Mode""Open Safari, navigate to apple.com, and take a screenshot"Note: Requires an Anthropic or OpenAI API key. Choose the default model in AI Settings (Sonnet 4.6 / Opus 4.7 / GPT-5.5 / GPT-5.4 / computer-use-preview). GPT-5.5 is Tier 1 friendly and fast; Sonnet 4.6 is reliable for complex layouts; computer-use-preview requires Tier 3+. Cmd+Space (Spotlight) is unavailable via automation — the AI uses Launchpad to open apps.
Send a text prompt to any LLM and store the response in a variable. Text-only (no screen capture). Works with Apple Intelligence.
Parameters: prompt (required), outputVariable, provider, model
Prompt examples:
"Summarize the following text in 3 sentences: {=myText}""Translate to English: {=japaneseText}""Generate a professional email reply to: {=emailContent}""Calculate the tax for {=price} at 10%"A Boolean (hexagonal) block that uses AI Vision to evaluate true/false conditions. Use inside if/while blocks for visual condition branching.
Parameters: prompt (required), provider, model
Prompt examples:
"The login screen is showing""There is an error message on screen""The file download is complete"| AI Click (high precision) | Claude Sonnet 4.6, GPT-4o |
| AI Click (budget) | Gemini 2.5 Flash (free: 250 req/day), Gemini 2.5 Flash-Lite (free: 1,000 req/day), GPT-4o-mini |
| AI Smart Wait / Validate | Any Vision model (Yes/No decision — even budget models work) |
| AI OCR | Claude Sonnet 4.6, GPT-4o (best for complex layouts) |
| AI Autopilot | GPT-5.5 (fast/Tier 1 OK), Claude Sonnet 4.6 (reliable), Claude Opus 4.7 / GPT-5.4 / computer-use-preview |
| AI Instruction | Apple Intelligence (free), Gemini Flash (budget), GPT-4o (quality) |
{=variableName} syntax. System variables: {=_loopIndex}, {=_loopIteration}.Cmd+Z | Undo |
Cmd+Shift+Z | Redo |
Cmd+C / Cmd+V | Copy / Paste blocks |
Cmd+D | Duplicate selection |
Cmd+A | Select all blocks |
Delete | Delete selected blocks |
Cmd+S | Save project |
Cmd+Shift+S | Save as... |
Cmd+F | Search blocks |
Home | Zoom to fit |
F9 | Toggle breakpoint |
Cmd+Shift+F | Collapse/expand C-blocks |
Escape | Deselect all |
| Trackpad | |
| Two-finger swipe | Pan (scroll) the workspace |
| Pinch in/out | Zoom in/out |
| Mouse | |
| Right-click + drag | Pan (scroll) the workspace |
| Cmd + scroll wheel | Zoom in/out |
| Home key | Zoom to fit all blocks |
AI Vision blocks (AI Click, AI Smart Wait, AI OCR, AI Validate, Bool AI Condition) require a Vision-capable LLM provider. AI Autopilot is separate — it uses Computer Use APIs from Anthropic (Sonnet/Opus) or OpenAI (GPT-5.5/5.4/computer-use-preview), so an Anthropic or OpenAI API key is required. Here's how to set everything up.
AI Settings has 5 sections, each for a different purpose:
Apple Intelligence powers the AI Assistant for free, with no API key and full privacy. Before using it, you need to complete a one-time setup on your Mac:
Requirements: macOS 26 or later, Apple Silicon (M1 or later). Intel Macs are not supported.
Note: If the model download is not complete, Apple Intelligence will not appear in the provider list. Check System Settings to verify the status.
| Block | Apple Intelligence | Cloud Provider | Reason |
|---|---|---|---|
| AI Assistant (Chat / Explain / Diagnose) | ✅ Yes (default) | ✅ Yes | Text-only |
| AI Instruction | ✅ Yes | ✅ Yes | Text-only |
| AI Macro Generate | ✅ Yes | ✅ Yes | Text-only |
| AI Click | ❌ No | ✅ Required | Needs Vision (screenshot analysis) |
| AI Smart Wait | ❌ No | ✅ Required | Needs Vision (screenshot analysis) |
| AI OCR | ❌ No | ✅ Required | Needs Vision (screenshot analysis) |
| AI Validate | ❌ No | ✅ Required | Needs Vision (screenshot analysis) |
| AI Autopilot | ❌ No | Anthropic or OpenAI | Computer Use API (Sonnet/Opus or GPT-5.5/5.4/computer-use-preview) |
| Bool AI Condition | ❌ No | ✅ Required | Needs Vision (screenshot analysis) |
Apple Intelligence runs a compact on-device model. It works well for summarization, text formatting, simple calculations, and short instructions, but may produce inaccurate results for knowledge-based questions (e.g., dates, facts, trivia). For tasks that require broad knowledge or high accuracy, use a cloud provider such as OpenAI, Anthropic, or Gemini.
| Provider | Vision Support | Recommended Model | Notes |
|---|---|---|---|
| OpenAI | ✅ Yes | gpt-5.5 |
Latest flagship (Tier 1+). Also: gpt-5.5-mini (cheap), gpt-4o (legacy) |
| Anthropic | ✅ Yes | claude-sonnet-4-6 |
Balanced performance. Also: claude-opus-4-7 (top), claude-haiku-4-5-20251001 (cheap) |
| Google Gemini | ✅ Yes | gemini-2.5-flash (free: 250 req/day) |
Fast and cost-effective. Also: gemini-2.5-pro (most capable) |
| Groq | ❌ No | — | Text-only. Works for AI Assistant but not AI Vision blocks |
| Custom / Local LLM | ⚠ Depends | Varies | Requires an OpenAI-compatible API with Vision support (e.g., LM Studio with a vision model) |
| Apple Intelligence | ❌ Not yet | — | Text-only (macOS 26+). Works for AI Assistant. Vision support may come in future updates |
Sign up for an account with your chosen provider:
sk-)sk-ant-)gemini-2.5-flash (free: 250 req/day) or gemini-2.5-flash-lite (free: 1,000 req/day) for simpler tasks. Both are free within daily limits!You can use a local Vision LLM instead of cloud providers:
llava, moondream, minicpm-v)http://localhost:1234)Note: Local vision models may be less accurate than cloud providers like GPT-4o or Claude.
Can't find what you're looking for? Get in touch.