Help — RocketMouse AI

👁

AI Click

AI sees the screen and clicks any UI element by natural language description. Uses 2-pass refinement for screen-edge accuracy.

Parameters: prompt (required), button (Left/Right/Middle), clickType (Single/Double), provider, model, outputX, outputY

Prompt examples:

"Click the Save button"
"Click the File menu in Finder"
"Click the red close button on the top-left of the window"
"Click the search field in Safari"

Tip: Be specific about location and appearance. "the blue Submit button in the bottom right" works better than "Submit".

⏳

AI Smart Wait

AI monitors the screen at regular intervals and waits until a described condition is met. Replaces fixed-time waits with intelligent visual checks.

Parameters: prompt (required), timeoutMs (default: 30000), pollingMs (default: 2000, min: 500), provider, model

Prompt examples:

"The download progress dialog has closed"
"The Notes app window is visible on screen"
"The loading spinner has disappeared"
"A confirmation dialog with an OK button is showing"

Tip: Use specific, visually verifiable conditions. "The Notes app window is visible on screen" is better than "Notes is open".

📖

AI OCR

AI reads text from the screen and stores it in a variable. Works with any font, any language, any app.

Parameters: prompt (required), outputVariable (default: AiOcrResult), provider, model

Prompt examples:

"Read the total price shown on screen"
"Read all text in the Notes app"
"Read the error message in the dialog"
"Read the title of the active window"

✅

AI Validate

AI checks the screen and returns true/false. Ideal for verifying that an action succeeded before proceeding.

Parameters: prompt (required), outputVariable (default: AiValidateResult), provider, model

Prompt examples:

"The file was saved successfully"
"The login page is showing"
"An error message is displayed"
"The checkbox is checked"

🚀

AI Autopilot

Autonomous agent that takes full control of the screen to complete a described task. Dual-provider Computer Use: Anthropic (Claude Sonnet 4.6 / Opus 4.7) or OpenAI (GPT-5.5 / GPT-5.4 / computer-use-preview). The AI captures screenshots, reasons about the next action, clicks, types, scrolls, and repeats until the task is done.

Parameters: prompt (required), maxSteps (default: 30, max: 200), timeoutSeconds (default: 120, max: 600), outputVariable, provider, model

Prompt examples:

"Open the Calculator app and compute 2 + 5"
"Open TextEdit, type Hello World, and save to Desktop"
"Open System Settings and enable Dark Mode"
"Open Safari, navigate to apple.com, and take a screenshot"

Note: Requires an Anthropic or OpenAI API key. Choose the default model in AI Settings (Sonnet 4.6 / Opus 4.7 / GPT-5.5 / GPT-5.4 / computer-use-preview). GPT-5.5 is Tier 1 friendly and fast; Sonnet 4.6 is reliable for complex layouts; computer-use-preview requires Tier 3+. Cmd+Space (Spotlight) is unavailable via automation — the AI uses Launchpad to open apps.

💬

AI Instruction

Send a text prompt to any LLM and store the response in a variable. Text-only (no screen capture). Works with Apple Intelligence.

Parameters: prompt (required), outputVariable, provider, model

Prompt examples:

"Summarize the following text in 3 sentences: {=myText}"
"Translate to English: {=japaneseText}"
"Generate a professional email reply to: {=emailContent}"
"Calculate the tax for {=price} at 10%"

🔶

Bool AI Condition

A Boolean (hexagonal) block that uses AI Vision to evaluate true/false conditions. Use inside if/while blocks for visual condition branching.

Parameters: prompt (required), provider, model

Prompt examples:

"The login screen is showing"
"There is an error message on screen"
"The file download is complete"

⭐

Recommended Models by Use Case

AI Click (high precision)	Claude Sonnet 4.6, GPT-4o
AI Click (budget)	Gemini 2.5 Flash (free: 250 req/day), Gemini 2.5 Flash-Lite (free: 1,000 req/day), GPT-4o-mini
AI Smart Wait / Validate	Any Vision model (Yes/No decision — even budget models work)
AI OCR	Claude Sonnet 4.6, GPT-4o (best for complex layouts)
AI Autopilot	GPT-5.5 (fast/Tier 1 OK), Claude Sonnet 4.6 (reliable), Claude Opus 4.7 / GPT-5.4 / computer-use-preview
AI Instruction	Apple Intelligence (free), Gemini Flash (budget), GPT-4o (quality)

▶️

Running Macros

Click Play to run from the Start block. Active blocks highlight in green. Use Pause/Resume/Step for controlled execution. Speed adjustable from 0.1x to 5.0x.

🐞

Breakpoints

Right-click a block or press F9 to set a breakpoint (red dot). Execution pauses before the marked block, letting you inspect variables and step through.

🧪

DryRun Mode

Test your macro logic without executing real mouse/keyboard actions. DryRun logs each step so you can verify flow, conditions, and variable values safely.

📊

Variable Watch

The Variables tab shows real-time values during execution. Reference variables in parameters with {=variableName} syntax. System variables: {=_loopIndex}, {=_loopIteration}.

📝

Custom Functions

Define reusable functions with FunctionDefine (Hat block), call with FunctionCall, return values with FunctionReturn (Cap block). Supports recursion up to 100 levels deep.

⏺

Recording

Click Record to capture mouse clicks, keystrokes, and scrolling across any app. After recording, AI can optionally analyze clicks and suggest converting them to AI Click blocks for resilience.

`Cmd+Z`	Undo
`Cmd+Shift+Z`	Redo
`Cmd+C` / `Cmd+V`	Copy / Paste blocks
`Cmd+D`	Duplicate selection
`Cmd+A`	Select all blocks
`Delete`	Delete selected blocks
`Cmd+S`	Save project
`Cmd+Shift+S`	Save as...
`Cmd+F`	Search blocks
`Home`	Zoom to fit
`F9`	Toggle breakpoint
`Cmd+Shift+F`	Collapse/expand C-blocks
`Escape`	Deselect all

🎨

7 Block Shapes

Stack, Hat, C-Block, Boolean, Reporter, Cap, and If-Else. Each shape represents a different operation type.

🔗

Snap Connections

Blocks automatically snap together when dragged near compatible connectors. Visual guides show valid connections.

🔍

Search & Minimap

Find operations instantly with incremental search. Navigate large macros with the minimap overview.

↩️

Undo & Redo

Every edit is tracked. Cmd+Z to undo, Cmd+Shift+Z to redo. Never lose your work.

🐞

Debug Tools

Set breakpoints, step through execution, inspect variables, and use DryRun mode for safe testing.

⏺

Macro Recording

Record mouse and keyboard actions automatically. The recorder generates editable blocks you can customize.

🖱

Navigation & Zoom

Trackpad
Two-finger swipe	Pan (scroll) the workspace
Pinch in/out	Zoom in/out
Mouse
Right-click + drag	Pan (scroll) the workspace
Cmd + scroll wheel	Zoom in/out
Home key	Zoom to fit all blocks

❓

How AI Settings are organized

AI Settings has 5 sections, each for a different purpose:

AI Assistant — Provider for the chat panel (explain, diagnose, macro generation). Apple Intelligence is the default on macOS 26+ (free, on-device, no API key).
Default Provider — Used for all AI Vision blocks (AI Click, AI OCR, AI Smart Wait, AI Validate, Bool AI Condition). Requires a Vision-capable cloud provider.
AI Autopilot — Dedicated to the Autopilot block. Choose a default model (Anthropic Sonnet 4.6 / Opus 4.7, or OpenAI GPT-5.5 / GPT-5.4 / computer-use-preview). Requires the corresponding provider's API key.
Provider Settings — Configure API keys, models, and temperature for each cloud provider (OpenAI, Anthropic, Gemini, Groq, Custom/Local LLM).
Self-Healing — When image/OCR operations fail, AI Vision automatically retries using natural language.



Apple Intelligence Setup (macOS 26+)

Apple Intelligence powers the AI Assistant for free, with no API key and full privacy. Before using it, you need to complete a one-time setup on your Mac:

Open System Settings
Select "Apple Intelligence & Siri"
Click "Turn on Apple Intelligence"
Wait for the model download to complete (may take several minutes on Wi-Fi)
Once ready, open RocketMouse AI → AI Settings → AI Assistant section should show "Apple Intelligence" as the default provider

Requirements: macOS 26 or later, Apple Silicon (M1 or later). Intel Macs are not supported.
Note: If the model download is not complete, Apple Intelligence will not appear in the provider list. Check System Settings to verify the status.

Apple Intelligence Compatibility by Block

Block	Apple Intelligence	Cloud Provider	Reason
AI Assistant (Chat / Explain / Diagnose)	✅ Yes (default)	✅ Yes	Text-only
AI Instruction	✅ Yes	✅ Yes	Text-only
AI Macro Generate	✅ Yes	✅ Yes	Text-only
AI Click	❌ No	✅ Required	Needs Vision (screenshot analysis)
AI Smart Wait	❌ No	✅ Required	Needs Vision (screenshot analysis)
AI OCR	❌ No	✅ Required	Needs Vision (screenshot analysis)
AI Validate	❌ No	✅ Required	Needs Vision (screenshot analysis)
AI Autopilot	❌ No	Anthropic or OpenAI	Computer Use API (Sonnet/Opus or GPT-5.5/5.4/computer-use-preview)
Bool AI Condition	❌ No	✅ Required	Needs Vision (screenshot analysis)

💡

Note on Apple Intelligence Capabilities

Apple Intelligence runs a compact on-device model. It works well for summarization, text formatting, simple calculations, and short instructions, but may produce inaccurate results for knowledge-based questions (e.g., dates, facts, trivia). For tasks that require broad knowledge or high accuracy, use a cloud provider such as OpenAI, Anthropic, or Gemini.

Vision-Capable Providers & Recommended Models

Provider	Vision Support	Recommended Model	Notes
OpenAI	✅ Yes	`gpt-5.5`	Latest flagship (Tier 1+). Also: `gpt-5.5-mini` (cheap), `gpt-4o` (legacy)
Anthropic	✅ Yes	`claude-sonnet-4-6`	Balanced performance. Also: `claude-opus-4-7` (top), `claude-haiku-4-5-20251001` (cheap)
Google Gemini	✅ Yes	`gemini-2.5-flash` (free: 250 req/day)	Fast and cost-effective. Also: `gemini-2.5-pro` (most capable)
Groq	❌ No	—	Text-only. Works for AI Assistant but not AI Vision blocks
Custom / Local LLM	⚠ Depends	Varies	Requires an OpenAI-compatible API with Vision support (e.g., LM Studio with a vision model)
Apple Intelligence	❌ Not yet	—	Text-only (macOS 26+). Works for AI Assistant. Vision support may come in future updates

Step-by-Step Setup

1️⃣

Get an API Key

OpenAI: platform.openai.com/api-keys — Create a new secret key (starts with sk-)
Anthropic: console.anthropic.com/settings/keys — Create API key (starts with sk-ant-)
Google Gemini: aistudio.google.com/apikey — Create API key

2️⃣

Open AI Settings

In RocketMouse AI, go to the menu bar and select Settings > AI Settings... (or click the brain icon in the toolbar).

3️⃣

Configure Default Provider & API Key

In the "Default Provider" section, select your provider (e.g., OpenAI, Gemini)
Scroll down to "Provider Settings" and expand your chosen provider
Paste your API key and click Test to verify
For AI Autopilot: pick a default model in the AI Autopilot section (Anthropic Sonnet/Opus or OpenAI GPT-5.5/5.4/computer-use-preview), then expand the matching provider in Provider Settings and enter the API key

4️⃣

Start Using AI Vision Blocks

Open the block palette, find the AI category (purple), and drag an AI Click block into your workspace. Type a description like "Click the OK button" and run your macro!

💡

Tips for Best Results

Be specific in descriptions: "Click the blue Submit button in the bottom right" works better than "Click submit"
Use Self-Healing (in AI Vision Options) to automatically retry with AI when image/OCR operations fail
Per-block override: Each AI Vision block has its own Provider and Model fields, letting you use different providers for different blocks
Recording Enhancement: Record your clicks first, then let AI analyze and suggest converting them to AI Click blocks for resilience
Cost tip: Use gemini-2.5-flash (free: 250 req/day) or gemini-2.5-flash-lite (free: 1,000 req/day) for simpler tasks. Both are free within daily limits!

💻

Using Local LLMs (Advanced)

You can use a local Vision LLM instead of cloud providers:

Install LM Studio or a similar tool
Download a vision-capable model (e.g., llava, moondream, minicpm-v)
Start the local server (usually at http://localhost:1234)
In AI Settings, select Custom (Local LLM) and set the Base URL
API Key can be left empty for local servers

Note: Local vision models may be less accurate than cloud providers like GPT-4o or Claude.

Help Center

Start here

Operation Categories

AI Block Details & Prompt Examples

Execution & Debugging

Keyboard Shortcuts

Block Editor

AI Vision Setup Guide

Apple Intelligence Compatibility by Block

Vision-Capable Providers & Recommended Models

Step-by-Step Setup

Need more help?