Getting Started

Start here

📥
Installation Guide
Download, install, and set up permissions for RocketMouse AI on your Mac.
🧩
Block Editor Basics
Learn how to drag, drop, snap, and connect blocks to build your first macro.
🚀
Your First Macro
A step-by-step walkthrough to create and run your first automation in 5 minutes.

Reference

Operation Categories

Detailed documentation for all 196 operations across 24 categories.

🖱
Mouse Operations
Click, double-click, drag, drop, wheel, move, and modifier key combinations.
16 operations
⌨️
Keyboard
Send keystrokes and paste text with full modifier key support.
2 operations
🪟
Window Management
Activate, close, minimize, maximize, resize, and move windows.
8 operations
🌐
Browser Automation
Playwright-powered Chrome automation: navigate, click, fill, extract, script.
16 operations
📊
Excel
Read/write cells, formulas, filters, sheets, and run Excel macros.
18 operations
📂
File Operations
Copy, move, delete, rename files and folders. Read and write text files.
12 operations
🔀
Control Flow
If/else, for/while/forEach loops, try-catch, functions, break, continue.
17 operations
💾
Data Processing
JSON parsing, regex, string manipulation, math, date calculations.
14 operations
👁
Vision & OCR
Image recognition, template matching, OCR text extraction, color detection.
8 operations
🤖
AI Automation
AI Click, AI OCR, AI Smart Wait, AI Validate, AI Autopilot (Computer Use), AI Instruction, macro generation, debugging.
8 operations
📝
Variables & Lists
Define, delete, search variables. Create and manipulate lists.
13 operations
Wait Conditions
Wait for time, image, window, color, or CPU conditions.
5 operations
📡
Network
HTTP GET/POST requests and file downloads.
3 operations
⚙️
System
INI files, dialogs, clipboard, sound, environment variables, ZIP, Base64, hash.
13 operations

AI Blocks Guide

AI Block Details & Prompt Examples

Each AI block explained with parameters, use cases, and example prompts.

👁
AI Click

AI sees the screen and clicks any UI element by natural language description. Uses 2-pass refinement for screen-edge accuracy.

Parameters: prompt (required), button (Left/Right/Middle), clickType (Single/Double), provider, model, outputX, outputY

Prompt examples:

  • "Click the Save button"
  • "Click the File menu in Finder"
  • "Click the red close button on the top-left of the window"
  • "Click the search field in Safari"

Tip: Be specific about location and appearance. "the blue Submit button in the bottom right" works better than "Submit".

AI Smart Wait

AI monitors the screen at regular intervals and waits until a described condition is met. Replaces fixed-time waits with intelligent visual checks.

Parameters: prompt (required), timeoutMs (default: 30000), pollingMs (default: 2000, min: 500), provider, model

Prompt examples:

  • "The download progress dialog has closed"
  • "The Notes app window is visible on screen"
  • "The loading spinner has disappeared"
  • "A confirmation dialog with an OK button is showing"

Tip: Use specific, visually verifiable conditions. "The Notes app window is visible on screen" is better than "Notes is open".

📖
AI OCR

AI reads text from the screen and stores it in a variable. Works with any font, any language, any app.

Parameters: prompt (required), outputVariable (default: AiOcrResult), provider, model

Prompt examples:

  • "Read the total price shown on screen"
  • "Read all text in the Notes app"
  • "Read the error message in the dialog"
  • "Read the title of the active window"
AI Validate

AI checks the screen and returns true/false. Ideal for verifying that an action succeeded before proceeding.

Parameters: prompt (required), outputVariable (default: AiValidateResult), provider, model

Prompt examples:

  • "The file was saved successfully"
  • "The login page is showing"
  • "An error message is displayed"
  • "The checkbox is checked"
🚀
AI Autopilot

Autonomous agent that takes full control of the screen to complete a described task. Dual-provider Computer Use: Anthropic (Claude Sonnet 4.6 / Opus 4.7) or OpenAI (GPT-5.5 / GPT-5.4 / computer-use-preview). The AI captures screenshots, reasons about the next action, clicks, types, scrolls, and repeats until the task is done.

Parameters: prompt (required), maxSteps (default: 30, max: 200), timeoutSeconds (default: 120, max: 600), outputVariable, provider, model

Prompt examples:

  • "Open the Calculator app and compute 2 + 5"
  • "Open TextEdit, type Hello World, and save to Desktop"
  • "Open System Settings and enable Dark Mode"
  • "Open Safari, navigate to apple.com, and take a screenshot"

Note: Requires an Anthropic or OpenAI API key. Choose the default model in AI Settings (Sonnet 4.6 / Opus 4.7 / GPT-5.5 / GPT-5.4 / computer-use-preview). GPT-5.5 is Tier 1 friendly and fast; Sonnet 4.6 is reliable for complex layouts; computer-use-preview requires Tier 3+. Cmd+Space (Spotlight) is unavailable via automation — the AI uses Launchpad to open apps.

💬
AI Instruction

Send a text prompt to any LLM and store the response in a variable. Text-only (no screen capture). Works with Apple Intelligence.

Parameters: prompt (required), outputVariable, provider, model

Prompt examples:

  • "Summarize the following text in 3 sentences: {=myText}"
  • "Translate to English: {=japaneseText}"
  • "Generate a professional email reply to: {=emailContent}"
  • "Calculate the tax for {=price} at 10%"
🔶
Bool AI Condition

A Boolean (hexagonal) block that uses AI Vision to evaluate true/false conditions. Use inside if/while blocks for visual condition branching.

Parameters: prompt (required), provider, model

Prompt examples:

  • "The login screen is showing"
  • "There is an error message on screen"
  • "The file download is complete"
Recommended Models by Use Case
AI Click (high precision) Claude Sonnet 4.6, GPT-4o
AI Click (budget) Gemini 2.5 Flash (free: 250 req/day), Gemini 2.5 Flash-Lite (free: 1,000 req/day), GPT-4o-mini
AI Smart Wait / Validate Any Vision model (Yes/No decision — even budget models work)
AI OCR Claude Sonnet 4.6, GPT-4o (best for complex layouts)
AI Autopilot GPT-5.5 (fast/Tier 1 OK), Claude Sonnet 4.6 (reliable), Claude Opus 4.7 / GPT-5.4 / computer-use-preview
AI Instruction Apple Intelligence (free), Gemini Flash (budget), GPT-4o (quality)

Reference

Execution & Debugging

▶️
Running Macros
Click Play to run from the Start block. Active blocks highlight in green. Use Pause/Resume/Step for controlled execution. Speed adjustable from 0.1x to 5.0x.
🐞
Breakpoints
Right-click a block or press F9 to set a breakpoint (red dot). Execution pauses before the marked block, letting you inspect variables and step through.
🧪
DryRun Mode
Test your macro logic without executing real mouse/keyboard actions. DryRun logs each step so you can verify flow, conditions, and variable values safely.
📊
Variable Watch
The Variables tab shows real-time values during execution. Reference variables in parameters with {=variableName} syntax. System variables: {=_loopIndex}, {=_loopIteration}.
📝
Custom Functions
Define reusable functions with FunctionDefine (Hat block), call with FunctionCall, return values with FunctionReturn (Cap block). Supports recursion up to 100 levels deep.
Recording
Click Record to capture mouse clicks, keystrokes, and scrolling across any app. After recording, AI can optionally analyze clicks and suggest converting them to AI Click blocks for resilience.

Reference

Keyboard Shortcuts

Cmd+ZUndo
Cmd+Shift+ZRedo
Cmd+C / Cmd+VCopy / Paste blocks
Cmd+DDuplicate selection
Cmd+ASelect all blocks
DeleteDelete selected blocks
Cmd+SSave project
Cmd+Shift+SSave as...
Cmd+FSearch blocks
HomeZoom to fit
F9Toggle breakpoint
Cmd+Shift+FCollapse/expand C-blocks
EscapeDeselect all

Guide

Block Editor

🎨
7 Block Shapes
Stack, Hat, C-Block, Boolean, Reporter, Cap, and If-Else. Each shape represents a different operation type.
🔗
Snap Connections
Blocks automatically snap together when dragged near compatible connectors. Visual guides show valid connections.
🔍
Search & Minimap
Find operations instantly with incremental search. Navigate large macros with the minimap overview.
↩️
Undo & Redo
Every edit is tracked. Cmd+Z to undo, Cmd+Shift+Z to redo. Never lose your work.
🐞
Debug Tools
Set breakpoints, step through execution, inspect variables, and use DryRun mode for safe testing.
Macro Recording
Record mouse and keyboard actions automatically. The recorder generates editable blocks you can customize.
🖱
Navigation & Zoom
Trackpad
Two-finger swipePan (scroll) the workspace
Pinch in/outZoom in/out
Mouse
Right-click + dragPan (scroll) the workspace
Cmd + scroll wheelZoom in/out
Home keyZoom to fit all blocks

NEW in v2.0

AI Vision Setup Guide

AI Vision blocks (AI Click, AI Smart Wait, AI OCR, AI Validate, Bool AI Condition) require a Vision-capable LLM provider. AI Autopilot is separate — it uses Computer Use APIs from Anthropic (Sonnet/Opus) or OpenAI (GPT-5.5/5.4/computer-use-preview), so an Anthropic or OpenAI API key is required. Here's how to set everything up.

How AI Settings are organized

AI Settings has 5 sections, each for a different purpose:

  1. AI Assistant — Provider for the chat panel (explain, diagnose, macro generation). Apple Intelligence is the default on macOS 26+ (free, on-device, no API key).
  2. Default Provider — Used for all AI Vision blocks (AI Click, AI OCR, AI Smart Wait, AI Validate, Bool AI Condition). Requires a Vision-capable cloud provider.
  3. AI Autopilot — Dedicated to the Autopilot block. Choose a default model (Anthropic Sonnet 4.6 / Opus 4.7, or OpenAI GPT-5.5 / GPT-5.4 / computer-use-preview). Requires the corresponding provider's API key.
  4. Provider Settings — Configure API keys, models, and temperature for each cloud provider (OpenAI, Anthropic, Gemini, Groq, Custom/Local LLM).
  5. Self-Healing — When image/OCR operations fail, AI Vision automatically retries using natural language.
Apple Intelligence Setup (macOS 26+)

Apple Intelligence powers the AI Assistant for free, with no API key and full privacy. Before using it, you need to complete a one-time setup on your Mac:

  1. Open System Settings
  2. Select "Apple Intelligence & Siri"
  3. Click "Turn on Apple Intelligence"
  4. Wait for the model download to complete (may take several minutes on Wi-Fi)
  5. Once ready, open RocketMouse AI → AI Settings → AI Assistant section should show "Apple Intelligence" as the default provider

Requirements: macOS 26 or later, Apple Silicon (M1 or later). Intel Macs are not supported.
Note: If the model download is not complete, Apple Intelligence will not appear in the provider list. Check System Settings to verify the status.

Apple Intelligence Compatibility by Block

Block Apple Intelligence Cloud Provider Reason
AI Assistant (Chat / Explain / Diagnose) Yes (default) Yes Text-only
AI Instruction Yes Yes Text-only
AI Macro Generate Yes Yes Text-only
AI Click No Required Needs Vision (screenshot analysis)
AI Smart Wait No Required Needs Vision (screenshot analysis)
AI OCR No Required Needs Vision (screenshot analysis)
AI Validate No Required Needs Vision (screenshot analysis)
AI Autopilot No Anthropic or OpenAI Computer Use API (Sonnet/Opus or GPT-5.5/5.4/computer-use-preview)
Bool AI Condition No Required Needs Vision (screenshot analysis)
💡
Note on Apple Intelligence Capabilities

Apple Intelligence runs a compact on-device model. It works well for summarization, text formatting, simple calculations, and short instructions, but may produce inaccurate results for knowledge-based questions (e.g., dates, facts, trivia). For tasks that require broad knowledge or high accuracy, use a cloud provider such as OpenAI, Anthropic, or Gemini.

Vision-Capable Providers & Recommended Models

Provider Vision Support Recommended Model Notes
OpenAI Yes gpt-5.5 Latest flagship (Tier 1+). Also: gpt-5.5-mini (cheap), gpt-4o (legacy)
Anthropic Yes claude-sonnet-4-6 Balanced performance. Also: claude-opus-4-7 (top), claude-haiku-4-5-20251001 (cheap)
Google Gemini Yes gemini-2.5-flash (free: 250 req/day) Fast and cost-effective. Also: gemini-2.5-pro (most capable)
Groq No Text-only. Works for AI Assistant but not AI Vision blocks
Custom / Local LLM Depends Varies Requires an OpenAI-compatible API with Vision support (e.g., LM Studio with a vision model)
Apple Intelligence Not yet Text-only (macOS 26+). Works for AI Assistant. Vision support may come in future updates

Step-by-Step Setup

1️⃣
Get an API Key

Sign up for an account with your chosen provider:

2️⃣
Open AI Settings
In RocketMouse AI, go to the menu bar and select Settings > AI Settings... (or click the brain icon in the toolbar).
3️⃣
Configure Default Provider & API Key
  1. In the "Default Provider" section, select your provider (e.g., OpenAI, Gemini)
  2. Scroll down to "Provider Settings" and expand your chosen provider
  3. Paste your API key and click Test to verify
  4. For AI Autopilot: pick a default model in the AI Autopilot section (Anthropic Sonnet/Opus or OpenAI GPT-5.5/5.4/computer-use-preview), then expand the matching provider in Provider Settings and enter the API key
4️⃣
Start Using AI Vision Blocks
Open the block palette, find the AI category (purple), and drag an AI Click block into your workspace. Type a description like "Click the OK button" and run your macro!
💡
Tips for Best Results
  • Be specific in descriptions: "Click the blue Submit button in the bottom right" works better than "Click submit"
  • Use Self-Healing (in AI Vision Options) to automatically retry with AI when image/OCR operations fail
  • Per-block override: Each AI Vision block has its own Provider and Model fields, letting you use different providers for different blocks
  • Recording Enhancement: Record your clicks first, then let AI analyze and suggest converting them to AI Click blocks for resilience
  • Cost tip: Use gemini-2.5-flash (free: 250 req/day) or gemini-2.5-flash-lite (free: 1,000 req/day) for simpler tasks. Both are free within daily limits!
💻
Using Local LLMs (Advanced)

You can use a local Vision LLM instead of cloud providers:

  1. Install LM Studio or a similar tool
  2. Download a vision-capable model (e.g., llava, moondream, minicpm-v)
  3. Start the local server (usually at http://localhost:1234)
  4. In AI Settings, select Custom (Local LLM) and set the Base URL
  5. API Key can be left empty for local servers

Note: Local vision models may be less accurate than cloud providers like GPT-4o or Claude.

Need more help?

Can't find what you're looking for? Get in touch.

Contact Support