Getting Started

What is RocketMouse AI?

RocketMouse AI is a visual editor for building RPA macros using Scratch-style puzzle blocks. Drag and snap blocks together to automate mouse actions, keyboard input, browser interactions, Excel operations, image recognition, AI integration, and more — all without writing code.

Screen Layout

The app uses a 3-column layout:

  • Left: Category sidebar + Block palette — list of available blocks
  • Center: Workspace — main area for placing and connecting blocks
  • Right: Property panel + Log/Variables panel — block settings and execution results

Project Files

Projects are saved in the .rmproj format (JSON). Press Ctrl+S to save, Ctrl+Shift+S for Save As. An asterisk (*) in the title bar indicates unsaved changes.

Editor Basics

Placing Blocks

Drag blocks from the palette to the workspace. You can also click a palette item to place it directly.

Snap Connections

Bring blocks close together (within 20px) and they automatically snap-connect. Yellow highlight shows connection candidates.

Stack Dragging

Drag the top block of a connected chain to move all attached blocks together.

Block Selection

Click to select (blue border). Ctrl+click for multi-select. Drag on canvas for marquee selection. Ctrl+A to select all.

Zoom / Pan

Mouse wheel to zoom. Middle button drag or right button drag to pan. Home key to zoom-to-fit all blocks.

Property Panel

Appears on the right when a block is selected. Set comments, delay (ms), enabled/disabled, and block-specific parameters.

Palette Search

Use the search box above the palette for incremental search across all categories. Esc or the ✕ button clears the search.

Minimap

200×150px overview in the bottom-left corner. Click/drag to pan the workspace. Toggle via toolbar button.

Block Types

| Shape | Description | Examples |
| --- | --- | --- |
| Stack | Standard block with top notch + bottom bump | Click, type, file operations |
| Hat | Rounded top, no top connector | Start block, function definitions |
| C-Block | Contains child blocks in its mouth, auto-expands | Loops, conditions, TryCatch |
| Boolean | Hexagonal, represents true/false conditions | Comparisons (=, >, <), AND/OR/NOT |
| Reporter | Capsule shape, returns values, snaps into parameter slots | Variable refs, math, strings |
| Cap | Top connector only, terminates flow | Break, Continue, MacroExit, Return |
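
The shapes compose as follows. This textual sketch is illustrative only; in the editor, programs are assembled visually, and the notation below is not a real file format:

```
Start                                            ← Hat: rounded top, the chain begins here
ConditionBranch ⟨BoolFileExists "C:\data.txt"⟩   ← C-block with a Boolean in its condition slot
    FileRead "C:\data.txt"                       ← Stack blocks nest inside the C-block's mouth
MessageBox "Done"                                ← Stack: top notch + bottom bump
MacroExit                                        ← Cap: no bottom connector, flow ends
```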

Block Categories

Mouse (16 blocks)

| Block | Description |
| --- | --- |
| LeftClick / RightClick / MiddleClick | Mouse button clicks (with optional coordinates) |
| DoubleClick | Double-click |
| MouseMove | Move cursor to coordinates |
| MouseDrag | Drag from point A to point B |
| ScrollUp / ScrollDown | Mouse wheel scroll |
| MouseClickRelative / MouseMoveRelative | Relative to active window |
| Others | LeftDown/Up, RightDown/Up, ScrollLeft/Right |

Keyboard (2 blocks)

| Block | Description |
| --- | --- |
| KeyInput | Key combos (e.g., Ctrl+S, {ENTER}, {F8}{ENTER}) |
| TextPaste | Paste text via clipboard |

Window (8 blocks)

| Block | Description |
| --- | --- |
| ActivateWindow | Bring window to front |
| CloseWindow | Close specified window |
| MinimizeWindow / MaximizeWindow / RestoreWindow | Window state |
| ResizeWindow / MoveWindow | Resize and reposition |
| GetWindowTitle | Get active window title |

Browser (16 blocks)

Control Edge/Chrome via Playwright.

| Block | Description |
| --- | --- |
| BrowserLaunch | Launch browser (Edge/Chrome/Chromium) |
| BrowserNavigate | Navigate to URL |
| BrowserClick | Click element (CSS selector) |
| BrowserFill | Type text into input field |
| BrowserSelect | Select from dropdown |
| BrowserCheck | Toggle checkbox on/off |
| BrowserGetText / BrowserGetAttribute | Extract element data |
| BrowserGetTitle / BrowserGetUrl | Get page title or URL |
| BrowserScreenshot | Save page or element screenshot |
| BrowserExecuteScript | Execute JavaScript |
| BrowserWaitForElement | Wait for element to appear |
| BrowserSwitchTab / BrowserGoBack | Tab and navigation control |
| BrowserClose | Close browser |

Excel (18 blocks)

COM-based Excel operations (requires Microsoft Excel).

| Block | Description |
| --- | --- |
| ExcelOpen / ExcelClose / ExcelSave | Workbook management |
| ExcelReadCell / ExcelWriteCell | Cell read/write |
| ExcelSetFormula | Set cell formula |
| ExcelRunMacro | Execute VBA macro |
| ExcelFilter / ExcelReadRange | Filter and range ops |
| Others | CopySheet, DeleteSheet, WriteRange, InsertRow, Sort, FindCell, Chart, ExportPdf, GetLastRow |

File / Folder (15 blocks)

| Block | Description |
| --- | --- |
| FileCopy / FileMove / FileDelete | Basic file operations |
| FileRead / FileWrite / FileAppend | Text file I/O |
| FolderCreate / FolderCopy / FolderDelete | Folder operations |
| FileRename / FileGetInfo / FileListFiles | File utilities |

Data Processing (22 blocks)

| Block | Description |
| --- | --- |
| JsonParse / JsonStringify | JSON parsing and generation |
| RegexMatch / RegexReplace | Regular expressions |
| StringIndexOf / StartsWith / EndsWith | String search |
| DateAdd / DateDiff / DateFormat | Date arithmetic and formatting |
| ListCreate / Add / Get / Set / Length / Remove / Sort / Join | List operations |
| Others | TextExtract, TextFormat, TextReplace, TextQuote |

Control Flow (17 blocks)

| Block | Description |
| --- | --- |
| ConditionBranch | if-then (C-block) |
| ConditionBranchElse | if-then-else (dual mouth) |
| LoopStart | For loop (specified count) |
| ForEachLoop | Iterate over list items |
| TryCatch | Error handling (try/catch mouths) |
| BreakLoop / ContinueLoop | Loop control (Cap type) |
| MacroExit | Immediate macro termination (Cap type) |
| WaitTime / WaitWindow / WaitImage / WaitColor / WaitCpu | Conditional waits |
| FunctionDefine / FunctionCall / FunctionReturn | Custom functions |

AI (1 block)

| Block | Description |
| --- | --- |
| AiInstruction | Send prompt to LLM, store response in variable. 5 providers (OpenAI, Anthropic, Gemini, Groq, local LLM). Vision API support (image attachment, 4MB limit). |

Vision / OCR (9 blocks)

| Block | Description |
| --- | --- |
| VisionClick | Click on template image |
| VisionWait / VisionDisappear | Wait for image appear/disappear |
| VisionGetPosition | Get image coordinates |
| VisionCapture | Save screen capture |
| OcrReadText | Read text from screen region (Windows OCR) |
| OcrClickText / OcrWaitText | Find and click/wait for text |
| WaitImage | Image recognition wait (OpenCV) |

Boolean / Condition (11 types)

Hexagonal blocks that snap into C-block condition slots.

| Block | Description |
| --- | --- |
| BoolCompareEquals / Greater / Less | Comparison operators (=, >, <) |
| BoolVariableEquals | Variable value match |
| BoolFileExists | File existence check |
| BoolWindowVisible | Window visibility check |
| BoolImageFound / BoolColorFound | Image/color detection |
| BoolAnd / BoolOr / BoolNot | Logical operators (nestable) |

System / Utility (18+ blocks)

| Block | Description |
| --- | --- |
| AppLaunch / AppClose | Launch and close applications |
| MessageBox / InputDialog | User interaction dialogs |
| ClipboardSet / ClipboardGet | Clipboard operations |
| ProcessStart / ProcessKill | Process management |
| ZipCreate / ZipExtract | ZIP compression/extraction |
| Base64Encode / Base64Decode | Base64 encoding |
| HashCompute | Hash (MD5/SHA1/SHA256/SHA512) |
| HttpGet / HttpPost / HttpDownload | HTTP operations |
| RegistryRead / RegistryWrite | Registry operations |
| TextToSpeech / ScreenCapture | System utilities |

Reporter (15 blocks)

Capsule-shaped blocks that snap into parameter slots of other blocks.

| Block | Description |
| --- | --- |
| MathAdd / Subtract / Multiply / Divide / Modulo | Arithmetic + modulo |
| MathFunction | Math functions (abs, round, ceil, floor, sqrt) |
| StringConcat / StringLength / StringCharAt | String operations |
| RandomNumber | Random number generation |
| SenseMouseX / SenseMouseY | Current mouse coordinates |
| SenseCurrentDateTime | Current date/time |
| SenseClipboardText | Clipboard text |
| SenseEnvironmentVariable | Environment variable value |

Execution & Debugging

Running Macros

Press the ▶ Play button in the toolbar. Execution starts from the Start block. The currently executing block is highlighted with a green border.

Pause / Resume / Step

While running: ⏸ Pause to stop, ▶ Resume to continue, ⏩ Step to execute one block at a time.

Stop

Press ■ Stop to immediately halt execution.

Dry Run

Test the flow without performing actual mouse/keyboard operations. Results are shown in the log panel.

Speed Control

Adjust execution speed from 1x to 10x using the toolbar slider. Changes apply in real-time.

Error Visualization

Failed blocks are highlighted with a red border, and the error message is shown in a tooltip.

Log Panel

Structured log in the bottom panel with 4 color levels: gray (Info), green (Success), red (Error), yellow (Warning).

Variable Watch

The "Variables" tab in the bottom panel shows runtime variable values. Changed variables are highlighted in green.

Breakpoints

Right-click a block → "Set Breakpoint" or press F9. A red dot marker appears, and execution pauses automatically when it reaches that block.

Recording

Press the ⚫ Record button to capture mouse/keyboard actions. Recorded actions are placed as blocks in the workspace.

Variables & Functions

Defining Variables

Use the VariableDefine block to create variables. Reference values with {=variableName} syntax in any parameter field.

Expressions

The VariableExpression block supports arithmetic: +, -, *, /, %, parentheses. Example: {=count} + 1

System Variables

Automatically set inside loops:

  • {=_loopIndex} — 1-based loop counter (1, 2, 3, ...)
  • {=_loopIteration} — 0-based loop counter (0, 1, 2, ...)
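
As a sketch (the editor is visual; this textual notation is only for explanation, and the parameter labels are assumptions), a counted loop that accumulates the system variable:

```
VariableDefine       Total = 0
LoopStart (count: 3)                                        ← C-block
    VariableExpression   Total = {=Total} + {=_loopIndex}   ← adds 1, then 2, then 3
MessageBox "Total: {=Total}"                                ← displays 6
```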

List Operations

ListCreate for empty lists, ListAdd to append, ListGet/ListSet for index access. ListSort, ListJoin, and more available.
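
A sketch of the typical pattern (textual notation for explanation only; the parameter labels are assumptions), combining list blocks with ForEachLoop from Control Flow:

```
ListCreate                     → Fruits       ← empty list
ListAdd     Fruits  "apple"
ListAdd     Fruits  "banana"
ForEachLoop Fruits  → Item                    ← C-block, sets {=Item} each pass
    TextPaste "{=Item}, "                     ← pastes "apple, " then "banana, "
```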

Custom Functions

Define functions with FunctionDefine (Hat), call with FunctionCall (Stack), return values with FunctionReturn (Cap). Recursion supported (max depth 100).
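
A recursion sketch (textual notation for explanation only; exact parameter handling is an assumption) using a shared variable:

```
FunctionDefine  Countdown                 ← Hat
    ConditionBranch ⟨{=N} > 0⟩
        TextPaste "{=N} "
        VariableExpression  N = {=N} - 1
        FunctionCall  Countdown           ← recursion (max depth 100)
    FunctionReturn                        ← Cap

VariableDefine  N = 3
FunctionCall    Countdown                 ← pastes "3 2 1 "
```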

Auto Output Variables

Blocks with output variables get smart default names (e.g., CellValue). Duplicates are auto-numbered (CellValue2, CellValue3...).

AI Features

AI Instruction Block

The AiInstruction block sends a prompt to an LLM and stores the response in a variable. Attach image files for Vision API (4MB limit). Parameters: prompt, outputVariable, filePath, provider, model, temperature.

Supported Providers

| Provider | Available Models | Vision |
| --- | --- | --- |
| OpenAI | gpt-5.5 (default), gpt-5.5-mini, gpt-5.4, gpt-5.2, gpt-4.1, gpt-4o | ✓ |
| Anthropic | claude-sonnet-4-6 (default), claude-opus-4-7, claude-haiku-4-5 | ✓ |
| Google Gemini | gemini-3-pro / 3-flash (Preview), gemini-2.5-pro / 2.5-flash | ✓ |
| Groq | Llama 4 Scout 17B (multimodal), Llama 3.3 70B, Llama 3.1 8B | ✓ (with Llama 4 Scout) |
| Local LLM | User-specified | Model-dependent |

AI Vision / Computer Use

New in v2.0 — AI Vision features let AI "see" and understand your screen, enabling automation that adapts to dynamic UI changes without fixed coordinates or template images.

AI Vision Blocks

| Block | Function | Key Parameters |
| --- | --- | --- |
| AiClick | AI sees the screen, locates the element, and clicks | prompt, button, clickType, provider, model |
| AiSmartWait | AI periodically checks the screen until the condition is met | prompt, timeoutMs, pollingMs, provider, model |
| AiOcr | AI reads text from the screen into a variable | prompt, outputVariable, provider, model |
| AiValidate | AI verifies screen state and returns true/false | prompt, outputVariable, provider, model |
| BoolAiCondition | AI judgment in if/while conditions | prompt, provider, model |

2-Pass Refinement (AiClick): Pass 1 uses a 1280×720 scaled screenshot for coarse coordinates. If the target is near a screen edge (within 200px), Pass 2 crops a 1000×1000px region at original resolution for precise targeting. This accurately detects small icons like the Start button.
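
The refinement logic can be sketched numerically. This is an illustrative reconstruction of the description above, not the actual implementation; the function name and the clamping behavior are assumptions:

```python
def plan_refinement(coarse_x, coarse_y, screen_w, screen_h,
                    scaled_w=1280, scaled_h=720,
                    edge_margin=200, crop_size=1000):
    """Map Pass-1 coordinates (found on a 1280x720 scaled screenshot) back to
    the real screen, and decide whether a Pass-2 crop is needed."""
    # Scale the coarse hit back up to the original resolution.
    x = round(coarse_x * screen_w / scaled_w)
    y = round(coarse_y * screen_h / scaled_h)

    near_edge = (x < edge_margin or y < edge_margin or
                 screen_w - x < edge_margin or screen_h - y < edge_margin)
    if not near_edge:
        return (x, y), None  # Pass 1 is precise enough

    # Pass 2: a crop_size x crop_size region centered on the target,
    # clamped so it stays inside the screen, at original resolution.
    left = min(max(x - crop_size // 2, 0), max(screen_w - crop_size, 0))
    top = min(max(y - crop_size // 2, 0), max(screen_h - crop_size, 0))
    return (x, y), (left, top, crop_size, crop_size)
```

For a 2560×1440 screen, a hit at (640, 360) on the scaled shot maps to the screen center and needs no second pass, while a hit near an edge triggers a clamped 1000×1000 crop.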

Prompt Examples

Example prompts for each AI Vision block. Specific, clear descriptions improve AI accuracy.

AiClick — Describe the target UI element specifically

  • The "File" menu in Notepad
  • The "Save" button in the Save As dialog
  • The Chrome icon in the taskbar
  • Cell B3 in the Excel spreadsheet

AiSmartWait — Describe the screen state change to wait for

  • The file download has completed (progress bar disappeared)
  • A "Processing complete" message is displayed
  • The splash screen has closed and the main window is visible

AiOcr — Specify the location and content of text to read

  • Read the error message shown in the dialog at the center of the screen
  • Read the total amount in the bottom-right cell of the Excel table
  • Read the filename shown in the title bar

AiValidate — Describe the expected screen state after an action

  • The file was saved successfully (no "*" mark in the title bar)
  • A login screen is displayed (username and password fields are visible)
  • The print preview is correctly displayed

BoolAiCondition — Describe screen conditions for if/while logic

  • An error dialog is displayed on the screen
  • Notepad is visible on the screen
  • The web page has finished loading (no loading spinner visible)

Self-Healing: Enable in AI Settings to automatically fall back to AiClick when a standard VisionClick (template matching) fails. Dramatically improves robustness of existing macros.

AI Autopilot (Autonomous Agent)

AiAutopilot takes a natural language task description and autonomously operates your computer — taking screenshots, deciding actions, and repeating until the task is complete.

Dual-Provider Native Routing

Autopilot automatically selects one of three execution paths based on the provider and model. v2.0.6 adds an OpenAI GPT-5.5 native path.

| Path | When | How It Works | Notes |
| --- | --- | --- | --- |
| Anthropic Native Path | Anthropic + Sonnet/Opus | Claude Computer Use API (multi-turn tool_use/tool_result) | High accuracy, well-proven stability |
| OpenAI Native Path (NEW) | OpenAI + GPT-5.5 / GPT-5.4 / computer-use-preview | OpenAI Responses API (stateless continuation via previous_response_id) | Faster (~1.3x) and cheaper (~5x) vs. Sonnet |
| Generic Path | All other providers + Anthropic Haiku | Prompt-based JSON approach | Standard |

Native paths send screenshots to the AI, receive tool calls (click, type, scroll, etc.), execute them, and return the resulting screenshot — a multi-turn conversation enabling high accuracy and self-correction.
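
That multi-turn loop can be sketched as below. This is illustrative only, with the LLM call and the OS-level actions stubbed out as plain callables; it is not RocketMouse AI's actual implementation:

```python
def run_autopilot(task, ask_model, execute, capture, max_steps=30):
    """Screenshot -> model decision -> action, repeated until the model
    reports completion or max_steps is exhausted."""
    shot = capture()                       # initial screenshot
    for _ in range(max_steps):
        decision = ask_model(task, shot)   # returns a tool call, e.g.
        if decision["action"] == "done":   # {"action": "left_click", ...}
            return True                    # task complete
        execute(decision)                  # click / type / scroll / key ...
        shot = capture()                   # show the model what happened
    return False                           # hit maxSteps without finishing
```

Each iteration corresponds to one billed step (screenshot + LLM call + action).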

Approach differs by provider: GPT-5.5 tends to take the shortest path (e.g., Win+R Run dialog), while Claude Sonnet behaves more human-like, navigating the Start menu step-by-step with visual confirmation. Pick GPT-5.5 for simple repetitive tasks, Sonnet for complex screen recognition.

Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| task (prompt) | Natural language task (e.g., "Open Notepad, type Hello World, and save") | (required) |
| maxSteps | Maximum steps (each step = screenshot + LLM + action) | 30 |
| timeoutSeconds | Timeout in seconds | 300 |
| provider | LLM provider. Defaults to the provider selected in AI Settings → "AI Autopilot" section (Anthropic or OpenAI). Can be overridden per block: Anthropic / OpenAI / OpenAICompatible, etc. | (AI Settings) |
| model | Model ID. Can be overridden per block (e.g., gemma4:26b) | (empty = use AI Settings) |
| outputVariable | Result variable (True/False) | AutopilotResult |

Supported Actions

| Action | Description |
| --- | --- |
| left_click / right_click / double_click | Mouse click operations |
| type | Text input (Japanese supported via clipboard) |
| key | Keyboard input (Ctrl+S, Enter, Win+D, etc.) |
| scroll | Mouse wheel scroll |
| mouse_move / left_click_drag | Cursor movement and drag |
| screenshot | Request additional screenshot |
| wait | Wait 2 seconds |

Task Description Examples (task parameter)

Specific task descriptions with clearly delineated steps lead to more accurate Autopilot execution.

AiAutopilot — Describe the full task in natural language

  • Open Notepad, type "Hello World", and save it as test.txt on the Desktop → Specify filename and location
  • Open Chrome, navigate to https://example.com, copy the page title, and paste it into Notepad → Multi-app coordination task
  • Open Calculator and compute 1234 × 5678 → Simple, clear objective
  • Open report.xlsx from the Desktop and check the value in cell A1 → Working with existing files

Cost Awareness: Autopilot sends a screenshot (~200KB) to the LLM on each step. A 30-step task incurs significant API costs. We recommend a hybrid approach: use Win32 blocks for routine tasks, Autopilot only where judgment is needed.

AI Vision works with all Vision-capable providers, but here are the recommended models for each use case:

| Use Case | Recommended Model | Reason |
| --- | --- | --- |
| Autopilot (Speed & Cost) (NEW) | OpenAI GPT-5.5 | Built-in Computer Use. ~1.3x faster, ~5x cheaper than Sonnet. Available from OpenAI Tier 1. Best for simple repetitive tasks. |
| Autopilot (Accuracy & Stability) | Claude Sonnet 4.6 | Native Computer Use API. Long production track record; step-by-step verification approach excels at complex tasks. |
| Autopilot (Top Performance) | Claude Opus 4.7 | Native Computer Use API. Highest accuracy and self-correction. For long, complex tasks. |
| Autopilot (OpenAI Premium) | computer-use-preview | OpenAI's specialized Computer Use model. Requires Tier 3+ (cumulative $100 spend + 7 days). |
| Autopilot (Local LLM) | Gemma 4 26B A4B / Qwen3-VL 8B | Via Ollama. No API costs, works offline. Speed depends on GPU. |
| AiClick / AiSmartWait / AiOcr | Claude Sonnet 4.6 / GPT-5.5 / Gemini 2.5 Pro / Llama 4 Scout | Single Vision API call. Any Vision-capable model works well. |
| BoolAiCondition | Gemini 2.5 Flash / GPT-5.5 mini | Simple Yes/No judgment. Fast, low-cost models suffice. |
| AiInstruction (Text) | Any (per your needs) | No Vision required — Groq or local LLMs are also valid. |

Computer Use Model Setting

The native-path model is selected from a single unified ComboBox at: AI Settings → "AI Autopilot" section → "Default Model". Pick from Anthropic (Sonnet 4.6 / Opus 4.7) or OpenAI (GPT-5.5 / 5.4 / computer-use-preview) in one selection. You can also override per-block with the provider and model parameters.

Model Priority: (1) Block-level provider + model parameters → (2) AI Settings "AI Autopilot" Default Model → (3) Default: Claude Sonnet 4.6 (Anthropic)

OpenAI Note: GPT-5.5 is available from Tier 1 ($5+ spent), but computer-use-preview requires Tier 3+ (cumulative $100 + 7 days). Also, the OpenAI project's "Allowed Models" whitelist must include the target model (some Default projects only permit gpt-4o).

Using Autopilot with Local LLMs

Run a Vision-capable local LLM (e.g., Gemma 4) via Ollama to use Autopilot with no API costs and fully offline.

  1. AI Settings → "Provider Settings" → "OpenAI Compatible": set the URL to http://localhost:11434/v1 and the model name (e.g., gemma4:26b)
  2. On the AI Autopilot block, set provider to OpenAICompatible
  3. Local LLMs are slower — increase timeoutSeconds as needed
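
For reference, OpenAI-compatible endpoints such as Ollama accept standard /v1/chat/completions Vision payloads. The sketch below builds such a payload (illustrative only; how RocketMouse AI constructs its requests internally is not documented here, and the helper name is an assumption):

```python
import base64

def build_vision_request(model, prompt, image_bytes):
    """Build an OpenAI-compatible chat payload with one attached image."""
    data_url = ("data:image/png;base64," +
                base64.b64encode(image_bytes).decode("ascii"))
    return {
        "model": model,                       # e.g. "gemma4:26b"
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }
```

POSTing this JSON to http://localhost:11434/v1/chat/completions exercises the same API surface the OpenAICompatible provider uses.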

Recommended Local LLMs for Autopilot

| Model | Ollama Command | VRAM | Notes |
| --- | --- | --- | --- |
| Gemma 4 26B A4B | ollama pull gemma4:26b | ~15GB | MoE (3.8B active) for fast inference. Vision + function calling. Apache 2.0 license. |
| Qwen3-VL 8B | ollama pull qwen3-vl:8b | ~6GB | Lightweight and fast. Runs on GPUs with less VRAM. Strong Vision accuracy for its size. |

Note: Local LLMs are slower and less accurate than Claude Sonnet/Opus. Start with simple tasks.

Local Vision LLMs (LM Studio / Ollama)

AI Vision features work with locally-hosted Vision-capable LLMs via the "Local LLM" provider (OpenAI-compatible API). No cloud API required.

Below are recommended Vision-capable local models for screen recognition and UI element detection (as of March 2026).

Recommended Local Vision Models

| Model | Size | VRAM (Q4) | Runtime | Use Case & Notes |
| --- | --- | --- | --- | --- |
| Qwen3-VL 8B | 8B | ~6GB | LM Studio / Ollama | TOP PICK. Best-in-class GUI grounding (ScreenSpot 94.4%). Excellent at screen analysis, OCR, and UI element detection. 128K context. |
| Qwen2.5-VL 7B | 7B | ~6GB | LM Studio / Ollama | Battle-tested and stable. Exceptional document OCR (DocVQA 95.7). Choose this for proven reliability. |
| Gemma 3 4B | 4B | ~3-4GB | LM Studio / Ollama | LIGHTWEIGHT. Runs on 6GB GPUs. Good for simple screen checks (Yes/No). Not suited for precise coordinate detection. |
| Phi-4-Reasoning-Vision 15B | 15B | ~10GB | LM Studio (GGUF) | Microsoft. Excels at complex reasoning over screen content: charts, tables, error messages. |
| Gemma 3 27B QAT | 27B | ~14GB | LM Studio / Ollama | HIGH-END. For 24GB GPUs. QAT (Quantization-Aware Training) preserves quality. Best local quality available. |

Recommendation: Start with Qwen3-VL 8B. It runs on 6GB VRAM with GUI grounding accuracy approaching cloud APIs. In LM Studio, search for "qwen3-vl"; in Ollama, run ollama pull qwen3-vl.

Connection Setup

  1. Launch a Vision-capable model in LM Studio or Ollama
  2. RocketMouse AI → AI Settings → "Local LLM" section
  3. Base URL: LM Studio http://localhost:1234, Ollama http://localhost:11434
  4. Model ID: the running model name (e.g., qwen3-vl-8b)
  5. Test Connection → Save

About AI Autopilot: The native path (Computer Use API) supports both Anthropic Claude (Sonnet/Opus) and OpenAI (GPT-5.5 / GPT-5.4 / computer-use-preview). Local LLMs use the generic path (prompt-based JSON). AiClick, AiOcr, AiSmartWait, AiValidate, and BoolAiCondition all work with any provider including local LLMs.

AI Assistant

The AI Assistant chat panel on the right side offers:

  • Explain — AI explains the current macro
  • Diagnose — AI suggests issues and improvements
  • General Chat — Ask about block usage and more
  • Block Generation — Describe tasks in natural language to auto-generate block sequences

License & Settings

License

Free 15-day trial, then activate with a license key for full access. See License Policy for details.

AI Settings

Toolbar → AI Settings to configure:

  1. Enter API keys for each provider
  2. Test connectivity
  3. Select default provider and model
  4. "AI Autopilot" section: pick "Default Model" from the unified ComboBox (Anthropic Sonnet 4.6 / Opus 4.7, or OpenAI GPT-5.5 / 5.4 / computer-use-preview)
  5. Enable/disable Self-Healing
  6. Save

API keys are stored encrypted (DPAPI) in the Windows registry. They are never saved in project files.

Project Settings

.rmproj files are JSON format. They can be manually edited but are typically managed within the app.
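
As a purely hypothetical illustration of what such a JSON project might contain (the actual .rmproj schema is not documented here; every field name below is an assumption):

```json
{
  "name": "example-macro",
  "version": 1,
  "blocks": [
    { "id": 1, "type": "Start" },
    { "id": 2, "type": "LeftClick", "params": { "x": 100, "y": 200 } }
  ]
}
```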

Keyboard Shortcuts

| Shortcut | Action |
| --- | --- |
| Ctrl+Z | Undo |
| Ctrl+Y | Redo |
| Ctrl+C | Copy |
| Ctrl+V | Paste |
| Ctrl+D | Duplicate |
| Ctrl+A | Select All |
| Ctrl+F | Search Blocks |
| Ctrl+S | Save |
| Ctrl+Shift+S | Save As |
| Ctrl+Shift+F | Fold / Unfold |
| Delete | Delete Selected |
| Escape | Deselect |
| Home | Zoom to Fit |
| F9 | Toggle Breakpoint |
| Mouse Wheel | Zoom |
| Middle Button Drag | Pan |
| Right Button Drag | Pan |