Help — RocketMouse AI

Getting Started

What is RocketMouse AI?

RocketMouse AI is a visual editor for building RPA macros using Scratch-style puzzle blocks. Drag and snap blocks together to automate mouse actions, keyboard input, browser interactions, Excel operations, image recognition, AI integration, and more — all without writing code.

Screen Layout

The app uses a 3-column layout:

Left: Category sidebar + Block palette — list of available blocks
Center: Workspace — main area for placing and connecting blocks
Right: Property panel + Log/Variables panel — block settings and execution results

Project Files

Saved in .rmproj format (JSON). Ctrl+S to save, Ctrl+Shift+S for Save As. An asterisk * in the title bar indicates unsaved changes.

Editor Basics

Placing Blocks

Drag blocks from the palette to the workspace. You can also click a palette item to place it directly.

Snap Connections

Bring blocks close together (within 20px) and they automatically snap-connect. Yellow highlight shows connection candidates.

Stack Dragging

Drag the top block of a connected chain to move all attached blocks together.

Block Selection

Click to select (blue border). Ctrl+click for multi-select. Drag on canvas for marquee selection. Ctrl+A to select all.

Zoom / Pan

Mouse wheel to zoom. Middle button drag or right button drag to pan. Home key to zoom-to-fit all blocks.

Property Panel

Appears on the right when a block is selected. Set comments, delay (ms), enabled/disabled, and block-specific parameters.

Palette Search

Use the search box above the palette for incremental search across all categories. Esc or the ✕ button clears the search.

Minimap

200×150px overview in the bottom-left corner. Click/drag to pan the workspace. Toggle via toolbar button.

Block Types

Shape	Description	Examples
Stack	Standard block with top notch + bottom bump	Click, type, file operations
Hat	Rounded top, no top connector	Start block, function definitions
C-Block	Contains child blocks in its mouth, auto-expands	Loops, conditions, TryCatch
Boolean	Hexagonal, represents true/false conditions	Comparisons (=, >, <), AND/OR/NOT
Reporter	Capsule shape, returns values, snaps into parameter slots	Variable refs, math, strings
Cap	Top connector only, terminates flow	Break, Continue, MacroExit, Return

Block Categories

● Mouse 16 blocks

▶

Block	Description
LeftClick / RightClick / MiddleClick	Mouse button clicks (with optional coordinates)
DoubleClick	Double-click
MouseMove	Move cursor to coordinates
MouseDrag	Drag from point A to point B
ScrollUp / ScrollDown	Mouse wheel scroll
MouseClickRelative / MouseMoveRelative	Relative to active window
Others	LeftDown/Up, RightDown/Up, ScrollLeft/Right

● Keyboard 2 blocks

▶

Block	Description
KeyInput	Key combos (e.g., `Ctrl+S`, `{ENTER}`, `{F8}{ENTER}`)
TextPaste	Paste text via clipboard

● Window 8 blocks

▶

Block	Description
ActivateWindow	Bring window to front
CloseWindow	Close specified window
MinimizeWindow / MaximizeWindow / RestoreWindow	Window state
ResizeWindow / MoveWindow	Resize and reposition
GetWindowTitle	Get active window title

● Browser 16 blocks

▶

Control Edge/Chrome via Playwright.

Block	Description
BrowserLaunch	Launch browser (Edge/Chrome/Chromium)
BrowserNavigate	Navigate to URL
BrowserClick	Click element (CSS selector)
BrowserFill	Type text into input field
BrowserSelect	Select from dropdown
BrowserCheck	Toggle checkbox on/off
BrowserGetText / BrowserGetAttribute	Extract element data
BrowserGetTitle / BrowserGetUrl	Get page title or URL
BrowserScreenshot	Save page or element screenshot
BrowserExecuteScript	Execute JavaScript
BrowserWaitForElement	Wait for element to appear
BrowserSwitchTab / BrowserGoBack	Tab and navigation control
BrowserClose	Close browser

● Excel 18 blocks

▶

COM-based Excel operations (requires Microsoft Excel).

Block	Description
ExcelOpen / ExcelClose / ExcelSave	Workbook management
ExcelReadCell / ExcelWriteCell	Cell read/write
ExcelSetFormula	Set cell formula
ExcelRunMacro	Execute VBA macro
ExcelFilter / ExcelReadRange	Filter and range ops
Others	CopySheet, DeleteSheet, WriteRange, InsertRow, Sort, FindCell, Chart, ExportPdf, GetLastRow

● File / Folder 15 blocks

▶

Block	Description
FileCopy / FileMove / FileDelete	Basic file operations
FileRead / FileWrite / FileAppend	Text file I/O
FolderCreate / FolderCopy / FolderDelete	Folder operations
FileRename / FileGetInfo / FileListFiles	File utilities

● Data Processing 22 blocks

▶

Block	Description
JsonParse / JsonStringify	JSON parsing and generation
RegexMatch / RegexReplace	Regular expressions
StringIndexOf / StartsWith / EndsWith	String search
DateAdd / DateDiff / DateFormat	Date arithmetic and formatting
ListCreate / Add / Get / Set / Length / Remove / Sort / Join	List operations
Others	TextExtract, TextFormat, TextReplace, TextQuote

● Control Flow 17 blocks

▶

Block	Description
ConditionBranch	if-then (C-block)
ConditionBranchElse	if-then-else (dual mouth)
LoopStart	For loop (specified count)
ForEachLoop	Iterate over list items
TryCatch	Error handling (try/catch mouths)
BreakLoop / ContinueLoop	Loop control (Cap type)
MacroExit	Immediate macro termination (Cap type)
WaitTime / WaitWindow / WaitImage / WaitColor / WaitCpu	Conditional waits
FunctionDefine / FunctionCall / FunctionReturn	Custom functions

● AI 1 block

▶

Block	Description
AiInstruction	Send prompt to LLM, store response in variable. 5 providers (OpenAI, Anthropic, Gemini, Groq, local LLM). Vision API support (image attachment, 4MB limit).

● Vision / OCR 9 blocks

▶

Block	Description
VisionClick	Click on template image
VisionWait / VisionDisappear	Wait for image appear/disappear
VisionGetPosition	Get image coordinates
VisionCapture	Save screen capture
OcrReadText	Read text from screen region (Windows OCR)
OcrClickText / OcrWaitText	Find and click/wait for text
WaitImage	Image recognition wait (OpenCV)

● Boolean / Condition 11 types

▶

Hexagonal blocks that snap into C-block condition slots.

Block	Description
BoolCompareEquals / Greater / Less	Comparison operators (=, >, <)
BoolVariableEquals	Variable value match
BoolFileExists	File existence check
BoolWindowVisible	Window visibility check
BoolImageFound / BoolColorFound	Image/color detection
BoolAnd / BoolOr / BoolNot	Logical operators (nestable)

● System / Utility 18+ blocks

▶

Block	Description
AppLaunch / AppClose	Launch and close applications
MessageBox / InputDialog	User interaction dialogs
ClipboardSet / ClipboardGet	Clipboard operations
ProcessStart / ProcessKill	Process management
ZipCreate / ZipExtract	ZIP compression/extraction
Base64Encode / Base64Decode	Base64 encoding
HashCompute	Hash (MD5/SHA1/SHA256/SHA512)
HttpGet / HttpPost / HttpDownload	HTTP operations
RegistryRead / RegistryWrite	Registry operations
TextToSpeech / ScreenCapture	System utilities

● Reporter 15 blocks

▶

Capsule-shaped blocks that snap into parameter slots of other blocks.

Block	Description
MathAdd / Subtract / Multiply / Divide / Modulo	Arithmetic + modulo
MathFunction	Math functions (abs, round, ceil, floor, sqrt)
StringConcat / StringLength / StringCharAt	String operations
RandomNumber	Random number generation
SenseMouseX / SenseMouseY	Current mouse coordinates
SenseCurrentDateTime	Current date/time
SenseClipboardText	Clipboard text
SenseEnvironmentVariable	Environment variable value

Execution & Debugging

Running Macros

Press the ▶ Play button in the toolbar. Execution starts from the Start block. The currently executing block is highlighted with a green border.

Pause / Resume / Step

While running: ⏸ Pause to stop, ▶ Resume to continue, ⏩ Step to execute one block at a time.

Stop

Press ■ Stop to immediately halt execution.

Dry Run

Test the flow without performing actual mouse/keyboard operations. Results are shown in the log panel.

Speed Control

Adjust execution speed from 1x to 10x using the toolbar slider. Changes apply in real-time.

Error Visualization

Failed blocks are highlighted with a red border, and the error message is shown in a tooltip.

Log Panel

Structured log in the bottom panel with 4 color levels: gray (Info), green (Success), red (Error), yellow (Warning).

Variable Watch

The "Variables" tab in the bottom panel shows runtime variable values. Changed variables are highlighted in green.

Breakpoints

Right-click a block → "Set Breakpoint" or press F9. A red dot marker appears, and execution pauses automatically when it reaches that block.

Recording

Press the ⚫ Record button to capture mouse/keyboard actions. Recorded actions are placed as blocks in the workspace.

Variables & Functions

Defining Variables

Use the VariableDefine block to create variables. Reference values with {=variableName} syntax in any parameter field.

Expressions

The VariableExpression block supports arithmetic: +, -, *, /, %, parentheses. Example: {=count} + 1

System Variables

Automatically set inside loops:

{=_loopIndex} — 1-based loop counter (1, 2, 3, ...)
{=_loopIteration} — 0-based loop counter (0, 1, 2, ...)

List Operations

ListCreate for empty lists, ListAdd to append, ListGet/ListSet for index access. ListSort, ListJoin, and more available.

Custom Functions

Define functions with FunctionDefine (Hat), call with FunctionCall (Stack), return values with FunctionReturn (Cap). Recursion supported (max depth 100).

Auto Output Variables

Blocks with output variables get smart default names (e.g., CellValue). Duplicates are auto-numbered (CellValue2, CellValue3...).

AI Features

AI Instruction Block

The AiInstruction block sends a prompt to an LLM and stores the response in a variable. Attach image files for Vision API (4MB limit). Parameters: prompt, outputVariable, filePath, provider, model, temperature.

Supported Providers

Provider	Default Model	Vision
OpenAI	gpt-5.5 (default), gpt-5.5-mini, gpt-5.4, gpt-5.2, gpt-4.1, gpt-4o	✓
Anthropic	claude-sonnet-4-6 (default), claude-opus-4-7, claude-haiku-4-5	✓
Google Gemini	gemini-3-pro / 3-flash (Preview), gemini-2.5-pro / 2.5-flash	✓
Groq	Llama 4 Scout 17B (multimodal), Llama 3.3 70B, Llama 3.1 8B	✓ (with Llama 4 Scout)
Local LLM	User-specified	✓

AI Vision / Computer Use

New in v2.0 — AI Vision features let AI "see" and understand your screen, enabling automation that adapts to dynamic UI changes without fixed coordinates or template images.

AI Vision Blocks

Block	Function	Key Parameters
AiClick	AI sees the screen, locates the element, and clicks	prompt, button, clickType, provider, model
AiSmartWait	AI periodically checks the screen until condition is met	prompt, timeoutMs, pollingMs, provider, model
AiOcr	AI reads text from the screen into a variable	prompt, outputVariable, provider, model
AiValidate	AI verifies screen state and returns true/false	prompt, outputVariable, provider, model
BoolAiCondition	AI judgment in if/while conditions	prompt, provider, model

2-Pass Refinement (AiClick): Pass 1 uses a 1280×720 scaled screenshot for coarse coordinates. If near a screen edge (within 200px), Pass 2 crops a 1000×1000px region at original resolution for precise targeting. This accurately detects small icons like the Start button.

Prompt Examples

Example prompts for each AI Vision block. Specific, clear descriptions improve AI accuracy.

AiClick — Describe the target UI element specifically
The "File" menu in Notepad The "Save" button in the Save As dialog The Chrome icon in the taskbar Cell B3 in the Excel spreadsheet

AiSmartWait — Describe the screen state change to wait for
The file download has completed (progress bar disappeared) A "Processing complete" message is displayed The splash screen has closed and the main window is visible

AiOcr — Specify the location and content of text to read
Read the error message shown in the dialog at the center of the screen Read the total amount in the bottom-right cell of the Excel table Read the filename shown in the title bar

AiValidate — Describe the expected screen state after an action
The file was saved successfully (no "*" mark in the title bar) A login screen is displayed (username and password fields are visible) The print preview is correctly displayed

BoolAiCondition — Describe screen conditions for if/while logic
An error dialog is displayed on the screen Notepad is visible on the screen The web page has finished loading (no loading spinner visible)

Self-Healing: Enable in AI Settings to automatically fall back to AiClick when a standard VisionClick (template matching) fails. Dramatically improves robustness of existing macros.

AI Autopilot (Autonomous Agent)

AiAutopilot takes a natural language task description and autonomously operates your computer — taking screenshots, deciding actions, and repeating until the task is complete.

Dual-Provider Native Routing

Autopilot automatically selects from 3 execution paths based on the provider and model. v2.0.6 adds OpenAI GPT-5.5 native path.

Path	When	How It Works	Notes
Anthropic Native Path	Anthropic + Sonnet/Opus	Claude Computer Use API (multi-turn tool_use/tool_result)	High accuracy, well-proven stability
OpenAI Native Path NEW	OpenAI + GPT-5.5 / GPT-5.4 / computer-use-preview	OpenAI Responses API (stateless continuation via previous_response_id)	Faster (~1.3x) and cheaper (~5x) vs. Sonnet
Generic Path	All other providers + Anthropic Haiku	Prompt-based JSON approach	Standard

Native paths send screenshots to the AI, receive tool calls (click, type, scroll, etc.), execute them, and return the resulting screenshot — a multi-turn conversation enabling high accuracy and self-correction.

Approach differs by provider: GPT-5.5 tends to take the shortest path (e.g., Win+R Run dialog), while Claude Sonnet behaves more human-like, navigating the Start menu step-by-step with visual confirmation. Pick GPT-5.5 for simple repetitive tasks, Sonnet for complex screen recognition.

Parameters

Parameter	Description	Default
task (prompt)	Natural language task (e.g., "Open Notepad, type Hello World, and save")	(required)
maxSteps	Maximum steps (each step = screenshot + LLM + action)	30
timeoutSeconds	Timeout in seconds	300
provider	LLM provider. `Default` uses the provider selected in AI Settings → "AI Autopilot" section (Anthropic or OpenAI). Can be overridden per block: `Anthropic` / `OpenAI` / `OpenAICompatible` etc.	Default
model	Model ID. Can be overridden per block (e.g., `gemma4:26b`)	(empty = use AI Settings)
outputVariable	Result variable (True/False)	AutopilotResult

Supported Actions

Action	Description
left_click / right_click / double_click	Mouse click operations
type	Text input (Japanese supported via clipboard)
key	Keyboard input (Ctrl+S, Enter, Win+D, etc.)
scroll	Mouse wheel scroll
mouse_move / left_click_drag	Cursor movement and drag
screenshot	Request additional screenshot
wait	Wait 2 seconds

Task Description Examples (task parameter)

Specific, step-clear task descriptions lead to more accurate Autopilot execution.

AiAutopilot — Describe the full task in natural language
Open Notepad, type "Hello World", and save it as test.txt on the Desktop → Specify filename and location Open Chrome, navigate to https://example.com, copy the page title, and paste it into Notepad → Multi-app coordination task Open Calculator and compute 1234 × 5678 → Simple, clear objective Open report.xlsx from the Desktop and check the value in cell A1 → Working with existing files

Cost Awareness: Autopilot sends a screenshot (~200KB) to the LLM on each step. A 30-step task incurs significant API costs. We recommend a hybrid approach: use Win32 blocks for routine tasks, Autopilot only where judgment is needed.

Recommended Models for AI Vision / Autopilot

AI Vision works with all Vision-capable providers, but here are the recommended models for each use case:

Use Case	Recommended Model	Reason
Autopilot (Speed & Cost) NEW	OpenAI GPT-5.5	Built-in Computer Use. ~1.3x faster, ~5x cheaper than Sonnet. Available from OpenAI Tier 1. Best for simple repetitive tasks.
Autopilot (Accuracy & Stability)	Claude Sonnet 4.6	Native Computer Use API. Long production track record; step-by-step verification approach excels at complex tasks.
Autopilot (Top Performance)	Claude Opus 4.7	Native Computer Use API. Highest accuracy and self-correction. For long, complex tasks.
Autopilot (OpenAI Premium)	computer-use-preview	OpenAI's specialized Computer Use model. Requires Tier 3+ (cumulative $100 spend + 7 days).
Autopilot (Local LLM)	Gemma 4 26B A4B / Qwen3-VL 8B	Via Ollama. No API costs, works offline. Speed depends on GPU.
AiClick / AiSmartWait / AiOcr	Claude Sonnet 4.6 / GPT-5.5 / Gemini 2.5 Pro / Llama 4 Scout	Single Vision API call. Any Vision-capable model works well.
BoolAiCondition	Gemini 2.5 Flash / GPT-5.5 mini	Simple Yes/No judgment. Fast, low-cost models suffice.
AiInstruction (Text)	Any (per your needs)	No Vision required — Groq or local LLMs are also valid.

Computer Use Model Setting

The native-path model is selected from a single unified ComboBox at: AI Settings → "AI Autopilot" section → "Default Model". Pick from Anthropic (Sonnet 4.6 / Opus 4.7) or OpenAI (GPT-5.5 / 5.4 / computer-use-preview) in one selection. You can also override per-block with the provider and model parameters.

Model Priority: (1) Block-level provider + model parameters → (2) AI Settings "AI Autopilot" Default Model → (3) Default: Claude Sonnet 4.6 (Anthropic)

OpenAI Note: GPT-5.5 is available from Tier 1 ($5+ spent), but computer-use-preview requires Tier 3+ (cumulative $100 + 7 days). Also, the OpenAI project's "Allowed Models" whitelist must include the target model (some Default projects only permit gpt-4o).

Using Autopilot with Local LLMs

Run a Vision-capable local LLM (e.g., Gemma 4) via Ollama to use Autopilot with no API costs and fully offline.

AI Settings → "Provider Settings" → "OpenAI Compatible": set the URL to http://localhost:11434/v1 and the model name (e.g., gemma4:26b)
On the AI Autopilot block, set provider to OpenAICompatible
Local LLMs are slower — increase timeoutSeconds as needed

Recommended Local LLMs for Autopilot

Model	Ollama Command	VRAM	Notes
Gemma 4 26B A4B	`ollama pull gemma4:26b`	~15GB	MoE (3.8B active) for fast inference. Vision + function calling. Apache 2.0 license.
Qwen3-VL 8B	`ollama pull qwen3-vl:8b`	~6GB	Lightweight and fast. Runs on GPUs with less VRAM. Strong Vision accuracy for its size.

Note: Local LLMs are slower and less accurate than Claude Sonnet/Opus. Start with simple tasks.

Local Vision LLMs (LM Studio / Ollama)

AI Vision features work with locally-hosted Vision-capable LLMs via the "Local LLM" provider (OpenAI-compatible API). No cloud API required.

Below are recommended Vision-capable local models for screen recognition and UI element detection (as of March 2026).

Recommended Local Vision Models

Model	Size	VRAM (Q4)	Runtime	Use Case & Notes
Qwen3-VL 8B	8B	~6GB	LM Studio / Ollama	TOP PICK Best-in-class GUI grounding (ScreenSpot 94.4%). Excellent at screen analysis, OCR, and UI element detection. 128K context.
Qwen2.5-VL 7B	7B	~6GB	LM Studio / Ollama	Battle-tested and stable. Exceptional document OCR (DocVQA 95.7). Choose this for proven reliability.
Gemma 3 4B	4B	~3-4GB	LM Studio / Ollama	LIGHTWEIGHT Runs on 6GB GPUs. Good for simple screen checks (Yes/No). Not suited for precise coordinate detection.
Phi-4-Reasoning-Vision 15B	15B	~10GB	LM Studio (GGUF)	Microsoft. Excels at complex reasoning over screen content: charts, tables, error messages.
Gemma 3 27B QAT	27B	~14GB	LM Studio / Ollama	HIGH-END For 24GB GPUs. QAT (Quantization-Aware Training) preserves quality. Best local quality available.

Recommendation: Start with Qwen3-VL 8B. It runs on 6GB VRAM with GUI grounding accuracy approaching cloud APIs. In LM Studio, search for "qwen3-vl"; in Ollama, run ollama pull qwen3-vl.

Connection Setup

Launch a Vision-capable model in LM Studio or Ollama
RocketMouse AI → AI Settings → "Local LLM" section
Base URL: LM Studio http://localhost:1234, Ollama http://localhost:11434
Model ID: the running model name (e.g., qwen3-vl-8b)
Test Connection → Save

About AI Autopilot: The native path (Computer Use API) supports both Anthropic Claude (Sonnet/Opus) and OpenAI (GPT-5.5 / GPT-5.4 / computer-use-preview). Local LLMs use the generic path (prompt-based JSON). AiClick, AiOcr, AiSmartWait, AiValidate, and BoolAiCondition all work with any provider including local LLMs.

AI Assistant

The AI Assistant chat panel on the right side offers:

Explain — AI explains the current macro
Diagnose — AI suggests issues and improvements
General Chat — Ask about block usage and more
Block Generation — Describe tasks in natural language to auto-generate block sequences

License & Settings

License

Free 15-day trial, then activate with a license key for full access. See License Policy for details.

AI Settings

Toolbar → AI Settings to configure:

Enter API keys for each provider
Test connectivity
Select default provider and model
"AI Autopilot" section: pick "Default Model" from the unified ComboBox (Anthropic Sonnet 4.6 / Opus 4.7, or OpenAI GPT-5.5 / 5.4 / computer-use-preview)
Enable/disable Self-Healing
Save

API keys are stored encrypted (DPAPI) in the Windows registry. They are never saved in project files.

Project Settings

.rmproj files are JSON format. They can be manually edited but are typically managed within the app.

Keyboard Shortcuts

Shortcut	Action
`Ctrl`+`Z`	Undo
`Ctrl`+`Y`	Redo
`Ctrl`+`C`	Copy
`Ctrl`+`V`	Paste
`Ctrl`+`D`	Duplicate
`Ctrl`+`A`	Select All
`Ctrl`+`F`	Search Blocks
`Ctrl`+`S`	Save
`Ctrl`+`Shift`+`S`	Save As
`Ctrl`+`Shift`+`F`	Fold / Unfold
`Delete`	Delete Selected
`Escape`	Deselect
`Home`	Zoom to Fit
`F9`	Toggle Breakpoint
Mouse Wheel	Zoom
Middle Button Drag	Pan
Right Button Drag	Pan