Help - RocketWhisper for Mac | AI Speech Recognition & Transcription

1Installation

System Requirements

OS: macOS 14.0 Sonoma or later
Processor: Apple Silicon recommended (M1 / M2 / M3 / M4)
Memory: RAM 8GB or more (16GB recommended)
Storage: 200MB + model files (up to 3GB)

Installation Steps

Download the latest .dmg file from the download page.
Double-click the downloaded DMG file and drag the RocketWhisper icon to the Applications folder.
If Gatekeeper shows a warning on first launch, click "Open". Alternatively, go to System Settings > Privacy & Security and click "Open Anyway".
On first launch, the Whisper model will begin downloading. This may take a few minutes depending on your network.

Choosing a Model

Whisper Model Comparison

Model	Size	Accuracy	Speed	Recommended Use
Small	500MB	High	Fast	Lower-spec Macs
Medium	1.5GB	High	Normal	Audio under 5 seconds
Large V3 Turbo Recommended	1.6GB	High	Fast	5-20 second audio
Large V3	3.0GB	Highest	Slower	Audio over 20 seconds

Tip: If you're unsure, Large V3 Turbo is recommended. It offers excellent balance of accuracy and speed, optimized for Apple Silicon's Neural Engine.

* For Japanese speech recognition, Large V3 Turbo or higher is recommended. Small/Medium may have reduced accuracy for kanji and katakana words.

2Basic Usage

Voice Recognition from Microphone

Click the RocketWhisper icon in the menu bar to show the popup window.
Click the Record button (microphone icon).
Speak into your microphone.
Click the Stop button.
The recognition result will appear in the text area.

Using Recognition Results

Copy: Click the copy button to copy the result to your clipboard.
Auto Copy: When enabled in settings, results are automatically copied on completion.
Auto Paste: When enabled, results are automatically pasted into the active app's text field when using shortcut recording.

Tip: When using shortcut recording, text can be automatically pasted into the app that was focused before recording. UI button recording requires manual copy and paste.

3Settings

Click the gear icon in the popup window to open settings. Various options are available in the following tabs.

Tab	Settings
Model & Language	Whisper model selection, recognition language
Input Device	Microphone selection, auto copy, auto paste
Shortcut	Recording shortcut customization, Right Option key, cancel key, AI Command shortcut (`⌃⇧Space`)
Word Dictionary	Custom terminology: technical terms, company names, personal names
Text Processing	Auto punctuation, line breaks, voice commands enable/disable
Correction Rules	Auto-correction enable/disable, preset rules, custom rules
App-Specific	Processing mode settings, app-to-mode mapping
AI Processing	AI provider selection (OpenAI / Anthropic / Groq / Gemini / Local LLM), model, API keys
License	License type, license key entry

4Global Shortcut

RocketWhisper supports customizable global shortcuts, allowing instant recording from any app. It also supports Right Option key tap and hold (Push-to-Talk).

Right Option Key Operations

Action	Behavior
Hold Right Option → Release	Push-to-Talk (records while held, stops and recognizes on release)
Double-tap Right Option	Switch to continuous recording mode (tap again to stop)
Press another key while holding Right Option	Cancel recording (works as normal Option modifier)

Recommended Shortcut Settings

Shortcut	Type	Description
`⌥Space`	Toggle	Default setting. Option + Space to start/stop recording. Same as Superwhisper.
`Right Option` (hold)	Push-to-Talk	Most recommended. Records only while held, auto-stops on release.
`⌃⇧R`	Toggle	R for Record. Each press toggles recording on/off.
`F9`	Toggle	Function key. Less likely to conflict with other shortcuts.

Cancel Recording

Press Escape during recording to cancel without processing.

AI Command Shortcut

⌃⇧Space (Control + Shift + Space) activates AI Command mode. See AI Command Mode for details.

Note: Global shortcuts require Accessibility permissions. You'll be prompted on first launch. Verify in System Settings > Privacy & Security > Accessibility that RocketWhisper is enabled.

5Recognition History

RocketWhisper automatically saves past recognition results. Click the History button in the popup to view the history list.

History Features

List View: View past results with timestamps
Search: Search past results by keyword
Copy: Copy any history item to clipboard
Delete: Delete individual history items
Export: Export history as a text file

6Text Processing (Punctuation & Line Breaks)

RocketWhisper includes advanced text processing features to produce natural, well-formatted text.

Punctuation Prompt

A prompt is set for the Whisper model to encourage punctuation output, making the model more likely to include punctuation.

Auto Punctuation

Post-processing applies 7 rule-based punctuation insertions. Even if Whisper's output lacks punctuation, natural punctuation is added.

Punctuation Rules (7 Stages)

Insert periods after sentence-ending expressions
Insert commas after conjunctive particles
Insert question marks at end of questions
Insert exclamation marks at end of exclamations
Insert commas in enumerations
Insert commas at long phrase boundaries
Remove unnecessary leading punctuation

Auto Line Breaks

Automatically inserts line breaks at sentence boundaries, making long text more readable with paragraphs.

Tip: Auto punctuation and auto line breaks can be enabled/disabled independently. Disable line breaks for chat apps, enable for document creation.

7Voice Commands

Voice commands let you execute text editing operations by speaking specific phrases. Enable/disable in "Text Processing" settings.

Supported Commands

Command	Trigger Phrases	Action
New Line	"new line", "enter", "line break"	Insert line break
Paragraph	"paragraph", "new paragraph"	Insert double line break (paragraph)
Delete	"delete", "backspace", "undo"	Delete previous word

Tip: Voice commands are processed at Stage 1 of the text processing pipeline. When a command is detected, the action is executed and the phrase is removed from the text.

If Voice Commands Aren't Recognized

Whisper may sometimes misrecognize homophones based on context.

If voice commands don't work as expected:

Check that voice commands are enabled — Verify "Voice Commands" is ON in the "Text Processing" settings tab.
Pause before and after commands — A short pause before and after helps distinguish commands from regular speech.
Speak clearly — Pronouncing each syllable distinctly improves recognition.

8Word Dictionary (Custom Terms)

The word dictionary allows you to register technical terms, company names, personal names, and acronyms that Whisper might not recognize well. This dramatically improves recognition accuracy. Not available in macOS built-in dictation.

How It Works

Registered words are used as WhisperKit promptTokens, making the Whisper model prioritize these terms in its output.

How to Register

Open the "Word Dictionary" tab in settings.
Click the "Add" button.
Enter the word to register (e.g., React, TypeScript, AWS).
Optionally set a reading/pronunciation (for Whisper recognition assistance).

Note: Keep registered terms to about 15 words (short tokens) maximum. Too many terms may affect decoder log probability and reduce accuracy.

Registration Examples

Technical terms: React, TypeScript, Kubernetes, Docker
Company names: Mojosoft, OpenAI
Personal names: John Smith
Acronyms: AWS, GCP, CI/CD

Features

Fully local processing — No API costs, no internet required
Real-time effect — Active immediately after registration
Dictionary replacement — Can also be used for auto-replacement rules

9Auto-Correction Rules

Set up rules to automatically correct misrecognitions in Whisper output. Supports both simple string replacement and regex.

Rule Types

Simple replacement: Replace a specific string with another
Regex: Advanced replacement using regular expression patterns
Case sensitivity: Set per-rule whether to match case

Built-in Hallucination Filters

27 filters are built in to automatically remove "hallucination text" that Whisper sometimes generates during silence, such as:

"Thank you for watching"
"Please subscribe"
"Good night" (hallucination during silence)

Preset Rules

Preset rules for common misrecognition patterns are available. Enable with one click from settings.

Custom Rules

Open the "Correction Rules" tab in settings.
Click "Add Rule".
Enter the search string (misrecognized text) and replacement string (correct text).
Optionally enable "Use regex" and "Ignore case" options.

10App-Specific Processing Modes

App-specific modes automatically apply different text processing settings based on the focused app. For example, use punctuation for text editors and casual style for chat apps.

Processing Modes

Mode	AI Required	Description
Smart	No	Auto-formats punctuation and line breaks. Most versatile mode.
Simple	No	Output recognition result as-is. Minimal processing.
Business	Yes	Auto-converts to polite, formal style. For emails and documents.
Casual	Yes	Converts to friendly style. For chat and social media.
Summary	Yes	Summarizes recognized text. For meeting notes and memos.
Translation	Yes	Translates to another language.
Grammar Fix	Yes	AI corrects misrecognition and grammar errors.

Setting Up App Mapping

Open the "App-Specific" tab in settings.
Enable "App-Specific Processing Modes".
Click "Add" to select an app and set its mode.

Tip: When app-specific mode is enabled and a mode is set for the current app, mode-specific settings apply. Otherwise, global settings are used.

11AI Processing (LLM Integration)

RocketWhisper integrates with 5 AI providers for advanced processing like text formatting, translation, and summarization.

Supported AI Providers

Provider	Model Examples	Features
OpenAI	GPT-4o, GPT-4o mini	High accuracy, broad language support
Anthropic	Claude Sonnet 4.5, Haiku 4.5	Natural language, polite output
Groq	LLaMA 3.3 70B	Ultra-fast inference, free tier available
Google Gemini	Gemini 2.5 Pro / Flash	Generous free tier, multimodal support
Local LLM	LM Studio, Ollama	Fully offline, privacy-focused

Setup

Open the "AI Processing" tab in settings.
Select your provider.
Enter your API key (not needed for Local LLM).
Select the model to use.
Enable AI processing and choose a processing mode.

Local LLM Setup Examples

Using LM Studio

Install LM Studio and download your preferred model.
Start the local server in LM Studio (default: http://localhost:1234).
Select "Local LLM" in RocketWhisper's AI settings.
Enter http://localhost:1234 as the base URL.
Model ID can be left empty (LM Studio uses the loaded model).
API key can be empty or a dummy value (e.g., lm-studio).

Using Ollama

Install Ollama.
Download a model from terminal:
ollama pull llama3.2 (example: Llama 3.2)
Verify Ollama server is running (usually auto-starts after install).
Select "Local LLM" in RocketWhisper's AI settings.
Enter http://localhost:11434 as the base URL.
Enter the downloaded model name as Model ID (e.g., llama3.2, qwen2.5, gemma2).
* Run ollama list to see available model names.
API key can be left empty (not required for Ollama).

Tip: To minimize costs, Groq (free tier) or Google Gemini (generous free tier) are recommended. For complete privacy, Local LLM processes everything from speech recognition to AI formatting fully offline.

12AI Command Mode

AI Command mode lets you give voice instructions to edit selected text with AI. Perform translation, summarization, tone changes, and more with just your voice.

How to Use

Select text in any app.
Press ⌃⇧Space (Control + Shift + Space) to activate AI Command mode.
Speak your instruction (e.g., "Translate to English").
AI processes the selected text according to your instruction and replaces it.

Examples

Voice Instruction	Processing
"Translate to English"	Translate selected text to English
"Summarize this"	Summarize selected text concisely
"Make it formal"	Convert casual text to formal style
"Add comments"	Add comments to code
"Convert to bullet points"	Convert text to bullet list format
"Fix typos"	Correct spelling and grammar errors

Note: AI Command mode requires an API key for one of the AI providers to be configured.

13Custom Instructions

Custom Instructions let you pre-assign AI processing prompts to dedicated shortcuts. Unlike AI Commands, you don't need to speak instructions each time — the recognized speech is automatically processed using a pre-configured prompt.

Difference from AI Commands

Feature	AI Commands	Custom Instructions
AI Instruction	Speak instructions each time	Pre-configured prompt
Text Selection	Required (processes selected text)	Not required (processes speech input)
Shortcut	Single shared shortcut (⌃⇧Space)	Individual shortcut per instruction
Best For	Ad-hoc, varying instructions	Frequently used operations in one action

How to Use

Create an instruction in the "Custom Instructions" settings tab and assign a shortcut.
In any app, press the assigned shortcut to start recording.
Speak into the microphone (the recognized text becomes the AI input).
Press the same shortcut again to stop recording.
Speech is recognized and processed by AI with the pre-configured prompt, then automatically pasted.

Preset Instructions

Four presets are automatically created on first launch. These can be edited but not deleted.

Preset	Description
🌐 Translate to English	Translate speech to natural English
💼 Business Style	Convert to formal business Japanese
📝 Summary	Summarize text concisely
✔️ Grammar Fix	Fix grammar errors and misrecognitions

Note: Custom Instructions require an API key for one of the AI providers. Up to 20 instructions can be registered.

14Voice Launcher

Voice Launcher lets you launch apps or open URLs by speaking registered keywords. Processed at Stage 0 of the pipeline, so matching keywords execute actions without text output.

How It Works

Exact match for keywords (ignoring punctuation and case)
On match, launches the registered app or opens URL in browser
No text output on match (action only)

Setup

Open Voice Launcher settings.
Click "Add".
Enter the trigger keyword (e.g., "Notes", "Browser").
Enter the app path or URL to open.

Configuration Examples

Keyword	Action	Type
"Notes"	`/Applications/Notes.app`	App Launch
"Browser"	`/Applications/Safari.app`	App Launch
"Terminal"	`/Applications/Utilities/Terminal.app`	App Launch
"GitHub"	`https://github.com`	Open URL
"Mail"	`/System/Applications/Mail.app`	App Launch

15Voice Search

Voice Search lets you trigger Google searches instantly by speaking specific phrases. Results open in your default browser.

Supported Phrases (10 Patterns)

Phrase Pattern	Example
"Search for..."	"Search for SwiftUI"
"Look up..."	"Look up Neural Engine"
"Google..."	"Google macOS Sequoia"
"What is..."	"What is WhisperKit"
"Find information about..."	"Find information about Metal API"

Tip: Voice Search is processed at Stage 0.5 of the pipeline. The keyword portion is automatically extracted and used as the Google search query.

16Menu Bar Icon

RocketWhisper is a menu bar resident app. It doesn't appear in the Dock. All operations are accessible from the menu bar icon.

Basic Operations

Click: Click the menu bar icon to show the popup window.
Recording status: The icon changes during recording to show current state.
Auto-start: Add to login items to auto-start in the menu bar on Mac startup.

Setting Up Auto-Start

Open System Settings.
Go to General > Login Items.
Click "+" to add RocketWhisper.

Tip: The menu bar icon is always visible, allowing instant voice input with shortcuts without interrupting your work.

17Floating Waveform Indicator

A floating window showing a small mini equalizer-style waveform during recording. Always on top, so you can confirm recording status even while working in other apps.

Specifications

Indicator Details

Size: 96 x 48 pixels (compact capsule shape)
Bars: 8 mini equalizer-style bars
Color: Blue → Purple → Pink gradient
Background: Frosted glass (ultraThinMaterial) + rounded corners
Display: Fades in on recording start, fades out on stop
Initial position: Bottom center of screen

Operations

Drag to move: Drag the indicator to any position.
Position memory: Position is saved and restored on next launch.
All Spaces: Visible on all macOS desktops (Spaces).
Always on top: Stays above all other windows.

Settings

Toggle "Show floating waveform during recording" in the "Model & Language" settings tab. Default is enabled (ON).

Tip: To reset position, run these commands in Terminal:
defaults delete biz.mojosoft.RocketWhisper FloatingWaveformX
defaults delete biz.mojosoft.RocketWhisper FloatingWaveformY

18Batch Processing

Batch transcribe multiple audio files at once. Convenient for processing recorded meeting audio or interview files.

How to Open

Open the menu bar popup.
Click the Batch Processing button (document icon) in the header.
A separate batch processing window opens.

Usage

Add files: Click "Add Files" to select audio files, or drag & drop onto the window.
Start batch: Click "Start Batch" to transcribe files sequentially.
View results: Recognition results (character count) are shown in the list.
Export: Select export format from the "Export" menu and choose a destination folder.

Supported File Formats

WAV, MP3, M4A, FLAC, OGG, WMA, AAC, AIFF

Export Formats

Format	Description	Use Case
TXT	Plain text	General transcription text
SRT	SubRip subtitle format	Subtitle creation for video editing
VTT	WebVTT subtitle format	Subtitles for web video and HTML5

Tip: Batch processing uses its own Whisper model instance, so it can run simultaneously with real-time voice input. However, be mindful of memory usage when processing many files.

19Troubleshooting

If you encounter issues, refer to this FAQ.

Check your network connection. Model files are several hundred MB to 3GB, so a stable Wi-Fi connection is recommended. If download is interrupted, restart the app to retry. Temporarily disable VPN or proxy if applicable.

Check the following:

Change model: Switch to a larger model (Large V3 Turbo recommended).
Microphone: Try an external mic, adjust distance, reduce ambient noise.
Language setting: Verify the recognition language is correctly set.
Word dictionary: Register technical terms in the dictionary to improve accuracy.

Check Accessibility permissions.

Open System Settings.
Go to Privacy & Security > Accessibility.
Verify RocketWhisper is listed and the toggle is enabled.
If not listed, click "+" to add it.
If already added but not working, disable and re-enable.

Also check that no other app is using the same shortcut. Change to a different shortcut if there's a conflict.

Check the following:

macOS version: Requires macOS 14.0 Sonoma or later. Check via Apple menu > About This Mac.
Gatekeeper: If you see "app can't be opened because developer cannot be verified", go to System Settings > Privacy & Security and click "Open Anyway".
Apple Silicon: Works on Intel Macs too, but Apple Silicon (M1+) is recommended.

Check the following:

API key: Verify the API key is correctly entered in settings.
Internet connection: Cloud AI providers require internet.
API credits: For OpenAI or Anthropic, verify you have remaining API credits.
Local LLM: For LM Studio or Ollama, verify the local server is running.

RocketWhisper needs microphone permission.

Open System Settings.
Go to Privacy & Security > Microphone.
Verify RocketWhisper's toggle is enabled.

If the permission dialog didn't appear on first launch, quit and restart the app.

Check the following:

Microphone input: Verify the correct microphone is selected in "Input Device" settings.
Microphone permission: Check microphone access is allowed in macOS privacy settings.
Volume: Check input volume is adequate. Verify in System Settings > Sound > Input.
Recording duration: Very short recordings (under 1 second) may not produce results.

User Guide

1Installation

System Requirements

System Requirements

Installation Steps

Choosing a Model

Whisper Model Comparison

2Basic Usage

Voice Recognition from Microphone

Using Recognition Results

3Settings

4Global Shortcut

Right Option Key Operations

Recommended Shortcut Settings

Cancel Recording

AI Command Shortcut

5Recognition History

History Features

6Text Processing (Punctuation & Line Breaks)

Punctuation Prompt

Auto Punctuation

Punctuation Rules (7 Stages)

Auto Line Breaks

7Voice Commands

Supported Commands

If Voice Commands Aren't Recognized

8Word Dictionary (Custom Terms)

How It Works

How to Register

Registration Examples

Features

9Auto-Correction Rules

Rule Types

Built-in Hallucination Filters

Preset Rules

Custom Rules

10App-Specific Processing Modes

Processing Modes

Setting Up App Mapping

11AI Processing (LLM Integration)

Supported AI Providers

Setup

Local LLM Setup Examples

Using LM Studio

Using Ollama

12AI Command Mode

How to Use

Examples

13Custom Instructions

Difference from AI Commands

How to Use

Preset Instructions

14Voice Launcher

How It Works

Setup

Configuration Examples

15Voice Search

Supported Phrases (10 Patterns)

16Menu Bar Icon

Basic Operations

Setting Up Auto-Start

17Floating Waveform Indicator

Specifications

Indicator Details

Operations

Settings

18Batch Processing

How to Open

Usage

Supported File Formats

Export Formats

19Troubleshooting

Need More Help?