1. Installation

System Requirements

  • OS: macOS 14.0 Sonoma or later
  • Processor: Apple Silicon recommended (M1 / M2 / M3 / M4)
  • Memory: 8GB RAM or more (16GB recommended)
  • Storage: 200MB + model files (up to 3GB)

Installation Steps

  1. Download the latest .dmg file from the download page.
  2. Double-click the downloaded DMG file and drag the RocketWhisper icon to the Applications folder.
  3. If Gatekeeper shows a warning on first launch, click "Open". Alternatively, go to System Settings > Privacy & Security and click "Open Anyway".
  4. On first launch, the Whisper model will begin downloading. This may take a few minutes depending on your network.

Choosing a Model

Whisper Model Comparison

Model | Size | Accuracy | Speed | Recommended Use
Small | 500MB | High | Fast | Lower-spec Macs
Medium | 1.5GB | High | Normal | Audio under 5 seconds
Large V3 Turbo (Recommended) | 1.6GB | High | Fast | 5-20 second audio
Large V3 | 3.0GB | Highest | Slower | Audio over 20 seconds

Tip: If you're unsure, choose Large V3 Turbo. It offers an excellent balance of accuracy and speed and is optimized for Apple Silicon's Neural Engine.

* For Japanese speech recognition, Large V3 Turbo or higher is recommended. Small/Medium may have reduced accuracy for kanji and katakana words.
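
The guidance in the comparison table can be sketched as a small lookup. This is only an illustration: `suggest_model` and its parameters are hypothetical names, and the order in which the conditions are checked (for example, a lower-spec Mac winning over clip length) is an assumption rather than documented app behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhisperModel:
    name: str
    size_mb: int
    recommended_use: str


# Values copied from the comparison table.
MODELS = {
    "small": WhisperModel("Small", 500, "Lower-spec Macs"),
    "medium": WhisperModel("Medium", 1500, "Audio under 5 seconds"),
    "large-v3-turbo": WhisperModel("Large V3 Turbo", 1600, "5-20 second audio"),
    "large-v3": WhisperModel("Large V3", 3000, "Audio over 20 seconds"),
}


def suggest_model(clip_seconds: float, low_spec_mac: bool = False,
                  japanese: bool = False) -> WhisperModel:
    """Hypothetical helper mirroring the table's 'Recommended Use' column."""
    # Assumption: hardware limits win unless Japanese accuracy is needed.
    if low_spec_mac and not japanese:
        return MODELS["small"]
    if clip_seconds > 20:
        return MODELS["large-v3"]
    # Japanese speech: Large V3 Turbo or higher is recommended.
    if japanese or clip_seconds >= 5:
        return MODELS["large-v3-turbo"]
    return MODELS["medium"]
```

For example, `suggest_model(10)` picks Large V3 Turbo, and `suggest_model(3, japanese=True)` also picks Large V3 Turbo because of the Japanese recommendation.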

2. Basic Usage

Voice Recognition from Microphone

  1. Click the RocketWhisper icon in the menu bar to show the popup window.
  2. Click the Record button (microphone icon).
  3. Speak into your microphone.
  4. Click the Stop button.
  5. The recognition result will appear in the text area.

Using Recognition Results

Tip: When using shortcut recording, text can be automatically pasted into the app that was focused before recording. UI button recording requires manual copy and paste.

3. Settings

Click the gear icon in the popup window to open settings. Various options are available in the following tabs.

Tab | Settings
Model & Language | Whisper model selection, recognition language
Input Device | Microphone selection, auto copy, auto paste
Shortcut | Recording shortcut customization, Right Option key, cancel key, AI Command shortcut (⌃⇧Space)
Word Dictionary | Custom terminology: technical terms, company names, personal names
Text Processing | Auto punctuation, line breaks, voice commands enable/disable
Correction Rules | Auto-correction enable/disable, preset rules, custom rules
App-Specific | Processing mode settings, app-to-mode mapping
AI Processing | AI provider selection (OpenAI / Anthropic / Groq / Gemini / Local LLM), model, API keys
License | License type, license key entry

4. Global Shortcut

RocketWhisper supports customizable global shortcuts, allowing instant recording from any app. It also supports Right Option key tap and hold (Push-to-Talk).

Right Option Key Operations

Action | Behavior
Hold Right Option → Release | Push-to-Talk (records while held, stops and recognizes on release)
Double-tap Right Option | Switch to continuous recording mode (tap again to stop)
Press another key while holding Right Option | Cancel recording (works as normal Option modifier)

Recommended Shortcut Settings

Shortcut | Type | Description
⌥Space | Toggle | Default setting. Option + Space to start/stop recording. Same as Superwhisper.
Right Option (hold) | Push-to-Talk | Most recommended. Records only while held, auto-stops on release.
⌃⇧R | Toggle | R for Record. Each press toggles recording on/off.
F9 | Toggle | Function key. Less likely to conflict with other shortcuts.

Cancel Recording

Press Escape during recording to cancel without processing.

AI Command Shortcut

⌃⇧Space (Control + Shift + Space) activates AI Command mode. See AI Command Mode for details.

Note: Global shortcuts require Accessibility permissions. You'll be prompted on first launch. Verify in System Settings > Privacy & Security > Accessibility that RocketWhisper is enabled.

5. Recognition History

RocketWhisper automatically saves past recognition results. Click the History button in the popup to view the history list.

History Features

6. Text Processing (Punctuation & Line Breaks)

RocketWhisper includes advanced text processing features to produce natural, well-formatted text.

Punctuation Prompt

RocketWhisper passes a prompt to the Whisper model that makes it more likely to include punctuation in its output.

Auto Punctuation

Post-processing applies seven rule-based punctuation steps. Even when Whisper's output lacks punctuation, these rules add natural punctuation.

Punctuation Rules (7 Stages)

  • Insert periods after sentence-ending expressions
  • Insert commas after conjunctive particles
  • Insert question marks at end of questions
  • Insert exclamation marks at end of exclamations
  • Insert commas in enumerations
  • Insert commas at long phrase boundaries
  • Remove unnecessary leading punctuation
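
To make the idea of rule-based post-processing concrete, here is a minimal sketch of three of the stages (leading-punctuation cleanup, question marks, and periods) for English text. The app's actual rules are not documented here; the function names and the `QUESTION_STARTERS` word list are assumptions chosen for illustration.

```python
import re

# Assumption: a sentence starting with one of these words is treated as a question.
QUESTION_STARTERS = ("who", "what", "when", "where", "why", "how",
                     "is", "are", "do", "does", "can")


def strip_leading_punctuation(text: str) -> str:
    """Remove punctuation and whitespace that ended up at the start of the text."""
    return re.sub(r"^[\s,.!?;:]+", "", text)


def add_terminal_punctuation(sentence: str) -> str:
    """Append '?' or '.' when the sentence has no closing punctuation yet."""
    sentence = sentence.strip()
    if not sentence or sentence[-1] in ".!?":
        return sentence
    first_word = sentence.split()[0].lower()
    return sentence + ("?" if first_word in QUESTION_STARTERS else ".")


def punctuate(text: str) -> str:
    """Run the cleanup stage first, then the sentence-ending stage."""
    return add_terminal_punctuation(strip_leading_punctuation(text))
```

Running the stages in a fixed order keeps each rule simple: later stages can assume the text has already been cleaned by earlier ones.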

Auto Line Breaks

Automatically inserts line breaks at sentence boundaries, making long text more readable with paragraphs.

Tip: Auto punctuation and auto line breaks can be enabled/disabled independently. Disable line breaks for chat apps, enable for document creation.

7. Voice Commands

Voice commands let you execute text editing operations by speaking specific phrases. Enable/disable in "Text Processing" settings.

Supported Commands

Command | Trigger Phrases | Action
New Line | "new line", "enter", "line break" | Insert line break
Paragraph | "paragraph", "new paragraph" | Insert double line break (paragraph)
Delete | "delete", "backspace", "undo" | Delete previous word

Tip: Voice commands are processed at Stage 1 of the text processing pipeline. When a command is detected, the action is executed and the phrase is removed from the text.
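
The detect, execute, and remove flow can be sketched as follows. The sketch assumes the transcript arrives as pause-separated segments and that a segment counts as a command only when it consists of nothing but a trigger phrase; how RocketWhisper actually separates commands from ordinary speech is not specified here, and `apply_voice_commands` is a hypothetical name.

```python
import re

# Trigger phrases taken from the Supported Commands table.
COMMANDS = {
    "new line": "newline", "enter": "newline", "line break": "newline",
    "paragraph": "paragraph", "new paragraph": "paragraph",
    "delete": "delete", "backspace": "delete", "undo": "delete",
}


def apply_voice_commands(segments: list) -> str:
    """Build the output text, executing command segments instead of typing them."""
    out = ""
    for segment in segments:
        # Normalize: drop punctuation Whisper may have added, ignore case.
        key = re.sub(r"[^\w\s]", "", segment).strip().lower()
        action = COMMANDS.get(key)
        if action == "newline":
            out = out.rstrip(" ") + "\n"
        elif action == "paragraph":
            out = out.rstrip(" ") + "\n\n"
        elif action == "delete":
            # Remove the last word together with the spaces around it.
            out = re.sub(r"[ \t]*\S+[ \t]*$", "", out)
        else:
            if out and not out.endswith(("\n", " ")):
                out += " "
            out += segment.strip()
    return out
```

Because command segments never reach the output branch, the spoken phrase itself is removed from the text, matching the behavior described in the tip.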

If Voice Commands Aren't Recognized

Whisper may sometimes misrecognize homophones based on context.

If voice commands don't work as expected:

  1. Check that voice commands are enabled — Verify "Voice Commands" is ON in the "Text Processing" settings tab.
  2. Pause before and after commands — A short pause before and after helps distinguish commands from regular speech.
  3. Speak clearly — Pronouncing each syllable distinctly improves recognition.

8. Word Dictionary (Custom Terms)

The word dictionary lets you register technical terms, company names, personal names, and acronyms that Whisper might not recognize well. Registering them can substantially improve recognition accuracy for those terms. This feature is not available in macOS built-in dictation.

How It Works

Registered words are used as WhisperKit promptTokens, making the Whisper model prioritize these terms in its output.

How to Register

  1. Open the "Word Dictionary" tab in settings.
  2. Click the "Add" button.
  3. Enter the word to register (e.g., React, TypeScript, AWS).
  4. Optionally set a reading/pronunciation (for Whisper recognition assistance).

Note: Keep registered terms to about 15 words (short tokens) maximum. Too many terms may affect decoder log probability and reduce accuracy.
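
As a rough illustration of keeping the prompt small, the sketch below deduplicates registered terms and caps them at 15 before they would be tokenized. The comma-joined text format and the `build_prompt_text` name are assumptions; the actual conversion to WhisperKit prompt tokens happens inside the app and is not shown.

```python
MAX_TERMS = 15  # limit suggested in the note above


def build_prompt_text(terms: list, max_terms: int = MAX_TERMS) -> str:
    """Return a short prompt string from registered terms (hypothetical format)."""
    seen = set()
    unique = []
    for term in terms:
        cleaned = term.strip()
        # Skip blanks and case-insensitive duplicates, keeping first spelling.
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            unique.append(cleaned)
    # Truncate so the prompt does not grow beyond the suggested limit.
    return ", ".join(unique[:max_terms])
```

With this approach the first 15 distinct terms win, so the most important terms should be registered first.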

Registration Examples

  • Technical terms: React, TypeScript, Kubernetes, Docker
  • Company names: Mojosoft, OpenAI
  • Personal names: John Smith
  • Acronyms: AWS, GCP, CI/CD

Features

9. Auto-Correction Rules

Set up rules to automatically correct misrecognitions in Whisper output. Supports both simple string replacement and regex.

Rule Types

Built-in Hallucination Filters

27 built-in filters automatically remove the "hallucination text" that Whisper sometimes generates during silence.

Preset Rules

Preset rules for common misrecognition patterns are available. Enable with one click from settings.

Custom Rules

  1. Open the "Correction Rules" tab in settings.
  2. Click "Add Rule".
  3. Enter the search string (misrecognized text) and replacement string (correct text).
  4. Optionally enable "Use regex" and "Ignore case" options.
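
The difference between plain string rules and regex rules can be sketched like this. `CorrectionRule` and `apply_rules` are hypothetical names, and the fields simply mirror the options listed in the steps above; the app's internal rule format is not documented here.

```python
import re
from dataclasses import dataclass


@dataclass
class CorrectionRule:
    search: str
    replacement: str
    use_regex: bool = False
    ignore_case: bool = False


def apply_rules(text: str, rules: list) -> str:
    """Apply each rule in order to the recognized text."""
    for rule in rules:
        flags = re.IGNORECASE if rule.ignore_case else 0
        if rule.use_regex:
            # Regex mode: the replacement may use group references such as \1.
            text = re.sub(rule.search, rule.replacement, text, flags=flags)
        else:
            # Literal mode: escape the search text and insert the replacement verbatim.
            text = re.sub(re.escape(rule.search),
                          lambda _match: rule.replacement, text, flags=flags)
    return text
```

For example, a literal rule `CorrectionRule("type script", "TypeScript", ignore_case=True)` fixes a split product name, while a regex rule `CorrectionRule(r"(\d+) percent", r"\1%", use_regex=True)` rewrites any spoken percentage.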

10. App-Specific Processing Modes

App-specific modes automatically apply different text processing settings based on the focused app. For example, use punctuation for text editors and casual style for chat apps.

Processing Modes

Mode | AI Required | Description
Smart | No | Auto-formats punctuation and line breaks. Most versatile mode.
Simple | No | Output recognition result as-is. Minimal processing.
Business | Yes | Auto-converts to polite, formal style. For emails and documents.
Casual | Yes | Converts to friendly style. For chat and social media.
Summary | Yes | Summarizes recognized text. For meeting notes and memos.
Translation | Yes | Translates to another language.
Grammar Fix | Yes | AI corrects misrecognition and grammar errors.

Setting Up App Mapping

  1. Open the "App-Specific" tab in settings.
  2. Enable "App-Specific Processing Modes".
  3. Click "Add" to select an app and set its mode.

Tip: When app-specific mode is enabled and a mode is set for the current app, mode-specific settings apply. Otherwise, global settings are used.
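
The fallback described in the tip amounts to a simple lookup. In this sketch the mapping is keyed by macOS bundle identifier, which is an assumption about how apps are identified; `resolve_mode` and `requires_ai` are hypothetical helper names.

```python
# Modes marked "Yes" in the AI Required column of the Processing Modes table.
AI_MODES = {"Business", "Casual", "Summary", "Translation", "Grammar Fix"}


def resolve_mode(app_id: str, app_modes: dict, global_mode: str,
                 app_specific_enabled: bool) -> str:
    """Use the app's own mode when one is mapped, otherwise the global setting."""
    if app_specific_enabled and app_id in app_modes:
        return app_modes[app_id]
    return global_mode


def requires_ai(mode: str) -> bool:
    """Report whether the mode needs a configured AI provider."""
    return mode in AI_MODES
```

With `{"com.apple.mail": "Business"}` as the mapping and Smart as the global mode, dictating into Mail resolves to Business, while any unmapped app falls back to Smart.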

11. AI Processing (LLM Integration)

RocketWhisper integrates with 5 AI providers for advanced processing like text formatting, translation, and summarization.

Supported AI Providers

Provider | Model Examples | Features
OpenAI | GPT-4o, GPT-4o mini | High accuracy, broad language support
Anthropic | Claude Sonnet 4.5, Haiku 4.5 | Natural language, polite output
Groq | LLaMA 3.3 70B | Ultra-fast inference, free tier available
Google Gemini | Gemini 2.5 Pro / Flash | Generous free tier, multimodal support
Local LLM | LM Studio, Ollama | Fully offline, privacy-focused

Setup

  1. Open the "AI Processing" tab in settings.
  2. Select your provider.
  3. Enter your API key (not needed for Local LLM).
  4. Select the model to use.
  5. Enable AI processing and choose a processing mode.

Local LLM Setup Examples

Using LM Studio

  1. Install LM Studio and download your preferred model.
  2. Start the local server in LM Studio (default: http://localhost:1234).
  3. Select "Local LLM" in RocketWhisper's AI settings.
  4. Enter http://localhost:1234 as the base URL.
  5. Model ID can be left empty (LM Studio uses the loaded model).
  6. API key can be empty or a dummy value (e.g., lm-studio).

Using Ollama

  1. Install Ollama.
  2. Download a model from the terminal (this example pulls Llama 3.2):
    ollama pull llama3.2
  3. Verify that the Ollama server is running (it usually starts automatically after installation).
  4. Select "Local LLM" in RocketWhisper's AI settings.
  5. Enter http://localhost:11434 as the base URL.
  6. Enter the downloaded model name as Model ID (e.g., llama3.2, qwen2.5, gemma2).
    * Run ollama list to see available model names.
  7. API key can be left empty (not required for Ollama).
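
Before pointing RocketWhisper at a local server, it can help to confirm that the server responds. The sketch below assumes the server exposes an OpenAI-compatible `GET /v1/models` endpoint, which LM Studio and recent Ollama releases provide. Whether RocketWhisper itself appends `/v1` to the base URL is not stated here, and `models_url` and `list_local_models` are hypothetical helper names.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from a base URL such as http://localhost:1234."""
    return base_url.rstrip("/") + "/v1/models"


def list_local_models(base_url: str, timeout: float = 3.0) -> list:
    """Return the model IDs reported by the local server.

    Raises urllib.error.URLError when nothing is listening on the port.
    """
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-style responses wrap the models in a "data" array.
    return [entry["id"] for entry in payload.get("data", [])]
```

Calling `list_local_models("http://localhost:11434")` against a running Ollama server should list the same names that `ollama list` shows, which are the values to enter as Model ID.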

Tip: To minimize costs, Groq (free tier) or Google Gemini (generous free tier) are recommended. For complete privacy, Local LLM processes everything from speech recognition to AI formatting fully offline.

12. AI Command Mode

AI Command mode lets you give voice instructions to edit selected text with AI. Perform translation, summarization, tone changes, and more with just your voice.

How to Use

  1. Select text in any app.
  2. Press ⌃⇧Space (Control + Shift + Space) to activate AI Command mode.
  3. Speak your instruction (e.g., "Translate to English").
  4. AI processes the selected text according to your instruction and replaces it.

Examples

Voice Instruction | Processing
"Translate to English" | Translate selected text to English
"Summarize this" | Summarize selected text concisely
"Make it formal" | Convert casual text to formal style
"Add comments" | Add comments to code
"Convert to bullet points" | Convert text to bullet list format
"Fix typos" | Correct spelling and grammar errors

Note: AI Command mode requires an API key for one of the AI providers to be configured.

13. Custom Instructions

Custom Instructions let you pre-assign AI processing prompts to dedicated shortcuts. Unlike AI Commands, you don't need to speak instructions each time — the recognized speech is automatically processed using a pre-configured prompt.

Difference from AI Commands

Feature | AI Commands | Custom Instructions
AI Instruction | Speak instructions each time | Pre-configured prompt
Text Selection | Required (processes selected text) | Not required (processes speech input)
Shortcut | Single shared shortcut (⌃⇧Space) | Individual shortcut per instruction
Best For | Ad-hoc, varying instructions | Frequently used operations in one action

How to Use

  1. Create an instruction in the "Custom Instructions" settings tab and assign a shortcut.
  2. In any app, press the assigned shortcut to start recording.
  3. Speak into the microphone (the recognized text becomes the AI input).
  4. Press the same shortcut again to stop recording.
  5. Speech is recognized and processed by AI with the pre-configured prompt, then automatically pasted.

Preset Instructions

Four presets are automatically created on first launch. These can be edited but not deleted.

Preset | Description
🌐 Translate to English | Translate speech to natural English
💼 Business Style | Convert to formal business Japanese
📝 Summary | Summarize text concisely
✔️ Grammar Fix | Fix grammar errors and misrecognitions

Note: Custom Instructions require an API key for one of the AI providers. Up to 20 instructions can be registered.
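
The two constraints mentioned in this section (presets cannot be deleted, and at most 20 instructions can be registered) can be modeled as below. This is a sketch only: `Instruction` and `InstructionStore` are hypothetical names, and counting presets toward the limit of 20 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    name: str
    prompt: str
    is_preset: bool = False


class InstructionStore:
    MAX_INSTRUCTIONS = 20  # limit stated in the note above

    def __init__(self, presets: list):
        self._items = list(presets)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, instruction: Instruction) -> None:
        """Register an instruction unless the store is already full."""
        if len(self._items) >= self.MAX_INSTRUCTIONS:
            raise ValueError("instruction limit reached")
        self._items.append(instruction)

    def delete(self, name: str) -> None:
        """Remove a user-created instruction; presets are protected."""
        for item in self._items:
            if item.name == name:
                if item.is_preset:
                    raise PermissionError("presets can be edited but not deleted")
                self._items.remove(item)
                return
        raise KeyError(name)
```

Keeping the preset flag on each entry lets the same store handle editing for all instructions while rejecting deletion only for the built-in ones.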

14. Voice Launcher

Voice Launcher lets you launch apps or open URLs by speaking registered keywords. Keywords are matched at Stage 0 of the text processing pipeline, so a matching keyword executes its action without producing any text output.

How It Works

Setup

  1. Open Voice Launcher settings.
  2. Click "Add".
  3. Enter the trigger keyword (e.g., "Notes", "Browser").
  4. Enter the app path or URL to open.

Configuration Examples

Keyword | Action | Type
"Notes" | /System/Applications/Notes.app | App Launch
"Browser" | /Applications/Safari.app | App Launch
"Terminal" | /System/Applications/Utilities/Terminal.app | App Launch
"GitHub" | https://github.com | Open URL
"Mail" | /System/Applications/Mail.app | App Launch
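
A keyword lookup of this kind can be sketched as follows. Exact, case-insensitive matching of the whole utterance is an assumption, as are the helper names; the macOS `open` command itself accepts both app bundle paths and URLs.

```python
from typing import Optional
from urllib.parse import urlparse

# A few entries from the configuration examples.
LAUNCHER = {
    "Browser": "/Applications/Safari.app",
    "GitHub": "https://github.com",
    "Mail": "/System/Applications/Mail.app",
}


def action_type(target: str) -> str:
    """Classify a target as a URL or an app path, as in the Type column."""
    scheme = urlparse(target).scheme
    return "Open URL" if scheme in ("http", "https") else "App Launch"


def match_keyword(utterance: str, launcher: dict) -> Optional[str]:
    """Return the target when the whole utterance equals a registered keyword."""
    spoken = utterance.strip().strip(".!?").lower()
    for keyword, target in launcher.items():
        if keyword.lower() == spoken:
            return target
    return None


def open_command(target: str) -> list:
    """Build the argument list for the macOS `open` command."""
    return ["open", target]
```

Because only a full-utterance match triggers an action, saying "GitHub" opens the URL, while a longer sentence that merely contains the word is transcribed as ordinary text.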

17. Floating Waveform Indicator

A floating window shows a mini equalizer-style waveform during recording. It always stays on top, so you can confirm the recording status even while working in other apps.

Specifications

Indicator Details

  • Size: 96 x 48 pixels (compact capsule shape)
  • Bars: 8 mini equalizer-style bars
  • Color: Blue → Purple → Pink gradient
  • Background: Frosted glass (ultraThinMaterial) + rounded corners
  • Display: Fades in on recording start, fades out on stop
  • Initial position: Bottom center of screen

Operations

Settings

Toggle "Show floating waveform during recording" in the "Model & Language" settings tab. Default is enabled (ON).

Tip: To reset position, run these commands in Terminal:
defaults delete biz.mojosoft.RocketWhisper FloatingWaveformX
defaults delete biz.mojosoft.RocketWhisper FloatingWaveformY

18. Batch Processing

Batch transcribe multiple audio files at once. Convenient for processing recorded meeting audio or interview files.

How to Open

  1. Open the menu bar popup.
  2. Click the Batch Processing button (document icon) in the header.
  3. A separate batch processing window opens.

Usage

  1. Add files: Click "Add Files" to select audio files, or drag & drop onto the window.
  2. Start batch: Click "Start Batch" to transcribe files sequentially.
  3. View results: Recognition results (character count) are shown in the list.
  4. Export: Select export format from the "Export" menu and choose a destination folder.

Supported File Formats

WAV, MP3, M4A, FLAC, OGG, WMA, AAC, AIFF
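
When preparing a folder for batch processing, a quick pre-check against this list avoids adding files that cannot be transcribed. The sketch below is a hypothetical helper, not part of the app; matching on the lowercase file extension is an assumption.

```python
from pathlib import Path

# Extensions corresponding to the supported formats listed above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".wma", ".aac", ".aiff"}


def split_supported(paths: list) -> tuple:
    """Separate file paths into (supported, unsupported) by extension."""
    supported, unsupported = [], []
    for path in paths:
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```

For instance, a video file such as `meeting.mp4` would land in the unsupported list and would need its audio track extracted first.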

Export Formats

Format | Description | Use Case
TXT | Plain text | General transcription text
SRT | SubRip subtitle format | Subtitle creation for video editing
VTT | WebVTT subtitle format | Subtitles for web video and HTML5
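
The two subtitle formats differ mainly in the timestamp separator and the file header: SRT numbers each cue and writes milliseconds after a comma, while WebVTT starts with a `WEBVTT` line and uses a period. The sketch below illustrates this for a list of `(start_seconds, end_seconds, text)` tuples; the function names and the segment structure are assumptions, not RocketWhisper's export code.

```python
def format_timestamp(seconds: float, decimal_mark: str) -> str:
    """Format seconds as HH:MM:SS<mark>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(segments: list) -> str:
    """SubRip: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(segments: list) -> str:
    """WebVTT: 'WEBVTT' header, period before the milliseconds."""
    cues = []
    for start, end, text in segments:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        cues.append(f"{timing}\n{text}\n")
    return "WEBVTT\n\n" + "\n".join(cues)
```

TXT export needs no timing information at all, which is why it suits general transcription, while SRT and VTT depend on per-segment start and end times.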

Tip: Batch processing uses its own Whisper model instance, so it can run simultaneously with real-time voice input. However, be mindful of memory usage when processing many files.

19. Troubleshooting

If you encounter issues, refer to this FAQ.

If the Model Download Fails

Check your network connection. Model files range from several hundred MB to 3GB, so a stable Wi-Fi connection is recommended. If the download is interrupted, restart the app to retry. Temporarily disable any VPN or proxy if applicable.

If Recognition Accuracy Is Low

Check the following:

  • Change model: Switch to a larger model (Large V3 Turbo recommended).
  • Microphone: Try an external mic, adjust distance, reduce ambient noise.
  • Language setting: Verify the recognition language is correctly set.
  • Word dictionary: Register technical terms in the dictionary to improve accuracy.

If the Global Shortcut Doesn't Work

Check Accessibility permissions.

  1. Open System Settings.
  2. Go to Privacy & Security > Accessibility.
  3. Verify RocketWhisper is listed and the toggle is enabled.
  4. If not listed, click "+" to add it.
  5. If already added but not working, disable and re-enable.

Also check that no other app is using the same shortcut. Change to a different shortcut if there's a conflict.

If the App Won't Launch

Check the following:

  • macOS version: Requires macOS 14.0 Sonoma or later. Check via Apple menu > About This Mac.
  • Gatekeeper: If you see "app can't be opened because developer cannot be verified", go to System Settings > Privacy & Security and click "Open Anyway".
  • Apple Silicon: Works on Intel Macs too, but Apple Silicon (M1+) is recommended.

If AI Processing Doesn't Work

Check the following:

  • API key: Verify the API key is correctly entered in settings.
  • Internet connection: Cloud AI providers require internet.
  • API credits: For OpenAI or Anthropic, verify you have remaining API credits.
  • Local LLM: For LM Studio or Ollama, verify the local server is running.

If the Microphone Can't Be Used

RocketWhisper needs microphone permission.

  1. Open System Settings.
  2. Go to Privacy & Security > Microphone.
  3. Verify RocketWhisper's toggle is enabled.

If the permission dialog didn't appear on first launch, quit and restart the app.

If No Recognition Result Appears

Check the following:

  • Microphone input: Verify the correct microphone is selected in "Input Device" settings.
  • Microphone permission: Check microphone access is allowed in macOS privacy settings.
  • Volume: Check input volume is adequate. Verify in System Settings > Sound > Input.
  • Recording duration: Very short recordings (under 1 second) may not produce results.