1. Installation
System Requirements
- OS: macOS 14.0 Sonoma or later
- Processor: Apple Silicon recommended (M1 / M2 / M3 / M4)
- Memory: RAM 8GB or more (16GB recommended)
- Storage: 200MB + model files (up to 3GB)
Installation Steps
- Download the latest .dmg file from the download page.
- Double-click the downloaded DMG file and drag the RocketWhisper icon to the Applications folder.
- If Gatekeeper shows a warning on first launch, click "Open". Alternatively, go to System Settings > Privacy & Security and click "Open Anyway".
- On first launch, the Whisper model will begin downloading. This may take a few minutes depending on your network.
Choosing a Model
Whisper Model Comparison
| Model | Size | Accuracy | Speed | Recommended Use |
|---|---|---|---|---|
| Small | 500MB | High | Fast | Lower-spec Macs |
| Medium | 1.5GB | High | Normal | Audio under 5 seconds |
| Large V3 Turbo (recommended) | 1.6GB | High | Fast | 5-20 second audio |
| Large V3 | 3.0GB | Highest | Slower | Audio over 20 seconds |
Tip: If you're unsure, choose Large V3 Turbo. It offers an excellent balance of accuracy and speed and is optimized for Apple Silicon's Neural Engine.
* For Japanese speech recognition, Large V3 Turbo or Large V3 is recommended. Small and Medium may be less accurate with kanji and katakana words.
2. Basic Usage
Voice Recognition from Microphone
- Click the RocketWhisper icon in the menu bar to show the popup window.
- Click the Record button (microphone icon).
- Speak into your microphone.
- Click the Stop button.
- The recognition result will appear in the text area.
Using Recognition Results
- Copy: Click the copy button to copy the result to your clipboard.
- Auto Copy: When enabled in settings, results are automatically copied on completion.
- Auto Paste: When enabled, results are automatically pasted into the active app's text field when using shortcut recording.
Tip: When using shortcut recording, text can be automatically pasted into the app that was focused before recording. UI button recording requires manual copy and paste.
3. Settings
Click the gear icon in the popup window to open settings. Options are organized into the following tabs.
| Tab | Settings |
|---|---|
| Model & Language | Whisper model selection, recognition language |
| Input Device | Microphone selection, auto copy, auto paste |
| Shortcut | Recording shortcut customization, Right Option key, cancel key, AI Command shortcut (⌃⇧Space) |
| Word Dictionary | Custom terminology: technical terms, company names, personal names |
| Text Processing | Auto punctuation, line breaks, voice commands enable/disable |
| Correction Rules | Auto-correction enable/disable, preset rules, custom rules |
| App-Specific | Processing mode settings, app-to-mode mapping |
| AI Processing | AI provider selection (OpenAI / Anthropic / Groq / Gemini / Local LLM), model, API keys |
| License | License type, license key entry |
4. Global Shortcut
RocketWhisper supports customizable global shortcuts, allowing instant recording from any app. It also supports Right Option key tap and hold (Push-to-Talk).
Right Option Key Operations
| Action | Behavior |
|---|---|
| Hold Right Option → Release | Push-to-Talk (records while held, stops and recognizes on release) |
| Double-tap Right Option | Switch to continuous recording mode (tap again to stop) |
| Press another key while holding Right Option | Cancel recording (works as normal Option modifier) |
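The three Right Option behaviors can be modeled as a small state machine. The Swift sketch below is a hypothetical model, not RocketWhisper's implementation. The `RightOptionTracker` type, the event and action names, and the 0.25-second tap and 0.4-second double-tap thresholds are all assumptions chosen for illustration. The model also assumes recording starts on key-down, so a single quick tap is discarded as too short.

```swift
// Hypothetical model of the Right Option gestures; thresholds are assumed values.
enum KeyEvent {
    case optionDown(Double)   // timestamp in seconds
    case optionUp(Double)
    case otherKeyDown
}

enum RecorderAction: Equatable {
    case startRecording, stopAndRecognize, cancelRecording, ignore
}

struct RightOptionTracker {
    enum State { case idle, holding(since: Double), continuous }

    var state: State = .idle
    var lastTapEnd: Double? = nil
    let tapMaxDuration = 0.25   // assumed: shorter presses count as taps
    let doubleTapWindow = 0.4   // assumed: max gap between the two taps

    mutating func handle(_ event: KeyEvent) -> RecorderAction {
        switch (state, event) {
        case (.idle, .optionDown(let t)):
            // Push-to-Talk: start recording as soon as the key goes down.
            state = .holding(since: t)
            return .startRecording
        case (.holding, .otherKeyDown):
            // Option is being used as a normal modifier, so discard the recording.
            state = .idle
            lastTapEnd = nil
            return .cancelRecording
        case (.holding(let since), .optionUp(let t)):
            state = .idle
            guard t - since < tapMaxDuration else {
                // Held long enough: stop and transcribe.
                lastTapEnd = nil
                return .stopAndRecognize
            }
            if let last = lastTapEnd, since - last < doubleTapWindow {
                // Second quick tap: keep recording in continuous mode.
                state = .continuous
                lastTapEnd = nil
                return .ignore
            }
            // A lone tap is too short to transcribe; remember it in case a second tap follows.
            lastTapEnd = t
            return .cancelRecording
        case (.continuous, .optionDown):
            // In continuous mode, the next tap stops recording.
            state = .idle
            return .stopAndRecognize
        default:
            return .ignore
        }
    }
}
```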
Recommended Shortcut Settings
| Shortcut | Type | Description |
|---|---|---|
| ⌥Space | Toggle | Default setting. Option + Space to start/stop recording. Same as Superwhisper. |
| Right Option (hold) | Push-to-Talk | Most recommended. Records only while held, auto-stops on release. |
| ⌃⇧R | Toggle | R for Record. Each press toggles recording on/off. |
| F9 | Toggle | Function key. Less likely to conflict with other shortcuts. |
Cancel Recording
Press Escape during recording to cancel without processing.
AI Command Shortcut
⌃⇧Space (Control + Shift + Space) activates AI Command mode. See AI Command Mode for details.
Note: Global shortcuts require Accessibility permissions. You'll be prompted on first launch. Verify in System Settings > Privacy & Security > Accessibility that RocketWhisper is enabled.
5. Recognition History
RocketWhisper automatically saves past recognition results. Click the History button in the popup to view the history list.
History Features
- List View: View past results with timestamps
- Search: Search past results by keyword
- Copy: Copy any history item to clipboard
- Delete: Delete individual history items
- Export: Export history as a text file
6. Text Processing (Punctuation & Line Breaks)
RocketWhisper includes advanced text processing features to produce natural, well-formatted text.
Punctuation Prompt
RocketWhisper passes a prompt to the Whisper model that encourages punctuated output, so the model is more likely to include punctuation.
Auto Punctuation
A post-processing step applies seven rule-based punctuation insertions, so natural punctuation is added even when Whisper's output lacks it.
Punctuation Rules (7 Stages)
- Insert periods after sentence-ending expressions
- Insert commas after conjunctive particles
- Insert question marks at end of questions
- Insert exclamation marks at end of exclamations
- Insert commas in enumerations
- Insert commas at long phrase boundaries
- Remove unnecessary leading punctuation
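The seven rules are not documented in detail, and several of them (conjunctive particles, for example) target Japanese. As a hypothetical, English-only illustration, the Swift sketch below applies three of the rule types in order: removing leading punctuation, adding a question mark, and adding a period. The `punctuate` function and the question-word list are assumptions, not the app's actual rules.

```swift
import Foundation

// Assumed heuristic: a sentence starting with one of these words is a question.
let questionStarters: Set<String> = ["what", "why", "how", "when", "where", "who", "is", "are", "can", "do", "does"]

// Hypothetical sketch applying three of the seven rule types in sequence.
func punctuate(_ raw: String) -> String {
    // Rule: remove unnecessary leading punctuation (", hello" -> "hello").
    var text = raw.trimmingCharacters(in: .whitespaces)
    while let first = text.first, ",.!?".contains(first) {
        text.removeFirst()
        text = text.trimmingCharacters(in: .whitespaces)
    }
    // Leave empty or already-terminated text unchanged.
    guard let last = text.last, !".!?".contains(last) else { return text }
    // Rule: question mark at the end of a question (detected by its first word).
    let firstWord = text.split(separator: " ").first.map { $0.lowercased() } ?? ""
    if questionStarters.contains(firstWord) { return text + "?" }
    // Rule: period after an otherwise unterminated sentence.
    return text + "."
}
```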
Auto Line Breaks
Automatically inserts line breaks at sentence boundaries, making long text more readable with paragraphs.
Tip: Auto punctuation and auto line breaks can be enabled/disabled independently. Disable line breaks for chat apps, enable for document creation.
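Because the two features are independent, line breaking can be modeled as a separate pass that does nothing when switched off. In this hypothetical Swift sketch, the `insertLineBreaks` function and its `enabled` flag are assumptions. The function replaces the space after `.`, `!`, or `?` with a newline and leaves decimals such as "3.5" alone, because no space follows that period.

```swift
// Hypothetical sketch: break lines at sentence boundaries when enabled.
func insertLineBreaks(_ text: String, enabled: Bool) -> String {
    guard enabled else { return text }
    var result = ""
    var afterSentenceEnd = false   // last non-space character was . ! or ?
    var pendingBreak = false       // at least one space followed that sentence end
    for ch in text {
        if ch == " " {
            // Drop spaces after a sentence end; they become a single line break.
            if afterSentenceEnd { pendingBreak = true } else { result.append(ch) }
            continue
        }
        if pendingBreak { result.append("\n") }
        result.append(ch)
        afterSentenceEnd = ".!?".contains(ch)
        pendingBreak = false
    }
    return result
}
```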
7. Voice Commands
Voice commands let you execute text editing operations by speaking specific phrases. Enable/disable in "Text Processing" settings.
Supported Commands
| Command | Trigger Phrases | Action |
|---|---|---|
| New Line | "new line", "enter", "line break" | Insert line break |
| Paragraph | "paragraph", "new paragraph" | Insert double line break (paragraph) |
| Delete | "delete", "backspace", "undo" | Delete previous word |
Tip: Voice commands are processed at Stage 1 of the text processing pipeline. When a command is detected, the action is executed and the phrase is removed from the text.
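The Stage 1 behavior described above can be sketched in Swift: detect a trigger phrase, perform its action, and drop the phrase from the text. This is a hypothetical illustration, not RocketWhisper's code. The `applyVoiceCommands` function, the word-by-word matching, and the handling of trailing punctuation such as "paragraph." are assumptions. The trigger phrases and actions come from the table above.

```swift
import Foundation

enum VoiceCommand { case newLine, paragraph, deleteWord }

// Trigger phrases from the table, as lowercase word sequences (two-word phrases first).
let triggers: [([String], VoiceCommand)] = [
    (["new", "paragraph"], .paragraph),
    (["new", "line"], .newLine),
    (["line", "break"], .newLine),
    (["paragraph"], .paragraph),
    (["enter"], .newLine),
    (["delete"], .deleteWord),
    (["backspace"], .deleteWord),
    (["undo"], .deleteWord),
]

// Compare words case-insensitively, ignoring punctuation Whisper may attach.
func normalizedWord(_ word: Substring) -> String {
    word.lowercased().trimmingCharacters(in: CharacterSet(charactersIn: ",.!?"))
}

func applyVoiceCommands(_ text: String) -> String {
    let words = text.split(separator: " ")
    var out: [String] = []
    var i = 0
    scan: while i < words.count {
        for (phrase, command) in triggers where i + phrase.count <= words.count {
            guard words[i..<(i + phrase.count)].map(normalizedWord) == phrase else { continue }
            switch command {
            case .newLine: out.append("\n")
            case .paragraph: out.append("\n\n")
            case .deleteWord:
                // Remove the previous word (line breaks are left in place).
                if let last = out.last, !last.hasPrefix("\n") { out.removeLast() }
            }
            i += phrase.count   // the command phrase itself is not output
            continue scan
        }
        out.append(String(words[i]))
        i += 1
    }
    // Join words with spaces, but put no spaces around line breaks.
    var result = ""
    for token in out {
        if token.hasPrefix("\n") || result.isEmpty || result.hasSuffix("\n") {
            result += token
        } else {
            result += " " + token
        }
    }
    return result
}
```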
If Voice Commands Aren't Recognized
Depending on context, Whisper sometimes transcribes a command phrase as a similar-sounding word. If voice commands don't work as expected:
- Check that voice commands are enabled — Verify "Voice Commands" is ON in the "Text Processing" settings tab.
- Pause before and after commands — A short pause before and after helps distinguish commands from regular speech.
- Speak clearly — Pronouncing each syllable distinctly improves recognition.
8. Word Dictionary (Custom Terms)
The word dictionary lets you register technical terms, company names, personal names, and acronyms that Whisper might not recognize well. Registering them can significantly improve recognition accuracy for those terms. macOS's built-in dictation does not offer this feature.
How It Works
Registered words are passed to WhisperKit as promptTokens, which biases the Whisper model toward these spellings in its output.
How to Register
- Open the "Word Dictionary" tab in settings.
- Click the "Add" button.
- Enter the word to register (e.g., React, TypeScript, AWS).
- Optionally set a reading/pronunciation (used to help Whisper recognize the word).
Note: Keep the dictionary to about 15 short terms at most. Too many terms can lower the decoder's log probabilities and reduce accuracy.
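To illustrate the 15-term guideline, the hypothetical Swift sketch below turns dictionary entries into a single prompt string. It drops blanks and case-insensitive duplicates and stops at the limit. The `dictionaryPrompt` function and the comma-separated format are assumptions. Tokenizing the string into WhisperKit promptTokens is not shown.

```swift
import Foundation

// Hypothetical: build prompt text from dictionary entries, capped at `limit` terms.
func dictionaryPrompt(from terms: [String], limit: Int = 15) -> String {
    var seen = Set<String>()
    var kept: [String] = []
    for term in terms {
        let trimmed = term.trimmingCharacters(in: .whitespaces)
        // Skip blanks and case-insensitive duplicates.
        guard !trimmed.isEmpty, seen.insert(trimmed.lowercased()).inserted else { continue }
        kept.append(trimmed)
        if kept.count == limit { break }
    }
    return kept.joined(separator: ", ")
}
```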
Registration Examples
- Technical terms: React, TypeScript, Kubernetes, Docker
- Company names: Mojosoft, OpenAI
- Personal names: John Smith
- Acronyms: AWS, GCP, CI/CD
Features
- Fully local processing — No API costs, no internet required
- Real-time effect — Active immediately after registration
- Dictionary replacement — Can also be used for auto-replacement rules
9. Auto-Correction Rules
Set up rules to automatically correct misrecognitions in Whisper output. Supports both simple string replacement and regex.
Rule Types
- Simple replacement: Replace a specific string with another
- Regex: Advanced replacement using regular expression patterns
- Case sensitivity: Set per-rule whether to match case
Built-in Hallucination Filters
27 filters are built in to automatically remove "hallucination text" that Whisper sometimes generates during silence, such as:
- "Thank you for watching"
- "Please subscribe"
- "Good night" (hallucination during silence)
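Below is a minimal Swift sketch of this kind of filter. It assumes an output is discarded only when the whole transcription matches a known phrase, ignoring case and punctuation. Only the three phrases listed above are included, because the other 24 built-in filters are not listed here. `filterHallucination` is a hypothetical name. The real filter may also take silence into account, which this sketch does not.

```swift
import Foundation

// Only the three examples listed above; the full set of 27 is not reproduced.
let hallucinationPhrases: Set<String> = ["thank you for watching", "please subscribe", "good night"]

// Hypothetical: discard the output only if the entire transcription is a known phrase.
func filterHallucination(_ text: String) -> String {
    let normalized = text.lowercased()
        .components(separatedBy: CharacterSet.punctuationCharacters)
        .joined()
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return hallucinationPhrases.contains(normalized) ? "" : text
}
```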
Preset Rules
Preset rules for common misrecognition patterns are available. Enable with one click from settings.
Custom Rules
- Open the "Correction Rules" tab in settings.
- Click "Add Rule".
- Enter the search string (misrecognized text) and replacement string (correct text).
- Optionally enable "Use regex" and "Ignore case" options.
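These options map onto a small rule type. The Swift sketch below is a hypothetical model; the `CorrectionRule` struct and `applyRules` function are assumptions, not the app's API. It uses Foundation's `NSRegularExpression` for regex rules and `replacingOccurrences(of:with:options:)` for simple replacement, and it applies rules in list order.

```swift
import Foundation

// Hypothetical model of one rule from the "Correction Rules" tab.
struct CorrectionRule {
    var search: String
    var replacement: String
    var useRegex = false
    var ignoreCase = false

    func apply(to text: String) -> String {
        if useRegex {
            let options: NSRegularExpression.Options = ignoreCase ? [.caseInsensitive] : []
            // An invalid pattern leaves the text unchanged instead of failing.
            guard let regex = try? NSRegularExpression(pattern: search, options: options) else { return text }
            let range = NSRange(text.startIndex..., in: text)
            return regex.stringByReplacingMatches(in: text, options: [], range: range, withTemplate: replacement)
        }
        let compareOptions: String.CompareOptions = ignoreCase ? [.caseInsensitive] : []
        return text.replacingOccurrences(of: search, with: replacement, options: compareOptions)
    }
}

// Apply every rule in order; each rule sees the previous rule's output.
func applyRules(_ rules: [CorrectionRule], to text: String) -> String {
    rules.reduce(text) { $1.apply(to: $0) }
}
```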
10. App-Specific Processing Modes
App-specific modes automatically apply different text processing settings based on the focused app. For example, use punctuation for text editors and casual style for chat apps.
Processing Modes
| Mode | AI Required | Description |
|---|---|---|
| Smart | No | Auto-formats punctuation and line breaks. Most versatile mode. |
| Simple | No | Output recognition result as-is. Minimal processing. |
| Business | Yes | Auto-converts to polite, formal style. For emails and documents. |
| Casual | Yes | Converts to friendly style. For chat and social media. |
| Summary | Yes | Summarizes recognized text. For meeting notes and memos. |
| Translation | Yes | Translates to another language. |
| Grammar Fix | Yes | AI corrects misrecognition and grammar errors. |
Setting Up App Mapping
- Open the "App-Specific" tab in settings.
- Enable "App-Specific Processing Modes".
- Click "Add" to select an app and set its mode.
Tip: When app-specific mode is enabled and a mode is set for the current app, mode-specific settings apply. Otherwise, global settings are used.
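The fallback described in the tip can be expressed as a lookup with a default. In this hypothetical Swift sketch, apps are identified by bundle identifier, and the global settings are reduced to a single `globalMode`. Both simplifications and the type names are assumptions. The `requiresAI` property mirrors the "AI Required" column of the table above.

```swift
// Modes from the table above; requiresAI mirrors the "AI Required" column.
enum ProcessingMode {
    case smart, simple, business, casual, summary, translation, grammarFix

    var requiresAI: Bool { self != .smart && self != .simple }
}

// Hypothetical settings model: apps keyed by bundle identifier (an assumption).
struct AppModeSettings {
    var enabled: Bool
    var mapping: [String: ProcessingMode]
    var globalMode: ProcessingMode   // stands in for the global settings

    // Use the app's mode only when the feature is on and the app has a mapping.
    func mode(forFrontmostApp bundleID: String?) -> ProcessingMode {
        guard enabled, let id = bundleID, let mode = mapping[id] else { return globalMode }
        return mode
    }
}
```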
11. AI Processing (LLM Integration)
RocketWhisper integrates with 5 AI providers for advanced processing like text formatting, translation, and summarization.
Supported AI Providers
| Provider | Model Examples | Features |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o mini | High accuracy, broad language support |
| Anthropic | Claude Sonnet 4.5, Haiku 4.5 | Natural language, polite output |
| Groq | LLaMA 3.3 70B | Ultra-fast inference, free tier available |
| Google Gemini | Gemini 2.5 Pro / Flash | Generous free tier, multimodal support |
| Local LLM | LM Studio, Ollama | Fully offline, privacy-focused |
Setup
- Open the "AI Processing" tab in settings.
- Select your provider.
- Enter your API key (not needed for Local LLM).
- Select the model to use.
- Enable AI processing and choose a processing mode.
Local LLM Setup Examples
Using LM Studio
- Install LM Studio and download your preferred model.
- Start the local server in LM Studio (default: http://localhost:1234).
- Select "Local LLM" in RocketWhisper's AI settings.
- Enter http://localhost:1234 as the base URL.
- Model ID can be left empty (LM Studio uses the loaded model).
- API key can be empty or a dummy value (e.g., lm-studio).
Using Ollama
- Install Ollama.
- Download a model from the terminal: ollama pull llama3.2 (example: Llama 3.2).
- Verify that the Ollama server is running (it usually starts automatically after installation).
- Select "Local LLM" in RocketWhisper's AI settings.
- Enter http://localhost:11434 as the base URL.
- Enter the downloaded model name as the Model ID (e.g., llama3.2, qwen2.5, gemma2).
  * Run ollama list to see available model names.
- API key can be left empty (not required for Ollama).
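LM Studio and Ollama both serve an OpenAI-compatible API under `/v1`. It is an assumption that RocketWhisper calls `/v1/chat/completions`. The hypothetical Swift sketch below only shows how a base URL and an empty or filled Model ID could become such a request. When the Model ID is empty, as the LM Studio steps allow, the sketch omits the `model` field. Sending the request is not shown.

```swift
import Foundation

// Assumed OpenAI-style chat request body for /v1/chat/completions.
struct ChatMessage: Codable {
    let role: String
    let content: String
}

struct ChatRequest: Codable {
    let model: String?          // nil is omitted from the JSON (empty Model ID)
    let messages: [ChatMessage]
}

// Hypothetical: append the chat endpoint to the base URL entered in settings.
func chatCompletionsURL(baseURL: String) -> URL? {
    var base = baseURL.trimmingCharacters(in: .whitespaces)
    while base.hasSuffix("/") { base.removeLast() }
    return URL(string: base + "/v1/chat/completions")
}

// Hypothetical: processing instruction as the system message, recognized text as the user message.
func makeRequestBody(modelID: String, instruction: String, text: String) throws -> Data {
    let request = ChatRequest(
        model: modelID.isEmpty ? nil : modelID,
        messages: [
            ChatMessage(role: "system", content: instruction),
            ChatMessage(role: "user", content: text),
        ]
    )
    return try JSONEncoder().encode(request)
}
```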
Tip: To minimize costs, Groq (free tier) or Google Gemini (generous free tier) are recommended. For complete privacy, Local LLM processes everything from speech recognition to AI formatting fully offline.
12. AI Command Mode
AI Command mode lets you give voice instructions to edit selected text with AI. Perform translation, summarization, tone changes, and more with just your voice.
How to Use
- Select text in any app.
- Press ⌃⇧Space (Control + Shift + Space) to activate AI Command mode.
- Speak your instruction (e.g., "Translate to English").
- AI processes the selected text according to your instruction and replaces it.
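Conceptually, the spoken instruction and the selected text are combined into a single request to the configured provider. The hypothetical Swift sketch below shows one way to build that prompt. The `AICommand` type and the prompt wording are assumptions, not RocketWhisper's actual prompt.

```swift
import Foundation

// Hypothetical: combine the spoken instruction with the captured selection.
struct AICommand {
    let instruction: String   // recognized speech, e.g. "Translate to English"
    let selectedText: String  // text selected in the frontmost app

    // Returns nil when there is nothing to send.
    func prompt() -> String? {
        let spoken = instruction.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !spoken.isEmpty, !selectedText.isEmpty else { return nil }
        return """
        Apply the instruction to the text and return only the edited text.
        Instruction: \(spoken)
        Text:
        \(selectedText)
        """
    }
}
```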
Examples
| Voice Instruction | Processing |
|---|---|
| "Translate to English" | Translate selected text to English |
| "Summarize this" | Summarize selected text concisely |
| "Make it formal" | Convert casual text to formal style |
| "Add comments" | Add comments to code |
| "Convert to bullet points" | Convert text to bullet list format |
| "Fix typos" | Correct spelling and grammar errors |
Note: AI Command mode requires an API key for one of the AI providers to be configured.
13. Custom Instructions
Custom Instructions let you pre-assign AI processing prompts to dedicated shortcuts. Unlike AI Commands, you don't need to speak instructions each time — the recognized speech is automatically processed using a pre-configured prompt.
Difference from AI Commands
| Feature | AI Commands | Custom Instructions |
|---|---|---|
| AI Instruction | Speak instructions each time | Pre-configured prompt |
| Text Selection | Required (processes selected text) | Not required (processes speech input) |
| Shortcut | Single shared shortcut (⌃⇧Space) | Individual shortcut per instruction |
| Best For | Ad-hoc, varying instructions | Frequently used operations in one action |
How to Use
- Create an instruction in the "Custom Instructions" settings tab and assign a shortcut.
- In any app, press the assigned shortcut to start recording.
- Speak into the microphone (the recognized text becomes the AI input).
- Press the same shortcut again to stop recording.
- Speech is recognized and processed by AI with the pre-configured prompt, then automatically pasted.
Preset Instructions
Four presets are automatically created on first launch. These can be edited but not deleted.
| Preset | Description |
|---|---|
| 🌐 Translate to English | Translate speech to natural English |
| 💼 Business Style | Convert to formal business Japanese |
| 📝 Summary | Summarize text concisely |
| ✔️ Grammar Fix | Fix grammar errors and misrecognitions |
Note: Custom Instructions require an API key for one of the AI providers. Up to 20 instructions can be registered.
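The rules in this section can be captured in a small store: one shortcut per instruction, a limit of 20, and presets that can be edited but not deleted. The Swift sketch below is hypothetical. The type names, the error cases, and the representation of shortcuts as plain strings are assumptions.

```swift
// Hypothetical model of one Custom Instruction.
struct CustomInstruction {
    let id: Int
    var name: String
    var prompt: String
    var shortcut: String   // shortcut representation is assumed
    let isPreset: Bool
}

enum InstructionError: Error { case limitReached, presetCannotBeDeleted, notFound }

struct InstructionStore {
    static let maxCount = 20
    private(set) var items: [CustomInstruction] = []
    private var nextID = 1

    mutating func add(name: String, prompt: String, shortcut: String, isPreset: Bool = false) throws {
        guard items.count < Self.maxCount else { throw InstructionError.limitReached }
        items.append(CustomInstruction(id: nextID, name: name, prompt: prompt, shortcut: shortcut, isPreset: isPreset))
        nextID += 1
    }

    // Presets can be edited but not deleted.
    mutating func delete(id: Int) throws {
        guard let index = items.firstIndex(where: { $0.id == id }) else { throw InstructionError.notFound }
        guard !items[index].isPreset else { throw InstructionError.presetCannotBeDeleted }
        items.remove(at: index)
    }

    // Find the instruction bound to a pressed shortcut.
    func instruction(forShortcut shortcut: String) -> CustomInstruction? {
        items.first { $0.shortcut == shortcut }
    }
}
```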
14. Voice Launcher
Voice Launcher lets you launch apps or open URLs by speaking registered keywords. Processed at Stage 0 of the pipeline, so matching keywords execute actions without text output.
How It Works
- Exact match for keywords (ignoring punctuation and case)
- On match, launches the registered app or opens URL in browser
- No text output on match (action only)
Setup
- Open Voice Launcher settings.
- Click "Add".
- Enter the trigger keyword (e.g., "Notes", "Browser").
- Enter the app path or URL to open.
Configuration Examples
| Keyword | Action | Type |
|---|---|---|
| "Notes" | /System/Applications/Notes.app | App Launch |
| "Browser" | /Applications/Safari.app | App Launch |
| "Terminal" | /System/Applications/Utilities/Terminal.app | App Launch |
| "GitHub" | https://github.com | Open URL |
| "Mail" | /System/Applications/Mail.app | App Launch |
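The matching rule from How It Works (exact match, ignoring punctuation and case) can be sketched in Swift as shown below. The `LaunchTarget` enum and the `launchTarget(for:in:)` function are hypothetical. On macOS, opening the target would typically go through `NSWorkspace.shared.open(_:)`. That call is omitted so the sketch stays platform-neutral.

```swift
import Foundation

// Hypothetical target type: an app bundle path or a URL.
enum LaunchTarget: Equatable {
    case app(path: String)
    case url(String)
}

// Lowercase, drop punctuation, trim surrounding whitespace.
func normalizedKeyword(_ s: String) -> String {
    s.lowercased()
        .components(separatedBy: CharacterSet.punctuationCharacters)
        .joined()
        .trimmingCharacters(in: .whitespacesAndNewlines)
}

// Exact match only: "Open the browser" does not match the keyword "Browser".
func launchTarget(for utterance: String, in table: [String: LaunchTarget]) -> LaunchTarget? {
    let spoken = normalizedKeyword(utterance)
    return table.first { normalizedKeyword($0.key) == spoken }?.value
}
```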
15. Voice Search
Voice Search lets you trigger Google searches instantly by speaking specific phrases. Results open in your default browser.
Supported Phrases (10 Patterns)
| Phrase Pattern | Example |
|---|---|
| "Search for..." | "Search for SwiftUI" |
| "Look up..." | "Look up Neural Engine" |
| "Google..." | "Google macOS Sequoia" |
| "What is..." | "What is WhisperKit" |
| "Find information about..." | "Find information about Metal API" |
Tip: Voice Search is processed at Stage 0.5 of the pipeline. The text following the trigger phrase is extracted automatically and used as the Google search query.
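Below is a hypothetical Swift sketch of the extraction step. It uses only the five patterns shown in the table, because the remaining five are not listed here. It matches a trigger phrase at the start of the utterance, takes the rest as the query, and percent-encodes the query into a Google search URL. The function name, the trimming of trailing punctuation, and the `https://www.google.com/search?q=` format are assumptions about how RocketWhisper builds the query.

```swift
import Foundation

// The five patterns from the table (longest first); the other five are not documented here.
let searchPrefixes = ["find information about", "search for", "look up", "what is", "google"]

func voiceSearchURL(for utterance: String) -> URL? {
    let trimmed = utterance.trimmingCharacters(in: .whitespacesAndNewlines)
        .trimmingCharacters(in: CharacterSet(charactersIn: ".?!"))
    let lower = trimmed.lowercased()
    // Encode characters that would otherwise split or alter the query string.
    var allowed = CharacterSet.urlQueryAllowed
    allowed.remove(charactersIn: "&+=?#")
    for prefix in searchPrefixes where lower.hasPrefix(prefix + " ") {
        let query = String(trimmed.dropFirst(prefix.count + 1)).trimmingCharacters(in: .whitespaces)
        guard !query.isEmpty,
              let encoded = query.addingPercentEncoding(withAllowedCharacters: allowed) else { return nil }
        return URL(string: "https://www.google.com/search?q=" + encoded)
    }
    return nil   // not a search phrase: treat as normal dictation
}
```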
17. Floating Waveform Indicator
A floating window that shows a compact, equalizer-style waveform while recording. Because it always stays on top, you can confirm recording status even while working in other apps.
Specifications
Indicator Details
- Size: 96 x 48 pixels (compact capsule shape)
- Bars: 8 mini equalizer-style bars
- Color: Blue → Purple → Pink gradient
- Background: Frosted glass (ultraThinMaterial) + rounded corners
- Display: Fades in on recording start, fades out on stop
- Initial position: Bottom center of screen
Operations
- Drag to move: Drag the indicator to any position.
- Position memory: Position is saved and restored on next launch.
- All Spaces: Visible on all macOS desktops (Spaces).
- Always on top: Stays above all other windows.
Settings
Toggle "Show floating waveform during recording" in the "Model & Language" settings tab. Default is enabled (ON).
Tip: To reset position, run these commands in Terminal:
defaults delete biz.mojosoft.RocketWhisper FloatingWaveformX
defaults delete biz.mojosoft.RocketWhisper FloatingWaveformY
18. Batch Processing
Batch transcribe multiple audio files at once. Convenient for processing recorded meeting audio or interview files.
How to Open
- Open the menu bar popup.
- Click the Batch Processing button (document icon) in the header.
- A separate batch processing window opens.
Usage
- Add files: Click "Add Files" to select audio files, or drag & drop onto the window.
- Start batch: Click "Start Batch" to transcribe files sequentially.
- View results: The list shows each file's recognition result, including its character count.
- Export: Select export format from the "Export" menu and choose a destination folder.
Supported File Formats
WAV, MP3, M4A, FLAC, OGG, WMA, AAC, AIFF
Export Formats
| Format | Description | Use Case |
|---|---|---|
| TXT | Plain text | General transcription text |
| SRT | SubRip subtitle format | Subtitle creation for video editing |
| VTT | WebVTT subtitle format | Subtitles for web video and HTML5 |
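The two subtitle formats differ mainly in their header and timestamp punctuation. SRT numbers each cue and writes `HH:MM:SS,mmm`, while WebVTT starts with a `WEBVTT` line and writes `HH:MM:SS.mmm`. The Swift sketch below illustrates that difference. The `Segment` type is hypothetical, as is the assumption that recognition yields start and end times in seconds. The sketch does not describe RocketWhisper's exporter.

```swift
// Hypothetical recognized segment with times in seconds.
struct Segment {
    let start: Double
    let end: Double
    let text: String
}

// HH:MM:SS plus milliseconds; SRT uses "," and WebVTT uses "." before the milliseconds.
func timestamp(_ seconds: Double, msSeparator: String) -> String {
    let totalMs = Int((seconds * 1000).rounded())
    let h = totalMs / 3_600_000
    let m = (totalMs / 60_000) % 60
    let s = (totalMs / 1000) % 60
    let ms = totalMs % 1000
    func pad(_ value: Int, _ width: Int) -> String {
        let digits = String(value)
        return String(repeating: "0", count: max(0, width - digits.count)) + digits
    }
    return "\(pad(h, 2)):\(pad(m, 2)):\(pad(s, 2))\(msSeparator)\(pad(ms, 3))"
}

// SRT: numbered cues separated by blank lines.
func srt(_ segments: [Segment]) -> String {
    var blocks: [String] = []
    for (index, seg) in segments.enumerated() {
        let start = timestamp(seg.start, msSeparator: ",")
        let end = timestamp(seg.end, msSeparator: ",")
        blocks.append("\(index + 1)\n\(start) --> \(end)\n\(seg.text)\n")
    }
    return blocks.joined(separator: "\n")
}

// WebVTT: "WEBVTT" header, then cues (identifiers are optional and omitted here).
func vtt(_ segments: [Segment]) -> String {
    var blocks: [String] = ["WEBVTT\n"]
    for seg in segments {
        let start = timestamp(seg.start, msSeparator: ".")
        let end = timestamp(seg.end, msSeparator: ".")
        blocks.append("\(start) --> \(end)\n\(seg.text)\n")
    }
    return blocks.joined(separator: "\n")
}
```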
Tip: Batch processing uses its own Whisper model instance, so it can run simultaneously with real-time voice input. However, be mindful of memory usage when processing many files.
19. Troubleshooting
If you encounter issues, refer to this FAQ.
Model download fails or stalls
Check your network connection. Model files range from several hundred MB to 3GB, so a stable Wi-Fi connection is recommended. If a download is interrupted, restart the app to retry. If you use a VPN or proxy, temporarily disable it.
Recognition accuracy is low
Check the following:
- Change model: Switch to a larger model (Large V3 Turbo recommended).
- Microphone: Try an external mic, adjust distance, reduce ambient noise.
- Language setting: Verify the recognition language is correctly set.
- Word dictionary: Register technical terms in the dictionary to improve accuracy.
Global shortcut doesn't work
Check Accessibility permissions.
- Open System Settings.
- Go to Privacy & Security > Accessibility.
- Verify RocketWhisper is listed and the toggle is enabled.
- If not listed, click "+" to add it.
- If already added but not working, disable and re-enable.
Also check that no other app is using the same shortcut. Change to a different shortcut if there's a conflict.
The app won't launch
Check the following:
- macOS version: Requires macOS 14.0 Sonoma or later. Check via Apple menu > About This Mac.
- Gatekeeper: If you see "app can't be opened because developer cannot be verified", go to System Settings > Privacy & Security and click "Open Anyway".
- Apple Silicon: Works on Intel Macs too, but Apple Silicon (M1+) is recommended.
AI processing doesn't work
Check the following:
- API key: Verify the API key is correctly entered in settings.
- Internet connection: Cloud AI providers require internet.
- API credits: For OpenAI or Anthropic, verify you have remaining API credits.
- Local LLM: For LM Studio or Ollama, verify the local server is running.
Microphone permission
RocketWhisper needs microphone permission.
- Open System Settings.
- Go to Privacy & Security > Microphone.
- Verify RocketWhisper's toggle is enabled.
If the permission dialog didn't appear on first launch, quit and restart the app.
No recognition result appears
Check the following:
- Microphone input: Verify the correct microphone is selected in "Input Device" settings.
- Microphone permission: Check microphone access is allowed in macOS privacy settings.
- Volume: Check input volume is adequate. Verify in System Settings > Sound > Input.
- Recording duration: Very short recordings (under 1 second) may not produce results.
Need More Help?
If the above doesn't solve your issue, please contact support.