1. Installation
System Requirements
- Windows 11 (64-bit)
- .NET 8.0 Desktop Runtime
- 8GB+ RAM recommended
- GPU (CUDA-compatible) for accelerated processing
Installation Steps
- Download the latest version from the Download page
- Extract the downloaded ZIP file
- Run
RocketWhisper.exe - On first launch, download the Whisper model you want to use
Choosing the Right Model
| Model | Size | Accuracy | Speed | Recommended Use |
|---|---|---|---|---|
| small | 466MB | High | Fast | Low-spec PCs |
| medium | 1.5GB | High | Normal | Best for audio under 5 seconds |
| large-v3-turbo | 1.6GB | High | Fast | Best for 5–20 second audio |
| large-v3 | 2.9GB | Highest | Slightly slow | Best for audio over 20 seconds |
Tips for Choosing
- Short audio (under 5 seconds): medium is recommended
- Medium-length audio (5–20 seconds): large-v3-turbo is recommended
- Long audio (over 20 seconds): large-v3 offers the highest accuracy
* Larger models may produce repetition artifacts with very short audio clips. If your use case involves brief utterances, try the medium model.
2. Basic Usage
Speech Recognition via Microphone
- Launch RocketWhisper
- Click the record button (microphone icon)
- Speak into the microphone
- Click the stop button
- The recognized text appears in the text area
Speech Recognition from File
- Click the file select button
- Choose an audio file (WAV/MP3/FLAC/OGG/M4A/WMA)
- Recognition processing starts automatically
- The recognized text appears in the text area
Drag & Drop Support
Simply drag and drop an audio file onto the RocketWhisper window to start recognition immediately.
Using Recognition Results
- Copy: Click the copy button to copy to clipboard
- Save: Click the save button to save as a text file
- Auto Copy: When enabled in settings, automatically copies upon recognition completion
- Auto Paste: When enabled in settings, automatically pastes upon recognition completion
3. Settings
Click the gear icon to open the settings window. Detailed configuration is available across 8 tabs.
Model & Language Tab
- Model Selection: Choose the Whisper model to use
- Language: Recognition language (Japanese/English/Chinese/Korean/Auto-detect)
- Punctuation Prompt: Instructs Whisper to include punctuation in the output (recommended)
Input Device Tab
- Microphone Device: Select the microphone to use
- Auto Copy: Automatically copy to clipboard upon recognition completion
- Auto Paste: Automatically paste to the active window upon recognition completion
Hotkey Tab
- Record Start/Stop Hotkey: A shortcut to start recording from any application
- Right Alt (Hold): Hold Right Alt to record, release to stop (double-tap for continuous recording)
- Cancel Hotkey: Key to cancel recording (default: Escape)
- AI Command Hotkey: Key to execute AI processing with text selection + voice instruction (default: Ctrl + Space)
Custom Terms Tab
- Custom Term Registration: Register terms to improve recognition accuracy
- Token Usage: Whisper accepts up to 224 tokens' worth of terms (approximately 50 English words)
Text Processing Tab
- Auto Punctuation: Add punctuation via post-processing after recognition
- Auto Line Breaks: Automatically insert line breaks in long text
- Voice Commands: Enable/disable voice command functionality
Correction Rules Tab
- Auto Correction: Enable/disable automatic correction of misrecognition patterns
- Preset Rules: One-click addition of AI/tech terms, numbers/dates, common errors, and filler removal
- Custom Rules: Add and edit custom correction rules (regex supported)
App-Specific Processing Tab
- Processing Modes: Apply different text processing settings per application
- App Mapping: Assign processing modes to specific applications
AI Processing Tab
- AI Provider: Choose from OpenAI, Anthropic, Groq, Gemini, or Local LLM
- Model Selection: Select the AI model to use
- API Key: Enter the API key for your chosen provider
- Connection Test: Verify that settings are correct
Punctuation Prompt vs. Auto Punctuation
| Feature | Processing Timing | How It Works |
|---|---|---|
| Punctuation Prompt | During recognition (inside Whisper) | The AI understands context and places punctuation naturally |
| Auto Punctuation | After recognition (post-processing) | Mechanically inserts punctuation after sentence-ending patterns |
We recommend starting with Punctuation Prompt only. If punctuation is still insufficient, enable Auto Punctuation as well.
License Tab
- License Type: Free, Personal License, Business License
- License Key: Enter and authenticate your license key
4. Global Hotkeys
Regardless of which application you are using, you can instantly start recording with your configured key combination.
Setup
- Open the settings window
- Select the "Hotkey" tab
- Click the input field
- Press your desired key combination (e.g.,
Ctrl + Shift + R) - Check the "Enabled" checkbox
How to Use
- Open any application such as a text editor or web browser
- Press the configured hotkey to start recording
- When finished speaking, press the hotkey again to stop recording
- If Auto Paste is enabled, the recognized text is automatically typed into your application
Recommended Hotkeys
Right Alt (Hold)- Hold to record, release to stop (default, most recommended)Ctrl + Shift + R- R for Record (toggle mode)F9- Function key (unlikely to conflict with other apps)
Right Alt Hold Mode (Push-to-Talk)
An intuitive mode where recording occurs only while the Right Alt key is held down, and stops automatically when released.
Basic Operations
| Action | Behavior |
|---|---|
| Hold Right Alt → Release | Records only while held (Push-to-Talk) |
| Quick double-tap Right Alt | Switches to continuous recording mode (tap again to stop) |
| Press another key while holding Right Alt | Cancels recording (normal shortcuts like Alt+Tab remain functional) |
Setup
- Open the settings window
- Select the "Hotkey" tab
- Click the "Right Alt (Hold)" button for the Record Start/Stop hotkey
- Check "Enabled" and save
Comparison with Traditional Hotkeys
| Mode | Start Recording | Stop Recording |
|---|---|---|
| Traditional Hotkey | Press once | Press again (toggle) |
| Right Alt Hold | Hold down | Release (auto-stop) |
| Right Alt Double-Tap | Quick double-tap | Tap once more |
Tip
Use "Hold" mode for short phrases and "Double-Tap (continuous recording)" for longer dictation. Since v1.0.5, the default hotkey is set to "Right Alt (Hold)".
Cancel Recording
Press the Escape key during recording to cancel without performing recognition. This works in both Hold mode and continuous recording mode.
- Default Key: Escape
- Customize: Settings → Hotkey tab → Cancel Hotkey
- Enable/Disable: Toggle with the checkbox
5. Batch Processing
Transcribe multiple audio and video files at once. Video files are automatically processed by extracting audio via FFmpeg.
Batch Processing Steps
- Select "Batch Processing" from the menu
- Add the files you want to process (multiple selection supported)
- Choose the output format (Text/SRT/VTT)
- Click "Start Processing"
- Wait for processing to complete
Supported Formats
- Audio Input: WAV, MP3, FLAC, OGG, M4A, WMA
- Video Input: MP4, MKV, AVI, MOV, WebM, WMV, FLV
- Output: TXT (plain text), SRT (subtitles), VTT (web subtitles)
* FFmpeg is required for video file transcription. It is included with the installer version.
6. Recognition History
Past recognition results are automatically saved and can be referenced or reused at any time.
Viewing History
- Click the history button (clock icon)
- A list of past recognition results is displayed
- Click an item to view its details
History Operations
- Search: Search past recognition results by keyword
- Copy: Copy the selected result to the clipboard
- Delete: Delete unwanted history entries
- Export: Export history as a file
History Storage Location
History data is stored at %APPDATA%\RocketWhisper\history.json.
7. Text Processing (Punctuation & Line Breaks)
Features for automatically inserting punctuation and line breaks into recognition results. There are two approaches.
Punctuation Prompt (Recommended)
Instructs Whisper AI to output recognition results that include punctuation.
- How it works: Passes a sample sentence to Whisper to encourage punctuated output
- Advantage: The AI understands context and places punctuation naturally
- Supported languages: Japanese, English, Chinese, Korean
Auto Punctuation (Post-Processing)
Adds punctuation using rule-based logic after recognition.
- How it works: Detects sentence-ending patterns and inserts periods
- Advantage: Useful as a fallback when Whisper doesn't output punctuation
- Note: Pattern-based mechanical processing that doesn't consider context
Auto Line Breaks
Inserts line breaks at appropriate positions in long text.
- Breaks before conjunctions (e.g., "however", "also", "furthermore")
- Splits sentences longer than 100 characters at sentence boundaries
Recommended Settings
- Start with Punctuation Prompt only enabled
- If punctuation is insufficient, also enable Auto Punctuation
- For long text, consider enabling Auto Line Breaks
8. Voice Commands
Perform actions like inserting line breaks or deleting text using voice commands.
Available Commands
| Command | Trigger Phrase | Action |
|---|---|---|
| New Line | "new line", "enter" | Inserts a line break |
| New Paragraph | "new paragraph", "paragraph" | Inserts two line breaks (paragraph break) |
| Delete | "delete", "undo", "backspace" | Deletes the previous word |
About Punctuation
Punctuation (periods, commas, question marks, exclamation marks) is automatically inserted by the "Punctuation Prompt" and "Auto Punctuation" features. Enable them in the Text Processing tab of the settings.
Usage Tips
- Speak commands clearly at natural pauses in your speech
- Pausing briefly before and after a command improves recognition accuracy
- Voice commands can be enabled/disabled in the settings
9. Custom Terminology
Register industry-specific terms and proper nouns to improve recognition accuracy.
How to Register
- Open the settings window
- Select the "Custom Terms" tab
- Enter a term and click "Add"
Registration Capacity
There is a limit to the number of custom terms Whisper can accept (up to approximately 224 tokens).
- English: Approximately 50 words
- Japanese: Approximately 100 words
Registration capacity can be monitored in real-time in the settings. When the limit is exceeded, older terms are ignored, so prioritize registering frequently used terms.
This Feature is Completely Free
Custom terminology registration uses Whisper's initial prompt feature and processes everything locally. There are no API costs whatsoever.
10. Misrecognition Correction Rules
Set up rules to automatically correct specific misrecognition patterns. Similar to an IME word dictionary, it replaces recognized text strings with different strings.
How to Add Rules
- Open the settings window
- Select the "Correction Rules" tab
- Check "Auto-correct misrecognition patterns"
- Click the "Edit Correction Rules..." button
- Enter the "Search Pattern" and "Replacement String"
- Click "Add Rule"
Basic Usage (Simple Replacement)
Perform simple string replacements without using regular expressions.
| Search Pattern | Replacement String | Description |
|---|---|---|
| micro soft | Microsoft | Fix spacing in company name |
| mojosoft | Mojosoft | Fix capitalization of company name |
| whisper | Whisper | Fix capitalization of product name |
Using Regular Expressions (Advanced Replacement)
Check "Use Regular Expressions" to enable advanced pattern matching.
| Search Pattern | Replacement String | Description |
|---|---|---|
| micro\s*soft | Microsoft | Replace regardless of spacing |
| you know$ | Remove "you know" at end of line |
Regular Expression Symbol Guide:
\s*: Matches zero or more whitespace characters (spaces, tabs, etc.). For example, "micro\s*soft" matches both "microsoft" and "micro soft".$: Matches the end of a line. "you know$" only matches "you know" at the end of a line, not in the middle of a sentence.
Built-in Hallucination Protection
RocketWhisper includes 73 built-in Whisper hallucination countermeasures. Unwanted phrases such as "Thank you for watching" are automatically removed.
Difference from Custom Terminology
- Custom Terminology: Gives Whisper a hint that "this word might appear," improving recognition accuracy
- Correction Rules: Replaces text in the recognition result after the fact
Using both together yields the most accurate recognition results.
11. App-Specific Processing Modes
Automatically apply different text processing settings depending on which application you are using.
Processing Mode List
Processing modes are presets that bundle settings for punctuation, line breaks, voice commands, and more.
| Mode | Icon | AI Required | Description |
|---|---|---|---|
| Smart | ✨ | No | Auto-formats with punctuation and line breaks (standard) |
| Simple | 📋 | No | Outputs recognized text as-is |
| Business | 💼 | Yes | Auto-converts to formal/polite language |
| Casual | 💬 | Yes | Converts to friendly, casual tone |
| Summary | 📝 | Yes | Summarizes text concisely |
| Translation | 🌐 | Yes | Translates to English |
| Grammar Fix | ✔️ | Yes | Auto-corrects misrecognitions and grammar |
Hint
To use modes marked "AI Required", configure an AI provider in the "AI Processing" tab of the settings.
Assigning Modes to Apps
- Open the settings window
- Select the "App-Specific Processing" tab
- Click "Add App"
- Select the target app (from the list of running apps or enter manually)
- Choose the processing mode to apply
Usage Examples
- Email client: Auto-insert punctuation and line breaks (Smart mode)
- Code editor: No formatting (Simple mode)
- Notepad: Bullet-point-friendly settings (Custom mode)
Default Processing Mode
If an app is not registered, the "Default" mode is applied. Default mode settings can be changed in the "Text Processing" tab.
12. AI Processing (LLM Integration)
Integrate with external AI (LLM) to automatically convert recognition results to business language, casual tone, or perform summarization and translation. With a local LLM (LM Studio / Ollama), AI processing works even in offline environments.
Supported AI Providers
| Provider | Example Models | Features |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini | High accuracy, consistent quality |
| Anthropic | Claude 4.5 Sonnet/Haiku | Natural language, long text support |
| Groq | LLaMA 3.3 70B, Mixtral | High speed, free tier available |
| Google Gemini | gemini-2.5-pro/flash | Generous free tier, high quality |
| Local LLM | LM Studio, Ollama | Offline operation, privacy protection |
Setup
- Open the settings window
- Select the "AI Processing" tab
- Choose the provider to use
- Enter the API key (for Local LLM, only the base URL is needed)
- Click "Connection Test" to verify
AI Processing Modes
Once AI processing is configured, the following modes become available in app-specific processing.
| Mode | Description |
|---|---|
| 💼 Business | Auto-converts to formal/polite language |
| 💬 Casual | Converts to friendly, casual tone |
| 📝 Summary | Summarizes text concisely |
| 🌐 Translation | Translates to English |
| ✔️ Grammar Fix | Auto-corrects misrecognitions and grammar |
Local LLM Setup Examples
Use LM Studio or Ollama to utilize AI processing completely offline.
LM Studio
LM Studio Setup:
- Launch LM Studio
- Select the model to use
- In the status bar, choose "Power User" or "Developer" from User / Power User / Developer
- Navigate to the Developer tab (click the green "Developer" icon in the left menu)
- If the "Status: Running" toggle at the top of the screen is ON, the server is running
- When "Running", the API URL (e.g.,
http://localhost:1234) is displayed - To stop the server, click the toggle to switch to "Stopped"
RocketWhisper Setup:
| Setting | Value |
|---|---|
| Provider | Local LLM |
| Base URL | http://127.0.0.1:1234 |
| Model ID | Leave empty (automatically uses the currently loaded model) |
| API Key | Leave empty |
Ollama
Ollama Setup:
- Install and launch Ollama (the API server starts automatically in the background)
- Download the desired model:
ollama pull gemma3 - Verify downloaded models:
ollama list
RocketWhisper Setup:
| Setting | Value |
|---|---|
| Provider | Local LLM |
| Base URL | http://localhost:11434 |
| Model ID | Enter the name of a downloaded model (e.g., gemma3) |
| API Key | Leave empty |
LM Studio vs. Ollama
| Item | LM Studio | Ollama |
|---|---|---|
| Default Port | 1234 | 11434 |
| Model ID | Leave empty (auto-uses loaded model) | Required (specify downloaded model name) |
| GUI | Yes (desktop app) | CLI-based (can supplement with Open WebUI, etc.) |
Benefits of Local LLM
- Fully Offline: Both speech recognition and AI processing work without internet
- Privacy Protection: Neither audio data nor text is sent to any external service
- Free: No API costs
* The /v1 path is automatically appended
Processing Flow
AI processing follows this pipeline: Speech Recognition → Punctuation Insertion → Line Break Insertion → AI Processing → Character Conversion. If AI processing fails, the original text is used as-is.
13. AI Command Mode
Select text, press the dedicated hotkey, and give a voice instruction — the AI instantly processes the text. Supports translation, summarization, tone adjustment, and any other instruction you can think of.
Basic Usage
- Select text in any application (Notepad, browser, Word, etc.)
- Press the AI Command hotkey (default:
Ctrl + Space) - The status changes to "Waiting for voice instruction..." — give your instruction by voice (e.g., "Translate to Japanese", "Summarize this", "Make it formal")
- Press the AI Command hotkey again to stop recording
- The AI processes the text, and the result appears in RocketWhisper's text area
Usage Examples
| Selected Text | Voice Instruction | Result |
|---|---|---|
| Text in another language | "Translate to English" | Text translated to English |
| Long meeting notes | "Summarize this" | A concise summary |
| Casual memo | "Make it formal" | Business-appropriate text |
| Program code | "Add comments" | Code with comments added |
Hotkey Configuration
- Open the settings window
- Select the "Hotkey" tab
- Click the "AI Command Hotkey" input field
- Press your desired key combination (default:
Ctrl + Space) - Check "Enabled"
Prerequisites
To use AI Command Mode, you must configure an AI provider in the "AI Processing" tab of the settings. With a local LLM (LM Studio / Ollama), it works completely offline and free of charge.
Mutual Exclusion
AI Command is disabled during normal recording, and normal hotkeys are disabled during AI Command recording. This safety design prevents accidental operations.
14. Custom Instructions
Register your most-used AI tasks (translation, business rephrasing, summarization, etc.) to dedicated hotkeys. Just speak and the AI processing is applied automatically. Unlike AI Command Mode, you don't need to say "translate this" each time, making it ideal for repetitive tasks.
Setup
- Configure an AI provider in the "AI Processing" tab of the settings (local LLM or cloud API)
- Open the "Hotkeys" tab in the settings and find the "Custom Instructions" section
- Choose from presets (Translate to English, Business Style, Summary, Grammar Fix) or click "Add" to create a new one
- Assign a dedicated hotkey
- Make sure "Enabled" is checked
Basic Usage
- Press the hotkey assigned to your custom instruction
- Speak into the microphone (any content in any language)
- Press the same hotkey again to stop recording
- Whisper recognizes the audio → the pre-configured AI processing is applied automatically → result is output
Built-in Presets
| Preset | Icon | Action |
|---|---|---|
| Translate to English | 🌐 | Translates recognized text to English |
| Business Style | 💼 | Rephrases into formal/polite business tone |
| Summary | 📝 | Summarizes the text concisely |
| Grammar Fix | ✔️ | Corrects grammar and misrecognition errors |
Adding Custom Instructions
- Go to Settings → "Hotkeys" tab → "Custom Instructions" section → click "Add"
- Enter a name (e.g., "Translate to French")
- Enter a prompt (e.g., "Please translate the following text into French")
- Assign a dedicated hotkey
- Click "Save"
You can register up to 20 custom instructions.
Difference from AI Command Mode
AI Command Mode is a versatile tool where you select text and give voice instructions each time. Custom Instructions bind a pre-set prompt to a hotkey, so no voice instruction is needed — ideal for repeated tasks.
Prerequisites
To use Custom Instructions, you must configure an AI provider in the "AI Processing" tab of the settings. With a local LLM (LM Studio / Ollama), it works completely offline and free of charge.
Mutual Exclusion
Custom Instruction hotkeys are disabled during normal recording or AI Command recording. This safety design prevents accidental operations.
15. Recording Indicator
When performing hotkey recording while RocketWhisper is minimized to the system tray, a floating window appears to visually confirm the recording state.
Indicator Features
- Real-time Waveform Display: Visualizes audio levels with 16 animated bars
- Processing Mode Display: Shows the currently applied mode (Smart, Business, etc.) with icon and name
- Non-Focus-Stealing: Does not steal focus from other applications
- Draggable: Can be moved to any position; position is automatically saved
Default Position
Initially appears at the bottom center of the screen (above the taskbar). Once dragged to a new position, it will appear there on subsequent recordings.
When the Indicator Appears
The recording indicator only appears when hotkey recording is performed while RocketWhisper is minimized to the system tray. It does not appear when the main window is visible.
16. Voice Launcher
Automatically launches an application when a specific keyword is recognized. Simply say "Notepad" to open Notepad — control apps with just your voice.
Setup
- Open the settings window and select the "Voice Launcher & Search" tab
- Check "Enable Voice Launcher"
- Click "Edit Launcher Settings..."
- Click "Add" to register a keyword and executable file
- Click "Save" in the settings window
Registration Fields
| Field | Description | Example |
|---|---|---|
| Keyword | The trigger phrase for launching | Notepad, Calculator, Browser |
| Executable | Path to the application | C:\Windows\notepad.exe |
| Parameters | Launch arguments (optional) | --new-window |
| Description | Notes (optional) | Launch Notepad |
Matching Behavior
- Exact Match: Launches only when the recognition result exactly matches the keyword
- Punctuation Ignored: Trailing punctuation marks (commas, periods, !, ?) are automatically stripped
- Case Insensitive: English words are matched regardless of case
Registration Tips
Register the exact phrase you would naturally say as the keyword. For example, if you would say "open Notepad", register "open Notepad" as the keyword. Shorter keywords (like "Notepad") are easier to recognize.
Difference from Voice Commands
Voice commands ("new line", "new paragraph", etc.) are for text editing, while Voice Launcher is for launching applications. Both can be enabled simultaneously, but if the same keyword is registered for both, Voice Launcher takes priority.
17. Voice Search
Simply say "Search for..." during normal voice input to automatically open a Google search in your default browser.
Supported Phrases
| Phrase | Example |
|---|---|
| Search for ___ | "Search for Tokyo Tower" |
| Look up ___ | "Look up weather forecast" |
| Google ___ | "Google RocketWhisper" |
| What is ___ | "What is machine learning" |
| Tell me about ___ | "Tell me about Python" |
| ___ meaning | "AI meaning" |
Setup
- Open the settings window
- Select the "Voice Launcher & Search" tab
- Check "Enable Voice Search"
How It Works
The flow is: Recording → Speech Recognition → Phrase Detection → Auto-open Google search in default browser. If the input doesn't match a search phrase, it proceeds through the normal text processing pipeline.
Difference from Voice Launcher
Voice Launcher "launches a specific app when an exact keyword match is detected", while Voice Search "opens a browser search when a search phrase pattern is matched". Both can be enabled simultaneously.
18. Tray Icon
Check the application's status via the system tray icon.
Icon States
| State | Icon | Description |
|---|---|---|
| Standby | 🎤 (blue) | Ready to record |
| Recording | 🎤 (blue) + 🔴 (blinking) | Red dot blinks in the bottom-right corner |
| Processing | ⚙️ (yellow, spinning) | Whisper is performing speech recognition |
Tray Icon Actions
- Left Click: Show the main window
- Right Click: Show the context menu
19. Troubleshooting
Please check your network connection. Firewall or proxy settings may be causing the issue.
- Verify your internet connection
- Allow RocketWhisper through your firewall
- Temporarily disable VPN
- Use the large-v3-turbo model (high accuracy & fast, default)
- Move the microphone closer to your mouth
- Reduce ambient noise
- Register custom terminology (* For even better accuracy with specialized terms, consider using the large-v3 model)
- Verify that the hotkey is set to "Enabled" in settings
- Check for conflicts with other applications
- Try running as administrator
- Verify that voice commands are "Enabled" in settings
- Speak commands clearly
- Pause briefly before and after the command
- Verify that .NET 8.0 Desktop Runtime is installed
- Confirm you are running Windows 11 64-bit
- Check if antivirus software is blocking the application
Still Having Issues?
Please contact us through the Support page.