1. Installation

System Requirements

  • Windows 11 (64-bit)
  • .NET 8.0 Desktop Runtime
  • 8GB+ RAM recommended
  • GPU (CUDA-compatible) for accelerated processing

Installation Steps

  1. Download the latest version from the Download page
  2. Extract the downloaded ZIP file
  3. Run RocketWhisper.exe
  4. On first launch, download the Whisper model you want to use

Choosing the Right Model

Model Size Accuracy Speed Recommended Use
small 466MB High Fast Low-spec PCs
medium 1.5GB High Normal Best for audio under 5 seconds
large-v3-turbo 1.6GB High Fast Best for 5–20 second audio
large-v3 2.9GB Highest Slightly slow Best for audio over 20 seconds

Tips for Choosing

  • Short audio (under 5 seconds): medium is recommended
  • Medium-length audio (5–20 seconds): large-v3-turbo is recommended
  • Long audio (over 20 seconds): large-v3 offers the highest accuracy

* Larger models may produce repetition artifacts with very short audio clips. If your use case involves brief utterances, try the medium model.

2. Basic Usage

Speech Recognition via Microphone

  1. Launch RocketWhisper
  2. Click the record button (microphone icon)
  3. Speak into the microphone
  4. Click the stop button
  5. The recognized text appears in the text area

Speech Recognition from File

  1. Click the file select button
  2. Choose an audio file (WAV/MP3/FLAC/OGG/M4A/WMA)
  3. Recognition processing starts automatically
  4. The recognized text appears in the text area

Drag & Drop Support

Simply drag and drop an audio file onto the RocketWhisper window to start recognition immediately.

Using Recognition Results

  • Copy: Click the copy button to copy to clipboard
  • Save: Click the save button to save as a text file
  • Auto Copy: When enabled in settings, automatically copies upon recognition completion
  • Auto Paste: When enabled in settings, automatically pastes upon recognition completion

3. Settings

Click the gear icon to open the settings window. Detailed configuration is available across 8 tabs.

Model & Language Tab

  • Model Selection: Choose the Whisper model to use
  • Language: Recognition language (Japanese/English/Chinese/Korean/Auto-detect)
  • Punctuation Prompt: Instructs Whisper to include punctuation in the output (recommended)

Input Device Tab

  • Microphone Device: Select the microphone to use
  • Auto Copy: Automatically copy to clipboard upon recognition completion
  • Auto Paste: Automatically paste to the active window upon recognition completion

Hotkey Tab

  • Record Start/Stop Hotkey: A shortcut to start recording from any application
  • Right Alt (Hold): Hold Right Alt to record, release to stop (double-tap for continuous recording)
  • Cancel Hotkey: Key to cancel recording (default: Escape)
  • AI Command Hotkey: Key to execute AI processing with text selection + voice instruction (default: Ctrl + Space)

Custom Terms Tab

  • Custom Term Registration: Register terms to improve recognition accuracy
  • Token Usage: Whisper accepts up to 224 tokens' worth of terms (approximately 50 English words)

Text Processing Tab

  • Auto Punctuation: Add punctuation via post-processing after recognition
  • Auto Line Breaks: Automatically insert line breaks in long text
  • Voice Commands: Enable/disable voice command functionality

Correction Rules Tab

  • Auto Correction: Enable/disable automatic correction of misrecognition patterns
  • Preset Rules: One-click addition of AI/tech terms, numbers/dates, common errors, and filler removal
  • Custom Rules: Add and edit custom correction rules (regex supported)

App-Specific Processing Tab

  • Processing Modes: Apply different text processing settings per application
  • App Mapping: Assign processing modes to specific applications

AI Processing Tab

  • AI Provider: Choose from OpenAI, Anthropic, Groq, Gemini, or Local LLM
  • Model Selection: Select the AI model to use
  • API Key: Enter the API key for your chosen provider
  • Connection Test: Verify that settings are correct

Punctuation Prompt vs. Auto Punctuation

Feature Processing Timing How It Works
Punctuation Prompt During recognition (inside Whisper) The AI understands context and places punctuation naturally
Auto Punctuation After recognition (post-processing) Mechanically inserts punctuation after sentence-ending patterns

We recommend starting with Punctuation Prompt only. If punctuation is still insufficient, enable Auto Punctuation as well.

License Tab

  • License Type: Free, Personal License, Business License
  • License Key: Enter and authenticate your license key

4. Global Hotkeys

Regardless of which application you are using, you can instantly start recording with your configured key combination.

Setup

  1. Open the settings window
  2. Select the "Hotkey" tab
  3. Click the input field
  4. Press your desired key combination (e.g., Ctrl + Shift + R)
  5. Check the "Enabled" checkbox

How to Use

  1. Open any application such as a text editor or web browser
  2. Press the configured hotkey to start recording
  3. When finished speaking, press the hotkey again to stop recording
  4. If Auto Paste is enabled, the recognized text is automatically typed into your application

Recommended Hotkeys

  • Right Alt (Hold) - Hold to record, release to stop (default, most recommended)
  • Ctrl + Shift + R - R for Record (toggle mode)
  • F9 - Function key (unlikely to conflict with other apps)

Right Alt Hold Mode (Push-to-Talk)

An intuitive mode where recording occurs only while the Right Alt key is held down, and stops automatically when released.

Basic Operations

Action Behavior
Hold Right Alt → Release Records only while held (Push-to-Talk)
Quick double-tap Right Alt Switches to continuous recording mode (tap again to stop)
Press another key while holding Right Alt Cancels recording (normal shortcuts like Alt+Tab remain functional)

Setup

  1. Open the settings window
  2. Select the "Hotkey" tab
  3. Click the "Right Alt (Hold)" button for the Record Start/Stop hotkey
  4. Check "Enabled" and save

Comparison with Traditional Hotkeys

Mode Start Recording Stop Recording
Traditional Hotkey Press once Press again (toggle)
Right Alt Hold Hold down Release (auto-stop)
Right Alt Double-Tap Quick double-tap Tap once more

Tip

Use "Hold" mode for short phrases and "Double-Tap (continuous recording)" for longer dictation. Since v1.0.5, the default hotkey is set to "Right Alt (Hold)".

Cancel Recording

Press the Escape key during recording to cancel without performing recognition. This works in both Hold mode and continuous recording mode.

  • Default Key: Escape
  • Customize: Settings → Hotkey tab → Cancel Hotkey
  • Enable/Disable: Toggle with the checkbox

5. Batch Processing

Transcribe multiple audio and video files at once. Video files are automatically processed by extracting audio via FFmpeg.

Batch Processing Steps

  1. Select "Batch Processing" from the menu
  2. Add the files you want to process (multiple selection supported)
  3. Choose the output format (Text/SRT/VTT)
  4. Click "Start Processing"
  5. Wait for processing to complete

Supported Formats

  • Audio Input: WAV, MP3, FLAC, OGG, M4A, WMA
  • Video Input: MP4, MKV, AVI, MOV, WebM, WMV, FLV
  • Output: TXT (plain text), SRT (subtitles), VTT (web subtitles)

* FFmpeg is required for video file transcription. It is included with the installer version.

6. Recognition History

Past recognition results are automatically saved and can be referenced or reused at any time.

Viewing History

  1. Click the history button (clock icon)
  2. A list of past recognition results is displayed
  3. Click an item to view its details

History Operations

  • Search: Search past recognition results by keyword
  • Copy: Copy the selected result to the clipboard
  • Delete: Delete unwanted history entries
  • Export: Export history as a file

History Storage Location

History data is stored at %APPDATA%\RocketWhisper\history.json.

7. Text Processing (Punctuation & Line Breaks)

Features for automatically inserting punctuation and line breaks into recognition results. There are two approaches.

Punctuation Prompt (Recommended)

Instructs Whisper AI to output recognition results that include punctuation.

  • How it works: Passes a sample sentence to Whisper to encourage punctuated output
  • Advantage: The AI understands context and places punctuation naturally
  • Supported languages: Japanese, English, Chinese, Korean

Auto Punctuation (Post-Processing)

Adds punctuation using rule-based logic after recognition.

  • How it works: Detects sentence-ending patterns and inserts periods
  • Advantage: Useful as a fallback when Whisper doesn't output punctuation
  • Note: Pattern-based mechanical processing that doesn't consider context

Auto Line Breaks

Inserts line breaks at appropriate positions in long text.

  • Breaks before conjunctions (e.g., "however", "also", "furthermore")
  • Splits sentences longer than 100 characters at sentence boundaries

Recommended Settings

  1. Start with Punctuation Prompt only enabled
  2. If punctuation is insufficient, also enable Auto Punctuation
  3. For long text, consider enabling Auto Line Breaks

8. Voice Commands

Perform actions like inserting line breaks or deleting text using voice commands.

Available Commands

Command Trigger Phrase Action
New Line "new line", "enter" Inserts a line break
New Paragraph "new paragraph", "paragraph" Inserts two line breaks (paragraph break)
Delete "delete", "undo", "backspace" Deletes the previous word

About Punctuation

Punctuation (periods, commas, question marks, exclamation marks) is automatically inserted by the "Punctuation Prompt" and "Auto Punctuation" features. Enable them in the Text Processing tab of the settings.

Usage Tips

  • Speak commands clearly at natural pauses in your speech
  • Pausing briefly before and after a command improves recognition accuracy
  • Voice commands can be enabled/disabled in the settings

9. Custom Terminology

Register industry-specific terms and proper nouns to improve recognition accuracy.

How to Register

  1. Open the settings window
  2. Select the "Custom Terms" tab
  3. Enter a term and click "Add"

Registration Capacity

There is a limit to the number of custom terms Whisper can accept (up to approximately 224 tokens).

  • English: Approximately 50 words
  • Japanese: Approximately 100 words

Registration capacity can be monitored in real-time in the settings. When the limit is exceeded, older terms are ignored, so prioritize registering frequently used terms.

This Feature is Completely Free

Custom terminology registration uses Whisper's initial prompt feature and processes everything locally. There are no API costs whatsoever.

10. Misrecognition Correction Rules

Set up rules to automatically correct specific misrecognition patterns. Similar to an IME word dictionary, it replaces recognized text strings with different strings.

How to Add Rules

  1. Open the settings window
  2. Select the "Correction Rules" tab
  3. Check "Auto-correct misrecognition patterns"
  4. Click the "Edit Correction Rules..." button
  5. Enter the "Search Pattern" and "Replacement String"
  6. Click "Add Rule"

Basic Usage (Simple Replacement)

Perform simple string replacements without using regular expressions.

Search Pattern Replacement String Description
micro soft Microsoft Fix spacing in company name
mojosoft Mojosoft Fix capitalization of company name
whisper Whisper Fix capitalization of product name

Using Regular Expressions (Advanced Replacement)

Check "Use Regular Expressions" to enable advanced pattern matching.

Search Pattern Replacement String Description
micro\s*soft Microsoft Replace regardless of spacing
you know$ Remove "you know" at end of line

Regular Expression Symbol Guide:

  • \s* : Matches zero or more whitespace characters (spaces, tabs, etc.). For example, "micro\s*soft" matches both "microsoft" and "micro soft".
  • $ : Matches the end of a line. "you know$" only matches "you know" at the end of a line, not in the middle of a sentence.

Built-in Hallucination Protection

RocketWhisper includes 73 built-in Whisper hallucination countermeasures. Unwanted phrases such as "Thank you for watching" are automatically removed.

Difference from Custom Terminology

  • Custom Terminology: Gives Whisper a hint that "this word might appear," improving recognition accuracy
  • Correction Rules: Replaces text in the recognition result after the fact

Using both together yields the most accurate recognition results.

11. App-Specific Processing Modes

Automatically apply different text processing settings depending on which application you are using.

Processing Mode List

Processing modes are presets that bundle settings for punctuation, line breaks, voice commands, and more.

Mode Icon AI Required Description
Smart No Auto-formats with punctuation and line breaks (standard)
Simple 📋 No Outputs recognized text as-is
Business 💼 Yes Auto-converts to formal/polite language
Casual 💬 Yes Converts to friendly, casual tone
Summary 📝 Yes Summarizes text concisely
Translation 🌐 Yes Translates to English
Grammar Fix ✔️ Yes Auto-corrects misrecognitions and grammar

Hint

To use modes marked "AI Required", configure an AI provider in the "AI Processing" tab of the settings.

Assigning Modes to Apps

  1. Open the settings window
  2. Select the "App-Specific Processing" tab
  3. Click "Add App"
  4. Select the target app (from the list of running apps or enter manually)
  5. Choose the processing mode to apply

Usage Examples

  • Email client: Auto-insert punctuation and line breaks (Smart mode)
  • Code editor: No formatting (Simple mode)
  • Notepad: Bullet-point-friendly settings (Custom mode)

Default Processing Mode

If an app is not registered, the "Default" mode is applied. Default mode settings can be changed in the "Text Processing" tab.

12. AI Processing (LLM Integration)

Integrate with external AI (LLM) to automatically convert recognition results to business language, casual tone, or perform summarization and translation. With a local LLM (LM Studio / Ollama), AI processing works even in offline environments.

Supported AI Providers

Provider Example Models Features
OpenAI gpt-4o, gpt-4o-mini High accuracy, consistent quality
Anthropic Claude 4.5 Sonnet/Haiku Natural language, long text support
Groq LLaMA 3.3 70B, Mixtral High speed, free tier available
Google Gemini gemini-2.5-pro/flash Generous free tier, high quality
Local LLM LM Studio, Ollama Offline operation, privacy protection

Setup

  1. Open the settings window
  2. Select the "AI Processing" tab
  3. Choose the provider to use
  4. Enter the API key (for Local LLM, only the base URL is needed)
  5. Click "Connection Test" to verify

AI Processing Modes

Once AI processing is configured, the following modes become available in app-specific processing.

Mode Description
💼 Business Auto-converts to formal/polite language
💬 Casual Converts to friendly, casual tone
📝 Summary Summarizes text concisely
🌐 Translation Translates to English
✔️ Grammar Fix Auto-corrects misrecognitions and grammar

Local LLM Setup Examples

Use LM Studio or Ollama to utilize AI processing completely offline.

LM Studio

LM Studio Setup:

  1. Launch LM Studio
  2. Select the model to use
  3. In the status bar, choose "Power User" or "Developer" from User / Power User / Developer
  4. Navigate to the Developer tab (click the green "Developer" icon in the left menu)
  5. If the "Status: Running" toggle at the top of the screen is ON, the server is running
  6. When "Running", the API URL (e.g., http://localhost:1234) is displayed
  7. To stop the server, click the toggle to switch to "Stopped"

RocketWhisper Setup:

Setting Value
Provider Local LLM
Base URL http://127.0.0.1:1234
Model ID Leave empty (automatically uses the currently loaded model)
API Key Leave empty

Ollama

Ollama Setup:

  1. Install and launch Ollama (the API server starts automatically in the background)
  2. Download the desired model: ollama pull gemma3
  3. Verify downloaded models: ollama list

RocketWhisper Setup:

Setting Value
Provider Local LLM
Base URL http://localhost:11434
Model ID Enter the name of a downloaded model (e.g., gemma3)
API Key Leave empty

LM Studio vs. Ollama

Item LM Studio Ollama
Default Port 1234 11434
Model ID Leave empty (auto-uses loaded model) Required (specify downloaded model name)
GUI Yes (desktop app) CLI-based (can supplement with Open WebUI, etc.)

Benefits of Local LLM

  • Fully Offline: Both speech recognition and AI processing work without internet
  • Privacy Protection: Neither audio data nor text is sent to any external service
  • Free: No API costs

* The /v1 path is automatically appended

Processing Flow

AI processing follows this pipeline: Speech Recognition → Punctuation Insertion → Line Break Insertion → AI Processing → Character Conversion. If AI processing fails, the original text is used as-is.

13. AI Command Mode

Select text, press the dedicated hotkey, and give a voice instruction — the AI instantly processes the text. Supports translation, summarization, tone adjustment, and any other instruction you can think of.

Basic Usage

  1. Select text in any application (Notepad, browser, Word, etc.)
  2. Press the AI Command hotkey (default: Ctrl + Space)
  3. The status changes to "Waiting for voice instruction..." — give your instruction by voice (e.g., "Translate to Japanese", "Summarize this", "Make it formal")
  4. Press the AI Command hotkey again to stop recording
  5. The AI processes the text, and the result appears in RocketWhisper's text area

Usage Examples

Selected Text Voice Instruction Result
Text in another language "Translate to English" Text translated to English
Long meeting notes "Summarize this" A concise summary
Casual memo "Make it formal" Business-appropriate text
Program code "Add comments" Code with comments added

Hotkey Configuration

  1. Open the settings window
  2. Select the "Hotkey" tab
  3. Click the "AI Command Hotkey" input field
  4. Press your desired key combination (default: Ctrl + Space)
  5. Check "Enabled"

Prerequisites

To use AI Command Mode, you must configure an AI provider in the "AI Processing" tab of the settings. With a local LLM (LM Studio / Ollama), it works completely offline and free of charge.

Mutual Exclusion

AI Command is disabled during normal recording, and normal hotkeys are disabled during AI Command recording. This safety design prevents accidental operations.

14. Custom Instructions

Register your most-used AI tasks (translation, business rephrasing, summarization, etc.) to dedicated hotkeys. Just speak and the AI processing is applied automatically. Unlike AI Command Mode, you don't need to say "translate this" each time, making it ideal for repetitive tasks.

Setup

  1. Configure an AI provider in the "AI Processing" tab of the settings (local LLM or cloud API)
  2. Open the "Hotkeys" tab in the settings and find the "Custom Instructions" section
  3. Choose from presets (Translate to English, Business Style, Summary, Grammar Fix) or click "Add" to create a new one
  4. Assign a dedicated hotkey
  5. Make sure "Enabled" is checked

Basic Usage

  1. Press the hotkey assigned to your custom instruction
  2. Speak into the microphone (any content in any language)
  3. Press the same hotkey again to stop recording
  4. Whisper recognizes the audio → the pre-configured AI processing is applied automatically → result is output

Built-in Presets

Preset Icon Action
Translate to English 🌐 Translates recognized text to English
Business Style 💼 Rephrases into formal/polite business tone
Summary 📝 Summarizes the text concisely
Grammar Fix ✔️ Corrects grammar and misrecognition errors

Adding Custom Instructions

  1. Go to Settings → "Hotkeys" tab → "Custom Instructions" section → click "Add"
  2. Enter a name (e.g., "Translate to French")
  3. Enter a prompt (e.g., "Please translate the following text into French")
  4. Assign a dedicated hotkey
  5. Click "Save"

You can register up to 20 custom instructions.

Difference from AI Command Mode

AI Command Mode is a versatile tool where you select text and give voice instructions each time. Custom Instructions bind a pre-set prompt to a hotkey, so no voice instruction is needed — ideal for repeated tasks.

Prerequisites

To use Custom Instructions, you must configure an AI provider in the "AI Processing" tab of the settings. With a local LLM (LM Studio / Ollama), it works completely offline and free of charge.

Mutual Exclusion

Custom Instruction hotkeys are disabled during normal recording or AI Command recording. This safety design prevents accidental operations.

15. Recording Indicator

When performing hotkey recording while RocketWhisper is minimized to the system tray, a floating window appears to visually confirm the recording state.

Indicator Features

  • Real-time Waveform Display: Visualizes audio levels with 16 animated bars
  • Processing Mode Display: Shows the currently applied mode (Smart, Business, etc.) with icon and name
  • Non-Focus-Stealing: Does not steal focus from other applications
  • Draggable: Can be moved to any position; position is automatically saved

Default Position

Initially appears at the bottom center of the screen (above the taskbar). Once dragged to a new position, it will appear there on subsequent recordings.

When the Indicator Appears

The recording indicator only appears when hotkey recording is performed while RocketWhisper is minimized to the system tray. It does not appear when the main window is visible.

16. Voice Launcher

Automatically launches an application when a specific keyword is recognized. Simply say "Notepad" to open Notepad — control apps with just your voice.

Setup

  1. Open the settings window and select the "Voice Launcher & Search" tab
  2. Check "Enable Voice Launcher"
  3. Click "Edit Launcher Settings..."
  4. Click "Add" to register a keyword and executable file
  5. Click "Save" in the settings window

Registration Fields

Field Description Example
Keyword The trigger phrase for launching Notepad, Calculator, Browser
Executable Path to the application C:\Windows\notepad.exe
Parameters Launch arguments (optional) --new-window
Description Notes (optional) Launch Notepad

Matching Behavior

  • Exact Match: Launches only when the recognition result exactly matches the keyword
  • Punctuation Ignored: Trailing punctuation marks (commas, periods, !, ?) are automatically stripped
  • Case Insensitive: English words are matched regardless of case

Registration Tips

Register the exact phrase you would naturally say as the keyword. For example, if you would say "open Notepad", register "open Notepad" as the keyword. Shorter keywords (like "Notepad") are easier to recognize.

Difference from Voice Commands

Voice commands ("new line", "new paragraph", etc.) are for text editing, while Voice Launcher is for launching applications. Both can be enabled simultaneously, but if the same keyword is registered for both, Voice Launcher takes priority.

18. Tray Icon

Check the application's status via the system tray icon.

Icon States

State Icon Description
Standby 🎤 (blue) Ready to record
Recording 🎤 (blue) + 🔴 (blinking) Red dot blinks in the bottom-right corner
Processing ⚙️ (yellow, spinning) Whisper is performing speech recognition

Tray Icon Actions

  • Left Click: Show the main window
  • Right Click: Show the context menu

19. Troubleshooting

Please check your network connection. Firewall or proxy settings may be causing the issue.

  • Verify your internet connection
  • Allow RocketWhisper through your firewall
  • Temporarily disable VPN
  • Use the large-v3-turbo model (high accuracy & fast, default)
  • Move the microphone closer to your mouth
  • Reduce ambient noise
  • Register custom terminology (* For even better accuracy with specialized terms, consider using the large-v3 model)
  • Verify that the hotkey is set to "Enabled" in settings
  • Check for conflicts with other applications
  • Try running as administrator
  • Verify that voice commands are "Enabled" in settings
  • Speak commands clearly
  • Pause briefly before and after the command
  • Verify that .NET 8.0 Desktop Runtime is installed
  • Confirm you are running Windows 11 64-bit
  • Check if antivirus software is blocking the application

Still Having Issues?

Please contact us through the Support page.