User Guide - RocketWhisper | AI Speech Recognition & Transcription

1. Installation

System Requirements

Windows 11 (64-bit)
.NET 8.0 Desktop Runtime
8GB+ RAM recommended
GPU (CUDA-compatible) for accelerated processing

Installation Steps

Download the latest version from the Download page
Extract the downloaded ZIP file
Run RocketWhisper.exe
On first launch, download the Whisper model you want to use

Choosing the Right Model

Model	Size	Accuracy	Speed	Recommended Use
small	466MB	High	Fast	Low-spec PCs
medium	1.5GB	High	Normal	Best for audio under 5 seconds
large-v3-turbo	1.6GB	High	Fast	Best for 5–20 second audio
large-v3	2.9GB	Highest	Slightly slow	Best for audio over 20 seconds

Tips for Choosing

Short audio (under 5 seconds): medium is recommended
Medium-length audio (5–20 seconds): large-v3-turbo is recommended
Long audio (over 20 seconds): large-v3 offers the highest accuracy

* Larger models may produce repetition artifacts with very short audio clips. If your use case involves brief utterances, try the medium model.

2. Basic Usage

Speech Recognition via Microphone

Launch RocketWhisper
Click the record button (microphone icon)
Speak into the microphone
Click the stop button
The recognized text appears in the text area

Speech Recognition from File

Click the file select button
Choose an audio file (WAV/MP3/FLAC/OGG/M4A/WMA)
Recognition processing starts automatically
The recognized text appears in the text area

Drag & Drop Support

Simply drag and drop an audio file onto the RocketWhisper window to start recognition immediately.

Using Recognition Results

Copy: Click the copy button to copy to clipboard
Save: Click the save button to save as a text file
Auto Copy: When enabled in settings, automatically copies upon recognition completion
Auto Paste: When enabled in settings, automatically pastes upon recognition completion

3. Settings

Click the gear icon to open the settings window. Detailed configuration is available across 8 tabs.

Model & Language Tab

Model Selection: Choose the Whisper model to use
Language: Recognition language (Japanese/English/Chinese/Korean/Auto-detect)
Punctuation Prompt: Instructs Whisper to include punctuation in the output (recommended)

Input Device Tab

Microphone Device: Select the microphone to use
Auto Copy: Automatically copy to clipboard upon recognition completion
Auto Paste: Automatically paste to the active window upon recognition completion

Hotkey Tab

Record Start/Stop Hotkey: A shortcut to start recording from any application
Right Alt (Hold): Hold Right Alt to record, release to stop (double-tap for continuous recording)
Cancel Hotkey: Key to cancel recording (default: Escape)
AI Command Hotkey: Key to execute AI processing with text selection + voice instruction (default: Ctrl + Space)

Custom Terms Tab

Custom Term Registration: Register terms to improve recognition accuracy
Token Usage: Whisper accepts up to 224 tokens' worth of terms (approximately 50 English words)

Text Processing Tab

Auto Punctuation: Add punctuation via post-processing after recognition
Auto Line Breaks: Automatically insert line breaks in long text
Voice Commands: Enable/disable voice command functionality

Correction Rules Tab

Auto Correction: Enable/disable automatic correction of misrecognition patterns
Preset Rules: One-click addition of AI/tech terms, numbers/dates, common errors, and filler removal
Custom Rules: Add and edit custom correction rules (regex supported)

App-Specific Processing Tab

Processing Modes: Apply different text processing settings per application
App Mapping: Assign processing modes to specific applications

AI Processing Tab

AI Provider: Choose from OpenAI, Anthropic, Groq, Gemini, or Local LLM
Model Selection: Select the AI model to use
API Key: Enter the API key for your chosen provider
Connection Test: Verify that settings are correct

Punctuation Prompt vs. Auto Punctuation

Feature	Processing Timing	How It Works
Punctuation Prompt	During recognition (inside Whisper)	The AI understands context and places punctuation naturally
Auto Punctuation	After recognition (post-processing)	Mechanically inserts punctuation after sentence-ending patterns

We recommend starting with Punctuation Prompt only. If punctuation is still insufficient, enable Auto Punctuation as well.

License Tab

License Type: Free, Personal License, Business License
License Key: Enter and authenticate your license key

4. Global Hotkeys

Regardless of which application you are using, you can instantly start recording with your configured key combination.

Setup

Open the settings window
Select the "Hotkey" tab
Click the input field
Press your desired key combination (e.g., Ctrl + Shift + R)
Check the "Enabled" checkbox

How to Use

Open any application such as a text editor or web browser
Press the configured hotkey to start recording
When finished speaking, press the hotkey again to stop recording
If Auto Paste is enabled, the recognized text is automatically typed into your application

Recommended Hotkeys

Right Alt (Hold) - Hold to record, release to stop (default, most recommended)
Ctrl + Shift + R - R for Record (toggle mode)
F9 - Function key (unlikely to conflict with other apps)

Right Alt Hold Mode (Push-to-Talk)

An intuitive mode where recording occurs only while the Right Alt key is held down, and stops automatically when released.

Basic Operations

Action	Behavior
Hold Right Alt → Release	Records only while held (Push-to-Talk)
Quick double-tap Right Alt	Switches to continuous recording mode (tap again to stop)
Press another key while holding Right Alt	Cancels recording (normal shortcuts like Alt+Tab remain functional)

Setup

Open the settings window
Select the "Hotkey" tab
Click the "Right Alt (Hold)" button for the Record Start/Stop hotkey
Check "Enabled" and save

Comparison with Traditional Hotkeys

Mode	Start Recording	Stop Recording
Traditional Hotkey	Press once	Press again (toggle)
Right Alt Hold	Hold down	Release (auto-stop)
Right Alt Double-Tap	Quick double-tap	Tap once more

Tip

Use "Hold" mode for short phrases and "Double-Tap (continuous recording)" for longer dictation. Since v1.0.5, the default hotkey is set to "Right Alt (Hold)".

Cancel Recording

Press the Escape key during recording to cancel without performing recognition. This works in both Hold mode and continuous recording mode.

Default Key: Escape
Customize: Settings → Hotkey tab → Cancel Hotkey
Enable/Disable: Toggle with the checkbox

5. Batch Processing

Transcribe multiple audio and video files at once. Video files are automatically processed by extracting audio via FFmpeg.

Batch Processing Steps

Select "Batch Processing" from the menu
Add the files you want to process (multiple selection supported)
Choose the output format (Text/SRT/VTT)
Click "Start Processing"
Wait for processing to complete

Supported Formats

Audio Input: WAV, MP3, FLAC, OGG, M4A, WMA
Video Input: MP4, MKV, AVI, MOV, WebM, WMV, FLV
Output: TXT (plain text), SRT (subtitles), VTT (web subtitles)

* FFmpeg is required for video file transcription. It is included with the installer version.

6. Recognition History

Past recognition results are automatically saved and can be referenced or reused at any time.

Viewing History

Click the history button (clock icon)
A list of past recognition results is displayed
Click an item to view its details

History Operations

Search: Search past recognition results by keyword
Copy: Copy the selected result to the clipboard
Delete: Delete unwanted history entries
Export: Export history as a file

History Storage Location

History data is stored at %APPDATA%\RocketWhisper\history.json.

7. Text Processing (Punctuation & Line Breaks)

Features for automatically inserting punctuation and line breaks into recognition results. There are two approaches.

Punctuation Prompt (Recommended)

Instructs Whisper AI to output recognition results that include punctuation.

How it works: Passes a sample sentence to Whisper to encourage punctuated output
Advantage: The AI understands context and places punctuation naturally
Supported languages: Japanese, English, Chinese, Korean

Auto Punctuation (Post-Processing)

Adds punctuation using rule-based logic after recognition.

How it works: Detects sentence-ending patterns and inserts periods
Advantage: Useful as a fallback when Whisper doesn't output punctuation
Note: Pattern-based mechanical processing that doesn't consider context

Auto Line Breaks

Inserts line breaks at appropriate positions in long text.

Breaks before conjunctions (e.g., "however", "also", "furthermore")
Splits sentences longer than 100 characters at sentence boundaries

Recommended Settings

Start with Punctuation Prompt only enabled
If punctuation is insufficient, also enable Auto Punctuation
For long text, consider enabling Auto Line Breaks

8. Voice Commands

Perform actions like inserting line breaks or deleting text using voice commands.

Available Commands

Command	Trigger Phrase	Action
New Line	"new line", "enter"	Inserts a line break
New Paragraph	"new paragraph", "paragraph"	Inserts two line breaks (paragraph break)
Delete	"delete", "undo", "backspace"	Deletes the previous word

About Punctuation

Punctuation (periods, commas, question marks, exclamation marks) is automatically inserted by the "Punctuation Prompt" and "Auto Punctuation" features. Enable them in the Text Processing tab of the settings.

Usage Tips

Speak commands clearly at natural pauses in your speech
Pausing briefly before and after a command improves recognition accuracy
Voice commands can be enabled/disabled in the settings

9. Custom Terminology

Register industry-specific terms and proper nouns to improve recognition accuracy.

How to Register

Open the settings window
Select the "Custom Terms" tab
Enter a term and click "Add"

Registration Capacity

There is a limit to the number of custom terms Whisper can accept (up to approximately 224 tokens).

English: Approximately 50 words
Japanese: Approximately 100 words

Registration capacity can be monitored in real-time in the settings. When the limit is exceeded, older terms are ignored, so prioritize registering frequently used terms.

This Feature is Completely Free

Custom terminology registration uses Whisper's initial prompt feature and processes everything locally. There are no API costs whatsoever.

10. Misrecognition Correction Rules

Set up rules to automatically correct specific misrecognition patterns. Similar to an IME word dictionary, it replaces recognized text strings with different strings.

How to Add Rules

Open the settings window
Select the "Correction Rules" tab
Check "Auto-correct misrecognition patterns"
Click the "Edit Correction Rules..." button
Enter the "Search Pattern" and "Replacement String"
Click "Add Rule"

Basic Usage (Simple Replacement)

Perform simple string replacements without using regular expressions.

Search Pattern	Replacement String	Description
micro soft	Microsoft	Fix spacing in company name
mojosoft	Mojosoft	Fix capitalization of company name
whisper	Whisper	Fix capitalization of product name

Using Regular Expressions (Advanced Replacement)

Check "Use Regular Expressions" to enable advanced pattern matching.

Search Pattern	Replacement String	Description
micro\s*soft	Microsoft	Replace regardless of spacing
you know$		Remove "you know" at end of line

Regular Expression Symbol Guide:

\s* : Matches zero or more whitespace characters (spaces, tabs, etc.). For example, "micro\s*soft" matches both "microsoft" and "micro soft".
$ : Matches the end of a line. "you know$" only matches "you know" at the end of a line, not in the middle of a sentence.

Built-in Hallucination Protection

RocketWhisper includes 73 built-in Whisper hallucination countermeasures. Unwanted phrases such as "Thank you for watching" are automatically removed.

Difference from Custom Terminology

Custom Terminology: Gives Whisper a hint that "this word might appear," improving recognition accuracy
Correction Rules: Replaces text in the recognition result after the fact

Using both together yields the most accurate recognition results.

11. App-Specific Processing Modes

Automatically apply different text processing settings depending on which application you are using.

Processing Mode List

Processing modes are presets that bundle settings for punctuation, line breaks, voice commands, and more.

Mode	Icon	AI Required	Description
Smart	✨	No	Auto-formats with punctuation and line breaks (standard)
Simple	📋	No	Outputs recognized text as-is
Business	💼	Yes	Auto-converts to formal/polite language
Casual	💬	Yes	Converts to friendly, casual tone
Summary	📝	Yes	Summarizes text concisely
Translation	🌐	Yes	Translates to English
Grammar Fix	✔️	Yes	Auto-corrects misrecognitions and grammar

Hint

To use modes marked "AI Required", configure an AI provider in the "AI Processing" tab of the settings.

Assigning Modes to Apps

Open the settings window
Select the "App-Specific Processing" tab
Click "Add App"
Select the target app (from the list of running apps or enter manually)
Choose the processing mode to apply

Usage Examples

Email client: Auto-insert punctuation and line breaks (Smart mode)
Code editor: No formatting (Simple mode)
Notepad: Bullet-point-friendly settings (Custom mode)

Default Processing Mode

If an app is not registered, the "Default" mode is applied. Default mode settings can be changed in the "Text Processing" tab.

12. AI Processing (LLM Integration)

Integrate with external AI (LLM) to automatically convert recognition results to business language, casual tone, or perform summarization and translation. With a local LLM (LM Studio / Ollama), AI processing works even in offline environments.

Supported AI Providers

Provider	Example Models	Features
OpenAI	gpt-4o, gpt-4o-mini	High accuracy, consistent quality
Anthropic	Claude 4.5 Sonnet/Haiku	Natural language, long text support
Groq	LLaMA 3.3 70B, Mixtral	High speed, free tier available
Google Gemini	gemini-2.5-pro/flash	Generous free tier, high quality
Local LLM	LM Studio, Ollama	Offline operation, privacy protection

Setup

Open the settings window
Select the "AI Processing" tab
Choose the provider to use
Enter the API key (for Local LLM, only the base URL is needed)
Click "Connection Test" to verify

AI Processing Modes

Once AI processing is configured, the following modes become available in app-specific processing.

Mode	Description
💼 Business	Auto-converts to formal/polite language
💬 Casual	Converts to friendly, casual tone
📝 Summary	Summarizes text concisely
🌐 Translation	Translates to English
✔️ Grammar Fix	Auto-corrects misrecognitions and grammar

Local LLM Setup Examples

Use LM Studio or Ollama to utilize AI processing completely offline.

LM Studio

LM Studio Setup:

Launch LM Studio
Select the model to use
In the status bar, choose "Power User" or "Developer" from User / Power User / Developer
Navigate to the Developer tab (click the green "Developer" icon in the left menu)
If the "Status: Running" toggle at the top of the screen is ON, the server is running
When "Running", the API URL (e.g., http://localhost:1234) is displayed
To stop the server, click the toggle to switch to "Stopped"

RocketWhisper Setup:

Setting	Value
Provider	Local LLM
Base URL	`http://127.0.0.1:1234`
Model ID	Leave empty (automatically uses the currently loaded model)
API Key	Leave empty

Ollama

Ollama Setup:

Install and launch Ollama (the API server starts automatically in the background)
Download the desired model: ollama pull gemma3
Verify downloaded models: ollama list

RocketWhisper Setup:

Setting	Value
Provider	Local LLM
Base URL	`http://localhost:11434`
Model ID	Enter the name of a downloaded model (e.g., `gemma3`)
API Key	Leave empty

LM Studio vs. Ollama

Item	LM Studio	Ollama
Default Port	1234	11434
Model ID	Leave empty (auto-uses loaded model)	Required (specify downloaded model name)
GUI	Yes (desktop app)	CLI-based (can supplement with Open WebUI, etc.)

Benefits of Local LLM

Fully Offline: Both speech recognition and AI processing work without internet
Privacy Protection: Neither audio data nor text is sent to any external service
Free: No API costs

* The /v1 path is automatically appended

Processing Flow

AI processing follows this pipeline: Speech Recognition → Punctuation Insertion → Line Break Insertion → AI Processing → Character Conversion. If AI processing fails, the original text is used as-is.

13. AI Command Mode

Select text, press the dedicated hotkey, and give a voice instruction — the AI instantly processes the text. Supports translation, summarization, tone adjustment, and any other instruction you can think of.

Basic Usage

Select text in any application (Notepad, browser, Word, etc.)
Press the AI Command hotkey (default: Ctrl + Space)
The status changes to "Waiting for voice instruction..." — give your instruction by voice (e.g., "Translate to Japanese", "Summarize this", "Make it formal")
Press the AI Command hotkey again to stop recording
The AI processes the text, and the result appears in RocketWhisper's text area

Usage Examples

Selected Text	Voice Instruction	Result
Text in another language	"Translate to English"	Text translated to English
Long meeting notes	"Summarize this"	A concise summary
Casual memo	"Make it formal"	Business-appropriate text
Program code	"Add comments"	Code with comments added

Hotkey Configuration

Open the settings window
Select the "Hotkey" tab
Click the "AI Command Hotkey" input field
Press your desired key combination (default: Ctrl + Space)
Check "Enabled"

Prerequisites

To use AI Command Mode, you must configure an AI provider in the "AI Processing" tab of the settings. With a local LLM (LM Studio / Ollama), it works completely offline and free of charge.

Mutual Exclusion

AI Command is disabled during normal recording, and normal hotkeys are disabled during AI Command recording. This safety design prevents accidental operations.

14. Custom Instructions

Register your most-used AI tasks (translation, business rephrasing, summarization, etc.) to dedicated hotkeys. Just speak and the AI processing is applied automatically. Unlike AI Command Mode, you don't need to say "translate this" each time, making it ideal for repetitive tasks.

Setup

Configure an AI provider in the "AI Processing" tab of the settings (local LLM or cloud API)
Open the "Hotkeys" tab in the settings and find the "Custom Instructions" section
Choose from presets (Translate to English, Business Style, Summary, Grammar Fix) or click "Add" to create a new one
Assign a dedicated hotkey
Make sure "Enabled" is checked

Basic Usage

Press the hotkey assigned to your custom instruction
Speak into the microphone (any content in any language)
Press the same hotkey again to stop recording
Whisper recognizes the audio → the pre-configured AI processing is applied automatically → result is output

Built-in Presets

Preset	Icon	Action
Translate to English	🌐	Translates recognized text to English
Business Style	💼	Rephrases into formal/polite business tone
Summary	📝	Summarizes the text concisely
Grammar Fix	✔️	Corrects grammar and misrecognition errors

Adding Custom Instructions

Go to Settings → "Hotkeys" tab → "Custom Instructions" section → click "Add"
Enter a name (e.g., "Translate to French")
Enter a prompt (e.g., "Please translate the following text into French")
Assign a dedicated hotkey
Click "Save"

You can register up to 20 custom instructions.

Difference from AI Command Mode

AI Command Mode is a versatile tool where you select text and give voice instructions each time. Custom Instructions bind a pre-set prompt to a hotkey, so no voice instruction is needed — ideal for repeated tasks.

Prerequisites

To use Custom Instructions, you must configure an AI provider in the "AI Processing" tab of the settings. With a local LLM (LM Studio / Ollama), it works completely offline and free of charge.

Mutual Exclusion

Custom Instruction hotkeys are disabled during normal recording or AI Command recording. This safety design prevents accidental operations.

15. Recording Indicator

When performing hotkey recording while RocketWhisper is minimized to the system tray, a floating window appears to visually confirm the recording state.

Indicator Features

Real-time Waveform Display: Visualizes audio levels with 16 animated bars
Processing Mode Display: Shows the currently applied mode (Smart, Business, etc.) with icon and name
Non-Focus-Stealing: Does not steal focus from other applications
Draggable: Can be moved to any position; position is automatically saved

Default Position

Initially appears at the bottom center of the screen (above the taskbar). Once dragged to a new position, it will appear there on subsequent recordings.

When the Indicator Appears

The recording indicator only appears when hotkey recording is performed while RocketWhisper is minimized to the system tray. It does not appear when the main window is visible.

16. Voice Launcher

Automatically launches an application when a specific keyword is recognized. Simply say "Notepad" to open Notepad — control apps with just your voice.

Setup

Open the settings window and select the "Voice Launcher & Search" tab
Check "Enable Voice Launcher"
Click "Edit Launcher Settings..."
Click "Add" to register a keyword and executable file
Click "Save" in the settings window

Registration Fields

Field	Description	Example
Keyword	The trigger phrase for launching	Notepad, Calculator, Browser
Executable	Path to the application	C:\Windows\notepad.exe
Parameters	Launch arguments (optional)	--new-window
Description	Notes (optional)	Launch Notepad

Matching Behavior

Exact Match: Launches only when the recognition result exactly matches the keyword
Punctuation Ignored: Trailing punctuation marks (commas, periods, !, ?) are automatically stripped
Case Insensitive: English words are matched regardless of case

Registration Tips

Register the exact phrase you would naturally say as the keyword. For example, if you would say "open Notepad", register "open Notepad" as the keyword. Shorter keywords (like "Notepad") are easier to recognize.

Difference from Voice Commands

Voice commands ("new line", "new paragraph", etc.) are for text editing, while Voice Launcher is for launching applications. Both can be enabled simultaneously, but if the same keyword is registered for both, Voice Launcher takes priority.

17. Voice Search

Simply say "Search for..." during normal voice input to automatically open a Google search in your default browser.

Supported Phrases

Phrase	Example
Search for ___	"Search for Tokyo Tower"
Look up ___	"Look up weather forecast"
Google ___	"Google RocketWhisper"
What is ___	"What is machine learning"
Tell me about ___	"Tell me about Python"
___ meaning	"AI meaning"

Setup

Open the settings window
Select the "Voice Launcher & Search" tab
Check "Enable Voice Search"

How It Works

The flow is: Recording → Speech Recognition → Phrase Detection → Auto-open Google search in default browser. If the input doesn't match a search phrase, it proceeds through the normal text processing pipeline.

Difference from Voice Launcher

Voice Launcher "launches a specific app when an exact keyword match is detected", while Voice Search "opens a browser search when a search phrase pattern is matched". Both can be enabled simultaneously.

18. Tray Icon

Check the application's status via the system tray icon.

Icon States

State	Icon	Description
Standby	🎤 (blue)	Ready to record
Recording	🎤 (blue) + 🔴 (blinking)	Red dot blinks in the bottom-right corner
Processing	⚙️ (yellow, spinning)	Whisper is performing speech recognition

Tray Icon Actions

Left Click: Show the main window
Right Click: Show the context menu

19. Troubleshooting

Please check your network connection. Firewall or proxy settings may be causing the issue.

Verify your internet connection
Allow RocketWhisper through your firewall
Temporarily disable VPN

Use the large-v3-turbo model (high accuracy & fast, default)
Move the microphone closer to your mouth
Reduce ambient noise
Register custom terminology (* For even better accuracy with specialized terms, consider using the large-v3 model)

Verify that the hotkey is set to "Enabled" in settings
Check for conflicts with other applications
Try running as administrator

Verify that voice commands are "Enabled" in settings
Speak commands clearly
Pause briefly before and after the command

Verify that .NET 8.0 Desktop Runtime is installed
Confirm you are running Windows 11 64-bit
Check if antivirus software is blocking the application

Still Having Issues?

Please contact us through the Support page.