Help - RocketWhisper for Linux | AI Speech Recognition & Transcription

📦 1. Installation

Running the AppImage

RocketWhisper is distributed as an AppImage. No installation is needed: download the file and run it.

# 1. Grant execute permission to the downloaded AppImage
chmod +x RocketWhisper-1.2.0-aarch64.AppImage

# 2. Run it
./RocketWhisper-1.2.0-aarch64.AppImage

Environments Without FUSE

AppImages rely on FUSE to mount themselves. If FUSE is not installed on your system, either install it or extract the AppImage and run the extracted copy:

# Option 1: install FUSE
sudo apt install fuse libfuse2

# Option 2: extract the AppImage and run it without FUSE
./RocketWhisper-1.2.0-aarch64.AppImage --appimage-extract
./squashfs-root/AppRun

📋 2. Required Packages

RocketWhisper relies on the following packages. ffmpeg is optional; the others are required.

Ubuntu / Debian / Linux Mint / Pop!_OS

sudo apt install pulseaudio-utils xdotool xclip ffmpeg

Fedora

sudo dnf install pulseaudio-utils xdotool xclip ffmpeg

Arch Linux

sudo pacman -S pulseaudio xdotool xclip ffmpeg

Package Details

Package	Purpose	Status
`pulseaudio-utils`	Microphone recording (parec command)	Required
`xdotool`	Keyboard automation / window detection	Required
`xclip`	Clipboard access	Required
`ffmpeg`	Audio file conversion	Optional

Verify installation: which parec xdotool xclip
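
The check above can be wrapped in a small helper that names whichever tool is missing. This is a minimal sketch; the function name `missing_commands` is illustrative and not part of RocketWhisper.

```shell
#!/bin/sh
# Print the name of every command from the argument list that is not on PATH.
# Prints nothing when all commands are available.
# (missing_commands is a hypothetical helper name used for illustration.)
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
        fi
    done
}

# Required tools from the table above; ffmpeg is optional and not checked here.
missing=$(missing_commands parec xdotool xclip)
if [ -n "$missing" ]; then
    echo "Missing required tools:" $missing
else
    echo "All required tools found."
fi
```

If a tool is reported missing, install the package listed for it in the table above.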

🚀 3. First Launch

On first launch, RocketWhisper asks you to select a Whisper model and then downloads it automatically.

Run the AppImage
The initial setup screen appears
Select a model (recommended: large-v3-turbo)
Wait for the model download to complete
Once download finishes, the main window appears

Models are saved to ~/.local/share/RocketWhisper/Models/.
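
To see which models have been downloaded and how much disk space they use, you can list that directory. The sketch below assumes nothing about the model file names; `list_models` is an illustrative helper, not a RocketWhisper command.

```shell
#!/bin/sh
# List the files in a model directory with human-readable sizes.
# Defaults to the path given above; pass another directory to override.
# (list_models is a hypothetical helper name used for illustration.)
list_models() {
    dir=${1:-"$HOME/.local/share/RocketWhisper/Models"}
    if [ ! -d "$dir" ]; then
        echo "No model directory at $dir"
        return 1
    fi
    # du -h prints one "size<TAB>path" line per file
    find "$dir" -type f -exec du -h {} +
}

list_models || true
```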

⌨️ 4. Hotkeys

Default Hotkeys

Function	Hotkey	Description
Start/Stop Recording	`F8`	Press to start recording. Press again to stop and transcribe.
Cancel	`Escape`	Cancel recording
AI Command	`Ctrl + Shift + Space`	AI processing on selected text

Changing Hotkeys

Go to Settings (gear icon) → "Hotkeys" tab to change hotkeys.

Click the text box and press the desired key combination to set it.

🧠 5. Whisper Models

Model	Size	Accuracy	Speed	Recommended RAM
small	466MB	Medium	Normal	8GB
medium	1.5GB	High	Somewhat slow	8GB
large-v3-turbo	1.6GB	High	Fast	8GB
large-v3	2.9GB	Highest	Slow	16GB

Recommended: large-v3-turbo offers the best balance of accuracy and speed for most environments.
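
To compare your machine against the Recommended RAM column, you can read the total memory from /proc/meminfo. This is a rough sketch: the helper names are illustrative, and rounding the total up to the next whole GB is an assumption made because the kernel reports slightly less than the installed memory.

```shell
#!/bin/sh
# Recommended RAM in GB for each model, copied from the table above.
recommended_ram_gb() {
    case "$1" in
        small|medium|large-v3-turbo) echo 8 ;;
        large-v3) echo 16 ;;
        *) return 1 ;;
    esac
}

# Total system memory in GB, rounded up. MemTotal is reported in kB.
total_ram_gb() {
    awk '/^MemTotal:/ { print int(($2 + 1048575) / 1048576) }' /proc/meminfo
}

# Succeeds when the given amount of RAM (in GB) meets the recommendation
# for the given model; returns 2 for an unknown model name.
ram_is_enough() {
    need=$(recommended_ram_gb "$1") || return 2
    [ "$2" -ge "$need" ]
}

if [ -r /proc/meminfo ]; then
    ram=$(total_ram_gb)
    for model in large-v3-turbo large-v3; do
        if ram_is_enough "$model" "$ram"; then
            echo "$model: OK with ${ram} GB"
        else
            echo "$model: below the recommended RAM (${ram} GB available)"
        fi
    done
fi
```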

📁 6. Batch Processing (Video Support)

Transcribe multiple audio and video files at once. For video files, the audio track is extracted automatically with FFmpeg.

Supported Formats

Audio input: WAV, MP3, FLAC, OGG, M4A, WMA
Video input: MP4, MKV, AVI, MOV, WebM, WMV, FLV
Text output: TXT, SRT subtitles, VTT subtitles

How to Use

Click the "Batch Processing" button in the main window
Add files (drag and drop supported)
Select output format (Text / SRT subtitles / VTT subtitles)
Click "Start Processing"

Tip: FFmpeg is required for video file transcription. Install it with sudo apt install ffmpeg.
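
If you want to check files before adding them, or extract the audio yourself, the following sketch may help. `media_kind` classifies a file by extension using the format lists above. `extract_audio` is a manual FFmpeg equivalent; the 16 kHz mono 16-bit PCM settings are an assumption (a common choice for Whisper input), not a documented description of what RocketWhisper does internally.

```shell
#!/bin/sh
# Classify a file as "audio", "video", or "unsupported" by extension,
# using the supported-format lists above. Matching is case-insensitive.
# (media_kind and extract_audio are hypothetical helper names.)
media_kind() {
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        wav|mp3|flac|ogg|m4a|wma) echo audio ;;
        mp4|mkv|avi|mov|webm|wmv|flv) echo video ;;
        *) echo unsupported ;;
    esac
}

# Extract the audio track of a video ($1) to a WAV file ($2).
# Assumed parameters: no video (-vn), mono, 16 kHz, 16-bit PCM.
extract_audio() {
    ffmpeg -nostdin -i "$1" -vn -ac 1 -ar 16000 -c:a pcm_s16le "$2"
}
```

For example, `media_kind talk.MP4` prints `video`, and `extract_audio talk.mp4 talk.wav` writes the audio track to talk.wav.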

✨ 7. Custom Instructions

Process recognized text with any AI prompt you define. Set up custom instructions for meeting minutes, translation, summarization, and more.

How to Use

Configure an AI provider in Settings → "AI Processing" tab
Enter your custom prompt in the "Custom Instructions" field (e.g., "Format as meeting minutes")
Press the Custom Instructions hotkey to start recording. When you stop, the recognized text is processed by the AI provider using your prompt.

Use Cases

"Translate Japanese to English" → Output speech as translated text
"Format as meeting minutes" → Structure speech into meeting notes
"Summarize in bullet points" → Condense long speech into key points
"Convert to formal language" → Transform casual speech into business writing

Tip: Use a local LLM (Ollama, etc.) for completely offline and free custom instructions.

📜 8. Recognition History

All recognition results are automatically saved and can be searched, copied, and reused at any time.

Features

Auto-save: Recognition results are automatically saved to history
Search: Search past recognition results by keyword
Copy: Copy text from history for reuse
Timestamps: See when each recognition was made

How to Use

Click the "History" button in the main window to open the recognition history window.

🤖 9. AI Command Mode

Give voice instructions to perform AI processing on selected text.

How to Use

Select text in any application
Press the AI Command hotkey (Ctrl + Shift + Space)
Speak your instruction (e.g., "Translate to Japanese", "Summarize this")
Press the hotkey again to execute
Results are displayed in the RocketWhisper window

Setting Up an AI Provider

To use AI Command Mode, you need to configure an AI provider in Settings:

OpenAI (GPT-4, GPT-3.5)
Anthropic (Claude)
Google Gemini
Groq (LLaMA)
OpenAI-compatible API (local LLMs, etc.)

AI Command Mode requires an API key from an AI provider. Configure it in Settings → "AI Processing" tab.

🔍 10. Voice Search

RocketWhisper automatically opens a browser search when it recognizes one of the following phrases.

Supported Phrases

"Search for ..."
"Look up ..."
"Google ..."
"What is ...?"
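
To illustrate how phrases like these can be turned into a search query, here is a small sketch. It is not RocketWhisper's actual matching logic: it assumes the trigger words appear at the start of the phrase with the capitalization shown, and that "What is ...?" is searched as the whole question.

```shell
#!/bin/sh
# Extract the search query from a recognized phrase.
# Prints the query and returns 0 on a match; returns 1 otherwise.
# (search_query is a hypothetical helper; the matching rules are assumptions.)
search_query() {
    case "$1" in
        "Search for "*) echo "${1#"Search for "}" ;;
        "Look up "*)    echo "${1#"Look up "}" ;;
        "Google "*)     echo "${1#"Google "}" ;;
        "What is "*)    echo "$1" ;;
        *) return 1 ;;
    esac
}
```

With these rules, "Search for rust lifetimes" yields the query `rust lifetimes`, while a phrase such as "Open terminal" does not match.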

Settings

Enable or disable this feature in Settings → "Voice Launcher & Search" tab.

🚀 11. Voice Launcher

Launch applications by speaking specific keywords.

Setup

Go to Settings → "Voice Launcher & Search" tab
Click the "Add" button
Enter a keyword (e.g., "open terminal")
Enter the executable path (e.g., /usr/bin/gnome-terminal)
Click Save

You can also use the "Browse..." button to select a file.
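
Before saving an entry, it can help to confirm that the path exists and is executable. The helpers below are an illustrative sketch and not part of RocketWhisper.

```shell
#!/bin/sh
# Succeeds when the path points to a regular file the current user can execute.
# (is_launchable and resolve_path are hypothetical helper names.)
is_launchable() {
    [ -f "$1" ] && [ -x "$1" ]
}

# Resolve a command name to its full path, for use in the executable path field.
# Note: for shell builtins, command -v prints only the name, not a path.
resolve_path() {
    command -v "$1"
}
```

For example, `resolve_path gnome-terminal` prints the full path if the program is installed, and `is_launchable /usr/bin/gnome-terminal` succeeds only if that file can be executed.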

🎯 12. Per-App Processing Modes

Automatically apply different settings for each application.

Preset Modes

Smart: Auto punctuation + auto correction
Simple: Output recognition results as-is
Command: Voice commands enabled

App Mapping

Go to Settings → "Per-App Processing" tab
Click "Get Current App" (have the target app in the foreground)
Select the mode to use
Click Save

⚠️ 13. Wayland Environment

Some features are limited under Wayland.

Limitations

Global hotkeys: require ydotool, which may need root privileges
Auto paste: requires wl-clipboard
AI Command Mode: Affected by clipboard restrictions
Per-app processing: Limited window detection

Additional Packages

sudo apt install ydotool wl-clipboard

For full functionality, select an "Xorg" or "X11" session at login.

Checking Your Session

echo $XDG_SESSION_TYPE
# "x11" means X11 environment
# "wayland" means Wayland environment
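
The same check can be scripted so that it prints what to expect for the current session. This is a minimal sketch; `describe_session` is an illustrative name and the messages only summarize the limitations listed above.

```shell
#!/bin/sh
# Summarize what to expect from RocketWhisper for a given session type.
# (describe_session is a hypothetical helper name used for illustration.)
describe_session() {
    case "$1" in
        x11) echo "X11: full functionality" ;;
        wayland) echo "Wayland: limited; install ydotool and wl-clipboard" ;;
        *) echo "Unknown session type: ${1:-unset}" ;;
    esac
}

describe_session "${XDG_SESSION_TYPE:-}"
```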

🎮 14. CUDA/GPU Setup

CUDA acceleration is automatically enabled on systems with NVIDIA GPUs.

Supported GPUs

NVIDIA DGX Spark (Blackwell) - Optimized
Jetson AGX Orin (Ampere)
Jetson Orin NX/Nano (Ampere)

Verification

# Check if NVIDIA driver is recognized
nvidia-smi

# Check CUDA version
nvcc --version

CUDA 12.0 or later is required. If no GPU is available, the app automatically falls back to CPU.
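
The version requirement can be checked automatically by parsing the nvcc output. The sketch below assumes the output contains a line of the form "... release 12.4, V12.4.131", which is the usual layout; the helper names are illustrative.

```shell
#!/bin/sh
# Read `nvcc --version` output from stdin and print the release number (e.g. 12.4).
# Assumption: the output has a line containing "release <major>.<minor>".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeeds when the major version of the given release is 12 or higher.
cuda_is_supported() {
    major=${1%%.*}
    [ -n "$major" ] && [ "$major" -ge 12 ]
}

if command -v nvcc >/dev/null 2>&1; then
    release=$(nvcc --version | cuda_release)
    if cuda_is_supported "$release"; then
        echo "CUDA $release meets the 12.0 requirement"
    else
        echo "CUDA release '$release' does not meet the 12.0 requirement"
    fi
else
    echo "nvcc not found; the CUDA toolkit may not be installed"
fi
```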

🔧 15. Troubleshooting

Microphone Not Detected

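# List PulseAudio sources (input devices)
pactl list sources short

# Recording test: captures about 5 seconds (160000 bytes at 16 kHz, mono, 16-bit).
# Replace 0 with the index or name of a source from the list above.
parec --device=0 --rate=16000 --channels=1 --format=s16le | head -c 160000 > test.raw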
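
To confirm that the test actually captured audio, you can check the length of test.raw and play it back. This is a sketch with illustrative helper names; it assumes the capture used the format shown above (16 kHz, mono, 16-bit), which works out to 32000 bytes per second.

```shell
#!/bin/sh
# Duration in whole seconds of a raw capture made with the settings above:
# 16000 samples/s x 1 channel x 2 bytes per sample = 32000 bytes per second.
# (raw_duration_seconds and play_raw are hypothetical helper names.)
raw_duration_seconds() {
    bytes=$(wc -c < "$1")
    echo $((bytes / 32000))
}

# Play the capture back through PulseAudio using the same format parameters.
play_raw() {
    paplay --raw --rate=16000 --channels=1 --format=s16le "$1"
}
```

A complete capture of 160000 bytes gives `raw_duration_seconds test.raw` a result of 5. A much smaller file suggests the source delivered no data.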

Hotkey Not Responding

# Test xdotool
xdotool getactivewindow

# Check if running in X11 session
echo $XDG_SESSION_TYPE

AppImage Won't Start

# Install FUSE
sudo apt install fuse libfuse2

# Or extract and run
./RocketWhisper-*.AppImage --appimage-extract
./squashfs-root/AppRun

Garbled Characters

# Install CJK fonts
sudo apt install fonts-noto-cjk

Configuration File Locations

~/.config/RocketWhisper/
├── settings.json          # App settings
├── modes.json             # Processing mode settings
├── mappings.json          # Per-app mappings
├── voice_launcher.json    # Voice launcher settings
└── correction_rules.json  # Correction rules
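
When troubleshooting, it can be useful to see which of these files are present. The sketch below only reports missing files; whether RocketWhisper creates each file at first launch or only when the matching feature is used is not documented here, so a missing file is not necessarily an error. `missing_config_files` is an illustrative name.

```shell
#!/bin/sh
# Report which of the configuration files listed above are missing from a directory.
# Defaults to ~/.config/RocketWhisper; pass another directory to override.
# (missing_config_files is a hypothetical helper name used for illustration.)
missing_config_files() {
    dir=${1:-"$HOME/.config/RocketWhisper"}
    for name in settings.json modes.json mappings.json \
                voice_launcher.json correction_rules.json; do
        [ -f "$dir/$name" ] || echo "$name"
    done
}

missing_config_files
```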
