MediaCraft - Audio & Video Tool

Overview

MediaCraft is an all-in-one audio and video processing tool built with PyQt5 + FFmpeg. It provides a graphical interface to make complex tasks simple. It supports 15 core feature modules, from basic editing to advanced AI processing.

Version: 1.0.0 | Developer: Hainan Xiandao | Website: https://www.myzhenai.com.cn/

Download & Install

Baidu Netdisk - MediaCraft Extract code: 79uu

Quark - MediaCraft Extract code: WqYX

eCloud - MediaCraft Access code: zw01

Size tip: If you do not use the following advanced features, you can skip or delete the corresponding plugin folders to save space:
- RTVC (real-time voice cloning) - MediaCraft\Plugin\RTVC
- SoVITS (high-quality voice cloning) - MediaCraft\Plugin\SoVITS
- Whisper (AI subtitle recognition) - MediaCraft\Plugin\whisper

Keeping the core program is enough for most common tasks (transcode, merge, watermark, screenshot, split, extract audio/subtitles, etc.).

Installation: Extract and run. For AI/voice cloning, add the corresponding Plugin subfolders as needed.

Core Features

1. Tool Settings

FFmpeg path (auto-detect from PATH or set manually)
Whisper path (system or Plugin directory)
Tesseract-OCR path (for hard subtitle OCR)
Font management (system fonts + program fonts directory)
Audio device detection (system audio and microphone)

Centralized path configuration for all dependencies, each with auto-detection and a manual override. Audio devices are detected and listed automatically.

Use cases: System config, environment setup, component management
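
The FFmpeg lookup described above (manual path, program directory, or PATH) could be sketched as follows. This is a minimal illustration, not MediaCraft's actual code: the function name find_ffmpeg and the search order are assumptions.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(manual_path: Optional[str] = None,
                program_dir: Optional[str] = None) -> Optional[str]:
    """Return a usable FFmpeg path, or None if nothing is found.

    Assumed search order (hypothetical, not documented by MediaCraft):
    1. manually configured path, 2. program directory, 3. system PATH.
    """
    if manual_path and Path(manual_path).is_file():
        return str(Path(manual_path))
    if program_dir:
        for name in ("ffmpeg.exe", "ffmpeg"):
            candidate = Path(program_dir) / name
            if candidate.is_file():
                return str(candidate)
    # shutil.which searches the directories listed in PATH.
    return shutil.which("ffmpeg")
```

A return value of None corresponds to the "FFmpeg not found" case covered in the FAQ.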

2. Screen Recording

Full screen / region
System sound / microphone
Mouse cursor
High resolution
Multiple output formats

Records the full screen or a custom region, with optional system sound, microphone, and mouse cursor capture. FPS, encoding quality preset (ultrafast/fast/medium/slow), and bitrate (5000k-15000k or custom) are configurable. Recording can be paused, resumed, and stopped; files are saved to the desktop by default. Requires Screen Capturer Recorder and VB-CABLE for audio capture.

Use cases: Tutorials, game recording, demos, online teaching
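
To show how the FPS, preset, bitrate, and region settings map onto an FFmpeg command line, here is a video-only sketch. It uses FFmpeg's built-in gdigrab input on Windows, which is a substitute chosen for illustration; MediaCraft itself relies on Screen Capturer Recorder and VB-CABLE, and its real command line is not documented here. The function name build_record_cmd is hypothetical.

```python
from typing import List, Optional, Tuple

PRESETS = ("ultrafast", "fast", "medium", "slow")


def build_record_cmd(output: str,
                     fps: int = 30,
                     preset: str = "ultrafast",
                     bitrate: str = "5000k",
                     region: Optional[Tuple[int, int, int, int]] = None,
                     draw_mouse: bool = True,
                     ffmpeg: str = "ffmpeg") -> List[str]:
    """Build a video-only screen capture command (gdigrab, Windows)."""
    if preset not in PRESETS:
        raise ValueError(f"unsupported preset: {preset}")
    cmd = [ffmpeg, "-y", "-f", "gdigrab",
           "-framerate", str(fps),
           "-draw_mouse", "1" if draw_mouse else "0"]
    if region is not None:
        # region = (x, y, width, height); input options must precede -i.
        x, y, w, h = region
        cmd += ["-offset_x", str(x), "-offset_y", str(y),
                "-video_size", f"{w}x{h}"]
    cmd += ["-i", "desktop",
            "-c:v", "libx264", "-preset", preset,
            "-b:v", bitrate, "-pix_fmt", "yuv420p", output]
    return cmd
```

The returned list can be passed to subprocess.Popen; sending "q" to FFmpeg's standard input stops the recording cleanly.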

3. Video Watermark

Image and text watermarks
Custom font and color
Position and size
Transparency
Batch processing

Image (PNG/JPG) and text watermarks; font, size, color (HEX), outline, transparency; position; batch; drag-and-drop; overwrite option.

Use cases: Copyright, branding, identification
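
An image watermark with position and transparency can be expressed with FFmpeg's overlay filter. The sketch below is an illustration under assumptions: the position names, the 10-pixel margin, and the function name build_watermark_cmd are invented for this example and are not MediaCraft's settings.

```python
from typing import List

# Overlay expressions: W/H are the video size, w/h the watermark size.
POSITIONS = {
    "top-left": "10:10",
    "top-right": "W-w-10:10",
    "bottom-left": "10:H-h-10",
    "bottom-right": "W-w-10:H-h-10",
    "center": "(W-w)/2:(H-h)/2",
}


def build_watermark_cmd(video: str, image: str, output: str,
                        position: str = "bottom-right",
                        opacity: float = 0.5,
                        ffmpeg: str = "ffmpeg") -> List[str]:
    """Build an FFmpeg command that overlays a semi-transparent image."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    xy = POSITIONS[position]
    # format=rgba adds an alpha channel; colorchannelmixer scales it.
    graph = (f"[1:v]format=rgba,colorchannelmixer=aa={opacity:.2f}[wm];"
             f"[0:v][wm]overlay={xy}")
    return [ffmpeg, "-y", "-i", video, "-i", image,
            "-filter_complex", graph, "-c:a", "copy", output]
```

Batch processing then amounts to calling the builder once per input file and running each command in turn.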

4. Video Subtitles

Hard and soft subtitles
Custom font and color
Position
Multi-language
Batch

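Hard subtitles are burned into the picture (SRT input); soft subtitles are added as a separate track without re-encoding the video. Language, position, font, color, outline, and background are configurable. Same-name .srt/.ass/.ssa files are detected automatically. For soft subtitles in MP4/MKV you can set the track name, language code (ISO 639-1), and format (mov_text/subrip/ass).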

Use cases: Multilingual video, accessibility, teaching, promos
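
The same-name subtitle detection and the soft-subtitle muxing described above can be sketched like this. The helper names and the rule for choosing a codec are assumptions made for illustration; the sketch also assumes the input video has no existing subtitle stream, so the new track is subtitle stream 0, and it passes the language code through unchanged.

```python
from pathlib import Path
from typing import List, Optional

SUB_EXTS = (".srt", ".ass", ".ssa")


def find_same_name_subtitle(video: str) -> Optional[Path]:
    """Return a subtitle file next to the video with the same base name."""
    base = Path(video)
    for ext in SUB_EXTS:
        candidate = base.with_suffix(ext)
        if candidate.is_file():
            return candidate
    return None


def soft_sub_codec(output: str, subtitle: str) -> str:
    """Pick a subtitle codec: mov_text for MP4, else subrip or ass."""
    if Path(output).suffix.lower() == ".mp4":
        return "mov_text"
    return "subrip" if Path(subtitle).suffix.lower() == ".srt" else "ass"


def build_soft_sub_cmd(video: str, subtitle: str, output: str,
                       language: str = "en",
                       title: Optional[str] = None,
                       ffmpeg: str = "ffmpeg") -> List[str]:
    """Mux a subtitle track; video and audio are stream-copied."""
    cmd = [ffmpeg, "-y", "-i", video, "-i", subtitle,
           "-map", "0", "-map", "1",
           "-c", "copy", "-c:s", soft_sub_codec(output, subtitle),
           "-metadata:s:s:0", f"language={language}"]
    if title:
        cmd += ["-metadata:s:s:0", f"title={title}"]
    cmd.append(output)
    return cmd
```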

5. Video Merge

Batch video merge
Batch audio merge
Format compatibility check
Reorder

Merge multiple videos or audio files; auto format check; drag to reorder.

Use cases: Episode merge, album creation, long-form video
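
One common way to merge files with FFmpeg without re-encoding is the concat demuxer, which reads an ordered list file. The sketch below assumes that approach; whether MediaCraft uses it is not stated, and the function names are hypothetical. Stream copy only works when the inputs share codecs and parameters, which is what a format compatibility check would guard against.

```python
from typing import Iterable, List


def concat_list_text(paths: Iterable[str]) -> str:
    """Render the list file for FFmpeg's concat demuxer, in merge order."""
    lines = []
    for p in paths:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = str(p).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_concat_cmd(list_file: str, output: str,
                     ffmpeg: str = "ffmpeg") -> List[str]:
    """Concatenate the listed files with stream copy (no re-encode)."""
    return [ffmpeg, "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]
```

Reordering in the interface corresponds to changing the order of the paths before the list file is written.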

6. Video Transcode

Batch format conversion
Custom encoder
Quality
Hardware acceleration

Batch re-encode; custom video encoder (e.g. H.264/H.265), audio encoder, resolution, bitrate; hardware acceleration when available.

Use cases: Format conversion, compression, compatibility, batch

7. Extract Subtitles

Soft subtitle extraction
Hard subtitle OCR
Multi-format output
Batch

Extract soft subtitles directly; use Tesseract-OCR for hard subtitles. Output SRT/ASS/VTT; batch supported.

Use cases: Silent video subtitles, multilingual subs, analysis

8. Video Screenshot

Single / batch
Custom count
Random or by time
High quality

Single or batch (e.g. 3/6/9 frames); random or time-based; output PNG/JPG/BMP.

Use cases: Thumbnails, covers, preview, assets
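
Choosing capture times for N screenshots can be sketched as below. The even spacing rule (the midpoint of each of N equal slices) and the function names are assumptions for illustration; the program's own selection rule may differ.

```python
import random
from typing import List, Optional


def screenshot_times(duration: float, count: int,
                     randomize: bool = False,
                     seed: Optional[int] = None) -> List[float]:
    """Return `count` timestamps in seconds within [0, duration)."""
    if duration <= 0 or count < 1:
        raise ValueError("duration must be > 0 and count >= 1")
    if randomize:
        rng = random.Random(seed)
        # rng.random() is in [0, 1), so the end of the file is excluded.
        return sorted(rng.random() * duration for _ in range(count))
    # Midpoint of each of the `count` equal slices.
    return [duration * (i + 0.5) / count for i in range(count)]


def build_screenshot_cmd(video: str, timestamp: float, output: str,
                         ffmpeg: str = "ffmpeg") -> List[str]:
    """Grab a single frame; the output extension selects PNG/JPG/BMP."""
    return [ffmpeg, "-y", "-ss", f"{timestamp:.3f}", "-i", video,
            "-frames:v", "1", output]
```

For a 90-second clip and three frames, the evenly spaced times are 15, 45, and 75 seconds.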

9. Video Split

By time range
By segment count
Batch
Multi-format

Split by start/end time or into N equal segments; batch; multiple formats.

Use cases: Clips, short videos, segments
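
Splitting into N equal segments reduces to computing a start time and length for each piece. The sketch below is illustrative, with hypothetical function names; it uses stream copy, so actual cut points snap to keyframes and segment lengths may vary slightly.

```python
from typing import List, Tuple


def equal_segments(duration: float, count: int) -> List[Tuple[float, float]]:
    """Return (start, length) pairs, in seconds, for `count` equal parts."""
    if duration <= 0 or count < 1:
        raise ValueError("duration must be > 0 and count >= 1")
    length = duration / count
    return [(i * length, length) for i in range(count)]


def build_split_cmd(video: str, start: float, length: float, output: str,
                    ffmpeg: str = "ffmpeg") -> List[str]:
    """Cut one segment without re-encoding (cuts land on keyframes)."""
    return [ffmpeg, "-y", "-ss", f"{start:.3f}", "-i", video,
            "-t", f"{length:.3f}", "-c", "copy", output]
```

Splitting by time range is the same command with a user-supplied start and a length equal to end minus start.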

10. Images to Video

Multiple images to video
Duration per image
Transitions
Background music

Create video from images; duration, background music, transitions, encoder settings.

Use cases: Slideshows, product demos, albums

11. Add Music to Video

Background music
Volume
Fade in/out
Batch

Merge one video + one audio; batch add same audio to multiple videos; volume and fade; intros/outros.

Use cases: Intros/outros, BGM, audio replacement
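
Volume and fade settings correspond to FFmpeg's volume and afade audio filters. The following sketch shows one way to assemble them; the function names are hypothetical, and it replaces the video's original audio rather than mixing, which is an assumption about the intended behavior.

```python
from typing import List, Optional


def audio_filter(volume: float = 1.0, fade_in: float = 0.0,
                 fade_out: float = 0.0,
                 duration: Optional[float] = None) -> str:
    """Build an -af filter chain from volume and fade settings."""
    parts = [f"volume={volume:g}"]
    if fade_in > 0:
        parts.append(f"afade=t=in:st=0:d={fade_in:g}")
    if fade_out > 0:
        if duration is None:
            raise ValueError("fade_out needs the output duration")
        # The fade-out starts `fade_out` seconds before the end.
        start = max(duration - fade_out, 0.0)
        parts.append(f"afade=t=out:st={start:g}:d={fade_out:g}")
    return ",".join(parts)


def build_add_music_cmd(video: str, audio: str, output: str, af: str,
                        ffmpeg: str = "ffmpeg") -> List[str]:
    """Replace the audio track; video is copied, output ends with the shorter input."""
    return [ffmpeg, "-y", "-i", video, "-i", audio,
            "-map", "0:v:0", "-map", "1:a:0",
            "-c:v", "copy", "-af", af, "-shortest", output]
```

For a 30-second video with 2-second fades at 80% volume, the chain is volume=0.8,afade=t=in:st=0:d=2,afade=t=out:st=28:d=2.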

12. Extract Video / Audio

Extract audio
Video-only stream
Multi-format output
Batch

Demux video and audio; extract audio (e.g. MP3/WAV/FLAC); video-only; batch.

Use cases: Audio extraction, demux, assets

13. Media Metadata

Edit metadata
Title, artist, album
Batch
No re-encode

Batch edit title, artist, album, year, description; visible in file properties.

Use cases: Tagging, copyright, library management
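
Editing tags without re-encoding can be done with FFmpeg's -metadata option combined with stream copy. The sketch below is a hedged illustration; build_metadata_cmd is a hypothetical name, and which tag keys a given container actually stores depends on the format.

```python
from typing import Dict, List


def build_metadata_cmd(src: str, output: str, tags: Dict[str, str],
                       ffmpeg: str = "ffmpeg") -> List[str]:
    """Write container-level tags while copying all selected streams."""
    cmd = [ffmpeg, "-y", "-i", src, "-c", "copy"]
    for key, value in tags.items():
        # One -metadata option per tag, e.g. title=..., artist=..., album=...
        cmd += ["-metadata", f"{key}={value}"]
    cmd.append(output)
    return cmd
```

Because streams are copied rather than re-encoded, the operation is fast and does not change quality.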

14. Whisper Subtitle Generation

AI-generated
Multi-language
Multiple models
Translation
Batch

Uses Whisper to generate subtitles from speech. Models: tiny/base/small/medium/large. Supports multiple languages (Chinese, English, Japanese, Korean, etc.), automatic language detection, and translation. Output formats: SRT/ASS/VTT; output naming: [filename]_[lang].ext. The Whisper path is auto-detected (system or Plugin/whisper).

Use cases: Auto subtitles, translation, transcription, accessibility
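
The [filename]_[lang].ext naming rule and a matching Whisper invocation can be sketched as follows. The flags shown belong to the openai-whisper command-line tool; the function names, and the idea of renaming Whisper's output afterwards, are assumptions for illustration. The whisper CLI itself writes SRT and VTT but not ASS, so ASS output would need a separate conversion step not shown here.

```python
from pathlib import Path
from typing import List, Optional

MODELS = ("tiny", "base", "small", "medium", "large")


def subtitle_name(video: str, lang: str, ext: str = "srt") -> Path:
    """Apply the [filename]_[lang].ext naming rule next to the source file."""
    src = Path(video)
    return src.with_name(f"{src.stem}_{lang}.{ext.lstrip('.')}")


def build_whisper_cmd(media: str, model: str = "small",
                      language: Optional[str] = None,
                      translate: bool = False,
                      output_format: str = "srt",
                      output_dir: str = ".",
                      whisper: str = "whisper") -> List[str]:
    """Build an openai-whisper CLI call; omit language to auto-detect."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    cmd = [whisper, media, "--model", model,
           "--task", "translate" if translate else "transcribe",
           "--output_format", output_format,
           "--output_dir", output_dir]
    if language:
        cmd += ["--language", language]
    return cmd
```

For example, clip.mp4 transcribed in English would be saved as clip_en.srt under this rule.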

15. Voice Cloning (RTVC / SoVITS)

RTVC and SoVITS
Pitch and speed
Speaker detection
Batch

Built-in RTVC and SoVITS. RTVC: real-time voice conversion and TTS (English only); text + reference audio. SoVITS: high-quality cloning, multi-language; source + reference audio; auto speaker list from config. Auto-detect model and Python environment.

Use cases: Voice synthesis, dubbing, content creation

System Requirements

Required

1. FFmpeg - Core engine for audio/video

Download: https://ffmpeg.org/download.html

Add to system PATH or place in program directory

2. Python - For running from source / extensions

Python 3.9+ | Install PyQt5: pip install PyQt5

Optional

1. OpenAI Whisper - AI subtitles

pip install openai-whisper

2. Tesseract-OCR - Text recognition for hard subtitles

Download: Tesseract. The program auto-detects the install path, e.g. C:\Program Files\Tesseract-OCR\tesseract.exe

3. Voice cloning models - RTVC/SoVITS in Plugin directory (see Directory structure below)

Directory Structure

Program Directory

MediaCraft/
├── ffmpeg.exe                          # FFmpeg (optional)
├── whisper.exe                         # Whisper (optional)
├── tesseract.exe                       # Tesseract (optional)
├── fonts/                              # Fonts
│   └── *.ttf
├── Plugin/
│   ├── RTVC/                           # RTVC voice cloning
│   │   └── Real-Time-Voice-Cloning/
│   │       ├── pretrained_models/      # New structure (preferred)
│   │       │   ├── encoder/encoder.pt
│   │       │   ├── synthesizer/synthesizer.pt
│   │       │   └── vocoder/vocoder.pt
│   │       └── saved_models/default/   # Legacy
│   ├── SoVITS/
│   │   └── so-vits-svc/
│   │       ├── configs/config.json
│   │       ├── logs/44k/
│   │       ├── trained/
│   │       ├── pretrain/
│   │       ├── inference_main.py
│   │       ├── raw/
│   │       └── results/
│   └── whisper/
│       ├── whisper.exe
│       └── models/
├── img/
└── version.json

fonts Directory

custom_fonts/ or fonts/: Custom fonts (TTF, OTF, WOFF) for subtitles and watermarks. Auto-detected.

Plugin Directory

RTVC/: encoder.pt, synthesizer.pt, vocoder.pt; new (pretrained_models) or legacy (saved_models/default) structure.
SoVITS/: config.json, G_*.pth, D_*.pth, pretrain; speaker list from config; conda environment recommended.
whisper/: Executable and models (tiny/base/small/medium/large) for AI subtitles.

User Guide

First-time Setup

Install FFmpeg and add to PATH
Run the program and check component status in Tool Settings
Install Whisper and Tesseract if needed
Configure paths if required
Place voice cloning models in Plugin directory

Basic Workflow

Select a feature from the left panel
Add files (drag and drop supported)
Set parameters
Click Execute
Check the operation log for status

Batch & Advanced

Most features support batch processing, drag-and-drop, and reordering. AI subtitles (Whisper), voice cloning (RTVC/SoVITS), OCR subtitle extraction, and auto path detection are available. All operations are logged; log can be cleared and viewed in real time.

Notes

Paths: Avoid special characters in file paths
Format: Ensure input files are valid and not corrupted
Disk space: Ensure enough space for large files
Time: Complex tasks may take a while
Models: Voice cloning needs complete model files
Platform: Windows 10+ recommended

FAQ

Q: Program says FFmpeg not found?

A: Install FFmpeg and add it to system PATH, or place ffmpeg.exe in the program directory.

Q: SoVITS speaker dropdown is empty?

A: Check that config.json contains speaker info, e.g. "spk": {"Speaker1": 0, "Speaker2": 1}
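
To check what the speaker dropdown would be built from, you can read the speaker names out of config.json yourself. This is a minimal sketch that assumes the mapping sits at the top level under "spk" or "spk2id"; the function name load_speakers is hypothetical.

```python
import json
from typing import List


def load_speakers(config_text: str) -> List[str]:
    """Return speaker names from a so-vits-svc config, ordered by id."""
    config = json.loads(config_text)
    mapping = config.get("spk") or config.get("spk2id") or {}
    if not isinstance(mapping, dict):
        return []
    # Sort by the numeric id so the list order is stable.
    return sorted(mapping, key=mapping.get)
```

An empty result means the config has no usable speaker mapping, which matches the empty dropdown symptom.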

Q: Whisper model detection failed?

A: The program checks two locations: whisper on the system PATH, and Plugin/whisper/whisper.exe. Ensure the model files are in the models directory.

Q: RTVC model detection failed?

A: Place models under Plugin/RTVC/Real-Time-Voice-Cloning/ with encoder.pt, synthesizer.pt, vocoder.pt. Both pretrained_models/ and saved_models/default/ are supported; new structure is preferred.
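
The two supported layouts can be verified with a short script. This sketch prefers the new structure, as stated above; the function name is hypothetical, and the assumption that the legacy layout keeps the three .pt files directly inside saved_models/default/ should be checked against your installation.

```python
from pathlib import Path
from typing import Dict, Optional

PARTS = ("encoder", "synthesizer", "vocoder")


def find_rtvc_models(root: str) -> Optional[Dict[str, Path]]:
    """Locate the three RTVC model files; return None if incomplete."""
    base = Path(root)
    # New structure: pretrained_models/<part>/<part>.pt
    new = {p: base / "pretrained_models" / p / f"{p}.pt" for p in PARTS}
    if all(f.is_file() for f in new.values()):
        return new
    # Legacy structure (assumed flat): saved_models/default/<part>.pt
    legacy = {p: base / "saved_models" / "default" / f"{p}.pt" for p in PARTS}
    if all(f.is_file() for f in legacy.values()):
        return legacy
    return None
```

Call it with the path to Plugin/RTVC/Real-Time-Voice-Cloning; a None result means at least one of the three files is missing from both layouts.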

Q: SoVITS model detection failed?

A: Use Plugin/SoVITS/so-vits-svc/; check G_*.pth, D_*.pth in logs/44k or trained, and pretrain files; ensure configs/config.json exists with spk or spk2id.

Q: Does RTVC only support English?

A: Yes. Current RTVC accepts English text only. For other languages, use SoVITS.

Q: SoVITS error: "The name you entered is not in the speaker list!"

A: The selected speaker is not in the config. Choose a speaker from the dropdown that exactly matches a name in config.json.

Q: SoVITS missing modules in dedicated Python env?

A: The program prefers a conda env whose name contains "so-vits". In that env, either run pip install torch torchaudio soundfile librosa numpy scipy, or install from the project's requirements.txt.
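
To see which environment would match the "so-vits" rule, you can inspect the output of conda env list --json, which contains an "envs" array of environment paths. The sketch below is an assumption about how such a match could work (first match wins, case-insensitive); pick_sovits_env is a hypothetical name.

```python
import json
from typing import Optional


def pick_sovits_env(conda_json: str) -> Optional[str]:
    """Return the first conda env path whose name contains 'so-vits'."""
    envs = json.loads(conda_json).get("envs", [])
    for env in envs:
        # The env name is the last path component; handle both separators.
        name = env.replace("\\", "/").rstrip("/").split("/")[-1]
        if "so-vits" in name.lower():
            return env
    return None
```

The JSON text can be obtained with subprocess.run(["conda", "env", "list", "--json"], capture_output=True, text=True).stdout.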

Q: Voice cloning takes very long?

A: Normal. AI models need heavy computation; SoVITS is slower than RTVC but higher quality; CPU is much slower than GPU. Do not close the program; check the log for progress.

Q: Program freezes during batch processing?

A: Check file sizes and disk space; large files need more time and resources.

Support

Official site: https://www.myzhenai.com.cn/

Tech blog: https://jiayu.mybabya.com/

This document is updated with the program; please refer to the latest version.