MediaCraft - Audio & Video Tool

Overview

MediaCraft is an all-in-one audio and video processing tool built with PyQt5 and FFmpeg. Its graphical interface wraps common FFmpeg tasks so they can be run without the command line. It has 15 core feature modules, ranging from basic editing to AI subtitle generation and voice cloning.

Version: 1.0.0 | Developer: Hainan Xiandao | Website: https://www.myzhenai.com.cn/

Download & Install

Baidu Netdisk - MediaCraft Extract code: 79uu
Quark - MediaCraft Extract code: WqYX
eCloud - MediaCraft Access code: zw01
Size tip: If you do not use the following advanced features, you can skip or delete the corresponding plugin folders to save space:
- RTVC (real-time voice cloning) - MediaCraft\Plugin\RTVC
- SoVITS (high-quality voice cloning) - MediaCraft\Plugin\SoVITS
- Whisper (AI subtitle recognition) - MediaCraft\Plugin\whisper
Keeping the core program is enough for most common tasks (transcode, merge, watermark, screenshot, split, extract audio/subtitles, etc.).

Installation: Extract the archive and run the program. For AI subtitles or voice cloning, add the corresponding Plugin subfolders as needed.

Core Features

1. Tool Settings

Centralized path configuration for all dependencies, with automatic detection and manual override. Audio devices are detected and listed automatically.

Use cases: System config, environment setup, component management

2. Screen Recording

Records the full screen or a custom region, optionally with system sound, microphone input, and the mouse cursor. Configurable FPS, encoding preset (ultrafast/fast/medium/slow), and bitrate (5000k-15000k or custom). Supports pause, resume, and stop; recordings are saved to the desktop by default. Requires Screen Capturer Recorder and VB-CABLE for audio.

Use cases: Tutorials, game recording, demos, online teaching

3. Video Watermark

Image (PNG/JPG) and text watermarks; font, size, color (HEX), outline, transparency; position; batch; drag-and-drop; overwrite option.

Use cases: Copyright, branding, identification
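As an illustration of how an image watermark with position and transparency might map onto FFmpeg, here is a minimal sketch. The position names, the `margin` parameter, and the function name are hypothetical and not taken from MediaCraft; only the FFmpeg `overlay`, `format`, and `colorchannelmixer` filters are real. The function only builds the argument list and does not run FFmpeg.

```python
# Overlay coordinates per position. W/H are the video size, w/h the watermark size.
# The position names are assumptions for this sketch.
POSITIONS = {
    "top-left": "{m}:{m}",
    "top-right": "W-w-{m}:{m}",
    "bottom-left": "{m}:H-h-{m}",
    "bottom-right": "W-w-{m}:H-h-{m}",
    "center": "(W-w)/2:(H-h)/2",
}

def image_watermark_cmd(video, image, output,
                        position="bottom-right", margin=10, opacity=1.0):
    """Build an FFmpeg argument list that overlays an image on a video."""
    xy = POSITIONS[position].format(m=margin)
    # Scale the watermark's alpha channel, then overlay it on the main video.
    filt = (f"[1:v]format=rgba,colorchannelmixer=aa={opacity}[wm];"
            f"[0:v][wm]overlay={xy}")
    return ["ffmpeg", "-y", "-i", str(video), "-i", str(image),
            "-filter_complex", filt, "-c:a", "copy", str(output)]

if __name__ == "__main__":
    print(image_watermark_cmd("in.mp4", "logo.png", "out.mp4", opacity=0.6))
```

The resulting list can be passed to `subprocess.run` once FFmpeg is available on the machine.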

4. Video Subtitles

Hard subtitles (burned-in, SRT); soft subtitles (external track, no re-encode). Languages, position, font, color, outline, background. Auto-detect same-name .srt/.ass/.ssa; track name, language code (ISO 639-1), format (mov_text/subrip/ass) for MP4/MKV.

Use cases: Multilingual video, accessibility, teaching, promos
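The same-name subtitle detection and the soft-subtitle mux described above can be sketched as follows. This is an assumption-laden example, not MediaCraft's code: the function names and the container-to-codec mapping are hypothetical, and the command is only built, not executed. The language code is passed through unchanged.

```python
from pathlib import Path

# Assumed mapping: MP4 needs mov_text; MKV accepts subrip (or ass).
CODEC_BY_CONTAINER = {".mp4": "mov_text", ".mkv": "subrip"}
SIDECAR_EXTS = (".srt", ".ass", ".ssa")

def find_sidecar(video):
    """Return the first same-name subtitle file next to the video, or None."""
    video = Path(video)
    for ext in SIDECAR_EXTS:
        candidate = video.with_suffix(ext)
        if candidate.is_file():
            return candidate
    return None

def soft_subtitle_cmd(video, subs, output, lang="en", title=None):
    """Build an FFmpeg argument list that adds subs as a subtitle track.

    Video and audio are stream-copied, so nothing is re-encoded.
    """
    codec = CODEC_BY_CONTAINER[Path(output).suffix.lower()]
    cmd = ["ffmpeg", "-y", "-i", str(video), "-i", str(subs),
           "-map", "0:v", "-map", "0:a?", "-map", "1:0",
           "-c", "copy", "-c:s", codec,
           "-metadata:s:s:0", f"language={lang}"]
    if title:
        cmd += ["-metadata:s:s:0", f"title={title}"]
    return cmd + [str(output)]

if __name__ == "__main__":
    print(soft_subtitle_cmd("talk.mp4", "talk.srt", "talk_sub.mp4", title="English"))
```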

5. Video Merge

Merge multiple videos or audio files; auto format check; drag to reorder.

Use cases: Episode merge, album creation, long-form video
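One common way to merge files with FFmpeg is the concat demuxer, which reads a text file listing the inputs in order. The sketch below assumes that approach; whether MediaCraft uses it internally is not stated here, and the function names are hypothetical. Stream copy only works when all inputs share the same codecs and parameters, which is presumably what the format check guards against.

```python
def concat_list_text(paths):
    """Return the contents of a concat-demuxer list file for the given paths."""
    lines = []
    for p in paths:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(p).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"

def concat_cmd(list_file, output):
    """Build an FFmpeg argument list that joins the listed files without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(output)]

if __name__ == "__main__":
    print(concat_list_text(["ep01.mp4", "ep02.mp4"]), end="")
    print(concat_cmd("list.txt", "season.mp4"))
```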

6. Video Transcode

Batch re-encode; custom video encoder (e.g. H.264/H.265), audio encoder, resolution, bitrate; hardware acceleration when available.

Use cases: Format conversion, compression, compatibility, batch

7. Extract Subtitles

Extract soft subtitles directly; use Tesseract-OCR for hard subtitles. Output SRT/ASS/VTT; batch supported.

Use cases: Silent video subtitles, multilingual subs, analysis

8. Video Screenshot

Single or batch (e.g. 3/6/9 frames); random or time-based; output PNG/JPG/BMP.

Use cases: Thumbnails, covers, preview, assets
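To make the two frame-selection modes concrete, here is a small sketch of how time-based and random timestamps could be computed and turned into single-frame FFmpeg commands. The spacing rule (equal gaps that avoid the very first and last frame) and all function names are assumptions for illustration.

```python
import random

def evenly_spaced(duration, count):
    """Return count timestamps (seconds) spread evenly inside the video."""
    step = duration / (count + 1)
    return [step * (i + 1) for i in range(count)]

def random_times(duration, count, seed=None):
    """Return count random timestamps (seconds), sorted ascending."""
    rng = random.Random(seed)
    return sorted(rng.uniform(0, duration) for _ in range(count))

def screenshot_cmd(video, t, output):
    """Build an FFmpeg argument list that grabs one frame at time t."""
    return ["ffmpeg", "-y", "-ss", f"{t:.3f}", "-i", str(video),
            "-frames:v", "1", str(output)]

if __name__ == "__main__":
    for i, t in enumerate(evenly_spaced(100, 3), start=1):
        print(screenshot_cmd("movie.mp4", t, f"movie_{i}.png"))
```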

9. Video Split

Split by start/end time or into N equal segments; batch; multiple formats.

Use cases: Clips, short videos, segments
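Splitting into N equal segments comes down to simple arithmetic on the total duration. The sketch below shows one way to do it; the function names are hypothetical. With stream copy, FFmpeg cuts at keyframes, so actual segment boundaries may differ slightly from the computed times.

```python
def equal_segments(duration, n):
    """Return n (start, length) pairs in seconds covering the whole duration."""
    length = duration / n
    return [(i * length, length) for i in range(n)]

def split_cmd(video, start, length, output):
    """Build an FFmpeg argument list that copies one segment without re-encoding."""
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", str(video),
            "-t", f"{length:.3f}", "-c", "copy", str(output)]

if __name__ == "__main__":
    for i, (start, length) in enumerate(equal_segments(90, 3), start=1):
        print(split_cmd("lecture.mp4", start, length, f"lecture_part{i}.mp4"))
```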

10. Images to Video

Create video from images; duration, background music, transitions, encoder settings.

Use cases: Slideshows, product demos, albums

11. Add Music to Video

Merge one video + one audio; batch add same audio to multiple videos; volume and fade; intros/outros.

Use cases: Intros/outros, BGM, audio replacement

12. Extract Video / Audio

Demux video and audio; extract audio (e.g. MP3/WAV/FLAC); video-only; batch.

Use cases: Audio extraction, demux, assets
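For reference, audio extraction and video-only output could be expressed as FFmpeg commands along these lines. The extension-to-encoder table and function names are assumptions for this sketch; the encoders themselves (`libmp3lame`, `pcm_s16le`, `flac`) are standard FFmpeg encoders.

```python
from pathlib import Path

# Assumed encoder choice per output extension.
AUDIO_CODECS = {".mp3": "libmp3lame", ".wav": "pcm_s16le", ".flac": "flac"}

def extract_audio_cmd(src, dst):
    """Build an FFmpeg argument list that drops video (-vn) and encodes the audio."""
    codec = AUDIO_CODECS[Path(dst).suffix.lower()]
    return ["ffmpeg", "-y", "-i", str(src), "-vn", "-c:a", codec, str(dst)]

def video_only_cmd(src, dst):
    """Build an FFmpeg argument list that drops audio (-an) and copies the video."""
    return ["ffmpeg", "-y", "-i", str(src), "-an", "-c:v", "copy", str(dst)]

if __name__ == "__main__":
    print(extract_audio_cmd("clip.mp4", "clip.mp3"))
    print(video_only_cmd("clip.mp4", "clip_silent.mp4"))
```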

13. Media Metadata

Batch edit title, artist, album, year, description; visible in file properties.

Use cases: Tagging, copyright, library management
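Metadata edits of this kind can be done with FFmpeg's `-metadata` option while stream-copying the media. The sketch below is an assumption about how the listed fields might map to FFmpeg keys; in particular, mapping "year" to the `date` key is a guess that suits many containers but is not confirmed for MediaCraft.

```python
# Assumed mapping from the fields named above to FFmpeg metadata keys.
FIELD_TO_KEY = {
    "title": "title",
    "artist": "artist",
    "album": "album",
    "year": "date",
    "description": "description",
}

def metadata_cmd(src, dst, **fields):
    """Build an FFmpeg argument list that rewrites tags without re-encoding."""
    cmd = ["ffmpeg", "-y", "-i", str(src), "-map", "0", "-c", "copy"]
    for field, value in fields.items():
        cmd += ["-metadata", f"{FIELD_TO_KEY[field]}={value}"]
    return cmd + [str(dst)]

if __name__ == "__main__":
    print(metadata_cmd("song.mp3", "song_tagged.mp3", title="Demo", year=2025))
```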

14. Whisper Subtitle Generation

Uses Whisper to generate subtitles from speech. Models: tiny/base/small/medium/large. Multi-language (Chinese, English, Japanese, Korean, etc.); auto-detect language; translate; output SRT/ASS/VTT; naming: [filename]_[lang].ext. Auto-detects the Whisper path (system or Plugin/whisper).

Use cases: Auto subtitles, translation, transcription, accessibility
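The [filename]_[lang].ext naming rule and a typical Whisper invocation can be sketched as below. This assumes the `whisper` command-line tool installed by openai-whisper; a bundled whisper.exe may accept different flags. The function names are hypothetical, and the command is only built, not run.

```python
from pathlib import Path

def subtitle_path(media, lang, ext="srt"):
    """Return the output path following the [filename]_[lang].ext rule."""
    media = Path(media)
    return media.with_name(f"{media.stem}_{lang}.{ext}")

def whisper_cmd(media, model="small", language=None, translate=False,
                fmt="srt", out_dir="."):
    """Build an argument list for the openai-whisper CLI (assumed to be on PATH)."""
    cmd = ["whisper", str(media), "--model", model,
           "--output_format", fmt, "--output_dir", str(out_dir)]
    if language:
        # Omit --language to let Whisper detect the spoken language.
        cmd += ["--language", language]
    if translate:
        # The translate task produces English text.
        cmd += ["--task", "translate"]
    return cmd

if __name__ == "__main__":
    print(subtitle_path("clips/talk.mp4", "en"))
    print(whisper_cmd("clips/talk.mp4", model="base", language="Chinese"))
```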

15. Voice Cloning (RTVC / SoVITS)

Built-in RTVC and SoVITS. RTVC: real-time voice conversion and TTS (English only); text + reference audio. SoVITS: high-quality cloning, multi-language; source + reference audio; auto speaker list from config. Auto-detect model and Python environment.

Use cases: Voice synthesis, dubbing, content creation

System Requirements

Required

1. FFmpeg - Core engine for audio/video

Download: https://ffmpeg.org/download.html

Add to system PATH or place in program directory

2. Python - For running from source / extensions

Python 3.9+ | Install PyQt5: pip install PyQt5

Optional

1. OpenAI Whisper - AI subtitles

pip install openai-whisper

2. Tesseract-OCR - Text recognition for hard subtitles

Download and install Tesseract-OCR. The program auto-detects common install locations, e.g. C:\Program Files\Tesseract-OCR\tesseract.exe

3. Voice cloning models - RTVC/SoVITS in Plugin directory (see Directory structure below)

Directory Structure

Program Directory

MediaCraft/
├── ffmpeg.exe                          # FFmpeg (optional)
├── whisper.exe                         # Whisper (optional)
├── tesseract.exe                       # Tesseract (optional)
├── fonts/                              # Fonts
│   └── *.ttf
├── Plugin/
│   ├── RTVC/                           # RTVC voice cloning
│   │   └── Real-Time-Voice-Cloning/
│   │       ├── pretrained_models/      # New structure (preferred)
│   │       │   ├── encoder/encoder.pt
│   │       │   ├── synthesizer/synthesizer.pt
│   │       │   └── vocoder/vocoder.pt
│   │       └── saved_models/default/   # Legacy
│   ├── SoVITS/
│   │   └── so-vits-svc/
│   │       ├── configs/config.json
│   │       ├── logs/44k/
│   │       ├── trained/
│   │       ├── pretrain/
│   │       ├── inference_main.py
│   │       ├── raw/
│   │       └── results/
│   └── whisper/
│       ├── whisper.exe
│       └── models/
├── img/
└── version.json

fonts Directory

Plugin Directory

User Guide

First-time Setup

  1. Install FFmpeg and add to PATH
  2. Run the program and check component status in Tool Settings
  3. Install Whisper and Tesseract if needed
  4. Configure paths if required
  5. Place voice cloning models in Plugin directory

Basic Workflow

  1. Select a feature from the left panel
  2. Add files (drag and drop supported)
  3. Set parameters
  4. Click Execute
  5. Check the operation log for status

Batch & Advanced

Most features support batch processing, drag-and-drop, and reordering. AI subtitles (Whisper), voice cloning (RTVC/SoVITS), OCR subtitle extraction, and automatic path detection are available. All operations are logged; the log can be viewed in real time and cleared.

Notes

FAQ

Q: Program says FFmpeg not found?

A: Install FFmpeg and add it to system PATH, or place ffmpeg.exe in the program directory.
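For readers running from source, the two locations mentioned above can be checked with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and checking the program directory before PATH is an assumed order, not necessarily the one MediaCraft uses.

```python
import shutil
import sys
from pathlib import Path

EXE_NAME = "ffmpeg.exe" if sys.platform.startswith("win") else "ffmpeg"

def find_ffmpeg(program_dir):
    """Return a path to FFmpeg from the program directory or PATH, else None."""
    local = Path(program_dir) / EXE_NAME
    if local.is_file():
        return str(local)
    # shutil.which searches the directories listed in the PATH variable.
    return shutil.which("ffmpeg")

if __name__ == "__main__":
    found = find_ffmpeg(Path.cwd())
    print(found if found else "FFmpeg not found")
```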

Q: SoVITS speaker dropdown is empty?

A: Check that config.json contains speaker info, e.g. "spk": {"Speaker1": 0, "Speaker2": 1}
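To verify what a speaker dropdown would have to work with, the speaker names can be read from config.json directly. The sketch below assumes the mapping sits at the top level under "spk" (or "spk2id", which a later answer also mentions); the function name is hypothetical.

```python
import json

def load_speakers(config_path):
    """Return speaker names from config.json, ordered by their numeric id."""
    with open(config_path, encoding="utf-8") as f:
        cfg = json.load(f)
    # Accept either key; an empty result means the dropdown would be empty too.
    mapping = cfg.get("spk") or cfg.get("spk2id") or {}
    return sorted(mapping, key=mapping.get)

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(load_speakers(sys.argv[1]))
```

Run it with the path to configs/config.json as the only argument; an empty list points to missing speaker info.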

Q: Whisper model detection failed?

A: The program looks for whisper on the system PATH and for Plugin/whisper/whisper.exe. Ensure the model files are in the Plugin/whisper/models directory.

Q: RTVC model detection failed?

A: Place models under Plugin/RTVC/Real-Time-Voice-Cloning/ with encoder.pt, synthesizer.pt, vocoder.pt. Both pretrained_models/ and saved_models/default/ are supported; new structure is preferred.
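The two supported layouts can be checked by hand with a short script. This sketch prefers the new structure and falls back to the legacy one; it assumes the legacy files sit directly in saved_models/default/, which is not spelled out above, and the function name is hypothetical.

```python
from pathlib import Path

PARTS = ("encoder", "synthesizer", "vocoder")

def find_rtvc_models(root):
    """Return a dict of model paths under the RTVC root, or None if incomplete."""
    root = Path(root)
    # New structure: pretrained_models/<part>/<part>.pt
    new = {p: root / "pretrained_models" / p / f"{p}.pt" for p in PARTS}
    if all(f.is_file() for f in new.values()):
        return new
    # Legacy structure (assumed flat): saved_models/default/<part>.pt
    legacy = {p: root / "saved_models" / "default" / f"{p}.pt" for p in PARTS}
    if all(f.is_file() for f in legacy.values()):
        return legacy
    return None

if __name__ == "__main__":
    print(find_rtvc_models("Plugin/RTVC/Real-Time-Voice-Cloning"))
```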

Q: SoVITS model detection failed?

A: Install under Plugin/SoVITS/so-vits-svc/. Check that G_*.pth and D_*.pth exist in logs/44k or trained, that the pretrain files are present, and that configs/config.json exists and contains spk or spk2id.
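A quick way to see which of these pieces are present is to scan the folder. The following is a minimal sketch with a hypothetical function name; it only reports checkpoint files and the config file, and does not validate the pretrain files or the contents of config.json.

```python
from pathlib import Path

CHECKPOINT_DIRS = ("logs/44k", "trained")

def sovits_status(root):
    """Report which checkpoints and config exist under the so-vits-svc root."""
    root = Path(root)

    def checkpoints(prefix):
        # Collect e.g. G_*.pth from every checkpoint directory that exists.
        return [p for sub in CHECKPOINT_DIRS
                for p in sorted((root / sub).glob(f"{prefix}_*.pth"))]

    return {
        "generators": checkpoints("G"),
        "discriminators": checkpoints("D"),
        "config": (root / "configs" / "config.json").is_file(),
    }

if __name__ == "__main__":
    print(sovits_status("Plugin/SoVITS/so-vits-svc"))
```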

Q: Does RTVC only support English?

A: Yes. Current RTVC accepts English text only. For other languages, use SoVITS.

Q: SoVITS error: "The name you entered is not in the speaker list!"

A: The selected speaker is not in the config. Choose a speaker from the dropdown that exactly matches a name in config.json.

Q: SoVITS missing modules in dedicated Python env?

A: The program prefers a conda env whose name contains "so-vits". In that env, run pip install torch torchaudio soundfile librosa numpy scipy, or install from the project's requirements.txt.
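To check which environment would match that naming rule, the environment paths reported by `conda env list --json` (under its "envs" key) can be filtered by name. The sketch below is an assumption about how such a match might work; the function name is hypothetical and the match is case-insensitive by choice.

```python
import re

def pick_sovits_env(env_paths):
    """Return the first environment path whose folder name contains 'so-vits'."""
    for path in env_paths:
        # Take the last path component, accepting both / and \ separators.
        name = re.split(r"[\\/]", path.rstrip("\\/"))[-1]
        if "so-vits" in name.lower():
            return path
    return None

if __name__ == "__main__":
    envs = ["C:\\miniconda3", "C:\\miniconda3\\envs\\so-vits-svc"]
    print(pick_sovits_env(envs))
```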

Q: Voice cloning takes very long?

A: Normal. AI models need heavy computation; SoVITS is slower than RTVC but higher quality; CPU is much slower than GPU. Do not close the program; check the log for progress.

Q: Program freezes during batch processing?

A: Check file sizes and disk space; large files need more time and resources.

Support

Official site: https://www.myzhenai.com.cn/

Tech blog: https://jiayu.mybabya.com/

© 2025 Hainan Xiandao | JiaYu Blog

This document is updated with the program; please refer to the latest version.