MediaCraft is an all-in-one audio and video processing tool built with PyQt5 and FFmpeg. Its graphical interface makes complex processing tasks simple, and it ships with 15 core feature modules that range from basic editing to AI-assisted processing.
Version: 1.0.0 | Developer: Hainan Xiandao | Website: https://www.myzhenai.com.cn/
MediaCraft\Plugin\RTVC
MediaCraft\Plugin\SoVITS
MediaCraft\Plugin\whisper
Installation: Extract the archive and run the program. For AI subtitles or voice cloning, add the corresponding Plugin subfolders as needed.
Centralized path configuration for all dependencies, with automatic detection and manual override. Audio devices are detected and listed automatically.
Use cases: System config, environment setup, component management
Full screen or custom region; system sound and microphone; mouse cursor. Configurable FPS, encoding quality (ultrafast/fast/medium/slow), bitrate (5000k-15000k or custom). Pause/resume/stop; default save to desktop. Requires Screen Capturer Recorder and VB-CABLE for audio.
Use cases: Tutorials, game recording, demos, online teaching
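The recording options above map naturally onto an FFmpeg command line. The following is a minimal sketch, not MediaCraft's actual implementation: the function name is hypothetical, and the DirectShow device names are assumptions based on the defaults that Screen Capturer Recorder registers. A VB-CABLE setup would use a different audio device name.

```python
# Presets listed in the recording settings above.
PRESETS = ("ultrafast", "fast", "medium", "slow")


def build_record_command(output, fps=30, preset="ultrafast", bitrate="5000k",
                         video_device="screen-capture-recorder",
                         audio_device="virtual-audio-capturer"):
    """Return an FFmpeg argument list for a DirectShow screen recording.

    The device names are assumptions; list the real ones with
    `ffmpeg -list_devices true -f dshow -i dummy`.
    """
    if preset not in PRESETS:
        raise ValueError(f"unsupported preset: {preset}")
    source = f"video={video_device}:audio={audio_device}"
    return [
        "ffmpeg",
        "-f", "dshow",            # DirectShow capture (Windows)
        "-framerate", str(fps),   # requested capture frame rate
        "-i", source,
        "-c:v", "libx264",
        "-preset", preset,        # speed/compression trade-off
        "-b:v", bitrate,          # target video bitrate, e.g. 5000k
        "-c:a", "aac",
        output,
    ]
```

Passing the returned list to subprocess.run starts a recording; a GUI would typically keep the process handle so that stop can end it cleanly.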
Image (PNG/JPG) and text watermarks; font, size, color (HEX), outline, transparency; position; batch; drag-and-drop; overwrite option.
Use cases: Copyright, branding, identification
Hard subtitles (burned-in, SRT); soft subtitles (external track, no re-encode). Languages, position, font, color, outline, background. Auto-detect same-name .srt/.ass/.ssa; track name, language code (ISO 639-1), format (mov_text/subrip/ass) for MP4/MKV.
Use cases: Multilingual video, accessibility, teaching, promos
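Same-name subtitle detection and soft-subtitle muxing can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the search order (.srt, then .ass, then .ssa) is a guess, and the default codec choice (mov_text for MP4, subrip otherwise) is only one reasonable mapping of the formats listed above.

```python
from pathlib import Path

SUBTITLE_EXTENSIONS = (".srt", ".ass", ".ssa")


def find_sidecar_subtitle(video_path, is_file=Path.is_file):
    """Return the first subtitle file sharing the video's base name, or None."""
    video = Path(video_path)
    for ext in SUBTITLE_EXTENSIONS:
        candidate = video.with_suffix(ext)
        if is_file(candidate):
            return candidate
    return None


def build_soft_subtitle_command(video, subtitle, output,
                                language="en", title="Subtitles", codec=None):
    """Mux a subtitle track into the container without re-encoding video/audio."""
    if codec is None:
        # MP4 needs mov_text; assume subrip for other containers such as MKV.
        codec = "mov_text" if Path(output).suffix.lower() == ".mp4" else "subrip"
    return [
        "ffmpeg", "-i", str(video), "-i", str(subtitle),
        "-map", "0:v", "-map", "0:a?", "-map", "1:0",
        "-c:v", "copy", "-c:a", "copy", "-c:s", codec,
        "-metadata:s:s:0", f"language={language}",
        "-metadata:s:s:0", f"title={title}",
        str(output),
    ]
```

The is_file parameter exists only so the lookup can be exercised without touching the disk; normal callers can omit it.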
Merge multiple videos or audio files; auto format check; drag to reorder.
Use cases: Episode merge, album creation, long-form video
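One common way to merge files of matching format with FFmpeg is the concat demuxer, which reads a text file listing the inputs in order. Whether MediaCraft uses this exact approach is an assumption; the sketch below only shows how such a list and command could be produced, with hypothetical function names.

```python
def concat_list_text(paths):
    """Build the text of an FFmpeg concat-demuxer list, one input per line."""
    lines = []
    for path in paths:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_concat_command(list_file, output):
    """Join the listed files with stream copy (no re-encoding)."""
    return [
        "ffmpeg", "-f", "concat",
        "-safe", "0",          # allow absolute paths in the list file
        "-i", str(list_file),
        "-c", "copy",
        str(output),
    ]
```

Stream copy only works when all inputs share codecs and parameters, which is presumably why a format check precedes the merge.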
Batch re-encode; custom video encoder (e.g. H.264/H.265), audio encoder, resolution, bitrate; hardware acceleration when available.
Use cases: Format conversion, compression, compatibility, batch
Extract soft subtitles directly; use Tesseract-OCR for hard subtitles. Output SRT/ASS/VTT; batch supported.
Use cases: Silent video subtitles, multilingual subs, analysis
Single or batch (e.g. 3/6/9 frames); random or time-based; output PNG/JPG/BMP.
Use cases: Thumbnails, covers, preview, assets
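Choosing where to grab frames comes down to picking timestamps. A minimal sketch, assuming that "time-based" means evenly spaced points inside the video; the function names and the seed parameter are illustrative only.

```python
import random


def frame_timestamps(duration, count, mode="even", seed=None):
    """Return `count` timestamps (seconds) within a video of `duration` seconds."""
    if duration <= 0 or count < 1:
        raise ValueError("duration must be positive and count at least 1")
    if mode == "even":
        # Divide into count + 1 gaps so no frame lands on the first or last instant.
        step = duration / (count + 1)
        return [round(step * (i + 1), 3) for i in range(count)]
    if mode == "random":
        rng = random.Random(seed)
        return sorted(round(rng.uniform(0, duration), 3) for _ in range(count))
    raise ValueError(f"unknown mode: {mode}")


def build_frame_command(video, timestamp, output):
    """Seek to `timestamp` and write a single frame; the extension picks the format."""
    return ["ffmpeg", "-ss", str(timestamp), "-i", str(video),
            "-frames:v", "1", str(output)]
```

For a 60-second video, frame_timestamps(60, 3) yields 15, 30, and 45 seconds.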
Split by start/end time or into N equal segments; batch; multiple formats.
Use cases: Clips, short videos, segments
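Splitting into N equal segments is simple arithmetic on the total duration. The sketch below is hypothetical (names and output naming are not taken from the program) and pairs each segment with an FFmpeg stream-copy command.

```python
def equal_segments(duration, parts):
    """Return (start, length) pairs that divide `duration` seconds into `parts`."""
    if duration <= 0 or parts < 1:
        raise ValueError("duration must be positive and parts at least 1")
    length = duration / parts
    return [(round(i * length, 3), round(length, 3)) for i in range(parts)]


def build_split_commands(video, duration, parts, pattern="segment_{index}.mp4"):
    """Build one FFmpeg command per segment; `pattern` is an illustrative naming scheme."""
    commands = []
    for index, (start, length) in enumerate(equal_segments(duration, parts), start=1):
        commands.append([
            "ffmpeg", "-ss", str(start), "-i", str(video),
            "-t", str(length), "-c", "copy",
            pattern.format(index=index),
        ])
    return commands
```

With stream copy, cuts snap to keyframes, so segment boundaries may drift slightly; re-encoding gives exact cuts at the cost of speed.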
Create video from images; duration, background music, transitions, encoder settings.
Use cases: Slideshows, product demos, albums
Merge one video + one audio; batch add same audio to multiple videos; volume and fade; intros/outros.
Use cases: Intros/outros, BGM, audio replacement
Demux video and audio; extract audio (e.g. MP3/WAV/FLAC); video-only; batch.
Use cases: Audio extraction, demux, assets
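Audio extraction and video-only output correspond to dropping one stream type. A minimal sketch with hypothetical function names; the encoder chosen for each extension is an assumption, not necessarily what MediaCraft selects.

```python
from pathlib import Path

# Assumed extension-to-encoder mapping for the formats named above.
AUDIO_CODECS = {".mp3": "libmp3lame", ".wav": "pcm_s16le", ".flac": "flac"}


def build_extract_audio_command(video, output):
    """Drop the video stream and encode audio according to the output extension."""
    ext = Path(output).suffix.lower()
    if ext not in AUDIO_CODECS:
        raise ValueError(f"unsupported audio format: {ext}")
    return ["ffmpeg", "-i", str(video), "-vn", "-c:a", AUDIO_CODECS[ext], str(output)]


def build_video_only_command(video, output):
    """Drop the audio stream and copy the video stream unchanged."""
    return ["ffmpeg", "-i", str(video), "-an", "-c:v", "copy", str(output)]
```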
Batch edit title, artist, album, year, description; visible in file properties.
Use cases: Tagging, copyright, library management
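Metadata edits of this kind can be expressed with FFmpeg's -metadata option and stream copy. The sketch below is an assumption about how the fields might map: the function name is hypothetical, and writing "year" to the date key is a common convention rather than something the program documents.

```python
# Assumed mapping from the editable fields to FFmpeg metadata keys.
METADATA_KEYS = {
    "title": "title",
    "artist": "artist",
    "album": "album",
    "year": "date",
    "description": "description",
}


def build_metadata_command(source, output, **fields):
    """Copy all streams unchanged and set the given container-level tags."""
    cmd = ["ffmpeg", "-i", str(source), "-map", "0", "-c", "copy"]
    for field, value in fields.items():
        if field not in METADATA_KEYS:
            raise ValueError(f"unknown metadata field: {field}")
        cmd += ["-metadata", f"{METADATA_KEYS[field]}={value}"]
    cmd.append(str(output))
    return cmd
```

Batch editing would call this once per file with the same fields.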
Use Whisper to extract subtitles. Models: tiny/base/small/medium/large. Multi-language (Chinese, English, Japanese, Korean, etc.); auto-detect language; translate; output SRT/ASS/VTT; naming: [filename]_[lang].ext. Auto-detect Whisper path (system or Plugin/whisper).
Use cases: Auto subtitles, translation, transcription, accessibility
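The naming rule [filename]_[lang].ext and a Whisper invocation can be sketched as below. The helper names are hypothetical. The command uses options of the openai-whisper command-line tool; since that tool names its output after the input file, a wrapper would need to rename the result to follow the scheme above, and that step is an assumption about how the program works.

```python
from pathlib import Path

WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def subtitle_output_name(media_path, lang, ext="srt"):
    """Apply the [filename]_[lang].ext naming rule."""
    return f"{Path(media_path).stem}_{lang}.{ext.lstrip('.')}"


def build_whisper_command(media_path, model="small", language=None,
                          translate=False, output_format="srt", output_dir="."):
    """Build an openai-whisper CLI call; omit `language` to let Whisper auto-detect."""
    if model not in WHISPER_MODELS:
        raise ValueError(f"unknown model: {model}")
    cmd = [
        "whisper", str(media_path),
        "--model", model,
        "--output_format", output_format,
        "--output_dir", str(output_dir),
        "--task", "translate" if translate else "transcribe",
    ]
    if language:
        cmd += ["--language", language]
    return cmd
```

For example, subtitle_output_name("talk.mp4", "en") gives talk_en.srt.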
Built-in RTVC and SoVITS. RTVC: real-time voice conversion and TTS (English only); text + reference audio. SoVITS: high-quality cloning, multi-language; source + reference audio; auto speaker list from config. Auto-detect model and Python environment.
Use cases: Voice synthesis, dubbing, content creation
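Building the speaker list from a SoVITS config can be sketched as follows. It assumes the speaker table is a top-level mapping of name to numeric ID stored under either spk or spk2id; the function name is hypothetical.

```python
import json


def load_speakers(config_text):
    """Return speaker names from a SoVITS config, ordered by speaker ID."""
    config = json.loads(config_text)
    for key in ("spk", "spk2id"):
        mapping = config.get(key)
        if isinstance(mapping, dict) and mapping:
            return sorted(mapping, key=mapping.get)
    return []  # no speaker table found
```

An empty result is a useful signal that the speaker dropdown would be empty and that config.json needs attention.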
1. FFmpeg - Core engine for audio/video
Download: https://ffmpeg.org/download.html
Add to system PATH or place in program directory
2. Python - For running from source / extensions
Python 3.9+ | Install PyQt5: pip install PyQt5
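The two FFmpeg locations named in item 1 (system PATH or program directory) suggest a simple lookup. A minimal sketch, assuming PATH is tried first; the function name and the injectable which and is_file parameters are illustrative and exist only to make the lookup easy to test.

```python
import shutil
from pathlib import Path


def find_ffmpeg(program_dir, which=shutil.which, is_file=Path.is_file):
    """Locate FFmpeg on PATH, then in the program directory; return None if absent."""
    found = which("ffmpeg")
    if found:
        return Path(found)
    for name in ("ffmpeg.exe", "ffmpeg"):
        candidate = Path(program_dir) / name
        if is_file(candidate):
            return candidate
    return None
```

A None result is the point at which a GUI would ask the user to set the path manually.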
1. OpenAI Whisper - AI subtitles
pip install openai-whisper
2. Tesseract-OCR - Text recognition for hard subtitles
Download: Tesseract. The program auto-detects common install locations, e.g. C:\Program Files\Tesseract-OCR\tesseract.exe
3. Voice cloning models - RTVC/SoVITS in Plugin directory (see Directory structure)
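For RTVC, a quick way to confirm that the model files are in place is to look for encoder.pt, synthesizer.pt, and vocoder.pt in the supported folders. The sketch below is hypothetical: it assumes saved_models/default/ is the newer layout and therefore checks it first, and it expects the path to the Real-Time-Voice-Cloning folder as input.

```python
from pathlib import Path

RTVC_MODEL_FILES = ("encoder.pt", "synthesizer.pt", "vocoder.pt")
# Assumed preference order: newer layout first, then the older one.
RTVC_MODEL_DIRS = ("saved_models/default", "pretrained_models")


def find_rtvc_models(rtvc_root, is_file=Path.is_file):
    """Return the first folder containing all three model files, or None."""
    root = Path(rtvc_root)
    for sub in RTVC_MODEL_DIRS:
        folder = root / sub
        if all(is_file(folder / name) for name in RTVC_MODEL_FILES):
            return folder
    return None
```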
Most features support batch processing, drag-and-drop, and reordering. AI subtitles (Whisper), voice cloning (RTVC/SoVITS), OCR subtitle extraction, and automatic path detection are available. All operations are logged; the log can be viewed in real time and cleared.
A: Install FFmpeg and add it to system PATH, or place ffmpeg.exe in the program directory.
A: Check that config.json contains speaker info, e.g. "spk": {"Speaker1": 0, "Speaker2": 1}
A: The program checks two locations: the system PATH (for whisper) and Plugin/whisper/whisper.exe. Ensure the model files are in the models directory.
A: Place models under Plugin/RTVC/Real-Time-Voice-Cloning/ with encoder.pt, synthesizer.pt, vocoder.pt. Both pretrained_models/ and saved_models/default/ are supported; new structure is preferred.
A: Use Plugin/SoVITS/so-vits-svc/. Check that G_*.pth and D_*.pth exist in logs/44k or trained, that the pretrain files are present, and that configs/config.json exists and contains spk or spk2id.
A: Yes. Current RTVC accepts English text only. For other languages, use SoVITS.
A: The selected speaker is not in the config. Choose a speaker from the dropdown that exactly matches a name in config.json.
A: The program prefers a conda env whose name contains "so-vits". In that env, run: pip install torch torchaudio soundfile librosa numpy scipy, or install from the project's requirements.txt.
A: Normal. AI models need heavy computation; SoVITS is slower than RTVC but higher quality; CPU is much slower than GPU. Do not close the program; check the log for progress.
A: Check file sizes and disk space; large files need more time and resources.
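The conda environment preference mentioned in the answers above (an env whose name contains "so-vits") can be sketched as a filter over the output of conda env list --json, which reports environment paths under the envs key. The function name is hypothetical, and the case-insensitive match is an assumption.

```python
import json


def pick_sovits_env(conda_env_list_json):
    """Return the first environment path whose folder name contains 'so-vits'."""
    envs = json.loads(conda_env_list_json).get("envs", [])
    for env_path in envs:
        # Normalise Windows separators so the folder name can be split off anywhere.
        name = env_path.replace("\\", "/").rstrip("/").split("/")[-1]
        if "so-vits" in name.lower():
            return env_path
    return None
```

The JSON text would typically come from running the conda command through subprocess and reading its standard output.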
Official site: https://www.myzhenai.com.cn/
Tech blog: https://jiayu.mybabya.com/
© 2025 Hainan Xiandao | JiaYu Blog
This document is updated with the program; please refer to the latest version.