Clone Your Voice

Complete Guide - From Audio Preparation to Model Deployment

1
Prepare Audio File
Create an MP3 or WAV file of the person speaking, with no background noise. This recording is the source material for training your voice clone.
💡 Tip: Record in a quiet environment with a good-quality microphone for best results.
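Before moving on, it can help to confirm what the recording actually contains. The following is a minimal sketch, assuming the source is an uncompressed PCM WAV file; it uses only Python's standard `wave` module, which cannot read MP3. The function name `describe_wav` is a hypothetical helper, not part of EchoKit or GPT-SoVITS.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


# Example usage (the file name is a placeholder):
# print(describe_wav("speaker.wav"))
```

A very short duration or an unexpected channel count is usually easier to fix by re-recording now than after segmentation.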
2
Segment and Transcribe
Use the tool linked below to upload the source audio file, segment it into sentences, and transcribe it.

Processing Steps:

  • Upload audio file
  • Select sentence start/end positions
  • Cut into segments
  • Download segmented audio files and transcription
👉 Segment and transcribe 👉 Segment and transcribe (Chinese version)
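To make the "cut into segments" step concrete, here is a rough sketch of what cutting a recording at chosen start/end positions looks like. It is not the linked tool's implementation; it assumes a PCM WAV source and sentence boundaries given in seconds, and `cut_segments` and the `segment_NNN.wav` naming are hypothetical.

```python
import os
import wave


def cut_segments(src_path, segments, out_dir):
    """Write one WAV file per (start_s, end_s) pair and return the new paths."""
    os.makedirs(out_dir, exist_ok=True)
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for index, (start_s, end_s) in enumerate(segments, start=1):
            # Convert seconds to frame offsets, clamped to the file length.
            start = max(0, round(start_s * rate))
            end = min(total, round(end_s * rate))
            if end <= start:
                raise ValueError(f"empty segment {index}: {start_s}-{end_s}")
            src.setpos(start)
            frames = src.readframes(end - start)
            out_path = os.path.join(out_dir, f"segment_{index:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width, and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
    return out_paths


# Example usage (paths and times are placeholders):
# cut_segments("speaker.wav", [(0.0, 2.4), (2.4, 5.1)], "segments")
```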
3
Finetune the TTS Model
Follow the GPT-SoVITS guide to finetune the TTS AI model using your prepared audio and transcription data.

Finetune Steps:

  • Set up CUDA and PyTorch environments
  • Use the audio and transcription files to finetune
  • Export the finetuned model files for the EchoKit TTS server
📖 GPT-SoVITS Guide
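The segmented audio files and their transcriptions need to be paired up before finetuning. As a sketch only: GPT-SoVITS training data is commonly described as a `.list` file with one `audio_path|speaker|language|text` line per segment. Treat that layout as an assumption and confirm it against the GPT-SoVITS guide; `format_list_lines` is a hypothetical helper.

```python
def format_list_lines(entries, speaker, language):
    """Build 'audio_path|speaker|language|text' lines from (path, text) pairs.

    The field order is an assumption about the GPT-SoVITS annotation format.
    """
    lines = []
    for audio_path, text in entries:
        # Collapse newlines and repeated spaces so each entry stays on one line.
        text = " ".join(text.split())
        if "|" in audio_path or "|" in text:
            raise ValueError(f"'|' is the field separator and cannot appear in: {audio_path}")
        lines.append(f"{audio_path}|{speaker}|{language}|{text}")
    return lines


# Example usage (paths, speaker name, and text are placeholders):
# lines = format_list_lines([("segments/segment_001.wav", "Hello there.")], "alice", "en")
# with open("train.list", "w", encoding="utf-8") as f:
#     f.write("\n".join(lines) + "\n")
```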
Export the finetuned model files for EchoKit's Rust inference library:
python GPT_SoVITS/stream_v2pro.py \
    --gpt_model finetuned.ckpt \
    --sovits_model finetuned.pth \
    --ref_audio ref.wav \
    --ref_text 'The text in the reference audio' \
    --output_path jit/ad_v2_pro \
    --version v2Pro \
    --device=cuda
⚠️ Note: Model training takes considerable time. GPU acceleration is recommended. Monitor the loss function regularly during training.
4
Deploy the TTS Model
Deploy the finetuned model in the EchoKit streaming audio TTS server for real-time voice synthesis.
🚀 EchoKit TTS Server
✅ Complete! You can now "speak" any text using your cloned voice!