Build and run a GPT-SoVITS server

As the EchoKit server responds to the user queries, it streams voice audio data to the EchoKit device. The TTS service on the server must be able to generate audio in a stream.

GPT-SoVITS is a leading open-source TTS model and inference library. It can emulate a speaker with only 5 seconds of audio sample, and no change to the model itself (zero shot). With a 1 minute audio sample, you could finetune the model to clone the voice.

We have created a Rust-based API server that supports streaming TTS based on GPT-SoVITS models. You can build and deploy a streaming TTS server, and then use it in the tts section in the config.toml.

Install libtorch dependencies

Linux x86 CPU with or without CUDA and ROCm

# download libtorch
curl -LO https://download.pytorch.org/libtorch/cu124/libtorch-cxx11-abi-shared-with-deps-2.4.0%2Bcu124.zip
unzip libtorch-cxx11-abi-shared-with-deps-2.4.0%2Bcu124.zip

MacOS on Apple Silicon (M-series) devices

curl -LO https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.8.0.zip
unzip libtorch-macos-arm64-2.8.0.zip

Then, tell the system where to find your LibTorch.

# Add to ~/.zprofile or ~/.bash_profile
export LD_LIBRARY_PATH=$(pwd)/libtorch/lib:$LD_LIBRARY_PATH
export LIBTORCH=$(pwd)/libtorch 

Build the API server

git clone https://github.com/second-state/gsv_tts
git clone https://github.com/second-state/gpt_sovits_rs

cd gsv_tts
cargo build --release

Get model files

cd resources

curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/t2s.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/vits.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/ssl_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/bert_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/g2pw_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/mini-bart-g2p.pt

If you do not have Nvidia GPU and CUDA installed. You will download the following CPU over-write versions.

curl -L -o t2s.pt https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/t2s.cpu.pt
curl -L -o vits.pt https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/vits.cpu.pt

Run the API server

TTS_LISTEN=0.0.0.0:9094 nohup target/release/gsv_tts &

Test the server here: http://localhost:9094/.

Configure the EchoKit server

In the TTS section of config.toml, you can now use the following URL for the streaming TTS from this server.

[tts]
platform = "StreamGSV"
url = "http://localhost:9094/v1/audio/stream_speech"
speaker = "texan"

Next step

You can clone any voice for the GPT-SoVITS server.

Install libtorch dependencies​

Build the API server​

Get model files​

Run the API server​

Configure the EchoKit server​

Next step​