Package: chatterbox 0.2.1

Troy Hernandez

chatterbox: Text-to-Speech Using the 'Chatterbox' Engine

A native R 'torch' port of the 'Chatterbox' text-to-speech engine <https://github.com/resemble-ai/chatterbox>. Provides speech synthesis with voice cloning; model weights are downloaded from 'HuggingFace' <https://huggingface.co/> via the 'hfhub' package.

Authors: Troy Hernandez [aut, cre], cornball.ai [cph], Resemble AI [cph]

chatterbox_0.2.1.tar.gz
chatterbox_0.2.1.tar.gz (r-4.7-any) | chatterbox_0.2.1.tar.gz (r-4.6-any)
chatterbox_0.2.1.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
chatterbox/json (API)

# Install 'chatterbox' in R:
install.packages('chatterbox', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))
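
After installing, it can be useful to confirm that the package is visible to R and to see what it exports. The following is a minimal sketch using only base R; `list_exports` is a hypothetical helper written for this example and is not part of chatterbox. Note that `requireNamespace()` loads the package namespace, which also loads its dependencies.

```r
# Hypothetical helper (not part of chatterbox): return the sorted exported
# names of an installed package, or an empty character vector if the
# package is not installed.
list_exports <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed.")
    return(character(0))
  }
  sort(getNamespaceExports(pkg))
}

exports <- list_exports("chatterbox")
print(exports)
```

If the installation succeeded, the printed names should correspond to the Exports list on this page.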

Bug tracker: https://github.com/cornball-ai/chatterbox/issues

1.00 score | 10 scripts | 25 exports | 38 dependencies

Last updated from: e5342fd873. Checks: 4 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 137
source / vignettes | OK | 213
linux-release-x86_64 | OK | 128
wasm-release | OK | 147

Exports: chatterbox, chatterbox_gc_options, create_voice_embedding, download_chatterbox_models, download_chatterbox_turbo_models, generate, generate_batch, integrated_loudness, load_chatterbox, load_chatterbox_turbo, load_voice_embedding, models_available, normalize_loudness, normalize_tts_text, quick_tts, read_audio, resample_audio, s3_tokenizer, save_voice_embedding, serve, tts_chunked, tts_to_file, turbo_models_available, voice_convert, write_audio

Dependencies: askpass, bit, bit64, callr, cli, coro, curl, desc, farver, filelock, fs, glue, hfhub, httr, jsonlite, labeling, lifecycle, magrittr, MASS, mime, openssl, otel, processx, ps, R6, RColorBrewer, Rcpp, rlang, safetensors, scales, signal, sys, torch, triebeard, tuneR, urltools, viridisLite, withr

Readme and manuals

Help Manual

Help page | Topics
Apply Llama3-style RoPE scaling | apply_llama3_rope_scaling
Apply rotary position embeddings | apply_rotary_emb_s3
Apply rotary position embeddings to Q and K | apply_rotary_pos_emb
Attention block for perceiver | attention_block
Basic residual block for FCM | basic_res_block
Basic transformer block | basic_transformer_block
CAM Dense TDNN Block (multiple layers with dense connections) | cam_dense_tdnn_block
CAM Dense TDNN Layer | cam_dense_tdnn_layer
CAM (Context-Aware Masking) Layer | cam_layer
CAMPPlus speaker encoder | campplus
Causal Block 1D - CausalConv + LayerNorm + Mish | causal_block1d
Causal Conditional Flow Matching | causal_cfm
Causal Conv1d - pads left only | causal_conv1d
Causal Masked Diff with Xvector | causal_masked_diff_xvec
Causal ResNet Block 1D | causal_resnet_block1d
Self-attention for transformer block | cfm_attention
CFM Estimator (ConditionalDecoder) | cfm_estimator
Create (and load) a Chatterbox TTS model | chatterbox
Recommended torch garbage-collection settings for chatterbox | chatterbox_gc_options
Compute rotary position embeddings frequencies | compute_rope_frequencies
Compute mel spectrogram for voice encoder | compute_ve_mel
Conformer Encoder Layer | conformer_encoder_layer
Convolutional RNN F0 Predictor | conv_rnn_f0_predictor
Create pre-allocated KV cache | create_kv_cache
Create mel filterbank | create_mel_filterbank
Create voice embedding from reference audio | create_voice_embedding
Dense layer for final embedding | dense_layer
Download Chatterbox Models from HuggingFace | download_chatterbox_models
Download Chatterbox Turbo Models from HuggingFace | download_chatterbox_turbo_models
Drop invalid speech tokens | drop_invalid_tokens
Sinusoidal positional encoding (Espnet RelPositionalEncoding) | espnet_rel_positional_encoding
Factorized Convolutional Module (FCM) | fcm_module
Feed-forward network for transformer. Matches diffusers FeedForward: net = [GELU(proj), Dropout, Linear] | feed_forward
FSMN Multi-Head Attention | fsmn_multi_head_attention
FSQ Codebook module | fsq_codebook
FSQ Vector Quantization wrapper | fsq_vector_quantization
GELU activation with projection (matches diffusers GELU structure) | gelu_with_proj
Generate speech from text | generate
Generate speech for several texts with one batched synthesis pass | generate_batch
Get padding for convolution | get_conv_padding
Get or create traced layers for cached inference | get_traced_layers
GPT-2 Attention (combined QKV projection) | gpt2_attention
GPT-2 Transformer Block | gpt2_block
GPT-2 Model Configuration | gpt2_config
GPT-2 Layer Normalization | gpt2_layer_norm
GPT-2 MLP (GELU activation) | gpt2_mlp
GPT-2 Model (transformer backbone) | gpt2_model
HiFiGAN Residual Block | hifigan_resblock
HiFTNet Generator | hift_generator
Initialize cache with first token K/V values | init_cache_from_first
Integrated loudness (ITU-R BS.1770-4) | integrated_loudness
Check if model is loaded | is_loaded
Learned position embeddings module | learned_position_embeddings
Linear No Subsampling layer | linear_no_subsampling
Llama attention module | llama_attention
Create Llama 520M configuration | llama_config_520m
Llama decoder layer | llama_decoder_layer
Llama MLP module | llama_mlp
Llama model (decoder only) | llama_model
RMS Normalization module | llama_rms_norm
Load Chatterbox model weights | load_chatterbox
Load Chatterbox Turbo model weights | load_chatterbox_turbo
Load Conformer Encoder weights | load_conformer_encoder_weights
Load weights from safetensors into Llama model | load_llama_weights
Load T3 turbo weights from safetensors | load_t3_turbo_weights
Load T3 weights from safetensors | load_t3_weights
Load tokenizer from JSON file (internal) | load_tokenizer
Load a voice embedding from disk | load_voice_embedding
Load voice encoder weights from safetensors | load_voice_encoder_weights
Create non-padding mask | make_non_pad_mask_s3
Create padding mask | make_pad_mask
Convert mask to attention bias | mask_to_bias
Mish activation | mish_activation
Check if Models are Downloaded | models_available
Normalize audio to a target loudness | normalize_loudness
Normalize text for TTS | normalize_tts_text
Pad audio to multiple of token rate | pad_audio_for_tokenizer
Perceiver resampler for conditioning compression | perceiver_resampler
Positionwise Feed Forward | positionwise_feedforward
Pre-Lookahead Layer | pre_lookahead_layer
Precompute rotary position embedding frequencies | precompute_freqs_cis
Print method for chatterbox | print.chatterbox
Print method for chatterbox_gc_options | print.chatterbox_gc_options
Print method for voice_embedding | print.voice_embedding
Normalize punctuation for TTS | punc_norm
Quick TTS - one-line text-to-speech | quick_tts
Read audio file | read_audio
Reflection padding for 1D (nn_reflection_pad1d equivalent) | reflection_pad1d
Relative Position Multi-Headed Attention | rel_position_attention
Resample audio | resample_audio
Rotate half of the tensor for RoPE | rotate_half
S3 Audio Encoder V2 | s3_audio_encoder
Compute log mel spectrogram for S3Tokenizer | s3_log_mel_spectrogram
Multi-Head Attention base module | s3_multi_head_attention
Residual attention block | s3_residual_attention_block
S3Tokenizer V2 module | s3_tokenizer
S3Tokenizer model configuration | s3_tokenizer_config
S3Gen Token to Waveform | s3gen
Save a voice embedding to disk | save_voice_embedding
Serve chatterbox over HTTP | serve
Sine Generator | sine_gen
Sinusoidal positional embedding for timesteps | sinusoidal_pos_emb
Snake activation function | snake_activation
Source Module for Neural Source Filter | source_module_hn_nsf
Statistics pooling | statistics_pooling
Create T3 conditioning object | t3_cond
T3 conditioning encoder | t3_cond_enc
Move T3 conditioning to device | t3_cond_to_device
Create T3 configuration (English-only) | t3_config_english
Create T3 turbo configuration (GPT-2 backbone) | t3_config_turbo
T3 inference with JIT tracing (optimized) | t3_inference_traced
T3 Token-to-Token TTS model | t3_model
T3 Token-to-Token TTS model (Turbo variant with GPT-2 backbone) | t3_model_turbo
TDNN Layer | tdnn_layer
Timestep embedding MLP | timestep_embedding
Encode text to token IDs using BPE | tokenize_text
Traceable attention module with pre-allocated KV cache | traceable_attention
Traceable decoder layer with pre-allocated KV cache | traceable_decoder_layer
Traceable K/V projection module | traceable_kv_projector
Traceable transformer for cached inference | traceable_transformer_cached
Traceable transformer for first token (no cache) | traceable_transformer_first
Transit layer (channel reduction) | transit_layer
Transpose layer for use in sequential | transpose_layer
Generate speech for long text (the long-form policy layer) | tts_chunked
Generate speech and save to file | tts_to_file
Check if Turbo Models are Downloaded | turbo_models_available
Update KV cache with new K/V values | update_kv_cache
Update valid mask to include new position | update_valid_mask
Upsample 1D | upsample_1d
Upsample Conformer Encoder | upsample_conformer_encoder
Upsample Conformer Encoder | upsample_conformer_encoder_full
Convert speech to a target voice | voice_convert
Voice encoder module | voice_encoder
Voice encoder configuration | voice_encoder_config
Write audio file | write_audio
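
Several help topics above name standard neural-network building blocks (Mish activation, Snake activation, rotate-half for RoPE). As a rough orientation, the sketch below writes out their commonly used textbook definitions in base R on plain numeric vectors. The function names `mish`, `snake`, and `rotate_half_vec` are hypothetical and chosen for this example; this is not the package's code, which operates on torch tensors and may differ in detail (for instance, Snake is often used with a learnable per-channel `alpha`, while a scalar is assumed here).

```r
# Assumption: textbook definitions on plain numeric vectors, not the
# package's torch-based implementations.

# Mish: x * tanh(softplus(x)), where softplus(x) = log(1 + exp(x)).
mish <- function(x) x * tanh(log1p(exp(x)))

# Snake: x + sin(alpha * x)^2 / alpha, with a scalar alpha (assumed).
snake <- function(x, alpha = 1) x + sin(alpha * x)^2 / alpha

# Rotate-half as commonly used in RoPE: split the vector into halves
# (x1, x2) and return c(-x2, x1). Requires an even length.
rotate_half_vec <- function(x) {
  n <- length(x)
  stopifnot(n %% 2 == 0)
  half <- n / 2
  c(-x[(half + 1):n], x[1:half])
}

rotate_half_vec(c(1, 2, 3, 4))  # -3 -4  1  2
mish(0)                         # 0
```

In RoPE, the rotated vector is typically combined as `x * cos(theta) + rotate_half(x) * sin(theta)`; whether chatterbox uses exactly this form is not stated on this page.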