Package: chatterbox 0.2.1

Troy Hernandez

chatterbox: Text-to-Speech Using the 'Chatterbox' Engine

A native R 'torch' port of the 'Chatterbox' text-to-speech engine <https://github.com/resemble-ai/chatterbox>. Provides speech synthesis with voice cloning; model weights are downloaded from 'HuggingFace' <https://huggingface.co/> via the 'hfhub' package.

Authors: Troy Hernandez [aut, cre], cornball.ai [cph], Resemble AI [cph]

chatterbox_0.2.1.tar.gz
chatterbox_0.2.1.tar.gz (r-4.7-any)
chatterbox_0.2.1.tar.gz (r-4.6-any)
chatterbox_0.2.1.tgz (r-4.6-emscripten)

# Install 'chatterbox' in R:

install.packages('chatterbox', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/cornball-ai/chatterbox/issues

On CRAN:

1.00 score 10 scripts 25 exports 38 dependencies

Last updated from: e5342fd873. Checks: 4 OK. Indexed: yes.

Target	Result	Time
linux-devel-x86_64	OK	137
source / vignettes	OK	213
linux-release-x86_64	OK	128
wasm-release	OK	147

Exports: chatterbox chatterbox_gc_options create_voice_embedding download_chatterbox_models download_chatterbox_turbo_models generate generate_batch integrated_loudness load_chatterbox load_chatterbox_turbo load_voice_embedding models_available normalize_loudness normalize_tts_text quick_tts read_audio resample_audio s3_tokenizer save_voice_embedding serve tts_chunked tts_to_file turbo_models_available voice_convert write_audio

Dependencies: askpass bit bit64 callr cli coro curl desc farver filelock fs glue hfhub httr jsonlite labeling lifecycle magrittr MASS mime openssl otel processx ps R6 RColorBrewer Rcpp rlang safetensors scales signal sys torch triebeard tuneR urltools viridisLite withr

Readme and manuals

Help Manual

Help page	Topics
Apply Llama3-style RoPE scaling	apply_llama3_rope_scaling
Apply rotary position embeddings	apply_rotary_emb_s3
Apply rotary position embeddings to Q and K	apply_rotary_pos_emb
Attention block for perceiver	attention_block
Basic residual block for FCM	basic_res_block
Basic transformer block	basic_transformer_block
CAM Dense TDNN Block (multiple layers with dense connections)	cam_dense_tdnn_block
CAM Dense TDNN Layer	cam_dense_tdnn_layer
CAM (Context-Aware Masking) Layer	cam_layer
CAMPPlus speaker encoder	campplus
Causal Block 1D - CausalConv + LayerNorm + Mish	causal_block1d
Causal Conditional Flow Matching	causal_cfm
Causal Conv1d - pads left only	causal_conv1d
Causal Masked Diff with Xvector	causal_masked_diff_xvec
Causal ResNet Block 1D	causal_resnet_block1d
Self-attention for transformer block	cfm_attention
CFM Estimator (ConditionalDecoder)	cfm_estimator
Create (and load) a Chatterbox TTS model	chatterbox
Recommended torch garbage-collection settings for chatterbox	chatterbox_gc_options
Compute rotary position embeddings frequencies	compute_rope_frequencies
Compute mel spectrogram for voice encoder	compute_ve_mel
Conformer Encoder Layer	conformer_encoder_layer
Convolutional RNN F0 Predictor	conv_rnn_f0_predictor
Create pre-allocated KV cache	create_kv_cache
Create mel filterbank	create_mel_filterbank
Create voice embedding from reference audio	create_voice_embedding
Dense layer for final embedding	dense_layer
Download Chatterbox Models from HuggingFace	download_chatterbox_models
Download Chatterbox Turbo Models from HuggingFace	download_chatterbox_turbo_models
Drop invalid speech tokens	drop_invalid_tokens
Sinusoidal positional encoding (ESPnet RelPositionalEncoding)	espnet_rel_positional_encoding
Factorized Convolutional Module (FCM)	fcm_module
Feed-forward network for transformer. Matches diffusers FeedForward: net = [GELU(proj), Dropout, Linear]	feed_forward
FSMN Multi-Head Attention	fsmn_multi_head_attention
FSQ Codebook module	fsq_codebook
FSQ Vector Quantization wrapper	fsq_vector_quantization
GELU activation with projection (matches diffusers GELU structure)	gelu_with_proj
Generate speech from text	generate
Generate speech for several texts with one batched synthesis pass	generate_batch
Get padding for convolution	get_conv_padding
Get or create traced layers for cached inference	get_traced_layers
GPT-2 Attention (combined QKV projection)	gpt2_attention
GPT-2 Transformer Block	gpt2_block
GPT-2 Model Configuration	gpt2_config
GPT-2 Layer Normalization	gpt2_layer_norm
GPT-2 MLP (GELU activation)	gpt2_mlp
GPT-2 Model (transformer backbone)	gpt2_model
HiFiGAN Residual Block	hifigan_resblock
HiFTNet Generator	hift_generator
Initialize cache with first token K/V values	init_cache_from_first
Integrated loudness (ITU-R BS.1770-4)	integrated_loudness
Check if model is loaded	is_loaded
Learned position embeddings module	learned_position_embeddings
Linear No Subsampling layer	linear_no_subsampling
Llama attention module	llama_attention
Create Llama 520M configuration	llama_config_520m
Llama decoder layer	llama_decoder_layer
Llama MLP module	llama_mlp
Llama model (decoder only)	llama_model
RMS Normalization module	llama_rms_norm
Load Chatterbox model weights	load_chatterbox
Load Chatterbox Turbo model weights	load_chatterbox_turbo
Load Conformer Encoder weights	load_conformer_encoder_weights
Load weights from safetensors into Llama model	load_llama_weights
Load T3 turbo weights from safetensors	load_t3_turbo_weights
Load T3 weights from safetensors	load_t3_weights
Load tokenizer from JSON file (internal)	load_tokenizer
Load a voice embedding from disk	load_voice_embedding
Load voice encoder weights from safetensors	load_voice_encoder_weights
Create non-padding mask	make_non_pad_mask_s3
Create padding mask	make_pad_mask
Convert mask to attention bias	mask_to_bias
Mish activation	mish_activation
Check if Models are Downloaded	models_available
Normalize audio to a target loudness	normalize_loudness
Normalize text for TTS	normalize_tts_text
Pad audio to multiple of token rate	pad_audio_for_tokenizer
Perceiver resampler for conditioning compression	perceiver_resampler
Positionwise Feed Forward	positionwise_feedforward
Pre-Lookahead Layer	pre_lookahead_layer
Precompute rotary position embedding frequencies	precompute_freqs_cis
Print method for chatterbox	print.chatterbox
Print method for chatterbox_gc_options	print.chatterbox_gc_options
Print method for voice_embedding	print.voice_embedding
Normalize punctuation for TTS	punc_norm
Quick TTS - one-line text-to-speech	quick_tts
Read audio file	read_audio
Reflection padding for 1D (nn_reflection_pad1d equivalent)	reflection_pad1d
Relative Position Multi-Headed Attention	rel_position_attention
Resample audio	resample_audio
Rotate half of the tensor for RoPE	rotate_half
S3 Audio Encoder V2	s3_audio_encoder
Compute log mel spectrogram for S3Tokenizer	s3_log_mel_spectrogram
Multi-Head Attention base module	s3_multi_head_attention
Residual attention block	s3_residual_attention_block
S3Tokenizer V2 module	s3_tokenizer
S3Tokenizer model configuration	s3_tokenizer_config
S3Gen Token to Waveform	s3gen
Save a voice embedding to disk	save_voice_embedding
Serve chatterbox over HTTP	serve
Sine Generator	sine_gen
Sinusoidal positional embedding for timesteps	sinusoidal_pos_emb
Snake activation function	snake_activation
Source Module for Neural Source Filter	source_module_hn_nsf
Statistics pooling	statistics_pooling
Create T3 conditioning object	t3_cond
T3 conditioning encoder	t3_cond_enc
Move T3 conditioning to device	t3_cond_to_device
Create T3 configuration (English-only)	t3_config_english
Create T3 turbo configuration (GPT-2 backbone)	t3_config_turbo
T3 inference with JIT tracing (optimized)	t3_inference_traced
T3 Token-to-Token TTS model	t3_model
T3 Token-to-Token TTS model (Turbo variant with GPT-2 backbone)	t3_model_turbo
TDNN Layer	tdnn_layer
Timestep embedding MLP	timestep_embedding
Encode text to token IDs using BPE	tokenize_text
Traceable attention module with pre-allocated KV cache	traceable_attention
Traceable decoder layer with pre-allocated KV cache	traceable_decoder_layer
Traceable K/V projection module	traceable_kv_projector
Traceable transformer for cached inference	traceable_transformer_cached
Traceable transformer for first token (no cache)	traceable_transformer_first
Transit layer (channel reduction)	transit_layer
Transpose layer for use in sequential	transpose_layer
Generate speech for long text (the long-form policy layer)	tts_chunked
Generate speech and save to file	tts_to_file
Check if Turbo Models are Downloaded	turbo_models_available
Update KV cache with new K/V values	update_kv_cache
Update valid mask to include new position	update_valid_mask
Upsample 1D	upsample_1d
Upsample Conformer Encoder	upsample_conformer_encoder
Upsample Conformer Encoder	upsample_conformer_encoder_full
Convert speech to a target voice	voice_convert
Voice encoder module	voice_encoder
Voice encoder configuration	voice_encoder_config
Write audio file	write_audio
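The help index above describes causal_conv1d as a "Causal Conv1d - pads left only". As an illustration of that idea only, here is a minimal base-R sketch on a plain numeric vector, assuming the torch conv1d (cross-correlation) convention. The function name causal_conv1d_sketch is hypothetical; it is not chatterbox's causal_conv1d and does not operate on torch tensors. Padding (k - 1) * dilation zeros on the left only means each output sample depends on the current and past inputs, and the output length equals the input length.

```r
# Hypothetical helper, NOT part of chatterbox: 1-D causal convolution on a
# numeric vector, using the torch conv1d (cross-correlation) convention.
causal_conv1d_sketch <- function(x, w, dilation = 1) {
  k <- length(w)
  pad <- (k - 1) * dilation        # pad on the left only, never on the right
  xp <- c(rep(0, pad), x)          # padded input; xp[t + pad] equals x[t]
  vapply(seq_along(x), function(t) {
    # taps read xp[t], xp[t + d], ..., xp[t + (k - 1) * d], the last being x[t]
    idx <- t + (0:(k - 1)) * dilation
    sum(w * xp[idx])
  }, numeric(1))
}

causal_conv1d_sketch(c(1, 2, 3), w = c(1, 1))
#> [1] 1 3 5
```

Changing a later input sample leaves all earlier outputs unchanged, which is the property streaming-friendly decoders rely on.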
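The index lists integrated_loudness (ITU-R BS.1770-4) and normalize_loudness. A common way to normalize to a target loudness is to measure integrated loudness in LUFS and then apply one linear gain. This is a minimal sketch of that gain step only, assuming the measurement is already available. The name loudness_gain_sketch is hypothetical and this is not the package's normalize_loudness. Scaling amplitude by g shifts BS.1770 loudness by 20 * log10(g) LU (exactly so as long as the same blocks pass the absolute gate), so the required gain is 10^((target - measured) / 20).

```r
# Hypothetical helper, NOT part of chatterbox: linear gain that moves a
# signal from a measured integrated loudness to a target (both in LUFS).
loudness_gain_sketch <- function(measured_lufs, target_lufs) {
  gain_db <- target_lufs - measured_lufs  # loudness difference in LU (dB)
  10^(gain_db / 20)                       # convert dB to an amplitude factor
}

g <- loudness_gain_sketch(measured_lufs = -30, target_lufs = -23)
round(g, 4)
#> [1] 2.2387
# normalized <- samples * g   # then guard against clipping if peaks exceed 1
```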
