| Apply Llama3-style RoPE scaling | apply_llama3_rope_scaling |
| Apply rotary position embeddings | apply_rotary_emb_s3 |
| Apply rotary position embeddings to Q and K | apply_rotary_pos_emb |
| Attention block for perceiver | attention_block |
| Basic residual block for FCM | basic_res_block |
| Basic transformer block | basic_transformer_block |
| CAM Dense TDNN Block (multiple layers with dense connections) | cam_dense_tdnn_block |
| CAM Dense TDNN Layer | cam_dense_tdnn_layer |
| CAM (Context-Aware Masking) Layer | cam_layer |
| CAMPPlus speaker encoder | campplus |
| Causal Block 1D - CausalConv + LayerNorm + Mish | causal_block1d |
| Causal Conditional Flow Matching | causal_cfm |
| Causal Conv1d - pads left only | causal_conv1d |
| Causal Masked Diff with Xvector | causal_masked_diff_xvec |
| Causal ResNet Block 1D | causal_resnet_block1d |
| Self-attention for transformer block | cfm_attention |
| CFM Estimator (ConditionalDecoder) | cfm_estimator |
| Create (and load) a Chatterbox TTS model | chatterbox |
| Recommended torch garbage-collection settings for chatterbox | chatterbox_gc_options |
| Compute rotary position embeddings frequencies | compute_rope_frequencies |
| Compute mel spectrogram for voice encoder | compute_ve_mel |
| Conformer Encoder Layer | conformer_encoder_layer |
| Convolutional RNN F0 Predictor | conv_rnn_f0_predictor |
| Create pre-allocated KV cache | create_kv_cache |
| Create mel filterbank | create_mel_filterbank |
| Create voice embedding from reference audio | create_voice_embedding |
| Dense layer for final embedding | dense_layer |
| Download Chatterbox Models from HuggingFace | download_chatterbox_models |
| Download Chatterbox Turbo Models from HuggingFace | download_chatterbox_turbo_models |
| Drop invalid speech tokens | drop_invalid_tokens |
| Sinusoidal positional encoding (ESPnet RelPositionalEncoding) | espnet_rel_positional_encoding |
| Factorized Convolutional Module (FCM) | fcm_module |
| Feed-forward network for transformer (matches diffusers FeedForward: net = [GELU(proj), Dropout, Linear]) | feed_forward |
| FSMN Multi-Head Attention | fsmn_multi_head_attention |
| FSQ Codebook module | fsq_codebook |
| FSQ Vector Quantization wrapper | fsq_vector_quantization |
| GELU activation with projection (matches diffusers GELU structure) | gelu_with_proj |
| Generate speech from text | generate |
| Generate speech for several texts with one batched synthesis pass | generate_batch |
| Get padding for convolution | get_conv_padding |
| Get or create traced layers for cached inference | get_traced_layers |
| GPT-2 Attention (combined QKV projection) | gpt2_attention |
| GPT-2 Transformer Block | gpt2_block |
| GPT-2 Model Configuration | gpt2_config |
| GPT-2 Layer Normalization | gpt2_layer_norm |
| GPT-2 MLP (GELU activation) | gpt2_mlp |
| GPT-2 Model (transformer backbone) | gpt2_model |
| HiFiGAN Residual Block | hifigan_resblock |
| HiFTNet Generator | hift_generator |
| Initialize cache with first token K/V values | init_cache_from_first |
| Integrated loudness (ITU-R BS.1770-4) | integrated_loudness |
| Check if model is loaded | is_loaded |
| Learned position embeddings module | learned_position_embeddings |
| Linear No Subsampling layer | linear_no_subsampling |
| Llama attention module | llama_attention |
| Create Llama 520M configuration | llama_config_520m |
| Llama decoder layer | llama_decoder_layer |
| Llama MLP module | llama_mlp |
| Llama model (decoder only) | llama_model |
| RMS Normalization module | llama_rms_norm |
| Load Chatterbox model weights | load_chatterbox |
| Load Chatterbox Turbo model weights | load_chatterbox_turbo |
| Load Conformer Encoder weights | load_conformer_encoder_weights |
| Load weights from safetensors into Llama model | load_llama_weights |
| Load T3 turbo weights from safetensors | load_t3_turbo_weights |
| Load T3 weights from safetensors | load_t3_weights |
| Load tokenizer from JSON file (internal) | load_tokenizer |
| Load a voice embedding from disk | load_voice_embedding |
| Load voice encoder weights from safetensors | load_voice_encoder_weights |
| Create non-padding mask | make_non_pad_mask_s3 |
| Create padding mask | make_pad_mask |
| Convert mask to attention bias | mask_to_bias |
| Mish activation | mish_activation |
| Check if Models are Downloaded | models_available |
| Normalize audio to a target loudness | normalize_loudness |
| Normalize text for TTS | normalize_tts_text |
| Pad audio to multiple of token rate | pad_audio_for_tokenizer |
| Perceiver resampler for conditioning compression | perceiver_resampler |
| Positionwise Feed Forward | positionwise_feedforward |
| Pre-Lookahead Layer | pre_lookahead_layer |
| Precompute rotary position embedding frequencies | precompute_freqs_cis |
| Print method for chatterbox | print.chatterbox |
| Print method for chatterbox_gc_options | print.chatterbox_gc_options |
| Print method for voice_embedding | print.voice_embedding |
| Normalize punctuation for TTS | punc_norm |
| Quick TTS - one-line text-to-speech | quick_tts |
| Read audio file | read_audio |
| Reflection padding for 1D (nn_reflection_pad1d equivalent) | reflection_pad1d |
| Relative Position Multi-Headed Attention | rel_position_attention |
| Resample audio | resample_audio |
| Rotate half of the tensor for RoPE | rotate_half |
| S3 Audio Encoder V2 | s3_audio_encoder |
| Compute log mel spectrogram for S3Tokenizer | s3_log_mel_spectrogram |
| Multi-Head Attention base module | s3_multi_head_attention |
| Residual attention block | s3_residual_attention_block |
| S3Tokenizer V2 module | s3_tokenizer |
| S3Tokenizer model configuration | s3_tokenizer_config |
| S3Gen Token to Waveform | s3gen |
| Save a voice embedding to disk | save_voice_embedding |
| Serve chatterbox over HTTP | serve |
| Sine Generator | sine_gen |
| Sinusoidal positional embedding for timesteps | sinusoidal_pos_emb |
| Snake activation function | snake_activation |
| Source Module for Neural Source Filter | source_module_hn_nsf |
| Statistics pooling | statistics_pooling |
| Create T3 conditioning object | t3_cond |
| T3 conditioning encoder | t3_cond_enc |
| Move T3 conditioning to device | t3_cond_to_device |
| Create T3 configuration (English-only) | t3_config_english |
| Create T3 turbo configuration (GPT-2 backbone) | t3_config_turbo |
| T3 inference with JIT tracing (optimized) | t3_inference_traced |
| T3 Token-to-Token TTS model | t3_model |
| T3 Token-to-Token TTS model (Turbo variant with GPT-2 backbone) | t3_model_turbo |
| TDNN Layer | tdnn_layer |
| Timestep embedding MLP | timestep_embedding |
| Encode text to token IDs using BPE | tokenize_text |
| Traceable attention module with pre-allocated KV cache | traceable_attention |
| Traceable decoder layer with pre-allocated KV cache | traceable_decoder_layer |
| Traceable K/V projection module | traceable_kv_projector |
| Traceable transformer for cached inference | traceable_transformer_cached |
| Traceable transformer for first token (no cache) | traceable_transformer_first |
| Transit layer (channel reduction) | transit_layer |
| Transpose layer for use in sequential | transpose_layer |
| Generate speech for long text (the long-form policy layer) | tts_chunked |
| Generate speech and save to file | tts_to_file |
| Check if Turbo Models are Downloaded | turbo_models_available |
| Update KV cache with new K/V values | update_kv_cache |
| Update valid mask to include new position | update_valid_mask |
| Upsample 1D | upsample_1d |
| Upsample Conformer Encoder | upsample_conformer_encoder |
| Upsample Conformer Encoder | upsample_conformer_encoder_full |
| Convert speech to a target voice | voice_convert |
| Voice encoder module | voice_encoder |
| Voice encoder configuration | voice_encoder_config |
| Write audio file | write_audio |