---
title: "Introduction to punycoder"
author: "Package Author"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to punycoder}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

The `punycoder` package converts internationalized domain names (IDNs) between their Unicode form and their ASCII (Punycode) form. It is intended to fill a gap in R's URL-processing tooling by making that conversion reliable and fast.

## Why punycoder?

### The Problem

Internationalized domain names contain non-ASCII characters (for example café.com or москва.рф) and must be converted to an ASCII-compatible form before many network protocols and systems, DNS among them, can use them. Existing R packages have limitations:

- **Inconsistent legacy helpers**: existing helpers may produce incorrect Punycode output
- **Limited functionality**: no single package covers encoding, decoding, URL handling, and validation
- **Performance**: no efficient way to process large vectors of domains

### The Solution

`punycoder` provides:

- **Reliable encoding/decoding** following RFC 3492
- **URL-aware processing** for complete URL handling
- **High performance** for large datasets
- **Comprehensive validation** with informative error messages

## Basic Usage

### Domain Encoding and Decoding

```{r basic-usage, eval=FALSE}
library(punycoder)

# Encode Unicode domains to ASCII
puny_encode("café.com")
# Returns: "xn--caf-dma.com"

puny_encode("москва.рф")
# Returns: "xn--80adxhks.xn--p1ai"

# Decode ASCII domains back to Unicode
puny_decode("xn--caf-dma.com")
# Returns: "café.com"

# Vectorized operations
domains <- c("café.com", "москва.рф", "北京.中国")
encoded <- puny_encode(domains)
print(encoded)
```
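
To make the conversion less opaque, here is a minimal base-R sketch of the RFC 3492 encoding procedure. It is an illustration only and not the package's C++ implementation; the names `puny_label_sketch()` and `puny_domain_sketch()` are hypothetical. The sketch assumes the input is already lowercased and normalized, and it skips the mapping and validation steps that full IDNA processing requires.

```{r algorithm-sketch, eval=FALSE}
# Hypothetical helper: encode one label with the RFC 3492 algorithm.
puny_label_sketch <- function(label) {
  base <- 36L; tmin <- 1L; tmax <- 26L; skew <- 38L; damp <- 700L
  digits <- c(letters, 0:9)  # digit values 0..25 -> a..z, 26..35 -> 0..9

  # Bias adaptation function from RFC 3492, section 6.1
  adapt <- function(delta, numpoints, first) {
    delta <- if (first) delta %/% damp else delta %/% 2L
    delta <- delta + delta %/% numpoints
    k <- 0L
    while (delta > ((base - tmin) * tmax) %/% 2L) {
      delta <- delta %/% (base - tmin)
      k <- k + base
    }
    k + ((base - tmin + 1L) * delta) %/% (delta + skew)
  }

  cp <- utf8ToInt(enc2utf8(label))
  if (all(cp < 128L)) return(label)  # pure ASCII labels are left unchanged

  # Copy the basic (ASCII) code points first, followed by the delimiter
  basic <- cp[cp < 128L]
  out <- intToUtf8(basic)
  if (length(basic) > 0L) out <- paste0(out, "-")

  n <- 128L; delta <- 0L; bias <- 72L
  h <- b <- length(basic)
  while (h < length(cp)) {
    m <- min(cp[cp >= n])  # next smallest code point not yet handled
    delta <- delta + (m - n) * (h + 1L)
    n <- m
    for (ch in cp) {
      if (ch < n) delta <- delta + 1L
      if (ch == n) {
        # Emit delta as a generalized variable-length integer
        q <- delta
        k <- base
        repeat {
          thr <- if (k <= bias) tmin else if (k >= bias + tmax) tmax else k - bias
          if (q < thr) break
          out <- paste0(out, digits[thr + (q - thr) %% (base - thr) + 1L])
          q <- (q - thr) %/% (base - thr)
          k <- k + base
        }
        out <- paste0(out, digits[q + 1L])
        bias <- adapt(delta, h + 1L, h == b)
        delta <- 0L
        h <- h + 1L
      }
    }
    delta <- delta + 1L
    n <- n + 1L
  }
  paste0("xn--", out)
}

# Hypothetical helper: encode each dot-separated label of a domain.
puny_domain_sketch <- function(domain) {
  labels <- strsplit(domain, ".", fixed = TRUE)[[1]]
  paste(vapply(labels, puny_label_sketch, character(1), USE.NAMES = FALSE),
        collapse = ".")
}

puny_domain_sketch("caf\u00e9.com")  # "café.com"
```

Each label is encoded independently, which is why only the labels containing non-ASCII characters gain the `xn--` prefix.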

### URL Processing

```{r url-processing, eval=FALSE}
# Encode URLs with Unicode domains
url_encode("https://café.example.com/menu")

# Decode URLs back to Unicode
url_decode("https://xn--caf-dma.example.com/menu")

# Parse URLs with IDN handling
url_parts <- parse_url("https://café.example.com:8080/path?q=test#section")
print(url_parts)
```
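
URL-aware processing means that only the host is converted while the scheme, port, path, query, and fragment are left alone. The following base-R sketch shows one way to isolate the host with a regular expression. It is not how `url_encode()` is implemented; `encode_host_sketch()` is a hypothetical helper, the encoder is passed in as an argument (`toupper()` stands in for a real encoder here), and the pattern deliberately ignores userinfo (`user@host`) and bracketed IPv6 hosts.

```{r url-host-sketch, eval=FALSE}
# Hypothetical helper: apply `encode_host` to the host part of one URL.
encode_host_sketch <- function(url, encode_host) {
  # Groups: (scheme://) (host) (everything after the host)
  pattern <- "^([A-Za-z][A-Za-z0-9+.-]*://)([^/:?#]*)(.*)$"
  parts <- regmatches(url, regexec(pattern, url))[[1]]
  if (length(parts) == 0L) return(url)  # not an absolute URL: leave as-is
  paste0(parts[2], encode_host(parts[3]), parts[4])
}

# toupper() is only a stand-in that makes the touched span visible
encode_host_sketch("https://example.com:8080/path?q=test#section", toupper)
```

In real use the stand-in would be replaced by a domain encoder such as `puny_encode`.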

### Validation and Utilities

```{r validation, eval=FALSE}
# Check if domain is already punycode
is_punycode("xn--caf-dma.com")   # TRUE
is_punycode("café.com")          # FALSE

# Check if domain contains Unicode characters
is_idn("café.com")               # TRUE
is_idn("example.com")            # FALSE

# Comprehensive domain validation
result <- validate_domain(c("café.com", "invalid..domain", "valid.org"))
print(result)
```
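
The checks above can be approximated in base R, which may help when reasoning about what each predicate means. This is a sketch under stated assumptions, not the package's logic: all three helper names are hypothetical, and the length limits (63 octets per label, 253 for the whole name) are the usual DNS limits, which apply to the encoded ASCII form.

```{r validation-sketch, eval=FALSE}
# Hypothetical helper: does any label start with the ACE prefix "xn--"?
has_ace_prefix <- function(x) {
  grepl("(^|\\.)xn--", x, ignore.case = TRUE)
}

# Hypothetical helper: does the string contain a code point above 127?
has_non_ascii <- function(x) {
  vapply(enc2utf8(x), function(s) any(utf8ToInt(s) > 127L), logical(1),
         USE.NAMES = FALSE)
}

# Hypothetical helper: list structural problems in one ASCII domain name.
domain_problems <- function(domain) {
  if (!nzchar(domain)) return("empty domain")
  labels <- strsplit(domain, ".", fixed = TRUE)[[1]]
  problems <- character(0)
  if (any(!nzchar(labels))) {
    problems <- c(problems, "empty label")
  }
  if (any(nchar(labels, type = "bytes") > 63L)) {
    problems <- c(problems, "label longer than 63 octets")
  }
  if (nchar(domain, type = "bytes") > 253L) {
    problems <- c(problems, "name longer than 253 octets")
  }
  problems
}

has_ace_prefix(c("xn--caf-dma.com", "example.com"))
has_non_ascii(c("caf\u00e9.com", "example.com"))
domain_problems("invalid..domain")
```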

## Data Analysis Workflows

### Web Scraping with International Domains

```{r web-scraping, eval=FALSE}
# Example: Processing international URLs for web scraping
international_urls <- c(
  "https://café.paris.fr/menu",
  "https://москва.рф/news",
  "https://北京.中国/info"
)

# Convert to ASCII for HTTP requests
ascii_urls <- url_encode(international_urls)
print(ascii_urls)

# Process the data...

# Convert back to Unicode for display
display_urls <- url_decode(ascii_urls)
print(display_urls)
```

### Bulk Domain Processing

```{r bulk-processing, eval=FALSE}
# Example: processing a large vector with many repeated domains
sample_domains <- c(
  rep("example.com", 1000),
  rep("café.com", 1000),
  rep("test.org", 1000)
)

# Efficient vectorized encoding
system.time({
  encoded_domains <- puny_encode(sample_domains)
})

# Check results
table(is_punycode(encoded_domains))
```
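
When a vector contains many repeated domains, as in the example above, one general technique is to encode each distinct value once and map the results back to the original positions. The sketch below is independent of `punycoder`: `encode_unique()` is a hypothetical helper that accepts any vectorized encoder, and `toupper()` again stands in for a real one. Whether this helps in practice depends on how cheap the underlying encoder already is.

```{r dedupe-sketch, eval=FALSE}
# Hypothetical helper: run `encoder` on distinct values only.
encode_unique <- function(x, encoder) {
  uniq <- unique(x)
  encoded <- encoder(uniq)
  encoded[match(x, uniq)]  # expand back to the original length and order
}

# Stand-in encoder that counts how many values it was asked to process
calls <- 0L
counting_encoder <- function(x) {
  calls <<- calls + length(x)
  toupper(x)
}

input <- rep(c("example.com", "test.org"), 1000)
result <- encode_unique(input, counting_encoder)

length(result)  # same length as the input
calls           # number of values the encoder actually processed
```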

## Error Handling

By default, invalid input raises an error with an informative message. Non-strict mode returns `NA` for invalid elements instead:

```{r error-handling, eval=FALSE}
# Strict validation (default)
try({
  puny_encode(c("valid.com", ""))  # Empty string causes error
})

# Non-strict mode returns NA for invalid input
result <- puny_encode(c("valid.com", ""), strict = FALSE)
print(result)

# Validation provides detailed error information
validation <- validate_domain(c("valid.com", "invalid..domain", ""))
print(validation)
```
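
If you need both the successful results and the reason for each failure, a common base-R pattern is to wrap a strict encoder in `tryCatch()` element by element. The sketch below uses a hypothetical helper, `encode_collecting_errors()`, and a stand-in encoder that rejects empty strings; it does not reproduce the package's actual error messages.

```{r error-collection-sketch, eval=FALSE}
# Hypothetical helper: apply `encoder` per element and record any error.
encode_collecting_errors <- function(x, encoder) {
  results <- lapply(x, function(value) {
    tryCatch(
      list(value = encoder(value), error = NA_character_),
      error = function(e) {
        list(value = NA_character_, error = conditionMessage(e))
      }
    )
  })
  data.frame(
    input = x,
    output = vapply(results, `[[`, character(1), "value"),
    error = vapply(results, `[[`, character(1), "error"),
    stringsAsFactors = FALSE
  )
}

# Stand-in for a strict encoder; the message text is made up
strict_stand_in <- function(value) {
  if (!nzchar(value)) stop("input must not be empty")
  toupper(value)
}

report <- encode_collecting_errors(c("valid.com", ""), strict_stand_in)
print(report)
```

Calling the encoder once per element gives up the benefit of vectorization, so this pattern suits diagnostics better than bulk processing.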

## Performance Considerations

The package is designed for high-performance processing:

- **Vectorized operations**: Process thousands of domains efficiently
- **C++ backend**: Native implementation for speed
- **Memory efficient**: Handles large datasets without excessive memory use

```{r performance, eval=FALSE}
# Benchmark with large dataset
large_domains <- rep(c("example.com", "café.com"), 5000)

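timing <- system.time({
  encoded <- puny_encode(large_domains)
})

# Throughput in domains per second. The package aims for 10,000+,
# but the actual figure depends on hardware.
length(large_domains) / timing[["elapsed"]]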
```

## Package Options

You can configure package behavior using R options:

```{r options, eval=FALSE}
# Set global strict validation
options(punycoder.strict = FALSE)

# Check current setting
getOption("punycoder.strict")

# Set encoding preference
options(punycoder.encoding = "UTF-8")
```
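
Because `options()` changes are global, it can be safer to scope them to a single call and restore the previous value afterwards. The following sketch uses only base R; `with_punycoder_strict()` is a hypothetical helper, and the only thing taken from the package is the option name `punycoder.strict` shown above.

```{r scoped-option-sketch, eval=FALSE}
# Hypothetical helper: evaluate `code` with punycoder.strict set temporarily.
with_punycoder_strict <- function(strict, code) {
  old <- options(punycoder.strict = strict)
  on.exit(options(old), add = TRUE)  # restore the previous value on exit
  force(code)                        # `code` is evaluated lazily, i.e. here
}

before <- getOption("punycoder.strict")
inside <- with_punycoder_strict(FALSE, getOption("punycoder.strict"))
after <- getOption("punycoder.strict")

inside                    # the value seen while the helper was running
identical(before, after)  # the previous setting is restored
```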

## Integration with Other Packages

Because its functions are vectorized, `punycoder` can be used directly inside `data.table` and `dplyr` workflows:

```{r integration, eval=FALSE}
# With data.table
library(data.table)
dt <- data.table(original = c("café.com", "москва.рф"))
# Add the encoded column by reference instead of repeating the input vector
dt[, encoded := puny_encode(original)]

# With dplyr
library(dplyr)
urls_df <- data.frame(
  unicode_url = c("https://café.com", "https://москва.рф")
) |>
  mutate(
    ascii_url = url_encode(unicode_url),
    is_international = is_idn(unicode_url)
  )
```

## Next Steps

- Explore the function documentation: `help(package = "punycoder")`
- Check out the test suite for more examples
- Report issues at: https://github.com/bart-turczynski/punycoder/issues

## Technical Details

The package uses a C++ backend via Rcpp for performance and implements Punycode as specified in RFC 3492. When `libidn2` is available at build time, `punycoder` uses it behind the same R-level API; otherwise it falls back to the built-in implementation.