--- title: "Introduction to the **levelSets** Package" author: "Richard Raubertas" date: "31 December 2025" output: pdf_document: toc: false toc_depth: 3 number_sections: false html_document: toc: false toc_depth: 3 number_sections: false vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{The levelSets Package} %\VignetteEncoding{UTF-8} --- ```{r, include=FALSE} knitr::opts_chunk$set(echo=TRUE, fig.align="center") ``` A level set of a function $f(x)$ is the set of input points $x$ for which the function value is greater than or equal to a specified threshold $t$^[Terminology is not fully standardized. This definition is also referred to as an *upper* level set or an excursion set. For brevity the term level set is used throughout this package.]: $$L_f(t) = \{x: f(x) \ge t\}$$ Applications of level sets include confidence or credible regions for parameters of statistical models (where $x$ is a vector of parameter values and $f$ is the likelihood or posterior density); regions where classification rules assign high probability to a given class ($x$ contains feature values for an item and $f$ is the class probability); and scientific or engineering models of physical phenomena (where one is interested in input regions where model output is above a threshold). To avoid confusion with `R` functions, $f$ will be referred to as the *response function*. Its inputs $x$ are vectors with $d$ elements, so are points in $d$-dimensional *input space*. Often the valid inputs to $f$ are only a subset of input space, and that subset is called the *feasible region* for $f$. This package maps out a level set by finding its intersection with collections of 1-dimensional *rays*. A ray is just a line with a defined origin (referred to as *position* 0 of the ray) and orientation. The use of rays to map out the boundary of confidence regions for parameters of statistical models was proposed by Kim and Lindsay (2011). See the vignette *levelSets Example: Confidence Regions* for an example of that application. This package generalizes and implements Kim and Lindsay's idea. The package provides tools to generate rays, find intersections, and visualize results. It makes few assumptions about $f$: it can be discontinuous, it may have a complicated feasible region, and the target level set may be non-convex or have multiple, disconnected parts. (Note that because $f$ can be discontinuous, boundary points of a level set do not necessarily satisfy $f(x) = t$.) However it does assume that level sets of $f$ have non-zero $d$-dimensional volume. For example in $d=2$ dimensions, level sets should occupy a non-zero area of the $(x_1, x_2)$ plane, rather than being a 1-dimensional line or curve. # An example To illustrate the idea of using rays to map out a level set, we examine a simple response function with a 2-dimensional input space. Other examples that illustrate a wider range of package features are in separate vignettes. The response function is $$f(x_1, x_2) = \left\{ \begin{array} {r@{\quad\mbox{if}\quad}l} -(x_1^2 + 2x_1x_2 + 2x_2^2) & x_1 + 2x_2 < 15 \\ -((x_1 - 8)^2 + 2(x_1 - 8)(x_2 - 8) + 2(x_2 - 8)^2) & x_1 + 2x_2 \ge 15 \end{array} \right. $$ The response surface has two hills with elliptical contours. Although this function is well-defined for any $(x_1, x_2)$, for illustration we define the feasible region to be $\{x: x_1 \ge 0\}$. The goal is to map out the level set where $f \ge -40$. The response and feasibility functions can be coded in `R` as ```{r} respf <- function(x) { c1 <- x[, 1]^2 + 2*x[, 1]*x[, 2] + 2*(x[, 2]^2) c2 <- (x[, 1]-8)^2 + 2*(x[, 1]-8)*(x[, 2]-8) + 2*((x[, 2]-8)^2) resp <- ifelse(x[, 1] + 2*x[, 2] < 15, -1*c1, -1*c2) resp } feasf <- function(x) { (x[, 1] >= 0) } ``` In this package point coordinates are always represented as rows of a $d$-column matrix, one row per point. Both the response and feasibility functions must accept as their first argument a matrix of point coordinates, and return a vector with one value per point: a numeric response value, and TRUE or FALSE, respectively. These functions are wrapped in an `fnObj` object that also contains specifications for the input space and possibly other specifications and arguments (see `?fnObj`). ```{r} library(levelSets) fobj <- fnObj(c("X1", "X2"), respfn=respf, feasfn=feasf) ``` If we just wish to evaluate the response function at a set of points, the `respInfo` function will do that. It returns a data frame with one row per point. Column `feas` reports whether the point is feasible and column `lset` whether the point is in the level set. ```{r} pts <- rbind(c(2, 2), c(4, -1), c(-1, 4), c(3, 5)) respInfo(pts, fnobj=fobj, lsetthresh=-40) ``` Because `fnObj` objects are aware of the feasible region of the response function, they ensure that the latter is never called with an infeasible point, such as `pts[3, ]` here. The response value is automatically set to `NA` for such points. For a level set boundary search we need to specify a region of input space where the search will be focused. Search regions are represented by `inpRect` objects, which are axis-aligned hyper-rectangles ($d$-dimensional "boxes"). They are created by the `inpRect` function by specifying the coordinates of the lower and upper corners of the box. ```{r} rect1 <- inpRect(rbind(c(-5, -10), c(15, 12)), spec=fobj) ``` And finally we create a set of rays along which boundary intersections will be searched for. ```{r} set.seed(1) rays1 <- randomRays(50, origin=c(2, 1), rect=rect1) plot(rays1, main="Random rays, position 1 defined by a rectangle") ``` Because of the key role played by rays in this package, it is important to understand how they are parameterized. A set of rays is represented by an `inpRays` object. All rays in the object have the same *origin* point, which defines *position* 0 along the rays. The direction of a ray is defined by a second, distinct point that defines position 1 along that ray. The position of any other point on the ray is defined by linear interpolation or extrapolation from the origin and position 1 points. (Ray points on the opposite side of the origin from the position 1 point have negative position values.) Note that there is no requirement that the Euclidean distance between positions 0 and 1 be the same for all rays, and thus any given position value may represent different Euclidean distances from the origin for different rays. In particular, when the `rect` argument is used when creating an `inpRays` object, position 1 for each ray is defined as its intersection with the boundary of that hyper-rectangle, as for `rays1` above. This turns out to be useful and convenient for level set boundary searches with a search region defined by `rect`. The function `bdryFromRays` finds intersections of rays with the level set. ```{r} segs1 <- bdryFromRays(fobj, rays=rays1, lsetthresh=-40) ``` It returns an object with class `lsetSegs`. These objects actually contain *pairs* of level set boundary points. The points in each pair lie along the same ray, and the search algorithm is designed to find point pairs for which the entire line *segment* connecting the pair belongs to the level set. (Such segments are sometimes called *chords* of the level set.) There are `plot` and `summary` methods for `lsetSegs` objects. ```{r} plot(segs1, rect=rect1, main="Level set at threshold -40 (50 rays)") summary(segs1) ``` This example shows how ray-boundary intersections can outline the shape and size of a level set. There are several things to note. - 34 segments extend to the boundary of the feasible region ($x_1 = 0$). Segment endpoints on the feasible region boundary are plotted with a different color and shape. - 17 rays had more than one level set segment found. Whenever there are two or more segments along a single ray, it indicates that the level set is not convex and/or has multiple parts. - However segments along a number of other rays cross the gap between the two ellipses. This leads us to suspect that they are *invalid segments*: The claim that all points on the segment connecting two boundary points also belong to the level set appears to be false. Checking and correcting this is described below. - Default parameters for the search algorithm allow the search to go modestly beyond the specified search region `rect1`, as indicated by segments that extend outside the dashed gray rectangle. This can be controlled by the `initPosns` and `initPosns2` control parameters (see `?srchControl`). The function `lsetSegsCheck` checks the validity of segments and attempts to fix invalid ones by splitting them into two or more shorter, valid segments. ```{r} chk1 <- lsetSegsCheck(segs1, fnobj=fobj, posns=(1:4)/5) table(chk1$validIn) table(chk1$validOut) segs1 <- lsetSegs(chk1) # all TRUE, so update 'segs1' plot(segs1, rect=rect1, main="Level set at threshold -40 (invalid segs fixed)") ``` Increasing the number of rays will give a clearer picture of the level set, at the cost of more computation. We can generate another set of rays and their level set intersections ```{r} rays2 <- randomRays(50, origin=c(2, 1), rect=rect1) segs2 <- bdryFromRays(fobj, rays=rays2, lsetthresh=-40, control=list(initPosns=(0:5)/5)) ``` and combine them with the first ```{r} segs12 <- c(segs1, segs2) # 'c' method for 'lsetSegs' objects plot(segs12, rect=rect1, main="Level set at threshold -40 (100 rays)") summary(segs12) ``` # Main search functions There are four main functions in the package to find level set boundary points and segments. - `bdryFromRays()`. The user defines a single set of rays (origin, directions, and, implicitly or explicitly, an associated search region). A search is made for level set segments along those rays in that region. Not every ray needs to intersect the level set, and in fact the search might not find any segments. But when the user has a good idea of the location of the level set and it does not have a complicated shape, this approach is simple and works well. - `bdrySearch()`. An algorithm adaptively selects multiple ray origins in an attempt to explore more of the level set than a single set of rays can. Useful when the level set has a complicated shape. - `slicedBdryFromRays()`, `slicedBdrySearch()`. When the input space has more than two dimensions, it may be useful to determine boundary points and segments within lower-dimensional *slices* of input space. A slice is just a (hyper)plane in which a subset of the inputs are held at fixed values. These functions apply the algorithms from `bdryFromRays` and `bdrySearch`, respectively, to each slice separately. In addition there is a function `respProfiles()` that simply evaluates the response function at a fixed set of positions along a user-specified set of rays. The vignette *Problem Setup* goes into more detail about how to use this package. There are also two vignettes with extended examples of how to use the package for different types of level set problems. # References {-} Kim, Daeyoung and Lindsay, Bruce G. 2011. Using confidence distribution sampling to visualize confidence sets. *Statistica Sinica* 21:923--948.