Package 'BurStMisc'

Title:	Burns Statistics Miscellaneous
Description:	Script search, corner, genetic optimization, permutation tests, write expect test.
Authors:	Pat Burns
Maintainer:	Pat Burns <[email protected]>
License:	Unlimited
Version:	1.1
Built:	2025-02-21 06:53:02 UTC
Source:	CRAN

Help Index

Write a Fragment of a C Program
Portion of an Array or Data Frame
Genetic Optimization
Control parameters for genopt
Membership of ntile groups
Random Permutation Test
Plot Distribution from a Permutation Test
Search R scripts for text
Summary of genopt object
Text for a testthat test

Write a Fragment of a C Program

Description

Writes either the C declaration of items in a list, or initializes them with the contents of the list.

Usage

Cfrag.list(x, file = NULL,  item.num = c(3, 10, 5),
	indent = c("\t", "\t\t"), declaration.only = FALSE,
	long = FALSE, append = FALSE)
Cfrag.list(x, file = NULL,  item.num = c(3, 10, 5),
	indent = c("\t", "\t\t"), declaration.only = FALSE,
	long = FALSE, append = FALSE)

Arguments

`x`	required. A list with names.
`file`	either `NULL` or a character string. If `NULL` (or an empty string), then a vector of characters is returned.
`item.num`	length three vector giving the number of items per line for doubles, integers and characters.
`indent`	length two vector giving the amount to indent declarations and the items in the initialization.
`declaration.only`	logical flag. If `TRUE`, then the variables are declared but not given initial values.
`long`	logical flag. If `TRUE`, then integers are declared to be "long". If `FALSE`, then integers are declared to be "int".
`append`	logical flag. If `TRUE`, then the file is appended to; otherwise it is overwritten if it exists.

Value

if file is a non-empty character string, then the name of the file that is written. Otherwise, a character vector of the declarations – each element representing a different line.

Side Effects

if file is a non-empty character string, then the file is created, overwritten or appended.

Details

The type to declare is dependent on the storage mode of the component of x. You may need to coerce components to get them to be declared the correct type.

Note

Rcpp now probably makes most uses of this function obsolete.

Examples

test.list <- list(adoub=as.double(rnorm(20)), anint=as.integer(92:109),
	achar=c("aaa", "bbbb", "ccccc"))

Cfrag.list(test.list, file="")

## Not run: 
Cfrag.list(test.list, file="test.c")

Cfrag.list(test.list[1], file="test.c", dec=TRUE)
Cfrag.list(test.list[-1], file="test.c", dec=FALSE, append=TRUE)

## End(Not run)
test.list <- list(adoub=as.double(rnorm(20)), anint=as.integer(92:109),
	achar=c("aaa", "bbbb", "ccccc"))

Cfrag.list(test.list, file="")

## Not run: 
Cfrag.list(test.list, file="test.c")

Cfrag.list(test.list[1], file="test.c", dec=TRUE)
Cfrag.list(test.list[-1], file="test.c", dec=FALSE, append=TRUE)

## End(Not run)

Portion of an Array or Data Frame

Description

Returns an array (or data frame) of the same number of dimensions as the input, but generally with smaller dimensions.

Usage

  corner(x, corner = "tlffff", n = 6)
corner(x, corner = "tlffff", n = 6)

Arguments

`x`	an array (which includes matrices) or data frame.
`corner`	a single character string which codes the particular corner of the array that is desired. The first character is either "t" (top) or "b" (bottom). The second character is either "l" (left) or "r" (right). Remaining characters should be either "f" (front) or "b" (back).
`n`	a vector of positive integers which generally should have length at least that of the `dim` of the array – it is replicated if not. This states (the maximum of) each of the dimensions of the result.

Value

an array similar to the input x, but with smaller dimensions in general.

Examples

  corner(freeny.x)
  corner(freeny.x, "br", c(2,3))

  # with three-dimensional arrays:
  corner(iris3)
  corner(array(1:1000, rep(10, 3)), "brf", 2:4)
corner(freeny.x)
  corner(freeny.x, "br", c(2,3))

  # with three-dimensional arrays:
  corner(iris3)
  corner(array(1:1000, rep(10, 3)), "brf", 2:4)

Genetic Optimization

Description

Approximately minimizes the value of a function using a simple heuristic optimizer that uses a combination of genetic and simulated annealing optimization.

Usage

  genopt(fun, population, lower = -Inf, upper = Inf,
         scale = dcontrol["eps"], add.args = NULL,
         control = genopt.control(...), ...)
genopt(fun, population, lower = -Inf, upper = Inf,
         scale = dcontrol["eps"], add.args = NULL,
         control = genopt.control(...), ...)

Arguments

`fun`	function that returns a numerical value.
`population`	a matrix or a list. If a matrix, then the rows correspond to the parameters and the columns are different parameter vectors. If a list, then it should have a component named "population" and optionally a component named "objective" – in particular, it can be the result of a call to `genopt`. If the "objective" component exists, then it is assumed to be the function values of the population and the function will not be evaluated on the members of the population.
`lower`	vector giving the lower bound for parameter values. This is replicated to be as long as the number of parameters.
`upper`	vector giving the upper bound for parameter values. This is replicated to be as long as the number of parameters.
`scale`	vector of scales to use when doing local search with a solution. This is replicated to be as long as the number of parameters.
`add.args`	list of additional arguments to `fun`.
`control`	an object like the output of `genopt.control` containing control parameters for the optimization.
`...`	arguments for `genopt.control` if the `control` argument is not given explicitly.

Details

There is a summary method for class genopt which shows the call, a summary of the set of objectives found, and the best solution (set of parameters).

Value

a list of class genopt with the following components:

`population`	a matrix of the same size as the input population matrix, but generally with different values in it.
`objective`	a vector with length equal to the number of columns of `population` containing the function value for the corresponding parameter vector in `population`.
`funevals`	the number of function evaluations performed. If `population` is given as a list that contains a "funevals" component, then that number is added to the number of evaluations done in the current call.
`random.seed`	the random seed at the start of the call – given so that you can reproduce the computations.
`call`	an image of the call that created this object.

References

The original version of this function appeared in "S Poetry".

Examples

  # two parameters, population size 5
  go1 <- genopt(function(x, other) sum(x, other),
                population=matrix(rexp(10), nrow=2, ncol=5),
                lower=0, add.arg=list(other=3), trace=FALSE)
  summary(go1)

  go2 <- genopt(function(x, other) sum(x, other), population=go1,
                lower=0, add.arg=list(other=3), trace=FALSE)

# two parameters, population size 5
  go1 <- genopt(function(x, other) sum(x, other),
                population=matrix(rexp(10), nrow=2, ncol=5),
                lower=0, add.arg=list(other=3), trace=FALSE)
  summary(go1)

  go2 <- genopt(function(x, other) sum(x, other), population=go1,
                lower=0, add.arg=list(other=3), trace=FALSE)

Control parameters for genopt

Description

Returns a list suitable as the control argument of the genopt function.

Usage

  genopt.control(births = 100, random.n = 0, jitters.n = 3,
                 trace = TRUE, eps = 0.1, prob = 0.4, scale.min = 1e-12,
                 maxeval = Inf)
genopt.control(births = 100, random.n = 0, jitters.n = 3,
                 trace = TRUE, eps = 0.1, prob = 0.4, scale.min = 1e-12,
                 maxeval = Inf)

Arguments

`births`	the number of times two parents are combined to create a new solution.
`random.n`	the number of random solutions created (and evaluated) before the genetic phase starts.
`jitters.n`	the number of times a newly successful solution is changed (in the simulated annealing phase) in an attempt to find a nearby better one.
`trace`	logical value: should progress be printed?
`eps`	the default value for the scale of the jittering in simulated annealing.
`prob`	in the selection of parameters between two parents, the probability of each coming from the first parent.
`scale.min`	the minimum value allowed in the `scale` argument of `genopt`.
`maxeval`	the maximum number of function evaluations allowed. This takes previous function calls into account if `funevals` is a component of the `population` argument of `genopt`.

Value

a list with components:

`icontrol`	vector of the control parameters that are logically integer (or logical).
`dcontrol`	vector of the control parameters that are logically real-valued.

Examples

  goc1 <- genopt.control(random.n=200, births=1000, trace=FALSE)

  go1 <- genopt(function(x, other) sum(x, other),
                matrix(rexp(10), nrow=2, ncol=5), lower=0, add.arg=list(other=3),
                control=goc1)

  goc2 <- genopt.control(random.n=0, births=1000, trace=FALSE)

  go2 <- genopt(function(x, other) sum(x, other), go1,
                lower=0, add.arg=list(other=3), control=goc2)

goc1 <- genopt.control(random.n=200, births=1000, trace=FALSE)

  go1 <- genopt(function(x, other) sum(x, other),
                matrix(rexp(10), nrow=2, ncol=5), lower=0, add.arg=list(other=3),
                control=goc1)

  goc2 <- genopt.control(random.n=0, births=1000, trace=FALSE)

  go2 <- genopt(function(x, other) sum(x, other), go1,
                lower=0, add.arg=list(other=3), control=goc2)

Membership of ntile groups

Description

Creates groups where the groups each have as close to the same number of members as possible.

Usage

ntile(x, ngroups, na.rm = FALSE, result = "list",
    reverse = FALSE, checkBleed = TRUE)
ntile(x, ngroups, na.rm = FALSE, result = "list",
    reverse = FALSE, checkBleed = TRUE)

Arguments

`x`	a numeric vector.
`ngroups`	a positive integer giving the number of groups to create.
`na.rm`	logical value: if `TRUE`, then missing values are removed before forming the groups.
`result`	a character string specifying the form of the resulting value. This must be (an abbreviation of) one of: `"list"`, `"numeric"`, `"factor"`
`reverse`	logical value: if `TRUE`, then the biggest values are in group 1.
`checkBleed`	logical value: if `TRUE`, then a warning is issued if the same value appears in more than one group.

Value

the form of the value depends on the result argument.

The "list" result has the values of the input x grouped into ngroups components.

The "numeric" result is a vector of integers from 1 to ngroups indicating which group the corresponding element of x is in.

The "factor" result is an ordered factor version of the "numeric" result.

Note

A more primitive version of this function appeared in a blog post called "Miles of iles" on the Portfolio Probe website. A bit of comparison with the alternative functions in See Also may be found there.

Author(s)

Pat Burns

Examples

ntile(setNames(state.area, state.name), 10)

ntile(Loblolly$height, 5, result="factor", reverse=TRUE)

ntile(c(-10:10, rep(0, 7)), 4)
ntile(setNames(state.area, state.name), 10)

ntile(Loblolly$height, 5, result="factor", reverse=TRUE)

ntile(c(-10:10, rep(0, 7)), 4)

Random Permutation Test

Description

Performs a random permutation test on the relationship between two discrete variables, or by using a function.

Usage

permutation.test.discrete(x, y = NULL, scores, alternative = "greater", 
	trials = 1000)
permutation.test.fun(x, y = NULL, fun = function(x, y) sum(x * y), 
	alternative = "greater", trials = 1000)
permutation.test.discrete(x, y = NULL, scores, alternative = "greater", 
	trials = 1000)
permutation.test.fun(x, y = NULL, fun = function(x, y) sum(x * y), 
	alternative = "greater", trials = 1000)

Arguments

`x`	either a two-column matrix or data frame, or a vector. When this has two columns, `y` should not be given.
`y`	either a zero-length object, or a vector the same length as `x`.
`scores`	a numeric matrix providing the scores for each combination of the unique values in `x` and `y`. The dimnames need to consist of these unique values.
`fun`	a function that takes two arguments and returns a single numeric value.
`alternative`	a character string that partially matches either `"greater"` or `"less"`.
`trials`	the number of random permutations to be performed.

Value

an object of class permtstBurSt which is a list with the following components:

`original.score`	the score (or function value) produced by the original data.
`perm.scores`	a vector of the scores (or function values) from the random permutations.
`stats`	a numeric vector that has the number of observations in the data, the number of random permutations done, the number of permutations that produced a score at least as extreme as the original, and the p-value for the test.
`alternative`	either `"greater"` or `"less"`.
`random.seed`	the random seed at the start of the call.
`call`	an image of the call that was used.

Side effects

The object .Random.seed is either created or updated.

Details

There are print and plot methods for this class of object. The print method merely describes the object and shows the p-value of the test (rounded, by default, to 4 digits).

A simple version of the p-value is the number of random permutations that are at least as extreme as the original divided by the total number of random permutations. The value computed, which is more correct, has 1 added to both numerator and denominator. In general the difference is of no consequence. However, there are cases where it does matter, for example when independent p-values are combined.

These functions are related to "Permuting Super Bowl Theory" which can be found in the working papers section of http://www.burns-stat.com. The paper explains permutation tests via a discussion of the Super Bowl indicator of the stock market.

Examples

winner <- c('N', 'N', 'A', 'N', 'A', 'N')
market <- c('+', '-', '-', '+', '+', '+')
smat <- diag(2)
dimnames(smat) <- list(c('N', 'A'), c('+', '-'))

pt1 <- permutation.test.discrete(winner, market, smat)
print(pt1)
plot(pt1)

pt2 <- permutation.test.fun(ToothGrowth[, -2], fun=cor)
print(pt2)
plot(pt2)


smat2 <- matrix(c(-3, -.5, 3, -1, 1, 0, 0, 1, -1, 3, -.5, -3),
	3, 4, dimnames=list(c('Up', 'Neut', 'Down'), 
	c('Q1', 'Q2', 'Q3', 'Q4')))
my.results <- data.frame(results=sample(c('Up', 'Neut', 'Down'),
       100, replace=TRUE), quartile=sample(
       c('Q1', 'Q2', 'Q3', 'Q4'), 100, replace=TRUE))
permutation.test.discrete(my.results[, c("results", "quartile")], 
	score=smat2)

winner <- c('N', 'N', 'A', 'N', 'A', 'N')
market <- c('+', '-', '-', '+', '+', '+')
smat <- diag(2)
dimnames(smat) <- list(c('N', 'A'), c('+', '-'))

pt1 <- permutation.test.discrete(winner, market, smat)
print(pt1)
plot(pt1)

pt2 <- permutation.test.fun(ToothGrowth[, -2], fun=cor)
print(pt2)
plot(pt2)


smat2 <- matrix(c(-3, -.5, 3, -1, 1, 0, 0, 1, -1, 3, -.5, -3),
	3, 4, dimnames=list(c('Up', 'Neut', 'Down'), 
	c('Q1', 'Q2', 'Q3', 'Q4')))
my.results <- data.frame(results=sample(c('Up', 'Neut', 'Down'),
       100, replace=TRUE), quartile=sample(
       c('Q1', 'Q2', 'Q3', 'Q4'), 100, replace=TRUE))
permutation.test.discrete(my.results[, c("results", "quartile")], 
	score=smat2)

Plot Distribution from a Permutation Test

Description

Shows the distribution of the scores of the random permutations relative to the score from the actual data.

Usage

## S3 method for class 'permtstBurSt'
plot(x, col = c("black", "red"), width = 10, 
	uniqlim = 10, main = "", xlab = "Scores", ...)
## S3 method for class 'permtstBurSt'
plot(x, col = c("black", "red"), width = 10, 
	uniqlim = 10, main = "", xlab = "Scores", ...)

Arguments

`x`	an object of class `permtstBurSt`, which is likely the result of: `permutation.test.discrete` or `permutation.test.fun`.
`col`	the colors to use in the plot – mainly for the bar-type plot when there are only a few scores in the permutation distribution. The first color is the one used for the insignificant part of the distribution, the second color is for the significant part of the distribution. The second color is used to identify the original score when a histogram is used.
`width`	the width of the bars.
`uniqlim`	the value that determines which type of plot to use. If the number of unique values in the scores from the permutations is bigger than `uniqlim`, then a histogram is drawn. Otherwise, bars are drawn.
`main`	character string giving the main title for the plot.
`xlab`	character string giving the label for the x-axis.
`...`	additional graphics parameters may be given.

Side effects

a plot is produced.

Details

Two styles of plot are possible. A histogram is more appropriate when there are more than a few values in the permutation distribution. When there are only a few distinct values, then a bar-type plot is more informative. The uniqlim argument controls the definition of "a few".

Examples

## Not run: 
pt1 <- permutation.test.discrete(winner, market, smat)
plot(pt1, col=c("blue", "gold"))

## End(Not run)
## Not run: 
pt1 <- permutation.test.discrete(winner, market, smat)
plot(pt1, col=c("blue", "gold"))

## End(Not run)

Search R scripts for text

Description

Returns a list of text matches in the scripts in a directory.

Usage

scriptSearch(pattern, path = ".", subdirs = TRUE, suffix = "\\.[rR]$",
      commentsIncluded = FALSE, ..., verbose = FALSE)
scriptSearch(pattern, path = ".", subdirs = TRUE, suffix = "\\.[rR]$",
      commentsIncluded = FALSE, ..., verbose = FALSE)

Arguments

`pattern`	character string containing a regular expression (this is the argument of the same name for `grep`).
`path`	a character string giving the (relative) path of the directory holding the R scripts.
`subdirs`	logical value: if `TRUE`, then subdirectories of `path` are also searched.
`suffix`	a character string restricting the files to be searched.
`commentsIncluded`	logical value: if `FALSE`, then lines beginning with the hash symbol are ignored.
`...`	additional arguments to `grep` may be given – the most likely possibilities are `ignore.case`, `perl` and `fixed`.
`verbose`	logical value: if `TRUE`, then the names of the scripts being searched are printed. The full name of this argument needs to be given – no abbreviation is allowed.

Details

The default suffix argument restricts the search to R scripts. An R script is considered to be a file that ends in dot-R, where the "R" can be lowercase or uppercase.

Value

a list: the names are equal to the file names containing matches, and each component is a character vector of the whitespace-trimmed lines in the file that match the pattern.

Limitations

Using commentsIncluded=FALSE does not preclude a match being declared when it is in a comment not starting the line.

Author(s)

Pat Burns

Examples

## Not run: 
scriptSearch("list.files")
scriptSearch("garch", "~/../burns-stat3/webpages/blog")

# search for an assignment
# ' *' means zero or more spaces
scriptSearch("specialObject *<- ", "~/myScriptDirectory", subdirs=FALSE)

# search COBOL files
scriptSearch("blah", suffix="\\.cbl$")


## End(Not run)
## Not run: 
scriptSearch("list.files")
scriptSearch("garch", "~/../burns-stat3/webpages/blog")

# search for an assignment
# ' *' means zero or more spaces
scriptSearch("specialObject *<- ", "~/myScriptDirectory", subdirs=FALSE)

# search COBOL files
scriptSearch("blah", suffix="\\.cbl$")


## End(Not run)

Summary of genopt object

Description

The call, best solution and summary of objectives in the final population.

Usage

## S3 method for class 'genopt'
summary(object, ...)
## S3 method for class 'genopt'
summary(object, ...)

Arguments

`object`	the result of a call to `genopt`.
`...`	currently unused.

Value

a list with components:

`call`	image of the call to `genopt`.
`summary.objectives`	summary of the objectives of the solutions in the final population.
`best.solution`	the vector of parameters with the best solution in the population.

Examples

  # two parameters, population size 5
  go1 <- genopt(function(x, other) sum(x, other),
                population=matrix(rexp(10), nrow=2, ncol=5),
                lower=0, add.arg=list(other=3), trace=FALSE)
  summary(go1)
# two parameters, population size 5
  go1 <- genopt(function(x, other) sum(x, other),
                population=matrix(rexp(10), nrow=2, ncol=5),
                lower=0, add.arg=list(other=3), trace=FALSE)
  summary(go1)

Text for a testthat test

Description

Automatically writes the text of a testthat::expect_equal test of an expression.

Usage

writeExpectTest(expr, filename = "", ...)
writeExpectTest(expr, filename = "", ...)

Arguments

`expr`	an expression to be tested.
`filename`	where should the results go to? The empty string means that it goes to the console.
`...`	additional arguments to `cat` may be given to control the output. The most likely choice is `append`.

Value

NULL

Side effects

text is written to the console or to a file.

Note

Some functions are hard to test, like scriptSearch. It seems like this one would be too, but in fact it is self-testing. If the test it writes doesn't pass in the same environment, then it must have done something wrong.

Examples

writeExpectTest(head(1:10))
writeExpectTest(head(cars))
writeExpectTest(head(1:10))
writeExpectTest(head(cars))