Title: Suite of Functions to Flexibly Read Data from Files
Description: A set of functions to simplify reading data from files. The main function, reader(), should read most common R datafile types without needing any parameters except the filename. Other functions provide simple ways of handling file paths and extensions, and automatically detecting file format and structure.
Authors: Nicholas Cooper
Maintainer: Nicholas Cooper <[email protected]>
License: GPL (>= 2)
Version: 1.0.6
Built: 2025-01-28 07:37:51 UTC
Source: CRAN
Package: reader
Type: Package
Version: 1.0.6
Date: 2016-12-29
License: GPL (>= 2)
The reader() function, for which the package is named, should be able to read most of the common types of datafiles used in R without needing any arguments other than the filename. The structure, header, file format and delimiter are determined automatically; usually no extra parameters are needed. Other functions provide similar flexibility, running contingent on data type and file format, or can look for an input file in multiple directory locations. The function cat.path() provides a simple interface to construct file paths from directories, prefixes, suffixes and file extensions. Functions in this package can be nested inside new functions, providing flexible parameter formats without having to use multiple if-statements to cope with contingencies (see the sketch below). Supported types include delimited text files, R binary files, big.matrix files, text list files, and unstructured text. Note that the file type that will be attempted to be read is initially determined by the file extension, using the function classify.ext().
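A minimal sketch of the nesting idea described above: summarise.source() is a hypothetical wrapper (not part of the package) that accepts either a data.frame or a file name, using force.frame() to cope with both.

library(reader)
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
# hypothetical wrapper: 'x' may be a data.frame/matrix or the name of a text/RData file
summarise.source <- function(x) {
  dat <- force.frame(x)              # coerce whatever was supplied into a data.frame
  c(rows = nrow(dat), cols = ncol(dat))
}
df <- data.frame(a = 1:3, b = letters[1:3])
write.csv(df, file = "temp_demo.csv", row.names = FALSE)
summarise.source(df)                 # works on the object directly
summarise.source("temp_demo.csv")    # works on a file name too
unlink("temp_demo.csv")
setwd(orig.dir) # reset working dir to original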
List of key functions:
cat.path Simple and foolproof way to create full-path file names.
classify.ext Classify file types readable by standard R I/O functions.
column.salvage Rename a column appearing in an alternate form to the desired name.
file.ncol Find the number of columns in a file.
file.nrow Find the number of rows (lines) in a file.
find.id.col Find which column in a dataframe contains a specified set of values.
shift.rownames Shift the first column of a dataframe to rownames()
force.frame Returns a dataframe if 'unknown.data' can in any way relate to one.
force.vec Returns a vector if 'unknown.data' can in any way relate to one.
get.delim Determine the delimiter for a text data file.
get.ext Get the file extension from a file-name.
is.file Test whether a file exists in a target directory.
conv.fixed.width Convert a matrix or dataframe to fixed-width format.
n.readLines Read 'n' lines (ignoring comments and header) from a file.
parse.args Function to collect arguments when running R from the command line.
reader Flexibly load from a text or binary file, accepts multiple file formats.
rmv.ext Remove the file extension from a file-name.
find.file Construct a path to a file, where multiple directories can be searched to find an existing file.
Nicholas Cooper
Maintainer: Nicholas Cooper <[email protected]>
See also: NCmisc
mydir <- "/Documents"
cat.path(mydir,"temp.doc","NEW",suf=5)
## example for the reader() function ##
df <- data.frame(ID=paste("ID",101:110,sep=""),
                 scores=sample(70,10,TRUE)+30, age=sample(7,10,TRUE)+11)
test.files <- c("temp.txt","temp2.csv","temp3.rda")
write.table(df,file=test.files[1],col.names=TRUE,row.names=TRUE,sep="\t",quote=FALSE)
# file.nrow and file.ncol examples
file.nrow(test.files[1])
file.ncol(test.files[1])
write.csv(df,file=test.files[2])
save(df,file=test.files[3])
# use the same simple reader() function call to read in each file type
for(cc in 1:length(test.files)) {
  cat(test.files[cc],"\n")
  myobj <- reader(test.files[cc]) # add 'quiet=FALSE' to see some working
  print(myobj); cat("\n\n")
}
# inspect files before deleting if desired:
# unlink(test.files)
# find id column in data frame
new.frame <- data.frame(day=c("M","T","W"),time=c(9,12,3),staff=c("Mary","Jane","John"))
staff.ids <- c("Mark","Jane","John","Andrew","Sally","Mary")
new.frame; find.id.col(new.frame,staff.ids)
Create a path with a file name, plus optional directory, prefix, suffix, and file extension. The 'dir' and 'ext' arguments are robust: if the file name already contains a directory or extension, the path produced will still make sense. The prefix is applied after the directory, and the suffix before the file extension.
cat.path(dir = "", fn, pref = "", suf = "", ext = "", must.exist = FALSE)
dir: directory for the full path; if 'fn' already has a dir, then dir will be overridden. A file separator is added automatically if not present.
fn: compulsory vector of file names/paths
pref: prefix to add in front of the file name
suf: suffix to add after the file name, before the extension
ext: file extension, will override an existing extension
must.exist: the specified file must already exist, else error
returns vector of file names with the full paths
Nicholas Cooper [email protected]
mydir <- "/Documents"
cat.path(mydir,"temp.doc")
# dir not added if one already present
cat.path(mydir,"/Downloads/me/temp.doc")
# using prefix and suffix
cat.path(mydir,"temp.doc","NEW",suf=5)
# changing the extension from .docx to .doc
cat.path(mydir,"temp.docx",ext="doc")
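Because 'fn' accepts a vector of file names, whole sets of paths can be built in one call; a brief sketch continuing from the example above (the file names are arbitrary):

# one full path per input file name, each with the 'run1_' prefix
cat.path(mydir, c("a.txt","b.txt","c.txt"), pref="run1_")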
Looks for known file extensions and classifies them as binary, comma-separated, text format, or OTH (other); 'other' files are assumed to be unreadable. To read such files, more types need to be specified manually via the 'more.txt', 'more.bin' and 'more.csv' arguments.
classify.ext(ext = NULL, more.txt = NULL, more.bin = NULL, more.csv = NULL, print.all = FALSE)
ext: filenames or extensions to classify
more.txt: more extensions that should be treated as txt
more.bin: more extensions that should be treated as binary
more.csv: more extensions that should be treated as csv
print.all: setting to TRUE simply prints the list of supported extensions
returns the 4-way classification for each file/extension
Nicholas Cooper [email protected]
classify.ext(c("test.txt","*.csv","tot","other","rda","test.RDatA"))
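Extensions outside the known list come back as 'other', but can be promoted via the 'more.*' arguments; a brief sketch (the 'log' and 'abc' extensions here are arbitrary examples, not package defaults):

# extensions absent from the known list are classified as 'other'
classify.ext(c("notes.log","scores.abc"))
# declare 'log' as a text type and 'abc' as a csv type for this call
classify.ext(c("notes.log","scores.abc"), more.txt="log", more.csv="abc")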
Searches for possible equivalents of a desired column in a dataframe and replaces the first name match with the desired name. Useful when parsing different annotation files which may have standard columns with slightly different names, e.g., Gender=SEX=sex=M/F, or ID=id=ids=samples=subjectID.
column.salvage(frame, desired, testfor, ignore.case = TRUE)
frame: a dataframe or matrix with column names
desired: the column name wanted
testfor: possible alternate forms of the desired column name
ignore.case: whether to ignore the upper/lower case of the column names
returns the original dataframe with the target column renamed
Nicholas Cooper [email protected]
df <- data.frame(Sex=c("M","F","F"),time=c(9,12,3),ID=c("ID3121","ID3122","ID2124"))
# standard example
new.df <- column.salvage(df,"sex",c("gender","sex","M/F")); df; new.df
# exact column already present so no change
new.df <- column.salvage(df,"ID",c("ID","id","ids","samples","subjectID")); df; new.df
# ignore.case=FALSE potentially results in not finding the desired column:
new.df <- column.salvage(df,"sex",c("gender","sex","M/F"),ignore.case=FALSE); df; new.df
Pads each column to a common width so that write.table() produces a fixed-width format that looks nice.
conv.fixed.width(dat)
dat: data.frame or matrix
returns dat with space padding as character
Nicholas Cooper [email protected]
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
df <- data.frame(ID=paste("ID",99:108,sep=""),
                 scores=sample(150,10,TRUE)+30, age=sample(16,10,TRUE))
dff <- conv.fixed.width(df)
write.table(df,file="notFW.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
write.table(dff,file="isFW.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
cat("Fixed-width:\n",paste(readLines("isFW.txt"),"\n"),sep="")
cat("standard-format:\n",paste(readLines("notFW.txt"),"\n"),sep="")
unlink(c("isFW.txt","notFW.txt"))
setwd(orig.dir) # reset working dir to original
Returns the number of columns in a datafile. File equivalent of ncol()
file.ncol(fn, reader = FALSE, del = NULL, comment = "#", skip = 0, force = FALSE, excl.rn = FALSE)
fn: name of the file(s) to count columns in
reader: whether to read the entire file to get the result; otherwise only the top few lines are examined (ignoring comments)
del: specify a delimiter (else this will be auto-detected)
comment: a comment symbol to ignore lines in files
skip: number of lines to skip at top of file before processing
force: try to read the file regardless of whether it looks like an invalid file type; only use when you know the files are valid
excl.rn: exclude rownames from column count (essentially subtract 1)
returns the number of columns in the file(s); if no delimiter is found, returns 1
Nicholas Cooper [email protected]
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
write.table(matrix(rnorm(100),nrow=10),"temp.txt",col.names=FALSE,row.names=FALSE)
file.ncol("temp.txt",excl.rn=TRUE)
unlink("temp.txt")
# find ncol for all files in current directory:
# [NB: use with caution, will be slow if dir contains large files]
# not run
# lf <- list.files(); if(length(lf)==0) { print("no files in dir") }
# lf <- lf[classify.ext(lf)=="TXT"]
# not run (only works if length(lf)>0)
# file.ncol(lf)
setwd(orig.dir) # reset working directory to original
Returns the number of lines in a file, which in the case of a datafile will often correspond to the number of rows, or rows+1. Can also do this for all files in the directory. File equivalent of nrow()
file.nrow(fn = "", dir = "", all.in.dir = FALSE)
fn: name of the file(s) to get the length of
dir: optional path for the fn location, or specify all files in dir
all.in.dir: select whether to extract the length for all files in dir
returns length of file (or all files)
Nicholas Cooper [email protected]
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
write.table(matrix(rnorm(100),nrow=10),"temp.txt",col.names=FALSE)
file.nrow("temp.txt")
# use with caution, will be slow if dir contains large files
# not run # file.nrow(all.in.dir=TRUE)
unlink("temp.txt")
setwd(orig.dir) # reset working directory to original
Looks for a file named 'fn' in 'dir', and if not found there, broadens the search to the list or vector of directories, 'dirs'. Returns the full path of the first match that exists.
find.file(fn, dir = "", dirs = NULL)
fn: name of the file to search for
dir: the first directory to look in (expected location)
dirs: vector/list, a set of directories to look in should the file not be found in 'dir'
if the file is found, returns the full path of the file, else returns an empty string ""
Nicholas Cooper [email protected]
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
l.fn <- "temp.txt"
writeLines("test",con=l.fn)
find.file(l.fn)
find.file(l.fn,dir=getwd())
unlink(l.fn)
# not run # common.places <- ## <<add local folder here>> ##
# not run # d.fn <- cat.path(common.places[1],l.fn) # write this example file to the first of the folders
# not run # if(!file.exists(d.fn)) { writeLines("test2",con=d.fn) }
# search the local folders for a file named 'temp.txt'
# not run # find.file(l.fn,dir=getwd(),dirs=common.places)
# unlink(d.fn) # run only if test file produced
setwd(orig.dir) # reset working dir to original
Starting with a list of ids, each column is searched. The column with the highest non-zero percentage matching is assumed to correspond to the id list. The search terminates early if a perfect match is found. Useful for assembling annotation from multiple sources.
find.id.col(frame, ids, ret = c("col", "maxpc", "index", "result"))
frame: a data.frame, or similar 2-dimensional object, which might contain ids
ids: a vector of IDs/values that might be found in at least 1 column of frame
ret: specify what should be returned, see the value section
'ret' can specify a list returning: 'col', the column number (col=0 for rownames) with the best match; 'maxpc', the percentage of ids found in the best matching column; 'index', the matching vector that maps the frame rows onto ids; 'result', the (sub)set of ids found in frame, with NAs given for ids not found.
Nicholas Cooper [email protected]
new.frame <- data.frame(day=c("M","T","W"),time=c(9,12,3),staff=c("Mary","Jane","John"))
staff.ids <- c("Mark","Jane","John","Andrew","Sally","Mary")
new.frame; staff.ids
find.id.col(new.frame,staff.ids)
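If only part of the output is needed, a single value of 'ret' can be requested; a brief sketch continuing from the example above (this assumes that naming one element of 'ret' restricts the result to that element):

# assuming a single 'ret' value returns only that element of the result
find.id.col(new.frame, staff.ids, ret="col")   # just the best-matching column number
find.id.col(new.frame, staff.ids, ret="index") # just the vector mapping frame rows onto ids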
Forces 'unknown.data' into a data.frame. It can be: a dataframe, matrix, big.matrix, sub.big.matrix or big.matrix.descriptor; a big.matrix description file; an RData file containing one of these objects; the name of a text or RData file; a named vector (names become rownames); or a list containing a matrix or dataframe. Using this within functions allows flexibility in the specification of a datasource.
force.frame(unknown.data, too.big = 10^7)
unknown.data: something that is, or can refer to, a 2D dataset
too.big: max size in GB, to prevent unintended conversion to matrix of a very large big.matrix object
returns a data.frame regardless of the original object type
Nicholas Cooper [email protected]
# create a matrix, binary file, text file, big.matrix.descriptor
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
test.files <- c("temp.rda","temp.txt")
mymat <- matrix(rnorm(100),nrow=10)
# not run yet # require(bigmemory)
save(mymat,file=test.files[1])
write.table(mymat,file=test.files[2],col.names=FALSE,row.names=FALSE)
test.frames <- list(mymat = mymat, myrda = test.files[1], mytxt = test.files[2])
# not run yet #: ,mybig = describe(as.big.matrix(mymat)) )
sapply(sapply(test.frames,is),"[",1)
# run the function on each, reporting specs of the object returned
for (cc in 1:length(test.frames)) {
  the.frame <- force.frame(test.frames[[cc]])
  cat(names(test.frames)[cc],": dim() => ", paste(dim(the.frame),collapse=","),
      "; is() => ",is(the.frame)[1],"\n",sep="")
}
unlink(test.files)
setwd(orig.dir) # reset working dir to original
Forces 'unknown.data' into a vector. If it is a vector, or the name of a file containing a vector, the data is returned or read directly; if it is a matrix or dataframe, the rownames are returned preferentially, otherwise the first column (designed to search for IDs). Using this within functions allows flexibility in the specification of a datasource for vectors.
force.vec(unknown.data, most.unique = TRUE, dir = NULL, warn = FALSE)
unknown.data: something that is, or can refer to, a 2D dataset
most.unique: if TRUE, select the most unique column when unknown.data is a matrix, else select the first column
dir: if unknown.data is a file name, specifies directory(s) to look for the file in
warn: whether to display a warning if unknown.data is a matrix
returns a vector regardless of the original object type
Nicholas Cooper [email protected]
# create a matrix, binary file, and simple vector
my.ids <- paste("ID",1:4,sep="")
my.dat <- sample(2,4,replace=TRUE)
test.files <- c("temp.rda")
mymat <- cbind(my.ids,my.dat)
save(mymat,file=test.files[1])
test.vecs <- list(myvec = my.ids, myrda = test.files[1], mymat = mymat)
# show dimensions of each test object
sapply(test.vecs,function(x) { if(is.null(dim(x))) { length(x) } else { dim(x) } })
# run the function on each, reporting specs of the object returned
for (cc in 1:3) {
  the.vec <- force.vec(test.vecs[[cc]])
  cat(names(test.vecs)[cc],": length() => ", length(the.vec),
      "; is() => ",is(the.vec)[1],"\n",sep="")
}
unlink(test.files)
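The 'most.unique' and 'warn' arguments control which column is taken from a matrix; a brief sketch under the behaviour described in the arguments above (the small matrix here is arbitrary):

m <- cbind(grp = rep("A",4), id = paste("ID",1:4,sep=""))
force.vec(m)                                # the most unique column ('id') is selected
force.vec(m, most.unique=FALSE, warn=TRUE)  # take the first column ('grp') instead, with a warning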
Reads the first few lines of a text data file and attempts to infer which delimiter is in use, based on which of the candidates in the 'delims' argument would result in the most consistent number of columns in the first 'n' lines of data. Searches preferentially for delimiters implying between 2 and 'large' columns, then for more than 'large', and lastly for 1 column if nothing else gives a match.
get.delim(fn, n = 10, comment = "#", skip = 0, delims = c("\t", "\t| +", " ", ";", ","), large = 10, one.byte = TRUE)
fn: name of the file to parse
n: the number of lines to read to make the inference
comment: a comment symbol to ignore lines in files
skip: number of lines to skip at top of file before processing
delims: the set of delimiters to test for
large: search initially for delimiters that imply more than 1 and fewer than 'large' columns; if none are in this range, look next at more than 'large'
one.byte: only check for one-byte delimiters [e.g., the whitespace regular expression is more than 1 byte]
returns a character string giving the most likely delimiter
Nicholas Cooper [email protected]
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
df <- data.frame(ID=paste("ID",101:110,sep=""),
                 scores=sample(70,10,TRUE)+30, age=sample(7,10,TRUE)+11)
# save data to various file formats
test.files <- c("temp.txt","temp2.txt","temp3.csv")
write.table(df,file=test.files[1],col.names=FALSE,row.names=FALSE,sep="|",quote=TRUE)
write.table(df,file=test.files[2],col.names=TRUE,row.names=TRUE,sep="\t",quote=FALSE)
write.csv(df,file=test.files[3])
# report the delimiters
for (cc in 1:length(test.files)) {
  cat("\n",test.files[cc],": ")
  print(get.delim(test.files[cc]))
}
unlink(test.files)
setwd(orig.dir) # reset working dir to original
Get the file extension from a file-name.
get.ext(fn)
fn: filename(s) (with full path is OK too)
returns the (usually) 3 character file extension of a filename
Nicholas Cooper [email protected]
get.ext("/documents/nick/mydoc.xlsx")
get.ext(c("temp.cnv","temp.txt"))
Looks for a file named 'fn' in 'dir', and if not found there, broadens the search to the list or vector of directories, 'dirs'. Returns TRUE or FALSE as to whether the file exists.
is.file(fn, dir = "", dirs = NULL, combine = TRUE)
fn: name of the file to search for
dir: the first directory to look in (expected location)
dirs: vector/list, a set of directories to look in should the file not be found in 'dir'
combine: if a list is given, test whether ALL files are valid
returns a logical vector of whether each file was found; or, if combine is TRUE, a single value indicating whether ALL files were valid
Nicholas Cooper [email protected]
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
l.fn <- "temp.txt"
writeLines("test",con=l.fn)
some.local.files <- narm(list.files()[1:10])
print(some.local.files)
is.file(l.fn)
is.file(l.fn,dir=getwd())
is.file(some.local.files)
# add a non-valid file to the list to see what happens
is.file(c(some.local.files,"fakefile.unreal"))
is.file(c(some.local.files,"fakefile.unreal"),combine=FALSE)
unlink(l.fn)
setwd(orig.dir) # reset working dir to original
Useful when you don't know the length or structure of a file and want a useful sample to look at. Can also skip ahead in the file, and copes well when there are fewer than 'n' lines in the file.
n.readLines(fn, n, comment = "#", skip = 0, header = TRUE)
fn: name of the file to read from
n: number of valid lines to attempt to read, looking at the top of the file (ignoring comments)
comment: a comment symbol to ignore lines in files
skip: number of lines to skip at top of file before processing
header: whether to allow for, and skip, a header row
returns the first n lines of the file meeting the criteria; if 'skip' implies lines beyond the length of the file, the result will be truncated, although in this case the last line will always be read.
Nicholas Cooper [email protected]
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
dat <- matrix(sample(100),nrow=10)
write.table(dat,"temp.txt",col.names=FALSE,row.names=FALSE)
n.readLines("temp.txt",n=2,skip=2,header=FALSE)
dat[3:4,]
unlink("temp.txt")
setwd(orig.dir) # reset working directory to original
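A further sketch of the comment and header handling described above (the file name and contents are arbitrary, and the default comment symbol '#' is assumed):

writeLines(c("# metadata comment","colA colB","1 2","3 4","5 6"), con="temp_nrl.txt")
n.readLines("temp_nrl.txt", n=2)                # the comment line and header row are skipped
n.readLines("temp_nrl.txt", n=2, header=FALSE)  # the header line is kept as data
unlink("temp_nrl.txt")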
Allows parameter specification from the command line by A=..., B=..., e.g., R < myScript.R M=1 NAME=John X=10.5, using commandArgs().
parse.args(arg.list = NULL, coms = c("X"), def = 0, list.out = F, verbose = TRUE)
arg.list: the result of a commandArgs() call, or else NULL to initiate this call within the function
coms: list of valid commands to look for, not case sensitive
def: list of default values for each parameter (in same order)
list.out: logical, whether to return output as a list or data.frame
verbose: logical, whether to print to the console which assignments are made, and warning messages
returns a dataframe showing the resulting values [column 1, "value"] for each of 'coms' (rownames); or, if list.out=TRUE, returns a list with names corresponding to 'coms' and values equivalent to the 'value' column of the data.frame that would otherwise be returned
Nicholas Cooper [email protected]
parse.args(c("M=1","NAME=John","X=10.5"),coms=c("M","X","NAME"))
parse.args(c("N=1")) # invalid command entered, ignored with warning
temp.fn <- "tempScript1234.R"
# make a temporary R script file to call using the command line
# not run # writeLines(c("require(reader)","parse.args(coms=c('M','X','NAME'))"),con=temp.fn)
bash.cmd <- "R --no-save < tempScript1234.R M=1 NAME=John X=10.5"
# run above command in the terminal, or using 'system' below:
# not run # arg <- system(bash.cmd)
# not run # unlink(temp.fn) # delete temporary file
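When a list is easier to work with than a data.frame, 'list.out' can be set; a brief sketch (the command names and defaults here are arbitrary; 'def' is matched to 'coms' by position, as documented above):

# returns a named list; 'X' was not supplied so falls back to its default
my.args <- parse.args(c("M=2","NAME=Ann"), coms=c("M","X","NAME"),
                      def=c(0, 1, "none"), list.out=TRUE)
my.args$NAME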
Uses file extension to distinguish between binary, csv or other text formats. Then tries to automatically determine other parameters necessary to read the file. Will attempt to detect the delimiter, and detect whether there is a heading/column names, and whether the first column should be rownames, or left as a data column. Internal calls to standard file reading functions use 'stringsAsFactors=FALSE'.
reader(fn, dir = "", want.type = NULL, def = "\t", force.read = TRUE, header = NA, h.test.p = 0.05, quiet = TRUE, treatas = NULL, override = FALSE, more.types = NULL, auto.vec = TRUE, one.byte = TRUE, ...)
fn: filename (with or without path if dir is specified)
dir: optional directory if a separate path/filename is preferred
want.type: if loading a binary file with multiple objects, specify here the is() type of the object you are trying to load
def: the default delimiter to try first
force.read: attempt to read the file even if the file type looks unsupported
header: presence of a header should be autodetected, but you can specify the header status here if you don't trust the autodetection
h.test.p: p value to discriminate between the number of characters in a column name versus a column value (sensitivity parameter for automatic header detection)
quiet: run without messages and warnings
treatas: a standard file extension, e.g., 'txt', to treat the file as
override: assume the first column is rownames, regardless of the heuristic
more.types: optionally add more file types which are read as text
auto.vec: if the file seems to have only a single column, automatically return the result as a vector rather than a dataframe with 1 column
one.byte: logical, passed to 'get.delim'; whether to look only for 1-byte delimiters, or to also search for 'whitespace', which is a multibyte (wildcard) delimiter type. Use one.byte=FALSE to read fixed-width files, e.g., many plink files.
...: further arguments to the function used by 'reader' to parse the file, which, depending on the file type, can be read.table(), read.delim() or read.csv()
returns the most appropriate object depending on the file type, which is usually a data.frame except for binary files
Nicholas Cooper [email protected]
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
# create some datasets
df <- data.frame(ID=paste("ID",101:110,sep=""),
                 scores=sample(70,10,TRUE)+30, age=sample(7,10,TRUE)+11)
DNA <- apply(matrix(c("A","C","G","T")[sample(4,100,TRUE)],nrow=10),1,paste,collapse="")
fix.wid <- c(" MyVal Results Check", " 0.234 42344 yes", " 0.334 351 yes",
             " 0.224 46 no", " 0.214 445391 yes")
# save data to various file formats
test.files <- c("temp.txt","temp2.txt","temp3.csv","temp4.rda","temp5.fasta","temp6.txt")
write.table(df,file=test.files[1],col.names=FALSE,row.names=FALSE,sep="|",quote=TRUE)
write.table(df,file=test.files[2],col.names=TRUE,row.names=TRUE,sep="\t",quote=FALSE)
write.csv(df,file=test.files[3])
save(df,file=test.files[4])
writeLines(DNA,con=test.files[5])
writeLines(fix.wid,con=test.files[6])
# use the same reader() function call to read in each file
for(cc in 1:length(test.files)) {
  cat(test.files[cc],"\n")
  myobj <- reader(test.files[cc]) # add 'quiet=FALSE' to see some working
  print(myobj); cat("\n\n")
}
# inspect files before deleting if desired
unlink(test.files)
# myobj <- reader(file.choose()); myobj # run this to attempt opening a file
setwd(orig.dir) # reset working directory to original
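Files with unrecognised extensions can still be read by naming a standard extension via 'treatas', and the header autodetection can be overruled with 'header'; a brief sketch (the '.xyz' extension and file name are arbitrary):

orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
df <- data.frame(ID=paste("ID",1:5,sep=""), score=1:5)
write.table(df, file="temp.xyz", sep="\t", row.names=FALSE, quote=FALSE)
# '.xyz' is not a known extension, so ask reader() to treat the file as tab-delimited text
reader("temp.xyz", treatas="txt")
# force the first line to be used as column names, bypassing the autodetection
reader("temp.xyz", treatas="txt", header=TRUE)
unlink("temp.xyz"); setwd(orig.dir) # tidy up and reset working dir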
The default is to only remove extensions from a known list of file types; this is to protect files containing '.' which may not have an extension. This option can be changed, and more types can be specified too.
rmv.ext(fn = NULL, only.known = TRUE, more.known = NULL, print.known = FALSE)
fn: filename(s) (with full path is OK too)
only.known: logical, only remove the extension if it is in the 'known' list
more.known: character vector, add to the list of known extensions
print.known: return the list of 'known' file extensions
returns the file name/path without the file extension
Nicholas Cooper [email protected]
rmv.ext(print.known=TRUE)
rmv.ext("/documents/nick/mydoc.xlsx")
rmv.ext(c("temp.cnv","temp.txt","temp.epi"))
# remove anything that looks like an extension
rmv.ext(c("temp.cnv","temp.txt","temp.epi"),only.known=FALSE)
# add to list of known extensions
rmv.ext(c("temp.cnv","temp.txt","temp.epi"),more.known="epi")
Checks whether the first column looks like IDs, and if so, removes the column and moves these values to rownames().
shift.rownames(dataf, override = FALSE, warn = FALSE)
dataf: data.frame to run the conversion on
override: assume col 1 is rownames, regardless of the numeric() test
warn: whether to display warnings if assumptions aren't met
returns the data.frame with the first column removed and its values shifted to rownames (or unchanged if the first column does not look like IDs)
Nicholas Cooper [email protected]
df1 <- data.frame(ID=paste("ID",101:110,sep=""),
                  scores=sample(70,10,TRUE)+30, age=sample(7,10,TRUE)+11)
shift.rownames(df1)
df2 <- data.frame(ID=paste(101:110),
                  scores=sample(70,10,TRUE)+30, age=sample(7,10,TRUE)+11)
shift.rownames(df2) # first col are all numbers, so no conversion
shift.rownames(df2,override=TRUE) # override forces conversion