Title: | Utilities for Importing and Manipulating Biomedical Data Files |
---|---|
Description: | Tools to read various file types into one list of data structures, usually, but not limited to, data frames. Excel files are read sheet-wise, i.e., all or a selection of sheets can be read. Field delimiters and decimal separators are determined automatically. |
Authors: | Vidal Fey |
Maintainer: | Vidal Fey <[email protected]> |
License: | GPL-3 |
Version: | 0.2-12 |
Built: | 2024-12-22 06:36:15 UTC |
Source: | CRAN |
Determine the number of lines in a (large) text file without importing it.
get.nlines(file, n = 1, pattern = NULL, incl.header = FALSE)
file |
|
n |
|
pattern |
|
incl.header |
|
An integer value.
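The idea of counting lines without importing a file can be illustrated with a minimal base-R sketch. The helper `count_lines` and its `chunk_size` argument are hypothetical and are not the package's `get.nlines` implementation; the sketch only assumes that lines end in a newline byte.

```r
# Hypothetical helper, not the package's get.nlines(): counts newline
# bytes chunk by chunk, so the file is never held in memory as a whole.
count_lines <- function(path, chunk_size = 65536L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  n <- 0L
  last_byte <- raw(0)
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    n <- n + sum(chunk == as.raw(10L))
    last_byte <- chunk[length(chunk)]
  }
  # Count a final line that lacks a trailing newline.
  if (length(last_byte) == 1L && last_byte != as.raw(10L)) n <- n + 1L
  n
}

# Usage on a small temporary file: one header line plus two data rows.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tvalue", "a\t1", "b\t2"), tmp)
n_lines <- count_lines(tmp)
unlink(tmp)
```

Reading raw bytes in fixed-size chunks keeps memory use constant regardless of file size, which is the point of not importing the file first.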
Determine field delimiter in text files
get.sep(file, n = 1, pattern)
file |
|
n |
|
pattern |
|
If successful, the field delimiter. If more than one of the possible delimiters is found, an error is returned.
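One way such a detection could work is sketched below in base R. The helper `guess_sep` and its list of candidate delimiters are assumptions made for illustration; the set of delimiters the package actually tests is not stated here.

```r
# Hypothetical helper, not the package's get.sep(): a candidate counts as
# a delimiter if it occurs at least once and equally often on every line.
guess_sep <- function(lines, candidates = c("\t", ";", ",", "|")) {
  counts <- vapply(candidates, function(d) {
    per_line <- lengths(regmatches(lines, gregexpr(d, lines, fixed = TRUE)))
    if (per_line[1] > 0L && all(per_line == per_line[1])) per_line[1] else 0L
  }, integer(1))
  hits <- candidates[counts > 0L]
  if (length(hits) == 0L) stop("no delimiter found")
  if (length(hits) > 1L) stop("more than one possible delimiter found")
  hits
}

# Usage: three tab-separated lines yield the tab character.
sep <- guess_sep(c("id\tvalue\tgroup", "a\t1.5\tx", "b\t2.5\ty"))
```

As in the description above, the sketch raises an error rather than guessing when more than one candidate fits.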
get.skip attempts to determine the number of rows that could be skipped when reading text files.
get.skip(file, n = 1, pattern = NULL)
file |
( |
n |
( |
pattern |
( |
The skip value. If no value is determined, 0 (zero) is returned.
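A minimal base-R sketch of the idea follows, assuming the header line can be recognised by a regular expression. The helper `guess_skip` is hypothetical and does not reproduce how `get.skip` uses its `n` and `pattern` arguments.

```r
# Hypothetical helper, not the package's get.skip(): the skip value is the
# number of lines before the first line matching `pattern`, or 0 if none.
guess_skip <- function(lines, pattern) {
  hit <- grep(pattern, lines)
  if (length(hit) == 0L) return(0L)
  hit[1] - 1L
}

lines <- c("# exported 2024-01-01", "# instrument A", "id\tvalue", "a\t1")
skip <- guess_skip(lines, "^id\t")

# The result can be passed on to a standard reader.
dat <- read.table(text = lines, skip = skip, header = TRUE, sep = "\t")
```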
read.to.list is meant to act as a universal reading function as it attempts to read a number of different file formats into a list of data frames.
read.to.list( dat, type, folder, nsheets = 1, sheet = NULL, keep.tibble = FALSE, skip = 0, sep = NULL, lines = FALSE, dec = NULL, ..., verbose = TRUE, x.verbose = FALSE )
dat |
|
type |
|
folder |
|
nsheets |
|
sheet |
|
keep.tibble |
|
skip |
|
sep |
|
lines |
|
dec |
|
... |
Additional arguments passed to functions. |
verbose |
|
x.verbose |
|
Excel files (file extension .xls or .xlsx) are read by readxl::read_excel. A test is attempted to determine whether the input file is genuinely derived from Excel or only named like an Excel file. If the latter, an attempt is made to read it as a text file.
Text files are read as tables, or by line if lines is TRUE. For text files, field delimiters and decimal separators are determined automatically if not provided.
Files with the extensions ".txt", ".tsv", ".csv", ".gtf" and ".gff" are treated and read as text files.
VCF files are also treated as text files but can only be read in full (incl. header) if read by line. Otherwise, if skip is 0, the line with the column names is determined automatically and the file is read as a delimited text file.
XML files are read by xml2::read_xml.
".RData" files are loaded and assigned a name.
".rds" and ".rda" files are read by readRDS.
".xdr" files are read by R.utils::loadObject.
A list of tibbles/data frames.
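How the package tests whether a file is genuinely an Excel file is not described here. One common approach, sketched below under that caveat, is to inspect the leading bytes of the file: .xlsx files are ZIP containers and legacy .xls files are OLE2 compound documents, and both start with a fixed signature. The helper `looks_like_excel` is hypothetical.

```r
# Hypothetical helper, not part of readmoRe: checks the file signature
# instead of trusting the file extension.
looks_like_excel <- function(path) {
  sig <- readBin(path, what = "raw", n = 8L)
  zip_sig <- as.raw(c(0x50, 0x4b, 0x03, 0x04))  # ZIP container (.xlsx)
  ole_sig <- as.raw(c(0xd0, 0xcf, 0x11, 0xe0,
                      0xa1, 0xb1, 0x1a, 0xe1))  # OLE2 document (.xls)
  (length(sig) >= 4L && identical(sig[1:4], zip_sig)) ||
    (length(sig) >= 8L && identical(sig, ole_sig))
}

# A tab-separated text file that is merely named like an Excel file.
fake <- tempfile(fileext = ".xlsx")
writeLines(c("id\tvalue", "a\t1"), fake)
is_excel <- looks_like_excel(fake)
unlink(fake)
```

A signature match only shows that the container format fits; any ZIP archive would pass the first check, so a reader should still be prepared to handle a failed import.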
    # The function readxl::read_excel is used internally to read Excel files.
    # The example used their example data.
    readxl_datasets <- readxl::readxl_example("datasets.xlsx")

    # A randomly generated data frame was saved to a tab-separated text file
    # and two different R object files.
    tsv_datasets <- dir(system.file("extdata", package = "readmoRe"), full.names = TRUE)

    # All example data are read into a list. From the Excel file, the first
    # sheet is read.
    dat <- read.to.list(c(readxl_datasets, tsv_datasets))

    # All example data are read into a list. From the Excel file, the first
    # 3 sheets are read.
    dat <- read.to.list(c(readxl_datasets, tsv_datasets), nsheets=3)

    # All example data are read into a list. From the Excel file, sheets 1 and
    # 4 are read.
    dat <- read.to.list(c(readxl_datasets, tsv_datasets), sheet=c(1, 4))

    # From two Excel files, different sheets are read: 1 and 4 from the first
    # file and 2 and 3 from the second.
    # (For simplicity, the same example file is used.)
    dat <- read.to.list(c(readxl_datasets, readxl_datasets), sheet=list(c(1, 4), c(2, 3)))
read2list is meant to act as a universal reading function as it attempts to read a number of different file formats into a list of data frames.
read2list( dat, nsheets = 1, sheet = NULL, keep.tibble = FALSE, skip = 0, sep = NULL, lines = FALSE, dec = NULL, ..., verbose = TRUE, x.verbose = FALSE )
dat |
|
nsheets |
|
sheet |
|
keep.tibble |
|
skip |
|
sep |
|
lines |
|
dec |
|
... |
Additional arguments passed to functions. |
verbose |
|
x.verbose |
|
A collection of utilities for reading and importing data into R by performing (usually small) manipulations of data structures such as data frames, matrices and lists, and by automatically determining import parameters.
Package: | readmoRe |
Type: | Package |
Initial version: | 0.1-0 |
Created: | 2011-01-07 |
License: | GPL-3 |
LazyLoad: | yes |
The main function of the package is read.to.list, which reads a number of different file formats into a list of data objects such as data frames, depending on the source file.
Vidal Fey <[email protected]>
rm.empty.cols removes columns that have only NAs AND whose names start with a capital 'X' (unless na.only is TRUE, in which case all NA columns will be removed).
rm.empty.cols(x, na.only = FALSE)
x |
( |
na.only |
( |
Empty columns in Excel sheets are imported as NA columns in the resulting data frame. Columns that did not have a column name in the spreadsheet will result in data frame column names starting with 'X'. rm.empty.cols makes use of these two criteria to identify columns that can safely be removed from the data frame.
A data frame.
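The two criteria can be sketched in base R as follows. The helper `drop_empty_cols` is a hypothetical re-statement of the rule described above, not the package's own code.

```r
# Hypothetical helper mirroring the documented rule: drop columns that are
# all NA and whose names start with "X"; with na.only = TRUE, drop every
# all-NA column regardless of its name.
drop_empty_cols <- function(x, na.only = FALSE) {
  all_na <- vapply(x, function(col) all(is.na(col)), logical(1))
  unnamed <- startsWith(names(x), "X")
  drop <- if (na.only) all_na else all_na & unnamed
  x[, !drop, drop = FALSE]
}

df <- data.frame(id = 1:2, X3 = c(NA, NA), note = c(NA, NA), X4 = c("a", NA))
kept_default <- names(drop_empty_cols(df))                  # "id" "note" "X4"
kept_na_only <- names(drop_empty_cols(df, na.only = TRUE))  # "id" "X4"
```

In the default mode the named but empty column `note` survives, since an empty column with a real header may be intentional.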
rm.newline.chars removes ‘newline’ characters (\n) from any column of a data frame.
rm.newline.chars(x, verbose = TRUE)
x |
( |
verbose |
( |
‘Newline’ characters in data frame rows are read verbatim and will cause rows in output text files to be distributed across two or more lines. Such characters, entered accidentally or deliberately in the source Excel file, should be avoided. This function removes all ‘newline’ characters found at the end of a line or replaces them when found within the line text.
A data frame.
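The behaviour described above can be approximated with a short base-R sketch. The helper `strip_newlines` is hypothetical, and its use of a single space as the replacement for embedded newlines is an assumption; the replacement used by rm.newline.chars is not stated here.

```r
# Hypothetical helper, not the package's rm.newline.chars(): trailing
# newlines are removed, embedded ones are replaced (assumed: by a space).
strip_newlines <- function(x, replacement = " ") {
  is_text <- vapply(x, is.character, logical(1))
  if (any(is_text)) {
    x[is_text] <- lapply(x[is_text], function(col) {
      col <- sub("\n+$", "", col)    # drop newlines at the end of the text
      gsub("\n+", replacement, col)  # replace newlines inside the text
    })
  }
  x
}

df <- data.frame(id = 1:2,
                 comment = c("first line\nsecond line", "done\n"),
                 stringsAsFactors = FALSE)
clean <- strip_newlines(df)
```

Only character columns are touched, so numeric columns keep their type.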