Title: | Air Quality Evaluation |
---|---|
Description: | Developed for use by those tasked with the routine detection, characterisation and quantification of discrete changes in air quality time-series, such as identifying the impacts of air quality policy interventions. The main functions use signal isolation then break-point/segment (BP/S) methods based on 'strucchange' and 'segmented' methods to detect and quantify change events (Ropkins & Tate, 2021, <doi:10.1016/j.scitotenv.2020.142374>). |
Authors: | Karl Ropkins [aut, cre] , Anthony Walker [aut] , James Tate [aut] |
Maintainer: | Karl Ropkins <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.5.7 |
Built: | 2024-10-31 06:38:59 UTC |
Source: | CRAN |
R AQEval: R code for the analysis of discrete change in Air Quality time-series.
AQEval
was developed for use by those tasked with
the routine detection, characterisation and quantification
of discrete changes in air quality time-series.
The main functions, quantBreakPoints
and quantBreakSegments
, use
break-point/segment (BP/S) methods
based on the consecutive use of methods in the
strucchange
and segmented
R
packages
to first detection (as break-points) and then characterise
and quantify (as segments), discrete changes in
air-quality time-series.
AQEval
functions adopt an openair
-friendly
approach using function and data structures that many
in the air quality research community are already familiar
with.
Most notably, most functions expect supplied data
to be time-series, to be supplied as a single
data.frame
(or similar R object), and for
time-series to be identified by column names.
The main functions are typically structured expect
first the data.frame
, then the name of the
pollutant to be used, then other arguments:
function(data, "polluant.name", ...)
output <- function(data, "polluant.name", ...)
Karl Ropkins
Ropkins et al (In Prep).
For more about data structure and an example data set,
see AQEval.data
For more about the main functions, see
quantBreakPoints
and quantBreakSegments
Data packaged with AQEval for use with example code.
aq.data
aq.data
(26280x6) 'tbl_df' objects
Time-series of POSIX class date and time records.
Time-series of nitrogen dioxide measurements from local site.
Time-series of nitrogen dioxide measurements from nearby background site.
Time-series of local wind speed measurements.
Time-series of local wind direction measurements.
Time-series of local air temperature measurements.
Most of functions in AQEval
adopt the
openair
convention of assuming supplied data is
a single data.frame
or similar.
The data frame was initially adopted for two reasons:
Firstly, air quality data collected and archived in numerous formats and keeping the import requirements simple minimises the frustrations associated with data importation.
Secondly, restricting the user to work with a single data format greatly simplifies data management for those less familiar with programming environments.
As part of this work several openair
coding
conventions were adopted, most importantly that data
sets should include a column named date
of
POSIX
class data-and-time-stamps
(DateTimeClasses
).
This and other conventions, such as the use of
ws
and wd
for numeric wind speed and
direction data-series, and site
and code
for character or factor monitoring site name and
identifier code, are now commonplace for many working
with R in the air quality research community, and many
air quality archives provide data in (or support import
functions that convert their own data structures to)
this openair
-friendly structure.
Air quality and meteorological data packaged for use with AQEval Examples.
Time-series sources:
date Date-and-time-stamp of POSIX class
(DateTimeClasses
).
no2 Nitrogen dioxide downloaded from King's
College London Archive using importKCL
function in openair
.
bg.no2 Nitrogen dioxide downloaded from
the Automatic Urban and Rural Network Archive using
importAURN
function in openair
.
ws, wd, air_temp Wind
speed, wind direction and air temperature downloaded from
NOAA's Integrated Surface Database using importNOAA
function in worldmet
.
Regarding openair
and openair
-friendly
data structuring, see:
Carslaw, D. C. and K. Ropkins (2012), openair — an R package for air quality data analysis. Environmental Modelling & Software. Volume 27-28, 52-61, DOI doi:10.1016/j.envsoft.2011.09.008
Ropkins, K. and D.C. Carslaw (2012), openair-Data Analysis Tools for the Air Quality Community. R Journal, 4(1). URL https://journal.r-project.org/archive/2012/RJ-2012-003/RJ-2012-003.pdf
Regarding worldmet
, see:
David Carslaw (2021), worldmet: Import Surface Meteorological Data from NOAA Integrated Surface Database (ISD). R package version 0.9.5. URL https://CRAN.R-project.org/package=worldmet
openair
: functions importAURN
and
importKCL
worldmet
: function importNOAA
(See References)
#data set used in AQEval Examples dim(aq.data) head(aq.data) with(aq.data, plot(date, no2, type="l"))
#data set used in AQEval Examples dim(aq.data) head(aq.data) with(aq.data, plot(date, no2, type="l"))
Calculate data set statistics for selected time intervals.
calcDateRangeStat( data, from = NULL, to = NULL, stat = NULL, pollutant = NULL, ..., method = 2 ) calcRollingDateRangeStat( data, range = "year", res = "day", stat = NULL, pollutant = NULL, from = NULL, to = NULL, ..., method = 2 )
calcDateRangeStat( data, from = NULL, to = NULL, stat = NULL, pollutant = NULL, ..., method = 2 ) calcRollingDateRangeStat( data, range = "year", res = "day", stat = NULL, pollutant = NULL, from = NULL, to = NULL, ..., method = 2 )
data |
(data.frame, tibble, etc) Data set containing
data statistic to be calculated for, and |
from |
(various) Start date(s) to subsample from when
calculating statistic, by default end of supplied
|
to |
(various) End date(s) to subsample to when
calculating statistic, by default end of supplied
|
stat |
(function) Statistic to be applied to selected
data, by default |
pollutant |
(character) The name(s) of data-series to
analyse in |
... |
extra arguments. |
method |
(numeric) Method to use when calculating statistic. |
range |
(character) For |
res |
(character) For |
These functions return data.frame
s of function
outputs.
These functions are in development and likely to change significantly in future versions, please handle with care.
Finding and testing break-points in conventionally formatted air quality data sets.
findBreakPoints(data, pollutant, h = 0.15, ...) testBreakPoints(data, pollutant, breaks, ...)
findBreakPoints(data, pollutant, h = 0.15, ...) testBreakPoints(data, pollutant, breaks, ...)
data |
Data source, typically a |
pollutant |
Name of time-series, assumed to be
a column in |
h |
( |
... |
other parameters |
breaks |
( |
findBreakPoints
uses methods from
strucchange
package (see references) and
modifications as suggested by the main author of
strucchange
to handle missing cases to find
potential breaks-points in a supplied time-series.
testBreakPoints
tests and identifies most likely
break-points using methods proposed for use with
quantBreakPoints
and quantBreakSegments
and conventionally formatted air quality data sets.
findBreakPoints
returns a data.frame
of found break-points.
testBreakPoints
return a likely break-point/segment
report.
Regarding strucchange
methods see
breakpoints
, and:
Achim Zeileis, Friedrich Leisch, Kurt Hornik and Christian Kleiber (2002). strucchange: An R Package for Testing for Structural Change in Linear Regression Models. Journal of Statistical Software, 7(2), 1-38. URL https://www.jstatsoft.org/v07/i02/.
Achim Zeileis, Christian Kleiber, Walter Kraemer and Kurt Hornik (2003). Testing and Dating of Structural Changes in Practice. Computational Statistics & Data Analysis, 44, 109-123.
Regarding missing data handling, see:
URL: https://stackoverflow.com/questions/43243548/strucchange-not-reporting-breakdates.
Regarding testBreakPoints
, see:
Ropkins et al (In Prep).
Function to find nearest locations in a reference by latitude and longitude.
findNearLatLon(lat, lon = NULL, nmax = 10, ..., ref = NULL, units = "m") findNearSites( lat, lon, pollutant = "no2", site.type = "rural background", nmax = 10, ..., ref = NULL, units = "m" )
findNearLatLon(lat, lon = NULL, nmax = 10, ..., ref = NULL, units = "m") findNearSites( lat, lon, pollutant = "no2", site.type = "rural background", nmax = 10, ..., ref = NULL, units = "m" )
lat , lon
|
(numeric) The supplied latitude and longitude. |
nmax |
(numeric) The maximum number of nearest sites to report, by default 10. |
... |
Other parameters, currently ignored. |
ref |
( |
units |
(character) The units to use when reporting distances to near locations; current options m. |
pollutant |
(character) For |
site.type |
(character) For |
If investigating air quality in a particular location,
for example a UK Clean Air Zone
(https://www.gov.uk/guidance/driving-in-a-clean-air-zone),
you may wish to locate an appropriate rural background air quality
monitoring station. findNearSites
locates air quality monitoring
sites with openly available data such as that available from the UK AURN
network (https://uk-air.defra.gov.uk/networks/network-info?view=aurn)
find.near
returns data.frame
of near site meta
data.
This function uses haversine formula to account to the Earth's surface curvature, and uses 6371 km as the radius of earth.
#find rural background NO2 monitoring sites #near latitude = 50, longitude = -1 #not run: requires internet ## Not run: findNearSites(lat = 50, lon = -1) ## End(Not run)
#find rural background NO2 monitoring sites #near latitude = 50, longitude = -1 #not run: requires internet ## Not run: findNearSites(lat = 50, lon = -1) ## End(Not run)
Environmental time-series signal processing: Contribution isolation based on background subtraction, deseasonalisation and/or deweathering.
isolateContribution( data, pollutant, background = NULL, deseason = TRUE, deweather = TRUE, method = 2, add.term = NULL, formula = NULL, output = "mean", ... )
isolateContribution( data, pollutant, background = NULL, deseason = TRUE, deweather = TRUE, method = 2, add.term = NULL, formula = NULL, output = "mean", ... )
data |
Data source, typically |
pollutant |
The column name of the |
background |
(optional) if supplied, the background time-series to use as a background correction. See below. |
deseason |
logical or character vector, if
|
deweather |
logical or character vector, if
|
method |
numeric, contribution isolation method (default 2). See Note. |
add.term |
extra terms to add to the contribution isolation model; ignore for now (in development). |
formula |
(optional) Signal isolate model formula;
this allows user to set the signal isolation model formula
directly, but means function arguments |
output |
output options; currently, |
... |
other arguments; ignore for now (in development) |
isolateContribution
estimates and
subtracts pollutant
variance associated with
factors that may hinder break-point/segment analysis:
Background Correction If applied, this fits
the supplied background
time-series as a
spline term: s(background)
.
Seasonality If applied, this fits regular
frequency terms, e.g. day.hour
, year.day
,
as spline terms, default TRUE is equivalent to
s(day.hour)
and s(year.day)
. All terms are
calculated from date
column in data
.
Weather If applied, this fits time-series of
identified meteorological measurements, e.g. wind speed
and direction (ws
and wd
in data
).
If both ws
and wd
are present these are
fitted as a tensor term te(ws, wd)
. Other
deweather
ing terms, if included, are fitted
as spline term s(term)
. The default TRUE
is equivalent to te(ws, wd)
.
Using the supplied arguments, it builds a signal
(mgcv
) GAM model, calculates,
and returns the mean-centred residuals as an
estimate of the isolated local contribution.
isolateContribution
returns a vector of
predictions of the pollutant
time-series after
the requested signal isolation.
method
was included as part of method
development and testing work, and retained for now.
Please ignore for now.
Karl Ropkins
Regarding mgcv
GAM fitting methods, see
Wood (2017) for general introduction and package
documentation regarding coding (mgcv
):
Wood, S.N. (2017) Generalized Additive Models: an introduction with R (2nd edition), Chapman and Hall/CRC.
Regarding isolateContribution
, see:
Ropkins et al (In Prep).
Regarding seasonal terms and frequency
analysis, see also stl
and
spectralFrequency
.
#fitting a simple deseasonalisation, deweathering #and background correction (dswb) model to no2: aq.data$dswb.no2 <- isolateContribution(aq.data, "no2", background="bg.no2") #compare at 7 day resolution: temp <- openair::timeAverage(aq.data, "7 day") #without dswb quantBreakPoints(temp, "no2", test=FALSE, h=0.1) #with dswb quantBreakPoints(temp, "dswb.no2", test=FALSE, h=0.1)
#fitting a simple deseasonalisation, deweathering #and background correction (dswb) model to no2: aq.data$dswb.no2 <- isolateContribution(aq.data, "no2", background="bg.no2") #compare at 7 day resolution: temp <- openair::timeAverage(aq.data, "7 day") #without dswb quantBreakPoints(temp, "no2", test=FALSE, h=0.1) #with dswb quantBreakPoints(temp, "dswb.no2", test=FALSE, h=0.1)
Other packaged Air Quality Models.
fitNearSiteModel(data, pollutant = "no2", y, x = "rest", elements = NULL, ...)
fitNearSiteModel(data, pollutant = "no2", y, x = "rest", elements = NULL, ...)
data |
|
pollutant |
The name of the |
y |
The name of the monitor site to be modelled,
assumed to be one several names in the |
x |
The other sites to use when building the model, the default 'rest' uses all supplied sites except 'y'. |
elements |
The number of inputs to use in the
site models, can be any number up to length of x or
combination thereof; by default this is set as
|
... |
extra arguments. |
fitNearSiteModel
builds an air quality
model for one location using air quality data from nearby
sites.
data
with model output added as additional
column.
Quantify either break-points or break-segment methods for pollutant time-series
quantBreakPoints( data, pollutant, breaks, ylab = NULL, xlab = NULL, pt.col = c("lightgrey", "darkgrey"), line.col = "red", break.col = "blue", event = NULL, show = c("plot", "report"), ... ) quantBreakSegments( data, pollutant, breaks, ylab = NULL, xlab = NULL, pt.col = c("lightgrey", "darkgrey"), line.col = "red", break.col = "blue", event = NULL, seg.method = 2, seg.seed = 12345, show = c("plot", "report"), ... )
quantBreakPoints( data, pollutant, breaks, ylab = NULL, xlab = NULL, pt.col = c("lightgrey", "darkgrey"), line.col = "red", break.col = "blue", event = NULL, show = c("plot", "report"), ... ) quantBreakSegments( data, pollutant, breaks, ylab = NULL, xlab = NULL, pt.col = c("lightgrey", "darkgrey"), line.col = "red", break.col = "blue", event = NULL, seg.method = 2, seg.seed = 12345, show = c("plot", "report"), ... )
data |
Data source, typically a data.frame or similar, containing data-series to model and a paired time-stamp data-series, named date. |
pollutant |
The name of the data-series to break-point or break-segment model. |
breaks |
(Optional) The break-points and
confidence intervals to use when building either
break-point or break-segment models. If not supplied
these are build using |
ylab |
Y-label term, by default pollutant. |
xlab |
X-label term, by default date. |
pt.col |
Point fill and line colours for plot, defaults lightgrey and darkgrey. |
line.col |
Line colour for plot, default red. |
break.col |
Break-point/segment colour for plot, default blue. |
event |
An optional list of plot terms for an event
marker, applied to a vertical line and text label. List
items include: |
show |
What to show before returning the break-point quantification mode, by default plot and report. |
... |
other parameters |
seg.method |
( |
seg.seed |
( |
quantBreakPoints
and
quantBreakSegments
both use
strucchange
methods to identify potential
break-points in time-series, and then quantify
these as conventional break-points or break-segments,
respectively:
Finding Break-points Using the
strucchange
methods of Zeileis and colleagues
and independent change detection model, the functions
apply a rolling-window approach, assuming the first
window (or data subset) is without change, building a
statistical model of that, advancing the window,
building a second model and comparing these, and so on,
to identify the most likely points of change in a
larger data-series. See also findBreakPoints
Quantifying Break-points Using the supplied break-points to build a break-point model.
Quantifying Break-segments Using the confidence regions for the supplied break-points as the starting points to build a break-segment model.
Both functions use the show
argument
to control which elements of the functions outputs
are shown but also invisible return a list
of all outputs which can caught using, e.g.:
brk.mod <- quantBreakPoints(data, pollutant)
AQEval
function quantBreakSegments
is currently running segmented v.1.3-4
while we
evaluate latest version, v.1.4-0
.
Karl Ropkins
Regarding strucchange
methods see in-package
documentation, e.g. breakpoints
,
and:
Achim Zeileis, Friedrich Leisch, Kurt Hornik and Christian Kleiber (2002). strucchange: An R Package for Testing for Structural Change in Linear Regression Models. Journal of Statistical Software, 7(2), 1-38. URL https://www.jstatsoft.org/v07/i02/.
Achim Zeileis, Christian Kleiber, Walter Kraemer and Kurt Hornik (2003). Testing and Dating of Structural Changes in Practice. Computational Statistics & Data Analysis, 44, 109-123. DOI doi:10.1016/S0167-9473(03)00030-6.
Regarding segmented
methods see in-package
documentation, e.g.
segmented
, and:
Vito M. R. Muggeo (2003). Estimating regression models with unknown break-points. Statistics in Medicine, 22, 3055-3071. DOI 10.1002/sim.1545.
Vito M. R. Muggeo (2008). segmented: an R Package to Fit Regression Models with Broken-Line Relationships. R News, 8/1, 20-25. URL https://cran.r-project.org/doc/Rnews/.
Vito M. R. Muggeo (2016). Testing with a nuisance parameter present only under the alternative: a score-based approach with application to segmented modelling. J of Statistical Computation and Simulation, 86, 3059-3067. DOI 10.1080/00949655.2016.1149855.
Vito M. R. Muggeo (2017). Interval estimation for the breakpoint in segmented regression: a smoothed score-based approach. Australian & New Zealand Journal of Statistics, 59, 311-322. DOI 10.1111/anzs.12200.
Regarding break-points/segment methods, see:
Ropkins et al (In Prep).
timeAverage
in openair
,
breakpoints
in strucchange
, and
segmented
in segmented
.
#using openair timeAverage to covert 1-hour data to 1-day averages temp <- openair::timeAverage(aq.data, "1 day") #break-points quantBreakPoints(temp, "no2", h=0.3) #break-segments quantBreakSegments(temp, "no2", h=0.3) #addition examples (not run) ## Not run: #in-call plot modification #removing x axis label #recolouring break line and #adding an event marker quantBreakPoints(temp, "no2", h=0.3, xlab="", break.col = "red", event=list(label="Event expected here", x="2002-08-01", col="grey")) ## End(Not run)
#using openair timeAverage to covert 1-hour data to 1-day averages temp <- openair::timeAverage(aq.data, "1 day") #break-points quantBreakPoints(temp, "no2", h=0.3) #break-segments quantBreakSegments(temp, "no2", h=0.3) #addition examples (not run) ## Not run: #in-call plot modification #removing x axis label #recolouring break line and #adding an event marker quantBreakPoints(temp, "no2", h=0.3, xlab="", break.col = "red", event=list(label="Event expected here", x="2002-08-01", col="grey")) ## End(Not run)
Time-series spectral frequency analysis.
spectralFrequency(data, pollutant, ...)
spectralFrequency(data, pollutant, ...)
data |
|
pollutant |
The name of the time-series, typically pollutant measurements, to be analysed. |
... |
extra arguments. |
spectralFrequency
producing a
time frequency analysis of the requested
pollutant
.
spectralFrequency
uses the show
argument to control which elements of the functions outputs
are shown but also invisibly returns a list
of all outputs which can caught using, e.g.:
sfa.mod <- spectralFrequency(data, pollutant)
spectralFrequency(aq.data, "no2")
spectralFrequency(aq.data, "no2")