Title: | Tools for the Analysis of Air Pollution Data |
---|---|
Description: | Tools to analyse, interpret and understand air pollution data. Data are typically regular time series and air quality measurement, meteorological data and dispersion model output can be analysed. The package is described in Carslaw and Ropkins (2012, <doi:10.1016/j.envsoft.2011.09.008>) and subsequent papers. |
Authors: | David Carslaw [aut, cre], Jack Davison [aut], Karl Ropkins [aut] |
Maintainer: | David Carslaw <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.18-2 |
Built: | 2025-01-06 06:30:56 UTC |
Source: | CRAN |
Calculate a range of air pollution-relevant statistics by year.
aqStats( mydata, pollutant = "no2", type = "default", data.thresh = 0, percentile = c(95, 99), transpose = FALSE, ... )
aqStats( mydata, pollutant = "no2", type = "default", data.thresh = 0, percentile = c(95, 99), transpose = FALSE, ... )
mydata |
A data frame containing a |
pollutant |
The name of a pollutant e.g. |
type |
|
data.thresh |
The data capture threshold in %. No values are calculated
if data capture over the period of interest is less than this value.
|
percentile |
Percentile values to calculate for each pollutant. |
transpose |
The default is to return a data frame with columns
representing the statistics. If |
... |
Other arguments, currently unused. |
This function calculates a range of common and air pollution-specific
statistics from a data frame. The statistics are calculated on an annual
basis and the input is assumed to be hourly data. The function can cope with
several sites and years e.g. using type = "site"
. The user can control
the output by setting transpose
appropriately.
Note that the input data is assumed to be in mass units e.g. ug/m3 for all species except CO (mg/m3).
The following statistics are calculated:
data.capture — percentage data capture over a full year.
mean — annual mean.
minimum — minimum hourly value.
maximum — maximum hourly value.
median — median value.
max.daily — maximum daily mean.
max.rolling.8 — maximum 8-hour rolling mean.
max.rolling.24 — maximum 24-hour rolling mean.
percentile.95 — 95th percentile. Note that several percentiles can be calculated.
roll.8.O3.gt.100 — number of days when the daily maximum rolling 8-hour mean ozone concentration is >100 ug/m3. This is the target value.
roll.8.O3.gt.120 — number of days when the daily maximum rolling 8-hour mean ozone concentration is >120 ug/m3. This is the Limit Value not to be exceeded > 10 days a year.
AOT40 — is the accumulated amount of ozone over the threshold
value of 40 ppb for daylight hours in the growing season (April to
September). Note that latitude
and longitude
can also be passed
to this calculation.
hours.gt.200 — number of hours NO2 is more than 200 ug/m3.
days.gt.50 — number of days PM10 is more than 50 ug/m3.
For the rolling means, the user can supply the option align
, which can
be "centre" (default), "left" or "right". See rollingMean
for more
details.
There can be small discrepancies with the AURN due to the treatment of
rounding data. The aqStats
function does not round, whereas AURN data
can be rounded at several stages during the calculations.
David Carslaw
## Statistics for 2004. NOTE! these data are in ppb/ppm so the ## example is for illustrative purposes only aqStats(selectByDate(mydata, year = 2004), pollutant = "no2")
## Statistics for 2004. NOTE! these data are in ppb/ppm so the ## example is for illustrative purposes only aqStats(selectByDate(mydata, year = 2004), pollutant = "no2")
Bin a variable and calculate mean an uncertainties in mean
binData(mydata, bin = "nox", uncer = "no2", n = 40, interval = NA, breaks = NA)
binData(mydata, bin = "nox", uncer = "no2", n = 40, interval = NA, breaks = NA)
mydata |
Name of the data frame to process. |
bin |
The name of the column to divide into intervals |
uncer |
The name of the column for which the mean, lower and upper
uncertainties should be calculated for each interval of |
n |
The number of intervals to split |
interval |
The interval to be used for binning the data. |
breaks |
User specified breaks to use for binning. |
This function summarises data by intervals and calculates the mean and bootstrap 95 % confidence intervals in the mean of a chosen variable in a data frame. Any other numeric variables are summarised by their mean intervals.
There are three options for binning. The default is to bon bin
into 40
intervals. Second, the user can choose an binning interval e.g.
interval = 5
. Third, the user can supply their own breaks to use as
binning intervals.
Returns a summarised data frame with new columns for the mean and upper / lower 95 percent confidence intervals in the mean.
# how does nox vary by intervals of wind speed? results <- binData(mydata, bin = "ws", uncer = "nox") # easy to plot this using ggplot2 ## Not run: library(ggplot2) ggplot(results, aes(ws, mean, ymin = min, ymax = max)) + geom_pointrange() ## End(Not run)
# how does nox vary by intervals of wind speed? results <- binData(mydata, bin = "ws", uncer = "nox") # easy to plot this using ggplot2 ## Not run: library(ggplot2) ggplot(results, aes(ws, mean, ymin = min, ymax = max)) + geom_pointrange() ## End(Not run)
A utility function to calculation the uncertainty intervals in the mean of a vector. The function removes any missing data before the calculation.
bootMeanDF(x, conf.int = 0.95, B = 1000)
bootMeanDF(x, conf.int = 0.95, B = 1000)
x |
A vector from which the mean and bootstrap confidence intervals in the mean are to be calculated |
conf.int |
The confidence interval; default = 0.95. |
B |
The number of bootstrap simulations |
Returns a data frame with the mean, lower uncertainty, upper uncertainty and number of values used in the calculation
test <- rnorm(20, mean = 10) bootMeanDF(test)
test <- rnorm(20, mean = 10) bootMeanDF(test)
Given hourly NOX and NO2 from a roadside site and hourly NOX, NO2 and O3 from a background site the function will estimate the emissions ratio of NO2/NOX — the level of primary NO2
calcFno2(input, tau = 60, user.fno2, main = "", xlab = "year", ...)
calcFno2(input, tau = 60, user.fno2, main = "", xlab = "year", ...)
input |
A data frame with the following fields. |
tau |
Mixing time scale. It is unlikely the user will need to adjust this. See details below. |
user.fno2 |
User-supplied f-NO2 fraction e.g. 0.1 is a NO2/NOX ratio of
10% by volume. |
main |
Title of plot if required. |
xlab |
x-axis label. |
... |
Arguments passed on to
|
The principal purpose of this function is to estimate the level of primary
(or direct) NO2 from road vehicles. When hourly data of NOX, NO2 and O3 are
available, the total oxidant method of Clapp and Jenkin (2001) can be used.
If roadside O3 measurements are available see linearRelation()
for details
of how to estimate the primary NO2 fraction.
In the absence of roadside O3 measurements, it is rather more problematic to
calculate the fraction of primary NO2. Carslaw and Beevers (2005c) developed
an approach based on linearRelation()
the analysis of roadside and
background measurements. The increment in roadside NO2 concentrations is
primarily determined by direct emissions of NO2 and the availability of One
to react with NO to form NO2. The method aims to quantify the amount of NO2
formed through these two processes by seeking the optimum level of primary
NO2 that gives the least error.
Test data is provided at https://davidcarslaw.github.io/openair/.
an openair object
David Carslaw
Clapp, L.J., Jenkin, M.E., 2001. Analysis of the relationship between ambient levels of O3, NO2 and NO as a function of NOX in the UK. Atmospheric Environment 35 (36), 6391-6405.
Carslaw, D.C. and N Carslaw (2007). Detecting and characterising small changes in urban nitrogen dioxide concentrations. Atmospheric Environment. Vol. 41, 4723-4733.
Carslaw, D.C., Beevers, S.D. and M.C. Bell (2007). Risks of exceeding the hourly EU limit value for nitrogen dioxide resulting from increased road transport emissions of primary nitrogen dioxide. Atmospheric Environment 41 2073-2082.
Carslaw, D.C. (2005a). Evidence of an increasing NO2/NOX emissions ratio from road traffic emissions. Atmospheric Environment, 39(26) 4793-4802.
Carslaw, D.C. and Beevers, S.D. (2005b). Development of an urban inventory for road transport emissions of NO2 and comparison with estimates derived from ambient measurements. Atmospheric Environment, (39): 2049-2059.
Carslaw, D.C. and Beevers, S.D. (2005c). Estimations of road vehicle primary NO2 exhaust emission fractions using monitoring data in London. Atmospheric Environment, 39(1): 167-177.
Carslaw, D. C. and S. D. Beevers (2004). Investigating the Potential Importance of Primary NO2 Emissions in a Street Canyon. Atmospheric Environment 38(22): 3585-3594.
Carslaw, D. C. and S. D. Beevers (2004). New Directions: Should road vehicle emissions legislation consider primary NO2? Atmospheric Environment 38(8): 1233-1234.
linearRelation
if you have roadside ozone
measurements.
Calculates multiple percentile values from a time series, with flexible time aggregation.
calcPercentile( mydata, pollutant = "o3", avg.time = "month", percentile = 50, data.thresh = 0, start = NA )
calcPercentile( mydata, pollutant = "o3", avg.time = "month", percentile = 50, data.thresh = 0, start = NA )
mydata |
A data frame of data with a |
pollutant |
Name of variable to process. Mandatory. |
avg.time |
Averaging period to use. See |
percentile |
A vector of percentile values. For example |
data.thresh |
Data threshold to apply when aggregating data. See
|
start |
Start date to use - see |
This is a utility function to calculate percentiles and is used in, for
example, timePlot
. Given a data frame with a date
field and one
other numeric variable, percentiles are calculated.
Returns a data frame with new columns for each percentile level. New columns are given names like percentile.95 e.g. when percentile = 95 is chosen. See examples below.
David Carslaw
# 95th percentile monthly o3 concentrations percentiles <- calcPercentile(mydata, pollutant ="o3", avg.time = "month", percentile = 95) head(percentiles) # 5, 50, 95th percentile monthly o3 concentrations ## Not run: percentiles <- calcPercentile(mydata, pollutant ="o3", avg.time = "month", percentile = c(5, 50, 95)) head(percentiles) ## End(Not run)
# 95th percentile monthly o3 concentrations percentiles <- calcPercentile(mydata, pollutant ="o3", avg.time = "month", percentile = 95) head(percentiles) # 5, 50, 95th percentile monthly o3 concentrations ## Not run: percentiles <- calcPercentile(mydata, pollutant ="o3", avg.time = "month", percentile = c(5, 50, 95)) head(percentiles) ## End(Not run)
This function will plot data by month laid out in a conventional calendar format. The main purpose is to help rapidly visualise potentially complex data in a familiar way. Users can also choose to show daily mean wind vectors if wind speed and direction are available.
calendarPlot( mydata, pollutant = "nox", year = 2003, month = 1:12, type = "default", annotate = "date", statistic = "mean", cols = "heat", limits = c(0, 100), lim = NULL, col.lim = c("grey30", "black"), col.arrow = "black", font.lim = c(1, 2), cex.lim = c(0.6, 1), digits = 0, data.thresh = 0, labels = NA, breaks = NA, w.shift = 0, w.abbr.len = 1, remove.empty = TRUE, main = NULL, key.header = "", key.footer = "", key.position = "right", key = TRUE, auto.text = TRUE, plot = TRUE, ... )
calendarPlot( mydata, pollutant = "nox", year = 2003, month = 1:12, type = "default", annotate = "date", statistic = "mean", cols = "heat", limits = c(0, 100), lim = NULL, col.lim = c("grey30", "black"), col.arrow = "black", font.lim = c(1, 2), cex.lim = c(0.6, 1), digits = 0, data.thresh = 0, labels = NA, breaks = NA, w.shift = 0, w.abbr.len = 1, remove.empty = TRUE, main = NULL, key.header = "", key.footer = "", key.position = "right", key = TRUE, auto.text = TRUE, plot = TRUE, ... )
mydata |
A data frame minimally containing |
pollutant |
Mandatory. A pollutant name corresponding to a variable in a
data frame should be supplied e.g. |
year |
Year to plot e.g. |
month |
If only certain month are required. By default the function will
plot an entire year even if months are missing. To only plot certain months
use the |
type |
Not yet implemented. |
annotate |
This option controls what appears on each day of the calendar. Can be: "date" — shows day of the month; "wd" — shows vector-averaged wind direction, or "ws" — shows vector-averaged wind direction scaled by wind speed. Finally it can be “value” which shows the daily mean value. |
statistic |
Statistic passed to |
cols |
Colours to be used for plotting. See |
limits |
Use this option to manually set the colour scale limits. This
is useful in the case when there is a need for two or more plots and a
consistent scale is needed on each. Set the limits to cover the maximum
range of the data for all plots of interest. For example, if one plot had
data covering 0–60 and another 0–100, then set |
lim |
A threshold value to help differentiate values above and below
|
col.lim |
For the annotation of concentration labels on each day. The
first sets the colour of the text below |
col.arrow |
The colour of the annotated wind direction / wind speed arrows. |
font.lim |
For the annotation of concentration labels on each day. The
first sets the font of the text below |
cex.lim |
For the annotation of concentration labels on each day. The
first sets the size of the text below |
digits |
The number of digits used to display concentration values when
|
data.thresh |
Data capture threshold passed to |
labels |
If a categorical scale is defined using |
breaks |
If a categorical scale is required then these breaks will be
used. For example, |
w.shift |
Controls the order of the days of the week. By default the
plot shows Saturday first ( |
w.abbr.len |
The default ( |
remove.empty |
Should months with no data present be removed? Default is
|
main |
The plot title; default is pollutant and year. |
key.header |
Adds additional text/labels to the scale key. For example,
passing |
key.footer |
see |
key.position |
Location where the scale key is to plotted. Allowed
arguments currently include |
key |
Fine control of the scale key via |
auto.text |
Either |
plot |
Should a plot be produced? |
... |
Other graphical parameters are passed onto the |
calendarPlot()
will plot data in a conventional calendar format, i.e., by
month and day of the week. Daily statistics are calculated using
timeAverage()
, which by default will calculate the daily mean
concentration.
If wind direction is available it is then possible to plot the wind direction
vector on each day. This is very useful for getting a feel for the
meteorological conditions that affect pollutant concentrations. Note that if
hourly or higher time resolution are supplied, then calendarPlot()
will
calculate daily averages using timeAverage()
, which ensures that wind
directions are vector-averaged.
If wind speed is also available, then setting the option annotate = "ws"
will plot the wind vectors whose length is scaled to the wind speed. Thus
information on the daily mean wind speed and direction are available.
It is also possible to plot categorical scales. This is useful where, for
example, an air quality index defines concentrations as bands, e.g., "good",
"poor". In these cases users must supply labels
and corresponding breaks
.
Note that is is possible to pre-calculate concentrations in some way before
passing the data to calendarPlot()
. For example rollingMean()
could be
used to calculate rolling 8-hour mean concentrations. The data can then be
passed to calendarPlot()
and statistic = "max"
chosen, which will plot
maximum daily 8-hour mean concentrations.
an openair object
David Carslaw
Other time series and trend functions:
TheilSen()
,
runRegression()
,
smoothTrend()
,
timePlot()
,
timeProp()
,
timeVariation()
,
trendLevel()
# basic plot calendarPlot(mydata, pollutant = "o3", year = 2003) # show wind vectors calendarPlot(mydata, pollutant = "o3", year = 2003, annotate = "wd") ## Not run: # show wind vectors scaled by wind speed and different colours calendarPlot(mydata, pollutant = "o3", year = 2003, annotate = "ws", cols = "heat" ) # show only specific months with selectByDate calendarPlot(selectByDate(mydata, month = c(3, 6, 10), year = 2003), pollutant = "o3", year = 2003, annotate = "ws", cols = "heat" ) # categorical scale example calendarPlot(mydata, pollutant = "no2", breaks = c(0, 50, 100, 150, 1000), labels = c("Very low", "Low", "High", "Very High"), cols = c("lightblue", "green", "yellow", "red"), statistic = "max" ) # UK daily air quality index pm10.breaks <- c(0, 17, 34, 50, 59, 67, 75, 84, 92, 100, 1000) calendarPlot(mydata, "pm10", year = 1999, breaks = pm10.breaks, labels = c(1:10), cols = "daqi", statistic = "mean", key.header = "DAQI" ) ## End(Not run)
# basic plot calendarPlot(mydata, pollutant = "o3", year = 2003) # show wind vectors calendarPlot(mydata, pollutant = "o3", year = 2003, annotate = "wd") ## Not run: # show wind vectors scaled by wind speed and different colours calendarPlot(mydata, pollutant = "o3", year = 2003, annotate = "ws", cols = "heat" ) # show only specific months with selectByDate calendarPlot(selectByDate(mydata, month = c(3, 6, 10), year = 2003), pollutant = "o3", year = 2003, annotate = "ws", cols = "heat" ) # categorical scale example calendarPlot(mydata, pollutant = "no2", breaks = c(0, 50, 100, 150, 1000), labels = c("Very low", "Low", "High", "Very High"), cols = c("lightblue", "green", "yellow", "red"), statistic = "max" ) # UK daily air quality index pm10.breaks <- c(0, 17, 34, 50, 59, 67, 75, 84, 92, 100, 1000) calendarPlot(mydata, "pm10", year = 1999, breaks = pm10.breaks, labels = c(1:10), cols = "daqi", statistic = "mean", key.header = "DAQI" ) ## End(Not run)
This function enhances conditionalQuantile()
by also considering how other
variables vary over the same intervals. Conditional quantiles are very useful
on their own for model evaluation, but provide no direct information on how
other variables change at the same time. For example, a conditional quantile
plot of ozone concentrations may show that low concentrations of ozone tend
to be under-predicted. However, the cause of the under-prediction can be
difficult to determine. However, by considering how well the model predicts
other variables over the same intervals, more insight can be gained into the
underlying reasons why model performance is poor.
conditionalEval( mydata, obs = "obs", mod = "mod", var.obs = "var.obs", var.mod = "var.mod", type = "default", bins = 31, statistic = "MB", xlab = "predicted value", ylab = "statistic", col = brewer.pal(5, "YlOrRd"), col.var = "Set1", var.names = NULL, auto.text = TRUE, ... )
conditionalEval( mydata, obs = "obs", mod = "mod", var.obs = "var.obs", var.mod = "var.mod", type = "default", bins = 31, statistic = "MB", xlab = "predicted value", ylab = "statistic", col = brewer.pal(5, "YlOrRd"), col.var = "Set1", var.names = NULL, auto.text = TRUE, ... )
mydata |
A data frame containing the field |
obs |
The name of the observations in |
mod |
The name of the predictions (modelled values) in |
var.obs |
Other variable observations for which statistics should be
calculated. Can be more than length one e.g. |
var.mod |
Other variable predictions for which statistics should be
calculated. Can be more than length one e.g. |
type |
It is also possible to choose |
bins |
Number of bins to be used in calculating the different quantile levels. |
statistic |
Statistic(s) to be plotted. Can be “MB”,
“NMB”, “r”, “COE”, “MGE”, “NMGE”,
“RMSE” and “FAC2”, as described in
|
xlab |
label for the x-axis, by default “predicted value”. |
ylab |
label for the y-axis, by default “observed value”. |
col |
Colours to be used for plotting the uncertainty bands and median line. Must be of length 5 or more. |
col.var |
Colours for the additional variables to be compared. See
|
var.names |
Variable names to be shown on plot for plotting
|
auto.text |
Either |
... |
Other graphical parameters passed onto |
The conditionalEval
function provides information on how other
variables vary across the same intervals as shown on the conditional quantile
plot. There are two types of variable that can be considered by setting the
value of statistic
. First, statistic
can be another variable in
the data frame. In this case the plot will show the different proportions of
statistic
across the range of predictions. For example statistic = "season"
will show for each interval of mod
the proportion of
predictions that were spring, summer, autumn or winter. This is useful
because if model performance is worse for example at high concentrations of
mod
then knowing that these tend to occur during a particular season
etc. can be very helpful when trying to understand why a model fails.
See cutData()
for more details on the types of variable that can
be statistic
. Another example would be statistic = "ws"
(if
wind speed were available in the data frame), which would then split wind
speed into four quantiles and plot the proportions of each.
Second, conditionalEval
can simultaneously plot the model performance
of other observed/predicted variable pairs according to different
model evaluation statistics. These statistics derive from the
modStats()
function and include “MB”, “NMB”,
“r”, “COE”, “MGE”, “NMGE”, “RMSE” and
“FAC2”. More than one statistic can be supplied e.g. statistic = c("NMB", "COE")
. Bootstrap samples are taken from the corresponding values
of other variables to be plotted and their statistics with 95\
intervals calculated. In this case, the model performance of other
variables is shown across the same intervals of mod
, rather than just
the values of single variables. In this second case the model would need to
provide observed/predicted pairs of other variables.
For example, a model may provide predictions of NOx and wind speed (for which
there are also observations available). The conditionalEval
function
will show how well these other variables are predicted for the same intervals
of the main variables assessed in the conditional quantile e.g. ozone. In
this case, values are supplied to var.obs
(observed values for other
variables) and var.mod
(modelled values for other variables). For
example, to consider how well the model predicts NOx and wind speed
var.obs = c("nox.obs", "ws.obs")
and var.mod = c("nox.mod", "ws.mod")
would be supplied (assuming nox.obs, nox.mod, ws.obs, ws.mod
are present in the data frame). The analysis could show for example,
when ozone concentrations are under-predicted, the model may also be shown to
over-predict concentrations of NOx at the same time, or under-predict wind
speeds. Such information can thus help identify the underlying causes of poor
model performance. For example, an under-prediction in wind speed could
result in higher surface NOx concentrations and lower ozone concentrations.
Similarly if wind speed predictions were good and NOx was over predicted it
might suggest an over-estimate of NOx emissions. One or more additional
variables can be plotted.
A special case is statistic = "cluster"
. In this case a data frame is
provided that contains the cluster calculated by trajCluster()
and importTraj()
. Alternatively users could supply their own
pre-calculated clusters. These calculations can be very useful in showing
whether certain back trajectory clusters are associated with poor (or good)
model performance. Note that in the case of statistic = "cluster"
there will be fewer data points used in the analysis compared with the
ordinary statistics above because the trajectories are available for every
three hours. Also note that statistic = "cluster"
cannot be used
together with the ordinary model evaluation statistics such as MB. The output
will be a bar chart showing the proportion of each interval of mod
by
cluster number.
Far more insight can be gained into model performance through conditioning
using type
. For example, type = "season"
will plot conditional
quantiles and the associated model performance statistics of other variables
by each season. type
can also be a factor or character field e.g.
representing different models used.
See Wilks (2005) for more details of conditional quantile plots.
David Carslaw
Wilks, D. S., 2005. Statistical Methods in the Atmospheric Sciences, Volume 91, Second Edition (International Geophysics), 2nd Edition. Academic Press.
The verification
package for comprehensive functions for
forecast verification.
Other model evaluation functions:
TaylorDiagram()
,
conditionalQuantile()
,
modStats()
Function to calculate conditional quantiles with flexible conditioning. The function is for use in model evaluation and more generally to help better understand forecast predictions and how well they agree with observations.
conditionalQuantile( mydata, obs = "obs", mod = "mod", type = "default", bins = 31, min.bin = c(10, 20), xlab = "predicted value", ylab = "observed value", col = brewer.pal(5, "YlOrRd"), key.columns = 2, key.position = "bottom", auto.text = TRUE, ... )
conditionalQuantile( mydata, obs = "obs", mod = "mod", type = "default", bins = 31, min.bin = c(10, 20), xlab = "predicted value", ylab = "observed value", col = brewer.pal(5, "YlOrRd"), key.columns = 2, key.position = "bottom", auto.text = TRUE, ... )
mydata |
A data frame containing the field |
obs |
The name of the observations in |
mod |
The name of the predictions (modelled values) in |
type |
It is also possible to choose Type can be up length two e.g. |
bins |
Number of bins to be used in calculating the different quantile levels. |
min.bin |
The minimum number of points required for the estimates of the 25/75th and 10/90th percentiles. |
xlab |
label for the x-axis, by default “predicted value”. |
ylab |
label for the y-axis, by default “observed value”. |
col |
Colours to be used for plotting the uncertainty bands and median line. Must be of length 5 or more. |
key.columns |
Number of columns to be used in the key. |
key.position |
Location of the key e.g. “top”, “bottom”,
“right”, “left”. See |
auto.text |
Either |
... |
Other graphical parameters passed onto |
Conditional quantiles are a very useful way of considering model performance against observations for continuous measurements (Wilks, 2005). The conditional quantile plot splits the data into evenly spaced bins. For each predicted value bin e.g. from 0 to 10~ppb the corresponding values of the observations are identified and the median, 25/75th and 10/90 percentile (quantile) calculated for that bin. The data are plotted to show how these values vary across all bins. For a time series of observations and predictions that agree precisely the median value of the predictions will equal that for the observations for each bin.
The conditional quantile plot differs from the quantile-quantile plot (Q-Q plot) that is often used to compare observations and predictions. A Q-Q~plot separately considers the distributions of observations and predictions, whereas the conditional quantile uses the corresponding observations for a particular interval in the predictions. Take as an example two time series, the first a series of real observations and the second a lagged time series of the same observations representing the predictions. These two time series will have identical (or very nearly identical) distributions (e.g. same median, minimum and maximum). A Q-Q plot would show a straight line showing perfect agreement, whereas the conditional quantile will not. This is because in any interval of the predictions the corresponding observations now have different values.
Plotting the data in this way shows how well predictions agree with
observations and can help reveal many useful characteristics of how well
model predictions agree with observations — across the full distribution of
values. A single plot can therefore convey a considerable amount of
information concerning model performance. The conditionalQuantile
function in openair allows conditional quantiles to be considered in a
flexible way e.g. by considering how they vary by season.
The function requires a data frame consisting of a column of observations and
a column of predictions. The observations are split up into bins
according to values of the predictions. The median prediction line together
with the 25/75th and 10/90th quantile values are plotted together with a line
showing a “perfect” model. Also shown is a histogram of predicted
values (shaded grey) and a histogram of observed values (shown as a blue
line).
Far more insight can be gained into model performance through conditioning
using type
. For example, type = "season"
will plot conditional
quantiles by each season. type
can also be a factor or character field
e.g. representing different models used.
See Wilks (2005) for more details and the examples below.
David Carslaw
Murphy, A. H., B.G. Brown and Y. Chen. (1989) Diagnostic Verification of Temperature Forecasts, Weather and Forecasting, Volume: 4, Issue: 4, Pages: 485-501.
Wilks, D. S., 2005. Statistical Methods in the Atmospheric Sciences, Volume 91, Second Edition (International Geophysics), 2nd Edition. Academic Press.
The verification
package for comprehensive functions for
forecast verification.
Other model evaluation functions:
TaylorDiagram()
,
conditionalEval()
,
modStats()
## make some dummy prediction data based on 'nox' mydata$mod <- mydata$nox * 1.1 + mydata$nox * runif(1:nrow(mydata)) # basic conditional quantile plot ## A "perfect" model is shown by the blue line ## predictions tend to be increasingly positively biased at high nox, ## shown by departure of median line from the blue one. ## The widening uncertainty bands with increasing NOx shows that ## hourly predictions are worse for higher NOx concentrations. ## Also, the red (median) line extends beyond the data (blue line), ## which shows in this case some predictions are much higher than ## the corresponding measurements. Note that the uncertainty bands ## do not extend as far as the median line because there is insufficient # to calculate them conditionalQuantile(mydata, obs = "nox", mod = "mod") ## can split by season to show seasonal performance (not very ## enlightening in this case - try some real data and it will be!) ## Not run: conditionalQuantile(mydata, obs = "nox", mod = "mod", type = "season") ## End(Not run)
## make some dummy prediction data based on 'nox' mydata$mod <- mydata$nox * 1.1 + mydata$nox * runif(1:nrow(mydata)) # basic conditional quantile plot ## A "perfect" model is shown by the blue line ## predictions tend to be increasingly positively biased at high nox, ## shown by departure of median line from the blue one. ## The widening uncertainty bands with increasing NOx shows that ## hourly predictions are worse for higher NOx concentrations. ## Also, the red (median) line extends beyond the data (blue line), ## which shows in this case some predictions are much higher than ## the corresponding measurements. Note that the uncertainty bands ## do not extend as far as the median line because there is insufficient # to calculate them conditionalQuantile(mydata, obs = "nox", mod = "mod") ## can split by season to show seasonal performance (not very ## enlightening in this case - try some real data and it will be!) ## Not run: conditionalQuantile(mydata, obs = "nox", mod = "mod", type = "season") ## End(Not run)
Function to to draw and visualise correlation matrices using lattice. The primary purpose is as a tool for exploratory data analysis. Hierarchical clustering is used to group similar variables.
corPlot( mydata, pollutants = NULL, type = "default", cluster = TRUE, method = "pearson", use = "pairwise.complete.obs", dendrogram = FALSE, lower = FALSE, cols = "default", r.thresh = 0.8, text.col = c("black", "black"), auto.text = TRUE, plot = TRUE, ... )
corPlot( mydata, pollutants = NULL, type = "default", cluster = TRUE, method = "pearson", use = "pairwise.complete.obs", dendrogram = FALSE, lower = FALSE, cols = "default", r.thresh = 0.8, text.col = c("black", "black"), auto.text = TRUE, plot = TRUE, ... )
mydata |
A data frame which should consist of some numeric columns. |
pollutants |
the names of data-series in |
type |
It is also possible to choose |
cluster |
Should the data be ordered according to cluster analysis. If
|
method |
The correlation method to use. Can be “pearson”, “spearman” or “kendall”. |
use |
How to handle missing values in the |
dendrogram |
Should a dendrogram be plotted? When |
lower |
Should only the lower triangle be plotted? |
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “spectral”,
“hue”, “greyscale” and user defined (see |
r.thresh |
Values of greater than |
text.col |
The colour of the text used to show the correlation values. The first value controls the colour of negative correlations and the second positive. |
auto.text |
Either |
plot |
Should a plot be produced? |
... |
Other graphical parameters passed onto |
The corPlot
function plots correlation matrices. The implementation
relies heavily on that shown in Sarkar (2007), with a few extensions.
Correlation matrices are a very effective way of understating relationships
between many variables. The corPlot
shows the correlation coded in
three ways: by shape (ellipses), colour and the numeric value. The ellipses
can be thought of as visual representations of scatter plot. With a perfect
positive correlation a line at 45 degrees positive slope is drawn. For zero
correlation the shape becomes a circle. See examples below.
With many different variables it can be difficult to see relationships
between variables, i.e., which variables tend to behave most like one
another. For this reason hierarchical clustering is applied to the
correlation matrices to group variables that are most similar to one another
(if cluster = TRUE
).
If clustering is chosen it is also possible to add a dendrogram using the
option dendrogram = TRUE
. Note that dendrogramscan only be plotted for
type = "default"
i.e. when there is only a single panel. The
dendrogram can also be recovered from the plot object itself and plotted more
clearly; see examples below.
It is also possible to use the openair
type option to condition the
data in many flexible ways, although this may become difficult to visualise
with too many panels.
an openair object
David Carslaw — but mostly based on code contained in Sarkar (2007)
Sarkar, D. (2007). Lattice Multivariate Data Visualization with R. New York: Springer.
Friendly, M. (2002). Corrgrams : Exploratory displays for correlation matrices. American Statistician, 2002(4), 1-16. doi:10.1198/000313002533
taylor.diagram
from the plotrix
package from which
some of the annotation code was used.
## basic corrgram plot corPlot(mydata) ## plot by season ... and so on corPlot(mydata, type = "season") ## recover dendrogram when cluster = TRUE and plot it res <-corPlot(mydata) plot(res$clust) ## Not run: ## a more interesting are hydrocarbon measurements hc <- importAURN(site = "my1", year = 2005, hc = TRUE) ## now it is possible to see the hydrocarbons that behave most ## similarly to one another corPlot(hc) ## End(Not run)
## basic corrgram plot corPlot(mydata) ## plot by season ... and so on corPlot(mydata, type = "season") ## recover dendrogram when cluster = TRUE and plot it res <-corPlot(mydata) plot(res$clust) ## Not run: ## a more interesting are hydrocarbon measurements hc <- importAURN(site = "my1", year = 2005, hc = TRUE) ## now it is possible to see the hydrocarbons that behave most ## similarly to one another corPlot(hc) ## End(Not run)
Utility function to split data frames up in various ways for conditioning
plots. Users would generally not be expected to call this function directly.
Widely used by many openair
functions usually through the option
type
.
cutData( x, type = "default", hemisphere = "northern", n.levels = 4, start.day = 1, is.axis = FALSE, local.tz = NULL, latitude = 51, longitude = -0.5, ... )
cutData( x, type = "default", hemisphere = "northern", n.levels = 4, start.day = 1, is.axis = FALSE, local.tz = NULL, latitude = 51, longitude = -0.5, ... )
x |
A data frame containing a field |
type |
A string giving the way in which the data frame should be split. Pre-defined values are: “default”, “year”, “hour”, “month”, “season”, “weekday”, “site”, “weekend”, “monthyear”, “daylight”, “dst” (daylight saving time).
|
hemisphere |
Can be |
n.levels |
Number of quantiles to split numeric data into. |
start.day |
What day of the week should the |
is.axis |
A logical ( |
local.tz |
Used for identifying whether a date has daylight savings time
(DST) applied or not. Examples include |
latitude |
The decimal latitude used in |
longitude |
The decimal longitude. Note that locations west of Greenwich are negative. |
... |
All additional parameters are passed on to next function(s). |
This section give a brief description of each of the define levels of
type
. Note that all time dependent types require a column date
.
"default" does not split the data but will describe the levels as a date range in the format "day month year".
"year" splits the data by each year.
"month" splits the data by month of the year.
"hour" splits the data by hour of the day.
"monthyear" splits the data by year and month. It differs from month in that a level is defined for each month of the data set. This is useful sometimes to show an ordered sequence of months if the data set starts half way through a year; rather than starting in January.
"weekend" splits the data by weekday and weekend.
"weekday" splits the data by day of the week - ordered to start Monday.
"season" splits data up by season. In the northern hemisphere winter =
December, January, February; spring = March, April, May etc. These
definitions will change of hemisphere = "southern"
.
"seasonyear (or "yearseason") will split the data into year-season intervals,
keeping the months of a season together. For example, December 2010 is
considered as part of winter 2011 (with January and February 2011). This
makes it easier to consider contiguous seasons. In contrast, type =
"season"
will just split the data into four seasons regardless of the year.
"daylight" splits the data relative to estimated sunrise and sunset to give
either daylight or nighttime. The cut is made by cutDaylight
but more
conveniently accessed via cutData
, e.g. cutData(mydata, type =
"daylight", latitude = my.latitude, longitude = my.longitude)
. The daylight
estimation, which is valid for dates between 1901 and 2099, is made using the
measurement location, date, time and astronomical algorithms to estimate the
relative positions of the Sun and the measurement location on the Earth's
surface, and is based on NOAA methods. Measurement location should be set
using latitude
(+ to North; - to South) and longitude
(+ to
East; - to West).
"dst" will split the data by hours that are in daylight saving time (DST) and
hours that are not for appropriate time zones. The option "dst" also requires
that the local time zone is given e.g. local.tz = "Europe/London"
,
local.tz = "America/New_York"
. Each of the two periods will be in
local time. The main purpose of this option is to test whether there
is a shift in the diurnal profile when DST and non-DST hours are compared.
This option is particularly useful with the timeVariation
function.
For example, close to the source of road vehicle emissions, ‘rush-hour’ will
tend to occur at the same local time throughout the year e.g. 8 am and
5 pm. Therefore, comparing non-DST hours with DST hours will tend to show
similar diurnal patterns (at least in the timing of the peaks, if not
magnitude) when expressed in local time. By contrast a variable such as wind
speed or temperature should show a clear shift when expressed in local time.
In essence, this option when used with timeVariation
may help
determine whether the variation in a pollutant is driven by man-made
emissions or natural processes.
"wd" splits the data by 8 wind sectors and requires a column wd
: "NE",
"E", "SE", "S", "SW", "W", "NW", "N".
"ws" splits the data by 8 quantiles of wind speed and requires a column
ws
.
"site" splits the data by site and therefore requires a column site
.
Note that all the date-based types e.g. month/year are derived from a column
date
. If a user already has a column with a name of one of the
date-based types it will not be used.
Returns a data frame with a column cond
that is defined by
type
.
David Carslaw (cutData) and Karl Ropkins (cutDaylight)
## split data by day of the week mydata <- cutData(mydata, type = "weekday")
## split data by day of the week mydata <- cutData(mydata, type = "weekday")
General function for producing scale keys for other openair functions. The function is a crude modification of the draw.colorkey function developed by Deepayan Sarkar as part of the lattice package, and allows additional key labelling to added, and provides some additional control of the appearance and scaling.
drawOpenKey(key, draw = FALSE, vp = NULL)
drawOpenKey(key, draw = FALSE, vp = NULL)
key |
List defining the scale key structure to be produced. Most
options are identical to original Original
Note: Additional options include:
Notes:
|
draw |
Option to return the key object or plot it directly. The default, FALSE, should always be used within openair calls. |
vp |
View port to be used when plotting key. The default, NULL, should always be used within openair calls. (Note: |
The drawOpenKey
function produces scale keys for other openair
functions.
Most drawOpenKey
options are identical to those of
lattice::draw.colorkey
. For example, scale key size and position
are controlled via height
, width
and space
. Likewise,
the axis labelling can be set in and formatted by labels
. See
draw.colorkey
for further details.
Additional scale labelling may be added above and below the scale using
header
and footer
options within key
. As in other
openair
functions, automatic text formatting can be enabled via
auto.key
.
(Note: Currently, the formatting of header
and footer
text
are fixed to the same style as labels
(the scale axis) and cannot be
defined locally.)
The relationship between header
, footer
and the scale key
itself can be controlled using fit
options. These can be set in
key$fit
to apply uniform control or individually in
key$header$fit
and/or key$footer$fit
to control locally.
The appearance of the scale can be controlled using plot.style
.
The function is a modification of lattice::draw.colorkey
and
returns a scale key using a similar mechanism to that used in in the
original function as developed by Deepayan Sarkar.
We gratefully acknowledge the considerable help and advice of Deepayan Sarkar.
draw.colorkey
is part of the lattice
package,
developed by Deepayan Sarkar.
Additional modifications by Karl Ropkins.
Deepayan Sarkar (2010). lattice: Lattice Graphics. R package version 0.18-5. http://r-forge.r-project.org/projects/lattice/
Functions using drawOpenKey
currently include
windRose
, pollutionRose
.
For details of the original function, see draw.colorkey
########## #example 1 ########## #paddle style scale key used by windRose windRose(mydata,) #adding text and changing style and position via key #note: #some simple key control also possible directly #For example, below does same as #windRose(mydata, key.position="right") windRose(mydata, key =list(space="right") ) #however: #more detailed control possible working with #key and drawOpenKey. For example, windRose(mydata, key = list(header="Title", footer="wind speed", plot.style = c("ticks", "border"), fit = "all", height = 1, space = "top") )
########## #example 1 ########## #paddle style scale key used by windRose windRose(mydata,) #adding text and changing style and position via key #note: #some simple key control also possible directly #For example, below does same as #windRose(mydata, key.position="right") windRose(mydata, key =list(space="right") ) #however: #more detailed control possible working with #key and drawOpenKey. For example, windRose(mydata, key = list(header="Title", footer="wind speed", plot.style = c("ticks", "border"), fit = "all", height = 1, space = "top") )
Function(s) to import various ADMS file types into openair. Currently handles
".met", ".bgd", ".mop" and ".pst" file structures. Uses utils::read.csv()
to read in data, format for R and openair and apply some file structure
testing.
importADMS( file = file.choose(), file.type = "unknown", drop.case = TRUE, drop.input.dates = TRUE, keep.units = TRUE, simplify.names = TRUE, test.file.structure = TRUE, drop.delim = TRUE, add.prefixes = TRUE, names = NULL, all = FALSE, ... )
importADMS( file = file.choose(), file.type = "unknown", drop.case = TRUE, drop.input.dates = TRUE, keep.units = TRUE, simplify.names = TRUE, test.file.structure = TRUE, drop.delim = TRUE, add.prefixes = TRUE, names = NULL, all = FALSE, ... )
file |
The ADMS file to be imported. Default, |
file.type |
Type of ADMS file to be imported. With default, "unknown",
the import uses the file extension to identify the file type and, where
recognised, uses this to identify the file structure and import method to
be applied. Where file extension is not recognised the choice may be forced
by setting |
drop.case |
Option to convert all data names to lower case. Default,
|
drop.input.dates |
Option to remove ADMS "hour", "day", and "year" data
columns after generating openair "date" timeseries. Default, |
keep.units |
Option to retain ADMS data units. Default, |
simplify.names |
Option to simplify data names in accordance with common
|
test.file.structure |
Option to test file structure before trying to
import. Default, |
drop.delim |
Option to remove delim columns from the data frame. ADMS
.mop files include two columns, "INPUT_DATA:" and "PROCESSED_DATA:", to
separate model input and output types. Default, |
add.prefixes |
Option to add prefixes to data names. ADMS .mop files
include a number of input and process data types with shared names.
Prefixes can be automatically added to these so individual data can be
readily identified in the R/openair environment. Default, |
names |
Option applied by |
all |
For .MOP files, return all variables or not. If |
... |
Arguments passed on to
|
The importADMS
function were developed to help import various ADMS
file types into openair. In most cases the parent import function should work
in default configuration, e.g. mydata <- importADMS()
. The function
currently recognises four file formats: .bgd
, .met
, .mop
and .pst
. Where other file extensions have been set but the file
structure is known, the import call can be forced by, e.g, mydata <-
importADMS(file.type="bgd")
. Other options can be adjusted to provide fine
control of the data structuring and renaming.
In standard use importADMS()
returns a data frame for use in
openair. By comparison to the original file, the resulting data frame is
modified as follows:
Time and date information will combined in a single column "date",
formatted as a conventional timeseries (as.POSIX*
). If
drop.input.dates
is enabled data series combined to generated the
new "date" data series will also be removed.
If simplify.names
is enabled common chemical names may be
simplified, and some other parameters may be reset to openair standards
(e.g. "ws", "wd" and "temp") according to operations defined in
simplifyNamesADMS
. A summary of simplification operations can be
obtained using, e.g., the call importADMS(simplify.names)
.
If drop.case
is enabled all upper case characters in names will be
converted to lower case.
If keep.units
is enabled data units information may also be retained
as part of the data frame comment if available.
With .mop
files, input and processed data series names may also been
modified on the basis of drop.delim
and add.prefixes
settings
Times are assumed to be in GMT. Zero wind directions reset to 360 as
part of .mop
file import.
Karl Ropkins, David Carslaw and Matthew Williams (CERC).
Other import functions:
importAURN()
,
importEurope()
,
importKCL()
,
importMeta()
,
importTraj()
,
importUKAQ()
########## #example 1 ########## #To be confirmed #all current simplify.names operations importADMS(simplify.names) #to see what simplify.names does to adms data series name PHI new.name <- importADMS(simplify.names, names="PHI") new.name
########## #example 1 ########## #To be confirmed #all current simplify.names operations importADMS(simplify.names) #to see what simplify.names does to adms data series name PHI new.name <- importADMS(simplify.names, names="PHI") new.name
These functions act as wrappers for importUKAQ()
to import air pollution
data from a range of UK networks including the Automatic Urban and Rural
Network (AURN), the individual England (AQE), Scotland (SAQN), Wales (WAQN)
and Northern Ireland (NI) Networks, and many "locally managed" monitoring
networks across England. While importUKAQ()
allows for data to be imported
more flexibly, including across multiple monitoring networks, these functions
are provided for convenience and back-compatibility.
importAURN( site = "my1", year = 2009, data_type = "hourly", pollutant = "all", hc = FALSE, meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE ) importAQE( site = "yk13", year = 2018, data_type = "hourly", pollutant = "all", meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE ) importSAQN( site = "gla4", year = 2009, data_type = "hourly", pollutant = "all", meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE ) importWAQN( site = "card", year = 2018, data_type = "hourly", pollutant = "all", meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE ) importNI( site = "bel0", year = 2018, data_type = "hourly", pollutant = "all", meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE ) importLocal( site = "ad1", year = 2018, data_type = "hourly", pollutant = "all", meta = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE )
importAURN( site = "my1", year = 2009, data_type = "hourly", pollutant = "all", hc = FALSE, meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE ) importAQE( site = "yk13", year = 2018, data_type = "hourly", pollutant = "all", meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE ) importSAQN( site = "gla4", year = 2009, data_type = "hourly", pollutant = "all", meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE ) importWAQN( site = "card", year = 2018, data_type = "hourly", pollutant = "all", meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE ) importNI( site = "bel0", year = 2018, data_type = "hourly", pollutant = "all", meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE ) importLocal( site = "ad1", year = 2018, data_type = "hourly", pollutant = "all", meta = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE )
site |
Site code of the site to import, e.g., |
year |
Year(s) to import. To import a series of years use, e.g.,
|
data_type |
The type of data to be returned, defaulting to
|
pollutant |
Pollutants to import. If omitted will import all pollutants
from a site. To import only NOx and NO2 for example use |
hc |
Include hydrocarbon measurements in the imported data? Defaults to
|
meta |
Append the site type, latitude and longitude of each selected
|
meteo |
Append modelled meteorological data, if available? Defaults to
|
ratified |
Append |
to_narrow |
Return the data in a "narrow"/"long"/"tidy" format? By
default the returned data is "wide" and has a column for each
pollutant/variable. When |
verbose |
Print messages to the console if hourly data cannot be
imported? Default is |
progress |
Show a progress bar when many sites/years are being imported?
Defaults to |
This family of functions has been written to make it easy to import data from across several UK air quality networks. Ricardo have provided .RData files (R workspaces) of all individual sites and years, as well as up to date meta data. These files are updated on a daily basis. This approach requires a link to the Internet to work.
There are several advantages over the web portal approach where .csv files are downloaded.
First, it is quick to select a range of sites, pollutants and periods (see examples below).
Second, storing the data as .RData objects is very efficient as they are about four times smaller than .csv files — which means the data downloads quickly and saves bandwidth.
Third, the function completely avoids any need for data manipulation or setting time formats, time zones etc. The function also has the advantage that the proper site name is imported and used in openair functions.
Users should take care if using data from both openair and web portals (for example, UK AIR). One key difference is that the data provided by openair is date beginning, whereas the web portal provides date ending. Hourly concentrations may therefore appear offset by an hour, for example.
The data are imported by stacking sites on top of one another and will have
field names site
, code
(the site code) and pollutant
.
By default, the function returns hourly average data. However, annual,
monthly, daily and 15 minute data (for SO2) can be returned using the
option data_type
. Annual and monthly data provide whole network
information including data capture statistics.
All units are expressed in mass terms for gaseous species (ug/m3 for NO, NO2, NOx (as NO2), SO2 and hydrocarbons; and mg/m3 for CO). PM10 concentrations are provided in gravimetric units of ug/m3 or scaled to be comparable with these units. Over the years a variety of instruments have been used to measure particulate matter and the technical issues of measuring PM10 are complex. In recent years the measurements rely on FDMS (Filter Dynamics Measurement System), which is able to measure the volatile component of PM. In cases where the FDMS system is in use there will be a separate volatile component recorded as 'v10' and non-volatile component 'nv10', which is already included in the absolute PM10 measurement. Prior to the use of FDMS the measurements used TEOM (Tapered Element Oscillating. Microbalance) and these concentrations have been multiplied by 1.3 to provide an estimate of the total mass including the volatile fraction.
Some sites report hourly and daily PM10 and / or PM2.5. When data_type = "daily"
and there are both hourly and 'proper' daily measurements
available, these will be returned as e.g. "pm2.5" and "gr_pm2.5"; the
former corresponding to data based on original hourly measurements and the
latter corresponding to daily gravimetric measurements.
The function returns modelled hourly values of wind speed (ws
), wind
direction (wd
) and ambient temperature (air_temp
) if available
(generally from around 2010). These values are modelled using the WRF model
operated by Ricardo.
The BAM (Beta-Attenuation Monitor) instruments that have been incorporated into the network throughout its history have been scaled by 1.3 if they have a heated inlet (to account for loss of volatile particles) and 0.83 if they do not have a heated inlet. The few TEOM instruments in the network after 2008 have been scaled using VCM (Volatile Correction Model) values to account for the loss of volatile particles. The object of all these scaling processes is to provide a reasonable degree of comparison between data sets and with the reference method and to produce a consistent data record over the operational period of the network, however there may be some discontinuity in the time series associated with instrument changes.
No corrections have been made to the PM2.5 data. The volatile component of FDMS PM2.5 (where available) is shown in the 'v2.5' column.
Other import functions:
importADMS()
,
importEurope()
,
importKCL()
,
importMeta()
,
importTraj()
,
importUKAQ()
This function is a simplified version of the saqgetr
package (see
https://github.com/skgrange/saqgetr) for accessing European air quality
data. The function only returns valid hourly data and is meant as a fast and
convenient way of accessing the most common type of hourly air quality data.
The function works in the same way as other openair
functions that
import air quality data that generally need a site code and year to be
supplied.
importEurope( site = "debw118", year = 2018, tz = "UTC", meta = FALSE, to_narrow = FALSE, progress = TRUE )
importEurope( site = "debw118", year = 2018, tz = "UTC", meta = FALSE, to_narrow = FALSE, progress = TRUE )
site |
The code of the site(s). |
year |
Year or years to import. To import a sequence of years from 1990
to 2000 use |
tz |
Not used |
meta |
Should meta data be returned? If |
to_narrow |
By default the returned data has a column for each
pollutant/variable. When |
progress |
Show a progress bar when many sites/years are being imported?
Defaults to |
The function can however return key site meta data.
The saqgetr
package is much more comprehensive and provides data at
other time averages e.g. daily data.
a tibble
Other import functions:
importADMS()
,
importAURN()
,
importKCL()
,
importMeta()
,
importTraj()
,
importUKAQ()
# import data for Stuttgart Am Neckartor (S) ## Not run: stuttgart <- importEurope("debw118", year = 2010:2019, meta = TRUE) ## End(Not run)
# import data for Stuttgart Am Neckartor (S) ## Not run: stuttgart <- importEurope("debw118", year = 2010:2019, meta = TRUE) ## End(Not run)
Function for importing hourly mean data from King's College London networks. Files are imported from a remote server operated by King's College London that provides air quality data files as R data objects.
importKCL( site = "my1", year = 2009, pollutant = "all", met = FALSE, units = "mass", extra = FALSE, meta = FALSE, to_narrow = FALSE, progress = TRUE )
importKCL( site = "my1", year = 2009, pollutant = "all", met = FALSE, units = "mass", extra = FALSE, meta = FALSE, to_narrow = FALSE, progress = TRUE )
site |
Site code of the network site to import e.g. "my1" is Marylebone
Road. Several sites can be imported with |
year |
Year(s) to import. To import a series of years use, e.g.,
|
pollutant |
Pollutants to import. If omitted will import all pollutants
from a site. To import only NOx and NO2 for example use |
met |
Should meteorological data be added to the import data? The
default is |
units |
By default the returned data frame expresses the units in mass
terms (ug/m3 for NOx, NO2, O3, SO2; mg/m3 for CO). Use |
extra |
Not currently used. |
meta |
Append the site type, latitude and longitude of each selected
|
to_narrow |
Return the data in a "narrow"/"long"/"tidy" format? By
default the returned data is "wide" and has a column for each
pollutant/variable. When |
progress |
Show a progress bar when many sites/years are being imported?
Defaults to |
The importKCL
function has been written to make it easy to import
data from the King's College London air pollution networks. KCL have
provided .RData files (R workspaces) of all individual sites and years for
the KCL networks. These files are updated on a weekly basis. This approach
requires a link to the Internet to work.
There are several advantages over the web portal approach where .csv files
are downloaded. First, it is quick to select a range of sites, pollutants
and periods (see examples below). Second, storing the data as .RData objects
is very efficient as they are about four times smaller than .csv files —
which means the data downloads quickly and saves bandwidth. Third, the
function completely avoids any need for data manipulation or setting time
formats, time zones etc. Finally, it is easy to import many years of data
beyond the current limit of about 64,000 lines. The final point makes it
possible to download several long time series in one go. The function also
has the advantage that the proper site name is imported and used in
openair
functions.
The site codes and pollutant names can be upper or lower case. The function will issue a warning when data less than six months old is downloaded, which may not be ratified.
The data are imported by stacking sites on top of one another and will have
field names date
, site
, code
(the site code) and
pollutant(s). Sometimes it is useful to have columns of site data. This can
be done using the reshape
function — see examples below.
The situation for particle measurements is not straightforward given the
variety of methods used to measure particle mass and changes in their use
over time. The importKCL
function imports two measures of PM10 where
available. PM10_raw
are TEOM measurements with a 1.3 factor applied
to take account of volatile losses. The PM10
data is a current best
estimate of a gravimetric equivalent measure as described below. NOTE! many
sites have several instruments that measure PM10 or PM2.5. In the case of
FDMS measurements, these are given as separate site codes (see below). For
example "MY1" will be TEOM with VCM applied and "MY7" is the FDMS data.
Where FDMS data are used the volatile and non-volatile components are separately reported i.e. v10 = volatile PM10, v2.5 = volatile PM2.5, nv10 = non-volatile PM10 and nv2.5 = non-volatile PM2.5. Therefore, PM10 = v10 + nv10 and PM2.5 = v2.5 + nv2.5.
For the assessment of the EU Limit Values, PM10 needs to be measured using the reference method or one shown to be equivalent to the reference method. Defra carried out extensive trials between 2004 and 2006 to establish which types of particulate analysers in use in the UK were equivalent. These trials found that measurements made using Partisol, FDMS, BAM and SM200 instruments were shown to be equivalent to the PM10 reference method. However, correction factors need to be applied to measurements from the SM200 and BAM instruments. Importantly, the TEOM was demonstrated as not being equivalent to the reference method due to the loss of volatile PM, even when the 1.3 correction factor was applied. The Volatile Correction Model (VCM) was developed for Defra at King's to allow measurements of PM10 from TEOM instruments to be converted to reference equivalent; it uses the measurements of volatile PM made using nearby FDMS instruments to correct the measurements made by the TEOM. It passed the equivalence testing using the same methodology used in the Defra trials and is now the recommended method for correcting TEOM measurements (Defra, 2009). VCM correction of TEOM measurements can only be applied after 1st January 2004, when sufficiently widespread measurements of volatile PM became available. The 1.3 correction factor is now considered redundant for measurements of PM10 made after 1st January 2004. Further information on the VCM can be found at http://www.volatile-correction-model.info/.
All PM10 statistics on the LondonAir web site, including the bulletins and
statistical tools (and in the RData objects downloaded using
importKCL
), now report PM10 results as reference equivalent. For PM10
measurements made by BAM and SM200 analysers the applicable correction
factors have been applied. For measurements from TEOM analysers the 1.3
factor has been applied up to 1st January 2004, then the VCM method has been
used to convert to reference equivalent.
The meteorological data are meant to represent 'typical' conditions in London, but users may prefer to use their own data. The data provide a an estimate of general meteorological conditions across Greater London. For meteorological species (wd, ws, rain, solar) each data point is formed by averaging measurements from a subset of LAQN monitoring sites that have been identified as having minimal disruption from local obstacles and a long term reliable dataset. The exact sites used varies between species, but include between two and five sites per species. Therefore, the data should represent 'London scale' meteorology, rather than local conditions.
While the function is being developed, the following site codes should help with selection. We will also make available other meta data such as site type and location to make it easier to select sites based on other information. Note that these codes need to be refined because only the common species are available for export currently i.e. NOx, NO2, O3, CO, SO2, PM10, PM2.5.
A30 | Kingston - Kingston Bypass A3 | Roadside
AD1 | Shoreham-by-Sea | Kerbside
AR1 | Chichester - Lodsworth | Rural
AR2 | Wealden - Isfield | Rural
AS1 | Bath Aethalometer | Urban Background
BA1 | Basildon - Gloucester Park | Roadside
BB1 | Broxbourne (Roadside) | Roadside
BE0 | Belfast - Carbon | Urban Background
BE1 | Belfast Centre AURN | Urban Background
BE3 | Belfast Centre Aethalometer | Urban Background
BE7 | Belfast Centre FDMS trial | Urban Background
BE8 | Belfast - Nitrate | Urban Background
BE9 | Belfast - Partisol SO4 | Urban Background
BF1 | Bedford Stewartby (Rural) | Industrial
BF3 | Bedford - Kempston | Industrial
BF4 | Bedford - Prebend Street | Roadside
BF5 | Bedford - Lurke Street | Roadside
BG1 | Barking and Dagenham - Rush Green | Suburban
BG2 | Barking and Dagenham - Scrattons Farm | Suburban
BG3 | Barking and Dagenham - North Street | Kerbside
BH0 | Brighton Preston Park AURN | Urban Background
BH1 | Brighton Roadside | Roadside
BH2 | Brighton and Hove - Hove Town Hall | Roadside
BH3 | Brighton and Hove - Foredown Tower | Urban Background
BH5 | Brighton Mobile (Preston Fire Station) | Roadside
BH6 | Brighton Mobile (Lewes Road) | Roadside
BH7 | Brighton Mobile (Gloucester Road) | Roadside
BH8 | Brighton and Hove - Stanmer Park | Rural
BH9 | Brighton Mobile Beaconsfield Road | Roadside
BI1 | Birmingham Tyburn CPC | Urban Background
BL0 | Camden - Bloomsbury | Urban Background
BL1 | Bloomsbury AURN SMPS | Urban Background
BM1 | Ballymena - Ballykeel | Suburban
BM2 | Ballymena - North Road | Roadside
BN1 | Barnet - Tally Ho Corner | Kerbside
BN2 | Barnet
Finchley | Urban Background
BN3 | Barnet - Strawberry Vale | Urban Background
BO1 | Ballymoney 1 | Suburban
BP0 | Westminster - Bridge Place | Urban Background
BQ5 | Bexley - Manor Road West Gravimetric | Industrial
BQ6 | Bexley - Manor Road East Gravimetric | Industrial
BQ7 | Belvedere West | Urban Background
BQ8 | Belvedere West FDMS | Urban Background
BT1 | Brent - Kingsbury | Suburban
BT2 | Brent - Ikea Car Park | Roadside
BT3 | Brent - Harlesden | Roadside
BT4 | Brent - Ikea | Roadside
BT5 | Brent - Neasden Lane | Industrial
BT6 | Brent - John Keble Primary School | Roadside
BT7 | Brent - St Marys Primary School | Urban Background
BW1 | Brentwood - Brentwood Town Hall | Urban Background
BX0 | Bexley - Belvedere FDMS | Suburban
BX1 | Bexley - Slade Green | Suburban
BX2 | Bexley - Belvedere | Suburban
BX3 | Bexley - Thamesmead | Suburban
BX4 | Bexley - Erith | Industrial
BX5 | Bexley - Bedonwell | Suburban
BX6 | Bexley - Thames Road North FDMS | Roadside
BX7 | Bexley - Thames Road North | Roadside
BX8 | Bexley - Thames Road South | Roadside
BX9 | Bexley - Slade Green FDMS | Suburban
BY1 | Bromley - Rent Office | Urban Background
BY4 | Bromley - Tweedy Rd | Roadside
BY5 | Bromley - Biggin Hill | Suburban
BY7 | Bromley - Harwood Avenue | Roadside
CA1 | Crawley Background | Urban Background
CA2 | Crawley - Gatwick Airport | Urban Background
CB1 | Chelmsford - Fire Station | Roadside
CB2 | Chelmsford - Springfield Road | Roadside
CB3 | Chelmsford - Chignal St James | Urban Background
CB4 | Chelmsford - Baddow Road | Roadside
CC1 | Colchester - Lucy Lane South | Roadside
CC2 | Colchester - Brook Street | Roadside
CC3 | Colchester - Mersea Road | Roadside
CD1 | Camden - Swiss Cottage | Kerbside
CD3 | Camden - Shaftesbury Avenue | Roadside
CD4 | Camden - St Martins College (NOX
| Urban Background
CD5 | Camden - St Martins College (NOX 2) | Urban Background
CD7 | Camden - Swiss Cottage Partisol | Kerbside
CD9 | Camden - Euston Road | Roadside
CF1 | Cardiff Aethalometer | Urban Background
CH1 | Cheltenham | Urban Background
CI1 | Chichester - A27 Chichester Bypass | Roadside
CI4 | Chichester - Orchard Street | Roadside
CK1 | Cookstown | Suburban
CP1 | Castle Point - Canvey Island | Urban Background
CR2 | Croydon - Purley Way | Roadside
CR3 | Croydon - Thornton Heath | Suburban
CR4 | Croydon - George Street | Roadside
CR5 | Croydon - Norbury | Kerbside
CR6 | Croydon - Euston Road | Suburban
CT1 | City of London - Senator House | Urban Background
CT2 | City of London - Farringdon Street | Kerbside
CT3 | City of London - Sir John Cass School | Urban Background
CT4 | City of London - Beech Street | Roadside
CT6 | City of London - Walbrook Wharf | Roadside
CT8 | City of London - Upper Thames Street | Roadside
CY1 | Crystal Palace - Crystal Palace Parade | Roadside
DC1 | Dacorum 1 Hemel Hempstead (Background) | Urban Background
DC2 | Dacorum 2 Hemel Hempstead (Background) | Urban Background
DC3 | High Street Northchurch | Roadside
DE1 | Derry City - Brandywell | Urban Background
DE2 | Derry City - Dales Corner | Roadside
DM1 | Dunmurry Aethalometer | Urban Background
EA0 | Ealing - Acton Town Hall FDMS | Roadside
EA1 | Ealing - Ealing Town Hall | Urban Background
EA2 | Ealing - Acton Town Hall | Roadside
EA3 | Ealing 3 - A40 East Acton | Roadside
EA4 | Ealing Mobile - Hamilton Road | Roadside
EA5 | Ealing Mobile - Southall | Roadside
EA6 | Ealing - Hanger Lane Gyratory | Roadside
EA7 | Ealing - Southall | Urban Background
EA8 | Ealing - Horn Lane | Industrial
EA9 | Ealing - Court Way | Roadside
EB1 | Eastbourne - Devonshire Park | Urban Background
EB3 | Eastbourne - Holly Place | Urban Background
EH1 | E Herts Throcking (Rural) | Rural
EH2 | East Herts Sawbridgeworth (Background) | Urban Background
EH3 | East Herts Sawbridgeworth (Roadside) | Roadside
EH4 | East Herts Ware | Roadside
EH5 | East Herts Bishops Stortford | Roadside
EI0 | Ealing - Greenford | Urban Background
EI1 | Ealing - Western Avenue | Roadside
EL1 | Elmbridge - Bell Farm Hersham | Urban Background
EL2 | Elmbridge - Esher High Street | Roadside
EL3 | Elmbridge - Hampton Court Parade | Roadside
EL4 | Elmbridge - Walton High Street | Kerbside
EN1 | Enfield - Bushhill Park | Suburban
EN2 | Enfield
Church Street | Roadside
EN3 | Enfield - Salisbury School | Urban Background
EN4 | Enfield - Derby Road | Roadside
EN5 | Enfield - Bowes Primary School | Roadside
FB1 | Rushmoor - Medway Drive | Roadside
GB0 | Greenwich and Bexley - Falconwood FDMS | Roadside
GB6 | Greenwich and Bexley - Falconwood | Roadside
GL1 | Glasgow Centre | Suburban
GL4 | Glasgow Centre Aethalometer | Suburban
GN0 | Greenwich - A206 Burrage Grove | Roadside
GN2 | Greenwich - Millennium Village | Industrial
GN3 | Greenwich - Plumstead High Street | Roadside
GN4 | Greenwich - Fiveways Sidcup Rd A20 | Roadside
GR4 | Greenwich - Eltham | Suburban
GR5 | Greenwich - Trafalgar Road | Roadside
GR7 | Greenwich - Blackheath | Roadside
GR8 | Greenwich - Woolwich Flyover | Roadside
GR9 | Greenwich - Westhorne Avenue | Roadside
HA0 | Harwell - Carbon | Rural
HA1 | Harwell Rural AURN | Rural
HA2 | Harwell Rural PARTISOL | Rural
HA4 | Harwell Rural SMPS | Rural
HA9 | Harwell - Partisol SO4 | Urban Background
HF1 | Hammersmith and Fulham - Broadway | Roadside
HF2 | Hammersmith and Fulham - Brook Green | Urban Background
HF3 | Hammersmith and Fulham - Scrubs Lane | Kerbside
HG1 | Haringey - Haringey Town Hall | Roadside
HG2 | Haringey - Priory Park | Urban Background
HG3 | Haringey - Bounds Green | Roadside
HI0 | Hillingdon - Sipson Road | Suburban
HI1 | Hillingdon - South Ruislip | Roadside
HI2 | Hillingdon - Hillingdon Hospital | Roadside
HI3 | Hillingdon - Oxford Avenue | Roadside
HK4 | Hackney - Clapton | Urban Background
HK6 | Hackney - Old Street | Roadside
HL1 | Halifax Aethalometer | Urban Background
HM1 | Hertsmere Borehamwood 1 (Background) | Urban Background
HM4 | Hertsmere - Borehamwood | Urban Background
HO1 | Horsham Background | Urban Background
HO2 | Horsham - Park Way | Roadside
HO4 | Horsham - Storrington | Roadside
HO5 | Horsham - Cowfold | Roadside
HR1 | Harrow - Stanmore | Urban Background
HR2 | Harrow - Pinner Road | Roadside
HS1 | Hounslow - Brentford | Roadside
HS2 | Hounslow - Cranford | Suburban
HS3 | Hounslow - Brentford | Roadside
HS4 | Hounslow - Chiswick High Road | Roadside
HS5 | Hounslow - Brentford | Roadside
HS6 | Hounslow - Heston Road | Roadside
HS7 | Hounslow - Hatton Cross | Urban Background
HS9 | Hounslow - Feltham | Roadside
HT1 | Hastings - Bulverhythe | Roadside
HT2 | Hastings - Fresh Fields | Roadside
HV1 | Havering - Rainham | Roadside
HV2 | Havering - Harold Hill | Suburban
HV3 | Havering - Romford | Roadside
HX0 | Birmingham Tyburn Aethalometer | Urban Background
IC6 | City of London
Walbrook Wharf Indoor | Roadside
IG4 | Greenwich - Eltham Ecology Centre Indoor | Urban Background
IS1 | Islington - Upper Street | Urban Background
IS2 | Islington - Holloway Road | Roadside
IS4 | Islington - Foxham Gardens | Urban Background
IS5 | Islington - Duncan Terrace | Roadside
IS6 | Islington - Arsenal | Urban Background
IT2 | Tower Hamlets - Mile End Road | Roadside
KB1 | South Kirkby Aethalometer | Urban Background
KC0 | North Kensington - Carbon | Urban Background
KC1 | Kensington and Chelsea - North Ken | Urban Background
KC2 | Kensington and Chelsea - Cromwell Road | Roadside
KC3 | Kensington and Chelsea - Knightsbridge | Roadside
KC4 | Kensington and Chelsea - Kings Road | Roadside
KC5 | Kensington and Chelsea - Earls Court Rd | Kerbside
KC7 | Kensington and Chelsea - North Ken FDMS | Urban Background
KC9 | North Kensington - Partisol SO4 | Urban Background
KT1 | Kingston - Chessington | Suburban
KT2 | Kingston - Town Centre | Roadside
LA1 | Luton Airport | Urban Background
LB1 | Lambeth - Christchurch Road | Roadside
LB2 | Lambeth - Vauxhall Cross | Roadside
LB3 | Lambeth - Loughborough Junct | Urban Background
LB4 | Lambeth - Brixton Road | Kerbside
LB5 | Lambeth - Bondway Interchange | Roadside
LB6 | Lambeth - Streatham Green | Urban Background
LH0 | Hillingdon - Harlington | Urban Background
LH2 | Heathrow Airport | Urban Background
LL1 | Lullington Heath Rural AURN | Rural
LN1 | Luton - Challney Community College | Urban Background
LS1 | Lewes - Telscombe Cliffs | Roadside
LS2 | Lewes - Commercial Square | Roadside
LS4 | Newhaven - Denton School | Urban Background
LW1 | Lewisham - Catford | Urban Background
LW2 | Lewisham - New Cross | Roadside
LW3 | Lewisham
Mercury Way | Industrial
MA1 | Manchester Piccadilly CPC | Urban Background
MA2 | Manchester Piccadilly | Urban Background
MD1 | Mid Beds Biggleswade (Roadside) | Roadside
MD2 | Mid Beds Silsoe (Rural) | Rural
MD3 | Central Beds - Sandy | Roadside
MD4 | Central Beds - Marston Vale | Rural
ME1 | Merton - Morden Civic Centre | Roadside
MP1 | Marchwood Power - Marchwood | Industrial
MP2 | Marchwood Power - Millbrook Rd Soton | Industrial
MR3 | Marylebone Road Aethalometer | Kerbside
MV1 | Mole Valley - Leatherhead | Rural
MV2 | Mole Valley - Lower Ashtead | Suburban
MV3 | Mole Valley - Dorking | Urban Background
MW1 | Windsor and Maidenhead - Frascati Way | Roadside
MW2 | Windsor and Maidenhead - Clarence Road | Roadside
MW3 | Windsor and Maidenhead - Ascot | Rural
MY0 | Marylebone Road - Carbon | Kerbside
MY1 | Westminster - Marylebone Road | Kerbside
MY7 | Westminster - Marylebone Road FDMS | Kerbside
NA5 | Newtownabbey- Mallusk | Urban Background
NA6 | Newtownabbey- Shore Road | Roadside
NE2 | Port Talbot TEOM and CPC | Urban Background
NF1 | New Forest - Holbury | Industrial
NF2 | New Forest - Fawley | Industrial
NF3 | New Forest - Ringwood | Urban Background
NF4 | New Forest - Totton | Roadside
NF5 | New Forest - Lyndhurst | Roadside
NH1 | North Herts Mobile - Baldock 1 | Roadside
NH2 | North Herts Mobile - Baldock 2 | Roadside
NH3 | North Herts Mobile - Royston | Urban Background
NH4 | North Herts - Breechwood Green | Urban Background
NH5 | North Herts - Baldock Roadside | Roadside
NH6 | North Herts - Hitchin Library | Roadside
NK1 | North Kensington - CPC | Urban Background
NK3 | North Kensington Aethalometer | Urban Background
NK6 | North Kensington - URG | Urban Background
NM1 | Newham - Tant Avenue | Urban Background
NM2 | Newham - Cam Road | Roadside
NM3 | Newham - Wren Close | Urban Background
NW1 | Norwich Centre Aethalometer | Urban Background
OX0 | Oxford Centre Roadside AURN | Urban Background
OX1 | South Oxfordshire - Henley | Roadside
OX2 | South Oxfordshire - Wallingford | Roadside
OX3 | South Oxfordshire - Watlington | Roadside
OX4 | Oxford St Ebbes AURN | Urban Background
PO1 | Portsmouth Background AURN | Urban Background
PT6 | Port Talbot Dyffryn School | Industrial
RB1 | Redbridge - Perth Terrace | Urban Background
RB2 | Redbridge - Ilford Broadway | Kerbside
RB3 | Redbridge - Fullwell Cross | Kerbside
RB4 | Redbridge - Gardner Close | Roadside
RB5 | Redbridge - South Woodford | Roadside
RD0 | Reading AURN - New Town | Urban Background
RD1 | Reading - Caversham Road | Roadside
RD2 | Reading - Kings Road | Roadside
RD3 | Reading - Oxford Road | Roadside
RG1 | Reigate and Banstead - Horley | Suburban
RG2 | Reigate and Banstead - Horley South | Suburban
RG3 | Reigate and Banstead - Poles Lane | Rural
RG4 | Reigate and Banstead - Reigate High St | Kerbside
RHA | Richmond - Lower Mortlake Road | Roadside
RHB | Richmond - Lower Mortlake Road | Roadside
RI1 | Richmond - Castelnau | Roadside
RI2 | Richmond - Barnes Wetlands | Suburban
RI5 | Richmond Mobile - St Margarets | Kerbside
RI6 | Richmond Mobile
St Margarets | Kerbside
RI7 | Richmond Mobile - Richmond Park | Suburban
RI8 | Richmond Mobile - Richmond Park | Suburban
RIA | Richmond Mobile - George Street | Kerbside
RIB | Richmond Mobile - George Street | Kerbside
RIC | Richmond Mobile - Kew Rd | Kerbside
RID | Richmond Mobile - Kew Rd | Kerbside
RIE | Richmond Mobile
Richmond Rd Twickenham | Roadside
RIF | Richmond Mobile - Richmond Rd Twickenham | Roadside
RIG | Richmond Mobile - Upper Teddington Rd | Roadside
RIH | Richmond Mobile - Upper Teddington Rd | Roadside
RII | Richmond Mobile - Somerset Rd Teddington | Urban Background
RIJ | Richmond Mobile - Somerset Rd Teddington | Urban Background
RIK | Richmond Mobile - St. Margarets Grove | Urban Background
RIL | Richmond Mobile - St. Margarets Grove | Urban Background
RIM | Richmond Mobile - Petersham Rd Ham | Roadside
RIN | Richmond Mobile - Petersham Rd Ham | Roadside
RIO | Richmond Mobile - Stanley Rd Twickenham | Roadside
RIP | Richmond Mobile - Stanley Rd Twickenham | Roadside
RIQ | Richmond Mobile - Richmond Rd Twickenham | Roadside
RIR | Richmond Mobile - Richmond Rd Twickenham | Roadside
RIS | Richmond Mobile - Lincoln Ave Twickenham | Roadside
RIU | Richmond Mobile - Mortlake Rd Kew | Roadside
RIW | Richmond - Upper Teddington Road | Roadside
RIY | Richmond - Hampton Court Road | Kerbside
RO1 | Rochford - Rayleigh High Street | Roadside
RY1 | Rother - Rye Harbour | Rural
RY2 | Rother - De La Warr Road | Roadside
SA1 | St Albans - Fleetville | Urban Background
SB1 | South Beds - Dunstable | Urban Background
SC1 | Sevenoaks 1 | Suburban
SD1 | Southend-on-Sea AURN | Urban Background
SE1 | Stevenage - Lytton Way | Roadside
SH1 | Southampton Background AURN | Urban Background
SH2 | Southampton - Redbridge | Roadside
SH3 | Southampton - Onslow Road | Roadside
SH4 | Southampton - Bitterne | Urban Background
SK1 | Southwark - Larcom Street | Urban Background
SK2 | Southwark - Old Kent Road | Roadside
SK5 | Southwark - A2 Old Kent Road | Roadside
SL1 | Sunderland Aethalometer | Urban Background
ST1 | Sutton - Robin Hood School | Roadside
ST2 | Sutton - North Cheam | Urban Background
ST3 | Sutton - Carshalton | Suburban
ST4 | Sutton - Wallington | Kerbside
ST5 | Sutton - Beddington Lane | Industrial
ST6 | Sutton - Worcester Park | Kerbside
ST7 | Sutton - Therapia Lane | Industrial
SU1 | Sussex Mobile10 Stockbridge | Kerbside
SU2 | Sussex Mobile11 Jct Whitley Rd | Kerbside
SU3 | Sussex Mobile12 Cowfold | Kerbside
SU4 | Sussex Mobile 13 Newhaven | Roadside
SU5 | Sussex Mobile 14 Crawley | Roadside
SU6 | Sussex Mobile15 Chichester County Hall | Urban Background
SU7 | Sussex Mobile 16 Warnham | Rural
SU8 | Sussex Mobile 17 Newhaven Paradise Park | Roadside
SX1 | Sussex Mobile 1 | Urban Background
SX2 | Sussex Mobile 2 North Berstead | Roadside
SX3 | Sussex Mobile 3 | Roadside
SX4 | Sussex Mobile 4 Adur | Roadside
SX5 | Sussex Mobile 5 Fresh Fields Rd Hastings | Roadside
SX6 | Sussex Mobile 6 Orchard St Chichester | Roadside
SX7 | Sussex Mobile 7 New Road Newhaven | Roadside
SX8 | Sussex Mobile 8 Arundel | Kerbside
SX9 | Sussex Mobile 9 Newhaven Kerbside | Kerbside
TD0 | Richmond - National Physical Laboratory | Suburban
TE0 | Tendring St Osyth AURN | Rural
TE1 | Tendring - Town Hall | Roadside
TH1 | Tower Hamlets - Poplar | Urban Background
TH2 | Tower Hamlets - Mile End Road | Roadside
TH3 | Tower Hamlets - Bethnal Green | Urban Background
TH4 | Tower Hamlets - Blackwall | Roadside
TK1 | Thurrock - London Road (Grays) | Urban Background
TK2 | Thurrock - Purfleet | Roadside
TK3 | Thurrock - Stanford-le-Hope | Roadside
TK8 | Thurrock - London Road (Purfleet) | Roadside
TR1 | Three Rivers - Rickmansworth | Urban Background
UT1 | Uttlesford - Saffron Walden Fire Station | Roadside
UT2 | Uttlesford - Takeley | Urban Background
UT3 | Uttlesford - Broxted Farm | Rural
VS1 | Westminster - Victoria Street | Kerbside
WA1 | Wandsworth - Garratt Lane | Roadside
WA2 | Wandsworth - Town Hall | Urban Background
WA3 | Wandsworth - Roehampton | Rural
WA4 | Wandsworth - High Street | Roadside
WA6 | Wandsworth - Tooting | Roadside
WA7 | Wandsworth - Putney High Street | Kerbside
WA8 | Wandsworth - Putney High Street Facade | Roadside
WA9 | Wandsworth - Putney | Urban Background
WE0 | Kensington and Chelsea - Pembroke Road | Urban Background
WF1 | Watford (Roadside) | Roadside
WF2 | Watford - Watford Town Hall | Roadside
WH1 | Welwyn Hatfield - Council Offices | Urban Background
WL1 | Waltham Forest - Dawlish Road | Urban Background
WL2 | Waltham Forest - Mobile | Roadside
WL3 | Waltham Forest - Chingford | Roadside
WL4 | Waltham Forest - Crooked Billet | Kerbside
WL5 | Waltham Forest - Leyton | Roadside
WM0 | Westminster - Horseferry Road | Urban Background
WM3 | Westminster - Hyde Park Partisol | Roadside
WM4 | Westminster - Charing Cross Library | Roadside
WM5 | Westminster - Covent Garden | Urban Background
WM6 | Westminster - Oxford St | Kerbside
WR1 | Bradford Town Hall Aethalometer | Urban Background
WT1 | Worthing - Grove Lodge | Kerbside
XB1 | Bletchley | Rural
XS1 | Shukri Outdoor | Industrial
XS2 | Shukri Indoor | Industrial
XS3 | Osiris mobile | Urban Background
YH1 | Harrogate Roadside | Roadside
ZA1 | Ashford Rural - Pluckley | Rural
ZA2 | Ashford Roadside | Roadside
ZA3 | Ashford Background | Urban Background
ZA4 | Ashford M20 Background | Urban Background
ZC1 | Chatham Roadside - A2 | Roadside
ZD1 | Dover Roadside - Town Hall | Roadside
ZD2 | Dover Roadside - Townwall Street | Roadside
ZD3 | Dover Background - Langdon Cliff | Urban Background
ZD4 | Dover Background - East Cliff | Urban Background
ZD5 | Dover Coast Guard Met | Urban Background
ZD6 | Dover Docks | Industrial
ZF1 | Folkestone Suburban - Cheriton | Suburban
ZG1 | Gravesham Backgrnd - Northfleet | Urban Background
ZG2 | Gravesham Roadside - A2 | Roadside
ZG3 | Gravesham Ind Bgd - Northfleet | Urban Background
ZH1 | Thanet Rural - Minster | Rural
ZH2 | Thanet Background - Margate | Urban Background
ZH3 | Thanet Airport - Manston | Urban Background
ZH4 | Thanet Roadside - Ramsgate | Roadside
ZL1 | Luton Background | Urban Background
ZM1 | Maidstone Meteorological | Urban Background
ZM2 | Maidstone Roadside - Fairmeadow | Kerbside
ZM3 | Maidstone Rural - Detling | Rural
ZR1 | Dartford Roadside - St Clements | Kerbside
ZR2 | Dartford Roadside 2 - Town Centre | Roadside
ZR3 | Dartford Roadside 3 - Bean Interchange | Roadside
ZS1 | Stoke Rural AURN | Rural
ZT1 | Tonbridge Roadside - Town Centre | Roadside
ZT2 | Tunbridge Wells Background - Town Hall | Urban Background
ZT3 | Tunbridge Wells Rural - Southborough | Rural
ZT4 | Tunbridge Wells Roadside - St Johns | Roadside
ZT5 | Tonbridge Roadside 2 - High St | Roadside
ZV1 | Sevenoaks - Greatness Park | Urban Background
ZV2 | Sevenoaks - Bat and Ball | Roadside
ZW1 | Swale Roadside - Ospringe A2 | Roadside
ZW2 | Swale Background - Sheerness | Urban Background
ZW3 | Swale Roadside 2 - Ospringe Street | Roadside
ZY1 | Canterbury Backgrnd - Chaucer TS | Urban Background
ZY2 | Canterbury Roadside - St Dunstans | Roadside
ZY4 | Canterbury St Peters Place | Roadside
Returns a data frame of hourly mean values with date in POSIXct class and time zone GMT.
David Carslaw and Ben Barratt
Other import functions:
importADMS()
,
importAURN()
,
importEurope()
,
importMeta()
,
importTraj()
,
importUKAQ()
## import all pollutants from Marylebone Rd from 1990:2009 ## Not run: mary <- importKCL(site = "my1", year = 2000:2009) ## import nox, no2, o3 from Marylebone Road and North Kensignton for 2000 ## Not run: thedata <- importKCL(site = c("my1", "kc1"), year = 2000, pollutant = c("nox", "no2", "o3")) ## End(Not run) ## import met data too... ## Not run: my1 <- importKCL(site = "my1", year = 2008, met = TRUE)
## import all pollutants from Marylebone Rd from 1990:2009 ## Not run: mary <- importKCL(site = "my1", year = 2000:2009) ## import nox, no2, o3 from Marylebone Road and North Kensignton for 2000 ## Not run: thedata <- importKCL(site = c("my1", "kc1"), year = 2000, pollutant = c("nox", "no2", "o3")) ## End(Not run) ## import met data too... ## Not run: my1 <- importKCL(site = "my1", year = 2008, met = TRUE)
Function to import meta data for air quality monitoring sites. By default,
the function will return the site latitude, longitude and site type, as well
as the code used in functions like importUKAQ()
, importKCL()
and
importEurope()
. Additional information may optionally be returned.
importMeta(source = "aurn", all = FALSE, year = NA, duplicate = FALSE)
importMeta(source = "aurn", all = FALSE, year = NA, duplicate = FALSE)
source |
One or more air quality networks for which data is available through openair. Available networks include:
|
all |
When |
year |
If a single year is selected, only sites that were open at some
point in that year are returned. If |
duplicate |
Some UK air quality sites are part of multiple networks, so
could appear more than once when |
This function imports site meta data from several networks in the UK and Europe:
"aurn"
, The UK Automatic Urban and Rural Network.
"aqe"
, The Air Quality England Network.
"saqn"
, The Scottish Air Quality Network.
"waqn"
, The Welsh Air Quality Network.
"ni"
, The Northern Ireland Air Quality Network.
"local"
, Locally managed air quality networks in England.
"kcl"
, King's College London networks.
"europe"
, Hourly European data (Air Quality e-Reporting) based on a
simplified version of the {saqgetr}
package.
By default, the function will return the site latitude, longitude and site
type. If the option all = TRUE
is used, much more detailed information is
returned. For most networks, this detailed information includes per-pollutant
summaries, opening and closing dates of sites etc.
Thanks go to Trevor Davies (Ricardo), Dr Stuart Grange (EMPA) and Dr Ben Barratt (KCL) and for making these data available.
A data frame with meta data.
David Carslaw
the networkMap()
function from the openairmaps
package which can
visualise site metadata on an interactive map.
Other import functions:
importADMS()
,
importAURN()
,
importEurope()
,
importKCL()
,
importTraj()
,
importUKAQ()
## Not run: # basic info: meta <- importMeta(source = "aurn") # more detailed information: meta <- importMeta(source = "aurn", all = TRUE) # from the Scottish Air Quality Network: meta <- importMeta(source = "saqn", all = TRUE) # from multiple networks: meta <- importMeta(source = c("aurn", "aqe", "local")) ## End(Not run)
## Not run: # basic info: meta <- importMeta(source = "aurn") # more detailed information: meta <- importMeta(source = "aurn", all = TRUE) # from the Scottish Air Quality Network: meta <- importMeta(source = "saqn", all = TRUE) # from multiple networks: meta <- importMeta(source = c("aurn", "aqe", "local")) ## End(Not run)
Function to import pre-calculated back trajectories using the NOAA HYSPLIT
model. The trajectories have been calculated for a select range of locations
which will expand in time. They cover the last 20 years or so and can be used
together with other openair
functions.
importTraj(site = "london", year = 2009, local = NA, progress = TRUE)
importTraj(site = "london", year = 2009, local = NA, progress = TRUE)
site |
Site code of the network site to import e.g. "london". Only one site can be imported at a time. The following sites are typically available from 2000-2012, although some UK ozone sites go back to 1988 (code, location, lat, lon, year):
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
year |
Year or years to import. To import a sequence of years from
1990 to 2000 use |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
local |
File path to .RData trajectory files run by user and
not stored on the Ricardo web server. These files would have been
generated from the Hysplit trajectory code shown in the appendix
of the openair manual. An example would be |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
progress |
Show a progress bar when many receptors/years are being
imported? Defaults to |
This function imports pre-calculated back trajectories using the HYSPLIT trajectory model (Hybrid Single Particle Lagrangian Integrated Trajectory Model. Back trajectories provide some very useful information for air quality data analysis. However, while they are commonly calculated by researchers it is generally difficult for them to be calculated on a routine basis and used easily. In addition, the availability of back trajectories over several years can be very useful, but again difficult to calculate.
Trajectories are run at 3-hour intervals and stored in yearly files (see below). The trajectories are started at ground-level (10m) and propagated backwards in time.
These trajectories have been calculated using the Global NOAA-NCEP/NCAR reanalysis data archives. The global data are on a latitude-longitude grid (2.5 degree). Note that there are many different meteorological data sets that can be used to run HYSPLIT e.g. including ECMWF data. However, in order to make it practicable to run and store trajectories for many years and sites, the NOAA-NCEP/NCAR reanalysis data is most useful. In addition, these archives are available for use widely, which is not the case for many other data sets e.g. ECMWF. HYSPLIT calculated trajectories based on archive data may be distributed without permission. For those wanting, for example, to consider higher resolution meteorological data sets it may be better to run the trajectories separately.
We are extremely grateful to NOAA for making HYSPLIT available to produce back trajectories in an open way. We ask that you cite HYSPLIT if used in published work.
Users can supply their own trajectory files to plot in openair. These files must have the following fields: date, lat, lon and hour.inc (see details below).
The files consist of the following information:
This is the arrival point time and is repeated the
number of times equal to the length of the back trajectory — typically 96
hours (except early on in the file). The format is POSIXct
. It is this
field that should be used to link with air quality data. See example below.
Receptor number, currently only 1.
The year
Month 1-12
Day of the month 1-31
Hour of the day 0-23 GMT
Number of hours back in time e.g. 0 to -96.
Latitude in decimal format.
Longitude in decimal format.
Height of trajectory (m).
Pressure of trajectory (kPa).
Returns a data frame with pre-calculated back trajectories.
The trajectories were run using the February 2011 HYSPLIT model. The function is primarily written to investigate a single site at a time for a single year. The trajectory files are quite large and care should be exercised when importing several years and/or sites.
David Carslaw
Other import functions:
importADMS()
,
importAURN()
,
importEurope()
,
importKCL()
,
importMeta()
,
importUKAQ()
Other trajectory analysis functions:
trajCluster()
,
trajLevel()
,
trajPlot()
## import trajectory data for London in 2009 ## Not run: mytraj <- importTraj(site = "london", year = 2009) ## combine with measurements ## Not run: theData <- importAURN(site = "kc1", year = 2009) mytraj <- merge(mytraj, theData, by = "date") ## End(Not run)
## import trajectory data for London in 2009 ## Not run: mytraj <- importTraj(site = "london", year = 2009) ## combine with measurements ## Not run: theData <- importAURN(site = "kc1", year = 2009) mytraj <- merge(mytraj, theData, by = "date") ## End(Not run)
Functions for importing air pollution data from a range of UK networks
including the Automatic Urban and Rural Network (AURN), the individual
England (AQE), Scotland (SAQN), Wales (WAQN) and Northern Ireland (NI)
Networks, and many "locally managed" monitoring networks across England.
Files are imported from a remote server operated by Ricardo that provides air
quality data files as R data objects. For an up to date list of available
sites that can be imported, see importMeta()
.
importUKAQ( site = "my1", year = 2022, source = "aurn", data_type = "hourly", pollutant = "all", hc = FALSE, meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE )
importUKAQ( site = "my1", year = 2022, source = "aurn", data_type = "hourly", pollutant = "all", hc = FALSE, meta = FALSE, meteo = TRUE, ratified = FALSE, to_narrow = FALSE, verbose = FALSE, progress = TRUE )
site |
Site code of the site to import, e.g., |
year |
Year(s) to import. To import a series of years use, e.g.,
|
source |
The network to which the
|
data_type |
The type of data to be returned, defaulting to
|
pollutant |
Pollutants to import. If omitted will import all pollutants
from a site. To import only NOx and NO2 for example use |
hc |
Include hydrocarbon measurements in the imported data? Defaults to
|
meta |
Append the site type, latitude and longitude of each selected
|
meteo |
Append modelled meteorological data, if available? Defaults to
|
ratified |
Append |
to_narrow |
Return the data in a "narrow"/"long"/"tidy" format? By
default the returned data is "wide" and has a column for each
pollutant/variable. When |
verbose |
Print messages to the console if hourly data cannot be
imported? Default is |
progress |
Show a progress bar when many sites/years are being imported?
Defaults to |
a tibble
This family of functions has been written to make it easy to import data from across several UK air quality networks. Ricardo have provided .RData files (R workspaces) of all individual sites and years, as well as up to date meta data. These files are updated on a daily basis. This approach requires a link to the Internet to work.
There are several advantages over the web portal approach where .csv files are downloaded.
First, it is quick to select a range of sites, pollutants and periods (see examples below).
Second, storing the data as .RData objects is very efficient as they are about four times smaller than .csv files — which means the data downloads quickly and saves bandwidth.
Third, the function completely avoids any need for data manipulation or setting time formats, time zones etc. The function also has the advantage that the proper site name is imported and used in openair functions.
Users should take care if using data from both openair and web portals (for example, UK AIR). One key difference is that the data provided by openair is date beginning, whereas the web portal provides date ending. Hourly concentrations may therefore appear offset by an hour, for example.
The data are imported by stacking sites on top of one another and will have
field names site
, code
(the site code) and pollutant
.
By default, the function returns hourly average data. However, annual,
monthly, daily and 15 minute data (for SO2) can be returned using the
option data_type
. Annual and monthly data provide whole network
information including data capture statistics.
All units are expressed in mass terms for gaseous species (ug/m3 for NO, NO2, NOx (as NO2), SO2 and hydrocarbons; and mg/m3 for CO). PM10 concentrations are provided in gravimetric units of ug/m3 or scaled to be comparable with these units. Over the years a variety of instruments have been used to measure particulate matter and the technical issues of measuring PM10 are complex. In recent years the measurements rely on FDMS (Filter Dynamics Measurement System), which is able to measure the volatile component of PM. In cases where the FDMS system is in use there will be a separate volatile component recorded as 'v10' and non-volatile component 'nv10', which is already included in the absolute PM10 measurement. Prior to the use of FDMS the measurements used TEOM (Tapered Element Oscillating. Microbalance) and these concentrations have been multiplied by 1.3 to provide an estimate of the total mass including the volatile fraction.
Some sites report hourly and daily PM10 and / or PM2.5. When data_type = "daily"
and there are both hourly and 'proper' daily measurements
available, these will be returned as e.g. "pm2.5" and "gr_pm2.5"; the
former corresponding to data based on original hourly measurements and the
latter corresponding to daily gravimetric measurements.
The function returns modelled hourly values of wind speed (ws
), wind
direction (wd
) and ambient temperature (air_temp
) if available
(generally from around 2010). These values are modelled using the WRF model
operated by Ricardo.
The BAM (Beta-Attenuation Monitor) instruments that have been incorporated into the network throughout its history have been scaled by 1.3 if they have a heated inlet (to account for loss of volatile particles) and 0.83 if they do not have a heated inlet. The few TEOM instruments in the network after 2008 have been scaled using VCM (Volatile Correction Model) values to account for the loss of volatile particles. The object of all these scaling processes is to provide a reasonable degree of comparison between data sets and with the reference method and to produce a consistent data record over the operational period of the network, however there may be some discontinuity in the time series associated with instrument changes.
No corrections have been made to the PM2.5 data. The volatile component of FDMS PM2.5 (where available) is shown in the 'v2.5' column.
David Carslaw, Trevor Davies, and Jack Davison
Other import functions:
importADMS()
,
importAURN()
,
importEurope()
,
importKCL()
,
importMeta()
,
importTraj()
## Not run: # import a single site from the AURN importUKAQ("my1", year = 2022) # import sites from another network importUKAQ(c("bn1", "bn2"), year = 2022, source = "aqe") # import sites across multiple networks importUKAQ(c("my1", "bn1", "bn2"), year = 2022, source = c("aurn", "aqe", "aqe") ) # get "long" format hourly data with a ratification flag importUKAQ( "card", source = "waqn", year = 2022, to_narrow = TRUE, ratified = TRUE ) # import other data types, filtering by pollutant importUKAQ( data_type = "annual", pollutant = c("no2", "pm2.5", "pm10"), source = c("aurn", "aqe") ) ## End(Not run)
## Not run: # import a single site from the AURN importUKAQ("my1", year = 2022) # import sites from another network importUKAQ(c("bn1", "bn2"), year = 2022, source = "aqe") # import sites across multiple networks importUKAQ(c("my1", "bn1", "bn2"), year = 2022, source = c("aurn", "aqe", "aqe") ) # get "long" format hourly data with a ratification flag importUKAQ( "card", source = "waqn", year = 2022, to_narrow = TRUE, ratified = TRUE ) # import other data types, filtering by pollutant importUKAQ( data_type = "annual", pollutant = c("no2", "pm2.5", "pm10"), source = c("aurn", "aqe") ) ## End(Not run)
This function considers linear relationships between two pollutants. The relationships are calculated on different times bases using a linear model. The slope and 95% confidence interval in slope relationships by time unit are plotted in many ways. The function is particularly useful when considering whether relationships are consistent with emissions inventories.
linearRelation( mydata, x = "nox", y = "no2", period = "month", condition = FALSE, n = 20, rsq.thresh = 0, ylab = paste0("slope from ", y, " = m.", x, " + c"), auto.text = TRUE, cols = "grey30", date.breaks = 5, plot = TRUE, ... )
linearRelation( mydata, x = "nox", y = "no2", period = "month", condition = FALSE, n = 20, rsq.thresh = 0, ylab = paste0("slope from ", y, " = m.", x, " + c"), auto.text = TRUE, cols = "grey30", date.breaks = 5, plot = TRUE, ... )
mydata |
A data frame minimally containing |
x |
First pollutant that when plotted would appear on the x-axis of a
relationship e.g. |
y |
Second pollutant that when plotted would appear on the y-axis of a
relationship e.g. |
period |
A range of different time periods can be analysed. The default
is |
condition |
For |
n |
The minimum number of points to be sent to the linear model.
Because there may only be a few points e.g. hours where two pollutants are
available over one week, |
rsq.thresh |
The minimum correlation coefficient (R2) allowed. If the
relationship between |
ylab |
y-axis title, specified by the user. |
auto.text |
Either |
cols |
Colour for the points and uncertainty intervals. |
date.breaks |
Number of major x-axis intervals to use. The function will
try and choose a sensible number of dates/times as well as formatting the
date/time appropriately to the range being considered. This does not
always work as desired automatically. The user can therefore increase or
decrease the number of intervals by adjusting the value of
|
plot |
Should a plot be produced? |
... |
Other graphical parameters. A useful one to remove the strip with
the date range on at the top of the plot is to set |
The relationships between pollutants can yield some very useful information
about source emissions and how they change. A scatterPlot between two
pollutants is the usual way to investigate the relationship. A linear
regression is useful to test the strength of the relationship. However,
considerably more information can be gleaned by considering different time
periods, such as how the relationship between two pollutants vary over time,
by day of the week, diurnally and so on. The linearRelation
function
does just that - it fits a linear relationship between two pollutants over a
wide range of time periods determined by period
.
linearRelation
function is particularly useful if background
concentrations are first removed from roadside concentrations, as the
increment will relate more directly with changes in emissions. In this
respect, using linearRelation
can provide valuable information on how
emissions may have changed over time, by hour of the day etc. Using the
function in this way will require users to do some basic manipulation with
their data first.
If a data frame is supplied that contains nox
, no2
and
o3
, the y
can be chosen as y = "ox"
. In function will
therefore consider total oxidant slope (sum of NO2 + O3), which can provide
valuable information on likely vehicle primary NO emissions. Note, however,
that most roadside sites do not have ozone measurements and
calcFno2
is the alternative.
an openair object
David Carslaw
# monthly relationship between NOx and SO2 - note rapid fall in # ratio at the beginning of the series linearRelation(mydata, x = "nox", y = "so2") # monthly relationship between NOx and SO2 - note rapid fall in # ratio at the beginning of the series ## Not run: linearRelation(mydata, x = "nox", y = "ox") # diurnal oxidant slope by year # clear change in magnitude # starting 2003, but the diurnal profile has also changed: the # morning and evening peak hours are more important, presumably # due to change in certain vehicle types ## Not run: linearRelation(mydata, x = "nox", y = "ox", period = "hour", condition = TRUE) # PM2.5/PM10 ratio, but only plot where monthly R2 >= 0.8 ## Not run: linearRelation(mydata, x = "pm10", y = "pm25", rsq.thresh = 0.8)
# monthly relationship between NOx and SO2 - note rapid fall in # ratio at the beginning of the series linearRelation(mydata, x = "nox", y = "so2") # monthly relationship between NOx and SO2 - note rapid fall in # ratio at the beginning of the series ## Not run: linearRelation(mydata, x = "nox", y = "ox") # diurnal oxidant slope by year # clear change in magnitude # starting 2003, but the diurnal profile has also changed: the # morning and evening peak hours are more important, presumably # due to change in certain vehicle types ## Not run: linearRelation(mydata, x = "nox", y = "ox", period = "hour", condition = TRUE) # PM2.5/PM10 ratio, but only plot where monthly R2 >= 0.8 ## Not run: linearRelation(mydata, x = "pm10", y = "pm25", rsq.thresh = 0.8)
Function to calculate common numerical model evaluation statistics with flexible conditioning.
modStats( mydata, mod = "mod", obs = "obs", statistic = c("n", "FAC2", "MB", "MGE", "NMB", "NMGE", "RMSE", "r", "COE", "IOA"), type = "default", rank.name = NULL, ... )
modStats( mydata, mod = "mod", obs = "obs", statistic = c("n", "FAC2", "MB", "MGE", "NMB", "NMGE", "RMSE", "r", "COE", "IOA"), type = "default", rank.name = NULL, ... )
mydata |
A data frame. |
mod |
Name of a variable in |
obs |
Name of a variable in |
statistic |
The statistic to be calculated. See details below for a description of each. |
type |
It is also possible to choose More than one type can be considered e.g. |
rank.name |
Simple model ranking can be carried out if |
... |
Arguments passed on to
|
This function is under development and currently provides some common model evaluation statistics. These include (to be mathematically defined later):
, the number of complete pairs of data.
, fraction of predictions within a factor of two.
, the mean bias.
, the mean gross error.
, the normalised mean bias.
, the normalised mean gross error.
, the root mean squared error.
, the Pearson correlation coefficient. Note, can also supply and
argument
method
e.g. method = "spearman"
. Also returned is the
P value of the correlation coefficient, , which may present as
0
for
very low values.
, the Coefficient of Efficiency based on Legates and
McCabe (1999, 2012). There have been many suggestions for measuring model
performance over the years, but the COE is a simple formulation which is easy
to interpret.
A perfect model has a COE = 1. As noted by Legates and McCabe although the COE has no lower bound, a value of COE = 0.0 has a fundamental meaning. It implies that the model is no more able to predict the observed values than does the observed mean. Therefore, since the model can explain no more of the variation in the observed values than can the observed mean, such a model can have no predictive advantage.
For negative values of COE, the model is less effective than the observed mean in predicting the variation in the observations.
, the
Index of Agreement based on Willmott et al. (2011), which spans between -1
and +1 with values approaching +1 representing better model performance.
An IOA of 0.5, for example, indicates that the sum of the error-magnitudes is one half of the sum of the observed-deviation magnitudes. When IOA = 0.0, it signifies that the sum of the magnitudes of the errors and the sum of the observed-deviation magnitudes are equivalent. When IOA = -0.5, it indicates that the sum of the error-magnitudes is twice the sum of the perfect model-deviation and observed-deviation magnitudes. Values of IOA near -1.0 can mean that the model-estimated deviations about O are poor estimates of the observed deviations; but, they also can mean that there simply is little observed variability - so some caution is needed when the IOA approaches -1.
All statistics are based on complete pairs of mod
and obs
.
Conditioning is possible through setting type
, which can be a vector
e.g. type = c("weekday", "season")
.
Returns a data frame with model evaluation statistics.
David Carslaw
Legates DR, McCabe GJ. (1999). Evaluating the use of goodness-of-fit measures in hydrologic and hydroclimatic model validation. Water Resources Research 35(1): 233-241.
Legates DR, McCabe GJ. (2012). A refined index of model performance: a rejoinder, International Journal of Climatology.
Willmott, C.J., Robeson, S.M., Matsuura, K., 2011. A refined index of model performance. International Journal of Climatology.
Other model evaluation functions:
TaylorDiagram()
,
conditionalEval()
,
conditionalQuantile()
## the example below is somewhat artificial --- assuming the observed ## values are given by NOx and the predicted values by NO2. modStats(mydata, mod = "no2", obs = "nox") ## evaluation stats by season modStats(mydata, mod = "no2", obs = "nox", type = "season")
## the example below is somewhat artificial --- assuming the observed ## values are given by NOx and the predicted values by NO2. modStats(mydata, mod = "no2", obs = "nox") ## evaluation stats by season modStats(mydata, mod = "no2", obs = "nox", type = "season")
The mydata dataset is provided as an example dataset as part of the openair package. The dataset contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite between 1st January 1998 and 23rd June 2005. The data set is a tibble.
mydata
mydata
Data frame with 65533 observations (rows) and 10 variables:
Observation date/time stamp in year-month-day hour:minute:second format (POSIXct).
Wind speed, in m/s, as numeric vector.
Wind direction, in degrees from North, as a numeric vector.
Oxides of nitrogen concentration, in ppb, as a numeric vector.
Nitrogen dioxide concentration, in ppb, as a numeric vector.
Ozone concentration, in ppb, as a numeric vector.
Particulate PM10 fraction measurement, in ug/m3 (raw TEOM), as a numeric vector.
Sulfur dioxide concentration, in ppb, as a numeric vector.
Carbon monoxide concentration, in ppm, as a numeric vector.
Particulate PM2.5 fraction measurement, in ug/m3, as a numeric vector.
mydata
is supplied with the openair
package as an example
dataset for use with documented examples.
openair
functions generally require data frames with a field
"date" that can be in either POSIXct
or Date
format
mydata
was compiled from data archived in the London Air
Quality Archive. See https://www.londonair.org.uk for site details.
#basic structure head(mydata)
#basic structure head(mydata)
This in primarily an internal openair function to make it easy for users to select particular colour schemes, or define their own range of colours of a user-defined length.
openColours(scheme = "default", n = 100)
openColours(scheme = "default", n = 100)
scheme |
Any one of the pre-defined |
n |
number of colours required. |
A character vector of hex codes
The following schemes are made available by openColours()
:
Sequential Colours:
"default", "increment", "brewer1", "heat", "jet", "turbo", "hue", "greyscale".
Simplified versions of the viridis
colours: "viridis", "plasma",
"magma", "inferno", "cividis", and "turbo".
Simplified versions of the RColorBrewer
sequential palettes: "Blues", "BuGn",
"BuPu", "GnBu", "Greens", "Greys", "Oranges", "OrRd", "PuBu", "PuBuGn",
"PuRd", "Purples", "RdPu", "Reds", "YlGn", "YlGnBu", "YlOrBr", "YlOrRd".
Diverging Palettes:
Simplified versions of the RColorBrewer
diverging palettes: "BrBG",
"PiYG", "PRGn", "PuOr", "RdBu", "RdGy", "RdYlBu", "RdYlGn", "Spectral".
Qualitative Palettes:
Simplified versions of the RColorBrewer
qualitative palettes:
"Accent", "Dark2", "Paired", "Pastel1", "Pastel2", "Set1", "Set2", "Set3".
"cbPalette", a colour-blind safe palette based on the work of https://www.nature.com/articles/nmeth.1618
UK Government Palettes:
"daqi" and "daqi.bands", the colours associated with the UK daily air quality index; "daqi" (a palette of 10 colours, corresponding to each
index value) or "daqi.bands" (4 colours, corresponding to each band - Low,
Moderate, High, and Very High). These colours were taken directly from
https://uk-air.defra.gov.uk/air-pollution/daqi and may be useful in
figures like calendarPlot()
.
"gaf.cat", "gaf.focus" and "gaf.seq", colours recommended by the UK Government Analysis function (https://analysisfunction.civilservice.gov.uk/policy-store/data-visualisation-colours-in-charts/). "gaf.cat" will return the 'categorical' palette (max 6 colours), "gaf.focus" the 'focus' palette (max 2 colours), and "gaf.seq" the 'sequential' palette.
Because of the way many of the schemes have been developed they only exist
over certain number of colour gradations (typically 3–10) — see
?brewer.pal
for actual details. If less than or more than the required
number of colours is supplied then openair
will interpolate the colours.
Each of the pre-defined schemes have merits and their use will depend on a
particular situation. For showing incrementing concentrations, e.g., high
concentrations emphasised, then "default", "heat", "jet", "turbo", and
"increment" are very useful. See also the description of RColorBrewer
schemes for the option scheme
.
To colour-code categorical-type problems, e.g., colours for different pollutants, "hue" and "brewer1" are useful.
When publishing in black and white, "greyscale" is often convenient. With most openair functions, as well as generating a greyscale colour gradient, it also resets strip background and other coloured text and lines to greyscale values.
Failing that, the user can define their own schemes based on R colour
names. To see the full list of names, type colors()
into R.
David Carslaw
Jack Davison
https://uk-air.defra.gov.uk/air-pollution/daqi
https://analysisfunction.civilservice.gov.uk/policy-store/data-visualisation-colours-in-charts/
# to return 5 colours from the "jet" scheme: cols <- openColours("jet", 5) cols # to interpolate between named colours e.g. 10 colours from yellow to # green to red: cols <- openColours(c("yellow", "green", "red"), 10) cols
# to return 5 colours from the "jet" scheme: cols <- openColours("jet", 5) cols # to interpolate between named colours e.g. 10 colours from yellow to # green to red: cols <- openColours(c("yellow", "green", "red"), 10) cols
percentileRose
plots percentiles by wind direction with flexible
conditioning. The plot can display multiple percentile lines or filled areas.
percentileRose( mydata, pollutant = "nox", wd = "wd", type = "default", percentile = c(25, 50, 75, 90, 95), smooth = FALSE, method = "default", cols = "default", angle = 10, mean = TRUE, mean.lty = 1, mean.lwd = 3, mean.col = "grey", fill = TRUE, intervals = NULL, angle.scale = 45, auto.text = TRUE, key.header = NULL, key.footer = "percentile", key.position = "bottom", key = TRUE, alpha = 1, plot = TRUE, ... )
percentileRose( mydata, pollutant = "nox", wd = "wd", type = "default", percentile = c(25, 50, 75, 90, 95), smooth = FALSE, method = "default", cols = "default", angle = 10, mean = TRUE, mean.lty = 1, mean.lwd = 3, mean.col = "grey", fill = TRUE, intervals = NULL, angle.scale = 45, auto.text = TRUE, key.header = NULL, key.footer = "percentile", key.position = "bottom", key = TRUE, alpha = 1, plot = TRUE, ... )
mydata |
A data frame minimally containing |
pollutant |
Mandatory. A pollutant name corresponding to a variable in a
data frame should be supplied e.g. |
wd |
Name of wind direction field. |
type |
It is also possible to choose Type can be up length two e.g. |
percentile |
The percentile value(s) to plot. Must be between 0–100. If
|
smooth |
Should the wind direction data be smoothed using a cyclic spline? |
method |
When |
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “jet” and
|
angle |
Default angle of “spokes” is when |
mean |
Show the mean by wind direction as a line? |
mean.lty |
Line type for mean line. |
mean.lwd |
Line width for mean line. |
mean.col |
Line colour for mean line. |
fill |
Should the percentile intervals be filled (default) or should
lines be drawn ( |
intervals |
User-supplied intervals for the scale e.g. |
angle.scale |
Sometimes the placement of the scale may interfere with an
interesting feature. The user can therefore set |
auto.text |
Either |
key.header |
Adds additional text/labels to the scale key. For example,
passing the options |
key.footer |
see |
key.position |
Location where the scale key is to plotted. Allowed
arguments currently include |
key |
Fine control of the scale key via |
alpha |
The alpha transparency to use for the plotting surface (a value
between 0 and 1 with zero being fully transparent and 1 fully opaque).
Setting a value below 1 can be useful when plotting surfaces on a map using
the package |
plot |
Should a plot be produced? |
... |
Other graphical parameters are passed onto |
percentileRose
calculates percentile levels of a pollutant and plots
them by wind direction. One or more percentile levels can be calculated and
these are displayed as either filled areas or as lines.
The wind directions are rounded to the nearest 10 degrees, consistent with
surface data from the UK Met Office before a smooth is fitted. The levels by
wind direction are optionally calculated using a cyclic smooth cubic spline
using the option smooth
. If smooth = FALSE
then the data are
shown in 10 degree sectors.
The percentileRose
function compliments other similar functions
including windRose
, pollutionRose
,
polarFreq
or polarPlot
. It is most useful for
showing the distribution of concentrations by wind direction and often can
reveal different sources e.g. those that only affect high percentile
concentrations such as a chimney stack.
Similar to other functions, flexible conditioning is available through the
type
option. It is easy for example to consider multiple percentile
values for a pollutant by season, year and so on. See examples below.
percentileRose
also offers great flexibility with the scale used and
the user has fine control over both the range, interval and colour.
an openair object
David Carslaw
Ashbaugh, L.L., Malm, W.C., Sadeh, W.Z., 1985. A residence time probability analysis of sulfur concentrations at ground canyon national park. Atmospheric Environment 19 (8), 1263-1270.
Other polar directional analysis functions:
polarAnnulus()
,
polarCluster()
,
polarDiff()
,
polarFreq()
,
polarPlot()
,
pollutionRose()
,
windRose()
# basic percentile plot percentileRose(mydata, pollutant = "o3") # 50/95th percentiles of ozone, with different colours percentileRose(mydata, pollutant = "o3", percentile = c(50, 95), col = "brewer1") ## Not run: # percentiles of ozone by year, with different colours percentileRose(mydata, type = "year", pollutant = "o3", col = "brewer1") # percentile concentrations by season and day/nighttime.. percentileRose(mydata, type = c("season", "daylight"), pollutant = "o3", col = "brewer1") ## End(Not run)
# basic percentile plot percentileRose(mydata, pollutant = "o3") # 50/95th percentiles of ozone, with different colours percentileRose(mydata, pollutant = "o3", percentile = c(50, 95), col = "brewer1") ## Not run: # percentiles of ozone by year, with different colours percentileRose(mydata, type = "year", pollutant = "o3", col = "brewer1") # percentile concentrations by season and day/nighttime.. percentileRose(mydata, type = c("season", "daylight"), pollutant = "o3", col = "brewer1") ## End(Not run)
Typically plots the concentration of a pollutant by wind direction and as a function of time as an annulus. The function is good for visualising how concentrations of pollutants vary by wind direction and a time period e.g. by month, day of week.
polarAnnulus( mydata, pollutant = "nox", resolution = "fine", local.tz = NULL, period = "hour", type = "default", statistic = "mean", percentile = NA, limits = NULL, cols = "default", width = "normal", min.bin = 1, exclude.missing = TRUE, date.pad = FALSE, force.positive = TRUE, k = c(20, 10), normalise = FALSE, key.header = statistic, key.footer = pollutant, key.position = "right", key = TRUE, auto.text = TRUE, alpha = 1, plot = TRUE, ... )
polarAnnulus( mydata, pollutant = "nox", resolution = "fine", local.tz = NULL, period = "hour", type = "default", statistic = "mean", percentile = NA, limits = NULL, cols = "default", width = "normal", min.bin = 1, exclude.missing = TRUE, date.pad = FALSE, force.positive = TRUE, k = c(20, 10), normalise = FALSE, key.header = statistic, key.footer = pollutant, key.position = "right", key = TRUE, auto.text = TRUE, alpha = 1, plot = TRUE, ... )
mydata |
A data frame minimally containing |
pollutant |
Mandatory. A pollutant name corresponding to a variable in a
data frame should be supplied e.g. |
resolution |
Two plot resolutions can be set: “normal” and “fine” (the default). |
local.tz |
Should the results be calculated in local time that includes
a treatment of daylight savings time (DST)? The default is not to consider
DST issues, provided the data were imported without a DST offset. Emissions
activity tends to occur at local time e.g. rush hour is at 8 am every day.
When the clocks go forward in spring, the emissions are effectively
released into the atmosphere typically 1 hour earlier during the summertime
i.e. when DST applies. When plotting diurnal profiles, this has the effect
of “smearing-out” the concentrations. Sometimes, a useful approach
is to express time as local time. This correction tends to produce
better-defined diurnal profiles of concentration (or other variables) and
allows a better comparison to be made with emissions/activity data. If set
to |
period |
This determines the temporal period to consider. Options are “hour” (the default, to plot diurnal variations), “season” to plot variation throughout the year, “weekday” to plot day of the week variation and “trend” to plot the trend by wind direction. |
type |
It is also possible to choose Type can be up length two e.g. Also note that for the |
statistic |
The statistic that should be applied to each wind
speed/direction bin. Can be “mean” (default), “median”,
“max” (maximum), “frequency”. “stdev” (standard
deviation), “weighted.mean” or “cpf” (Conditional Probability
Function). Because of the smoothing involved, the colour scale for some of
these statistics is only to provide an indication of overall pattern and
should not be interpreted in concentration units e.g. for |
percentile |
If |
limits |
The function does its best to choose sensible limits
automatically. However, there are circumstances when the user will wish to
set different ones. An example would be a series of plots showing each year
of data separately. The limits are set in the form |
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “jet” and
|
width |
The width of the annulus; can be “normal” (the default), “thin” or “fat”. |
min.bin |
The minimum number of points allowed in a wind speed/wind
direction bin. The default is 1. A value of two requires at least 2 valid
records in each bin an so on; bins with less than 2 valid records are set
to NA. Care should be taken when using a value > 1 because of the risk of
removing real data points. It is recommended to consider your data with
care. Also, the |
exclude.missing |
Setting this option to |
date.pad |
For |
force.positive |
The default is |
k |
The smoothing value supplied to |
normalise |
If |
key.header |
Adds additional text/labels to the scale key. For example,
passing the options |
key.footer |
see |
key.position |
Location where the scale key is to plotted. Allowed
arguments currently include |
key |
Fine control of the scale key via |
auto.text |
Either |
alpha |
The alpha transparency to use for the plotting surface (a value
between 0 and 1 with zero being fully transparent and 1 fully opaque).
Setting a value below 1 can be useful when plotting surfaces on a map using
the package |
plot |
Should a plot be produced? |
... |
Other graphical parameters passed onto |
The polarAnnulus
function shares many of the properties of the
polarPlot
. However, polarAnnulus
is focussed on displaying
information on how concentrations of a pollutant (values of another variable)
vary with wind direction and time. Plotting as an annulus helps to reduce
compression of information towards the centre of the plot. The circular plot
is easy to interpret because wind direction is most easily understood in
polar rather than Cartesian coordinates.
The inner part of the annulus represents the earliest time and the outer part of the annulus the latest time. The time dimension can be shown in many ways including "trend", "hour" (hour or day), "season" (month of the year) and "weekday" (day of the week). Taking hour as an example, the plot will show how concentrations vary by hour of the day and wind direction. Such plots can be very useful for understanding how different source influences affect a location.
For type = "trend"
the amount of smoothing does not vary linearly with
the length of the time series i.e. a certain amount of smoothing per unit
interval in time. This is a deliberate choice because should one be
interested in a subset (in time) of data, more detail will be provided for
the subset compared with the full data set. This allows users to investigate
specific periods in more detail. Full flexibility is given through the
smoothing parameter k
.
an openair object
David Carslaw
Other polar directional analysis functions:
percentileRose()
,
polarCluster()
,
polarDiff()
,
polarFreq()
,
polarPlot()
,
pollutionRose()
,
windRose()
# diurnal plot for PM10 at Marylebone Rd ## Not run: polarAnnulus(mydata, pollutant = "pm10", main = "diurnal variation in pm10 at Marylebone Road") ## End(Not run) # seasonal plot for PM10 at Marylebone Rd ## Not run: polarAnnulus(mydata, poll="pm10", period = "season") # trend in coarse particles (PMc = PM10 - PM2.5), calculate PMc first mydata$pmc <- mydata$pm10 - mydata$pm25 ## Not run: polarAnnulus(mydata, poll="pmc", period = "trend", main = "trend in pmc at Marylebone Road") ## End(Not run)
# diurnal plot for PM10 at Marylebone Rd ## Not run: polarAnnulus(mydata, pollutant = "pm10", main = "diurnal variation in pm10 at Marylebone Road") ## End(Not run) # seasonal plot for PM10 at Marylebone Rd ## Not run: polarAnnulus(mydata, poll="pm10", period = "season") # trend in coarse particles (PMc = PM10 - PM2.5), calculate PMc first mydata$pmc <- mydata$pm10 - mydata$pm25 ## Not run: polarAnnulus(mydata, poll="pmc", period = "trend", main = "trend in pmc at Marylebone Road") ## End(Not run)
Function for identifying clusters in bivariate polar plots (polarPlot()
);
identifying clusters in the original data for subsequent processing.
polarCluster( mydata, pollutant = "nox", x = "ws", wd = "wd", n.clusters = 6, after = NA, cols = "Paired", angle.scale = 315, units = x, auto.text = TRUE, plot = TRUE, plot.data = FALSE, ... )
polarCluster( mydata, pollutant = "nox", x = "ws", wd = "wd", n.clusters = 6, after = NA, cols = "Paired", angle.scale = 315, units = x, auto.text = TRUE, plot = TRUE, plot.data = FALSE, ... )
mydata |
A data frame minimally containing |
pollutant |
Mandatory. A pollutant name corresponding to a variable in a
data frame should be supplied e.g. |
x |
Name of variable to plot against wind direction in polar coordinates, the default is wind speed, “ws”. |
wd |
Name of wind direction field. |
n.clusters |
Number of clusters to use. If |
after |
The function can be applied to differences between polar plot
surfaces (see polarDiff for details). If an |
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “jet” and
|
angle.scale |
Sometimes the placement of the scale may interfere with an
interesting feature. The user can therefore set |
units |
The units shown on the polar axis scale. |
auto.text |
Either |
plot |
Should a plot be produced? |
plot.data |
By default, the |
... |
Arguments passed on to
|
Bivariate polar plots generated using the polarPlot
function provide a
very useful graphical technique for identifying and characterising different
air pollution sources. While bivariate polar plots provide a useful graphical
indication of potential sources, their location and wind-speed or other
variable dependence, they do have several limitations. Often, a ‘feature’
will be detected in a plot but the subsequent analysis of data meeting
particular wind speed/direction criteria will be based only on the judgement
of the investigator concerning the wind speed-direction intervals of
interest. Furthermore, the identification of a feature can depend on the
choice of the colour scale used, making the process somewhat arbitrary.
polarCluster
applies Partition Around Medoids (PAM) clustering
techniques to polarPlot()
surfaces to help identify potentially interesting
features for further analysis. Details of PAM can be found in the
cluster
package (a core R package that will be pre-installed on all R
systems). PAM clustering is similar to k-means but has several advantages
e.g. is more robust to outliers. The clustering is based on the equal
contribution assumed from the u and v wind components and the associated
concentration. The data are standardized before clustering takes place.
The function works best by first trying different numbers of clusters and
plotting them. This is achieved by setting n.clusters
to be of length
more than 1. For example, if n.clusters = 2:10
then a plot will be
output showing the 9 cluster levels 2 to 10.
The clustering can also be applied to differences in polar plot surfaces (see
polarDiff()
). On this case a second data frame (after
) should be
supplied.
Note that clustering is computationally intensive and the function can take a
long time to run — particularly when the number of clusters is increased.
For this reason it can be a good idea to run a few clusters first to get a
feel for it e.g. n.clusters = 2:5
.
Once the number of clusters has been decided, the user can then run
polarCluster
to return the original data frame together with a new
column cluster
, which gives the cluster number as a character (see
example). Note that any rows where the value of pollutant
is NA
are ignored so that the returned data frame may have fewer rows than the
original.
Note that there are no automatic ways in ensuring the most appropriate number of clusters as this is application dependent. However, there is often a-priori information available on what different features in polar plots correspond to. Nevertheless, the appropriateness of different clusters is best determined by post-processing the data. The Carslaw and Beevers (2012) paper discusses these issues in more detail.
Note that unlike most other openair
functions only a single
type
“default” is allowed.
an openair object. The object includes four main
components: call
, the command used to generate the plot;
data
, by default the original data frame with a new field
cluster
identifying the cluster, clust_stats
giving the
contributions made by each cluster to number of measurements, their
percentage and the percentage by pollutant; and plot
, the plot
itself. Note that any rows where the value of pollutant
is NA
are ignored so that the returned data frame may have fewer rows than the
original.
If the clustering is carried out considering differences, i.e., an
after
data frame is supplied, the output also includes the
after
data frame with cluster identified.
David Carslaw
Carslaw, D.C., Beevers, S.D, Ropkins, K and M.C. Bell (2006). Detecting and quantifying aircraft and other on-airport contributions to ambient nitrogen oxides in the vicinity of a large international airport. Atmospheric Environment. 40/28 pp 5424-5434.
Carslaw, D.C., & Beevers, S.D. (2013). Characterising and understanding emission sources using bivariate polar plots and k-means clustering. Environmental Modelling & Software, 40, 325-329. doi:10.1016/j.envsoft.2012.09.005
Other polar directional analysis functions:
percentileRose()
,
polarAnnulus()
,
polarDiff()
,
polarFreq()
,
polarPlot()
,
pollutionRose()
,
windRose()
Other cluster analysis functions:
timeProp()
,
trajCluster()
## Not run: ## plot 2-8 clusters. Warning! This can take several minutes... polarCluster(mydata, pollutant = "nox", n.clusters = 2:8) # basic plot with 6 clusters results <- polarCluster(mydata, pollutant = "nox", n.clusters = 6) ## get results, could read into a new data frame to make it easier to refer to ## e.g. results <- results$data... head(results$data) ## how many points are there in each cluster? table(results$data$cluster) ## plot clusters 3 and 4 as a timeVariation plot using SAME colours as in ## cluster plot timeVariation(subset(results$data, cluster %in% c("3", "4")), pollutant = "nox", group = "cluster", col = openColours("Paired", 6)[c(3, 4)] ) ## End(Not run)
## Not run: ## plot 2-8 clusters. Warning! This can take several minutes... polarCluster(mydata, pollutant = "nox", n.clusters = 2:8) # basic plot with 6 clusters results <- polarCluster(mydata, pollutant = "nox", n.clusters = 6) ## get results, could read into a new data frame to make it easier to refer to ## e.g. results <- results$data... head(results$data) ## how many points are there in each cluster? table(results$data$cluster) ## plot clusters 3 and 4 as a timeVariation plot using SAME colours as in ## cluster plot timeVariation(subset(results$data, cluster %in% c("3", "4")), pollutant = "nox", group = "cluster", col = openColours("Paired", 6)[c(3, 4)] ) ## End(Not run)
This function provides a way of showing the differences in concentrations between two time periods as a polar plot. There are several uses of this function, but the most common will be to see how source(s) may have changed between two periods.
polarDiff( before, after, pollutant = "nox", x = "ws", limits = NULL, plot = TRUE, ... )
polarDiff( before, after, pollutant = "nox", x = "ws", limits = NULL, plot = TRUE, ... )
before |
A data frame that represents the "before" case. See
|
after |
A data frame that represents the "after" case. See |
pollutant |
Mandatory. A pollutant name corresponding to a variable in a
data frame should be supplied e.g. |
x |
Name of variable to plot against wind direction in polar coordinates, the default is wind speed, “ws”. |
limits |
The function does its best to choose sensible limits
automatically. However, there are circumstances when the user will wish to
set different ones. An example would be a series of plots showing each year
of data separately. The limits are set in the form |
plot |
Should a plot be produced? |
... |
Arguments passed on to
|
While the function is primarily intended to compare two time periods at the same location, it can be used for any two data sets that contain the same pollutant. For example, data from two sites that are close to one another, or two co-located instruments.
The analysis works by calculating the polar plot surface for the
before
and after
periods and then subtracting the before
surface from the after
surface.
an openair plot.
Other polar directional analysis functions:
percentileRose()
,
polarAnnulus()
,
polarCluster()
,
polarFreq()
,
polarPlot()
,
pollutionRose()
,
windRose()
## Not run: before_data <- selectByDate(mydata, year = 2002) after_data <- selectByDate(mydata, year = 2003) polarDiff(before_data, after_data, pollutant = "no2") # with some options polarDiff(before_data, after_data, pollutant = "no2", cols = "RdYlBu", limits = c(-20, 20)) ## End(Not run)
## Not run: before_data <- selectByDate(mydata, year = 2002) after_data <- selectByDate(mydata, year = 2003) polarDiff(before_data, after_data, pollutant = "no2") # with some options polarDiff(before_data, after_data, pollutant = "no2", cols = "RdYlBu", limits = c(-20, 20)) ## End(Not run)
polarFreq
primarily plots wind speed-direction frequencies in
‘bins’. Each bin is colour-coded depending on the frequency of
measurements. Bins can also be used to show the concentration of pollutants
using a range of commonly used statistics.
polarFreq( mydata, pollutant = NULL, statistic = "frequency", ws.int = 1, wd.nint = 36, grid.line = 5, breaks = NULL, cols = "default", trans = TRUE, type = "default", min.bin = 1, ws.upper = NA, offset = 10, border.col = "transparent", key.header = statistic, key.footer = pollutant, key.position = "right", key = TRUE, auto.text = TRUE, alpha = 1, plot = TRUE, ... )
polarFreq( mydata, pollutant = NULL, statistic = "frequency", ws.int = 1, wd.nint = 36, grid.line = 5, breaks = NULL, cols = "default", trans = TRUE, type = "default", min.bin = 1, ws.upper = NA, offset = 10, border.col = "transparent", key.header = statistic, key.footer = pollutant, key.position = "right", key = TRUE, auto.text = TRUE, alpha = 1, plot = TRUE, ... )
mydata |
A data frame minimally containing |
pollutant |
Mandatory. A pollutant name corresponding to a variable in
a data frame should be supplied e.g. |
statistic |
The statistic that should be applied to each wind
speed/direction bin. Can be “frequency”, “mean”,
“median”, “max” (maximum), “stdev” (standard
deviation) or “weighted.mean”. The option “frequency” (the
default) is the simplest and plots the frequency of wind speed/direction
in different bins. The scale therefore shows the counts in each bin. The
option “mean” will plot the mean concentration of a pollutant (see
next point) in wind speed/direction bins, and so on. Finally,
“weighted.mean” will plot the concentration of a pollutant weighted
by wind speed/direction. Each segment therefore provides the percentage
overall contribution to the total concentration. More information is given
in the examples. Note that for options other than “frequency”, it
is necessary to also provide the name of a pollutant. See function
|
ws.int |
Wind speed interval assumed. In some cases e.g. a low met mast, an interval of 0.5 may be more appropriate. |
wd.nint |
Number of intervals of wind direction. |
grid.line |
Radial spacing of grid lines. |
breaks |
The user can provide their own scale. |
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “jet” and
|
trans |
Should a transformation be applied? Sometimes when producing
plots of this kind they can be dominated by a few high points. The default
therefore is |
type |
It is also possible to choose Type can be up length two e.g. |
min.bin |
The minimum number of points allowed in a wind speed/wind
direction bin. The default is 1. A value of two requires at least 2 valid
records in each bin an so on; bins with less than 2 valid records are set
to NA. Care should be taken when using a value > 1 because of the risk of
removing real data points. It is recommended to consider your data with
care. Also, the |
ws.upper |
A user-defined upper wind speed to use. This is useful for
ensuring a consistent scale between different plots. For example, to
always ensure that wind speeds are displayed between 1-10, set
|
offset |
|
border.col |
The colour of the boundary of each wind speed/direction bin. The default is transparent. Another useful choice sometimes is "white". |
key.header |
Adds additional text/labels to the scale key. For example,
passing the options |
key.footer |
see |
key.position |
Location where the scale key is to plotted. Allowed
arguments currently include |
key |
Fine control of the scale key via |
auto.text |
Either |
alpha |
The alpha transparency to use for the plotting surface (a value
between 0 and 1 with zero being fully transparent and 1 fully opaque).
Setting a value below 1 can be useful when plotting surfaces on a map using
the package |
plot |
Should a plot be produced? |
... |
Other graphical parameters passed onto |
polarFreq
is its default use provides details of wind speed and
direction frequencies. In this respect it is similar to
windRose
, but considers wind direction intervals of 10 degrees
and a user-specified wind speed interval. The frequency of wind
speeds/directions formed by these ‘bins’ is represented on a colour
scale.
The polarFreq
function is more flexible than either
windRose()
or polarPlot()
. It can, for example, also
consider pollutant concentrations (see examples below). Instead of the
number of data points in each bin, the concentration can be shown. Further,
a range of statistics can be used to describe each bin - see
statistic
above. Plotting mean concentrations is useful for source
identification and is the same as polarPlot()
but without
smoothing, which may be preferable for some data. Plotting with
statistic = "weighted.mean"
is particularly useful for understanding
the relative importance of different source contributions. For example, high
mean concentrations may be observed for high wind speed conditions, but the
weighted mean concentration may well show that the contribution to overall
concentrations is very low.
polarFreq
also offers great flexibility with the scale used and the
user has fine control over both the range, interval and colour.
an openair object
David Carslaw
Other polar directional analysis functions:
percentileRose()
,
polarAnnulus()
,
polarCluster()
,
polarDiff()
,
polarPlot()
,
pollutionRose()
,
windRose()
# basic wind frequency plot polarFreq(mydata) # wind frequencies by year ## Not run: polarFreq(mydata, type = "year") # mean SO2 by year, showing only bins with at least 2 points ## Not run: polarFreq(mydata, pollutant = "so2", type = "year", statistic = "mean", min.bin = 2) # weighted mean SO2 by year, showing only bins with at least 2 points ## Not run: polarFreq(mydata, pollutant = "so2", type = "year", statistic = "weighted.mean", min.bin = 2) ## End(Not run) #windRose for just 2000 and 2003 with different colours ## Not run: polarFreq(subset(mydata, format(date, "%Y") %in% c(2000, 2003)), type = "year", cols = "turbo") ## End(Not run) # user defined breaks from 0-700 in intervals of 100 (note linear scale) ## Not run: polarFreq(mydata, breaks = seq(0, 700, 100)) # more complicated user-defined breaks - useful for highlighting bins # with a certain number of data points ## Not run: polarFreq(mydata, breaks = c(0, 10, 50, 100, 250, 500, 700)) # source contribution plot and use of offset option ## Not run: polarFreq(mydata, pollutant = "pm25", statistic ="weighted.mean", offset = 50, ws.int = 25, trans = FALSE) ## End(Not run)
# basic wind frequency plot polarFreq(mydata) # wind frequencies by year ## Not run: polarFreq(mydata, type = "year") # mean SO2 by year, showing only bins with at least 2 points ## Not run: polarFreq(mydata, pollutant = "so2", type = "year", statistic = "mean", min.bin = 2) # weighted mean SO2 by year, showing only bins with at least 2 points ## Not run: polarFreq(mydata, pollutant = "so2", type = "year", statistic = "weighted.mean", min.bin = 2) ## End(Not run) #windRose for just 2000 and 2003 with different colours ## Not run: polarFreq(subset(mydata, format(date, "%Y") %in% c(2000, 2003)), type = "year", cols = "turbo") ## End(Not run) # user defined breaks from 0-700 in intervals of 100 (note linear scale) ## Not run: polarFreq(mydata, breaks = seq(0, 700, 100)) # more complicated user-defined breaks - useful for highlighting bins # with a certain number of data points ## Not run: polarFreq(mydata, breaks = c(0, 10, 50, 100, 250, 500, 700)) # source contribution plot and use of offset option ## Not run: polarFreq(mydata, pollutant = "pm25", statistic ="weighted.mean", offset = 50, ws.int = 25, trans = FALSE) ## End(Not run)
Function for plotting pollutant concentration in polar coordinates showing
concentration by wind speed (or another numeric variable) and direction. Mean
concentrations are calculated for wind speed-direction ‘bins’ (e.g.
0-1, 1-2 m/s,... and 0-10, 10-20 degrees etc.). To aid interpretation,
gam
smoothing is carried out using mgcv
.
polarPlot( mydata, pollutant = "nox", x = "ws", wd = "wd", type = "default", statistic = "mean", limits = NULL, exclude.missing = TRUE, uncertainty = FALSE, percentile = NA, cols = "default", weights = c(0.25, 0.5, 0.75), min.bin = 1, mis.col = "grey", upper = NA, angle.scale = 315, units = x, force.positive = TRUE, k = 100, normalise = FALSE, key.header = statistic, key.footer = pollutant, key.position = "right", key = TRUE, auto.text = TRUE, ws_spread = 1.5, wd_spread = 5, x_error = NA, y_error = NA, kernel = "gaussian", formula.label = TRUE, tau = 0.5, alpha = 1, plot = TRUE, ... )
polarPlot( mydata, pollutant = "nox", x = "ws", wd = "wd", type = "default", statistic = "mean", limits = NULL, exclude.missing = TRUE, uncertainty = FALSE, percentile = NA, cols = "default", weights = c(0.25, 0.5, 0.75), min.bin = 1, mis.col = "grey", upper = NA, angle.scale = 315, units = x, force.positive = TRUE, k = 100, normalise = FALSE, key.header = statistic, key.footer = pollutant, key.position = "right", key = TRUE, auto.text = TRUE, ws_spread = 1.5, wd_spread = 5, x_error = NA, y_error = NA, kernel = "gaussian", formula.label = TRUE, tau = 0.5, alpha = 1, plot = TRUE, ... )
mydata |
A data frame minimally containing |
pollutant |
Mandatory. A pollutant name corresponding to a variable in a
data frame should be supplied e.g. |
x |
Name of variable to plot against wind direction in polar coordinates, the default is wind speed, “ws”. |
wd |
Name of wind direction field. |
type |
It is also possible to choose Type can be up length two e.g. |
statistic |
The statistic that should be applied to each wind
speed/direction bin. Because of the smoothing involved, the colour scale
for some of these statistics is only to provide an indication of overall
pattern and should not be interpreted in concentration units e.g. for
|
limits |
The function does its best to choose sensible limits
automatically. However, there are circumstances when the user will wish to
set different ones. An example would be a series of plots showing each year
of data separately. The limits are set in the form |
exclude.missing |
Setting this option to |
uncertainty |
Should the uncertainty in the calculated surface be shown?
If |
percentile |
If
|
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “jet” and
|
weights |
At the edges of the plot there may only be a few data points
in each wind speed-direction interval, which could in some situations
distort the plot if the concentrations are high. An alternative to down-weighting these points they can be removed
altogether using |
min.bin |
The minimum number of points allowed in a wind speed/wind
direction bin. The default is 1. A value of two requires at least 2 valid
records in each bin an so on; bins with less than 2 valid records are set
to NA. Care should be taken when using a value > 1 because of the risk of
removing real data points. It is recommended to consider your data with
care. Also, the |
mis.col |
When |
upper |
This sets the upper limit wind speed to be used. Often there are only a relatively few data points at very high wind speeds and plotting all of them can reduce the useful information in the plot. |
angle.scale |
Sometimes the placement of the scale may interfere with an
interesting feature. The user can therefore set |
units |
The units shown on the polar axis scale. |
force.positive |
The default is |
k |
This is the smoothing parameter used by the |
normalise |
If |
key.header |
Adds additional text/labels to the scale key. For example,
passing the options |
key.footer |
see |
key.position |
Location where the scale key is to plotted. Allowed
arguments currently include |
key |
Fine control of the scale key via |
auto.text |
Either |
ws_spread |
The value of sigma used for Gaussian kernel weighting of
wind speed when |
wd_spread |
The value of sigma used for Gaussian kernel weighting of
wind direction when |
x_error |
The |
y_error |
The |
kernel |
Type of kernel used for the weighting procedure for when
correlation or regression techniques are used. Only |
formula.label |
When pair-wise statistics such as regression slopes are calculated and plotted, should a formula label be displayed? |
tau |
The quantile to be estimated when |
alpha |
The alpha transparency to use for the plotting surface (a value
between 0 and 1 with zero being fully transparent and 1 fully opaque).
Setting a value below 1 can be useful when plotting surfaces on a map using
the package |
plot |
Should a plot be produced? |
... |
Other graphical parameters passed onto |
The bivariate polar plot is a useful diagnostic tool for quickly gaining an idea of potential sources. Wind speed is one of the most useful variables to use to separate source types (see references). For example, ground-level concentrations resulting from buoyant plumes from chimney stacks tend to peak under higher wind speed conditions. Conversely, ground-level, non-buoyant plumes such as from road traffic, tend to have highest concentrations under low wind speed conditions. Other sources such as from aircraft engines also show differing characteristics by wind speed.
The function has been developed to allow variables other than wind speed to be plotted with wind direction in polar coordinates. The key issue is that the other variable plotted against wind direction should be discriminating in some way. For example, temperature can help reveal high-level sources brought down to ground level in unstable atmospheric conditions, or show the effect a source emission dependent on temperature e.g. biogenic isoprene.
The plots can vary considerably depending on how much smoothing is done. The
approach adopted here is based on the very flexible and capable mgcv
package that uses Generalized Additive Models. While methods do exist
to find an optimum level of smoothness, they are not necessarily useful. The
principal aim of polarPlot
is as a graphical analysis rather than for
quantitative purposes. In this respect the smoothing aims to strike a balance
between revealing interesting (real) features and overly noisy data. The
defaults used in polarPlot()
are based on the analysis of data from many
different sources. More advanced users may wish to modify the code and adopt
other smoothing approaches.
Various statistics are possible to consider e.g. mean, maximum, median.
statistic = "max"
is often useful for revealing sources. Pair-wise
statistics between two pollutants can also be calculated.
The function can also be used to compare two pollutant species through a
range of pair-wise statistics (see help on statistic
) and Grange et
al. (2016) (open-access publication link below).
Wind direction is split up into 10 degree intervals and the other variable (e.g. wind speed) 30 intervals. These 2D bins are then used to calculate the statistics.
These plots often show interesting features at higher wind speeds (see
references below). For these conditions there can be very few measurements
and therefore greater uncertainty in the calculation of the surface. There
are several ways in which this issue can be tackled. First, it is possible to
avoid smoothing altogether and use polarFreq()
. Second, the effect of
setting a minimum number of measurements in each wind speed-direction bin can
be examined through min.bin
. It is possible that a single point at
high wind speed conditions can strongly affect the surface prediction.
Therefore, setting min.bin = 3
, for example, will remove all wind
speed-direction bins with fewer than 3 measurements before fitting the
surface. Third, consider setting uncertainty = TRUE
. This option will
show the predicted surface together with upper and lower 95% confidence
intervals, which take account of the frequency of measurements.
Variants on polarPlot
include polarAnnulus()
and polarFreq()
.
an openair object. data
contians four set
columns: cond
, conditioning based on type
; u
and
v
, the translational vectors based on ws
and wd
; and
the local pollutant
estimate.
David Carslaw
Ashbaugh, L.L., Malm, W.C., Sadeh, W.Z., 1985. A residence time probability analysis of sulfur concentrations at ground canyon national park. Atmospheric Environment 19 (8), 1263-1270.
Carslaw, D.C., Beevers, S.D, Ropkins, K and M.C. Bell (2006). Detecting and quantifying aircraft and other on-airport contributions to ambient nitrogen oxides in the vicinity of a large international airport. Atmospheric Environment. 40/28 pp 5424-5434.
Carslaw, D.C., & Beevers, S.D. (2013). Characterising and understanding emission sources using bivariate polar plots and k-means clustering. Environmental Modelling & Software, 40, 325-329. doi:10.1016/j.envsoft.2012.09.005
Henry, R.C., Chang, Y.S., Spiegelman, C.H., 2002. Locating nearby sources of air pollution by nonparametric regression of atmospheric concentrations on wind direction. Atmospheric Environment 36 (13), 2237-2244.
Henry, R., Norris, G.A., Vedantham, R., Turner, J.R., 2009. Source region identification using Kernel smoothing. Environ. Sci. Technol. 43 (11), 4090e4097. http:// dx.doi.org/10.1021/es8011723.
Uria-Tellaetxe, I. and D.C. Carslaw (2014). Source identification using a conditional bivariate Probability function. Environmental Modelling & Software, Vol. 59, 1-9.
Westmoreland, E.J., N. Carslaw, D.C. Carslaw, A. Gillah and E. Bates (2007). Analysis of air quality within a street canyon using statistical and dispersion modelling techniques. Atmospheric Environment. Vol. 41(39), pp. 9195-9205.
Yu, K.N., Cheung, Y.P., Cheung, T., Henry, R.C., 2004. Identifying the impact of large urban airports on local air quality by nonparametric regression. Atmospheric Environment 38 (27), 4501-4507.
Grange, S. K., Carslaw, D. C., & Lewis, A. C. 2016. Source apportionment advances with bivariate polar plots, correlation, and regression techniques. Atmospheric Environment. 145, 128-134. https://www.sciencedirect.com/science/article/pii/S1352231016307166
Other polar directional analysis functions:
percentileRose()
,
polarAnnulus()
,
polarCluster()
,
polarDiff()
,
polarFreq()
,
pollutionRose()
,
windRose()
# Use openair 'mydata' # basic plot polarPlot(openair::mydata, pollutant = "nox") ## Not run: # polarPlots by year on same scale polarPlot(mydata, pollutant = "so2", type = "year", main = "polarPlot of so2") # set minimum number of bins to be used to see if pattern remains similar polarPlot(mydata, pollutant = "nox", min.bin = 3) # plot by day of the week polarPlot(mydata, pollutant = "pm10", type = "weekday") # show the 95% confidence intervals in the surface fitting polarPlot(mydata, pollutant = "so2", uncertainty = TRUE) # Pair-wise statistics # Pearson correlation polarPlot(mydata, pollutant = c("pm25", "pm10"), statistic = "r") # Robust regression slope, takes a bit of time polarPlot(mydata, pollutant = c("pm25", "pm10"), statistic = "robust.slope") # Least squares regression works too but it is not recommended, use robust # regression # polarPlot(mydata, pollutant = c("pm25", "pm10"), statistic = "slope") ## End(Not run)
# Use openair 'mydata' # basic plot polarPlot(openair::mydata, pollutant = "nox") ## Not run: # polarPlots by year on same scale polarPlot(mydata, pollutant = "so2", type = "year", main = "polarPlot of so2") # set minimum number of bins to be used to see if pattern remains similar polarPlot(mydata, pollutant = "nox", min.bin = 3) # plot by day of the week polarPlot(mydata, pollutant = "pm10", type = "weekday") # show the 95% confidence intervals in the surface fitting polarPlot(mydata, pollutant = "so2", uncertainty = TRUE) # Pair-wise statistics # Pearson correlation polarPlot(mydata, pollutant = c("pm25", "pm10"), statistic = "r") # Robust regression slope, takes a bit of time polarPlot(mydata, pollutant = c("pm25", "pm10"), statistic = "robust.slope") # Least squares regression works too but it is not recommended, use robust # regression # polarPlot(mydata, pollutant = c("pm25", "pm10"), statistic = "slope") ## End(Not run)
The traditional wind rose plot that plots wind speed and wind direction by different intervals. The pollution rose applies the same plot structure but substitutes other measurements, most commonly a pollutant time series, for wind speed.
pollutionRose( mydata, pollutant = "nox", key.footer = pollutant, key.position = "right", key = TRUE, breaks = 6, paddle = FALSE, seg = 0.9, normalise = FALSE, alpha = 1, plot = TRUE, ... )
pollutionRose( mydata, pollutant = "nox", key.footer = pollutant, key.position = "right", key = TRUE, breaks = 6, paddle = FALSE, seg = 0.9, normalise = FALSE, alpha = 1, plot = TRUE, ... )
mydata |
A data frame containing fields |
pollutant |
Mandatory. A pollutant name corresponding to a variable in a
data frame should be supplied e.g. |
key.footer |
Adds additional text/labels below the scale key. See
|
key.position |
Location where the scale key is to plotted. Allowed arguments currently include “top”, “right”, “bottom” and “left”. |
key |
Fine control of the scale key via |
breaks |
Most commonly, the number of break points for pollutant
concentrations. The default, 6, attempts to breaks the supplied data at
approximately 6 sensible break points. However, |
paddle |
Either |
seg |
When |
normalise |
If |
alpha |
The alpha transparency to use for the plotting surface (a value
between 0 and 1 with zero being fully transparent and 1 fully opaque).
Setting a value below 1 can be useful when plotting surfaces on a map using
the package |
plot |
Should a plot be produced? |
... |
Arguments passed on to
|
pollutionRose()
is a windRose()
wrapper which brings pollutant
forward in the argument list, and attempts to sensibly rescale break points
based on the pollutant
data range by by-passing ws.int
.
By default, pollutionRose()
will plot a pollution rose of nox
using
"wedge" style segments and placing the scale key to the right of the plot.
It is possible to compare two wind speed-direction data sets using
pollutionRose()
. There are many reasons for doing so e.g. to see how one
site compares with another or for meteorological model evaluation. In this
case, ws
and wd
are considered to the the reference data sets
with which a second set of wind speed and wind directions are to be compared
(ws2
and wd2
). The first set of values is subtracted from the
second and the differences compared. If for example, wd2
was biased
positive compared with wd
then pollutionRose
will show the bias
in polar coordinates. In its default use, wind direction bias is colour-coded
to show negative bias in one colour and positive bias in another.
an openair object. Summarised proportions can be
extracted directly using the $data
operator, e.g.
object$data
for output <- windRose(mydata)
. This returns a
data frame with three set columns: cond
, conditioning based on
type
; wd
, the wind direction; and calm
, the
statistic
for the proportion of data unattributed to any specific
wind direction because it was collected under calm conditions; and then
several (one for each range binned for the plot) columns giving proportions
of measurements associated with each ws
or pollutant
range
plotted as a discrete panel.
Other polar directional analysis functions:
percentileRose()
,
polarAnnulus()
,
polarCluster()
,
polarDiff()
,
polarFreq()
,
polarPlot()
,
windRose()
# pollutionRose of nox pollutionRose(mydata, pollutant = "nox") ## source apportionment plot - contribution to mean ## Not run: pollutionRose(mydata, pollutant = "pm10", type = "year", statistic = "prop.mean") ## End(Not run) ## example of comparing 2 met sites ## first we will make some new ws/wd data with a postive bias mydata$ws2 = mydata$ws + 2 * rnorm(nrow(mydata)) + 1 mydata$wd2 = mydata$wd + 30 * rnorm(nrow(mydata)) + 30 ## need to correct negative wd id <- which(mydata$wd2 < 0) mydata$wd2[id] <- mydata$wd2[id] + 360 ## results show postive bias in wd and ws pollutionRose(mydata, ws = "ws", wd = "wd", ws2 = "ws2", wd2 = "wd2")
# pollutionRose of nox pollutionRose(mydata, pollutant = "nox") ## source apportionment plot - contribution to mean ## Not run: pollutionRose(mydata, pollutant = "pm10", type = "year", statistic = "prop.mean") ## End(Not run) ## example of comparing 2 met sites ## first we will make some new ws/wd data with a postive bias mydata$ws2 = mydata$ws + 2 * rnorm(nrow(mydata)) + 1 mydata$wd2 = mydata$wd + 30 * rnorm(nrow(mydata)) + 30 ## need to correct negative wd id <- which(mydata$wd2 < 0) mydata$wd2[id] <- mydata$wd2[id] + 360 ## results show postive bias in wd and ws pollutionRose(mydata, ws = "ws", wd = "wd", ws2 = "ws2", wd2 = "wd2")
Workhorse function that automatically applies routine text formatting to common expressions and data names used in openair.
quickText(text, auto.text = TRUE)
quickText(text, auto.text = TRUE)
text |
A character vector. |
auto.text |
A logical option. The default, |
quickText
is routine formatting lookup table. It screens the
supplied character vector text
and automatically applies formatting
to any recognised character sub-series. The function is used in a number of
openair
functions and can also be used directly by users to format
text components of their own graphs (see below).
The function returns an expression for graphical evaluation.
Karl Ropkins.
#example 1 ##see axis formatting in an openair plot, e.g.: scatterPlot(mydata, x = "no2", y = "pm10") #example 2 ##using quickText in other plots plot(mydata$no2, mydata$pm10, xlab = quickText("my no2 label"), ylab = quickText("pm10 [ ug.m-3 ]"))
#example 1 ##see axis formatting in an openair plot, e.g.: scatterPlot(mydata, x = "no2", y = "pm10") #example 2 ##using quickText in other plots plot(mydata$no2, mydata$pm10, xlab = quickText("my no2 label"), ylab = quickText("pm10 [ ug.m-3 ]"))
Calculate rollingMean values taking account of data capture thresholds
rollingMean( mydata, pollutant = "o3", width = 8, new.name = "rolling", data.thresh = 75, align = "centre", ... )
rollingMean( mydata, pollutant = "o3", width = 8, new.name = "rolling", data.thresh = 75, align = "centre", ... )
mydata |
A data frame containing a |
pollutant |
The name of a pollutant e.g. |
width |
The averaging period (rolling window width) to use
e.g. |
new.name |
The name given to the new rollingMean variable. If not supplied it will create a name based on the name of the pollutant and the averaging period used. |
data.thresh |
The data capture threshold in %. No values are
calculated if data capture over the period of interest is less
than this value. For example, with |
align |
specifies how the moving window should be
aligned. |
... |
other arguments, currently unused. |
This is a utility function mostly designed to calculate rolling mean statistics relevant to some pollutant limits e.g. 8 hour rolling means for ozone and 24 hour rolling means for PM10. However, the function has a more general use in helping to display rolling mean values in flexible ways e.g. with the rolling window width left, right or centre aligned.
The function will try and fill in missing time gaps to get a full time sequence but return a data frame with the same number of rows supplied.
David Carslaw
## rolling 8-hour mean for ozone mydata <- rollingMean(mydata, pollutant = "o3", width = 8, new.name = "rollingo3", data.thresh = 75, align = "right")
## rolling 8-hour mean for ozone mydata <- rollingMean(mydata, pollutant = "o3", width = 8, new.name = "rollingo3", data.thresh = 75, align = "right")
This function calculates rolling regressions for input data with a set window width. The principal use of teh function is to identify "dilution lines" where the ratio between two pollutant concentrations is invariant. The original idea is based on the work of Bentley (2004).
runRegression(mydata, x = "nox", y = "pm10", run.len = 3, date.pad = TRUE)
runRegression(mydata, x = "nox", y = "pm10", run.len = 3, date.pad = TRUE)
mydata |
A data frame with colums for |
x |
The column name of the |
y |
The column name of the |
run.len |
The window width to be used for a rolling regression. A value of 3 for example for hourly data will consider 3 one-hour time sequences. |
date.pad |
Should gaps in time series be filled before calculations are made? |
The intended use is to apply the approach to air pollution data to extract
consecutive points in time where the ratio between two pollutant
concentrations changes by very little. By filtering the output for high R2
values (typically more than 0.90 to 0.95), conditions where local source
dilution is dominant can be isolated for post processing. The function is
more fully descfibed and used in the openair
online manual, together
with examples.
A tibble with date
and calculated regression coefficients and
other information to plot dilution lines.
For original inspiration:
Bentley, S. T. (2004). Graphical techniques for constraining estimates of aerosol emissions from motor vehicles using air monitoring network data. Atmospheric Environment,(10), 1491–1500. https://doi.org/10.1016/j.atmosenv.2003.11.033
Example for vehicle emissions high time resolution data:
Farren, N. J., Schmidt, C., Juchem, H., Pöhler, D., Wilde, S. E., Wagner, R. L., Wilson, S., Shaw, M. D., & Carslaw, D. C. (2023). Emission ratio determination from road vehicles using a range of remote emission sensing techniques. Science of The Total Environment, 875. https://doi.org/10.1016/j.scitotenv.2023.162621.
Other time series and trend functions:
TheilSen()
,
calendarPlot()
,
smoothTrend()
,
timePlot()
,
timeProp()
,
timeVariation()
,
trendLevel()
# Just use part of a year of data output <- runRegression(selectByDate(mydata, year = 2004, month = 1:3), x = "nox", y = "pm10", run.len = 3) output
# Just use part of a year of data output <- runRegression(selectByDate(mydata, year = 2004, month = 1:3), x = "nox", y = "pm10", run.len = 3) output
Scatter plots with conditioning and three main approaches: conventional scatterPlot, hexagonal binning and kernel density estimates. The former also has options for fitting smooth fits and linear models with uncertainties shown.
scatterPlot( mydata, x = "nox", y = "no2", z = NA, method = "scatter", group = NA, avg.time = "default", data.thresh = 0, statistic = "mean", percentile = NA, type = "default", smooth = FALSE, spline = FALSE, linear = FALSE, ci = TRUE, mod.line = FALSE, cols = "hue", plot.type = "p", key = TRUE, key.title = group, key.columns = 1, key.position = "right", strip = TRUE, log.x = FALSE, log.y = FALSE, x.inc = NULL, y.inc = NULL, limits = NULL, windflow = NULL, y.relation = "same", x.relation = "same", ref.x = NULL, ref.y = NULL, k = NA, dist = 0.02, map = FALSE, auto.text = TRUE, plot = TRUE, ... )
scatterPlot( mydata, x = "nox", y = "no2", z = NA, method = "scatter", group = NA, avg.time = "default", data.thresh = 0, statistic = "mean", percentile = NA, type = "default", smooth = FALSE, spline = FALSE, linear = FALSE, ci = TRUE, mod.line = FALSE, cols = "hue", plot.type = "p", key = TRUE, key.title = group, key.columns = 1, key.position = "right", strip = TRUE, log.x = FALSE, log.y = FALSE, x.inc = NULL, y.inc = NULL, limits = NULL, windflow = NULL, y.relation = "same", x.relation = "same", ref.x = NULL, ref.y = NULL, k = NA, dist = 0.02, map = FALSE, auto.text = TRUE, plot = TRUE, ... )
mydata |
A data frame containing at least two numeric variables to plot. |
x |
Name of the x-variable to plot. Note that x can be a date field or a
factor. For example, |
y |
Name of the numeric y-variable to plot. |
z |
Name of the numeric z-variable to plot for |
method |
Methods include “scatter” (conventional scatter plot),
“hexbin” (hexagonal binning using the |
group |
The grouping variable to use, if any. Setting this to a variable in the data frame has the effect of plotting several series in the same panel using different symbols/colours etc. If set to a variable that is a character or factor, those categories or factor levels will be used directly. If set to a numeric variable, it will split that variable in to quantiles. |
avg.time |
This defines the time period to average to. Can be
“sec”, “min”, “hour”, “day”, “DSTday”,
“week”, “month”, “quarter” or “year”. For much
increased flexibility a number can precede these options followed by a
space. For example, a timeAverage of 2 months would be |
data.thresh |
The data capture threshold to use (\
the data using |
statistic |
The statistic to apply when aggregating the data; default is
the mean. Can be one of "mean", "max", "min", "median", "frequency", "sd",
"percentile". Note that "sd" is the standard deviation and "frequency" is
the number (frequency) of valid records in the period. "percentile" is the
percentile level (\
"percentile" option - see below. Not used if |
percentile |
The percentile level in percent used when |
type |
It is also possible to choose Type can be up length two e.g. |
smooth |
A smooth line is fitted to the data if |
spline |
A smooth spline is fitted to the data if |
linear |
A linear model is fitted to the data if |
ci |
Should the confidence intervals for the smooth/linear fit be shown? |
mod.line |
If |
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “jet” and
|
plot.type |
|
key |
Should a key be drawn? The default is |
key.title |
The title of the key (if used). |
key.columns |
Number of columns to be used in the key. With many
pollutants a single column can make to key too wide. The user can thus
choose to use several columns by setting |
key.position |
Location where the scale key is to plotted. Allowed arguments currently include “top”, “right”, “bottom” and “left”. |
strip |
Should a strip be drawn? The default is |
log.x |
Should the x-axis appear on a log scale? The default is
|
log.y |
Should the y-axis appear on a log scale? The default is
|
x.inc |
The x-interval to be used for binning data when |
y.inc |
The y-interval to be used for binning data when |
limits |
For |
windflow |
This option allows a scatter plot to show the wind
speed/direction shows as an arrow. The option is a list e.g. The maximum length of the arrow plotted is a fraction of the plot dimension
with the longest arrow being This option works best where there are not too many data to ensure over-plotting does not become a problem. |
y.relation |
This determines how the y-axis scale is plotted. “same” ensures all panels use the same scale and “free” will use panel-specific scales. The latter is a useful setting when plotting data with very different values. |
x.relation |
This determines how the x-axis scale is plotted. “same” ensures all panels use the same scale and “free” will use panel-specific scales. The latter is a useful setting when plotting data with very different values. |
ref.x |
See |
ref.y |
A list with details of the horizontal lines to be added
representing reference line(s). For example, |
k |
Smoothing parameter supplied to |
dist |
When plotting smooth surfaces ( |
map |
Should a base map be drawn? This option is under development. |
auto.text |
Either |
plot |
Should a plot be produced? |
... |
Other graphical parameters are passed onto For |
scatterPlot()
is the basic function for plotting scatter plots in flexible
ways in openair
. It is flexible enough to consider lots of
conditioning variables and takes care of fitting smooth or linear
relationships to the data.
There are four main ways of plotting the relationship between two variables,
which are set using the method
option. The default "scatter"
will plot a conventional scatterPlot. In cases where there are lots of data
and over-plotting becomes a problem, then method = "hexbin"
or
method = "density"
can be useful. The former requires the
hexbin
package to be installed.
There is also a method = "level"
which will bin the x
and
y
data according to the intervals set for x.inc
and
y.inc
and colour the bins according to levels of a third variable,
z
. Sometimes however, a far better understanding of the relationship
between three variables (x
, y
and z
) is gained by
fitting a smooth surface through the data. See examples below.
A smooth fit is shown if smooth = TRUE
which can help show the overall
form of the data e.g. whether the relationship appears to be linear or not.
Also, a linear fit can be shown using linear = TRUE
as an option.
The user has fine control over the choice of colours and symbol type used.
Another way of reducing the number of points used in the plots which can
sometimes be useful is to aggregate the data. For example, hourly data can be
aggregated to daily data. See timePlot()
for examples here.
By default plots are shown with a colour key at the bottom and in the case of
conditioning, strips on the top of each plot. Sometimes this may be overkill
and the user can opt to remove the key and/or the strip by setting key
and/or strip
to FALSE
. One reason to do this is to maximise the
plotting area and therefore the information shown.
an openair object
David Carslaw
linearRelation
, timePlot
and
timeAverage()
for details on selecting averaging times and other
statistics in a flexible way
# load openair data if not loaded already dat2004 <- selectByDate(mydata, year = 2004) # basic use, single pollutant scatterPlot(dat2004, x = "nox", y = "no2") ## Not run: # scatterPlot by year scatterPlot(mydata, x = "nox", y = "no2", type = "year") ## End(Not run) # scatterPlot by day of the week, removing key at bottom scatterPlot(dat2004, x = "nox", y = "no2", type = "weekday", key = FALSE) # example of the use of continuous where colour is used to show # different levels of a third (numeric) variable # plot daily averages and choose a filled plot symbol (pch = 16) # select only 2004 ## Not run: scatterPlot(dat2004, x = "nox", y = "no2", z = "co", avg.time = "day", pch = 16) # show linear fit, by year scatterPlot(mydata, x = "nox", y = "no2", type = "year", smooth = FALSE, linear = TRUE) # do the same, but for daily means... scatterPlot(mydata, x = "nox", y = "no2", type = "year", smooth = FALSE, linear = TRUE, avg.time = "day") # log scales scatterPlot(mydata, x = "nox", y = "no2", type = "year", smooth = FALSE, linear = TRUE, avg.time = "day", log.x = TRUE, log.y = TRUE) # also works with the x-axis in date format (alternative to timePlot) scatterPlot(mydata, x = "date", y = "no2", avg.time = "month", key = FALSE) ## multiple types and grouping variable and continuous colour scale scatterPlot(mydata, x = "nox", y = "no2", z = "o3", type = c("season", "weekend")) # use hexagonal binning library(hexbin) # basic use, single pollutant scatterPlot(mydata, x = "nox", y = "no2", method = "hexbin") # scatterPlot by year scatterPlot(mydata, x = "nox", y = "no2", type = "year", method = "hexbin") ## bin data and plot it - can see how for high NO2, O3 is also high scatterPlot(mydata, x = "nox", y = "no2", z = "o3", method = "level", dist = 0.02) ## fit surface for clearer view of relationship - clear effect of ## increased O3 scatterPlot(mydata, x = "nox", y = "no2", z = "o3", method = "level", x.inc = 10, y.inc = 2, smooth = TRUE) ## End(Not run)
# load openair data if not loaded already dat2004 <- selectByDate(mydata, year = 2004) # basic use, single pollutant scatterPlot(dat2004, x = "nox", y = "no2") ## Not run: # scatterPlot by year scatterPlot(mydata, x = "nox", y = "no2", type = "year") ## End(Not run) # scatterPlot by day of the week, removing key at bottom scatterPlot(dat2004, x = "nox", y = "no2", type = "weekday", key = FALSE) # example of the use of continuous where colour is used to show # different levels of a third (numeric) variable # plot daily averages and choose a filled plot symbol (pch = 16) # select only 2004 ## Not run: scatterPlot(dat2004, x = "nox", y = "no2", z = "co", avg.time = "day", pch = 16) # show linear fit, by year scatterPlot(mydata, x = "nox", y = "no2", type = "year", smooth = FALSE, linear = TRUE) # do the same, but for daily means... scatterPlot(mydata, x = "nox", y = "no2", type = "year", smooth = FALSE, linear = TRUE, avg.time = "day") # log scales scatterPlot(mydata, x = "nox", y = "no2", type = "year", smooth = FALSE, linear = TRUE, avg.time = "day", log.x = TRUE, log.y = TRUE) # also works with the x-axis in date format (alternative to timePlot) scatterPlot(mydata, x = "date", y = "no2", avg.time = "month", key = FALSE) ## multiple types and grouping variable and continuous colour scale scatterPlot(mydata, x = "nox", y = "no2", z = "o3", type = c("season", "weekend")) # use hexagonal binning library(hexbin) # basic use, single pollutant scatterPlot(mydata, x = "nox", y = "no2", method = "hexbin") # scatterPlot by year scatterPlot(mydata, x = "nox", y = "no2", type = "year", method = "hexbin") ## bin data and plot it - can see how for high NO2, O3 is also high scatterPlot(mydata, x = "nox", y = "no2", z = "o3", method = "level", dist = 0.02) ## fit surface for clearer view of relationship - clear effect of ## increased O3 scatterPlot(mydata, x = "nox", y = "no2", z = "o3", method = "level", x.inc = 10, y.inc = 2, smooth = TRUE) ## End(Not run)
Utility function to make it easier to select periods from a data frame before sending to a function
selectByDate( mydata, start = "1/1/2008", end = "31/12/2008", year = 2008, month = 1, day = "weekday", hour = 1 )
selectByDate( mydata, start = "1/1/2008", end = "31/12/2008", year = 2008, month = 1, day = "weekday", hour = 1 )
mydata |
A data frame containing a |
start |
A start date string in the form d/m/yyyy e.g. “1/2/1999” or in ‘R’ format i.e. “YYYY-mm-dd”, “1999-02-01” |
end |
See |
year |
A year or years to select e.g. |
month |
A month or months to select. Can either be numeric e.g.
|
day |
A day name or or days to select. |
hour |
An hour or hours to select from 0-23 e.g. |
This function makes it much easier to select periods of interest from a data frame based on dates in a British format. Selecting date/times in R format can be intimidating for new users. This function can be used to select quite complex dates simply - see examples below.
Dates are assumed to be inclusive, so start = "1/1/1999"
means that
times are selected from hour zero. Similarly, end = "31/12/1999"
will
include all hours of the 31st December. start
and end
can also
be in standard R format as a string i.e. "YYYY-mm-dd", so start =
"1999-01-01"
is fine.
All options are applied in turn making it possible to select quite complex dates
David Carslaw
## select all of 1999 data.1999 <- selectByDate(mydata, start = "1/1/1999", end = "31/12/1999") head(data.1999) tail(data.1999) # or... data.1999 <- selectByDate(mydata, start = "1999-01-01", end = "1999-12-31") # easier way data.1999 <- selectByDate(mydata, year = 1999) # more complex use: select weekdays between the hours of 7 am to 7 pm sub.data <- selectByDate(mydata, day = "weekday", hour = 7:19) # select weekends between the hours of 7 am to 7 pm in winter (Dec, Jan, Feb) sub.data <- selectByDate(mydata, day = "weekend", hour = 7:19, month = c("dec", "jan", "feb"))
## select all of 1999 data.1999 <- selectByDate(mydata, start = "1/1/1999", end = "31/12/1999") head(data.1999) tail(data.1999) # or... data.1999 <- selectByDate(mydata, start = "1999-01-01", end = "1999-12-31") # easier way data.1999 <- selectByDate(mydata, year = 1999) # more complex use: select weekdays between the hours of 7 am to 7 pm sub.data <- selectByDate(mydata, day = "weekday", hour = 7:19) # select weekends between the hours of 7 am to 7 pm in winter (Dec, Jan, Feb) sub.data <- selectByDate(mydata, day = "weekend", hour = 7:19, month = c("dec", "jan", "feb"))
Utility function to extract user-defined run lengths (durations) above a threshold
selectRunning( mydata, pollutant = "nox", criterion = ">", run.len = 5, threshold = 500, result = c("yes", "no") )
selectRunning( mydata, pollutant = "nox", criterion = ">", run.len = 5, threshold = 500, result = c("yes", "no") )
mydata |
A data frame with a |
pollutant |
Name of variable to process. Mandatory. |
criterion |
Condition to select run lengths e.g. |
run.len |
Run length for extracting contiguous values of
|
threshold |
The threshold value for |
result |
A new column |
This is a utility function to extract runs of values above a certain threshold. For example, for a data frame of hourly NOx values we would like to extract all those hours where the concentration is at least 500ppb for contiguous periods of 5 or more hours.
This function is useful, for example, for selecting pollution episodes from
a data frame i.e. where concentrations remain elevated for a certain period
of time. It may also be of more general use when analysing air pollution
data. For example, selectRunning
could be used to extract continuous
periods of rainfall — which could be important for particle
concentrations.
Returns a data frame that meets the chosen criteria. See examples below.
David Carslaw
## extract those hours where there are at least 5 consecutive NOx ## concentrations above 500ppb mydata <- selectRunning(mydata, run.len = 5, threshold = 500) ## make a polar plot of those conditions...shows that those ## conditions are dominated by low wind speeds, not ## in-canyon recirculation ## Not run: polarPlot(mydata, pollutant = "nox", type = "criterion")
## extract those hours where there are at least 5 consecutive NOx ## concentrations above 500ppb mydata <- selectRunning(mydata, run.len = 5, threshold = 500) ## make a polar plot of those conditions...shows that those ## conditions are dominated by low wind speeds, not ## in-canyon recirculation ## Not run: polarPlot(mydata, pollutant = "nox", type = "criterion")
Use non-parametric methods to calculate time series trends
smoothTrend( mydata, pollutant = "nox", deseason = FALSE, type = "default", statistic = "mean", avg.time = "month", percentile = NA, data.thresh = 0, simulate = FALSE, n = 200, autocor = FALSE, cols = "brewer1", shade = "grey95", xlab = "year", y.relation = "same", ref.x = NULL, ref.y = NULL, key.columns = length(percentile), name.pol = pollutant, ci = TRUE, alpha = 0.2, date.breaks = 7, auto.text = TRUE, k = NULL, plot = TRUE, ... )
smoothTrend( mydata, pollutant = "nox", deseason = FALSE, type = "default", statistic = "mean", avg.time = "month", percentile = NA, data.thresh = 0, simulate = FALSE, n = 200, autocor = FALSE, cols = "brewer1", shade = "grey95", xlab = "year", y.relation = "same", ref.x = NULL, ref.y = NULL, key.columns = length(percentile), name.pol = pollutant, ci = TRUE, alpha = 0.2, date.breaks = 7, auto.text = TRUE, k = NULL, plot = TRUE, ... )
mydata |
A data frame containing the field |
pollutant |
The parameter for which a trend test is required. Mandatory. |
deseason |
Should the data be de-deasonalized first? If |
type |
It is also possible to choose Type can be up length two e.g. |
statistic |
Statistic used for calculating monthly values. Default is
“mean”, but can also be “percentile”. See |
avg.time |
Can be “month” (the default), “season” or “year”. Determines the time over which data should be averaged. Note that for “year”, six or more years are required. For “season” the data are plit up into spring: March, April, May etc. Note that December is considered as belonging to winter of the following year. |
percentile |
Percentile value(s) to use if |
data.thresh |
The data capture threshold to use (%) when aggregating
the data using |
simulate |
Should simulations be carried out to determine the
Mann-Kendall tau and p-value. The default is |
n |
Number of bootstrap simulations if |
autocor |
Should autocorrelation be considered in the trend uncertainty
estimates? The default is |
cols |
Colours to use. Can be a vector of colours e.g. |
shade |
The colour used for marking alternate years. Use “white” or “transparent” to remove shading. |
xlab |
x-axis label, by default “year”. |
y.relation |
This determines how the y-axis scale is plotted. "same"
ensures all panels use the same scale and "free" will use panel-specific
scales. The latter is a useful setting when plotting data with very
different values. ref.x See |
ref.x |
See |
ref.y |
A list with details of the horizontal lines to be added
representing reference line(s). For example, |
key.columns |
Number of columns used if a key is drawn when using the
option |
name.pol |
Names to be given to the pollutant(s). This is useful if you want to give a fuller description of the variables, maybe also including subscripts etc. |
ci |
Should confidence intervals be plotted? The default is
|
alpha |
The alpha transparency of shaded confidence intervals - if plotted. A value of 0 is fully transparent and 1 is fully opaque. |
date.breaks |
Number of major x-axis intervals to use. The function
will try and choose a sensible number of dates/times as well as formatting
the date/time appropriately to the range being considered. This does not
always work as desired automatically. The user can therefore increase or
decrease the number of intervals by adjusting the value of
|
auto.text |
Either |
k |
This is the smoothing parameter used by the |
plot |
Should a plot be produced? |
... |
Other graphical parameters are passed onto |
The smoothTrend
function provides a flexible way of estimating the
trend in the concentration of a pollutant or other variable. Monthly mean
values are calculated from an hourly (or higher resolution) or daily time
series. There is the option to deseasonalise the data if there is evidence
of a seasonal cycle.
smoothTrend
uses a Generalized Additive Model (GAM) from the
gam
package to find the most appropriate level of smoothing.
The function is particularly suited to situations where trends are not
monotonic (see discussion with TheilSen()
for more details on
this). The smoothTrend
function is particularly useful as an
exploratory technique e.g. to check how linear or non-linear trends are.
95% confidence intervals are shown by shading. Bootstrap estimates of the
confidence intervals are also available through the simulate
option.
Residual resampling is used.
Trends can be considered in a very wide range of ways, controlled by setting
type
- see examples below.
an openair object
David Carslaw
Other time series and trend functions:
TheilSen()
,
calendarPlot()
,
runRegression()
,
timePlot()
,
timeProp()
,
timeVariation()
,
trendLevel()
# trend plot for nox smoothTrend(mydata, pollutant = "nox") # trend plot by each of 8 wind sectors ## Not run: smoothTrend(mydata, pollutant = "o3", type = "wd", ylab = "o3 (ppb)") # several pollutants, no plotting symbol ## Not run: smoothTrend(mydata, pollutant = c("no2", "o3", "pm10", "pm25"), pch = NA) # percentiles ## Not run: smoothTrend(mydata, pollutant = "o3", statistic = "percentile", percentile = 95) ## End(Not run) # several percentiles with control over lines used ## Not run: smoothTrend(mydata, pollutant = "o3", statistic = "percentile", percentile = c(5, 50, 95), lwd = c(1, 2, 1), lty = c(5, 1, 5)) ## End(Not run)
# trend plot for nox smoothTrend(mydata, pollutant = "nox") # trend plot by each of 8 wind sectors ## Not run: smoothTrend(mydata, pollutant = "o3", type = "wd", ylab = "o3 (ppb)") # several pollutants, no plotting symbol ## Not run: smoothTrend(mydata, pollutant = c("no2", "o3", "pm10", "pm25"), pch = NA) # percentiles ## Not run: smoothTrend(mydata, pollutant = "o3", statistic = "percentile", percentile = 95) ## End(Not run) # several percentiles with control over lines used ## Not run: smoothTrend(mydata, pollutant = "o3", statistic = "percentile", percentile = c(5, 50, 95), lwd = c(1, 2, 1), lty = c(5, 1, 5)) ## End(Not run)
Utility function to prepare input data for use in openair functions
splitByDate( mydata, dates = "1/1/2003", labels = c("before", "after"), name = "split.by" )
splitByDate( mydata, dates = "1/1/2003", labels = c("before", "after"), name = "split.by" )
mydata |
A data frame containing a |
dates |
A date or dates to split data by. |
labels |
Labels for each time partition. |
name |
The name to give the new column to identify the periods split |
This function partitions a data frame up into different time segments. It
produces a new column called controlled by name
that can be used in many
openair
functions. Note that there must be one more label than there
are dates. See examples below and in full openair
documentation.
David Carslaw
## split data up into "before" and "after" mydata <- splitByDate(mydata, dates = "1/04/2000", labels = c("before", "after")) ## split data into 3 partitions: mydata <- splitByDate(mydata, dates = c("1/1/2000", "1/3/2003"), labels = c("before", "during", "after"))
## split data up into "before" and "after" mydata <- splitByDate(mydata, dates = "1/04/2000", labels = c("before", "after")) ## split data into 3 partitions: mydata <- splitByDate(mydata, dates = c("1/1/2000", "1/3/2003"), labels = c("before", "during", "after"))
This function provides a quick graphical and numerical summary of data. The
location presence/absence of data are shown, with summary statistics and
plots of variable distributions. summaryPlot()
can also provide summaries
of a single pollutant across many sites.
summaryPlot( mydata, na.len = 24, clip = TRUE, percentile = 0.99, type = "histogram", pollutant = "nox", period = "years", avg.time = "day", print.datacap = TRUE, breaks = NULL, plot.type = "l", col.trend = "darkgoldenrod2", col.data = "lightblue", col.mis = rgb(0.65, 0.04, 0.07), col.hist = "forestgreen", cols = NULL, date.breaks = 7, auto.text = TRUE, plot = TRUE, debug = FALSE, ... )
summaryPlot( mydata, na.len = 24, clip = TRUE, percentile = 0.99, type = "histogram", pollutant = "nox", period = "years", avg.time = "day", print.datacap = TRUE, breaks = NULL, plot.type = "l", col.trend = "darkgoldenrod2", col.data = "lightblue", col.mis = rgb(0.65, 0.04, 0.07), col.hist = "forestgreen", cols = NULL, date.breaks = 7, auto.text = TRUE, plot = TRUE, debug = FALSE, ... )
mydata |
A data frame to be summarised. Must contain a |
na.len |
Missing data are only shown with at least |
clip |
When data contain outliers, the histogram or density plot can
fail to show the distribution of the main body of data. Setting |
percentile |
This is used to clip the data. For example, |
type |
|
pollutant |
|
period |
|
avg.time |
This defines the time period to average the time series
plots. Can be "sec", "min", "hour", "day" (the default), "week", "month",
"quarter" or "year". For much increased flexibility a number can precede
these options followed by a space. For example, a |
print.datacap |
Should the data capture % be shown for each period? |
breaks |
Number of histogram bins. Sometime useful but not easy to set a single value for a range of very different variables. |
plot.type |
The |
col.trend |
Colour to be used to show the monthly trend of the data,
shown as a shaded region. Type |
col.data |
Colour to be used to show the presence of data. Type
|
col.mis |
Colour to be used to show missing data. |
col.hist |
Colour for the histogram or density plot. |
cols |
Predefined colour scheme, currently only enabled for
|
date.breaks |
Number of major x-axis intervals to use. The function will
try and choose a sensible number of dates/times as well as formatting the
date/time appropriately to the range being considered. This does not
always work as desired automatically. The user can therefore increase or
decrease the number of intervals by adjusting the value of |
auto.text |
Either |
plot |
Should a plot be produced? |
debug |
Should data types be printed to the console? |
... |
Other graphical parameters. Commonly used examples include the
axis and title labelling options (such as |
summaryPlot()
produces two panels of plots: one showing the
presence/absence of data and the other the distributions. The left panel
shows time series and codes the presence or absence of data in different
colours. By stacking the plots one on top of another it is easy to compare
different pollutants/variables. Overall statistics are given for each
variable: mean, maximum, minimum, missing hours (also expressed as a
percentage), median and the 95th percentile. For each year the data capture
rate (expressed as a percentage of hours in that year) is also given.
The right panel shows either a histogram or a density plot depending on the
choice of type
. Density plots avoid the issue of arbitrary bin sizes that
can sometimes provide a misleading view of the data distribution. Density
plots are often more appropriate, but their effectiveness will depend on the
data in question.
summaryPlot()
will only show data that are numeric or integer type. This is
useful for checking that data have been imported properly. For example, if
for some reason a column representing wind speed erroneously had one or more
fields with characters in, the whole column would be either character or
factor type. The absence of a wind speed variable in the summaryPlot()
plot
would therefore indicate a problem with the input data. In this particular
case, the user should go back to the source data and remove the characters or
remove them using R functions.
If there is a field site
, which would generally mean there is more than one
site, summaryPlot()
will provide information on a
single pollutant across all sites, rather than provide details on all
pollutants at a single site. In this case the user should also provide a
name of a pollutant e.g. pollutant = "nox"
. If a pollutant is not provided
the first numeric field will automatically be chosen.
It is strongly recommended that the summaryPlot()
function is
applied to all new imported data sets to ensure the data are imported as
expected.
David Carslaw
# do not clip density plot data ## Not run: summaryPlot(mydata, clip = FALSE) ## End(Not run) # exclude highest 5 % of data etc. ## Not run: summaryPlot(mydata, percentile = 0.95) ## End(Not run) # show missing data where there are at least 96 contiguous missing # values (4 days) ## Not run: summaryPlot(mydata, na.len = 96) ## End(Not run) # show data in green ## Not run: summaryPlot(mydata, col.data = "green") ## End(Not run) # show missing data in yellow ## Not run: summaryPlot(mydata, col.mis = "yellow") ## End(Not run) # show density plot line in black ## Not run: summaryPlot(mydata, col.dens = "black") ## End(Not run)
# do not clip density plot data ## Not run: summaryPlot(mydata, clip = FALSE) ## End(Not run) # exclude highest 5 % of data etc. ## Not run: summaryPlot(mydata, percentile = 0.95) ## End(Not run) # show missing data where there are at least 96 contiguous missing # values (4 days) ## Not run: summaryPlot(mydata, na.len = 96) ## End(Not run) # show data in green ## Not run: summaryPlot(mydata, col.data = "green") ## End(Not run) # show missing data in yellow ## Not run: summaryPlot(mydata, col.mis = "yellow") ## End(Not run) # show density plot line in black ## Not run: summaryPlot(mydata, col.dens = "black") ## End(Not run)
Function to draw Taylor Diagrams for model evaluation. The function allows conditioning by any categorical or numeric variables, which makes the function very flexible.
TaylorDiagram( mydata, obs = "obs", mod = "mod", group = NULL, type = "default", normalise = FALSE, cols = "brewer1", rms.col = "darkgoldenrod", cor.col = "black", arrow.lwd = 3, annotate = "centred\nRMS error", text.obs = "observed", key = TRUE, key.title = group, key.columns = 1, key.pos = "right", strip = TRUE, auto.text = TRUE, ... )
TaylorDiagram( mydata, obs = "obs", mod = "mod", group = NULL, type = "default", normalise = FALSE, cols = "brewer1", rms.col = "darkgoldenrod", cor.col = "black", arrow.lwd = 3, annotate = "centred\nRMS error", text.obs = "observed", key = TRUE, key.title = group, key.columns = 1, key.pos = "right", strip = TRUE, auto.text = TRUE, ... )
mydata |
A data frame minimally containing a column of observations and a column of predictions. |
obs |
A column of observations with which the predictions ( |
mod |
A column of model predictions. Note, |
group |
The
|
type |
It is also possible to choose Type can be up length two e.g. Note that often it will make sense to use |
normalise |
Should the data be normalised by dividing the standard
deviation of the observations? The statistics can be normalised (and
non-dimensionalised) by dividing both the RMS difference and the standard
deviation of the |
cols |
Colours to be used for plotting. Useful options for categorical
data are available from |
rms.col |
Colour for centred-RMS lines and text. |
cor.col |
Colour for correlation coefficient lines and text. |
arrow.lwd |
Width of arrow used when used for comparing two model outputs. |
annotate |
Annotation shown for RMS error. |
text.obs |
The plot annotation for observed values; default is "observed". |
key |
Should the key be shown? |
key.title |
Title for the key. |
key.columns |
Number of columns to be used in the key. With many
pollutants a single column can make to key too wide. The user can thus
choose to use several columns by setting |
key.pos |
Position of the key e.g. “top”, “bottom”,
“left” and “right”. See details in |
strip |
Should a strip be shown? |
auto.text |
Either |
... |
Other graphical parameters are passed onto |
The Taylor Diagram is a very useful model evaluation tool. The diagram provides a way of showing how three complementary model performance statistics vary simultaneously. These statistics are the correlation coefficient R, the standard deviation (sigma) and the (centred) root-mean-square error. These three statistics can be plotted on one (2D) graph because of the way they are related to one another which can be represented through the Law of Cosines.
The openair
version of the Taylor Diagram has several enhancements
that increase its flexibility. In particular, the straightforward way of
producing conditioning plots should prove valuable under many circumstances
(using the type
option). Many examples of Taylor Diagrams focus on
model-observation comparisons for several models using all the available
data. However, more insight can be gained into model performance by
partitioning the data in various ways e.g. by season, daylight/nighttime, day
of the week, by levels of a numeric variable e.g. wind speed or by land-use
type etc.
To consider several pollutants on one plot, a column identifying the
pollutant name can be used e.g. pollutant
. Then the Taylor Diagram can
be plotted as (assuming a data frame thedata
):
TaylorDiagram(thedata, obs = "obs", mod = "mod", group = "model", type
= "pollutant")
which will give the model performance by pollutant in each panel.
Note that it is important that each panel represents data with the same mean
observed data across different groups. Therefore TaylorDiagram(mydata,
group = "model", type = "season")
is OK, whereas TaylorDiagram(mydata,
group = "season", type = "model")
is not because each panel (representing a
model) will have four different mean values — one for each season.
Generally, the option group
is either missing (one model being
evaluated) or represents a column giving the model name. However, the data
can be normalised using the normalise
option. Normalisation is carried
out on a per group
/type
basis making it possible to compare
data on different scales e.g. TaylorDiagram(mydata, group = "season",
type = "model", normalise = TRUE)
. In this way it is possible to compare
different pollutants, sites etc. in the same panel.
Also note that if multiple sites are present it makes sense to use type
= "site"
to ensure that each panel represents an individual site with its
own specific standard deviation etc. If this is not the case then select a
single site from the data first e.g. subset(mydata, site ==
"Harwell")
.
an openair object. If retained, e.g., using
output <- TaylorDiagram(thedata, obs = "nox", mod = "mod")
, this
output can be used to recover the data, reproduce or rework the original
plot or undertake further analysis. For example, output$data
will be
a data frame consisting of the group, type, correlation coefficient (R),
the standard deviation of the observations and measurements.
David Carslaw
Taylor, K.E.: Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res., 106, 7183-7192, 2001 (also see PCMDI Report 55).
taylor.diagram
from the plotrix
package from which
some of the annotation code was used.
Other model evaluation functions:
conditionalEval()
,
conditionalQuantile()
,
modStats()
## in the examples below, most effort goes into making some artificial data ## the function itself can be run very simply ## Not run: ## dummy model data for 2003 dat <- selectByDate(mydata, year = 2003) dat <- data.frame(date = mydata$date, obs = mydata$nox, mod = mydata$nox) ## now make mod worse by adding bias and noise according to the month ## do this for 3 different models dat <- transform(dat, month = as.numeric(format(date, "%m"))) mod1 <- transform(dat, mod = mod + 10 * month + 10 * month * rnorm(nrow(dat)), model = "model 1") ## lag the results for mod1 to make the correlation coefficient worse ## without affecting the sd mod1 <- transform(mod1, mod = c(mod[5:length(mod)], mod[(length(mod) - 3) : length(mod)])) ## model 2 mod2 <- transform(dat, mod = mod + 7 * month + 7 * month * rnorm(nrow(dat)), model = "model 2") ## model 3 mod3 <- transform(dat, mod = mod + 3 * month + 3 * month * rnorm(nrow(dat)), model = "model 3") mod.dat <- rbind(mod1, mod2, mod3) ## basic Taylor plot TaylorDiagram(mod.dat, obs = "obs", mod = "mod", group = "model") ## Taylor plot by season TaylorDiagram(mod.dat, obs = "obs", mod = "mod", group = "model", type = "season") ## now show how to evaluate model improvement (or otherwise) mod1a <- transform(dat, mod = mod + 2 * month + 2 * month * rnorm(nrow(dat)), model = "model 1") mod2a <- transform(mod2, mod = mod * 1.3) mod3a <- transform(dat, mod = mod + 10 * month + 10 * month * rnorm(nrow(dat)), model = "model 3") mod.dat2 <- rbind(mod1a, mod2a, mod3a) mod.dat$mod2 <- mod.dat2$mod ## now we have a data frame with 3 models, 1 set of observations ## and TWO sets of model predictions (mod and mod2) ## do for all models TaylorDiagram(mod.dat, obs = "obs", mod = c("mod", "mod2"), group = "model") ## End(Not run) ## Not run: ## all models, by season TaylorDiagram(mod.dat, obs = "obs", mod = c("mod", "mod2"), group = "model", type = "season") ## consider two groups (model/month). In this case all months are shown by model ## but are only differentiated by model. TaylorDiagram(mod.dat, obs = "obs", mod = "mod", group = c("model", "month")) ## End(Not run)
## in the examples below, most effort goes into making some artificial data ## the function itself can be run very simply ## Not run: ## dummy model data for 2003 dat <- selectByDate(mydata, year = 2003) dat <- data.frame(date = mydata$date, obs = mydata$nox, mod = mydata$nox) ## now make mod worse by adding bias and noise according to the month ## do this for 3 different models dat <- transform(dat, month = as.numeric(format(date, "%m"))) mod1 <- transform(dat, mod = mod + 10 * month + 10 * month * rnorm(nrow(dat)), model = "model 1") ## lag the results for mod1 to make the correlation coefficient worse ## without affecting the sd mod1 <- transform(mod1, mod = c(mod[5:length(mod)], mod[(length(mod) - 3) : length(mod)])) ## model 2 mod2 <- transform(dat, mod = mod + 7 * month + 7 * month * rnorm(nrow(dat)), model = "model 2") ## model 3 mod3 <- transform(dat, mod = mod + 3 * month + 3 * month * rnorm(nrow(dat)), model = "model 3") mod.dat <- rbind(mod1, mod2, mod3) ## basic Taylor plot TaylorDiagram(mod.dat, obs = "obs", mod = "mod", group = "model") ## Taylor plot by season TaylorDiagram(mod.dat, obs = "obs", mod = "mod", group = "model", type = "season") ## now show how to evaluate model improvement (or otherwise) mod1a <- transform(dat, mod = mod + 2 * month + 2 * month * rnorm(nrow(dat)), model = "model 1") mod2a <- transform(mod2, mod = mod * 1.3) mod3a <- transform(dat, mod = mod + 10 * month + 10 * month * rnorm(nrow(dat)), model = "model 3") mod.dat2 <- rbind(mod1a, mod2a, mod3a) mod.dat$mod2 <- mod.dat2$mod ## now we have a data frame with 3 models, 1 set of observations ## and TWO sets of model predictions (mod and mod2) ## do for all models TaylorDiagram(mod.dat, obs = "obs", mod = c("mod", "mod2"), group = "model") ## End(Not run) ## Not run: ## all models, by season TaylorDiagram(mod.dat, obs = "obs", mod = c("mod", "mod2"), group = "model", type = "season") ## consider two groups (model/month). In this case all months are shown by model ## but are only differentiated by model. TaylorDiagram(mod.dat, obs = "obs", mod = "mod", group = c("model", "month")) ## End(Not run)
Theil-Sen slope estimates and tests for trend.
TheilSen( mydata, pollutant = "nox", deseason = FALSE, type = "default", avg.time = "month", statistic = "mean", percentile = NA, data.thresh = 0, alpha = 0.05, dec.place = 2, xlab = "year", lab.frac = 0.99, lab.cex = 0.8, x.relation = "same", y.relation = "same", data.col = "cornflowerblue", trend = list(lty = c(1, 5), lwd = c(2, 1), col = c("red", "red")), text.col = "darkgreen", slope.text = NULL, cols = NULL, shade = "grey95", auto.text = TRUE, autocor = FALSE, slope.percent = FALSE, date.breaks = 7, date.format = NULL, plot = TRUE, silent = FALSE, ... )
TheilSen( mydata, pollutant = "nox", deseason = FALSE, type = "default", avg.time = "month", statistic = "mean", percentile = NA, data.thresh = 0, alpha = 0.05, dec.place = 2, xlab = "year", lab.frac = 0.99, lab.cex = 0.8, x.relation = "same", y.relation = "same", data.col = "cornflowerblue", trend = list(lty = c(1, 5), lwd = c(2, 1), col = c("red", "red")), text.col = "darkgreen", slope.text = NULL, cols = NULL, shade = "grey95", auto.text = TRUE, autocor = FALSE, slope.percent = FALSE, date.breaks = 7, date.format = NULL, plot = TRUE, silent = FALSE, ... )
mydata |
A data frame containing the field |
pollutant |
The parameter for which a trend test is required. Mandatory. |
deseason |
Should the data be de-deasonalized first? If |
type |
It is also possible to choose Type can be up length two e.g. |
avg.time |
Can be “month” (the default), “season” or “year”. Determines the time over which data should be averaged. Note that for “year”, six or more years are required. For “season” the data are split up into spring: March, April, May etc. Note that December is considered as belonging to winter of the following year. |
statistic |
Statistic used for calculating monthly values. Default is
“mean”, but can also be “percentile”. See |
percentile |
Single percentile value to use if |
data.thresh |
The data capture threshold to use (%) when aggregating
the data using |
alpha |
For the confidence interval calculations of the slope. The default is 0.05. To show 99\ trend, choose alpha = 0.01 etc. |
dec.place |
The number of decimal places to display the trend estimate at. The default is 2. |
xlab |
x-axis label, by default |
lab.frac |
Fraction along the y-axis that the trend information should be printed at, default 0.99. |
lab.cex |
Size of text for trend information. |
x.relation |
This determines how the x-axis scale is plotted. “same” ensures all panels use the same scale and “free” will use panel-specific scales. The latter is a useful setting when plotting data with very different values. |
y.relation |
This determines how the y-axis scale is plotted. “same” ensures all panels use the same scale and “free” will use panel-specific scales. The latter is a useful setting when plotting data with very different values. |
data.col |
Colour name for the data |
trend |
list containing information on the line width, line type and line colour for the main trend line and confidence intervals respectively. |
text.col |
Colour name for the slope/uncertainty numeric estimates |
slope.text |
The text shown for the slope (default is ‘units/year’). |
cols |
Predefined colour scheme, currently only enabled for
|
shade |
The colour used for marking alternate years. Use “white” or “transparent” to remove shading. |
auto.text |
Either |
autocor |
Should autocorrelation be considered in the trend uncertainty
estimates? The default is |
slope.percent |
Should the slope and the slope uncertainties be
expressed as a percentage change per year? The default is For |
date.breaks |
Number of major x-axis intervals to use. The function
will try and choose a sensible number of dates/times as well as formatting
the date/time appropriately to the range being considered. This does not
always work as desired automatically. The user can therefore increase or
decrease the number of intervals by adjusting the value of
|
date.format |
This option controls the date format on the
x-axis. While |
plot |
Should a plot be produced? |
silent |
When |
... |
Other graphical parameters passed onto |
The TheilSen
function provides a collection of functions to
analyse trends in air pollution data. The TheilSen
function
is flexible in the sense that it can be applied to data in many
ways e.g. by day of the week, hour of day and wind direction. This
flexibility makes it much easier to draw inferences from data
e.g. why is there a strong downward trend in concentration from
one wind sector and not another, or why trends on one day of the
week or a certain time of day are unexpected.
For data that are strongly seasonal, perhaps from a background
site, or a pollutant such as ozone, it will be important to
deseasonalise the data (using the option deseason =
TRUE
.Similarly, for data that increase, then decrease, or show
sharp changes it may be better to use smoothTrend
.
A minimum of 6 points are required for trend estimates to be made.
Note! that since version 0.5-11 openair uses Theil-Sen to derive the p values also for the slope. This is to ensure there is consistency between the calculated p value and other trend parameters i.e. slope estimates and uncertainties. The p value and all uncertainties are calculated through bootstrap simulations.
Note that the symbols shown next to each trend estimate relate to how statistically significant the trend estimate is: p $<$ 0.001 = ***, p $<$ 0.01 = **, p $<$ 0.05 = * and p $<$ 0.1 = $+$.
Some of the code used in TheilSen
is based on that from
Rand Wilcox. This mostly
relates to the Theil-Sen slope estimates and uncertainties.
Further modifications have been made to take account of correlated
data based on Kunsch (1989). The basic function has been adapted
to take account of auto-correlated data using block bootstrap
simulations if autocor = TRUE
(Kunsch, 1989). We follow the
suggestion of Kunsch (1989) of setting the block length to n(1/3)
where n is the length of the time series.
The slope estimate and confidence intervals in the slope are plotted and numerical information presented.
an openair object. The data
component of the
TheilSen
output includes two subsets: main.data
, the monthly
data res2
the trend statistics. For output <- TheilSen(mydata,
"nox")
, these can be extracted as object$data$main.data
and
object$data$res2
, respectively. Note: In the case of the intercept,
it is assumed the y-axis crosses the x-axis on 1/1/1970.
David Carslaw with some trend code from Rand Wilcox
Helsel, D., Hirsch, R., 2002. Statistical methods in water resources. US Geological Survey. Note that this is a very good resource for statistics as applied to environmental data.
Hirsch, R. M., Slack, J. R., Smith, R. A., 1982. Techniques of trend analysis for monthly water-quality data. Water Resources Research 18 (1), 107-121.
Kunsch, H. R., 1989. The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17 (3), 1217-1241.
Sen, P. K., 1968. Estimates of regression coefficient based on Kendall's tau. Journal of the American Statistical Association 63(324).
Theil, H., 1950. A rank invariant method of linear and polynomial regression analysis, i, ii, iii. Proceedings of the Koninklijke Nederlandse Akademie Wetenschappen, Series A - Mathematical Sciences 53, 386-392, 521-525, 1397-1412.
... see also several of the Air Quality Expert Group (AQEG) reports for the use of similar tests applied to UK/European air quality data.
Other time series and trend functions:
calendarPlot()
,
runRegression()
,
smoothTrend()
,
timePlot()
,
timeProp()
,
timeVariation()
,
trendLevel()
# trend plot for nox TheilSen(mydata, pollutant = "nox") # trend plot for ozone with p=0.01 i.e. uncertainty in slope shown at # 99 % confidence interval ## Not run: TheilSen(mydata, pollutant = "o3", ylab = "o3 (ppb)", alpha = 0.01) # trend plot by each of 8 wind sectors ## Not run: TheilSen(mydata, pollutant = "o3", type = "wd", ylab = "o3 (ppb)") # and for a subset of data (from year 2000 onwards) ## Not run: TheilSen(selectByDate(mydata, year = 2000:2005), pollutant = "o3", ylab = "o3 (ppb)")
# trend plot for nox TheilSen(mydata, pollutant = "nox") # trend plot for ozone with p=0.01 i.e. uncertainty in slope shown at # 99 % confidence interval ## Not run: TheilSen(mydata, pollutant = "o3", ylab = "o3 (ppb)", alpha = 0.01) # trend plot by each of 8 wind sectors ## Not run: TheilSen(mydata, pollutant = "o3", type = "wd", ylab = "o3 (ppb)") # and for a subset of data (from year 2000 onwards) ## Not run: TheilSen(selectByDate(mydata, year = 2000:2005), pollutant = "o3", ylab = "o3 (ppb)")
Function to flexibly aggregate or expand data frames by different time periods, calculating vector-averaged wind direction where appropriate. The averaged periods can also take account of data capture rates.
timeAverage( mydata, avg.time = "day", data.thresh = 0, statistic = "mean", type = "default", percentile = NA, start.date = NA, end.date = NA, interval = NA, vector.ws = FALSE, fill = FALSE, progress = TRUE, ... )
timeAverage( mydata, avg.time = "day", data.thresh = 0, statistic = "mean", type = "default", percentile = NA, start.date = NA, end.date = NA, interval = NA, vector.ws = FALSE, fill = FALSE, progress = TRUE, ... )
mydata |
A data frame containing a |
avg.time |
This defines the time period to average to. Can be
“sec”, “min”, “hour”, “day”, “DSTday”,
“week”, “month”, “quarter” or “year”. For much
increased flexibility a number can precede these options followed by a
space. For example, a timeAverage of 2 months would be Note that |
data.thresh |
The data capture threshold to use (%). A value of zero
means that all available data will be used in a particular period
regardless if of the number of values available. Conversely, a value of 100
will mean that all data will need to be present for the average to be
calculated, else it is recorded as |
statistic |
The statistic to apply when aggregating the data; default is
the mean. Can be one of “mean”, “max”, “min”,
“median”, “frequency”, “sum”, “sd”,
“percentile”. Note that “sd” is the standard deviation,
“frequency” is the number (frequency) of valid records in the period
and “data.cap” is the percentage data capture. “percentile”
is the percentile level (%) between 0-100, which can be set using the
“percentile” option — see below. Not used if |
type |
|
percentile |
The percentile level used when |
start.date |
A string giving a start date to use. This is sometimes
useful if a time series starts between obvious intervals. For example, for
a 1-minute time series that starts “2009-11-29 12:07:00” that needs
to be averaged up to 15-minute means, the intervals would be
“2009-11-29 12:07:00”, “2009-11-29 12:22:00” etc. Often,
however, it is better to round down to a more obvious start point e.g.
“2009-11-29 12:00:00” such that the sequence is then
“2009-11-29 12:00:00”, “2009-11-29 12:15:00” ...
|
end.date |
A string giving an end date to use. This is sometimes useful
to make sure a time series extends to a known end point and is useful when
|
interval |
The This option can sometimes be useful with |
vector.ws |
Should vector averaging be carried out on wind speed if
available? The default is |
fill |
When time series are expanded i.e. when a time interval is less
than the original time series, data are ‘padded out’ with |
progress |
Show a progress bar when many groups make up |
... |
Additional arguments for other functions calling
|
This function calculates time averages for a data frame. It also treats wind direction correctly through vector-averaging. For example, the average of 350 degrees and 10 degrees is either 0 or 360 - not 180. The calculations therefore average the wind components.
When a data capture threshold is set through data.thresh
it is
necessary for timeAverage
to know what the original time interval of
the input time series is. The function will try and calculate this interval
based on the most common time gap (and will print the assumed time gap to the
screen). This works fine most of the time but there are occasions where it
may not e.g. when very few data exist in a data frame or the data are monthly
(i.e. non-regular time interval between months). In this case the user can
explicitly specify the interval through interval
in the same format as
avg.time
e.g. interval = "month"
. It may also be useful to set
start.date
and end.date
if the time series do not span the
entire period of interest. For example, if a time series ended in October and
annual means are required, setting end.date
to the end of the year
will ensure that the whole period is covered and that data.thresh
is
correctly calculated. The same also goes for a time series that starts later
in the year where start.date
should be set to the beginning of the
year.
timeAverage
should be useful in many circumstances where it is
necessary to work with different time average data. For example, hourly air
pollution data and 15-minute meteorological data. To merge the two data sets
timeAverage
can be used to make the meteorological data 1-hour means
first. Alternatively, timeAverage
can be used to expand the hourly
data to 15 minute data - see example below.
For the research community timeAverage
should be useful for dealing
with outputs from instruments where there are a range of time periods used.
It is also very useful for plotting data using timePlot
. Often
the data are too dense to see patterns and setting different averaging
periods easily helps with interpretation.
Returns a data frame with date in class POSIXct
.
David Carslaw
See timePlot
that plots time series data and uses
timeAverage
to aggregate data where necessary.
## daily average values daily <- timeAverage(mydata, avg.time = "day") ## daily average values ensuring at least 75 % data capture ## i.e. at least 18 valid hours ## Not run: daily <- timeAverage(mydata, avg.time = "day", data.thresh = 75) ## 2-weekly averages ## Not run: fortnight <- timeAverage(mydata, avg.time = "2 week") ## make a 15-minute time series from an hourly one ## Not run: min15 <- timeAverage(mydata, avg.time = "15 min", fill = TRUE) ## End(Not run) # average by grouping variable ## Not run: dat <- importAURN(c("kc1", "my1"), year = 2011:2013) timeAverage(dat, avg.time = "year", type = "site") # can also retain site code timeAverage(dat, avg.time = "year", type = c("site", "code")) # or just average all the data, dropping site/code timeAverage(dat, avg.time = "year") ## End(Not run)
## daily average values daily <- timeAverage(mydata, avg.time = "day") ## daily average values ensuring at least 75 % data capture ## i.e. at least 18 valid hours ## Not run: daily <- timeAverage(mydata, avg.time = "day", data.thresh = 75) ## 2-weekly averages ## Not run: fortnight <- timeAverage(mydata, avg.time = "2 week") ## make a 15-minute time series from an hourly one ## Not run: min15 <- timeAverage(mydata, avg.time = "15 min", fill = TRUE) ## End(Not run) # average by grouping variable ## Not run: dat <- importAURN(c("kc1", "my1"), year = 2011:2013) timeAverage(dat, avg.time = "year", type = "site") # can also retain site code timeAverage(dat, avg.time = "year", type = c("site", "code")) # or just average all the data, dropping site/code timeAverage(dat, avg.time = "year") ## End(Not run)
Plot time series quickly, perhaps for multiple pollutants, grouped or in separate panels.
timePlot( mydata, pollutant = "nox", group = FALSE, stack = FALSE, normalise = NULL, avg.time = "default", data.thresh = 0, statistic = "mean", percentile = NA, date.pad = FALSE, type = "default", cols = "brewer1", plot.type = "l", key = TRUE, log = FALSE, windflow = NULL, smooth = FALSE, ci = TRUE, y.relation = "same", ref.x = NULL, ref.y = NULL, key.columns = 1, key.position = "bottom", name.pol = pollutant, date.breaks = 7, date.format = NULL, auto.text = TRUE, plot = TRUE, ... )
timePlot( mydata, pollutant = "nox", group = FALSE, stack = FALSE, normalise = NULL, avg.time = "default", data.thresh = 0, statistic = "mean", percentile = NA, date.pad = FALSE, type = "default", cols = "brewer1", plot.type = "l", key = TRUE, log = FALSE, windflow = NULL, smooth = FALSE, ci = TRUE, y.relation = "same", ref.x = NULL, ref.y = NULL, key.columns = 1, key.position = "bottom", name.pol = pollutant, date.breaks = 7, date.format = NULL, auto.text = TRUE, plot = TRUE, ... )
mydata |
A data frame of time series. Must include a |
pollutant |
Name of variable to plot. Two or more pollutants can be
plotted, in which case a form like |
group |
If more than one pollutant is chosen, should they all be plotted
on the same graph together? The default is |
stack |
If |
normalise |
Should variables be normalised? The default is is not to
normalise the data. |
avg.time |
This defines the time period to average to. Can be
“sec”, “min”, “hour”, “day”, “DSTday”,
“week”, “month”, “quarter” or “year”. For much
increased flexibility a number can precede these options followed by a
space. For example, a timeAverage of 2 months would be |
data.thresh |
The data capture threshold to use when aggregating the
data using |
statistic |
The statistic to apply when aggregating the data; default is
the mean. Can be one of “mean”, “max”, “min”,
“median”, “frequency”, “sd”, “percentile”. Note
that “sd” is the standard deviation and “frequency” is the
number (frequency) of valid records in the period. “percentile” is
the percentile level between 0-100, which can be set using the
“percentile” option - see below. Not used if |
percentile |
The percentile level in percent used when |
date.pad |
Should missing data be padded-out? This is useful where a
data frame consists of two or more "chunks" of data with time gaps between
them. By setting |
type |
It is also possible to choose Only one |
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “jet” and
|
plot.type |
The |
key |
Should a key be drawn? The default is |
log |
Should the y-axis appear on a log scale? The default is
|
windflow |
This option allows a scatter plot to show the wind
speed/direction as an arrow. The option is a list e.g. The maximum length of the arrow plotted is a fraction of the plot dimension
with the longest arrow being This option works best where there are not too many data to ensure over-plotting does not become a problem. |
smooth |
Should a smooth line be applied to the data? The default is
|
ci |
If a smooth fit line is applied, then |
y.relation |
This determines how the y-axis scale is plotted. "same" ensures all panels use the same scale and "free" will use panel-specific scales. The latter is a useful setting when plotting data with very different values. |
ref.x |
See |
ref.y |
A list with details of the horizontal lines to be added
representing reference line(s). For example, |
key.columns |
Number of columns to be used in the key. With many
pollutants a single column can make to key too wide. The user can thus
choose to use several columns by setting |
key.position |
Location where the scale key is to plotted. Can include “top”, “bottom”, “right” and “left”. |
name.pol |
This option can be used to give alternative names for the
variables plotted. Instead of taking the column headings as names, the user
can supply replacements. For example, if a column had the name “nox”
and the user wanted a different description, then setting |
date.breaks |
Number of major x-axis intervals to use. The function will
try and choose a sensible number of dates/times as well as formatting the
date/time appropriately to the range being considered. This does not always
work as desired automatically. The user can therefore increase or decrease
the number of intervals by adjusting the value of |
date.format |
This option controls the date format on the x-axis. While
|
auto.text |
Either |
plot |
Should a plot be produced? |
... |
Other graphical parameters are passed onto |
The timePlot
is the basic time series plotting function in
openair
. Its purpose is to make it quick and easy to plot time series
for pollutants and other variables. The other purpose is to plot potentially
many variables together in as compact a way as possible.
The function is flexible enough to plot more than one variable at once. If
more than one variable is chosen plots it can either show all variables on
the same plot (with different line types) on the same scale, or (if
group = FALSE
) each variable in its own panels with its own scale.
The general preference is not to plot two variables on the same graph with
two different y-scales. It can be misleading to do so and difficult with more
than two variables. If there is in interest in plotting several variables
together that have very different scales, then it can be useful to normalise
the data first, which can be down be setting the normalise
option.
The user has fine control over the choice of colours, line width and line types used. This is useful for example, to emphasise a particular variable with a specific line type/colour/width.
timePlot
works very well with selectByDate()
, which is used for
selecting particular date ranges quickly and easily. See examples below.
By default plots are shown with a colour key at the bottom and in the case of
multiple pollutants or sites, strips on the left of each plot. Sometimes this
may be overkill and the user can opt to remove the key and/or the strip by
setting key
and/or strip
to FALSE
. One reason to do this
is to maximise the plotting area and therefore the information shown.
an openair object
David Carslaw
Other time series and trend functions:
TheilSen()
,
calendarPlot()
,
runRegression()
,
smoothTrend()
,
timeProp()
,
timeVariation()
,
trendLevel()
# basic use, single pollutant timePlot(mydata, pollutant = "nox") # two pollutants in separate panels ## Not run: timePlot(mydata, pollutant = c("nox", "no2")) # two pollutants in the same panel with the same scale ## Not run: timePlot(mydata, pollutant = c("nox", "no2"), group = TRUE) # alternative by normalising concentrations and plotting on the same scale ## Not run: timePlot(mydata, pollutant = c("nox", "co", "pm10", "so2"), group = TRUE, avg.time = "year", normalise = "1/1/1998", lwd = 3, lty = 1) ## End(Not run) # examples of selecting by date # plot for nox in 1999 ## Not run: timePlot(selectByDate(mydata, year = 1999), pollutant = "nox") # select specific date range for two pollutants ## Not run: timePlot(selectByDate(mydata, start = "6/8/2003", end = "13/8/2003"), pollutant = c("no2", "o3")) ## End(Not run) # choose different line styles etc ## Not run: timePlot(mydata, pollutant = c("nox", "no2"), lty = 1) # choose different line styles etc ## Not run: timePlot(selectByDate(mydata, year = 2004, month = 6), pollutant = c("nox", "no2"), lwd = c(1, 2), col = "black") ## End(Not run) # different averaging times #daily mean O3 ## Not run: timePlot(mydata, pollutant = "o3", avg.time = "day") # daily mean O3 ensuring each day has data capture of at least 75% ## Not run: timePlot(mydata, pollutant = "o3", avg.time = "day", data.thresh = 75) # 2-week average of O3 concentrations ## Not run: timePlot(mydata, pollutant = "o3", avg.time = "2 week")
# basic use, single pollutant timePlot(mydata, pollutant = "nox") # two pollutants in separate panels ## Not run: timePlot(mydata, pollutant = c("nox", "no2")) # two pollutants in the same panel with the same scale ## Not run: timePlot(mydata, pollutant = c("nox", "no2"), group = TRUE) # alternative by normalising concentrations and plotting on the same scale ## Not run: timePlot(mydata, pollutant = c("nox", "co", "pm10", "so2"), group = TRUE, avg.time = "year", normalise = "1/1/1998", lwd = 3, lty = 1) ## End(Not run) # examples of selecting by date # plot for nox in 1999 ## Not run: timePlot(selectByDate(mydata, year = 1999), pollutant = "nox") # select specific date range for two pollutants ## Not run: timePlot(selectByDate(mydata, start = "6/8/2003", end = "13/8/2003"), pollutant = c("no2", "o3")) ## End(Not run) # choose different line styles etc ## Not run: timePlot(mydata, pollutant = c("nox", "no2"), lty = 1) # choose different line styles etc ## Not run: timePlot(selectByDate(mydata, year = 2004, month = 6), pollutant = c("nox", "no2"), lwd = c(1, 2), col = "black") ## End(Not run) # different averaging times #daily mean O3 ## Not run: timePlot(mydata, pollutant = "o3", avg.time = "day") # daily mean O3 ensuring each day has data capture of at least 75% ## Not run: timePlot(mydata, pollutant = "o3", avg.time = "day", data.thresh = 75) # 2-week average of O3 concentrations ## Not run: timePlot(mydata, pollutant = "o3", avg.time = "2 week")
This function shows time series plots as stacked bar charts. The different
categories in the bar chart are made up from a character or factor variable
in a data frame. The function is primarily developed to support the plotting
of cluster analysis output from polarCluster
and
trajCluster
that consider local and regional (back trajectory)
cluster analysis respectively. However, the function has more general use for
understanding time series data.
timeProp( mydata, pollutant = "nox", proportion = "cluster", avg.time = "day", type = "default", normalise = FALSE, cols = "Set1", date.breaks = 7, date.format = NULL, key.columns = 1, key.position = "right", key.title = proportion, auto.text = TRUE, plot = TRUE, ... )
timeProp( mydata, pollutant = "nox", proportion = "cluster", avg.time = "day", type = "default", normalise = FALSE, cols = "Set1", date.breaks = 7, date.format = NULL, key.columns = 1, key.position = "right", key.title = proportion, auto.text = TRUE, plot = TRUE, ... )
mydata |
A data frame containing the fields |
pollutant |
Name of the pollutant to plot contained in |
proportion |
The splitting variable that makes up the bars in the bar
chart e.g. |
avg.time |
This defines the time period to average to. Can be
“sec”, “min”, “hour”, “day”, “DSTday”,
“week”, “month”, “quarter” or “year”. For much
increased flexibility a number can precede these options followed by a
space. For example, a timeAverage of 2 months would be Note that |
type |
It is also possible to choose
|
normalise |
If |
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “jet” and
|
date.breaks |
Number of major x-axis intervals to use. The function will
try and choose a sensible number of dates/times as well as formatting the
date/time appropriately to the range being considered. This does not
always work as desired automatically. The user can therefore increase or
decrease the number of intervals by adjusting the value of
|
date.format |
This option controls the date format on the x-axis. While
|
key.columns |
Number of columns to be used in the key. With many
pollutants a single column can make to key too wide. The user can thus
choose to use several columns by setting |
key.position |
Location where the scale key is to plotted. Allowed arguments currently include “top”, “right”, “bottom” and “left”. |
key.title |
The title of the key. |
auto.text |
Either |
plot |
Should a plot be produced? |
... |
Other graphical parameters passed onto |
In order to plot time series in this way, some sort of time aggregation is
needed, which is controlled by the option avg.time
.
The plot shows the value of pollutant
on the y-axis (averaged
according to avg.time
). The time intervals are made up of bars split
according to proportion
. The bars therefore show how the total value
of pollutant
is made up for any time interval.
an openair object
David Carslaw
Other time series and trend functions:
TheilSen()
,
calendarPlot()
,
runRegression()
,
smoothTrend()
,
timePlot()
,
timeVariation()
,
trendLevel()
Other cluster analysis functions:
polarCluster()
,
trajCluster()
## monthly plot of SO2 showing the contribution by wind sector timeProp(mydata, pollutant = "so2", avg.time = "month", proportion = "wd")
## monthly plot of SO2 showing the contribution by wind sector timeProp(mydata, pollutant = "so2", avg.time = "month", proportion = "wd")
Plots the diurnal, day of the week and monthly variation for different variables, typically pollutant concentrations. Four separate plots are produced.
timeVariation( mydata, pollutant = "nox", local.tz = NULL, normalise = FALSE, xlab = c("hour", "hour", "month", "weekday"), name.pol = pollutant, type = "default", group = NULL, difference = FALSE, statistic = "mean", conf.int = 0.95, B = 100, ci = TRUE, cols = "hue", ref.y = NULL, key = NULL, key.columns = 1, start.day = 1, panel.gap = 0.2, auto.text = TRUE, alpha = 0.4, month.last = FALSE, plot = TRUE, ... )
timeVariation( mydata, pollutant = "nox", local.tz = NULL, normalise = FALSE, xlab = c("hour", "hour", "month", "weekday"), name.pol = pollutant, type = "default", group = NULL, difference = FALSE, statistic = "mean", conf.int = 0.95, B = 100, ci = TRUE, cols = "hue", ref.y = NULL, key = NULL, key.columns = 1, start.day = 1, panel.gap = 0.2, auto.text = TRUE, alpha = 0.4, month.last = FALSE, plot = TRUE, ... )
mydata |
A data frame of hourly (or higher temporal resolution data).
Must include a |
pollutant |
Name of variable to plot. Two or more pollutants can be
plotted, in which case a form like |
local.tz |
Should the results be calculated in local time that includes
a treatment of daylight savings time (DST)? The default is not to consider
DST issues, provided the data were imported without a DST offset. Emissions
activity tends to occur at local time e.g. rush hour is at 8 am every day.
When the clocks go forward in spring, the emissions are effectively
released into the atmosphere typically 1 hour earlier during the summertime
i.e. when DST applies. When plotting diurnal profiles, this has the effect
of “smearing-out” the concentrations. Sometimes, a useful approach
is to express time as local time. This correction tends to produce
better-defined diurnal profiles of concentration (or other variables) and
allows a better comparison to be made with emissions/activity data. If set
to |
normalise |
Should variables be normalised? The default is |
xlab |
x-axis label; one for each sub-plot. |
name.pol |
Names to be given to the pollutant(s). This is useful if you want to give a fuller description of the variables, maybe also including subscripts etc. |
type |
It is also possible to choose Only one |
group |
This sets the grouping variable to be used. For example, if a
data frame had a column |
difference |
If two pollutants are chosen then setting |
statistic |
Can be “mean” (default) or “median”. If the
statistic is ‘mean’ then the mean line and the 95\
interval in the mean are plotted by default. If the statistic is
‘median’ then the median line is plotted together with the 5/95 and
25/75th quantiles are plotted. Users can control the confidence intervals
with |
conf.int |
The confidence intervals to be plotted. If |
B |
Number of bootstrap replicates to use. Can be useful to reduce this value when there are a large number of observations available to increase the speed of the calculations without affecting the 95\ interval calculations by much. |
ci |
Should confidence intervals be shown? The default is |
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “jet” and
|
ref.y |
A list with details of the horizontal lines to be added
representing reference line(s). For example, |
key |
By default |
key.columns |
Number of columns to be used in the key. With many
pollutants a single column can make to key too wide. The user can thus
choose to use several columns by setting |
start.day |
What day of the week should the plots start on? The user can
change the start day by supplying an integer between 0 and 6. Sunday = 0,
Monday = 1, ... For example to start the weekday plots on a Saturday,
choose |
panel.gap |
The gap between panels in the hour-day plot. |
auto.text |
Either |
alpha |
The alpha transparency used for plotting confidence intervals. 0 is fully transparent and 1 is opaque. The default is 0.4 |
month.last |
Should the order of the plots be changed so the plot showing monthly means be the last plot for a logical hierarchy of averaging periods? |
plot |
Should a plot be produced? |
... |
Other graphical parameters passed onto |
The variation of pollutant concentrations by hour of the day and day of the week etc. can reveal many interesting features that relate to source types and meteorology. For traffic sources, there are often important differences in the way vehicles vary by vehicles type e.g. less heavy vehicles at weekends.
The timeVariation
function makes it easy to see how concentrations
(and many other variable types) vary by hour of the day and day of the week.
The plots also show the 95\ confidence intervals in the mean are calculated through bootstrap simulations, which will provide more robust estimates of the confidence intervals (particularly when there are relatively few data).
The function can handle multiple pollutants and uses the flexible type
option to provide separate panels for each 'type' — see cutData
for
more details. timeVariation
can also accept a group
option
which is useful if data are stacked. This will work in a similar way to
having multiple pollutants in separate columns.
The user can supply their own ylim
e.g. ylim = c(0, 200)
that
will be used for all plots. ylim
can also be a list of length four to
control the y-limits on each individual plot e.g. ylim =
list(c(-100,500), c(200, 300), c(-400,400), c(50,70))
. These pairs
correspond to the hour, weekday, month and day-hour plots respectively.
The option difference
will calculate the difference in means of two
pollutants together with bootstrap estimates of the 95\
in the difference in the mean. This works in two ways: either two pollutants
are supplied in separate columns e.g. pollutant = c("no2", "o3")
, or
there are two unique values of group
. The difference is calculated as
the second pollutant minus the first and is labelled as such. Considering
differences in this way can provide many useful insights and is particularly
useful for model evaluation when information is needed about where a model
differs from observations by many different time scales. The manual contains
various examples of using difference = TRUE
.
Note also that the timeVariation
function works well on a subset of
data and in conjunction with other plots. For example, a
polarPlot
may highlight an interesting feature for a particular
wind speed/direction range. By filtering for those conditions
timeVariation
can help determine whether the temporal variation of
that feature differs from other features — and help with source
identification.
In addition, timeVariation
will work well with other variables if
available. Examples include meteorological and traffic flow data.
Depending on the choice of statistic, a subheading is added. Users can
control the text in the subheading through the use of sub
e.g.
sub = ""
will remove any subheading.
an openair object. The four components of
timeVariation are: day.hour
, hour
, day
and
month
. Associated data.frames can be extracted directly using the
subset
option, e.g. as in plot(object, subset = "day.hour")
,
summary(output, subset = "hour")
, etc., for output <-
timeVariation(mydata, "nox")
David Carslaw
Other time series and trend functions:
TheilSen()
,
calendarPlot()
,
runRegression()
,
smoothTrend()
,
timePlot()
,
timeProp()
,
trendLevel()
# basic use timeVariation(mydata, pollutant = "nox") # for a subset of conditions ## Not run: timeVariation(subset(mydata, ws > 3 & wd > 100 & wd < 270), pollutant = "pm10", ylab = "pm10 (ug/m3)") ## End(Not run) # multiple pollutants with concentrations normalised ## Not run: timeVariation(mydata, pollutant = c("nox", "co"), normalise = TRUE) # show BST/GMT variation (see ?cutData for more details) # the NOx plot shows the profiles are very similar when expressed in # local time, showing that the profile is dominated by a local source # that varies by local time and not by GMT i.e. road vehicle emissions ## Not run: timeVariation(mydata, pollutant = "nox", type = "dst", local.tz = "Europe/London") ## In this case it is better to group the results for clarity: ## Not run: timeVariation(mydata, pollutant = "nox", group = "dst", local.tz = "Europe/London") # By contrast, a variable such as wind speed shows a clear shift when # expressed in local time. These two plots can help show whether the # variation is dominated by man-made influences or natural processes ## Not run: timeVariation(mydata, pollutant = "ws", group = "dst", local.tz = "Europe/London") ## It is also possible to plot several variables and set type. For ## example, consider the NOx and NO2 split by levels of O3: ## Not run: timeVariation(mydata, pollutant = c("nox", "no2"), type = "o3", normalise = TRUE) ## difference in concentrations ## Not run: timeVariation(mydata, poll= c("pm25", "pm10"), difference = TRUE) # It is also useful to consider how concentrations vary by # considering two different periods e.g. in intervention # analysis. In the following plot NO2 has clearly increased but much # less so at weekends - perhaps suggesting vehicles other than cars # are important because flows of cars are approximately invariant by # day of the week ## Not run: mydata <- splitByDate(mydata, dates= "1/1/2003", labels = c("before Jan. 2003", "After Jan. 2003")) timeVariation(mydata, pollutant = "no2", group = "split.by", difference = TRUE) ## End(Not run) ## sub plots can be extracted from the openair object ## Not run: myplot <- timeVariation(mydata, pollutant = "no2") plot(myplot, subset = "day.hour") # top weekday and plot ## End(Not run) ## individual plots ## plot(myplot, subset="day.hour") for the weekday and hours subplot (top) ## plot(myplot, subset="hour") for the diurnal plot ## plot(myplot, subset="day") for the weekday plot ## plot(myplot, subset="month") for the monthly plot ## numerical results (mean, lower/upper uncertainties) ## myplot$data$day.hour # the weekday and hour data set ## summary(myplot, subset = "hour") #summary of hour data set ## head(myplot, subset = "day") #head/top of day data set ## tail(myplot, subset = "month") #tail/top of month data set ## plot quantiles and median ## Not run: timeVariation(mydata, stati="median", poll="pm10", col = "firebrick") ## with different intervals timeVariation(mydata, stati="median", poll="pm10", conf.int = c(0.75, 0.99), col = "firebrick") ## End(Not run)
# basic use timeVariation(mydata, pollutant = "nox") # for a subset of conditions ## Not run: timeVariation(subset(mydata, ws > 3 & wd > 100 & wd < 270), pollutant = "pm10", ylab = "pm10 (ug/m3)") ## End(Not run) # multiple pollutants with concentrations normalised ## Not run: timeVariation(mydata, pollutant = c("nox", "co"), normalise = TRUE) # show BST/GMT variation (see ?cutData for more details) # the NOx plot shows the profiles are very similar when expressed in # local time, showing that the profile is dominated by a local source # that varies by local time and not by GMT i.e. road vehicle emissions ## Not run: timeVariation(mydata, pollutant = "nox", type = "dst", local.tz = "Europe/London") ## In this case it is better to group the results for clarity: ## Not run: timeVariation(mydata, pollutant = "nox", group = "dst", local.tz = "Europe/London") # By contrast, a variable such as wind speed shows a clear shift when # expressed in local time. These two plots can help show whether the # variation is dominated by man-made influences or natural processes ## Not run: timeVariation(mydata, pollutant = "ws", group = "dst", local.tz = "Europe/London") ## It is also possible to plot several variables and set type. For ## example, consider the NOx and NO2 split by levels of O3: ## Not run: timeVariation(mydata, pollutant = c("nox", "no2"), type = "o3", normalise = TRUE) ## difference in concentrations ## Not run: timeVariation(mydata, poll= c("pm25", "pm10"), difference = TRUE) # It is also useful to consider how concentrations vary by # considering two different periods e.g. in intervention # analysis. In the following plot NO2 has clearly increased but much # less so at weekends - perhaps suggesting vehicles other than cars # are important because flows of cars are approximately invariant by # day of the week ## Not run: mydata <- splitByDate(mydata, dates= "1/1/2003", labels = c("before Jan. 2003", "After Jan. 2003")) timeVariation(mydata, pollutant = "no2", group = "split.by", difference = TRUE) ## End(Not run) ## sub plots can be extracted from the openair object ## Not run: myplot <- timeVariation(mydata, pollutant = "no2") plot(myplot, subset = "day.hour") # top weekday and plot ## End(Not run) ## individual plots ## plot(myplot, subset="day.hour") for the weekday and hours subplot (top) ## plot(myplot, subset="hour") for the diurnal plot ## plot(myplot, subset="day") for the weekday plot ## plot(myplot, subset="month") for the monthly plot ## numerical results (mean, lower/upper uncertainties) ## myplot$data$day.hour # the weekday and hour data set ## summary(myplot, subset = "hour") #summary of hour data set ## head(myplot, subset = "day") #head/top of day data set ## tail(myplot, subset = "month") #tail/top of month data set ## plot quantiles and median ## Not run: timeVariation(mydata, stati="median", poll="pm10", col = "firebrick") ## with different intervals timeVariation(mydata, stati="median", poll="pm10", conf.int = c(0.75, 0.99), col = "firebrick") ## End(Not run)
This function carries out cluster analysis of HYSPLIT back trajectories. The
function is specifically designed to work with the trajectories imported
using the openair
importTraj
function, which provides
pre-calculated back trajectories at specific receptor locations.
trajCluster( traj, method = "Euclid", n.cluster = 5, type = "default", cols = "Set1", split.after = FALSE, map.fill = TRUE, map.cols = "grey40", map.alpha = 0.4, projection = "lambert", parameters = c(51, 51), orientation = c(90, 0, 0), by.type = FALSE, origin = TRUE, plot = TRUE, ... )
trajCluster( traj, method = "Euclid", n.cluster = 5, type = "default", cols = "Set1", split.after = FALSE, map.fill = TRUE, map.cols = "grey40", map.alpha = 0.4, projection = "lambert", parameters = c(51, 51), orientation = c(90, 0, 0), by.type = FALSE, origin = TRUE, plot = TRUE, ... )
traj |
An openair trajectory data frame resulting from the use of
|
method |
Method used to calculate the distance matrix for the back trajectories. There are two methods available: “Euclid” and “Angle”. |
n.cluster |
Number of clusters to calculate. |
type |
|
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “jet” and
|
split.after |
For |
map.fill |
Should the base map be a filled polygon? Default is to fill countries. |
map.cols |
If |
map.alpha |
The transparency level of the filled map which takes values from 0 (full transparency) to 1 (full opacity). Setting it below 1 can help view trajectories, trajectory surfaces etc. and a filled base map. |
projection |
The map projection to be used. Different map projections
are possible through the |
parameters |
From the |
orientation |
From the |
by.type |
The percentage of the total number of trajectories is given
for all data by default. Setting |
origin |
If true a filled circle dot is shown to mark the receptor point. |
plot |
Should a plot be produced? |
... |
Other graphical parameters passed onto |
Two main methods are available to cluster the back trajectories using two different calculations of the distance matrix. The default is to use the standard Euclidian distance between each pair of trajectories. Also available is an angle-based distance matrix based on Sirois and Bottenheim (1995). The latter method is useful when the interest is the direction of the trajectories in clustering.
The distance matrix calculations are made in C++ for speed. For data sets of
up to 1 year both methods should be relatively fast, although the
method = "Angle"
does tend to take much longer to calculate. Further
details of these methods are given in the openair manual.
an openair object. The data
component contains
both traj
(the original data appended with its cluster) and results
(the average trajectory path per cluster, shown in the trajCluster()
plot.)
David Carslaw
Sirois, A. and Bottenheim, J.W., 1995. Use of backward trajectories to interpret the 5-year record of PAN and O3 ambient air concentrations at Kejimkujik National Park, Nova Scotia. Journal of Geophysical Research, 100: 2867-2881.
Other trajectory analysis functions:
importTraj()
,
trajLevel()
,
trajPlot()
Other cluster analysis functions:
polarCluster()
,
timeProp()
## Not run: ## import trajectories traj <- importTraj(site = "london", year = 2009) ## calculate clusters clust <- trajCluster(traj, n.cluster = 5) head(clust$data) ## note new variable 'cluster' ## use different distance matrix calculation, and calculate by season traj <- trajCluster(traj, method = "Angle", type = "season", n.cluster = 4) ## End(Not run)
## Not run: ## import trajectories traj <- importTraj(site = "london", year = 2009) ## calculate clusters clust <- trajCluster(traj, n.cluster = 5) head(clust$data) ## note new variable 'cluster' ## use different distance matrix calculation, and calculate by season traj <- trajCluster(traj, method = "Angle", type = "season", n.cluster = 4) ## End(Not run)
This function plots gridded back trajectories. This function requires that
data are imported using the importTraj()
function.
trajLevel( mydata, lon = "lon", lat = "lat", pollutant = "height", type = "default", smooth = FALSE, statistic = "frequency", percentile = 90, map = TRUE, lon.inc = 1, lat.inc = 1, min.bin = 1, .combine = NA, sigma = 1.5, map.fill = TRUE, map.res = "default", map.cols = "grey40", map.alpha = 0.3, projection = "lambert", parameters = c(51, 51), orientation = c(90, 0, 0), grid.col = "deepskyblue", origin = TRUE, plot = TRUE, ... )
trajLevel( mydata, lon = "lon", lat = "lat", pollutant = "height", type = "default", smooth = FALSE, statistic = "frequency", percentile = 90, map = TRUE, lon.inc = 1, lat.inc = 1, min.bin = 1, .combine = NA, sigma = 1.5, map.fill = TRUE, map.res = "default", map.cols = "grey40", map.alpha = 0.3, projection = "lambert", parameters = c(51, 51), orientation = c(90, 0, 0), grid.col = "deepskyblue", origin = TRUE, plot = TRUE, ... )
mydata |
Data frame, the result of importing a trajectory file using
|
lon |
Column containing the longitude, as a decimal. |
lat |
Column containing the latitude, as a decimal. |
pollutant |
Pollutant to be plotted. By default the trajectory height is used. |
type |
It is also possible to choose
|
smooth |
Should the trajectory surface be smoothed? |
statistic |
Statistic to use for There are also various ways of plotting concentrations. It is possible to set If If If |
percentile |
The percentile concentration of |
map |
Should a base map be drawn? If |
lon.inc , lat.inc
|
The longitude and latitude intervals to be used for binning data. |
min.bin |
The minimum number of unique points in a grid cell. Counts
below |
.combine |
When statistic is "SQTBA" it is possible to combine lots of
receptor locations to derive a single map. |
sigma |
For the SQTBA approach |
map.fill |
Should the base map be a filled polygon? Default is to fill countries. |
map.res |
The resolution of the base map. By default the function uses
the ‘world’ map from the |
map.cols |
If |
map.alpha |
The transparency level of the filled map which takes values from 0 (full transparency) to 1 (full opacity). Setting it below 1 can help view trajectories, trajectory surfaces etc. and a filled base map. |
projection |
The map projection to be used. Different map projections
are possible through the |
parameters |
From the |
orientation |
From the |
grid.col |
The colour of the map grid to be used. To remove the grid set
|
origin |
If true a filled circle dot is shown to mark the receptor point. |
plot |
Should a plot be produced? |
... |
other arguments are passed to |
An alternative way of showing the trajectories compared with plotting
trajectory lines is to bin the points into latitude/longitude intervals. For
these purposes trajLevel()
should be used. There are several trajectory
statistics that can be plotted as gridded surfaces. First, statistic
can be
set to "frequency" to show the number of back trajectory points in a grid
square. Grid squares are by default at 1 degree intervals, controlled by
lat.inc
and lon.inc
. Such plots are useful for showing the frequency of
air mass locations. Note that it is also possible to set method = "hexbin"
for plotting frequencies (not concentrations), which will produce a plot by
hexagonal binning.
If statistic = "difference"
the trajectories associated with a
concentration greater than percentile
are compared with the the full set of
trajectories to understand the differences in frequencies of the origin of
air masses of the highest concentration trajectories compared with the
trajectories on average. The comparison is made by comparing the percentage
change in gridded frequencies. For example, such a plot could show that the
top 10\
the east.
If statistic = "pscf"
then the Potential Source Contribution Function is
plotted. The PSCF calculates the probability that a source is located at
latitude and longitude
(Pekney et al., 2006).The basis of
PSCF is that if a source is located at (i,j), an air parcel back trajectory
passing through that location indicates that material from the source can be
collected and transported along the trajectory to the receptor site. PSCF
solves
where is the number of times
that the trajectories passed through the cell (i,j) and
is the
number of times that a source concentration was high when the trajectories
passed through the cell (i,j). The criterion for determining
is
controlled by
percentile
, which by default is 90. Note also that cells with
few data have a weighting factor applied to reduce their effect.
A limitation of the PSCF method is that grid cells can have the same PSCF value when sample concentrations are either only slightly higher or much higher than the criterion. As a result, it can be difficult to distinguish moderate sources from strong ones. Seibert et al. (1994) computed concentration fields to identify source areas of pollutants. The Concentration Weighted Trajectory (CWT) approach considers the concentration of a species together with its residence time in a grid cell. The CWT approach has been shown to yield similar results to the PSCF approach. The openair manual has more details and examples of these approaches.
A further useful refinement is to smooth the resulting surface, which is
possible by setting smooth = TRUE
.
an openair object
This function is under active development and is likely to change
David Carslaw
Pekney, N. J., Davidson, C. I., Zhou, L., & Hopke, P. K. (2006). Application of PSCF and CPF to PMF-Modeled Sources of PM 2.5 in Pittsburgh. Aerosol Science and Technology, 40(10), 952-961.
Seibert, P., Kromp-Kolb, H., Baltensperger, U., Jost, D., 1994. Trajectory analysis of high-alpine air pollution data. NATO Challenges of Modern Society 18, 595-595.
Xie, Y., & Berkowitz, C. M. (2007). The use of conditional probability functions and potential source contribution functions to identify source regions and advection pathways of hydrocarbon emissions in Houston, Texas. Atmospheric Environment, 41(28), 5831-5847.
Other trajectory analysis functions:
importTraj()
,
trajCluster()
,
trajPlot()
# show a simple case with no pollutant i.e. just the trajectories # let's check to see where the trajectories were coming from when # Heathrow Airport was closed due to the Icelandic volcanic eruption # 15--21 April 2010. # import trajectories for London and plot ## Not run: lond <- importTraj("london", 2010) ## End(Not run) # more examples to follow linking with concentration measurements... # import some measurements from KC1 - London ## Not run: kc1 <- importAURN("kc1", year = 2010) # now merge with trajectory data by 'date' lond <- merge(lond, kc1, by = "date") # trajectory plot, no smoothing - and limit lat/lon area of interest # use PSCF trajLevel(subset(lond, lat > 40 & lat < 70 & lon > -20 & lon < 20), pollutant = "pm10", statistic = "pscf" ) # can smooth surface, suing CWT approach: trajLevel(subset(lond, lat > 40 & lat < 70 & lon > -20 & lon < 20), pollutant = "pm2.5", statistic = "cwt", smooth = TRUE ) # plot by season: trajLevel(subset(lond, lat > 40 & lat < 70 & lon > -20 & lon < 20), pollutant = "pm2.5", statistic = "pscf", type = "season" ) ## End(Not run)
# show a simple case with no pollutant i.e. just the trajectories # let's check to see where the trajectories were coming from when # Heathrow Airport was closed due to the Icelandic volcanic eruption # 15--21 April 2010. # import trajectories for London and plot ## Not run: lond <- importTraj("london", 2010) ## End(Not run) # more examples to follow linking with concentration measurements... # import some measurements from KC1 - London ## Not run: kc1 <- importAURN("kc1", year = 2010) # now merge with trajectory data by 'date' lond <- merge(lond, kc1, by = "date") # trajectory plot, no smoothing - and limit lat/lon area of interest # use PSCF trajLevel(subset(lond, lat > 40 & lat < 70 & lon > -20 & lon < 20), pollutant = "pm10", statistic = "pscf" ) # can smooth surface, suing CWT approach: trajLevel(subset(lond, lat > 40 & lat < 70 & lon > -20 & lon < 20), pollutant = "pm2.5", statistic = "cwt", smooth = TRUE ) # plot by season: trajLevel(subset(lond, lat > 40 & lat < 70 & lon > -20 & lon < 20), pollutant = "pm2.5", statistic = "pscf", type = "season" ) ## End(Not run)
This function plots back trajectories. This function requires that data are
imported using the importTraj
function.
trajPlot( mydata, lon = "lon", lat = "lat", pollutant = "height", type = "default", map = TRUE, group = NA, map.fill = TRUE, map.res = "default", map.cols = "grey40", map.alpha = 0.4, projection = "lambert", parameters = c(51, 51), orientation = c(90, 0, 0), grid.col = "deepskyblue", npoints = 12, origin = TRUE, plot = TRUE, ... )
trajPlot( mydata, lon = "lon", lat = "lat", pollutant = "height", type = "default", map = TRUE, group = NA, map.fill = TRUE, map.res = "default", map.cols = "grey40", map.alpha = 0.4, projection = "lambert", parameters = c(51, 51), orientation = c(90, 0, 0), grid.col = "deepskyblue", npoints = 12, origin = TRUE, plot = TRUE, ... )
mydata |
Data frame, the result of importing a trajectory file using
|
lon |
Column containing the longitude, as a decimal. |
lat |
Column containing the latitude, as a decimal. |
pollutant |
Pollutant to be plotted. By default the trajectory height is used. |
type |
It is also possible to choose
|
map |
Should a base map be drawn? If |
group |
It is sometimes useful to group and colour trajectories according to a grouping variable. See example below. |
map.fill |
Should the base map be a filled polygon? Default is to fill countries. |
map.res |
The resolution of the base map. By default the function uses
the ‘world’ map from the |
map.cols |
If |
map.alpha |
The transparency level of the filled map which takes values from 0 (full transparency) to 1 (full opacity). Setting it below 1 can help view trajectories, trajectory surfaces etc. and a filled base map. |
projection |
The map projection to be used. Different map projections
are possible through the |
parameters |
From the |
orientation |
From the |
grid.col |
The colour of the map grid to be used. To remove the grid set
|
npoints |
A dot is placed every |
origin |
If true a filled circle dot is shown to mark the receptor point. |
plot |
Should a plot be produced? |
... |
other arguments are passed to |
Several types of trajectory plot are available. trajPlot
by default
will plot each lat/lon location showing the origin of each trajectory, if no
pollutant
is supplied.
If a pollutant is given, by merging the trajectory data with concentration
data (see example below), the trajectories are colour-coded by the
concentration of pollutant
. With a long time series there can be lots
of overplotting making it difficult to gauge the overall concentration
pattern. In these cases setting alpha
to a low value e.g. 0.1 can
help.
The user can also show points instead of lines by plot.type = "p"
.
Note that trajPlot
will plot only the full length trajectories. This
should be remembered when selecting only part of a year to plot.
David Carslaw
Other trajectory analysis functions:
importTraj()
,
trajCluster()
,
trajLevel()
# show a simple case with no pollutant i.e. just the trajectories # let's check to see where the trajectories were coming from when # Heathrow Airport was closed due to the Icelandic volcanic eruption # 15--21 April 2010. # import trajectories for London and plot ## Not run: lond <- importTraj("london", 2010) # well, HYSPLIT seems to think there certainly were conditions where trajectories # orginated from Iceland... trajPlot(selectByDate(lond, start = "15/4/2010", end = "21/4/2010")) ## End(Not run) # plot by day, need a column that makes a date ## Not run: lond$day <- as.Date(lond$date) trajPlot(selectByDate(lond, start = "15/4/2010", end = "21/4/2010"), type = "day") ## End(Not run) # or show each day grouped by colour, with some other options set ## Not run: trajPlot(selectByDate(lond, start = "15/4/2010", end = "21/4/2010"), group = "day", col = "turbo", lwd = 2, key.pos = "right", key.col = 1) ## End(Not run) # more examples to follow linking with concentration measurements...
# show a simple case with no pollutant i.e. just the trajectories # let's check to see where the trajectories were coming from when # Heathrow Airport was closed due to the Icelandic volcanic eruption # 15--21 April 2010. # import trajectories for London and plot ## Not run: lond <- importTraj("london", 2010) # well, HYSPLIT seems to think there certainly were conditions where trajectories # orginated from Iceland... trajPlot(selectByDate(lond, start = "15/4/2010", end = "21/4/2010")) ## End(Not run) # plot by day, need a column that makes a date ## Not run: lond$day <- as.Date(lond$date) trajPlot(selectByDate(lond, start = "15/4/2010", end = "21/4/2010"), type = "day") ## End(Not run) # or show each day grouped by colour, with some other options set ## Not run: trajPlot(selectByDate(lond, start = "15/4/2010", end = "21/4/2010"), group = "day", col = "turbo", lwd = 2, key.pos = "right", key.col = 1) ## End(Not run) # more examples to follow linking with concentration measurements...
The trendLevel function provides a way of rapidly showing a large amount of data in a condensed form. In one plot, the variation in the concentration of one pollutant can to shown as a function of three other categorical properties. The default version of the plot uses y = hour of day, x = month of year and type = year to provide information on trends, seasonal effects and diurnal variations. However, x, y and type and summarising statistics can all be modified to provide a range of other similar plots.
trendLevel( mydata, pollutant = "nox", x = "month", y = "hour", type = "year", rotate.axis = c(90, 0), n.levels = c(10, 10, 4), limits = c(0, 100), cols = "default", auto.text = TRUE, key.header = "use.stat.name", key.footer = pollutant, key.position = "right", key = TRUE, labels = NA, breaks = NA, statistic = c("mean", "max", "frequency"), stat.args = NULL, stat.safe.mode = TRUE, drop.unused.types = TRUE, col.na = "white", plot = TRUE, ... )
trendLevel( mydata, pollutant = "nox", x = "month", y = "hour", type = "year", rotate.axis = c(90, 0), n.levels = c(10, 10, 4), limits = c(0, 100), cols = "default", auto.text = TRUE, key.header = "use.stat.name", key.footer = pollutant, key.position = "right", key = TRUE, labels = NA, breaks = NA, statistic = c("mean", "max", "frequency"), stat.args = NULL, stat.safe.mode = TRUE, drop.unused.types = TRUE, col.na = "white", plot = TRUE, ... )
mydata |
The openair data frame to use to generate the |
pollutant |
The name of the data series in |
x |
The name of the data series to use as the |
y |
The names of the data series to use as the |
type |
See |
rotate.axis |
The rotation to be applied to |
n.levels |
The number of levels to split |
limits |
The colour scale range to use when generating the
|
cols |
The colour set to use to colour the |
auto.text |
Automatic routine text formatting. |
key.header , key.footer
|
Adds additional text labels above and/or below
the scale key, respectively. For example, passing the options
|
key.position |
Location where the scale key should be plotted. Allowed arguments currently include “top”, “right”, “bottom” and “left”. |
key |
Fine control of the scale key via |
labels |
If a categorical colour scale is required then these labels
will be used. Note there is one less label than break. For example,
|
breaks |
If a categorical colour scale is required then these breaks
will be used. For example, |
statistic |
The statistic method to be use to summarise locally binned
|
stat.args |
Additional options to be used with |
stat.safe.mode |
An addition protection applied when using functions
directly with |
drop.unused.types |
Hide unused/empty |
col.na |
Colour to be used to show missing data. |
plot |
Should a plot be produced? |
... |
Addition options are passed on to |
trendLevel
allows the use of third party summarising functions via the
statistic
option. Any additional function arguments not included
within a function called using statistic
should be supplied as a list
of named parameters and sent using stat.args
. For example, the encoded
option statistic = "mean"
is equivalent to statistic = mean,
stat.args = list(na.rm = TRUE)
or the R command mean(x, na.rm= TRUE)
.
Many R functions and user's own code could be applied in a similar fashion,
subject to the following restrictions: the first argument sent to the
function must be the data series to be analysed; the name ‘x’ cannot be used
for any of the extra options supplied in stat.args
; and the function
should return the required answer as a numeric or NA
. Note: If the
supplied function returns more than one answer, currently only the first of
these is retained and used by trendLevel
. All other returned
information will be ignored without warning. If the function terminates with
an error when it is sent an empty data series, the option
stat.safe.mode
should not be set to FALSE
or trendLevel
may fail. Note: The stat.safe.mode = TRUE
option returns an NA without
warning for empty data series.
an openair object.
Karl Ropkins and David Carslaw
Other time series and trend functions:
TheilSen()
,
calendarPlot()
,
runRegression()
,
smoothTrend()
,
timePlot()
,
timeProp()
,
timeVariation()
#basic use #default statistic = "mean" trendLevel(mydata, pollutant = "nox") #applying same as 'own' statistic my.mean <- function(x) mean(x, na.rm = TRUE) trendLevel(mydata, pollutant = "nox", statistic = my.mean) #alternative for 'third party' statistic #trendLevel(mydata, pollutant = "nox", statistic = mean, # stat.args = list(na.rm = TRUE)) ## Not run: # example with categorical scale trendLevel(mydata, pollutant = "no2", border = "white", statistic = "max", breaks = c(0, 50, 100, 500), labels = c("low", "medium", "high"), cols = c("forestgreen", "yellow", "red")) ## End(Not run)
#basic use #default statistic = "mean" trendLevel(mydata, pollutant = "nox") #applying same as 'own' statistic my.mean <- function(x) mean(x, na.rm = TRUE) trendLevel(mydata, pollutant = "nox", statistic = my.mean) #alternative for 'third party' statistic #trendLevel(mydata, pollutant = "nox", statistic = mean, # stat.args = list(na.rm = TRUE)) ## Not run: # example with categorical scale trendLevel(mydata, pollutant = "no2", border = "white", statistic = "max", breaks = c(0, 50, 100, 500), labels = c("low", "medium", "high"), cols = c("forestgreen", "yellow", "red")) ## End(Not run)
The traditional wind rose plot that plots wind speed and wind direction by different intervals. The pollution rose applies the same plot structure but substitutes other measurements, most commonly a pollutant time series, for wind speed.
windRose( mydata, ws = "ws", wd = "wd", ws2 = NA, wd2 = NA, ws.int = 2, angle = 30, type = "default", bias.corr = TRUE, cols = "default", grid.line = NULL, width = 1, seg = NULL, auto.text = TRUE, breaks = 4, offset = 10, normalise = FALSE, max.freq = NULL, paddle = TRUE, key.header = NULL, key.footer = "(m/s)", key.position = "bottom", key = TRUE, dig.lab = 5, include.lowest = FALSE, statistic = "prop.count", pollutant = NULL, annotate = TRUE, angle.scale = 315, border = NA, alpha = 1, plot = TRUE, ... )
windRose( mydata, ws = "ws", wd = "wd", ws2 = NA, wd2 = NA, ws.int = 2, angle = 30, type = "default", bias.corr = TRUE, cols = "default", grid.line = NULL, width = 1, seg = NULL, auto.text = TRUE, breaks = 4, offset = 10, normalise = FALSE, max.freq = NULL, paddle = TRUE, key.header = NULL, key.footer = "(m/s)", key.position = "bottom", key = TRUE, dig.lab = 5, include.lowest = FALSE, statistic = "prop.count", pollutant = NULL, annotate = TRUE, angle.scale = 315, border = NA, alpha = 1, plot = TRUE, ... )
mydata |
A data frame containing fields |
ws |
Name of the column representing wind speed. |
wd |
Name of the column representing wind direction. |
ws2 , wd2
|
The user can supply a second set of wind speed and wind
direction values with which the first can be compared. See
|
ws.int |
The Wind speed interval. Default is 2 m/s but for low met masts with low mean wind speeds a value of 1 or 0.5 m/s may be better. |
angle |
Default angle of “spokes” is 30. Other potentially useful
angles are 45 and 10. Note that the width of the wind speed interval may
need adjusting using |
type |
It is also possible to choose Type can be up length two e.g. |
bias.corr |
When |
cols |
Colours to be used for plotting. Options include
“default”, “increment”, “heat”, “jet”,
“hue” and user defined. For user defined the user can supply a list
of colour names recognised by R (type |
grid.line |
Grid line interval to use. If |
width |
For |
seg |
When |
auto.text |
Either |
breaks |
Most commonly, the number of break points for wind speed. With
the |
offset |
The size of the 'hole' in the middle of the plot, expressed as a percentage of the polar axis scale, default 10. |
normalise |
If |
max.freq |
Controls the scaling used by setting the maximum value for the radial limits. This is useful to ensure several plots use the same radial limits. |
paddle |
Either |
key.header |
Adds additional text/labels above the scale key. For
example, passing |
key.footer |
Adds additional text/labels below the scale key. See
|
key.position |
Location where the scale key is to plotted. Allowed arguments currently include “top”, “right”, “bottom” and “left”. |
key |
Fine control of the scale key via |
dig.lab |
The number of significant figures at which scientific number formatting is used in break point and key labelling. Default 5. |
include.lowest |
Logical. If |
statistic |
The |
pollutant |
Alternative data series to be sampled instead of wind speed.
The |
annotate |
If |
angle.scale |
The scale is by default shown at a 315 degree angle.
Sometimes the placement of the scale may interfere with an interesting
feature. The user can therefore set |
border |
Border colour for shaded areas. Default is no border. |
alpha |
The alpha transparency to use for the plotting surface (a value
between 0 and 1 with zero being fully transparent and 1 fully opaque).
Setting a value below 1 can be useful when plotting surfaces on a map using
the package |
plot |
Should a plot be produced? |
... |
Other parameters that are passed on to |
For windRose
data are summarised by direction, typically by 45 or 30
(or 10) degrees and by different wind speed categories. Typically, wind
speeds are represented by different width "paddles". The plots show the
proportion (here represented as a percentage) of time that the wind is from a
certain angle and wind speed range.
By default windRose
will plot a windRose in using "paddle" style
segments and placing the scale key below the plot.
The argument pollutant
uses the same plotting structure but
substitutes another data series, defined by pollutant
, for wind speed.
It is recommended to use pollutionRose()
for plotting pollutant
concentrations.
The option statistic = "prop.mean"
provides a measure of the relative
contribution of each bin to the panel mean, and is intended for use with
pollutionRose
.
an openair object. Summarised proportions can be
extracted directly using the $data
operator, e.g. object$data
for output <- windRose(mydata)
. This returns a data frame with three
set columns: cond
, conditioning based on type
; wd
, the
wind direction; and calm
, the statistic
for the proportion of
data unattributed to any specific wind direction because it was collected
under calm conditions; and then several (one for each range binned for the
plot) columns giving proportions of measurements associated with each
ws
or pollutant
range plotted as a discrete panel.
windRose
and pollutionRose
both use drawOpenKey()
to
produce scale keys.
David Carslaw (with some additional contributions by Karl Ropkins)
Applequist, S, 2012: Wind Rose Bias Correction. J. Appl. Meteor. Climatol., 51, 1305-1309.
Droppo, J.G. and B.A. Napier (2008) Wind Direction Bias in Generating Wind Roses and Conducting Sector-Based Air Dispersion Modeling, Journal of the Air & Waste Management Association, 58:7, 913-918.
Other polar directional analysis functions:
percentileRose()
,
polarAnnulus()
,
polarCluster()
,
polarDiff()
,
polarFreq()
,
polarPlot()
,
pollutionRose()
# basic plot windRose(mydata) # one windRose for each year windRose(mydata,type = "year") # windRose in 10 degree intervals with gridlines and width adjusted ## Not run: windRose(mydata, angle = 10, width = 0.2, grid.line = 1) ## End(Not run)
# basic plot windRose(mydata) # one windRose for each year windRose(mydata,type = "year") # windRose in 10 degree intervals with gridlines and width adjusted ## Not run: windRose(mydata, angle = 10, width = 0.2, grid.line = 1) ## End(Not run)