Title: | Converting Transport Data from GTFS Format to GPS-Like Records |
---|---|
Description: | Convert general transit feed specification (GTFS) data to global positioning system (GPS) records in 'data.table' format. It also has some functions to subset GTFS data in time and space and to convert both representations to simple feature format. |
Authors: | Rafael H. M. Pereira [aut] , Pedro R. Andrade [aut, cre] , Joao Bazzo [aut] , Daniel Herszenhut [ctb] , Marcin Stepniak [ctb], Marcus Saraiva [ctb] , Ipea - Institue for Applied Economic Research [cph, fnd] |
Maintainer: | Pedro R. Andrade <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.1-2 |
Built: | 2024-12-08 07:18:19 UTC |
Source: | CRAN |
Some GTFS.zip data have issues related to arrival and departure time on stops. This function makes sure the GTFS has dis/embarking times at each stop. For each stop time row, this function applies the following steps:
1. If there is 'arrival_time' but no 'departure_time', it creates a departure_time column by summing the arrival plus a pre-defined 'min_lag'.
2. If there is 'departure_time' but no 'arrival_time', it creates an arrival_time column by subtracting a pre-defined 'min_lag' from the departure.
3. If there is an 'arrival_time' and a 'departure_time' but their difference is smaller than 'min_lag', it reduces the 'arrival_time' and increases 'departure_time' so that the difference will be exactly 'min_lag'.
adjust_arrival_departure(gtfs_data, min_lag = 20)
adjust_arrival_departure(gtfs_data, min_lag = 20)
gtfs_data |
A GTFS data created with |
min_lag |
Numeric. Minimum waiting time when a vehicle arrives at a stop. It can be a numeric or a units value that can be converted to seconds. Default is 20s. |
A GTFS with adjusted 'arrival_time' and 'departure_time' on data.table 'stop_times'.
poa <- read_gtfs(system.file("extdata/poa.zip", package="gtfs2gps")) poa <- adjust_arrival_departure(poa)
poa <- read_gtfs(system.file("extdata/poa.zip", package="gtfs2gps")) poa <- adjust_arrival_departure(poa)
gtfs2gps
Some GTFS.zip data sets might have quality issues, for example by assuming that a trip speed is unreasonably high (e.g. an urban bus running over 100 Km/h), or in other cases the 'timestamp' information might be missing for some route segments. This can lead a gps-like table to have 'NA' or unrealistic 'speed' and 'timestamp' values. This function allows the user to adjust the speed of trips and updates 'timestamp' values accordingly. The user can adjust the problematic speeds by either setting a custom constant value, or by considering the average of all valid trips speed (Default). The columns 'timestamp' and 'cumtime' are updated accordingly.
adjust_speed( gps_data, min_speed = 2, max_speed = 80, new_speed = NULL, clone = TRUE )
adjust_speed( gps_data, min_speed = 2, max_speed = 80, new_speed = NULL, clone = TRUE )
gps_data |
A GPS-like data.table created with |
min_speed |
Minimum speed to be considered as valid. It can be a numeric (in km/h) or a units value able to be converted to km/h. Values below minimum speed will be adjusted. Defaults to 2 km/h. |
max_speed |
Maximum speed to be considered as valid. It can be a numeric (in km/h) or a units value able to be converted to km/h. Values above maximum speed will be adjusted. Defaults to 80 km/h. |
new_speed |
Speed to replace missing values as well as values outside min_speed and max_speed range. It can be a numeric (in km/h) or a units value able to be converted to km/h. By default, 'new_speed = NULL' and the function considers the average speed of the entire gps data. |
clone |
Use a copy of the gps_data? Defaults to TRUE. |
A GPS-like data with adjusted 'speed' values. The columns 'timestamp' and 'cumtime' are also updated accordingly.
poa <- read_gtfs(system.file("extdata/poa.zip", package="gtfs2gps")) |> gtfstools::filter_by_shape_id("T2-1") |> gtfstools::filter_by_weekday(c("monday", "wednesday")) |> filter_single_trip() poa_gps <- gtfs2gps(poa) poa_gps_new <- adjust_speed(poa_gps)
poa <- read_gtfs(system.file("extdata/poa.zip", package="gtfs2gps")) |> gtfstools::filter_by_shape_id("T2-1") |> gtfstools::filter_by_weekday(c("monday", "wednesday")) |> filter_single_trip() poa_gps <- gtfs2gps(poa) poa_gps_new <- adjust_speed(poa_gps)
Add a column named height to GPS data using a tif data as reference.
append_height(gps, heightfile)
append_height(gps, heightfile)
gps |
A GPS data created from gtfs2gps(). |
heightfile |
The pathname of a tif file with height data. |
The GPS data with a new column named height.
## Not run: # this example takes more than 10s to run fortaleza <- system.file("extdata/fortaleza.zip", package = "gtfs2gps") srtmfile <- system.file("extdata/fortaleza-srtm.tif", package = "gtfs2gps") gtfs <- read_gtfs(fortaleza) |> gtfstools::filter_by_shape_id("shape836-I") |> filter_single_trip() fortaleza_gps <- gtfs2gps(gtfs, spatial_resolution = 500) |> append_height(srtmfile) ## End(Not run)
## Not run: # this example takes more than 10s to run fortaleza <- system.file("extdata/fortaleza.zip", package = "gtfs2gps") srtmfile <- system.file("extdata/fortaleza-srtm.tif", package = "gtfs2gps") gtfs <- read_gtfs(fortaleza) |> gtfstools::filter_by_shape_id("shape836-I") |> filter_single_trip() fortaleza_gps <- gtfs2gps(gtfs, spatial_resolution = 500) |> append_height(srtmfile) ## End(Not run)
Filter a GTFS data by keeping only one trip per shape_id. It also removes the unnecessary routes and stop_times accordingly.
filter_single_trip(gtfs_data)
filter_single_trip(gtfs_data)
gtfs_data |
A list of data.tables read using gtfs2gps::reag_gtfs(). |
A filtered GTFS data.
poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) subset <- filter_single_trip(poa)
poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) subset <- filter_single_trip(poa)
Filter a GTFS data read using gtfs2gps::read_gtfs(). It removes stop_times with NA values in arrival_time, departure_time, and arrival_time_hms. It also filters stops and routes accordingly.
filter_valid_stop_times(gtfs_data)
filter_valid_stop_times(gtfs_data)
gtfs_data |
A list of data.tables read using gtfs2gps::reag_gtfs(). |
A filtered GTFS data.
poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) subset <- filter_valid_stop_times(poa)
poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) subset <- filter_valid_stop_times(poa)
Every interval of GPS data points between stops for each trip_id is converted into a linestring segment. The output assumes constant average speed between consecutive stops.
gps_as_sflinestring(gps)
gps_as_sflinestring(gps)
gps |
A data.table with timestamp data. |
A simple feature (sf) object with LineString data.
library(gtfs2gps) poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) poa_subset <- gtfstools::filter_by_shape_id(poa, c("T2-1", "A141-1")) |> filter_single_trip() poa_gps <- gtfs2gps(poa_subset) poa_gps_sf <- gps_as_sflinestring(poa_gps)
library(gtfs2gps) poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) poa_subset <- gtfstools::filter_by_shape_id(poa, c("T2-1", "A141-1")) |> filter_single_trip() poa_gps <- gtfs2gps(poa_subset) poa_gps_sf <- gps_as_sflinestring(poa_gps)
Convert a GPS data stored in a data.table into Simple Feature points.
gps_as_sfpoints(gps, crs = 4326)
gps_as_sfpoints(gps, crs = 4326)
gps |
A data.table with timestamp data. |
crs |
A Coordinate Reference System. The default value is 4326 (latlong WGS84). |
A simple feature (sf) object with point data.
library(gtfs2gps) fortaleza <- read_gtfs(system.file("extdata/fortaleza.zip", package = "gtfs2gps")) srtmfile <- system.file("extdata/fortaleza-srtm.tif", package="gtfs2gps") subset <- fortaleza |> gtfstools::filter_by_weekday(c("monday", "wednesday")) |> filter_single_trip() |> gtfstools::filter_by_shape_id("shape806-I") for_gps <- gtfs2gps(subset) for_gps_sf_points <- gps_as_sfpoints(for_gps)
library(gtfs2gps) fortaleza <- read_gtfs(system.file("extdata/fortaleza.zip", package = "gtfs2gps")) srtmfile <- system.file("extdata/fortaleza-srtm.tif", package="gtfs2gps") subset <- fortaleza |> gtfstools::filter_by_weekday(c("monday", "wednesday")) |> filter_single_trip() |> gtfstools::filter_by_shape_id("shape806-I") for_gps <- gtfs2gps(subset) for_gps_sf_points <- gps_as_sfpoints(for_gps)
Convert a GTFS shapes data loaded using gtfs2gps::read_gtf() into a line simple feature (sf).
gtfs_shapes_as_sf(gtfs, crs = 4326)
gtfs_shapes_as_sf(gtfs, crs = 4326)
gtfs |
A GTFS data. |
crs |
The coordinate reference system represented as an EPSG code. The default value is 4326 (latlong WGS84) |
A simple feature (sf) object.
poa <- read_gtfs(system.file("extdata/saopaulo.zip", package = "gtfs2gps")) poa_sf <- gtfs_shapes_as_sf(poa)
poa <- read_gtfs(system.file("extdata/saopaulo.zip", package = "gtfs2gps")) poa_sf <- gtfs_shapes_as_sf(poa)
Convert a GTFS stops data loaded using gtfs2gps::read_gtf() into a point simple feature (sf).
gtfs_stops_as_sf(gtfs, crs = 4326)
gtfs_stops_as_sf(gtfs, crs = 4326)
gtfs |
A GTFS data. |
crs |
The coordinate reference system represented as an EPSG code. The default value is 4326 (latlong WGS84) |
A simple feature (sf) object.
poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) poa_shapes <- gtfs_shapes_as_sf(poa) poa_stops <- gtfs_stops_as_sf(poa)
poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) poa_shapes <- gtfs_shapes_as_sf(poa) poa_stops <- gtfs_stops_as_sf(poa)
Convert GTFS data to GPS format by sampling points using a given spatial resolution. This function creates additional points in order to guarantee that two points in a same trip will have at most a given distance, indicated as a spatial resolution. It is possible to use future package to parallelize the execution (or use argument plan). This function also uses progressr internally to show progress bars. See the example below on how to show a progress bar while executing this function.
gtfs2gps( gtfs_data, spatial_resolution = 100, parallel = TRUE, ncores = NULL, strategy = NULL, filepath = NULL, compress = FALSE, snap_method = "nearest2", continue = FALSE, quiet = FALSE )
gtfs2gps( gtfs_data, spatial_resolution = 100, parallel = TRUE, ncores = NULL, strategy = NULL, filepath = NULL, compress = FALSE, snap_method = "nearest2", continue = FALSE, quiet = FALSE )
gtfs_data |
A path to a GTFS file to be converted to GPS, or a GTFS data represented as a list of data.tables. |
spatial_resolution |
The spatial resolution in meters. Default is 100m. This function only creates points in order to guarantee that the minimum distance between two consecutive points will be at most the spatial_resolution. If a given shape has two consecutive points with a distance lower than the spatial resolution, the algorithm will not remove such points. |
parallel |
Decides whether the function should run in parallel. Defaults is FALSE. When TRUE, it will use all cores available minus one using future::plan() with strategy "multisession" internally. Note that it is possible to create your own plan before calling gtfs2gps(). In this case, do not use this argument. |
ncores |
Number of cores to be used in parallel execution. When 'parallel = FALSE', this argument is ignored. When 'parallel = TRUE', then by default the function uses all available cores minus one. |
strategy |
This argument is deprecated. Please use argument plan instead or use future::plan() directly. |
filepath |
Output file path. As default, the output is returned when gtfs2gps finishes. When this argument is set, each route is saved into a txt file within filepath, with the name equals to its id. In this case, no output is returned. See argument compress for another option. |
compress |
Argument that can be used only with filepath. When TRUE, it compresses the output files by saving them using rds format. Default value is FALSE. Note that compress guarantees that the data saved will be read in the same way as it was created in R. If not compress, the txt extension requires the data to be converted from ITime to string, and therefore they need to manually converted back to ITime to be properly handled by gtfs2gps. |
snap_method |
The method used to snap stops to the route geometry. There are two available methods: 'nearest1' and 'nearest2'. Defaults to 'nearest2'. See details for more info. |
continue |
Argument that can be used only with filepath. When TRUE, it skips processing the shape identifiers that were already saved into files. It is useful to continue processing a GTFS file that was stopped for some reason. Default value is FALSE. |
quiet |
Hide messages while processing the data? Defaults to FALSE. |
After creating geometry points for a given shape id, the 'gtfs2gps()' function snaps the stops to the route geometry. Two strategies are implemented to do this. - The 'nearest2' method (default) triangulates the distance between each stop and the two nearest points in the route geometry to decide which point the stop should be snapped to. If there is any stop that is further away to the route geometry than 'spatial_resolution', the algorithm recursively doubles the 'spatial_resolution' to do the search/snap of all stops. - The 'nearest1' method traverses the geometry points computing their distances to the first stop. Whenever it finds a distance to the stop smaller than 'spatial_resolution', then the stop will be snapped to such point. The algorithm then applies the same strategy to the next stop until the vector of stops end.
The 'speed', 'cumdist', and 'cumtime' are based on the difference of distance and time between the current and previous row of the same trip. It means that the first data point at the first stop of each trip represens a stationary vehicle. The 'adjust_speed()' function can be used to post-process the output to replace eventual 'NA' values in the 'speed' column.
Each stop is presented as two data points for each trip in the output. The 'timestamp' value in the first data point represents the time when the vehicle arrived at that stop (corresponding the 'arrival_time' column in the 'stop_times.txt' file), while the 'timestamp' in the second data point represents the time when the vehicle departured from that stop (corresponding the 'departure_time' column in the 'stop_times.txt' file). The second point considers that the vehicle is stationary at the stop, immediately before departing.
Some GTFS feeds do not report embark/disembark times (so 'arrival_time' and 'departure_time' are identical at the same stop). In this case, the user can call the 'adjust_arrival_departure()' function to set the minimum time each vehicle will spend at stops to embark/disembark passengers.
To avoid division by zero, the minimum speed of vehicles in the output is 1e-12 Km/h, so that vehicles are never completely stopped.
A 'data.table', where each row represents a GPS point. The following columns are returned (units of measurement in parenthesis): dist and cumdist (meters), cumtime (seconds), shape_pt_lon and shape_pt_lat (degrees), speed (km/h), timestamp (hh:mm:ss).
library(gtfs2gps) gtfs <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) |> gtfstools::filter_by_shape_id("T2-1") |> filter_single_trip() poa_gps <- progressr::with_progress(gtfs2gps(gtfs, quiet=TRUE))
library(gtfs2gps) gtfs <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) |> gtfstools::filter_by_shape_id("T2-1") |> filter_single_trip() poa_gps <- progressr::with_progress(gtfs2gps(gtfs, quiet=TRUE))
Read files of a zipped GTFS feed and load them to memory as a list of data.tables. It will load the following files: "shapes.txt", "stop_times.txt", "stops.txt", "trips.txt", "agency.txt", "calendar.txt", "routes.txt", and "frequencies.txt", with this last four being optional. If one of the mandatory files does not exit, this function will stop with an error message.
read_gtfs(gtfszip, quiet = FALSE)
read_gtfs(gtfszip, quiet = FALSE)
gtfszip |
A zipped GTFS data. |
quiet |
A logical. Whether to hide log messages and progress bars. Defaults to 'FALSE'. |
A list of data.tables, where each index represents the respective GTFS file name.
poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps"))
poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps"))
Remove all objects from GTFS data that are not used in all relations that they are required to be. That is, agency-routes relation (agency_id), routes-trips relation (route_id), trips-shapes relation (shape_id), trips-frequencies relation (trip_id), trips-stop_times relation (trip_id), stop_times-stops relation (stop_id), and trips-calendar relation (service_id), recursively, until GTFS data does not reduce its size anymore. For example, if one agency_id belongs to routes but not to agency will be removed. This might cause one cascade removal of objects in other relations that originally did not have any inconsistency.
remove_invalid(gtfs_data, only_essential = TRUE, prompt_invalid = FALSE)
remove_invalid(gtfs_data, only_essential = TRUE, prompt_invalid = FALSE)
gtfs_data |
A list of data.tables read using gtfs2gps::reag_gtfs(). |
only_essential |
Remove only the essential files? The essential files are all but agency, calendar, and routes. Default is TRUE, which means that agency-routes, routes-trips, and trips-calendar relations will not be processed as restrictions to remove objects. |
prompt_invalid |
Show the invalid objects. Default is FALSE. |
A subset of the input GTFS data.
poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) object.size(poa) subset <- remove_invalid(poa) object.size(subset)
poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) object.size(poa) subset <- remove_invalid(poa) object.size(subset)
Remove points from the shapes of a GTFS file in order to reduce its size. It uses Douglas-Peucker algorithm internally.
simplify_shapes(gtfs_data, tol = 0)
simplify_shapes(gtfs_data, tol = 0)
gtfs_data |
A list of data.tables read using gtfs2gps::read_gtfs(). |
tol |
Numerical tolerance value to be used by the Douglas-Peucker algorithm. The default value is 0, which means that no data will be lost. |
A GTFS data whose shapes is a subset of the input data.
Test whether a GTFS feed is frequency based or whether it presents detailed time table for all routes and trip ids.
test_gtfs_freq(gtfs)
test_gtfs_freq(gtfs)
gtfs |
A GTFS data set stored in memory as a list of data.tables/data.frames. |
A string "frequency" or "simple".
Write GTFS stored in memory as a list of data.tables into a zipped GTFS feed. This function overwrites the zip file if it exists.
write_gtfs(gtfs, zipfile, overwrite = TRUE, quiet = FALSE)
write_gtfs(gtfs, zipfile, overwrite = TRUE, quiet = FALSE)
gtfs |
A GTFS data set stored in memory as a list of data.tables/data.frames. |
zipfile |
The pathname of a .zip file to be saved with the GTFS data. |
overwrite |
A logical. Whether to overwrite an existing |
quiet |
A logical. Whether to hide log messages and progress bars.
Defaults to |
The status value returned by the external zip command, invisibly.
# read a gtfs.zip to memory poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) |> gtfstools::filter_by_shape_id("T2-1") |> filter_single_trip() # write GTFS data into a zip file write_gtfs(poa, paste0(tempdir(), "/mypoa.zip"))
# read a gtfs.zip to memory poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps")) |> gtfstools::filter_by_shape_id("T2-1") |> filter_single_trip() # write GTFS data into a zip file write_gtfs(poa, paste0(tempdir(), "/mypoa.zip"))