---
title: "Filtering occurrence records"
author: "William K. Morris"
output:
rmarkdown::html_vignette:
toc: true
vignette: >
%\VignetteIndexEntry{5. Filtering occurrence records}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
When getting records from FinBIF there are many options for filtering the data
before it is downloaded, saving bandwidth and local post-processing time. For
the full list of filtering options see `?filters`.
## Location
Records can be filtered by the name of a location.
```r
finbif_occurrence(filter = c(country = "Finland"))
#> Records downloaded: 10
#> Records available: 44691386
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84
#> 1 …JX.1594385#3 Sciurus vulgaris Li… 1 60.23584 25.05693
#> 2 …KE.176/64895825d5de884fa20e297d#Unit1 Heracleum persicum … NA 61.08302 22.38983
#> 3 …JX.1594382#9 Hirundo rustica Lin… NA 64.12716 23.99111
#> 4 …JX.1594382#37 Pica pica (Linnaeus… NA 64.12716 23.99111
#> 5 …JX.1594382#49 Muscicapa striata (… NA 64.12716 23.99111
#> 6 …JX.1594382#39 Larus canus Linnaeu… NA 64.12716 23.99111
#> 7 …JX.1594382#5 Emberiza citrinella… NA 64.12716 23.99111
#> 8 …JX.1594382#31 Ficedula hypoleuca … NA 64.12716 23.99111
#> 9 …JX.1594382#41 Alauda arvensis Lin… NA 64.12716 23.99111
#> 10 …JX.1594382#21 Numenius arquata (L… NA 64.12716 23.99111
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
Or by a set of coordinates.
```r
finbif_occurrence(
filter = list(coordinates = list(c(60, 68), c(20, 30), "wgs84"))
)
#> Records downloaded: 10
#> Records available: 37318868
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84
#> 1 …JX.1594385#3 Sciurus vulgaris Li… 1 60.23584 25.05693
#> 2 …KE.176/64895825d5de884fa20e297d#Unit1 Heracleum persicum … NA 61.08302 22.38983
#> 3 …JX.1594382#9 Hirundo rustica Lin… NA 64.12716 23.99111
#> 4 …JX.1594382#37 Pica pica (Linnaeus… NA 64.12716 23.99111
#> 5 …JX.1594382#49 Muscicapa striata (… NA 64.12716 23.99111
#> 6 …JX.1594382#39 Larus canus Linnaeu… NA 64.12716 23.99111
#> 7 …JX.1594382#5 Emberiza citrinella… NA 64.12716 23.99111
#> 8 …JX.1594382#31 Ficedula hypoleuca … NA 64.12716 23.99111
#> 9 …JX.1594382#41 Alauda arvensis Lin… NA 64.12716 23.99111
#> 10 …JX.1594382#21 Numenius arquata (L… NA 64.12716 23.99111
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
See `?filters` section "Location" for more details
## Time
The event or import date of records can be used to filter occurrence data from
FinBIF. The date filters can be a single year, month or date,
```r
finbif_occurrence(filter = list(date_range_ym = c("2020-12")))
```
Click to show/hide output.
```r
#> Records downloaded: 10
#> Records available: 23847
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84 date_time
#> 1 …107 Pica pica (Linnaeus… 31 65.0027 25.49381 2020-12-31 10:20:00
#> 2 …45 Larus argentatus Po… 1 65.0027 25.49381 2020-12-31 10:20:00
#> 3 …153 Emberiza citrinella… 2 65.0027 25.49381 2020-12-31 10:20:00
#> 4 …49 Columba livia domes… 33 65.0027 25.49381 2020-12-31 10:20:00
#> 5 …117 Corvus corax Linnae… 1 65.0027 25.49381 2020-12-31 10:20:00
#> 6 …111 Corvus monedula Lin… 7 65.0027 25.49381 2020-12-31 10:20:00
#> 7 …161 Sciurus vulgaris Li… 1 65.0027 25.49381 2020-12-31 10:20:00
#> 8 …123 Passer montanus (Li… 28 65.0027 25.49381 2020-12-31 10:20:00
#> 9 …149 Pyrrhula pyrrhula (… 1 65.0027 25.49381 2020-12-31 10:20:00
#> 10 …77 Turdus pilaris Linn… 1 65.0027 25.49381 2020-12-31 10:20:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
, or for record events, a range as a character vector.
```r
finbif_occurrence(
filter = list(date_range_ymd = c("2019-06-01", "2019-12-31"))
)
```
Click to show/hide output.
```r
#> Records downloaded: 10
#> Records available: 911735
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84
#> 1 …KE.921/LGE.627772/1470480 Pteromys volans (Li… NA 61.81362 25.75756
#> 2 …JX.1054648#107 Pica pica (Linnaeus… 3 65.30543 25.70355
#> 3 …JX.1054648#85 Poecile montanus (C… 1 65.30543 25.70355
#> 4 …JX.1054648#103 Garrulus glandarius… 3 65.30543 25.70355
#> 5 …JX.1054648#123 Passer montanus (Li… 3 65.30543 25.70355
#> 6 …JX.1054648#149 Pyrrhula pyrrhula (… 1 65.30543 25.70355
#> 7 …JX.1054648#93 Cyanistes caeruleus… 9 65.30543 25.70355
#> 8 …JX.1054648#95 Parus major Linnaeu… 35 65.30543 25.70355
#> 9 …JX.1054648#137 Carduelis flammea (… 2 65.30543 25.70355
#> 10 …JX.1056695#107 Pica pica (Linnaeus… 6 62.7154 23.0893
#> date_time
#> 1 2019-12-31 12:00:00
#> 2 2019-12-31 10:20:00
#> 3 2019-12-31 10:20:00
#> 4 2019-12-31 10:20:00
#> 5 2019-12-31 10:20:00
#> 6 2019-12-31 10:20:00
#> 7 2019-12-31 10:20:00
#> 8 2019-12-31 10:20:00
#> 9 2019-12-31 10:20:00
#> 10 2019-12-31 10:15:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
Records for a specific season or time-span across all years can also be
requested.
```r
finbif_occurrence(
filter = list(
date_range_md = c(begin = "12-21", end = "12-31"),
date_range_md = c(begin = "01-01", end = "02-20")
)
)
```
Click to show/hide output.
```r
#> Records downloaded: 10
#> Records available: 1486845
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84 date_time
#> 1 …433443#318 Accipiter nisus (Li… 1 64.8162 25.32106 2023-02-20 15:00:00
#> 2 …531663#107 Pica pica (Linnaeus… 10 62.9199 27.71032 2023-02-20 07:40:00
#> 3 …530610#107 Pica pica (Linnaeus… 21 65.78623 24.49119 2023-02-20 09:15:00
#> 4 …530449#107 Pica pica (Linnaeus… 4 65.74652 24.62216 2023-02-20 08:20:00
#> 5 …531663#153 Emberiza citrinella… 12 62.9199 27.71032 2023-02-20 07:40:00
#> 6 …531663#49 Columba livia domes… 10 62.9199 27.71032 2023-02-20 07:40:00
#> 7 …530610#49 Columba livia domes… 2 65.78623 24.49119 2023-02-20 09:15:00
#> 8 …530610#117 Corvus corax Linnae… 1 65.78623 24.49119 2023-02-20 09:15:00
#> 9 …531663#61 Dendrocopos major (… 6 62.9199 27.71032 2023-02-20 07:40:00
#> 10 …531663#111 Corvus monedula Lin… 7 62.9199 27.71032 2023-02-20 07:40:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
## Data Quality
You can filter occurrence records by indicators of data quality. See `?filters`
section "Quality" for details.
```r
strict <- c(
collection_quality = "professional", coordinates_uncertainty_max = 1,
record_quality = "expert_verified"
)
permissive <- list(
wild_status = c("wild", "non_wild", "wild_unknown"),
record_quality = c(
"expert_verified", "community_verified", "unassessed", "uncertain",
"erroneous"
),
abundance_min = 0
)
c(
strict = finbif_occurrence(filter = strict, count_only = TRUE),
permissive = finbif_occurrence(filter = permissive, count_only = TRUE)
)
#> strict permissive
#> 52654 51733557
```
## Collection
The FinBIF database consists of a number of constituent collections. You can
filter by collection with either the `collection` or `not_collection` filters.
Use `finbif_collections()` to see metadata on the FinBIF collections.
```r
finbif_occurrence(
filter = c(collection = "iNaturalist Suomi Finland"), count_only = TRUE
)
#> [1] 691076
finbif_occurrence(
filter = c(collection = "Notebook, general observations"), count_only = TRUE
)
#> [1] 2110409
```
## Informal taxonomic groups
You can filter occurrence records based on informal taxonomic groups such as
`Birds` or `Mammals`.
```r
finbif_occurrence(filter = list(informal_groups = c("Birds", "Mammals")))
```
Click to show/hide output.
```r
#> Records downloaded: 10
#> Records available: 22116048
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84 date_time
#> 1 …5#3 Sciurus vulgaris Li… 1 60.23584 25.05693 2023-06-14 08:56:00
#> 2 …2#9 Hirundo rustica Lin… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 3 …2#37 Pica pica (Linnaeus… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 4 …2#49 Muscicapa striata (… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 5 …2#39 Larus canus Linnaeu… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 6 …2#5 Emberiza citrinella… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 7 …2#31 Ficedula hypoleuca … NA 64.12716 23.99111 2023-06-14 08:48:00
#> 8 …2#41 Alauda arvensis Lin… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 9 …2#21 Numenius arquata (L… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 10 …2#29 Dendrocopos major (… NA 64.12716 23.99111 2023-06-14 08:48:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
See `finbif_informal_groups()` for the full list of groups you can filter by.
You can use the same function to see the subgroups that make up a higher
level informal group:
```r
finbif_informal_groups("macrofungi")
#> Error in finbif_informal_groups("macrofungi"): Group not found
```
## Regulatory
Many records in the FinBIF database include taxa that have one or another
regulatory statuses. See `finbif_metadata("regulatory_status")` for a list of
regulatory statuses and short-codes.
```r
# Search for birds on the EU invasive species list
finbif_occurrence(
filter = list(informal_groups = "Birds", regulatory_status = "EU_INVSV")
)
```
Click to show/hide output.
```r
#> Records downloaded: 10
#> Records available: 471
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84
#> 1 …JX.1580858#3 Oxyura jamaicensis … 1 60.28687 25.0271
#> 2 …JX.1580860#3 Oxyura jamaicensis … 1 60.28671 25.02713
#> 3 …KE.176/62b1ad90d5deb0fafdc6212b#Unit1 Oxyura jamaicensis … 7 61.66207 23.57706
#> 4 …JX.1045316#34 Alopochen aegyptiac… 3 52.16081 4.485534
#> 5 …JX.138840#123 Alopochen aegyptiac… 4 53.36759 6.191796
#> 6 …JX.139978#214 Alopochen aegyptiac… 6 53.37574 6.207861
#> 7 …JX.139710#17 Alopochen aegyptiac… 30 52.3399 5.069133
#> 8 …JX.139645#57 Alopochen aegyptiac… 36 51.74641 4.535283
#> 9 …JX.139645#10 Alopochen aegyptiac… 3 51.74641 4.535283
#> 10 …JX.139442#16 Alopochen aegyptiac… 2 51.90871 4.53258
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
## IUCN red list
Filtering can be done by [IUCN red list](https://punainenkirja.laji.fi/)
category. See `finbif_metadata("red_list")` for the IUCN red list categories and
their short-codes.
```r
# Search for near threatened mammals
finbif_occurrence(
filter = list(informal_groups = "Mammals", red_list_status = "NT")
)
```
Click to show/hide output.
```r
#> Records downloaded: 10
#> Records available: 42510
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84
#> 1 …JX.1594024#23 Rangifer tarandus f… 15 63.31266 24.43298
#> 2 …JX.1588853#1075 Rangifer tarandus f… 1 63.84551 29.8366
#> 3 …JX.1593780#3 Pusa hispida botnic… 1 65.02313 25.40505
#> 4 …HR.3211/166639315-U Rangifer tarandus f… NA 63.7 24.7
#> 5 …HR.3211/166049302-U Rangifer tarandus f… NA 64.1 26.5
#> 6 …HR.3211/165761924-U Rangifer tarandus f… NA 63.9 24.9
#> 7 …JX.1589779#105 Rangifer tarandus f… 3 63.7261 23.40827
#> 8 …KE.176/647ad84dd5de884fa20e25e6#Unit1 Rangifer tarandus f… 1 64.12869 24.73877
#> 9 …HR.3211/165005253-U Pusa hispida botnic… NA 64.2865 23.87402
#> 10 …JX.1588052#18 Rangifer tarandus f… 2 64.13286 26.26767
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
## Habitat type
Many taxa are associated with one or more primary or secondary habitat types
(e.g., forest) or subtypes (e.g., herb-rich alpine birch forests). Use
`finbif_metadata("habitat_type")` to see the habitat types in FinBIF. You can
filter occurrence records based on primary (or primary/secondary) habitat type
or subtype codes. Note that filtering based on habitat is on taxa not on the
location (i.e., filtering records with `primary_habitat = "M"` will only return
records of taxa considered to primarily inhabit forests, yet the locations of
those records may encompass habitats other than forests).
```r
head(finbif_metadata("habitat_type"))
#> code name
#> MKV.habitatMt Mt alpine birch forests (excluding herb-rich alpine …
#> MKV.habitatTlk Tlk alpine calcareous rock outcrops and boulder fields
#> MKV.habitatTlr Tlr alpine gorges and canyons
#> MKV.habitatT T Alpine habitats
#> MKV.habitatTp Tp alpine heath scrubs
#> MKV.habitatTk Tk alpine heaths
```
```r
# Search records of taxa for which forests are their primary or secondary
# habitat type
finbif_occurrence(filter = c(primary_secondary_habitat = "M"))
```
Click to show/hide output.
```r
#> Records downloaded: 10
#> Records available: 26362337
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84 date_time
#> 1 …5#3 Sciurus vulgaris Li… 1 60.23584 25.05693 2023-06-14 08:56:00
#> 2 …2#37 Pica pica (Linnaeus… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 3 …2#49 Muscicapa striata (… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 4 …2#5 Emberiza citrinella… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 5 …2#31 Ficedula hypoleuca … NA 64.12716 23.99111 2023-06-14 08:48:00
#> 6 …2#29 Dendrocopos major (… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 7 …2#15 Sylvia borin (Bodda… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 8 …2#11 Anthus trivialis (L… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 9 …2#45 Corvus monedula Lin… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 10 …2#3 Phylloscopus trochi… NA 64.12716 23.99111 2023-06-14 08:48:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
You may further refine habitat based searching using a specific habitat type
qualifier such as "sun-exposed" or "shady". Use
`finbif_metadata("habitat_qualifier")` to see the qualifiers available. To
specify qualifiers use a named list of character vectors where the names are
habitat types or subtypes and the elements of the character vectors are the
qualifier codes.
```r
finbif_metadata("habitat_qualifier")[4:6, ]
#> code name
#> MKV.habitatSpecificTypeCA CA calcareous effect
#> MKV.habitatSpecificTypeH H esker forests, also semi-open forests
#> MKV.habitatSpecificTypeKE KE intermediate-basic rock outcrops and boulder fiel…
```
```r
# Search records of taxa for which forests with sun-exposure and broadleaved
# deciduous trees are their primary habitat type
finbif_occurrence(filter = list(primary_habitat = list(M = c("PAK", "J"))))
```
Click to show/hide output.
```r
#> Records downloaded: 10
#> Records available: 178
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84 date_time
#> 1 …502812#393 Pammene fasciana (L… NA 60.45845 22.17811 2022-08-14 12:00:00
#> 2 …435062#6 Pammene fasciana (L… 1 60.20642 24.66127 2022-08-04
#> 3 …435050#9 Pammene fasciana (L… 1 60.20642 24.66127 2022-07-25
#> 4 …501598#39 Pammene fasciana (L… 1 60.08841 22.48629 2022-07-21 12:00:00
#> 5 …501387#162 Pammene fasciana (L… 1 60.08841 22.48629 2022-07-20 12:00:00
#> 6 …448030#159 Pammene fasciana (L… 1 60.08841 22.48629 2022-07-18 12:00:00
#> 7 …447556#78 Pammene fasciana (L… 1 60.08841 22.48629 2022-07-14 12:00:00
#> 8 …446841#408 Pammene fasciana (L… 1 60.08841 22.48629 2022-07-12 12:00:00
#> 9 …443339#36 Pammene fasciana (L… 1 60.08841 22.48629 2022-07-10 12:00:00
#> 10 …440849#159 Pammene fasciana (L… 2 60.08841 22.48629 2022-07-08 12:00:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
## Status of taxa in Finland
You can restrict the occurrence records by the status of the taxa in Finland.
For example you can request records for only rare species.
```r
finbif_occurrence(filter = c(finnish_occurrence_status = "rare"))
```
Click to show/hide output.
```r
#> Records downloaded: 10
#> Records available: 406005
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84
#> 1 …HR.3211/167313706-U Pygaera timon (Hübn… NA 62.1281 27.45272
#> 2 …JX.1594282#21 Carterocephalus pal… 1 64.65322 24.58941
#> 3 …HR.3211/167197097-U Carterocephalus pal… NA 65.07819 25.55236
#> 4 …HR.3211/167183358-U Glaucopsyche alexis… NA 60.46226 22.76647
#> 5 …JX.1594291#3 Glaucopsyche alexis… 1 60.42692 22.20411
#> 6 …KE.176/6488c111d5de884fa20e295f#Unit1 Panemeria tenebrata… 1 61.16924 25.56036
#> 7 …JX.1593930#3 Hemaris tityus (Lin… 1 60.63969 27.29052
#> 8 …KE.176/64889455d5de884fa20e294f#Unit1 Pseudopanthera macu… 2 62.054 30.352
#> 9 …JX.1594170#199 Glaucopsyche alexis… 1 61.10098 28.68453
#> 10 …JX.1594112#3 Hemaris tityus (Lin… 1 61.25511 28.89127
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
Or, by using the negation of occurrence status, you can request records of birds
excluding those considered vagrants.
```r
finbif_occurrence(
filter = list(
informal_groups = "birds",
finnish_occurrence_status_neg = sprintf("vagrant_%sregular", c("", "ir"))
)
)
```
Click to show/hide output.
```r
#> Records downloaded: 10
#> Records available: 21725426
#> A data.frame [10 x 12]
#> record_id scientific_name abundance lat_wgs84 lon_wgs84 date_time
#> 1 …9 Hirundo rustica Lin… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 2 …37 Pica pica (Linnaeus… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 3 …49 Muscicapa striata (… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 4 …39 Larus canus Linnaeu… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 5 …5 Emberiza citrinella… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 6 …31 Ficedula hypoleuca … NA 64.12716 23.99111 2023-06-14 08:48:00
#> 7 …41 Alauda arvensis Lin… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 8 …21 Numenius arquata (L… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 9 …29 Dendrocopos major (… NA 64.12716 23.99111 2023-06-14 08:48:00
#> 10 …15 Sylvia borin (Bodda… NA 64.12716 23.99111 2023-06-14 08:48:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
See `finbif_metadata("finnish_occurrence_status")` for a full list of statuses
and their descriptions.