Biodiversity queries to the ALA usually require some spatial filtering. Let’s see how spatial data are stored in the ALA and a few different methods to spatially filter data with galah and other packages.
Most records in the ALA contain location data in the form of two key
fields: decimalLatitude and decimalLongitude.
As expected, these fields are the decimal coordinates of each
occurrence, with south and west values denoted by negatives. While there
may be some uncertainty in these values (see field
coordinateUncertaintyInMeters), they are generally very
accurate.
While these are very important and useful fields, very rarely will we
encounter situations that require queries directly calling
decimalLatitude and decimalLongitude. Instead,
the ALA and galah have a number of features that make spatial queries
simpler.
Often we want to filter results down to some commonly defined spatial
regions, such as states, LGAs or IBRA/IMCRA regions. The ALA contains a
large range (>100) of contextual and spatial layers, in-built as
searchable and queriable fields. They are denoted by names beginning
with "cl", followed by an identifying number that may be up
to 6 digits long. These fields are each based on shapefiles, and contain
the names of the regions in these layers that each record lies in.
We strongly recommend using search_fields() to check
whether a contextual layer already exists in the ALA that matches what
you require before proceeding with other methods of spatial filtering.
These fields are all able to be queried with filter()and so
they are generally easier to use.
Suppose we are interested in querying records of the Red-Necked Avocet (Recurvirostra novaehollandiae) in the Coorong wetlands in South Australia. We can search the ALA fields for wetlands.
## # A tibble: 2 × 3
##   id      description                       type  
##   <chr>   <chr>                             <chr> 
## 1 cl901   Directory of Important Wetlands   fields
## 2 cl11192 Ramsar_Wetlands_of_AustraliaWGS84 fieldsOur search identifies that layer cl901 seems to match
what we are looking for. We can then either view all possible values in
the field with show_values(), or search again for our
particular field.
## • Showing values for 'cl901'.## # A tibble: 1 × 1
##   cl901                                      
##   <chr>                                      
## 1 The Coorong, Lake Alexandrina & Lake AlbertWe can filter all occurrences for exact matches with this value,
"Lake Eyre". Our galah query can be built as
follows:
galah_call() |>
  identify("Recurvirostra novaehollandiae") |>
  filter(cl901 == "The Coorong, Lake Alexandrina & Lake Albert") |>
  collect() |>
  head(5) |>
  gt::gt()## Retrying in 1 seconds.
## Retrying in 2 seconds.
## Retrying in 4 seconds.| recordID | scientificName | taxonConceptID | decimalLatitude | decimalLongitude | eventDate | occurrenceStatus | dataResourceName | 
|---|---|---|---|---|---|---|---|
| 00630f58-ee78-4f69-ba39-718b0a1c3356 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.59293 | 139.0218 | 2008-11-18 | PRESENT | SA Fauna | 
| 00705077-abc2-4f28-aedd-11a479ec2d82 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.52833 | 138.8089 | 2016-03-06 | PRESENT | BirdLife Australia, Birdata | 
| 0071444e-ddf2-45dc-9a10-c6399686e306 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.57719 | 138.9923 | 2008-01-01 | PRESENT | SA Fauna | 
| 008a2e6c-26b4-4f4e-a428-b0330db53466 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.52831 | 138.8280 | 2017-11-19 17:15:00 | PRESENT | eBird Australia | 
| 00b11f13-0100-4471-b18b-7e9736ddeeaf | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -36.18001 | 139.6589 | 2017-08-05 13:25:00 | PRESENT | eBird Australia | 
While server-side spatial information is useful, there are likely to
be cases where the shapefile or region you wish to query will not be
pre-loaded as a contextual layer in the ALA. In this case, shapefiles
can be introduced to the filtering process using the {sf} package and
the geolocate() function. Shapefiles can be provided as an
sf object, whether that is by importing them with
sf::st_read() or taking a POLYGON or
MULTIPOLYGON character string and transforming them with
sf::st_as_sfc().
For instance, we might interested in species occurrences in King
George Square, Brisbane. We can take the MULTIPOLYGON
object for the square (as sourced from the Brisbane City
Council) and transform it into sfc and then
sf objects.
king_george_sq <- "MULTIPOLYGON(((153.0243 -27.46886, 153.0242 -27.46896, 153.0236 -27.46837, 153.0239 -27.46814, 153.0239 -27.46813, 153.0242 -27.46789, 153.0244 -27.46805, 153.0245 -27.46821, 153.0246 -27.46828, 153.0247 -27.46835, 153.0248 -27.46848, 153.0246 -27.4686, 153.0246 -27.46862, 153.0245 -27.46871, 153.0243 -27.46886)))" |>
  sf::st_as_sfc() |> 
  sf::st_as_sf()We can provide this MULTIPOLYGON in our filter as the
argument of geolocate() to assess which species have been
recorded in King George Square.
galah_call() |>
  geolocate(king_george_sq) |>
  select(decimalLatitude, 
         decimalLongitude, 
         eventDate, 
         scientificName, 
         vernacularName) |>
  collect() |> 
  head(10) |>
  gt::gt()| decimalLatitude | decimalLongitude | eventDate | scientificName | vernacularName | 
|---|---|---|---|---|
| -27.46862 | 153.0243 | 2006-03-30 09:32:00 | Threskiornis moluccus | Australian White Ibis | 
| -27.46861 | 153.0244 | 2023-12-02 09:12:02 | Cosmophasis baehrae | NA | 
| -27.46855 | 153.0239 | 2024-02-03 20:45:00 | Mediastinia | NA | 
| -27.46851 | 153.0238 | 2023-04-08 17:49:00 | Entomyzon cyanotis | Blue-faced Honeyeater | 
| -27.46842 | 153.0241 | 2022-08-30 11:29:00 | Threskiornis moluccus | Australian White Ibis | 
| -27.46836 | 153.0241 | 2024-08-28 10:31:54 | Vitellus | NA | 
| -27.46833 | 153.0241 | 2022-07-24 14:43:00 | Threskiornis moluccus | Australian White Ibis | 
| -27.46831 | 153.0241 | 2024-10-13 14:34:00 | Threskiornis moluccus | Australian White Ibis | 
| -27.46830 | 153.0240 | 2024-10-13 14:35:00 | Gymnorhina tibicen | Australian Magpie | 
| -27.46817 | 153.0240 | 2024-07-23 10:00:53 | Threskiornis moluccus | Australian White Ibis | 
There is a second argument of geolocate() called
type, which defaults to value "polygon". By
setting the type argument to "bbox", the
provided POLYGON or MULTIPOLYGON will be
converted into the smallest bounding box (rectangle) that contains the
POLYGON. In this case, records will be included that may
not exactly lie inside the provided shape.
galah_call() |>
  geolocate(king_george_sq, type = "bbox") |>
  select(decimalLatitude, 
         decimalLongitude, 
         eventDate, 
         scientificName, 
         vernacularName) |>
  collect() |>
  head(10) |>
  gt::gt()## Data returned for bounding box:
## xmin = 153.0236 xmax = 153.0248 ymin = -27.46896 ymax = -27.46789| decimalLatitude | decimalLongitude | eventDate | scientificName | vernacularName | 
|---|---|---|---|---|
| -27.46889 | 153.0244 | 2015-10-09 | Burhinus (Burhinus) grallarius | Bush Stone-curlew | 
| -27.46882 | 153.0236 | 2019-04-21 14:40:00 | Threskiornis moluccus | Australian White Ibis | 
| -27.46882 | 153.0236 | 2005-08-28 12:18:00 | Threskiornis moluccus | Australian White Ibis | 
| -27.46868 | 153.0238 | 2024-07-15 15:00:18 | Pseudanapaea denotata | NA | 
| -27.46868 | 153.0238 | 2024-01-13 15:07:42 | Tenodera australasiae | Purplewinged Mantid | 
| -27.46868 | 153.0238 | 2024-01-13 15:04:53 | Pristhesancus plagipennis | Bee-killer | 
| -27.46868 | 153.0238 | 2024-01-13 15:06:26 | Toxorhynchites (Toxorhynchites) speciosus | NA | 
| -27.46868 | 153.0238 | 2024-01-13 12:20:00 | Xixuthrus | NA | 
| -27.46868 | 153.0238 | 2024-07-15 12:12:27 | Columba (Columba) livia | Rock Dove | 
| -27.46868 | 153.0238 | 2024-01-13 15:07:42 | Lyramorpha (Lyramorpha) rosea | Litchi Stink Bug | 
The type argument with option "bbox" is
provided because sf objects with >500 vertices will not
be accepted by the ALA. In the event you have a large shapefile, using
type = "bbox" will at least enable an initial reduction of
the data that is downloaded, before finer filtering to the actual
shapefile will obtain the desired set of occurrences. Alternatively, one
can also perform the "bbox" reduction before passing the
shape to geolocate() by using
sf::st_bbox().
A common situation for this to occur is when a shapefile with multiple shapes is provided, where we are interested in grouping our results by each shape. Here is a mock workflow using a subset of a shapefile of all 2,184 Brisbane parks.
Let’s say we are interested in knowing which parks in the Brisbane postcode 4075 have the most occurrences of the Scaly-Breasted Lorikeet (Trichoglossus chlorolepidotus) since 2020. We can download the entire shapefile from the link above, and perform our filtering and summarising as follows:
brisbane_parks <- sf::st_read("path/to/Park___Locations.shp") |>
  sf::st_make_valid() |>
  filter(POST_CODE == 4075)# Convert shapefile to a bounding box
brisbane_parks_bbox <- brisbane_parks |> sf::st_bbox()
# Find all occurrences of Trichoglossus chlorolepidotus in the bounding box in 2022
lorikeet_brisbane <- galah_call() |>
  filter(scientificName == "Trichoglossus chlorolepidotus", 
         year >= 2020) |>
  geolocate(brisbane_parks_bbox, type = "bbox") |>
  collect()## Data returned for bounding box:
## xmin = 152.96331 xmax = 152.99668 ymin = -27.57737 ymax = -27.51606
## Retrying in 1 seconds.# Filter records down to only those in the shapefile polygons
lorikeet_brisbane |>
  # Create a point geometry based on the occurrence coordinates
  sf::st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = sf::st_crs(brisbane_parks), remove = FALSE) |>
  # identify which park each occurrence sits in with st_intersects()
  mutate(intersection = sf::st_intersects(geometry, brisbane_parks) |> as.integer(),
         park = ifelse(is.na(intersection), NA, brisbane_parks$PARK_NAME[intersection])) |>
  # Filter out occurrences that did not occur in a park
  filter(!is.na(park)) |>
  # Drop the geometry column
  sf::st_drop_geometry() |>
  # Summarise the top 10 parks for lorikeet sightings in 2022
  group_by(park) |>
  summarise(counts = n()) |>
  arrange(desc(counts)) |>
  head(10)## # A tibble: 10 × 2
##    park                           counts
##    <chr>                           <int>
##  1 NYUNDARE-BA PARK                  552
##  2 SHERWOOD ARBORETUM                472
##  3 FAULKNER PARK                      64
##  4 FORT ROAD BUSHLAND                 46
##  5 STRICKLAND TERRACE PARK            44
##  6 BENARRAWA RESERVE                  34
##  7 GRACEVILLE RIVERSIDE PARKLANDS     32
##  8 NOSWORTHY PARK                     17
##  9 HORACE WINDOW RESERVE              14
## 10 GRACEVILLE AVENUE PARK              7Some shapefiles cover large geographic areas with the caveat that even the bounding box doesn’t restrict the number of records to a value that can be downloaded easily. In this case, we recommend more nuances and detailed methods that can be performed using looping techniques. One of our ALA Labs blog posts, Hex maps for species occurrence data, has been written detailing how to approach larger problems such as this.