crsuggest is a small R package to help spatial analysts determine an appropriate projected coordinate reference system for their data. It implements functions that attempt to match an input spatial dataset with corresponding coordinate reference systems that will work well for mapping and/or spatial analysis. The package is inspired by the more cleverly-named projestions API and companion QGIS plugin. It uses data from the EPSG Registry, a product of the International Association of Oil & Gas Producers, and is subject to its terms of use for the data.
Install from CRAN with the following command in R:
install.packages("crsuggest")
Or get the development version from GitHub:
::install_github("walkerke/crsuggest") remotes
Consider the following common use cases in R:
Let’s say you’ve obtained a points dataset in a geographic coordinate system and you’d like to draw buffers around those points:
library(tigris)
library(sf)
<- landmarks("TX")
tx_landmarks
<- st_buffer(tx_landmarks, 1) landmark_buf
# The message:
dist is assumed to be in decimal degrees (arc_degrees).
Warning message:
In st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle = endCapStyle, :
st_buffer does not correctly buffer longitude/latitude data
st_buffer()
is telling us that we need a projected
coordinate system. But how do we figure out the right one for our data?
For local areas (e.g. a US state) there are a number of options designed
to minimize distortion for that area. suggest_crs()
aims to
put these options at your fingertips.
library(crsuggest)
<- suggest_crs(tx_landmarks)
possible_crs
::glimpse(possible_crs) dplyr
Rows: 10
Columns: 6
$ crs_code <chr> "6580", "6579", "3666", "3665", "3085", "3084", "3083", "3082"…
$ crs_name <chr> "NAD83(2011) / Texas Centric Lambert Conformal", "NAD83(2011) …
$ crs_type <chr> "projected", "projected", "projected", "projected", "projected…
$ crs_gcs <dbl> 6318, 6318, 4759, 4759, 4152, 4152, 4269, 4269, 4269, 4267
$ crs_units <chr> "m", "m", "m", "m", "m", "m", "m", "m", "m", "ft"
$ crs_proj4 <chr> "+proj=lcc +lat_0=18 +lon_0=-100 +lat_1=27.5 +lat_2=35 +x_0=15…
By default, suggest_crs()
returns the top 10 matches for
a given input spatial dataset. From here, you can browse the returned
CRS options, select an appropriate entry, and use the EPSG or
proj4string codes in your analysis. For example, we can now choose a
coordinate system, transform, and re-run:
<- st_transform(tx_landmarks, 3081)
landmarks_projected
<- st_buffer(landmarks_projected, 1000) buffer_1km
Let’s say you’ve obtained a spatial dataset, but you already know you
want a projected coordinate system with a specific geographic coordinate
system (e.g. NAD 1983, EPSG code 4269) and specific measurement units
(US Feet). suggest_crs()
has gcs
and
units
parameters that allow subsetting in this way.
library(tidycensus)
library(ggplot2)
library(sf)
# Median age by Census tract in Cook County, Illinois
<- get_acs(geography = "tract",
cook_age variables = "B01002_001",
state = "IL",
county = "Cook County",
geometry = TRUE)
<- suggest_crs(cook_age, gcs = 4269, units = "us-ft")
cook_crs
::glimpse(cook_crs) dplyr
Rows: 2
Columns: 6
$ crs_code <chr> "3435", "32166"
$ crs_name <chr> "NAD83 / Illinois East (ftUS)", "NAD83 / BLM 16N (ftUS)"
$ crs_type <chr> "projected", "projected"
$ crs_gcs <dbl> 4269, 4269
$ crs_units <chr> "us-ft", "us-ft"
$ crs_proj4 <chr> "+proj=tmerc +lat_0=36.6666666666667 +lon_0=-88.3333333333333 +k=0.99…
We get two options to choose from; one in the State Plane system and the other in the BLM system.
Alternatively, if you just need to quickly get a code for plotting
appropriately, you can use the function suggest_top_crs()
.
This will return either the EPSG code or proj4string of the top CRS
entry for your data. It should be used with caution; I recommend
researching the output especially before using it for spatial
analysis.
suggest_top_crs(cook_age)
> suggest_top_crs(cook_age)
Using the projected CRS NAD83 / Illinois East (ftUS) which uses 'us-ft' for measurement units. Please visit https://spatialreference.org/ref/epsg/3435/ for more information about this CRS.
[1] 3435
We can use the State Plane Illinois East CRS in plotting functions
like coord_sf()
:
ggplot(cook_age, aes(fill = estimate)) +
geom_sf(color = NA) +
coord_sf(crs = 3435) +
scale_fill_viridis_c()
Commonly, analysts will work with data for which the origin is not
known. These datasets may include CSV files that represent locations or
shapefiles missing the .prj file for coordinate system information. If
these datasets are represented in projected coordinates, it can be
difficult to figure out how to correctly handle the data in a geospatial
software environment. The guess_crs()
function, inspired by
the WhatTheProj
project, aims to help with this. Consider this example dataset of
locations collected in the field. The coordinates are clearly from a
projected CRS, but without knowledge of that CRS they cannot be mapped
correctly.
library(sf)
library(mapview)
<- data.frame(
locations X = c(1200822.97857801, 1205015.51644983, 1202297.44383987, 1205877.68696743,
1194763.21511923, 1195463.42403192, 1199836.01037452, 1207081.96500368,
1201924.15986897),
Y = c(1246476.31475063, 1248612.72571423, 1241479.45996392, 1243898.58428024,
1246033.7550009, 1241827.7730307, 1234691.50899912, 1251125.67808482,
1252188.4333016),
id = 1:9
)
<- st_as_sf(locations, coords = c("X", "Y"))
locations_sf
mapview(locations_sf)
If the general location of the data is known,
guess_crs()
will make guesses on the coordinate system from
which the data come. The EPSG code will be returned along with a
dist_km
column; this is the distance between the center
point of the input dataset and your known location if they are
in a given CRS. If the distance is large, the coordinate system likely
won’t work. If the mapboxapi package is
installed, you can supply a location name as your target location;
otherwise, supply a length-2 coordinate pair vector of form
c(longitude, latitude)
.
library(crsuggest)
guess_crs(locations_sf, "Chennai, India", n_return = 5)
Evaluating CRS options...
The 'best guess' for the CRS of your data is EPSG code 7785.
Use `sf::st_crs(your_data) <- 7785` to use this CRS for your data.
View the returned dataset for other possible options.
# A tibble: 5 x 2
crs_code dist_km
<chr> <dbl>
1 7785 4.09
2 24344 806.
3 32644 806.
4 32244 806.
5 32444 806.
The clear choice here is EPSG code 7785 for WGS84 / Tamil Nadu; no other possibilities are suitable here (and this is the correct answer!). Use this code to set the CRS for your data which will allow for correct placement of the points.
st_crs(locations_sf) <- 7785
mapview(locations_sf)
crsuggest is designed to make CRS recommendations: the right CRS for your project will be based on a variety of factors that can’t always safely be automated. If you intend to use these tools in production data pipelines, use at your own risk and I would strongly recommend checking your data for unexpected results. I personally the packages as a “look-up” tool to help you make informed decisions. Research the coordinate system you plan to use - and how it handles distortion of your data - before settling on it for use in production.