Missing observations and recording errors are fairly common in
tracking data. They can be caused by hardware failures, object
occlusion, faulty data writing, etc. trackdf provides a
few functions to help detect missing or erroneous observations so that you
can fix them or omit them altogether from your analysis.
But first, let’s load some “flawed” data provided with trackdf:
library(trackdf)
raw <- read.csv(system.file("extdata/gps/01.csv", package = "trackdf"))
tt <- track(x = raw$lon, y = raw$lat, t = paste(raw$date, raw$time), id = 1,
            proj = "+proj=longlat", tz = "Africa/Windhoek")
print(tt, max = 10 * ncol(tt))  # show only the first 10 rows
## Warning: 1 failed to parse.
## Track table [3599 observations]
## Number of tracks: 1
## Dimensions: 2D
## Geographic: TRUE
## Projection: +proj=longlat
## Table class: data frame ('data.frame')
## id t x y
## 1 1 2015-09-10 07:00:00 15.76468 -22.37957
## 2 1 2015-09-10 07:00:01 15.76468 -22.37957
## 3 1 2015-09-10 07:00:04 15.76468 -22.37958
## 4 1 2015-09-10 07:00:05 15.76468 -22.37958
## 5 1 2015-09-10 07:00:06 NA -22.37958
## 6 1 2015-09-10 07:00:07 15.76467 NA
## 7 1 2015-09-10 07:00:08 15.76467 -22.37959
## 8 1 2015-09-10 07:00:09 15.76467 -22.37959
## 9 1 2015-09-10 07:00:09 15.76467 -22.37959
## 10 1 2015-09-10 07:00:10 15.76467 -22.37959
## [ reached 'max' / getOption("max.print") -- omitted 3589 rows ]
Missing data are observations that have not been recorded at all, or
that lack one or more coordinates. If the data is recorded at regular
intervals, then these missing observations can be easily detected using
the missing_data function as follows:
missing_data(tt)
## Track table [5 observations]
## Number of tracks: 1
## Dimensions: 2D
## Geographic: TRUE
## Projection: +proj=longlat
## Table class: data frame ('data.frame')
## id t x y
## 1 1 2015-09-10 07:00:02 NA NA
## 2 1 2015-09-10 07:00:03 NA NA
## 4 1 2015-09-10 07:00:06 NA -22.37958
## 5 1 2015-09-10 07:00:07 15.76467 NA
## 3 1 2015-09-10 07:00:34 NA NA
The output is a track table with each row corresponding to a time stamp at which at least one coordinate is missing.
Note that you can specify the beginning (begin) and end (end) of the
observation window in which you want to detect missing data, as well as
the time difference (step) between successive observations.
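
The principle behind this detection can be sketched in base R, independently of trackdf: build the time stamps expected between begin and end at a fixed step, then list those that were never observed. The vector obs and the helper expected_times() below are hypothetical and only illustrate the idea; this is not how missing_data is implemented.

```r
# Hypothetical observed time stamps, nominally one per second
# (07:00:02 and 07:00:03 were never recorded).
obs <- as.POSIXct(c("2015-09-10 07:00:00", "2015-09-10 07:00:01",
                    "2015-09-10 07:00:04", "2015-09-10 07:00:05"),
                  tz = "UTC")

# Illustrative helper (not part of trackdf): the regular grid of time stamps
# expected between `begin` and `end`, spaced `step` seconds apart.
expected_times <- function(begin, end, step) {
  seq(from = begin, to = end, by = step)
}

grid <- expected_times(begin = min(obs), end = max(obs), step = 1)

# Keep the grid time stamps that have no matching observation.
# Comparing the underlying numeric values avoids any class-specific matching.
missing_t <- grid[!(as.numeric(grid) %in% as.numeric(obs))]
format(missing_t, "%H:%M:%S", tz = "UTC")
```

Narrowing begin and end restricts the search to a shorter window, while a wrong step either hides gaps (step too large) or reports spurious ones (step too small).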
Duplicated data are observations that are repeated multiple times
throughout the data set (e.g., two observations with identical time
stamps for a given individual). These duplicated observations can be
detected using the duplicated_data function as follows:
duplicated_data(tt)
## Track table [1 observations]
## Number of tracks: 1
## Dimensions: 2D
## Geographic: TRUE
## Projection: +proj=longlat
## Table class: data frame ('data.frame')
## id t x y duplicate
## 8 1 2015-09-10 07:00:09 15.76467 -22.37959 txy
The output is a track table with each row corresponding to an
observation that was partially or completely duplicated, depending on
the type argument. This argument is a character string or a
vector of character strings indicating the type of duplications to look
for. The strings can be any combination of “t” (for time duplications)
and “x”, “y”, “z” (for coordinate duplications). For instance, the
string “txy” will return data with duplicated time stamps and duplicated
x and y coordinates.
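
To make the effect of the different type strings concrete, here is a minimal base R sketch. The data frame df and the helper flag_duplicates() are hypothetical, the sketch ignores the id column, and which of the repeated rows gets reported may differ from what duplicated_data returns; it only illustrates how the choice of columns changes what counts as a duplicate.

```r
# Hypothetical observations; rows 3 and 4 share a time stamp and both coordinates.
df <- data.frame(
  t = c("07:00:07", "07:00:08", "07:00:09", "07:00:09", "07:00:10"),
  x = c(15.76467, 15.76467, 15.76467, 15.76467, 15.76466),
  y = c(-22.37958, -22.37959, -22.37959, -22.37959, -22.37959)
)

# Illustrative helper (not part of trackdf): split a type string such as "txy"
# into column names and flag rows that repeat an earlier row on those columns.
flag_duplicates <- function(data, type) {
  cols <- strsplit(type, "")[[1]]
  duplicated(data[, cols, drop = FALSE])
}

dup_txy <- which(flag_duplicates(df, "txy"))  # time stamp and both coordinates repeated
dup_t   <- which(flag_duplicates(df, "t"))    # time stamp repeated, coordinates ignored
dup_y   <- which(flag_duplicates(df, "y"))    # y value already seen in an earlier row
```

Matching on coordinates alone flags more rows than matching on time, because a slow-moving or stationary individual legitimately revisits the same rounded position.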
Inconsistent data are observations whose coordinates differ too much
from those of the observations that precede and follow them in time,
for instance, because of sporadic errors in GPS recordings. These
inconsistent observations can be detected using the inconsistent_data
function as follows:
inconsistent_data(tt)
## Track table [1 observations]
## Number of tracks: 1
## Dimensions: 2D
## Geographic: TRUE
## Projection: +proj=longlat
## Table class: data frame ('data.frame')
## id t x y
## 1 1 2015-09-10 07:00:24 15.86467 -22.4796
The output is a track table with each row corresponding to an inconsistent observation.
Note that the detection of inconsistencies requires specifying a
threshold (s) for distinguishing between consistent and inconsistent
observations. Higher threshold values flag fewer observations as
inconsistent; lower threshold values flag more.
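
The relationship between the threshold and the number of detections can be illustrated with a simple outlier score in base R. The score used here (absolute deviation from the median, scaled by the median absolute deviation) is an assumption chosen for illustration and is not necessarily the algorithm used by trackdf; the vector x and the helper flag_count() are likewise hypothetical.

```r
# Hypothetical x coordinates with one obvious jump at position 5.
x <- c(15.76468, 15.76468, 15.76467, 15.76467,
       15.86467, 15.76466, 15.76466, 15.76465)

# Illustrative outlier score (an assumption, not trackdf's algorithm):
# absolute deviation from the median, scaled by the median absolute deviation.
score <- abs(x - median(x)) / mad(x)

# Illustrative helper: number of observations whose score exceeds threshold `s`.
flag_count <- function(s) sum(score > s)

n_low  <- flag_count(1)   # low threshold: ordinary fluctuations can be flagged too
n_high <- flag_count(10)  # higher threshold: only large deviations remain
flagged <- which(score > 10)
```

In this sketch the low threshold also flags an ordinary point at the edge of the normal range, whereas the higher threshold retains only the jump. A sensible value depends on the noise level of the recording device and on how fast the tracked individuals can move.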