Uniformly reduce data volumes by either aggregation or subsampling
(chosen with the method argument) over an interval given in
seconds by the interval argument.
Both options make two important assumptions:
(1) that timestamps are stored in columns named 'time' and 'datetime', and
(2) that all columns except the identity columns can be averaged in R.
While the 'subsample' option returns a thinned dataset with all columns from
the input data, the 'aggregate' option drops the column covxy, since
this cannot be propagated to the averaged position.
The two options handle the column 'time' differently: 'subsample' returns
the actual timestamp (in UNIX time) of each retained sample, while 'aggregate'
returns the mean timestamp (also in UNIX time) of each interval.
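The difference between the two options can be sketched with data.table, independently of the package. This is a minimal illustration of the behaviour described above, not the function's internal code: the binning rule `floor(time / interval) * interval`, the column name `time_bin`, and the toy values are all assumptions made for the example.

```r
library(data.table)

# Toy data (illustrative values only): one position every 10 s
toy <- data.table(
  time = seq(1696218721, by = 10, length.out = 10),
  x = 1:10
)
interval <- 60

# Assumed binning rule: floor each timestamp to the start of its interval
toy[, time_bin := floor(time / interval) * interval]

# "subsample"-like thinning: keep the first row of each bin,
# so the original timestamp of that row is preserved
sub <- toy[, .SD[1], by = time_bin]

# "aggregate"-like thinning: average every column within each bin,
# so the returned time is the mean timestamp of the bin
agg <- toy[, lapply(.SD, mean), by = time_bin]
```

Here `sub$time` holds timestamps that occur in the input, whereas `agg$time` holds per-bin means that need not match any observed timestamp.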
The 'aggregate' option only recognises error columns named varx and
vary.
Unless both of these columns are present, the function assumes there
is no measure of error, and drops those columns.
If there is actually no measure of error, the function simply returns the
averaged position and covariates in each time interval.
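For intuition about what an error estimate for an averaged position can look like: under the textbook rule for independent errors, the variance of the mean of n estimates is the sum of the individual variances divided by n squared. The sketch below applies that rule to invented values; whether atl_thin_data uses exactly this formula is an assumption, not something stated on this page.

```r
# Invented per-position error variances within one time interval
varx <- c(4, 9, 16, 25)
n <- length(varx)

# Variance of the mean of n independent estimates: sum(var_i) / n^2
varx_mean <- sum(varx) / n^2
varx_mean
```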
Grouping variables' names (such as animal identity) may be passed as a
character vector to the id_columns argument.
If patch is among the columns, it is converted to numeric for
aggregation, its mean is rounded to the nearest integer, and the result is
converted back to character to match the original type, so that the most
common patch ID is typically retained after aggregation (e.g. 2 positions
with patch ID 5 and 20 with patch ID 6 will aggregate to patch ID 6).
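The patch example above can be reproduced in base R. This is only a sketch of the convert, average, round, convert-back sequence described in the text; the variable names are hypothetical and the snippet is not taken from the package source.

```r
# Patch IDs stored as character: 2 positions in patch 5, 20 in patch 6
patch <- c(rep("5", 2), rep("6", 20))

# Convert to numeric and average: 130 / 22, roughly 5.91
patch_mean <- mean(as.numeric(patch))

# Round to the nearest integer and restore the character type
patch_agg <- as.character(round(patch_mean))
patch_agg
```

Because this is a rounded mean rather than a true mode, it matches the most common ID when an interval spans two consecutive patch IDs, as here; intervals spanning non-consecutive IDs could yield an ID that does not occur in the interval.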
Usage
atl_thin_data(
  data,
  interval = 60,
  id_columns = NULL,
  method = c("subsample", "aggregate")
)
Arguments
- data
Tracking data to aggregate. Must have columns x and y, and a numeric column named time, as well as datetime.
- interval
The interval in seconds over which to aggregate.
- id_columns
Column names for grouping columns.
- method
Whether the data should be thinned by subsampling or aggregation. If subsampling (method = "subsample"), the first position of each group is taken. If aggregation (method = "aggregate"), the mean of each group's positions is taken.
Examples
library(data.table)
data <- data.table(
  tag = as.character(rep(1:2, each = 10)),
  time = rep(seq(1696218721, 1696218721 + 92, by = 10), 2),
  x = rnorm(20, 10, 1),
  y = rnorm(20, 15, 1)
)
data[, datetime := as.POSIXct(time, origin = "1970-01-01", tz = "UTC")]
#> tag time x y datetime
#> <char> <num> <num> <num> <POSc>
#> 1: 1 1696218721 10.606748 14.64564 2023-10-02 03:52:01
#> 2: 1 1696218731 9.890064 15.94635 2023-10-02 03:52:11
#> 3: 1 1696218741 10.172182 16.31683 2023-10-02 03:52:21
#> 4: 1 1696218751 9.909673 14.70336 2023-10-02 03:52:31
#> 5: 1 1696218761 11.924343 14.61279 2023-10-02 03:52:41
#> 6: 1 1696218771 11.298393 14.21457 2023-10-02 03:52:51
#> 7: 1 1696218781 10.748791 13.94326 2023-10-02 03:53:01
#> 8: 1 1696218791 10.556224 14.20446 2023-10-02 03:53:11
#> 9: 1 1696218801 9.451743 13.24372 2023-10-02 03:53:21
#> 10: 1 1696218811 11.110535 14.30946 2023-10-02 03:53:31
#> 11: 2 1696218721 7.387666 14.44146 2023-10-02 03:52:01
#> 12: 2 1696218731 9.844306 14.46334 2023-10-02 03:52:11
#> 13: 2 1696218741 10.433890 15.22713 2023-10-02 03:52:21
#> 14: 2 1696218751 9.618049 15.97845 2023-10-02 03:52:31
#> 15: 2 1696218761 10.424188 14.79112 2023-10-02 03:52:41
#> 16: 2 1696218771 11.063102 13.60059 2023-10-02 03:52:51
#> 17: 2 1696218781 11.048713 15.25854 2023-10-02 03:53:01
#> 18: 2 1696218791 9.961897 14.55820 2023-10-02 03:53:11
#> 19: 2 1696218801 10.486149 15.56860 2023-10-02 03:53:21
#> 20: 2 1696218811 11.672883 17.12685 2023-10-02 03:53:31
#> tag time x y datetime
#> <char> <num> <num> <num> <POSc>
# Thin the data by aggregation with a 60-second interval
thinned_aggregated <- atl_thin_data(
  data = data,
  interval = 60,
  id_columns = "tag",
  method = "aggregate"
)
# Thin the data by subsampling with a 60-second interval
thinned_subsampled <- atl_thin_data(
  data = data,
  interval = 60,
  id_columns = "tag",
  method = "subsample"
)
# View results
print(thinned_aggregated)
#> tag time x y datetime n_aggregated
#> <char> <num> <num> <num> <POSc> <int>
#> 1: 1 1696218720 10.63357 15.07325 2023-10-02 03:52:00 6
#> 2: 1 1696218780 10.46682 13.92523 2023-10-02 03:53:00 4
#> 3: 2 1696218720 9.79520 14.75035 2023-10-02 03:52:00 6
#> 4: 2 1696218780 10.79241 15.62805 2023-10-02 03:53:00 4
print(thinned_subsampled)
#> tag time x y datetime n_subsampled
#> <char> <num> <num> <num> <POSc> <int>
#> 1: 1 1696218721 10.606748 14.64564 2023-10-02 03:52:01 6
#> 2: 1 1696218781 10.748791 13.94326 2023-10-02 03:53:01 4
#> 3: 2 1696218721 7.387666 14.44146 2023-10-02 03:52:01 6
#> 4: 2 1696218781 11.048713 15.25854 2023-10-02 03:53:01 4
