The method developed by Greimel et al. (2016) detects and characterizes sub-daily flow fluctuations and is implemented in the R package hydropeak (available on CRAN). Based on the events detected by the method implemented in package hydropeak, hydroroute identifies associated events in hydrographs from neighboring gauging stations and models the translation and retention processes between them (Greimel et al., 2022).
This vignette presents the main function peaktrace()
which given the events and relation information between gauging stations
determines the associated events and based on these, estimates
predictive models to trace initial values specified for the relevant
metrics across the neighboring gauging stations. First, an overview on
the input data required is given and the additional function
get_lag()
and variants thereof are presented which estimate
the mean translation time of the hydrographs between adjacent gauging
stations, if not already available. Then the individual function
estimate_AE()
is presented which is used by
peaktrace()
to identify the associated events (AEs).
Finally the application of peaktrace()
is illustrated and
it is shown how the return value can be inspected.
Several data files are required to perform the analysis.
Q
If the mean translation time between hydrographs at neighboring
gauging stations is not given or known, the raw dataset Q
,
containing (equispaced) date-time values and the corresponding flow
fluctuations, is needed. Based on these data the mean translation time
between hydrographs at neighboring gauging stations may be estimated
using get_lag()
. The dataset Q
needs to
contain three variables:
ID
Character string which refers to the identifier of
the gauging station (in Austria: HZBCODE).Time
Character string with date-time information of
measurements.Q
Numeric, flow measurements.The combination of ID
and Time
must be
unique. Functions that use the dataset Q
assume that these
variables are contained in this order with exactly these names. If this
is not the case, i.e., if the columns have different names or are not in
this order, the order can be specified in these functions with argument
cols
. The columns are then renamed internally to make the
data processable. Also the date-time format must be specified if it is
different from the function’s default format used.
The following code loads the sample dataset Q
and shows
the first few rows:
<- system.file("testdata", "Q.csv", package = "hydroroute")
Q_path <- read.csv(Q_path)
Q head(Q)
#> ID Time Q
#> 1 200000 2014-01-01 00:00:00 4.07
#> 2 200000 2014-01-01 00:15:00 4.06
#> 3 200000 2014-01-01 00:30:00 4.05
#> 4 200000 2014-01-01 00:45:00 4.04
#> 5 200000 2014-01-01 01:00:00 4.03
#> 6 200000 2014-01-01 01:15:00 4.01
The sample dataset Q
is a data frame with 22656 rows and
3 variables as described above. The station ID’s along the flow path are
100000, 200000, 300000, 400000
. The time ranges from
"2014-01-01 00:00:00"
to
"2014-02-28 23:45:00"
.
relation
The dataset relation
provides information about the
gauging stations of neighboring hydrographs. It contains:
ID
Character string which refers to the identifier of
the gauging station (in Austria: HZBCODE).Type
Character string which characterizes the source
hydrograph (Turbine flow
, Gauge
,
Basin outflow
).Station
Character string which indicates the order of
the n
hydrographs in relation
in downstream
direction (Si
with i = 1, ..., n
).fkm
Numeric, position of hydrograph in km relative to
the source.LAG
Character string which contains the cumulative mean
translation time (or estimated cumulative lag) between the source and a
specific gauging station in the format HH:MM
. For
S1
this is either indicated as missing (NA
) or
always given as 00:00
. It is either already provided in
relation
or can be estimated from the corresponding dataset
Q
with get_lag()
.The following code loads an example dataset:
<- system.file("testdata", "relation.csv", package = "hydroroute")
relation_path <- read.csv(relation_path)
relation
relation#> ID Type Station fkm LAG
#> 1 100000 Gauge S1 0.01 00:00
#> 2 200000 Gauge S2 9.16 01:00
#> 3 300000 Gauge S3 17.02 02:15
#> 4 400000 Gauge S4 29.27 03:30
The dataset relation
contains 4 adjacent gauging
stations.
The output files from hydropeak’s
get_events_*()
function are used to identify AEs. The
naming scheme of the output files is
ID_event-type_date-time-from_date-time-to.csv
. Event types
are defined as follows:
The most important event types for the following analysis are
2
(increasing event; IC) and 4
(decreasing
event; DC).
Package hydroroute includes 8 sample
Event
files for each gauging station ID contained in the
sample dataset Q
and event type 2
(IC) and
4
(DC) between "2014-01-01 00:00:00"
and
"2014-02-28 23:45:00"
. The increasing events for the
station with ID 100000
are thus loaded using:
<- system.file("testdata", "Events", "100000_2_2014-01-01_2014-02-28.csv",
Sx package = "hydroroute")
<- read.csv(Sx)
Sx head(Sx)
#> ID EVENT_TYPE Time AMP MAFR MEFR DUR RATIO
#> 1 100000 2 2014-01-01 00:15:00 0.01 0.01 0.01000000 1 1.003546
#> 2 100000 2 2014-01-01 01:00:00 27.07 12.20 6.76750000 4 10.565371
#> 3 100000 2 2014-01-01 10:15:00 29.38 15.50 7.34500000 4 11.801471
#> 4 100000 2 2014-01-01 13:30:00 0.16 0.03 0.02666667 6 1.060837
#> 5 100000 2 2014-01-01 17:30:00 30.91 16.90 7.72750000 4 12.490706
#> 6 100000 2 2014-01-01 20:45:00 0.02 0.01 0.01000000 2 1.007812
get_lag()
For the identification of AEs, the translation time between
neighboring hydrographs and the event amplitude have to be considered.
For the first criterion, the mean translation time (LAG
)
between hydrographs has to be estimated and the cumulative values
appended to the relation
data for further processing, if
not available yet.
Function get_lag_file()
uses:
Q_file
A path to a file that contains the
Q
data from several stations or a data frame that contains
this information.relation_file
A path to a relation
file.
The ID
s of the stations must be in Q
.If the argument save
is TRUE
, the
relation
data with appended LAG
column is
written to a file specified in outfile
. If a
LAG
column already exists, argument overwrite
has to be set to TRUE
to overwrite the existing column. The
function can be applied to several relation
files by
iterating over file paths or if a single Q
data file is
available, get_lag_dir()
can be used. relation
files can be selected from a directory using regular expressions
(argument relation_pattern
).
The following code shows this for single file names:
<- system.file("testdata", "Q.csv", package = "hydroroute")
Q_file <- system.file("testdata", "relation.csv", package = "hydroroute")
relation get_lag_file(Q_file, relation, inputsep = ",", format = "%Y-%m-%d %H:%M",
(save = FALSE, overwrite = TRUE))
#> ID Type Station fkm LAG
#> 1 100000 Gauge S1 0.01 00:00
#> 2 200000 Gauge S2 9.16 01:15
#> 3 300000 Gauge S3 17.02 02:45
#> 4 400000 Gauge S4 29.27 04:00
This code indicates the use with get_lag_dir()
where the
directory is specified:
<- system.file("testdata", "Q.csv", package = "hydroroute")
Q_file <- file.path(tempdir(), "testdata")
relations_path dir.create(relations_path)
file.copy(list.files(system.file("testdata", package = "hydroroute"),
full.name = TRUE),
recursive = FALSE, copy.mode = TRUE)
relations_path, #> [1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
get_lag_dir(Q_file, relations_path, inputsep = ",",
(inputdec = ".", format = "%Y-%m-%d %H:%M", overwrite = TRUE))
#> [[1]]
#> ID Type Station fkm LAG
#> 1 100000 Gauge S1 0.01 00:00
#> 2 200000 Gauge S2 9.16 01:15
#> 3 300000 Gauge S3 17.02 02:45
#> 4 400000 Gauge S4 29.27 04:00
estimate_AE()
Greimel et al. (2022) propose the following algorithm to identify AEs:
“For every event x
at the upstream hydrograph, the mean
translation time between the neighboring hydrographs (calculated by an
autocorrelation analysis) is subtracted from the downstream hydrograph.
This then captures several events from the downstream hydrograph within
a time slot +/- the translation time. Among those matches only those
events are retained where the relative difference in amplitude is +/-
one. In the following, a potential AE is the event y
detected with the smallest time difference to x
after
accounting for the mean translation time meeting the time and amplitude
criteria. […]
The relative difference in amplitude is then determined for these
events, and parabolas are fitted to the histograms obtained for the
relative difference data binned into intervals from -1 to 1 with a width
0.1 by fixing the vertex at the inner maximum of the histogram […]. The
width of the parabola is determined by minimizing the average squared
distances between the parabola and the histogram data along arbitrary
symmetric ranges from the inner maximum. Based on the fitted parabola,
cut points with the x
-axis are determined so that only
those potential AEs whose relative difference is within these cut points
are retained. If this automatic scheme does not succeed in determining
suitable cut points, e.g., because the estimated cut points are outside
the defined intervals, a strict criterion for the relative difference in
amplitude is imposed to identify AEs considering only deviations of at
most 10%.”
estimate_AE()
estimates suitable settings for the
amplitude based on the method developed in Greimel et al. (2022) based
on potential associated events identified using the specified time and
metric deviations allowed for a match.
It performs this procedure for two neighboring hydrographs, i.e., it
takes a subset of relation
and the two corresponding
Event
files as input. The gauging station ID
s
in the subset of relation
and in the Event
files must match. Suitable settings for the amplitude are estimated as
follows:
Sy$Time
is shifted by the optimal mean translation
time between Sx
and Sy
.
Based on the specified time lags, matches between Sx
and Sy
are captured.
Relative differences in AMP
are computed, e.g.,
(Sy$AMP - Sx$AMP) / Sx$AMP
, and only matches are retained
where these relative differences are within the range
specified.
Matched events are iteratively filtered to retain those where the time lag is most similar leading to the potential AEs.
The relative differences of potential AEs are binned into
intervals of length 0.1 from -1 to 1. The created relative frequency
table of the binned relative differences is passed to function
get_parabola()
where either suitable cut points with the
x-axis are determined or a strict criterion is returned.
The table of the relative differences is visualized in a plot where the fitted parabola and the cut points with the x-axis are also shown.
The estimated settings for amplitude, a data frame of “real” AEs, i.e., associated events within the estimated cut points, and the plot are returned.
Note that the metric flow ratio (RATIO) does not make sense for
S1
if the hydrograph is not of type Gauge
. So
metric RATIO
is set to NA
internally in this
case.
The following code shows this procedure for two Event
files:
# file paths
<- system.file("testdata", "Events", "100000_2_2014-01-01_2014-02-28.csv",
Sx package = "hydroroute")
<- system.file("testdata", "Events", "200000_2_2014-01-01_2014-02-28.csv",
Sy package = "hydroroute")
<- system.file("testdata", "relation.csv", package = "hydroroute")
relation
# read data
<- utils::read.csv(Sx)
Sx <- utils::read.csv(Sy)
Sy <- utils::read.csv(relation)
relation <- relation[1:2, ]
relation
# estimate AE, exact time matches
<- estimate_AE(Sx, Sy, relation, timeLag = c(0, 1, 0)) results
$settings
results#> station.x station.y bound lag metric
#> 1 S1 S2 lower 0 0.9286046
#> 2 S1 S2 inner 1 1.0500000
#> 3 S1 S2 upper 0 1.1713954
Column bound
represents the lower, inner and upper
bounds that are used to subset potential AEs. lag
represents the time lag. Only exact matches are used in this examples,
which is specified by argument timeLag = c(0, 1, 0)
, which
refers to 0 deviation at the lower and upper bound and 1 at the inner
bound, meaning, that the mean translation time from
relation
is not altered when time matches are computed.
$plot_threshold results
head(results$real_AE)
#> id.x event_type.x time.x amp.x mafr.x mefr.x dur.x ratio.x
#> 1 100000 2 2014-01-11 17:15:00 39.01 20.4 13.00333 3 12.177650
#> 2 100000 2 2014-01-14 12:30:00 15.00 6.0 2.50000 6 1.262238
#> 3 100000 2 2014-01-15 06:15:00 77.14 38.3 12.85667 6 27.060811
#> 4 100000 2 2014-01-15 20:15:00 10.20 5.1 2.55000 4 1.312883
#> 5 100000 2 2014-01-19 06:45:00 74.41 23.7 9.30125 8 30.883534
#> 6 100000 2 2014-01-27 15:45:00 0.10 0.1 0.10000 1 1.001391
#> station.x id.y event_type.y time.y amp.y mafr.y mefr.y dur.y
#> 1 S1 200000 2 2014-01-11 18:15:00 39.06 20.3 6.510 6
#> 2 S1 200000 2 2014-01-14 13:30:00 14.40 5.3 2.400 6
#> 3 S1 200000 2 2014-01-15 07:15:00 77.49 33.2 15.498 5
#> 4 S1 200000 2 2014-01-15 21:15:00 10.20 4.0 2.040 5
#> 5 S1 200000 2 2014-01-19 07:45:00 76.36 28.9 9.545 8
#> 6 S1 200000 2 2014-01-27 16:45:00 0.10 0.1 0.100 1
#> ratio.y station.y diff_metric
#> 1 7.804878 S2 1.281723e-03
#> 2 1.231140 S2 -4.000000e-02
#> 3 14.111675 S2 4.537205e-03
#> 4 1.280992 S2 0.000000e+00
#> 5 16.457490 S2 2.620616e-02
#> 6 1.001311 S2 1.469658e-13
peaktrace()
Function peaktrace
combines the identification of
potential AEs and the estimation of suitable amplitude settings for a
whole river section as specified in a relation
file. In
addition, the flow metrics of the AEs are pictured by scatter plots and
the translation and retention process between the hydrographs is
described by linear models.
In the following, the input arguments of function
peaktrace()
are described:
relation_path
: Is the path where the whole relation
file from a river section is to be read from.
events_path
: Is the path of the directory where the
Event
files are located. These files must correspond to the
format described earlier in the section discussing the input
data.
initial_values_path
: Is the path where initial
values for predicting the metric at the neighboring stations are to be
read from. It should not contain missing values. But missing values can
be imputed with a method specified in argument
impute_method
, which is by default max
. An
example for such a file is:
<- system.file("testdata", "initial_value_routing.csv",
initial_values_path package = "hydroroute")
<- read.csv(initial_values_path)
initials
initials#> Station Metric Value Name
#> 1 S1 AMP 93.00000 Q_max
#> 2 S1 AMP 69.75000 Q_0.75
#> 3 S1 AMP 46.50000 Q_0.5
#> 4 S1 AMP 23.25000 Q_0.25
#> 5 S1 MAFR 93.00000 Q_max
#> 6 S1 MAFR 69.75000 Q_0.75
#> 7 S1 MAFR 46.50000 Q_0.5
#> 8 S1 MAFR 23.25000 Q_0.25
#> 9 S1 MEFR 93.00000 Q_max
#> 10 S1 MEFR 69.75000 Q_0.75
#> 11 S1 MEFR 46.50000 Q_0.5
#> 12 S1 MEFR 23.25000 Q_0.25
#> 13 S1 DUR 1.00000 N
#> 14 S2 RATIO 19.90099 Max
The columns must be identical to this example. The content may vary. The initial values used for prediction must not contain any missing values.
Station
refers to the gauging station of the
hydrograph. Here, all initial values correspond to gauging station
S1
except for metric RATIO
, which starts at
gauging station S2
.
Metric
corresponds to the metric. This is used to
pick the corresponding fitted predictive model.
Value
can be chosen arbitrarily or estimated with a
data-driven approach. A unique Name
is assigned which can
be used to characterize the curve obtained from this initial value used
to predict the metrics in downstream direction. E.g., for this example
the initial values are set to certain quantiles of the metrics at
station S1
.
The return value of function peaktrace()
is structured
as follows:
A named list with one list element for each
event_type
.
For each event_type
:
One element that contains the estimated settings from
estimate_AE()
for all gauging stations.
Plot of relative differences of AMP
with cut points
from settings_AE()` for all pairs of neighboring gauging
stations.
Real AEs according to the estimated settings from
estimate_AE()
for all pairs of neighboring gauging stations
and a column diff_metric
that contains the relative
difference in AMP
.
A grid of scatter plots containing the AEs for neighboring hydrographs and for each metric with the fitted regression line.
Results of model fitting. Each row contains the corresponding stations and metric, the model type (default: “lm”), formula, coefficients, number of observations and \(R^2\).
Plot of predicted values based on the initial values.
<- system.file("testdata", "relation.csv", package = "hydroroute")
relation_path <- system.file("testdata", "Events", package = "hydroroute")
events_path <- system.file("testdata", "initial_value_routing.csv",
initial_values_path package = "hydroroute")
<- peaktrace(relation_path, events_path, initial_values_path) res
The first list object refers to event type 2 (IC event).
$`2`$settings
res#> station.x station.y bound lag metric
#> 1 S1 S2 lower 1 0.7388856
#> 2 S1 S2 inner 1 0.9500000
#> 3 S1 S2 upper 1 1.1611144
#> 4 S2 S3 lower 1 0.8818457
#> 5 S2 S3 inner 1 1.0500000
#> 6 S2 S3 upper 1 1.2181543
#> 7 S3 S4 lower 1 0.8267117
#> 8 S3 S4 inner 1 0.9500000
#> 9 S3 S4 upper 1 1.0732883
The settings
data frame contains the estimated time lag
and metric settings computed with estimate_AE()
. In this
example with 4 stations, it contains nine rows where three rows describe
the relation between two neighboring stations. Since only exact time
matches were allowed, the lag
values are 0 for
lower
and upper
bound and 1 for
inner
. metric
contains the range of relative
values of the amplitude allowed.
Events, where the relative difference in amplitude and the relative
difference in time are within these settings, are considered as “real”
AE and therefore events caused by disruptive factors are excluded as far
as possible.
The following plot shows the histograms obtained for the relative differences in amplitude for each pair of neighboring gauging stations binned into intervals from -1 to 1 of width 0.1. The dashed line shows the fitted parabola and the cut points of the parabola with the x-axis are indicated. Potential AEs where the relative difference is within these cut points are considered as “real” AEs.
::grid.draw(res$`2`$plot_threshold) grid
The “real” AEs can be inspected using:
head(res$`2`$real_AE)
#> id.x event_type.x time.x amp.x mafr.x mefr.x dur.x ratio.x
#> 1 100000 2 2014-01-01 01:00:00 27.07 12.2 6.767500 4 10.565371
#> 2 100000 2 2014-01-01 10:15:00 29.38 15.5 7.345000 4 11.801471
#> 3 100000 2 2014-01-01 17:30:00 30.91 16.9 7.727500 4 12.490706
#> 4 100000 2 2014-01-02 17:30:00 26.88 14.1 4.480000 6 8.636364
#> 5 100000 2 2014-01-03 05:00:00 27.34 12.9 9.113333 3 10.559441
#> 6 100000 2 2014-01-03 16:00:00 27.16 14.8 3.017778 9 10.912409
#> station.x id.y event_type.y time.y amp.y mafr.y mefr.y dur.y
#> 1 S1 200000 2 2014-01-01 02:15:00 25.83 13.11 5.166000 5
#> 2 S1 200000 2 2014-01-01 11:30:00 28.49 14.54 5.698000 5
#> 3 S1 200000 2 2014-01-01 18:45:00 29.15 14.98 7.287500 4
#> 4 S1 200000 2 2014-01-02 19:00:00 26.43 11.49 3.303750 8
#> 5 S1 200000 2 2014-01-03 06:15:00 26.69 13.70 5.338000 5
#> 6 S1 200000 2 2014-01-03 17:30:00 27.03 12.20 3.861429 7
#> ratio.y station.y diff_metric
#> 1 7.506297 S2 -0.045807167
#> 2 8.679245 S2 -0.030292716
#> 3 8.379747 S2 -0.056939502
#> 4 6.427105 S2 -0.016741071
#> 5 7.655860 S2 -0.023774689
#> 6 7.808564 S2 -0.004786451
The scatter plots of the metrics at the neighboring gauging stations
for the “real” AEs are contained in plot_scatter
. The
scatter plots are arranged in a grid where each row contains scatter
plots for a specific metric and each colums contains a different pair of
neighboring gauging stations. The x-axis is the upstream hydrograph
Sx
, the y-axis is the downstream hydrograph
Sy
. A linear regression line and the corresponding \(R^2\) value are added to each plot. By
default the aspect ratio is fixed and the axis limits are equal within
each plot.
::grid.draw(res$`2`$plot_scatter) grid
The fitted regression models may also be inspected:
$`2`$models
res#> station.x station.y metric type formula (Intercept) x n r2
#> 1 S1 S2 amp lm y ~ x -0.5542922 1.0194716 164 0.99658281
#> 2 S1 S2 dur lm y ~ x 1.6520198 0.8453943 164 0.64266715
#> 3 S1 S2 mafr lm y ~ x -0.3694711 1.0066784 164 0.91454377
#> 4 S1 S2 mefr lm y ~ x 0.4813008 0.7150873 164 0.70223571
#> 5 S2 S3 amp lm y ~ x 0.7597112 0.9782935 100 0.99765823
#> 6 S2 S3 dur lm y ~ x 0.7959971 0.7868426 100 0.70537848
#> 7 S2 S3 mafr lm y ~ x -1.0932362 1.5764049 100 0.94356887
#> 8 S2 S3 mefr lm y ~ x 0.5505522 1.0358693 100 0.76774263
#> 9 S2 S3 ratio lm y ~ x 0.5890385 0.6172956 100 0.97341252
#> 10 S3 S4 amp lm y ~ x -0.1251017 0.9485448 30 0.98810523
#> 11 S3 S4 dur lm y ~ x 3.0299519 0.4642539 30 0.09809454
#> 12 S3 S4 mafr lm y ~ x 3.0964565 0.5820430 30 0.87656463
#> 13 S3 S4 mefr lm y ~ x 2.2653366 0.7352544 30 0.56098791
#> 14 S3 S4 ratio lm y ~ x 1.2112368 0.3257644 30 0.44545302
The models
data frame contains the fitted (linear)
models for each pair of neighboring stations and each metric.
station.x
is the upstream hydrograph
Sx
.station.y
is the downstream hydrograph
Sy
.metric
is the name of the corresponding metric.type
is the model class, by default
lm
.formula
is the expression that is used to fit the
model.(Intercept)
, x
are the extracted
coefficients (called with coef
).n
is the number of events used to fit the model.r2
is the extracted or computed \(R^2\) for each model.Finally the fitted regression models are used to predict the values of the metrics along the longitudinal flow path given the initial values:
::grid.arrange(grobs = res$`2`$plot_predict$grobs, nrow = 3, ncol = 2) gridExtra
If a file with initial values is passed to peaktrace()
,
predictions along the longitudinal flow path are made and visualized in
a plot. Each line in the plot represents a different scenario, e.g., the
uppermost solid lines for AMP, MAFR and MEFR represent the values of
Q_max
in the initial file. Starting from these initial
values, predictions are made with the corresponding models, e.g., the
first value of the initial valus file is passed to the model that
describes the relationship of AMP between S1
and
S2
to predict the value at S2
.
The initial value and the first fitted model are:
1, ]
initials[#> Station Metric Value Name
#> 1 S1 AMP 93 Q_max
$`2`$models[1, ]
res#> station.x station.y metric type formula (Intercept) x n r2
#> 1 S1 S2 amp lm y ~ x -0.5542922 1.019472 164 0.9965828
The resulting predicted value is then passed to the next model along
the flow path, i.e., the model of S2
and S3
to
predict the value at S3
.
$`2`$models[6, ]
res#> station.x station.y metric type formula (Intercept) x n r2
#> 6 S2 S3 dur lm y ~ x 0.7959971 0.7868426 100 0.7053785
Finally, this predicted value is passed to the last model along this
river section: the model between S3
and S4
to
predict the value at S4
.
$`2`$models[11, ]
res#> station.x station.y metric type formula (Intercept) x n r2
#> 11 S3 S4 dur lm y ~ x 3.029952 0.4642539 30 0.09809454
This procedure is repeated for all metrics according to the initial
values file. Note that in this initial values file metric RATIO starts
at station S2
. Therefore the first value of RATIO to
predict is at station S3
.
The second list object refers to event type 4 (DC event). The nested objects of this event type are shown below.
$`4`$settings
res#> station.x station.y bound lag metric
#> 1 S1 S2 lower 1 0.8168386
#> 2 S1 S2 inner 1 0.9500000
#> 3 S1 S2 upper 1 1.0831614
#> 4 S2 S3 lower 1 0.9202873
#> 5 S2 S3 inner 1 1.0500000
#> 6 S2 S3 upper 1 1.1797127
#> 7 S3 S4 lower 1 0.5814461
#> 8 S3 S4 inner 1 0.8500000
#> 9 S3 S4 upper 1 1.1185539
head(res$`4`$real_AE)
#> id.x event_type.x time.x amp.x mafr.x mefr.x dur.x ratio.x
#> 1 100000 4 2014-01-01 02:00:00 27.07 13.40 3.867143 7 10.565371
#> 2 100000 4 2014-01-01 11:15:00 29.47 13.90 3.274444 9 12.205323
#> 3 100000 4 2014-01-01 18:30:00 31.04 16.00 3.448889 9 13.125000
#> 4 100000 4 2014-01-02 07:45:00 27.44 13.98 6.860000 4 12.154472
#> 5 100000 4 2014-01-02 20:15:00 27.37 14.20 3.421250 8 10.033003
#> 6 100000 4 2014-01-03 18:15:00 26.87 13.90 5.374000 5 9.867987
#> station.x id.y event_type.y time.y amp.y mafr.y mefr.y
#> 1 S1 200000 4 2014-01-01 03:30:00 26.09 6.5 0.8153125
#> 2 S1 200000 4 2014-01-01 12:45:00 28.09 6.5 1.7556250
#> 3 S1 200000 4 2014-01-01 19:45:00 29.50 7.1 1.2826087
#> 4 S1 200000 4 2014-01-02 08:45:00 26.37 6.2 1.6481250
#> 5 S1 200000 4 2014-01-02 21:00:00 27.26 6.5 0.8793548
#> 6 S1 200000 4 2014-01-03 19:15:00 25.84 6.4 1.9876923
#> dur.y ratio.y station.y diff_metric
#> 1 32 8.032345 S2 -0.036202438
#> 2 16 7.834550 S2 -0.046827282
#> 3 23 9.194444 S2 -0.049613402
#> 4 16 8.069705 S2 -0.038994169
#> 5 31 7.747525 S2 -0.004018999
#> 6 13 6.007752 S2 -0.038332713
::grid.draw(res$`4`$plot_threshold) grid
::grid.draw(res$`4`$plot_scatter) grid
$`4`$models
res#> station.x station.y metric type formula (Intercept) x n r2
#> 1 S1 S2 amp lm y ~ x -0.6331382 1.0230064 159 0.9976993
#> 2 S1 S2 dur lm y ~ x 3.4889475 1.1941344 159 0.5660323
#> 3 S1 S2 mafr lm y ~ x 0.3670848 0.6093259 159 0.8288550
#> 4 S1 S2 mefr lm y ~ x 0.3152232 0.5218025 159 0.6858269
#> 5 S2 S3 amp lm y ~ x 0.8889574 0.9860819 89 0.9960296
#> 6 S2 S3 dur lm y ~ x 2.5702522 0.6747927 89 0.8660458
#> 7 S2 S3 mafr lm y ~ x 0.8424465 0.8700683 89 0.9305420
#> 8 S2 S3 mefr lm y ~ x 0.9748893 0.6441977 89 0.7688562
#> 9 S2 S3 ratio lm y ~ x 0.6994949 0.5659408 89 0.9776473
#> 10 S3 S4 amp lm y ~ x -0.4977512 0.8877421 50 0.9569848
#> 11 S3 S4 dur lm y ~ x 1.6421232 0.6351157 50 0.7096118
#> 12 S3 S4 mafr lm y ~ x -0.2972899 0.8213848 50 0.5844032
#> 13 S3 S4 mefr lm y ~ x 0.2811406 1.1317973 50 0.2960442
#> 14 S3 S4 ratio lm y ~ x 0.8553193 0.3178922 50 0.7922559
::grid.arrange(grobs = res$`4`$plot_predict$grobs, nrow = 3, ncol = 2) gridExtra
If estimated settings are already available, it is possible to use
the settings directly to extract “real” AEs from the Event
data. For such an analysis, a path to a relation
file must
be provided, as well as a path to the Event
data and paths
to settings
and initial values.
The following code shows the extraction of “real” AEs based on the
settings
file which is included in the package after having
been generated with estimate_AE()
.
<- system.file("testdata", "relation.csv", package = "hydroroute")
relation_path <- system.file("testdata", "Events", package = "hydroroute")
events_path <- system.file("testdata", "Q_event_2_AMP-LAG_aut_settings.csv",
settings_path package = "hydroroute")
<- system.file("testdata", "initial_value_routing.csv",
initials_path package = "hydroroute")
<- extract_AE(relation_path, events_path, settings_path)
real_AE head(real_AE)
#> id.x event_type.x time.x amp.x mafr.x mefr.x dur.x ratio.x
#> 1 100000 2 2014-01-01 01:00:00 27.07 12.2 6.767500 4 10.565371
#> 2 100000 2 2014-01-01 10:15:00 29.38 15.5 7.345000 4 11.801471
#> 3 100000 2 2014-01-01 17:30:00 30.91 16.9 7.727500 4 12.490706
#> 4 100000 2 2014-01-02 17:30:00 26.88 14.1 4.480000 6 8.636364
#> 5 100000 2 2014-01-03 05:00:00 27.34 12.9 9.113333 3 10.559441
#> 6 100000 2 2014-01-03 16:00:00 27.16 14.8 3.017778 9 10.912409
#> station.x id.y event_type.y time.y amp.y mafr.y mefr.y dur.y
#> 1 S1 200000 2 2014-01-01 02:15:00 25.83 13.11 5.166000 5
#> 2 S1 200000 2 2014-01-01 11:30:00 28.49 14.54 5.698000 5
#> 3 S1 200000 2 2014-01-01 18:45:00 29.15 14.98 7.287500 4
#> 4 S1 200000 2 2014-01-02 19:00:00 26.43 11.49 3.303750 8
#> 5 S1 200000 2 2014-01-03 06:15:00 26.69 13.70 5.338000 5
#> 6 S1 200000 2 2014-01-03 17:30:00 27.03 12.20 3.861429 7
#> ratio.y station.y diff_metric
#> 1 7.506297 S2 -0.045807167
#> 2 8.679245 S2 -0.030292716
#> 3 8.379747 S2 -0.056939502
#> 4 6.427105 S2 -0.016741071
#> 5 7.655860 S2 -0.023774689
#> 6 7.808564 S2 -0.004786451
With the extracted “real” AEs, the routing procedure can be performed to describe the translation and retention processes between neighboring hydrographs.
Therefore, the output from extract_AE()
(or similar, the
output $real_AE
from estimate_AE()
), the
initial values data frame and the relation
data frame have
to be passed to function routing()
.
<- utils::read.csv(relation_path)
relation <- utils::read.csv(initials_path)
initials <- routing(real_AE, initials, relation) res
This produces the same scatter plot as before when
peaktrace()
was called as the events and the settings are
the same.
::grid.draw(res$plot_scatter) grid
$models
res#> station.x station.y metric type formula (Intercept) x n r2
#> 1 S1 S2 amp lm y ~ x -0.5542922 1.0194716 164 0.99658281
#> 2 S1 S2 dur lm y ~ x 1.6520198 0.8453943 164 0.64266715
#> 3 S1 S2 mafr lm y ~ x -0.3694711 1.0066784 164 0.91454377
#> 4 S1 S2 mefr lm y ~ x 0.4813008 0.7150873 164 0.70223571
#> 5 S2 S3 amp lm y ~ x 0.7597112 0.9782935 100 0.99765823
#> 6 S2 S3 dur lm y ~ x 0.7959971 0.7868426 100 0.70537848
#> 7 S2 S3 mafr lm y ~ x -1.0932362 1.5764049 100 0.94356887
#> 8 S2 S3 mefr lm y ~ x 0.5505522 1.0358693 100 0.76774263
#> 9 S2 S3 ratio lm y ~ x 0.5890385 0.6172956 100 0.97341252
#> 10 S3 S4 amp lm y ~ x -0.1251017 0.9485448 30 0.98810523
#> 11 S3 S4 dur lm y ~ x 3.0299519 0.4642539 30 0.09809454
#> 12 S3 S4 mafr lm y ~ x 3.0964565 0.5820430 30 0.87656463
#> 13 S3 S4 mefr lm y ~ x 2.2653366 0.7352544 30 0.56098791
#> 14 S3 S4 ratio lm y ~ x 1.2112368 0.3257644 30 0.44545302
::grid.arrange(grobs = res$plot_predict$grobs, nrow = 3, ncol = 2) gridExtra
peaktrace()
with existing settingsIt the automatically determined settings using
peaktrace()
are not suitable, the settings files can be
edited to insert manual settings. In the following we use a path to a
settings file for event type 2
where between S2 and S3 the
settings were modified to 0.75, 1 and 1.5 and no settings are provided
for subsequent pairs of stations. Function peaktrace()
is
called again providing also the path to this settings file and the
resulting settings for event type 2
are inspected:
<- system.file("testdata", package = "hydroroute")
settings_path <- peaktrace(relation_path, events_path, initial_values_path,
res
settings_path)$`2`$settings
res#> station.x station.y bound lag metric
#> 1 S1 S2 lower 1 0.7388856
#> 2 S1 S2 inner 1 0.9500000
#> 3 S1 S2 upper 1 1.1611144
#> 4 S2 S3 lower 1 0.7500000
#> 5 S2 S3 inner 1 1.0000000
#> 6 S2 S3 upper 1 1.5000000
#> 7 S3 S4 lower 1 0.8241694
#> 8 S3 S4 inner 1 0.9500000
#> 9 S3 S4 upper 1 1.0758306
The procedure is then applied without determining the settings in an automatic way, but using those specified. The settings files can also only provide settings for some relations between neighboring stations which are then used, while for those relations between neighboring stations where no settings are provided, these are again determined using the automatic procedure.
Greimel F, Zeiringer B, Höller N, Grün B, Godina R, Schmutz S (2016). “A Method to Detect and Characterize Sub-Daily Flow Fluctuations.” Hydrological Processes, 30(13), 2063-2078. doi: 10.1002/hyp.10773
Greimel F, Grün B, Hayes DS, Höller N, Haider J, Zeiringer B, Holzapfel P, Hauer C, Schmutz S (2022). “PeakTrace: Routing of Hydropower Plant-Specific Hydropeaking Waves Using Multiple Hydrographs - A Novel Approach.” River Research and Applications, 1-14. doi: 10.1002/rra.3978