Help for package SmoothPLS

Type:

Package

Title:

Partial Least-Squares Algorithm for Categorical and Scalar Functional Data

Version:

0.1.5

Date:

2026-04-10

Description:

Performs the Partial Least-Squares ('PLS') algorithm for functional data through the concept of active area integration. This approach builds upon the basis expansion methods for functional 'PLS' regression described in Aguilera et al. (2010) <doi:10.1016/j.chemolab.2010.09.007>. The package seamlessly handles both Scalar Functional Data ('SFD') and Categorical Functional Data ('CFD'), providing interpretable regression curves even for discrete state changes. It was developed during a PhD thesis between 'DECATHLON' and French research institute 'INRIA' 2022-2026. The 'SmoothPLS' method does not directly decompose the data into a basis; rather, it assumes the data is known as precisely as desired, and for every 'PLS' component, the weight functions are decomposed into the basis. For both single-state and multi-state 'CFD' as well as 'SFD', the algorithm is implemented for a scalar response. To provide a baseline, a naive 'PLS' method on time-value functions and standard Functional 'PLS' are also implemented.

License:

MIT + file LICENSE

URL:

https://github.com/FrancoisBassac/SmoothPLS, https://FrancoisBassac.github.io/SmoothPLS/

BugReports:

https://github.com/FrancoisBassac/SmoothPLS/issues

Encoding:

UTF-8

RoxygenNote:

7.3.3

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0)

VignetteBuilder:

knitr

Imports:

cfda (≥ 0.12.1), dplyr (≥ 1.1.4), fda (≥ 6.2.0), ggplot2 (≥ 3.5.1), MASS (≥ 7.3-64), mgcv (≥ 1.9-1), pls (≥ 2.8-5), pracma (≥ 2.4.4), stats (≥ 4.4.1), tidyr (≥ 1.3.1), utils (≥ 4.4.1), rlang (≥ 1.1.0), magrittr (≥ 2.0.3), future.apply, future

Config/testthat/edition:

NeedsCompilation:

Packaged:

2026-04-29 17:11:27 UTC; Francois

Author:

Francois Bassac [aut, cre]

Maintainer:

Francois Bassac <fr.bassac@gmail.com>

Repository:

CRAN

Date/Publication:

2026-05-04 19:20:02 UTC

assemble_basis_metric

Description

This function assemble the metrics of all the basis of the basis list. This function only assemble the needed basis, especially if length(curve_to_keep) != N_states

Usage

assemble_basis_metric(basis_list, curves_to_keep = NULL)

Arguments

basis_list

a list of basis fd object

curves_to_keep

a list of the states curves to keep

Value

a matrix of the metric to consider

Author(s)

Francois Bassac

Examples

basis1 = fda::create.bspline.basis(c(0,100), nbasis=10, norder=4)
basis2 = fda::create.bspline.basis(c(0,100), nbasis=15, norder=1)
basis3 = fda::create.fourier.basis(c(0,100), nbasis=7)
assemble_basis_metric(list(basis1, basis2, basis3), list(1,2,4))
assemble_basis_metric(list(basis1, basis2, basis3), list(1,2))

assert_funcPLS_inputs

Description

This function checks the integrity of the input for funcPLS. It returns a list of (basis_list, regul_time_list, curve_type_list, id_col_list, time_col_list)

Usage

assert_funcPLS_inputs(
  df_list,
  Y,
  basis_obj,
  regul_time_obj,
  curve_type_obj = NULL,
  id_col_obj = "id",
  time_col_obj = "time"
)

Arguments

df_list

a list of dataframes (id, time, value_or_state)

Y

a numeric vector of the response

basis_obj

a list of basis object or a basis object

regul_time_obj

a vector of time regularization values or a list of vectors

curve_type_obj

a character "cat" or 'num' or a list of those values

id_col_obj

a character of the id column for all the curves or a list of id column character

time_col_obj

a character of the time column for all the curves or a list of time column character

Value

a list of (basis_list, regul_time_list, curve_type_list, id_col_list, time_col_list)

Author(s)

Francois Bassac

assert_multivariate_naivePLS_inputs

Description

This function checks the input of naivePLS function.

Usage

assert_multivariate_naivePLS_inputs(
  df_list,
  Y,
  regul_time_obj = NULL,
  curve_type_obj = NULL,
  id_col_obj = "id",
  time_col_obj = "time"
)

Arguments

df_list

a list of dataframe (id, time, value_or_state)

Y

a numeric vector

regul_time_obj

a list of time regularisation values

curve_type_obj

a list of curve type 'cat' or 'num'

id_col_obj

a list of the names of the id columns

time_col_obj

a list of the names of the time columns

Value

a list

Author(s)

Francois Bassac

assert_multivariate_smoothPLS_inputs

Description

This function checks the integrity of the input for multivariate_fpls. It returns a list of (basis_list, regul_time_list, curve_type_list, id_col_list, time_col_list)

Usage

assert_multivariate_smoothPLS_inputs(
  df_list,
  Y,
  basis_obj,
  regul_time_obj = NULL,
  curve_type_obj = NULL,
  orth_obj = list(TRUE),
  id_col_obj = "id",
  time_col_obj = "time"
)

Arguments

df_list

a list of dataframes (id, time, value_or_state)

Y

a numeric vector of the response

basis_obj

a list of basis object or a basis object

regul_time_obj

a vector of time regularization values or a list of vectors

curve_type_obj

a character "cat" or 'num' or a list of those values

orth_obj

a boolean, a list or a vector of boolean to orthonormalize or not a basis

id_col_obj

a character of the id column for all the curves or a list of id column character

time_col_obj

a character of the time column for all the curves or a list of time column character

Value

a list of (basis_list, regul_time_list, curve_type_list, orth_list, id_col_list, time_col_list)*

Author(s)

Francois Bassac

beta_1_real_func

Description

beta_1_real_func

Usage

beta_1_real_func(t, end_time = 100, drop = NULL)

Arguments

t

evaluation time

end_time

end time; default 100

drop

particular point of the curve, default NULL

Value

a value

Author(s)

Francois Bassac

Examples

beta_1_real_func(0)
beta_1_real_func(10)
beta_1_real_func(10:90)
plot(x=0:100, y=beta_1_real_func(0:100, 100), type='l', main="Beta_1")

beta_2_real_func

Description

beta_2_real_func

Usage

beta_2_real_func(t, end_time = 100, drop = 3 * 100/5)

Arguments

t

evaluation time

end_time

end time; default 100

drop

particular point of the curve, default 3*100/5

Value

a value

Author(s)

Francois Bassac

Examples

beta_2_real_func(0)
beta_2_real_func(10)
beta_2_real_func(10:90)
plot(x=0:100, y=beta_2_real_func(0:100, 100), type='l', main="Beta_2")

beta_3_real_func

Description

beta_3_real_func

Usage

beta_3_real_func(t, end_time = 100, drop = 27 * 100/100)

Arguments

t

evaluation time

end_time

end time; default 100

drop

particular point of the curve, default 27

Value

a value

Author(s)

Francois Bassac

Examples

beta_3_real_func(0)
beta_3_real_func(10)
beta_3_real_func(10:90)
plot(x=0:100, y=beta_3_real_func(0:100, 100), type='l', main="Beta_3")

beta_4_real_func

Description

beta_4_real_func

Usage

beta_4_real_func(t, end_time = 100, drop = NULL)

Arguments

t

evaluation time

end_time

end time; default 100

drop

particular point of the curve, default NULL

Value

a value

Author(s)

Francois Bassac

Examples

beta_4_real_func(0)
beta_4_real_func(10)
beta_4_real_func(10:90)
plot(x=0:100, y=beta_4_real_func(0:100, 100), type='l', main="Beta_4")

beta_5_real_func

Description

beta_5_real_func

Usage

beta_5_real_func(t, end_time = 100, drop = 3 * 100/5)

Arguments

t

evaluation time

end_time

end time; default 100

drop

particular point of the curve, default 3*100/5

Value

a value

Author(s)

Francois Bassac

Examples

beta_5_real_func(0)
beta_5_real_func(10)
beta_5_real_func(10:90)
plot(x=0:100, y=beta_5_real_func(0:100, 100), type='l', main="Beta_5")

beta_6_real_func

Description

Constant function = 1 Can adjust the constant value by drop input

Usage

beta_6_real_func(t, end_time = 100, drop = 1)

Arguments

t

evaluation time

end_time

end time; default 100

drop

particular point of the curve, default 1

Value

a value

Author(s)

Francois Bassac

Examples

beta_5_real_func(0)
beta_5_real_func(10)
beta_5_real_func(10:90)
plot(x=0:100, y=beta_5_real_func(0:100, 100), type='l', main="Beta_5")

beta_7_real_func

Description

Triangular function with angle at (x=drop, y=drop) with slope of 1 and -1

Usage

beta_7_real_func(t, end_time = 100, drop = 3 * end_time/5)

Arguments

t

evaluation time

end_time

end time; default 100

drop

particular point of the curve, default 3*end_time/5

Value

a value

Author(s)

Francois Bassac

Examples

beta_5_real_func(0)
beta_5_real_func(10)
beta_5_real_func(10:90)
plot(x=0:100, y=beta_5_real_func(0:100, 100), type='l', main="Beta_5")

beta_list_generation

Description

beta_list_generation

Usage

beta_list_generation(N_states = 3)

Arguments

N_states

a int of the number of states wanted, default 3

Value

a list of functions

Author(s)

Francois Bassac

Examples


beta_list = beta_list_generation()
beta_list_2 = beta_list_generation(6)

block_diag

Description

returns a matrix whose blocks are A and B. returns : ( A, 0 ) ( 0, B )

Usage

block_diag(A, B)

Arguments

A

a matrix

B

a matrix

Value

a matrix

Author(s)

Francois Bassac

Examples

A = matrix(c(1, 2, 3, 4), 2)
B = matrix(c(5, 6, 7, 8,9, 10), 2)
C = block_diag(A, B)

build_df_per_state

Description

build_df_per_state

Usage

build_df_per_state(data_list, id_col = "id", time_col = "time")

Arguments

data_list

a list containing the dataframe of the indicator function of each state.

id_col

a character for the id column, default 'id'

time_col

a character for the time column, default 'time'

Value

a list of the ordered states in the indicator function form.

Author(s)

Francois Bassac

Examples

N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)

df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100,
lambdas, transition_df)

si_df = state_indicator(df, id_col='id',
time_col='time')

split_df = split_in_state_df(si_df, id_col='id', time_col='time')

build_new_data_list

Description

This function preprocess the different input in order to format them to the right number of curve. Warning the "right" number of curve take into account the number of different states for the CFDs.

Usage

build_new_data_list(
  df_list,
  N_curves,
  orth_basis_list = NULL,
  basis_list = NULL,
  curve_type_list,
  id_col_list,
  time_col_list,
  regul_time_list
)

Arguments

df_list

a list of dataframes (id, time, value_or_state)

N_curves

a integer, the number of curves

orth_basis_list

a list of orthogonalized basis fd list

basis_list

a list of basis fd functions

curve_type_list

a list of the curve type of each curve

id_col_list

a list of the id column name for each curve

time_col_list

a list of the time column name for each curve

regul_time_list

a list of the time regularization vector for each curve

Value

a list

Author(s)

Francois Bassac

build_reg_curve_spls

Description

This function builds the smooth PLS regression functions.

Usage

build_reg_curve_spls(
  plsr_model,
  curves_names_list,
  v_i_list,
  nb_comp_pls_opt = NULL,
  print_steps = TRUE
)

Arguments

plsr_model

a pls model

curves_names_list

a list of the names of the different curves

v_i_list

a list of the v functions

nb_comp_pls_opt

a integer, the number of component to take into account. if null (default) all the components are used

print_steps

a boolean to print steps

Value

a list of fd object

Author(s)

Francois Bassac

build_spls_functions

Description

This function build some intermediate functions of the smooth pls algorithm. It also evaluates the different coefficients gamma_ij

Usage

build_spls_functions(
  curves_names_list,
  new_basis_list,
  new_orth_basis_list,
  d_i,
  u_i
)

Arguments

curves_names_list

a list of the names of the different curves

new_basis_list

a list of the initial basis fd object for all curves

new_orth_basis_list

a list of the orthogonalized basis as fd list for all curves

d_i

the pls coefficient such as X = d_i t_i : plsr_model$loadings

u_i

the pls coefficient such as t_i = X u_i : plsr_model$loading.weights

Value

a list Lambda = sum d_i t_i

Author(s)

Francois Bassac

build_u_ki_list

Description

build_u_ki_list

Usage

build_u_ki_list(N_states, nbComp, ms_pls_models)

Arguments

N_states

a integer, number of different states

nbComp

a integer, max number of components

ms_pls_models

a list of the intermediate pls models to evaluate the real multi-stats pls components.

Value

a list of the u_i^k

Author(s)

Francois Bassac

cat_data_to_indicator

Description

This function apply all functions to go from a categorical functional data with different states to a list of one dataframe per state indicatrice (in the ascending order) whose duplicated states where removed.

Usage

cat_data_to_indicator(data, id_col = "id", time_col = "time")

Arguments

data

a multistates dataframe ('id', 'time', 'states')

id_col

a character for the id column, default 'id'

time_col

a character for the time column, default 'time'

Value

a list of the ordered states in the indicator function form.

Author(s)

Francois Bassac

Examples

N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)

df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100,
lambdas, transition_df)
df_list = cat_data_to_indicator(df)

convert_to_wide_format

Description

This function takes the df_new output from regularize_time_series and convert it into another format

Usage

convert_to_wide_format(df_new, id_col = "id", time_col = "time")

Arguments

df_new

a regularized dataframe

id_col

col_name of df_new for the id

time_col

col_name of df_new for the id

Value

the dataframe in wide format

Author(s)

Francois Bassac

Examples

id_df = data.frame(id=rep(1,5), time=seq(0, 40, 10), state=c(0, 1, 1, 0, 1))
id_df_new = regularize_time_series(id_df, time_seq = seq(0, 40, 2), curve_type = 'cat')
convert_to_wide_format(id_df_new)

create_bspline_basis

Description

create_bspline_basis

Usage

create_bspline_basis(start, end, nbasis = 10, norder = 4)

Arguments

start

start time

end

end time

nbasis

number of basis functions, default 10

norder

order of the basis function, default cubic splines 4

Value

a basis fd object

Author(s)

Francois Bassac

Examples

b0 = create_bspline_basis(0, 10, 10, 4)
plot(b0)

b1 = create_bspline_basis(0, 10, 10, 2)
plot(b1)

b2 = create_bspline_basis(0, 10, 10, 1)
plot(b1)

determine_curve_name

Description

This function determines the curves names bases on its place in the data and its states if it is a categorical functional data.

Usage

determine_curve_name(curve_number, curve_type = NULL, states_names = NULL)

Arguments

curve_number

a int, the place of the curve in the data

curve_type

a character, 'cat' or 'num'

states_names

a character or list of character with the states of the CFD

Value

a vector of names.

Author(s)

Francois Bassac

determine_next_state

Description

This function determines the next state base on the current state and the transition matrix.

Usage

determine_next_state(current_state, transition_df)

Arguments

current_state

a value of the current state

transition_df

a dataframe of the transition matrix

Value

a value for the next state

Author(s)

Francois Bassac

Examples

N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)
determine_next_state(1, transition_df)

eval_max_min_y

Description

This function returns the min and max values of a list of functions and fd objects on regul_time.

Usage

eval_max_min_y(f_list, regul_time)

Arguments

f_list

a list of functions and fd objects

regul_time

a vector of time evaluation points.

Value

a vector

Author(s)

Francois Bassac

evaluate_V_i_function

Description

This function evaluates the different functions v_i_t bases on gamma_ij and the functions w_i_fd. Recursive : v_i = W_i - \sum_{j=1}^{i-1}gamma_ij*v_j

Usage

evaluate_V_i_function(w_i_list, gamma_ij_list)

Arguments

w_i_list

a list of fd functions w_i(t)

gamma_ij_list

a list of list of gamma_ij values

Value

a list of fd functions

Author(s)

Francois Bassac

evaluate_curves_distances

Description

Either for R func or fd function, this function evaluates the distance between the real curve and each curve fun or fd which are in the func_fd_list.

Usage

evaluate_curves_distances(real_f, regul_time, fun_fd_list = NULL)

Arguments

real_f

a fun of fd function, base function to compare

regul_time

a vector of time regularization values

fun_fd_list

a list of fun or fd functions or a fun or a fd function

Value

No return value, called for side effects (prints distances to the console).

Author(s)

Francois Bassac

evaluate_gamma_ij

Description

This function evaluates int_0^T w_i_fd p_j_t dt for all j return a list, the element i length is (i-1)

Usage

evaluate_gamma_ij(w_i_list, p_i_list)

Arguments

w_i_list

a list of fd functions w_i(t)

p_i_list

a list of fd functions p_i(t)

Value

a list of list of the gamma_ij values

Author(s)

Francois Bassac

evaluate_id_func_integral

Description

Evaluates the integral \int_{\tau_i} f(t) dt where \tau_i are the active intervals of a categorical functional data (states 0 or 1).

Usage

evaluate_id_func_integral(
  id_df,
  func,
  id_col = "id",
  time_col = "time",
  rel_tol = .Machine$double.eps^0.5,
  subdivisions = 1000L,
  ...
)

Arguments

id_df

Dataframe for a single individual with at least columns (id, time, state).

func

The R function to integrate.

id_col

Character, name of the id column, default 'id'.

time_col

Character, name of the time column, default 'time'.

rel_tol

Relative tolerance for stats::integrate, default 1e-8.

subdivisions

Max number of subdivisions for integrate, default 100.

...

Additional arguments (ignored to prevent passing unused params to func).

Value

A dataframe with the id and the calculated integral value.

evaluate_id_func_integral

Description

This function evaluate the integral for a state (0, 1) functional data : int( X(t) func(t) )dt. This function works ONLY for a one state CFD!

Usage

evaluate_id_func_integral_deprecated(
  id_df,
  func,
  mode = 1,
  id_col = "id",
  time_col = "time",
  nb_pt = 10,
  subdivisions = 100
)

Arguments

id_df

a single id dataframe of at least named columns (id, time)

func

the function to integrate

mode

select the integration mode 1 for R function integrate, 2 for pracma::trapz. default value : 1

id_col

col_name of df for the id

time_col

col_name of df for the time

nb_pt

number of points for the integration, default value : 10

subdivisions

default parameter of R function integrate; default value : 100

Value

a dataframe with the id and the integral value.

Author(s)

Francois Bassac

Examples

id_df = data.frame(id=rep(1,5), time=seq(0, 40, 10), state=c(0, 1, 1, 0, 1))
evaluate_id_func_integral(id_df, function(t){t})

evaluate_lambda

Description

This function evaluates the Lambda matrix such as per column : Lambda_i = int_0^T X(t) phi_i(t) dt. The curve_type input is important function of the type of data you work with 'cat' for Categorical Functional Data 'num' for Scalar Functional Data

Usage

evaluate_lambda(
  df,
  basis,
  curve_type = NULL,
  int_mode = 1,
  id_col = "id",
  time_col = "time",
  nb_pt = 10,
  subdivisions = 100,
  regul_time = seq(basis$rangeval[1], basis$rangeval[2], 1),
  parallel = TRUE
)

Arguments

df

dataframe X(t)

basis

basis fd object

curve_type

a character, 'cat' for Categorical FD, 'num' for Scalar FD

int_mode

integration mode, 1 for integrate, 2 for pracma::trapz

id_col

a character for the id column, default 'id'

time_col

a character for the time column, default 'time'

nb_pt

number of points for the integration, default value : 10

subdivisions

default parameter of R function integrate; default value : 100

regul_time

regul_time a vector of time regularization values default basis rangeval per 1

parallel

a boolean to use parallelization, default TRUE

Value

a matrix \Lambda_i = \int_0^T X(t) \phi_i(t) dt

Author(s)

Francois Bassac

Examples

df = generate_X_df(nind=100, start=0, end=100, curve_type = 'cat',
lambda_0=0.2, lambda_1=0.1, prob_start=0.5)
basis = create_bspline_basis(0, 100, 10, 4)
Lambda = evaluate_lambda(df, basis, curve_type = 'cat')


df = generate_X_df(nind=100, start=0, end=100, curve_type = 'num')
basis = create_bspline_basis(0, 100, 10, 4)
Lambda = evaluate_lambda(df, basis, curve_type = 'num', int_mode = 2)

evaluate_lambda_CFD

Description

This function evaluates the Lambda matrix Lambda_ij = int X_j(t) phi_i(t) dt using parallel processing with Chunking.

Usage

evaluate_lambda_CFD(
  df,
  basis,
  int_mode = 1,
  id_col = "id",
  time_col = "time",
  nb_pt = 10,
  subdivisions = 100,
  parallel = TRUE
)

Arguments

df

dataframe X(t)

basis

basis fd object or a list of fd functions

int_mode

integration mode, 1 for integrate, 2 for pracma::trapz

id_col

a character for the id column, default 'id'

time_col

a character for the time column, default 'time'

nb_pt

number of points for the integration, default value : 10

subdivisions

default parameter of R function integrate; default value : 100

parallel

boolean, if TRUE uses max(nb_cores - 2, 1) cores, default TRUE.

Value

a matrix of dimension nbasis columns and nind rows

Author(s)

Francois Bassac

evaluate_lambda_CFD_para_v1

Description

This function evaluates the Lambda matrix Lambda_ij = int X_j(t) phi_i(t) dt using parallel processing.

Usage

evaluate_lambda_CFD_para_v1(
  df,
  basis,
  int_mode = 1,
  id_col = "id",
  time_col = "time",
  nb_pt = 10,
  subdivisions = 100,
  parallel = TRUE
)

Arguments

df

dataframe X(t)

basis

basis fd object or a list of fd functions

int_mode

integration mode, 1 for integrate, 2 for pracma::trapz

id_col

a character for the id column, default 'id'

time_col

a character for the time column, default 'time'

nb_pt

number of points for the integration, default value : 10

subdivisions

default parameter of R function integrate; default value : 100

parallel

a boolean to enable parallel processing, default TRUE

Value

a matrix of dimension nbasis columns and nind rows

Author(s)

Francois Bassac

evaluate_lambda_SFD

Description

This function evaluates the Lambda matrix Lambda_ij = int X_j(t) phi_i(t) dt for step > 1 using parallel chunking.

Usage

evaluate_lambda_SFD(
  df,
  basis,
  regul_time = seq(basis$rangeval[1], basis$rangeval[2], 1),
  int_mode = 1,
  id_col = "id",
  time_col = "time",
  subdivisions = 100,
  parallel = TRUE
)

Arguments

df

dataframe X(t)

basis

basis fd object

regul_time

a vector of time regularization values default basis rangeval per 1

int_mode

int, integration mode, 1 for integrate, 2 for pracma::trapz

id_col

default name of the id column

time_col

default name of the time column

subdivisions

default parameter of R function integrate; default value : 100

parallel

boolean, if TRUE uses parallel processing. Default TRUE.

Value

a matrix of dimension nbasis columns and nind rows

Author(s)

Francois Bassac

evaluate_lambda_SFD_para_v1

Description

This function evaluates the Lambda matrix Lambda_ij = int X_j(t) phi_i(t) dt for step > 1

Usage

evaluate_lambda_SFD_para_v1(
  df,
  basis,
  regul_time = seq(basis$rangeval[1], basis$rangeval[2], 1),
  int_mode = 1,
  id_col = "id",
  time_col = "time",
  subdivisions = 100,
  parallel = TRUE
)

Arguments

df

dataframe X(t)

basis

basis fd object

regul_time

a vector of time regularization values default basis rangeval per 1

int_mode

int, integration mode, 1 for integrate, 2 for pracma::trapz

id_col

default name of the id column

time_col

default name of the time column

subdivisions

default parameter of R function integrate; default value : 100

parallel

a boolean to enable parallel processing, default TRUE

Value

a matrix of dimension nbasis columns and nind rows

Author(s)

Francois Bassac

evaluate_metric

Description

This function evaluates the metric of a certain basis. The metric is the inprod of the basis functions.

Usage

evaluate_metric(basis)

Arguments

basis

basis to evaluate the metric

Value

a matrix of dimension nbasis X nbasis

Author(s)

Francois Bassac

Examples

basis = create_bspline_basis(start=0, end=10, nbasis=10, norder=4)
metric = evaluate_metric(basis)

basis1 = create_bspline_basis(start=0, end=20, nbasis=10, norder=1)
metric1 = evaluate_metric(basis1)

evaluate_reg_curve_SPLS_uni

Description

This functions evaluates the regression function such as : Y = \int_0^T delta(t) X(t) dt.

Usage

evaluate_reg_curve_SPLS_uni(plsr_model, v_i_list, nb_comp = NULL)

Arguments

plsr_model

a model from pls package

v_i_list

a list of fd functions v_i(t) : t_i = \int_0^T v_i(t) X(t) dt

nb_comp

a value, number of components to take into account, default NULL if NULL, then delta(t) will take every components available.

Value

a fd function

Author(s)

Francois Bassac

evaluate_results This function evaluates the PRESS, RMSE, MAE, R2 and the % of variance between Y and Y_hat

Description

evaluate_results This function evaluates the PRESS, RMSE, MAE, R2 and the % of variance between Y and Y_hat

Usage

evaluate_results(Y, Y_hat)

Arguments

Y

a vector of real values

Y_hat

a vector of modeled values

Value

a dataframe

Author(s)

Francois Bassac

Examples

evaluate_results(c(1,2,3,4,5), c(0.9, 2.2, 4, 5.5, 5))

evaluate_variance_explained

Description

This function return the % of variance explained by Y_hat comparing to Y.

Usage

evaluate_variance_explained(Y, Y_hat)

Arguments

Y

a reference value

Y_hat

a modeled value Y_hat = model(X)

Value

a value in %

Author(s)

Francois Bassac

Examples

evaluate_variance_explained(c(1,2,3,4,5,6,7,8,9,10),
c(0.9, 1.1, 1.9, 2.4, 5.3, 6.01, 7,45, 9.12, 9.04, 11.6))

from_basis_to_fdlist

Description

This function transform if necessary the input basis into a list of fd functions. If basis is a basis object from fda, the output fd_list is the list of the different basis functions as fd functions. If basis is already a list of fd functions, nothing changes.

Usage

from_basis_to_fdlist(basis)

Arguments

basis

a basis fd object or a list of fd functions.

Value

a list of fd functions

Author(s)

Francois Bassac

Examples

basis = create_bspline_basis(start = 0, end = 10, nbasis = 5, norder = 4)
plot(basis)

basis_list = from_basis_to_fdlist(basis)

plot(basis_list[[1]], col = 1)
for(i in 2:length(basis_list)){
  plot(basis_list[[i]], add=TRUE, col = i)
}

basis_list_2 = from_basis_to_fdlist(basis_list)

from_fd_to_func

Description

This function transform a fd object into a function. It require either the fd object OR the coefficient and the basis object.

Usage

from_fd_to_func(fd_obj = NULL, coef = NULL, basisobj = NULL)

Arguments

fd_obj

a fd object to transform into a function

coef

the coefficient of an fd object to transform into a function

basisobj

the basis object of an fd object to transform into a function

Value

a function

Author(s)

Francois Bassac

Examples

basis = create_bspline_basis(0, 100, 10, 4)
coef = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
func_from_fd = from_fd_to_func(coef = coef, basis = basis)

funcPLS

Description

This function performs the Multivariate Functional PLS as a matrix problem.

Usage

funcPLS(
  df_list,
  Y,
  basis_obj,
  regul_time_obj,
  curve_type_obj = NULL,
  id_col_obj = "id",
  time_col_obj = "time",
  print_steps = FALSE,
  plot_rmsep = TRUE,
  print_nbComp = TRUE,
  plot_reg_curves = FALSE,
  jackknife = TRUE,
  validation = "LOO"
)

Arguments

df_list

a list of dataframes (id, time, value_or_state)

Y

a numeric vector of the response

basis_obj

a basis fd obj or a list of basis fd obj. If basis fd obj, the same basis is used for all the curves

regul_time_obj

a vector of time regularization values or a list of vectors

curve_type_obj

a character "cat" or 'num' or a list of those values

id_col_obj

a character of the id column for all the curves or a list of id column character

time_col_obj

a character of the time column for all the curves or a list of time column character

print_steps

a boolean to cat the current step

plot_rmsep

a boolean to plot the plsr RMSEP

print_nbComp

a boolean to cat the optimal number of components

plot_reg_curves

a boolean to directly plot the beta regression curves

jackknife

a plsr input, default = TRUE

validation

a plsr input, default = 'LOO'

Value

a list ("curve_names", "alphas", "metric", "root_metric", "trans_alphas", "mfpls_mfd", "nb_comp_pls_opt", "beta_0", "beta_pls_list")

Author(s)

Francois Bassac

funcPLS_predict

Description

This function performs the prediction on a df_predict_ms using the delta_list \hat(Y) = \delta_0 + \sum_{i=1}^K \int_0^T \delta_i(t) X_i(t) dt.

Usage

funcPLS_predict(
  df_predict_list,
  delta_list,
  curve_type_obj = NULL,
  regul_time_obj = NULL,
  id_col_obj = "id",
  time_col_obj = "time",
  int_mode = 1,
  nb_pt = 10,
  subdivisions = 100,
  parallel = TRUE
)

Arguments

df_predict_list

a list of dataframe (id, time, state_or_value)

delta_list

a list of regression objects (Intercept, fd, etc)

curve_type_obj

a list of the curves types 'cat' or 'num'

regul_time_obj

a list of time regularization values

id_col_obj

a list of characters of the names of the id columns

time_col_obj

a list of characters of the names of the time columns

int_mode

a integer for integration mode, 1 for integrate, 2 for pracma::trapz, default 1

nb_pt

a integer, the number of intermediate points for integration mode 2, default 10

subdivisions

a integer, the number of subdivisions for integration mode 1, default 100

parallel

a boolean to use parallelization, default TRUE

Value

a numeric vector

Author(s)

Francois Bassac

generate_X_df

Description

This function generate synthetic data of nind X(t). For 'cat' curve_type it is in state 0 or 1 by two different exponential laws. For curve_type = 'num' it is noised cosine.

Usage

generate_X_df(
  nind = 100,
  start = 0,
  end = 100,
  curve_type = "cat",
  lambda_0 = 0.2,
  lambda_1 = 0.1,
  prob_start = 0.5,
  noise_sd = 0.1,
  seed = 123
)

Arguments

nind

number of individuals, default 100

start

first time, default 0

end

last time, default 100

curve_type

character, type of wanted synthetic data, default 'cat', need to be \'cat\' or \'num\'.

lambda_0

lambda parameter for exponential law for state 0, default 0.2

lambda_1

lambda parameter for exponential law for state 1, default 0.1

prob_start

Start state 1 probability, binomial law, default 0.5

noise_sd

noise added to the signal, default 0.1

seed

seed for reproducibility, default 123

Value

the dataframe of the individuals

Author(s)

Francois Bassac

Examples

generate_X_df()

generate_X_df_CFD

Description

This function generate synthetic data of nind X(t) in state 0 or 1 by two different exponential laws.

Usage

generate_X_df_CFD(
  nind = 500,
  start = 0,
  end = 100,
  lambda_0 = 0.2,
  lambda_1 = 0.1,
  prob_start = 0.5,
  seed = 123
)

Arguments

nind

number of individuals, default 500

start

first time, default 0

end

last time, default 100

lambda_0

lambda parameter for exponential law for state 0, default 0.2

lambda_1

lambda parameter for exponential law for state 1, default 0.1

prob_start

Start state 1 probability, binomial law, default 0.5

seed

a integer, random seed 123

Value

the dataframe of the individuals

Author(s)

Francois Bassac

Examples

generate_X_df_CFD()
generate_X_df_CFD(10, 13, 60, 0.21, 0.13, 0.7)

generate_X_df_SFD

Description

This function generate synthetic data of nind X which are noised cosines.

Usage

generate_X_df_SFD(nind = 100, start = 0, end = 100, noise_sd = 0.1, seed = 123)

Arguments

nind

number of individuals, default 500

start

first time, default 0

end

last time, default 100

noise_sd

noise added to the signal, default 0.1

seed

seed for reproducibility, default 123

Value

a dataframe

Author(s)

Francois Bassac

Examples

generate_X_df_SFD(nind = 100, start = 0, end = 100,
noise_sd = 0.1, seed = 123)

generate_X_df_SFD_data

Description

This function generate test data of nind X which are noised cosines.

Usage

generate_X_df_SFD_data(
  nind = 100,
  start = 0,
  end = 100,
  noise_sd = 0.1,
  seed = 123
)

Arguments

nind

number of individuals, default 500

start

first time, default 0

end

last time, default 100

noise_sd

noise added to the signal, default 0.1

seed

seed for reproducibility, default 123

Value

a dataframe

Author(s)

Francois Bassac

Examples

generate_X_df_SFD(nind = 100, start = 0, end = 100,
noise_sd = 0.1, seed = 123)

generate_X_df_multistates

Description

This function generates a multistates CFD.

Usage

generate_X_df_multistates(
  nind = 100,
  N_states = 3,
  start = 0,
  end = 100,
  lambdas,
  transition_df,
  seed = 123
)

Arguments

nind

a int of the number of the individuals, default 100

N_states

a int of the number of states wanted, default 3

start

a value of starting time, default 0

end

a value of ending time, default 100

lambdas

a vector of N_states lambda values from lambda_determination(N_states)

transition_df

a dataframe with the transition matrix from transfer_probabilities(N_states)

seed

a integer, random seed

Value

a dataframe of a multistates Categorical Functional Data

Author(s)

Francois Bassac

Examples

N_states = 4
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)

df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100,
lambdas,  transition_df)

generate_X_df_test

Description

generate_X_df_test

Usage

generate_X_df_test(
  TTRatio = 0.2,
  nind = 500,
  start = 0,
  end = 100,
  curve_type = "cat",
  lambda_0 = 0.2,
  lambda_1 = 0.1,
  prob_start = 0.5,
  seed = 123
)

Arguments

TTRatio

Train Test ratio, default 0.2

nind

number of individuals, default 500

start

first time, default 0

end

last time, default 100

curve_type

character, type of wanted synthetic data, default 'cat', need to be \'cat\' or \'num\'.

lambda_0

lambda parameter for exponential law for state 0, default 0.2

lambda_1

lambda parameter for exponential law for state 1, default 0.1

prob_start

Start state 1 probability, binomial law, default 0.5

seed

a integer, random seed 123

Value

a dataframe of the test data

Author(s)

Francois Bassac

Examples

generate_X_df_test()
generate_X_df_test(TTRatio = 0.4, nind=8, start=0, end=10, curve_type = 'num')

generate_Y_df

Description

This function generates Y_df bases on df, beta_func with the following link Y = beta_0 + int(X(t)*beta(t))dt It generates also the noised values of Y. Here the NotS_ratio is the Noise over total Signal ratio meaning that a value of 0.2 means that the noise represents 20% of the TOTAL variance.

Usage

generate_Y_df(
  df,
  curve_type = NULL,
  beta_real_func_or_list,
  beta_0_real = 5.4321,
  NotS_ratio = 0.2,
  seed = 123,
  id_col = "id",
  time_col = "time",
  int_mode = 1,
  nb_pt = 10,
  subdivisions = 100,
  parallel = FALSE
)

Arguments

df

the X(t) dataframe to evaluate Y on

curve_type

a character, the type of data, default NULL, need to be \'cat\' or \'num\'.

beta_real_func_or_list

a function or a list of functions beta(t), function used for the Y evaluation

beta_0_real

the intercept, default 5.4321

NotS_ratio

the Noise over total Signal ratio, default 0.2

seed

a integer value for the seed to be reproducible 123

id_col

a character of the id column, default 'id'

time_col

a character of the time column, default 'time'

int_mode

integration mode, 1 for integrate, 2 for pracma::trapz

nb_pt

number of points for the integration, default value : 10

subdivisions

default parameter of R function integrate; default value : 100

parallel

a boolean to use parallelization, default FALSE

Value

a dataframe of Y real and noised values

Author(s)

Francois Bassac

Examples

df = generate_X_df(nind=100, curve_type = 'cat')
beta_real_func<-function(t, end_time=100, drop = NULL){
return(beta_real = sin(t*2*pi/end_time + pi) * exp (1.5*t/end_time))
}
beta_0_real=5.4321
Y_df = generate_Y_df(df, curve_type = 'cat',
beta_real_func, beta_0_real, NotS_ratio=0.2)

generate_Y_df_CFD

Description

Usage

generate_Y_df_CFD(
  df,
  beta_real_func_or_list,
  beta_0_real = 5.4321,
  NotS_ratio = 0.2,
  id_col = "id",
  time_col = "time",
  int_mode = 1,
  nb_pt = 10,
  subdivisions = 100,
  seed = 123,
  parallel = FALSE
)

Arguments

df

the X(t) dataframe to evaluate Y on

beta_real_func_or_list

a function beta(t), or a list of function used for the Y evaluation

beta_0_real

the intercept, default 5.4321

NotS_ratio

the Noise over total Signal ratio, default 0.2

id_col

a character of the id column, default 'id'

time_col

a character of the time column, default 'time'

int_mode

integration mode, 1 for integrate, 2 for pracma::trapz

nb_pt

number of points for the integration, default value : 10

subdivisions

default parameter of R function integrate; default value : 100

seed

a integer, random seed

parallel

a boolean to enable parallel processing, default FALSE

Value

a dataframe of Y real and noised values

Author(s)

Francois Bassac

Examples

df = generate_X_df(nind=100, curve_type='cat')
beta_real_func<-function(t, end_time=100, drop = NULL){
return(beta_real = sin(t*2*pi/end_time + pi) * exp (1.5*t/end_time))
}
beta_0_real=5.4321
Y_df = generate_Y_df_CFD(df, beta_real_func, beta_0_real)

generate_Y_df_SFD

Description

Usage

generate_Y_df_SFD(
  df,
  beta_real_func,
  beta_0_real = 5.4321,
  NotS_ratio = 0.2,
  id_col = "id",
  time_col = "time",
  seed = 123
)

Arguments

df

the X(t) dataframe to evaluate Y on

beta_real_func

a function beta(t), function used for the Y evaluation

beta_0_real

the intercept, default 5.4321

NotS_ratio

the Noise over total Signal ratio, default 0.2

id_col

a character of the id column, default 'id'

time_col

a character of the time column, default 'time'

seed

a integer, random seed, 123

Value

a dataframe

Author(s)

Francois Bassac

generate_probabilities

Description

This function generates probabilities whose sum is 1.

Usage

generate_probabilities(N_proba)

Arguments

N_proba

a int the number of values requested

Value

a vector of the probabilities

Author(s)

Francois Bassac

Examples

generate_probabilities(3)
generate_probabilities(5)

gram_schmidt_orthonormalize

Description

Orthonormalize a basis functions with Gram-Schmidt algorithm.

Usage

gram_schmidt_orthonormalize(basis, output_type = "fdlist", tol = 1e-12)

Arguments

basis

A basis object from fda package or a list of fd functions.

output_type

A character to choose the output format. "fdlist" (default) or "funlist" for R functions.

tol

a float, tolerance parameter, default 1e-12

Value

A list of orthonormalized functions fd or func(t)

Author(s)

Francois Bassac

Examples


start = 0
end = 10
basis = create_bspline_basis(start, end, nbasis = 10, norder = 4)

basis_orth = gram_schmidt_orthonormalize(basis, "fdlist")

help_smoothPLS

Description

This function print some information on the package use.

Usage

help_smoothPLS()

Value

No return value, called for side effects (prints instructions to the console).

Author(s)

Francois Bassac

initial_state_determination

Description

This functions determine randomly the initial state of a categorical functional data, uniform probability between states.

Usage

initial_state_determination(N_states)

Arguments

N_states

a int the number of considered states

Value

a value of the designated state

Author(s)

Francois Bassac

Examples

initial_state_determination(2)
initial_state_determination(7)

is_orthogonal

Description

Check if a basis function is orthogonal

Usage

is_orthogonal(basis, tol = 1e-10)

Arguments

basis

A basis object from fda package or a list of fd functions.

tol

a float, tolerance parameter, default 1e-10)

Value

A boolean, TRUE if orthogonal, FALSE if not

Author(s)

Francois Bassac

is_orthonormal

Description

Check if a basis function is orthonormal

Usage

is_orthonormal(basis, tol = 1e-10)

Arguments

basis

A basis object from fda package or a list of fd functions.

tol

a float, tolerance parameter, default 1e-10)

Value

A boolean, TRUE if orthonormal, FALSE if not

Author(s)

Francois Bassac

lambda_determination

Description

This function determines a number of lambda parameters for exponential laws.

Usage

lambda_determination(N_states, lambda_values = c(0.05, 0.25))

Arguments

N_states

a int the number of states

lambda_values

a vector of min and max values authorized for lambda

Value

a numeric vector with the lambda values

Author(s)

Francois Bassac

Examples

lambda_determination(3)
lambda_determination(7)

mae_values

Description

This function evaluates the MAE error based on the values.

Usage

mae_values(y, y_hat)

Arguments

y

a vector of real values

y_hat

a vector of predicted values

Value

a value

Author(s)

Francois Bassac

Examples

y = c(1, 2, 4, 6, 8, 10)
y_hat = c(2, 3, 5, 7, 9, 11)
mae_values(y, y_hat)

multivariate_alpha_building

Description

This function builds the alphas matrix by columns binding the alpha of each curve. It returns the alphas matrix and a vector containing all the curves names.

Usage

multivariate_alpha_building(
  df_list,
  basis_list,
  curve_type_list,
  regul_time_list,
  id_col_list,
  time_col_list,
  print_steps = FALSE
)

Arguments

df_list

a list of data frame (id, time, state_or_value)

basis_list

a list of basis

curve_type_list

a list of curves

regul_time_list

a list of regul_time

id_col_list

a list if the characters for the id column

time_col_list

a list of the characters for the time column

print_steps

a boolean to print the current step, default FALSE

Value

a list of the alphas matrix and the curve_names vector and the new_basis_list

Author(s)

Francois Bassac

multivariate_assemble_basis_metric

Description

This function assemble the metrics of all the basis of the basis list for multivariate use case. It take care on the categorical case by multiplying the basis per state.

Usage

multivariate_assemble_basis_metric(
  df_list,
  basis_list,
  curve_list,
  id_col_list,
  time_col_list
)

Arguments

df_list

a list of dataframe (id, time, value_or_state)

basis_list

a list of basis fd object

curve_list

a list of the curves type 'cat' or 'num'

id_col_list

a list of the character of the id columns

time_col_list

a list of the character of the time columns

Value

a matrix of the metric to consider

Author(s)

Francois Bassac

naivePLS

Description

This function performs the naive PLS method for Categorical functional data, Scalar functional data and multivariate data.

Usage

naivePLS(
  df_list,
  Y,
  regul_time_obj = NULL,
  curve_type_obj = NULL,
  id_col_obj = "id",
  time_col_obj = "time",
  print_steps = FALSE,
  plot_rmsep = TRUE,
  print_nbComp = TRUE,
  plot_reg_curves = FALSE,
  validation = "LOO",
  jackknife = TRUE
)

Arguments

df_list

a list of dataframe (id, time, value_or_state)

Y

a numeric vector for the scalar response

regul_time_obj

a list of time regularization values

curve_type_obj

a list of the curve types 'cat' or 'num'

id_col_obj

a list of character of the names of the id columns

time_col_obj

a list of character of the names of the time columns

print_steps

a boolean to print the different steps, default FALSE

plot_rmsep

a boolean to plot the RMSEP, default TRUE

print_nbComp

a boolean to print the optimal number or components, default TRUE

plot_reg_curves

a boolean to plot the regression curves, default FALSE

validation

a character, pls::plsr input, default 'LOO'

jackknife

a boolean, pls::plsr input, default TRUE

Value

a list of ("plsr_model", "nbCP_opti", "curves_names", "opti_reg_coef", "reg_obj")

Author(s)

Francois Bassac

naivePLS_formatting

Description

This function format the list of given dataframe into a time regularized matrix usable for the naivePLS function or prediction.

Usage

naivePLS_formatting(
  df_list,
  regul_time_obj = NULL,
  curve_type_obj = NULL,
  id_col_obj = "id",
  time_col_obj = "time"
)

Arguments

df_list

a list of dataframe of 3 columns (id, time, state_or_value)

regul_time_obj

a list of vector of time regularization values

curve_type_obj

a list of character of the type of curves

id_col_obj

a list of character for the id column names

time_col_obj

a list of character for the time column names

Value

a list

Author(s)

Francois Bassacs

naivePLS_predict

Description

This function use the df_pred to make a prediction for a naivePLS object.

Usage

naivePLS_predict(
  naive_pls_obj,
  df_predict_list,
  regul_time_obj = NULL,
  curve_type_obj = NULL,
  id_col_obj = "id",
  time_col_obj = "time"
)

Arguments

naive_pls_obj

a list of a naivePLS object

df_predict_list

a list of dataframe (id, time, state_or_value)

regul_time_obj

a list of time regularization values

curve_type_obj

a list of curves types 'cat' or 'num'

id_col_obj

a list of id column names

time_col_obj

a list of time column names

Value

a numeric vector

Author(s)

Francois Bassac

number_of_test_id

Description

This function evaluates the number of individuals for a test set.

Usage

number_of_test_id(TTRatio = 0.2, nind = 100)

Arguments

TTRatio

Train Test ratio, default 0.2

nind

number of individuals, default 100

Value

int, the number of id for test set

Author(s)

Francois Bassac

Examples

number_of_test_id(TTRatio = 0.2, nind=100)

obj_list_creation

Description

This functions creates a list of basis containing the value of the number of states or curves sharing the same basis.

Usage

obj_list_creation(N_rep, obj)

Arguments

N_rep

a value of the number of states sharing the same basis

obj

a object

Value

a list of N_rep appended obj

Author(s)

Francois Bassac

Examples

N_rep = 4
start = 1
end = 51
nbasis = 13
norder = 3

basis = create_bspline_basis(start, end, nbasis, norder)
basis_list = obj_list_creation(N_rep, basis)
obj_list_creation(N_rep, 0:100)

orthonormalize_basis_list

Description

This function orthonormalized a basis from a list if needed.

Usage

orthonormalize_basis_list(basis_list, orth_list, tol = 1e-09)

Arguments

basis_list

a list of basis to orthonormalized

orth_list

a list of boolean per basis if orthonormalization is needed

tol

a float, tolerance parameter.

Value

a list of list of fd object : the orthonormalized functions.

Author(s)

Francois Bassac

p_w_building

Description

This functions evaluates the p_i(t) (X_i regression functions) and w_i(t) such as t_i = \int_0^T w_i(t)X_{i-1}(t) dt. The evaluation is done for all the components.

Usage

p_w_building(
  coefficient,
  N_curves_processed,
  new_basis_list,
  new_orth_basis_list,
  curves_names_list
)

Arguments

coefficient

a table of the coefficient to use

N_curves_processed

a integer, the real number of curves (1 sfd = 1 curve, 1 cfd = 1curve per state)

new_basis_list

a list of processed basis list (1 basis per curve_processed)

new_orth_basis_list

a list of orthonormalized basis list (1 basis per curve_processed)

curves_names_list

a list of curve name (1 name per curve_processed)

Value

a list of fd objects

Author(s)

Francois Bassac

plot_CFD_individuals

Description

This function only plot some individuals. It plots the first plot_individuals of df_to_plot. Works both for single state and multistates data (numerical states 1, 2, 3, etc)

Usage

plot_CFD_individuals(
  df_to_plot,
  n_ind_to_plot = 5,
  id_col = "id",
  time_col = "time",
  by_cfda = FALSE
)

Arguments

df_to_plot

dataframe whose individuals will be plotted

n_ind_to_plot

number of the first individuals to plot, default 5

id_col

col_name of df_to_plot for the id, not character, default id'

time_col

col_name of df_to_plot for the time, not character, default 'time'

by_cfda

a boolean to use cfda package function plotData, default FALSE

Value

a ggplot

Author(s)

Francois Bassac

Examples

df = generate_X_df()
plot_CFD_individuals(df, 5)
plot_CFD_individuals(df, 5, by_cfda = TRUE)

plot_fd_list

Description

This function plots on the same figure the fd curves from the fd_list by evaluating them on the given regul_time

Usage

plot_fd_list(fd_list, curves_names, regul_time)

Arguments

fd_list

a list of fd objects

curves_names

a list of the curves names

regul_time

a numeric vector

Value

a plot

Author(s)

Francois Bassac

plot_model_metrics_base

Description

This function plots some histograms for train_results and test_results

Usage

plot_model_metrics_base(
  train_results,
  test_results,
  models_to_plot = c("FPLS", "SmoothPLS", "NaivePLS"),
  n_digits = 3
)

Arguments

train_results

a dataframe of train results

test_results

a dataframe of test results

models_to_plot

a list of characters of the models to plot, default c("FPLS", "SmoothPLS", "NaivePLS")

n_digits

a integer for the number of significant numbers to print, default 3

Value

a plot

Author(s)

Francois Bassac

plot_real_and_smoothed_data_ind

Description

plot_real_and_smoothed_data_ind

Usage

plot_real_and_smoothed_data_ind(
  df_wide,
  df_fd,
  time_seq = 0:100,
  id = 1,
  col_list = c("blue", "red", "green", "yellow")
)

Arguments

df_wide

a wide dataframe, output of convert_to_wide_format()

df_fd

a list of functional data from df_wide

time_seq

a vector to plot on, default 0:100

id

a value of the id to plot

col_list

a list of color to separate the states

Value

No return value, called for side effects (generates a plot).

Author(s)

Francois Bassac

Examples

N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)

df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100,
lambdas,  transition_df)
df_processed = cat_data_to_indicator(df)

df_regul = list()
df_wide = list()
for(name in names(df_processed)){
print(paste0(name, " regularisation"))
df_regul[[name]] = regularize_time_series(df_processed[[name]],
time_seq =  c(0:100), curve_type = 'cat', id_col='id', time_col='time')

df_wide[[name]] = convert_to_wide_format(df_regul[[name]], id_col='id',
time_col='time')
}

basis = create_bspline_basis(0, 100, 10, 4)

df_fd = list()
for(name in names(df_wide)){
print(paste0(name, " fd transformation"))
df_fd[[name]] = fda::Data2fd(argvals = c(0:100),
y = t(df_wide[[name]][, -c(1)]), basis)
}

plot_real_and_smoothed_data_ind(df_wide, df_fd, c(0:100), id=1)

press_model This function evaluates the PRESS error 'LOO' of a lm or glm model.

Description

press_model This function evaluates the PRESS error 'LOO' of a lm or glm model.

Usage

press_model(model)

Arguments

model

a model from lm or glm

Value

a value

Author(s)

Francois Bassac

press_values

Description

This function evaluates the press error using values y and y_hat

Usage

press_values(y, y_hat)

Arguments

y

a vector of real values

y_hat

a vector of predicted values

Value

a value

Author(s)

Francois Bassac

Examples

y = c(1, 2, 4, 6, 8, 10)
y_hat = c(2, 3, 5, 7, 9, 11)
press_values(y, y_hat)

r_squared_values

Description

This function evaluates the R_2 using values y and y_hat

Usage

r_squared_values(y, y_hat)

Arguments

y

a vector of real values

y_hat

a vector of predicted values

Value

a value

Author(s)

Francois Bassac

Examples

y = c(1, 2, 4, 6, 8, 10)
y_hat = c(2, 3, 5, 7, 9, 11)
r_squared_values(y, y_hat)

reg_curve_funcPLS_evaluation

Description

This function evaluate the beta regression functions of the multivariate functional PLS.

Usage

reg_curve_funcPLS_evaluation(
  mfpls_mfd,
  nb_comp_pls_opt,
  root_metric,
  new_basis_list,
  curve_names,
  print_steps = FALSE
)

Arguments

mfpls_mfd

a MFPLS model

nb_comp_pls_opt

the optimal number of components

root_metric

the root metric

new_basis_list

a list of basis fd objects

curve_names

a list of curves to keep

print_steps

a boolean to cat or not the steps

Value

a list of the betas per state

Author(s)

Francois Bassac

regularize_time_series

Description

This function regularize the data on a new time interval time_seq given the curve_type. For curve_type = 'cat' the output state stay the same between 2 times. For curve_type = 'num' the intermediate values are interpolated linearly.

Usage

regularize_time_series(
  df,
  time_seq = 0:100,
  curve_type = NULL,
  id_col = "id",
  time_col = "time"
)

Arguments

df

dataframe with one or more different ids

time_seq

New time sequence where we want to regularize

curve_type

A string giving the type of the curve, 'cat' for a categorical functional data, 'num' for a scalar functional data, default : NULL

id_col

col_name of df for the id

time_col

col_name of df for the time

Value

a dataframe with the regularized data on time_seq

Author(s)

Francois Bassac

Examples

id_df = data.frame(id=rep(1,5), time=seq(0, 40, 10), state=c(0, 1, 1, 0, 1))
regularize_time_series(id_df, time_seq = seq(0, 40, 2), curve_type = 'cat')
regularize_time_series(id_df, time_seq = seq(0, 40, 2), curve_type = 'num')

regularize_time_series_CFD

Description

This function regularize the data CFD on a new time interval time_seq

Usage

regularize_time_series_CFD(
  df,
  time_seq = 0:100,
  id_col = "id",
  time_col = "time"
)

Arguments

df

dataframe with one or more different ids

time_seq

New time sequence where we want to regularize

id_col

col_name of df for the id

time_col

col_name of df for the time

Value

a dataframe with the regularized data on time_seq

Author(s)

Francois Bassac

regularize_time_series_SFD

Description

This function regularizes a Scalar Functional Data SFD on the time_seq input. This function uses linear interpolation.

Usage

regularize_time_series_SFD(
  df,
  time_seq = 0:100,
  id_col = "id",
  time_col = "time"
)

Arguments

df

dataframe with one or more different ids

time_seq

New time sequence where we want to regularize

id_col

col_name of df for the id

time_col

col_name of df for the time

Value

a regularized dataframe

Author(s)

Francois Bassac

remove_duplicate_states

Description

This function removes the duplicated states and keep the last line. this function works both with (0,1) or categorical states on the state_col!

Usage

remove_duplicate_states(data, id_col = "id", time_col = "time")

Arguments

data

a single- or multi-state dataframe.

id_col

a character for the id column, default 'id'

time_col

a character for the time column, default 'time'

Value

a dataframe with no duplicated states.

Author(s)

Francois Bassac

Examples

N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)

df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100,
lambdas, transition_df)
remove_duplicate_states(df)

select_from_fd_list

Description

This function return a list of functions of fd_list without the ones specified in the numeric vector toDrop.

Usage

select_from_fd_list(fd_list, toDrop = NULL)

Arguments

fd_list

a list of fd object

toDrop

a numeric vector

Value

a list of fd objects

Author(s)

Francois Bassac

smoothPLS_multi

Description

This function performs the smooth PLS algorithm for both the univariate case for a Scalar Functional Data or a Categorical Functional Data and the multivariate case for a mix of functional data of different nature (SFD or CFD). For some input, if the same value is needed for all the different curves, no need to make a list with the value per curve.

Usage

smoothPLS(
  df_list,
  Y,
  basis_obj,
  regul_time_obj = NULL,
  curve_type_obj,
  orth_obj = TRUE,
  id_col_obj = "id",
  time_col_obj = "time",
  int_mode = 1,
  print_steps = FALSE,
  plot_rmsep = TRUE,
  print_nbComp = TRUE,
  plot_reg_curves = FALSE,
  jackknife = TRUE,
  validation = "LOO",
  parallel = TRUE
)

Arguments

df_list

a list of dataframe (id, time, value_or_state)

Y

a vector of the scalar response

basis_obj

a basis fd object or a list of basis fd object

regul_time_obj

a vector for time regularization values or a list

curve_type_obj

a list or vector of the different curves types, 'cat' or 'num.

orth_obj

a list or a vector of booleans if the orthonormalization is needed

id_col_obj

a list or a vector of the id column name

time_col_obj

a list or a vector of time column name

int_mode

a integer of the integration method : 1 for integrate, 2 for pracma::trapz

print_steps

a boolean to print the algorithm steps

plot_rmsep

a boolean to plot the pls model RMSEP

print_nbComp

a boolean to print the optimal number of components

plot_reg_curves

a boolean to plot the regressions curves

jackknife

a boolean for the jackknife input of pls() function, default TRUE

validation

a character for the validation input of pls() function, default 'LOO'

parallel

a boolean to use parallelization, default TRUE

Value

a list of the plsr_model and the regression curves (and intercept).

Author(s)

Francois Bassac

smoothPLS_CFD_predict (Updated v0.1.4)

Description

Predicts the response variable for Categorical Functional Data (CFD) by integrating the regression coefficient function over active state intervals.

Usage

smoothPLS_CFD_predict(
  df_predict,
  delta_spls,
  id_col = "id",
  time_col = "time",
  subdivisions = 100,
  parallel = TRUE,
  ...
)

Arguments

df_predict

Dataframe containing columns for id, time, and state.

delta_spls

A list containing the scalar intercept and the functional regression coefficient (fd object).

id_col

Character, name of the id column, default 'id'.

time_col

Character, name of the time column, default 'time'.

subdivisions

integer, maximum number of sub-intervals for integration, default 100

parallel

a boolean to enable parallel processing, default TRUE.

...

Additional parameters passed to evaluate_id_func_integral (e.g., rel_tol, subdivisions).

Value

A numeric vector of predicted values for each individual.

Author(s)

Francois Bassac

smoothPLS_CFD_predict_para_v1 (Updated v0.1.2)

Description

Predicts the response variable for Categorical Functional Data (CFD) by integrating the regression coefficient function over active state intervals.

Usage

smoothPLS_CFD_predict_para_v1(
  df_predict,
  delta_spls,
  id_col = "id",
  time_col = "time",
  subdivisions = 100,
  parallel = TRUE,
  ...
)

Arguments

df_predict

Dataframe containing columns for id, time, and state.

delta_spls

A list containing the scalar intercept and the functional regression coefficient (fd object).

id_col

Character, name of the id column, default 'id'.

time_col

Character, name of the time column, default 'time'.

subdivisions

integer, maximum number of sub-intervals for integration, default 100

parallel

a boolean to enable parallel processing, default TRUE.

...

Additional parameters passed to evaluate_id_func_integral (e.g., rel_tol, subdivisions).

Value

A numeric vector of predicted values for each individual.

Author(s)

Francois Bassac

smoothPLS_SFD_predict

Description

Predicts the response Y for Scalar Functional Data using the analytic L2 inner product. This implementation follows the theory from Chapter 7.

Usage

smoothPLS_SFD_predict(
  df_predict,
  delta_spls,
  basis_obj = NULL,
  id_col = "id",
  time_col = "time",
  parallel = TRUE,
  ...
)

Arguments

df_predict

Dataframe with columns (id, time, value).

delta_spls

List containing (intercept, delta_fd_object).

basis_obj

Optional basis for signal reconstruction. If NULL, uses the basis from delta_fd

id_col

Character, name of id column.

time_col

Character, name of time column.

parallel

a boolean to enable parallel processing, default TRUE.

...

Additional arguments for Data2fd or inprod.

Value

A numeric vector of predicted values

Author(s)

Francois Bassac

smoothPLS_SFD_predict_para_v1

Description

Predicts the response Y for Scalar Functional Data using the analytic L2 inner product. This implementation follows the theory from Chapter 7.

Usage

smoothPLS_SFD_predict_para_v1(
  df_predict,
  delta_spls,
  basis_obj = NULL,
  id_col = "id",
  time_col = "time",
  parallel = TRUE,
  ...
)

Arguments

df_predict

Dataframe with columns (id, time, value).

delta_spls

List containing (intercept, delta_fd_object).

basis_obj

Optional basis for signal reconstruction. If NULL, uses the basis from delta_fd

id_col

Character, name of id column.

time_col

Character, name of time column.

parallel

a boolean to enable parallel processing, default TRUE.

...

Additional arguments for Data2fd or inprod.

Value

A numeric vector of predicted values

Author(s)

Francois Bassac

smoothPLS_multi_predict

Description

This function use the list of regression functions to make a prediction \hat{Y} = \delta_0 + \int_0^T \delta(t) X(t) dt

Usage

smoothPLS_predict(
  df_predict_list,
  delta_list,
  curve_type_obj = NULL,
  id_col_obj = "id",
  time_col_obj = "time",
  regul_time_obj = NULL,
  int_mode = 1,
  nb_pt = 10,
  subdivisions = 100,
  parallel = TRUE
)

Arguments

df_predict_list

a list of dataframe (id, time, value_or_state)

delta_list

a list of regression object (intercept, delta_1_fd, delta_2_fd, etc)

curve_type_obj

a list of characters of the curve types 'cat' or 'num'

id_col_obj

a list of character of the name of the id column, default 'id'

time_col_obj

a list of character of the name of the time column, default 'time'

regul_time_obj

a list of the time regularization values

int_mode

a integer for the integration mode, 1 for integrate, 2 for pracma::trapz

nb_pt

a integer, number of intermediate points for pracma::trapz, default 10

subdivisions

a integer, number of subdivision in integrate function, default 100

parallel

a boolean to use parallelization, default TRUE

Value

a numeric vector of the prediction

Author(s)

Francois Bassac

smoothPLS_predict_uni

Description

This function make a prediction base on a dataframe and a list made of the intercept and the regression curve. The input curve_type in needed to select the good way of evaluate the integrals \int_0^T delta(t) X(t) dt.

Usage

smoothPLS_predict_uni(
  df_predict,
  delta_list,
  curve_type = NULL,
  int_mode = 1,
  id_col = "id",
  time_col = "time",
  nb_pt = 10,
  subdivisions = 100,
  regul_time = seq(delta_list[[2]]$basis$rangeval[1], delta_list[[2]]$rangeval[2], 1),
  parallel = TRUE
)

Arguments

df_predict

a dataframe ('id', 'time', 'state or value') to predict from

delta_list

a list of delta_spls : list(intercept, delta_fd)

curve_type

a character, 'cat' for Categorical FD, 'num' for Scalar FD

int_mode

a value of the integration mode, default 1

id_col

a character for the id column, default 'id'

time_col

a character for the time column, default 'time'

nb_pt

number of points for the integration, default value : 10

subdivisions

default parameter of R function integrate; default value : 100

regul_time

a vector of time regularization values default delta_fd basis rangeval per 1, NEEDED for curve_type = 'num'!

parallel

a boolean to use parallelization, default TRUE

Value

a vector of predicted values \hat{Y} = \delta_0 + \int_0^T X(t) \delta(t) dt

Author(s)

Francois Bassac

split_in_state_df

Description

This function transform a categorical functional data with its indicator functions into a dedicated list of all the state (one per different state) This function will also work with character states.

Usage

split_in_state_df(data, id_col = "id", time_col = "time")

Arguments

data

a dataframe containing the indicator functions, output of state_indicator()

id_col

a character for the id column, default 'id'

time_col

a character for the time column, default 'time'

Value

a list containing the dataframe of the indicator function of each state.

Author(s)

Francois Bassac

Examples

N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)

df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100,
lambdas, transition_df)

si_df = state_indicator(df, id_col='id',
time_col='time')

split_df = split_in_state_df(si_df, id_col='id', time_col='time')

state_indicator

Description

This function takes functional categorical curve as input and transform it into as many indicator curves as the number of state input return DATAFRAME Works even on dataframe without time condition respected (same start and end) This function sort the states by ascending order (if numeric) and put the name 'state_X' as the column of the output concerning the 'X' state. This function will also work with character states. Now for the different lists, the ith element of a list concern the ith states ordered.

Usage

state_indicator(data, id_col = "id", time_col = "time")

Arguments

data

a multistates dataframe ('id', 'time', 'states')

id_col

a character for the id column, default 'id'

time_col

a character for the time column, default 'time'

Value

a dataframe with columns ('id', 'time', list of states_XX)

Author(s)

Francois Bassac

Examples

N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)

df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100,
lambdas, transition_df)

si_df = state_indicator(df, id_col='id',
time_col='time')

state_indicator_old

Description

Usage

state_indicator_old(data, id_col = "id", time_col = "time")

Arguments

data

a multistates dataframe ('id', 'time', 'states')

id_col

a character for the id column, default 'id'

time_col

a character for the time column, default 'time'

Value

a dataframe with columns ('id', 'time', list of states_XX)

Author(s)

Francois Bassac

Examples

N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)

df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100,
lambdas, transition_df)

si_df = state_indicator_old(df, id_col='id',
time_col='time')

test_basis_properties

Description

General function to test basis functions

Usage

test_basis_properties(basis, name = "Base", tol = 1e-10)

Arguments

basis

Basis to test, basis object or list of fd functions

name

Character, name of the basis

tol

Float, precision, default 1e-10

Value

a boolean

Author(s)

Francois Bassac

Examples

start = 0
end = 10
basis = create_bspline_basis(start, end, nbasis=10, norder=4)
test_basis_properties(basis, "cubic splines", tol = 1e-3)

basis_f = fda::create.fourier.basis(rangeval=c(start, end), nbasis=5)
test_basis_properties(basis_f, "Fourier", tol = 1e-3)

transfer_probabilities

Description

This function gives transfer probabilities between states. row -> columns.

Usage

transfer_probabilities(N_states)

Arguments

N_states

a int the number a states considered.

Value

a dataframe containing the transition probabilities

Author(s)

Francois Bassac

Examples

transfer_probabilities(3)
transfer_probabilities(5)

transition_matrix

Description

Build transition matrix between to basis.

Usage

transition_matrix(basis1, basis2)

Arguments

basis1

First basis, basis obj or list of fd functions

basis2

Second basis, basis obj or list of fd functions

Value

Transition matrix P such as basis1 = P * basis2

Author(s)

Francois Bassac

univariate_alpha_building

Description

This function build the alphas matrix for a dataframe of a curve_type curve on a basis on a regul_time vector.

Usage

univariate_alpha_building(
  df,
  basis,
  curve_type = NULL,
  regul_time,
  id_col = "id",
  time_col = "time"
)

Arguments

df

a dataframe (id, time, value_or_state)

basis

a basis fd object

curve_type

a character, 'cat' or 'num'

regul_time

a numeric vector for time regularization

id_col

a character for the id column name

time_col

a character for the time column name

Value

a matrix

Author(s)

Francois Bassac

Package {SmoothPLS}

assemble_basis_metric

Description

Usage

Arguments

Value

Author(s)

Examples

assert_funcPLS_inputs

Description

Usage

Arguments

Value

Author(s)

assert_multivariate_naivePLS_inputs

Description

Usage

Arguments

Value

Author(s)

assert_multivariate_smoothPLS_inputs

Description

Usage

Arguments

Value

Author(s)

beta_1_real_func

Description

Usage

Arguments

Value

Author(s)

Examples

beta_2_real_func

Description

Usage

Arguments

Value

Author(s)

Examples

beta_3_real_func

Description

Usage

Arguments

Value

Author(s)

Examples

beta_4_real_func

Description

Usage

Arguments

Value

Author(s)

Examples

beta_5_real_func

Description

Usage

Arguments

Value

Author(s)

Examples

beta_6_real_func

Description

Usage

Arguments

Value

Author(s)

Examples

beta_7_real_func

Description

Usage

Arguments

Value

Author(s)

Examples

beta_list_generation

Description

Usage

Arguments

Value