Type: | Package |
Title: | Techniques for Evaluating Clustering |
Description: | This package runs algorithms from different clustering packages and compares their results to determine which algorithm performs best on the data provided. See Martos, L.A.P., García-Vico, Á.M., González, P. et al. (2023) <doi:10.1007/s13748-022-00294-2> "Clustering: an R library to facilitate the analysis and comparison of cluster algorithms", Martos, L.A.P., García-Vico, Á.M., González, P. et al. "A Multiclustering Evolutionary Hyperrectangle-Based Algorithm" <doi:10.1007/s44196-023-00341-3> and Martos, L.A.P., García-Vico, Á.M., González, P. et al. "An Evolutionary Fuzzy System for Multiclustering in Data Streaming" <doi:10.1016/j.procs.2023.12.058>. |
Version: | 1.7.10 |
Date: | 2024-04-20 |
Author: | Luis Alfonso Perez Martos [aut, cre] (<https://orcid.org/0000-0002-5154-6105>) |
Maintainer: | Luis Alfonso Perez Martos <lapm0001@gmail.com> |
URL: | https://github.com/laperez/clustering |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.1 |
Repository: | CRAN |
Imports: | amap, apcluster, cluster, ClusterR, data.table, dplyr, foreach, future, ggplot2, gmp, methods, pracma, pvclust, shiny, sqldf, stats, tools, utils, xtable, toOrdinal |
Suggests: | DT, shinyalert, shinyFiles, shinyjs, shinythemes, shinyWidgets, tidyverse, shinycssloaders |
NeedsCompilation: | no |
Packaged: | 2024-04-22 18:47:54 UTC; luis |
Depends: | R (≥ 3.5.0) |
Date/Publication: | 2024-04-22 19:10:11 UTC |
Filter metrics in a clustering object returning a new clustering object.
Description
Generates a new filtered clustering object.
Usage
## S3 method for class 'clustering'
clustering[condition = TRUE]
Arguments
clustering |
The clustering object. |
condition |
Expression to filter the clustering object. |
Details
This function allows you to filter the data set for a given evaluation metric. The evaluation metrics available are: Algorithm, Distance, Clusters, Data, Var, Time, Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette and TimeAtt.
Value
A clustering object filtered from the input parameters.
Examples
result <- Clustering::clustering(df = Clustering::basketball, algorithm = 'clara',
min=3, max=4, metrics = c('Precision','Recall'))
result[Precision > 0.14 & Recall > 0.11]
Method that runs the aggExCluster algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the aggExCluster algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
aggExCluster_euclidean(dt, clusters, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
Method that runs the agnes algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the agnes algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
agnes_euclidean_method(dt, clusters, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
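The manual does not show what this wrapper does internally, so the following is only a sketch of the underlying idea under stated assumptions: it runs `cluster::agnes` with the Euclidean metric and cuts the resulting tree into a fixed number of clusters. The variable names `dt`, `k`, `fit` and `labels` are illustrative and not taken from the package.

```r
library(cluster)

# Illustrative inputs: the agriculture data shipped with the cluster package
# and an arbitrary number of clusters.
dt <- cluster::agriculture
k <- 3

# Agglomerative nesting on Euclidean dissimilarities.
fit <- cluster::agnes(dt, metric = "euclidean")

# Cut the dendrogram into k groups to obtain one label per observation.
labels <- stats::cutree(stats::as.hclust(fit), k = k)

# Cluster sizes.
table(labels)
```

The resulting label vector is the kind of input that internal metrics such as Dunn or Silhouette operate on.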
Method that runs the agnes algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Description
Method that runs the agnes algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Usage
agnes_manhattan_method(dt, clusters, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: entropy, variation_information, precision, recall, f_measure, fowlkes_mallows_index, connectivity, dunn, silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
amap package algorithms
Description
amap package algorithms
Usage
algorithm_amap()
Value
list with the algorithms
apcluster package algorithms
Description
apcluster package algorithms
Usage
algorithm_apcluster()
Value
list with the algorithms
cluster package algorithms
Description
cluster package algorithms
Usage
algorithm_cluster()
Value
list with the algorithms
ClusterR package algorithms
Description
ClusterR package algorithms
Usage
algorithm_clusterr()
Value
list with the algorithms
pvclust package algorithms
Description
pvclust package algorithms
Usage
algorithm_pvclust()
Value
list with the algorithms
Method that returns the list of used algorithms
Description
Method that returns the list of used algorithms
Usage
algorithms()
Value
algorithm listing array
Method that returns all the algorithms executed by the package
Description
Method that returns all the algorithms executed by the package
Usage
algorithms_package(packages)
Arguments
packages |
package array |
Value
array with the algorithms we're going to run
Method that runs the apclusterK algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the apclusterK algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
apclusterK_euclidean(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
Method that runs the apclusterK algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Description
Method that runs the apclusterK algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Usage
apclusterK_manhattan(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
Method that runs the apclusterK algorithm using the Minkowski metric to make an external or internal validation of the cluster.
Description
Method that runs the apclusterK algorithm using the Minkowski metric to make an external or internal validation of the cluster.
Usage
apclusterK_minkowski(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
Clustering GUI.
Description
Method that allows us to execute the main algorithm in graphic interface mode instead of through the console.
Usage
appClustering()
Details
This method generates a graphical user interface for running the clustering algorithm without needing to know its parameters beforehand. It is simple to use: change the values and see the resulting behavior quickly.
Value
GUI with the parameters of the algorithm and their representation in tables and graphs.
This data set contains a series of statistics (5 attributes) about 96 basketball players.
Description
This data set contains a series of statistics about basketball players.
Usage
data(basketball)
Format
A data frame with 96 observations on 5 variables:
- assists_per_minuteReal: average number of assists per minute
- heightInteger: height of the player
- time_playedReal: time played by the player
- ageInteger: age of the player in years
- points_per_minuteReal: average number of points per minute
Source
KEEL, <http://www.keel.es/>
Best rated external metrics.
Description
Method that searches, for each algorithm, for the attributes with the best external classification.
It looks for the external attributes that are best classified, making use of the Var column. In this way we discard the remaining attributes and only work with those that give the best response to the algorithm in question.
Usage
best_ranked_external_metrics(df)
Arguments
df |
Matrix or data frame with the result of running the clustering algorithm. |
Value
Returns a data.frame with the best classified external attributes.
Examples
result = Clustering::clustering(
df = cluster::agriculture,
min = 4,
max = 4,
algorithm='clara',
metrics=c("Recall")
)
Clustering::best_ranked_external_metrics(df = result)
Best rated internal metrics.
Description
Method that searches, for each algorithm, for the attributes with the best internal classification.
It looks for the internal attributes that are best classified, making use of the Var column. In this way we discard the remaining attributes and only work with those that give the best response to the algorithm in question.
Usage
best_ranked_internal_metrics(df)
Arguments
df |
Matrix or data frame with the result of running the clustering algorithm. |
Value
Returns a data.frame with the best classified internal attributes.
Examples
result = Clustering::clustering(
df = cluster::agriculture,
min = 4,
max = 5,
algorithm='gmm',
metrics=c("Recall")
)
Clustering::best_ranked_internal_metrics(df = result)
Data from an experiment on the effects of machine adjustments on the time to count bolts.
Description
A manufacturer of automotive accessories provides hardware, e.g. nuts, bolts, washers and screws, to fasten the accessory to the car or truck. Hardware is counted and packaged automatically. Specifically, bolts are dumped into a large metal dish. A plate that forms the bottom of the dish rotates counterclockwise. This rotation forces bolts to the outside of the dish and up along a narrow ledge. Due to the vibration of the dish caused by the spinning bottom plate, some bolts fall off the ledge and back into the dish. The ledge spirals up to a point where the bolts are allowed to drop into a pan on a conveyor belt. As a bolt drops, it passes by an electronic eye that counts it. When the electronic counter reaches the preset number of bolts, the rotation is stopped and the conveyor belt is moved forward.
Usage
data(bolts)
Format
A data frame with 40 observations on 8 variables:
- RUNInteger: the order in which the data were collected
- SPEED1Integer: a speed setting that controls the speed of rotation of the plate at the bottom of the dish
- TOTALInteger: total number of bolts (TOTAL) to be counted
- SPEED2Integer: a second speed setting that is used to change the speed of rotation (usually slowing it down) for the last few bolts
- NUMBER2Integer: the number of bolts to be counted at this second speed
- SENSInteger: the sensitivity of the electronic eye
- TIMEReal: the measured response, i.e. the time in seconds
- T20BOLTReal: the time to count 20 bolts, which is the response analyzed in order to put times on an equal footing
Details
There are several adjustments on the machine that affect its operation. These include: a speed setting that controls the speed of rotation (SPEED1Integer) of the plate at the bottom of the dish, a total number of bolts (TOTAL) to be counted, a second speed setting (SPEED2Integer) that is used to change the speed of rotation (usually slowing it down) for the last few bolts, the number of bolts to be counted at this second speed (NUMBER2Integer), and the sensitivity of the electronic eye (SENSInteger). The sensitivity setting is to ensure that the correct number of bolts is counted. Packaging too few bolts causes customer complaints. Packaging too many bolts increases costs. For each run conducted in this experiment the correct number of bolts was counted. From an engineering standpoint, if the correct number of bolts is counted, the sensitivity should not affect the time to count bolts. The measured response is the time (TIMEReal), in seconds, it takes to count the desired number of bolts. In order to put times on an equal footing, the response to be analyzed is the time to count 20 bolts (T20BOLTReal). The data cover 40 combinations of settings. RUNInteger is the order in which the data were collected.
Source
KEEL, <http://www.keel.es/>
Method that calculates the best rated external metrics.
Description
Method that calculates the best rated external metrics.
Usage
calculate_best_external_variables_by_metrics(df)
Arguments
df |
Data matrix or data frame. |
Value
Returns a table with the external metrics that have the best rating.
Method that calculates the best rated internal metrics.
Description
Method that calculates the best rated internal metrics.
Usage
calculate_best_internal_variables_by_metrics(df)
Arguments
df |
Data matrix or data frame. |
Value
Returns a table with the internal metrics that have the best rating.
Method that calculates which algorithm and which metric behaves best for the datasets provided.
Description
Method that calculates which algorithm and which metric behaves best for the datasets provided.
Usage
calculate_best_validation_external_by_metrics(df, metric)
Arguments
df |
Data matrix or data frame. |
metric |
String with the metric. |
Value
Return a table with the algorithm and the best performing metric for the datasets.
Method that calculates which algorithm and which metric behaves best for the datasets provided.
Description
Method that calculates which algorithm and which metric behaves best for the datasets provided.
Usage
calculate_best_validation_internal_by_metrics(df, metric)
Arguments
df |
Data matrix or data frame. |
metric |
String with the metric. |
Value
Return a table with the algorithm and the best performing metric for the datasets.
Method to calculate the Connectivity
Description
Method to calculate the Connectivity
Usage
calculate_connectivity(
distance = NULL,
clusters,
datadf = NULL,
neighbSize = 12,
method = "euclidean"
)
Arguments
distance |
Dissimilarity matrix. |
clusters |
Array containing the data grouped into clusters. |
datadf |
Dataframe with original data. |
neighbSize |
Number of neighbours. |
method |
Indicates the method for calculating distance between points. Default is euclidean. |
Value
Return a double with the result of the connectivity calculation.
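For orientation, the connectivity index as commonly defined in the clustering literature adds a penalty of 1/j whenever the j-th nearest neighbour of an observation falls in a different cluster, so lower values are better. The sketch below implements that textbook definition in base R. It is an assumption that the package computes the same quantity, and `connectivity_sketch` and `neighb_size` are hypothetical names.

```r
# Connectivity sketch: for every observation, penalise each of its
# `neighb_size` nearest neighbours that lies in a different cluster by 1 / rank.
connectivity_sketch <- function(x, clusters, neighb_size = 2,
                                method = "euclidean") {
  d <- as.matrix(stats::dist(x, method = method))
  n <- nrow(d)
  total <- 0
  for (i in seq_len(n)) {
    # Order the other observations by distance to i (i itself is excluded).
    others <- setdiff(seq_len(n), i)
    nn <- others[order(d[i, others])][seq_len(min(neighb_size, n - 1))]
    penalties <- ifelse(clusters[nn] != clusters[i], 1 / seq_along(nn), 0)
    total <- total + sum(penalties)
  }
  total
}

# Two well separated groups: no nearest neighbour crosses a cluster boundary.
x <- matrix(c(0, 0.1, 0.2, 10, 10.1, 10.2), ncol = 1)
connectivity_sketch(x, clusters = c(1, 1, 1, 2, 2, 2))  # 0
```

A partition that splits nearby points across clusters, for example `c(1, 2, 1, 2, 1, 2)` on the same data, yields a strictly positive value.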
Method to calculate the Dunn index.
Description
Method to calculate the Dunn index.
Usage
calculate_dunn(distance = NULL, clusters, datadf = NULL, method = "euclidean")
Arguments
distance |
Dissimilarity matrix. |
clusters |
Array containing the data grouped into clusters. |
datadf |
Dataframe with original data. |
method |
Indicates the method for calculating distance between points. Default is euclidean. |
Value
Returns a double with the result of the Dunn index calculation.
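As a point of reference, the Dunn index is usually defined as the smallest distance between observations in different clusters divided by the largest within-cluster distance, so higher values indicate compact, well separated clusters. The base R sketch below follows that textbook definition. Whether the package uses exactly this variant is an assumption, and `dunn_sketch` is a hypothetical name.

```r
# Dunn index sketch: minimum between-cluster distance divided by the
# maximum within-cluster distance (cluster diameter).
dunn_sketch <- function(x, clusters, method = "euclidean") {
  d <- as.matrix(stats::dist(x, method = method))
  ids <- unique(clusters)
  # Largest distance between two points of the same cluster.
  max_diam <- max(vapply(
    ids,
    function(k) max(d[clusters == k, clusters == k]),
    numeric(1)
  ))
  # Smallest distance between two points of different clusters.
  same <- outer(clusters, clusters, "==")
  min_sep <- min(d[!same])
  min_sep / max_diam
}

x <- matrix(c(0, 1, 10, 11), ncol = 1)
dunn_sketch(x, clusters = c(1, 1, 2, 2))  # 9 / 1 = 9
```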
Method that returns the value or variable depending on where it is in the calculated metrics.
Description
Method that returns the value or variable depending on where it is in the calculated metrics.
Usage
calculate_result(
algorith,
distance,
cluster,
dataset,
ranking,
timeExternal,
entropy,
variation_information,
precision,
recall,
fowlkes_mallows_index,
f_measure,
timeInternal,
dunn,
connectivity,
silhouette,
variables
)
Arguments
algorith |
Algorithm name. |
distance |
Name of the metric used to calculate the distance between points. |
cluster |
Number of clusters. |
dataset |
Name of dataset. |
ranking |
Position we want to obtain from the list of variables. |
timeExternal |
Array with the external validation calculation times of the clustering. |
entropy |
Array with the calculation of the entropy for each of the variables. |
variation_information |
Array with the calculation of the variation_information for each of the variables. |
precision |
Array with the calculation of the precision for each of the variables. |
recall |
Array with the calculation of the recall for each of the variables. |
fowlkes_mallows_index |
Array with the calculation of the fowlkes_mallows_index for each of the variables. |
f_measure |
Array with the calculation of the f_measure for each of the variables. |
timeInternal |
Array with the internal validation calculation times of the clustering. |
dunn |
Array with the calculation of the dunn for each of the variables. |
connectivity |
Array with the calculation of the connectivity for each of the variables. |
silhouette |
Array with the calculation of the silhouette for each of the variables. |
variables |
True if we want to show the value of the metric calculation and false if we want to show the variable. |
Value
Returns an array with the calculation of each metric based on the indicated position.
Method that returns the value or variable depending on where it is in the calculated metrics.
Description
Method that returns the value or variable depending on where it is in the calculated metrics.
Usage
calculate_result_internal(
algorith,
distance,
cluster,
dataset,
ranking,
timeInternal,
dunn,
connectivity,
silhouette,
variables
)
Arguments
algorith |
Algorithm name. |
distance |
Name of the metric used to calculate the distance between points. |
cluster |
Number of clusters. |
dataset |
Name of dataset. |
timeInternal |
Array with the internal validation calculation times of the clustering. |
dunn |
Array with the calculation of the dunn for each of the variables. |
connectivity |
Array with the calculation of the connectivity for each of the variables. |
silhouette |
Array with the calculation of the silhouette for each of the variables. |
variables |
True if we want to show the value of the metric calculation and false if we want to show the variable. |
Value
Returns an array with the calculation of each metric based on the indicated position.
Method that calculates which algorithm behaves best for the datasets provided.
Description
Method that calculates which algorithm behaves best for the datasets provided.
Usage
calculate_validation_external_by_metrics(df)
Arguments
df |
Data matrix or data frame. |
Value
Return a table with the best performing algorithm for the provided datasets.
Method that calculates which algorithm behaves best for the datasets provided.
Description
Method that calculates which algorithm behaves best for the datasets provided.
Usage
calculate_validation_internal_by_metrics(df)
Arguments
df |
Data matrix or data frame. |
Value
Return a table with the best performing algorithm for the provided datasets.
Method that runs the clara algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the clara algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
clara_euclidean_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
Method that runs the clara algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Description
Method that runs the clara algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Usage
clara_manhattan_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
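To make the idea concrete, the sketch below calls `cluster::clara` directly with the Manhattan metric and then computes one internal measure, the average silhouette width, on the resulting partition. This is only an illustration of the underlying calls. How the wrapper combines them is not shown in this manual, and the variable names are illustrative.

```r
library(cluster)

# Illustrative inputs: agriculture data from the cluster package, 3 clusters.
dt <- cluster::agriculture
k <- 3

# CLARA partitioning around medoids with Manhattan dissimilarities.
fit <- cluster::clara(dt, k = k, metric = "manhattan")
labels <- fit$clustering

# Internal validation example: average silhouette width of the partition,
# computed on the same Manhattan distances.
sil <- cluster::silhouette(labels, stats::dist(dt, method = "manhattan"))
avg_sil <- mean(sil[, "sil_width"])
avg_sil
```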
Clustering algorithm.
Description
Discovering the behavior of attributes in a set of clustering packages based on evaluation metrics.
Usage
clustering(
path = NULL,
df = NULL,
packages = NULL,
algorithm = NULL,
min = 3,
max = 4,
metrics = NULL
)
Arguments
path |
The path of the file. |
df |
Data matrix or data frame, or dissimilarity matrix. |
packages |
Character vector with the packages running the algorithm. |
algorithm |
Character vector with the algorithms implemented within the package. |
min |
An integer with the minimum number of clusters. This value is required to indicate the minimum number of clusters when grouping the data. The default value is 3. |
max |
An integer with the maximum number of clusters. This value is required to indicate the maximum number of clusters when grouping the data. The default value is 4. |
metrics |
Character vector with the metrics implemented to evaluate the distribution of the data in clusters. |
Details
This algorithm evaluates how the attributes of a dataset or a set of datasets behave in different clustering algorithms. To do this, it is necessary to indicate the type of evaluation you want to make on the distribution of the data. To execute the algorithm it is necessary to indicate the number of clusters (min and max), the algorithms (algorithm) or packages (packages) that we want to cluster with, and the metrics (metrics).
Value
A matrix with the result of running all the metrics of the algorithms contained in the packages indicated. We also obtain information about the types of metrics, algorithms and packages executed.
- result: A list with the algorithms, metrics and variables defined in the execution of the algorithm.
- has_internal_metrics: Boolean field indicating whether there are internal metrics such as Dunn, silhouette and connectivity.
- has_external_metrics: Boolean field indicating whether there are external metrics such as precision, recall, F-measure, entropy, variation information and Fowlkes-Mallows.
- algorithms_execute: Character vector with the algorithms executed. These algorithms have been mentioned in the definition of the parameters.
- measures_execute: Character vector with the measures executed. These measures have been mentioned in the definition of the parameters.
Examples
Clustering::clustering(
df = cluster::agriculture,
min = 3,
max = 3,
algorithm='clara',
metrics=c('Precision')
)
Method to calculate the connectivity.
Description
Method to calculate the connectivity.
Usage
connectivity_metric(distance, clusters_vector, dt, method)
Arguments
distance |
Dissimilarity matrix. |
clusters_vector |
Array containing the data grouped into clusters. |
dt |
Dataframe with original data. |
method |
Indicates the method for calculating distance between points. |
Value
Return a double with the result of the connectivity calculation.
Method that converts a matrix into numerical format.
Description
Method that converts a matrix into numerical format.
Usage
convert_numeric_matrix(datas)
Arguments
datas |
information matrix. |
Value
return a matrix in numeric format.
Method in charge of creating a table from an array with the values of the variable used as a sample and another with the classification of the values.
Description
Method in charge of creating a table from an array with the values of the variable used as a sample and another with the classification of the values.
Usage
convert_table(clusters_vector, column_dataset_label)
Arguments
clusters_vector |
Array of the variable used for the classification. |
column_dataset_label |
Array with the grouping of the values. |
Value
Return a table with the grouping of both arrays.
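A cross-tabulation of cluster assignments against known labels is the usual starting point for external measures such as precision, recall or entropy. The sketch below builds one with base R's `table()`. It is an assumption that the package does something equivalent, and the two toy vectors are made up for illustration.

```r
# Toy data: cluster assignment and known label for five observations.
clusters_vector <- c(1, 1, 2, 2, 2)
column_dataset_label <- c("a", "a", "a", "b", "b")

# Contingency table: rows are clusters, columns are labels.
tab <- table(cluster = clusters_vector, label = column_dataset_label)
tab

# Share of each label inside every cluster (rows sum to 1).
prop.table(tab, margin = 1)
```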
Method to convert columns to ordinal.
Description
Method to convert columns to ordinal.
Usage
convert_toOrdinal(df)
Arguments
df |
data frame with the results. |
Value
Data frame with its columns converted to ordinal.
Method that runs the daisy algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the daisy algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
daisy_euclidean_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
Method that runs the daisy algorithm using the Gower metric to make an external or internal validation of the cluster.
Description
Method that runs the daisy algorithm using the Gower metric to make an external or internal validation of the cluster.
Usage
daisy_gower_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
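`cluster::daisy` itself only produces a dissimilarity matrix. The Gower metric is useful because it handles mixed numeric and categorical columns. The sketch below computes Gower dissimilarities on a small made-up data frame and, as an assumption about what a subsequent grouping step could look like, feeds them to hierarchical clustering. The manual does not state which grouping step the package applies.

```r
library(cluster)

# Made-up mixed-type data: one numeric and one categorical column.
dt <- data.frame(
  num = c(1, 2, 10, 11),
  cat = factor(c("a", "a", "b", "b"))
)

# Gower dissimilarities cope with numeric and factor columns together.
d <- cluster::daisy(dt, metric = "gower")

# Assumed grouping step: complete-linkage hierarchical clustering, cut at k = 2.
tree <- stats::hclust(stats::as.dist(d))
labels <- stats::cutree(tree, k = 2)
labels
```

Rows 1 and 2 end up in one cluster and rows 3 and 4 in the other, since both the numeric and the categorical column separate them.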
Method that runs the daisy algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Description
Method that runs the daisy algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Usage
daisy_manhattan_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
Method to filter only the external measurement columns
Description
Method to filter only the external measurement columns
Usage
dataframe_by_metrics_evaluation(data, external = TRUE)
Arguments
data |
information matrix. |
external |
boolean indicating whether it is an external measurement. |
Value
returns a data frame with the filtered columns.
Method in charge of detecting the limit of a dataset header.
Description
Method in charge of detecting the limit of a dataset header.
Usage
detect_definition_attribute(path)
Arguments
path |
Path of the dataset. |
Value
The row where the dataset attributes definition ends
Method that runs the diana algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the diana algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
diana_euclidean_method(dt, clusters, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer giving the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Returns a list with both the internal and external evaluation of the grouping.
Method to calculate the Dunn index.
Description
Method to calculate the Dunn index.
Usage
dunn_metric(dist, clusters_vector, dt, me)
Arguments
dist |
Dissimilarity matrix. |
clusters_vector |
Array containing the data grouped into clusters. |
dt |
Dataframe with original data. |
me |
Indicates the method for calculating distance between points. |
Value
Returns a double with the result of the Dunn index calculation.
Method for calculating entropy.
Description
Method for calculating entropy.
Usage
entropy_formula(x_vec)
Arguments
x_vec |
Vector with the data used to calculate the entropy. |
Value
An array with the calculated entropy.
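For reference, Shannon entropy of a vector of counts can be computed as in the sketch below. The normalisation to proportions and the base-2 logarithm are assumptions and may not match the package's own formula. `entropy_sketch` is a hypothetical name.

```r
# Shannon entropy sketch for a vector of non-negative counts.
entropy_sketch <- function(x_vec) {
  p <- x_vec / sum(x_vec)  # turn counts into proportions
  p <- p[p > 0]            # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

entropy_sketch(c(5, 5))   # 1: two equally likely outcomes
entropy_sketch(c(10, 0))  # 0: a single certain outcome
```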
Method to calculate the entropy.
Description
Method to calculate the entropy.
Usage
entropy_metric(conversion_data_frame, table_convert, column_dataset_label)
Arguments
conversion_data_frame |
A double with the result of the entropy calculation. |
table_convert |
Table conversion (variable - cluster). |
column_dataset_label |
Array with the calculation of the clustering algorithm. |
Value
Return a double with the result of the entropy calculation.
Method in charge of calculating the average for all datasets using all the algorithms defined in the application.
Description
Method in charge of calculating the average for all datasets using all the algorithms defined in the application.
Usage
evaluate_all_column_dataset(datas, method, cluster, nameDataset, metrics)
Arguments
datas |
Data frame or matrix. |
method |
Describes the metrics used by each of the algorithms. |
cluster |
Number of clusters. |
nameDataset |
Name of the dataset, for informational purposes. |
metrics |
Array with internal or external metrics. |
Value
A list with the results of the external and internal validation applied to the algorithms.
Evaluates algorithms by measures of dissimilarity based on a metric.
Description
Method that calculates which algorithm and which metric behaves best for the datasets provided.
Usage
evaluate_best_validation_external_by_metrics(df, metric)
Arguments
df |
Data matrix or data frame with the result of running the clustering algorithm. |
metric |
String with the metric. |
Details
This method groups the data by algorithm and distance measure, instead of obtaining the best attribute from the data set.
Value
A data.frame with the algorithms classified by measures of dissimilarity.
Examples
result = Clustering::clustering(
df = cluster::agriculture,
min = 4,
max = 5,
algorithm='kmeans_rcpp',
metrics=c("F_measure"))
Clustering::evaluate_best_validation_external_by_metrics(result,'F_measure')
Evaluates algorithms by measures of dissimilarity based on a metric.
Description
Method that calculates which algorithm and which metric behaves best for the datasets provided.
Usage
evaluate_best_validation_internal_by_metrics(df, metric)
Arguments
df |
Data matrix or data frame with the result of running the clustering algorithm. |
metric |
It's a string with the metric to evaluate. |
Details
This method groups the data by algorithm and distance measure, instead of obtaining the best attribute from the data set.
Value
A data.frame with the algorithms classified by measures of dissimilarity.
Examples
result = Clustering::clustering(
df = cluster::agriculture,
min = 4,
max = 5,
algorithm='gmm',
metrics=c("Precision","Connectivity")
)
Clustering::evaluate_best_validation_internal_by_metrics(result,"Connectivity")
Evaluate external validations by algorithm.
Description
Method that calculates which algorithm behaves best for the datasets provided.
Usage
evaluate_validation_external_by_metrics(df)
Arguments
df |
data matrix or data frame with the result of running the clustering algorithm. |
Details
It groups the results of the execution by algorithms.
Value
A data.frame with all the algorithms that obtain the best results regardless of the dissimilarity measure used.
Examples
result = Clustering::clustering(
df = cluster::agriculture,
min = 4,
max = 4,
algorithm='kmeans_arma',
metrics=c("Precision")
)
Clustering::evaluate_validation_external_by_metrics(result)
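The idea of grouping results by algorithm and keeping the best value, regardless of the dissimilarity measure, can be sketched with base R. The data frame below is made up purely for illustration (the values are not package output), and `aggregate()` stands in for whatever grouping the package performs internally.

```r
# Made-up results table for illustration only; these are not real package results.
results <- data.frame(
  Algorithm = c("gmm", "gmm", "pam", "pam"),
  Distance  = c("euclidean", "manhattan", "euclidean", "manhattan"),
  Precision = c(0.6, 0.7, 0.8, 0.5)
)

# Best Precision per algorithm, ignoring which distance measure produced it.
best <- aggregate(Precision ~ Algorithm, data = results, FUN = max)
best
```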
Evaluate internal validations by algorithm.
Description
Method that calculates which algorithm behaves best for the datasets provided.
Usage
evaluate_validation_internal_by_metrics(df)
Arguments
df |
data matrix or data frame with the result of running the clustering algorithm. |
Details
It groups the results of the execution by algorithms.
Value
A data.frame with all the algorithms that obtain the best results regardless of the dissimilarity measure used.
Examples
result = Clustering::clustering(
df = cluster::agriculture,
min = 4,
max = 5,
algorithm='kmeans_rcpp',
metrics=c("Recall","Silhouette")
)
Clustering::evaluate_validation_internal_by_metrics(result)
Clustering::evaluate_validation_internal_by_metrics(result$result)
Evaluation clustering algorithm.
Description
Method that performs the information processing.
Usage
execute_datasets(
path,
df,
packages,
algorithm,
cluster_min,
cluster_max,
metrics,
attributes,
name_dataframe
)
Arguments
path |
Path where the datasets are located. |
df |
Data matrix or data frame, or dissimilarity matrix, depending on the value of the argument. |
packages |
Array defining the clustering packages. The packages implemented are: cluster, ClusterR, amap, apcluster, pvclust. By default all packages are run. |
algorithm |
Array with the algorithms that the packages implement. The algorithms implemented are: hclust, apclusterK, agnes, clara, daisy, diana, fanny, mona, pam, gmm, kmeans_arma, kmeans_rcpp, mini_kmeans, pvclust. |
cluster_min |
Minimum number of clusters. Must be at least one. |
cluster_max |
Maximum number of clusters. cluster_max must be greater than or equal to cluster_min. |
metrics |
Array defining the metrics available in the package. The nine metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette. |
name_dataframe |
Name of the data.frame when df is supplied. |
Value
Returns a matrix with the result of running all the metrics of the algorithms contained in the packages we indicated.
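The constraints stated for cluster_min and cluster_max can be expressed as a small check. The helper `check_cluster_range` is hypothetical and only illustrates the documented rules; it is not how the package necessarily validates its input.

```r
# Hypothetical helper (not part of the package): enforce the documented constraints.
check_cluster_range <- function(cluster_min, cluster_max) {
  stopifnot(
    is.numeric(cluster_min), is.numeric(cluster_max),
    cluster_min >= 1,            # at least one cluster
    cluster_max >= cluster_min   # the maximum cannot be below the minimum
  )
  seq(cluster_min, cluster_max)  # every number of clusters that would be evaluated
}

valid_range <- check_cluster_range(4, 5)
valid_range

# An invalid range raises an error instead of returning a sequence.
invalid <- try(check_cluster_range(5, 4), silent = TRUE)
```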
Evaluation clustering algorithm.
Description
Method that evaluates clustering algorithm from a file directory or dataframe.
Usage
execute_package_parallel(
directory_files,
df,
algorithms_execute,
measures_execute,
cluster_min,
cluster_max,
metrics_execute,
attributes,
number_algorithms,
numberClusters,
numberDataSets,
is_metric_external,
is_metric_internal,
name_dataframe
)
Arguments
directory_files |
String with the path where the datasets are located. |
df |
Data matrix or data frame, or dissimilarity matrix, depending on the value of the argument. |
algorithms_execute |
Character vector with the algorithms to be executed. The algorithms implemented are: hclust, apclusterK, agnes, clara, daisy, diana, fanny, mona, pam, gmm, kmeans_arma, kmeans_rcpp, mini_kmeans, pvclust. |
measures_execute |
Character array with the dissimilarity measures to be executed. The measures available depend on the algorithm; examples include Euclidean and Manhattan. |
cluster_min |
Minimum number of clusters. |
cluster_max |
Maximum number of clusters. cluster_max must be greater than or equal to cluster_min. |
metrics_execute |
Character array defining the metrics to be executed. The nine metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette. |
number_algorithms |
It's a numeric field with the number of algorithms. |
numberClusters |
It's a numeric field with the difference between clusters. |
numberDataSets |
It's a numeric field with the number of datasets. |
is_metric_external |
Boolean field to indicate whether to run external metrics. |
is_metric_internal |
Boolean field to indicate whether to run internal metrics. |
name_dataframe |
Name of the data.frame when df is supplied. |
Value
Returns a list with the result matrix of evaluating the data from the indicated algorithms, metrics and number of clusters.
Export the results of external metrics to LaTeX.
Description
Method that exports the results of external measurements in LaTeX format to a file.
Usage
export_file_external(df, path = NULL)
Arguments
df |
Data frame that contains a table in LaTeX format with the results of the external validations. |
path |
String with the path to the directory where the LaTeX file is to be stored. |
Details
This method exports the results of the clustering algorithm as a LaTeX table, which is useful when the results need to be included in a LaTeX document.
Value
A file in LaTeX format with the results of the external metrics.
Examples
result = Clustering::clustering(
df = cluster::agriculture,
min = 4,
max = 5,
algorithm='gmm',
metrics=c("Precision")
)
Clustering::export_file_external(result)
file.remove("external_data.tex")
Export the results of internal metrics to LaTeX.
Description
Method that exports the results of internal measurements in LaTeX format to a file.
Usage
export_file_internal(df, path = NULL)
Arguments
df |
Data frame that contains a table in LaTeX format with the results of the internal validations. |
path |
String with the path to the directory where the LaTeX file is to be stored. |
Details
This method exports the results of the clustering algorithm as a LaTeX table, which is useful when the results need to be included in a LaTeX document.
Value
A file in LaTeX format with the results of the internal metrics.
Examples
result = Clustering::clustering(
df = cluster::agriculture,
min = 4,
max = 5,
algorithm='gmm',
metrics=c("Recall","Dunn")
)
Clustering::export_file_internal(result)
file.remove("internal_data.tex")
Method that returns the extension of a file.
Description
Method that returns the extension of a file.
Usage
extension_file(path)
Arguments
path |
dataset directory |
Value
Returns the extension of the file.
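Base R already ships a function for this task, `tools::file_ext()`, which can serve as a point of comparison. The wrapper name `extension_sketch` and the example paths are hypothetical; whether the package relies on this function internally is not stated here.

```r
# Hypothetical wrapper (not part of the package) around tools::file_ext().
extension_sketch <- function(path) {
  tools::file_ext(path)  # text after the last dot, or "" when there is none
}

csv_ext <- extension_sketch("datasets/iris.csv")
no_ext <- extension_sketch("datasets/README")
c(csv_ext, no_ext)
```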
Method that applies different external metrics to a data frame or matrix, for example precision, recall, etc.
Description
Method that applies different external metrics to a data frame or matrix, for example precision, recall, etc.
Usage
external_validation(column_dataset_label, clusters_vector, metric = CONST_NULL)
Arguments
column_dataset_label |
Array containing the distribution of the data in the cluster. |
clusters_vector |
Array that contains the data grouped into clusters. |
metric |
Array with external metric types. |
Value
Return a list of the external results initialized to zero.
Method that runs the fanny algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the fanny algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
fanny_euclidean_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that runs the fanny algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Description
Method that runs the fanny algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Usage
fanny_manhattan_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that fills a vector.
Description
Method that fills a vector.
Usage
fill_cluster_vector(data, appcluster)
Arguments
data |
Matrix or data frame with the dataset. |
appcluster |
Data with the information of the appcluster object. |
Value
A vector filled with information.
Method to calculate the f_measure.
Description
Method to calculate the f_measure.
Usage
fmeasure_metric(true_positive, false_positive, false_negative)
Arguments
true_positive |
Array with the matching elements of B that are in the same cluster. |
false_positive |
Array with the non-matching elements of B that are in the same cluster. |
false_negative |
Array with the matching elements of B that are not in the same cluster. |
Value
Returns a double with the f_measure calculation.
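The standard F-measure is the harmonic mean of precision and recall. The sketch below computes it from scalar counts; the helper name `f_measure_sketch` and the example counts are hypothetical, and the package function may accept arrays rather than scalars.

```r
# Hypothetical helper (not part of the package): F-measure from scalar counts.
f_measure_sketch <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)  # share of predicted positives that are correct
  recall <- tp / (tp + fn)     # share of actual positives that are found
  2 * precision * recall / (precision + recall)
}

f_value <- f_measure_sketch(tp = 6, fp = 2, fn = 4)
f_value
```

The same value can be written as 2 * tp / (2 * tp + fp + fn), which avoids computing precision and recall separately.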
Method to calculate the fowlkes and mallows.
Description
Method to calculate the fowlkes and mallows.
Usage
fowlkes_mallows_index_metric(true_positive, false_positive, false_negative)
Arguments
true_positive |
Array with the matching elements of B that are in the same cluster. |
false_positive |
Array with the non-matching elements of B that are in the same cluster. |
false_negative |
Array with the matching elements of B that are not in the same cluster. |
Value
Returns a double with the fowlkes_mallows_index calculation.
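The Fowlkes-Mallows index is conventionally defined as the geometric mean of precision and recall. A minimal sketch from scalar counts follows; the helper name `fowlkes_mallows_sketch` and the example counts are hypothetical.

```r
# Hypothetical helper (not part of the package): Fowlkes-Mallows from scalar counts.
fowlkes_mallows_sketch <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  sqrt(precision * recall)  # geometric mean of precision and recall
}

fm_value <- fowlkes_mallows_sketch(tp = 6, fp = 2, fn = 4)
fm_value
```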
Method that runs the gmm algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the gmm algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
gmm_euclidean_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that runs the gmm algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Description
Method that runs the gmm algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Usage
gmm_manhattan_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that runs the hcluster algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the hcluster algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
hclust_euclidean(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that returns an array with the external information of the cluster
Description
Method that returns an array with the external information of the cluster
Usage
information_external(metrics, information, size, variables)
Arguments
metrics |
array with the metrics used in the execution of the package |
information |
list with external clustering information |
size |
external number of columns |
variables |
Null returns the position of the variable, otherwise it returns the value of the variable |
Value
array with the information from the calculation of the external evaluation of the clustering
Method that returns an array with the internal information of the cluster
Description
Method that returns an array with the internal information of the cluster
Usage
information_internal(metrics, information, size, variables)
Arguments
metrics |
array with the metrics used in the execution of the package |
information |
list with internal clustering information |
size |
internal number of columns |
variables |
Null returns the position of the variable, otherwise it returns the value of the variable |
Value
array with the information from the calculation of the internal evaluation of the clustering
Method that returns a list of external validation values initialized to zero.
Description
Method that returns a list of external validation values initialized to zero.
Usage
initializeExternalValidation()
Value
A list of all values set to zero.
Method that returns a list of internal validation values initialized to zero.
Description
Method that returns a list of internal validation values initialized to zero.
Usage
initializeInternalValidation()
Value
A list of all values set to zero.
Method that applies different internal metrics to a data frame or matrix, for example Dunn, connectivity, etc.
Description
Method that applies different internal metrics to a data frame or matrix, for example Dunn, connectivity, etc.
Usage
internal_validation(
distance = NULL,
clusters_vector,
dataf = NULL,
method = CONST_EUCLIDEAN,
metric = NULL
)
Arguments
distance |
Dissimilarity matrix. |
clusters_vector |
Array that contains the data grouped into clusters. |
dataf |
Dataframe with original data. |
method |
Indicates the method for calculating distance between points. |
metric |
Array with internal metric types. |
Value
Return a list of the internal results initialized to zero.
Method that checks for external metrics
Description
Method that checks for external metrics
Usage
is_External_Metrics(metrics)
Arguments
metrics |
array with the metrics used in the execution of the package |
Value
true if it exists and false otherwise
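A membership check of this kind can be sketched with `%in%`. The split of the nine documented metrics into external and internal groups shown below is an assumption based on the usual definitions (external metrics need reference labels) and on the package examples; the helper name `has_external_sketch` is hypothetical.

```r
# Assumed grouping of the documented metric names (not confirmed by the package source).
external_names <- c("Entropy", "Variation_information", "Precision",
                    "Recall", "F_measure", "Fowlkes_mallows_index")

# Hypothetical helper (not part of the package): TRUE if any requested metric is external.
has_external_sketch <- function(metrics) {
  any(metrics %in% external_names)
}

mixed <- has_external_sketch(c("Dunn", "Precision"))
internal_only <- has_external_sketch(c("Silhouette", "Connectivity"))
c(mixed = mixed, internal_only = internal_only)
```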
Method that checks for internal metrics
Description
Method that checks for internal metrics
Usage
is_Internal_Metrics(metrics)
Arguments
metrics |
array with the metrics used in the execution of the package |
Value
true if it exists and false otherwise
Method that runs the kmeans_arma algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the kmeans_arma algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
kmeans_arma_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that runs the kmeans_rcpp algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the kmeans_rcpp algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
kmeans_rcpp_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that returns the maximum value of a metric.
Description
Method that returns the maximum value of a metric.
Usage
max_value_metric(df, metric, isExternalMetric)
Arguments
df |
Data matrix or data frame. |
metric |
Metric to evaluate. |
Value
The maximum value of the metric column.
Metrics of the amap algorithm
Description
Metrics of the amap algorithm
Usage
measure_amap()
Value
list with the metrics
Metrics of the apcluster algorithm
Description
Metrics of the apcluster algorithm
Usage
measure_apcluster()
Value
list with the metrics
Method that returns all the measures executed by the package from the indicated algorithms
Description
Method that returns all the measures executed by the package from the indicated algorithms
Usage
measure_calculate(algorithm)
Arguments
algorithm |
algorithms array |
Value
array with the measures we're going to run
Metrics of the cluster algorithm
Description
Metrics of the cluster algorithm
Usage
measure_cluster()
Value
list with the metrics
Metrics of the ClusterR algorithm
Description
Metrics of the ClusterR algorithm
Usage
measure_clusterr()
Value
list with the metrics
Method that returns all the measures executed by the package
Description
Method that returns all the measures executed by the package
Usage
measure_package(package)
Arguments
package |
package array |
Value
array with the measures we're going to run
Metrics of the pvclust algorithm
Description
Metrics of the pvclust algorithm
Usage
measure_pvclust()
Value
list with the metrics
Method in charge of verifying the implemented metrics
Description
Method in charge of verifying the implemented metrics
Usage
metrics_calculate(metrics, variables, internal, external)
Arguments
metrics |
array with the metrics used in the execution of the package |
variables |
boolean field that indicates if it should show the results of the variables |
Value
list of metrics
Method that returns the list of used external metrics
Description
Method that returns the list of used external metrics
Usage
metrics_external()
Value
external metrics listing array
Method that returns the list of used internal metrics
Description
Method that returns the list of used internal metrics
Usage
metrics_internal()
Value
internal metrics listing array
Method that returns the list of used metrics
Description
Method that returns the list of used metrics
Usage
metrics_validate()
Value
metrics listing array
Method that runs the mini_kmeans algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the mini_kmeans algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
mini_kmeans_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that runs the mona algorithm using external or internal validation of the cluster.
Description
Method that runs the mona algorithm using external or internal validation of the cluster.
Usage
mona_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that returns how many external metrics there are in the array of metrics used in the calculation
Description
Method that returns how many external metrics there are in the array of metrics used in the calculation
Usage
number_columnas_external(metrics)
Arguments
metrics |
array with the metrics used in the execution of the package |
Value
returns the number of occurrences
Method that returns how many internal metrics there are in the array of metrics used in the calculation
Description
Method that returns how many internal metrics there are in the array of metrics used in the calculation
Usage
number_columnas_internal(metrics)
Arguments
metrics |
array with the metrics used in the execution of the package |
Value
returns the number of occurrences
Method that returns the number of variables in a dataset directory
Description
Method that returns the number of variables in a dataset directory
Usage
number_variables_dataset(path)
Arguments
path |
dataset directory |
Value
returns the number of variables in a dataset directory
Method that returns the list of used packages
Description
Method that returns the list of used packages
Usage
packages()
Value
package listing array
Method that runs the pam algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the pam algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
pam_euclidean_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that runs the pam algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Description
Method that runs the pam algorithm using the Manhattan metric to make an external or internal validation of the cluster.
Usage
pam_manhattan_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that returns a list of the files that exist in a directory.
Description
Method that returns a list of the files that exist in a directory.
Usage
path_dataset(directory)
Arguments
directory |
Path of the directory. |
Value
A vector with the files that exist in the directory.
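Listing the files of a directory is available in base R through `list.files()`. The sketch below creates a throwaway directory under `tempdir()` so that it can run anywhere; the directory and file names are made up for illustration.

```r
# Build a temporary directory with two empty files (names are illustrative only).
dataset_dir <- file.path(tempdir(), "datasets_sketch")
dir.create(dataset_dir, showWarnings = FALSE)
file.create(file.path(dataset_dir, c("a.csv", "b.dat")))

# Full paths of every file in the directory, sorted alphabetically.
files <- list.files(dataset_dir, full.names = TRUE)
basename(files)
```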
Graphic representation of the evaluation measures.
Description
Graphical representation of the evaluation measures grouped by cluster.
Usage
plot_clustering(df, metric)
Arguments
df |
data matrix or data frame with the result of running the clustering algorithm. |
metric |
String with the name of the metric selected for evaluation. |
Details
Reviewing or filtering the results is sometimes necessary before selecting among them, and a graphical representation makes this task easier. This method filters the results by metric and displays them graphically.
Value
Generates an image with the distribution of the clusters by metric.
Examples
result = Clustering::clustering(
df = cluster::agriculture,
min = 4,
max = 5,
algorithm='gmm',
metrics=c("Precision")
)
Clustering::plot_clustering(result,c("Precision"))
Method to calculate the precision.
Description
Method to calculate the precision.
Usage
precision_metric(true_positive, false_positive)
Arguments
true_positive |
Array with the matching elements of B that are in the same cluster. |
false_positive |
Array with the non-matching elements of B that are in the same cluster. |
Value
Returns a double with the precision calculation.
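One common way to obtain true positive, false positive and false negative counts for a clustering is pair counting: every pair of observations is classified by whether it shares a label in the reference partition and in the predicted one. The sketch below uses that convention, which is an assumption and may differ from how the package derives its counts; `pair_counts_sketch` and the toy labels are hypothetical.

```r
# Hypothetical helper (not part of the package): pair-counting confusion counts.
pair_counts_sketch <- function(truth, pred) {
  idx <- combn(length(truth), 2)                      # all unordered pairs of observations
  same_truth <- truth[idx[1, ]] == truth[idx[2, ]]    # pair shares a reference label
  same_pred <- pred[idx[1, ]] == pred[idx[2, ]]       # pair shares a predicted cluster
  c(tp = sum(same_truth & same_pred),
    fp = sum(!same_truth & same_pred),
    fn = sum(same_truth & !same_pred))
}

truth <- c(1, 1, 2, 2)
pred <- c(1, 1, 1, 2)
counts <- pair_counts_sketch(truth, pred)

# Precision: share of co-clustered pairs that also share a reference label.
precision <- counts[["tp"]] / (counts[["tp"]] + counts[["fp"]])
precision
```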
Method that runs the pvclust algorithm using the Correlation metric to make an external or internal validation of the cluster.
Description
Method that runs the pvclust algorithm using the Correlation metric to make an external or internal validation of the cluster.
Usage
pvclust_correlation_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that runs the pvclust algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Description
Method that runs the pvclust algorithm using the Euclidean metric to make an external or internal validation of the cluster.
Usage
pvclust_euclidean_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that runs the pvpick algorithm using an external or internal validation of the cluster.
Description
Method that runs the pvpick algorithm using an external or internal validation of the cluster.
Usage
pvpick_method(dt, clusters, columnClass, metric)
Arguments
dt |
Matrix or data frame with the set of values to be applied to the algorithm. |
clusters |
Integer with the number of clusters to create. |
metric |
Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette. |
Value
Return a list with both the internal and external evaluation of the grouping.
Method that converts a dataset into a matrix
Description
Method that converts a dataset into a matrix
Usage
read_file(path)
Arguments
path |
dataset directory |
Value
returns a matrix whose content is the dataset received as a parameter
Method to calculate the recall.
Description
Method to calculate the recall.
Usage
recall_metric(true_positive, false_negative)
Arguments
true_positive |
Array with the matching elements of B that are in the same cluster. |
false_negative |
Array with the matching elements of B that are not in the same cluster. |
Value
Returns a double with the recall calculation.
Method for refactoring the distance measurement name.
Description
Method for refactoring the distance measurement name.
Usage
refactorName(nameMeasure)
Arguments
nameMeasure |
name of the distance measure |
Value
a string with the refactored measure name
Method for filtering clustering results.
Description
Method for filtering clustering results.
Usage
resultClustering(result)
Arguments
result |
data.frame with clustering results. |
Value
a matrix with the filtered columns.
External results by algorithm.
Description
It is used for obtaining the results of an algorithm indicated as a parameter grouped by number of clusters.
Usage
result_external_algorithm_by_metric(df, metric)
Arguments
df |
data matrix or data frame with the result of running the clustering algorithm. |
metric |
It's a string with the metric to evaluate. |
Value
A data.frame with the results of the algorithm indicated as parameter.
Examples
result = Clustering::clustering(
df = cluster::agriculture,
min = 4,
max = 5,
algorithm='gmm',
metrics=c("Precision")
)
Clustering::result_external_algorithm_by_metric(result,'Precision')
Internal results by algorithm
Description
It is used for obtaining the results of an algorithm indicated as a parameter grouped by number of clusters.
Usage
result_internal_algorithm_by_metric(df, metric)
Arguments
df |
data matrix or data frame with the result of running the clustering algorithm. |
metric |
It's a string with the metric to evaluate. |
Value
A data.frame with the results of the algorithm indicated as parameter.
Examples
result = Clustering::clustering(
df = cluster::agriculture,
min = 4,
max = 5,
algorithm='gmm',
metrics=c("Recall","Silhouette")
)
Clustering::result_internal_algorithm_by_metric(result,'Silhouette')
Method in charge of obtaining those metrics that are external from those indicated.
Description
Method in charge of obtaining those metrics that are external from those indicated.
Usage
row_name_df_external(metrics)
Arguments
metrics |
Array with the metrics used in the calculation. |
Value
Return an array with the metrics that are external.
Method in charge of obtaining those metrics that are internal from those indicated.
Description
Method in charge of obtaining those metrics that are internal from those indicated.
Usage
row_name_df_internal(metrics)
Arguments
metrics |
Array with the metrics used in the calculation. |
Value
Return an array with the metrics that are internal.
Method that returns a table with the algorithm and the metric indicated as parameters.
Description
Method that returns a table with the algorithm and the metric indicated as parameters.
Usage
show_result_external_algorithm_by_metric(df, metric)
Arguments
df |
Data matrix or data frame. |
metric |
String with the metric. |
Value
Return a table with the algorithm and the metric indicated as parameter.
Method in charge of obtaining a table with the results of the algorithms grouped by clusters, calculating the maximum value of each external metric.
Description
Method in charge of obtaining a table with the results of the algorithms grouped by clusters, calculating the maximum value of each external metric.
Usage
show_result_external_algorithm_group_by_clustering(df)
Arguments
df |
Data matrix or data frame. |
Value
Return a table with the algorithms and the clusters.
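The grouping described above can be pictured with base R. This is only a sketch under stated assumptions, not the package's code: the data frame toy_results and its values are made up for illustration, and stats::aggregate stands in for whatever grouping mechanism the package uses internally.

```r
# Made-up results: two runs per algorithm, all with 4 clusters.
toy_results <- data.frame(
  Algorithm = c("gmm", "gmm", "kmeans", "kmeans"),
  Clusters  = c(4, 4, 4, 4),
  Precision = c(0.2, 0.5, 0.3, 0.1),
  Recall    = c(0.4, 0.1, 0.2, 0.6)
)

# For each (Algorithm, Clusters) pair keep the maximum of every metric column.
best_by_group <- aggregate(
  cbind(Precision, Recall) ~ Algorithm + Clusters,
  data = toy_results,
  FUN  = max
)
print(best_by_group)
```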
Method that returns a table with the algorithm and the metric indicated as parameters.
Description
Method that returns a table with the algorithm and the metric indicated as parameters.
Usage
show_result_internal_algorithm_by_metric(df, metric)
Arguments
df |
Data matrix or data frame. |
metric |
String with the metric for which the results are calculated. |
Value
Return a table with the algorithm and the metric indicated as parameter.
Method in charge of obtaining a table with the results of the algorithms grouped by clusters, calculating the maximum value of each internal metric.
Description
Method in charge of obtaining a table with the results of the algorithms grouped by clusters, calculating the maximum value of each internal metric.
Usage
show_result_internal_algorithm_group_by_clustering(df)
Arguments
df |
Data matrix or data frame. |
Value
Return a table with the algorithms and the clusters.
Method to calculate the silhouette.
Description
Method to calculate the silhouette.
Usage
silhouette_metric(clusters_vector, distance)
Arguments
clusters_vector |
Array that contains the data grouped into clusters. |
distance |
Dissimilarity matrix. |
Value
Return a double with the result of the silhouette calculation.
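For readers who want to see what an average silhouette value looks like in practice, here is a minimal sketch built on cluster::silhouette from the cluster package (one of the listed imports). It assumes that a vector of cluster labels and a dissimilarity object are the two inputs, mirroring the arguments above. Whether silhouette_metric computes its result this way is an assumption, and the hierarchical clustering step is used only to obtain some labels deterministically.

```r
library(cluster)

# Dissimilarity matrix of the agriculture data set (12 observations).
d <- dist(cluster::agriculture)

# Obtain cluster labels; hclust + cutree is just a deterministic stand-in.
groups <- cutree(hclust(d), k = 2)

# silhouette() returns one row per observation with its silhouette width.
sil <- cluster::silhouette(groups, d)

# Average silhouette width: a single double in [-1, 1].
mean_width <- mean(sil[, "sil_width"])
print(mean_width)
```

Values close to 1 indicate compact, well separated clusters, while values near 0 or below suggest overlapping or misassigned observations.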
Returns the clustering result sorted by a set of metrics.
Description
This function receives a clustering object and sorts its results by the given parameter. By default, it sorts by the Algorithm field.
Usage
## S3 method for class 'clustering'
sort(x, decreasing = TRUE, ...)
Arguments
x |
A clustering object. |
decreasing |
A logical indicating if the sort should be increasing or decreasing. By default, decreasing. |
... |
Additional parameters such as "by", a string with the name of the evaluation measure to order by. The valid values are listed under Details. |
Details
The additional argument in "..." is the 'by' argument, which is an
array with the name of the evaluation measure to order by. Valid values are:
Algorithm, Distance, Clusters, Data, Var, Time, Entropy,
Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index,
Connectivity, Dunn, Silhouette, TimeAtt.
Value
Another clustering object with the evaluation measures sorted.
Examples
result <-
Clustering::clustering(df = cluster::agriculture,min = 4, max = 4,algorithm='gmm',
metrics='Recall')
sort(result, FALSE, 'Recall')
Method that formats a number with four digits
Description
Method that formats a number with four digits
Usage
specify_decimal(x, k)
Arguments
x |
number |
k |
number of decimals |
Value
A number converted to a string with four digits.
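A common base R idiom for this kind of formatting is shown below as a sketch. The function name format_decimals_sketch is hypothetical and the body is an assumption about how such a helper can be written. It is not taken from the package source.

```r
# Round x to k decimals and keep trailing zeros in the resulting string.
format_decimals_sketch <- function(x, k) {
  trimws(format(round(x, k), nsmall = k))
}

formatted_pi  <- format_decimals_sketch(3.14159, 4)
formatted_two <- format_decimals_sketch(2, 4)
print(c(formatted_pi, formatted_two))
```

Using nsmall guarantees at least k digits after the decimal point, so whole numbers are padded with zeros instead of being printed without decimals.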
The data provided are daily stock prices from January 1988 through October 1991, for ten aerospace companies.
Description
The data provided are daily stock prices from January 1988 through October 1991, for ten aerospace companies.
Usage
data(stock)
Format
A data frame with 950 observations on 10 variables:
- Company1
company1 details
- Company2
company2 details
- Company3
company3 details
- Company4
company4 details
- Company5
company5 details
- Company6
company6 details
- Company7
company7 details
- Company8
company8 details
- Company9
company9 details
- Company10
company10 details
Source
KEEL, <http://www.keel.es/>
The study was performed at the 2nd Department of Medicine, 1st Faculty of Medicine of Charles University and Charles University Hospital. The data were transferred to electronic form by the European Centre of Medical Informatics, Statistics and Epidemiology of Charles University and Academy of Sciences.
Description
The study was performed at the 2nd Department of Medicine, 1st Faculty of Medicine of Charles University and Charles University Hospital. The data were transferred to electronic form by the European Centre of Medical Informatics, Statistics and Epidemiology of Charles University and Academy of Sciences.
Usage
data(stulong)
Format
A data frame with 1417 observations on 5 variables.
- a1
Height
- a2
Weight
- a3
Blood pressure I systolic (mm Hg)
- a4
Blood pressure I diastolic (mm Hg)
- a5
Percentage Cholesterol in mg
Source
KEEL, <http://www.keel.es/>
Method for filtering external columns of a dataset.
Description
Method for filtering external columns of a dataset.
Usage
transform_dataset(df)
Arguments
df |
Data frame with clustering results. |
Value
Data frame filtered with the columns of the external measurements.
Exists internal measure
Method for filtering internal columns of a dataset.
Description
Method for filtering internal columns of a dataset.
Usage
transform_dataset_internal(df)
Arguments
df |
data frame with clustering results. |
Value
Data frame filtered with the columns of the internal measurements.
Exists internal measure
Method to calculate the variation information.
Description
Method to calculate the variation information.
Usage
variation_information_metric(conversion_data_frame, table_convert)
Arguments
conversion_data_frame |
Return a double with the result of the entropy calculation. |
table_convert |
Table conversion (variable - cluster). |
Value
Returns a double with the result of the variation information calculation.
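For reference, the variation of information between two partitions X and Y is commonly defined as VI(X, Y) = H(X) + H(Y) - 2 I(X; Y), where H is the entropy and I the mutual information. The sketch below computes it from two label vectors. It is an illustration under stated assumptions: the function name vi_sketch is hypothetical, natural logarithms are assumed, and the package's own function takes different arguments and may differ in detail.

```r
# Variation of information between two labelings of the same observations.
vi_sketch <- function(labels_a, labels_b) {
  joint <- table(labels_a, labels_b) / length(labels_a)  # joint distribution
  p_a <- rowSums(joint)                                  # marginal of labels_a
  p_b <- colSums(joint)                                  # marginal of labels_b

  # Shannon entropy, skipping empty cells to avoid log(0).
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }

  # Mutual information computed over the non-empty cells only.
  nonzero <- joint > 0
  mutual_info <- sum(joint[nonzero] *
                       log(joint[nonzero] / outer(p_a, p_b)[nonzero]))

  entropy(p_a) + entropy(p_b) - 2 * mutual_info
}

vi_same        <- vi_sketch(c(1, 1, 2, 2), c("b", "b", "a", "a"))
vi_independent <- vi_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))
print(c(vi_same, vi_independent))
```

Two partitions that differ only in the names of their clusters give a value of 0, and the value grows as the partitions share less information.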
One of the best-known test data sets in machine learning. This data set describes several situations where the weather is suitable or not to play sports, depending on the current outlook, temperature, humidity and wind.
Description
One of the best-known test data sets in machine learning. This data set describes several situations where the weather is suitable or not to play sports, depending on the current outlook, temperature, humidity and wind.
Usage
data(weather)
Format
A data frame with 14 observations on 5 variables:
- Outlook
sunny, overcast, rainy
- Temperature
hot, mild, cool
- Humidity
high, normal
- Windy
true, false
- Play
yes, no
Source
KEEL, <http://www.keel.es/>