In the following, we explain how to use a lama-dictionary (see Creating lama-dictionaries) to translate data frame variables, atomic vectors or factors. The main functions are:

* lama_translate() and lama_translate_(): Assign new labels to variable values and turn them into factors with a defined level order (if to_factor = TRUE).
* lama_translate_all(): Apply lama_translate() to all columns of a data frame for which there is a translation with a matching name.
* lama_to_factor() and lama_to_factor_(): Similar to lama_translate() and lama_translate_(), but for variables that already hold the right labels (character or factor) and only need to be turned into factors with the levels given in the corresponding translations.
* lama_to_factor_all(): Apply lama_to_factor() to all columns of a data frame for which there is a translation with a matching name.
Let df be a data frame with the following structure:
df <- data.frame(
pupil_id = rep(1:4, each = 3),
subject = rep(c("eng", "mat", "gym"), 4),
level = factor(
c("a", "a", "a", "b", "b", "b", "b", "b", "b", "a", "a", "a"),
levels = c("a", "b")
),
result = c(1, 2, 2, NA, 2, NA, 1, 0, 1, 2, 3, NA),
stringsAsFactors = FALSE
)
df
#> pupil_id subject level result
#> 1 1 eng a 1
#> 2 1 mat a 2
#> 3 1 gym a 2
#> 4 2 eng b NA
#> 5 2 mat b 2
#> 6 2 gym b NA
#> 7 3 eng b 1
#> 8 3 mat b 0
#> 9 3 gym b 1
#> 10 4 eng a 2
#> 11 4 mat a 3
#> 12 4 gym a NA
The column subject (character) contains the subject codes and the column level (factor) holds the level of the course (basic or advanced) in which each pupil was tested. The column result (numeric) contains the test results: 1 and 2 are positive, 3 and 4 are negative, NA means that the pupil missed the test and 0 means that something else went wrong.
We want to use the following lama-dictionary in order to translate the data frame variables:
library(labelmachine)
dict <- new_lama_dictionary(
sub = c(eng = "English", mat = "Mathematics", gym = "Gymnastics"),
lev = c(b = "Basic", a = "Advanced"),
result = c(
"1" = "Good",
"2" = "Passed",
"3" = "Not passed",
"4" = "Not passed",
NA_ = "Missed",
"0" = NA
)
)
dict
#>
#> --- lama_dictionary ---
#> Variable 'sub':
#> eng mat gym
#> "English" "Mathematics" "Gymnastics"
#>
#> Variable 'lev':
#> b a
#> "Basic" "Advanced"
#>
#> Variable 'result':
#> 1 2 3 4 NA_
#> "Good" "Passed" "Not passed" "Not passed" "Missed"
#> 0
#> NA
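Conceptually, each translation in the dictionary maps codes to labels, much like a named character vector. The following base-R sketch does not use labelmachine; the helper lookup_sketch() is hypothetical. It illustrates one way to read the special entries of the translation result: missing values get the label "Missed" (entry NA_), while the code 0 is mapped to a missing label (entry "0" = NA).

```r
# Hypothetical stand-in for the translation 'result' as a plain named vector
result_tr <- c(
  "1" = "Good", "2" = "Passed", "3" = "Not passed", "4" = "Not passed",
  NA_ = "Missed", "0" = NA
)

# Hypothetical helper (not labelmachine's code): look up a label for each
# value of x; missing values are looked up under the name "NA_"
lookup_sketch <- function(x, translation) {
  keys <- as.character(x)
  keys[is.na(keys)] <- "NA_"
  unname(translation[keys])
}

lookup_sketch(c(1, 2, NA, 0), result_tr)
# returns c("Good", "Passed", "Missed", NA)
```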
The function lama_translate() uses non-standard evaluation: we pass in unquoted expressions, which are parsed by the function, so we can omit the quotes around column and translation names:
df_new <- lama_translate(
.data = df,
dictionary = dict,
subject_new = sub(subject),
level = lev(level),
result = result(result),
keep_order = c(FALSE, TRUE, FALSE),
to_factor = c(TRUE, TRUE, FALSE)
)
str(df_new)
#> 'data.frame': 12 obs. of 5 variables:
#> $ pupil_id : int 1 1 1 2 2 2 3 3 3 4 ...
#> $ subject : chr "eng" "mat" "gym" "eng" ...
#> $ level : Factor w/ 2 levels "Advanced","Basic": 1 1 1 2 2 2 2 2 2 1 ...
#> $ result : chr "Good" "Passed" "Passed" "Missed" ...
#> $ subject_new: Factor w/ 3 levels "English","Mathematics",..: 1 2 3 1 2 3 1 2 3 1 ...
The arguments .data and dictionary define which data frame should be translated and which lama-dictionary should be used. The argument keep_order defines for each translation whether the original ordering of the variable (the ordering of the variable in the data frame df) should be kept or the ordering given in the translation should be used. The argument to_factor defines for each translation whether the resulting labeled variable should be a factor (to_factor = TRUE) or a plain character variable (to_factor = FALSE). Apart from .data, dictionary, keep_order and to_factor, all other arguments are label assignments. The names of the arguments (left-hand side of the equations) define the column names under which the labeled variables are stored. The right-hand side of each assignment defines which column should be labeled (the argument inside the parentheses) and which translation should be used (the function name to the left of the parentheses). Hence, the statement above does the following:
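To make the effect of keep_order and to_factor more concrete, here is a base-R sketch of a single translation step. The helper translate_sketch() is hypothetical and is not labelmachine's implementation. It assumes that keep_order = TRUE takes the level order from the original variable (its factor levels, or its sorted unique values otherwise), and it ignores special entries such as NA_.

```r
# Hypothetical sketch of one translation step (not labelmachine's code)
translate_sketch <- function(x, translation, keep_order = FALSE, to_factor = TRUE) {
  labels <- unname(translation[as.character(x)])
  if (!to_factor) {
    return(labels)  # plain character vector
  }
  if (keep_order) {
    # Assumed order of the original variable: factor levels, else sorted values
    orig <- if (is.factor(x)) levels(x) else sort(unique(as.character(x)))
    lev <- unname(translation[orig])
  } else {
    lev <- unname(translation)  # order given in the translation
  }
  factor(labels, levels = unique(lev[!is.na(lev)]))
}

lev_tr <- c(b = "Basic", a = "Advanced")
level <- factor(c("a", "b", "b"), levels = c("a", "b"))

levels(translate_sketch(level, lev_tr, keep_order = TRUE))   # "Advanced" "Basic"
levels(translate_sketch(level, lev_tr, keep_order = FALSE))  # "Basic" "Advanced"
translate_sketch(level, lev_tr, to_factor = FALSE)           # "Advanced" "Basic" "Basic"
```

The level orders in the first two calls correspond to the column level in the outputs of this vignette (keep_order = TRUE gives "Advanced", "Basic"; the translation order gives "Basic", "Advanced").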
* subject_new = sub(subject): The column subject in the data frame df is translated using the translation sub and the resulting factor is stored under the column name subject_new. Since the first entry in keep_order is FALSE, the ordering given in the translation sub is used for the labels. Since the first entry in to_factor is TRUE, the resulting variable is a factor.
* level = lev(level): The column level in the data frame df is translated using the translation lev and then overwritten by the resulting factor. Since the second entry in keep_order is TRUE, the labeled variable has the same ordering as the original column. Since the second entry in to_factor is TRUE, the resulting variable is a factor.
* result = result(result): The column result in the data frame df is translated using the translation result and then overwritten by the labeled variable. Since the third entry in keep_order is FALSE, the ordering given in the translation is used for the labels. Since the third entry in to_factor is FALSE, the resulting variable is a plain character variable.

There are several abbreviations to save some typing:
* result_new = result is the same as result_new = result(result).
* lev(level) is the same as level = lev(level).
* result is the same as result = result(result).

The function lama_translate_() is the standard evaluation variant of lama_translate(): instead of expressions, we pass in character strings holding the names of the translations and columns we want to use:
df_new <- lama_translate_(
.data = df,
dictionary = dict,
translation = c("sub", "lev", "result"),
col = c("subject", "level", "result"),
col_new = c("subject_new", "level", "result"),
keep_order = c(FALSE, TRUE, FALSE),
to_factor = c(TRUE, TRUE, FALSE)
)
str(df_new)
#> 'data.frame': 12 obs. of 5 variables:
#> $ pupil_id : int 1 1 1 2 2 2 3 3 3 4 ...
#> $ subject : chr "eng" "mat" "gym" "eng" ...
#> $ level : Factor w/ 2 levels "Advanced","Basic": 1 1 1 2 2 2 2 2 2 1 ...
#> $ result : chr "Good" "Passed" "Passed" "Missed" ...
#> $ subject_new: Factor w/ 3 levels "English","Mathematics",..: 1 2 3 1 2 3 1 2 3 1 ...
The arguments .data, dictionary, keep_order and to_factor have the same meaning as in lama_translate(). The argument translation holds the names of the translations to be used, col the names of the columns to be translated and col_new the names of the columns under which the labeled variables are stored. The result is the same as before, when we used lama_translate().
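The relationship between the two variants can be illustrated with base R's substitute(). The sketch below uses a hypothetical helper parse_assignments_sketch(), which is not part of labelmachine and is only a simplified guess at how such parsing could work. It turns NSE-style label assignments into the character vectors translation, col and col_new that lama_translate_() expects. It handles the full form new = trans(col) and the abbreviation trans(col); the other abbreviations are left out.

```r
# Hypothetical helper (not labelmachine's code)
parse_assignments_sketch <- function(...) {
  # Capture the unevaluated arguments, e.g. subject_new = sub(subject);
  # nothing is evaluated, so base::sub() is never called here
  exprs <- as.list(substitute(list(...)))[-1]
  nms <- names(exprs)
  if (is.null(nms)) nms <- rep("", length(exprs))
  # Function name = translation, argument inside the parentheses = column
  translation <- vapply(exprs, function(e) as.character(e[[1]]), character(1))
  col <- vapply(exprs, function(e) as.character(e[[2]]), character(1))
  # An unnamed assignment trans(col) overwrites the column col
  col_new <- ifelse(nzchar(nms), nms, col)
  list(
    translation = unname(translation),
    col = unname(col),
    col_new = unname(col_new)
  )
}

se_args <- parse_assignments_sketch(
  subject_new = sub(subject),
  lev(level),
  result = result(result)
)
se_args$translation  # "sub" "lev" "result"
se_args$col          # "subject" "level" "result"
se_args$col_new      # "subject_new" "level" "result"
```

The three vectors are the same ones passed to lama_translate_() in the example above.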
The function lama_translate_all() is an extension of lama_translate() that tries to translate automatically as many columns of the data frame .data as possible. For this to work, the names of the columns to be translated must match the names of the translations to be used:
df_new <- lama_translate_all(
.data = df,
dictionary = dict,
prefix = "new_",
fn_colname = toupper,
suffix = "_labeled",
keep_order = TRUE
)
str(df_new)
#> 'data.frame': 12 obs. of 5 variables:
#> $ pupil_id : int 1 1 1 2 2 2 3 3 3 4 ...
#> $ subject : chr "eng" "mat" "gym" "eng" ...
#> $ level : Factor w/ 2 levels "a","b": 1 1 1 2 2 2 2 2 2 1 ...
#> $ result : num 1 2 2 NA 2 NA 1 0 1 2 ...
#> $ new_RESULT_labeled: Factor w/ 4 levels "Good","Passed",..: 1 2 2 4 2 4 1 NA 1 2 ...
In the above example, only the column name result matches a translation name, so only this column is translated, and the labeled variable is stored under the column name new_RESULT_labeled. The new column name is derived from the old column name (here result) by prepending the string given in prefix and appending the string given in suffix. Before this concatenation, the original column name can be transformed into another string with the string transformation function fn_colname. In our case, fn_colname is the function toupper, which converts the column name result to upper case (RESULT). Contrary to lama_translate(), the argument keep_order is just a single logical flag. It defines whether the original order of the values should be kept in all translated columns (keep_order = TRUE) or whether the order given in the translations should be used. As with lama_translate(), it is possible to pass the argument to_factor = FALSE to lama_translate_all() to store all resulting labeled variables as plain character vectors.
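The construction of the new column name can be sketched in base R as follows. The helper new_colname_sketch() is hypothetical. It assumes that the name is assembled from prefix, the transformed column name and suffix, in that order, as the output above suggests.

```r
# Hypothetical helper (not labelmachine's code): build the new column name
new_colname_sketch <- function(col, prefix = "", fn_colname = identity, suffix = "") {
  paste0(prefix, fn_colname(col), suffix)
}

new_colname_sketch("result", prefix = "new_", fn_colname = toupper, suffix = "_labeled")
# returns "new_RESULT_labeled"

new_colname_sketch("result")
# returns "result", i.e. the original column would be overwritten
```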
So far, we have only translated variables in data frames, but lama_translate() and lama_translate_() can also translate atomic vectors (character, logical, numeric) and factors.

Using lama_translate():
vec <- c("eng", "eng", "gym", "mat")
vec_labeled <- lama_translate(vec, dict, sub)
Using lama_translate_():
vec_labeled <- lama_translate_(vec, dict, "sub")
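For a plain vector, the translation essentially amounts to a lookup followed by a conversion into a factor. The following base-R sketch does this for the vector vec above. It does not call labelmachine and assumes that the levels follow the order of the translation sub (as with keep_order = FALSE in the data frame examples); the defaults of lama_translate() may differ.

```r
# Stand-in for the translation 'sub' as a named character vector
sub_tr <- c(eng = "English", mat = "Mathematics", gym = "Gymnastics")
vec <- c("eng", "eng", "gym", "mat")

# Look up the labels, then use the translation order for the levels
vec_sketch <- factor(unname(sub_tr[vec]), levels = unname(sub_tr))
levels(vec_sketch)      # "English" "Mathematics" "Gymnastics"
as.integer(vec_sketch)  # 1 1 3 2
```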
Sometimes, you already have labeled variables (character or factor variables, e.g. produced by lama_translate() with the argument to_factor = FALSE) and want to turn them into factors with a desired ordering. In this case, the functions lama_to_factor(), lama_to_factor_() and lama_to_factor_all() are the right choice.

Let df_non_factor be a data frame holding the right labels, but no factor variables. It is created with lama_translate_all() using to_factor = FALSE, after renaming the translations sub and lev to subject and level with lama_rename(), so that all translation names match the column names:
dict_new <- lama_rename(dict, subject = sub, level = lev)
df_non_factor <- lama_translate_all(df, dict_new, to_factor = FALSE)
str(df_non_factor)
#> 'data.frame': 12 obs. of 4 variables:
#> $ pupil_id: int 1 1 1 2 2 2 3 3 3 4 ...
#> $ subject : chr "English" "Mathematics" "Gymnastics" "English" ...
#> $ level : chr "Advanced" "Advanced" "Advanced" "Basic" ...
#> $ result : chr "Good" "Passed" "Passed" "Missed" ...
Turning variables into factors with lama_to_factor():
df_factor <- lama_to_factor(
.data = df_non_factor,
dictionary = dict,
subject_new = sub(subject),
level = lev(level),
result = result(result)
)
str(df_factor)
#> 'data.frame': 12 obs. of 5 variables:
#> $ pupil_id : int 1 1 1 2 2 2 3 3 3 4 ...
#> $ subject : chr "English" "Mathematics" "Gymnastics" "English" ...
#> $ level : Factor w/ 2 levels "Basic","Advanced": 2 2 2 1 1 1 1 1 1 2 ...
#> $ result : Factor w/ 4 levels "Good","Passed",..: 1 2 2 4 2 4 1 NA 1 2 ...
#> $ subject_new: Factor w/ 3 levels "English","Mathematics",..: 1 2 3 1 2 3 1 2 3 1 ...
The function lama_to_factor() allows the same abbreviations as lama_translate(). It can also be applied to factor variables and, like lama_translate(), it has a keep_order argument. Furthermore, lama_to_factor() and lama_to_factor_() can both be applied to atomic vectors or plain factors, just like lama_translate().
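Since the values are already labels, converting them to a factor only requires the level order. The following base-R sketch does this for the column result. It assumes that the levels are taken from the translation in its given order, that duplicated labels such as "Not passed" form a single level and that the missing label (from "0" = NA) is not a level. This is consistent with the 4 levels shown for result in the output above, but it is a sketch, not labelmachine's code.

```r
# Stand-in for the translation 'result' as a named character vector
result_tr <- c(
  "1" = "Good", "2" = "Passed", "3" = "Not passed", "4" = "Not passed",
  NA_ = "Missed", "0" = NA
)

# Levels: translation order, duplicates collapsed, missing label dropped
tr_labels <- unname(result_tr)
lev <- unique(tr_labels[!is.na(tr_labels)])  # "Good" "Passed" "Not passed" "Missed"

x <- c("Good", "Passed", "Passed", "Missed", NA)
x_factor <- factor(x, levels = lev)
nlevels(x_factor)     # 4
as.integer(x_factor)  # 1 2 2 4 NA
```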
Turning variables in a data frame into factors with lama_to_factor_():
df_factor <- lama_to_factor_(
.data = df_non_factor,
dictionary = dict,
translation = c("sub", "lev", "result"),
col = c("subject", "level", "result")
)
str(df_factor)
#> 'data.frame': 12 obs. of 4 variables:
#> $ pupil_id: int 1 1 1 2 2 2 3 3 3 4 ...
#> $ subject : Factor w/ 3 levels "English","Mathematics",..: 1 2 3 1 2 3 1 2 3 1 ...
#> $ level : Factor w/ 2 levels "Basic","Advanced": 2 2 2 1 1 1 1 1 1 2 ...
#> $ result : Factor w/ 4 levels "Good","Passed",..: 1 2 2 4 2 4 1 NA 1 2 ...
Since the argument col_new was omitted, the original columns subject, level and result were overwritten.
Turning all possible variables in a data frame into factors with lama_to_factor_all():
df_factor <- lama_to_factor_all(
.data = df_non_factor,
dictionary = dict
)
str(df_factor)
#> 'data.frame': 12 obs. of 4 variables:
#> $ pupil_id: int 1 1 1 2 2 2 3 3 3 4 ...
#> $ subject : chr "English" "Mathematics" "Gymnastics" "English" ...
#> $ level : chr "Advanced" "Advanced" "Advanced" "Basic" ...
#> $ result : Factor w/ 4 levels "Good","Passed",..: 1 2 2 4 2 4 1 NA 1 2 ...
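Only the column result was turned into a factor, since it is the only column whose name matches a translation name in dict (the translations for the columns subject and level are called sub and lev). Since the arguments prefix, suffix and fn_colname were omitted, the column result was overwritten.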
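Which columns are picked up by lama_to_factor_all() (and lama_translate_all()) depends on which column names also occur as translation names. Using the names shown in the printouts above, this matching can be sketched with base R's intersect(). The character vectors are hard-coded stand-ins for the column names of df_non_factor and the translation names of dict and dict_new; they are not obtained from labelmachine.

```r
col_names <- c("pupil_id", "subject", "level", "result")

# Translation names in dict
translation_names <- c("sub", "lev", "result")
intersect(col_names, translation_names)      # "result"

# Translation names in dict_new (after lama_rename())
translation_names_new <- c("subject", "level", "result")
intersect(col_names, translation_names_new)  # "subject" "level" "result"
```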