Particularly when teaching, it can be helpful to highlight specific pieces of the equation. With {equatiomatic}, we can do this as part of the equation extraction. For example, imagine we have a simple linear regression model, like this:
library(equatiomatic)
data("penguins", package = "equatiomatic")
slr <- lm(bill_length_mm ~ body_mass_g, data = penguins)
We may want to start by highlighting the independent and dependent variables. We can do this with colors. For example:
\[ {\color{cornflowerblue}{\operatorname{bill\_length\_mm}}} = \alpha + \beta_{1}({\color{firebrick}{\operatorname{body\_mass\_g}}}) + \epsilon \]
and then we can use those same colors later in text, as in “The Dependent Variable is the length of the penguin’s bill, which is predicted by the Independent Variable, the body mass of the penguin.”
Note that we colorize the variables through the var_colors argument, which takes a named vector. Each name is the variable whose color you’d like to change, and the corresponding element is the color itself.
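The call that produces the colored equation above is not shown here. A minimal sketch, assuming only the var_colors argument just described (slr_colors is an illustrative name, not part of the package), might look like this:

```r
library(equatiomatic)
data("penguins", package = "equatiomatic")
slr <- lm(bill_length_mm ~ body_mass_g, data = penguins)

# Named character vector: names are model variables, elements are colors.
# slr_colors is just an illustrative object name.
slr_colors <- c(
  bill_length_mm = "cornflowerblue",
  body_mass_g    = "firebrick"
)

extract_eq(slr, var_colors = slr_colors)
```

Storing the vector in an object makes it easy to reuse the same colors in later equations or in accompanying text.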
We could also take this further and colorize the coefficients (the Greek notation). The greek_colors argument is structured slightly differently: it takes either a single color or a vector of colors, matched by position rather than by name. The notation here includes three Greek characters, representing the model intercept, the slope, and the residual (error) term. We can colorize just the intercept with code like this:
extract_eq(slr,
  var_colors = c(
    bill_length_mm = "cornflowerblue",
    body_mass_g    = "firebrick"),
  greek_colors = c(
    "#3bd100", rep("black", 2)
  )
)
\[ {\color{cornflowerblue}{\operatorname{bill\_length\_mm}}} = {\color{#3bd100}{\alpha}} + {\color{black}{\beta}}_{1}({\color{firebrick}{\operatorname{body\_mass\_g}}}) + {\color{black}{\epsilon}} \]
Or all three with something like:
greek_col <- c("#1b9e77", "#d95f02", "#7570b3")
extract_eq(slr, 
  var_colors = c(
    bill_length_mm = "cornflowerblue",
    body_mass_g    = "firebrick"),
  greek_colors = greek_col
)
\[ {\color{cornflowerblue}{\operatorname{bill\_length\_mm}}} = {\color{#1b9e77}{\alpha}} + {\color{#d95f02}{\beta}}_{1}({\color{firebrick}{\operatorname{body\_mass\_g}}}) + {\color{#7570b3}{\epsilon}} \]
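For the single-color form of greek_colors, a sketch along these lines should apply one color to every Greek character. It assumes a length-one value is reused across all positions, which is how we read the description of the argument; eq_single is an illustrative name:

```r
library(equatiomatic)
data("penguins", package = "equatiomatic")
slr <- lm(bill_length_mm ~ body_mass_g, data = penguins)

# A single color, intended to apply to alpha, beta, and epsilon alike
# (assumption: a length-one vector is recycled over all Greek characters)
eq_single <- extract_eq(slr, greek_colors = "#7570b3")
eq_single
```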
Note that all of this works with more complicated models as well. For example, consider a model with an interaction. By coloring the variable names, we can follow both the main effects and the interaction.
m_interaction <- lm(bill_length_mm ~ body_mass_g * flipper_length_mm,
  data = penguins)
extract_eq(m_interaction,
  var_colors = c(
    body_mass_g       = "#ffa91f",
    flipper_length_mm = "#00d1ab"),
  greek_colors = c(
    "black", "#3A21B3", "#58A1D9", "#FF7582", "black"),
  wrap = TRUE, terms_per_line = 3
)
\[ \begin{aligned} \operatorname{bill\_length\_mm} &= {\color{black}{\alpha}} + {\color{#3A21B3}{\beta}}_{1}({\color{#ffa91f}{\operatorname{body\_mass\_g}}}) + {\color{#58A1D9}{\beta}}_{2}({\color{#00d1ab}{\operatorname{flipper\_length\_mm}}})\ + \\ &\quad {\color{#FF7582}{\beta}}_{3}({\color{#ffa91f}{\operatorname{body\_mass\_g}}} \times {\color{#00d1ab}{\operatorname{flipper\_length\_mm}}}) + {\color{black}{\epsilon}} \end{aligned} \]
Here, we’re using two different shades of blue to denote the main effects, and a pink color to denote the interaction. At the same time, the variable colors show how the variables combine.
We can also change the color of the subscripts, perhaps so that they match the coefficients. The interface for the subscripts (the subscript_colors argument) is exactly the same as for the Greek characters: either a single color or a vector of colors. One potentially confusing part of this, however, is that the colors still need to correspond to their position in the equation. If a term does not have a subscript, you can fill its position with NA or any color (it won’t matter because there is no subscript for that term).
extract_eq(m_interaction,
  var_colors = c(
    body_mass_g       = "#ffa91f",
    flipper_length_mm = "#00d1ab"),
  greek_colors = c(
    "black", "#3A21B3", "#58A1D9", "#FF7582", "black"),
  subscript_colors = c(
     NA_character_, "#3A21B3", "#58A1D9", "#FF7582", NA_character_),
  wrap = TRUE, terms_per_line = 3
)
\[ \begin{aligned} \operatorname{bill\_length\_mm} &= {\color{black}{\alpha}} + {\color{#3A21B3}{\beta}}_{{\color{#3A21B3}{1}}}({\color{#ffa91f}{\operatorname{body\_mass\_g}}}) + {\color{#58A1D9}{\beta}}_{{\color{#58A1D9}{2}}}({\color{#00d1ab}{\operatorname{flipper\_length\_mm}}})\ + \\ &\quad {\color{#FF7582}{\beta}}_{{\color{#FF7582}{3}}}({\color{#ffa91f}{\operatorname{body\_mass\_g}}} \times {\color{#00d1ab}{\operatorname{flipper\_length\_mm}}}) + {\color{black}{\epsilon}} \end{aligned} \]
Again, we may want to use these colors in our explanation. For example:
In the above model, both the body mass and the flipper length of penguins are used to predict their bill length. We estimate the main effect of body mass, the main effect of flipper length, and their interaction. The interaction implies that the relation between body mass and bill length depends upon flipper length. Or, equivalently, that the relation between flipper length and bill length depends upon body mass.
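Since greek_colors and subscript_colors are both matched by position, one way to keep them aligned is to build both from a shared palette. The following is only a sketch; slope_cols, greek_cols, and subscript_cols are illustrative names, not part of the package:

```r
library(equatiomatic)
data("penguins", package = "equatiomatic")
m_interaction <- lm(bill_length_mm ~ body_mass_g * flipper_length_mm,
  data = penguins)

# One color per slope, in the order the terms appear in the equation
slope_cols <- c("#3A21B3", "#58A1D9", "#FF7582")

# Greek characters: intercept, three slopes, residual term
greek_cols <- c("black", slope_cols, "black")

# Subscripts: alpha and epsilon have none, so pad those positions with NA
subscript_cols <- c(NA_character_, slope_cols, NA_character_)

# The model supplies the intercept and slopes; epsilon adds one more position
stopifnot(length(greek_cols) == length(coef(m_interaction)) + 1)

extract_eq(m_interaction,
  greek_colors = greek_cols,
  subscript_colors = subscript_cols,
  wrap = TRUE, terms_per_line = 3
)
```

Deriving both vectors from slope_cols means a change to one slope color automatically carries over to its subscript.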
Finally, there is one additional means by which we can control colors. By default, {equatiomatic} handles categorical variables by putting the corresponding levels in subscripts (relative to the reference group, which is omitted). We can also change the color of these variable subscripts with the var_subscript_colors argument.
m_categorical <- lm(bill_length_mm ~ species + island, data = penguins)
extract_eq(m_categorical,
  var_colors = c(
    species = "#FB2C4B",
    island  = "#643B77"),
  var_subscript_colors = c(
    species = "#0274B2",
    island  = "#FBA640")
)
\[ \operatorname{bill\_length\_mm} = \alpha + \beta_{1}({\color{#FB2C4B}{\operatorname{species}}}{\color{#0274B2}{_{\operatorname{Chinstrap}}}}) + \beta_{2}({\color{#FB2C4B}{\operatorname{species}}}{\color{#0274B2}{_{\operatorname{Gentoo}}}}) + \beta_{3}({\color{#643B77}{\operatorname{island}}}{\color{#FBA640}{_{\operatorname{Dream}}}}) + \beta_{4}({\color{#643B77}{\operatorname{island}}}{\color{#FBA640}{_{\operatorname{Torgersen}}}}) + \epsilon \]
Note that the colors are assigned per variable, not per level: every level subscript of species shares one color, as does every level subscript of island.
To make the previous equations more human-readable, we might want to change the variable names. We can do this through a similar interface while still keeping the colors intact. For example, our interaction model might look something like this:
extract_eq(m_interaction,
  swap_var_names = c(
    "bill_length_mm"    = "Bill Length [mm]",
    "body_mass_g"       = "Body Mass [g]",
    "flipper_length_mm" = "Flipper Length [mm]"),
  var_colors = c(
    flipper_length_mm   = "firebrick",
    body_mass_g         = "cornflowerblue"),
  wrap = TRUE, terms_per_line = 3
)
\[ \begin{aligned} \operatorname{Bill\ Length\ [mm]} &= \alpha + \beta_{1}({\color{cornflowerblue}{\operatorname{Body\ Mass\ [g]}}}) + \beta_{2}({\color{firebrick}{\operatorname{Flipper\ Length\ [mm]}}})\ + \\ &\quad \beta_{3}({\color{cornflowerblue}{\operatorname{Body\ Mass\ [g]}}} \times {\color{firebrick}{\operatorname{Flipper\ Length\ [mm]}}}) + \epsilon \end{aligned} \]
You can similarly change the variable subscript names. For example:
extract_eq(m_categorical,
  swap_var_names = c(
    "bill_length_mm" = "Bill Length [mm]",
    "species"        = "Species",
    "island"         = "Island"),
  swap_subscript_names = c(
    Chinstrap        = "little buddy",
    Gentoo           = "happy feet"),
  var_colors = c(
    species          = "#FB2C4B",
    island           = "#643B77"),
  var_subscript_colors = c(
    species          = "#0274B2",
    island           = "#FBA640"),
  wrap = TRUE, terms_per_line = 3
)
\[ \begin{aligned} \operatorname{Bill\ Length\ [mm]} &= \alpha + \beta_{1}({\color{#FB2C4B}{\operatorname{Species}}}{\color{#0274B2}{_{\operatorname{little\ buddy}}}}) + \beta_{2}({\color{#FB2C4B}{\operatorname{Species}}}{\color{#0274B2}{_{\operatorname{happy\ feet}}}})\ + \\ &\quad \beta_{3}({\color{#643B77}{\operatorname{Island}}}{\color{#FBA640}{_{\operatorname{Dream}}}}) + \beta_{4}({\color{#643B77}{\operatorname{Island}}}{\color{#FBA640}{_{\operatorname{Torgersen}}}}) + \epsilon \end{aligned} \]
Everything shown above is fully implemented for all model types handled by {equatiomatic}, with the exception of mixed effects models (lme4::lmer() and lme4::glmer()) and time-series models. For mixed effects models, colorization has been partially implemented: you can use the interface shown above to change the colors or names of variables, as well as variable subscripts. However, Greek characters cannot be colored automatically at present. These models will be fully implemented in a future release.
Finally, you might have noticed that the number and length of arguments to extract_eq() can become rather large. Because of this, we are currently considering moving to a piped interface. The last example might then turn into something like:
create_eq(m_categorical) |> 
  swap_var_names(
    "bill_length_mm" = "Bill Length [mm]",
    "species"        = "Species",
    "island"         = "Island"
  ) |> 
  swap_subscript_names(
    Chinstrap        = "little buddy",
    Gentoo           = "happy feet"
  ) |> 
  colorize_variables(
    species          = "#FB2C4B",
    island           = "#643B77"
  ) |>
  colorize_variable_subscripts(
    species          = "#0274B2",
    island           = "#FBA640"
  ) |>
  wrap(terms_per_line = 3)
The code is perhaps not a whole lot shorter, but we think this layering approach might make building up equations easier and more intuitive, not unlike how you build up a plot with {ggplot2}.
If you have any feedback on this, or other features, please don’t hesitate to get in touch.