The goal of funneljoin is to make it easy to analyze behavior
funnels. For example, maybe you’re interested in finding the people who
visit a page and then register. Or you want all the times people click
on an item and add it to their cart within 2 days. These can all be
answered quickly with funneljoin’s after_join()
or
funnel_start()
and funnel_step()
. As
funneljoin uses dplyr, it can also work with remote tables, but
has only been tried on postgres.
For more examples of how to use funneljoin, check out the vignette, which shows different types of joins and the optional arguments, or this blog post, which showcases how to use funneljoin analyze questions and answers on StackOverflow.
You can install this package from CRAN:
install.packages("funneljoin")
Or you can install the development version from GitHub with remotes:
library(remotes)
install_github("robinsones/funneljoin")
library(dplyr)
library(funneljoin)
We’ll take a look at two tables that come with the package,
landed
and registered
. Each has a column
user_id
and timestamp
.
Let’s say we wanted to get the first time people landed and the first
time afterward they registered. We would after_inner_join()
with a first-firstafter
type:
%>%
landed after_inner_join(registered,
by_user = "user_id",
by_time = "timestamp",
type = "first-firstafter",
suffix = c("_landed", "_registered"))
#> # A tibble: 6 × 3
#> user_id timestamp_landed timestamp_registered
#> <dbl> <date> <date>
#> 1 1 2018-07-01 2018-07-02
#> 2 4 2018-07-01 2018-07-02
#> 3 3 2018-07-02 2018-07-02
#> 4 6 2018-07-07 2018-07-10
#> 5 5 2018-07-10 2018-07-11
#> 6 8 2018-08-01 2018-08-02
The first two arguments are the tables we’re joining, with the first table being the events that happen first. We then specify:
by_time
: the time columns in each table. This would
typically be a datetime or a date column. These columns are used to
filter for time y being after or the same as time x.by_user
:the user or identity columns in each table.
These must be identical for a pair of rows to match.type
: the type of funnel used to distinguish between
event pairs, such as “first-first”, “last-first”, “any-firstafter”.suffix
(optional): just like dplyr’s join functions,
this specifies what should be appended to the names of columns that are
in both tables.type
can be any combination of first
,
last
, any
, and lastbefore
with
first
, last
, any
, and
firstafter
. Some common ones you may use include:
If your time and user columns have different names, you can work with that too:
<- landed %>%
landed rename(landed_at = timestamp,
user_id_x = user_id)
<- registered %>%
registered rename(registered_at = timestamp,
user_id_y = user_id)
%>%
landed after_inner_join(registered,
by_user = c("user_id_x" = "user_id_y"),
by_time = c("landed_at" = "registered_at"),
type = "first-first")
#> # A tibble: 5 × 3
#> user_id_x landed_at registered_at
#> <dbl> <date> <date>
#> 1 1 2018-07-01 2018-07-02
#> 2 3 2018-07-02 2018-07-02
#> 3 6 2018-07-07 2018-07-10
#> 4 5 2018-07-10 2018-07-11
#> 5 8 2018-08-01 2018-08-02
Sometimes you have all the data you need in one table. For example, let’s look at this table of user activity on a website.
<- tibble::tribble(
activity ~ "user_id", ~ "event", ~ "timestamp",
1, "landing", "2019-07-01",
1, "registration", "2019-07-02",
1, "purchase", "2019-07-07",
1, "purchase", "2019-07-10",
2, "landing", "2019-08-01",
2, "registration", "2019-08-15",
3, "landing", "2019-05-01",
3, "registration", "2019-06-01",
3, "purchase", "2019-06-04",
4, "landing", "2019-06-13"
)
We can use funnel_start()
and funnel_step()
to make an activity funnel. funnel_start()
takes five
arguments:
tbl
: The table of events.moment_type
: The first moment, or event, in the
funnel.moment
: The name of the column that indicates the
moment_type
.tstamp
: The name of the column with the timestamps of
the moment.user
: The name of the column indicating the user who
did the moment.%>%
activity funnel_start(moment_type = "landing",
moment = "event",
tstamp = "timestamp",
user = "user_id")
#> # A tibble: 4 × 2
#> user_id timestamp_landing
#> <dbl> <chr>
#> 1 1 2019-07-01
#> 2 2 2019-08-01
#> 3 3 2019-05-01
#> 4 4 2019-06-13
funnel_start()
returns a table with the user_ids and a
column with the name of your timestamp column, _
, and the
moment type. This table also includes metadata.
To add more moments to the funnel, you use
funnel_step()
. Since you’ve indicated in
funnel_start()
what columns to use for each part, now you
only need to have the moment_type
and the type
of after_join()
(e.g. “first-first”, “first-any”).
%>%
activity funnel_start(moment_type = "landing",
moment = "event",
tstamp = "timestamp",
user = "user_id") %>%
funnel_step(moment_type = "registration",
type = "first-firstafter")
#> # A tibble: 4 × 3
#> user_id timestamp_landing timestamp_registration
#> <dbl> <chr> <chr>
#> 1 3 2019-05-01 2019-06-01
#> 2 4 2019-06-13 <NA>
#> 3 1 2019-07-01 2019-07-02
#> 4 2 2019-08-01 2019-08-15
You can continue stacking on funnel_step()
with more
moments.
%>%
activity funnel_start(moment_type = "landing",
moment = "event",
tstamp = "timestamp",
user = "user_id") %>%
funnel_step(moment_type = "registration",
type = "first-firstafter") %>%
funnel_step(moment_type = "purchase",
type = "first-any")
#> # A tibble: 5 × 4
#> user_id timestamp_landing timestamp_registration timestamp_purchase
#> <dbl> <chr> <chr> <chr>
#> 1 3 2019-05-01 2019-06-01 2019-06-04
#> 2 1 2019-07-01 2019-07-02 2019-07-07
#> 3 1 2019-07-01 2019-07-02 2019-07-10
#> 4 2 2019-08-01 2019-08-15 <NA>
#> 5 4 2019-06-13 <NA> <NA>
If you use a type
that allows multiple moments of one
type for a user, like “first-any”, you will get more rows per user
rather than more columns. For example, user 1 had two purchases, so she
now has two rows. The timestamp_landing
and
timestamp_registration
is the same for both rows, but they
have a different timestamp_purchase
.
Finally, you can use the summarize_funnel()
to
understand how many and what percentage of people make it through to
each next step of the funnel. We can also switch to
funnel_steps()
to shorten our code a bit: we give it a
character vector of moment_types
in order and the
type
for each step.
%>%
activity funnel_start(moment_type = "landing",
moment = "event",
tstamp = "timestamp",
user = "user_id") %>%
funnel_steps(moment_types = c("registration", "purchase"),
type = "first-firstafter") %>%
summarize_funnel()
#> # A tibble: 3 × 4
#> moment_type nb_step pct_cumulative pct_step
#> <fct> <int> <dbl> <dbl>
#> 1 landing 4 1 NA
#> 2 registration 3 0.75 0.75
#> 3 purchase 2 0.5 0.667
nb_step
is how many users made it to each step,
pct_cumulative
is what percent that is out of the original
step, and pct_step
is what percentage that is out of those
who made it to the previous step. So in our case, 2 people had a
purchase, which is 50% of the people who landed but 66% of those who
registered.
If you find any bugs or have a feature request or question, please create an issue. If you’d like to add a feature, tests, or other functionality, please also make an issue first and let’s discuss!
Funneljoin was developed at DataCamp by Anthony Baker, David Robinson, and Emily Robinson and continues to be maintained primarily by Emily.