tidyr pivot trick: names_to -> .value
One of my favourite data-wrangling tricks is tidyr::pivot_longer’s
names_to
& .value
sentinel - use-cases don’t come up often, but it’s so deeply
✨ satisfying ✨ when it does!
Scenario #
Let’s say you have this starting dataframe:
# A tibble: 601 × 22
season title caught_fred caught_daphnie caught_velma caught_shaggy
<chr> <chr> <chr> <chr> <chr> <chr>
1 1 What a Nig… FALSE FALSE FALSE TRUE
2 1 A Clue for… FALSE FALSE FALSE TRUE
3 1 Hassle in … FALSE FALSE FALSE FALSE
4 1 Mine Your … TRUE FALSE FALSE FALSE
5 1 Decoy for … FALSE FALSE FALSE FALSE
6 1 What the H… TRUE FALSE FALSE FALSE
7 1 Never Ape … TRUE FALSE FALSE FALSE
8 1 Foul Play … FALSE FALSE FALSE FALSE
9 1 The Backst… FALSE FALSE FALSE TRUE
10 1 Bedlam in … FALSE FALSE FALSE TRUE
# ℹ 591 more rows
# ℹ 16 more variables: caught_scooby <chr>, captured_fred <chr>,
# captured_daphnie <chr>, captured_velma <chr>, captured_shaggy <chr>,
# captured_scooby <chr>, unmask_fred <chr>, unmask_daphnie <chr>,
# unmask_velma <chr>, unmask_shaggy <chr>, unmask_scooby <chr>,
# snack_fred <chr>, snack_daphnie <chr>, snack_velma <chr>,
# snack_shaggy <chr>, snack_scooby <chr>
# ℹ Use `print(n = ...)` to see more rows
and you want this ending dataframe:
# A tibble: 3,005 × 7
season title character caught captured unmask snack
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 What a Night for a Knight fred FALSE FALSE FALSE TRUE
2 1 What a Night for a Knight daphnie FALSE FALSE FALSE FALSE
3 1 What a Night for a Knight velma FALSE FALSE FALSE FALSE
4 1 What a Night for a Knight shaggy TRUE FALSE FALSE FALSE
5 1 What a Night for a Knight scooby TRUE FALSE TRUE FALSE
6 1 A Clue for Scooby Doo fred FALSE TRUE TRUE FALSE
7 1 A Clue for Scooby Doo daphnie FALSE TRUE FALSE FALSE
8 1 A Clue for Scooby Doo velma FALSE TRUE FALSE TRUE
9 1 A Clue for Scooby Doo shaggy TRUE FALSE FALSE FALSE
10 1 A Clue for Scooby Doo scooby FALSE FALSE FALSE FALSE
# ℹ 2,995 more rows
# ℹ Use `print(n = ...)` to see more rows
How do you extract the characters (Fred, Velma, Daphnie, Shaggy, Scooby) from the
column names into a character
column?
Past me #
Hmm, well, I can pivot_longer, then separate the name, then pivot_wider again?
three_lines <- x %>%
pivot_longer(cols = -c("season","title")) %>%
separate(name,into = c("action","character"), sep = "_") %>%
pivot_wider(names_from = "action", values_from = "value")
# A tibble: 3,005 × 7
season title character caught captured unmask snack
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 What a Night for a Knight fred FALSE FALSE FALSE TRUE
2 1 What a Night for a Knight daphnie FALSE FALSE FALSE FALSE
3 1 What a Night for a Knight velma FALSE FALSE FALSE FALSE
4 1 What a Night for a Knight shaggy TRUE FALSE FALSE FALSE
5 1 What a Night for a Knight scooby TRUE FALSE TRUE FALSE
6 1 A Clue for Scooby Doo fred FALSE TRUE TRUE FALSE
7 1 A Clue for Scooby Doo daphnie FALSE TRUE FALSE FALSE
8 1 A Clue for Scooby Doo velma FALSE TRUE FALSE TRUE
9 1 A Clue for Scooby Doo shaggy TRUE FALSE FALSE FALSE
10 1 A Clue for Scooby Doo scooby FALSE FALSE FALSE FALSE
# ℹ 2,995 more rows
# ℹ Use `print(n = ...)` to see more rows
This seems to work okay!
But what if I told you there’s a one-liner for this exact situation? #
pivot_longer has a “names_to” argument that takes a special “sentinel value”
called ".value"
- it immediately re-pivots the columns so that “character”
(fred, daphne etc) stays as a column and then the “.value” part becomes colnames
again (caught, captured etc)
one_line <- x %>%
pivot_longer(
cols = -c("season","title"),
names_to = c(".value","character"),
names_sep = "_"
)
# A tibble: 3,005 × 7
season title character caught captured unmask snack
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 What a Night for a Knight fred FALSE FALSE FALSE TRUE
2 1 What a Night for a Knight daphnie FALSE FALSE FALSE FALSE
3 1 What a Night for a Knight velma FALSE FALSE FALSE FALSE
4 1 What a Night for a Knight shaggy TRUE FALSE FALSE FALSE
5 1 What a Night for a Knight scooby TRUE FALSE TRUE FALSE
6 1 A Clue for Scooby Doo fred FALSE TRUE TRUE FALSE
7 1 A Clue for Scooby Doo daphnie FALSE TRUE FALSE FALSE
8 1 A Clue for Scooby Doo velma FALSE TRUE FALSE TRUE
9 1 A Clue for Scooby Doo shaggy TRUE FALSE FALSE FALSE
10 1 A Clue for Scooby Doo scooby FALSE FALSE FALSE FALSE
# ℹ 2,995 more rows
# ℹ Use `print(n = ...)` to see more rows
Why use many line when one line do trick?
You can read more about this pivot_longer feature in this tidyr vignette.
Bonus: data.table equivalent #
I learned later (from Hadley quote-tweeting my thread) that this feature was
inspired by a similar feature in data.table
, so I wanted to track down how it
would be done there. As it turns out, kind of difficult because the feature isn’t
in the CRAN version: it was in development as of October 2020, merged to main in
May 2021, and still hasn’t made it onto CRAN because of some data.table governance
issues (which are finally getting close to being resolved?)
In any case, with the main branch of rdatatable/data.table as of today, here’s how you could do the same pivot trick:
# requires development version of data.table and/or 1.15.0+
rlang::check_installed("data.table (>= 1.14.99)")
data.table::data.table(x) |>
data.table::melt(
id.vars = c("season", "title"),
# measure() is not actually exported but is silently parsed under the hood by melt()
# - uses `sep = "_"` to identify & process the column names (i.e. split with sep "_")
# - new column names (i.e. character) are provided unquoted
# - value.name is equivalent to tidyr's `.value`
measure.vars = measure(value.name, character, sep = "_")
)
season title character caught captured unmask snack
<char> <char> <char> <char> <char> <char> <char>
1: 1 What a Night for a Knight fred FALSE FALSE FALSE TRUE
2: 1 A Clue for Scooby Doo fred FALSE TRUE TRUE FALSE
3: 1 Hassle in the Castle fred FALSE FALSE TRUE TRUE
4: 1 Mine Your Own Business fred TRUE FALSE TRUE FALSE
5: 1 Decoy for a Dognapper fred FALSE FALSE FALSE FALSE
---
3001: 2 The Dreaded Remake of Jekyll & Hyde! scooby FALSE FALSE FALSE FALSE
3002: Movie Happy Halloween, Scooby-Doo! scooby FALSE FALSE FALSE FALSE
3003: Movie Scooby-Doo! The Sword and the Scoob scooby FALSE FALSE FALSE FALSE
3004: 2 Dark Diner of Route 66! scooby TRUE FALSE FALSE FALSE
3005: 2 Total Jeopardy! scooby FALSE FALSE FALSE FALSE
Full Code #
library(tidyverse)
scooby_data <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-13/scoobydoo.csv")
x <-
scooby_data %>%
select(season, title,
starts_with("caught"),
starts_with("captured"),
starts_with("unmask"),
starts_with("snack"),
-contains("other"),
-contains("not")) %>%
filter(title!= "Wrestle Maniacs")
three_lines <- x %>%
pivot_longer(cols = -c("season","title")) %>%
separate(name,into = c("action","character"), sep = "_") %>%
pivot_wider(names_from = "action", values_from = "value")
one_line <- x %>%
pivot_longer(cols = -c("season","title"), names_to = c(".value","character"), names_sep = "_")
rlang::check_installed("data.table (>= 1.14.9)")
data.table::data.table(x) |>
data.table::melt(
id.vars = c("season", "title"),
# measure() is not actually exported but is silently parsed under the hood by melt()
# - uses `sep = "_"` to identify & process the column names (i.e. split with sep "_")
# - new column names (i.e. character) are provided unquoted/as bare symbols
# - value.name is equivalent to tidyr's `.value`
measure.vars = measure(value.name, character, sep = "_")
)
Original gist for reference.