tidyr pivot trick: names_to -> .value
One of my favourite data-wrangling tricks is tidyr::pivot_longer’s
names_to
& .value
sentinel - use-cases don’t come up often, but it’s so deeply
✨ satisfying ✨ when it does!
Scenario #
Let’s say you have this starting dataframe:
# A tibble: 601 × 22
season title caught_fred caught_daphnie caught_velma caught_shaggy
<chr> <chr> <chr> <chr> <chr> <chr>
1 1 What a Nig… FALSE FALSE FALSE TRUE
2 1 A Clue for… FALSE FALSE FALSE TRUE
3 1 Hassle in … FALSE FALSE FALSE FALSE
4 1 Mine Your … TRUE FALSE FALSE FALSE
5 1 Decoy for … FALSE FALSE FALSE FALSE
6 1 What the H… TRUE FALSE FALSE FALSE
7 1 Never Ape … TRUE FALSE FALSE FALSE
8 1 Foul Play … FALSE FALSE FALSE FALSE
9 1 The Backst… FALSE FALSE FALSE TRUE
10 1 Bedlam in … FALSE FALSE FALSE TRUE
# ℹ 591 more rows
# ℹ 16 more variables: caught_scooby <chr>, captured_fred <chr>,
# captured_daphnie <chr>, captured_velma <chr>, captured_shaggy <chr>,
# captured_scooby <chr>, unmask_fred <chr>, unmask_daphnie <chr>,
# unmask_velma <chr>, unmask_shaggy <chr>, unmask_scooby <chr>,
# snack_fred <chr>, snack_daphnie <chr>, snack_velma <chr>,
# snack_shaggy <chr>, snack_scooby <chr>
# ℹ Use `print(n = ...)` to see more rows
and you want this ending dataframe:
# A tibble: 3,005 × 7
season title character caught captured unmask snack
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 What a Night for a Knight fred FALSE FALSE FALSE TRUE
2 1 What a Night for a Knight daphnie FALSE FALSE FALSE FALSE
3 1 What a Night for a Knight velma FALSE FALSE FALSE FALSE
4 1 What a Night for a Knight shaggy TRUE FALSE FALSE FALSE
5 1 What a Night for a Knight scooby TRUE FALSE TRUE FALSE
6 1 A Clue for Scooby Doo fred FALSE TRUE TRUE FALSE
7 1 A Clue for Scooby Doo daphnie FALSE TRUE FALSE FALSE
8 1 A Clue for Scooby Doo velma FALSE TRUE FALSE TRUE
9 1 A Clue for Scooby Doo shaggy TRUE FALSE FALSE FALSE
10 1 A Clue for Scooby Doo scooby FALSE FALSE FALSE FALSE
# ℹ 2,995 more rows
# ℹ Use `print(n = ...)` to see more rows
How do you extract the characters (Fred, Velma, Daphnie, Shaggy, Scooby) from the
column names into a character
column?
Past me #
Hmm, well, I can pivot_longer, then separate the name, then pivot_wider again?
three_lines <- x %>%
pivot_longer(cols = -c("season","title")) %>%
separate(name,into = c("action","character"), sep = "_") %>%
pivot_wider(names_from = "action", values_from = "value")
# A tibble: 3,005 × 7
season title character caught captured unmask snack
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 What a Night for a Knight fred FALSE FALSE FALSE TRUE
2 1 What a Night for a Knight daphnie FALSE FALSE FALSE FALSE
3 1 What a Night for a Knight velma FALSE FALSE FALSE FALSE
4 1 What a Night for a Knight shaggy TRUE FALSE FALSE FALSE
5 1 What a Night for a Knight scooby TRUE FALSE TRUE FALSE
6 1 A Clue for Scooby Doo fred FALSE TRUE TRUE FALSE
7 1 A Clue for Scooby Doo daphnie FALSE TRUE FALSE FALSE
8 1 A Clue for Scooby Doo velma FALSE TRUE FALSE TRUE
9 1 A Clue for Scooby Doo shaggy TRUE FALSE FALSE FALSE
10 1 A Clue for Scooby Doo scooby FALSE FALSE FALSE FALSE
# ℹ 2,995 more rows
# ℹ Use `print(n = ...)` to see more rows
This seems to work okay!
But what if I told you there’s a one-liner for this exact situation? #
pivot_longer has a “names_to” argument that takes a special “sentinel value”
called ".value"
- it immediately re-pivots the columns so that “character”
(fred, daphne etc) stays as a column and then the “.value” part becomes colnames
again (caught, captured etc)
one_line <- x %>%
pivot_longer(
cols = -c("season","title"),
names_to = c(".value","character"),
names_sep = "_"
)
# A tibble: 3,005 × 7
season title character caught captured unmask snack
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 What a Night for a Knight fred FALSE FALSE FALSE TRUE
2 1 What a Night for a Knight daphnie FALSE FALSE FALSE FALSE
3 1 What a Night for a Knight velma FALSE FALSE FALSE FALSE
4 1 What a Night for a Knight shaggy TRUE FALSE FALSE FALSE
5 1 What a Night for a Knight scooby TRUE FALSE TRUE FALSE
6 1 A Clue for Scooby Doo fred FALSE TRUE TRUE FALSE
7 1 A Clue for Scooby Doo daphnie FALSE TRUE FALSE FALSE
8 1 A Clue for Scooby Doo velma FALSE TRUE FALSE TRUE
9 1 A Clue for Scooby Doo shaggy TRUE FALSE FALSE FALSE
10 1 A Clue for Scooby Doo scooby FALSE FALSE FALSE FALSE
# ℹ 2,995 more rows
# ℹ Use `print(n = ...)` to see more rows
Why use many line when one line do trick?
You can read more about this pivot_longer feature in this tidyr vignette.
Bonus: data.table equivalent #
Implementing this idea (inspired by data.table) reduced a lot of the need to use pivot_wider() when tidying data https://t.co/TVltc5ZYNZ
— Hadley Wickham (@hadleywickham) March 5, 2022
I learned later (from Hadley quote-tweeting my thread) that this feature was
inspired by a similar feature in data.table
, so I wanted to track down how it
would be done there. As it turns out, kind of difficult because the feature isn’t
in the CRAN version: it was in development as of October 2020, merged to main in
May 2021, and still hasn’t made it onto CRAN because of some data.table governance
issues (which I won’t comment on in this post, that’s a whole can of worms for another
time).
In any case, with the main branch of rdatatable/data.table as of today, here’s how you could do the same pivot trick:
# requires development version of data.table
rlang::check_installed("data.table (>= 1.14.9)")
data.table::data.table(x) |>
data.table::melt(
id.vars = c("season", "title"),
# measure() is not actually exported but is silently parsed under the hood by melt()
# - uses `sep = "_"` to identify & process the column names (i.e. split with sep "_")
# - new column names (i.e. character) are provided unquoted
# - value.name is equivalent to tidyr's `.value`
measure.vars = measure(value.name, character, sep = "_")
)
season title character caught captured unmask snack
<char> <char> <char> <char> <char> <char> <char>
1: 1 What a Night for a Knight fred FALSE FALSE FALSE TRUE
2: 1 A Clue for Scooby Doo fred FALSE TRUE TRUE FALSE
3: 1 Hassle in the Castle fred FALSE FALSE TRUE TRUE
4: 1 Mine Your Own Business fred TRUE FALSE TRUE FALSE
5: 1 Decoy for a Dognapper fred FALSE FALSE FALSE FALSE
---
3001: 2 The Dreaded Remake of Jekyll & Hyde! scooby FALSE FALSE FALSE FALSE
3002: Movie Happy Halloween, Scooby-Doo! scooby FALSE FALSE FALSE FALSE
3003: Movie Scooby-Doo! The Sword and the Scoob scooby FALSE FALSE FALSE FALSE
3004: 2 Dark Diner of Route 66! scooby TRUE FALSE FALSE FALSE
3005: 2 Total Jeopardy! scooby FALSE FALSE FALSE FALSE
Full Code #
library(tidyverse)
scooby_data <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-13/scoobydoo.csv")
x <-
scooby_data %>%
select(season, title,
starts_with("caught"),
starts_with("captured"),
starts_with("unmask"),
starts_with("snack"),
-contains("other"),
-contains("not")) %>%
filter(title!= "Wrestle Maniacs")
three_lines <- x %>%
pivot_longer(cols = -c("season","title")) %>%
separate(name,into = c("action","character"), sep = "_") %>%
pivot_wider(names_from = "action", values_from = "value")
one_line <- x %>%
pivot_longer(cols = -c("season","title"), names_to = c(".value","character"), names_sep = "_")
rlang::check_installed("data.table (>= 1.14.9)")
data.table::data.table(x) |>
data.table::melt(
id.vars = c("season", "title"),
# measure() is not actually exported but is silently parsed under the hood by melt()
# - uses `sep = "_"` to identify & process the column names (i.e. split with sep "_")
# - new column names (i.e. character) are provided unquoted/as bare symbols
# - value.name is equivalent to tidyr's `.value`
measure.vars = measure(value.name, character, sep = "_")
)
Original gist for reference.