tidyr pivot trick: names_to -> .value

I originally shared this tip for extracting data from column names as a tweet, and I’ve found myself sharing it over and over again since. Here’s a brief transcription now that I’m no longer on Twitter.

One of my favourite data-wrangling tricks is tidyr::pivot_longer’s names_to & .value sentinel - use-cases don’t come up often, but it’s so deeply ✨ satisfying ✨ when it does!

Scenario #

Let’s say you have this starting dataframe:

# A tibble: 601 × 22
   season title       caught_fred caught_daphnie caught_velma caught_shaggy
   <chr>  <chr>       <chr>       <chr>          <chr>        <chr>        
 1 1      What a Nig… FALSE       FALSE          FALSE        TRUE         
 2 1      A Clue for… FALSE       FALSE          FALSE        TRUE         
 3 1      Hassle in … FALSE       FALSE          FALSE        FALSE        
 4 1      Mine Your … TRUE        FALSE          FALSE        FALSE        
 5 1      Decoy for … FALSE       FALSE          FALSE        FALSE        
 6 1      What the H… TRUE        FALSE          FALSE        FALSE        
 7 1      Never Ape … TRUE        FALSE          FALSE        FALSE        
 8 1      Foul Play … FALSE       FALSE          FALSE        FALSE        
 9 1      The Backst… FALSE       FALSE          FALSE        TRUE         
10 1      Bedlam in … FALSE       FALSE          FALSE        TRUE         
# ℹ 591 more rows
# ℹ 16 more variables: caught_scooby <chr>, captured_fred <chr>,
#   captured_daphnie <chr>, captured_velma <chr>, captured_shaggy <chr>,
#   captured_scooby <chr>, unmask_fred <chr>, unmask_daphnie <chr>,
#   unmask_velma <chr>, unmask_shaggy <chr>, unmask_scooby <chr>,
#   snack_fred <chr>, snack_daphnie <chr>, snack_velma <chr>,
#   snack_shaggy <chr>, snack_scooby <chr>
# ℹ Use `print(n = ...)` to see more rows

and you want this ending dataframe:

# A tibble: 3,005 × 7
   season title                     character caught captured unmask snack
   <chr>  <chr>                     <chr>     <chr>  <chr>    <chr>  <chr>
 1 1      What a Night for a Knight fred      FALSE  FALSE    FALSE  TRUE 
 2 1      What a Night for a Knight daphnie   FALSE  FALSE    FALSE  FALSE
 3 1      What a Night for a Knight velma     FALSE  FALSE    FALSE  FALSE
 4 1      What a Night for a Knight shaggy    TRUE   FALSE    FALSE  FALSE
 5 1      What a Night for a Knight scooby    TRUE   FALSE    TRUE   FALSE
 6 1      A Clue for Scooby Doo     fred      FALSE  TRUE     TRUE   FALSE
 7 1      A Clue for Scooby Doo     daphnie   FALSE  TRUE     FALSE  FALSE
 8 1      A Clue for Scooby Doo     velma     FALSE  TRUE     FALSE  TRUE 
 9 1      A Clue for Scooby Doo     shaggy    TRUE   FALSE    FALSE  FALSE
10 1      A Clue for Scooby Doo     scooby    FALSE  FALSE    FALSE  FALSE
# ℹ 2,995 more rows
# ℹ Use `print(n = ...)` to see more rows

How do you extract the characters (Fred, Velma, Daphnie, Shaggy, Scooby) from the column names into a character column?

Past me #

Hmm, well, I can pivot_longer, then separate the name, then pivot_wider again?

three_lines <- x %>% 
  pivot_longer(cols = -c("season","title")) %>% 
  separate(name,into = c("action","character"), sep = "_") %>% 
  pivot_wider(names_from = "action", values_from = "value")

# A tibble: 3,005 × 7
   season title                     character caught captured unmask snack
   <chr>  <chr>                     <chr>     <chr>  <chr>    <chr>  <chr>
 1 1      What a Night for a Knight fred      FALSE  FALSE    FALSE  TRUE 
 2 1      What a Night for a Knight daphnie   FALSE  FALSE    FALSE  FALSE
 3 1      What a Night for a Knight velma     FALSE  FALSE    FALSE  FALSE
 4 1      What a Night for a Knight shaggy    TRUE   FALSE    FALSE  FALSE
 5 1      What a Night for a Knight scooby    TRUE   FALSE    TRUE   FALSE
 6 1      A Clue for Scooby Doo     fred      FALSE  TRUE     TRUE   FALSE
 7 1      A Clue for Scooby Doo     daphnie   FALSE  TRUE     FALSE  FALSE
 8 1      A Clue for Scooby Doo     velma     FALSE  TRUE     FALSE  TRUE 
 9 1      A Clue for Scooby Doo     shaggy    TRUE   FALSE    FALSE  FALSE
10 1      A Clue for Scooby Doo     scooby    FALSE  FALSE    FALSE  FALSE
# ℹ 2,995 more rows
# ℹ Use `print(n = ...)` to see more rows

This seems to work okay!

But what if I told you there’s a one-liner for this exact situation? #

pivot_longer has a “names_to” argument that takes a special “sentinel value” called ".value" - it immediately re-pivots the columns so that “character” (fred, daphne etc) stays as a column and then the “.value” part becomes colnames again (caught, captured etc)

one_line <- x %>% 
  pivot_longer(
    cols = -c("season","title"), 
    names_to = c(".value","character"), 
    names_sep = "_"
  )

# A tibble: 3,005 × 7
   season title                     character caught captured unmask snack
   <chr>  <chr>                     <chr>     <chr>  <chr>    <chr>  <chr>
 1 1      What a Night for a Knight fred      FALSE  FALSE    FALSE  TRUE 
 2 1      What a Night for a Knight daphnie   FALSE  FALSE    FALSE  FALSE
 3 1      What a Night for a Knight velma     FALSE  FALSE    FALSE  FALSE
 4 1      What a Night for a Knight shaggy    TRUE   FALSE    FALSE  FALSE
 5 1      What a Night for a Knight scooby    TRUE   FALSE    TRUE   FALSE
 6 1      A Clue for Scooby Doo     fred      FALSE  TRUE     TRUE   FALSE
 7 1      A Clue for Scooby Doo     daphnie   FALSE  TRUE     FALSE  FALSE
 8 1      A Clue for Scooby Doo     velma     FALSE  TRUE     FALSE  TRUE 
 9 1      A Clue for Scooby Doo     shaggy    TRUE   FALSE    FALSE  FALSE
10 1      A Clue for Scooby Doo     scooby    FALSE  FALSE    FALSE  FALSE
# ℹ 2,995 more rows
# ℹ Use `print(n = ...)` to see more rows

Why use many line when one line do trick?

You can read more about this pivot_longer feature in this tidyr vignette.

Bonus: data.table equivalent #

I learned later (from Hadley quote-tweeting my thread) that this feature was inspired by a similar feature in data.table, so I wanted to track down how it would be done there. As it turns out, kind of difficult because the feature isn’t in the CRAN version: it was in development as of October 2020, merged to main in May 2021, and still hasn’t made it onto CRAN because of some data.table governance issues (which are finally getting close to being resolved?)

In any case, with the main branch of rdatatable/data.table as of today, here’s how you could do the same pivot trick:

# requires development version of data.table and/or 1.15.0+ 
rlang::check_installed("data.table (>= 1.14.99)")
data.table::data.table(x) |> 
  data.table::melt(
    id.vars = c("season", "title"),
    # measure() is not actually exported but is silently parsed under the hood by melt()
    # - uses `sep = "_"` to identify & process the column names (i.e. split with sep "_")
    # - new column names (i.e. character) are provided unquoted
    # - value.name is equivalent to tidyr's `.value`
    measure.vars = measure(value.name, character, sep = "_")
  )

      season                                title character caught captured unmask  snack
      <char>                               <char>    <char> <char>   <char> <char> <char>
   1:      1            What a Night for a Knight      fred  FALSE    FALSE  FALSE   TRUE
   2:      1                A Clue for Scooby Doo      fred  FALSE     TRUE   TRUE  FALSE
   3:      1                 Hassle in the Castle      fred  FALSE    FALSE   TRUE   TRUE
   4:      1               Mine Your Own Business      fred   TRUE    FALSE   TRUE  FALSE
   5:      1                Decoy for a Dognapper      fred  FALSE    FALSE  FALSE  FALSE
  ---                                                                                    
3001:      2 The Dreaded Remake of Jekyll & Hyde!    scooby  FALSE    FALSE  FALSE  FALSE
3002:  Movie         Happy Halloween, Scooby-Doo!    scooby  FALSE    FALSE  FALSE  FALSE
3003:  Movie  Scooby-Doo! The Sword and the Scoob    scooby  FALSE    FALSE  FALSE  FALSE
3004:      2              Dark Diner of Route 66!    scooby   TRUE    FALSE  FALSE  FALSE
3005:      2                      Total Jeopardy!    scooby  FALSE    FALSE  FALSE  FALSE

Full Code #

library(tidyverse)

scooby_data <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-13/scoobydoo.csv")

x <- 
  scooby_data %>% 
  select(season, title, 
    starts_with("caught"),
    starts_with("captured"),
    starts_with("unmask"),
    starts_with("snack"),
    -contains("other"),
    -contains("not")) %>% 
  filter(title!= "Wrestle Maniacs")

three_lines <- x %>% 
  pivot_longer(cols = -c("season","title")) %>% 
  separate(name,into = c("action","character"), sep = "_") %>% 
  pivot_wider(names_from = "action", values_from = "value")

one_line <- x %>% 
  pivot_longer(cols = -c("season","title"), names_to = c(".value","character"), names_sep = "_")
  
rlang::check_installed("data.table (>= 1.14.9)")
data.table::data.table(x) |> 
  data.table::melt(
    id.vars = c("season", "title"),
    # measure() is not actually exported but is silently parsed under the hood by melt()
    # - uses `sep = "_"` to identify & process the column names (i.e. split with sep "_")
    # - new column names (i.e. character) are provided unquoted/as bare symbols
    # - value.name is equivalent to tidyr's `.value`
    measure.vars = measure(value.name, character, sep = "_")
  )