A catalog of several million tasks Pythia can do.

Authors

T. Ben Thompson

Michael Sklar

Published

June 25, 2023

We’re sharing datasets that we hope will be useful for language model interpretability.

  1. Token-bigram and token-trigram prediction: a dataset of n-gram statistics from The Pile (Gao et al. 2020), including tables of one and two token prompts with their most likely completions. One of the simplest “tasks” for a language model is bigram completion (see the counting sketch after this list).
    • for example, during training, when the model sees " telome", the correct next token is "res" 99.8% of the time.
  2. First token deletion: a dataset constructed by differencing the outputs of Pythia-2.8B (Biderman et al. 2023) between four and five token prompts. This method highlights tokens that are extremely predictive in context.
    • for example, when prompted with ", or common table", the model predicts " expression" (CTE) with probability 0.37. But, if we prompt with " chloride, or common table", then the model predicts " salt" with probability 0.99.
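
Under the hood, these n-gram statistics are just counts: for each one or two token prefix, tally how often each next token follows it in the corpus and normalize. Below is a minimal counting sketch for the bigram case. It is our own illustration rather than the actual pipeline, and it assumes you already have a tokenized corpus as an iterable of token IDs (`token_stream`); the thresholds are illustrative, loosely mirroring the filters on the released tables described below.

```python
from collections import Counter

def top_bigrams(token_stream, min_count=1000, min_suffix_prob=0.5):
    """Tally token bigrams and keep (prefix, suffix) pairs with a dominant suffix.

    token_stream: iterable of token IDs, e.g. The Pile after tokenization.
    Returns (prefix, suffix, count, suffix_probability) tuples, most
    predictable prefixes first.
    """
    pair_counts = Counter()
    prefix_counts = Counter()
    prev = None
    for tok in token_stream:
        if prev is not None:
            pair_counts[(prev, tok)] += 1
            prefix_counts[prev] += 1
        prev = tok

    results = []
    for (prefix, suffix), count in pair_counts.items():
        prob = count / prefix_counts[prefix]
        if count >= min_count and prob >= min_suffix_prob:
            results.append((prefix, suffix, count, prob))
    results.sort(key=lambda r: r[3], reverse=True)
    return results
```

On a corpus the size of The Pile, a single in-memory Counter is not practical; a real pipeline would shard the counting, but the arithmetic is the same.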

The data

In the following sections we will give details on the construction and statistics of these datasets. Before continuing, we share some interactive data previews (a loading sketch follows the list):

  • Deletion: the first 25000 rows of pile_scan_4.
  • Bigrams: the entirety of pile_top_bigrams, which contains bigrams with suffix probability greater than 50%.
  • Trigrams: the first 25000 rows of pile_top_trigrams, which contains trigrams with suffix probability greater than 50% and count greater than 1000.
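
If you would rather work with the full files than the previews, they load with standard dataframe tooling. A minimal sketch with pandas, assuming you have downloaded the three datasets locally as Parquet files (the filenames are placeholders for wherever you put your copies):

```python
import pandas as pd

# Placeholder filenames: point these at your local copies of the datasets.
scan = pd.read_parquet("pile_scan_4.parquet")
bigrams = pd.read_parquet("pile_top_bigrams.parquet")
trigrams = pd.read_parquet("pile_top_trigrams.parquet")

print(scan.head())      # deletion rows: four/five token prompts and divergences
print(bigrams.head())   # bigrams with suffix probability > 50%
print(trigrams.head())  # trigrams with suffix probability > 50% and count > 1000
```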

The columns of the table below:

  • text: the two prompts, combined into one string; the additional token of backwards context from the five token prompt is surrounded by square brackets. The example in the introduction would be written "[_chloride],_or_common_table".
  • token_short: the most likely next token predicted by Pythia-2.8B for the four token prompt.
  • token_long: the most likely next token predicted by Pythia-2.8B for the five token prompt.
  • p_short: the probability Pythia-2.8B assigns to token_short.
  • p_long: the probability Pythia-2.8B assigns to token_long.
  • JS: the Jensen-Shannon divergence between the model’s output distributions for the four and five token prompts (a computation sketch follows this list).
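
As a concrete illustration of the differencing, the sketch below recomputes one deletion row with Hugging Face transformers. It is not the pipeline used to build pile_scan_4, just a minimal version of the same comparison: take the next-token distributions of Pythia-2.8B on the four and five token prompts and measure the Jensen-Shannon divergence between them (natural-log base here; the non-deduped checkpoint name is an assumption).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-2.8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def next_token_dist(prompt):
    """Next-token probability distribution the model assigns after `prompt`."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two probability vectors (natural log)."""
    m = 0.5 * (p + q)
    kl_pm = torch.sum(p * (torch.log(p + eps) - torch.log(m + eps)))
    kl_qm = torch.sum(q * (torch.log(q + eps) - torch.log(m + eps)))
    return 0.5 * (kl_pm + kl_qm)

p_short = next_token_dist(", or common table")           # four token prompt
p_long = next_token_dist(" chloride, or common table")   # five token prompt

for name, dist in [("short", p_short), ("long", p_long)]:
    prob, token_id = dist.max(dim=-1)
    print(name, repr(tokenizer.decode([int(token_id)])), round(float(prob), 3))
print("JS:", float(js_divergence(p_short, p_long)))
```

A large JS value flags rows where the single extra token of backwards context sharply changes the prediction, as in the " chloride" example above.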

Note:

  • in the table, spaces are replaced with underscores for clarity.
  • there are offensive tokens in the dataset. We have not removed them.