{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A catalog of several million tasks Pythia can do\n", "\n", "T. Ben Thompson \n", "Michael Sklar \n", "2023-06-25\n", "\n", "We’re sharing datasets that we hope will be useful for language model\n", "interpretability.\n", "\n", "1. **Token-bigram and token-trigram prediction**: a dataset of n-gram\n", " statistics from [The Pile](https://pile.eleuther.ai)\n", " [\\[1\\]](#ref-pile) including tables of one and two token prompts\n", " with their most likely completions. One of the simplest “tasks” for\n", " a language model is bigram completion.\n", " - for example, during training, 99.8% of the time the model sees\n", " `\" telome\"`, the correct next token is `\"res\"`.\n", "2. **First token deletion**: a dataset constructed by differencing the\n", " outputs of Pythia-2.8B [\\[2\\]](#ref-biderman2023pythia) between four\n", " and five token prompts. This method highlights tokens that are\n", " extremely predictive in context.\n", " - for example, when prompted with `\", or common table\"`, the model\n", " predicts `\" expression\"`\n", " ([CTE](https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL#Common_table_expression))\n", " with probability 0.37. But, if we prompt with\n", " `\" chloride, or common table\"`, then the model predicts\n", " `\" salt\"` with probability 0.99.\n", "\n", "## The data\n", "\n", "In following sections we will give details on the construction and\n", "statistics of these datasets. 
Before continuing, we share some\n", "interactive data previews:\n", "\n", "- **Deletion**: the first 25000 rows of\n", " [pile_scan_4](https://huggingface.co/datasets/Confirm-Labs/pile_scan_4).\n", "- **Bigrams**: the entirety of\n", " [pile_top_bigrams](https://huggingface.co/datasets/Confirm-Labs/pile_top_bigrams),\n", " which contains bigrams with suffix probability greater than 50%.\n", "- **Trigrams**: the first 25000 rows of\n", " [pile_top_trigrams](https://huggingface.co/datasets/Confirm-Labs/pile_top_trigrams),\n", " which contains trigrams with suffix probability greater than 50% and\n", " count greater than 1000.\n", "\n", "## **Deletion**\n", "\n", "The columns of the table below:\n", "\n", "- `text`: the two prompts provided. The additional token of backwards\n", " context is surrounded by square brackets. The example in the\n", " introduction would be written `\"[_chloride],_or_common_table\"`.\n", "- `token_short`: the most likely next token predicted by Pythia-2.8B\n", " for the *four* token prompt.\n", "- `token_long`: the most likely next token predicted by Pythia-2.8B\n", " for the *five* token prompt.\n", "- `p_short`: the probability Pythia-2.8B assigns to `token_short`.\n", "- `p_long`: the probability Pythia-2.8B assigns to `token_long`.\n", "- `JS`: the Jensen-Shannon divergence between the model’s output\n", " distributions for the four and five token prompts.\n", "\n", "Note:\n", "\n", "- in the table, spaces are replaced with underscores for clarity.\n", "- there are offensive tokens in the dataset. We have not removed them.\n", "\n", "\n", "
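To make the `JS` column concrete, here is a minimal sketch of the\n", "Jensen-Shannon divergence between two next-token distributions. The\n", "toy 4-entry vectors below stand in for Pythia-2.8B’s full-vocabulary\n", "softmax outputs on the four and five token prompts; the helper name\n", "`js_divergence` and the example numbers are ours, for illustration\n", "only.\n", "\n", "```python\n", "import numpy as np\n", "\n", "def js_divergence(p, q):\n", "    # Jensen-Shannon divergence (natural log); ranges from 0 to ln(2).\n", "    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)\n", "    m = 0.5 * (p + q)\n", "\n", "    def kl(a, b):\n", "        # KL(a || b), skipping zero-probability entries of a.\n", "        mask = a > 0\n", "        return np.sum(a[mask] * np.log(a[mask] / b[mask]))\n", "\n", "    return 0.5 * kl(p, m) + 0.5 * kl(q, m)\n", "\n", "p = np.array([0.37, 0.33, 0.20, 0.10])   # e.g. the four token prompt\n", "q = np.array([0.005, 0.99, 0.0, 0.005])  # e.g. the five token prompt\n", "print(js_divergence(p, q))\n", "```\n", "\n", "Identical distributions give 0 and disjoint ones give ln 2 ≈ 0.693,\n", "so values near the top of the range flag prompts where the extra\n", "token of context changes the prediction drastically.\n", "\n", "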