Top-p sampling, also called nucleus sampling, is a technique for autoregressive language model decoding proposed by Ari Holtzman and colleagues in 2019. Before the introduction of nucleus sampling, maximum likelihood decoding and beam search were the standard techniques for text generation, but both of these decoding strategies are prone to generating text that is repetitive and otherwise unnatural. Top-p sampling avoids this by setting a threshold p and restricting sampling to the smallest set of most probable tokens whose cumulative probability is at least p. The probabilities of the tokens in this set are then rescaled to sum to 1, and all other tokens are discarded.
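The procedure described above can be illustrated with a minimal sketch in plain Python. It assumes the model's next-token distribution is already available as a list of probabilities indexed by token id; the function names `top_p_filter` and `sample_top_p` are hypothetical and chosen only for this illustration, not taken from any library.

```python
import random


def top_p_filter(probs, p):
    """Return {token_id: rescaled probability} for the nucleus.

    `probs` is assumed to be a list of next-token probabilities summing to 1.
    The nucleus is the smallest set of most probable tokens whose cumulative
    probability is at least `p`.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Sort token ids from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break  # threshold reached; all remaining tokens are discarded
    # Rescale the kept probabilities so that they sum to 1.
    return {i: probs[i] / cumulative for i in nucleus}


def sample_top_p(probs, p, rng=random):
    """Draw one token id from the rescaled nucleus distribution."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Illustrative distribution over four tokens (made-up values).
probs = [0.5, 0.3, 0.15, 0.05]
kept = top_p_filter(probs, p=0.9)
token = sample_top_p(probs, p=0.9)
```

With these made-up values and p = 0.9, the three most probable tokens form the nucleus and the fourth can never be sampled. Practical implementations typically operate on logits in batches on an accelerator, so this sketch only conveys the logic.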
Top-k sampling is similar, except that the sample is taken from the k highest-probability tokens regardless of their cumulative probability. The advantage of top-p sampling is that it avoids the difficult problem of choosing an optimal value of k, which can vary depending on the shape of the output distribution and on the particular task and dataset.
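The difference between the two truncation rules can be made concrete with a small sketch, again in plain Python with hypothetical helper names (`top_k_filter`, `nucleus_size`) and made-up distributions. It compares how many tokens each rule keeps for a sharply peaked distribution and for a flat one.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and rescale them to sum to 1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = order[:k]
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def nucleus_size(probs, p):
    """Number of tokens top-p sampling would keep for threshold p."""
    cumulative = 0.0
    for size, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return size
    return len(probs)


# Two illustrative next-token distributions (made-up values).
peaked = [0.90, 0.05, 0.03, 0.02]  # the model is confident
flat = [0.25, 0.25, 0.25, 0.25]    # the model is uncertain

# A fixed k keeps the same number of tokens in both cases ...
k_sizes = (len(top_k_filter(peaked, 2)), len(top_k_filter(flat, 2)))
# ... while the nucleus grows or shrinks with the distribution's shape.
p_sizes = (nucleus_size(peaked, 0.9), nucleus_size(flat, 0.9))
```

In this example k = 2 keeps two tokens in both cases, whereas p = 0.9 keeps a single token for the peaked distribution and all four for the flat one, which is the adaptivity the paragraph above refers to.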
Top-p sampling is used in popular large language model applications such as ChatGPT and is implemented in language modeling libraries and platforms such as Hugging Face Transformers and Cohere.
References
- Holtzman, Ari; Buys, Jan; Du, Li; Forbes, Maxwell; Choi, Yejin (22 April 2019). "The Curious Case of Neural Text Degeneration". arXiv:1904.09751.
- Chiusano, Fabio (28 January 2022). "Two minutes NLP — Most used Decoding Methods for Language Models". Medium. Retrieved 23 August 2023.
- McCaffrey, James D. (14 October 2021). "Nucleus Sampling for Natural Language Processing". Retrieved 23 August 2023.
- von Platen, Patrick. "How to generate text: using different decoding methods for language generation with Transformers". Hugging Face. Retrieved 23 August 2023.