Hutter Prize
The Hutter Prize is a cash prize funded by Marcus Hutter which rewards data compression improvements on a specific 1 GB English text file, with the goal of encouraging research in artificial intelligence (AI).
Launched in 2006, the prize awards 5000 euros for each one percent improvement (with 500,000 euros total funding)[1] in the compressed size of the file enwik9, which is the larger of two files used in the Large Text Compression Benchmark (LTCB);[2] enwik9 consists of the first 10⁹ bytes of a specific version of English Wikipedia.[3] The ongoing[4] competition is organized by Hutter, Matt Mahoney, and Jim Bowery.[1]
As of 2018, the text data of enwik8 and enwik9 remains a key tool for evaluating the performance of compression algorithms (as done in Hutter's LTCB) and of language models.[5][6]
Goals
The goal of the Hutter Prize is to encourage research in artificial intelligence (AI). The organizers believe that text compression and AI are equivalent problems. Hutter proved that the optimal behavior of a goal-seeking agent in an unknown but computable environment is to guess at each step that the environment is probably controlled by one of the shortest programs consistent with all interaction so far.[7] However, there is no general solution, because Kolmogorov complexity is not computable. Hutter proved that in a restricted case (called AIXItl), where the programs considered are limited to time t and length l, a solution can be computed in time O(t·2^l), which is still intractable.
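Although the Kolmogorov complexity of a string (the length of the shortest program that outputs it) cannot be computed, the size of any compressed encoding of the string bounds it from above, up to an additive constant that accounts for the decompressor. This is one way to read the prize's practice of scoring the compressed file and decompressor together. The minimal sketch below assumes only Python's standard zlib, bz2 and lzma modules (an illustrative choice, not anything the prize prescribes; the function name is hypothetical) and reports the smallest of the three outputs as such a bound.

```python
import bz2
import lzma
import zlib


def kolmogorov_upper_bound(data: bytes) -> tuple[str, int]:
    """Return the name and output size (bytes) of the best of three stdlib compressors.

    The true Kolmogorov complexity of `data` is uncomputable; any compressed
    length is only an upper bound on it (ignoring the fixed decompressor size).
    """
    candidates = {
        "zlib": zlib.compress(data, 9),
        "bz2": bz2.compress(data, 9),
        "lzma": lzma.compress(data, preset=9),
    }
    # Keep whichever compressor produced the shortest encoding.
    name, blob = min(candidates.items(), key=lambda kv: len(kv[1]))
    return name, len(blob)
```

For highly repetitive input, such as a short sentence repeated a thousand times, the bound falls far below the raw length; a stronger compressor can only tighten it, never reach the uncomputable true value with certainty.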
The organizers further believe that compressing natural language text is a hard AI problem, equivalent to passing the Turing test. Thus, progress toward one goal represents progress toward the other. They argue that predicting which characters are most likely to occur next in a text sequence requires vast real-world knowledge. A text compressor must solve the same problem in order to assign the shortest codes to the most likely text sequences.[8]
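The link between prediction and code length can be made concrete: an ideal entropy coder (for example, an arithmetic coder) spends about −log2 p bits on a symbol to which the model assigned probability p, so a model that predicts the next character better yields a shorter encoding. The following is a minimal sketch assuming a toy adaptive order-k character model with add-one (Laplace) smoothing; the function name and smoothing choice are hypothetical and far simpler than the context-mixing models of the PAQ family. It only totals the ideal code length and does not emit an actual bitstream.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(text: str, order: int = 2) -> float:
    """Total bits an ideal entropy coder would spend on `text` under an
    adaptive order-`order` character model with Laplace smoothing."""
    # Assumption: the alphabet is known up front (a real decoder would need it sent).
    alphabet = sorted(set(text))
    counts = defaultdict(lambda: defaultdict(int))  # context -> char -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]  # up to `order` preceding characters
        # Smoothed probability of ch given what has been seen in this context.
        p = (counts[ctx][ch] + 1) / (totals[ctx] + len(alphabet))
        bits += -math.log2(p)
        # Update only after coding, so a decoder could mirror the same model.
        counts[ctx][ch] += 1
        totals[ctx] += 1
    return bits
```

On repetitive text such as "abracadabra " repeated 50 times, the order-2 model costs far fewer bits than an order-0 model, because the two preceding characters almost always determine the next one. Knowing more about the text (here, longer context) means shorter codes.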
Most large language models and other neural network models are not eligible for the Hutter Prize, because they exceed its limits on computation and RAM[9] (see § Rules). Large language models typically need GPUs to run efficiently, and higher-end models such as OpenAI's GPT series require state-of-the-art GPUs such as Nvidia's. Thus, although neural network-based approaches hold the record for enwik9 compression by a substantial margin,[10] no winning entry to date has used these models.
Rules
The contest is open-ended and open to everyone. To enter, a competitor must submit a compression program and a decompressor that decompresses to the file enwik9.[3] It is also possible to submit a compressed file instead of the compression program. The total size of the compressed file and decompressor (as a Win32 or Linux executable) must be less than or equal to 99% of the previous prize-winning entry. For each one percent improvement, the competitor wins 5,000 euros.
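The threshold and payout arithmetic can be sketched as follows. This assumes the award is proportional to the relative improvement, fund × (L − S) / L, where L is the previous winning total and S the new total (compressed file plus decompressor). That formula is inferred from the 5,000-euros-per-percent rule and is consistent with the 3416 euros awarded for the first enwik8 win, but the official rules may differ in details such as rounding; the function names are hypothetical.

```python
def is_eligible(new_total: int, prev_total: int) -> bool:
    """True if the new total size is at most 99% of the previous winning total.

    Integer arithmetic avoids floating-point error at the exact 99% boundary.
    """
    return new_total * 100 <= prev_total * 99


def payout_euros(new_total: int, prev_total: int, fund: int = 500_000) -> float:
    """Assumed award: fund * (L - S) / L, i.e. 5,000 euros per percent when fund is 500,000.

    Entries that do not clear the 1% threshold receive nothing.
    """
    if not is_eligible(new_total, prev_total):
        return 0.0
    return fund * (prev_total - new_total) / prev_total
```

For example, under the original 50,000-euro enwik8 fund, going from the 18,324,887-byte baseline to 17,073,018 bytes gives `payout_euros(17_073_018, 18_324_887, fund=50_000)` of about 3415.76, matching the 3416 euros of the first award once rounded.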
Submissions must be published in order to allow independent verification. There is a 30-day waiting period for public comment before awarding a prize. In 2017, the rules were changed to require the release of the source code under a free software license, out of concern that "past submissions [which did not disclose their source code] had been useless to others and the ideas in them may be lost forever."[4]
To be eligible for the prize, a compression algorithm must be able to fully decompress the dataset on a single-core i7 CPU with 10 GB of RAM within 10 hours of running time.[11]
History
The prize was announced on August 6, 2006,[1] with a smaller text file, enwik8, consisting of 100 MB. On February 21, 2020, it was expanded by a factor of 10 to enwik9 (1 GB), and the prize fund likewise grew from 50,000 to 500,000 euros. The original prize baseline was 18,324,887 bytes, achieved by PAQ8F. The expanded prize baseline was 116 MB.
On August 20, 2006, Alexander Ratushnyak submitted PAQ8HKCC, a modified version of PAQ8H, which improved compression by 2.6% over PAQ8F. He continued to improve the compression to 3.0% with PAQ8HP1 on August 21, 4% with PAQ8HP2 on August 28, 4.9% with PAQ8HP3 on September 3, 5.9% with PAQ8HP4 on September 10, and 5.9% with PAQ8HP5 on September 25. At that point he was declared the first winner of the Hutter Prize and awarded 3416 euros, and the new baseline was set to 17,073,018 bytes.
Ratushnyak has since broken his record multiple times, becoming the second (on May 14, 2007, with PAQ8HP12 compressing enwik8 to 16,481,655 bytes and winning 1732 euros), third (on May 23, 2009, with decomp8 compressing the file to 15,949,688 bytes and winning 1614 euros), and fourth (on November 4, 2017, with phda compressing the file to 15,284,944 bytes and winning 2085 euros) winner of the Hutter Prize.
As of July 2023, Saurabh Kumar is the latest winner of the Hutter Prize, with fast-cmix compressing the file to 113,746,218 bytes and winning 5187 euros.[2]
References
- "500'000€ Prize for Compressing Human Knowledge". Hutter Prize. Retrieved 2023-01-08.
- Mahoney, Matt (2022-12-02). "Large Text Compression Benchmark". Retrieved 2023-01-08.
- Mahoney, Matt (2011-09-01). "About the Test Data". Retrieved 2022-11-16.
- "Human Knowledge Compression Contest Frequently Asked Questions & Answers". Hutter Prize. Retrieved 2022-10-14.
- Radford, Alec; Wu, Jeff; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (2019). "Language Models are Unsupervised Multitask Learners" (PDF).
- Chiang, Ted (2023-02-09). "ChatGPT Is a Blurry JPEG of the Web". The New Yorker. Retrieved 2023-07-23.
- Hutter, Marcus (2005). Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Texts in Theoretical Computer Science an EATCS Series. Springer. doi:10.1007/b138233. ISBN 3-540-22139-5.
- Mahoney, Matt (2009-07-23). "Rationale for a Large Text Compression Benchmark". Retrieved 2022-11-16.
- Hutter, Marcus. "500'000€ Prize for Compressing Human Knowledge". prize.hutter1.net. Retrieved 2023-09-13.
- "Large Text Compression Benchmark". www.mattmahoney.net. Retrieved 2023-09-13.
- Hutter, Marcus. "500'000€ Prize for Compressing Human Knowledge". prize.hutter1.net. Retrieved 2023-09-13.