The Large Text Compression Benchmark and
the Hutter Prize
are designed to encourage research in natural language processing (NLP).
I argue that compressing, or equivalently, modeling natural language text is "AI-hard". Solving
the compression problem is equivalent to solving hard NLP problems such as speech
recognition, optical character recognition (OCR), and language translation. I further argue that
ideal text compression, if it were possible, would be
equivalent to passing the Turing test for artificial
intelligence (AI), proposed in 1950 [1]. Currently, no machine can pass this test [2].
Also in 1950, Claude Shannon estimated the entropy (compression limit)
of written English to be about 1 bit per character [3]. To date, no compression
program has achieved this level. In this paper I also
describe the rationale for choosing this particular data set and the contest rules.
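Shannon's estimate is a rate in bits per character, which is how compression results on text are usually compared: compressed size in bits divided by the number of input characters. The equivalence between compressing and modeling rests on the fact that a model assigning probability p to the next character can code it in about -log2 p bits. The following is only a minimal illustrative sketch, assuming Python 3 and its standard-library `lzma` module; it is not the benchmark's own tooling. It measures the rate two ways for a text file. The first is the ideal code length under an order-0 model, which predicts each character from its overall frequency and ignores context. The second is the output size of an off-the-shelf general-purpose compressor. The function names `order0_entropy` and `lzma_bits_per_character` are hypothetical.

```python
import lzma
import math
import sys
from collections import Counter


def order0_entropy(text: str) -> float:
    """Average ideal code length, -log2 p(c), in bits per character under a
    hypothetical model that predicts each character from its frequency alone."""
    counts = Counter(text)
    n = len(text)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def lzma_bits_per_character(text: str) -> float:
    """Size of the LZMA-compressed UTF-8 encoding, in bits, per input character.
    Assumes mostly ASCII text, where one character is one byte."""
    compressed = lzma.compress(text.encode("utf-8"), preset=9)
    return 8 * len(compressed) / len(text)


def main() -> None:
    if len(sys.argv) < 2:
        print("usage: python bpc.py FILE")
        return
    with open(sys.argv[1], encoding="utf-8") as f:
        text = f.read()
    if not text:
        print("empty file")
        return
    print(f"order-0 entropy: {order0_entropy(text):.3f} bits/char")
    print(f"lzma (preset 9): {lzma_bits_per_character(text):.3f} bits/char")


if __name__ == "__main__":
    main()
```

Run it on a large sample of ordinary English prose. Both figures should land well above Shannon's estimate of about 1 bit per character. The order-0 model ignores context entirely. LZMA exploits only repeated strings, not the grammar or meaning that a human predictor uses. Highly repetitive toy input is not a fair test, because it can compress far below 1 bit per character without any understanding of language.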