notes/notes/ANS Theory.md
2025-05-09 12:37:48 -06:00


https://arxiv.org/abs/1311.2540

In standard numeral systems, different digits are treated as containing the same amount of information. A 7 is stored using the same amount of info as a 9, which is stored using the same amount of info as a 1, that is, 1 digit.

This makes the amount of information a single digit stores uniform across all digits. However, that's far from the most efficient way to represent most datasets, because real-world data rarely follows a uniform distribution.

ANS theory is based around the idea that digits that occur more often can be stored in a way that requires less information, and digits that occur less often can be stored using more information.
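To put rough numbers on "less information" and "more information", here is a minimal sketch using the standard Shannon measure, where an outcome with probability p costs log2(1/p) bits. The function name `information_bits` and the example probabilities are made up for illustration.

```python
import math

def information_bits(probability: float) -> float:
    """Shannon information content, in bits, of an outcome with this probability."""
    return math.log2(1 / probability)

# A decimal digit drawn uniformly (probability 1/10) costs about 3.32 bits.
uniform_cost = information_bits(1 / 10)

# Hypothetical skewed distribution: a digit seen half the time costs 1 bit,
# while a digit seen 1% of the time costs about 6.64 bits.
common_cost = information_bits(0.5)
rare_cost = information_bits(0.01)
```

An ideal coder would spend about that many bits on each digit, rather than the same amount on every digit.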

Taking a look at the standard binary numeral system, there are two digits in the set (0 and 1). Given a natural number represented in binary, e.g. 1010, there are two different ways to add information to that number:

  1. We can add a digit to the most significant position. As an example, adding a 1 to the above value would result in 11010. Doing this means that the added digit stores information about large ranges. In the provided example, this means that setting that digit changes the value by 16.
  2. We can add a digit to the least significant position. As an example, adding a 1 to the above value would result in 10101. Changing the added digit will only change the resulting natural number by 1.
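The two options above can be checked with plain integer arithmetic. This is only a sketch; the variable names are illustrative.

```python
x = 0b1010  # 10 in decimal

# Option 1: put a 1 in a new most significant position.
msb_added = x | (1 << x.bit_length())  # 0b11010

# Option 2: put a 1 in a new least significant position.
lsb_added = (x << 1) | 1               # 0b10101

# How much does the choice of the new digit (1 instead of 0) change the value?
msb_effect = msb_added - x             # compared with leaving the new top digit at 0
lsb_effect = lsb_added - (x << 1)      # compared with appending a 0 instead

print(bin(msb_added), msb_effect)  # 0b11010 16
print(bin(lsb_added), lsb_effect)  # 0b10101 1
```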

Let x be a natural number and s the digit we're adding. In a standard binary system, adding s to the least significant position produces the new number 2x + s, and x (the number before the addition) becomes the index of that new number among the even numbers (when s = 0) or the odd numbers (when s = 1), counting from zero. With ANS, the goal is to make that split asymmetrical, so that more common values get a denser representation.
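As a sketch of the symmetric binary case only (not of ANS itself), adding a digit at the least significant position and taking it back off again might look like this. The helper names `append_digit` and `remove_digit` are hypothetical.

```python
def append_digit(x: int, s: int) -> int:
    """Add binary digit s at the least significant position of x."""
    return 2 * x + s

def remove_digit(new_x: int) -> tuple[int, int]:
    """Undo append_digit: recover the previous number and the digit that was added."""
    return new_x // 2, new_x % 2

evens = list(range(0, 40, 2))  # 0, 2, 4, ...
odds = list(range(1, 40, 2))   # 1, 3, 5, ...

x = 0b1010  # 10
# x is the (zero-based) position of the new number among the evens or the odds.
assert append_digit(x, 0) == evens[x]  # 20
assert append_digit(x, 1) == odds[x]   # 21
assert remove_digit(append_digit(x, 1)) == (x, 1)
```

Both digits split the natural numbers evenly here, which is the symmetry an asymmetric system would replace with an uneven split.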

Arithmetic Coding

Arithmetic coding works by taking a stream of data and converting it into a single number of arbitrary precision between 0.00 and 1.00. This is based on the idea that the probabilities of all possible outcomes always sum to 100%.

For example, the probability of a coin flip resulting in tails is 50%, and the probability of a coin flip resulting in heads is 50%. The probability of a coin flip resulting in heads or tails is 100%.

If we wanted to keep track of the result of a series of coin flips, this could be done by subdividing a range. If the stored value is between 0 and 0.5, then we know that the first flip must have been tails.

If the stored value is between 0.5 and 1, then we know that the first flip must have been heads.

This subdivision process can be repeated indefinitely, dividing each range again, to store any number of coin flips.

To store two coin flips, you might have the first subdivision represent the outcome of the first coin flip, and the second subdivision represent the outcome of the second coin flip:

| Range | Result |
| --- | --- |
| 0.00 - 0.25 | Tails, Tails |
| 0.25 - 0.50 | Tails, Heads |
| 0.50 - 0.75 | Heads, Tails |
| 0.75 - 1.00 | Heads, Heads |

Imagine we want to store the outcome of three consecutive coin flips (Heads, Heads, Tails) as a single decimal number.
Encoding this would happen as follows:
  1. First we subdivide the range by the probability of each event happening. The probability of each is 50%, so that's simple. As described above, Heads is represented by the top half of the range, and Tails is represented by the bottom half of the range.

Because the first coin flip resulted in Heads, the output value must be between 0.50 and 1.00.

  2. Subdividing the range 0.50 to 1.00 again to store the result of the second flip, we end up with values between 0.50 and 0.75 representing the sequence Heads, Tails, and values between 0.75 and 1.00 representing the sequence Heads, Heads.

Because the second coin flip resulted in Heads, we know that the output value must be between 0.75 and 1.00.

  3. Subdividing the range 0.75 to 1.00 yet again, a value in the range 0.750 - 0.875 means the third coin flip resulted in Tails, and a value in the range 0.875 - 1.000 means the third coin flip resulted in Heads.

Because the third coin flip resulted in Tails, any value between 0.750 and 0.875 encodes the fact that the first three coin flips went Heads, Heads, Tails.
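The encoding steps can be sketched in a few lines. This assumes a fair coin, flips written as the characters "H" and "T", and a hypothetical helper named `encode_flips`. `Fraction` is used so the bounds stay exact however many flips are stored.

```python
from fractions import Fraction

def encode_flips(flips: str) -> tuple[Fraction, Fraction]:
    """Narrow the range [0, 1) once per flip: Tails keeps the bottom half, Heads the top half."""
    low, high = Fraction(0), Fraction(1)
    for flip in flips:
        mid = (low + high) / 2
        if flip == "H":
            low = mid    # Heads: keep the top half
        else:
            high = mid   # Tails: keep the bottom half
    return low, high

low, high = encode_flips("HHT")
print(float(low), float(high))  # 0.75 0.875
```

Any value from 0.75 up to (but not including) 0.875 identifies the sequence Heads, Heads, Tails.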

The decoding process performs the same series of subdivisions, but asks a question at each step instead of narrowing the output range.

  1. Is the value between 0.00 and 0.50? If so, the first coin flip resulted in Tails. Otherwise, if the value is between 0.50 and 1.00, the first coin flip resulted in Heads. This question can be repeated on the chosen half, just like in the encoding process, until we've determined the result of the first three coin flips.
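A matching decoder sketch, again assuming a fair coin. It also assumes the decoder is told how many flips to read, which the text above does not specify. `decode_flips` is a hypothetical name.

```python
def decode_flips(value: float, count: int) -> list[str]:
    """Recover `count` flips by repeatedly asking which half of the range holds `value`."""
    low, high = 0.0, 1.0
    flips = []
    for _ in range(count):
        mid = (low + high) / 2
        if value < mid:
            flips.append("T")  # bottom half: Tails
            high = mid
        else:
            flips.append("H")  # top half: Heads
            low = mid
    return flips

decoded = decode_flips(0.8, 3)
print(decoded)  # ['H', 'H', 'T']
```

Here 0.8 lies between 0.75 and 0.875, so it decodes to Heads, Heads, Tails.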

These subdivisions can be encoded using 0 and 1, where 0 represents the bottom half of the range, and 1 represents the top half of the range.
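One way to see this: writing Heads as 1 and Tails as 0 gives the digits of a binary fraction, and that fraction is the lower bound of the range the sequence selects. A small sketch with hypothetical helper names:

```python
def flips_to_bits(flips: str) -> str:
    """Map each flip to one bit: Heads (top half) -> 1, Tails (bottom half) -> 0."""
    return "".join("1" if flip == "H" else "0" for flip in flips)

def bits_to_value(bits: str) -> float:
    """Read the bits as the binary fraction 0.b1 b2 b3 ..."""
    return sum(int(bit) * 2.0 ** -(i + 1) for i, bit in enumerate(bits))

bits = flips_to_bits("HHT")
value = bits_to_value(bits)
print(bits, value)  # 110 0.75
```

The binary fraction 0.110 equals 0.75, the bottom of the range that encodes Heads, Heads, Tails.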

When the alphabet is large enough that a single bit can't select a particular outcome, several bits can be used instead, each one choosing the lower or upper part of the remaining range.
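As an illustration, take a made-up alphabet of four equally likely symbols, A to D. Each symbol needs two bits, and each bit halves the remaining range. The equal probabilities and all names here are assumptions chosen to keep the sketch simple.

```python
SYMBOLS = "ABCD"  # hypothetical four-symbol alphabet, all equally likely

def symbol_to_bits(symbol: str) -> str:
    """Two bits per symbol: A -> 00, B -> 01, C -> 10, D -> 11."""
    return format(SYMBOLS.index(symbol), "02b")

def bits_to_range(bits: str) -> tuple[float, float]:
    """Each bit keeps the bottom half (0) or the top half (1) of the current range."""
    low, high = 0.0, 1.0
    for bit in bits:
        mid = (low + high) / 2
        if bit == "1":
            low = mid
        else:
            high = mid
    return low, high

c_bits = symbol_to_bits("C")
c_range = bits_to_range(c_bits)
print(c_bits, c_range)  # 10 (0.5, 0.75)
```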