Compare commits

...

5 Commits

Author SHA1 Message Date
arc
053eea8b11 vault backup: 2025-05-09 12:37:48 2025-05-09 12:37:48 -06:00
arc
9efb23dc3d vault backup: 2025-05-09 12:32:48 2025-05-09 12:32:48 -06:00
arc
e5d60bfb94 vault backup: 2025-05-09 12:27:48 2025-05-09 12:27:48 -06:00
arc
cbe5d6cde7 vault backup: 2025-05-09 12:22:48 2025-05-09 12:22:48 -06:00
arc
c80bacb2fc vault backup: 2025-05-09 12:17:48 2025-05-09 12:17:48 -06:00


@ -17,4 +17,33 @@ Arithmetic coding works by taking a stream of data, and converting it into an in
For example, the probability of a coin flip resulting in tails is 50%, and the probability of a coin flip resulting in heads is 50%. The probability of a coin flip resulting in heads *or* tails is 100%.
If we wanted to keep track of the result of a series of coin flips, this could be done by subdividing a range and storing a single value inside it. If the value is between $0$ and $0.5$, then we know that the first flip must have been tails.
If the value is between $0.5$ and $1$, then we know that the first flip must have been heads.
This subdivision process can be repeated indefinitely to store any number of coin flips, by dividing each range again.
To store two coin flips, you might have the first subdivision represent the outcome of the first coin flip, and the second subdivision represent the outcome of the second coin flip:
| Range | Result |
| ------------- | ------------ |
| $0.00 - 0.25$ | Tails, Tails |
| $0.25 - 0.50$ | Tails, Heads |
| $0.50 - 0.75$ | Heads, Tails |
| $0.75 - 1.00$ | Heads, Heads |
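
The table above can be generated for any number of flips. The following is a minimal sketch, assuming a fair coin and the convention used here that tails takes the bottom half of each range; the function name `flip_ranges` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def flip_ranges(n):
    """Return (low, high, outcome) for every sequence of n fair coin flips."""
    width = Fraction(1, 2 ** n)  # every sequence gets an equally sized slice
    rows = []
    # product() lists sequences with "Tails" first, so the Tails half
    # always sits below the Heads half, matching the table.
    for i, outcome in enumerate(product(["Tails", "Heads"], repeat=n)):
        rows.append((i * width, (i + 1) * width, outcome))
    return rows


for low, high, outcome in flip_ranges(2):
    print(f"{float(low):.2f} - {float(high):.2f}  {', '.join(outcome)}")
```

Calling `flip_ranges(3)` gives eight slices of width $0.125$, one per possible three-flip sequence.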
Imagine a situation where we want to store the outcome of three consecutive coin flips, *Heads, Heads, Tails*, using a single decimal number.
Encoding this would happen as follows:
1. First, we subdivide the range according to the probability of each outcome. The probability of each is 50%, so the range simply splits in half. As above, heads is represented by the top half of the range, and tails is represented by the bottom half of the range.
> Because the *first* coin flip resulted in *Heads*, the output value must be between $0.50$ and $1.00$.
2. Subdividing the range between $0.50$ and $1.00$ again to store the result of the second flip, we end up with values between $0.50$ and $0.75$ representing the sequence *Heads, Tails*, and values between $0.75$ and $1.00$ representing the sequence *Heads, Heads*.
> Because the *second* coin flip resulted in *Heads*, we know that the output value must be between $0.75$ and $1.00$.
3. Subdividing the range between $0.75$ and $1.00$ yet again, a value in the range $0.750$ - $0.875$ means the third coin flip resulted in *Tails*, and a value in the range $0.875$ - $1.000$ means the third coin flip resulted in *Heads*.
> Because the *third* coin flip resulted in *Tails*, any value between $0.750$ and $0.875$ encodes the fact that the first three coin flips went *Heads, Heads, Tails*.
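
The encoding steps can be sketched in a few lines of Python. This is an illustration only, assuming a fair coin and plain floating-point arithmetic (which is exact for these halvings but would run out of precision on long sequences); the function name `encode` is hypothetical.

```python
def encode(flips):
    """Narrow the range [0, 1) once per flip and return the final (low, high)."""
    low, high = 0.0, 1.0
    for flip in flips:
        mid = (low + high) / 2
        if flip == "Heads":
            low = mid   # Heads keeps the top half of the current range
        else:
            high = mid  # Tails keeps the bottom half of the current range
    return low, high


print(encode(["Heads", "Heads", "Tails"]))  # (0.75, 0.875)
```

Any value inside the returned range identifies the same sequence of flips.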
The decoding process performs the same series of subdivisions, but at each step it asks which half the value falls in instead of choosing a half.
1. Is the value between $0.00$ and $0.50$? If so, the first coin flip resulted in *Tails*. Otherwise, if the value is between $0.50$ and $1.00$, the first coin flip resulted in *Heads*.
This question is then repeated on the selected half, just as in the encoding process, until we've determined the results of all three coin flips.
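
Decoding can be sketched the same way. The snippet below assumes a fair coin, a known number of flips, and that a value lying exactly on a boundary belongs to the upper half; that boundary rule and the function name `decode` are assumptions made for illustration.

```python
def decode(value, n_flips):
    """Recover n_flips coin flips from a value in [0, 1)."""
    low, high = 0.0, 1.0
    flips = []
    for _ in range(n_flips):
        mid = (low + high) / 2
        if value < mid:
            flips.append("Tails")  # value is in the bottom half
            high = mid
        else:
            flips.append("Heads")  # value is in the top half (boundary included)
            low = mid
    return flips


print(decode(0.8, 3))  # ['Heads', 'Heads', 'Tails']
```

The decoder has to be told how many flips to read, because the value alone does not say where the sequence ends.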
These subdivisions can be encoded using $0$ and $1$, where $0$ represents the bottom half of the range, and $1$ represents the top half of the range.
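
For a fair coin, this bit sequence is also the binary expansion of the low end of the selected range. A small sketch, with hypothetical helper names `flips_to_bits` and `bits_to_low`:

```python
def flips_to_bits(flips):
    """Write 1 for the top half (Heads) and 0 for the bottom half (Tails)."""
    return "".join("1" if flip == "Heads" else "0" for flip in flips)


def bits_to_low(bits):
    """Read the bits as the binary fraction 0.b1b2b3..., the low end of the range."""
    return sum(int(b) / 2 ** (i + 1) for i, b in enumerate(bits))


bits = flips_to_bits(["Heads", "Heads", "Tails"])
print(bits, bits_to_low(bits))  # 110 0.75
```

Here `110` read as the binary fraction $0.110_2$ equals $0.75$, the bottom of the range that encodes *Heads, Heads, Tails*.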
When the alphabet is too large to select a particular outcome using one bit, multiple bits can be used instead, each one narrowing the range further.