Compression — “In a nutshell”

Casey Gibson
Published in The Startup, Dec 19, 2019


Walnuts in a nutcracker with the caption “Compression In a nutshell”

compression: the act of pressing something into a smaller space or putting pressure on it from different sides until it gets smaller (Cambridge Dictionary: https://dictionary.cambridge.org/dictionary/english/compression)

We’ve all been there. We’ve tried to squeeze an object into a smaller space, intending to take it out and use it again later. Depending on the object, you may want it restored to its original “glory”, while for other objects that doesn’t matter.

Think of a soccer ball. You can deflate it to store it in a tight space and then inflate it to use it as a soccer ball once again. This is a form of compression, specifically lossless compression, because nothing has been lost in its “compression”.

A jigsaw puzzle actually illustrates both forms of compression. If you simply pack the puzzle into its box, that’s once again lossless compression, as you can redo the puzzle and restore it to its original state. If you lose pieces while packing it up, however, you can still redo the puzzle, but it will have pieces missing. This is called lossy compression, as you have lost information that you can’t restore later.

Computers rely on both of these forms of compression, and both are widely used. Consider the photos you take on your smartphone. Phones typically save them as JPEG images, which use lossy compression: the format throws away detail that the human eye is unlikely to notice. This produces much smaller files than a lossless format would, which is why it’s so widely used on smartphones.

When it comes to transmitting data, however, computers typically need exact information and can’t afford to throw any of it away. Transferring money between banks is a good example: if someone sent you $5000, you’d expect to receive $5000, not $500 because the computer decided it didn’t need the extra 0.

Here’s a practical example of how a computer might compress data using lossless compression. Take this block of letters, known in computer programming as a “String”:

"a,a,a,a,b,b,b,b,c,c,c,c,d,d,d,d"

We can compress this losslessly in two steps.

First, we can see there is a comma between every pair of letters, so we can remove them all and simply note the rule that the letters are separated by commas.

"aaaabbbbccccdddd"
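This first step can be sketched in a few lines of Python. It is a minimal illustration rather than a real compressor: `strip_commas` and `restore_commas` are hypothetical names, and the sketch assumes commas only ever appear as separators between single letters.

```python
def strip_commas(text: str) -> str:
    """Remove the separating commas, e.g. "a,a,b" -> "aab"."""
    return text.replace(",", "")


def restore_commas(packed: str) -> str:
    """Put a comma back between every pair of letters, e.g. "aab" -> "a,a,b"."""
    return ",".join(packed)


original = "a,a,a,a,b,b,b,b,c,c,c,c,d,d,d,d"
packed = strip_commas(original)
print(packed)                               # aaaabbbbccccdddd
print(restore_commas(packed) == original)   # True
```

Because the rule “letters are separated by commas” is known to both sides, nothing is lost when the commas are dropped.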

Next, we can see that the letters repeat in runs, so we can write the above as:

a4b4c4d4

This tells us that the letter a repeats 4 times, followed by b 4 times, c 4 times and d 4 times. This technique is known as run-length encoding.
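Writing each letter followed by the length of its run is called run-length encoding. A minimal Python sketch of it is below; `rle_encode` is a hypothetical name, and the sketch assumes the input contains only letters (a digit in the input would make the encoded output ambiguous).

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated letter with the letter followed by
    the length of the run, e.g. "aaab" -> "a3b1"."""
    # groupby yields (letter, iterator over that run) for each consecutive run
    return "".join(f"{letter}{len(list(run))}" for letter, run in groupby(text))


print(rle_encode("aaaabbbbccccdddd"))  # a4b4c4d4
```

Note that run-length encoding only pays off when runs are long: a String with no repeats, such as "abcd", would grow to "a1b1c1d1".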

Since we know the rules, we can easily restore a4b4c4d4 to “a,a,a,a,b,b,b,b,c,c,c,c,d,d,d,d” by repeating each letter the given number of times and adding back the commas. The original String had 31 characters (16 letters and 15 commas), while our compressed version has only 8. That’s a huge saving of about 74%.
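The restore step can be sketched the same way. In this minimal Python example, `rle_decode` is a hypothetical helper that assumes every letter is followed by its run length written in digits; the sketch rebuilds the original String and recomputes the saving, which works out to just under 75%.

```python
import re


def rle_decode(encoded: str) -> str:
    """Reverse run-length encoding, e.g. "a4b4" -> "aaaabbbb".
    Assumes each non-digit character is followed by its count in digits."""
    # findall returns (letter, count) pairs such as ("a", "4")
    return "".join(letter * int(count)
                   for letter, count in re.findall(r"(\D)(\d+)", encoded))


original = "a,a,a,a,b,b,b,b,c,c,c,c,d,d,d,d"
compressed = "a4b4c4d4"

restored = ",".join(rle_decode(compressed))   # repeat letters, then re-add commas
print(restored == original)                   # True
print(len(original), len(compressed))         # 31 8
print(f"{1 - len(compressed) / len(original):.0%}")  # 74%
```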

We’ll now look at how we can use lossy compression, and a good example is floating-point numbers (“floats”). Floats are effectively numbers with a decimal point. Here’s our example:

25.435383, 14.345384, 78.342383, 45.234283

In some situations, there’s no need to restore the data to its original state and an estimate is good enough. In that case, you can simply drop the digits after the decimal point by rounding each number to the nearest whole number. Our example then becomes:

25, 14, 78, 45

Instead of storing the full numbers, by rounding we have reduced the storage needed from 36 characters (digits plus decimal points, not counting the separators) to 8, a 78% saving.
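The rounding step and the character count can be checked with a short Python sketch. It assumes, as the text does, that only the digits and decimal points count toward storage, not the commas and spaces between the numbers.

```python
values = [25.435383, 14.345384, 78.342383, 45.234283]

# Lossy step: keep only the nearest whole number.
rounded = [round(v) for v in values]
print(rounded)  # [25, 14, 78, 45]

# Count characters the way the text does: digits plus decimal points,
# ignoring the separating commas and spaces.
before = sum(len(str(v)) for v in values)
after = sum(len(str(r)) for r in rounded)
print(before, after)                # 36 8
print(f"{1 - after / before:.0%}")  # 78%
```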

The catch, however, is that we no longer have the original, precise numbers, and they can’t be recovered from the rounded values; the only way to get them back is to redo the calculation that produced them.
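To see why rounding can’t be undone, consider a hypothetical second reading of 24.9 alongside the original 25.435383. Both round to the same whole number, so the rounded value alone can’t tell us which one we started with. A minimal Python illustration:

```python
# Hypothetical values: two different originals that round to the same number.
a, b = 25.435383, 24.9
print(round(a), round(b))    # 25 25
print(round(a) == round(b))  # True: the difference between a and b is lost
```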

In summary, compression is a simple concept at heart, and people were squeezing things into smaller spaces long before computers came along.

While computers can store more data than they could 10 years ago and internet speeds have increased over the same period, the amount of data has grown along with them. If you wanted to download all of Wikipedia’s pages (with revisions), you would need to store several terabytes of data: https://en.wikipedia.org/wiki/Wikipedia:Database_download#Where_do_I_get_it.3F

As computers have advanced, data compression has had to advance just as much.


I’m a full stack developer in HTML/CSS, JavaScript, PHP, Java, NoSQL, SQL with extensive knowledge in MongoDB, NodeJS, AWS Lambda and DynamoDB.