Understanding Cryptographic Hashing

Casey Gibson
5 min read · Dec 22, 2019

Cryptographic hashing is one of the lesser-known tools for security and data integrity, and it is often mistaken for encryption. Hashing is, however, a critical element in the way computers operate and communicate with one another. Without hashing, computing would work very differently and would likely be less secure and less efficient.

How does Hashing Work?

Hashing works as a one-way conversion from readable text to an unreadable, fixed-length alphanumeric string. As an example, the text “I am Text.” can be converted to the MD5 (more on this later) hash “8c120839f2b7f2665f6505845ed81f78”. By design, there is no practical way to convert the hash back to the readable text. This is different from encryption, which is designed so that the data can be converted back into readable text with the correct key.
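As a minimal sketch of this behaviour, the snippet below uses Python's standard hashlib module. The helper name md5_hex and the longer sample inputs are only illustrative; the point is that the output length never changes and that the same input always gives the same hash.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of the text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


short_hash = md5_hex("I am Text.")
long_hash = md5_hex("I am a much longer piece of text. " * 100)

# MD5 always produces 128 bits, which is 32 hexadecimal characters,
# no matter how long the input is.
print(len(short_hash), len(long_hash))

# The same input always produces the same hash.
print(md5_hex("I am Text.") == short_hash)

# A small change in the input produces a different hash.
print(md5_hex("I am Text!") == short_hash)
```

Nothing in hashlib offers a way to go from the hash back to the text; the only way to "reverse" a hash is to guess inputs and compare the results.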

What’s the purpose of Hashing?

Fundamentally, Hashing has 3 main uses.

  1. Storing sensitive data that we don’t want to be recovered, such as passwords
  2. Looking up data quickly, such as in a hash table
  3. Verifying the integrity of data

What Hashing variations are there?

In its raw form, hashing follows an algorithm to turn text into a fixed-length alphanumeric value. While this meets the needs of most applications, it has a major weakness when it comes to storing passwords.

Humans are, unfortunately, not very good at coming up with unique passwords. The most common password for 6 straight years was “123456”, with “password” previously the most common. Weak passwords like these won’t hold up against a brute force attack in any case, but the weakness specific to plain hashing is duplicates.

Since hashing always produces the same output for the same input, 2 (or more) user accounts that have the same password will also have the same hash stored in the database.

If a hacker is able to gain access to a database and download the list of password hashes, they will attack the hashes that appear most often first. If the hacker sees that 50 accounts have the same password hash, cracking that one hash instantly gives them the password for all 50 accounts.
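To make the duplicate problem concrete, here is a small Python sketch. The usernames, passwords and the leaked list are all made up for illustration, and SHA-256 without a Salt stands in for whatever plain hash the database might use.

```python
import hashlib
from collections import Counter


def sha256_hex(text: str) -> str:
    """Plain, unsalted SHA-256 of the text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical leaked table of (username, unsalted password hash).
leaked = [
    ("alice", sha256_hex("123456")),
    ("bob", sha256_hex("correct horse battery staple")),
    ("carol", sha256_hex("123456")),
    ("dave", sha256_hex("password")),
    ("erin", sha256_hex("123456")),
]

# Count how often each hash appears, without knowing any password.
hash_counts = Counter(password_hash for _, password_hash in leaked)
most_common_hash, account_count = hash_counts.most_common(1)[0]

# Every account sharing that hash falls as soon as the hash is cracked.
affected = [user for user, password_hash in leaked if password_hash == most_common_hash]
print(account_count, affected)  # 3 ['alice', 'carol', 'erin']
```

The attacker never needs to see a password to know which hash is the most valuable target; the repetition alone gives it away.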

This is why passwords that are going to be hashed need to be treated differently, which is where the term “Salt” comes in. A Salt is extra random text that is added to the text that is about to be hashed. As an example, the text “I am Text.” produces “8c120839f2b7f2665f6505845ed81f78”, whereas “I am Text.RANDOM” generates “a964602c8217eb8d6e81961a867e19e9”. All we need to do is store the text “RANDOM” along with the hashed password. It’s fine to store “RANDOM” in readable text, as on its own it won’t help crack the password hash, but it will make the hash unique as long as the Salt is unique.
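A minimal Python sketch of salting, under a few assumptions: the function names and the record layout are hypothetical, the Salt is appended to the password as in the “I am Text.RANDOM” example, and SHA-256 is used only to keep the sketch short.

```python
import hashlib
import secrets


def hash_with_salt(password: str, salt: str) -> str:
    """Hash the password with the salt appended to it."""
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()


def new_record(password: str) -> dict:
    """Create what would be stored for one user: the salt and the hash."""
    salt = secrets.token_hex(16)  # 16 random bytes, shown as 32 hex characters
    return {"salt": salt, "hash": hash_with_salt(password, salt)}


def verify(password: str, record: dict) -> bool:
    """Re-hash the attempt with the stored salt and compare the results."""
    attempt = hash_with_salt(password, record["salt"])
    return secrets.compare_digest(attempt, record["hash"])


user_a = new_record("123456")
user_b = new_record("123456")

# Same password, different salts, so the stored hashes no longer match.
print(user_a["hash"] == user_b["hash"])

# Logging in still works, because the salt is stored next to the hash.
print(verify("123456", user_a), verify("wrong guess", user_a))
```

Because each user gets their own Salt, an attacker can no longer spot shared passwords by looking for repeated hashes.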

How has hashing improved over the years?

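A cryptographic hash function is meant to have no collisions in practice. That is, no two different inputs should ever be found that produce the same output. Because the output has a fixed length, collisions must exist in theory, but finding one should be computationally infeasible.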

Historically, MD5 is one of the functions that was designed to meet this criterion, but it suffers from major flaws, especially collisions. While it was considered secure in 1991 when it was designed, modern computers have enough computational power that MD5 collisions can now be found in practice. MD5 still has a useful function, however, in that you can use it for file integrity verification against unintentional corruption.
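As a rough sketch of that kind of integrity check in Python: the file name, its contents and the helper file_md5 are made up for the demonstration, and the file is written to a temporary directory so the example is self-contained.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "download.bin")  # hypothetical downloaded file
    with open(path, "wb") as handle:
        handle.write(b"original file contents")

    # The checksum the publisher would list next to the download.
    published_checksum = file_md5(path)

    # An untouched copy matches the published checksum.
    intact = file_md5(path) == published_checksum

    # Simulate accidental corruption by appending a single stray byte.
    with open(path, "ab") as handle:
        handle.write(b"\x00")
    corrupted = file_md5(path) != published_checksum

print(intact, corrupted)  # True True
```

This only guards against accidents such as a broken download. Since MD5 collisions can be produced on purpose, it should not be relied on to detect deliberate tampering.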

SHA-2 is currently widely used in security protocols, including TLS and SSL, PGP, SSH, S/MIME, and IPsec. SHA-256, a member of the SHA-2 family, is also used at the foundation of the Bitcoin blockchain. SHA-2 addresses a number of the weaknesses found in MD5 and SHA-1 and has far fewer known vulnerabilities. The risk of someone finding a SHA-2 collision is also far lower.

For password hashing, the introduction of Bcrypt has made the process both secure and easy, as it handles the Salt for you. This means that you’re less likely to forget to include the Salt, and you don’t need to store it separately in your database, as it’s stored inside the hash itself.
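The idea of a Salt that lives inside the stored hash can be sketched with Python's standard library alone. Note that this is not Bcrypt: it swaps in PBKDF2 (hashlib.pbkdf2_hmac) so that no third-party package is needed, and the "$"-separated layout, the function names and the iteration count are assumptions made for this illustration rather than Bcrypt's actual format.

```python
import base64
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password: str) -> str:
    """Return one string holding the algorithm, cost, salt and digest."""
    salt = os.urandom(16)  # generated for you, so it cannot be forgotten
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return "$".join([
        "pbkdf2-sha256",
        str(ITERATIONS),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    ])


def check_password(password: str, stored: str) -> bool:
    """Pull the salt and cost back out of the stored string and re-hash."""
    _name, iterations, salt_b64, digest_b64 = stored.split("$")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, int(iterations))
    return hmac.compare_digest(candidate, expected)


stored = hash_password("hunter2")
print(stored.split("$")[:2])  # ['pbkdf2-sha256', '100000']
print(check_password("hunter2", stored), check_password("wrong", stored))
```

Only the single stored string needs a database column. Everything required to verify a login attempt later, including the Salt, travels with it.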

How does hashing protect against data recovery?

Here’s a simple, practical example of how hashing protects against data recovery. Imagine you have the sum below:

2 + 8 = 10

Imagine 10 is our hash value. Using only the whole numbers 1 to 9, there are 9 combinations that generate the number 10.

1 + 9 = 10
2 + 8 = 10
3 + 7 = 10
4 + 6 = 10
5 + 5 = 10
6 + 4 = 10
7 + 3 = 10
8 + 2 = 10
9 + 1 = 10

The above example isn’t how real hashing works, as simple addition produces collisions, but it shows that once you have done the arithmetic and have the value, you no longer know what the input values were. It could be any combination of values, and you can’t be sure what the correct input was unless you try them.
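The same toy example can be written as a few lines of Python. The name toy_hash is made up for this sketch, and it is only the addition analogy from above, not a real hash function.

```python
TARGET = 10


def toy_hash(a: int, b: int) -> int:
    """The 'hash' from the analogy: just add the two inputs together."""
    return a + b


# Try every pair of whole numbers from 1 to 9 and keep those that hit 10.
candidates = [
    (a, b)
    for a in range(1, 10)
    for b in range(1, 10)
    if toy_hash(a, b) == TARGET
]

print(len(candidates))  # 9
print(candidates)
```

Knowing only the value 10, every pair in the list is an equally valid guess for the original input, which is the sense in which the operation cannot be undone.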

In summary, hashing is designed for the one-way conversion of readable text into non-readable text. A good hashing function produces values that avoid collisions, or at least keeps the chance of one so low that you would never find it in practice. Google even went to the effort of proving that it’s possible to produce a SHA-1 collision, at a significant cost: https://www.trendmicro.com/vinfo/us/security/news/vulnerabilities-and-exploits/sha-1-collision-signals-the-end-of-the-algorithm-s-viability

Although SHA-1 has since been superseded by SHA-2, which is significantly stronger, this still shows that once-secure hashing algorithms will eventually be overcome by the ever-increasing power of CPUs and GPUs. With this in mind, newer hashing functions need to be created and standardised. One newer hash function is SHA-3, which was standardised in 2015 and offers an alternative should SHA-2 ever be broken.

Casey Gibson

I’m a full stack developer in HTML/CSS, JavaScript, PHP, Java, NoSQL, SQL with extensive knowledge in MongoDB, NodeJS, AWS Lambda and DynamoDB.