Hashing is the process of running any given input through a specific mathematical algorithm, called a Hash function, which generates a fixed-length output. Small changes in the input dramatically change the output (a phenomenon called the Avalanche effect). For example, running the words “Cryptodesk” and “cryptodesk” through the SHA-256 algorithm gives two completely distinct results, even though the inputs differ only in the case of one letter.
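The comparison above can be reproduced with a few lines of Python. This is a minimal sketch using the standard hashlib module; the helper name sha256_hex is only illustrative, and the text is assumed to be encoded as UTF-8 before hashing.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


upper = sha256_hex("Cryptodesk")
lower = sha256_hex("cryptodesk")

print(upper)
print(lower)

# Count how many of the 256 output bits differ between the two digests.
differing_bits = bin(int(upper, 16) ^ int(lower, 16)).count("1")
print(f"{differing_bits} of 256 bits differ")
```

With a well-behaved Hash function, roughly half of the output bits are expected to flip when a single input character changes.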
Hashing has applications ranging from optimizing data storage to cryptography and mining cryptocurrencies like Bitcoin. Hash functions mainly fall into one of two categories: cryptographic Hash functions and non-cryptographic Hash functions. The main difference between the two is how hard it is to find two inputs with the same output, or to work backwards from an output. Because the output has a fixed length while the input can be anything, different inputs can in principle produce the same Hash, so no Hash function creates truly unique outputs. An adequate cryptographic Hash algorithm meets all the criteria of a non-cryptographic Hash and, in addition, makes such clashes practically impossible to find. For a Hash function to be considered cryptographic, it must be collision-resistant and pre-image resistant. Collision resistance means it is infeasible to find two inputs that generate the same output. Pre-image resistance means it is infeasible to reverse-engineer the output and obtain the original input. But what is the significance of Hash functions, and how do they contribute to the world of cryptocurrencies?
Hashing: The magic box
Imagine that you want to enter into a contract with someone else. You approve the deal by putting your signature under the contract. A person with bad intentions might carefully copy your signature onto other documents and claim that you owe them money or real estate. There are different methods you can use to prevent such mischief. Hashing is a particularly strong building block for addressing such concerns.
Every party involved in the contract generates a private key/public key pair. A Hash function takes the contract as the “input” and turns it into a fixed-length number. This “output” is called a “Hash” and has a few properties that make it ideal for cryptography. Each party then signs this Hash with their private key, and anyone can check the signature using the matching public key. Because the signature is tied to the Hash of this particular contract, it cannot simply be copied onto a different document. Depending on which Hash function is used, the outputs differ in size. The SHA-256 algorithm generates outputs of 256 bits, while the SHA-1 algorithm always generates a 160-bit output.
Hashing is a deterministic process, meaning a particular input generates the same output every single time. Hashing algorithms are usually optimized for speed, so they generate the Hash of an input as fast as possible. This makes the systems that rely on Hashing far more efficient.
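Both properties, determinism and fixed output length, are easy to check. The sketch below uses Python's standard hashlib module; the sample inputs are arbitrary and chosen only for illustration.

```python
import hashlib

message = b"the same input"

# Determinism: hashing the same bytes twice gives the same digest.
first = hashlib.sha256(message).hexdigest()
second = hashlib.sha256(message).hexdigest()
print(first == second)

# Fixed length: the digest size depends on the algorithm, not the input.
sha256_bits = hashlib.sha256().digest_size * 8
sha1_bits = hashlib.sha1().digest_size * 8
print(sha256_bits, sha1_bits)

# A one-byte input and a one-megabyte input give digests of equal length.
short_digest = hashlib.sha256(b"a").digest()
long_digest = hashlib.sha256(b"a" * 1_000_000).digest()
print(len(short_digest), len(long_digest))
```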
Explaining exactly what happens under the hood of these functions is a long and technically detailed discussion for the future, but the general idea is that the Hashing algorithm is designed so that it is infeasible to find an input that produces a given Hash by simple trial and error. For strong modern Hash algorithms like SHA-256, computing a Hash is quick, but reverse-engineering an output to obtain the original input would take an astronomical computational effort. This asymmetry makes Hashing the backbone of blockchain technology.
Hashing vs Encryption
Let’s say you want to send a secret message to someone called Jack. You set a simple rule: every letter of the message is replaced with the letter two places after it in the alphabet, so “a” becomes “c”, “c” becomes “e” and so on. The letters “y” and “z” are replaced with “a” and “b”, respectively. This rule is called the “key”. The final result is a new message with the same number of letters as the original one. If Jack has the key, he can easily turn the new nonsense message back into the original one. This is called “encryption”. But if the key ends up in the wrong hands, whoever holds it can intercept the messenger and read that message as well.
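The letter-shifting rule described above can be sketched in a few lines of Python. The function names are illustrative only, and this toy cipher is trivially breakable; it is shown purely to make the point that encryption can be reversed by anyone who holds the key.

```python
import string

ALPHABET = string.ascii_lowercase


def shift_letters(text: str, key: int) -> str:
    """Shift each lowercase letter by `key` places, wrapping around at z."""
    result = []
    for ch in text:
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            result.append(ch)  # leave spaces and punctuation untouched
    return "".join(result)


def encrypt(message: str, key: int = 2) -> str:
    return shift_letters(message, key)


def decrypt(ciphertext: str, key: int = 2) -> str:
    return shift_letters(ciphertext, -key)


secret = encrypt("crazy jack")
print(secret)           # etcba lcem
print(decrypt(secret))  # crazy jack
```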
Hashing, on the other hand, is a one-way street. There is no key that turns a Hash back into the original input, so unlike encryption, nothing falls into the wrong hands that could undo it. The process is also bit-dependent. Bit dependency means that each bit of the output depends on every bit of the input.
Take the simple multiplication operator. It turns two inputs into one output. If only the output is known, the number “20” for example, how do we know the inputs weren’t “2” and “10” instead of “4” and “5”? Now imagine a “chain” of these operator “blocks”, each generating an output that feeds into the next block. You might start with “4” and “5” but end up with “1800”, and the number of possible starting points grows with every block. This makes it extremely difficult for brute force attacks to succeed. Brute force attacks attempt to guess the initial values by simple trial and error. If the Hashing algorithm is sophisticated enough, then it is practically impossible to guess the initial values by brute force.
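To make the multiplication analogy concrete, the sketch below lists every pair of whole numbers whose product is a given output. It is only an illustration of the analogy (real Hash functions such as SHA-256 do not work by multiplication), and the function name factor_pairs is made up for this example.

```python
def factor_pairs(n: int) -> list[tuple[int, int]]:
    """Return every pair (a, b) with a <= b and a * b == n."""
    pairs = []
    a = 1
    while a * a <= n:
        if n % a == 0:
            pairs.append((a, n // a))
        a += 1
    return pairs


print(factor_pairs(20))         # [(1, 20), (2, 10), (4, 5)]
print(len(factor_pairs(1800)))  # 18
```

Knowing only the output “20” leaves three candidate pairs, and “1800” already leaves eighteen, before even considering how many blocks were in the chain.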
Hashing and Bitcoin
To understand how cryptocurrencies such as Bitcoin are created, it is crucial to first learn the role of cryptographic Hash functions in decentralized systems. Let’s start with an arbitrary list of transactions. I put a special number (called a “nonce”) under that list, chosen so that when I run the list and the number together through a Hashing function like SHA-256, it generates a 256-bit Hash that ends with a certain number of zero bits, say 100. (Bitcoin itself requires the Hash to fall below a target value, which amounts to a run of leading zeroes, but the idea is the same.) Because the output cannot be predicted, the only way to find such a number is to try one candidate after another. The probability that a single guess produces a Hash ending with 100 zero bits is 1 in 2^100, or roughly 1 in 10^30 (a one followed by 30 zeroes)! So if you generated a Hash that ends with 100 zero bits, I can be practically certain that you went through on the order of 10^30 numbers to make the correct guess. This is called proof of work. The system is designed to reward anyone who puts in the computational effort to make the correct guess. Depending on the network, the reward is paid in a specific digital currency such as Bitcoin. Any device that consumes electrical energy, turns it into processing power and guesses the correct Hash in order to win this reward is called a miner.
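The guessing game can be sketched as a toy miner in Python. This is a simplified illustration, not Bitcoin's actual block format: the transaction string, the "|" separator and the function names are assumptions made for this example, and the difficulty is set to only 12 trailing zero bits (about 4,096 guesses on average) so that it finishes instantly.

```python
import hashlib


def trailing_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest ends with."""
    value = int.from_bytes(digest, "big")
    if value == 0:
        return len(digest) * 8
    # value & -value isolates the lowest set bit.
    return (value & -value).bit_length() - 1


def mine(transactions: str, difficulty_bits: int) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the Hash ends with enough zero bits."""
    nonce = 0
    while True:
        payload = f"{transactions}|{nonce}".encode("utf-8")
        digest = hashlib.sha256(payload).digest()
        if trailing_zero_bits(digest) >= difficulty_bits:
            return nonce, digest.hex()
        nonce += 1


transactions = "alice->bob:1;bob->carol:2"  # hypothetical transaction list
nonce, digest_hex = mine(transactions, difficulty_bits=12)
print(nonce, digest_hex)
```

Finding the nonce takes thousands of attempts, but checking it takes a single Hash. Every extra zero bit of difficulty doubles the expected number of guesses, which is why 100 bits would be far out of reach.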
The Proof of Work (PoW) function used by Bitcoin is based on a system called Hashcash. A valid proof of work is what miners are trying to produce so they can earn, or mine, Bitcoins. They listen for new transactions, collect them into a candidate block, and try to guess the correct Hash by trial and error. The miner who succeeds is rewarded with a special new transaction that pays them some amount of Bitcoin. The new block is then broadcast to the network and added to the chain of blocks, or the blockchain. This is how new Bitcoin is created (or mined) and added to the economy. The network regularly adjusts the difficulty of the guessing game so that blocks keep arriving at a steady pace, and the reward for each block is cut in half every 210,000 blocks. Because the reward keeps shrinking, there can only ever exist a total of about 21 million Bitcoins.
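The 21 million figure follows from Bitcoin's reward schedule: the block reward started at 50 Bitcoin and is halved every 210,000 blocks until it rounds down to zero. The sketch below adds up that schedule, working in the smallest unit (one hundred-millionth of a Bitcoin, called a satoshi); the constant and function names are illustrative.

```python
SATOSHI_PER_BTC = 100_000_000
INITIAL_REWARD_SATOSHI = 50 * SATOSHI_PER_BTC
BLOCKS_PER_HALVING = 210_000


def total_supply_satoshi() -> int:
    """Sum the block rewards over every halving period."""
    total = 0
    reward = INITIAL_REWARD_SATOSHI
    while reward > 0:
        total += reward * BLOCKS_PER_HALVING
        reward //= 2  # the reward is halved, rounding down
    return total


total_btc = total_supply_satoshi() / SATOSHI_PER_BTC
print(total_btc)
```

The total comes out just under 21 million, because the repeatedly halved rewards form a series that approaches, but never reaches, twice the amount issued in the first period.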
Hashing is a one-way cryptographic method that takes any input and turns it into a fixed-length output, regardless of the original size of the input. The output, or the “Hash”, acts as a digital fingerprint of the input, and the space of possible values is simply too large to crack by brute force. The irreversibility (or rather the infeasibility of reversal) of Hash values is what makes the Proof of Work consensus protocol possible in the first place, and this is the protocol that runs blockchains like Bitcoin and Ethereum 1.0. You can read the detailed description of the Proof of Work protocol here.