We often think that data is never secure.
Until now, this statement was correct, which is the reason we were never able to depend on data. Information and data stored in distributed and central systems can be changed by the operators of these systems, which means that the integrity of this data cannot be ensured. Hash functions are a key part of the solution to this problem.
Satoshi Nakamoto developed Bitcoin with hash functions
The mysterious Satoshi Nakamoto – also known as the inventor of Bitcoin – is the reason we no longer have to accept this uncertainty about our data. His concept shows how a decentralized system can be used to ensure the integrity of the data. His concept, which is only now slowly reaching the masses, is ideal for changing our systems: those of data and information processing anyway, but as a consequence also the structures of our society as a whole.
Basically, Nakamoto combined some approaches that were already known in such a way that data can be shown to have stayed unchanged at any point. One key element is cryptography, and hash functions are a part of that. However, this alone isn’t enough. We need another element: the connection of old data with new data, creating a chain of data blocks in which the current data always takes into account the previous data, and that data the data before it. That is the basic principle of the blockchain: a chain of data blocks that are linked together.
First, the data is split into individual transactions
Nakamoto first split the system up into individual transactions and then collated them in data blocks. In this process, cryptographic functions are used to secure the data blocks. Strictly speaking, the blocks are hashed rather than encrypted, so their contents remain readable. Thus, we enter the field of cryptography.
In the following section, we will use the conventional nomenclature of the cryptography sector. We will call the participants “Alice” and “Bob” and use the virtual currency Bitcoin, or its abbreviation, BTC. Alice and Bob are placeholder names for the sender and recipient of a transaction or message. “Carol” and “Dave” are further protagonists in a cryptographic system, and they also act as senders or receivers. All of them stand for people who are exchanging assets, so one sells to another and gets paid. That is a transaction in the sense of Bitcoin.
Alice sends 5 Bitcoin (BTC) to Bob
So, if Alice wants to send 5 BTC to Bob, that is a simple transaction. Alice owns more than 5 BTC and sends 5 BTC to Bob. He now sends 3 BTC to Carol and 1 BTC to Dave. These are two further transactions. The Bitcoin system now collects these transactions and packs them in a data block.
This block has a specified maximum size and is packed and bundled by specialized computers. The system is designed so that a block is produced about every 10 minutes. It is this that acts as the brake mentioned previously and that brings other benefits with it (more about that later).
This data block is now not just bundled up like, say, a .zip file. Cryptographic procedures are used to give the block a mathematically unambiguous label. This label, which is called a “hash,” is the result of a calculation over the data of the block using hash functions.
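To make the idea of a block label concrete, here is a minimal sketch in Python. It assumes a hypothetical Transaction record and a JSON serialization chosen purely for illustration; Bitcoin's real transaction and block formats are different and considerably more involved.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical, simplified record; not Bitcoin's real transaction format.
    sender: str
    recipient: str
    amount_btc: int


def block_label(transactions):
    """Serialize the transactions and return their SHA-256 hash as hex."""
    payload = json.dumps([asdict(tx) for tx in transactions], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The three transactions from the example above.
transactions = [
    Transaction("Alice", "Bob", 5),
    Transaction("Bob", "Carol", 3),
    Transaction("Bob", "Dave", 1),
]

label = block_label(transactions)
print(label)  # 64 hexadecimal characters that identify exactly this data
```

The same transactions always produce the same label, which is what allows the label to stand in for the data.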
The transactions are now secured with hash functions
This hash is a very interesting piece of mathematical work that is based on one-way functions, which are often loosely called “trapdoor” functions. Exactly how it works is a source of fascination for mathematicians.
Trapdoor functions for the layperson
It is enough for the non-mathematician to understand that the formulae concerned can only be calculated in one direction. If you can work out x from, say, 2+x=5, then that is not a one-way function, because it works in both directions. A hash algorithm, on the other hand, works in only one direction: forward.
Even if you know the result of the calculation and the associated calculation path, you cannot return to the starting value.
That sounds fantastical, but it really is possible and it is fascinating. As already stated, as mathematical laypeople, it is enough for us to simply know and recognize the existence of this phenomenon to understand the whole system. Those who wish to delve into it further will be able to research the mathematics of it easily on the internet.
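For readers who like to experiment, the following Python sketch contrasts the two directions. It is only an illustration under simple assumptions: the secret input "btc" is hypothetical, and the search works only because the input is known to be three lowercase letters. For realistic inputs, the number of candidates is far too large to try.

```python
import hashlib
from itertools import product
from string import ascii_lowercase
from typing import Optional


def sha256_hex(text: str) -> str:
    """Forward direction: one cheap call turns text into a 64-character hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guess_input(target_hash: str, length: int) -> Optional[str]:
    """Backward direction: there is no formula, so we can only try candidates."""
    for letters in product(ascii_lowercase, repeat=length):
        candidate = "".join(letters)
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


secret = "btc"                      # hypothetical three-letter input
digest = sha256_hex(secret)         # easy: a single calculation
recovered = guess_input(digest, 3)  # laborious: up to 26**3 = 17,576 guesses

print(digest)
print(recovered)
```

Each additional character multiplies the number of guesses by 26 in this toy setting, which gives a feel for why guessing stops being practical very quickly.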
The transactions are then collated in a data block and assigned a hash value through a cryptographic algorithm. This hash value is fixed by the data, and the initial values cannot be deduced from it. If you were to change only one single, tiny part of the transaction data that makes up the data block (say, by adding a comma at one point), then that would result in a completely different hash value.
The result of hash functions
Here is a real, calculated example that illustrates how an individual change delivers a completely different hash value.
If, for example, you generate the SHA-256 hash from the following sentence:
I am Satoshi Nakamoto
this becomes: c8bb907d49983cfd5b1db28be3fe3c2c5ade3a2b2995bd56f8b4203f74345caa
If you change only one single character by adding a full stop at the end, then the SHA-256 hash changes drastically. From:
I am Satoshi Nakamoto.
[I only added a full stop after Nakamoto – that’s the small difference]
we get: 80cd76ffd0af98e2fbe066cda10847e7edbaa4caabb4fbf14d317b9cbbc4c963
The two hashes are in no way related.
This hash, and the fact that it changes so radically through a minor change, is one of the foundations of data integrity—also known as “the Truth”—of blockchain technology and the Bitcoin cryptocurrency.
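The experiment is easy to repeat with Python's standard hashlib module. This is a minimal sketch that assumes the sentences are encoded as UTF-8 with no trailing line break; a different encoding or an invisible extra character gives different values, so the output need not match the hashes printed above.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of the text as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("I am Satoshi Nakamoto")
second = sha256_hex("I am Satoshi Nakamoto.")  # only a full stop added

# Count the positions at which both hashes happen to show the same character.
matching_positions = sum(a == b for a, b in zip(first, second))

print(first)
print(second)
print(matching_positions, "of 64 positions match")
```

A handful of matching positions is to be expected by pure chance, since each position can only take one of 16 values.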
The data blocks are interlocked with each other
In the Bitcoin system, Nakamoto specified that each data block holds not just the current transactions but also the hash of the preceding block. The next block can only be linked to its predecessor if the hash values fit together, that is, if the new block was calculated together with the hash value of the previous block. Thus, a conclusive blockchain is made from data in which the information is protected from manipulation. This is an important safety function compared to conventional systems, and it is what makes blockchain so forward-looking.
As this is a crucial and very significant function, it is summarized again below.
Every block contains the hash value of the previous block
A hash value is a mathematically derived value that is massively altered even by a minimal change to the data. Even if the result value and the calculation route are known, it is impossible to reverse the formula calculation to extract the input data.
Every data block is calculated from the transaction data and the hash value of the previous block. In this way, one block can only be attached to its predecessor if the calculation is correct. The trick with this mathematics is principally that, although producing this value can be very laborious, it is very easy for a third party to recognize that the calculation was correctly executed. You can, therefore, confirm that the result is correct without repeating all of the work that went into finding it. As I said, fascinating.
Blocks are made under this premise, one after the other and attached to each other, thus forming a blockchain.
Imagine, for example, that five blocks exist, each with 1,000 transactions. Block 2 would be generated from the transactions of this block, plus the hash value of Block 1. Block 3 would be generated from the transactions in Block 3 and the hash value of Block 2 which, in turn, depends on the hash value of Block 1. You can see where this is going.
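The excerpt does not spell out why producing a block is laborious while checking it is easy. One way to picture it, assuming a search of the kind used in Bitcoin's proof-of-work, is sketched below in Python. The required prefix "000", the "|" separator and the transaction string are hypothetical simplifications, and the difficulty is tiny compared with the real network.

```python
import hashlib

REQUIRED_PREFIX = "000"  # hypothetical, very low difficulty for illustration


def block_hash(prev_hash: str, tx_data: str, nonce: int) -> str:
    """Hash the previous block's hash, the transaction data and a counter."""
    text = f"{prev_hash}|{tx_data}|{nonce}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def search_nonce(prev_hash: str, tx_data: str) -> int:
    """Laborious side: try counter after counter until the hash fits."""
    nonce = 0
    while not block_hash(prev_hash, tx_data, nonce).startswith(REQUIRED_PREFIX):
        nonce += 1
    return nonce


def verify(prev_hash: str, tx_data: str, nonce: int) -> bool:
    """Easy side: one single hash confirms that the claimed counter fits."""
    return block_hash(prev_hash, tx_data, nonce).startswith(REQUIRED_PREFIX)


prev_hash = "0" * 64  # placeholder for the previous block's hash
tx_data = "Alice->Bob:5;Bob->Carol:3;Bob->Dave:1"

nonce = search_nonce(prev_hash, tx_data)  # typically thousands of attempts
print(nonce, verify(prev_hash, tx_data, nonce))
```

The searcher needs many hash calculations, while anyone who is handed the counter needs exactly one to check the result.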
If a transaction in Block 1 is now changed, this change results in a completely different hash value for the first block.
Block 2 then no longer checks out, because the second block used the hash of the first in its formula. As this hash is now different, the stored result can no longer be correct, and a new value must also be calculated for Block 2. The same applies to the whole onward chain.
So, all blocks receive a new hash value because one variable—the hash of the previous block—is changed in the complex hash formula of the next block. This must change all the following hash values. A very ingenious thought, which is one of the keys to the success of blockchain. But is it enough?
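The five-block example can be replayed in a few lines of Python. This is a simplified sketch: each block is reduced to a text string standing in for its transactions, and the all-zero starting value and the "|" separator are assumptions made for illustration only.

```python
import hashlib

START_VALUE = "0" * 64  # stand-in for "no previous block" (assumption)


def block_hash(prev_hash: str, tx_data: str) -> str:
    """Hash of a block: previous block's hash plus this block's data."""
    return hashlib.sha256(f"{prev_hash}|{tx_data}".encode("utf-8")).hexdigest()


def chain_hashes(blocks):
    """Compute every block's hash in order, feeding each into the next."""
    hashes = []
    prev = START_VALUE
    for tx_data in blocks:
        prev = block_hash(prev, tx_data)
        hashes.append(prev)
    return hashes


blocks = [f"transactions of block {n}" for n in range(1, 6)]
original = chain_hashes(blocks)

tampered_blocks = list(blocks)
tampered_blocks[0] += ","  # the tiny change: one added comma in Block 1
tampered = chain_hashes(tampered_blocks)

changed = [n for n, (a, b) in enumerate(zip(original, tampered), start=1) if a != b]
print(changed)  # [1, 2, 3, 4, 5]
```

Changing a later block instead, for example Block 3, leaves the hashes of Blocks 1 and 2 untouched and alters only those from Block 3 onward.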
A minimal change at the start forces a complete recalculation of the whole blockchain
On its own, this is pure mathematics of no real use, because data can still be changed, which just leads to the hash values of the blocks being changed as well. They change to something noticeably different, and the data no longer represent the truth. But if the whole chain is recalculated, no one can see that.
In that case, transactions and records from the past have been falsified. The manipulated blockchain on our computers no longer contains the original data. History has been rewritten and the property now belongs to someone else. So, we are not there yet.
Not just hash functions, but also a peer-to-peer network
Of course, Nakamoto also solved this problem by additionally providing for a peer-to-peer network. A peer is a person, an individual or—in the technical world—a machine in a network. The meaning of “peer” is “having equal rank or status.”
In our technical case, it is just another computer called a node. This node is connected to the other nodes and receives the data blocks from its neighbor. It quickly checks the hash value of the new block. Due to the hash functions used, this is a comparatively easy calculation that can be executed very quickly by any computer. Checking is much easier than the original calculation of the hash value for each block, especially because the nodes receive an input variable from the hash-calculating computers, which are specialized in building the data blocks. If this value based on said hash functions is correct, and if it corresponds to the last block, which the node already has, then the node just attaches the new data block to the blockchain that it already has. If the incoming data block cannot be calculated based on the last known block, then the node discards it and waits for the next.
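The accept-or-discard decision of a node can be sketched as follows. The Block and Node classes are hypothetical names introduced only for this illustration, and the check is reduced to the single rule described above; real Bitcoin nodes verify a good deal more before they accept a block.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    prev_hash: str  # hash of the block this one claims to follow
    tx_data: str    # stand-in for the block's transactions

    def digest(self) -> str:
        """SHA-256 over the previous hash and the transaction data."""
        text = f"{self.prev_hash}|{self.tx_data}"
        return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Node:
    def __init__(self, first_block: Block) -> None:
        self.chain = [first_block]  # the node's locally stored blockchain

    def receive(self, block: Block) -> bool:
        """Attach the block if it fits the last known block, else discard it."""
        if block.prev_hash == self.chain[-1].digest():
            self.chain.append(block)
            return True
        return False


first = Block("0" * 64, "transactions of block 1")
node = Node(first)

fitting = Block(first.digest(), "transactions of block 2")
stray = Block("f" * 64, "transactions from some other chain")

print(node.receive(fitting))  # True: attached to the local chain
print(node.receive(stray))    # False: discarded
print(len(node.chain))        # 2
```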
Thousands of nodes and hash functions ensure truth
The whole procedure is being carried out constantly on many thousands of nodes. So, local blockchains stored on the nodes are built up block by block. In the Bitcoin blockchain, many thousands of nodes have joined Blocks 1 to 5 in the above example and stored them locally.
The P2P network prevents manipulation
Now, it is extremely hard to carry out any manipulation. If someone on a node (computer) changes one single transaction in the first block, then the hash values of all five blocks in the example are changed.
This is a direct result of the methodology where the hash of a block is always computed from the transaction data plus the hash value of the previous block. The sixth block is, of course, built on the hash value of the fifth block and fits perfectly on the manipulated computer. However, if this sixth block is now sent to the other nodes, the block does not fit the fifth block that they all have stored locally. As a result, the block is refused and the nodes wait for a different block—Block 6. There now may be one computer that can conclusively show false data, but it is only one among thousands. However, the truth can be found in these thousands because the system has many thousands of witnesses.
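The scenario can be played through in miniature. The sketch below assumes 1,000 honest nodes as a scaled-down stand-in for the many thousands in the text, and it represents each block as a simple pair of previous hash and transaction text; all names are hypothetical.

```python
import hashlib


def block_hash(prev_hash: str, tx_data: str) -> str:
    return hashlib.sha256(f"{prev_hash}|{tx_data}".encode("utf-8")).hexdigest()


def build_chain(tx_list):
    """Return a list of (prev_hash, tx_data) pairs, each linked to the last."""
    chain, prev = [], "0" * 64
    for tx_data in tx_list:
        chain.append((prev, tx_data))
        prev = block_hash(prev, tx_data)
    return chain


def tip_hash(chain) -> str:
    """Hash of the last block in a locally stored chain."""
    prev_hash, tx_data = chain[-1]
    return block_hash(prev_hash, tx_data)


honest_txs = [f"transactions of block {n}" for n in range(1, 6)]
honest_nodes = [build_chain(honest_txs) for _ in range(1000)]

# One node alters a transaction in Block 1 and recalculates its own chain.
forged_txs = ["transactions of block 1 (altered)"] + honest_txs[1:]
forger_chain = build_chain(forged_txs)

# Block 6 is built on the manipulated chain and sent to everyone else.
block6 = (tip_hash(forger_chain), "transactions of block 6")

fits_forger = block6[0] == tip_hash(forger_chain)
accepted_by = sum(1 for chain in honest_nodes if block6[0] == tip_hash(chain))

print(fits_forger)   # True: the block fits on the manipulated computer
print(accepted_by)   # 0: none of the honest nodes can attach it
```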
To prevent every node from making changes of its own and constantly flooding the whole network with falsified data, Nakamoto provided a further module. This module has another function and rewards certain components in the network with it.
Computers that calculate and assemble the data blocks receive a special position in the network. Nodes store the existing blockchain themselves and hang each new block on it. This is a process that requires comparatively little computing power, but quite a lot of storage space, because each individual node saves the ever-longer chain for all time on its hard drive from the outset. That appears expensive and profligate, but it is the price you must pay for the truth.
This is a short excerpt from my book.
Bitcoin, Blockchain & Co.
The Truth, and Nothing but the Truth