Ethereum has managed to construct a blockchain that can do much more than just transfer cryptocurrencies between participants.
Complex applications can be developed and run on the platform, all with the unique features of blockchain technology. It has also been steadily evolving over the past few years. The biggest event was The Merge, in which Ethereum switched to the proof-of-stake (PoS) consensus mechanism. Well before The Merge, in December 2020, the PoS chain went live and ran parallel to the Ethereum proof-of-work (PoW) chain. Then, in The Merge, the two chains were joined. This historical development is still very evident in the platform’s architecture. For example, there are basically two layers in Ethereum: the execution layer, which contains the remnants of the old Ethereum PoW chain, and the consensus layer, which carries the innovations of the Beacon Chain introduced with PoS. Since PoS has been the only consensus mechanism since The Merge, the Ethereum chain is now collectively called the Beacon Chain.
In this blog post, we would like to give you some general information to help you differentiate Blockchain 2.0 from Blockchain 1.0 in a comprehensible way.
Ethereum sees itself as a transaction-based state machine that starts with an initial genesis state and is converted into a final state through transactions. This final state is not a state with which the system ends but is always the most up-to-date state of the platform. (If you want to read more about this, see https://ethereum.github.io/yellowpaper/paper.pdf, a detailed paper.)
The Bitcoin project can also be described as a state machine, with the state represented by the global collection of all unspent transaction outputs (UTXOs; see https://medium.com/cybermiles/diving-into-ethereums-world-statec893102030ed for more information). Bitcoin’s state is also altered by transactions on the network. To initiate these transactions, the participant must use their key to access one or more UTXOs and convert them into new UTXOs. With Bitcoin, users don’t have an account balance associated with their address. They only manage keys in their wallets that can unlock UTXOs assigned to them. So, while the state of Bitcoin is rather abstract, Ethereum sees states as a basic concept on which its whole project is built. Unlike in Bitcoin, accounts form an important basic construct in the Ethereum network. These represent the addresses of the participants in the network but can contain much more information.
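The difference between the two state models can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: signatures, fees, and nonces are left out, and all function and variable names are hypothetical rather than taken from Bitcoin or Ethereum.

```python
def apply_utxo_tx(utxos, txid, inputs, outputs):
    """Bitcoin-style transition: consume existing UTXOs, create new ones.

    utxos:   dict mapping (txid, output_index) -> (owner, amount)
    inputs:  list of (txid, output_index) keys to spend
    outputs: list of (owner, amount) pairs to create
    """
    if len(set(inputs)) != len(inputs) or any(i not in utxos for i in inputs):
        raise ValueError("input missing or spent twice")
    if sum(utxos[i][1] for i in inputs) < sum(amount for _, amount in outputs):
        raise ValueError("outputs exceed inputs")
    # The new state is the old set minus the spent outputs plus the new ones.
    new_utxos = {k: v for k, v in utxos.items() if k not in inputs}
    for index, output in enumerate(outputs):
        new_utxos[(txid, index)] = output
    return new_utxos


def utxo_balance(utxos, owner):
    """A Bitcoin 'balance' is only derived by summing the owner's UTXOs."""
    return sum(amount for who, amount in utxos.values() if who == owner)


def apply_account_tx(accounts, sender, recipient, amount):
    """Ethereum-style transition: the balance is stored per account."""
    if amount < 0 or accounts.get(sender, 0) < amount:
        raise ValueError("invalid amount or insufficient balance")
    new_accounts = dict(accounts)
    new_accounts[sender] = new_accounts.get(sender, 0) - amount
    new_accounts[recipient] = new_accounts.get(recipient, 0) + amount
    return new_accounts


# Alice pays Bob 3 units in both models.
utxo_state = apply_utxo_tx(
    {("genesis", 0): ("alice", 10)},
    "tx1",
    [("genesis", 0)],
    [("bob", 3), ("alice", 7)],  # the second output is Alice's change
)
account_state = apply_account_tx({"alice": 10}, "alice", "bob", 3)
```

In the UTXO sketch, Alice’s old output disappears and two new ones appear; in the account sketch, two stored balances are simply updated.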
Now, we’ll introduce you to a data structure that is omnipresent in the Ethereum project: the Modified Merkle Patricia Trie (MPT). The MPT is a combination of two data structures: the Merkle tree and the Patricia trie.
The word trie is derived from the word retrieval, as in information retrieval. The similarity to the word tree is intentional because a trie is structured like a tree and forms a key-value store. In general, a trie stores strings in such a way that they can be looked up efficiently: starting at the root, each path through the trie spells out a word. Instead of words, a trie can also store IP addresses, for example. In a regular trie, each additional character in a word adds a new node to the path. The Patricia trie, by contrast, merges consecutive characters into a single node wherever there is no branching, which saves space. In the Patricia trie, the strings represent the keys that lead to a certain value, as shown in the figure below. For example, the value can be the word itself or the ID of a word. However, the value itself is only recorded in the leaf at the very bottom of the trie. Along the way, the key-value pairs are formed by the substring acting as the key and the child node forming the value.
The MPT combines the search properties of the Patricia trie and the hash property of the Merkle tree. The keys introduced with the Patricia trie are broken down into nibbles in the MPT. In computer science, a nibble is a unit of four bits (or half a byte). Here, a nibble is represented as a hexadecimal digit, which means that it can take values from 0 to 9 or a to f. There are basically three different types of nodes in an MPT: extension nodes, which hold a sequence of nibbles shared by all keys below them and point to a single child node; branch nodes, which offer one slot per possible nibble (0 to f) for references to child nodes; and leaf nodes, which hold the remaining nibbles of a key together with its value.
We would now like to illustrate this abstract explanation with an example. Let’s say we have four keys already converted to nibbles: fa284b1, fa83bc9, fa83b14, and fad3492. These keys are linked to the values value1, value2, value3, and value4. This scenario is now to be represented in an MPT. The beginning fa is the same for all strings. Therefore, as shown below, an extension node is formed for this purpose.
This extension node is also the root of the trie, but the following characters of the keys differ. Therefore, the extension node points to a branch node, in which the references to the next child nodes can be entered. For the first key, a reference is created in slot 2; for the second and third keys, a reference is created in slot 8; and for the last key, a reference is created in slot d. Since the first and last keys remain unique from this point on, a leaf node is created for each of them with the remaining nibbles of the key. These leaf nodes contain the values. The second and third keys, however, also share their next characters. Therefore, slot 8 refers to a new extension node with the matching nibbles of the keys. To represent the last, differing nibbles, a branch node is used again, with references to the following child nodes at slot c and slot 1. These child nodes are the final leaf nodes with the last nibble, and they contain the values. Note that if the nibble in the branch node were the last digit of the key, the value would be stored directly there, just like in a leaf node.
To further illustrate the concept of an MPT, the listing below shows what an implementation of the example trie looks like.
#Extension{:key [f a]
           :child #Branch{2 #Leaf{:key [8 4 b 1]
                                  :value "value1"}
                          8 #Extension{:key [3 b]
                                       :child #Branch{c #Leaf{:key [9]
                                                              :value "value2"}
                                                      1 #Leaf{:key [4]
                                                              :value "value3"}}}
                          d #Leaf{:key [3 4 9 2]
                                  :value "value4"}}}
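To make the structure of the listing above easier to follow, here is a minimal Python sketch that builds the same trie from the four example keys and looks a key up again. It assumes that all keys are known up front and represents nodes as plain dictionaries; the function names `build` and `lookup` are hypothetical, and hashing and encoding are left out entirely.

```python
from os.path import commonprefix  # character-wise common prefix of strings


def build(items):
    """Build a trie node from a dict mapping nibble strings to values."""
    if len(items) == 1:
        (key, value), = items.items()
        return {"type": "leaf", "key": key, "value": value}
    prefix = commonprefix(list(items))
    if prefix:
        # All keys share these nibbles: collapse them into an extension node.
        rest = {k[len(prefix):]: v for k, v in items.items()}
        return {"type": "extension", "key": prefix, "child": build(rest)}
    # The keys differ in their first nibble: fan out in a branch node.
    branch = {"type": "branch", "slots": {}, "value": None}
    groups = {}
    for key, value in items.items():
        if key == "":
            branch["value"] = value  # a key that ends exactly at this branch
        else:
            groups.setdefault(key[0], {})[key[1:]] = value
    for nibble, group in groups.items():
        branch["slots"][nibble] = build(group)
    return branch


def lookup(node, key):
    """Follow the nibbles of key through the trie; return None if absent."""
    while node is not None:
        if node["type"] == "leaf":
            return node["value"] if node["key"] == key else None
        if node["type"] == "extension":
            if not key.startswith(node["key"]):
                return None
            key, node = key[len(node["key"]):], node["child"]
        else:  # branch node
            if key == "":
                return node["value"]
            node, key = node["slots"].get(key[0]), key[1:]
    return None


trie = build({
    "fa284b1": "value1",
    "fa83bc9": "value2",
    "fa83b14": "value3",
    "fad3492": "value4",
})
```

Looking up `fa83b14` walks through the root extension node (`fa`), slot 8 of the first branch node, the second extension node (`3b`), and slot 1 of the second branch node before it reaches the leaf holding `value3`.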
Now, the Merkle part of the MPT comes into play: hashing. This, too, happens from the bottom up. First, the leaf nodes are hashed along with their data. Then, in the parent nodes (nodes that directly reference the affected leaf nodes), the pointers are replaced by the hash of the node in question. The parent node is then hashed, and the pointers are also replaced in their parent nodes. This continues until it’s the root’s turn. The resulting hash is ultimately the root hash of the MPT.
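The bottom-up hashing can be sketched as follows for the example trie. This is a simplified illustration, not Ethereum’s actual procedure: Ethereum hashes the RLP encoding of each node with Keccak-256, whereas this sketch joins the fields with ad hoc separators and uses SHA3-256 from Python’s standard library (a related but different function) as a stand-in. The node layout and all names are assumptions made for the example.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in hash function; Ethereum itself uses Keccak-256.
    return hashlib.sha3_256(data).digest()


def node_hash(node) -> bytes:
    """Hash a node after replacing its child pointers by the child hashes."""
    kind = node[0]
    if kind == "leaf":
        _, key, value = node
        return h(b"leaf|" + key.encode() + b"|" + value.encode())
    if kind == "extension":
        _, key, child = node
        return h(b"ext|" + key.encode() + b"|" + node_hash(child))
    # Branch node: combine the hashes of all occupied slots in slot order.
    _, slots = node
    parts = [nibble.encode() + node_hash(child)
             for nibble, child in sorted(slots.items())]
    return h(b"branch|" + b"".join(parts))


def example_trie(third_value="value3"):
    """The example trie from the text, written as nested tuples."""
    return ("extension", "fa", ("branch", {
        "2": ("leaf", "84b1", "value1"),
        "8": ("extension", "3b", ("branch", {
            "c": ("leaf", "9", "value2"),
            "1": ("leaf", "4", third_value),
        })),
        "d": ("leaf", "3492", "value4"),
    }))


root_hash = node_hash(example_trie())
changed_root_hash = node_hash(example_trie(third_value="tampered"))
```

Changing a single value in a leaf changes the hash of every node on the path up to the root, so the two root hashes differ. This is the property that makes the root hash a compact fingerprint of the whole trie.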
In Ethereum, the MPT is used multiple times, as you’ll see in the following sections.
The state trie is the heart of Ethereum and is made up of many individual accounts. In this section, we explain how this important data structure works.
The global state of Ethereum is the sum of the states of many accounts that exist on the network. An account is represented by an address that can be used to identify the account beyond doubt. There are two types of accounts in Ethereum: externally owned accounts (EOAs) and contract accounts (CAs).
The addresses of EOAs in Ethereum are calculated from the public key of the user. An Ethereum address has a size of 20 bytes and consists of 40 characters of the hexadecimal system. Each Ethereum address is preceded by the identifier 0x, which indicates the use of the hexadecimal system. Originally, the characters a to f only appeared in lowercase in the addresses, but since then, a variant has been introduced in which the characters are also capitalized. The latter variant includes a checksum that detects when an address has been entered incorrectly.
The addresses of CAs are calculated from the sender address and the total number of transactions a sender has made. For this purpose, the two values are encoded and hashed.
EOAs are accounts used by external users, outside of the Ethereum platform (e.g., real people as users). These accounts are accessed via a private key. Smart contracts are represented in the network via the CAs; instead of a private key, they are controlled only by the program code of the smart contract. In addition, the CAs can be connected to other program code in other CAs.
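The state of any account, regardless of its form, consists of four components, as shown in the next figure: the nonce, the balance, the storage (referenced by the root hash of the account’s storage trie), and the program code (referenced by its hash).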
Together, the individual components represent the state of an account.
We’ve explained that states play an important role in the Ethereum network. The current state of the Ethereum blockchain is constantly updated on the network, creating a global state called the platform’s world state. The state trie maps this global state, making it the heart of the Ethereum blockchain. So, it’s a snapshot of the entire system. A copy of the state trie is stored on each node in the network.
The state of the network is the sum of the states of all accounts in the network, and this is the reason why all existing accounts in the state are stored as key-value pairs. A key is represented by the address of an account. The value contains the associated account, including all its components, encoded in the Recursive Length Prefix (RLP) format. This means that the current balance, the current nonce, the entire storage, and the entire code of each account can be found in the state trie.
Recursive Length Prefix Format: The RLP format is used in Ethereum for serializing objects in byte streams. RLP takes either a string or a list of strings as an object, and it only encodes the raw structure of these objects and doesn’t care about how those objects were interpreted before encoding. This interpretation is then made again by the decoder at a later stage. With RLP, it’s possible to store data compactly in the tries or transfer it between nodes.
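The encoding rules of RLP are compact enough to sketch in Python. The function below is a minimal encoder that follows the publicly documented RLP rules for byte strings and nested lists; it is an illustration rather than a production implementation, and the helper name `_length_prefix` is hypothetical.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """Prefix for a payload: offset 0x80 for strings, 0xC0 for lists."""
    if length <= 55:
        return bytes([offset + length])
    # Longer payloads: one byte stating how many length bytes follow,
    # then the length itself as a big-endian integer.
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode a bytes object or a (nested) list of bytes objects."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single small byte is its own encoding
        return _length_prefix(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(child) for child in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError("RLP only encodes byte strings and lists")


encoded_string = rlp_encode(b"dog")           # b"\x83dog"
encoded_list = rlp_encode([b"cat", b"dog"])   # b"\xc8\x83cat\x83dog"
```

Note that the encoder never looks at what the bytes mean. Whether `b"dog"` is text, a number, or part of an account is left to the code that decodes it.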
As in Bitcoin, there are transactions on the Ethereum platform that are stored in a transaction trie. In this section, we’ll explain how transactions fit into the system.
Transactions are an important construct in the Ethereum blockchain because they set the platform in motion. When transactions happen between accounts, Ethereum moves from one state to a new final state that can then be stored again.
As in the Bitcoin blockchain, a transaction is usually a message between actors in the network. Dr. Gavin Wood’s Ethereum yellow paper, “Ethereum: A Secure Decentralised Generalised Transaction Ledger,” describes a transaction as a single cryptographically signed instruction initiated by an EOA. Messages can be sent to other EOAs or CAs via a message call, and if the transaction takes place between two EOAs, it’s simply a matter of sending a certain amount of Ether. This is the use case that Bitcoin or other cryptocurrencies meet. When a transaction takes place between an EOA and a CA, it’s done to call the internal program code of the CA. This entails operations on the internal memory. Transactions can serve another purpose: to create CAs by initiating a smart contract. The different use cases for transactions are shown here.
As with Bitcoin, all transactions can be identified with a unique hash, which in Ethereum is called a TxHash. A standard transaction in Ethereum consists of several components detailed in the following list and figure:
Gas: The Fuel of Ethereum: If you actively participate in the Ethereum network, you can’t avoid gas. Gas keeps Ethereum running and is the unit in which users pay for the computing power they consume in the network. Transactions, the creation of smart contracts, and the use of smart contracts all cost gas: every operation performed requires a predetermined amount of it. This allows developers to add up how much gas their smart contract will consume during operation and optimize it accordingly. Gas is not a currency, but the price of gas is expressed in Ether. The unit used is Gwei, which in turn corresponds to 1,000,000,000 Wei. The price of gas is determined by supply and demand in the network, so gas is a constant unit in a market where prices fluctuate. You can think of it like your car: if you have 5 gallons left in the tank, you know how far you can get with your car, no matter how high the price of gasoline is.
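The relationship between gas, gas price, and Ether is plain arithmetic, as the following Python sketch shows. The figure of 21,000 gas for a simple Ether transfer is the protocol’s base cost for a transaction and is not taken from the text above; the gas prices of 20 and 40 Gwei are arbitrary example values, and the function names are hypothetical.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9    # 1 Gwei = 1,000,000,000 Wei
WEI_PER_ETHER = 10**18  # 1 Ether = 10^18 Wei

SIMPLE_TRANSFER_GAS = 21_000  # assumption: base cost of a plain Ether transfer


def fee_in_wei(gas_used: int, gas_price_gwei: int) -> int:
    """Fee = amount of gas consumed times the price per unit of gas."""
    return gas_used * gas_price_gwei * WEI_PER_GWEI


def wei_to_ether(wei: int) -> Decimal:
    """Convert Wei to Ether without floating-point rounding errors."""
    return Decimal(wei) / Decimal(WEI_PER_ETHER)


# The amount of gas stays the same; only the price per unit fluctuates.
fee_at_20_gwei = wei_to_ether(fee_in_wei(SIMPLE_TRANSFER_GAS, 20))  # 0.00042 Ether
fee_at_40_gwei = wei_to_ether(fee_in_wei(SIMPLE_TRANSFER_GAS, 40))  # 0.00084 Ether
```

As in the car analogy, the operation always burns the same amount of fuel; what it costs in Ether depends on the price at the time.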
In a constantly evolving system like Ethereum, important components such as transactions are also changing. In addition to the standard transactions that we’ve described, there are modified transaction types with extended features. To make the system able to respond well to future developments and maintain backward compatibility, a typed transaction envelope was introduced and can handle a wide variety of transaction types (see https://eips.ethereum.org/EIPS/eip-2718). New transaction types can be wrapped in the envelope and just need to ensure backward compatibility.
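How a node might tell the transaction types apart can be sketched as follows. According to EIP-2718, a typed transaction is a single type byte in the range 0x00 to 0x7f followed by a type-specific payload, while a legacy transaction is an RLP list and therefore starts with a byte of 0xc0 or higher. The function name and the returned labels are hypothetical, and the payload is not decoded in this sketch.

```python
# Transaction types defined by later EIPs (not an exhaustive list).
KNOWN_TYPES = {
    0x01: "EIP-2930 access list",
    0x02: "EIP-1559 fee market",
}


def split_envelope(raw: bytes):
    """Return (label, type byte or None, payload) for a raw transaction."""
    if not raw:
        raise ValueError("empty transaction")
    first = raw[0]
    if first <= 0x7F:
        # Typed envelope: the first byte is the type, the rest the payload.
        return KNOWN_TYPES.get(first, "unknown type"), first, raw[1:]
    if first >= 0xC0:
        # Legacy transaction: the whole byte string is an RLP list.
        return "legacy", None, raw
    raise ValueError("not a valid transaction encoding")


label, tx_type, payload = split_envelope(bytes([0x02]) + b"\xc0")
```

Because the first byte of a legacy transaction can never fall into the range reserved for type bytes, old and new formats can coexist without ambiguity.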
Unlike the data in accounts, transactions in the block are not subsequently changed. It therefore makes sense to store transactions in a separate data structure. For this purpose, the Ethereum network uses the transaction trie by storing the transactions collected in the transaction list there. Here, Ethereum again resembles Bitcoin because unlike in the state trie, not all transactions in the network are stored in a transaction trie; only the transactions that have occurred since the last block are. So, there are several transaction tries—one per block, to be exact. Otherwise, the transaction trie works like a normal MPT. The transactions are stored in key-value pairs in the trie, with the RLP-encoded index of the transaction (which is important for the order) representing the key and the transaction components described previously representing the value.
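The keys of a transaction trie can be illustrated with a short Python sketch. It assumes the usual RLP convention that an integer is encoded as its big-endian bytes without leading zeros (so 0 becomes the empty string); the helper names are hypothetical, and the helper only handles indices that fit into 55 bytes, which is far more than any block needs.

```python
def rlp_encode_index(index: int) -> bytes:
    """RLP-encode a transaction index (a non-negative integer)."""
    raw = index.to_bytes((index.bit_length() + 7) // 8, "big")  # 0 -> b""
    if len(raw) == 1 and raw[0] < 0x80:
        return raw  # a single small byte is its own encoding
    return bytes([0x80 + len(raw)]) + raw


def to_nibbles(data: bytes) -> list:
    """Split every byte into its high and low nibble."""
    return [nibble for byte in data for nibble in (byte >> 4, byte & 0x0F)]


# Trie keys (as nibble paths) for a few positions in the transaction list.
keys = {index: to_nibbles(rlp_encode_index(index)) for index in (0, 1, 127, 128)}
# 0   -> [8, 0]         (0x80, the empty string)
# 1   -> [0, 1]         (0x01)
# 127 -> [7, 15]        (0x7f)
# 128 -> [8, 1, 8, 0]   (0x81 0x80)
```

The nibble paths do not sort in the same order as the indices, which is why the index itself has to be part of the key rather than being implied by the position in the trie.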
You’ve now learned how human users (or off-chain software) as external actors can influence the network with external transactions via the EOAs. However, contract accounts that are located exclusively within the platform boundaries must also be able to actively participate in the network. Ethereum enables messages for this reason. With the help of messages, the contract accounts can communicate with other contract accounts and call functions there. Messages are similar to transactions, but they have some peculiarities. For example, messages can never be sent spontaneously. Each first message is preceded by an initial transaction of an EOA, but it can then trigger further messages. Another special feature is that messages don’t become part of the blockchain but only exist in the execution environment during runtime.
Nevertheless, messages can influence the state of an account. For example, it’s often the task of a contract account to send Ether and thus update the balance of an EOA or a CA. Such a message is sometimes referred to as a value transfer or an internal transaction. Again, these special messages are not stored in the blockchain but still change the balance of the account in question (see figure below). This may sound unusual, as we know from the Bitcoin blockchain that all transactions are stored without gaps. The blockchain alone therefore doesn’t directly record where the Ether in a balance originally came from. However, the initial transaction, the input parameters entered, and the transparent view of the program code of the called CA can be used to reconstruct where the money came from by simulating the execution. For example, the leading Ethereum block explorer Etherscan exploits this fact to display the Ether-transferring messages to its users in an uncomplicated way. However, since these value transfers don’t have a TxHash for unique identification, the TxHash of the parent transaction (the initial transaction that directly or indirectly triggered the message) is used.
The structure and components of a message are very similar to the transactions, but there are some differences. Since a message comes from a CA, it doesn’t have a signature due to the lack of a private key. In addition, a message doesn’t have the gas-related fields, as this was already set by the EOA in the initial transaction.
The receipts and receipts trie store the results of a transaction. In this section, we’ll introduce you to why the receipts and the associated tries are important in Ethereum.
Transactions are instructions from an EOA that clearly state what the EOA wants the network to do. The transactions don’t show what happened after the transaction was executed and what effects it had. However, to be able to understand the change in a state, you need to know exactly what happened.
This issue is resolved with the receipts of the transactions. Receipts provide detailed information about how the transaction was carried out, and they consist of several individual components, as shown in the following list and figure:
Like the transactions, the receipts in the block are not changed afterwards. However, the receipts must still be kept separately. The transaction trie is formed before execution and then already has the required immutable hash values. If the receipts were also stored in this trie, it would subsequently change the hash values. For this reason, a separate trie is created for the receipts. The receipts trie is very similar to the trie for the transactions: again, an instance contains only the receipts of the particular block. The receipts are also stored in key-value pairs in the trie, with the RLP-encoded index of the receipt representing the key and the components of the receipt representing the value.
Editor’s note: This post has been adapted from a section of the book Blockchain: The Comprehensive Guide to Blockchain Development, Ethereum, Solidity, and Smart Contracts by Tobias Fertig and Andreas Schütz.