I have published this series of articles on GitHub: blockchain-tutorial. Updates will be made on GitHub and may not be synced here. If you want to run the code directly, you can clone the tutorial repository from GitHub, go to the src directory, and run make.
Introduction
At the beginning of this series, we mentioned that a blockchain is a distributed database. In the previous articles, however, we deliberately skipped the "distributed" part and focused on the "database" part. So far, we have implemented almost all the elements of a blockchain database. Today, we will look at some of the mechanisms we skipped earlier. In the next article, we'll start to discuss the distributed nature of the blockchain.
Previous articles in this series:
Basic Prototype
Proof-of-Work
Persistence and CLI
Transactions 1
Addresses
The code changes significantly in this article; please click here to see all the code changes.
Reward
In the previous article, one of the little details we skipped was mining rewards. Now, we are ready to refine this detail.
The mining reward is simply a coinbase transaction. When a mining node starts mining a new block, it takes transactions from the queue and prepends a coinbase transaction to them. The coinbase transaction has only one output, which contains the miner's public key hash.
Implementing rewards is very simple; we only need to update the send command:
func (cli *CLI) send(from, to string, amount int) {
    ...
    bc := NewBlockchain()
    UTXOSet := UTXOSet{bc}
    defer bc.db.Close()

    tx := NewUTXOTransaction(from, to, amount, &UTXOSet)
    // The sender also mines the block, so the reward goes to `from`.
    cbTx := NewCoinbaseTX(from, "")
    // The coinbase transaction comes first in the block.
    txs := []*Transaction{cbTx, tx}

    newBlock := bc.MineBlock(txs)
    fmt.Println("Success!")
}
In our implementation, whoever creates a transaction also mines the new block, and therefore receives the reward.
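As a rough illustration of the ordering described above, the sketch below assembles a block's transaction list with the coinbase transaction in front. The Tx type, the subsidy value of 10, and the assembleTxs helper are hypothetical simplifications made for this example; they are not the tutorial's actual types.

```go
package main

import "fmt"

// subsidy is an assumed block reward used only in this sketch.
const subsidy = 10

// Tx is a hypothetical, heavily simplified transaction.
type Tx struct {
	Coinbase bool   // true for the reward transaction
	To       string // recipient
	Value    int    // amount transferred
}

// assembleTxs puts a coinbase transaction paying the miner in front of the
// pending transactions, mirroring the order used in send.
func assembleTxs(miner string, pending []Tx) []Tx {
	cb := Tx{Coinbase: true, To: miner, Value: subsidy}
	return append([]Tx{cb}, pending...)
}

func main() {
	pending := []Tx{{To: "bob", Value: 6}}
	for i, tx := range assembleTxs("alice", pending) {
		fmt.Printf("%d: coinbase=%v to=%s value=%d\n", i, tx.Coinbase, tx.To, tx.Value)
	}
}
```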
UTXO Set
In Part 3: Persistence and CLI, we studied how Bitcoin Core stores blocks in a database and learned that blocks are stored in the blocks database, while transaction outputs are stored in the chainstate database. As a reminder, chainstate is organized as follows:
'c' + 32-byte transaction hash -> unspent transaction output record for that transaction
'B' -> 32-byte block hash: the block hash up to which the database represents the unspent transaction outputs
Although we implemented transactions in the earlier articles, we did not use chainstate to store their outputs. So let's do that now.
chainstate does not store transactions. Instead, it stores what is called the UTXO set, which is the set of unspent transaction outputs. In addition, it stores "the block hash up to which the database represents the unspent transaction outputs", but we will skip the block hash for now because we are not using block heights yet (we will improve this in the next article).
So why do we need a UTXO set?
Consider the Blockchain.FindUnspentTransactions method we implemented earlier:
func (bc *Blockchain) FindUnspentTransactions(pubKeyHash []byte) []Transaction {
    ...
    bci := bc.Iterator()

    for {
        block := bci.Next()

        for _, tx := range block.Transactions {
            ...
        }

        if len(block.PrevBlockHash) == 0 {
            break
        }
    }
    ...
}
This function finds transactions that have unspent outputs. Since transactions are stored in blocks, it iterates over every block in the blockchain and checks every transaction in it. As of September 18, 2017, there are 485,860 blocks in Bitcoin, and the whole database takes more than 140 GB of disk space. This means that one has to run a full node to validate transactions. Moreover, validating a transaction would require iterating over many blocks.
The solution to the problem is to have an index that stores only unspent outputs, and this is what the UTXO set does: it is a cache built from all blockchain transactions (by iterating over the blocks, but only once), which is later used to calculate balances and validate new transactions. As of September 2017, the UTXO set is about 2.7 GB.
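To make the idea of "scan once, then answer from the index" concrete, here is a minimal in-memory sketch. The Output, Input, and Tx types and the buildUTXO and balance helpers are assumptions made for this example (owners are plain strings instead of public key hashes, and nothing is persisted); the article's real implementation stores the index in a BoltDB bucket.

```go
package main

import "fmt"

// Output, Input and Tx are hypothetical simplified types for this sketch.
type Output struct {
	Owner string
	Value int
}

type Input struct {
	TxID string // transaction whose output is being spent
	Vout int    // index of that output
}

type Tx struct {
	ID   string
	Vin  []Input
	Vout []Output
}

// buildUTXO walks the chain once, from the newest transaction to the oldest,
// so that every spend is seen before the output it consumes.
func buildUTXO(chain [][]Tx) map[string]map[int]Output {
	utxo := make(map[string]map[int]Output)
	spent := make(map[string]map[int]bool)

	for b := len(chain) - 1; b >= 0; b-- {
		for t := len(chain[b]) - 1; t >= 0; t-- {
			tx := chain[b][t]
			for i, out := range tx.Vout {
				if spent[tx.ID][i] {
					continue // already consumed by a later transaction
				}
				if utxo[tx.ID] == nil {
					utxo[tx.ID] = make(map[int]Output)
				}
				utxo[tx.ID][i] = out
			}
			for _, in := range tx.Vin {
				if spent[in.TxID] == nil {
					spent[in.TxID] = make(map[int]bool)
				}
				spent[in.TxID][in.Vout] = true
			}
		}
	}
	return utxo
}

// balance sums the unspent outputs of owner using only the index.
func balance(utxo map[string]map[int]Output, owner string) int {
	total := 0
	for _, outs := range utxo {
		for _, out := range outs {
			if out.Owner == owner {
				total += out.Value
			}
		}
	}
	return total
}

func main() {
	chain := [][]Tx{
		{{ID: "cb1", Vout: []Output{{"alice", 10}}}},
		{{ID: "tx1", Vin: []Input{{"cb1", 0}}, Vout: []Output{{"bob", 6}, {"alice", 4}}}},
	}
	utxo := buildUTXO(chain)
	fmt.Println(balance(utxo, "alice"), balance(utxo, "bob")) // prints "4 6"
}
```

Once the index exists, balance never touches the blocks again, which is the whole point of the cache.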
Now, let's think about what we need to change to implement the UTXO set. Currently, the following methods are used to find transactions:
Blockchain.FindUnspentTransactions - the main function that finds transactions with unspent outputs. This is the function where the iteration over all blocks happens.
Blockchain.FindSpendableOutputs - used when a new transaction is created. It finds enough outputs to hold the required amount. Uses Blockchain.FindUnspentTransactions.
Blockchain.FindUTXO - finds the unspent outputs for a public key hash, which are then used to get the balance. Uses Blockchain.FindUnspentTransactions.
Blockchain.FindTransaction - finds a transaction in the blockchain by its ID. It iterates over all blocks until it finds the transaction.
As you can see, all of these methods iterate over the blocks in the database. But we cannot improve all of them for now, because the UTXO set doesn't store all transactions, only those that have unspent outputs. Thus, it cannot be used in Blockchain.FindTransaction.
So, we want the following methods:
Blockchain.FindUTXO - finds all unspent outputs by iterating over the blocks.
UTXOSet.Reindex - uses Blockchain.FindUTXO to find unspent outputs and stores them in the database. This is where caching happens.
UTXOSet.FindSpendableOutputs - analogous to Blockchain.FindSpendableOutputs, but uses the UTXO set.
UTXOSet.FindUTXO - analogous to Blockchain.FindUTXO, but uses the UTXO set.
Blockchain.FindTransaction - remains the same as before.
Thus, the two most frequently used functions will use the cache from now on! Let's start coding.
type UTXOSet struct {
    Blockchain *Blockchain
}
We will use a single database, but we will store the UTXO set in a different bucket. Thus, UTXOSet is coupled with Blockchain.
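func (u UTXOSet) Reindex() {
    db := u.Blockchain.db
    bucketName := []byte(utxoBucket)

    // Drop the old bucket (if any) and create an empty one.
    err := db.Update(func(tx *bolt.Tx) error {
        err := tx.DeleteBucket(bucketName)
        if err != nil && err != bolt.ErrBucketNotFound {
            return err
        }
        _, err = tx.CreateBucket(bucketName)
        return err
    })
    if err != nil {
        log.Panic(err)
    }

    // Scan the whole blockchain once to collect the unspent outputs.
    UTXO := u.Blockchain.FindUTXO()

    // Store the outputs, keyed by the raw transaction ID.
    err = db.Update(func(tx *bolt.Tx) error {
        b := tx.Bucket(bucketName)

        for txID, outs := range UTXO {
            key, err := hex.DecodeString(txID)
            if err != nil {
                return err
            }
            if err := b.Put(key, outs.Serialize()); err != nil {
                return err
            }
        }
        return nil
    })
    if err != nil {
        log.Panic(err)
    }
}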
This method creates the initial UTXO set. First, it removes the bucket if it exists, then it collects all unspent outputs from the blockchain, and finally it saves the outputs to the bucket.
Blockchain.FindUTXO is almost identical to Blockchain.FindUnspentTransactions, but now it returns a map of TransactionID -> TransactionOutputs pairs.
Now, the UTXO set can be used to send coins:
func (u UTXOSet) FindSpendableOutputs(pubkeyHash []byte, amount int) (int, map[string][]int) {
    unspentOutputs := make(map[string][]int)
    accumulated := 0
    db := u.Blockchain.db

    err := db.View(func(tx *bolt.Tx) error {
        b := tx.Bucket([]byte(utxoBucket))
        c := b.Cursor()

        // Walk the whole UTXO bucket and collect outputs until the amount is covered.
        for k, v := c.First(); k != nil; k, v = c.Next() {
            txID := hex.EncodeToString(k)
            outs := DeserializeOutputs(v)

            for outIdx, out := range outs.Outputs {
                if out.IsLockedWithKey(pubkeyHash) && accumulated < amount {
                    accumulated += out.Value
                    unspentOutputs[txID] = append(unspentOutputs[txID], outIdx)
                }
            }
        }
        return nil
    })
    if err != nil {
        log.Panic(err)
    }

    return accumulated, unspentOutputs
}
Or check the balance:
func (u UTXOSet) FindUTXO(pubKeyHash []byte) []TXOutput {
    var UTXOs []TXOutput
    db := u.Blockchain.db

    err := db.View(func(tx *bolt.Tx) error {
        b := tx.Bucket([]byte(utxoBucket))
        c := b.Cursor()

        for k, v := c.First(); k != nil; k, v = c.Next() {
            outs := DeserializeOutputs(v)

            for _, out := range outs.Outputs {
                if out.IsLockedWithKey(pubKeyHash) {
                    UTXOs = append(UTXOs, out)
                }
            }
        }
        return nil
    })
    if err != nil {
        log.Panic(err)
    }

    return UTXOs
}
These are slightly modified versions of the corresponding Blockchain methods. The Blockchain versions are no longer needed.
Having the UTXO set means that our data (transactions) is now split into two storages: the actual transactions are stored in the blockchain, and unspent outputs are stored in the UTXO set. Such a separation requires a solid synchronization mechanism, because we want the UTXO set to always be up to date and to store the outputs of the most recent transactions. But we don't want to rebuild the index every time a new block is mined, because these frequent blockchain scans are exactly what we want to avoid. Thus, we need a mechanism for updating the UTXO set:
func (u UTXOSet) Update(block *Block) {
    db := u.Blockchain.db

    err := db.Update(func(tx *bolt.Tx) error {
        b := tx.Bucket([]byte(utxoBucket))

        for _, tx := range block.Transactions {
            if !tx.IsCoinbase() {
                for _, vin := range tx.Vin {
                    // Keep every output of the referenced transaction except the one being spent.
                    updatedOuts := TXOutputs{}
                    outsBytes := b.Get(vin.Txid)
                    outs := DeserializeOutputs(outsBytes)

                    for outIdx, out := range outs.Outputs {
                        if outIdx != vin.Vout {
                            updatedOuts.Outputs = append(updatedOuts.Outputs, out)
                        }
                    }

                    if len(updatedOuts.Outputs) == 0 {
                        if err := b.Delete(vin.Txid); err != nil {
                            return err
                        }
                    } else {
                        if err := b.Put(vin.Txid, updatedOuts.Serialize()); err != nil {
                            return err
                        }
                    }
                }
            }

            // All outputs of the new transaction start out unspent.
            newOutputs := TXOutputs{}
            for _, out := range tx.Vout {
                newOutputs.Outputs = append(newOutputs.Outputs, out)
            }

            if err := b.Put(tx.ID, newOutputs.Serialize()); err != nil {
                return err
            }
        }
        return nil
    })
    if err != nil {
        log.Panic(err)
    }
}
Although this method looks big, what it does is quite straightforward. When a new block is mined, the UTXO set should be updated. Updating means removing the spent outputs and adding the unspent outputs of the newly mined transactions. If a transaction whose outputs were removed contains no more outputs, it is removed as well. Quite simple!
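The update rule can also be sketched without a database. The example below applies one block to an in-memory index; the types and the applyBlock helper are hypothetical. Unlike the article's TXOutputs slice, this sketch keys outputs by their original index, an assumption made here so that an input's Vout keeps pointing at the same output after sibling outputs have been removed.

```go
package main

import "fmt"

// Output, Input and Tx are hypothetical simplified types for this sketch.
type Output struct {
	Owner string
	Value int
}

type Input struct {
	TxID string
	Vout int
}

type Tx struct {
	ID   string
	Vin  []Input
	Vout []Output
}

// applyBlock updates utxo in place with the transactions of one new block:
// spent outputs are removed, new outputs are added, and a transaction with
// no remaining outputs disappears from the index.
func applyBlock(utxo map[string]map[int]Output, block []Tx) {
	for _, tx := range block {
		// A coinbase transaction has no inputs, so this loop is skipped for it.
		for _, in := range tx.Vin {
			delete(utxo[in.TxID], in.Vout)
			if len(utxo[in.TxID]) == 0 {
				delete(utxo, in.TxID)
			}
		}
		outs := make(map[int]Output, len(tx.Vout))
		for i, out := range tx.Vout {
			outs[i] = out
		}
		utxo[tx.ID] = outs
	}
}

func main() {
	utxo := map[string]map[int]Output{"cb1": {0: {"alice", 10}}}
	block := []Tx{
		{ID: "cb2", Vout: []Output{{"alice", 10}}},
		{ID: "tx1", Vin: []Input{{"cb1", 0}}, Vout: []Output{{"bob", 6}, {"alice", 4}}},
	}
	applyBlock(utxo, block)
	fmt.Println(len(utxo), utxo["tx1"][1].Value) // prints "2 4"
}
```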
Now let's use the UTXO set when necessary:
func (cli *CLI) createBlockchain(address string) {
    ...
    bc := CreateBlockchain(address)
    defer bc.db.Close()

    UTXOSet := UTXOSet{bc}
    UTXOSet.Reindex()
    ...
}
Reindexing happens right after a new blockchain is created. For now, this is the only place where Reindex is used, even though it looks excessive here: at the beginning of a blockchain there is only one block with one transaction, and Update could have been used instead. But we might need the reindexing mechanism in the future.
func (cli *CLI) send(from, to string, amount int) {
    ...
    newBlock := bc.MineBlock(txs)
    UTXOSet.Update(newBlock)
}
The UTXO set is updated after a new block is mined.
Let's check that it works as expected:
$ blockchain_go createblockchain -address 1JnMDSqVoHi4TEFXNw5wJ8skPsPf4LHkQ1
00000086a725e18ed7e9e06f1051651a4fc46a315a9d298e59e57aeacbe0bf73
Done!

$ blockchain_go send -from 1JnMDSqVoHi4TEFXNw5wJ8skPsPf4LHkQ1 -to 12DkLzLQ4B3gnQt62EPRJGZ38n3zF4Hzt5 -amount 6
0000001f75cb3a5033aeecbf6a8d378e15b25d026fb0a665c7721a5bb0faa21b
Success!

$ blockchain_go send -from 1JnMDSqVoHi4TEFXNw5wJ8skPsPf4LHkQ1 -to 12ncZhA5mFTTnTmHq1aTPYBri4jAK8TacL -amount 4
000000cc51e665d53c78af5e65774a72fc7b864140a8224bf4e7709d8e0fa433
Success!

$ blockchain_go getbalance -address 1JnMDSqVoHi4TEFXNw5wJ8skPsPf4LHkQ1
Balance of '1F4MbuqjcuJGymjcuYQMUVYB37AWKkSLif': 20

$ blockchain_go getbalance -address 12DkLzLQ4B3gnQt62EPRJGZ38n3zF4Hzt5
Balance of '1XWu6nitBWe6J6v6MXmd5rhdP7dZsExbx': 6

$ blockchain_go getbalance -address 12ncZhA5mFTTnTmHq1aTPYBri4jAK8TacL
Balance of '13UASQpCR8Nr41PojH8Bz4K6cmTCqweskL': 4
Nice! The 1JnMDSqVoHi4TEFXNw5wJ8skPsPf4LHkQ1 address received the reward 3 times:
Once for mining the genesis block.
Once for mining block 0000001f75cb3a5033aeecbf6a8d378e15b25d026fb0a665c7721a5bb0faa21b.
Once for mining block 000000cc51e665d53c78af5e65774a72fc7b864140a8224bf4e7709d8e0fa433.
Merkle Tree
There is one more optimization mechanism I'd like to discuss in this article.
The full Bitcoin database (i.e., the blockchain) takes more than 140 GB of disk space at the time of writing. Because of the decentralized nature of Bitcoin, every node in the network must be independent and self-sufficient, i.e., every node must store a full copy of the blockchain. As more people start using Bitcoin, this rule becomes harder to follow: it is unlikely that everyone will run a full node. Also, since nodes are full-fledged participants of the network, they have responsibilities: they must verify transactions and blocks. Finally, a certain amount of network traffic is required to interact with other nodes and download new blocks.
Satoshi Nakamoto's original Bitcoin paper offers a solution to this problem: Simplified Payment Verification (SPV). An SPV node is a light Bitcoin node that doesn't download the whole blockchain and doesn't verify blocks and transactions. Instead, it looks up transactions in blocks (to verify payments) and connects to a full node to retrieve just the necessary data. This mechanism allows multiple light wallet nodes to run on top of a single full node.
For SPV to be possible, there must be a way to check whether a block contains a certain transaction without downloading the whole block. This is where the Merkle tree comes into play.
Bitcoin uses Merkle trees to obtain a hash of the transactions, which is then saved in the block header and used by the proof-of-work system. Until now, we have simply concatenated the hashes of all transactions in a block and applied SHA-256 to the result. This is also a good way of getting a unique representation of a block's transactions, but it doesn't have the benefits of Merkle trees.
Take a look at the Merkle tree:
Every block has its own Merkle tree. The tree starts with leaves (the bottom of the tree), where a leaf is a transaction hash (Bitcoin uses double SHA-256 hashing). The number of leaves must be even, but not every block contains an even number of transactions. If a block contains an odd number of transactions, the last transaction is duplicated (in the Merkle tree, not in the block!).
Moving from the bottom up, the leaves are grouped in pairs, their hashes are concatenated, and a new hash is computed from the concatenated hashes. The new hashes form new tree nodes. This process is repeated until there is just one node, which is called the root of the tree. The root hash is then used as the unique representation of the block's transactions, is saved in the block header, and is used in the proof-of-work system.
The benefit of Merkle trees is that a node can verify the membership of a certain transaction without downloading the whole block. Only the transaction hash, the Merkle root hash, and a Merkle path are required for this.
Finally, write the code:
type MerkleTree struct {
    RootNode *MerkleNode
}

type MerkleNode struct {
    Left  *MerkleNode
    Right *MerkleNode
    Data  []byte
}
We start with the structs. Every MerkleNode keeps data and links to its left and right branches. MerkleTree is actually the root node linked to the next nodes, which are in turn linked to further nodes, and so on.
Let's start by creating a new node:
func NewMerkleNode(left, right *MerkleNode, data []byte) *MerkleNode {
    mNode := MerkleNode{}

    if left == nil && right == nil {
        hash := sha256.Sum256(data)
        mNode.Data = hash[:]
    } else {
        prevHashes := append(left.Data, right.Data...)
        hash := sha256.Sum256(prevHashes)
        mNode.Data = hash[:]
    }

    mNode.Left = left
    mNode.Right = right

    return &mNode
}
Every node contains some data. When a node is a leaf, the data is passed from the outside (a serialized transaction in our case). When a node is linked to other nodes, it takes their data, concatenates it, and hashes the result.
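func NewMerkleTree(data [][]byte) *MerkleTree {
    var nodes []MerkleNode

    // The number of leaves must be even, so duplicate the last item if needed.
    if len(data)%2 != 0 {
        data = append(data, data[len(data)-1])
    }

    for _, datum := range data {
        node := NewMerkleNode(nil, nil, datum)
        nodes = append(nodes, *node)
    }

    // Build the tree level by level until only the root is left.
    // Looping on len(nodes) avoids running past the root, and padding
    // odd levels avoids an out-of-range access at nodes[j+1].
    for len(nodes) > 1 {
        if len(nodes)%2 != 0 {
            nodes = append(nodes, nodes[len(nodes)-1])
        }

        var newLevel []MerkleNode
        for j := 0; j < len(nodes); j += 2 {
            node := NewMerkleNode(&nodes[j], &nodes[j+1], nil)
            newLevel = append(newLevel, *node)
        }

        nodes = newLevel
    }

    mTree := MerkleTree{&nodes[0]}

    return &mTree
}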
When a new tree is created, the first thing to ensure is that there is an even number of leaves. After that, the data (i.e., an array of serialized transactions) is converted into tree leaves, and the tree is grown from these leaves.
Now, let's modify Block.HashTransactions, which is used in the proof-of-work system to obtain the hash of the transactions:
func (b *Block) HashTransactions() []byte {
    var transactions [][]byte

    for _, tx := range b.Transactions {
        transactions = append(transactions, tx.Serialize())
    }
    mTree := NewMerkleTree(transactions)

    return mTree.RootNode.Data
}
First, the transactions are serialized (using encoding/gob), and then they are used to build a Merkle tree. The root of the tree serves as the unique identifier of the block's transactions.
P2PKH
There's one more thing I'd like to talk about.
As you will recall, Bitcoin has a Script programming language, which is used to lock transaction outputs; transaction inputs provide the data to unlock them. The language is very simple, and code in this language is just a sequence of data and operators. Consider this example:
5 2 OP_ADD 7 OP_EQUAL
5, 2, and 7 are data; OP_ADD and OP_EQUAL are operators. Script code is executed from left to right: every piece of data is pushed onto the stack, and every operator is applied to the top stack elements, with the result pushed back onto the stack. The script's stack is a simple LIFO (last in, first out) memory store: the first element put on the stack is the last to be taken off, and every further element is placed on top of the previous one.
Let's break the execution of the above script into steps:
Step | Stack | Script | Description
1 | empty | 5 2 OP_ADD 7 OP_EQUAL | Initially, the stack is empty
2 | 5 | 2 OP_ADD 7 OP_EQUAL | Take 5 from the script and push it onto the stack
3 | 5 2 | OP_ADD 7 OP_EQUAL | Take 2 from the script and push it onto the stack
4 | 7 | 7 OP_EQUAL | On OP_ADD, take two operands (5 and 2) from the stack, add them, and push the sum onto the stack
5 | 7 7 | OP_EQUAL | Take 7 from the script and push it onto the stack
6 | true | empty | On OP_EQUAL, take two operands from the stack, compare them, and push the result onto the stack; the script is now empty and execution ends
OP_ADD takes two elements from the stack, adds them, and pushes the sum onto the stack. OP_EQUAL takes two elements from the stack and compares them: if they are equal, it pushes true onto the stack; otherwise, it pushes false. The result of a script execution is the value of the top stack element: in our case it is true, which means that the script finished successfully.
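The execution model above fits in a few lines of Go. The run function below is a toy interpreter written for this example only: it understands integer pushes, OP_ADD, and OP_EQUAL, and it represents true and false as 1 and 0. It is not Bitcoin's actual Script engine.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// run executes a tiny subset of Script and returns the top stack element
// (1 stands for true, 0 for false).
func run(script string) (int, error) {
	var stack []int

	// pop removes and returns the top stack element.
	pop := func() (int, error) {
		if len(stack) == 0 {
			return 0, fmt.Errorf("stack underflow")
		}
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v, nil
	}

	for _, token := range strings.Fields(script) {
		switch token {
		case "OP_ADD", "OP_EQUAL":
			b, err := pop()
			if err != nil {
				return 0, err
			}
			a, err := pop()
			if err != nil {
				return 0, err
			}
			if token == "OP_ADD" {
				stack = append(stack, a+b)
			} else if a == b {
				stack = append(stack, 1)
			} else {
				stack = append(stack, 0)
			}
		default:
			// Anything that is not an operator is treated as integer data.
			n, err := strconv.Atoi(token)
			if err != nil {
				return 0, fmt.Errorf("unknown token %q", token)
			}
			stack = append(stack, n)
		}
	}
	return pop()
}

func main() {
	result, err := run("5 2 OP_ADD 7 OP_EQUAL")
	fmt.Println(result, err) // prints "1 <nil>"
}
```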
Now let's look at the script that Bitcoin uses to perform payments:
<signature> <pubKey> OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG
This script is called Pay to Public Key Hash (P2PKH), and it is the most commonly used script in Bitcoin. It literally pays to a public key hash, i.e., it locks coins with a certain public key. This is the heart of Bitcoin payments: there are no accounts and no funds transferred between them; there is just a script that checks that the provided signature and public key are correct.
The script is actually stored in two parts:
The first part, <signature> <pubKey>, is stored in the input's ScriptSig field.
The second part, OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG, is stored in the output's ScriptPubKey field.
Thus, it is the outputs that define the unlocking logic, and it is the inputs that provide the data to unlock the outputs. Let's execute the script:
Step | Stack | Script
1 | empty | <signature> <pubKey> OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG
2 | <signature> | <pubKey> OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG
3 | <signature> <pubKey> | OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG
4 | <signature> <pubKey> <pubKey> | OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG
5 | <signature> <pubKey> <pubKeyHash> | <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG
6 | <signature> <pubKey> <pubKeyHash> <pubKeyHash> | OP_EQUALVERIFY OP_CHECKSIG
7 | <signature> <pubKey> | OP_CHECKSIG
8 | true or false | empty
OP_DUP duplicates the top stack element. OP_HASH160 takes the top stack element, hashes it with SHA-256 followed by RIPEMD160, and pushes the result back onto the stack. OP_EQUALVERIFY compares the top two stack elements and, if they are not equal, interrupts the script. OP_CHECKSIG validates the signature of a transaction by hashing the transaction and using <signature> and <pubKey>. The latter operator is quite complex: it makes a trimmed copy of the transaction, hashes it (because it is the hash of the transaction that was signed), and checks that the signature is correct using the provided <signature> and <pubKey>.
Having such a scripting language allows Bitcoin to also be a smart-contract platform: the language makes other payment schemes possible besides transferring coins to a single key.
Conclusion
That's it for today! We have implemented almost all the key features of a blockchain-based cryptocurrency. We have a blockchain, addresses, mining, and transactions. But there is one more thing that gives life to all these mechanisms and makes Bitcoin a global system: consensus. In the next article, we will start implementing the "decentralized" part of the blockchain. Stay tuned!
Links:
Full source codes
The UTXO Set
Merkle Tree
Script
"Ultraprune" Bitcoin Core commit
UTXO set statistics
Smart contracts and Bitcoin
Why every Bitcoin user should understand "SPV security"
Original link: Building Blockchain in Go. Part 6: Transactions 2