The workshop on blockchain introductory sessions often begins with stories that are easy to understand for point-to-point networks and bank ledgers, and then jump directly to writing smart contracts, which is very abrupt. So imagine yourself walking into the jungle and imagining that Ethereum blockchain is a strange creature you're about to study. Today we will look at the creature and interact with it and then collect all the data about it into a centralized store for your own use.
For the first time setup
First, you need to install web3py. Web3py is a python library for connecting the Ethereum blockchain. What you need to know beforehand is that there is no central management system from which data can be downloaded. An inner-connected node ("peer") that shares resources with each other stores a copy of the validated data (or part thereof). The network implements the Ethereum protocol, which defines the interaction rules between nodes and the smart contracts on the network.
If you want to access information about a transaction, balance, chunk, or any other chunk that is written to the blockchain, the agreement requires you to connect to the node. Nodes are constantly sharing new data with each other and validating the data so that you can identify those data that are not tampered with, and those that are up to date.
You can use two basic types of nodes in the first approach to Ethereum: local or managed. Local nodes can run on your computer, which means you first need to download a client like Geth, which synchronizes the blockchain to your device, consumes storage space and spends a lot of time doing it. For a first-time study, a managed node is a better choice-it's controlled by someone else, but you can easily connect to it and play the blockchain yourself.
Go to Infura and create your own free account to access this type of managed node. When you're done, you can see the mainnet master network (i.e. the Ethereum blockchain) and a bunch of testnets test nets, which basically test your smart contracts so you can make mistakes before deploying expensive code to mainnet and correct them.
For the first time, we first import the Web3 object and establish an HTTP connection.
from web3 import Web3 web3 = Web3(Web3.HTTPProvider("mainnet.infura.io/your-own-personal-number"))
Now you're done! You can use the WEB3 API to browse query data structures.
Querying specific chunk information
#current block number>>> web3.eth.blockNumber5658173#get the content of the most recently mined block>>> web3.eth.getBlock('latest')
This command returns the ATTRIBUTEDICT data structure, which is a dictionary of Key-value key-value pairs, as follows:
Attributedict ({' Difficulty ': 3297284143124448, ' Extradata ': hexbytes (' 0x65746865726d696e652d6177732d61736961312d34 '), ' gaslimit ': 7999992, ' gasused ': 7990111, ' hash ': hexbytes (' 0x8c09ba67123601c08ef5d292acaffd36798ca178b7d6fecd5e1144ce8e3b9e50 '), ' logsbloom ': HexBytes (' 0x348000240b40620836308460180004415000c8ccb260021402420721c22801ca847c625c0a89030482044001523a4d100050100250d100858381260 A186312088006c154010000491216446840888200c1812088c12b06000809a808530014160000812c2ac20008a201c83380314d02242338400c0500c2 a028005010988c44b0608a020400201032e10e16142b931115469824248066100b082224200222140a41a20aa2006224d608210f1a22811d03969423e 8c08058100388c0800402002a000802130c40d289201900c38142a1cc0380a4010f0201040d4022200022018c5801346c168502841906940485ea1d98 64044060a00000a00616004d006090 '), ' Miner ': ' 0xea674fdde714fd979de3edf0f56aa9716b898ec8 ', ' Mixhash ': HexBytes (' 0x84320fd71345778b48e437f3403e9021575520ba23aaac48dd7a352c9ce31f75 '), ' nonce ': hexbytes (' 0x98a0b1e00bfabac6 '), ' Number ': 5658173, ' ParenThash ': hexbytes (' 0x01eda8a47a0151533d1afacf9b9108606d4d89a86e269dddaac9698b6fb12930 '), ' receiptsRoot ': HexBytes ( ' 0xc40f774ad10ad443457c3a5a0db96b539af3007f8d351b198ca7bf2ef196b7e0 '), ' sha3uncles ': HexBytes (' 0x55725ec296c6c64257ed6a88d7d8c66160abe7b672f5d89bbad5487779b1d5fe '), ' size ': 27161, ' stateroot ': HexBytes (' 0x5bfc7a9a87fb9991f2760807d56319154f1dab91d3cfc9530a597b6c5d064aba '), ' timestamp ': 1527002893, ' totalDifficulty ': 4339832462578780814928, ' transactions ': [Hexbytes (' 0x1ce6bca99701c4e8acae986b10e7649d628d70ec62b7a8314bbb13726a312225 '), Hexbytes (' 0x6ba5e657243aea5f95afb40090313d10bb9443db41ed1216fbf7e7e60a16749a '), Loooooots_of_transactions_here], ' Transactionsroot ': hexbytes (' 0x67e1e1f2f4b1d33791a0fba2d5ebf039bd6c331c665cb8020ff41d0e0eade46e '), ' uncles ': [ Hexbytes (' 0x3df1bffa62d73b3847b434e9ea459c10cfdc3e212a2e78ebbf0db58adbef30b5 '), Hexbytes (' 0x74bdcd4b88427854ae18f9c7ada28d46f26411bed09af6b040cbede66fdb1853 ')]})
Not all of these variables will be useful to you immediately, because some variables are very technical, meaning only when you have a deeper understanding of how the blockchain actually works. You can read more about them in the so-called Yellow Book, or temporarily skip them and use easy-to-understand methods.
In short, a chunk that contains header information for a chunk, a list of verified transactions written to it, and an unconfirmed list (the miner's block identifiers, whose chunks are too slow to enter the main blockchain, but still receive an ether reward for their computational effort). Below you can see the meaning of each variable, and I divide it into subcategories.
General data
Block Variable |
meaning |
translation |
Number |
Scalar value equal to the number of ancestor blocks (Genesis block=0) |
The number of scalar values relative to the initiator block, Genesis Block=0 |
Size |
Size of the block in bytes |
The size of the block, in bytes |
Timestamp |
Unix ' s Time () at the This block ' s inception |
Unix time at the start of this block |
Miner |
160-bit address for fees collected from successful mining |
Successful mining collects 160 bits of Ethernet address |
Gaslimit |
Maximum gas expenditure allowed in this block |
Maximum allowable gas consumption for this block |
Gasused |
Total gas used by all transactions in this block |
Total gas used for all trades in this block |
Transactions |
List of transaction hashes included in the Block |
List of trade hashes included in the Block |
Parenthash |
KECCAK256 Hash of the parent block ' s header |
Keccak 256 hash value for parent block Chunk header |
Hash |
Current block ' s hash |
Hash value of the current block |
Extradata |
Extra data in byte array |
Extra data in a byte array |
Mining-related
Block Variable |
meaning |
translation |
Difficulty |
Scalar value corresponding to the difficulty level of the block |
Scalar value corresponding to the difficulty level of the block |
Totaldifficulty |
Integer of the difficulty of the chain until this block |
The total difficulty value of the chain until this chunk |
Nonce |
Hash of the generated proof-of-work; NULL when the its a pending block |
Generate a hash of the work proof; null when the chunk is suspended |
Mixhash |
256-bit hash which is combined with the nonce and used to prove that sufficient amount of computation have been carried out On this block |
A 256-bit hash is used in conjunction with a nonce to prove that sufficient computation has been performed on this block |
Uncle Related
Block Variable |
meaning |
translation |
Uncles |
List of Uncle Hashes |
Uncle Hash Value List |
Sha3uncles |
SHA3 of the uncles data in the block |
SHA3 value of uncles data in block |
Technology-related
Block Variable |
meaning |
translation |
Receiptsroot |
Keccak 256-bit hash of the root node of the tree structure populated with receipts of all transactions in this block |
The 256-bit hash of the root node of the Keccak tree is populated with receipts for all transactions in this block |
Stateroot |
Keccak256 hash of the root node if the state trie after all transactions is executed and finalisations applied |
After all transactions are executed and the application terminates, such as the keccak256 hash value of the trie root node |
Transactionsroot |
Keccak256 hash of the root node of the trie structure populated with the receipts of all transaction in the transactions List |
The keccak256 hash of the root node of the trie structure fills in the receipts for each transaction in the trade list |
Logsbloom |
The Bloom filter from indexable info (logger address and log topics) contained in each log entry from the receipt of each Transaction in the transaction list |
Bloom Filter for indexed information (logger address and log subject) contained in the receive log entry for each transaction in the transaction list |
Transactions and receipts
Now, we can also find a single transaction in a chunk by its unique identifier, the transaction hash.
>>> web3.eth.getTransaction('0x1ce6bca99701c4e8acae986b10e7649d628d70ec62b7a8314bbb13726a312225')AttributeDict({'blockHash': HexBytes('0x8c09ba67123601c08ef5d292acaffd36798ca178b7d6fecd5e1144ce8e3b9e50'), 'blockNumber': 5658173, 'from': '0x390dE26d772D2e2005C6d1d24afC902bae37a4bB', 'gas': 45000, 'gasPrice': 123400000000, 'hash': HexBytes('0x1ce6bca99701c4e8acae986b10e7649d628d70ec62b7a8314bbb13726a312225'), 'input': '0x', 'nonce': 415710, 'r': HexBytes('0x1bb901ad0a3add517504cc459fdb1545d193020ec5c63a566e440ee39dbfe131'), 's': HexBytes('0x4b7ac95eb321b5947948ecb624e1d80b19d9cc876668c69cc2b32670f52b061a'), 'to': '0xBbA2D99C9B3aF394B0d6417b1D58815eE495029D', 'transactionIndex': 0, 'v': 37, 'value': 1000000000000000000})
As before, Web3py returns an attribute dictionary. The following table summarizes the meaning of each key.
Transaction Variable |
meaning |
translation |
Blockhash |
Hash of the block the transaction belongs to |
The hash value of the stock Exchange block |
Blocknumber |
Number of that block |
The number of the block |
Hash |
Transaction hash (unique identifier) |
Transaction address hash (unique identifier) |
From |
160-bit address of a sender of a transaction |
160-bit address hash from the sender of the transaction |
To |
Address of the recipient or null for a contract creation transaction |
The address of the recipient or null when creating a contract transaction |
Value |
Number of Wei to BES transfered to the recipient or newly created account (case of contract creation) |
Number of Wei to transfer to recipient or newly created account (in case of contract creation) |
Ea1 |
Gas consumed by the transaction |
Trade consumption of natural gas |
Gasprice |
Number of Wei to being paid per unit of gas for all computatioon costs of this transaction |
This exchange has a calculated cost per unit of natural gas payment quantity |
Nonce |
Number of Transactions/contract creations sent by the sender prior to this one |
The number of transactions and creation contracts sent by the sender before this |
v/r/s |
Used to identify the sender; The signature values of the transaction |
Signature value used to identify the sender's transaction |
Input |
The data sent along with the transaction |
Data sent along with the transaction |
Transactionindex |
Index of the transaction in the block |
Index of transactions in a chunk |
Finally, we can also view transaction receipts:
>>> Web3.eth.getTransactionReceipt (' 0x68c70c5ffe54a42ebf7118e7e931aeac018cee4656a816ffe6a01388da50c851 ') attributedict ({' BlockHash ': HexBytes (' 0x44338e1f80302037c7213e8f56dd35d8a473b000319bc200f76e910e62d12f98 '), ' blocknumber ': 5617795, ' contractAddress ': None, ' cumulativegasused ': 21004, ' from ': ' 0xea6e3e41ebaa09d550d3c3f0d72971b3c5ccc261 ', ' gasused ': 21004, ' logs ': [], ' Logsbloom ': hexbytes (' 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000 '), ' status ': 1, ' to ': ' 0XD96A6E75D099CE529BBD257CBCB403224CCEAEBF ', ' transactionHash ': hexbytes (' 0x68c70c5ffe54a42ebf7118e7e931aeac018cee4656a816ffe6a01388da50c851 '), ' Transactionindex ': 0})
The transaction receipt contains some duplicates and new entries, and the new entries are explained below.
Receipt Variable |
meaning |
translation |
Status |
Boolean whether the transaction was successfull; False if the EVM (Ethereum Virtual machine) reverted the transaction |
Whether the transaction was successful and returns False if the EVM (Ethereum virtual machine) restored the transaction |
Contractaddress |
The contract address created if the transaction was a contract creation; otherwise null |
The contract address that is created if the transaction is a contract creation; otherwise null |
Gasused |
The total amount of gas used if this transaction is executed in the block |
The total amount of gas used when executing this transaction in a chunk |
Cumulativegasused |
The sum of gasused by this transaction and all preceding transactions in the same block |
The sum of all previous transactions in the Gasuse and the same block used by this transaction |
Logs |
Array of log objects which the transaction has generated |
Array of log objects generated by the transaction |
As a reference, in addition to the Yellow Book, I also include a variety of additional resources to compile these forms 2,3,4,5.
As you can see, just a few simple commands to connect to the network and get basic information about the transaction, block, or state of the original format. This will open a new window for the data!
Database management System
When you plan to write data to the appropriate database, you may realize that there are many management system solutions for Python enthusiasts, such as server-free SQLite, or servers-based mysql,postgresql or Hadoop. Depending on your intentions, you must decide which option is best for your project. In general, I find these points helpful:
- What is the expected size of the database (i.e. it can be processed on a single machine system)?
- Are these entries often edited or will they remain the same?
- Should the database be accessed and edited concurrently by multiple parties/applications?
Over time, the Ethereum blockchain is growing steadily, approaching 1TB by June 2018, a small one, so it is not the best choice for distributed processing systems like Hadoop. The blockchain database is written once and then extended only with new entries, leaving the old entries intact. The expected use case for this database is written by one channel and accessed by other channels as read-only, so we don't actually need to run it on the server. Saving the database locally on your machine will result in fast reads, which is desirable and achievable for a server-free management system like SQLite. Python has a built-in library sqlite3
, so we don't even need to install a new package.
Database design
The next step is to design the database. Keep in mind which data fields are most relevant to your analysis and are designed to optimize search and storage. For example, if you don't plan to use stateRoot
it, you might want to skip it entirely or save it in a separate table. You can search for tables with fewer columns faster, and if you later realize that you actually have stateRoot
a use case, you can still access it. You may also want to separate the block information from the transaction information, and if you do not, the chunk attribute timestamp
will waste a lot of space if it repeats n times for all trades in the chunk. JOIN
you can easily match a trade with its block properties later using the action.
I designed a database that contains 3 tables:
- Quick: The most relevant transaction information for quick access and analysis.
- TX: All remaining trading information,
- Block: Specifies the information for the block.
The naming convention of a variable changes slightly relative to the original web3py to disambiguate, such as the block hash and the trade hash are called "hashes," or "from" or "to" as the column name, which has different meanings in SQL, which causes the program to crash.
Trade values, balances, and other large numbers need to be stored as strings in the database. The reason is that SQLite can handle up to 8 bytes of stored signed integers, with a maximum value of 2 for 63 parties-1 is about 9223372036854775807. This is usually much lower than the transaction value in Wei (for example, 1ETH = 10*18 wei).
Build your Mini-database
The complete code can be found on GitHub. It organizes the blockchain information based on the upper-level architecture and outputs the Blockchain.db file containing the pre-specified number of block data. To test it, go to the database.py
file and select a reasonable number for the number of blocks to write, for example:
Nblocks = 10000
By default, you should point the Web3 object to the Infura endpoint. If you have an IPC provider (that is, your local node), you can also switch to the IPC provider by simply uncomment the line:
# or connection via node on the VM #web3 = Web3(Web3.IPCProvider('/path-to-geth.ipc/'))
Modify the path, and then just run it on the command line python database.py
. The code dumps the number of the last-written block into a lastblock.txt
file in case you need to restart it.
How to use the database
Once the first entry is written to the database, you can begin to communicate with it through the Ipython shell. For example, to print the first 5 rows of the table "Quick", you can run the following code.
import sqlite3 as sq3conn = sq3.connect("blockchain.db")cur = conn.cursor()# some SQL code, e.g. select first five entries of the table Quickcur.execute("SELECT * FROM Quick LIMIT 5")a = cur.fetchall() #list of tuples containing all elements of the rowprint(a)conn.close()
Local node and Infura
If you are building a large database, you should download the Geth and synchronize the nodes. Synchronization can be done in 3 basic modes:
If you don't need a past account state, you can synchronize node 6 in quick mode.
The following diagram shows the speed at which this code is written to the database, communicating with the locally fully synchronized node (IPC) and the address on Infura (Infura). As you can see, it is worthwhile to run this code on the local node because you will be able to increase the speed by nearly 2 orders of magnitude (i.e. 100x)!
Summarize
Now that you have your own local database and understand what's happening on the blockchain, you can start exploring it. For example, you can calculate the number of trades since its origin, and see the number of addresses generated as a function of time-the sky is the limit you can learn about blockchain. We have laid the groundwork for your data science playground. So, keep exploring, or review the next article to learn about potential apps.
Amway My Python uses web3.py to develop an Ethereum Blockchain Application course:/http