This article is supported by the Bihu (bihu.com) community content support program.
Solidity provides data types that are common in other programming languages. Besides simple value types like numbers and structs, there are data types that can grow dynamically as data is added. There are three main classes of dynamic types:

mappings: mapping(bytes32 => uint256), mapping(address => string), etc.
arrays: uint256[], byte[], etc.
byte arrays: only two kinds: string and bytes
In the second article of this series, we saw how simple fixed-size types are represented in storage:

basic values: uint256, byte, etc.
fixed-length arrays: uint8[10], byte[32], bytes32
structs combining the above types
Fixed-size storage variables are packed into 32-byte chunks wherever possible and laid out sequentially in storage. (If this sounds unfamiliar, read the second article in this series: A Representation Method for Fixed-Length Data Types.)
In this article, we'll look at how Solidity supports more complex data structures. On the surface, Solidity's arrays and mappings may look familiar, but the way they are implemented gives them entirely different performance characteristics.
We'll start with mappings, the simplest of the three. It turns out that arrays and byte arrays are just mappings with fancier features.

Mappings
Let's store a numeric value in the uint256 => uint256 map:
pragma solidity ^0.4.11;

contract C {
    mapping(uint256 => uint256) items;

    function C() {
        items[0xc0fefe] = 0x42;
    }
}
Compile:

$ solc --bin --asm --optimize C-mapping.sol

The assembly:
tag_2:
  // Doesn't do anything useful; should be optimized away
  0xc0fefe
  0x0
  swap1
  dup2
  mstore
  0x20
  mstore
  // Store 0x42 at the address 0x798...187c
  0x42
  0x79826054ee948a209ff4a6c9064d7398508d2c1909a392f899d301c6d232187c
  sstore
We can think of the EVM storage as a key-value database in which each key is limited to 32 bytes. Instead of using the key 0xc0fefe directly, the compiler uses the hash of the key, 0x798...187c, and stores 0x42 there. The hash function used is keccak256 (SHA-3).
In this example we don't see the keccak256 instruction itself, because the optimizer has precomputed the result and inlined it into the bytecode. We can still see traces of the calculation in the otherwise useless mstore instructions.

Calculating the Address
Let's use some Python to hash 0xc0fefe into 0x798...187c. If you'd like to follow along, you'll need Python 3.6, with pysha3 installed to get the keccak_256 hash function.
Define two helper functions:
import binascii
import sha3

# Convert a number to a 32-byte array.
def bytes32(i):
    return binascii.unhexlify('%064x' % i)

# Calculate the keccak256 hash of a 32-byte array.
def keccak256(x):
    return sha3.keccak_256(x).hexdigest()
Converting a number to 32 bytes:

>>> bytes32(1)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
>>> bytes32(0xc0fefe)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc0\xfe\xfe'
Using the + operator to concatenate two byte arrays:

>>> bytes32(1) + bytes32(2)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02'
Calculating the keccak256 hash of some bytes:

>>> keccak256(bytes(1))
'bc36789e7a1e281436464229828f817d6612f7b477d66591ff96a9e064bcc98a'
Now we can calculate 0x798...187c.

The position of the storage variable items is 0x0 (because it is the first storage variable). To get the address, concatenate the key 0xc0fefe with the position of items:
# key = 0xc0fefe, position = 0
>>> keccak256(bytes32(0xc0fefe) + bytes32(0))
'79826054ee948a209ff4a6c9064d7398508d2c1909a392f899d301c6d232187c'
The formula for calculating the storage address of a key is:

keccak256(bytes32(key) + bytes32(position))

We'll use this formula again later to work out where values are stored.

Two Mappings
Suppose we have a contract with two mappings:
pragma solidity ^0.4.11;

contract C {
    mapping(uint256 => uint256) itemsA;
    mapping(uint256 => uint256) itemsB;

    function C() {
        itemsA[0xAAAA] = 0xAAAA;
        itemsB[0xBBBB] = 0xBBBB;
    }
}
The position of itemsA is 0, and its key is 0xAAAA:

# key = 0xAAAA, position = 0
>>> keccak256(bytes32(0xAAAA) + bytes32(0))
'839613f731613c3a2f728362760f939c8004b5d9066154aab51d6dadf74733f3'
The position of itemsB is 1, and its key is 0xBBBB:

# key = 0xBBBB, position = 1
>>> keccak256(bytes32(0xBBBB) + bytes32(1))
'34cb23340a4263c995af18b23d9f53b67ff379ccaa3a91b75007b010c489d395'
Let's verify these calculations with the compiler:

$ solc --bin --asm --optimize C-mapping-2.sol

The assembly:
tag_2:
  // ... memory operations that should be optimized away omitted
  0xaaaa
  0x839613f731613c3a2f728362760f939c8004b5d9066154aab51d6dadf74733f3
  sstore
  0xbbbb
  0x34cb23340a4263c995af18b23d9f53b67ff379ccaa3a91b75007b010c489d395
  sstore
The addresses match the calculations.

keccak256 in Assembly

The compiler could precompute the address of a key because the values involved are constants. If the key used is a variable, the hashing has to be done in assembly code. Now let's disable the optimizer to see how the hashing looks in assembly.

It turns out it's easy to defeat the optimizer by introducing one level of indirection through a dummy variable i:
pragma solidity ^0.4.11;

contract C {
    mapping(uint256 => uint256) items;

    // This variable causes constant folding to fail.
    uint256 i = 0xc0fefe;

    function C() {
        items[i] = 0x42;
    }
}
The location of the variable items is still 0x0, so we should expect the address to be the same as before.
Compile with the optimization flag on; this time, though, the hash is not precomputed:

$ solc --bin --asm --optimize C-mapping--no-constant-folding.sol

The annotated assembly:
tag_2:
  // Load `i` onto the stack
  sload(0x1)
    [0xc0fefe]
  // Store the key `0xc0fefe` at position 0x0 in memory, to prepare for hashing
  0x0
    [0x0 0xc0fefe]
  swap1
    [0xc0fefe 0x0]
  dup2
    [0x0 0xc0fefe 0x0]
  mstore
    [0x0]
    memory: {
      0x00 => 0xc0fefe
    }
  // Store the position `0x0` at position 0x20 (32) in memory, to prepare for hashing
  0x20
    [0x20 0x0]
  dup2
    [0x0 0x20 0x0]
  swap1
    [0x20 0x0 0x0]
  mstore
    [0x0]
    memory: {
      0x00 => 0xc0fefe
      0x20 => 0x0
    }
  // Hash the next 0x40 (64) bytes in memory, starting from byte 0
  0x40
    [0x40 0x0]
  swap1
    [0x0 0x40]
  keccak256
    [0x798...187c]
  // Store 0x42 at the calculated address
  0x42
    [0x42 0x798...187c]
  swap1
    [0x798...187c 0x42]
  sstore
    store: {
      0x798...187c => 0x42
    }
The mstore instruction writes 32 bytes to memory. Memory is much cheaper: it costs only 3 gas to read or write. The first half of the assembly "concatenates" the key and the position by loading them into adjacent chunks of memory:

 0                    32
[key (32 bytes)][position (32 bytes)]
Then the keccak256 instruction hashes the data in that region of memory. The cost depends on how much data is hashed: 30 gas paid per SHA3 operation, plus 6 gas for each 32-byte word of input.
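That pricing rule is easy to sanity-check. A minimal sketch in Python (sha3_gas is a hypothetical helper named here for illustration; the constants are the per-operation and per-word costs just described):

import math  # not strictly needed; integer arithmetic below

def sha3_gas(num_bytes):
    """Gas cost of one keccak256 (SHA3) operation in the EVM:
    30 gas base, plus 6 gas per 32-byte word of input."""
    words = (num_bytes + 31) // 32  # round up to whole 32-byte words
    return 30 + 6 * words

# Hashing a mapping key: 32-byte key + 32-byte position = 64 bytes.
print(sha3_gas(64))  # 42
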
For a uint256 key, the gas cost is 42: 30 + 6 * 2.

Mapping Large Values
A storage slot can hold only 32 bytes. What happens if we try to store a bigger struct?
pragma solidity ^0.4.11;

contract C {
    mapping(uint256 => Tuple) tuples;

    struct Tuple {
        uint256 a;
        uint256 b;
        uint256 c;
    }

    function C() {
        tuples[0x1].a = 0x1A;
        tuples[0x1].b = 0x1B;
        tuples[0x1].c = 0x1C;
    }
}
Compile this, and you'll see three sstore instructions:
tag_2:
  // ... unoptimized code omitted
  0x1a
  0xada5013122d395ba3c54772283fb069b10426056ef8ca54750cb9bb552a59e7d
  sstore
  0x1b
  0xada5013122d395ba3c54772283fb069b10426056ef8ca54750cb9bb552a59e7e
  sstore
  0x1c
  0xada5013122d395ba3c54772283fb069b10426056ef8ca54750cb9bb552a59e7f
  sstore
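The three slot constants in the assembly are consecutive integers. A quick Python check (the base value is the first slot constant copied from the assembly above):

# Slot of tuples[0x1].a, copied from the compiled assembly.
base = 0xada5013122d395ba3c54772283fb069b10426056ef8ca54750cb9bb552a59e7d

# Struct members live at base + member offset.
slot_a = base + 0  # .a
slot_b = base + 1  # .b
slot_c = base + 2  # .c

print(hex(slot_b)[-3:], hex(slot_c)[-3:])  # e7e e7f
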
Note that the calculated addresses are identical except for the last digit. The members of the Tuple struct are laid out one after the other (...7d, ...7e, ...7f).

Mappings Are Not Packed
Given how mappings are designed, the minimum storage required per item is 32 bytes, even if what you actually store is just 1 byte:
pragma solidity ^0.4.11;

contract C {
    mapping(uint256 => uint8) items;

    function C() {
        items[0xA] = 0xAA;
        items[0xB] = 0xBB;
    }
}
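Each of those two entries fills a whole 32-byte slot, even though a uint8 value needs just one byte. A sketch of what one slot holds (slot_value is a hypothetical helper for illustration):

def slot_value(v):
    """A uint8 stored through a mapping still occupies a full 32-byte
    slot, right-aligned and zero-padded on the left."""
    return v.to_bytes(32, 'big')

print(slot_value(0xAA).hex())  # 62 hex zeros followed by 'aa'
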
And if a value is larger than 32 bytes, the storage needed grows in 32-byte increments.

Dynamic Arrays Are Upgraded Mappings
In a typical language, an array is just a sequence of same-type elements laid out contiguously in memory. Suppose you have an array containing 100 uint8 elements; it takes up 100 bytes of memory. In this scheme, it's cheap to load the whole array into the CPU cache and loop over the elements.
In most languages, arrays are cheaper than maps. In Solidity, though, an array is a more expensive version of a mapping. The elements of an array are laid out sequentially in storage:
0x290d...e563
0x290d...e564
0x290d...e565
0x290d...e566
Keep in mind, however, that each access to one of these storage slots is really a key-value lookup in a database. Accessing an array element is no different from accessing a mapping element.
Think of the uint256[] type. It's essentially the same as mapping(uint256 => uint256), but with extra features that make it look like an array:

length, recording the total number of elements
bounds checking: reads and writes throw when the index is greater than length
more complicated storage-packing behavior than mappings
automatic zeroing of unused storage slots when the array shrinks
special optimization for bytes and string to make short arrays (less than 32 bytes) more storage-efficient

Simple Arrays
Take a look at the array that holds 3 elements:
C-darray.sol
pragma solidity ^0.4.11;

contract C {
    uint256[] chunks;

    function C() {
        chunks.push(0xAA);
        chunks.push(0xBB);
        chunks.push(0xCC);
    }
}
The assembly for array accesses is hard to follow, so let's run the contract in the Remix debugger instead.

At the end of the run, four storage slots are used:
key: 0x0000000000000000000000000000000000000000000000000000000000000000
value: 0x0000000000000000000000000000000000000000000000000000000000000003

key: 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563
value: 0x00000000000000000000000000000000000000000000000000000000000000aa

key: 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e564
value: 0x00000000000000000000000000000000000000000000000000000000000000bb

key: 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e565
value: 0x00000000000000000000000000000000000000000000000000000000000000cc
The position of the chunks variable is 0x0, and that slot stores the length of the array (0x3). Hash the variable's position to find the address where the array data is stored:
# position = 0
>>> keccak256(bytes32(0))
'290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563'
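Given that base address, element i of the array simply lives at base + i. A minimal sketch (element_slot is a hypothetical helper; the base constant is keccak256(bytes32(0)) from above):

# keccak256(bytes32(0)): where the data of `chunks` begins.
base = 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563

def element_slot(base, index):
    """Storage slot of chunks[index]: elements are laid out sequentially."""
    return base + index

for i in range(3):
    print(hex(element_slot(base, i))[-2:])  # 63, 64, 65
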
Each element of the array is laid out sequentially from this address (0x29...63, 0x29...64, 0x29...65).

Dynamic Data Packing
What about the all-important packing behavior? One advantage arrays have over mappings is packing: a uint128[] array with four elements needs just two storage slots (plus one slot for the length).
Think about this:
pragma solidity ^0.4.11;

contract C {
    uint128[] s;

    function C() {
        s.length = 4;
        s[0] = 0xAA;
        s[1] = 0xBB;
        s[2] = 0xCC;
        s[3] = 0xDD;
    }
}
Run this code in Remix, and at the end the storage looks like this:
key: 0x0000000000000000000000000000000000000000000000000000000000000000
value: 0x0000000000000000000000000000000000000000000000000000000000000004

key: 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563
value: 0x000000000000000000000000000000bb000000000000000000000000000000aa

key: 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e564
value: 0x000000000000000000000000000000dd000000000000000000000000000000cc
Only three storage slots are used, as expected. The length is again stored at position 0x0 of the storage variable, and the four elements are packed into two separate slots. The starting address of the array data is the hash of the variable's position:

# position = 0
>>> keccak256(bytes32(0))
'290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563'
The address now increments only once for every two array elements. Looking good!
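The slot and the offset within the slot can be sketched as follows (packed_location is a hypothetical helper; offsets are counted in bytes from the low-order end of the slot, matching the dump above):

# keccak256(bytes32(0)): where the data of `s` begins.
base = 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563

def packed_location(base, index, elem_bytes=16):
    """Slot and byte offset of element `index` of a uint128[] array:
    two 16-byte elements share each 32-byte slot."""
    per_slot = 32 // elem_bytes               # 2 uint128 values per slot
    slot = base + index // per_slot           # which slot
    offset = (index % per_slot) * elem_bytes  # bytes from the low end
    return slot, offset

for i in range(4):
    slot, offset = packed_location(base, i)
    print(hex(slot)[-2:], offset)
# 63 0
# 63 16
# 64 0
# 64 16
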
The assembly itself, however, is not as well optimized. Since only two storage slots are used, we'd hope the optimizer could finish the job with two sstore instructions. Unfortunately, because of the bounds checks (and a few other things), the sstore instructions cannot be optimized away.
Four sstore instructions are used instead:
/* "C-bytes--sstore-optimize-fail.sol":105:116  s[0] = 0xAA */
sstore
/* "C-bytes--sstore-optimize-fail.sol":126:137  s[1] = 0xBB */
sstore
/* "C-bytes--sstore-optimize-fail.sol":147:158  s[2] = 0xCC */
sstore
/* "C-bytes--sstore-optimize-fail.sol":168:179  s[3] = 0xDD */
sstore
Byte Arrays and Strings

bytes and string are special array types, optimized for bytes and characters respectively. If the length of the array is less than 31 bytes, only one storage slot is needed to store the whole thing. Longer byte arrays are represented much like normal arrays.
Look at a shorter byte array:
C-bytes--long.sol
pragma solidity ^0.4.11;

contract C {
    bytes s;

    function C() {
        s.push(0xAA);
        s.push(0xBB);
        s.push(0xCC);
    }
}
Since the array is only 3 bytes long (less than 31 bytes), it occupies just one storage slot. Run it in Remix, and the storage looks like this:
key: 0x0000000000000000000000000000000000000000000000000000000000000000
value: 0xaabbcc0000000000000000000000000000000000000000000000000000000006
The data 0xaabbcc... is stored from left to right; the zeros after it are empty data. The last byte, 0x06, is the encoded length of the array. The formula is length = encoded_length / 2; here the actual length is 6 / 2 = 3.
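Decoding that slot can be sketched in Python (the slot value is copied from the Remix dump above):

slot = bytes.fromhex(
    'aabbcc0000000000000000000000000000000000000000000000000000000006')

# Short byte arrays (< 32 bytes): data is left-aligned, and the lowest
# byte holds length * 2.
encoded_length = slot[-1]
assert encoded_length % 2 == 0  # even: a short array
length = encoded_length // 2    # 6 / 2 = 3
data = slot[:length]

print(length, data.hex())  # 3 aabbcc
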
Strings work exactly the same way as bytes.

Long Byte Arrays
If the data is longer than 31 bytes, a byte array is represented in the same way as a regular array. Let's look at a byte array that is 128 bytes long:
C-bytes--long.sol
pragma solidity ^0.4.11;

contract C {
    bytes s;

    function C() {
        s.length = 128;
        s[31] = 0x1;
        s[63] = 0x2;
        s[95] = 0x3;
        s[127] = 0x4;
    }
}
Run it in Remix, and the storage dump shows the length slot plus four data slots:
key: 0x0000...0000
value: 0x0000...0101

key: 0x290d...e563
value: 0x0000...0001

key: 0x290d...e564
value: 0x0000...0002

key: 0x290d...e565
value: 0x0000...0003

key: 0x290d...e566
value: 0x0000...0004
The slot at 0x0 is no longer used to store data; the entire slot now stores the encoded array length. To get the actual length, use length = (encoded_length - 1) / 2. Here the length is (0x101 - 1) / 2 = 128. The actual bytes are stored at 0x290d...e563 and the slots that follow it.
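The two length formulas can be sketched together in Python (decode_length is a hypothetical helper for illustration):

def decode_length(slot_value):
    """Decode the length field of a Solidity byte array.
    Long arrays store length * 2 + 1 in the whole slot (always odd);
    short arrays store length * 2 in the lowest byte (always even)."""
    if slot_value % 2 == 1:             # odd: a long array
        return (slot_value - 1) // 2
    return (slot_value & 0xff) // 2     # even: a short array

print(decode_length(0x101))  # 128
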
There is quite a lot of assembly code behind byte arrays. Besides the normal bounds checking and array resizing, it also needs to encode and decode the length, and to take care of converting between long and short byte arrays.
Why encode the length? Because the encoding makes it easy to test whether a byte array is long or short. Note that the encoded length of a long array is always odd, while that of a short array is always even. The assembly only needs to look at the last bit: zero means even (a short array), non-zero means odd (a long array).

Summary
Peeking into the inner workings of the Solidity compiler, we see that familiar data structures like mappings and arrays are radically different from conventional programming languages.
To summarize:

Arrays are like mappings, only less efficient.
They require more complicated assembly code than mappings.
For small types (byte, uint8, string) they are more storage-efficient than mappings.
The assembly is not optimized very well: even with packing, there is one sstore per assignment.
The EVM's storage is a key-value database, much like git. If you change anything, the checksum of the root node changes. If two root nodes have the same checksum, the stored data is guaranteed to be the same.
To appreciate how quirky Solidity and the EVM are, imagine that every element of an array is its own file in a git repository. When you change the value of an array element, you are in effect creating a commit. When you iterate over an array, you can't load the whole array at once; you have to look into the repository and find each file separately.
Not only that, but each file is limited to 32 bytes. Because data structures need to be split into 32-byte chunks, the Solidity compiler's logic and optimizations get very complicated, and all of it is done in assembly.
The 32-byte limitation is entirely arbitrary, though. A key-value store could use keys to store values of any length. Perhaps in the future a new EVM instruction will allow arbitrary byte arrays to be stored under a key.
For now, the EVM's storage is a key-value database masquerading as a 32-byte array.
If you're curious what the compiler does when resizing an array, take a look at ArrayUtils::resizeDynamicArray. Normally this kind of data-structure work would live in a language's standard library, but in Solidity it is embedded in the compiler.
Translations of the other parts of this series:

Introduction to the EVM assembly code (Part 1)
A representation method for fixed-length data types (Part 2)
ABI encoding of external method calls (Part 4)
What happens when a new contract is created (Part 5)
Translator: Xu Li
Original: Diving into the Ethereum VM, Part Three
Author: Lilymoana
Link: https://www.jianshu.com/p/af5721c79505
Source: Jianshu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.