In the previous articles, we introduced the basic principles behind persistent data structures and walked through an implementation of the Vector Trie in Golang. This article covers the last step in implementing the persistent List: Transient and persistence.
This article is part of a series. If you have not read the other articles, please refer to:
- Introduction to persistent data structures
- The realization of Vector Trie
- Transient and persistence (this article)
In the previous article, we saw how to implement a Vector Trie and how to use it to build a persistent List with structural sharing: on every modification, we copy all nodes on the path from the root to the modified node and use the resulting new root node to construct a new head for the List. The new data is then reachable through the new head, while the old data remains reachable through the old head.
To support this, we need to change the interface of the List: every operation that modifies the List returns a new List object instead of modifying the original one. For example:
```go
type List interface {
	Get(n int) (interface{}, bool)
	Set(n int, value interface{}) (List, bool)
	PushBack(value interface{}) List
	RemoveBack() (List, interface{})
	Len() int
}
```
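To make the semantics of this interface concrete, here is a minimal sketch that satisfies it with a naive copy-everything slice. The `sliceList` type is a hypothetical stand-in introduced only for illustration; it is not the Vector Trie from this series and copies the whole slice on every change.

```go
package main

import "fmt"

// List mirrors the interface shown above.
type List interface {
	Get(n int) (interface{}, bool)
	Set(n int, value interface{}) (List, bool)
	PushBack(value interface{}) List
	RemoveBack() (List, interface{})
	Len() int
}

// sliceList is a hypothetical stand-in: every modification copies the
// whole slice, so older versions are never touched.
type sliceList []interface{}

func (l sliceList) Len() int { return len(l) }

func (l sliceList) Get(n int) (interface{}, bool) {
	if n < 0 || n >= len(l) {
		return nil, false
	}
	return l[n], true
}

func (l sliceList) Set(n int, value interface{}) (List, bool) {
	if n < 0 || n >= len(l) {
		return l, false
	}
	next := append(sliceList(nil), l...) // full copy
	next[n] = value
	return next, true
}

func (l sliceList) PushBack(value interface{}) List {
	next := make(sliceList, len(l), len(l)+1)
	copy(next, l)
	return append(next, value)
}

func (l sliceList) RemoveBack() (List, interface{}) {
	if len(l) == 0 {
		return l, nil
	}
	return append(sliceList(nil), l[:len(l)-1]...), l[len(l)-1]
}

func main() {
	var v1 List = sliceList(nil)
	v1 = v1.PushBack("a").PushBack("b")
	v2, _ := v1.Set(0, "z") // v1 is left untouched

	old, _ := v1.Get(0)
	cur, _ := v2.Get(0)
	fmt.Println(old, cur, v1.Len(), v2.Len()) // a z 2 2
}
```

The caller always continues with the returned value; the receiver keeps describing the old version.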
To implement such an interface, we have two options:
- Modify the `Set`, `PushBack` and `RemoveBack` functions directly so that they return a new `listHead`.
- Implement `TransientList` first. `Set`, `PushBack` and `RemoveBack` then convert the list to a `TransientList`, modify it, and finally return the result of persisting the Transient.
Because the earlier code was designed with this in mind, the two options differ little in implementation effort. However, since a Transient supports executing a series of operations on the data structure efficiently, we choose the second option. It also makes the code more concise and more reusable.
So what is a Transient? Let's introduce it here.
The principle of Transient
As described before, a persistent data structure is implemented by copying the nodes along one path. In our design each node is 32 entries wide, so if we modify several adjacent elements in a row, the same leaf node is copied once per modification even though all the elements live on it. This is very inefficient. To make a series of changes efficient, one solution is to let a persistent data structure temporarily become non-persistent and change it back once the series of modifications is done. Each modification is then performed in place, which greatly improves performance. This temporarily non-persistent data structure is what we call a Transient.
Using a Transient does carry some risks. First, as a mutable data structure it is generally implemented as a non-thread-safe type, so manipulating it concurrently may lead to race conditions. In addition, if the user keeps a reference to the Transient and modifies it again after it has been converted to a persistent object, the resulting persistent object changes as well. In other words, introducing Transient may invalidate persistence.
Although Transient poses some risks, the performance gains make it worthwhile. The implementation of Transient has two key points:
- Assign each Transient a globally unique `id`, and guarantee that every node the Transient modifies is stamped with that `id`.
- Each time the Transient needs to modify a node, it first checks whether the target node carries the same `id` as itself. If it does, the node has already been modified or copied by this Transient, so it can be modified in place. Otherwise the node may have been produced by an earlier Transient, and to avoid changing the original data we must copy the node first.
These two rules ensure that we can safely modify the Transient without altering the original data. The key is that by stamping the nodes of the Vector Trie with an `id`, a Transient can determine whether a node's memory was allocated by itself. A node with a different `id` was produced by an earlier Transient in the List's modification history, and that Transient may already have been converted to a persistent object, so the node must not be modified directly. If the `id` matches, the current Transient created or copied this node itself and may modify it again. This relies on the assumption that a Transient is no longer modified by the user once it has been persisted, which is why the immutability of the resulting persistent object is destroyed if the Transient is still modified after persistence.
The figure compares how a persistent List and a Transient behave when performing modifications.
The upper part of the figure shows the case without Transient; the three colors on the right represent three successive modifications. Here every modification produces a new root node and a new leaf node, which is obviously inefficient.
The lower part of the figure shows the case with Transient. Each node carries an ID (the purple mark in the image). When the first modification is made, we assign a new ID to the Transient, examine the nodes that need to be updated, and find that their IDs all differ from the current ID, so they must be copied. In the next two modifications the Transient still has the same ID, the target nodes are found to carry that ID, and so no copy is needed: the nodes are updated in place.
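The difference described above can be counted in a few lines. The sketch below is an illustration under simplifying assumptions: it uses a single flat `node` instead of a full trie, a node width of 4 instead of 32, and hypothetical helper names (`set`, `countCopies`). Giving every edit a fresh ID models a sequence of persistent modifications, while reusing one ID models a single Transient.

```go
package main

import "fmt"

const nodeSize = 4 // the article uses 32; 4 keeps the sketch small

// node is a hypothetical single-level stand-in for a trie leaf.
type node struct {
	id       uint64
	children []int
}

// set writes value into slot n on behalf of the owner id. A node stamped
// with the same id is updated in place; any other node is copied first.
// The second result reports whether a copy was made.
func set(nd *node, id uint64, n, value int) (*node, bool) {
	copied := false
	if nd.id != id {
		children := make([]int, nodeSize)
		copy(children, nd.children)
		nd = &node{id: id, children: children}
		copied = true
	}
	nd.children[n] = value
	return nd, copied
}

// countCopies applies one edit per entry of ids to a fresh leaf, running
// edit i under owner ids[i], and returns how many node copies were made.
func countCopies(ids []uint64) int {
	leaf := &node{id: 0, children: make([]int, nodeSize)}
	copies := 0
	for i, id := range ids {
		var copied bool
		leaf, copied = set(leaf, id, i%nodeSize, 10+i)
		if copied {
			copies++
		}
	}
	return copies
}

func main() {
	fmt.Println(countCopies([]uint64{1, 2, 3})) // 3: a fresh ID per edit copies every time
	fmt.Println(countCopies([]uint64{7, 7, 7})) // 1: one Transient ID copies only once
}
```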
That is the basic principle of Transient. It shows that a Transient and our persistent List can share the same underlying data structure; the only difference is that a Transient has an ID and a List does not. To distinguish the two cases in practice, we assign a special ID to the head of every List, for example `0`. Conversion between List and Transient then works as follows:
- To convert a List to a Transient, we use an ID generator to produce a unique ID that differs from the List ID (for example a positive integer) and assign it to the resulting Transient head.
- To convert a Transient to a List, we reset the ID of the Transient head to the List ID (for example `0`).
In our case, thanks to Golang's particular approach to object orientation, we can define the List's internal type directly from the Transient's internal type, so that the two share the same underlying structure.
Implementing Transient
Implementation of the unique ID generator
For Transient, generating a unique ID for each conversion is an important issue. Many unique-ID algorithms exist with different properties: some only work on a single machine, some guarantee uniqueness in a distributed setting, some are expensive and some are very lightweight. Here we choose the simplest approach: a monotonically increasing positive `uint64`.
With a single counter that only ever increases, every generated ID is unique within the current process. The increment is performed with the atomic operation `AddUint64` from the `sync/atomic` package, which is fast and thread safe.
The following internal package `counter` implements this functionality:
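```go
package counter

import "sync/atomic"

// Last ID handed out; the variable name is illustrative.
var counter uint64 = 0

// Next returns a new process-wide unique ID, starting at 1.
func Next() uint64 {
	return atomic.AddUint64(&counter, 1)
}
```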
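As a quick check of the thread-safety claim, the sketch below calls an equivalent generator from several goroutines and counts the distinct IDs. The names `last`, `next` and `distinctIDs` are hypothetical and exist only for this example; the only library calls used are `atomic.AddUint64` and the standard `sync` primitives.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var last uint64 // hypothetical name for the package-level counter

// next returns a process-wide unique, strictly positive ID.
func next() uint64 {
	return atomic.AddUint64(&last, 1)
}

// distinctIDs calls next perWorker times from each of workers goroutines
// and returns how many distinct IDs were observed.
func distinctIDs(workers, perWorker int) int {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen = make(map[uint64]struct{})
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				id := next()
				mu.Lock() // the map needs a lock; the counter does not
				seen[id] = struct{}{}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return len(seen)
}

func main() {
	fmt.Println(distinctIDs(8, 1000)) // 8000: no ID is handed out twice
}
```

Because the first value returned is 1, the value 0 is never produced and stays free to mark persistent lists.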
Update of the List interface
As mentioned earlier, the List can be defined in terms of the Transient. In this step we first rename the List's internal data structure from the previous post to `tListHead`, marking it as the head of a Transient list, and move the previously implemented methods over to it. In addition, we add an `id` field to both the new `tListHead` and the internal Trie nodes:
```go
// Transient List Head
type tListHead struct {
	id     uint64
	len    int
	level  int
	offset int
	root   *trieNode
	tail   *trieNode
}

// Trie Node
type trieNode struct {
	id       uint64
	children []interface{}
}
```
Then we redefine the interface and implementation of the List:
```go
type List interface {
	Get(n int) (interface{}, bool)
	Set(n int, value interface{}) (List, bool)
	PushBack(value interface{}) List
	RemoveBack() (List, interface{})
	TransientList() TransientList
	Len() int
}

type listHead tListHead
```
The new interface no longer modifies the list in place; every operation returns a new List object instead. We have also added a method that converts the current list to a Transient. Note that `listHead` is defined directly from `tListHead`, so the two share the same underlying type and can be converted to each other in Go. Next we define a package-level variable `empty` to represent the empty list. All empty lists are identical and a persistent List never changes, so `New` does not need to create a fresh empty object on every call. This also saves memory when creating a new List.
```go
var empty = &listHead{0, 0, 0, 0, nil, nil}

func New() List {
	return empty
}
```
Because `Get` does not change any element of the List, we simply convert the `listHead` to a `tListHead` with a type conversion and call the corresponding method of the latter to obtain the result:
```go
func (head *listHead) Get(n int) (interface{}, bool) {
	return (*tListHead)(head).Get(n)
}
```
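The conversion used in `Get` relies on a general Go rule: a type defined from another struct type shares its underlying type but not its methods, and pointers to the two types can be converted into each other without copying. The sketch below illustrates this with hypothetical stand-in types `mutableHead` and `frozenHead`; they are not the article's actual types.

```go
package main

import "fmt"

// mutableHead and frozenHead are hypothetical stand-ins for tListHead
// and listHead.
type mutableHead struct {
	id  uint64
	len int
}

// frozenHead has the same underlying struct type as mutableHead, but it
// does not inherit mutableHead's methods.
type frozenHead mutableHead

func (h *mutableHead) Len() int { return h.len }

// Len on the frozen type forwards to the mutable implementation through
// a pointer conversion, which does not copy the struct.
func (h *frozenHead) Len() int { return (*mutableHead)(h).Len() }

func main() {
	f := &frozenHead{id: 0, len: 3}
	m := (*mutableHead)(f) // same memory, different method set

	m.len = 4
	fmt.Println(f.Len(), m.Len()) // 4 4
}
```

Since both pointers refer to the same memory, a write through one is visible through the other; this is what makes the conversion cheap, and also why a converted head must be treated with care.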
For the modifying operations, we first convert the list to a Transient, perform the modification on it, and then persist the result. This yields a new List.
```go
func (head *listHead) Set(n int, value interface{}) (List, bool) {
	t := head.TransientList()
	if t.Set(n, value) {
		return t.Persist(), true
	}
	// Nothing was changed, so the original list is returned as is.
	return head, false
}

func (head *listHead) PushBack(value interface{}) List {
	t := head.TransientList()
	t.PushBack(value)
	return t.Persist()
}

func (head *listHead) RemoveBack() (List, interface{}) {
	if head.len == 1 {
		// Removing the last element always yields the shared empty list.
		value, _ := head.Get(0)
		return empty, value
	}
	t := head.TransientList()
	value := t.RemoveBack()
	return t.Persist(), value
}
```
The implementation of the `TransientList` method is given below. Because the immutability of the List must be preserved, converting to a Transient has to build a new `tListHead`, so that changes to the Transient do not affect the original list. The method calls the `counter` package implemented earlier to obtain a unique ID.
```go
func (head *listHead) TransientList() TransientList {
	id := counter.Next()
	return &tListHead{id, head.len, head.level, head.offset, head.root, head.tail}
}
```
Implementing the Transient modification operations
The next step is to update the Transient's modification operations so that they affect neither other Transients nor previously persisted Lists. The earlier List implementation already took part of this problem into account: most operations are executed recursively, and the result of each recursive operation on the Trie is assigned back into the node it came from. On this basis we first add two methods to `trieNode`: `clone` and `setChild`.
The `clone` method copies the contents of the current node and returns a new node. It takes an `id` parameter, and the `id` of the newly copied node is set to this parameter.
```go
func (node *trieNode) clone(id uint64) *trieNode {
	children := make([]interface{}, NODE_SIZE)
	copy(children, node.children)
	return &trieNode{id, children}
}
```
`setChild` is a convenience function for modifying a child of a node; its first parameter is also an `id`. If the `id` passed in equals the node's own `id`, the method modifies the original node directly and returns it. Otherwise it calls `clone` and performs the modification on the new node.
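```go
func (node *trieNode) setChild(id uint64, n int, child interface{}) *trieNode {
	if node.id == id {
		// The node already belongs to this Transient: modify in place.
		node.children[n] = child
		return node
	}
	// The node belongs to someone else: copy it first.
	newNode := node.clone(id)
	newNode.children[n] = child
	return newNode
}
```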
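To see how `clone` and `setChild` combine along a path, the following sketch applies them to a fixed two-level trie. It is an illustration under stated assumptions: the node width is 4 instead of 32 (constant `nodeSize`), and `newTrie`, `setAt` and `get` are hypothetical helpers written for this example, not part of the original code.

```go
package main

import "fmt"

const nodeSize = 4 // assumed small width; the article uses 32

type trieNode struct {
	id       uint64
	children []interface{}
}

func (node *trieNode) clone(id uint64) *trieNode {
	children := make([]interface{}, nodeSize)
	copy(children, node.children)
	return &trieNode{id, children}
}

func (node *trieNode) setChild(id uint64, n int, child interface{}) *trieNode {
	if node.id == id {
		node.children[n] = child
		return node
	}
	newNode := node.clone(id)
	newNode.children[n] = child
	return newNode
}

// newTrie builds a two-level trie owned by ID 0 that stores i at index i.
func newTrie() *trieNode {
	root := &trieNode{0, make([]interface{}, nodeSize)}
	for i := range root.children {
		leaf := &trieNode{0, make([]interface{}, nodeSize)}
		for j := range leaf.children {
			leaf.children[j] = i*nodeSize + j
		}
		root.children[i] = leaf
	}
	return root
}

// setAt rewrites index i on behalf of owner id: first the leaf, then the
// root, each through setChild.
func setAt(root *trieNode, id uint64, i int, value interface{}) *trieNode {
	leaf := root.children[i/nodeSize].(*trieNode)
	newLeaf := leaf.setChild(id, i%nodeSize, value)
	return root.setChild(id, i/nodeSize, newLeaf)
}

func get(root *trieNode, i int) interface{} {
	return root.children[i/nodeSize].(*trieNode).children[i%nodeSize]
}

func main() {
	old := newTrie()
	r1 := setAt(old, 7, 5, "x") // first edit under ID 7: copies root and one leaf
	r2 := setAt(r1, 7, 6, "y")  // same ID: both nodes are updated in place

	fmt.Println(get(old, 5), get(old, 6))          // 5 6
	fmt.Println(get(r2, 5), get(r2, 6))            // x y
	fmt.Println(r1 == r2, r1 != old)               // true true
	fmt.Println(old.children[0] == r2.children[0]) // true: untouched leaves are shared
}
```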
Each internal function behind the List's modification operations also gains `id` as a parameter. In addition, if the List holds the same value before and after a `Set`, we would like the effect to be that the object is not modified at all, and some careful work went into this step to ensure that as far as possible. The code is not listed here; for the complete code, please refer to this file.
Below is the function that converts a Transient to a persistent List. Because we expect the user not to modify the original Transient after persisting it (although the code cannot guarantee this), we can simply use a type conversion to turn the `tListHead` into a `listHead`.
```go
func (head *tListHead) Persist() List {
	persistHead := (*listHead)(head)
	persistHead.id = 0
	return persistHead
}
```
Summary
This article introduced the principle behind Transient and used it to complete the persistent List. Transient is a data structure introduced to make a persistent List efficient under a series of consecutive modifications, and introducing it also simplifies the implementation of the persistent List. However, if the user uses a Transient incorrectly, the persistence of the List can be compromised. With Transient in place, each modification of the persistent List is implemented by first converting it to a Transient, modifying that, and finally persisting the Transient.
We have now implemented a fairly complete persistent List. The persistent List is one of the easiest persistent data structures to implement, but studying its implementation reveals some of the main ideas behind persistent data structures in general. With this article, the functional-go blog series comes to a temporary close. The next series will explore the persistent implementation of another very important data structure, the Map (Hash Array Mapped Trie).
The code from this article is open source on GitHub. As planned, I will keep adding further persistent data structure implementations to that repository as the blog is updated. Readers are welcome to comment on the code, report bugs, or contribute code.