Arrays, Linked Lists and Performance
Patient: Doctor, it hurts when I do this.
Doctor: Then don't do that.
I said on Twitter:

"Reminder: building arrays with reduce, while fun, risks quadratic performance."

Many people found this statement strange, which surprised me. Quite a few suggested changing the reduce version so that it doesn't copy the array (I don't think that's feasible). Others suggested optimizing the + operator so that it doesn't perform the copy (I don't think that's simple either, and we'll see why shortly). There was also the suggestion that unless the documentation mentions it, you don't need to care about such minor issues (whereas I think this is exactly the kind of thing you must pay attention to when writing code -- saying "I'll only pay attention when the documentation tells me there's a problem here" is like saying "I'll only write correct code when the unit tests tell me it's incorrect").
Some of the other feedback concerned the linked-list article I posted earlier: why implement an outdated data structure? We already have arrays -- what's the point of its existence?
So, you know how I sometimes mention that this is not just a blog about Mac and iOS programming? It really isn't! Don't put an enum-based linked list in your app just because I happen to find it interesting. I may well be interested in your performance problems, but you probably aren't. Even so, I think the linked-list example is interesting and worth implementing and playing with: it helps explain the performance of building arrays with reduce, and it is even useful for real code in some (rare) scenarios.

At the same time, I think some of Swift's features -- for example, enums flexible enough to model both the objects and the methods on them, and its "safe by default" behavior -- would make it a very good CS teaching language. A future edition of that book might well use Swift as its implementation language.
So, back to the point: every now and then, you'll find reduce used to build an array (or dictionary or set). For example, here is map implemented via reduce:
```swift
extension SequenceType {
    func mapUsingReduce<T>(transform: Generator.Element -> T) -> [T] {
        return reduce([]) { $0 + [transform($1)] }
    }
}
```
Compare it with creating a mutable array and filling it with a for loop:
```swift
extension SequenceType {
    func mapUsingFor<T>(transform: Generator.Element -> T) -> [T] {
        var result: [T] = []
        for x in self {
            result.append(transform(x))
        }
        return result
    }
}
```
The difference is that + copies the ever-growing array on every call. Copying an array takes linear time -- it has to traverse the whole array -- so as the length of the sequence being mapped grows, the total time consumed grows with its square.
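To make the quadratic cost concrete, here is a small sketch (in current Swift syntax; copiesUsingPlus is a made-up name, not from the post) that counts how many element copies are performed when an array is built one element at a time with +:

```swift
// Hypothetical helper: counts element copies when building an
// n-element array via `a = a + [i]`.
func copiesUsingPlus(_ n: Int) -> Int {
    var copies = 0
    var a: [Int] = []
    for i in 0..<n {
        copies += a.count + 1  // `a + [i]` copies every existing element, plus the new one
        a = a + [i]
    }
    return copies
}

// n elements cost 1 + 2 + ... + n = n(n+1)/2 copies in total: quadratic
// growth, whereas appending in place copies each element only O(1) times
// (amortized).
```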
That said, people don't normally reimplement map itself this way: you see the technique more often in tips for, say, removing duplicates or building a word-frequency dictionary. The essential problem is the same, though.
But what does this have to do with linked lists? Because you can use the linked-list code from the previous article to implement a list-returning, reduce-based version of map, like so:
```swift
extension SequenceType {
    func mapToList<T>(transform: Generator.Element -> T) -> List<T> {
        return reduce(List()) { $0.cons(transform($1)) }.reverse()
    }
}
```
The result is that this version runs at only about half the speed of the array version (because of the reverse step) -- numbers clean enough that your professor might suspect you faked your results rather than actually ran the experiment:
The reason for this result is that the linked list is persistent -- the old and new lists always share nodes -- so no copying is needed. The cost is that the list can only grow from the head (hence the reverse step), and that the list must be fully immutable: even when a list has only a single reference, you must copy nodes rather than modify them. This is a difference from Array, which can detect when its buffer is uniquely referenced and then modify it directly without copying. The linked list has other costs too -- counting the nodes in a list takes about twice as long as counting the elements of an array, because following the indirection while traversing the list takes time.
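For reference, here is a minimal sketch of what such a persistent List might look like. The actual type comes from the earlier linked-list post; this version is written in current Swift syntax with an indirect enum, so the details may differ:

```swift
// A persistent singly-linked list: old and new lists share nodes.
enum List<Element> {
    case end
    indirect case node(Element, List<Element>)

    init() { self = .end }

    // Prepend an element: O(1), and the new list shares all of the
    // old list's nodes.
    func cons(_ x: Element) -> List<Element> {
        return .node(x, self)
    }

    // Reverse by consing onto a fresh list: O(n), still no node copying.
    func reverse() -> List<Element> {
        var result = List<Element>()
        var current = self
        while case let .node(head, tail) = current {
            result = result.cons(head)
            current = tail
        }
        return result
    }
}
```

Because cons only ever adds at the head, reduce builds the list back to front -- which is why the mapToList above needs the final reverse.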
So, can the array's + operator be rescued from its full-copy behavior? Before considering that, let's look at how an array can implement copy-on-write at all. Mike Ash has an excellent article that implements a copy-on-write array from scratch, so we'll do something slightly different here and use the standard library's ManagedBuffer class to build one.
ManagedBuffer
ManagedBuffer is a class you can subclass, designed to simplify memory allocation/deallocation and heap buffer management. It is generic, with two separate placeholders: Value and Element. Element is the type of the block of n elements, allocated dynamically when an instance is created. Value is the type of a single extra variable stored alongside for any other information -- for example, to implement an array you need to store the element count, because the elements must be destroyed before the memory is freed. The elements are accessed through withUnsafeMutablePointerToElements; the value can be accessed through a similar unsafe method, or directly via the value property.
The following code implements a simple buffer that destroys its own elements on deinitialization:
```swift
private class MyArrayBuffer<Element>: ManagedBuffer<Int, Element> {
    deinit {
        self.withUnsafeMutablePointerToElements { elems -> Void in
            elems.destroy(self.value)
        }
    }
}
```
So MyArrayBuffer is still generic over the elements it stores, and it fixes ManagedBuffer's Value to Int, which will hold the number of elements in the buffer (remember, we will allocate more space than we have elements, to avoid frequent reallocations).
When the buffer is deallocated, MyArrayBuffer.deinit runs before ManagedBuffer.deinit, which is what frees the memory. This gives MyArrayBuffer the chance to destroy all its objects first. Destroying the objects matters whenever Element is more than just a passive struct: for example, if the array contained other copy-on-write types, destroying them triggers them to release their own memory.
Now we can create an array struct that uses a private buffer for its storage:
```swift
public struct MyArray<Element> {
    private var _buf: MyArrayBuffer<Element>

    public init() {
        _buf = MyArrayBuffer<Element>.create(8) { _ in 0 } as! MyArrayBuffer<Element>
    }
}
```
We don't call MyArrayBuffer's init directly -- we use ManagedBuffer's create class method. Because create returns the superclass type, we then force-cast the result to the correct type.
Next, we conform MyArray to CollectionType:
```swift
extension MyArray: CollectionType {
    public var startIndex: Int { return 0 }
    public var endIndex: Int { return _buf.value }

    public subscript(idx: Int) -> Element {
        guard idx < self.endIndex else { fatalError("Array index out of range") }
        return _buf.withUnsafeMutablePointerToElements { $0[idx] }
    }
}
```
Next, we give the buffer two very similar methods: one to copy the storage, and one to resize it. The copy is used when shared storage is detected; the resize is used when more memory is needed:
```swift
extension MyArrayBuffer {
    func clone() -> MyArrayBuffer<Element> {
        return self.withUnsafeMutablePointerToElements { oldElems -> MyArrayBuffer<Element> in
            return MyArrayBuffer<Element>.create(self.allocatedElementCount) { newBuf in
                newBuf.withUnsafeMutablePointerToElements { newElems -> Void in
                    newElems.initializeFrom(oldElems, count: self.value)
                }
                return self.value
            } as! MyArrayBuffer<Element>
        }
    }

    func resize(newSize: Int) -> MyArrayBuffer<Element> {
        return self.withUnsafeMutablePointerToElements { oldElems -> MyArrayBuffer<Element> in
            let elementCount = self.value
            return MyArrayBuffer<Element>.create(newSize) { newBuf in
                newBuf.withUnsafeMutablePointerToElements { newElems -> Void in
                    newElems.moveInitializeFrom(oldElems, count: elementCount)
                }
                self.value = 0
                return elementCount
            } as! MyArrayBuffer<Element>
        }
    }
}
```
Creating and populating the buffer in a single shot is a little fiddly: first we need an unsafe pointer to the existing elements, then we call create, whose closure receives a partially-constructed object (the memory is allocated but not initialized); inside that closure, another call to newBuf.withUnsafeMutablePointerToElements is needed to copy the memory from the old buffer to the new one.
The main difference between the two methods is that clone leaves the elements in the old buffer untouched, simply loading fresh copies into the new buffer, while resize moves the elements from the old memory to the new (via UnsafeMutablePointer's moveInitializeFrom method) and then updates the old buffer to tell it that it no longer manages any elements -- otherwise its deinit would try to destroy them.
Finally, we give MyArray its append and extend methods:
```swift
extension MyArray {
    public mutating func append(x: Element) {
        if !isUniquelyReferencedNonObjC(&_buf) {
            _buf = _buf.clone()
        }
        if _buf.allocatedElementCount == count {
            _buf = _buf.resize(count * 2)
        }
        _buf.withUnsafeMutablePointers { (val, elems) -> Void in
            (elems + val.memory++).initialize(x)
        }
    }

    public mutating func extend<S: SequenceType where S.Generator.Element == Element>(seq: S) {
        for x in seq {
            self.append(x)
        }
    }
}
```
This is just sample code. In practice, you would probably factor out the uniqueness check and the resizing into separate methods so they could be reused from subscript assignment and the other mutating methods; I've kept everything inside append for simplicity. If possible, extend should also reserve enough space up front, to avoid copying the buffer twice when it is both shared and too small. None of this matters for the big picture, though.
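For the standard Array, the "reserve up front" idea can be sketched like this (extendReserving is a hypothetical name, written against current Swift; a real MyArray version would need a corresponding reserve operation on its buffer):

```swift
extension Array {
    // Hypothetical variant of extend: reserve space for the sequence's
    // lower-bound element count before appending, reducing reallocations.
    mutating func extendReserving<S: Sequence>(_ seq: S) where S.Element == Element {
        reserveCapacity(count + seq.underestimatedCount)
        for x in seq {
            append(x)
        }
    }
}
```

For a sequence whose length is known (like a range or another array), this performs at most one reallocation instead of several doubling steps.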
Now for the operators. First, +=, an assignment operator that takes an inout left-hand side and extends it with the contents of the right-hand side:
```swift
func +=<Element, S: SequenceType where S.Generator.Element == Element>
    (inout lhs: MyArray<Element>, rhs: S) {
    lhs.extend(rhs)
}
```
And finally, the + operator, which we can implement in terms of +=. It takes two immutable arrays and combines them into a new one. It relies on copy-on-write behavior to create a mutable copy of the left-hand side, then extends that copy with the contents of the right:
```swift
func +<Element, S: SequenceType where S.Generator.Element == Element>
    (lhs: MyArray<Element>, rhs: S) -> MyArray<Element> {
    var result = lhs
    result += rhs
    return result
}
```
In fact, you can shorten this further by marking lhs with the var keyword:
```swift
func +<Element, S: SequenceType where S.Generator.Element == Element>
    (var lhs: MyArray<Element>, rhs: S) -> MyArray<Element> {
    lhs += rhs
    return lhs
}
```
I mention this second version because some people proposed that a better reduce strategy would be to mark the accumulating parameter with var. What var does there is exactly what happens to lhs here: all it does is declare the incoming parameter to be mutable. It is still a copy -- it does not somehow pass in a reference to the original value.
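To convince yourself that this really is a copy, here is a small standalone sketch (in current Swift, where var parameters were later removed from the language, so it uses an explicit local copy; appendOne is a made-up name):

```swift
// A `var` parameter only made the local binding mutable; it was still
// a copy. The modern equivalent is an explicit local copy:
func appendOne(_ xs: [Int]) -> [Int] {
    var copy = xs        // value-type copy of the argument
    copy.append(1)       // mutates only the copy
    return copy
}

let original = [0]
let extended = appendOne(original)
// `original` is unchanged; only `extended` has the extra element.
```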
Can the + operator be optimized?
Now that we have a fully working copy-on-write array prototype, with an append operation and a + operator, we can use it to rewrite both versions of map:
```swift
extension SequenceType {
    func mapUsingMyReduce<T>(transform: Generator.Element -> T) -> MyArray<T> {
        return reduce([]) { $0 + [transform($1)] }
    }

    func mapUsingMyFor<T>(transform: Generator.Element -> T) -> MyArray<T> {
        var result = MyArray<T>()
        for x in self {
            result.append(transform(x))
        }
        return result
    }
}
```
If you chart the performance, you'll find that both versions produce curves similar to those of their standard-array counterparts.
So, here is the question: now that we have an implementation entirely under our own control, can we change + so that it doesn't copy? I don't think we can.
First, let's look at a simpler example. Could we take the following code:
```swift
var a = MyArray<Int>()
a.extend(0..<3)
let b = a + [6,7,8]
```
...and make it not copy? Clearly, we can't. b has to be a new copy of the array, so as not to affect a. Even if we never touch a again after creating b, the implementation of the + operator has no way of knowing that. Maybe the compiler could know, and optimize accordingly, but the + function can't.
Checking for a unique reference doesn't help here either: a still exists, so lhs cannot be the sole owner of the buffer.
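You can watch this check fail directly (shown here with isKnownUniquelyReferenced, the modern spelling of the isUniquelyReferencedNonObjC call used in append above):

```swift
final class Storage {}   // stand-in for a CoW buffer class

var a = Storage()
// One reference only: an in-place mutation would be safe.
assert(isKnownUniquelyReferenced(&a))

let b = a
// Two references now: a copy-on-write type must clone before mutating.
withExtendedLifetime(b) {
    assert(!isKnownUniquelyReferenced(&a))
}
```

This is exactly the situation inside +: the caller's variable is a second owner of the buffer, so the uniqueness check can never succeed there.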
The situation inside reduce is no different. Here is a possible implementation:
```swift
extension SequenceType {
    func myReduce<T>(initial: T, combine: (T, Generator.Element) -> T) -> T {
        var result = initial
        for x in self {
            result = combine(result, x)
        }
        return result
    }
}
```
Given that combine is { $0 + [transform($1)] }, you can see that + has no way of knowing that we are going to assign its result straight into the result variable. Reading the code, we know it would be fine to append the right-hand side onto the left-hand side in place, if that were possible (and technically it is, even though the array is passed as an immutable value: its buffer is a class and therefore has reference semantics, so it could be mutated). But the + operator cannot know that from where it sits. All it can determine is that its copy of the left-hand side is not the sole owner of the buffer. There is another owner: reduce still holds result -- and it will discard it immediately, replacing it with the new result, but only after the + operation has completed.
There is one other faint hope: what if every array were really a slice into a shared buffer? (It isn't -- that's a separate type called ArraySlice, which pays extra overhead to record the slice's start and end within its parent array.) If arrays worked that way, they could be arranged so that one -- and only one -- of the arrays sharing a buffer could append in place, with the appended elements simply ignored by the others. But that would likely add overhead to arrays in general, and arrays are designed above all to be fast -- you certainly wouldn't want to slow them all down for this case.
Maybe there is some very clever way to solve all of these problems, possibly with help from the compiler. Even so, I don't think it would be a good idea. The semantics of the + operator are to create a new array; making it secretly mutate an existing one under very specific circumstances is clearly not the right solution. If you need that behavior, you can always hide the var and += inside a small helper method, so that from the outside it looks as if no mutation happens at all -- while still making your code more efficient.