Arrays, Linked Lists and Performance
Patient: Doctor, it hurts when I do this.
Doctor: Then don't do that.
I said on Twitter:

"Reminder: building arrays with reduce, while fun, risks quadratic performance."

Many people found this statement strange, which surprised me. Quite a few suggested changing the reduce version so that it doesn't copy the array (I don't think that's feasible). Others suggested optimizing the + operator so that it doesn't perform the copy (I don't think that's simple either, and we'll see why shortly). There was also the suggestion that unless the documentation mentions it, you don't need to care about such minor issues (whereas I think this is exactly the kind of thing you must pay attention to when writing code -- saying "I'll only pay attention when the documentation tells me there's a problem here" is like saying "I'll only write correct code when the unit tests tell me it's incorrect").
Some of the other feedback concerned the linked-list article I posted earlier: why implement an outdated data structure? We already have arrays -- what's the point of its existence?
So, you know how I sometimes mention that this is not just a blog about Mac and iOS programming? It really isn't! Don't put an enum-based linked list in your app just because I happen to find it interesting. I may well be interested in your performance problems, but you probably aren't. Even so, I think the linked-list example is interesting and worth implementing and playing with: it helps explain the performance of building arrays with reduce, and it is even useful for real code in some (rare) scenarios.

At the same time, I think some of Swift's features -- for example, enums flexible enough to model both the objects and the methods on them, and its "safe by default" behavior -- would make it a very good CS teaching language. A future edition of that book might well use Swift as its implementation language.
So, back to the point: every now and then, you'll find reduce used to build an array (or dictionary or set). For example, here is map implemented via reduce:
```swift
extension SequenceType {
    func mapUsingReduce<T>(transform: Generator.Element -> T) -> [T] {
        return reduce([]) { $0 + [transform($1)] }
    }
}
```
Compare it with creating a mutable array and filling it with a for loop:
```swift
extension SequenceType {
    func mapUsingFor<T>(transform: Generator.Element -> T) -> [T] {
        var result: [T] = []
        for x in self {
            result.append(transform(x))
        }
        return result
    }
}
```
The difference is that + copies the ever-growing array on every call. Copying an array takes linear time -- it has to traverse the whole array -- so as the length of the sequence being mapped grows, the total time consumed grows with its square.
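To make the quadratic cost concrete, here is a small sketch (in current Swift syntax; copiesUsingPlus is a made-up name, not from the post) that counts how many element copies are performed when an array is built one element at a time with +:

```swift
// Hypothetical helper: counts element copies when building an
// n-element array via `a = a + [i]`.
func copiesUsingPlus(_ n: Int) -> Int {
    var copies = 0
    var a: [Int] = []
    for i in 0..<n {
        copies += a.count + 1  // `a + [i]` copies every existing element, plus the new one
        a = a + [i]
    }
    return copies
}

// n elements cost 1 + 2 + ... + n = n(n+1)/2 copies in total: quadratic
// growth, whereas appending in place copies each element only O(1) times
// (amortized).
```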
That said, people don't normally reimplement map itself this way: you see the technique more often in tips for, say, removing duplicates or building a word-frequency dictionary. The essential problem is the same, though.
But what does this have to do with linked lists? Because you can use the linked-list code from the previous article to implement a list-returning, reduce-based version of map, like so:
```swift
extension SequenceType {
    func mapToList<T>(transform: Generator.Element -> T) -> List<T> {
        return reduce(List()) { $0.cons(transform($1)) }.reverse()
    }
}
```
The result is that this version runs at only about half the speed of the array version (because of the reverse step) -- numbers clean enough that your professor might suspect you faked your results rather than actually ran the experiment:
The reason for this result is that the linked list is persistent -- the old and new lists always share nodes -- so no copying is needed. The cost is that the list can only grow from the head (hence the reverse step), and that the list must be fully immutable: even when a list has only a single reference, you must copy nodes rather than modify them. This is a difference from Array, which can detect when its buffer is uniquely referenced and then modify it directly without copying. The linked list has other costs too -- counting the nodes in a list takes about twice as long as counting the elements of an array, because following the indirection while traversing the list takes time.
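For reference, here is a minimal sketch of what such a persistent List might look like. The actual type comes from the earlier linked-list post; this version is written in current Swift syntax with an indirect enum, so the details may differ:

```swift
// A persistent singly-linked list: old and new lists share nodes.
enum List<Element> {
    case end
    indirect case node(Element, List<Element>)

    init() { self = .end }

    // Prepend an element: O(1), and the new list shares all of the
    // old list's nodes.
    func cons(_ x: Element) -> List<Element> {
        return .node(x, self)
    }

    // Reverse by consing onto a fresh list: O(n), still no node copying.
    func reverse() -> List<Element> {
        var result = List<Element>()
        var current = self
        while case let .node(head, tail) = current {
            result = result.cons(head)
            current = tail
        }
        return result
    }
}
```

Because cons only ever adds at the head, reduce builds the list back to front -- which is why the mapToList above needs the final reverse.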
So, can the array's + operator be rescued from its full-copy behavior? Before considering that, let's look at how an array can implement copy-on-write at all. Mike Ash has an excellent article that implements a copy-on-write array from scratch, so we'll do something slightly different here and use the standard library's ManagedBuffer class to build one.
ManagedBuffer
ManagedBuffer is a class you can subclass, designed to simplify memory allocation/deallocation and heap buffer management. It is generic, with two separate placeholders: Value and Element. Element is the type of the block of n elements, allocated dynamically when an instance is created. Value is the type of a single extra variable stored alongside for any other information -- for example, to implement an array you need to store the element count, because the elements must be destroyed before the memory is freed. The elements are accessed through withUnsafeMutablePointerToElements; the value can be accessed through a similar unsafe method, or directly via the value property.
The following code implements a simple buffer that destroys its own elements on deinitialization:
```swift
private class MyArrayBuffer<Element>: ManagedBuffer<Int, Element> {
    deinit {
        self.withUnsafeMutablePointerToElements { elems -> Void in
            elems.destroy(self.value)
        }
    }
}
```
So MyArrayBuffer is still generic over the elements it stores, and it fixes ManagedBuffer's Value to Int, which will hold the number of elements in the buffer (remember, we will allocate more space than we have elements, to avoid frequent reallocations).
When the buffer is deallocated, MyArrayBuffer.deinit runs before ManagedBuffer.deinit, which is what frees the memory. This gives MyArrayBuffer the chance to destroy all its objects first. Destroying the objects matters whenever Element is more than just a passive struct: for example, if the array contained other copy-on-write types, destroying them triggers them to release their own memory.
Now we can create an array struct that uses a private buffer for its storage:
```swift
public struct MyArray<Element> {
    private var _buf: MyArrayBuffer<Element>

    public init() {
        _buf = MyArrayBuffer<Element>.create(8) { _ in 0 } as! MyArrayBuffer<Element>
    }
}
```
We don't call MyArrayBuffer's init directly -- we use ManagedBuffer's create class method. Because create returns the superclass type, we then force-cast the result to the correct type.
Next, we conform MyArray to CollectionType:
```swift
extension MyArray: CollectionType {
    public var startIndex: Int { return 0 }
    public var endIndex: Int { return _buf.value }

    public subscript(idx: Int) -> Element {
        guard idx < self.endIndex else { fatalError("Array index out of range") }
        return _buf.withUnsafeMutablePointerToElements { $0[idx] }
    }
}
```
Next, we give the buffer two very similar methods: one to copy the storage, and one to resize it. The copy is used when shared storage is detected; the resize is used when more memory is needed:
```swift
extension MyArrayBuffer {
    func clone() -> MyArrayBuffer<Element> {
        return self.withUnsafeMutablePointerToElements { oldElems -> MyArrayBuffer<Element> in
            return MyArrayBuffer<Element>.create(self.allocatedElementCount) { newBuf in
                newBuf.withUnsafeMutablePointerToElements { newElems -> Void in
                    newElems.initializeFrom(oldElems, count: self.value)
                }
                return self.value
            } as! MyArrayBuffer<Element>
        }
    }

    func resize(newSize: Int) -> MyArrayBuffer<Element> {
        return self.withUnsafeMutablePointerToElements { oldElems -> MyArrayBuffer<Element> in
            let elementCount = self.value
            return MyArrayBuffer<Element>.create(newSize) { newBuf in
                newBuf.withUnsafeMutablePointerToElements { newElems -> Void in
                    newElems.moveInitializeFrom(oldElems, count: elementCount)
                }
                self.value = 0
                return elementCount
            } as! MyArrayBuffer<Element>
        }
    }
}
```
Creating and populating the buffer in a single shot is a little fiddly: first we need an unsafe pointer to the existing elements, then we call create, whose closure receives a partially-constructed object (the memory is allocated but not initialized); inside that closure, another call to newBuf.withUnsafeMutablePointerToElements is needed to copy the memory from the old buffer to the new one.
The main difference between the two methods is that clone leaves the elements in the old buffer untouched, simply loading fresh copies into the new buffer, while resize moves the elements from the old memory to the new (via UnsafeMutablePointer's moveInitializeFrom method) and then updates the old buffer to tell it that it no longer manages any elements -- otherwise its deinit would try to destroy them.
Finally, we give MyArray its append and extend methods:
```swift
extension MyArray {
    public mutating func append(x: Element) {
        if !isUniquelyReferencedNonObjC(&_buf) {
            _buf = _buf.clone()
        }
        if _buf.allocatedElementCount == count {
            _buf = _buf.resize(count * 2)
        }
        _buf.withUnsafeMutablePointers { (val, elems) -> Void in
            (elems + val.memory++).initialize(x)
        }
    }

    public mutating func extend<S: SequenceType where S.Generator.Element == Element>(seq: S) {
        for x in seq {
            self.append(x)
        }
    }
}
```
This is just sample code. In practice, you would probably factor out the uniqueness check and the resizing into separate methods so they could be reused from subscript assignment and the other mutating methods; I've kept everything inside append for simplicity. If possible, extend should also reserve enough space up front, to avoid copying the buffer twice when it is both shared and too small. None of this matters for the big picture, though.
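For the standard Array, the "reserve up front" idea can be sketched like this (extendReserving is a hypothetical name, written against current Swift; a real MyArray version would need a corresponding reserve operation on its buffer):

```swift
extension Array {
    // Hypothetical variant of extend: reserve space for the sequence's
    // lower-bound element count before appending, reducing reallocations.
    mutating func extendReserving<S: Sequence>(_ seq: S) where S.Element == Element {
        reserveCapacity(count + seq.underestimatedCount)
        for x in seq {
            append(x)
        }
    }
}
```

For a sequence whose length is known (like a range or another array), this performs at most one reallocation instead of several doubling steps.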
Now for the operators. First, +=, an assignment operator that takes an inout left-hand side and extends it with the contents of the right-hand side:
```swift
func +=<Element, S: SequenceType where S.Generator.Element == Element>
    (inout lhs: MyArray<Element>, rhs: S) {
    lhs.extend(rhs)
}
```
And finally, the + operator, which we can implement in terms of +=. It takes two immutable arrays and combines them into a new one. It relies on copy-on-write behavior to create a mutable copy of the left-hand side, then extends that copy with the contents of the right:
```swift
func +<Element, S: SequenceType where S.Generator.Element == Element>
    (lhs: MyArray<Element>, rhs: S) -> MyArray<Element> {
    var result = lhs
    result += rhs
    return result
}
```
In fact, you can shorten this further by marking lhs with the var keyword:
```swift
func +<Element, S: SequenceType where S.Generator.Element == Element>
    (var lhs: MyArray<Element>, rhs: S) -> MyArray<Element> {
    lhs += rhs
    return lhs
}
```
I mention this second version because some people proposed that a better reduce strategy would be to mark the accumulating parameter with var. What var does there is exactly what happens to lhs here: all it does is declare the incoming parameter to be mutable. It is still a copy -- it does not somehow pass in a reference to the original value.
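To convince yourself that this really is a copy, here is a small standalone sketch (in current Swift, where var parameters were later removed from the language, so it uses an explicit local copy; appendOne is a made-up name):

```swift
// A `var` parameter only made the local binding mutable; it was still
// a copy. The modern equivalent is an explicit local copy:
func appendOne(_ xs: [Int]) -> [Int] {
    var copy = xs        // value-type copy of the argument
    copy.append(1)       // mutates only the copy
    return copy
}

let original = [0]
let extended = appendOne(original)
// `original` is unchanged; only `extended` has the extra element.
```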
Can the + operator be optimized?
Now that we have a fully working copy-on-write array prototype, with an append operation and a + operator, we can use it to rewrite both versions of map:
```swift
extension SequenceType {
    func mapUsingMyReduce<T>(transform: Generator.Element -> T) -> MyArray<T> {
        return reduce([]) { $0 + [transform($1)] }
    }

    func mapUsingMyFor<T>(transform: Generator.Element -> T) -> MyArray<T> {
        var result = MyArray<T>()
        for x in self {
            result.append(transform(x))
        }
        return result
    }
}
```
If you chart the performance, you'll find that both versions produce curves similar to those of their standard-array counterparts.
So, here is the question: now that we have an implementation entirely under our own control, can we change + so that it doesn't copy? I don't think we can.
First, let's look at a simpler example. Could we take the following code:
```swift
var a = MyArray<Int>()
a.extend(0..<3)
let b = a + [6,7,8]
```
...and make it not copy? Clearly, we can't. b has to be a new copy of the array, so as not to affect a. Even if we never touch a again after creating b, the implementation of the + operator has no way of knowing that. Maybe the compiler could know, and optimize accordingly, but the + function can't.
Checking for a unique reference doesn't help here either: a still exists, so lhs cannot be the sole owner of the buffer.
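You can watch this check fail directly (shown here with isKnownUniquelyReferenced, the modern spelling of the isUniquelyReferencedNonObjC call used in append above):

```swift
final class Storage {}   // stand-in for a CoW buffer class

var a = Storage()
// One reference only: an in-place mutation would be safe.
assert(isKnownUniquelyReferenced(&a))

let b = a
// Two references now: a copy-on-write type must clone before mutating.
withExtendedLifetime(b) {
    assert(!isKnownUniquelyReferenced(&a))
}
```

This is exactly the situation inside +: the caller's variable is a second owner of the buffer, so the uniqueness check can never succeed there.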
The situation inside reduce is no different. Here is a possible implementation:
```swift
extension SequenceType {
    func myReduce<T>(initial: T, combine: (T, Generator.Element) -> T) -> T {
        var result = initial
        for x in self {
            result = combine(result, x)
        }
        return result
    }
}
```
Given that combine is { $0 + [transform($1)] }, you can see that + has no way of knowing that we are going to assign its result straight into the result variable. Reading the code, we know it would be fine to append the right-hand side onto the left-hand side in place, if that were possible (and technically it is, even though the array is passed as an immutable value: its buffer is a class and therefore has reference semantics, so it could be mutated). But the + operator cannot know that from where it sits. All it can determine is that its copy of the left-hand side is not the sole owner of the buffer. There is another owner: reduce still holds result -- and it will discard it immediately, replacing it with the new result, but only after the + operation has completed.
There is one other faint hope: what if every array were really a slice into a shared buffer? (It isn't -- that's a separate type called ArraySlice, which pays extra overhead to record the slice's start and end within its parent array.) If arrays worked that way, they could be arranged so that one -- and only one -- of the arrays sharing a buffer could append in place, with the appended elements simply ignored by the others. But that would likely add overhead to arrays in general, and arrays are designed above all to be fast -- you certainly wouldn't want to slow them all down for this case.
Maybe there is some very clever way to solve all of these problems, possibly with help from the compiler. Even so, I don't think it would be a good idea. The semantics of the + operator are to create a new array; making it secretly mutate an existing one under very specific circumstances is clearly not the right solution. If you need that behavior, you can always hide the var and += inside a small helper method, so that from the outside it looks as if no mutation happens at all -- while still making your code more efficient.