Algorithm Lesson Note Series (vii) -- Amortized Analysis


This week's topic is amortized analysis, another way to analyze the complexity of algorithms. The motivating observation is that, in a sequence of operations, most operations may be very cheap while a few are expensive, so a standard worst-case analysis of a single operation can be overly pessimistic. The basic idea is that when the expensive operations are rare, their cost can be spread across all the operations in the sequence. If the cost averaged this way is still cheap, we obtain a tighter bound on the whole sequence of operations. In essence, amortized analysis is a strategy for getting a tighter worst-case bound on a sequence of operations.

The difference between amortized analysis and average-case analysis is this: average-case analysis averages over all inputs; for example, insertion sort performs well on average over all possible inputs even though it behaves poorly on some of them. Amortized analysis averages over a sequence of operations; for example, TABLE-INSERT performs well on average over all operations in a sequence, although some individual operations are expensive. No probability is involved in amortized analysis: the average performance of each operation is guaranteed in the worst case.

There are three common methods of amortized analysis:

1. Aggregate analysis: prove that for all n, a sequence of n operations takes total time T(n) in the worst case; the amortized cost per operation is then T(n)/n. Example: PUSH and POP operations on an initially empty stack.

2. Accounting method: in the accounting method of amortized analysis, we determine an amortized cost for each operation, assigning different charges to different operations; some operations are charged more than their actual cost and some less. The amount we charge an operation is called its amortized cost. When an operation's amortized cost exceeds its actual cost, the difference is treated as a deposit (credit) attached to specific objects in the data structure, and it can later be used to pay for operations whose amortized cost is less than their actual cost. This method differs from aggregate analysis, in which all operations have the same amortized cost. The total credit stored in the data structure equals the difference between the total amortized cost and the total actual cost. Note: the total credit must never become negative. Early operations overpay and store the excess as prepaid credit, which pays for operations later in the sequence. Example: a binary counter that counts upward by flipping individual bits.

3. Potential method: in amortized analysis, the potential method does not represent prepaid work as credit attached to particular objects of the data structure; instead it represents the credit as a "potential" associated with the data structure as a whole, which can be released to pay for later operations. Example: dynamic tables, contiguous arrays whose size can change dynamically.


First, aggregate analysis

In aggregate analysis, for a sequence of n operations, we compute the total worst-case time T(n). The amortized cost of each operation is then T(n)/n in the worst case. The cost T(n)/n applies to every operation, even when several types of operation appear in the sequence; the other two methods may assign different amortized costs to different types of operation.

As an example, consider a stack with a MULTIPOP operation. The two basic stack operations, each costing O(1) time, are PUSH(S, x), which pushes object x onto stack S, and POP(S), which pops the top of stack S and returns the popped object. Assign each operation a cost of 1. A sequence of n PUSH and POP operations then has total cost n, and its actual running time is O(n).

Now add an additional stack operation MULTIPOP. MULTIPOP(S, k) pops the top k objects of stack S (or pops the entire stack if k is larger than the stack size).


The total cost of one MULTIPOP(S, k) is min{|S|, k}.

Now consider a sequence of n PUSH, POP, and MULTIPOP operations on an initially empty stack, as sketched below.
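A minimal C++ sketch of MULTIPOP, assuming std::stack as the underlying stack (the helper name multipop is an illustrative choice, not from the original notes):

    #include <stack>

    // MULTIPOP(S, k): pop the top k objects of S, or the whole stack
    // if k exceeds the stack size. Each individual pop costs O(1).
    template <typename T>
    void multipop(std::stack<T>& S, int k) {
        while (!S.empty() && k > 0) {
            S.pop();
            --k;
        }
    }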


For example, starting from an empty stack, the sequence PUSH, PUSH, PUSH, MULTIPOP(S, 2) performs three pushes and then pops two of the three objects, leaving one on the stack.


With a rough analysis, a single MULTIPOP(S, k) may take O(n) time, so a sequence of n operations costs at most n · O(n) = O(n²) in the worst case.
In the sequence, some operations are cheap while others, such as MULTIPOP(S, k), can be costly and time-consuming. However, the expensive operations occur only rarely, so the traditional worst-case analysis of a single operation gives an overly pessimistic bound.

Our goal is to assign each operation an amortized cost so that the total amortized cost bounds the total actual cost. For an arbitrary sequence of n operations, we require

$$\sum_{i=1}^{n} c_i \le \sum_{i=1}^{n} \hat{c}_i,$$

where $c_i$ denotes the actual cost of the $i$-th operation and $\hat{c}_i$ its amortized cost.

Aggregate analysis gives a tighter bound, with the same amortized cost assigned to every operation.

Observe that each object can be popped at most once for each time it is pushed, so the number of pops (including the pops performed inside MULTIPOP) is at most the number of PUSH operations, which is at most n. Therefore

$$T(n) \le 2n = O(n), \qquad \frac{T(n)}{n} = O(1).$$

Thus, amortized over the sequence, a MULTIPOP(S, k) step costs O(1) rather than O(k) time.

Here is another example: a k-bit binary counter that counts upward from 0. Use an array of bits A[0..k-1] to store the count, with the lowest-order bit in A[0] and the highest-order bit in A[k-1], so that the counter holds the value

$$x = \sum_{i=0}^{k-1} A[i] \cdot 2^i.$$

Initially x = 0, i.e., A[i] = 0 for i = 0, ..., k-1.

For example, with k = 4, the value x = 5 is stored as A[0] = 1, A[1] = 0, A[2] = 1, A[3] = 0.


The INCREMENT algorithm adds 1 (modulo 2^k) to the value in the counter, as sketched below.
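A minimal C++ sketch of INCREMENT, assuming the bits are held in a std::vector<int> (an illustrative representation):

    #include <cstddef>
    #include <vector>

    // INCREMENT: add 1 (modulo 2^k) to the counter stored in A[0..k-1],
    // with the least-significant bit in A[0].
    void increment(std::vector<int>& A) {
        std::size_t i = 0;
        while (i < A.size() && A[i] == 1) {  // reset trailing 1s (1 -> 0 flips)
            A[i] = 0;
            ++i;
        }
        if (i < A.size()) {                  // set the lowest 0 bit (a 0 -> 1 flip)
            A[i] = 1;
        }                                    // if i == k, the counter wraps to 0
    }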


Consider a sequence of n INCREMENT operations counting up from 0.


A rough calculation gives T(n) <= kn, because a single INCREMENT may flip all k bits.

We use aggregate analysis for a tighter bound. The basic operations are the bit flips 1 -> 0 and 0 -> 1.

In a sequence of n INCREMENT operations:

A[0] flips every time INCREMENT is called, so it flips n times;

A[1] flips on every second call to INCREMENT, so it flips ⌊n/2⌋ times;

...

A[i] flips ⌊n/2^i⌋ times.

So the total number of flips is

$$T(n) = \sum_{i=0}^{k-1} \left\lfloor \frac{n}{2^i} \right\rfloor < n \sum_{i=0}^{\infty} \frac{1}{2^i} = 2n.$$

The amortized cost of each operation is therefore T(n)/n < 2 = O(1).
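As a quick empirical check (my own illustration, not part of the original notes), the following self-contained program counts the actual flips over n INCREMENT operations and verifies the aggregate bound T(n) < 2n:

    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Count the total number of bit flips performed by n INCREMENT
    // operations on a k-bit counter, and check that T(n) < 2n.
    int main() {
        const std::size_t k = 32;
        const long n = 1000000;
        std::vector<int> A(k, 0);
        long flips = 0;
        for (long op = 0; op < n; ++op) {
            std::size_t i = 0;
            while (i < k && A[i] == 1) { A[i] = 0; ++i; ++flips; }  // 1 -> 0 flips
            if (i < k) { A[i] = 1; ++flips; }                       // the 0 -> 1 flip
        }
        assert(flips < 2 * n);  // T(n) < 2n, so the amortized cost is O(1)
        return 0;
    }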

Second, the accounting method

The basic idea of the accounting method is this: for each operation with actual cost $c_i$, assign an amortized cost $\hat{c}_i$ so that for any sequence of n operations,

$$\sum_{i=1}^{n} c_i \le \sum_{i=1}^{n} \hat{c}_i.$$

When $\hat{c}_i > c_i$, the excess is stored as prepaid credit, which can be used to pay for later operations whose amortized cost is below their actual cost. The requirement above amounts to saying that the total credit must never become negative.

We go back to the stack with the MULTIPOP operation. For such a stack, assign the amortized costs:

PUSH: 2, POP: 0, MULTIPOP: 0.

Here the stored credit equals the number of objects on the stack: each object carries the 1 unit of credit prepaid when it was pushed, and that unit pays for the single pop that eventually removes it.

Starting from an empty stack, the maximum total cost of any sequence of n1 PUSH, n2 POP, and n3 MULTIPOP operations is at most the total amortized cost, 2n1 <= 2n = O(n), where n = n1 + n2 + n3.

It is important to note that when there is more than one type of operation, each type may be assigned a different amortized cost.

The following is a banker's view of the accounting method. Suppose you rent a coin-operated machine that charges per operation. There are two ways to pay:

A. Pay the actual cost of each operation as it happens: for example, PUSH pays 1 yuan, POP pays 1 yuan, MULTIPOP pays min{|S|, k} yuan.

B. Open an account and pay a flat amortized fee per operation: for example, PUSH pays 2 yuan, POP pays 0 yuan, MULTIPOP pays 0 yuan.

If the amortized cost exceeds the actual cost, the difference is stored as credit (a deposit); if the amortized cost is less than the actual cost, the credit is used to pay the difference. The restriction is that for any sequence of n operations,

$$\sum_{i=1}^{n} \hat{c}_i \ge \sum_{i=1}^{n} c_i,$$

that is, the account must always hold enough money.
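This banker's view is easy to simulate. The following self-contained sketch (my own illustration, running arbitrary random operations) deposits 2 per PUSH, charges 0 for POP and MULTIPOP, and checks that the balance never goes negative:

    #include <cassert>
    #include <cstdlib>
    #include <stack>

    // Banker's view: PUSH is charged an amortized 2 (1 pays the push itself,
    // 1 is deposited); POP and MULTIPOP are charged 0 and pay their actual
    // cost out of the deposit.
    int main() {
        std::stack<int> S;
        long credit = 0;
        for (int round = 0; round < 100000; ++round) {
            switch (std::rand() % 3) {
            case 0:                              // PUSH: amortized 2, actual 1
                S.push(round);
                credit += 2 - 1;
                break;
            case 1:                              // POP: amortized 0, actual 1
                if (!S.empty()) { S.pop(); credit -= 1; }
                break;
            default: {                           // MULTIPOP(S, k): amortized 0
                int k = std::rand() % 5;
                while (!S.empty() && k-- > 0) { S.pop(); credit -= 1; }
            }
            }
            assert(credit >= 0);                 // the deposit stays non-negative
        }
        return 0;
    }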

Here is an example:

The same reasoning applies to the binary counter. Assign the amortized costs:

flip 0 -> 1: 2, flip 1 -> 0: 0.

When a bit is set from 0 to 1, one unit pays for the flip itself and the other is stored as credit on that bit; when the bit is later reset from 1 to 0, the stored credit pays for the flip. Since every bit reset to 0 must previously have been set to 1, the number of 0 -> 1 flips is at least the number of 1 -> 0 flips, and each INCREMENT performs at most one 0 -> 1 flip. Therefore

$$\sum_{i=1}^{n} c_i \le \sum_{i=1}^{n} \hat{c}_i \le 2n.$$

Third, the potential method

The potential method looks at the problem from a physicist's point of view. Setting an amortized cost directly for each operation is not always simple, so we define a potential function as a bridge: we assign a value to each state of the data structure instead of to each operation, and compute the amortized cost from the potential function.

Define the potential function as $\Phi: S \to \mathbb{R}$, where S is the set of states of the data structure.

The amortized cost of the i-th operation is set to

$$\hat{c}_i = c_i + \Phi(s_i) - \Phi(s_{i-1}),$$

where $s_{i-1}$ and $s_i$ are the states before and after the operation. Summing over the sequence, the intermediate potentials telescope, so we have

$$\sum_{i=1}^{n} \hat{c}_i = \sum_{i=1}^{n} c_i + \Phi(s_n) - \Phi(s_0).$$

To ensure $\sum_{i=1}^{n} \hat{c}_i \ge \sum_{i=1}^{n} c_i$, it suffices to guarantee $\Phi(s_n) \ge \Phi(s_0)$; in practice one usually arranges $\Phi(s_0) = 0$ and $\Phi(s_i) \ge 0$ for all i.

For the stack example, let $\Phi(S)$ be the number of objects on the stack; in effect, we simply reinterpret the deposit as potential. Here the state $s_i$ is the state of the stack after the i-th operation, and for every i we have $\Phi(s_i) \ge 0 = \Phi(s_0)$.
We compute the amortized costs as follows:

PUSH: $\hat{c} = 1 + 1 = 2$ (the potential rises by 1);
POP: $\hat{c} = 1 - 1 = 0$ (the potential falls by 1);
MULTIPOP(S, k): $\hat{c} = k' - k' = 0$, where $k' = \min\{|S|, k\}$ objects are popped.

Therefore, starting from an empty stack, a sequence of n1 PUSH, n2 POP, and n3 MULTIPOP operations takes at most $\sum_i \hat{c}_i = 2n_1 \le 2n = O(n)$ time, where n = n1 + n2 + n3.

For the binary counter, take the potential to be the number of 1 bits in the counter: $\Phi(s_i) = b_i$. Suppose the i-th INCREMENT resets $t_i$ bits from 1 to 0. Its actual cost is then at most $c_i \le t_i + 1$ (it resets $t_i$ bits and sets at most one bit), and the number of 1 bits afterwards satisfies $b_i \le b_{i-1} - t_i + 1$. Hence

$$\hat{c}_i = c_i + \Phi(s_i) - \Phi(s_{i-1}) \le (t_i + 1) + (1 - t_i) = 2.$$

In other words, starting from 00...0, a sequence of n INCREMENT operations takes at most 2n time.

Here's a practical question to consider:

Suppose we are asked to develop a C++ compiler. A vector is a C++ class template that stores a sequence of objects. It supports the following operations:

A. push_back: appends a new object at the end

B. pop_back: removes the last object

Note that a vector stores its objects in a contiguous region of memory. So how do we design an effective memory allocation strategy for vectors? One can observe the strategy std::vector actually uses, as in the snippet below.
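As an illustration (not from the original notes), the following program prints each reallocation a std::vector performs while growing; the growth factor is implementation-defined, commonly 1.5 or 2:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Print each reallocation as a std::vector grows, showing how
    // rarely the expensive copy actually happens.
    int main() {
        std::vector<int> v;
        std::size_t last_cap = v.capacity();
        for (int i = 0; i < 1000; ++i) {
            v.push_back(i);
            if (v.capacity() != last_cap) {  // a reallocation (expansion) occurred
                std::cout << "size " << v.size()
                          << " -> capacity " << v.capacity() << '\n';
                last_cap = v.capacity();
            }
        }
        return 0;
    }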

This leads to the problem of dynamic tables.

In many applications we cannot know in advance how many objects a table will need to store; we allocate some amount of space and may later find it insufficient. Two concepts arise:

Dynamic expansion: when a new item is inserted into a full table, a larger table must be allocated and the objects in the original table copied into the new one.

Dynamic contraction: similarly, if many objects have been deleted from a table, it can be reallocated into a smaller table.

We will give a memory allocation policy under which the amortized cost of insertion and deletion is O(1), even though an individual operation that triggers an expansion or contraction has a large actual cost. A sketch of the insertion side follows.
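Here is a minimal C++ sketch of the doubling policy analyzed below (the struct and member names are illustrative):

    #include <cstddef>

    // A minimal dynamic table with the doubling policy: when an insertion
    // overflows a full table, allocate a table of twice the size and copy
    // the old entries over.
    struct DynTable {
        int*        data = nullptr;
        std::size_t num  = 0;   // number of items stored
        std::size_t size = 0;   // allocated capacity

        void push_back(int x) {
            if (num == size) {                           // overflow: expand
                std::size_t new_size = (size == 0) ? 1 : 2 * size;
                int* bigger = new int[new_size];
                for (std::size_t i = 0; i < num; ++i)    // copying costs num
                    bigger[i] = data[i];
                delete[] data;
                data = bigger;
                size = new_size;
            }
            data[num++] = x;                             // the O(1) insertion
        }

        ~DynTable() { delete[] data; }
    };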

Example of dynamic table expansion: consider a sequence of TABLE-INSERT operations starting from an empty table. When an insertion overflows a full table, the table is expanded to twice its size and the existing entries are copied over.

Rough analysis: for such a sequence, if we charge for elementary insertions and copies, the actual cost of the i-th operation is

$$c_i = \begin{cases} i & \text{if } i-1 \text{ is an exact power of } 2, \\ 1 & \text{otherwise.} \end{cases}$$

Here $c_i = i$ when the table is full, because we must copy the existing i - 1 entries into the new table and then perform the insertion. If n operations are executed, the worst-case cost of a single operation is O(n), so the total running time of the n operations is O(n²), which is not as tight as we would like.

Aggregate analysis of the same scenario: the first observation is that expansion is rare, since it does not happen often in n operations, so the O(n²) bound is not tight. Specifically, an expansion occurs at the i-th operation exactly when i - 1 is a power of 2. We can therefore decompose the total cost into the cheap insertions plus the copying done by the expansions:

$$\sum_{i=1}^{n} c_i \le n + \sum_{j=0}^{\lfloor \lg(n-1) \rfloor} 2^j < n + 2n = 3n.$$

So the total cost of n operations is at most 3n, the amortized cost of each operation is at most 3, and the amortized cost per TABLE-INSERT is O(n)/n = O(1).

If we use the accounting method: charge each TABLE-INSERT an amortized cost of $3. Whatever is not consumed immediately is deposited in a "bank" for later operations. The $3 of the i-th operation is used as follows:

A. $1 pays for inserting the item itself;

B. $2 are stored for the next table expansion: $1 to copy this item, and $1 to copy one of the roughly i/2 items that were already in the table at the last expansion and have spent their own credit.

The deposit never becomes negative; in other words, the total amortized cost is an upper bound on the total actual cost.

If we use the potential method: the bank account can be viewed as a potential function of the dynamic table. More precisely, we would like a potential function with these properties:

A. Immediately after an expansion, $\Phi = 0$;

B. Immediately before an expansion, $\Phi$ is large enough to pay for the expansion.

One possible choice is

$$\Phi(T) = 2 \cdot num(T) - size(T),$$

where num(T) is the number of items stored and size(T) is the capacity of the table. Initially num = size = 0, so $\Phi = 0$, and it is easy to verify that $\Phi \ge 0$ as long as the table is always at least half full. The amortized cost is then defined as $\hat{c}_i = c_i + \Phi(T_i) - \Phi(T_{i-1})$, and the total amortized cost is an upper bound on the total actual cost.

Two cases must be computed. Let $num_i$ be the number of entries after the i-th operation, $size_i$ the table size, and $\Phi_i$ the potential.

Case 1: the i-th insertion does not trigger an expansion. Then $size_i = size_{i-1}$, $num_i = num_{i-1} + 1$, and $c_i = 1$, so

$$\hat{c}_i = 1 + (2\,num_i - size_i) - (2\,num_{i-1} - size_{i-1}) = 1 + 2 = 3.$$

Case 2: the i-th insertion triggers an expansion. Then $size_i = 2\,size_{i-1}$ with $size_{i-1} = num_{i-1} = num_i - 1$, and $c_i = num_i$ (copying $num_i - 1$ entries plus one insertion), so

$$\hat{c}_i = num_i + \big(2\,num_i - 2(num_i - 1)\big) - \big(2(num_i - 1) - (num_i - 1)\big) = num_i + 2 - (num_i - 1) = 3.$$

Therefore, starting from an empty table, a sequence of n TABLE-INSERT operations takes O(n) time in the worst case.

The analysis of the delete operation is similar; to keep the amortized cost constant, one typically contracts the table only when it falls to a quarter full.

In general, because the amortized cost of each operation is bounded by a constant, any sequence of n TABLE-INSERT and TABLE-DELETE operations on a dynamic table, starting from an empty table, has actual cost O(n).

Amortized analysis provides a clean abstraction for reasoning about data-structure performance. Any of the three methods may be used when amortized analysis applies, though in a given case one method is often the simplest. Different methods suit different assignments of amortized costs and may sometimes yield quite different bounds.


