Data structures and algorithms

Source: Internet
Author: User

7. Advanced Sorting:

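Shell sort: simple to implement, neither very fast nor very slow, and not stable

Insertion sort: performs too many copies and behaves badly in extreme cases (for example, reverse-ordered input)

N-increment sorting: the interval h is decreased step by step; Knuth's formula for generating the interval sequence is h=3*h+1 (1, 4, 13, 40, ...), and the sequence is walked back down with h=(h-1)/3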

0 1 2 3 4 5 6 7 8 9 10 11 12

The smallest item is certainly in the first group

The second smallest item is certainly in the first or the second group

h=4,8

h=4

Outer=4

temp = array[4]

Inner=4

Items 0 and 4 are compared

Items 0 and 4 swap positions

Outer=5

temp = array[5]

Inner=5

4 and 0

5 and 1

6 and 2

7 and 3

8 and 4

9 and 5

Does the algorithm not repeat the comparison?

The interval is decreased step by step and must finally equal 1; the last pass is an ordinary insertion sort
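The interval scheme above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation; the function name shell_sort is an assumption, and the variable names outer, inner, temp and h follow the trace above.

```python
def shell_sort(items):
    """Sort a list in place with Shell sort, using Knuth's interval sequence."""
    n = len(items)
    h = 1
    while h <= n // 3:           # grow the interval: 1, 4, 13, 40, ...
        h = 3 * h + 1
    while h > 0:
        for outer in range(h, n):
            temp = items[outer]
            inner = outer
            # Shift larger items of the same h-spaced group to the right.
            while inner > h - 1 and items[inner - h] >= temp:
                items[inner] = items[inner - h]
                inner -= h
            items[inner] = temp
        h = (h - 1) // 3         # shrink the interval; the last pass uses h = 1
    return items
```

With h = 1 the inner loop is exactly an insertion sort, which is why the final pass always leaves the list sorted.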


Partitioning: the pivot

Items smaller than the pivot value are guaranteed to end up on the left, items larger than the pivot value on the right

Extreme case: all the data is smaller (or all larger) than the pivot value, so the partition pass accomplishes nothing

Efficiency of partitioning: O(N)
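A minimal partitioning sketch in Python, assuming the pivot is passed in as a value (the function name partition and its parameters are assumptions made for illustration). Two indices scan toward each other and swap items that are on the wrong side, so every item is examined once, which is where O(N) comes from.

```python
def partition(items, left, right, pivot):
    """Partition items[left..right] around a pivot value.

    Returns the index of the first item of the right-hand group.
    """
    lo = left - 1
    hi = right + 1
    while True:
        lo += 1
        while lo <= right and items[lo] < pivot:   # skip items already on the left
            lo += 1
        hi -= 1
        while hi >= left and items[hi] > pivot:    # skip items already on the right
            hi -= 1
        if lo >= hi:                               # the scans have met or crossed
            return lo
        items[lo], items[hi] = items[hi], items[lo]
```

In the extreme case (all items smaller or all larger than the pivot) the function returns right + 1 or left, and nothing is moved.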


Quicksort:

Choice of pivot: the pivot should be the key value of an actual data item; any item may be chosen

Essence: the final swap puts the pivot into its final sorted position

Ideal: the median of the data items should be selected as the pivot. Worst case: O(N^2)

Choosing the right pivot becomes the key: examining all the data, no; a random choice, no; the median of three data items (median-of-three), yes

Handling small partitions: use insertion sort for small partitions

Eliminating recursion: systems now handle method calls efficiently, so removing recursion brings little benefit
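A sketch of quicksort combining median-of-three pivot selection with insertion sort for small partitions. The cutoff SMALL = 10 and all function names are assumptions chosen for illustration, not values from the notes.

```python
SMALL = 10  # assumed cutoff below which insertion sort is used


def insertion_sort(items, left, right):
    for outer in range(left + 1, right + 1):
        temp = items[outer]
        inner = outer
        while inner > left and items[inner - 1] > temp:
            items[inner] = items[inner - 1]
            inner -= 1
        items[inner] = temp


def median_of_three(items, left, right):
    """Order the left, center and right items; park the median at right - 1."""
    center = (left + right) // 2
    if items[left] > items[center]:
        items[left], items[center] = items[center], items[left]
    if items[left] > items[right]:
        items[left], items[right] = items[right], items[left]
    if items[center] > items[right]:
        items[center], items[right] = items[right], items[center]
    items[center], items[right - 1] = items[right - 1], items[center]
    return items[right - 1]


def quick_sort(items, left=0, right=None):
    if right is None:
        right = len(items) - 1
    if right - left + 1 < SMALL:
        insertion_sort(items, left, right)
        return items
    pivot = median_of_three(items, left, right)
    lo, hi = left, right - 1        # items[left] and items[right] act as sentinels
    while True:
        lo += 1
        while items[lo] < pivot:
            lo += 1
        hi -= 1
        while items[hi] > pivot:
            hi -= 1
        if lo >= hi:
            break
        items[lo], items[hi] = items[hi], items[lo]
    # Put the pivot into its final sorted position.
    items[lo], items[right - 1] = items[right - 1], items[lo]
    quick_sort(items, left, lo - 1)
    quick_sort(items, lo + 1, right)
    return items
```

Median-of-three also places a small item at the left end and a large item at the right end, so the inner scans need no bounds checks.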


Radix sort: sorts without comparing keys and can exploit the computer's fast bit operations

First pass: sort by the lowest-order digit; second pass: sort by the next digit; ....
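A minimal least-significant-digit radix sort sketch, assuming non-negative integer keys and decimal digits (the name radix_sort and the base parameter are assumptions). Each pass distributes the keys into one bucket per digit value and collects them again in order; no two keys are ever compared with each other.

```python
def radix_sort(items, base=10):
    """Return a sorted copy of a list of non-negative integers."""
    result = list(items)
    if not result:
        return result
    largest = max(result)
    place = 1                                  # 1s digit, then 10s, 100s, ...
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for value in result:
            buckets[(value // place) % base].append(value)
        # Collecting bucket by bucket keeps earlier passes' order (stable).
        result = [value for bucket in buckets for value in bucket]
        place *= base
    return result
```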

Efficiency of radix sort:




8. Binary trees: an ordered array gives fast search but slow insertion (making room takes time); a linked list gives fast insertion

Ordered linked list

Common terminology: path, root (there is exactly one path from the root to any node), parent node, child node, leaf node, subtree, visiting (performing some operation on a node), traversal, level, key

Structure of a tree: edges (references) connect nodes (objects). Multiway tree ----> special case: binary tree

Binary search tree: the key of a node's left child is smaller than the node's key, and the key of its right child is larger

An analogy: Hierarchical file structure


How binary search trees work:

Unbalanced tree: nodes are missing left or right children; the tree becomes unbalanced when keys are inserted in ascending or descending order

Node class concept: the data item is abstracted out

Tree class: root node

Node class: data item, left child node, right child node

Change and delete

Tree search efficiency: O(log2 N); the time to find a node depends on the level at which the node resides

Inserting into the tree:

Traversal: preorder, inorder, and postorder, implemented recursively; preorder and postorder traversal of an expression tree give the prefix and postfix expressions

Finding the maximum and minimum: the leftmost node holds the minimum, the rightmost node the maximum

Nature of the tree structure: in every subtree the leftmost node holds the minimum and the rightmost node the maximum; the tree is valid as long as this requirement is satisfied
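The Tree and Node classes described above can be sketched in Python as follows. The method names are assumptions made for illustration; duplicate keys are sent to the right subtree, which is one possible convention.

```python
class Node:
    def __init__(self, key):
        self.key = key          # data item
        self.left = None        # left child node
        self.right = None       # right child node


class Tree:
    def __init__(self):
        self.root = None

    def insert(self, key):
        new_node = Node(key)
        if self.root is None:
            self.root = new_node
            return
        current = self.root
        while True:             # walk down until an empty slot is found
            if key < current.key:
                if current.left is None:
                    current.left = new_node
                    return
                current = current.left
            else:
                if current.right is None:
                    current.right = new_node
                    return
                current = current.right

    def find(self, key):
        current = self.root
        while current is not None and current.key != key:
            current = current.left if key < current.key else current.right
        return current          # None when the key is absent

    def in_order(self, node, visit):
        if node is not None:
            self.in_order(node.left, visit)
            visit(node.key)
            self.in_order(node.right, visit)

    def minimum(self):
        current = self.root
        while current.left is not None:     # keep going left
            current = current.left
        return current.key

    def maximum(self):
        current = self.root
        while current.right is not None:    # keep going right
            current = current.right
        return current.key
```

An inorder traversal visits the keys in ascending order, which is a quick way to check that the ordering property holds.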

To delete a node: 1. Deleting a leaf node: simply remove it from the tree,

2. Deleting a node with one child: change the parent node's reference to point to that child

3. Deleting a node with two children: the gap it leaves must be filled, so the node is replaced by its inorder successor (the smallest node in its right subtree). A clever trick.
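The three deletion cases can be sketched recursively in Python. This is an illustrative sketch assuming unique keys; the function names insert, delete and keys_in_order are assumptions.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def delete(root, key):
    """Delete key from the subtree and return the subtree's new root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Cases 1 and 2: a leaf or a node with one child.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case 3: two children. Copy the inorder successor's key here,
        # then delete the successor from the right subtree.
        successor = root.right
        while successor.left is not None:
            successor = successor.left
        root.key = successor.key
        root.right = delete(root.right, successor.key)
    return root


def keys_in_order(root):
    if root is None:
        return []
    return keys_in_order(root.left) + [root.key] + keys_in_order(root.right)
```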

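Binary tree efficiency: in big-O notation, O(log2 N). In a full tree about half of the nodes are on the bottom level.

Representing a tree as an array: the left child of the node at index is at 2*index+1, the right child at 2*index+2, and the parent at (index-1)/2; deletion is impractical in this representation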
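The index arithmetic for an array-based tree, as a small Python sketch (the helper names and the sample array are assumptions; None marks an unfilled position):

```python
def left_child(index):
    return 2 * index + 1


def right_child(index):
    return 2 * index + 2


def parent(index):
    return (index - 1) // 2


# Level by level: 50 is the root, 25 and 75 its children,
# 25 has only a right child (37), 75 has only a left child (62).
tree_array = [50, 25, 75, None, 37, 62, None]
```

The None entries show the cost of this layout: every missing node still occupies a cell.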

Duplicate key issues




Compression, Decompression principle:

Huffman coding: the principle is to reduce the number of bits used to represent the most frequently used characters; no code may be a prefix of any other code

Frequency table: the character that occurs most often should get the fewest bits

Creating the Huffman tree; decoding:

Creating the Huffman code
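A compact sketch of building the code table, encoding and decoding, using Python's heapq module as the priority queue. The function names and the tuple-based tree representation are assumptions made for illustration.

```python
import heapq
from collections import Counter


def build_codes(text):
    """Return a dict mapping each character of text to its Huffman bit string."""
    if not text:
        return {}
    freq = Counter(text)
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a
    # single character or a (left, right) pair of subtrees.
    heap = [(count, i, ch) for i, (ch, count) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent trees into one.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (t1, t2)))
        next_id += 1
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")     # left edge adds a 0
            walk(tree[1], prefix + "1")     # right edge adds a 1
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:      # safe because no code is a prefix of another
            out.append(reverse[current])
            current = ""
    return "".join(out)
```

Because characters sit only at the leaves of the tree, no code can be a prefix of another, which is what lets decode cut the bit stream without separators.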




9. Red-black trees: insertion follows certain rules whose purpose is to maintain balance

Binary tree problem: inserting ordered data behaves badly; the tree becomes unbalanced and efficiency degrades from O(log2 N) to O(N)

Remedy for imbalance: the red-black tree. Feature: nodes have a color

Red-black rules: Each node is either red or black

The root is always black

If the node is red, the child node must be black

Each path from the root to the leaf node must contain the same number of black nodes, that is, the black height must be the same
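The four rules can be checked mechanically. The Python sketch below is a checker only, not an insertion algorithm; the class and function names are assumptions, and it counts black height along every path down to an empty (null) child, treating empty children as black.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def black_height(node):
    """Return the black height of a subtree; raise ValueError on a violation."""
    if node is None:
        return 1                                    # empty children count as black
    if node.color not in (RED, BLACK):
        raise ValueError("node is neither red nor black")
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("red node has a red child")
    left = black_height(node.left)
    right = black_height(node.right)
    if left != right:
        raise ValueError("black heights differ")
    return left + (1 if node.color == BLACK else 0)


def is_red_black(root):
    if root is None:
        return True
    if root.color != BLACK:                         # the root is always black
        return False
    try:
        black_height(root)
    except ValueError:
        return False
    return True
```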

Duplicate keys:

Fixing rule violations: change the colors of nodes or perform rotations

Unbalanced tree

In short, the goal is to keep the tree reasonably balanced; the process is quite cumbersome

Other balanced trees: the AVL tree, in which the heights of a node's left and right subtrees may differ by at most 1




10. 2-3-4 trees and external storage

Each node has at most 4 children and 3 data items; all leaf nodes are always on the same level

For non-leaf nodes: the number of child nodes is always one more than the number of data items in the node

An empty node does not exist

The keys of all nodes in the child0 subtree are less than key0; the keys of all nodes in the child1 subtree are greater than key0 and less than key1

The keys of all nodes in the child2 subtree are greater than key1 and less than key2; those in the child3 subtree are greater than key2

Searching a 2-3-4 tree:
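A search can be sketched in Python as follows, using the key ordering between children described above. The class Node234 and the function search are assumed names; the tree in the usage note is built by hand, since insertion and splitting are not shown here.

```python
class Node234:
    def __init__(self, keys, children=None):
        self.keys = keys                    # 1 to 3 keys in ascending order
        self.children = children or []      # empty for a leaf, else len(keys) + 1


def search(node, key):
    """Return the node holding key, or None if the key is not in the tree."""
    while node is not None:
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1                          # find the first key that is >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node
        if not node.children:               # reached a leaf without a match
            return None
        node = node.children[i]             # child i lies between keys i-1 and i
    return None
```

For example, with root = Node234([50], [Node234([30, 40]), Node234([60, 70, 80])]), search(root, 70) returns the right-hand leaf and search(root, 45) returns None.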

Insert:

Node splitting: if a full node is encountered on the way down to the insertion point, it must be split in order to maintain balance

Data items move up and to the right: the middle item moves up into the parent and the largest item moves into a new node on the right

Splitting on the way down: the parent of any node being split is guaranteed not to be full

When to split: whenever a full node is encountered

A 2-3-4 tree can be transformed into a red-black tree: rotations and color flips correspond to node splits

Storage requirements: a 2-3-4 tree uses only about 5/7 of its available space; a red-black tree makes more efficient use of storage than a 2-3-4 tree

2-3 trees: node splitting differs. In a 2-3-4 tree the new data item is inserted after all splits are completed, whereas in a 2-3 tree the new data item must take part in the splitting process

The purpose of splitting is to maintain balance.


External storage: data stored outside main memory (RAM), typically on disk

Goal: fast find, insert, and delete

Main memory: One second of one out of 10,000 accesses a byte

Disk: about 10 ms per access (1 ms is one thousandth of a second): the read/write head must move to the correct track and the disk must rotate to the correct position, at about 10,000 revolutions per minute

Data is stored in blocks, and one whole block is accessed at a time

Sequential ordering: all records are sorted by a key

Searching: binary search

Insertion: records must be shifted across blocks, which causes too many block reads and writes


B-tree: a multiway tree used as a data structure for external storage, credited to R. Bayer and E. M. McCreight

Each node occupies one data block on disk; a link to a child is the number of a block in the file, stored as an int

B-tree of order 17

B-tree efficiency: a search reads about log9(N) blocks. Delete, insert: 5+6<500000



Index: a list of key-to-block entries. The index is much smaller than the actual file records and can be held entirely in memory

Searching:

Inserting:

Multilevel index

When the index is too large for memory: store it on disk as a B-tree whose data items hold a key and a pointer to a block in the main file

Complex search criteria: sequential search

External file sorting: sort the records inside each block, then read two blocks at a time and merge them into one ordered sequence (merge sort)
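The block-then-merge idea can be simulated in memory with Python lists standing in for disk blocks. This is only a sketch of the principle: the names merge and external_sort and the block_size parameter are assumptions, and a real external sort would read and write files instead of slicing a list.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def external_sort(records, block_size):
    # Phase 1: sort each block on its own (this fits in memory).
    runs = [sorted(records[i:i + block_size])
            for i in range(0, len(records), block_size)]
    # Phase 2: merge the sorted runs two at a time until one remains.
    while len(runs) > 1:
        next_runs = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                next_runs.append(merge(runs[i], runs[i + 1]))
            else:
                next_runs.append(runs[i])
        runs = next_runs
    return runs[0] if runs else []
```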


11. Hash table

Hash tables are faster than trees

Drawbacks: hash tables are array-based and hard to expand, and performance degrades severely when the table is nearly full

Hashing: the key is used directly as an array index, or converted to one by a hash function

Key as array index

Uses: dictionaries, compilers (symbol tables implemented as hash tables)

Relating words to array indices: add up the digits (the letter codes)

With 50,000 words, adding the digits does not spread the words out enough, while multiplying by powers of 27 produces far too many array indices

Hashing: take the remainder (modulo)

Hash function: converts a number in a large range into a number in a smaller range

Compromise: with an array twice as large as the number of words there is, on average, one word for every two cells; some cells hold no word

Collision: several words hash to the same array index; ideally not too many words share an index

Collision resolution: open addressing (extra vacant cells reduce collisions) and separate chaining


Open addressing: search for an empty cell (linear probing, quadratic probing, double hashing)

Linear probing: clustering produces long probe lengths, so reaching the cells at the end of a cluster is time consuming. Load factor: the ratio of the number of data items in the hash table to the length of the table

Quadratic probing: probes cells farther and farther apart, the offset being the square of the step number; it suffers from secondary clustering

Double hashing: the probe sequence depends on the key: stepSize = constant - (key % constant), where constant is a prime smaller than the array size
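A small open-addressing table with double hashing, as a Python sketch. The class name, the table size 11 and the constant 5 are assumptions chosen for illustration; a prime table size guarantees that every step size eventually visits every cell. Keys are assumed to be non-negative integers, and deletion is not handled.

```python
class HashTable:
    def __init__(self, size=11, constant=5):
        self.size = size            # assumed prime
        self.constant = constant    # assumed prime smaller than size
        self.cells = [None] * size
        self.count = 0

    def _hash(self, key):
        return key % self.size

    def _step(self, key):
        # Never zero: the result is always in 1..constant.
        return self.constant - (key % self.constant)

    def insert(self, key):
        if self.count == self.size:
            raise OverflowError("hash table is full")
        index = self._hash(key)
        step = self._step(key)
        while self.cells[index] is not None:    # probe until an empty cell
            index = (index + step) % self.size
        self.cells[index] = key
        self.count += 1

    def find(self, key):
        """Return the index of key, or None if it is not in the table."""
        index = self._hash(key)
        step = self._step(key)
        for _ in range(self.size):
            if self.cells[index] is None:
                return None
            if self.cells[index] == key:
                return index
            index = (index + step) % self.size
        return None
```

Keys 1, 12 and 23 all hash to cell 1 here, but their step sizes are 4, 3 and 2, so they follow different probe sequences instead of piling up in one cluster.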

Separate chaining:
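With separate chaining each array cell holds a list of all the keys that hash to it. The sketch below uses Python lists in place of linked lists, which is a simplification; the class and method names are assumptions.

```python
class ChainedHashTable:
    def __init__(self, size=7):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:       # ignore duplicates
            bucket.append(key)

    def contains(self, key):
        return key in self._bucket(key)

    def load_factor(self):
        items = sum(len(bucket) for bucket in self.buckets)
        return items / len(self.buckets)
```

Unlike open addressing, the load factor here can exceed 1; the chains simply get longer.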

Hash function: must be quick to compute; random keys

Folding: break the key into groups of digits and add the groups

Efficiency of hashing: the constant time of the hash function plus the probe length

Linear probing (L is the load factor): successful search P = (1 + 1/(1-L)) / 2; unsuccessful search P = (1 + 1/(1-L)^2) / 2

Separate chaining: successful search 1 + loadFactor/2; unsuccessful search 1 + loadFactor

Quadratic probing and double hashing: successful search -ln(1 - loadFactor) / loadFactor (natural logarithm); unsuccessful search 1 / (1 - loadFactor)

Open addressing versus separate chaining: when the capacity is known in advance and the load factor stays below about 0.5, open addressing works well; when the number of items is unknown, separate chaining is better

Hashing and external storage: Index



12. Heap

A heap implements a priority queue whose insertion and deletion take O(logN) time. It is a complete binary tree in which each node's key is greater than or equal to the keys of its child nodes

Weakly ordered: a heap does not support ordered traversal or efficient lookup of a given key

Remove: take out the root, move the last node into the root's position, then trickle it down

Insert: place the new node in the last position, then trickle it up

Items are not really swapped; copying instead of swapping reduces the number of copies

Efficiency of the heap: O(log2 N)
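An array-based max-heap sketch in Python showing trickle-up on insert and trickle-down on remove. The class and method names are assumptions. Note how the moving item is held aside and other items are copied into the hole, rather than swapping at every level.

```python
class Heap:
    def __init__(self):
        self.items = []

    def insert(self, key):
        self.items.append(key)                  # new node goes in the last position
        index = len(self.items) - 1
        parent = (index - 1) // 2
        # Trickle up: copy smaller parents down, then drop the key in place.
        while index > 0 and self.items[parent] < key:
            self.items[index] = self.items[parent]
            index = parent
            parent = (index - 1) // 2
        self.items[index] = key

    def remove(self):
        """Remove and return the largest key."""
        if not self.items:
            raise IndexError("remove from an empty heap")
        root = self.items[0]
        last = self.items.pop()                 # the last node will replace the root
        size = len(self.items)
        if size > 0:
            index = 0
            # Trickle down: copy the larger child up until `last` fits.
            while 2 * index + 1 < size:
                child = 2 * index + 1
                if child + 1 < size and self.items[child + 1] > self.items[child]:
                    child += 1
                if last >= self.items[child]:
                    break
                self.items[index] = self.items[child]
                index = child
            self.items[index] = last
        return root
```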

Heapsort: performs more copies than quicksort; time complexity O(N*logN)
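An in-place heapsort sketch in Python (the function names are assumptions). The list is first turned into a max-heap, then the root is repeatedly moved to the end of the shrinking heap; N removals at O(logN) each give the O(N*logN) total.

```python
def heap_sort(items):
    """Sort a list in place in ascending order and return it."""
    n = len(items)

    def trickle_down(index, size):
        top = items[index]
        while 2 * index + 1 < size:
            child = 2 * index + 1
            if child + 1 < size and items[child + 1] > items[child]:
                child += 1                      # pick the larger child
            if top >= items[child]:
                break
            items[index] = items[child]
            index = child
        items[index] = top

    # Build the heap: trickle down every non-leaf node, bottom up.
    for index in range(n // 2 - 1, -1, -1):
        trickle_down(index, n)
    # Move the current maximum behind the heap, then restore the heap.
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        trickle_down(0, end)
    return items
```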



13. Graphs

Edges and vertices

Adjacency, paths, and connectivity between vertices

Connected graphs and non-connected graphs

Undirected graphs

Euler's Seven Bridges of Königsberg problem
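The bridge problem makes a compact example of edges, vertices and connectivity. In the Python sketch below the four land masses are labeled A to D (labels chosen for illustration) and each bridge is an undirected edge. A walk that crosses every edge exactly once exists only if the graph is connected and has zero or two vertices of odd degree.

```python
from collections import defaultdict

# The seven bridges as undirected edges between four land masses.
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def degrees(edges):
    """Count how many edge ends touch each vertex."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def is_connected(edges):
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:                        # depth-first search from one vertex
        vertex = stack.pop()
        for neighbor in adjacency[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(adjacency)


def has_euler_path(edges):
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return is_connected(edges) and odd in (0, 2)
```

All four vertices have odd degree, so has_euler_path(BRIDGES) is False: no walk crosses each bridge exactly once.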

















