Put plainly, a data structure is a set of data elements together with their logical structure and their storage structure. It also includes the operations defined on that data (note that "structure" here does not mean "architecture" (⊙o⊙)).
Here are some basic concepts to consider:
1. Data
Data is the carrier of information about the external world. It can be recognized, stored, and processed by a computer, and it is the raw material that computer programs work on. Computer programs handle a wide variety of data: numeric data, such as integers, real numbers, or complex numbers, used primarily for engineering calculations, scientific calculations, and business processing; and non-numeric data, such as characters, text, graphics, images, and sounds.
2. Data elements and data items
A data element is the basic unit of data, and it is usually considered and handled as a whole in a computer program. Data elements are sometimes also called elements, nodes, vertices, or records.
A data element can consist of several data items. Data items are the smallest units of data that carry independent meaning. Data items are sometimes referred to as fields or domains. For example, in a database information processing system, a record of a data table is a data element, and the fields of that record, such as the student's number, name, gender, birthplace, date of birth, and grades, are data items.
Data items are divided into two types. One is the elementary item, such as a student's gender or birthplace, which cannot be divided any further during processing. The other is the composite item, such as a student's grades, which can be subdivided into smaller items such as mathematics, physics, and chemistry.
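As a minimal sketch of these terms in Python, the record below treats one student as a data element. The class names, field names, and sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Grades:
    """A composite data item: it can be subdivided into smaller items."""
    mathematics: int
    physics: int
    chemistry: int


@dataclass
class Student:
    """One data element (record); each attribute is a data item (field)."""
    student_id: str   # elementary item: not divided further
    name: str         # elementary item
    gender: str       # elementary item
    birthplace: str   # elementary item
    grades: Grades    # composite item made of three smaller items


# Hypothetical sample record, for illustration only.
student = Student("2024001", "Li Hua", "F", "Beijing", Grades(90, 85, 88))
print(student.name, student.grades.physics)
```

The program handles `student` as a whole, while `student.grades.physics` reaches one of the smaller items inside the composite item.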
3. Data Objects
A data object is a collection of data elements of the same nature; it is a subset of the data. In a specific problem, the data elements all have the same nature and belong to the same data object.
A data element is an instance of a data element class. For example, the integer data object is {0, 1, 2, 3, ...} and the character data object is {a, b, c, d, ...}. In a transport consulting system, all vertices form one data element class; vertex A and vertex B each represent a city and are two instances of that class, with the values A and B respectively.
4. Data Type
A data type is a concept from high-level programming languages: it is a range of values together with the set of operations defined on those values. The data type specifies the properties of an object in the program, and every variable, constant, or expression result in the program belongs to some data type. For example, an integer variable in C takes values in a range of integers (the size of the range varies by machine), and the operations defined on it include addition, subtraction, multiplication, division, and modulo.
5. Data Structure
Put simply, a data structure describes the relationships among data elements. In any problem, the data elements are not isolated; certain relationships hold between them, and these relationships are called the structure.
For example, consider a table of students' course grades. The table is the data: it records the grade of each course a student has taken, with one row per student forming a record. Each record consists of fields such as name, student number, and course grade; each record is a node, also known as a data element, and each field is a data item. The name field takes character values, and the course grade field takes integer values. The table holds the grade information of a group of students, which share the same characteristics and so belong to the same data object. Adjacent data elements are related sequentially, since the records are in ascending order by student number.
A data structure consists of three parts: the logical structure, the storage structure, and the set of operations.
There are four basic logical structures:
• Collection structure: In a collection structure, the relationship between data elements is "belong to the same collection". A collection structure is a structure with very loose element relationships.
• Linear structure: The data elements of this structure have a one-to-one relationship, that is, each data element has at most one predecessor and at most one successor (the structure has exactly one first element and one last element).
• Tree structure: The data elements of this structure have a one-to-many relationship, that is, each data element has at most one predecessor but may have several successors (one starting point, multiple end points).
• Graph structure: The data elements of this structure have a many-to-many relationship, that is, any data element may be related to any number of others. The graph structure is also referred to as a network structure (each point can serve as both a predecessor and a successor).
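The four structures listed above can be sketched with Python's built-in containers. This is only one possible representation; the node names and the helper `incoming_count` are hypothetical and exist only for this illustration.

```python
# Set structure: elements only share membership in the same collection.
cities = {"A", "B", "C", "D"}

# Linear structure: one-to-one, each element has at most one predecessor
# and at most one successor.
route = ["A", "B", "C", "D"]

# Tree structure: one-to-many, each node has at most one parent and any
# number of children (stored here as parent -> list of children).
tree = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}

# Graph structure: many-to-many, any node may be linked to any other
# (stored here as node -> set of neighbours).
graph = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}


def incoming_count(links, node):
    """Count how many entries in `links` point at `node`."""
    return sum(node in targets for targets in links.values())


print(incoming_count(tree, "D"))   # a tree node has a single parent
print(incoming_count(graph, "B"))  # a graph node can have several neighbours
```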
[Figure: diagrams of the four basic structures]
The storage structure is divided into four types: sequential storage, linked storage, indexed storage, and hash storage.
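To make the difference between storage structures concrete, here is a small Python sketch that stores the same three values sequentially, in a linked chain, and in a hash table. It assumes that a Python list can stand in for sequential storage and a dict for hash storage; `Node`, `to_linked`, and `from_linked` are hypothetical helpers, and indexed storage is not shown.

```python
class Node:
    """One cell of linked storage: a value plus a reference to the next cell."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def to_linked(values):
    """Build a singly linked chain holding `values` in order; return its head."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def from_linked(head):
    """Walk the chain from `head` and collect the values in order."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


# Sequential storage: logical neighbours are stored next to each other,
# so position i is reached directly by index.
sequential = [10, 20, 30]

# Linked storage: the logical order is kept by references between nodes.
linked = to_linked(sequential)

# Hash storage: the key determines where the value is stored.
hashed = {"first": 10, "second": 20, "third": 30}

print(sequential[1], from_linked(linked), hashed["second"])
```

The logical structure (a linear sequence) is the same in the first two cases; only the way it is laid out in storage differs.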
Definition and representation of the algorithm
1. Definition of the algorithm
Everything we do follows certain steps, and these steps are ordered and none can be left out. Broadly speaking, then, an algorithm is the set of steps and methods used to solve a problem. In program design, an algorithm is a well-defined sequence of instructions that solves a problem in a finite number of steps; in plain terms, it is the process by which a computer solves a problem. Each instruction represents one or more operations. Both forming the idea of a solution and writing the program are realizations of the algorithm: the former realizes its logic, and the latter realizes its concrete operations.
2. Representation of the algorithm
Many different methods can be used to represent an algorithm. Common ones are natural language, traditional flowcharts, structured flowcharts, N-S diagrams, pseudocode, and programming languages.
Characteristics and evaluation of algorithms
1. Features of the algorithm
(1) Finiteness: an algorithm must be guaranteed to end after a finite number of steps; it cannot run forever.
(2) Definiteness: each instruction in the algorithm must have a clear meaning, with no ambiguity.
(3) Feasibility: each step must be executable within a finite amount of time.
(4) Input: an algorithm can have several inputs, or none at all.
(5) Output: an algorithm has one or more outputs; an algorithm with no output has no practical value.
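The five features above can be pointed out in a small, well-known example. Euclid's algorithm for the greatest common divisor is used here purely as an illustration; it is not taken from the original text, and the comments mark where each feature shows up.

```python
def gcd(a, b):
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    # Input: two values, a and b.
    while b != 0:
        # Definiteness: the step has exactly one meaning.
        # Feasibility: taking a remainder is a basic operation that finishes quickly.
        a, b = b, a % b
        # Finiteness: b strictly decreases and never drops below 0,
        # so the loop must end.
    # Output: one value.
    return a


print(gcd(48, 18))
```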
2. Algorithm evaluation (algorithm design requirements)
(1) Correctness, which is divided into the following four levels:
A. The program contains no syntax errors.
B. The program produces results that meet the requirements for several arbitrary sets of valid input data.
C. The program produces results that meet the requirements for carefully chosen, typical valid input data.
D. The program produces results that meet the requirements for all valid input data.
(2) Readability
A good algorithm is often shared with others. An obscure algorithm is hard to communicate and makes maintenance, modification, and debugging much more difficult.
(3) High efficiency
Just as everyone prefers to work with an efficient person, an efficient algorithm is preferred: the less time it takes to run, the higher its efficiency. This matters especially in large programs, where making each algorithm efficient goes a long way toward shortening the total running time. The efficiency of an algorithm is measured mainly in two respects: time complexity and space complexity.
(4) maintainability
A good algorithm should keep the effort required for later maintenance low.
Algorithm analysis
Algorithm analysis mainly means analyzing an algorithm's efficiency, which is examined in two respects: the running time of the algorithm and the storage space it requires.
In algorithm analysis, the time an algorithm takes over its whole run is described by its time complexity, and the space it occupies over its whole run is described by its space complexity.
Time complexity analysis of the algorithm
1. Measuring algorithm running time
There are usually two methods of measuring algorithm execution time:
(1) After-the-fact measurement
Many computers have a timing function, some accurate to the millisecond, so programs implementing different algorithms can be run on the same set or sets of test data and compared by their measured times. This method has two drawbacks: first, the program implementing the algorithm must actually be written and run; second, the measured time depends on environmental factors such as the computer's hardware and software. For these reasons, people often use the other method, pre-analysis and estimation.
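A minimal sketch of after-the-fact measurement in Python is shown here, using the standard library's `time.perf_counter`. The two functions being compared and the input size are hypothetical choices for illustration, and the printed times will differ from machine to machine, which is exactly the second drawback described above.

```python
import time


def sum_loop(n):
    """Add 1..n one term at a time."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n):
    """Add 1..n with the closed form n(n+1)/2."""
    return n * (n + 1) // 2


def measure(func, n):
    """Return (result, elapsed seconds) for one call of func(n)."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start


n = 1_000_000
for func in (sum_loop, sum_formula):
    result, seconds = measure(func, n)
    print(f"{func.__name__}: result={result}, time={seconds:.6f}s")
```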
(2) Pre-analysis and estimation
By analyzing how the different statements in the algorithm are sequenced, we obtain the relative number of times the statements are executed and use it to judge the running time of the algorithm. This is only a relative measure, not an absolute one.
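The idea can be sketched by instrumenting two small functions with a counter. What counts as one "statement" here is an assumption made for this example, and the `executed` counter exists only for the demonstration; in a real analysis the count is derived on paper without running anything.

```python
def count_loop_version(n):
    """Sum 0..n-1 with a loop and count the statements executed."""
    executed = 0
    total = 0
    executed += 1        # the initialisation runs once
    for i in range(n):
        total += i
        executed += 1    # the loop body runs n times
    executed += 1        # the return runs once
    return total, executed


def count_formula_version(n):
    """Sum 0..n-1 with a closed form and count the statements executed."""
    executed = 0
    total = n * (n - 1) // 2
    executed += 1        # one arithmetic statement
    executed += 1        # the return runs once
    return total, executed


for n in (10, 1000):
    print(n, count_loop_version(n)[1], count_formula_version(n)[1])
```

The loop version executes n + 2 counted statements and the formula version always executes 2, so the first grows with n while the second stays constant.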
2. Algorithm run time analysis rules
Many factors affect the running time of a program, such as the speed of the machine, the quality of the object code generated by the compiler, and the program's input. Typically, a program's running time is written as T(n), where n is the size of the program's input data, not a specific input. The unit of T(n) is left unspecified; it is generally taken to be the number of instructions executed on a particular computer.
When discussing the running time T(n) of a program, the focus is not on the specific value of T(n) but on its growth rate, which depends closely on the input size n. The growth rate is usually described by a simpler function f(n) of the input size: if, as n increases, T(n) grows at a rate similar to f(n) (more precisely, no faster than a constant multiple of f(n)), we write T(n) = O(f(n)).
The growth rate of a program's run time will ultimately determine how large a problem the program can solve on a computer.
According to the logical relationship between statement sequences, growth rates are combined using two rules: the sum rule and the product rule. Let T1(n) = O(f(n)) and T2(n) = O(g(n)).
(1) Sum rule: if T1(n) and T2(n) are the running times of program segments P1 and P2, then the running time of executing P1 followed immediately by P2 is:
T1(n) + T2(n) = O(max{f(n), g(n)})
(2) Product rule:
T1(n) · T2(n) = O(f(n) · g(n))
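The two rules can be checked on a toy example by counting basic steps. The functions `p1`, `p2`, `sequential`, and `nested` are hypothetical and written only for this illustration, with f(n) = n and g(n) = n * n.

```python
def p1(n):
    """A single loop: f(n) = n basic steps."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def p2(n):
    """A doubly nested loop: g(n) = n * n basic steps."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


def sequential(n):
    """P1 followed by P2: the step counts add up."""
    return p1(n) + p2(n)


def nested(n):
    """P2 executed once per iteration of a P1-style loop: the counts multiply."""
    steps = 0
    for _ in range(n):
        steps += p2(n)
    return steps


for n in (10, 100):
    print(n, sequential(n), nested(n))
```

For the sequential case the count is n + n * n, which is dominated by the larger term n * n, matching O(max{f(n), g(n)}). For the nested case the count is n * (n * n), matching O(f(n) · g(n)).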
In general, the time complexity of a program is analyzed step by step: first find the running time of each statement and each module, then the running time of the entire program. The running time of a group of statements (a part of the overall running time) can be expressed as a function of several variables or of the input size n. The running time of the entire program is generally expressed as a function of a single parameter, the input size n.
During the analysis we encounter various kinds of statements and modules, which are handled case by case as follows:
A. The running time of an assignment statement or a read/write statement is usually O(1). There are exceptions, however: the expression on the right-hand side of an assignment may contain function calls, and the time taken to evaluate those functions must then be included.
B. The running time of a sequence of statements is determined by the sum rule: it is the running time of the most time-consuming statement in the sequence.
C. The running time of an if statement is the time to test the condition (usually O(1)) plus the running time of the branch statement. The running time of an if-else-if statement is the time to test the conditions plus the running time of the most time-consuming branch.
D. The running time of a loop statement is the sum of the times of the n executions of the loop body, where n is the number of repetitions. Each repetition also tests the loop termination condition and jumps back to the beginning of the loop; this part takes O(1) and, as a constant factor, is ignored. The loop's running time is therefore generally taken to be the product of the number of iterations n and m, where m is the running time of the most time-consuming of the n executions of the loop body, as given by the product rule.
When analyzing nested loops, work from the inside out: the running time of the inner loop must be known before the outer loop is analyzed, and the inner loop is then treated as part of the body of the outer loop.
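As a small worked case of inside-out analysis, consider a double loop whose inner loop gets shorter as the outer index grows. The function `count_pairs` is a hypothetical example. For a fixed i the inner loop runs n - 1 - i times, and summing that over i = 0 .. n-1 gives n(n-1)/2, so the whole loop is O(n^2).

```python
def count_pairs(n):
    """Count executions of the innermost statement in a triangular double loop."""
    count = 0
    for i in range(n):                # outer loop: n iterations
        for j in range(i + 1, n):     # inner loop: n - 1 - i iterations
            count += 1
    return count


for n in (5, 10, 100):
    # The measured count agrees with the closed form n(n-1)/2.
    print(n, count_pairs(n), n * (n - 1) // 2)
```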
E. When the number of operations cannot be determined in advance.
In the cases above, the number of operations the algorithm performs can be determined. In some cases it cannot: for example, when searching a given set of records for a key, the search starts from the first record and may succeed on the first comparison, may succeed only on the last one, or may not find the key at all.
If the search fails, the number of operations is of course n (n is the number of records to be searched), and the time complexity is O(n). If the search succeeds, the average number of operations is needed, and the cost of the whole algorithm is represented by this average:
Average number of operations = total number of operations / number of records searched
This method is used in the analysis of searching and sorting algorithms.
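The averaging described above can be sketched for a sequential search. The function name, the sample records, and the assumption that every record is equally likely to be the one searched for are illustrative choices, not part of the original text.

```python
def sequential_search(records, key):
    """Return (index, comparisons); index is -1 when key is absent."""
    comparisons = 0
    for index, value in enumerate(records):
        comparisons += 1
        if value == key:
            return index, comparisons
    return -1, comparisons


records = [17, 4, 23, 8, 15, 42, 16, 9]
n = len(records)

# Unsuccessful search: all n records are compared.
_, failed = sequential_search(records, 99)

# Successful search: average over looking up every record exactly once.
total = sum(sequential_search(records, key)[1] for key in records)
average = total / n

print(failed, average)
```

With n = 8 distinct records the total is 1 + 2 + ... + 8 = 36 comparisons, so the average is 36 / 8 = 4.5, that is (n + 1) / 2, which is still O(n).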
Space complexity analysis of the algorithm
Space complexity is a measure of the amount of storage space an algorithm temporarily occupies while it is running. The storage an algorithm occupies in computer memory has three parts: the space occupied by the algorithm's own code, the space occupied by its input and output data, and the space it occupies temporarily while running.
The space occupied by the input and output data is determined by the size of the problem the algorithm solves, and it does not change from one algorithm to another.
The space occupied by the algorithm's own code is proportional to the length of the code; to reduce it, the code must be written more concisely.
The temporary space occupied while running differs from algorithm to algorithm. Some algorithms need a large amount of temporary storage and some need little; for a good algorithm, the temporary storage requested while running does not grow with the size of the input.
When analyzing the time complexity and space complexity of an algorithm, it is often impossible to optimize both: improving time complexity may require sacrificing space complexity, and vice versa. The two should therefore be weighed together, taking into account how often the algorithm is used and how much data it processes.
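The trade-off between time and temporary space can be illustrated with two ways of checking a list for duplicates. Both functions are hypothetical examples; the second assumes the values are hashable, and its O(n) running time relies on average-case set behaviour.

```python
def has_duplicates_low_space(values):
    """Compare every pair: O(n^2) time, but only a few extra variables."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False


def has_duplicates_low_time(values):
    """Remember seen values in a set: about O(n) time, O(n) temporary space."""
    seen = set()
    for value in values:
        if value in seen:
            return True
        seen.add(value)
    return False


data = [3, 1, 4, 1, 5]
print(has_duplicates_low_space(data), has_duplicates_low_time(data))
```

The input and output are the same in both versions; only the temporary storage (`seen`) and the number of comparisons differ, so which one is preferable depends on the data size and how often the check runs.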
That's all ~\(≧▽≦)/~
Preliminary understanding of data structure