A personal understanding and summary of data structures
Many programmers quietly assume that data structures are useless, because nothing in their daily work seems to involve them. I can only smile at this view, though it is not surprising. Most of the peers who hold it got through their work the same way: they mastered a few common web frameworks, such as SSH, and then piled up API calls to write simple create, read, update, and delete code against a database. Once the feature works, almost nobody cares whether the design is correct, efficient, or elegant, and this kind of work rarely offers a chance to solve a problem with data structure knowledge. In my opinion, someone who works this way is not yet a real software developer, or at least has not gone very deep, because data structures are the most fundamental theoretical basis of software development.
1. Why data structures are important
All software is developed for one purpose: to use computers to process data for people and to present it to users in some form. As computers are applied more widely, the data they must handle grows more complex; it is no longer purely numerical, and more and more of it is non-numerical.
On the other hand, real-world data is not disorganized; its items are connected by all kinds of intrinsic relationships. The algorithm designer has to summarize, generalize, and model these relationships and abstract them into a concrete model, which we call the logical structure of the data. The designer then devises a set of processing methods around that logical structure. With the data, the model, and the algorithm in place, the problem can in theory be solved, and the rest should be left to the computer. But everything so far is logical design, which the computer does not understand. So a corresponding storage structure is needed to store the data in the computer first, and then the processing logic (the algorithm) is implemented in code, so that the computer can process the data efficiently.
2. What is a data structure
Data structure: the representation and implementation, on a computer, of the mathematical model (for non-numerical computation) that describes real-world entities, together with the operations defined on that model.
Put another way, a data structure is a group of data that has a certain logical structure, a storage structure that holds this data in the computer, and a set of operations defined on the data.
3. Data type
Data type: a data type defines the nature of the data it covers, the range of values, and the operations that can be performed on that data. Every piece of data in a program belongs to a data type, which determines its nature and the operations allowed on it. Data is also protected by its type, which ensures that it cannot be manipulated illegally.
High-level programming languages usually predefine some basic data types and constructed data types. A value of a basic data type is single and cannot be decomposed, and it can take part directly in the operations the type allows. A constructed data type is a more complex type built, according to certain syntax rules, from existing basic data types and previously defined constructed types. A value of a constructed type consists of several elements organized in some structure. In Java, the basic data types are the integer types, the floating-point types, the character type, and the Boolean type; the constructed data types (called reference types) are arrays, classes, and interfaces.
Abstract data type (ADT): a mathematical model together with a set of operations defined on that model. Abstract data types and data types are essentially the same concept; both express the abstract characteristics of data. Data abstraction means separating definition from implementation, that is, separating the logical meaning of a type's data and operations from their concrete implementation. The data types a programming language provides are abstract in this sense: they describe only the characteristics of the data and the syntax rules for operating on it, not how the types are implemented. The language implements the operations of its predefined types, and programmers use those types according to the language's rules, thinking only about what to do with the data, not about how the operations are carried out. For its users, a data type hides information: every detail the user does not need to know is encapsulated inside the type. For example, an integer type in Java is an abstract data type in this sense, and the programmer does not need to know how it is implemented in order to use it.
On the other hand, the notion of an abstract data type is broader. It is not confined to the data types defined and implemented by the programming language (which can be called intrinsic data types); it also includes the data types programmers define themselves when designing software systems.
As mentioned earlier, the definition of an abstract data type consists of a value domain and a set of operations defined on that domain. Programmers usually need to define their own abstract data types for abstract logical structures such as linear lists, stacks, queues, strings, generalized lists, binary trees, trees, and graphs. An abstract data type describes the logical characteristics and operations of a data structure, independent of how that structure is stored and implemented in the computer.
The three elements of an abstract data type: data objects, data relations, and basic operations.
In practice, programmers must implement a custom abstract data type before they can use it (and before implementing it, they should define the node type for the data elements that make up the type). The implementation of an abstract data type depends on the storage structure of the data. For example, a linear list can be implemented with either a sequential storage structure or a chained storage structure.
In Java, an interface is typically used to describe an abstract data type, and a class that implements the interface provides the operations the abstract data type describes.
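The interface-plus-class pattern can be sketched as follows. This is only a minimal illustration: the names SimpleStack and ArraySimpleStack are hypothetical, and the array-backed implementation is one assumed choice among several possible storage structures.

```java
// Hypothetical ADT: the interface states what a stack can do, not how.
interface SimpleStack<T> {
    void push(T item);   // add an element on top
    T pop();             // remove and return the top element
    boolean isEmpty();   // true when the stack holds no elements
}

// One possible implementation, backed by a sequential (array) storage structure.
class ArraySimpleStack<T> implements SimpleStack<T> {
    private Object[] items = new Object[4];
    private int size = 0;

    @Override
    public void push(T item) {
        if (size == items.length) {
            // Grow the backing array when it is full.
            items = java.util.Arrays.copyOf(items, size * 2);
        }
        items[size++] = item;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        T top = (T) items[--size];
        items[size] = null; // drop the reference so it can be garbage collected
        return top;
    }

    @Override
    public boolean isEmpty() {
        return size == 0;
    }
}
```

Code that depends only on SimpleStack would keep working if the class were later replaced by a linked implementation, which is the point of separating the definition from the implementation.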
4. Logical structure of data
The logical structure of data refers to the logical relationships between data elements. It is represented by a set of data elements (containing n >= 0 elements) and a number of relations defined on that set, and it is often simply called the data structure.
The logical structure of data is independent of the computer and of how the data is stored in it. Logical structures fall into four kinds: set, linear structure, tree structure, and graph structure. More specifically, linear lists, stacks, queues, strings, generalized lists, trees, and graphs are all abstract logical structures of real-world entities.
Data element: the basic unit of data, usually considered and processed as a whole in a computer program. A data element can be an indivisible atomic item, or it can consist of several data items. A data item is the smallest indivisible unit of data that has an independent meaning within its element. For example, an integer or a character is an atomic item, while a student data element consists of several data items, such as student number, name, gender, and date of birth. When data is stored in the computer, a number of bits are combined into a bit string that represents one data element (for example, a bit string one machine word long represents an integer, and an 8-bit binary number represents a character). Such a bit string is usually called an element or a node. When a data element consists of several data items, the substring of the bit string that corresponds to each data item is called a data field. In programming languages such as Java, a data element is usually described by a class.
In C and C++, linked lists are implemented with pointers. Because Java does not provide pointers, some people believe linked lists cannot be implemented in Java. In fact, a linked list is easier to implement in Java than in C or C++. An object reference in Java is in effect a pointer (in this article, "pointer" is meant conceptually, not as a data type provided by the language), so we can write the following class to implement the nodes of a linked list.
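class Node
{
    Object data; // payload; declared as Object so any class instance can be stored
    Node next;   // reference to the next node, or null at the end of the list
}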
The data field is declared as Object because Object is the superclass of every class, so an instance of any class can be assigned to it, which makes the code more general. To make a linked list accessible, you also need to define a list header, which must contain a pointer to the first node and a pointer to the current node. To make it easier to append nodes at the end of the list, you can also add a pointer to the tail, and you can keep a field for the size of the list, so that a caller who wants the size does not have to traverse the whole list.
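A list header along these lines might look like the sketch below. The class name SimpleLinkedList and its methods are hypothetical, and for brevity the sketch keeps only the head pointer, the tail pointer, and the size field; the current-node pointer mentioned above is left out.

```java
// Hypothetical list header holding head, tail, and a cached size.
class SimpleLinkedList {
    // Same shape as the Node class in the text, nested to keep the sketch self-contained.
    private static class Node {
        Object data;
        Node next;
        Node(Object data) { this.data = data; }
    }

    private Node head; // first node, or null when the list is empty
    private Node tail; // last node, so appending needs no traversal
    private int size;  // cached element count

    // Appends a value at the end of the list in constant time.
    void addLast(Object value) {
        Node node = new Node(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // Walks from the head to the node at the given position.
    Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        Node current = head;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current.data;
    }

    // Returns the cached size without traversing the list.
    int size() {
        return size;
    }
}
```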
5. Storage structure of data
To process data, however, you must first store it in the computer. If the data is stored in the computer without any order, processing it becomes very difficult and the data is practically useless. Imagine an English-Chinese dictionary whose words are arranged at random: who would use it?
The representation of a data structure in the computer (also called its image) is called the physical structure or storage structure. It includes the representation of the data elements and the representation of the relationships between them.
5.1 The four basic storage structures of data:
(1) Sequential storage: logically adjacent nodes are stored in a contiguous block of storage cells, so nodes that are logically adjacent are also physically adjacent, and the elements are stored in memory in their logical order. Sequential storage is typically used for data with a linear structure and is usually implemented with arrays in high-level programming languages.
(2) Chained storage: data elements are stored in storage cells whose addresses may be scattered, so logically adjacent elements are not necessarily physically adjacent, and the relationships between elements must be recorded with additional information. Usually a pointer variable records the storage address of the predecessor or successor element. A node made up of a data field and an address field represents one data element, and the links formed by the address fields express the logical relationships between the data elements.
(3) Index storage: in a linear structure, if the index number of the first node is 1 and the index number of every other node equals the index number of its predecessor plus 1, then each node has a unique index number, and the storage address of a node is determined from its index number.
(4) Hash storage: the idea is to construct a function h from the set of keys K to the storage area M, so that for each node with key ki in K, its storage address in the computer is determined by h(ki).
Sequential storage and chained storage are the two most basic and most common storage structures. Combining them makes it possible to build more complex storage structures.
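As an illustration of hash storage, the sketch below maps each key to a slot of an array through a function h. The class name HashStorageSketch is hypothetical, and resolving collisions by linear probing is an assumption made for the example; the text above does not say how collisions are handled.

```java
// Hypothetical sketch of hash storage: h maps a key to an address in the storage area.
class HashStorageSketch {
    private final String[] slots; // the storage area M

    HashStorageSketch(int capacity) {
        slots = new String[capacity];
    }

    // h: K -> M, here the key's hash code reduced to a valid array index.
    int h(String key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    // Stores the key at h(key), or at the next free slot (linear probing, an assumption).
    boolean put(String key) {
        int start = h(key);
        for (int i = 0; i < slots.length; i++) {
            int index = (start + i) % slots.length;
            if (slots[index] == null || slots[index].equals(key)) {
                slots[index] = key;
                return true;
            }
        }
        return false; // the storage area is full
    }

    // Looks the key up by probing from h(key) until it is found or an empty slot is reached.
    boolean contains(String key) {
        int start = h(key);
        for (int i = 0; i < slots.length; i++) {
            int index = (start + i) % slots.length;
            if (slots[index] == null) {
                return false;
            }
            if (slots[index].equals(key)) {
                return true;
            }
        }
        return false;
    }
}
```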
5.2 Storage structures of some common data structures (i.e. logical structures)
Storage of linear lists: a linear list can use a sequential or a chained storage structure. A linear list with sequential storage is called a sequential list; it usually stores its data elements in an array, and the physical order of the elements in the array is exactly the order of the elements in the linear list. A linear list with chained storage is called a linked list.
Storage of stacks: both sequential and chained storage are used, giving sequential stacks and linked stacks respectively.
Storage of queues: queues generally use chained storage; a circular queue can be stored sequentially.
Storage structure of arrays: sequential storage; a two-dimensional array is stored either in row-major order or in column-major order.
An array is a sequentially stored, random-access structure that occupies a contiguous block of storage cells. Elements are identified by subscript, and the address of an element is a linear function of its subscript. In programming languages, arrays are implemented as a constructed data type, whereas integer types, character types, and so on are basic data types; a variable of a basic type holds a single value and is called a simple variable. In practice it is often necessary to process a batch of data of the same nature. For example, handling the test scores of 100 students with simple variables would take 100 different variables, which is very inconvenient; in Java, for example, an array can store such a batch. Once an array has been allocated its storage space, the address and length of that space are fixed and cannot be changed. An array therefore supports only two random-access operations, assigning a value to an element and reading the value of an element; it does not support insertion or deletion.
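The statement that an element's address is a linear function of its subscripts can be made concrete with a small sketch. The class and method names are hypothetical, and offsets are counted in elements from the start of the array, with zero-based subscripts.

```java
// Hypothetical helpers computing the offset of element (row, col) in a flattened 2-D array.
class ArrayLayoutSketch {
    // Row-major order: rows are stored one after another.
    static int rowMajorOffset(int row, int col, int cols) {
        return row * cols + col;
    }

    // Column-major order: columns are stored one after another.
    static int columnMajorOffset(int row, int col, int rows) {
        return col * rows + row;
    }

    // Flattens a rectangular matrix into a one-dimensional array in row-major order.
    static int[] flattenRowMajor(int[][] matrix) {
        int rows = matrix.length;
        int cols = matrix[0].length;
        int[] flat = new int[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                flat[rowMajorOffset(r, c, cols)] = matrix[r][c];
            }
        }
        return flat;
    }
}
```

In both layouts the offset depends linearly on the subscripts, which is why any element can be reached in constant time.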
Storage structures of strings:
1. Fixed-length sequential storage: like the sequential storage structure of a linear list, the character sequence of the string value is stored in a contiguous block of storage cells (for example, in a one-dimensional array).
2. Heap-allocated storage: the character sequence is still stored in a contiguous block of storage cells, but the storage space is allocated dynamically while the program runs.
3. Block-linked storage: a linked list can also be used to store the string value.
You should master arrays, character arrays, string arrays, ArrayList (dynamic array), and the LinkedList, String, StringBuffer, and StringBuilder classes, and know how to use them.
Storage structure of matrices: in high-level languages, a two-dimensional array is commonly used to store the elements of a matrix. In matrix storage, what interests us is not the matrix itself but how to store its elements so that the various matrix operations can be carried out efficiently. Matrices are mathematical objects that are often studied in scientific and engineering computing.
Storage structure of generalized lists: a chained storage structure is usually used.
Common storage structures of trees: the parent representation, the child linked-list representation, the child-sibling representation (also known as the binary linked list representation), and so on. These are all different forms of chained storage.
Common storage structures of binary trees: a sequential storage structure, which suits complete binary trees, and two chained storage structures, the binary linked list and the ternary linked list.
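Sequential storage suits a complete binary tree because the parent-child relationships can be computed from array positions alone. The sketch below assumes the nodes are stored in level order with the root at index 0; the class name and methods are hypothetical.

```java
// Hypothetical sketch: a complete binary tree stored in an array in level order.
class CompleteBinaryTreeSketch {
    private final String[] nodes; // index 0 is the root

    CompleteBinaryTreeSketch(String... levelOrder) {
        nodes = levelOrder.clone();
    }

    static int leftIndex(int i)   { return 2 * i + 1; }
    static int rightIndex(int i)  { return 2 * i + 2; }
    static int parentIndex(int i) { return (i - 1) / 2; } // meaningful only for i > 0

    // Returns the node at index i, or null when the index falls outside the tree.
    String nodeAt(int i) {
        return (i >= 0 && i < nodes.length) ? nodes[i] : null;
    }

    String leftChild(int i)  { return nodeAt(leftIndex(i)); }
    String rightChild(int i) { return nodeAt(rightIndex(i)); }
}
```

No link fields are stored at all. For a tree that is far from complete, the same scheme would leave many array slots empty, which is why chained storage is usually preferred there.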
Common storage structures of graphs: the adjacency matrix (array representation), the adjacency list, the orthogonal list, and the adjacency multilist. The adjacency list, the orthogonal list, and the adjacency multilist are different chained storage methods for graphs.
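To compare the array representation with a chained one, the sketch below builds both an adjacency matrix and an adjacency list from the same edge set. It assumes a directed graph whose vertices are numbered from 0; the class and method names are hypothetical.

```java
// Hypothetical sketch: two storage structures for the same directed graph.
class GraphStorageSketch {
    // Adjacency matrix: matrix[u][v] is true when there is an edge u -> v.
    static boolean[][] toAdjacencyMatrix(int vertexCount, int[][] edges) {
        boolean[][] matrix = new boolean[vertexCount][vertexCount];
        for (int[] edge : edges) {
            matrix[edge[0]][edge[1]] = true;
        }
        return matrix;
    }

    // Adjacency list: lists.get(u) holds the targets of the edges leaving u.
    static java.util.List<java.util.List<Integer>> toAdjacencyList(int vertexCount, int[][] edges) {
        java.util.List<java.util.List<Integer>> lists = new java.util.ArrayList<>();
        for (int i = 0; i < vertexCount; i++) {
            lists.add(new java.util.LinkedList<>());
        }
        for (int[] edge : edges) {
            lists.get(edge[0]).add(edge[1]);
        }
        return lists;
    }
}
```

The matrix always needs space proportional to the square of the vertex count, while the adjacency list grows with the number of edges, so the list is usually the better fit for sparse graphs.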
5.3 Operations on data (the operation set)
Data operations are the various operations or processing applied to the data elements in a structure. Each structure has its own set of data operations. For a given batch of data, the operations are defined on the logical structure of the data, while their concrete implementation depends on the storage structure.
In general, data operations include insertion, deletion, update, retrieval, traversal, sorting, and so on.
Basic housekeeping: initializes a structure, tests whether it is empty, and counts its elements.
Insert: adds a new node to a structure.
Delete: removes a node from a structure.
Update: modifies the node at a given position in a structure.
Retrieve: finds a node in a structure that satisfies a condition.
Output: prints the values of all nodes in a structure.
Sort: rearranges all the nodes in a structure in some order.
Traverse: visits all elements in a certain order, with each element visited exactly once.
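Several of these operations can be sketched on a sequential list to show how the implementation depends on the storage structure. The class SequentialListSketch is hypothetical and stores int values in an array purely for illustration; on a linked list the same operations would be implemented quite differently.

```java
// Hypothetical sketch: common operations implemented on array (sequential) storage.
class SequentialListSketch {
    private int[] data = new int[4];
    private int size = 0;

    boolean isEmpty() { return size == 0; }
    int size() { return size; }

    // Insert: shift the elements from index onward one slot to the right.
    void insert(int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        if (size == data.length) {
            data = java.util.Arrays.copyOf(data, size * 2);
        }
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
        size++;
    }

    // Delete: shift the elements after index one slot to the left.
    int delete(int index) {
        checkIndex(index);
        int removed = data[index];
        System.arraycopy(data, index + 1, data, index, size - index - 1);
        size--;
        return removed;
    }

    // Update: overwrite the element at a known position.
    void update(int index, int value) {
        checkIndex(index);
        data[index] = value;
    }

    // Retrieve: position of the first element equal to value, or -1 if absent.
    int indexOf(int value) {
        for (int i = 0; i < size; i++) {
            if (data[i] == value) {
                return i;
            }
        }
        return -1;
    }

    // Traverse: visit every element exactly once, in order, collecting them into text.
    String traverse() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < size; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(data[i]);
        }
        return sb.toString();
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
    }
}
```

Insertion and deletion here move elements, so they take time proportional to the list length, whereas update by position is constant time.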
References: "Data Structures (C Language Edition)", Yan Weimin; "Data Structures (Java Edition)", Ye Heya.