Database knowledge points for Java interviews
1. Primary key, super key, candidate key, foreign key
Primary key:
A column or combination of columns whose value uniquely and completely identifies each data object stored in a database table. A table can have only one primary key, and a primary key value cannot be missing, i.e., it cannot be NULL.
Super key:
An attribute set that uniquely identifies a tuple in a relation is called a super key of the relational schema. A single attribute can be a super key, and a combination of multiple attributes can also form a super key. Super keys include candidate keys and primary keys.
Candidate key:
A minimal super key, i.e., a super key with no redundant attributes.
Foreign key:
A column in one table that refers to the primary key of another table is called a foreign key of that table.
2. The four characteristics of database transactions and their meanings
A database transaction (Transaction) must have four basic properties to execute correctly: ACID, i.e., Atomicity, Consistency, Isolation, and Durability.
Atomicity: all operations in the transaction either complete in full or do not complete at all; execution cannot stop partway through. If an error occurs while the transaction is executing, it is rolled back (Rollback) to the state before the transaction began, as if the transaction had never been executed.
Consistency: the database's integrity constraints are not violated before the transaction begins or after it ends.
Isolation: transactions execute in an isolated state, so each appears to be the only action the system performs during a given time. If two transactions run concurrently and perform the same function, isolation ensures that each behaves as though it were the only transaction in the system. This property is sometimes called serializability: to prevent transactions' operations from getting confused with each other, requests must be serialized, or serializable, so that at any moment only one request operates on the same data.
Durability: after a transaction completes, the changes it made to the database are persisted in the database and will not be rolled back.
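As a quick illustration of atomicity and rollback (a minimal T-SQL sketch; the account table is hypothetical):
BEGIN TRANSACTION
UPDATE account SET balance = balance - 100 WHERE id = 1
UPDATE account SET balance = balance + 100 WHERE id = 2
-- if anything goes wrong, ROLLBACK restores the state from before the transaction began:
-- ROLLBACK
COMMIT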
3. The function of views; can a view be updated?
A view is a virtual table. Unlike a table that contains data, a view contains only a query that dynamically retrieves data when the view is used; it does not itself store any columns or data. Views simplify complex SQL operations, hide implementation details, and protect data; once created, a view can be used in much the same way as a table.
A view cannot be indexed and cannot have associated triggers or default values. If data retrieved through a view is sorted with an ORDER BY, any ORDER BY inside the view definition is overridden.
Creating a view: CREATE VIEW xxx AS ...;
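For example, a minimal sketch (the employees table and its columns are hypothetical):
CREATE VIEW v_employee_brief AS SELECT id, name, dept_id FROM employees;
-- once created, the view can be queried like a table:
SELECT * FROM v_employee_brief WHERE dept_id = 1;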
Some views, such as those that use no joins, subqueries, grouping, aggregate functions, DISTINCT, or UNION, can be used to update the underlying base table. However, views are used primarily to simplify retrieval and protect data, not for updates, and most views are not updatable.
4. The difference between drop, delete and truncate
DROP deletes the table itself; TRUNCATE deletes the data in the table, and subsequently inserted rows restart the auto-increment ID from 1; DELETE deletes data from the table and can take a WHERE clause.
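As a quick contrast before the detailed points (a minimal sketch; table t is hypothetical):
DELETE FROM t WHERE id > 100   -- DML: row by row, logged, accepts WHERE, can be rolled back
TRUNCATE TABLE t               -- DDL: removes all rows at once, resets the identity counter, no rollback
DROP TABLE t                   -- DDL: removes the data and the table definition itself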
(1) The DELETE statement deletes one row from the table at a time and records each row's deletion in the log as a transaction record, allowing rollback. TRUNCATE TABLE deletes all data from the table at once without logging individual row deletions, so the deleted rows cannot be recovered. TRUNCATE also does not fire the DELETE triggers associated with the table, and it executes faster.
(2) Space occupied by tables and indexes: after TRUNCATE, the space occupied by the table and its indexes is restored to the initial size, whereas DELETE does not reduce the space occupied by the table or its indexes. DROP frees all space occupied by the table.
(3) In general, speed: DROP > TRUNCATE > DELETE.
(4) Scope of application: TRUNCATE can only operate on tables; DELETE can operate on tables and views.
(5) TRUNCATE and delete delete data only, and drop deletes the entire table (structure and data).
(6) TRUNCATE, and DELETE without a WHERE clause, delete only the data, not the table's structure (definition). DROP deletes the table's structure together with its constraints (constraints), triggers (triggers), and indexes; stored procedures and functions that depend on the table are preserved, but their state changes to invalid.
(7) The DELETE statement is DML (Data Manipulation Language); the operation is placed in the rollback segment and takes effect only after the transaction is committed. If a corresponding trigger exists, it fires on execution.
(8) TRUNCATE and DROP are DDL (Data Definition Language); the operations take effect immediately, the original data is not placed in the rollback segment, and they cannot be rolled back.
(9) Use DROP and TRUNCATE carefully when there is no backup. To delete some of the data rows, use DELETE with a WHERE clause to constrain the scope of the change, and make sure the rollback segment is large enough. To delete a table, use DROP. If you want to keep the table and delete its data, and the operation is unrelated to a transaction, use TRUNCATE; if it is related to a transaction, or you want triggers to fire, use DELETE.
(10) TRUNCATE TABLE is faster and more efficient because:
TRUNCATE TABLE is functionally the same as a DELETE statement without a WHERE clause: Both delete all rows in the table. However, TRUNCATE TABLE is faster than DELETE and uses less system and transaction log resources. The DELETE statement deletes one row at a time and records an entry in the transaction log for each row that is deleted. TRUNCATE table deletes data by releasing the data pages used to store the table data, and only records the release of the page in the transaction log.
(11) TRUNCATE TABLE deletes all rows in the table, but the table structure and its columns, constraints, indexes, and so on remain unchanged. The counter used for new row identity values is reset to the column's seed. If you want to keep the identity counter, use DELETE instead. If you want to remove the table definition along with its data, use the DROP TABLE statement.
(12) TRUNCATE TABLE cannot be used on tables referenced by a FOREIGN KEY constraint; use a DELETE statement without a WHERE clause instead. Also, because TRUNCATE TABLE is not logged row by row, it cannot activate triggers.
5. How indexes work and the kinds of indexes
A database index is a sorted data structure in the database management system that helps with fast querying and updating of data in database tables. Indexes are usually implemented with B-trees and their variant, the B+ tree.
Besides the data itself, the database system maintains data structures that satisfy specific lookup algorithms and reference the data in some way, so that advanced lookup algorithms can be applied to the data. These data structures are indexes.
There is a cost to indexing a table: first, indexes increase the database's storage space; second, inserting and modifying data takes more time (because the indexes must be updated as well).
The accompanying figure shows one possible indexing scheme. On the left is a data table with two columns and seven records; the leftmost value is the physical address of each record (note that logically adjacent records are not necessarily physically adjacent on disk). To speed up lookups on Col2, a binary search tree like the one on the right can be maintained: each node holds an index key value and a pointer to the physical address of the corresponding data record, so binary search can locate the data with O(log2 n) complexity.
Creating an index can greatly improve the performance of your system.
First, you can guarantee the uniqueness of each row of data in a database table by creating a unique index.
Second, you can greatly speed up the retrieval of data, which is the main reason to create indexes.
Third, you can speed up joins between tables, which is particularly meaningful for enforcing the referential integrity of the data.
Finally, when you use grouping and sorting clauses for data retrieval, you can also significantly reduce the time to group and sort in a query.
Finally, during query processing, the query optimizer can use indexes to improve system performance.
One might ask: if indexes have so many advantages, why not create an index on every column in the table? Because adding indexes also has many disadvantages.
First, it takes time to create and maintain indexes, and this time increases as the amount of data grows.
Second, indexes occupy physical space: besides the space taken by the data table itself, each index occupies a certain amount of physical space, and a clustered index requires even more.
Third, when data in the table is added, deleted, or modified, the indexes must be maintained dynamically, which slows down data maintenance.
Indexes are built on certain columns of a database table. When creating an index, you should therefore consider which columns deserve indexes and which do not. In general, indexes should be created on: columns that are frequently searched, to speed up searches; primary key columns, to enforce uniqueness and organize the arrangement of data in the table; columns frequently used in joins (mostly foreign keys), to speed up joins; columns that often need range searches, because an index is sorted and the specified range is contiguous; columns that often need sorting, so that queries can exploit the index's ordering to reduce sorting time; and columns frequently used in WHERE clauses, to speed up condition evaluation. (A short sketch follows.)
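For example (a minimal sketch; the orders table and its columns are hypothetical):
CREATE INDEX idx_orders_user_id ON orders (user_id)    -- frequent join column (foreign key)
CREATE INDEX idx_orders_created ON orders (created_at) -- frequent range searches and sorting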
Similarly, some columns should not be indexed. In general, columns that should not be indexed have the following characteristics:
First, columns that are rarely used or referenced in queries should not be indexed. Since these columns are seldom used, indexing them does not improve query speed; on the contrary, the extra indexes reduce the system's maintenance speed and increase its space requirements.
Second, columns with very few distinct values should not be indexed. Because such columns, like the gender column of a personnel table, have so few values, the rows matching a query form a large proportion of the rows in the table, and adding an index does not significantly speed up retrieval.
Third, columns defined with the text, image, or bit data types should not be indexed, because these columns either hold large amounts of data or have very few distinct values.
Fourth, indexes should not be created when modification performance matters far more than retrieval performance. The two are contradictory: adding indexes improves retrieval but degrades modification, while removing indexes improves modification but degrades retrieval. Therefore, when modification performance far outweighs retrieval performance, indexes should not be created.
Depending on the database's functionality, three types of indexes can be created in the Database Designer: unique indexes, primary key indexes, and clustered indexes.
Unique index
A unique index is one that does not allow any two rows to have the same index value.
Most databases refuse to save a newly created unique index with the table when duplicate key values already exist in the data, and may also prevent the insertion of new rows that would create duplicate key values. For example, if a unique index is created on the employee last-name column (lname) in the employee table, no two employees may share a surname.
Primary key index
Database tables often have a column, or combination of columns, whose value uniquely identifies each row in the table. This column is called the table's primary key. Defining a primary key for a table in a database diagram automatically creates a primary key index, a specific type of unique index that requires every value of the primary key to be unique. When a primary key index is used in a query, it also allows fast access to the data.
Clustered index
In a clustered index, the physical order of the rows in the table is the same as the logical (indexed) order of the key values. A table can contain only one clustered index.
If an index is not a clustered index, the physical order of the rows in the table does not match the logical order of the key values. Clustered indexes typically provide faster data access than nonclustered indexes.
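A minimal sketch of the three index types in SQL Server syntax (table and column names hypothetical):
CREATE UNIQUE INDEX ux_employee_lname ON employee (lname)  -- unique index: no two rows may share lname
ALTER TABLE employee ADD CONSTRAINT pk_employee PRIMARY KEY NONCLUSTERED (id)  -- primary key index (a special unique index)
CREATE CLUSTERED INDEX cx_employee_hiredate ON employee (hire_date)  -- clustered index: rows stored physically in this order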
The principle of locality and disk prefetching
Because of the nature of the storage medium, disk access is much slower than main-memory access; combined with the cost of mechanical movement, disk access speed is often only around a hundredth that of main memory. Therefore, to improve efficiency, disk I/O must be minimized. To this end, disks are usually not read strictly on demand: on each read, even if only a single byte is needed, the disk starts from that position and sequentially reads data of a certain length into memory. The rationale is the famous principle of locality in computer science: when a piece of data is used, nearby data is usually used immediately afterward, and the data required during program execution tends to be concentrated.
Because sequential disk reads are very efficient (no seek time, only a small amount of rotational latency), prefetching improves I/O efficiency for programs with good locality.
The prefetch length is typically an integer multiple of the page size. A page is a logical block of computer memory management: hardware and operating systems often split main memory and disk storage into contiguous blocks of equal size, each called a page (in many operating systems, the page size is typically 4KB), and main memory and disk exchange data in units of pages. When the data a program wants to read is not in main memory, a page fault is triggered; the system sends a read signal to the disk, the disk finds the starting position of the data and sequentially reads one or more pages back into memory, then the exception handler returns and the program continues to run.
Performance analysis of B-/B+ tree indexes
Here we can finally analyze the performance of the B-/B+ tree index.
As mentioned above, index structures are generally evaluated by the number of disk I/Os. Starting with the B-tree: by the definition of a B-tree, a lookup visits at most h nodes. The designers of database systems cleverly exploit the disk prefetching principle by setting the size of one node equal to one page, so that each node can be fully loaded with a single I/O. To achieve this, the actual implementation of the B-tree uses the following technique:
Each time a new node is created, a full page of space is requested directly, so that each node is also physically stored within a single page; together with page-aligned storage allocation, this ensures that loading a node requires only one I/O.
A single lookup in a B-tree therefore requires at most h-1 I/Os (the root node is resident in memory), with asymptotic complexity O(h) = O(log_d N). In practical applications, the out-degree d is a very large number, usually more than 100, so h is very small (usually no more than 3); for example, with d = 100 and N = 1,000,000 records, h is about log_100(1,000,000) = 3.
For a structure such as the red-black tree, h is obviously much larger. Because logically adjacent nodes (parent and child) may be physically far apart, locality cannot be exploited; so although the red-black tree's I/O complexity is also O(h), its h is significantly larger, making it markedly worse than the B-tree.
To sum up, using a B-tree as an index structure is highly efficient.
6. Types of joins
Execution in Query Analyzer:
-- create tables table1 and table2:
CREATE TABLE table1 (id int, name varchar(10))
CREATE TABLE table2 (id int, score int)
INSERT INTO table1 SELECT 1, 'lee'
INSERT INTO table1 SELECT 2, 'zhang'
INSERT INTO table1 SELECT 4, 'wang'
INSERT INTO table2 SELECT 1, 90
INSERT INTO table2 SELECT 2, 100
INSERT INTO table2 SELECT 3, 70
The resulting tables:
table1            table2
id   name         id   score
1    lee          1    90
2    zhang        2    100
4    wang         3    70
The following are performed in Query Analyzer
One. Outer joins
1. Concept: outer joins include left outer joins, right outer joins, and full outer joins.
2. Left outer join: LEFT JOIN or LEFT OUTER JOIN
(1) The result set of a left outer join includes all rows of the left table specified in the LEFT OUTER clause, not just the rows matched by the join columns. If a row in the left table has no matching row in the right table, all select-list columns from the right table are null (NULL) in the associated result row.
(2) SQL statement
SELECT * FROM table1 LEFT JOIN table2 ON table1.id = table2.id
------------- Results -------------
id   name    id     score
1    lee     1      90
2    zhang   2      100
4    wang    NULL   NULL
Remarks: includes all rows of table1 and returns the corresponding table2 fields according to the specified condition; non-matching rows display NULL.
3. Right outer join: RIGHT JOIN or RIGHT OUTER JOIN
(1) A right outer join is the reverse of a left outer join: all rows from the right table are returned. If a row in the right table has no matching row in the left table, NULL is returned for the left table's columns.
(2) SQL statement
SELECT * FROM table1 RIGHT JOIN table2 ON table1.id = table2.id
------------- Results -------------
id     name    id   score
1      lee     1    90
2      zhang   2    100
NULL   NULL    3    70
Remarks: includes all rows of table2 and returns the corresponding table1 fields according to the specified condition; non-matching rows display NULL.
4. Full outer join: FULL JOIN or FULL OUTER JOIN
(1) A full outer join returns all rows from both the left and right tables. When a row has no matching row in the other table, the other table's select-list columns contain NULL. When rows match, the result row contains the data values from both base tables.
(2) SQL statement
SELECT * FROM table1 FULL JOIN table2 ON table1.id = table2.id
------------- Results -------------
id     name    id     score
1      lee     1      90
2      zhang   2      100
4      wang    NULL   NULL
NULL   NULL    3      70
Comments: returns the union of the left and right join results (see the left and right joins above).
Two. Inner joins
1. Concept: an inner join uses a comparison operator to compare the values of the columns being joined.
2. Inner join: JOIN or INNER JOIN
3. SQL statement
SELECT * FROM table1 JOIN table2 ON table1.id = table2.id
------------- Results -------------
id   name    id   score
1    lee     1    90
2    zhang   2    100
Remarks: returns only the rows of table1 and table2 that satisfy the join condition.
4. Equivalent forms (the following produce the same execution result):
A: SELECT a.*, b.* FROM table1 a, table2 b WHERE a.id = b.id
B: SELECT * FROM table1 CROSS JOIN table2 WHERE table1.id = table2.id (note: after CROSS JOIN, the condition can only be specified with WHERE, not ON)
Three. Cross joins (complete)
1. Concept: a cross join without a WHERE clause produces the Cartesian product of the tables involved in the join. The number of rows in the first table multiplied by the number of rows in the second table equals the size of the Cartesian product result set. (table1 cross joined with table2 produces 3*3 = 9 records.)
2. Cross join: CROSS JOIN (without a WHERE ... condition)
3. SQL statement
SELECT * FROM table1 CROSS JOIN table2
------------- Results -------------
id   name    id   score
1    lee     1    90
2    zhang   1    90
4    wang    1    90
1    lee     2    100
2    zhang   2    100
4    wang    2    100
1    lee     3    70
2    zhang   3    70
4    wang    3    70
Note: returns 3*3 = 9 records, i.e., the Cartesian product.
4. Equivalent form (same execution result as the following):
A: SELECT * FROM table1, table2
7. Database normal forms
1. First normal form (1NF)
In any relational database, the first normal form (1NF) is the basic requirement for a relational schema; a database that does not satisfy 1NF is not a relational database.
The first normal form (1NF) means that every column of a database table is an indivisible atomic data item, and the same column cannot hold multiple values; that is, an attribute of an entity cannot have multiple values or repeated attributes. If a repeated attribute appears, a new entity may need to be defined, composed of the repeated attributes, with a one-to-many relationship between the new entity and the original one. In 1NF, each row of the table contains information about only one instance. In short, the first normal form means no repeating columns.
2. Second normal form (2NF)
The second normal form (2NF) builds on the first: to satisfy 2NF, a table must first satisfy 1NF. 2NF requires that every instance or row in a database table be uniquely distinguishable. To achieve this, it is usually necessary to add a column that stores a unique identifier for each instance. This unique column is called the primary keyword, primary key, or primary code.
2NF also requires that an entity's attributes depend fully on the primary key. Full dependency means that no attribute may depend on only part of the primary key; if such an attribute exists, it and that part of the key should be separated out to form a new entity, with a one-to-many relationship between the new entity and the original one. In short, the second normal form means non-key attributes depend fully on the primary key.
3. Third normal form (3NF)
Satisfying the third normal form (3NF) requires first satisfying the second normal form (2NF). In short, 3NF requires that a database table not contain non-primary-key information already held in other tables. For example, suppose there is a department table in which each department has a department number (dept_id), department name, department profile, and so on. Once the department number is listed in the employee table, the department name, profile, and other department-related information should not be added to the employee table as well. If the department table did not exist, it should be built according to 3NF; otherwise there would be a large amount of data redundancy. In short, the third normal form means attributes do not depend on other non-key attributes. (My understanding: eliminate transitive redundancy.)
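A minimal sketch of the department example above (table and column names hypothetical):
CREATE TABLE department (
    dept_id int PRIMARY KEY,
    dept_name varchar(50),
    dept_profile varchar(200)
)
CREATE TABLE employee (
    emp_id int PRIMARY KEY,
    emp_name varchar(50),
    dept_id int REFERENCES department(dept_id)  -- store only the department number; no duplicated department details
)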
8. Ideas for database optimization
This comes from a MOOC course on database optimization.
1. SQL statement optimization
1) Avoid using the != or <> operators in a WHERE clause; otherwise the engine abandons the index and performs a full table scan.
2) Avoid testing fields for NULL in a WHERE clause; otherwise the engine abandons the index and performs a full table scan. For example:
SELECT id FROM t WHERE num IS NULL
You can set a default value of 0 on num to ensure the num column contains no NULLs, and then query like this:
SELECT id FROM t WHERE num = 0
3) In many cases, using EXISTS instead of IN is a good choice.
4) Replace the HAVING clause with a WHERE clause where possible, because HAVING filters the result set only after all records have been retrieved (a short sketch of both points follows).
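For example (a minimal sketch; tables t and t2 and column num are hypothetical):
-- 3) EXISTS instead of IN:
SELECT id FROM t WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.id = t.id)
-- rather than: SELECT id FROM t WHERE id IN (SELECT id FROM t2)
-- 4) filter with WHERE before grouping rather than with HAVING afterwards:
SELECT num, COUNT(*) FROM t WHERE num > 0 GROUP BY num
-- rather than: SELECT num, COUNT(*) FROM t GROUP BY num HAVING num > 0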
2. Index optimization
See the discussion of indexes above (section 5).
3. Database structure optimization
1) Normalization optimization: eliminate redundancy (for example, to save space).
2) Denormalization optimization: add appropriate redundancy (to reduce joins).
3) Table partitioning: partitions separate the data physically, and data in different partitions can be stored in data files placed on different disks. Querying the table then scans only the relevant partition instead of the whole table, which significantly shortens query time; placing partitions on different disks also spreads the table's data transfer across different disk I/O channels, so a carefully configured set of partitions can evenly distribute disk I/O contention. This method suits time-based tables with large data volumes; table partitions can be created automatically by month (see the partitioning sketch after point 4 below).
4) Splitting: in fact, vertical splitting and horizontal splitting. Case study: a simple shopping system involves the following tables: 1. product table (100k rows, stable), 2. order table (2 million rows, growing), 3. user table (1 million rows, growing). Using MySQL as an example (MySQL comfortably handles static data up to millions of rows):
Vertical splitting. Problem solved: I/O contention between tables. Problem not solved: growth of the data volume within a single table. Scheme: put the product table and the user table on one server, and the order table on a separate server.
Horizontal splitting. Problem solved: growth of the data volume within a single table. Problem not solved: I/O contention between tables. Scheme: split the user table into a male-user table and a female-user table; split the order table into completed and uncompleted orders; put the product table and the completed-order table on one server, the uncompleted-order table on another server; put the male-user table on one server and the female-user table on another (women love shopping, haha).
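As an illustration of table partitioning from point 3) above, a minimal sketch of monthly range partitioning in MySQL (table and column names hypothetical):
CREATE TABLE order_log (
    id int NOT NULL,
    created date NOT NULL
)
PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION pmax VALUES LESS THAN MAXVALUE
)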
4. Server hardware optimization
That costs a lot of money.
9. The difference between stored procedures and triggers
Triggers are very similar to stored procedures: a trigger is also a set of SQL statements. The only difference is that a trigger cannot be invoked with the EXECUTE statement; instead it fires (activates) automatically when a user executes a Transact-SQL statement. A trigger is a stored procedure that executes when data in a specified table is modified. Triggers are commonly created to enforce referential integrity and consistency among logically related data in different tables. Because users cannot bypass triggers, they can be used to enforce complex business rules and ensure data integrity. Triggers differ from stored procedures in that triggers run mainly in response to events, whereas stored procedures are invoked directly by name. When operations such as UPDATE, INSERT, or DELETE are performed on a table, SQL Server automatically executes the statements defined by the trigger, ensuring that the processing of the data conforms to the rules defined by those SQL statements.
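A minimal sketch contrasting the two (SQL Server syntax; table and procedure names hypothetical):
-- a stored procedure is invoked explicitly by name:
-- EXECUTE usp_recalc_totals
-- a trigger fires automatically when data in the table is modified:
CREATE TRIGGER trg_order_delete ON orders
AFTER DELETE
AS
BEGIN
    -- keep logically related data consistent when order rows are removed
    DELETE FROM order_items WHERE order_id IN (SELECT id FROM deleted)
END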