Database interview question 2

Last Update:2018-12-04 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

1. Connection Query

Equijoin

When the join operator is =, it is called an equivalent join. Other operators are called non-equivalent connections.

Select student. *, SC .*

From student, SC

Where student. Sno = SC. SnO

The SNO reoccurs in the result.

Natural connection

If duplicate attribute columns are removed from the equijoin, it is a natural connection.

Select student. Sno, sname, ssex, sage, sdept, CNO, grade

From student, SC

Where student. Sno = SC. SnO

2 outer connections

In a common connection operation, only the tuples that meet the connection conditions can be output as results. In Example 33, the result table contains no information about students 200215123 and 200215125, because they did not select courses and the table does not have the corresponding tuples in SC, these tuples in student are discarded during connection.

Sometimes you want to use the student table as the subject to list the basic information of each student and the course selection information. If a student does not select a course, the discarded student is still saved in the result relation, and null is left in the SC table attribute. In this case, an external connection is required.

Select student. Sno, sname, ssex, sage, sdept, CNO, grade

From student left out join SC on (student. Sno = SC. SnO)

The left Outer Join lists all the tuples in the left Link (student in this example), and the right Outer Join lists all the tuples in the right link.

Connection Type nodes correspond to connection operations in relational algebra. PostgreSQL defines the following connection types (taking T1 join T2 as an example ):

1) innerjoin: inner join. All the tuples in T1 and all the tuples in T2 that meet the connection conditions are connected.

2) leftouter join: Based on the left join and internal join, a null value is used to connect T1 tuples that cannot be connected to T2.

3) rightouter join: Right join. Based on the internal join, a null value is used to connect T2 tuples that cannot connect T1 tuples.

4) fullouter join: based on the full outer join and internal join, for T1 tuples that cannot be connected to t2 tuples, and T2 tuples that cannot be connected to T1 tuples, use a null-value tuples to connect to them.

5) semijoin: similar to the in operation, when a t1-tuple can find a tuple that meets the connection conditions in t2. This t1-tuple is returned, but it is not connected to the matching t2-tuple.

6) antijoin: type not in operation. When a t1-tuple does not find a tuple that meets the connection conditions in T2. the connection between the t1-tuple and the null tuple is returned.

PostgreSQL implements three connection operations: nest loop, merge join, and hash join ). The merge join algorithm can implement the preceding six connections, while nested loop connections and hash connections can only implement inner join, left Outer Join, semi join, and Anti join connections.

4. PostgreSQL has three connection algorithms. What are their application scenarios and the formula for cost estimation?

Nested loop connection (nested loops ),
Merge join (merge ),
Application of hash join

1. Applicability of nested loop connections (nested loops)
Two tables: An External table and an internal table.

If the external input is very small and the internal input is very large and an index has been created in advance, nested loop join will be particularly efficient.

For which table is the outer table and the inner table during the connection, I found that SQL server will automatically arrange it for you, and it has nothing to do with the position you write, it automatically selects a table with a small amount of data as an outer table, and a table with a large amount of data as an inner table.

2. Merge join (merge)
It means that both tables have indexes in the on filter conditions and are ordered. In this way, SQL server uses merge join for join, which improves performance.

If an index exists and no index exists, the nested loops join is selected.

3. Hash join)
If neither of the two tables has an index in the on filter condition, hash join is used.

That is to say, the hash join algorithm is used because of the lack of existing indexes.

Hash connections are often used to connect large datasets. The optimizer uses two small tables and uses the connection key to create a hash in the memory, scan a large table and detect the hash to find the rows matching the hash.

This method applies when a small table can be fully put into the memory, so that the cost is the sum of the costs of accessing the two tables. However, when the table is large, it cannot be completely put into the memory. In this case, the optimizer splits it into several different partitions, the partition cannot be written into the temporary segment of the disk if it is not in the memory.

Hash connections can only be used for equivalent connections (such as where. col3 = B. col4), non-equivalent join (where. col3> B. col4), external connection (where. col3 = B. col4 (+ )).

Connection mode application scenarios:

1. Hash connections are only applicable to equivalent connections.

2. nested loops are row-source connections and are only suitable for small data connections.

Hash connections and sort merge connections are set connections, which are suitable for massive data connections.

3. In the equijoin mode, a small number of records (<10000) are returned and the internal table has an index in the connection column. This method is suitable for nested loop join. If a large number of records are returned, hash connections are suitable.

4. In the equijoin mode, the two row source sets have a large combination. If the connection column is a high-base series, it is suitable for hash connections. Otherwise, it is suitable for sorting and merging connections.

5. nested loop connections can first return connected rows without waiting for all connection operations to complete before returning data. The other two connection methods do not work.

6. The sorted and merged data sets can be processed in parallel, while nested loops and hash Connections cannot.

5 mode external mode Internal Mode

Mode:

External mode: Also sub-mode or user mode. It is the logical structure and feature description of local data that can be viewed and used by database users. The external mode is usually a subset of the mode. A database can have multiple external modes. Because it is the data view of each user, if different users have different requirements on applications, ways of viewing data, and requirements for data confidentiality, the external mode description is different. External mode is a powerful measure to ensure database security. Each user can only view and access the data in the corresponding external mode. The remaining data in the database is invisible.

Mode: Also known as the logical mode, which describes the logical structure and features of all databases in the database. It is the middle layer of the database system pattern structure, that is, physical storage and hardware environments that do not involve data, it is also independent of the specific application, the application development tools used, and the advanced programming language ..

A database has only one mode. The database model is based on a certain data model. It comprehensively considers the needs of all users and organically combines these requirements into a logical whole. When defining a schema, you must not only define the logical structure of the data, such as which data items are composed of data records, the name, type, and value range of the data items, but also define the relationship between data items, defines data-related security and integrity requirements.

Internal Mode: storage mode. A database has only one internal mode. It is the description of the physical structure and storage method of data, and the expression of data in the database. For example, whether the record storage method is Heap Storage, whether it is stored in ascending or descending order of some attribute values, or clustered Storage Based on attribute values, and how the index is organized by B + tree index, it is also a hash index; whether data is compressed and stored, and whether it is encrypted; what are the rules for the data storage record structure, such as fixed-length structure or structure, a record cannot be stored across physical pages; and so on;

Secondary Image Function

1. External mode/mode image

2 mode/internal mode image

These two images improve the logical and physical independence of the database system.

1. External mode/mode image: When the mode changes (such as adding new relationships, new properties, changing the Data Type of attributes ), the Database Administrator changes the image of each external mode/mode to keep the external mode unchanged. Applications are written based on the external data mode, so that the program does not need to be modified, ensuring the logical independence of data and the program, referred to as the logical independence of data.

2 mode/internal mode image: When the database storage structure changes (for example, another storage structure is selected), the database administrator changes the mode/internal mode image accordingly, you can keep the mode unchanged, so that the application does not have to change. It ensures the physical independence of data and programs, or physical independence of data for short.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Database interview question 2

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Database interview question 2

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support