Merge data using relational algebra
1 relational algebra
The theoretical basis of Data Set merging is relational algebra, which was proposed by E. F. codd in 1970.
In the formal language of relational algebra:
? Tables or data sets are used to represent links or entities.
? Use rows to represent tuples.
? Columns are used to represent attributes.
Relational algebra contains the following eight Relational operators
? Select to return the rows that meet the specified conditions.
? Projection: returns the specified column from the dataset.
? Cartesian Product-do not combine rows from two datasets in all possible ways.
? And -- addition and subtraction of a link. It can multiply a link in the row direction. It merges the data in two tables, just like putting one table on another.
? -- Returns the rows in the two data sets.
? Difference-only rows in a data set are returned.
? Join-merge two tables horizontally by merging the rows matching each other on the common data items in the two tables.
? In addition, the exact match between two datasets is returned.
In addition, as a method for implementing modern relational algebra operations, SQL also provides:
? Subqueries-similar to connections, but more flexible; in external queries, the results of subqueries can be used in expressions, lists, or data sets.
This chapter describes multiple types of connections, simple and related subqueries, and the types of joins, links, and other content.
2 Use connection
2.1 connection type
In relational algebra, join operations are composed of a Cartesian Product operation and a selection operation. First, use Cartesian product to perform multiplication on two data sets, and then select the generated result set, make sure that only the rows from two data sets with overlapping parts are merged. The full meaning of join is to merge two data sets (usually tables) horizontally and generate a new result set, the method is to combine the rows in one data source with the matched rows in another data source into a new Meta Group.
SQL provides multiple types of connections. The difference between them is that the method used to select rows for connection is different from different overlapping data sets.
Connection Type Definition
The inner connection only connects the matched rows.
The left Outer Join contains all rows in the left table (no matter whether the table on the right has rows matching them) and all matched rows in the right table
The right Outer Join contains all rows in the right table (no matter whether the left table has rows matching them) and all matched rows in the left table
The full outer join contains all rows in the left and right tables, regardless of whether the tables on the other side have rows matching them.
(H) (theta) join uses conditions other than equivalent to match rows in the left and right tables
Cross join generates Cartesian Product-instead of using any matching or selection conditions, it directly matches each row in a data source with each row in another data source.
Join Table query in Informix
If the from clause specifies more than one table reference, the query connects rows from multiple tables. The connection condition specifies the connection relationship between each column (at least one column in each table. Because columns in the join condition are being compared, they must have consistent data types.
The from clause of the SELECT statement can specify the following types of connections:
Result set corresponding to the from clause keyword
Cross join Cartesian Product (all possible row pairs)
Inner join only applies to columns in cross that meet the connection conditions.
Left Outer Join a table that meets the conditions, and all rows in the other table
Right outer join is the same as left, but the roles of the two tables are interchangeable.
Full outer join left Outer and right outer
Inner join)
An internal connection is the most common connection. Its page is called a common connection, and E. fcodd was first called a natural connection.
ANSI SQL-92 standards below
Select *
From t_institution I
Inner join t_teller t
On I. inst_no = T. inst_no
Where I. inst_no = "5801"
Inner can be omitted.
Equivalent to the early connection syntax
Select *
From t_institution I, t_teller t
Where I. inst_no = T. inst_no
& I. inst_no = "5801"
2.3 outer connections
2.3.1 left Outer Join (left outer jion)
Select *
From t_institution I
Left Outer Join t_teller t
On I. inst_no = T. inst_no
Outer can be omitted.
2.3.2 right Outer Join (rigt outer jion)
Select *
From t_institution I
Right Outer Join t_teller t
On I. inst_no = T. inst_no
2.3.3 full outer)
All outer connections return all data in the two data sets involved in the connection, regardless of whether they have matched rows. In terms of function, it is equivalent to performing left Outer Join and right outer join on the two data sets respectively. Then, the preceding two result sets are combined into a result set by removing duplicate rows.
In real life, integrity constraints can be referenced to reduce the use of full outer connections. Generally, left outer connections are enough. When clear and standardized constraints are not used in the database to prevent erroneous data, all external connections become very useful. You can use it to clean up data in the database.
Select *
From t_institution I
Full outer join t_teller t
On I. inst_no = T. inst_no
2.3.4 use with external connections and conditions
When a condition is added to an internal join query, whether it is added to the join clause or the WHERE clause, the effect is the same, but the external join condition is different. When conditions are added to the join clause, SQL Server and Informix return all rows of the Outer Join table, and then return the rows of the second table using the specified conditions. If conditions are placed in the WHERE clause, SQL Server first performs the join operation, and then uses the WHERE clause to filter connected rows. The following two Queries show the impact of conditional placement on execution results:
Condition in join clause
Select *
From t_institution I
Left Outer Join t_teller t
On I. inst_no = T. inst_no
And I. inst_no = "5801"
The result is:
Inst_no inst_name inst_no teller_no teller_name
5801 Tianhe District 5801 0001 Tom
5801 Tianhe District 5801 0002 David
5802 Yuexiu District
5803 Baiyun District
Condition in where clause
Select *
From t_institution I
Left Outer Join t_teller t
On I. inst_no = T. inst_no
Where I. inst_no = "5801"
The result is:
Inst_no inst_name inst_no teller_no teller_name
5801 Tianhe District 5801 0001 Tom
5801 Tianhe District 5801 0002 David
2.4 self-join
A self-join means that the same table is connected to itself. This type of unary join is usually used to extract data from the self-inverse (also known as recursive) relationship. For example, the relationship between employees and bosses in the HR database.
the following example shows information about the institution and its parent institution in the organization table.
select S. inst_no superior_inst, S. inst_name sup_inst_name, I. inst_no, I. inst_name
from t_institution I
join t_institution S
on I. superior_inst = S. inst_no
result:
superior_inst sup_inst_name inst_no inst_name
800 Guangzhou 5801 Tianhe District
800 Guangzhou 5802 Yuexiu District
800 Guangzhou 5803 Baiyun district
2.5 cross (unrestricted) join
the cross join operation is used to multiply the pure relational algebra of two source tables. It does not use join conditions to limit the result set, but combines rows from both data sources in all possible ways. Each row in the data set must form a new row with each row in the data set. For example, if the first data source has five rows and the second data source has four rows, 20 rows will be generated for cross-join. This type of result set is called the Cartesian product.
most cross-connections are caused by incorrect operations, but they are very suitable for filling the example data in the database, you can also create some blank lines in advance to reserve space for the data to be filled during the Program execution.
select *
from t_institution I
cross join t_teller T
NO on Condition Clause in the cross join