In the SQL query, often use the left connection, right connection, inner connection, full connection, then how to implement these functions in R, today speaking!
SQL reviewprinciple
# connections can be divided into the following categories:
internal connections. (typical join operation, using a comparison operator like = or <>). Including equal connections and natural connections .
An inner join uses a comparison operator to match rows in two tables based on the values of the columns that are common to each table. For example, retrieve all lines of the students and courses table with the same student identification number.
External Connection。 An outer join can be left outward connection, right outward connection, or full outer join .
When you specify an outer join in the FROM clause, you can specify it by one of the following sets of keywords:
Left JOIN or left OUTER join.
The result set for the left outer join includes all rows from the left table specified in the outer clause, not just the rows that match the join column。 If a row in the left table does not have a matching row in the right table, all select list columns in the right table in the associated result set row are null values.
Right join or right OUTER join.
The right outward connection is a reverse connection of the left outward connection.all rows of the right table will be returned. If a row in the right table does not have a matching row in the left table, a null value will be returned for left table。
Full join or full OUTER join.
a full outer join returns all rows from the left and right tables. When a row does not have a matching row in another table, the selection list column for the other table contains a null value. If there are matching rows between the tables, the entire result set row contains the data values of the base table。
Cross Connect. Cross Join returns all rows from the left table, with each row in the left table combined with all rows in the right table。 Cross joins are also called Cartesian product.
Sample Analysis。
Note: There is a relationship between a.id and parent_id!
Table A |
b table |
ID Name 1 Sheets 3 2 John Doe 3 Harry |
ID Job parent_id 1 23 1 2 34 2 3 34 4 |
Internal connection
select a.*,b.* from a inner join b on a.id=b.parent_id
ID Name ID Job parent_id 1 Sheets 3 1 23 1 2 Lee 42 34 2 |
Left connection
select a.*,b.* from a left join b on a.id=b.parent_id
< p> id name id job parent_id 1 Zhang 3 1 23 1 2 John Doe 2 34 2 3 Wang Wu null |
Right connection
Select a.*,b.* from a right join B on a.id=b.parent_id
ID Name ID Job parent_id 1 Sheets 3 1 23 1 2 Lee 42 34 2 Null 3 34 4 |
Fully connected
Select a.*,b.* from a full join B on a.id=b.parent_id
ID Name ID Job parent_id 1 Zhang 3 1 23 1 2 John Doe 2 34 2 null 3 34 4 3 Wang Wu NULL |
Summary
The left JOIN returns all of the records from the table (a table) on top, even if there are no matching rows in the right table (table B).
Right OUTER join joins all records in the right table, even if no record matches it in the left table.
Full OUTER joins all connections, returning all records in the left and right tables.
R implementation
The function that implements the connection in R is basically the merge function in the base package (specifically for the data frame, corresponding to the table in SQL), the operation is very simple, the following is a brief analysis!
The merge function syntax
# Merge The data frames:merge the data Frames by Common columns or row names, or does other versions of database join operations.
# Usage
Merge (x, Y, ...) # # Default S3 Method:
Merge (x, y, by = intersect (names (x), names (y)),by.x = by, by.y = by, all = FALSE, all.x = al L,all.y = All,sort = TRUE, suffixes = C (". X", ". Y"), incomparables= NULL,...) # # S3 method for class ' Data.frame ' ( Blue highlighted as common parameter
# Arguments (parameter description)
# details (Detailed description)
By default, the merge function merges the data frame in a way that is connected within (that is, the natural connection: data frames is merged on thecolumns with names they both). by.x and By.y Specify the columns to use for the connection. All series specify how to connect: All=false Specify a natural connection (default), all.x=true specify a left connection, all.y=true specify a right connection, all=true specify full connection!
# example
Note: Surname and name are related!
< Span style= "font-family: Microsoft Jacob Black; Font-size:small; " >authors |
books |
surname NA Tionality deceased 1 Tukey US Yes 2 Venables Australia No 3 Tierney US No 4 Ripley UK No 5 McNeil Australia no |
name title Other.author 1 Tukey Exploratory Data Analysis <NA> 2 venables modern applied Statistics ... Ripley 3 Tierney lisp-stat <NA> 4 Ripley Spatial Statistics <NA> 5 Ripley Stochastic Simulation <NA> 6 M Cneil Interactive Data Analysis <NA> 7 R Core a Introduction to R Venables & Smith |
# Internal Connection : Specify parameter All=false and sort by by.x (Sort=true specified)
> merge (x = Authors,y = books,by.x = ' surname ', by.y=' name ', All=false,sort = TRUE) Surname Nationalit Y deceased title Other.author 1 McNeil Australia no Interactive Data analysis <NA> 2 Ripley UK no Spatial Statistics <NA> 3 Ripley UK no Stochastic Simulation <NA> 4 Tierney US no lisp-stat <NA> 5 Tukey US Yes Explorator Y Data Analysis <NA> 6 Venables Australia no modern applied Statistics ... Ripley |
# Left connection : books as left table, author to right table for left connection
> merge (x = Books,y = authors, by.x = ' name ', by.y= ' surname ', all.x=true,sort = TRUE) name Title Other.author Nationalit Y deceased 1 McNeil Interactive Data analysis <NA> Australia No 2 R Core a Introduction to R Venables & Smith C4><na> <NA> 3 Ripley Spatial Statistics <NA> UK No 4 Ripley Stochastic Simulation <NA> UK No 5 Tierney Lisp-stat <NA> US No 6 Tukey exploratory Data analysis <NA> US Yes 7 Venables modern applied Statis Tics ... Ripley Australia No |
# Right connection: Right connection and left connection similar, skip directly!
# Full Connectivity : Two common connection fields in the table are returned, not matched to the return to na!
> merge (x = Books,y = authors, by.x = ' name ', by.y= ' surname ', all=true,sort = TRUE) Name Title Other.author nationality Deceased 1 McNeil Interactive Data analysis <NA> Australia No 2 R Core a Introduction to R Venables & Smith <na> <NA> 3 Ripley Spatial Statistics <NA> UK No 4 Ripley Stochastic Simulation <NA> UK No 5 Tierney Lisp-stat <NA> US No 6 Tukey exploratory Data analysis <NA> US Yes 7 Venables modern applied Statisti Cs... Ripley Australia No |
#扩展: Use of the incomparable parameter
The
Merge function can not only make the usual SQL connection, but also perform a row join, which returns the row observations for the row equality of one of the two columns in the two table, which can be specified by the by= parameter (in fact, the intersection is taken)!
|
K1 K2 data1 NA 1 na na 3 na 4 4 4 5 5 5 5 |
K1 K2 data1 na na 2 na 3 4 4 4 5 5 5 5 |
# match the observations in the X and Y tables where the values in the K1, K2 fields are equal (the principle should be: First take the intersection of the K1, K2 vectors, return the line number of the intersection, and then take the observations from the two tables according to the line number!) )
> Merge (x, y, by = C ("K1", "K2")) #NA ' s match K1 K2 data.x data.y1 4 4 4 5 5 5
53 na na 2 1 |
# matches only K1, that is, two tables, and returns the observed value of the K1 corresponding public field to the response line number!
> Merge (x, y, by = "K1") K1 k2.x data.x k2.y data.y1 4 4 4 4 5 5 5< C24/>5 1 na 1 na 1 1 3 na na 2 na 16 Na na 2 3 3 |
# using the incomparable parameter, you can see: Incomparable=na removed all observations containing NA!
> Merge (x, y, by = "K2", Incomparables = NA) K2 k1.x data.x k1.y data.y1 4 4 4 4 5 5 5 5 5 |
Add:
How does the orthogonal complement in the set theory in mathematics be implemented in R? Here's a quick addition!
Union: Unions, Intercept: intersection, Setdiff: complement, Setequal: compare two sets for equality
Example
> (x <-C (Sort (sample (1:20, 9), NA)): [1] 4 5 6, NA > (y <-C (Sort (sample (3:23, 7), NA)): [1] 3 4 7 9 na > Union (x, y): [1] 4 5 6, 3 na 7 9 x x > Intersect (x, y): [1] 4 NA > Setdiff (x, y): [1] 5 6 1 > Setdiff (y, x): [] 3 7 9 x > Setequal (x, y): [1] FALSE |
r7-full connection inside and around