Many database data processing stories and database data processing stories
Every year, you will encounter a series of data import problems. I learned a lot from it.
In the past two days, I have been asked why you are always importing data. There is no problem with the data. To make full preparations, we need to sort out the most complete data of the 8-phase version into the 10-phase database.
Background: The problem is that the database design has changed greatly from the basic system of the 8-phase edition to the basic system of the 10-phase edition. For example, the 10-phase student table contains the class foreign key, the third table is used to maintain the relationship in Stage 8 tables. The relationship between the course and the teacher is required in the 'course class' of the 10th stage, and multiple tables are required in the 8th stage to associate them. For more than 20 tables and various foreign key associations, a maximum of more than 0.5 million pieces of data can be stored in one table. How to do it quickly is an urgent problem to be solved.
Story 1:
Because the eight-phase tables are connected by the third table. Tables with relationships like 1: n in Stage 10 are basically associated with foreign keys.
For example, I want to add student information (including student ID, name, foreign key of the class, etc.) to the table 10 ). I need to first export the student information (excluding the foreign key of the class) from the 8-phase student table and then export the relationship between the student and the class from the third table of the class and the student, then compare the two tables one by one to integrate them into one table.
This is the most natural idea at the beginning. Of course, it is inefficient and the error rate of manual verification is extremely high. So we switched to PlanB. Create a view for the student and the third table in the database, and then export the view to excel.
Story 2:
When creating a view, we encountered another problem. Join the two tables and find that the total number of students is 22052 rows, and the total number of views is 22191 rows. This should not happen if the data is correct. Because, in principle, there is a n: 1 Relationship between the student and the class, and the table data is not redundant. The data of the link and the right link should be the same.
So we used a query statement select ClassID from BasicClassEntities where ClassID not in (select ClassID from BasicStudentEntities) to forgive me for using such an efficient query statement. We found that there were no students in 139 classes. Later, the study found that there were more than 100 virtual administrative classes (for the exam), and there were actually three classes that were not associated with human error. You can check whether the relationship between the table and the table is correct. We will not list them here.
Story 3:
I thought of a previous interview question. So I created my own table. For example, table A has 5 data records and table B has 8 data records. If two tables have three public data records, use a cross-Join Operation to determine the number of data records that can be obtained by a left-side join operation.
Lab 1:
Data in the student table. Data in the teacher table
Use the student table and the instructor table to connect to the left of the student table ,.
Experiment 2: Based on this, three pieces of public data are changed to not one-to-one correspondence. Now there are two IDs that are the same as the student IDs.
The result is as follows.
Look! The content is as follows:
Story 4:
When we copied the sorted data excel to the database for the first time, we set the excel Field and the database field to the same, and then ctrl + c and ctrl + v. Result It takes more than 10 minutes for more than 20 thousand students to import the data. So we use the excel import function provided by the database to find that even 0.5 million of data can be done instantly. After the import, a new table is generated.
Then, from the table data to the other table, using the statement insert into BasicStudentEntities select * from BasicStudentEntities $ is also a matter of seconds.
There are many interesting stories. continue to the next section ~