SQL/SQL Server FAQ

Last Update: 2018-12-04 Source: Internet

Author: User


1. How do you remove duplicate rows from a table? List at least three methods and analyze their efficiency.

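For example, suppose the Salary table has the columns UserID, Salary, and Dept and contains some duplicate rows. You can view its contents with:

SELECT * FROM Salary ORDER BY UserID;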

Method 1:

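-- Copy one instance of each distinct row, empty the table, and reload it in one transaction
BEGIN TRANSACTION;
SELECT DISTINCT * INTO #tempSalary FROM Salary;
DELETE FROM Salary;
INSERT INTO Salary SELECT * FROM #tempSalary;
COMMIT TRANSACTION;
DROP TABLE #tempSalary;

This method is simple, but it deletes and rewrites every row in the table, so its cost grows with the size of the whole table rather than with the number of duplicates.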

Method 2:

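-- Number the copies within each identical (UserID, Salary, Dept) group, then delete all but the first
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY UserID, Salary, Dept ORDER BY UserID) AS rn
    FROM Salary
)
DELETE FROM cte WHERE rn > 1;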


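This method uses the ROW_NUMBER() window function together with OVER, PARTITION BY, and ORDER BY. PARTITION BY splits the rows into groups by the given column(s), and ROW_NUMBER() numbers the rows within each group in the order specified by ORDER BY. Every copy of a record after the first therefore gets a row number greater than 1, and deleting through the CTE removes exactly those extra copies while keeping one copy of each record.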

To see the numbering itself, run the numbering query separately; the first copy of each record gets rn = 1 and every additional copy gets 2, 3, and so on:

SELECT UserID, ROW_NUMBER() OVER (PARTITION BY UserID, Salary, Dept ORDER BY UserID) AS rn FROM Salary;
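A minimal self-contained sketch of Method 2, assuming SQL Server 2016 or later (for DROP TABLE IF EXISTS). It runs against a hypothetical temporary table #Salary with the same three columns instead of the real Salary table, so it can be tried without touching real data:

```sql
-- Hypothetical sample data shaped like Salary (UserID, Salary, Dept)
DROP TABLE IF EXISTS #Salary;
CREATE TABLE #Salary (UserID int NOT NULL, Salary money NOT NULL, Dept nvarchar(20) NOT NULL);

INSERT INTO #Salary (UserID, Salary, Dept) VALUES
    (1, 5000, N'IT'), (1, 5000, N'IT'),                    -- two identical rows
    (2, 6000, N'HR'),
    (3, 7000, N'IT'), (3, 7000, N'IT'), (3, 7000, N'IT'),  -- three identical rows
    (3, 8000, N'IT');                                      -- same UserID, different Salary: not a duplicate

-- Number the copies inside each (UserID, Salary, Dept) group and delete every copy after the first
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY UserID, Salary, Dept ORDER BY UserID) AS rn
    FROM #Salary
)
DELETE FROM cte WHERE rn > 1;

-- Four rows remain: one per distinct record
SELECT UserID, Salary, Dept FROM #Salary ORDER BY UserID, Salary;
```

The ORDER BY inside OVER is required by ROW_NUMBER(); since the rows in a group are identical, it does not matter which copy is numbered 1.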

Method 3:

Use CHECKSUM as a hash function to compute a hash value for each record and store it in a new computed column, H_ID:

ALTER TABLE Salary ADD H_ID AS CHECKSUM(UserID, Salary, Dept);

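Following the idea of Method 1, copy one instance of each record whose H_ID appears more than once into a temporary table. Because DISTINCT compares the actual column values, two different records that happen to share a CHECKSUM value are both kept: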

SELECT DISTINCT UserID, Salary, Dept
INTO #duplicatedSalary
FROM Salary
WHERE H_ID IN (SELECT H_ID FROM Salary GROUP BY H_ID HAVING COUNT(*) > 1);


Delete every row whose H_ID appears more than once from the original table:

DELETE FROM Salary
WHERE H_ID IN (SELECT H_ID FROM Salary GROUP BY H_ID HAVING COUNT(*) > 1);

At this point, every record that had duplicates has been removed from Salary entirely, including its first copy; one copy of each such record is preserved in #duplicatedSalary.

Insert the rows from the temporary table back into the original table:

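-- H_ID is a computed column, so only the three real columns are listed
INSERT INTO Salary (UserID, Salary, Dept) SELECT UserID, Salary, Dept FROM #duplicatedSalary;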

Drop the hash column and the temporary table:

ALTER TABLE Salary DROP COLUMN H_ID;

DROP TABLE #duplicatedSalary;

Finally, the Salary table contains each record exactly once.

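This completes the third method. It is typically much faster than Method 1, especially on large tables, because it deletes and reinserts only the records that have duplicates and leaves all other rows untouched, whereas Method 1 rewrites the entire table. Method 2 likewise deletes only the extra copies, in a single statement, but it has to number every row in the table, which requires sorting the table (or reading an index) on UserID, Salary, and Dept.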
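The steps of Method 3 can be run end to end as the following sketch, which uses a hypothetical temporary table #Salary and assumes SQL Server 2016 or later. The GO separators (understood by SSMS and sqlcmd, not by T-SQL itself) make sure the statements that reference H_ID are compiled after the column exists:

```sql
DROP TABLE IF EXISTS #Salary;
CREATE TABLE #Salary (UserID int NOT NULL, Salary money NOT NULL, Dept nvarchar(20) NOT NULL);
INSERT INTO #Salary (UserID, Salary, Dept) VALUES
    (1, 5000, N'IT'), (1, 5000, N'IT'), (2, 6000, N'HR'), (3, 7000, N'IT'), (3, 7000, N'IT');

-- Step 1: hash each record into a computed column
ALTER TABLE #Salary ADD H_ID AS CHECKSUM(UserID, Salary, Dept);
GO

-- Step 2: keep one copy of every record whose hash occurs more than once
SELECT DISTINCT UserID, Salary, Dept
INTO #duplicatedSalary
FROM #Salary
WHERE H_ID IN (SELECT H_ID FROM #Salary GROUP BY H_ID HAVING COUNT(*) > 1);

-- Step 3: delete all copies of those records; rows whose hash is unique are not touched
DELETE FROM #Salary
WHERE H_ID IN (SELECT H_ID FROM #Salary GROUP BY H_ID HAVING COUNT(*) > 1);

-- Step 4: put the single copies back (H_ID is computed, so it is not listed)
INSERT INTO #Salary (UserID, Salary, Dept)
SELECT UserID, Salary, Dept FROM #duplicatedSalary;

-- Step 5: remove the helper column and table
ALTER TABLE #Salary DROP COLUMN H_ID;
DROP TABLE #duplicatedSalary;
GO

SELECT UserID, Salary, Dept FROM #Salary ORDER BY UserID;
```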

2. What are indexes? What types of indexes are there? How are indexes implemented?

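An index is a database object that provides a fast access path to the data in a table. Indexes are stored as B-tree structures. There are two main kinds. A clustered index determines the order in which the table's rows themselves are stored, so its leaf level is the data itself, much like an English dictionary whose entries are already sorted alphabetically; a table can have only one clustered index. A nonclustered index is a separate structure whose leaf level holds the key values plus a pointer to the actual row, much like the radical index of a Chinese dictionary: it tells you where to look, but you must follow the pointer to reach the entry itself.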
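As a sketch of the two index types, the following creates one clustered and one nonclustered index on a temporary table. The table #Employee and the index names are hypothetical, and SQL Server 2016 or later is assumed:

```sql
DROP TABLE IF EXISTS #Employee;
CREATE TABLE #Employee (
    EmployeeID int          NOT NULL,
    LastName   nvarchar(50) NOT NULL,
    Dept       nvarchar(20) NOT NULL
);

-- Clustered index: the rows themselves are stored in EmployeeID order (only one per table)
CREATE CLUSTERED INDEX IX_Employee_EmployeeID ON #Employee (EmployeeID);

-- Nonclustered index: a separate B-tree of LastName values, each pointing back to its row
CREATE NONCLUSTERED INDEX IX_Employee_LastName ON #Employee (LastName);

INSERT INTO #Employee (EmployeeID, LastName, Dept)
VALUES (3, N'Smith', N'IT'), (1, N'Chen', N'HR'), (2, N'Garcia', N'IT');

-- Can be answered by seeking IX_Employee_LastName, then fetching Dept through the clustered key
SELECT EmployeeID, Dept FROM #Employee WHERE LastName = N'Chen';
```

On a table this small the optimizer may simply scan it; the seek-plus-lookup plan pays off on large tables.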

3. What is a stored procedure? Why is it more efficient to call a stored procedure?

Definition: a stored procedure is a named set of SQL statements that is stored and compiled on the server.

Reasons for high efficiency:

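First, it is compiled on the server the first time it runs and the resulting execution plan is cached, so later calls can execute directly without recompiling;
Second, the application server or client sends only the procedure name and its parameters rather than the full SQL text, which saves the time spent transmitting SQL statements to the database server;
Third, the cached execution plan is shared across calls, so repeated executions, even with different parameter values, reuse the same plan instead of optimizing the query again.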
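A minimal sketch, assuming a hypothetical temporary table #Salary and a hypothetical temporary procedure #GetSalaryByDept (the # prefix makes the procedure visible only to the current session). Run it in a new session in SSMS or sqlcmd: CREATE PROCEDURE must be the first statement in its batch, hence the GO separators:

```sql
-- Hypothetical sample table
DROP TABLE IF EXISTS #Salary;
CREATE TABLE #Salary (UserID int NOT NULL, Salary money NOT NULL, Dept nvarchar(20) NOT NULL);
INSERT INTO #Salary (UserID, Salary, Dept) VALUES (1, 5000, N'IT'), (2, 6000, N'HR'), (3, 7000, N'IT');
GO

-- Temporary procedure: compiled on the server at its first execution, then its plan is cached
CREATE PROCEDURE #GetSalaryByDept
    @Dept nvarchar(20)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT UserID, Salary
    FROM #Salary
    WHERE Dept = @Dept
    ORDER BY UserID;
END;
GO

-- The client sends only the procedure name and a parameter value
EXEC #GetSalaryByDept @Dept = N'IT';
EXEC #GetSalaryByDept @Dept = N'HR';
```

The second call, with a different parameter value, can normally reuse the plan compiled by the first one.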

4. What is a transaction and what are its properties? Suppose transaction_B is nested inside transaction_A: transaction_A executes part of its work and then starts transaction_B; transaction_B runs successfully and commits; then an error occurs while transaction_A executes the rest of its work, so transaction_A must be rolled back. Are the changes committed by transaction_B rolled back as well?

A transaction is the basic unit of work in a database management system: a user-defined sequence of database operations that are either all performed or not performed at all, forming one indivisible unit of work. Transactions have the ACID properties:

Atomicity: the operations of a transaction form one indivisible whole; either all of them take effect in the database or none of them do;
Consistency: a transaction takes the database from one consistent state to another consistent state;
Isolation: a transaction is not disturbed by other concurrently running transactions;
Durability: once a transaction is committed, its changes are permanent.

Yes. Because transaction_A fails, it must be rolled back as a whole. Although transaction_B has issued its own commit, that commit does not make its changes permanent: they are committed for real only when the outermost transaction, transaction_A, commits. Therefore, rolling back transaction_A also undoes the changes made by transaction_B. In general, transactions can be plain (single-level) or nested, and in a nested transaction nothing is finally committed until the outermost transaction commits. In SQL Server specifically, an inner COMMIT only decrements @@TRANCOUNT, and a ROLLBACK without a savepoint name rolls back everything to the outermost BEGIN TRANSACTION.
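This behavior can be observed through @@TRANCOUNT. The following sketch uses a hypothetical temporary table #Account (temporary tables take part in transactions like permanent tables) and assumes SQL Server 2016 or later; the outer transaction plays transaction_A and the inner one transaction_B:

```sql
DROP TABLE IF EXISTS #Account;
CREATE TABLE #Account (Id int PRIMARY KEY, Balance money NOT NULL);
INSERT INTO #Account (Id, Balance) VALUES (1, 100), (2, 100);

BEGIN TRANSACTION;                      -- transaction_A starts: @@TRANCOUNT = 1
UPDATE #Account SET Balance = Balance - 10 WHERE Id = 1;

    BEGIN TRANSACTION;                  -- transaction_B starts: @@TRANCOUNT = 2
    UPDATE #Account SET Balance = Balance + 10 WHERE Id = 2;
    COMMIT TRANSACTION;                 -- only decrements @@TRANCOUNT to 1; nothing is made durable yet

PRINT CONCAT(N'@@TRANCOUNT after inner COMMIT: ', @@TRANCOUNT);   -- prints 1

-- transaction_A hits an error in its remaining work and rolls back
ROLLBACK TRANSACTION;                   -- undoes BOTH updates and sets @@TRANCOUNT to 0

SELECT Id, Balance FROM #Account;       -- both balances are 100 again
```

To undo only part of an outer transaction, SQL Server provides savepoints (SAVE TRANSACTION name, then ROLLBACK TRANSACTION name) rather than nested BEGIN TRANSACTION blocks.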

5. What are the methods for optimizing SQL statements?

Examine the execution plan that SQL Server compiles for the statement, and check that it uses a join type appropriate to the data types and sizes of the tables involved. The query optimizer chooses the execution plan automatically, but it sometimes chooses a poor one, and then manual tuning is needed. For example, when both tables are large, a hash join is usually the better choice.

Join types include:

Hash join;
Nested loop join;
Merge join.

A nested loop join suits small amounts of data, for example up to about 100,000 rows. When the data volume exceeds about one million rows, a hash join is usually required; note that a hash join can only be used for equality joins (equi-joins). If the data in both tables is already sorted on the join key, a merge join can be used.
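When manual tuning is needed, SQL Server lets you force the physical join type. A sketch, assuming hypothetical temporary tables #Orders and #Customers and SQL Server 2016 or later: the query hint OPTION (HASH JOIN) applies to every join in the query, while a join hint such as INNER LOOP JOIN or INNER MERGE JOIN applies to one join and also makes the optimizer keep the join order as written. Hints override the optimizer, so they are best used only after the execution plan shows a poor choice:

```sql
DROP TABLE IF EXISTS #Orders;
DROP TABLE IF EXISTS #Customers;
CREATE TABLE #Customers (CustomerID int PRIMARY KEY, Name nvarchar(50) NOT NULL);
CREATE TABLE #Orders (OrderID int PRIMARY KEY, CustomerID int NOT NULL);
INSERT INTO #Customers (CustomerID, Name) VALUES (1, N'Alice'), (2, N'Bob');
INSERT INTO #Orders (OrderID, CustomerID) VALUES (10, 1), (11, 1), (12, 2);

-- Query hint: every join in this query must be a hash join (equality predicates only)
SELECT o.OrderID, c.Name
FROM #Orders AS o
INNER JOIN #Customers AS c ON c.CustomerID = o.CustomerID
OPTION (HASH JOIN);

-- Join hints: choose the algorithm for one specific join (this also fixes the join order)
SELECT o.OrderID, c.Name
FROM #Orders AS o
INNER LOOP JOIN #Customers AS c ON c.CustomerID = o.CustomerID;

SELECT o.OrderID, c.Name
FROM #Orders AS o
INNER MERGE JOIN #Customers AS c ON c.CustomerID = o.CustomerID;
```

All three queries return the same three rows; only the physical join algorithm in the execution plan differs.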

6. What is partitioning and how is it used?

Partitioning is a good solution for storing and operating on very large amounts of data. It splits a table into partitions according to a rule, keeping the amount of data in each partition at a size that is easy to manage. For example, suppose a data warehouse keeps three years of data, each day adds several million rows, and the total reaches billions of rows. Storing all of that as one undivided table is impractical, so you can partition the table and store it by day.

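Partitioning is transparent to users: they still query and modify a single table, while the storage underneath is divided into partitions. Queries that filter on the partitioning column can skip the partitions that cannot contain matching rows (partition elimination).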
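A small sketch of day-based partitioning in SQL Server, with hypothetical object names (pfSaleDay, psSaleDay, dbo.SaleDemo) and only three daily boundaries instead of three years' worth; all partitions are placed on the PRIMARY filegroup for simplicity. It creates permanent objects in the current database, records the per-partition row counts in a temporary table, and then drops the objects:

```sql
-- RANGE RIGHT: each boundary date starts a new partition (one partition per day)
CREATE PARTITION FUNCTION pfSaleDay (date)
    AS RANGE RIGHT FOR VALUES ('2018-12-01', '2018-12-02', '2018-12-03');

-- Map all four resulting partitions to the PRIMARY filegroup
CREATE PARTITION SCHEME psSaleDay
    AS PARTITION pfSaleDay ALL TO ([PRIMARY]);

-- The table is created on the scheme, partitioned by SaleDate
CREATE TABLE dbo.SaleDemo (
    SaleDate date  NOT NULL,
    Amount   money NOT NULL
) ON psSaleDay (SaleDate);
GO

INSERT INTO dbo.SaleDemo (SaleDate, Amount) VALUES
    ('2018-11-30', 5), ('2018-12-01', 10), ('2018-12-01', 20), ('2018-12-02', 30), ('2018-12-05', 40);

-- Users still query one table; $PARTITION shows which partition each row landed in
SELECT $PARTITION.pfSaleDay(SaleDate) AS PartitionNumber, COUNT(*) AS RowsInPartition
INTO #partitionCounts
FROM dbo.SaleDemo
GROUP BY $PARTITION.pfSaleDay(SaleDate);

-- Partition 1 (before 2018-12-01): 1 row; 2: 2 rows; 3: 1 row; 4 (2018-12-03 and later): 1 row
SELECT PartitionNumber, RowsInPartition FROM #partitionCounts ORDER BY PartitionNumber;

-- Clean up: table first, then scheme, then function
DROP TABLE dbo.SaleDemo;
DROP PARTITION SCHEME psSaleDay;
DROP PARTITION FUNCTION pfSaleDay;
```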

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
