How to cope with slow queries when a table holds too much data (how to avoid joining large tables)
In general, those of us who build B/S (browser/server) systems are the most likely to run into highly concurrent database access. Web usage is taking off like a rocket, and high traffic brings a series of performance problems with it. The database has always been an important meeting point between users and businesses. Users will not put up with a query that takes 10 seconds or more, and often not even that long. If the server crashes frequently or keeps reporting query timeouts, I would call the project a failure. I have worked on the web for a few years and had never dealt with large or massive amounts of data. That is not to say a project without massive data is not a good project; it depends on what the project is for.
On a recent project I finally got the chance. The largest single table I found held a 9-digit number of records, and the order table held an 8-digit number. A query on an order usually cannot be answered from a single table; it has to be joined to other related tables. Once a large table is joined to another large table, the query can become painfully slow.
Key point: How can this situation be avoided? The data is already there and the requirement still has to be met. Here is my recent optimization process for the database, which I will explain in two articles.
First article: how to avoid joining large tables as far as possible.
Second article: partitioning large tables.
Background: There are two tables:
1: Order table: records the details of user orders.
Order, which has a membership card number field Cardno and an order generation time field.
2: Member table: records information about members.
member, where each member has an agent number Proxyid. An agent has many membership cards (Cardno) under it, and they all share that agent number.
The two tables are linked by Cardno.
Requirement: for one user (agent) or several, query the orders generated by all of their membership cards within a given period.
Implementing SQL:
SELECT field FROM [Order]   -- Order is a reserved word, so it is bracketed
INNER JOIN member ON
[Order].cardno = member.cardno
AND member.proxyid IN ('a-01', 'agent number two')
AND time BETWEEN '20080101' AND '20080131'
My opinion: I think most people who see this requirement would write a query like this one. Those who dislike IN, or believe IN performs poorly, can replace it with UNION ALL. The SQL statement could hardly be simpler and is not a problem in itself, unless both tables hold more than a million rows and have a great many columns. At that point, relying on indexes alone will not give the desired result.
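For readers who prefer UNION ALL to IN, the rewrite might look like the sketch below. It assumes the same tables and columns as the query above; field stands for whatever columns are needed, and 'agent number two' is a placeholder value. It also assumes each membership card belongs to exactly one agent, so the two branches cannot return the same order twice.

```sql
-- One branch per agent number. UNION ALL concatenates the branches
-- without the duplicate-removal step that plain UNION would add.
SELECT field FROM [Order]
INNER JOIN member ON [Order].cardno = member.cardno
WHERE member.proxyid = 'a-01'
  AND time BETWEEN '20080101' AND '20080131'
UNION ALL
SELECT field FROM [Order]
INNER JOIN member ON [Order].cardno = member.cardno
WHERE member.proxyid = 'agent number two'   -- placeholder value
  AND time BETWEEN '20080101' AND '20080131';
```

Whether this is any faster than IN depends on the data and the indexes, so it is worth measuring rather than assuming.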
Solution One: use a table variable in place of the large-table join. A table variable is scoped to a batch; when the batch finishes, the table variable is gone as well. Compared with a temporary table it has one particular advantage: you do not have to drop it manually to release it.
Feasibility: Most of the output fields in this requirement come from the order table. The member table only constrains the data and supplies the membership card numbers of the user in question. So we can first put the agent's membership card numbers into a table variable, and then join the table variable to the order table on the card number.
DECLARE @t TABLE
(Cardno int)
INSERT @t
SELECT Cardno FROM member WHERE proxyid IN ('a-01', 'agent number two')
-- A table variable needs an alias before its columns can be referenced in a join
SELECT field FROM [Order]
INNER JOIN @t AS t ON
[Order].cardno = t.Cardno AND time BETWEEN '20080101' AND '20080131'
I will not paste the performance comparison chart here; interested readers can try it for themselves. This method helps most when the query covers many users. Developers should compare the two versions carefully against their own situation, because the results are not uniform and may differ from one environment to another.
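One way to run such a comparison is sketched below. It uses SQL Server's SET STATISTICS IO and SET STATISTICS TIME session options, which report logical reads and CPU/elapsed time for each statement in the Messages output. The table and column names are the ones used above; the aliases o, m and t are only for illustration, and 'agent number two' is a placeholder.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Version 1: direct join of the two large tables
SELECT o.cardno, o.time
FROM [Order] AS o
INNER JOIN member AS m ON o.cardno = m.cardno
WHERE m.proxyid IN ('a-01', 'agent number two')
  AND o.time BETWEEN '20080101' AND '20080131';

-- Version 2: stage the card numbers in a table variable, then join
DECLARE @t TABLE (Cardno int);
INSERT @t
SELECT Cardno FROM member WHERE proxyid IN ('a-01', 'agent number two');

SELECT o.cardno, o.time
FROM [Order] AS o
INNER JOIN @t AS t ON o.cardno = t.Cardno
WHERE o.time BETWEEN '20080101' AND '20080131';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Run both versions several times, since the first run may include the cost of reading pages from disk into the cache.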
Solution Two: use an indexed view to improve the performance of large-table joins.
Feasibility: In a large-table join, the output columns are usually far fewer than the columns of the two tables. The member table above, for example, contributes only two fields (Cardno, Proxyid). Would it not be better if the member table consisted of just these two fields? The answer goes without saying.
As for views, I had never thought of them as a way to optimize a query. In my mind a view was a virtual table with no physical storage in the database, just a different angle from which the user sees the results. View data is generated in real time: when the view is invoked, it is automatically expanded and the SELECT statement behind it is run. Only later did I learn that since SQL Server 2000 there are ordinary views and indexed views. The ordinary view is the one I remembered, with no index on it. Once an index has been created on it, a view is called an indexed view. Indexed views are physically stored: you create a unique clustered index on the view, and can then create nonclustered indexes on other columns. The query is optimized without changing the underlying table.
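-- Schema-bound views must use two-part names; the dbo schema is assumed here
CREATE VIEW dbo.Memberview
WITH SCHEMABINDING
AS
SELECT Cardno, Proxyid FROM dbo.member
GO
-- Create a unique clustered index on the view, keyed by membership card number
CREATE UNIQUE CLUSTERED INDEX Ix_member_cardno
ON dbo.Memberview (Cardno);
GO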
Note: key points when creating an indexed view:
1: CREATE VIEW Memberview must be followed by WITH SCHEMABINDING.
Reason: • Every object referenced by the view is identified explicitly as schemaname.objectname, regardless of which user accesses the view.
• The objects referenced in the view definition cannot be changed in a way that would make the view definition invalid or force SQL Server to re-create the index on the view.
2: The first index on a view must be CLUSTERED and UNIQUE.
Reason: It must be UNIQUE so that records in the view can be located easily while the indexed view is maintained, and so that a view containing duplicate rows (which would need special logic to maintain) cannot be created. It must be CLUSTERED because only a clustered index can store the rows while enforcing uniqueness.
3: Consider creating an indexed view for the following scenarios:
• You can pre-compute aggregations and save them in the index, minimizing high-cost calculations when queries are executed.
• You can pre-join individual tables and save the resulting data set.
• A combination of joins or aggregations can be saved.
4: Updates to the underlying table cause the indexed view to be updated as well.
5: Creating an indexed view also adds maintenance overhead.
Reason: 1: The indexed view is physically stored.
2: Its additional indexes have to be maintained.
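Point 3 above mentions pre-computed aggregations. A minimal sketch of that case follows, assuming the member table lives in the dbo schema. The view name AgentCardCount, the column alias card_count and the index name are hypothetical, chosen only for illustration.

```sql
-- Hypothetical view: number of membership cards per agent, stored physically
CREATE VIEW dbo.AgentCardCount
WITH SCHEMABINDING
AS
SELECT Proxyid, COUNT_BIG(*) AS card_count
FROM dbo.member
GROUP BY Proxyid;
GO
-- The unique clustered index materializes one row per agent number
CREATE UNIQUE CLUSTERED INDEX Ix_AgentCardCount_proxyid
ON dbo.AgentCardCount (Proxyid);
GO
```

SQL Server requires COUNT_BIG(*) in the select list of an indexed view that uses GROUP BY, and the grouping column makes a natural unique key for the clustered index.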
Implementation SQL:
SELECT field FROM [Order]
INNER JOIN Memberview ON
[Order].cardno = Memberview.cardno
AND Memberview.proxyid IN ('a-01', 'agent number two')
AND time BETWEEN '20080101' AND '20080131'
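Depending on the SQL Server edition, the optimizer may expand the view back into a query on the member table instead of reading the view's own index. The sketch below uses the NOEXPAND table hint to ask SQL Server to use the indexed view directly. It assumes the view was created as dbo.Memberview; the alias mv is only for illustration.

```sql
SELECT field FROM [Order]
INNER JOIN dbo.Memberview AS mv WITH (NOEXPAND)   -- read the view's index directly
    ON [Order].cardno = mv.Cardno
WHERE mv.Proxyid IN ('a-01', 'agent number two')  -- placeholder value
  AND time BETWEEN '20080101' AND '20080131';
```

Comparing the execution plan with and without the hint shows whether the indexed view is actually being used.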
Summary: Both solutions have their strengths, but in general an indexed view is the preferable way to optimize large-table joins. These are the measures I have taken to avoid joining large tables; comments and suggestions are welcome.
Note:
This article references: http://www.51cto.com/html/2005/1115/11396.htm