Lao Li shares: Big Data performance tuning cases

Last Update:2015-10-13 Source: Internet

Author: User

1. "Trading space for time" and processing data in memory

For example, the file user_id.csv contains 200,000 distinct user_id values. For each user_id, we need to find the posts that user recently published and extract post_id, post_title, post_time, and user_id. This means querying the post table, which has a user_id column identifying the owner of each post and holds about 2 million rows. How should this be handled?
My solution is:
A. Export the four columns post_id, post_title, post_time, and user_id of the post table to Posts.csv, then read the records in Posts.csv into csvRecords using a CSV reader component.
B. Then, trading space for time, read the user_id values in user_id.csv into a list object, userIdList, and convert that list to a dictionary:
var userIdDict = userIdList.Distinct().ToDictionary(c => c, c => 1);
C. Finally, filter csvRecords against userIdDict:
var resultRecords = csvRecords.Where(c => userIdDict.ContainsKey(c.UserId)).ToList();
The query time complexity of ContainsKey here is O(1), so the 2 million post records are filtered in a single pass instead of scanning the 200,000-entry list for every record.
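The same idea can be sketched in Python (chosen here only so the example is self-contained and runnable; the article's own code is C#). The sample CSV contents and the function name filter_posts are hypothetical, and a set plays the role of the dictionary:

```python
import csv
import io

# Hypothetical sample data standing in for user_id.csv and Posts.csv.
USER_IDS_CSV = "user_id\n1\n3\n3\n"
POSTS_CSV = (
    "post_id,post_title,post_time,user_id\n"
    "10,Hello,2015-10-01,1\n"
    "11,Other,2015-10-02,2\n"
    "12,Again,2015-10-03,3\n"
)


def filter_posts(user_ids_file, posts_file):
    """Keep only the posts whose user_id appears in the user_id file."""
    # Build the lookup structure once. A set removes duplicates and gives
    # O(1) average membership tests, at the cost of extra memory.
    wanted = {row["user_id"] for row in csv.DictReader(user_ids_file)}
    # Single pass over the posts; each check is a hash lookup, not a list scan.
    return [row for row in csv.DictReader(posts_file) if row["user_id"] in wanted]


result = filter_posts(io.StringIO(USER_IDS_CSV), io.StringIO(POSTS_CSV))
print([row["post_id"] for row in result])  # ['10', '12']
```

With real files, the two StringIO objects would be replaced by open file handles; the memory cost is one hash entry per distinct user_id.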

2. Using JOIN to optimize query performance
A page query was very inefficient (no results after more than a minute). The query executed 3 SQL statements in the background, 2 of which took about 39 seconds to run, causing the database connection to time out.
The backend queries the database through the EF (Entity Framework) framework, which can easily lead to poor query performance when used improperly.
A simplified simulation follows (the table names have been changed and the record counts adjusted, without affecting the result):
Two tables are involved. One is the account table, accounts, with about 3,000 records. The other is the post table, posts, with about 1.9 million records.
The backend roughly works like this: it first obtains a list of account_id values, accountIds, from the account table based on the query criteria, and then looks up post records by that list (the post table has an account_id field), roughly:
var posts = db.Posts.Where(m).Where(c => accountIds.Contains(c.AccountId));
There is no need to dwell on this line of code. Using a monitoring tool, I captured the SQL it generates:
SELECT * FROM Post
WHERE
((? = account_id) OR (? = account_id)) OR ...
ORDER BY created_at DESC
LIMIT 0, 15;
Each ? stands for one literal account ID, and the WHERE clause contains roughly 2,000 or more such conditions. The EXPLAIN output shows a rows value of more than 1.34 million and no index in use.
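To get a feel for how large such a predicate becomes, here is a minimal Python sketch that builds an OR chain of the same shape from a list of IDs. The helper build_or_predicate is hypothetical and only imitates the form of the captured SQL; it is not how EF generates its queries internally:

```python
def build_or_predicate(column, ids):
    """Return an OR chain with one placeholder per id, plus the parameter list."""
    ids = list(ids)
    # One "(column = ?)" condition per id, joined with OR.
    predicate = " OR ".join(f"({column} = ?)" for _ in ids)
    return predicate, ids


# Three ids give three conditions.
small, small_params = build_or_predicate("account_id", [7, 8, 9])
print(small)  # (account_id = ?) OR (account_id = ?) OR (account_id = ?)

# About 2,000 ids, as in the case described above, give 2,000 conditions.
large, large_params = build_or_predicate("account_id", range(1, 2001))
print(large.count("(account_id = ?)"))  # 2000
```

Every one of those conditions has to be parsed and evaluated by the database, which is part of why the statement grows unwieldy as the ID list grows.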
The ORDER BY clause makes the query even slower, but it cannot be dropped as an optimization, because the business requires the first page to show the latest 15 posts.
I then hand-wrote a SQL statement using a JOIN and found that it returned results almost instantly. The SQL is roughly as follows:
SELECT * FROM
posts AS a
JOIN
accounts AS b
ON a.account_id = b.id
WHERE a.category = 1 # filter posts by category
ORDER BY a.created_at DESC
LIMIT 0, 15;
So the fix was to drop EF for this query and rewrite the code to build the SQL by hand. The speedup was substantial: the query used to take nearly 2 minutes to display results and now needs only 3-4 seconds.
JOIN improves query efficiency in this scenario because one table has only about 3,000 records while the other has millions. If both tables held millions of records, a JOIN would not necessarily improve query efficiency.
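The shape of the JOIN query can be tried out with a small Python sketch against an in-memory SQLite database. This is an illustration under assumptions: the article's database, schema, and indexes are not given, so the column lists, the two indexes, and the sample rows below are all made up, and the tiny data set says nothing about the timings reported above. It only shows that the JOIN returns the same page of rows as an explicit ID list:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        account_id INTEGER,
        category INTEGER,
        created_at TEXT
    );
    -- Assumed indexes on the join and sort columns.
    CREATE INDEX idx_posts_account ON posts (account_id);
    CREATE INDEX idx_posts_created ON posts (created_at);
""")
# Accounts 1..3 exist; posts also reference account 4, which has no account row.
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, f"acct{i}") for i in range(1, 4)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)",
                 [(i, i % 4 + 1, 1, f"2015-10-{i:02d}") for i in range(1, 31)])

# JOIN version: the database matches posts to accounts itself.
JOIN_SQL = """
    SELECT a.id FROM posts AS a
    JOIN accounts AS b ON a.account_id = b.id
    WHERE a.category = 1
    ORDER BY a.created_at DESC
    LIMIT 15
"""
join_ids = [row[0] for row in conn.execute(JOIN_SQL)]

# ID-list version: fetch the ids first, then send them back in the query.
account_ids = [row[0] for row in conn.execute("SELECT id FROM accounts")]
placeholders = ", ".join("?" for _ in account_ids)
LIST_SQL = f"""
    SELECT id FROM posts
    WHERE category = 1 AND account_id IN ({placeholders})
    ORDER BY created_at DESC
    LIMIT 15
"""
list_ids = [row[0] for row in conn.execute(LIST_SQL, account_ids)]

print(join_ids == list_ids)  # True
print(join_ids[:3])          # [30, 29, 28]
```

The JOIN version also saves a round trip: the account IDs never leave the database, so the statement stays the same size however many accounts match.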

3. Optimization at the business-logic code level
Understand the business logic and eliminate redundant business-logic code.

Original link: http://www.cnblogs.com/laoli0201
