Experience in Big Data Processing

Source: Internet
Author: User

1. In terms of database technology, our company is currently studying Hadoop-based databases, but we do not yet know much about them. NoSQL (non-relational) databases are popular abroad, and some foreign enterprises such as Amazon and Google have built their own NoSQL databases.

2. Optimizing a traditional relational database has two parts: optimization at the database layer and optimization at the usage layer above it.

Database layer: the DBA should tune the database, for example by reducing fragmentation and partitioning large tables;

Usage layer: this means optimizing the SQL statements.

Four external resources affect how an SQL statement performs: CPU, RAM, network, and disk.

CPU: heavy use of ORDER BY, GROUP BY, CASE WHEN, and similar constructs is expensive, because sorting and evaluating them takes CPU time. Precomputed summaries may help reduce this cost.
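One way to read the summary idea is a pre-aggregated table that is built once and then queried many times. The following is a minimal sketch, assuming SQLite through Python's standard sqlite3 module as a stand-in for whatever database is in use; the tables `orders` and `orders_summary` are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 10), ("north", 20), ("south", 5)],
)

# Pay the GROUP BY cost once, when the summary table is built...
conn.execute(
    "CREATE TABLE orders_summary AS "
    "SELECT region, SUM(amount) AS total, COUNT(*) AS order_count "
    "FROM orders GROUP BY region"
)

# ...so that reports read a few pre-aggregated rows instead of re-grouping.
summary = conn.execute(
    "SELECT region, total, order_count FROM orders_summary ORDER BY region"
).fetchall()
print(summary)
```

The trade-off is that the summary must be refreshed when the underlying data changes, so this tends to suit reporting data that is updated in batches.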

RAM: querying too much data at once consumes excessive memory.

Typical causes are SQL statements without a WHERE clause, SELECT * queries, and full table scans;

Frequent UPDATE and INSERT operations also put pressure on memory, because parsing each SQL statement takes time and space. Use bind variables so that a statement is parsed once and reused.
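Bind variables can be sketched as follows. This is an illustration only, assuming SQLite through Python's sqlite3 module; the `users` table and the `find_name` helper are hypothetical. The point is that the SQL text stays identical across calls and only the bound values change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(1, "ann"), (2, "bob"), (3, "cy")]

# One statement text with placeholders: the same parsed statement can be
# reused for every row instead of parsing three different SQL strings.
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)


def find_name(user_id):
    # The value travels as a bound parameter, not as part of the SQL text.
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


print(find_name(2))
```

Besides saving parse work, binding values instead of concatenating them into the statement also avoids SQL injection.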

Network: too many database connections, frequent switching between databases, cross-database joins, exports of large amounts of data, and complex SQL statements.

Disk:

Create indexes on tables with large data volumes, and make sure the indexes remain effective;

Reduce insert and delete operations on large tables, since they may cause disk fragmentation and inconsistent disk pointers;

Heavy insert and delete activity on a large table degrades its indexes. If necessary, drop the index before a bulk insert, delete, or update and rebuild it afterwards;

An index is in effect a table of its own, so keep it as small as possible;

It is best to index fields that sort easily, such as numbers and dates, rather than VARCHAR fields;

Size VARCHAR fields to fit the data as closely as possible rather than allocating extra space;

Reduce the number of disk reads;

Avoid sequential full table scans on large tables; use indexes instead;

Reduce the use of DISTINCT, and replace UNION with UNION ALL when duplicates do not need to be removed;
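The UNION ALL advice only holds when the inputs cannot overlap or when duplicates are acceptable. A small sketch, assuming SQLite through Python's sqlite3 module and two hypothetical tables with disjoint ids:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_2023 (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders_2024 (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders_2023 (id) VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO orders_2024 (id) VALUES (?)", [(3,), (4,)])

# UNION has to compare rows from both inputs in order to remove duplicates.
with_dedup = conn.execute(
    "SELECT id FROM orders_2023 UNION SELECT id FROM orders_2024 ORDER BY id"
).fetchall()

# UNION ALL just concatenates. It is safe here because the ids are disjoint.
without_dedup = conn.execute(
    "SELECT id FROM orders_2023 UNION ALL SELECT id FROM orders_2024 ORDER BY id"
).fetchall()

print(with_dedup == without_dedup)
```

If the two inputs could contain the same row, the two queries would return different results, so the replacement is a correctness decision as well as a performance one.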

NOT LIKE, <>, LIKE with a leading wildcard (full fuzzy matching), IS NULL, IS NOT NULL, and NOT IN can prevent the index from being used;

Do not apply a function to an indexed column. Where possible, apply the function to the value on the other side of the equals sign;
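The effect of wrapping an indexed column in an expression can be observed in the query plan. The sketch below assumes SQLite through Python's sqlite3 module, uses simple arithmetic in place of a function, and relies on a hypothetical `payments` table and `plan` helper. Other databases expose the plan differently, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER, note TEXT)"
)
conn.execute("CREATE INDEX idx_payments_amount ON payments (amount)")


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Expression wrapped around the indexed column: the index cannot be used.
plan_on_column = plan("SELECT * FROM payments WHERE amount + 10 = 110")

# Same condition with the arithmetic moved to the constant side.
plan_on_constant = plan("SELECT * FROM payments WHERE amount = 110 - 10")

print(plan_on_column)
print(plan_on_constant)
```

In SQLite the first plan is reported as a scan of the table, while the second is a search that names `idx_payments_amount`.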

Write SQL statements in a consistent form, so that identical statements can reuse an earlier parse and parsing time is reduced;

Choose the best execution plan. One complex SQL statement is often not as good as several simple ones;

Reduce nested subqueries and use joins instead;

Avoid Cartesian product joins;

Avoid SELECT *. The database has to expand * into the column list, which consumes resources, and not every field needs to be queried or written. When writing data, a change to the table structure may also cause errors;
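The point about table structure changes can be reproduced in a few lines. This is a sketch under stated assumptions: SQLite through Python's sqlite3 module and a hypothetical `staff` table. Other databases report the error differently, but the failure mode is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER, name TEXT)")
conn.execute("INSERT INTO staff (id, name) VALUES (1, 'ann')")

# The table structure changes later.
conn.execute("ALTER TABLE staff ADD COLUMN email TEXT")

# A write that relies on the implicit column list now fails...
try:
    conn.execute("INSERT INTO staff VALUES (2, 'bob')")
    implicit_insert_ok = True
except sqlite3.Error:
    implicit_insert_ok = False

# ...while a write that names its columns keeps working.
conn.execute("INSERT INTO staff (id, name) VALUES (3, 'cy')")

# Reading only the needed columns returns the same shape as before the change.
names = conn.execute("SELECT id, name FROM staff ORDER BY id").fetchall()
print(implicit_insert_ok, names)
```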

To delete all rows in a table, use TRUNCATE rather than DELETE;

Paging through a full table is inefficient. We recommend step-by-step paging.
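The text does not define step-by-step paging. One common reading is keyset (seek) pagination, where each page starts after the last key of the previous page instead of skipping rows with an offset. The sketch below assumes that reading, SQLite through Python's sqlite3 module, and a hypothetical `items` table with positive integer ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO items (id, label) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 11)],
)


def fetch_page(last_id, page_size):
    # Seek past the last key already seen instead of using OFFSET, so the
    # database does not have to read and discard the skipped rows.
    return conn.execute(
        "SELECT id, label FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()


def iter_pages(page_size):
    last_id = 0  # assumption: ids are positive integers
    while True:
        page = fetch_page(last_id, page_size)
        if not page:
            return
        yield page
        last_id = page[-1][0]


pages = list(iter_pages(4))
print([len(page) for page in pages])
```

This approach needs a unique, indexed sort key, and it moves forward page by page rather than jumping to an arbitrary page number.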

3. Once data reading has been optimized to a certain extent, the code itself can also be optimized significantly.

Avoid excessive boxing and unboxing; use value types;

Use generic collections for reference types;

Avoid nested loops and unbounded recursion;

Avoid creating large objects inside a loop;

Release large objects as soon as they are no longer needed.
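The boxing and generics tips read as if they target a .NET-style runtime, so they have no direct Python counterpart. The tip about large objects in loops does carry over, and the sketch below illustrates only that one: a single buffer is allocated before the loop and reused for every chunk. The function `checksum_stream` and its simple additive checksum are hypothetical, chosen only to give the loop something to compute.

```python
import io


def checksum_stream(stream, chunk_size=4096):
    # Allocate the working buffer once, outside the loop, and reuse it
    # rather than creating a new bytes object for every chunk.
    buffer = bytearray(chunk_size)
    view = memoryview(buffer)
    total = 0
    while True:
        n = stream.readinto(buffer)
        if not n:
            break
        # Only the first n bytes are valid for this iteration.
        total = (total + sum(view[:n])) % 65536
    return total


data = bytes(range(256)) * 40
result = checksum_stream(io.BytesIO(data))
print(result)
```

Because the buffer is local to the function, it becomes eligible for release as soon as the function returns.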

4. Logical Optimization

When a query returns a large amount of data, use pagination;

When pagination interferes with generating charts, show the summary information and charts first, then let the user drill down into the details;

Trade space for time.

5. Store commonly used information locally. For example, QQ loads slowly the first time, but later logins are fast.
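Local storage of common information, and trading space for time more generally, can be sketched with an in-memory cache. This is a minimal illustration using Python's functools.lru_cache; `load_profile` and the call counter are hypothetical stand-ins for a slow remote lookup.

```python
import functools

call_count = 0


@functools.lru_cache(maxsize=128)
def load_profile(user_id):
    # Stand-in for a slow load (network or database). The counter records
    # how many times the slow path actually runs.
    global call_count
    call_count += 1
    return (user_id, f"user-{user_id}")


first = load_profile(7)   # slow path: the result is computed and stored
second = load_profile(7)  # fast path: the result is served from the cache
print(call_count)
```

The cache spends memory to save repeated work, so it fits information that is read often and changes rarely; stale entries have to be invalidated when the source data changes.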
