Oracle Large Data Processing Methods

Source: Internet
Author: User
Keywords: large data processing

Searching online for articles about handling large volumes of database data turns up many good blog posts proposing many solutions, so I wanted to organize my own notes on the subject. Simply copying other people's summaries here would mean nothing; interviewers often ask how you would handle large data volumes and high concurrency, and much of the online material is just the same article copied from place to place!

Few of the Java web projects I work on now involve truly large data; the entire database usually adds up to a few hundred thousand rows. The really big-data websites and systems basically run on many servers, or on very high-spec ones. A recent project of mine had data volumes in the tens of millions, and the table space actually used was only about 3 GB. That is hardly big data, but I still spent a lot of time on it during development, so based on that experience I want to share my approach to handling larger data volumes. This is just personal experience!

First, solve the SQL problem

First of all, regardless of what third-party technology you use, start with the SQL itself and write efficient query statements. A given database report can usually be produced with different keywords and different SQL; write the alternatives, compare them, and measure which queries faster and which slower. This is the most direct approach!
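As a concrete illustration, here is a minimal sketch of comparing two equivalent queries by timing them over JDBC. The connection URL, credentials, and the ORDERS table with its TRADE_DATE column are all hypothetical; the point is simply that a range predicate on the raw column can use an index, while wrapping the column in TRUNC() usually cannot.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqlComparison {

    // Hypothetical connection details; replace with your own.
    private static final String URL = "jdbc:oracle:thin:@localhost:1521:orcl";

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL, "scott", "tiger")) {
            // Two equivalent counts of yesterday's orders (hypothetical ORDERS table).
            // The range predicate can use an index on TRADE_DATE...
            time(conn, "SELECT COUNT(*) FROM orders "
                     + "WHERE trade_date >= TRUNC(SYSDATE) - 1 AND trade_date < TRUNC(SYSDATE)");
            // ...while TRUNC() applied to the column usually defeats it.
            time(conn, "SELECT COUNT(*) FROM orders WHERE TRUNC(trade_date) = TRUNC(SYSDATE) - 1");
        }
    }

    private static void time(Connection conn, String sql) throws Exception {
        long start = System.nanoTime();
        try (Statement st = conn.createStatement(); ResultSet rs = st.executeQuery(sql)) {
            rs.next(); // force execution of the query
        }
        System.out.printf("%,d ms  %s%n", (System.nanoTime() - start) / 1_000_000, sql);
    }
}
```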

Second, create partition tables or historical data tables

For tables with very large data volumes, such as an order table or a transaction-flow record table, create a partitioned table or move old rows into a history table. How to partition depends on the data volume. In my last project, daily transaction flow was under 10,000 rows, so given the growth rate I partitioned by year; a year's worth of data in one partition comes to about 3 million rows. Alternatively, add a "query history" function, or limit the query window to, say, one month or one quarter of data. Many large sites do exactly this!
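A minimal sketch of such a yearly range-partitioned table, executed over JDBC; the TRADE_FLOW table, its columns, and the connection details are hypothetical. Each yearly partition would hold roughly the 3 million rows mentioned above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreatePartitionedTable {
    public static void main(String[] args) throws Exception {
        // Yearly range partitions on a hypothetical TRADE_FLOW table.
        String ddl =
            "CREATE TABLE trade_flow ("
          + "  flow_id    CHAR(32) PRIMARY KEY,"
          + "  trade_type VARCHAR2(16),"
          + "  amount     NUMBER(12,2),"
          + "  trade_date DATE NOT NULL"
          + ") PARTITION BY RANGE (trade_date) ("
          + "  PARTITION p2016 VALUES LESS THAN (TO_DATE('2017-01-01','YYYY-MM-DD')),"
          + "  PARTITION p2017 VALUES LESS THAN (TO_DATE('2018-01-01','YYYY-MM-DD')),"
          + "  PARTITION p_max VALUES LESS THAN (MAXVALUE)" // catch-all for future years
          + ")";
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@localhost:1521:orcl", "scott", "tiger");
             Statement st = conn.createStatement()) {
            st.execute(ddl);
        }
    }
}
```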

Third, use scheduled tasks or batch jobs to pre-compute statistics, i.e., build summary tables

In a previous project I added scheduled tasks: at around 1 a.m. a job aggregates the day's transactions and stores the results in a separate table, e.g. how many transactions were made that day, the total amount consumed, how many refunds there were, and the total amount refunded; likewise monthly, quarterly, and annual summary tables. The data volume of each summary table is tiny, and the data is effectively static! This approach also relieves the server pressure and slow responses caused by so-called high concurrency!
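A minimal sketch of the nightly aggregation as a single INSERT ... SELECT, assuming hypothetical TRADE_FLOW and DAILY_STATS tables; the monthly, quarterly, and annual tables would be built the same way with different date ranges.

```java
import java.sql.Connection;
import java.sql.Statement;

public class DailyStatsJob {

    // Roll yesterday's TRADE_FLOW rows up into one DAILY_STATS row (both tables hypothetical).
    static void run(Connection conn) throws Exception {
        String sql =
            "INSERT INTO daily_stats "
          + "  (stat_date, trade_count, trade_amount, refund_count, refund_amount) "
          + "SELECT TRUNC(SYSDATE) - 1, "
          + "       COUNT(*), "
          + "       NVL(SUM(amount), 0), "
          + "       COUNT(CASE WHEN trade_type = 'REFUND' THEN 1 END), "
          + "       NVL(SUM(CASE WHEN trade_type = 'REFUND' THEN amount END), 0) "
          + "FROM trade_flow "
          + "WHERE trade_date >= TRUNC(SYSDATE) - 1 "
          + "  AND trade_date <  TRUNC(SYSDATE)";
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(sql);
        }
    }
}
```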

But there are problems with this approach. The problem is not the statistics but the scheduled tasks themselves: once there were too many scheduled tasks, running them all inside the application affected the system, and we had to restart the server. A better way is to separate the timer tasks out: write them as a jar, deploy the jar to a Linux server, and write a shell script that runs it. Scheduling a timed task with the Linux shell is easy, and I think it is a comparatively good approach, but you must handle logging properly. What I did was trace the run inside the jar with log4j, and I also created a batch-history table that records whether each run succeeded. If a run failed, a mail function I wrote inside the jar sends a notification email to a designated mailbox!
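A minimal sketch of the standalone batch entry point described above, assuming log4j 1.x on the classpath, the DailyStatsJob from the previous sketch, and a hypothetical BATCH_HISTORY table; the cron line and script path in the comment are also hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.log4j.Logger;

// Packed into a jar and launched by cron through a shell script, e.g. (hypothetical path):
//   0 1 * * *  /opt/batch/run-daily-stats.sh
public class BatchMain {

    private static final Logger LOG = Logger.getLogger(BatchMain.class);

    public static void main(String[] args) {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:orcl", "scott", "tiger")) {
            DailyStatsJob.run(conn); // the aggregation from the previous sketch
            recordHistory(conn, "DAILY_STATS", "SUCCESS");
            LOG.info("daily stats batch finished");
        } catch (Exception e) {
            LOG.error("daily stats batch failed", e);
            // here: record a FAILED row and send the notification mail (omitted)
        }
    }

    // One row per run in a hypothetical BATCH_HISTORY table, so failed runs are visible.
    private static void recordHistory(Connection conn, String job, String status) throws Exception {
        String sql = "INSERT INTO batch_history (job_name, run_time, status) VALUES (?, SYSDATE, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, job);
            ps.setString(2, status);
            ps.executeUpdate();
        }
    }
}
```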

Fourth, set up static HTML pages

For data that basically never changes, like annual statistics, you can write the figures directly into HTML and serve the static page itself: generate a static page for each year, and for each quarter or even each month. This approach reduces the load on both the server and the database.

The above is building static pages by hand; you can also generate them with template engines such as FreeMarker or Velocity.
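A minimal sketch of generating such a page with FreeMarker; the templates directory, the report.ftl template, and the model values are hypothetical, and in practice the figures would come from the summary tables of section three.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import freemarker.template.Configuration;
import freemarker.template.Template;

public class StaticPageGenerator {
    public static void main(String[] args) throws Exception {
        Configuration cfg = new Configuration(Configuration.VERSION_2_3_31);
        cfg.setDirectoryForTemplateLoading(new File("templates")); // holds report.ftl (hypothetical)
        cfg.setDefaultEncoding("UTF-8");

        Map<String, Object> model = new HashMap<>();
        model.put("year", 2016);
        model.put("totalAmount", "1,234,567.89"); // would come from the annual summary table

        // Render the year's statistics once into a static page served as-is afterwards.
        Template tpl = cfg.getTemplate("report.ftl");
        try (Writer out = new FileWriter("report-2016.html")) {
            tpl.process(model, out);
        }
    }
}
```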

However, I did not use this approach in my project, for a few reasons: 1. the statistical data volume is very small and already pre-aggregated, so a straightforward query is enough and there is no need to build static pages; 2. I did not want to do the same thing two different ways, since a lot of historical data has to be compared, e.g. this January's consumption against the same period last year, or this year's total business against previous years'; 3. the system only gets dozens of visits, so it simply isn't needed. Taken together, there was no reason to use it!

Fifth, don't rely too much on Java frameworks

With data volumes like this, do not rely too much on, and arguably do not even use, Struts, Hibernate, Spring MVC, Spring, iBATIS, and the other Java frameworks. Honestly, I don't think using these frameworks beats plain JDBC here, unless you have good examples of these frameworks handling large data well. I once worked on a domestic outsourcing project using Spring MVC, Spring, and Hibernate with just 500,000 rows, and the system already ran sluggishly; stress tests were a problem!
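For contrast, a minimal sketch of the plain-JDBC style, reading from the hypothetical TRADE_FLOW table used earlier; one small knob worth knowing is the fetch size, since the Oracle driver defaults to 10 rows per round trip, which is slow for large result sets.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PlainJdbcQuery {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT flow_id, amount FROM trade_flow WHERE trade_date >= ?";
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@localhost:1521:orcl", "scott", "tiger");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setFetchSize(500); // fetch rows in larger batches than the default of 10
            ps.setDate(1, java.sql.Date.valueOf("2016-01-01"));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    process(rs.getString("flow_id"), rs.getBigDecimal("amount"));
                }
            }
        }
    }

    private static void process(String id, BigDecimal amount) {
        // business logic goes here
    }
}
```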

Of course, this project didn't use the SSH frameworks either; it used a simple, easy-to-understand framework written by an expert at our company!

Beyond these five approaches there are many other details; to name a few: 1. you can index fields of the table, but not too many, because too many indexes stop being effective; 2. don't use Oracle's sequence mechanism to generate primary keys; 3. a primary key of type CHAR queries more efficiently than one of type VARCHAR2; and so on.
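For the index tip, a minimal sketch of adding one selective index over JDBC, again using the hypothetical TRADE_FLOW table; the idea is a single index on the column the heavy queries filter by, not an index on every column.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddIndex {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@localhost:1521:orcl", "scott", "tiger");
             Statement st = conn.createStatement()) {
            // One index on the date column the partitioned queries filter by (hypothetical).
            st.execute("CREATE INDEX idx_trade_flow_date ON trade_flow (trade_date)");
        }
    }
}
```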

The points described above can all be implemented without any other third-party technology. The following approaches exist as well, but none of them were used in my project:

Sixth, use third-party big data processing tools

As for big data processing tools, there should be plenty to choose from.

Seventh, distributed applications

I am currently studying this.

Eighth, server clusters

Likewise, I have not used this approach before.

Ninth, buy more servers

That depends on what the boss thinks.

The above are the approaches I commonly use when handling large data volumes (not real big data, please forgive me). If you have good suggestions, or something here is wrong, let's communicate and learn from each other.

"Guess you like it."

1. Large Data processing technology--python

2. Trends in large data processing technologies-introduction of five open source technologies

3. Large data processing technology direction

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.