On large data volumes: data transfer and storage issues

Source: Internet
Author: User

Scenario One

Conditions:

The database has a table A containing 100W (1 million) rows.

Operation

Read all the data out of the table and process it.

Questions

    1. Can all the data be fetched at once, with a single select field1, field2 from A?
      (Apart from the risk of running out of memory by fetching too much, what else needs to be considered?)
    2. What factors need to be considered when deciding how many rows to fetch each time?

Scenario Two

Conditions:

Service A holds a batch of data and needs to call an interface on service B to get additional information for it. Service B provides a batch interface.

Operation

Service A calls service B via RPC.

Questions:

    1. Can the whole batch be fetched in one RPC call to B's batch interface? (If not, what is the main concern?)
    2. What factors need to be considered when deciding how many items to query per call?

Reply content:

Question 1:
1. You can first try fetching everything and see how much data comes back. If it only takes a few MB of memory, fetching and processing it all in one go should not be a problem.
2. If you fetch in batches, pay attention to ordering, so that paging does not return the same rows twice.
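The advice on ordering can be illustrated with keyset pagination, a swapped-in technique: each batch resumes after the last primary key already seen instead of using an OFFSET. This is a minimal sketch, assuming table A has an integer primary key id; SQLite (Python's sqlite3 module) stands in for the real database, and make_demo_table and iter_batches are hypothetical helpers.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str, str]


def make_demo_table(n: int) -> sqlite3.Connection:
    """Build an in-memory stand-in for table A with n rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)")
    conn.executemany(
        "INSERT INTO A (field1, field2) VALUES (?, ?)",
        [(f"a{i}", f"b{i}") for i in range(n)],
    )
    conn.commit()
    return conn


def iter_batches(conn: sqlite3.Connection, batch_size: int) -> Iterator[List[Row]]:
    """Yield rows of A in primary-key order, batch_size rows at a time.

    Keyset pagination: each query resumes strictly after the last id already
    returned, so with a unique, ordered key no row is returned twice.
    """
    last_id = 0  # assumes ids are positive integers
    while True:
        batch = conn.execute(
            "SELECT id, field1, field2 FROM A WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return
        yield batch
        last_id = batch[-1][0]  # resume point for the next query


if __name__ == "__main__":
    conn = make_demo_table(2500)
    print([len(b) for b in iter_batches(conn, 1000)])  # [1000, 1000, 500]
```

LIMIT/OFFSET with a stable ORDER BY also avoids duplicates, but on many databases deep offsets get slower because the skipped rows are still scanned.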

Question 2:
1. Look at the data volume; if it is not large, a single call can handle it.
2. Just try it first; any problems will surface. From your brief description alone, it is hard to tell what the problems would be.
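Point 1 (a single call when the volume is small, otherwise several) can be sketched as a chunking helper. This is a minimal sketch, assuming B's batch interface takes a list of keys and returns a mapping from key to extra information; batch_call, fetch_all, chunked and max_batch are hypothetical names, not part of any real RPC framework.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def chunked(items: Sequence[K], size: int) -> List[Sequence[K]]:
    """Split items into consecutive slices of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_all(keys: Sequence[K],
              batch_call: Callable[[Sequence[K]], Dict[K, V]],
              max_batch: int) -> Dict[K, V]:
    """Fetch extra information for every key through a batch interface.

    If len(keys) <= max_batch this is a single call; otherwise one call per chunk.
    `batch_call` is a hypothetical stand-in for the RPC stub of service B.
    """
    result: Dict[K, V] = {}
    for chunk in chunked(keys, max_batch):
        result.update(batch_call(chunk))
    return result


if __name__ == "__main__":
    sent: List[int] = []

    def fake_batch_call(chunk: Sequence[int]) -> Dict[int, str]:
        sent.append(len(chunk))  # record how many keys each "RPC" carried
        return {k: f"info-{k}" for k in chunk}

    info = fetch_all(list(range(250)), fake_batch_call, max_batch=100)
    print(sent, len(info))  # [100, 100, 50] 250
```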

Let me go through the two scenarios first.

Scenario One:

1. Can all the data be fetched at once, with a single select field1, field2 from A?
(Apart from the risk of running out of memory by fetching too much, what else needs to be considered?)
2. What factors need to be considered when deciding how many rows to fetch each time?

First, the concern you wrote in parentheses (running out of memory) is the biggest issue.
Second, setting aside that memory concern from the first question, what matters most for the second question is this:

What are your requirements? Is there a time limit? Fetching everything in one go may be slow; would you split it into batches and process them on multiple threads? Of course, whether to fetch everything at once also depends on DB query speed and data accuracy. For 1 million (100W) rows, it should be OK as long as the table does not have many large fields (TEXT, BLOB, etc.).
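The idea of splitting the table into batches and processing them on several threads can be sketched as follows. It is a minimal sketch, assuming table A has an integer primary key id whose values are reasonably dense; SQLite (through Python's sqlite3 module and a throwaway file) stands in for the real database, process_range only counts rows where the real processing would go, and the chunk size and worker count are arbitrary placeholders. make_demo_db, id_ranges, process_range and process_table are hypothetical helpers.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def make_demo_db(n: int) -> str:
    """Create a throwaway SQLite file standing in for table A (hypothetical schema)."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)")
        conn.executemany("INSERT INTO A (field1, field2) VALUES (?, ?)", [("x", "y")] * n)
        conn.commit()
    finally:
        conn.close()
    return path


def id_ranges(min_id: int, max_id: int, chunk: int) -> List[Tuple[int, int]]:
    """Split the inclusive id span [min_id, max_id] into half-open ranges [lo, hi)."""
    return [(lo, min(lo + chunk, max_id + 1)) for lo in range(min_id, max_id + 1, chunk)]


def process_range(db_path: str, lo: int, hi: int) -> int:
    # Each worker opens its own connection; Python's sqlite3 connections
    # are not meant to be shared between threads by default.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, field1, field2 FROM A WHERE id >= ? AND id < ?", (lo, hi)
        ).fetchall()
        return len(rows)  # placeholder for the real per-row processing
    finally:
        conn.close()


def process_table(db_path: str, chunk: int = 10_000, workers: int = 4) -> int:
    """Process every row of A in id-range batches spread over a thread pool."""
    conn = sqlite3.connect(db_path)
    try:
        min_id, max_id = conn.execute("SELECT MIN(id), MAX(id) FROM A").fetchone()
    finally:
        conn.close()
    if min_id is None:  # empty table
        return 0
    ranges = id_ranges(min_id, max_id, chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: process_range(db_path, *r), ranges))


if __name__ == "__main__":
    print(process_table(make_demo_db(25_000), chunk=10_000))  # 25000
```

Whether extra threads actually help depends on how much concurrent load the database can absorb, so this is one option to measure rather than a default.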

Scenario Two:

1. Can the whole batch be fetched in one RPC call to B's batch interface? (If not, what is the main concern?)
2. What factors need to be considered when deciding how many items to query per call?

First, it depends on how B's interface is designed. If you are asking whether it can be done, the answer is yes. The main worry is sending too much in one call: will it be slow, and will the response be too large?
Second, see the discussion below.
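The two concerns above (a call that is too slow, a response that is too large), plus any cap B's interface itself imposes, can be turned into a rough upper bound on the batch size. The sketch below is hypothetical: pick_batch_size is an illustrative helper, and B's item cap, the response budget, the per-item size and latency, and the fixed per-call overhead are all assumed inputs you would have to measure, not values from any real service.

```python
def pick_batch_size(interface_max_items: int,
                    max_response_bytes: int,
                    est_bytes_per_item: int,
                    timeout_ms: float,
                    est_ms_per_item: float,
                    overhead_ms: float = 0.0) -> int:
    """Largest batch that respects B's item cap, a response-size budget and the call timeout.

    All inputs are estimates you would measure yourself; none come from a real service.
    """
    by_cap = interface_max_items
    by_size = max_response_bytes // est_bytes_per_item             # response must not get too big
    by_time = int((timeout_ms - overhead_ms) // est_ms_per_item)   # call must not get too slow
    return max(1, min(by_cap, by_size, by_time))                   # always send at least one item


if __name__ == "__main__":
    # Hypothetical numbers: cap 500 items, 4 MiB responses, ~2 KiB per item,
    # 1 s timeout, ~2 ms per item, 50 ms fixed overhead per call.
    print(pick_batch_size(500, 4 * 1024 * 1024, 2048, 1000, 2.0, 50))  # 475
```

With those example numbers the time budget is the binding limit: (1000 - 50) / 2 = 475 items per call, below both the cap of 500 and the size limit of 2048.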

Taking the two scenarios together:

In the end, your question is really about performance. Processing 1 million (100W) rows should be done in batches, but what works best is hard to say in advance. I suggest you first take data from your actual processing environment and test it yourself; this kind of problem is too complex to explain in a sentence or two.

Use the test results to tune the processing speed and the program's performance. Performance depends on the implementation, and there are many possible implementations, such as the multithreading I mentioned above; whether that applies depends on whether the data you query can be split into batches.
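Testing with your own data, as suggested above, can be as simple as timing the same job at several batch sizes. This sketch assumes a job function that takes a batch size and returns the number of rows it processed; benchmark, demo_job and DATA are hypothetical, and demo_job works over an in-memory list, so it should be replaced with the real fetch-and-process code run against the real environment.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

DATA = list(range(100_000))  # hypothetical stand-in for the rows of table A


def demo_job(batch_size: int) -> int:
    """Process DATA in batches; replace with your real fetch-and-process step."""
    processed = 0
    for start in range(0, len(DATA), batch_size):
        batch = DATA[start:start + batch_size]  # one "fetch"
        processed += len(batch)                 # stand-in for real work
    return processed


def benchmark(job: Callable[[int], int],
              batch_sizes: Iterable[int],
              repeats: int = 3) -> Dict[int, Tuple[float, int]]:
    """Map each batch size to (best wall-clock seconds over `repeats` runs, rows processed)."""
    results: Dict[int, Tuple[float, int]] = {}
    for size in batch_sizes:
        best, rows = float("inf"), 0
        for _ in range(repeats):
            start = time.perf_counter()
            rows = job(size)
            best = min(best, time.perf_counter() - start)
        results[size] = (best, rows)
    return results


if __name__ == "__main__":
    for size, (secs, rows) in sorted(benchmark(demo_job, [100, 1_000, 10_000]).items()):
        print(f"batch={size:>6}  rows={rows}  best={secs * 1000:.2f} ms")
```

Taking the best of several runs reduces noise from one-off slowdowns; memory use and database load deserve the same kind of measurement as wall time.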
