Big Data Single-Table Storage Solution

Source: Internet
Author: User
Tags: hypertable

Background:

More than 10 million users.

Each user generates 1000 valid records every day. Each record consists of a user ID, a timestamp, and fields 1 through n.

The length of each record is about 1 KB.

The total data volume generated by each user per day is 1000 KB, that is, about 1 MB.

Data volume generated per user per month: 30 MB

The annual data volume per user is 360 MB, and the number of records per user per year is 1000 × 30 × 12 = 360,000 entries.
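The arithmetic above can be checked with a short script. This is only a sizing sketch: it uses the text's simplifications (30-day months, 1 KB records, 1 MB = 1000 KB), and the all-user totals are derived here rather than stated in the original.

```python
# Sizing estimate using the figures stated in the text.
USERS = 10_000_000        # "10 million+" users
RECORDS_PER_DAY = 1000    # valid records per user per day
RECORD_SIZE_KB = 1        # about 1 KB per record
DAYS_PER_MONTH = 30       # the text's simplification
MONTHS_PER_YEAR = 12

# Per-user figures, matching the text.
records_per_user_year = RECORDS_PER_DAY * DAYS_PER_MONTH * MONTHS_PER_YEAR
mb_per_user_year = records_per_user_year * RECORD_SIZE_KB / 1000  # 1 MB = 1000 KB

# Totals across all users (derived, not stated in the text).
total_records_per_year = records_per_user_year * USERS
total_pb_per_year = mb_per_user_year * USERS / 1_000_000_000  # 1 PB = 10^9 MB

print(records_per_user_year)   # 360000
print(mb_per_user_year)        # 360.0
print(total_records_per_year)  # 3600000000000
print(total_pb_per_year)       # 3.6
```

Across all users this comes to roughly 3.6 trillion records and 3.6 PB per year, which is the scale any storage design has to handle.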

Characteristics of this data: written once, read many times, and never modified in between!

Requirements:

The data generated by each user must be stored for more than five years and can be queried at any time. The time span of each query cannot exceed three days.

Problem:

  1. How can this massive amount of data be stored in a traditional relational database (MSSQL or MySQL) while still supporting fast queries?

Idea 1: relational database storage

  1. Create one database per year. For example, data from March 2014 goes into a database named user_2014, referred to below as the year database;

  2. In each year database, create one table per user, so that each table holds one user's records for the year (about 360,000 records and roughly 360 MB per table, going by the estimates above);
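As a rough illustration of how a record would be routed under this idea, the sketch below maps a user ID and timestamp to a database and table name. The `user_<year>` database name comes from the text; the `records_<id>` table name and the function `table_location` are hypothetical, chosen only for the example.

```python
from datetime import datetime

def table_location(user_id: int, when: datetime) -> tuple[str, str]:
    """Return (database_name, table_name) for a record under Idea 1."""
    database = f"user_{when.year}"     # naming from the text, e.g. user_2014
    table = f"records_{user_id:08d}"   # hypothetical per-user table name
    return database, table

print(table_location(42, datetime(2014, 3, 15)))
# ('user_2014', 'records_00000042')
```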

Advantages:

  1. The technical implementation is simple and not difficult;

Disadvantages:

  1. As far as I know from working with MSSQL and MySQL, no database copes well with 10 million tables of about 360,000 records each. Querying, inserting, and backing up the data would all be difficult!

Idea 2: massive small files + relational database storage

I. Data Production and Writing

  1. The producer submits the data to a queue (initially as a complete SQL statement);

  2. The consumer takes the data to be written from the queue and writes it to the data table in batches (100 entries per write);
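The two steps above can be sketched with Python's standard `queue` and `sqlite3` modules. Several things here are assumptions for the sake of a runnable example: SQLite stands in for MSSQL or MySQL, the `records` table and its columns are hypothetical, and the queue carries parameter tuples rather than the complete SQL statements the text starts with.

```python
import queue
import sqlite3

BATCH_SIZE = 100  # batch size stated in the text

def drain_batch(q: queue.Queue, size: int = BATCH_SIZE) -> list:
    """Take up to `size` items from the queue without blocking."""
    batch = []
    while len(batch) < size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch

def consume(q: queue.Queue, conn: sqlite3.Connection) -> int:
    """Write everything currently queued, one batch per transaction."""
    written = 0
    while True:
        batch = drain_batch(q)
        if not batch:
            return written
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO records (target, ts, payload) VALUES (?, ?, ?)",
                batch,
            )
        written += len(batch)

# Producer side: enqueue 250 rows for a hypothetical target 42.
q = queue.Queue()
for i in range(250):
    q.put((42, 1394841600 + i, f"field-{i}"))

# Consumer side: an in-memory SQLite table stands in for the real data table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (target INTEGER, ts INTEGER, payload TEXT)")
total = consume(q, conn)
print(total)  # 250
```

Writing each batch in its own transaction keeps the number of commits low. A real consumer would also block on the queue and flush partial batches on a timer.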

II. Data Export

  1. Once each day has ended, start a scheduled job that exports the previous day's data from the data table to files (one file per target number);

  2. Each file holds all of one target's records for that day. Each record ends with \n\r;
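A minimal sketch of the export step follows. It assumes the previous day's rows have already been selected from the data table as `(target, line)` pairs. The flat file naming `<target>_<YYYYMMDD>.dat` and the function `export_day` are hypothetical; the `\n\r` terminator is the one given in the text.

```python
import os
import tempfile
from collections import defaultdict
from datetime import date

RECORD_END = "\n\r"  # terminator as stated in the text

def export_day(rows, day: date, out_dir: str) -> list:
    """Write one file per target holding that target's records for `day`.

    `rows` is an iterable of (target, line) pairs already selected for `day`.
    """
    by_target = defaultdict(list)
    for target, line in rows:
        by_target[target].append(line)

    paths = []
    for target, lines in sorted(by_target.items()):
        # Hypothetical flat naming: <target>_<YYYYMMDD>.dat
        path = os.path.join(out_dir, f"{target}_{day:%Y%m%d}.dat")
        # newline="" stops Python from translating the terminator on write.
        with open(path, "w", newline="") as f:
            for line in lines:
                f.write(line + RECORD_END)
        paths.append(path)
    return paths

out_dir = tempfile.mkdtemp()
rows = [(42, "42,1394841600,a"), (7, "7,1394841601,b"), (42, "42,1394841602,c")]
paths = export_day(rows, date(2014, 3, 15), out_dir)
print([os.path.basename(p) for p in paths])
# ['7_20140315.dat', '42_20140315.dat']
```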

III. Data Query

  1. Locate the corresponding files based on the 'target number, time range (at most 3 days per query), and complex query condition' and load them into memory;

  2. The program filters the in-memory data according to the 'complex query condition' and outputs the results;
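The query path might look like the sketch below. The "complex query condition" is modelled as an arbitrary predicate over a record string, and the file naming `<target>_<YYYYMMDD>.dat` is a hypothetical flat layout used only to keep the example short. The 3-day limit and the `\n\r` terminator come from the text.

```python
import os
import tempfile
from datetime import date, timedelta

MAX_QUERY_DAYS = 3   # limit stated in the requirements
RECORD_END = "\n\r"  # terminator as stated in the text

def query(data_dir, target, start: date, end: date, predicate):
    """Load the target's daily files for [start, end] and filter in memory."""
    days = (end - start).days + 1
    if days < 1 or days > MAX_QUERY_DAYS:
        raise ValueError(f"query must span 1 to {MAX_QUERY_DAYS} days")
    results = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        # Hypothetical flat naming: <target>_<YYYYMMDD>.dat
        path = os.path.join(data_dir, f"{target}_{day:%Y%m%d}.dat")
        if not os.path.exists(path):
            continue  # nothing was exported for that day
        # newline="" keeps the "\n\r" terminator intact when reading.
        with open(path, newline="") as f:
            records = [r for r in f.read().split(RECORD_END) if r]
        results.extend(r for r in records if predicate(r))
    return results

# Demo with one hand-written file for a hypothetical target 42.
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "42_20140315.dat"), "w", newline="") as f:
    f.write("42,1394841600,ok" + RECORD_END + "42,1394841602,error" + RECORD_END)

hits = query(data_dir, 42, date(2014, 3, 15), date(2014, 3, 16),
             lambda r: r.endswith("error"))
print(hits)  # ['42,1394841602,error']
```

With about 1 MB per target per day, a 3-day query loads roughly 3 MB, so filtering in memory is cheap.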

IV. File Storage Structure

Every 100 targets occupy one directory, which contains 100 subdirectories.

PS: The only rule is that the number of subdirectories or files in any directory must not exceed 100. The reason: on Linux or Windows, once a directory holds more than 100 subdirectories or files, OS efficiency is affected.
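One way to satisfy the 100-entry rule is to split the target number into two-digit path segments. The sketch below is an assumed layout, not the original diagram: the function `record_path`, the 8-digit padding, and the year/month/day levels are all hypothetical choices made to keep every directory at or under 100 entries.

```python
from datetime import date
from pathlib import PurePosixPath

def record_path(root: str, target: int, day: date) -> PurePosixPath:
    """Map (target, day) to a path where no directory exceeds 100 entries."""
    if not 0 <= target < 100_000_000:
        raise ValueError("target must fit in 8 decimal digits")
    digits = f"{target:08d}"
    # Four levels of two digits each: at most 100 entries per level.
    levels = [digits[i:i + 2] for i in range(0, 8, 2)]
    # Year and month directories keep the per-target file count small:
    # at most 12 month directories per year and 31 day files per month.
    return PurePosixPath(
        root, *levels, f"{day.year}", f"{day.month:02d}", f"{day.day:02d}.dat"
    )

print(record_path("/data", 1234567, date(2014, 3, 15)))
# /data/01/23/45/67/2014/03/15.dat
```

Because the path is computed from the target number and date alone, locating a file needs no lookup table.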

Advantages:

  1. The directory structure is clear, simple, and easy to understand;

  2. Once the first stage is complete, it can be replaced with Hadoop or Hypertable (they are not used from the start because I am not yet familiar with them)

Disadvantages:

  1. Batch backup and management require writing some auxiliary tools, because doing these operations by hand is too troublesome!

Having read the above, here is my question for you: which solution do you think is better? I would be grateful for any better solutions or suggestions!
