ORACLE report database development ideas

Source: Internet
Author: User

OLAP means online analytical processing; in practice this is the report system and the BI system (the BI system here is a packaged product). Both the analyses and the reports are developed with stored procedures: one is served online to users, the other is delivered offline to colleagues.

Online analysis is not widely used here yet. It involves a relatively small data volume but a relatively large number of users:

1. A user only cares about his or her own data, such as the number of purchases, the total purchase amount, and so on.

2. Product association, for example: what other products did users also purchase?

3. Product popularity.

Reports, by contrast, involve all the data, including historical data. Each department has different report requirements, and every colleague and department lead submits requests for the reports that interest them.

The ORACLE database evolved from a transactional (OLTP) database and is not suited to analytic workloads out of the box, so it needs tuning:

1. When installing the database, select the OLAP (data warehouse) option; it automatically adjusts the necessary parameters.

2. Use large data blocks instead of the default 8 KB blocks (Oracle supports block sizes up to 32 KB per tablespace).
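As a sketch of point 2: Oracle lets a tablespace use a non-default block size, provided a buffer cache for that size is configured first. The tablespace name, file path, and sizes below are illustrative assumptions.

```sql
-- Assumed example: a 32 KB-block tablespace for report tables.
-- A cache for that block size must exist before the tablespace is created.
ALTER SYSTEM SET db_32k_cache_size = 512M SCOPE = BOTH;

CREATE TABLESPACE report_data
  DATAFILE '/u01/oradata/rep/report_data01.dbf' SIZE 10G  -- path is an assumption
  BLOCKSIZE 32K;
```

Tables built for full scans can then be placed there with TABLESPACE report_data.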

3. Layered design. Because there are many reports, having every report read from the original tables inevitably causes heavy contention. Therefore build base data tables first and let the other reports read from those base tables directly; this greatly reduces the load.

The layers: A. extraction/source layer; B. base layer; C. common layer; D. department layer.

How should the layers be divided? That depends on where each piece of data is used: you need to know the business and understand the reports of the company and of each department to get a rough picture. This cannot be done well at first and needs continuous refinement, because you cannot become familiar with the business in a short time.

4. Task Scheduling:

Each report has its own stored procedure or package that generates the data in its tables, so dependencies arise between these tasks.

Oracle jobs can be used for scheduling, with stored procedures calling other stored procedures; that is, a job starts a stored procedure, which in turn calls the related stored procedures.

This approach is inflexible, scales poorly, and is hard to maintain!

Crontab should be used for scheduling instead. For example, write a cron job that runs every 5-10 minutes and calls a single dispatcher stored procedure; that procedure starts the tasks, each task being a package or stored procedure.

This dispatcher procedure reads the task information table and the task dependency table to decide when to start each task, and it also supervises the tasks' running status and raises alarms.
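A minimal sketch of such a dispatcher, with assumed names throughout (task_info, task_dep, run_tasks, and the WAIT/RUNNING/DONE/FAIL statuses are all illustrative):

```sql
-- Assumed schema for the scheduler's bookkeeping tables.
CREATE TABLE task_info (
  task_id     NUMBER PRIMARY KEY,
  proc_name   VARCHAR2(100),   -- stored procedure or package entry point to run
  start_after DATE,            -- earliest allowed start time
  status      VARCHAR2(10)     -- WAIT / RUNNING / DONE / FAIL
);

CREATE TABLE task_dep (
  task_id     NUMBER,          -- this task ...
  dep_task_id NUMBER           -- ... waits until this one is DONE
);

CREATE OR REPLACE PROCEDURE run_tasks IS
BEGIN
  -- Start every waiting task whose start time has arrived
  -- and whose dependencies are all DONE.
  FOR r IN (SELECT t.task_id, t.proc_name
              FROM task_info t
             WHERE t.status = 'WAIT'
               AND t.start_after <= SYSDATE
               AND NOT EXISTS (SELECT 1
                                 FROM task_dep d, task_info p
                                WHERE d.task_id = t.task_id
                                  AND p.task_id = d.dep_task_id
                                  AND p.status <> 'DONE')) LOOP
    UPDATE task_info SET status = 'RUNNING' WHERE task_id = r.task_id;
    COMMIT;
    BEGIN
      EXECUTE IMMEDIATE 'BEGIN ' || r.proc_name || '; END;';
      UPDATE task_info SET status = 'DONE' WHERE task_id = r.task_id;
    EXCEPTION WHEN OTHERS THEN
      UPDATE task_info SET status = 'FAIL' WHERE task_id = r.task_id;
      -- an alarm hook (mail, log table) would go here
    END;
    COMMIT;
  END LOOP;
END;
```

Crontab then needs only one entry that calls run_tasks through sqlplus every five minutes or so.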

5. The package for a report generally contains: A. an extraction procedure; B. a list procedure; C. a daily-data procedure; D. a weekly-data procedure; E. a monthly-data procedure; F. a result-table procedure; G. a rollback procedure; H. a cleanup procedure for expired data.

A. The extraction procedure pulls data from the source tables into temporary tables, that is, tables holding only the data the task needs. These "temporary" tables are physical tables named with an _TMP suffix.

The temporary-table approach is used because table joins are very expensive in Oracle, especially a left join + left join across multiple tables; a temporary table keeps only the necessary columns and rows in a small number of data blocks.
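For instance (table and column names are assumptions), the extraction step might look like:

```sql
-- Assumed example: pull yesterday's needed rows/columns into a small _TMP table.
TRUNCATE TABLE order_day_tmp;

INSERT /*+ APPEND */ INTO order_day_tmp (user_id, prod_id, amount)
SELECT user_id, prod_id, amount
  FROM order_source
 WHERE order_date >= TRUNC(SYSDATE) - 1
   AND order_date <  TRUNC(SYSDATE);
COMMIT;
```

All later joins then touch this small table instead of the full source table.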

B. The list procedure.

The list holds detail data that must be kept temporarily for 1-3 months. Its main purpose is de-duplication: the number of distinct users in a month cannot be derived from the daily user counts. Name list tables with an _LST suffix and partition them by month, day, or hour.
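A sketch of such a list table and the monthly de-duplicated count (table and column names are assumptions):

```sql
-- Assumed example: a day-partitioned _LST table holding login detail.
CREATE TABLE login_day_lst (
  stat_date DATE,
  user_id   NUMBER,
  provcode  VARCHAR2(10)
)
PARTITION BY RANGE (stat_date) (
  PARTITION p20120601 VALUES LESS THAN (TO_DATE('2012-06-02', 'YYYY-MM-DD')),
  PARTITION p20120602 VALUES LESS THAN (TO_DATE('2012-06-03', 'YYYY-MM-DD'))
  -- ... one partition per day
);

-- Monthly distinct users must be de-duplicated over the list,
-- not summed from the daily counts:
SELECT COUNT(DISTINCT user_id)
  FROM login_day_lst
 WHERE stat_date >= TRUNC(SYSDATE, 'MM');
```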

C. The daily-data procedure computes its statistics from the list or, if there is no list, directly from the extracted temporary table.

D. The weekly procedure is troublesome, especially for weeks that cross a year boundary. If the week does not need de-duplication, it can be aggregated directly from the daily data.

E. The monthly procedure works the same way.

F. The result-table procedure updates the result table. To avoid affecting the leads' queries, all data is first assembled in a temporary summary table and only then moved into the result table.

G. The rollback procedure is important. Its job is to implement an application-level undo, because relying on Oracle's own undo mechanism is very slow.

When a monthly report is accumulated day by day (because the list would otherwise be too large, keep too much data for a month, or take too long to scan over a whole month), you can run the report once a day and add each day's increment to the monthly figure.

Something similar to: update t set value = value + new_value;

In such a scenario, a failure mid-run leaves the data inconsistent: if the operation fails after only 30% of the rows were updated, you cannot simply rerun it. Therefore the increment is stored in a rollback table before the update, and each run first calls the rollback procedure and checks the rollback flag.

If the previous run ended abnormally, the corresponding increments are pulled back out and reversed: update t set value = value - new_value;
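A minimal sketch of this manual-undo idea, with assumed names (rollback_tab, month_rpt, month_delta_tmp) and an assumed task_id of 101:

```sql
-- Assumed example. Step 1: before applying the daily increment,
-- save it in the rollback table with a PENDING flag.
INSERT INTO rollback_tab (task_id, key_id, new_value, flag)
SELECT 101, key_id, new_value, 'PENDING'
  FROM month_delta_tmp;
COMMIT;

-- Step 2 (start of every run): if a previous run left PENDING rows,
-- reverse its half-applied increments first.
UPDATE month_rpt m
   SET m.value = m.value - (SELECT r.new_value
                              FROM rollback_tab r
                             WHERE r.task_id = 101
                               AND r.key_id  = m.key_id
                               AND r.flag    = 'PENDING')
 WHERE EXISTS (SELECT 1
                 FROM rollback_tab r
                WHERE r.task_id = 101
                  AND r.key_id  = m.key_id
                  AND r.flag    = 'PENDING');

-- Step 3: after the increment is applied successfully, mark the rows DONE.
UPDATE rollback_tab SET flag = 'DONE' WHERE task_id = 101 AND flag = 'PENDING';
COMMIT;
```

A real implementation would also have to track exactly which rows the failed run updated; this sketch assumes the whole increment is reversed and then re-applied.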

H. The cleanup procedure mainly cleans the list tables that are retained only for a period of time.

Before each procedure runs, clear its temporary tables with truncate table xxxxx_tmp. If a list or target table is involved, use delete from table_name where yyyy = xxxx so a rerun does not create duplicated data.

 

6. Cursor batch processing

Because the data volume is large, running to tens of millions of rows, it cannot all be committed at once, as in: insert into table_name (xx, yyy, zz, hhh) select xx, yy, zz, hh from table_tmp left join table_tmp2;

Instead, use a cursor with bulk fetches and commit batch by batch:

declare
  cursor cur_day_result is  -- computes the monthly login users and login counts
    select provcode from table_b group by provcode;

  type type_provcode is table of oss_openplat_truslogin_day_lst.provcode%type
    index by binary_integer;
  l_ary_provcode type_provcode;
begin
  open cur_day_result;
  loop
    fetch cur_day_result bulk collect
      into l_ary_provcode
      limit g_batch_size_n;  -- the number of rows fetched per batch is controlled here
    exit when l_ary_provcode.count = 0;

    forall i in 1 .. l_ary_provcode.count
      insert into login_day_lst (provcode)
      values (l_ary_provcode(i));

    commit;  -- commit one batch of the data here
  end loop;
  close cur_day_result;
end;

 

7. Complex requirements:

Common requirements include the number of users who purchased in three consecutive months; growth rates, i.e. comparing a day with the same day of the previous month or of the previous year (month-over-month and year-on-year); and monthly cumulative values.

Using merge into or update for these is slow; use insert and delete directly.

For example, a summary row is: date, category 1, category 2, category 3, statistical value, monthly cumulative value.

The data is produced by the daily and the monthly data procedures:

Daily: date, category 1, category 2, category 3, statistical value;

Monthly: date, category 1, category 2, category 3, monthly cumulative value;

Each procedure inserts into the summary table (date, category 1, category 2, category 3, statistical value, monthly cumulative value), setting the column that is not its own to 0. The daily procedure, for instance, runs: insert into summary_table (date, category 1, category 2, category 3, statistical value, monthly cumulative value) select date, category 1, category 2, category 3, statistical value, 0 from table_day_tmp;

When moving to the result table, the two halves are summed back together:

select date, category 1, category 2, category 3, sum(statistical value), sum(monthly cumulative value) from summary_table group by date, category 1, category 2, category 3;
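Put together, with assumed names (summary_tmp, result_tab, table_day_tmp, table_month_tmp), the pattern is:

```sql
-- Assumed example of the INSERT-then-aggregate pattern.
-- Daily procedure: its own measure, the other column set to 0.
INSERT INTO summary_tmp (stat_date, cat1, cat2, cat3, day_value, month_acc)
SELECT stat_date, cat1, cat2, cat3, day_value, 0 FROM table_day_tmp;

-- Monthly procedure: the mirror image.
INSERT INTO summary_tmp (stat_date, cat1, cat2, cat3, day_value, month_acc)
SELECT stat_date, cat1, cat2, cat3, 0, month_acc FROM table_month_tmp;

-- Move to the result table, merging the halves by summation.
INSERT INTO result_tab (stat_date, cat1, cat2, cat3, day_value, month_acc)
SELECT stat_date, cat1, cat2, cat3, SUM(day_value), SUM(month_acc)
  FROM summary_tmp
 GROUP BY stat_date, cat1, cat2, cat3;
COMMIT;
```

This avoids merge into and update entirely, at the cost of one extra pass over the summary table.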

 

8. Wide tables: row-to-column conversion

The idea is to increase the number of columns in order to reduce the number of rows, for example to satisfy the report on the number of users who purchased in three consecutive months.

Suppose we have a user table and a user purchase record table, with a relatively small user base of 1 million users and about 200 million purchase records over 12 months, i.e. more than ten million rows per month on average.

Computing the three-month user statistics from three months of raw records may be slow.

Instead, build a wide table with one row per user: purchases in month 1, purchases in month 2, ..., purchases in month 12, first purchase time, last purchase time.

This table has only 1 million rows.

select user_id
from wide_table
where january_purchases > 0 and february_purchases > 0 and march_purchases > 0;
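A sketch of building and querying the wide table (table and column names are assumptions):

```sql
-- Assumed example: pivot 12 months of purchase records into one row per user.
INSERT INTO user_buy_wide (user_id, m01_cnt, m02_cnt, m03_cnt /* ... m12_cnt */)
SELECT user_id,
       SUM(CASE WHEN TO_CHAR(buy_date, 'MM') = '01' THEN 1 ELSE 0 END),
       SUM(CASE WHEN TO_CHAR(buy_date, 'MM') = '02' THEN 1 ELSE 0 END),
       SUM(CASE WHEN TO_CHAR(buy_date, 'MM') = '03' THEN 1 ELSE 0 END)
  FROM buy_record
 GROUP BY user_id;
COMMIT;

-- "Bought in three consecutive months" is then a scan of ~1M rows:
SELECT user_id
  FROM user_buy_wide
 WHERE m01_cnt > 0 AND m02_cnt > 0 AND m03_cnt > 0;
```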

 

9. Report levels

Running every report to completion before the workday starts is a fairly difficult goal. With a very small data volume, say 20 GB, one machine with RAID, 32 GB of memory, 8 CPUs, and multiple hard disks can indeed meet it; once the data grows far beyond that, it becomes troublesome.

Therefore reports should be prioritized into levels:

A-level reports finish before the workday starts. These are the company's core business reports, viewed by the leadership: the boss, the CTO, the CEO.

B-level reports finish before noon; they cover what the department leads care about.

C-level reports, for ordinary employees, finish during the afternoon.

D-level reports, such as monitoring reports, run at night.

 

10. RAC cluster

RAC by itself does not improve performance; the key to using RAC is to divide the work across the nodes:

Node A is the primary management node.

Node B performs data extraction and synchronization. If the data volume is large, extraction must run around the clock and synchronize continuously.

Node C is the report node; it mainly runs the task procedures of the individual reports.

Node D is the page node. Reports are rendered as HTML, and the page server must access the database through this dedicated node so that other workloads cannot affect it.

Node E is for ad-hoc work: manual queries, data checks, and data fixes.

 

Node A, as the RAC management node, is responsible for block and lock handling across the whole cluster; to avoid hurting performance, a dedicated node should handle this cluster-wide coordination.

Node B must perform data insertion 24 hours a day.

Node C is the heavyweight node; it should use a machine several times more powerful than the others. With more memory it can keep large numbers of data blocks in RAM instead of spilling to the Linux swap partition.

Node D determines how fast the leads, the boss, and colleagues can open the report pages. If it were combined with other nodes, its memory would easily be consumed by their tasks.

 

11. Partition tables

Generally two levels of partitioning (composite partitioning) are enough. In some cases three levels are needed: a physical monthly table, with day partitions and list subpartitions under it. A physical monthly table is named manually, e.g. table_201206.

This means new tables must be created continually, and stored procedures must obtain the table name from the data dictionary when accessing them, or build the name by concatenating the date and then use dynamic SQL.

Oracle's guideline is to partition any table larger than 2 GB. How small should a single partition be? That depends on machine performance and I/O throughput, and on how long a full scan of one partition you are willing to tolerate.

If a full scan of a 1 GB partition takes 10 minutes, that is unacceptable; size the partitions so that one can be fully scanned in about a minute.
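Since the monthly table name changes every month, the procedures end up using dynamic SQL. A minimal sketch (the table-name prefix is an assumption):

```sql
-- Assumed example: access the current physical monthly table via dynamic SQL.
DECLARE
  v_tab VARCHAR2(30) := 'table_' || TO_CHAR(SYSDATE, 'YYYYMM');  -- e.g. table_201206
  v_cnt NUMBER;
BEGIN
  EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM ' || v_tab INTO v_cnt;
  DBMS_OUTPUT.PUT_LINE(v_tab || ': ' || v_cnt || ' rows');
END;
```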

 

12. Indexes

Creating indexes on the working tables is generally not recommended. The multi-level partition tables are read with full scans; an index would make access slower than a full scan, and index maintenance during large-scale data updates is expensive, which noticeably lengthens every report's run time.

Most indexes belong on the result tables, because the result tables have the least data inserted and the lowest update frequency, so index maintenance cost is lowest and query efficiency is highest.
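For example (table and column names are assumptions), an index only on the columns the report pages filter by:

```sql
-- Assumed example: index the result table on its query columns only.
CREATE INDEX idx_result_date_cat ON result_tab (stat_date, cat1);
```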
