Thread 1 cannot allocate new log problem analysis (reproduced)

Source: Internet
Author: User

Thread 1 cannot allocate new log Problem Analysis

In the event of an Oracle crash, the alert file reports the following error:


Fri Jan 12 04:07:49 2007
Thread 1 cannot allocate new log, sequence 187398
Checkpoint not complete

Cause analysis:

The background process of ckpt is to do the checkpoint. One of the conditions that the checkpoint is triggered is the occurrence of the redo log switch. The specific work of the checkpoint includes:
• Trigger dbwn to write dirty data to the disk.
• Update the checkpoint information to the datafile header.
• Update the checkpoint information to the control file.

One of the tasks that checkpoint does is to trigger dbwn to cache the dirty disk in the buffer cache. In addition, it updates the SCN of the latest system to the datafile header and control file (each transaction has an SCN ), the purpose of the first task is to reduce the recovery time required due to sudden system downtime. The second task is to ensure Database Consistency. The redo log switch is the main event that triggers the checkpoint. When the first redo log is used up, Oracle will stop using the current redo log and use another redo log, this is called Log switch. While log switch triggers checkpoint. The minimum redo group required by Oracle is 2, but we generally recommend that you configure three or more redo log groups. Suppose we only have two redo log groups: group 1 and group 2, and there is always a large number of dirty blocks in the system that need to be written to datafile. When we switch from group 1 to group 2, the checkpoint is triggered. The checkpoint requires dbwn to write the dirty block in the buffer cache to datafile. However, when we use up the space in group 2 again, when you need to switch to group 1 again and reuse group 1, if we find that the dirty blocks protected by redo log group 1 are not completely written to datafile, the whole database must wait for dbwn to write all dirty blocks to datafile before doing other things. This is the "Checkpoint not complete" problem we encountered. This problem often implies that there is a problem with the redo log configuration. In this example, either the redo log is too small or there are only two redo log groups. The solution to this problem is to increase the redo log or add more redo log groups. No matter which solution, we aim to win more time for dbwn.

(

Other explanations are as follows:
When the system needs to reuse a log file, the system needs to write the dirty block in the buffer cache included in the log file to the corresponding data file. For a database operation, it may generate only dozens of bytes of Redo, but it is indeed a block (usually 8 K) in the buffer cache ). Therefore, for a log file that is only several hundred MB, the buffer cache it protects may be several GB
In the event of "thread 1 cannot allocate new log", it indicates that the system's checkpoint has not been completed, that is, the dirty data in the buffer cache has not been completely written to the data file, A large number of logs need to be written to the system. The system can only notify the application that the checkpoint has not been completed and you can only wait. At this time, the system is basically in the Hang state. When the database waits on checkpoints, redo generation is stopped until
Log switch is done
If we check the system information at this time, we will find that most of the logs in the V $ log are in the active status, and many log file switch events occur in the V $ session_wait.

)


Solution:

A. Add more log files

B. increase the frequency of checkpoint triggering

C. reduce the size of the redo log

D. Improve dbwr Efficiency

E. You can set

Log_checkpoint_interval = 0 log_checkpoint_timeout = 0 log_checkpoints_to_alert = true

References:

This topic enables DBAs to have a better understanding of the checkpoint and checkpoint optimization parameters:
-Fast_start_mttr_target
-Log_checkpoint_interval
-Log_checkpoint_timeout
-Log_checkpoints_to_alert

It also explains how to interpret and process the data that appears in alert <Sid>. log file.
Checkpoint error "'checkpoint not complete' and 'cannot allocate new log"

What is checkpoint?

Checkpoint is a database event used to synchronize modified data blocks with disk data files in the memory. It provides
Means to maintain data consistency after the transaction is committed. The mechanism for writing dirty data to Oracle disks is not synchronous with transaction commit.

Checkpoint has two targets: 1. ensure data consistency. 2. enables the database to be quickly restored. How to quickly restore it?
Because the database will set checkpoints for all changes in the data file and keep increasing it, it does not need to request checkpoints
The previous redo log. Checkpoint ensures that all data in the cache is written to the corresponding data file to prevent unexpected instances.
Data loss caused by failure.

Oracle writes this dirty data only under certain conditions:
The subsequent process requires the size of 1/4 db_block_buffer Parameters
Every three seconds
When a checkpoint is generated

A checkpoint has five event types:
Switching of each redo log
The arrival of the delay parameter log_checkpoint_timeout.
The corresponding bytes (log_checkpoint_interval * size of Io OS blocks) are written to the current redo log.

Io OS blocks: # fstyp-V/dev/vg00/lvol1 in UNIX
Vxfs
Version: 5
F_bsize: 8192

Alter system switch logfile command will directly cause checkpoint
Alter system checkpoint

The following process may occur during the Checkpoint Process:
Dbwr writes all dirty data to the data file
Lgwr updates the SCN of control files and data files

Checkpoints and Optimization
Checkpoints is a difficult point for database optimization. Frequent checkpoints can achieve rapid recovery, but it also makes the performance
Decrease. How does DBA solve this problem?

Depending on the number of database data files, a checkpoint may be high-speed. Because all data files are stored in the checkpoint
Will be frozen during this period. More frequent checkpoints can quickly restore the database. This is also the reason for the customer's tolerance for unexpected system downtime.
However, in some special cases, frequent checkpoints cannot guarantee quick recovery. Let's assume that the database is at 95% of the time.
Internal is normal, 5% the instance fails to be used, and recovery is required. Most customers prefer a 95% adjustment.
Performance rather than 5% downtime.

This assumption shows that performance is the first priority, so our goal is to reduce the checkpoints frequency during optimization.

Optimization of checkpoints includes four key initialization parameters:
-Fast_start_mttr_target
-Log_checkpoint_interval
-Log_checkpoint_timeout
-Log_checkpoints_to_alert

Each parameter is described in detail:
Fast_start_mttr_target

Since Oracle9i, The fast_start_mttr_target parameter is the preferred method for adjusting the checkpoint.
Fast_start_mttr_target can specify the number of seconds required to restore a single instance. Increase based on internal statistics
Checkpoint automatically adjusts the checkpint target to meet the needs of fast_start_mttr_target.
V $ instance_recovery.estimated_mttr displays the estimated number of seconds to restore. This value will be displayed
Even if fast_start_mttr_target is not specified.
V $ instance_recovery.target_mttr indicates the MTTR target in a short period of time.
V $ mttr_target_advice: displays the number of I/O and other I/O generated by the workload set by the current MTTR.
This view helps you assess the balance between optimization and recovery.

Log_checkpoint_interval

The log_checkpoint_interval parameter specifies the maximum number of redo blocks at intervals.
If fast_start_mttr_target is specified, log_checkpoint_interval cannot be set to 0.

In most Unix systems, the OS block size is 512 bytes. Setting log_checkpoint_interval = 10000 means
The increasing checkpoint cannot be appended to the current log because it is more than 5 MB. If your redo log is 20 mb
Issue four checkpoints for each redo log.

Log_checkpoint_interval will affect the setting of this parameter when a checkpoint occurs,
Keep it changing with the size of the redo log file. Frequent checkpoints are one of the reasons that affect database recovery.
A short checkpoint interval means that the database will be quickly restored and resource utilization is increased.

This parameter also affects the database rollback time. The actual recovery time is based on this time. Of course, there are also failure types and
The number of logs to be archived.

Log_checkpoint_timeout
This parameter specifies the interval at which the checkpoint is sent. In other words, it specifies the amount of time dirty data is written.

The checkpoint frequency affects the database recovery time. A long interval requires a longer time for database recovery.
We recommend that you use log_checkpoint_interval to control the checkpoint instead of log_checkpoint_timeout.
, Log_checkpoint_timeout sends a checkpoint every n seconds regardless of the transaction commit frequency. This may cause some
There is no necessary checkpoint when the transaction has changed. Unnecessary checkpoints must be avoided.

There is another easy misunderstanding: log_checkpoint_timeout will issue a log switch occasionally.
The log switch triggers a checkpoint, but the checkpoint does not lead to a log switch. The only manual method
Alter system switch logfile or reset the redo logs size can cause frequent switch.

The size of online redo logs is critical for optimization and recovery.

Solution:

If redo logs switch every 3 minutes, you will see performance degradation.
This indicates the redo logs are not sized large enough to efficiently handle
The transaction load.

Size of the redolog files.
Using statspack to determine Checkpointing Problems

Statspack snapshots can be taken every 15 minutes or so, these reports gather useful
Information about number of checkpoints started and checkpoints completed and number
Of database buffers written during Checkpointing for that window of time. It also contains
Statistics about redo activity. Gathering and comparing these snapshot reports gives you
A complete idea about Checkpointing performance at different periods of time.

Another important thing to watch in statspack report is the following wait events,
They cocould be a good indication about problems with the redo log throughput and Checkpointing:

Log File switch (checkpoint incomplete)
Log File switch (archiving needed)
Log File switch/archive
Log File switch (clearing log file)
Log File switch completion
Log switch/archive
Log File sync


In the case when one or more of the above wait events is repeated frequently
With considerable values then you need to take an action like adding more
Online redo log files or increasing their sizes and/or modifying Checkpointing parameters.

Thread 1 cannot allocate new log problem analysis (reproduced)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.