Real-Time System and garbage collection
Real-time (RT) application development differs from general-purpose application development in that it imposes time restrictions on parts of the runtime behavior. Such restrictions typically apply to specific sections of an application, such as an interrupt handler, where the code that responds to the interrupt must complete its work within a given time period. In a hard RT system, such as a heart monitor or a defense system, a missed deadline counts as a catastrophic failure of the entire system. In a soft RT system, a missed deadline can have adverse effects (for example, a GUI that fails to display all the results of a stream it is monitoring) but does not constitute a system failure.
In Java applications, the Java Virtual Machine (JVM) is responsible for optimizing runtime behavior, managing the object heap, and interfacing with the operating system and hardware. Although this management layer between the language and the platform simplifies software development, it also adds a certain amount of overhead to programs. Garbage collection (GC) is one example: it typically causes nondeterministic pauses in the application. Both the frequency and the duration of these pauses are unpredictable, which has traditionally made the Java language unsuitable for RT application development. Some existing solutions based on the Real-Time Specification for Java (RTSJ) let developers sidestep the nondeterministic aspects of Java technology, but they require changes to the existing programming model.
Metronome is a deterministic garbage collector that provides bounded, low pause times and a specified application utilization for standard Java applications. The reduced, bounded pause times result from an incremental approach to collection and from careful engineering decisions, including fundamental changes to the VM. Utilization is the percentage of time in a given time window during which the application is allowed to run; the remaining time is devoted to GC. Metronome lets you specify the level of utilization that an application receives. Combined with the RTSJ, Metronome enables developers to build software that is deterministic with low pause times, and pause-free where timing windows are critically small. This article explains the limitations of traditional GC for RT applications, describes Metronome's approach, and presents tools and guidance for developing hard RT applications with Metronome.
Traditional GC
Traditional GC implementations use a stop-the-world (STW) approach to reclaim heap memory. The application runs until the heap runs out of free memory. At that point the GC stops all application code, performs a garbage collection, and then lets the application continue.
Figure 1 illustrates the traditional STW pauses for GC activity, which are generally unpredictable in both frequency and duration. Traditional GC is nondeterministic because the amount of work required to reclaim memory depends on the total number and size of the objects that the application uses, on the interconnections between those objects, and on how much work is needed to free enough heap memory to satisfy future allocations.
Figure 1. Traditional GC pauses
Why traditional GC is nondeterministic
Examining the basic components of GC shows why GC times are unbounded and unpredictable. A GC pause usually consists of two distinct phases: the mark phase and the sweep phase. Many implementations combine or modify the meaning of these phases, enhance GC by other means (such as compaction to reduce fragmentation in the heap), or run certain phases concurrently with the application, but these two concepts remain the technical baseline of traditional GC.
The mark phase traces through all objects visible to the application and marks them as live so that their storage is not reclaimed. Tracing starts from the root set, which consists of internal structures such as thread stacks and global references to objects. It then traverses the chain of references until every object that is directly or indirectly reachable from the root set has been marked. Objects that are still unmarked at the end of the mark phase are unreachable by the application (dead objects), because no path leads to them from the root set through any sequence of references. The length of the mark phase is unpredictable because neither the number of live objects in the application at a particular time nor the cost of traversing all references to find them can be predicted. An oracle in a consistently behaving system could predict the time requirements from previous timing characteristics, but the accuracy of those predictions would be yet another source of nondeterminism.
The sweep phase examines the heap after marking has completed and returns the storage of dead objects to the heap's free store, making that storage available for allocation. As with the mark phase, the cost of sweeping dead objects back into the free memory pool cannot be predicted completely. Although the number and size of live objects in the system can be derived from the mark phase, analyzing their position in the heap and their suitability for the free memory pool can require an unpredictable amount of work.
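The two phases can be illustrated with a toy model. The following is a minimal sketch, not the collector of any real JVM: the ToyObject and ToyMarkSweep names, the list-based "heap", and the single mark bit are all assumptions made for illustration. It shows why the work done depends on the shape of the object graph (mark) and on the size of the whole heap (sweep).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy heap object: a name, a mark bit, and outgoing references.
class ToyObject {
    final String name;
    final List<ToyObject> refs = new ArrayList<>();
    boolean marked;

    ToyObject(String name) { this.name = name; }
}

class ToyMarkSweep {
    // Mark phase: trace everything reachable from the root set.
    // The cost grows with the number of live objects and references.
    static void mark(List<ToyObject> roots) {
        Deque<ToyObject> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            ToyObject o = work.pop();
            if (o.marked) continue;      // already visited
            o.marked = true;
            work.addAll(o.refs);         // follow the reference chain
        }
    }

    // Sweep phase: walk the whole heap, reclaim unmarked (dead) objects,
    // and clear the mark bits of survivors for the next cycle.
    static int sweep(List<ToyObject> heap) {
        int reclaimed = 0;
        for (Iterator<ToyObject> it = heap.iterator(); it.hasNext();) {
            ToyObject o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                it.remove();
                reclaimed++;
            }
        }
        return reclaimed;
    }

    // One stop-the-world cycle: mark, then sweep. Returns the number of objects freed.
    static int collect(List<ToyObject> heap, List<ToyObject> roots) {
        mark(roots);
        return sweep(heap);
    }
}
```

Because the collector traces from the roots rather than counting references, two dead objects that refer to each other are still reclaimed.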
Suitability of traditional GC for RT applications
RT applications must be able to respond to real-world stimuli within deterministic time intervals. Traditional GC cannot meet this requirement because the application must halt while the GC reclaims unused memory. The time taken for reclamation is unbounded and subject to fluctuation. In addition, the moment at which the GC interrupts the application is traditionally unpredictable. The length of time for which the application is halted is called the pause time, because application progress is paused while the GC reclaims free space. RT applications require low pause times because the pause time usually represents the upper timing bound on application responsiveness.
Metronome GC
Metronome's approach is to divide the time consumed by a GC cycle into a series of increments called quanta. To this end, each phase is designed to complete its total work in a series of discrete steps, which allows the collector to:
- Preempt the application for a very short, deterministic period.
- Make forward progress in the collection.
- Let the application resume.
This sequence contrasts with the traditional model, in which the application is halted at unpredictable points, the GC runs to completion for an unbounded period, and the GC then quiesces so that the application can resume.
Although splitting the STW GC cycle into short, bounded pauses helps reduce the GC's impact on the application, it is not sufficient for RT applications. For an RT application to meet its deadlines, a sufficient portion of any given time period must be devoted to the application; otherwise, its requirements are violated and the application fails. For example, suppose GC pauses are bounded at 1 millisecond. If the application is allowed to run for only 0.1 millisecond between consecutive 1-millisecond GC pauses, it receives roughly 9% of the processor time (0.1 of every 1.1 milliseconds), makes little progress, and even marginally complex RT systems are likely to fail for lack of processing time. In effect, short pauses that are sufficiently close together are no different from a full STW GC.
Figure 2 illustrates a scenario in which the GC runs for most of the time yet still keeps each pause to 1 millisecond:
Figure 2. Short pause times but even shorter application time
Utilization
In addition to bounded pause times, a different measure is needed to give a level of determinism to the percentage of time allotted to the application and to the GC. We define application utilization as the percentage of time allotted to the application in a given time window that slides continuously over the application's complete run. Metronome guarantees that a certain percentage of processing time is dedicated to the application. Use of the remaining time is at the GC's discretion: it can be given to the application or used by the GC. Short pause times allow finer-grained utilization guarantees than a traditional collector can offer. As the time interval used to measure utilization approaches zero, the application's expected utilization is either 0% or 100%, because the interval is smaller than a GC quantum. The utilization guarantee therefore applies strictly to measurements the size of the sliding window. Metronome uses quanta of 500 microseconds over a 10-millisecond window, with a default utilization target of 70%.
Figure 3 shows a GC cycle divided into multiple 500 microsecond time slices. The cycle has a 70% utilization rate within the 10 ms time window:
Figure 3. Sliding Window Utilization
In Figure 3, each time slice represents a quantum that runs either the GC or the application. The bars below the time slices represent the sliding window. Any sliding window contains at most 6 GC quanta and at least 14 application quanta. Each GC quantum is followed by at least one application quantum, even if the target utilization would still be preserved with back-to-back GC quanta. This ensures that application pause times are limited to the length of one quantum. However, if the specified target utilization is below 50%, some back-to-back GC quanta will occur so that the GC can keep up with allocation.
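The sliding-window arithmetic can be checked with a few lines of code. The sketch below is an illustration only: the SlidingWindowUtilization class and its boolean schedule representation (true for a GC quantum, false for an application quantum) are assumptions, not part of Metronome. With 500-microsecond quanta, a 10-millisecond window holds 20 quanta, so 6 GC quanta in a window correspond to 70% utilization.

```java
// Hypothetical helper for reasoning about quantum schedules.
class SlidingWindowUtilization {

    // schedule[i] is true when quantum i ran the GC, false when it ran the application.
    // Returns the lowest application utilization (0.0 to 1.0) over any full window.
    static double minUtilization(boolean[] schedule, int windowQuanta) {
        if (windowQuanta <= 0 || schedule.length < windowQuanta) {
            throw new IllegalArgumentException("schedule shorter than one window");
        }
        int gcInWindow = 0;
        for (int i = 0; i < windowQuanta; i++) {
            if (schedule[i]) gcInWindow++;
        }
        int maxGc = gcInWindow;
        // Slide the window one quantum at a time.
        for (int i = windowQuanta; i < schedule.length; i++) {
            if (schedule[i]) gcInWindow++;
            if (schedule[i - windowQuanta]) gcInWindow--;
            maxGc = Math.max(maxGc, gcInWindow);
        }
        return 1.0 - (double) maxGc / windowQuanta;
    }

    // Longest run of back-to-back GC quanta, that is, the longest pause in quanta.
    static int longestGcRun(boolean[] schedule) {
        int longest = 0;
        int current = 0;
        for (boolean gc : schedule) {
            current = gc ? current + 1 : 0;
            longest = Math.max(longest, current);
        }
        return longest;
    }
}
```

A schedule that alternates GC and application quanta six times and then runs only the application keeps every 20-quantum window at or above 70% and never pauses for more than one quantum. Eight back-to-back GC quanta would drop the worst window to 60% and produce a pause of eight quanta.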
Figures 4 and 5 illustrate a typical application utilization scenario. In Figure 4, the region where utilization drops to 70% corresponds to an ongoing GC cycle. Note that when the GC is inactive, application utilization is 100%.
Figure 4. Total Utilization
Figure 5 shows only the GC cycle fragment of Figure 4:
Figure 5. GC cycle Utilization
Section A in Figure 5 is a staircase graph in which the descending portions correspond to GC quanta and the flat portions correspond to application quanta. The staircase shows the GC achieving low pause times by interleaving with the application, which produces a step-like descent toward the target utilization. Section B consists of application activity only, so that the utilization target is preserved across all sliding windows. A utilization pattern that shows GC activity only at the beginning of the pattern is common. It occurs because the GC runs whenever it is allowed to (while preserving pause times and utilization), which usually means that it exhausts its allotted time at the beginning of the pattern and lets the application recover for the remainder of the time window. Section C shows GC activity when utilization is close to the target utilization. The ascending portions represent application quanta, and the descending portions represent GC quanta. The sawtooth shape of this section is again due to the GC interleaving with the application to preserve low pause times. Section D represents the portion after the GC cycle has completed. Its upward trend shows that the GC is no longer running and that the application will regain 100% utilization.
The target utilization is user-specifiable in Metronome. The section on tuning Metronome later in this article provides more information.
Running an application with Metronome
Metronome is designed to provide RT behavior to existing applications. No modification of user code should be required. Both the desired heap size and the target utilization must be tuned to the application, so that the target utilization maintains the desired application throughput while the GC keeps up with allocation. Run the application at the heaviest load you want to sustain to verify that its RT characteristics are preserved and that application throughput is sufficient. The section on tuning Metronome later in this article describes what you can do if throughput or utilization is insufficient. In some situations, Metronome's short pause-time guarantees are not enough for an application's RT requirements. In those cases, you can use the RTSJ to avoid GC-induced pause times.
The Real-Time Specification for Java
The RTSJ is a specification for additions to the Java platform that enable Java programs to be used for real-time applications. Metronome must be aware of certain aspects of the RTSJ, in particular RealtimeThread (RT thread), NoHeapRealtimeThread (NHRT), and immortal memory. RT threads are Java threads that, among other characteristics, run at a higher priority than regular Java threads. NHRTs are RT threads that cannot hold references to heap objects. In other words, objects accessible to an NHRT cannot refer to objects that are subject to GC. In exchange for this compromise, the GC does not impede the scheduling of NHRTs, even during a GC cycle, which means that NHRTs incur no GC pause times. Immortal memory provides a memory space that is not subject to GC, so NHRTs are allowed to reference immortal objects. These are only some aspects of the RTSJ; for a link to the complete specification, see the references.
Technical issues involved in deterministic GC
Metronome uses several key approaches in the J9 virtual machine to achieve deterministic pause times while guaranteeing GC safety. These include arraylets, time-based scheduling of the garbage collector, root-structure processing for tracing live objects, coordination between the J9 virtual machine and the GC to ensure that all live objects are found, and the mechanism used to suspend the J9 virtual machine to give the GC a quantum.
Arraylets
Although Metronome achieves deterministic pause times by splitting the collection process into incremental units of work, allocation can still cause hiccups in the GC in some situations. The allocation of large objects is one example. In most collector implementations, the allocation subsystem keeps a pool of free heap memory, which the application consumes by allocating objects and which the collector replenishes by sweeping. After the first collection, free heap memory consists mostly of objects that were once live but are now dead. Because there is no predictable pattern to how or when these objects die, the free memory on the heap is a fragmented collection of chunks of different sizes, even when adjacent dead objects are coalesced. In addition, each collection cycle can return a different pattern of free chunks. As a result, the allocation of a sufficiently large object can fail if no single free chunk is large enough to satisfy the request. These large objects are usually arrays. Standard objects generally have no more than a few dozen fields and typically occupy less than 2 KB on most JVMs.
To alleviate fragmentation, some collectors implement a compaction or defragmentation phase in their collection cycle. After sweeping, if an allocation request cannot be satisfied, the system tries to move existing live objects around in the heap in order to coalesce two or more free chunks into a single larger one. This phase is sometimes implemented as an on-demand feature, built into the fabric of the collector (as in a semi-space collector), or implemented incrementally. Each of these designs has its own trade-offs, but in general the compaction phase is expensive in both time and effort.
The current version of Metronome in WebSphere Real Time does not implement a compaction system. To keep fragmentation from becoming a problem, Metronome uses arraylets to break the standard linear representation of an array into several discrete pieces that can be allocated independently of one another.
Figure 6 shows an array object as a spine (the central object and the only entity that other objects on the heap can reference) and a series of arraylet leaves, which contain the actual array contents:
Figure 6. Arraylet
The arraylet leaves are not referenced by other heap objects and can be scattered throughout the heap in any position and order. The leaves have a fixed size, which allows a simple calculation of an element's position at the cost of one added indirection. As Figure 6 shows, the memory overhead caused by internal fragmentation in the spine is reduced by placing any trailing data for the leaves in the spine itself.
Note that this format means that an array spine can grow to an unbounded size, but this has not been found to be a problem in the existing system.
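The spine-and-leaves layout can be mimicked at the Java level to show the index calculation. This is a minimal sketch under stated assumptions: ArrayletIntArray is a hypothetical class, the leaf size is a constructor parameter chosen for illustration, and the sketch does not reproduce the optimization that stores trailing data in the spine. The real arraylets live inside the J9 virtual machine and are invisible to application code.

```java
// Hypothetical user-level model of an arraylet-backed int array.
class ArrayletIntArray {
    private final int length;     // logical number of elements
    private final int leafSize;   // elements per leaf (fixed for all leaves)
    private final int[][] spine;  // the spine holds one reference per leaf

    ArrayletIntArray(int length, int leafSize) {
        if (length < 0 || leafSize <= 0) {
            throw new IllegalArgumentException("invalid length or leaf size");
        }
        this.length = length;
        this.leafSize = leafSize;
        int leaves = (length + leafSize - 1) / leafSize;   // round up
        spine = new int[leaves][];
        for (int i = 0; i < leaves; i++) {
            // Each leaf is a separate, small allocation, so no single
            // large contiguous chunk of free memory is needed.
            spine[i] = new int[leafSize];
        }
    }

    int get(int index) {
        check(index);
        // Fixed leaf size makes the position a divide and a remainder,
        // plus one extra indirection through the spine.
        return spine[index / leafSize][index % leafSize];
    }

    void set(int index, int value) {
        check(index);
        spine[index / leafSize][index % leafSize] = value;
    }

    int length() { return length; }

    int leafCount() { return spine.length; }

    private void check(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

For example, an array of 10 elements with a leaf size of 4 needs three leaves, and element 9 is found in leaf 2 at offset 1.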
Scheduling a GC quantum
To schedule deterministic pauses for the GC, Metronome uses two different threads to achieve both consistent scheduling and short, uninterrupted pause times:
- Alarm thread. To schedule a GC quantum deterministically, Metronome dedicates an alarm thread to act as a heartbeat mechanism. The alarm thread has a very high priority (higher than any other JVM thread in the system), wakes up at the same rate as the GC quantum period (500 microseconds in Metronome), and determines whether a GC quantum should be scheduled. If so, the alarm thread must suspend the running JVM and wake the GC thread. The alarm thread is active for only a very short time (typically less than 10 microseconds) and goes unnoticed by the application.
- GC thread. The GC thread performs the actual work during a GC quantum. It must first complete the suspension of the JVM that the alarm thread initiated. It can then perform GC work for the remainder of the quantum. As the end of the quantum approaches, it puts itself back to sleep and resumes the JVM. The GC thread can also go to sleep early if it cannot complete its next work item before the quantum ends. With respect to the RTSJ, the GC thread's priority is higher than that of all RT threads except NHRTs.
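The decision that the alarm thread makes on each heartbeat can be sketched as a small state machine. The article does not describe Metronome's actual decision logic, so everything here is an assumption: the QuantumScheduler class, its ring buffer of past decisions, and the two rules it applies (no back-to-back GC quanta, and a cap on GC quanta per sliding window derived from the target utilization). Real threads, priorities, and timers are left out so that the logic stays deterministic.

```java
// Hypothetical model of the per-heartbeat scheduling decision.
class QuantumScheduler {
    private final boolean[] window;   // ring buffer: was quantum i a GC quantum?
    private final int maxGcQuanta;    // GC quanta allowed in any one window
    private int next;                 // slot that the current tick will overwrite
    private int gcInWindow;           // GC quanta currently recorded in the buffer
    private boolean lastWasGc;

    // windowQuanta: quanta per sliding window (20 for 10 ms / 500 us).
    // targetUtilization: fraction reserved for the application (for example 0.70).
    QuantumScheduler(int windowQuanta, double targetUtilization) {
        window = new boolean[windowQuanta];
        // The small epsilon guards against floating-point rounding just below an integer.
        maxGcQuanta = (int) Math.floor(windowQuanta * (1.0 - targetUtilization) + 1e-9);
    }

    // Called once per heartbeat. Returns true if this quantum goes to the GC.
    boolean tick(boolean gcHasWork) {
        // The decision made one full window ago slides out of the window.
        if (window[next]) gcInWindow--;

        boolean runGc = gcHasWork
                && !lastWasGc                 // keep pauses to a single quantum
                && gcInWindow < maxGcQuanta;  // preserve the utilization target

        window[next] = runGc;
        if (runGc) gcInWindow++;
        next = (next + 1) % window.length;
        lastWasGc = runGc;
        return runGc;
    }
}
```

With a 20-quantum window and a 70% target, a GC that always has work receives 6 quanta per window, interleaved with application quanta, which matches the pattern described for Figure 3. A target below 50% could not be met under the no-back-to-back rule in this sketch, which is consistent with the article's remark that such targets produce consecutive GC quanta.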
Cooperative suspend mechanism
Although Metronome uses a series of small, incremental pauses to complete a GC cycle, it must still suspend the JVM in STW fashion for each quantum. For each of these STW pauses, Metronome uses the cooperative suspend mechanism in the J9 virtual machine. This mechanism does not rely on any special native thread capabilities to suspend threads. Instead, it uses an asynchronous message-passing system to notify Java threads that they must release their access to internal JVM structures (including the heap) and sleep until they are signaled to resume. Java threads in the J9 virtual machine periodically check whether a suspend request has been issued. If so, they proceed as follows:
- Release any internal JVM structures they hold.
- Store any object references they hold in well-described locations.
- Signal the central JVM suspend mechanism that they have reached a safe point.
- Sleep and wait for the corresponding resume.
Once resumed, a thread rereads its object pointers and reacquires the JVM-related structures it previously held. Releasing the JVM structures lets the GC thread process them safely; reading and writing partially updated structures could cause unexpected behavior and crashes. By storing and then reloading object pointers, the threads give the GC the opportunity to update those pointers during a GC quantum, which is necessary if an object is moved as part of any compaction-like operation.
Because the suspend mechanism relies on the cooperation of Java threads, it is important that the periodic checks in each thread be spaced as closely together as possible. This is the responsibility of both the JVM and the just-in-time (JIT) compiler. Although checking for suspend requests introduces overhead, it allows structures such as stacks to be well defined in terms of the GC's needs, so that the GC can determine accurately whether a value on a stack is a pointer to an object.
This suspend mechanism applies only to threads that are currently participating in JVM-related activities. Non-Java threads, and Java threads that are running Java Native Interface (JNI) code without using the JNI API, are not subject to suspension. If such a thread takes part in any JVM activity (such as attaching to the JVM or calling the JNI API), it cooperatively suspends until the GC quantum has completed. This matters because it lets threads that are not involved in Java processing continue to be scheduled. Although thread priorities are respected, any noticeable perturbation of the system by these other threads can affect the determinism of the GC.
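The polling protocol described in this section can be imitated with ordinary Java threads. The sketch below is an analogy, not the J9 implementation: SafepointCoordinator, SafepointDemo, and their methods are hypothetical names, the "release structures" and "store references" steps are omitted because a plain Java program has nothing equivalent to release, and the sleep call merely stands in for GC work.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical coordinator for a cooperative (polling-based) suspend.
class SafepointCoordinator {
    private final Object lock = new Object();
    private volatile boolean suspendRequested;
    private int registered;   // threads that take part in the protocol
    private int parked;       // threads currently waiting at a safe point

    void register() { synchronized (lock) { registered++; } }

    void deregister() { synchronized (lock) { registered--; lock.notifyAll(); } }

    // The periodic check: cheap when no suspend is pending.
    void poll() {
        if (!suspendRequested) return;
        boolean interrupted = false;
        synchronized (lock) {
            parked++;
            lock.notifyAll();                 // signal: safe point reached
            while (suspendRequested) {        // sleep until resumed
                try { lock.wait(); } catch (InterruptedException e) { interrupted = true; }
            }
            parked--;
        }
        if (interrupted) Thread.currentThread().interrupt();
    }

    // Returns once every registered thread is parked at a safe point.
    void suspendAll() throws InterruptedException {
        synchronized (lock) {
            suspendRequested = true;
            while (parked < registered) lock.wait();
        }
    }

    void resumeAll() {
        synchronized (lock) { suspendRequested = false; lock.notifyAll(); }
    }
}

class SafepointDemo {
    // Returns true if no worker made progress while it was suspended.
    static boolean run() {
        SafepointCoordinator coord = new SafepointCoordinator();
        AtomicLong progress = new AtomicLong();
        AtomicBoolean stop = new AtomicBoolean();
        Thread[] workers = new Thread[2];
        for (int i = 0; i < workers.length; i++) {
            coord.register();                        // register before the thread starts
            workers[i] = new Thread(() -> {
                try {
                    while (!stop.get()) {
                        coord.poll();                // check for a suspend request
                        progress.incrementAndGet();  // stand-in for application work
                    }
                } finally {
                    coord.deregister();
                }
            });
            workers[i].start();
        }
        try {
            coord.suspendAll();                      // beginning of a "GC quantum"
            long before = progress.get();
            Thread.sleep(20);                        // stand-in for GC work
            long after = progress.get();
            coord.resumeAll();                       // end of the quantum
            stop.set(true);
            for (Thread t : workers) t.join();
            return before == after;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            stop.set(true);
            coord.resumeAll();
            return false;
        }
    }
}
```

The time a suspend request takes to complete is bounded by the longest gap between two poll calls in any thread, which is why the checks need to be spaced closely together.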
Write barrier
A full STW collector has the advantage that it can trace through object references and JVM internal structures without the application perturbing the links in the object graph. By splitting the GC cycle into a series of small STW phases and interleaving them with the application, Metronome introduces a potential problem in keeping track of the live objects in the system. Because the application can modify object references after the collector has processed them, the collector can remain unaware of an unprocessed object. Figure 7 illustrates this hidden object problem:
Figure 7. Hidden object problem
Assume that the object graph shown in section I of Figure 7 exists in the heap. The Metronome collector is active and is scheduled to perform tracing work in this quantum. In its allotted time, it traces through the root object and the object it references before it runs out of time and must schedule the JVM back in (section II). While the application runs, the references between objects change so that object A now points to an unprocessed object that is no longer referenced from any other location (section III). The GC is then scheduled back in for another quantum and continues processing, missing the hidden object pointer. As a result, a live object is reclaimed and a dangling pointer is left behind, which causes incorrect behavior or even crashes in the JVM or the GC.
To prevent this kind of error, the JVM and Metronome must cooperate to track changes to the heap and to JVM structures so that the GC keeps all relevant objects alive. This is achieved through a write barrier, which tracks all writes to objects and records the creation and breaking of references between objects so that the collector can track potentially hidden live objects. The type of barrier that Metronome uses is called a snapshot-at-the-beginning (SATB) barrier. It conceptually records the state of the heap at the beginning of a collection cycle and preserves all objects that were live at that point, as well as those allocated during the current cycle. The concrete solution involves a Yuasa-style barrier (see the references), in which the overwritten value in any field store is recorded and treated as if it had an associated root reference. Preserving a slot's original value before it is overwritten allows the live object set to be preserved and processed.
Internal JVM structures, including the JNI global reference list, also require this kind of barrier. Because the application can add objects to and remove objects from this list, a barrier is applied to track both removed objects (to avoid a hidden object problem similar to the one caused by field overwriting) and added objects (to eliminate the need to rescan the structure).
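The hidden object problem and the effect of a Yuasa-style barrier can be reproduced in a toy incremental marker. This is a sketch under stated assumptions, not Metronome's code: Node, IncrementalMarker, and SatbDemo are hypothetical names, each object has a single reference field for brevity, and a "quantum" is modeled as a budget of objects to process rather than a span of time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical heap object with one reference field.
class Node {
    final String name;
    Node ref;
    boolean marked;
    Node(String name) { this.name = name; }
}

class IncrementalMarker {
    private final Deque<Node> workList = new ArrayDeque<>();
    private final boolean barrierEnabled;
    private boolean marking;

    IncrementalMarker(boolean barrierEnabled) { this.barrierEnabled = barrierEnabled; }

    void start(Node root) {
        marking = true;
        shade(root);
    }

    // One quantum: process at most 'budget' objects. Returns true when marking is done.
    boolean step(int budget) {
        while (budget-- > 0 && !workList.isEmpty()) {
            Node n = workList.pop();
            shade(n.ref);
        }
        if (workList.isEmpty()) marking = false;
        return !marking;
    }

    // Every reference store made by the application goes through this barrier.
    void write(Node holder, Node newValue) {
        if (marking && barrierEnabled) {
            shade(holder.ref);   // Yuasa style: record the value being overwritten
        }
        holder.ref = newValue;
    }

    // Mark an object and queue it so that its own reference is traced later.
    private void shade(Node n) {
        if (n != null && !n.marked) {
            n.marked = true;
            workList.push(n);
        }
    }
}

class SatbDemo {
    // Returns whether the still-reachable object C ends up marked.
    static boolean hiddenObjectMarked(boolean barrierEnabled) {
        Node root = new Node("root"), a = new Node("A"), b = new Node("B"), c = new Node("C");
        root.ref = a; a.ref = b; b.ref = c;      // root -> A -> B -> C
        IncrementalMarker gc = new IncrementalMarker(barrierEnabled);
        gc.start(root);
        gc.step(2);               // quantum 1: root and A processed, B queued, C untouched
        gc.write(a, c);           // application: the processed object A now points to C
        gc.write(b, null);        // application: the only other path to C is removed
        while (!gc.step(2)) { }   // remaining quanta
        return c.marked;
    }
}
```

Without the barrier, C stays unmarked even though it is reachable through root and A, so a sweep would reclaim a live object. With the barrier, the store that overwrites B's reference to C records C, which keeps it alive. B also stays marked although it is no longer reachable; it is simply reclaimed in a later cycle.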
Root scanning and processing
To begin tracing live objects, a garbage collector starts from a set of initial objects obtained from the roots. Roots are structures in the JVM that represent hard references to objects that the application creates either explicitly (for example, JNI global references) or implicitly (for example, stacks). Root structures are scanned as part of the initial work of the collector's mark phase.
Most roots are malleable during execution in terms of their object references. For this reason, changes to their reference sets must be tracked, as described in the write barrier section. However, some structures, such as stacks, cannot afford tracking of pushes and pops without a serious performance penalty. Therefore, some restrictions and changes apply to stack scanning so that Metronome stays consistent with the Yuasa-style barrier:
- Atomic scanning of stacks. An individual thread stack must be scanned atomically, that is, within a single quantum. The reason is that during execution a thread can pop any number of references from its stack, and those references may have been stored elsewhere in the meantime. Pausing in the middle of a stack scan could cause stores to be lost or missed between the two partial scans, creating a dangling pointer in the heap. Application developers should be aware that stacks are scanned atomically and should avoid very deep stacks in RT applications.
- Fuzzy barrier. Although each stack must be scanned atomically, it would be difficult to maintain determinism if all stacks had to be scanned within a single quantum. The GC and the JVM are therefore allowed to interleave while Java stacks are being scanned. This can result in objects being moved from one thread to another through a series of loads and stores. To avoid losing references to objects, threads that have not yet been scanned during a GC cycle have the barrier track both the overwritten value and the value being stored. Tracking the stored object preserves its reachability through the write barrier in case it is stored into an already processed object and then popped off the stack.
Tuning Metronome
It is important to understand the relationship between heap size and application utilization. Although a high target utilization is desirable for optimal application throughput, the GC must be able to keep up with the application's allocation rate. If both the target utilization and the allocation rate are high, the application can run out of memory, which forces the GC to run continuously and in most cases drops utilization to 0%. This degradation introduces long pause times that are generally unacceptable for RT applications. In this situation, you must choose to lower the target utilization to allow more GC time, increase the heap size to allow more allocation, or do both. In some cases, the memory needed to sustain a particular target utilization is simply not available, and lowering the target utilization, at a cost in performance, is the only option.
Figure 8 shows a typical trade-off between heap size and application utilization. A higher utilization percentage requires a larger heap, because the GC is not allowed to run as much as it would be at a lower utilization.
Figure 8. Comparison of heap size and Utilization
The relationship between utilization and heap size is highly application dependent. Striking an appropriate balance requires iterative experimentation with the application and the VM parameters.
Verbose GC
Verbose GC is a tool that logs GC activity to a file or to the screen. You can use it to determine whether the parameters (heap size, target utilization, window size, and quantum time) support the running application. Listing 1 shows an example of verbose output:
Listing 1. Verbose GC sample
<?xml version="1.0" ?>
<verbosegc version="200702_15-Metronome">
<gc type="synchgc" id="1" timestamp="Tue Mar 13 15:17:18 2007" intervalms="0.000">
  <details reason="system garbage collect" />
  <duration timems="30.023" />
Each verbose GC event is enclosed in <gc></gc> tags. Several event types are available, but Listing 1 shows the most common ones. The synchgc type represents a synchronous GC, which is a GC cycle that runs uninterrupted from start to finish; that is, it is not interleaved with the application. A synchronous GC can occur for two reasons:
- System.gc() was called by the application.
- The heap filled up, and the application failed to allocate memory.
The reason for a synchronous GC is given in the <details> tag: system garbage collect for the first case and out of memory for the second. The first case says nothing about whether the specified parameters can sustain the application. However, calling System.gc() from the application causes utilization to drop to 0% in many cases and leads to long pauses, so it should be avoided. If the synchronous GC occurs for the second reason, an out-of-memory condition, the GC was unable to keep up with the application's allocation rate. In that case, consider increasing the heap size or lowering the application utilization target to avoid synchronous GCs.
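Finding synchronous GCs in a log can be automated. The following is a minimal sketch that uses the XML parser shipped with the JDK. The SynchGcFinder class is hypothetical, and its SAMPLE string is an assumption: it takes the fragment from Listing 1, adds closing tags so that it is well formed, and adds a bare heartbeat event to show the filtering. A log copied from a running JVM may lack the closing tags, in which case this strict parser rejects it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper that lists the reasons for synchronous GCs in a verbose GC log.
class SynchGcFinder {

    // Assumed sample: Listing 1 with closing tags added, plus one other event type.
    static final String SAMPLE =
        "<?xml version=\"1.0\" ?>"
        + "<verbosegc version=\"200702_15-Metronome\">"
        + "<gc type=\"synchgc\" id=\"1\" timestamp=\"Tue Mar 13 15:17:18 2007\" intervalms=\"0.000\">"
        + "<details reason=\"system garbage collect\" />"
        + "<duration timems=\"30.023\" />"
        + "</gc>"
        + "<gc type=\"heartbeat\" id=\"2\" />"
        + "</verbosegc>";

    // Returns the reason attribute of <details> for every <gc type="synchgc"> event.
    static List<String> synchGcReasons(String verboseGcXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(verboseGcXml.getBytes(StandardCharsets.UTF_8)));
            List<String> reasons = new ArrayList<>();
            NodeList events = doc.getElementsByTagName("gc");
            for (int i = 0; i < events.getLength(); i++) {
                Element gc = (Element) events.item(i);
                if (!"synchgc".equals(gc.getAttribute("type"))) {
                    continue;   // skip trigger, heartbeat, and other event types
                }
                NodeList details = gc.getElementsByTagName("details");
                reasons.add(details.getLength() > 0
                    ? ((Element) details.item(0)).getAttribute("reason")
                    : "unknown");
            }
            return reasons;
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed verbose GC XML", e);
        }
    }
}
```

An out of memory reason in the result suggests revisiting the heap size or the utilization target, while a system garbage collect reason points at a System.gc() call in the application.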
The trigger GC event types correspond to the start and end points of a GC cycle. They are useful for delimiting batches of heartbeat GC events. The heartbeat GC event type rolls up the information from multiple GC quanta into one summarized verbose event. Note that this is unrelated to the heartbeat of the alarm thread. The quantumcount attribute gives the number of GC quanta rolled up in the heartbeat GC event. The <quantum> tag contains timing information about the GC quanta rolled up in the heartbeat GC event. The <heap> and <immortal> tags contain information about the free memory at the end of the quanta rolled up in the heartbeat GC event. The <gcthreadpriority> tag contains information about the priority of the GC thread when the quanta began.
The quantum time values correspond to the pause times seen by the application. The mean quantum time should be close to 500 microseconds, and the maximum quantum times must be monitored to ensure that they fall within the pause times acceptable to the RT application. Long pauses can occur when other processes in the system preempt the GC, preventing it from completing its quantum and letting the application resume, or when certain root structures in the system are abused and grow to unmanageable sizes (see the discussion of issues to consider when using Metronome at the end of this article).
Immortal memory is a resource required by the RTSJ and is not subject to GC. For this reason, it is normal to see the immortal free memory in the verbose GC log drop and never recover. Immortal memory is used for objects such as string constants and classes. Be aware of your program's behavior and adjust the immortal memory size accordingly.
Heap usage should be monitored to ensure that the overall trend remains stable. A downward trend in free heap space indicates a potential leak caused by the application. Many conditions can cause leaks, including ever-growing hash tables, large resource objects that are held indefinitely, and global JNI references that are never cleaned up.
Figures 9 and 10 show a stable trend and a declining trend in free heap space. Note that local minima and maxima are normal and expected, because free space increases only during a GC cycle and decreases correspondingly while the application is active and allocating.
Figure 9. Stable free heap
Figure 10. Declining free heap
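One simple way to tell the two trends apart is to fit a straight line through free-heap samples and look at its slope. The sketch below is an assumption-laden illustration rather than a feature of any tool mentioned here: the FreeHeapTrend class, the equal spacing of samples, and the tolerance parameter are all hypothetical, and the sample values would have to be extracted from the verbose GC log by other means.

```java
// Hypothetical helper: least-squares trend of free-heap samples.
class FreeHeapTrend {

    // Slope of the best-fit line through equally spaced samples, in bytes per sample.
    static double slope(long[] freeBytes) {
        int n = freeBytes.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (long v : freeBytes) meanY += v;
        meanY /= n;

        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (freeBytes[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    // True when free space declines faster than the given tolerance per sample.
    static boolean looksLikeLeak(long[] freeBytes, double toleranceBytesPerSample) {
        return slope(freeBytes) < -toleranceBytesPerSample;
    }
}
```

A sawtooth such as 100, 60, 100, 60, 100, 60, 100 has a slope of about zero, while 100, 60, 90, 50, 80, 40, 70 has a slope of -5 per sample despite showing the same local peaks and dips. Sampling at a consistent point, such as the end of each GC cycle, reduces the noise that the sawtooth adds.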
The interval attribute of the <gc> tag (intervalms in Listing 1) is the time elapsed since the last verbose GC event of the same type was output. For the heartbeat event type, it can also represent the time elapsed since the trigger start event, if it is the first heartbeat of the current GC cycle.
Tuning Fork
Tuning Fork is a separate tool for tuning Metronome to better suit your application. Tuning Fork lets you inspect many details of GC activity, either after the fact through a trace log or at run time through a socket. Metronome was built with Tuning Fork in mind and logs many events that can be inspected in the Tuning Fork application. For example, Tuning Fork displays application utilization over time and shows the time taken by the various GC phases.
Figure 11 shows the GC performance summary graph generated by Tuning Fork, including the target utilization, heap memory usage, and application utilization:
Figure 11. Tuning Fork Performance Overview
Issues to consider when using Metronome
Metronome strives to deliver short, deterministic pauses for GC, but some situations in application code and in the underlying platform can perturb these results and sometimes lead to pause-time outliers. GC behavior can also differ from what you would expect with the standard JDK collectors.
The RTSJ states that the GC does not process immortal memory. Because classes live in immortal memory, they are not subject to GC and therefore cannot be unloaded. Applications that expect to use a large number of classes need to adjust the immortal space accordingly, and applications that require class unloading need to adjust their programming model in WebSphere Real Time.
GC work in Metronome is time based, so any change to the hardware clock can cause problems that are difficult to diagnose. One example is synchronizing the system time with a Network Time Protocol (NTP) server and then synchronizing the hardware clock with the system time. To the GC this appears as a sudden jump in time, which can cause it to miss the utilization target or can even lead to out-of-memory errors.
Running multiple JVMs on a single machine can cause interference between the JVMs and skew the utilization figures. The alarm thread is a high-priority RT thread that preempts any lower-priority thread, and the GC thread also runs at an RT priority. If enough GC and alarm threads are active at any one time, a JVM with no active GC cycle may have its application threads preempted by the GC and alarm threads of another JVM, while the time is still charged to the application because that VM's own GC is inactive.
From: http://www.ibm.com/developerworks/cn/java/j-rtj4/index.html?S_TACT=105AGX52&S_CMP=techcsdn