Comparison of Solaris, Linux, and FreeBSD kernels

Source: Internet
Author: User

1. I personally think that the author, Max, does not understand Linux as deeply as he understands Solaris. I do not know whether what he says about Linux below comes from his own reading and analysis of the code, or only from third-party documents without first-hand verification.
2. I have tried my best to translate the article as intended. Of course, I could not help inserting a remark or two of my own along the way.
3. Whether you read only this article or other material as well, I think it is important to keep a clear head. Thank you.

Max Bruning is an instructor and consultant whose courses cover Solaris internals, device drivers, crash analysis and debugging of kernels and applications, networking internals, and other specialized subjects (his blog on blogspot may not be accessible, so you can also check ).

When he explains how a subsystem is implemented in Solaris, his students always ask "How does it work in Linux?" or "In FreeBSD it works like this; what about Solaris?" This experience eventually led Max to write "A Comparison of Solaris, Linux, and FreeBSD Kernels" for the OpenSolaris website.

This article discusses scheduling, memory management, and file system architecture. These three subsystems are common to every operating system and tend to be its best-understood components.

At present, the material and code referenced in many analysis or comparison articles are old and out of touch with reality. Max recommends the following sites for more up-to-date information:
Solaris Vs. Linux
Comparing MySQL Performance
Fast Track to Solaris 10 Adoption
Solaris 10 Heads for Linux Territory

In fact, setting aside their differences, the three systems have a great deal in common. Apart from naming conventions, they implement the same concepts in very similar ways. All three support time-shared scheduling of threads, demand paging with a not-recently-used page replacement algorithm, and a virtual file system layer that allows different file system architectures to coexist. A good idea in one system tends to be adopted by the others. For example, Linux has adopted the slab memory allocator that originated in Solaris. Many terms in the FreeBSD code also appear in Solaris (go and look at the code...). Since the source code of all three systems is available, reading them side by side can turn up many interesting things.

Well, that is enough small talk.

Scheduling and Schedulers
The basic unit of scheduling is the kthread_t in Solaris, the thread in FreeBSD, and the task_struct in Linux. One level up, Solaris represents each process as a proc_t, and each thread within the process has a kthread_t. Linux represents both processes and threads with task_struct, so a single-threaded process in Linux is one task_struct. A single-threaded process in Solaris has a proc_t, a kthread_t, and a klwp_t. The klwp_t provides a save area for threads switching between user and kernel mode. A single-threaded process in FreeBSD has a proc, a thread, and a ksegrp. The ksegrp is a "kernel scheduling entity group". The three systems use different structures, but all of them effectively schedule threads.
As you would expect, scheduling is based on priority. The small arithmetic wrinkle is that in Linux and FreeBSD, the smaller the number, the higher the priority, while Sun's baby, Solaris, takes the opposite view: the larger the number, the higher the priority. See the following table.

All three systems favor interactive threads/processes (how "interactive" is determined is described below). Interactive threads run at a better priority than compute-bound threads, but get shorter time slices. Solaris, FreeBSD, and Linux all use a per-CPU "runqueue". FreeBSD and Linux have an active queue and an expired queue. The names say it all: the system schedules threads in priority order from the active queue. A thread that uses up its time slice is moved from active to expired (this can also happen at other times, to avoid starvation). When the active queue is empty, the kernel swaps the active and expired queues. FreeBSD has a third, idle queue, whose threads run only when the other two queues are empty. Solaris uses a "dispatch queue" per CPU. When a thread uses up its time slice, the kernel gives it a new priority and puts it back on the dispatch queue. The runqueues of all three systems keep separate linked lists of runnable threads for different priorities. FreeBSD uses one list per four priorities, while Solaris and Linux use a separate list for each priority.
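The active/expired scheme described above can be sketched in a few lines of Python. This is only a toy model, not code from any of the three kernels: the class name RunQueue, its method names, and the four priority levels are all assumptions made for illustration.

```python
from collections import deque


class RunQueue:
    """Toy per-CPU runqueue with an active and an expired priority array."""

    def __init__(self, levels=4):
        # One FIFO list per priority level. Index 0 is the best priority,
        # following the Linux/FreeBSD convention that lower numbers win.
        self.active = [deque() for _ in range(levels)]
        self.expired = [deque() for _ in range(levels)]

    def enqueue(self, name, prio):
        self.active[prio].append(name)

    def pick_next(self):
        """Return (name, prio) of the best runnable thread, or None if idle."""
        if not any(self.active):
            # The active array has drained: swap the two arrays.
            self.active, self.expired = self.expired, self.active
        for prio, queue in enumerate(self.active):
            if queue:
                return queue.popleft(), prio
        return None

    def timeslice_expired(self, name, prio):
        # A thread that used up its time slice waits in the expired array.
        self.expired[prio].append(name)


def run(rq, picks):
    """Let each picked thread burn its whole slice; return the run order."""
    order = []
    for _ in range(picks):
        nxt = rq.pick_next()
        if nxt is None:
            break
        name, prio = nxt
        order.append(name)
        rq.timeslice_expired(name, prio)
    return order
```

With threads "a" and "c" at priority 1 and "b" at priority 0, the better-priority thread runs first, and nobody runs a second time until every thread has had a turn and the arrays have been swapped.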
Linux and FreeBSD compute a thread's priority arithmetically from its run time versus its sleep time (a measure of its "interactive-ness"), while Solaris uses a table lookup. None of them supports "gang scheduling" (if you are interested, search for it; it is a scheduling technique from parallel computing in which a group of tasks is dispatched across the CPUs together. Lawrence Livermore, the nuclear weapons lab, is fond of it. They own the world's most expensive toys, so that is understandable.) Each OS schedules the next thread to run rather than N threads at once. All three have mechanisms to exploit caching (warm affinity) and to balance load. For hyper-threaded CPUs, FreeBSD tries to keep a thread on the same CPU node (though possibly on a different hyperthread). Solaris has a similar mechanism, but it is under the control of users and applications and is not limited to hyperthreads. Solaris calls these processor sets; FreeBSD calls them processor groups.
The biggest difference between Solaris and the other two operating systems is that Solaris supports multiple "scheduling classes" at the same time. All three operating systems support POSIX SCHED_FIFO, SCHED_RR, and SCHED_OTHER (or SCHED_NORMAL). SCHED_FIFO and SCHED_RR are typically used for real-time threads (I don't agree, but let's move on...). Solaris and Linux both support kernel preemption for the sake of real-time threads. Solaris also has a fixed-priority class, a system class for system threads (such as the page-out thread), an interactive class for threads running in a windowing environment under control of the X server, and the Fair Share Scheduler for resource management. For more information, see the Solaris documentation. The FreeBSD scheduler is chosen at compile time. And the Linux scheduler? It depends on the kernel version.
The ability to add new scheduling classes to the system comes at a price. Everywhere in the kernel that a scheduling decision can be made, there has to be an indirect function call into the code of the scheduling class. For example, when a thread is about to sleep, the kernel calls class-specific code to do whatever sleeping requires in that class. On Linux and FreeBSD, the scheduling code simply does the work itself, with no indirect call. The extra level means that scheduling on Solaris carries slightly more overhead, but it also provides more features.
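The indirection described above can be illustrated with a small Python sketch. Everything here is hypothetical: the class names, the on_sleep hook, and the priority arithmetic are invented for illustration and are not the real Solaris class interface. The point is only that the generic sleep path does not know what a class does; it calls through the thread's class object.

```python
class SchedClass:
    """Hypothetical per-class operations table (names are illustrative)."""

    name = "base"

    def on_sleep(self, thread):
        raise NotImplementedError


class TimeShareClass(SchedClass):
    name = "TS"

    def on_sleep(self, thread):
        # Made-up policy: threads that sleep look interactive, so improve
        # their priority (larger is better here, as in Solaris), capped at 59.
        thread.prio = min(thread.prio + 10, 59)


class FixedClass(SchedClass):
    name = "FX"

    def on_sleep(self, thread):
        # Fixed-priority threads keep whatever priority they were given.
        pass


class Thread:
    def __init__(self, name, prio, sched_class):
        self.name = name
        self.prio = prio
        self.sched_class = sched_class
        self.state = "running"


def kernel_sleep(thread):
    """Generic sleep path: one indirect call into class-specific code."""
    thread.sched_class.on_sleep(thread)  # the extra level of indirection
    thread.state = "sleeping"
```

A design without scheduling classes would inline one policy directly in kernel_sleep and save the indirect call, at the cost of not being able to plug in another class.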

Memory Management and paging
In Solaris, a process address space is made up of logical sections called segments. The segments of a process address space can be viewed with pmap. Solaris divides its memory management code and data structures into platform-independent and platform-specific parts (which is about the same as saying nothing...). The platform-specific part lives in the HAT (hardware address translation) layer. FreeBSD describes a process address space with a vmspace, divided into logical sections called regions. The hardware-dependent part is in the pmap (physical map) module, while the vmap routines handle the hardware-independent parts and data structures. Linux uses a memory descriptor to divide the process address space into logical sections called memory areas. Linux also has a pmap command for examining a process address space.
Linux separates the machine-dependent layers from the machine-independent layers at a much higher level in the software. In Solaris and FreeBSD, most of the code for things like page fault handling is machine-independent, while the Linux code for handling a page fault is largely machine-dependent from the very start of fault handling. As a result, Linux can get through most of its paging code more quickly, because there is less data abstraction. The cost is that a change in the underlying hardware requires many more code changes, whereas Solaris and FreeBSD confine such changes to the HAT and pmap layers respectively.
Segments, regions, and memory areas are each delimited by the virtual address at which the area starts and by its location within the object/file that the segment/region/memory area maps.
The way each OS manages an address space in terms of segments/regions/memory areas is very similar. However, the data structure names are completely different.
All three systems use a variant of the least recently used (LRU) algorithm for page replacement, and all three have a daemon process/thread that carries it out. In FreeBSD this is the vm_pageout daemon, which wakes up periodically and whenever free memory runs low. When available memory drops below certain thresholds, vm_pageout runs the routine vm_pageout_scan, which scans memory and frees some pages. The vm_pageout_scan routine may need to write modified pages back to disk asynchronously before freeing them. There is only one such daemon, no matter how many CPUs there are. Solaris has a pageout daemon, which also runs periodically and in response to low free memory. The paging thresholds in Solaris are calibrated automatically at system startup, so that the daemon neither hogs the CPU nor floods the disk with page-out requests (well, "flood" is exactly the right word ;P). The thresholds used by the FreeBSD daemon are mostly hard-coded, although they can be tuned. The Linux LRU algorithm is tuned dynamically at run time, and there can be multiple kswapd daemons, up to one per CPU. All three systems use a global working set policy rather than a per-process working set policy.
FreeBSD keeps several page lists to track recently used pages: active, inactive, cached, and free. Pages move among these lists depending on how they are used. Frequently accessed pages tend to stay on the active list. The data pages of a process that has exited can be put on the free list immediately. If vm_pageout_scan cannot keep up with the load, the FreeBSD kernel may swap out entire processes. If the memory shortage is severe, vm_pageout_scan may kill the largest process in the system. Linux also uses several page lists. Physical memory is divided into three zones: one for DMA pages, one for normal pages, and one for dynamically allocated memory. The zones look very much like an implementation detail forced by the constraints of the x86 architecture. Pages move among hot, cold, and free lists, and the mechanism is similar to FreeBSD's. Frequently used pages are on the hot list. Available pages are on the cold or free list.
Sun's folks use a free list, a hashed list, and a vnode page list to support their own LRU variant. The latter two are roughly equivalent to the active/hot lists of FreeBSD and Linux, which are the lists those two systems scan. Solaris does not scan these two lists. Instead it scans all pages with a two-handed clock algorithm (see Solaris Internals or elsewhere). Roughly, the two hands stay a fixed distance apart. The front hand clears the reference bit of a page as a marker. If no process has referenced the page by the time the back hand arrives, the back hand frees the page (writing it back to disk first if necessary).
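The two-handed clock described above can be sketched as follows. This is a toy simulation, not Solaris code: the function name, the handspread parameter name, and the touches argument (which simulates processes referencing pages between scan steps) are assumptions made for illustration, and the write-back of modified pages is left out.

```python
def two_handed_clock(referenced, handspread, steps, touches=None):
    """Simulate a two-handed clock scan over a circular array of pages.

    referenced: initial reference bit of each page (list of bool)
    handspread: fixed distance, in pages, between front and back hand
    steps:      how many pages the front hand advances
    touches:    optional {step: [page, ...]} of pages referenced by
                processes just before that step (hypothetical input)
    Returns the list of page indices the back hand freed, in order.
    """
    n = len(referenced)
    if not 0 < handspread < n:
        raise ValueError("handspread must be between 1 and len(pages) - 1")
    referenced = list(referenced)  # work on a copy
    touches = touches or {}
    freed = []

    for step in range(steps):
        # Processes may reference pages while the scan is in progress.
        for page in touches.get(step, ()):
            referenced[page] = True

        # Front hand: clear the reference bit of the page it points at.
        referenced[step % n] = False

        # Back hand: trails by handspread pages and only starts once the
        # front hand has moved that far ahead.
        back = step - handspread
        if back >= 0:
            page = back % n
            if not referenced[page] and page not in freed:
                freed.append(page)  # not referenced since the front hand

    return freed
```

For six pages with a hand spread of 2, a page that is touched after the front hand has passed it, but before the back hand arrives, survives the scan, while the untouched pages are freed in order.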

All three systems take NUMA locality into account during paging. All three also merge the I/O buffer cache and the virtual memory page cache into a single system page cache. The system page cache is used for file reads and writes, for mmapped files, and for the text and data of applications.

File System
All three systems use a data abstraction layer to hide file system implementation details from applications. You access files with the familiar open, close, read, write, stat, and other system calls, regardless of how the file data is implemented and organized underneath. Solaris and FreeBSD call this mechanism VFS (virtual file system), and its basic data structure is the vnode. Every file being accessed in Solaris or FreeBSD has a vnode assigned to it. In addition to generic file information, the vnode contains pointers to file-system-specific information. Linux uses a similar mechanism, also called VFS (virtual file switch). In Linux, the file-system-independent data structure is the inode, which is similar to the vnode (be careful: Solaris and FreeBSD also have an inode of their own, but it is the file-system-dependent data of the UFS file system). Linux has two different structures, one for file operations and the other for inode operations. Solaris and FreeBSD merge them into vnode operations.
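A rough Python sketch of the vnode idea described above: generic code holds a vnode, and the vnode carries both a table of operations and a pointer to file-system-specific data. The two file systems below ("memfs" and "zerofs") and all function names are hypothetical and exist only to show the dispatch; they do not correspond to any real kernel interface.

```python
class Vnode:
    """Generic per-file object seen by the file-system-independent layer."""

    def __init__(self, ops, fs_data):
        self.ops = ops          # table of file-system-specific operations
        self.fs_data = fs_data  # file-system-specific information


def vfs_read(vnode, offset, length):
    """Generic read path: dispatch through the vnode's operations table."""
    return vnode.ops["read"](vnode, offset, length)


# Hypothetical file system 1: file contents held in a bytes object.
def memfs_read(vnode, offset, length):
    return vnode.fs_data[offset:offset + length]


# Hypothetical file system 2: every file reads as zeros up to its size.
def zerofs_read(vnode, offset, length):
    size = vnode.fs_data["size"]
    return bytes(max(0, min(length, size - offset)))


MEMFS_OPS = {"read": memfs_read}
ZEROFS_OPS = {"read": zerofs_read}
```

The caller of vfs_read never needs to know which file system backs the vnode, which is the property the abstraction layer exists to provide.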
VFS allows many file system types to be implemented in one system. This means the three operating systems could access each other's file systems, as long as the relevant file system routines and data structures have been ported to the VFS in question. All three systems allow file systems to be stacked. The following table lists some, but not all, of the file system types implemented by each OS.

Solaris, FreeBSD, and Linux are all benefiting from each other. With Solaris going open source, this cross-pollination is expected to speed up. Max's personal impression is that Linux changes fastest. New technology is integrated into the system quickly, but the documentation, and possibly the robustness, sometimes lags behind. Linux has many developers, and sometimes it shows. FreeBSD has, in a sense, been around the longest of the three. Solaris grew out of a combination of BSD Unix and AT&T Bell Labs Unix and uses more data abstraction layering, so it can generally support additional features more easily. However, most of this layering in the kernel is undocumented. This may improve now that the code is open.
A brief example that highlights their differences is page fault handling. In Solaris, when a page fault occurs, the code starts in a platform-specific trap handler (which hardly needs saying to readers of your intelligence...), then calls the generic as_fault routine, which determines the segment in which the fault occurred and calls the segment driver to handle it. The segment driver calls the file system code, which in turn calls the device driver to bring the page in. When the page-in is complete, the segment driver calls the HAT layer to update the page table entries. In Linux, when a page fault occurs, the code called by the kernel is in platform-specific territory right away. This path may be faster, but it may not be as easy to extend or port (the second half is too cursory; I wonder whether the author has actually studied the corresponding path in Linux).
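The layered Solaris path described above can be traced with a toy call chain. Only the name as_fault comes from the text; every other function name, the dictionary layout of a segment, and the 4-byte page size are assumptions chosen to keep the sketch short. It models the order of the layers, not any real kernel code.

```python
PAGE_SIZE = 4  # toy page size, in bytes


def trap_handler(address_space, addr, trace):
    trace.append("trap_handler")        # platform-specific entry point
    return as_fault(address_space, addr, trace)


def as_fault(address_space, addr, trace):
    trace.append("as_fault")            # generic: find the faulting segment
    for seg in address_space:
        if seg["base"] <= addr < seg["base"] + seg["size"]:
            return seg["driver"](seg, addr, trace)
    raise LookupError("address is not mapped by any segment")


def file_segment_driver(seg, addr, trace):
    trace.append("segment_driver")
    page = fs_getpage(seg["file"], addr - seg["base"], trace)
    hat_load(addr, page, trace)         # update the translation last
    return page


def fs_getpage(file_data, offset, trace):
    trace.append("file_system")
    return disk_read(file_data, offset, trace)


def disk_read(file_data, offset, trace):
    trace.append("device_driver")
    start = offset - offset % PAGE_SIZE  # round down to a page boundary
    return file_data[start:start + PAGE_SIZE]


def hat_load(addr, page, trace):
    trace.append("hat_layer")           # platform-specific page table update
```

Each hop is a layer boundary where hardware or file system details can be swapped out, which is the flexibility the text credits to Solaris, and also where the extra call overhead comes from.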
Kernel observation and debugging tools are critical to understanding system behavior correctly. In this respect, Solaris has kmdb, mdb, and DTrace. Before Solaris went open source, Max spent many years "reverse engineering" it, and he found that answering a question with the tools was always faster than reading the code. I know that too, but it depends on the occasion, so do not let him mislead you. As for Linux, I think Max is simply not familiar with it, which is why he believes there are not many tools. For FreeBSD, he mentions only that gdb can be used to debug kernel crash dumps, which is also possible on Linux.
The following are best treated as summary reference material. Keep a clear head while reading them.
Solaris Internals: Core Kernel Architecture, by McDougall and Mauro
The Design and Implementation of the FreeBSD Operating System, by McKusick and Neville-Neil
Linux Kernel Development, 2nd Edition, by Love
Understanding the Linux Kernel, 2nd Edition, by Bovet and Cesati (this one should already be familiar to readers in China)

