A Comparison of the Solaris, Linux, and FreeBSD Kernels

I spend most of my time teaching courses on Solaris internals, device drivers, and kernel crash dump analysis and debugging. When I explain to students how various subsystems are implemented in Solaris, they often ask, "How is it implemented in Linux?" or "This is how FreeBSD does it, but what about Solaris?" This article discusses three basic kernel subsystems and compares their implementations in Solaris 10, Linux 2.6, and FreeBSD 5.3.

The three subsystems discussed in this article are scheduling, memory management, and file system architecture. I chose these subsystems because they are common to all operating systems (not only UNIX and UNIX-like systems) and are probably the easiest components of an operating system to understand.

This article does not examine these subsystems in depth. For more information, see the source code, various web sites, and books on the topic. Relevant books include:

* Solaris Internals: Core Kernel Architecture, by McDougall and Mauro

* The Design and Implementation of the FreeBSD Operating System, by McKusick and Neville-Neil

* Linux Kernel Development, 2nd Edition, by Love

* Understanding the Linux Kernel, 2nd Edition, by Bovet and Cesati

If you search the web for comparisons of Linux, FreeBSD, and Solaris, most of the hits discuss old releases of these operating systems (in some cases Solaris 2.5, Linux 2.2, and so on). Many of the "facts" are incorrect for the newest releases, and some were incorrect even for the releases they set out to describe. Of course, most of these pages also make value judgments about the merits of the operating systems in question, but there is little information comparing the kernels themselves. The following sites seem to be reasonably up to date:

* "Solaris vs. Linux" provides an in-depth comparison between Solaris 10 and Linux.

* "Comparing MySQL performance" compares Solaris 10, Linux, FreeBSD, and other operating systems.

* "Fast track to Solaris 10 Adoption" compares some aspects of Linux and Solaris.

* "Solaris 10 heads for Linux territory" is not a comparison, but a comment on Solaris 10.

What is interesting about these three operating systems is how much they have in common. Once you get past the different naming conventions, they take very similar paths to implementing the various concepts. Each supports time-shared scheduling of threads, demand paging with a not-recently-used page replacement algorithm, and a virtual file system layer that allows different file system architectures to be implemented. Ideas that originate in one operating system often find their way into the others. For example, Linux also uses the concepts behind the Solaris slab memory allocator. Many terms in the FreeBSD source code also appear in Solaris. With Sun's move to open the Solaris source code, I expect to see even more cross-fertilization of features. Currently, the LXR project provides a source cross-reference browser for FreeBSD, Linux, and other UNIX-related operating systems, available at fxr.watson.org. It would be great to see the OpenSolaris source code added to that site.

Scheduling and Schedulers

The basic unit of scheduling is the kthread_t in Solaris, the thread in FreeBSD, and the task_struct in Linux.

Solaris represents each process as a proc_t, and each thread within the process has a kthread_t. Linux represents both processes and threads with the task_struct structure. A single-threaded process in Linux has a single task_struct. A single-threaded process in Solaris has a proc_t, a single kthread_t, and a klwp_t. The klwp_t provides a save area for threads switching between user and kernel modes. A single-threaded process in FreeBSD has a proc structure, a thread structure, and a ksegrp structure. The ksegrp is a "kernel scheduling entity group." In effect, all three operating systems schedule threads, where a thread is a kthread_t in Solaris, a thread structure in FreeBSD, and a task_struct in Linux.

Scheduling decisions are based on priority. In Linux and FreeBSD, the lower the priority value, the better the priority. The scale is inverted: a value closer to 0 represents a higher priority. In Solaris, a larger value represents a higher priority. Table 1 shows the priority values of the three operating systems.

Table 1. Scheduling priorities in Solaris, Linux, and FreeBSD

 

Solaris
  0-59      Time-sharing, interactive, fixed-priority, and fair-share scheduler classes
  60-99     System class
  100-159   Real-time (note that real-time threads run at higher priority than system threads)
  160-169   Low-level interrupts

Linux
  0-99      System threads and real-time (SCHED_FIFO, SCHED_RR)
  100-139   User priorities (SCHED_NORMAL)

FreeBSD
  0-63      Interrupt
  64-127    Top-half kernel
  128-159   Real-time user (system threads run at better priority)
  160-223   Time-sharing user
  224-255   Idle user
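
Because the numeric direction differs between the systems, raw priority values cannot be compared across them. The following Python sketch simply encodes the ranges from Table 1 and answers two questions: which class a value falls into, and which of two values a given system would prefer. The names RANGES, priority_class, and runs_first are hypothetical helpers for illustration, not kernel interfaces.

```python
# Priority ranges transcribed from Table 1 (illustrative only).
RANGES = {
    "solaris": [(0, 59, "time-sharing/interactive/fixed/fair-share"),
                (60, 99, "system"),
                (100, 159, "real-time"),
                (160, 169, "low-level interrupts")],
    "linux": [(0, 99, "system threads and real-time"),
              (100, 139, "user (SCHED_NORMAL)")],
    "freebsd": [(0, 63, "interrupt"),
                (64, 127, "top-half kernel"),
                (128, 159, "real-time user"),
                (160, 223, "time-sharing user"),
                (224, 255, "idle user")],
}

# Solaris: larger value is better. Linux and FreeBSD: smaller value is better.
HIGHER_IS_BETTER = {"solaris": True, "linux": False, "freebsd": False}


def priority_class(os_name, prio):
    """Return the Table 1 class name that a priority value falls into."""
    for low, high, name in RANGES[os_name]:
        if low <= prio <= high:
            return name
    raise ValueError(f"priority {prio} is out of range for {os_name}")


def runs_first(os_name, prio_a, prio_b):
    """Return whichever of two priority values the system would schedule first."""
    if HIGHER_IS_BETTER[os_name]:
        return max(prio_a, prio_b)
    return min(prio_a, prio_b)
```

For example, runs_first("solaris", 10, 50) picks 50, while runs_first("linux", 10, 120) picks 10.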

All three operating systems favor interactive threads/processes. Interactive threads run at better priority than compute-bound threads, but tend to run for shorter time slices.

 

Solaris, FreeBSD, and Linux all use a per-CPU run queue. FreeBSD and Linux use an "active" queue and an "expired" queue. Threads are scheduled in priority order from the active queue. A thread moves from the active queue to the expired queue when it uses up its time slice (and possibly at other times to avoid starvation). When the active queue is empty, the kernel swaps the active and expired queues. FreeBSD has a third queue for "idle" threads; threads on this queue run only when the other two queues are empty. Solaris uses a per-CPU "dispatch queue." If a thread uses up its time slice, the kernel gives it a new priority and returns it to the dispatch queue. The run queues of all three operating systems have separate linked lists of runnable threads for different priorities.

FreeBSD uses one list per group of four priorities. Solaris and Linux use a separate list for each priority.
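
The active/expired arrangement described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Linux or FreeBSD code: the class name RunQueue and its methods are hypothetical, a smaller value means a better priority, there is one list per priority, and the best priority is found by a linear scan (a real kernel would avoid the scan).

```python
from collections import deque


class RunQueue:
    """Toy per-CPU run queue with active and expired arrays (smaller value = better)."""

    def __init__(self, num_priorities=140):
        self.active = [deque() for _ in range(num_priorities)]
        self.expired = [deque() for _ in range(num_priorities)]

    def enqueue(self, thread, prio):
        """Make a thread runnable on the active array."""
        self.active[prio].append(thread)

    def expire(self, thread, prio):
        """Called when a thread has used up its time slice."""
        self.expired[prio].append(thread)

    def pick_next(self):
        """Return (thread, prio) for the best runnable thread, or None if idle."""
        for _ in range(2):
            for prio, queue in enumerate(self.active):
                if queue:
                    return queue.popleft(), prio
            # The active array is empty: swap the arrays and look once more.
            self.active, self.expired = self.expired, self.active
        return None
```

With two threads queued at priorities 100 and 120, repeatedly calling pick_next() and then expire() runs them in the order 100, 120, 100, 120: the swap happens only after both have used their slices.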

Linux and FreeBSD use an arithmetic calculation based on a thread's run time versus its sleep time (a measure of "interactiveness") to arrive at a priority for the thread. Solaris performs a table lookup.
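
The difference between the two approaches can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the formula, the bonus range, and the table values are invented and are not taken from any of the three kernels. On Solaris, the time-sharing dispatch table is documented in ts_dptbl(4); the table below only imitates the general idea of looking up the next priority.

```python
def interactivity_priority(base_prio, run_time, sleep_time, max_bonus=5):
    """Arithmetic style (smaller value = better priority).

    Threads that mostly sleep get a bonus; threads that mostly run get a penalty.
    """
    total = run_time + sleep_time
    if total == 0:
        return base_prio
    bonus = (sleep_time - run_time) * max_bonus // total
    return base_prio - bonus


# Table-lookup style (larger value = better priority).
# Hypothetical rows, indexed by current priority level:
# (time slice in ms, priority after using the whole slice, priority after sleeping)
DISPATCH_TABLE = [
    (200, 0, 3),
    (160, 0, 4),
    (120, 1, 4),
    (80, 2, 5),
    (40, 3, 5),
    (20, 4, 5),
]


def table_priority(current_prio, used_full_slice):
    """Look up the thread's next priority instead of computing it."""
    _slice_ms, after_expiry, after_sleep = DISPATCH_TABLE[current_prio]
    return after_expiry if used_full_slice else after_sleep
```

In both styles a thread that sleeps a lot ends up with a better priority; the table version trades arithmetic for values that an administrator can inspect and change.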

 

None of the three operating systems supports "gang scheduling." Rather than scheduling n threads at once, each operating system in effect schedules the next thread to run. All three have mechanisms for taking advantage of caching (warm affinity) and for load balancing.

 

For hyperthreaded CPUs, FreeBSD has a mechanism to help keep threads on the same CPU node (though possibly on a different hyperthread). Solaris has a similar mechanism, but it is under the control of the user and the application, and it is not restricted to hyperthreads (it is called "processor sets" in Solaris and "processor groups" in FreeBSD).

One of the biggest differences between Solaris and the other two operating systems is its ability to support multiple "scheduling classes" on the system at the same time. All three operating systems support POSIX SCHED_FIFO, SCHED_RR, and SCHED_OTHER (or SCHED_NORMAL).

SCHED_FIFO and SCHED_RR typically result in "real-time" threads. (Note that Solaris and Linux support kernel preemption in support of real-time threads.) Solaris has a "fixed priority" class, a "system" class for system threads (such as the page-out threads), an "interactive" class for threads running in a windowing environment under control of the X server, and the Fair Share Scheduler in support of resource management. See priocntl(1) for information about using the classes, as well as an overview of the features of each class. See FSS(7) for an overview specific to the Fair Share Scheduler.

 

The scheduler on FreeBSD is chosen at compile time, and on Linux the scheduler depends on the version of Linux.

The ability to add new scheduling classes to the system comes at a price. Everywhere in the kernel that a scheduling decision can be made (except for the actual act of choosing the thread to run), there is an indirect function call into scheduling-class-specific code. For instance, when a thread is about to go to sleep, it calls class-dependent code that does whatever is necessary for sleeping in that class.

On Linux and FreeBSD, the scheduling code simply performs the needed action; there is no need for an indirect call. The extra layer means slightly more scheduling overhead on Solaris (but more features).
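
A minimal Python sketch of that indirection is shown here. The class names, the Thread type, and the sleep policy (a fixed boost of 10, capped at 59) are all hypothetical and chosen only to make the dispatch visible; they are not the Solaris implementation.

```python
from dataclasses import dataclass


class TimeShareClass:
    """Hypothetical time-sharing class: sleeping earns a priority boost."""

    def sleep(self, thread):
        thread.prio = min(thread.prio + 10, 59)


class RealTimeClass:
    """Hypothetical real-time class: priority is fixed, so sleeping changes nothing."""

    def sleep(self, thread):
        pass


@dataclass
class Thread:
    name: str
    prio: int              # larger value = better priority, as on Solaris
    sched_class: object    # the scheduling class this thread belongs to
    state: str = "running"


def thread_sleep(thread):
    """Generic kernel path: class-specific work is reached through an indirect call."""
    thread.sched_class.sleep(thread)
    thread.state = "sleeping"
```

The generic thread_sleep() never needs to know which class it is dealing with, which is what makes adding a class possible, and the extra call through sched_class is the overhead the text refers to.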

Memory Management and Paging

In Solaris, every process has an "address space" made up of logical sections called "segments." The segments of a process address space can be viewed with pmap(1). Solaris divides the memory management code and data structures into platform-independent and platform-specific parts. The platform-specific portions of memory management are in the HAT (hardware address translation) layer. FreeBSD describes the process address space with a vmspace, divided into logical sections called regions. The hardware-dependent portions are in the "pmap" (physical map) module, while the "vmap" routines handle the hardware-independent portions and data structures. Linux uses a memory descriptor to divide the process address space into logical sections called "memory areas." Linux also has a pmap command for examining a process address space.

Linux divides machine-dependent layers from machine-independent layers at a much higher level in the software. On Solaris and FreeBSD, much of the code that deals with, for instance, page fault handling is machine-independent. On Linux, the code that handles page faults is largely machine-dependent from the very beginning of fault handling. One consequence is that Linux can handle much of the paging code more quickly, because there is less data abstraction (layering) in the code. The cost is that a change in the underlying hardware or model requires more changes to the code. Solaris and FreeBSD isolate such changes to the HAT and pmap layers, respectively.

Segments, regions, and memory areas are delimited by the following:

* The virtual address of the start of the area.
* Their location within the object/file that the segment/region/memory area maps.
* Permissions.
* The size of the mapping.

For example, the text of a program is in a segment/region/memory area. The mechanisms that the three operating systems use to manage address spaces are very similar, but the names of the data structures are completely different. Again, more of the Linux code is machine-dependent than is the case in the other two operating systems.
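
The four attributes listed above are enough to model an address space generically. The Python sketch below is only an illustration: the Mapping type, its field names, the example addresses, and the /bin/app path are all made up, and none of the three kernels uses these names.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """One segment / region / memory area of a process address space."""
    start: int     # virtual address where the mapping begins
    size: int      # size of the mapping in bytes
    perms: str     # permissions, for example "r-x" or "rw-"
    backing: str   # object or file being mapped
    offset: int    # location of the mapping within the backing object

    def contains(self, addr):
        return self.start <= addr < self.start + self.size


def find_mapping(address_space, addr):
    """Return the mapping that covers addr, or None if the address is unmapped."""
    for mapping in address_space:
        if mapping.contains(addr):
            return mapping
    return None


def backing_offset(mapping, addr):
    """Translate a virtual address into an offset within the backing object."""
    return mapping.offset + (addr - mapping.start)


# A made-up two-mapping address space: program text and initialized data.
ADDRESS_SPACE = [
    Mapping(0x10000, 0x4000, "r-x", "/bin/app", 0x0),
    Mapping(0x20000, 0x2000, "rw-", "/bin/app", 0x4000),
]
```

A fault handler would perform a lookup of this kind first: an address with no mapping is an error, while a mapped address tells the kernel which object and offset to fetch the page from.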

Paging

All three operating systems use a variation of a least recently used algorithm for page stealing/replacement. All three have a daemon process/thread that does page replacement. On FreeBSD, the vm_pageout daemon wakes up periodically and whenever free memory becomes low. When available memory drops below certain thresholds, vm_pageout runs a routine (vm_pageout_scan) that scans memory and tries to free some pages.

The vm_pageout_scan routine may need to write modified pages to disk asynchronously before freeing them. There is one such daemon regardless of the number of CPUs. Solaris also has a pageout daemon that runs periodically and in response to low-free-memory situations. Paging thresholds in Solaris are calibrated automatically at system startup so that the daemon does not overuse the CPU or flood the disks with page-out requests. The FreeBSD daemon uses values that are, for the most part, hard-coded or tunable to determine the paging thresholds. Linux also uses an LRU (least recently used) algorithm that is tuned dynamically while it runs. On Linux, there can be multiple kswapd daemons, as many as one per CPU. All three operating systems use a global working set policy (as opposed to a per-process working set).

FreeBSD has several page lists for keeping track of recently used pages. These lists track "active," "inactive," "cached," and "free" pages. Pages move between the lists depending on how they are used. Frequently accessed pages tend to stay on the active list. The data pages of a process that exits can be placed on the free list immediately. FreeBSD may swap entire processes out if vm_pageout_scan cannot keep up with the load (for example, if the system is low on memory). If the memory shortage is severe enough, vm_pageout_scan kills the largest process on the system.

Linux also uses different linked lists of pages to facilitate an LRU-style algorithm. Linux divides physical memory into (possibly multiple sets of) three "zones": one for DMA pages, one for normal pages, and one for dynamically allocated memory. These zones appear to be very much an implementation detail caused by x86 architectural constraints. Pages move between "hot," "cold," and "free" lists. The mechanism for moving pages between the lists is very similar to that of FreeBSD. Frequently accessed pages are on the "hot" list. Free pages are on the "cold" or "free" list.

Solaris uses a free list, a hashed list, and a vnode page list to maintain its variation of an LRU replacement algorithm. Instead of scanning the vnode or hashed page lists (more or less the equivalent of the "active"/"hot" lists in FreeBSD/Linux), Solaris scans all pages using a "two-handed clock" algorithm, as described in Solaris Internals and elsewhere. The two hands stay a fixed distance apart. The front hand ages a page by clearing its reference bit(s). If no process has referenced the page since the front hand visited it, the back hand frees the page (first writing it to disk asynchronously if it has been modified).
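
The two-handed clock can be sketched as a small simulation. This Python model is a simplification under stated assumptions: the class and method names are hypothetical, every page starts out referenced, each call to step() moves both hands by one page, and the write-back of modified pages is left out.

```python
class TwoHandedClock:
    """Toy two-handed clock scanner over a fixed array of pages."""

    def __init__(self, num_pages, handspread):
        self.referenced = [True] * num_pages   # assume all pages start referenced
        self.free = [False] * num_pages
        self.back = 0                          # back hand starts at page 0
        self.front = handspread % num_pages    # front hand leads by handspread pages

    def touch(self, page):
        """A process references the page, which sets its reference bit."""
        self.referenced[page] = True

    def step(self):
        """Advance both hands by one page; return the page freed, or None."""
        n = len(self.referenced)
        # Front hand: age the page by clearing its reference bit.
        self.referenced[self.front] = False
        # Back hand: free the page if its reference bit is still clear.
        freed = None
        if not self.referenced[self.back] and not self.free[self.back]:
            self.free[self.back] = True
            freed = self.back
        self.front = (self.front + 1) % n
        self.back = (self.back + 1) % n
        return freed
```

With 8 pages and a hand spread of 2, the front hand clears pages 2 and 3 in the first two steps. If page 2 is touched before the back hand reaches it, page 2 survives and page 3 is the first page freed. A larger hand spread gives pages more time to be referenced again before the back hand arrives.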

All three operating systems take NUMA locality into account during paging. In all three, the I/O buffer cache and the virtual memory page cache are merged into one system page cache. The system page cache is used for reads and writes of files, as well as for mmapped files and the text and data of applications.

File Systems

All three operating systems use a data abstraction layer to hide file system implementation details from applications. In all three, you use open, close, read, write, stat, and other system calls to access files, regardless of the underlying implementation and organization of the file data. Solaris and FreeBSD call this mechanism VFS ("virtual file system"), and its principal data structure is the vnode ("virtual node"). Every file being accessed in Solaris or FreeBSD has a vnode assigned to it. In addition to generic file information, the vnode contains pointers to file-system-specific information. Linux uses a similar mechanism, also called VFS (for "virtual file switch"). In Linux, the file-system-independent data structure is the inode, which is similar to the vnode in Solaris/FreeBSD. (Note that Solaris and FreeBSD also have an inode structure, but it holds file-system-dependent data for UFS file systems.) Linux has two different structures, one for file operations and one for inode operations. Solaris and FreeBSD combine these as "vnode operations."
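
The idea of a generic node that carries pointers to file-system-specific operations and data can be shown with a short Python sketch. The two toy file systems, the Vnode class, and the vfs_read and vfs_stat functions are hypothetical names invented for this example; they do not correspond to the actual interfaces of any of the three kernels.

```python
class Vnode:
    """File-system-independent handle for an open file."""

    def __init__(self, ops, data):
        self.ops = ops      # operations supplied by the file system type
        self.data = data    # file-system-specific information


class MemFsOps:
    """Toy file system whose files are byte strings held in memory."""

    def read(self, vnode, offset, length):
        return vnode.data[offset:offset + length]

    def getattr(self, vnode):
        return {"size": len(vnode.data)}


class ZeroFsOps:
    """Toy file system whose files read as zero bytes; data holds only the size."""

    def read(self, vnode, offset, length):
        count = max(0, min(length, vnode.data - offset))
        return bytes(count)

    def getattr(self, vnode):
        return {"size": vnode.data}


def vfs_read(vnode, offset, length):
    """Generic read path: dispatch through the vnode's operations."""
    return vnode.ops.read(vnode, offset, length)


def vfs_stat(vnode):
    """Generic stat path: the caller never sees which file system answers."""
    return vnode.ops.getattr(vnode)
```

Code that calls vfs_read() and vfs_stat() works unchanged for both toy file systems, which is the property that lets a real VFS host many file system types side by side.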

VFS allows many file system types to be implemented on the system. This means there is no reason why one of these operating systems could not access the file systems of the others. Of course, this requires the relevant file system routines and data structures to be ported to the VFS of the operating system in question. All three operating systems allow file systems to be stacked. Table 2 lists file system types implemented in each operating system; it does not show all of them.

Table 2. List of partial file system types

Solaris
  ufs        Default local file system (based on the BSD Fast File System)
  nfs        Remote files
  proc       /proc files; see proc(4)
  namefs     Name file system; allows doors/streams to be opened as files
  ctfs       Contract file system, used with the Service Management Facility
  tmpfs      Uses anonymous space (memory/swap) for temporary files
  swapfs     Keeps track of anonymous space (data, heap, stack, and so on)
  objfs      Keeps track of kernel modules; see objfs(7FS)
  devfs      Keeps track of /devices files; see devfs(7FS)

FreeBSD
  ufs        Default local file system (ufs2, based on the BSD Fast File System)
  devfs      Keeps track of /dev files
  ext2       Linux ext2 file system (GNU-based)
  nfs        Remote files
  ntfs       Windows NT file system
  smbfs      Samba file system
  portalfs   Mounts a process onto a directory
  kernfs     Files containing various system information

Linux
  ext3       Journaling file system derived from ext2
  ext2       Second extended file system
  afs        AFS client support for remote file sharing
  nfs        Remote files
  coda       Another networked file system
  procfs     Processes, processors, buses, and platform-specific information
  reiserfs   Journaling file system

Solaris, FreeBSD, and Linux are obviously benefiting from each other. With Solaris going open source, I expect this to continue at a faster rate. My impression is that change is most rapid in Linux. The benefit is that new technology is integrated into the system quickly. The drawback is that documentation (and robustness) sometimes lags behind.

Linux has many developers, and sometimes it shows. FreeBSD has been around (in some sense) the longest of the three systems. Solaris has its basis in a combination of BSD UNIX and AT&T Bell Labs UNIX. Solaris uses more data abstraction layering, and because of this it can generally support additional features quite easily. However, most of this layering in the kernel is undocumented. Access to the source code will probably change this.

A simple example of the differences between the three operating systems is page fault handling. In Solaris, when a page fault occurs, the code starts in a platform-specific trap handler and then calls the generic as_fault() routine. This routine determines the segment in which the fault occurred and calls a "segment driver" to handle the fault. The segment driver calls into file system code. The file system code then calls into the device driver to bring in the page. When the page-in is complete, the segment driver calls the HAT layer to update the page table entries (or their equivalent). In Linux, when a page fault occurs, the kernel calls the code that handles the fault, and you are immediately in platform-specific code. This means that the fault handling code can be quicker in Linux, but the Linux code may not be as easily extended or ported.

Kernel visibility and debugging tools are critical to getting a correct understanding of system behavior. Yes, you can read the source code, but I maintain that it is easy to misread the code. Having tools available to test your hypotheses about how the code works is invaluable. In this respect, I see Solaris, with kmdb, mdb, and DTrace, as the clear winner. I have been "reverse engineering" Solaris for years, and I find that I can usually answer a question faster by using the tools than by reading the source code. With Linux, I do not have as many choices here. FreeBSD allows gdb to be used on kernel crash dumps. gdb can set breakpoints, single-step, and examine and modify data and code. On Linux, this is also possible once you download and install the tools.

Max Bruning currently teaches and consults on Solaris internals, device drivers, kernel (as well as application) crash analysis and debugging, networking internals, and specialized topics. Contact him at max at bruningsystems dot com or http://mbruning.blogspot.com/.
