Introduction to Linux cluster file systems and new challenges for cluster applications


Introduction to Linux cluster file systems

Cluster file systems complement the database clustering capabilities of Oracle RAC in many respects. This article compares them.

Typically, a cluster is simply a set of servers (PCs or workstations) that operate as a single system. That definition has broadened considerably: cluster technology is a fast-moving field, and its various applications are constantly absorbing new features. At the same time, clustered file system technology, whether open source or proprietary, is rapidly converging in functionality.

Many people talk about cluster applications and the file system software they use as if they were the same thing. More precisely, most clusters contain two main components: servers that share storage media over a fast network, and a file system that acts as the software "glue" that keeps the cluster nodes working together.

In the "Linux File System Mastery Guide" article, I explained how file system methods and data structures provide a user-level perspective of the physical structure of a hard disk partition. Although there are differences between projects, the clustered file system does the same work for multiple nodes of a cluster: they make all nodes look like part of a single system, while allowing all nodes of the cluster to perform concurrent read and write operations.

In this follow-up article, we take a higher-level look at the overall differences between clustered file systems and at some features of the Oracle Real Application Clusters (RAC) environment. It should be instructive for database administrators or system administrators who are new to clusters, Linux file systems, or Oracle RAC.

Introduction to cluster applications

Cluster applications have various levels of maturity and functionality. They include:

High-performance clusters, also known as parallel or compute clusters, are used in systems that handle large computational workloads. In these clusters, a parallel file system distributes processing across the nodes, allowing each node to access the same file concurrently through parallel reads and writes. The Beowulf Linux cluster, developed at NASA in the early 1990s, is the best-known example.

High availability (HA) clusters are built specifically for fault tolerance and redundancy. Because these clusters include one or more redundant servers, the remaining servers can take over the processing of any server that goes down.

Load-balancing clusters distribute the load as evenly as possible across multiple servers (typically web servers).

Storage clusters are used between SANs and servers running different operating systems to provide shared access to blocks of data on common storage media.

Database clusters, with Oracle RAC as the representative platform, build many clustered file system features into the application itself.

These cluster application types have overlapping features, and more than one is often found in a single cluster application, especially HA and load balancing. For example, Oracle RAC can be installed on an HA clustered file system to bring the benefits of a database cluster into an HA cluster application. Those benefits include:

Shared resources, including data, storage, disks, and metadata, so that multiple nodes appear as a single file system and all members of the cluster can read and write to it simultaneously.

Aggregation of storage devices into a single disk volume, improving performance without data replication.

Scalable capacity, bandwidth, and connectivity.

A single system image that provides the same data view for all nodes.

Now let's look at some of the cluster-aware Linux file system options that support Oracle RAC, and how they can extend its capabilities.

Clustered file systems that can run Oracle

Oracle RAC technology already provides features such as load balancing, redundancy, failover, scalability, caching, and locking, so functionality is duplicated when Oracle data files sit on a block device formatted with a traditional Linux file system such as ext2/ext3. In that case performance suffers, because both the Oracle cache and the file system cache consume memory.

At the time this article was written, there were four file system options for running Oracle RAC, in addition to third-party clustered file systems. In Oracle's recommended order, they are:

Oracle Automatic Storage Management

Oracle Cluster File System

Network File System

Raw devices

Oracle Automatic Storage Management (ASM). One hallmark of Oracle is that no matter what environment it runs in, once you learn the Oracle interface, the look, feel, and behavior are the same everywhere. Oracle ASM, a feature of Oracle Database 10g, extends this consistent environment to storage management: you use SQL statements, Oracle Enterprise Manager Grid Control, or the Database Configuration Assistant to create and manage storage content and metadata. Using ASM for Oracle Database 10g data file storage is considered the best practice.

The basic data structure in ASM is the disk group, which consists of one or more disks. In this context, a "disk" can be a disk partition, an entire disk, a concatenated set of disks, a partition of a storage array, or an entire storage array.

It is important to realize that ASM is not a general-purpose clustered file system. Rather, ASM is a cluster-aware file system designed specifically to handle Oracle database files, control files, and log files. ASM should not be combined with a Logical Volume Manager (LVM), because the LVM would hide the disks from ASM.

ASM performs the following functions:

Identifies disks by an ASM ID written in the disk header

Dynamically distributes data across all the storage in a disk group, providing optional redundancy protection and cluster-aware capability

Allows major storage operations while the Oracle database is fully operational: disks can be added, removed, or even entire disk groups moved to a new storage array (rare though that is) without downtime

Automatically load balances, and rebalances data when disks are added or removed

Provides additional redundancy protection through failure groups

Optimizes the use of storage resources

Whether installed on raw devices or on block devices (for which Oracle recommends the ASM library driver), ASM itself runs as an instance that starts before the database instance. It lets DBAs create, expand, and shrink disks, and it propagates those changes to the disk groups on the other nodes that share access to them. The storage pool can be shared by database instances on multiple nodes of a cluster.

ASM is installed by the Oracle Universal Installer. If you add ASM to an existing database, make sure the database is made dependent on the ASM instance so that at boot time the ASM instance starts before the database that depends on it. For example:

$ srvctl modify instance -d o10g -i o10g1 -s +asm1

This makes the o10g1 instance dependent on the +ASM1 instance.

An ASM instance differs from an Oracle database instance in the following ways:

An ASM instance has no data dictionary, but you can query several V$ views for information about it: V$ASM_DISKGROUP, V$ASM_CLIENT, V$ASM_DISK, V$ASM_FILE, V$ASM_TEMPLATE, V$ASM_ALIAS, and V$ASM_OPERATION.

You can connect to an ASM instance only as SYSDBA or SYSOPER.

An ASM instance has five initialization parameters, of which INSTANCE_TYPE is required and must be set as follows: INSTANCE_TYPE = ASM.
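
As a minimal sketch of such a parameter file (the disk discovery string and disk group name below are assumptions for illustration; the remaining parameters can usually be left at their defaults):

INSTANCE_TYPE   = ASM
ASM_DISKSTRING  = '/dev/raw/raw*'
ASM_DISKGROUPS  = DATA
ASM_POWER_LIMIT = 1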

From the ASM instance, the DBA can use SQL syntax or Enterprise Manager to do the following (a SQL*Plus sketch follows this list):

Define a disk group, which is the storage pool, from one or more disks


Add and remove disks in a disk group

Define failure groups to increase data redundancy protection. A failure group is typically a set of disks within a disk group that share a common resource, such as a controller, and can therefore be expected to fail together.
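
Connected to the ASM instance with SQL*Plus, those operations might look like the following sketch (the disk group name data, the failure group names, and the device paths are assumptions for illustration):

SQL> CREATE DISKGROUP data NORMAL REDUNDANCY
       FAILGROUP fg1 DISK '/dev/raw/raw1'
       FAILGROUP fg2 DISK '/dev/raw/raw2';
SQL> ALTER DISKGROUP data ADD DISK '/dev/raw/raw3';
SQL> ALTER DISKGROUP data DROP DISK data_0002;

The DROP DISK command uses the disk name that ASM assigned when the disk was added.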


You can monitor the state of ASM disk groups either through Enterprise Manager or through the V$ASM views. You can then allocate that storage from a database instance by referencing the disk groups when you create database structures.
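
For example, from the ASM instance a quick check of disk group state and free space might look like this (the column selection is only illustrative):

SQL> SELECT name, state, total_mb, free_mb FROM v$asm_diskgroup;
SQL> SELECT group_number, name, path, mode_status FROM v$asm_disk;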

When you create tablespaces, redo logs, archived log files, and control files, you reference the ASM disk groups from the database instance by specifying them in initialization parameters or directly in the DDL.
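
For instance, with a disk group named data (an assumption carried over from the sketch above), the database instance can place new files in ASM either through an initialization parameter or directly in the DDL:

SQL> ALTER SYSTEM SET db_create_file_dest = '+DATA';
SQL> CREATE TABLESPACE users2 DATAFILE '+DATA' SIZE 100M;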

For more detailed information about ASM, see the OTN article "Automatic Storage" by Lannes Morris-Murphy, the ASM installment of Arup Nanda's "Oracle Database 10g: The Top 20 Features for DBAs" series, and Chapter 12 of the Oracle Database Administrator's Guide 10g Release 1 (10.1).

Oracle Cluster File System (OCFS). OCFS is designed specifically to support data and disk sharing for Oracle RAC applications. It presents a consistent file system image across the server nodes of a RAC cluster and serves as an alternative to raw devices. Besides simplifying cluster database administration, it overcomes the limitations of raw devices while retaining their performance advantages.

OCFS version 1 supports Oracle data files, spfiles, control files, quorum disk files, archived logs, configuration files, and the Oracle Cluster Registry (OCR) file (new in Oracle Database 10g). It is not designed for general file system files, nor for the Oracle software itself, which must still be installed on each node of the cluster unless you use a third-party solution. In addition, OCFS does not provide LVM functionality such as I/O striping, nor does it provide redundancy.

Oracle supports OCFS version 1 on 32-bit and 64-bit releases of Red Hat Advanced Server 2.1, Red Hat Enterprise Linux 3, and Novell SUSE (United Linux 1.0), and it must be installed from the downloadable binaries. If you recompile it yourself, Oracle does not provide support.

There are three different RPM packages:

The OCFS kernel module, whose build differs between the Red Hat and United Linux distributions. Verify your kernel version carefully:

$ uname -a
Linux linux 2.4.18-4GB #1 Wed Mar 13:57:05 UTC 2002 i686 unknown

The OCFS support package

The OCFS tools package.

When you have downloaded these RPM packages, perform the following installation steps:

Install the packages by running rpm -Uhv ocfs*.rpm in the directory where you downloaded them.

Verify that automatic mounting at boot time is enabled.

Use ocfstool to configure OCFS automatically on each node of the cluster. Manual configuration is also possible; see the OCFS User's Guide for details. The end result of this step is the OCFS configuration file /etc/ocfs.conf.

Run load_ocfs to ensure that OCFS is loaded at startup.

Format the OCFS partitions with either ocfstool (in a GUI environment) or mkfs.ocfs.

Mount the OCFS partitions manually, or add entries to /etc/fstab so they mount automatically (a sketch follows these steps).
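
A consolidated sketch of the formatting and mounting steps (the partition /dev/sdb1, the mount point /u02/oradata, and the options shown are assumptions for illustration; check the OCFS User's Guide for the exact flags your release expects):

# mkfs.ocfs -b 128 -L oradata -m /u02/oradata /dev/sdb1
# mount -t ocfs /dev/sdb1 /u02/oradata
# grep ocfs /etc/fstab
/dev/sdb1   /u02/oradata   ocfs   _netdev   0 0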

For a more detailed description of these steps, see the best practices documentation.

Because OCFS version 1 is not fully POSIX-compliant, file commands such as cp, dd, tar, and the textutils need coreutils builds that provide an O_DIRECT switch. That switch lets these commands work as expected on Oracle data files even while Oracle is operating on the same files (the problem arises only when running third-party hot backup software; using RMAN avoids it entirely). If you still need these commands for maintenance tasks, you can download the O_DIRECT-enabled coreutils from /projects/coreutils/files.

By contrast, OCFS version 2 (still in beta as of March 2005) is POSIX-compliant and supports the Oracle database software itself, which can be installed on one node and shared by the other nodes of the cluster. Beyond the shared Oracle home, other new features of OCFS version 2 include improved metadata caching, space allocation, and locking, as well as improved journaling and node recovery.

Network File System (NFS). Although ASM and OCFS are the preferred file systems for Oracle RAC, Oracle also supports NFS on certified network file servers. NFS is a distributed file system, and a full discussion is beyond the scope of this article; for more information, visit the NFS home page.

Raw devices. For a time, raw devices were the only option for running Oracle RAC. A raw device is a disk drive without a file system on it, and it can be divided into multiple raw partitions. Raw devices allow direct access to the hardware partition, bypassing the file system buffer cache.

For Oracle RAC to use raw devices, block devices must be bound to raw devices with the Linux raw command before the Oracle software is installed:

# raw /dev/raw/raw1 /dev/sda
/dev/raw/raw1: bound to major 8, minor 0
# raw /dev/raw/raw2 /dev/sda1
/dev/raw/raw2: bound to major 8, minor 1
# raw /dev/raw/raw3 /dev/sda2
/dev/raw/raw3: bound to major 8, minor 2

After binding, you can use the raw command to query all raw devices.

# raw -qa
/dev/raw/raw1: bound to major 8, minor 0
/dev/raw/raw2: bound to major 8, minor 1
/dev/raw/raw3: bound to major 8, minor 2

The major and minor numbers tell the kernel which driver and device to use. The major number identifies the overall device type, and the minor number identifies a particular device of that type. In the example above, major number 8 is the device type of the SCSI disk /dev/sda.

Note that the device does not need to be accessible for these commands to run; when I ran them for this demonstration, my system had no SCSI disks attached. The bindings disappear at the next reboot unless I put the commands in a boot script such as /etc/init.d/boot.local or /etc/init.d/dbora, which runs every time the system boots.
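
A minimal boot-script sketch that recreates the bindings and ownership at startup (the device names and the choice of /etc/init.d/boot.local are carried over from the examples above):

#!/bin/sh
# Rebind the raw devices to the SCSI partitions at boot
raw /dev/raw/raw1 /dev/sda
raw /dev/raw/raw2 /dev/sda1
raw /dev/raw/raw3 /dev/sda2
# Restore ownership so the Oracle software can open them
chown oracle:oinstall /dev/raw/raw1 /dev/raw/raw2 /dev/raw/raw3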

After mapping a block device to a raw device, you still need to make sure the raw device is owned by the oracle user and the oinstall group:

# ls -l /dev/raw/raw1
crw-rw----    1 root     disk     162,   1 Mar  2002 /dev/raw/raw1
# chown oracle:oinstall /dev/raw/raw1
# ls -l /dev/raw/raw1
crw-rw----    1 oracle   oinstall 162,   1 Mar  2002 /dev/raw/raw1

You can then create symbolic links between the Oracle data files and the raw devices to make them easier to manage.
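
For example (the directory layout and file names here are hypothetical):

# ln -s /dev/raw/raw2 /u01/oradata/o10g/system01.dbf
# ln -s /dev/raw/raw3 /u01/oradata/o10g/users01.dbf

The database can then refer to the familiar path /u01/oradata/o10g/system01.dbf while the I/O still goes to the underlying raw partition.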

Raw device limits in Linux kernel 2.4 include one raw device per partition and 255 raw devices per system. Novell SUSE Enterprise Linux ships with 63 raw device files, but you can create up to 255 with the mknod command, which requires root privileges.

# ls /dev/raw/raw64
ls: /dev/raw/raw64: No such file or directory
# cd /dev/raw
linux:/dev/raw # mknod raw64 c 162 64
# ls /dev/raw/raw64
/dev/raw/raw64

The mknod command takes a device name, a device type, and major and minor numbers. The device name here is raw64 and the device type is "c" (a character device); the major and minor numbers of the new device are 162 and 64, respectively. Alternatively, Novell SUSE users can create these devices by installing the orarun RPM.

Other disadvantages of raw devices include:

The number of raw partitions on a disk is limited to 14.

Oracle Managed Files (OMF) is not supported.

Raw partitions cannot be resized, so if space runs out you must create another partition and add another database file.

A raw device looks like unused space, so other applications may overwrite it.

Outside the database, the only way to write to a raw device is the low-level dd command, which transfers raw data between devices or files; you must take extra care that I/O is properly coordinated between memory and disk.
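
For example, a raw partition holding a data file could be copied out to an ordinary file, or back, with dd; a sketch with hypothetical names and an 8 KB block size to match a typical Oracle block size:

# dd if=/dev/raw/raw2 of=/u01/backup/system01.dbf bs=8192
# dd if=/u01/backup/system01.dbf of=/dev/raw/raw2 bs=8192

As noted above, such copies must be carefully coordinated with database activity.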

A raw partition can hold only a single data file, control file, redo log, and so on. Unless you use ASM, you must provide a separate raw device for each data file associated with a tablespace; a tablespace can, however, have multiple data files on different raw partitions.
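
As a sketch of that last point (the raw device names and sizes are hypothetical, and each data file is conventionally sized slightly smaller than its raw partition), a single tablespace can span several raw partitions:

SQL> CREATE TABLESPACE app_data
       DATAFILE '/dev/raw/raw4' SIZE 1000M,
                '/dev/raw/raw5' SIZE 1000M;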

Conclusions

Oracle RAC provides many features of a file system (clustered or not), which minimizes the work left for the file system itself. As noted earlier, all you need is a file system that complements RAC's existing, built-in database clustering capabilities. Although OCFS, NFS, and raw devices are all viable, in most cases ASM fills this role most completely and is Oracle's recommended best practice. You can also combine approaches: ASM for data files, OCFS for the voting disk, OCR, and Oracle home directories, or ASM on top of NFS storage.

Looking ahead, the shared Oracle home directory in OCFS version 2 promises a further improvement on ASM-based shared storage.



New challenges for Linux cluster applications

Linux cluster computing has changed the landscape of high-performance computing: low-cost Linux cluster systems are replacing expensive traditional supercomputers and are being used to solve ever more challenging high-performance computing problems.

To realize the full performance potential of Linux cluster systems, a new storage mechanism is needed, and object-based cluster storage technology has emerged to fill that role. It forms the foundation of a new class of storage systems that scale well in both capacity and access performance, which lets it meet the storage needs of powerful Linux cluster systems.

In recent years, high-performance cluster computing has produced remarkable results in scientific research and engineering computation. Cluster technology has gradually taken the leading position in high-performance computing, as reflected in the world supercomputer rankings published in November 2003: of the top 500 supercomputers, 208 were cluster systems, making clusters the most popular architecture for high-performance computers.

This trend is now spreading from scientific and engineering computing into the commercial sector. Geologists are developing more powerful seismic analysis techniques to obtain a finer picture of the Earth's structure and guide the drilling and development of oil fields; pharmaceutical companies are mining vast gene banks to gain a deeper understanding of human disease and develop more effective drugs and therapies; and familiar portals such as Yahoo and Google must search and sort the enormous volume of data on the Internet for users around the world. All of these areas have become homes for Linux cluster computing. At the same time, the widespread adoption of Linux cluster computing has brought new challenges.

Growing demand for shared storage performance

Besides demanding high-performance computing, these commercial applications share a common trait: they all require high-performance I/O. Efficient use of a cluster system presupposes fast access to terabyte-scale (1 TB = 1000 GB, 1 GB = 1000 MB) shared data; without it, cluster performance drops sharply. To simplify application development and maintenance, this shared data must be available to every process on the compute cluster. As cluster systems grow and node counts rise, the demands on the storage system for efficient shared access from every node keep increasing, and traditional network-based storage systems cannot deliver the necessary performance.

Consider animation rendering (the earliest and most famous example being the special effects for the film Titanic), which used a 160-node Linux cluster to distribute scene generation across its compute nodes, each responsible for generating a separate part of the final scene. The shared scene and character information, along with the rendering instructions for each frame, must be accessible to every node participating in the computation, and each frame a node renders produces roughly 50 MB of output. Finally, the individual frames are combined in sequence into the complete picture. This pattern of data access is typical of many cluster computing applications.

Disadvantages of traditional shared storage methods

Cluster computing developers naturally turn to shared storage systems that every node in the cluster can access. Let's take a quick look at the existing options.

The first is the file server, which attaches a disk array (RAID) directly to a server in the network, a form of network storage known as DAS (Direct Attached Storage). In this structure, storage devices of various types are connected to the file server over an I/O bus such as IDE or SCSI. A cluster node's data access must go through the file server and then across the I/O bus to reach the appropriate storage device. As the number of connected nodes grows, the I/O bus becomes a bottleneck, so this kind of storage suits only small clusters; larger clusters require more scalable storage systems.

Storage area network (SAN) and optimized direct network storage, that is, network attached storage (NAS), architectures are used in medium-scale cluster systems. A SAN is a high-speed storage network, similar to an ordinary LAN, typically built from RAID arrays connected over Fibre Channel. Data exchange between the SAN and the cluster nodes is usually carried out with SCSI commands rather than network protocols (as shown in Figure 1).

In a NAS architecture, the storage system is no longer attached to a particular server or client over an I/O bus; instead it connects directly to the network through a network interface, and cluster nodes access the shared data over network protocols such as TCP/IP.


As clusters grow large, however, these architectures show serious shortcomings. Faced with the high concurrency of many cluster computing applications and the demand for high throughput from individual nodes, neither SAN nor NAS is sufficient. Because of these two limitations, practitioners often resort to a data "staging" strategy: data is first copied from the shared storage system to the compute nodes, and the results are copied back afterward. On a large cluster system, many applications spend hours or more just moving data around.

An emerging standard: object-based storage

For many cluster computing users, object-based storage technology is emerging as the foundation for building large-scale storage systems. It leverages existing processing, networking, and storage components to achieve unprecedented scalability and throughput in a simple, convenient way.

At the heart of this architecture is the object: the basic container that holds application data along with an extensible set of storage attributes. Traditional files are decomposed into a series of storage objects and distributed across one or more "smart disks", known as object-based storage devices (OSDs). Each OSD has local processing power, local memory for caching data and attributes, and its own network connection. The OSDs form the core of the distributed storage architecture, taking over many storage-allocation tasks traditionally handled in the file system layer and thereby removing a bottleneck of current storage systems.

Object attributes include security information and usage statistics, which are used for authenticated access, quality-of-service control, and the dynamic data placement needed to balance load across OSDs. Object storage uses a scalable structure similar to that of a cluster computing system: as storage capacity grows, the model keeps network bandwidth and processing power growing in step, ensuring the scalability of the whole system.

A joint technical team of the Storage Networking Industry Association (SNIA) and the T10 standards technical committee is developing an OSD standard. The standard includes a command set for the iSCSI protocol that adds object extensions to the original SCSI command set. In parallel, the development of the object specification and command set is giving rise to a new class of intelligent storage devices that can be integrated into IP-based, high-performance, large-scale parallel storage environments. Many of the industry's leading storage companies are involved in this effort, including EMC, HP, IBM, Intel, Seagate, and VERITAS Software.

Implementation of shared storage

The object storage architecture provides the foundation for a new generation of networked storage systems. In emerging implementations it is combined with a scalable metadata management layer that presents a file system interface to applications. This layer manages information such as directory membership and file ownership and permissions. It also links the storage objects spread across the OSDs (each storage object being part of a file) into a single file and ensures that the data is reliable and available. A cluster node sends a request to this layer, for example to open or close a file, and after authentication receives the information it needs to access the OSDs; from then on the node can read and write the file directly, without involving the metadata management layer.

When the object storage architecture is implemented as part of a scalable clustered file system, it can deliver high aggregate bandwidth to hundreds of clients. In short, object storage technology can provide cost-effective shared storage for high-performance Linux cluster systems.
