Information systems store information on storage devices in the form of data. Data can be stored in three ways: online, nearline, and offline.
Online storage keeps data on disk storage devices managed directly by the host's own file system. Because it uses the system's underlying I/O technology, data can be accessed and changed in real time, and performance is high enough to meet the application's I/O requirements.
Nearline storage keeps data on disk storage devices managed directly by the file system of another host. This method usually relies on software and a network to store data across different systems in different locations and to migrate it as needed. Its advantage is that the data still resides on a system that is powered on and running, which ensures good transfer performance for data storage and migration.
Offline storage keeps data on tape devices that can be detached from the system at any time while the system is running. Its defining feature is its reliance on tape technology. The advantage is that you can obtain a copy of the data that is separate from the running system and easy to keep at a remote location. Used together, these three methods give users a comprehensive data storage and management solution.
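The three storage modes can be summarized as a small data model. The following Python sketch is only an illustration of the distinctions described above; the names `StorageTier`, `TIERS`, and `tiers_for_offsite_copy` are hypothetical and do not come from any product.

```python
from dataclasses import dataclass
from enum import Enum


class Medium(Enum):
    DISK = "disk"
    TAPE = "tape"


@dataclass(frozen=True)
class StorageTier:
    """Hypothetical description of one storage mode."""
    name: str
    medium: Medium
    managed_by_local_host: bool      # device is managed by the application host's file system
    detachable_while_running: bool   # media can be separated from the running system


# Illustrative encoding of the three modes as the text describes them.
TIERS = {
    "online": StorageTier("online", Medium.DISK, True, False),
    "nearline": StorageTier("nearline", Medium.DISK, False, False),
    "offline": StorageTier("offline", Medium.TAPE, False, True),
}


def tiers_for_offsite_copy():
    """Return the names of tiers whose media can be carried to a remote location."""
    return [tier.name for tier in TIERS.values() if tier.detachable_while_running]
```

Under these assumptions, only the offline tier qualifies for an off-site copy, which matches the role the text gives to tape.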
Store and manage
The core purpose of informatization is to use information technology to provide better production and service methods and to improve an organization's competitiveness. A complete information system usually consists of multiple subsystems, such as OA, email, proxy, DNS, DHCP, and various application subsystems. All of these systems must be able to run 7x24, which is the basic requirement of informatization and calls for a complete online data storage solution. Designing a unified online data storage system for the entire network simplifies management, reduces investment, and lays the foundation for data management.
A data management solution safeguards the continued vitality of the system. It is also a basic measure for reusing data, realizing its value-added potential, and strengthening competitiveness. Because data management works on the data held in online storage, designing a complete data management solution starts with designing a complete online data storage solution.
As storage technology has developed, three trends have emerged: independence, centralization, and networking. SCSI technology made storage independent, separating it from the host system into a device of its own. Fibre Channel technology then gave rise to FC switches, FC adapter cards, and FC disk arrays, which let users build a unified online data storage system behind the information center, independent of the enterprise's local network. This is the storage area network (FC-SAN).
iSCSI has become a strong competitor to Fibre Channel, because it lets users build the data transmission system and the data storage system (IP-SAN) on the same Ethernet network. In information planning, you can choose the most appropriate of these three technologies to form the online data storage system.
Management, customized as needed
Data management is different from storage management. The objects of storage management are storage devices (or storage resources). Its main tasks include monitoring storage device status, expanding and adjusting storage space online, and managing and allocating storage space in a unified way, all in order to provide stable and reliable storage space for hosts and applications.
The objects of data management are the data stored in the online storage system. Its main tasks include obtaining copies of data by various methods to achieve data security and high availability at different levels, migrating data between storage devices, and managing data content. Storage management targets the online storage system, while data management also draws on nearline and offline storage.
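The storage management side of this distinction (unified allocation and online expansion of storage space) can be sketched in a few lines of Python. This is a minimal illustration, not a real volume manager; the class `StoragePool`, its methods, and the host names in the usage example are all hypothetical.

```python
class StoragePool:
    """Hypothetical pool that hands out storage space to hosts."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.allocations = {}  # host name -> allocated GB

    def free_gb(self):
        """Space that has not yet been allocated to any host."""
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, host, size_gb):
        """Give a host more space, refusing requests the pool cannot satisfy."""
        if size_gb > self.free_gb():
            raise ValueError("not enough free space in the pool")
        self.allocations[host] = self.allocations.get(host, 0) + size_gb

    def expand(self, extra_gb):
        """Grow the pool while existing allocations stay untouched."""
        self.capacity_gb += extra_gb
```

For example, a 100 GB pool that has given 60 GB to `db-host` rejects a 50 GB request from `mail-host` until `expand(50)` is called. Data management (copies, migration, content) would operate on what the hosts store inside those allocations.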
The purpose of a data management solution must be made clear first. Data management aims to:
Ensure the continuity of the system
Improve storage resource utilization (or save storage costs)
Share or reuse data (to add value)
Data management depends on the online storage system, so the online storage mode must be taken into account when designing a data management solution. Once the storage mode and the purpose of data management are clear, you can choose suitable methods from among high-availability clusters, backup, replication, disaster tolerance, migration, and content management to achieve the desired result.
A high-availability cluster uses special software to connect two or more identical hosts to the same data on a disk array and to virtualize those hosts into a single application system. Internally, load can be distributed across the hosts for load balancing. Alternatively, a primary and a standby system can be designated so that when the primary goes down, the standby takes over the application and keeps it running, which gives the application high availability. High-availability clusters effectively ensure system continuity and are especially suitable for relational databases such as SQL Server, Oracle, Sybase, Informix, MySQL, and DB2.
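The primary/standby takeover can be illustrated with a heartbeat check. The Python sketch below is a simplified model, not how any particular cluster product works; `Node`, `FailoverPair`, and the timeout value are assumptions made for illustration, and timestamps are passed in explicitly so the logic is easy to follow.

```python
class Node:
    """Hypothetical cluster host that reports heartbeats."""

    def __init__(self, name):
        self.name = name
        self.last_heartbeat = None

    def beat(self, now):
        self.last_heartbeat = now


class FailoverPair:
    """Hypothetical primary/standby pair sharing the same data."""

    def __init__(self, primary, standby, timeout_s=5.0):
        self.active = primary
        self.standby = standby
        self.timeout_s = timeout_s

    def _alive(self, node, now):
        hb = node.last_heartbeat
        return hb is not None and (now - hb) <= self.timeout_s

    def check(self, now):
        """Swap roles if the active node is silent and the standby is healthy."""
        if not self._alive(self.active, now) and self._alive(self.standby, now):
            self.active, self.standby = self.standby, self.active
        return self.active.name
```

If node `a` stops sending heartbeats while node `b` keeps sending them, the next `check` returns `b`, and the application would be restarted there against the shared disk array. Real cluster software also has to handle fencing and split-brain, which this sketch leaves out.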
Backup means creating copies of data in some way so that the data can be restored if the source is damaged. There are two kinds of backup, nearline and offline; the main difference is whether the backup device is a disk device or a tape device. Depending on scale and storage mode, backup can run in standalone, network, server-free, or LAN-free mode. Standalone backup suits only a single application system. Network backup suits multiple application systems on the same network. In SAN storage environments, server-free and LAN-free backup are more efficient.
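The basic backup and restore cycle can be sketched as follows. This is a minimal single-file illustration in Python using only the standard library; the function names `backup` and `restore` and the directory layout are hypothetical, and a real backup system would add scheduling, cataloging, and retention.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(source, backup_dir):
    """Copy source into backup_dir and verify that the copy matches."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy_path = backup_dir / Path(source).name
    shutil.copy2(source, copy_path)
    if sha256_of(source) != sha256_of(copy_path):
        raise IOError("backup copy does not match the source")
    return copy_path


def restore(copy_path, target):
    """Overwrite a damaged target with the backup copy."""
    shutil.copy2(copy_path, target)
    return Path(target)
```

Here `backup_dir` stands in for a nearline disk device; an offline backup would write to tape instead, which this sketch does not model.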
Replication means copying data from the system's primary disk to other systems. It can be synchronous or asynchronous. With the right hardware and software, data can be replicated over a WAN as well as a LAN. Combining data replication software with nearline storage yields a high-performance backup solution: compared with tape backup, it backs up data in real time as it is updated, and the data can be fully restored shortly after the source is lost. Data replication software can also be combined with high-availability software to achieve disaster tolerance.
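The difference between synchronous and asynchronous replication comes down to when the replica receives a write. The Python sketch below models this in memory; `Primary`, `Replica`, and the `flush` method are hypothetical names, and a real replication product would send the writes over a LAN or WAN link.

```python
from collections import deque


class Replica:
    """Hypothetical remote copy of the primary's data."""

    def __init__(self):
        self.blocks = {}

    def apply(self, key, value):
        self.blocks[key] = value


class Primary:
    """Hypothetical primary disk that forwards writes to a replica."""

    def __init__(self, replica, synchronous):
        self.blocks = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # writes not yet shipped (asynchronous mode only)

    def write(self, key, value):
        self.blocks[key] = value
        if self.synchronous:
            # The write returns only after the replica has applied it.
            self.replica.apply(key, value)
        else:
            # The write returns at once; the replica catches up later.
            self.pending.append((key, value))

    def flush(self):
        """Ship queued writes to the replica in their original order."""
        while self.pending:
            self.replica.apply(*self.pending.popleft())

    def lag(self):
        """Number of writes the replica would lose if the primary failed now."""
        return len(self.pending)
```

In synchronous mode the replica is never behind, at the cost of waiting on every write. In asynchronous mode writes complete faster, but anything still in `pending` is lost if the primary fails before `flush` runs.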
Disaster tolerance means establishing a backup system at a remote site away from the main application system. Data replication software synchronizes data to the backup system, and high-availability cluster software monitors the running status of the main system. If the main system goes down because of a disaster, the backup system takes over its work so that the system stays available online. Disaster tolerance brings high reliability, but the investment it requires is relatively large.
Migration places high-speed, high-capacity storage devices (such as non-online large-capacity tape libraries or online disk devices) one level below the primary disk devices (disk arrays) and automatically moves data that is not commonly used from the primary disk to this secondary storage according to a specified policy. When the data is needed, it is automatically transferred back to the primary disk. Data migration lets you keep a large amount of infrequently accessed data on offline or nearline devices and only a small amount of frequently accessed data on the primary disk, which improves storage resource utilization and greatly reduces device and management costs.
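A policy-driven migration with automatic recall can be sketched as follows. This Python model is an illustration under simple assumptions: the policy is "idle longer than a threshold", both tiers are in-memory dictionaries, and the names `TieredStore`, `migrate`, and `max_idle_s` are hypothetical.

```python
class TieredStore:
    """Hypothetical two-level store: primary disk plus secondary storage."""

    def __init__(self, max_idle_s):
        self.primary = {}
        self.secondary = {}
        self.last_access = {}
        self.max_idle_s = max_idle_s  # assumed policy: idle time threshold

    def put(self, name, data, now):
        self.primary[name] = data
        self.last_access[name] = now

    def migrate(self, now):
        """Move data idle longer than max_idle_s to secondary storage."""
        moved = [n for n in self.primary
                 if now - self.last_access[n] > self.max_idle_s]
        for n in moved:
            self.secondary[n] = self.primary.pop(n)
        return moved

    def get(self, name, now):
        """Read data, recalling it to the primary disk first if it was migrated."""
        if name not in self.primary:
            # Raises KeyError if the name exists on neither level.
            self.primary[name] = self.secondary.pop(name)
        self.last_access[name] = now
        return self.primary[name]
```

The caller never has to know which level holds the data: `get` recalls it transparently, which is the behavior the text describes as automatic transfer back to the primary disk.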
Data migration technology is typically suited to PACS systems in the medical industry; HPC and HPS systems in meteorology, seismology, and hydrology; the communications and media, patent, insurance, library, banking, accounting, and records management sectors; and industrial design and marketing.
Content management is an emerging technology in data management. Traditional data management relies on relational databases, which handle only structured data, yet most information, such as files, reports, videos, audio, photos, faxes, and emails, is unstructured. Managing such information has become a hard problem in data management, and content management technology arose in response. Content management should cover the collection, management, use, transmission, and value-adding of both structured and unstructured digital resources.
The same management method can take many forms in practice, but the underlying principle stays the same. In information construction, consider the situation of each subsystem, including factors such as the criticality of the application, the number of system nodes, data types and read/write patterns, data volume, and cross-platform and cross-network requirements. Keep the core purpose of data management firmly in view, customize as needed, and choose one or more management methods to reach the ideal state of data management.