With big data growing in data centers and cloud environments, managing networks that transmit millions of records at the same time has become an unprecedented problem.
This is not just a matter of data size. In big data networking, the sheer volume of data is far from negligible, and neither is the workload. Big data environments cannot simply run on yesterday's data infrastructure. Given the complexity and speed of big data applications, big data needs solutions tailored to it.
Traditional data analysis architectures assume a limited set of data sources and plenty of time to store the data in the right tables in the right database. For networks and applications on the scale of Twitter, Facebook, and Google, that conventional database architecture is like screwing a single light bulb into a nuclear reactor.
To overcome the obstacle of handling high-volume data in a short time, big data users have developed two different approaches. The first is to deploy large-scale real-time databases such as BigTable, OpenDremel, MongoDB, or Cassandra. These databases share a set of common traits: they do not rely on a standardized query language (hence the label "NoSQL"), and they do not meet the ACID guarantees that a relational database requires of all its data.
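To make the first approach concrete, here is a minimal sketch of the NoSQL access style: records are written and queried as schemaless documents rather than as rows behind SQL. It assumes a local MongoDB instance and the pymongo driver; the database and collection names are illustrative, not taken from the article.

```python
# Minimal NoSQL sketch: document-oriented writes and reads with pymongo.
# Assumes MongoDB is running locally on the default port.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client["bigdata_demo"]["click_events"]   # hypothetical database/collection

# Writes need no predeclared table or schema; each document can vary in shape.
events.insert_one({
    "user": "u42",
    "action": "click",
    "ts": datetime.now(timezone.utc),
})

# Queries are expressed as documents too, not as SQL strings.
for doc in events.find({"action": "click"}).limit(5):
    print(doc["user"], doc["ts"])
```

The trade-off is visible even in this toy example: there is no join, no enforced schema, and no multi-row transaction, which is exactly what these systems give up in exchange for horizontal scale and write speed.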
The second approach is to use analytical systems such as Hadoop to classify large volumes of data and reach the same goal.
This means the focus of the network and its surrounding infrastructure shifts from optimized storage to optimized retrieval. The shift is also necessary because storage itself is greatly simplified in a typical big data environment: the emphasis falls on classifying the data into useful datasets and then analyzing them to draw deeper conclusions.
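As a rough illustration of this classify-then-analyze pattern, the sketch below is a single Python script usable with Hadoop Streaming: the mapper tags each record with a category and the reducer counts records per category. The input layout (tab-separated lines with the category in the third field) is an assumption for the example, not something specified in the article.

```python
# classify.py -- a minimal Hadoop Streaming sketch of "classify, then analyze".
# Run as the mapper with the argument "map", otherwise it acts as the reducer.
import sys

def mapper():
    # stdin: tab-separated log lines; the third field is assumed to be the category.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2:
            print(f"{fields[2]}\t1")

def reducer():
    # stdin: "category\t1" lines, already grouped and sorted by key by Hadoop.
    current, count = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = key, 0
        count += int(value or 1)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

Submitted through the Hadoop Streaming jar with this script as both mapper and reducer, the heavy lifting (splitting the input, shuffling by key, re-running failed tasks) is done by the framework; the network carries the shuffle traffic, which is why retrieval rather than storage becomes the bottleneck.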
Unfortunately, this basic approach only goes so far in an ordinary big data network. In a 20,000-square-foot data center there are many ways to deploy these data solutions, and each comes with an inherent problem that must be addressed. Hadoop, for example, relies on a NameNode architecture that presents a single point of failure for big data managers handling very sensitive data. If the NameNode becomes unreachable on the network, the entire Hadoop cluster is paralyzed, which puts heavy pressure on network administrators to keep that one special server running.
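One modest way to live with that pressure is to watch the NameNode from the network side. The sketch below polls the NameNode's JMX endpoint over HTTP and raises an alert when it stops answering; the hostname is hypothetical, and the port is an assumption (9870 on Hadoop 3.x, 50070 on older releases).

```python
# Minimal NameNode watchdog sketch: poll the JMX endpoint and alert when it
# becomes unreachable. Host, port, and poll interval are illustrative assumptions.
import json
import time
import urllib.request

NAMENODE_JMX = ("http://namenode.example.com:9870/jmx"
                "?qry=Hadoop:service=NameNode,name=NameNodeStatus")

def namenode_state(url: str = NAMENODE_JMX, timeout: float = 5.0) -> str:
    """Return the reported NameNode state ('active', 'standby', ...) or 'unreachable'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            beans = json.load(resp).get("beans", [])
            return beans[0].get("State", "unknown") if beans else "unknown"
    except OSError:
        return "unreachable"

if __name__ == "__main__":
    while True:
        state = namenode_state()
        print(f"NameNode state: {state}")
        if state == "unreachable":
            print("ALERT: NameNode down -- the entire HDFS namespace is unavailable")
        time.sleep(30)
```

Monitoring does not remove the single point of failure, of course; it only shortens the time between the NameNode going dark and someone scrambling to restore it.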
Of course, there are solutions outside the network itself. DataStax's Brisk, for example, builds a bridge between the real-time performance of Apache Cassandra and the analytical capabilities of Hadoop. Brisk replaces Hadoop's file system with Cassandra, which means there is no longer a single point of failure.
Big Data and Network Architecture
Both options are just the tip of the iceberg of possible big data architectures, and the network architectures behind them already differ greatly. So how do network managers cope with the ever-growing volumes of data arriving every day?
Solutions such as OpenFlow can help. OpenFlow is a network infrastructure protocol stewarded by the Open Networking Foundation, an organization that exists to advance the protocol around the concept of software-defined networking.
Software-defined networking is designed to solve exactly this kind of problem: rather than building one monolithic, one-size-fits-all network and forcing every application to live with it, the application itself can define the network topology it needs. By simplifying hardware and network management, OpenFlow helps administrators configure their networks according to software-defined rules, reducing the management cost of big data networks.
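To show what "the application defines the topology" can look like in code, here is a minimal OpenFlow 1.3 controller sketch using the Ryu framework (one of several controllers; it is not mentioned in the article). It only learns MAC addresses and installs forwarding flows; a real big data deployment would encode its own topology and traffic policy in the same place.

```python
# learning_switch.py -- a minimal Ryu/OpenFlow 1.3 learning switch sketch.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, MAIN_DISPATCHER, set_ev_cls
from ryu.lib.packet import ethernet, packet
from ryu.ofproto import ofproto_v1_3

class LearningSwitch(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.mac_to_port = {}  # dpid -> {mac address: switch port}

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def features_handler(self, ev):
        dp = ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser
        # Table-miss entry: send unmatched packets to the controller.
        actions = [parser.OFPActionOutput(ofp.OFPP_CONTROLLER, ofp.OFPCML_NO_BUFFER)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=0,
                                      match=parser.OFPMatch(), instructions=inst))

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        msg = ev.msg
        dp = msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser

        eth = packet.Packet(msg.data).get_protocol(ethernet.ethernet)
        in_port = msg.match['in_port']
        ports = self.mac_to_port.setdefault(dp.id, {})
        ports[eth.src] = in_port  # learn where the sender lives

        out_port = ports.get(eth.dst, ofp.OFPP_FLOOD)
        actions = [parser.OFPActionOutput(out_port)]

        if out_port != ofp.OFPP_FLOOD:
            # Install a flow so future packets are switched without the controller.
            match = parser.OFPMatch(in_port=in_port, eth_dst=eth.dst)
            inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
            dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=1,
                                          match=match, instructions=inst))

        data = msg.data if msg.buffer_id == ofp.OFP_NO_BUFFER else None
        dp.send_msg(parser.OFPPacketOut(datapath=dp, buffer_id=msg.buffer_id,
                                        in_port=in_port, actions=actions, data=data))
```

Launched with ryu-manager against OpenFlow 1.3 switches (a Mininet topology works for testing), the forwarding behavior of the whole fabric lives in this one piece of software, which is the point of the software-defined approach.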
OpenFlow is a low-level standard, but vendors have already begun exploring how to layer their own software on top of it. For example, could a network management tool detect an abrupt mass shift in network traffic and packet workload, automatically adjust the configuration to compensate, and return to "normal" mode once the workload completes? If that approach becomes widely available, OpenFlow will effectively deliver the "cloud network": network configuration on demand.
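No such product is named in the article, but a hedged sketch of that idea is straightforward on top of OpenFlow's statistics messages: poll per-flow byte counters and flag sudden shifts so configuration could then be adapted. It again uses Ryu; the poll interval, spike threshold, and the "react" step are all illustrative assumptions.

```python
# traffic_watcher.py -- sketch of a traffic-aware OpenFlow monitor (Ryu, OF 1.3).
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.lib import hub
from ryu.ofproto import ofproto_v1_3

POLL_SECONDS = 10
SPIKE_BYTES = 500 * 1024 * 1024  # assumed threshold: 500 MB per poll interval

class TrafficWatcher(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.datapaths = {}
        self.last_bytes = {}                       # (dpid, match) -> byte count
        self.monitor = hub.spawn(self._poll_loop)  # background polling thread

    @set_ev_cls(ofp_event.EventOFPStateChange, [MAIN_DISPATCHER])
    def _state_change(self, ev):
        # Remember switches as they connect.
        self.datapaths[ev.datapath.id] = ev.datapath

    def _poll_loop(self):
        while True:
            for dp in self.datapaths.values():
                dp.send_msg(dp.ofproto_parser.OFPFlowStatsRequest(dp))
            hub.sleep(POLL_SECONDS)

    @set_ev_cls(ofp_event.EventOFPFlowStatsReply, MAIN_DISPATCHER)
    def _stats_reply(self, ev):
        dpid = ev.msg.datapath.id
        for stat in ev.msg.body:
            key = (dpid, str(stat.match))
            delta = stat.byte_count - self.last_bytes.get(key, stat.byte_count)
            self.last_bytes[key] = stat.byte_count
            if delta > SPIKE_BYTES:
                # A real tool would push new flow rules or QoS settings here;
                # this sketch only logs the spike.
                self.logger.info("traffic spike on %s: %d bytes in %ds",
                                 key, delta, POLL_SECONDS)
```

The missing piece, pushing compensating flow rules and later rolling them back, is exactly the vendor opportunity the paragraph above describes.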
This approach matters. Switches and routers in a standard topology simply cannot deliver the bandwidth we are discussing here. The network itself is becoming an integral part of big data solutions, and platform-level offerings, such as those built around Cisco Systems' IOS product line, are becoming more common. In the face of this complexity and data scale, flexible fiber-optic interconnects are rapidly becoming the new favorite of network architects.
OpenFlow-based solutions will help network administrators automatically adjust the size and shape of the network's fiber fabric as needed, accommodating traffic in ways that would have seemed out of reach only a few years ago.
This is a shift network managers must adapt to. Large-scale cloud computing (public, private, or hybrid) and big data applications will seep into every enterprise's application environment in the near future.