Choosing new hardware, storage, and other data center infrastructure for big data is a new challenge for IT professionals.
Big data is unstructured information of unprecedented scale and variety. It includes video, images, and semi-structured data such as the e-mail and text that are common on the web. As sensor-based and mobile devices monitor the web and emit ever more output, the volume of available data continues to grow exponentially.
The pressure to implement a big data strategy often comes from the top, because executives believe that companies that can use data effectively will hold an advantage over those that lag behind. A big data strategy requires changes to the data center infrastructure in five main areas:
I. Hardware to support big data
Big data causes storage requirements to grow by 60% to 80% a year. Given this rapid growth and current cost constraints, IT buyers should choose the hardware that delivers scalability and storage speed most cost-effectively. Mainframe-like scale-up architectures are reappearing because they can grow cost-effectively and lower total cost of ownership (TCO). Similarly, solid-state drives (SSDs) and other flash-based storage improve performance more than traditional disks do.
Hardware appliances such as IBM Netezza and Oracle Exadata have proven effective in both scalability and performance. Consider using appliances to support critical data services, but make sure the appliance's architecture can deliver rapid performance upgrades in the future.
II. Choosing storage for big data
With a successful big data strategy, a company can integrate high-quality internal data with Hadoop data from multiple cloud vendors. Doing so also improves the quality of business-related data, allowing disparate data to be organized into big data resources that are consistent and timely.
Big data is changing the decision-making calculus between central data warehouses and loosely coupled data marts: a data mart is a much smaller repository that can replace a central data warehouse or serve as one of its data sources. As local offices and international subsidiaries multiply, central management needs high-quality data to maintain control and avoid fragmentation of authority while expanding lines of business.
New software technology takes over the heavy, storage-related processing work. Data virtualization software from vendors such as Composite Software (recently acquired by Cisco) and Denodo automatically discovers data sources and extracts data to enrich a global metadata repository, giving the entire organization a common, database-like view of all data, both internal and external. Master data management (MDM) software improves data quality by creating common master records, eliminating time-consuming data warehouse searches.
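To make the idea concrete, the minimal Python sketch below unifies two hypothetical sources, an external CSV export and an internal SQLite table, behind a single iterator-based view. It illustrates the data virtualization concept only; it is not the Composite or Denodo API, and the file, table, and function names are invented for the example.

# Minimal sketch of the data-virtualization idea: expose heterogeneous
# sources behind one query interface. Source names and schema are
# hypothetical; real products also handle metadata discovery,
# security, and query pushdown.
import csv
import sqlite3
from typing import Iterator

def rows_from_csv(path: str) -> Iterator[dict]:
    """Read an external CSV source as a stream of dicts."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def rows_from_sqlite(db_path: str, table: str) -> Iterator[dict]:
    """Read an internal relational source as a stream of dicts."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        for row in conn.execute(f"SELECT * FROM {table}"):
            yield dict(row)
    finally:
        conn.close()

def virtual_view(*sources: Iterator[dict]) -> Iterator[dict]:
    """Union several sources into one logical view, so consumers need
    not know where the data physically lives."""
    for source in sources:
        yield from source

if __name__ == "__main__":
    # Hypothetical sources: a cloud export and an internal CRM table.
    customers = virtual_view(
        rows_from_csv("cloud_export.csv"),
        rows_from_sqlite("crm.db", "customers"),
    )
    for record in customers:
        print(record)

The point of the design is that downstream users query one logical collection, while the mapping to physical sources stays in a single, replaceable layer.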
Enterprise demand for external web data deepens the reliance on public and hybrid clouds. Many large enterprises find that they need big data from multiple cloud vendors, but they cannot expect the cloud providers to consolidate that data for them. Instead, enterprises must turn to tools from data virtualization vendors to consolidate big data across multiple clouds.
III. A storage tiering strategy using SSDs
Storage is expensive, and faster storage is more expensive still. Most importantly, big data requires storage that provides both large capacity and high performance. Storage tiering offers a range of cost/performance options within a storage pool, from expensive high-performance solid-state storage down to traditional serial-attached SCSI (SAS) disk, which lowers total cost of ownership. Adding a solid-state tier between main memory and disk helps keep big data workloads fast without letting storage costs get out of control.
SSD capacity should follow the "90-10" rule of storage tiering: the best combination of cost and speed comes from using roughly 10% SSD and 90% mechanical hard disk. This strategy lets IT organizations capture more than 90% of the potential performance gain for only about 10% additional cost. The same rule applies to the ratio of main memory to SSD capacity.
As the price/performance of SSDs improves relative to traditional disks (capacity up, price down), the mix of traditional disk and SSD in configurations is expected to move toward an 80-20 rule in the near future.
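As a rough illustration of how to weigh such a split, the short Python sketch below computes the blended cost and expected access latency of a 10% SSD / 90% HDD pool. Every price, latency, and hit-rate figure is an assumption chosen for the example (including SSD costing about twice as much per terabyte as HDD, which is what makes the "about 10% extra cost" arithmetic work out); real results depend entirely on current drive prices and how hot the working set actually is.

# Back-of-the-envelope model of the 90-10 tiering rule.
# All figures below are assumptions for illustration, not vendor data.
TOTAL_TB = 100                 # total capacity to provision
SSD_FRACTION = 0.10            # 10% SSD, 90% HDD
SSD_COST_PER_TB = 80.0         # assumed $/TB (about 2x HDD)
HDD_COST_PER_TB = 40.0         # assumed $/TB
SSD_LATENCY_MS = 0.1           # assumed average access latency
HDD_LATENCY_MS = 8.0           # assumed average access latency
HIT_RATE = 0.90                # assume 90% of I/O lands on the hot 10%

ssd_tb = TOTAL_TB * SSD_FRACTION
hdd_tb = TOTAL_TB - ssd_tb
tiered_cost = ssd_tb * SSD_COST_PER_TB + hdd_tb * HDD_COST_PER_TB
hdd_only_cost = TOTAL_TB * HDD_COST_PER_TB

tiered_latency = HIT_RATE * SSD_LATENCY_MS + (1 - HIT_RATE) * HDD_LATENCY_MS
gain_captured = ((HDD_LATENCY_MS - tiered_latency)
                 / (HDD_LATENCY_MS - SSD_LATENCY_MS))

print(f"extra cost over HDD-only: {tiered_cost / hdd_only_cost - 1:.0%}")
print(f"avg latency: {tiered_latency:.2f} ms vs {HDD_LATENCY_MS:.1f} ms HDD-only")
print(f"share of the all-SSD latency gain captured: {gain_captured:.0%}")

Under these assumed numbers the tiered pool costs about 10% more than disk alone while capturing roughly 90% of the latency improvement an all-SSD pool would give, which is the intuition behind the 90-10 rule.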
The latest columnar and in-memory database capabilities, such as IBM BLU Acceleration, can use SSDs to achieve far better performance than traditional disk, and their designs are built to exploit the strengths of "flat" media such as SSDs.
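As a rough illustration of why columnar layouts pair well with fast, seek-free storage, the plain-Python sketch below (a conceptual toy, not how BLU Acceleration is implemented) compares scanning one attribute from a row-oriented layout versus a column-oriented one; the column scan touches only the values it needs.

# Toy comparison of row-oriented vs column-oriented scans.
# Conceptual sketch only; data and sizes are made up for the example.
import time

N = 1_000_000

# Row-oriented: each record stores all attributes together.
rows = [{"id": i, "region": "EU", "amount": i % 100} for i in range(N)]

# Column-oriented: each attribute is stored contiguously on its own.
amount_column = [i % 100 for i in range(N)]

start = time.perf_counter()
row_total = sum(r["amount"] for r in rows)   # must walk whole records
row_time = time.perf_counter() - start

start = time.perf_counter()
col_total = sum(amount_column)               # touches only one column
col_time = time.perf_counter() - start

assert row_total == col_total
print(f"row scan: {row_time:.3f}s  column scan: {col_time:.3f}s")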
IV. Big data analysis and reporting capabilities
Embedded analytics tools can already improve business processes through reporting and automated optimization, but big data changes the analysis rules again. For example, instead of the traditional one-time behavioral analysis of an individual customer, a big data strategy can maintain an iterative, insight-driven analysis thread for each customer, allowing the company to track customers over time and better sustain long-term relationships with all of them.
The typical big data analyst is called a data scientist, and data scientists are more likely to work for the CMO (chief marketing officer) than for IT executives. Even so, IT professionals must understand how their company's big data strategy affects the work of data scientists.
This means a third consideration must be added alongside automated reporting and embedded analytics: ad hoc, loosely coupled analysis. Support for ad hoc query, analysis, and statistical tools is a necessary software prerequisite. Many traditional IT vendors and cloud vendors, such as IBM Cognos and Birst, are expanding these capabilities.
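A minimal sketch of what such ad hoc, per-customer analysis can look like in practice follows, assuming a hypothetical event log; the file name, column names, and use of pandas are illustrative and not tied to any vendor product named above.

# Ad hoc analysis sketch: per-customer metrics from a raw event log.
# The file name and columns are hypothetical placeholders.
import pandas as pd

# One row per customer interaction.
events = pd.read_csv("customer_events.csv", parse_dates=["timestamp"])

# Per-customer summary: order count, total spend, and recency.
summary = (events
           .groupby("customer_id")
           .agg(orders=("order_id", "nunique"),
                total_spend=("amount", "sum"),
                last_seen=("timestamp", "max")))

# Ad hoc question of the day: which long-standing customers went quiet?
cutoff = events["timestamp"].max() - pd.Timedelta(days=90)
at_risk = summary[(summary["orders"] >= 5) & (summary["last_seen"] < cutoff)]
print(at_risk.sort_values("total_spend", ascending=False).head(10))

Unlike a fixed report, the final filter is the kind of one-off question a data scientist poses and revises iteratively, which is why loosely coupled tools matter alongside embedded analytics.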
V. Hadoop in the Enterprise
Hadoop provides a distributed file system (HDFS) coupled with the MapReduce processing framework for data-intensive applications. The file system scales out in parallel to handle rich text data, such as social media data.
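To ground the MapReduce model in something concrete, here is the classic word-count job written as two small Python scripts for Hadoop Streaming; this is a minimal sketch rather than a production job, and the input data set is hypothetical.

#!/usr/bin/env python3
# mapper.py -- emit "word<TAB>1" for every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")

#!/usr/bin/env python3
# reducer.py -- sum the counts for each word; Hadoop delivers the
# mapper output to the reducer sorted by key.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rsplit("\t", 1)
    count = int(count)
    if word == current_word:
        current_count += count
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, count
if current_word is not None:
    print(f"{current_word}\t{current_count}")

On a cluster, the pair would be submitted through the Hadoop Streaming jar, passing mapper.py and reducer.py via the -mapper and -reducer options, with an HDFS directory of raw text (for example, exported social media posts) as the input.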
Many IT organizations address the problem of acquiring Hadoop data sources from the web by building their own Hadoop deployments inside the enterprise. Lack of expertise is a challenge, however: IT managers who are proficient in this still-evolving web data management framework are rare, and so are IT executives who understand it.
Organizations developing their own data management tools should be aware that major vendors such as IBM, Oracle, and EMC often provide proprietary products or custom development for accessing Hadoop data, so IT shops do not need specialized in-house extraction work to reach the data they need. If you decide to build your own data platform, these vendors also offer integration services that make Hadoop more efficient and better aligned with existing IT resources.
Big data decisions differ for every company. Keep in mind that as the technology around big data evolves, big data strategies should be adjusted promptly to keep pace with the times.