Big data is a popular term these days, and it covers many areas of the industry. In business analytics, big data often means using information from customers, sales forecasts, suppliers, and many other inputs to make the best short-term and long-term business decisions. Commodity traders may use big data in completely different ways. They might analyze climate data, which requires examining satellite imagery and related text reports to decide which trades suit the long term and which suit the short term. To be effective, these cases require completely different sets of analysis tools, different types of computing and storage environments, and very different algorithms for turning data into information.
Recently, Jeff Layton and I met for dinner to discuss many different types of algorithms, from graph analysis to MapReduce to image change detection, along with other frameworks such as NoSQL and the system architectures that run these algorithms efficiently. And, of course, there is specialized hardware from many vendors, as well as from companies that are about to become vendors. For me, then, big data is the process of turning data into information, and information into knowledge.
This is not a new idea, and the famous quote is not mine: about 400 years ago, Francis Bacon said, "Knowledge is power." Jeff and I believe that as we extract more and more information and knowledge from data, system architectures will change a great deal. Unless information is extracted from it, a file is just static data.
Jeff and I discussed how to approach the problem, which types of data matter, and how to move into the new computing era. Over dinner, we arrived at two different approaches to big data: top-down and bottom-up. We also discussed how the data itself, and the way information is extracted from it, depends on the type of data, where it falls on the spectrum, and what hardware is needed to analyze it. Of course, we also talked about the operating systems, file systems, and other system software that big data architectures require. With our editors' approval, Jeff and I are ready to launch "Jeff and Henry's Big Data Expedition."
I will start with the hardware and architecture that big data algorithms need. The questions include:
· What kind of architecture will be needed to solve future MapReduce problems, graph problems, or image change detection?
· Do you need SSDs, SAS drives, or enterprise SATA drives?
· What kind of storage controller is required?
· What are the key data archiving issues?
· What kind of interface will be needed in the future: SAS, Fibre Channel, Ethernet, or something else?
· Will planned CPUs meet demand, or will you need GPGPUs, FPGAs, or something less common?
· What about memory requirements? Will planned DDR3, DDR4, and DDR5 memory meet them?
· Do you need storage tiers and larger memory? For example, a machine that extends the CPU interconnect, such as the SGI UV, or specialized memory systems and processors, such as the Cray Urika?
· Does the CPU architecture need cache coherency, and is cache-coherency bandwidth useful for the type of data analysis you need?
· Can the operating system effectively address the underlying hardware devices?
· What about the languages, compilers, debuggers, and the rest of the ecosystem needed to run on the system hardware?
· Don't forget data security. Because today's data becomes information and new knowledge, how do you protect it from competitors, adversaries, and employees who should not have access?
Maybe you want some users to see certain things while other users can see only anonymized data. Hospital patient data is a prime example: no one except your doctor is allowed to see your actual medical records, but a research team may need to look at conditions, treatment options, and outcomes. Security will be a huge problem, for example when information is created and stored in a separate location. Whether it is private data or company secrets, it is a temptation for hackers. Not everyone should be able to see everything, and every access should be tracked, for example with an audit trail.
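The access rules described above (full records for the treating doctor, anonymized views for a research team, and an audit trail of every access) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: the role names `doctor` and `researcher`, the `PatientRecord` fields, and the salted-hash pseudonym are all hypothetical. A salted hash is pseudonymization rather than true anonymization, and a real system would keep the salt secret and store the audit log somewhere tamper-resistant.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical record layout; real medical records are far richer.
@dataclass(frozen=True)
class PatientRecord:
    patient_name: str
    condition: str
    treatment: str
    outcome: str

# In-memory audit trail of (UTC timestamp, user, role, decision).
# A real system would write to append-only, tamper-evident storage.
AUDIT_LOG: list[tuple[str, str, str, str]] = []

def pseudonym(name: str, salt: str = "hypothetical-secret-salt") -> str:
    """Replace a name with a stable, hard-to-reverse token (pseudonymization)."""
    return hashlib.sha256((salt + name).encode("utf-8")).hexdigest()[:12]

def view_record(user: str, role: str, record: PatientRecord) -> dict[str, str]:
    """Return the view of `record` that `role` may see, logging every attempt."""
    allowed = role in ("doctor", "researcher")
    AUDIT_LOG.append((datetime.now(timezone.utc).isoformat(), user, role,
                      "granted" if allowed else "denied"))
    if not allowed:
        raise PermissionError(f"role {role!r} may not view patient records")
    # Both roles see the clinical fields; only the doctor sees the real identity.
    if role == "doctor":
        patient = record.patient_name
    else:
        patient = pseudonym(record.patient_name)
    return {
        "patient": patient,
        "condition": record.condition,
        "treatment": record.treatment,
        "outcome": record.outcome,
    }
```

A denied request still leaves an entry in `AUDIT_LOG`. That is the point of an audit trail: you can review who tried to see what, not only who succeeded.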
Questions about applications include:
· What about applications that need to run on these systems?
· Do some queries take precedence over others?
· How should applications write data so that it can be read and processed efficiently later?
· How many threads does the application need? Does it need a parallel programming model, and if so, which one? Or does it require an SMP model?
Can applications take shortcuts? For example, can you get a 90% answer for 50% of the computational cost? Is a 90% answer good enough if it arrives within the required time window? Or, if you are making a life-and-death decision, is a 90% answer simply not good enough?
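One common way to trade accuracy for computation is to analyze a random sample instead of the whole data set. The sketch below is an illustrative assumption, not a method proposed in this article: it estimates a mean from a fraction of the data and reports a standard error, so you can judge whether the cheaper answer is close enough for your time window or whether you need the full computation. The function name `approx_mean` and its `fraction` parameter are hypothetical.

```python
import math
import random
import statistics

def approx_mean(values: list[float], fraction: float,
                seed: int = 0) -> tuple[float, float, int]:
    """Estimate the mean of `values` from a random sample.

    Returns (estimate, standard_error, sample_size). Processing only
    `fraction` of the data cuts the work roughly in proportion.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(values) < 2:
        raise ValueError("need at least two values")
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    k = max(2, int(len(values) * fraction))  # stdev needs at least 2 points
    sample = rng.sample(values, k)
    estimate = statistics.fmean(sample)
    # Standard error of the mean; ignores the finite-population correction,
    # so it somewhat overstates the uncertainty when fraction is large.
    std_error = statistics.stdev(sample) / math.sqrt(k)
    return estimate, std_error, k

# Usage: look at 10% of the data and compare with the exact answer.
data = [float(x) for x in range(100_000)]
est, err, n = approx_mean(data, fraction=0.1)
print(f"sampled {n} of {len(data)} values: mean ~ {est:.1f} +/- {err:.1f}")
print(f"exact mean: {statistics.fmean(data):.1f}")
```

The error bar is what turns "a 90% answer" into something you can reason about: if the uncertainty is acceptable for the decision at hand, the sample is enough; if not, increase `fraction` or process everything.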
Fortunately, the service was slow and the food was delicious; otherwise, Jeff and I would not have had enough time to discuss all these issues.
Of course, we did not reach any conclusions. Jeff and I continued the discussion over the next few days and decided to make big data the subject of our second annual joint writing project.
How we will approach big data
Over the next few months, I will work my way up the stack through big data issues, starting with the hardware. As I have said many times, details matter (at least some of the time). Jeff will start at the other end and work down toward the middle of the stack. We will meet somewhere around the operating system or the compilers and libraries.
You might ask why a storage site is covering compilers, debuggers, and similar topics, and why you should read about them. Good question. The answer is that our world is going to shift from data-oriented processing to information-oriented processing. Everything is going to change, and we don't want our readers stuck with dinosaur-era methods. We believe this shift is the key to understanding how major changes will begin to take place. Storage is only part of the picture: to succeed, you need to understand not only storage but also the new operating environments and requirements.
This is not to say that we will be experts on everything, because no one is, and no one should even try to be. But it does mean that to succeed, you have to pay attention to and understand every aspect, including things I have not thought of and things unique to your situation, to prepare for what comes next. Big data is more than just cloud storage. Nor is it about archiving, backup, or other tactical issues. It is about taking the data you have and extracting information that will help your organization succeed.
(Editor: Lu Guang)