Two misunderstandings in the big data industry

Source: Internet
Author: User

Http://www.cognoschina.net/club/thread-68835-1-1.html

Http://www.cognoschina.net/club/thread-68837-1-1.html

Misunderstandings in the big data industry



The term "Big Data" is probably one of the hottest in the IT industry over the past two years. It comes up at every kind of meeting, and in IT it has become a worn-out buzzword that everyone repeats. If you don't talk about big data at every turn, you are embarrassed to tell people what you do. To some extent, the big data "circle" is a mess, no better than show business.

Conceptually, what is big data? In fact, data processing has existed since the dawn of humankind. Ancient people tying knots in cords to keep records were doing basic statistics, counting how many meals they had eaten and how many hunts they had gone on. The emperor choosing a consort's name tablet each night was data processing as well: before turning a tablet over, he had to weigh indicators such as "convenience", "heat", and "freshness" across many tablets. More recently, data warehouses had already been mature for decades before the term big data appeared. So big data is not new. Rather, technologies such as Hadoop, MR (MapReduce), Storm, and Spark have developed to a certain stage and fit these concepts, and they rest on one basic idea: "open source". Open source was never available at earlier stages, and it saves money and improves efficiency, which is why everyone has rushed into this industry (to put it bluntly, many people are just jumping on the bandwagon).


Misunderstanding 1: only people who develop big data technology are true "insiders"


I have attended several meetings where 70% of the participants were technical leads and data project managers from across China. The topics were all about problems when upgrading CDH versions, which approach works better for processing Hive jobs, how to improve efficiency when pairing Storm with Kafka, and how to release memory in Spark applications. The participants all shared one attitude: anyone who doesn't know big data technology is not qualified to comment on big data. You don't know resource configuration in Hadoop 2.0, or how to tune how long Spark keeps data resident in memory? If you don't know how to collect data with Kafka, don't come to this meeting! By the way, Google has recently abandoned MR and uses only Dataflow. Got that? If you don't, get lost!

What I want to say here is that technological progress is driven by the business. Is it only big data when "a certain Bao" (Taobao) removes IOE (IBM, Oracle, and EMC)? A deaf-mute masseur who uses a knotted-rope notebook to give full massages to people of different body types: isn't that big data analysis too? Only a small part of technological development is driven by scientists' pursuit of perfection. Mostly, the business grows to a certain point and technology must advance to reach its goals.


Therefore, the real big data "insiders" should include at least the following people:

1. Business operations personnel. For example, an Internet product manager asks a technician to calculate a user's mood index when the user arrives at the website; to monitor it dynamically, only Storm or Spark will do. Or a telecom operator wants real-time marketing: when users walk into a business office, a text message must be pushed to them immediately, telling them that a potential blind date is waiting at that office (with indicators such as height, measurements, and weight), but that they need to buy a 4G phone before they can meet. Or when a customer comes to a bank to open an account, the bank already knows that the customer has visited a hospital twice in the past week, traveled abroad three times, and taken the children swimming twice, so the account manager immediately recommends related insurance and wealth management products. These business people are often the core drivers of technological progress.

2. Architects. Architects matter a great deal. When a business person and an engineer discuss a problem, one in business language and the other in technical jargon, the engineer is often thinking about what code will shut the other side up at once, while the architect steps in and says, "No, that's not the way. Doing it like this solves one problem now and creates several later; my plan solves several future problems as well!" In a non-technology enterprise, more than 70% of the quality of the IT system usually rests in the architects' hands. Many outstanding architects grew step by step out of the ranks of engineers. Many enterprises have realized the importance of IT architecture, and many now have both CTO and CIO positions, which are equally important! When an IT system runs smoothly, nobody notices the beauty of its architecture; but once you have developed on top of a chaotic architecture full of siloed "chimney" systems, you will feel it!

3. Investors. The boss, needless to say: the boss feeds and clothes you, you sell your life to the boss, and the boss is a natural provider of basic data. The boss said let there be mountains, and there were mountains; the boss said do real-time data processing and analysis, and there was Storm; the boss said go open source, and Hadoop was ready; the boss also said do iterative mining, and there was Spark......

4. Scientists. To others they are geeks and giants, mysterious men and women who come and go at all hours, like Hawking. They are the core force driving technological progress worldwide. Apart from the world's top IT companies (which hold most of the world's leading technology), other companies generally need one or two scientists truly devoted to science. Don't make them think about business scenarios, business processes, costs, or project schedules; the only thing they need to think about is how to beat the competition on a particular metric. A 0.1% gain on a metric is enough to keep them fighting without sleep. Let's applaud and cheer for these scientists. In China, I think there are no more than a hundred big data scientists......

5. Engineers. Engineers are a lovable group of people: young, impulsive, and idealistic. They are also called "diaosi" (self-mocking "losers") and the "keyboard party", and they work tirelessly for their ideals. Each small step forward makes them wonder whether they can now afford one more 5-cent egg at the subway station. They are sensitive and proud, and they never argue with business personnel. The difference between engineers and scientists is that engineers must constantly modify code, test programs, and ship releases. Yet the final system is made up of code from many engineers, and every proud engineer who looks at the system's legacy code says, "Hmph, what junk code," and then goes on to write code that future engineers will despise.

6. Followers. Some of them are trainers, some are "kmatt", some are coal bosses, and some are dummies. Their defining trait is speculation; the only difference from real speculators is that they don't have to put up any money. They think that anything tied to data can be called big data. Some have never even touched an IT system; they are masters at fishing in troubled waters and hard to tell apart from the types above. But I'd say: welcome, speculators. The more heated an industry is, the more valuable people can play their own roles.
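The telecom example in the business operations item above is, at its core, an event-driven rule: an event arrives, rules are evaluated at once, and a message may be pushed. The sketch below shows only that shape in plain Python, with no Storm or Spark involved; the Event fields, the rule name upsell_4g_rule, and the message text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical event emitted when a subscriber does something observable.
@dataclass
class Event:
    user_id: str
    kind: str          # e.g. "enter_business_office" (illustrative name)
    location: str
    has_4g_phone: bool

# A rule inspects one event and returns a message to push, or None.
Rule = Callable[[Event], Optional[str]]

def upsell_4g_rule(event: Event) -> Optional[str]:
    """Hypothetical rule: promote 4G when a non-4G user walks into an office."""
    if event.kind == "enter_business_office" and not event.has_4g_phone:
        return f"Welcome to {event.location}! Upgrade to a 4G phone today."
    return None

def process_stream(events: Iterable[Event], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Evaluate every rule against each event as it arrives (one pass, no batching)."""
    pushes: List[Tuple[str, str]] = []
    for event in events:
        for rule in rules:
            message = rule(event)
            if message is not None:
                pushes.append((event.user_id, message))
    return pushes

events = [
    Event("u1", "enter_business_office", "Office A", has_4g_phone=False),
    Event("u2", "enter_business_office", "Office A", has_4g_phone=True),
    Event("u3", "browse_website", "online", has_4g_phone=False),
]
print(process_stream(events, [upsell_4g_rule]))
```

In a real deployment the loop would be replaced by a stream processing engine and the push by an SMS gateway; keeping rules as plain functions simply makes it easy for business staff to add or change them without touching the processing loop.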


Misunderstanding 2: only big data can save the world

Today, big data technologies and applications are used mainly for data analysis, data warehousing, and the like, that is, for OLAP (online analytical processing). Technically, they stand on two legs: batch data processing (including MR and MPP) and real-time stream processing (Storm, in-memory databases, etc.). On top of this, some scenarios showed that neither the MR framework nor real-time frameworks could meet the needs of near-line and iterative mining, which is why the in-memory Spark framework has become so popular. Currently, many enterprises use Hive and Pig on Hadoop 2.0 to process the underlying data and send the results, processed according to business logic, directly to the application database. Meanwhile, the Storm stream processing engine handles real-time data and triggers marketing scenarios according to business marketing rules, and Spark-based clusters serve real-time processing and mining needs.
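The difference between the two "legs" can be illustrated without any cluster. In the minimal sketch below, a hypothetical click log is aggregated two ways: batch style, where the whole dataset is read before a single result is produced, and streaming style, where running state is updated and an answer is available after every event. It only illustrates the processing model; it is not how Hive, Storm, or Spark are implemented.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical click log entries: (user_id, page).
Click = Tuple[str, str]

def batch_page_counts(clicks: Iterable[Click]) -> Dict[str, int]:
    """Batch style (MR/Hive-like): read the whole dataset, then aggregate once."""
    return dict(Counter(page for _, page in clicks))

def streaming_page_counts(clicks: Iterable[Click]) -> Iterator[Dict[str, int]]:
    """Streaming style (Storm-like): keep running state, emit a snapshot per event."""
    state: Counter = Counter()
    for _, page in clicks:
        state[page] += 1
        yield dict(state)  # a downstream rule could react here, immediately

clicks = [("u1", "home"), ("u2", "home"), ("u1", "pricing")]
print(batch_page_counts(clicks))                   # {'home': 2, 'pricing': 1}
snapshots = list(streaming_page_counts(clicks))
print(snapshots[0])                                # {'home': 1}
print(snapshots[-1] == batch_page_counts(clicks))  # True
```

Both approaches reach the same final answer; what differs is when an answer becomes available, which is exactly why real-time marketing triggers sit on the streaming side.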

Put plainly, as described above, big data has not yet entered real transaction systems and has contributed little to OLTP (online transaction processing). Many articles link big data to the Internet of Things, ubiquitous networks, and smart cities, but I think big data is just one of the conditions; other OLTP systems, physical networks, and even organizational structure are equally important factors.

Finally, I would like to say that big data processing technologies, whether new like Google's Dataflow or mature like Hadoop 2.0, data warehouses, and Storm, are essentially data processing tools. For many engineers, it is enough to get the data processing flow clear and then use fixed templates and scripts on these platforms. After all, more than 70% of the value of data lies in business applications. If a dazzling buzzword does not help the business, it ends up as "dragon-slaying": an impressive skill with nothing to apply it to. Any technology and IT architecture must serve business planning and business development; otherwise, technology will only hold back the business and productivity.

As times change, everyone in the data industry moves between roles. You may be a scientist today and an architect tomorrow; an engineer today may become a scientist a few years later; and some people will join the ranks of the followers.

 

Misunderstanding 3: big data just means big volume

There is a crowd in the "data" field who say: "Only petabyte-scale data or more counts as big data, and even at that scale, the real big data era hasn't arrived yet!" Every time I hear this, I know these people are heavily influenced by the "Volume" in IOE's 4V theory. My first response is: "Believing everything in books is worse than having no books; believing everything the giants say is worse than getting rid of IOE." Removing IOE should not start with hardware alone; we should also challenge the giants' ideas. Although many classic theories in IT were proposed by traditional giants, challengers keep bringing new ideas and technologies, and the traditional giants will gradually be overturned; this is also an important force that moves us forward. If we still lived in an age of superstition about giants and clung rigidly to such dogma, there would be no Hadoop, Spark, or Tesla, no machine learning or artificial intelligence, and no nth industrial revolution in the future.

First of all, I want to stress that big data really is not a new concept. As I mentioned in a previous article, the essence of big data is data. The data industry has been developing for many years, and the size of data has always exceeded what each era could imagine. For example, more than a decade ago a floppy disk held 1.44 MB, and 1 TB of data would have left people speechless. So by a data-size standard, did someone who had collected 1 TB of data back then enter the big data era? Obviously not! That is why I say data volume is not the yardstick for big data. If we judge big data by volume alone, then the term "Big Data" really is a pseudo-proposition, like defining things purely by the literal meaning of their names: "a tiger must be old, a boy must be small, a giant must have a big head, and a flying man must have wings" (in Chinese, the everyday words for tiger and child literally contain "old" and "small").
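To put the floppy-disk comparison into numbers, here is a quick calculation, assuming the standard "1.44 MB" 3.5-inch floppy format (1,474,560 bytes) and a decimal terabyte of 10^12 bytes.

```python
# Standard "1.44 MB" floppy: 80 tracks x 2 sides x 18 sectors x 512 bytes.
FLOPPY_BYTES = 80 * 2 * 18 * 512      # 1,474,560 bytes
TERABYTE_BYTES = 10**12               # assumption: decimal terabyte

def disks_needed(total_bytes: int, disk_bytes: int) -> int:
    """Ceiling division: how many disks are needed to hold total_bytes."""
    return -(-total_bytes // disk_bytes)

print(disks_needed(TERABYTE_BYTES, FLOPPY_BYTES))  # 678169
```

Roughly 680,000 floppies: an unthinkable amount in that era, yet an ordinary external drive today, which illustrates why a fixed volume threshold makes a poor definition of big data.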

So let's return to the concept of big data. First, big data is a complete ecosystem: a closed-loop value chain running from data generation through collection, processing, aggregation, presentation, mining, and push, with each stage using various technologies to deliver valuable applications and services for business scenarios. What is the core of big data? On one hand, it is "opening sources and throttling flows" (kaiyuan jieliu, a Chinese idiom for increasing income and cutting costs, whose first half is also the Chinese term for open source). At present, the core goal of big data technology is to meet data requirements better with low-cost technology (especially for processing the growing volume of unstructured data in recent years) and to save as much investment as possible while meeting those needs. When all is said and done, the core concept of big data is meeting application requirements. A technology with a clear goal is productivity; a technology without a business goal is "a waste of vitality".


Misunderstanding 4: big data for the sake of big data

I think this is the most serious misunderstanding at present. Some enterprises insist that their technology be the newest, the best, and the most dazzling, internationally advanced and world-class. Enterprises of every industry, region, and age all shout "catch up with BAT, big data helps ** enterprises achieve ** goals". Next they remove IOE, then invest in buying clusters, retire all their previous high-performance minicomputers, and stop all the Oracle licenses they had purchased. Decades of prior investment become obsolete overnight, and still more resources are poured into chasing "Big Data".

Friends, I believe everyone hears about or sees this with their own eyes every day, and many companies are smiling only because of the money involved. This is a big misunderstanding. I want to say:

First, technically, BAT (Baidu, Alibaba, and Tencent) and many other Internet companies pursue big data because their business requires it. Every Internet company lives on traffic and clicks, which means huge volumes of unstructured data must be processed quickly. That forces Internet companies to break the underlying data apart and process the pieces concurrently, fast enough to serve their users and the market. In other words, the business processes and models of Internet companies determine their adoption of big data technology. By contrast, many enterprises do not need these technologies at all: some simply put a few formulas in one or two Excel files, still process their data monthly, and that meets their needs.
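The "break the data apart, process the pieces concurrently, then combine" approach described above is essentially the split-apply-combine pattern behind MapReduce. Below is a minimal single-machine sketch, assuming a hypothetical list of log lines and a word count as the workload; the function names are illustrative, not part of any big data framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List

def partition(records: List[str], n_parts: int) -> List[List[str]]:
    """Split records into n_parts roughly equal chunks (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]

def count_words(chunk: List[str]) -> Counter:
    """'Map' step: count words within one chunk, independently of the others."""
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

def parallel_word_count(records: List[str], n_parts: int = 4) -> Counter:
    """Process chunks concurrently, then merge ('reduce') the partial counts."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_words, partition(records, n_parts)))
    total: Counter = Counter()
    for part in partials:
        total.update(part)
    return total

logs = ["click home", "click pricing", "view home", "click home"]
result = parallel_word_count(logs, n_parts=2)
print(result["click"], result["home"])  # 3 3
```

ThreadPoolExecutor keeps the sketch simple; in CPython, CPU-bound work would need a ProcessPoolExecutor (or an actual cluster) for true parallelism. The point is the shape: because each chunk is processed independently, adding hardware adds throughput roughly linearly.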

Second, in terms of investment, Internet companies started out poor and could not afford large equipment. Even after they struck it rich, no traditional minicomputer or mainframe could support their growth better, so they had to find another way and create their own value chains and standards. With a low-investment, lightweight architecture, small, linear increments of hardware keep meeting business growth without interruption. By contrast, some traditional enterprises, even giants, set their investment plans a year in advance, and building on their existing investment would yield a better ROI (return on investment). Now, chasing the big data slogan, they have sacrificed much of that previous investment; beyond it being "not worth the candle", what remains is little more than scattered local experiments.

Big data technology, like any technology, exists to meet specific business goals. With a clear business purpose, we can design a technical architecture that fits our own business architecture, which is a scientific and healthy approach to development. If you are a boss, CEO, or investor, you must understand that big data technology is sometimes like water and the enterprise's business goal is the ship: "water can carry the ship, but it can also capsize it".

As production systems keep adjusting, productivity will improve in round after round, and the technologies that follow big data will change rapidly as well. Many artificial intelligence technologies, such as machine learning and deep learning, are emerging, and finer subdivisions such as "small data" and "micro data" have appeared. In this flood of technology, as long as you keep a clear head about serving your business and design your technical architecture around your business needs, you will not be overwhelmed by the various schools and concepts.
