Big data has been one of the fastest-growing technology areas in recent years. Research on and applications of big data are expanding steadily and increasingly shape everyday life. Applications close to the public, such as shopping recommendations, traffic analysis, and even predictions about college entrance examinations, demonstrate its power. In March 2016, the man-versus-machine Go match between AlphaGo and Lee Sedol showed the public how deeply artificial intelligence driven by big data can affect human society. According to version 3.0 of the Big Data Landscape, big data infrastructure, analysis tools, and application systems are all evolving rapidly. The landscape's growth from year to year shows that the territory of big data keeps expanding, its applications keep deepening, and its influence keeps growing.
In education, big data has attracted the attention of researchers and practitioners on many fronts. Research paradigms, technology applications, and practical cases are all developing rapidly. Education big data is becoming a driving force that cannot be ignored, and it plays an increasingly important role in the research and practice of teaching.
As an emerging field, big data technology is still iterating rapidly, and new methods, tools, and models keep emerging. Education big data, as a subfield, follows the overall trend of big data development while showing distinctive characteristics of its own, and it is drawing more and more attention. Building on a study of big data technology, analyzing the definition, practical examples, development trends, and challenges of education big data helps us grasp its overall picture, respond to technological development, and promote systematic change in education.
I. Development trend of big data technology
The origin of big data technology can be traced back to the MapReduce model proposed by Google in 2004. In little more than a decade, big data technology has moved from concept to application and formed a complete technology stack represented by Hadoop. Today it is still developing rapidly, and its basic frameworks, analysis techniques, and application systems continue to evolve and improve. According to statistics, US big data startups received US$6.64 billion in financing in 2015, accounting for 11% of total financing in the technology sector. This reflects the vitality of the field and its recognition by the market. The direction of big data technology results from the interplay between technical progress and application requirements, so analyzing its trends helps us understand the current state of the field at a more fundamental level.
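To make the MapReduce model mentioned above concrete, the following is a minimal single-process sketch in Python. It only illustrates the map, shuffle, and reduce steps; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are chosen here for illustration and are not taken from Hadoop or from Google's implementation, which distribute these steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    # In a real cluster, each document (or split) would be mapped on a different node.
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


docs = ["big data in education", "education data analysis"]
counts = word_count(docs)
print(counts)
```

Because each map call depends only on its own input and each reduce call only on one key, both phases can in principle run in parallel, which is the property that frameworks built on this model exploit.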
1) Infrastructure
After years of development, big data infrastructure is moving toward greater speed, convenience, and integration. Hadoop is an important foundational framework for big data analytics, but it suffers from problems such as slow computation and complicated operation and maintenance. Frameworks such as Spark and Pig have grown up around Hadoop, steadily improving computing performance and streamlining the processing flow. Compared with Hadoop, Spark offers a higher level of abstraction, faster computation, and easier programming. More importantly, Spark provides a unified data platform that supports different types of data applications through different modules: batch processing through Spark Core, interactive queries through Spark SQL, stream processing through Spark Streaming, machine learning through MLlib, and graph computation through GraphX.
New technologies keep appearing in big data infrastructure. The data lake and fog computing offer solutions from two different perspectives: concentrating data and distributing it. A data lake is a large object-based repository in which data is stored in its original format. Comprehensive monitoring, analysis, and data modeling can be carried out without converting the data first. Unlike data aggregation in the usual sense, a data lake does not change the structure of the original data but supports analysis of the raw data directly. This avoids the up-front cost of extract, transform, and load (ETL). To store data directly without changing its structure, a data lake places high demands on metadata. Data lake technology is still in its infancy and faces problems such as large differences among raw data sets, complex data types, and difficulty of analysis and application. Nevertheless, it helps companies plan their data over a longer horizon, establish a data governance structure, and address security issues in advance.
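The idea of storing data in its original format and relying on metadata at analysis time can be sketched as follows. This is a toy in-memory illustration under stated assumptions, not a real data lake product: the `DataLake` class, its method names, and the sample records are all hypothetical.

```python
import csv
import io
import json


class DataLake:
    """Toy in-memory data lake: raw payloads are stored untouched, next to metadata."""

    def __init__(self):
        self._objects = {}  # object key -> raw bytes, exactly as received
        self._catalog = {}  # object key -> metadata describing the raw object

    def put(self, key, raw_bytes, fmt, source):
        # No conversion on write: the payload keeps its original format.
        self._objects[key] = raw_bytes
        self._catalog[key] = {"format": fmt, "source": source, "size": len(raw_bytes)}

    def metadata(self, key):
        return dict(self._catalog[key])

    def read(self, key):
        """Choose a parser from the metadata only when the data is analyzed."""
        raw = self._objects[key].decode("utf-8")
        fmt = self._catalog[key]["format"]
        if fmt == "json":
            return json.loads(raw)
        if fmt == "csv":
            return list(csv.DictReader(io.StringIO(raw)))
        raise ValueError(f"no reader registered for format {fmt!r}")


lake = DataLake()
lake.put("scores.csv", b"student,score\nA,90\nB,75\n", fmt="csv", source="exam system")
lake.put("clicks.json", b'[{"student": "A", "clicks": 12}]', fmt="json", source="learning platform")

rows = lake.read("scores.csv")
average = sum(int(row["score"]) for row in rows) / len(rows)
print(average)
```

The sketch also shows why metadata matters so much: without the catalog entry recording each object's format and source, the raw bytes could not be interpreted at read time.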
The core technology, application status and development trend of education big data
Unlike the data lake, which focuses on aggregating data, fog computing proposes a distributed solution. The term fog computing first appeared in the field of network security; Cisco later borrowed it and gave it its distributed-computing meaning. Cisco describes fog as "a cloud closer to the ground," and fog computing is an extension of cloud computing. Unlike cloud computing, fog computing is built not from powerful servers but from weaker, more widely distributed computing modules and intelligent network devices. These low-latency, location-aware modules can be embedded in all kinds of infrastructure and even in everyday objects.
With the continued development of the Internet of Things, the amount of data produced by terminals of all kinds will foreseeably increase dramatically, and the bottlenecks of cloud computing may become apparent. In fog computing, data, analysis, and applications are concentrated at the endpoints of the network and are aggregated into the cloud only when needed.
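The principle of processing data at the network edge and sending it to the cloud only when needed can be sketched as below. This is a simplified illustration, not Cisco's architecture: the `FogNode` and `CloudStore` classes, the alert threshold, and the batch size are all assumptions made for the example.

```python
from statistics import mean


class CloudStore:
    """Stand-in for a cloud service; it only records what it receives."""

    def __init__(self):
        self.received = []

    def upload(self, message):
        self.received.append(message)


class FogNode:
    """Edge module that processes readings locally and uploads only when needed."""

    def __init__(self, node_id, cloud, alert_threshold, batch_size):
        self.node_id = node_id
        self.cloud = cloud
        self.alert_threshold = alert_threshold  # hypothetical rule for "the cloud must know now"
        self.batch_size = batch_size
        self.buffer = []

    def ingest(self, reading):
        self.buffer.append(reading)
        if reading > self.alert_threshold:
            # Urgent value: forward it immediately instead of waiting for the batch.
            self.cloud.upload({"node": self.node_id, "type": "alert", "value": reading})
        if len(self.buffer) >= self.batch_size:
            self._flush_summary()

    def _flush_summary(self):
        # Only a compact summary leaves the edge; the raw readings stay local.
        summary = {
            "node": self.node_id,
            "type": "summary",
            "count": len(self.buffer),
            "mean": mean(self.buffer),
            "max": max(self.buffer),
        }
        self.cloud.upload(summary)
        self.buffer.clear()


cloud = CloudStore()
node = FogNode("classroom-1", cloud, alert_threshold=40.0, batch_size=4)
for value in [21.0, 22.0, 45.0, 20.0, 23.0]:
    node.ingest(value)
print(cloud.received)
```

In this run five readings reach the edge node, but only two messages travel to the cloud: one alert and one summary. That reduction in upstream traffic is the effect the fog model aims for.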
Fog computing extends computing power to a wide range of smart devices at the edge of the network. In this model, managing smart devices and the interactions among them becomes very important. For example, blockchain, the technology underlying Bitcoin, offers a model for mobile registration, ownership confirmation, and intelligent management. This provides new technical support for the self-management and intelligent interaction of smart terminals and devices over the network.
The data lake and fog computing focus on the sources and terminals of big data and provide solutions from the perspectives of concentration and distribution respectively. Admittedly, both approaches still need to be tested in practice. Overall, however, they represent the development trend of big data analysis infrastructure: acquiring and processing terminal data more flexibly, distributing the computing load sensibly, aggregating core data broadly, and implementing data governance through customized standards.
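As a rough illustration of how a blockchain-style record can support device registration and ownership confirmation, the sketch below links records by hash so that tampering with an earlier record becomes detectable. It is a deliberately reduced example: the `DeviceLedger` class and its fields are hypothetical, and real blockchains add consensus, digital signatures, and replication across many nodes, none of which is shown here.

```python
import hashlib
import json


def block_hash(block):
    """Hash a block deterministically (sorted keys, SHA-256)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class DeviceLedger:
    """Append-only list of device records, each linked to the previous one by hash."""

    def __init__(self):
        self.chain = []

    def register(self, device_id, owner):
        previous = block_hash(self.chain[-1]) if self.chain else "0" * 64
        self.chain.append({
            "index": len(self.chain),
            "device": device_id,
            "owner": owner,
            "prev_hash": previous,
        })

    def owner_of(self, device_id):
        # The most recent record for a device determines its current owner.
        for block in reversed(self.chain):
            if block["device"] == device_id:
                return block["owner"]
        return None

    def is_valid(self):
        # Every block must carry the hash of the block before it.
        for previous, current in zip(self.chain, self.chain[1:]):
            if current["prev_hash"] != block_hash(previous):
                return False
        return True


ledger = DeviceLedger()
ledger.register("sensor-7", "school-a")
ledger.register("tablet-3", "school-b")
print(ledger.owner_of("sensor-7"), ledger.is_valid())

ledger.chain[0]["owner"] = "intruder"  # tamper with an earlier record
print(ledger.is_valid())
```

After the tampering step the stored hash in the second block no longer matches the first block, so the validity check fails.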
2) Analysis technology
Analysis techniques build models on big data and form the basis for specific applications such as evaluation, recommendation, and forecasting. Big data analysis technology has developed rapidly in recent years, and intelligence, real-time processing, and ease of use have become its defining characteristics.
2-1) Intelligence
In analysis technology, the new artificial intelligence formed by combining big data with machine learning has become the most prominent trend of recent years. Building on statistical analysis, big data and machine learning allow data analysis to discover relationships and make predictions intelligently and more quickly. AlphaGo is a typical example of this trend. Working from massive data, innovative algorithms represented by deep learning iterate and evolve through massively parallel computation and finally form a data intelligence capable of defeating human players.
The artificial intelligence that results from integrating big data and machine learning is not limited to applications in one specific domain; it represents a breakthrough in general-purpose artificial intelligence technology. This breakthrough will have a major impact in application areas such as healthcare, transportation, finance, and education. From a broader perspective, intelligent system solutions such as smart cities point to the future of integrated applications of intelligent big data technology. Data from a wide variety of devices and sensors can feed intelligent analytics. Machine learning based on big data keeps evolving and raising its level of intelligence as it collects and analyzes massive amounts of data. The results of data analysis drive the intelligent activity of each component of the smart city, and the new technology architecture based on data intelligence lays the foundation for smart living in the cities of the future.
2-2) Real-time
Real-time analytics is another direction of big data technology. As big data technology matures, applications demand more and more real-time analysis and processing of data. Unlike the aggregation and analysis of historical data, real-time data analysis is more time-sensitive and places higher demands on data storage, computation, and presentation. The limitations of Hadoop's batch framework have become increasingly prominent in applications that require faster analysis, such as real-time user behavior analysis, user classification, and recommendation. In response, streaming real-time computing frameworks such as Spark Streaming, Samza, and Storm have emerged. Real-time analysis frameworks represented by Spark Streaming offer an effective scheduling mechanism and fast distributed computing, balance key parameters between data aggregation and batch processing, improve data throughput and performance, and so provide effective support for real-time computing. The move toward real-time analysis indicates that big data will be integrated more deeply into people's work and life, and that it will play a stronger role in areas that need timely responses, such as transportation and translation.
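The balance between aggregating data and processing it in batches can be illustrated with a micro-batch sketch: incoming events are grouped into short time intervals, and each small batch updates a running result. This is plain Python written for illustration, not the Spark Streaming API; the function names, the event format `(timestamp, key)`, and the two-second interval are assumptions of the example.

```python
from collections import Counter


def micro_batches(events, batch_interval):
    """Group (timestamp, key) events into consecutive batches of `batch_interval` seconds.

    Events are assumed to arrive in time order with integer timestamps.
    """
    batch, batch_end = [], None
    for timestamp, key in events:
        if batch_end is None:
            # Align the first interval to a multiple of the batch interval.
            batch_end = timestamp - timestamp % batch_interval + batch_interval
        while timestamp >= batch_end:
            # The current interval is over: hand the (possibly empty) batch downstream.
            yield batch
            batch, batch_end = [], batch_end + batch_interval
        batch.append(key)
    if batch:
        yield batch


def running_counts(events, batch_interval):
    """Update running totals once per micro-batch and emit a snapshot each time."""
    totals = Counter()
    for batch in micro_batches(events, batch_interval):
        totals.update(batch)
        yield dict(totals)


events = [(0, "video"), (1, "quiz"), (2, "video"), (5, "quiz"), (6, "video")]
snapshots = list(running_counts(events, batch_interval=2))
for snapshot in snapshots:
    print(snapshot)
```

The batch interval is the key parameter here: a shorter interval gives fresher results at the price of more scheduling overhead, while a longer interval improves throughput but delays each update.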
2-3) Ease of use
In recent years, as the technology has matured, the threshold for big data applications has continued to fall. Giants such as Google and Microsoft keep introducing big data technology platforms. China's Internet giants Baidu, Alibaba, and Tencent have launched Baidu Open Cloud, Ali Digital Plus, and the Tencent Big Data Platform respectively, providing comprehensive support for application technology. High-quality solutions are available for every stage from data collection and model building to visualization. There are also many excellent open source projects among these analysis frameworks, such as Caffe and Torch, and Google's open-sourcing of TensorFlow has added an important option. TensorFlow was developed by the Google Brain team and draws on Google's analysis experience in search, email, translation, image recognition, and more. It expresses models as data flow graphs and ties model construction closely to product development: once a modeling experiment is complete, the code can be applied directly in the product. Ease of use paves the way for big data applications in vertical fields.
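The data flow graph idea can be illustrated with a small hand-written example: operations are first assembled into a graph, and values are computed only when the graph is run with concrete inputs. The code below is a sketch written for this purpose and does not use or reproduce the TensorFlow API; `Node`, `placeholder`, `constant`, `add`, `multiply`, and `run` are all hypothetical names.

```python
class Node:
    """One operation in a data flow graph; `inputs` are the nodes it depends on."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.name = name


def placeholder(name):
    # A node whose value is supplied only when the graph is run.
    return Node(op=None, name=name)


def constant(value):
    return Node(op=lambda: value)


def add(a, b):
    return Node(lambda x, y: x + y, (a, b))


def multiply(a, b):
    return Node(lambda x, y: x * y, (a, b))


def run(node, feed, cache=None):
    """Evaluate `node` by first evaluating the nodes it depends on."""
    cache = {} if cache is None else cache
    if node in cache:
        return cache[node]
    if node.op is None:
        value = feed[node.name]
    else:
        value = node.op(*(run(dep, feed, cache) for dep in node.inputs))
    cache[node] = value
    return value


# Build the graph for y = w * x + b; nothing is computed at this point.
x = placeholder("x")
w, b = constant(2.0), constant(0.5)
y = add(multiply(w, x), b)

# The same graph can be run with different inputs.
print(run(y, {"x": 3.0}))
print(run(y, {"x": 0.0}))
```

Separating graph construction from execution is what allows the same model definition to be reused unchanged between an experiment and a deployed product, which is the point the paragraph above makes.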
3) Domain application
Supported by basic frameworks and application technology, big data applications are developing rapidly and deeply in many fields. Three characteristics stand out: the deepening and integration of domain applications, the widespread use of visualization, and the emergence of an industry ecosystem.
3-1) Deepening and integration of the field
Big data influences research and practice in many fields at the methodological level and, as a new research paradigm, affects many disciplines. As a basic method and tool it has a degree of universality across application fields, but it also shows distinct domain characteristics and differences. Fields such as finance, transportation, and retail have relatively clear quantitative indicators that can serve as the basis for machine learning. In social-science fields such as education, by contrast, the quantitative indicators needed to build a big data analysis model are often hard to obtain. This makes model construction in education distinctive. At the same time, the long cycles and complexity of education and teaching pose new challenges for model construction.
As big data develops, domain applications will gradually deepen. Every field needs to apply its own domain knowledge to study and address its own problems in depth. In this process, data acts as a bridge that makes integration across fields possible. For example, DMSP/OLS nighttime light data from meteorological satellites has produced remarkable results in remote sensing mapping, urban planning, population estimation, national economic measurement, energy consumption analysis, and ecological and environmental impact assessment. With big data as a foundation, the fields themselves are changing profoundly, and the connections between fields are accelerating the trend toward convergence. Both the deepening of big data technology within fields and the integration between fields will become increasingly important.
3-2) Visual application
Visualization is the presentation layer of big data applications. It addresses end users directly and serves many kinds of people across many application scenarios. Data visualization can be achieved in many ways, from lower-level options such as the ggplot2 package for R and the D3 library to data analysis and visualization tools such as SPSS Modeler and Tableau. In recent years the threshold for using visualization tools has been falling. Major data analysis companies such as SAP and Tableau have introduced mobile data visualization tools. With SAP's Roambi, for example, the user simply imports a dataset and selects a template, and Roambi immediately produces an attractive, interactive visualization. Tableau has not only introduced Tableau Mobile to support data analysis on mobile devices but has also built an end-to-end visualization solution covering desktop analytics, online publishing, and mobile applications through tools such as Tableau Public and Tableau Desktop.
With the support of these tools, the threshold for data visualization has dropped sharply, laying the foundation for much wider use. As the presentation layer of big data technology, data visualization is the "last mile" of data analysis and insight. As this link becomes steadily better optimized and more user-friendly, the widespread application of data analysis is just around the corner.
3-3) Emergence of an industry ecosystem
In August 2015, the State Council issued the "Action Outline for Promoting Big Data Development," which positions big data as a new driving force for economic transformation, a new opportunity to reshape national competitive advantage, and a new way to improve the government's governance capabilities. The Action Outline has become the policy basis for the big data industry and will play a catalytic role in its development. Capital investment, infrastructure, data standards, application platforms, and regional practice in the big data industry can all be expected to accelerate. At the same time, as mentioned above, large Internet companies such as Baidu, Alibaba, and Tencent have already committed to the big data field and have begun to build infrastructure, standardize and promote applications, while big data practice in individual application areas is also developing rapidly.
Under the combined effect of policy support, maturing tools and platforms, and domain applications, the big data industry chain is gradually taking shape and an ecosystem is beginning to form. This ecosystem will give rise to a series of data standards, produce a variety of integrated technology routes, connect raw data through to end-user applications, and push big data applications to a new level.