Scientific data is playing an increasingly prominent role in research, and data-intensive knowledge discovery has attracted wide attention in the scientific community. Scientists not only solve scientific problems through real-time, dynamic monitoring and analysis of large volumes of data; they also conceive, design and carry out research on the basis of data. Data is not only the result of scientific research but also its foundation. People care not only about data modeling, description, organization, preservation, access, analysis and reuse and about building scientific data infrastructure; they care even more about how to use the ubiquitous network, with its inherent interactivity and openness, to turn massive data into computable objects of knowledge and to build research and innovation models based on data and open collaboration. In the humanities and social sciences, research methods characterized by "humanities computing", complex network analysis and large-scale data analysis have gradually been adopted. The "scientific character" of these disciplines has improved markedly, while their critical edge and humanistic concern have weakened.
The data challenge of humanities and social sciences research
First, the rapid growth in the volume of research data poses a great challenge to scholars in the humanities and social sciences. In 2006, Gregory Crane pointed out that researchers in these fields face a vast number of documents and materials in their own areas, far more than traditional reading can handle. Scholars will therefore have to use computers to process the relevant literature; this is the "million books challenge". As interdisciplinary research grows, the traditional humanities and social sciences have introduced many computational processing models and analytical methods. The development of digital academic resources on various storage media, computer simulation and demonstration based on complex computation and analysis, and business forecasting and case-based evidential reasoning grounded in facts and evidence have all risen widely, fundamentally changing how humanities knowledge is acquired, annotated, compared, sampled, interpreted and represented. Remarkable results have been achieved, especially in linguistics, literature, history, philology and ethnology. Specialized research institutions have been formed, along with two major digital humanities alliances: the Alliance of Digital Humanities Organizations (ADHO) and centerNet, the international network of digital humanities centers.
How to develop the humanities and social sciences in the big data age
Second, digitization is changing the types of data used in the traditional humanities and social sciences, and the acquisition and processing of digital resources are becoming ever more important. A large volume of books, newspapers, periodicals, photographs, pictures, music, video and other humanities materials has been digitized and made available to researchers over the Internet. Compared with digitized texts and other digital information resources, the data resources represented by "big data" come from a wider range of sources, have finer granularity and more fragmented recording units, and are more diverse in structure. Machine-generated data far outnumbers manually produced data, and information quality is uneven. Collecting, preserving and making comprehensive use of such data depends even more heavily on computers, so the research process itself needs computer support. Traditional humanities and social science scholars' lack of computing and analytical skills may even affect whether their research can ultimately be carried out. The ability to analyze and process data by computer has therefore become an important part of the research competence of humanities and social science researchers.
Big data and new thinking in the humanities and social sciences
Judging from the current state of research on digital humanities and humanities data, the integration of computational methods with humanities and social science research has produced three new modes of research thinking:
First, open, whole-process research thinking. In the past, the output of the humanities and social sciences consisted of final results, and reuse took place mainly through citation, reporting and commentary. Digital humanities research can record the complete process of a study, so that the original data and the intermediate results can also be put to use in multiple dimensions, and the level of reuse rises significantly. Multi-dimensional, open research thinking, characterized by online laboratories, project websites, open data sets, project forums and project social networks, is now widely established, and participation has increased greatly.
Second, fragmentation-and-reorganization research thinking. In the big data environment, humanities and social science research pays more attention to collecting, cleaning and analyzing fragmentary, massive and unstructured data. By reorganizing these fragments, it can shed deep light on scientific questions that were previously hard to handle or impossible to anticipate. Examples include observing public awareness of political participation through massive volumes of natural-language expression, and studying scientists' working hours and work intensity through the distribution of the times at which they are online and download resources.
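The second example can be illustrated with a minimal sketch. It assumes a hypothetical log of download timestamps in ISO-8601 form; the function names, the 09:00-18:00 working window and the sample records are all invented for illustration and do not come from any actual study.

```python
from collections import Counter
from datetime import datetime


def hourly_distribution(timestamps):
    """Count events per hour of day (0-23) from ISO-8601 timestamp strings."""
    hours = [datetime.fromisoformat(ts).hour for ts in timestamps]
    counts = Counter(hours)
    return [counts.get(h, 0) for h in range(24)]


def off_hours_share(distribution, start=9, end=18):
    """Fraction of events outside the assumed working window [start, end)."""
    total = sum(distribution)
    if total == 0:
        return 0.0
    inside = sum(distribution[start:end])
    return (total - inside) / total


# Made-up sample log, for illustration only.
sample_log = [
    "2013-05-06T08:15:00",
    "2013-05-06T10:30:00",
    "2013-05-06T14:05:00",
    "2013-05-06T21:40:00",
    "2013-05-07T23:10:00",
    "2013-05-11T16:20:00",
]

dist = hourly_distribution(sample_log)
share = off_hours_share(dist)
print(f"downloads outside 09:00-18:00: {share:.0%}")
```

A high share of downloads outside the assumed window would be one rough signal of long working hours. A real study would also need to handle time zones, weekends and the many fragmentary records that call for cleaning first.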
Third, computational-analysis research thinking. In the past, the humanities and social sciences relied mostly on qualitative study, and even quantitative research tended to frame its questions as right or wrong, accepting or rejecting a particular hypothesis. This is deterministic, causal research thinking. In the big data environment, the humanities and social sciences can adopt computational-analysis thinking and carry out trend analysis of the propositions in question.
In addition, within this system of research thinking, interdisciplinary collaboration, cross-platform collaboration, massive data processing and the computational turn in the humanities and social sciences are becoming more and more evident, and a number of research directions and hot topics have emerged.
Basic characteristics of big data research in the humanities and social sciences
Big data research in the humanities and social sciences has the following basic characteristics:
First, the information involved far exceeds what ordinary reading, analysis and comprehension can handle. Such material was previously "impossible to study" or "difficult to study"; big data analysis methods open up new research space and new research possibilities for the humanities and social sciences.
Second, computational analysis methods are widely used. Conclusions are not obtained through observation, reflection, comprehension and other traditional methods; instead they "emerge automatically" from large-scale data collection, so theory is arrived at differently than in traditional humanities and social science research.
Third, sustainable and rich data sets and analysis tools are built. Their usability, sharing, reuse and support for collaboration are greatly enhanced, which makes large-scale collaboration possible for humanities and social science scholars.
Fourth, the research is interdisciplinary. Digital humanities research needs to bring together domain expertise, data management skills, data analysis skills and project collaboration skills, so such projects are often completed by scholars from widely different disciplines.
The quality, quantity and utilization of data sets are the main determinants of research quality, while formulating research hypotheses is relatively easy. To some extent, data scientists will become leading figures in humanities and social science research.
The hidden risks of big data research in the humanities and social sciences
Although mainstream data service providers, represented by Microsoft, Google and IBM, strongly promote the bright future of digital humanities and social science research, this research also has shortcomings:
First, decontextualized research logic lacks applicability and humanistic concern. Once data is completely stripped from the specific environment in which it arose, it may become obscure, hard to understand and hard to apply. In business analysis, for example, only about 10% of data mining results are usable; data mining does not "work the moment you dig". In 2012, the Canadian writer Stephen Marche also argued, in his article "Literature Is Not Data: Against Digital Humanities", that treating literature as data loses the rich meaning of literature itself.
Second, big data research in the humanities and social sciences may be "keen" at detecting problems, yet it cannot explain them reasonably or offer targeted responses, which limits its scope of application. Examples include applications in public opinion analysis, policy computing and affective computing.
Third, the aggregate nature of data analysis erases important individual characteristics, yet the individual is the focus of much humanities and social science research.
Finally, big data research in the humanities and social sciences places heavy emphasis on technical analysis and may neglect innovative thinking and speculative analysis, which does not favor the cultivation of master-level humanities and social science scholars.
In short, with the rapid growth of humanities and social science data and the improvement of big data analysis technology, big data research will inevitably become the mainstream of the humanities and social sciences. It will not replace existing humanities and social science research, however; the two will complement each other.
(The author, Sun Jianjun, is with the major National Social Science Foundation project "Research on the Deep Aggregation and Service of Network Information Resources Oriented to Subject Fields" and is a professor at Nanjing University)
(Responsible editor: Mengyishan)