It has been argued that big data will make the health care industry more efficient and more accountable. So far, however, other industries have been far more successful in this regard, obtaining practical value from the large-scale integration and analysis of diverse data sources.
What these successful industries have figured out is that big data becomes transformative when different data sets can be linked at the level of the individual person. Biomedical data, by contrast, are scattered across research institutions and deliberately kept separate to protect patients' privacy. Linking these dispersed data poses both technical and social challenges, and only when both are met can biomedical data play their full role in health care. In this Viewpoint, we focus on the challenges of making that link.
Political campaigns, governments, and businesses use big data to learn as much as possible about their voters or customers, and then apply advanced computational methods to shape their strategies. In 2012, the Obama campaign combined data from Facebook, the census, voter lists, and active outreach to identify, approach, and influence undecided voters. The National Security Agency draws on data from telephone and internet companies to identify terrorists.
Google personalizes each person's search results using that person's Internet history and geographic context. In all of these cases, the key is the ability to link information to a specific person, which goes beyond what aggregate data allow. It is helpful to know that an administrative district has many undecided voters, but being able to reach those specific individuals is what may help win an election.
Access to big data may allow physicians and researchers to test new hypotheses and identify areas where intervention is possible. For example, can grocery shopping patterns at stores in different neighborhoods predict the rates of obesity and type 2 diabetes recorded in public health databases? Does the amount of exercise recorded by home monitoring devices correlate with the efficacy of cholesterol-lowering drugs, as measured by continued prescription refills? To what extent do patients' Facebook friends influence their lifestyle choices and their adherence to medical treatment? Whether many of these associations really exist, and how physicians would use them, remains unclear.
However, linking data at the level of the individual patient is a prerequisite for exploring these possibilities. The first challenge in using biomedical data effectively is to identify the potential sources of health care information and to determine the value of linking them. One way to approach this problem is to organize big data sets along different dimensions, such as size.
Some big data, such as electronic health records (EHRs), provide depth: they include many types of information (e.g., images and diagnostic records) about each patient encounter. Others, such as insurance claims data, provide a longitudinal view: they cover a patient's medical history over a long period, but only for a narrow range of data types. Linking such data adds value when it helps fill in the gaps. With this in mind, it becomes easier to see how biomedical data from nontraditional sources outside the health care system fit into the picture. Although their quality varies, social media, credit card purchases, census records, and many other types of data can help assemble a fuller picture of a patient, and in particular can help uncover social and environmental factors that may affect health.
(Source: Journal of the American Medical Association, online)
(Responsible editor: Lvguang)