A World Where All Big Data Is Read Digitally

Source: Internet
Author: User

Since the early 1990s, digital technology has fundamentally changed the way we live. Now we are about to embark on a comprehensive transformation: converting all human knowledge recorded in analog form into digital form. This article looks back at the past in order to imagine the future.

The collection of the Bavarian State Library is being scanned and uploaded to the Internet.

Zuse, CERN and Zuckerberg: these three names mark important milestones in the digital revolution. As early as 1941, Konrad Zuse in Berlin built the world's first working computer, a huge digital machine he called the "Zuse Z3". In 1991, the World Wide Web, developed by Tim Berners-Lee at CERN in Switzerland, was opened to the global public. Web technology further revolutionized the way people communicate, paving the way for Google, Amazon and countless other businesses. Later, in 2004, Mark Zuckerberg created the social network Facebook. Today, thanks to Facebook and similar networks, nearly 2 billion people have acquired a digital identity.

Computers, the Web and Facebook are all based on digital technology. Digitization means converting analog information, such as text, sound, images and video, into easily stored binary code made up of a large number of 0s and 1s. Once created, such digital items can be copied indefinitely without any loss of quality and distributed around the globe via the Internet. Digital technology has opened up new sales channels for companies, but it has also brought new problems, such as the illegal copying of data. The Internet boom that began in the 1990s set off an unprecedented wave of digitization. In 1993, only about 3% of information was stored digitally; by 2007, that share had soared to 94%. The trend has since intensified, with large amounts of new data being digitized every day.
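To make "converting analog information into 0s and 1s" concrete, here is a minimal Python sketch for the case of sound. It treats a sine wave as a stand-in for an analog signal, samples it at fixed time steps and quantizes each sample to a 4-bit integer code. The sampling rate, bit depth and function names are illustrative assumptions, not anything described in the article; real audio uses far higher rates and bit depths.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz, bits):
    """Sample a continuous signal (a function of time with values in [-1, 1])
    and quantize each sample to an unsigned integer code with `bits` bits."""
    levels = 2 ** bits
    codes = []
    for i in range(int(duration_s * sample_rate_hz)):
        t = i / sample_rate_hz
        x = max(-1.0, min(1.0, signal(t)))  # clip to the valid range
        # Map [-1, 1] linearly onto the integer range [0, levels - 1].
        codes.append(round((x + 1.0) / 2.0 * (levels - 1)))
    return codes


def to_bitstring(codes, bits):
    """Render each quantized sample as a fixed-width group of 0s and 1s."""
    return " ".join(format(c, f"0{bits}b") for c in codes)


def tone(t):
    """A 1 Hz sine wave standing in for an 'analog' sound signal."""
    return math.sin(2 * math.pi * t)


codes = sample_and_quantize(tone, duration_s=1.0, sample_rate_hz=8, bits=4)
print(codes)                     # → [8, 13, 15, 13, 8, 2, 0, 2]
print(to_bitstring(codes, 4))    # → 1000 1101 1111 1101 1000 0010 0000 0010

# Copying digital data is lossless: a copy of a copy of a copy is still
# bit-for-bit identical to the original.
generation = codes
for _ in range(3):
    generation = list(generation)
assert generation == codes
```

The quantization step is where information is lost relative to the analog original; after that point, every copy is exact, which is why digital items can be reproduced without degradation.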

One of the most important pioneers of the digital world was the German inventor Rudolf Hell. Hailed as the "Edison of the graphic arts industry," he was awarded the Grand Merit Cross of the Federal Republic of Germany, the Gutenberg Prize and the Werner von Siemens Ring. Hell is regarded as the father of the fax machine and the scanner. In 1980, he commercialized the groundbreaking Chromacom digital image processing system. In the early 1980s, the Hell company, a Siemens subsidiary, was commissioned by the Vatican Library to scan valuable classical works and produce digital copies for public reading. In the 1990s, digitization became more extensive and systematic. In 1990, for example, the company installed a digital image processing system at the Kremlin Museum in Moscow, which for the first time cataloged the Russian czars' entire art collection in digital form. The resulting digital images and information were recorded, sorted and saved in an image database.

Scanning half-open books. Today, many organizations want to make digital copies of all the analog information they hold. In this respect, the Bavarian State Library in Munich is exemplary: its digitization center has the most extensive equipment in Germany. "We use 26 different scanning systems, including four fully automated scanners that can process up to 2,000 pages per hour," says Klaus Ceynowa, deputy director of the Bavarian State Library. "We have two operators, each of whom supervises two robots. The system is not only fast; to protect the books, it opens them to an angle of only 60 degrees. The system's scanning prism slides between the partly opened pages, reads the page clearly and accurately, then turns the page and continues scanning."

The entire collection of the Bavarian State Library (left) is being converted to digital form. An app is already available for viewing its most precious cultural treasures.

Since 2007, the Bavarian State Library has been working with Google on the Google Books Library Project, through which 1 million volumes from the library's holdings are being digitized and put online for the public to read. The books concerned were written between 1601 and 1874 and are no longer protected by copyright. "Every week, after Google has digitized them at its scanning center in Germany, we release about 5,000 books," Ceynowa explains. "Google bears the cost of scanning and provides us with digital copies to keep in our own database. All works from before 1601 and after 1874, including extremely valuable medieval manuscripts, were digitized at our own digitization center. According to plan, the Google project will end at the end of this year. We have already uploaded almost all of the 1 million books to the digital library on our website, where anyone can read them."

Nevertheless, the Bavarian State Library's digitization work is far from over. "Our work is only just beginning, because what we have done so far has opened up new possibilities for linking digital information," says Ceynowa. The library has developed several mobile apps, including one called "Ludwig II". Depending on where users are standing, the app displays historical data, images and documents related to Ludwig's "fairy-tale castles." For example, people standing in front of the Residenz palace in Munich can use their smartphone camera to see a real-time image of the famous Winter Garden that Ludwig had built on its roof, a garden long since lost to history. With its flowers and plants set around a sparkling artificial lake, the Winter Garden was a magnificent sight, and through the app people can enjoy this historical scene once again.

Nearly a million books from the Bavarian State Library can now be read online.

Digital land registers. Museums and libraries are among the last institutions to reap the full benefits of digitization; government agencies and industrial companies have been using the technology for a long time. Now, Germany's 16 federal states are planning to digitize all of their land registers. To this end, Siemens Corporate Technology has spent the past two years conducting a feasibility study led by Dr. Bernt Andrassy. Andrassy explains: "Land in Germany is divided into parcels, and the land register assigns certain rights to these parcels. The land register is therefore the central regulatory mechanism for land use in Germany. The federal states have now scanned and archived all register documents from the past 50 years, and Siemens Corporate Technology has supplied the essential system components. We have collected a huge amount of data: about 500 million pages of PDF files in total."

This ambitious digitization project poses daunting challenges. For example, the Siemens team had to develop software that automatically recognizes individual words, grasps the key content and discovers links within the scanned files, including typewritten documents, poor-quality copies and documents with multiple amendments. "Among other things, the software has to know which part of the document contains the name of the property owner, which part gives the size of the property, whether there is a loan on it and which bank granted it," Andrassy explains. "Solving these problems required experts to write programs very painstakingly. Our software can identify the requested information and automatically fill in the input mask. The operator only needs to check that the data is complete." The federal states are now preparing to issue a call for tenders for this huge filing project. "Once all the registers have been digitized, each state will set up its own user portal so that individuals and institutions with a legitimate interest, such as notaries, banks and tax authorities, can access the documents quickly and easily."
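The idea of "identify the requested information and automatically fill in the input mask," with a human operator checking completeness, can be illustrated with a deliberately simple Python sketch. It assumes the scanned page has already been turned into text by OCR and uses a few hypothetical regular-expression patterns (`FIELD_PATTERNS`, `InputMask` and the field names are all invented for illustration). The Siemens software described above is certainly far more sophisticated; this only shows the shape of the workflow.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for OCR text of a land-register entry. Real documents
# (typewritten pages, poor copies, multiple amendments) need far more robust
# recognition than a handful of regular expressions.
FIELD_PATTERNS = {
    "owner": re.compile(r"Owner:\s*(?P<value>[^\n]+)"),
    "area_m2": re.compile(r"Area:\s*(?P<value>[\d.,]+)\s*m2"),
    "lender": re.compile(r"Loan from\s*(?P<value>[^\n,]+)"),
}


@dataclass
class InputMask:
    """A form whose fields are pre-filled by the extractor."""
    values: dict = field(default_factory=dict)
    missing: list = field(default_factory=list)


def fill_input_mask(ocr_text: str) -> InputMask:
    """Fill every field whose pattern matches; record the others as missing."""
    mask = InputMask()
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            mask.values[name] = match.group("value").strip()
        else:
            mask.missing.append(name)
    return mask


def needs_operator_attention(mask: InputMask) -> bool:
    """Flag masks with missing fields so the operator checks them first."""
    return bool(mask.missing)


# Invented example entry (not real register data).
entry = """Parcel 17/4
Owner: Anna Example
Area: 1,250 m2
Loan from Example Savings Bank, EUR 50,000"""

mask = fill_input_mask(entry)
print(mask.values)
# → {'owner': 'Anna Example', 'area_m2': '1,250', 'lender': 'Example Savings Bank'}
print(needs_operator_attention(mask))  # → False
```

Keeping extraction and review as separate steps mirrors the division of labor Andrassy describes: the software proposes values, and the operator only confirms or completes them.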

Left: a handwritten land register from 1743, kept by the Brandenburg Archives. Right: a contemporary electronic land register, displayed by an employee of the Frankfurt District Court.

A misreading can cost millions. Andrassy's experience with the land-register project also applies to industry. "We are working on a software package that automatically captures the customer requirements in calls for tenders and then compares them with data from previously digitized documents," he says. Such tender documents are usually PDF files and often run to thousands of pages. "In the past, every technical specification, for example the maximum permissible noise level of a combined-cycle power plant after 4 p.m., had to be extracted manually and then evaluated by an expert."

However, the lists of requirements and technical specifications are usually very long, and misreading even a single sentence can later lead to losses of millions of euros. With this in mind, experts in Munich have developed a reliable search technology that detects every change made and notifies the user. The ultimate goal is for the program to interpret technical specifications as semantic objects. Andrassy explains: "The software we have developed works in three stages, which we call 'tender search,' 'bid comparison' and 'tender tracking.' The first stage is a very efficient process that lets users find the technical specifications in the tender documents. In the second stage, the software retrieves similar technical specifications from the files of previous projects, so you can draw on earlier evaluations and avoid errors. In the final stage, the software tracks the identified technical specifications through all new versions of the tender documents."
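The three stages Andrassy names can be sketched in Python using only the standard library. This is not the Siemens software: it swaps in simple stand-ins, namely a keyword heuristic ("shall"/"must") for finding specifications, `difflib.SequenceMatcher` string similarity for retrieving similar past specifications, and a sequence diff over the extracted specifications for tracking changes between tender versions. All function names, the 0.6 similarity threshold and the example texts are hypothetical.

```python
import difflib
import re

# Hypothetical heuristic: a sentence is a requirement if it contains one of these words.
REQUIREMENT_MARKERS = {"shall", "must"}


def split_sentences(text: str) -> list[str]:
    """Split on '.' or ';' followed by whitespace, so decimals like 1.5 stay intact."""
    parts = re.split(r"(?<=[.;])\s+", text.strip())
    return [p.rstrip(".;").strip() for p in parts if p.strip()]


def tender_search(document: str) -> list[str]:
    """Stage 1 sketch: extract sentences that look like technical specifications."""
    return [s for s in split_sentences(document)
            if REQUIREMENT_MARKERS & set(s.lower().split())]


def bid_comparison(spec: str, archive: dict[str, str], threshold: float = 0.6):
    """Stage 2 sketch: return (past_spec, past_evaluation, score) for the most
    similar archived spec, or None if nothing reaches the threshold."""
    best = None
    for past_spec, evaluation in archive.items():
        score = difflib.SequenceMatcher(None, spec.lower(), past_spec.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (past_spec, evaluation, score)
    return best


def tender_tracking(old_specs: list[str], new_specs: list[str]):
    """Stage 3 sketch: list specs that changed, were removed or were added
    between two versions of the tender documents."""
    labels = {"replace": "changed", "delete": "removed", "insert": "added"}
    matcher = difflib.SequenceMatcher(None, old_specs, new_specs)
    return [(labels[tag], old_specs[i1:i2], new_specs[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


# Invented tender texts: version 2 tightens the noise limit.
v1 = ("The plant shall not exceed 45 dB(A) at the site boundary after 16:00. "
      "The site is close to a river. Net output must be at least 400 MW.")
v2 = ("The plant shall not exceed 40 dB(A) at the site boundary after 16:00. "
      "The site is close to a river. Net output must be at least 400 MW.")

# Invented archive entry standing in for an evaluated spec from an earlier project.
archive = {"Noise shall not exceed 45 dB(A) at the boundary after 16:00":
           "Met in an earlier project with additional sound insulation (example)"}

specs_v1 = tender_search(v1)
specs_v2 = tender_search(v2)
print(specs_v1)
print(bid_comparison(specs_v1[0], archive))
print(tender_tracking(specs_v1, specs_v2))
```

Diffing the list of extracted specifications, rather than the raw text, means a change is reported per requirement, which is closer to the "track every identified specification" behavior described above than a line-by-line document diff would be.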

The advantages of this approach are obvious: automatic evaluation greatly speeds up the assessment process and helps identify errors made in similar projects as early as possible. The system also lets customers make last-minute changes, quickly analyze their consequences and integrate them into the project.

Combing through files in a flash. Full digitization is just the beginning. Libraries, government agencies and factories alike are creating vast amounts of digital knowledge that can be used in entirely new ways. Development efforts over the coming years and decades will therefore focus on software tools that can screen digital files in a flash, understand semantic associations, and categorize and regroup information. "For example, scholars will be able to determine quickly in which manuscript the word 'novel' first appears," Ceynowa says. "They won't have to dig through musty archives and check hundreds of documents in libraries around the world to get the answer. This will revolutionize certain research disciplines."
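Ceynowa's example, finding the manuscript in which a word first appears, maps naturally onto an inverted index. The Python sketch below builds, in one pass over a corpus, a map from each word to the earliest manuscript containing it, so later queries are a single dictionary lookup. It assumes the manuscripts are already transcribed to text and that each one can be assigned a single (possibly estimated) year; `Manuscript`, the function names and the tiny corpus are invented for illustration.

```python
import re
from typing import NamedTuple


class Manuscript(NamedTuple):
    title: str
    year: int   # assumption: one (possibly estimated) year per manuscript
    text: str


def build_first_occurrence_index(manuscripts):
    """Map every word to (year, title) of the earliest manuscript containing it.
    Built once, the index answers 'where does this word first appear?' instantly."""
    index = {}
    for ms in manuscripts:
        for word in set(re.findall(r"\w+", ms.text.lower())):
            earliest = index.get(word)
            if earliest is None or ms.year < earliest[0]:
                index[word] = (ms.year, ms.title)
    return index


def first_occurrence(index, word):
    """Look up a word (case-insensitive); returns None if no manuscript contains it."""
    return index.get(word.lower())


# Invented example corpus, deliberately not in chronological order.
corpus = [
    Manuscript("Manuscript B", 1702, "A novel kind of letter, and a novel in letters."),
    Manuscript("Manuscript A", 1650, "A chronicle of the town and its markets."),
    Manuscript("Manuscript C", 1688, "The first novel printed in this press."),
]

index = build_first_occurrence_index(corpus)
print(first_occurrence(index, "Novel"))    # → (1688, 'Manuscript C')
print(first_occurrence(index, "sonnet"))   # → None
```

A real system would also have to handle spelling variants, uncertain dating and handwriting recognition errors, which is where the semantic tools mentioned above come in.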

Andrassy adds: "Information such as case law and earlier medical diagnoses of rare diseases will be accessible much more quickly. However, human intelligence remains irreplaceable; intelligent data mining can only support people. In other words, there is still a long way to go before software can read a customer's PDF file, compare it with a database, and immediately know what needs to be built and how to build it."
