New technologies upend a traditional discipline: genealogy uses big data to trace ancestors

Source: Internet
Author: User
Keywords: Cloud Computing, Big Data, Ancestry

Many genealogy enthusiasts find it interesting and rewarding to look up relatives on the Ancestry website through census records, birth certificates and other documents. However, it is not easy to turn an ancestor's records into a story, so showing the raw records to friends and relatives is less compelling.

The people behind the Ancestry.com service are aware of this. They are now putting their 4 PB database, which includes official records, user-submitted information and other data, to work in a new feature that gives users a computer-generated but editable summary of an ancestor's life.

Ancestry launched the service, called Story View, earlier this quarter for a small number of customers, and 10% of customers can now use it. Eric Shoup, the company's executive vice president of product, said in a recent interview that the company plans to keep refining Story View both before and after its official launch. Ancestry has made the feature more interactive by letting users adjust the image of a document on the page and edit the accompanying text.

How it works

Story View builds on Ancestry's well-established tools for mining family data, including handwritten records. Sometimes, though, only the key fields of a record, such as name and place of residence, have been extracted. Users can open the handwritten record, navigate to the place where a relative is described, and view the unprocessed data, such as that person's occupation.

Ancestry is trying to extract more information from handwritten records by having human "keyers" read them and transcribe them into searchable text. Street addresses have been added this way, and other fields will be added later. In the meantime, more record sources will become available as Ancestry continues to expand its database.
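To illustrate why transcribed fields matter, here is a minimal sketch of how keyed text can be made searchable with a simple inverted index. The record layout, field names and sample values are hypothetical and are not taken from Ancestry's systems.

```python
from collections import defaultdict

# Hypothetical keyed records; field names and values are illustrative only.
records = [
    {"id": 1, "name": "John Carter", "residence": "Provo, Utah",
     "street_address": "12 Elm Street"},
    {"id": 2, "name": "Mary Carter", "residence": "Ogden, Utah"},
]


def build_index(records):
    """Map each lowercase token to the set of record ids that contain it."""
    index = defaultdict(set)
    for record in records:
        for field, value in record.items():
            if field == "id":
                continue
            for token in value.lower().replace(",", " ").split():
                index[token].add(record["id"])
    return index


def search(index, query):
    """Return the ids of records that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


index = build_index(records)
```

In this sketch, a newly keyed field such as the street address becomes searchable as soon as it is added to a record, without any change to the search code.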

To generate a summary paragraph from information extracted from multiple documents, Ancestry turned to Narrative Science. Founded in 2010, Narrative Science specializes in machine-generated, publication-ready copy (the fabled technology that will put us all out of work). Its technology was used earlier for reports on sporting events and earnings reports for listed companies, and it is now being applied to personal records.

Reed McGrew, chief development officer for Ancestry's narrative and context services group, said that when Ancestry first adopted Narrative Science, the system could only produce output in batches. It was designed to generate large numbers of financial reports at once, which was not what Ancestry wanted to offer, because batch processing is very slow.

Within months, Narrative Science developed a new API that works at a finer granularity. McGrew said: "They generate the narratives for a single user at a time."
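The difference between batch output and per-user output can be sketched as follows. This is not Narrative Science's API; the function names and the one-sentence generator are placeholders invented for illustration.

```python
def generate_summary(person):
    """Placeholder narrative generator: returns one sentence for one person."""
    return f"{person['name']} was born in {person['birth_year']}."


def generate_batch(people):
    """Batch style: no summary is available until all of them are produced."""
    return {p["id"]: generate_summary(p) for p in people}


def generate_for_user(people_by_id, person_id):
    """Finer-grained style: produce one summary at the moment it is requested."""
    return generate_summary(people_by_id[person_id])


# Hypothetical sample data.
people = [
    {"id": "a1", "name": "Anna Miller", "birth_year": 1880},
    {"id": "b2", "name": "Carl Miller", "birth_year": 1878},
]
people_by_id = {p["id"]: p for p in people}
```

With the per-user call, the cost of a request depends on one person's records rather than on the size of the whole database, which is what an interactive web page needs.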

Ancestry's expertise is genealogical information, and the company's editors supply the editorial standards, or "rules," for the narratives, along with the format used to send and receive data. McGrew gave an example of an Ancestry rule: "If you come across a child who is only 10 years younger than the mother, that is more likely a typo. It does happen in reality, but in most cases it does not, so we treat it as an error."
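A rule like the one McGrew describes could be expressed as a simple plausibility check. The sketch below is an assumption about how such a rule might look; the function name, the exact threshold and the message text are hypothetical.

```python
def check_mother_child_gap(mother_birth_year, child_birth_year, suspicious_gap=10):
    """Flag a record when the child is born suspiciously soon after the mother.

    The threshold of 10 years follows the example in the article; the real
    editorial rules may be different.
    """
    gap = child_birth_year - mother_birth_year
    if gap <= suspicious_gap:
        return f"Suspect record: child born only {gap} years after mother"
    return None


# A gap of 10 years is flagged; a gap of 25 years passes.
flagged = check_mother_child_gap(1850, 1860)
accepted = check_mother_child_gap(1850, 1875)
```

Treating the rare real case as an error is a deliberate trade-off: it suppresses a few true facts in exchange for keeping many transcription mistakes out of the generated narrative.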

[Image: a record containing information about one of Shoup's relatives]

In Story View, an ancestor's life summary is illustrated with scaled-down images of the documents rather than discrete fields of structured text. Next to each image, Ancestry shows narrative text generated from the information in the document. Once Ancestry has determined that a set of records all relate to one person, it selects specific facts according to its editorial rules and assembles them into complete sentences. Once these document-based narratives appear in the browser, users can edit and save them before sharing.
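The step of selecting facts and assembling them into sentences can be sketched with simple templates. This is an illustrative stand-in, not the method Ancestry or Narrative Science uses; the fact types, templates and sample data are all hypothetical.

```python
# Hypothetical sentence templates, one per fact type.
TEMPLATES = {
    "birth": "{name} was born in {place} in {year}.",
    "residence": "In {year}, {name} lived in {place}.",
    "occupation": "In {year}, {name} worked as a {value}.",
}


def build_narrative(name, facts):
    """Order facts by year and render each known fact type as a sentence."""
    sentences = []
    for fact in sorted(facts, key=lambda f: f["year"]):
        template = TEMPLATES.get(fact["type"])
        if template is None:
            continue  # skip fact types with no editorial template
        sentences.append(template.format(name=name, **fact))
    return " ".join(sentences)


# Hypothetical facts extracted from several records for one person.
facts = [
    {"type": "residence", "year": 1910, "place": "Chicago, Illinois"},
    {"type": "occupation", "year": 1910, "value": "carpenter"},
    {"type": "birth", "year": 1880, "place": "Ohio"},
]

narrative = build_narrative("Anna Miller", facts)
```

Because the result is plain text, it fits the editing step described above: the user can change the generated sentences before saving or sharing them.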

Difficult to share

Scott Sorenson, Ancestry's CIO, said that the challenge is not creating and storing the new user data and web pages. Storage keeps getting cheaper, and accurate transcription of handwritten records is not a problem either. The keyers are often based in China. The Chinese character set is much larger than the Latin alphabet, he said, and they are good at keying these records.

The really hard part is ensuring high availability, delivering the right documents and text to millions of users, and making sure the site does not crash at traffic peaks. After all, one of the goals of Story View is to get more people to browse the content on the site and eventually sign up.
