Dating site eharmony uses personality tests to match marriage partners

Source: Internet
Author: User
Keywords: cloud computing, open source, Docker, Hadoop, OpenStack, eharmony

[Editor's note] eharmony, one of the largest dating sites in the United States, matches marriage partners through personality tests. Every eharmony user must first answer hundreds of questions carefully designed by psychologists. The answers let eharmony assess dozens of dimensions of the user's personality and, on that basis, introduce suitable matches. As technologies such as OpenStack, Hadoop, Spark, and Docker mature, CTO Thod Nguyen says the company is actively embracing them.

The following is the translation:

The plan began in 2013 and is expected to be complete by the end of 2015, Nguyen told me in a recent interview. One big reason for turning eharmony's existing virtualization-centric data center into a private cloud environment is the desire to run the open source OpenStack cloud software. This will give the company greater flexibility in scaling and configuring the infrastructure, including virtual servers and storage, that supports its website and mobile apps.

eharmony's business runs on Cisco UCS blade servers (a product line that has quietly become a multibillion-dollar business for Cisco), and the company now wants to cut its web servers from the current 1,000 machines to half that number, he said. The company also manages about 2,000 other devices.

Cisco Blade Server

eharmony has also evaluated the open source CloudStack technology backed by Citrix Systems, but Nguyen says OpenStack appears to be more scalable. Although OpenStack has the support of many large IT companies and a larger user base, this did not affect their assessment.

"As part of a software-defined storage solution, OpenStack's Swift component gives you more flexibility in shared storage," Nguyen adds. "Our real ultimate goal is to be able to grow the storage scale exponentially with minimal operating costs."

But Nguyen says eharmony's new approach to operational efficiency will not stop at OpenStack. The company is also considering the popular Docker container technology to simplify the deployment and management of distributed applications, and in some cases it may "explore public cloud solutions." eharmony has already used AWS for proof-of-concept work and disaster recovery (DR), he added.

"Using the Docker concept, we can easily have a DR solution running on demand in the public cloud without having to invest in a DR data center, and investing in a DR data center is very, very expensive for us," Nguyen said.

Thod Nguyen

eharmony also collects and analyzes a large amount of data, which Nguyen expects to reach the petabyte level in the next few years. Its previous Hadoop environment, which runs on a 512-node SeaMicro appliance, has become an obstacle to expansion and innovation. Each workload requires its own cluster, Nguyen explains, which means adding another identical appliance and replicating the same data again.

Moving to a single cluster running the YARN resource management framework will bring the company several benefits. First, it can host multiple workloads and processing frameworks on the same set of servers, sharing the same file system. It can also scale out on demand, rather than 512 nodes at a time.
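The difference between the two layouts can be illustrated with a rough back-of-the-envelope sketch. Everything in it is hypothetical: the workload names, the 200 TB dataset size, and the 600-node requirement are made-up figures, not eharmony's numbers, and the model ignores the file system's own internal replication. It only shows why one cluster per workload multiplies stored data, and why fixed 512-node increments overshoot compared with adding nodes on demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """A hypothetical workload that reads the common dataset."""
    name: str


def dedicated_footprint_tb(workloads, dataset_tb):
    """One cluster per workload: every cluster stores its own copy."""
    return dataset_tb * len(workloads)


def shared_footprint_tb(workloads, dataset_tb):
    """One shared cluster: a single copy on the shared file system."""
    return dataset_tb if workloads else 0


def nodes_provisioned(required_nodes, step):
    """Round the required node count up to the next multiple of `step`."""
    return (required_nodes + step - 1) // step * step


# Hypothetical figures, chosen only for illustration.
WORKLOADS = [Workload("matching"), Workload("pricing"), Workload("reporting")]
DATASET_TB = 200

if __name__ == "__main__":
    print("dedicated clusters:", dedicated_footprint_tb(WORKLOADS, DATASET_TB), "TB")
    print("shared cluster:    ", shared_footprint_tb(WORKLOADS, DATASET_TB), "TB")
    print("512-node steps:    ", nodes_provisioned(600, 512), "nodes")
    print("on-demand scaling: ", nodes_provisioned(600, 1), "nodes")
```

Under these assumed figures, three dedicated clusters hold three copies of the data while the shared cluster holds one, and a 600-node requirement forces a second full appliance when capacity can only be added 512 nodes at a time.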

A shared Hadoop cluster makes business sense, Nguyen explains. eharmony can launch new big data applications more easily and with less investment, and YARN means eharmony can begin to look at newer technologies, such as Spark for faster machine learning workloads and Storm for stream processing.

While the company, like most dating sites, is best known for its matching algorithms, Nguyen says a better data infrastructure will also yield better models for the rest of the business, including price optimization and user experience.

Architecture of YARN on Hadoop (Hortonworks)

"Our goal is to create a data product that can really deliver the right feature set, one that is very appealing to the customer," he said. "We should give them the products they want before they ask for them."

eharmony's technical transformation, especially on the data side, is no coincidence. In the past year or two, technologies such as Spark, Storm, and Kafka have begun to reach critical mass, making it more feasible to analyze data interactively or in real time and to iterate on machine learning models regularly.

"I think big data has been hyped too much," Nguyen said. "Many people think they're doing big data, but they're just storing the data and they're not actually doing anything with it."

Original article: Why eHarmony is rebuilding itself atop Hadoop and (probably) OpenStack (Editor: Wei)
