Guagua: PayPal's Hadoop iterative computing framework

Training mathematical risk-control models on large data sets has always been a challenge for PayPal in detecting fraudulent transactions. PayPal's risk-model training has gone through four stages. Decision tree: early on, PayPal used a simple decision tree model, mainly because the amount of training data was small and the results of a decision tree model are easy to explain. Logistic regression: as PayPal's business grew more complex, the risk-control model grew more complex as well, and logistic regression could easily handle larger data volumes and more features; and PayPal's line ...

Introduction to 14 practical open source PHP online document management systems

Webshare: Webshare is a web FTP resource manager developed with Ajax and PHP. You can use it to view, copy, modify, add, and share web documents. OpenGoo (PHP open source document management system): OpenGoo is an open source web office suite based on ExtJS and XAMP (Apache, PHP, MySQL). It lets any organization or individual create, share, collaboratively maintain, and publish all of their internal and external ...

5 outstanding open source audio editing tools

A reliable audio editor may not rank among the first tools you reach for, but it can be a big help to your business. Why? With an audio editor, you can add audio to your corporate website, create and edit podcasts, help sell your services or products, record and submit audio for broadcast commercials, and more. But the open source community now offers ...

Compression using LZO in Hadoop

Using the LZO compression algorithm in Hadoop reduces both the size of the data and the time spent reading and writing it on disk. LZO compresses data in blocks, which allows the data to be split into chunks that Hadoop processes in parallel. This feature makes LZO a very convenient compression format for Hadoop. An LZO file is not splittable by itself, so when the data is in text format, an LZO-compressed file used as job input is handled as a whole by a single map. But s ...

Hive has brought a real-time query mechanism to Hadoop

Apache Hive is a Hadoop-based tool that specializes in analyzing large, unstructured datasets using SQL-like syntax, helping existing business intelligence and business analytics researchers access Hadoop content. As an open source project developed by Facebook engineers and later recognized and contributed to by the Apache Foundation, Hive has now gained a leading position in big data analysis in business environments. Like other components of the Hadoop ecosystem, Hive ...

New technology bridges Oracle, Hadoop, NoSQL data storage

The use of big data lags far behind the ability to collect it. The main reason is that enterprise data is currently scattered across different systems or organizations. The key to a big data strategy is to mine the valuable information in all of these data systems more deeply and richly, so as to predict customer behavior more accurately and find business value. However, it is difficult to move this data into a single separate data store, and security and regulatory compliance cannot be guaranteed. Oracle Big Data SQL was launched to address these challenges. The following is a translation:

Reflections on and summary of setting up a Hadoop and HBase cluster

Over the past few days, at my teacher's request, I set up a Hadoop environment and an HBase environment on three machines. I ran into many problems along the way, and only today did everything basically run successfully. Configuration details are not discussed here, but the issues needing attention are listed for reference: time synchronization across the three machines, shutting down the firewall (iptables), and setting the hostname. For Debian, modify/...

Five advantages over x86: Isilon's support for Hadoop

When Isilon is mentioned, the first thing that comes to mind is its OneFS operating system, which is detached from the cluster storage system. When EMC acquired Isilon in 2010, the industry estimated that its sales would certainly explode. Within 4 years, Isilon is expected to "grow sales from millions of dollars to 1 billion dollars," said Nick Kirsch, vice president and chief technology officer of the EMC Isilon storage division. Nick Kirsch, vice president and chief technology officer, EMC Isilon Storage, in an article on E ...

Has the era of Spark enterprise applications really arrived?

Since the Apache Software Foundation announced the release of the open source platform Spark 1.0 on May 30, Spark has repeatedly made headlines and has been a focus of data experts. But is the era of Spark business applications really coming? Judging from the recent Spark Summit in the United States, we remain full of confidence in Spark technology. Spark is often considered a real-time processing environment. It can be applied to Hadoop, NoSQL databases, AWS, and relational databases, and can be used as an application programming interface (API), with programmers processing data through a common program ...

Deep understanding of Hadoop clusters and networks

Introduction: Networking in cloud computing and Hadoop is a relatively rarely discussed area. This article was written by Brad Hedlund, a technical expert at Dell who worked at Cisco for years, specializing in data centers, cloud networks, and related fields. The article material is based on the author's own research, experiments, and Cloudera training material. This article will focus on the system of Hadoop clusters ...

Motor 0.3.2 released, MongoDB Python driver

Motor 0.3.2 has been released. This version is compatible with MongoDB 2.2, 2.4, and 2.6, and requires at least PyMongo 2.7.1. This release fixes the socket leak in the "copy_database" method and rewrites "Let Us now Prais ...

Why am I still programming at an advanced age?

People expect you to give up some of the real work, such as programming, as you grow older and your personal circumstances become more constrained, and to move on to bigger tasks, such as managing a team or raising money. This is true in academia, where the "real professor" delegates the details and keeps only the "big picture" matters. In other words, the organization favors vertical collaboration: top managers manage some (cheaper) employees in a parallel structure. In research institutions, senior scientists put forward ideas, and the task of junior scientists is to implement these ideas. Over time, advanced science ...

Tell you what Hadoop is.

What is Hadoop? Hadoop is a software platform for analyzing and processing big data. It is an Apache open source software framework implemented in Java that performs distributed computing over massive data on large clusters of computers. The core of Hadoop's framework design is HDFS and MapReduce. HDFS ...

What are the major strategic motivations behind enterprises' contributions to open source projects?

Most companies gain a lot of competitive advantage by using open source software; that is beyond doubt. But on the other side, what benefits can a company derive from contributing to open source? While a company generally gains some "feedback" for acting altruistically, what businesses need is to get more feedback and benefits in this way. What is the economic motivation for companies like Google or Facebook to contribute millions of lines of code to open source? Let's take a look at the major strategic motives behind enterprises' contributions to open source projects. 1. Establish the standard ...

Enterprise IT departments are challenged to seek open source tools

With the development of information technology, IT has become a powerful weapon for enterprises facing market competition, and in recent years the standing of IT departments within enterprises has greatly improved. However, IT departments often face problems such as high labor costs, tight project schedules, rapid development, and a lack of uniform technical specifications. High labor cost is a problem that enterprise management cannot get around, especially since the average salary level of technical staff is higher than ...

Red Hat pushes to integrate MongoDB into its Linux system

To ease restrictions on organizations' big data services, Red Hat is integrating 10gen's MongoDB into the new identity management package of its RHEL suite. 10gen product marketing director Kelly Stirman said that "a more robust identity management system has a central infrastructure through which companies can handle identity management across many different types of applications." "With the unification of MongoDB and Red Hat's identity management system, shops that already use RHEL will find it easier to build and run using Mon ...

Diagram: which programming language has the most influence?

How influential is a programming language? Ramiro Gómez produced a programming language influence graph based on tens of thousands of data points, which shows that the most influential programming languages are C, Lisp, Pascal, Java, and Smalltalk. TIOBE publishes programming language rankings each month, which indirectly show which programming languages are more widely used. Have you considered the influence of these programming languages on each other? Which language has the greatest influence? Obviously, most developers will recognize ...

Setting the maximum concurrency for map and reduce

I. Environment: 1. Hadoop 0.20.2; 2. Operating system: Linux. II. Settings: 1. Because the machines in a Hadoop cluster cannot all have exactly the same configuration, on different node machines the maximum concurrency of map and red ...

Linux container runtime Docker goes open source

Source: CSDN | Author: Zhang Hongyue | Tags: open source, Linux, Docker. Summary: Docker is a cloud computing platform that uses Linux LXC, AUFS, the Go language, and cgroups to achieve resource isolation, and can very easily achieve file, capital ...

The technology behind Instagram's 5 legendary engineers (PPT)

Published 2013-03-28 22:13 | Source: CSDN | Author: Guo Shemei | Tags: PostgreSQL, Redis, Memcached, Instagram, open source, AWS. Summary: Instagram, the developer of a photo-sharing app for iOS and Android, has a unique operating philosophy and, with only 5 engineers on a team of 13 people in total, succeeded to their own 7 ....
