Application of Four Common Compression Formats in Hadoop

Hadoop currently supports four common compression formats: lzo, gzip, snappy, and bzip2. Drawing on practical experience, the author introduces the advantages, disadvantages, and application scenarios of each, so that readers can choose the right format for their situation. 1. gzip. Advantages: relatively high compression ratio with fast compression/decompression speed; supported by Hadoop itself, so gzip files can be processed as directly as plain text; has a Hadoop native library; most Li ...
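The ratio-versus-speed tradeoff the article describes can be illustrated with Python's standard library. This is only a sketch: Hadoop uses its own native codec libraries, not these modules, and the sample text below is an invented example.

```python
import bz2
import gzip

# Illustrative only: compare stdlib gzip and bz2 on repetitive sample
# text, mirroring the ratio-vs-speed tradeoff described for Hadoop
# codecs. (Hadoop itself links native codec libraries instead.)
data = b"the quick brown fox jumps over the lazy dog\n" * 2000

gz = gzip.compress(data)   # fast, good ratio
bz = bz2.compress(data)    # slower, often a tighter ratio on large text

print(len(data), len(gz), len(bz))
# Both codecs shrink the repetitive sample dramatically.
assert len(gz) < len(data) and len(bz) < len(data)
```

In a real cluster the choice also depends on splittability (bzip2 and indexed lzo are splittable; gzip is not), which this toy comparison cannot show.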

Some basic operations in Hadoop

First, a rough note on the difference between "hadoop fs" and "hadoop dfs": fs is the more abstract layer. In a distributed environment, fs is dfs; but in a local environment, fs is the local file system, and dfs is not available. 1. List HDFS files: ...


Red Hat Updates Its Open Source Software Development Tools

Red Hat (RHT) has updated its open source programming languages and development tools, the well-known Red Hat Software Collections, now available in beta 1.1. This development kit (released separately) complements its flagship product, Red Hat Enterprise Linux (RHEL). Red Hat Software Collections was released in fall 2013; its target audience is those who want to deploy the latest stable releases from a single source ...

Talking about OpenStack Data Center Application

Cheng Hui, founder of UnitedStack, is one of the earliest promoters and practitioners of OpenStack in China. In the "Cloud Computing Core Architecture Forum" he will give an in-depth explanation of how OpenStack builds a cloud computing data center and share his team's experience productizing OpenStack over the past year. OpenStack is the Linux of cloud computing, the preferred open source cloud platform for Internet companies and traditional enterprises alike. With the rapid development of cloud computing technology, traditional data centers are facing a new era.

How do Hadoop's file reads and writes work internally?

Reading a file: the internal mechanism is shown below. The client calls the open() method of a FileSystem object (for HDFS, a DistributedFileSystem object) to open the file (the first step in the diagram); DistributedFileSyst ...
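The flow sketched above, where the client first learns block locations and then streams blocks from datanodes, can be modeled with a toy simulation. This is not Hadoop's API: every class and method name below is invented for illustration.

```python
# Toy sketch (not Hadoop's real API): models the read flow described
# above -- the client asks the namenode for each block's datanode
# locations, then streams the blocks in order from the replicas.
class FakeNameNode:
    def __init__(self, block_map):
        # block_map: path -> ordered list of (block id, replica nodes)
        self.block_map = block_map

    def get_block_locations(self, path):
        return self.block_map[path]

class FakeClient:
    def __init__(self, namenode, datanodes):
        self.namenode = namenode
        self.datanodes = datanodes  # node name -> {block id: bytes}

    def open_and_read(self, path):
        chunks = []
        # Step 1: open() asks the namenode where the blocks live.
        for block_id, replicas in self.namenode.get_block_locations(path):
            # Step 2: read each block from the first available replica.
            node = replicas[0]
            chunks.append(self.datanodes[node][block_id])
        return b"".join(chunks)

nn = FakeNameNode({"/f": [("b0", ["dn1", "dn2"]), ("b1", ["dn2"])]})
dns = {"dn1": {"b0": b"hello "}, "dn2": {"b0": b"hello ", "b1": b"world"}}
print(FakeClient(nn, dns).open_and_read("/f"))  # b'hello world'
```

The real client also handles replica failover, checksums, and short-circuit reads, which the sketch omits.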

Alibaba Cloud ODPS: Vision, Technology, and Difficulties

In January 2014, Aliyun opened its ODPS service for public beta. In April 2014, all contestants in the Alibaba big data contest will debug and test their algorithms on the ODPS platform. In the same month, ODPS will also open more advanced functions into the public beta. InfoQ China recently interviewed Xu Changliang, technical leader of the ODPS platform, about topics such as the vision, technology, and implementation difficulties of ODPS. InfoQ: Let's talk about the current state of ODPS. What can this product do? Xu Changliang: ODPS officially in 2011 ...

Hadoop is not a panacea: identify the right scenarios and play to its strengths

Ye Qi says Hadoop is not a panacea and cannot solve all big data needs; it has shortcomings of its own in security, real-time processing, and SQL capabilities, so one must clearly identify the demand and usage scenario and work with its strengths and weaknesses. In the training he will share Hadoop system planning, design, construction, operations, and maintenance drawn from implementations in the telecommunications industry. - What attracted you to study Hadoop technology?

What problems does Hadoop 2.3.0 solve?

Hadoop 2.3.0 has been released; its biggest highlight is centralized cache management in HDFS. This feature greatly helps improve the execution efficiency and real-time performance of Hadoop and its upper-layer applications. This article discusses the feature from three aspects: principle, architecture, and code analysis. The main problem it solves: users can, according to their own logic, designate frequently used data or the data of high-priority tasks, so that it stays resident in memory and is not evicted ...

A detailed comparison of HPCC and Hadoop

The hardware environment usually consists of blade servers based on Intel or AMD CPUs built into a cluster. To reduce costs, discontinued, outdated hardware may be used. Nodes have local memory and disks and are connected through high-speed switches (usually Gigabit switches); if there are many cluster nodes, hierarchical switching can be used. The nodes in the cluster are peers (all resources can be reduced to the same configuration), but this is not required. Operating system: Linux or Windows. System configuration: HPCC clusters come in two configurations: ...

Five traps open-source projects must guard against

Nowadays, open source software, open source hardware, and open source concepts have become more and more popular. If you want to start a new open source project, here are five "traps" you have to guard against! Your support. If you plan to release an open source product, you need to understand deeply what "support" means. Do not expect the community to provide product support for you; everyone thinks what they have made is very important and will be used by millions (it won't be). Do not expect a flood of community volunteers in your support forums helping you answer questions. You have to take responsibility for your project ...

In-depth Analysis: Distributed Systems Transactional Classic Problems and Models

When we use a single server to provide data services in production, we usually run into two problems: 1) one server's performance is not enough to serve all network requests; 2) we always worry about that server going down, making the service unavailable or losing data. So we have to scale out, adding more machines to share the load and eliminate the single point of failure. Typically, we scale data services in two ways: 1) data partitioning: putting data in separate pieces ...
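The two scaling approaches just mentioned, partitioning the data across nodes and replicating it for fault tolerance, can be sketched minimally in Python. The node names and the simple hash-modulo scheme are illustrative assumptions, not any particular system's design.

```python
# Minimal sketch of the two scaling approaches described above:
# hash partitioning (sharding) spreads keys across nodes, and
# replication keeps extra copies so one node failure loses nothing.
# Node names and the modulo scheme are invented for illustration.
NODES = ["node0", "node1", "node2"]

def partition(key, nodes=NODES):
    """Pick the shard that owns this key (simple hash mod N)."""
    return nodes[hash(key) % len(nodes)]

def replicas(key, count=2, nodes=NODES):
    """Primary plus the next node(s) along the ring as backups."""
    start = hash(key) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(count)]

owner = partition("user:42")
copies = replicas("user:42")
assert copies[0] == owner       # primary is the partition owner
assert len(set(copies)) == 2    # the backup lives on a different node
```

Real systems refine this with consistent hashing so that adding a node does not remap most keys, a weakness of plain modulo partitioning.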

Six basic skills for landing your dream open-source job

Mark Atwood, head of HP's Open Source Engagement office, gave a student-facing talk entitled "How to Get One of These Awesome Open Source Jobs", offering some suggestions suited to those who want to work on open source projects ...

MapReduce programming in practice

What is MapReduce? MapReduce is a programming model for Hadoop (the big data processing environment). Since it is called a model, it has a fixed form: the MapReduce programming model is the fixed programming form for data analysis and processing in the Hadoop ecosystem. This fixed form is described as follows: ...
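The fixed map, shuffle, reduce form the article refers to can be sketched in plain Python. This mimics the model only; real Hadoop jobs implement Mapper and Reducer classes in Java, and the word-count example below is the customary illustration, not taken from the article.

```python
from collections import defaultdict

# Plain-Python sketch of the fixed MapReduce form: a map step emits
# (key, value) pairs, a shuffle groups them by key, and a reduce step
# folds each group into a single result.
def map_step(line):
    for word in line.split():
        yield word, 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    return key, sum(values)

lines = ["hadoop maps data", "hadoop reduces data"]
pairs = [kv for line in lines for kv in map_step(line)]
counts = dict(reduce_step(k, vs) for k, vs in shuffle(pairs).items())
print(counts)  # {'hadoop': 2, 'maps': 1, 'data': 2, 'reduces': 1}
```

The framework's value is that the shuffle runs across a cluster: map tasks execute where the data lives, and only the grouped pairs move over the network.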

Use Linux and Hadoop for distributed computing

People rely on search engines every day to find specific content in the vast Internet data, but have you ever wondered how these searches are performed? One approach is Apache's Hadoop, a software framework for processing huge amounts of data in a distributed way. One application of Hadoop is indexing Internet web pages in parallel. Hadoop is an Apache project supported by companies like Yahoo!, Google, and IBM ...

Learn 12 facts about Hadoop

Today, everyone knows Apache Hadoop. When Doug Cutting, a Yahoo search engineer, developed the open source library for creating a distributed computing environment and named it after his son's toy elephant, who could have imagined that one day it would occupy the top spot in "big data" technology? Although Hadoop has risen together with big data, I believe many users still do not understand it. At last week's TDWI Solutions Summit, TDWI Research Director and Industry Analyst Phili ...

Hadoop metadata merge exception and solution

Observing the StandbyNN logs these days, I found that after each fsimage merge, when the StandbyNN notifies the ActiveNN to download the merged fsimage, the following exception appears: 2014-04-23 14:42:54,964 ERROR ...


Hadoop crisis? 8 great alternatives to HDFS

HDFS (Hadoop Distributed File System) is a core subproject of Hadoop and the foundation of data storage management in distributed computing. To be honest, HDFS is a good distributed file system with many strengths, but it also has shortcomings, including: not suitable for low-latency data access; cannot efficiently store large numbers of small files; no ...

Hive 0.13 released, adding ACID features

The recently released Hive 0.13 uses ACID semantics in its transaction mechanism to guarantee atomicity, consistency, and durability at the partition level, and guarantees transaction isolation by enabling ZooKeeper or an in-memory lock mechanism. Data stream ingestion, slowly changing dimensions, and data restatement become possible as new use cases in this version. Of course, the new Hive still has some deficiencies; Hive ...
