A closer look at Hadoop disk deployment

There are different types of nodes in a Hadoop cluster, and they have different disk requirements. The master node focuses on storage reliability, while data nodes require better read/write performance and larger capacity. In a virtual cluster, storage (datastores) can be divided into two types: local and shared.

The rise of Hadoop from a small elephant to a giant

With the rapid development of the Internet, the mobile Internet, the Internet of Things, and cloud computing, the explosive growth of data in every industry is once again upending the cloud era; the big data age of the information explosion has sounded its horn. How can users extract information that is useful to them from this vast sea of data? This requires big data analysis techniques and tools, and traditional business intelligence (BI) tools cannot withstand such a large volume of data. Speaking of big data, we have to mention the related technical terms: Hadoop, MapReduce, HBase, NoSQL, etc. ...

A deep dive into Hadoop node deployment strategies

Each Hadoop cluster includes more than one node. These nodes can be grouped into several groups, such as the master group, the worker group, and the client group. The nodes in each group have different resource requirements, such as CPU, memory, and storage. A virtual data center (DataCenter) contains one or more virtual clusters (Cluster) ...

A few things you need to know about MongoDB

Henrique Lobo Weissmann is a Brazilian software developer and co-founder of itexto, a consulting firm. Recently, Henrique blogged about MongoDB, and some of his points deserve our attention, especially from developers who are planning to use MongoDB. ...

Hadoop Capability Test Atlas

The Hadoop capability test map diagrams the Hadoop technical framework. It covers the main areas of Hadoop's current applications and can serve as a useful tool for gauging a Hadoop developer's current ability and level. If you can articulate the functionality, application scenarios, and design architecture of each of these technical frameworks, congratulations: you've officially stepped into Hadoop application development ...

Installing Hadoop under Ubuntu

This is an experimental setup on my own notebook; if you are unfamiliar with the process, consider installing a pilot version on your own computer first, and only then consider deploying to production machines. First, install the VMware Workstation virtual machine software on your own computer; after installation, install the Ubuntu operating system in a virtual machine. I installed Ubuntu 11.10, which can be verified with the lsb_release -a command. If you do not have this command, you can use the following command to install the sud ...
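The setup steps above can be sketched as a short shell session inside the Ubuntu guest. This is a setup sketch, not the article's exact commands: the package names assume a Debian/Ubuntu system, `lsb-release` is the package that provides `lsb_release`, and `openjdk-6-jdk` is an assumption based on the Java prerequisite for Hadoop on Ubuntu releases of that era.

```shell
# Inside the Ubuntu guest running under VMware Workstation:

# Check the release (Ubuntu 11.10 in the article's case)
lsb_release -a

# If lsb_release is missing, install the package that provides it
sudo apt-get update
sudo apt-get install -y lsb-release

# Java is a prerequisite for Hadoop (package name is an assumption;
# Ubuntu releases of that era shipped OpenJDK 6)
sudo apt-get install -y openjdk-6-jdk
```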

What is structured data? What is semi-structured data?

Relative to structured data (data stored in a database, whose logic can be expressed with a two-dimensional table structure), data that cannot conveniently be represented in a database's two-dimensional logical tables is called unstructured data. It includes Office documents of all formats, text, pictures, XML, HTML, various kinds of reports, images, and audio/video information, and so on. An unstructured database is a database with variable field lengths, where each field's record can be made up of repeatable or non-repeatable child fields; it can handle not only structured data (such as numbers and symbols) but also ...
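The distinction can be made concrete with a short Python sketch (the records and field names are illustrative): the structured rows share one fixed schema, while the semi-structured documents carry variable, optionally repeating fields that do not fit a fixed two-dimensional table.

```python
import json

# Structured data: every record has the same fixed columns,
# as in a two-dimensional database table (name, age, city).
structured = [
    ("alice", 30, "NYC"),
    ("bob",   25, "LA"),
]

# Semi-structured data: each document carries its own (variable)
# set of fields, some of which may repeat -- awkward to force
# into a fixed two-dimensional table.
semi_structured = [
    {"name": "alice", "tags": ["hadoop", "hbase"]},
    {"name": "bob", "age": 25, "tags": ["mongodb"],
     "phones": ["555-0001", "555-0002"]},
]

# A self-describing format like JSON serializes the variable
# structure naturally.
print(json.dumps(semi_structured[1], sort_keys=True))
```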

Companies deploying Hadoop need to think carefully

In recent years, Hadoop has received a lot of praise and has been hailed as "the" big data analysis engine. For many people, Hadoop means big data technology. But in fact, this open-source distributed processing framework may not be able to solve every big data problem. Companies that want to deploy Hadoop need to think carefully about when to apply Hadoop and when to apply other products. For example, Hadoop is more than sufficient for large-scale unstructured or semi-structured data, but how quickly it handles small datasets is less well known. This limits the Ha ...

An overview of Hadoop cluster deployment models

vSphere Big Data Extensions (BDE) supports multiple deployment models for building Hadoop clusters. Combined storage/compute model: deploy storage nodes (Data Node) and compute nodes (Task Tracker) in the same virtual machine. This is the most straightforward and simple deployment model; it can be used to validate concepts and to carry out data-processing tasks on small clusters. Compute-only model: deploy only compute nodes (Job Tracker and Task Tracker) ...

Cloudera aims to make Hadoop a universal data solution

Cloudera's idea of Hadoop as an enterprise data hub is bold, but the reality is quite different. Hadoop has a long way to go before it eclipses other big data solutions. When you have a big enough hammer, everything looks like a nail. This is one of the many potential problems that Hadoop 2.0 faces. For now, the biggest concern for developers and end users is that Hadoop 2.0 has massively reworked the framework for big data processing. Cloudera plans to build Hadoop 2.0 ...

11 little-known but useful Linux commands

The Linux command line attracts most Linux enthusiasts. A typical Linux user knows roughly 50-60 commands for handling daily tasks. Linux commands and their variations are the most valuable treasures for Linux users, shell script programmers, and administrators. Some Linux commands are little known, yet they are handy and useful whether you're a novice or an advanced user. The purpose of this article is to introduce some of these lesser-known Linux commands, which are sure to efficiently ...
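The article's own list is truncated here, but a few genuinely lesser-known commands illustrate the idea. These examples are illustrative and not necessarily among the eleven the article covers; `tac`, `factor`, and `nl` ship with GNU coreutils, and `rev` with util-linux.

```shell
# rev: reverse each line character by character
echo "hello" | rev            # olleh

# tac: print lines in reverse order ("cat" backwards)
printf "1\n2\n3\n" | tac      # 3, then 2, then 1

# factor: print the prime factorization of a number
factor 84                     # 84: 2 2 3 7

# nl: number the lines of its input
printf "a\nb\n" | nl
```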

HBase directory Structure

The first part of the file is the write-ahead log, handled by HLog; these log files are saved in the .logs folder under the HBase root directory. The .logs directory contains a separate folder for each HRegionServer, with several HLog files under each folder (because of log rotation). Every HRe ...
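The layout described above can be sketched as an HDFS listing against a live cluster. This is a hypothetical sketch following HBase 0.9x-era conventions; the hostnames, timestamps, and exact file names are invented for illustration and vary by HBase version.

```shell
# WAL layout under the HBase root directory: one subfolder per
# HRegionServer, each holding the current HLog plus rolled ones.
hadoop fs -ls -R /hbase/.logs
# /hbase/.logs/rs1.example.com,60020,1380000000000/
#     rs1.example.com%2C60020%2C1380000000000.1380000050000   <- current HLog
#     rs1.example.com%2C60020%2C1380000000000.1379990000000   <- rolled HLog
# /hbase/.logs/rs2.example.com,60020,1380000000000/
#     ...
```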

Time-series data processing: MongoDB vs. Cassandra

MongoDB and Cassandra are the two most popular NoSQL databases. MongoDB is the undisputed king of popularity in the NoSQL field, while Cassandra has long occupied the top spot in column-oriented storage, though for many reasons it has drawn less attention than HBase. Recently, MyDrive Solutions' operations and frame ...

Hadoop data analysis and processing technology

The analysis of data is the core of big data processing. Traditional data analysis is aimed mainly at structured data, and the general process is as follows: first, a database stores the structured data; then a data warehouse is constructed; then the corresponding cubes are built and online analytical processing is carried out as needed. This process is very efficient when dealing with relatively small amounts of structured data. For big data, however, analysis faces three immediate problems: high-volume data, multi-format data, and analysis speed. These make standard storage technology unable to hold big data, so a more suitable analysis platform must be introduced for big data analysis. ...

China Hadoop summit triggers a Hadoop 2.0 storm

Hadoop has been around for 7 years since its birth in 2006. Who holds Hadoop technology globally today? You must be thinking of Hortonworks and Cloudera; otherwise you would be embarrassed to say you know Hadoop. As the largest Hadoop technology summit in the Greater China region this year, the China Hadoop summit will not be overlooked by these two vendors. The reporter has learned from the conference committee that Hortonworks' Asia-Pacific technical director Jeff Markha ...

Tell your IT story from open source

Recently our team used our distributed computing platform to do the following: we crawled a large amount of GitHub data and, through analysis, extracted information about developers and projects over the past year, obtaining the interesting findings below. We share them here; the data is authentic, with no manual interference. (Thanks to all members of the iveely team.) First data point: global IT talent distribution ...

Cassandra: a decentralized structured storage system

Cassandra is a distributed storage system that can manage very large amounts of structured data spread across many commodity server nodes, providing a highly available service with no single point of failure. Cassandra's goal is to run on hundreds of commodity nodes, possibly distributed across different data centers. At this scale, components large and small fail frequently. Cassandra ...

Recent advances in SQL on Hadoop: 7 related technologies

The greatest fascination of big data is the new business value that comes from technical analysis and mining, and SQL on Hadoop is a critical direction. CSDN Cloud specifically invited Liang to write this article, an in-depth elaboration of the 7 latest technologies. The article is long, but I believe you will come away with something. On December 5-6, 2013, the seventh China Big Data Technology Conference (BDTC 2013), with "application-driven architecture and technology" as its theme, ... before the meeting, ...

Apache HBase 0.96 release, support for Windows platform

The Apache Software Foundation recently announced the release of HBase 0.96. According to the development team, the release fixes more than 2,000 issues and includes a large number of functional improvements. HBase (Hadoop Database) is a distributed, column-oriented open-source database, an open-source implementation of Google BigTable, and is Apac ...

HBase Shell Basics and Common commands

HBase is a distributed, column-oriented open-source database, rooted in the Google paper "Bigtable: A Distributed Storage System for Structured Data." HBase is an open-source implementation of Google BigTable, using Hadoop HDFS as its file storage system and Hadoop MapReduce to handle ...
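A minimal HBase shell session illustrates the common commands such an article typically covers. This is a sketch to be run against a live cluster; the table name `users` and column family `info` are hypothetical.

```shell
# Start the interactive shell
hbase shell

# Inside the shell:
#   create a table with one column family
create 'users', 'info'
#   insert a cell: row key, column (family:qualifier), value
put 'users', 'row1', 'info:name', 'alice'
#   read a single row
get 'users', 'row1'
#   scan the whole table
scan 'users'
#   a table must be disabled before it can be dropped
disable 'users'
drop 'users'
```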


