Original address: http://blog.fens.me/hadoop-family-roadmap/
Sep 6, Tags: hadoop, hadoop family, roadmap
Hadoop Family Learning Roadmap
The Hadoop family series of articles mainly introduces the Hadoop family of products. Commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, ZooKeeper, Avro, Ambari, and Chukwa; newer additions include YARN, HCatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, etc. Since 20
Hadoop family products: commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, ZooKeeper, Avro, Ambari, and Chukwa; newer additions include YARN, HCatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, etc. Since 2011, China has entered a surging era of big data, and the family of software represented by Hadoop has come to occupy a vast share of big data processing. Open source industry and vendors, all data software, no one to Hadoop
. Use pimin_net before creating the table.
USE pimin_net;
CREATE TABLE users (id int, user_name varchar, PRIMARY KEY (id));
This creates a users table with only two fields for the sake of simplicity; the statement looks much like Oracle or MySQL.
Third, CRUD on the table
Now that we have a users table, we insert some data into it, then query, update, and delete it. Note that CQL string literals take single quotes.
INSERT INTO users (id, user_name) VALUES (1, 'China');
INSERT INTO users (id, user_name) VALUES (2, 'Taiwan');
SELECT * FROM users;
Results:
cqlsh:pimin_net> SELECT * FROM users
a relational database?
The difference between HBase and relational databases.
is actually a matter of the respective pros and cons of relational databases and HBase.
The flaws of relational databases:
1). Difficult to scale
2). Complex to maintain
HBase addresses the scalability problem: it gains linear scalability by simply adding nodes. It does not support SQL.
The difference between HBase and an RDBMS.
1). Table design: an HBase table can be very tall and very wid
to OLTP. According to the CAP theorem, however, a traditional RDBMS achieves strong consistency through strict ACID transactions, which comes at a significant cost to the availability and scalability of the system. Many current NoSQL products are eventually consistent systems that sacrifice part of consistency for high availability (HBase itself, it should be noted, provides strong consistency for single-row operations). As I said above, what is column-oriented storage? HBase, Cassandra,
compaction. All SSTables under each column family are merged into one large SSTable.
This model avoids the I/O inefficiency of random writes, effectively alleviates the write amplification problem of B-tree indexes, and greatly improves write efficiency. The LSM-tree model is widely used in NoSQL systems, including HBase, Cassandra, LevelDB, RocksDB, and other key-value stores. Of course, the LSM-tree also has its own flaws:
First, the operation
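The write path and compaction described in this excerpt can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how HBase or Cassandra implement it: the class name TinyLSM, the memtable_limit parameter, and the use of in-memory lists in place of on-disk SSTables are all hypothetical and chosen only for illustration.

```python
class TinyLSM:
    """Toy LSM-tree: a mutable memtable plus immutable sorted runs ("SSTables")."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}                    # recent writes, held in memory
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.sstables = []                    # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        # Writes only touch the memtable, so no random disk I/O is needed.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as one sorted, immutable run.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the memtable first, then the runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge every run into one large run; newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(k, v)

runs_before = len(db.sstables)  # two flushed runs, both containing key "a"
db.compact()
runs_after = len(db.sstables)   # a single merged run
```

A read before compaction may have to scan several runs to find a key, which hints at one cost of the model; after compaction only one run remains and the stale value of "a" is gone.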
Introduction to Apache Gora in Nutch 2.0
-----------------
1. What is Apache Gora?
Apache Gora is an open-source ORM framework that provides an in-memory data model and data persistence for big data. Currently, Gora supports column stores, key-value stores, document stores, and RDBMSs. It also supports big data analysis with Apache Hadoop.
2. Why use Apache Gora?
Although there are many good relational database ORM frameworks on the m
Apache Cassandra is a distributed, scalable, highly available, and fault-tolerant NoSQL database. It was first developed by Facebook and later contributed to the Apache Foundation. Cassandra's data model is inspired by Google Bigtable, and its distributed model is inspired by Amazon Dynamo. If you want to learn more about Cassandra's design details, refer to a Facebook paper. This article describes
scenarios.
While using MongoDB, I found that although it performs well and embodies the basic characteristics of NoSQL, in real application scenarios it still has many functional gaps and room for performance improvement.
1. Performance
First I want to talk about the performance of MongoDB. As a NoSQL database, MongoDB is naturally ahead of an RDBMS in the performance of many operations, while in comparison w
myself also apply it to some practical application scenarios.
While using MongoDB, I found that although its performance is very good and it embodies the basic characteristics of NoSQL very well, in real application scenarios it still has many functional gaps and room for performance improvement.
1. Performance
First I want to talk about the performance of MongoDB. As a NoSQL database, MongoDB is naturally ahead in the performance of many operations, such as reads, writes, and queries
Copied from: http://blog.csdn.net/y_h_t/article/details/11917531
All runtime settings in Cassandra are configured in the configuration file cassandra.yaml. The following explains the configuration items in Cassandra:
cluster_name
Sets the name of the Cassandra cluster. In a Cassandra cluster, each server must have the corresponding name of the cluster. If the names are inc
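As a rough illustration of the cluster_name requirement, the sketch below compares the cluster_name value found in several copies of cassandra.yaml. It is only a sketch under assumptions: the helper names, the host names, and the sample file contents are hypothetical, and the parser handles only a plain top-level "cluster_name: value" line (no inline comments), not full YAML.

```python
def read_cluster_name(yaml_text):
    """Return the top-level cluster_name value from cassandra.yaml text, or None."""
    for line in yaml_text.splitlines():
        # Only an unindented "cluster_name:" line counts as the top-level key.
        if line.startswith("cluster_name:"):
            value = line.split(":", 1)[1].strip()
            return value.strip("'\"")  # drop optional surrounding quotes
    return None


def names_consistent(configs):
    """Check that every host's config declares the same cluster_name."""
    names = {host: read_cluster_name(text) for host, text in configs.items()}
    return len(set(names.values())) == 1, names


# Hypothetical sample configs for three nodes; node3 has a different name.
sample = {
    "node1": "cluster_name: 'Test Cluster'\nnum_tokens: 256\n",
    "node2": "# comment line\ncluster_name: 'Test Cluster'\n",
    "node3": "cluster_name: 'Other Cluster'\n",
}

ok, found = names_consistent(sample)
```

Here ok is False because node3 declares a different name, and found maps each host to the name it declares so the odd one out can be spotted.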
model.
Here is a list of NoSQL categories, along with the products I worked with while writing this article:
Key-value storage: Oracle Coherence, Redis, Kyoto Cabinet
Bigtable-like storage: Apache HBase, Apache Cassandra
Document database: MongoDB, CouchDB
Full-text index: Apache Lucene, Apache Solr
Graph database: Neo4j, FlockDB
Conceptual techniques
This section focuses on the basic principles of th
are some data use cases. We may have seen them when browsing the LinkedIn Web page.
The updated personal data can appear on the recruitment search page in almost real time.
Updated personal data can appear on the contacts page in almost real time.
A shared update can appear on the news feed page in near real time.
The data is then propagated to other read-only pages, such as "people you may know", "people who have read my materials", and "related searches".
. Delete the secondary indexes of a table and rewrite queries so that they use only the primary key index; 7. Database sharding; this method is complex and costly to maintain, it is costly to re-split when the data size grows again, and its further scalability is limited;
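To make the re-splitting cost of sharding concrete, here is a minimal sketch, assuming a simple modulo placement rule that the excerpt does not specify. The function names and the key range are hypothetical; real deployments may use other schemes (for example consistent hashing) that move fewer rows.

```python
def shard_for(key, shard_count):
    """Pick a shard by modulo; an assumed, deliberately naive placement rule."""
    return key % shard_count


def moved_keys(keys, old_count, new_count):
    """Keys whose shard changes when the number of shards changes."""
    return [k for k in keys
            if shard_for(k, old_count) != shard_for(k, new_count)]


keys = range(1000)                  # hypothetical row ids
moved = moved_keys(keys, 4, 5)      # grow from 4 shards to 5
moved_fraction = len(moved) / len(keys)
print(f"{len(moved)} of {len(keys)} rows change shard ({moved_fraction:.0%})")
```

With this rule, adding a single shard relocates most of the rows, which is one way to see why re-splitting becomes expensive as the data grows.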
II. RDBMS and NoSQL
In actual use, as long as the architecture is proper, relational databases can fully serve data storage applications at various scales, for example, Facebook and Google e
Nutch 2.1 extends the storage layer through Gora, optionally using any of HBase, Accumulo, Cassandra, MySQL, DataFileAvroStore, or AvroStore to store data, but some of these are immature. In my repeated tests I found that, overall, Nutch 2.1 performs much worse than Nutch 1.6 and, most importantly, cannot run stably over long periods. Nutch 1.6 uses the Hadoop Distributed File System (HDFS) as its storage, which is stable and reliable. Here are a few different ways to
/*
 * Text:
 *      read_write_data.cpp
 *
 * Description:
 *      Create a context with a table, then try to read and write data to
 *      the Cassandra cluster.
 *
 * Documentation:
 *      Run with no options.
 *      Fails if the test cannot create the context, create the table,
 *      read or write the data.
 *
 * License:
 *      Copyright (c) 2011 Made to Order Software Corp.
 *
 *      http://snapwebsites.org/
 *      contact@m2osw.com
 *
 *      Permission is hereby granted, free of charge, to any person obtaining a
 *      copy of this s
flexibility, and are very suitable for big data storage and processing. Compared with traditional relational databases, these NoSQL databases have great performance advantages. However, they may not be the best fit for you. Most common applications can still be developed with traditional relational databases, and NoSQL databases are still not suitable for mission-critical transactions. Having briefly introduced these databases, let's take a look at them below.
1. MongoDB
Mongo
different application types to meet specific data processing needs. The development of database systems on top of the operating system is much like that of mobile apps: it has been booming. Because big data will remain a hot topic, databases, which provide the underlying data management services, will remain one of the fastest-developing areas of computing in the coming period. Many people confuse database systems with some ot