"Open source Big data analytics engine Impala Combat" Preface

Last Update:2015-02-28 Source: Internet

Author: User

Dangdang Book Information:

Http://search.dangdang.com/?key=impala

Writing background

As traditional relational database practitioners, we need to understand not only the database itself, but also the hosts that run it, the storage that holds its data, the middleware that reads that data, and the characteristics of the application itself. As hardware has developed and data processing has become more specialized, database technology has evolved from the traditional disk-based relational database to in-memory databases and MPP databases, and database products are no longer limited to a single, all-encompassing RDBMS. When designing an architecture, we have to choose the right database product based on the characteristics of the application.

Since 2009, I have been trying to use Hadoop-based technology to solve the problem that traditional databases cannot scale linearly. Hadoop cannot be called a "database", nor simply an "application"; it is a hybrid of the two that can store and process data and also handle application business logic, which is why we often call it a "data platform." Although Hadoop essentially solves the problem of scaling disk I/O, it is disk-based (caching has been supported since Hadoop 2.3), so it falls short for tasks with more demanding real-time requirements. In-memory computing technologies such as Impala emerged to fill that gap.

Impala stores its data on HDFS, generates execution plans based on table statistics, and has resource management capabilities, which makes it the big data technology most like a traditional database. When I started writing this book, the latest version of Impala was 1.3.1. It has since evolved to version 2.1, with further improvements in SQL syntax, installation, extensibility, and performance.

Main content

As the saying goes, to do a good job one must first sharpen one's tools, so Chapter 1 walks through, step by step, how to build an Impala environment offline. Once the environment is ready, we can set the details aside for a moment and simply try it out: Chapter 2 describes how to perform simple operations on Impala such as loading data, creating tables, and running queries. For an Impala administrator, merely being able to use it is far from enough, so Chapter 3 systematically introduces Impala's architecture and the function of each component. Chapter 4 is tailored to Impala users and devotes considerable space to Impala SQL, functions, UDFs, and more. Every database provides a command-line tool that can be invoked where no graphical interface is available or from within a shell, and Impala is no exception; Chapter 5 describes Impala's command-line tool, impala-shell. How can Impala be used without overloading hardware resources? Through resource management, of course: Chapter 6 details Impala's resource management mechanisms, and Impala's resources can also be managed using YARN. Chapter 7 describes in detail the file formats supported by Impala, which cover most of the main Hadoop file formats. Chapter 8 introduces Impala's partitioning mechanism. Chapter 9 introduces the guiding principles of Impala performance optimization and the techniques used in the optimization process. Chapter 10 introduces design principles and application cases for using Impala in enterprise applications.

Intended readers

- Beginners in in-memory computing technology

- Database administrators and database developers

- Hadoop and in-memory computing operations engineers

- Open source software enthusiasts

- Anyone else interested in big data technology

Thanks

Thanks to Dr. Miaokai, Deborah Wiltshire, and Yale Wang at Cloudera for endorsing this book. Thanks to my good brothers Shang and Shang for their encouragement. Thanks to the clients I have served for their trust. Thanks to my family and friends; you are the driving force behind my continued efforts.

About the author

Jia is a data architect, Oracle OCM, DB2 Migration Star, TechTarget, and a pioneer in the transition from databases to big data. Jia has served China Unicom, China Telecom, CCB, PICC, and others, and currently works at a big data solutions provider, committed to using big data technology to solve problems that traditional databases cannot.

Author

January 2015

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
