Dangdang Book Information:
Http://search.dangdang.com/?key=impala
Preface to "Open Source Big Data Analytics Engine Impala in Action"
Writing background
As practitioners of traditional relational databases, we need to understand not only the database itself, but also the hosts that run it, the storage that holds its data, the middleware that reads its data, and the characteristics of the applications themselves. As hardware has advanced and data processing has become more specialized, database technology has evolved from the traditional disk-based relational database to the in-memory database and the MPP database, and database products have moved away from the all-in-one RDBMS toward more specialized offerings. When designing an architecture, we have to choose the right database product based on the characteristics of the application.
Since 2009, I have been trying to use Hadoop-based technology to solve the problem that traditional databases cannot scale linearly. Hadoop cannot be called a "database", nor simply an "application"; it is something between the two that can store and manipulate data and also handle application business logic, which is why we often call it a "data platform". Hadoop fundamentally solves the problem of scaling disk I/O, but because it is disk-based (caching has been supported since Hadoop 2.3), it is not suited to tasks with stricter real-time requirements. This is why Impala and other memory-based computing technologies emerged.
Impala stores its data on HDFS, generates execution plans based on table statistics, and has resource management capabilities, which makes it the big data technology most similar to a traditional database. When I started writing this book, the latest version of Impala was 1.3.1. It has since evolved to version 2.1, with further improvements in SQL syntax, installation, extensibility, and performance.
Main content
To do a good job, one must first sharpen one's tools. Chapter 1 walks you step by step through building an Impala environment offline. Once the environment is in place, we can set the details aside for a moment and simply try it out: Chapter 2 describes how to perform simple operations on Impala such as loading data, creating tables, and running queries. For an Impala administrator, merely being able to use it is far from enough, so Chapter 3 systematically introduces Impala's architecture and the function of each component. Chapter 4 is tailored to Impala users and devotes considerable space to Impala SQL, functions, UDFs, and more. Every database provides a command-line tool for use when no graphical interface is available or from within shell scripts, and Impala is no exception; Chapter 5 describes Impala's command-line tool, impala-shell. How can Impala be used without overloading hardware resources? Through resource management, of course: Chapter 6 details Impala's resource management mechanisms, including the option of managing Impala with YARN. Chapter 7 describes in detail the file types supported by Impala, which cover most of the main Hadoop file types. Chapter 8 introduces Impala's partitioning mechanism. Chapter 9 presents guiding principles for Impala performance optimization and the techniques used in the optimization process. Chapter 10 introduces design principles and application cases for using Impala in enterprise applications.
Intended audience
- Beginners in in-memory technology
- Database administrators and database developers
- Hadoop and in-memory computing operations engineers
- Open source software enthusiasts
- Anyone else interested in big data technology
Acknowledgments
Thanks to Dr. Miaokai, Deborah Wiltshire, and Yale Wang of Cloudera for endorsing this book. Thanks to my good brothers Shang and Shang for their encouragement. Thanks to the clients I have served for their trust. Thanks to my family and friends; you are the source of my constant efforts.
About the author
Jia. Data architect, Oracle OCM, DB2 Migration Star, TechTarget; a pioneer in the transition from databases to big data. Has served China Unicom, China Telecom, CCB, PICC, and others. Currently works at a big data solutions provider and is committed to using big data technology to solve problems that traditional databases cannot.
Author
January 2015