The characteristics of Vertica can be summed up as follows: Column storage, MPP architecture, technology is relatively new. The columnstore itself facilitates the compression of data, and the MPP architecture makes it possible to scale horizontally to a larger scale (PB-level) with a relatively inexpensive PC-Class Server, which has been available for more than 05 years, enabling it to use newer technologies in recent years in the field of column databases, such as invisible Connections (Invisible joins).
Unlike Oracle's one-of-a-kind package, Vertica is designed for analytical applications. Therefore, it is suitable for relatively low-concurrency, relatively overloaded analytic query scenarios. For each query SQL running on Vertica, it always tries to allocate enough system resources to complete in as short a time as possible, instead of pursuing more concurrency at the same time. In general, the number of CPU cores per node is the appropriate maximum concurrency setting. If the maximum concurrency setting is higher, depending on the hardware and parameter configuration of the system, many queries may allocate less resources, and some queries even enter the queue, resulting in an average query time increase for each SQL. In the experience of our testing, after the system load reaches a certain degree, increase the concurrency degree, the system's query throughput (the number of SQL running within a unit of time ) is basically flat or even down . This feature is especially important to note.
Vertica's SQL syntax is compatible with the SQL92 standard, and SQL on Oracle is rarely modified to run directly on Vertica except for some Oracle-specific functions. In particular, the performance of a SQL is inseparable from the analysis of the execution plan.
There are two ways to view the execution plan:
1. MC (Management Console) graphical interface
2. Query System View: Vertica provides a series of data dictionaries and views, two of the most important for SQL performance analysis. Query_plan_profiles provides information about the actual execution plan of the SQL runtime, and Execution_engine_profiles further provides specific resource consumption information per node at each step of SQL execution to accurately determine the bottleneck. With these two views of the data, you can basically complete all of the SQL-level performance analysis.
How to get the execution plan for a specific SQL:
In Oracle, a SQL execution plan can be uniquely determined based on the sql_id and plan hash value. Similarly, in Vertica, a single execution plan can be determined by the transaction_id and statement_id two parameters alone. Manual testing usually statement_id default is 1, so the above script in use, attention caught to analyze the SQL transaction_id can.
Because Vertica chooses a different compression algorithm for each column, the execution efficiency of different data types varies greatly during SQL execution. So the choice of data type has a more pronounced effect on performance than Oracle.
As far as our practice is concerned, the choice of data types can be summarized briefly as follows: Vertica
1, can use the integral type without floating-point type;
2, the length as short as possible;
3, can be fixed length do not change long (combined with 2);
Based on the test validation, we changed the fact table key/id Type field from floating point to integer in a case, changed the data progress of the large Amount field from (38,10) to (24,4), and finally the query performance increased from 6 seconds to 4 seconds, raising 50%
Vertica has a special concept projection, the specific definition and characteristics of the introduction is no longer mentioned here, the Oracle experience can be easily understood as a similar materialized view of the function (of course, the nature of a very big difference).
Vertica Performance Analysis