Classic MySQL paging optimization practices
There is a classic problem in MySQL paging optimization: the further back the requested data, the slower the query (this depends on the table's index type; for B-tree indexes the same holds in SQL Server): SELECT * FROM t ORDER BY id LIMIT m, n. That is, as m increases, querying the same number of rows gets slower and slower.
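A common workaround is the "deferred join": page through the primary-key index alone, then join back to fetch the full rows only for the n ids on the requested page. A minimal sketch in Python that builds such a query for the table t above (the helper name is ours, not from the original posts):

```python
def page_sql(m: int, n: int) -> str:
    """Build a deferred-join paging query for table t.

    The inner subquery walks only the id index (cheap even for large m),
    and the outer join fetches full rows for just those n ids.
    """
    return (
        "SELECT t.* FROM t "
        "JOIN (SELECT id FROM t ORDER BY id LIMIT {m}, {n}) AS page "
        "ON t.id = page.id"
    ).format(m=m, n=n)

print(page_sql(100000, 10))
```

The deep-offset cost does not disappear entirely, but only the narrow index is scanned past the first m entries instead of full rows.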
Along the way I have recorded many optimization posts on my local blog, some right and some wrong, and never sorted them into a timeline. In this article we sort out the concepts around join methods for your reference. By consulting the documentation to understand the concepts involved, and then verifying and summarizing them in practice, we can come to understand the database step by step.
General SQL optimization approaches:
1. Use parameterized queries: they prevent SQL injection, and precompiled SQL commands improve efficiency.
2. Remove unnecessary query and search fields: in real projects many query conditions are optional, so redundant data records can be avoided at the source. When the clustered index seek is executed, the actual data records are finally scanned; in this process the condition tableb.col2 = ? also avoids an extra filter operation. This is why the Filter operation is not performed in 1.4.
f) Construct the returned result set; the same as step d in section 2.
1.6 Nested loop usage conditions
If any join operation meets the nested-loop usage conditions, the SQ
Label: I. Spark SQL and SchemaRDD
There is not much more to say about Spark SQL itself; here we are only concerned with how it operates. But the first thing to figure out is: what is a SchemaRDD? From Spark's Scala API you can find org.apache.spark.sql.SchemaRDD and class SchemaRDD ex
ShuffleManager has kept iterating and becoming more advanced. Prior to Spark 1.2, the default shuffle compute engine was HashShuffleManager. HashShuffleManager has a very serious disadvantage: it produces a large number of intermediate disk files, and the resulting flood of disk IO operations hurts performance. So in the Spark 1.2 release, the default ShuffleManager was changed.
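The disk-file arithmetic behind that disadvantage is easy to see. A sketch with hypothetical task counts, assuming unconsolidated hash-based shuffle (one file per map task per reducer) versus the sort-based layout of one data file plus one index file per map task:

```python
# Hypothetical job: 100 map tasks shuffling to 100 reduce tasks.
maps, reducers = 100, 100

# Hash-based shuffle (pre-Spark-1.2 default, unconsolidated):
# each map task writes one file per reducer.
hash_files = maps * reducers

# Sort-based shuffle: each map task writes a single sorted data file
# plus an index file recording per-reducer offsets.
sort_files = maps * 2

print(hash_files, sort_files)  # the gap is what drives the disk IO cost
```

Even this toy job goes from 10,000 intermediate files down to 200, which is the motivation for the change described above.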
LogicalPlan — the logical plan, made up of Catalyst TreeNodes; three kinds of syntax trees can be seen
SparkPlanner — optimization strategies; applies different policies to optimize the physical execution plan
QueryExecution — the environment context for SQL execution
It is these objects that make up the Spark SQL runtime, and they look quite cool.
PROFILE ON
SELECT o.ID, o.cus_name, od.good_name
FROM OrderDetails AS od
INNER JOIN [Order] AS o ON o.ID = od.order_id
OPTION (LOOP JOIN) -- force the optimizer to use a nested loop join
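The nested loop join that OPTION (LOOP JOIN) forces can be sketched in Python with toy in-memory tables (the rows are hypothetical, mirroring the Order/OrderDetails columns above):

```python
orders = [(1, "alice"), (2, "bob")]                   # (id, cus_name)
order_details = [(1, "pen"), (1, "ink"), (2, "pad")]  # (order_id, good_name)

# Nested loop join: for every row of the outer input, scan the inner
# input for matching rows -- O(n * m) comparisons without an index.
result = [
    (oid, cus, good)
    for (oid, cus) in orders
    for (did, good) in order_details
    if oid == did
]

print(result)
```

This is why the optimizer normally reserves nested loops for small outer inputs or indexed inner inputs: the probe cost per outer row is what an index seek on the inner table eliminates.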
Result
We can see that
1> when running SQL Server (SQL 2008 is
Spark SQL tree. SELECT * FROM (SELECT * FROM src) a JOIN (SELECT * FROM src) b ON a.key = b.key
First, let's take a look at the generated plan in the console:
3.1. Unresolved Logical Plan
The first step is to generate the unresolved logical plan, as follows:
scala> query.queryExecution.logical
res0: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = Project [*] Join In
-connections.
2.4 Reduce-side join + BloomFilter
In some cases, even the semi-join that extracts the small table's key set into memory still does not fit; in that case a BloomFilter can be used to save space. The most common use of a BloomFilter is to determine whether an element is in a set. Its two most important methods are add() and contains(). Its biggest feature is that false negatives never occur: if contains() returns false, the element is definitely not in the set.
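A minimal Bloom filter sketch in Python showing exactly the add()/contains() contract described above (sizes, hash count, and the sample keys are our own choices, not from the original post):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, possible false positives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)  # bit array packed into bytes

    def _positions(self, item: str):
        # Derive k bit positions by salting a hash of the item.
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def contains(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# Usage in a reduce-side join: ship the filter of small-table keys to the
# map side and drop big-table rows whose key is definitely absent.
small_table_keys = ["u001", "u002", "u003"]  # hypothetical join keys
bf = BloomFilter()
for key in small_table_keys:
    bf.add(key)
```

Because contains() can return a false positive, the reduce side must still perform the real join; the filter only prunes rows that provably cannot match.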
DataFrame knows the column information of the data.
b) The fundamental difference between RDD and DataFrame: an RDD takes a record as its basic unit, and Spark cannot optimize the interior details of an RDD when processing it, so no deeper optimization is possible, which limits the performance of Spark SQL. A DataFrame contains the metadata information for eac
Spark SQL is one of the newest and most technologically complex components of Spark. It supports SQL queries and the new DataFrame API. At the heart of Spark SQL is the Catalyst optimizer, which uses advanced programming language
://blog.linezing.com/?p=1048" shows that Storm can process 35,000 records per second, while Spark Streaming reaches nearly twice that throughput. Note, however, that the Storm version used in that test is not the latest, and it is not stated whether the business logic was optimized, so only a rough, qualitative comparison is possible.
2. A first taste of stress testing
After the code is written, do not do any
from gci in gc.DefaultIfEmpty()
select new { ClassID = s.ClassID, ClassName = gci.ClassName, Student = new { Name = s.Name, ID = s.StudentID } };

foreach (var item in query)
{
    Console.WriteLine("{0} {1} {2}", item.ClassID, item.ClassName, item.Student.Name);
}
Console.ReadLine();
For an outer join, the joined table must be put into a new range variable gc, and then gc.DefaultIfEmpty() represents the outer join. LINQ
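The same left-outer-join idea can be sketched in Python, with None playing the role that DefaultIfEmpty() plays in the LINQ query (class and student data are hypothetical):

```python
classes = {1: "Math", 2: "Art"}                  # class_id -> class_name
students = [("Ann", 1), ("Bob", 1), ("Cy", 3)]   # (name, class_id); 3 has no class

# Left outer join: every student appears in the output, and a student
# whose class_id has no match gets None instead of being dropped.
joined = [(name, cid, classes.get(cid)) for name, cid in students]

print(joined)
```

An inner join would instead filter out ("Cy", 3); keeping the unmatched row with a default value is precisely what makes the join "outer".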
MySQL join optimization
1. Multi-table join types
1. Cartesian product (cross join): in MySQL this can be written as CROSS JOIN, the CROSS keyword can be omitted, or a ',' can be used instead, for example:
SELECT * FROM table1 CROSS JOIN table2
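What a cross join produces is exactly the Cartesian product of the two row sets, which itertools.product computes directly; a sketch with toy tables (the values are ours, for illustration):

```python
from itertools import product

table1 = [1, 2, 3]
table2 = ["a", "b"]

# A cross join pairs every row of table1 with every row of table2,
# so the result always has len(table1) * len(table2) rows.
rows = list(product(table1, table2))

print(rows)
```

This multiplicative blow-up is why an unintentional comma-join without a WHERE condition is one of the classic causes of runaway queries.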
data to append, do nothing and return orig itself
  orig
} else {
  // otherwise expand, growing in steps of the initial size
  val capacity = orig.capacity()
  val newSize = capacity + size.max(capacity / 8 + 1)
  val pos = orig.position()
  orig.clear()
  ByteBuffer
    .allocate(newSize)
    .order(ByteOrder.nativeOrder())
    .put(orig.array(), 0, pos)
}
......
Finally, mapPartitionsRDD.cache() is called to cache the RDD and add it to the
In Spark 1.3 the DataFrame was introduced, renaming the SchemaRDD type. In Spark 1.3, a DataFrame is a distributed dataset organized into named columns. It is conceptually similar to a table in a relational database and equivalent to the data frames in R/Python. A DataFrame can be converted from a structured data file, from a table in Hive, or from an external database or an existing RDD. The DataFrame programming model has the following features: 1. data volumes from KB to petabytes are supported