Microsoft BI's SSIS series-Talk about lookup caching

Source: Internet
Author: User
Tags ole ssis

The first introduction to the cache of lookup is in a previous article that has already mentioned the use of Microsoft BI's SSIS Series-Lookup component and its several cache modes-full cache, partial cache, NO cache but still may be missing in the part So that we could summarize and supplement it here. This is the first, or from a theoretical point of view, the question of the lookup cache, there will be a later time to write another article, from the background of SQL execution to understand the work of the lookup process. And about the Lookup cache There are other interesting topics, such as my posts, you can relate to see:
    • Microsoft BI's SSIS Series-string comparison case processing in Lookup sensitive or insensitive
We still discuss these three kinds of cache modes under OLE DB Connection Manager: Full cache, partial cache, no cache not cached in the example below, Ff_src_internet_sales Is the file data input source, Lkp_sales_order_number is the Lookup component. Full cache fully Cached

Full Cache-The default cache for Lookup. First step: In this mode, the query is sent to the database server after the data flow pre-execution pre-execute is executed, and the data is queried and returned, and cached in the SSIS runtime memory. After that, any changes in the data in the right-hand database, either in the view configured in Lookup or in the table "T032_salesorder", are not related to the data in the cache. The second step: the data stream is now executing, and the cache already contains data, and the data from Ff_src_internet_sales is used by the matching rules previously configured to use column SalesOrderNumber and columns Salesorderlinenumber The values in the cache find there are no matching SalesOrderNumber and Salesorderlinenumber. Step three: If the corresponding data is matched in the cache, the Available Input column (that is, the column in the file source) and the Available Lookup column (that is, the column in the Cache-view or table T032_salesorder) are hit The column of the tick (the option not checked in this example) together forms an output column group: Available Input column + Available Lookup columns checked down output, the name of the output is called  lookup Match output by default 。 Fourth step: If the corresponding data is not found in the cache, then the data in Available Input Columns is output as mismatched data according to the mismatched rules (Redirect rows to no match output here), the output name is called Lo Okup No Match Output. Note: The contents of the error data are not the data in Available Input Columns. Then the above steps are the full cache mode of lookup under OLE DB Connection and the process of finding the output. Key Features
    • Data queries and cache actions are completed before the data flow is started (or even earlier before the package executes), and the query result set is cached.
    • High memory consumption increases the time the data flow starts.
    • The data flow executes very quickly, and the source data is compared directly to the cached data without querying the database again.
    • Data changes in the cached data source will also no longer affect the data in the cache.
    • If the cached data capacity exceeds the size of the memory, then an out-of-memory error occurs because the cache does not actively write data to disk.
When to use fully cached full cache
    • The amount of data in a reference dataset, regardless of size, as long as it does not exceed the memory size, especially if the data source data and reference datasets match a high degree of time, a cache can be reused.
    • The database server is not on-premises, or the database server is under a lot of stress, and the database server is under a lot of pressure to reduce repeated connections.
Using the key points in full cache
    • The data is all cached in memory, and if memory is not enough, it will not cache the excess data to disk, but directly error
    • Because the dataset is cached in memory, you should not use the Table object directly when using  lookup  select  where  
    • < Span lang= "en-US" > Once the data is cached, In the process of data flow execution, it is no longer detected whether the source data is changed or updated, and so on, unless the data flow restarts execution.
Partial cache partially caches the first step: in partial cache mode, the Lookup cache is empty at the beginning of the data flow execution. When the Lookup component begins to read into a column in the Ff_src_internet_sales data source, it is time to check for matching values in the cache because of the need for comparison. The second step: if there is one in the cache, it can be directly output as a matching output. The third step: if not in the cache, the query will be sent to the database for querying (query statements can be configured parameters). If it is found in the database, it will cache the found content in the matching buffer for the next use, this time to match the output.   Fourth Step: If the cache does not, after querying the database is not, according to the configuration can be in the cache does not have the data in the database is configured in the mismatch buffer for the next mismatch check. Because they are not found, this time the output is not matched. Characteristics
    • before the data stream starts, the cache is empty and the data flow starts faster than the full cache.
    • lookup 
    • can be  advanced options 
    • You can set the percentage of buffers that do not match the buffers in the advanced Options so that one source data cannot be queried in the matching cache. In the case where the database is also not queried, the key comparison column of this data will be stored in mismatched buffers. The next time the data is not found in the matching buffer, it will first see if there is a mismatch in the buffer, which will reduce the chances of repeated queries to the database. If the data in the data source has a low match rate with the Lookup reference set, the ratio of mismatched buffers can be increased appropriately.
When to use partial cache partial caches
    • When there is less data in the data source, the number of queries is small.
    • The data in the reference dataset is large and memory is not supported.
    • The data in the Reference DataSet source table changes and needs to be known during the query matching process.
    • Consider using the Partial Cache when you need to use a parameterized query to limit the size of the reference set.
Places to note using partial cache partial caching
    • Note that the size allocation of the buffer is as large as possible, 25MB in size.
    • Reasonable use does not match the buffer, when the mismatch degree is high, the ratio of mismatched buffers is increased.
No cache is not cached

No cache mode, each match query will go to the database to check once. In this cache mode, the amount of data is small and the memory is very tight, it will be used, of course, its memory consumption is relatively minimal, but the lowest efficiency.

Summary as far as I am concerned, the current configuration of the use of Lookup is largely by default choosing full Cache mode, because memory support is not a big problem at the moment. A 1 W input Lookup set is basically resolved in a few seconds in full cache mode, if the Partial cache or No cache is at least 60 seconds or more. Therefore, in the actual development can basically maintain the default selection of full cache can achieve an optimal efficiency of use, unless memory tension in the cache allocation ratio of the fuss. Articles related to this article also have
    • Microsoft BI's SSIS series-use of the Lookup component and several of its cache modes-full cache, Partial cache, NO cache
    • Microsoft BI's SSIS Series-string comparison case processing in Lookup sensitive or insensitive

For more bi articles, see the Bi-series essay list (SSIS, SSRS, SSAS, MDX, SQL Server) If you feel this article has helped you, please help recommend it to make it easy for others to see these articles quickly in the Biwork blog recommendations Bar.

Microsoft BI's SSIS series-Talk about lookup caching

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.