Lookup is a common operation in ETL: converting a product key to a surrogate key, converting an ID to a name, and similar tasks can all be done through a lookup. Beyond these ordinary conversions, the Lookup transformation in Informatica can also be used to update slowly changing dimensions, which makes it quite powerful. Based on the Informatica 8.1 online documentation, this article briefly describes Informatica's Lookup transformation.
Terms used in this article:
Transformation: a conversion step in a mapping
Connected: part of the main data flow
Unconnected: not part of the main data flow
Cache: a stored copy of the lookup data
I. Functions of Lookup
• Get a related value: for example, find a name based on an ID
• Perform a calculation: for example, retrieve a value needed by a formula and compute the result
• Update slowly changing dimensions: decide whether to insert or update a record depending on the lookup result
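The insert-or-update decision in the last item can be sketched in Python. This is only an illustration, not Informatica code: the dimension table is modeled as a dict keyed by the business key, and the function name `decide_action` and the sample rows are hypothetical.

```python
# Hypothetical sketch: decide insert vs. update for a slowly changing dimension.
# The dimension table is modeled as a dict keyed by the natural (business) key.

def decide_action(dimension, key, attributes):
    """Return 'insert', 'update', or 'unchanged' for one incoming row."""
    existing = dimension.get(key)      # the lookup itself
    if existing is None:
        return "insert"                # key not found: new dimension row
    if existing != attributes:
        return "update"                # key found, attributes differ
    return "unchanged"                 # key found, nothing to do


dimension = {"P001": {"name": "Widget", "color": "red"}}

incoming = [
    ("P001", {"name": "Widget", "color": "blue"}),
    ("P002", {"name": "Gadget", "color": "green"}),
    ("P001", {"name": "Widget", "color": "red"}),
]

actions = [decide_action(dimension, key, attrs) for key, attrs in incoming]
print(actions)  # ['update', 'insert', 'unchanged']
```

The sketch only classifies rows; in a real mapping the result of the lookup would typically drive a separate step that performs the insert or update.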
II. Relational Lookups vs Flat File Lookups
The source of a lookup can be a table in a relational database or a flat file. For a relational table, you can choose the definition from the existing sources or targets, or import it; a flat file is brought in through the Import Wizard.
III. Connected Lookups vs Unconnected Lookups
Informatica transformations fall into two types: connected and unconnected.
A connected transformation sits in the ETL data flow, and its input ports are fed directly by another transformation. An unconnected transformation is independent of the main data flow and receives its input from expressions in other transformations.
A connected Lookup transformation processes every row in the data flow, outputs a predefined default value for rows that do not satisfy the lookup condition, and can update a dynamic cache. It returns values through all of its output/lookup ports, and it can use either a static or a dynamic cache.
An unconnected Lookup transformation processes only the rows for which it is called and returns a single value. When the lookup condition is not satisfied, it returns NULL. It can be called multiple times, returns its value through its single return port, and can use only a static cache.
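The difference between the two styles can be illustrated with a small Python analogy. This is a sketch under stated assumptions, not how Informatica implements lookups: the table contents, the default values, and the names `connected_lookup` and `unconnected_lookup` are all hypothetical.

```python
# Hypothetical sketch contrasting connected and unconnected lookups.
LOOKUP_TABLE = {
    1: {"name": "Alice", "city": "Paris"},
    2: {"name": "Bob", "city": "Oslo"},
}
DEFAULTS = {"name": "UNKNOWN", "city": "UNKNOWN"}  # assumed default values


def connected_lookup(rows):
    """Sits in the data flow: every row passes through and gains several
    output columns. Unmatched rows receive the default values."""
    for row in rows:
        match = LOOKUP_TABLE.get(row["id"], DEFAULTS)
        yield {**row, **match}


def unconnected_lookup(key):
    """Called on demand from an expression: returns one value (the
    'return port'), or None when there is no match."""
    match = LOOKUP_TABLE.get(key)
    return match["name"] if match else None


rows = [{"id": 1}, {"id": 3}]

# Connected: both rows are processed, the unmatched one gets defaults.
connected = list(connected_lookup(rows))

# Unconnected: the caller decides which rows trigger the lookup.
called = [unconnected_lookup(r["id"]) for r in rows if r["id"] < 3]
```

Here `connected` contains two rows, the second filled with `UNKNOWN`, while `called` contains only `"Alice"` because the lookup was invoked for a single row. Calling `unconnected_lookup(3)` would return `None`, mirroring the NULL result for an unmatched condition.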
IV. Cache
Informatica uses a cache for lookups. The server handles the cache as follows:
When it starts processing the first row, the server builds the cache in memory; the cache size is determined by properties of the Lookup transformation. An index cache is built for the lookup condition columns, and the output values are stored in the data cache.
If the memory cache is not large enough, the overflow is written to cache files. After the session finishes, the cache is deleted unless the lookup cache is set to persistent.
A static cache cannot be updated by the Lookup transformation. With a dynamic cache, when the lookup finds a row that does not match what is in the cache, it can insert the row into the cache or update the cached entry.
Of course, you can also choose not to use a cache at all.
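The static and dynamic behaviors can be modeled with a toy Python class. This is a simplified sketch, not Informatica's actual cache layout: the class name `LookupCache`, its methods, and the action labels are hypothetical, and overflow to disk and persistence are left out.

```python
class LookupCache:
    """Toy model of a lookup cache: the index cache maps condition values
    to a position, the data cache holds the output values."""

    def __init__(self, source_rows, dynamic=False):
        self.dynamic = dynamic
        self.index_cache = {}   # condition value -> position in data_cache
        self.data_cache = []    # output values
        for key, value in source_rows:   # built once, before rows are served
            self._insert(key, value)

    def _insert(self, key, value):
        self.index_cache[key] = len(self.data_cache)
        self.data_cache.append(value)

    def lookup(self, key, value=None):
        """Return (result, action). A static cache is never modified."""
        if key in self.index_cache:
            pos = self.index_cache[key]
            cached = self.data_cache[pos]
            if self.dynamic and value is not None and value != cached:
                self.data_cache[pos] = value     # dynamic: update the entry
                return value, "update"
            return cached, "found"
        if self.dynamic and value is not None:
            self._insert(key, value)             # dynamic: insert new entry
            return value, "insert"
        return None, "not found"


source = [(1, "Alice"), (2, "Bob")]
static = LookupCache(source)
dynamic = LookupCache(source, dynamic=True)

static_results = [static.lookup(3, "Carol"), static.lookup(3, "Carol")]
dynamic_results = [
    dynamic.lookup(3, "Carol"),    # not cached yet: inserted
    dynamic.lookup(3, "Carol"),    # now cached and identical: found
    dynamic.lookup(1, "Alicia"),   # cached but different: updated
]
```

With the static cache, key 3 stays missing no matter how often it is looked up. With the dynamic cache, the first miss inserts the row, so later rows in the same run see it.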
V. Lookup Transformation Components
A Lookup transformation has five components: the five tabs you see after right-clicking the transformation and choosing Edit. In fact, most Informatica transformations have roughly the same five tabs.
The first tab (Transformation), the second (Ports), and the fifth (Metadata Extensions) are basically the same as in other transformations. The only difference is in the ports: besides the usual I (input) and O (output), a Lookup also has L (lookup) and R (return) ports. There can be only one return port, and it cannot be connected directly to other transformations; its value can only be obtained through a :LKP expression.
The fourth tab, Condition, specifies the lookup condition, which in effect is the join condition between the two tables.
The third tab, Properties, is the most important. Here you can override the SQL to customize the lookup, choose what to return when multiple records match, choose whether to use a dynamic cache, set the cache size, and so on.
VI. Lookup Tips
• Create an index on the lookup condition columns
• Use = conditions wherever possible. If there are multiple conditions, put the = conditions first
• For small tables, use the cache wherever possible, and set the cache size so that the entire table fits in memory
• If the lookup table and the source table are in the same database and the cache is not large enough, use a join instead of a lookup
• For static lookups, use a persistent cache wherever possible so that multiple sessions can reuse it
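The join tip can be illustrated outside Informatica with Python's built-in `sqlite3` module. This is a rough sketch: the `orders` and `products` tables and their contents are made up for the example. It shows that a single join returns the same result as querying the lookup table once per source row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, product_id INTEGER);
    -- The primary key gives the lookup condition column an index.
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 9);
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
""")

# Row-by-row lookup: one query against the lookup table per source row.
source_rows = conn.execute(
    "SELECT order_id, product_id FROM orders ORDER BY order_id"
).fetchall()
per_row = []
for order_id, product_id in source_rows:
    hit = conn.execute(
        "SELECT name FROM products WHERE product_id = ?", (product_id,)
    ).fetchone()
    per_row.append((order_id, hit[0] if hit else None))

# Join: the database resolves every lookup in a single statement.
joined = conn.execute("""
    SELECT o.order_id, p.name
    FROM orders AS o
    LEFT JOIN products AS p ON p.product_id = o.product_id
    ORDER BY o.order_id
""").fetchall()

print(per_row == joined)  # True
```

The `LEFT JOIN` keeps source rows that have no match (order 12 here), which corresponds to a lookup returning NULL for an unmatched condition.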