Best Practices for Knowing Your LIMIT and Kicking %NOTFOUND
I have started using BULK COLLECT whenever I need to fetch large volumes of data. This has caused me some trouble with my DBA, however. He is complaining that although my programs might be running much faster, they are also consuming way too much memory. He refuses to approve them for a production rollout. What's a programmer to do?
The most important thing to remember when you learn about and start to take advantage of features such as BULK COLLECT is that there is no free lunch. There is almost always a trade-off to be made somewhere. The trade-off with BULK COLLECT, like so many other performance-enhancing features, is "run faster but consume more memory."
Specifically, memory for collections is stored in the program global area (PGA), not the system global area (SGA). SGA memory is shared by all sessions connected to Oracle Database, but PGA memory is allocated for each session. Thus, if a program requires 5 MB of memory to populate a collection and there are 100 simultaneous connections, that program causes the consumption of 500 MB of PGA memory, in addition to the memory allocated to the SGA.
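One way to see how much PGA a session is actually using is to query the session statistics before and after populating a collection. The block below is a minimal sketch, not part of the original column. It assumes that your account can select from the V$MYSTAT and V$STATNAME dynamic performance views (which usually requires a grant from the DBA) and that server output is enabled.

```plsql
-- Sketch: report the current session's PGA use, in megabytes.
-- Assumptions: SELECT access to V$MYSTAT and V$STATNAME, and
-- SET SERVEROUTPUT ON (or the equivalent) in the client tool.
BEGIN
   FOR rec IN (SELECT n.name, s.value
                 FROM v$mystat s
                 JOIN v$statname n ON n.statistic# = s.statistic#
                WHERE n.name IN ('session pga memory',
                                 'session pga memory max'))
   LOOP
      -- Values are reported in bytes; convert to MB for readability.
      DBMS_OUTPUT.put_line (
            RPAD (rec.name, 25)
         || ROUND (rec.value / 1024 / 1024, 2)
         || ' MB');
   END LOOP;
END;
/
```

Running this block once before and once after a bulk fetch in the same session gives a rough idea of how much memory the collection costs per connection.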
Fortunately, PL/SQL makes it easy for developers to control the amount of memory used in a BULK COLLECT operation by using the LIMIT clause.
Suppose I need to retrieve all the rows from the employees table and then perform some compensation analysis on each row. I can use BULK COLLECT as follows:
    PROCEDURE process_all_rows
    IS
       TYPE employees_aat IS TABLE OF employees%ROWTYPE
          INDEX BY PLS_INTEGER;
       l_employees employees_aat;
    BEGIN
       SELECT *
         BULK COLLECT INTO l_employees
         FROM employees;
       FOR indx IN 1 .. l_employees.COUNT
       LOOP
          analyze_compensation (l_employees (indx));
       END LOOP;
    END process_all_rows;
Very concise, elegant, and efficient code. If, however, my employees table contains tens of thousands of rows, each of which contains hundreds of columns, this program can cause excessive PGA memory consumption.
Consequently, you should avoid this sort of "unlimited" use of BULK COLLECT. Instead, move the SELECT statement into an explicit cursor declaration and then use a simple loop to fetch many, but not all, rows from the table with each execution of the loop body, as shown in Listing 1.
Code Listing 1: Using BULK COLLECT with the LIMIT clause
    PROCEDURE process_all_rows (limit_in IN PLS_INTEGER DEFAULT 100)
    IS
       CURSOR employees_cur
       IS
          SELECT * FROM employees;
       TYPE employees_aat IS TABLE OF employees_cur%ROWTYPE
          INDEX BY PLS_INTEGER;
       l_employees employees_aat;
    BEGIN
       OPEN employees_cur;
       LOOP
          FETCH employees_cur
             BULK COLLECT INTO l_employees LIMIT limit_in;
          FOR indx IN 1 .. l_employees.COUNT
          LOOP
             analyze_compensation (l_employees (indx));
          END LOOP;
          EXIT WHEN l_employees.COUNT < limit_in;
       END LOOP;
       CLOSE employees_cur;
    END process_all_rows;
The process_all_rows procedure in Listing 1 requests that up to the value of limit_in rows be fetched at a time. PL/SQL will reuse the same limit_in elements in the collection each time the data is fetched and thus also reuse the same memory. Even if my table grows in size, the PGA consumption will remain stable.
How do you decide what number to use in the LIMIT clause? Theoretically, you will want to figure out how much memory you can afford to consume in the PGA and then adjust the limit to be as close to that amount as possible.
From tests I (and others) have performed, however, it appears that you will see roughly the same performance no matter what value you choose for the limit, as long as it is at least 25. The test_diff_limits.sql script, included with the sample code for this column at otn.oracle.com/oramag/oracle/08-mar/o28plsql.zip, demonstrates this behavior, using the ALL_SOURCE data dictionary view on an Oracle Database 11g instance. Here are the results I saw (in hundredths of seconds) when fetching all the rows (a total of 470,000):
    Elapsed CPU time for limit of 1 = 1839
    Elapsed CPU time for limit of 5 = 716
    Elapsed CPU time for limit of 25 = 539
    Elapsed CPU time for limit of 50 = 545
    Elapsed CPU time for limit of 75 = 489
    Elapsed CPU time for limit of 100 = 490
    Elapsed CPU time for limit of 1000 = 501
    Elapsed CPU time for limit of 10000 = 478
    Elapsed CPU time for limit of 100000 = 527
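A comparison like this can be approximated with a small timing harness. The block below is a sketch written for illustration and is not the test_diff_limits.sql script itself. The time_fetch procedure name and the particular limits tested are assumptions. It relies on DBMS_UTILITY.GET_CPU_TIME, which reports CPU time in hundredths of a second and is available in Oracle Database 10g and later.

```plsql
-- Sketch of a timing harness (hypothetical; not the original script).
-- Assumes SET SERVEROUTPUT ON and read access to ALL_SOURCE.
DECLARE
   PROCEDURE time_fetch (limit_in IN PLS_INTEGER)
   IS
      CURSOR source_cur
      IS
         SELECT * FROM all_source;

      TYPE source_aat IS TABLE OF source_cur%ROWTYPE
         INDEX BY PLS_INTEGER;

      l_source   source_aat;
      l_start    NUMBER;
   BEGIN
      l_start := DBMS_UTILITY.get_cpu_time;

      OPEN source_cur;

      LOOP
         FETCH source_cur BULK COLLECT INTO l_source LIMIT limit_in;
         -- Only the fetch is being timed, so no per-row work is done here.
         EXIT WHEN l_source.COUNT = 0;
      END LOOP;

      CLOSE source_cur;

      DBMS_OUTPUT.put_line (
            'Elapsed CPU time for limit of '
         || limit_in
         || ' = '
         || TO_CHAR (DBMS_UTILITY.get_cpu_time - l_start));
   END time_fetch;
BEGIN
   time_fetch (1);
   time_fetch (25);
   time_fetch (100);
   time_fetch (1000);
END;
/
```

Absolute numbers will differ by system and by the number of rows visible in ALL_SOURCE. It is worth running the block more than once, because the first pass may also pay for reading the data from disk.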
Kicking the %NOTFOUND Habit
I was very happy to learn that Oracle Database 10g will automatically optimize my cursor FOR loops to perform at speeds comparable to BULK COLLECT. Unfortunately, my company is still running on Oracle9i Database, so I have started converting my cursor FOR loops to BULK COLLECTs. I have run into a problem: I am using a LIMIT of 100, and my query retrieves a total of 227 rows, but my program processes only 200 of them. [The query is shown in Listing 2.] What am I doing wrong?
Code Listing 2: BULK COLLECT, %NOTFOUND, and missing rows
    PROCEDURE process_all_rows
    IS
       CURSOR table_with_227_rows_cur
       IS
          SELECT * FROM table_with_227_rows;
       TYPE table_with_227_rows_aat IS
          TABLE OF table_with_227_rows_cur%ROWTYPE
          INDEX BY PLS_INTEGER;
       l_table_with_227_rows table_with_227_rows_aat;
    BEGIN
       OPEN table_with_227_rows_cur;
       LOOP
          FETCH table_with_227_rows_cur
             BULK COLLECT INTO l_table_with_227_rows LIMIT 100;
          EXIT WHEN table_with_227_rows_cur%NOTFOUND; /* cause of missing rows */
          FOR indx IN 1 .. l_table_with_227_rows.COUNT
          LOOP
             analyze_compensation (l_table_with_227_rows (indx));
          END LOOP;
       END LOOP;
       CLOSE table_with_227_rows_cur;
    END process_all_rows;
You came so close to a completely correct conversion from your cursor FOR loop to BULK COLLECT! Your only mistake was that you didn't give up the habit of using the %NOTFOUND cursor attribute in your EXIT WHEN clause.
The statement
    EXIT WHEN table_with_227_rows_cur%NOTFOUND;
makes perfect sense when you are fetching your data one row at a time. With BULK COLLECT, however, that line of code can result in incomplete data processing, precisely as you described.
Let's examine what is happening when you run your program and why those last 27 rows are left out. After opening the cursor and entering the loop, here is what occurs:
1. The FETCH statement retrieves rows 1 through 100.
2. table_with_227_rows_cur%NOTFOUND evaluates to FALSE, and the rows are processed.
3. The FETCH statement retrieves rows 101 through 200.
4. table_with_227_rows_cur%NOTFOUND evaluates to FALSE, and the rows are processed.
5. The FETCH statement retrieves rows 201 through 227.
6. table_with_227_rows_cur%NOTFOUND evaluates to TRUE, and the loop is terminated, with 27 rows left to process!
When you are using BULK COLLECT and collections to fetch data from your cursor, you should never rely on the cursor attributes to decide whether to terminate your loop and data processing.
So, to make sure that your query processes all 227 rows, replace this statement:
    EXIT WHEN table_with_227_rows_cur%NOTFOUND;
with
    EXIT WHEN l_table_with_227_rows.COUNT = 0;
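The difference between the two exit conditions can be seen without the original table. The block below is a self-contained sketch that stands in for table_with_227_rows by generating 227 rows from DUAL. The cursor and variable names are made up for this illustration, and the CONNECT BY LEVEL row generator is assumed to behave as it does on Oracle Database 10g and later.

```plsql
-- Sketch: count the rows processed under each exit condition.
-- Assumes SET SERVEROUTPUT ON; rows_cur is a stand-in for the real query.
DECLARE
   CURSOR rows_cur
   IS
      SELECT LEVEL AS n FROM DUAL CONNECT BY LEVEL <= 227;

   TYPE rows_aat IS TABLE OF rows_cur%ROWTYPE
      INDEX BY PLS_INTEGER;

   l_rows        rows_aat;
   l_processed   PLS_INTEGER;
BEGIN
   -- Variant 1: exit on %NOTFOUND immediately after the fetch.
   l_processed := 0;

   OPEN rows_cur;

   LOOP
      FETCH rows_cur BULK COLLECT INTO l_rows LIMIT 100;
      EXIT WHEN rows_cur%NOTFOUND;   -- TRUE on the partial third fetch
      l_processed := l_processed + l_rows.COUNT;
   END LOOP;

   CLOSE rows_cur;
   DBMS_OUTPUT.put_line ('%NOTFOUND exit: ' || l_processed || ' rows');

   -- Variant 2: exit only when the fetch returned nothing at all.
   l_processed := 0;

   OPEN rows_cur;

   LOOP
      FETCH rows_cur BULK COLLECT INTO l_rows LIMIT 100;
      EXIT WHEN l_rows.COUNT = 0;
      l_processed := l_processed + l_rows.COUNT;
   END LOOP;

   CLOSE rows_cur;
   DBMS_OUTPUT.put_line ('COUNT = 0 exit: ' || l_processed || ' rows');
END;
/
```

The first variant reports 200 rows and the second reports 227. The second variant performs one extra fetch that returns an empty collection, which is the price of a test that does not depend on the cursor attribute.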
Generally, you should keep all of the following in mind when working with BULK COLLECT:
- The collection is always filled sequentially, starting from index value 1.
- It is always safe (that is, you will never raise a NO_DATA_FOUND exception) to iterate through a collection from 1 to collection.COUNT when it has been filled with BULK COLLECT.
- The collection is empty when no rows are fetched.
- Always check the contents of the collection (with the COUNT method) to see if there are more rows to process.
- Ignore the values returned by the cursor attributes, especially %NOTFOUND.
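The first three points in this list can be checked with a short experiment. The block below is an illustrative sketch that uses only DUAL. The type and variable names are invented for the example, and the CONNECT BY LEVEL row generator is again assumed to behave as on Oracle Database 10g and later.

```plsql
-- Sketch: BULK COLLECT with zero rows, then with three rows.
-- Assumes SET SERVEROUTPUT ON.
DECLARE
   TYPE numbers_aat IS TABLE OF NUMBER
      INDEX BY PLS_INTEGER;

   l_numbers   numbers_aat;
BEGIN
   -- A query that returns no rows leaves the collection empty
   -- and does not raise NO_DATA_FOUND.
   SELECT 1
     BULK COLLECT INTO l_numbers
     FROM DUAL
    WHERE 1 = 2;

   DBMS_OUTPUT.put_line ('Rows fetched: ' || l_numbers.COUNT);

   -- The range 1 .. 0 is empty, so the loop body never runs.
   FOR indx IN 1 .. l_numbers.COUNT
   LOOP
      DBMS_OUTPUT.put_line ('This line is never printed');
   END LOOP;

   -- A non-empty fetch fills the collection densely from index 1.
   SELECT LEVEL
     BULK COLLECT INTO l_numbers
     FROM DUAL
   CONNECT BY LEVEL <= 3;

   DBMS_OUTPUT.put_line (
         'First index: '
      || l_numbers.FIRST
      || ', last index: '
      || l_numbers.LAST);
END;
/
```

The block prints "Rows fetched: 0" followed by "First index: 1, last index: 3", which is why a loop from 1 to COUNT is safe after any BULK COLLECT.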