1. Optimization
The integration service is powerful and complex in development and management. ETL packages can provide functions but cannot meet performance requirements. The following are some suggestions for improving the performance.
1.1 overall settings for speed improvement
1. Optimize data sources
(1) Delete unnecessary columns;
(2) Use the SQL SELECT statement to extract data from the view; avoid using the access mode of "table or view" in the "OLE DB source" adapter.
(3) Optimize Data Source Query;
2. Use flat files and other file sources
(1) combine adjacent undesired columns in a flat file;
(2) Keep unnecessary columns in the flat file as the "string" type;
3. Clear hidden operations
(1) Parse or convert columns only when necessary;
(2) Use the "fastparse" option in the "flat file source" adapter;
(3) Clear unnecessary log records;
1.2 Data Flow Optimization
(1) Using Indexes
(2) Implementing Parallelism
(3) Clear unnecessary operations
(4) Search for reference pairs; avoid using "select * from"
(5) perform data set-based operations in SQL Server. For example, in SQL Server, the speed of the "sort" conversion component is faster, especially when there is an index.
(6) Note the conversion of internal file IO. It can improve the performance of the hard drive.
(7) Transfer to a 64-bit Operating System
(8) monitor memory-intensive conversions, such as "aggregation", "Search", and "sorting ".
(9) use the "Search" partial cache or full cache Mode
Partial cache: It is applicable when the incoming data is repetitive and only references a small part of the data in the reference table. Full cache: Applicable to rows that use all data in the entire referenced table range when the referenced table is small.
(10) Remove "aggregation", "Search", and "sort" operations
(11) increase"DefaultbuffermaxsizeAndDefaultbuffermaxrows", But it cannot be too big.
(12) use"Enginethreads"(Engine thread ). You can use the log to test the actual number of threads required, and then set the consistency.
(13) Set"Maxconcurrentexecutables"(Maximum number of concurrent course execution programs) if the value is"-1 ", the number of threads used will be the number of processors in the computer plus 2. if the control flow has 10 data streams and is set to "4", only four data streams can be run simultaneously. ToEnginethreads"Combined with settings.
(14) use components with high overhead with caution, such as oledb commands, fuzzy search, data conversion, export column/to column, and word search.
1.3 insert Performance
1. Use the "SQL Server target" adapter;
2. Set "Submit capacity ";
3. Open the table lock ";
4. Disable constraints;
5. Use "least log records ";
6. Disable indexes when loading data again;
1.4 how to optimize using constraints Theory
1. Five Steps of constraint theory
(1) Recognition of constraints
(2) Determine how to use Constraints
(3) coordinate and synchronize all other aspects based on previous decisions
(4) Improve the Performance of constraints
(5) If the constraints change in any of the above steps, return to the first step.
2. The most common top priority Constraint
Memory, CPU speed, CPU quantity, disk drive speed, and network speed.
3. Differences between Benchmark Test constraints and bottlenecks
4. performance counters
· Buffers in use
· Flat buffers in use
· Private buffers in use
· Rows read
· Rows written
· Buffers spooled