ETL tools vs. SQL: collected opinions

Source: Internet
Author: User

I have used both. Informatica is easier to manage over the long run, especially for data corrections: when data has to be backfilled at a later stage, the data flow is clear at a glance.
SQL is efficient but inconvenient to maintain later; tracing a data flow takes a long time.
ETL tools are easier to manage and maintain, especially for complicated cleansing processes.

ETL tools suit fixed, stable processes;
SQL code suits one-off, standalone work that changes frequently.

Whatever an ETL tool can do, SQL can do too, but the reverse does not hold.

The benefits of ETL tools are standardized management of processing flows, visibility into task scheduling, and visibility into the data audit process.

I personally think the two approaches will coexist.

ETL tools obviously help with standardization.
A good DBA can write more elegant and concise statements, but that is not conducive to maintenance and standardization.

First, it depends on the scale of the project. Party A (the client) invests little in some BI projects, and if a third-party tool is bought, Party B (the contractor) has little margin left. So in practice much of the ETL is written in SQL during project development, or built with ETL tools developed by Party B, which basically cannot reach the same level as DS.
Second, a commercial ETL tool is used only on large BI projects or when Party A has already invested in one, and even then it should be used together with SQL.
Third, it depends on the complexity of the ETL logic. In general an ETL tool is more flexible and simpler to use than SQL, but it struggles with complicated business logic such as data splitting, where one record is split into several records according to business rules. This involves how many decimal places a Numeric value retains, and whether the split records still sum to the total before splitting.
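The splitting problem in the third point can be sketched in a few lines. This is only an illustration, not something from the thread: the function name split_amount, the proportional weights, and the "largest remainder" rule for handing out leftover cents are all assumptions. It also assumes the total is non-negative and already rounded to the target number of decimal places.

```python
from decimal import Decimal, ROUND_DOWN


def split_amount(total, weights, places=2):
    """Split total in proportion to weights; the rounded parts sum back to total."""
    quantum = Decimal(10) ** -places          # Decimal("0.01") when places=2
    total = Decimal(total)
    weights = [Decimal(w) for w in weights]
    weight_sum = sum(weights)

    # Unrounded shares, then each share rounded down so the parts never exceed the total.
    exact = [total * w / weight_sum for w in weights]
    shares = [x.quantize(quantum, rounding=ROUND_DOWN) for x in exact]

    # Rounding down loses a few quanta; give them back one at a time,
    # starting with the shares that lost the most.
    leftover_units = int((total - sum(shares)) / quantum)
    by_loss = sorted(range(len(shares)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in by_loss[:leftover_units]:
        shares[i] += quantum
    return shares


parts = split_amount("100.00", [1, 1, 1])
print(parts, sum(parts))
```

Naive rounding of each part would give 33.33 three times and lose a cent; here the first part absorbs it, so the split records reconcile with the original record. Which record should absorb the difference is a business rule, and the rule used above is only one possible choice.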


Heterogeneous sources still call for an ETL tool. If you need to extract text files, MF data, DB2, SQLServer, and Paradox data into an Oracle warehouse, an ETL tool seems to be the only option.
In addition, the ETL division of labor in large enterprises is very fine-grained:
mapping, workflow, monitoring, and data processing are handled by different teams. In my previous project, the US team was responsible for mapping and workflow.
The China team had only monitor permission. It mainly watched the loading speed and progress of the ETL jobs; if there was a performance problem or a data error, it paged the DBA team or another team to handle it.
Large companies have strict requirements on permissions, and such requirements have to be implemented through ETL tools.


One of the major advantages of ETL tools over SQL is code maintainability.
How SQL gets written depends on the developer's experience and habits. If the original developer leaves the company and another developer takes over, reading the previous developer's code takes a long time.
For complex data processing, however, SQL should do the work, nested inside the ETL tool.

ETL tools make later maintenance efficient. SQL code is flexible, and if the developer is skilled, its data processing efficiency is high.


Here is my personal opinion, summarizing a few things from previous work.
ETL and SQL are completely different things.
ETL is just what its name says: the process of extracting, cleaning, and loading. Of course, you could say that all of this can be built directly in SQL.
ETL is more of a solution. In the process of cleaning and loading data sources into a data warehouse, to take the previous example, you can define your own trigger mechanism and distribute task execution across the system's sub-threads.
Of course, you can also solve the problem with SQL plus other languages.
In other words, I personally feel that SQL is just a language, a way of operating on data, while ETL is more of a solution for data processing and mining.


I agree with everyone's points:
1. Tools are easy to standardize and maintain;
2. Tools have their own special advantages, such as parallel processing at various points; CDC, row-column conversion, and SCD can be implemented with simple settings; metadata management;
3. Basically, whatever tools can implement, SQL can implement too, although it sometimes takes complicated SQL statements;
4. SQL is more flexible and has the advantage in processing complex logic;
5. SQL is not easy to maintain and manage.
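
To make points 2 and 3 concrete, here is a rough sketch of a row-to-column conversion written by hand in SQL, the kind of step a tool offers as a simple setting. The table sales_long, its columns, and the sample rows are made up for illustration, and SQLite (through Python's sqlite3 module) merely stands in for whatever database is actually used.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sales_long (region TEXT, quarter TEXT, amount INTEGER);
    INSERT INTO sales_long VALUES
        ('north', 'Q1', 10), ('north', 'Q2', 20),
        ('south', 'Q1', 5),  ('south', 'Q2', 7);
""")

# Conditional aggregation: one CASE expression per output column.
pivot_sql = """
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
    FROM sales_long
    GROUP BY region
    ORDER BY region
"""
pivoted = db.execute(pivot_sql).fetchall()
print(pivoted)
```

The statement works, but every new quarter means editing it by hand, which is the maintenance cost described in point 5.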

Implement the transformations in SQL (including stored procedures), where the logic stays clear, and let an open-source ETL tool handle the scheduling. This saves cost and is efficient.
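
A toy version of that division of labor might look like the following. It is a sketch under assumptions, not a description of any real tool: the step names, the SQL statements, and the dependency table are invented, SQLite stands in for the warehouse, and Python's graphlib.TopologicalSorter (Python 3.9+) plays the part of the scheduler.

```python
import sqlite3
from graphlib import TopologicalSorter

# Transformation logic lives in SQL, one statement per step (hypothetical steps).
steps = {
    "load_raw": "CREATE TABLE raw_orders AS "
                "SELECT 1 AS id, ' 12.50 ' AS amount UNION ALL SELECT 2, 'n/a'",
    "clean": "CREATE TABLE clean_orders AS "
             "SELECT id, CAST(TRIM(amount) AS REAL) AS amount "
             "FROM raw_orders WHERE TRIM(amount) GLOB '[0-9]*'",
    "aggregate": "CREATE TABLE order_totals AS "
                 "SELECT COUNT(*) AS n, SUM(amount) AS total FROM clean_orders",
}

# Scheduling knows only which step must finish before which.
depends_on = {"load_raw": set(), "clean": {"load_raw"}, "aggregate": {"clean"}}


def run_pipeline(conn):
    """Run every step after its prerequisites and return the execution order."""
    executed = []
    for name in TopologicalSorter(depends_on).static_order():
        conn.execute(steps[name])
        executed.append(name)
    return executed


warehouse = sqlite3.connect(":memory:")
run_order = run_pipeline(warehouse)
totals = warehouse.execute("SELECT n, total FROM order_totals").fetchone()
print(run_order, totals)
```

The scheduler never looks inside the SQL, and the SQL knows nothing about ordering, retries, or monitoring; a real scheduling tool would add those last two on top.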

"A lot of people think that tools are not hand-built, and that's why these people don't have the tools"
I agree with you!
By the way, creating an enterprise-level application platform is beyond the reach of pure SQL.

Many people understand ETL tools narrowly, as tools for designing and managing ETL tasks.
In my opinion, ETL tools divide into ETL task design management and ETL scheduling and monitoring management.
Recently TASKCTL seems to be circulating on the Internet, and it is now open-source.

Free ETL scheduling Platform + free ETL Task Design (for example, kettle) = free ETL solution.

The two complement each other. My usual habit is to implement complex business logic in programs, and scheduling and simple business logic in tools.
Both tools and code have their own application scenarios. For example, tools have an advantage in synchronizing heterogeneous data sources, though they suit scenarios where the data volume is not large. Tools also have good scheduling functions, so you do not need to write special programs for scheduling; you can easily do secondary development on top of the tool's base layer, and exceptions are handled well.


The database engine is more efficient with structured data.

Http://www.itpub.net/thread-1581786-1-1.html
