Apr 24, 2020

ETL vs ELT

Reference: https://www.xplenty.com/blog/etl-vs-elt/#conclusion

The Biggest Advantages of ELT

The primary advantage of ELT over ETL relates to flexibility and ease of storing new, unstructured data. With ELT, you can save any type of information—even if you don’t have the time or ability to transform and structure it first—providing immediate access to all of your information whenever you want it. Furthermore, you don’t have to develop complex ETL processes before data ingests, and saves developers and BI analysts time when dealing with new information. 
undefinedThe Biggest Advantages of ETL
One of the biggest advantages of ETL over ELT relates to the pre-structured nature of the OLAP data warehouse. After structuring/transforming the data, ETL allows for speedier, more efficient, more stable data analysis. In contrast, ELT isn't ideal when speedy analysis is desired.undefined

ETL vs. ELT Comparison

ETLELT
Adoption of the technology and availability of tools and expertsETL is a well-developed process used for over 20 years, and ETL experts are readily available.
ELT is a new technology, so it can be difficult to locate experts and more challenging to develop an ELT pipeline compared to an ETL pipeline.
Availability of data in the systemETL only transforms and loads the data that you decide is necessary when creating the data warehouse and ETL process. Therefore, only this information will be available.
ELT can load all data immediately, and users can determine later which data to transform and analyze.
Can you add calculations?Calculations will either replace existing columns, or you can append the dataset to push the calculation result to the target data system.
ELT adds calculated columns directly to the existing dataset.
Compatible with data lakes?ETL is not normally a solution for data lakes. It transforms data for integration with a structured relational data warehouse system.
ELT offers a pipeline for data lakes to ingest unstructured data. Then it transforms the data on an as-needed basis for analysis.
ComplianceETL can redact and remove sensitive information before putting it into the data warehouse or cloud server. This makes it easier to satisfy GDPR, HIPAA, and CCPA compliance standards. It also protects data from hacks and inadvertent exposure.
ELT requires you to upload the data before redacting/removing sensitive information. This could violate GDPR, HIPAA, and CCPA standards. Sensitive information will be more vulnerable to hacks and inadvertent exposure. You could also violate some compliance standards if the cloud-server is in another country.
Data size vs. complexity of transformationsETL is best suited for dealing with smaller data sets that require complex transformations.
ELT is best when dealing with massive amounts of structured and unstructured data.
Data warehousing supportETL works with cloud-based and onsite data warehouses. It requires a relational or structured data format.
ELT works with cloud-based data warehousing solutions to support structured, unstructured, semi-structured, and raw data types.
Hardware requirementsCloud-based ETL platforms (like Xplenty) don't require special hardware. Legacy, onsite ETL processes have extensive and costly hardware requirements, but they are not as popular today.
ELT processes are cloud-based and don't require special hardware.
How are aggregations different?Aggregation becomes more complicated as the dataset increases in size.
As long as you have a powerful, cloud-based target data system, you can quickly process massive amounts of data.
Implementation ComplexityETL experts are easy to procure when building an ETL pipeline. Highly evolved ETL tools are also available to facilitate this process.
As a new technology, the tools to implement an ELT solution are still evolving. Moreover, experts with the requisite ELT knowledge and skills can be difficult to find.
Maintenance requirementAutomated, cloud-based ETL solutions, like Xplenty, require little maintenance. However, an onsite ETL solution that uses a physical server will require frequent maintenance.
ELT is cloud-based and generally incorporates automated solutions, so very little maintenance is required.
Order of the extract, transform, load processData transformations happen immediately after extraction within a staging area. After transformation, the data is loaded into the data warehouse.
Data is extracted, then loaded into the target data system first. Only later is some of the data transformed on an “as-needed” basis for analytical purposes.
CostsCloud-based SaaS ETL platforms that bill with a pay-per-session pricing model (such as Xplenty) offer flexible plans that start at approximately $100 and go up from there, depending on usage requirements. Meanwhile, an enterprise-level onsite ETL solution like Informatica could cost over $1 million a year!
Cloud-based SaaS ELT platforms that bill with a pay-per-session pricing model offer flexible plans that start at approximately $100 and go up from there. One cost advantage of ELT is that you can load and save your data without incurring large fees, then apply transformations as needed. This can save money on initial costs if you just want to load and save information. However, financially strapped businesses may never be able to afford the processing power required to reap the full benefits of their data lake.
Transformation processTransformations happen within a staging area outside the data warehouse.
Transformations happen inside the data system itself, and no staging area is required.
Unstructured data supportETL can be used to structure unstructured data, but it can’t be used to pass unstructured data into the target system.
ELT is a solution for uploading unstructured data into a data lake and make unstructured data available to business intelligence systems.
Waiting time to load informationETL load times are longer than ELT because it's a multi-stage process: (1) data loads into the staging area, (2) transformations take place, (3) data loads into the data warehouse. Once the data is loaded, analysis of the information is faster than ELT.
Data loading happens faster because there's no waiting for transformations and the data only loads one time into the target data system. However, analysis of the information is slower than ETL.
Waiting time to perform transformationsData transformations take more time initially because every piece of data requires transformation before loading. Also, as the size of the data system increases, transformations take longer. However, once transformed and in the system, analysis happens quickly and efficiently.
Since transformations happen after loading, on an as-needed basis—and you transform only the data you need to analyze at the time—transformations happen a lot of faster. However, the need to continually transform data slows down the total time it takes for querying/analysis.



No comments:

Post a Comment