Towards a Scalable and Efficient ETL

El Yazid Gueddoudj - Personal Name
Azeddine Chikh - Personal Name

Extract, transform, and load (ETL) processes are crucial for building data repositories from a variety of self-contained sources. Despite their complexity and cost, ETL processes have reached a degree of maturity for traditional, XML, and graph data sources. However, they face a twofold challenge: (1) they do not scale when applied to large and highly varied data sources, including web data; and (2) the target data warehouse must be deployed in a polystore. The paper reviews research efforts that address these issues. It then proposes a conceptual model of ETL processes using BPMN (Business Process Modeling Notation). The modeled processes are automatically converted to scripts that run on the Spark framework. The solution is packaged according to a new distributed architecture (Open ETL) that supports both batch and stream processing. To make the approach concrete and open to evaluation, the paper considers a real case study based on the LUBM benchmark, which involves heterogeneous data sources.

Detail Information

Series Title	-
Call Number	-
Publisher	International Journal of Computing and Digital Systems : Bahrain, 2023
Collation	005
Language	English
ISBN/ISSN	2210-142X
Classification	NONE
Content Type	-
Media Type	-
Carrier Type	-
Edition	-
Subject(s)	Design; Data warehouse; ETL; Scalability; Spark; BPMN; Polystore
Specific Detail Info	-
Statement of Responsibility	-

Other Information

Accreditation	Scopus Q3
