Towards a Scalable and Efficient ETL
Extract, transform, and load (ETL) processes are crucial for building data repositories from a variety of self-contained sources. Despite their complexity and cost, ETL processes have reached some maturity for traditional, XML, and graph data sources. However, they face a twofold challenge: (1) they do not scale when applied to large and highly varied data sources, including web data; and (2) the target data warehouse must be deployed in a polystore. This paper reviews research efforts in this direction. It then proposes a conceptual model of these processes using BPMN (Business Process Model and Notation). The modeled processes are automatically converted into scripts that run on the Spark framework. The solution is packaged according to a new distributed architecture (Open ETL) that supports both batch and stream processing. To make the approach concrete and evaluable, a case study based on the LUBM benchmark, which involves heterogeneous data sources, is considered.
Availability
No copy data
Detail Information
| Series Title | - |
|---|---|
| Call Number | - |
| Publisher | International Journal of Computing and Digital Systems : Bahrain, 2023 |
| Collation | 005 |
| Language | English |
| ISBN/ISSN | 2210-142X |
| Classification | NONE |
| Content Type | - |
| Media Type | - |
| Carrier Type | - |
| Edition | - |
| Subject(s) | |
| Specific Detail Info | - |
| Statement of Responsibility | - |
Other Information
| Accreditation | Scopus Q3 |
|---|---|
Other version/related
No other version available