Please use this identifier to cite or link to this item:
http://20.198.91.3:8080/jspui/handle/123456789/8882

| Title: | Data engineering with apache spark |
| Authors: | Karmakar, Subhankar |
| Advisors: | Barik, Mridul Sankar |
| Keywords: | Data Analysts;Data Engineers |
| Issue Date: | 2022 |
| Publisher: | Jadavpur University, Kolkata, West Bengal |
| Abstract: | In this era of information, large amounts of data are readily available to scientists and decision makers. The problem is that the data arrives not only in large volume but also with high variety and velocity, and as soon as it is available its veracity and value must be assessed before decisions can be based on it. Before any such analysis, another big challenge is managing this volume of data: extracting it from various sources, transforming and cleaning it into a structured format, and loading it into data warehouses so that it is available to data scientists and data analysts. This process is known as ETL, which stands for Extract, Transform and Load. This is where data engineers come in, as a separate category of experts in the world of data science. Over the years a large number of data tools and products have evolved, and among them Apache Spark has for several years been a favourite of data engineers. Apache Spark is one of the most widely used open-source processing frameworks for big data. It allows large datasets to be processed in parallel across a large number of compute nodes, which makes Spark a unified, general-purpose distributed data processing engine. This thesis gives a brief introduction to how Spark works internally, along with some examples that illustrate how data engineers manage big data. The thesis uses Databricks, a cloud solution built on the Spark processing engine, together with PySpark, Spark's Python API. |
| URI: | http://20.198.91.3:8080/jspui/handle/123456789/8882 |
| Appears in Collections: | Dissertations |
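
The Extract, Transform and Load steps named in the abstract can be sketched in a few lines. This is a minimal illustration, not the thesis's code. It uses only the Python standard library (`csv` and an in-memory `sqlite3` database) as a stand-in for Spark and a data warehouse. The sample data, the `sales` table and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from files, APIs or databases.
RAW_CSV = """id,name,amount
1, alice ,10.5
2,bob,
3, Carol ,7
"""


def extract(text):
    """Extract: read raw rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalise names, drop rows without an amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((int(row["id"]), row["name"].strip().title(), float(amount)))
    return cleaned


def load(records, conn):
    """Load: write the structured records into a warehouse-like table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three stages in order and return the database connection."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a data warehouse
    load(transform(extract(text)), conn)
    return conn
```

Calling `run_etl()` returns a connection whose `sales` table holds only the cleaned, complete records. A Spark pipeline follows the same three stages, with the work distributed across compute nodes.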
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| M.CA (Dept.of Computer Science and Engineering) Subhankar Karmakar.pdf | | 1.95 MB | Adobe PDF | View/Open |
Items in IR@JU are protected by copyright, with all rights reserved, unless otherwise indicated.