Please use this identifier to cite or link to this item: http://20.198.91.3:8080/jspui/handle/123456789/8882
Title: Data engineering with apache spark
Authors: Karmakar, Subhankar
Advisors: Barik, Mridul Sankar
Keywords: Data Analysts; Data Engineers
Issue Date: 2022
Publisher: Jadavpur University, Kolkata, West Bengal
Abstract: In this information era, large amounts of data are readily available to scientists and decision makers. The problem is that the data arrive not only in large volume but also with high variety and velocity, and as soon as they become available their veracity and value must be assessed before any decisions can be based on them. Before the data can be analyzed for decision making, there is another big challenge: managing this high volume of data by extracting it from various sources, transforming and cleaning it into a structured format, and loading it into data warehouses where Data Scientists and Data Analysts can use it. This process is known as ETL, which stands for Extract, Transform and Load, and it is where Data Engineers come in, as a distinct category of experts in the world of data science. Over the years a large number of data tools and products have emerged, and among them Apache Spark has, for several years now, been a data engineer's best friend. Apache Spark is one of the most widely used open-source processing frameworks for big data: it processes large datasets in parallel across many compute nodes, which makes Spark a unified, general-purpose distributed data processing engine. This thesis gives a brief introduction to how Spark works internally, along with some examples that illustrate how data engineers manage big data. The thesis uses Databricks, a cloud solution built on the Spark processing engine, together with PySpark, Spark's Python API.
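To make the three ETL stages described in the abstract concrete, here is a minimal sketch in plain Python using only the standard library. It is not Spark and not code from the thesis: the CSV data, the `sales` table and the `extract`/`transform`/`load` function names are hypothetical, and an in-memory SQLite database stands in for a data warehouse. In PySpark the same stages would correspond to reading a source into a DataFrame, applying DataFrame transformations, and writing the result out.

```python
import csv
import io
import sqlite3

# Hypothetical raw input (illustrative data only): note the stray spaces,
# inconsistent capitalisation and a missing amount.
RAW_CSV = """id,name,amount
1, Alice ,100
2,Bob,
3,carol,250
"""

def extract(text):
    """Extract: read raw rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalise names, drop rows missing an amount, cast types."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # drop incomplete records
        clean.append((int(row["id"]), row["name"].strip().title(), int(amount)))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a table in the stand-in 'warehouse'."""
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory SQLite standing in for a warehouse
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM sales ORDER BY id").fetchall())
# → [(1, 'Alice', 100), (3, 'Carol', 250)]
```

Keeping each stage as a separate function mirrors how ETL pipelines are usually structured: each step can be tested or replaced independently, for example swapping the SQLite load for a write to a real warehouse.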
URI: http://20.198.91.3:8080/jspui/handle/123456789/8882
Appears in Collections: Dissertations

Files in This Item:
File: M.CA (Dept.of Computer Science and Engineering) Subhankar Karmakar.pdf
Size: 1.95 MB
Format: Adobe PDF


Items in IR@JU are protected by copyright, with all rights reserved, unless otherwise indicated.