A systematic review of SQL-on-Hadoop by using compact data formats

Date
2016-10-30
Authors
Plase, Daiga
Abstract
Huge volumes of raw data are generated every day, which raises the question of how to store these data so that they can be accessed quickly. Research on Big Data projects that use Hadoop, MapReduce-style frameworks, and compact data formats shows that two formats, Avro and Parquet, support schema evolution and compression in order to use less storage space. This paper presents a systematic review of SQL-on-Hadoop using Avro and Parquet, covering six years of publications (2010–2015) from conference proceedings and journals in IEEE Xplore, the ACM Digital Library, and ScienceDirect. The search strategy identified 94 research papers, of which 17 were analyzed as relevant. The review concludes that no direct comparison of Avro and Parquet in terms of compactness and speed exists in the data science literature.
Description
Article also submitted for publication in Baltic J. Modern Computing (BJMC) on October 5, 2016.
Keywords
Systematic review, Big Data, Hadoop, HDFS, Avro, Parquet
Citation