Abstract:
Nothing is growing as dramatically as data, which is increasing along several dimensions at once. The
sustained growth in data size has given rise to a new phenomenon. A close look at
the current attributes of data shows that it is growing with high velocity, high volume, and
high variety (the 3Vs, or V3), a combination known as Big Data. In the IT industry, research communities and data
scientists have observed that Big Data has challenged legacy solutions. ‘Big Data’ is an all-encompassing term for any collection of data or data sets so large and complex that it is
difficult to process and manage using traditional data processing applications and existing
RDBMSs. When Big Data is handled with a traditional RDBMS,
the most important challenges include analysis, capture, curation, search, sharing, storage,
transfer, visualization, and privacy. As data grows along several dimensions, with varied
structure (i.e. structured, semi-structured, and unstructured) and attributes (3V), Relational Database Management Systems (RDBMSs) face a further set of limitations. Because of these
limitations, data scientists and information managers are forced to consider
alternative solutions for handling such Big Data. Various studies have already indicated that existing RDBMSs can provide only short-term solutions. Existing data handling technologies such as RDBMS and DBMS are efficient enough when the data is structured and limited in volume, velocity, and variety. However, if the data is huge, comes in different varieties (e.g. voice, text, video), and is generated at high velocity (e.g. petabytes per second), it becomes difficult to handle with existing RDBMS and DBMS tools and technologies [1].
The major research questions to be answered by this research are: What are the
current challenges in the data industry? Which technology is appropriate for handling data
that is generated with high volume, high velocity, and high variety? Are existing RDBMSs
capable of handling unstructured and semi-structured data? Which
technology is most feasible for handling and analyzing such Big Data? Is it possible to provide
visual patterns in an RDBMS or DBMS for a better understanding of the multilayered knowledge or
information hidden in the data? What parameters must decision makers consider when selecting Big Data
and RDBMS tools? These challenges clearly indicate that the IT industry needs a long-term solution offering high accuracy, high volume, high speed, 3D visualization, and support for a variety of data, in line with modern information analytics.
In this research, an attempt has been made to rigorously analyze the capabilities of RDBMS, and
a performance analysis has then been carried out in comparison with Big Data handling tools and technologies as alternative solutions, to help decision makers choose suitable long-term solutions that can handle data and information with the 3Vs. The features considered for the discourse analysis in this research are resource consumption, execution time, on-demand scalability, maximum data size, structure of the data, data visualization, ease of deployment, cost, and security.
This thesis is an exploratory research effort. First, it reviews the literature to identify
the gaps in existing research with respect to the proposed problem and the desired solutions in
Big Data analytics, and concludes with a note on the researchability of the title. Second, the Big
Data sets were collected from publicly available data sources, i.e. Telned and
Hortonworks. Big Data and RDBMS tools were then selected for performance measurement and for organizing the experimental data. An analysis and conclusions were drawn from database engine ranking statistics and from facts collected in formal interviews with domain experts. Experiments based on the selected parameters were carried out to complete the comparative analysis of RDBMS vs. Big Data tools and technologies. Finally, the research provides a decision support matrix/system to help decision makers select the appropriate technology based on the nature of the data to be handled in the target organization.