Shark wins over Hive

Sarvesh Gupta

July 22, 2015

Not long ago, Apache™ Hadoop® (a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models) emerged as a solution to big data challenges. However, it had some inherent performance and latency issues, because Hadoop is designed for batch processing and not for real-time queries. Another challenge with Hadoop is that it requires thinking in terms of ‘Map-Reduce’, which was a deterrent for SQL engineers. To address this, a data warehouse system called Hive was introduced. Hive wrapped the Map-Reduce nitty-gritty in an SQL-like interface with its Hive Query Language. However, this did not resolve the inherent issue with Hadoop’s Map-Reduce approach, namely latency.

As a result of these challenges, open-source tools such as Spark, Impala and HAWQ emerged, using various techniques to reduce the latency associated with batch-based Hadoop jobs. Shark, which runs on Spark, is one such tool; it speeds up both in-memory and on-disk queries. Impala, another such tool, works well with Hive/HDFS and resembles traditional parallel databases.

With our passion for technology, we at Tavant have tested these emerging solutions to evaluate their performance in real-world cases.

Given below is our analysis of Shark:

We simulated a total of six ad servers, each producing a structured set of logs capturing the details of ad requests and deliveries. We generated 4 million requests per hour per ad server, which brought the log size on each server to 125 MB per hour. We then set up two clusters: one with Hadoop/Hive and one with Spark/Shark. Both clusters used the same machine configuration: OS: Ubuntu 12.04 LTS; RAM: 2 GB; number of nodes: 2.
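To put the stated figures in perspective, the short calculation below works out the combined hourly volume across the six simulated servers and the average log size per request. It is only a back-of-the-envelope sketch: it assumes 1 MB means 1,000,000 bytes and that all six servers produce logs at the same rate, and it says nothing about how many hours of logs were actually loaded for the test.

```python
# Figures stated in the post.
AD_SERVERS = 6
REQUESTS_PER_SERVER_PER_HOUR = 4_000_000
LOG_MB_PER_SERVER_PER_HOUR = 125

# Assumption: 1 MB = 1,000,000 bytes (the post does not say which convention it uses).
BYTES_PER_MB = 1_000_000

# Combined hourly volume across all simulated ad servers.
total_requests_per_hour = AD_SERVERS * REQUESTS_PER_SERVER_PER_HOUR
total_log_mb_per_hour = AD_SERVERS * LOG_MB_PER_SERVER_PER_HOUR

# Average size of one log record, derived from the per-server figures.
avg_bytes_per_request = (
    LOG_MB_PER_SERVER_PER_HOUR * BYTES_PER_MB / REQUESTS_PER_SERVER_PER_HOUR
)

print(f"Requests per hour (all servers): {total_requests_per_hour:,}")
print(f"Log volume per hour (all servers): {total_log_mb_per_hour} MB")
print(f"Average bytes per request: {avg_bytes_per_request}")
```

Under these assumptions, one hour of logs from all six servers is well under 1 GB, which is worth keeping in mind when reading the results from a two-node cluster with 2 GB of RAM.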

We executed a query to find out the number of requests, impressions and clicks based on the geographical location of the user.
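The post does not reproduce the query itself. As a rough illustration of its shape, the sketch below runs a comparable aggregation against a tiny in-memory table. Everything in it is an assumption made for the example: the table name `ad_logs`, the columns `geo`, `impression` and `click`, and the sample rows are hypothetical, and SQLite stands in for Hive and Shark purely so that the snippet is runnable on its own.

```python
import sqlite3

# Hypothetical sample data: one row per ad request, with 0/1 flags for
# whether an impression was served and whether it was clicked.
ROWS = [
    ("US", 1, 0),
    ("US", 1, 1),
    ("US", 0, 0),
    ("IN", 1, 0),
    ("IN", 1, 1),
    ("DE", 0, 0),
]

# Hypothetical query: count requests, impressions and clicks per location.
QUERY = """
SELECT geo,
       COUNT(*)        AS requests,
       SUM(impression) AS impressions,
       SUM(click)      AS clicks
FROM ad_logs
GROUP BY geo
ORDER BY geo
"""


def summarize_by_geo(rows):
    """Load rows into an in-memory table and return per-location totals."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ad_logs (geo TEXT, impression INTEGER, click INTEGER)"
        )
        conn.executemany("INSERT INTO ad_logs VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


RESULT = summarize_by_geo(ROWS)
for geo, requests, impressions, clicks in RESULT:
    print(geo, requests, impressions, clicks)
```

A grouped aggregation of this kind forces a full scan of the logs followed by a shuffle on the grouping key, which is the sort of work where the difference between a batch engine and an in-memory engine tends to show up.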

The following infographic illustrates the execution time recorded in both cases:

From these results, we infer that Shark outperforms Hive on this workload.

However, we witnessed a few issues with Shark:

The memory available to the Shark process must be chosen carefully, based on the size of the data to be processed, in order to avoid an ‘Out of Memory’ error.
Shark’s performance advantage over Hive is not a constant factor. Heavier workloads and different queries may show a smaller gap between the execution times of Shark and Hive.

Nonetheless, Shark seems a good option at this point. We expect future releases of Shark to bring more features and upgrades.

Don’t miss our next blog: ‘Evaluation of Impala’.
