



Joins

Two Tables

Retail_invoice = LOAD '/Retail_invoice_hdfs' USING PigStorage('\t') as (uniq_idi:chararray, InvoiceNo:chararray, StockCode:chararray, Description:chararray, Quantity:int);
DESCRIBE Retail_invoice;

Retail_Customer = LOAD '/Retail_Customer_hdfs' USING PigStorage('\t') as (uniq_idc:chararray, InvoiceDate:chararray, UnitPrice:int, CustomerID:chararray, Country:chararray);
DESCRIBE Retail_Customer;

Left Outer Join

Left_join = JOIN Retail_invoice BY uniq_idi LEFT OUTER, Retail_Customer BY uniq_idc;
DESCRIBE Left_join;
DUMP Left_join;

Right Outer Join …


301.4.6-Filter & Sorting

Filter

The basic syntax for filtering is the FILTER operator, then the name of the relation to be filtered, followed by BY and the condition.

Filtering Rows: Filter on a Numerical Variable

First, we will see how to filter on a numerical variable.

Retail_Customer_F1 = FILTER Retail_Customer_pig …
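The FILTER syntax described in this teaser can be sketched as follows. This is only a minimal illustration under stated assumptions, not the post's own example: the relation name Sales_demo, the path '/sales_demo_hdfs', and the fields item and unit_price are all hypothetical.

```pig
-- Hypothetical relation, path, and schema, used for illustration only
Sales_demo = LOAD '/sales_demo_hdfs' USING PigStorage('\t') AS (item:chararray, unit_price:int);

-- FILTER <relation> BY <condition>: keep only the rows whose unit_price exceeds 10
Sales_demo_F1 = FILTER Sales_demo BY unit_price > 10;

DUMP Sales_demo_F1;
```

The filtered relation keeps the same schema as its input; only the number of rows changes.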


301.4.5-Group By

Group By in Pig

In this section we look at grouping in Pig. Grouping is very important because most Pig functions take a bag as their input parameter.

Grouping Before Using Functions

Most of the built-in functions in Pig take …
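To illustrate why grouping comes before functions, here is a small sketch, assuming a hypothetical relation Sales_demo loaded from a hypothetical path '/sales_demo_hdfs' with fields country and unit_price. GROUP produces one row per key, holding a bag of the matching tuples, and aggregate functions such as COUNT and SUM then operate on that bag.

```pig
-- Hypothetical relation, path, and schema, used for illustration only
Sales_demo = LOAD '/sales_demo_hdfs' USING PigStorage('\t') AS (country:chararray, unit_price:int);

-- Each output row has the form (group, {bag of Sales_demo tuples for that country})
Sales_by_country = GROUP Sales_demo BY country;

-- The aggregate functions take the bag (or one field projected from it) as input
Country_stats = FOREACH Sales_by_country GENERATE
    group AS country,
    COUNT(Sales_demo) AS n_rows,
    SUM(Sales_demo.unit_price) AS total_price;

DUMP Country_stats;
```

Running DESCRIBE on Sales_by_country shows the bag column, which is what the aggregate functions consume.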




Functions

In this section we will talk about functions in Pig. Functions are an important concept: while doing analysis we might use several kinds of functions, such as sum, count, average, summary functions, numerical functions, string functions, etc. Writing the MapReduce code for each one of them …


301.4.1-Pig Introduction

In this session we are going to learn the basics of Pig, such as what Pig is, Pig architecture, Pig Latin scripts, basic Pig operations, loading data into Pig, group by, filtering, sorting, functions in Pig, joins in Pig …




Hive Joins

Hive supports only equality joins (a.key = b.key). It doesn’t support a.key < b.key type joins as of now. Joins with non-equality conditions are very difficult to express as a map/reduce job.

Two Tables

Online_Retail_Customer, Online_Retail_Invoice

Push the datasets onto HDFS

hadoop fs -copyFromLocal /home/hduser/datasets/Online_Retail_Sales_Data/Online_Retail_Customer.txt /Online_Retail_Customer
hadoop fs -copyFromLocal /home/hduser/datasets/Online_Retail_Sales_Data/Online_Retail_Invoice.txt …


301.3.6-Basic commands using HQL

Count(*) in Hive

A simple count query:

select count(*) from stack_overflow_tags;

Group By in Hive

The output is too big, so we need to give an output file path. This command can be executed from a regular (non-Hive) terminal:

hive -e "select tag1, count(*) as tag1_count from …
