301.1.4-Handling Big Data

Bigdata Tool

  • Analysing big data can yield valuable insights.
  • But, by definition, big data cannot be handled using conventional tools.
  • The datasets are complex, huge and difficult to process.
  • What is the solution?

Handling Bigdata – Using super computers

  • A supercomputer is one solution.
  • Put multiple CPUs in one machine (say, 100) and it will produce results quickly.
  • A normal laptop struggles with big data. The dataset itself may be 1 PB or 16 PB, while a normal system might have just 1 TB of hard disk space, so even acquiring and storing the data is difficult, let alone analysing it.
  • With a supercomputer, instead of one CPU we have many CPUs, and instead of one hard disk we have a huge amount of storage, so the machine can handle big data.
  • The problem is that building a supercomputer costs so much that only very large institutes such as NASA or ISRO, or very large companies, can afford one.
  • The cost of buying a supercomputer can sometimes exceed the value of whatever results we get out of the big data.
  • A large dataset should not by itself force us to invest heavily in a single computer.
  • A supercomputer is a solution, but not a cost-effective one. For individuals it is almost impossible to buy a supercomputer just to perform these operations.
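
To make the storage gap above concrete, here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes); the function name `disks_needed` is only illustrative.

```python
# Assumption: decimal storage units.
TB = 10**12  # bytes in one terabyte
PB = 10**15  # bytes in one petabyte

def disks_needed(dataset_bytes, disk_bytes):
    """Return how many disks of size disk_bytes are needed to hold the dataset."""
    return -(-dataset_bytes // disk_bytes)  # ceiling division

laptop_disk = 1 * TB
for size_pb in (1, 16):
    n = disks_needed(size_pb * PB, laptop_disk)
    print(f"{size_pb} PB needs {n} disks of 1 TB each")
```

Even the smaller 1 PB dataset would need a thousand laptop-sized disks just for storage, before any analysis starts.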

Handling Bigdata: Is there a better way?

  • Until about 1985, there was no practical way to connect multiple computers.
  • Computers were centralized, standalone systems.
  • Multi-processor systems or supercomputers were the only options for big data problems.
  • After 1985, powerful microprocessors and high-speed computer networks became available.

Handling Bigdata: Distributed systems

  • Computer networks (LANs and WANs) led to distributed systems.
  • A distributed system is a collection of independent computers that appears to its users as a single coherent system.
  • We can use low-priced connected computers to process our big data.

Cluster Computing

  • A cluster is simply a few machines connected to each other through a LAN or WAN.
  • A collection of independent computers joined together using a LAN is called a computer cluster.
  • Since it is really difficult for a single machine to handle big data, we can use distributed computing or cluster computing instead.

Handling Bigdata: Distributed computing

Distributed computing

  • We start with the overall task, then divide the data into smaller pieces and place them on different machines.
  • Low-end machines can each handle a small dataset, so a huge dataset is divided into smaller pieces and distributed across these machines.
  • We connect the machines using a LAN or WAN, and the whole cluster of computers can be made to work like one really big supercomputer.
  • The overall problem is likewise divided into smaller pieces, which are placed on the machines and run locally on each one.
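
The divide, run locally, combine idea above can be sketched on a single computer. This is only an illustration, not a real cluster: worker threads stand in for the machines, and the names `split_into_chunks`, `local_task` and `distributed_mean` are hypothetical. It assumes a non-empty list of numbers.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, n_machines):
    """Divide the data into at most n_machines nearly equal pieces."""
    size = -(-len(data) // n_machines)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def local_task(chunk):
    """Work done locally on one machine: a partial sum and a count."""
    return sum(chunk), len(chunk)

def distributed_mean(data, n_machines=4):
    """Compute the mean by combining partial results from each piece."""
    chunks = split_into_chunks(data, n_machines)
    # Assumption: worker threads stand in for the machines of a cluster.
    with ThreadPoolExecutor(max_workers=n_machines) as pool:
        partials = list(pool.map(local_task, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

data = list(range(1, 1001))
print(distributed_mean(data))
```

Each piece is processed without needing the rest of the data, and only the small partial results (a sum and a count) are brought together at the end.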

