- Analyzing this big data can give us valuable insights.
- But, by definition, big data can’t be handled using conventional tools.
- The datasets are complex, huge, and difficult to process.
- What is the solution?
Handling Bigdata: Using supercomputers
- A supercomputer is one solution.
- Put multiple CPUs in one machine (100?) and it will give the result quickly.
- A normal laptop finds it very difficult to handle big data. The dataset itself may be 1 PB or even 16 PB, while a normal system might have just 1 TB of hard disk space. Acquiring and storing the data becomes difficult, let alone analyzing it.
- With a supercomputer, we can put in multiple CPUs instead of one and a huge hard disk instead of a single small one, so it can handle big data.
- The problem is that building a supercomputer costs so much that only institutes like NASA or ISRO, or other really big institutes and companies, can afford one.
- The cost of buying a supercomputer can sometimes be higher than the value of the results we get out of the big data.
- A large dataset should not mean that we have to invest heavily in the computer.
- A supercomputer is a solution, but not a cost-effective one; for individuals, it is almost impossible to buy a supercomputer just to perform these operations.
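To make the storage gap above concrete, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 PB = 1000 TB) and a hypothetical helper name, `disks_needed`; neither comes from the notes themselves.

```python
import math

TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def disks_needed(dataset_pb: float, disk_tb: float = 1.0) -> int:
    """Return how many disks of size disk_tb are needed to hold dataset_pb."""
    return math.ceil(dataset_pb * TB_PER_PB / disk_tb)


# The two dataset sizes mentioned in the notes, against a 1 TB laptop disk.
for size_pb in (1, 16):
    print(f"{size_pb} PB needs {disks_needed(size_pb)} disks of 1 TB each")
```

Under these assumptions, even the smaller dataset is about a thousand times larger than the laptop's disk, so simply storing it on one ordinary machine is out of reach.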
Handling Bigdata: Is there a better way?
- Until 1985, there was no practical way to connect multiple computers.
- All computers were centralized, individual systems.
- Multi-core systems or supercomputers were the only options for big data problems.
- After 1985, powerful microprocessors and high-speed computer networks became available.
Handling Bigdata: Distributed systems
- Computer networks (LANs and WANs) led to distributed systems.
- A distributed system is a collection of independent computers that appears to its users as a single coherent system.
- We can use some low-priced, connected computers to process our big data.
- A cluster is simply a few machines connected through a LAN or WAN.
- More precisely, a collection of independent computers joined together using a LAN is called a computer cluster.
- Because it is really difficult for a single machine to handle big data, we can use distributed computing (cluster computing) instead.
Handling Bigdata: Distributed computing
- We start with the overall final task, then divide the data into smaller pieces and place them on different machines.
- Smaller, low-end machines can each handle a small dataset, so a huge dataset can be divided into smaller pieces and distributed across all of these machines.
- We then connect the machines using a LAN or WAN, and the whole cluster of computers looks and works like one really big supercomputer.
- Put the pieces on each of the machines, divide the overall problem into smaller pieces, and run them locally on each machine.
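The divide, distribute, and run-locally steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: worker threads on one computer stand in for the separate machines of a cluster, the "task" is a simple sum, and the final step that combines the partial results is assumed, since the notes do not spell it out. The names `split_into_chunks`, `local_task`, and `run_on_cluster` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_machines):
    """Divide the dataset into num_machines pieces of nearly equal size."""
    chunk_size, remainder = divmod(len(data), num_machines)
    chunks, start = [], 0
    for i in range(num_machines):
        # The first `remainder` chunks get one extra item each.
        end = start + chunk_size + (1 if i < remainder else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def local_task(chunk):
    """Work done on one 'machine', using only its own piece of the data."""
    return sum(chunk)


def run_on_cluster(data, num_machines=4):
    """Split the data, run local_task on each piece, then combine the results."""
    chunks = split_into_chunks(data, num_machines)
    # Assumption: each worker thread plays the role of one low-end machine.
    with ThreadPoolExecutor(max_workers=num_machines) as pool:
        partial_results = list(pool.map(local_task, chunks))
    # Assumed final step: merge the per-machine results into one answer.
    return sum(partial_results)


dataset = list(range(1, 1_000_001))
total = run_on_cluster(dataset, num_machines=4)
print(total == sum(dataset))  # the distributed answer matches the single-machine one
```

In a real cluster the pieces would live on different computers and travel over the LAN or WAN, but the shape of the computation stays the same: no single worker ever needs the whole dataset.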