Terminology and Technologies in Big Data

Big Data

Big Data is one of the most in-demand technologies. Startups, big tech vendors, and companies big and small are all jumping on the Big Data craze. The reason is that the amount of data is roughly doubling every two years.

It is used to handle massive amounts of information in all sorts of formats -- tweets, posts, e-mails, documents, audio, video, feeds etc.

A growing number of technologies make up the Big Data world, including all sorts of analytics, in-memory databases, NoSQL databases, and Hadoop, to name a few. We will briefly look at the various tools and terms that are making huge inroads in this technology stack.


Hadoop is a crucial technology at the center of the whole Big Data world.

It is open source software used to gather and store vast amounts of data and analyze it on low-cost commodity hardware. For instance, banks may use Hadoop for fraud detection, and online shopping services could use it to analyze customers' buying patterns. Such analysis can have a huge impact once it is integrated into a CRM system.


Cassandra is a free and open source NoSQL database.

It's a kind of database that can handle and store data of different types and sizes, and it's increasingly the go-to database for mobile and cloud applications. Several high-profile companies, including Apple and Netflix, use Cassandra.


MapReduce has been called "the heart of Hadoop."

MapReduce is the programming model that lets Hadoop process data spread across many low-cost computer servers in parallel (storing that data is the job of the Hadoop Distributed File System). To get meaningful results out of Hadoop, a programmer writes MapReduce programs, often in the popular language Java.
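To give a feel for what such a program does, here is a minimal single-machine sketch of the map, shuffle, and reduce steps, using word counting as the task. It is plain Python rather than Hadoop's Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration; a real Hadoop job would run the map and reduce steps on many servers at once.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

Because each line is mapped independently and each key is reduced independently, the work can be split across machines, which is the property Hadoop relies on.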


Cloudera is a company that makes a commercial version of Hadoop.

Although Hadoop is a free and open-source project for storing large amounts of data on inexpensive computer servers, the free version of Hadoop is not easy to use. Several companies have created friendlier versions of Hadoop, and Cloudera is arguably the most popular one.


HBase is yet another project based on the popular Hadoop technology.

Hadoop is a way to store all kinds of data across many low-cost computer servers. Once that data is stored using the Hadoop Distributed File System (HDFS), HBase can sort through that data and group bits of data together, somewhat similar to how a traditional database organizes data.


Pig is another hot skill, thanks to demand for technologies like Big Data and Hadoop.

Pig provides a programming language (called Pig Latin) that helps extract information from Hadoop, for example to find answers to certain questions or otherwise put the data to use.


Flume is yet another skill spawned from the Big Data craze and the popularity of Hadoop.

Hadoop is a way to store all kinds of data across many low-cost computer servers. Flume is a tool for moving massive amounts of data from the place it was created into a Hadoop system.


Hive is yet another hot, in-demand skill, courtesy of Big Data and the popularity of Hadoop.

Hadoop is a way to store all kinds of data across many low-cost computer servers. Hive provides a way to extract information from Hadoop using the same kind of SQL-style queries used by regular databases. (In geek speak: it gives Hadoop a database query interface.)


NoSQL is a new kind of database that is part of the big data phenomenon.

NoSQL has sometimes been called the cloud database. Regular databases need data to be organized in advance: names and account numbers need to be structured and labeled. NoSQL doesn't require that. It can work with all kinds of documents.

There are a number of popular NoSQL databases, including MongoDB, Couchbase, and Cassandra.
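The "all kinds of documents" idea can be sketched in a few lines of plain Python. This is only an in-memory illustration, not the API of MongoDB, Couchbase, or Cassandra; the `collection` list and the `insert` and `find` helpers are hypothetical names invented for this example.

```python
import json

# Hypothetical in-memory stand-in for a document collection.
collection = []


def insert(doc):
    """Store a JSON-compatible copy of the document; no schema is checked."""
    collection.append(json.loads(json.dumps(doc)))


def find(**criteria):
    """Return every document whose fields match all the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Two documents with different fields live side by side.
insert({"name": "Asha", "account": 1001})
insert({"name": "Ben", "tweets": ["hello"], "city": "Pune"})

if __name__ == "__main__":
    print(find(name="Ben"))
```

Note that the second document has fields the first one lacks, and nothing had to be declared up front; that flexibility is the trade-off against the fixed tables of a regular database.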


ZooKeeper is a free and open-source project that also came from the Big Data craze, particularly the hugely popular tech called Hadoop.

Hadoop is a way to store all kinds of data across many low-cost computer servers. ZooKeeper is organized somewhat like a file system: it gives the servers in a Hadoop cluster a way to name, locate, and synchronize the configuration and state they share. It is now also used with other Big Data technologies beyond Hadoop.


Arista makes computer network switches used in big data centers.

Its claim to fame is its operating system software, which users can program to add features, write apps, or make changes to the network.

At the center of the much-in-demand Big Data technology is something called "analytics," the ability to sift through a humongous amount of data and gather business intelligence from it.

R is the language of choice for this. It is used for statistical analysis and graphics/visualization.

Sqoop is one of those skills that has zoomed into popularity, thanks to the Big Data craze.

It's a free and open source tool that lets you transfer data between the popular Big Data storage system Hadoop and classic relational databases like the ones made by Oracle, IBM, and Microsoft.

It's a command-line interface tool, meaning you have to know the commands and type them directly into the system, rather than click on them with a mouse.

While Big Data options like Hadoop are the new-age way of dealing with data, EMC Documentum, an "enterprise content management" system, remains a popular tool in industries that still use a lot of paper or electronic forms, such as legal, medical, and insurance. These are major sectors where Big Data can bring about a revolution.

While NoSQL databases are becoming increasingly popular for new applications, many companies still use RDBMS-based systems.
RDBMS stands for Relational Database Management System, a type of database management system. This is the traditional kind of database, queried with the Structured Query Language (SQL), and it includes products like Oracle, Microsoft SQL Server, and IBM DB2.
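As a small illustration of the relational style, the sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in for the larger products named above. The `accounts` table, its columns, and the sample rows are invented for this example.

```python
import sqlite3

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The structure is declared up front: every row must fit these columns.
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, balance REAL NOT NULL)"
)

# Insert sample rows using parameter placeholders.
conn.executemany(
    "INSERT INTO accounts (id, name, balance) VALUES (?, ?, ?)",
    [(1, "Asha", 250.0), (2, "Ben", 75.5)],
)

# A SQL query: which accounts hold more than 100?
rows = conn.execute(
    "SELECT name, balance FROM accounts WHERE balance > ? ORDER BY name",
    (100,),
).fetchall()

if __name__ == "__main__":
    print(rows)
```

The table definition is what "structured and labeled" means in practice: the database rejects a row that is missing a required column, which is exactly the constraint NoSQL systems relax.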

There are data scientists who work on the tech side, the marketing side, and just about every other area of business, in companies of just about every size. They figure out how to get meaningful numbers and information from large volumes of data, and Big Data is the most magical phrase around for them.
