1. Apache Storm
Apache Storm, released by Twitter, is a distributed open-source framework for real-time data processing. Storm does for real-time data what Hadoop does for batch processing. (Batch processing is the opposite of real-time processing: the data is divided into batches, and each batch is processed as a unit.)
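The difference between batch and real-time processing can be sketched in a few lines of plain Python. This is only an illustration of the two styles and uses neither Storm nor Hadoop; the function names and the sample `events` list are assumptions made up for the example.

```python
from typing import Iterable, Iterator, List


def process_in_batches(records: List[int], batch_size: int) -> List[int]:
    """Batch style: split stored data into fixed-size batches and total each one."""
    totals = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        totals.append(sum(batch))
    return totals


def process_as_stream(records: Iterable[int]) -> Iterator[int]:
    """Real-time style: update and emit a running total as each record arrives."""
    running_total = 0
    for record in records:
        running_total += record
        yield running_total


# Hypothetical sample data standing in for incoming events.
events = [3, 1, 4, 1, 5, 9]

batch_totals = process_in_batches(events, batch_size=3)
stream_totals = list(process_as_stream(events))

print(batch_totals)   # [8, 15]
print(stream_totals)  # [3, 4, 8, 9, 14, 23]
```

The batch version produces a result only once a whole batch is available, while the streaming version has an up-to-date answer after every record.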
MongoDB is a general-purpose, document-based, distributed database built for modern application developers and for the cloud era. Programmers think in objects, and MongoDB follows the same model: it is a document database, which means it stores data in JSON-like documents.
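To show what a "JSON-like document" looks like next to an object in application code, here is a minimal sketch using only Python's standard `json` module. It does not call MongoDB or any MongoDB driver; the `order` record and its field names are hypothetical.

```python
import json

# Hypothetical order record as it might exist in application code.
order = {
    "order_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.50},
        {"sku": "B-7", "qty": 1, "price": 20.00},
    ],
}

# Serialize the nested object to a JSON document and read it back.
document = json.dumps(order, indent=2)
restored = json.loads(document)

# Nested fields and lists stay together in the one document.
order_total = sum(item["qty"] * item["price"] for item in restored["items"])

print(document)
print(order_total)  # 39.0
```

The customer and the line items stay nested inside one record instead of being split across separate tables, which is the main appeal of the document model.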
Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra supports clusters spanning multiple data centers, with asynchronous masterless replication allowing low-latency operations for all clients. Cassandra was designed to combine Amazon’s Dynamo distributed storage and replication techniques with Google’s Bigtable data and storage engine model.
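One way to picture masterless, Dynamo-style replication is a hash ring: every node owns a position on the ring, and a key is stored on the first few nodes found clockwise from the key's own position. The sketch below is a simplified illustration of that idea, not Cassandra's actual implementation; the `Ring` class, the node names, and the use of MD5 as the hash are assumptions chosen for the example.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: no node is a master, and any node can be a replica."""

    def __init__(self, nodes, replication_factor):
        self.replication_factor = replication_factor
        # Sort nodes by their ring position.
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [position for position, _ in self.entries]

    def replicas_for(self, key):
        """Return the nodes that hold copies of `key`, walking clockwise."""
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.entries)
        count = min(self.replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


# Hypothetical four-node cluster keeping three copies of each key.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
replicas = ring.replicas_for("user:42")
print(replicas)
```

Because placement depends only on the hash of the key and the list of nodes, any node can work out where a key lives without consulting a central coordinator, which is what removes the single point of failure.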
Cloudera provides a software platform for data analytics, data warehousing, and machine learning. It started as an open-source Apache Hadoop distribution, commonly known as Cloudera Distribution for Hadoop, or CDH.
OpenRefine is an open-source desktop application for data cleanup and transformation to other formats, an activity commonly known as data wrangling. It is similar to spreadsheet applications, and can handle spreadsheet file formats such as CSV, but it behaves more like a database.
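The kind of cleanup that data wrangling involves can be sketched in plain Python. This is an analogy only and does not use OpenRefine or its interfaces; the sample CSV text and the `clean_rows` helper are hypothetical.

```python
import csv
import io

# Hypothetical messy input: stray whitespace, inconsistent case, a duplicate.
raw_csv = """name,city
 Alice ,new york
BOB,New York
alice,NEW YORK
"""


def clean_rows(text):
    """Trim whitespace, normalize case, and drop rows that become duplicates."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    cleaned = []
    for row in reader:
        # Collapse runs of whitespace, then apply title case to every field.
        normalized = {key: " ".join(value.split()).title() for key, value in row.items()}
        identity = tuple(normalized.values())
        if identity not in seen:
            seen.add(identity)
            cleaned.append(normalized)
    return cleaned


rows = clean_rows(raw_csv)
print(rows)
# [{'name': 'Alice', 'city': 'New York'}, {'name': 'Bob', 'city': 'New York'}]
```

The third input row collapses into the first once whitespace and case are normalized, which is the sort of near-duplicate that is hard to spot by eye in a spreadsheet.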