Sunday, July 21, 2013


Hadoop, the clear king of big-data analytics, is focused on batch processing. This model is sufficient for many cases (such as indexing the web), but other use cases require real-time information from highly dynamic sources. Storm, introduced by Nathan Marz (now with Twitter by way of BackType), addresses this need. Storm operates not on static data but on streaming data that is expected to be continuous. With Twitter users generating 140 million tweets per day, it's easy to see how this technology is useful.

Storm is more than a traditional big-data analytics system: it's an example of a complex event-processing (CEP) system. CEP systems are typically categorized as either computation oriented or detection oriented, and either kind can be implemented in Storm through user-defined algorithms. A CEP system can, for example, identify meaningful events in a flood of events and then act on them in real time.
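
To make the detection-oriented idea concrete, here is a minimal sketch in plain Python. It does not use Storm's API; the function name `detect_bursts`, its parameters, and the sample stream of failed logins are all hypothetical, chosen only for illustration. It flags a key that occurs a given number of times within a sliding time window, consuming events one at a time as a stream processor would.

```python
from collections import defaultdict, deque


def detect_bursts(events, threshold=3, window_seconds=10):
    """Yield (timestamp, key) when `key` occurs `threshold` times within the window.

    `events` is any iterable of (timestamp, key) pairs in time order.
    This is an illustrative sketch, not part of Storm.
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    for timestamp, key in events:
        times = recent[key]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while timestamp - times[0] >= window_seconds:
            times.popleft()
        if len(times) >= threshold:
            yield timestamp, key
            times.clear()  # reset so one burst raises one alert


# Hypothetical event stream: (seconds since start, user with a failed login).
stream = [
    (0, "alice"), (1, "bob"), (2, "alice"), (3, "alice"),
    (20, "bob"), (21, "bob"), (40, "bob"),
]

alerts = list(detect_bursts(stream))
for timestamp, user in alerts:
    print(f"t={timestamp}s: burst of failed logins for {user}")
```

Because `detect_bursts` is a generator, it never needs the whole data set in memory, which is the essential difference from a batch job. A computation-oriented variant would keep a running aggregate (a count or an average, say) per key instead of raising alerts.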

