Saturday, April 19, 2014

Mainframe Big Data


Mainframes are reliable and highly automated, often running for years with virtually no human intervention. Mainframe analyst Josh Krischer tells a story about an Eastern European airline that ran its core systems on an IBM mainframe for five years without ever touching the machine after its mainframe IT guy retired.

However, that data often stays locked in the mainframe, because sorting and transforming it for complex data analytics has been expensive and robs the core applications of CPU cycles. Still, around 85% of corporate systems run on mainframes.

The good news is that Big Data technologies are making it easier and less costly to export that data. One option is to use JCL batch workloads to move the data to Hadoop, where it can be processed, combined with other relevant data and, for instance, moved into a NoSQL database to support forward-looking analysis for business decisions. The challenge is the lack of well-established native connectivity between mainframes and Hadoop.
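As a rough sketch of what the landing step can look like (assuming the JCL job has already unloaded a fixed-width EBCDIC extract and transferred it to a Hadoop edge node where the hdfs command-line client is installed; all paths, dataset names and the record length below are hypothetical), a small Python script can convert the extract and push it into HDFS:

```python
import subprocess

# Hypothetical names: an EBCDIC extract produced by a JCL unload step,
# a local UTF-8 staging file, and an HDFS landing path for Hadoop jobs.
EBCDIC_EXTRACT = "/landing/CUST.MASTER.UNLOAD"
UTF8_OUTPUT = "/landing/cust_master.txt"
HDFS_TARGET = "/data/raw/mainframe/cust_master.txt"
RECORD_LENGTH = 80  # fixed record length taken from the copybook (assumed)

def convert_ebcdic_to_utf8(src, dst, reclen):
    """Decode fixed-width EBCDIC (code page 037) records into UTF-8 lines."""
    with open(src, "rb") as fin, open(dst, "w", encoding="utf-8") as fout:
        while True:
            record = fin.read(reclen)
            if not record:
                break
            fout.write(record.decode("cp037").rstrip() + "\n")

def put_into_hdfs(local_path, hdfs_path):
    """Land the converted file in HDFS via the hdfs command-line client."""
    subprocess.check_call(["hdfs", "dfs", "-put", "-f", local_path, hdfs_path])

if __name__ == "__main__":
    convert_ebcdic_to_utf8(EBCDIC_EXTRACT, UTF8_OUTPUT, RECORD_LENGTH)
    put_into_hdfs(UTF8_OUTPUT, HDFS_TARGET)
```

From there, ordinary Hadoop jobs can process the landed file and feed the results into whatever NoSQL store the forward-looking analysis calls for.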

Saturday, April 12, 2014

Big Data Tech Gain


Recent funding in Hadoop vendors underscores how venture capitalists see big bucks in managing Big Data. Last month, Hadoop providers Cloudera Inc., Hortonworks Inc. and Platfora Inc. received a collective $1 billion from investors convinced that they are onto something big.

Hadoop is a storage and processing system that ingests large amounts of data from servers and breaks it into manageable chunks. Programmers structure the data, move it into a relational database, and study it with an analytical application. Companies supplement their relational databases with Hadoop because it organizes and processes data faster and more cheaply, running on clusters of commodity servers. Hadoop is also better at handling text, photos, and images than relational databases, which store data in tables and rows. And Hadoop's architecture allows developers to collect information first and figure out what to do with it later; relational systems require developers to carefully design and store data against a schema planned in advance.
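To make that last point concrete, here is a hedged schema-on-read sketch using a Hadoop Streaming mapper in Python: the raw log lines sit in HDFS exactly as they were collected, and the field layout (timestamp, user ID, URL, response time) is an assumed example that is only imposed at read time, when a question is finally asked.

```python
#!/usr/bin/env python
# Hadoop Streaming mapper: a schema-on-read illustration.
# The raw lines were loaded into HDFS untouched; the tab-separated layout
# below (timestamp, user_id, url, response_ms) is an assumed example format
# and is only applied here, at read time.
import sys

for line in sys.stdin:
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        continue  # skip records that do not match the layout chosen today
    timestamp, user_id, url, response_ms = parts
    # Emit one (url, latency) pair per record; a reducer can average them.
    print("%s\t%s" % (url, response_ms))
```

A matching reducer could then average the response times per URL, and both scripts would be submitted with the standard hadoop-streaming jar so the work spreads across the commodity servers in the cluster.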

Stay tuned to the emerging technology trend: Big Data.

Saturday, April 5, 2014

LucidWorks Solr


Solr is based on Apache's own Lucene project and adds many features not found in Lucene itself that ought to appeal to those building next-generation data-driven apps -- for example, support for geospatial search.
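As a rough illustration of the geospatial side (the endpoint, collection name and 'store' location field below are hypothetical), a Solr spatial filter can be issued over plain HTTP:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Solr endpoint and collection; the 'store' field is assumed
# to be indexed with Solr's location field type.
SOLR_URL = "http://localhost:8983/solr/places/select"

params = {
    "q": "*:*",
    # geofilt keeps only documents within d kilometres of the point pt
    "fq": "{!geofilt sfield=store pt=45.15,-93.85 d=5}",
    "wt": "json",
}

resp = urllib.request.urlopen(SOLR_URL + "?" + urllib.parse.urlencode(params))
results = json.loads(resp.read().decode("utf-8"))
print(results["response"]["numFound"], "places within 5 km")
```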

The end-user advantage of Solr lies in how it makes a broader variety of Hadoop searches possible for both less technical and more technical users. Queries can be constructed in a natural-language style or through more precise key/value pairs. Another implication of Solr being able to return search results across a Hadoop cluster is that more data can be kept in Hadoop without being pre-transformed for the sake of analytics. This means not having to anticipate the questions to ask before you load the data.
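For illustration, the same question could be phrased either way against a hypothetical 'flights' collection; the field names below are assumed, not taken from any real index.

```python
import urllib.parse

# 1. Loose, natural-language style: let Solr's text analysis do the work.
free_text = urllib.parse.urlencode({"q": "delayed flights in april", "wt": "json"})

# 2. Precise key/value style: explicit field names and values.
fielded = urllib.parse.urlencode({"q": "status:DELAYED AND month:4", "wt": "json"})

print("/solr/flights/select?" + free_text)
print("/solr/flights/select?" + fielded)
```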

Solr will be rolled into HDP via a multistep process. The first phase involves making Solr available to customers on June 1 within a sandbox. After that, Solr will be integrated directly into the next release of HDP, although no release schedule for that has been announced yet. Later, Hortonworks plans to hook Solr into Ambari, Hadoop's management and monitoring component, for easier control of indexing speeds and alerting, among other things.

LucidWorks has also produced a version of Solr that's meant to join the ever-growing parade of open source or lower-priced products designed to steal some of Splunk's log-search thunder.