Saturday, May 31, 2014

Cassandra Partitioner

Last weekend, I learned something interesting during the production release of our big data Cassandra application. The major shift was a change from RandomPartitioner to Murmur3Partitioner. Let me write about it.

In Cassandra, the partitioner determines how data, including replicas, is distributed across the nodes in the cluster. A partitioner is essentially a hash function that computes the token (hash) of a row key. Each row of data is uniquely identified by its row key and is placed in the cluster according to the value of its token.
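To make the key-to-token-to-node idea concrete, here is a minimal sketch of a token ring, not Cassandra's own code. It assumes a RandomPartitioner-style token (the MD5 digest of the row key read as a signed BigInteger and made non-negative) and the usual ring rule that a node owns the range from the previous node's token (exclusive) up to its own token (inclusive). The class and method names (`TokenRing`, `md5Token`, `ownerOfToken`) and the node names are hypothetical, and replica placement is not modeled.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class TokenRing {
    // Node tokens kept sorted around the ring; a node owns the range
    // (previous node's token, its own token].
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    // RandomPartitioner-style token (assumption for illustration): the 128-bit
    // MD5 digest of the row key, read as a signed BigInteger and made non-negative.
    static BigInteger md5Token(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    void addNode(BigInteger token, String node) {
        ring.put(token, node);
    }

    // The owner is the first node whose token is >= the key's token;
    // past the highest node token, the ring wraps around to the lowest one.
    String ownerOfToken(BigInteger token) {
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    String ownerOf(String rowKey) {
        return ownerOfToken(md5Token(rowKey));
    }

    public static void main(String[] args) {
        TokenRing cluster = new TokenRing();
        BigInteger span = BigInteger.ONE.shiftLeft(127);
        // Four hypothetical nodes with evenly spaced tokens.
        for (int i = 0; i < 4; i++) {
            BigInteger token = span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(4));
            cluster.addNode(token, "node" + (i + 1));
        }
        for (String key : new String[] {"alice", "bob", "carol"}) {
            System.out.println(key + " -> token " + md5Token(key) + " -> " + cluster.ownerOf(key));
        }
    }
}
```

Swapping `md5Token` for a 64-bit MurmurHash-based function is, conceptually, the whole change between the two partitioners; the ring lookup itself stays the same.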

Both the Murmur3Partitioner and RandomPartitioner use tokens to assign equal portions of data to each node and to distribute data from all the tables evenly throughout the ring or other grouping, such as a keyspace. This holds even if the tables use different row keys, such as usernames or timestamps.
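"Equal portions" comes from spacing node tokens evenly across the partitioner's token range. The sketch below, a hypothetical helper rather than a Cassandra tool, computes evenly spaced tokens as `min + i * rangeSize / nodes`, assuming Murmur3Partitioner's signed 64-bit range (-2^63 to 2^63 - 1) and RandomPartitioner's non-negative range below 2^127. Values like these are what one would set by hand in `initial_token` for single-token nodes; clusters using virtual nodes (`num_tokens`) have tokens assigned automatically.

```java
import java.math.BigInteger;

class InitialTokens {
    // Evenly spaced tokens: token_i = min + i * rangeSize / nodes.
    // BigInteger avoids overflow for both the 64-bit and 127-bit ranges.
    static BigInteger[] evenlySpaced(BigInteger min, BigInteger rangeSize, int nodes) {
        BigInteger[] tokens = new BigInteger[nodes];
        for (int i = 0; i < nodes; i++) {
            tokens[i] = min.add(rangeSize.multiply(BigInteger.valueOf(i))
                                         .divide(BigInteger.valueOf(nodes)));
        }
        return tokens;
    }

    // Murmur3Partitioner: signed 64-bit tokens, starting at -2^63, 2^64 values in total.
    static BigInteger[] murmur3Tokens(int nodes) {
        return evenlySpaced(BigInteger.ONE.shiftLeft(63).negate(), BigInteger.ONE.shiftLeft(64), nodes);
    }

    // RandomPartitioner: non-negative tokens, starting at 0, range size 2^127.
    static BigInteger[] randomTokens(int nodes) {
        return evenlySpaced(BigInteger.ZERO, BigInteger.ONE.shiftLeft(127), nodes);
    }

    public static void main(String[] args) {
        int nodes = 4;
        System.out.println("Murmur3Partitioner:");
        for (BigInteger t : murmur3Tokens(nodes)) System.out.println("  " + t);
        System.out.println("RandomPartitioner:");
        for (BigInteger t : randomTokens(nodes)) System.out.println("  " + t);
    }
}
```

For four nodes, the Murmur3Partitioner tokens come out as -9223372036854775808, -4611686018427387904, 0 and 4611686018427387904, so each node owns exactly a quarter of the ring.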

Two key differences in implementation are:
  1. Murmur3Partitioner distributes data uniformly across the cluster based on MurmurHash hash values, whereas RandomPartitioner uses MD5 hash values.
  2. When setting the partitioner in the cassandra.yaml file, Murmur3Partitioner is specified as org.apache.cassandra.dht.Murmur3Partitioner, whereas RandomPartitioner is specified as org.apache.cassandra.dht.RandomPartitioner.

Sunday, May 18, 2014

Ambient Intelligence

Microsoft announced a new hosted service, called Azure ISS (Intelligent Systems Service), which promises to ease the process of managing machine data from sensors and devices connected in the so-called Internet of Things. ISS is now available as a limited public preview.

Microsoft has also released APS (Analytics Platform System), an update and expansion of what was formerly called the Parallel Data Warehouse. APS can combine query results from relational data in SQL Server databases and non-relational data captured by Hadoop.

In addition, the company has launched SQL Server 2014, the first edition of Microsoft's relational database system that includes the ability to store entire databases in the working memory of a server, which allows faster access to the data.

All these products can help an organization make better use of its "ambient intelligence," Microsoft CEO Satya Nadella said at a customer event in San Francisco.

Nadella defined ambient intelligence as the data generated both by a growing number of machines, such as sensors, and by people who capture experiences with their digital devices.

"You have this enormous capacity to reason over all of this digitized information," Nadella said.

In a report commissioned by Microsoft, IDC estimated that organizations could generate $1.6 trillion in additional revenue and cost savings over the next four years by better understanding their data.

The new Microsoft tools are aimed at bringing big-data-styled analysis to the enterprise.

Sunday, May 11, 2014

TIBCO Jaspersoft

TIBCO made a shrewd move when it acquired Spotfire. Now the company is hoping it can catch lightning in a bottle a second time with the $185 million acquisition of Jaspersoft at the end of April 2014.

Jaspersoft provides commercial subscription support for the open source data-integration, business intelligence, and analytics software it develops and upgrades with help from a large community of more than 400,000 registered users. The company is best known for its low-cost open source ETL and reporting software, as well as for embedded BI software used by partners such as Nike, FedEx, and McGraw Hill to deliver more than 140,000 analytics-infused applications.

When TIBCO acquired Spotfire for $195 million in 2007, it was in the midst of the great consolidation of the BI market, in which IBM acquired Cognos, Oracle acquired Siebel and Hyperion, and SAP acquired BusinessObjects. These were the leading business intelligence products of their day (along with MicroStrategy and Information Builders). But since that time, Spotfire has emerged as the third name in a hot trio that also includes Tableau Software and QlikTech.

These data-discovery products have seen the fastest growth in the category ever since, while the likes of IBM Cognos, Oracle OBIEE, SAP BusinessObjects, and MicroStrategy have seen flat to slow sales growth. But reporting, ad-hoc query tools, and embeddable BI software remain necessary. This is where TIBCO is betting that Jaspersoft will complement Spotfire and give it a complete portfolio.

Pairing open source Jaspersoft with top-selling Spotfire, TIBCO hopes to disrupt the business intelligence and embedded analytics market.

Thursday, May 8, 2014

Big Data Top 5

We’re on the cusp of a real turning point for big data. Its applications are becoming clearer, its tools are getting easier and its architectures are maturing in a hurry. It’s no longer just about log files, clickstreams and tweets. It’s not just about Hadoop and what’s possible (or not) with MapReduce.
With each passing day, big data is becoming more about creativity — if someone can think of an application, they can probably build it. That makes the concept of big data a lot more tangible and a lot more useful to a lot more companies, and it makes the market for big data a lot more lucrative.
Here are five technologies helping spur a shift in thinking from “Why would I want to use some technology that Yahoo built? And how?” to “We have a problem that needs solving. Let’s find the right tool to solve it.” They are Shark, Spark, MLlib, GraphX, and SparkR.