Saturday, May 11, 2013

Alteryx


Business intelligence firm Alteryx has debuted Project Edition, an analytics package it has dubbed "instant analytics": the kind that can be deployed for a specific project and run without assistance from IT.

Project Edition, as the name implies, is a scaled-down version of the Alteryx Strategic Analytics 8.5 system, and the company has released it in the hope of exposing a wider customer base to the full paid version. Project Edition is free to use but limited: it only allows data to be run a handful of times for a given purpose.

Data can come from a variety of sources such as Excel, text files, data warehouses, cloud apps, Hadoop or social media, and can be integrated and cleansed by Alteryx. This single analytics workflow is meant to help teams crunch data for a particular assignment even if they don't have any coding or programming skills. Once the data is analyzed, it can be presented in reports, data files or Tableau, a data visualization tool.
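Conceptually, the blend-and-cleanse step works something like the sketch below. This is plain Python with invented sample data, standing in for what Alteryx provides through its drag-and-drop workflow, not the product itself: two differently shaped sources (a CSV export and a JSON feed) are joined on a key and tidied into one report.

```python
import csv
import io
import json

# Hypothetical inputs standing in for two of the many source types:
# a CSV export of customers and a JSON feed of social-media mentions.
crm_csv = io.StringIO(
    "customer_id,name,region\n"
    "1, Acme Corp ,EMEA\n"
    "2,Globex,APAC\n"
)
social_json = '[{"customer_id": 1, "mentions": 14}, {"customer_id": 2, "mentions": 3}]'

# Integrate: index each source by customer_id.
crm = {int(r["customer_id"]): r for r in csv.DictReader(crm_csv)}
social = {r["customer_id"]: r["mentions"] for r in json.loads(social_json)}

# Cleanse and join: strip stray whitespace, combine both sources per customer.
report = []
for cid, row in crm.items():
    report.append({
        "name": row["name"].strip(),
        "region": row["region"].strip(),
        "mentions": social.get(cid, 0),
    })

print(report)
```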

Sunday, March 24, 2013

BigData Summit

Big data. It just keeps getting bigger. Why? Because top management's attention is laser-focused on big data as the solution to many corporate problems. There will be more on this at the CIO Big Data Summit in New York during the first week of May 2013.

Gartner sales performance analyst Patrick Stakenas puts it this way: "There's no question that big data is the biggest trend in business intelligence, and it will remain that way for the foreseeable future. But it's not just an IT issue. It's a management issue. Relying too much on big data analytics risks losing the personal approach to selling."

The huge stores of data that companies have accumulated haven’t added true value to the enterprise yet, because data requires context in order to be useful. Context includes a clearly articulated business strategy for using the data, an understanding of competitive shifts, an understanding of the market’s perceptions about your company and your products, and much, much more.

Gartner’s Laney says, for example, that social media is a great source of data and information about customers, but it can cause real problems for executives unless it’s put into context.

Even after all the data is collected and analyzed, there’s still one more pitfall to look for, adds Adam Sarner, Gartner’s big data and CRM analyst. “The successful big data project isn’t about collecting massive amounts of this data,” Sarner says. “It’s about making the right information accessible and action-oriented for the company and the customer for core CRM.”

Friday, March 8, 2013

Big Data Splunk



The initial focus of 'big data' has been about its increasing volume, velocity and variety — the "three Vs" — with little mention of real world application. Now is the time to get down to business.

Splunk is the platform for machine data. It’s the easy, fast and resilient way to collect, analyze and secure the massive streams of machine data generated by your IT systems and technology infrastructure—whether it’s physical, virtual or in the cloud.

Splunk software collects machine data securely and reliably from wherever it's generated. It stores and indexes the data in real time in a centralized location and protects it with role-based access controls. Splunk lets you search, monitor, report and analyze your real-time and historical data.
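To make the collect-index-search cycle concrete, here is a toy sketch of the idea in plain Python. This is not Splunk's actual API and the log lines are invented; it only illustrates how ingesting events and building a term index as they arrive makes later searches fast.

```python
import re
from collections import defaultdict

# Toy stand-in for a machine-data platform: ingest raw log lines,
# index their terms on arrival, search the index later.
index = defaultdict(list)   # term -> list of event ids
events = []

def ingest(line):
    events.append(line)
    event_id = len(events) - 1
    for term in set(re.findall(r"\w+", line.lower())):
        index[term].append(event_id)

for line in [
    "2013-03-08 10:01 web01 ERROR timeout on /checkout",
    "2013-03-08 10:02 web02 INFO request ok",
    "2013-03-08 10:03 web01 ERROR timeout on /cart",
]:
    ingest(line)

# Search: intersect posting lists, like a basic "error timeout" query.
hits = set(index["error"]) & set(index["timeout"])
print(sorted(events[i] for i in hits))
```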

451 Research presents three "real world" case studies of Splunk customers handling the variety and velocity of their ever-increasing unstructured data. 451 Research believes that in order to deliver value from 'big data', businesses need to look beyond the nature of the data and re-assess the technologies, processes and policies they use to engage with that data.

Saturday, March 2, 2013

Oracle Big Data


Enterprise systems have long been designed around capturing, managing and analyzing business transactions: marketing, sales, support activities and the like. Lately, however, with the evolution of automation and Web 2.0 technologies such as blogs, status updates and tweets, there has been explosive growth in machine- and consumer-generated data. Dubbed "Big Data", this data is characterized by attributes such as volume, variety, velocity and complexity, and essentially represents machine and consumer interactions.

The big data analytics lifecycle includes three steps: acquire, organize and analyze. The process starts with data acquisition. The structure and content of big data can't be known upfront and are subject to change in flight, so acquisition systems have to be designed for flexibility and variability; dynamic structures are the norm rather than predefined schemas. The organize step entails moving the data into well-defined structures so that relationships can be established and data across sources can be combined to give a complete picture. The analyze step then works across the combined data to surface new insights.
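A minimal sketch of the three steps, using hypothetical JSON events whose fields vary from record to record (exactly the "no predefined structure" situation described above):

```python
import json
from collections import defaultdict

# Acquire: raw events arrive with no fixed schema; fields vary per record.
raw = [
    '{"user": "amy", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "click"}',             # missing "ms"
    '{"user": "amy", "action": "buy", "amount": 25}', # extra "amount"
]
events = [json.loads(line) for line in raw]

# Organize: coerce every event into one well-defined structure so
# records from different shapes can be combined.
rows = [(e.get("user"), e.get("action"), e.get("ms", 0)) for e in events]

# Analyze: aggregate across the organized rows.
clicks_per_user = defaultdict(int)
for user, action, _ in rows:
    if action == "click":
        clicks_per_user[user] += 1

print(dict(clicks_per_user))  # → {'amy': 1, 'bob': 1}
```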

Oracle offers the broadest and most integrated portfolio of products to help you acquire and organize these diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships. The attached diagram shows how Oracle acquires, organizes and analyzes your big data.

Wednesday, February 20, 2013

Red Hat Big Data



Open source vendor Red Hat has announced a Big Data strategy that spans the full enterprise software stack, both in the public cloud and on-premises.

Red Hat Enterprise Linux (RHEL) is arguably Raleigh, North Carolina-based Red Hat's flagship product, but the operating system arena is not by any means its only focus.  Red Hat also has big irons in the storage, cloud and developer fires, and its Big Data strategy announcement addressed all three of these.  Big Data is now a relevant factor in the entire enterprise software stack.

Red Hat rightly pointed out that the majority of Big Data projects are built on open source software (including Linux, Hadoop, and various NoSQL databases) and so it's fitting that such an important company in the open source world as Red Hat would announce its Big Data strategy.

Red Hat big data components are illustrated in the attached system diagram.

Saturday, February 2, 2013

Big Data Fourth Dimension


We know that Big Data has three pillars, namely Volume, Velocity and Variety. I recently learnt a new (fourth) dimension: Veracity. What does it mean? Accuracy: conformity with truth or fact; truthfulness: devotion to the truth.

1. Volume:
Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even petabytes—of information.

  • Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
  • Convert 350 billion annual meter readings to better predict power consumption

2. Velocity
Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.

  • Scrutinize 5 million trade events created each day to identify potential fraud
  • Analyze 500 million daily call detail records in real-time to predict customer churn faster

3. Variety
Big data is any type of data - structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.

  • Monitor 100’s of live video feeds from surveillance cameras to target points of interest
  • Exploit the 80% data growth in images, video and documents to improve customer satisfaction

4. Veracity
1 in 3 business leaders don’t trust the information they use to make decisions. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grows.

Tuesday, January 29, 2013

Polyglot Persistence



In computing, a polyglot is a computer program or script that is valid in multiple programming languages and performs the same operations regardless of which language compiles or interprets it.

In the NoSQL world, polyglot persistence means using a variety of different data storage technologies for different kinds of data in any decent-sized enterprise. Complex applications combine different types of problems, so picking the right language for each job may be more productive than trying to fit all aspects into a single language. In the Big Data era there has been an explosion of interest in new languages, particularly functional languages like Clojure, Scala and Erlang. In the new strategic enterprise application, the persistence layer need no longer be relational.

A common example is configuring an Apache Solr server to stay in sync with a SQL-based database. Then you can do scored keyword/substring/synonym/stemmed/etc queries against Solr but do aggregations in SQL.
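Here is a runnable sketch of that division of labor. The Solr side is simulated in plain Python to keep the example self-contained (a real deployment would push rows to Solr's update handler and query its search endpoint); the point is that scored keyword search happens in one store while aggregation stays in SQL, here SQLite:

```python
import sqlite3

# The system of record: a SQL database, good at aggregations.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, views INTEGER)")
db.executemany("INSERT INTO articles VALUES (?, ?, ?)", [
    (1, "Intro to Big Data", 900),
    (2, "Big Data at Scale", 400),
    (3, "Cooking for One", 50),
])

# Stand-in for the Solr side: scored keyword search over the same documents.
def keyword_search(term):
    rows = db.execute("SELECT id, title FROM articles").fetchall()
    return sorted((title.lower().count(term.lower()), doc_id)
                  for doc_id, title in rows if term.lower() in title.lower())

hits = [doc_id for _, doc_id in keyword_search("big data")]

# Aggregation stays in SQL, restricted to the search hits.
placeholders = ",".join("?" * len(hits))
total = db.execute(
    f"SELECT SUM(views) FROM articles WHERE id IN ({placeholders})", hits
).fetchone()[0]
print(total)  # → 1300
```

Keeping the two stores in sync on every write is the price paid for letting each one do what it is best at.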

Another example is to use the same datastore, but store the same data in multiple aggregated formats. For example, having a dataset that is rolled up by date (each day getting a record) can also be stored rolled up by user (each user getting a record). Depending on the query you want to run, you choose the set that will give you the best performance. If the data is large enough, the overhead of keeping the two sets synchronized more than pays for itself in increased query speed.
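The dual-rollup idea can be sketched like this, with invented event data: every write updates both aggregates, so either kind of query becomes a direct lookup instead of a scan.

```python
from collections import defaultdict

# One logical dataset, stored in two pre-aggregated shapes.
by_date = defaultdict(int)   # one record per day
by_user = defaultdict(int)   # one record per user

def record_event(user, date, amount):
    # Keep both rollups in sync on every write.
    by_date[date] += amount
    by_user[user] += amount

for user, date, amount in [
    ("amy", "2013-01-28", 10),
    ("bob", "2013-01-28", 5),
    ("amy", "2013-01-29", 7),
]:
    record_event(user, date, amount)

# Pick whichever rollup answers the query directly.
print(by_date["2013-01-28"])  # → 15  (daily total, no scan needed)
print(by_user["amy"])         # → 17  (per-user total, no scan needed)
```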

Here's the attached reference architecture of polyglot persistence in a typical web app.