Wednesday, August 14, 2013

IBM Big Data training


IBM hopes to help create the next generation of “big-data” specialists, and to shape university curricula, through a series of partnerships with universities around the world.
Nine new agreements announced Wednesday involve Georgetown University, George Washington University, Rensselaer Polytechnic Institute, the University of Missouri, and Northwestern University in the U.S. IBM is also beginning big-data programs at Dublin City University, Mother Teresa Women’s University in India, the National University of Singapore, and the Philippines’ Commission on Higher Education.
They will result in a variety of programs, including a master of science degree in business analytics at George Washington University; an undergraduate course titled “Big Data Analytics” at the University of Missouri; and a center for business analytics at the National University of Singapore.
In its announcement, IBM cited U.S. Bureau of Labor Statistics projections of a 24 percent rise in demand for people with “data analytics skills” over the next eight years.
While companies are managing to fill big-data positions, there’s a caveat. “They are finding the candidates, but a lot of what they’re doing is poaching candidates from other companies,” a spokesman said. “One of the reasons I would expect IBM is making these partnerships is to make sure there are enough engineers to meet the demand they’re seeing.”

Thursday, August 8, 2013

Ideal data scientist


FICO, a leading predictive analytics and decision management software company, today released an infographic showing the characteristics of a good data scientist — what a Harvard Business Review article called the “sexiest job of the 21st century.”

The rise of Big Data has fueled demand for data scientists. Indeed.com reported that job postings for analytic scientists jumped 15,000 percent between the summer of 2011 and 2012. McKinsey & Company predicted the U.S. will see a 50- to 60-percent shortfall in analytic scientists by 2018.

“There’s more demand than ever for data scientists, but at the same time we demand more from job candidates,” said Dr. Andrew Jennings, chief analytics officer at FICO and head of FICO Labs. “FICO has been hiring data scientists — or analysts, as we used to call them — since 1956. We’ve learned that excellent math skills alone just aren’t enough. We want someone who can solve problems for businesses, and explain their insights to people who don’t have a Ph.D. in operations research.”

The FICO infographic identifies eight characteristics of a good data scientist. These include the ability to tease out insights from data, communicate with business users and focus on the practical applications of their work.

Saturday, August 3, 2013

Social Intelligence


No matter what industry a business operates in, data is now being used more than ever before to gain an advantage. Social data is only one of the newest layers in this big-data bonanza, and some early-adopter companies are starting to mature their models into Social Intelligence.

According to a report from Altimeter, enterprises have an average of 178 social media accounts, and an array of departments and executives are increasingly active on them. However, when it comes to things like customer relationship management, analytics and market research, social data is mostly isolated. This leads to disjointed efforts across a company and prevents a strategic, holistic view from being put into place.

This isolation is becoming a roadblock as companies seek to really tap into social data insights, so companies need to develop a common framework for social data collection and integration. Not doing so could result in poorer customer experiences and, of course, missed opportunities.

For its report, Altimeter collected input from 34 enterprise organizations on how to integrate social data and how to build holistic systems that scale.

Sunday, July 28, 2013

Big Data Stream


Stream computing is a new paradigm necessitated by new data-generating scenarios, such as the ubiquity of mobile devices, location services, and sensor pervasiveness. A crucial need has emerged for scalable computing platforms and parallel architectures that can process vast amounts of generated streaming data.

In static data computation (the left-hand side of the attached diagram), questions are asked of static data. In streaming data computation (the right-hand side), continuously arriving data is evaluated against static questions.
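To make the contrast concrete, here is a tiny Python sketch (purely illustrative, not tied to any particular streaming product): in the batch model the data sits still and a query runs over it, while in the streaming model the query sits still and each arriving event runs through it.

```python
# Static data computation: the data is at rest, queries come and go.
stored_trades = [("IBM", 195.0), ("AAPL", 450.0), ("IBM", 196.5)]

def query_average(symbol, data):
    """Ad-hoc query asked of static data."""
    prices = [p for s, p in data if s == symbol]
    return sum(prices) / len(prices)

# Streaming data computation: the question is at rest, data flows past it.
class StandingAverage:
    """A pre-registered ('static') question evaluated on each new event."""
    def __init__(self, symbol):
        self.symbol, self.total, self.count = symbol, 0.0, 0

    def on_event(self, symbol, price):
        if symbol == self.symbol:
            self.total += price
            self.count += 1
        return self.total / self.count if self.count else None

avg = StandingAverage("IBM")
for s, p in stored_trades:          # imagine these arriving one at a time
    latest = avg.on_event(s, p)

print(query_average("IBM", stored_trades))  # batch answer: 195.75
print(latest)                               # streaming answer: 195.75
```

Both paths arrive at the same number; the difference is *when* the work happens, once per query versus incrementally per event.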

Let me give a simple example. In a financial trading platform, applications have traditionally been written to analyze historical records in batch mode: the data is preserved in a data warehouse, and a result is produced and returned to the consumer in response to each user request or query. That is the first use case.

With big-data streaming technology, the requests (such as the market trend of IT stocks) are pre-built. As the data arrives, results are published to the prescribed subscribers. Isn't that a cool technology to try?
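The second use case can be sketched as a simple publish/subscribe loop. This is a hedged, minimal Python illustration (all names here are made up for the example, not from any real trading product): the request is registered up front, and results are pushed out as each event streams in.

```python
# Pre-built requests: subscribers register interest before any data arrives.
subscribers = []

def subscribe(symbol, callback):
    """Pre-build a request: notify 'callback' about events for 'symbol'."""
    subscribers.append((symbol, callback))

def publish(symbol, price):
    """Called once per arriving event; fans results out to subscribers."""
    for wanted, callback in subscribers:
        if wanted == symbol:
            callback(symbol, price)

received = []
subscribe("IT", lambda s, p: received.append((s, p)))

# Events stream in; the consumer is notified immediately, with no
# warehouse query in the loop.
publish("IT", 101.2)
publish("ENERGY", 55.0)   # no subscriber for this symbol
publish("IT", 102.7)

print(received)  # only the "IT" events reached the subscriber
```

The inversion relative to the batch use case is the whole point: the consumer never asks; the platform tells.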

Thursday, July 25, 2013

Storm at Yahoo


Yahoo! is enhancing its web properties and mobile applications to provide its users a personalized experience based on interest profiles. To compute user interest, we process billions of events from our over 700 million users and analyze 2.2 billion content items every day. Since users' interests change over time, we need to update user profiles to reflect their current interests.

Enabling low-latency big-data processing is one of the primary design goals of Yahoo!’s next-generation big-data platform. While MapReduce is a key design pattern for batch processing, additional design patterns will be supported over time. Stream/micro-batch processing is one of the design patterns applicable to many Yahoo! use cases.

Yahoo!'s big-data platform enables Hadoop applications and Storm applications to share data via shared storage such as HBase. Yahoo! engineering teams are developing technologies to enable Storm applications and Hadoop applications to be hosted on a single cluster.

Sunday, July 21, 2013

Storm


Hadoop, the clear king of big-data analytics, is focused on batch processing. This model is sufficient for many cases (such as indexing the web), but other use models exist in which real-time information from highly dynamic sources is required. Solving this problem resulted in the introduction of Storm from Nathan Marz (now with Twitter by way of BackType). Storm operates not on static data but on streaming data that is expected to be continuous. With Twitter users generating 140 million tweets per day, it's easy to see how this technology is useful.

Storm is more than a traditional big-data analytics system: it's an example of a complex event-processing (CEP) system. CEP systems are typically categorized as computation-oriented or detection-oriented, each of which can be implemented in Storm through user-defined algorithms. CEP can, for example, be used to identify meaningful events in a flood of events and then act on them in real time.
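To show what a detection-oriented rule looks like, here is a toy example sketched in plain Python rather than Storm's actual API (the rule and event names are invented for illustration): flag any user with three consecutive failed logins, and take an action the moment the pattern completes.

```python
from collections import defaultdict

alerts = []
failure_streak = defaultdict(int)

def on_event(user, outcome):
    """Process one event from the flood; act immediately on a pattern match."""
    if outcome == "fail":
        failure_streak[user] += 1
        if failure_streak[user] == 3:
            alerts.append(user)        # the action taken in real time
    else:
        failure_streak[user] = 0       # a success resets the streak

events = [("alice", "fail"), ("bob", "ok"), ("alice", "fail"),
          ("alice", "fail"), ("bob", "fail")]
for user, outcome in events:
    on_event(user, outcome)

print(alerts)  # only alice completed the pattern
```

In Storm, logic like `on_event` would live inside a user-defined bolt processing the tuple stream; the detection idea is the same.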

Thursday, July 4, 2013

Hunk

Splunk is getting on board with a new Hadoop-based application it cheekily calls Hunk. Hunk takes Splunk’s popular analytics platform and puts it to work on data stored in Hadoop, so businesses that use Hadoop can now use it for data exploration and visualization. Its key features are:
 
  • Splunk Virtual Index: Splunk virtual index technology enables the “seamless use” of the entire Splunk technology stack, including the Splunk Search Processing Language (SPL), for interactive exploration, analysis and visualization of data stored anywhere, as if it were stored in a Splunk software index.
  • Explore data in Hadoop from one place: Hunk is designed for interactive data exploration across large, diverse data sets on top of Hadoop.
  • Interactive analysis of data in Hadoop: Hunk enables users to drive deep analysis, detect patterns, and find anomalies across terabytes and petabytes of data.
  • Create custom dashboards: Hunk users can combine multiple charts, views and reports into role-specific dashboards which can be viewed and edited on laptops, tablets or mobile devices.
 
Splunk has 5,600 customers, which includes half of the Fortune 100. It says the new Hunk product will target both new and existing customers, as long as they use Hadoop.