Sunday, August 25, 2013

HCatalog


HCatalog is a table and storage management layer for Hadoop that enables users of different data processing tools (Pig, MapReduce, and Hive) to more easily read and write data on the grid. HCatalog’s table abstraction presents users with a relational view of data in the Hadoop Distributed File System (HDFS), so users need not worry about where their data is stored or in what format (RCFile, text files, or sequence files).

HCatalog supports reading and writing files in any format for which a SerDe (serializer-deserializer) can be written. By default, HCatalog supports RCFile, CSV, JSON, and SequenceFile formats. To use a custom format, you must provide the InputFormat, OutputFormat, and SerDe.
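
To make the idea concrete, here is a toy sketch in Python of a catalog that maps a table name to a storage location and a SerDe, so that callers read and write rows without knowing the file format. This is not the HCatalog API: the names ToyCatalog, TableInfo, and SerDe, the in-memory dictionary standing in for HDFS, and the file paths are all assumptions made for illustration.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


def csv_serialize(rows: List[Row], columns: List[str]) -> str:
    """Write rows as CSV text, one line per row, in column order."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return buf.getvalue()


def csv_deserialize(text: str, columns: List[str]) -> List[Row]:
    """Parse CSV text back into rows keyed by column name."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [dict(zip(columns, fields)) for fields in reader]


def json_serialize(rows: List[Row], columns: List[str]) -> str:
    """Write rows as one JSON object per line."""
    return "".join(json.dumps({c: row[c] for c in columns}) + "\n" for row in rows)


def json_deserialize(text: str, columns: List[str]) -> List[Row]:
    """Parse one-JSON-object-per-line text back into rows."""
    rows = []
    for line in text.splitlines():
        obj = json.loads(line)
        rows.append({c: obj[c] for c in columns})
    return rows


@dataclass
class SerDe:
    """Hypothetical serializer/deserializer pair for one file format."""
    serialize: Callable[[List[Row], List[str]], str]
    deserialize: Callable[[str, List[str]], List[Row]]


CSV_SERDE = SerDe(csv_serialize, csv_deserialize)
JSON_SERDE = SerDe(json_serialize, json_deserialize)


@dataclass
class TableInfo:
    """Metadata the toy catalog keeps for each table."""
    columns: List[str]
    location: str
    serde: SerDe


class ToyCatalog:
    """Maps table names to metadata; a dict stands in for the file system."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableInfo] = {}
        self._storage: Dict[str, str] = {}  # path -> file contents

    def create_table(self, name: str, columns: List[str],
                     location: str, serde: SerDe) -> None:
        self._tables[name] = TableInfo(columns, location, serde)

    def write(self, name: str, rows: List[Row]) -> None:
        info = self._tables[name]
        self._storage[info.location] = info.serde.serialize(rows, info.columns)

    def read(self, name: str) -> List[Row]:
        info = self._tables[name]
        return info.serde.deserialize(self._storage[info.location], info.columns)


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.create_table("users_csv", ["id", "name"], "/data/users.csv", CSV_SERDE)
    catalog.create_table("users_json", ["id", "name"], "/data/users.json", JSON_SERDE)
    sample = [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Grace"}]
    catalog.write("users_csv", sample)
    catalog.write("users_json", sample)
    # Both tables return the same rows even though the stored bytes differ.
    print(catalog.read("users_csv") == catalog.read("users_json"))
```

In this sketch, supporting another format means supplying one more serialize/deserialize pair, which loosely mirrors the requirement above to provide an InputFormat, OutputFormat, and SerDe for a custom format.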
 

Wednesday, August 14, 2013

IBM Big Data training


IBM hopes to help create the next generation of “big-data” specialists, and to influence the curriculum, through a series of partnerships with universities around the world.
Nine new agreements announced Wednesday involve Georgetown University, George Washington University, Rensselaer Polytechnic Institute, the University of Missouri, and Northwestern University in the U.S. IBM is also beginning big-data programs at Dublin City University, Mother Teresa Women’s University in India, the National University of Singapore, and the Philippines’ Commission on Higher Education.
They will result in a variety of programs, including a master of science degree in the business analytics track at George Washington University; an undergraduate course titled “Big Data Analytics” at the University of Missouri; and a center for business analytics at the National University of Singapore.
In its announcement, IBM cited U.S. Bureau of Labor Statistics figures projecting a 24 percent rise in demand for people with “data analytics skills” over the next eight years.
While companies are managing to fill big data positions, there’s a caveat. “They are finding the candidates but a lot of what they’re doing is poaching candidates from other companies,” a spokesman said. “One of the reasons I would expect IBM is making these partnerships [is] to make sure there’s enough engineers to meet the demand they’re seeing.”

Thursday, August 8, 2013

Ideal data scientist


FICO, a leading predictive analytics and decision management software company, today released an infographic showing the characteristics of a good data scientist, a role that a Harvard Business Review article called the “sexiest job of the 21st century.”

The rise of Big Data has fueled demand for data scientists. Indeed.com reported that job postings for analytic scientists jumped 15,000 percent between the summer of 2011 and the summer of 2012. McKinsey & Company predicted the U.S. will see a 50- to 60-percent shortfall in analytic scientists by 2018.

“There’s more demand than ever for data scientists, but at the same time we demand more from job candidates,” said Dr. Andrew Jennings, chief analytics officer at FICO and head of FICO Labs. “FICO has been hiring data scientists — or analysts, as we used to call them — since 1956. We’ve learned that excellent math skills alone just aren’t enough. We want someone who can solve problems for businesses, and explain their insights to people who don’t have a Ph.D. in operations research.”

The FICO infographic identifies eight characteristics of a good data scientist. These include the ability to tease out insights from data, communicate with business users and focus on the practical applications of their work.

Saturday, August 3, 2013

Social Intelligence


No matter what industry a business operates in, data is now being used more than ever before to gain an advantage. Social is only one of the newest layers in this big data bonanza, and some companies that were early adopters are starting to mature their models into Social Intelligence.

Enterprises have an average of 178 social media accounts, a report from Altimeter found, and an array of departments and executives are increasingly active there. However, when it comes to things like customer relationship management, analytics and market research, social data is mostly isolated. This leads to disjointed efforts across a company and prevents a strategic, holistic view from being put into place.

This isolation is becoming a roadblock as companies seek to tap into social data insights, so they need to develop a common framework for social data collection and integration. Not doing so could result in poorer customer experiences and, of course, missed opportunities.

For its report, Altimeter collected input from 34 enterprise organizations on how to integrate social data and how to build holistic systems that scale.