Saturday, January 27, 2018

Redshift ETL


In the ETL world, Amazon Redshift makes a developer's life much simpler. It is used to calculate daily, weekly, and monthly aggregations, which are then unloaded to S3, where they can be further processed and made available for end-user reporting with tools such as Redshift Spectrum and Amazon Athena.

The proposed ETL process has 4 key steps to execute:

Step 1: Extract from the RDBMS source to an S3 bucket
In this ETL process, the data extract job fetches change data every hour and stages it as multiple hourly files.

Step 2: Stage data to the Amazon Redshift table for cleansing
Ingesting the data can be accomplished using a JSON-based manifest file. The manifest lists exactly which files to load, which avoids problems caused by S3 eventual consistency and also provides an opportunity to dedupe files if needed.
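
A minimal sketch of how such a manifest could be built is shown here. It assumes the hourly extract files are already known by their S3 URLs; the bucket name, file names, and IAM role ARN are hypothetical placeholders, not values from this process. The manifest layout (an "entries" list with "url" and "mandatory" keys) and the MANIFEST option of COPY are the standard Redshift conventions.

```python
import json


def build_manifest(s3_urls):
    """Return a Redshift COPY manifest dict, skipping duplicate URLs."""
    seen = set()
    entries = []
    for url in s3_urls:
        if url in seen:
            continue  # dedupe: the same file must not be loaded twice
        seen.add(url)
        # mandatory=True makes COPY fail if the file is missing
        entries.append({"url": url, "mandatory": True})
    return {"entries": entries}


# Hypothetical hourly extract files (bucket and keys are placeholders).
hourly_files = [
    "s3://example-etl-bucket/extract/2018-01-27/hour-00.csv",
    "s3://example-etl-bucket/extract/2018-01-27/hour-01.csv",
    "s3://example-etl-bucket/extract/2018-01-27/hour-00.csv",  # duplicate
]

manifest = build_manifest(hourly_files)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)

# COPY statement that loads only the files listed in the manifest.
# The manifest location and role ARN are placeholders.
copy_sql = (
    "COPY stage_tbl "
    "FROM 's3://example-etl-bucket/manifests/2018-01-27.manifest' "
    "IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole' "
    "CSV MANIFEST;"
)
print(copy_sql)
```

The manifest JSON would be uploaded to S3 first, and the COPY statement then points at the manifest rather than at a key prefix.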

Step 3: Transform data to create daily, weekly, and monthly datasets and load into target tables
Data is staged in “stage_tbl”, from where it can be transformed into the daily, weekly, and monthly aggregates and loaded into target tables.
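
The daily aggregation can be sketched as a single INSERT ... SELECT with a GROUP BY on the day. The example below uses SQLite purely as a local stand-in so that it runs without a cluster; the column names (event_ts, amount) and the target table daily_agg are assumptions made for illustration. On Redshift the day bucket would typically come from DATE_TRUNC('day', event_ts), with 'week' and 'month' giving the other two aggregates.

```python
import sqlite3

# SQLite is only a stand-in for Redshift so the sketch runs locally.
conn = sqlite3.connect(":memory:")

# Hypothetical staging table layout.
conn.execute("CREATE TABLE stage_tbl (event_ts TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO stage_tbl VALUES (?, ?)",
    [
        ("2018-01-26 09:00:00", 10.0),
        ("2018-01-26 17:30:00", 5.0),
        ("2018-01-27 08:15:00", 7.5),
    ],
)

# Hypothetical target table for the daily aggregate.
conn.execute(
    "CREATE TABLE daily_agg (day TEXT, row_count INTEGER, total_amount REAL)"
)

# Transform: one output row per calendar day found in the staging table.
conn.execute(
    """
    INSERT INTO daily_agg
    SELECT date(event_ts), COUNT(*), SUM(amount)
    FROM stage_tbl
    GROUP BY date(event_ts)
    """
)

daily_rows = conn.execute(
    "SELECT day, row_count, total_amount FROM daily_agg ORDER BY day"
).fetchall()
print(daily_rows)
```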

Step 4: Unload the daily dataset to populate the S3 data lake bucket
The transformed results are now unloaded into another S3 bucket, where they can be further processed and made available for end-user reporting using a number of different tools, including Redshift Spectrum and Amazon Athena.
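
As a rough sketch, the unload step comes down to one UNLOAD statement. The helper below only builds the SQL text; the table name daily_agg, the bucket prefix, and the IAM role ARN are hypothetical. Single quotes inside the query are doubled, because UNLOAD takes the SELECT as a quoted string.

```python
def build_unload_sql(select_sql, s3_prefix, iam_role_arn):
    """Build a Redshift UNLOAD statement for the given SELECT."""
    # The SELECT is passed as a string literal, so inner quotes are doubled.
    escaped = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "DELIMITER ',' GZIP ALLOWOVERWRITE;"
    )


# All names below are placeholders for illustration.
unload_sql = build_unload_sql(
    "SELECT * FROM daily_agg WHERE day = '2018-01-27'",
    "s3://example-datalake-bucket/daily/2018-01-27/part_",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(unload_sql)
```

Writing each day under its own prefix keeps the unloaded files easy to expose as partitions to tools such as Redshift Spectrum or Athena.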

Monday, January 22, 2018

India Economics 2018


Over the last 3 years, Indian politics has seen differing perspectives on the current Prime Minister, Mr. Narendra Modi.  Set aside the criticism of his initiatives for a moment.  Even in a family of 4 we have differences of opinion; think about 1.4 billion people with different cultures, languages, religions, etc.

As a fellow son of my motherland, he inspires his fellow citizens to believe that anything is possible in life: a poor boy can become prime minister of the world's largest democracy, a powerful nation.

As an industry leader, I review his efforts along 2 dimensions (technical, not political): leadership skill and business domain.

(1) Leadership Skill
As Warren Bennis says, "Leadership is the capacity to translate vision into reality".  The Prime Minister has demonstrated strong leadership with courageous execution of several long-pending initiatives, such as the Goods and Services Tax (GST), demonetization, the unique identity (Aadhaar) rollout, citizens' welfare through health policy, digital innovation, etc.

(2) Business Domain
The Indian Prime Minister is preparing this week to address global business and political leaders in Davos, Switzerland, as his country passes France and the U.K. to become the world’s fifth-largest economy, underscoring the South Asian nation’s drive for recognition as a great power.

Some economists calculate that India’s gross domestic product jumped into the top five last quarter as it continued to outgrow every country in Europe and, for that matter, most of the rest of the world. This was reported in The Wall Street Journal (WSJ).
Ref: https://www.wsj.com/articles/davos-offers-modi-stage-to-push-muscular-vision-for-india-1516552979

A big salute to all the contributors.  Jai Hind!

Saturday, January 20, 2018

AD integration with EMR

Active Directory (AD) is a directory service that Microsoft developed for Windows domain networks. It is included in most Windows Server operating systems as a set of processes and services.
Today, many enterprises use Microsoft Active Directory to manage users, groups, and computers in a network.

This article is about seamlessly integrating Active Directory with Amazon EMR so that users keep the same single sign-on (SSO) experience.

Ref: https://aws.amazon.com/blogs/big-data/use-kerberos-authentication-to-integrate-amazon-emr-with-microsoft-active-directory/

The ability to authenticate users and services with Kerberos not only allows you to secure your big data applications, but it also enables you to easily integrate Amazon EMR clusters with an Active Directory environment.  It is also possible to use AWS CloudFormation to automate the deployment of this solution.
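
To make this more concrete, here is a hedged sketch of what an EMR security configuration with a cluster-dedicated KDC and a cross-realm trust to Active Directory might look like, built as a Python dict and serialized to JSON. The key names follow the EMR security configuration format as I understand it and should be checked against the EMR documentation before use; the realm, domain, and server names are hypothetical placeholders.

```python
import json


def kerberos_security_config(realm, domain, admin_server, kdc_server,
                             ticket_lifetime_hours=24):
    """Build an EMR security configuration dict for Kerberos with AD trust.

    Key names are my understanding of the EMR format (an assumption);
    all argument values are supplied by the caller.
    """
    return {
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ClusterDedicatedKdc",
                "ClusterDedicatedKdcConfiguration": {
                    "TicketLifetimeInHours": ticket_lifetime_hours,
                    # One-way trust from the cluster KDC to the AD realm.
                    "CrossRealmTrustConfiguration": {
                        "Realm": realm,
                        "Domain": domain,
                        "AdminServer": admin_server,
                        "KdcServer": kdc_server,
                    },
                },
            }
        }
    }


# Placeholder Active Directory details, for illustration only.
config = kerberos_security_config(
    realm="EXAMPLE.CORP",
    domain="example.corp",
    admin_server="ad.example.corp",
    kdc_server="ad.example.corp",
)
config_json = json.dumps(config, indent=2)
print(config_json)
```

Note that the Kerberos realm is conventionally written in upper case, while the DNS domain is lower case.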

Sunday, January 14, 2018

Effective vs Efficient

At the back of my mind, a question keeps popping up about the difference between Effective and Efficient. A couple of management books and a few blogs enlightened me, so I am sharing this knowledge over the Pongal holidays.

Definition

In the fundamentals of computing, three stages are vital in any system. They are:
  1. Input
  2. Process
  3. Output
In this context, Efficiency focuses on lower input and higher output, with 'quantity' at its core. For example, a business is efficient if it delivers a high quantity of output with fewer resources. Let us assume a building is normally constructed in 6 months by 100 construction professionals. To be more efficient, the business is expected to complete the same task in less time (say 3 months) with fewer people (50). The project can then be claimed to be more efficient, with each input (time and headcount) cut by 50%.
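
Working the building example through as arithmetic shows the combined effect of halving both inputs. The sketch below measures input as person-months, which is my own choice of unit rather than something stated in the example.

```python
def person_months(people, months):
    """Total labour input, measured in person-months."""
    return people * months


baseline = person_months(people=100, months=6)  # normal plan
improved = person_months(people=50, months=3)   # efficient plan

# Fraction of the original labour input that was saved.
reduction = 1 - improved / baseline

print(baseline, improved, reduction)
```

Each input falls by 50%, but because they multiply, the total labour input drops from 600 to 150 person-months, a 75% reduction for the same output.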
Coming to the Effective factor, there is a correlation between input and output parameters. It relates to the fine-tuned processing of doing the right things. In turn, Effectiveness has a 'quality' focus rather than a 'quantity' focus.

Expert Opinion

In essence, management guru Peter Drucker puts it this way: "Efficiency is doing things right; effectiveness is doing the right things."

Mathematical Mode

In mathematical terms, the two can be represented as a 2 x 2 matrix of effective versus efficient.
The most effective approach succeeds, but at a high cost; the most efficient approach tends to fail because of cost control. The optimum solution is a balance between effective and efficient.

Management Mode

Let us switch to business management mode. Objectives are the key goals in a business context. Return on Investment (RoI) and Cost are the vital parameters that drive the Effective and Efficient model in business management.
In high-effective, low-efficient mode, the goal is achieved at a high cost, so the result is high RoI and high Cost. High-efficient, low-effective mode yields lower production at low cost, so the result is lower RoI and lower Cost.
Ideally, business management targets high RoI and low Cost by being both highly effective and highly efficient. Strive to Thrive is the success mantra!

Conclusion

From an industry leader's perspective, the top 5 summary points are:
  1. Effective and Efficient are interrelated tools, best leveraged together
  2. Both are performance scorecards/indicators for getting things done on time
  3. They motivate you to share transparent customer feedback with your team
  4. They engage the team and give them a sense of belonging
  5. In turn, they foster a positive work environment for better business results
As Ron Kaufman said, "First be effective and then be efficient".

Monday, January 1, 2018

AWS Digital Training


AWS Training and Certification recently released free digital training courses that will make it easier for you to build your cloud skills and learn about using AWS Big Data services. This training includes courses like Introduction to Amazon EMR and Introduction to Amazon Athena.

You can get free and unlimited access to more than 100 new digital training courses built by AWS experts at aws.training. It’s easy to access training related to big data. Just choose the Analytics category on the Find Training page to browse through the list of courses. You can also use the keyword filter to search for training for specific AWS offerings.

Reference link: https://www.aws.training/

Recommended training
Just getting started, or looking to learn about a new service? Check out the following digital training courses:

Introduction to Amazon EMR (15 minutes)
Covers the available tools that can be used with Amazon EMR and the process of creating a cluster. It includes a demonstration of how to create an EMR cluster.

Introduction to Amazon Athena (10 minutes)
Introduces the Amazon Athena service along with an overview of its operating environment. It covers the basic steps in implementing Athena and provides a brief demonstration.

Introduction to Amazon QuickSight (10 minutes)
Discusses the benefits of using Amazon QuickSight and how the service works. It also includes a demonstration so that you can see Amazon QuickSight in action.

Introduction to Amazon Redshift (10 minutes)
Walks you through Amazon Redshift and its core features and capabilities. It also includes a quick overview of relevant use cases and a short demonstration.

Introduction to AWS Lambda (10 minutes)
Discusses the rationale for using AWS Lambda, how the service works, and how you can get started using it.

Introduction to Amazon Kinesis Analytics (10 minutes)
Discusses how Amazon Kinesis Analytics collects, processes, and analyzes streaming data in real time. It discusses how to use and monitor the service and explores some use cases.

Introduction to Amazon Kinesis Streams (15 minutes)
Covers how Amazon Kinesis Streams is used to collect, process, and analyze real-time streaming data to create valuable insights.

Introduction to AWS IoT (10 minutes)
Describes how the AWS Internet of Things (IoT) communication architecture works, and the components that make up AWS IoT. It discusses how AWS IoT works with other AWS services and reviews a case study.

Introduction to AWS Data Pipeline (10 minutes)
Covers components like tasks, task runner, and pipeline. It also discusses what a pipeline definition is, and reviews the AWS services that are compatible with AWS Data Pipeline.