Sunday, September 5, 2021

Lifecycle Hooks

 

Problem Statement

One of our largest production boxes hosts the core business functionality, with the complete set of data and services.

During a deployment, the scale-in process starts only after an hour. The business impact is that the business has to wait about an hour before the newly scaled-out services take effect.

Why

A lifecycle hook provides a specified amount of time (one hour by default) to complete the lifecycle action before the instance transitions to the next state.

It lets you control what happens when Amazon EC2 instances are launched and terminated as the Auto Scaling group scales out and in.

How

While troubleshooting the lifecycle hook, we found the error in the instance_terminate lifecycle transition. As the resolution, a new lifecycle hook was created for instance_terminate with the right heartbeat timeout and with the Auto Scaling default result set to 'Continue'.

The fix was applied in two ways: manually in the AWS console and programmatically with Terraform.

resource "aws_autoscaling_lifecycle_hook" "app" {
  depends_on             = ....
  autoscaling_group_name = ....
  name                   = ....

  # Proceed with termination once the timeout elapses.
  default_result = "CONTINUE"

  # Heartbeat timeout in seconds (10 minutes instead of the one-hour default).
  heartbeat_timeout    = 600
  lifecycle_transition = "autoscaling:EC2_INSTANCE_TERMINATING"

  notification_target_arn = ....
  role_arn                = ....
}
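
For readers who want to try this out, here is a minimal sketch of how the elided values might be supplied through input variables. The variable names (asg_name, notification_topic_arn, hook_role_arn), the resource label and the hook name are all hypothetical and are not taken from the original setup. Only the transition, timeout and default result match the fix described above.

```hcl
# Hypothetical input variables; the names are illustrative assumptions.
variable "asg_name" {
  type        = string
  description = "Name of the existing Auto Scaling group"
}

variable "notification_topic_arn" {
  type        = string
  description = "ARN of the SNS topic or SQS queue that receives hook notifications"
}

variable "hook_role_arn" {
  type        = string
  description = "ARN of the IAM role that lets Auto Scaling publish to the target"
}

resource "aws_autoscaling_lifecycle_hook" "terminate_sketch" {
  name                   = "example-terminate-hook" # hypothetical name
  autoscaling_group_name = var.asg_name
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"

  # Wait at most 600 seconds (10 minutes), then continue with termination.
  heartbeat_timeout = 600
  default_result    = "CONTINUE"

  notification_target_arn = var.notification_topic_arn
  role_arn                = var.hook_role_arn
}
```

With a 600-second heartbeat timeout, a terminating instance waits at most about ten minutes in the hook, unless a heartbeat is recorded or the lifecycle action is completed earlier. With the default timeout, the same wait is one hour.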

Conclusion

With this fix, the reported scale-in delay is resolved and the deployment meets the business expectation. Technology needs to enable the business.
