Async Callback in Step Function with Task Token
👋 I am Sabyasachi Bhattacharya. In every newsletter, I share articles about serverless technologies on AWS. If you would like more posts like this, consider subscribing!
In AWS Step Functions, when orchestrating workflows that require pausing the execution until an external process completes, you can utilize the waitForTaskToken pattern. This is particularly effective when the workflow must wait for external validations or processing before proceeding.
Let’s consider an example.
Scenario:
Imagine a banking system that processes loan applications. The system must evaluate several criteria before granting a loan. Each evaluation may involve different services and might not be completed instantaneously.
Our architecture contains:
A Step Functions state machine that receives a list of tasks and sends those tasks to an SNS topic.
Multiple SQS queues that are subscribed to the topic, each with a subscription filter so that it receives only specific messages.
Separate Lambda functions, each of which receives and processes messages from its respective queue.
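To make the subscription filter concrete, here is a minimal sketch in Go that builds a filter policy matching on the `Task` message attribute. The helper name `filterPolicy` and the task name `CreditCheck` are assumptions for illustration; the resulting JSON string is what we would attach to a queue's subscription as its filter policy.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filterPolicy builds an SNS subscription filter policy that matches
// messages whose "Task" message attribute equals one of the given values.
// The helper itself is hypothetical; only the JSON shape matters.
func filterPolicy(tasks ...string) (string, error) {
	b, err := json.Marshal(map[string][]string{"Task": tasks})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// "CreditCheck" is an assumed task name, not one taken from the article.
	p, err := filterPolicy("CreditCheck")
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // {"Task":["CreditCheck"]}
}
```

Each queue would get its own policy, so a message published with `Task` set to a given value only reaches the queue whose Lambda function knows how to handle it.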
The diagram looks like the following -
"Process Tasks": {
  "Type": "Map",
  "Next": "Process Approval",
  "ItemsPath": "$.tasks",
  "Iterator": {
    "StartAt": "Publish message to SNS topic",
    "States": {
      "Publish message to SNS topic": {
        "Type": "Task",
        "Resource": "arn:aws:states:::sns:publish.waitForTaskToken",
        "Parameters": {
          "Message.$": "States.Format('Task for orderId {} is {}', $.orderId, $.task)",
          "MessageAttributes": {
            "Task": {
              "DataType": "String",
              "StringValue.$": "$.task"
            },
            "TaskToken": {
              "DataType": "String",
              "StringValue.$": "$$.Task.Token"
            }
          },
          "TopicArn": "arn:aws:sns:xxx:xxxxxxx:task"
        },
        "End": true
      }
    }
  },
  "InputPath": "$.workflow",
  "ResultPath": null
}
An important part of the above snippet is the task resource, which ends with the `.waitForTaskToken` suffix. This makes the step pause until the task token is returned. The token itself is read from the context object (`$$.Task.Token`) and passed along as the `TaskToken` message attribute.
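On the consumer side, each Lambda function has to recover that token from the message it receives. The following is a minimal sketch, assuming raw message delivery is disabled on the subscription, so the SQS message body is the SNS notification JSON. The type `snsEnvelope` and the function `extractTaskToken` are hypothetical names introduced only for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// snsEnvelope models the subset of the SNS notification JSON that arrives
// as the SQS message body when raw message delivery is disabled.
type snsEnvelope struct {
	Message           string `json:"Message"`
	MessageAttributes map[string]struct {
		Type  string `json:"Type"`
		Value string `json:"Value"`
	} `json:"MessageAttributes"`
}

// extractTaskToken returns the value of the "TaskToken" message attribute
// that the state machine's publish step attached to the message.
func extractTaskToken(sqsBody string) (string, error) {
	var env snsEnvelope
	if err := json.Unmarshal([]byte(sqsBody), &env); err != nil {
		return "", fmt.Errorf("decode SNS envelope: %w", err)
	}
	attr, ok := env.MessageAttributes["TaskToken"]
	if !ok || attr.Value == "" {
		return "", errors.New("TaskToken attribute missing")
	}
	return attr.Value, nil
}

func main() {
	// A made-up body shaped like an SNS notification delivered to SQS.
	body := `{"Message":"Task for orderId 42 is CreditCheck",` +
		`"MessageAttributes":{` +
		`"Task":{"Type":"String","Value":"CreditCheck"},` +
		`"TaskToken":{"Type":"String","Value":"example-token"}}}`

	token, err := extractTaskToken(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(token) // example-token
}
```

If raw message delivery were enabled instead, the attributes would arrive as SQS message attributes rather than inside the body, and this parsing step would not apply.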
After processing a message, each Lambda function needs to make one of the following API calls to Step Functions:
SendTaskSuccess
SendTaskFailure
SendTaskHeartbeat
// Report success to Step Functions, returning the task token we received.
_, err = sfnClient.SendTaskSuccess(ctx, &sfn.SendTaskSuccessInput{
	Output:    aws.String("{\"result\":\"Credit Check Done\"}"),
	TaskToken: &taskTokenValue,
})
Note that the `Output` attribute must be valid JSON.
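One way to keep `Output` valid is to marshal a value instead of hand-writing the JSON string, and to report a failure when processing goes wrong so the token is always returned. The sketch below illustrates that idea only. The `callbackSender` interface is a hypothetical stand-in for the Step Functions client (its two methods would wrap `SendTaskSuccess` and `SendTaskFailure`), and the error codes are made up.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// callbackSender is a hypothetical stand-in for the Step Functions client.
// In real code, Success would wrap SendTaskSuccess and Failure would wrap
// SendTaskFailure.
type callbackSender interface {
	Success(token, outputJSON string) error
	Failure(token, errCode, cause string) error
}

// report always hands the token back: it marshals the result so that the
// output is valid JSON, and sends a failure if processing or encoding failed.
func report(s callbackSender, token string, result interface{}, procErr error) error {
	if procErr != nil {
		return s.Failure(token, "TaskFailed", procErr.Error())
	}
	out, err := json.Marshal(result)
	if err != nil {
		return s.Failure(token, "OutputEncodingFailed", err.Error())
	}
	return s.Success(token, string(out))
}

// recorder is a fake sender that remembers the last call for the demo.
type recorder struct{ last string }

func (r *recorder) Success(token, out string) error {
	r.last = "success:" + out
	return nil
}

func (r *recorder) Failure(token, code, cause string) error {
	r.last = "failure:" + code + ":" + cause
	return nil
}

func main() {
	r := &recorder{}

	_ = report(r, "example-token", map[string]string{"result": "Credit Check Done"}, nil)
	fmt.Println(r.last) // success:{"result":"Credit Check Done"}

	_ = report(r, "example-token", nil, errors.New("bureau unavailable"))
	fmt.Println(r.last) // failure:TaskFailed:bureau unavailable
}
```

Routing every outcome through one function like this makes it harder to forget the callback on an error path, which is the most common way a waiting task ends up stuck.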
How to avoid a stuck state
How do we avoid a scenario where the step function never receives the task token? Is it stuck forever? According to the documentation, the maximum execution time for a Standard state machine is one year. However, it is rarely desirable to keep an execution stuck for that long. A better approach is to set `HeartbeatSeconds` on the task. If neither the token nor a heartbeat (`SendTaskHeartbeat`) arrives within that interval, the task fails with a timeout error.
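For long-running work, the consumer then needs to keep sending heartbeats while it is busy. Here is a minimal sketch of such a loop, assuming the heartbeat interval is chosen comfortably shorter than `HeartbeatSeconds`. The function `runWithHeartbeat` and its `beat` callback (which would wrap `SendTaskHeartbeat` in real code) are hypothetical, and `demo` uses a fake job with millisecond intervals purely so the example runs quickly.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// runWithHeartbeat runs work and calls beat every interval until work returns.
// beat is a hypothetical wrapper around SendTaskHeartbeat. If beat fails
// (for example because the task has already timed out), the context passed
// to work is cancelled so the work can stop early.
func runWithHeartbeat(
	ctx context.Context,
	interval time.Duration,
	beat func(context.Context) error,
	work func(context.Context) error,
) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				if err := beat(ctx); err != nil {
					cancel()
					return
				}
			}
		}
	}()

	err := work(ctx)
	cancel() // stop the heartbeat goroutine
	<-done   // wait for it to exit before returning
	return err
}

// demo runs a fake 100 ms job with a 10 ms heartbeat and returns how many
// heartbeats were sent.
func demo() (int32, error) {
	var beats int32
	beat := func(context.Context) error {
		atomic.AddInt32(&beats, 1)
		return nil
	}
	work := func(ctx context.Context) error {
		select {
		case <-time.After(100 * time.Millisecond):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	err := runWithHeartbeat(context.Background(), 10*time.Millisecond, beat, work)
	return atomic.LoadInt32(&beats), err
}

func main() {
	n, err := demo()
	fmt.Println("heartbeats sent:", n, "error:", err)
}
```

Once the work finishes, the function would still need to call `SendTaskSuccess` or `SendTaskFailure`; heartbeats only keep the task alive, they do not complete it.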
To conclude, the Step Functions async callback with `waitForTaskToken` enables developers to create workflows that need to wait for an external task to complete. While designing such workflows, we need to take special care not to leave a task state waiting indefinitely and consuming resources.