Amazon Redshift is a fast, fully managed cloud data warehouse that lets you use SQL across your data warehouse, data lake, and operational databases to produce deep insights or machine learning (ML) predictions. A key differentiator of Amazon Redshift is its native integration with other AWS services, which makes it easier to build complete, enterprise-level analytics applications.
As analytics solutions have moved away from the one-size-fits-all model to choosing the right tool for the right function, architectures have become more optimized and performant while simultaneously becoming more complex. You can use Amazon Redshift for a variety of use cases, along with other AWS services for ingesting, transforming, and visualizing the data.
Manually deploying these services is time-consuming. It also risks human error and deviation from best practices.
In this post, we discuss how to automate the process of building an integrated analytics solution by using a simple script.
The framework described in this post uses Infrastructure as Code (IaC) to solve the challenges with manual deployments, by using AWS Cloud Development Kit (CDK) to automate provisioning AWS analytics services. You can indicate the services and resources you want to incorporate in your infrastructure by editing a simple JSON configuration file.
The script then provisions all the required infrastructure components dynamically and integrates them according to AWS recommended best practices.
In this post, we go into further detail on the specific steps to build this solution.
Prior to deploying the AWS CDK stack, complete the prerequisite steps. These include configuring an IAM user with the following policies:
AWSCloudShellFullAccess
IAMFullAccess
AWSCloudFormationFullAccess
AmazonSSMFullAccess
AmazonRedshiftFullAccess
AmazonS3ReadOnlyAccess
SecretsManagerReadWrite
AmazonEC2FullAccess
AmazonDMSRoleCustom, with the following permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "dms:*",
"Resource": "*"
}
]
}
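As a minimal sketch, the same policy could also be created programmatically instead of through the console. The function below assumes `iam_client` is a boto3 IAM client (for example `boto3.client("iam")`); the function name and its default argument are illustrative and do not come from the repo.

```python
import json

# Policy document copied from the prerequisites above.
DMS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "dms:*", "Resource": "*"},
    ],
}


def create_dms_policy(iam_client, policy_name="AmazonDMSRoleCustom"):
    """Create the custom policy and return its ARN.

    iam_client is assumed to be a boto3 IAM client; it is passed in
    so that this sketch has no import-time dependency on boto3.
    """
    response = iam_client.create_policy(
        PolicyName=policy_name,
        # The IAM API expects the policy document as a JSON string.
        PolicyDocument=json.dumps(DMS_POLICY),
    )
    return response["Policy"]["Arn"]
```

The returned ARN is what you would then attach to the IAM user alongside the managed policies listed above.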
To launch the target infrastructure, download the user-config-template.json file from the GitHub repo.
To prep the config file, start by entering one of the following values for each key in the top section: CREATE, N/A, or an existing resource ID to indicate whether you want to have the component provisioned on your behalf, skipped, or integrated using an existing resource in your account.
For each of the services with the CREATE value, you then edit the appropriate section under it with the specific parameters to use for that service. When you’re done customizing the form, save it as user-config.json.
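The three kinds of top-section values can be illustrated with a short sketch that reads a config and reports what each key asks for. The key names, the resource ID, and the nested per-service section below are hypothetical placeholders rather than the actual contents of user-config-template.json, and this is not the repo's own parsing logic.

```python
import json

CREATE = "CREATE"
SKIP = "N/A"


def classify(value):
    """Map a top-section value to the action it requests."""
    if value == CREATE:
        return "create"
    if value == SKIP:
        return "skip"
    # Anything else is treated as the ID of an existing resource.
    return "use-existing"


def plan_from_config(config_text):
    """Return {key: action} for every string-valued top-level key.

    Nested objects are assumed to be per-service parameter sections
    and are ignored here.
    """
    config = json.loads(config_text)
    return {
        key: classify(value)
        for key, value in config.items()
        if isinstance(value, str)
    }


# Hypothetical keys and resource ID, for illustration only.
sample = json.dumps({
    "vpc_id": "vpc-0123456789abcdef0",
    "redshift_endpoint": "CREATE",
    "dms_instance": "N/A",
    "redshift": {"node_type": "ra3.4xlarge"},
})
plan = plan_from_config(sample)
```

A check like this can be run against your own user-config.json before deployment to confirm that each component will be created, skipped, or reused as you intended.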
You can see an example of a completed config file in user-config-sample.json in the GitHub repo. It describes the following architecture, in which all the services are newly provisioned: Amazon Virtual Private Cloud (Amazon VPC), Amazon Redshift, an Amazon Elastic Compute Cloud (Amazon EC2) instance with AWS SCT, and an AWS DMS instance connecting an external source SQL Server on Amazon EC2 to the Amazon Redshift cluster.
This project uses CloudShell, a browser-based shell service, to programmatically initiate the deployment through the AWS Management Console. Prior to opening CloudShell, you need to configure an IAM user, as described in the prerequisites.
git clone https://github.com/aws-samples/amazon-redshift-infrastructure-automation.git
~/amazon-redshift-infrastructure-automation/scripts/deploy.sh
user-config.json
After you run the script, you can monitor the deployment of resource stacks through the CloudShell terminal, or through the AWS CloudFormation console, as shown in the following screenshot.
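Stack progress can also be polled from code. The sketch below assumes `cfn_client` is a boto3 CloudFormation client (for example `boto3.client("cloudformation")`) and relies on its `describe_stacks` call. The helper names and the prefix filter are illustrative, because the stack names that the script generates are not listed in this post.

```python
def summarize_stacks(cfn_client, name_prefix=""):
    """Return {stack_name: status} for stacks whose name starts with name_prefix.

    cfn_client is assumed to be a boto3 CloudFormation client.
    """
    statuses = {}
    kwargs = {}
    while True:
        page = cfn_client.describe_stacks(**kwargs)
        for stack in page["Stacks"]:
            if stack["StackName"].startswith(name_prefix):
                statuses[stack["StackName"]] = stack["StackStatus"]
        # Follow pagination until no further token is returned.
        token = page.get("NextToken")
        if not token:
            return statuses
        kwargs = {"NextToken": token}


def in_progress(statuses):
    """Names of stacks that are still being created, updated, or rolled back."""
    return sorted(n for n, s in statuses.items() if s.endswith("_IN_PROGRESS"))


def needs_attention(statuses):
    """Names of stacks whose status indicates a failure or a rollback."""
    return sorted(
        n for n, s in statuses.items() if "FAILED" in s or "ROLLBACK" in s
    )
```

When `in_progress` returns an empty list and `needs_attention` is also empty, every matching stack has reached a completed state.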
Each stack corresponds to the creation of a resource from the config file. You can see the newly created VPC, Amazon Redshift cluster, EC2 instance running AWS SCT, and AWS DMS instance. To verify the deployment, test the connectivity of the newly created AWS DMS endpoints to the source system and the target Amazon Redshift cluster. Select your endpoint and on the Actions menu, choose Test connection.
If both statuses say Success, the AWS DMS workflow is fully integrated.
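The same check can be sketched in code once the connection tests have been run. The functions below assume `dms_client` is a boto3 AWS DMS client (for example `boto3.client("dms")`) and that the API reports a passed test with the status string "successful". The endpoint ARNs are values you supply, and the function names are illustrative.

```python
def endpoint_connection_statuses(dms_client, endpoint_arn):
    """Return the status of each tested connection for one endpoint.

    dms_client is assumed to be a boto3 AWS DMS client.
    """
    response = dms_client.describe_connections(
        Filters=[{"Name": "endpoint-arn", "Values": [endpoint_arn]}]
    )
    return [conn["Status"] for conn in response["Connections"]]


def is_integrated(dms_client, source_arn, target_arn):
    """True only if both endpoints have connections and all of them passed."""
    for arn in (source_arn, target_arn):
        statuses = endpoint_connection_statuses(dms_client, arn)
        # Assumption: "successful" is the status string for a passed test.
        if not statuses or any(s != "successful" for s in statuses):
            return False
    return True
```

An endpoint with no recorded connection is treated as not integrated, because it means the test has not been run yet.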
If the stack launch stalls at any point, visit our GitHub repository for troubleshooting instructions.
In this post, we discussed how you can use the AWS Analytics Infrastructure Automation utility to quickly get started with Amazon Redshift and other AWS services. It helps you provision your entire solution on AWS without spending time on the challenges of integrating the services or scaling your solution.