<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Cloudy Things: Simple Cloud Stuff]]></title><description><![CDATA[I'm a Cloud Software Architect and author of the Simple AWS newsletter (www.simpleaws.dev). This blog is just my semi-structured ramblings about pretty basic cloud stuff.]]></description><link>https://blog.guilleojeda.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1685386304250/hdNIA3oLJ.png</url><title>Cloudy Things: Simple Cloud Stuff</title><link>https://blog.guilleojeda.com</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 10 Apr 2026 22:30:03 GMT</lastBuildDate><atom:link href="https://blog.guilleojeda.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Building Intelligent Agentic Applications with Amazon Bedrock and Nova]]></title><description><![CDATA[What Are Agentic AI Architectures?
I won't waste your time with a long, fluffy introduction about how AI is changing the world. Let's get straight to the point: agentic AI architectures are fundamentally different from the prompt-response pattern you...]]></description><link>https://blog.guilleojeda.com/building-intelligent-agentic-applications-with-amazon-bedrock-and-nova</link><guid isPermaLink="true">https://blog.guilleojeda.com/building-intelligent-agentic-applications-with-amazon-bedrock-and-nova</guid><category><![CDATA[AWS]]></category><category><![CDATA[AI]]></category><category><![CDATA[agentic AI]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Fri, 07 Mar 2025 20:44:58 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-what-are-agentic-ai-architectures">What Are Agentic AI Architectures?</h2>
<p>I won't waste your time with a long, fluffy introduction about how AI is changing the world. Let's get straight to the point: agentic AI architectures are fundamentally different from the prompt-response pattern you're probably used to with language models.</p>
<p>In an agentic architecture, the AI doesn't just spit out a response to your input. Instead, it functions as an autonomous agent that breaks down complex tasks into steps, executes those steps by calling the right tools, and uses the results to inform subsequent actions. Think of it as the difference between asking someone a question and hiring them to do a job - the agent actually does work on your behalf rather than just answering.</p>
<p>Amazon Bedrock and the Nova model family are AWS's offering in this space. Bedrock provides the managed infrastructure and orchestration, while Nova models serve as the intelligence. In this article we'll dig into how these technologies work together, the architectural patterns for implementing agentic systems, and the practical considerations for building them at scale.</p>
<h2 id="heading-understanding-amazon-bedrock-and-the-nova-model-family">Understanding Amazon Bedrock and the Nova Model Family</h2>
<p>Amazon Bedrock is AWS's fully managed service for building generative AI applications. It provides a unified API for accessing foundation models, but it's not just a model gateway: it's a comprehensive platform for building, deploying, and running AI applications without managing infrastructure.</p>
<p>The Amazon Nova family is AWS's proprietary set of foundation models, with several variants optimized for different use cases:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Model</th><th>Type</th><th>Context Window</th><th>Multimodal?</th><th>Best For</th><th>Pricing</th></tr>
</thead>
<tbody>
<tr>
<td>Nova Micro</td><td>Text-only</td><td>32K tokens</td><td>No</td><td>Simple tasks, classification, high volume</td><td>$0.000035/1K input tokens, $0.00014/1K output tokens</td></tr>
<tr>
<td>Nova Lite</td><td>Multimodal</td><td>128K tokens</td><td>Yes (text, image, video)</td><td>Balanced performance, routine agent tasks</td><td>$0.00006/1K input tokens, $0.00024/1K output tokens</td></tr>
<tr>
<td>Nova Pro</td><td>Multimodal</td><td>Up to 300K tokens</td><td>Yes (text, image, video)</td><td>Complex reasoning, sophisticated agents</td><td>$0.0008/1K input tokens, $0.0032/1K output tokens</td></tr>
</tbody>
</table>
</div><p>What makes these models particularly suited for agentic applications? First, they're optimized for function calling: the ability to output structured JSON requests for external tools. Second, those large context windows allow agents to maintain extensive conversation history and detailed instructions. Third, the multimodal capabilities (in Lite and Pro) let agents process images and videos alongside text.</p>
<p>Under the hood, Bedrock scales compute resources automatically based on demand. When your agent suddenly gets hit with a traffic spike, AWS provisions additional resources to maintain performance. There's no infrastructure for you to manage, just APIs to call.</p>
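<p>To make the function-calling point concrete, here's a sketch of how a tool is described to a Nova model through Bedrock's Converse API. The tool name, schema, model ID, and region below are illustrative (some regions require an inference profile ID instead of the bare model ID), and actually sending the request requires boto3 and AWS credentials:</p>
<pre><code class="lang-python"># Sketch: describing a tool to a Nova model via Bedrock's Converse API.
# The tool name and schema are illustrative, not a real flight API.
request = {
    "modelId": "amazon.nova-lite-v1:0",
    "messages": [
        {"role": "user", "content": [{"text": "Find flights from JFK to LAX on 2025-04-01"}]}
    ],
    "toolConfig": {
        "tools": [
            {
                "toolSpec": {
                    "name": "search_flights",
                    "description": "Search for flights between two airports on a date",
                    "inputSchema": {
                        "json": {
                            "type": "object",
                            "properties": {
                                "origin": {"type": "string"},
                                "destination": {"type": "string"},
                                "departDate": {"type": "string"},
                            },
                            "required": ["origin", "destination", "departDate"],
                        }
                    },
                }
            }
        ]
    },
}

# With boto3 and credentials configured, the call would be:
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.converse(**request)
# If the model decides to use the tool, the response message content
# contains a toolUse block with the structured JSON arguments.
</code></pre>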
<h2 id="heading-agentic-architectures-beyond-simple-prompt-response-systems">Agentic Architectures: Beyond Simple Prompt-Response Systems</h2>
<p>So what exactly makes agentic architectures different from regular LLM applications? Let me break it down with a practical analogy.</p>
<p>A traditional LLM application is like asking someone a question at an information desk: you expect them to answer based on what they know, but they won't leave their desk to do anything for you. An agentic architecture is more like having a personal assistant: they'll not only answer your questions, but also make phone calls, look up information, and take actions on your behalf.</p>
<p>The foundation of this approach is what we call the Reason-Act-Observe loop:</p>
<ol>
<li><p>Reason: The agent analyzes the current state and decides what to do next</p>
</li>
<li><p>Act: It executes an action by calling an external tool/API</p>
</li>
<li><p>Observe: It processes the result from that action</p>
</li>
<li><p>Loop: Based on what it observed, it reasons again about the next step</p>
</li>
</ol>
<p>This cycle continues until the agent determines it has completed the task. It's similar to how you might approach a complex task: you don't solve problems in one leap, but through a series of steps, evaluating after each one.</p>
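<p>The loop above can be sketched in a few lines of Python. This is a toy simulation with a hard-coded "reasoner" and a mock tool standing in for the model and real APIs; it is not Bedrock's actual orchestration logic:</p>
<pre><code class="lang-python"># Toy simulation of the Reason-Act-Observe loop (not Bedrock's real runtime).
def run_agent(task, tools, reason, max_steps=5):
    observations = []
    for _ in range(max_steps):
        # Reason: decide the next action from the task and what we've seen so far
        decision = reason(task, observations)
        if decision["action"] == "finish":
            return decision["answer"]
        # Act: call the chosen tool with the chosen arguments
        result = tools[decision["action"]](**decision["args"])
        # Observe: record the result so the next reasoning step can use it
        observations.append((decision["action"], result))
    return "gave up after max_steps"

# Mock tool and a hard-coded "reasoner" standing in for the model
def get_weather(city):
    return {"city": city, "forecast": "sunny"}

def fake_reason(task, observations):
    if not observations:
        return {"action": "get_weather", "args": {"city": "New York"}}
    return {"action": "finish", "answer": f"Forecast: {observations[-1][1]['forecast']}"}

print(run_agent("weather in NY?", {"get_weather": get_weather}, fake_reason))
# prints: Forecast: sunny
</code></pre>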
<p>Here's how this translates to AWS implementations. When you build an agent on Bedrock, you're essentially defining what tools (AWS calls these "action groups") the agent can use, what data sources (knowledge bases) it can reference, and what instructions guide its behavior. The actual orchestration (deciding which tool to use when, and chaining the steps together) is handled by Bedrock's agent runtime.</p>
<p>This approach has clear advantages. An agent can handle requests like "Find me flights to New York next weekend, check the weather forecast, and suggest some hotels near Central Park", a request that would be impossible to fulfill in one shot. By breaking it into steps (search flights, check weather, find hotels), and calling APIs for each piece of data, the agent can assemble a comprehensive response.</p>
<p>But this approach isn't without trade-offs. Agentic systems are more complex to configure, potentially slower (since multiple steps and API calls take time), and generally more expensive in terms of both token usage and compute costs. You're paying for the additional reasoning steps and API calls that happen behind the scenes.</p>
<h2 id="heading-bedrock-agents-building-blocks-and-architecture">Bedrock Agents: Building Blocks and Architecture</h2>
<p>A Bedrock Agent consists of several key components:</p>
<p>The <strong>foundation model</strong> is the brain of your agent. For complex agents, Amazon Nova Pro is typically the best choice with its 300K token context window and multimodal capabilities. For simpler tasks or cost-sensitive applications, Nova Lite (128K tokens) or even Nova Micro (32K tokens) might be sufficient.</p>
<p>The <strong>instructions</strong> define what your agent does. This is effectively a system prompt that guides the agent's behavior. For example:</p>
<pre><code class="lang-plaintext">You are a travel planning assistant. Your job is to help users find flights, accommodations, and plan itineraries. You have access to flight search APIs, hotel databases, and weather forecasts. Always confirm dates and locations before making any bookings. If the user's request is ambiguous, ask clarifying questions.
</code></pre>
<p><strong>Action Groups</strong> (what other frameworks might call "tools") define what your agent can do in the world. Each action group contains:</p>
<ul>
<li><p>A schema (OpenAPI or function schema) describing available actions</p>
</li>
<li><p>A Lambda function implementing those actions</p>
</li>
</ul>
<p>For example, a flight search action might be defined with this schema:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">openapi:</span> <span class="hljs-string">"3.0.0"</span>
<span class="hljs-attr">info:</span>
  <span class="hljs-attr">title:</span> <span class="hljs-string">FlightSearchAPI</span>
  <span class="hljs-attr">version:</span> <span class="hljs-string">"1.0"</span>
<span class="hljs-attr">paths:</span>
  <span class="hljs-string">/flights/search:</span>
    <span class="hljs-attr">get:</span>
      <span class="hljs-attr">summary:</span> <span class="hljs-string">Search</span> <span class="hljs-string">for</span> <span class="hljs-string">flights</span>
      <span class="hljs-attr">description:</span> <span class="hljs-string">Finds</span> <span class="hljs-string">available</span> <span class="hljs-string">flights</span> <span class="hljs-string">between</span> <span class="hljs-string">origin</span> <span class="hljs-string">and</span> <span class="hljs-string">destination</span> <span class="hljs-string">on</span> <span class="hljs-string">specified</span> <span class="hljs-string">dates.</span>
      <span class="hljs-attr">parameters:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">origin</span>
          <span class="hljs-attr">in:</span> <span class="hljs-string">query</span>
          <span class="hljs-attr">required:</span> <span class="hljs-literal">true</span>
          <span class="hljs-attr">schema:</span>
            <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
          <span class="hljs-attr">description:</span> <span class="hljs-string">Origin</span> <span class="hljs-string">airport</span> <span class="hljs-string">code</span> <span class="hljs-string">(e.g.,</span> <span class="hljs-string">"JFK"</span><span class="hljs-string">)</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">destination</span>
          <span class="hljs-attr">in:</span> <span class="hljs-string">query</span>
          <span class="hljs-attr">required:</span> <span class="hljs-literal">true</span>
          <span class="hljs-attr">schema:</span>
            <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
          <span class="hljs-attr">description:</span> <span class="hljs-string">Destination</span> <span class="hljs-string">airport</span> <span class="hljs-string">code</span> <span class="hljs-string">(e.g.,</span> <span class="hljs-string">"LAX"</span><span class="hljs-string">)</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">departDate</span>
          <span class="hljs-attr">in:</span> <span class="hljs-string">query</span>
          <span class="hljs-attr">required:</span> <span class="hljs-literal">true</span>
          <span class="hljs-attr">schema:</span>
            <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
          <span class="hljs-attr">description:</span> <span class="hljs-string">Departure</span> <span class="hljs-string">date</span> <span class="hljs-string">(YYYY-MM-DD)</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">returnDate</span>
          <span class="hljs-attr">in:</span> <span class="hljs-string">query</span>
          <span class="hljs-attr">required:</span> <span class="hljs-literal">false</span>
          <span class="hljs-attr">schema:</span>
            <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
          <span class="hljs-attr">description:</span> <span class="hljs-string">Return</span> <span class="hljs-string">date</span> <span class="hljs-string">for</span> <span class="hljs-string">round</span> <span class="hljs-string">trip</span> <span class="hljs-string">(YYYY-MM-DD)</span>
</code></pre>
<p>And a Lambda function to implement it:</p>
<pre><code class="lang-python">import json

def lambda_handler(event, context):
    # Bedrock Agents pass parameters as a list of {name, type, value} objects
    params = {p['name']: p['value'] for p in event.get('parameters', [])}
    origin = params.get('origin')
    destination = params.get('destination')
    depart_date = params.get('departDate')
    return_date = params.get('returnDate')

    # In a real implementation, you'd call your flight API here
    # For this example, we'll return mock data
    flights = [
        {
            "airline": "Oceanic Airlines",
            "flightNumber": "OA815",
            "departureTime": "08:15",
            "arrivalTime": "11:30",
            "price": 299.99,
            "currency": "USD"
        },
        {
            "airline": "United Airlines",
            "flightNumber": "UA456",
            "departureTime": "13:45",
            "arrivalTime": "17:00",
            "price": 349.99,
            "currency": "USD"
        }
    ]

    result = {
        "flights": flights,
        "origin": origin,
        "destination": destination,
        "departDate": depart_date,
        "returnDate": return_date
    }

    # Bedrock Agents expect this response envelope from action group Lambdas
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": event.get("apiPath"),
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": 200,
            "responseBody": {
                "application/json": {
                    "body": json.dumps(result)
                }
            }
        }
    }
</code></pre>
<p>Optional <strong>Knowledge Bases</strong> connect your agent to external data. These use vector embeddings (typically generated with Amazon Titan Embeddings) to find relevant information in your data sources. For instance, if you have a knowledge base of travel guides and a user asks about "things to do in Barcelona," the agent can automatically retrieve and reference the Barcelona guide.</p>
<p><strong>Prompt Templates</strong> control how the agent processes information at different stages. There are four main templates:</p>
<ul>
<li><p>Pre-processing (validating user input)</p>
</li>
<li><p>Orchestration (driving the decision-making)</p>
</li>
<li><p>Knowledge Base (handling retrievals)</p>
</li>
<li><p>Post-processing (refining the final answer)</p>
</li>
</ul>
<p>The power of Bedrock Agents lies in how these components work together. When a user sends a request, the agent:</p>
<ol>
<li><p>Processes the user input</p>
</li>
<li><p>Enters an orchestration loop where it repeatedly:</p>
<ul>
<li><p>Decides what to do next (answer directly or use a tool)</p>
</li>
<li><p>If using a tool, calls the corresponding Lambda</p>
</li>
<li><p>Processes the result and decides on next steps</p>
</li>
</ul>
</li>
<li><p>Delivers the final response once the task is complete</p>
</li>
</ol>
<p>All of this happens automatically: your code just calls <code>invoke_agent</code>, and Bedrock handles the complex orchestration behind the scenes.</p>
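<p>Here's roughly what that call looks like with boto3. The agent and alias IDs are placeholders, and actually invoking the agent requires AWS credentials; the response comes back as a stream of chunk events that you concatenate into the final answer:</p>
<pre><code class="lang-python">def join_completion(events):
    # The invoke_agent response streams the answer as chunk events;
    # concatenate the chunk bytes into the final text.
    parts = []
    for event in events:
        chunk = event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)

def ask_agent(prompt, session_id, agent_id="AGENT_ID", alias_id="ALIAS_ID"):
    # agent_id and alias_id are placeholders for your deployed agent
    import boto3  # imported here so the module loads without boto3 installed
    client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,  # reuse the same ID to keep conversation context
        inputText=prompt,
    )
    return join_completion(response["completion"])
</code></pre>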
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h2 id="heading-knowledge-bases-and-retrieval-augmented-generation">Knowledge Bases and Retrieval-Augmented Generation</h2>
<p>One of the most powerful features of Bedrock Agents is their ability to tap into your data through knowledge bases. This integration enables retrieval-augmented generation (RAG), where the agent grounds its responses in specific documents or data sources.</p>
<p>Setting up a knowledge base involves three steps:</p>
<ol>
<li><p>Prepare your data source. This could be documents in S3, a database, or another repository. Bedrock supports multiple file formats including PDFs, Word docs, text files, HTML, and more.</p>
</li>
<li><p>Create the knowledge base configuration, specifying:</p>
<ul>
<li><p>The data source (e.g., an S3 bucket)</p>
</li>
<li><p>An embedding model (e.g., Amazon Titan Embeddings)</p>
</li>
<li><p>Chunk size and overlap for document splitting</p>
</li>
<li><p>Metadata options for filtering</p>
</li>
</ul>
</li>
<li><p>Associate the knowledge base with your agent.</p>
</li>
</ol>
<p>When a user asks a question, the agent might determine it needs external information. It then:</p>
<ol>
<li><p>Formulates a search query based on the user's question</p>
</li>
<li><p>Sends this query to the knowledge base</p>
</li>
<li><p>Receives relevant document chunks</p>
</li>
<li><p>Incorporates these chunks into its reasoning</p>
</li>
<li><p>Generates a response grounded in this information</p>
</li>
</ol>
<p>There's a trade-off to consider with knowledge bases: adding retrieved content to prompts increases token count and therefore cost. A prompt that might normally be 500 tokens could easily grow to 2,000+ tokens with retrieved content. However, the improvement in answer quality is often worth it.</p>
<p>The chunking strategy significantly impacts retrieval quality. If chunks are too large, they'll contain irrelevant information and waste tokens. If they're too small, they might lose important context. A good starting point is 300-500 token chunks with about 10% overlap, but you'll need to experiment based on your specific content.</p>
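<p>As a starting point, that chunking strategy maps to a fixed-size chunking configuration when you create the data source. The shape below follows the bedrock-agent <code>create_data_source</code> API as I understand it (verify against the current API reference), with illustrative values in the 300-500 token range:</p>
<pre><code class="lang-python"># Sketch: fixed-size chunking settings for a knowledge base data source,
# roughly 400-token chunks with 10% overlap (tune for your content).
vector_ingestion_configuration = {
    "chunkingConfiguration": {
        "chunkingStrategy": "FIXED_SIZE",
        "fixedSizeChunkingConfiguration": {
            "maxTokens": 400,         # chunk size in tokens
            "overlapPercentage": 10,  # overlap between adjacent chunks
        },
    }
}
# Passed as vectorIngestionConfiguration to the bedrock-agent
# create_data_source call when setting up the knowledge base.
</code></pre>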
<h2 id="heading-performance-and-cost-optimization">Performance and Cost Optimization</h2>
<p>Let's talk numbers: how much will this actually cost you, and how do you keep it reasonable?</p>
<p>The cost of running agentic applications on Bedrock comes down to several factors:</p>
<ol>
<li><p><strong>Model Invocation Costs</strong>: This is the primary expense. Each time the agent "thinks," it invokes the foundation model, which charges per token. For Nova models, input tokens (what you send to the model) are 4 times cheaper than output tokens (what it generates). You can view the prices on the <a target="_blank" href="https://aws.amazon.com/bedrock/pricing/">official Bedrock pricing page</a>.</p>
</li>
<li><p><strong>Tool Execution Costs</strong>: Every tool the agent calls typically invokes a Lambda function and possibly other AWS services, each with their own costs.</p>
</li>
<li><p><strong>Knowledge Base Costs</strong>: These include the initial vectorization of your data, storage of embeddings, and retrieval operations.</p>
</li>
</ol>
<p>Here are some strategies to optimize costs:</p>
<p><strong>Use the right model for the job</strong>. Nova Micro is vastly cheaper than Nova Pro, so consider using it for simpler tasks. You could even implement a cascading approach: try with Micro first, and only escalate to Pro for complex queries.</p>
<p><strong>Optimize prompt sizes</strong>. Keep your instructions concise, trim conversation history when possible, and only include relevant information. Every token costs money.</p>
<p><strong>Take advantage of</strong> <a target="_blank" href="https://newsletter.simpleaws.dev/p/amazon-bedrock-prompt-caching"><strong>prompt caching</strong></a>. Bedrock caches repeated portions of prompts (like instructions or tool definitions) and offers up to 90% discount on those cached tokens. This can significantly reduce costs for agents that have consistent patterns.</p>
<p><strong>For high volume, use provisioned throughput</strong>. If you're consistently running many agent invocations, Provisioned Throughput offers lower per-token rates in exchange for a capacity commitment.</p>
<p><strong>Monitor token usage</strong>. Set up CloudWatch alarms to alert you if usage spikes unexpectedly, which could indicate an issue with your agent's logic or a potential abuse.</p>
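<p>For example, an alarm on hourly output-token usage might be defined like this. The metric and dimension names assume the AWS/Bedrock CloudWatch namespace (check the current metric names), and the threshold, alarm name, and SNS topic ARN are illustrative:</p>
<pre><code class="lang-python"># Sketch: alarm if output token usage for a model spikes.
# Threshold, alarm name, and SNS topic are illustrative placeholders.
alarm_params = {
    "AlarmName": "bedrock-output-tokens-spike",
    "Namespace": "AWS/Bedrock",
    "MetricName": "OutputTokenCount",
    "Dimensions": [{"Name": "ModelId", "Value": "amazon.nova-pro-v1:0"}],
    "Statistic": "Sum",
    "Period": 3600,            # evaluate hourly totals
    "EvaluationPeriods": 1,
    "Threshold": 5_000_000,    # tokens per hour before alerting
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
}
# With boto3: boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
</code></pre>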
<p>As for performance, agent orchestration adds latency because of the multiple steps involved. A simple query might take 2-3 seconds, while a complex one requiring multiple tool calls could take 10+ seconds. Be upfront with users about this latency, and consider implementing a streaming interface to show intermediate progress.</p>
<h2 id="heading-advanced-implementation-patterns">Advanced Implementation Patterns</h2>
<p>Beyond the basics, there are several advanced patterns that can enhance your agents' capabilities and efficiency.</p>
<p><strong>Custom Prompt Templates</strong>: The default Bedrock templates work well, but customizing them gives you more control. For example, you might modify the orchestration template to include specific reasoning steps or decision criteria:</p>
<pre><code class="lang-plaintext">Given the user's request and available tools, determine the best course of action by:
1. Identifying the specific information or task the user is requesting
2. Checking if you already have all necessary information in the context
3. If not, selecting the appropriate tool or asking a clarifying question
4. Once you have all information, providing a concise answer

Remember:
- Only use tools when necessary, not for information already provided
- Always verify flight details before proceeding with any booking
- If multiple actions are needed, handle them one at a time
</code></pre>
<p><strong>Model Cascading</strong>: You can implement a multi-tier approach where simple queries get handled by lightweight models and only complex ones escalate to more powerful models. This isn't built into Bedrock directly, but you can create a router function that analyzes incoming queries and dispatches them to different agents powered by different models.</p>
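<p>A router like that doesn't have to be sophisticated; even a crude heuristic can cut costs. Here's an illustrative sketch, where the length threshold and keyword list are arbitrary starting points you'd tune for your traffic:</p>
<pre><code class="lang-python"># Crude illustrative router: send short, simple queries to Nova Micro,
# everything else to Nova Pro. Thresholds and hints are arbitrary starting points.
COMPLEX_HINTS = ("compare", "plan", "analyze", "step by step", "why")

def pick_model(query):
    text = query.lower()
    # Long queries or ones with "reasoning" keywords go to the larger model
    if len(text) > 200 or any(h in text for h in COMPLEX_HINTS):
        return "amazon.nova-pro-v1:0"
    return "amazon.nova-micro-v1:0"

print(pick_model("What's the capital of France?"))
# prints: amazon.nova-micro-v1:0
print(pick_model("Plan a 5-day trip and compare hotel costs step by step"))
# prints: amazon.nova-pro-v1:0
</code></pre>
<p>In production you'd more likely use a cheap model invocation (or a confidence score from the first attempt) as the routing signal rather than keyword matching.</p>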
<p><strong>Chain of Agents</strong>: For complex workflows, you might create multiple specialized agents that work together. For example, a travel planning system might have separate agents for flight search, hotel recommendations, and itinerary creation. A controller coordinates between these agents, passing information between them as needed.</p>
<p><strong>Hybrid RAG Approaches</strong>: While basic RAG works well, advanced implementations might combine multiple retrieval strategies. For instance, you could implement a system that first attempts semantic search, then falls back to keyword search if the results aren't satisfactory. This can be implemented by customizing your Lambda functions that process knowledge base results.</p>
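<p>The fallback logic in such a Lambda might look like this sketch, where the two retriever functions are stand-ins for your actual semantic and keyword search calls, and the thresholds are assumptions to tune:</p>
<pre><code class="lang-python"># Illustrative fallback: try semantic search first, fall back to keyword
# search when there are too few confident hits. Retrievers are stand-ins.
def hybrid_search(query, semantic_search, keyword_search,
                  min_results=3, min_score=0.5):
    hits = semantic_search(query)
    good = [h for h in hits if h["score"] >= min_score]
    if len(good) >= min_results:
        return good
    # Not enough confident semantic hits: merge in keyword results,
    # skipping documents we already have
    seen = {h["id"] for h in good}
    for h in keyword_search(query):
        if h["id"] not in seen:
            good.append(h)
    return good
</code></pre>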
<p><strong>Integration with Human Workflows</strong>: For high-stakes scenarios, consider integrating human review into the agent's workflow. The agent can handle routine cases autonomously but elevate complex or risky cases to human reviewers. This requires additional orchestration logic, typically implemented through Step Functions or a similar workflow service.</p>
<h2 id="heading-security-and-access-control">Security and Access Control</h2>
<p>Security is particularly important for agentic applications because they actively invoke services and access data. Getting this wrong means your agent could potentially do things you never intended.</p>
<p>The cornerstone of Bedrock Agent security is IAM. Each agent operates with an IAM execution role that defines what AWS resources it can access. Follow the principle of least privilege rigidly: grant only the specific permissions needed for the agent's functions and nothing more.</p>
<p>Here's an example IAM policy for an agent that only needs to call two specific Lambda functions:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
    <span class="hljs-attr">"Statement"</span>: [
        {
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"lambda:InvokeFunction"</span>,
            <span class="hljs-attr">"Resource"</span>: [
                <span class="hljs-string">"arn:aws:lambda:us-east-1:123456789012:function:FlightSearchFunction"</span>,
                <span class="hljs-string">"arn:aws:lambda:us-east-1:123456789012:function:HotelSearchFunction"</span>
            ]
        }
    ]
}
</code></pre>
<p>Additionally, apply resource-based policies on your Lambda functions to ensure they can only be invoked by your Bedrock Agent:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
    <span class="hljs-attr">"Statement"</span>: [
        {
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Principal"</span>: {
                <span class="hljs-attr">"Service"</span>: <span class="hljs-string">"bedrock.amazonaws.com"</span>
            },
            <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"lambda:InvokeFunction"</span>,
            <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:lambda:us-east-1:123456789012:function:FlightSearchFunction"</span>,
            <span class="hljs-attr">"Condition"</span>: {
                <span class="hljs-attr">"StringEquals"</span>: {
                    <span class="hljs-attr">"AWS:SourceAccount"</span>: <span class="hljs-string">"123456789012"</span>
                }
            }
        }
    ]
}
</code></pre>
<p>For Lambda functions that access sensitive data or services, implement additional validation. Don't assume that because your agent is well-behaved, the data it passes to your functions will be well-formed or safe. Validate everything.</p>
<p>If your agent processes personal or sensitive information, consider:</p>
<ul>
<li><p>Using Bedrock Guardrails to filter inappropriate content</p>
</li>
<li><p>Implementing PII detection and masking in your Lambda functions</p>
</li>
<li><p>Encrypting sensitive data at rest and in transit</p>
</li>
<li><p>Setting up comprehensive logging and auditing</p>
</li>
</ul>
<p>If your agent acts on behalf of specific users, ensure user identity and permissions are properly propagated. One approach is to pass user tokens through the agent's session attributes and have your Lambda functions validate these tokens before accessing user-specific resources.</p>
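<p>A minimal sketch of that pattern: the caller passes the token through the agent's session state, and the action group Lambda refuses to act without it. The attribute name <code>userToken</code> is a hypothetical choice, and the token value below is a placeholder:</p>
<pre><code class="lang-python"># Caller side: pass the user's token through the agent's session state.
# (Hypothetical attribute name "userToken"; invoking requires boto3/credentials.)
invoke_kwargs = {
    "agentId": "AGENT_ID",
    "agentAliasId": "ALIAS_ID",
    "sessionId": "session-123",
    "inputText": "Book my usual flight",
    "sessionState": {"sessionAttributes": {"userToken": "PLACEHOLDER_JWT"}},
}
# client.invoke_agent(**invoke_kwargs)

# Lambda side: Bedrock forwards session attributes in the event.
def get_user_token(event):
    token = (event.get("sessionAttributes") or {}).get("userToken")
    if not token:
        raise PermissionError("missing user token; refusing to act")
    return token  # validate it (signature, expiry) before touching user data
</code></pre>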
<h2 id="heading-conclusion-the-future-of-agentic-applications-on-aws">Conclusion: The Future of Agentic Applications on AWS</h2>
<p>Agentic applications represent a significant step forward in what's possible with AI. By combining the reasoning capabilities of foundation models with the ability to take actions in the real world, these systems can handle complex tasks that would be impossible for traditional applications.</p>
<p>Amazon Bedrock and the Nova model family provide a robust platform for building these applications. You get the benefit of managed infrastructure and powerful foundation models, while retaining the flexibility to integrate with your existing AWS services and data.</p>
<p>The patterns we've explored in this article, from action groups and knowledge bases to security controls and cost optimizations, aren't just theoretical. They're being applied today in customer service, enterprise productivity, data analysis, and many other domains.</p>
<p>As you start exploring this space, remember that building effective agents requires balancing several factors: technical capability, user experience, security, and cost. The most successful implementations are those that get this balance right for their specific use case.</p>
<p>While the technology is powerful, it's not magic. Agents have limitations: they may sometimes misunderstand requests, take longer than expected to complete tasks, or struggle with highly complex workflows. Set realistic expectations with your users, and design your applications to gracefully handle these edge cases.</p>
<p>Despite these challenges, the potential is enormous. As foundation models continue to improve and AWS enhances the Bedrock platform, the possibilities for intelligent, autonomous applications will only expand. The agents you build today are just the beginning of a new approach to software that's more capable, more contextual, and more helpful than ever before.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Building Reliable Messaging Patterns in AWS with SQS and SNS]]></title><description><![CDATA[Building distributed systems requires putting a lot of attention on communication between components. These components often need to exchange information asynchronously, and that's where message queues and pub/sub systems are the go-to solution. AWS ...]]></description><link>https://blog.guilleojeda.com/building-reliable-messaging-patterns-in-aws-with-sqs-and-sns</link><guid isPermaLink="true">https://blog.guilleojeda.com/building-reliable-messaging-patterns-in-aws-with-sqs-and-sns</guid><category><![CDATA[AWS]]></category><category><![CDATA[architecture]]></category><category><![CDATA[distributed system]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Fri, 20 Dec 2024 15:20:38 GMT</pubDate><content:encoded><![CDATA[<p>Building distributed systems requires putting a lot of attention on communication between components. These components often need to exchange information asynchronously, and that's where message queues and pub/sub systems are the go-to solution. AWS provides two core services for this purpose: <strong>Simple Queue Service (SQS)</strong> and <strong>Simple Notification Service (SNS)</strong>. While these managed services handle the fundamental mechanics of message delivery, you need to understand how to configure them to build reliable distributed systems.</p>
<p>This article explores those configuration details, as well as practical patterns for implementing reliable messaging using SQS and SNS. We'll examine how these services work together, talk about error handling strategies, and learn how to scale messaging infrastructure effectively.</p>
<p>The examples in this article use Node.js, but the patterns apply to any programming language with an AWS SDK.</p>
<h2 id="heading-understanding-aws-messaging-services">Understanding AWS Messaging Services</h2>
<p>AWS messaging services solve different aspects of the distributed communication problem. <strong>SQS</strong> provides managed message queues that enable point-to-point communication between components. When a producer sends a message to an SQS queue, that message is delivered to <strong>a single consumer at a time</strong> (though, as we'll see below, standard queues may occasionally deliver it more than once). This behavior makes SQS ideal for workload distribution and task processing.</p>
<p>Here's how to create a basic SQS queue:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> standardQueueConfig = {
    <span class="hljs-attr">QueueName</span>: <span class="hljs-string">'order-processing-queue'</span>,
    <span class="hljs-attr">Attributes</span>: {
        <span class="hljs-attr">MessageRetentionPeriod</span>: <span class="hljs-string">'1209600'</span>,
        <span class="hljs-attr">ReceiveMessageWaitTimeSeconds</span>: <span class="hljs-string">'20'</span>,
        <span class="hljs-attr">VisibilityTimeout</span>: <span class="hljs-string">'30'</span>
    }
};
</code></pre>
<p>The configuration above defines how your queue is going to behave. Message retention period determines how long messages remain available if not processed, in this case 14 days. The receive message wait time enables long polling, reducing empty responses and unnecessary API calls. Visibility timeout specifies how long a message remains hidden during processing, preventing multiple consumers from processing the same message simultaneously.</p>
<p>SQS offers two queue types: <strong>Standard and FIFO (First-In-First-Out)</strong>. Standard queues provide "at-least-once" delivery and support nearly unlimited throughput, but messages may occasionally be delivered out of order or more than once. FIFO queues, on the other hand, guarantee exactly-once processing and strict message ordering, but with limited throughput - 3,000 messages per second with batching, or 300 without.</p>
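<p>The at-least-once guarantee has a practical consequence: consumers of standard queues should be idempotent, so that processing the same message twice doesn't corrupt state. Here's a minimal sketch of my own (in production you'd track processed IDs in a durable store such as DynamoDB, not in memory):</p>

```javascript
// Wrap a message handler so duplicate deliveries of the same MessageId
// are detected and skipped instead of being processed twice.
const makeIdempotentProcessor = (handler) => {
    const processed = new Set();
    return async (message) => {
        if (processed.has(message.MessageId)) {
            return { skipped: true };  // duplicate delivery, already handled
        }
        const result = await handler(message);
        processed.add(message.MessageId);
        return { skipped: false, result };
    };
};
```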
<p>FIFO queues require additional configuration:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> fifoQueueConfig = {
    <span class="hljs-attr">QueueName</span>: <span class="hljs-string">'order-processing-queue.fifo'</span>,
    <span class="hljs-attr">Attributes</span>: {
        <span class="hljs-attr">FifoQueue</span>: <span class="hljs-string">'true'</span>,
        <span class="hljs-attr">ContentBasedDeduplication</span>: <span class="hljs-string">'true'</span>,
        <span class="hljs-attr">MessageRetentionPeriod</span>: <span class="hljs-string">'1209600'</span>,
        <span class="hljs-attr">ReceiveMessageWaitTimeSeconds</span>: <span class="hljs-string">'20'</span>,
        <span class="hljs-attr">VisibilityTimeout</span>: <span class="hljs-string">'30'</span>
    }
};
</code></pre>
<p>The .fifo suffix in the queue name is mandatory for FIFO queues. Content-based deduplication automatically detects and removes duplicate messages based on their content, though you can also provide explicit deduplication IDs if needed.</p>
<p><strong>SNS</strong>, meanwhile, implements the <strong>publish-subscribe</strong> pattern. Messages sent to an SNS topic are delivered to multiple subscribers simultaneously. This makes SNS ideal for broadcasting notifications, implementing event-driven architectures, and decoupling services. When a message arrives at an SNS topic, it fans out to all subscribed endpoints immediately.</p>
<p>Creating an SNS topic involves specifying its name and basic attributes, such as server-side encryption. Note that message filtering is configured on individual subscriptions, not on the topic itself:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> topicConfig = {
    <span class="hljs-attr">Name</span>: <span class="hljs-string">'order-events'</span>,
    <span class="hljs-attr">Attributes</span>: {
        <span class="hljs-attr">KmsMasterKeyId</span>: <span class="hljs-string">'alias/aws/sns'</span>
    }
};
</code></pre>
<p>Message filtering in SNS deserves special attention because it can significantly reduce unnecessary processing. Rather than forcing every subscriber to receive and filter messages themselves, SNS can filter messages at the service level based on message attributes:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Message filtering configuration</span>
<span class="hljs-keyword">const</span> filterPolicy = {
    <span class="hljs-attr">eventType</span>: [<span class="hljs-string">'order_created'</span>],
    <span class="hljs-attr">priority</span>: [<span class="hljs-string">'HIGH'</span>],
    <span class="hljs-attr">region</span>: [<span class="hljs-string">'us-east-1'</span>, <span class="hljs-string">'us-west-2'</span>]
};

<span class="hljs-keyword">const</span> subscriptionAttributes = {
    <span class="hljs-attr">FilterPolicy</span>: <span class="hljs-built_in">JSON</span>.stringify(filterPolicy)
};
</code></pre>
<p>When applied to a subscription, this filter ensures the subscriber only receives messages matching specific criteria. This filtering happens before message delivery, reducing both processing overhead and potential costs.</p>
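<p>To make the matching semantics concrete, here's an approximation of how SNS evaluates a simple exact-match string policy. This is my own sketch of the logic, which SNS runs server-side; real filter policies also support operators like prefix matching, anything-but, and numeric ranges:</p>

```javascript
// A message matches when, for every key in the filter policy, the message
// carries that attribute and its value is one of the allowed values.
const matchesFilterPolicy = (filterPolicy, messageAttributes) =>
    Object.entries(filterPolicy).every(([key, allowedValues]) => {
        const attribute = messageAttributes[key];
        return attribute !== undefined
            && allowedValues.includes(attribute.StringValue);
    });

const examplePolicy = {
    eventType: ['order_created'],
    priority: ['HIGH']
};

matchesFilterPolicy(examplePolicy, {
    eventType: { DataType: 'String', StringValue: 'order_created' },
    priority: { DataType: 'String', StringValue: 'HIGH' }
}); // matches: delivered to the subscriber

matchesFilterPolicy(examplePolicy, {
    eventType: { DataType: 'String', StringValue: 'order_cancelled' },
    priority: { DataType: 'String', StringValue: 'HIGH' }
}); // no match: filtered out before delivery
```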
<h2 id="heading-implementing-reliable-queue-processing">Implementing Reliable Queue Processing</h2>
<p>Processing messages reliably requires paying special attention to several aspects of the messaging lifecycle.</p>
<p>First, let's look at a basic but reliable message processor:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> processQueue = <span class="hljs-keyword">async</span> (queueUrl) =&gt; {
    <span class="hljs-keyword">const</span> receiveParams = {
        <span class="hljs-attr">QueueUrl</span>: queueUrl,
        <span class="hljs-attr">MaxNumberOfMessages</span>: <span class="hljs-number">10</span>,
        <span class="hljs-attr">WaitTimeSeconds</span>: <span class="hljs-number">20</span>,
        <span class="hljs-attr">MessageAttributeNames</span>: [<span class="hljs-string">'All'</span>]
    };

    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> sqs.receiveMessage(receiveParams).promise();

        <span class="hljs-keyword">if</span> (!data.Messages) {
            <span class="hljs-keyword">return</span>;
        }

        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> message <span class="hljs-keyword">of</span> data.Messages) {
            <span class="hljs-keyword">try</span> {
                <span class="hljs-keyword">const</span> body = <span class="hljs-built_in">JSON</span>.parse(message.Body);
                <span class="hljs-keyword">await</span> processMessageByType(body);
                <span class="hljs-keyword">await</span> deleteMessage(queueUrl, message.ReceiptHandle);

                <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Successfully processed message <span class="hljs-subst">${message.MessageId}</span>`</span>);
            } <span class="hljs-keyword">catch</span> (error) {
                <span class="hljs-built_in">console</span>.error(<span class="hljs-string">`Error processing message <span class="hljs-subst">${message.MessageId}</span>:`</span>, error);
            }
        }
    } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'Error receiving messages:'</span>, error);
    }
};
</code></pre>
<p>This implementation includes several important reliability features. Long polling reduces unnecessary API calls while ensuring timely message processing. Batch message processing improves throughput and reduces costs. Error handling at both the receive and process levels ensures that failures don't crash the processor. Messages are only deleted after successful processing, ensuring no message is lost due to processing failures.</p>
<p>However, reliable message processing requires more than just careful implementation. We need to handle messages that consistently fail processing, implement proper monitoring, and ensure our system scales appropriately.</p>
<h2 id="heading-handling-failed-messages-with-dead-letter-queues">Handling Failed Messages with Dead Letter Queues</h2>
<p>Messages that can't be processed successfully after multiple attempts need special handling. <strong>Dead Letter Queues (DLQs)</strong> provide a way to isolate these problematic messages for analysis and potential reprocessing. Here's how to implement a good DLQ strategy:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> dlqConfig = {
    <span class="hljs-attr">QueueName</span>: <span class="hljs-string">'order-processing-dlq'</span>,
    <span class="hljs-attr">Attributes</span>: {
        <span class="hljs-attr">MessageRetentionPeriod</span>: <span class="hljs-string">'1209600'</span>
    }
};

<span class="hljs-keyword">const</span> mainQueueConfig = {
    <span class="hljs-attr">QueueName</span>: <span class="hljs-string">'order-processing'</span>,
    <span class="hljs-attr">Attributes</span>: {
        <span class="hljs-attr">RedrivePolicy</span>: <span class="hljs-built_in">JSON</span>.stringify({
            <span class="hljs-attr">deadLetterTargetArn</span>: dlqArn,
            <span class="hljs-attr">maxReceiveCount</span>: <span class="hljs-number">3</span>
        })
    }
};
</code></pre>
<p>The redrive policy automatically moves messages to the DLQ after multiple failed processing attempts. This prevents infinite processing loops while preserving failed messages for analysis. The maxReceiveCount parameter determines how many processing attempts are allowed before a message moves to the DLQ.</p>
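<p>The redrive policy caps how many times a message is retried, but it doesn't control how quickly those retries happen. One option, a sketch of my own rather than a built-in SQS feature, is to grow the message's visibility timeout with its receive count, so each retry backs off exponentially up to the 12-hour visibility maximum:</p>

```javascript
// Exponential backoff between redelivery attempts: before letting a failed
// message become visible again, raise its visibility timeout based on how
// many times it has already been received. 43200 seconds is the SQS maximum.
const backoffSeconds = (receiveCount, baseSeconds = 30, maxSeconds = 43200) =>
    Math.min(baseSeconds * 2 ** (receiveCount - 1), maxSeconds);

// Parameters for sqs.changeMessageVisibility(); receiveCount would come
// from the message's ApproximateReceiveCount system attribute
const visibilityParams = (queueUrl, receiptHandle, receiveCount) => ({
    QueueUrl: queueUrl,
    ReceiptHandle: receiptHandle,
    VisibilityTimeout: backoffSeconds(receiveCount)
});
```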
<p>Processing messages from a DLQ requires a couple of changes:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> processDLQ = <span class="hljs-keyword">async</span> (dlqUrl) =&gt; {
    <span class="hljs-keyword">const</span> params = {
        <span class="hljs-attr">QueueUrl</span>: dlqUrl,
        <span class="hljs-attr">MaxNumberOfMessages</span>: <span class="hljs-number">10</span>,
        <span class="hljs-attr">WaitTimeSeconds</span>: <span class="hljs-number">20</span>,
        <span class="hljs-attr">AttributeNames</span>: [<span class="hljs-string">'All'</span>],
        <span class="hljs-attr">MessageAttributeNames</span>: [<span class="hljs-string">'All'</span>]
    };

    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> sqs.receiveMessage(params).promise();

        <span class="hljs-keyword">if</span> (!data.Messages) {
            <span class="hljs-keyword">return</span>;
        }

        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> message <span class="hljs-keyword">of</span> data.Messages) {
            <span class="hljs-keyword">try</span> {
                <span class="hljs-keyword">const</span> failureAnalysis = <span class="hljs-keyword">await</span> analyzeFailure(message);

                <span class="hljs-keyword">if</span> (failureAnalysis.isRecoverable) {
                    <span class="hljs-keyword">await</span> returnToMainQueue(message);
                } <span class="hljs-keyword">else</span> {
                    <span class="hljs-keyword">await</span> storeFailedMessage(message);
                }

                <span class="hljs-keyword">await</span> deleteMessage(dlqUrl, message.ReceiptHandle);
            } <span class="hljs-keyword">catch</span> (error) {
                <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'Error processing DLQ message:'</span>, error);
            }
        }
    } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'Error receiving DLQ messages:'</span>, error);
    }
};

<span class="hljs-keyword">const</span> analyzeFailure = <span class="hljs-keyword">async</span> (message) =&gt; {
    <span class="hljs-comment">// SentTimestamp and ApproximateReceiveCount are system attributes,</span>
    <span class="hljs-comment">// exposed under message.Attributes (not message.MessageAttributes)</span>
    <span class="hljs-keyword">const</span> attributes = message.Attributes;
    <span class="hljs-keyword">const</span> messageAge = <span class="hljs-built_in">Date</span>.now() - <span class="hljs-built_in">parseInt</span>(attributes.SentTimestamp);
    <span class="hljs-keyword">const</span> failureCount = <span class="hljs-built_in">parseInt</span>(attributes.ApproximateReceiveCount);

    <span class="hljs-keyword">return</span> {
        <span class="hljs-attr">isRecoverable</span>: messageAge &lt; <span class="hljs-number">86400000</span> &amp;&amp; failureCount &lt; <span class="hljs-number">5</span>, <span class="hljs-comment">// under 24 hours old</span>
        <span class="hljs-attr">failureReason</span>: determineFailureReason(message)
    };
};
</code></pre>
<p>This implementation analyzes failed messages to determine if they're recoverable based on their age and failure count. Recoverable messages can be returned to the main queue for reprocessing, while permanently failed messages are stored for further analysis.</p>
<h2 id="heading-monitoring-and-observability">Monitoring and Observability</h2>
<p>A reliable messaging system requires good monitoring to detect and respond to issues before they impact your applications. <strong>Amazon CloudWatch</strong> provides basic metrics for both SQS and SNS, but effective monitoring requires understanding which metrics actually matter and how to interpret them.</p>
<p>For SQS queues, the ApproximateNumberOfMessages metric indicates how many messages are available for retrieval. However, this number alone doesn't tell the whole story. You also need to monitor ApproximateNumberOfMessagesNotVisible, which shows messages currently being processed, and ApproximateAgeOfOldestMessage, which can indicate processing backlogs or stalled consumers.</p>
<p>Here's how to set up basic queue monitoring:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> setupQueueMonitoring = <span class="hljs-keyword">async</span> (queueUrl) =&gt; {
    <span class="hljs-keyword">const</span> alarmConfig = {
        <span class="hljs-attr">AlarmName</span>: <span class="hljs-string">'QueueMessageAge'</span>,
        <span class="hljs-attr">AlarmDescription</span>: <span class="hljs-string">'Alert when messages are getting old'</span>,
        <span class="hljs-attr">MetricName</span>: <span class="hljs-string">'ApproximateAgeOfOldestMessage'</span>,
        <span class="hljs-attr">Namespace</span>: <span class="hljs-string">'AWS/SQS'</span>,
        <span class="hljs-attr">Dimensions</span>: [{
            <span class="hljs-attr">Name</span>: <span class="hljs-string">'QueueName'</span>,
            <span class="hljs-attr">Value</span>: getQueueNameFromUrl(queueUrl)
        }],
        <span class="hljs-attr">Period</span>: <span class="hljs-number">300</span>,
        <span class="hljs-attr">EvaluationPeriods</span>: <span class="hljs-number">2</span>,
        <span class="hljs-attr">Threshold</span>: <span class="hljs-number">3600</span>,
        <span class="hljs-attr">ComparisonOperator</span>: <span class="hljs-string">'GreaterThanThreshold'</span>,
        <span class="hljs-attr">Statistic</span>: <span class="hljs-string">'Maximum'</span>
    };

    <span class="hljs-keyword">await</span> cloudwatch.putMetricAlarm(alarmConfig).promise();
};
</code></pre>
<p>This configuration alerts you when messages remain unprocessed for more than an hour, which might indicate processing issues. However, CloudWatch metrics alone often don't provide enough visibility into message processing. Custom metrics can provide deeper insights into your system's behavior:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> recordCustomMetrics = <span class="hljs-keyword">async</span> (message, processingResult) =&gt; {
    <span class="hljs-keyword">const</span> metrics = [
        {
            <span class="hljs-attr">MetricName</span>: <span class="hljs-string">'MessageProcessingTime'</span>,
            <span class="hljs-attr">Value</span>: processingResult.duration,
            <span class="hljs-attr">Unit</span>: <span class="hljs-string">'Milliseconds'</span>,
            <span class="hljs-attr">Dimensions</span>: [
                {
                    <span class="hljs-attr">Name</span>: <span class="hljs-string">'MessageType'</span>,
                    <span class="hljs-attr">Value</span>: message.attributes.messageType
                },
                {
                    <span class="hljs-attr">Name</span>: <span class="hljs-string">'Environment'</span>,
                    <span class="hljs-attr">Value</span>: process.env.ENVIRONMENT
                }
            ],
            <span class="hljs-attr">Timestamp</span>: <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>()
        }
    ];

    <span class="hljs-keyword">await</span> cloudwatch.putMetricData({
        <span class="hljs-attr">Namespace</span>: <span class="hljs-string">'CustomMessageProcessing'</span>,
        <span class="hljs-attr">MetricData</span>: metrics
    }).promise();
};
</code></pre>
<p>These custom metrics track processing time by message type, helping you identify performance patterns and potential bottlenecks. You might discover that certain message types consistently take longer to process or fail more frequently than others.</p>
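<p>Those same metrics can feed scaling decisions. As a rough heuristic of my own (not an AWS formula), you can estimate how many consumers you need from the visible backlog and your measured processing time per message:</p>

```javascript
// Estimate consumer count: backlog divided by how many messages one
// consumer can clear within the target drain time.
const requiredConsumers = ({ visibleMessages, avgProcessingMs, targetDrainSeconds }) => {
    const perConsumerCapacity = (targetDrainSeconds * 1000) / avgProcessingMs;
    return Math.max(1, Math.ceil(visibleMessages / perConsumerCapacity));
};

requiredConsumers({
    visibleMessages: 12000,     // ApproximateNumberOfMessages
    avgProcessingMs: 250,       // from the custom MessageProcessingTime metric
    targetDrainSeconds: 300     // drain the backlog within 5 minutes
});
// 300s / 250ms = 1200 messages per consumer -> 10 consumers
```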
<h2 id="heading-security-and-access-control">Security and Access Control</h2>
<p>Security in messaging systems isn't just authentication and authorization. It also involves encryption, access control, and secure cross-account communication. Both SQS and SNS support server-side encryption using AWS KMS, which should be enabled for sensitive data (or for any data, really):</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> setupQueueEncryption = <span class="hljs-keyword">async</span> (queueUrl) =&gt; {
    <span class="hljs-keyword">const</span> attributes = {
        <span class="hljs-attr">QueueUrl</span>: queueUrl,
        <span class="hljs-attr">Attributes</span>: {
            <span class="hljs-attr">KmsMasterKeyId</span>: <span class="hljs-string">'alias/aws/sqs'</span>,
            <span class="hljs-attr">Policy</span>: <span class="hljs-built_in">JSON</span>.stringify({
                <span class="hljs-attr">Version</span>: <span class="hljs-string">'2012-10-17'</span>,
                <span class="hljs-attr">Statement</span>: [{
                    <span class="hljs-attr">Effect</span>: <span class="hljs-string">'Deny'</span>,
                    <span class="hljs-attr">Principal</span>: <span class="hljs-string">'*'</span>,
                    <span class="hljs-attr">Action</span>: <span class="hljs-string">'SQS:*'</span>,
                    <span class="hljs-attr">Resource</span>: queueArn,
                    <span class="hljs-attr">Condition</span>: {
                        <span class="hljs-attr">Bool</span>: {
                            <span class="hljs-string">'aws:SecureTransport'</span>: <span class="hljs-literal">false</span>
                        }
                    }
                }]
            })
        }
    };

    <span class="hljs-keyword">await</span> sqs.setQueueAttributes(attributes).promise();
};
</code></pre>
<p>Always remember the principle of least privilege. Producer services should only have permission to send messages, while consumer services should only have permission to receive and delete messages:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> producerPolicy = {
    <span class="hljs-attr">Version</span>: <span class="hljs-string">'2012-10-17'</span>,
    <span class="hljs-attr">Statement</span>: [{
        <span class="hljs-attr">Effect</span>: <span class="hljs-string">'Allow'</span>,
        <span class="hljs-attr">Action</span>: [
            <span class="hljs-string">'sqs:SendMessage'</span>,
            <span class="hljs-string">'sqs:GetQueueUrl'</span>
        ],
        <span class="hljs-attr">Resource</span>: queueArn,
        <span class="hljs-attr">Condition</span>: {
            <span class="hljs-attr">ArnLike</span>: {
                <span class="hljs-string">'aws:SourceArn'</span>: producerServiceArn
            }
        }
    }]
};
</code></pre>
<p>Cross-account messaging adds a bit of complexity. When services in different AWS accounts need to communicate, you must configure both the sender's IAM permissions and the receiving queue's resource policy:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> crossAccountQueuePolicy = {
    <span class="hljs-attr">Version</span>: <span class="hljs-string">'2012-10-17'</span>,
    <span class="hljs-attr">Statement</span>: [{
        <span class="hljs-attr">Effect</span>: <span class="hljs-string">'Allow'</span>,
        <span class="hljs-attr">Principal</span>: {
            <span class="hljs-attr">AWS</span>: sourceAccountArn
        },
        <span class="hljs-attr">Action</span>: <span class="hljs-string">'sqs:SendMessage'</span>,
        <span class="hljs-attr">Resource</span>: queueArn,
        <span class="hljs-attr">Condition</span>: {
            <span class="hljs-attr">StringEquals</span>: {
                <span class="hljs-string">'aws:SourceAccount'</span>: sourceAccountId
            }
        }
    }]
};
</code></pre>
<h2 id="heading-advanced-messaging-patterns">Advanced Messaging Patterns</h2>
<p>There will come a time when what I've shown above isn't enough for your system. Let's explore some advanced patterns that address common distributed system challenges.</p>
<p>Message batching can significantly improve throughput and reduce costs. However, implementing batching requires you to be mindful of how you handle failures and timeouts:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> batchProcessor = <span class="hljs-keyword">async</span> (queueUrl, messages, processor) =&gt; {
    <span class="hljs-keyword">const</span> messageGroups = messages.reduce(<span class="hljs-function">(<span class="hljs-params">groups, message</span>) =&gt;</span> {
        <span class="hljs-keyword">const</span> type = message.MessageAttributes.Type.StringValue;
        groups[type] = groups[type] || [];
        groups[type].push(message);
        <span class="hljs-keyword">return</span> groups;
    }, {});

    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> [type, groupMessages] <span class="hljs-keyword">of</span> <span class="hljs-built_in">Object</span>.entries(messageGroups)) {
        <span class="hljs-keyword">try</span> {
            <span class="hljs-keyword">await</span> processor(groupMessages, type);
            <span class="hljs-keyword">await</span> batchDeleteMessages(queueUrl, groupMessages);
        } <span class="hljs-keyword">catch</span> (error) {
            <span class="hljs-built_in">console</span>.error(<span class="hljs-string">`Error processing message group <span class="hljs-subst">${type}</span>:`</span>, error);

            <span class="hljs-comment">// Handle partial batch failures by deleting successful messages</span>
            <span class="hljs-keyword">if</span> (error.partialSuccess) {
                <span class="hljs-keyword">await</span> batchDeleteMessages(queueUrl, error.successfulMessages);
            }
        }
    }
};
</code></pre>
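<p>The <code>batchDeleteMessages</code> helper above is left undefined; here's one way to sketch it (assuming an initialized SQS client named <code>sqs</code>, as in the other snippets). <code>DeleteMessageBatch</code> accepts at most 10 entries per call, so larger groups have to be chunked first:</p>

```javascript
// Split an array into consecutive slices of at most `size` elements
const chunk = (items, size) => {
    const chunks = [];
    for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size));
    }
    return chunks;
};

const batchDeleteMessages = async (queueUrl, messages) => {
    for (const group of chunk(messages, 10)) {
        const result = await sqs.deleteMessageBatch({
            QueueUrl: queueUrl,
            Entries: group.map((message, index) => ({
                Id: String(index),  // must be unique within this batch
                ReceiptHandle: message.ReceiptHandle
            }))
        }).promise();
        // DeleteMessageBatch reports per-entry failures instead of throwing
        if (result.Failed && result.Failed.length > 0) {
            console.error('Failed to delete messages:', result.Failed);
        }
    }
};
```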
<p>When messages must be processed in order, such as in event sourcing systems, you need to implement ordering guarantees even with standard queues:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> orderDependentProcessor = <span class="hljs-keyword">async</span> (queueUrl) =&gt; {
    <span class="hljs-keyword">const</span> messageCache = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Map</span>();
    <span class="hljs-keyword">const</span> processingOrder = [];

    <span class="hljs-keyword">const</span> processMessageIfReady = <span class="hljs-keyword">async</span> (message) =&gt; {
        <span class="hljs-keyword">const</span> sequenceNumber = <span class="hljs-built_in">parseInt</span>(
            message.MessageAttributes.SequenceNumber.StringValue
        );

        <span class="hljs-keyword">if</span> (sequenceNumber !== processingOrder.length + <span class="hljs-number">1</span>) {
            messageCache.set(sequenceNumber, message);
            <span class="hljs-keyword">return</span>;
        }

        <span class="hljs-keyword">await</span> processMessage(message);
        processingOrder.push(sequenceNumber);

        <span class="hljs-keyword">let</span> nextSequence = sequenceNumber + <span class="hljs-number">1</span>;
        <span class="hljs-keyword">while</span> (messageCache.has(nextSequence)) {
            <span class="hljs-keyword">const</span> nextMessage = messageCache.get(nextSequence);
            messageCache.delete(nextSequence);
            <span class="hljs-keyword">await</span> processMessage(nextMessage);
            processingOrder.push(nextSequence);
            nextSequence++;
        }
    };

    <span class="hljs-keyword">return</span> processMessageIfReady;
};
</code></pre>
<p>Circuit breakers protect downstream services from cascade failures. In messaging systems, a circuit breaker can keep queue processors from overwhelming a struggling dependency, isolating the failure and preventing it from bringing down the entire system:</p>
<pre><code class="lang-javascript"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MessageProcessorCircuitBreaker</span> </span>{
    <span class="hljs-keyword">constructor</span>(failureThreshold = 5, resetTimeout = 60000) {
        <span class="hljs-built_in">this</span>.failureCount = <span class="hljs-number">0</span>;
        <span class="hljs-built_in">this</span>.failureThreshold = failureThreshold;
        <span class="hljs-built_in">this</span>.resetTimeout = resetTimeout;
        <span class="hljs-built_in">this</span>.lastFailureTime = <span class="hljs-literal">null</span>;
        <span class="hljs-built_in">this</span>.state = <span class="hljs-string">'CLOSED'</span>;
    }

    <span class="hljs-keyword">async</span> processMessage(message, processor) {
        <span class="hljs-keyword">if</span> (<span class="hljs-built_in">this</span>.state === <span class="hljs-string">'OPEN'</span>) {
            <span class="hljs-keyword">if</span> (<span class="hljs-built_in">Date</span>.now() - <span class="hljs-built_in">this</span>.lastFailureTime &gt;= <span class="hljs-built_in">this</span>.resetTimeout) {
                <span class="hljs-built_in">this</span>.state = <span class="hljs-string">'HALF_OPEN'</span>;
            } <span class="hljs-keyword">else</span> {
                <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'Circuit breaker is OPEN'</span>);
            }
        }

        <span class="hljs-keyword">try</span> {
            <span class="hljs-keyword">const</span> result = <span class="hljs-keyword">await</span> processor(message);

            <span class="hljs-keyword">if</span> (<span class="hljs-built_in">this</span>.state === <span class="hljs-string">'HALF_OPEN'</span>) {
                <span class="hljs-built_in">this</span>.state = <span class="hljs-string">'CLOSED'</span>;
                <span class="hljs-built_in">this</span>.failureCount = <span class="hljs-number">0</span>;
            }

            <span class="hljs-keyword">return</span> result;
        } <span class="hljs-keyword">catch</span> (error) {
            <span class="hljs-built_in">this</span>.handleFailure();
            <span class="hljs-keyword">throw</span> error;
        }
    }

    handleFailure() {
        <span class="hljs-built_in">this</span>.failureCount++;
        <span class="hljs-built_in">this</span>.lastFailureTime = <span class="hljs-built_in">Date</span>.now();

        <span class="hljs-keyword">if</span> (<span class="hljs-built_in">this</span>.failureCount &gt;= <span class="hljs-built_in">this</span>.failureThreshold) {
            <span class="hljs-built_in">this</span>.state = <span class="hljs-string">'OPEN'</span>;
        }
    }
}
</code></pre>
<h2 id="heading-performance-and-cost-optimization">Performance and Cost Optimization</h2>
<p>Here's where we talk about service limits, efficient processing patterns, and cost management. Standard SQS queues offer virtually unlimited throughput, while FIFO queues have specific limits you need to be mindful of:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> scalingConfig = {
    <span class="hljs-attr">standardQueue</span>: {
        <span class="hljs-attr">batchSize</span>: <span class="hljs-number">10</span>,
        <span class="hljs-attr">concurrentExecutions</span>: <span class="hljs-number">1000</span>,
        <span class="hljs-attr">processingTimeout</span>: <span class="hljs-number">30</span>
    },
    <span class="hljs-attr">fifoQueue</span>: {
        <span class="hljs-attr">maxThroughput</span>: <span class="hljs-number">3000</span>,
        <span class="hljs-attr">batchSize</span>: <span class="hljs-number">10</span>,
        <span class="hljs-attr">messageGroupId</span>: <span class="hljs-string">'orderProcessing'</span>,
        <span class="hljs-attr">deduplicationId</span>: uuid.v4(),
        <span class="hljs-attr">processingTimeout</span>: <span class="hljs-number">30</span>
    }
};
</code></pre>
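<p>If your producers can approach the FIFO throughput quota, it's worth throttling on the client side instead of discovering the limit through throttling errors. Here's a sketch of a simple token bucket for that purpose (the implementation is my own; the 300 messages per second figure is the non-batched FIFO quota discussed above):</p>

```javascript
// Client-side token bucket: tokens refill continuously at `ratePerSecond`,
// capped at `capacity`. Each send consumes one token; when tryRemove()
// returns false, the caller should wait before sending.
class TokenBucket {
    constructor(ratePerSecond, capacity = ratePerSecond) {
        this.rate = ratePerSecond;
        this.capacity = capacity;
        this.tokens = capacity;
        this.lastRefill = Date.now();
    }

    tryRemove(count = 1) {
        const now = Date.now();
        // Refill proportionally to elapsed time, capped at capacity
        this.tokens = Math.min(
            this.capacity,
            this.tokens + ((now - this.lastRefill) / 1000) * this.rate
        );
        this.lastRefill = now;
        if (this.tokens >= count) {
            this.tokens -= count;
            return true;
        }
        return false;
    }
}

const sendBucket = new TokenBucket(300);  // non-batched FIFO sends
// Check sendBucket.tryRemove() before each sendMessage; back off when false
```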
<p>Cost optimization mostly involves balancing message retention, polling frequency, and batch processing. Long polling reduces API calls and associated costs:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> costOptimizedReceive = <span class="hljs-keyword">async</span> (queueUrl) =&gt; {
    <span class="hljs-keyword">const</span> params = {
        <span class="hljs-attr">QueueUrl</span>: queueUrl,
        <span class="hljs-attr">MaxNumberOfMessages</span>: <span class="hljs-number">10</span>,
        <span class="hljs-attr">WaitTimeSeconds</span>: <span class="hljs-number">20</span>,
        <span class="hljs-attr">AttributeNames</span>: [<span class="hljs-string">'SentTimestamp'</span>],
        <span class="hljs-attr">MessageAttributeNames</span>: [<span class="hljs-string">'MessageType'</span>]
    };

    <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> sqs.receiveMessage(params).promise();
};
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Building reliable messaging systems isn't just creating SQS queues and SNS topics and calling it a day. It requires understanding how the services work, how to configure them, and how to use them effectively in distributed systems. Proper error handling, monitoring, and security are just a few of the things you need to be mindful of. The patterns and practices discussed here serve as a foundation for building robust messaging systems, but it's left as an exercise to the reader to adapt them to your specific requirements and constraints.</p>
<p>Remember that reliability in distributed systems isn't about preventing all failures. It's about handling failures gracefully when they occur. Testing your messaging patterns under different failure conditions will help ensure your system remains reliable even when components fail or become overloaded.</p>
<p>As with any system, how your components communicate should evolve with your requirements. Start with simple patterns and add complexity only when required. Monitor your system's behavior, understand your traffic patterns, and adjust your implementation accordingly.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Failover in Amazon RDS Multi-AZ Architectures]]></title><description><![CDATA[Database failures are inevitable. Even with the most reliable hardware and software, something will eventually break. AWS RDS Multi-AZ deployments promise to handle these failures gracefully, automatically failing over to a standby database when prob...]]></description><link>https://blog.guilleojeda.com/failover-in-amazon-rds-multi-az-architectures</link><guid isPermaLink="true">https://blog.guilleojeda.com/failover-in-amazon-rds-multi-az-architectures</guid><category><![CDATA[AWS]]></category><category><![CDATA[Databases]]></category><category><![CDATA[architecture]]></category><category><![CDATA[Amazon Web Services]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Wed, 18 Dec 2024 23:22:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734563852647/ad80f153-23eb-4ff2-a476-0aa432f7e465.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Database failures are inevitable. Even with the most reliable hardware and software, something will eventually break. AWS RDS Multi-AZ deployments promise to handle these failures gracefully, automatically failing over to a standby database when problems occur. But like many things in distributed systems, the reality is more complex than the marketing suggests.</p>
<p>Let's dive deep into how RDS Multi-AZ really works, what happens during failover, and how to design your applications to handle it properly. Understanding these internals will help you build more reliable applications and troubleshoot issues when they occur.</p>
<h2 id="heading-understanding-amazon-rds-architecture">Understanding Amazon RDS Architecture</h2>
<p>Before we can understand Multi-AZ, we need to understand how RDS works under the hood. RDS is a complex distributed system that manages databases. When you create an RDS instance, you're actually getting several pieces working together.</p>
<p>At the core is an EC2 instance running your chosen database engine. This instance has EBS volumes attached to it for storage, and it's connected to your VPC through Elastic Network Interfaces. There's also a control plane running in AWS's infrastructure that manages everything from automated backups to failover decisions.</p>
<p>This separation between the control plane and data plane is crucial. The control plane runs in AWS's infrastructure, independently of your database instances. This means it can continue making decisions and taking actions even when your database instances are having problems. That's particularly important during failover scenarios.</p>
<p>The storage layer is equally important. Your data lives on EBS volumes, which operate independently from the EC2 instance running your database. This separation of compute and storage enables some of RDS's coolest features, including the storage-level replication that makes Multi-AZ work.</p>
<h2 id="heading-availability-zones-in-aws">Availability Zones in AWS</h2>
<p>AWS documentation often describes Availability Zones as "physically separated locations with independent power, networking, and cooling." That's true, but the key point is that they're engineered for complete failure isolation from other AZs.</p>
<p>AWS runs dedicated fiber connections between the AZs in a region, typically maintaining sub-millisecond latency with multiple redundant paths. This high-bandwidth, low-latency connectivity is what makes synchronous replication practical.</p>
<p>The network between AZs isn't part of the public internet. It's a dedicated network owned and operated by AWS, with quality of service controls that prioritize critical traffic like database replication. This matters because replication performance directly impacts how quickly your database can commit transactions in Multi-AZ deployments.</p>
<h2 id="heading-multi-az-approaches-in-amazon-rds">Multi-AZ Approaches in Amazon RDS</h2>
<p>RDS actually offers two different types of Multi-AZ deployments, and the differences matter. Traditional Multi-AZ deployments, which we'll focus on first, use a single primary instance with a standby replica. The newer Multi-AZ DB clusters use a primary instance with two readable standbys. The key difference isn't really the number of standbys, but how replication works.</p>
<p>In traditional Multi-AZ, replication happens at the storage level. When your database writes to disk, that write is synchronously replicated to the standby's EBS volumes before being acknowledged. The standby database instance runs in recovery mode, continuously applying changes it sees in the storage layer.</p>
<p>Multi-AZ DB clusters work differently, using the database engine's native replication. This means the standbys can serve read traffic, and it means replication has different performance characteristics and failure modes. The choice between these approaches depends on your specific needs for read scaling and consistency.</p>
<h2 id="heading-how-rds-multi-az-instance-replication-works">How RDS Multi-AZ Instance Replication Works</h2>
<p>When you write data to a Multi-AZ database, several things happen behind the scenes. First, your write operation arrives at the primary instance. The database engine processes it and writes to its local EBS volume. But before acknowledging the write back to your application, that write must be replicated.</p>
<p>The replication process is handled by EBS, not the database engine. EBS synchronously copies each 16KB block that changes to the standby's EBS volumes. When a write occurs, EBS maintains a replication queue for changed blocks. Each block is checksummed and tracked to ensure consistency between volumes. If the queue starts growing too large, RDS will throttle writes to prevent the standby from falling too far behind.</p>
<p>Behind the scenes, EBS also performs continuous consistency checking between volumes. If it detects inconsistent blocks, it will automatically repair them in the background. This process ensures that the standby's storage is truly a consistent copy of the primary, which is crucial for clean failovers.</p>
<p>Only after both the primary and standby volumes have persisted the changes will the write be acknowledged. This ensures zero data loss if a failover occurs, but it also adds latency to every write operation.</p>
<p>The standby instance runs in recovery mode, continuously monitoring its storage for changes and applying them to its internal state. This means it's ready to take over quickly if needed, but it can't serve queries or accept connections while it's in recovery mode.</p>
<p>The replication process adds latency to every write operation. In typical scenarios, you'll see an additional 0.5-1ms for same-AZ writes and 1-2ms for cross-AZ writes. Large writes can take longer, sometimes adding 2-5ms of latency. This might seem small, but it can add up in write-heavy workloads.</p>
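<p>To get a feel for what that overhead means for a strictly serial writer, where each commit waits for the previous one, a quick back-of-the-envelope calculation helps. The numbers below are illustrative, taken from the ranges above, not benchmarks:</p>
<pre><code class="lang-python">def max_serial_commit_rate(base_write_ms, replication_overhead_ms):
    """Upper bound on commits per second for a single serial writer."""
    total_ms = base_write_ms + replication_overhead_ms
    return 1000.0 / total_ms

# Illustrative: a 2 ms local write plus 1.5 ms of cross-AZ
# replication overhead caps a serial writer at under 300 commits/sec.
rate = max_serial_commit_rate(2.0, 1.5)
</code></pre>
<p>Concurrent connections commit in parallel, so real aggregate throughput is usually much higher; the point is that replication latency bounds each individual transaction, not the system as a whole.</p>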
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h2 id="heading-anatomy-of-an-rds-failover">Anatomy of an RDS Failover</h2>
<p>A failover in RDS isn't a single operation, but a complex sequence of events that happens in several phases. When RDS detects a problem with the primary instance, it doesn't immediately fail over. Instead, it goes through a careful validation process to ensure the failover will succeed.</p>
<p>The detection phase involves multiple health checks. RDS monitors EC2 status checks, EBS volume health, network connectivity, and replication status. It uses a complex decision matrix to determine whether a failure has actually occurred and whether failover is the appropriate response. This process typically takes up to 10 seconds.</p>
<p>Once RDS decides to fail over, it enters the validation phase. It verifies that the standby is healthy, that replication is current, and that all network paths are working. This includes checking storage consistency and ensuring the standby database can actually take over. This typically takes another 5-15 seconds.</p>
<p>The actual failover begins with DNS changes. RDS updates the endpoint's CNAME record to point to the standby instance and adjusts the TTL to 5 seconds to speed up propagation. This process, including propagation time, typically takes 30-60 seconds.</p>
<p>Meanwhile, the promotion phase begins. The standby database stops recovery mode, replays any remaining transactions from its storage, and starts accepting connections. This process typically takes 15-30 seconds, running in parallel with DNS propagation.</p>
<p>Finally, RDS begins provisioning a new standby in the background. This doesn't affect database availability, but it's critical for maintaining high availability for future failures.</p>
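<p>You don't have to wait for a real failure to observe this sequence. Rebooting a Multi-AZ instance with failover forces RDS to promote the standby, which makes for a useful game-day drill. A minimal boto3 sketch, with a placeholder instance identifier:</p>
<pre><code class="lang-python">def failover_reboot_params(instance_id):
    """Arguments for rds.reboot_db_instance() that force a failover.

    ForceFailover=True tells RDS to promote the standby instead of
    rebooting the primary in place; it's only valid for Multi-AZ
    instances.
    """
    return {
        'DBInstanceIdentifier': instance_id,
        'ForceFailover': True,
    }

if __name__ == '__main__':
    import boto3  # imported here so the helper stays dependency-free
    rds = boto3.client('rds')
    # 'my-multi-az-db' is a placeholder; use your own identifier.
    rds.reboot_db_instance(**failover_reboot_params('my-multi-az-db'))
</code></pre>
<p>Run this while your application is under load and measure how long connections actually take to recover. That number, not the theoretical one, is your real failover budget.</p>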
<h2 id="heading-building-applications-that-handle-rds-failover">Building Applications That Handle RDS Failover</h2>
<p>Application design for Multi-AZ isn't just about handling database connection failures. You need to think about transaction retry logic, connection pooling, and how your application behaves during the transition period. Here's a Python example that illustrates some key concepts:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pymysql
<span class="hljs-keyword">import</span> time
<span class="hljs-keyword">from</span> contextlib <span class="hljs-keyword">import</span> contextmanager

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">RDSConnectionManager</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, host, user, password, database</span>):</span>
        self.db_config = {
            <span class="hljs-string">'host'</span>: host,
            <span class="hljs-string">'user'</span>: user,
            <span class="hljs-string">'password'</span>: password,
            <span class="hljs-string">'database'</span>: database
        }

<span class="hljs-meta">    @contextmanager</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_connection</span>(<span class="hljs-params">self</span>):</span>
        conn = <span class="hljs-literal">None</span>
        <span class="hljs-keyword">try</span>:
            conn = self._create_connection()
            <span class="hljs-keyword">yield</span> conn
        <span class="hljs-keyword">except</span> pymysql.Error <span class="hljs-keyword">as</span> e:
            <span class="hljs-keyword">if</span> self._should_retry(e):
                time.sleep(<span class="hljs-number">2</span>)  <span class="hljs-comment"># Basic backoff</span>
                conn = self._create_connection()
                <span class="hljs-keyword">yield</span> conn
            <span class="hljs-keyword">else</span>:
                <span class="hljs-keyword">raise</span>
        <span class="hljs-keyword">finally</span>:
            <span class="hljs-keyword">if</span> conn:
                conn.close()

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_create_connection</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">return</span> pymysql.connect(**self.db_config)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_should_retry</span>(<span class="hljs-params">self, error</span>):</span>
        <span class="hljs-comment"># Add logic to determine if error is retryable</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>This code demonstrates connection handling, but real applications need more sophisticated retry logic and connection pooling. Your application should handle various types of errors. Network timeouts might occur during the DNS switch. Transactions might be rolled back during the promotion phase. Connections might fail with various errors depending on exactly when and how they fail. Each of these scenarios needs appropriate handling.</p>
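<p>As a sketch of what "more sophisticated" might look like, here's retry logic with exponential backoff and jitter, capped at a delay in the same ballpark as the failover window. The specific numbers and exception types are illustrative:</p>
<pre><code class="lang-python">import random
import time

def run_with_retries(operation, max_attempts=5, base_delay=0.5,
                     max_delay=30.0,
                     retryable=(ConnectionError, TimeoutError)):
    """Run operation(), retrying transient failures with backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))

# Simulated query that fails twice, as it might mid-failover.
calls = {'count': 0}
def flaky_query():
    calls['count'] += 1
    if calls['count'] in (1, 2):
        raise ConnectionError('server has gone away')
    return 'row'

result = run_with_retries(flaky_query, base_delay=0.05)
</code></pre>
<p>The jitter matters: when a failover completes, every client that was waiting reconnects at once, and synchronized retries can overwhelm the freshly promoted primary.</p>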
<h2 id="heading-monitoring-and-troubleshooting">Monitoring and Troubleshooting</h2>
<p>Effective monitoring of Multi-AZ deployments requires watching several CloudWatch metrics. <code>ReplicaLag</code> tells you how far behind the standby is. <code>WriteIOPS</code> and <code>WriteLatency</code> help you understand replication performance. <code>ReadIOPS</code> and <code>ReadLatency</code> on the primary help you understand the workload.</p>
<p>But raw metrics aren't enough. You need to understand how these metrics relate to each other and what patterns indicate problems. High <code>WriteLatency</code> combined with increasing <code>ReplicaLag</code> might indicate replication problems. High <code>CPUUtilization</code> might explain increased <code>ReplicaLag</code>. The relationships between metrics often tell you more than individual metrics alone.</p>
<p>CloudWatch alarms should monitor for both immediate problems and trending issues. A spike in <code>ReplicaLag</code> needs immediate attention, but gradually increasing <code>WriteLatency</code> might indicate growing problems that need addressing before they cause failures.</p>
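<p>As a concrete starting point, here's what a CloudWatch alarm on <code>ReplicaLag</code> could look like with boto3. The threshold, evaluation window, and SNS topic ARN are placeholders to adapt to your workload:</p>
<pre><code class="lang-python">def replica_lag_alarm_params(instance_id, topic_arn, threshold_seconds=30):
    """Arguments for cloudwatch.put_metric_alarm() watching ReplicaLag."""
    return {
        'AlarmName': f'{instance_id}-replica-lag',
        'Namespace': 'AWS/RDS',
        'MetricName': 'ReplicaLag',
        'Dimensions': [{'Name': 'DBInstanceIdentifier', 'Value': instance_id}],
        'Statistic': 'Average',
        'Period': 60,            # one datapoint per minute
        'EvaluationPeriods': 3,  # must stay high for 3 consecutive minutes
        'Threshold': float(threshold_seconds),
        'ComparisonOperator': 'GreaterThanThreshold',
        'AlarmActions': [topic_arn],
    }

if __name__ == '__main__':
    import boto3  # imported here so the helper stays dependency-free
    cloudwatch = boto3.client('cloudwatch')
    # The topic ARN below is a placeholder.
    cloudwatch.put_metric_alarm(**replica_lag_alarm_params(
        'my-multi-az-db', 'arn:aws:sns:us-east-1:111122223333:db-alerts'))
</code></pre>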
<h2 id="heading-advanced-configurations-and-edge-cases">Advanced Configurations and Edge Cases</h2>
<p>Multi-AZ works with various database engines, but the details vary. MySQL and PostgreSQL handle recovery mode differently, which affects failover timing. Oracle has its own nuances around transaction replay. Understanding these engine-specific details helps you design better applications.</p>
<p>Parameter groups also affect Multi-AZ behavior. Settings that control durability and consistency can impact replication performance. Memory settings affect how quickly the standby can catch up after falling behind. Network timeout settings influence how quickly failures are detected.</p>
<p>Edge cases are particularly important to understand. What happens if both AZs have connectivity issues? How does RDS handle simultaneous instance and storage failures? What if DNS propagation is delayed? These scenarios are rare but understanding them helps you build more resilient systems. Note that this doesn't mean you need to ensure your application can handle these scenarios. Not doing anything is a valid response, but only if you understand the risk first.</p>
<p>Through this deep dive into RDS Multi-AZ, we've seen that while AWS handles much of the complexity, understanding the underlying mechanics helps you build better applications. From the basic architecture to complex failure scenarios, each aspect of Multi-AZ deployments has implications for your application's reliability and performance. So, now that you understand how that works in RDS, go build!</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Relational Databases on AWS: Comparing RDS and Aurora]]></title><description><![CDATA[There are two managed relational database services in AWS: Amazon Relational Database Service (RDS) and Amazon Aurora. Both provide the benefits of a fully managed database solution, but they have distinct features and use cases.
In this article, we'...]]></description><link>https://blog.guilleojeda.com/relational-databases-on-aws-comparing-rds-and-aurora</link><guid isPermaLink="true">https://blog.guilleojeda.com/relational-databases-on-aws-comparing-rds-and-aurora</guid><category><![CDATA[AWS]]></category><category><![CDATA[Databases]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Tue, 23 Apr 2024 23:40:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1713915584392/e9aaedcc-757f-4edb-ae8b-400bc59d487e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There are two managed relational database services in AWS: Amazon Relational Database Service (RDS) and Amazon Aurora. Both provide the benefits of a fully managed database solution, but they have distinct features and use cases.</p>
<p>In this article, we'll explore the key features and capabilities of RDS and Aurora, compare their differences, and provide guidance on choosing the right service for your application.</p>
<h2 id="heading-understanding-amazon-rds-relational-database-service">Understanding Amazon RDS (Relational Database Service)</h2>
<p>Let's start by taking a closer look at <strong>Amazon RDS</strong>, AWS's fully managed relational database service. RDS makes it easy to set up, operate, and scale a relational database in the cloud, supporting a wide range of database engines.</p>
<p>RDS Key Features:</p>
<ul>
<li><p>Fully managed database service</p>
</li>
<li><p>Supports multiple database engines: MySQL, PostgreSQL, Oracle, SQL Server, MariaDB</p>
</li>
<li><p>Automatic backups and point-in-time recovery</p>
</li>
<li><p>Multi-AZ deployments for high availability</p>
</li>
<li><p>Read replicas for read scalability</p>
</li>
<li><p>Vertical and horizontal scaling options</p>
</li>
</ul>
<p>RDS Instance Types and Storage:</p>
<p>RDS offers a variety of instance types optimized for different workloads and performance requirements. Instance types range from small burstable instances to large memory-optimized instances.</p>
<p>For storage, RDS provides three options:</p>
<ul>
<li><p>General Purpose (SSD): Balanced performance for a wide range of workloads</p>
</li>
<li><p>Provisioned IOPS (SSD): High-performance storage for I/O-intensive workloads</p>
</li>
<li><p>Magnetic: Cost-effective storage for infrequently accessed data</p>
</li>
</ul>
<p>Pricing:</p>
<p>With RDS, you pay for the database instance hours, storage, I/O requests, and data transfer. RDS pricing varies based on the database engine, instance type, storage type, and region.</p>
<h3 id="heading-rds-backup-and-restore">RDS Backup and Restore</h3>
<p>One of the key benefits of using RDS is the automated backup and restore capabilities. RDS provides two types of backups:</p>
<ul>
<li><p>Automated Backups: RDS automatically takes a daily snapshot of your database and continuously captures its transaction logs, allowing you to restore to any point in time within the retention period (up to 35 days).</p>
</li>
<li><p>Manual Snapshots: You can manually create database snapshots at any time, which are stored until you explicitly delete them.</p>
</li>
</ul>
<p>To restore a database from a backup, you simply create a new RDS instance and specify the backup to use. RDS handles the rest, creating a new instance with the restored data.</p>
<p>Point-in-time recovery (PITR) is another powerful feature of RDS. With PITR, you can restore your database to any point in time within the backup retention period, down to the second. This is particularly useful for recovering from accidental data modifications or deletions.</p>
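<p>With boto3, a point-in-time restore looks roughly like this. It always creates a brand-new instance and never modifies the source; the identifiers and timestamp are placeholders:</p>
<pre><code class="lang-python">from datetime import datetime, timezone

def pitr_restore_params(source_id, target_id, restore_time):
    """Arguments for rds.restore_db_instance_to_point_in_time()."""
    return {
        'SourceDBInstanceIdentifier': source_id,
        'TargetDBInstanceIdentifier': target_id,
        # Must fall within the backup retention window.
        'RestoreTime': restore_time,
    }

if __name__ == '__main__':
    import boto3  # imported here so the helper stays dependency-free
    rds = boto3.client('rds')
    rds.restore_db_instance_to_point_in_time(**pitr_restore_params(
        'prod-db', 'prod-db-restored',
        datetime(2024, 4, 20, 12, 0, 0, tzinfo=timezone.utc)))
</code></pre>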
<h3 id="heading-rds-high-availability-and-failover">RDS High Availability and Failover</h3>
<p>High availability is crucial for many applications, and RDS provides several options to ensure your database remains available in the event of a failure.</p>
<p>Multi-AZ Deployments:</p>
<p>With a Multi-AZ deployment, RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone (AZ). If the primary instance fails, RDS automatically fails over to the standby, minimizing downtime.</p>
<p>Multi-AZ deployments provide enhanced durability and fault tolerance, with failover typically completing within a minute or two. This is ideal for production workloads that require high availability.</p>
<p>Read Replicas:</p>
<p>Read replicas are separate database instances that are asynchronously replicated from the primary instance. They are used to offload read traffic from the primary instance and improve read scalability.</p>
<p>You can create up to 15 read replicas per primary instance (the limit is lower for some engines), within the same region or across different regions. Read replicas can be promoted to standalone instances if needed, providing a way to create independent databases for specific use cases.</p>
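<p>Creating a read replica is a single API call. A boto3 sketch with placeholder identifiers (a cross-region replica additionally takes the source region):</p>
<pre><code class="lang-python">def read_replica_params(source_id, replica_id):
    """Arguments for rds.create_db_instance_read_replica()."""
    return {
        'DBInstanceIdentifier': replica_id,
        'SourceDBInstanceIdentifier': source_id,
    }

if __name__ == '__main__':
    import boto3  # imported here so the helper stays dependency-free
    rds = boto3.client('rds')
    rds.create_db_instance_read_replica(**read_replica_params(
        'prod-db', 'prod-db-replica-1'))
</code></pre>
<p>Because this replication is asynchronous, reads from a replica can be slightly stale; route only lag-tolerant queries to it.</p>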
<h2 id="heading-understanding-amazon-aurora">Understanding Amazon Aurora</h2>
<p>Amazon Aurora is a fully managed relational database service that is compatible with MySQL and PostgreSQL. It offers the simplicity and cost-effectiveness of open-source databases with the performance and availability of commercial databases.</p>
<p>Aurora Key Features:</p>
<ul>
<li><p>MySQL and PostgreSQL compatible</p>
</li>
<li><p>High-performance storage and caching</p>
</li>
<li><p>Auto-scaling of read replicas</p>
</li>
<li><p>Serverless option for automatic scaling</p>
</li>
<li><p>Global database for multi-region deployments</p>
</li>
<li><p>Continuous backups and point-in-time restore</p>
</li>
</ul>
<p>Aurora Storage and Replication:</p>
<p>Aurora uses a distributed, fault-tolerant, and self-healing storage system that automatically scales up to 128 TiB per database cluster. It replicates six copies of your data across three Availability Zones, providing high durability and availability.</p>
<p>Aurora's storage is designed for fast, consistent performance. It uses a multi-tier caching architecture that includes an in-memory cache, a buffer pool, and a storage cache, reducing the need for disk I/O and improving performance.</p>
<p>Pricing:</p>
<p>With Aurora, you pay for the database instance hours, storage, I/O requests, and data transfer. Aurora pricing varies based on the database engine (MySQL or PostgreSQL), instance type, and region.</p>
<h3 id="heading-aurora-performance-and-scalability">Aurora Performance and Scalability</h3>
<p>One of the key advantages of Aurora is its high-performance storage and caching architecture. Aurora can deliver up to 5X the throughput of standard MySQL and 3X the throughput of standard PostgreSQL, without requiring any changes to your application code.</p>
<p>Auto-scaling Read Replicas:</p>
<p>With an auto scaling policy attached, Aurora adjusts the number of read replicas based on the workload, ensuring your database can handle read-heavy traffic patterns. As read traffic increases, Aurora adds new read replicas to the cluster, distributing the load across multiple instances, and removes them when traffic drops off.</p>
<p>Aurora Serverless:</p>
<p>For applications with unpredictable or intermittent workloads, Aurora Serverless provides a fully managed, auto-scaling configuration for Aurora MySQL and PostgreSQL. With Aurora Serverless, your database automatically starts up, shuts down, and scales capacity based on your application's needs.</p>
<p>This is particularly useful for development and testing environments, or applications with variable traffic patterns, as it eliminates the need to manage database capacity manually.</p>
<h3 id="heading-aurora-backup-and-restore">Aurora Backup and Restore</h3>
<p>Like RDS, Aurora provides automated continuous backups and point-in-time restore capabilities. However, Aurora takes it a step further with some additional features.</p>
<p>Continuous Backups:</p>
<p>Aurora automatically takes incremental backups of your database, continuously and transparently, with no impact on performance. These backups are stored in Amazon S3, providing 11 9's of durability.</p>
<p>Backup Retention:</p>
<p>Aurora backups are retained for a default period of 1 day, but you can configure this up to 35 days. Backups are automatically deleted when the retention period expires, or when the DB cluster is deleted.</p>
<p>Point-in-time Restore:</p>
<p>With Aurora, you can restore your database to any point in time within the backup retention period, down to the second. This is similar to RDS PITR, but with the added benefit of Aurora's distributed storage architecture, which enables faster restores.</p>
<p>Database Cloning:</p>
<p>Aurora allows you to create a new database cluster from an existing one, effectively "cloning" the database. This is useful for creating test or development environments, or for performing analytics on a copy of your production data without impacting the live database.</p>
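<p>Cloning is exposed through the point-in-time restore API with a copy-on-write restore type, so the clone shares unchanged storage pages with the source and is created quickly regardless of database size. A boto3 sketch with placeholder identifiers:</p>
<pre><code class="lang-python">def clone_cluster_params(source_cluster_id, clone_cluster_id):
    """Arguments for rds.restore_db_cluster_to_point_in_time() as a clone."""
    return {
        'SourceDBClusterIdentifier': source_cluster_id,
        'DBClusterIdentifier': clone_cluster_id,
        'RestoreType': 'copy-on-write',  # clone instead of a full restore
        'UseLatestRestorableTime': True,
    }

if __name__ == '__main__':
    import boto3  # imported here so the helper stays dependency-free
    rds = boto3.client('rds')
    rds.restore_db_cluster_to_point_in_time(**clone_cluster_params(
        'prod-cluster', 'prod-cluster-clone'))
</code></pre>
<p>Note that this creates the cluster only; you still need to add at least one DB instance to the clone before you can connect to it.</p>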
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h2 id="heading-rds-vs-aurora-key-differences-and-use-cases">RDS vs Aurora: Key Differences and Use Cases</h2>
<p>Now that we've explored the key features and capabilities of RDS and Aurora, let's compare them side by side.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>RDS</td><td>Aurora</td></tr>
</thead>
<tbody>
<tr>
<td>Database Engines</td><td>MySQL, PostgreSQL, Oracle, SQL Server, MariaDB</td><td>MySQL, PostgreSQL</td></tr>
<tr>
<td>Performance</td><td>Good performance for general-purpose workloads</td><td>High-performance storage and caching, optimized for read-heavy workloads</td></tr>
<tr>
<td>Scalability</td><td>Vertical and horizontal scaling, read replicas</td><td>Auto-scaling read replicas, Aurora Serverless for automatic scaling</td></tr>
<tr>
<td>Availability</td><td>Multi-AZ deployments for high availability</td><td>Multi-AZ storage, Global Database for multi-region deployments</td></tr>
<tr>
<td>Backup and Restore</td><td>Automated backups, manual snapshots, point-in-time recovery</td><td>Continuous incremental backups, point-in-time restore, database cloning</td></tr>
<tr>
<td>Compatibility</td><td>Wide range of database engines, easy migration</td><td>MySQL and PostgreSQL compatible, requires migration effort</td></tr>
<tr>
<td>Cost</td><td>Cost-effective for general-purpose workloads</td><td>Higher cost, but better performance and scalability for demanding workloads</td></tr>
</tbody>
</table>
</div><p>Use Cases for RDS:</p>
<ul>
<li><p>Applications with moderate performance and scalability requirements</p>
</li>
<li><p>Workloads that require a specific database engine (e.g., Oracle, SQL Server)</p>
</li>
<li><p>Migrating existing on-premises databases to the cloud</p>
</li>
<li><p>Development and testing environments</p>
</li>
</ul>
<p>Use Cases for Aurora:</p>
<ul>
<li><p>Applications with high-performance and high-scalability requirements</p>
</li>
<li><p>Read-heavy workloads that can benefit from Aurora's caching and auto-scaling capabilities</p>
</li>
<li><p>Applications with unpredictable or variable traffic patterns (using Aurora Serverless)</p>
</li>
<li><p>Global applications that require multi-region database deployments</p>
</li>
</ul>
<h2 id="heading-choosing-the-right-relational-database-service">Choosing the Right Relational Database Service</h2>
<p>Choosing between RDS and Aurora depends on your specific application requirements and workload characteristics. Here are some key factors to consider:</p>
<p>Performance and Scalability:</p>
<p>If your application demands high performance and scalability, particularly for read-heavy workloads, Aurora is the better choice. Its high-performance storage and caching architecture, along with auto-scaling read replicas, make it well-suited for demanding applications.</p>
<p>Database Engine Compatibility:</p>
<p>If your application requires a specific database engine, such as Oracle or SQL Server, RDS is the way to go. RDS supports a wide range of database engines, making it easier to migrate existing applications to the cloud.</p>
<p>Cost Considerations:</p>
<p>For general-purpose workloads with moderate performance requirements, RDS is more cost-effective than Aurora. However, if your application requires the high performance and scalability of Aurora, the additional cost may be justified.</p>
<p>Existing Skills and Expertise:</p>
<p>If your team is already familiar with MySQL or PostgreSQL, both RDS and Aurora are good choices. However, if you have expertise with a specific database engine supported by RDS, such as Oracle or SQL Server, that may be a deciding factor.</p>
<h3 id="heading-when-to-use-rds">When to Use RDS</h3>
<ul>
<li><p>Migrating an existing on-premises database to the cloud</p>
</li>
<li><p>Applications with moderate performance and scalability requirements</p>
</li>
<li><p>Workloads that require a specific database engine not supported by Aurora</p>
</li>
<li><p>Development and testing environments</p>
</li>
</ul>
<p>Example: A web application with a backend database that requires SQL Server compatibility and has moderate traffic and performance requirements.</p>
<h3 id="heading-when-to-use-aurora">When to Use Aurora</h3>
<ul>
<li><p>Building a new, high-performance application from scratch</p>
</li>
<li><p>Applications with demanding read-heavy workloads</p>
</li>
<li><p>Serverless applications with unpredictable traffic patterns</p>
</li>
<li><p>Global applications that require multi-region database deployments</p>
</li>
</ul>
<p>Example: A large-scale e-commerce platform with millions of daily users, requiring high throughput and low latency for product catalog searches and user profile management.</p>
<h2 id="heading-best-practices-for-running-relational-databases-on-aws">Best Practices for Running Relational Databases on AWS</h2>
<p>Regardless of whether you choose RDS or Aurora, here are some best practices to keep in mind when running relational databases on AWS:</p>
<h3 id="heading-performance-optimization">Performance Optimization</h3>
<ul>
<li><p>Choose the appropriate instance type and size based on your workload requirements</p>
</li>
<li><p>Monitor CPU, memory, and I/O utilization to identify bottlenecks and optimize performance</p>
</li>
<li><p>Use caching solutions like ElastiCache to offload read traffic and improve performance</p>
</li>
</ul>
<h3 id="heading-security-best-practices">Security Best Practices</h3>
<ul>
<li><p>Use IAM roles and policies to control access to your database instances</p>
</li>
<li><p>Enable encryption at rest and in transit to protect sensitive data</p>
</li>
<li><p>Regularly apply security patches and updates to your database engine</p>
</li>
<li><p>Use VPC security groups to control network access to your database instances</p>
</li>
</ul>
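<p>As a sketch of what the encryption and patching items look like in practice, here's a minimal CloudFormation fragment (the resource names, instance class, and secret name are placeholders):</p>
<pre><code class="lang-yaml">Resources:
  MyDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.medium
      AllocatedStorage: 100
      MasterUsername: admin
      # Pull the password from Secrets Manager instead of hardcoding it
      MasterUserPassword: '{{resolve:secretsmanager:my-db-secret:SecretString:password}}'
      StorageEncrypted: true          # encryption at rest (KMS)
      AutoMinorVersionUpgrade: true   # automatic minor-version security patches
      VPCSecurityGroups:
        - !Ref MyDatabaseSecurityGroup  # security group controlling network access
</code></pre>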
<h3 id="heading-monitoring-and-logging">Monitoring and Logging</h3>
<ul>
<li><p>Enable and configure Amazon CloudWatch for monitoring database metrics and setting alarms</p>
</li>
<li><p>Use AWS CloudTrail to log and audit API activity related to your database instances</p>
</li>
<li><p>Enable database engine-specific logging, such as MySQL slow query logs or PostgreSQL query planner statistics</p>
</li>
</ul>
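<p>For example, the MySQL slow query log from the last bullet can be enabled through a parameter group and exported to CloudWatch Logs. A minimal sketch (the family and thresholds are illustrative):</p>
<pre><code class="lang-yaml">Resources:
  SlowQueryParams:
    Type: AWS::RDS::DBParameterGroup
    Properties:
      Family: mysql8.0
      Description: Enable the MySQL slow query log
      Parameters:
        slow_query_log: 1
        long_query_time: 2   # log statements slower than 2 seconds
  # On the DB instance, attach the group and stream the log to CloudWatch Logs:
  #   DBParameterGroupName: !Ref SlowQueryParams
  #   EnableCloudwatchLogsExports:
  #     - slowquery
</code></pre>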
<h3 id="heading-scaling-and-high-availability">Scaling and High Availability</h3>
<ul>
<li><p>Use read replicas to scale read traffic and improve performance</p>
</li>
<li><p>Enable Multi-AZ deployments for high availability and automatic failover</p>
</li>
<li><p>Monitor replication lag and ensure it stays within acceptable limits</p>
</li>
<li><p>Test failover scenarios regularly to ensure your application can handle database failures gracefully</p>
</li>
</ul>
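<p>The Multi-AZ and read replica recommendations translate to just a couple of properties on the database resources. A minimal sketch, with placeholder names:</p>
<pre><code class="lang-yaml">Resources:
  PrimaryDB:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: postgres
      DBInstanceClass: db.r5.large
      AllocatedStorage: 100
      MasterUsername: dbadmin
      MasterUserPassword: '{{resolve:secretsmanager:my-db-secret:SecretString:password}}'
      MultiAZ: true   # synchronous standby in another AZ, automatic failover
  ReadReplica:
    Type: AWS::RDS::DBInstance
    Properties:
      SourceDBInstanceIdentifier: !Ref PrimaryDB   # makes this a read replica
      DBInstanceClass: db.r5.large
</code></pre>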
<h2 id="heading-conclusion">Conclusion</h2>
<p>AWS provides two powerful managed services for running relational databases in the cloud: Amazon RDS and Amazon Aurora. RDS is a fully managed service that supports a wide range of database engines, making it a good choice for general-purpose workloads and migrating existing applications to the cloud. Aurora, on the other hand, is a high-performance, MySQL- and PostgreSQL-compatible database service that is well-suited for demanding, read-heavy workloads.</p>
<p>When choosing between RDS and Aurora, it's important to consider your application's specific requirements, including performance, scalability, compatibility, and cost. However, in cases where MySQL or PostgreSQL are suitable, Aurora is generally my preferred choice due to its advanced architecture and auto-scaling capabilities.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Containers on AWS: Comparing ECS and EKS]]></title><description><![CDATA[Containers offer a lightweight, portable, and scalable solution for running software consistently across different environments. But as the number of containers grows, managing them becomes increasingly complex. That's where container orchestration c...]]></description><link>https://blog.guilleojeda.com/containers-on-aws-comparing-ecs-and-eks</link><guid isPermaLink="true">https://blog.guilleojeda.com/containers-on-aws-comparing-ecs-and-eks</guid><category><![CDATA[AWS]]></category><category><![CDATA[containers]]></category><category><![CDATA[ECS]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Mon, 22 Apr 2024 23:24:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1713828219083/61d80502-8355-4202-ae0c-5555caa75e4d.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Containers offer a lightweight, portable, and scalable solution for running software consistently across different environments. But as the number of containers grows, managing them becomes increasingly complex. That's where container orchestration comes in.</p>
<p>AWS offers two powerful container orchestration services: <strong>Amazon Elastic Container Service (ECS)</strong> and <strong>Amazon Elastic Kubernetes Service (EKS)</strong>. Both services help you run and scale containerized applications, but they differ in their approach, features, and use cases.</p>
<p>In this article, I'll dive deep into the world of containers on AWS. I'll explore the key features and components of ECS and EKS, compare their similarities and differences, and provide guidance on choosing the right service for your needs. By the end, you'll have a solid understanding of how to leverage these services to build and manage containerized applications on AWS effectively.</p>
<h2 id="heading-understanding-amazon-ecs-elastic-container-service">Understanding Amazon ECS (Elastic Container Service)</h2>
<p>Let's start by looking at Amazon ECS, AWS's fully managed container orchestration service. ECS allows you to run and manage Docker containers at scale without worrying about the underlying infrastructure.</p>
<p>ECS Key Features:</p>
<ul>
<li><p>Fully managed container orchestration</p>
</li>
<li><p>Integration with other AWS services</p>
</li>
<li><p>Support for both EC2 and Fargate launch types</p>
</li>
<li><p>Built-in service discovery and load balancing</p>
</li>
<li><p>IAM integration for security and access control</p>
</li>
</ul>
<p>ECS Components:</p>
<ul>
<li><p>Clusters: Logical grouping of container instances or Fargate capacity</p>
</li>
<li><p>Task Definitions: Blueprints that describe how to run a container</p>
</li>
<li><p>Services: Maintain a specified number of task replicas and handle scaling</p>
</li>
<li><p>Tasks: Instantiations of a Task Definition, each representing one or more running containers</p>
</li>
</ul>
<p>ECS Launch Types:</p>
<p>ECS supports two launch types for running containers: EC2 and Fargate.</p>
<ul>
<li><p>EC2: You manage the EC2 instances that make up the ECS cluster. This gives you full control over the infrastructure but requires more management overhead.</p>
</li>
<li><p>Fargate: AWS manages the underlying infrastructure, and you only pay for the resources your containers consume. Fargate abstracts away the EC2 instances, making it easier to focus on your applications.</p>
</li>
</ul>
<p>Pricing:</p>
<p>With ECS, you pay for the AWS resources you use, such as EC2 instances, EBS volumes, and data transfer. Fargate pricing is based on the vCPU and memory resources consumed by your containers.</p>
<h3 id="heading-ecs-architecture-and-components">ECS Architecture and Components</h3>
<p>Let's take a closer look at the key components of ECS and how they work together.</p>
<p><strong>ECS Clusters</strong></p>
<p>An ECS cluster is a logical grouping of container instances or Fargate capacity. It provides the infrastructure to run your containers. You can create clusters using the AWS Management Console, AWS CLI, or CloudFormation templates.</p>
<p><strong>Task Definitions</strong></p>
<p>A Task Definition is a JSON file that describes how to run a container. It specifies the container image, CPU and memory requirements, networking settings, and other configuration details. Task Definitions act as blueprints for creating and running tasks.</p>
<p><strong>Services</strong></p>
<p>An ECS Service maintains a specified number of task replicas and handles scaling. It ensures that the desired number of tasks are running and automatically replaces any failed tasks. Services integrate with ELB for load balancing and service discovery.</p>
<p><strong>Tasks</strong></p>
<p>A Task is an instantiation of a Task Definition, representing one or more running containers. When you create a task, ECS launches the containers on a suitable container instance or on Fargate capacity, based on the Task Definition and launch type.</p>
<p>Example service configuration, referencing a cluster and a Task Definition:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"cluster"</span>: <span class="hljs-string">"my-cluster"</span>,
  <span class="hljs-attr">"taskDefinition"</span>: <span class="hljs-string">"my-task-definition"</span>,
  <span class="hljs-attr">"desiredCount"</span>: <span class="hljs-number">2</span>,
  <span class="hljs-attr">"launchType"</span>: <span class="hljs-string">"FARGATE"</span>,
  <span class="hljs-attr">"networkConfiguration"</span>: {
    <span class="hljs-attr">"awsvpcConfiguration"</span>: {
      <span class="hljs-attr">"subnets"</span>: [
        <span class="hljs-string">"subnet-12345678"</span>,
        <span class="hljs-string">"subnet-87654321"</span>
      ],
      <span class="hljs-attr">"securityGroups"</span>: [
        <span class="hljs-string">"sg-12345678"</span>
      ],
      <span class="hljs-attr">"assignPublicIp"</span>: <span class="hljs-string">"ENABLED"</span>
    }
  }
}
</code></pre>
<h3 id="heading-ecs-launch-types-ec2-vs-fargate">ECS Launch Types: EC2 vs Fargate</h3>
<p>One key decision when using ECS is choosing between the EC2 and Fargate launch types.</p>
<p><strong>EC2 Launch Type</strong></p>
<p>With the EC2 launch type, you manage the EC2 instances that make up your ECS cluster. This gives you full control over the infrastructure, including instance types, scaling, and networking. However, it also means more management overhead, as you're responsible for patching, scaling, and securing the instances.</p>
<p>Use cases for EC2 launch type:</p>
<ul>
<li><p>Workloads that require specific instance types or configurations</p>
</li>
<li><p>Applications that need to access underlying host resources</p>
</li>
<li><p>Scenarios where you want full control over the infrastructure</p>
</li>
</ul>
<p><strong>Fargate Launch Type</strong></p>
<p>Fargate is a serverless compute engine for containers. It abstracts away the underlying infrastructure, allowing you to focus on your applications. With Fargate, you specify the CPU and memory requirements for your tasks, and ECS manages the rest.</p>
<p>Benefits of Fargate:</p>
<ul>
<li><p>No need to manage EC2 instances or clusters</p>
</li>
<li><p>Pay only for the resources your containers consume</p>
</li>
<li><p>Automatic scaling based on task resource requirements</p>
</li>
<li><p>Simplified infrastructure management</p>
</li>
</ul>
<p>Example of running a containerized application using Fargate:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">Resources:</span>
  <span class="hljs-attr">MyFargateService:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::ECS::Service</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">Cluster:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">MyCluster</span>
      <span class="hljs-attr">TaskDefinition:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">MyTaskDefinition</span>
      <span class="hljs-attr">DesiredCount:</span> <span class="hljs-number">2</span>
      <span class="hljs-attr">LaunchType:</span> <span class="hljs-string">FARGATE</span>
      <span class="hljs-attr">NetworkConfiguration:</span>
        <span class="hljs-attr">AwsvpcConfiguration:</span>
          <span class="hljs-attr">AssignPublicIp:</span> <span class="hljs-string">ENABLED</span>
          <span class="hljs-attr">Subnets:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">SubnetA</span>
            <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">SubnetB</span>
          <span class="hljs-attr">SecurityGroups:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">MySecurityGroup</span>
</code></pre>
<h2 id="heading-understanding-amazon-eks-elastic-kubernetes-service">Understanding Amazon EKS (Elastic Kubernetes Service)</h2>
<p>Now let's shift gears and explore Amazon EKS, a managed Kubernetes service that makes it easy to deploy, manage, and scale containerized applications using Kubernetes on AWS.</p>
<p>EKS Key Features:</p>
<ul>
<li><p>Fully managed Kubernetes control plane</p>
</li>
<li><p>Integration with AWS services and Kubernetes community tools</p>
</li>
<li><p>Automatic provisioning and scaling of worker nodes</p>
</li>
<li><p>Support for both managed and self-managed node groups</p>
</li>
<li><p>Built-in security and compliance features</p>
</li>
</ul>
<p><strong>EKS Architecture</strong></p>
<p>EKS consists of two main components:</p>
<ul>
<li><p>EKS Control Plane: The control plane is a managed Kubernetes master that runs in an AWS-managed account. It provides the Kubernetes API server, etcd, and other core components.</p>
</li>
<li><p>Worker Nodes: Worker nodes are EC2 instances that run your containers and are registered with the EKS cluster. You can create and manage worker nodes using EKS managed node groups or self-managed worker nodes.</p>
</li>
</ul>
<p>Pricing:</p>
<p>With EKS, you pay a flat hourly rate for each cluster's control plane, plus the AWS resources you use, such as EC2 instances for worker nodes, EBS volumes, and data transfer.</p>
<h3 id="heading-eks-architecture-and-components">EKS Architecture and Components</h3>
<p>Let's dive deeper into the EKS architecture and its key components.</p>
<p><strong>EKS Control Plane</strong></p>
<p>The EKS control plane is a managed Kubernetes master that runs in an AWS-managed account. It provides the following components:</p>
<ul>
<li><p>Kubernetes API Server: The primary interface for interacting with the Kubernetes cluster</p>
</li>
<li><p>etcd: The distributed key-value store used by Kubernetes to store cluster state</p>
</li>
<li><p>Scheduler: Responsible for scheduling pods onto worker nodes based on resource requirements and constraints</p>
</li>
<li><p>Controller Manager: Manages the core control loops in Kubernetes, such as replica sets and deployments</p>
</li>
</ul>
<p><strong>Worker Nodes</strong></p>
<p>Worker nodes are EC2 instances that run your containers and are registered with the EKS cluster. Each worker node runs the following components:</p>
<ul>
<li><p>Kubelet: The primary node agent that communicates with the Kubernetes API server and manages the container runtime</p>
</li>
<li><p>Container Runtime: The runtime environment for running containers, such as Docker or containerd</p>
</li>
<li><p>Kube-proxy: Maintains network rules and performs connection forwarding for Kubernetes services</p>
</li>
</ul>
<p>Example EKS Cluster Configuration:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">eksctl.io/v1alpha5</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ClusterConfig</span>

<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-eks-cluster</span>
  <span class="hljs-attr">region:</span> <span class="hljs-string">us-west-2</span>

<span class="hljs-attr">managedNodeGroups:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">my-node-group</span>
    <span class="hljs-attr">instanceType:</span> <span class="hljs-string">t3.medium</span>
    <span class="hljs-attr">minSize:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">maxSize:</span> <span class="hljs-number">3</span>
    <span class="hljs-attr">desiredCapacity:</span> <span class="hljs-number">2</span>
</code></pre>
<h3 id="heading-eks-managed-vs-self-managed-node-groups">EKS Managed vs Self-Managed Node Groups</h3>
<p>EKS provides two options for managing worker nodes: managed node groups and self-managed worker nodes.</p>
<p><strong>EKS Managed Node Groups</strong></p>
<p>EKS managed node groups automate the provisioning and lifecycle management of worker nodes. Key features include:</p>
<ul>
<li><p>Automatic provisioning and scaling of worker nodes</p>
</li>
<li><p>Integration with AWS services like VPC and IAM</p>
</li>
<li><p>Managed updates and patching for worker nodes</p>
</li>
<li><p>Simplified cluster autoscaler configuration</p>
</li>
</ul>
<p><strong>Self-Managed Worker Nodes</strong></p>
<p>With self-managed worker nodes, you have full control over the provisioning and management of worker nodes. This allows for more customization but also requires more effort to set up and maintain.</p>
<p>Example of creating an EKS managed node group:</p>
<pre><code class="lang-bash">eksctl create nodegroup --cluster my-eks-cluster --name my-node-group --node-type t3.medium --nodes 2 --nodes-min 1 --nodes-max 3
</code></pre>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h2 id="heading-ecs-vs-eks-key-differences-and-use-cases">ECS vs EKS: Key Differences and Use Cases</h2>
<p>Now that we've explored the key features and components of ECS and EKS, let's compare them side by side.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>ECS</td><td>EKS</td></tr>
</thead>
<tbody>
<tr>
<td>Orchestration</td><td>AWS-native orchestration</td><td>Kubernetes orchestration</td></tr>
<tr>
<td>Control Plane</td><td>Fully managed by AWS</td><td>Managed Kubernetes control plane</td></tr>
<tr>
<td>Infrastructure Management</td><td>Managed (Fargate) or self-managed (EC2)</td><td>Managed or self-managed worker nodes</td></tr>
<tr>
<td>Ecosystem and Tooling</td><td>AWS-native tooling and integrations</td><td>Kubernetes-native tooling and integrations</td></tr>
<tr>
<td>Learning Curve</td><td>Simpler, AWS-specific concepts</td><td>Steeper, requires Kubernetes knowledge</td></tr>
<tr>
<td>Portability</td><td>Tied to AWS ecosystem</td><td>Portable across Kubernetes-compatible platforms</td></tr>
</tbody>
</table>
</div><p>Use cases for ECS:</p>
<ul>
<li><p>Simpler containerized applications</p>
</li>
<li><p>Workloads that heavily utilize AWS services</p>
</li>
<li><p>Teams more familiar with AWS ecosystem</p>
</li>
<li><p>Serverless applications using Fargate</p>
</li>
</ul>
<p>Use cases for EKS:</p>
<ul>
<li><p>Complex, large-scale containerized applications</p>
</li>
<li><p>Workloads that require Kubernetes-specific features</p>
</li>
<li><p>Teams with Kubernetes expertise</p>
</li>
<li><p>Applications that need to be portable across cloud providers</p>
</li>
</ul>
<h2 id="heading-choosing-the-right-container-orchestration-service">Choosing the Right Container Orchestration Service</h2>
<p>Choosing between ECS and EKS depends on various factors specific to your application and organizational needs.</p>
<p>Factors to consider:</p>
<ul>
<li><p>Application complexity and scalability</p>
</li>
<li><p>Team's skills and familiarity with AWS and Kubernetes</p>
</li>
<li><p>Integration with existing tools and workflows</p>
</li>
<li><p>Long-term container strategy and portability requirements</p>
</li>
</ul>
<h3 id="heading-when-to-use-ecs">When to use ECS</h3>
<ul>
<li><p>Simpler applications with a limited number of microservices</p>
</li>
<li><p>Workloads that primarily use AWS services</p>
</li>
<li><p>Teams more comfortable with AWS tools and concepts</p>
</li>
<li><p>Serverless applications that can benefit from Fargate</p>
</li>
</ul>
<p>Example: A web application consisting of a frontend service, backend API, and database, all running on ECS with Fargate.</p>
<h3 id="heading-when-to-use-eks">When to use EKS</h3>
<ul>
<li><p>Complex applications with a large number of microservices</p>
</li>
<li><p>Workloads that require Kubernetes-specific features like Custom Resource Definitions (CRDs)</p>
</li>
<li><p>Teams with extensive Kubernetes experience</p>
</li>
<li><p>Applications that need to be portable across cloud providers</p>
</li>
</ul>
<p>Example: A large-scale machine learning platform running on EKS, leveraging Kubeflow and other Kubernetes-native tools.</p>
<h2 id="heading-best-practices-for-container-orchestration-on-aws">Best Practices for Container Orchestration on AWS</h2>
<p>Regardless of whether you choose ECS or EKS, here are some best practices to keep in mind:</p>
<ul>
<li><p>Use infrastructure as code (IaC) tools like CloudFormation or Terraform to manage your container orchestration resources</p>
</li>
<li><p>Implement a robust CI/CD pipeline to automate container builds, testing, and deployment</p>
</li>
<li><p>Leverage AWS services like ECR for container image registry and ELB for load balancing</p>
</li>
<li><p>Use IAM roles and policies to enforce least privilege access to AWS resources</p>
</li>
<li><p>Monitor your containerized applications using tools like CloudWatch, Prometheus, or Grafana</p>
</li>
<li><p>Optimize costs by right-sizing your instances, using Spot Instances when appropriate, and leveraging reserved capacity</p>
</li>
</ul>
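<p>Several of these practices come together in the task definition itself. As a hedged CloudFormation sketch (the roles, names, and ECR image URI are placeholders), a Fargate task with a least-privilege task role that ships its logs to CloudWatch could look like this:</p>
<pre><code class="lang-yaml">Resources:
  MyTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: my-app
      RequiresCompatibilities:
        - FARGATE
      NetworkMode: awsvpc
      Cpu: '256'
      Memory: '512'
      TaskRoleArn: !Ref MyTaskRole            # least-privilege role the app assumes
      ExecutionRoleArn: !Ref MyExecutionRole  # lets ECS pull from ECR and write logs
      ContainerDefinitions:
        - Name: web
          Image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: /ecs/my-app
              awslogs-region: us-east-1
              awslogs-stream-prefix: web
</code></pre>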
<h2 id="heading-conclusion">Conclusion</h2>
<p>AWS provides two powerful services for container orchestration: ECS and EKS.</p>
<p>ECS is a fully managed service that offers simplicity and deep integration with the AWS ecosystem. It's well-suited for simpler containerized applications and teams more familiar with AWS tools and concepts.</p>
<p>On the other hand, EKS is a managed Kubernetes service that provides the full power and flexibility of Kubernetes. It's ideal for complex, large-scale applications and teams with Kubernetes expertise.</p>
<p>Ultimately, the choice between ECS and EKS depends on your application requirements, team skills, and long-term container strategy. By understanding the key features, differences, and use cases of each service, you can make an informed decision and build scalable, resilient containerized applications on AWS.</p>
<p>Still, I prefer ECS =)</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Monitoring and Troubleshooting on AWS: CloudWatch, X-Ray, and Beyond]]></title><description><![CDATA[As an AWS user, I'm sure you know that monitoring and troubleshooting are essential for keeping your applications running smoothly. After all, you can't fix what you can't see. But with the sheer number of services and tools available on AWS, it can ...]]></description><link>https://blog.guilleojeda.com/aws-monitoring-troubleshooting-cloudwatch-xray</link><guid isPermaLink="true">https://blog.guilleojeda.com/aws-monitoring-troubleshooting-cloudwatch-xray</guid><category><![CDATA[AWS]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Sat, 23 Mar 2024 16:33:24 GMT</pubDate><content:encoded><![CDATA[<p>As an AWS user, I'm sure you know that monitoring and troubleshooting are essential for keeping your applications running smoothly. After all, you can't fix what you can't see. But with the sheer number of services and tools available on AWS, it can be overwhelming to know where to start.</p>
<p>That's where this article comes in. We'll dive into AWS monitoring and troubleshooting, with some key services like CloudWatch and X-Ray, along with other tools and best practices. By the end, you'll have a better understanding of how to effectively monitor and troubleshoot your AWS applications, so you can spend less time fighting fires and more time building cool stuff.</p>
<h2 id="heading-understanding-aws-cloudwatch">Understanding AWS CloudWatch</h2>
<p>At the heart of AWS monitoring is CloudWatch, a powerful service that collects monitoring and operational data in the form of logs, metrics, and events. Think of it as the central nervous system of your AWS environment, constantly keeping track of everything that's going on.</p>
<h3 id="heading-cloudwatch-metrics">CloudWatch Metrics</h3>
<p>One of the core components of CloudWatch is metrics. CloudWatch Metrics are data points that represent the performance and health of your AWS resources over time. AWS services automatically send metrics to CloudWatch, and you can also publish your own custom metrics.</p>
<p>For example, EC2 instances automatically send metrics like CPU utilization, network traffic, and disk I/O to CloudWatch. RDS databases send metrics like database connections, read/write latency, and free storage space. By monitoring these metrics, you can get a clear picture of how your resources are performing and identify potential issues before they impact your users.</p>
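<p>Custom metrics are published through the <code>PutMetricData</code> API. As an illustration (the namespace, metric name, and dimension here are hypothetical), a metric-data payload you could pass to <code>aws cloudwatch put-metric-data --namespace MyApp --metric-data file://metrics.json</code> might look like this:</p>
<pre><code class="lang-json">[
  {
    "MetricName": "OrdersProcessed",
    "Dimensions": [
      { "Name": "Service", "Value": "checkout" }
    ],
    "Unit": "Count",
    "Value": 17
  }
]
</code></pre>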
<h3 id="heading-cloudwatch-logs">CloudWatch Logs</h3>
<p>Another key feature of CloudWatch is logs. CloudWatch Logs allows you to collect, monitor, and store log files from various sources, including EC2 instances, Lambda functions, and on-premises servers. You can use CloudWatch Logs to troubleshoot issues, analyze application behavior, and gain insights into user activity.</p>
<p>One of the most powerful features of CloudWatch Logs is the ability to filter and search log data. You can use simple text searches or complex query syntax to find specific log events, making it easy to identify errors, exceptions, or other issues. With CloudWatch Logs Insights, you can even perform real-time log analytics, allowing you to quickly investigate and resolve problems.</p>
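<p>For instance, a Logs Insights query that surfaces the 20 most recent error events in a log group looks like this (the <code>/ERROR/</code> pattern is a placeholder for whatever your application actually logs):</p>
<pre><code>fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
</code></pre>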
<h3 id="heading-cloudwatch-alarms">CloudWatch Alarms</h3>
<p>Of course, collecting metrics and logs is only half the battle. You also need a way to proactively detect and respond to issues. That's where CloudWatch Alarms come in.</p>
<p>CloudWatch Alarms allow you to set thresholds for your metrics and receive notifications when those thresholds are breached. For example, you could create an alarm that triggers when the CPU utilization of an EC2 instance exceeds 80% for more than 5 minutes. When the alarm is triggered, you can have CloudWatch send an email, SMS message, or push notification to your team, or even perform automated actions like scaling up your instances or triggering a Lambda function.</p>
<p>When setting up alarms, it's important to strike a balance between being proactive and being spammed with notifications. A good rule of thumb is to focus on metrics that directly impact the user experience or the stability of your application. You should also carefully consider the thresholds and time periods for your alarms to avoid false positives.</p>
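<p>The CPU example above can be expressed as a CloudFormation resource. This sketch assumes an SNS topic (<code>MyAlertsTopic</code>) defined elsewhere in the template, and uses a placeholder instance ID:</p>
<pre><code class="lang-yaml">Resources:
  HighCpuAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: EC2 CPU above 80% for more than 5 minutes
      Namespace: AWS/EC2
      MetricName: CPUUtilization
      Dimensions:
        - Name: InstanceId
          Value: i-0123456789abcdef0   # placeholder
      Statistic: Average
      Period: 300             # one 5-minute datapoint (EC2 basic monitoring granularity)
      EvaluationPeriods: 1
      Threshold: 80
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref MyAlertsTopic  # notify the team via SNS
</code></pre>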
<h3 id="heading-cloudwatch-dashboards">CloudWatch Dashboards</h3>
<p>Finally, CloudWatch Dashboards provide a way to visualize your metrics and logs in a single, customizable view. Dashboards allow you to create graphs, tables, and other widgets based on your CloudWatch data, giving you a real-time overview of your application's health and performance.</p>
<p>When creating dashboards, it's important to focus on the metrics and logs that are most relevant to your team and your users. You should also use clear and concise labels and annotations to help your team quickly understand the data being presented. And don't forget to share your dashboards with your team members, so everyone has access to the same information.</p>
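<p>Dashboards can also be managed as code: the dashboard body is a JSON document. A minimal one-widget example (the region, metric source, and Auto Scaling group name are placeholders):</p>
<pre><code class="lang-json">{
  "widgets": [
    {
      "type": "metric",
      "x": 0,
      "y": 0,
      "width": 12,
      "height": 6,
      "properties": {
        "title": "Web tier CPU",
        "region": "us-east-1",
        "metrics": [
          ["AWS/EC2", "CPUUtilization", "AutoScalingGroupName", "web-asg"]
        ],
        "stat": "Average",
        "period": 300
      }
    }
  ]
}
</code></pre>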
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h2 id="heading-aws-x-ray-distributed-tracing-for-microservices">AWS X-Ray: Distributed Tracing for Microservices</h2>
<p>While CloudWatch is great for monitoring individual resources and services, it doesn't provide a complete picture of how requests flow through your application. That's where AWS X-Ray comes in.</p>
<p>X-Ray is a distributed tracing service that allows you to track requests as they move through your application, helping you identify performance bottlenecks, errors, and other issues. X-Ray is especially useful for troubleshooting microservices architectures, where requests often span multiple services and resources.</p>
<h3 id="heading-instrumenting-applications-for-x-ray">Instrumenting Applications for X-Ray</h3>
<p>To use X-Ray, you first need to instrument your application code to send tracing data to the X-Ray service. AWS provides X-Ray SDKs for popular programming languages like Java, Node.js, Python, and .NET, which make it easy to add tracing to your code.</p>
<p>When instrumenting your code, it's important to follow best practices like using meaningful segment names, adding annotations and metadata to your traces, and handling errors gracefully. You should also be careful not to over-instrument your code, as this can add unnecessary overhead and complexity.</p>
<h3 id="heading-tracing-requests-with-x-ray">Tracing Requests with X-Ray</h3>
<p>Once your application is instrumented, X-Ray will automatically capture and visualize traces as requests flow through your system. The X-Ray service map provides a high-level overview of your application architecture, showing how services and resources are connected and how requests are routed between them.</p>
<p>By drilling down into individual traces, you can see detailed information about each segment of the request, including response times, errors, and other metadata. This makes it easy to identify performance bottlenecks, such as slow database queries or high network latency, and pinpoint the root cause of issues.</p>
<p>X-Ray also integrates with other AWS services, allowing you to trace requests as they move between services like API Gateway, Lambda, and DynamoDB. This provides a complete end-to-end view of your application, making it easier to troubleshoot issues that span multiple services.</p>
<h3 id="heading-analyzing-and-visualizing-traces">Analyzing and Visualizing Traces</h3>
<p>The X-Ray console provides a powerful interface for analyzing and visualizing your tracing data. You can use the console to view the service map, examine individual traces, and filter and group traces based on various attributes like response time, error rate, or user agent.</p>
<p>One of the most useful features of the X-Ray console is filter expressions, which let you narrow the trace list down to exactly the requests you care about, such as slow or faulted ones. You can save a filter expression as an X-Ray group to share that view with your team, and X-Ray publishes metrics for each group to CloudWatch.</p>
<p>You can also integrate X-Ray with CloudWatch, allowing you to create alarms based on X-Ray metrics and visualize X-Ray data alongside other CloudWatch metrics. This provides a more comprehensive view of your application's health and performance, making it easier to identify and resolve issues.</p>
<h2 id="heading-monitoring-serverless-applications-on-aws">Monitoring Serverless Applications on AWS</h2>
<p>Serverless architectures, such as those based on AWS Lambda and Step Functions, present unique challenges when it comes to monitoring and troubleshooting. Because serverless functions are ephemeral and can scale rapidly, traditional monitoring approaches may not be effective.</p>
<h3 id="heading-monitoring-aws-lambda-with-cloudwatch">Monitoring AWS Lambda with CloudWatch</h3>
<p>One of the key tools for monitoring AWS Lambda is CloudWatch Logs. By default, Lambda sends log output to CloudWatch Logs, allowing you to view and search log data in real-time. You can use CloudWatch Logs to troubleshoot issues, analyze function behavior, and gain insights into performance and usage patterns.</p>
<p>In addition to logs, Lambda also sends metrics to CloudWatch, including invocations, duration, errors, and throttles. By monitoring these metrics, you can identify performance issues, detect anomalies, and set up alarms to proactively notify you of problems.</p>
<p>When monitoring Lambda functions, it's important to correlate logs and metrics to get a complete picture of function behavior. For example, if you notice a spike in function duration, you can use CloudWatch Logs to investigate the root cause, such as a slow database query or a network issue.</p>
<h3 id="heading-monitoring-aws-step-functions-with-x-ray">Monitoring AWS Step Functions with X-Ray</h3>
<p>For more complex serverless workflows, such as those based on AWS Step Functions, X-Ray can be a powerful tool for monitoring and troubleshooting. By enabling X-Ray tracing for your Step Functions, you can visualize the execution flow of your state machines, identify performance bottlenecks, and pinpoint the root cause of errors.</p>
<p>X-Ray integrates seamlessly with Step Functions, automatically capturing traces as executions move through the state machine. You can use the X-Ray console to view the service map, examine individual executions, and filter and group traces based on various attributes.</p>
<p>One of the most useful features of X-Ray for Step Functions is the ability to correlate traces across Lambda functions and other AWS services. This allows you to see how data flows through your application, identify performance issues, and troubleshoot errors that span multiple services.</p>
<h2 id="heading-other-aws-monitoring-and-troubleshooting-tools">Other AWS Monitoring and Troubleshooting Tools</h2>
<p>While CloudWatch and X-Ray are the core tools for monitoring and troubleshooting on AWS, there are many other services and features that can help you keep your applications running smoothly. Here are a few worth mentioning:</p>
<h3 id="heading-amazon-eventbridge">Amazon EventBridge</h3>
<p>EventBridge is a serverless event bus that makes it easy to build event-driven architectures on AWS. With EventBridge, you can monitor events from a wide range of sources, including AWS services, SaaS applications, and custom applications, and trigger automated actions based on those events.</p>
<p>For example, you could use EventBridge to monitor EC2 instance state changes, capture S3 bucket events, or detect changes to your AWS resources using CloudTrail. You can then use EventBridge rules to trigger Lambda functions, send SNS notifications, or perform other actions in response to those events.</p>
<h3 id="heading-aws-config">AWS Config</h3>
<p>AWS Config is a service that helps you assess, audit, and evaluate the configurations of your AWS resources. With Config, you can continuously monitor and record resource configurations, and receive notifications when those configurations change.</p>
<p>Config is particularly useful for troubleshooting issues related to resource misconfigurations or compliance violations. For example, you could use Config to detect when an S3 bucket is made publicly accessible, or when an EC2 instance is launched without the required security group.</p>
<h3 id="heading-vpc-flow-logs">VPC Flow Logs</h3>
<p>VPC Flow Logs is a feature that allows you to capture information about the IP traffic going to and from network interfaces in your VPC. You can create flow logs at the VPC, subnet, or network interface level, and use the data to gain insights into traffic patterns, security issues, and performance bottlenecks.</p>
<p>Flow Logs can be particularly useful for troubleshooting connectivity issues, detecting unusual traffic patterns, and investigating security incidents. You can use tools like Amazon Athena or Amazon CloudWatch Logs Insights to analyze Flow Log data and identify issues.</p>
<h2 id="heading-best-practices-for-monitoring-and-troubleshooting-on-aws">Best Practices for Monitoring and Troubleshooting on AWS</h2>
<p>Effective monitoring and troubleshooting on AWS requires more than just the right tools and services. It also requires a well-defined strategy, clear objectives, and a commitment to continuous improvement. Here are some best practices to keep in mind:</p>
<ol>
<li><p>Establish clear monitoring and troubleshooting objectives. What are the key metrics and logs that matter most to your application and your users? What are your target response times and error rates? By setting clear objectives upfront, you can focus your monitoring and troubleshooting efforts where they'll have the biggest impact.</p>
</li>
<li><p>Create a comprehensive monitoring strategy. Your monitoring strategy should cover all aspects of your application, from infrastructure and application metrics to logs and traces. It should also define clear roles and responsibilities for your team, as well as processes for incident response and escalation.</p>
</li>
<li><p>Implement proactive and reactive troubleshooting processes. Proactive troubleshooting involves using monitoring data to identify and resolve issues before they impact users. Reactive troubleshooting involves quickly identifying and resolving issues when they do occur. Both approaches are essential for maintaining a reliable and performant application.</p>
</li>
<li><p>Leverage automation and Infrastructure as Code. Automation and Infrastructure as Code (IaC) can help you ensure consistency and reliability across your monitoring and troubleshooting processes. By defining your monitoring configuration as code, you can version control your settings, test changes before applying them, and quickly roll back if needed.</p>
</li>
<li><p>Continuously optimize your approach. Monitoring and troubleshooting is an ongoing process, not a one-time setup. As your application evolves and your usage patterns change, you'll need to continuously optimize your monitoring and troubleshooting approach to ensure it remains effective. This may involve adding new metrics and logs, adjusting alarm thresholds, or refining your troubleshooting processes.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Monitoring and troubleshooting are essential skills for any AWS user, whether you're running a simple web application or a complex microservices architecture. By using tools like CloudWatch and X-Ray, plus other AWS services and best practices, you can gain deep visibility into your application's behavior and quickly resolve issues when they occur.</p>
<p>But effective monitoring and troubleshooting is about more than just tools and technology. It's also about having a clear strategy, well-defined processes, and a culture of continuous improvement. By setting clear objectives, implementing proactive and reactive troubleshooting approaches, and continuously optimizing your monitoring and troubleshooting practices, you can build more reliable, performant, and resilient applications on AWS.</p>
<p>So don't wait until something breaks to start thinking about monitoring and troubleshooting. Start implementing these best practices today, and you'll be well on your way to building better applications on AWS.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Security in AWS: IAM Best Practices and Advanced Techniques]]></title><description><![CDATA[AWS IAM (Identity and Access Management) is the backbone of any AWS security strategy. It's the service that controls who can access your AWS resources and what actions they can perform. Get IAM right, and you're well on your way to a secure cloud de...]]></description><link>https://blog.guilleojeda.com/security-in-aws-iam-best-practices-and-advanced-techniques</link><guid isPermaLink="true">https://blog.guilleojeda.com/security-in-aws-iam-best-practices-and-advanced-techniques</guid><category><![CDATA[Cloud]]></category><category><![CDATA[Security]]></category><category><![CDATA[AWS]]></category><category><![CDATA[IAM]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Wed, 20 Mar 2024 00:41:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1710895203981/6d878ff5-a69d-4333-a1d7-55f758126945.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS IAM (Identity and Access Management) is the backbone of any AWS security strategy. It's the service that controls who can access your AWS resources and what actions they can perform. Get IAM right, and you're well on your way to a secure cloud deployment. Mess it up, and you're leaving the door wide open for all sorts of security nightmares.</p>
<p>In this article, we'll dive deep into IAM best practices and advanced techniques to help you lock down your AWS environment like a pro. We'll start with the fundamentals, then move on to more advanced topics like granular access control, cross-account access, and automating IAM with Infrastructure as Code. By the end, you'll have a solid understanding of how to use IAM to secure your AWS resources and protect your sensitive data.</p>
<h2 id="heading-understanding-iam-fundamentals">Understanding IAM Fundamentals</h2>
<p>Before we jump into the best practices and advanced techniques, let's make sure we're all on the same page with the IAM basics. Understanding these foundational concepts is crucial for designing and implementing an effective IAM strategy.</p>
<h3 id="heading-iam-users-groups-and-roles">IAM Users, groups, and roles</h3>
<p>At the core of IAM are three main identity types: users, groups, and roles. IAM users represent individual people or applications that need access to your AWS resources. IAM groups are collections of IAM users, making it easier to manage permissions for multiple users at once. IAM roles are a bit different: they're not associated with a specific user, but rather are used by AWS services or external identities that need temporary access to your resources.</p>
<h3 id="heading-iam-policies-and-permissions">IAM policies and permissions</h3>
<p>IAM policies are JSON documents that define permissions for IAM identities. They specify what actions an identity can perform on which AWS resources. Policies can be attached to IAM users, groups, or roles, or even directly to AWS resources (more on that later).</p>
<h3 id="heading-resource-based-policies-vs-identity-based-policies">Resource-based policies vs. identity-based policies</h3>
<p>There are two main types of IAM policies: identity-based policies and resource-based policies. Identity-based policies are attached to IAM identities (users, groups, or roles) and define what actions those identities can perform on which resources. Resource-based policies, on the other hand, are attached directly to AWS resources (like S3 buckets or KMS keys) and define who can access those resources and what actions they can perform.</p>
<h3 id="heading-how-iam-interacts-with-other-aws-services">How IAM interacts with other AWS services</h3>
<p>IAM is deeply integrated with other AWS services. It's used to control access to virtually every AWS resource, from EC2 instances to S3 buckets to Lambda functions. Many AWS services also have their own resource-based policies that work in conjunction with IAM policies to provide fine-grained access control.</p>
<h2 id="heading-iam-best-practices">IAM Best Practices</h2>
<p>Now that we've got the fundamentals down, let's dive into some IAM best practices that every AWS user should follow.</p>
<h3 id="heading-principle-of-least-privilege">Principle of least privilege</h3>
<p>The principle of least privilege is the golden rule of IAM. It means only granting users the permissions they need to perform their job duties; no more, no less. This helps minimize the blast radius if a user's credentials are compromised, and makes it easier to audit and manage permissions over time.</p>
<h3 id="heading-proper-iam-user-and-role-management">Proper IAM user and role management</h3>
<p>Managing IAM users and roles can get complex, especially in large organizations. Some key best practices include:</p>
<ul>
<li><p>Create individual IAM users for each person who needs access to AWS, rather than sharing credentials</p>
</li>
<li><p>Use IAM roles for applications and services that need access to AWS resources</p>
</li>
<li><p>Regularly review and remove unused IAM users and roles</p>
</li>
</ul>
<h3 id="heading-using-iam-groups-for-better-organization">Using IAM groups for better organization</h3>
<p>IAM groups make it easier to manage permissions for multiple users at once. By creating groups for different job functions or teams, you can assign permissions at the group level rather than individually. This makes it easier to onboard new users and ensure consistent permissions across your organization.</p>
<h3 id="heading-password-policies-and-mfa-enforcement">Password policies and MFA enforcement</h3>
<p>Strong password policies and multi-factor authentication (MFA) are critical for protecting your IAM users. AWS allows you to set password policies that enforce minimum length, complexity, and rotation requirements. You should also require MFA for all IAM users, especially those with administrative privileges.</p>
<h3 id="heading-regularly-reviewing-and-rotating-iam-credentials">Regularly reviewing and rotating IAM credentials</h3>
<p>Over time, IAM users can accumulate unnecessary permissions, and credentials can become stale or compromised. That's why it's important to regularly review IAM users and their permissions, and rotate access keys and passwords on a regular basis. AWS recommends rotating access keys every 90 days, and immediately revoking credentials for users who leave your organization.</p>
<h3 id="heading-avoiding-use-of-root-user-account">Avoiding use of root user account</h3>
<p>The root user account has unrestricted access to all AWS resources in your account, making it a prime target for attackers. Best practice is to avoid using the root user account for day-to-day tasks, and instead create individual IAM users with specific permissions. You should also enable MFA on the root user account and use it only for tasks that absolutely require root privileges.</p>
<h3 id="heading-implementing-granular-access-control">Implementing Granular Access Control</h3>
<p>One of the most powerful features of IAM is the ability to create fine-grained policies that precisely control access to your AWS resources. Here are some techniques for implementing granular access control:</p>
<h3 id="heading-creating-fine-grained-iam-policies">Creating fine-grained IAM policies</h3>
<p>When creating IAM policies, it's important to be as specific as possible. Instead of granting broad permissions like <code>s3:*</code>, grant only the specific actions needed, like <code>s3:GetObject</code> or <code>s3:PutObject</code>. You can also restrict access to specific resources using ARNs (Amazon Resource Names), and limit permissions to specific IP ranges or VPC endpoints.</p>
<h3 id="heading-using-policy-conditions-for-more-precise-control">Using policy conditions for more precise control</h3>
<p>IAM policy conditions allow you to further refine permissions based on specific criteria. For example, you can use conditions to allow access only during certain time windows, from specific IP ranges, or for requests that include certain headers or parameters.</p>
<h3 id="heading-leveraging-iam-policy-variables">Leveraging IAM policy variables</h3>
<p>IAM policy variables allow you to create dynamic policies that adapt to the request context. For example, you can use the <code>aws:username</code> variable in a resource ARN to grant each user access to their own home directory in an S3 bucket, or the <code>aws:SourceIp</code> condition key to restrict access based on the requester's IP address.</p>
<h3 id="heading-combining-multiple-policies-for-complex-permissions">Combining multiple policies for complex permissions</h3>
<p>In some cases, you may need to combine multiple policies to achieve the desired level of access control. For example, you might use an identity-based policy to grant broad permissions to a group of users, then use a resource-based policy to further restrict access to specific resources.</p>
<h2 id="heading-real-world-examples-of-granular-access-control-in-aws">Real-world examples of granular access control in AWS</h2>
<p>Let's look at a couple of real-world examples of granular access control in action:</p>
<h3 id="heading-granting-read-only-access-to-an-s3-bucket-for-a-specific-iam-user">Granting read-only access to an S3 bucket for a specific IAM user</h3>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"ReadOnlyAccess"</span>,
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Action"</span>: [
        <span class="hljs-string">"s3:GetObject"</span>,
        <span class="hljs-string">"s3:ListBucket"</span>
      ],
      <span class="hljs-attr">"Resource"</span>: [
        <span class="hljs-string">"arn:aws:s3:::my-bucket"</span>,
        <span class="hljs-string">"arn:aws:s3:::my-bucket/*"</span>
      ]
    }
  ]
}
</code></pre>
<h3 id="heading-allowing-an-ec2-instance-to-access-s3-but-only-from-a-specific-vpc-endpoint">Allowing an EC2 instance to access S3, but only from a specific VPC endpoint</h3>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"AccessFromVPCEndpoint"</span>,
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"s3:*"</span>,
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"*"</span>,
      <span class="hljs-attr">"Condition"</span>: {
        <span class="hljs-attr">"StringEquals"</span>: {
          <span class="hljs-attr">"aws:sourceVpce"</span>: <span class="hljs-string">"vpce-1a2b3c4d"</span>
        }
      }
    }
  ]
}
</code></pre>
<h2 id="heading-cross-account-access-and-iam-roles">Cross-Account Access and IAM Roles</h2>
<p>In many organizations, you'll need to grant access to AWS resources across multiple accounts. That's where IAM roles and cross-account access come in.</p>
<h3 id="heading-understanding-cross-account-access">Understanding cross-account access</h3>
<p>Cross-account access allows IAM users or roles in one AWS account to access resources in another account. This is useful for scenarios like granting developers access to a production account, or allowing a central security team to monitor multiple accounts.</p>
<h3 id="heading-using-iam-roles-for-secure-access-delegation">Using IAM roles for secure access delegation</h3>
<p>IAM roles are the preferred way to grant cross-account access. Instead of sharing access keys or passwords, you create an IAM role in the target account and grant permissions to the trusted entity (user or role) in the source account. The trusted entity can then assume the role and access resources in the target account.</p>
<h3 id="heading-assuming-roles-vs-using-access-keys">Assuming roles vs. using access keys</h3>
<p>When accessing resources across accounts, it's best to assume an IAM role rather than using access keys. Access keys are long-term credentials that can be easily leaked or compromised, while IAM roles provide temporary, short-lived credentials that automatically expire.</p>
<h3 id="heading-best-practices-for-managing-cross-account-access">Best practices for managing cross-account access</h3>
<p>Some best practices for managing cross-account access include:</p>
<ul>
<li><p>Use IAM roles for cross-account access instead of sharing long-term access keys</p>
</li>
<li><p>Limit the permissions granted to cross-account roles to the minimum necessary</p>
</li>
<li><p>Regularly review and audit cross-account access</p>
</li>
<li><p>Use external IDs to prevent the confused deputy problem</p>
</li>
</ul>
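<p>To show what that external ID guardrail looks like in practice, here's a hypothetical trust policy for a cross-account role: only the named account can assume it, and only when it presents the agreed external ID. The account ID and external ID are placeholders.</p>

```python
# Hypothetical trust policy for a cross-account role. The external ID
# condition protects against the confused deputy problem: a third party
# can only assume the role on behalf of the customer who shared that ID.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # placeholder account
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "partner-42"}},  # placeholder ID
    }],
}
# You would attach this when creating the role, e.g.:
# iam.create_role(RoleName="CrossAccountAudit",
#                 AssumeRolePolicyDocument=json.dumps(trust_policy))
```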
<h2 id="heading-securing-access-to-aws-resources">Securing Access to AWS Resources</h2>
<p>In addition to identity-based policies, AWS also supports resource-based policies that allow you to control access to specific resources like S3 buckets, KMS keys, and Lambda functions.</p>
<h3 id="heading-using-resource-based-policies-eg-s3-bucket-policies">Using resource-based policies (e.g., S3 bucket policies)</h3>
<p>Resource-based policies are attached directly to an AWS resource and define who can access that resource and what actions they can perform. For example, an S3 bucket policy can allow read access to objects from a specific IP range, or deny all public access to the bucket.</p>
<h3 id="heading-combining-resource-based-and-identity-based-policies">Combining resource-based and identity-based policies</h3>
<p>Resource-based policies work in conjunction with identity-based policies to provide comprehensive access control. When an IAM user or role tries to access a resource, AWS evaluates both the identity-based policies attached to the user or role and the resource-based policy attached to the resource. An explicit deny in either policy blocks the request. For requests within a single account, an explicit allow in either policy type is enough; for cross-account requests, both the identity-based policy in the calling account and the resource-based policy on the target resource must allow the access.</p>
<h3 id="heading-vpc-endpoints-and-iam-policies">VPC endpoints and IAM policies</h3>
<p>VPC endpoints allow you to securely access AWS services from within your VPC, without traversing the public internet. You can use IAM policies to control access to VPC endpoints, ensuring that only authorized users or roles can access the services behind the endpoint.</p>
<h3 id="heading-securing-access-to-api-gateway-and-lambda">Securing access to API Gateway and Lambda</h3>
<p>API Gateway and Lambda are powerful tools for building serverless applications, but they also introduce new security challenges. Best practices for securing access to these services include:</p>
<ul>
<li><p>Use IAM roles to grant Lambda functions access to other AWS services</p>
</li>
<li><p>Implement OAuth or JWT authentication for APIs</p>
</li>
<li><p>Use API keys and usage plans to control access to APIs</p>
</li>
<li><p>Enable AWS WAF to protect against common web exploits</p>
</li>
</ul>
<h3 id="heading-protecting-sensitive-data-with-kms-and-iam">Protecting sensitive data with KMS and IAM</h3>
<p>AWS Key Management Service (KMS) allows you to encrypt your sensitive data using centrally managed keys. IAM policies can be used to control access to KMS keys, ensuring that only authorized users or roles can encrypt or decrypt data.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h2 id="heading-centralized-iam-management-with-aws-organizations">Centralized IAM Management with AWS Organizations</h2>
<p>For organizations with multiple AWS accounts, managing IAM across all those accounts can be a challenge. That's where AWS Organizations comes in.</p>
<h3 id="heading-benefits-of-using-aws-organizations">Benefits of using AWS Organizations</h3>
<p>AWS Organizations allows you to centrally manage access across multiple accounts. You can create an organization, invite accounts to join, and then use Service Control Policies (SCPs) to enforce IAM policies across all accounts in the organization.</p>
<h3 id="heading-setting-up-an-organization-and-creating-member-accounts">Setting up an organization and creating member accounts</h3>
<p>To get started with AWS Organizations, you create an organization and invite existing accounts to join, or create new accounts directly within the organization. You can organize accounts into Organizational Units (OUs) to apply policies hierarchically.</p>
<h3 id="heading-implementing-service-control-policies-scps">Implementing Service Control Policies (SCPs)</h3>
<p>Service Control Policies are a powerful feature of AWS Organizations that allow you to centrally control what actions can be performed by IAM users and roles across all accounts in your organization. SCPs look like IAM policies, but they never grant permissions by themselves: they define the maximum permissions available in an account, and can be used to enforce security best practices and compliance requirements organization-wide.</p>
<h3 id="heading-delegating-access-across-accounts-with-iam-roles">Delegating access across accounts with IAM roles</h3>
<p>In addition to SCPs, AWS Organizations also simplifies cross-account access using IAM roles. You can create a role in a central account and grant access to users or roles in other accounts within the organization. This allows you to centrally manage permissions while still enabling teams to access the resources they need.</p>
<h3 id="heading-best-practices-for-aws-organizations">Best practices for AWS Organizations</h3>
<p>Some best practices for using AWS Organizations include:</p>
<ul>
<li><p>Use SCPs to enforce security best practices and compliance requirements</p>
</li>
<li><p>Implement a least privilege model, granting only the permissions necessary for each account</p>
</li>
<li><p>Use AWS CloudTrail to monitor IAM activity across all accounts</p>
</li>
<li><p>Regularly review and audit IAM policies and roles</p>
</li>
<li><p>Use automation tools like AWS CloudFormation to manage IAM resources consistently across accounts</p>
</li>
</ul>
<h2 id="heading-monitoring-and-auditing-iam-activity-with-aws-cloudtrail">Monitoring and Auditing IAM Activity with AWS CloudTrail</h2>
<p>Monitoring and auditing IAM activity is critical for detecting and responding to security incidents. AWS CloudTrail is a powerful tool for tracking IAM activity across your AWS accounts.</p>
<h3 id="heading-importance-of-monitoring-iam-events">Importance of monitoring IAM events</h3>
<p>By monitoring IAM events, you can detect suspicious activity like unauthorized access attempts, changes to IAM policies, or creation of new IAM users or roles. This allows you to quickly investigate and respond to potential security breaches.</p>
<h3 id="heading-using-aws-cloudtrail-to-track-iam-actions">Using AWS CloudTrail to track IAM actions</h3>
<p>AWS CloudTrail logs all API calls made to IAM, including who made the call, what actions were performed, and what resources were affected. You can use CloudTrail to create a complete audit trail of IAM activity in your account.</p>
<h3 id="heading-monitoring-iam-events-with-amazon-cloudwatch">Monitoring IAM events with Amazon CloudWatch</h3>
<p>In addition to CloudTrail, you can use Amazon CloudWatch to monitor IAM events in real-time. CloudWatch allows you to create alarms based on specific IAM events, like failed login attempts or changes to sensitive policies.</p>
<h3 id="heading-detecting-and-alerting-on-suspicious-iam-activity">Detecting and alerting on suspicious IAM activity</h3>
<p>By combining CloudTrail and CloudWatch, you can create a comprehensive monitoring and alerting system for IAM. Some best practices include:</p>
<ul>
<li><p>Create alarms for high-risk events like IAM policy changes or root account usage</p>
</li>
<li><p>Use CloudTrail Insights to detect unusual activity patterns</p>
</li>
<li><p>Integrate with SIEM tools like Splunk or AWS Security Hub for centralized monitoring</p>
</li>
</ul>
<h3 id="heading-conducting-regular-iam-audits-and-compliance-checks">Conducting regular IAM audits and compliance checks</h3>
<p>In addition to real-time monitoring, it's important to conduct regular IAM audits to ensure your policies and permissions are configured correctly and comply with your security and compliance requirements. Tools like AWS IAM Access Analyzer and AWS Config can help automate this process.</p>
<h2 id="heading-advanced-iam-security-features">Advanced IAM Security Features</h2>
<p>These are some of the more advanced features of AWS IAM, along with a few related services, that will help you secure your AWS accounts and workloads.</p>
<h3 id="heading-iam-access-analyzer">IAM Access Analyzer</h3>
<p>AWS IAM Access Analyzer is a powerful tool for identifying unintended access to your AWS resources. It analyzes your IAM policies and resource-based policies to determine who has access to your resources and whether that access is intended.</p>
<p>IAM Access Analyzer can help you identify scenarios like:</p>
<ul>
<li><p>Public access to S3 buckets or other resources</p>
</li>
<li><p>Access granted to external AWS accounts</p>
</li>
<li><p>Overly permissive IAM policies</p>
</li>
</ul>
<p>By identifying these issues early, you can take corrective action before they lead to a security breach.</p>
<h3 id="heading-iam-permission-boundaries">IAM Permission Boundaries</h3>
<p>IAM Permission Boundaries are a way to limit the maximum permissions that can be granted to an IAM user or role. They're useful for scenarios like allowing developers to create their own IAM policies, but ensuring they can't grant themselves excessive permissions.</p>
<p>To implement a permission boundary, you create an IAM policy that defines the maximum permissions allowed, then attach that policy as a permission boundary to an IAM user or role. Any policies attached to the user or role are evaluated within the constraints of the permission boundary.</p>
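<p>Conceptually, the effective permissions end up being the intersection of the identity policy and the boundary. A toy sketch of that evaluation, with made-up action lists, and ignoring explicit denies and resource-level detail for simplicity:</p>

```python
# A permission boundary caps effective permissions: an action is only
# effectively allowed if BOTH the identity policy and the boundary
# allow it. Action lists here are hypothetical; explicit denies and
# resource-level detail are ignored for simplicity.
def effective_allowed(identity_actions, boundary_actions):
    return set(identity_actions) & set(boundary_actions)

boundary = {"s3:GetObject", "s3:PutObject", "dynamodb:GetItem"}
# A developer tries to grant themselves IAM access on top of S3 reads:
identity = {"s3:GetObject", "iam:CreateUser"}
print(sorted(effective_allowed(identity, boundary)))  # ['s3:GetObject']
```

<p>The developer's attempt to grant <code>iam:CreateUser</code> is silently capped by the boundary, which is exactly the self-service scenario boundaries are designed for.</p>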
<h3 id="heading-iam-policy-conditions">IAM Policy Conditions</h3>
<p>IAM Policy Conditions allow you to create more fine-grained access control policies based on specific attributes of a request, like the source IP address, time of day, or presence of multi-factor authentication.</p>
<p>Some examples of using IAM policy conditions include:</p>
<ul>
<li><p>Allowing access only during business hours</p>
</li>
<li><p>Requiring multi-factor authentication for sensitive actions</p>
</li>
<li><p>Restricting access to specific IP ranges or VPC endpoints</p>
</li>
</ul>
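<p>For example, a policy statement combining two real condition keys, <code>aws:MultiFactorAuthPresent</code> and <code>aws:SourceIp</code>, might look like the following. The bucket ARN and CIDR range are placeholders:</p>

```python
import json

# Sketch of a policy statement requiring MFA and restricting the
# source IP range. aws:MultiFactorAuthPresent and aws:SourceIp are
# real condition keys; the bucket ARN and CIDR are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
        "Condition": {
            "Bool": {"aws:MultiFactorAuthPresent": "true"},
            "IpAddress": {"aws:SourceIp": "203.0.113.0/24"},
        },
    }],
}
print(json.dumps(policy, indent=2))
```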
<h3 id="heading-iam-identity-center-for-aws-sso">IAM Identity Center for AWS SSO</h3>
<p>IAM Identity Center (formerly AWS Single Sign-On) is a centralized access management service that allows users to sign in once and access multiple AWS accounts and cloud applications.</p>
<p>With IAM Identity Center, you can create and manage user identities in a central directory, then assign permissions to those users across multiple AWS accounts. Users sign in once to the IAM Identity Center portal, then access their assigned accounts and applications without needing to manage separate credentials.</p>
<h3 id="heading-integrating-iam-identity-center-with-third-party-identity-providers">Integrating IAM Identity Center with third-party identity providers</h3>
<p>IAM Identity Center also allows you to integrate with third-party identity providers like Azure AD, Okta, or Ping Identity. This allows you to use your existing identity management system to control access to AWS, without needing to recreate user identities in IAM.</p>
<h2 id="heading-automating-iam-with-infrastructure-as-code-tools">Automating IAM with Infrastructure as Code Tools</h2>
<p>As your AWS environment grows, managing IAM policies and roles manually becomes increasingly difficult. That's where Infrastructure as Code (IaC) tools like AWS CloudFormation, Terraform, and the AWS CDK come in.</p>
<h3 id="heading-benefits-of-using-infrastructure-as-code-iac-for-iam">Benefits of using Infrastructure as Code (IaC) for IAM</h3>
<p>By defining your IAM resources as code, you can:</p>
<ul>
<li><p>Version control your IAM policies and roles</p>
</li>
<li><p>Automate the creation and updates of IAM resources</p>
</li>
<li><p>Ensure consistency across multiple AWS accounts and regions</p>
</li>
<li><p>Easily roll back changes if needed</p>
</li>
</ul>
<h3 id="heading-using-aws-cloudformation-to-manage-iam-resources">Using AWS CloudFormation to manage IAM resources</h3>
<p>AWS CloudFormation is a native AWS service that allows you to define your infrastructure as code using JSON or YAML templates. You can use CloudFormation to create and manage IAM users, groups, roles, and policies across multiple accounts and regions.</p>
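<p>A minimal template for an IAM role that EC2 instances can assume might look like this, shown as the Python-dictionary equivalent of the JSON template. The logical name and the managed policy attached are illustrative placeholders:</p>

```python
import json

# Minimal CloudFormation template (JSON form, built as a Python dict)
# defining an IAM role assumable by EC2. The logical name and the
# managed policy ARN are illustrative placeholders.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "AppServerRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [{
                        "Effect": "Allow",
                        "Principal": {"Service": "ec2.amazonaws.com"},
                        "Action": "sts:AssumeRole",
                    }],
                },
                "ManagedPolicyArns": [
                    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
                ],
            },
        },
    },
}
# Serialize to the JSON you'd hand to CloudFormation:
print(json.dumps(template, indent=2))
```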
<h3 id="heading-terraform-and-aws-cdk-for-iam-automation">Terraform and AWS CDK for IAM automation</h3>
<p>Terraform and the AWS Cloud Development Kit (CDK) are popular third-party IaC tools that support IAM resource management. Terraform uses a declarative language called HCL (HashiCorp Configuration Language) to define infrastructure resources, while the AWS CDK allows you to define infrastructure using familiar programming languages like JavaScript, TypeScript, Python, or Java.</p>
<h3 id="heading-best-practices-for-iam-automation-and-version-control">Best practices for IAM automation and version control</h3>
<p>When automating IAM with IaC tools, it's important to follow best practices like:</p>
<ul>
<li><p>Storing your IaC templates in a version control system like Git</p>
</li>
<li><p>Using separate AWS accounts for development, staging, and production environments</p>
</li>
<li><p>Implementing a code review process for IAM changes</p>
</li>
<li><p>Using tools like AWS CloudTrail and AWS Config to monitor and audit IAM changes</p>
</li>
</ul>
<p>By treating your IAM resources as code and following these best practices, you can ensure consistency, maintainability, and auditability of your IAM configuration.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>IAM is a critical component of securing your AWS environment, but it can be really complex and challenging to manage at scale. By following best practices like the <strong>principle of least privilege</strong>, <strong>using IAM roles for cross-account access</strong>, and implementing <strong>strong password policies and MFA</strong>, you can lay a solid foundation for your IAM strategy.</p>
<p>But to truly secure your accounts and environments, you need to go beyond the basics. Techniques like <strong>granular access control with policy conditions</strong>, <strong>resource-based policies</strong>, and <strong>permission boundaries</strong> allow you to implement fine-grained security policies that precisely control access to your resources. <strong>Centralized management with AWS Organizations</strong> and <strong>monitoring with CloudTrail and CloudWatch</strong> provide visibility and actionable data across your entire AWS environment.</p>
<p>As your AWS usage grows, <strong>automating IAM with Infrastructure as Code</strong> tools like CloudFormation, Terraform, and the AWS CDK becomes increasingly important. By defining your IAM resources as code and following best practices for version control and testing, you can ensure consistency and maintainability of your IAM configuration.</p>
<p>Securing your AWS environment is an ongoing process, not a one-time task. As you adopt new AWS services and your application requirements evolve, it's important to continually review and update your IAM policies to ensure they align with your security goals. Regular audits and compliance checks, along with automated monitoring and alerting, can help you stay on top of your IAM configuration and quickly detect and respond to potential issues.</p>
<p>By following the best practices and techniques outlined in this article, you can build a robust and secure IAM strategy that helps you protect your critical AWS resources and data. But don't stop here! Continue to explore and adopt new security services and features like AWS GuardDuty, AWS Security Hub, and AWS Secrets Manager to further strengthen your security posture.</p>
<p>Remember, security is a shared responsibility between AWS and you, the customer. By taking a proactive and layered approach to IAM and security, you can ensure that your AWS environment is protected against evolving threats and ready to support your business needs for years to come.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Disaster Recovery Strategies on AWS: Ensuring Business Continuity]]></title><description><![CDATA[We're now living in the world of immediate and always-on stuff, where even a few minutes of downtime can be a disaster for businesses. Customers expect 24/7 availability, and any interruption in service can lead to lost revenue, damaged reputation, a...]]></description><link>https://blog.guilleojeda.com/disaster-recovery-strategies-on-aws</link><guid isPermaLink="true">https://blog.guilleojeda.com/disaster-recovery-strategies-on-aws</guid><category><![CDATA[AWS]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Cloud]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Fri, 15 Mar 2024 01:07:48 GMT</pubDate><content:encoded><![CDATA[<p>We're now living in the world of immediate and always-on stuff, where even a few minutes of downtime can be a disaster for businesses. Customers expect 24/7 availability, and any interruption in service can lead to lost revenue, damaged reputation, and even legal consequences. That's where disaster recovery (DR) and business continuity planning come into play.</p>
<p>Disaster recovery is all about preparing for the worst-case scenarios—those unexpected events that can bring your systems to a halt. Whether it's a natural disaster, human error, or a cyber-attack (which is often also caused by human error), having a solid DR plan in place can make the difference between a minor hiccup and a catastrophic failure.</p>
<p>Amazon Web Services (AWS) offers a wide variety of services and features to help you build robust, resilient architectures that can continue operating in the event of a disaster. In this article, we'll explore the key concepts and strategies for implementing effective disaster recovery on AWS.</p>
<h2 id="heading-understanding-rto-and-rpo">Understanding RTO and RPO</h2>
<p>Before we dive into specific DR strategies, let's take a moment to define two critical metrics: <strong>Recovery Time Objective (RTO)</strong> and <strong>Recovery Point Objective (RPO)</strong>.</p>
<p>RTO is the maximum acceptable amount of time your systems can be down when a disaster occurs. In other words, it's the timeframe within which you need to restore your applications and data to avoid unacceptable consequences. For example, a financial trading platform might have an RTO of just a few minutes, while a less critical internal tool might have an RTO of several hours.</p>
<p>RPO, on the other hand, refers to the maximum acceptable amount of data loss your business can tolerate. It's determined by how frequently you take backups and how much data you're willing to lose in the event of a disaster. For instance, an e-commerce site might have an RPO of just a few seconds, meaning they can only afford to lose a very small amount of data, while a blog might be okay with losing a day's worth of content.</p>
<p>Your RTO and RPO will heavily influence your choice of DR strategies. The tighter your objectives, the more robust (and expensive) your DR solution will need to be. It's all about finding the right balance between cost and risk.</p>
<h2 id="heading-designing-a-highly-available-architecture-on-aws">Designing a Highly Available Architecture on AWS</h2>
<p>The first step before even thinking about disaster recovery is a highly available architecture. On AWS, that means leveraging multiple Availability Zones (AZs) to build redundancy and fault tolerance into your applications.</p>
<p>AWS operates a global network of data centers, grouped into regions and further subdivided into AZs. Each AZ is a fully isolated partition of the AWS infrastructure, with independent power, cooling, and networking. By deploying your applications across multiple AZs within a region, you can protect against failures at the data center level.</p>
<p>Of course, building a highly available architecture involves more than just spreading your resources across AZs. You'll also need to implement load balancing and auto-scaling to distribute traffic evenly and automatically adjust capacity based on demand. Services like Amazon EC2 Auto Scaling and Elastic Load Balancing make this easy to achieve.</p>
<p>But what if an entire region goes down? That's where multi-region architectures come into play. By replicating your data and applications across multiple AWS regions, you can ensure that even if an entire region becomes unavailable, your business can continue to operate from another location.</p>
<p>That is what we call Disaster Recovery.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h2 id="heading-disaster-recovery-strategies-on-aws">Disaster Recovery Strategies on AWS</h2>
<p>Now that we've covered the basics of high availability, let's explore <a target="_blank" href="https://newsletter.simpleaws.dev/p/disaster-recovery-strategies-aws?utm_source=blog&amp;utm_medium=hashnode">four common DR strategies</a> you can implement on AWS: backup and restore, pilot light, warm standby, and multi-site active-active.</p>
<h3 id="heading-backup-and-restore-strategy">Backup and Restore Strategy</h3>
<p>The backup and restore strategy is the most basic and cost-effective approach to DR on AWS. It involves taking regular backups of your data and storing them in a secure, durable location like Amazon S3. In the event of a disaster, you can restore your systems from the most recent backup. While simple, this strategy typically involves significant downtime, as you'll need to provision new infrastructure and restore your data before your applications can be brought back online. It's best suited for non-critical workloads with lenient RTO and RPO requirements.</p>
<h3 id="heading-pilot-light-strategy">Pilot Light Strategy</h3>
<p>The pilot light strategy involves keeping a minimal version of your environment running in a secondary region, ready to scale up quickly in the event of a disaster. Core components, like your database servers, are always on, but application servers are kept in a stopped state to minimize costs. When disaster strikes, you can quickly start up your application servers, scale them out to handle the full production load, and redirect traffic to the secondary region. This approach offers faster recovery times than the backup and restore strategy, but still involves some downtime.</p>
<h3 id="heading-warm-standby-strategy">Warm Standby Strategy</h3>
<p>The warm standby strategy takes the pilot light approach a step further. Instead of keeping your secondary environment in a minimal state, you maintain a scaled-down version of your full production environment in the secondary region, with all components running. In the event of a disaster, you can rapidly scale up the secondary environment to handle the full production load. This strategy provides even faster recovery times than the pilot light approach, but comes with higher ongoing costs.</p>
<h3 id="heading-multi-site-active-active-strategy">Multi-Site Active-Active Strategy</h3>
<p>The multi-site active-active strategy is the most comprehensive and expensive DR approach. It involves running your full production environment in multiple regions simultaneously, with each region serving traffic and replicating data in real-time. If one region fails, traffic is automatically routed to the other active region(s) without any interruption in service. This strategy provides the highest level of availability and the fastest recovery times, but also incurs the highest costs, as you're essentially running multiple copies of your entire infrastructure.</p>
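<p>The trade-off between these four strategies can be sketched as a rough decision helper. The thresholds below are illustrative, not AWS guidance; the point is just that tighter RTO/RPO targets push you toward the more expensive strategies:</p>

```python
# Rough mapping from recovery objectives to the four DR strategies.
# Thresholds are illustrative, not AWS guidance; the pattern is that
# tighter RTO/RPO targets require more expensive strategies.
def suggest_dr_strategy(rto_minutes, rpo_minutes):
    if rto_minutes < 1 or rpo_minutes < 1:
        return "multi-site active-active"
    if rto_minutes < 30:
        return "warm standby"
    if rto_minutes < 240:
        return "pilot light"
    return "backup and restore"

print(suggest_dr_strategy(rto_minutes=5, rpo_minutes=15))       # warm standby
print(suggest_dr_strategy(rto_minutes=1440, rpo_minutes=1440))  # backup and restore
```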
<h2 id="heading-how-to-create-backups-on-aws">How to Create Backups on AWS</h2>
<p>Regardless of which DR strategy you choose, creating regular backups is a critical component of any DR plan. AWS offers several backup services and features to help you protect your data.</p>
<p>For Amazon EC2 instances, you can create point-in-time snapshots of your EBS volumes, which can be used to restore your instances to a previous state. You can automate the creation and management of EBS snapshots using AWS Backup, a fully managed backup service that simplifies the process of backing up your AWS resources.</p>
<p>For managed database services like Amazon RDS and Amazon DynamoDB, automated backups are typically enabled by default. You can also create manual snapshots for longer-term retention or to copy your backups to another region for DR purposes.</p>
<p>It's important to regularly test your backups to ensure they can be successfully restored in the event of a disaster. You should also consider implementing a backup retention policy to ensure you have the right balance of short-term and long-term backups to meet your RPO requirements.</p>
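<p>A retention policy can be as simple as a couple of rules. Here's a sketch that keeps daily snapshots for a week and weekly (Monday) snapshots for a month; the windows are made up and should be tuned to your RPO and compliance requirements:</p>

```python
from datetime import date, timedelta

# Illustrative retention policy: keep daily snapshots for 7 days and
# weekly (Monday) snapshots for 28 days. The windows are made up;
# tune them to your RPO and compliance requirements.
def snapshots_to_keep(snapshot_dates, today):
    keep = set()
    for d in snapshot_dates:
        age = (today - d).days
        if age <= 7:
            keep.add(d)  # daily tier
        elif age <= 28 and d.weekday() == 0:
            keep.add(d)  # weekly tier: Mondays only
    return keep

today = date(2024, 3, 15)
daily_snapshots = [today - timedelta(days=n) for n in range(30)]
kept = snapshots_to_keep(daily_snapshots, today)
print(len(kept), "of", len(daily_snapshots), "snapshots retained")  # 11 of 30
```

<p>AWS Backup can express this kind of tiered retention directly through backup plans, so in practice you'd configure it there rather than implement it yourself.</p>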
<h2 id="heading-replication-and-failover-strategies">Replication and Failover Strategies</h2>
<p>In addition to backups, replication and failover are key components of many DR strategies on AWS. By replicating your data and applications across multiple regions, you can ensure that even if an entire region becomes unavailable, your business can continue to operate from another location.</p>
<p>AWS offers several services and features to help you implement cross-region replication and failover. For example, you can use Amazon S3 Cross-Region Replication to automatically replicate objects across S3 buckets in different AWS regions. For databases, you can use Amazon RDS Read Replicas or Amazon Aurora Global Database to create cross-region read replicas that can be quickly promoted to standalone instances in the event of a disaster.</p>
<p>When it comes to failover, you'll need to consider both application-level and DNS-level strategies. At the application level, you can use services like Amazon Route 53 Application Recovery Controller to continuously monitor your application's health and automatically route traffic to healthy resources in the event of a failure.</p>
<p>For DNS failover, Amazon Route 53 offers a variety of routing policies that can help you direct traffic to the appropriate region based on factors like latency, geography, and resource health. By combining these strategies, you can create a robust, automated failover solution that minimizes downtime and ensures your applications remain available even in the face of a regional outage.</p>
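<p>To make the failover behavior concrete, here's a toy model of a primary/secondary failover routing policy. The record roles mirror Route 53's failover routing, but the endpoints are hypothetical and real health checks are considerably more involved:</p>

```python
# Toy model of Route 53 failover routing: resolve to the PRIMARY
# record while its health check passes, otherwise fail over to the
# SECONDARY. Endpoints are hypothetical, and real Route 53 health
# checks are considerably more involved.
def resolve(records):
    primary = next(r for r in records if r["role"] == "PRIMARY")
    secondary = next(r for r in records if r["role"] == "SECONDARY")
    return primary["endpoint"] if primary["healthy"] else secondary["endpoint"]

records = [
    {"role": "PRIMARY",   "endpoint": "alb.us-east-1.example.com", "healthy": False},
    {"role": "SECONDARY", "endpoint": "alb.us-west-2.example.com", "healthy": True},
]
print(resolve(records))  # alb.us-west-2.example.com
```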
<h2 id="heading-disaster-recovery-automation-and-testing">Disaster Recovery Automation and Testing</h2>
<p>Automation is key to implementing an effective DR strategy on AWS. By declaring your infrastructure as code using tools like AWS CloudFormation and Terraform, you can ensure that your DR environment can be quickly and consistently provisioned in the event of a disaster.</p>
<p><strong>Infrastructure as Code (IaC)</strong> not only speeds up the recovery process, but also reduces the risk of human error and ensures that your DR environment is always in a known, consistent state. You can use IaC templates to define everything from your network topology to your application configurations, making it easy to spin up an exact replica of your production environment in a secondary region.</p>
<p>Regular testing is also essential to ensuring the viability of your DR plan. You should schedule periodic DR drills to simulate different failure scenarios and validate that your recovery processes work as expected. These drills can help you identify gaps in your plan and areas for improvement, ensuring that you're always prepared for a real-world disaster.</p>
<h2 id="heading-chaos-engineering-on-aws">Chaos Engineering on AWS</h2>
<p>In addition to traditional DR testing, you may also want to consider implementing chaos engineering practices to proactively identify weaknesses in your systems. Chaos engineering involves intentionally injecting failures into your environment to test its resilience and uncover hidden vulnerabilities.</p>
<p>AWS offers a service called AWS Fault Injection Simulator (FIS) that makes it easy to perform controlled chaos experiments on your AWS workloads. With FIS, you can simulate a variety of failure scenarios, like EC2 instance terminations, API throttling, and network latency, and observe how your applications respond.</p>
<p>By regularly performing chaos experiments, you can build confidence in your systems' ability to withstand failures and identify opportunities for improvement before a real disaster strikes.</p>
<h2 id="heading-monitoring-and-alerting-for-disaster-recovery">Monitoring and Alerting for Disaster Recovery</h2>
<p>Effective monitoring and alerting are critical components of any DR strategy. You need to be able to quickly detect and respond to issues before they escalate into full-blown disasters.</p>
<p>AWS offers a range of monitoring and logging services, like Amazon CloudWatch and AWS X-Ray, that can help you gain visibility into the health and performance of your applications. CloudWatch allows you to collect and track metrics, monitor log files, and set alarms that notify you when thresholds are breached. X-Ray helps you analyze and debug distributed applications, providing insights into how your services are interacting and performing.</p>
<p>In addition to these services, you should also consider implementing a robust alerting strategy using Amazon Simple Notification Service (SNS). With SNS, you can send notifications via email, SMS, or even trigger automated remediation actions when specific events occur or thresholds are crossed.</p>
<p>By combining comprehensive monitoring with proactive alerting, you can ensure that you're always aware of the state of your environment and can quickly respond to any issues that arise.</p>
<h2 id="heading-cost-optimization-for-disaster-recovery">Cost Optimization for Disaster Recovery</h2>
<p>Implementing a comprehensive DR strategy can be expensive, especially if you're maintaining a fully replicated environment in a secondary region. However, there are several strategies you can use to optimize your costs without compromising on your DR objectives.</p>
<p>One approach is to leverage AWS cost-saving features like Reserved Instances and Spot Instances for your DR environment. By purchasing Reserved Instances, you can significantly reduce your EC2 costs compared to On-Demand pricing. Spot Instances let you use spare EC2 capacity at steep discounts, which can be ideal for non-critical DR workloads.</p>
<p>Another strategy is to take a tiered approach to DR, using different strategies for different parts of your application stack based on their criticality and recovery requirements. For example, you might use a multi-site active-active approach for your most critical databases, but a pilot light approach for less critical application tiers.</p>
<p>Continuously monitoring and optimizing your DR costs is also important. You should regularly review your DR environment to identify any underutilized or unnecessary resources, and adjust your strategy accordingly. Tools like AWS Cost Explorer and AWS Budgets can help you track your spending and set alerts when you're approaching your budget limits.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Implementing an effective disaster recovery strategy on AWS requires careful planning, robust architecture, and regular testing and optimization. By leveraging the right mix of AWS services and features, you can create a DR solution that meets your business's unique requirements for availability, recovery time, and data protection.</p>
<p>To recap, the four main DR strategies you can implement on AWS are:</p>
<ul>
<li><p><strong>Backup and restore:</strong> Periodically backing up your data and resources, and restoring them in the event of a disaster.</p>
</li>
<li><p><strong>Pilot light:</strong> Maintaining a minimal version of your environment in a secondary region, ready to scale up when needed.</p>
</li>
<li><p><strong>Warm standby:</strong> Running a scaled-down version of your full environment in a secondary region, with the ability to quickly scale up to handle the full production load.</p>
</li>
<li><p><strong>Multi-site active-active:</strong> Running your full production environment simultaneously in multiple regions, with automatic failover between regions.</p>
</li>
</ul>
<p>Regardless of which strategy you choose, it's critical to regularly test and refine your DR plan to ensure it remains effective as your business evolves. By combining comprehensive monitoring, automated failover, and regular chaos engineering practices, you can build a resilient, highly available application that can weather any storm.</p>
<p>Remember, disaster recovery planning isn't a one-time exercise—it's an ongoing process that requires continuous improvement and optimization. By staying proactive and prepared, you can ensure that your business can continue to operate and thrive, no matter what challenges come your way.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Understanding Amazon S3 Pricing]]></title><description><![CDATA[What is Amazon S3
Amazon Simple Storage Service (S3) is an object storage service by AWS that can store any kind of information. S3 is known for its durability, availability, and scalability, and the fact that all of these features come out of the bo...]]></description><link>https://blog.guilleojeda.com/understanding-amazon-s3-pricing</link><guid isPermaLink="true">https://blog.guilleojeda.com/understanding-amazon-s3-pricing</guid><category><![CDATA[AWS]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[storage]]></category><category><![CDATA[cost-optimisation]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Thu, 01 Feb 2024 15:10:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1706800217472/570dbf16-bf6d-4a4f-9e8f-1368ff08202b.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-is-amazon-s3"><strong>What is Amazon S3</strong></h2>
<p>Amazon Simple Storage Service (S3) is an object storage service by AWS that can store any kind of information. S3 is known for its durability, availability, and scalability, and the fact that all of these features come out of the box makes S3 a go-to solution for a wide range of data storage needs.</p>
<p>In S3 users create 'buckets' – containers for data stored in the AWS cloud. Storing data in buckets serves various use cases, from website hosting to backup and recovery, data archiving, and big data analytics.</p>
<h2 id="heading-s3-storage-classes"><strong>S3 Storage Classes</strong></h2>
<p>Amazon S3 offers several storage classes designed for different use cases:</p>
<ol>
<li><p><strong>S3 Standard</strong>: For frequently accessed data. You're billed per storage and per request.</p>
</li>
<li><p><strong>S3 Standard-IA (Infrequent Access)</strong>: For data that is accessed less frequently but requires rapid access when needed. Lower fee per GB stored than Standard, but a higher fee per request.</p>
</li>
<li><p><strong>S3 One Zone-IA</strong>: Similar to Standard-IA, but data is stored in a single Availability Zone, which makes it cheaper at the cost of lower resilience.</p>
</li>
<li><p><strong>S3 Express One Zone</strong>: High-performance, single-Availability-Zone storage for your most frequently accessed, latency-sensitive data.</p>
</li>
<li><p><strong>S3 Intelligent-Tiering</strong>: Automatically moves data between access tiers (Frequent Access, Infrequent Access, and Archive Instant Access) based on continuously evaluating your access patterns. Ideal for data with unknown or changing access patterns.</p>
</li>
<li><p><strong>S3 Glacier</strong>: For long-term archival. Very low storage cost, but retrieving data can take several hours and is even more expensive than Standard-IA.</p>
</li>
<li><p><strong>S3 Glacier Deep Archive</strong>: Amazon S3's lowest-cost storage class for long-term archiving where data retrieval times of 12 hours or more are acceptable.</p>
</li>
</ol>
<h2 id="heading-s3-pricing-explained">S3 Pricing Explained</h2>
<p>As mentioned above, the different storage classes have different prices. Here are the prices for each S3 storage class:</p>
<h3 id="heading-pricing-for-s3-standard"><strong>Pricing for S3 Standard</strong></h3>
<ul>
<li><p><strong>Storage:</strong> $0.023 per GB for the first 50 TB, $0.022 per GB for the next 450 TB, $0.021 per GB for storage over 500 TB.</p>
</li>
<li><p><strong>Access:</strong> $0.005 per 1000 PUT, COPY, POST, LIST requests. $0.0004 per 1000 GET, SELECT requests.</p>
</li>
<li><p><strong>Data Retrieval:</strong> $0.00 per GB</p>
</li>
<li><p><strong>Other charges:</strong> None</p>
</li>
</ul>
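<p>Since the storage price is tiered, your effective per-GB rate drops as you store more. Here's a small calculator for the S3 Standard storage component alone, using the prices listed above (requests and data transfer excluded):</p>

```python
# Monthly S3 Standard storage cost using the tiered per-GB prices
# listed above. Requests, retrievals, and data transfer are excluded.
TIERS = [
    (50 * 1024, 0.023),     # first 50 TB (expressed in GB)
    (450 * 1024, 0.022),    # next 450 TB
    (float("inf"), 0.021),  # everything over 500 TB
]

def s3_standard_storage_cost(gb):
    cost, remaining = 0.0, gb
    for tier_size, price_per_gb in TIERS:
        billed = min(remaining, tier_size)
        cost += billed * price_per_gb
        remaining -= billed
        if remaining <= 0:
            break
    return round(cost, 2)

print(s3_standard_storage_cost(100 * 1024))  # 100 TB: 2304.0 per month
```

<p>Note how 100 TB is billed as 50 TB at $0.023/GB plus 50 TB at $0.022/GB, not a flat rate across the whole amount.</p>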
<h3 id="heading-pricing-for-s3-standard-ia-infrequent-access"><strong>Pricing for S3 Standard-IA (Infrequent Access)</strong></h3>
<ul>
<li><p><strong>Storage:</strong> $0.0125 per GB</p>
</li>
<li><p><strong>Access:</strong> $0.01 per 1000 PUT, COPY, POST, LIST requests. $0.001 per 1000 GET, SELECT requests.</p>
</li>
<li><p><strong>Data Retrieval:</strong> $0.01 per GB</p>
</li>
<li><p><strong>Other charges:</strong> $0.01 per Lifecycle Transition request</p>
</li>
</ul>
<h3 id="heading-pricing-for-s3-one-zone-ia"><strong>Pricing for S3 One Zone-IA</strong></h3>
<ul>
<li><p><strong>Storage:</strong> $0.01 per GB</p>
</li>
<li><p><strong>Access:</strong> $0.01 per 1000 PUT, COPY, POST, LIST requests. $0.001 per 1000 GET, SELECT requests.</p>
</li>
<li><p><strong>Data Retrieval:</strong> $0.01 per GB</p>
</li>
<li><p><strong>Other charges:</strong> $0.01 per Lifecycle Transition request</p>
</li>
</ul>
<h3 id="heading-pricing-for-s3-express-one-zone"><strong>Pricing for S3 Express One Zone</strong></h3>
<ul>
<li><p><strong>Storage:</strong> $0.16 per GB</p>
</li>
<li><p><strong>Access:</strong> $0.0025 per 1000 PUT, COPY, POST, LIST requests. $0.0002 per 1000 GET, SELECT requests.</p>
</li>
<li><p><strong>Data Retrieval:</strong> $0.00 per GB</p>
</li>
<li><p><strong>Other charges:</strong> None</p>
</li>
</ul>
<h3 id="heading-pricing-for-s3-intelligent-tiering"><strong>Pricing for S3 Intelligent-Tiering</strong></h3>
<ul>
<li><p><strong>Storage:</strong><br />  Frequent Access tier: $0.023 per GB for the first 50 TB, $0.022 per GB for the next 450 TB, $0.021 per GB for storage over 500 TB.<br />  Infrequent Access tier: $0.0125 per GB.<br />  Archive Instant Access tier: $0.004 per GB.</p>
</li>
<li><p><strong>Access:</strong> $0.005 per 1000 PUT, COPY, POST, LIST requests. $0.0004 per 1000 GET, SELECT requests.</p>
</li>
<li><p><strong>Data Retrieval:</strong> $0.00 per GB</p>
</li>
<li><p><strong>Other charges:</strong> monitoring and automation charge of $0.0025 per 1,000 objects monitored per month</p>
</li>
</ul>
<h3 id="heading-pricing-for-s3-glacier"><strong>Pricing for S3 Glacier</strong></h3>
<ul>
<li><p><strong>Storage:</strong><br />  Instant Retrieval: $0.004 per GB<br />  Flexible Retrieval: $0.0036 per GB</p>
</li>
<li><p><strong>Access:</strong><br />  Instant Retrieval: $0.02 per 1000 PUT, COPY, POST, LIST requests. $0.01 per 1000 GET, SELECT requests.<br />  Flexible Retrieval: $0.03 per 1000 PUT, COPY, POST, LIST requests. $0.0004 per 1000 GET, SELECT requests.</p>
</li>
<li><p><strong>Data Retrieval:</strong><br />  Instant Retrieval: $0.03 per GB<br />  Flexible Retrieval: $0.03 per GB for Expedited, $0.01 per GB for Standard</p>
</li>
<li><p><strong>Other charges:</strong><br />  Instant Retrieval: $0.02 per Lifecycle Transition request<br />  Flexible Retrieval: $0.03 per Lifecycle Transition request</p>
</li>
</ul>
<h3 id="heading-pricing-for-s3-glacier-deep-archive"><strong>Pricing for S3 Glacier Deep Archive</strong></h3>
<ul>
<li><p><strong>Storage:</strong> $0.00099 per GB</p>
</li>
<li><p><strong>Access:</strong> $0.05 per 1000 PUT, COPY, POST, LIST requests. $0.0004 per 1000 GET, SELECT requests.</p>
</li>
<li><p><strong>Data Retrieval:</strong> $0.02 per GB for Standard, $0.0025 per GB for Bulk</p>
</li>
<li><p><strong>Other charges:</strong> $0.05 per Lifecycle Transition request</p>
</li>
</ul>
<h2 id="heading-pricing-examples-for-s3-storage-classes"><strong>Pricing Examples for S3 Storage Classes</strong></h2>
<p>To give you a clearer picture of how S3 pricing works, let's look at a few examples. For each one, assume the following monthly usage:</p>
<ul>
<li><p><strong>Storage:</strong> 100 GB</p>
</li>
<li><p><strong>Access:</strong> 100,000 GET requests, 10,000 PUT requests</p>
</li>
<li><p><strong>Data Retrieval:</strong> 100 GB</p>
</li>
</ul>
<h3 id="heading-example-1-s3-standard-storage">Example 1: S3 Standard Storage</h3>
<ul>
<li><p>Storage Cost: $2.30</p>
</li>
<li><p>Access Cost: $0.09 ($0.05 for the PUT requests + $0.04 for the GET requests)</p>
</li>
<li><p>Data Retrieval Cost: $0</p>
</li>
<li><p><strong>Total Cost: $2.39</strong></p>
</li>
</ul>
<h3 id="heading-example-2-s3-express-one-zone">Example 2: S3 Express One Zone</h3>
<ul>
<li><p>Storage Cost: $16</p>
</li>
<li><p>Access Cost: $0.045 ($0.025 for the PUT requests + $0.02 for the GET requests)</p>
</li>
<li><p>Data Retrieval Cost: $0</p>
</li>
<li><p><strong>Total Cost: $16.05</strong></p>
</li>
</ul>
<h3 id="heading-example-3-s3-standard-ia">Example 3: S3 Standard-IA</h3>
<ul>
<li><p>Storage Cost: $1.25</p>
</li>
<li><p>Access Cost: $0.20 ($0.10 for the PUT requests + $0.10 for the GET requests)</p>
</li>
<li><p>Data Retrieval Cost: $1.00</p>
</li>
<li><p><strong>Total Cost: $2.45</strong></p>
</li>
</ul>
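<p>The arithmetic behind these examples is simple enough to sketch in a few lines of Python. This is just the calculation walked through above, using the per-class prices listed earlier (which may change):</p>

```python
# Rough S3 monthly cost estimator for the example workload:
# 100 GB stored, 100,000 GET requests, 10,000 PUT requests, 100 GB retrieved.
# Prices are the per-class figures listed in this article (representative only).

def s3_monthly_cost(gb_stored, gets, puts, gb_retrieved,
                    storage_per_gb, get_per_1k, put_per_1k, retrieval_per_gb):
    """Return the estimated monthly cost in USD."""
    storage = gb_stored * storage_per_gb
    access = (gets / 1000) * get_per_1k + (puts / 1000) * put_per_1k
    retrieval = gb_retrieved * retrieval_per_gb
    return storage + access + retrieval

# S3 Standard: $0.023/GB storage, $0.0004/1k GET, $0.005/1k PUT, free retrieval
standard = s3_monthly_cost(100, 100_000, 10_000, 100, 0.023, 0.0004, 0.005, 0.0)

# S3 Standard-IA: $0.0125/GB storage, $0.001/1k GET, $0.01/1k PUT, $0.01/GB retrieval
standard_ia = s3_monthly_cost(100, 100_000, 10_000, 100, 0.0125, 0.001, 0.01, 0.01)

print(round(standard, 2))     # ~2.39
print(round(standard_ia, 2))  # ~2.45
```

<p>Note how for this access-heavy-but-small workload, Standard-IA's cheaper storage is almost entirely offset by its retrieval fee.</p>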
<h2 id="heading-aws-s3-free-tier"><strong>AWS S3 Free Tier</strong></h2>
<p>AWS offers a free tier for S3, which includes:</p>
<ul>
<li><p>5 GB of Standard Storage</p>
</li>
<li><p>20,000 GET Requests</p>
</li>
<li><p>2,000 PUT, COPY, POST, or LIST Requests</p>
</li>
</ul>
<p>This free tier is a great way to start experimenting with S3 without incurring immediate costs. Also, for really small workloads like MVPs you initially pay $0, and your costs only grow as you acquire more users.</p>
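<p>As a quick sketch (with hypothetical MVP numbers): only usage above the free allowance is billed, so a small workload nets out to zero:</p>

```python
# Hypothetical MVP usage vs. the S3 free tier (allowances as listed above;
# free-tier terms may change).
FREE_GB, FREE_GETS, FREE_PUTS = 5, 20_000, 2_000

def billable(used, free):
    """Only usage above the free allowance is charged."""
    return max(0, used - free)

# An MVP storing 3 GB with 10,000 GETs and 500 PUTs stays inside the free tier:
gb = billable(3, FREE_GB)           # 0
gets = billable(10_000, FREE_GETS)  # 0
puts = billable(500, FREE_PUTS)     # 0
cost = gb * 0.023 + gets / 1000 * 0.0004 + puts / 1000 * 0.005
print(cost)  # 0.0
```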
<h2 id="heading-tips-for-optimizing-aws-s3-costs"><strong>Tips for Optimizing AWS S3 Costs</strong></h2>
<ol>
<li><p><strong>Understand Your Data Usage</strong>: Analyze your data access patterns to choose the most cost-effective storage class.</p>
</li>
<li><p><strong>Monitor Your S3 Billing</strong>: Regularly check your AWS billing dashboard to track your S3 usage and costs.</p>
</li>
<li><p><strong>Leverage S3 Lifecycle Policies</strong>: Automatically move or archive data to lower-cost storage classes.</p>
</li>
<li><p><strong>Use S3 Analytics</strong>: Monitor and analyze storage access patterns for cost optimization.</p>
</li>
</ol>
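<p>As a sketch of tip 3, here's what a lifecycle configuration could look like. The rule name, prefix, and transition days are hypothetical; the dictionary follows the shape accepted by boto3's <code>put_bucket_lifecycle_configuration</code>:</p>

```python
# A lifecycle configuration that moves objects under a hypothetical "logs/"
# prefix to cheaper storage classes as they age, then deletes them.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-old-logs",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": 730},  # delete after two years
        }
    ]
}

# With AWS credentials configured, you would apply it like this:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle_config)
```

<p>Remember that each transition incurs the per-request Lifecycle Transition charge listed above, so transitioning millions of tiny objects can cost more than it saves.</p>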
<p>The goal of this guide was to help you understand AWS S3 pricing. Now you're able to use the best storage classes for your use cases, minimizing cost while maintaining durability and availability.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Understanding AWS Lambda Pricing]]></title><description><![CDATA[In this article, we'll dive deep into the pricing structure of AWS Lambda, breaking down its components, and providing examples to help you understand how costs are calculated. We'll also discuss the AWS Lambda Free Tier and offer practical tips for ...]]></description><link>https://blog.guilleojeda.com/understanding-aws-lambda-pricing</link><guid isPermaLink="true">https://blog.guilleojeda.com/understanding-aws-lambda-pricing</guid><category><![CDATA[AWS]]></category><category><![CDATA[serverless]]></category><category><![CDATA[lambda]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Mon, 29 Jan 2024 21:56:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1706565343777/a9657e2c-d471-41bc-a52e-e76185ba7cb3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article, we'll dive deep into the pricing structure of AWS Lambda, breaking down its components, and providing examples to help you understand how costs are calculated. We'll also discuss the AWS Lambda Free Tier and offer practical tips for optimizing your Lambda usage to keep costs manageable.</p>
<h2 id="heading-what-is-aws-lambda"><strong>What is AWS Lambda?</strong></h2>
<p>AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you. This service is capable of executing code in various languages and is commonly used for applications such as web application backends, data processing, and real-time file processing.</p>
<h2 id="heading-how-aws-lambda-works"><strong>How AWS Lambda Works</strong></h2>
<ul>
<li><p><strong>Event-Driven Execution:</strong> AWS Lambda is designed to run code in response to triggers such as changes in data within AWS services (like S3 or DynamoDB), requests to an API Gateway, or direct invocations via SDKs.</p>
</li>
<li><p><strong>Automatic Scaling:</strong> The service scales automatically, executing code in parallel and handling each trigger individually.</p>
</li>
<li><p><strong>Flexible Resource Allocation:</strong> Compute power is allocated based on the memory configured for your function, ensuring efficient resource utilization.</p>
</li>
</ul>
<h2 id="heading-key-components-of-aws-lambda"><strong>Key Components of AWS Lambda</strong></h2>
<ul>
<li><p><strong>Lambda Functions:</strong> The core unit where your code resides, along with associated configuration information such as the function name, memory, and timeout settings.</p>
</li>
<li><p><strong>Event Sources:</strong> These are AWS services or custom sources that trigger your Lambda function.</p>
</li>
<li><p><strong>Logs and Monitoring:</strong> Integration with AWS CloudWatch ensures detailed monitoring and logging of your Lambda functions.</p>
</li>
<li><p><strong>Runtime Environments:</strong> Supports multiple programming languages and runtimes.</p>
</li>
</ul>
<h2 id="heading-understanding-aws-lambda-pricing"><strong>Understanding AWS Lambda Pricing</strong></h2>
<p>AWS Lambda's pricing is primarily based on two components: the number of requests your functions process and the compute time they consume. Understanding these components in detail, including their cost, is crucial for effectively managing your AWS Lambda expenses. Here's an expanded breakdown:</p>
<ol>
<li><p><strong>Requests:</strong></p>
<ul>
<li><p><strong>Cost:</strong> AWS Lambda charges $0.20 per 1 million requests.</p>
</li>
<li><p><strong>What It Means:</strong> Every time your function is triggered and executed, it counts as a request.</p>
</li>
</ul>
</li>
<li><p><strong>Compute Time:</strong></p>
<ul>
<li><p><strong>Cost:</strong> Compute time is charged at $0.00001667 for every GB-second used.</p>
</li>
<li><p><strong>Calculation:</strong> The cost is based on the amount of memory allocated to your function and the time it takes to execute.</p>
</li>
<li><p><strong>GB-Second:</strong> A GB-second is a measure that combines memory usage and execution time. If your function uses 512MB of memory and runs for 3 seconds, it consumes 1.5 GB-seconds (0.5 GB * 3 seconds).</p>
</li>
</ul>
</li>
</ol>
<h3 id="heading-aws-lambda-free-tier"><strong>AWS Lambda Free Tier</strong></h3>
<p>AWS offers a generous free tier for Lambda:</p>
<ul>
<li><p><strong>1 million free requests per month.</strong></p>
</li>
<li><p><strong>400,000 GB-seconds of compute time per month.</strong></p>
</li>
</ul>
<h3 id="heading-pricing-examples-for-aws-lambda"><strong>Pricing Examples for AWS Lambda</strong></h3>
<p>To illustrate how Lambda pricing works, let's consider a few examples:</p>
<ul>
<li><p><strong>Example 1: Low Frequency, Simple Function</strong></p>
<ul>
<li><p>Requests: 100,000 in a month</p>
</li>
<li><p>Duration: Each request runs for 500ms with 128MB memory allocation.</p>
</li>
<li><p>Total Cost: $0.02 for invocations + $0.1042 for execution time = <strong>$0.1242 / month</strong>.</p>
</li>
</ul>
</li>
<li><p><strong>Example 2: High Frequency, Complex Function</strong></p>
<ul>
<li><p>Requests: 10 million in a month</p>
</li>
<li><p>Duration: Each request runs for 800ms with 256MB memory allocation.</p>
</li>
<li><p>Total Cost: $2.00 for invocations + $33.34 for execution time = <strong>$35.34 / month</strong>.</p>
</li>
</ul>
</li>
</ul>
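<p>The arithmetic from both examples can be sketched in a few lines of Python, using the request and GB-second prices listed above (free tier not included):</p>

```python
# Sketch of the Lambda pricing arithmetic used in the examples above.
REQUEST_PRICE = 0.20 / 1_000_000   # USD per request
GB_SECOND_PRICE = 0.00001667       # USD per GB-second

def lambda_monthly_cost(requests, duration_s, memory_mb):
    """Cost = invocations + (memory in GB * duration * invocations) * rate."""
    gb_seconds = requests * duration_s * (memory_mb / 1024)
    return requests * REQUEST_PRICE + gb_seconds * GB_SECOND_PRICE

print(round(lambda_monthly_cost(100_000, 0.5, 128), 4))    # Example 1: ~0.1242
print(round(lambda_monthly_cost(10_000_000, 0.8, 256), 2)) # Example 2: ~35.34
```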
<h2 id="heading-tips-for-optimizing-aws-lambda-costs"><strong>Tips for Optimizing AWS Lambda Costs</strong></h2>
<ul>
<li><p><strong>Monitor Function Invocations:</strong> Regularly review your Lambda function metrics through AWS CloudWatch to understand your usage patterns.</p>
</li>
<li><p><strong>Adjust Memory Allocation:</strong> Optimize the memory allocation for your functions to balance performance and cost.</p>
</li>
<li><p><strong>Reduce Execution Time:</strong> Optimize your code to run faster, which directly reduces the compute time cost.</p>
</li>
<li><p><strong>Regularly Review Your Architecture:</strong> As your application evolves, continually reassess whether your use of Lambda aligns with your operational requirements and cost objectives.</p>
</li>
<li><p><strong>Leverage Free Tier:</strong> Make the most out of the AWS Lambda Free Tier, especially for development and testing purposes.</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>AWS Lambda offers a flexible, cost-effective solution for running code in response to events. By understanding its pricing model and effectively managing your usage, you can leverage Lambda to build scalable, efficient applications without worrying about infrastructure management.</p>
<p>The goal of this guide is to help you gain a better understanding of AWS Lambda's pricing structure, enabling you to use this fantastic service more efficiently while keeping your AWS costs manageable.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Disaster Recovery and Business Continuity on AWS]]></title><description><![CDATA[Imagine this scenario: You successfully replicated your data to another region, so if your AWS region fails you can still access the data. However, all your servers are still down! You'd like to continue operating even in the event of a disaster.
Dis...]]></description><link>https://blog.guilleojeda.com/disaster-recovery-strategies-aws</link><guid isPermaLink="true">https://blog.guilleojeda.com/disaster-recovery-strategies-aws</guid><category><![CDATA[AWS]]></category><category><![CDATA[Disaster recovery]]></category><category><![CDATA[architecture]]></category><category><![CDATA[cloud architecture]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Tue, 05 Dec 2023 19:41:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1701805223934/1250dccb-4d9b-4525-bb03-aadca69ecad1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine this scenario: You successfully <a target="_blank" href="https://newsletter.simpleaws.dev/p/data-loss-replication-disaster-recovery-aws?utm_source=blog&amp;utm_medium=hashnode">replicated your data to another region</a>, so if your AWS region fails you can still access the data. However, all your servers are still down! You'd like to continue operating even in the event of a disaster.</p>
<h2 id="heading-disaster-recovery-and-business-continuity">Disaster Recovery and Business Continuity</h2>
<p>Disasters are events that cause critical damage to our ability to operate as a business. Consider an earthquake near your datacenter (or the ones you're using in AWS), or a flood in that city (this happened to GCP in Paris in 2023). It follows that Business Continuity is the ability to continue operating (or recovering really fast) in the event of a Disaster. The big question is: <strong>How do we do that?</strong></p>
<p>First, let's understand what recovering looks like, and how much data and time can we lose (yes, we lose both) in the process. There are two objectives that we need to set:</p>
<h2 id="heading-recovery-point-objective-rpo">Recovery Point Objective (RPO)</h2>
<p>The RPO is the maximum time that passes between when the data is written to the primary storage and when it's written to the backup. For periodic backups, RPO is equal to the time between backups. For example, if you take a snapshot of your database every 12 hours, your RPO is 12 hours. For continuous replication, the RPO is equal to the replication delay. For example, if you continuously replicate data from the primary storage to a secondary one, the RPO is the delay in that replication.</p>
<p>Data that hasn't yet been written to the backup won't be available in the event of a disaster, so you want your RPO to be as small as possible. However, minimizing it may require adopting new technologies, which means effort and money. Sometimes it's worth it, sometimes it isn't.</p>
<p>Different data may require different RPOs. Since how easy it is to achieve a low RPO mostly depends on what technologies you use, the RPO for a specific set of data should be considered when selecting where to store it.</p>
<h2 id="heading-recovery-time-objective-rto">Recovery Time Objective (RTO)</h2>
<p>The RTO is the maximum time that can pass from when a failure occurs to when you're operational again. The thing that will have the most impact on RTO is your disaster recovery strategy, which we'll see a bit further down this article. Different technologies will let you reduce the RTO within the same DR strategy, and a technology change may be a good way to reduce RTO without significantly increasing costs.</p>
<h2 id="heading-stages-of-a-disaster-recovery-process">Stages of a Disaster Recovery Process</h2>
<p>These are the four stages that a disaster recovery process goes through, always in this order.</p>
<p><a target="_blank" href="https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-workloads-on-aws.html"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701800077228/4ce6d398-d864-4ac0-85ac-bfd307f32b8c.png" alt="Stages of a Disaster Recovery Process. Source" class="image--center mx-auto" /></a></p>
<h3 id="heading-detect">Detect</h3>
<p>Detection is the phase between when the failure actually occurs and when you start doing something about it. The absolute worst way to learn about a failure is from a customer, so detection should be the first thing you automate. The easiest way to do so is through a health check, which is a sample request sent periodically (e.g. every 30 seconds) to your servers. For example, Application Load Balancer implements this to detect whether targets in a target group are healthy, and can raise a CloudWatch Alarm if it has no healthy targets. You can connect that alarm to SNS to receive an email when that happens, and you'd have automated detection.</p>
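<p>As a sketch of the detection setup described above: the parameters below (names and ARNs are hypothetical placeholders) define a CloudWatch alarm that fires when an ALB target group's <code>HealthyHostCount</code> drops below 1, notifying an SNS topic:</p>

```python
# Automated detection sketch: alarm when an Application Load Balancer
# target group has no healthy targets. Names/ARNs are placeholders.
alarm_params = {
    "AlarmName": "no-healthy-targets",  # hypothetical
    "Namespace": "AWS/ApplicationELB",
    "MetricName": "HealthyHostCount",
    "Dimensions": [
        {"Name": "TargetGroup", "Value": "targetgroup/my-tg/abc123"},   # placeholder
        {"Name": "LoadBalancer", "Value": "app/my-alb/def456"},         # placeholder
    ],
    "Statistic": "Minimum",
    "Period": 60,                 # evaluate every 60 seconds
    "EvaluationPeriods": 1,
    "Threshold": 1,
    "ComparisonOperator": "LessThanThreshold",  # fires when healthy hosts < 1
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder
}

# With AWS credentials configured:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```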
<h3 id="heading-escalate-and-declare">Escalate and Declare</h3>
<p>This is the phase between when the first person is notified about an event 🔥 and when the alarm 🚨 sounds and everyone is called to battle stations 🚒. It may involve manually verifying something, or it may be entirely automated. In many cases it happens after a few corrective actions have been attempted, such as rolling back a deployment.</p>
<h3 id="heading-restore">Restore</h3>
<p>These are the steps necessary to get a system back online. It may be the old system that we're repairing, or it may be a new copy that we're preparing. It usually involves one or several automated steps, and in some cases manual intervention is needed. It ends when the system is capable of serving production traffic.</p>
<h3 id="heading-fail-over">Fail over</h3>
<p>Once we have a live system capable of serving production traffic, we need to send traffic to it. It sounds trivial, but there are several factors that make it worth being a stage on its own:</p>
<ul>
<li><p>You usually want to do it gradually, to avoid crashing the new system</p>
</li>
<li><p>It may not happen instantly (for example, DNS propagation)</p>
</li>
<li><p>Sometimes this stage is triggered manually</p>
</li>
<li><p>You need to verify that it happened</p>
</li>
<li><p>You continue monitoring afterward</p>
</li>
</ul>
<h2 id="heading-disaster-recovery-strategies-on-aws">Disaster Recovery Strategies on AWS</h2>
<p>The two obvious solutions to disaster recovery are:</p>
<ul>
<li><p><a target="_blank" href="https://newsletter.simpleaws.dev/p/data-loss-replication-disaster-recovery-aws?utm_source=blog&amp;utm_medium=hashnode">Backing up data to another region</a> and re-creating the entire system</p>
</li>
<li><p>Continuously running the system in two regions</p>
</li>
</ul>
<p>Both work, but they're not the only ones. They're actually the two extremes of a spectrum:</p>
<p><a target="_blank" href="https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-workloads-on-aws.html"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701800306748/f19499f1-1be2-4591-bcd2-3d959432870d.png" alt="Disaster Recovery Strategies." class="image--center mx-auto" /></a></p>
<h2 id="heading-backup-and-restore">Backup and Restore</h2>
<p>This is the simplest strategy, and the playbook is:</p>
<ul>
<li><p>Before an event (and continuously):</p>
<ul>
<li>Back up all your data to a separate AWS region, which we call the DR region</li>
</ul>
</li>
<li><p>When an event happens:</p>
<ul>
<li><p>Restore the data stores from the backups</p>
</li>
<li><p>Re-create the infrastructure from scratch</p>
</li>
<li><p>Fail over to the new infrastructure</p>
</li>
</ul>
</li>
</ul>
<p><a target="_blank" href="https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-workloads-on-aws.html"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701800401214/0828cb84-6867-420a-9895-b4d006d37311.png" alt="Backup and Restore." class="image--center mx-auto" /></a></p>
<p>It's by far the cheapest, all you need to pay for are the backups and any other regional resources that you need to operate (e.g. KMS keys used to encrypt data). When a disaster happens, you restore from the backups and re-create everything.</p>
<p>I'm being purposefully broad when I say "re-create everything". I bet your infrastructure took you a long time to create. How fast can you re-create it? Can you even do it in hours or a few days, if you can't look at how you did it the first time? (Remember the original region is down).</p>
<p>The answer, of course, is Infrastructure as Code. It will let you launch a new stack of your infrastructure with little effort and little margin for error. That's why we (and by we I mean anyone who knows what they're doing with cloud infrastructure) insist so much on IaC.</p>
<p>As you're setting up your infrastructure as code, don't forget about supporting resources. For example, if your CI/CD Pipeline runs in a single AWS Region (e.g. you're using CodePipeline), you'll need to be ready to deploy it to the new region along with your production infrastructure. Other common supporting resources are values stored in Secrets Manager or SSM Parameter Store, KMS keys, VPC Endpoints, and CloudWatch Alarms configurations.</p>
<p>You can define all your infrastructure as code, but creating the new copy from your templates usually requires some manual actions. You need to document everything, so you're clear on the correct order for the different actions, what parameters to use, common errors and how to avoid or fix them, etc. If you have all of your infrastructure defined as code, this documentation won't be very long. However, it's still super important.</p>
<p>Finally, test everything. Don't just assume that it'll work, or you'll find out that it doesn't right in the middle of a disaster. Run periodic tests for your Disaster Recovery plan, keep the code and the documentation up to date, and keep yourself and your teams sharp.</p>
<h2 id="heading-pilot-light">Pilot Light</h2>
<p>With Backup and Restore you need to create a lot of things from scratch, which takes time. Even if you cut down all the manual processes, you might spend several hours staring at your terminal or the CloudFormation console waiting for everything to create.</p>
<p>What's more, most of these resources aren't even that expensive! Things like an Auto Scaling Group are free (without counting the EC2 instances), an Elastic Load Balancer costs around $23/month, and VPC and subnets are free. The largest portion of your costs comes from the actual capacity that you use: a large number of EC2 instances, DynamoDB tables with a high capacity, etc. But since most of them are scalable, you could keep all the scaffolding set up with capacity scaled to 0, and scale up in the event of a disaster, right?</p>
<p>That's the idea behind Pilot Light, and this is the basic playbook:</p>
<ul>
<li><p>Before an event (and continuously):</p>
<ul>
<li><p>Continuously replicate all your data to a separate AWS region, which we call the DR region</p>
</li>
<li><p>Set up your infrastructure in the DR region, with capacity at 0</p>
</li>
</ul>
</li>
<li><p>When an event happens:</p>
<ul>
<li><p>Scale up the infrastructure in the DR region</p>
</li>
<li><p>Fail over to the DR region</p>
</li>
</ul>
</li>
</ul>
<p><a target="_blank" href="https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-workloads-on-aws.html"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701800451975/2bcad7b8-c5f7-49c2-b579-01f536945368.png" alt="Pilot Light." class="image--center mx-auto" /></a></p>
<p>One of the things that takes the longest is re-creating data stores from snapshots. For that reason, the prescriptive advice (though not a strict requirement) for Pilot Light is to keep data stores functioning, instead of just keeping the backups and restoring from them in a disaster. It is more expensive, though.</p>
<p>Since scaling can be done automatically, the Restore stage is very easy to automate entirely when using Pilot Light. Also, since the scaling times are much shorter than creating everything from scratch, the impact of automating all manual operations will be much higher, and the resulting RTO much lower than with Backup and Restore.</p>
<h2 id="heading-warm-standby">Warm Standby</h2>
<p>The problem with Pilot Light is that, before it scales, it cannot serve any traffic at all. It works just like the pilot light in a home heater: a small flame that doesn't produce any noticeable heat, but is used to light up the main burner much faster. It's a great strategy, and your users will appreciate that the service interruption is brief, in the order of minutes. But what if you need to serve at least those users nearly immediately?</p>
<p>Warm Standby uses the same idea as Pilot Light, but instead of remaining at 0 capacity, it keeps some capacity available. That way, if there is a disaster you can fail over immediately and start serving a subset of users, while the rest of them wait until your infrastructure in the DR region scales up to meet the entire production demand.</p>
<p>Here's the playbook:</p>
<ul>
<li><p>Before an event (and continuously):</p>
<ul>
<li><p>Continuously replicate all your data to a separate AWS region, which we call the DR region</p>
</li>
<li><p>Set up your infrastructure in the DR region, with capacity at a percentage greater than 0</p>
</li>
</ul>
</li>
<li><p>When an event happens:</p>
<ul>
<li><p>Reroute a portion of the traffic to the DR region</p>
</li>
<li><p>Scale up the infrastructure</p>
</li>
<li><p>Reroute the rest of the traffic</p>
</li>
</ul>
</li>
</ul>
<p><a target="_blank" href="https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-workloads-on-aws.html"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701800543972/5e0ccdb1-cfa6-4fdf-b495-73688bad7efd.png" alt="Warm Standby." class="image--center mx-auto" /></a></p>
<p>What portion of the traffic you reroute depends on how much capacity you maintain "hot" (i.e. available). This lets you do some interesting things, like setting up priorities where traffic for some critical services is rerouted and served immediately, or even for some premium users.</p>
<p>It also presents a challenge: How much infrastructure do you keep hot in your DR region? It could be a fixed number like 2 EC2 instances, or you could dynamically adjust this to 20% of the capacity of the primary region (just don't accidentally set it to 0 when the primary region fails!).</p>
<p>You'd think dynamically communicating to the DR region the current capacity or load of the primary region would be too problematic to bother with. But you should be doing it anyway! When a disaster occurs and you begin scaling up your Pilot Light or Warm Standby infrastructure, you don't want to go through all the hoops of scaling slowly from 0 or low capacity to medium, to high, to maximum. You'd rather go from wherever you are directly to 100% of the capacity you need, be it 30 EC2 instances, 4000 DynamoDB WCUs, or whatever service you're using. To do that, you need to know how much is 100%, or in other words, how much capacity the primary region was running on before it went down. Remember that once it's down you can't go check! To solve that, back up the capacity metrics to the DR region. And once you have them, it's trivial to dynamically adjust your warm standby's capacity.</p>
<p>You can pick any number or percentage that you want, and it's really a business decision, not a technical one. Just keep in mind that if you pick 0 you're actually using a Pilot Light strategy, and if you pick 100% it's a variation of Warm Standby called Hot Standby, where you don't need to wait until infrastructure scales before rerouting all the traffic.</p>
<p>An important aspect worth highlighting is that all three strategies we've seen so far are active/passive: one region (the active one) serves traffic, while the other region (the DR one, which is passive) doesn't receive any. With Backup and Restore and with Pilot Light that should be obvious, since those DR regions aren't able to serve any traffic. Warm Standby is able to serve some traffic, and Hot Standby is able to serve all of it. But even then, the DR region doesn't receive any traffic: it remains passive.</p>
<p>The reason for this is that, if you allow your DR region to write data while you're using the primary region (i.e. while it isn't down), then you need to deal with distributed databases with multiple writers, which is much harder than a single writer and multiple readers. Some managed services handle this very well, but even then there are implications that might affect your application. For example, DynamoDB Global Tables handle writes in any region where the global table is set up, but they resolve conflicts with a last-writer-wins reconciliation strategy, where if two regions receive write operations for the same item at the same time (i.e. within the replication delay window), the one that was written last is the one that sticks. Not a bad solution, but you don't want to overcomplicate things if you don't have to.</p>
<h2 id="heading-multi-site-activeactive">Multi-site Active/Active</h2>
<p><a target="_blank" href="https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-workloads-on-aws.html"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701804821900/e470103f-b509-466d-a108-05e2d3bd5761.png" alt="Multi-site Active/Active." class="image--center mx-auto" /></a></p>
<p>In active/passive configurations, only one region serves traffic. Active/active spreads the traffic across both regions in normal operation conditions (i.e. when there's no disaster). As mentioned in the previous paragraph, this introduces a few problems.</p>
<p>The main problem is the read/write pattern that you'll use. Distributed data stores with multiple write nodes can experience "<strong>contention</strong>", a term that means everything is slowed down because multiple nodes are trying to access the same data, and they need to wait for the others so they don't cause inconsistencies. Contention is one of the reasons why databases are hard.</p>
<p>Another problem is that you're effectively managing two identical but separate infrastructures. Suddenly it's not just a group of instances plus one of everything else (Load Balancer, VPC, etc), but two of everything.</p>
<p>You also need to duplicate any configuration resources, such as Lambda functions that perform automations, SSM documents, SNS topics that generate alerts, etc.</p>
<p>Finally, instead of using the same value for "region" in all your code and configurations, you need to use two values, and use the correct one in every case. That's more complexity, more work, more cognitive load, and more chances of mistakes or slip-ups.</p>
<p>Overall, Multi-Site Active/Active is much harder to manage than Warm Standby, but the advantage is that losing a region feels like losing an AZ when you're running a Highly Available workload: You just lose a bit of capacity, maybe fail over a couple of things, but overall everything keeps running smoothly.</p>
<h2 id="heading-tips-for-effective-disaster-recovery-on-aws">Tips for Effective Disaster Recovery on AWS</h2>
<h3 id="heading-decide-on-a-disaster-recovery-strategy">Decide on a Disaster Recovery Strategy</h3>
<p>You can choose freely between any of the four strategies outlined in this article, or you can even choose not to do anything in the event of a disaster. There are no wrong answers, only tradeoffs.</p>
<p>To pick the best strategy for you:</p>
<ul>
<li><p>Calculate how much money you'd lose per minute of downtime</p>
</li>
<li><p>If there are hits to your brand image, factor them in as well</p>
</li>
<li><p>Estimate how often these outages are likely to occur</p>
</li>
<li><p>Calculate how much each DR strategy would cost</p>
</li>
<li><p>Determine your RTO for each DR strategy</p>
</li>
<li><p>Plug everything into your calculator</p>
</li>
<li><p>Make an informed decision</p>
</li>
</ul>
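<p>Here's what "plugging everything into your calculator" could look like, with completely made-up numbers just to show the shape of the calculation:</p>

```javascript
// Illustrative numbers only: compare each DR strategy's yearly price tag
// plus the expected cost of downtime given its recovery time (RTO).
const lossPerMinute = 100;  // USD lost per minute of downtime
const outagesPerYear = 1;   // estimated region-level outages per year

const strategies = [
  { name: 'Backup and Restore', yearlyCost: 500,   rtoMinutes: 1440 },
  { name: 'Pilot Light',        yearlyCost: 3000,  rtoMinutes: 60 },
  { name: 'Warm Standby',       yearlyCost: 10000, rtoMinutes: 10 },
  { name: 'Active/Active',      yearlyCost: 25000, rtoMinutes: 1 },
];

// Total yearly cost = what you pay AWS + what downtime costs you
const withTotals = strategies.map((s) => ({
  ...s,
  totalYearlyCost: s.yearlyCost + outagesPerYear * s.rtoMinutes * lossPerMinute,
}));

const best = withTotals.reduce((a, b) =>
  b.totalYearlyCost < a.totalYearlyCost ? b : a
);
console.log(best.name, best.totalYearlyCost); // Pilot Light 9000
```

<p>With these particular numbers the cheapest total is Pilot Light; change the inputs to match your business and the answer will change with them.</p>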
<p>"I'd rather be offline for 24 hours once every year and lose $2,000 than increase my annual AWS expenses by $10,000 to reduce that downtime" is a perfectly valid and reasonable decision, but only if you've actually run the numbers and made it consciously.</p>
<h3 id="heading-improve-your-detection">Improve Your Detection</h3>
<p>The longer you wait to declare an outage, the longer your users have to wait until the service is restored. On the other hand, a false positive (where you declare an outage when there isn't one) will cause you to route traffic away from a region that's working, and your users will suffer from an outage that isn't there.</p>
<p>Improving the granularity of your metrics will let you detect anomalies faster. Cross-referencing multiple metrics will reduce your false positives without increasing your detection time. Additionally, consider partial outages, how to differentiate them from total outages, and what the response should be.</p>
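<p>As an illustration of cross-referencing metrics, here's a sketch of a detection rule that only declares an outage when at least two independent signals agree. The metric names and thresholds are invented for the example:</p>

```javascript
// Sketch: declare an outage only when several independent signals agree,
// which cuts false positives from any single noisy metric.
function shouldDeclareOutage({ errorRate, p99LatencyMs, healthyHostCount }) {
  const signals = [
    errorRate > 0.05,       // more than 5% of requests failing
    p99LatencyMs > 2000,    // p99 latency above 2 seconds
    healthyHostCount === 0, // health checks see no healthy hosts
  ];
  // Any two signals together trigger failover; one alone only alerts
  return signals.filter(Boolean).length >= 2;
}

console.log(shouldDeclareOutage({
  errorRate: 0.5, p99LatencyMs: 5000, healthyHostCount: 3,
})); // true: errors and latency agree
console.log(shouldDeclareOutage({
  errorRate: 0.01, p99LatencyMs: 5000, healthyHostCount: 2,
})); // false: latency alone isn't enough
```

<p>In AWS terms this is roughly what a CloudWatch composite alarm does for you, combining several child alarms into a single failover trigger.</p>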
<h3 id="heading-practice-practice-practice">Practice, Practice, Practice</h3>
<p>As with any complex procedure, there's a high probability that something goes wrong. When would you rather find out about it: during regular business hours, when you're relaxed and awake, or at 3 am, with your boss on the phone yelling about production being down and the backups not working?</p>
<p>Disaster Recovery involves software and procedures, and as with any software or procedures, you need to test them both. Run periodic disaster recovery drills, just like fire drills but for the prod environment. As the Google SRE book says: "<a target="_blank" href="https://sre.google/sre-book/managing-incidents/">If you haven’t gamed out your response to potential incidents in advance, principled incident management can go out the window in real-life situations.</a>"</p>
<hr />
<h2 id="heading-recommended-tools-and-resources-for-disaster-recovery"><strong>Recommended Tools and Resources for Disaster Recovery</strong></h2>
<p>One of the best things you can read on Disaster Recovery is the <a target="_blank" href="https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-workloads-on-aws.html">AWS whitepaper about Disaster Recovery</a>. In fact, it's where I took all the images from.</p>
<p>Another fantastic read is the chapter about <a target="_blank" href="https://sre.google/sre-book/managing-incidents/">Managing incidents</a> from the <a target="_blank" href="https://sre.google/sre-book/table-of-contents/">Site Reliability Engineering book</a> (by Google). If you haven't read the whole book, you might want to do so, but chapters stand independently so you can read just this one.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[DynamoDB Transactions: An E-Commerce Store with Amazon DynamoDB]]></title><description><![CDATA[We're building an e-commerce app with DynamoDB for the database, pretty similar to the one we built for the DynamoDB Database Design article. No need to go read that issue (though I think it came up great), here's how our database works:

Customers a...]]></description><link>https://blog.guilleojeda.com/dynamodb-transactions</link><guid isPermaLink="true">https://blog.guilleojeda.com/dynamodb-transactions</guid><category><![CDATA[AWS]]></category><category><![CDATA[DynamoDB]]></category><category><![CDATA[Amazon Web Services]]></category><category><![CDATA[Databases]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Thu, 09 Nov 2023 18:42:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1699554526559/bb762d4f-e9da-4c8b-82b7-9d1f4b5130e7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We're building an e-commerce app with DynamoDB for the database, pretty similar to the one we built for the <a target="_blank" href="https://newsletter.simpleaws.dev/p/dynamodb-database-design?utm_source=blog&amp;utm_medium=hashnode">DynamoDB Database Design article</a>. No need to go read that issue (though I think it came up great), here's how our database works:</p>
<ul>
<li><p>Customers are stored with a Customer ID starting with c# (for example c#123) as the PK and SK.</p>
</li>
<li><p>Products are stored with a Product ID starting with p# (for example p#123) as the PK and SK, and with an attribute of type number called 'stock', which contains the available stock.</p>
</li>
<li><p>Orders are stored with an Order ID starting with o# (for example o#123) for the PK and the Product ID as the SK.</p>
</li>
<li><p>When an item is purchased, we need to check that the Product is in stock, decrease the stock by 1 and create a new Order.</p>
</li>
<li><p>Payment, shipping and any other concerns are magically handled by the power of "that's out of scope for this issue" and "it's left as an exercise for the reader".</p>
</li>
</ul>
<p>There are more attributes in all entities, but let's ignore them.</p>
<p>We're going to use the following AWS services:</p>
<ul>
<li><strong>DynamoDB:</strong> A NoSQL database that supports <a target="_blank" href="https://en.wikipedia.org/wiki/ACID">ACID transactions</a>, just like any SQL-based database.</li>
</ul>
<h2 id="heading-before-implementing-dynamodb-transactions"><strong>Before Implementing DynamoDB Transactions</strong></h2>
<p>We need to read the value of stock and update it atomically. <strong>Atomicity</strong> is a property of a set of operations, where that set of operations can't be divided: it's either applied in full, or not at all. If we just ran the <code>GetItem</code> and <code>PutItem</code> actions separately, we could have a case where two customers are buying the last item in stock for that product, our scalable backend processes both requests simultaneously, and the events go down like this:</p>
<ol>
<li><p>Customer123 clicks Buy</p>
</li>
<li><p>Customer456 clicks Buy</p>
</li>
<li><p>Instance1 receives request from Customer123</p>
</li>
<li><p>Instance1 executes GetItem for Product111, receives a stock value of 1, continues with the purchase</p>
</li>
<li><p>Instance2 receives request from Customer456</p>
</li>
<li><p>Instance2 executes GetItem for Product111, receives a stock value of 1, continues with the purchase</p>
</li>
<li><p>Instance1 executes PutItem for Product111, sets stock to 0</p>
</li>
<li><p>Instance2 executes PutItem for Product111, sets stock to 0</p>
</li>
<li><p>Instance1 executes PutItem for Order0046</p>
</li>
<li><p>Instance1 receives a success, returns a success to the frontend.</p>
</li>
<li><p>Instance2 executes PutItem for Order0047</p>
</li>
<li><p>Instance2 receives a success, returns a success to the frontend.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1699554086075/9b7b293a-7113-4b2a-8c9c-c4b74278483e.png" alt="The process without transactions" class="image--center mx-auto" /></p>
<p>The data doesn't look corrupted, right? Stock for Product111 is 0 (it could end up being -1, depends on how you write the code), both orders are created, you received the money for both orders (out of scope for this issue), and both customers are happily awaiting their product. You go to the warehouse to dispatch both products, and find that you only have one in stock. Where did things go wrong?</p>
<h2 id="heading-steps-to-implement-dynamodb-transactions"><strong>Steps to Implement DynamoDB Transactions</strong></h2>
<p>The problem is that steps 4 and 7 were executed separately, and Instance2 got to read the stock of Product111 (step 6) in between them, and made the decision to continue with the purchase based on a value that hadn't been updated yet, but should have. Steps 4 and 7 need to happen atomically, in a transaction.</p>
<h3 id="heading-install-the-aws-sdk">Install the AWS SDK</h3>
<p>First, install the packages from the <a target="_blank" href="https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/">AWS SDK V3 for JavaScript</a>:</p>
<pre><code class="lang-bash">npm install @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
</code></pre>
<h3 id="heading-update-the-code-to-use-transactions">Update the Code to Use Transactions</h3>
<p>This is the code in Node.js to run the steps as a transaction (you should add this to the code that imaginary you already has for the service):</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> { DynamoDBClient } = <span class="hljs-built_in">require</span>(<span class="hljs-string">'@aws-sdk/client-dynamodb'</span>);
<span class="hljs-keyword">const</span> { DynamoDBDocumentClient, TransactWriteCommand } = <span class="hljs-built_in">require</span>(<span class="hljs-string">'@aws-sdk/lib-dynamodb'</span>);

<span class="hljs-keyword">const</span> dynamoDBClient = <span class="hljs-keyword">new</span> DynamoDBClient({ <span class="hljs-attr">region</span>: <span class="hljs-string">'us-east-1'</span> });
<span class="hljs-keyword">const</span> dynamodb = DynamoDBDocumentClient.from(dynamoDBClient);

<span class="hljs-comment">//The code imaginary you already has</span>

<span class="hljs-comment">//This is just some filler code to make this example valid. Imaginary you should already have this solved</span>
<span class="hljs-keyword">const</span> newOrderId = <span class="hljs-string">'o#123'</span> <span class="hljs-comment">//Must be unique</span>
<span class="hljs-keyword">const</span> productId = <span class="hljs-string">'p#111'</span> <span class="hljs-comment">//Comes in the request</span>
<span class="hljs-keyword">const</span> customerId = <span class="hljs-string">'c#123'</span> <span class="hljs-comment">//Comes in the request</span>

<span class="hljs-keyword">const</span> transactItems = {
  <span class="hljs-attr">TransactItems</span>: [
    {
      <span class="hljs-comment">//A transaction can't include two actions on the same item,</span>
      <span class="hljs-comment">//so the stock check is a condition on the update itself</span>
      <span class="hljs-attr">Update</span>: {
        <span class="hljs-attr">TableName</span>: <span class="hljs-string">'SimpleAwsEcommerce'</span>,
        <span class="hljs-attr">Key</span>: { <span class="hljs-attr">id</span>: productId },
        <span class="hljs-attr">ConditionExpression</span>: <span class="hljs-string">'stock &gt; :zero'</span>,
        <span class="hljs-attr">UpdateExpression</span>: <span class="hljs-string">'SET stock = stock - :one'</span>,
        <span class="hljs-attr">ExpressionAttributeValues</span>: {
          <span class="hljs-string">':zero'</span>: <span class="hljs-number">0</span>,
          <span class="hljs-string">':one'</span>: <span class="hljs-number">1</span>
        }
      }
    },
    {
      <span class="hljs-attr">Put</span>: {
        <span class="hljs-attr">TableName</span>: <span class="hljs-string">'SimpleAwsEcommerce'</span>,
        <span class="hljs-attr">Item</span>: {
          <span class="hljs-attr">id</span>: newOrderId,
          <span class="hljs-attr">customerId</span>: customerId,
          <span class="hljs-attr">productId</span>: productId
        }
      }
    }
  ]
};

<span class="hljs-keyword">const</span> executeTransaction = <span class="hljs-keyword">async</span> () =&gt; {
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> dynamodb.send(<span class="hljs-keyword">new</span> TransactWriteCommand(transactItems));
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Transaction succeeded:'</span>, <span class="hljs-built_in">JSON</span>.stringify(data, <span class="hljs-literal">null</span>, <span class="hljs-number">2</span>));
  } <span class="hljs-keyword">catch</span> (error) {
    <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'Transaction failed:'</span>, <span class="hljs-built_in">JSON</span>.stringify(error, <span class="hljs-literal">null</span>, <span class="hljs-number">2</span>));
  }
};

executeTransaction();

<span class="hljs-comment">//Rest of the code imaginary you already has</span>
</code></pre>
<h2 id="heading-after-implementing-dynamodb-transactions"><strong>After Implementing DynamoDB Transactions</strong></h2>
<p>Here's how things may happen with these changes, if both customers click Buy at the same time:</p>
<ol>
<li><p>Customer123 clicks Buy</p>
</li>
<li><p>Customer456 clicks Buy</p>
</li>
<li><p>Instance1 receives request from Customer123</p>
</li>
<li><p>Instance2 receives request from Customer456</p>
</li>
<li><p>Instance1 executes a transaction:</p>
<ol>
<li><p>Conditional update for Product111: stock is greater than 0 (actual value is 1), so stock is set to 0</p>
</li>
<li><p>PutItem for Order0046</p>
</li>
<li><p>Transaction succeeds, it's committed.</p>
</li>
</ol>
</li>
<li><p>Instance1 receives a success, returns a success to the frontend.</p>
</li>
<li><p>Instance2 executes a transaction:</p>
<ol>
<li><p>Conditional update for Product111: stock is <strong>not</strong> greater than 0 (actual value is 0), so the condition fails</p>
</li>
<li><p>Transaction fails, it's aborted.</p>
</li>
</ol>
</li>
<li><p>Instance2 receives an error, returns an error to the frontend.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1699554167872/faa3c3da-1f16-44a0-86c8-4c7966473d6a.png" alt="The process with transactions" class="image--center mx-auto" /></p>
<h2 id="heading-overview-of-dynamodb"><strong>Overview of DynamoDB</strong></h2>
<p>DynamoDB is so scalable because it's actually a distributed database: you're presented with a single resource called a Table, but behind the scenes there are multiple nodes that store the data and process queries. Data is partitioned using the Partition Key, which is part of the Primary Key (the other part is the Sort Key).</p>
<p>DynamoDB is highly available (meaning it can continue working if an Availability Zone goes down) because each partition is stored in 3 nodes, each in a separate Availability Zone. This is the "secret" behind DynamoDB's availability and durability. You don't need to know this to use DynamoDB effectively, but now that you do, you see that transactions in DynamoDB are actually distributed transactions.</p>
<h2 id="heading-how-transactions-work-in-dynamodb"><strong>How Transactions Work in DynamoDB</strong></h2>
<h3 id="heading-two-phase-commit"><strong>Two-Phase Commit</strong></h3>
<p>DynamoDB implements distributed transactions using Two-Phase Commit (2PC). The strategy is pretty simple: all nodes are first asked to evaluate the transaction and determine whether they're capable of executing it, and only after every node reports that it can successfully execute its part does the central controller send the order to commit the transaction, at which point each node does the actual writing that affects the actual data. For this reason, <strong>all operations done in a DynamoDB transaction consume twice as much capacity</strong>.</p>
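<p>To make the protocol concrete, here's a toy simulation of Two-Phase Commit: every participant must vote "yes" in the prepare phase before the coordinator orders the commit. This illustrates the general protocol, not DynamoDB's actual internals:</p>

```javascript
// Toy 2PC: prepare phase asks every node whether it could apply the
// write; only if all vote yes does the commit phase actually write.
function twoPhaseCommit(nodes, operation) {
  // Phase 1: prepare — each node checks whether it could apply the write
  const votes = nodes.map((node) => node.canApply(operation));
  if (!votes.every(Boolean)) {
    return 'aborted'; // any "no" vote aborts the whole transaction
  }
  // Phase 2: commit — only now is the data actually written
  nodes.forEach((node) => node.apply(operation));
  return 'committed';
}

// A "node" here is just a replica holding a stock counter
const makeNode = (stock) => ({
  stock,
  canApply(op) { return this.stock + op >= 0; },
  apply(op) { this.stock += op; },
});

const replicas = [makeNode(1), makeNode(1), makeNode(1)];
console.log(twoPhaseCommit(replicas, -1)); // committed
console.log(twoPhaseCommit(replicas, -1)); // aborted (stock already 0)
```

<p>The two round trips (prepare, then commit) are also an intuition for why every operation in a transaction costs double capacity.</p>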
<h3 id="heading-itempotency"><strong>Idempotency</strong></h3>
<p>DynamoDB transactions are idempotent. They're identified by a parameter called ClientRequestToken, which the DynamoDB SDK includes automatically in any transaction. If you call the TransactReadItems or TransactWriteItems API without the SDK, you'll need to include it yourself to get transaction idempotency.</p>
<h3 id="heading-isolation"><strong>Isolation</strong></h3>
<p>Transaction isolation (the I in ACID) is achieved through optimistic concurrency control. This means that multiple DynamoDB transactions can be executed concurrently, but if DynamoDB detects a conflict, one of the transactions will be rolled back and the caller will need to retry the transaction.</p>
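<p>A minimal sketch of that retry logic, using exponential backoff with full jitter (the helper names are made up):</p>

```javascript
// Sketch: exponential backoff with "full jitter" for retrying a
// transaction that was cancelled because of a conflict.
function backoffDelayMs(attempt, baseMs = 50, capMs = 5000) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * exp); // full jitter: anywhere in [0, exp)
}

async function withRetries(operation, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // out of attempts
      const delay = backoffDelayMs(attempt);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

<p>You'd then wrap the transaction call with something like <code>withRetries(() =&gt; dynamodb.send(new TransactWriteCommand(transactItems)))</code>, ideally retrying only on conflict errors such as <code>TransactionCanceledException</code>.</p>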
<h3 id="heading-transactions-on-multiple-tables"><strong>Transactions on Multiple Tables</strong></h3>
<p>DynamoDB Transactions can span multiple tables, but they can't be performed on indexes. Also, propagation of the data to Global Secondary Indexes and DynamoDB Streams always happens after the transaction, and isn't part of it.</p>
<h3 id="heading-pricing-for-dynamodb-transactions"><strong>Pricing for DynamoDB Transactions</strong></h3>
<p>There is no direct cost for using transactions. However, all operations performed on DynamoDB as part of a transaction consume twice the capacity units they regularly would. Write and delete operations consume write capacity, and any condition expression consumes read capacity. This extra capacity is only consumed for the operations on the table; the read and write capacity consumed for updating secondary indexes and for <a target="_blank" href="https://newsletter.simpleaws.dev/p/dynamodb-streams-reacting-to-changes?utm_source=blog&amp;utm_medium=hashnode">DynamoDB Streams</a> isn't affected. When working with <a target="_blank" href="https://newsletter.simpleaws.dev/p/dynamodb-scaling-provisioned-on-demand?utm_source=blog&amp;utm_medium=hashnode">DynamoDB On-Demand Mode</a>, Request Units are doubled, just like Capacity Units.</p>
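<p>As a back-of-the-envelope example of what "double the capacity" means for writes, assuming the standard rule of 1 WCU per KB of item size, rounded up:</p>

```javascript
// A standard write consumes 1 WCU per KB of item size (rounded up);
// the same write inside a transaction consumes twice that.
function writeCapacityUnits(itemSizeBytes, { transactional = false } = {}) {
  const standard = Math.ceil(itemSizeBytes / 1024);
  return transactional ? standard * 2 : standard;
}

console.log(writeCapacityUnits(512));                           // 1 WCU
console.log(writeCapacityUnits(512, { transactional: true }));  // 2 WCUs
console.log(writeCapacityUnits(3500, { transactional: true })); // 8 WCUs
```
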
<hr />
<h2 id="heading-dynamodb-vs-sql-databases"><strong>DynamoDB vs SQL databases</strong></h2>
<p>The whole point of this article and the others I've written about DynamoDB is that SQL databases shouldn't be your default. I've shown you that DynamoDB can handle an e-commerce store just fine, including ACID-compliant transactions. That's because for an e-commerce, and in fact for 95% of the applications we write, we can predict data access patterns. When we can do that, we can optimize the structure and relations of a NoSQL database like DynamoDB and have it perform much better than a relational database for those known and predicted access patterns.</p>
<p>The use case for SQL databases is unknown access patterns! And those come from either giving the user a lot of freedom (which might be a mistake, or might be a core feature of your application), or from doing analytics and ad-hoc queries. In those cases, definitely go for relational databases. Otherwise, see if you can solve it with a NoSQL database like DynamoDB. It'll be much cheaper, and it will scale much better. I'll make one concession though: If all your dev team knows is SQL databases, just go with that unless you have a really strong reason not to.</p>
<h3 id="heading-using-sql-in-dynamodb"><strong>Using SQL in DynamoDB</strong></h3>
<p>This is gonna blow your mind: You can actually query DynamoDB using SQL! Or more specifically, a SQL-compatible language called <a target="_blank" href="https://aws.amazon.com/blogs/database/a-partiql-deep-dive-understanding-the-language-bringing-sql-queries-to-aws-non-relational-database-services/">PartiQL</a>. Amazon developed PartiQL as an internal tool, and it was made generally available by AWS. It can be used on SQL databases, semi-structured data, or NoSQL databases, so long as the engine supports it.</p>
<p>With PartiQL you could <strong>theoretically</strong> change your Postgres database for a DynamoDB database without rewriting any queries. In reality, you need to consider all of these points:</p>
<ul>
<li><p>Why are you even changing? It's not going to be easy.</p>
</li>
<li><p>How are you going to migrate all the data?</p>
</li>
<li><p>You need to make sure no queries are triggering a Scan in DynamoDB, because we know those are slow and very expensive. You can <a target="_blank" href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-iam.html#access-policy-ql-iam-example6">use an IAM policy to deny full-table Scans</a>.</p>
</li>
<li><p>Again, why are you even changing?</p>
</li>
</ul>
<p>I'm not saying there isn't a good reason to change, but I'm going to assume it's not worth the effort, and you'll have to prove me otherwise. Remember that replicating the data somewhere else for a different access pattern is a perfectly valid strategy (in fact, that's exactly how DynamoDB GSIs work). We'll discuss it further in a future issue.</p>
<h2 id="heading-are-there-any-limitations-to-using-transactions-in-dynamodb"><strong>Are there any limitations to using transactions in DynamoDB?</strong></h2>
<p>Yes, there are some limitations to using transactions in DynamoDB. A transaction is limited to a maximum of 100 unique items and a total of 4 MB of data, no two actions in it can target the same item, and all the items must belong to tables in the same AWS account and region. And as mentioned above, transactions can't target indexes.</p>
<h2 id="heading-best-practices">Best Practices</h2>
<h3 id="heading-operational-excellence">Operational Excellence</h3>
<ul>
<li><p><strong>Monitor transaction latencies:</strong> Monitor latencies of your DynamoDB transactions to identify performance bottlenecks and address them. Use CloudWatch metrics and AWS X-Ray to collect and analyze performance data.</p>
</li>
<li><p><strong>Error handling and retries:</strong> Handle errors and implement exponential backoff with jitter for retries in case of transaction conflicts.</p>
</li>
</ul>
<h3 id="heading-security">Security</h3>
<ul>
<li><strong>Fine-grained access control:</strong> Assign an IAM Role to your backend with an IAM Policy that only allows the specific actions that it needs to perform, only on the specific tables that it needs to access. You can even do this per record and per attribute. This is least privilege.</li>
</ul>
<h3 id="heading-reliability">Reliability</h3>
<ul>
<li><strong>Consider a Global Table:</strong> You can make your DynamoDB table multi-region using a Global Table. Making the rest of your app multi-region is more complicated than that, but at least the DynamoDB part is easy.</li>
</ul>
<h3 id="heading-performance-efficiency">Performance Efficiency</h3>
<ul>
<li><strong>Optimize provisioned throughput:</strong> If you're using Provisioned Mode, you'll need to set your Read and Write Capacity Units appropriately. You can also set them to auto-scale, but it's not instantaneous. Remember <a target="_blank" href="https://newsletter.simpleaws.dev/p/sqs-throttle-database-writes-dynamodb?utm_source=blog&amp;utm_medium=hashnode">the article on using SQS to throttle writes</a>.</li>
</ul>
<h3 id="heading-cost-optimization">Cost Optimization</h3>
<ul>
<li><strong>Optimize transaction sizes:</strong> Minimize the number of items and attributes involved in a transaction to reduce consumed read and write capacity units. Remember that transactions consume twice as much capacity, so optimizing the operations in a transaction is doubly important.</li>
</ul>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Data Loss, Replication and Disaster Recovery on AWS]]></title><description><![CDATA[Note: This content was originally published at the Simple AWS newsletter.
Imagine this scenario: You have some data that's absolutely critical to your business. If you lose it, it's a disaster! How do you recover?
Data Loss Scenarios
First, we need t...]]></description><link>https://blog.guilleojeda.com/data-loss-replication-and-disaster-recovery-on-aws</link><guid isPermaLink="true">https://blog.guilleojeda.com/data-loss-replication-and-disaster-recovery-on-aws</guid><category><![CDATA[AWS]]></category><category><![CDATA[data]]></category><category><![CDATA[Devops]]></category><category><![CDATA[architecture]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Tue, 31 Oct 2023 18:37:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1698777328204/0580cad7-43a9-4b33-a127-95a760738a3e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Note: This content was originally published at the</em> <a target="_blank" href="https://www.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode"><strong><em>Simple AWS newsletter</em></strong></a><em>.</em></p>
<p>Imagine this scenario: You have some data that's absolutely critical to your business. If you lose it, it's a disaster! How do you recover?</p>
<h2 id="heading-data-loss-scenarios">Data Loss Scenarios</h2>
<p>First, we need to define what we mean when we say "lose it". How do you lose data? Let's consider some scenarios, and what we can do in each case.</p>
<h3 id="heading-data-loss-because-of-hardware-failure">Data Loss Because of Hardware Failure</h3>
<p>As I'm sure you know, computer hardware is sensitive equipment, which will inevitably fail at some point. When working with AWS we don't really see or manage the hardware, but we're still vulnerable to hardware failures. That's why AWS services publicly advertise their durability: <a target="_blank" href="https://newsletter.simpleaws.dev/p/ebs-basics-best-practices?utm_source=blog&amp;utm_medium=hashnode">EBS</a> offers 99.9% or 99.999% depending on volume type, while S3 offers 99.999999999% (referred to as 11 nines, or 11 9s). 11 nines of durability means that if you store 10 million objects, you can expect to lose one object every 10,000 years.</p>
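<p>The arithmetic behind that claim is straightforward:</p>

```javascript
// The 11-nines math: an annual object loss rate of
// 1 - 0.99999999999 = 1e-11, applied to 10 million stored objects.
const annualLossRate = 1e-11;    // 11 nines of durability
const objectsStored = 10_000_000;

const expectedLossesPerYear = objectsStored * annualLossRate;
const yearsPerLostObject = 1 / expectedLossesPerYear;

console.log(expectedLossesPerYear); // ~0.0001 objects lost per year
console.log(yearsPerLostObject);    // ~10,000 years per lost object
```
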
<h4 id="heading-how-to-prevent-it">How to prevent it</h4>
<p>S3 should be more than enough to protect you from bit rot or simultaneous hardware failures. If you're not storing your critical data in S3, at least start backing it up there. You can create snapshots of EBS volumes or RDS instances, which are stored in S3.</p>
<h3 id="heading-data-loss-because-of-human-error">Data Loss Because of Human Error</h3>
<p>This includes scenarios where you or anyone on your team (with legitimate access and good intentions) accidentally deletes or overwrites data. In S3, it can be deleting an object or an entire bucket. In EBS, EFS or anything mounted in the file system, it can be a typo when running a command like <code>rm -rf</code>. In a database, it's more often than not a query run with the wrong parameters, such as a SQL <code>UPDATE</code> with no <code>WHERE</code> clause.</p>
<p>Automated processes are also included in this scenario: when they delete or overwrite data they shouldn't have touched, the root cause is always a human error in programming or configuring them.</p>
<h4 id="heading-how-to-prevent-it-1">How to prevent it</h4>
<p>The first step is to understand that anyone can make a mistake, no matter how skilled or careful. Training and clearly defined procedures will reduce the probability of mistakes, but they'll never take it to 0. Limiting access and implementing guardrails and additional confirmations will reduce it further.</p>
<p>Overall, the best way to protect from this is to have backups of the data that can't be overwritten or deleted through the same means. For example, you can copy database snapshots to another AWS account.</p>
<h3 id="heading-data-loss-because-of-hacks-or-ransomware">Data Loss Because of Hacks or Ransomware</h3>
<p>In this case, you're dealing with a malicious actor intentionally trying to delete the data, or make it inaccessible to you. The most common scenarios are Ransomware attacks, where an attacker either steals or encrypts the data, and asks you for money in exchange for granting you access to it.</p>
<p>Attackers gain the ability to affect your data in AWS through credentials. This can be your own username and password stolen, the IAM role of an EC2 instance that the attacker gained access to, or any other way that they can gain AWS credentials to your account.</p>
<h4 id="heading-how-to-prevent-it-2">How to prevent it</h4>
<p>Basic security measures such as <a target="_blank" href="https://newsletter.simpleaws.dev/p/7-must-do-security-best-practices-for-your-aws-account?utm_source=blog&amp;utm_medium=hashnode">security best practices for AWS accounts</a>, minimum privileges, and application security go a long way. Requiring Multi-Factor Authentication for certain AWS operations, such as deleting objects in S3, is another good measure.</p>
<p>What can happen is that an attacker gains some form of access, often not enough to compromise the entire AWS account or access your data, and then performs lateral movements and privilege escalations to gradually gain more access. A really simple example would be an EC2 instance with no access to S3 but with an IAM role whose policy grants <code>iam:*</code> permissions. An attacker with access to that instance can't immediately encrypt an S3 bucket, but they can use the instance's credentials to create a new IAM user for themselves, one that does have access to S3.</p>
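<p>To make that escalation path concrete, here's a small, hypothetical helper (not an AWS API, just a sketch) that scans a policy document for wildcard <code>iam:*</code> grants, the kind of permission that enables this attack:</p>

```python
def has_iam_wildcard(policy: dict) -> bool:
    """Return True if any Allow statement grants iam:* (or *) actions."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if any(a in ("*", "iam:*") for a in actions):
            return True
    return False

risky = {"Version": "2012-10-17",
         "Statement": [{"Effect": "Allow", "Action": "iam:*", "Resource": "*"}]}
safe = {"Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "*"}]}
```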
<p>A way to protect against that is to store backups where they can't be tampered with from the environment that holds the original data. A good example, which I'll show you how to configure, is to set up a separate AWS account (let's call it Account B) and replicate there all objects in an S3 bucket. So long as there's no way to delete those objects in Account B from Account A, there's no path for an attacker with access to Account A to delete or encrypt the data in Account B. This doesn't completely eliminate the risk! But it makes it much less likely to occur, since an attacker would need to succeed at two separate attacks, one to gain access to Account A and one to Account B. This, coupled with ways to <a target="_blank" href="https://newsletter.simpleaws.dev/p/cloudtrail-cloudwatch-logs-login-detection-alert?utm_source=blog&amp;utm_medium=hashnode">detect failed sign in attempts to AWS</a>, significantly improves your security.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h2 id="heading-disaster-recovery-metrics-rpo-and-rto">Disaster Recovery Metrics: RPO and RTO</h2>
<p>Before we move on to the solution, there are two things I want to discuss briefly, which will determine how often you perform your backups and what backup strategies and/or technologies you can use. They're two of the most common Disaster Recovery metrics:</p>
<h3 id="heading-recovery-point-objective-rpo">Recovery Point Objective (RPO)</h3>
<p>This is a measure of how much time can pass from when data is written to when it's backed up. It's measured in time units, usually hours or minutes. Any data written less than "RPO" ago isn't guaranteed to be backed up, and in the event of a disaster it might not be found in the backups, and would be lost.</p>
<p>For example, an RPO of 12 hours means any data written less than 12 hours ago isn't guaranteed to be backed up. A typical way to implement an RPO of 12 hours is to create backups twice a day, for example at 00:00 and 12:00. That doesn't mean everything newer than 12 hours is lost, though: if a disaster occurs at 13:00, the only data lost would be from the previous hour, the time elapsed since the last backup.</p>
<p>The reason RPO isn't exactly equivalent to the time between backups is that backups can also be implemented in a continuous or nearly continuous way. For our every-12-hours example we're assuming the backup is instantaneous, which isn't exactly true but is a good approximation. If we run backups every minute, then the 30 seconds that the backup process might take is no longer a number we can ignore, and our RPO would be 1 minute and 30 seconds.</p>
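<p>As a quick sketch of that arithmetic, the effective RPO is the backup interval plus the duration of the backup process itself:</p>

```python
def effective_rpo_seconds(interval_seconds: float, backup_duration_seconds: float) -> float:
    """Worst-case age of un-backed-up data at the moment a disaster hits."""
    return interval_seconds + backup_duration_seconds

# Backups every 12 hours with a near-instant backup process: RPO = 12 hours.
twelve_hours = effective_rpo_seconds(12 * 3600, 0)
# Backups every minute that take 30 seconds to complete: RPO = 1 min 30 s.
one_minute_thirty = effective_rpo_seconds(60, 30)
```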
<p>The duration of the backup process is sometimes called replication delay. For example, Aurora has a replication delay of 1 minute between the primary instance and its replicas. That gives a Disaster Recovery strategy based on an Aurora replica an RPO of 1 minute, since the replication process starts nearly immediately after data is changed.</p>
<p>Different data can have different RPOs. For example, data stored in S3 can have an RPO of 1 hour, and data on RDS an RPO of 6 hours. That's perfectly normal, and you should consider how bad it would be to lose all new data from the last X time to decide whether you're good with your numbers or need to improve them. RPO can be heavily constrained by the technology used to store the data and the technologies and techniques used to back it up. For example, an RPO of 1 hour is normal for S3 because the easiest backup method to set up for S3 is Cross-Region Replication, a feature already built into S3 with an RPO of 1 hour (15 minutes if you enable Replication Time Control).</p>
<h3 id="heading-recovery-time-objective-rto">Recovery Time Objective (RTO)</h3>
<p>This is the target time between when you detect that a failure is happening and when you have the backup live and serving traffic at the same quality of service as if there was no failure. It's measured in time units, usually minutes or hours.</p>
<p>For example, if you're backing up your RDS database with RDS Snapshots, your RTO is going to be more or less the time it takes you to create a new RDS instance from the snapshot (usually between 30 minutes and 2 hours, depending on the size of the snapshot).</p>
<p>More accurately, your RTO would be the time between when you detect a failure in the original RDS instance and when the new RDS instance created from the snapshot is serving traffic. If you've automated this process, 99% of the recovery time is going to be creating the RDS instance. If you haven't, you need to take into account the time it takes you to:</p>
<ol>
<li><p>Receive an alert</p>
</li>
<li><p>View that alert</p>
</li>
<li><p>Log in to the system</p>
</li>
<li><p>Find the correct snapshot</p>
</li>
<li><p>Figure out the correct configurations for the new RDS instance</p>
</li>
<li><p>Launch the creation of a new RDS instance</p>
</li>
<li><p>Switch over traffic to the new RDS instance</p>
</li>
</ol>
<p>Every part of that which you can automate will reduce your RTO, and also reduce the chance that you make a mistake and, for example, restore the incorrect snapshot or create the new RDS instance in the wrong VPC. Automate whatever you can (you can automate all of that). Start with the longest steps; I wrote them like that on purpose.</p>
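<p>As an example of automating one of those steps, here's a sketch of "find the correct snapshot": a pure function that picks the newest available snapshot from a list shaped like the response of boto3's <code>describe_db_snapshots</code> (the snapshot names below are made up for illustration):</p>

```python
from datetime import datetime, timezone

def latest_snapshot(snapshots: list[dict]) -> dict:
    """Return the completed snapshot with the most recent SnapshotCreateTime."""
    completed = [s for s in snapshots if s.get("Status") == "available"]
    return max(completed, key=lambda s: s["SnapshotCreateTime"])

snaps = [
    {"DBSnapshotIdentifier": "daily-2023-10-30", "Status": "available",
     "SnapshotCreateTime": datetime(2023, 10, 30, tzinfo=timezone.utc)},
    {"DBSnapshotIdentifier": "daily-2023-10-31", "Status": "available",
     "SnapshotCreateTime": datetime(2023, 10, 31, tzinfo=timezone.utc)},
    # Still being created, so not a valid restore candidate yet.
    {"DBSnapshotIdentifier": "in-progress", "Status": "creating",
     "SnapshotCreateTime": datetime(2023, 10, 31, 12, tzinfo=timezone.utc)},
]
newest = latest_snapshot(snaps)
```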
<h2 id="heading-disaster-recovery-in-aws">Disaster Recovery in AWS</h2>
<p>In AWS jargon, Disaster Recovery means being able to get the entire system back online in the case of an AWS Region failing. For that, you'll need to have the data available in that other region, as well as any additional resources required to access the data, such as the KMS key used to encrypt it (remember that KMS keys are regional by default, and you can create multi-region keys).</p>
<p>Getting the entire system back online also requires you to stand up compute capacity (be it EC2 instances, an ECS on Fargate cluster, Lambda functions, etc), make the data accessible (e.g. launch an RDS instance from the copied RDS snapshot), and switch over traffic. It's a complex process, there are different strategies, and there are multiple things to take into account depending on your RTO and RPO.</p>
<p>The next post is going to be about Disaster Recovery strategies, and being prepared to deploy the entire system in another AWS region. The first step towards that is to have the data accessible, so let's focus on that.</p>
<h2 id="heading-how-to-configure-s3-replication-across-different-aws-accounts">How to Configure S3 Replication Across Different AWS Accounts</h2>
<p>Let's walk through a solution to back up data in S3 to another S3 bucket in a different AWS account. To better protect against different disaster scenarios, you should make sure access to this other AWS account is very restricted.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1698776900461/abb78146-3fe8-4fac-b909-b2f602c59cb7.png" alt="S3 Replication Across Different AWS Accounts" class="image--center mx-auto" /></p>
<h3 id="heading-step-0-preparation">Step 0: Preparation</h3>
<ol>
<li><p>Log in to an AWS account, let's call it Account A.</p>
</li>
<li><p>Open the <a target="_blank" href="https://console.aws.amazon.com/s3/">S3 console</a></p>
</li>
<li><p>Click Create bucket</p>
</li>
<li><p>Enter a name for the source bucket (must be unique across all AWS)</p>
</li>
<li><p>Scroll down to Bucket Versioning and select Enable</p>
</li>
<li><p>Click Create</p>
</li>
<li><p>Copy or write down the Account ID of Account A (you'll need it later)</p>
</li>
<li><p>Log in to a different AWS account, let's call it Account B</p>
</li>
<li><p>Open the <a target="_blank" href="https://console.aws.amazon.com/s3/">S3 console</a></p>
</li>
<li><p>Click Create bucket</p>
</li>
<li><p>Enter a name for the destination bucket (must be unique across all AWS). Write down the name.</p>
</li>
<li><p>Scroll down to Bucket Versioning and select Enable</p>
</li>
<li><p>Click Create</p>
</li>
<li><p>Copy or write down the Account ID of Account B (you'll need it later)</p>
</li>
</ol>
<h3 id="heading-step-1-enable-replication-in-the-source-bucket">Step 1: Enable Replication in the Source Bucket</h3>
<ol>
<li><p>Log back in to Account A and go to S3</p>
</li>
<li><p>Click on the source bucket (the one you created on Step 0)</p>
</li>
<li><p>Click the Management tab</p>
</li>
<li><p>Scroll down to Replication rules and click Create replication rule</p>
</li>
<li><p>Under Replication rule name enter a name for the rule, such as cross-account replication</p>
</li>
<li><p>Under Choose a rule scope, select Apply to all objects in the bucket</p>
</li>
<li><p>In the Destination section, under Destination select Specify a bucket in another account</p>
</li>
<li><p>Under Account ID enter the Account ID of Account B (where the destination bucket is)</p>
</li>
<li><p>Under Bucket name, enter the name of the destination bucket</p>
</li>
<li><p>Check Change object ownership to destination bucket owner</p>
</li>
<li><p>In the IAM role section, under IAM role open the dropdown and select Create new role</p>
</li>
<li><p>Click Save</p>
</li>
<li><p>Click Submit (we don't have any existing objects, so what we choose here doesn't really matter)</p>
</li>
<li><p>In the Replication configuration settings section, under IAM role, copy the name of the IAM role that was created automatically</p>
</li>
</ol>
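<p>The replication rule configured in the console above can also be expressed in code. This is a sketch of the parameters you'd pass to boto3's <code>put_bucket_replication</code>; the bucket names, account ID, and role ARN are placeholders, and the snippet only builds the request, it doesn't call AWS.</p>

```python
def replication_request(source_bucket: str, dest_bucket: str,
                        dest_account_id: str, role_arn: str) -> dict:
    """Build the put_bucket_replication request for cross-account replication."""
    return {
        "Bucket": source_bucket,
        "ReplicationConfiguration": {
            "Role": role_arn,  # the role the console created in Step 1
            "Rules": [{
                "ID": "cross-account-replication",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = apply to all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",
                    "Account": dest_account_id,
                    # "Change object ownership to destination bucket owner"
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }],
        },
    }

req = replication_request(
    "my-source-bucket", "my-destination-bucket", "222222222222",
    "arn:aws:iam::111111111111:role/service-role/my-replication-role")
```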
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1698776910272/f9c114ad-40e1-4b09-951a-e9a6a574c56b.png" alt="Screenshot of S3 Replication" class="image--center mx-auto" /></p>
<h3 id="heading-step-2-update-the-policy-on-the-destination-bucket">Step 2: Update the Policy on the Destination Bucket</h3>
<ol>
<li><p>Log in to Account B and go to S3</p>
</li>
<li><p>Click on the destination bucket (the one you created on Step 0)</p>
</li>
<li><p>Click the Permissions tab</p>
</li>
<li><p>Next to Bucket policy, click Edit</p>
</li>
<li><p>In the following policy, replace <code>ID_OF_ACCOUNT_A</code> with the ID of the account where the source bucket is, <code>NAME_OF_THE_IAM_ROLE</code> with the last value you copied in Step 1, and <code>NAME_OF_THE_DESTINATION_BUCKET</code> with the name of the destination bucket. Then click Save.</p>
</li>
</ol>
<pre><code class="lang-json">{
   <span class="hljs-attr">"Version"</span>:<span class="hljs-string">"2012-10-17"</span>,
   <span class="hljs-attr">"Id"</span>:<span class="hljs-string">""</span>,
   <span class="hljs-attr">"Statement"</span>:[
      {
         <span class="hljs-attr">"Sid"</span>:<span class="hljs-string">"Set-permissions-for-objects"</span>,
         <span class="hljs-attr">"Effect"</span>:<span class="hljs-string">"Allow"</span>,
         <span class="hljs-attr">"Principal"</span>:{
            <span class="hljs-attr">"AWS"</span>:<span class="hljs-string">"arn:aws:iam::ID_OF_ACCOUNT_A:role/service-role/NAME_OF_THE_IAM_ROLE"</span>
         },
         <span class="hljs-attr">"Action"</span>:[<span class="hljs-string">"s3:ReplicateObject"</span>, <span class="hljs-string">"s3:ReplicateDelete"</span>],
         <span class="hljs-attr">"Resource"</span>:<span class="hljs-string">"arn:aws:s3:::NAME_OF_THE_DESTINATION_BUCKET/*"</span>
      },
      {
         <span class="hljs-attr">"Sid"</span>:<span class="hljs-string">"Set permissions on bucket"</span>,
         <span class="hljs-attr">"Effect"</span>:<span class="hljs-string">"Allow"</span>,
         <span class="hljs-attr">"Principal"</span>:{
            <span class="hljs-attr">"AWS"</span>:<span class="hljs-string">"arn:aws:iam::ID_OF_ACCOUNT_A:role/service-role/NAME_OF_THE_IAM_ROLE"</span>
         },
         <span class="hljs-attr">"Action"</span>:[<span class="hljs-string">"s3:List*"</span>, <span class="hljs-string">"s3:GetBucketVersioning"</span>, <span class="hljs-string">"s3:PutBucketVersioning"</span>],
         <span class="hljs-attr">"Resource"</span>:<span class="hljs-string">"arn:aws:s3:::NAME_OF_THE_DESTINATION_BUCKET"</span>
      }
   ]
}
</code></pre>
<h3 id="heading-step-3-upload-an-object-to-the-source-bucket">Step 3: Upload an Object to the Source Bucket</h3>
<ol>
<li><p>Log back in to Account A and go to S3</p>
</li>
<li><p>Click on the source bucket</p>
</li>
<li><p>Click Upload</p>
</li>
<li><p>Click Add files, select one or more files, and click Open</p>
</li>
<li><p>Click Upload</p>
</li>
<li><p>Verify that the file is uploaded to the Source bucket</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1698776917944/3c5034e5-cc8b-4a5a-95a9-bd4fc5d80d66.png" alt="Screenshot of the S3 console with an object in the source S3 bucket" class="image--center mx-auto" /></p>
<h3 id="heading-step-4-upload-an-object-to-the-source-bucket">Step 4: Verify the Object in the Destination Bucket</h3>
<ol>
<li><p>Log back in to Account B and go to S3</p>
</li>
<li><p>Click on the destination bucket</p>
</li>
<li><p>Verify that the same file you uploaded to the source bucket is present on the destination bucket</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1698776927114/52bebfef-3b08-45ef-b761-0a80427c3dce.png" alt="Screenshot of the S3 console with the object in the destination S3 bucket" class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-recommended-tools-and-resources-about-data-replication"><strong>Recommended Tools and Resources about Data Replication</strong></h2>
<p>When data is replicated across several disks, data loss happens when one of those disks fails and, while a new copy is being created to replace it, another disk also fails. The probability of losing data clearly depends on how often a disk fails (the Mean Time Between Failures, or MTBF) and how long it takes to recreate that copy (the Mean Time To Recovery, or MTTR). But if I gave you those numbers, would you know how to calculate the probability of data loss? I didn't, until I read <a target="_blank" href="https://blog.synology.com/data-durability">this article</a>!</p>
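<p>As a simplified sketch of that calculation (the linked article goes deeper), assume two replicas and exponentially distributed failures: after one disk fails, data is lost if the surviving copy also fails before the rebuild completes, i.e. within the MTTR window.</p>

```python
import math

def p_loss_after_one_failure(mtbf_hours: float, mttr_hours: float) -> float:
    """Probability the surviving replica fails within the MTTR window,
    assuming exponentially distributed failures with the given MTBF."""
    return 1 - math.exp(-mttr_hours / mtbf_hours)

# A disk with a 100,000-hour MTBF and a 10-hour rebuild time: roughly a
# 1-in-10,000 chance of losing data each time a disk fails.
p = p_loss_after_one_failure(100_000, 10)
# A slower rebuild (larger MTTR) makes data loss more likely.
p_slow_rebuild = p_loss_after_one_failure(100_000, 20)
```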
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Detecting Failed Sign In Attempts to AWS and Alerting]]></title><description><![CDATA[Note: This content was originally published at the Simple AWS newsletter.
Imagine this scenario: You're careful with security, and you set up Multi-Factor Authentication for your AWS IAM or IAM Identity Center user. At one point, a malicious agent of...]]></description><link>https://blog.guilleojeda.com/detecting-failed-sign-in-attempts-to-aws-and-alerting</link><guid isPermaLink="true">https://blog.guilleojeda.com/detecting-failed-sign-in-attempts-to-aws-and-alerting</guid><category><![CDATA[AWS]]></category><category><![CDATA[Security]]></category><category><![CDATA[best practices]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Thu, 26 Oct 2023 15:26:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1698333697034/1cf2c11e-ee87-46ed-a495-65d5ecd74f1d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Note: This content was originally published at the</em> <a target="_blank" href="https://www.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode"><strong><em>Simple AWS newsletter</em></strong></a><em>.</em></p>
<p>Imagine this scenario: You're careful with security, and you set up Multi-Factor Authentication for your AWS IAM or IAM Identity Center user. At one point, a malicious agent of evilness figures out your password, either via phishing, keylogging, or some other technique. The only thing standing between them and $50k in bitcoin mined on your AWS account (and the $500k AWS bill paid with your credit card) is your MFA device. They don't have access to it (yet!), so you're safe (for now!).</p>
<p>The correct response is obvious: change your password, so they're back to square one. The problem is that most of the time you don't know a password has been compromised until someone uses it. This is the reason we should rotate passwords regularly!</p>
<p>Let me propose an extra layer of security: <strong>Get notified every time a login attempt fails because of a failed MFA check.</strong></p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h2 id="heading-understanding-cloudtrail-event-logs">Understanding CloudTrail event logs</h2>
<p>AWS CloudTrail is a service that logs all requests performed against AWS in your account. This includes all actions and requests (even unauthorized ones) done through the Console, CLI, AWS SDKs, and APIs.</p>
<p>By default (already enabled when you create your account) and for free, CloudTrail offers you a viewable, searchable, downloadable, and immutable record of all events that happened in the past 90 days, called Event History.</p>
<p>You can also create Trails, which let you export all or a subset of CloudTrail events to S3 or CloudWatch Logs. This way, you can use other AWS services like Athena and OpenSearch, or external tools like Elasticsearch, to analyze CloudTrail events. Event history is per region, while Trails can be created for all regions in an account, and even for all accounts in an Organization.</p>
<p>Events are JSON objects that contain what was attempted, when, by whom, details of the request, the result, and sometimes details of the response, such as the failure reason.</p>
<p>For example, this is what a ConsoleLogin event that fails looks like (I hid a few details replacing them with HIDDEN):</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"eventVersion"</span>: <span class="hljs-string">"1.08"</span>,
    <span class="hljs-attr">"userIdentity"</span>: {
        <span class="hljs-attr">"type"</span>: <span class="hljs-string">"IAMUser"</span>,
        <span class="hljs-attr">"principalId"</span>: <span class="hljs-string">"HIDDEN"</span>,
        <span class="hljs-attr">"accountId"</span>: <span class="hljs-string">"HIDDEN"</span>,
        <span class="hljs-attr">"accessKeyId"</span>: <span class="hljs-string">"HIDDEN"</span>,
        <span class="hljs-attr">"userName"</span>: <span class="hljs-string">"HIDDEN"</span>
    },
    <span class="hljs-attr">"eventTime"</span>: <span class="hljs-string">"2023-08-24T21:07:08Z"</span>,
    <span class="hljs-attr">"eventSource"</span>: <span class="hljs-string">"signin.amazonaws.com"</span>,
    <span class="hljs-attr">"eventName"</span>: <span class="hljs-string">"ConsoleLogin"</span>,
    <span class="hljs-attr">"awsRegion"</span>: <span class="hljs-string">"us-east-1"</span>,
    <span class="hljs-attr">"sourceIPAddress"</span>: <span class="hljs-string">"HIDDEN"</span>,
    <span class="hljs-attr">"userAgent"</span>: <span class="hljs-string">"HIDDEN"</span>,
    <span class="hljs-attr">"errorMessage"</span>: <span class="hljs-string">"Failed authentication"</span>,
    <span class="hljs-attr">"requestParameters"</span>: <span class="hljs-literal">null</span>,
    <span class="hljs-attr">"responseElements"</span>: {
        <span class="hljs-attr">"ConsoleLogin"</span>: <span class="hljs-string">"Failure"</span>
    },
    <span class="hljs-attr">"additionalEventData"</span>: {
        <span class="hljs-attr">"LoginTo"</span>: <span class="hljs-string">"https://console.aws.amazon.com/console/home?HIDDEN"</span>,
        <span class="hljs-attr">"MobileVersion"</span>: <span class="hljs-string">"No"</span>,
        <span class="hljs-attr">"MFAUsed"</span>: <span class="hljs-string">"Yes"</span>
    },
    <span class="hljs-attr">"eventID"</span>: <span class="hljs-string">"HIDDEN"</span>,
    <span class="hljs-attr">"readOnly"</span>: <span class="hljs-literal">false</span>,
    <span class="hljs-attr">"eventType"</span>: <span class="hljs-string">"AwsConsoleSignIn"</span>,
    <span class="hljs-attr">"managementEvent"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"recipientAccountId"</span>: <span class="hljs-string">"HIDDEN"</span>,
    <span class="hljs-attr">"eventCategory"</span>: <span class="hljs-string">"Management"</span>,
    <span class="hljs-attr">"tlsDetails"</span>: {
        <span class="hljs-attr">"tlsVersion"</span>: <span class="hljs-string">"TLSv1.3"</span>,
        <span class="hljs-attr">"cipherSuite"</span>: <span class="hljs-string">"TLS_AES_128_GCM_SHA256"</span>,
        <span class="hljs-attr">"clientProvidedHostHeader"</span>: <span class="hljs-string">"signin.aws.amazon.com"</span>
    }
}
</code></pre>
<h2 id="heading-how-are-iam-and-iam-identity-center-events-logged-in-cloudtrail">How are IAM and IAM Identity Center events logged in CloudTrail?</h2>
<p>Naturally, CloudTrail logs events when an IAM or IAM IC user logs in to AWS. However, it's not just one event. Let me show you what happens behind the scenes with IAM IC and IAM.</p>
<h3 id="heading-cloudtrail-events-for-iam-identity-center-users-logging-in">CloudTrail Events for IAM Identity Center users logging in</h3>
<p>These are the events that CloudTrail logs when an IAM Identity Center user logs in:</p>
<ul>
<li><p><strong>CredentialChallenge:</strong> AWS requested some form of credential, such as password or MFA device. Each of these is followed by UserAuthentication and one CredentialVerification event, and this sequence of three is repeated until either all necessary credentials are provided, or CredentialVerification fails.</p>
</li>
<li><p><strong>UserAuthentication:</strong> AWS receives the requested credentials.</p>
</li>
<li><p><strong>CredentialVerification:</strong> AWS checks whether the received credentials are valid. If they are, this event contains: <code>"serviceEventDetails":{ "CredentialChallenge":"Success" }</code>, and the process continues either by requesting the next credential or by authenticating. If the credentials received are invalid, this event contains <code>"serviceEventDetails":{ "CredentialChallenge":"Failure" }</code> and the process stops. The type of credential requested can be found in <code>"additionalEventData":{ "CredentialType":"PASSWORD" }</code>, the value of which can be <code>PASSWORD</code> for the regular password, <code>TOTP</code> for MFA devices that produce temporary codes, <code>WEBAUTHN</code> for web apps using WebAuthn, <code>EXTERNAL_IDP</code> for external identity providers, or <code>RESYNC_TOTP</code> to re-synchronize TOTP devices.</p>
</li>
<li><p><strong>Authenticate:</strong> Once CredentialVerification succeeds and no more credentials are required, this event is logged. This means the user successfully authenticated to IAM Identity Center. If you're authenticating with an external Identity Provider such as Google Workspace, Microsoft AD or Okta, this is the only event you'll see.</p>
</li>
<li><p><strong>Federate:</strong> This event means the IAM IC user assumed an IAM role in an AWS account.</p>
</li>
<li><p><strong>ConsoleLogin:</strong> This event means the IAM IC user logged in to the AWS Console using the assumed IAM role.</p>
</li>
</ul>
<p>The first four events will be logged on the AWS account and region where IAM IC is configured. The last two are logged in the AWS account where the user signs in, in the default region for that user (i.e. the one that's selected when the user signs in).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1698333661768/557e2163-f35e-4bfb-80f4-25702026b2e0.png" alt="Screenshot of CloudTrail events" class="image--center mx-auto" /></p>
<h3 id="heading-cloudtrail-events-for-iam-users-logging-in">CloudTrail Events for IAM users logging in</h3>
<p>IAM is much simpler. The only event is ConsoleLogin, and this is how you can identify what happened:</p>
<ul>
<li><p>If the user signed in successfully, the event contains <code>"responseElements": { "ConsoleLogin": "Success" }</code></p>
</li>
<li><p>If sign in failed, the event contains <code>"responseElements": { "ConsoleLogin": "Failure" }</code> and can contain <code>"errorMessage": "Failed authentication"</code></p>
</li>
<li><p>If the user used MFA (regardless of success or failure), the event contains <code>"additionalEventData": { "MFAUsed": "Yes" }</code></p>
</li>
<li><p>If the user is the root user, the event contains <code>"userIdentity": { "type": "Root" }</code></p>
</li>
</ul>
<p>All of these events are always logged in the us-east-1 region.</p>
<h2 id="heading-identifying-when-signing-in-fails-due-to-mfa">Identifying when signing in fails due to MFA</h2>
<p>The last section was a bit detailed, but it contains all the information we need to identify when a login attempt fails the MFA check!</p>
<p><strong>For IAM IC</strong>, we need to find CredentialVerification events which contain either <code>"additionalEventData":{ "CredentialType":"TOTP" }</code> or <code>"additionalEventData":{ "CredentialType":"WEBAUTHN" }</code>, and contain <code>"serviceEventDetails":{ "CredentialChallenge":"Failure" }</code>.</p>
<p><strong>For IAM</strong>, we need to find ConsoleLogin events which contain <code>"responseElements": { "ConsoleLogin": "Failure" }</code> and <code>"additionalEventData": { "MFAUsed": "Yes" }</code>.</p>
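<p>Those two conditions can be expressed as a predicate over parsed CloudTrail events. This mirrors the logic of the CloudWatch Logs metric filters, just in Python; the sample events below are simplified, not full CloudTrail records.</p>

```python
def is_failed_mfa_signin(event: dict) -> bool:
    """True if this CloudTrail event is a sign-in that failed an MFA check."""
    extra = event.get("additionalEventData", {})
    # IAM: a ConsoleLogin that used MFA and failed.
    if event.get("eventName") == "ConsoleLogin":
        return (event.get("responseElements", {}).get("ConsoleLogin") == "Failure"
                and extra.get("MFAUsed") == "Yes")
    # IAM Identity Center: a TOTP/WebAuthn credential check that failed.
    if event.get("eventName") == "CredentialVerification":
        return (extra.get("CredentialType") in ("TOTP", "WEBAUTHN")
                and event.get("serviceEventDetails", {}).get("CredentialChallenge") == "Failure")
    return False

iam_failure = {"eventName": "ConsoleLogin",
               "responseElements": {"ConsoleLogin": "Failure"},
               "additionalEventData": {"MFAUsed": "Yes"}}
ic_failure = {"eventName": "CredentialVerification",
              "additionalEventData": {"CredentialType": "TOTP"},
              "serviceEventDetails": {"CredentialChallenge": "Failure"}}
# A failed password check is not an MFA failure, so it shouldn't match.
password_failure = {"eventName": "CredentialVerification",
                    "additionalEventData": {"CredentialType": "PASSWORD"},
                    "serviceEventDetails": {"CredentialChallenge": "Failure"}}
```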
<p>You can do this manually in the CloudTrail event history, but if we want to automate it and get notifications, we need to send the events to CloudWatch Logs.</p>
<h2 id="heading-sending-cloudtrail-events-to-cloudwatch-logs">Sending CloudTrail events to CloudWatch Logs</h2>
<ol>
<li><p>Sign in to the AWS Console. For IAM IC, sign in to the management account of the Organization</p>
</li>
<li><p>Go to CloudTrail</p>
</li>
<li><p>Click Create trail</p>
</li>
<li><p>For Trail name, enter management-events</p>
</li>
<li><p>If you're configuring this for an organization, check Enable for all accounts in my organization</p>
</li>
<li><p>Select Create new S3 bucket and under Trail log bucket and folder enter a name, or leave it as default</p>
</li>
<li><p>Under AWS KMS alias, enter a name for a KMS key to encrypt the logs</p>
</li>
<li><p>Under CloudWatch Logs, check Enabled</p>
</li>
<li><p>Under Log group name, enter a name for the CloudWatch Logs log group</p>
</li>
<li><p>Under Role name, enter a name for the IAM Role that'll let CloudTrail put logs to CloudWatch Logs</p>
</li>
<li><p>Click Next</p>
</li>
<li><p>Leave these options as default and click Next again</p>
</li>
<li><p>Click Create trail</p>
</li>
<li><p>Wait until the Status column changes to Logging (green)</p>
</li>
</ol>
<h2 id="heading-configuring-alerts-and-notifications-for-cloudwatch-logs">Configuring Alerts and Notifications for CloudWatch Logs</h2>
<p>First, let's create a filter to view the failed logins in CloudWatch Logs. Follow the steps, and on Step 5 choose the right filter pattern depending on whether you're using IAM or IAM IC.</p>
<h3 id="heading-filtering-failed-login-attempts-due-to-mfa">Filtering failed login attempts due to MFA</h3>
<ol>
<li><p>Open the CloudWatch console</p>
</li>
<li><p>In the panel on the left, under Logs, click Log groups.</p>
</li>
<li><p>Click on the name of the log group that you created for the trail</p>
</li>
<li><p>Click the Metric filters tab, and click Create metric filter</p>
</li>
<li><p>Under Filter pattern, enter the pattern that corresponds to IAM or IAM IC, depending on which you're using:</p>
<ol>
<li><p>For IAM: <code>{ ($.eventName = ConsoleLogin) &amp;&amp; ($.additionalEventData.MFAUsed = "Yes") &amp;&amp; ($.responseElements.ConsoleLogin = "Failure") }</code></p>
</li>
<li><p>For IAM IC: <code>{ ($.eventName = CredentialVerification) &amp;&amp; (($.additionalEventData.CredentialType = "TOTP") || ($.additionalEventData.CredentialType = "WEBAUTHN")) &amp;&amp; ($.serviceEventDetails.CredentialChallenge = "Failure")}</code></p>
</li>
</ol>
</li>
<li><p>Click Next</p>
</li>
<li><p>For Filter name, enter SignInFailedMFA</p>
</li>
<li><p>Under Metric namespace, enter CloudTrailMetrics</p>
</li>
<li><p>For Metric name, enter SigninMFAFailureCount</p>
</li>
<li><p>For Metric value, enter 1</p>
</li>
<li><p>Click Next</p>
</li>
<li><p>Click Create metric filter</p>
</li>
</ol>
<h3 id="heading-configuring-alerts-for-failed-login-attempts-due-to-mfa">Configuring alerts for failed login attempts due to MFA</h3>
<ol>
<li><p>On the Metric filters tab, find the metric filter you just created, select it and click Create alarm</p>
</li>
<li><p>Under Whenever SigninMFAFailureCount is..., select Greater/Equal</p>
</li>
<li><p>Under than…, enter 1</p>
</li>
<li><p>Click Next</p>
</li>
<li><p>Under Send a notification to the following SNS topic, select Create new topic</p>
</li>
<li><p>Under Create a new topic…, enter SignInFailedMFAAlarm</p>
</li>
<li><p>Under Email endpoints that will receive the notification…, enter your email address</p>
</li>
<li><p>Click the Create topic button that's right below where you just entered your email address</p>
</li>
<li><p>Click Next</p>
</li>
<li><p>Under Alarm name, enter SignInFailedMFAAlarm</p>
</li>
<li><p>Click Next</p>
</li>
<li><p>Click Create alarm</p>
</li>
<li><p>Open your email inbox, open the email titled AWS Notification - Subscription Confirmation, and click Confirm subscription</p>
</li>
</ol>
<h2 id="heading-testing-alerts-when-signing-in-to-aws-fails-due-to-incorrect-mfa-code">Testing Alerts When Signing in to AWS Fails Due to Incorrect MFA Code</h2>
<p>To test it, attempt to sign in and enter an incorrect MFA code. There's going to be a delay of a few minutes between when the event actually occurs and when you receive the notification: CloudTrail usually takes around 5 minutes to deliver the event logs to CloudWatch Logs, and the alarm can take a couple more minutes to fire.</p>
<p>You can also monitor the alarm in CloudWatch Alarms:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1698333632290/7f86a818-ffc0-4489-9dca-5f8e342c286c.png" alt="Screenshot of the SignInFailedMFAAlarm in alarm state" class="image--center mx-auto" /></p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Server-Side Rendering with AWS Amplify]]></title><description><![CDATA[Note: This content was originally published at the Simple AWS newsletter.
Back in the Paleolithic, which for software means 30 years ago, we had HTML, CSS and JavaScript, and we wrote all the structure of the website in HTML. When we wanted to create...]]></description><link>https://blog.guilleojeda.com/server-side-rendering-with-aws-amplify</link><guid isPermaLink="true">https://blog.guilleojeda.com/server-side-rendering-with-aws-amplify</guid><category><![CDATA[AWS]]></category><category><![CDATA[Frontend Development]]></category><category><![CDATA[Server side rendering]]></category><category><![CDATA[AWS Amplify]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Tue, 24 Oct 2023 18:31:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1698172044299/2404a95d-ba34-40e0-8bd7-55f968c7590f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Note: This content was originally published at the</em> <a target="_blank" href="https://www.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode"><strong><em>Simple AWS newsletter</em></strong></a><em>.</em></p>
<p>Back in the Paleolithic, which for software means 30 years ago, we had HTML, CSS and JavaScript, and we wrote all the structure of the website in HTML. When we wanted to create dynamic content for our website, we added PHP or Java code in the middle of that HTML, and we would run that code on the server and it would output the HTML that we wanted to put there. That process of producing the final HTML for the website by running code is called rendering, and back then it happened Server-Side.</p>
<p>Then came web components and frameworks like Angular, React and Vue, and they changed the paradigm a bit. Now we didn't write plain HTML with snippets of PHP or Java, but instead we wrote JavaScript code that would then output the entire HTML code. Since it was JavaScript, we could run it on the browser, at the user's computer. That's called Client-Side Rendering.</p>
<p>A huge benefit of Client-Side Rendering was that we were using the user's CPU. That was awesome for our pockets, we didn't have to pay for that compute capacity. However, it meant the user had to wait however long their computer took to render that website. Mind you, it wasn't considered a big problem back then. Remember that back in those days users' expectations were pretty different: it was considered normal to wait 3 or 5 seconds for a website to load. Nowadays, more than 2 seconds seems like an eternity.</p>
<p>As compute capacity got cheaper, someone had the idea of bringing the rendering back to the server. The idea was to use our significantly more powerful servers to render the website much faster than the user's computer could, reducing load time significantly. Sure, we were paying extra, but overall the improved user experience was worth the extra money. Cloud computing played a big part there as well: Not only was infrastructure cheaper per CPU-hour, but it also required fewer engineering hours to create and maintain.</p>
<p>We didn't bring back the old languages though! Instead, we started running that same React code on our servers, as if they were the user's browser. A big reason for this is that, while all of this back and forth was happening, frontends got increasingly complex. That led to developers being classified as frontend or backend, and while there are full stack devs, most are just strong on one side and really weak on the other one (I'm one of those cases!). So, the folks who knew how to code UIs didn't know Java or PHP (nowadays not even many backend devs do), and the folks who did know those languages didn't know how to code UIs. The solution? Let's create a framework like Next.js that runs that same React code on our servers. That way, we get the best of both worlds: Frontend frameworks with Server-Side Rendering!</p>
<p>That's how we went from Server-Side Rendering, to Client-Side Rendering, and back to Server-Side Rendering. I've said it often, technology is cyclic.</p>
<h2 id="heading-how-does-amplify-help">How does Amplify help</h2>
<p>AWS Amplify is a set of tools that help frontend devs use AWS infrastructure and even build backends, without knowing a lot about infrastructure or backend. There's two parts to it: <strong>Amplify Hosting</strong> is a managed service that provides hosting and CI/CD for serverless apps. <strong>Amplify Studio</strong> is a visual development environment that lets you build a UI and a backend as with a no-code tool, with reusable components. We're going to use Amplify Hosting, but you might be interested in checking out Amplify Studio.</p>
<h2 id="heading-why-not-an-ec2-instance">Why not an EC2 instance?</h2>
<p>We know that AWS Amplify is going to be using EC2 instances behind the scenes, right? So, since we really know our way around AWS (I mean, you're reading a newsletter about AWS!), why shouldn't we just use EC2 instances for this instead of relying on a managed service? You could! It's a bit of our classic buy vs build decision, right? And you know by now that I always recommend you default to managed services and only build if there's a good reason to do so. I'm tempted to recommend the same here, but Amplify takes the managed in managed service and cranks it up to 11, and the price reflects that.</p>
<p>Amplify works as a self-service platform that people with little to no knowledge of cloud infrastructure can use to develop and deploy their applications. It's an extreme version of a managed service: You're no longer solving just one part of the problem, but the entire problem of hosting an app in the cloud. It does get pretty expensive, so you probably want to consider EC2, or an ECS cluster on Fargate (more expensive than plain EC2, but easier). Let's run some numbers, so you can see how much it can really cost you.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h2 id="heading-aws-amplify-pricing">AWS Amplify Pricing</h2>
<p>Here's how Amplify charges you:</p>
<ul>
<li><p>Build and deploy: $0.01 per minute</p>
</li>
<li><p>Data storage: $0.023 per GB per month</p>
</li>
<li><p>Data transfer out: $0.15 per GB served</p>
</li>
<li><p>Requests for SSR: $0.30 per 1 million requests + $0.20 per GB-hour</p>
</li>
</ul>
<p>For example, say you're a startup with the following assumptions:</p>
<ul>
<li><p>10,000 daily active users</p>
</li>
<li><p>5 devs, each doing 2 commits a day</p>
</li>
<li><p>Average build time is 3 minutes</p>
</li>
<li><p>The team works Monday to Friday (20 days/month) <strong>&lt;-- this is the least realistic assumption for startups</strong></p>
</li>
<li><p>The app is 25 MB</p>
</li>
<li><p>Average page size is 1.5 MB</p>
</li>
</ul>
<p>Here are our calculations:</p>
<p><code>Total build time per month = devs * commits/day * days/month * avg. build time = 5 * 2 * 20 * 3 = 600 build minutes per month. 600 * $0.01 = $6</code></p>
<p><code>Monthly GB served = daily active users * average page size * days/month = 10,000 * (1.5/1024) * 30 = 439.45 GB. 439.45 * $0.15 = $65.92</code></p>
<p><code>Monthly GB storage = app size * builds/month = (25/1024) * (5*2*20) = 4.88 GB. 4.88 * $0.023 = $0.11</code></p>
<p><strong>Total charges = $6 + $65.92 + $0.11 = $72.03/month</strong></p>
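<p>Those numbers are easy to double-check with a few lines of Python, using the same assumptions:</p>

```python
# Assumptions from the startup example above
devs, commits_per_day, work_days = 5, 2, 20
build_minutes_each = 3
daily_users, page_mb, days = 10_000, 1.5, 30
app_mb = 25

build_minutes = devs * commits_per_day * work_days * build_minutes_each
build_cost = build_minutes * 0.01                      # $0.01/minute

gb_served = daily_users * (page_mb / 1024) * days
transfer_cost = gb_served * 0.15                       # $0.15/GB served

gb_stored = (app_mb / 1024) * (devs * commits_per_day * work_days)
storage_cost = gb_stored * 0.023                       # $0.023/GB-month

total = build_cost + transfer_cost + storage_cost
print(f"${total:.2f}/month")  # $72.03/month
```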
<p>Ok, that wasn't so bad, right? Well, it's dominated by Monthly GB served, so let's run those numbers with CloudFront:</p>
<ul>
<li><p>For the most expensive regions: <code>439.45 * $0.12 = $52.73</code></p>
</li>
<li><p>For the least expensive regions: <code>439.45 * $0.085 = $37.35</code></p>
</li>
</ul>
<p>Amplify is 25% to 75% more expensive! Is that a lot? It depends. 75% more on $0.10 is less than the electricity I spent typing this sentence (shame on me! especially for adding this parenthesis to make the sentence long enough for that claim to be true). 75% more on $1,000 is $750, and I'll set up CloudFront for you for that money!</p>
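<p>That 25% to 75% range comes straight from the data transfer line. Here's the arithmetic:</p>

```python
gb_served = 439.45
amplify_cost = gb_served * 0.15      # $65.92 (Amplify data transfer out)
cf_high = gb_served * 0.12           # $52.73 (priciest CloudFront regions)
cf_low = gb_served * 0.085           # $37.35 (cheapest CloudFront regions)

# How much more expensive Amplify's transfer pricing is than CloudFront's
print(f"{amplify_cost / cf_high - 1:.0%}")  # 25%
print(f"{amplify_cost / cf_low - 1:.0%}")   # 76%
```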
<h2 id="heading-aws-amplify-for-the-backend">AWS Amplify for the backend</h2>
<p>Amplify also lets you host a backend, which it runs in Lambda functions. You don't have a lot of control over it, but it works well for its intended audience: People who wouldn't know what to do if they had a lot of control over their Lambda functions. Amplify also lets you consume other AWS services easily, through declarative and easy-to-use libraries. That way, you can consume Cognito or S3 from the frontend without knowing a lot about Cognito or S3. Here's the <a target="_blank" href="https://github.com/aws-amplify">complete list of libraries for Amplify</a>, and you can check the Readme of <a target="_blank" href="https://github.com/aws-amplify/amplify-js">the JavaScript one</a> as an example of its features.</p>
<h2 id="heading-scenario">Scenario</h2>
<p>You work 8 hours a day as an engineer, and you want to launch a startup in your free time. You know you want good practices, but you don't have the time to set everything up manually. You want to start with the website, built with React and Next.js. You want Server-Side Rendering, and obviously a CI/CD Pipeline. You want everything to be serverless so you pay as little as possible while you get the hang of running a startup, getting users and all of that (which is actually the most difficult part).</p>
<h2 id="heading-solution">Solution</h2>
<p>Host the application in AWS Amplify, which handles hosting and CI/CD. As you build the backend, either use Amplify to create your Lambda functions, or go with something more traditional like Serverless or SAM, depending on your expertise.</p>
<h2 id="heading-step-by-step-instructions">Step-by-step Instructions</h2>
<h3 id="heading-step-0-setup">Step 0: Setup</h3>
<ol>
<li><p>Download and install Node.js from <a target="_blank" href="https://nodejs.org/en/download">the official website</a>. Alternatively, install nvm and your favorite version of Node.js</p>
</li>
<li><p>Install npm if it didn't come with Node.js.</p>
</li>
<li><p>Install yarn: <code>npm install --global yarn</code></p>
</li>
<li><p>Install git</p>
</li>
</ol>
<h3 id="heading-step-1-create-a-nextjs-app">Step 1: Create a Next.js app</h3>
<ol>
<li><p>Open a terminal</p>
</li>
<li><p>Run the following command: <code>yarn create next-app</code></p>
</li>
<li><p>Follow the prompts:</p>
<p> What is your project named? simpleaws-app</p>
<p> Would you like to use TypeScript? No</p>
<p> Would you like to use ESLint? Yes</p>
<p> Would you like to use Tailwind CSS? No</p>
<p> Would you like to use <code>src/</code> directory? Yes</p>
<p> Would you like to use App Router? (recommended) Yes</p>
<p> Would you like to customize the default import alias? No</p>
</li>
<li><p>Change directories to the app's directory: <code>cd simpleaws-app</code></p>
</li>
<li><p>Start the app locally: <code>yarn dev</code></p>
</li>
<li><p>Open your browser, go to http://localhost:3000 and check that the page loads correctly.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1698172057997/8b05d636-d6b0-4f23-a375-4d0bbb1e0745.png" alt="Screenshot of the app working" class="image--center mx-auto" /></p>
<h3 id="heading-step-2-create-a-git-repo-for-the-app">Step 2: Create a git repo for the app</h3>
<ol>
<li><p>Go to <a target="_blank" href="https://github.com/new">https://github.com/new</a> and create a new repository</p>
</li>
<li><p>Init the repo locally: <code>git init</code></p>
</li>
<li><p>Add the GitHub repository as a remote (replace <code>YOUR_USERNAME</code> and <code>PROJECT_NAME</code> with your values): <code>git remote add origin git@github.com:YOUR_USERNAME/PROJECT_NAME.git</code></p>
</li>
<li><p>Add the files, commit and push:<br /> <code>git add .</code></p>
<p> <code>git commit -m 'initial commit'</code></p>
<p> <code>git push origin main</code></p>
</li>
</ol>
<h3 id="heading-step-3-create-an-amplify-project">Step 3: Create an Amplify project</h3>
<ol>
<li><p>Go to the <a target="_blank" href="https://us-east-1.console.aws.amazon.com/amplify/home">Amplify console</a></p>
</li>
<li><p>Scroll down to the <strong>Get started</strong> section, and under Amplify Hosting, click Get started</p>
</li>
<li><p>Select GitHub and click Continue</p>
</li>
<li><p>Click Authorize AWS Amplify (us-east-1) (the green button)</p>
</li>
<li><p>Select your user or organization and click Continue</p>
</li>
<li><p>Select Only select repositories, click the dropdown Select repositories and click on your repo. Click Install &amp; Authorize</p>
</li>
<li><p>Authenticate to GitHub with your security key, or click Use your password (it's in a really small font) and enter your password.</p>
</li>
<li><p>Click the dropdown under Recently updated repositories and select your repo. Leave branch as main, and click Next</p>
</li>
<li><p>Verify the Build and test settings (they should be fine, they were created by Next.js when you initialized the project)</p>
</li>
<li><p>Check Allow AWS Amplify to automatically deploy all files hosted in your project root directory</p>
</li>
<li><p>Click Next</p>
</li>
<li><p>Click Save and deploy</p>
</li>
<li><p>Wait until Provision, Build and Deploy are done</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1698172104468/fe3773f3-c21b-4480-895a-d8b2046877a7.png" alt="Provision, Build and Deploy steps" class="image--center mx-auto" /></p>
<h3 id="heading-step-4-test-the-app">Step 4: Test the app</h3>
<ol>
<li>Click on the link under the window icon with the Amazon arrow. Verify that the website loads correctly.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1698172126549/dda6b2cb-6917-4f18-bd66-4efa2b975faf.png" alt="How to open the website" class="image--center mx-auto" /></p>
<h3 id="heading-step-5-delete-the-app">Step 5: Delete the app</h3>
<ol>
<li><p>On the top right corner click Actions</p>
</li>
<li><p>Click Delete app</p>
</li>
<li><p>Enter "delete"</p>
</li>
<li><p>Click Delete</p>
</li>
</ol>
<h2 id="heading-explanation">Explanation</h2>
<h3 id="heading-step-0-setup-1">Step 0: Setup</h3>
<p>Just installing some tools and dependencies.</p>
<h3 id="heading-step-1-create-a-nextjs-app-1">Step 1: Create a Next.js app</h3>
<p>Next.js has this awesome project initializer, that creates all the basic scaffolding you need. It is a bit opinionated, but I've never met anyone who doesn't like it (if you don't like it, let me know, you'll be the first!). Besides, it fits the scenario: You just want to get things done.</p>
<h3 id="heading-step-2-create-a-git-repo-for-the-app-1">Step 2: Create a git repo for the app</h3>
<p>We need the app in a git repo so Amplify can track changes to branches and pull the code from there. You can use GitHub, GitLab, Bitbucket or CodeCommit. You can also not set up a git repo and upload the code manually (or reference an S3 bucket), but in that case Amplify can't do the CI/CD for you.</p>
<h3 id="heading-step-3-create-an-amplify-project-1">Step 3: Create an Amplify project</h3>
<p>The only odd thing here would be the build steps. Amplify auto-detects them from your package.json file, and creates the configuration file that you saw in that step. Of course, you can edit it. I'd recommend you keep it in line with your package.json file though.</p>
<h3 id="heading-step-4-test-the-app-1">Step 4: Test the app</h3>
<p>After everything is deployed, Amplify will be serving your app in a URL that looks something like <code>branch-name.d1m7bkiki6tdw1.amplifyapp.com</code>. You can set up a custom domain through the Amplify console. Don't just go to Route 53 and point a domain to your Amplify URL, that'll fail because of the SSL certificates.</p>
<h3 id="heading-step-5-delete-the-app-1">Step 5: Delete the app</h3>
<p>Let's not pretend like we always remember to delete the things we deploy. This is a friendly reminder to delete the app!</p>
<hr />
<h2 id="heading-best-practices">Best Practices</h2>
<h3 id="heading-operational-excellence">Operational Excellence</h3>
<ul>
<li><p><strong>Set up a branch for each environment:</strong> Create a dev env where Amplify deploys from the dev branch, and a prod env where Amplify deploys from the main branch. Only commit to those branches through pull requests. That way, merging a PR means a release in that environment.</p>
</li>
<li><p><strong>Monitor Performance:</strong> Amplify has a monitoring service that allows you to view logs, build events, and other metrics.</p>
</li>
</ul>
<h3 id="heading-security">Security</h3>
<ul>
<li><strong>Use Amplify's Built-in Authentication:</strong> Amplify integrates with Cognito for user authentication. This lets you use a Cognito user pool really easily.</li>
</ul>
<h3 id="heading-reliability">Reliability</h3>
<ul>
<li><strong>Data Backup and Versioning:</strong> If you're using Amplify DataStore, regularly backup your data.</li>
</ul>
<h3 id="heading-performance-efficiency">Performance Efficiency</h3>
<ul>
<li><strong>API Caching:</strong> If you're using GraphQL with AppSync, enable caching to improve API response times and reduce the load on your backend.</li>
</ul>
<h3 id="heading-cost-optimization">Cost Optimization</h3>
<ul>
<li><strong>Consider not using Amplify:</strong> I don't want to position myself for or against Amplify, because the decision depends on your scenario and your problems (remember there is no single best solution!). Just consider the costs, the tradeoffs, and analyze what's the best solution for you. It may be Amplify, it may be doing things manually! Overall, remember that the cost of maintaining software is much higher than the cost of building the initial version.</li>
</ul>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Understanding How DynamoDB Scales]]></title><description><![CDATA[Note: This content was originally published at the Simple AWS newsletter.
As you probably know, DynamoDB is a NoSQL database. It's a managed, serverless service, meaning you just create a Table (that's the equivalent of a Database in Postgres), and A...]]></description><link>https://blog.guilleojeda.com/understanding-how-dynamodb-scales</link><guid isPermaLink="true">https://blog.guilleojeda.com/understanding-how-dynamodb-scales</guid><category><![CDATA[AWS]]></category><category><![CDATA[DynamoDB]]></category><category><![CDATA[Databases]]></category><category><![CDATA[architecture]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Thu, 19 Oct 2023 14:40:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1697726322878/f8784b70-447b-4fd5-bcb7-975452e80e57.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Note: This content was originally published at the</em> <a target="_blank" href="https://www.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode"><strong><em>Simple AWS newsletter</em></strong></a><em>.</em></p>
<p>As you probably know, DynamoDB is a NoSQL database. It's a managed, serverless service, meaning you just create a Table (that's the equivalent of a Database in Postgres), and AWS manages the underlying nodes. It's highly available, meaning nodes are distributed across 3 AZs, so loss of an AZ doesn't bring down the service. The nodes aren't like an RDS Failover Replica though, instead data is partitioned (that's why Dynamo has a Partition Key!) and split across nodes, plus replicated on other nodes for availability and resilience. That means DynamoDB can scale horizontally!</p>
<p>There are two modes for DynamoDB, which affect how it scales and how you're billed:</p>
<h2 id="heading-dynamodb-provisioned-mode">DynamoDB Provisioned Mode</h2>
<p>You define some capacity, and DynamoDB provisions that capacity for you. This is pretty similar to provisioning an Auto Scaling Group of EC2 instances, but imagine the size of the instance is fixed, and it's one group for reads and another one for writes. Here's how that capacity translates into actual read and write operations.</p>
<h3 id="heading-capacity-in-provisioned-mode">Capacity in Provisioned Mode</h3>
<p>Capacity is provisioned separately for reads and writes, and it's measured in Capacity Units.</p>
<p><strong>1 Read Capacity Unit (RCU) is equivalent to 1 strongly consistent read of up to 4 KB, per second</strong>. Eventually consistent reads consume half that capacity. Reads over 4 KB consume 1 RCU (1/2 for eventually consistent) per 4 KB, rounded up. That means if you have 5 RCUs, you can perform 10 eventually consistent reads every second, or 2 strongly consistent reads for 7 KB of data each (remember it's rounded up) plus 1 strongly consistent read for 1 KB of data (again, it's rounded up).</p>
<p>Write Capacity Units (WCU) work the same, but for writes. <strong>1 WCU = 1 write per second, of up to 1 KB</strong>. So, with 5 WCUs, you can perform 1 write operation per second of 4.5 KB, or 5 writes of less than 1 KB.</p>
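<p>The rounding rules are easy to get wrong, so here's a quick Python sketch of the capacity math (the helper names are mine, not an AWS API):</p>

```python
import math

def rcus_for_read(item_kb, strongly_consistent=True):
    """RCUs consumed by one read: 1 RCU per 4 KB (rounded up),
    halved for eventually consistent reads."""
    units = math.ceil(item_kb / 4)
    return units if strongly_consistent else units / 2

def wcus_for_write(item_kb):
    """WCUs consumed by one write: 1 WCU per 1 KB, rounded up."""
    return math.ceil(item_kb)

# The examples from the text:
print(rcus_for_read(7))                             # 2 (7 KB rounds up to 8 KB)
print(rcus_for_read(4, strongly_consistent=False))  # 0.5
print(wcus_for_write(4.5))                          # 5
```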
<p>Remember that all operations inside a transaction consume twice the capacity, because DynamoDB uses two-phase commit for transactions. Every node has to simulate the operation and then actually perform it, so it's twice the work.</p>
<p>Also remember that local secondary indexes (LSIs) make each write consume additional capacity: 1 extra write operation for puts or deletes, 2 extra operations per update, with actual WCUs depending on how much data is written on the index (not the base table). Reads on an LSI that query for attributes that aren't projected on the LSI also consume additional read capacity: 1 additional read operation (RCUs for it depend on consistency and size of the data read) for each item that is read from the base table.</p>
<p>If you exceed the capacity (e.g. you have 5 RCUs and in one second you try to do 6 strongly consistent reads), you receive a <code>ProvisionedThroughputExceededException</code>. Your code should catch this and retry. DynamoDB doesn't overload from this, it'll just keep accepting operations up to your capacity and reject the rest (this is called load shedding btw). The AWS SDK already implements retries with exponential backoff, and you can tune the parameters.</p>
<h3 id="heading-tokens-burst-capacity-and-adaptive-capacity">Tokens, Burst Capacity and Adaptive Capacity</h3>
<p>Under the hood, DynamoDB splits the data across several partitions, and capacity is split evenly across those partitions. So, if you set 30 RCUs for a table and it has 3 partitions, each partition gets 10 RCUs. Each partition has a "<strong>token bucket</strong>", which refills at a rate of 1 token per second per RCU (so 10 tokens per second in this case). Each read (strongly consistent, up to 4 KB) consumes 1 token, and if there are no more tokens, you get a <code>ProvisionedThroughputExceededException</code>.</p>
<p>There's two separate buckets, one for reads and one for writes. They both work exactly the same, the only difference is the operations that consume those tokens, and the size of the data (4 KB for reads, 1 KB for writes). I'll talk about RCUs, but the same is true for WCUs.</p>
<p>The tokens bucket has a maximum capacity of 300 * RCUs. For our example of 10 RCUs per partition (remember that each partition has its own bucket), it has a maximum capacity of 3000 tokens, refilling at 10 tokens per second. That means, with no operations going on, it takes 5 minutes to fill up to capacity.</p>
<p>If there's a sudden spike in traffic, these extra tokens that have been piling up will be used to execute those operations, effectively increasing the partition's capacity temporarily. For example, if every user performs 1 strongly consistent read per second on this partition, your RCUs of 10 per partition would serve 10 users. Suppose you don't have users for 5 minutes, the bucket fills up. Then, 20 users come in all of a sudden, making 20 reads per second on this partition. Thanks to those stored tokens, the partition can sustain those 20 reads per second for 5 minutes, even though the partition's RCUs are 10. AWS doesn't handle huge spikes instantaneously (e.g. it won't serve 3000 reads in a second, even if you do have 3000 tokens), but it scales this over a few seconds. This is called <strong>Burst Capacity</strong>, and it's completely separate from Auto Scaling (and will happen even with Auto Scaling disabled).</p>
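<p>The token bucket behavior is simple enough to simulate. This toy model (my own illustration of the mechanism described above, not DynamoDB's actual implementation) shows a 10-RCU partition with a full bucket absorbing 20 reads per second:</p>

```python
class PartitionBucket:
    """Toy model of one partition's read token bucket: refills at `rcus`
    tokens/second, holds at most 300 * rcus tokens."""
    def __init__(self, rcus):
        self.rcus = rcus
        self.max_tokens = 300 * rcus
        self.tokens = self.max_tokens  # start full: 5+ idle minutes

    def tick(self, reads):
        """Advance one second: refill, then serve reads (1 token each, i.e.
        strongly consistent reads of up to 4 KB). Returns throttled reads."""
        self.tokens = min(self.max_tokens, self.tokens + self.rcus)
        served = min(reads, self.tokens)
        self.tokens -= served
        return reads - served

# 20 reads/second against a 10-RCU partition, sustained for almost 5 minutes:
bucket = PartitionBucket(rcus=10)
throttled = sum(bucket.tick(20) for _ in range(299))
print(throttled)  # 0: the stored tokens absorb the whole burst
```

Once the bucket runs dry, the partition is back to serving only its refill rate of 10 reads per second, and the rest get throttled.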
<p>Another thing that happens with uneven load on partitions is <strong>Adaptive Capacity</strong>. RCUs are split evenly across partitions, so each of our 3 partitions will have 10 RCUs. If partition 1 is the one getting these 20 users, and the others are getting 0, then AWS can assign part of those 20 spare RCUs you have (remember that we set 30 RCUs on the table) to the partition that's handling that load. The maximum RCUs a partition can get is 1.5x the RCUs it normally gets, so in this case it would get 15 RCUs. That means 5 of our 20 spare RCUs are assigned to that partition, and the other 15 are unused capacity. That would let our partition handle those 20 users for 10 minutes instead of 5 (assuming a full token bucket). This also happens separately from Auto Scaling.</p>
<p>Burst Capacity doesn't effectively change RCUs, but it can make our table temporarily behave as if it had more RCUs than it really does, thanks to those stored tokens (very similar to how CPU credits work for EC2 burstable instances). It's great for performance, but I wouldn't count it as scaling (in case you forgot, we're talking about scaling DynamoDB).</p>
<p>Adaptive Capacity can actually increase RCUs beyond what's set for the table. If all partitions are getting requests throttled, Adaptive Capacity will increase their RCUs up to the 1.5 multiplier, even if this puts the total RCUs of the table above the value you set. This will only last for a few seconds, after which it goes back to the normal RCUs, and to throttling requests. I guess that technically counts as scaling the table's RCUs? Yeah, I'll count that as a win for me. Let's get to the real scaling though.</p>
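<p>Putting numbers on the example above (plain arithmetic, assuming the same full 3,000-token bucket as before):</p>

```python
table_rcus = 30
partitions = 3
base = table_rcus / partitions               # 10 RCUs per partition

# Adaptive Capacity can boost a hot partition up to 1.5x its base share
hot_max = base * 1.5                         # 15 RCUs
borrowed = hot_max - base                    # 5 of the 20 spare RCUs get used
idle_spare = (table_rcus - base) - borrowed  # the other 15 stay unused

# With 20 reads/s against 15 RCUs, a full 3,000-token bucket drains at
# 5 tokens/s instead of 10, so the burst lasts 10 minutes instead of 5
burst_seconds = 3000 / (20 - hot_max)        # 600 seconds
print(hot_max, borrowed, idle_spare, burst_seconds)
```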
<h3 id="heading-scaling-in-provisioned-mode">Scaling in Provisioned Mode</h3>
<p>This is the real scaling. DynamoDB tables continuously send metrics to CloudWatch, CloudWatch triggers alarms when those metrics cross a certain threshold, DynamoDB gets notified about that and modifies Capacity Units accordingly.</p>
<p>On DynamoDB you enable Auto Scaling, set a minimum and maximum capacity units, and set a target utilization (%). You can enable scaling separately for Reads and Writes.</p>
<p>In the table metrics (handled by CloudWatch) you can view provisioned and consumed capacity, and throttled request count.</p>
<p>Here's the problem though: Auto Scaling is based on CloudWatch Alarms that trigger when the metric is above/below the threshold in at least 3 data points over <strong>5 minutes</strong>. So not only does Auto Scaling not respond fast enough for sudden spikes, it doesn't respond at all if the spikes last less than 5 minutes. That's why the default target is 70%, and allowed values are between 20% and 90%: <strong>You need to leave some margin for traffic to continue growing while Auto Scaling takes its sweet time to figure out it should scale.</strong></p>
<p>Luckily, we have Burst Capacity and Adaptive Capacity to deal with those infrequent spikes, and retries can help you eventually serve the requests that were initially throttled. You probably can't retry your way through an Auto Scaling event (imagine waiting 5 minutes for a request…), but retries can give Burst Capacity the few seconds it needs to kick in. Adaptive Capacity adjusts slower, and it's intended to fix uneven traffic across partitions, so don't count on it.</p>
<p>Now, this is all looking a lot like EC2 instances in an Auto Scaling Group, right? And we're seeing the same problems: <strong>We need to keep some extra capacity provisioned, as a buffer for traffic spikes. Even then, if a spike is big and fast enough, we can't respond to it!</strong> (except for some bursting). Why do we have these problems, if DynamoDB is supposed to be serverless? Well, it is serverless: you don't manage servers, but they're still there. What did you expect, magic? Well, sufficiently advanced science is indistinguishable from magic. Let's see if DynamoDB's other mode is close enough to serverless magic.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h2 id="heading-dynamodb-on-demand-mode">DynamoDB On-Demand Mode</h2>
<p><strong>Welcome to the real serverless mode of DynamoDB!</strong> With On-Demand mode, you <em>don't need to worry about scaling your DynamoDB table, it happens automatically</em>. Wait, you really believed that? Of course you need to worry! But it's much simpler to understand and manage, and it results in far fewer throttled requests.</p>
<h3 id="heading-capacity-in-on-demand-mode">Capacity in On-Demand Mode</h3>
<p>The cost of reads and writes stays the same: a read operation consumes 1 Read Request Unit (RRU) for every 4 KB read (half that if it's eventually consistent), and a write operation consumes 1 Write Request Unit (WRU) for every 1 KB written. Twice for transactions, LSIs increase it, yadda yadda. Same as for Provisioned Mode, we just swapped Capacity Units for Request Units.</p>
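<p>As a sketch of that billing math (a simplified calculator; it ignores secondary indexes):</p>

```python
import math

def read_request_units(item_kb, strongly_consistent=True, transactional=False):
    units = math.ceil(item_kb / 4)             # 1 RRU per 4 KB read, rounded up
    if transactional:
        return units * 2                       # transactional reads cost double
    if not strongly_consistent:
        return units * 0.5                     # eventually consistent reads cost half
    return units

def write_request_units(item_kb, transactional=False):
    units = math.ceil(item_kb / 1)             # 1 WRU per 1 KB written, rounded up
    return units * 2 if transactional else units

print(read_request_units(4))                             # 1
print(read_request_units(4, strongly_consistent=False))  # 0.5
print(write_request_units(3, transactional=True))        # 6
```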
<p>Here's the difference: There is no capacity for you to set. You're billed for every actual operation, and DynamoDB manages capacity automatically and transparently. However, the table still has a set capacity and it still scales, and understanding how is important.</p>
<h3 id="heading-scaling-in-on-demand-mode">Scaling in On-Demand Mode</h3>
<p>Every newly created table in On-Demand mode starts with 4,000 WCUs and 12,000 RCUs (yeah, that's a lot). You're not billed for those capacity units though; you'll only be billed for actual operations.</p>
<p>Every time your peak usage goes over 50% of the currently assigned capacity, DynamoDB increases the capacity of that table to double your peak. So, suppose you used 5,000 WRUs: now your table's WCUs are 10,000. This growth has a cooldown period of 30 minutes, meaning it won't happen again until 30 minutes after the last increase.</p>
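<p>That behavior can be modeled roughly like this (my own simplified sketch of what's described above; none of this is an official algorithm):</p>

```python
def on_demand_capacity(peaks, initial=4000, cooldown_minutes=30):
    """Walk through (minute, peak_wru) observations and track the table's WCUs.

    Whenever a peak exceeds 50% of current capacity, capacity jumps to double
    that peak, but not again until the cooldown has elapsed.
    """
    capacity = initial
    last_increase = None
    for minute, peak in peaks:
        off_cooldown = last_increase is None or minute - last_increase >= cooldown_minutes
        if peak > capacity * 0.5 and off_cooldown:
            capacity = peak * 2
            last_increase = minute
    return capacity

# A 5,000 WRU peak against the initial 4,000 WCUs doubles capacity to 10,000.
print(on_demand_capacity([(0, 5000)]))               # 10000
# A second spike 10 minutes later is still inside the cooldown: no change.
print(on_demand_capacity([(0, 5000), (10, 6000)]))   # 10000
```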
<p>This isn't documented anywhere, and I haven't managed to get official confirmation, but apparently capacity for On-Demand tables never decreases. This seems consistent with how DynamoDB works under the hood: partitions are split in two and assigned to new nodes, with each node having a maximum capacity of 3,000 RCUs and 1,000 WCUs. Apparently partitions are never re-combined, so there's no reason to think capacity for On-Demand tables would decrease. Again, this is a common assumption, not something AWS has published.</p>
<h3 id="heading-switching-from-provisioned-mode-to-on-demand-mode">Switching from Provisioned Mode to On-Demand Mode</h3>
<p>You can switch modes in either direction, but you can only do so once every 24 hours. If you switch from Provisioned Mode to On-Demand Mode, the table's initial RCUs are the maximum of 12,000, your current RCUs, and double the units of your highest previous peak. Same for WCUs: the maximum of 4,000, your current WCUs, and double the highest peak.</p>
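<p>That rule is short enough to write down (a sketch of the rule as stated above):</p>

```python
def initial_on_demand_rcus(current_rcus, highest_peak_rcus):
    # Max of the 12,000 default, the current provisioned RCUs,
    # and twice the highest previous peak.
    return max(12_000, current_rcus, 2 * highest_peak_rcus)

print(initial_on_demand_rcus(3_000, 2_500))    # 12000: the default dominates
print(initial_on_demand_rcus(20_000, 9_000))   # 20000: current provisioned capacity wins
print(initial_on_demand_rcus(5_000, 15_000))   # 30000: double the highest peak wins
```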
<p>If you switch from On-Demand mode to Provisioned Mode, you need to set up your capacity or auto scaling manually.</p>
<p>In either case the switch takes up to 30 minutes, during which the table continues to function like before the switch.</p>
<h2 id="heading-provisioned-vs-on-demand-pricing-comparison">Provisioned vs On-Demand - Pricing Comparison</h2>
<p>In Provisioned Mode, like with anything provisioned, you're billed per provisioned capacity, regardless of how much you actually consume. The price is $0.00065/hour per WCU, and $0.00013/hour per RCU.</p>
<p>In On-Demand Mode you're only billed for Request Units (which is basically Capacity Units that were actually consumed). The price is $1.25 per million WRUs and $0.25 per million RRUs.</p>
<p>Let's consider some scenarios. Assume all reads are strongly consistent and read 4 KB of data, and all writes are outside transactions and for 1 KB of data. Also, there are no secondary indexes. Suppose you have the following traffic pattern:</p>
<ul>
<li><p>Between 2000 and 3000 (average 2500) reads per second during 8 hours of the day (business hours).</p>
</li>
<li><p>Between 400 and 600 (average 500) reads per second during 16 hours of the day (off hours).</p>
</li>
<li><p>Between 200 and 300 (average 250) writes per second during 8 hours of the day (business hours).</p>
</li>
<li><p>Between 50 and 150 (average 100) writes per second during 16 hours of the day (off hours).</p>
</li>
</ul>
<p>With <strong>Provisioned Mode, no Auto Scaling</strong>, we'll need to set 3000 RCUs and 300 WCUs. The price would be $0.39 per hour for reads and $0.195 per hour for writes, for a total of $280.80 + $140.40 = <strong>$421.20 per month</strong>.</p>
<p>With <strong>Provisioned Mode, Auto Scaling</strong> set for a minimum of 400 and a maximum of 3000 RCUs, and a minimum of 50 and a maximum of 300 WCUs, we'll get the following (simplifying by assuming provisioned capacity tracks average consumption exactly; with a 70% utilization target the real bill would be somewhat higher):<br />For business hours: We'll use our average of 2500 reads, so we get $0.325 per hour for reads, and $78/month for reads during business hours. For writes, using our average of 250, we get $0.1625/hour and $39/month.<br />For off hours: Using the average values, $0.065/hour and $31.20/month for reads, and $0.065/hour and $31.20/month for writes.<br />In total, we get <strong>$179.40 per month</strong>.</p>
<p>With <strong>On-Demand Mode</strong>, we'll just use the averages. With 2500 reads per second we have 9,000,000 reads per hour during business hours, which costs us $2.25/hour, or $540/month. We have an average of 250 writes per second, so 900,000 writes per hour, which costs $1.125/hour or $270/month.<br />On off hours we have 1,800,000 reads per hour, for $0.45/hour and $216/month. Writes are 360,000/hour, at $0.45/hour and $216/month.<br />Our grand total is $540 + $270 + $216 + $216 = <strong>$1,242/month</strong>.</p>
<p><em>Note: These prices are only for reads and writes. Storage is priced separately, and so are other features like backups.</em></p>
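<p>The three scenarios can be reproduced with straightforward arithmetic (a 30-day month, with the Auto Scaling bill simplified to exact average consumption):</p>

```python
WCU_HOUR, RCU_HOUR = 0.00065, 0.00013          # provisioned, per capacity unit per hour
WRU_PRICE, RRU_PRICE = 1.25 / 1e6, 0.25 / 1e6  # on-demand, per request unit

BUSINESS_HOURS, OFF_HOURS = 8 * 30, 16 * 30    # hours per month

# Provisioned, no Auto Scaling: pay for peak capacity (3000 RCUs / 300 WCUs) all month.
no_as = (3000 * RCU_HOUR + 300 * WCU_HOUR) * (BUSINESS_HOURS + OFF_HOURS)

# Provisioned with Auto Scaling: approximate the bill with average consumption.
with_as = ((2500 * RCU_HOUR + 250 * WCU_HOUR) * BUSINESS_HOURS
           + (500 * RCU_HOUR + 100 * WCU_HOUR) * OFF_HOURS)

# On-Demand: pay per request; requests per hour = average per second * 3600.
on_demand = ((2500 * 3600 * RRU_PRICE + 250 * 3600 * WRU_PRICE) * BUSINESS_HOURS
             + (500 * 3600 * RRU_PRICE + 100 * 3600 * WRU_PRICE) * OFF_HOURS)

print(round(no_as, 2), round(with_as, 2), round(on_demand, 2))  # 421.2 179.4 1242.0
```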
<h2 id="heading-best-practices-for-scaling-dynamodb">Best Practices for Scaling DynamoDB</h2>
<h3 id="heading-operational-excellence">Operational Excellence</h3>
<ul>
<li><p><strong>Monitor Throttling Metrics:</strong> Keep an eye on the ReadThrottleEvents and WriteThrottleEvents metrics in CloudWatch. Compare them with your app's latency metrics, to determine how much this is impacting your app.</p>
</li>
<li><p><strong>Audit Tables Regularly:</strong> Review your DynamoDB tables to make sure that they're performing and scaling well. This includes reviewing capacity settings, and reviewing indices and keys.</p>
</li>
</ul>
<h3 id="heading-security">Security</h3>
<ul>
<li><strong>Enable Point-In-Time Recovery:</strong> Data corruption doesn't happen, until it happens. Enabling Point-In-Time Recovery allows you to restore a table to a specific state if needed.</li>
</ul>
<h3 id="heading-reliability">Reliability</h3>
<ul>
<li><strong>Pre-Warm On-Demand Tables:</strong> When expecting a big increase in traffic (like a product launch), if you're using an On-Demand table, make sure to pre-warm it. You can do this by switching it to Provisioned Mode, setting its capacity to a large number and keeping it there for a few minutes, and then switching it back to On-Demand so the On-Demand capacity matches the capacity the table had in Provisioned Mode. Remember that you can only switch once every 24 hours.</li>
</ul>
<h3 id="heading-performance-efficiency">Performance Efficiency</h3>
<ul>
<li><p><strong>Use Auto-Scaling:</strong> This one's quite obvious, I hope. But the point is that there's no reason not to use this. Sure, sometimes it isn't fast enough, but in those cases Provisioned Mode without Auto Scaling won't work well either.</p>
</li>
<li><p><strong>Choose the Right Partition Key:</strong> Remember what I said about capacity units being split across partitions? Well, if you pick a PK that doesn't distribute traffic uniformly (or as uniformly as possible), you're going to have a problem called hot partition. This is part of <a target="_blank" href="https://newsletter.simpleaws.dev/p/dynamodb-database-design">DynamoDB Database Design</a>, but as you saw, it affects performance and scaling.</p>
</li>
</ul>
<h3 id="heading-cost-optimization">Cost Optimization</h3>
<ul>
<li><p><strong>Pick The Right Mode:</strong> You saw the numbers in the example. On-Demand will very rarely result in throttling, but it is expensive. Only use it for traffic that spikes faster than the ~5 minutes Provisioned Mode with Auto Scaling needs to react.</p>
</li>
<li><p><strong>Monitor and Adjust Provisioned Capacity:</strong> Regularly review your capacity settings and adjust them. Traffic patterns change over time!</p>
</li>
<li><p><strong>Use Reserved Capacity:</strong> If you have a consistent and predictable workload (like in the pricing example), consider purchasing reserved capacity for DynamoDB. It works similar to Reserved Instances: You reserve it and commit to a year or 3, for a lower price.</p>
</li>
</ul>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Securing the Connection to S3 from EC2]]></title><description><![CDATA[You've deployed your app on an EC2 instance, and there's a file in an S3 bucket that you need to access from the app. You created a public S3 bucket and uploaded the file, and it works! But then you read somewhere that keeping your private files in a...]]></description><link>https://blog.guilleojeda.com/ec2-s3-vpc-endpoint-security</link><guid isPermaLink="true">https://blog.guilleojeda.com/ec2-s3-vpc-endpoint-security</guid><category><![CDATA[AWS]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Thu, 07 Sep 2023 15:02:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1694047210968/862e5fed-0cdb-405a-a03d-90d79fc443ee.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You've deployed your app on an EC2 instance, and there's a file in an S3 bucket that you need to access from the app. You created a public S3 bucket and uploaded the file, and it works! But then you read somewhere that keeping your private files in a public S3 bucket is a bad idea, so you set out to fix it.</p>
<h2 id="heading-set-up-a-restrictive-bucket-policy-and-add-a-vpc-endpoint-with-an-endpoint-policy">Set up a restrictive bucket policy and add a VPC endpoint with an Endpoint Policy</h2>
<p><a target="_blank" href="https://github.com/guilleojeda/simpleaws/tree/main/Issue%2337-VPCEndpoints"><strong>Here's the initial setup</strong></a>, and you can deploy it here:</p>
<p><a target="_blank" href="https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?templateURL=https://simpleaws-public-cfn-templates.s3.amazonaws.com/Issue37/initial-setup.yml&amp;stackName=SimpleAWS37">Deploy initial setup</a></p>
<p>This is what it looks like before the solution:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694047259763/6c26e547-f713-4a9c-986a-58141b1eff15.png" alt="what it looks like before the solution" class="image--center mx-auto" /></p>
<p>This is what it looks like with the solution:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694047244269/f3ca8315-02e8-4ecc-93ad-0fd36e2f99cb.png" alt="what it looks like with the solution" class="image--center mx-auto" /></p>
<h2 id="heading-step-by-step-instructions-to-secure-the-connection-from-ec2-to-s3">Step by step instructions to secure the connection from EC2 to S3</h2>
<h3 id="heading-step-0-test-that-the-connection-is-working">Step 0: Test that the connection is working</h3>
<ul>
<li><p>Open the CloudFormation console</p>
</li>
<li><p>Select the initial state stack</p>
</li>
<li><p>Click the Outputs tab</p>
</li>
<li><p>Copy the value for EC2InstancePublicIp</p>
</li>
<li><p>Paste it in the browser, append :3000 and hit Enter/Return</p>
</li>
</ul>
<h3 id="heading-step-1-create-a-vpc-endpoint">Step 1: Create a VPC Endpoint</h3>
<ul>
<li><p>Go to the VPC console</p>
</li>
<li><p>In the panel on the left, click Endpoints</p>
</li>
<li><p>Click Create Endpoint</p>
</li>
<li><p>Enter a name</p>
</li>
<li><p>In the Services section, enter S3 in the search box, and select the one that says 'com.amazonaws.your_region.s3' (replace 'your_region' with the region where you deployed the initial setup, which is where the S3 bucket is). Then select the one that says Interface in the Type column.</p>
</li>
</ul>
<ul>
<li><p>For VPC, select SimpleAWSVPC from the dropdown list</p>
</li>
<li><p>Under Subnets, select two Availability Zones (us-east-1a and us-east-1b if you deployed in us-east-1), and for each click the dropdown and select the only available subnet</p>
</li>
<li><p>Under Security groups, select the one called VPCEndpointSecurityGroup</p>
</li>
<li><p>Under Policy, pick Full Access for now (we'll change that in Step 2).</p>
</li>
<li><p>Open Additional settings</p>
</li>
<li><p>Check Enable DNS name</p>
</li>
<li><p>Uncheck Enable private DNS only for inbound endpoint</p>
</li>
<li><p>Click Create endpoint</p>
</li>
</ul>
<h3 id="heading-step-2-configure-the-vpc-endpoint-policy">Step 2: Configure the VPC Endpoint Policy</h3>
<ul>
<li><p>In the Amazon VPC console, go to Endpoints</p>
</li>
<li><p>Select the Endpoint you just created</p>
</li>
<li><p>Click the Policy tab</p>
</li>
<li><p>Click Edit Policy</p>
</li>
<li><p>Modify the following JSON by replacing the placeholder values REPLACE_BUCKET_NAME and REPLACE_VPC_ID with the name of your S3 bucket and the ID of SimpleAWSVPC. Then paste it into the Edit Policy page, and click Save.</p>
</li>
</ul>
<pre><code class="lang-json">{
    <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
    <span class="hljs-attr">"Statement"</span>: [
        {
            <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"AllowAccessToSpecificBucket"</span>,
            <span class="hljs-attr">"Principal"</span>: <span class="hljs-string">"*"</span>,
            <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"s3:*"</span>,
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Resource"</span>: [
                <span class="hljs-string">"arn:aws:s3:::REPLACE_BUCKET_NAME"</span>,
                <span class="hljs-string">"arn:aws:s3:::REPLACE_BUCKET_NAME/*"</span>
            ],
            <span class="hljs-attr">"Condition"</span>: {
                <span class="hljs-attr">"StringEquals"</span>: {
                    <span class="hljs-attr">"aws:sourceVpc"</span>: <span class="hljs-string">"REPLACE_VPC_ID"</span>
                }
            }
        }
    ]
}
</code></pre>
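<p>If you'd rather script this than edit the policy by hand, the placeholders can be filled in programmatically (a sketch; the bucket name and VPC ID below are made-up examples):</p>

```python
import json

POLICY_TEMPLATE = """{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowAccessToSpecificBucket",
        "Principal": "*",
        "Action": "s3:*",
        "Effect": "Allow",
        "Resource": [
            "arn:aws:s3:::REPLACE_BUCKET_NAME",
            "arn:aws:s3:::REPLACE_BUCKET_NAME/*"
        ],
        "Condition": {"StringEquals": {"aws:sourceVpc": "REPLACE_VPC_ID"}}
    }]
}"""

def render_policy(bucket_name, vpc_id):
    raw = (POLICY_TEMPLATE
           .replace("REPLACE_BUCKET_NAME", bucket_name)
           .replace("REPLACE_VPC_ID", vpc_id))
    return json.loads(raw)  # fails loudly if the result isn't valid JSON

policy = render_policy("my-example-bucket", "vpc-0123456789abcdef0")
print(policy["Statement"][0]["Resource"][0])  # arn:aws:s3:::my-example-bucket
```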
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h3 id="heading-step-3-set-up-a-more-restrictive-bucket-policy">Step 3: Set up a more restrictive bucket policy</h3>
<ul>
<li><p>Open the S3 console</p>
</li>
<li><p>Click on the bucket that you created with the initial setup</p>
</li>
<li><p>Click on the Permissions tab</p>
</li>
<li><p>Scroll down to Bucket Policy and click Edit</p>
</li>
<li><p>Paste the following policy, replacing the placeholders REPLACE_BUCKET_NAME and REPLACE_VPC_ENDPOINT_ID with their values (REPLACE_VPC_ENDPOINT_ID is not the same as REPLACE_VPC_ID from the previous step). Then click Save changes</p>
</li>
</ul>
<pre><code class="lang-json">{
    <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
    <span class="hljs-attr">"Id"</span>: <span class="hljs-string">"Policy1415115909153"</span>,
    <span class="hljs-attr">"Statement"</span>: [
        {
            <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"Access-only-from-SimpleAWSVPC"</span>,
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Deny"</span>,
            <span class="hljs-attr">"Principal"</span>: <span class="hljs-string">"*"</span>,
            <span class="hljs-attr">"Action"</span>: [
                <span class="hljs-string">"s3:PutObject"</span>,
                <span class="hljs-string">"s3:GetObject"</span>
            ],
            <span class="hljs-attr">"Resource"</span>: [
                <span class="hljs-string">"arn:aws:s3:::REPLACE_BUCKET_NAME"</span>,
                <span class="hljs-string">"arn:aws:s3:::REPLACE_BUCKET_NAME/*"</span>
            ],
            <span class="hljs-attr">"Condition"</span>: {
                <span class="hljs-attr">"StringNotEquals"</span>: {
                    <span class="hljs-attr">"aws:SourceVpce"</span>: <span class="hljs-string">"REPLACE_VPC_ENDPOINT_ID"</span>
                }
            }
        },
        {
            <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"Access-from-everywhere"</span>,
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Principal"</span>: <span class="hljs-string">"*"</span>,
            <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"s3:*"</span>,
            <span class="hljs-attr">"Resource"</span>: [
                <span class="hljs-string">"arn:aws:s3:::REPLACE_BUCKET_NAME"</span>,
                <span class="hljs-string">"arn:aws:s3:::REPLACE_BUCKET_NAME/*"</span>
            ]
        }
    ]
}
</code></pre>
<h3 id="heading-step-4-test-that-the-connection-is-still-working">Step 4: Test that the connection is still working</h3>
<ul>
<li>Go back to the browser tab where you pasted the public IP address of the instance and refresh the page</li>
</ul>
<h3 id="heading-step-5-empty-the-s3-bucket">Step 5: Empty the S3 bucket</h3>
<p>Before deleting the CloudFormation stack, you'll need to empty the S3 bucket! The Node.js app puts a file in there.</p>
<h2 id="heading-how-does-this-solution-make-the-connection-from-ec2-to-s3-more-secure">How does this solution make the connection from EC2 to S3 more secure?</h2>
<h3 id="heading-vpc-endpoints">VPC Endpoints</h3>
<p>First of all, you'll notice that a VPC Endpoint is for one specific service, S3 in this case. If you wanted to connect to other services you'd need to create a separate VPC Endpoint for each different service.</p>
<p>The second thing you'll notice is that there are 2 types of endpoints: Interface and Gateway. Gateway endpoints are only for S3 and DynamoDB, while Interface endpoints are for nearly everything. Gateway endpoints are simpler, so use them when you can (except if you're writing a newsletter and want to show a few things about Interface endpoints).</p>
<p>Interface endpoints work by creating an Elastic Network Interface (ENI) in every subnet where you deploy them, and automatically routing traffic addressed to the service's public endpoint to those ENIs. That way, you don't need to make any changes to your code. This only works if you check Enable DNS name.</p>
<h3 id="heading-vpc-endpoint-policies">VPC Endpoint Policies</h3>
<p>The existing policy is a Full Access policy, which is the default policy when a VPC endpoint is created. It allows all actions on the S3 service from anyone.</p>
<p>Instead of that, we're setting up a more restrictive policy, which only allows access to our specific bucket, and denies access to all other buckets.</p>
<p>VPC Endpoint policies are IAM resource policies, and as such, anything that's not explicitly allowed is implicitly denied.</p>
<h3 id="heading-restrictive-s3-bucket-policies">Restrictive S3 bucket policies</h3>
<p>Bucket policies are another type of IAM resource policies. Obviously, this bucket policy will only apply to our S3 bucket. It's important to add it because, while we've restricted what the VPC Endpoint can be used for, the S3 bucket can still be accessed from outside the VPC (e.g. from the public internet). This bucket policy is the one that's going to prevent that, restricting access to only from the VPC Endpoint.</p>
<h2 id="heading-discussing-connection-security-to-s3">Discussing Connection Security to S3</h2>
<p>In this case I kept internet access for the VPC and for the EC2 instance itself, just to make it easier to trigger the code with an HTTP request. This solution is a good idea in these cases because traffic to S3 doesn't go over the public internet, but admittedly, the public internet is a viable alternative.</p>
<p>Where this solution matters more is when you don't have access to the internet. Sure, adding it is rather simple, but you're either exposing yourself unnecessarily by giving your instances a public IP address they don't need, or you're paying for a NAT Gateway. In those cases, VPC Endpoints are a much simpler, safer and cheaper solution.</p>
<p>Conceptually, you can think of this as giving the S3 service a private IP address inside your VPC. In reality, what you're doing is creating a private IP address in your VPC that leads to the S3 service, so that conception is pretty accurate! Behind the scenes (and you can see this easily), the VPC service creates an Elastic Network Interface (ENI) in every subnet where you deploy the VPC Endpoint. Those ENIs will forward the traffic to the S3 service endpoints that are private to the AWS network.</p>
<p>Also behind the scenes there's a <a target="_blank" href="https://newsletter.simpleaws.dev/p/route-53-private-hosted-zone-dns-endpoint?utm_source=blog&amp;utm_medium=hashnode">Route 53 Private Hosted Zone</a>, managed by AWS and hidden from users, which resolves the S3 address to the private IPs of those ENIs instead of to the public IPs of the public endpoints. That's why you don't need to change the code: your code depends on the address of the S3 service, and that private hosted zone takes care of resolving it to a different address.</p>
<h2 id="heading-best-practices-for-s3-security">Best Practices for S3 Security</h2>
<h3 id="heading-operational-excellence">Operational Excellence</h3>
<ul>
<li><strong>Monitor and Alert Endpoint Health:</strong> Monitor the health of your VPC endpoints using CloudWatch metrics. Any unusual activity or degradation in performance should trigger alerts. This could also help you detect a security incident!</li>
</ul>
<h3 id="heading-security">Security</h3>
<ul>
<li><p><strong>Least Privilege Access to Bucket:</strong> This is basically what we did in Step 3: We disabled public access, and implemented a policy that only allows reads from the VPC. Try reading from that S3 bucket from your own computer: <code>aws s3api get-object --bucket 12ewqaewr2qqq --key thankyou.txt thankyou.txt --region us-east-1</code></p>
</li>
<li><p><strong>Regularly Audit IAM Policies:</strong> Regularly review and tighten your IAM policies. Not only for the VPC Endpoint and S3 bucket, but also for the EC2 instance!</p>
</li>
</ul>
<h3 id="heading-reliability">Reliability</h3>
<ul>
<li><strong>Use Multiple Subnets in Different AZs:</strong> Each subnet gets one ENI, so if you distribute your subnets in several AZs, your VPC Endpoint is highly available within the region (i.e., it can continue functioning if an Availability Zone fails).</li>
</ul>
<h3 id="heading-performance-efficiency">Performance Efficiency</h3>
<ul>
<li><strong>Choose the Right VPC Endpoint Type:</strong> Choose the right type of VPC Endpoint based on your workload. For S3, a Gateway Endpoint works best. I'll leave it to you to figure out how to create it (=.</li>
</ul>
<h3 id="heading-cost-optimization">Cost Optimization</h3>
<ul>
<li><strong>Delete Unused VPC Endpoints:</strong> Regularly delete any unused VPC endpoints to avoid paying for stuff you don't use.</li>
</ul>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Using SQS to Throttle Writes to DynamoDB]]></title><description><![CDATA[We're running an e-commerce platform, where people publish products and other people purchase those products. Our backend has some highly scalable microservices running on well-designed Lambdas, and there's a lot of caching involved. Our order proces...]]></description><link>https://blog.guilleojeda.com/sqs-throttle-database-writes-dynamodb</link><guid isPermaLink="true">https://blog.guilleojeda.com/sqs-throttle-database-writes-dynamodb</guid><category><![CDATA[AWS]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[DynamoDB]]></category><category><![CDATA[Databases]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Thu, 07 Sep 2023 00:12:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1694045404800/89fb9540-0651-4a0a-8515-70d68d1eaacd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We're running an e-commerce platform, where people publish products and other people purchase those products. Our backend has some highly scalable <a target="_blank" href="https://newsletter.simpleaws.dev/p/microservices-design?utm_source=blog&amp;utm_medium=hashnode">microservices</a> running on <a target="_blank" href="https://newsletter.simpleaws.dev/p/simple-aws-20-advanced-tips-lambda?utm_source=blog&amp;utm_medium=hashnode">well-designed Lambdas</a>, and there's a lot of caching involved. Our order processing microservice writes to a DynamoDB table we set up following <a target="_blank" href="https://newsletter.simpleaws.dev/p/dynamodb-database-design?utm_source=blog&amp;utm_medium=hashnode">How to Design a DynamoDB Database</a>. We're using <a target="_blank" href="https://newsletter.simpleaws.dev/p/dynamodb-scaling-provisioned-on-demand?utm_source=blog&amp;utm_medium=hashnode">DynamoDB provisioned capacity mode with auto scaling</a>. 
We did a great job and everything runs smoothly.</p>
<p>Suddenly, someone's product goes viral, and a lot of people rush in to buy it at the same time. Our cache and CDN don't even blink at the traffic, our well-designed Lambdas scale amazingly fast, but our DynamoDB table is suddenly bombarded with writes and the auto scaling can't keep up. Our order processing Lambda receives ProvisionedThroughputExceededException, and when it retries it just makes everything worse. Things crash. Sales are lost. We eventually recover, but those customers are gone. How do we make sure it doesn't happen again?</p>
<p>Option 1 is to change the DynamoDB table to On-demand, which can keep up with Lambda when scaling, but it's over 5x more expensive. Option 2 is to make sure the table's write capacity isn't exceeded. Let's explore option 2.</p>
<p><strong>AWS Services involved:</strong></p>
<ul>
<li><p><strong>DynamoDB:</strong> Our database. All you need to know for this post is <a target="_blank" href="https://newsletter.simpleaws.dev/p/dynamodb-scaling-provisioned-on-demand?utm_source=blog&amp;utm_medium=hashnode">how DynamoDB scales</a>.</p>
</li>
<li><p><strong>SQS:</strong> A fully managed message queuing service that lets you decouple components. Producers like our order processing microservice post messages to the queue, the queue stores them until they're read, and consumers read from the queue at their own pace.</p>
</li>
<li><p><strong>SES:</strong> An email platform, more similar to services like MailChimp than to an AWS service. If you're already on AWS and you just need to send emails programmatically, it's easy to set up. If you're not on AWS, need more control, or need to send so many emails that price is a factor, you'll need to do some research. For this post, SES is good enough.</p>
</li>
</ul>
<h2 id="heading-what-is-amazon-sqs">What is Amazon SQS</h2>
<p>SQS is a fully managed message queuing service. A message queue is a data structure where items are read in the same order they were written: First-In, First-Out (FIFO).</p>
<p>Queues allow us to decouple components by making the consumer (the component that reads from the queue) unaware of who wrote the item (the writer is called the producer). In software architecture we usually care about another characteristic of queues: the read can happen some time after the write. This decouples producers and consumers in time: consumers don't need to be available when producers write to the queue. The queue stores the messages for a certain amount of time, and when consumers are ready, they <em>poll</em> the queue for messages and receive the oldest one.</p>
<p>For our solution, we're going to use a queue so that our order processing microservice can send a message with the order, the queue stores the message, and a consumer can read it at its own rhythm (i.e. at our DynamoDB table's rhythm).</p>
<h2 id="heading-types-of-sqs-queues">Types of SQS Queues</h2>
<p>There are two types of queues in SQS:</p>
<ul>
<li><p><strong>Standard</strong> queues are the default type of queue. They're cheaper than FIFO queues and nearly-infinitely scalable. The tradeoff is that they only guarantee <strong>at-least-once delivery</strong> (meaning you might get duplicates), and order of the messages is mostly respected but not guaranteed.</p>
</li>
<li><p><strong>FIFO</strong> queues are more expensive than Standard queues, and they don't scale infinitely, but they guarantee ordered, <strong>exactly-once delivery</strong>. You need to set the <code>MessageGroupId</code> property in the message, since FIFO queues only deliver the next message in a MessageGroup after the previous message has been successfully processed. For example, if you set the value of <code>MessageGroupId</code> to the customer ID and a customer makes two orders at the same time, the second one to come in won't be processed until the first one is finished processing. It's also important to set <code>MessageDeduplicationId</code>, to ensure that if the message gets duplicated upstream, it will be deduplicated at the queue. A FIFO queue will only keep one message per unique value of MessageDeduplicationId.</p>
</li>
</ul>
<p>When most people think of queues, they picture guaranteed FIFO order and exactly-once delivery. The only way to actually get those guarantees in SQS is with FIFO queues.</p>
<h2 id="heading-how-to-implement-an-sqs-queue-for-dynamodb">How to Implement an SQS Queue for DynamoDB</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694045866106/e529a08d-c4ac-4e0a-8aa1-2106c92e3104.png" alt="Diagram of a request before and after implementing the solution" class="image--center mx-auto" /></p>
<p>Follow these step by step instructions to implement an SQS Queue to throttle writes to a DynamoDB table. Replace <code>YOUR_ACCOUNT_ID</code> and <code>YOUR_REGION</code> with the appropriate values for your account and region.</p>
<h3 id="heading-create-the-orders-queue">Create the Orders Queue</h3>
<ol>
<li><p>Go to the SQS console.</p>
</li>
<li><p>Click "Create queue"</p>
</li>
<li><p>Choose the "FIFO" queue type (not the default Standard)</p>
</li>
<li><p>In the "Queue name" field enter "OrdersQueue.fifo" (FIFO queue names must end in ".fifo")</p>
</li>
<li><p>Leave the rest as default</p>
</li>
<li><p>Click on "Create queue"</p>
</li>
</ol>
<h3 id="heading-update-the-orders-service-to-write-to-the-sqs-queue">Update the Orders service to write to the SQS queue</h3>
<p>We need to update the code of the Orders service so that it sends the new Order to the Orders Queue, instead of writing to the Orders table. This is what the code looks like:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> AWS = <span class="hljs-built_in">require</span>(<span class="hljs-string">'aws-sdk'</span>);
<span class="hljs-keyword">const</span> sqs = <span class="hljs-keyword">new</span> AWS.SQS();
<span class="hljs-keyword">const</span> queueUrl = <span class="hljs-string">'https://sqs.YOUR_REGION.amazonaws.com/YOUR_ACCOUNT_ID/OrdersQueue.fifo'</span>;

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">processOrder</span>(<span class="hljs-params">order</span>) </span>{
  <span class="hljs-keyword">const</span> params = {
    <span class="hljs-attr">MessageBody</span>: <span class="hljs-built_in">JSON</span>.stringify(order),
    <span class="hljs-attr">QueueUrl</span>: queueUrl,
    <span class="hljs-attr">MessageGroupId</span>: order.customerId,
    <span class="hljs-attr">MessageDeduplicationId</span>: order.orderId
  };

  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> result = <span class="hljs-keyword">await</span> sqs.sendMessage(params).promise();
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Order sent to SQS:'</span>, result.MessageId);
  } <span class="hljs-keyword">catch</span> (error) {
    <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'Error sending order to SQS:'</span>, error);
    <span class="hljs-keyword">throw</span> error; <span class="hljs-comment">// rethrow so the caller can surface the failure instead of silently dropping the order</span>
  }
}
</code></pre>
<p>Also, add this policy to the IAM Role of the function, so it can access SQS. Don't forget to delete the permissions to access DynamoDB!</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"sqs:SendMessage"</span>,
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:sqs:YOUR_REGION:YOUR_ACCOUNT_ID:OrdersQueue.fifo"</span>
    }
  ]
}
</code></pre>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h3 id="heading-set-up-ses-to-notify-the-customer-via-email">Set up SES to notify the customer via email</h3>
<ol>
<li><p>Open the SES console</p>
</li>
<li><p>Click on "Domains" in the left navigation pane</p>
</li>
<li><p>Click "Verify a new domain"</p>
</li>
<li><p>Follow the on-screen instructions to add the required DNS records for your domain.</p>
</li>
<li><p>Alternatively, click on "Email Addresses" and then click the "Verify a new email address" button. Enter the email address you want to verify and click "Verify This Email Address". Check your inbox and click the link.</p>
</li>
</ol>
<h3 id="heading-set-up-the-order-processing-service">Set up the Order Processing service</h3>
<p>Go to the Lambda console and create a new Lambda function. Add the following code:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> AWS = <span class="hljs-built_in">require</span>(<span class="hljs-string">'aws-sdk'</span>);
<span class="hljs-keyword">const</span> dynamoDB = <span class="hljs-keyword">new</span> AWS.DynamoDB.DocumentClient();
<span class="hljs-keyword">const</span> ses = <span class="hljs-keyword">new</span> AWS.SES();

<span class="hljs-built_in">exports</span>.handler = <span class="hljs-keyword">async</span> (event) =&gt; {
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> record <span class="hljs-keyword">of</span> event.Records) {
        <span class="hljs-keyword">const</span> order = <span class="hljs-built_in">JSON</span>.parse(record.body);
        <span class="hljs-keyword">await</span> saveOrderToDynamoDB(order);
        <span class="hljs-keyword">await</span> sendEmailNotification(order);
    }
};

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">saveOrderToDynamoDB</span>(<span class="hljs-params">order</span>) </span>{
    <span class="hljs-keyword">const</span> params = {
        <span class="hljs-attr">TableName</span>: <span class="hljs-string">'Orders'</span>,
        <span class="hljs-attr">Item</span>: order
    };

    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">await</span> dynamoDB.put(params).promise();
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Order saved: <span class="hljs-subst">${order.orderId}</span>`</span>);
    } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">`Error saving order: <span class="hljs-subst">${order.orderId}</span>`</span>, error);
        <span class="hljs-keyword">throw</span> error; <span class="hljs-comment">// rethrow so SQS keeps the message visible and retries it</span>
    }
}

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">sendEmailNotification</span>(<span class="hljs-params">order</span>) </span>{
    <span class="hljs-keyword">const</span> emailParams = {
        <span class="hljs-attr">Source</span>: <span class="hljs-string">'you@simpleaws.dev'</span>,
        <span class="hljs-attr">Destination</span>: {
            <span class="hljs-attr">ToAddresses</span>: [order.customerEmail]
        },
        <span class="hljs-attr">Message</span>: {
            <span class="hljs-attr">Subject</span>: {
                <span class="hljs-attr">Data</span>: <span class="hljs-string">'Your order is ready'</span>
            },
            <span class="hljs-attr">Body</span>: {
                <span class="hljs-attr">Text</span>: {
                    <span class="hljs-attr">Data</span>: <span class="hljs-string">`Thank you for your order, <span class="hljs-subst">${order.customerName}</span>! Your order #<span class="hljs-subst">${order.orderId}</span> is now ready.`</span>
                }
            }
        }
    };

    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">await</span> ses.sendEmail(emailParams).promise();
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Email sent: <span class="hljs-subst">${order.orderId}</span>`</span>);
    } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">`Error sending email for order: <span class="hljs-subst">${order.orderId}</span>`</span>, error);
    }
}
</code></pre>
<p>Also, add the following IAM Policy to the IAM Role of the function, so it can be triggered by SQS and access DynamoDB and SES:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Action"</span>: [
        <span class="hljs-string">"sqs:ReceiveMessage"</span>,
        <span class="hljs-string">"sqs:DeleteMessage"</span>,
        <span class="hljs-string">"sqs:GetQueueAttributes"</span>
      ],
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:sqs:YOUR_REGION:YOUR_ACCOUNT_ID:OrdersQueue.fifo"</span>
    },
    {
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Action"</span>: [
        <span class="hljs-string">"dynamodb:PutItem"</span>,
        <span class="hljs-string">"dynamodb:UpdateItem"</span>,
        <span class="hljs-string">"dynamodb:DeleteItem"</span>
      ],
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:dynamodb:YOUR_REGION:YOUR_ACCOUNT_ID:table/Orders"</span>
    },
    {
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"ses:SendEmail"</span>,
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"*"</span>
    }
  ]
}
</code></pre>
<h3 id="heading-make-the-orders-queue-trigger-the-order-processing-service">Make the Orders Queue trigger the Order Processing service</h3>
<ol>
<li><p>In the Lambda console, go to the Order Processing lambda</p>
</li>
<li><p>In the "Function overview" section, click "Add trigger"</p>
</li>
<li><p>Click "Select a trigger" and choose "SQS"</p>
</li>
<li><p>Select the Orders Queue</p>
</li>
<li><p>Set Batch size to 1</p>
</li>
<li><p>Make sure that the "Enable trigger" checkbox is checked</p>
</li>
<li><p>Click "Add"</p>
</li>
</ol>
<h3 id="heading-limit-concurrent-executions-of-the-order-processing-lambda">Limit concurrent executions of the Order Processing Lambda</h3>
<ol>
<li><p>In the Lambda console, go to the Order Processing lambda</p>
</li>
<li><p>Scroll down to the "Concurrency" section</p>
</li>
<li><p>Click "Edit"</p>
</li>
<li><p>Select "Reserve concurrency" and set it to 10 (don't confuse it with "Provisioned concurrency", which pre-warms execution environments instead of capping them)</p>
</li>
<li><p>Click "Save"</p>
</li>
</ol>
<h2 id="heading-synchronous-and-asynchronous-workflows-with-sqs">Synchronous and Asynchronous Workflows with SQS</h2>
<p>Architecture-wise, there's one big change in our solution: We've made our workflow <strong>async</strong>! Let me bring the diagram here.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694045884535/8f5cf7fd-f573-4636-8a23-675eddee4da0.png" alt="Diagram of a request before and after implementing the solution" class="image--center mx-auto" /></p>
<p>Before, our Orders service would return the result of the order. From the user's perspective, they <strong>wait until the order is processed</strong>, and they see the result on the website. From the system's perspective, we're constrained to either succeed or fail processing the order within API Gateway's 29-second timeout. In more practical terms, we're limited by what the user expects: we can't just show a "loading" icon for 29 seconds!</p>
<p>After the change, the website just shows something like "We're processing your order, we'll email you when it's ready". That sets a <strong>different expectation</strong> for the user. That matters for the system, because now our Lambda function could take 15 minutes without hitting the 29-second limit of API Gateway, and without the user getting angry. There's more: if the Order Processing lambda crashes mid-execution, the SQS queue makes the order available again as a message once the visibility timeout expires, and the Lambda service invokes our function again with the same order. When the maxReceiveCount limit is reached, the order can be sent to another queue called a dead-letter queue (DLQ), where we can store failed orders for future reference. We didn't set up a DLQ here, but it's easy enough, and for small and medium-sized systems you can set up SNS to email you and resolve the issue manually, since the volume shouldn't be particularly large.</p>
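<p>Setting up that DLQ takes a single redrive policy on the source queue. A sketch of the policy (the DLQ name and ARN are placeholders; note that the DLQ of a FIFO queue must itself be a FIFO queue):</p>

```json
{
  "deadLetterTargetArn": "arn:aws:sqs:YOUR_REGION:YOUR_ACCOUNT_ID:OrdersDLQ.fifo",
  "maxReceiveCount": 5
}
```

<p>In the console this lives in the "Dead-letter queue" section of the queue settings; via the API it's the RedrivePolicy queue attribute.</p>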
<p>Once the order has gone through all the steps (maybe failed a few, retried, and eventually succeeded), we notify the user that their order is "ready". This can look different for different systems: some just say "we got the money", some ship physical products, some onboard the user to a complex SaaS. For this solution I chose email because it's easy and common enough, but you could use a webhook and still keep the process async.</p>
<h2 id="heading-best-practices-for-sqs-and-dynamodb">Best Practices for SQS and DynamoDB</h2>
<h3 id="heading-operational-excellence">Operational Excellence</h3>
<ul>
<li><p><strong>Monitor and set alarms:</strong> You know how to monitor Lambdas. You can monitor SQS queues as well! An interesting alarm to set here would be on the number of messages in the queue (the ApproximateNumberOfMessagesVisible metric), so our customers don't wait too long for their orders to be processed.</p>
</li>
<li><p><strong>Handle errors and retries:</strong> Be ready for anything to fail, and architect accordingly. Set up a DLQ, set up notifications (to you and to the user) for when things fail, and above all don't lose/corrupt data.</p>
</li>
<li><p><strong>Set up tracing:</strong> We're complicating things a bit (hopefully for a good reason). We can gain better visibility into that complexity by <a target="_blank" href="https://newsletter.simpleaws.dev/p/using-aws-xray-observability-eventdriven-architectures?utm_source=blog&amp;utm_medium=hashnode">setting up X-Ray</a>.</p>
</li>
</ul>
<h3 id="heading-security">Security</h3>
<ul>
<li><p><strong>Check "Enable server-side encryption":</strong> That's all you need to do for an SQS queue to be encrypted at rest: check that box, and pick a KMS key. SQS communicates over HTTPS, so you already have encryption in transit.</p>
</li>
<li><p><strong>Tighten permissions:</strong> The IAM policies in this issue are pretty restrictive. But there's always a nut to tighten, so keep your eyes open.</p>
</li>
</ul>
<h3 id="heading-reliability">Reliability</h3>
<ul>
<li><p><strong>Set up maxReceiveCount and a DLQ:</strong> With a FIFO queue, the next message in a group won't be available for processing until the previous one is either processed successfully or dropped (to the DLQ, if you set one) after maxReceiveCount attempts. If you don't set these, one corrupted order will block your whole system.</p>
</li>
<li><p><strong>Set visibility timeout:</strong> This is how long SQS waits for a "success" response before assuming the message wasn't processed and making it available again to the next consumer. Set a reasonable value: for a Lambda consumer, AWS recommends a visibility timeout of at least 6 times your function's timeout, so in-flight messages don't reappear while a retry or batch is still running.</p>
</li>
</ul>
<h3 id="heading-performance-efficiency">Performance Efficiency</h3>
<ul>
<li><p><strong>Optimize Lambda function memory:</strong> More memory means more money. But it also means faster processing. Going from 30 to 25 seconds won't matter much for a successfully processed order, but if orders are retried 5 times, now it's 25 seconds we're gaining instead of 5. Could be worth it, depending on your customers' expectations.</p>
</li>
<li><p><strong>Use batch processing:</strong> We set the batch size to 1 to keep things simple, but processing messages in batches reduces the number of Lambda invocations, and with it cost and overhead.</p>
</li>
<li><p><strong>Remember</strong> <a target="_blank" href="https://newsletter.simpleaws.dev/p/simple-aws-20-advanced-tips-lambda?utm_source=blog&amp;utm_medium=hashnode"><strong>the 20 advanced tips for Lambda</strong></a><strong>.</strong></p>
</li>
</ul>
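<p>If you do raise the batch size above 1, enable "Report batch item failures" on the event source mapping so one bad record doesn't force a retry of the whole batch. A sketch of a batch-aware handler, with the per-order processing function injected as a parameter (processOrder here is a placeholder for something like saveOrderToDynamoDB):</p>

```javascript
// Batch-aware SQS handler: processes each record and reports only the
// failed message IDs back to Lambda, so successful ones aren't retried.
// Requires "ReportBatchItemFailures" enabled on the event source mapping.
async function handleBatch(event, processOrder) {
  const batchItemFailures = [];
  for (const record of event.Records) {
    try {
      await processOrder(JSON.parse(record.body));
    } catch (err) {
      console.error(`Failed to process message ${record.messageId}`, err);
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  // Lambda re-drives only these messages; an empty array means full success
  return { batchItemFailures };
}

// In the real function you'd export something like:
// exports.handler = (event) => handleBatch(event, saveOrderToDynamoDB);
```

<p>One FIFO-specific caveat: once a record fails, you should also report every later record from the same message group as failed, otherwise you'd process them out of order.</p>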
<h3 id="heading-cost-optimization">Cost Optimization</h3>
<ul>
<li><p><strong>Provisioned vs. On-demand for DynamoDB:</strong> Remember that this could be fixed by using our DynamoDB table in On-demand mode. It's 5x more expensive though. Same goes for relational databases (if we use Aurora, then Aurora Serverless is an option).</p>
</li>
<li><p><strong>Consider something other than Lambda:</strong> In this case, we're trying to get all orders processed relatively fast. If the processing can wait a bit more, an auto scaling group that scales based on the number of messages in the SQS queue can work wonders, for a lot less money.</p>
</li>
</ul>
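<p>The queue-based scaling mentioned above is usually built on "backlog per instance": divide the visible messages in the queue by how many one instance can work through within your latency target. A sketch with made-up numbers:</p>

```javascript
// Desired capacity for an auto scaling group consuming from SQS.
// acceptableBacklogPerInstance = messages one instance can clear within
// your latency target, e.g. 10 msg/s * 60-second target = 600.
function desiredInstances(visibleMessages, acceptableBacklogPerInstance, min = 1, max = 20) {
  const desired = Math.ceil(visibleMessages / acceptableBacklogPerInstance);
  // Clamp to the group's min/max so a huge spike doesn't over-scale
  return Math.min(max, Math.max(min, desired));
}

console.log(desiredInstances(3000, 600)); // 5
```

<p>In practice you'd publish visibleMessages / runningInstances as a custom CloudWatch metric and use a target tracking policy against it, which is the approach AWS documents for scaling on SQS.</p>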
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Amazon EBS Basics and Best Practices]]></title><description><![CDATA[Elastic Block Store (EBS for short) is a block-level storage service for EC2 instances. Essentially it's a virtual SSD or HDD that you attach to EC2 instances, so they can have persistent storage. Honestly, EBS is pretty boring to talk about, but if ...]]></description><link>https://blog.guilleojeda.com/ebs-basics-best-practices</link><guid isPermaLink="true">https://blog.guilleojeda.com/ebs-basics-best-practices</guid><category><![CDATA[AWS]]></category><category><![CDATA[storage]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[Beginner Developers]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Wed, 30 Aug 2023 16:15:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1693412064654/97353276-25ba-49b4-a93f-348e4299e5db.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html">Elastic Block Store</a> (EBS for short) is a block-level storage service for EC2 instances. Essentially it's a virtual SSD or HDD that you attach to EC2 instances, so they can have persistent storage. Honestly, EBS is pretty boring to talk about, but if you're storing a ton of data, knowing the fine details can save you a lot of money. Let's start with the basics.</p>
<h2 id="heading-ebs-basic-concepts">EBS Basic Concepts</h2>
<h3 id="heading-what-is-an-ebs-volume">What is an EBS Volume</h3>
<p>An EBS volume is a virtual block-level storage device that can be used by EC2 instances to store persistent data. An EBS volume acts like a HDD or SSD, but behind the scenes they're actually an array of physical discs in a RAID configuration, in the same datacenter but physically distanced from each other to minimize the probability of simultaneous failures (e.g. due to fires).</p>
<p>When you create an EBS volume you define the size, the performance (only for some volume types), and the volume type, which is explained in the next section. All three can usually be modified later on current-generation volumes through the Elastic Volumes feature, without detaching the volume.</p>
<p>EBS volumes exist separately from EC2 instances, and can be detached from one instance and attached to another one. When you create an EC2 instance, an EBS volume is created and attached as a root volume, and the default behavior is to delete it when the instance is terminated (this can be changed). But it's important to understand that they are actually a separate service from EC2.</p>
<h2 id="heading-types-of-ebs-volumes">Types of EBS Volumes</h2>
<p>As is often the case in AWS, there are different types of resources to serve different use cases and needs. These are the types of volumes you can create in EBS. You pick the type on creation, but for current-generation volumes you can usually change it later with Elastic Volumes.</p>
<h3 id="heading-ebs-gp3-volumes">EBS GP3 Volumes</h3>
<p>This is the general-purpose volume type, which you should use for most stuff, and default to when in doubt. Size and performance (IOPS) can be configured separately (unlike the previous generation, GP2). Here are some details:</p>
<ul>
<li><p>Volume Size: 1 GB to 16 TB</p>
</li>
<li><p>Durability: 99.8% to 99.9%</p>
</li>
<li><p>Max IOPS/Volume: 16,000 (operations of 16K)</p>
</li>
<li><p>Max Throughput/Volume: 1000 MB/s</p>
</li>
<li><p>Latency: single digit milliseconds</p>
</li>
<li><p><strong>Price:</strong></p>
<ul>
<li><p><strong>$0.08/GB-month</strong></p>
</li>
<li><p><strong>$0.005/provisioned IOPS-month</strong> over 3,000 (the first 3,000 are free)</p>
</li>
</ul>
</li>
</ul>
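<p>With those list prices (which vary a bit by region), a hypothetical monthly bill is easy to sketch. Throughput above the 125 MB/s baseline is billed separately (around $0.04 per MB/s-month) and omitted here for simplicity:</p>

```javascript
// Monthly cost of a GP3 volume at the list prices above:
// $0.08 per GB-month, plus $0.005 per provisioned IOPS-month
// over the 3,000 free IOPS.
function gp3MonthlyCost(sizeGb, provisionedIops = 3000) {
  const storage = sizeGb * 0.08;
  const iops = Math.max(0, provisionedIops - 3000) * 0.005;
  return storage + iops;
}

console.log(gp3MonthlyCost(500, 6000)); // 55  ($40 storage + $15 for 3,000 extra IOPS)
```

<p>Being able to buy IOPS separately from size is the big win over GP2, where the only way to get more IOPS was to provision a bigger volume.</p>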
<h3 id="heading-ebs-io2-volumes">EBS IO2 Volumes</h3>
<p>GP3 is great for most use cases, but if you need more performance, you go with IO2. If you're running a database on an EC2 instance, this is the one to pick. Size and performance (IOPS) can be configured separately, and the limits are higher than for GP3. Details:</p>
<ul>
<li><p>Volume Size: 4 GB to 16 TB</p>
</li>
<li><p>Durability: 99.999%</p>
</li>
<li><p>Max IOPS/Volume: 64,000 (operations of 16K). 256,000 IOPS with <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/provisioned-iops.html#io2-block-express">io2 Block Express</a>.</p>
</li>
<li><p>Max Throughput/Volume: 1,000 MB/s. 4,000 MB/s with <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/provisioned-iops.html#io2-block-express">io2 Block Express</a>.</p>
</li>
<li><p>Latency: single digit millisecond</p>
</li>
<li><p><strong>Price:</strong></p>
<ul>
<li><p><strong>$0.125/GB-month</strong></p>
</li>
<li><p><strong>$0.065/provisioned IOPS-month up to 32,000 IOPS (no free IOPS)</strong></p>
</li>
<li><p><strong>$0.046/provisioned IOPS-month from 32,001 to 64,000 IOPS</strong></p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-ebs-st1-volumes">EBS ST1 Volumes</h3>
<p>This one is actually an HDD (virtual, but backed by real HDDs). Spinning disks! The use case is sequential access to data that sits contiguously in the physical disks. In general, that means data that is written as a long stream, and then read as a long stream, instead of having different parts accessed at random. It offers pretty good performance for that compared to GP3, at almost half the price. Here are the specs:</p>
<ul>
<li><p>Volume Size: 125 GB to 16 TB</p>
</li>
<li><p>Durability: 99.8% to 99.9% durability</p>
</li>
<li><p>Max IOPS/Volume: 500 (operations of 1 MB, not 16K)</p>
</li>
<li><p>Max Throughput/Volume: 500 MB/s</p>
</li>
<li><p><strong>Price: $0.045/GB-month</strong></p>
</li>
</ul>
<p>Maximum performance varies with size, at 40 MB/s per TB. Additionally, ST1 volumes use a burst credit system: they accumulate credits while usage is below the baseline throughput, and spend them to exceed that baseline for a period of time. In short, a 12.5 TB volume always performs at up to 500 MB/s, and any smaller volume has a lower baseline but can still reach 500 MB/s for short bursts. This is better explained <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/hdd-vols.html#EBSVolumeTypes_st1">here</a>.</p>
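<p>Since the baseline scales linearly with size, you can compute what a given volume is entitled to before burst credits, using the numbers above:</p>

```javascript
// ST1 baseline throughput: 40 MB/s per provisioned TB, capped at the
// 500 MB/s per-volume maximum (reached at 12.5 TB).
function st1BaselineMbPerSec(sizeTb) {
  return Math.min(500, 40 * sizeTb);
}

console.log(st1BaselineMbPerSec(1));    // 40
console.log(st1BaselineMbPerSec(12.5)); // 500
console.log(st1BaselineMbPerSec(16));   // still 500: the cap, not the linear formula
```

<p>So if your streaming workload needs a sustained 200 MB/s, you'd size the volume to at least 5 TB even if you need less space, or the volume will throttle back to baseline once burst credits run out.</p>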
<p>Certification Exam tip: ST1 volumes can't be root volumes.</p>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<hr />
<h3 id="heading-ebs-sc1-volumes">EBS SC1 Volumes</h3>
<p>SC1 volumes are also HDDs, aimed at offering the lowest price per GB of all block storage options. They're recommended for infrequently accessed data that needs to be on block storage (the C in SC1 stands for Cold). If you don't strictly need block storage access, S3 Infrequent Access or Glacier are also viable options.</p>
<ul>
<li><p>Volume Size: 125 GB to 16 TB</p>
</li>
<li><p>Durability: 99.8% to 99.9% durability ← Much less than S3!</p>
</li>
<li><p>Max IOPS/Volume: 250 (operations of 1 MB, not 16K)</p>
</li>
<li><p>Max Throughput/Volume: 250 MB/s</p>
</li>
<li><p><strong>Price: $0.015/GB-month</strong> ← Slightly higher than S3 Infrequent Access, but with EBS SC1 you're not billed for reads and writes</p>
</li>
</ul>
<p>SC1 volumes use the same burst credit system as ST1, though their performance is lower.</p>
<p>Certification Exam tip: SC1 volumes can't be root volumes.</p>
<h3 id="heading-older-volume-types">Older Volume Types</h3>
<p>IO1 (predecessor of IO2) and GP2 (predecessor of GP3) were the norm a few years ago, and you'll most likely find some still in production. IO1 works just like IO2, but with lower durability and a higher price at high IOPS. GP2 has the same use cases as GP3, but its performance is tied to volume size and it uses a burst credit system, like ST1. It's also 25% more expensive than GP3.</p>
<p>Cost-savings tip: Migrate GP2 volumes to GP3.</p>
<h2 id="heading-characteristics-of-ebs-volumes">Characteristics of EBS Volumes</h2>
<h3 id="heading-ec2-root-volume">EC2 Root Volume</h3>
<p>An EC2 instance comes with an EBS volume associated with it, called the root volume. This volume contains the OS, some libraries and programs, and some configurations. This is where the EC2 instance boots from when starting up.</p>
<h3 id="heading-multiple-ebs-volumes">Multiple EBS Volumes</h3>
<p>You can attach up to 128 EBS volumes to an instance (the exact limit depends on the instance type), so long as they're in the same Availability Zone as the instance. Once a volume is attached, the OS sees it just as if you had physically attached a disk, and you can format it with a file system and mount it.</p>
<p>You can detach an EBS volume from an instance and attach it to another instance as many times as you want, so long as both instances are in the same Availability Zone as the volume.</p>
<p>Volumes of any type other than IO1 or IO2 can only be attached to one instance at a time. IO1 and IO2 volumes support Multi-Attach: they can be attached to multiple instances at the same time, in read-write mode (coordinating writes is up to your application).</p>
<h3 id="heading-availability-of-ebs-volumes">Availability of EBS Volumes</h3>
<p>EBS volumes are zonal resources. They exist in a single Availability Zone, so they are <strong>not highly available</strong>. This is also true for IO2 volumes, which offer durability of 99.999%.</p>
<p>EBS volumes are redundant within that Availability Zone, so data loss is significantly less likely than with a single disk. They're backed by an array of physical disks in a RAID configuration.</p>
<h2 id="heading-lifecycle-of-an-ebs-volume">Lifecycle of an EBS Volume</h2>
<p>It's important to understand that the lifecycle of an EBS volume is separate from that of the EC2 instance. You can create them, attach them, detach them and delete them on their own. You can also configure them to be deleted when the EC2 instance they're attached to is terminated, which is the default for the root volume.</p>
<h3 id="heading-encryption-of-ebs-volumes">Encryption of EBS Volumes</h3>
<p>EBS volumes can be <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html">encrypted using KMS</a>, in a way that's entirely transparent to you. You don't need to manage or use any encryption keys; the EBS service automatically fetches them and decrypts the data when you initiate a read operation.</p>
<p>You can enable <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html#encryption-by-default">Encryption by Default</a> on your AWS account, which means every new EBS volume you create will be encrypted unless you explicitly configure it not to. This is a highly recommended best practice for AWS security.</p>
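<p>For reference, here's roughly how you'd flip that switch from the AWS CLI (the region and key alias below are placeholders; note this is a per-region setting, so repeat it in every region you use):</p>

```bash
# Enable EBS encryption by default for the current region
aws ec2 enable-ebs-encryption-by-default --region us-east-1

# Verify the setting took effect
aws ec2 get-ebs-encryption-by-default --region us-east-1

# Optionally, use a specific KMS key instead of the AWS-managed default
# (the key alias is a placeholder)
aws ec2 modify-ebs-default-kms-key-id --kms-key-id alias/my-ebs-key --region us-east-1
```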
<h3 id="heading-backing-up-ebs-volumes-ebs-snapshots">Backing up EBS Volumes: EBS Snapshots</h3>
<p>EBS snapshots are point-in-time copies of EBS volumes, used to back up and restore data. They're incremental, which means they only capture the data that has changed since the last snapshot. This makes EBS snapshots more efficient and cost-effective than full-volume backups. The first snapshot of a volume captures all of its data; each subsequent snapshot only stores the blocks that changed since the previous one.</p>
<p>Snapshots are regional resources, meaning you can use them to hold a copy of an EBS volume and, if the volume's Availability Zone fails, restore it from the snapshot in a different Availability Zone. They can also be shared across regions and AWS accounts. Here's where you can read more about <a target="_blank" href="https://blog.guilleojeda.com/automating-ebs-snapshots-for-disaster-recovery-guide?utm_source=blog&amp;utm_medium=hashnode">automating EBS snapshots for Disaster Recovery</a>.</p>
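<p>As a sketch, copying a snapshot to another region and sharing it with another account looks like this with the AWS CLI (all IDs, regions, and the account number are placeholders):</p>

```bash
# Copy a snapshot to another region for cross-region DR
aws ec2 copy-snapshot \
  --source-region us-east-1 \
  --source-snapshot-id snap-0123456789abcdef0 \
  --region us-west-2 \
  --description "DR copy of app data volume"

# Share a snapshot with another AWS account
aws ec2 modify-snapshot-attribute \
  --snapshot-id snap-0123456789abcdef0 \
  --attribute createVolumePermission \
  --operation-type add \
  --user-ids 111122223333
```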
<h2 id="heading-ebs-best-practices">EBS Best Practices</h2>
<h3 id="heading-default-to-ebs-gp3-volumes">Default to EBS GP3 Volumes</h3>
<p>You should default to GP3 volumes, and only use the other volume types if you have a specific use case, or you know you need more performance. Here's a <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/benchmark_procedures.html">guide to benchmark EBS volumes</a>.</p>
<h3 id="heading-use-ebs-optimized-instances-for-higher-performance">Use EBS-optimized instances for higher performance</h3>
<p>EC2 Instance families have a limit on performance with EBS volumes, which is independent of the EBS volume itself. If you need high performance, it may not be enough to just use a better EBS volume such as IO2. You'll also need to look into whether your EC2 instances support that level of performance, and possibly use <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html#current-storage-optimized">EC2 instances that are Storage Optimized</a>.</p>
<p>EBS performance is also limited by instance size. You can use the <code>EBSIOBalance%</code> and <code>EBSByteBalance%</code> metrics in CloudWatch to help you determine whether your instances are sized correctly. Instances with a consistently low balance percentage are probably undersized and should be increased in size, while instances whose balance percentage stays pinned at 100% are probably oversized and can be reduced in size.</p>
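<p>As an illustration, here's a toy sizing hint based on the minimum balance observed over a period (the thresholds are my own, not an AWS recommendation; the commented-out CloudWatch query sketches how you'd fetch the real metric, with a placeholder instance ID):</p>

```bash
# Fetch the worst (minimum) EBSByteBalance% over the last day
# (commented out because it needs AWS credentials; instance ID is a placeholder):
# aws cloudwatch get-metric-statistics \
#   --namespace AWS/EC2 --metric-name EBSByteBalance% \
#   --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
#   --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)" \
#   --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
#   --period 3600 --statistics Minimum

# Toy rule of thumb: interpret the minimum balance percentage
ebs_sizing_hint() {
  min_balance=$1
  if [ "$min_balance" -lt 20 ]; then
    echo "consider a larger instance"
  elif [ "$min_balance" -ge 100 ]; then
    echo "consider a smaller instance"
  else
    echo "size looks reasonable"
  fi
}

ebs_sizing_hint 10   # prints "consider a larger instance"
```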
<h3 id="heading-use-ec2-instance-store-for-extreme-performance">Use EC2 Instance Store for extreme performance</h3>
<p>If you need extreme performance, you'll need to use <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html">EC2 Instance Store</a>. It's ephemeral (that means non-permanent) block storage with a much higher performance than EBS. The main disadvantages are the pricing (you need an EC2 instance of a special family, which isn't cheap) and the fact that data is lost if the instance is stopped or terminated.</p>
<h3 id="heading-encrypt-your-ebs-volumes">Encrypt your EBS Volumes</h3>
<p>This comes at no cost and no performance hit to you, so it should be a no brainer. First, you should enable <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html#encryption-by-default">Encryption by Default</a>, so all future EBS volumes are created with encryption. Then you should encrypt existing EBS volumes by creating a snapshot of them, encrypting that snapshot and creating a new volume from the encrypted snapshot.</p>
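<p>A rough sketch of that snapshot-copy-restore sequence with the AWS CLI (all IDs and the Availability Zone are placeholders):</p>

```bash
# 1. Snapshot the unencrypted volume
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
  --description "pre-encryption snapshot"

# 2. Copy the snapshot, encrypting the copy
#    (add --kms-key-id to use a specific key)
aws ec2 copy-snapshot --source-region us-east-1 \
  --source-snapshot-id snap-0123456789abcdef0 \
  --encrypted --region us-east-1

# 3. Create a new, encrypted volume from the encrypted copy
aws ec2 create-volume --snapshot-id snap-0fedcba9876543210 \
  --availability-zone us-east-1a --volume-type gp3

# 4. Finally, detach the old volume and attach the new one in its place
```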
<h3 id="heading-migrate-gp2-volumes-to-gp3">Migrate GP2 volumes to GP3</h3>
<p>GP3 volumes can do anything that GP2 volumes can, and they're 20% cheaper. Here's a guide to <a target="_blank" href="https://aws.amazon.com/blogs/storage/migrate-your-amazon-ebs-volumes-from-gp2-to-gp3-and-save-up-to-20-on-costs/">migrate your existing GP2 volumes to GP3</a>.</p>
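<p>The migration itself is a single in-place call, with no detaching or downtime required (the volume ID is a placeholder; you can also raise IOPS and throughput in the same call):</p>

```bash
# Convert a gp2 volume to gp3 in place
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type gp3

# Track the modification until it reaches the "optimizing" or "completed" state
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0
```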
<h3 id="heading-back-up-important-ebs-volumes">Back up important EBS volumes</h3>
<p>Remember that EBS volumes are zonal resources, meaning that if an Availability Zone goes offline you won't be able to access them. Furthermore, most EBS volume types offer between 99.8% and 99.9% durability, making data loss or corruption entirely possible. To guard against that, create Snapshots of your EBS volumes.</p>
<p>Snapshots are regional, so you're good if an AZ goes down. But if you want to do cross-region disaster recovery, you'll need to export the snapshots to another AWS region. If you're exporting encrypted snapshots, use a multi-region KMS key.</p>
<p>You can <a target="_blank" href="https://blog.guilleojeda.com/automating-ebs-snapshots-for-disaster-recovery-guide?utm_source=blog&amp;utm_medium=hashnode">use Data Lifecycle Manager to automate creating and exporting EBS snapshots</a>.</p>
<p>Accessing a block for the first time on an EBS volume created from a snapshot has much higher latency than normal, because the data is lazy-loaded from S3. To avoid this, you can initialize (pre-warm) the volume before putting it in production, by reading each block once. You can do this on Linux by attaching the volume to an EC2 instance, installing the <code>fio</code> utility, and running the following command (example for a volume exposed as <code>xvdf</code>):</p>
<pre><code class="lang-bash">sudo fio --filename=/dev/xvdf --rw=<span class="hljs-built_in">read</span> --bs=1M --iodepth=32 --ioengine=libaio --direct=1 --name=volume-initialize
</code></pre>
<p>Another option is to enable <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-fast-snapshot-restore.html">EBS fast snapshot restore</a> on the snapshot. This way AWS does the initialization for you, and it's much faster. However, it costs around <strong>$540/month</strong> per snapshot, per Availability Zone it's enabled in (regardless of snapshot size), so I prefer the manual option.</p>
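<p>If you do want fast snapshot restore anyway, enabling it is one call per snapshot (the snapshot ID and AZ are placeholders; remember it's billed per snapshot, per Availability Zone):</p>

```bash
# Enable fast snapshot restore for a snapshot in a specific AZ
aws ec2 enable-fast-snapshot-restores \
  --availability-zones us-east-1a \
  --source-snapshot-ids snap-0123456789abcdef0

# Check the state of all fast snapshot restores in the region
aws ec2 describe-fast-snapshot-restores
```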
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item><item><title><![CDATA[Microservices in AWS: Migrating from a Monolith]]></title><description><![CDATA[The first rule about microservices is that you don't need microservices (for 99% of applications). They were invented as a REST-based implementation of Service-Oriented Architectures, which is an XML-based Enterprise Architecture pattern so complex t...]]></description><link>https://blog.guilleojeda.com/microservices-in-aws-migrating-from-a-monolith</link><guid isPermaLink="true">https://blog.guilleojeda.com/microservices-in-aws-migrating-from-a-monolith</guid><category><![CDATA[AWS]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[architecture]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Guillermo Ojeda]]></dc:creator><pubDate>Tue, 29 Aug 2023 01:36:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1693267409159/8a96dc1f-19d0-47dc-9a5a-d4705258d438.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The first rule about microservices is that <strong>you don't need microservices</strong> (for 99% of applications). They were invented as a REST-based implementation of Service-Oriented Architectures, which is an XML-based Enterprise Architecture pattern so complex that XML is the easiest part.</p>
<p>At some point, microservices became this really cool thing that all the cool kids were doing for street cred. "Netflix does it, so if I do it I'll be as cool as Netflix!" Folks, it doesn't work like that.</p>
<h2 id="heading-what-are-microservices">What Are Microservices?</h2>
<p>A microservice is a service in a software application which encapsulates a bounded context (including the data), can be built, deployed and scaled independently, and exposes functionality through a clearly defined API.</p>
<p>Let's expand a bit on each characteristic:</p>
<ul>
<li><p><strong>Bounded context:</strong> The concept stems from Domain-Driven Design's <a target="_blank" href="https://martinfowler.com/bliki/BoundedContext.html">bounded contexts</a>, and essentially advocates for dividing the entire domain into several smaller domains. Microservices takes it a step further: each microservice is part of only one bounded context, and owns that context, including all domain entities, all data, and all operations and functionality. Anything that needs to access that bounded context needs to do so through the microservice's API. Conceptually, it's similar to encapsulation in Object-Oriented Programming, but on a higher level.</p>
</li>
<li><p><strong>Built, deployed and scaled independently:</strong> Each microservice is independent in every sense. It can be built by a separate team using different technologies, it has its own deployment pipeline and process, and it can be scaled independently of the rest of the system. This provides a clear separation between what the microservice does and how it does it, and gives you a lot of flexibility on that how.</p>
</li>
<li><p><strong>Clearly defined API:</strong> One microservice can't solve everything, or it would be just a monolith. You need several, and you need to combine them to realize the entire system's functionality. That means, you need a clear and unambiguous way to communicate with a microservice. The API is the interface of the microservice, and it's the only thing that other components of the system can access. This minimizes dependencies between microservices, letting them be implemented and evolved separately, so long as they adhere to their API. Keep in mind that, since each microservice owns its data, the only way to access the data of a microservice is through that microservice's API. No reading directly from another microservice's database!</p>
</li>
</ul>
<h2 id="heading-why-use-microservices">Why Use Microservices?</h2>
<p>Microservices exist to solve a specific problem: problems in complex domains require complex solutions, which become unmanageable due to the size and complexity of the domain itself. Microservices (when done right) split that complex domain into simpler domains, encapsulating the complexity and reducing the scope of changes.</p>
<p>Microservices also add complexity to the solution, because now you need to figure out where to draw the boundaries of the domains, and how the microservices interact with each other, both at the domain level (complex actions that span several microservices) and at the technical level (service discovery, networking, permissions).</p>
<p><strong>So, when do you need microservices? When the reduction in complexity of the domain outweighs the increase in complexity of the solution.</strong></p>
<p><strong>When do you not need microservices? When the domain is not that complex.</strong> In that case, use regular services, where the only split is in the behavior (i.e. backend code). Or stick with a monolith, Facebook does that and it works pretty well, at a size we can only dream of.</p>
<h2 id="heading-types-of-microservices">Types of Microservices</h2>
<p>There are two ways in which you can split your application into microservices:</p>
<ul>
<li><p><strong>Vertical slices:</strong> Each microservice solves a particular use case or a set of tightly-related use cases. You add services as you add features, and each user interaction goes through the minimum possible number of services (ideally only 1). This means features are an aspect of decomposition. Code reuse is achieved through shared libraries, and cross-service responsibilities are implemented on support microservices. This results in architectures very similar to SOA.</p>
</li>
<li><p><strong>Functional services:</strong> Each service handles one particular step, integration, state, or thing. System behavior is an emergent property, resulting from combining different services in different ways. Each user interaction invokes multiple services. New features don't need entirely new services, just new combinations of services. Features are an aspect of integration, not decomposition. Code reuse is often achieved through invoking another service. This is often much harder to do, both because of the difficulty in translating use cases into reusable steps, and because you need a lot of complex distributed transactions.</p>
</li>
</ul>
<p>Overall, vertical slices are easier to understand, and easier to implement for smaller systems. The drawback is that if your system does 200 different things, you'll need 200 services, plus support services and libraries. Functional services are harder to conceptualize, and it's not uncommon to end up with a ton of microservices that have 50 lines of code and don't own any data. If that's your case, you're doing it wrong. Remember that the split should be at the domain level, not at the code level. It's perfectly ok for a microservice to be implemented with several services!</p>
<p>Don't combine these two types of microservices! If you're doing vertical slices, support microservices should be only for non-business behavior, such as logging. If you're doing functional microservices, don't create a service that just orchestrates calls between other microservices; either use an orchestrator for all transactions, or choreograph them. And don't even think about migrating from one type of microservices to the other one. It's much, much easier to just drop the whole system and start from scratch.</p>
<h2 id="heading-splitting-a-monolith-into-microservices">Splitting a Monolith into Microservices</h2>
<p>Let's see microservices in a real example. Picture the following scenario: We have an online learning platform built as a monolithic application, which enables users to browse and enroll in a variety of courses, access course materials such as videos, quizzes, and assignments, and track their progress throughout the courses. The application is <a target="_blank" href="https://newsletter.simpleaws.dev/p/migrate-nodejs-app-from-ec2-to-scalable-ecs-guide?utm_source=blog&amp;utm_medium=hashnode">deployed on Amazon ECS</a> as a single service that's scalable and highly available.</p>
<p>As the app grew, we've noticed that content delivery becomes a bottleneck during normal operations. Additionally, changes in the course directory resulted in some bugs in progress tracking. To deal with these issues, we decided to split the app into three microservices: Course Catalog, Content Delivery, and Progress Tracking.</p>
<p><strong>Out of scope (so we don't lose focus):</strong></p>
<ul>
<li><p><strong>Authentication/authorization:</strong> When I say “users” I mean authenticated users. We could <a target="_blank" href="https://newsletter.simpleaws.dev/p/securing-microservices-aws-cognito?utm_source=blog&amp;utm_medium=hashnode">use Cognito to secure access to microservices</a>, but let's focus on designing the microservices first.</p>
</li>
<li><p><strong>User registration and management:</strong> Same as above.</p>
</li>
<li><p><strong>Payments:</strong> Since our courses are so awesome, we should charge for them. We could use a separate microservice that integrates with a payment processor such as Stripe.</p>
</li>
<li><p><strong>Caching and CDN:</strong> We should use CloudFront to cache the content, to reduce latency and costs. We'll do that in a future issue, let's focus on the microservices right now.</p>
</li>
<li><p><strong>Frontend:</strong> Obviously, we need a frontend for our app. Let's keep the focus on the microservices, but if you're interested in frontend you might want to check out <a target="_blank" href="https://newsletter.simpleaws.dev/p/aws-amplify-server-side-rendering?utm_source=blog&amp;utm_medium=hashnode">AWS Amplify</a>.</p>
</li>
<li><p><strong>Database design:</strong> Let's assume our database is properly designed. If you're interested in this topic, you should read <a target="_blank" href="https://newsletter.simpleaws.dev/p/dynamodb-database-design?utm_source=blog&amp;utm_medium=hashnode">DynamoDB Database Design</a>.</p>
</li>
<li><p><strong>Admin:</strong> Someone has to create the courses, upload the content, course metadata, etc. The operations to do that fall under the scope of our microservices, but I feared it would grow too complex, so I cut those features out.</p>
</li>
</ul>
<p><strong>AWS Services involved:</strong></p>
<ul>
<li><p><strong>ECS:</strong> Our app is already deployed in ECS as a single ECS Service, we're going to split it into 3 microservices and deploy each as an ECS Service. We won't dive deep into ECS, but if you're interested you can learn about <a target="_blank" href="https://newsletter.simpleaws.dev/p/migrate-nodejs-app-from-ec2-to-scalable-ecs-guide?utm_source=blog&amp;utm_medium=hashnode">how to deploy a Node.js application on ECS</a>.</p>
</li>
<li><p><strong>DynamoDB:</strong> Our database for this example.</p>
</li>
<li><p><strong>API Gateway:</strong> Used to expose each microservice.</p>
</li>
<li><p><strong>Elastic Load Balancer:</strong> To balance traffic across all the tasks.</p>
</li>
<li><p><strong>S3:</strong> Storage for the content (video files) of the courses.</p>
</li>
<li><p><strong>ECR:</strong> A Docker registry managed by AWS.</p>
</li>
</ul>
<p>Final design of the app split into microservices</p>
<h2 id="heading-how-to-split-a-monolith-into-microservices">How to Split a Monolith Into Microservices</h2>
<h3 id="heading-step-0-make-the-monolith-modular"><strong>Step 0: Make the Monolith Modular</strong></h3>
<p>The first step should always be to make sure your monolith is already separated into modules with clearly defined responsibilities. Modules should be well scoped, both in terms of functionality and in the code that implements that functionality. They should be cohesive, and loosely coupled to other modules. The level of granularity doesn't matter much, though ideally you'd be splitting modules according to the concept of domains from Domain-Driven Design (you don't need to apply the entirety of Domain-Driven Design). However, you can refine the scope and granularity when you start with microservices. For now, what's important is that you have clearly defined modules with clearly defined responsibilities, instead of a bowl of spaghetti code.</p>
<p>For this example we're going to assume this is already the case, but if you're dealing with a monolith that's not well modularized, that should be the first thing you do. If you commit all the way to microservices, you won't really use the modular monolith. However, I still recommend you first work on separating it into modules, to make the overall process easier by tackling one thing at a time.</p>
<h3 id="heading-step-1-identify-the-microservices"><strong>Step 1: Identify the Microservices</strong></h3>
<p>Start by analyzing the monolithic application, focusing on the course catalog, content delivery, and progress tracking functionalities. Based on these functionalities, outline the responsibilities for each microservice:</p>
<ul>
<li><p><strong>Course Catalog:</strong> manage courses and their metadata.</p>
</li>
<li><p><strong>Content Delivery:</strong> handle storage and distribution of course content.</p>
</li>
<li><p><strong>Progress Tracking:</strong> manage user progress through courses.</p>
</li>
</ul>
<h3 id="heading-step-2-define-the-apis-for-each-microservice"><strong>Step 2: Define the APIs for each microservice</strong></h3>
<p>Once you understand what each microservice needs to do, you need to design the API endpoints for each microservice:</p>
<ul>
<li><p>Course Catalog:</p>
<ul>
<li><p><code>GET /courses</code> → list all courses</p>
</li>
<li><p><code>GET /courses/:id</code> → get a specific course</p>
</li>
</ul>
</li>
<li><p>Content Delivery:</p>
<ul>
<li><code>GET /content/:id</code> → get a pre-signed URL for a specific course content</li>
</ul>
</li>
<li><p>Progress Tracking:</p>
<ul>
<li><p><code>GET /progress/:userId</code> → get a user's progress</p>
</li>
<li><p><code>PUT /progress/:userId/:courseId</code> → update a user's progress for a specific course</p>
</li>
</ul>
</li>
</ul>
<p>API endpoints are how microservices define and expose their functionality to external components. Essentially, the API is what a microservice can do for the user or for other microservices. We already knew the responsibilities of each microservice from Step 1, with this step we're expressing them in technical terms that other components can understand. We're also documenting them in a clear and unambiguous way.</p>
<p>If you're starting from a well-designed modular monolith, these APIs already exist as the APIs for services and interfaces for components, and you're just re-expressing them in a different, unified way. If the starting monolith isn't well modularized, you may find some of these APIs as functions, and you may need to add a few. In those cases it's easier to first modularize the monolith, then split it into microservices.</p>
<p>API design is really important, and hard to do. We're not just splitting the entire app's responsibilities into groups that we call microservices. We're actually creating several apps, that we're then going to interconnect to produce the expected system behavior. We need to not only define those apps' responsibilities well, but also design them in a maintainable way. Check out <a target="_blank" href="https://www.martinfowler.com/articles/consumerDrivenContracts.html"><strong>Fowler's post on consumer-driven contracts</strong></a> for some deeper insights.</p>
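<p>To make the endpoints above concrete, here's what calling them might look like once they're behind API Gateway (the base URLs and the request body are placeholders of my own, not from the actual app):</p>

```bash
# Course Catalog: list courses, then fetch one
curl https://catalog.example.com/courses
curl https://catalog.example.com/courses/42

# Content Delivery: returns a pre-signed S3 URL for the content
curl https://content.example.com/content/42

# Progress Tracking: read and update a user's progress
curl https://progress.example.com/progress/user-1
curl -X PUT https://progress.example.com/progress/user-1/42 \
  -H "Content-Type: application/json" \
  -d '{"completedLessons": 3}'
```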
<h3 id="heading-step-3-configure-api-gateway-for-each-microservice"><strong>Step 3: Configure API Gateway for each microservice</strong></h3>
<p>Create an API in API Gateway for each microservice (Course Catalog, Content Delivery, and Progress Tracking). Point the different routes to your monolith's APIs for now, since we don't have any microservices yet. Update any frontend code or DNS records to resolve to the API Gateways.</p>
<p>This isn't a strict requirement, but I added it as part of the solution because it makes the switchover much easier: All we need to do is update the API Gateway of each microservice to point to the newly deployed microservice. Since everything else already depends on that API Gateway for that functionality, we're just changing who's resolving those requests. This way, we've effectively decoupled the API from its implementation. API Gateway also makes other things much easier, such as <a target="_blank" href="https://newsletter.simpleaws.dev/p/securing-microservices-aws-cognito?utm_source=blog&amp;utm_medium=hashnode">managing authentication for microservices</a>.</p>
<h3 id="heading-step-4-create-separate-repositories-and-projects-for-each-microservice"><strong>Step 4: Create separate repositories and projects for each microservice</strong></h3>
<p>Set up individual repositories and Node.js projects for Course Catalog, Content Delivery, and Progress Tracking microservices. Structure the projects using best practices, with separate folders for routes, controllers, and database access code. You know the drill.</p>
<p>This is just the scaffolding, moving the actual code comes in the next step. The key takeaway is that you treat each microservice as a separate project. You could also use a monorepo, where the whole codebase is in a single git repository, each service has its own folder, and it's still deployed separately. This works well when you have a lot of shared dependencies, but in my experience it's harder to pull off.</p>
<h3 id="heading-step-5-separate-the-code"><strong>Step 5: Separate the code</strong></h3>
<p>Refactor the monolithic application code, moving the relevant functionality for each microservice into its respective project:</p>
<ul>
<li><p>Move the code related to managing courses and their metadata into the Course Catalog microservice project.</p>
</li>
<li><p>Move the code related to handling storage and distribution of course content into the Content Delivery microservice project.</p>
</li>
<li><p>Move the code related to managing user progress through courses into the Progress Tracking microservice project.</p>
</li>
</ul>
<p>The code in the monolith may not be as clearly separated as you might want. In that case, first refactor as needed until you can copy-paste the implementation code from your monolith to your services (but don't copy it just yet). Then test the refactor. Finally, do the copy-pasting.</p>
<h3 id="heading-step-6-separate-the-data"><strong>Step 6: Separate the data</strong></h3>
<p>First, create separate Amazon DynamoDB tables for each microservice:</p>
<ul>
<li><p><strong>CourseCatalog:</strong> stores course metadata, such as title, description, and content ID.</p>
</li>
<li><p><strong>Content:</strong> stores content metadata, including content ID, content type, and S3 object key.</p>
</li>
<li><p><strong>Progress:</strong> stores user progress, with fields for user ID, course ID, and progress details.</p>
</li>
</ul>
<p>Then update the database access code and configurations in each microservice, so each one interacts with its own table.</p>
<p>Remember that the difference between a service and a <em>micro</em>service is the <a target="_blank" href="https://martinfowler.com/bliki/BoundedContext.html">bounded context</a>. Each microservice owns its domain model, including the data, and the only way to access that model (and the database that stores it) is through that microservice's API.</p>
<p>We could implement this separation of data at the conceptual level, without enforcing it through separate tables. We could even enforce it while keeping all data in a single table, using IAM's fine-grained access control for DynamoDB (policy conditions that limit which keys and attributes each service can read). The problem with that idea (aside from the permissions nightmare) is that we wouldn't be able to scale the services independently, since DynamoDB capacity is managed per table.</p>
<p>If you're doing this for a database which already has data, but you can tolerate the system being offline during the migration, you can <a target="_blank" href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/S3DataExport.HowItWorks.html">export the data to S3</a>, use <a target="_blank" href="https://aws.amazon.com/glue/">Glue</a> to filter the data, and then <a target="_blank" href="https://aws.amazon.com/blogs/database/amazon-dynamodb-can-now-import-amazon-s3-data-into-a-new-table/">import it back to DynamoDB</a>.</p>
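<p>As a sketch, the export-to-S3 path starts like this with the AWS CLI (the export requires point-in-time recovery enabled on the table; the table name, ARN, account ID, and bucket are placeholders):</p>

```bash
# Enable point-in-time recovery, a prerequisite for table exports
aws dynamodb update-continuous-backups \
  --table-name Monolith \
  --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true

# Export the table to S3 without consuming read capacity
aws dynamodb export-table-to-point-in-time \
  --table-arn arn:aws:dynamodb:us-east-1:111122223333:table/Monolith \
  --s3-bucket my-export-bucket
```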
<p>If the system is live, this step gets trickier. Here's how you can split a DynamoDB table with minimal downtime:</p>
<ul>
<li><p>First, add a timestamp to your data if you don't have one already.</p>
</li>
<li><p>Next, create the new tables.</p>
</li>
<li><p>Then set up <a target="_blank" href="https://newsletter.simpleaws.dev/p/dynamodb-streams-reacting-to-changes?utm_source=blog&amp;utm_medium=hashnode">DynamoDB Streams</a> to replicate all future writes to the new table. You'll need to set one stream per microservice. It's easier if you set it up to copy all the data and after the switchover you delete the irrelevant data. But if you're performing a lot of writes, it will be cheaper to selectively copy only the data that belongs to the microservice.</p>
</li>
<li><p>Then copy the old data, either with a script or with an S3 export + Glue (don't use the DynamoDB import, it only works for new tables, write the data manually instead). Make sure this can handle duplicates.</p>
</li>
<li><p>Finally, switch over to the new tables.</p>
</li>
</ul>
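<p>The stream consumer's routing logic can stay tiny if the monolith's table prefixes its partition keys by entity type. That's an assumption on my part (the article doesn't describe the key schema), but under it the routing rule is just:</p>

```bash
# Decide which microservice table a replicated item belongs to,
# based on a hypothetical partition-key prefix convention (COURSE#, CONTENT#, PROGRESS#)
route_table() {
  case "$1" in
    COURSE#*)   echo "CourseCatalog" ;;
    CONTENT#*)  echo "Content" ;;
    PROGRESS#*) echo "Progress" ;;
    *)          echo "unknown" ;;
  esac
}

route_table "COURSE#42"   # prints "CourseCatalog"
```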
<p>I picked DynamoDB for this example because DynamoDB tables are easy to create and manage (other than designing the data model). In a relational database we would need to consider the tradeoff between having to manage (and pay for) one DB cluster per microservice, or having different databases in the same cluster. The latter is definitely cheaper, but it can get harder to manage permissions, and we lose the ability to scale the data stores independently. Aurora Serverless is a viable alternative; it scales very similarly to DynamoDB in Provisioned Mode. However, it's 4x more expensive than serverful Aurora.</p>
<h3 id="heading-step-7-build-and-deploy-the-microservices"><strong>Step 7: Build and Deploy the Microservices</strong></h3>
<p>We're using ECS for this example, just so we can focus on the microservices part, instead of debating over how to deploy an app. These are the steps to deploy in ECS, which you'll need to do separately for each microservice:</p>
<ul>
<li><p>Write a Dockerfile specifying the base image, copying the source code, installing packages, and setting the appropriate entry point. Test this, obviously.</p>
</li>
<li><p>Build and push the Docker image to an Amazon Elastic Container Registry (ECR) registry. You'll use a separate registry for each microservice (remember they're separate apps).</p>
</li>
<li><p>Create a Task Definition in Amazon ECS, specifying the required CPU, memory, environment variables, and the ECR image URL.</p>
</li>
<li><p>Create an ECS service, associating it with the corresponding Task Definition. Make sure this is working properly.</p>
</li>
</ul>
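<p>For each microservice, the build-and-push loop looks roughly like this (the account ID, region, and repository name are placeholders):</p>

```bash
# Create the microservice's own ECR repository
aws ecr create-repository --repository-name course-catalog

# Authenticate Docker against the registry
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS \
      --password-stdin 111122223333.dkr.ecr.us-east-1.amazonaws.com

# Build, tag, and push the image
docker build -t course-catalog .
docker tag course-catalog:latest \
  111122223333.dkr.ecr.us-east-1.amazonaws.com/course-catalog:latest
docker push 111122223333.dkr.ecr.us-east-1.amazonaws.com/course-catalog:latest
```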
<p>I don't want to dive too deep into how to deploy an app to ECS. If you're not sure how to do it, <a target="_blank" href="https://newsletter.simpleaws.dev/p/migrate-nodejs-app-from-ec2-to-scalable-ecs-guide?utm_source=blog&amp;utm_medium=hashnode">here's an article I wrote about it</a>.</p>
<h3 id="heading-step-8-update-api-gateway"><strong>Step 8: Update API Gateway</strong></h3>
<p>For each API in API Gateway, you'll need to update the routes to point to the newly deployed microservice, instead of to the monolith. First do it on a testing stage, even if you already ran everything in a separate dev environment. Then configure a <a target="_blank" href="https://docs.aws.amazon.com/apigateway/latest/developerguide/canary-release.html">canary release</a>, and let the microservice gradually take traffic.</p>
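<p>A sketch of that canary rollout with the AWS CLI (the REST API ID and stage name are placeholders):</p>

```bash
# Deploy the microservice-backed routes as a canary taking 10% of traffic
aws apigateway create-deployment \
  --rest-api-id a1b2c3d4e5 \
  --stage-name prod \
  --canary-settings percentTraffic=10

# Dial the canary up as confidence grows
aws apigateway update-stage \
  --rest-api-id a1b2c3d4e5 \
  --stage-name prod \
  --patch-operations op=replace,path=/canarySettings/percentTraffic,value=50
```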
<p>You might want to preemptively scale the microservice way beyond the expected capacity requirement. One hour of overprovisioning will cost you a lot less than angry customers.</p>
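<p>The canary configuration itself is tiny. As a sketch with boto3 (the API ID and stage name are made-up placeholders, and the call is commented out because it needs a real API Gateway REST API):</p>

```python
# Canary release sketch: deploy to the prod stage, but route only
# 5% of traffic to the new deployment at first, then ramp up.
canary_settings = {
    "percentTraffic": 5.0,   # start small, watch your metrics, increase gradually
    "useStageCache": False,
}

# With a real API, the deployment call looks like:
# import boto3
# apigw = boto3.client("apigateway")
# apigw.create_deployment(
#     restApiId="a1b2c3d4e5",   # placeholder
#     stageName="prod",
#     canarySettings=canary_settings,
# )

print(canary_settings)
```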
<h2 id="heading-user-interaction-in-the-monolith-vs-in-microservices">User Interaction in the Monolith vs in Microservices</h2>
<p>Here's the journey for a user viewing a course in our monolith:</p>
<ol>
<li><p>The user sends a login request with their credentials to the monolithic application.</p>
</li>
<li><p>The application validates the credentials and, if valid, generates an authentication token for the user.</p>
</li>
<li><p>The user sends a request to view a course, including the authentication token in the request header.</p>
</li>
<li><p>The application checks the authentication token and retrieves the course details from the Courses table in DynamoDB.</p>
</li>
<li><p>The application retrieves the course content metadata from the Content table in DynamoDB, including the S3 object key.</p>
</li>
<li><p>Using the S3 object key, the application generates a pre-signed URL for the course content from Amazon S3.</p>
</li>
<li><p>The application responds with the course details and the pre-signed URL for the course content.</p>
</li>
<li><p>The user's browser displays the course details and loads the course content using the pre-signed URL.</p>
</li>
</ol>
<p>And here's the same functionality in our microservices:</p>
<ol>
<li><p>The user sends a login request with their credentials to the authentication service (not covered in the previous microservices example).</p>
</li>
<li><p>The authentication service validates the credentials and, if valid, generates an authentication token for the user.</p>
</li>
<li><p>The user sends a request to view a course, including the authentication token in the request header, to the Course Catalog microservice through API Gateway.</p>
</li>
<li><p>The Course Catalog microservice checks the authentication token and retrieves the course details from its Course Catalog table in DynamoDB.</p>
</li>
<li><p>The Course Catalog microservice responds with the course details.</p>
</li>
<li><p>The user's browser sends a request to access the course content, including the authentication token in the request header, to the Content Delivery microservice through API Gateway.</p>
</li>
<li><p>The Content Delivery microservice checks the authentication token and retrieves the course content metadata from its Content table in DynamoDB, including the S3 object key.</p>
</li>
<li><p>Using the S3 object key, the Content Delivery microservice generates a pre-signed URL for the course content from Amazon S3.</p>
</li>
<li><p>The Content Delivery microservice responds with the pre-signed URL for the course content.</p>
</li>
<li><p>The user's browser displays the course details and loads the course content using the pre-signed URL.</p>
</li>
</ol>
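<p>Putting steps 7 through 9 together, the Content Delivery microservice's handler boils down to something like this sketch. Everything here is hypothetical: the token check, the metadata lookup, and the pre-signing are stubs standing in for real auth, DynamoDB, and S3 calls, just to show the shape of the flow:</p>

```python
# Sketch of the Content Delivery handler (steps 7-9 above).
# All helpers are stubs standing in for real auth, DynamoDB, and S3 calls.

VALID_TOKENS = {"token-123": "user-1"}   # stand-in for real token validation
CONTENT_TABLE = {                        # stand-in for the Content DynamoDB table
    "course-101": {"s3_key": "courses/101/video1.mp4"},
}

def check_token(token):
    return VALID_TOKENS.get(token)

def get_content_metadata(course_id):
    return CONTENT_TABLE.get(course_id)

def presign(s3_key):
    # stand-in for s3.generate_presigned_url(...)
    return f"https://course-content.s3.amazonaws.com/{s3_key}?X-Amz-Signature=stub"

def get_course_content(token, course_id):
    if check_token(token) is None:
        return {"status": 401, "error": "invalid token"}
    metadata = get_content_metadata(course_id)
    if metadata is None:
        return {"status": 404, "error": "course not found"}
    return {"status": 200, "url": presign(metadata["s3_key"])}

print(get_course_content("token-123", "course-101"))
```

<p>Note that the service only touches its own table; course details live with the Course Catalog microservice, and the browser stitches the two responses together.</p>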
<h2 id="heading-best-practices-for-microservices-on-aws">Best Practices for Microservices on AWS</h2>
<h3 id="heading-operational-excellence">Operational Excellence</h3>
<ul>
<li><p><strong>Centralized logging:</strong> You're basically running 3 apps. Store the logs in the same place, such as CloudWatch Logs (which ECS automatically configures for you).</p>
</li>
<li><p><strong>Distributed tracing:</strong> These three services don't call each other, but in a real microservices app it's a lot more common for that to happen. In those cases, following the trail of calls becomes rather difficult. <a target="_blank" href="https://newsletter.simpleaws.dev/p/using-aws-xray-observability-eventdriven-architectures?utm_source=blog&amp;utm_medium=hashnode">Use X-Ray</a> to make it a lot simpler.</p>
</li>
</ul>
<h3 id="heading-security">Security</h3>
<ul>
<li><p><strong>Least privilege:</strong> It's not enough to simply not write code that accesses another service's data; you should also enforce that boundary via IAM permissions. Each microservice should use its own IAM role that grants access only to its own DynamoDB table, not <code>*</code>.</p>
</li>
<li><p><strong>Networking:</strong> If a service doesn't need network visibility, it shouldn't have it. Enforce it with security groups.</p>
</li>
<li><p><strong>Zero trust:</strong> The idea is to not trust agents just because they're inside your network, but to authenticate every request at every stage. Exposing your services through API Gateway gives you an easy way to do this. Yes, you should do this even when exposing them to other services.</p>
</li>
</ul>
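<p>To make the least-privilege point concrete, here's roughly what a policy for the Course Catalog service's role could look like. The account ID, region, and table name are placeholders; the important part is that the resource is one specific table, not <code>*</code>:</p>

```python
import json

# Hypothetical least-privilege policy for the Course Catalog service:
# it can read and write its own table, and nothing else.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "dynamodb:GetItem",
            "dynamodb:PutItem",
            "dynamodb:UpdateItem",
            "dynamodb:Query",
        ],
        # One specific table ARN, not "*" (account ID is a placeholder)
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/CourseCatalog",
    }],
}
print(json.dumps(policy, indent=2))
```

<p>If the Content Delivery service's code ever tries to read this table, it fails with an AccessDenied error instead of silently coupling the two services.</p>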
<h3 id="heading-reliability">Reliability</h3>
<ul>
<li><p><strong>Circuit breakers:</strong> User calls Service A, Service A calls Service B, Service B fails, the failure cascades, everything fails, your car is suddenly on fire (just go with it), your boss is suddenly on fire (is that a bad thing?), everything is on fire. Circuit breakers act exactly like the electric versions: They prevent a failure in one component from affecting the whole system. <a target="_blank" href="https://www.martinfowler.com/bliki/CircuitBreaker.html">I'll let Fowler explain</a>.</p>
</li>
<li><p><strong>Consider different scaling speeds:</strong> If Service A depends on Service B, consider that Service B scales independently, which could mean that instances of Service B are not started as soon as Service A gets a request. Service B could be implemented in a different platform (EC2 Auto Scaling vs Lambda), which scales at a different speed. Keep that in mind for service dependencies, and <a target="_blank" href="https://newsletter.simpleaws.dev/p/using-sns-decouple-components?utm_source=blog&amp;utm_medium=hashnode">decouple the services</a> when you can.</p>
</li>
</ul>
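<p>A minimal circuit breaker can be sketched in a few lines of Python. This is a toy version (real implementations like the one Fowler describes also add a half-open state and a reset timeout); the failure threshold here is arbitrary:</p>

```python
# Toy circuit breaker: after `threshold` consecutive failures, stop
# calling the downstream service and fail fast instead.
class CircuitBreaker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def call(self, fn, *args, **kwargs):
        if self.open:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the count
        return result

# Simulate Service B failing repeatedly:
breaker = CircuitBreaker(threshold=3)

def flaky():
    raise ConnectionError("Service B is down")

for _ in range(3):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

print(breaker.open)  # the breaker has tripped; callers now fail fast
```

<p>Once the breaker is open, Service A returns an error (or a fallback) immediately instead of piling up requests against a dead Service B, which is what stops the cascade.</p>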
<h3 id="heading-performance-efficiency">Performance Efficiency</h3>
<ul>
<li><p><strong>Scale services independently:</strong> Your microservices are so independent that even their databases are independent! You know what that means? You can scale them at will!</p>
</li>
<li><p><strong>Rightsize ECS tasks:</strong> Now that you split your monolith, it's time to check the resource usage of each microservice, and fine-tune them independently.</p>
</li>
<li><p><strong>Rightsize DynamoDB tables:</strong> Same as above, for the database tables.</p>
</li>
</ul>
<h3 id="heading-cost-optimization">Cost Optimization</h3>
<ul>
<li><p><strong>Optimize capacity:</strong> Determine how much capacity each service needs, and optimize for it. Get a savings plan for the baseline capacity.</p>
</li>
<li><p><strong>Consider different platforms:</strong> Different microservices have different needs. A user-facing microservice might need to scale really fast, at the speed of Fargate or Lambda. A service that only processes asynchronous transactions, such as a payments-processing service, probably doesn't need to scale as fast, and can get away with an Auto Scaling Group (which is cheaper per compute time). A batch processing service could even use Spot Instances! Every service is independent, so don't limit yourself.</p>
</li>
<li><p><strong>Consider the increased management efforts:</strong> It's easier (thus cheaper) to manage 10 Lambda functions than to manage 5 Lambda functions, 1 ECS cluster and 2 Auto Scaling Groups.</p>
</li>
</ul>
<hr />
<p>Stop copying cloud solutions, start <strong>understanding</strong> them. Join over 45,000 devs, tech leads, and experts learning how to architect cloud solutions, not pass exams, with the <a target="_blank" href="https://newsletter.simpleaws.dev?utm_source=blog&amp;utm_medium=hashnode">Simple AWS newsletter</a>.</p>
<ul>
<li><p><strong>Real</strong> scenarios and solutions</p>
</li>
<li><p>The <strong>why</strong> behind the solutions</p>
</li>
<li><p><strong>Best practices</strong> to improve them</p>
</li>
</ul>
<p><a target="_blank" href="https://newsletter.simpleaws.dev/subscribe?utm_source=blog&amp;utm_medium=hashnode">Subscribe for free</a></p>
<iframe src="https://embeds.beehiiv.com/1c90a8a9-57b7-4a3f-ac56-5f05d0121f72?slim=true" style="margin:0;border-radius:0px;background-color:transparent;display:block;margin-left:auto;margin-right:auto" height="55px"></iframe>

<p>If you'd like to know more about me, you can find me <a target="_blank" href="https://www.linkedin.com/in/guilleojeda/">on LinkedIn</a> or at <a target="_blank" href="https://www.guilleojeda.com?utm_source=blog&amp;utm_medium=hashnode">www.guilleojeda.com</a></p>
]]></content:encoded></item></channel></rss>