Guillermo Ojeda
Cloudy Things: How to build on AWS


AWS Solutions Architect Professional exam notes

Dec 11, 2022


I set my sights on the SA Pro cert a while ago, but for multiple reasons I couldn't find the time to sit down and study until early this year. On April 6th I finally sat the exam and passed with a score of 859. Here's my account of how I prepared for it, what the exam felt like, and a ton of notes I took about small technical details that can make a difference in a question.

While I had some experience as a freelance architect and AWS Authorized Instructor, the past year saw me working a lot with code and GCP, and barely even touching AWS, so I knew I needed a full course that would help me remember the basics (in case I had forgotten anything) and also level up on the advanced stuff. I chose Adrian Cantrill's AWS Certified Solutions Architect - Professional course for that, and it was excellent, though quite long.

It took me over a month and a half to go over Adrian's course, but after that I felt in a pretty good place, with his excellent lessons and demos. However, I knew something must be lacking, from my memory if not from the course, so I signed in to AWS SkillBuilder and found the Exam Readiness: AWS Certified Solutions Architect – Professional course. It says 4 hours, but I think you should take at least 6, because while the course doesn't give you any new knowledge, it helps you a lot to reflect on what you're missing and identify your weaknesses, and that's what's going to drive your next steps.

My weaknesses weren't concentrated in a single area; those I had identified earlier and covered by re-watching Adrian's lessons as many times as necessary (I think I watched the Direct Connect ones 4 or 5 times). Instead of not knowing one service or one kind of solution, my weaknesses were scattered all over the place: not in the general aspects, but in the smallest details that mattered.

Some of the not so small details:

  • If you're connecting Direct Connect to a VPC without a VPN, should you use a public or private VIF? What about when using site-to-site VPN? Answer: private when going to the VPC directly, public when using a VPN because Site-to-Site VPN is a public service (i.e. not in a VPC, same as S3 for example).

  • Is Kinesis Firehose able to stream data in real time? Answer: No, it has a 60-second latency, and is considered near-real time, NOT real time.

Some of the much smaller ones:

  • In ALB, can you associate multiple SSL certificates with the same listener? If so, how will the listener choose the correct certificate? Answer: Yes, and the listener automatically chooses the correct cert using SNI.

  • Is data ordered in a Kinesis Data Stream? Answer: Yes within a shard, but not across multiple shards.

  • In SQS with a retention period of 7 days, if a message is moved to the DLQ 5 days after being enqueued, when will it be deleted? Answer: In 2 days, because the retention period checks the enqueue timestamp, which is unchanged when a message is moved to the DLQ.
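That retention detail is easy to verify with a bit of date arithmetic. A quick sketch (the dates and the helper name are mine, purely for illustration):

```python
from datetime import datetime, timedelta

def sqs_deletion_time(enqueued_at: datetime, retention: timedelta) -> datetime:
    """SQS deletes a message when its ORIGINAL enqueue timestamp ages out;
    moving the message to a DLQ does not reset that timestamp."""
    return enqueued_at + retention

enqueued = datetime(2022, 4, 1)              # message enters the source queue
moved_to_dlq = enqueued + timedelta(days=5)  # moved to the DLQ 5 days later
deleted = sqs_deletion_time(enqueued, timedelta(days=7))

# Time left once it's in the DLQ: 7 - 5 = 2 days
assert deleted - moved_to_dlq == timedelta(days=2)
```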

So I knew I was lacking, but I didn't even know the questions that I should seek answers to. I tried going to the FAQs, but let me tell you, those are SUPER LONG and full of A TON of info that's probably not relevant to the exam (though at the professional level you should assume everything is relevant). After about half an hour of just reading the FAQs and getting terribly bored, I went online to search for practice exams, so I could make my own mistakes and learn that way. I found the AWS Certified Solutions Architect Professional Practice Exams 2022 on TutorialsDojo and purchased that.

On a brief note, TutorialsDojo's practice exams are NOT excellent, but they are good enough. Most answers are correct and the explanations are pretty good. A few of them are a bit more questionable, and I found one or three that were ridiculous or outright technically impossible. Still, one or three among 375 (4 practice exams + 1 final exam) is good enough. Just keep in mind that, when in doubt, you should look up the documentation and try to find the correct answer by yourself. If anyone knows about better practice exams, let me know and I'll add them to the recommendations.

At this point, doing practice exams is by far the best thing that you can do, in my opinion. Making your own mistakes (TutorialsDojo does tell you which questions you got right or wrong, what the correct answer is and why) really helps you recall those small details that make a difference. Plus, you can do half of an exam, or just 10 questions, whenever you have the time. I do recommend doing at least one or two full, timed exams, but you don't have to choose between a 3-hour study session and nothing at all: if all you have is 30 minutes, it's better to answer 5 or 10 questions than to do nothing. Also, write everything down, so you can go over your notes later.

Another huge thing about practice exams is that you get to practice timing yourself. You get 180 minutes for 75 questions, which is 2 minutes and 24 seconds per question. If that doesn't sound like much, it's because it isn't. Most questions are very long, much longer than in the SA Associate exam, and the correct answer often depends on a word or two. You'll find yourself scanning through answers 4 or 5 lines long that seem exactly the same, until you find the difference: a private subnet vs a public subnet, for example.

Other times what seems to be the best answer actually has a detail that means it won't work. For example, one answer might describe setting up the application in a private subnet and adding an interface VPC endpoint to access DynamoDB, while the other talks about putting the application in a public subnet and accessing DynamoDB through the internet. It should be obvious to you that it's possible to access DynamoDB either through a VPC endpoint or through the internet, and that a VPC endpoint is much preferable (cost, security, latency, etc.), so you might be tempted to pick the first option. Pay attention though, or you'll miss the fact that an interface VPC endpoint cannot be used to access DynamoDB; only a gateway VPC endpoint will do that. So while going through AWS's internal network would be the ideal solution, between the two solutions presented the second one is the only one that's technically feasible, and it is therefore the correct answer. Yes, both things matter: knowing the difference between an interface VPC endpoint and a gateway VPC endpoint, and reading carefully. Time is short, but do read all the options carefully, discard what you can, and choose between what's left.
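Since the interface-vs-gateway distinction decides questions like that one, here's the rule of thumb as a tiny sketch (simplified to the exam view from 2022: gateway endpoints exist only for S3 and DynamoDB; the function name is mine):

```python
# Gateway endpoints exist only for S3 and DynamoDB; every other service
# that supports VPC endpoints uses interface endpoints (PrivateLink).
GATEWAY_ENDPOINT_SERVICES = {"s3", "dynamodb"}

def endpoint_type(service: str) -> str:
    """Illustrative helper: which VPC endpoint type reaches a service."""
    return "gateway" if service.lower() in GATEWAY_ENDPOINT_SERVICES else "interface"

assert endpoint_type("dynamodb") == "gateway"  # an interface endpoint won't work
assert endpoint_type("sqs") == "interface"
```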

One final note about timing: after answering the first 10 questions, you should have more than 156 minutes left. If you are very close to 156 minutes left (even if you have a bit less), you're probably good, since it does take one or five questions to build up a rhythm. If you have a lot less than that, you're going to need to speed up a little or you won't make it. Your next marker can either be the start of question 21 (i.e. you finished 20 questions), which should see you with 132 minutes or more, or the start of question 26, which means you've completed a third of the exam and should have 120 minutes or more on the clock. Question 31 is 108 minutes, question 39 is half the exam at 90 minutes, question 51 is two thirds at 60 minutes, question 61 is 36 minutes and question 71 is 12 minutes. Of course you're going to want to keep a buffer in case two or three really long questions are grouped at the end, plus you want time to review the questions that you've flagged. Sometimes the wisest choice is to quickly discard some options and, if you're left with two, just pick one and take that 50% chance of getting it right, flagging the question so you can review your choice and improve those odds later if you have time. Try to commit these milestones to memory, or at least the ones that you think are important, so you don't waste precious minutes trying to figure out whether you have a few minutes to spare.
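If you'd rather derive the milestones than memorize them, the whole table falls out of one formula (180 minutes spread evenly over 75 questions):

```python
from fractions import Fraction

TOTAL_MINUTES = 180
TOTAL_QUESTIONS = 75
PACE = Fraction(TOTAL_MINUTES, TOTAL_QUESTIONS)  # 2.4 minutes per question

def minutes_left_at_question(n: int) -> Fraction:
    """Minutes that should remain on the clock when STARTING question n."""
    return TOTAL_MINUTES - (n - 1) * PACE

# The milestones from the paragraph above:
assert minutes_left_at_question(11) == 156  # 10 questions done
assert minutes_left_at_question(21) == 132
assert minutes_left_at_question(26) == 120  # a third of the exam done
assert minutes_left_at_question(51) == 60   # two thirds done
assert minutes_left_at_question(71) == 12
```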

Now, as far as resources that I used, that was it. Adrian's excellent course ($80), AWS's Exam Readiness (free) and TutorialsDojo's not-so-excellent-but-still-good practice exams ($15), which adds up to $95 on top of the $300 that the exam costs. I took some notes, which I read from time to time while studying and a couple of times the day before the exam, and I googled anything that I got wrong in the practice exams. I think that's all I can give you, but if you have any questions, feel free to comment or message me, I'll help you in any way I can.

Remember to study the fine details, remember to keep an eye on the clock, and most of all remember that you can do it!

As a final note, here are the notes that I took. They are INCOMPLETE: knowing all of this is NOT enough to pass the exam, but if you don't know at least 90% of what's in here, you're going to have a hard time. These were just the areas where I was lacking (hence the absence of important services like EC2). My recommendation is that you also take your own notes, to help you with your weak areas. Also, the EBS section has A LOT of numbers, which didn't prove to be that relevant; do keep in mind the use cases for each volume type, though.


EBS:

  • GP2:

  • 1 IOPS = 1 IO (16 KB) in 1 second.

  • Max IO credits = 5.4 million. The bucket starts full and refills at the baseline performance rate: 3 IO credits per second per GB of volume size, with a minimum of 100.

  • Burst up to 3000 IOPS or the fill rate

  • Volumes above 1000 GB have baseline performance higher than 3000 IOPS and don't use credits.
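The GP2 numbers above combine into a simple model. A sketch of the baseline and burst-duration math (helper names are mine; the constants come from the bullets above):

```python
# GP2: baseline = 3 IOPS per GB (floor 100, cap 16,000); bursts to 3,000
# IOPS by spending credits, which refill at the baseline rate.
MAX_CREDITS = 5_400_000  # IO credits; the bucket starts full

def gp2_baseline_iops(size_gb: int) -> int:
    return min(max(3 * size_gb, 100), 16_000)

def gp2_burst_seconds(size_gb: int) -> float:
    """How long a full credit bucket sustains the 3,000 IOPS burst."""
    baseline = gp2_baseline_iops(size_gb)
    if baseline >= 3_000:
        return float("inf")  # volumes >= 1,000 GB don't need credits
    return MAX_CREDITS / (3_000 - baseline)  # spend rate minus refill rate

assert gp2_baseline_iops(100) == 300
assert gp2_baseline_iops(1_000) == 3_000
assert round(gp2_burst_seconds(100)) == 2000  # ~33 minutes of burst
```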

  • GP3:

  • 3000 IOPS & 125 MiB/s standard (regardless of size)

  • Goes up to 16000 IOPS or 1000 MiB/s

  • Performance doesn't scale with size, need to scale it separately. It's still around 20% cheaper than GP2

  • Provisioned IOPS (IO1/IO2):

  • Consistent low latency & jitter

  • 64000 IOPS, 1000 MB/s (256000 IOPS & 4000 MB/s for Block Express)

  • 4 GB to 16 TB (64 TB for Block Express)

  • IO1: 50 IOPS/GB max. IO2: 500 IOPS/GB max.

  • IOPS can be adjusted independently of size

  • Real limitations for maximum performance between EBS and EC2:

  • Per instance performance: IO1: 260000 IOPS & 7500 MB/s, IO2: 160000 IOPS & 4750 MB/s, IO2 Block Express: 260000 IOPS & 7500 MB/s

  • Limitations on the EC2 instance type and size

  • Use cases: Small volumes with really high performance, extreme performance, latency-sensitive workloads

  • HDD

  • st1: cheaper than SSD, really bad at random access. Max 500 IOPS, but 1 MB per IO. Max 500 MB/s. 40 MB/s/TB base, 250 MB/s/TB burst. Size 125 GB to 16 TB. Use case: sequential access, big data, data warehouses, log processing.

  • sc1: even cheaper, but cold, designed for infrequent workloads. Max 250 IOPS but 1 MB per IO. Max 250 MB/s. 12 MB/s/TB base, 80 MB/s/TB burst. Size 125 GB to 16 TB.
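The HDD throughput numbers scale linearly with size until they hit the per-volume cap. A sketch using the st1/sc1 figures above (structure and names are mine):

```python
# Throughput scales with size (MB/s per TB), capped at the volume maximum.
HDD = {
    "st1": {"base_per_tb": 40, "burst_per_tb": 250, "max_mbps": 500},
    "sc1": {"base_per_tb": 12, "burst_per_tb": 80,  "max_mbps": 250},
}

def hdd_throughput(vol_type: str, size_tb: float, burst: bool = False) -> float:
    spec = HDD[vol_type]
    per_tb = spec["burst_per_tb"] if burst else spec["base_per_tb"]
    return min(size_tb * per_tb, spec["max_mbps"])

assert hdd_throughput("st1", 2) == 80               # 2 TB * 40 MB/s/TB
assert hdd_throughput("st1", 2, burst=True) == 500  # capped at 500 MB/s
assert hdd_throughput("sc1", 10) == 120             # 10 TB * 12 MB/s/TB
```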

  • Instance Store volumes

  • Block storage devices (like EBS) but local to the instance. Physically connected to one EC2 host. Instances on that host can access them.

  • Included in instance price (for instance types that have it), use it or waste it

  • Attached at launch

  • Ephemeral storage. If the instance moves between hosts, data in instance volumes is lost.

  • Size depends on type and size of instance

  • EC2 instance type D3 = 4.6 GB/s throughput

  • EC2 instance type I3 = 16 GB/s sequential throughput

  • How to choose between EBS and Instance Store:

  • Persistence, resilience, backups or isolation from instance lifecycle: choose EBS

  • Cost for EBS: ST1 or SC1 (both are hard disks)

  • Throughput or streaming: ST1

  • Boot volume: NOT ST1 or SC1

  • Up to 16000 IOPS: GP2/3

  • Up to 64000 IOPS: IO2

  • Up to 256000 IOPS: IO2 Block Express

  • Up to 260000 IOPS: RAID0 + EBS (IO1/2-BE/GP2/3) (this is the max performance of an EC2 instance)

  • More than 260000 IOPS: Instance Store (but it's not persistent)

  • EBS volumes support encryption, but it's NOT enabled by default.
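The IOPS ladder above can be condensed into a toy chooser (purely illustrative; it models only the IOPS/persistence axis, not cost or boot-volume constraints, and the names and strings are mine):

```python
def pick_ebs(iops_needed: int, persistent: bool = True) -> str:
    """Toy decision helper mirroring the checklist above."""
    if not persistent:
        return "instance store"
    if iops_needed <= 16_000:
        return "gp3 (or gp2)"
    if iops_needed <= 64_000:
        return "io2"
    if iops_needed <= 256_000:
        return "io2 Block Express"
    if iops_needed <= 260_000:
        return "RAID0 across multiple EBS volumes"  # max per EC2 instance
    return "instance store (not persistent!)"

assert pick_ebs(10_000) == "gp3 (or gp2)"
assert pick_ebs(200_000) == "io2 Block Express"
assert pick_ebs(300_000) == "instance store (not persistent!)"
```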


  • Placement groups: Cluster: same rack, higher network throughput, one AZ, supported instance types only, for fast speeds and low latency. Spread: always different racks, max 7 instances per AZ, for critical instances. Partition: max 7 partitions per AZ, each can have more than 1 instance, great for topology-aware apps like HDFS, HBase and Cassandra.


  • GWLB: L3 LB for ingress/egress security scans, to pass traffic through scalable 3rd party appliances, using GENEVE protocol. Uses GWLB Endpoint, which can be added to a RT as a next hop. Packets are unaltered.

  • ALB:

  • Can have multiple SSL certificates associated with a secure listener and will automatically choose the optimal certificate using SNI.


DynamoDB:

  • Local Secondary Indexes (LSI): Can only be created when creating the table. Use the same PK but a different SK. Aside from keys, can project none, some or all attributes. Share capacity with the table. Are sparse: only items with values in both PK and SK are projected. Support strongly consistent reads.

  • Global Secondary Indexes (GSI): Can be created at any time. Different PK and SK. Own RCU and WCU allocations. Aside from keys, can project none, some or all attributes. Are sparse: only items with values in PK and SK are projected. Are always eventually consistent, replication between base table and GSI is async.

  • On LSIs and GSIs you can query on attributes not projected, but it's expensive.

  • Streams: A Kinesis Stream with 24-h rolling window of time-ordered item changes in a table. Enabled on a per-table basis. Records INSERTS, UPDATES and DELETES. Different view types: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE and NEW_AND_OLD_IMAGE.


Athena:

  • Serverless interactive querying service. No base cost; you only pay for the data scanned. Schema-on-read, table-like translation. Original data is never changed and remains on S3; the schema translates it to a relational-like form when reading. Can also query AWS logs, web server logs or Glue Data Catalogs. Can use Athena Federated Query to use a Lambda to transform the data before querying.


Kinesis:

  • Data stream: Sub-1-second latency, custom processing per record, choice of stream processing framework. Multi-shard. 1 shard = 1 MB/s ingestion and 2 MB/s consumption. Order is guaranteed within a shard, but not across shards. 24h rolling window (up to 7d for more $$$). Multiple consumers.

  • Firehose: Connects to a data stream or ingests from multiple sources. Zero admin (automatically scalable, serverless and resilient), 60+ seconds latency, delivers data to existing analytics tools: HTTP endpoints such as Splunk, Elasticsearch and OpenSearch, S3 and Redshift (through an intermediate S3 bucket). Order is guaranteed. Supports transformation of data on the fly. Billed by data streamed.

  • Difference between SQS and Kinesis data streams: SQS has 1 production group and 1 consumer group, and once a message is consumed it's deleted. It's typically used to decouple async communication. Kinesis is designed for huge-scale ingestion and multiple consumers within the rolling window. It's designed for data ingestion, analytics, monitoring, app clicks, and streaming.

  • Kinesis Data Analytics: real-time processing of data using SQL. Ingests from Data streams or Firehose or S3, processes it and sends to Data streams, Lambda or Firehose. It fits between 2 streams and allows you to use SQL to modify the data.
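The per-shard limits above (1 MB/s in, 2 MB/s out) give a quick sizing rule for a data stream. A sketch (the function is mine, and it ignores the separate records-per-second limit):

```python
import math

def shards_needed(ingest_mb_s: float, egress_mb_s: float) -> int:
    """Minimum shards to cover both the 1 MB/s-in and 2 MB/s-out limits."""
    return max(math.ceil(ingest_mb_s / 1.0), math.ceil(egress_mb_s / 2.0), 1)

assert shards_needed(5, 8) == 5   # ingestion-bound: 5 MB/s in needs 5 shards
assert shards_needed(3, 10) == 5  # consumption-bound: 10 MB/s out needs 5 shards
```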

Elastic MapReduce (EMR):

  • Managed implementation of Hadoop, Spark, HBase, Presto, Flink, Hive and Pig.

  • Huge-scale parallel processing. Two phases: Map and Reduce. Map: Data is separated into 'splits', each assigned to a mapper. Perform customized operations at scale. Reduce: Recombine data into results.

  • Can create clusters for long-term usage or ad-hoc (transient) usage.

  • Runs in one AZ in a VPC (NOT HA) using EC2 for compute. Auto scales and can use spot, instance fleet, reserved and on-demand.

  • Loads data from S3 and outputs to S3.

  • Uses Hadoop File System (HDFS). Data stored across multiple data nodes and replicated between nodes for fault tolerance.

  • Node types:

  • Master (at least 1): Manages the cluster and health, distributes workloads and controls access to HDFS and SSH access to the cluster. Don't run in spot.

  • Core (0 or more): Are the data nodes for HDFS, run task trackers and can run map and reduce tasks. HDFS runs in instance store. Don't run in spot.

  • Task nodes (0 or more): Only run tasks, don't run HDFS or task trackers. Ideal for spot instances.

  • EMRFS is a file system for EMR, backed by S3 (regionally resilient), persists past the lifetime of the cluster and is resilient to core node failure. It is slower than HDFS (S3 vs Instance Storage)


Redshift:

  • Petabyte-scale data warehouse. OLAP (column-based), not OLTP (row/transaction-based). Designed to aggregate data from OLTP DBs. NOT designed for real-time ingestion, but for batch ingestion.

  • Provisioned (server-based). Single-AZ (not HA).

  • Leader node: Query input, planning and aggregation. Applications interact with the leader node using ODBC or JDBC.

  • Compute nodes: perform the queries on the data. They hold slices of the data, replicated to 1 additional node.

  • Automatic snapshots to S3 every 8h or 5GB with 1d (default) to 35d retention, plus manual snapshots, make the data resilient to AZ failure. Can be configured to be copied to another region.

  • DMS can migrate into Redshift and Firehose can stream into Redshift.

  • Redshift Spectrum: Directly query data in S3. Federated query: Directly query data in other DBs.

  • For ad-hoc querying use Athena.

  • Can copy encrypted snapshots to another region by configuring a snapshot copy grant for the master key in the other region.


AWS Batch:

  • Lets you worry about defining batch jobs while it handles the compute.

  • Job: script, executable or docker container. The thing to run.

  • Job definition: Metadata for a job, including permissions, resource config, mount points, etc.

  • Job queue: Jobs are added to queues, where they wait for compute capacity. Capacity comes from 1+ compute environments.

  • Compute environment: managed or unmanaged compute, configurable with instance type/size, vCPU amount, spot price, or using an existing environment with ECS (only with ECS).

  • Managed compute environment: Batch manages capacity; you pick on-demand or spot, instance size/type, max spot price. Runs in a VPC, can run in private subnets but you need to provide gateways.

  • Unmanaged compute environment: You create everything and manage everything outside of Batch (with ECS).

  • Jobs can come from Lambda API calls, Step Functions integration or API call, target of EventBridge (e.g. from S3).

  • When completed, can store data and metadata in S3 and DynamoDB, can continue execution of Step Functions, or post to Batch Event Stream.

  • Difference with Lambda: Lambda has a 15-min execution limit, a 10 GB disk space limit (as of 2022/03/24, probably not reflected in the exam yet; the previous limit was 512 MB) and limited runtimes. Batch uses Docker (so any runtime) and has no resource limits.


ElastiCache:

  • Redis: advanced data structures, persistent, multi-AZ, read replicas, can scale up but not out (and can't scale down), backups and restores. Highly available (multi-AZ).

  • Memcached: simple K/V, non-persistent, can scale up and out (multiple nodes), multi-threaded, no backup/restore. NOT highly available.

EFS and FSx

  • FSx for Windows: ENIs injected into VPCs. Native Windows FS, needs to be connected with Directory Service or self-managed AD, Single or Multi-AZ, on-demand and scheduled backups, accessible using VPC, VPN, peering, direct connect. Encryption at rest (KMS) and in transit. Keywords: VSS, SMB, DFS

  • FSx for Lustre: ENIs injected into VPCs. HPC for Linux (POSIX). Used for ML, big data or financial workloads. 100s of GB/s. Deployment types: Scratch (short term, no replication) and Persistent (longer term, HA in one AZ, self-healing). Available over VPN or Direct Connect. Data is lazy-loaded from S3 and can sync back to S3. < 1 ms latency.

  • EFS: NFSv4 FS for Linux. Mount targets in VPC. General purpose and Max I/O modes. Bursting and Provisioned throughput modes (separate from size). Standard and IA storage classes.

  • It's impossible to update the deployment type (single-AZ or multi-AZ) of an FSx for Windows file system after it has been created. To migrate to multi-AZ, create a new one and use DataSync to replicate the data.


QuickSight:

  • BA/BI tool for visualizations and ad-hoc analysis.

  • Supports discovery and integration with AWS or external data sources

  • Used for dashboards or visualization.


SQS:

  • Visibility timeout: Default is 30s, can be between 0s and 12h. Set on the queue or per message.

  • Extended client library: for messages over SQS max (256 KB). Allows larger payloads (up to 2 GB) stored in S3. SendMessage uploads to S3 automatically and stores the link in the message. ReceiveMessage loads payload from S3 automatically. DeleteMessage also deletes payload in S3. Exam often mentions Java.

  • Delay queues: Postpone delivery of message (only in Standard queues). Set DelaySeconds and messages will be added immediately to the queue but will only be visible after the delay. Min (default) is 0s, max is 15m.

  • Dead-letter queues: Every time a message is received (or visibility timeout expires) in a queue, ReceiveCount is increased. When ReceiveCount > maxReceiveCount a message is moved to the dead-letter queue. Enqueue timestamp is unchanged (so Retention period is time at queue + time at DL queue).

  • FIFO queue: 3000 messages per second limit.

Amazon MQ:

  • Managed message broker based on the open-source Apache ActiveMQ. JMS API with protocols such as AMQP, MQTT, OpenWire and STOMP. Provides queues and topics.

  • Runs in VPC with single instance or HA pair (active/standby)

  • Comparison with SNS and SQS: SNS and SQS use AWS APIs, public, highly scalable, AWS integrated. Amazon MQ is based on ActiveMQ and uses protocols JMS, AMQP, MQTT, OpenWire and STOMP. Look for protocols in the exam. Also, SNS and SQS for new apps, Amazon MQ for migrations with little to no app change.


Lambda:

  • FaaS, short-running (default timeout 3s, max 15m)

  • Function = piece of code + wrapping and config. It uses a runtime (Python, Ruby, Java, Go and C#) and is loaded and run in a runtime environment. The environment has a direct memory allocation (128 MB to 10240 MB), indirect CPU allocation, and instance storage (default 512 MB; max 10 GB as of 2022/03/24, probably not reflected in the exam yet).

  • Running your own Docker is an anti-pattern for Lambda; Lambda container images are a different thing and are supported.

  • Lambda container images: Include Lambda Runtime API (to run) and Runtime Interface Emulator (to local test) in the container image. Image is built and pushed to ECR, then operates as normal.

  • Used for serverless apps, file processing (S3 events), DB triggers (DynamoDB), serverless cron (EventBridge), realtime stream data processing (Kinesis).

  • By default Lambda runs in the public space and can't access VPC services. It can also run inside a VPC (needs EC2 network permissions). Technically Lambda runs in a separate (shared) VPC, creates an ENI in your VPC per function (NOT per invocation) and uses an NLB, with 90s for initial setup and no additional invocation delay.

  • Lambda uses an Execution role (IAM role) which grants permissions.

  • Also a resource policy can control what can invoke the lambda.

  • Lambda logs to CW Logs, posts metrics to CW and can use X-Ray. Needs permissions for this, in the Execution role.

  • Invocation can be Sync, Async and Event Source Mapping:

  • Sync: CLI/API invokes and waits for response. Same is used through API Gateway. Client handles errors or retries.

  • Async: Typical when AWS services invoke Lambdas, such as S3. Lambda handles retries (configurable 0-2 times). Must be idempotent!! Events can be sent to a DLQ and to destinations (SQS, SNS, Lambda and EventBridge).

  • Event Source Mapping: Kinesis data streams sends batches of events to Lambdas using Event Source Mapping. Lambda needs permissions to access the source (which are used on its behalf by Event Source Mapping). Can use DLQ for failed events.

  • A function has immutable versions (each with its own ARN, called qualified; the unqualified ARN points to $Latest), which include code + config (including env vars). $Latest points at the latest version, and aliases like Dev, Stage, Prod can be created and updated to point at versions. A version is created when a Lambda is published, but it can be deployed without being published. You can also use alias routing to send a % of traffic to one version and the rest to another.

  • Context includes runtime + variables created before handler + /tmp. Context can be reused, but we can't control that, must assume new context.

  • Cold start: Provision HW, install the environment, download the code, run the code before the handler. Can pre-warm using Provisioned Concurrency. You are NOT billed for cold-start time (not even for code before the handler).

  • Execution process: Init (cold start) (if necessary), Invoke (runs the function Handler), Shutdown (terminate environment).

  • Layers: Share and reuse code by externalising libraries, which are shared between functions. Also allows new, unsupported runtimes such as Rust. Deployment zip only contains specific code (is smaller). Can use AWS layers or write your own.

  • Lambda + ALB: ALB synchronously invokes Lambda (automatically translates HTTP(s) request to Lambda event).

  • Multi-value headers: (when using ALB + Lambda) Groups query string values by key, e.g. a query string with search=a and search=b is passed as multiValueQueryStringParameters: {"search": ["a","b"]}. Without multi-value headers, only the last value is sent, e.g. "queryStringParameters": {"search":"b"}.
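The difference between the two modes is easy to see with the standard library's query string parser (a sketch of the shapes only; the real ALB event wraps these in multiValueQueryStringParameters / queryStringParameters as noted above):

```python
from urllib.parse import parse_qs

# What the two ALB -> Lambda modes deliver for a ?search=a&search=b request
raw = "search=a&search=b"
parsed = parse_qs(raw)  # {'search': ['a', 'b']}

multi_value = parsed                                  # multi-value headers ON
single_value = {k: v[-1] for k, v in parsed.items()}  # OFF: last value wins

assert multi_value == {"search": ["a", "b"]}
assert single_value == {"search": "b"}
```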


API Gateway:

  • Highly available, scalable, handles auth (directly with Cognito or with a Lambda authorizer), throttling, caching, CORS, transformations, OpenAPI spec, direct integration.

  • Can connect to services in AWS or on-prem

  • Supports HTTP, REST and WebSocket

  • Endpoint types: Edge-optimized: routed to the nearest CloudFront POP. Regional: for clients in the same region. Private: accessible only within a VPC via an interface endpoint.

  • APIs are deployed to stages, each stage has one deployment. Stages can be environments (dev, prod) or version (v1, v2). Each stage has its own config. They are NOT immutable, can be changed and rolled back. Stages can be enabled for canary deployments.

  • 2 phases: Request: Authorize, validate and transform. Response: transform, prepare and return. Request is called method request and is converted to integration request, which is passed to the backend. Response is called integration response, which is converted to method response and is returned to the client.

  • Types of integration: Mock: For testing, no backend. HTTP: Set translation for method->integration request and integration->method response in the API GW. HTTP Proxy: Pass through request unmodified, return to the client unmodified (backend needs to use supported format). AWS: Exposes AWS service actions. AWS_PROXY(Lambda): Low admin overhead Lambda endpoint.

  • Mapping templates: Used for AWS and HTTP (non-PROXY) integrations. Modify or rename parameters, body or headers of the request/response. Uses Velocity Template Language (VTL). Can transform a REST request to a SOAP API.

  • API GW has a timeout of 29s (can’t be increased)

  • Errors: 4XX: Client error, invalid request on the client side. 5XX: Server error, valid request, backend issue. 400: Bad request, generic. 403: Access denied, authorizer denies or WAF filtered. 429: API GW throttled the request. 502: Bad gateway, bad output returned by backend. 503: Service unavailable, backend offline. 504: Integration failure/timeout, 29s limit reached.

  • Cache: TTL 0s to 3600s (default 300s), 500 MB to 237 GB, can be encrypted. Defined per stage. Request only goes to backend if cache miss.

  • Payload limit: 10 MB


CloudFront:

  • Private behaviors: A behavior can be made private if it uses a Trusted Signer (key created by the root user). It will require a signed URL (access to 1 object) or signed cookie (access to the whole origin).

  • Origin Access Identity: Set an identity to the CloudFront behavior and only allow that identity in the origin (e.g. S3 bucket).

Storage Gateway

  • File Gateway: Access S3/Glacier through NFS and SMB protocols. Only the most recent data is stored (cached) on-prem. NOT low-latency, because it needs to fetch data from S3.

  • Tape Gateway: Access S3/Glacier through an iSCSI VTL (virtual tape library). Mainly used for archiving. Backed by Glacier, so data can't be consumed in real time.

  • Stored-Volume Gateway: iSCSI-mounted volumes stored on-prem and asynchronously backed up to S3 as EBS snapshots. 16 TB per volume, max 32 volumes per gateway = max 512 TB. Low latency, since all data is stored on-prem; S3 is just used as a backup of EBS snapshots of the volumes.

  • Cached-Volume Gateway: iSCSI-mounted volume stored in S3 and cached on-prem. 32 TB per volume, max 32 volumes per gateway = 1024 TB. Data is stored on S3, NOT on-prem. On-prem only has a cache of the data, so low latency will only work for the cached data, not all data.


Migration strategies:

  • Retain: Stays on-prem, no migration for now, revisit in the future

  • Re-host: Lift and shift with no changes

  • Refactor: Architect brand new, cloud native app. Lots of work

  • Re-platform: Lift and shift with some tinkering

  • Replace: Buy a native solution (not build one)

  • Retire: Solution is no longer needed, it's not replaced with something else

Migration process:

  • Plan: Discovery: making sure we know what's really happening, identify dependencies, check all the corners and ask the questions: Assessment and profiling, data requirements and classification, prioritization, business logic and infrastructure dependencies. Design: Detailed migration plan, effort estimation, security and risk assessment. AWS Application Discovery Service, AWS Database Migration Service.

  • Build: Transform: Network topology, migrate, deploy, validate. Transition: Pilot testing, transition to support, release management, cutover and decommission.

  • Run: Operate: Staff training, monitoring, incident management, provisioning. Optimize: Monitoring-driven optimization, continuous integration and continuous deployment, well-architected framework.


S3:

  • S3 Object Lock (requires versioning):

  • Legal Hold: Turn on or off. Object versions can't be deleted or modified while turned on, can be turned off.

  • Retention Compliance: Set a duration, object versions can't be deleted or modified for the duration. Can't be disabled, not even by root.

  • Retention Governance: Set a duration, object versions can't be deleted or modified for the duration. Special permissions allow changing the policy.


Macie:

  • Data security and privacy service. Identifies data that should be private.

  • Select S3 buckets, create a discovery job, set managed or custom data identifiers, post policy findings and sensitive data findings to EventBridge or Security Hub.

Interface and Gateway endpoints

  • Interface endpoints: ENI with private IP for traffic to services with PrivateLink

  • Gateway endpoints: Target for a route in the RT, only used for S3 or DynamoDB

  • NACL: Limit of 20 rules, can be increased to 40


Direct Connect:

  • Public VIF: Used for AWS public services (including Site-to-Site VPN)

  • Private VIF: Used for resources inside a VPC

Schema Conversion Tool (SCT):

  • You need to configure the data extraction agent first on your on-premises server.

Database Migration Service (DMS):

  • can directly migrate the data to Amazon Redshift.


Aurora:

  • Cross-region read replicas

  • multi-master: all master nodes need to be in the same region, and can't enable cross-region read replicas.

  • Max size: 64 TB

Step Functions:

  • Does not directly support Mechanical Turk, in that case use SWF.


  • Provides search capabilities, for example for documents stored in S3.

AWS Config:

  • Can aggregate data from multiple AWS accounts using an Aggregator

  • Can only perform actions in the same AWS account.

Most important thing to remember:

  • You can do it!!!

Thanks for reading!

If you're interested in building on AWS, check out my newsletter: Simple AWS.

It's free, runs every Monday, and tackles one use case at a time, with all the best practices you need.

If you want to know more about me, visit my website.