Table of contents
- My Study Materials and Strategy
- Final Thoughts
- Exam notes
- EBS
- EC2
- Placement groups
- ELB
- DynamoDB
- Athena
- Kinesis
- Elastic MapReduce (EMR)
- Redshift
- AWS Batch
- ElastiCache
- EFS and FSx
- QuickSight
- SQS
- Amazon MQ
- Lambda
- API GW
- CloudFront
- Storage Gateway
- Migrations
- S3
- Amazon Macie
- Interface and Gateway endpoints
- AWS Direct Connect (DX)
- Schema Conversion Tool (SCT)
- Database Migration Service (DMS)
- RDS
- Step Functions
- CloudSearch
- AWS Config
- Most important thing to remember
I set my sights on the SA Pro cert a while ago, but for multiple reasons I couldn't find the time to sit down and study until early this year. On April 6th I finally sat the exam and passed with a score of 859. Here's my account of how I prepared for it, what the exam felt like, and a ton of notes I took about small technical details that can make a difference in a question.
My Study Materials and Strategy
While I had some experience as a freelance architect and AWS Authorized Instructor, the past year saw me working a lot with code and GCP, and barely even touching AWS, so I knew I needed a full course that would help me remember the basics (in case I had forgotten anything) and also level up on the advanced stuff. I chose Adrian Cantrill's AWS Certified Solutions Architect - Professional course for that, and it was excellent, though quite long.
It took me over a month and a half to go over Adrian's course, but after that I felt in a pretty good place, with his excellent lessons and demos. However, I knew something must be lacking, from my memory if not from the course, so I signed in to AWS SkillBuilder and found the Exam Readiness: AWS Certified Solutions Architect – Professional course. It says 4 hours, but I think you should take at least 6, because while the course doesn't give you any new knowledge, it helps you a lot to reflect on what you're missing and identify your weaknesses, and that's what's going to drive your next steps.
Identifying and Addressing Weaknesses
My weaknesses weren't concentrated in a single area; those I had already identified and covered by re-watching Adrian's lessons as many times as necessary (I think I watched the Direct Connect ones 4 or 5 times). Rather than not knowing one service or one kind of solution, my gaps were all over the place: not in the general aspects, but in the smallest details that mattered.
Some of the not so small details:
If you're connecting Direct Connect to a VPC without a VPN, should you use a public or private VIF? What about when using site-to-site VPN? Answer: private when going to the VPC directly, public when using a VPN because Site-to-Site VPN is a public service (i.e. not in a VPC, same as S3 for example).
Is Kinesis Firehose able to stream data in real time? Answer: No, it has a 60-second latency, and is considered near-real time, NOT real time.
Some of the much smaller ones:
In ALB, can you associate multiple SSL certificates with the same listener? If so, how will the listener choose the correct certificate? Answer: Yes, and the listener automatically chooses the correct cert using SNI.
Is data ordered in a Kinesis Data Stream? Answer: Yes inside the shard, not across multiple shards.
In SQS with a retention period of 7 days, if a message is moved to the DLQ 5 days after being enqueued, when will it be deleted? Answer: In 2 days, because the retention period checks the enqueue timestamp, which is unchanged when a message is moved to the DLQ.
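As a sanity check, the retention arithmetic from that last question can be sketched in Python (the helper function is hypothetical, just illustrating the rule that the enqueue timestamp never resets):

```python
from datetime import datetime, timedelta

def remaining_retention(enqueued_at: datetime, now: datetime,
                        retention_days: int = 7) -> timedelta:
    """Time left before SQS deletes the message, always counted from the
    original enqueue timestamp (unchanged by a move to the DLQ)."""
    return enqueued_at + timedelta(days=retention_days) - now

enqueued = datetime(2023, 1, 1)
moved_to_dlq = enqueued + timedelta(days=5)  # moved to the DLQ after 5 days
print(remaining_retention(enqueued, moved_to_dlq).days)  # 2 days left
```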
Focusing on Practice Exams
So I knew I was lacking, but I didn't even know the questions that I should seek answers to. I tried going to the FAQs, but let me tell you, those are SUPER LONG and full of A TON of info that's probably not relevant to the exam (though at the professional level you should assume everything is relevant). After about half an hour of just reading the FAQs and getting terribly bored, I went online to search for practice exams, so I could make my own mistakes and learn that way. I found the AWS Certified Solutions Architect Professional Practice Exams 2022 in TutorialsDojo and purchased that.
On a brief note, TutorialsDojo's practice exams are excellent, even if they're not perfect. Most answers are correct and the explanations are really good. I did find a few questions that were ridiculous or outright technically impossible. Still, a few among 375 (4 practice exams + 1 final exam) is very good. Just keep in mind that, when in doubt, you should look up the documentation and try to find the correct answer by yourself.
Benefits of Practice Exams
At this point, doing practice exams is by far the best thing that you can do, in my opinion. Making your own mistakes (TutorialsDojo does tell you which questions you got right or wrong, what the correct answer is, and why) really helps you to recall those small details that make a difference. Plus, you can do half of an exam, or just 10 questions, whenever you have the time. I do recommend doing at least one or two full, timed exams, but you don't have to do either a 3-hour study session or nothing at all; if all you have is 30 minutes, it's better to answer 5 or 10 questions than not doing anything. Also, write everything down, so you can go over your notes later.
Another huge thing about practice exams is that you get to practice timing yourself. You get 180 minutes for 75 questions, which is 2 minutes and 24 seconds per question. If it doesn't sound like much, it's because it isn't. Most questions are very long, much longer than in the SA Associate exam, and the correct answer often depends on a word or two. You'll find yourself scanning through answers 4 or 5 lines long that seem exactly the same, until you find the difference: a private VPC vs a public VPC, for example. Other times, what seems to be the best answer actually has a detail that means it won't work. For example, one answer might describe setting up the application in a private subnet and adding an interface VPC endpoint to access DynamoDB, while the other will talk about putting the application in a public subnet and using a gateway VPC endpoint to access DynamoDB. If you're not careful, you might miss the fact that DynamoDB does not support interface VPC endpoints, only gateway VPC endpoints.
Timing Strategy
For the SA Pro exam, I recommend spending the first 90 minutes reading through all 75 questions, and answering only the ones that you're 100% sure of. Flag the others for review, and take a quick break if needed. Then, go back to the flagged questions and spend the next 60 minutes trying to figure them out. Finally, spend the last 30 minutes going over all the questions again, reviewing your answers and ensuring you haven't missed any small details. This approach worked well for me and helped me manage my time effectively.
Final Thoughts
The AWS Solutions Architect Professional exam is challenging, but with the right study materials and practice exams, you can succeed. Adrian Cantrill's course, AWS SkillBuilder's Exam Readiness, and TutorialsDojo's practice exams were invaluable in my preparation. The key is to identify your weaknesses, focus on the small technical details, and practice your timing. Remember to always consult the documentation when in doubt, and take the time to learn from your mistakes. Best of luck in your certification journey!
Exam notes
The following are the notes I took on the very fine details for each service. They don't cover everything, just what I thought would be difficult and important to remember.
EBS
GP2
1 IOPS = 1 IO (16 KB) in 1 second.
Max IO credits = 5.4 million. Starts full. Fills at the rate of the baseline performance: 3 IO credits per second per GB of volume size, with a minimum of 100.
Bursts up to 3000 IOPS, consuming credits whenever IO exceeds the fill rate
Volumes above 1000 GB have baseline performance higher than 3000 IOPS and don't use credits.
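The GP2 credit mechanics above can be turned into a quick back-of-the-envelope calculator (a sketch, not an AWS API; the formulas follow the baseline/burst rules just described):

```python
def gp2_baseline_iops(size_gb: int) -> int:
    # 3 IOPS per GB, with a floor of 100 and a cap of 16,000
    return min(max(100, 3 * size_gb), 16000)

def gp2_burst_seconds(size_gb: int, credits: float = 5_400_000) -> float:
    """How long a full credit bucket sustains a 3,000 IOPS burst."""
    baseline = gp2_baseline_iops(size_gb)
    if baseline >= 3000:
        return float("inf")  # >= 1,000 GB: baseline covers 3,000 IOPS, no credits used
    return credits / (3000 - baseline)

# A 100 GB volume has a 300 IOPS baseline and bursts to 3,000 IOPS
# for 5,400,000 / 2,700 = 2,000 seconds (~33 minutes)
print(round(gp2_burst_seconds(100)))
```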
GP3
3000 IOPS & 125 MiB/s standard (regardless of size)
Goes up to 16000 IOPS or 1000 MiB/s
Performance doesn't scale with size, need to scale it separately. It's still around 20% cheaper than GP2
Provisioned IOPS
Consistent low latency & jitter
64000 IOPS, 1000 MB/s (256000 IOPS & 4000 MB/s for Block Express)
4 GB to 16 TB (64 TB for Block Express)
IO1: 50 IOPS/GB max. IO2: 500 IOPS/GB max.
IOPS can be adjusted independently of size
Real limitations for maximum performance between EBS and EC2:
Per instance performance: IO1: 260000 IOPS & 7500 MB/s, IO2: 160000 IOPS & 4750 MB/s, IO2 Block Express: 260000 IOPS & 7500 MB/s
Limitations on the EC2 instance type and size
Use cases: Small volumes with really high performance, extreme performance, latency-sensitive workloads
HDD
st1: cheaper than SSD, really bad at random access. Max 500 IOPS, but 1 MB per IO. Max 500 MB/s. 40 MB/s/TB base, 250 MB/s/TB burst. Size 125 GB to 16 TB. Use case: sequential access, big data, data warehouses, log processing.
sc1: even cheaper, but cold, designed for infrequent workloads. Max 250 IOPS but 1 MB per IO. Max 250 MB/s. 12 MB/s/TB base, 80 MB/s/TB burst. Size 125 GB to 16 TB.
Instance Store volumes
Block storage devices (like EBS) but local to the instance. Physically connected to one EC2 host. Instances on that host can access them.
Included in instance price (for instance types that have it), use it or waste it
Attached at launch
Ephemeral storage. If the instance moves between hosts, data in instance volumes is lost.
Size depends on type and size of instance
EC2 instance type D3 = 4.6 GB/s throughput
EC2 instance type I3 = 16 GB/s sequential throughput
How to choose between EBS and Instance Store:
Persistence, resilience, backups or isolation from instance lifecycle: choose EBS
Cost for EBS: ST1 or SC1 (both are hard disks)
Throughput or streaming: ST1
Boot volume: NOT ST1 or SC1
Up to 16000 IOPS: GP2/3
Up to 64000 IOPS: IO2
Up to 256000 IOPS: IO2 Block Express
Up to 260000 IOPS: RAID0 + EBS (IO1/2-BE/GP2/3) (this is the max performance of an EC2 instance)
More than 260000 IOPS: Instance Store (but it's not persistent)
EBS volumes support encryption, but it's NOT enabled by default.
EC2
Placement groups
- Cluster: Same rack, highest network performance, one AZ, supported instance types only, for fast speeds and low latency
- Spread: always different racks, 7 instances per AZ, for critical instances
- Partition: Max 7 partitions, each can have more than 1 instance, great for topology-aware apps like HDFS, HBase and Cassandra
ELB
GWLB:
L3 LB for ingress/egress security scans
To pass traffic through scalable 3rd party appliances, using GENEVE protocol.
Uses GWLB Endpoint, which can be added to a RT as a next hop.
Packets are unaltered.
ALB:
- Can have multiple SSL certificates associated with a secure listener and will automatically choose the optimal certificate using SNI.
DynamoDB
Local Secondary Indexes (LSI)
Can only be created when creating the table
Use the same PK but a different SK
Aside from keys, can project none, some or all attributes
Share capacity with the table
Are sparse: only items with values in PK and SK are projected
Support strongly consistent reads.
Global Secondary Indexes (GSI)
Can be created at any time
Different PK and SK
Own RCU and WCU allocations
Aside from keys, can project none, some or all attributes
Are sparse: only items with values in PK and SK are projected
Are always eventually consistent, replication between base table and GSI is async.
On LSIs and GSIs you can query on attributes not projected, but it's expensive.
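Since GSIs can be added at any time, here is a sketch of the parameters you'd hand to boto3's `update_table` to create one. Table, index and attribute names are hypothetical; the dict is shown without making any AWS call:

```python
# Hypothetical GSI on an "Orders" table: different PK/SK than the base
# table, its own throughput, and a reduced projection.
gsi_update = {
    "TableName": "Orders",
    "AttributeDefinitions": [
        {"AttributeName": "CustomerId", "AttributeType": "S"},
        {"AttributeName": "OrderDate", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexUpdates": [{
        "Create": {
            "IndexName": "ByCustomer",
            "KeySchema": [
                {"AttributeName": "CustomerId", "KeyType": "HASH"},  # different PK
                {"AttributeName": "OrderDate", "KeyType": "RANGE"},  # different SK
            ],
            # Projection can be KEYS_ONLY, INCLUDE (some) or ALL
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            # GSIs get their own RCU/WCU, separate from the table
            "ProvisionedThroughput": {
                "ReadCapacityUnits": 5,
                "WriteCapacityUnits": 5,
            },
        }
    }],
}
# Would be executed with: boto3.client("dynamodb").update_table(**gsi_update)
```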
DynamoDB Streams
A stream (similar to a Kinesis stream) with a 24-h rolling window of time-ordered item changes in a table
Enabled on a per-table basis
Records INSERTS, UPDATES and DELETES
Different view types: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE and NEW_AND_OLD_IMAGE.
Athena
Serverless interactive querying service
No infrastructure cost; you only pay per query, based on the data scanned
Schema-on-read table-like translation
Original data never changed, remains on S3
Schema translates data to relational-like when reading
Can also query AWS logs, web server logs or Glue Data Catalogs
Can use Athena Federated Query to use a Lambda to transform the data before querying.
Kinesis
Data stream
Sub-1-second
custom processing per record
choice of stream processing framework
Multi-shard
1 shard = 1 MB ingestion and 2 MB consumption
Order is guaranteed within the shard, but not across shards
24h (up to 7d for more $$$) rolling window
Multiple consumers
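The within-shard ordering guarantee follows from how Kinesis routes records: the partition key is MD5-hashed into a 128-bit space that is split across shards, so all records with the same key land on the same shard. A rough local simulation of that routing (not the actual AWS implementation, which also handles shard splits/merges):

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Rough simulation: MD5-hash the partition key into the 128-bit
    hash key space and map it to an evenly split shard range."""
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return h * num_shards // 2**128

# Records sharing a partition key always hit the same shard,
# which is why their order is preserved within (but not across) shards
assert shard_for_key("sensor-42", 4) == shard_for_key("sensor-42", 4)
print(shard_for_key("sensor-42", 4))
```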
Firehose
Connects to a data stream or ingests from multiple sources
Zero admin (automatically scalable, serverless and resilient)
~60 seconds latency (near real time, due to buffering)
Delivers data to existing analytics tools: HTTP endpoints such as Splunk, Elasticsearch/OpenSearch, S3 and Redshift (through an intermediate S3 bucket)
Order is guaranteed
Supports transformation of data on the fly
Billed by data streamed.
Difference between SQS and Kinesis data streams
SQS has 1 production group and 1 consumer group, and once a message is consumed it's deleted
It's typically used to decouple async communication
Kinesis is designed for huge-scale ingestion and multiple consumers within the rolling window
It's designed for data ingestion, analytics, monitoring, app clicks, and streaming
Kinesis Data Analytics
real-time processing of data using SQL
Ingests from Data streams or Firehose or S3, processes it and sends to Data streams, Lambda or Firehose
It fits between 2 streams and allows you to use SQL to modify the data
Elastic MapReduce (EMR)
Managed implementation of Hadoop, Spark, HBase, Presto, Flink, Hive and Pig.
Huge-scale parallel processing
Two phases: Map and Reduce. Map: Data is separated into 'splits', each assigned to a mapper. Perform customized operations at scale. Reduce: Recombine data into results.
Can create clusters for long-term usage or ad-hoc (transient) usage.
Runs in one AZ in a VPC (NOT HA) using EC2 for compute
Auto scales and can use spot, instance fleet, reserved and on-demand.
Loads data from S3 and outputs to S3.
Uses Hadoop File System (HDFS)
Data stored across multiple data nodes and replicated between nodes for fault tolerance.
Node types
Master (at least 1): Manages the cluster and health, distributes workloads and controls access to HDFS and SSH access to the cluster. Don't run in spot.
Core (0 or more): Are the data nodes for HDFS, run task trackers and can run map and reduce tasks. HDFS runs in instance store. Don't run in spot.
Task nodes (0 or more): Only run tasks, don't run HDFS or task trackers. Ideal for spot instances.
EMRFS
Is a file system for EMR
backed by S3 (regionally resilient)
persists past the lifetime of the cluster and is resilient to core node failure
It is slower than HDFS (S3 vs Instance Storage)
Redshift
Petabyte-scale data warehouse
OLAP (column-based, not OLTP: row/transaction)
Designed to aggregate data from OLTP DBs
NOT designed for real-time ingestion, but for batch ingestion.
Provisioned (server-based)
Single-AZ (not HA).
Automatic snapshots to S3 every 8h or 5GB with 1d (default) to 35d retention, plus manual snapshots, make the data resilient to AZ failure. Can be configured to be copied to another region.
DMS can migrate into Redshift and Firehose can stream into redshift.
Redshift Spectrum: Directly query data in S3. Federated query: Directly query data in other DBs.
For ad-hoc querying use Athena.
Can copy encrypted snapshots to another region by configuring a snapshot copy grant for the master key in the other region.
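For the encrypted cross-region snapshot copy, here is a sketch of the boto3 Redshift parameters involved (all names, ARNs and regions below are hypothetical; the grant must reference a KMS key in the destination region, and the calls themselves are shown as comments only):

```python
# Parameters for creating the snapshot copy grant (in the destination region)
grant_params = {
    "SnapshotCopyGrantName": "redshift-copy-grant",  # hypothetical name
    # KMS key that lives in the DESTINATION region:
    "KmsKeyId": "arn:aws:kms:us-west-2:111122223333:key/example-key-id",
}
# Parameters for enabling cross-region snapshot copy on the cluster
copy_params = {
    "ClusterIdentifier": "my-cluster",               # hypothetical cluster
    "DestinationRegion": "us-west-2",
    "SnapshotCopyGrantName": grant_params["SnapshotCopyGrantName"],
}
# Would be executed with:
# boto3.client("redshift").create_snapshot_copy_grant(**grant_params)
# boto3.client("redshift").enable_snapshot_copy(**copy_params)
```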
Node types in Redshift
Leader node: Query input, planning and aggregation. Applications interact with the leader node using ODBC or JDBC.
Compute node: Performs the queries on the data. Data is stored in slices, replicated to 1 additional node.
AWS Batch
Lets you worry about defining batch jobs, handles the compute.
Job: script, executable or docker container. The thing to run.
Job definition: Metadata for a job, including permissions, resource config, mount points, etc.
Job queue: Jobs are added to queues, where they wait for compute capacity. Capacity comes from 1+ compute environments.
Compute environment: managed or unmanaged compute, configurable with instance type/size, vCPU amount, spot price, or using an existing environment with ECS (only with ECS).
Managed compute environment: Batch manages capacity, you pick on-demand or spot, instance size/type, max spot price. Runs in a VPC; can run in private subnets but you need to provide gateways.
Unmanaged compute environment: You create everything and manage everything outside of Batch (with ECS).
Jobs can come from Lambda API calls, Step Functions integration or API call, target of EventBridge (e.g. from S3).
When completed, can store data and metadata in S3 and DynamoDB, can continue execution of Step Functions, or post to Batch Event Stream.
Difference between AWS Batch and AWS Lambda
Lambda has a 15-min execution limit, a 10 GB disk space limit (raised from 512 MB on 2022/03/24, so probably not reflected in the exam yet) and limited runtimes
Batch uses docker (so any runtime) and has no resource limits.
ElastiCache
Redis: advanced data structures, persistent, multi-az, read replicas, can scale up but not out (and can't scale down), backups and restores. Highly available (multi-az)
Memcached: simple K/V, non-persistent, can scale up and out (multiple nodes), multi-thread, no backup/restore. NOT highly available
EFS and FSx
FSx for Windows
ENIs injected into VPCs
Native Windows FS
needs to be connected with Directory Service or self-managed AD
Single or Multi-AZ
on-demand and scheduled backups
accessible using VPC, VPN, peering, direct connect
Encryption at rest (KMS) and in transit
Keywords: VSS, SMB, DFS
FSx for Lustre
ENIs injected into VPCs
HPC for Linux (POSIX)
Used for ML, big data or financial
100s GB/s
deployment types: Scratch (short term, no replication) and Persistent (longer term, HA in one AZ, self-healing)
Available over VPN or direct connect
Data is lazy loaded from S3 and can sync back to S3
< 1 ms latency.
EFS
NFSv4 FS for Linux
Mount targets in VPC
General purpose and Max I/O modes
Bursting and Provisioned throughput modes (separate from size)
Standard and IA storage classes.
Note on FSx for Windows: it's impossible to update the deployment type (Single-AZ or Multi-AZ) of a file system after it has been created
To migrate to Multi-AZ, create a new file system and use DataSync to replicate the data.
QuickSight
BA/BI tool for visualizations and ad-hoc analysis.
Supports discovery and integration with AWS or external data sources
Used for dashboards or visualization.
SQS
Visibility timeout
Default is 30s
can be between 0s and 12h
Set on queue or per message.
Extended client library
for messages over SQS max (256 KB)
Allows larger payloads (up to 2 GB) stored in S3
SendMessage uploads to S3 automatically and stores the link in the message
ReceiveMessage loads payload from S3 automatically
DeleteMessage also deletes payload in S3
The exam often mentions Java (the extended client is a Java library).
Delay queues
Postpone delivery of message (only in Standard queues)
Set DelaySeconds and messages will be added immediately to the queue but will only be visible after the delay
Min (default) is 0s, max is 15m.
Dead-letter queues
Every time a message is received (or visibility timeout expires) in a queue, ReceiveCount is increased
When ReceiveCount > maxReceiveCount a message is moved to the dead-letter queue
Enqueue timestamp is unchanged (so Retention period is time at queue + time at DL queue).
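The DLQ behavior above is configured through the queue's redrive policy. A sketch of the attributes you'd pass to boto3's `set_queue_attributes` (ARN and threshold are hypothetical; shown as a dict, with the actual call left as a comment):

```python
import json

# RedrivePolicy is a JSON string inside the queue attributes
redrive_policy = {
    "deadLetterTargetArn": "arn:aws:sqs:us-east-1:111122223333:my-dlq",
    # Once ReceiveCount exceeds this, the message moves to the DLQ.
    # Its enqueue timestamp is NOT reset by the move.
    "maxReceiveCount": "5",
}
attrs = {"RedrivePolicy": json.dumps(redrive_policy)}
# Would be executed with:
# boto3.client("sqs").set_queue_attributes(QueueUrl=queue_url, Attributes=attrs)
```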
FIFO queue
- 300 messages per second limit, or 3000 per second with batching (up to 10 messages per batch).
Amazon MQ
Open-source message broker based on Apache ActiveMQ
JMS API with protocols such as AMQP, MQTT, OpenWire and STOMP
Provides queues and topics.
Runs in VPC with single instance or HA pair (active/standby)
Comparison with SNS and SQS: SNS and SQS use AWS APIs, public, highly scalable, AWS integrated. Amazon MQ is based on ActiveMQ and uses protocols JMS, AMQP, MQTT, OpenWire and STOMP. Look for protocols in the exam. Also, SNS and SQS for new apps, Amazon MQ for migrations with little to no app change.
Lambda
FaaS, short-running (default 3s, max 15m)
Function = piece of code + wrapping and config
It uses a runtime (Python, Ruby, Java, Go and C#), and it's loaded and run in a runtime environment
The environment has a direct memory (128MB to 10240 MB), indirect CPU and instance storage (default 512 MB, max 10 GB) allocation.
Docker is an anti-pattern for lambda, lambda container images is something different and is possible.
Used for serverless apps, file processing (S3 events), DB triggers (DynamoDB), serverless cron (EventBridge), realtime stream data processing (Kinesis).
By default Lambda runs in the public space and can't access VPC services
It can also run inside a VPC (needs EC2 Network permissions)
Technically Lambda runs in a separate (shared) VPC, creates an ENI in your VPC per function (NOT per invocation) and uses an NLB, with 90s for initial setup and no additional invocation delay.
Lambda uses an Execution role (IAM role) which grants permissions.
Also a resource policy can control what can invoke the lambda.
Lambda logs to CW Logs, posts metrics to CW and can use X-Ray. Needs permissions for this, in the Execution role.
Context includes runtime + variables created before handler + /tmp. Context can be reused, but we can't control that, must assume new context.
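The "reuse but never rely on it" rule usually looks like this in practice: initialize expensive things at module level so a warm context reuses them, while the handler stays correct even on a fresh context. A minimal sketch (the counter is just there to make the reuse visible):

```python
import time

EXPENSIVE_INIT_RUNS = 0

def _expensive_init():
    # Stands in for loading config, opening SDK clients, etc.
    global EXPENSIVE_INIT_RUNS
    EXPENSIVE_INIT_RUNS += 1
    return {"loaded_at": time.time()}

# Runs once per cold start (per execution context), not per invocation
CONFIG = _expensive_init()

def handler(event, context=None):
    # CONFIG is reused on warm invocations; never store per-request
    # state here, since a new context can appear at any time
    return {"init_runs": EXPENSIVE_INIT_RUNS}

# Two invocations in the same context share a single init
print(handler({}))  # {'init_runs': 1}
print(handler({}))  # {'init_runs': 1}
```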
Cold start: Provision HW, install environment, download code, run code before handler. Can pre-warm using Provisioned Concurrency. You are NOT billed for cold-start time (not even for code before the handler).
Execution process: Init (cold start) (if necessary), Invoke (runs the function Handler), Shutdown (terminate environment).
Lambda + ALB: ALB synchronously invokes Lambda (automatically translates HTTP(s) request to Lambda event).
Multi-value headers: (When using ALB + Lambda) Groups query string values by key, e.g. http://a.io?search=a&search=b is passed as multiValueQueryStringParameters: {"search": ["a","b"]}. If not using multi-value headers, only the last value is sent, e.g. "queryStringParameters": {"search":"b"}
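A handler that reads both shapes of that ALB event makes the difference concrete (the event below is a trimmed, hypothetical ALB payload, not a full one):

```python
def handler(event, context=None):
    # With multi-value headers enabled, the ALB sends every value per key
    multi = event.get("multiValueQueryStringParameters") or {}
    searches = multi.get("search", [])
    # Without it, only the last value for a repeated key survives
    single = (event.get("queryStringParameters") or {}).get("search")
    return {"multi": searches, "single": single}

event = {
    "multiValueQueryStringParameters": {"search": ["a", "b"]},
    "queryStringParameters": {"search": "b"},
}
print(handler(event))  # {'multi': ['a', 'b'], 'single': 'b'}
```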
Lambda layers
Share and reuse code by externalising libraries, which are shared between functions
Also allows new, unsupported runtimes such as Rust
Deployment zip only contains specific code (is smaller)
Can use AWS layers or write your own.
Lambda versions
A function has immutable versions
Each version has its own ARN (a qualified ARN); the unqualified ARN points to $Latest
Each includes code + config (including env vars)
$Latest points at the latest version, and aliases like Dev, Stage, Prod can be created and updated
A version is created when a Lambda is published, but it can be deployed without being published
You can also create aliases that point a % of traffic to an alias and another % to another alias.
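Weighted aliases are set through the alias routing config. A sketch of the boto3 parameters (function name and versions are hypothetical; no AWS call is made here):

```python
# Alias "Prod" sends 90% of traffic to version 1 and 10% to version 2
alias_params = {
    "FunctionName": "my-function",   # hypothetical function
    "Name": "Prod",
    "FunctionVersion": "1",          # primary version gets the remainder (90%)
    "RoutingConfig": {
        "AdditionalVersionWeights": {"2": 0.1},  # 10% shifted to version 2
    },
}
# Would be executed with: boto3.client("lambda").create_alias(**alias_params)
```

This is the building block for canary-style rollouts: shift a small weight to the new version, watch metrics, then update the alias.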
Lambda invocation types
Sync: CLI/API invokes and waits for response. Same is used through API Gateway. Client handles errors or retries.
Async: Typical when AWS services invoke Lambdas, such as S3. Lambda handles retries (configurable 0-2 times). Must be idempotent!! Events can be sent to a DLQ and destinations (SQS, SNS, Lambda and EventBridge).
Event Source Mapping: Kinesis data streams sends batches of events to Lambdas using Event Source Mapping. Lambda needs permissions to access the source (which are used on its behalf by Event Source Mapping). Can use DLQ for failed events.
Lambda container images
Include Lambda Runtime API (to run) and Runtime Interface Emulator (to local test) in the container image
Image is built and pushed to ECR, then operates as normal.
API GW
Highly available
scalable
handles auth (directly with Cognito or with a Lambda authorizer), throttling, caching, CORS, transformations, OpenAPI spec, direct integration.
Can connect to services in AWS or on-prem
Supports HTTP, REST and WebSocket
Endpoint types
Edge-optimized: Routed to the nearest CloudFront POP
Regional: Clients in the same region
Private: Endpoint accessible only within a VPC via interface endpoint
APIs are deployed to stages, each stage has one deployment. Stages can be environments (dev, prod) or version (v1, v2). Each stage has its own config. They are NOT immutable, can be changed and rolled back. Stages can be enabled for canary deployments.
2 phases: Request: Authorize, validate and transform. Response: transform, prepare and return. Request is called method request and is converted to integration request, which is passed to the backend. Response is called integration response, which is converted to method response and is returned to the client.
Types of integration
Mock: For testing, no backend
HTTP: Set translation for method->integration request and integration->method response in the API GW
HTTP Proxy: Pass through request unmodified, return to the client unmodified (backend needs to use supported format)
AWS: Exposes AWS service actions
AWS_PROXY(Lambda): Low admin overhead Lambda endpoint.
Mapping templates: Used for AWS and HTTP (non-PROXY) integrations. Modify or rename parameters, body or headers of the request/response. Uses Velocity Template Language (VTL). Can transform a REST request to a SOAP API.
API GW has a timeout of 29s (can’t be increased)
Errors
4XX: Client error, invalid request on client side
5XX: Server error, valid request, backend issue
400: Bad request, generic
403: Access denied, authorizer denies or WAF filtered
429: API GW throttled the request
502: Bad GW, bad output returned by backend
503: Service unavailable, backend offline or overloaded
504: Integration failure/timeout, 29s limit reached.
Cache: TTL 0s to 3600s (default 300s), 500 MB to 237 GB, can be encrypted. Defined per stage. Request only goes to backend if cache miss.
Payload limit: 10 MB
CloudFront
Private behaviors: A behavior can be made private if it uses a Trusted Signer (key created by root user). It will require a signed URL (access to 1 object) or signed cookie (access to the whole origin).
Origin Access Identity: Set an identity to the CloudFront behavior and only allow that identity in the origin (e.g. S3 bucket).
Storage Gateway
File Gateway
Access S3/Glacier through NFS and SMB protocols
Only the most recent data is stored (cached) on prem
NOT low-latency, because data that isn't cached on prem must be fetched from S3.
Tape Gateway
Access S3/Glacier through an iSCSI VTL (Virtual Tape Library)
Mainly used for archiving
Backed by Glacier, so can't consume in real time.
Stored-Volume Gateway
iSCSI-mounted volume stored on-prem and async backed to S3 as EBS snapshots
16 TB per volume, max 32 volumes per gateway = max 512 TB
Is low latency, since all data is stored on prem, S3 is just used as backup of EBS snapshots of the volume.
Cached-Volume Gateway
iSCSI-mounted volume stored in S3 and cached on-prem
32 TB per volume, max 32 volumes per gateway = 1024 TB
Data is stored on S3, NOT on-prem
On-prem only has a cache of the data, so low latency will only work for the cached data, not all data.
Migrations
6R
Retain: Stays on prem, no migration for now, revisit in the future
Re-host: Lift and shift with no changes
Refactor: Architect brand new, cloud native app. Lots of work
Re-platform: Lift and shift with some tinkering
Replace: Buy a native solution (not build one)
Retire: Solution is no longer needed, it's not replaced with something else
Migration process:
Migration Plan
Discovery: making sure we know what's really happening, identify dependencies, check all the corners and ask the questions
Assessment and profiling, data requirements and classification, prioritization, business logic and infrastructure dependencies
Design: Detailed migration plan, effort estimation, security and risk assessment.
Tools: AWS Application Discovery Service, AWS Database Migration Service.
Migration Build
Transform: Network topology, migrate, deploy, validate
Transition: Pilot testing, transition to support, release management, cutover and decommission.
Migration Run
Operate: Staff training, monitoring, incident management, provisioning
Optimize: Monitoring-driven optimization, continuous integration and continuous deployment, well-architected framework.
S3
S3 Object Lock (requires versioning)
Legal Hold: Turn on or off. Object versions can't be deleted or modified while turned on, can be turned off.
Retention Compliance: Set a duration, object versions can't be deleted or modified for the duration. Can't be disabled, not even by root.
Retention Governance: Set a duration, object versions can't be deleted or modified for the duration. Special permissions allow changing the policy.
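Retention mode and legal hold are both set per object version. A sketch of the `put_object` parameters involved (bucket, key and date are hypothetical, and the bucket must have been created with Object Lock enabled; shown as a dict without calling AWS):

```python
from datetime import datetime, timezone

put_params = {
    "Bucket": "locked-audit-bucket",   # hypothetical; Object Lock enabled at creation
    "Key": "audit/report.pdf",
    "Body": b"report bytes",
    # COMPLIANCE can't be shortened or removed, not even by root;
    # GOVERNANCE can be overridden with special permissions
    "ObjectLockMode": "COMPLIANCE",
    "ObjectLockRetainUntilDate": datetime(2025, 1, 1, tzinfo=timezone.utc),
    # Legal hold is independent of retention and can be toggled on/off
    "ObjectLockLegalHoldStatus": "ON",
}
# Would be executed with: boto3.client("s3").put_object(**put_params)
```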
Amazon Macie
Data security and privacy service. Identifies data that should be private.
Select S3 buckets, create a discovery job, set managed or custom data identifiers, post policy findings and sensitive data findings to EventBridge or Security Hub.
Interface and Gateway endpoints
Interface endpoints: ENI with private IP for traffic to services with PrivateLink
Gateway endpoints: Target for a route in the RT, only used for S3 or DynamoDB
NACL: Limit of 20 rules, can be increased to 40
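To make the interface-vs-gateway distinction concrete, here is a sketch of the parameters for creating a gateway endpoint for DynamoDB with boto3 (the VPC and route table IDs are hypothetical; DynamoDB and S3 only support the Gateway type, and the route is added to the RT for you):

```python
endpoint_params = {
    "VpcId": "vpc-0abc1234",                         # hypothetical VPC
    "ServiceName": "com.amazonaws.us-east-1.dynamodb",
    "VpcEndpointType": "Gateway",                    # NOT Interface for DynamoDB
    # Gateway endpoints work by adding a route to these route tables
    "RouteTableIds": ["rtb-0def5678"],
}
# Would be executed with: boto3.client("ec2").create_vpc_endpoint(**endpoint_params)
```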
AWS Direct Connect (DX)
Public VIF: Used for AWS public services (including VPN)
Private VIF: Used for resources inside a VPC
Schema Conversion Tool (SCT)
- You need to configure the data extraction agent first on your on-premises server.
Database Migration Service (DMS)
- can directly migrate the data to Amazon Redshift.
RDS
cross-region read replicas
multi-master: all master nodes need to be in the same region, and can't enable cross-region read replicas.
Max size: 64 TB
Step Functions
- Does not directly support Mechanical Turk, in that case use SWF.
CloudSearch
- Provides search capabilities, for example for documents stored in S3.
AWS Config
Can aggregate data from multiple AWS accounts using an Aggregator
Can only perform actions in the same AWS account.
Most important thing to remember
- You can do it!!!
Master AWS with Real Solutions and Best Practices.
Join over 2500 devs, tech leads, and experts learning real AWS solutions with the Simple AWS newsletter.
Analyze real-world scenarios
Learn the why behind every solution
Get best practices to scale and secure them
Subscribe now and you'll get the AWS Made Simple and Fun ebook for free (valued at $10). Limited offer, don't wait!
If you'd like to know more about me, you can find me on LinkedIn or at www.guilleojeda.com