AWS DVA review

I PASSED!



ELB + ASG

Why use a load balancer

  • spread load across multiple downstream instances
  • expose a single point of access (DNS) to your application
  • seamlessly handle failures of downstream instances
  • do regular health checks on your instances
  • provide SSL termination (HTTPS) for your websites
  • enforce stickiness with cookies
  • high availability across zones
  • separate public traffic from private traffic

ELB integrated with many AWS services

  • EC2, ASG, ECS
  • ACM, CloudWatch
  • Route 53, WAF, Global Accelerator

ALB

  • Layer 7 (HTTP)
  • load balancing to multiple HTTP applications across machines (target groups)
  • load balancing to multiple applications on the same machine (containers)
  • support for HTTP and WebSockets
  • support redirects (from HTTP to HTTPS)
  • routing tables to different target groups
    • based on path in URL
    • based on hostname in URL
    • routing based on query string, headers
  • ALBs are a great fit for microservices and container-based applications
  • has a port mapping feature to redirect to a dynamic port in ECS
  • in comparison, we would need multiple CLBs (one per application)
  • the application doesn’t see the client IP directly (it is passed in the X-Forwarded-For header)

Target groups

  • EC2 instances
  • ECS tasks
  • lambda functions (HTTP request is translated into a JSON event)
  • IP addresses - must be private IPs
  • ALB can route to multiple target groups, each target group could have multiple instances
  • health checks are at the target group level
  • you can set rules to decide which target group to route the traffic to

NLB

  • layer 4 (TCP and UDP)
  • handles millions of requests per second with lower latency
  • NLB has one static IP per AZ, and supports assigning an Elastic IP (helpful for whitelisting specific IPs)
  • NLBs are used for extreme performance with TCP or UDP traffic

Sticky Sessions

  • it is possible to implement stickiness so that the same client is always redirected to the same instance behind a load balancer
  • this works for CLB and ALB
  • the cookie used for stickiness has an expiration date you control
  • use case: make sure the user doesn’t lose their session data
  • enabling stickiness may bring imbalance to the load over the backend EC2 instances

SSL - server name indication

  • SNI solves the problem of loading multiple SSL certificates onto one web server (to serve multiple websites)
  • it is a newer protocol, and requires the client to indicate the hostname of the target server in the initial SSL handshake
  • the server will then find the correct certificate, or return the default one
  • only works for ALB, NLB, and CloudFront (not CLB)

Connection Draining (De-registration delay)

  • time to complete the in-flight requests while the instance is de-registering or unhealthy
  • stops sending new requests to the EC2 instance which is de-registering
  • between 1 to 3600 seconds (default 300 seconds)
  • can be disabled (set value to 0)
  • set to a low value if your requests are short
  • instances will be terminated after the draining time is over

X-Forwarded-For and X-Forwarded-Proto

  • X-Forwarded-For
    • The X-Forwarded-For (XFF) header is a de-facto standard header for identifying the originating IP address of a client connecting to a web server through an HTTP proxy or a load balancer. When traffic is intercepted between clients and servers, server access logs contain the IP address of the proxy or load balancer only. To see the original IP address of the client, the X-Forwarded-For request header is used.
  • X-Forwarded-Proto
    • The X-Forwarded-Proto (XFP) header is a de-facto standard header for identifying the protocol (HTTP or HTTPS) that a client used to connect to your proxy or load balancer. Your server access logs contain the protocol used between the server and the load balancer, but not the protocol used between the client and the load balancer. To determine the protocol used between the client and the load balancer, the X-Forwarded-Proto request header can be used.

ASG

  • A launch configuration (launch template is the newer version)
    • AMI + instance type
    • EC2 user data
    • EBS volume
    • security groups
    • SSH key pair
  • min size, max size, initial capacity
  • network + subnets information
  • load balancer information (so the ASG knows in which target group to register the instances it launches); ASG and ELB can be linked
  • scaling policies
  • it is possible to scale an ASG based on CloudWatch alarms
  • an alarm monitors a metric (such as average CPU)
  • metrics are computed across the overall ASG instances
  • to update an ASG, you must provide a new launch configuration or new launch template
  • IAM roles attached to an ASG will get assigned to EC2 instances launched
  • ASGs are free; you only pay for the underlying resources launched

Scaling Policies

  • Target tracking scaling
    • simplest and easiest to set up (see the boto3 sketch after this list)
    • example: I want the average ASG CPU to stay at around 40%
  • Simple / Step scaling
    • when a CloudWatch alarm is triggered, then add 2 units
    • when a CloudWatch alarm is triggered, then remove 1 unit
    • the difference between simple and step scaling policies is that a step policy lets you create step adjustments, so the ASG changes the number of instances based on the size of the alarm breach
  • scheduled actions
    • anticipate a scaling based on known usage patterns
    • example: increase the min capacity to 10 at 5pm on Fridays
  • predictive scaling
    • continuously forecast load and schedule scaling ahead
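
A minimal boto3 sketch of a target tracking policy like the one above; the ASG name, policy name, and target value are hypothetical.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# keep the average CPU of the ASG at around 40%
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",          # hypothetical ASG name
    PolicyName="keep-cpu-at-40",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 40.0,
    },
)
```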

Good metrics to scale on

  • CPU utilization
  • request count per target
  • average network in / out

Scaling cooldowns

  • after a scaling activity happens, you are in the cooldown period (default is 300 seconds)
  • during the cooldown period, the ASG will not launch or terminate additional instances (to allow for metrics to stabilize)
  • advice: use a ready-to-use AMI to reduce configuration time, in order to serve requests faster and reduce the cooldown period

RDS

  • managed DB service for databases that use SQL as a query language

  • it allows you to create databases in the cloud that are managed by AWS

    • Postgres
    • MySQL
    • MariaDB
    • Oracle
    • SQL server
    • Aurora
  • RDS is a managed service

    • automated provisioning, OS patching
    • continuous backups and restore to specific timestamp (point in time restore)
    • monitoring dashboards
    • read replicas for improved read performance
    • Multi AZ setup for DR
    • maintenance windows for upgrades
    • scaling capability
    • storage backed by EBS
    • RDS DB will be launched in a VPC in an AZ
  • you can’t SSH into your instances

RDS backups

  • Automated backups
    • daily full backup of the database
    • transaction logs are backed up by RDS every 5 mins
    • ability to restore to any point in time (from oldest to 5 mins ago)
    • 7 days retention (can be increased to 35 days)
  • DB snapshots
    • manually triggered by the user
    • retention of backup for as long as you want

RDS storage auto scaling

  • helps you increase storage on your RDS DB instance dynamically
  • when RDS detects you are running out of free database storage, it scales automatically
  • avoid manually scaling your database storage
  • you have to set Maximum storage threshold
  • automatically modify storage if
    • free storage is less than 10% of allocated storage
    • low storage lasts at least 5 mins
    • 6 hours have passed since last modification
  • useful for applications with unpredictable workloads

RDS read replicas for read scalability

  • up to 5 read replicas
  • within AZ, cross AZ or cross region
  • replication is async, so reads are eventually consistent
  • replicas can be promoted to their own DB
  • applications must update the connection string to leverage read replicas
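
A minimal boto3 sketch for creating a read replica as described above; the instance identifiers and AZ are hypothetical.

```python
import boto3

rds = boto3.client("rds")

# create a cross-AZ read replica of an existing production instance
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="mydb-replica-1",        # hypothetical replica name
    SourceDBInstanceIdentifier="mydb-prod",       # hypothetical source instance
    AvailabilityZone="us-east-1b",
)
```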

RDS read replicas - use case

  • you have a production database that is taking on normal load
  • you want to run a reporting application to run some analytics
  • you create a read replica to run the new workload there
  • the production application is unaffected
  • read replicas can only be used for SELECT statements (no INSERT, UPDATE, or DELETE)

RDS read replicas - network cost

  • in AWS there is a network cost when data goes from one AZ to another
  • for RDS read replicas, within the same region, you don’t pay that fee, but you do need to pay if data goes to another region

RDS multi AZ (DR)

  • SYNC replication
  • one DNS name - automatic app failover to standby
  • increase availability
  • failover in case of loss of AZ, loss of network, instance or storage failure
  • no manual intervention in apps
  • not used for scaling (the standby instance cannot serve reads or writes)
  • NOTE: the read replicas can be setup as Multi AZ for DR

RDS from single AZ to Multi AZ

  • zero downtime operation (no need to stop the DB)
  • just click on modify for the database
  • the following happens internally
    • a snapshot is taken
    • a new DB is restored from the snapshot in a new AZ
    • synchronization is established between the two databases

RDS security - encryption

  • at rest encryption
    • possibility to encrypt the master and read replicas with AWS KMS - AES-256 encryption
    • encryption has to be defined at launch time
    • if the master is not encrypted, the read replicas cannot be encrypted
  • in flight encryption
    • SSL certificates to encrypt data to RDS in flight
    • provide SSL options with trust certificate when connecting to database

RDS encryption operations

  • encrypting RDS backups
    • snapshots of unencrypted RDS databases are un-encrypted
    • snapshots of encrypted RDS databases are encrypted
    • can copy a snapshot into an encrypted one
  • to encrypt an un-encrypted RDS database
    • create a snapshot of the un-encrypted database
    • copy the snapshot and enable encryption for the snapshot
    • restore the database from the encrypted snapshot
    • migrate applications to the new database, and delete the old database
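
A minimal boto3 sketch of the "encrypt an un-encrypted database" flow above; the snapshot and instance identifiers and the KMS key are hypothetical.

```python
import boto3

rds = boto3.client("rds")

# copy the un-encrypted snapshot into an encrypted one
rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier="mydb-unencrypted-snap",   # hypothetical
    TargetDBSnapshotIdentifier="mydb-encrypted-snap",     # hypothetical
    KmsKeyId="alias/aws/rds",                             # KMS key to encrypt with
)

# restore a new, encrypted DB instance from the encrypted snapshot
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="mydb-encrypted",                # hypothetical
    DBSnapshotIdentifier="mydb-encrypted-snap",
)
```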

RDS security - Network and IAM

  • network security
    • RDS databases are usually deployed within a private subnet, not in a public one
    • RDS security works by leveraging security groups (the same concept as for EC2 instances) - it controls which IP / security group can communicate with RDS
  • Access management
    • IAM policies help control who can manage AWS RDS
    • a traditional username and password can be used to log in to the database
    • IAM-based authentication can be used to log in to RDS MySQL and PostgreSQL

Aurora

  • Aurora is a proprietary technology from AWS
  • postgres and MySQL are both supported as Aurora DB
  • Aurora is AWS cloud optimized and claims 5x performance improvement over MySQL on RDS, over 3x performance of Postgres on RDS
  • Aurora storage automatically grows in increments of 10GB, up to 64TB
  • Aurora can have 15 replicas while MySQL has 5, and the replication process is faster
  • failover in Aurora is instantaneous, it is HA native
  • users connect to a writer endpoint or a reader endpoint; these endpoints redirect traffic to the correct instances
  • Security is the same as RDS
  • Aurora offers 4 configurations
    • one writer, multiple readers
    • one writer, multiple readers - parallel query
    • multiple writers
    • serverless

ElastiCache

  • in the same way that RDS gives you managed relational databases, ElastiCache gives you managed Redis and Memcached
  • caches are in-memory databases with really high performance and low latency
  • helps reduce load off of databases for read intensive workloads
  • helps make your application stateless
  • AWS takes care of OS maintenance and patching, optimizations, setup, configuration, monitoring, failure recovery and backups
  • using ElastiCache involves heavy application code changes

DB cache

  • the application queries ElastiCache first; on a cache miss, it gets the data from RDS and stores it in ElastiCache
  • helps relieve read load on RDS
  • cache must have an invalidation strategy to make sure only the most current data is used in there

User session store

  • user logs into any of the application
  • the application writes the session data into ElastiCache
  • the user hits another instance of our application
  • the instance retrieves the data and the user is already logged in

Redis vs Memcached

| Redis | Memcached |
| --- | --- |
| Multi AZ with auto failover | multi node for partitioning of data (sharding) |
| read replicas to scale reads and have HA | no HA |
| data durability using AOF persistence | non persistent |
| backup and restore features | no backup and restore |
| — | multi-threaded architecture |
  • Redis Auth
    • if you enable encryption in transit, you can enable Redis AUTH; you need to set up a token for your application to connect to Redis

Caching implementation considerations

  • is it safe to cache data
    • data may be out of date, eventually consistent
  • is caching effective for that data
    • pattern: data changing slowly, few keys are frequently needed, good to use caching
    • anti pattern: data changing rapidly, all of a large key space frequently needed, not good to use caching
  • is data structured well for caching?
    • key value caching, or caching of aggregations results?
    • caching is good for well structured data

lazy loading / cache aside / lazy population


```python
# python

def get_user(user_id):
    # check the cache
    record = cache.get(user_id)

    if record is None:
        # cache miss: run a DB query
        record = db.query("select * from users where id = ?", user_id)
        # populate the cache
        cache.set(user_id, record)

    return record
```

write through - add or update cache when database is updated

  • when there is a write call
  1. write to DB
  2. write to cache
  • pros
    • data in cache is never stale, reads are quick
    • write penalty vs read penalty (each write requires 2 calls)
  • cons
    • missing data until it is added/updated in the DB; mitigation is to also implement the lazy loading strategy (combine the 2 strategies together)
    • cache churn - a lot of the data will never be read

```python
# python

def save_user(user_id, values):
    # save to DB
    record = db.query("update users ... where id = ?", user_id, values)

    # push into cache
    cache.set(user_id, record)

    return record
```

cache evictions and TTL (time to live)

  • cache eviction can occur in 3 ways
    • you delete the item explicitly in the cache
    • item is evicted because the memory is full and it is not recently used (LRU)
    • you set an item TTL
  • TTLs are helpful for any kind of data, for example
    • leaderboard
    • comments
    • activity streams
  • TTL can range from a few seconds to hours or days
  • if too many evictions happen due to memory, you should scale up or out
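
A minimal redis-py sketch of setting a TTL on a cached item; the endpoint, key, and TTL value are hypothetical.

```python
import redis

# the ElastiCache Redis primary endpoint (hypothetical)
r = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)

# cache the leaderboard for 60 seconds; the key is evicted automatically afterwards
r.set("leaderboard:top10", "...serialized payload...", ex=60)

# ttl() returns the remaining seconds, or -2 if the key has expired / does not exist
print(r.ttl("leaderboard:top10"))
```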

Final words

  • lazy loading is easy to implement and works for many situations as a foundation, especially on the read side
  • write through is usually combined with lazy loading as targeted for the queries or workloads that benefit from this optimization
  • setting a TTL is usually not a bad idea, except when you are using write-through; set it to a sensible value for your application
  • only cache data that makes sense

ElastiCache replication - cluster mode disabled

  • one primary node, up to 5 replicas
  • asynchronous replication
  • the primary node is used for read and write
  • the other nodes are read only
  • we have only one shard, and all nodes are in the shard, each node has all the data
  • guards against data loss in case of node failure
  • multi AZ enabled by default for failover
  • helpful to scale read performance

ElastiCache replication - cluster mode enabled

  • data is partitioned across shards (helpful to scale writes)
  • each shard has a primary and up to 5 replica nodes, each shard has part of the data
  • multi AZ capability
  • up to 500 nodes per cluster

Route 53

  • DNS

    • Domain Name System which translates the human friendly hostnames into the machine IP addresses
  • Route 53

    • A highly available, scalable, fully managed and authoritative DNS
    • route 53 is also a Domain Registrar
    • ability to check the health of your resources
    • the only AWS service which provides 100% availability SLA

Hosted zones

  • a container for records that define how to route traffic to a domain and its subdomains
  • public hosted zones
    • contains records that specify how to route traffic on the internet (public domain names)
  • private hosted zones
    • contain records that specify how you route traffic within one or more VPCs (private domain names)

CNAME vs alias

  • CNAME

    • points a hostname to any other hostname
    • only for non root domain
  • Alias

    • points a hostname to an AWS resource
    • works for root domain and non root domain
    • free of charge
    • native health check
    • automatically recognizes changes in the resource’s IP addresses
  • Alias targets

    • ELB
    • CloudFront distributions
    • API gateway
    • Elastic Beanstalk environments
    • S3 websites
    • VPC interface endpoints
    • Global accelerator
    • route 53 record in the same hosted zone
  • You cannot set an Alias record for an EC2 DNS name

Route 53 - routing policies

  • define how route 53 responds to DNS queries
  • route 53 supports the following routing policies
    • simple
    • weighted
    • failover
    • latency based
    • geolocation
    • multi value answer
    • geoproximity (using route 53 traffic flow feature)

Simple

  • typically, route traffic to a single resource
  • can specify multiple values in the same record
  • if multiple values are returned, a random one is chosen by the client
  • when Alias is enabled, you can specify only one AWS resource
  • can’t be associated with health checks

Weighted

  • control the percentage of the requests that go to each specific resource
  • assign each record a relative weight
  • DNS records must have the same name and type
  • use cases: load balancing between regions, testing new application versions…
  • assign a weight of 0 to a record to stop sending traffic to a resource
  • if all records have weight of 0, then all records will be returned equally
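
A minimal boto3 sketch of two weighted records like those described above; the hosted zone ID, record name, and IPs are hypothetical.

```python
import boto3

route53 = boto3.client("route53")

# two A records with the same name/type; ~70% / ~30% of DNS answers
for identifier, weight, ip in [("blue", 70, "203.0.113.10"), ("green", 30, "203.0.113.20")]:
    route53.change_resource_record_sets(
        HostedZoneId="Z123456ABCDEFG",        # hypothetical hosted zone
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "A",
                    "SetIdentifier": identifier,
                    "Weight": weight,
                    "TTL": 60,
                    "ResourceRecords": [{"Value": ip}],
                },
            }]
        },
    )
```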

latency

  • redirects to the resource that has the least latency for the user
  • super helpful when latency for users is a priority
  • latency is based on traffic between users and AWS regions
  • Germany users may be directed to the US (if that offers the lowest latency)
  • can be associated with health checks (has a failover capability)

health checks

  • HTTP health checks are only for public resources
  • health check => automated DNS failover
  1. health checks that monitor an endpoint
  2. health checks that monitor other health checks (calculated health checks)
  3. health checks that monitor cloudwatch alarms
  • health checks are integrated with CW metrics

monitor an endpoint

  • about 15 global health checks will check the endpoint health
    • healthy / unhealthy threshold = 3
    • interval - 30 seconds
    • supported protocol: HTTP, HTTPS, TCP
    • if more than 18% of health checkers report the endpoint as healthy, Route 53 considers it healthy; otherwise, it is unhealthy
    • ability to choose which locations you want route 53 to use
  • health checks pass only when the endpoint responds with the 2xx and 3xx status codes
  • health checks can be setup to pass or fail based on the text in the first 5120 bytes of the response
  • your ELB must allow the incoming requests from the route 53 health checkers IP address range

calculated health checks

  • combine the results of multiple health checks into a single health check
  • you can use OR, AND, or NOT
  • can monitor up to 256 child health checks
  • specify how many of the health checks need to pass to make the parent pass
  • usage: perform maintenance to your website without causing all health checks to fail

private hosted zones

  • route 53 health checks are outside the VPC
  • they can’t access private endpoint
  • you can create a CloudWatch metric, associate a CloudWatch alarm with it, and then create a health check that monitors the alarm itself; if the alarm status becomes ALARM, the health check becomes unhealthy

failover

  • create two records associated with 2 resources
  • primary and secondary records
  • the primary record must be associated with a health check
  • if the primary record is unhealthy, DNS will return IP address of the secondary resource

Geolocation

  • different from latency based
  • this routing is based on user location
  • specify location by Continent, Country, or by US state
  • should create a default record (in case there is no match on location)
  • use cases: website localization, restrict content distribution, load balancing…
  • can be associated with health checks

Geoproximity

  • route traffic to your resources based on the geographic location of users and resources
  • ability to shift more traffic to resources based on the defined bias
  • to change the size of the geographic region, specify bias values
    • to expand (1 to 99), more traffic to the resource
    • to shrink (-1 to -99), less traffic to the resource
  • resource can be
    • AWS resources (AWS region)
    • non AWS resources (latitude and longitude)
  • you must use route 53 traffic flow (advanced) to use this feature

Traffic flow

  • simplify the process of creating and maintaining records in large and complex configurations
  • visual editor to manage complex routing decision trees
  • configurations can be saved as traffic flow policy
    • can be applied to different route 53 hosted zones
    • supports versioning

multi value

  • use when routing traffic to multiple resources
  • route 53 return multiple values / resources
  • can be associated with health checks (return only values for healthy resources)
  • up to 8 healthy records are returned for each multi value query
  • multi value is not a substitute for having an ELB (it is more like client side load balancing)

VPC

  • VPC: private network to deploy your resources
  • subnet: allow you to partition your network inside your VPC (AZ resource)
  • a public subnet is a subnet that is accessible from the internet
  • a private subnet is a subnet that is not accessible from the internet
  • to define access to the internet and between subnets, we use route tables

Internet gateway and NAT gateways

  • an internet gateway helps our VPC instances connect with the internet
  • public subnets have a route to the internet gateway
  • NAT gateways (AWS managed) and NAT instances (self managed) allow your instances in your private subnets to access the internet while remaining private

NACL and security groups

  • NACL (network ACL)
    • a firewall which controls traffic from and to subnet
    • can have ALLOW and DENY rules
    • are attached at the subnet level
    • rules only include IP addresses
  • security groups
    • a firewall that controls traffic to and from an ENI / an EC2 instance
    • can have only ALLOW rules
    • rules include IP addresses and other security groups
| Security group | Network ACL |
| --- | --- |
| operates at the instance level | operates at the subnet level |
| supports allow rules only | supports allow rules and deny rules |
| is stateful: return traffic is automatically allowed, regardless of any rules | is stateless: return traffic must be explicitly allowed by rules |
| all rules are evaluated before deciding whether to allow traffic | rules are processed in number order when deciding whether to allow traffic |
| applies to an instance only if someone specifies the security group when launching the instance, or associates the security group with the instance later on | automatically applies to all instances in the subnets it’s associated with (therefore, you don’t have to rely on users to specify the security group) |

VPC Flow logs

  • capture information about IP traffic going into your interfaces
    • VPC flow logs
    • subnet flow logs
    • ENI (elastic network interface) flow logs
  • helps to monitor and troubleshoot connectivity issues
    • subnets to internet
    • subnets to subnets
    • internet to subnets
  • captures network information from AWS managed interfaces too: Elastic load balancers, ElastiCache, RDS, Aurora, etc…
  • VPC flow logs data can go to S3 / CloudWatch logs

VPC Peering

  • connect two VPCs, privately using AWS network
  • make them behave as if they were in the same network
  • must not have overlapping CIDR
  • a VPC peering connection is not transitive (must be established for each pair of VPCs that need to communicate with one another)

VPC endpoints

  • endpoints allow you to connect to AWS services using a private network instead of the public www network
  • this gives you enhanced security and lower latency to access AWS services
  • VPC endpoint gateway: S3 and DynamoDB
  • VPC endpoint interface: the rest of the AWS services
  • only used within your VPC

Site to site VPN and Direct connect

  • site to site VPN
    • connect an on premises VPN to AWS
    • the connection is automatically encrypted
    • goes over the public internet
  • direct connect
    • establish a physical connection between on premises and AWS
    • the connection is private, secure, and fast
    • goes over a private network
    • takes at least a month to establish
  • NOTE: site to site VPN and direct connect cannot access VPC endpoints

S3

buckets

  • S3 allows people to store objects in buckets
  • buckets must have a globally unique name
  • buckets are defined at the region level

objects

  • objects have a key
  • the key is the FULL path
  • the key is composed of prefix + object name
  • there is no concept of directories within buckets
  • just keys with very long names that contain slashes
  • object values are the content of the body
    • max object size is 5TB
    • if uploading more than 5GB, must use multi-part upload
  • metadata (list of text key / value pairs - system or user metadata)
  • tags (unicode key / value pair, up to 10) - useful for security / lifecycle
  • version ID (if versioning is enabled)

versioning

  • you can version your files in S3
  • it is enabled at the bucket level
  • same key overwrite will increment the version: 1,2,3…
  • it is best practice to version your buckets
    • protect against unintended deletes
    • easy roll back to previous version
  • note:
    • any file that is not versioned prior to enabling versioning will have version null
    • suspending versioning does not delete the previous versions

Encryption for objects

SSE-S3

  • encryption using keys handled and managed by S3
  • object is encrypted server side
  • AES-256 encryption type

SSE-KMS

  • encryption using keys handled and managed by KMS
  • KMS advantages: user control + audit trail
  • object is encrypted server side
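
A minimal boto3 sketch of an SSE-KMS upload; the bucket, key, and KMS alias are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# upload an object encrypted server-side with a KMS key
s3.put_object(
    Bucket="my-bucket",                       # hypothetical bucket
    Key="reports/summary.csv",
    Body=b"col1,col2\n1,2\n",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-app-key",           # hypothetical KMS key alias
)
```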

SSE-C

  • server side encryption using data keys fully managed by the customer outside of AWS
  • S3 does not store the encryption key you provide
  • HTTPS must be used, because you need to send the encryption key in the header
  • encryption key must be provided in HTTP headers, for every HTTP request made

client side encryption

  • client library such as the Amazon S3 encryption client
  • clients must encrypt data themselves before sending to S3
  • clients must decrypt the data themselves when retrieving from S3
  • customer fully manages the keys and encryption cycle

Encryption in transit (SSL/TLS)

  • Amazon S3 exposes
    • HTTP endpoint: non encrypted
    • HTTPS endpoint: encryption in flight
  • you are free to use the endpoint you want, but HTTPS is recommended
  • most clients would use the HTTPS endpoint by default
  • HTTPS is mandatory for SSE-C

security

  • user based
    • IAM policies - which API calls should be allowed for a specific user from IAM console
  • resource based
    • bucket policies - bucket wide rules from the S3 console - allows cross account
    • object ACL - finer grain
    • bucket ACL - less common
  • NOTE: an IAM principal can access an S3 object if
    • the user IAM permissions allow it OR the resource policy allows it
    • AND there is no explicit DENY

bucket settings for block public access

  • block public access to buckets and objects granted through
    • new access control lists
    • any access control lists
    • new public bucket or access point policies
    • block public and cross account access to buckets and objects through any public bucket or access point policies
  • these settings were created to prevent company data leaks
  • if you know your bucket should never be public, leave these on
  • can be set at the account level

others

  • networking
    • supports VPC endpoints (for instances in VPC without www internet)
  • logging and audit
    • S3 access logs can be stored in other S3 buckets
    • API calls can be logged in AWS cloudtrail
  • user security
    • MFA delete: MFA can be required in versioned buckets to delete objects
    • pre-signed URLs: URLs that are valid for a limited time (premium videos service for logged in users)

CORS

  • an origin is a scheme, host, and port
  • CORS means cross origin resource sharing
  • web browser based mechanism to allow requests to other origins while visiting the main origin
  • the requests won’t be fulfilled unless the other origin allows for the requests, using CORS headers
  • if a client does a cross origin request on our S3 bucket, we need to enable the correct CORS headers
  • you can allow for a specific origin or for * (for all origins)
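
A minimal boto3 sketch of setting a CORS rule on a bucket; the bucket name and allowed origin are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# allow cross-origin GETs from one specific origin
s3.put_bucket_cors(
    Bucket="my-assets-bucket",                # hypothetical bucket
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],   # or ["*"] for all origins
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }]
    },
)
```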

consistency model

  • strong consistency as of Dec 2020

AWS CLI, SDK, IAM Roles and policies

  • AWS CLI Dry run
    • tells you if your command would have succeeded or not without actually executing it
  • AWS CLI STS decode errors
    • decode API error messages using the STS command line

AWS EC2 instance metadata

  • it allows AWS EC2 instance to learn about themselves without using an IAM role for that purpose
  • the URL is http://169.254.169.254/latest/meta-data/
  • you can retrieve the IAM role name from the metadata, but you cannot retrieve the IAM policy
  • metadata = info about the EC2 instance
  • user data = launch script of the EC2 instance
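
A minimal sketch of querying the metadata endpoint from inside an instance (assumes IMDSv1 is enabled; IMDSv2 requires fetching a session token first).

```python
import urllib.request

BASE = "http://169.254.169.254/latest/meta-data/"

# instance ID of the current EC2 instance
with urllib.request.urlopen(BASE + "instance-id") as resp:
    print(resp.read().decode())

# IAM role name attached to the instance (the IAM policy itself is not retrievable)
with urllib.request.urlopen(BASE + "iam/security-credentials/") as resp:
    print(resp.read().decode())
```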

MFA with CLI

  • to use MFA with the CLI, you must create a temporary session
  • to do so, you must run the STS GetSessionToken API call

AWS SDK

  • what if you want to perform actions on AWS directly from your applications code?
  • you can use an SDK
  • we have to use the AWS SDK when coding against AWS services such as DynamoDB
  • if you don’t specify or configure a default region, then us-east-1 will be chosen by default

AWS limit

  • API rate limits
    • the DescribeInstances API for EC2 has a limit of 100 calls per second
    • GetObject on S3 has a limit of 5500 GET per second per prefix
    • for intermittent errors: implement exponential backoff
    • for consistent errors: request an API throttling limit increase
  • service quotas
    • running on-demand standard instances: 1152 vCPU
    • you can request a service limit increase by opening a ticket
    • you can request a service quota increase by using the service quotas API

Exponential Backoff

  • if you get ThrottlingException intermittently, use exponential backoff
  • retry mechanism already included in AWS SDK API calls
  • must implement yourself if using the AWS API as-is or in specific cases
    • must only implement the retries on 5xx server errors and throttling
    • do not implement on the 4xx client errors
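
A minimal sketch of exponential backoff with jitter (not the SDK's built-in retry logic); the retryable error codes and retry limit are illustrative assumptions.

```python
import random
import time

RETRYABLE = {"ThrottlingException", "InternalError", "ServiceUnavailable"}  # illustrative

def call_with_backoff(call_api, max_retries=5):
    """Retry call_api on throttling / 5xx-style errors, never on 4xx client errors."""
    for attempt in range(max_retries):
        try:
            return call_api()
        except Exception as error:
            # botocore ClientError exposes the error code under response["Error"]["Code"]
            code = getattr(error, "response", {}).get("Error", {}).get("Code", "")
            if code not in RETRYABLE or attempt == max_retries - 1:
                raise
            # wait 1s, 2s, 4s, ... plus jitter before retrying
            time.sleep(2 ** attempt + random.random())
```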

AWS CLI credentials provider chain

  • the CLI will look for credentials in this order
  1. command line options
  2. environment variables
  3. CLI credentials file
  4. CLI configuration file
  5. container credentials
  6. instance profile credentials

AWS SDK default credentials provider chain

  • the java SDK will look for credentials in this order
  1. java system properties
  2. environment variables
  3. the default credential profiles file
  4. Amazon ECS container credentials
  5. instance profile credentials

Credentials Scenario

  • an application deployed on an EC2 instance is using environment variables with credentials from an IAM user to call the Amazon S3 API

  • The IAM user has S3FullAccess permissions

  • the application only uses one S3 bucket, so according to best practices

    • an IAM role and EC2 instance profile was created for the EC2 instance
    • the role was assigned the minimum permissions to access that one S3 bucket
  • the IAM instance profile was assigned to the EC2 instance, but it still had access to all S3 buckets, why?

  • the credentials provider chain is still giving priorities to the environment variables

credentials best practice

  • never store AWS credentials in your code
  • best practice is for credentials to be inherited from the credentials chain
  • if working within AWS, use IAM roles
    • EC2 instance roles for EC2 instances
    • ECS roles for ECS tasks
    • lambda roles for lambda functions
  • if working outside AWS, use environment variables / named profiles

signing AWS API requests

  • when you call the AWS HTTP API, you sign the request so that AWS can identify you, using your AWS credentials (access key and secret key)
  • note: some requests to Amazon S3 don’t need to be signed
  • if you use the SDK or CLI, the HTTP requests are signed for you
  • you should sign an AWS HTTP request using Signature v4 (SigV4)

sigV4 options

  • HTTP header
  • query string in URL

S3 and Athena Advanced

S3 MFA delete

  • MFA forces user to generate a code on a device before doing important operations on S3
  • to use MFA delete, we need to enable versioning on the S3 bucket
  • you will need MFA to
    • permanently delete an object version
    • suspend versioning on the bucket
  • you won’t need MFA for
    • enabling versioning
    • listing deleted versions
  • only the bucket owner can enable / disable MFA delete
  • MFA delete can only be enabled using the CLI

S3 default encryption vs bucket policies

  • one way to force encryption is to use a bucket policy and refuse any API call to PUT an S3 object without encryption headers
  • another way is to use the default encryption option in S3
  • note: bucket policies are evaluated before default encryption

S3 access logs

  • for audit purpose, you may want to log all access to S3 buckets
  • any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
  • that data can be analyzed using data analysis tools…
  • or Amazon Athena
  • do not set your logging bucket to be the monitored bucket
  • it will create a logging loop, and your bucket will grow in size exponentially

S3 replication (CRR or SRR)

  • must enable versioning in source and destination

  • cross region replication - CRR

  • same region replication - SRR

  • buckets can be in different accounts

  • copying is asynchronous

  • must give proper IAM permissions to S3

  • CRR use cases: compliance, lower latency access for users in another region, replication across accounts

  • SRR use cases: log aggregation, live replication between production and test accounts

  • after activating, only new objects are replicated (not retroactive), existing objects will not be replicated

  • for DELETE operations

    • can replicate delete markers from source to target (optional setting)
    • deletions with a version ID are not replicated (to avoid malicious deletes), it means if you delete an object using its version ID, this operation will not be replicated
  • there is no chaining of replication

    • if bucket 1 has replication into bucket 2, which has replication into bucket 3
    • then objects created in bucket 1 are not replicated to bucket 3

S3 pre signed URLs

  • can generate pre-signed URLs using the SDK or CLI
    • for downloads (easy, can use the CLI)
    • for uploads (harder, must use the SDK)
  • valid for a default of 3600 seconds, can change timeout with --expires-in [TIME_BY_SECONDS] argument
  • users given a pre signed URL inherit the permissions of the person who generated the URL for GET / PUT
  • examples
    • allow only logged in users to download a premium video on your S3 buckets
    • allow an ever changing list of users to download files by generating URLs dynamically
    • allow temporarily a user to upload a file to a precise location in our bucket
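
A minimal boto3 sketch for generating a pre-signed GET URL; the bucket and key are hypothetical (the CLI equivalent is `aws s3 presign s3://bucket/key --expires-in 300`).

```python
import boto3

s3 = boto3.client("s3")

# pre-signed GET URL valid for 5 minutes
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-premium-videos", "Key": "course/lesson-1.mp4"},  # hypothetical
    ExpiresIn=300,
)
print(url)
```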

Amazon Glacier and Glacier Deep Archive

  • Amazon Glacier - 3 retrieval options
    • expedited - 1 to 5 mins
    • standard - 3 to 5 hours
    • bulk - 5 to 12 hours
    • minimum storage duration of 90 days
  • Amazon Glacier Deep Archive - for long term storage - cheaper
    • standard - 12 hours
    • bulk - 48 hours
    • minimum storage duration of 180 days

S3 lifecycle rules

  • transition actions: it defines when objects are transitioned to another storage class
    • move objects to standard IA class 60 days after creation
    • move to Glacier for archiving after 6 months
  • expiration actions: configure objects to expire (delete) after some time
    • access log files can be set to delete after a year
    • can be used to delete old versions of files (if versioning is enabled)
    • can be used to delete incomplete multi part uploads
  • rules can be created for a certain prefix
  • rules can be created for certain object tags

S3 performance

  • Amazon S3 automatically scales to high request rates, latency 100-200ms
  • your application can achieve at least 3500 PUT/COPY/POST/DELETE and 5500 GET/HEAD requests per second per prefix in a bucket
  • there are no limits to the number of prefixes (prefix is a folder in S3) in a bucket

KMS limitation

  • if you use SSE-KMS, S3 performance may be impacted by the KMS limits
  • when you upload, it calls the GenerateDataKey KMS API
  • when you download it, it calls the Decrypt KMS API
  • count towards the KMS quota per second (5500, 10000, 30000 based on region)
  • you can request a quota increase using the service quotas console

multi part upload

  • recommended for files > 100MB, must be used for files > 5GB
  • can help parallelize uploads (speed up transfers)
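
A minimal boto3 sketch that forces multi-part upload above a 100 MB threshold; the file, bucket, and key are hypothetical.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# parts are uploaded in parallel once the file exceeds the threshold
config = TransferConfig(multipart_threshold=100 * 1024 * 1024, max_concurrency=10)
s3.upload_file("backup.tar.gz", "my-bucket", "backups/backup.tar.gz", Config=config)
```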

S3 transfer acceleration

  • increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region
  • compatible with multi part upload

S3 byte range fetches

  • parallelize GETs by requesting specific byte ranges
  • better resilience in case of failures
  • can be used to speed up downloads
  • can be used to retrieve only partial data (for example the head of a file)
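
A minimal boto3 sketch of a byte-range GET; the bucket, key, and range are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# fetch only the first 1 KB of the object, e.g. to inspect a file header
resp = s3.get_object(
    Bucket="my-bucket",                  # hypothetical
    Key="logs/big-file.log",
    Range="bytes=0-1023",
)
head = resp["Body"].read()
```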

S3 select and Glacier select

  • retrieve less data using SQL by performing server-side filtering
  • can filter by rows and columns (complex query not supported)
  • less network transfer, less CPU cost client side

S3 event notifications

  • use case: generate thumbnails of images uploaded to S3
  • can create as many S3 events as desired
  • S3 event notifications typically deliver events in seconds but can sometimes take a minute or longer
  • if two writes are made to a single non-versioned object at the same time, it is possible that only a single event notification will be sent
  • if you want to ensure that an event notification is sent for every successful write, you can enable versioning on your bucket
  • compared to CloudWatch event or EventBridge, S3 event notifications have lower latency and lower costs, it works better with S3

AWS Athena

  • serverless service to perform analytics directly against S3 files
  • uses SQL language to query the files
  • has a JDBC / ODBC driver
  • charged per query and amount of data scanned
  • supports CSV, JSON, ORC, Avro and Parquet (built on Presto)
  • use cases: business intelligence, analytics, reporting; analyze and query VPC flow logs, ELB logs, CloudTrail trails, etc…
  • exam tip: analyze data directly on S3 => use Athena

AWS CloudFront

  • CDN
  • improves read performance, content is cached at the edge
  • 216 points of presence globally (edge locations)
  • DDoS protection, integration with Shield and AWS WAF
  • can expose external HTTPS and can talk to internal HTTPS backends

Origins

  • S3 bucket
    • for distributing files and caching them at the edge
    • enhanced security with CloudFront OAI
    • CloudFront can be used as ingress (to upload files to S3)
  • Custom Origin (HTTP)
    • ALB
    • EC2 instance
    • S3 website (must first enable the bucket as a static S3 website)
    • any HTTP backend you want

CloudFront Geo Restriction

  • you can restrict who can access your distribution
    • whitelist: allow your users to access your content only if they are in one of the countries on a list of approved countries
    • blacklist: prevent your users from accessing your content if they are in one of the countries on a blacklist of banned countries
  • the country is determined using a third party geo IP database
  • use case: copyright laws to control access to content

CloudFront vs S3 CRR

  • CloudFront
    • global edge network
    • files are cached for a TTL
    • great for static content that must be available everywhere
  • S3 CRR
    • must be set up for each region where you want replication to happen
    • files are updated in near real time
    • read only
    • great for dynamic content that needs to be available at low latency in few regions

CloudFront caching

  • cache based on
    • headers
    • session cookies
    • query string parameters
  • the cache lives at each CloudFront edge location
  • you want to maximize the cache hit rate to minimize request on the origin
  • control the TTL, which can be set by the origin using the cache-control header, expires header…
  • you can invalidate part of the cache using the CreateInvalidation API

CloudFront signed URL / signed cookies

  • you want to distribute paid shared content to premium users over the world
  • we can use CloudFront signed URL / signed cookie, we attach a policy with
    • includes URL expiration
    • includes IP ranges to access the data from
    • trusted signers (which AWS accounts can create signed URLs)
  • how long should the URL be valid for
    • shared content (movie, music): make it short (a few minutes)
    • private content (private to the user): you can make it for years
  • signed URL: access to individual files (one signed URL per file)
  • signed cookies: access to multiple files (one signed cookie for many files)

CloudFront signed URL vs S3 pre signed URL

Signed URL

  • allow access to a path, no matter the origin
  • account wide key pair, only the root can manage it
  • can filter by IP, path, date, expiration
  • can leverage caching features

S3 pre signed URL

  • issue a request as the person who pre signed the URL
  • uses the IAM key of the signing IAM principal (has the same access as the IAM user who created the URL)
  • limited lifetime

CloudFront signed URL process

  • two types of signers
    • either a trusted key group (recommended)
      • can leverage APIs to create and rotate keys (and IAM for API security)
    • an AWS account that contains a CloudFront key pair
      • need to manage keys using the root account and the AWS console
      • not recommended because you shouldn’t use the root account for this
  • in your CloudFront distribution, create one or more trusted key groups
  • you generate your own public / private key
    • the private key is used by your applications to sign URLs
    • the public key is used by cloudfront to verify URLs
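
A minimal sketch of signing a CloudFront URL with botocore's CloudFrontSigner, assuming the matching public key is registered in a trusted key group; the key ID, key file, and URL are hypothetical (requires the `cryptography` package).

```python
import datetime
from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def rsa_signer(message):
    # sign with the private key matching the public key uploaded to CloudFront
    with open("private_key.pem", "rb") as f:              # hypothetical key file
        key = serialization.load_pem_private_key(f.read(), password=None)
    return key.sign(message, padding.PKCS1v15(), hashes.SHA1())

signer = CloudFrontSigner("K2JCJMDEHXQW5F", rsa_signer)   # hypothetical public key ID
url = signer.generate_presigned_url(
    "https://d111111abcdef8.cloudfront.net/premium/video.mp4",   # hypothetical URL
    date_less_than=datetime.datetime.utcnow() + datetime.timedelta(minutes=10),
)
print(url)
```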

Price classes

  • you can reduce the number of edge locations for cost reduction
  • three price classes
    • price class All: all regions - best performance
    • price class 200: most regions, but excludes the most expensive regions
    • price class 100: only the least expensive regions

Multiple origin

  • to route to different kind of origins based on the content type
  • based on path pattern
    • /images/*
    • /api/*
    • /*

origin groups

  • to increase high availability and do failover
  • origin group: one primary and one secondary origin
  • if the primary origin fails, the second one is used (works for both EC2 instance and S3 buckets)

field level encryption

  • protect user-sensitive information through the application stack
  • adds an additional layer of security along with HTTPS
  • sensitive information encrypted at the edge close to user
  • uses asymmetric encryption
  • usage:
    • specify set of fields in POST request that you want to be encrypted (up to 10 fields)
    • specify the public key to encrypt them
    • fields will be encrypted using the public key at edge locations and will be decrypted when the request reaches the web servers

ECS

Docker

  • Docker is a software development platform to deploy apps

  • apps are packaged in containers that can be run on any OS

  • apps run the same, regardless of where they are run

    • any machine
    • no compatibility issues
    • predictable behavior
    • less work
    • easier to maintain and deploy
    • works with any language, any OS, any technology
  • Docker images are stored in Docker repositories

  • public: Docker hub

  • private: Amazon ECR

  • Docker vs VM

    • docker is sort of a virtualization technology, but not exactly
    • resources are shared with the host => many containers on one server
  • Docker containers management

    • to manage containers, we need a container management platform
    • 3 choices
    • ECS: Amazon’s own platform
    • Fargate: Amazon’s own serverless platform
    • EKS: Amazon’s managed Kubernetes (open source)

ECS clusters overview

  • ECS clusters are a logical grouping of EC2 instances
  • EC2 instances run the ECS agent (Docker container)
  • the ECS agents registers the instance to the ECS cluster
  • the EC2 instances run a special AMI, made specifically for ECS

ECS task definitions

  • task definitions are metadata in JSON form to tell ECS how to run a Docker container
  • it contains crucial information around
    • Image name
    • port binding for container and host (80 -> 8080)
    • memory and CPU required
    • environment variables
    • networking information
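
A minimal boto3 sketch of registering a task definition with the fields listed above; the family, image, and sizes are hypothetical.

```python
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="my-web-app",                                   # hypothetical family name
    containerDefinitions=[{
        "name": "web",
        "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-web-app:latest",  # hypothetical
        "cpu": 256,
        "memory": 512,
        "essential": True,
        # hostPort 0 lets ECS pick a random host port (used by ALB dynamic port mapping)
        "portMappings": [{"containerPort": 80, "hostPort": 0}],
        "environment": [{"name": "APP_ENV", "value": "production"}],
    }],
)
```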

ECS service

  • ECS service help define how many tasks should run and how they should be run
  • they ensure that the number of tasks desired is running across our fleet of EC2 instances
  • they can be linked to ELB / NLB / ALB if needed

ECS service with load balancer

  • ALB has the dynamic port forwarding feature
  • when ECS tasks are created, they are assigned random host port numbers
  • multiple ECS tasks can run on a single EC2 instance with different port numbers
  • ALB can use the dynamic port forwarding feature to route traffic to these tasks based on their port number

ECR

  • ECR is a private Docker image repository
  • access is controlled through IAM (if you have permission errors, check the policy)

how to login to ECR using AWS CLI

  • if you have AWS CLI version 1
    • $(aws ecr get-login --no-include-email --region eu-west-1)
    • then you need to execute the output of the above command
  • if you have AWS CLI version 2
    • aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 12334556790.dkr.ecr.eu-west-1.amazonaws.com
    • you could just execute the above command which is using the pipe feature
  • Docker push and pull

Fargate

  • when launching an ECS cluster, we have to create our EC2 instances
  • if we need to scale, we need to add EC2 instances
  • so we need to manage infrastructure…
  • with Fargate, it is all serverless
  • we don’t provision EC2 instances
  • we just create task definitions, and AWS will run our containers for us

ECS IAM roles deep dive

  • EC2 instance profile
    • for EC2 instance to run ECS task, we need to install ECS agent on the EC2 instance
    • the ECS agent will do these things:
    • make API calls to ECS service
    • send container logs to CloudWatch logs
    • pull docker image from ECR
    • so ECS agent will use the EC2 instance profile role to do these things
  • ECS task role
    • when we run ECS tasks on EC2 instance, each task will have its own role
    • we use different roles for the different ECS services we run
    • the task role is defined in the task definition

ECS tasks placement

  • when a task of type EC2 is launched, ECS must determine where to place it, with the constraints of CPU, memory, and available port
  • similarly, when a service scales in, ECS needs to determine which task to terminate
  • to assist with this, you can define a task placement strategy and task placement constraints
  • NOTE: this is only for ECS with EC2, not for Fargate

ECS task placement process

  • task placement strategies are a best effort
  • when Amazon ECS places tasks, it uses the following process to select container instances
  1. identify the instances that satisfy the CPU, memory, and port requirements in the task definition
  2. identify the instances that satisfy the task placement constraints
  3. identify the instances that satisfy the placement strategies

ECS task placement strategies

  • Binpack
    • place tasks based on the least available amount of CPU or memory
    • this minimizes the number of instances in use (cost savings)
  • random
    • place the task randomly
  • spread
    • place the task evenly based on the specified value
    • example: instanceID, availability zone
  • you can also mix the placement strategies together, e.g. use Spread for the AZ and Binpack for memory

ECS task placement constraints

  • distinctInstance: place each task on a different container instance
  • memberOf: places task on instances that satisfy an expression
    • uses the Cluster Query language
    • e.g. place tasks only on t2 instances

ECS service auto scaling

  • CPU and RAM is tracked in CloudWatch at the ECS service level
  • target tracking: target a specific average CloudWatch metric
  • step scaling: scale based on CloudWatch alarms
  • scheduled scaling: based on predictable changes
  • ECS service scaling (task level) != EC2 auto scaling (instance level)
  • Fargate auto scaling is much easier to setup (because of serverless)

ECS cluster capacity provider

  • a capacity provider is used in association with a cluster to determine the infrastructure that a task runs on
    • for ECS and Fargate users, the FARGATE and FARGATE_SPOT capacity providers are added automatically
    • for Amazon ECS on EC2, you need to associate the capacity provider with an auto scaling group
  • when you run a task or a service, you define a capacity provider strategy, to prioritize in which provider to run
  • this allows the capacity provider to automatically provision infrastructure for you
  • for example, if you set the average CPU to be at most 70%, the cluster capacity provider will create a new EC2 instance for you when you launch a new task and more capacity is needed

ECS data volumes

EC2 task strategies

  • the EBS volume is already mounted onto the EC2 instances
  • this allows your Docker containers to mount the EBS volume and extend the storage capacity of your task
  • Problem: if your task moves from one EC2 instance to another, it won’t see the same EBS volume and data, because the EBS volume stays attached to the old EC2 instance
  • use cases:
    • mount a data volume between different containers on the same instance
    • extend the temporary storage of a task

EFS file systems

  • works for both EC2 tasks and Fargate tasks
  • ability to mount EFS volumes onto tasks
  • tasks launched in any AZ will be able to share the same data in the EFS volume
  • Fargate + EFS = serverless + data storage without managing servers
  • use case: persistent multi AZ shared storage for your containers

Bind Mounts sharing data between containers

  • works for both EC2 tasks (using local EC2 instance storage) and Fargate tasks (get 4GB for volume mounts)
  • useful to share an ephemeral storage between multiple containers part of the same ECS task
  • great for sidecar container pattern where the sidecar can be used to send metrics / logs to other destinations

Beanstalk

developer problems on AWS

  • managing infrastructure

  • deploying code

  • configuring all the databases, load balancers, etc…

  • scaling concerns

  • most web apps have the same architecture (ALB + ASG)

  • all the developers want is for their code to run

  • possibly, consistently across different applications and environments

Elastic Beanstalk overview

  • a developer centric view of deploying an application on AWS
  • it uses all the components we have seen before: EC2, ASG, ELB, RDS…
  • managed services
    • automatically handles capacity provisioning, load balancing, scaling, application health monitoring, instance configuration…
    • just the application code is the responsibility of the developer
  • we still have full control over the configuration
  • Beanstalk is free but you pay for the underlying instances

Components

  • application: collection of Elastic Beanstalk components (environments, versions, configurations)

  • application version: an iteration of your application code

  • environment

    • collection of AWS resources running an application version (only one application version at a time)
    • tiers: web server environment tier and worker environment tier
    • you can create multiple environments (dev, test, prod…)
  • web environment

    • use ELB with multiple EC2 instances running on different AZs and ASG to scale
  • worker environment

    • uses an SQS queue with multiple EC2 instances running on different AZs; the ASG scales based on the SQS queue length

Beanstalk deployment options

All at once

  • fastest deployment
  • application has downtime
  • great for quick iterations in development environment
  • no additional cost

Rolling

  • application is running below capacity
  • can set the bucket size (the number of instances updated at a time)
  • application is running both versions simultaneously
  • no additional cost
  • long deployment

Rolling with additional batches

  • application is running at capacity
  • can set the bucket size
  • application is running both versions simultaneously
  • small additional cost
  • additional batch is removed at the end of the deployment
  • longer deployment
  • good for production environment

Immutable

  • zero downtime
  • new code is deployed to new instances on a temporary ASG
  • high cost, double capacity
  • longest deployment
  • quick rollback in case of failures (just terminate new ASG)
  • great for production

Blue / Green

  • not a direct feature of Elastic Beanstalk
  • zero downtime and release facility
  • create a new stage environment and deploy version 2 there
  • the new environment (green) can be validated independently and rolled back if issues happen
  • route 53 can be setup using weighted policies to redirect a little bit of traffic to the stage environment
  • using Beanstalk, swap URLs when done with the environment test

Traffic splitting (Canary Testing)

  • new application version is deployed to a temporary ASG with the same capacity
  • a small percentage of traffic is sent to the temporary ASG for a configurable amount of time
  • deployment health is monitored
  • if there is a deployment failure, this triggers an automated roll back (very quick)
  • no application downtime
  • new instances are migrated from the temporary to the original ASG
  • old application version is then terminated

Deploy using CLI

  • describe dependencies (e.g. requirements.txt for Python, package.json for Node.js)
  • package code and dependency descriptions as a zip file
  • console: upload zip file (creates new app version), and then deploy
  • CLI: create new app version using CLI (uploads zip), and then deploy
  • Elastic Beanstalk will deploy the zip on each EC2 instance, resolve dependencies and start the application

Beanstalk lifecycle policy

  • Elastic Beanstalk can store at most 1000 application versions
  • if you don’t remove old versions, you won’t be able to deploy anymore
  • to phase out old application versions, use a lifecycle policy
    • based on time (old versions are removed)
    • based on space (when you have too many versions)
  • versions that are currently used won’t be deleted
  • option not to delete the source bundle in S3 to prevent data loss

Beanstalk extensions

  • a zip file containing our code must be deployed to Elastic Beanstalk
  • all the parameters set in the UI can be configured with code using files
  • requirements
    • in the .ebextensions/ directory in the root of source code
    • YAML / JSON format
    • .config extensions (example: logging.config)
    • able to modify some default settings using: option_settings
    • ability to add resources such as RDS, ElastiCache, DynamoDB, etc…
  • resources managed by .ebextensions get deleted if the environment goes away

Beanstalk vs CloudFormation

  • under the hood, Elastic Beanstalk relies on CloudFormation
  • CloudFormation is used to provision other AWS services
  • use case: you can define CloudFormation resources in your .ebextensions to provision ElastiCache, S3 bucket, or anything you want.

Elastic Beanstalk cloning

  • clone an environment with the exact same configuration
  • useful for deploying a test version of your application
  • all resources and configuration are preserved
    • load balancer type and configuration
    • RDS database type (but data is not preserved)
    • environment variables
  • after cloning an environment, you can change settings

Beanstalk migration

load balancer

  • after creating an Elastic Beanstalk environment, you cannot change the ELB type
  • to migrate to a different ELB
    1. create a new env with the same configuration except LB, create your new LB here
    2. deploy your application onto the new env
    3. perform a CNAME swap or Route 53 update so all your traffic is directed to the new env

RDS

  • RDS can be provisioned with Beanstalk, which is great for dev / test

  • this is not great for production as database lifecycle is tied to the Beanstalk environment lifecycle

  • the best for prod is to separately create an RDS database and provide our Beanstalk application with the connection string

  • but what if you have already created Beanstalk application with the RDS in production? How to migrate it to a new environment without RDS?

  1. create a snapshot of RDS DB (as a safeguard)
  2. go to the RDS console and protect the RDS database from deletion
  3. create a new environment, without RDS, point your application to the existing RDS in the old env
  4. perform a CNAME swap or Route 53 update, confirm it is working
  5. terminate the old env (RDS will not be deleted because you prevented it in the console)
  6. delete the CloudFormation stack manually (it will be in DELETE_FAILED state because it can’t delete RDS)

single Docker

  • run your application as a single Docker container
  • either provide
    • Dockerfile: Elastic Beanstalk will build and run the Docker container
    • Dockerrun.aws.json (v1): describe where the Docker image is (already built)
  • Beanstalk in single Docker container does not use ECS

Multi Docker containers

  • multi docker helps run multiple containers per EC2 instance in EB
  • this will create for you
    • ECS cluster
    • EC2 instances, configured to use the ECS cluster
    • load balancer (in HA mode)
    • task definitions and execution
  • requires a config file Dockerrun.aws.json (v2) at the root of the source code
  • Dockerrun.aws.json is used to generate the ECS task definition

Elastic Beanstalk and HTTPS

  • Beanstalk with HTTPS
    • idea: load the SSL certificate onto the load balancer
    • can be done from the console (EB console, load balancer configuration)
    • can be done from the code: .ebextensions/securelistener-alb.config (see the sketch after this list)
    • SSL certificate can be provisioned using ACM or CLI
    • must configure a security group rule to allow incoming port 443 (HTTPS port)
  • Beanstalk redirect HTTP to HTTPS
    • configure your instances to redirect HTTP to HTTPS
    • configure the application load balancer with a rule
    • make sure health checks are not redirected (so they keep giving 200 OK, otherwise they will receive 301 and 302…)
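
A sketch of such a .ebextensions/securelistener-alb.config file, assuming the aws:elbv2:listener:443 namespace and a certificate already issued in ACM (the ARN below is a placeholder):

# .ebextensions/securelistener-alb.config   (sketch - verify option names against the EB docs)
option_settings:
  aws:elbv2:listener:443:
    ListenerEnabled: 'true'
    Protocol: HTTPS
    SSLCertificateArns: arn:aws:acm:us-east-1:123456789012:certificate/example-id   # placeholder ARN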

Web server vs worker environment

  • if your application performs tasks that are long to complete, offload these tasks to a dedicated worker environment
  • decoupling your application into two tiers is common
  • example: processing a video, generating a zip file, etc…
  • you can define periodic tasks in a file cron.yaml
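
For example, a minimal cron.yaml sketch for a worker environment (the task name, URL, and schedule are hypothetical):

version: 1
cron:
  - name: "nightly-cleanup"        # hypothetical task name
    url: "/scheduled/cleanup"      # POST request sent to this path on the worker
    schedule: "0 2 * * *"          # standard cron expression: every day at 02:00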

custom platform (advanced)

  • custom platforms are very advanced; they allow you to define from scratch
    • the OS
    • additional software
    • scripts that Beanstalk runs on these platforms
  • use case: app language is incompatible with Beanstalk and doesn’t use Docker
  • to create your own platform
    • define an AMI using Platform.yaml file
    • build that platform using the Packer software (open source tool to create AMIs)
  • custom platform vs Custom image
    • custom image is to tweak an existing Beanstalk platform
    • custom platform is to create an entirely new Beanstalk platform

CICD

Introduction

  • we now know how to create resources in AWS manually
  • we know how to interact with AWS CLI
  • we have seen how to deploy code to AWS using Elastic Beanstalk
  • all these manual steps make it very likely for us to make mistakes
  • what we would like is to push our code to a repository and have it deployed to AWS
    • automatically
    • the right way
    • making sure it is tested before deploying
    • with possibility to go into different stages
    • with manual approval where needed
  • to be a proper AWS developer, we need to learn AWS CICD

Continuous integration

  • developers push the code to a code repository often (github, codecommit, bitbucket, etc…)
  • a testing / build server checks the code as soon as it is pushed (codebuild, Jenkins CI, etc…)
  • the developer gets feedback about the tests and checks that have passed / failed
  • find bugs early, fix bugs
  • deliver faster as the code is tested
  • deploy often

Continuous delivery

  • ensure that the software can be released reliably whenever needed
  • ensures deployments happen often and are quick
  • automated deployment

CodeCommit

  • version control is the ability to understand the various changes that happened to the code over time

  • all these are enabled by using a version control system such as git

  • a git repository can live on one’s machine, but it usually lives on a central online repository

  • benefits are

    • collaborate with other developers
    • make sure the code is backed up somewhere
    • make sure it is fully viewable and auditable
  • git repositories can be expensive

  • the industry includes

    • github
    • bitbucket
  • AWS CodeCommit

    • private git repositories
    • no size limit on repositories
    • fully managed, HA
    • code only in AWS, increased security and compliance
    • secure
    • integrated with Jenkins / CodeBuild / other CI tools

security

  • interactions are done using git
  • authentication in git
    • SSH keys: AWS users can configure SSH keys in their IAM console
    • HTTPS: done through the AWS CLI authentication helper or by generating HTTPS Git credentials
    • MFA can be enabled for extra security
  • Authorization in git
    • IAM policies manage user / roles rights to repositories
  • encryption
    • repositories are automatically encrypted at rest using KMS
    • encrypted in transit (can only use HTTPS and SSH - both secure)
  • cross account access
    • do not share your SSH keys
    • do not share your AWS credentials
    • use IAM role in your AWS account and use AWS STS (with AssumeRole API)

CodeCommit vs Github

  • Similarities
    • both are git repositories
    • both support code review
    • github and CodeCommit can be integrated with AWS CodeBuild
    • both support HTTPS and SSH method of authentication
  • differences
    • security
      • github: github users
      • codecommit: AWS IAM users / roles
    • hosted:
      • github: hosted by github
      • github enterprise: self hosted on your servers
      • codecommit: managed and hosted by AWS
    • UI
      • github UI is fully featured

notifications

  • you can trigger notifications in CodeCommit using AWS SNS or AWS lambda or CloudWatch event rules
  • use cases for notifications SNS / lambda
    • deletion of branches
    • trigger for pushes that happen in the master branch
    • notify external build system
    • trigger AWS lambda function to perform codebase analysis
  • use cases for CloudWatch event rules
    • trigger for pull request updates
    • commit comment event
    • CloudWatch event rules go into an SNS topic

CodePipeline

  • Continuous delivery
  • visual workflow
  • source: github / codecommit / S3
  • build: codebuild / Jenkins
  • load testing: third party tools
  • deploy: AWS code deploy / Beanstalk / CloudFormation / ECS
  • made of stages
    • each stage can have sequential actions or parallel actions
    • stages examples: build / test / deploy / load test
    • manual approval can be defined at any stage

artifacts

  • each pipeline stage can create artifacts
  • artifacts are stored in S3 and passed on to the next stage

troubleshooting

  • codepipeline state changes happen in CloudWatch events, which can in turn create SNS notifications
    • you can create events for failed pipelines
    • you can create events for cancelled stages
  • if codepipeline fails a stage, your pipeline stops and you can get information in the console
  • CloudTrail can be used to audit AWS API calls
  • if pipeline can’t perform an action, make sure the IAM service role attached does have enough permissions

CodeBuild

  • fully managed build service

  • alternative to other build tools such as Jenkins

  • continuous scaling (no servers to manage or provision - no build queue)

  • pay for usage: the time it takes to complete the builds

  • leverages Docker under the hood for reproducible builds

  • possibility to extend capabilities leveraging our own base Docker images

  • secure: integration with KMS for encryption of build artifacts, IAM for build permissions, and VPC for network security, CloudTrail for API calls logging

  • source code from github / codecommit / codepipeline / S3

  • build instructions can be defined in code (buildspec.yml file)

  • output logs to S3 and AWS cloudwatch logs

  • metrics to monitor codebuild statistics

  • use cloudwatch alarms to detect failed builds and trigger notifications

  • cloudwatch events / lambda as glue

  • SNS notifications

  • ability to reproduce codebuild locally to troubleshoot in case of errors

  • builds can be defined within CodePipeline or Codebuild itself

BuildSpec

  • buildspec.yml file must be at the root of your code
  • define environment variables
    • plaintext variables
    • secure secrets: use SSM parameter store
  • phases
    • install: install dependencies you may need for your build
    • pre build: final commands to execute before build
    • build: actual build commands
    • post build: finishing touches (zip output)
  • artifacts: what to upload to S3
  • cache: files to cache to S3 for future build speedup
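
A minimal buildspec.yml sketch following the structure above (the commands, variable names, and paths are hypothetical):

version: 0.2
env:
  variables:
    BUILD_ENV: "dev"                    # plaintext variable (hypothetical)
  parameter-store:
    DB_PASSWORD: "/myapp/db/password"   # secret pulled from SSM Parameter Store (hypothetical name)
phases:
  install:
    commands:
      - npm install                     # install dependencies needed for the build
  pre_build:
    commands:
      - npm test                        # final commands to execute before the build
  build:
    commands:
      - npm run build                   # actual build commands
  post_build:
    commands:
      - zip -r app.zip dist/            # finishing touches (zip output)
artifacts:
  files:
    - app.zip                           # what to upload to S3
cache:
  paths:
    - "node_modules/**/*"               # cached to S3 to speed up future builds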

local build

  • in case of need of deep troubleshooting beyond logs
  • you can run CodeBuild on your laptop (after installing Docker)
  • for this, leverage CodeBuild agent

CodeBuild in VPC

  • by default, your Codebuild containers are launched outside your VPC
  • therefore, by default, it cannot access resources in a VPC
  • you can specify a VPC configuration
    • VPC ID
    • subnet ID
    • security group ID
  • then your build can access resources in your VPC
  • use case: integration tests, data query, internal load balancers

CodeDeploy

  • we want to deploy our application automatically to many EC2 instances
  • these instances are not managed by Elastic Beanstalk
  • there are several ways to handle deployments using open source tools (Ansible, Terraform, Chef, Puppet, etc…)

Steps

  • Each EC2 machine (or on premises machine) must be running the CodeDeploy agent
  • the agent is continuously polling AWS codeDeploy for work to do
  • CodeDeploy sends appspec.yml file
  • application is pulled from github or S3
  • EC2 will run the deployment instructions
  • CodeDeploy agent will report success / failure of the deployment on the instance

other information

  • EC2 instances are grouped by deployment group (dev / test / prod)
  • lots of flexibility to define any kind of deployments
  • CodeDeploy can be chained into CodePipeline and use artifacts from there
  • CodeDeploy can reuse existing setup tools, works with any application, auto scaling integration
  • Note: Blue / Green only works with EC2 instances (not on premises)
  • support for AWS lambda deployments
  • CodeDeploy does not provision resources

primary components

  • application: unique name
  • compute platform: EC2 or on premises or lambda
  • deployment configuration: deployment rules for success / failures
    • EC2 or on premises: you can specify the minimum number of healthy instances for the deployment
    • lambda: specify how traffic is routed to your updated lambda function versions
  • deployment group: group of tagged instances (allows to deploy gradually)
  • deployment type: in place deployment or Blue/Green deployment
  • IAM instance profile: need to give EC2 the permissions to pull from S3 / github
  • application revision: application code + appspec.yml file
  • service role: role for CodeDeploy to perform what it needs
  • target revision: target deployment application version

Appspec

  • files section: how to source and copy files from S3 / github to the filesystem
  • hooks: set of instructions to do to deploy the new version (hooks can have timeouts)
    • ApplicationStop
    • DownloadBundle
    • BeforeInstall
    • AfterInstall
    • ApplicationStart
    • ValidateService: really important
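
A minimal appspec.yml sketch for an EC2 / on-premises deployment (the paths and script names are hypothetical):

version: 0.0
os: linux
files:
  - source: /                              # copy the whole revision...
    destination: /var/www/html             # ...to this path on the instance (hypothetical)
hooks:
  BeforeInstall:
    - location: scripts/install_deps.sh    # hypothetical script shipped with the revision
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 300
  ValidateService:
    - location: scripts/validate.sh        # verify the new version actually works
      timeout: 300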

Deployment config

  • Configs:
    • one at a time: one instance at a time; if one instance fails, the deployment stops
    • half at a time: 50%
    • all at once: quick but no healthy host, downtime, good for dev
    • custom: min healthy host = 75%
  • failures:
    • instances stay in failed state
    • new deployments will first be deployed to failed state instances
    • to rollback: re-deploy old deployment or enable automated rollback for failures
  • deployment targets
    • set of EC2 instances with tags
    • directly to an ASG
    • mix of ASG / tags so you can build deployment segments
    • customization in scripts with DEPLOYMENT_GROUP_NAME environment variables

CodeDeploy for EC2 and ASG

  • code deploy to EC2

    • define how to deploy the application using appspec.yml + deployment strategy
    • will do in place update to your fleet of EC2 instances
    • can use hooks to verify the deployment after each deployment phase
  • code deploy to ASG

    • in place updates
      • updates current existing EC2 instances
      • instances newly created by an ASG will also get automated deployments
    • Blue / Green deployment
      • a new auto scaling group is created (settings are copied)
      • choose how long to keep the old instances
      • must be using an ELB (for directing traffic to new ASG group)

CodeStar

  • CodeStar is an integrated solution that regroups: github, codecommit, codebuild, codeDeploy, CloudFormation, codepipeline, cloudwatch
  • helps quickly create CICD ready projects for EC2, lambda, Beanstalk
  • supported language: C#, Go, HTML5, Java, Node.js, PHP, Python, Ruby
  • issue tracking integration with JIRA, Github issues
  • ability to integrate with Cloud9 to obtain a web IDE
  • one dashboard to view all your components
  • free services, pay only for the underlying usage of other services
  • limited customization

CloudFormation

infrastructure as code

  • manual work will be very tough to reproduce

    • in another region
    • in another AWS account
    • within the same region if everything was deleted
  • CloudFormation would be the code to create / update / delete our infrastructure

  • CloudFormation is a declarative way of outlining your AWS infrastructure, for any resources

  • CloudFormation creates the resources for you in the right order, with the exact configuration that you specify

benefits

  • infrastructure as code

    • no resources are manually created, which is excellent for control
    • the code can be version controlled for example using git
    • changes to the infrastructure are reviewed through code
  • cost

    • each resource within the stack is tagged with an identifier so you can easily see how much a stack costs you
    • you can estimate the costs of your resources using the CloudFormation template
    • savings strategy: in dev, you could automate deletion of stacks at 5pm and recreate them at 8am safely
  • productivity

    • ability to destroy and recreate an infrastructure on the cloud on the fly
    • automated generation of diagram for your templates
    • declarative programming (no need to figure out ordering and orchestration)
  • separation of concern: create many stacks for many apps, and many layers

  • don’t reinvent the wheel

    • leverage existing templates on the web
    • leverage the documentation

how cloudformation works

  • templates have to be uploaded in S3 and then referenced in cloudformation
  • to update a template, we can’t edit previous ones, we have to reupload a new version of the template to AWS
  • stacks are identified by a name
  • deleting a stack deletes every single artifact that was created by CloudFormation

deploying cloudformation template

  • manual way
    • editing templates in the CloudFormation designer
    • using the console to input parameters
  • automated way
    • editing templates in a YAML file
    • using the AWS CLI to deploy the templates
    • recommended way when you fully want to automate your flow

building blocks

  • templates components
    • resources: your AWS resources declared in the template (mandatory)
    • parameters: the dynamic inputs for your template
    • mappings: the static variables for your templates
    • outputs: references to what has been created
    • conditionals: list of conditions to perform resource creation
    • metadata
  • template helpers
    • references
    • functions

resources

  • resources are the core of your CloudFormation template

  • they represent the different AWS components that will be created and configured

  • resources are declared and can reference each other

  • AWS figures out creation, updates and deletes of resources for us

  • can I create a dynamic number of resources?

    • no, you can't
  • is every AWS service supported?

    • almost, only a few are not

parameters

  • parameters are a way to provide inputs to your AWS CloudFormation template
  • they are important to know about if
    • you want to reuse your templates across the company
    • some inputs can not be determined ahead of time
  • parameters are extremely powerful, controlled, and can prevent errors from happening in your templates thanks to types
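
For example, a minimal parameter declaration sketch (the name and values are hypothetical):

Parameters:
  EnvType:
    Description: Environment type       # hypothetical parameter
    Type: String
    Default: dev
    AllowedValues:
      - dev
      - test
      - prod                             # the type and allowed values let CloudFormation reject bad inputs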

how to reference a parameter

  • the Fn::Ref function can be leveraged to reference parameters
  • parameters can be used anywhere in a template
  • the shorthand for this in YAML is !Ref
  • the function can also reference other elements within the template
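
Continuing with parameters, a short sketch (the parameter and resource names are hypothetical):

Resources:
  AppInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-xxxxxxxx                 # placeholder
      InstanceType: !Ref InstanceTypeParam  # hypothetical String parameter referenced with !Ref
Outputs:
  InstanceId:
    Value: !Ref AppInstance                 # !Ref on a resource returns its physical ID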

Pseudo parameters

  • AWS offers us pseudo parameters in any CloudFormation template
  • these can be used at any time and are enabled by default
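
For instance, a short sketch using a few common pseudo parameters:

Outputs:
  CurrentRegion:
    Value: !Ref "AWS::Region"        # region the stack is deployed in
  CurrentAccount:
    Value: !Ref "AWS::AccountId"     # account ID owning the stack
  CurrentStack:
    Value: !Ref "AWS::StackName"     # name of the current stack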

mappings

  • mappings are fixed variables within your CloudFormation template
  • they are very handy to differentiate between different environments (dev vs prod), regions, AMI types, etc…
  • all the values are hardcoded within the template

when would you use mappings vs parameters

  • mappings are great when you know in advance all the values that can be taken and that they can be deduced from variables such as
    • region
    • AZ
    • AWS account
    • environment
  • they allow safer control over the template
  • use parameters when the values are really user specific

accessing mapping values

  • we use Fn::FindInMap to return a named value from a specific key
  • !FindInMap [MapName, TopLevelKey, SecondLevelKey]
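
A minimal sketch combining a mapping with !FindInMap (the AMI IDs are placeholders):

Mappings:
  RegionMap:
    us-east-1:
      AMI: ami-0aaaaaaaaaaaaaaaa     # placeholder
    eu-west-1:
      AMI: ami-0bbbbbbbbbbbbbbbb     # placeholder
Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro
      ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", AMI]   # pick the AMI for the current region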

outputs

  • the outputs section declares optional outputs values that we can import into other stacks (if you export them first)
  • you can also view the outputs in the AWS console or using the AWS CLI
  • they are very useful for example if you define a network CloudFormation, and output the variables such as VPC ID, and your subnet IDs
  • it is the best way to perform cross-stack collaboration, as each stack handles and exports its own part
  • you can’t delete a CloudFormation stack if its outputs are being referenced by another CloudFormation stack

outputs example

  • create an SSH security group as part of one template
  • we create an output that references that security group
Outputs:
  StackSSHSecurityGroup:
    Description: The SSH security group for our company
    Value: !Ref MyCompanyWideSSHSecurityGroup
    Export:
      Name: SSHSecurityGroup

Cross stack reference

  • we then create a second template that leverages that security group
  • for this, we use the Fn::ImportValue function
  • you can’t delete the underlying stack until all the references are deleted too
Resources:
  MySecureInstance:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: us-east-1a
      ImageId: ami-xxxxxxxx
      InstanceType: t2.micro
      SecurityGroups:
        - !ImportValue SSHSecurityGroup

conditions

  • conditions are used to control the creation of resources or outputs based on a condition
  • conditions can be whatever you want them to be, but common ones are
    • environment
    • region
    • parameter value
  • each condition can reference another condition, parameter value or mapping

define a condition

Conditions:
  CreateProdResources: !Equals [!Ref EnvType, prod]
  • the logical ID is for you to choose, it is how you name the condition
  • the intrinsic function can be any of the following
    • Fn::And
    • Fn::Equals
    • Fn::If
    • Fn::Not
    • Fn::Or

using a condition

  • conditions can be applied to resources / outputs
Resources:
  Mountpoint:
    Type: "AWS::EC2::VolumeAttachment"
    Condition: CreateProdResources

Intrinsic functions

Fn::Ref

  • can be leveraged to reference
    • parameters
    • resources

Fn::GetAtt

  • attributes are attached to any resources you create
  • to know the attributes of your resources, the best place to look at is the documentation
  • example: AZ of an EC2 machine
Resources:
  EC2Instance:
    Type: "AWS::EC2::Instance"
    Properties:
      ImageId: ami-xxxxx
      InstanceType: t2.micro

  NewVolume:
    Type: "AWS::EC2::Volume"
    Condition: CreateProdResources
    Properties:
      Size: 100
      AvailabilityZone:
        !GetAtt EC2Instance.AvailabilityZone

Fn::FindInMap

  • we use Fn::FindInMap to return a named value from a specific key
  • !FindInMap [MapName, TopLevelKey, SecondLevelKey]

Fn::ImportValue

  • import values that are exported in other templates

Fn::Join

  • join values with a delimiter
  • !Join [delimiter, [a, b, c]]

Fn::Sub

  • used to substitute variables from a text, it is a very handy function that will allow you to fully customize your templates
  • for example: you can combine Fn::Sub with references or AWS Pseudo variables
  • the string must contain ${VariableName}, which will be substituted
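
A small sketch combining !Sub with a pseudo parameter and a resource reference (the resource name is hypothetical):

Resources:
  AppBucket:
    Type: AWS::S3::Bucket
Outputs:
  BucketInfo:
    Value: !Sub "Bucket ${AppBucket} lives in ${AWS::Region}"   # ${...} is substituted at deploy time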

Condition Functions

  • the logical ID is for you to choose, it is how you name the condition

CloudFormation rollbacks

  • stack creation fails
    • default: everything rolls back, we can look at the log
    • option to disable rollback and troubleshoot what happened
  • stack update fails:
    • the stack automatically rolls back to the previous known working state
    • ability to see in the log what happened and error messages

ChangeSets

  • when you update a stack, you need to know what changes before it happens for greater confidence
  • ChangeSets won’t say if the update will be successful

Nested Stacks

  • stacks as part of other stacks
  • they allow you to isolate repeated patterns / common components in separate stacks and call them from other stacks
  • Example:
    • load balancer configuration that is re used
    • security group that is re used
  • nested stacks are considered best practice
  • to update a nested stack, always update the parent (root stack)

Cross stack vs nested stack

  • Cross stacks

    • helpful when stacks have different lifecycles
    • use outputs export and Fn::ImportValue
    • when you need to pass export values to many stacks
  • nested stacks

    • helpful when components must be re used
    • example: re use how to properly configure an application load balancer
    • the nested stack is only important to the higher level stack

StackSets

  • create, update, or delete stacks across multiple accounts and regions with a single operation
  • administrator account to create StackSets
  • trusted accounts to create, update, delete stack instances from StackSets
  • when you update a stack set, all associated stack instances are updated throughout all accounts and regions

CloudFormation drift

  • CloudFormation allows you to create infrastructure

  • but it doesn’t protect you against manual configuration changes

  • how do we know if our resources have drifted?

  • we can use CloudFormation drift

Monitoring

  • AWS CloudWatch
    • metrics: collect and track key metrics
    • log: collect, monitor, analyze, and store log files
    • events: send notifications when certain events happen in your AWS
    • alarms: react in real-time to metrics / events
  • X-Ray
    • troubleshooting application performance and errors
    • distributed tracing of microservices
  • CloudTrail
    • internal monitoring of API calls being made
    • audit changes to AWS resources by your users

CloudWatch metrics

  • CloudWatch provides metrics for every service in AWS
  • metric is a variable to monitor
  • metrics belong to namespaces
  • dimension is an attribute of a metric
  • up to 10 dimensions per metric
  • metrics have timestamps
  • can create CloudWatch dashboards of metrics

Detailed monitoring

  • EC2 instances have metrics every 5 minutes
  • with detailed monitoring, you get data every 1 minute
  • use detailed monitoring if you want to scale faster for your ASG
  • the AWS free tier allows us to have 10 detailed monitoring metrics
  • NOTE: EC2 memory usage is by default not pushed (must be pushed from inside the instance as a custom metric)

CloudWatch Custom metrics

  • possibility to define and send your own custom metrics to CloudWatch
  • example: RAM usage, disk space, number of logged in users
  • use API call PutMetricData
  • ability to use dimensions (attribute) to segment metrics
    • instance id
    • environment name
  • metric resolution
    • standard: 60 seconds
    • high resolution: 1 / 5 / 10 / 30 seconds - higher cost
  • important: the API accepts metric data points two weeks in the past and two hours in the future (make sure to configure your EC2 instance time correctly)

CloudWatch logs

  • applications can send logs to CloudWatch using the SDK

  • CloudWatch can collect logs from

    • Elastic Beanstalk: collection of logs from application
    • ECS: collection from containers
    • AWS lambda: collection from function logs
    • VPC flow logs: VPC specific logs
    • API gateway
    • CloudTrail, based on filter
    • CloudWatch log agents: for example on EC2 machines
    • route 53: log DNS queries
  • cloudWatch logs can go to

    • batch exporter to S3 for archival
    • stream to elasticSearch cluster for further analysis
  • CloudWatch logs can use filter expressions

  • logs storage architecture

    • log groups: arbitrary name, usually representing an application
    • log stream: instances within application / log files / containers
  • can define log expiration policies (never, 30 days, etc…)

  • using the AWS CLI we can tail CloudWatch logs

  • to send logs to CloudWatch, make sure IAM permissions are correct

  • security: encryption of logs using KMS at the group level

CloudWatch logs for EC2

  • by default, no logs from EC2 machine will go to CloudWatch
  • you need to run a CloudWatch agent on EC2 to push the log files you want
  • make sure IAM permissions are correct
  • the CloudWatch log agent can be set up on premises too

CloudWatch logs agent vs Unified agent

  • CloudWatch logs agent
    • old version of the agent
    • can only send to CloudWatch logs
  • CloudWatch Unified agent
    • collect additional system level metrics such as RAM, processes, etc…
    • collect logs to send to CloudWatch logs
    • centralized configuration using SSM parameters store

CloudWatch logs metric filter

  • CloudWatch logs can use filter expressions
    • for example: find a specific IP inside of a log
    • or count occurrences of ERROR in your logs
    • metric filters can be used to trigger alarms
  • filters do not retroactively filter data (historical data is not filtered; only data ingested after the filter is created is counted)
  • filters only publish the metric data points for events that happen after the filter was created
  • can be integrated with CloudWatch alarms, SNS, etc…

CloudWatch alarms

  • alarms are used to trigger notifications for any metric
  • various options
  • alarm states
    • OK
    • INSUFFICIENT_DATA
    • ALARM
  • period
    • length of time in seconds to evaluate the metric
    • high resolution custom metrics: 10 sec, 30 sec or multiple of 60 sec

CloudWatch alarm targets

  • stop, terminate, reboot or recover EC2 instances
  • trigger auto scaling action
  • send notification to SNS

good to know

  • alarms can be created based on CloudWatch logs metrics filters
  • to test alarms and notifications, you could set the alarm state using CLI

CloudWatch Events

  • event pattern: intercept events from AWS service
    • EC2 instance state change, code build failure, S3…
    • can intercept any API call with CloudTrail integration
  • schedule or Cron
  • a JSON payload is created from the event and passed to a target

CloudWatch event bridge

  • EventBridge is the next evolution of CloudWatch events
  • default event bus: generated by AWS service
  • partner event bus: receive events from SaaS service or applications
  • custom event bus: for your own applications
  • event buses can be accessed by other AWS accounts
  • rules: how to process the events (similar to CloudWatch event rules)

Schema registry

  • eventBridge can analyze the events in your bus and infer the schema
  • the schema registry allows you to generate code for your application that will know in advance how data is structured in the event bus
  • schema can be versioned

Amazon EventBridge vs CloudWatch events

  • EventBridge builds upon and extends CloudWatch events
  • it uses the same service API and endpoint, and the same underlying service infrastructure
  • EventBridge allows extension to add event buses for your custom applications and third party SaaS apps
  • EventBridge has the schema registry capability
  • EventBridge has a different name to mark the new capabilities
  • over time, the CloudWatch events name will be replaced with EventBridge

X-Ray

  • debugging in Production, the old way
    • test locally
    • add log statements everywhere
    • re deploy in production
  • log formats differ across applications using CloudWatch, and analytics is hard

X-ray advantages

  • troubleshooting performance
  • understand dependencies in a microservices architecture
  • pinpoint service issues
  • review request behavior
  • find errors and exceptions

tracing

  • tracing is an end to end way of following a request
  • each component dealing with the request adds its own trace
  • tracing is made of segments
  • annotations can be added to traces to provide extra information
  • ability to trace
    • every request
    • a sample of requests (as a percentage for example, or a rate per minute)
  • X-Ray security
    • IAM for authorization

How to enable?

  1. Your code must import the AWS X-Ray SDK
  • very little code modification needed
  • the application SDK will then capture
    • calls to AWS service
    • HTTP / HTTPS requests
    • database calls
    • queue calls
  2. install the X-Ray daemon or enable X-Ray AWS integration
  • X-Ray daemon works as a low level UDP packet interceptor
  • lambda / other AWS services already run the X-Ray daemon for you
  • each application must have the IAM rights to write data to X-Ray

X-Ray magic

  • X-Ray service collects data from all the different services
  • service map is computed from all the segments and traces
  • X-Ray is graphical, so even non technical people can help troubleshoot

X-Ray troubleshooting

  • if X-Ray is not working on EC2
    • ensure the EC2 IAM role has the proper permissions
    • ensure the EC2 instance is running the X-Ray daemon
  • to enable on AWS lambda
    • ensure it has an IAM execution role with proper policy
    • ensure that X-Ray is imported in the code

X-Ray instrumentation and concepts

  • instrumentation means measuring your product's performance, diagnosing errors, and writing trace information
  • to instrument your application code, you use the X-Ray SDK
  • many SDKs require only configuration changes
  • you can modify your application code to customize and annotate the data that the SDK sends to X-Ray, using interceptors, filters, handlers, middleware…

X-Ray concepts

  • segments: each application / service will send them
  • subsegments: if you need more details in your segment
  • trace: segments collected together to form an end to end trace
  • sampling: decrease the amount of requests sent to X-Ray, reduce cost
  • annotations: key value pairs used to index traces and use with filters
  • metadata: key value pairs, not indexed, not used for searching
  • the X-Ray daemon / agent has a config to send traces cross account
    • make sure the IAM permissions are correct - the agent will assume the role
    • this allows to have a central account for all your application tracing

X-Ray sampling rules

  • with sampling rules, you control the amount of data that you record
  • you can modify sampling rules without changing your code
  • by default, the X-Ray SDK records the first request each second, and five percent of any additional requests
  • one request per second is the reservoir, which ensures that at least one trace is recorded each second as long as the service is serving requests
  • five percent is the rate at which additional requests beyond the reservoir size are sampled

X-Ray with Beanstalk

  • Elastic Beanstalk platforms include the X-Ray daemon
  • you can run the daemon by setting an option in the Elastic Beanstalk console or with a configuration file (in .ebextensions/xray-daemon.config, see the sketch after this list)
  • make sure to give your instance profile the correct IAM permissions so that the X-Ray daemon can function correctly
  • then make sure your application code is instrumented with the X-Ray SDK
  • note: the X-Ray daemon is not provided for multicontainer Docker
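
A sketch of such a configuration file, assuming the aws:elasticbeanstalk:xray option namespace:

# .ebextensions/xray-daemon.config
option_settings:
  aws:elasticbeanstalk:xray:
    XRayEnabled: true        # asks Elastic Beanstalk to run the X-Ray daemon on the instances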

CloudTrail

  • provides governance, compliance, and audit for your AWS account
  • CloudTrail is enabled by default
  • get a history of events / API calls made within your AWS accounts by
    • console
    • SDK
    • CLI
    • AWS services
  • can put logs from CloudTrail into CloudWatch logs or S3
  • a trail can be applied to All regions (default), or a single region
  • if a resource is deleted in AWS, investigate CloudTrail first

CloudTrail events

  • management events
    • operations that are performed on resources in your account
    • examples
      • configuring security
      • configuring rules for routing data
      • setting up logging
    • by default, trails are configured to log management events
    • can separate read events (that don’t modify resources) and write events (that may modify resources)
  • data events
    • by default, data events are not logged (because high volume operations)
    • S3 object-level activity
    • can separate read and write events
    • lambda function execution activity

CloudTrail insights

  • enable CloudTrail insights to detect unusual activity in your account
    • inaccurate resource provisioning
    • hitting service limits
    • bursts of IAM actions
    • gaps in periodic maintenance activity
  • CloudTrail insights analyzes normal management events to create a baseline
  • and then continuously analyzes write events to detect unusual patterns
    • anomalies appear in the CloudTrail console
    • event is sent to S3
    • an EventBridge event is generated

CloudTrail events retention

  • events are stored for 90 days in CloudTrail
  • to keep events beyond this period, log them to S3 and use Athena

CloudTrail vs CloudWatch vs X-Ray

  • CloudTrail
    • audit API calls made by users / services / AWS console
    • useful to detect unauthorized calls or root cause of changes
  • CloudWatch
    • metrics over time for monitoring
    • logs for storing application log
    • alarms to send notifications in case of unexpected metrics
  • X-Ray
    • automated trace analysis and central service map visualization
    • latency, errors and fault analysis
    • request tracking across distributed systems

SQS

Communications between applications

  • Synchronous
    • synchronous communication between applications can be problematic if there are sudden spikes of traffic
    • what if you need to suddenly encode 1000 videos but usually it is 10?
  • asynchronous
    • it is better to decouple your applications
    • SQS: queue model
    • SNS: pub/sub model
    • Kinesis: real time streaming model
    • these services can scale independently from our application

SQS - standard queue

  • fully managed service, used to decouple applications
  • attributes
    • unlimited throughput, unlimited number of messages in queue
    • default retention of messages: 4 days, configurable up to 14 days
    • low latency
    • limitation of 256 KB per message sent
  • can have duplicate messages (at least once delivery, occasionally)
  • can have out of order messages (best effort ordering)

Producing messages

  • produced to SQS using the SDK (SendMessage API)
  • the message is persisted in SQS until a consumer deletes it

consuming messages

  • consumers (running on EC2 instances, servers, or lambda)
  • poll SQS for messages (receive up to 10 messages at a time)
  • process the messages (example: insert the message into an RDS database)
  • delete the messages using the DeleteMessage API

multiple EC2 instances consumers

  • consumers receive and process messages in parallel
  • at least once delivery
  • best effort message ordering
  • consumers delete messages after processing them
  • we can scale consumers horizontally to improve throughput of processing (using ASG)

security

  • encryption
    • in flight encryption using HTTPS API
    • at rest encryption using KMS keys
    • client side encryption if the client wants to perform encryption / decryption itself
  • access controls: IAM policies to regulate access to SQS API
  • SQS access policies: (similar to S3 bucket policies)
    • useful for cross account access to SQS queues
    • useful for allowing other services (SNS, S3…) to write to an SQS queue

message visibility timeout

  • after a message is polled by a consumer, it becomes invisible to other consumers

  • by default, the message visibility timeout is 30 seconds

  • that means the message has 30 seconds to be processed

  • after the message visibility timeout is over, the message is visible again in SQS

  • if a message is not processed within the visibility timeout, it will be received by a consumer again, so it may be processed twice

  • a consumer could call the ChangeMessageVisibility API to get more time

  • if the visibility timeout is high and the consumer crashes, it will take longer for the message to become visible again in the queue and be consumed by others

  • if the visibility timeout is low, we may get duplicates

Dead letter queue

  • if a consumer fails to process a message within the visibility timeout, the message goes back to the queue
  • we can set a threshold of how many times a message can go back to the queue
  • after the MaximumReceives threshold is exceeded, the message goes into a dead letter queue
  • useful for debugging
  • make sure to process the messages in the DLQ before they expire
    • good to set a retention of 14 days in the DLQ

delay queue

  • delay a message (consumers don’t see it immediately) up to 15 minutes
  • default is 0 seconds (message is available right away)
  • can set a default at queue level
  • can override the default on send using the DelaySeconds parameter

long polling

  • when a consumer requests messages from the queue, it can optionally wait for messages to arrive if there are none in the queue
  • this is called long polling
  • long polling decreases the number of API calls made to SQS while increasing the efficiency and decreasing the latency of your application
  • the wait time can be between 1 and 20 seconds
  • long polling is preferable to short polling
  • long polling can be enabled at the queue level or at the API level using WaitTimeSeconds
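
As an illustration, a CloudFormation sketch of a queue with long polling enabled at the queue level (the queue name is hypothetical; the other values are the defaults discussed above):

Resources:
  MyQueue:                                  # hypothetical queue
    Type: AWS::SQS::Queue
    Properties:
      ReceiveMessageWaitTimeSeconds: 20     # long polling: consumers wait up to 20s for messages
      VisibilityTimeout: 30                 # seconds a polled message stays hidden from other consumers
      MessageRetentionPeriod: 345600        # 4 days, in seconds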

SQS extended client

  • message size limit is 256 KB, how to send large messages?
  • using the SQS extended client (Java Library)
  • the same idea can be implemented in any language: first upload the large object to S3
  • then send the metadata of that object to SQS; once the consumer receives the metadata, it fetches the real object from S3

FIFO queue

  • First in first out
  • limited throughput: 300 messages / second without batching, 3,000 messages / second with batching
  • exactly once send capability (by removing duplicates)
  • messages are processed in order by the consumer

FIFO deduplication

  • deduplication interval is 5 minutes
  • two deduplication methods
    • content based deduplication: will do a SHA-256 hash of the message body
    • explicitly provide a message deduplication ID
  • if the queue receives messages with the same hash or the same deduplication ID within the interval, the duplicate is refused

message grouping

  • if you specify the same value of MessageGroupID in an SQS FIFO queue, you can only have one consumer, and all the messages are in order
  • to get ordering at the level of a subset of messages, specify different values for MessageGroupID
    • messages that share a common message group ID will be in order within the group
    • each group ID can have a different consumer (parallel processing)
    • ordering across groups is not guaranteed

SNS

  • what if you want to send one message to many receivers?

  • the event producer only sends message to one SNS topic

  • as many event receivers as we want to listen to the SNS topic notifications

  • each subscriber to the topic will get all the messages (note: new feature to filter messages)

  • up to 10 million subscriptions per topic

  • 100k topics limit

  • subscribers can be

    • SQS
    • HTTP / HTTPS
    • lambda
    • emails
    • SMS messages
    • mobile notifications
  • SNS integrates with a lot of AWS services

    • many AWS services can send data directly to SNS for notifications
    • CloudWatch alarms
    • ASG notifications
    • S3
    • CloudFormation (upon state changes => failed to build etc…)

How to publish

  • topic publish (using the SDK)
    • create a topic
    • create a subscription
    • publish to the topic
  • direct publish (for mobile apps SDK)
    • create a platform application
    • create a platform endpoint
    • publish to the platform endpoint
    • works with Google GCM, Apple APNS, Amazon ADM…

security

  • encryption
    • in flight encryption using HTTPS API
    • at rest encryption using KMS keys
    • client side encryption if the client wants to perform encryption / decryption itself
  • access controls: IAM policies to regulate access to the SNS API
  • SNS access policies (similar to S3 bucket policies)
    • useful for cross account access to SNS topics
    • useful for allowing other services to write to an SNS topic

SNS + SQS: Fan out

  • push once in SNS, receive in all SQS queues that are subscribers
  • fully decoupled, no data loss
  • SQS allows for: data persistence, delayed processing and retries of work
  • ability to add more SQS subscribers over time
  • make sure your SQS queue access policy allows for SNS to write
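
A CloudFormation sketch of such an access policy, assuming a topic and a queue defined elsewhere in the same template (MyTopic and MyQueue are hypothetical logical IDs):

Resources:
  AllowSnsToWrite:
    Type: AWS::SQS::QueuePolicy
    Properties:
      Queues:
        - !Ref MyQueue                        # hypothetical AWS::SQS::Queue in the same template
      PolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: sns.amazonaws.com      # let the SNS service deliver messages
            Action: sqs:SendMessage
            Resource: !GetAtt MyQueue.Arn
            Condition:
              ArnEquals:
                "aws:SourceArn": !Ref MyTopic # ...but only from this topic (hypothetical AWS::SNS::Topic)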

S3 events to multiple queues

  • for the same combination of: event type and prefix, you can only have one S3 event rule
  • if you want to send the same S3 event to many SQS queues, use fanout (SNS + SQS)

SNS - FIFO

  • similar features as SQS FIFO
    • ordering by message group ID
    • deduplication using a deduplication ID or Content based deduplication
  • can only have SQS FIFO queues as subscribers
  • limited throughput (same throughput as SQS FIFO)

message filtering

  • JSON policy used to filter messages sent to SNS topic’s subscriptions
  • if a subscription doesn’t have a filter policy, it receives every message

Kinesis

Kinesis data streams

  • billing is per shard provisioned, can have as many shards as you want
  • retention between 1 to 365 days
  • ability to reprocess data (because data will not be deleted by consumer, it stays in Kinesis data streams until retention period is over)
  • once data is inserted in Kinesis, it can’t be deleted (immutability)
  • data that shares the same partition goes to the same shard (shard level ordering)
  • producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis agent
  • consumers
    • write your own: Kinesis Client Library (KCL), AWS SDK
    • managed: AWS lambda, Kinesis data firehose, Kinesis data analytics

Kinesis data streams security

  • control access / authorization using IAM policies
  • encryption in flight using HTTPS
  • encryption at rest using KMS
  • you can implement encryption / decryption of data on client side
  • VPC endpoints available for Kinesis to access within VPC (e.g. EC2 instance in private subnet access Kinesis data stream using VPC endpoint)
  • monitor API calls using CloudTrail

Kinesis consumers

Kinesis consumer types

  • shared (classic) fan-out consumer - pull
    • low number of consuming applications
    • read throughput: 2 MB / second per shard, shared across all consumers
    • max 5 GetRecords API calls / sec per shard
    • latency ~200 ms
    • minimizes cost
    • consumers poll data from Kinesis using the GetRecords API call
    • returns up to 10 MB or up to 10,000 records per call
  • enhanced fan-out consumer - push
    • multiple consuming applications for the same stream
    • read throughput: 2 MB / second per consumer per shard
    • latency ~70 ms
    • higher cost
    • Kinesis pushes data to consumers over HTTP/2 (SubscribeToShard API)
    • soft limit of 5 consumer applications per data stream

Kinesis Client library (KCL)

  • a Java library that helps read record from a Kinesis Data Stream with distributed applications sharing the read workload
  • each shard is to be read by only one KCL instance
    • e.g. 4 shards => max 4 KCL instances
  • progress is checkpointed into DynamoDB (needs IAM access from the KCL instance to DynamoDB); this means if one KCL instance goes down, DynamoDB keeps the checkpoint and knows where to resume when the KCL instance comes back up
  • track other workers and share the work amongst shards using DynamoDB
  • KCL can run on EC2, elastic Beanstalk and on premises
  • records are read in order at the shard level
  • versions
    • KCL 1.x (supports shared consumer)
    • KCL 2.x (supports shared and enhanced fanout consumer)

Kinesis operations

Shard splitting

  • used to increase the Stream capacity
  • used to divide a hot shard
  • the old shard is closed and will be deleted once the data is expired (until the retention period is over)
  • no automatic scaling (manually increase / decrease capacity)
  • can’t split into more than two shards in a single operation

merging shards

  • decrease the Stream capacity and save costs
  • can be used to group two shards with low traffic
  • old shards are closed and will be deleted once the data is expired
  • can’t merge more than two shards in a single operation

Kinesis data firehose

  • fully managed service, no administration, automatic scaling, serverless
    • target: redshift, S3, ElasticSearch
    • third party
    • custom HTTP endpoint
  • pay for data going through firehose
  • near real time
    • 60 seconds latency minimum for non full batches
    • or minimum 32 MB of data at a time
    • it is not real time because it will batch the data into 60 seconds of data or 32MB of data
  • supports many data formats, conversions, transformations, compression
  • supports custom data transformations using AWS lambda
  • can send failed or all data to a backup S3 bucket

Kinesis data streams vs Firehose

  • Kinesis data streams
    • streaming service for ingest at scale
    • write custom code (producer / consumer)
    • real time (~200 ms)
    • manage scaling (shard splitting / shard merging)
    • data storage for 1 to 365 days
    • supports replay capability
  • Kinesis data firehose
    • load streaming data into S3 / Redshift / ElasticSearch / third party / custom HTTP
    • fully managed
    • near real time (60 seconds or 32 MB)
    • automatic scaling
    • no data storage
    • doesn't support replay capability

Kinesis data analytics (SQL application)

  • perform real time analytics on Kinesis streams using SQL
  • fully managed, no server to provision
  • automatic scaling
  • real time analytics
  • pay for actual consumption rate
  • can create streams out of the real time queries
  • use cases
    • time series analytics
    • real time dashboards
    • real time metrics

SQS vs SNS vs Kinesis

SQS

  • consumer pull data
  • data is deleted after being consumed
  • can have as many workers as we want
  • no need to provision throughput
  • ordering guarantees only on FIFO queues
  • individual message delay capability

SNS

  • push data to many subscribers
  • data is not persisted (lost if not delivered)
  • pub/sub
  • no need to provision throughput
  • integrates with SQS for fanout architecture pattern
  • FIFO capability for SQS FIFO

Kinesis

  • standard: pull data, 2 MB per shard
  • enhanced fanout: push data, 2 MB per shard per consumer
  • possibility to replay data
  • meant for real time big data, analytics and ETL
  • ordering at the shard level
  • data expires after X days
  • must provision throughput

Kinesis vs SQS ordering

  • let’s assume 100 trucks, 5 kinesis shards, 1 SQS FIFO
  • Kinesis data streams
    • on average you will have 20 trucks per shard
    • trucks will have their data ordered within each shard
    • the maximum number of consumers in parallel we can have is 5
  • SQS FIFO
    • you only have one SQS FIFO queue
    • you will have 100 group IDs
    • you can have up to 100 consumers (due to the 100 group IDs)
    • you have up to 300 messages per second (or 3,000 if using batching, because one batch can contain up to 10 messages)

Lambda

what is serverless

  • serverless is a new paradigm in which the developers don’t have to manage servers anymore
  • they just deploy code
  • serverless was pioneered by AWS lambda but now also includes anything that is managed: databases, messaging, storage, etc…
  • serverless does not mean there are no servers, it means you just don’t manage / provision / see them

serverless in AWS

  • lambda
  • DynamoDB
  • Cognito
  • API Gateway
  • S3
  • SNS and SQS
  • Kinesis data firehose
  • Aurora serverless
  • Step functions
  • Fargate

Lambda synchronous invocations

  • synchronous: CLI, SDK, API Gateway, ALB
    • result is returned right away
    • error handling must happen client side

lambda integration with ALB

  • to expose a lambda function as an HTTP endpoint
  • you can use the ALB or an API gateway
  • the lambda function must be registered in a target group
  • ALB will convert the request HTTP to JSON and convert the response JSON to HTTP

ALB multi header values

  • ALB can support multi header values
  • when you enable multi value headers, HTTP headers and query string parameters that are sent with multiple values are shown as arrays within the AWS lambda event and response objects
HTTP
http://example.com/path?name=foo&name=bar

JSON
"queryStringParameters":{"name":["foo", "bar"]}

lambda@Edge

  • you have deployed a CDN using CloudFront

  • what if you wanted to run a global lambda alongside?

  • or how to implement request filtering before reaching your application?

  • for this, you can use Lambda@edge, deploy lambda functions alongside your CloudFront CDN

    • build more responsive applications
    • you don’t manage servers, lambda is deployed globally
    • customize the CDN content
    • pay only for what you use
  • you can use lambda to change CloudFront requests and responses

    • after CloudFront receives a request from a viewer
    • before CloudFront forwards the request to the origin
    • after CloudFront receives the response from the origin
    • before CloudFront forwards the response to the viewer
  • you can also generate responses to viewers without ever sending the request to the origin

lambda - asynchronous invocations

  • S3, SNS, CloudWatch events
  • the events are placed in an event queue
  • lambda attempts to retry on errors
    • 3 tries total
    • 1 minute after first, then 2 minutes wait
  • make sure the processing is idempotent (result is the same after retry)
  • if the function is retried, you will see duplicate log entries in CloudWatch logs
  • can define a DLQ - SNS or SQS - for failed processing (need correct IAM permissions for lambda to write to SQS)
  • asynchronous invocations allow you to speed up the processing if you don’t need to wait for the result

lambda event source mapping

  • Kinesis data Streams and DynamoDB Streams
  • SQS and SQS FIFO queue
  • common denominator: records need to be pulled from the source
  • your lambda function is invoked synchronously

Streams and lambda (Kinesis and DynamoDB)

  • an event source mapping creates an iterator for each shard, processes items in order
  • start with new items, from the beginning or from timestamp
  • processed items aren’t removed from the stream (other consumers can read them again)
  • if traffic is low, we can use batch window to accumulate records before processing
  • you can process multiple batches in parallel
    • up to 10 batches per shard
    • in order processing is still guaranteed for each partition key

Streams and lambda - error handling

  • by default, if your function returns an error, the entire batch is reprocessed until the function succeeds, or the items in the batch expire
  • to ensure in order processing, processing for the affected shard is paused until the error is resolved
  • you can configure the event source mapping to
    • discard old events
    • restrict the number of retries
    • split the batch on error (to work around lambda timeout issue, maybe there is not enough time to process the whole batch, so we split the batch to make it small and faster to process)
  • discarded events can go to a Destination

SQS and SQS FIFO with lambda

  • the event source mapping will poll SQS (long polling)

  • specify batch size (1 to 10 messages)

  • recommended: set the queue visibility timeout to 6x the timeout of your lambda function

  • to use a DLQ:

    • setup on the SQS queue, not lambda (DLQ for lambda is only for async invocations)
    • or use a lambda Destination for failures
  • lambda also supports in order processing for FIFO queues, scaling up to the number of active message groups

  • for standard queues, items aren’t necessarily processed in order

  • lambda scales up to process a standard queue as quickly as possible

  • when an error occurs, batches are returned to the queue as individual items and might be processed in a different grouping than the original batch

  • occasionally, the event source mapping might receive the same item from the queue twice, even if no function error occurred

  • lambda deletes items from the queue after they are processed successfully

  • you can configure the source queue to send items to a DLQ if they can’t be processed

lambda event mapper scaling

  • Kinesis data streams and DynamoDB streams
    • one lambda invocation per stream shard
    • if you use parallelization, up to 10 batches processed per shard simultaneously
  • SQS standard
    • lambda adds 60 more instances per minute to scale up
    • up to 1000 batches of messages processed simultaneously
  • SQS FIFO
    • messages with the same group ID will be processed in order
    • the lambda function scales to the number of active message groups

lambda - Destinations

  • for asynchronous invocations, we can define destinations for successful and failed event
    • SQS
    • SNS
    • lambda
    • EventBridge bus
  • note: AWS recommends you use Destinations instead of DLQ now (but both can be used at the same time)

lambda permissions - IAM roles and resource policies

  • lambda execution role
    • grants the lambda function permissions to AWS services / resources
    • when you use an event source mapping to invoke your function, lambda uses the execution role to read event data (e.g. lambda needs permission to poll messages from SQS)
  • lambda resource based policies
    • use resource based policies to give other accounts and AWS services permission to use your lambda resources
    • similar to S3 bucket policies for S3 buckets
    • an IAM principal can access lambda
      • if the IAM policy attached to the principal authorizes it (user access)
      • or if the resource based policy authorizes it (service access)
    • when an AWS service like S3 calls your lambda function, the resource based policy gives it access (see the sketch below)
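For example, a statement can be added to the function’s resource based policy so that S3 is allowed to invoke it; a minimal boto3 sketch (the bucket, function and account values are made up):

import boto3

lambda_client = boto3.client("lambda")

# Allow a (hypothetical) S3 bucket to invoke the function on events
lambda_client.add_permission(
    FunctionName="my-function",
    StatementId="AllowS3Invoke",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn="arn:aws:s3:::my-bucket",
    SourceAccount="123456789012",   # protects against the confused deputy problem
)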

lambda environment variables

  • environment variable = key / value pair in string form
  • adjust the function behavior without updating code
  • the environment variables are available to your code
  • lambda service adds its own system environment variables as well
  • helpful to store secrets (encrypted by KMS)
  • secrets can be encrypted by the lambda service key, or your own CMK
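A minimal handler sketch showing how environment variables reach your code (the TABLE_NAME variable is made up; a KMS-encrypted value would additionally require a kms:Decrypt call):

import os

def handler(event, context):
    # Plain environment variable set in the function configuration
    table_name = os.environ["TABLE_NAME"]
    # The lambda service also injects its own variables, e.g. the function name
    function_name = os.environ["AWS_LAMBDA_FUNCTION_NAME"]
    return {"table": table_name, "function": function_name}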

lambda logging and monitoring

  • CloudWatch logs
    • lambda execution logs are stored in AWS CloudWatch logs
    • make sure your AWS lambda function has an execution role with an IAM policy that authorizes writes to CloudWatch logs
  • CloudWatch metrics
    • lambda metrics are displayed in AWS CloudWatch metrics
    • invocations, durations, concurrent executions
    • error count, success rates, throttles
    • async delivery failures
    • iterator age (lagging for Kinesis and DynamoDB streams)

lambda tracing with X-Ray

  • enable in lambda configuration (active tracing)
  • runs the X-Ray daemon for you
  • use AWS X-Ray SDK in code
  • ensure lambda function has a correct IAM execution role to write to X-Ray
    • the managed policy is called: AWSXRayDaemonWriteAccess
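With active tracing enabled, instrumenting the code is usually just a matter of patching the AWS SDK; a sketch using the X-Ray SDK for Python (the S3 call is only an example of a traced downstream call):

# pip install aws-xray-sdk
import boto3
from aws_xray_sdk.core import patch_all

patch_all()  # patch boto3 / requests so downstream calls appear as subsegments

s3 = boto3.client("s3")

def handler(event, context):
    # This call shows up as a subsegment of the lambda invocation trace
    return s3.list_buckets()["Buckets"]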

lambda in VPC

lambda by default

  • by default, your lambda function is launched outside your own VPC (in an AWS owned VPC)
  • therefore it cannot access resources in your VPC

lambda in VPC

  • you must define the VPC ID, the subnets and the security groups
  • lambda will create an ENI in your subnets
  • lambda needs AWSLambdaVPCAccessExecutionRole

internet access

  • a lambda function in your VPC does not have internet access
  • deploying a lambda function in a public subnet does not give it internet access or a public IP
  • deploying a lambda function in a private subnet gives it internet access if you have a NAT gateway / NAT instance
  • you can use VPC endpoints to privately access AWS services without a NAT

lambda function performance

configuration

  • RAM
    • from 128MB up to 10,240MB (10GB), in 1MB increments
    • the more RAM you add, the more vCPU power you get
    • at 1792MB, a function has the equivalent of one full vCPU
    • after 1792MB, you get more than one vCPU, and need to use multi threading in your code to benefit from it
  • if your application is CPU-bound (computation heavy), increase RAM
  • timeout: default 3 seconds, maximum is 900 seconds

lambda execution context

  • the execution context is a temporary runtime environment that initializes any external dependencies of your lambda code
  • great for database connections, HTTP clients, SDK clients…
  • the execution context is maintained for some time in anticipation of another lambda function invocation
  • the next invocation can reuse the context and save the time spent initializing connection objects (e.g. establish the database connection outside of the function handler, see the sketch below)
  • the execution context includes the /tmp directory
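A sketch of the pattern: anything created outside the handler is initialized once per execution context and reused on warm invocations (the table name and key are placeholders):

import boto3

# Initialized once per execution context, reused across warm invocations
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")   # hypothetical table name

def handler(event, context):
    # The connection object above is NOT re-created on every invocation
    return table.get_item(Key={"user_id": event["user_id"]}).get("Item")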

lambda function /tmp space

  • if your lambda function needs to download a big file to work
  • if your lambda function needs disk space to perform operations
  • you can use the /tmp directory
  • max size is 512 MB
  • the directory content remains when the execution context is frozen, providing transient cache that can be used for multiple invocations (helpful to checkpoint your work)
  • for permanent persistence of objects, use S3

lambda concurrency

  • concurrency limit: up to 1000 concurrent executions across the entire account; if one lambda function takes up all the concurrency (and you didn’t set a reserved concurrency limit), the other lambda functions will be throttled
  • can set a reserved concurrency at the function level
  • each invocation over the concurrency limit will trigger a throttle
  • throttle behavior
    • if synchronous invocation = return throttle error 429
    • if asynchronous invocation = retry automatically and then go to DLQ
  • if you need a higher limit, open a support ticket

lambda concurrency and asynchronous invocations

  • if the function doesn’t have enough concurrency available to process all events, additional requests are throttled
  • for throttling errors and system errors, lambda returns the event to the queue and attempts to run the function again for up to 6 hours
  • the retry interval increases exponentially from 1 second after the first attempt to a maximum of 5 minutes

Cold start and provisioned concurrency

  • cold start
    • new instance => code is loaded and code outside the handler runs (init)
    • if the init is large, this process can take some time
    • first request served by new instances has higher latency than the rest
  • provisioned concurrency
    • concurrency is allocated before the function is invoked (in advance)
    • so the cold start never happens and all invocations have low latency
    • application auto scaling can manage concurrency

lambda external dependencies

  • if your lambda function depends on external libraries
    • for example AWS X-Ray SDK, database client, etc…
  • you need to install the packages alongside your code and zip it together
  • upload the zip straight to lambda if less than 50MB, else to S3 first and reference from S3
  • native libraries work: they need to be compiled on Amazon Linux
  • AWS SDK comes by default with every lambda function

lambda and CloudFormation

inline

  • inline functions are very simple
  • use the Code.ZipFile property
  • you cannot include function dependencies with inline functions

through S3

  • you must store the lambda zip in S3
  • you must refer the S3 zip location in the CloudFormation code
    • S3 bucket
    • S3 key: full path to zip
    • S3 object version: if versioned bucket
  • if you update the code in S3, but don’t update S3 bucket, S3 key or S3 object version, CloudFormation won’t update your function because it will not detect the change

lambda layers

  • externalize dependencies to re use them

lambda container images

  • deploy lambda function as container images of up to 10GB from ECR
  • pack complex dependencies, large dependencies in a container
  • base images are available
  • can create your own image as long as it implements the lambda runtime API
  • test the containers locally using the lambda runtime interface emulator
  • unified workflow to build apps

lambda versions and aliases

lambda versions

  • when we work on a lambda function, we work on $LATEST, which is an unpublished, mutable version
  • when we are ready to publish a lambda function, we create a version
  • versions are immutable
  • versions have increasing version numbers
  • versions get their own ARN
  • version = code + configuration
  • each version of the lambda function can be accessed

lambda aliases

  • aliases are pointers to lambda function versions
  • we can define dev, test and prod aliases and have them point at different lambda versions
  • aliases are mutable
  • aliases enable Blue / Green deployment by assigning weights to lambda versions (see the sketch below)
  • aliases enable stable configuration of our event triggers / destinations
  • aliases have their own ARNs
  • aliases cannot reference other aliases
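A boto3 sketch of a weighted alias (function name and weights are illustrative): the prod alias points at version 1 and sends 10% of traffic to version 2, which is the basis for Blue / Green:

import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_alias(
    FunctionName="my-function",          # hypothetical function name
    Name="prod",
    FunctionVersion="1",                 # main version
    RoutingConfig={
        "AdditionalVersionWeights": {"2": 0.10}   # 10% of traffic to version 2
    },
)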

lambda and CodeDeploy

  • CodeDeploy can help you automate traffic shift for lambda aliases
  • feature is integrated within the SAM framework
  • linear
    • grow traffic every N minutes until 100%
  • canary
    • try X percent then 100%
  • AllAtOnce
    • immediate
  • can create pre and post traffic hooks to check the health of the lambda function

lambda limits good to know - per region

  • memory allocation: 128MB - 10 GB
  • maximum execution time: 15 minutes
  • environment variables: 4KB
  • disk capacity in the function container in /tmp: 512 MB
  • concurrent executions: 1000
  • lambda function deployment size (zipped): 50MB
  • size of uncompressed deployment (code + dependencies): 250MB
  • can use the /tmp directory to load other files at startup

lambda best practices

  • perform heavy duty work outside of your function handler
    • connect to databases
    • initialize the SDK
    • pull in dependencies
  • use environment variables for
    • database connection strings, S3 buckets, etc…
    • passwords, sensitive values
  • minimize your deployment package size to its runtime necessities
    • break down the function
    • remember lambda limits
    • use Layers where necessary
  • avoid using recursive code; never have a lambda function call itself

DynamoDB

NoSQL database

  • NoSQL databases are non-relational and distributed
  • include MongoDB, DynamoDB…
  • do not support query joins (or just limited support)
  • all the data that is needed for a query is present in one row
  • don’t perform aggregations such as SUM, AVG…
  • scale horizontally
  • there is no right or wrong for NoSQL or SQL, they just require to model the data differently and think about user queries differently

Amazon DynamoDB

  • fully managed, highly available with replication across multiple AZ
  • NoSQL database
  • scales to massive workloads, distributed database
  • millions of requests per second, trillions of rows, 100s of TB of storage
  • fast and consistent in performance (low latency on retrieval)
  • integrated with IAM for security, authorization and administration
  • enables event driven programming with DynamoDB streams
  • low cost and auto scaling capabilities

basics

  • DynamoDB is made of Tables

  • each table has a Primary Key (must be decided at creation time)

  • each table can have an infinite number of items

  • each item has attributes (can be added over time - can be null)

  • maximum size of an item is 400KB

  • data types supported are:

    • scalar types: String, Number, Binary, Boolean, Null
    • Document types: List, Map
    • Set Types: String Set, Number Set, Binary Set
  • Primary keys

    • Partition Key (HASH)
      • partition key must be unique for each item
      • partition key must be diverse so that the data is distributed
    • Partition Key + Sort Key (HASH + RANGE)
      • the combination must be unique for each item
      • data is grouped by partition key
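A boto3 sketch of creating a table with a composite primary key (the table and attribute names are examples, not from the exam):

import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="UserPosts",                                     # example name
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},       # partition key
        {"AttributeName": "created_at", "KeyType": "RANGE"},   # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "S"},
    ],
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)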

Read / Write capacity modes

  • control how you manage your table’s capacity
  • provisioned mode (default)
    • you specify the number of reads/ writes per second
    • you need to plan capacity beforehand
    • pay for provisioned read / write capacity units
  • on demand mode
    • read / writes automatically scale up / down with your workloads
    • no capacity planning needed
    • pay for what you use, more expensive
  • you can switch between different modes once every 24 hours

R/W capacity modes - provisioned

  • table must have provisioned read and write capacity units
  • read capacity units (RCU)
  • write capacity units (WCU)
  • option to setup auto scaling of throughput to meet demand
  • throughput can be exceeded temporarily using burst capacity
  • if burst capacity has been consumed, you will get a ProvisionedThroughputExceededException
  • it is then advised to do an exponential backoff retry

Write Capacity units (WCU)

  • one WCU represents one write per second for an item up to 1KB in size
  • if the items are larger than 1KB, more WCUs are consumed

Strongly consistent read vs Eventually consistent read

  • Eventually consistent read (default)
    • if we read just after a write, it is possible we will get some stale data because of replication
  • Strongly consistent read
    • if we read just after a write, we will get the correct data
    • set ConsistentRead parameter to True in API calls
    • consumes twice the RCU

Read capacity units (RCU)

  • one RCU represents one Strongly Consistent Read per second, or two Eventually consistent reads per second, for an item up to 4KB
  • if the items are larger than 4KB, more RCUs are consumed
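A worked example of the capacity math above (the request rates and item sizes are arbitrary):

import math

# WCU = writes/second * ceil(item size in KB / 1 KB)
wcu = 10 * math.ceil(2 / 1)          # 10 writes/s of 2 KB items -> 20 WCU

# RCU (strongly consistent) = reads/second * ceil(item size in KB / 4 KB)
rcu_strong = 10 * math.ceil(6 / 4)   # 10 reads/s of 6 KB items -> 20 RCU

# Eventually consistent reads need half as many RCU
rcu_eventual = rcu_strong / 2        # -> 10 RCU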

Partitions Internal

  • data is stored in partitions
  • partition keys go through a hashing algorithm to know to which partition they go to
  • WCUs and RCUs are spread evenly across partitions

Throttling

  • if we exceed provisioned RCUs or WCUs, we get ProvisionedThroughputExceededException
  • reasons
    • hot keys: one partition key is being read too many times (popular item)
    • hot partitions
    • very large items, remember RCU and WCU depends on size of items
  • solutions
    • exponential backoff when exception is encountered
    • distribute partition keys as much as possible
    • if RCU issue, we can use DynamoDB Accelerator (DAX)

on demand

  • Read and writes automatically scale up and down with your workloads
  • no capacity planning needed
  • unlimited WCU and RCU, no throttle, more expensive
  • you are charged for reads and writes that you use in terms of RRU and WRU
  • read request units (RRU) - throughput for reads (same as RCU)
  • write request units (WRU) - throughput for writes (same as WCU)
  • 2.5x more expensive than provisioned capacity
  • use cases: unknown workloads, unpredictable application traffic…

writing data

  • PutItem
    • creates a new item or fully replaces an old item
    • consumes WCUs
  • UpdateItem
    • edits an existing item’s attributes or adds a new item if it doesn’t exist
    • can be used to implement Atomic Counters - a numeric attribute that is unconditionally incremented
  • conditional writes
    • accept a write / update / delete only if conditions are met, otherwise returns an error
    • helps with concurrent access to items
    • no performance impact
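A sketch of the atomic counter and conditional write described above, using the boto3 table resource (table, key and attribute names are made up):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("UserPosts")   # example table

# Atomic counter: unconditionally increment a numeric attribute
table.update_item(
    Key={"user_id": "u1", "created_at": "2021-01-01"},
    UpdateExpression="ADD view_count :inc",
    ExpressionAttributeValues={":inc": 1},
)

# Conditional write: only create the item if it doesn't exist yet
try:
    table.put_item(
        Item={"user_id": "u1", "created_at": "2021-01-02", "title": "hello"},
        ConditionExpression="attribute_not_exists(user_id)",
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("item already exists, write rejected")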

reading data

  • GetItem
    • read based on primary key
    • primary key can be HASH or HASH + RANGE
    • eventually consistent read
    • option to use strongly consistent reads (more RCU - might take longer)
    • ProjectionExpression can be specified to retrieve only certain attributes

reading data - query

  • query returns items based on
    • KeyConditionExpression
      • partition key value - required
      • sort key value - optional
    • FilterExpression
      • additional filtering after the query operation (before data returned to you)
      • use only with non key attributes
  • returns
    • the number of items specified in limit
    • or up to 1 MB of data
  • ability to do pagination on the results
  • can query table, a local secondary index, or a global secondary index
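A boto3 query sketch using the condition builders (table and attribute names are examples):

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("UserPosts")   # example table

response = table.query(
    # Partition key condition is required, sort key condition is optional
    KeyConditionExpression=Key("user_id").eq("u1")
                           & Key("created_at").begins_with("2021-"),
    FilterExpression=Attr("published").eq(True),   # non-key attribute, applied after the query
    Limit=20,
)
items = response["Items"]
# If "LastEvaluatedKey" is in the response, pass it back as ExclusiveStartKey to paginate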

reading data - scan

  • scan the entire table and then filter out data (inefficient)
  • returns up to 1 MB of data - use pagination to keep on reading
  • consumes a lot of RCU
  • limit impact using Limit or reduce the size of the result and pause
  • for faster performance, use parallel scan
    • multiple workers scan multiple data segments at the same time
    • increases the throughput and RCU consumed
    • limit the impact of parallel scans just like you would for Scans
  • can use ProjectionExpression and FilterExpression
  • filtering is applied after the scan reads the items, so it doesn’t reduce the RCU consumed

deleting data

  • DeleteItem
    • delete an individual item
    • ability to perform a conditional delete
  • DeleteTable
    • delete a whole table and all its items
    • much quicker deletion than calling DeleteItem on all items

batch operations

  • allows you to save in latency by reducing the number of API calls
  • operations are done in parallel for better efficiency
  • part of a batch can fail, in which case we need to try again for the failed items
  • BatchWriteItem
    • up to 25 PutItem and DeleteItem in one call
    • up to 16 MB of data written, up to 400KB of data per item
    • can’t update items
  • BatchGetItem
    • return items from one or more tables
    • up to 100 items, up to 16 MB of data
    • items are retrieved in parallel to minimize latency
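A sketch of batch writes with the table resource’s batch_writer helper, which wraps BatchWriteItem (buffers up to 25 items per call and retries unprocessed items); the item keys are placeholders:

import boto3

table = boto3.resource("dynamodb").Table("UserPosts")   # example table

with table.batch_writer() as batch:
    for i in range(1, 26):
        # Each put is buffered and flushed as BatchWriteItem calls
        batch.put_item(Item={"user_id": "u1", "created_at": f"2021-01-{i:02d}"})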

Local Secondary Index (LSI)

  • alternative sort key for your table (use the same partition key)
  • the sort key consists of one scalar attribute
  • up to 5 local secondary indexes per table
  • must be defined at table creation time
  • attribute projections - can contain some or all the attributes of the base table

Global secondary index (GSI)

  • alternative Primary Key (HASH or HASH + RANGE) from the base table
  • speed up queries on non key attributes
  • the index key consists of scalar attributes
  • attribute projections - some or all the attributes of the base table
  • must provision RCUs and WCUs for the index
  • can be added / modified after table creation

indexes and throttling

  • GSI
    • if the writes are throttled on the GSI, then the main table will be throttled
    • even if the WCU on the main tables are fine
    • choose your GSI partition key carefully
    • assign your WCU capacity carefully
  • LSI
    • uses the WCUs and RCUs of the main table
    • no special throttling considerations

Optimistic locking

  • DynamoDB has a feature called Conditional Writes
  • a strategy to ensure an item hasn’t changed before you update / delete it
  • each item has an attribute that acts as a version number; each update / delete request changes the item and also bumps the version number
  • if two requests are sent at the same time, only the first will succeed: the second one fails its condition check because the version number has already changed (see the sketch below)
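A sketch of optimistic locking with a conditional write (table, key and attribute names are hypothetical; assume the item was read back with version = 1):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("UserPosts")   # example table

try:
    table.update_item(
        Key={"user_id": "u1", "created_at": "2021-01-01"},
        UpdateExpression="SET #t = :t, #v = :new",
        ConditionExpression="#v = :expected",   # only if nobody updated it meanwhile
        ExpressionAttributeNames={"#t": "title", "#v": "version"},
        ExpressionAttributeValues={":t": "new title", ":new": 2, ":expected": 1},
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("someone else updated the item first; re-read and retry")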

DynamoDB DAX

  • fully managed, highly available, seamless in memory cache for DynamoDB
  • microseconds latency for cached reads and queries
  • doesn’t require application logic modification
  • solves the hot key problem (too many reads)
  • 5 minutes TTL for cache (default)
  • up to 10 nodes in the cluster
  • multi AZ
  • secure

DAX vs ElastiCache

  • DAX is for individual object cache and simple query and scan
  • ElastiCache can store aggregation result and complex intermediate results

DynamoDB Streams

  • ordered stream of item level modifications in a table

  • stream records can be

    • sent to Kinesis Data Streams
    • read by AWS lambda
    • read by Kinesis Client Library applications
  • data retention for up to 24 hours

  • use case

    • react to changes in real time
    • analytics
    • insert into derivative tables
    • insert into ElasticSearch
    • implement cross region replication
  • ability to choose the information that will be written to the stream

    • KEYS_ONLY - only the key attributes of the modified item
    • NEW_IMAGE - the entire item, as it appears after it was modified
    • OLD_IMAGE - the entire item, as it appeared before it was modified
    • NEW_AND_OLD_IMAGES - both the new and old images of the item
  • DynamoDB streams are made of shards, just like Kinesis Data Streams, so Kinesis KCL can be the consumer for DynamoDB Streams

  • you don’t need to provision shards, this is automated by AWS

  • records are not retroactively populated in a stream after enabling it

Streams and lambda

  • you need to define an Event Source Mapping to read from DynamoDB streams
  • you need to ensure the lambda function has the appropriate permissions
  • your lambda function is invoked synchronously

DynamoDB TTL

  • automatically delete items after an expiry timestamp
  • doesn’t consume any WCUs
  • the TTL attribute must be a number data type with Unix Epoch timestamp value
  • expired items deleted within 48 hours of expiration
  • expired items that haven’t been deleted yet still appear in reads / queries / scans (if you don’t want them, filter them out)
  • expired items are deleted from both LSIs and GSIs
  • a delete operation for each expired item enters the DynamoDB streams (can help recover expired items)
  • use cases: reduce stored data by keeping only current items, adhere to regulatory obligations, user sessions…
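A boto3 sketch of enabling TTL on a (hypothetical) sessions table and writing an item with an epoch expiry:

import time
import boto3

client = boto3.client("dynamodb")

# Enable TTL; the attribute must hold a Unix epoch timestamp as a Number
client.update_time_to_live(
    TableName="Sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Write a session that expires in 1 hour
client.put_item(
    TableName="Sessions",
    Item={
        "session_id": {"S": "abc123"},
        "expires_at": {"N": str(int(time.time()) + 3600)},
    },
)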

DynamoDB CLI

  • --projection-expression: one or more attributes to retrieve
  • --filter-expression: filter items before returned to you
  • general CLI pagination options
    • --page-size: the CLI still retrieves the full list of items, but uses more API calls with fewer items per call (helps avoid timeouts)
    • --max-items: max number of items to show in the CLI (returns NextToken)
    • --starting-token: specify the last NextToken to retrieve the next set of items

DynamoDB transactions

  • coordinated, all or nothing operations to multiple items across one or more tables
  • provides Atomicity, Consistency, Isolation, and Durability (ACID)
  • read modes - Eventual consistency, strong consistency, transactional
  • write modes - standard, transactional
  • consumes 2x WCUs and 2x RCUs
  • two operations
    • TransactGetItems - one or more GetItem operations
    • TransactWriteItems - one or more PutItem, UpdateItem, DeleteItem operations
  • use cases: financial transactions, managing orders, multi player games…
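A TransactWriteItems sketch with the low-level client: either both operations succeed or neither does (table names, keys and the condition are illustrative only):

import boto3

client = boto3.client("dynamodb")

client.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "Orders",
                "Item": {"order_id": {"S": "o-1"}, "amount": {"N": "100"}},
                "ConditionExpression": "attribute_not_exists(order_id)",
            }
        },
        {
            "Update": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": "a-1"}},
                "UpdateExpression": "SET #bal = #bal - :amt",
                "ExpressionAttributeNames": {"#bal": "balance"},
                "ExpressionAttributeValues": {":amt": {"N": "100"}},
            }
        },
    ]
)
# If any condition fails, the whole transaction is cancelled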

DynamoDB Session State Cache

  • it is common to use DynamoDB to store session state
  • vs ElastiCache
    • ElastiCache is in memory, but DynamoDB is serverless with auto scaling
    • both are key value pairs
  • vs EFS
    • EFS must be attached to EC2 instances as a network drive
  • vs EBS and Instance store
    • EBS and Instance store can only be used for local caching, not shared caching
  • vs S3
    • S3 is higher latency, and not meant for small objects

DynamoDB Security and other features

  • security
    • VPC endpoints available to access DynamoDB without using the internet
    • access fully controlled by IAM
    • encryption at rest using KMS and in transit using SSL/TLS
  • backup and restore feature available
    • point in time recovery (PITR) like RDS
    • no performance impact
  • global tables
    • multi region, multi active, fully replicated, high performance, need to enable DynamoDB streams first
  • DynamoDB local
    • develop and test apps locally without accessing the DynamoDB web service (without internet)
  • AWS database migration service can be used to migrate to DynamoDB

Fine-Grained access control

  • using web identity federation or cognito identity pools, each user gets AWS credentials
  • you can assign an IAM role to these users with a condition to limit their API access to DynamoDB
  • Leading Keys - limit row level access for users on the primary key
  • Attributes - limit specific attributes the user can see

API Gateway

Integrations high level

  • lambda function
    • invoke lambda function
    • easy way to expose REST API backed by lambda
  • HTTP
    • expose HTTP endpoints in the backend
    • why? add rate limiting, caching, user authentication, API keys, etc…
  • AWS service
    • expose any API through the API Gateway
    • example: Step function workflow, post a message to SQS
    • why? add authentication, deploy publicly, rate control…

endpoint types

  • Edge-Optimized (default): for global clients
    • requests are routed through the CloudFront Edge locations
    • the API Gateway still lives in only one region
  • regional
    • for clients within the same region
    • could manually combine with CloudFront (more control over the caching strategies and the distribution)
  • private
    • can only be accessed from your VPC using an interface VPC endpoint (ENI)
    • use a resource policy to define access

Deployment stages

  • making changes in the API Gateway does not mean they are effective
  • you need to make a deployment for them to be in effect
  • changes are deployed to Stages (as many as you want)
  • use the naming you like for stages (dev, test, prod)
  • each stage has its own configuration parameters
  • stages can be rolled back as a history of deployments is kept

stage variables

  • stage variables are like environment variables for API Gateway
  • use them for frequently changing configuration values
  • they can be used in
    • lambda function ARN
    • HTTP endpoint
    • parameter mapping templates
  • use cases:
    • configure HTTP endpoints your stages talk to
    • pass configuration parameters to lambda through mapping templates
  • stage variables are passed to the context object in lambda

stage variables with lambda aliases

  • we can create a stage variable to indicate the corresponding lambda alias
  • our API gateway will automatically invoke the right lambda function

canary deployment

  • possibility to enable canary deployments for any stage
  • choose the percentage of traffic the canary channel receives
  • metrics and logs are separate (for better monitoring)
  • possibility to override stage variables for the canary
  • this is Blue / Green deployment with lambda and API gateway

API Gateway - Integration types

  • MOCK
    • API Gateway returns a response without sending the request to the backend (for testing and dev purpose)
  • HTTP / AWS
    • you must configure both the integration request and integration response
    • setup data mapping using mapping templates for the request and response
  • AWS_PROXY (lambda proxy)
    • incoming request from the client is the input to lambda
    • the function is responsible for the logic of request / response
    • no mapping templates; headers, query string parameters, etc. are passed to the function as part of the event
  • HTTP_PROXY
    • no mapping template
    • the HTTP request is passed to the backend
    • the HTTP response from the backend is forwarded by API gateway

Mapping template

  • mapping templates can be used to modify request / response
  • rename and modify query string parameters
  • modify body content
  • add headers
  • uses Velocity template language
  • filter output results

Mapping template: JSON to XML with SOAP

  • SOAP API are XML based, whereas REST API are JSON based
  • in this case, API gateway should
    • extract data from the request: either path, payload or header
    • build SOAP message based on request data (mapping template)
    • call SOAP service and receive XML response
    • transform XML response to desired format and respond to the user

API Gateway Swagger / Open API spec

  • common way of defining REST APIs, using API definition as code
  • import existing Swagger / OpenAPI 3.0 spec to API Gateway
    • method
    • method request
    • integration request
    • method response
    • extensions for API Gateway and setup every single option
  • can export current API as Swagger / OpenAPI spec
  • swagger can be written in YAML or JSON

Caching API response

  • caching reduces the number of calls made to the backend
  • default TTL is 300 seconds
  • caches are defined per stage
  • possible to override cache settings per method
  • cache encryption option
  • cache capacity between 0.5GB and 237GB
  • cache is expensive, makes sense in production, may not make sense in dev and test

API Gateway cache invalidation

  • able to flush the entire cache immediately
  • clients can invalidate the cache with header: Cache-Control: max-age=0 (with proper IAM authorization)
  • if you don’t impose an InvalidateCache policy (or check the Require authorization box in the console), any client can invalidate the API cache, which is undesirable

Usage plan and API keys

  • if you want to make an API available as an offering to your customers
  • usage plan
    • who can access one or more deployed API stages and methods
    • how much and how fast they can access them
    • uses API keys to identify API clients and meter access
    • configure throttling limits and quota limits that are enforced on individual clients
  • API keys
    • alphanumeric string values to distribute to your customers
    • can use with usage plans to control access
    • throttling limits are applied to API keys
    • quota limits set the overall maximum number of requests

logging and tracing

  • CloudWatch logs
    • enable CloudWatch logging at the stage level
    • can override settings on a per API basis
    • log contains information about request / response body
  • X-Ray
    • enable tracing to get extra information about requests in API gateway
    • X-Ray API Gateway + Lambda gives you the full picture

CloudWatch metrics

  • metrics are by stage, with the possibility to enable detailed metrics
  • CacheHitCount and CacheMissCount: efficiency of the cache
  • Count: the total number of API requests in a given period
  • IntegrationLatency: the time between when API Gateway relays a request to the backend and when it receives a response from the backend
  • Latency: the time between when API gateway receives a request from a client and when it returns a response to the client; Latency includes IntegrationLatency plus other API gateway overhead
  • 4xx Error (client side) and 5xx error (server side)

throttling

  • account limit
    • API gateway throttles requests at 10000 rps across all APIs
    • soft limit that can be increased upon request
  • in case of throttling = 429 too many requests
  • can set stage limit and method limits to improve performance
  • or you can define usage plans to throttle per customer
  • just like lambda concurrency, one overloaded API, if not limited, can cause the other APIs to be throttled too

CORS

  • CORS must be enabled when you receive API calls from another domain
  • the response to the OPTIONS pre-flight request must contain the following headers
    • Access-Control-Allow-Methods
    • Access-Control-Allow-Headers
    • Access-Control-Allow-Origin
  • CORS can be enabled through the console

Authentication and Authorization

  • IAM
    • great for users already within your AWS accounts + resource policy for cross account
  • Custom Authorizer
    • great for third party tokens
    • very flexible in terms of what IAM policy is returned
  • Cognito User Pool
    • you manage your own user pool
    • no need to write any custom code
    • must implement authorization in the backend

WebSocket API

  • what is WebSocket
    • two way interactive communication between a user’s browser and a server
    • server can push information to the client
    • this enables stateful application use cases
  • WebSocket APIs are often used in real time applications such as chat applications, collaboration platforms, multiplayer games, and financial trading platforms
  • works with AWS services (lambda, DynamoDB) or HTTP endpoints

Routing

  • incoming JSON messages are routed to different backends
  • if no route matches => send to the default route
  • you define a route selection expression to select the field in the JSON to route on
  • the result is evaluated against the route keys available in your API gateway
  • the route is then connected to the backend you have setup through API gateway

Architecture

  • create a single interface for all the microservices in your company
  • use API endpoints with various resources
  • apply a simple domain name and SSL certificates
  • can apply forwarding and transformation rules at the API gateway level

SAM (serverless application model)

  • framework for developing and deploying serverless applications
  • all the configuration is YAML code
  • generate complex CloudFormation from a simple SAM YAML file
  • supports anything from CloudFormation
  • only two commands to deploy to AWS
  • SAM can use CodeDeploy to deploy lambda functions
  • SAM can help you to run lambda, API gateway, DynamoDB locally

Recipe

  • the Transform header indicates it’s a SAM template
    • Transform: 'AWS::Serverless-2016-10-31'
  • write code
    • AWS::Serverless::Function
    • AWS::Serverless::Api
    • AWS::Serverless::SimpleTable
  • package and deploy
    • aws cloudformation package / sam package
    • aws cloudformation deploy / sam deploy

SAM policy templates

  • list of templates to apply permissions to your lambda functions
  • important examples
    • S3ReadPolicy: give read only permissions to objects in S3
    • SQSPollerPolicy: allows to poll an SQS queue
    • DynamoDBCrudPolicy: CRUD = create read update delete
MyFunction:
  Type: 'AWS::Serverless::Function'
  Properties:
    CodeUri: xxxxx
    Handler: xxxxxx
    Runtime: xxxxxx
    Policies:
      - SQSPollerPolicy:
          QueueName: !GetAtt MyQueue.QueueName

SAM Summary

  • SAM is built on CloudFormation
  • SAM requires the Transform and Resources sections
  • commands to know
    • sam build: fetch dependencies and create local deployment artifacts
    • sam package: package and upload to Amazon S3, generate CloudFormation template
    • sam deploy: deploy to CloudFormation
  • SAM policy templates for easy IAM policy definition
  • SAM is integrated with CodeDeploy to deploy to lambda aliases

Serverless Application Repository (SAR)

  • managed repository for serverless applications
  • the applications are packaged using SAM
  • build and publish applications that can be re used by organizations
    • can share publicly
    • can share with specific accounts
  • this prevents duplicate work; you can go straight to publishing
  • application settings and behavior can be customized using Environment variables

Cloud Development Kit (CDK)

  • define your cloud infrastructure using a familiar language
  • contains high level components called constructs
  • the code is compiled into a CloudFormation template (YAML / JSON)
  • you can therefore deploy infrastructure and application runtime code together
    • great for lambda functions
    • great for Docker Containers in ECS / EKS

CDK vs SAM

  • SAM
    • serverless focused
    • write your template declaratively in JSON or YAML
    • great for quickly getting started with lambda
    • leverages CloudFormation
  • CDK
    • all aws services
    • write infra in a programming language
    • leverages CloudFormation

Cognito

  • we want to give our users an identity so that they can interact with our application
  • Cognito user pools
    • sign in functionality for app users
    • integrate with API gateway and ALB
  • Cognito Identity Pool (federated identity)
    • provide AWS credentials to users so they can access AWS resources directly
    • integrate with Cognito user pools as an identity provider
  • Cognito Sync
    • Synchronize data from device to Cognito
    • is deprecated and replaced by AppSync

Cognito User Pools

  • create a serverless database of users for your web and mobile apps

  • simple login: username and password combination

  • password reset

  • email and phone number verification

  • federated identities: users from Facebook, Google, SAML…

  • feature: block users if their credentials are compromised elsewhere

  • login send back a JSON web token (JWT)

  • Cognito has a hosted authentication UI that you can add to your app to handle signup and signin workflows

  • using the hosted UI, you have a foundation for integration with social logins, OIDC or SAML

  • can customize with a custom logo and custom CSS

Cognito Identity Pools

  • get identities for users so they obtain temporary AWS credentials
  • your identity pool can include
    • public providers (login with Amazon, Facebook, Google, Apple)
    • users in an Amazon Cognito user pool
    • OpenID Connect Providers and SAML identity providers
    • developer authenticated identities
    • Cognito identity pools allow for unauthenticated (guest) access
  • users can then access AWS service directly or through API gateway
    • the IAM policies applied to the credentials are defined in Cognito
    • they can be customized based on the user_id for fine grained control

IAM roles

  • default IAM roles for authenticated and guest users
  • define rules to choose the role for each user based on the user’s ID
  • you can partition your users’ access using policy variables
  • IAM credentials are obtained by Cognito identity pools through STS
  • the roles must have a trust policy of Cognito identity pools

Cognito User Pools vs Cognito Identity Pools

  • Cognito User Pool
    • database of users for your web and mobile application
    • allows to federate logins through public social identity provider, OIDC, SAML…
    • can customize the hosted UI for authentication
    • has triggers with AWS lambda during the authentication flow
  • Cognito identity pools
    • obtain AWS credentials for your users
    • users can login through public social, OIDC, SAML and Cognito User Pools
    • users can be unauthenticated
    • users are mapped to IAM roles and policies, can leverage policy variables
  • CUP + CIP = manage users / password + access AWS services

Cognito Sync

  • Deprecated - use AWS AppSync now
  • store preferences, configuration, state of app
  • cross device synchronization
  • offline capability
  • store data in datasets
  • push sync: silently notify across all devices when identity data changes
  • Cognito Stream: stream data from Cognito into Kinesis
  • Cognito Events: execute lambda functions in response to events

Step Functions

  • model your workflows as state machines (one per workflow)
    • order fulfillment, data processing
    • web applications, any workflow
  • written in JSON
  • visualization of the workflow and the execution of the workflow, as well as history
  • start workflow with SDK call, API gateway, eventbridge

task states

  • do some work in your state machine
  • invoke one service
    • can invoke a lambda function
    • run an AWS Batch job
    • run an ECS task and wait for it to complete
    • insert an item into DynamoDB
    • publish message to SNS, SQS
    • launch another step function workflow
  • run an activity
    • EC2, Amazon ECS, on premises
    • activities poll the step functions for work
    • activities send result back to step functions

states

  • choice state: test for a condition to send to a branch
  • fail or succeed state: stop execution with failure or success
  • pass state: simply pass its input to its output or inject some fixed data, without performing work
  • wait state: provide a delay for a certain amount of time or until a specified time/date
  • Map state: dynamically iterate steps
  • parallel state: begin parallel branches of execution

Error handling

  • any state can encounter runtime errors for various reasons
    • state machine definition issues
    • task failures
    • transient issues
  • use retry and catch in the state machine to handle the errors instead of inside the application code
  • the state may report its own errors

Retry

  • evaluated from top to bottom
  • ErrorEquals: match a specific kind of error
  • IntervalSeconds: initial delay before retrying
  • BackoffRate: multiplies the delay after each retry
  • MaxAttempts: defaults to 3, set to 0 to never retry
  • When max attempts are reached, the catch kicks in

Catch

  • evaluated from top to bottom
  • ErrorEquals: match a specific kind of error
  • Next: state to send to
  • ResultPath: a path that determines what input is sent to the state specified in the Next field

ResultPath

  • include the error in the input

AppSync

  • AppSync is a managed service that uses GraphQL
  • GraphQL makes it easy for applications to get exactly the data they need
  • this includes combining data from one or more sources
  • retrieve data in real time with WebSocket or MQTT on WebSocket
  • for mobile apps: local data access and data Synchronization
  • it all starts with uploading one GraphQL schema

Security

  • there are four ways you can authorize applications to interact with your AppSync GraphQL API
    • API KEY
    • IAM
    • OPENID_CONNECT
    • COGNITO USER POOLS
  • for custom domain and HTTPS, use CloudFront in front of AppSync

STS (security Token service)

  • Allows to grant limited and temporary access to AWS resources
  • AssumeRole: assume roles within your account or cross account
  • AssumeRoleWithSAML: return credentials for users logged in with SAML
  • AssumeRoleWithWebIdentity
    • return credentials for users logged in with an IdP
    • AWS recommends against using this, and using Cognito User Pools instead
  • GetSessionToken: For MFA, from a user or account root user
  • GetFederationToken: obtain temporary credentials for a federated user
  • GetCallerIdentity: return details about the IAM user or role used in the API call
  • DecodeAuthorizationMessage: decode error message when an AWS API is called

using STS to assume a role

  • define an IAM role within your account or cross account
  • define which principals can access this IAM role
  • use STS to retrieve credentials and impersonate the IAM role you have access to (AssumeRole API, see the sketch below)
  • temporary credentials can be valid between 15 minutes to 1 hour
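A boto3 sketch of assuming a role and using the temporary credentials (the role ARN and session name are hypothetical):

import boto3

sts = boto3.client("sts")

resp = sts.assume_role(
    RoleArn="arn:aws:iam::999999999999:role/cross-account-readonly",  # made-up role
    RoleSessionName="my-session",
    DurationSeconds=3600,          # 1 hour
)
creds = resp["Credentials"]

# Impersonate the role by using the temporary credentials
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_buckets()["Buckets"])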

STS with MFA

  • use GetSessionToken from STS
  • appropriate IAM policy using IAM conditions
  • aws:MultiFactorAuthPresent:true
  • GetSessionToken returns
    • access ID
    • secret key
    • session token
    • expiration date
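A short sketch of GetSessionToken with MFA (the MFA device ARN and token code are placeholders):

import boto3

sts = boto3.client("sts")

resp = sts.get_session_token(
    SerialNumber="arn:aws:iam::123456789012:mfa/my-user",   # hypothetical MFA device
    TokenCode="123456",                                     # current code from the device
    DurationSeconds=3600,
)
creds = resp["Credentials"]   # AccessKeyId, SecretAccessKey, SessionToken, Expiration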

Advanced IAM

IAM policies and S3 Bucket policies

  • IAM policies are attached to users, roles and groups
  • S3 bucket policies are attached to buckets
  • when evaluating if an IAM principal can perform an operation X on a bucket, the union of its assigned IAM policies and S3 bucket policies will be evaluated at the same time.

Dynamic policies with IAM

  • how do you assign each user access to their own folder in an S3 bucket?
  • create one dynamic policy with IAM
  • leverage the special policy variable ${aws:username}

inline vs managed policies

  • AWS managed policy
    • maintained by AWS
    • good for power users and administrators
    • updated in case of new services and new APIs
  • customer managed policy
    • best practice, re usable, can be applied to many principals
    • version controlled + rollback, central change management
  • inline
    • strict one to one relationship between policy and principal
    • policy is deleted if you delete the IAM principal

granting a user permissions to pass a role to an AWS service

  • to configure many services, you must pass an IAM role to the service
  • the service will later assume the role and perform actions
  • for this, you need the IAM permission iam:PassRole
  • it often comes with iam:GetRole to view the role being passed

can a role be passed to any service?

  • no: roles can only be passed to what their trust allows
  • a trust policy for the role that allows the service to assume the role

Directory service - overview

  • AWS managed Microsoft AD
    • create your own AD in AWS, manage users locally, supports MFA
    • establish trust connections with your on premises AD
  • AD connector
    • directory gateway to redirect to on premises AD
    • users are managed on the on premises AD only
  • Simple AD
    • AD compatible managed directory on AWS
    • cannot be joined with on premises AD

KMS

Encryption

Encryption in flight

  • data is encrypted before sending and decrypted after receiving
  • SSL certificate help with encryption
  • encryption in flight ensures no MITM can happen

server side encryption at rest

  • data is encrypted after being received by the server
  • data is decrypted before being sent
  • it is stored in an encrypted form thanks to a key
  • the encryption / decryption keys must be managed somewhere and the server must have access to it

Client side encryption

  • data is encrypted by the client and never decrypted by the server
  • data will be decrypted by a receiving client
  • the server should not be able to decrypt the data
  • could leverage Envelope Encryption

AWS KMS

  • fully integrated with IAM for authorization
  • seamlessly integrated into
    • EBS
    • S3
    • RedShift
    • RDS
    • SSM
  • but you can also use CLI / SDK
  • the value of KMS is that the CMK used to encrypt data can never be retrieved by the user, and the CMK can be rotated for extra security
  • KMS can only help in encrypting up to 4KB of data per call; if data > 4KB, we need to use Envelope Encryption
  • to give access to KMS to someone
    • make sure the key policy allows the user
    • make sure the IAM policy allows the API calls

CMK Types

  • Symmetric
    • first offering of KMS, single encryption key that is used to encrypt and decrypt
    • AWS services that are integrated with KMS use Symmetric CMKs
    • necessary for envelope encryption
    • you never get access to the key unencrypted (must call KMS APIs to use it)
  • Asymmetric
    • public and private key pair
    • used for encrypt / decrypt or sign / verify operations
    • the public key is downloadable, but you can’t access the private key unencrypted
    • use case: encryption outside of AWS by users who can’t call the KMS API

KMS key policies

  • control access to KMS keys, similar to S3 bucket policies
  • difference: you cannot control access without them
  • default KMS key policy
    • created if you don’t provide a specific key policy
    • gives complete access to the key to the root user (the entire AWS account)
    • IAM policies in the account can then be used to grant access to the KMS key
  • custom KMS key policy
    • define users, roles that can access the KMS key
    • define who can administer the key
    • helpful for cross account access of your KMS key

copying snapshots across accounts

  • create a snapshot, encrypted with your own CMK
  • attach a KMS key policy to authorize cross account access
  • share the encrypted snapshot
  • create a copy of the snapshot, encrypt it with a KMS key in your account
  • create a volume from the snapshot

Envelope encryption

  • the KMS Encrypt API call has a limit of 4KB
  • if you want to encrypt > 4KB, we need to use envelope encryption
  • the main API that will help us is the GenerateDataKey API
  • steps
    • Encryption
      1. call the GenerateDataKey API to get a plaintext data key and an encrypted data key (encrypted using your CMK)
      2. encrypt the big file using the plaintext data key on your local machine (client side)
      3. create an envelope that includes the encrypted data key and the encrypted big file
    • decryption
      1. call Decrypt API, send the encrypted data key to KMS to decrypt using your own CMK
      2. plaintext data key will be returned
      3. use the plaintext data key to decrypt your encrypted big file.
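A simplified sketch of this flow using boto3 plus the cryptography library for the local symmetric step (the CMK alias is hypothetical; in practice the AWS Encryption SDK does all of this for you):

import base64
import boto3
from cryptography.fernet import Fernet   # pip install cryptography

kms = boto3.client("kms")

# 1. Ask KMS for a data key under a (hypothetical) CMK alias
key = kms.generate_data_key(KeyId="alias/my-cmk", KeySpec="AES_256")
plaintext_key = key["Plaintext"]          # used locally, never stored
encrypted_key = key["CiphertextBlob"]     # stored next to the ciphertext ("envelope")

# 2. Encrypt the large payload client side with the plaintext data key
f = Fernet(base64.urlsafe_b64encode(plaintext_key))
ciphertext = f.encrypt(b"some payload much larger than 4 KB")

# Decryption: ask KMS to decrypt the data key, then decrypt locally
decrypted_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
plaintext = Fernet(base64.urlsafe_b64encode(decrypted_key)).decrypt(ciphertext)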

Encryption SDK

  • the Encryption SDK implements envelope encryption for us
  • the encryption SDK also exists as a CLI tool we can install
  • feature - data key caching
    • re use data keys instead of creating new data keys for each encryption
    • helps with reducing the number of API calls to KMS with a security trade off

KMS symmetric - API summary

  • encrypt: up to 4KB
  • GenerateDataKey: generates a unique symmetric data key
    • returns a plaintext copy of the data key
    • and a copy that is encrypted under the CMK that you specify
  • decrypt: decrypt up to 4KB of data (including data encryption keys)
  • GenerateRandom: returns a random byte string

Quota limits

  • when you exceed a request quota, you get a ThrottlingException
  • to respond, use exponential backoff
  • cryptographic operations share the same quota
  • this includes requests made by AWS on your behalf
  • for GenerateDataKey, consider using DEK caching from the encryption SDK
  • you can also request quotas increase through AWS support

SSE-KMS deep dive

  • SSE-KMS leverages the GenerateDataKey and Decrypt KMS API calls
  • these KMS API calls will show up in CloudTrail, helpful for logging
  • to perform SSE-KMS, you need
    • a KMS key policy that authorize the user / role (so we could use the key)
    • an IAM policy that authorizes access to KMS (so we could access the AWS KMS service)
    • otherwise you will get an access denied error
  • S3 calls to KMS for SSE-KMS count against your KMS limits
    • if throttling, try exponential backoff
    • or request an increase in KMS limits

S3 bucket policies - force SSL

  • to force SSL, create an S3 bucket policy with a DENY on the condition aws:SecureTransport=false
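A boto3 sketch of attaching such a policy (the bucket name is made up):

import json
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"   # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyNonSSL",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        # Deny any request that did not come over HTTPS
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))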

S3 bucket policy - force encryption of SSE-KMS

  1. deny incorrect encryption header: make sure it includes aws:kms
  2. deny requests with no encryption header to ensure objects are not uploaded unencrypted
  • we could also use S3 default encryption of SSE-KMS, in this case, we don’t need the second policy.

S3 bucket key for SSE-KMS encryption

  • we could enable an S3 bucket key to reduce the number of API calls made to KMS
  • the bucket key is used to encrypt S3 objects with new data keys (envelope encryption), instead of calling KMS for every object
  • you will see fewer KMS CloudTrail events

SSM Parameter Store

  • secure storage for configuration and secrets
  • optional seamless encryption using KMS
  • serverless, scalable, durable, easy sdk
  • version tracking of configurations / secrets
  • configuration management using path and IAM
  • notifications with CloudWatch events
  • integration with CloudFormation

Parameter policies

  • allow to assign a TTL to a parameter to force updating or deleting sensitive data
  • can assign multiple policies at a time

Secrets Manager

  • Newer service, meant for storing secrets
  • capability to force rotation of secrets every X days
  • automate generation of secrets on rotation using lambda function
  • integration with RDS
  • secrets are encrypted using KMS
  • mostly meant for RDS integration

SSM Parameter store vs secrets manager

  • secrets manager
    • automatic rotation of secrets with lambda
    • lambda function is provided for RDS, Redshift…
    • KMS encryption is mandatory
  • SSM parameter store
    • simple API
    • no secret rotation (can be implemented using CloudWatch events and lambda)
    • KMS encryption is optional
    • can pull a secrets manager secrets using the SSM parameter Store API

CloudWatch logs - encryption

  • you can encrypt CloudWatch logs with KMS keys
  • encryption is enabled at the log group level, by associating a CMK with a log group, either when you create the log group or after it exists
  • you cannot associate a CMK with a log group using the CloudWatch console; you have to use the CLI / API
  • you must use the CloudWatch logs API
    • associate-kms-key: if the log group already exists
    • create-log-group: if the log group doesn’t exist yet
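The SDK equivalent of associate-kms-key, as a short boto3 sketch (the log group and key ARN are placeholders):

import boto3

logs = boto3.client("logs")

# Associate a (hypothetical) CMK with an existing log group
logs.associate_kms_key(
    logGroupName="/aws/lambda/my-function",
    kmsKeyId="arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555",
)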

ACM (AWS certificate manager)

  • provision, manage, and deploy SSL / TLS certificates
  • used to provide in flight encryption for websites
  • supports both public and private TLS certificates
  • free of charge for public TLS certificates
  • automatic TLS certificate renewal
  • integration with
    • ELB
    • CloudFront
    • APIs on API Gateway