AWS Monitoring, Audit, and Performance
AWS CloudWatch Metrics
- CloudWatch provides metrics for every service in AWS
 - Metric is a variable to monitor (CPUUtilization, NetworkIn…)
 - Metrics belong to namespaces
 - Dimension is an attribute of a metric (instance id, environment, etc…)
 - Up to 30 dimensions per metric
 - Metrics have timestamps
 - Can create CloudWatch dashboards of metrics
EC2 Detailed monitoring
 - By default, EC2 instance metrics are reported “every 5 minutes”
 - With detailed monitoring (for a cost), you get data “every 1 minute”
 - Use detailed monitoring if you want to scale faster for your ASG!
 - The AWS Free Tier allows us to have 10 detailed monitoring metrics
 - Note: EC2 Memory usage is by default not pushed (must be pushed from inside the instance as a custom metric)
CloudWatch Custom Metrics
 - Possibility to define and send your own custom metrics to CloudWatch
 - Example: memory (RAM) usage, disk space, number of logged in users …
 - Use API call PutMetricData
 - Ability to use dimensions (attributes) to segment metrics
- Instance.id
 - Environment.name
 
 - Metric resolution (StorageResolution API parameter - two possible values):
- Standard: 1 minute (60 seconds)
 - High Resolution: 1/5/10/30 second(s) - Higher cost
 
 - Important: Accepts metric data points two weeks in the past and two hours in the future (make sure to configure your EC2 instance time correctly)
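A minimal sketch of what a custom metric data point could look like, assuming it is later passed to PutMetricData. The namespace, metric name, and helper names are hypothetical; the timestamp check simply mirrors the two-weeks / two-hours window from the notes above.

```python
from datetime import datetime, timedelta, timezone

# Window stated in the notes: two weeks in the past, two hours in the future.
MAX_PAST = timedelta(weeks=2)
MAX_FUTURE = timedelta(hours=2)


def timestamp_is_accepted(ts, now):
    """Return True if `ts` falls inside the accepted PutMetricData window."""
    return now - MAX_PAST <= ts <= now + MAX_FUTURE


def build_memory_metric(instance_id, environment, used_percent, ts,
                        high_resolution=False):
    """Build keyword arguments for a PutMetricData call (hypothetical names)."""
    return {
        "Namespace": "MyApp/Custom",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "MemoryUsedPercent",  # hypothetical metric name
                "Dimensions": [
                    {"Name": "InstanceId", "Value": instance_id},
                    {"Name": "Environment", "Value": environment},
                ],
                "Timestamp": ts,
                "Value": used_percent,
                "Unit": "Percent",
                # 1 = high resolution, 60 = standard resolution
                "StorageResolution": 1 if high_resolution else 60,
            }
        ],
    }


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
params = build_memory_metric("i-0123456789abcdef0", "dev", 72.5, now)
```

With boto3 installed and credentials configured, these arguments could be sent with `boto3.client("cloudwatch").put_metric_data(**params)`.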
CloudWatch Dashboards
 - Great way to set up custom dashboards for quick access to key metrics and alarms
 - Dashboards are global
 - Dashboards can include graphs from different AWS accounts and regions
 - You can change the time zone & time range of dashboards
 - You can set up automatic refresh (10s, 1m, 2m, 5m, 15m)
 - Dashboards can be shared with people who don’t have an AWS account (public, email address, 3rd party SSO provider through Amazon Cognito)
 - Pricing
CloudWatch Logs
 - Log groups: arbitrary name, usually representing an application
 - Log stream: instances within application / log files / containers
 - Can define log expiration policies (never expire, 30 days, etc…)
 - CloudWatch Logs can send logs to:
CloudWatch Logs - Sources
 - SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
 - Elastic Beanstalk: collection of logs from application
 - ECS: collection from containers
 - AWS Lambda: collection from function logs
 - VPC Flow Logs: VPC specific logs
 - API Gateway
 - CloudTrail based on filter
 - Route53: Log DNS queries
CloudWatch Logs Metric Filter & Insights
 - CloudWatch Logs can use filter expressions
- For example, find a specific IP inside of a log
 - Or count occurrences of “ERROR” in your logs
 
 - Metric filters can be used to trigger CloudWatch alarms
 - CloudWatch Logs Insights can be used to query logs and add queries to CloudWatch Dashboards
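To make the “count occurrences of ERROR” idea concrete, here is a small sketch. `count_matches` only imitates locally what a metric filter does with a simple term pattern; it is not how CloudWatch implements it. `build_metric_filter` assembles arguments for a PutMetricFilter call, with hypothetical filter, metric, and namespace names.

```python
def count_matches(log_lines, term="ERROR"):
    """Count log lines containing `term`, similar in spirit to a metric filter."""
    return sum(1 for line in log_lines if term in line)


def build_metric_filter(log_group, term="ERROR"):
    """Keyword arguments for a PutMetricFilter call (names are hypothetical)."""
    return {
        "logGroupName": log_group,
        "filterName": f"{term.lower()}-count",
        "filterPattern": term,  # simple term pattern
        "metricTransformations": [
            {
                "metricName": f"{term.title()}Count",
                "metricNamespace": "MyApp/Logs",
                "metricValue": "1",  # each matching event adds 1 to the metric
            }
        ],
    }


sample_lines = [
    "2024-01-15 12:00:01 INFO request served",
    "2024-01-15 12:00:02 ERROR upstream timeout",
    "2024-01-15 12:00:03 ERROR database unavailable",
]
error_count = count_matches(sample_lines)
filter_params = build_metric_filter("/my-app/prod")
```

The resulting metric (here `ErrorCount`) is what a CloudWatch alarm would then watch.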
CloudWatch Logs – S3 Export
 - Log data can take up to 12 hours to become available for export
 - The API call is CreateExportTask
 - Not near-real time or real-time… use Logs Subscriptions instead
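A sketch of the arguments a CreateExportTask call might take, assuming a one-day export window. The task name, bucket, and prefix are hypothetical; the time bounds are expressed in milliseconds since the epoch.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to milliseconds since the epoch."""
    return int(dt.timestamp() * 1000)


def build_export_task(log_group, bucket, start, end):
    """Keyword arguments for a CreateExportTask call (hypothetical names)."""
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "taskName": f"export-{start:%Y-%m-%d}",
        "logGroupName": log_group,
        "fromTime": to_epoch_millis(start),
        "to": to_epoch_millis(end),
        "destination": bucket,  # name of the target S3 bucket
        "destinationPrefix": "exported-logs",
    }


start = datetime(2024, 1, 14, tzinfo=timezone.utc)
export_params = build_export_task(
    "/my-app/prod", "my-log-archive-bucket", start, start + timedelta(days=1)
)
```

With boto3, this could be submitted through `boto3.client("logs").create_export_task(**export_params)`. Since recent log data may not be exportable yet, a batch job like this fits archival rather than live processing.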
CloudWatch Logs Subscriptions
CloudWatch Logs Aggregation Multi-Account & Multi Region
CloudWatch Logs for EC2
 - By default, no logs from your EC2 machine will go to CloudWatch
 - You need to run a CloudWatch agent on EC2 to push the log files you want
 - Make sure IAM permissions are correct
 - The CloudWatch log agent can be set up on premises too
CloudWatch Logs Agent & Unified Agent
 - For virtual servers (EC2 instances, on-premises servers…)
 - CloudWatch Logs Agent
- Old version of the agent
 - Can only send to CloudWatch Logs
 
 - CloudWatch Unified Agent
 - Collected directly on your Linux server / EC2 instance
 - CPU (active, guest, idle, system, user, steal)
 - Disk metrics (free, used, total), Disk IO (writes, reads, bytes, iops)
 - RAM (free, inactive, used, total, cached)
 - Netstat (number of TCP and UDP connections, net packets, bytes)
 - Processes (total, dead, blocked, idle, running, sleep)
 - Swap Space (free, used, used %)
 - Reminder: out-of-the-box metrics for EC2 – disk, CPU, network (high level)
CloudWatch Alarms
 - Alarms are used to trigger notifications for any metric
 - Various options (sampling, %, max, min, etc…)
 - Alarm States:
- OK
 - INSUFFICIENT_DATA
 - ALARM
 
 - Period:
 - Stop, Terminate, Reboot, or Recover an EC2 Instance
 - Trigger Auto Scaling Action
 - Send notification to SNS (from which you can do pretty much anything) 
EC2 Instance Recovery
 - Status Check:
- Instance status = check the EC2 VM
 - System status = check the underlying hardware
 
 - Recovery: Same Private, Public, Elastic IP, metadata, placement group 
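As a sketch of how the recovery action ties to a status check, the function below builds arguments for a PutMetricAlarm call that watches the system status check and uses the EC2 recover action. The alarm name, period, and evaluation count are illustrative assumptions, not recommended values.

```python
def build_recovery_alarm(instance_id, region):
    """Keyword arguments for a PutMetricAlarm call that recovers an instance."""
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",  # underlying hardware check
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds per evaluation period (assumed)
        "EvaluationPeriods": 2,  # consecutive failing periods (assumed)
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # EC2 action ARN; "stop", "terminate" and "reboot" follow the same shape
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


alarm_params = build_recovery_alarm("i-0123456789abcdef0", "us-east-1")
```

With boto3, these arguments could be passed to `boto3.client("cloudwatch").put_metric_alarm(**alarm_params)`.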
CloudWatch Events
 - Event pattern: Intercept events from AWS services (Sources)
- Example sources: EC2 instance start, CodeBuild Failure, S3, Trusted Advisor
 - Can intercept any API call with CloudTrail integration
 
 - Schedule or Cron (example: create an event every 4 hours)
 - A JSON payload is created from the event and passed to a target…
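A rough illustration of an event pattern and a schedule expression. The `matches` function is a deliberately simplified stand-in for rule matching (exact values only, nested under `detail`); the real service supports more operators, so treat this as a sketch rather than its actual algorithm.

```python
# Hypothetical event pattern: EC2 instances entering the "running" state.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}

# Schedule alternative from the notes: fire every 4 hours.
SCHEDULE_EXPRESSION = "rate(4 hours)"


def matches(pattern, event):
    """Simplified matcher: every pattern field must list the event's value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}
is_match = matches(EVENT_PATTERN, sample_event)
```

Fields that the pattern does not mention (such as `instance-id`) are ignored, which is why a pattern stays small even when the JSON payload is large.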
Amazon EventBridge
 - EventBridge is the next evolution of CloudWatch Events
 - Default Event Bus - generated by AWS services (CloudWatch Events)
 - Partner Event Bus: receive events from SaaS service or applications (Zendesk, DataDog, Segment, Auth0…)
 - Custom Event Buses: for your own applications
 - Event Buses can be accessed by other AWS accounts
 - You can archive events (all/filter) sent to an event bus (indefinitely or set period)
 - Ability to replay archived events
 - Rules: how to process the events (like CloudWatch Events)
Amazon EventBridge - Schema Registry
 - EventBridge can analyze the events in your bus and infer the schema
 - The Schema Registry allows you to generate code for your application, that will know in advance how data is structured in the event bus
 - Schema can be versioned
Amazon EventBridge – Resource-based Policy
 - Manage permissions for a specific Event Bus
 - Example: allow/deny events from another AWS account or AWS region
 - Use case: aggregate all events from your AWS Organization in a single AWS account or AWS region 
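A sketch of what a resource-based policy on an event bus could look like when one account is allowed to send events to a central bus. All account IDs, the region, and the bus name are hypothetical placeholders.

```python
import json


def build_event_bus_policy(bus_arn, sender_account_id):
    """Resource-based policy letting another account put events on the bus."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAccountToPutEvents",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{sender_account_id}:root"},
                "Action": "events:PutEvents",
                "Resource": bus_arn,
            }
        ],
    }


# Hypothetical central bus and sender account.
CENTRAL_BUS_ARN = "arn:aws:events:us-east-1:123456789012:event-bus/central-event-bus"
bus_policy = build_event_bus_policy(CENTRAL_BUS_ARN, "111122223333")
bus_policy_json = json.dumps(bus_policy, indent=2)
```

Repeating the statement per sender account is one way to aggregate events from several accounts into a single bus.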
Amazon EventBridge vs CloudWatch Events
 - Amazon EventBridge builds upon and extends CloudWatch Events.
 - It uses the same service API and endpoint, and the same underlying service infrastructure
 - EventBridge allows extension to add event buses for your custom applications and your third-party SaaS apps.
 - EventBridge has the Schema Registry capability
 - EventBridge has a different name to mark the new capabilities
 - Over time, the CloudWatch Events name will be replaced with EventBridge
AWS CloudTrail
 - Provides governance, compliance and audit for your AWS Account
 - CloudTrail is enabled by default!
 - Get a history of events / API calls made within your AWS Account by:
- Console
 - SDK
 - CLI
 - AWS Services
 
 - Can put logs from CloudTrail into CloudWatch Logs or S3
 - A trail can be applied to All Regions (default) or a single Region
 - If a resource is deleted in AWS, investigate CloudTrail first!
CloudTrail Diagram
CloudTrail Events
 - Management Events:
- Operations that are performed on resources in your AWS account
 - Examples:
- Configuring security (IAM AttachRolePolicy)
 - Configuring rules for routing data (Amazon EC2 CreateSubnet)
 - Setting up logging (AWS CloudTrail CreateTrail)
 
 - By default, trails are configured to log management events.
 - Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
 
 - Data Events:
- By default, data events are not logged (because they are high-volume operations)
 - Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
 - AWS Lambda function execution activity (the Invoke API)
 
 - CloudTrail Insights Events
 - Enable CloudTrail Insights to detect unusual activity in your account:
- inaccurate resource provisioning
 - hitting service limits
 - Bursts of AWS IAM actions
 - Gaps in periodic maintenance activity
 
 - CloudTrail Insights analyzes normal management events to create a baseline
 - And then continuously analyzes write events to detect unusual patterns
 - Events are stored for 90 days in CloudTrail
 - To keep events beyond this period, log them to S3 and use Athena 
AWS Config
 - Helps with auditing and recording compliance of your AWS resources
 - Helps record configurations and changes over time
 - Questions that can be solved by AWS Config:
- Is there unrestricted SSH access to my security groups?
 - Do my buckets have any public access?
 - How has my ALB configuration changed over time?
 
 - You can receive alerts (SNS notifications) for any changes
 - AWS Config is a per-region service
 - Can be aggregated across regions and accounts
 - Possibility of storing the configuration data into S3 (analyzed by Athena)
Config Rules
 - Can use AWS managed config rules (over 75)
 - Can make custom config rules (must be defined in AWS Lambda)
- Ex: evaluate if each EBS disk is of type gp2
 - Ex: evaluate if each EC2 instance is t2.micro
 
 - Rules can be evaluated / triggered:
- For each config change
 - And / or: at regular time intervals
 
 - AWS Config Rules does not prevent actions from happening (no deny)
 - Pricing: no free tier, $0.003 per configuration item recorded per region, $0.001 per config rule evaluation per region
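The gp2 example above could be sketched as the core check of a custom rule. The shape of the configuration item used here (`resourceType`, `configuration.volumeType`) is an assumption for illustration; a real Lambda-backed rule would parse the item from the invoking event and report the result back to AWS Config.

```python
def evaluate_volume(configuration_item, required_type="gp2"):
    """Return a compliance type for one EBS volume configuration item."""
    if configuration_item.get("resourceType") != "AWS::EC2::Volume":
        # The rule only applies to EBS volumes.
        return "NOT_APPLICABLE"
    volume_type = configuration_item.get("configuration", {}).get("volumeType")
    return "COMPLIANT" if volume_type == required_type else "NON_COMPLIANT"


# Hypothetical configuration items.
gp2_volume = {
    "resourceType": "AWS::EC2::Volume",
    "resourceId": "vol-0123456789abcdef0",
    "configuration": {"volumeType": "gp2"},
}
io1_volume = {
    "resourceType": "AWS::EC2::Volume",
    "resourceId": "vol-0fedcba9876543210",
    "configuration": {"volumeType": "io1"},
}
results = [evaluate_volume(gp2_volume), evaluate_volume(io1_volume)]
```

Note that the function only reports compliance; in line with the notes, nothing here blocks the non-compliant volume from being created.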
AWS Config Resource
 - View compliance of a resource over time
 - View configuration of a resource over time
 - View CloudTrail API calls of a resource over time 
Config Rules – Remediations
 - Automate remediation of non-compliant resources using SSM Automation Documents
 - Use AWS-Managed Automation Documents or create custom Automation Documents
- Tip: you can create custom Automation Documents that invoke a Lambda function
 
 - You can set Remediation Retries if the resource is still non-compliant after auto-remediation
Config Rules – Notifications
 - Use EventBridge to trigger notifications when AWS resources are non-compliant 
 - Ability to send configuration changes and compliance state notifications to SNS (all events – use SNS Filtering or filter at client-side) 
CloudWatch vs CloudTrail vs Config
 - CloudWatch
- Performance monitoring (metrics, CPU, network, etc…) & dashboards
 - Events & Alerting
 - Log Aggregation & Analysis
 
 - CloudTrail
- Record API calls made within your Account by everyone
 - Can define trails for specific resources
 - Global Service
 
 - Config
For an Elastic Load Balancer
 - CloudWatch:
- Monitoring Incoming connections metric
 - Visualize error codes as % over time
 - Make a dashboard to get an idea of your load balancer performance
 
 - Config:
- Track security group rules for the Load Balancer
 - Track configuration changes for the Load Balancer
 - Ensure an SSL certificate is always assigned to the Load Balancer (compliance)
 
 - CloudTrail:
AWS STS – Security Token Service
 - Allows granting limited and temporary access to AWS resources
 - Token is valid for up to one hour (must be refreshed)
 - AssumeRole
- Within your own account: for enhanced security
 - Cross Account Access: assume role in target account to perform actions there
 
 - AssumeRoleWithSAML
- return credentials for users logged with SAML
 
 - AssumeRoleWithWebIdentity
- return creds for users logged with an IdP (Facebook Login, Google Login, OIDC compatible…)
 - AWS recommends against using this, and using Cognito instead
 
 - GetSessionToken
 - Define an IAM Role within your account or cross-account
 - Define which principals can access this IAM Role
 - Use AWS STS (Security Token Service) to retrieve credentials and impersonate the IAM Role you have access to (AssumeRole API)
 - Temporary credentials can be valid between 15 minutes and 1 hour
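A minimal sketch of preparing an AssumeRole request, assuming the 15 minutes to 1 hour validity range from the notes above. The role ARN and session name are hypothetical.

```python
MIN_DURATION_SECONDS = 15 * 60  # 15 minutes
MAX_DURATION_SECONDS = 60 * 60  # 1 hour, the upper bound given in the notes


def build_assume_role_request(role_arn, session_name, duration_seconds=3600):
    """Keyword arguments for an STS AssumeRole call, with a duration check."""
    if not MIN_DURATION_SECONDS <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError("duration must be between 15 minutes and 1 hour")
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": duration_seconds,
    }


# Hypothetical cross-account role.
assume_request = build_assume_role_request(
    "arn:aws:iam::123456789012:role/cross-account-admin",
    "audit-session",
    duration_seconds=900,
)
```

With boto3, `boto3.client("sts").assume_role(**assume_request)` returns a `Credentials` entry containing the temporary access key, secret key, session token, and expiration.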
Cross account access with STS
Identity Federation in AWS
 - Federation lets users outside of AWS assume a temporary role to access AWS resources.
 - These users assume an access role provided by the identity provider.
 - Federations can have many flavors:
- SAML 2.0
 - Custom Identity Broker
 - Web Identity Federation with Amazon Cognito
 - Web Identity Federation without Amazon Cognito
 - Single Sign On
 - Non-SAML with AWS Microsoft AD
 
 - Using federation, you don’t need to create IAM users (user management is outside of AWS) 
SAML 2.0 Federation
 - To integrate Active Directory / ADFS with AWS (or any SAML 2.0)
 - Provides access to AWS Console or CLI (through temporary creds)
 - No need to create an IAM user for each of your employees 
SAML 2.0 Federation – Active Directory FS
 - Same process as with any SAML 2.0 compatible IdP 
SAML 2.0 Federation
 - Need to set up a trust between AWS IAM and the SAML identity provider (both ways)
 - SAML 2.0 enables web-based, cross domain SSO
 - Uses the STS API: AssumeRoleWithSAML
 - Note: federation through SAML is the “old way” of doing things
 - Amazon Single Sign On (SSO) Federation is the new managed and simpler way
Custom Identity Broker Application
 - Use only if identity provider is not compatible with SAML 2.0
 - The identity broker must determine the appropriate IAM policy
 - Uses the STS API: AssumeRole or GetFederationToken 
Web Identity Federation – AssumeRoleWithWebIdentity
 - Not recommended by AWS – use Cognito instead (allows for anonymous users, data synchronization, MFA) 
AWS Cognito
 - Goal:
- Provide direct access to AWS Resources from the Client Side (mobile, web app)
 
 - Example:
- provide (temporary) access to write to S3 bucket using Facebook Login
 
 - Problem:
- We don’t want to create IAM users for our app users
 
 - How:
Microsoft Active Directory (AD)
 - Found on any Windows Server with AD Domain Services
 - Database of objects: User Accounts, Computers, Printers, File Shares, Security Groups
 - Centralized security management, create account, assign permissions
 - Objects are organized in trees
 - A group of trees is a forest
AWS Directory Services
 - AWS Managed Microsoft AD
- Create your own AD in AWS, manage users locally, supports MFA
 - Establish “trust” connections with your on-premises AD
 
 - AD Connector
- Directory Gateway (proxy) to redirect to on-premises AD, supports MFA
 - Users are managed on the on-premises AD
 
 - Simple AD
AWS Organizations
 - Global service
 - Allows to manage multiple AWS accounts
 - The main account is the master account – you can’t change it
 - Other accounts are member accounts
 - Member accounts can only be part of one organization
 - Consolidated Billing across all accounts - single payment method
 - Pricing benefits from aggregated usage (volume discount for EC2, S3…)
 - API is available to automate AWS account creation
Multi Account Strategies
 - Create accounts per department, per cost center, per dev / test / prod, based on regulatory restrictions (using SCP), for better resource isolation (ex: VPC), to have separate per-account service limit, isolated account for logging
 - Multi Account vs One Account Multi VPC
 - Use tagging standards for billing purposes
 - Enable CloudTrail on all accounts, send logs to central S3 account
 - Send CloudWatch logs to central logging account
 - Establish cross account roles for admin purposes
Organizational Units (OU) - Examples
AWS Organization
Service Control Policies (SCP)
 - Whitelist or blacklist IAM actions
 - Applied at the OU or Account level
 - Does not apply to the Master Account
 - SCP is applied to all the Users and Roles of the Account, including Root user
 - The SCP does not affect service-linked roles
- Service-linked roles enable other AWS services to integrate with AWS Organizations and can’t be restricted by SCPs.
 
 - SCP must have an explicit Allow (does not allow anything by default)
 - Use cases:
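The SCP rules above (explicit Allow required, applied at each level) can be sketched with a small simulation. This is a simplified model that only looks at `Effect` and `Action`, ignoring resources and conditions; the policies shown are hypothetical examples, not a copy of the service's evaluation engine.

```python
from fnmatch import fnmatchcase


def _statement_matches(statement, action):
    """True if the statement's Action patterns cover `action`."""
    actions = statement["Action"]
    if isinstance(actions, str):
        actions = [actions]
    return any(fnmatchcase(action.lower(), p.lower()) for p in actions)


def scp_allows(scp, action):
    """An SCP permits an action only with an Allow and no matching Deny."""
    statements = scp["Statement"]
    denied = any(s["Effect"] == "Deny" and _statement_matches(s, action)
                 for s in statements)
    allowed = any(s["Effect"] == "Allow" and _statement_matches(s, action)
                  for s in statements)
    return allowed and not denied


def allowed_by_all(scps, action):
    """Every level (root, OU, account) must permit the action."""
    return all(scp_allows(scp, action) for scp in scps)


# Hypothetical policies: allow everything at the root, blacklist DynamoDB on an OU.
full_access = {"Statement": [{"Effect": "Allow", "Action": "*"}]}
deny_dynamodb = {"Statement": [
    {"Effect": "Allow", "Action": "*"},
    {"Effect": "Deny", "Action": "dynamodb:*"},
]}
ec2_ok = allowed_by_all([full_access, deny_dynamodb], "ec2:RunInstances")
dynamodb_ok = allowed_by_all([full_access, deny_dynamodb], "dynamodb:CreateTable")
```

An SCP with no Allow statement at any level blocks everything beneath it, which is the "does not allow anything by default" behaviour.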
 - To migrate accounts from one organization to another
- Remove the member account from the old organization
 - Send an invite to the new organization
 - Accept the invite to the new organization from the member account
 
 - If you want the master account of the old organization to also join the new organization, do the following:
 - ListBucket permission applies to arn:aws:s3:::test
 - => bucket level permission
 - GetObject, PutObject, DeleteObject applies to arn:aws:s3:::test/*
 - => object level permission 
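The bucket-level versus object-level split can be written out as a policy document. This is a sketch using the `test` bucket from the notes; the helper name is hypothetical.

```python
import json


def build_s3_policy(bucket):
    """IAM policy separating bucket-level and object-level permissions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: the resource is the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level permissions: the resource is every key in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


s3_policy = build_s3_policy("test")
s3_policy_json = json.dumps(s3_policy, indent=2)
```

Putting `s3:ListBucket` on the `/*` resource (or the object actions on the bare bucket ARN) is a common reason such a policy silently grants nothing.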
IAM Roles vs Resource Based Policies
 - Attach a policy to a resource (example: S3 bucket policy) versus using a role as a proxy
IAM Roles vs Resource Based Policies
 - When you assume a role (user, application or service), you give up your original permissions and take the permissions assigned to the role
 - When using a resource based policy, the principal doesn’t have to give up their permissions
 - Example: User in account A needs to scan a DynamoDB table in Account A and dump it in an S3 bucket in Account B
 - Supported by: Amazon S3 buckets, SNS topics, SQS queues, etc…
IAM Permission Boundaries
 - IAM Permission Boundaries are supported for users and roles (not groups)
 - Advanced feature to use a managed policy to set the maximum permissions an IAM entity can get. 
IAM Permission Boundaries
 - Can be used in combination with AWS Organizations SCPs
 - Use cases
- Delegate responsibilities to non-administrators within their permission boundaries
 - Allow developers to self-assign policies and manage their own permissions, while making sure they can’t escalate their privileges (= make themselves admin)
 - Useful to restrict one specific user (instead of a whole account using Organizations & SCP)
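One way to picture a permission boundary is as an intersection: an action is effective only if both the identity policy and the boundary allow it. The sketch below models each as a plain list of action patterns (a simplification that ignores resources, conditions, and explicit denies); all the lists are hypothetical.

```python
from fnmatch import fnmatchcase


def is_granted(patterns, action):
    """True if any action pattern (e.g. "s3:*") covers `action`."""
    return any(fnmatchcase(action, pattern) for pattern in patterns)


def effective_allow(identity_actions, boundary_actions, action):
    """Allowed only when the identity policy AND the boundary both allow it."""
    return is_granted(identity_actions, action) and is_granted(boundary_actions, action)


# Hypothetical developer who self-assigned an IAM permission.
identity_actions = ["iam:CreateUser", "s3:*"]
# Hypothetical boundary set by an administrator.
boundary_actions = ["s3:*", "cloudwatch:*", "ec2:*"]

can_read_s3 = effective_allow(identity_actions, boundary_actions, "s3:GetObject")
can_create_user = effective_allow(identity_actions, boundary_actions, "iam:CreateUser")
can_run_ec2 = effective_allow(identity_actions, boundary_actions, "ec2:RunInstances")
```

The boundary never grants anything by itself: `ec2:RunInstances` stays unavailable because the identity policy does not include it, and `iam:CreateUser` is cut off because the boundary does not.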
IAM Policy Evaluation Logic
Example IAM Policy
 
 - Can you perform sqs:CreateQueue?
 - Can you perform sqs:DeleteQueue?
 - Can you perform ec2:DescribeInstances?
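The questions above depend on a policy that is not reproduced in these notes, so the sketch below uses a hypothetical one (deny all SQS actions, allow `sqs:DeleteQueue`) to show the core ordering for a single policy: an explicit deny wins, then an allow, otherwise the request is implicitly denied. It ignores resources, conditions, and the other policy types.

```python
from fnmatch import fnmatchcase

# Hypothetical policy used only for illustration.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Deny", "Action": "sqs:*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["sqs:DeleteQueue"], "Resource": "*"},
    ],
}


def evaluate(policy, action):
    """Return "ExplicitDeny", "Allow", or "ImplicitDeny" for one action."""
    decision = "ImplicitDeny"  # nothing is allowed unless a statement says so
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if any(fnmatchcase(action, pattern) for pattern in actions):
            if statement["Effect"] == "Deny":
                return "ExplicitDeny"  # a deny overrides any allow
            decision = "Allow"
    return decision


answers = {
    action: evaluate(EXAMPLE_POLICY, action)
    for action in ("sqs:CreateQueue", "sqs:DeleteQueue", "ec2:DescribeInstances")
}
```

Under this hypothetical policy all three requests are refused, but for different reasons: the two SQS actions hit the explicit deny, while the EC2 action is never mentioned at all.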
AWS Resource Access Manager (RAM)
 - Share AWS resources that you own with other AWS accounts
 - Share with any account or within your Organization
 - Avoid resource duplication!
 - VPC Subnets:
- allow to have all the resources launched in the same subnets
 - must be from the same AWS Organizations.
 - Cannot share security groups and default VPC
 - Participants can manage their own resources in there
 - Participants can’t view, modify, delete resources that belong to other participants or the owner
 
 - AWS Transit Gateway
 - Route53 Resolver Rules
 - License Manager Configurations
Resource Access Manager – VPC example
 - Each account…
- is responsible for its own resources
 - cannot view, modify or delete other resources in other accounts
 
 - Network is shared so…
AWS Single Sign-On (SSO)
 - Centrally manage Single Sign-On to access multiple accounts and 3rd-party business applications.
 - Integrated with AWS Organizations
 - Supports SAML 2.0 markup
 - Integration with on-premises Active Directory
 - Centralized permission management
 - Centralized auditing with CloudTrail
AWS Single Sign-On (SSO) – Setup with AD
SSO – vs AssumeRoleWithSAML
 
 This post is licensed under  CC BY 4.0  by the author.