AWS Monitoring, Audit, and Performance
AWS CloudWatch Metrics
- CloudWatch provides metrics for every service in AWS
- A metric is a variable to monitor (CPUUtilization, NetworkIn…)
- Metrics belong to namespaces
- A dimension is an attribute of a metric (instance ID, environment, etc.)
- Up to 30 dimensions per metric (raised from 10 in 2022)
- Metrics have timestamps
- Can create CloudWatch dashboards of metrics
EC2 Detailed monitoring
- EC2 instances publish metrics every 5 minutes by default
- With detailed monitoring (for a cost), you get data every 1 minute
- Use detailed monitoring if you want to scale faster for your ASG!
- The AWS Free Tier allows us to have 10 detailed monitoring metrics
- Note: EC2 Memory usage is by default not pushed (must be pushed from inside the instance as a custom metric)
CloudWatch Custom Metrics
- Possibility to define and send your own custom metrics to CloudWatch
- Example: memory (RAM) usage, disk space, number of logged in users …
- Use API call PutMetricData
- Ability to use dimensions (attributes) to segment metrics
- Instance.id
- Environment.name
- Metric resolution (StorageResolution API parameter, two possible values):
- Standard: 1 minute (60 seconds)
- High Resolution: 1/5/10/30 second(s), at a higher cost
- Important: accepts metric data points up to two weeks in the past and two hours in the future (make sure to configure your EC2 instance time correctly)
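The following is a minimal Python sketch of a PutMetricData request. It assumes boto3's CloudWatch client (`put_metric_data`), but only builds the keyword arguments, so it runs without AWS credentials. The namespace, metric name and instance ID are hypothetical. The sketch checks the timestamp against the two-weeks-past / two-hours-future window described above. It also maps the resolution choice to `StorageResolution`: 1 for high resolution, 60 for standard.

```python
from datetime import datetime, timedelta, timezone

# Accepted timestamp window described in the notes above
MAX_PAST = timedelta(weeks=2)
MAX_FUTURE = timedelta(hours=2)


def build_put_metric_data_request(namespace, metric_name, value, dimensions,
                                  timestamp=None, high_resolution=False,
                                  unit="Percent"):
    """Build keyword arguments for CloudWatch put_metric_data (no AWS call)."""
    now = datetime.now(timezone.utc)
    timestamp = timestamp or now  # must be timezone-aware if provided
    if not (now - MAX_PAST <= timestamp <= now + MAX_FUTURE):
        raise ValueError("timestamp outside accepted window; check the instance clock")
    return {
        "Namespace": namespace,
        "MetricData": [{
            "MetricName": metric_name,
            # Dimensions segment the metric, e.g. per instance or per environment
            "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
            "Timestamp": timestamp,
            "Value": float(value),
            "Unit": unit,
            # StorageResolution: 1 = high resolution, 60 = standard (1 minute)
            "StorageResolution": 1 if high_resolution else 60,
        }],
    }


# Hypothetical namespace, metric name and instance ID
request = build_put_metric_data_request(
    namespace="MyApp/System",
    metric_name="MemoryUtilization",
    value=72.5,
    dimensions={"InstanceId": "i-0123456789abcdef0", "Environment": "prod"},
)
```

To send it, call `boto3.client("cloudwatch").put_metric_data(**request)`. Keeping the builder separate from the network call makes the validation easy to test.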
CloudWatch Dashboards
- Great way to setup custom dashboards for quick access to key metrics and alarms
- Dashboards are global
- Dashboards can include graphs from different AWS accounts and regions
- You can change the time zone & time range of dashboards
- You can setup automatic refresh (10s, 1m, 2m, 5m, 15m)
- Dashboards can be shared with people who don’t have an AWS account (public, email address, 3rd party SSO provider through Amazon Cognito)
- Pricing
AWS CloudWatch Logs
- Log groups: arbitrary name, usually representing an application
- Log stream: instances within application / log files / containers
- Can define log expiration policies (never expire, 30 days, etc.)
- Sources that can send logs to CloudWatch Logs:
- SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
- Elastic Beanstalk: collection of logs from application
- ECS: collection from containers
- AWS Lambda: collection from function logs
- VPC Flow Logs: VPC specific logs
- API Gateway
- CloudTrail based on filter
- Route53: Log DNS queries
CloudWatch Logs Metric Filter & Insights
- CloudWatch Logs can use filter expressions
- For example, find a specific IP inside of a log
- Or count occurrences of “ERROR” in your logs
- Metric filters can be used to trigger CloudWatch alarms
- CloudWatch Logs Insights can be used to query logs and add queries to CloudWatch Dashboards
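This is a rough local stand-in for what a metric filter does, not the CloudWatch filter-pattern engine itself. It counts log events that contain the term `ERROR`; each match would add `metricValue` 1 to the metric. It also finds events from a given IP. The `put_metric_filter` arguments use boto3 CloudWatch Logs parameter names. The log group, filter and namespace names are hypothetical.

```python
def count_matching_events(log_events, term="ERROR"):
    """Count events containing `term`; each match would add metricValue "1"."""
    return sum(1 for event in log_events if term in event)


def events_from_ip(log_events, ip):
    """Return events where `ip` appears as a whole whitespace-separated token."""
    return [event for event in log_events if ip in event.split()]


sample_events = [
    "2024-01-01T10:00:00Z INFO request from 10.0.0.12 ok",
    "2024-01-01T10:00:01Z ERROR request from 10.0.0.7 failed",
    "2024-01-01T10:00:02Z ERROR timeout calling 10.0.0.7",
]

# Keyword arguments for boto3's CloudWatch Logs put_metric_filter
# (log group, filter and namespace names are hypothetical)
metric_filter_request = {
    "logGroupName": "/my-app/production",
    "filterName": "ErrorCount",
    "filterPattern": "ERROR",
    "metricTransformations": [{
        "metricName": "ErrorCount",
        "metricNamespace": "MyApp/Logs",
        "metricValue": "1",
    }],
}
```

An alarm on the resulting `MyApp/Logs` / `ErrorCount` metric is what turns the filter into an alert.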
CloudWatch Logs – S3 Export
- Log data can take up to 12 hours to become available for export
- The API call is CreateExportTask
- Not near-real time or real-time… use Logs Subscriptions instead
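Below is a minimal sketch of the CreateExportTask call through boto3's CloudWatch Logs client (`create_export_task`). The log group and bucket names are hypothetical. The main detail it shows is that the time range is passed as epoch milliseconds.

```python
from datetime import datetime, timezone


def to_epoch_millis(dt):
    """CloudWatch Logs APIs take times as milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


def build_export_task_request(log_group, bucket, start, end, prefix="exported-logs"):
    """Build keyword arguments for CloudWatch Logs create_export_task (no AWS call)."""
    if start >= end:
        raise ValueError("start must be before end")
    return {
        "taskName": f"export-{to_epoch_millis(start)}",
        "logGroupName": log_group,
        "fromTime": to_epoch_millis(start),
        "to": to_epoch_millis(end),
        # The bucket policy must allow CloudWatch Logs to write to this bucket
        "destination": bucket,
        "destinationPrefix": prefix,
    }


# Hypothetical log group and bucket names: export one day of logs
request = build_export_task_request(
    "/my-app/production", "my-log-archive-bucket",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
```

Because the export is a batch task, this suits archiving. For streaming, the notes point to Logs Subscriptions.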
CloudWatch Logs Subscriptions
CloudWatch Logs Aggregation Multi-Account & Multi Region
CloudWatch Logs for EC2
- By default, no logs from your EC2 machine will go to CloudWatch
- You need to run a CloudWatch agent on EC2 to push the log files you want
- Make sure IAM permissions are correct
- The CloudWatch log agent can be set up on premises too
CloudWatch Logs Agent & Unified Agent
- For virtual servers (EC2 instances, on-premises servers…)
- CloudWatch Logs Agent
- Old version of the agent
- Can only send to CloudWatch Logs
- CloudWatch Unified Agent
- Metrics collected directly on your Linux server / EC2 instance:
- CPU (active, guest, idle, system, user, steal)
- Disk metrics (free, used, total), Disk IO (writes, reads, bytes, iops)
- RAM (free, inactive, used, total, cached)
- Netstat (number of TCP and UDP connections, net packets, bytes)
- Processes (total, dead, blocked, idle, running, sleep)
- Swap Space (free, used, used %)
- Reminder: out-of-the-box metrics for EC2 – disk, CPU, network (high level)
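The Unified Agent is driven by a JSON configuration file. Here is a minimal sketch of one, generated from Python to match the other examples. It collects RAM, swap and disk usage, which EC2 does not publish out of the box, and ships one log file. The log group name and file path are hypothetical. Check the key names against the CloudWatch agent documentation before use.

```python
import json

# Sketch of a CloudWatch Unified Agent configuration file.
# Log group name and file path are hypothetical examples.
agent_config = {
    "agent": {"metrics_collection_interval": 60},
    "metrics": {
        # Tag every metric with the instance ID
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "swap": {"measurement": ["swap_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["*"]},
        },
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [{
                    "file_path": "/var/log/messages",
                    "log_group_name": "ec2-system-logs",
                    # {instance_id} is replaced by the agent with the instance ID
                    "log_stream_name": "{instance_id}",
                }]
            }
        }
    },
}

agent_config_json = json.dumps(agent_config, indent=2)
```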
CloudWatch Alarms
- Alarms are used to trigger notifications for any metric
- Various options (sampling, %, max, min, etc…)
- Alarm States:
- OK
- INSUFFICIENT_DATA
- ALARM
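- Period: length of time, in seconds, over which the metric is evaluated (high-resolution custom metrics: 10 s, 30 s, or multiples of 60 s)
CloudWatch Alarm Targets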
- Stop, Terminate, Reboot, or Recover an EC2 Instance
- Trigger Auto Scaling Action
- Send notification to SNS (from which you can do pretty much anything)
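The three alarm states can be illustrated with a deliberately simplified Python model. The alarm goes to ALARM when the last N datapoints all breach the threshold, INSUFFICIENT_DATA when any of them is missing, and OK otherwise. Real CloudWatch alarms also support "M out of N" datapoints and configurable missing-data treatment, which this sketch ignores.

```python
from typing import Optional, Sequence


def evaluate_alarm(datapoints: Sequence[Optional[float]], threshold: float,
                   evaluation_periods: int) -> str:
    """Simplified alarm state: looks only at the last `evaluation_periods` points.

    None marks a period with no data.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be >= 1")
    recent = list(datapoints[-evaluation_periods:])
    if len(recent) < evaluation_periods or any(p is None for p in recent):
        return "INSUFFICIENT_DATA"
    if all(p > threshold for p in recent):
        return "ALARM"
    return "OK"
```

For example, CPU datapoints `[50, 85, 90, 95]` with threshold 80 over 3 periods give `ALARM`. The same data with a 70 in the window gives `OK`.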
EC2 Instance Recovery
- Status Check:
- Instance status = check the EC2 VM
- System status = check the underlying hardware
- Recovery: the instance keeps the same private, public, and Elastic IP addresses, metadata, and placement group
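Alarm targets and instance recovery combine naturally: put an alarm on the system status check and give it the built-in EC2 recover action. The sketch below builds boto3 `put_metric_alarm` arguments. The instance ID and Region are hypothetical. The period and evaluation settings are illustrative choices, not values taken from these notes.

```python
def build_recover_alarm_request(instance_id, region):
    """Keyword arguments for CloudWatch put_metric_alarm (no AWS call)."""
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        # System status check: failures of the underlying hardware
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action that recovers the instance
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


request = build_recover_alarm_request("i-0123456789abcdef0", "us-east-1")  # hypothetical ID
```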
CloudWatch Events
- Event pattern: Intercept events from AWS services (Sources)
- Example sources: EC2 instance start, CodeBuild failure, S3, Trusted Advisor
- Can intercept any API call with CloudTrail integration
- Schedule or Cron (example: create an event every 4 hours)
- A JSON payload is created from the event and passed to a target…
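Below is a simplified Python model of how an event pattern selects events. Every field in the pattern must exist in the event, leaf values in the pattern are lists of allowed values, and nested objects are matched recursively. Real patterns also support prefix, numeric and other content filters, which this sketch ignores. The sample event mimics an EC2 instance state-change event.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Exact-value subset of event pattern matching."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object: match recursively
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event value must be one of the listed values
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}

running_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}
```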
Amazon EventBridge
- EventBridge is the next evolution of CloudWatch Events
- Default Event Bus - generated by AWS services (CloudWatch Events)
- Partner Event Bus: receive events from SaaS service or applications (Zendesk, DataDog, Segment, Auth0…)
- Custom Event Buses: for your own applications
- Event Buses can be accessed by other AWS accounts
- You can archive events (all/filter) sent to an event bus (indefinitely or set period)
- Ability to replay archived events
- Rules: how to process the events (like CloudWatch Events)
Amazon EventBridge - Schema Registry
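This sketch shows how your own application could publish to a custom event bus by building boto3 EventBridge `put_events` arguments. The bus name, source and payload are hypothetical. Note that `Detail` is sent as a JSON string.

```python
import json


def build_put_events_request(bus_name, source, detail_type, detail):
    """Keyword arguments for EventBridge put_events (no AWS call)."""
    return {
        "Entries": [{
            "EventBusName": bus_name,
            "Source": source,
            "DetailType": detail_type,
            # Detail must be a JSON-encoded string, not a dict
            "Detail": json.dumps(detail),
        }]
    }


# Hypothetical custom bus, source and payload
request = build_put_events_request(
    "orders-bus", "com.example.orders", "OrderPlaced",
    {"orderId": "1234", "total": 42.5},
)
```

A rule on `orders-bus` with the pattern `{"source": ["com.example.orders"], "detail-type": ["OrderPlaced"]}` would then route these events to its targets.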
- EventBridge can analyze the events in your bus and infer the schema
- The Schema Registry allows you to generate code for your application, that will know in advance how data is structured in the event bus
- Schemas can be versioned
Amazon EventBridge – Resource-based Policy
- Manage permissions for a specific Event Bus
- Example: allow/deny events from another AWS account or AWS region
- Use case: aggregate all events from your AWS Organization in a single AWS account or AWS region
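Here is a sketch of a resource-based policy on a central event bus that lets one other account send events to it. Both account IDs and the bus name are hypothetical.

```python
import json


def build_bus_policy(bus_arn, source_account_id):
    """Resource-based policy allowing another account to call events:PutEvents."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": f"AllowPutEventsFrom{source_account_id}",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{source_account_id}:root"},
            "Action": "events:PutEvents",
            "Resource": bus_arn,
        }],
    }


policy = build_bus_policy(
    "arn:aws:events:us-east-1:111111111111:event-bus/central-bus",  # hypothetical
    "222222222222",                                                 # hypothetical
)
policy_json = json.dumps(policy, indent=2)
```

For the Organization-wide aggregation use case, a common variant replaces the account principal with `"Principal": "*"` and restricts it with a `Condition` on `aws:PrincipalOrgID`.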
Amazon EventBridge vs CloudWatch Events
- Amazon EventBridge builds upon and extends CloudWatch Events.
- It uses the same service API and endpoint, and the same underlying service infrastructure
- EventBridge adds event buses for your custom applications and your third-party SaaS apps.
- EventBridge has the Schema Registry capability
- EventBridge has a different name to mark the new capabilities
- Over time, the CloudWatch Events name will be replaced with EventBridge
AWS CloudTrail
- Provides governance, compliance and audit for your AWS Account
- CloudTrail is enabled by default!
- Get a history of events / API calls made within your AWS Account by:
- Console
- SDK
- CLI
- AWS Services
- Can put logs from CloudTrail into Cloud
- A trail can be applied to All Regions (default) or a single Region
- If a resource is deleted in AWS, investigate CloudTrail first!
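To follow the "investigate CloudTrail first" advice, this sketch builds boto3 `lookup_events` arguments. They search event history for a given API call name, with the search window clamped to the 90-day retention stated in these notes. Event history is viewed per Region, so run the lookup in the Region where the resource lived.

```python
from datetime import datetime, timedelta, timezone

EVENT_HISTORY_DAYS = 90  # CloudTrail event history retention stated in these notes


def build_lookup_request(event_name, days_back=7, now=None):
    """Keyword arguments for CloudTrail lookup_events (no AWS call)."""
    now = now or datetime.now(timezone.utc)
    # Older events are not in event history, so never look further back
    days_back = min(days_back, EVENT_HISTORY_DAYS)
    return {
        "LookupAttributes": [{"AttributeKey": "EventName", "AttributeValue": event_name}],
        "StartTime": now - timedelta(days=days_back),
        "EndTime": now,
        "MaxResults": 50,
    }


# Who deleted a bucket recently? "DeleteBucket" is the S3 API call name.
request = build_lookup_request("DeleteBucket", days_back=30)
```

Each returned event includes fields such as `Username` and `EventTime`, which answer "who" and "when".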
CloudTrail Events
- Management Events:
- Operations that are performed on resources in your AWS account
- Examples:
- Configuring security (IAM AttachRolePolicy)
- Configuring rules for routing data (Amazon EC2 CreateSubnet)
- Setting up logging (AWS CloudTrail CreateTrail)
- By default, trails are configured to log management events.
- Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
- Data Events:
- By default, data events are not logged (because high volume operations)
- Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
- AWS Lambda function execution activity (the Invoke API)
- CloudTrail Insights Events
- Enable CloudTrail Insights to detect unusual activity in your account:
- inaccurate resource provisioning
- hitting service limits
- Bursts of AWS IAM actions
- Gaps in periodic maintenance activity
- CloudTrail Insights analyzes normal management events to create a baseline
- And then continuously analyzes write events to detect unusual patterns
- Events are stored for 90 days in CloudTrail
- To keep events beyond this period, log them to S3 and use Athena
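Data events are off by default. This sketch enables S3 data events on a trail, assuming boto3's CloudTrail `put_event_selectors` call. The trail and bucket names are hypothetical. It keeps management events on and logs only write-type S3 object operations, which is one way to apply the Read/Write split mentioned above.

```python
VALID_READ_WRITE = ("ReadOnly", "WriteOnly", "All")


def build_data_event_selectors(trail_name, bucket_name, read_write="WriteOnly"):
    """Keyword arguments for CloudTrail put_event_selectors (no AWS call)."""
    if read_write not in VALID_READ_WRITE:
        raise ValueError(f"read_write must be one of {VALID_READ_WRITE}")
    return {
        "TrailName": trail_name,
        "EventSelectors": [{
            "ReadWriteType": read_write,
            "IncludeManagementEvents": True,
            "DataResources": [{
                "Type": "AWS::S3::Object",
                # Trailing slash: every object in this bucket
                "Values": [f"arn:aws:s3:::{bucket_name}/"],
            }],
        }],
    }


request = build_data_event_selectors("my-trail", "my-data-bucket")  # hypothetical names
```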
AWS Config
- Helps with auditing and recording compliance of your AWS resources
- Helps record configurations and changes over time
- Questions that can be solved by AWS Config:
- Is there unrestricted SSH access to my security groups?
- Do my buckets have any public access?
- How has my ALB configuration changed over time?
- You can receive alerts (SNS notifications) for any changes
- AWS Config is a per-region service
- Can be aggregated across regions and accounts
- Possibility of storing the configuration data into S3 (analyzed by Athena)
Config Rules
- Can use AWS managed config rules (over 75)
- Can make custom config rules (must be defined in AWS Lambda)
- Ex: evaluate if each EBS disk is of type gp2
- Ex: evaluate if each EC2 instance is t2.micro
- Rules can be evaluated / triggered:
- For each config change
- And / or: at regular time intervals
- AWS Config Rules does not prevent actions from happening (no deny)
- Pricing: no free tier, $0.003 per configuration item recorded per region, $0.001 per config rule evaluation per region
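Here is a sketch of the second custom-rule example above (is each EC2 instance t2.micro?) written as a Lambda handler. It assumes the change-triggered invocation format, where `invokingEvent` is a JSON string carrying a `configurationItem`. The logic lives in the pure `evaluate_instance_type` function so it can be tested without AWS. Reporting back through Config's `put_evaluations` is shown only as a comment.

```python
import json


def evaluate_instance_type(configuration_item, allowed_type="t2.micro"):
    """Return COMPLIANT, NON_COMPLIANT or NOT_APPLICABLE for one resource."""
    if configuration_item.get("resourceType") != "AWS::EC2::Instance":
        return "NOT_APPLICABLE"
    instance_type = configuration_item.get("configuration", {}).get("instanceType")
    return "COMPLIANT" if instance_type == allowed_type else "NON_COMPLIANT"


def lambda_handler(event, context):
    invoking_event = json.loads(event["invokingEvent"])
    item = invoking_event["configurationItem"]
    evaluation = {
        "ComplianceResourceType": item["resourceType"],
        "ComplianceResourceId": item["resourceId"],
        "ComplianceType": evaluate_instance_type(item),
        "OrderingTimestamp": item["configurationItemCaptureTime"],
    }
    # In a deployed rule, the result is reported back to AWS Config:
    #   boto3.client("config").put_evaluations(
    #       Evaluations=[evaluation], ResultToken=event["resultToken"])
    return evaluation
```

The rule only reports compliance. As noted above, it does not stop the non-compliant instance from being launched.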
AWS Config Resource
- View compliance of a resource over time
- View configuration of a resource over time
- View CloudTrail API calls of a resource over time
Config Rules – Remediations
- Automate remediation of non-compliant resources using SSM Automation Documents
- Use AWS-Managed Automation Documents or create custom Automation Documents
- Tip: you can create custom Automation Documents that invoke a Lambda function
- You can set Remediation Retries if the resource is still non-compliant after auto-remediation
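This sketch configures an automatic remediation, assuming boto3's Config `put_remediation_configurations` call and the AWS-managed `AWS-StopEC2Instance` Automation document. The rule name and role ARN are hypothetical, and stopping the instance is just one possible remediation. `MaximumAutomaticAttempts` and `RetryAttemptSeconds` are the remediation-retry settings mentioned above.

```python
def build_remediation_request(rule_name, role_arn, max_attempts=5, retry_seconds=60):
    """Keyword arguments for Config put_remediation_configurations (no AWS call)."""
    return {
        "RemediationConfigurations": [{
            "ConfigRuleName": rule_name,
            "TargetType": "SSM_DOCUMENT",
            "TargetId": "AWS-StopEC2Instance",  # AWS-managed Automation document
            "Parameters": {
                # RESOURCE_ID is replaced with the ID of the non-compliant resource
                "InstanceId": {"ResourceValue": {"Value": "RESOURCE_ID"}},
                "AutomationAssumeRole": {"StaticValue": {"Values": [role_arn]}},
            },
            "Automatic": True,
            # Remediation retries if the resource is still non-compliant
            "MaximumAutomaticAttempts": max_attempts,
            "RetryAttemptSeconds": retry_seconds,
        }]
    }


request = build_remediation_request(
    "ec2-instance-type-check",                              # hypothetical rule name
    "arn:aws:iam::111111111111:role/ConfigRemediationRole",  # hypothetical role
)
```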
Config Rules – Notifications
- Use EventBridge to trigger notifications when AWS resources are non-compliant
- Ability to send configuration changes and compliance state notifications to SNS (all events – use SNS Filtering or filter at client-side)
CloudWatch vs CloudTrail vs Config
- CloudWatch
- Performance monitoring (metrics, CPU, network, etc…) & dashboards
- Events & Alerting
- Log Aggregation & Analysis
- CloudTrail
- Record API calls made within your Account by everyone
- Can define trails for specific resources
- Global Service
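- Config
- Record configuration changes over time
- Evaluate resources against compliance rules
- Get a timeline of configuration changes and compliance
CloudWatch vs CloudTrail vs Config - For an Elastic Load Balancer
- CloudWatch: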
- Monitoring Incoming connections metric
- Visualize error codes as % over time
- Make a dashboard to get an idea of your load balancer performance
- Config:
- Track security group rules for the Load Balancer
- Track configuration changes for the Load Balancer
- Ensure an SSL certificate is always assigned to the Load Balancer (compliance)
- CloudTrail:
- Track who made changes to the Load Balancer with API calls
AWS STS - Security Token Service
- Allows you to grant limited and temporary access to AWS resources.
- Token is valid for up to one hour (must be refreshed)
- AssumeRole
- Within your own account: for enhanced security
- Cross Account Access: assume role in target account to perform actions there
- AssumeRoleWithSAML
- Returns credentials for users logged in with SAML
- AssumeRoleWithWebIdentity
- Returns credentials for users logged in with an IdP (Facebook Login, Google Login, OIDC compatible…)
- AWS recommends against using this, and using Cognito instead
- GetSessionToken
- For MFA, from a user or the AWS account root user
Using STS to Assume a Role
- Define an IAM Role within your account or cross-account
- Define which principals can access this IAM Role
- Use AWS STS (Security Token Service) to retrieve credentials and impersonate the IAM Role you have access to (AssumeRole API)
- Temporary credentials can be valid between 15 minutes and 1 hour
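Below is a sketch of the AssumeRole step using boto3's STS `assume_role` arguments. It enforces the 15-minute to 1-hour window stated above, although a role's maximum session duration setting can allow longer sessions. The role ARN and session name are hypothetical. The commented lines show how the returned temporary credentials would be used.

```python
MIN_DURATION_SECONDS = 15 * 60  # 15 minutes
MAX_DURATION_SECONDS = 60 * 60  # 1 hour, the upper bound given in these notes


def build_assume_role_request(role_arn, session_name, duration_seconds=3600):
    """Keyword arguments for STS assume_role (no AWS call)."""
    if not MIN_DURATION_SECONDS <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError("duration must be between 15 minutes and 1 hour")
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,  # appears in CloudTrail for auditing
        "DurationSeconds": duration_seconds,
    }


# Hypothetical cross-account role in the target account
request = build_assume_role_request(
    "arn:aws:iam::222222222222:role/CrossAccountReadOnly", "audit-session", 900)

# With credentials configured, the temporary keys would be used like this:
#   creds = boto3.client("sts").assume_role(**request)["Credentials"]
#   s3 = boto3.client("s3",
#                     aws_access_key_id=creds["AccessKeyId"],
#                     aws_secret_access_key=creds["SecretAccessKey"],
#                     aws_session_token=creds["SessionToken"])
```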
Cross account access with STS
Identity Federation in AWS
- Federation lets users outside of AWS assume a temporary role for accessing AWS resources.
- These users assume an access role provided by the identity provider.
- Federations can have many flavors:
- SAML 2.0
- Custom Identity Broker
- Web Identity Federation with Amazon Cognito
- Web Identity Federation without Amazon Cognito
- Single Sign On
- Non-SAML with AWS Microsoft AD
- Using federation, you don’t need to create IAM users (user management is outside of AWS)
SAML 2.0 Federation
- To integrate Active Directory / ADFS with AWS (or any SAML 2.0)
- Provides access to AWS Console or CLI (through temporary creds)
- No need to create an IAM user for each of your employees
SAML 2.0 Federation – Active Directory FS
- Same process as with any SAML 2.0 compatible IdP
SAML 2.0 Federation
- Needs to set up a trust between AWS IAM and the SAML identity provider (both ways)
- SAML 2.0 enables web-based, cross domain SSO
- Uses the STS API: AssumeRoleWithSAML
- Note federation through SAML is the “old way” of doing things
- AWS Single Sign-On (SSO) federation is the new managed and simpler way
Custom Identity Broker Application
- Use only if identity provider is not compatible with SAML 2.0
- The identity broker must determine the appropriate IAM policy
- Uses the STS API: AssumeRole or GetFederationToken
Web Identity Federation – AssumeRoleWithWebIdentity
- Not recommended by AWS – use Cognito instead (allows for anonymous users, data synchronization, MFA)
AWS Cognito
- Goal:
- Provide direct access to AWS Resources from the Client Side (mobile, web app)
- Example:
- provide (temporary) access to write to S3 bucket using Facebook Login
- Problem:
- We don’t want to create IAM users for our app users
- How:
What is Microsoft Active Directory (AD)?
- Found on any Windows Server with AD Domain Services
- Database of objects: User Accounts, Computers, Printers, File Shares, Security Groups
- Centralized security management, create account, assign permissions
- Objects are organized in trees
- A group of trees is a forest
AWS Directory Services
- AWS Managed Microsoft AD
- Create your own AD in AWS, manage users locally, supports MFA
- Establish “trust” connections with your on-premises AD
- AD Connector
- Directory Gateway (proxy) to redirect to on-premises AD, supports MFA
- Users are managed on the on-premises AD
- Simple AD
AWS Organizations
- Global service
- Allows to manage multiple AWS accounts
- The main account is the master account (now called the management account) – you can’t change it
- Other accounts are member accounts
- Member accounts can only be part of one organization
- Consolidated Billing across all accounts - single payment method
- Pricing benefits from aggregated usage (volume discount for EC2, S3…)
- API is available to automate AWS account creation
Multi Account Strategies
- Create accounts per department, per cost center, per dev / test / prod, based on regulatory restrictions (using SCP), for better resource isolation (ex: VPC), to have separate per-account service limits, or as an isolated account for logging
- Multi Account vs One Account Multi VPC
- Use tagging standards for billing purposes
- Enable CloudTrail on all accounts, send logs to central S3 account
- Send CloudWatch logs to central logging account
- Establish cross account roles for admin purposes
Organizational Units (OU) - Examples
AWS Organization
Service Control Policies (SCP)
- Whitelist or blacklist IAM actions
- Applied at the OU or Account level
- Does not apply to the Master Account
- SCP is applied to all the Users and Roles of the Account, including Root user
- The SCP does not affect service-linked roles
- Service-linked roles enable other AWS services to integrate with AWS Organizations and can’t be restricted by SCPs.
- SCP must have an explicit Allow (does not allow anything by default)
- Use cases:
AWS Organizations - Moving Accounts
- To migrate accounts from one organization to another
- Remove the member account from the old organization
- Send an invite to the new organization
- Accept the invite to the new organization from the member account
- If you want the master account of the old organization to also join the new organization, do the following:
IAM for S3
- ListBucket permission applies to arn:aws:s3:::test
- => bucket level permission
- GetObject, PutObject, DeleteObject applies to arn:aws:s3:::test/*
- => object level permission
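The bucket-level vs object-level distinction above is easiest to see in a complete identity policy. The sketch builds one in Python and uses the bucket name `test` from the notes. `s3:ListBucket` targets the bucket ARN. The object actions target `bucket/*`, and granting them on the bucket ARN alone would not work.

```python
import json

BUCKET = "test"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Bucket-level permission: the resource is the bucket itself
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {   # Object-level permissions: the resource is every object in the bucket
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
```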
IAM Roles vs Resource Based Policies
- Cross-account access: attach a resource-based policy to the resource (example: S3 bucket policy) versus using a role as a proxy
- When you assume a role (user, application or service), you give up your original permissions and take the permissions assigned to the role
- When using a resource-based policy, the principal doesn’t have to give up their permissions
- Example: User in account A needs to scan a DynamoDB table in Account A and dump it in an S3 bucket in Account B
- Supported by: Amazon S3 buckets, SNS topics, SQS queues, etc…
IAM Permission Boundaries
- IAM Permission Boundaries are supported for users and roles (not groups)
- Advanced feature to use a managed policy to set the maximum permissions an IAM entity can get.
- Can be used in combination with AWS Organizations SCPs
- Use cases
- Delegate responsibilities to non-administrators within their permission boundaries
- Allow developers to self-assign policies and manage their own permissions, while making sure they can’t escalate their privileges (= make themselves admin)
- Useful to restrict one specific user (instead of a whole account using Organizations & SCP)
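When boundaries and SCPs are combined, the effective permission is the intersection of the layers, and an explicit Deny always wins. Below is a deliberately simplified Python model of that rule. It covers the same account and actions only, and ignores resources, conditions, resource-based policies and session policies. It uses `fnmatch` for the `*` wildcard, and the policies are hypothetical.

```python
from fnmatch import fnmatchcase


def _statement_matches(statement, action):
    actions = statement["Action"]
    if isinstance(actions, str):
        actions = [actions]
    # IAM action names are case-insensitive; compare in lower case
    return any(fnmatchcase(action.lower(), p.lower()) for p in actions)


def _has(policy, effect, action):
    return any(s["Effect"] == effect and _statement_matches(s, action) for s in policy)


def is_allowed(action, identity, boundary=None, scp=None):
    """Simplified evaluation: explicit Deny wins, then every layer must Allow."""
    layers = [p for p in (identity, boundary, scp) if p is not None]
    if any(_has(p, "Deny", action) for p in layers):
        return False
    return all(_has(p, "Allow", action) for p in layers)


identity = [{"Effect": "Allow", "Action": "sqs:*"}]
boundary = [{"Effect": "Allow", "Action": ["sqs:*", "s3:*"]}]
scp = [{"Effect": "Allow", "Action": "*"},
       {"Effect": "Deny", "Action": "sqs:DeleteQueue"}]
```

With these layers, `sqs:CreateQueue` is allowed. `sqs:DeleteQueue` is blocked by the SCP's explicit Deny. `s3:GetObject` is refused because the identity policy never grants it, even though the boundary would permit it.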
IAM Policy Evaluation Logic
Example IAM Policy
- Can you perform sqs:CreateQueue?
- Can you perform sqs:DeleteQueue?
- Can you perform ec2:DescribeInstances?
AWS Resource Access Manager (RAM)
- Share AWS resources that you own with other AWS accounts
- Share with any account or within your Organization
- Avoid resource duplication!
- VPC Subnets:
- allows all the resources to be launched in the same subnets
- must be from the same AWS Organizations.
- Cannot share security groups and default VPC
- Participants can manage their own resources in there
- Participants can’t view, modify, delete resources that belong to other participants or the owner
- AWS Transit Gateway
- Route53 Resolver Rules
- License Manager Configurations
Resource Access Manager – VPC example
- Each account…
- is responsible for its own resources
- cannot view, modify or delete other resources in other accounts
- Network is shared so…
AWS Single Sign-On (SSO)
- Centrally manage Single Sign-On to access multiple accounts and 3rd-party business applications.
- Integrated with AWS Organizations
- Supports SAML 2.0 markup
- Integration with on-premises Active Directory
- Centralized permission management
- Centralized auditing with CloudTrail
AWS Single Sign-On (SSO) – Setup with AD
SSO – vs AssumeRoleWithSAML
This post is licensed under CC BY 4.0 by the author.