AWS Monitoring, Audit, and Performance

AWS CloudWatch Metrics

  • CloudWatch provides metrics for every service in AWS
  • Metric is a variable to monitor (CPUUtilization, NetworkIn…)
  • Metrics belong to namespaces
  • Dimension is an attribute of a metric (instance id, environment, etc…)
  • Up to 30 dimensions per metric (older material quotes 10)
  • Metrics have timestamps
  • Can create CloudWatch dashboards of metrics

    EC2 Detailed monitoring

  • EC2 instance metrics are reported “every 5 minutes” by default
  • With detailed monitoring (for a cost), you get data “every 1 minute”
  • Use detailed monitoring if you want to scale faster for your ASG!
  • The AWS Free Tier allows us to have 10 detailed monitoring metrics
  • Note: EC2 Memory usage is by default not pushed (must be pushed from inside the instance as a custom metric)

    CloudWatch Custom Metrics

  • Possibility to define and send your own custom metrics to CloudWatch
  • Example: memory (RAM) usage, disk space, number of logged in users …
  • Use API call PutMetricData
  • Ability to use dimensions (attributes) to segment metrics
    • Instance.id
    • Environment.name
  • Metric resolution (StorageResolution API parameter - two possible values):
    • Standard: 1 minute (60 seconds)
    • High Resolution: 1/5/10/30 second(s) - Higher cost
  • Important: Accepts metric data points two weeks in the past and two hours in the future (make sure to configure your EC2 instance time correctly)
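As a rough illustration of the points above, the sketch below builds one entry of the kind PutMetricData takes in its MetricData list and checks the timestamp window from the notes. The helper names (`build_metric_datum`, `is_timestamp_accepted`), the metric name, the unit and the dimension values are assumptions made up for this example. With boto3, the resulting dictionary would typically be passed as `MetricData=[datum]` to `put_metric_data`; that call is not made here.

```python
from datetime import datetime, timedelta, timezone

# Acceptance window described in the notes above.
MAX_PAST = timedelta(weeks=2)
MAX_FUTURE = timedelta(hours=2)


def is_timestamp_accepted(ts, now=None):
    """Return True if ts is within two weeks in the past and two hours in the future."""
    now = now or datetime.now(timezone.utc)
    return now - MAX_PAST <= ts <= now + MAX_FUTURE


def build_metric_datum(name, value, dimensions, ts, high_resolution=False):
    """Build one MetricData entry (hypothetical helper, not part of any SDK)."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": ts,
        "Value": value,
        "Unit": "Percent",  # assumed unit for this example
        # StorageResolution: 1 = high resolution, 60 = standard.
        "StorageResolution": 1 if high_resolution else 60,
    }


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
datum = build_metric_datum(
    "MemoryUtilization",  # hypothetical custom metric
    71.5,
    {"Instance.id": "i-0123456789abcdef0", "Environment.name": "dev"},
    now,
)
print(datum["StorageResolution"])
print(is_timestamp_accepted(now - timedelta(days=20), now))
```

A data point stamped 20 days in the past falls outside the window, which is why a wrong instance clock can cause metrics to be rejected.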

    CloudWatch Dashboards

  • Great way to set up custom dashboards for quick access to key metrics and alarms
  • Dashboards are global
  • Dashboards can include graphs from different AWS accounts and regions
  • You can change the time zone & time range of dashboards
  • You can set up automatic refresh (10s, 1m, 2m, 5m, 15m)
  • Dashboards can be shared with people who don’t have an AWS account (public, email address, 3rd party SSO provider through Amazon Cognito)
  • Pricing
    • 3 dashboards (up to 50 metrics) for free
    • $3/dashboard/month afterwards
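The pricing rule above can be worked through as simple arithmetic. This is a minimal sketch that uses only the two figures quoted in the notes; it ignores the 50-metric cap on the free dashboards, and the function name is made up for the example.

```python
FREE_DASHBOARDS = 3
PRICE_PER_EXTRA_DASHBOARD = 3.0  # USD per dashboard per month, from the notes


def dashboard_monthly_cost(dashboard_count):
    """Monthly dashboard cost in USD under the pricing quoted above."""
    billable = max(0, dashboard_count - FREE_DASHBOARDS)
    return billable * PRICE_PER_EXTRA_DASHBOARD


for count in (2, 3, 10):
    print(count, dashboard_monthly_cost(count))
```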

      CloudWatch Logs

  • Log groups: arbitrary name, usually representing an application
  • Log stream: instances within application / log files / containers
  • Can define log expiration policies (never expire, 30 days, etc..)
  • CloudWatch Logs can send logs to:
    • Amazon S3 (exports)
    • Kinesis Data Streams
    • Kinesis Data Firehose
    • AWS Lambda
    • Amazon OpenSearch Service (formerly Elasticsearch)

      CloudWatch Logs - Sources

  • SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
  • Elastic Beanstalk: collection of logs from application
  • ECS: collection from containers
  • AWS Lambda: collection from function logs
  • VPC Flow Logs: VPC specific logs
  • API Gateway
  • CloudTrail based on filter
  • Route53: Log DNS queries

    CloudWatch Logs Metric Filter & Insights

  • CloudWatch Logs can use filter expressions
    • For example, find a specific IP inside of a log
    • Or count occurrences of “ERROR” in your logs
  • Metric filters can be used to trigger CloudWatch alarms
  • CloudWatch Logs Insights can be used to query logs and add queries to CloudWatch Dashboards
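To make the two filter examples above concrete, here is a local simulation in plain Python. It does not use CloudWatch's own filter pattern syntax; the log lines and helper names are invented for illustration. The count it produces is the kind of number a metric filter would publish as a metric, which an alarm could then watch.

```python
import re

# Invented sample log lines for the example.
LOG_LINES = [
    "2024-01-15T12:00:01Z INFO  10.0.0.12 GET /health 200",
    "2024-01-15T12:00:02Z ERROR 10.0.0.37 GET /orders 500",
    "2024-01-15T12:00:03Z ERROR 10.0.0.12 POST /orders 500",
]


def count_term(lines, term):
    """Count lines containing the term, like a filter that emits 1 per match."""
    return sum(1 for line in lines if term in line)


def lines_with_ip(lines, ip):
    """Return the lines that mention exactly this IPv4 address."""
    # The lookarounds stop 10.0.0.1 from matching inside 10.0.0.12.
    pattern = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?![\d.])")
    return [line for line in lines if pattern.search(line)]


error_count = count_term(LOG_LINES, "ERROR")
print(error_count)
print(lines_with_ip(LOG_LINES, "10.0.0.12"))
```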

    CloudWatch Logs – S3 Export

  • Log data can take up to 12 hours to become available for export
  • The API call is CreateExportTask
  • Not near-real time or real-time… use Logs Subscriptions instead
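A CreateExportTask request needs a log group, a time range and a destination bucket. The sketch below only assembles the arguments; the log group, bucket name and helper names are hypothetical. The keys follow the boto3 `create_export_task` keyword names as I understand them (`fromTime` stands in for the API's `from`), so check them against the SDK reference before relying on them.

```python
from datetime import datetime, timezone


def to_epoch_millis(dt):
    """Convert an aware datetime to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


def build_export_task(log_group, bucket, start, end, prefix="exported-logs"):
    """Assemble keyword arguments for a CreateExportTask request."""
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "taskName": f"export-{to_epoch_millis(start)}",
        "logGroupName": log_group,
        "fromTime": to_epoch_millis(start),  # start of range, epoch milliseconds
        "to": to_epoch_millis(end),  # end of range, epoch milliseconds
        "destination": bucket,  # name of the S3 bucket
        "destinationPrefix": prefix,
    }


params = build_export_task(
    "/my-app/production",  # hypothetical log group
    "my-log-archive-bucket",  # hypothetical bucket
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
print(params)
```

Because exported data can lag by hours, a batch job like this suits archival; for anything close to real time the notes point to Logs Subscriptions.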

    CloudWatch Logs Subscriptions

    CloudWatch Logs Aggregation Multi-Account & Multi Region

    CloudWatch Logs for EC2

  • By default, no logs from your EC2 machine will go to CloudWatch
  • You need to run a CloudWatch agent on EC2 to push the log files you want
  • Make sure IAM permissions are correct
  • The CloudWatch log agent can be set up on premises too

    CloudWatch Logs Agent & Unified Agent

  • For virtual servers (EC2 instances, on-premises servers…)
  • CloudWatch Logs Agent
    • Old version of the agent
    • Can only send to CloudWatch Logs
  • CloudWatch Unified Agent
    • Collect additional system-level metrics such as RAM, processes, etc…
    • Collect logs to send to CloudWatch Logs
    • Centralized configuration using SSM Parameter Store

      CloudWatch Unified Agent – Metrics

  • Collected directly on your Linux server / EC2 instance
  • CPU (active, guest, idle, system, user, steal)
  • Disk metrics (free, used, total), Disk IO (writes, reads, bytes, iops)
  • RAM (free, inactive, used, total, cached)
  • Netstat (number of TCP and UDP connections, net packets, bytes)
  • Processes (total, dead, blocked, idle, running, sleep)
  • Swap Space (free, used, used %)
  • Reminder: out-of-the box metrics for EC2 – disk, CPU, network (high level)
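The Unified Agent is driven by a JSON configuration file. The sketch below builds a small one in Python and prints it. The section and measurement names reflect my understanding of the agent's configuration format, and the file path and log group are hypothetical, so treat the whole thing as an assumption to verify against the agent documentation. The same JSON could be stored in SSM Parameter Store for the centralized configuration mentioned earlier.

```python
import json

# A small agent configuration: a few system metrics plus one log file.
AGENT_CONFIG = {
    "metrics": {
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "swap": {"measurement": ["swap_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/my-app/app.log",  # hypothetical path
                        "log_group_name": "my-app",  # hypothetical log group
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

config_json = json.dumps(AGENT_CONFIG, indent=2)
print(config_json)
```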

    CloudWatch Alarms

  • Alarms are used to trigger notifications for any metric
  • Various options (sampling, %, max, min, etc…)
  • Alarm States:
    • OK
    • INSUFFICIENT_DATA
    • ALARM
  • Period:
    • Length of time in seconds to evaluate the metric
    • High resolution custom metrics: 10 sec, 30 sec or multiples of 60 sec
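The three alarm states can be illustrated with a deliberately simplified model: the alarm fires only when each of the last N periods breaches the threshold, and missing data yields INSUFFICIENT_DATA. Real alarms have more options (for example, how missing data is treated), so this is a teaching sketch and not CloudWatch's actual evaluation algorithm. The function name and sample values are invented.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Simplified model: one datapoint per period, None means no data.

    evaluation_periods is assumed to be at least 1.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(p is None for p in recent):
        return "INSUFFICIENT_DATA"
    if all(p > threshold for p in recent):
        return "ALARM"
    return "OK"


# CPU utilization samples, one per period, against an 80% threshold.
print(alarm_state([50, 85, 90, 95], threshold=80, evaluation_periods=3))
print(alarm_state([50, 85, 70, 95], threshold=80, evaluation_periods=3))
print(alarm_state([90], threshold=80, evaluation_periods=3))
```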

      CloudWatch Alarm Targets

  • Stop, Terminate, Reboot, or Recover an EC2 Instance
  • Trigger Auto Scaling Action
  • Send notification to SNS (from which you can do pretty much anything)

    EC2 Instance Recovery

  • Status Check:
    • Instance status = check the EC2 VM
    • System status = check the underlying hardware
  • Recovery: Same Private, Public, Elastic IP, metadata, placement group

    CloudWatch Events

  • Event pattern: Intercept events from AWS services (Sources)
    • Example sources: EC2 instance start, CodeBuild failure, S3, Trusted Advisor
    • Can intercept any API call with CloudTrail integration
  • Schedule or Cron (example: create an event every 4 hours)
  • A JSON payload is created from the event and passed to a target…
    • Compute: Lambda, Batch, ECS task
    • Integration: SQS, SNS, Kinesis Data Streams, Kinesis Data Firehose
    • Orchestration: Step Functions, CodePipeline, CodeBuild
    • Maintenance: SSM, EC2 Actions
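An event pattern is a JSON document that is compared against each incoming event. The sketch below shows a pattern for an EC2 instance entering the running state, together with a much-simplified matcher written for this example. The real service supports more operators (prefix, numeric, and so on) that are not modelled here, and the sample event contains only the fields the pattern looks at.

```python
def matches(pattern, event):
    """Simplified matcher: every pattern key must match; a list means 'any of these'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}

# Trimmed-down sample event; the instance id is made up.
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}

print(matches(PATTERN, event))
```

For the scheduled case, a rule would carry a schedule expression such as `rate(4 hours)` in place of a pattern.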

      Amazon EventBridge

  • EventBridge is the next evolution of CloudWatch Events
  • Default Event Bus - generated by AWS services (CloudWatch Events)
  • Partner Event Bus: receive events from SaaS services or applications (Zendesk, DataDog, Segment, Auth0…)
  • Custom Event Buses: for your own applications
  • Event Buses can be accessed by other AWS accounts
  • You can archive events (all/filter) sent to an event bus (indefinitely or set period)
  • Ability to replay archived events
  • Rules: how to process the events (like CloudWatch Events)

    Amazon EventBridge - Schema Registry

  • EventBridge can analyze the events in your bus and infer the schema
  • The Schema Registry allows you to generate code for your application, that will know in advance how data is structured in the event bus
  • Schema can be versioned

    Amazon EventBridge – Resource-based Policy

  • Manage permissions for a specific Event Bus
  • Example: allow/deny events from another AWS account or AWS region
  • Use case: aggregate all events from your AWS Organization in a single AWS account or AWS region
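A resource-based policy on an event bus is an ordinary IAM-style policy document. As a sketch, the function below builds one that lets another account send events to a central bus. The account IDs, bus name and function name are placeholders invented for the example.

```python
import json


def allow_put_events_policy(bus_arn, source_account_id):
    """Build a policy document letting one account put events on the bus."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": f"AllowPutEventsFrom{source_account_id}",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{source_account_id}:root"},
                "Action": "events:PutEvents",
                "Resource": bus_arn,
            }
        ],
    }


bus_policy = allow_put_events_policy(
    "arn:aws:events:us-east-1:123456789012:event-bus/central-event-bus",  # placeholder
    "111122223333",  # placeholder source account
)
print(json.dumps(bus_policy, indent=2))
```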

    Amazon EventBridge vs CloudWatch Events

  • Amazon EventBridge builds upon and extends CloudWatch Events.
  • It uses the same service API and endpoint, and the same underlying service infrastructure
  • EventBridge allows extension to add event buses for your custom applications and your third-party SaaS apps.
  • EventBridge has the Schema Registry capability
  • EventBridge has a different name to mark the new capabilities
  • Over time, the CloudWatch Events name will be replaced with EventBridge

    AWS CloudTrail

  • Provides governance, compliance and audit for your AWS Account
  • CloudTrail is enabled by default!
  • Get a history of events / API calls made within your AWS Account by:
    • Console
    • SDK
    • CLI
    • AWS Services
  • Can put logs from CloudTrail into CloudWatch Logs or S3
  • A trail can be applied to All Regions (default) or a single Region
  • If a resource is deleted in AWS, investigate CloudTrail first!

    CloudTrail Diagram

    CloudTrail Events

  • Management Events:
    • Operations that are performed on resources in your AWS account
    • Examples:
      • Configuring security (IAM AttachRolePolicy)
      • Configuring rules for routing data (Amazon EC2 CreateSubnet)
      • Setting up logging (AWS CloudTrail CreateTrail)
    • By default, trails are configured to log management events.
    • Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
  • Data Events:
    • By default, data events are not logged (because high volume operations)
    • Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject ): can separate Read and Write Events
    • AWS Lambda function execution activity (the Invoke API)
  • CloudTrail Insights Events
    • See the CloudTrail Insights section below
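Since data events are off by default, a trail has to be told which ones to record. The sketch below assembles an event selector for S3 object-level activity, with the read/write split described above. The field names follow the CloudTrail event selector structure as I understand it, and the bucket ARN and helper name are hypothetical; nothing is sent to AWS.

```python
VALID_READ_WRITE_TYPES = ("ReadOnly", "WriteOnly", "All")


def build_event_selector(bucket_arns, read_write_type="All", include_management=True):
    """Assemble one event selector covering every object in the given buckets."""
    if read_write_type not in VALID_READ_WRITE_TYPES:
        raise ValueError(f"read_write_type must be one of {VALID_READ_WRITE_TYPES}")
    return {
        "ReadWriteType": read_write_type,
        "IncludeManagementEvents": include_management,
        "DataResources": [
            {
                "Type": "AWS::S3::Object",
                # A trailing slash after the bucket ARN targets all objects in it.
                "Values": [f"{arn}/" for arn in bucket_arns],
            }
        ],
    }


selector = build_event_selector(["arn:aws:s3:::my-bucket"], read_write_type="WriteOnly")
print(selector)
```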

      CloudTrail Insights

  • Enable CloudTrail Insights to detect unusual activity in your account:
    • Inaccurate resource provisioning
    • Hitting service limits
    • Bursts of AWS IAM actions
    • Gaps in periodic maintenance activity
  • CloudTrail Insights analyzes normal management events to create a baseline
  • And then continuously analyzes write events to detect unusual patterns
    • Anomalies appear in the CloudTrail console
    • Event is sent to Amazon S3
    • An EventBridge event is generated (for automation needs)

      CloudTrail Events Retention

  • Events are stored for 90 days in CloudTrail
  • To keep events beyond this period, log them to S3 and use Athena

    AWS Config

  • Helps with auditing and recording compliance of your AWS resources
  • Helps record configurations and changes over time
  • Questions that can be solved by AWS Config:
    • Is there unrestricted SSH access to my security groups?
    • Do my buckets have any public access?
    • How has my ALB configuration changed over time?
  • You can receive alerts (SNS notifications) for any changes
  • AWS Config is a per-region service
  • Can be aggregated across regions and accounts
  • Possibility of storing the configuration data into S3 (analyzed by Athena)

    Config Rules

  • Can use AWS managed config rules (over 75)
  • Can make custom config rules (must be defined in AWS Lambda)
    • Ex: evaluate if each EBS disk is of type gp2
    • Ex: evaluate if each EC2 instance is t2.micro
  • Rules can be evaluated / triggered:
    • For each config change
    • And / or: at regular time intervals
  • AWS Config Rules does not prevent actions from happening (no deny)
  • Pricing: no free tier, $0.003 per configuration item recorded per region, $0.001 per config rule evaluation per region
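The core of a custom rule is a function that looks at a resource's recorded configuration and returns a verdict. Below is a minimal sketch of the "is each EBS disk gp2" example as a pure function. The shape of the configuration item (a `resourceType` plus a `configuration.volumeType` field) is an assumption for illustration, and the Lambda handler plumbing that would report the result back to AWS Config is left out.

```python
def evaluate_volume(configuration_item, desired_type="gp2"):
    """Return a compliance verdict for one EBS volume configuration item."""
    if configuration_item.get("resourceType") != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"
    volume_type = configuration_item.get("configuration", {}).get("volumeType")
    return "COMPLIANT" if volume_type == desired_type else "NON_COMPLIANT"


# Hypothetical configuration items.
gp2_volume = {"resourceType": "AWS::EC2::Volume", "configuration": {"volumeType": "gp2"}}
io1_volume = {"resourceType": "AWS::EC2::Volume", "configuration": {"volumeType": "io1"}}
instance = {"resourceType": "AWS::EC2::Instance", "configuration": {}}

for item in (gp2_volume, io1_volume, instance):
    print(item["resourceType"], evaluate_volume(item))
```

The rule only reports; as the notes say, it cannot block the non-compliant volume from being created.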

    AWS Config Resource

  • View compliance of a resource over time
  • View configuration of a resource over time
  • View CloudTrail API calls of a resource over time

    Config Rules – Remediations

  • Automate remediation of non-compliant resources using SSM Automation Documents
  • Use AWS-Managed Automation Documents or create custom Automation Documents
    • Tip: you can create custom Automation Documents that invoke a Lambda function
  • You can set Remediation Retries if the resource is still non-compliant after auto-remediation

    Config Rules – Notifications

  • Use EventBridge to trigger notifications when AWS resources are non-compliant
  • Ability to send configuration changes and compliance state notifications to SNS (all events – use SNS Filtering or filter at client-side)

    CloudWatch vs CloudTrail vs Config

  • CloudWatch
    • Performance monitoring (metrics, CPU, network, etc…) & dashboards
    • Events & Alerting
    • Log Aggregation & Analysis
  • CloudTrail
    • Record API calls made within your Account by everyone
    • Can define trails for specific resources
    • Global Service
  • Config
    • Record configuration changes
    • Evaluate resources against compliance rules
    • Get timeline of changes and compliance

      For an Elastic Load Balancer

  • CloudWatch:
    • Monitoring Incoming connections metric
    • Visualize error codes as % over time
    • Make a dashboard to get an idea of your load balancer performance
  • Config:
    • Track security group rules for the Load Balancer
    • Track configuration changes for the Load Balancer
    • Ensure an SSL certificate is always assigned to the Load Balancer (compliance)
  • CloudTrail:
    • Track who made any changes to the Load Balancer with API calls

      AWS STS – Security Token Service

  • Lets you grant limited, temporary access to AWS resources.
  • Token is valid for up to one hour (must be refreshed)
  • AssumeRole
    • Within your own account: for enhanced security
    • Cross Account Access: assume role in target account to perform actions there
  • AssumeRoleWithSAML
    • return credentials for users logged in with SAML
  • AssumeRoleWithWebIdentity
    • return creds for users logged in with an IdP (Facebook Login, Google Login, OIDC compatible…)
    • AWS recommends against using this, and using Cognito instead
  • GetSessionToken
    • for MFA, from a user or AWS account root user

      Using STS to Assume a Role

  • Define an IAM Role within your account or cross-account
  • Define which principals can access this IAM Role
  • Use AWS STS (Security Token Service) to retrieve credentials and impersonate the IAM Role you have access to (AssumeRole API)
  • Temporary credentials can be valid for between 15 minutes and 1 hour
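The steps above involve two pieces of data: a trust policy on the role that says who may assume it, and the AssumeRole request itself. The sketch below builds both as plain dictionaries. The account IDs, role name, session name and helper names are placeholders, and the duration bounds simply mirror the 15 minutes to 1 hour range quoted in the notes. With boto3 the request would typically be passed to the STS client's `assume_role`; that call is not made here.

```python
MIN_DURATION = 15 * 60  # seconds
MAX_DURATION = 60 * 60  # seconds, upper bound as stated in the notes


def trust_policy(trusted_account_id):
    """Trust policy letting principals of one account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_assume_role_request(account_id, role_name, session_name,
                              duration_seconds=MAX_DURATION):
    """Assemble the arguments for an AssumeRole call."""
    if not MIN_DURATION <= duration_seconds <= MAX_DURATION:
        raise ValueError("duration must be between 900 and 3600 seconds")
    return {
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
        "RoleSessionName": session_name,
        "DurationSeconds": duration_seconds,
    }


request = build_assume_role_request("123456789012", "ReadOnlyAuditor", "audit-session", 900)
print(request)
print(trust_policy("111122223333"))
```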

    Cross account access with STS

    Identity Federation in AWS

  • Federation lets users outside of AWS assume a temporary role to access AWS resources.
  • These users assume an access role provided by the identity provider.
  • Federations can have many flavors:
    • SAML 2.0
    • Custom Identity Broker
    • Web Identity Federation with Amazon Cognito
    • Web Identity Federation without Amazon Cognito
    • Single Sign On
    • Non-SAML with AWS Microsoft AD
  • Using federation, you don’t need to create IAM users (user management is outside of AWS)

    SAML 2.0 Federation

  • To integrate Active Directory / ADFS with AWS (or any SAML 2.0)
  • Provides access to AWS Console or CLI (through temporary creds)
  • No need to create an IAM user for each of your employees

    SAML 2.0 Federation – Active Directory FS

  • Same process as with any SAML 2.0 compatible IdP

    SAML 2.0 Federation

  • You need to set up a trust between AWS IAM and the SAML identity provider (both ways)
  • SAML 2.0 enables web-based, cross domain SSO
  • Uses the STS API: AssumeRoleWithSAML
  • Note: federation through SAML is the “old way” of doing things
  • AWS Single Sign-On (SSO) federation is the newer, managed, and simpler way
    • Read more here: https://aws.amazon.com/blogs/security/enabling-federation-to-aws-using-windows-active-directory-adfs-and-saml-2-0/

      Custom Identity Broker Application

  • Use only if identity provider is not compatible with SAML 2.0
  • The identity broker must determine the appropriate IAM policy
  • Uses the STS API: AssumeRole or GetFederationToken

    Web Identity Federation – AssumeRoleWithWebIdentity

  • Not recommended by AWS – use Cognito instead (allows for anonymous users, data synchronization, MFA)

    AWS Cognito

  • Goal:
    • Provide direct access to AWS Resources from the Client Side (mobile, web app)
  • Example:
    • provide (temporary) access to write to S3 bucket using Facebook Login
  • Problem:
    • We don’t want to create IAM users for our app users
  • How:
    • Log in to federated identity provider – or remain anonymous
    • Get temporary AWS credentials back from the Federated Identity Pool
    • These credentials come with a pre-defined IAM policy stating their permissions

      What is Microsoft Active Directory (AD)?

  • Found on any Windows Server with AD Domain Services
  • Database of objects: User Accounts, Computers, Printers, File Shares, Security Groups
  • Centralized security management, create account, assign permissions
  • Objects are organized in trees
  • A group of trees is a forest

    AWS Directory Services

  • AWS Managed Microsoft AD
    • Create your own AD in AWS, manage users locally, supports MFA
    • Establish “trust” connections with your on-premises AD
  • AD Connector
    • Directory Gateway (proxy) to redirect to on-premises AD, supports MFA
    • Users are managed on the on-premises AD
  • Simple AD
    • AD-compatible managed directory on AWS
    • Cannot be joined with on-premises AD

      AWS Organizations

  • Global service
  • Lets you manage multiple AWS accounts
  • The main account is the master account – you can’t change it
  • Other accounts are member accounts
  • Member accounts can only be part of one organization
  • Consolidated Billing across all accounts - single payment method
  • Pricing benefits from aggregated usage (volume discount for EC2, S3…)
  • API is available to automate AWS account creation

    Multi Account Strategies

  • Create accounts per department, per cost center, per environment (dev / test / prod), based on regulatory restrictions (using SCP), for better resource isolation (ex: VPC), to have separate per-account service limits, or as an isolated account for logging
  • Multi Account vs One Account Multi VPC
  • Use tagging standards for billing purposes
  • Enable CloudTrail on all accounts, send logs to central S3 account
  • Send CloudWatch logs to central logging account
  • Establish cross account roles for admin purposes

    Organizational Units (OU) - Examples

    AWS Organization

    Service Control Policies (SCP)

  • Whitelist or blacklist IAM actions
  • Applied at the OU or Account level
  • Does not apply to the Master Account
  • SCP is applied to all the Users and Roles of the Account, including Root user
  • The SCP does not affect service-linked roles
    • Service-linked roles enable other AWS services to integrate with AWS Organizations and can’t be restricted by SCPs.
  • SCP must have an explicit Allow (does not allow anything by default)
  • Use cases:
    • Restrict access to certain services (for example: can’t use EMR)
    • Enforce PCI compliance by explicitly disabling services
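Here is a sketch of what the "can't use EMR" use case might look like as a policy document. It pairs the explicit Allow that an SCP needs with a Deny on the service. The `Sid` values are invented labels, and `elasticmapreduce` is, as far as I know, the IAM action prefix for EMR; check it before use.

```python
import json

# Blacklist-style SCP: allow everything, then explicitly deny one service.
SCP_DENY_EMR = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "AllowEverything", "Effect": "Allow", "Action": "*", "Resource": "*"},
        {"Sid": "DenyEMR", "Effect": "Deny", "Action": "elasticmapreduce:*", "Resource": "*"},
    ],
}

print(json.dumps(SCP_DENY_EMR, indent=2))
```

A whitelist-style SCP would drop the broad Allow and list only the permitted actions.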

      SCP Hierarchy

      AWS Organization – Moving Accounts

  • To migrate accounts from one organization to another
    1. Remove the member account from the old organization
    2. Send an invite to the new organization
    3. Accept the invite to the new organization from the member account
  • If you want the master account of the old organization to also join the new organization, do the following:
    1. Remove the member accounts from the old organization using the procedure above
    2. Delete the old organization
    3. Repeat the process above to invite the old master account to the new org

      IAM Conditions

      IAM for S3

  • ListBucket permission applies to arn:aws:s3:::test
  • => bucket level permission
  • GetObject, PutObject, DeleteObject applies to arn:aws:s3:::test/*
  • => object level permission
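The bucket-level versus object-level split shows up in the Resource field of each statement. As a sketch, the function below builds an identity policy for a bucket named `test` as in the notes; the function name is invented for the example.

```python
import json


def s3_policy(bucket):
    """Build a policy with one bucket-level and one object-level statement."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: the resource is the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level permissions: the resource is the objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = s3_policy("test")
print(json.dumps(policy, indent=2))
```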

    IAM Roles vs Resource Based Policies

  • Attach a policy to a resource (example: S3 bucket policy) versus using a role as a proxy

  • When you assume a role (user, application or service), you give up your original permissions and take the permissions assigned to the role
  • When using a resource based policy, the principal doesn’t have to give up their permissions
  • Example: User in account A needs to scan a DynamoDB table in Account A and dump it in an S3 bucket in Account B
  • Supported by: Amazon S3 buckets, SNS topics, SQS queues, etc…

    IAM Permission Boundaries

  • IAM Permission Boundaries are supported for users and roles (not groups)
  • Advanced feature to use a managed policy to set the maximum permissions an IAM entity can get.

  • Can be used in combination with AWS Organizations SCPs
  • Use cases
    • Delegate responsibilities to non administrators within their permission
    • Allow developers to self-assign policies and manage their own permissions, while making sure they can’t escalate their privileges (= make themselves admin)
    • Useful to restrict one specific user (instead of a whole account using Organizations & SCP)
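One way to picture a permission boundary is as an intersection: an action is effectively allowed only if both the identity policy and the boundary allow it. The sketch below models just that idea with wildcard matching. It ignores explicit denies, resources and conditions, and the example action lists are invented.

```python
from fnmatch import fnmatchcase


def allowed_by(action, patterns):
    """True if the action matches any allow pattern such as 's3:*'."""
    return any(fnmatchcase(action, pattern) for pattern in patterns)


def effective_allow(action, identity_allows, boundary_allows):
    """An action is effective only if the policy AND the boundary allow it."""
    return allowed_by(action, identity_allows) and allowed_by(action, boundary_allows)


BOUNDARY = ["s3:*", "cloudwatch:*", "ec2:*"]  # maximum permissions
IDENTITY_POLICY = ["iam:CreateUser", "s3:GetObject"]  # what the user was granted

for action in ("s3:GetObject", "iam:CreateUser", "ec2:RunInstances"):
    print(action, effective_allow(action, IDENTITY_POLICY, BOUNDARY))
```

In this example `iam:CreateUser` is granted by the identity policy but falls outside the boundary, so a developer who self-assigns it still cannot use it.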

      IAM Policy Evaluation Logic

      Example IAM Policy

  • Can you perform sqs:CreateQueue?
  • Can you perform sqs:DeleteQueue?
  • Can you perform ec2:DescribeInstances?
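The policy these questions refer to is not reproduced in these notes, so the sketch below uses a hypothetical stand-in policy to show how such questions are usually reasoned through: an explicit Deny wins, otherwise an explicit Allow is needed, and anything else is implicitly denied. The evaluator is a simplification that ignores resources, conditions, SCPs and boundaries.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in policy; NOT the policy from the original example.
POLICY = [
    {"Effect": "Deny", "Action": ["sqs:*"]},
    {"Effect": "Allow", "Action": ["sqs:DeleteQueue"]},
]


def is_allowed(action, statements):
    """Explicit Deny beats Allow; no matching Allow means implicit deny."""
    matched = [
        s for s in statements
        if any(fnmatchcase(action, pattern) for pattern in s["Action"])
    ]
    if any(s["Effect"] == "Deny" for s in matched):
        return False
    return any(s["Effect"] == "Allow" for s in matched)


for action in ("sqs:CreateQueue", "sqs:DeleteQueue", "ec2:DescribeInstances"):
    print(action, is_allowed(action, POLICY))
```

Under this stand-in policy all three actions are refused: the two SQS actions by the explicit Deny, and the EC2 action because nothing allows it.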

    AWS Resource Access Manager (RAM)

  • Share AWS resources that you own with other AWS accounts
  • Share with any account or within your Organization
  • Avoid resource duplication!
  • VPC Subnets:
    • allows all the resources to be launched in the same subnets
    • accounts must be in the same AWS Organization.
    • Cannot share security groups and default VPC
    • Participants can manage their own resources in there
    • Participants can’t view, modify, delete resources that belong to other participants or the owner
  • AWS Transit Gateway
  • Route53 Resolver Rules
  • License Manager Configurations

    Resource Access Manager – VPC example

  • Each account…
    • is responsible for its own resources
    • cannot view, modify or delete other resources in other accounts
  • Network is shared so…
    • Anything deployed in the VPC can talk to other resources in the VPC
    • Applications are accessed easily across accounts, using private IP!
    • Security groups from other accounts can be referenced for maximum security

      AWS Single Sign-On (SSO)

  • Centrally manage Single Sign-On to access multiple accounts and 3rd-party business applications.
  • Integrated with AWS Organizations
  • Supports SAML 2.0 markup
  • Integration with on-premises Active Directory
  • Centralized permission management
  • Centralized auditing with CloudTrail

    AWS Single Sign-On (SSO) – Setup with AD

    SSO – vs AssumeRoleWithSAML

This post is licensed under CC BY 4.0 by the author.