Chapter 9 — Monitoring & Incident Response

PART IV — OPERATIONS, COMPLIANCE, AND COST

Required Log Sources

| Log Type | AWS Sources | Azure Sources | GCP Sources | Retention |
|---|---|---|---|---|
| Control Plane | CloudTrail, Config | Activity Log, Azure Policy | Cloud Audit Logs, Cloud Logging | 365 days |
| Network | VPC Flow Logs | NSG Flow Logs | VPC Flow Logs | 90 days |
| Application | CloudWatch Logs | Application Insights | Cloud Logging | 90 days |
| Security | GuardDuty, Inspector | Microsoft Defender for Cloud (formerly Security Center) | Security Command Center | 365 days |
| Database | RDS Logs, CloudTrail | Azure SQL Auditing | Cloud SQL Logs | 90 days |
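The retention column can be checked automatically as part of a periodic audit. The following is a minimal sketch that assumes each log destination reports its configured retention in days. `REQUIRED_RETENTION_DAYS` and `retention_gaps` are hypothetical names, and the values come from the table above.

```python
# Minimum retention per log type, taken from the Required Log Sources table.
REQUIRED_RETENTION_DAYS = {
    "control_plane": 365,
    "network": 90,
    "application": 90,
    "security": 365,
    "database": 90,
}


def retention_gaps(configured: dict[str, int]) -> dict[str, int]:
    """Return {log_type: shortfall_in_days} for every type below its minimum.

    Log types missing from `configured` are treated as having zero retention.
    """
    gaps = {}
    for log_type, required in REQUIRED_RETENTION_DAYS.items():
        actual = configured.get(log_type, 0)
        if actual < required:
            gaps[log_type] = required - actual
    return gaps


if __name__ == "__main__":
    # Hypothetical inventory gathered from the log destinations.
    inventory = {"control_plane": 400, "network": 30, "application": 90, "security": 365}
    print(retention_gaps(inventory))  # {'network': 60, 'database': 90}
```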

Centralized Log Architecture

LogAggregation:
  Collection:
    - AWS: CloudWatch Logs → S3 → SIEM
    - Azure: Log Analytics → Event Hub → SIEM
    - GCP: Cloud Logging → Pub/Sub → BigQuery → SIEM
  Processing:
    - Parsing: Field extraction and normalization
    - Enrichment: Threat intelligence integration
    - Correlation: Multi-source event correlation
    - Storage: Hot/warm/cold tier optimization
  Analysis:
    - Real-time: Stream processing for alerts
    - Batch: Historical trend analysis
    - ML: Anomaly detection and pattern recognition
    - Forensics: Long-term query capabilities
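Three of the processing stages can be sketched as small functions: parsing, threat-intelligence enrichment, and storage-tier selection. This is an illustration only. It assumes JSON log lines, and it treats the threat-intelligence set and the 7-day and 30-day tier boundaries as hypothetical placeholders rather than recommended values.

```python
import json
from datetime import datetime, timezone

# Hypothetical threat-intelligence indicators; a real pipeline would refresh
# this set from a feed rather than hard-code it.
KNOWN_BAD_IPS = {"198.51.100.7"}

# Hypothetical tier boundaries (days); tune to query patterns and storage cost.
HOT_DAYS = 7
WARM_DAYS = 30


def parse(raw_line: str) -> dict:
    """Parsing: decode a JSON log line; lowercase keys and turn hyphens into underscores."""
    record = json.loads(raw_line)
    return {key.lower().replace("-", "_"): value for key, value in record.items()}


def enrich(event: dict) -> dict:
    """Enrichment: flag events whose source IP matches threat intelligence."""
    event["threat_intel_match"] = event.get("source_ip") in KNOWN_BAD_IPS
    return event


def storage_tier(event_time: datetime, now: datetime) -> str:
    """Storage: pick a tier from the event's age in whole days."""
    age_days = (now - event_time).days
    if age_days < HOT_DAYS:
        return "hot"
    if age_days < WARM_DAYS:
        return "warm"
    return "cold"


if __name__ == "__main__":
    event = enrich(parse('{"Source-IP": "198.51.100.7", "Event-Type": "login"}'))
    print(event)
    # {'source_ip': '198.51.100.7', 'event_type': 'login', 'threat_intel_match': True}
    now = datetime(2025, 2, 11, tzinfo=timezone.utc)
    print(storage_tier(datetime(2025, 1, 1, tzinfo=timezone.utc), now))  # cold
```

Correlation is left out because it needs state across many events. A windowed approach like the one used for detection rules is one option.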

Common Event Format (CEF)

CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension

Custom Log Schema

{
  "timestamp": "2025-02-11T10:30:45Z",
  "event_type": "iam_policy_change",
  "severity": "high",
  "source": {
    "service": "aws",
    "region": "us-east-1",
    "account": "123456789012"
  },
  "actor": {
    "user": "[email protected]",
    "ip": "203.0.113.45",
    "mfa_status": "verified"
  },
  "action": {
    "operation": "AttachUserPolicy",
    "resource": "arn:aws:iam::123456789012:user/user1",
    "policy": "arn:aws:iam::aws:policy/AdministratorAccess"
  },
  "risk_score": 85
}
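The schema above can be populated by mapping a raw CloudTrail record onto it. The following is a sketch for `AttachUserPolicy` events only. It reads standard CloudTrail fields (`eventTime`, `awsRegion`, `recipientAccountId`, `sourceIPAddress`, `userIdentity`, `requestParameters`), but the scoring weights, the severity cut-offs, and the `not_verified` value are hypothetical. Because `actor.user` is filled from the caller's ARN, it will not be an email address as in the example.

```python
def risk_score(policy_arn: str, mfa_verified: bool) -> int:
    # Hypothetical weights: admin-level policies raise the score, as does a
    # change made without MFA. Calibrate against your own incident history.
    score = 60
    if "Admin" in policy_arn or "FullAccess" in policy_arn:
        score += 25
    if not mfa_verified:
        score += 10
    return min(score, 100)


def severity_for(score: int) -> str:
    # Hypothetical cut-offs between severity labels.
    if score >= 80:
        return "high"
    if score >= 50:
        return "medium"
    return "low"


def normalize_attach_user_policy(record: dict) -> dict:
    """Map a CloudTrail AttachUserPolicy record onto the custom log schema."""
    identity = record.get("userIdentity", {})
    # CloudTrail reports mfaAuthenticated as the string "true" or "false".
    mfa_verified = (
        identity.get("sessionContext", {}).get("attributes", {}).get("mfaAuthenticated") == "true"
    )
    account = record["recipientAccountId"]
    params = record["requestParameters"]
    score = risk_score(params["policyArn"], mfa_verified)
    return {
        "timestamp": record["eventTime"],
        "event_type": "iam_policy_change",
        "severity": severity_for(score),
        "source": {"service": "aws", "region": record["awsRegion"], "account": account},
        "actor": {
            "user": identity.get("arn", "unknown"),
            "ip": record["sourceIPAddress"],
            "mfa_status": "verified" if mfa_verified else "not_verified",
        },
        "action": {
            "operation": record["eventName"],
            "resource": f"arn:aws:iam::{account}:user/{params['userName']}",
            "policy": params["policyArn"],
        },
        "risk_score": score,
    }
```

With these weights, an MFA-authenticated attachment of `AdministratorAccess` scores 85. The same change without MFA scores 95.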

Detection Rules Framework

DetectionRules:
  PrivilegeEscalation:
    - Rule: "IAM Policy Attachment to Sensitive Roles"
      Conditions:
        - action: "AttachUserPolicy OR AttachRolePolicy"
        - policy: "*Admin* OR *FullAccess*"
        - user_not_in: ["security_team", "devops_team"]
      Severity: "High"
      Response: "Alert + Block"
    - Rule: "Multiple Failed MFA Attempts"
      Conditions:
        - event: "ConsoleLogin"
        - outcome: "Failure"
        - mfa_failure_count: "> 3 in 5 minutes"
        - same_user: true
      Severity: "Medium"
      Response: "Alert + Temporary Lockout"
  DataExfiltration:
    - Rule: "Large Data Transfer to External Destinations"
      Conditions:
        - data_volume: "> 1GB in 1 hour"
        - destination: "external IP ranges"
        - protocol: "HTTPS, FTP, SFTP"
      Severity: "High"
      Response: "Alert + Block + Investigate"
    - Rule: "Unusual S3 Access Patterns"
      Conditions:
        - unusual_time_access: "2AM-5AM"
        - download_volume: "10x normal baseline"
        - new_ip_address: true
      Severity: "Medium"
      Response: "Alert + MFA Challenge"
  NetworkAnomalies:
    - Rule: "Port Scanning Activity"
      Conditions:
        - connection_attempts: "> 100 ports"
        - time_window: "5 minutes"
        - source: "single IP"
      Severity: "Medium"
      Response: "Block Source IP + Alert"
    - Rule: "Lateral Movement Detection"
      Conditions:
        - internal_connections: "> normal baseline"
        - privileged_ports: [22, 3389, 1433, 3306]
        - time_window: "1 hour"
      Severity: "High"
      Response: "Alert + Investigate"
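Two of these rules are "more than N within a time window" checks: failed MFA attempts per user and distinct ports per source IP. One sliding-window class can cover both. The following is a minimal in-memory sketch. It assumes events arrive in timestamp order and that each failed login carries a unique event ID. `SlidingWindowRule` is a hypothetical name, not a SIEM API.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class SlidingWindowRule:
    """Fire when a key accumulates more than `threshold` distinct values within `window`.

    Counting distinct values lets one class serve both rules:
    failed logins per user (value = event ID) and ports per source IP (value = port).
    """

    def __init__(self, threshold: int, window: timedelta):
        self.threshold = threshold
        self.window = window
        self._events = defaultdict(deque)  # key -> deque of (timestamp, value)

    def observe(self, key: str, value, ts: datetime) -> bool:
        events = self._events[key]
        events.append((ts, value))
        # Evict observations older than the window (assumes in-order arrival).
        while events and ts - events[0][0] > self.window:
            events.popleft()
        # Rebuilding the set each call is O(n); fine for a sketch.
        return len({v for _, v in events}) > self.threshold


if __name__ == "__main__":
    start = datetime(2025, 2, 11, 10, 0)
    mfa_rule = SlidingWindowRule(threshold=3, window=timedelta(minutes=5))
    print([mfa_rule.observe("user1", f"evt-{i}", start + timedelta(minutes=i)) for i in range(4)])
    # [False, False, False, True]
    scan_rule = SlidingWindowRule(threshold=100, window=timedelta(minutes=5))
    print(any(scan_rule.observe("198.51.100.7", port, start + timedelta(seconds=port))
              for port in range(1, 102)))  # True
```

Production SIEMs typically express these rules in their own query languages. The sketch only illustrates the windowing logic behind them.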

Alert Escalation Matrix

| Severity | Response Time | Escalation | Actions |
|---|---|---|---|
| Critical | < 5 minutes | Immediate page | Incident response team |
| High | < 15 minutes | 30 min escalation | Security team notification |
| Medium | < 1 hour | 4 hour escalation | Email + ticket creation |
| Low | < 24 hours | Weekly review | Log entry only |
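An alerting integration could encode the matrix directly so that breaches and escalations are computed rather than remembered. The following is a sketch under two stated assumptions:
- Timed escalation applies only to unacknowledged alerts.
- Critical alerts page immediately regardless of acknowledgement.

`alert_status` and the dictionary names are hypothetical.

```python
from datetime import datetime, timedelta

# Targets from the Alert Escalation Matrix. Low alerts go to a weekly review
# instead of a timed escalation, so they have no escalation delay here.
RESPONSE_TIME = {
    "critical": timedelta(minutes=5),
    "high": timedelta(minutes=15),
    "medium": timedelta(hours=1),
    "low": timedelta(hours=24),
}
ESCALATE_AFTER = {
    "critical": timedelta(0),  # immediate page
    "high": timedelta(minutes=30),
    "medium": timedelta(hours=4),
    "low": None,  # weekly review
}


def alert_status(severity: str, raised_at: datetime, now: datetime, acknowledged: bool) -> dict:
    """Report whether an alert missed its response target and whether to escalate."""
    sev = severity.lower()
    age = now - raised_at
    delay = ESCALATE_AFTER[sev]
    if delay is None:
        escalate = False
    elif delay == timedelta(0):
        escalate = True  # Critical: page immediately
    else:
        escalate = not acknowledged and age >= delay
    return {
        "sla_breached": not acknowledged and age > RESPONSE_TIME[sev],
        "escalate": escalate,
    }


if __name__ == "__main__":
    t0 = datetime(2025, 2, 11, 10, 0)
    print(alert_status("High", t0, t0 + timedelta(minutes=20), acknowledged=False))
    # {'sla_breached': True, 'escalate': False}
```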

Incident Response (IR) Workflow

graph TD
  %% Classes
  classDef stage fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
  classDef action fill:#f3f4f6,stroke:#374151,stroke-width:2px,color:#1f2937
  classDef decision fill:#fce7f3,stroke:#db2777,stroke-width:2px,color:#9d174d

  Prepare[Preparation]:::stage
  Detect[Detection & Analysis]:::stage
  Contain[Containment]:::stage
  Eradicate[Eradication]:::stage
  Recover[Recovery]:::stage
  Lesson[Post-Incident Activity]:::stage

  Alert(Security Alert):::action
  Triage{False Positive?}:::decision
  Scope{Determine Scope}:::decision
  Clean(Remove Threat):::action
  Restore(Restore Service):::action
  Report(Final Report):::action

  Prepare --> Detect
  Detect --> Alert
  Alert --> Triage

  Triage -->|Yes| Detect
  Triage -->|No| Scope

  Scope --> Contain
  Contain --> Eradicate
  Eradicate --> Clean
  Clean --> Recover
  Recover --> Restore
  Restore --> Lesson
  Lesson --> Report
  Report --> Prepare

  %% Styling
  linkStyle default stroke:#9ca3af,stroke-width:2px
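A case-management tool could enforce the diagram's arrows as allowed state transitions so that an incident cannot skip from detection straight to recovery. The following is a minimal sketch. The `Phase`, `TRANSITIONS`, and `Incident` names are hypothetical. The false-positive branch at triage is modelled as a transition from Detection & Analysis back to itself.

```python
from enum import Enum


class Phase(Enum):
    PREPARATION = "Preparation"
    DETECTION_ANALYSIS = "Detection & Analysis"
    CONTAINMENT = "Containment"
    ERADICATION = "Eradication"
    RECOVERY = "Recovery"
    POST_INCIDENT = "Post-Incident Activity"


# Allowed transitions, following the arrows in the workflow diagram.
TRANSITIONS = {
    Phase.PREPARATION: {Phase.DETECTION_ANALYSIS},
    Phase.DETECTION_ANALYSIS: {Phase.CONTAINMENT, Phase.DETECTION_ANALYSIS},  # false positive loops back
    Phase.CONTAINMENT: {Phase.ERADICATION},
    Phase.ERADICATION: {Phase.RECOVERY},
    Phase.RECOVERY: {Phase.POST_INCIDENT},
    Phase.POST_INCIDENT: {Phase.PREPARATION},  # lessons learned feed preparation
}


class Incident:
    def __init__(self, incident_id: str):
        self.incident_id = incident_id
        self.phase = Phase.PREPARATION
        self.history = [self.phase]

    def advance(self, target: Phase) -> None:
        """Move to `target`, rejecting any transition the workflow does not allow."""
        if target not in TRANSITIONS[self.phase]:
            raise ValueError(f"{self.phase.value} -> {target.value} is not an allowed transition")
        self.phase = target
        self.history.append(target)
```

Keeping `history` gives the post-incident review a record of when each phase was entered. A real tool would also store timestamps.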

Detection Mechanisms

DetectionSources:
  Automated:
    - SIEM correlation rules
    - Threat intelligence feeds
    - Anomaly detection algorithms
    - Vulnerability scan results
    - User behavior analytics
  Manual:
    - Security team monitoring
    - Employee reports
    - External notifications
    - Compliance audit findings
  Indicators:
    - Unauthorized access attempts
    - Data access anomalies
    - Configuration changes
    - Performance degradation
    - Alert floods
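As an illustration of the simplest kind of anomaly detection, the following sketch compares observations with a historical baseline using a z-score. Applied to hourly alert counts, it can surface alert floods. This is a deliberately basic stand-in technique. User behavior analytics products typically use richer models. The 3.0 threshold and the sample data are hypothetical.

```python
import statistics


def zscore_anomalies(baseline: list[float], observations: list[float], threshold: float = 3.0) -> list[int]:
    """Return indices of observations whose z-score against the baseline exceeds `threshold`."""
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: any deviation at all is anomalous.
        return [i for i, x in enumerate(observations) if x != mean]
    return [i for i, x in enumerate(observations) if abs(x - mean) / stdev > threshold]


if __name__ == "__main__":
    hourly_alert_counts = [10, 12, 11, 9, 10, 13, 11, 12]  # hypothetical baseline
    print(zscore_anomalies(hourly_alert_counts, [11, 14, 60]))  # [2]
```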

Containment Strategies

ContainmentActions:
  Network:
    - Block malicious IP addresses
    - Isolate compromised subnets
    - Disable compromised accounts
    - Implement network segmentation
  System:
    - Isolate affected instances
    - Disable compromised credentials
    - Stop malicious processes
    - Snapshot evidence
  Data:
    - Prevent data exfiltration
    - Implement additional encryption
    - Restrict data access
    - Preserve evidence
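On AWS, "Isolate affected instances" and "Snapshot evidence" can be combined in one step. The following sketch takes a boto3 EC2 client as a parameter and uses `modify_instance_attribute`, `describe_instances`, and `create_snapshot`. It assumes three things:
- `isolation_sg_id` refers to a pre-created security group with no rules.
- The instance has a single network interface.
- The incident ID is a free-form label.

Security group changes may not immediately terminate connections that are already being tracked. Treat this as a first step, not complete isolation.

```python
def isolate_instance(ec2, instance_id: str, isolation_sg_id: str, incident_id: str) -> list[str]:
    """Isolate an EC2 instance, then start snapshots of its EBS volumes as evidence.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the IDs of the snapshots that were started.
    """
    # 1. Isolate: replace all of the instance's security groups with the
    #    isolation group so new connections are denied.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[isolation_sg_id])

    # 2. Snapshot evidence: one snapshot per attached EBS volume, labelled
    #    with the incident ID to support chain of custody.
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    snapshot_ids = []
    for mapping in instance.get("BlockDeviceMappings", []):
        if "Ebs" not in mapping:
            continue
        volume_id = mapping["Ebs"]["VolumeId"]
        snapshot = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"{incident_id} evidence from {instance_id} ({volume_id})",
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids


# Usage (requires AWS credentials; IDs are placeholders):
#   import boto3
#   isolate_instance(boto3.client("ec2"), "i-0123456789abcdef0",
#                    "sg-0123456789abcdef0", "IR-2025-001")
```

Passing the client in, rather than creating it inside the function, keeps the step testable with a stub and lets responders target a specific account or region.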

Forensic Investigation Process

InvestigationSteps:
  EvidenceCollection:
    - System memory dumps
    - Disk images
    - Network captures
    - Log files
    - Configuration snapshots
  TimelineReconstruction:
    - Initial compromise point
    - Lateral movement paths
    - Data access patterns
    - Persistence mechanisms
    - Exfiltration methods
  ImpactAssessment:
    - Affected systems and data
    - Data breach scope
    - Business impact analysis
    - Regulatory notification requirements
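Timeline reconstruction usually starts by interleaving events from several log sources into a single chronological sequence. The following sketch assumes each source list is already sorted and carries an ISO 8601 `timestamp` field in UTC, matching the custom log schema. It uses `heapq.merge`, which interleaves the sorted inputs without re-sorting them. `build_timeline` is a hypothetical name.

```python
import heapq
from datetime import datetime


def _event_time(event: dict) -> datetime:
    # fromisoformat() does not accept a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))


def build_timeline(*sources: list[dict]) -> list[dict]:
    """Merge per-source, already-sorted event lists into one chronological timeline."""
    return list(heapq.merge(*sources, key=_event_time))


if __name__ == "__main__":
    cloudtrail = [{"timestamp": "2025-02-11T10:30:45Z", "src": "cloudtrail", "what": "AttachUserPolicy"}]
    flow_logs = [
        {"timestamp": "2025-02-11T10:29:00Z", "src": "vpc_flow", "what": "inbound SSH"},
        {"timestamp": "2025-02-11T10:35:00Z", "src": "vpc_flow", "what": "large egress"},
    ]
    for event in build_timeline(cloudtrail, flow_logs):
        print(event["timestamp"], event["src"], event["what"])
```

Clock skew between sources can reorder closely spaced events, so timelines built this way should be read with that tolerance in mind.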

Eradication Activities

EradicationTasks:
  MalwareRemoval:
    - Scan and clean systems
    - Remove persistence mechanisms
    - Patch vulnerabilities
    - Update security controls
  AccessControl:
    - Reset all credentials
    - Review and update permissions
    - Implement additional MFA
    - Strengthen authentication
  SystemHardening:
    - Update security configurations
    - Implement additional monitoring
    - Deploy endpoint protection
    - Harden network controls
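For AWS IAM users, part of "Reset all credentials" is cutting off long-term access keys. The following sketch deactivates rather than deletes the keys so they remain available to the investigation. It uses the boto3 IAM calls `list_access_keys` and `update_access_key`. No pagination is needed because an IAM user can hold at most two access keys. This covers access keys only. Console passwords, other users, and temporary credentials that were already issued need separate handling. `deactivate_access_keys` is a hypothetical name.

```python
def deactivate_access_keys(iam, user_name: str) -> list[str]:
    """Set every active access key for `user_name` to Inactive.

    `iam` is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    Returns the IDs of the keys that were deactivated.
    """
    deactivated = []
    for key in iam.list_access_keys(UserName=user_name)["AccessKeyMetadata"]:
        if key["Status"] == "Active":
            iam.update_access_key(
                UserName=user_name, AccessKeyId=key["AccessKeyId"], Status="Inactive"
            )
            deactivated.append(key["AccessKeyId"])
    return deactivated


# Usage (requires AWS credentials; the user name is a placeholder):
#   import boto3
#   deactivate_access_keys(boto3.client("iam"), "user1")
```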

Recovery Planning

RecoverySteps:
  SystemRestoration:
    - Restore from clean backups
    - Validate system integrity
    - Reinstall critical applications
    - Test functionality
  Validation:
    - Security testing
    - Performance validation
    - Access control verification
    - Monitoring confirmation
  Communication:
    - Stakeholder notifications
    - Customer communications
    - Regulatory reports
    - Post-incident briefings
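One common way to "Validate system integrity" after a restore is to compare file hashes against a known-good manifest. The following sketch uses SHA-256 from the standard library. It assumes the baseline manifest was produced from a trusted source, such as the build pipeline, and stored where an attacker could not modify it. `verify_against_baseline` is a hypothetical name.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_baseline(root: Path, baseline: dict[str, str]) -> dict[str, list[str]]:
    """Compare files under `root` with a known-good {relative_path: sha256} manifest."""
    current = {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in root.rglob("*")
        if p.is_file()
    }
    return {
        "modified": sorted(k for k in baseline if k in current and current[k] != baseline[k]),
        "missing": sorted(k for k in baseline if k not in current),
        "unexpected": sorted(k for k in current if k not in baseline),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "app.py").write_text("print('ok')\n")
        baseline = {"app.py": sha256_file(root / "app.py")}
        (root / "app.py").write_text("print('tampered')\n")
        print(verify_against_baseline(root, baseline))
        # {'modified': ['app.py'], 'missing': [], 'unexpected': []}
```

Files reported as "unexpected" are often the most interesting result, since dropped tooling and persistence scripts show up there.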

Post-Incident Activities

PostIncidentActivities:
  RootCauseAnalysis:
    - Identify security gaps
    - Analyze detection failures
    - Review response effectiveness
    - Document lessons learned
  ImprovementPlanning:
    - Update security controls
    - Enhance monitoring capabilities
    - Improve response procedures
    - Conduct additional training
  KnowledgeSharing:
    - Update incident response playbooks
    - Share threat intelligence
    - Update security awareness training
    - Document for compliance audits
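"Review response effectiveness" is easier to track over time with a few timing metrics computed from incident records. The following sketch computes mean time to detect, contain, and recover. Definitions of these metrics vary between organizations. Here, detection is measured from the estimated compromise time, and containment and recovery are measured from detection. `IncidentTimes` and `response_metrics` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimes:
    occurred: datetime   # best estimate of the initial compromise
    detected: datetime   # first alert or report
    contained: datetime
    recovered: datetime


def mean_delta(deltas: list[timedelta]) -> timedelta:
    return sum(deltas, timedelta(0)) / len(deltas)


def response_metrics(incidents: list[IncidentTimes]) -> dict[str, timedelta]:
    """Average detection, containment, and recovery times across incidents."""
    return {
        "mean_time_to_detect": mean_delta([i.detected - i.occurred for i in incidents]),
        "mean_time_to_contain": mean_delta([i.contained - i.detected for i in incidents]),
        "mean_time_to_recover": mean_delta([i.recovered - i.detected for i in incidents]),
    }
```

Tracking these values per quarter shows whether the improvements planned after each incident actually shorten the next response.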