Administer your Azure cloud estate

This article explains how to manage your Azure cloud estate effectively to maintain operational health. You need strong administrative control over your cloud operations so that the cloud stays aligned with your business objectives. Follow these best practices:

Identify your management scope

Clearly determine your management scope for each deployment model so that you can make informed management decisions for your cloud estate. In Azure, you operate infrastructure services (IaaS) and platform services (PaaS). Compare these responsibilities with those of on-premises environments and software services (SaaS). Use the following table to identify your responsibilities in each deployment model.

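| Administration area | On-premises scope | IaaS scope (Azure) | PaaS scope (Azure) | SaaS scope |
|---|---|---|---|---|
| Data | X | X | X | X |
| Code and runtime | X | X | X | |
| Cloud resources | X | X | X | |
| Operating system | X | X | | |
| Virtualization layer | X | | | |
| Physical hardware | X | | | |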

Manage change

Change is the most common source of problems in the cloud. As a result, you need a change management approach that tracks changes and their approvals. It should also detect unapproved changes and revert them to a desired state. Follow these steps:

  1. Develop a change request process. Use a formal system, such as a ticketing tool, pull request (GitHub or Azure DevOps), or designated forms. The change request process must capture key details like the type of change, requester identity, target environment, scope, and the reason. Keep separate procedures for routine service requests like password resets.

  2. Assess the risk associated with the change. Assign clear risk categories (high, medium, low) to balance deployment speed with risk management. Evaluate each change according to criteria like downtime tolerance (error budget) and workload criticality. Use the following table as an example to help determine the appropriate approval workflow:

    | Risk level | Downtime allowance | Workload criticality | Approval process | Example changes |
    |---|---|---|---|---|
    | High | No downtime allowed | These changes affect mission-critical systems that require continuous availability with zero tolerance for any downtime. | Multiple senior engineer reviews, automated pipeline alerts, fast canary release, and active monitoring. | Critical infrastructure updates |
    | Medium | Short downtime allowed | These changes affect important systems with limited tolerance for downtime. | The automated pipeline flags the change. Engineers perform a quick review if monitoring raises an alert. | Noncritical system updates, feature enhancements during short maintenance windows |
    | Low | Ample downtime allowed | These changes affect noncritical systems where extended downtime is acceptable without affecting overall operations. | Fully automated CI/CD deployment that runs predeployment tests and monitoring. | Routine updates, minor policy updates |

  3. Standardize approvals. Define the approval criteria and authority required at each risk level. Specify who must review each change, whether a single approver or a review board, and clarify how reviewers must provide and resolve feedback.

  4. Standardize the deployment process. Clearly outline the procedures for building, testing, and deploying approved changes to production. For details, see Manage cloud resources.

  5. Standardize the post-deployment process. Implement monitoring and validation steps to confirm successful changes. Include a clear rollback strategy to quickly restore service if a change introduces problems.

  6. Prevent and detect unauthorized changes. Use Change Analysis to detect configuration changes and explain their underlying causes. Use Azure Policy to deny changes (Deny, DenyAction) or audit them (Audit, auditIfNotExists). If you use Bicep, consider using Bicep deployment stacks to prevent unauthorized changes.
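The change request details from step 1 and the example risk table from step 2 can be sketched in Python as follows. This is a hypothetical model, not part of any Azure or ticketing-system API: the `ChangeRequest` fields mirror the details listed above, `downtime_minutes_allowed` stands in for the error budget, and the 60-minute cutoff between medium and low risk is an illustrative assumption that you'd replace with your own criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Approval workflow per risk level, paraphrased from the example table.
APPROVAL_WORKFLOW = {
    Risk.HIGH: "multiple senior engineer reviews, canary release, active monitoring",
    Risk.MEDIUM: "automated pipeline flag; engineer review if monitoring alerts",
    Risk.LOW: "fully automated CI/CD deployment with predeployment tests",
}

REQUIRED_DETAILS = ("change_type", "requester", "target_environment", "scope", "reason")


@dataclass(frozen=True)
class ChangeRequest:
    """Hypothetical change request capturing the details step 1 requires."""
    change_type: str
    requester: str
    target_environment: str
    scope: str
    reason: str
    downtime_minutes_allowed: int  # assumed stand-in for the workload's error budget
    mission_critical: bool

    def __post_init__(self) -> None:
        # Reject requests that omit any required detail.
        for name in REQUIRED_DETAILS:
            if not getattr(self, name).strip():
                raise ValueError(f"change request is missing '{name}'")


def assess_risk(cr: ChangeRequest) -> Risk:
    """Map downtime tolerance and criticality to a risk level (illustrative thresholds)."""
    if cr.mission_critical or cr.downtime_minutes_allowed == 0:
        return Risk.HIGH
    if cr.downtime_minutes_allowed < 60:  # assumed cutoff for a "short" window
        return Risk.MEDIUM
    return Risk.LOW


if __name__ == "__main__":
    cr = ChangeRequest("config", "alice", "prod", "rg-payments", "rotate TLS cert", 0, True)
    risk = assess_risk(cr)
    print(risk.value, "->", APPROVAL_WORKFLOW[risk])
```

Encoding the criteria this way lets a pipeline or ticketing integration route each request to the matching approval workflow consistently, instead of relying on each requester to choose a risk level.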

Manage security

Identity is your security perimeter. Use a standardized platform to verify identities, restrict permissions, and maintain secure resource configurations. Follow these steps:

  1. Manage identities. Use Microsoft Entra ID as your unified identity management solution. Clearly define permissions by applying role-based access control (RBAC). Use Microsoft Entra ID Governance to control access request workflows, access reviews, and identity lifecycle management. Enable Privileged Identity Management to grant just-in-time privileged access. This strategy reduces unnecessary elevated access. Manage all three identity types (user, application, device) consistently to ensure proper authentication and authorization.

  2. Manage access. Use Azure role-based access control (RBAC) and attribute-based access control (ABAC) to grant the minimum permissions needed to do the job. Prefer group-based role assignments to limit management overhead. Grant permissions at the narrowest scope required, such as a subscription, resource group, or individual resource, and avoid overly broad scopes that can lead to unintended privilege escalation. Assign only the permissions that each user's role needs.

  3. Manage resource configurations. Use infrastructure as code (IaC) to ensure consistent and reproducible configuration of resources. Use Azure Policy to enforce organizational standards, assess compliance, and enforce secure configurations of specific Azure services. Refer to the Azure security baselines for guidance on available security capabilities and optimal security configurations. As an add-on feature, use security policies in Defender for Cloud to align with common security standards.

  4. Manage authentication. Require strong authentication through Microsoft Entra multifactor authentication (MFA). Always use Conditional Access to enforce authentication based on user identity, device health, and access context. Configure self-service password reset and eliminate weak passwords.

  5. Manage security information. Use Microsoft Sentinel for security information and event management (SIEM) and security orchestration, automation, and response (SOAR).

  6. Control workload security. For workload security recommendations, see the Well-Architected Framework's security checklist and the Azure service guides (start with the Security section).
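Granting access at the narrowest scope (step 2) depends on how Azure scopes nest: a role assignment at a subscription is inherited by every resource group and resource under it. The following Python sketch models that inheritance by comparing resource ID path segments and flags an assignment that reaches above the narrowest scope a job needs. It's a simplified, hypothetical check, not an Azure SDK call: it ignores management groups, nested child resources, and role definitions, and it lowercases scopes before comparing them.

```python
def _segments(scope: str) -> list[str]:
    """Split an Azure scope or resource ID into lowercase path segments."""
    # This sketch compares resource IDs case-insensitively.
    return [part for part in scope.lower().split("/") if part]


def covers(assignment_scope: str, target: str) -> bool:
    """True if a role assignment at assignment_scope is inherited by target."""
    a, t = _segments(assignment_scope), _segments(target)
    return len(a) <= len(t) and t[: len(a)] == a


def narrowest_scope(needed: list[str]) -> str:
    """Narrowest subscription, resource group, or resource scope containing every needed scope."""
    unique = list({tuple(_segments(s)) for s in needed})
    common: list[str] = []
    for parts in zip(*unique):
        if len(set(parts)) != 1:
            break
        common.append(parts[0])
    if len(unique) > 1:
        # Several targets: cut back to a subscription or resource group boundary.
        # Those levels come in key/value pairs: subscriptions/<id>/resourcegroups/<name>.
        common = common[: min(len(common) - len(common) % 2, 4)]
    return "/" + "/".join(common)


def is_overly_broad(assignment_scope: str, needed: list[str]) -> bool:
    """True if the assignment sits above the narrowest scope the job needs."""
    target = narrowest_scope(needed)
    return covers(assignment_scope, target) and len(_segments(assignment_scope)) < len(_segments(target))


if __name__ == "__main__":
    rg = "/subscriptions/1111/resourceGroups/rg-app"
    needed = [
        rg + "/providers/Microsoft.Storage/storageAccounts/stapp",
        rg + "/providers/Microsoft.KeyVault/vaults/kv-app",
    ]
    print(narrowest_scope(needed))
    print(is_overly_broad("/subscriptions/1111", needed))
```

For example, a group that only needs two resources in `rg-app` is flagged if it holds its role at the subscription, but not if the role is assigned at `rg-app` itself.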

Manage compliance

Compliance management ensures that Azure operations remain aligned with established governance policies and regulatory standards. This practice reduces risk by safeguarding the environment from potential violations and misconfigurations.

  1. Understand your governance policies. Governance policies define the high-level constraints that your teams must follow to remain compliant. Review your organization's policies and map each requirement to your operational processes. If you don't have governance policies yet, document them first.

  2. Manage compliance. Enforcing compliance ensures your environment remains aligned with both organizational and regulatory standards. See the following table for policy recommendations.

    | Recommendation | Details |
    |---|---|
    | Start with general policy definitions | Begin with Azure Policy's general definitions, including allowed locations, disallowed resource types, and audit custom RBAC roles. |
    | Align with regulatory standards | Use Azure Policy's free, built-in definitions aligned with regulatory standards such as ISO 27001, NIST SP 800-53, PCI DSS, and EU GDPR. |

For more information, see Enforcing compliance in Azure.
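Azure Policy rules pair an `if` condition with a `then` effect. To illustrate how a rule modeled on allowed locations produces a `deny` effect, the following Python sketch evaluates a small subset of that condition syntax (`allOf`, `anyOf`, `not`, `equals`, `notEquals`, `in`, `notIn`) against a resource represented as a dictionary. It's a teaching model only, not the Azure Policy engine: it doesn't resolve aliases or parameters, so the allowed regions are written inline. The region names and the `global` exemption are examples, and the actual built-in definition may differ.

```python
from typing import Any, Optional


def matches(cond: dict[str, Any], resource: dict[str, Any]) -> bool:
    """Evaluate a small subset of Azure Policy condition syntax against a resource."""
    if "allOf" in cond:
        return all(matches(c, resource) for c in cond["allOf"])
    if "anyOf" in cond:
        return any(matches(c, resource) for c in cond["anyOf"])
    if "not" in cond:
        return not matches(cond["not"], resource)
    # Leaf condition: compare one resource field (case-insensitively in this sketch).
    value = str(resource.get(cond["field"], "")).lower()
    if "equals" in cond:
        return value == cond["equals"].lower()
    if "notEquals" in cond:
        return value != cond["notEquals"].lower()
    if "in" in cond:
        return value in {v.lower() for v in cond["in"]}
    if "notIn" in cond:
        return value not in {v.lower() for v in cond["notIn"]}
    raise ValueError(f"unsupported condition: {cond}")


def evaluate(rule: dict[str, Any], resource: dict[str, Any]) -> Optional[str]:
    """Return the rule's effect if its 'if' block matches the resource, else None."""
    return rule["then"]["effect"] if matches(rule["if"], resource) else None


# Modeled on an allowed-locations rule; regions are inlined instead of parameterized.
ALLOWED_LOCATIONS_RULE = {
    "if": {
        "allOf": [
            {"field": "location", "notIn": ["westeurope", "northeurope"]},
            {"field": "location", "notEquals": "global"},
        ]
    },
    "then": {"effect": "deny"},
}


if __name__ == "__main__":
    storage = {"type": "Microsoft.Storage/storageAccounts", "location": "eastus"}
    print(evaluate(ALLOWED_LOCATIONS_RULE, storage))
```

Swapping the effect to `audit` turns the same condition into a reporting control instead of a blocking one, which is a common way to roll out a new policy before enforcing it.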

Manage data

Managing data in cloud operations involves actively classifying data, segmenting it, securing access to it, and protecting it from deletion. Effective data control safeguards sensitive information, maintains compliance, and ensures data reliability during operational changes.

  1. Discover and classify data. Identify and categorize data according to sensitivity and importance. This classification guides tailored controls for each data type. Use Microsoft Purview for data governance. For more information, see Data sources that connect to Microsoft Purview Data Map.

  2. Control data residency. Select regions within your geography, such as the United States or Europe, to meet data residency requirements. Verify any exceptions because certain Azure services might store data outside your selected region. Regularly review Azure data residency settings and compliance requirements to maintain full control over your customer data.

  3. Isolate internal (“Corp”) and internet-facing (“Online”) workloads. Use management groups to separate internal and external workloads. Internal workloads typically require connectivity to your corporate network, often through hybrid connectivity. External workloads usually don't require corporate network connectivity and might need direct inbound or outbound internet access. For an example, review the "Corp" (internal) and "Online" (internet-facing) management groups in the Azure landing zone.

  4. Enforce access control. Implement robust access controls, such as Azure RBAC and ABAC, to ensure only authorized personnel access sensitive data based on defined classifications.

  5. Protect data from deletion. Use features such as soft delete, data versioning, and immutability where available. Implement database versioning and prepare rollback procedures. Use Azure Policy to explicitly block datastore deletions (DenyAction), deny noncompliant changes (Deny), or audit changes (Audit or auditIfNotExists). If you use Bicep, consider using Bicep deployment stacks to prevent unauthorized changes. Use resource locks only to prevent unintended modification or deletion of critical data. Avoid using resource locks to protect configurations, because resource locks complicate IaC deployments.

  6. Manage workload data. See the Well-Architected Framework's recommendations on Data classification.

For more information, see Enforce data governance.
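The following Python sketch ties several of the data steps together: it checks a datastore's region against an allowed geography (step 2) and compares its enabled protections against the controls its classification requires (steps 1, 4, and 5). The classification labels, control names, and the `REQUIRED_CONTROLS` mapping are hypothetical placeholders; in practice, classifications would come from a tool such as Microsoft Purview and configuration data from your IaC or resource inventory.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from classification label to required protections.
REQUIRED_CONTROLS: dict[str, set[str]] = {
    "public": {"soft_delete"},
    "general": {"soft_delete", "versioning"},
    "confidential": {"soft_delete", "versioning", "rbac_only"},
    "highly confidential": {"soft_delete", "versioning", "rbac_only", "immutability"},
}


@dataclass
class DataStore:
    """Hypothetical summary of one datastore's classification and configuration."""
    name: str
    classification: str
    region: str
    enabled_controls: set[str] = field(default_factory=set)


def findings(store: DataStore, allowed_regions: set[str]) -> list[str]:
    """List residency and protection gaps; allowed_regions is expected in lowercase."""
    issues = []
    if store.region.lower() not in allowed_regions:
        issues.append(f"{store.name}: region {store.region} is outside the allowed geography")
    required = REQUIRED_CONTROLS[store.classification.lower()]
    for control in sorted(required - store.enabled_controls):
        issues.append(f"{store.name}: missing {control}")
    return issues


if __name__ == "__main__":
    hr = DataStore("st-hr", "Confidential", "eastus", {"soft_delete"})
    for issue in findings(hr, {"westeurope", "northeurope"}):
        print(issue)
```

Driving the required controls from the classification, rather than configuring each datastore ad hoc, keeps protections consistent as new datastores appear.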

Manage costs

Managing costs in cloud operations means actively tracking spending, both centrally and per workload. Cost control should provide visibility into expenditures and encourage responsible spending. Follow these steps:

  1. Manage and review costs. Use Microsoft Cost Management tools to monitor cloud costs. Azure doesn't provide a subscription-wide mechanism that caps spending at a set threshold, although some services, such as Log Analytics workspaces, offer their own spending caps. As a result, your cost monitoring strategy is your primary tool for managing expenses.

  2. Manage workload costs. Grant billing access to workload teams. Have these teams use the Well-Architected Framework's Cost Optimization checklist.
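Because there's no subscription-wide spending cap, monitoring has to surface overspending early. The following Python sketch shows the arithmetic behind threshold-based budget alerts: it reports percentage thresholds that month-to-date spend has newly reached, and it makes a naive linear month-end forecast. The function names, thresholds, and linear forecast are assumptions for illustration; Microsoft Cost Management provides its own budgets and forecasts, and this isn't its API.

```python
import calendar
from datetime import date


def crossed_thresholds(spend: float, budget: float, thresholds: list[int], already_alerted: set[int]) -> list[int]:
    """Percentage thresholds that spend has reached but that haven't triggered an alert yet."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    used_pct = spend * 100 / budget
    return [t for t in sorted(thresholds) if used_pct >= t and t not in already_alerted]


def projected_month_spend(spend_to_date: float, today: date) -> float:
    """Naive linear forecast: assume the rest of the month costs the same per day."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


if __name__ == "__main__":
    budget, spend, today = 1000.0, 620.0, date(2024, 6, 15)
    print(crossed_thresholds(spend, budget, [50, 80, 100], already_alerted={50}))
    print(projected_month_spend(spend, today) > budget)
```

In the example values, month-to-date spend crosses no new threshold, yet the linear forecast already exceeds the budget, which shows why watching forecasts alongside actual spend helps.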

Manage code and runtime

Managing code and runtime is a workload responsibility. Have your workload teams use the Well-Architected Framework's Operational Excellence checklist, which outlines 12 recommendations for controlling code and runtime.

Manage cloud resources

Managing cloud resources involves governance, oversight, and maintenance of all Azure services, deployments, and infrastructure. Establish clear deployment protocols and proactive drift detection strategies to maintain consistency across environments. Follow these recommendations:

Manage portal-based deployments

Define protocols and limits for portal-based deployments to minimize the potential for production problems. Follow these steps:

  1. Define portal deployment policy. Ensure significant portal-based changes adhere to established change management processes. Use portal deployments primarily for rapid prototyping, troubleshooting, or minor adjustments in development and testing environments. Avoid unstructured portal changes because these changes lead to drift, misconfigurations, and compliance issues. Instead, rely on version-controlled infrastructure as code (IaC) templates for consistency. For more information, see Manage code-based deployments.

  2. Differentiate environments. Limit portal-based changes strictly to nonproduction environments. Allow rapid prototyping exclusively in dedicated development or testing environments and enforce stringent controls in production.

  3. Restrict portal permissions. Limit deployment capabilities from the portal using role-based access control (RBAC). Assign read-only permissions by default, and escalate privileges only when necessary.

    • Grant just-in-time access. Use Privileged Identity Management (PIM) for access to Azure and Microsoft Entra resources. Require sequential approvals from multiple individuals or groups to activate a PIM role. Reserve privileged roles (“A0” super admin roles) exclusively for emergency scenarios.

    • Structure RBAC based on the operating model. Design RBAC policies tailored to operational teams, including support levels, security operations, platforms, networking, and workloads.

    • Audit all activities. Monitor and record all actions in your system. Use Azure Policy to audit changes (Audit or auditIfNotExists). Additionally, configure an alert in Azure Monitor to notify stakeholders when someone deletes an Azure resource. If you use Bicep, consider using Bicep deployment stacks to prevent unauthorized changes.

  4. Use version-controlled templates. If you use IaC deployments, limit portal use to emergency scenarios, because portal changes cause configuration drift from IaC templates. Replicate all portal-based changes immediately in version-controlled IaC templates, such as Bicep, Terraform, or ARM templates. Regularly export Azure resource configurations and store them as IaC to keep production environments aligned with approved, traceable configurations. See guidance on how to export Azure configurations as Bicep, Terraform, or ARM templates. If you use ARM templates, consider template specs.

    | Tool | Use case |
    |---|---|
    | Bicep | Manageable, readable Azure-specific IaC |
    | Terraform | Multicloud solution, broader community support |
    | ARM templates | Full control, comfortable with JSON |
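The just-in-time guidance above (time-bound elevation and sequential approvals from multiple people) can be modeled as a small state machine. This Python sketch is hypothetical and isn't how Privileged Identity Management is implemented or configured: `ElevationRequest`, its approval chain, and the one-hour default duration are illustrative assumptions. It shows the two properties that matter: activation requires every approver, in order, and access expires on its own.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ElevationRequest:
    """Hypothetical just-in-time request for a privileged role."""
    requester: str
    role: str
    approval_chain: list[str]  # approvers who must sign off, in this order
    duration: timedelta = timedelta(hours=1)
    approved_by: list[str] = field(default_factory=list)
    activated_at: Optional[datetime] = None

    def approve(self, approver: str) -> None:
        """Record the next approval; approvals must follow the chain order."""
        if approver == self.requester:
            raise PermissionError("requesters can't approve their own elevation")
        if len(self.approved_by) == len(self.approval_chain):
            raise ValueError("request is already fully approved")
        expected = self.approval_chain[len(self.approved_by)]
        if approver != expected:
            raise PermissionError(f"waiting for approval from {expected}, not {approver}")
        self.approved_by.append(approver)

    def activate(self, now: datetime) -> None:
        """Start the time-bound elevation once every approver has signed off."""
        if len(self.approved_by) < len(self.approval_chain):
            raise PermissionError("approval chain is incomplete")
        self.activated_at = now

    def is_active(self, now: datetime) -> bool:
        """Privileged access lapses automatically after `duration`."""
        return self.activated_at is not None and now < self.activated_at + self.duration


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    req = ElevationRequest("alice", "Owner", ["security-lead", "platform-lead"])
    req.approve("security-lead")
    req.approve("platform-lead")
    req.activate(start)
    print(req.is_active(start + timedelta(minutes=30)), req.is_active(start + timedelta(hours=2)))
```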

Manage code-based deployments

Adopt code-based deployments to automate and control complex or large-scale changes. Follow these steps:

  1. Standardize tooling. Use a consistent toolset to minimize context switching. Choose developer tools (VS Code, Visual Studio), a code repository (GitHub, Azure DevOps), a CI/CD pipeline (GitHub Actions, Azure Pipelines), and an IaC solution (Bicep, Terraform, or ARM templates) that work together.

  2. Use version control. Maintain a single source of truth for your code. Use version control to reduce configuration drift and simplify rollback procedures.

  3. Use deployment pipelines. A CI/CD pipeline automates the build process, runs tests, and scans code for quality and security issues with each pull request. Use GitHub Actions or Azure Pipelines to build and deploy application code and IaC files. Enforce precommit hooks and automated scans to catch unauthorized or high-risk changes early.

  4. Test deployments. Stage approvals within your CI/CD pipelines to validate deployments progressively. Follow this sequence: development, build verification, integration tests, performance tests, user acceptance testing (UAT), staging, canary releases, preproduction, and finally, production.

  5. Use infrastructure as code (IaC). Use IaC to ensure consistency and manage deployments through version control. Move from Azure portal-based proofs of concept to IaC for production environments. Use Bicep, Terraform, or ARM templates to define resources. For Bicep, use modules and consider deployment stacks. For ARM templates, consider using template specs for versioned deployments.

  6. Apply code repository best practices. Following these standards reduces errors, streamlines code reviews, and avoids integration issues. For high-priority production environments:

    | Requirement | Description |
    |---|---|
    | Disable direct pushes | Block direct commits to the main branch |
    | Require pull requests | Require all changes to pass through a pull request |
    | Require code reviews | Ensure someone other than the author reviews every pull request |
    | Enforce code coverage thresholds | Require automated tests to cover a minimum percentage of code for every pull request |
    | Use validation pipelines | Configure branch protection rules to run a validation pipeline for pull requests |

  7. Require workload team onboarding checks. Verify that new codebases and teams align with business goals, standards, and best practices. Use a checklist to confirm code repository structure, naming standards, coding standards, and CI/CD pipeline configurations.
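The staged-promotion sequence from step 4 and the repository requirements from step 6 amount to gates that a change must clear. The following Python sketch encodes both as plain checks. The stage names follow the sequence listed above, while `PullRequest`, its fields, and the 80% coverage threshold are hypothetical. In a real setup, branch protection rules or branch policies in GitHub or Azure DevOps enforce these requirements, not custom code.

```python
from dataclasses import dataclass
from typing import Optional

# Promotion sequence from step 4; each stage must pass before the next one runs.
STAGES = [
    "development",
    "build verification",
    "integration tests",
    "performance tests",
    "user acceptance testing",
    "staging",
    "canary releases",
    "preproduction",
    "production",
]


def next_stage(results: dict[str, bool]) -> Optional[str]:
    """Return the next stage to run, or None when every stage has passed."""
    for stage in STAGES:
        outcome = results.get(stage)
        if outcome is None:
            return stage  # first stage that hasn't run yet
        if not outcome:
            raise RuntimeError(f"promotion blocked: '{stage}' failed")
    return None


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical summary of a pull request that targets the main branch."""
    author: str
    approvers: frozenset[str]
    coverage_pct: float
    validation_passed: bool


def merge_blockers(pr: PullRequest, min_coverage: float = 80.0) -> list[str]:
    """List the repository requirements that the pull request doesn't meet yet."""
    blockers = []
    if not pr.approvers - {pr.author}:
        blockers.append("needs a review from someone other than the author")
    if pr.coverage_pct < min_coverage:
        blockers.append(f"coverage {pr.coverage_pct}% is below {min_coverage}%")
    if not pr.validation_passed:
        blockers.append("validation pipeline has not passed")
    return blockers
```

Returning every blocker at once, rather than failing on the first, gives authors a complete list of what to fix before merging.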

Manage configuration drift

Manage configuration drift by identifying and correcting discrepancies between your intended configuration and the live environment. Follow these best practices:

  1. Prevent and detect change. Use Change Analysis to detect configuration changes and explain their underlying causes. Use Azure Policy to deny changes (Deny, DenyAction) or audit them (Audit, auditIfNotExists). If you use Bicep, consider using Bicep deployment stacks to prevent unauthorized changes.

  2. Detect IaC configuration drift. Drift occurs when someone updates the IaC file, intentionally or unintentionally, or makes a change in the Azure portal. Regularly compare the live environment with your desired configuration to detect drift:

    • Store desired and last-known-good configurations. Save your desired configuration file in a version-controlled repository. This file shows the original, intended configuration. Maintain a last-known-good configuration as a reliable rollback reference and drift detection baseline.

    • Detect configuration drift before deployment. Preview potential changes before deployment using Terraform plan, Bicep what-if, or ARM template what-if. Investigate discrepancies thoroughly to ensure proposed changes align with the desired state.

    • Detect drift after deployment. Run regular drift checks that compare live environments with desired configurations. Integrate these checks into your CI/CD pipelines or conduct them manually to maintain consistency.

    • Roll back to the last-known-good configuration. Develop clear rollback strategies that use automated procedures within your CI/CD pipeline. Use your last-known-good configuration to quickly revert undesired changes and minimize downtime.

    • Minimize portal-driven changes. Minimize non-IaC changes to emergency scenarios only. Enforce strict access controls such as Privileged Identity Management. Promptly update IaC files if manual adjustments are necessary to preserve the accuracy of your desired configuration.
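Post-deployment drift detection comes down to comparing the desired configuration with the live one, property by property. This Python sketch performs that comparison on nested dictionaries and reports each differing path. It assumes you've already loaded both configurations as dictionaries (for example, from your repository and from an exported resource configuration); the `diff` function and the property values in the example are illustrative. Tools such as Bicep what-if and Terraform plan perform this kind of comparison against real deployments.

```python
from typing import Any


class _Absent:
    """Marker for a property that exists on only one side of the comparison."""

    def __repr__(self) -> str:
        return "<absent>"


ABSENT = _Absent()


def diff(desired: Any, live: Any, path: str = "") -> list[tuple[str, Any, Any]]:
    """Recursively list (path, desired, live) for every property that differs."""
    if isinstance(desired, dict) and isinstance(live, dict):
        drift: list[tuple[str, Any, Any]] = []
        for key in sorted(desired.keys() | live.keys()):
            drift += diff(desired.get(key, ABSENT), live.get(key, ABSENT), f"{path}/{key}")
        return drift
    return [] if desired == live else [(path or "/", desired, live)]


if __name__ == "__main__":
    desired = {"sku": "Standard_LRS", "properties": {"minimumTlsVersion": "TLS1_2"}}
    live = {"sku": "Standard_LRS", "properties": {"minimumTlsVersion": "TLS1_0"}, "tags": {"owner": "ops"}}
    for path, want, have in diff(desired, live):
        print(f"{path}: desired={want!r} live={have!r}")
```

A nonempty result means the live resource has drifted. Depending on your process, you either redeploy the last-known-good configuration or, if the change was an approved emergency fix, update the IaC file to match.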

Manage operating systems

If you use virtual machines, you also need to manage their operating systems. Follow these steps:

  1. Automate virtual machine maintenance. In Azure, use automation tools to create and manage Azure virtual machines. Use Azure Machine Configuration to audit or configure operating system settings as code for machines running in Azure and in hybrid environments.

  2. Update operating systems. Manage guest updates and host maintenance to keep operating systems up to date for security purposes.

  3. Monitor in-guest operations. Use the Azure Change Tracking and Inventory service to enhance the auditing and governance for in-guest operations. It monitors changes and provides detailed inventory logs for servers across Azure, on-premises, and other cloud environments.
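Keeping operating systems current (step 2) requires a clear rule for what counts as out of date. The following Python sketch applies one such rule to a list of per-VM patch records: a VM is noncompliant if it has pending critical updates or hasn't been patched within a maximum age. `VmPatchState`, the 30-day default, and the example data are hypothetical; real patch data would come from your update management tooling.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class VmPatchState:
    """Hypothetical per-VM patch record exported from your update tooling."""
    name: str
    last_patched: date
    pending_critical: int


def noncompliant(vms: list[VmPatchState], today: date, max_age_days: int = 30) -> list[str]:
    """Names of VMs with pending critical updates or no patching within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(vm.name for vm in vms if vm.pending_critical > 0 or vm.last_patched < cutoff)


if __name__ == "__main__":
    fleet = [
        VmPatchState("vm-web-01", date(2024, 5, 30), 0),
        VmPatchState("vm-db-01", date(2024, 4, 1), 0),
        VmPatchState("vm-app-01", date(2024, 6, 1), 2),
    ]
    print(noncompliant(fleet, today=date(2024, 6, 10)))
```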

Azure management tools

| Category | Tool | Description |
|---|---|---|
| Manage change | Change Analysis | Detects configuration changes and explains their underlying causes |
| Manage change | Azure Policy | Enforces, audits, or prevents modifications to cloud resources |
| Manage change | Bicep deployment stacks | Prevents unauthorized changes |
| Manage security | Azure security baselines | Provides guidance on available security capabilities and optimal security configurations |
| Manage security | Well-Architected Framework's security pillar | Security guidance for workload design |
| Manage security | Azure service guides (start with the Security section) | Security configuration recommendations for Azure services |
| Manage security | Microsoft Entra ID | Provides unified identity management |
| Manage security | Defender for Cloud | Aligns resource configurations with security standards |
| Manage security | Microsoft Sentinel | Provides security information and event management (SIEM) and security orchestration, automation, and response (SOAR) |
| Manage security | Azure RBAC | Grants secure access with role-based assignments |
| Manage security | Azure ABAC | Grants secure access based on attribute conditions |
| Manage security | Microsoft Entra ID Governance | Manages access workflows and identity lifecycle |
| Manage security | Privileged Identity Management | Offers just-in-time privileged access |
| Manage security | Microsoft Entra multifactor authentication (MFA) | Enforces strong multifactor authentication |
| Manage security | Conditional Access | Enforces context-based authentication |
| Manage security | Self-service password reset | Allows secure user password resets |
| Manage compliance | Azure Policy | Enforces standards and secures resource configurations |
| Manage data | Microsoft Purview | Governs and classifies sensitive data |
| Manage data | Azure Policy | Prevents or audits unintended modifications or deletions of resources |
| Manage data | Resource locks | Prevents unintended modifications or deletions |
| Manage costs | Monitor costs | Monitoring is essential to managing cloud costs |
| Manage cloud resources | Azure Policy | Enforces, audits, or prevents modifications to cloud resources |
| Manage cloud resources (portal deployments) | ARM template export | Exports resource configurations as IaC templates |
| Manage cloud resources (portal deployments) | Azure Monitor alerts | Notifies stakeholders of resource changes |
| Manage cloud resources (code deployments) | Bicep | Manages infrastructure as code for Azure resources |
| Manage cloud resources (code deployments) | Bicep deployment stacks | Supports version-controlled deployments and prevents unauthorized changes |
| Manage cloud resources (code deployments) | Terraform | Manages multicloud infrastructure as code |
| Manage cloud resources (code deployments) | ARM templates | Defines and deploys Azure resources with templates |
| Manage cloud resources (code deployments) | ARM template specs | Versions and manages ARM templates for consistency |
| Manage cloud resources (code deployments) | GitHub Actions | Automates build, test, and deployment pipelines |
| Manage cloud resources (code deployments) | Azure Pipelines | Automates build and deployment processes |
| Manage drift | Azure Policy | Enforces, audits, or prevents modifications to cloud resources |
| Manage drift | Change Analysis | Detects and explains configuration changes |
| Manage drift | Bicep what-if | Previews potential configuration changes |
| Manage drift | Terraform plan | Previews potential changes before Terraform deployment |
| Manage drift | ARM template what-if | Previews potential configuration changes |
| Manage operating systems | Azure Machine Configuration | Audits and configures operating system settings as code |
| Manage operating systems | Azure Change Tracking and Inventory service | Monitors and logs changes for operating systems |
| Manage operating systems | Automation tools | Automates virtual machine maintenance |