HDInsight cluster management best practices
Learn best practices for managing HDInsight clusters.
How do I create HDInsight clusters?
Option | Documents |
---|---|
Azure Data Factory | Create on-demand Apache Hadoop clusters in HDInsight using Azure Data Factory |
Custom Resource Manager template | Create Apache Hadoop clusters in HDInsight by using Resource Manager templates |
Quickstart templates | HDInsight Quickstart templates |
Azure samples | HDInsight Azure samples |
Azure portal | Create Linux-based clusters in HDInsight by using the Azure portal |
Azure CLI | Create HDInsight clusters using the Azure CLI |
Azure PowerShell | Create Linux-based clusters in HDInsight using Azure PowerShell |
cURL | Create Apache Hadoop clusters using the Azure REST API |
SDKs (.NET, Python, Java, Go) | .NET, Python, Java, Go |
Note
If you create a cluster that reuses the name of a previously created cluster, wait until the deletion of the previous cluster has completed before you create the new one.
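
The wait described in the note above can be sketched as a small polling loop. This is a minimal illustration, not an HDInsight API: `wait_for_deletion` and the `cluster_exists` callable are hypothetical names, and the timeout and polling interval are arbitrary assumptions. You would supply `cluster_exists` yourself, for example by wrapping whichever SDK or CLI call you already use to look up the cluster.

```python
import time
from typing import Callable


def wait_for_deletion(
    cluster_exists: Callable[[], bool],   # hypothetical check supplied by the caller
    timeout_s: float = 1800.0,            # assumed upper bound, not a documented value
    poll_interval_s: float = 30.0,        # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once cluster_exists() reports False, or False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if not cluster_exists():
            return True          # the old cluster is gone; the name can be reused
        if clock() >= deadline:
            return False         # still present after the timeout
        sleep(poll_interval_s)
```

Only create the new cluster when the function returns `True`. If it returns `False`, the previous cluster was still present when the timeout expired.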
How do I customize HDInsight clusters?
Option | Documents |
---|---|
Script actions | Customize Azure HDInsight clusters by using script actions |
Bootstrap | Customize HDInsight clusters using Bootstrap |
External metastores | Use external metadata stores in Azure HDInsight |
Custom Ambari DB | Set up HDInsight clusters with a custom Ambari DB |
What are some errors I might face when creating clusters?
Error | More information |
---|---|
No quota | There are quotas for the number of cores that you can create on your subscription in each region. For more information, see Capacity planning: quotas. |
No more IP addresses available | Each VNet has a limited number of IP addresses. When you create an HDInsight cluster, each node (including ZooKeeper and gateway nodes) uses some of these allotted IP addresses. When all of the IP addresses are in use, you will encounter this error. |
Network security group (NSG) rules don't allow communication with HDInsight resource providers | If you use NSGs or user-defined routes (UDRs) to control inbound traffic to your HDInsight cluster, you must ensure that your cluster can communicate with critical Azure health and management services. For more information, see Network security group (NSG) service tags for Azure HDInsight. |
Reuse of cluster name | When you reuse a cluster name that you have used before, wait until the deletion of the previous cluster has completed before you recreate the cluster. Otherwise, you will see a message that the resource already exists. |
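
To estimate ahead of time whether the "No more IP addresses available" error is likely, you can compare the size of the target subnet with the number of nodes you plan to create. The sketch below uses Python's standard `ipaddress` module. It assumes that Azure reserves five addresses in every subnet and, as a simplification, that each node consumes one IP address. The function names and the sample CIDR ranges are illustrative only, and the count of addresses already in use must come from your own inventory.

```python
import ipaddress

# Assumption: Azure reserves 5 addresses in each subnet.
AZURE_RESERVED_PER_SUBNET = 5


def usable_ips(subnet_cidr: str) -> int:
    """Number of addresses in the subnet that can be assigned to resources."""
    network = ipaddress.ip_network(subnet_cidr, strict=True)
    return max(network.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def has_room(subnet_cidr: str, ips_in_use: int, nodes_needed: int) -> bool:
    """True if the subnet can hold nodes_needed more nodes.

    Simplifying assumption: one IP address per node, counting head, worker,
    ZooKeeper, and gateway nodes alike.
    """
    return usable_ips(subnet_cidr) - ips_in_use >= nodes_needed


if __name__ == "__main__":
    print(usable_ips("10.0.0.0/24"))       # 256 addresses minus 5 reserved
    print(has_room("10.0.0.0/28", 4, 8))   # 11 usable, 4 in use, 8 requested
```

If `has_room` returns `False`, consider a larger subnet or a smaller cluster before you submit the create request.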
How do I manage running HDInsight clusters?
Option | Documents |
---|---|
Autoscale | Automatically scale Azure HDInsight clusters |
Manual scaling | Scale Azure HDInsight clusters |
Monitoring with Ambari | Monitor cluster performance in Azure HDInsight |
Monitoring with Azure Monitor logs | Use Azure Monitor logs to monitor HDInsight clusters |
Service issues, planned maintenance, health & security advisories | Subscribe to subscription specific service health alerts |
How do I check on deleted HDInsight clusters?
Azure Monitor logs
You can use the following query with Azure Monitor logs to list successful cluster create, update, and delete operations, which lets you track deleted clusters.
AzureActivity
| where ResourceProvider == "Microsoft.HDInsight" and (OperationName == "Create or Update Cluster" or OperationName == "Delete Cluster") and ActivityStatus == "Succeeded"
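
The query above returns create, update, and delete operations together. If you export the results and want only the deletions, a small filter such as the following sketch can narrow them down. It assumes each exported row is a dictionary keyed by the column names used in the query (`ResourceProvider`, `OperationName`, `ActivityStatus`). The function name and the `Resource` key in the sample rows are hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def successful_deletions(rows: Iterable[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Keep only rows that record a successful HDInsight cluster deletion."""
    return [
        row
        for row in rows
        if row.get("ResourceProvider") == "Microsoft.HDInsight"
        and row.get("OperationName") == "Delete Cluster"
        and row.get("ActivityStatus") == "Succeeded"
    ]


if __name__ == "__main__":
    # Made-up sample rows; "Resource" is a hypothetical column for illustration.
    sample = [
        {"ResourceProvider": "Microsoft.HDInsight", "OperationName": "Create or Update Cluster",
         "ActivityStatus": "Succeeded", "Resource": "cluster-a"},
        {"ResourceProvider": "Microsoft.HDInsight", "OperationName": "Delete Cluster",
         "ActivityStatus": "Succeeded", "Resource": "cluster-a"},
    ]
    for row in successful_deletions(sample):
        print(row["Resource"])
```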