Troubleshooting Hyper-V Replica

Introduction to Troubleshooting Hyper-V Replica

This section explains how to troubleshoot Hyper-V Replica.  Use this guide when:

  • You have problems with connectivity between Primary and Replica servers
  • You have problems enabling a virtual machine for replication
  • You have problems with virtual machine replication whether it is Initial Replication (IR) or Delta Replication (DR)
  • You have problems executing management actions associated with virtual machines on a Primary or Replica server
  • You have problems with the Replication Broker configured in a Hyper-V Failover Cluster
  • You need to collect performance monitoring data for replicated virtual machines

Tools for Troubleshooting Hyper-V Replica

Utilities and Commands for Troubleshooting Hyper-V Replica

Performance Monitor

Performance Monitor contains counters specific to Hyper-V Replica. These counters report replication statistics for each virtual machine configured for replication and are grouped under the Hyper-V Failover Replication Counter VM performance object. The data that can be collected for each selected virtual machine includes:

  • Average Replication Latency
  • Average Replication Size
  • Last Replication Size
  • Network Bytes Received
  • Network Bytes Sent
  • Replication Count
  • Replication Latency
  • Resynchronized Bytes
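The counters listed above can also be sampled from PowerShell with Get-Counter. The following is a minimal sketch, not a supported tool. The function name Get-HvrReplicaCounterSample is hypothetical. Rather than hard-coding the performance object name, it discovers counter sets with a wildcard (assumed here to be *Replica*), so confirm the match on your host with Get-Counter -ListSet '*Replica*'.

```powershell
# Hypothetical helper: sample the per-VM Hyper-V Replica performance counters.
# Assumption: the replica counter set name matches the wildcard below.
function Get-HvrReplicaCounterSample {
    param(
        [string]$CounterSetPattern = '*Replica*',
        [int]$SampleIntervalSeconds = 5,
        [int]$MaxSamples = 3
    )
    # Discover matching counter sets. For multi-instance sets, Paths holds
    # wildcard paths such as \<set>(*)\<counter> (one instance per VM).
    $sets = Get-Counter -ListSet $CounterSetPattern -ErrorAction Stop
    $paths = $sets | ForEach-Object { $_.Paths }
    # Take a few samples and flatten them to one row per VM and counter.
    Get-Counter -Counter $paths -SampleInterval $SampleIntervalSeconds -MaxSamples $MaxSamples |
        ForEach-Object { $_.CounterSamples } |
        Select-Object Timestamp, InstanceName, Path, CookedValue
}
```

For example, Get-HvrReplicaCounterSample -MaxSamples 10 | Export-Csv .\replica-counters.csv -NoTypeInformation saves ten samples for offline review.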

Hyper-V Replica Integration into the Hyper-V Best Practices Analyzer (BPA)

Rules pertaining to Hyper-V Replica are included in the Hyper-V Best Practices Analyzer. The following BPA rule details are provided to assist with troubleshooting:

Rule Title: A Replica server must be configured to accept replication requests
Severity: Red
Category: Configuration
Issue: This computer is designated as a Hyper-V Replica server but is not configured to accept incoming replication data from primary servers.
Impact: This server cannot accept replication traffic from primary servers.
Resolution: Use Hyper-V Manager to specify which primary servers this Replica server should accept replication data from.

Rule Title: Replica servers should be configured to identify specific primary servers authorized to send replication traffic
Severity: Yellow
Category: Configuration
Issue: As configured, this Replica server accepts replication traffic from all primary servers and stores them in a single location.
Impact: All replication from all primary servers is stored in one location, which might introduce privacy or security problems.
Resolution: Use Hyper-V Manager to create new authorization entries for the specific primary servers and specify separate storage locations for each of them. You can use wildcard characters to group primary servers into sets for each authorization entry.
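The two rules above can also be addressed from PowerShell instead of Hyper-V Manager. The sketch below assumes Kerberos (HTTP) authentication on port 80 and is run elevated on the Replica server. The function name Set-HvrReplicaAuthorization and all example values are hypothetical.

```powershell
# Hypothetical helper: make this host a Replica server that accepts Kerberos
# replication only from explicitly authorized primary servers.
function Set-HvrReplicaAuthorization {
    param(
        [Parameter(Mandatory)][string]$PrimaryServerPattern,  # e.g. '*.sales.contoso.com'
        [Parameter(Mandatory)][string]$StoragePath,           # e.g. 'D:\Replica\Sales'
        [Parameter(Mandatory)][string]$TrustGroup,            # replication tag, e.g. 'Sales'
        [int]$Port = 80
    )
    # First rule: accept replication requests.
    # Second rule: do not accept them from any server.
    Set-VMReplicationServer -ReplicationEnabled $true `
        -AllowedAuthenticationType Kerberos `
        -KerberosAuthenticationPort $Port `
        -ReplicationAllowedFromAnyServer $false
    # One authorization entry per group of primaries, each with its own storage
    # location; wildcard characters group primary servers into a set.
    New-VMReplicationAuthorizationEntry -AllowedPrimaryServer $PrimaryServerPattern `
        -ReplicaStorageLocation $StoragePath -TrustGroup $TrustGroup
}
```

For example, Set-HvrReplicaAuthorization -PrimaryServerPattern '*.sales.contoso.com' -StoragePath 'D:\Replica\Sales' -TrustGroup 'Sales' creates one entry; repeat it with a different path and tag for each group of primary servers.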

Rule Title: Compression is recommended for replication traffic
Severity: Yellow
Category: Configuration
Issue: The replication traffic sent across the network from the primary server to the Replica server is uncompressed.
Impact: Replication traffic will use more bandwidth than necessary. This impacts the following virtual machines: <List of VMs>
Resolution: Configure Hyper-V Replica to compress the data transmitted over the network in the settings for the virtual machine in Hyper-V Manager. You can also use tools outside of Hyper-V to perform compression.
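A minimal PowerShell sketch for the compression rule follows. It assumes it runs on the primary server and that the replication objects expose ReplicationMode, CompressionEnabled, and VMName properties (check with Get-VMReplication | Get-Member). Both function names are hypothetical.

```powershell
# Hypothetical filter: pass through primary-side replication relationships
# that send uncompressed traffic. Property names are assumptions.
function Select-HvrUncompressedPrimary {
    param([Parameter(ValueFromPipeline)]$Replication)
    process {
        if ($Replication.ReplicationMode -eq 'Primary' -and -not $Replication.CompressionEnabled) {
            $Replication
        }
    }
}

# Hypothetical helper: enable compression for each such VM and report its name.
function Enable-HvrCompression {
    Get-VMReplication | Select-HvrUncompressedPrimary | ForEach-Object {
        Set-VMReplication -VMName $_.VMName -CompressionEnabled $true
        $_.VMName
    }
}
```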

Rule Title: Configure guest operating systems for VSS-based backups to enable application-consistent snapshots for Hyper-V Replica
Severity: Red
Category: Configuration
Issue: Application-consistent snapshots require that the Volume Shadow Copy Service (VSS) is enabled and configured in the guest operating systems of virtual machines participating in replication.
Impact: Even if application-consistent snapshots are specified in the replication configuration, Hyper-V will not use them unless VSS is configured. This impacts the following virtual machines: <List of VMs>
Resolution: Use Hyper-V Manager to install integration services in the virtual machine.

Rule Title: Integration services must be installed before primary or Replica virtual machines can use an alternate IP address after a failover
Severity: Red
Category: Configuration
Issue: Virtual machines participating in replication can be configured to use a specific IP address in the event of failover, but only if integration services are installed in the guest operating system of the virtual machine.
Impact: In the event of a failover (planned, unplanned, or test), the Replica virtual machine will come online using the same IP address as the primary virtual machine. This configuration might cause connectivity issues. This impacts the following virtual machines: <List of VMs>
Resolution: Use Hyper-V Manager to install integration services in the virtual machine.

Rule Title: To participate in replication, servers in failover clusters must have a Hyper-V Replica Broker configured
Severity: Red
Category: Configuration
Issue: For failover clusters, Hyper-V Replica requires the use of a Hyper-V Replica Broker name instead of an individual server name.
Impact: If the virtual machine is moved to a different failover cluster node, replication cannot continue.
Resolution: Use Failover Cluster Manager to configure the Hyper-V Replica Broker. In Hyper-V Manager, ensure that the replication configuration uses the Hyper-V Replica Broker name as the server name.

Rule Title: Virtual hard disks with paging files should be excluded from replication
Severity: Yellow
Category: Configuration
Issue: Paging files should be excluded from participating in replication, but no disks have been excluded.
Impact: Virtual hard disks that experience a high volume of input/output activity will unnecessarily require much greater resources to participate in replication. This impacts the following virtual machines: <List of VMs>
Resolution: If you have not already done so, create a separate virtual hard disk for the Windows paging file. If initial replication has already been completed, use Hyper-V Manager to remove replication. Then, configure replication again and exclude the virtual hard disk with the paging file from replication.
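The resolution above can be scripted roughly as follows. This is a sketch under stated assumptions: it runs on the primary server, the paging file is already on its own virtual hard disk, Kerberos authentication on port 80 is used, and the function name is hypothetical. Removing replication ends the existing relationship, so initial replication runs again; depending on the state of the Replica server, the existing replica virtual machine may also need to be removed there first.

```powershell
# Hypothetical helper: re-enable replication for a VM while excluding the
# virtual hard disk that holds the guest paging file.
function Reset-HvrReplicationWithoutPagefile {
    param(
        [Parameter(Mandatory)][string]$VMName,
        [Parameter(Mandatory)][string]$ReplicaServerName,  # server or Replica Broker CAP
        [Parameter(Mandatory)][string]$PagefileVhdPath,
        [int]$ReplicaServerPort = 80
    )
    # Remove the current replication relationship (prompts for confirmation).
    Remove-VMReplication -VMName $VMName
    # Configure replication again, excluding the paging file disk.
    Enable-VMReplication -VMName $VMName -ReplicaServerName $ReplicaServerName `
        -ReplicaServerPort $ReplicaServerPort -AuthenticationType Kerberos `
        -ExcludedVhdPath $PagefileVhdPath
    # Send the initial copy over the network.
    Start-VMInitialReplication -VMName $VMName
}
```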

Rule Title: Configure the Failover TCP/IP settings that you want the Replica virtual machine to use in the event of a failover
Severity: Yellow
Category: Configuration
Issue: Replica virtual machines configured with a static IP address should be configured to use a different IP address from their primary virtual machine counterpart in the event of failover.
Impact: Clients using the workload supported by the primary virtual machine might not be able to connect to the Replica virtual machine after a failover. Also, the primary virtual machine’s original IP address will not be valid in the Replica virtual machine network topology. This impacts the following virtual machine(s): <List of VMs>
Resolution: Use Hyper-V Manager to configure the IP address that the Replica virtual machine should use in the event of failover.
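The failover IP address can also be set with Set-VMNetworkAdapterFailoverConfiguration. The sketch below separates building the parameters from applying them so that optional values are only passed when supplied. Assumptions: it runs on the Replica server, the VM has a single network adapter, integration services are installed in the guest, and both function names and all addresses are hypothetical.

```powershell
# Hypothetical builder: parameters for Set-VMNetworkAdapterFailoverConfiguration,
# omitting optional values that were not supplied.
function New-HvrFailoverIPv4Settings {
    param(
        [Parameter(Mandatory)][string]$VMName,
        [Parameter(Mandatory)][string]$IPv4Address,
        [Parameter(Mandatory)][string]$SubnetMask,
        [string]$DefaultGateway,
        [string]$PreferredDns
    )
    $settings = @{
        VMName         = $VMName
        IPv4Address    = $IPv4Address
        IPv4SubnetMask = $SubnetMask
    }
    if ($DefaultGateway) { $settings['IPv4DefaultGateway'] = $DefaultGateway }
    if ($PreferredDns)   { $settings['IPv4PreferredDNSServer'] = $PreferredDns }
    $settings
}

# Hypothetical helper: apply the settings by splatting the hashtable.
function Set-HvrFailoverIPv4 {
    param([Parameter(Mandatory)][hashtable]$Settings)
    Set-VMNetworkAdapterFailoverConfiguration @Settings
}
```

For example: Set-HvrFailoverIPv4 -Settings (New-HvrFailoverIPv4Settings -VMName 'VM01' -IPv4Address '10.0.2.10' -SubnetMask '255.255.255.0' -DefaultGateway '10.0.2.1').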

Rule Title: Authorization entries should have distinct tags for primary servers with virtual machines that are not part of the same security group.
Severity: Yellow
Category: Configuration
Issue: The server will accept replication requests for a replica virtual machine from any server in the authorization list that is associated with the same replication tag as the VM.
Impact: There might be privacy and security concerns with a virtual machine accepting replication from primary servers belonging to different authorization entries. This impacts the following authorization entries: <List of authorization entries>
Resolution: Use different tags in the authorization entries for primary servers with virtual machines that are not part of the same security group. Modify the Hyper-V settings to configure the replication tags.

Rule Title: Certificate-based authentication is configured, but the specified certificate is not installed on the Replica server or failover cluster nodes
Severity: Red
Category: Configuration
Issue: The security certificate that Hyper-V Replica has been configured to use to provide certificate-based replication is not installed on the Replica server (or any failover cluster nodes).
Impact: In the event of a cluster failover or move to another node, Hyper-V replication will pause if the new node does not also have the appropriate certificate installed. This impacts the following nodes: <List of nodes>
Resolution: Install the configured certificate on the Replica server (and all associated nodes in the failover cluster, if any).

Rule Title: Replication is paused for one or more virtual machines on this server
Severity: Yellow
Category: Operation
Issue: Replication is paused for one or more of the virtual machines. While replication is paused, any changes that occur in the primary virtual machine accumulate and are sent to the Replica virtual machine once replication is resumed.
Impact: As long as replication is paused, accumulated changes occurring in the primary virtual machine will consume available disk space on the primary server. After replication is resumed, there might be a large burst of network traffic to the Replica server. This impacts the following virtual machines: <List of VMs>
Resolution: Confirm that pausing replication was intended. If replication was paused to address low disk space or network connectivity, resume replication as soon as those issues are resolved.
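Paused replication can be found and resumed from PowerShell. This sketch assumes that the replication objects expose a ReplicationState property whose value is 'Suspended' when replication is paused; confirm this with Get-VMReplication | Get-Member on your host. Both function names are hypothetical.

```powershell
# Hypothetical filter: replication relationships that are paused (Suspended).
function Select-HvrPausedReplication {
    param([Parameter(ValueFromPipeline)]$Replication)
    process {
        if ($Replication.ReplicationState -eq 'Suspended') { $Replication }
    }
}

# Hypothetical helper: resume each paused VM once the underlying disk space or
# connectivity issue is fixed; expect a burst of traffic afterwards.
function Resume-HvrPausedReplication {
    Get-VMReplication | Select-HvrPausedReplication | ForEach-Object {
        Resume-VMReplication -VMName $_.VMName
        $_.VMName
    }
}
```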

Rule Title: Initial replication is complete, but no test failover has been attempted
Severity: Red
Category: Operation
Issue: No test failovers have been attempted since completing initial replication.
Impact: A test failover confirms that failover will succeed and that all workload operations on the primary virtual machine continue properly after failover to the Replica virtual machine. This impacts the following virtual machines: <List of VMs>
Resolution: Use Hyper-V Manager to conduct a test failover.

Rule Title: There has been no test failover in at least one month
Severity: Yellow
Category: Operation
Issue: Test failovers should be carried out at least monthly to verify that failover will succeed and that virtual machine workloads will operate as expected after failover.
Impact: A test failover confirms that failover will succeed and that all workload operations on the primary virtual machine continue properly after failover to the Replica virtual machine. This impacts the following virtual machines: <List of VMs>
Resolution: Use Hyper-V Manager to conduct a test failover.
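A test failover can also be started and cleaned up from PowerShell on the Replica server. In this sketch the function names are hypothetical, and the test virtual machine is assumed to be named "<VMName> - Test"; check the name in Hyper-V Manager after starting the test.

```powershell
# Hypothetical helper, run on the Replica server: start a test failover.
function Start-HvrTestFailover {
    param([Parameter(Mandatory)][string]$VMName)
    # Creates a separate test VM from the latest recovery point; the replica VM
    # keeps receiving replication while the test runs.
    Start-VMFailover -VMName $VMName -AsTest
    # Assumption: the test VM is named "<VMName> - Test".
    Start-VM -Name "$VMName - Test"
}

# Hypothetical helper: end the test failover once the workload is validated.
function Stop-HvrTestFailover {
    param([Parameter(Mandatory)][string]$VMName)
    # Stops the test failover and removes the test VM.
    Stop-VMFailover -VMName $VMName
}
```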

Rule Title: Certificate-based authentication is recommended for replication.
Severity: Yellow
Category: Configuration
Issue: One or more virtual machines selected for replication are configured for Kerberos authentication.
Impact: The replication network traffic from the primary server to the Replica server is unencrypted. This impacts the following virtual machines: <List of VMs>
Resolution: If another method is being used to perform encryption, you can ignore this rule. Otherwise, modify the virtual machine settings to choose certificate-based authentication.

Rule Title: Configure a policy to throttle the replication traffic on the network
Severity: Yellow
Category: Configuration
Issue: There might not be a limit on the amount of network bandwidth that replication is allowed to consume.
Impact: Network bandwidth could become completely dominated by replication traffic, affecting other critical network activity. This impacts the following ports: <List of Ports>
Resolution: If you use another method to throttle network traffic, you can ignore this. Otherwise, use Group Policy Editor to configure a policy that will throttle the network traffic to the relevant port of the Replica server.

Rule Title: Resynchronization of replication should be scheduled for off-peak hours.
Severity: Yellow
Category: Configuration
Issue: Resynchronization of replication for the primary VMs is not scheduled for off-peak hours.
Impact: The longer a VM remains in the resynchronization-required state, the more its replication logs grow and the higher its recovery point objective (RPO) becomes. At the same time, resynchronization consumes IOPS on both the primary and the Replica server, so it might affect production workloads.
Resolution: Use the Hyper-V Manager VM replication settings to configure the automatic resynchronization window of the primary VM within off-peak hours.

Rule Title: VHDX-based virtual hard disks are recommended for virtual machines that have recovery history enabled in replication settings.
Severity: Yellow
Category: Configuration
Issue: VHD-based virtual hard disks are being used for the virtual machines that are enabled for replication with recovery history turned on.
Impact: Under some circumstances, the VHDs on the Replica server could experience consistency issues. This impacts the following virtual machine(s): <List of VMs>
Resolution: Use the new virtual hard disk format (VHDX) for the virtual machines that are enabled for replication with recovery history turned on. You can convert a virtual hard disk from VHD format to VHDX format. The VHDX format has reliability mechanisms that help protect the disk from corruption due to system power failures. However, do not convert the virtual hard disk if it is likely to be attached to an earlier release of Windows at some point. Windows releases earlier than {1} do not support the VHDX format.

Rule Title: Recovery snapshots should be removed after failover.
Severity: Yellow
Category: Operation
Issue: A failed over virtual machine has one or more recovery snapshots.
Impact: Available space may run out on the physical disk that stores the snapshot files. If this occurs, no additional disk operations can be performed on the physical storage. Any virtual machine that relies on the physical storage could be affected. This impacts the following virtual machines: <List of VMs>
Resolution: For each failed over virtual machine, use the Complete-VMFailover cmdlet in Windows PowerShell to remove the recovery snapshots and indicate Failover completion.
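The following sketch applies Complete-VMFailover to every failed-over VM that has not yet been completed. It assumes the ReplicationState value 'FailedOverWaitingCompletion' marks such VMs (confirm with Get-VMReplication | Get-Member and the values on your host). Both function names are hypothetical. Only complete a failover once you are sure you will not need to fail over to a different recovery point.

```powershell
# Hypothetical filter: failed-over VMs whose failover was never completed.
function Select-HvrUncompletedFailover {
    param([Parameter(ValueFromPipeline)]$Replication)
    process {
        if ($Replication.ReplicationState -eq 'FailedOverWaitingCompletion') { $Replication }
    }
}

# Hypothetical helper: complete each failover, which removes its recovery snapshots.
function Complete-HvrFailover {
    Get-VMReplication | Select-HvrUncompletedFailover | ForEach-Object {
        Complete-VMFailover -VMName $_.VMName
        $_.VMName
    }
}
```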

Rule Title: A large number of recovery points has been configured
Severity: Yellow
Category: Configuration
Issue: Hyper-V Replica has been configured to store more than nine previous recovery points.
Impact: Maintaining too many recovery points could cause the Replica server to run out of available disk space. This impacts the following virtual machines: <List of VMs>
Resolution: Review the number of recovery points configured, taking into account factors such as the number of virtual machines on the server and the oldest recovery point that is really required.
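The rules above can also be evaluated from PowerShell with the BPA cmdlets, and the last rule can be checked directly. In this sketch the function names are hypothetical, the title match is only a rough way to pick out replication-related results, and the RecoveryHistory property name is an assumption to confirm with Get-Member.

```powershell
# Hypothetical helper: run the Hyper-V BPA model and list non-compliant
# results whose title looks replication-related. Run elevated on the host.
function Get-HvrBpaFinding {
    Invoke-BpaModel -ModelId 'Microsoft/Windows/Hyper-V' | Out-Null
    Get-BpaResult -ModelId 'Microsoft/Windows/Hyper-V' |
        Where-Object { $_.Severity -ne 'Information' -and $_.Title -match 'Replica|replication|failover' } |
        Select-Object Severity, Category, Title, Resolution
}

# Hypothetical filter for the last rule: relationships that keep more than
# $Threshold additional recovery points.
function Select-HvrLargeRecoveryHistory {
    param(
        [Parameter(ValueFromPipeline)]$Replication,
        [int]$Threshold = 9
    )
    process {
        if ($Replication.RecoveryHistory -gt $Threshold) { $Replication }
    }
}
```

For example, Get-VMReplication | Select-HvrLargeRecoveryHistory lists the virtual machines to review.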

General Methodology for Troubleshooting Hyper-V Replica

Hyper-V Replica connectivity issues between Primary and Replica servers

Symptom:  Hyper-V Replica functionality is disrupted and the Hyper-V VMMS\Admin log reports general network connectivity errors between the Primary and Replica server
  1. Verify the Replica server is booted and running.
  2. Check network connectivity and name resolution between the Primary and Replica servers by running ping and nslookup tests. If the ping test fails, resolve the network connectivity issues. If name resolution fails, check DNS
  3. Ensure the Replica server is listening on the Replica Server Port. First verify that the appropriate firewall rule has been enabled, or that a custom firewall rule allows inbound communication on the configured port; then run netstat -ano on the Replica server and confirm that the port is in the LISTENING state
  4. Inspect the System Event Log on the Primary and Replica servers to determine if there is any failure condition associated with network functionality
  5. Run the Hyper-V Best Practice Analyzer (BPA) and inspect the report for any configuration or operational issues
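Steps 2 to 4 can be gathered into one PowerShell check run on the Primary server. This is a sketch: the function name is hypothetical, and the default of port 80 assumes Kerberos (HTTP) replication (certificate-based HTTPS replication typically uses 443), so pass the port you configured.

```powershell
# Hypothetical helper for steps 2 to 4, run on the Primary server.
function Test-HvrConnectivity {
    param(
        [Parameter(Mandatory)][string]$ReplicaServerName,  # server or broker CAP
        [int]$Port = 80,
        [int]$EventHours = 24
    )
    # Step 2: name resolution (nslookup) and reachability (ping).
    Resolve-DnsName -Name $ReplicaServerName
    Test-Connection -ComputerName $ReplicaServerName -Count 2
    # Step 3, seen from the Primary: does a TCP connection reach the Replica port?
    Test-NetConnection -ComputerName $ReplicaServerName -Port $Port
    # Step 4: recent errors (Level 2) in the System and Hyper-V VMMS Admin logs.
    Get-WinEvent -FilterHashtable @{
        LogName   = 'System', 'Microsoft-Windows-Hyper-V-VMMS-Admin'
        Level     = 2
        StartTime = (Get-Date).AddHours(-$EventHours)
    } -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, LogName, Id, Message
}
```

On the Replica server itself, Get-NetTCPConnection -State Listen -LocalPort 80 is a PowerShell counterpart of the netstat -ano check in step 3.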

Configuring a virtual machine for replication

Symptom:  Configuring a virtual machine for replication fails.
  1. Verify the Replica server is booted and running.
  2. Check network connectivity and name resolution between the Primary and Replica servers by running ping and nslookup tests. If the ping test fails, resolve the network connectivity issues. If name resolution fails, check DNS
  3. Ensure the Replica server is listening on the Replica Server Port and the Authentication Type is configured correctly.
  4. If the Replica server configuration matches the parameters entered in the Enable Replication wizard, verify that the firewall on the Replica server has been configured to allow inbound communication on the Replica Server Port
  5. Inspect the System Event Log on the Primary and Replica servers to determine if there is any failure condition associated with network functionality
  6. Inspect the Hyper-V VMMS\Admin Log for any events related to network connectivity on both the Primary and Replica servers
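Steps 3 and 4 can be checked from PowerShell. This sketch assumes the English display names of the inbox Hyper-V Replica firewall rules shown in the code and uses hypothetical function names. The primary-side test covers Kerberos only; certificate-based authentication would also need -CertificateThumbprint.

```powershell
# Hypothetical helper, run elevated on the Replica server.
function Test-HvrReplicaListener {
    $server = Get-VMReplicationServer
    # Step 3: compare these values with what was entered in the wizard.
    $server | Select-Object ReplicationEnabled, AllowedAuthenticationType,
        KerberosAuthenticationPort, CertificateAuthenticationPort
    # Step 4: enable the inbox firewall rule(s) for the allowed authentication type.
    $auth = "$($server.AllowedAuthenticationType)"
    if ($auth -match 'Kerberos')    { Enable-NetFirewallRule -DisplayName 'Hyper-V Replica HTTP Listener (TCP-In)' }
    if ($auth -match 'Certificate') { Enable-NetFirewallRule -DisplayName 'Hyper-V Replica HTTPS Listener (TCP-In)' }
    # Confirm that something is listening on the configured ports (like netstat -ano).
    Get-NetTCPConnection -State Listen -ErrorAction SilentlyContinue -LocalPort @(
        $server.KerberosAuthenticationPort, $server.CertificateAuthenticationPort)
}

# Hypothetical helper, run on the Primary server: test the connection the wizard uses.
function Test-HvrReplicaFromPrimary {
    param(
        [Parameter(Mandatory)][string]$ReplicaServerName,
        [int]$Port = 80
    )
    Test-VMReplicationConnection -ReplicaServerName $ReplicaServerName `
        -ReplicaServerPort $Port -AuthenticationType Kerberos
}
```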

Virtual machine Planned Failover process

A virtual machine Planned Failover process is a planned event where a running virtual machine on the Primary server is moved to a designated Replica server.

Symptom:  The "Check that virtual machine is turned off" prerequisite check fails.
  1. Ensure the virtual machine has been shut down before executing a Planned Failover to a Replica server
Symptom:  The "Check configuration for allowing reverse replication" check fails.
  1. Ensure the Primary server has also been configured as a Replica server. After a Planned Failover to a Replica server, the virtual machine uses the original Primary server as its new Replica server, and this reverse replication configuration is applied as part of the Planned Failover process
Symptom:  The "Send un-replicated data to Replica server" step fails.
  1. Verify network connectivity to the Replica server using the procedures outlined in the Hyper-V Replica connectivity issues between Primary and Replica servers section
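The Planned Failover sequence described above can be sketched with the Hyper-V cmdlets below. Assumptions: the cmdlets run locally on each server, the Primary server is already configured as a Replica server (the reverse replication prerequisite), and the function names are hypothetical.

```powershell
# Hypothetical helper, step 1 on the Primary server: shut down the VM
# (prerequisite check) and send any un-replicated data to the Replica server.
function Start-HvrPlannedFailoverOnPrimary {
    param([Parameter(Mandatory)][string]$VMName)
    Stop-VM -Name $VMName
    Start-VMFailover -VMName $VMName -Prepare
}

# Hypothetical helper, step 2 on the Replica server: fail over, reverse the
# replication direction toward the original Primary, and start the VM.
function Complete-HvrPlannedFailoverOnReplica {
    param([Parameter(Mandatory)][string]$VMName)
    Start-VMFailover -VMName $VMName
    Set-VMReplication -VMName $VMName -Reverse
    Start-VM -Name $VMName
}
```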

Configuring a virtual machine for Reverse Replication

Symptom:  Reverse Replication configuration for a virtual machine results in a failure.
  1. Verify network connectivity to the Hyper-V server being used as a Replica server using the procedures outlined in the Hyper-V Replica connectivity issues between Primary and Replica servers section

Initial Replication (IR) for a virtual machine

Symptom:  Initial Replication (IR) for a virtual machine fails.
  1. Verify network connectivity to the Replica server using the procedures outlined in the Hyper-V Replica connectivity issues between Primary and Replica servers section
  2. Ensure the protocol configuration matches between the Primary and Replica servers
  3. Verify the Primary server is authorized to replicate with the Replica server, including that the Security Tags match
  4. Ensure the Authentication method matches between the Primary and Replica servers
  5. If there is an error on the Replica server indicating insufficient storage space, verify that there is sufficient space available on the drive hosting the virtual machine replica file(s). If there is not, add storage space
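Steps 2 to 4 amount to comparing the virtual machine's replication settings with what the Replica server accepts. The sketch below puts them side by side. It assumes remote Hyper-V management of the Replica server (-ComputerName) is allowed, the property names are assumptions to confirm with Get-Member, and the function name is hypothetical.

```powershell
# Hypothetical helper, run on the Primary server.
function Compare-HvrReplicationSettings {
    param([Parameter(Mandatory)][string]$VMName)
    $vmRepl  = Get-VMReplication -VMName $VMName
    $replica = $vmRepl.ReplicaServer
    $server  = Get-VMReplicationServer -ComputerName $replica
    [pscustomobject]@{
        VMName                    = $VMName
        ReplicaServer             = $replica
        VMAuthenticationType      = $vmRepl.AuthenticationType
        VMReplicaServerPort       = $vmRepl.ReplicaServerPort
        ReplicaReplicationEnabled = $server.ReplicationEnabled
        ReplicaAllowedAuthType    = $server.AllowedAuthenticationType
        ReplicaKerberosPort       = $server.KerberosAuthenticationPort
        ReplicaCertificatePort    = $server.CertificateAuthenticationPort
        # Allowed primaries, storage locations and trust groups (Security Tags).
        AuthorizationEntries      = @(Get-VMReplicationAuthorizationEntry -ComputerName $replica)
    }
}
```

Once the settings match, Measure-VMReplication -VMName <name> shows the replication state and progress.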

Delta Replication (DR) for a virtual machine

Symptom:  Delta Replication (DR) for a virtual machine fails
  1. Verify network connectivity to the Replica server using the procedures outlined in the Hyper-V Replica connectivity issues between Primary and Replica servers section
  2. Ensure the protocol configuration matches between the Primary and Replica servers
  3. Verify the Primary server is authorized to replicate with the Replica server
  4. Ensure the Authentication method matches between the Primary and Replica server
  5. Check for any error(s) on the Replica server indicating there is insufficient storage space available to host the virtual machine replica files
  6. Check for any error(s) on the Replica server indicating the virtual machine files could not be located
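To narrow down which virtual machines are affected, the sketch below selects relationships whose health is Warning or Critical, shows their replication statistics, and lists recent errors from the VMMS Admin log. It assumes the ReplicationHealth property name; both function names are hypothetical.

```powershell
# Hypothetical filter: replication relationships reporting Warning or Critical health.
function Select-HvrUnhealthyReplication {
    param([Parameter(ValueFromPipeline)]$Replication)
    process {
        if ("$($Replication.ReplicationHealth)" -in 'Warning', 'Critical') { $Replication }
    }
}

# Hypothetical helper: statistics for unhealthy VMs, then recent VMMS errors
# (for example, insufficient storage or VM files that cannot be located).
function Get-HvrDeltaReplicationReport {
    Get-VMReplication | Select-HvrUnhealthyReplication | ForEach-Object {
        Measure-VMReplication -VMName $_.VMName
    }
    Get-WinEvent -FilterHashtable @{
        LogName = 'Microsoft-Windows-Hyper-V-VMMS-Admin'
        Level   = 2
    } -MaxEvents 50 -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, Message
}
```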
Symptom:  Application-consistent replicas are not generated by the Primary server and replicated to the Replica server
  1. Verify the virtual machine has been configured to replicate application-consistent replicas to the Replica server
  2. Verify the Integration Services version of the Guest matches what is installed in the Host (if there is a mismatch, a Warning message will be registered in the Hyper-V-Integration Admin log)
  3. Check the virtual machine Integration Services and verify the Backup (Volume snapshot) integration component is enabled in the Guest
  4. Review the System event log in the Guest and determine whether there are any errors pertaining to the Volume Shadow Copy Service (VSS)
  5. Test VSS in the Guest by executing a backup of the operating system
  6. Execute a backup on the Hyper-V host and verify the Guest can be backed up
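Steps 1 to 3 can be checked from the Hyper-V host. In this sketch the function name is hypothetical; it assumes the VSSSnapshotFrequencyHour property name on the replication object and that the backup integration service name starts with "Backup" ("Backup (volume snapshot)" in this release). Steps 4 to 6 still have to be done inside the guest and with a backup application.

```powershell
# Hypothetical helper for steps 1 to 3, run on the Hyper-V host.
function Test-HvrAppConsistency {
    param([Parameter(Mandatory)][string]$VMName)
    # Step 1: a non-zero value means application-consistent points are taken every N hours.
    Get-VMReplication -VMName $VMName | Select-Object VMName, VSSSnapshotFrequencyHour
    # Step 2: integration services version reported for the VM.
    Get-VM -Name $VMName | Select-Object Name, IntegrationServicesVersion
    # Step 3: enable the backup (VSS) integration service if it is disabled.
    $backup = Get-VMIntegrationService -VMName $VMName | Where-Object { $_.Name -like 'Backup*' }
    if ($backup -and -not $backup.Enabled) { $backup | Enable-VMIntegrationService }
    # Re-read the service to show its current state.
    Get-VMIntegrationService -VMName $VMName | Where-Object { $_.Name -like 'Backup*' } |
        Select-Object VMName, Name, Enabled, PrimaryStatusDescription
}
```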

Replication Broker issues

Symptom:  When enabling a virtual machine for replication, a connection cannot be made to the Client Access Point (CAP) used by the Hyper-V Replica Broker in the cluster.
  1. Ensure all the resources supporting the Hyper-V Replica Broker are Online in the cluster. If any resource in the group has failed, troubleshoot the failure using standard Failover Cluster troubleshooting procedures
  2. Move the resource group containing the Hyper-V Replica Broker to another node in the cluster, then attempt to enable replication for a virtual machine using the Client Access Point of the Hyper-V Replica Broker
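Both steps can be run from PowerShell on any node of the cluster that hosts the broker. This sketch assumes the broker role (cluster group) has the same name as its Client Access Point, as in the configuration script later in this section; the function name and node names are hypothetical.

```powershell
# Hypothetical helper, run on a node of the cluster hosting the broker.
function Test-HvrBroker {
    param(
        [Parameter(Mandatory)][string]$BrokerGroupName,  # e.g. 'R-Broker-CAP'
        [string]$TargetNode                              # e.g. 'R1'
    )
    # Step 1: every resource in the broker role should be Online.
    Get-ClusterGroup -Name $BrokerGroupName | Get-ClusterResource |
        Select-Object Name, ResourceType, State, OwnerNode
    # Step 2: optionally move the role to another node, then retry enabling replication.
    if ($TargetNode) {
        Move-ClusterGroup -Name $BrokerGroupName -Node $TargetNode
    }
}
```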

Guest IP functionality

Symptom:  After initiating a Failover for a virtual machine, the configured Failover TCP/IP settings for the virtual machine in the Replica server are not implemented and a connection to the virtual machine cannot be made.
  1. Ensure the Integration Components in the virtual machine have been updated.  This problem could occur in down-level operating systems running in a virtual machine on a Windows Server 2012 Hyper-V server
  2. Check the Hyper-V-Integration\Admin event log for an Event ID: 4010 Warning message reporting a problem with the Hyper-V Data Exchange functionality with the virtual machine experiencing this problem.  Additionally, an Event ID: 4132 Error message will be recorded indicating a problem applying IP settings to a network adapter in the virtual machine experiencing this problem
  3. Update the Integration Components in the virtual machine
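The events from step 2 and the failover IP settings configured for the virtual machine can be collected together. This is a sketch with a hypothetical function name; it reads the Hyper-V-Integration Admin log on the server where the failover took place.

```powershell
# Hypothetical helper for step 2, run on the Replica server after failover.
function Get-HvrGuestIPDiagnostics {
    param(
        [Parameter(Mandatory)][string]$VMName,
        [int]$Hours = 24
    )
    # Event ID 4010 (data exchange warning) and 4132 (IP settings error).
    Get-WinEvent -FilterHashtable @{
        LogName   = 'Microsoft-Windows-Hyper-V-Integration-Admin'
        Id        = 4010, 4132
        StartTime = (Get-Date).AddHours(-$Hours)
    } -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, LevelDisplayName, Message
    # The Failover TCP/IP settings that should have been applied in the guest.
    Get-VMNetworkAdapterFailoverConfiguration -VMName $VMName
}
```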

Why is the “Hyper-V Replica Broker” required?

Hyper-V Replica requires the Failover Clustering role Hyper-V Replica Broker to be configured if either the primary or the replica Hyper-V server is part of a cluster. This section explains why the broker is required and describes its high-level behavior.

The following example is used throughout the rest of this section:

  • Cluster-P: Failover Cluster in city 1
  • P1, P2, P3 (.contoso.com): names of the cluster nodes on Cluster-P
  • P-Broker-CAP.contoso.com: the client access point of the broker on Cluster-P
  • VirtualMachine_Workload: the name of the virtual machine running on Cluster-P
  • Cluster-R: Failover Cluster in city 2
  • R1, R2 (.contoso.com): names of the cluster nodes on Cluster-R
  • R-Broker-CAP.contoso.com: the client access point of the broker on Cluster-R

Unified View

  • On Cluster-R, P-Broker-CAP.contoso.com is added to the list of authorized servers rather than adding P1, P2, and P3 (.contoso.com) individually.
  • When enabling replication for any virtual machine on the primary server, the client access point of the broker on the replica cluster (R-Broker-CAP.contoso.com) is used, not an individual replica server name.
  • When a replicating virtual machine migrates within Cluster-P, the destination server is automatically authorized to send replication traffic.
  • When new nodes are added to Cluster-P, no change is required to the replication settings (specifically the authorization table) on Cluster-R.

Initial Node placement

  • When replication is enabled for the primary virtual machine, the primary server contacts R-Broker-CAP.
  • The request is authenticated and authorized. R-Broker-CAP then picks a random node from its cluster, Cluster-R, after validating that the node is available and that the Virtual Machine Management Service is running. It returns the node name (for example, R2.contoso.com) to the primary server.
  • The primary server now starts replicating to this node (R2.contoso.com).

Making the replica virtual machine highly available

As part of creating the replica virtual machine, the Hyper-V Replica Broker also makes the virtual machine highly available. If the node hosting it crashes, the Failover Cluster service moves the replica virtual machine to another node, protecting both the replica virtual machine and the replication process from host crashes on Cluster-R.

Redirect traffic in case replica virtual machine migrates

  • If the replica virtual machine migrates from one node (for example, R1.contoso.com) to another (for example, R2.contoso.com), the primary server falls back to the broker, R-Broker-CAP, with the question "where is the replica for the virtual machine VirtualMachine_Workload?"
  • The broker locates the virtual machine in the cluster and returns the node name (R2.contoso.com) to the primary server.
  • The primary server sends its subsequent requests to R2.contoso.com, and replication is re-established with no manual intervention.
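To see the answer the broker would give, the owning node of a replica virtual machine can be looked up on Cluster-R. This sketch assumes the virtual machine's cluster role has the same name as the virtual machine; the function name is hypothetical.

```powershell
# Hypothetical helper, run on any Cluster-R node: which node currently owns
# the replica VM, and what replication state does that node report?
function Get-HvrReplicaOwnerNode {
    param([Parameter(Mandatory)][string]$VMName)  # e.g. 'VirtualMachine_Workload'
    $group = Get-ClusterGroup -Name $VMName
    $owner = $group.OwnerNode.Name
    [pscustomobject]@{
        VMName           = $VMName
        OwnerNode        = $owner
        RoleState        = $group.State
        ReplicationState = (Get-VMReplication -ComputerName $owner -VMName $VMName).ReplicationState
    }
}
```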

Provide centralized management of the replication settings

  • For a cluster on the replica site, replication settings are configured in Replication Settings, which is available from the Broker role in the Failover Cluster Manager console.
  • The Broker role writes the replication configuration to the cluster database and triggers a notification.
  • The Virtual Machine Management Service on each node picks up the configuration, so every node works with the latest copy of the replication settings.
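One way to confirm that every node has picked up the broker's settings is to read them from each node and compare. The sketch below does this; the function names are hypothetical and only three settings are compared.

```powershell
# Hypothetical helper, run on any Cluster-R node: the replication settings
# that each node's Virtual Machine Management Service is using.
function Get-HvrNodeReplicationSettings {
    foreach ($node in (Get-ClusterNode)) {
        $s = Get-VMReplicationServer -ComputerName $node.Name
        [pscustomobject]@{
            Node                            = $node.Name
            ReplicationEnabled              = $s.ReplicationEnabled
            AllowedAuthenticationType       = "$($s.AllowedAuthenticationType)"
            ReplicationAllowedFromAnyServer = $s.ReplicationAllowedFromAnyServer
        }
    }
}

# Hypothetical check: $true when every node reports the same values.
function Test-HvrSettingsConsistent {
    param([Parameter(Mandatory)][object[]]$Settings)
    $keys = $Settings | ForEach-Object {
        '{0}|{1}|{2}' -f $_.ReplicationEnabled, $_.AllowedAuthenticationType, $_.ReplicationAllowedFromAnyServer
    }
    @($keys | Sort-Object -Unique).Count -le 1
}
```

For example: Test-HvrSettingsConsistent -Settings @(Get-HvrNodeReplicationSettings).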

Configure the Broker using PowerShell cmdlets

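Run the following cmdlets on a node of the failover cluster to configure the broker. The example uses the broker name for Cluster-P; use R-Broker-CAP on Cluster-R.

```powershell
# Name of the broker's Client Access Point (also the name of its cluster role).
$BrokerName = "P-Broker-CAP"
# Create the role and its network name. Add -StaticAddress <IP> if the
# cluster network does not use DHCP.
Add-ClusterServerRole -Name $BrokerName
# Add the broker resource to the role.
Add-ClusterResource -Name "Virtual Machine Replication Broker" -Type "Virtual Machine Replication Broker" -Group $BrokerName
# The broker resource depends on the role's network name resource.
Add-ClusterResourceDependency "Virtual Machine Replication Broker" $BrokerName
# Bring the role online.
Start-ClusterGroup $BrokerName
```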