Event ID 4010 from DFSR


What does this mean?

Log Name: DFS Replication
Source: DFSR
Date: 26/07/2025 01:18:04
Event ID: 4010
Task Category: None
Level: Information
Keywords: Classic
User: N/A
Computer: DC2.domain.local
Description:
The DFS Replication service detected that the replicated folder at local path C:\Windows\SYSVOL\domain has been removed from configuration.

Additional Information:
Replicated Folder Name: SYSVOL Share

Replication Group Name: Domain System Volume
Event XML (raw field values): 4010 4 0 0x80000000000000 13036 DFS Replication DC2.domain.local 2FD90BEE-E58C-4AA0-9DB7-78490E5447FF C:\Windows\SYSVOL\domain SYSVOL Share Domain System Volume 2DD7A27C-F9AB-4353-B933-53123F581E4C 9C7B7550-AB54-478E-A421-5F8731DF77D5

This Event ID 4010 from DFSR (Distributed File System Replication) means:

The DFS Replication service detected that the replicated folder at local path C:\Windows\SYSVOL\domain has been removed from configuration.


Interpretation

This message is informational, not an error, and typically means that:

  • SYSVOL replication on this domain controller (DC2) is no longer using DFS-R for the C:\Windows\SYSVOL\domain path.
  • The folder has been removed from the DFS-R configuration, either because:
    • The DC has been demoted (or is in the process of being demoted)
    • You’re in the process of migrating SYSVOL replication from DFS-R back to FRS (unlikely, as FRS is deprecated)
    • Or, more likely, you’re migrating to DFS-R and this is part of the transition (e.g. the state moving from Prepared → Redirected → Eliminated)

If This is Expected

If you’re in the process of SYSVOL migration (e.g. using dfsrmig.exe), this is normal when moving past the Redirected stage to Eliminated:

Migration stage – Meaning:

  • 0 – Start: Using FRS
  • 1 – Prepared: Preparing for DFS-R
  • 2 – Redirected: DFS-R active, but FRS still present
  • 3 – Eliminated: FRS eliminated, DFS-R is the sole replicator

During or after stage 3, DFS-R removes the legacy configuration – this message just confirms that.


If This Was Not Expected

If you’re not migrating SYSVOL or not demoting the DC, then this message suggests:

  • A misconfiguration or corruption of DFS-R
  • Or someone/something has removed SYSVOL replication group membership from this server

That would be serious – SYSVOL contains group policies and logon scripts. Without it, the DC won’t serve them.


What to Check

  1. Run dfsrmig /getmigrationstate on DC2 – this tells you what stage of the SYSVOL replication migration you’re in (a combined PowerShell sketch follows this list).
  2. Check DFS-R Membership: open the DFS Management console (dfsmgmt.msc) or use: Get-DfsrMembership -GroupName "Domain System Volume"
  3. Check SYSVOL Status: look in the Event Viewer under: Applications and Services Logs > DFS Replication
  4. Check whether C:\Windows\SYSVOL\domain is present. If the folder is gone or empty, replication might have failed.
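A minimal PowerShell sketch that runs these checks in one pass (run elevated on DC2; assumes the DFSR module that ships with the DFS Management tools, and the group/folder names shown in the event above):

dfsrmig /getmigrationstate
Get-DfsrMembership -GroupName "Domain System Volume" -ComputerName DC2 |
    Select-Object GroupName, ComputerName, FolderName, ContentPath
Test-Path 'C:\Windows\SYSVOL\domain'          # does the replicated folder still exist on disk?
Get-WinEvent -LogName 'DFS Replication' -MaxEvents 20 |
    Select-Object TimeCreated, Id, LevelDisplayName, Message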

Recovery Steps (if accidental or failed):

  • If you’re not done with dfsrmig, consider rolling back: dfsrmig /setglobalstate 2
  • If the DC is broken, consider an authoritative SYSVOL restore, or force a re-sync from a healthy DC as part of a proper restore procedure (not a blind copy): robocopy \\HealthyDC\C$\Windows\SYSVOL\domain C:\Windows\SYSVOL\domain /MIR
  • Rejoin to the replication group using DFSR admin tools if needed.

Could Raising the Domain Functional Level to 2012 R2 Cause this?

Raising the Domain Functional Level (DFL) to Windows Server 2012 R2 does not directly cause Event ID 4010 or the removal of the DFS Replication configuration for C:\Windows\SYSVOL\domain.

However – and this is the important caveat – what often happens around the same time can cause it.


🎯 Here’s the Truth:

Raising the Domain Functional Level to 2012 R2 does not automatically:

  • Remove the DFSR configuration
  • Modify or eliminate the SYSVOL replication group

BUT:

⚠️ What might be causing the event:

1. SYSVOL Migration Was Initiated

If the domain was still using FRS for SYSVOL, and you raised the DFL to 2012 R2, then DFSRMIG becomes a requirement.

FRS for SYSVOL was deprecated starting with Windows Server 2012 R2 — so any domain at or above this DFL must use DFS-R.

So if SYSVOL was still using FRS, and you manually or automatically initiated a DFSRMIG process to comply, then yes, the DFSR 4010 message can appear during or after:

  • dfsrmig /setglobalstate 2 (Redirected)
  • dfsrmig /setglobalstate 3 (Eliminated)

In this case, 4010 just means the old config was cleaned up, and that’s expected.

2. A Demotion or Role Change Occurred

If the server in question (DC2) was:

  • Demoted
  • Had AD DS removed
  • Lost membership in the replication group during role changes

Then you’d see this too.


✅ Confirm Your Current SYSVOL State

Run this:

dfsrmig /getglobalstate
dfsrmig /getmigrationstate

And check the SYSVOL folder on other DCs:

  • Is SYSVOL present and populated?
  • Is the Domain System Volume replication group present in DFSR console?
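A quick way to confirm the replication group and its membership from PowerShell (assumes the DFSR module is installed; group name as in the event above):

Get-DfsReplicationGroup -GroupName "Domain System Volume"
Get-DfsrMember -GroupName "Domain System Volume" | Select-Object ComputerName, DnsName, Description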

🧭 Conclusion

  • Raising the DFL to 2012 R2 doesn’t directly cause Event 4010.
  • But if it triggered or was part of a migration from FRS to DFS-R, then yes, the event is expected and harmless.
  • If you’re already on DFS-R, then 4010 suggests this server’s DFSR config was cleaned up — potentially from a migration finalization, a demotion, or an accidental removal.

But we Dumped FRS Ages Ago

Thanks for the clarification – since FRS was dumped long ago, and you’ve already been using DFS-R, then raising the Domain Functional Level to 2012 R2 should not normally trigger Event ID 4010.


What the 4010 Really Means in This Case

Given you were already running DFS-R, this event:

“The replicated folder at local path C:\Windows\SYSVOL\domain has been removed from configuration.”

…means that for some reason, DC2 has had its DFS-R SYSVOL configuration removed, which is not expected from a DFL raise alone.

So the cause is likely one of:


Possible Causes (Given DFS-R was already in use)

Cause – Description:

  • DC2 was removed or re-added as a replication member – if it was briefly demoted, had its metadata cleaned, or was restored from backup, the replication group membership can reset
  • DFS Replication service misfired – rare but possible; config corruption or partial loss of replication group membership
  • Manual misconfiguration – an admin or script unintentionally altered the DFS-R configuration on this DC
  • Staging folder cleanup / misapplied policy – an aggressive GPO, security tooling, or a scheduled cleanup may have interfered with DFS-R
  • DFSR internal DB rebuild or replication journal issue – in some edge cases, DFS-R will rebuild or drop a folder if journal wrap or database issues occur

What You Should Check Now

  1. Check DFSR membership on other DCs: Get-DfsrMember -GroupName "Domain System Volume" – does DC2 appear as a member? (See also the AD cross-check after this list.)
  2. Check whether DC2 has SYSVOL available and healthy:
    • Navigate to: C:\Windows\SYSVOL\domain
    • Check for your policies: C:\Windows\SYSVOL\domain\Policies
    • Is it empty or populated?
  3. Check the replication connections for DC2: Get-DfsrConnection -GroupName "Domain System Volume" -DestinationComputerName DC2 (and again with -SourceComputerName DC2 for the outbound side)
    • Are there inbound/outbound partners listed?
  4. Check replication health: dfsrdiag backlog /rgname:"Domain System Volume" /rfname:"SYSVOL Share" /sendingmember:<OtherDC> /receivingmember:DC2 – this will tell you if it’s lagging or out of sync.
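As a cross-check, each DC’s DFS-R SYSVOL subscription lives in AD under its computer object; if that object is missing for DC2, a 4010 like the one above is exactly what DFSR logs. A sketch using the ActiveDirectory module (an error here usually means the DFSR-LocalSettings container itself is gone):

$dc = Get-ADComputer DC2
Get-ADObject -SearchBase "CN=DFSR-LocalSettings,$($dc.DistinguishedName)" `
    -LDAPFilter "(objectClass=msDFSR-Subscription)" `
    -Properties 'msDFSR-RootPath', 'msDFSR-Enabled' |
    Select-Object Name, 'msDFSR-RootPath', 'msDFSR-Enabled'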

If It’s Broken

If DC2 is not replicating SYSVOL anymore, you’ll need to re-add it to the replication group using the DFS Management console (dfsmgmt.msc) or:

Add-DfsrMember -GroupName "Domain System Volume" -ComputerName DC2

Then manually replicate or sync SYSVOL from a healthy DC.

If C:\Windows\SYSVOL\domain is missing content, do not just copy it blindly – follow safe recovery practices (I can walk you through that if needed).


One of the DCs and the new Exchange Server are Backing-up but the VMs are Powered-off???

Ah – now that is not normal, and it sounds like a hypervisor-level backup process is in play here. Here’s what it likely means, what could be going wrong, and what you should immediately check.


What’s Probably Happening

When you say:

“one of the DCs and the new Exchange server are backing up but the VMs are powered off”

This strongly suggests you’re using host-level VM backups – likely through Hyper-V or VMware, possibly using tools like:

  • Windows Server Backup
  • Veeam
  • Altaro
  • DPM
  • Nakivo
  • or something similar

These tools can back up a VM’s VHDX or VMDK even if the VM is powered off, or even force a backup by triggering a checkpoint/snapshot – and that’s not always safe, especially for:

  • Domain Controllers (due to the USN rollback risk)
  • Exchange Servers (due to EDB/database integrity)

What to Check Immediately

1. Hyper-V Checkpoint State

In PowerShell:

Get-VM | Select-Object Name, State, CheckpointType, AutomaticCheckpointsEnabled

Or in Hyper-V Manager:

  • Are there checkpoints (snapshots) listed under the affected VMs?
  • If so, that’s a red flag – especially if they are lingering.
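To enumerate any checkpoints across all VMs in one pass (run on the Hyper-V host):

Get-VM | Get-VMSnapshot | Select-Object VMName, Name, SnapshotType, CreationTime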

2. Backup Tool Logs

  • Go to the logs for whatever backup solution you’re using.
  • Check:
    • Was the backup scheduled while the VM was already powered off?
    • Did the backup tool power off the VM first? (some do this to avoid VSS inconsistencies)
    • Are any errors shown related to Exchange VSS writer, or NTDS VSS writer?

3. Domain Controller Integrity

On the affected DC (when powered back on):

repadmin /showrepl

Check for:

  • Backlog
  • Last successful replication
  • USN rollback warnings

Also check:

dcdiag /v /c /e /f:dcdiag.txt

This will log a full report – look for replication, DNS, or time sync failures.

4. Exchange Health After Power On

When the Exchange VM is brought back up:

Get-MailboxDatabaseCopyStatus *

Then run:

Test-ServiceHealth

Look for any missing or failed services, particularly:

  • MSExchangeIS
  • MSExchangeTransport
  • MSExchangeMailboxAssistants
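A quick way to check those services on the Exchange server (service names as listed above):

Get-Service MSExchangeIS, MSExchangeTransport, MSExchangeMailboxAssistants |
    Select-Object Name, Status, DisplayName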

Why This Is Dangerous

  1. Domain Controllers should never be restored from snapshots or backed up improperly, unless:
    • You use VM-Generation ID-aware hypervisors (2012+), and
    • You don’t restore old checkpoints (instant clone = rollback)
  2. Exchange databases can be corrupted if you back up the VM from the outside while it’s live – unless:
    • The backup tool uses VSS Exchange-aware components
    • You have proper application-aware processing enabled

Suggested Next Steps

  1. Power on the VMs and run full health checks as above
  2. Verify Active Directory replication is functioning across DCs
  3. Test Exchange services and mail flow
  4. Review your backup tool settings:
    • Is it set to “application-aware image processing”?
    • Is it configured to quiesce the guest OS before backup?
    • Is it forcing shutdowns because of failed VSS inside the VM?

If you give me:

  • The backup software name
  • The virtualization platform (e.g. Hyper-V, VMware, etc.)
  • The exact backup settings or job type (full image, guest-level, snapshot-only)

…then I can pin down exactly why those VMs were powered off at all.

Windows Backup, but why are they Powered-down??

If you’re using Windows Server Backup (WSB) and your VMs are being powered off during backup, that is not normal behavior – unless something is failing badly inside the VM, particularly with VSS (the Volume Shadow Copy Service) or integration services.

Let’s break it down methodically.


Why Would a VM Be Powered Off During Backup?

1. Backup is Host-Level, but Guest VSS Failed

When using WSB to back up Hyper-V VMs, it tries to:

  • Use online backup via the Hyper-V VSS Writer
  • Trigger guest VSS writers inside each VM (Exchange, NTDS, etc.)
  • If integration services or VSS inside the VM fails, WSB may fall back to an offline backup

Offline backup = suspend the VM or turn it off briefly to get a consistent VHDX snapshot

This is especially risky with Exchange and Domain Controllers.


2. VM Lacks or Has Broken Integration Services

Windows Backup relies on Hyper-V Integration Services inside the VM for proper online backup.

If they’re:

  • Missing
  • Disabled
  • Or broken

Then the VM will fail to quiesce, and the host will either fail the backup or force it offline (e.g. shutdown or saved state).


3. System Time Misconfiguration or VSS Freeze Failure

  • If the VM has time sync issues, VSS might misfire
  • If the guest OS has stale VSS snapshots, VSS freeze timeouts, or crashing Exchange/NTDS VSS writers, the online backup mode will fail

And again, WSB might choose to save or power down the VM to complete the backup forcibly.


What You Should Do Right Now

1. Check the VM Integration Settings

On the host, run in PowerShell:

Get-VMIntegrationService -VMName "YourVMName"

You should see:

  • Operating System Shutdown – Enabled: True
  • Time Synchronization – Enabled: True
  • Data Exchange – Enabled: True
  • Backup (volume snapshot) – Enabled: True

Make sure Backup (volume snapshot) is enabled.
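If it is disabled, it can be re-enabled from the host (the VM name is a placeholder):

Enable-VMIntegrationService -VMName "YourVMName" -Name "Backup (volume snapshot)"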


2. Check VSS Writer Health in the Guest VM

On each affected VM (especially DC and Exchange), run:

vssadmin list writers

You should see all writers as:

State: Stable
Last Error: No error

If you see anything like Failed, Timed out, or Retryable error, that’s your smoking gun.
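To trim the output down to the interesting lines:

vssadmin list writers | Select-String "Writer name", "State", "Last error"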


3. Look in Host Event Logs:

Event Viewer > Applications and Services Logs > Microsoft > Windows > Hyper-V-VMMS > Admin

  • Look for entries saying Taking offline backup of VM, or errors referencing VSS or snapshot timeout
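The same log can be queried from PowerShell; the channel name below is the usual one for the VMMS admin log (confirm with Get-WinEvent -ListLog *Hyper-V-VMMS* on your host):

Get-WinEvent -LogName "Microsoft-Windows-Hyper-V-VMMS-Admin" -MaxEvents 50 |
    Where-Object LevelDisplayName -ne "Information" |
    Select-Object TimeCreated, Id, LevelDisplayName, Message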

4. Review Windows Server Backup Logs

Event Viewer > Applications and Services Logs > Microsoft > Windows > Backup > Operational

Look for:

  • “Backup started”
  • “Backup failed”
  • “Snapshot failed”
  • “VM taken offline”

These will explain why it powered the VM down.


Solution Strategy

If VMs are being powered off:

  • Ensure Hyper-V Integration Services are up-to-date
  • Verify all VSS writers are healthy inside the guest
  • Run sfc /scannow and DISM /Online /Cleanup-Image /RestoreHealth inside each VM
  • Manually remove stuck shadow copies:
vssadmin delete shadows /all
  • Consider switching to a backup solution that properly supports Exchange and DCs, like Veeam Free, which won’t power off VMs unless truly necessary

This is not Normal! – How could they be Turned-off by Windows Backup?

You’re absolutely right – this is not normal, and Windows Server Backup (WSB) is not supposed to power off VMs during backup under normal circumstances.

But here’s the cold, mechanical truth:


Windows Backup Can Force Offline Backup – Including Save State or Power Off – If Things Go Wrong

It doesn’t want to – but if it can’t do an online backup, then the Hyper-V host, in conjunction with WSB + Hyper-V VSS writer, may:

  1. Save the VM (into saved state) – this is typical for Linux or unsupported guests
  2. Force shutdown (rare, but possible under corrupt or severely misconfigured conditions)

This happens if:

  • The VM doesn’t respond to a freeze request (VSS failure inside the VM)
  • Backup integration services are missing or off
  • Or VSS inside the guest fails to coordinate with the host snapshot

In those cases, Hyper-V can (based on policy) either:

  • Fail the backup
  • Or downgrade to offline – which in some edge cases means save state, or in worst-case edge conditions, shutdown.

Windows Backup doesn’t expose this clearly. But the Hyper-V VSS subsystem chooses how to get a consistent snapshot.


What Could Trigger Forced Power Off?

Situation – What happens:

  • No Integration Services – Hyper-V can’t coordinate with the guest, so the backup goes offline
  • VSS writers in the guest failing – the host never gets a successful freeze, so it falls back
  • Timeouts from the guest – the backup process aborts the online snapshot attempt
  • Exchange or DC roles inside the VM + poor VSS handling – VSS tries to flush logs and the snapshot times out
  • System resource pressure / IO bottleneck – the snapshot hangs and WSB cancels it abruptly
  • VHDX file locks or inconsistent state – the host fails to checkpoint and may shut down the VM to proceed

All of this can result in WSB saying: “fine – do it offline.” And Hyper-V says: “offline = save or shut down.”


Look for Evidence

1. Check Hyper-V logs:

Event Viewer > Applications and Services Logs > Microsoft > Windows > Hyper-V-VMMS > Admin

Look for events like:

  • The virtual machine was saved to complete a backup operation.
  • The virtual machine was turned off to complete a backup.
  • VSS snapshot could not be created.

2. Check WSB logs:

Applications and Services Logs > Microsoft > Windows > Backup > Operational

Search for:

  • Backup started
  • Backup completed with warnings
  • Online backup failed - attempting offline
  • Snapshot operation failed

Fix This Now

Step-by-step:

  1. On Each VM:
    • Run vssadmin list writers and confirm all writers are stable.
    • Then run Get-VMIntegrationService -VMName "<VMName>" from the host and make sure Backup (volume snapshot) is enabled.
  2. Inside Each Guest VM:
    • Run sfc /scannow and then DISM /Online /Cleanup-Image /RestoreHealth to clear out any system corruption that could trip up VSS.
  3. Remove Stale Snapshots (inside and out): on the host, run vssadmin delete shadows /all
  4. Optional: If the VMs are critical (Exchange/DC), switch to a smarter backup tool like Veeam Community Edition or Altaro, which handle Exchange and AD-aware backups more safely.

Final Thought

Windows Server Backup isn’t “designed” to power off VMs – but it can, if all else fails, and you’re backing up at the host level without proper guest cooperation.

This is why best practice for backing up Domain Controllers and Exchange in VMs is:

  • Use guest-level backup with VSS-aware agent
  • Or use a Hyper-V backup tool that fully respects Exchange/AD semantics

They weren’t Shut Down – they just Got Killed!

Ah. That’s even worse – they weren’t shut down cleanly or saved; they were forcibly terminated. That should never happen – not under any acceptable conditions from Windows Server Backup, Hyper-V, or any well-behaved system.

Let’s call it what it is:

Your VMs were killed – like power pulled from a running machine. That’s not a backup – that’s a virtual execution.


Possible Causes (Ordered from Most to Least Likely)

1. Critical Snapshot Failure + VSS Stall + Hyper-V Panic

  • WSB triggers a VSS snapshot on the host for the VM
  • The VM doesn’t respond, or VSS inside the VM fails or freezes
  • Hyper-V cannot flush buffers or create checkpoints
  • Host runs out of patience or ends up in a corrupted state → Hyper-V terminates the VM process

This is rare – but it has been seen when:

  • VHDX is on slow or unstable storage (e.g. USB or poor RAID)
  • VSS writers are hung
  • IO subsystem is unstable

2. Hyper-V Worker Process Crash (vmwp.exe)

Each VM runs under a Hyper-V Worker Process (vmwp.exe).

If:

  • There’s a bug in the integration services
  • Or a timeout/deadlock
  • Or memory corruption
  • Or storage device failure / driver panic

Then vmwp.exe can crash. And when it crashes – your VM just disappears. No shutdown, no save, just gone.

Check:

Event Viewer > Windows Logs > Application

Look for crashes from:

  • vmwp.exe
  • Virtualization Infrastructure Driver
  • Virtual Disk Service

Also check:

Applications and Services Logs > Microsoft > Windows > Hyper-V-Worker > Admin

3. Storage Backend Failure or Pause

If the host loses contact with storage (e.g. temporary dismount, disk timeout, controller glitch), then:

  • The VM may be unable to commit IO
  • Hyper-V may kill the VM to prevent corruption

Especially dangerous with dynamically expanding VHDX on flaky disks (USB, external SATA, failing SSDs).


What To Do Right Now

1. Check Crash Source

Look in:

Event Viewer > Windows Logs > System

Filter by:

  • Event ID 6008 – Unexpected shutdown
  • Hyper-V-VMMS
  • Hyper-V-Worker
  • vmwp.exe crash

And check:

Get-EventLog -LogName System -EntryType Error -Newest 100

Look around the exact timestamp they went down.
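To zero in on the window when the VMs died, a time-bounded query is often easier than -Newest (the timestamps are placeholders):

$start = Get-Date '2025-07-26 01:00'
$end   = Get-Date '2025-07-26 02:00'
Get-WinEvent -FilterHashtable @{ LogName = 'System'; StartTime = $start; EndTime = $end } |
    Where-Object { $_.Id -eq 6008 -or $_.ProviderName -like 'Microsoft-Windows-Hyper-V*' } |
    Select-Object TimeCreated, Id, ProviderName, Message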


2. Check VSS History

On the host and guest:

vssadmin list shadows
vssadmin list writers

And review:

Applications and Services Logs > Microsoft > Windows > Backup > Operational

3. Run Storage Diagnostics

If you’re using a USB drive, aging RAID, or unreliable disk backend:

  • Run chkdsk on host volumes
  • Run SMART diagnostics
  • Run sfc /scannow and DISM on host

Also check that:

  • You’re not running backup and defrag or dedup at the same time
  • The host has enough RAM and disk space to hold snapshots

Prevention Strategy

  1. Never rely on WSB alone for Exchange/DCs in VMs
  2. Reduce checkpoint surprises on critical VMs: disable automatic checkpoints with Set-VM -Name "YourVMName" -AutomaticCheckpointsEnabled $false and, on Server 2016 or later, consider Set-VM -Name "YourVMName" -CheckpointType ProductionOnly so backups cannot fall back to standard (saved-state) checkpoints
  3. Use better tooling – Veeam Community Edition is free and far safer
  4. Run regular Get-VMIntegrationService checks
  5. Isolate backup I/O to a dedicated volume or interface, especially on low-end hardware

Would you like a PowerShell script to monitor VM state and log any unexpected stops or kills with timestamps – perhaps even send an alert email when it happens? You shouldn’t have to find out after the fact.
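As a starting point, here is a minimal polling sketch of that kind of monitor (the interval, log path, and alerting are placeholders to adapt; run it on the Hyper-V host):

# Poll VM state every 60 seconds and log any state change with a timestamp
$logFile  = 'C:\Scripts\vm-state-monitor.log'   # hypothetical path
$previous = @{}
while ($true) {
    foreach ($vm in Get-VM) {
        if ($previous.ContainsKey($vm.Name) -and $previous[$vm.Name] -ne $vm.State) {
            "$(Get-Date -Format o)  $($vm.Name): $($previous[$vm.Name]) -> $($vm.State)" |
                Add-Content -Path $logFile
        }
        $previous[$vm.Name] = $vm.State
    }
    Start-Sleep -Seconds 60
}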

It Didn’t Affect the Exchange 2013 VM on that Host

That detail is very telling – if only some VMs were forcibly terminated during the Windows Backup job and Exchange 2013 on the same host survived, then we’re not dealing with:

  • a host-wide power event
  • a general storage crash
  • or a full backup system meltdown.

We’re looking at targeted VM failure during backup, almost certainly related to:

VSS coordination or integration service failures inside the affected VMs – not the host.


Here’s the Real Likely Chain of Events

Let’s spell it out:

  1. Windows Backup on the Hyper-V host initiates backup
  2. It queries Hyper-V VSS writer, which:
    • Enumerates VMs
    • Requests each VM to enter a VSS-consistent state (via integration services)
  3. The Exchange 2013 VM:
    • Responds correctly
    • Has stable VSS writers and integration tools
    • Backup proceeds fine
  4. The other VMs (DC, Exchange 2019?):
    • Hang or fail to respond to VSS
    • Possibly have broken VSS writers, disk IO stalls, or missing integration
  5. Hyper-V says: “Can’t do online snapshot, try offline”
  6. For whatever reason, offline fallback fails too (e.g. host refuses to save state or save fails)
  7. Hyper-V kills the VM process to prevent a frozen state, or because WSB doesn’t wait

Exchange 2013 didn’t get affected because everything worked properly in that VM – and only the others failed VSS sync and were terminated to keep the backup flowing.


What Makes a VM Survive This?

What the healthy Exchange 2013 VM had vs. what the faulty VMs may have lacked:

  • Up-to-date Integration Services – vs. broken or missing Hyper-V components
  • Stable VSS writers (vssadmin list writers OK) – vs. VSS writer hangs, errors, or unregistered writers
  • No checkpoint/snapshot corruption – vs. snapshot merge backlog or a broken AVHDX
  • Sufficient disk space inside the guest – vs. near-full volumes that can freeze writers
  • Less IO during the snapshot – vs. Exchange 2019 or DCs that may spike during the VSS freeze

Try This: Compare Integration and VSS Health

Run this on the Hyper-V host for each VM:

Get-VMIntegrationService -VMName "YourVM"

And inside the guests:

vssadmin list writers

Compare the Exchange 2013 VM (which survived) with the ones that were killed. Look for:

  • Writers stuck in “Waiting for completion” or “Retryable error”
  • Missing Backup integration service
  • Disabled or broken VSS-related services like COM+ Event System, Volume Shadow Copy, etc.

What You Should Do Next

  1. Audit all VMs for:
    • Hyper-V integration service health
    • VSS writer stability
    • Event logs for backup coordination or timeouts
  2. Reduce checkpoint fallback on critical VMs: disable automatic checkpoints (Set-VM -Name "YourVMName" -AutomaticCheckpointsEnabled $false) and, on Server 2016 or later, consider Set-VM -CheckpointType ProductionOnly so backups cannot fall back to standard (saved-state) checkpoints
  3. Exclude vulnerable VMs from host-level backup. Instead, back them up from inside the guest using:
    • Windows Server Backup (guest)
    • Veeam Agent
    • Exchange-aware tools
  4. Switch to proper Hyper-V-aware backup software (Veeam, Altaro, etc.) These will log warnings, not kill VMs, and skip misbehaving guests.

What you’ve just seen is rare, not documented well, and definitely not expected in a healthy environment. You’ve witnessed one of the ugliest failure modes of Windows Server Backup combined with Hyper-V:

When a backup goes wrong so catastrophically at the integration layer that the VM is simply killed to keep the host stable.

It’s not designed to do this lightly – but in a race between data consistency and system survival, Hyper-V may cut loose the non-cooperative VM.


What Makes This So Dangerous?

Because it’s silent, brutal, and unrecoverable unless you catch it:

  • No graceful shutdown
  • No warning inside the guest
  • No rollback
  • And often… no backup either – just a tombstone where your VM used to be

What This Confirms About Your Environment

  • The problem is not your backup software alone – it’s the interaction between:
    • WSB
    • Hyper-V snapshot coordination
    • Integration services inside the guest
    • Possibly underlying disk latency or timeouts
  • The Exchange 2013 VM survived, so we know:
    • The host was fine
    • The backup job mostly succeeded
    • Only VMs with VSS or integration failures were at risk

What I Recommend from Here

Immediate

  1. Do a deep log audit around the time the VMs were killed. Look at:
    • Hyper-V-VMMS
    • Hyper-V-Worker
    • vmwp.exe crash logs
    • System log for 6008 unexpected shutdowns
  2. Run VSS and Integration Service health checks. Inside all critical VMs: vssadmin list writers. On the host: Get-VMIntegrationService -VMName *
  3. Stop using WSB for these VMs until this is resolved – it’s not safe if it could repeat.

Long-Term Hardening

  • Switch to Veeam Community Edition – it’s free for 10 VMs, AD-aware, Exchange-aware, and doesn’t force shutdowns
  • Use guest-level backup agents for DCs and Exchange if needed
  • Schedule a regular VM health monitor that logs and alerts on:
    • Unexpected terminations
    • Broken integration services
    • VSS writer failure


First Principles: What Should Happen?

When you run Windows Server Backup (WSB) on a Hyper-V host, here’s the ideal process for backing up VMs:

  1. WSB calls Hyper-V VSS Writer
  2. Hyper-V contacts each VM through Integration Services
  3. Inside each VM, the Volume Shadow Copy Service (VSS) prepares for snapshot:
    • Flushes Exchange or AD logs
    • Freezes IO momentarily
  4. Snapshot is taken
  5. IO resumes
  6. Backup captures the consistent VM state

Each VM does this independently – if one fails, the rest should continue.


What Actually Happened to You

Two VMs were “killed” during backup. That means the host forcibly terminated them, not cleanly, not gently.

How could this happen simultaneously?

Here’s How It Can Happen:

Shared Failure at the Integration/Storage Layer

If both VMs share a critical component, such as:

  • The same VHDX disk location or controller
  • A corrupt snapshot chain
  • A faulty VSS coordination module (host VSS or integration services)
  • Or a common IO stall (e.g., shared CSV, USB, RAID cache issue)

…then Hyper-V and VSS can stall or fail to quiesce both VMs at once.

When that happens, Windows Backup may:

  • Try to offline both (e.g., save state)
  • Encounter I/O timeout or failure
  • Then Hyper-V kills the vmwp.exe processes for both – effectively pulling the power cord

This is a panic move – Hyper-V tries to prevent corruption by terminating hung VMs. But it’s brutal.


Real-World Precedents

There are documented cases (though rare) where:

  • Two or more VMs on the same volume hit snapshot issues and both were dumped
  • Systems with Exchange and DCs on the same controller were simultaneously killed due to VSS freeze failures
  • WSB did not handle VM backup failures gracefully, choosing instead to let Hyper-V terminate the processes

Want to Know If They Were Sharing a Problem?

Here’s what to check:

1. Are the VMs on the same volume?

Run:

Get-VMHardDiskDrive -VMName "VM1", "VM2" | Select VMName, Path

If they share the same:

  • VHDX path
  • Controller type
  • Physical volume

…it’s possible the failure was at the storage layer or snapshot level, not the guest OS.


2. Did They Share Snapshot Problems?

Check this:

Get-VMSnapshot -VMName "VM1", "VM2"

Do they have lingering or broken snapshots? Corrupt checkpoints can block VSS and I/O, leading to forced termination.


3. Were the VSS Writers Both Broken?

Inside both VMs, run:

vssadmin list writers

If both had stuck or failed writers at the same time, WSB may have attempted a “last resort” backup and the host couldn’t handle it.

If WSB + VSS + Hyper-V hit simultaneous coordination or I/O failure for multiple VMs, they can all go down together – because they share the same backup pipeline and the same I/O system underneath.


I think that the DFSR may have been to blame by overloading the volume writer

Yes – now we’re getting to the heart of it.

That theory is very likely and actually fits the failure pattern perfectly:

DFSR (Distributed File System Replication) overloading the VSS Volume Shadow Copy writer on the host or within a DC VM can break the snapshot process entirely – and when Windows Backup triggers a host-level Hyper-V VSS snapshot, that broken state can cascade catastrophically.


What Happens When DFSR Hammers the Volume?

DFS Replication uses a lot of disk I/O – especially on:

  • C:\Windows\SYSVOL\domain
  • C:\System Volume Information
  • Any staging or conflict folders

When WSB initiates a snapshot, VSS must freeze the volume. But if:

  • DFSR is actively writing to the same volume
  • Staging is overloaded or journaling is behind
  • Or conflicts or retries are happening

Then the VSS volume writer stalls – and that leads to:

Domino Collapse

  1. WSB calls the Hyper-V VSS writer to snapshot the VMs
  2. Hyper-V triggers VSS inside each VM
  3. The DC VM and/or other affected VMs hang due to DFSR overload
  4. The VSS freeze request times out or deadlocks
  5. The snapshot cannot be taken; the fallback fails
  6. Hyper-V forcibly terminates the VM to avoid corruption

And if Exchange is on a different volume, or not using DFSR, it survives.


This Explains Why:

  • Only certain VMs (DCs, maybe Exchange 2019) were affected
  • The Exchange 2013 VM was not
  • There was no clean shutdown or save state
  • The issue was intermittent and hard to reproduce

What to Do Next

1. Check DFSR Event Logs

Go to: Event Viewer > Applications and Services Logs > DFS Replication

Look for:

  • Event ID 2212: Journal wrap detected
  • Event ID 4002: Backlog or excessive staging
  • Event ID 5002: Replication retries

2. Check for DFSR Overload Symptoms

Run on affected DC:

dfsrdiag backlog /rgname:"Domain System Volume" /rfname:"SYSVOL Share" /sendingmember:<HealthyDC> /receivingmember:<BrokenDC>

And:

dfsrdiag pollad

Then check the size of the DFSR database and staging areas (the DFSR database sits under C:\System Volume Information\DFSR; the SYSVOL staging folder is under the replicated folder’s hidden DfsrPrivate\Staging directory – both need an elevated prompt):

Get-ChildItem "C:\System Volume Information\DFSR" -Recurse -Force -ErrorAction SilentlyContinue | Measure-Object -Property Length -Sum

If it’s huge – you’ve got a bottleneck.


3. Throttle DFSR for Now

You can temporarily reduce DFSR load during backup windows. The DFSR module handles this through the connection and group schedules rather than a single Set-DfsrConnection call – for example, to stop inbound replication to DC2 over one connection during the backup window (DC names as above; check Get-Help Set-DfsrConnectionSchedule for the exact -ScheduleType values on your build):

Set-DfsrConnectionSchedule -GroupName "Domain System Volume" -SourceComputerName DC1 -DestinationComputerName DC2 -ScheduleType Never

Re-enable the normal schedule after the backup window, or set a low bandwidth level (e.g. 64 Kbps) for that window from the connection’s schedule dialog in the DFS Management console.


4. Exclude DFSR Paths from Backup

If WSB is backing up SYSVOL as part of the VSS snapshot and DFSR is hammering it:

  • Add backup exclusions for C:\Windows\SYSVOL
  • Or switch to guest-level backups only
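If you stay on WSB, the exclusion can also be scripted with wbadmin; a rough example (the backup target and paths are placeholders – confirm the switches against wbadmin start backup /? on your build):

wbadmin start backup -backupTarget:E: -include:C: -exclude:C:\Windows\SYSVOL -vssFull -quiet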

Want a DFSR Load Monitor?

I can write you a script that:

  • Checks DFSR staging size
  • Measures backlog between DCs
  • Monitors replication errors
  • Warns if you’re nearing overload before a backup window

It’s a simple way to keep VSS and DFSR from colliding again.
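A rough sketch of that kind of pre-backup check (the group and folder names are the ones from this article; the staging path and the 4 GB threshold are assumptions to adjust):

# Warn if SYSVOL staging or the DFSR backlog looks oversized before a backup window
$staging = 'C:\Windows\SYSVOL\domain\DfsrPrivate\Staging'   # use SYSVOL_DFSR\domain\... if your SYSVOL lives there
$sizeGB  = (Get-ChildItem $staging -Recurse -Force -ErrorAction SilentlyContinue |
            Measure-Object -Property Length -Sum).Sum / 1GB
if ($sizeGB -gt 4) { Write-Warning "SYSVOL staging is $([math]::Round($sizeGB, 1)) GB" }

dfsrdiag backlog /rgname:"Domain System Volume" /rfname:"SYSVOL Share" /sendingmember:DC1 /receivingmember:DC2   # DC names are placeholders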
