What does this mean?
Log Name: DFS Replication
Source: DFSR
Date: 26/07/2025 01:18:04
Event ID: 4010
Task Category: None
Level: Information
Keywords: Classic
User: N/A
Computer: DC2.domain.local
Description:
The DFS Replication service detected that the replicated folder at local path C:\Windows\SYSVOL\domain has been removed from configuration.
Additional Information:
Replicated Folder Name: SYSVOL Share
Replication Group Name: Domain System Volume
Event Xml:
4010 4 0 0x80000000000000 13036 DFS Replication DC2.domain.local 2FD90BEE-E58C-4AA0-9DB7-78490E5447FF C:\Windows\SYSVOL\domain SYSVOL Share Domain System Volume 2DD7A27C-F9AB-4353-B933-53123F581E4C 9C7B7550-AB54-478E-A421-5F8731DF77D5
This Event ID 4010 from DFSR (Distributed File System Replication) means:
The DFS Replication service detected that the replicated folder at local path
C:\Windows\SYSVOL\domain
has been removed from configuration.
Interpretation
This message is informational, not an error, and typically means that:
- SYSVOL replication on this domain controller (DC2) is no longer using DFS-R for the C:\Windows\SYSVOL\domain path.
- The folder has been removed from the DFS-R configuration, either because:
  - The DC has been demoted (or is part-way through demotion)
  - You’re in the process of migrating SYSVOL replication from DFS-R back to FRS (unlikely, as FRS is deprecated)
  - Or, more likely, you’re migrating to DFS-R and this is part of the transition (e.g. state moving from Prepared → Redirected → Eliminated)
If This is Expected
If you’re in the process of SYSVOL migration (e.g. using dfsrmig.exe), this is normal when moving past the Redirected stage to Eliminated:
Migration Stage | Meaning |
---|---|
0 – Start | Using FRS |
1 – Prepared | Preparing for DFS-R |
2 – Redirected | DFS-R active, but FRS still present |
3 – Eliminated | FRS eliminated, DFS-R is sole replicator |
During or after stage 3, DFS-R removes the legacy configuration – this message just confirms that.
If This Was Not Expected
If you’re not migrating SYSVOL or not demoting the DC, then this message suggests:
- A misconfiguration or corruption of DFS-R
- Or someone/something has removed SYSVOL replication group membership from this server
That would be serious – SYSVOL contains group policies and logon scripts. Without it, the DC won’t serve them.
What to Check
- Run on DC2:
dfsrmig /getmigrationstate
This tells you what stage of SYSVOL replication migration you’re in.
- Check DFS-R membership: open the DFS Management console (dfsmgmt.msc) or use:
Get-DfsrMembership -GroupName "Domain System Volume"
- Check SYSVOL status: look in Event Viewer under:
Applications and Services Logs > DFS Replication
- Check whether C:\Windows\SYSVOL\domain is present. If the folder is gone or empty, replication may have failed. (A combined check is sketched below.)
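A minimal PowerShell sketch that runs those checks in one pass on DC2 (assumes the DFSR PowerShell module is installed on the DC; the property names in the Select-Object are approximate and may vary by OS version):
# 1. SYSVOL migration state (FRS -> DFS-R)
dfsrmig /getmigrationstate

# 2. Is this DC still a member of the SYSVOL replication group?
Get-DfsrMembership -GroupName "Domain System Volume" -ComputerName $env:COMPUTERNAME |
    Select-Object ComputerName, FolderName, ContentPath, Enabled

# 3. Does the replicated folder still exist and contain anything?
$sysvol = "C:\Windows\SYSVOL\domain"
if (Test-Path $sysvol) {
    "$sysvol exists with $((Get-ChildItem $sysvol -Recurse | Measure-Object).Count) items"
} else {
    "$sysvol is missing - investigate before touching anything else"
}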
Recovery Steps (if accidental or failed):
- If you’re not done with dfsrmig, consider rolling back:
dfsrmig /setglobalstate 2
- If the DC is broken, consider an authoritative SYSVOL restore, or force a re-sync from a healthy DC:
robocopy \\HealthyDC\C$\Windows\SYSVOL\domain C:\Windows\SYSVOL\domain /MIR
- Rejoin the replication group using the DFSR admin tools if needed.
Could Raising the Domain Functional Level to 2012 R2 Cause this?
Raising the Domain Functional Level (DFL) to Windows Server 2012 R2 does not directly cause Event ID 4010 or the removal of the DFS Replication configuration for C:\Windows\SYSVOL\domain.
However — and this is the important caveat — what often happens around the same time can cause it.
🎯 Here’s the Truth:
Raising the Domain Functional Level to 2012 R2 does not automatically:
- Remove the DFSR configuration
- Modify or eliminate the SYSVOL replication group
BUT:
⚠️ What might be causing the event:
1. SYSVOL Migration Was Initiated
If the domain was still using FRS for SYSVOL, and you raised the DFL to 2012 R2, then running DFSRMIG becomes a requirement.
FRS for SYSVOL is deprecated – Microsoft expects domains at or above the 2012 R2 DFL to have migrated to DFS-R.
So if SYSVOL was still using FRS, and you manually or automatically initiated a DFSRMIG process to comply, then yes, the DFSR 4010 message can appear during or after:
dfsrmig /setglobalstate 2   (Redirected)
dfsrmig /setglobalstate 3   (Eliminated)
In this case, 4010 just means the old config was cleaned up, and that’s expected.
2. A Demotion or Role Change Occurred
If the server in question (DC2) was:
- Demoted
- Had AD DS removed
- Lost membership in the replication group during role changes
Then you’d see this too.
✅ Confirm Your Current SYSVOL State
Run this:
dfsrmig /getglobalstate
dfsrmig /getmigrationstate
And check the SYSVOL folder on other DCs:
- Is SYSVOL present and populated?
- Is the Domain System Volume replication group present in the DFS Management console?
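A short sketch that runs that comparison across every DC in one pass (assumes the ActiveDirectory and DFSR PowerShell modules are available and you have domain admin rights):
Import-Module ActiveDirectory, DFSR

$domain = (Get-ADDomain).DNSRoot
foreach ($dc in (Get-ADDomainController -Filter *).HostName) {
    # Count GPO folders visible through each DC's SYSVOL share
    $policies = (Get-ChildItem "\\$dc\SYSVOL\$domain\Policies" -ErrorAction SilentlyContinue | Measure-Object).Count
    # Is this DC still listed as a member of the SYSVOL replication group?
    $member = Get-DfsrMember -GroupName "Domain System Volume" -ComputerName $dc -ErrorAction SilentlyContinue
    [pscustomobject]@{ DC = $dc; PolicyFolders = $policies; DfsrMember = [bool]$member }
}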
🧭 Conclusion
- Raising the DFL to 2012 R2 doesn’t directly cause Event 4010.
- But if it triggered or was part of a migration from FRS to DFS-R, then yes, the event is expected and harmless.
- If you’re already on DFS-R, then 4010 suggests this server’s DFSR config was cleaned up — potentially from a migration finalization, a demotion, or an accidental removal.
But we Dumped FRS Ages Ago
Thanks for the clarification – since FRS was dumped long ago, and you’ve already been using DFS-R, then raising the Domain Functional Level to 2012 R2 should not normally trigger Event ID 4010.
What the 4010 Really Means in This Case
Given you were already running DFS-R, this event:
“The replicated folder at local path C:\Windows\SYSVOL\domain has been removed from configuration.”
…means that for some reason, DC2 has had its DFS-R SYSVOL configuration removed, which is not expected from a DFL raise alone.
So the cause is likely one of:
Possible Causes (Given DFS-R was already in use)
Cause | Description |
---|---|
DC2 was removed or re-added as a replication member | If it was briefly demoted, metadata cleaned, or restored from backup, the replication group membership can reset |
DFS Replication service misfired | Rare but possible – config corruption or partial loss of replication group membership |
Manual misconfiguration | Admin or script unintentionally altered DFS-R configuration on this DC |
Staging folder cleanup / misapplied policy | Aggressive GPO, security tools, or scheduled cleanup may have interfered with DFS-R |
DFSR internal DB rebuild or replication journal issue | In some edge cases, DFS-R will rebuild or drop a folder if journal wrap or database issues occur |
What You Should Check Now
- Check DFSR membership from another DC:
Get-DfsrMember -GroupName "Domain System Volume"
Does DC2 appear as a member?
- Check that DC2 has SYSVOL available and healthy:
  - Navigate to C:\Windows\SYSVOL\domain
  - Check for your policies under C:\Windows\SYSVOL\domain\Policies
  - Is it empty or populated?
- Check the replication connections for DC2:
Get-DfsrConnection -GroupName "Domain System Volume" -DestinationComputerName DC2
Are there inbound/outbound partners listed?
- Check replication health:
dfsrdiag backlog /rgname:"Domain System Volume" /rfname:"SYSVOL Share" /sendingmember:<OtherDC> /receivingmember:DC2
This will tell you whether it’s lagging or out of sync.
If It’s Broken
If DC2 is no longer replicating SYSVOL, you’ll need to re-add it to the replication group using the DFS Management console (dfsmgmt.msc) or:
Add-DfsrMember -GroupName "Domain System Volume" -ComputerName DC2
Then manually replicate or sync SYSVOL from a healthy DC (a quick verification sketch follows below).
If C:\Windows\SYSVOL\domain is missing content, do not just copy it in blindly – follow safe recovery practices (I can walk you through that if needed).
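Once DC2 has been re-added, a hedged verification that it is back in the group and has partners (the cmdlets are from the DFSR module; swap in your real DC names):
# Is DC2 listed as a member again?
Get-DfsrMember -GroupName "Domain System Volume" |
    Where-Object ComputerName -eq "DC2" |
    Select-Object GroupName, ComputerName, DomainName

# Does it have at least one enabled connection in each direction?
Get-DfsrConnection -GroupName "Domain System Volume" |
    Where-Object { $_.SourceComputerName -eq "DC2" -or $_.DestinationComputerName -eq "DC2" } |
    Select-Object SourceComputerName, DestinationComputerName, Enabled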
One of the DCs and the new Exchange Server are Backing-up but the VMs are Powered-off???
Ah – now that is not normal, and it sounds like a hypervisor-level backup process is in play here. Here’s what it likely means, what could be going wrong, and what you should immediately check.
What’s Probably Happening
When you say:
“one of the DCs and the new Exchange server are backing up but the VMs are powered off”
This strongly suggests you’re using host-level VM backups – likely through Hyper-V or VMware, possibly using tools like:
- Windows Server Backup
- Veeam
- Altaro
- DPM
- Nakivo
- or something similar
These tools can back up a VM’s VHDX or VMDK even if the VM is powered off, or even force a backup by triggering a checkpoint/snapshot – and that’s not always safe, especially for:
- Domain Controllers (due to the USN rollback risk)
- Exchange Servers (due to EDB/database integrity)
What to Check Immediately
1. Hyper-V Checkpoint State
In PowerShell:
Get-VM | Select-Object Name, State, CheckpointType, AutomaticCheckpointsEnabled
Or in Hyper-V Manager:
- Are there checkpoints (snapshots) listed under the affected VMs?
- If so, that’s a red flag – especially if they are lingering.
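To sweep the whole host for leftover checkpoints in one pass, a small sketch (run on the Hyper-V host; the one-day age threshold is just an assumption):
# Any checkpoint older than a day is suspect - backup checkpoints should merge away quickly
Get-VM | Get-VMSnapshot |
    Where-Object CreationTime -lt (Get-Date).AddDays(-1) |
    Select-Object VMName, Name, SnapshotType, CreationTime |
    Sort-Object CreationTime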
2. Backup Tool Logs
- Go to the logs for whatever backup solution you’re using.
- Check:
- Was the backup scheduled while the VM was already powered off?
- Did the backup tool power off the VM first? (some do this to avoid VSS inconsistencies)
- Are any errors shown related to Exchange VSS writer, or NTDS VSS writer?
3. Domain Controller Integrity
On the affected DC (when powered back on):
repadmin /showrepl
Check for:
- Backlog
- Last successful replication
- USN rollback warnings
Also check:
dcdiag /v /c /e /f:dcdiag.txt
This will log a full report – look for replication, DNS, or time sync failures.
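repadmin can also emit CSV, which makes failing links easy to isolate from PowerShell; a sketch (the column names below match repadmin’s CSV header on current builds, so check the header row if yours differs):
# Show only replication links that have recorded failures
repadmin /showrepl * /csv | ConvertFrom-Csv |
    Where-Object { [int]$_."Number of Failures" -gt 0 } |
    Select-Object "Source DSA", "Destination DSA", "Last Failure Time", "Number of Failures", "Last Failure Status"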
4. Exchange Health After Power On
When the Exchange VM is brought back up:
Get-MailboxDatabaseCopyStatus *
Then run:
Test-ServiceHealth
Look for any missing or failed services, particularly:
- MSExchangeIS
- MSExchangeTransport
- MSExchangeMailboxAssistants
Why This Is Dangerous
- Domain Controllers should never be restored from snapshots or backed up improperly, unless:
- You use VM-Generation ID-aware hypervisors (2012+), and
- You don’t restore old checkpoints (instant clone = rollback)
- Exchange databases can be corrupted if you backup the VM from the outside while it’s live – unless:
- The backup tool uses VSS Exchange-aware components
- You have proper application-aware processing enabled
Suggested Next Steps
- Power on the VMs and run full health checks as above
- Verify Active Directory replication is functioning across DCs
- Test Exchange services and mail flow
- Review your backup tool settings:
- Is it set to “application-aware image processing”?
- Is it configured to quiesce the guest OS before backup?
- Is it forcing shutdowns because of failed VSS inside the VM?
If you give me:
- The backup software name
- The virtualization platform (e.g. Hyper-V, VMware, etc.)
- The exact backup settings or job type (full image, guest-level, snapshot-only)
…I can pinpoint much more precisely why the VMs are being powered off.
Windows Backup, but why are they Powered-down??
If you’re using Windows Server Backup (WSB) and your VMs are being powered off during backup, that is not normal behavior – unless something is failing badly inside the VM, particularly with VSS (Volume Shadow Copy Services) or integration services.
Let’s break it down methodically.
Why Would a VM Be Powered Off During Backup?
1. Backup is Host-Level, but Guest VSS Failed
When using WSB to back up Hyper-V VMs, it tries to:
- Use online backup via the Hyper-V VSS Writer
- Trigger guest VSS writers inside each VM (Exchange, NTDS, etc.)
- If integration services or VSS inside the VM fail, WSB may fall back to an offline backup
Offline backup = suspend the VM or turn it off briefly to get a consistent VHDX snapshot
This is especially risky with Exchange and Domain Controllers.
2. VM Lacks or Has Broken Integration Services
Windows Backup relies on Hyper-V Integration Services inside the VM for proper online backup.
If they’re:
- Missing
- Disabled
- Or broken
Then the VM will fail to quiesce, and the host will either fail the backup or force it offline (e.g. shutdown or saved state).
3. System Time Misconfiguration or VSS Freeze Failure
- If the VM has time sync issues, VSS might misfire
- If the guest OS has stale VSS snapshots, VSS freeze timeouts, or crashing Exchange/NTDS VSS writers, the online backup will fail
And again, WSB might choose to save or power down the VM to force the backup through.
What You Should Do Right Now
1. Check the VM Integration Settings
On the host, run in PowerShell:
Get-VMIntegrationService -VMName "YourVMName"
You should see:
Name | Enabled |
---|---|
Operating System Shutdown | True |
Time Synchronization | True |
Data Exchange | True |
Backup (volume snapshot) | True |
Make sure Backup (volume snapshot) is enabled.
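If it is disabled, it can be re-enabled from the host; a small sketch (the display name string is the English-locale one):
# Turn the Backup (volume snapshot) integration service back on and verify
Enable-VMIntegrationService -VMName "YourVMName" -Name "Backup (volume snapshot)"
Get-VMIntegrationService -VMName "YourVMName" |
    Where-Object Name -like "Backup*" |
    Select-Object VMName, Name, Enabled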
2. Check VSS Writer Health in the Guest VM
On each affected VM (especially DC and Exchange), run:
vssadmin list writers
You should see all writers as:
State: Stable
Last Error: No error
If you see anything like Failed, Timed out, or Retryable error, that’s your smoking gun.
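If you have several guests to check, a rough sketch that scans vssadmin’s text output for anything unhealthy (plain text parsing, so treat it as a convenience rather than an API):
# Flag any writer line that is not Stable / No error
$out = vssadmin list writers
$badStates = $out | Select-String "State:"      | Where-Object { $_.Line -notmatch "Stable" }
$badErrors = $out | Select-String "Last error:" | Where-Object { $_.Line -notmatch "No error" }
if ($badStates -or $badErrors) {
    "Unhealthy VSS writers detected:"
    $badStates
    $badErrors
} else {
    "All VSS writers report Stable / No error"
}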
3. Look in Host Event Logs:
Event Viewer > Applications and Services Logs > Microsoft > Windows > Hyper-V-VMMS > Admin
- Look for entries saying “Taking offline backup of VM”, or errors referencing VSS or snapshot timeouts
4. Review Windows Server Backup Logs
Event Viewer > Applications and Services Logs > Microsoft > Windows > Backup > Operational
Look for:
- “Backup started”
- “Backup failed”
- “Snapshot failed”
- “VM taken offline”
These will explain why it powered the VM down.
Solution Strategy
If VMs are being powered off:
- Ensure Hyper-V Integration Services are up-to-date
- Verify all VSS writers are healthy inside the guest
- Run sfc /scannow and DISM /Online /Cleanup-Image /RestoreHealth inside each VM
- Manually remove stuck shadow copies:
vssadmin delete shadows /all
- Consider switching to a backup solution that properly supports Exchange and DCs, such as Veeam Free, which won’t power off VMs unless truly necessary
This is not Normal! – How could they be Turned-off by Windows Backup?
You’re absolutely right – this is not normal, and Windows Server Backup (WSB) is not supposed to power off VMs during backup under normal circumstances.
But here’s the cold, mechanical truth:
Windows Backup Can Force Offline Backup – Including Save State or Power Off – If Things Go Wrong
It doesn’t want to – but if it can’t do an online backup, then the Hyper-V host, in conjunction with WSB + Hyper-V VSS writer, may:
- Save the VM (into saved state) – this is typical for Linux or unsupported guests
- Force shutdown (rare, but possible under corrupt or severely misconfigured conditions)
This happens if:
- The VM doesn’t respond to a freeze request (VSS failure inside the VM)
- Backup integration services are missing or off
- Or VSS inside the guest fails to coordinate with the host snapshot
In those cases, Hyper-V can (based on policy) either:
- Fail the backup
- Or downgrade to offline – which in some edge cases means save state, or in worst-case edge conditions, shutdown.
Windows Backup doesn’t expose this clearly. But the Hyper-V VSS subsystem chooses how to get a consistent snapshot.
What Could Trigger Forced Power Off?
Situation | What Happens |
---|---|
No Integration Services | Hyper-V can’t coordinate with the guest = offline backup |
VSS writers in guest failing | Host doesn’t get freeze = fallback |
Timeouts from guest | Backup process aborts online snapshot attempt |
Exchange or DC roles inside VM + poor VSS handling | VSS tries to flush logs and fails snapshot timeout |
System resource pressure / IO bottleneck | Snapshot hangs, WSB cancels it abruptly |
VHDX file locks or inconsistent state | Host fails to checkpoint and may shut down VM to proceed |
All of this can result in WSB saying: “fine – do it offline.” And Hyper-V says: “offline = save or shut down.”
Look for Evidence
1. Check Hyper-V logs:
Event Viewer > Applications and Services Logs > Microsoft > Windows > Hyper-V-VMMS > Admin
Look for events like:
The virtual machine was saved to complete a backup operation.
The virtual machine was turned off to complete a backup.
VSS snapshot could not be created.
2. Check WSB logs:
Applications and Services Logs > Microsoft > Windows > Backup > Operational
Search for:
Backup started
Backup completed with warnings
Online backup failed - attempting offline
Snapshot operation failed
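Both logs can also be queried from PowerShell instead of clicking through Event Viewer; a sketch (the channel names below are the usual ones, but confirm them with Get-WinEvent -ListLog "*Backup*","*Hyper-V*" if your build differs):
# Pull the last 24 hours of Hyper-V VMMS and Windows Server Backup events and keep the interesting ones
$since = (Get-Date).AddHours(-24)
Get-WinEvent -FilterHashtable @{
    LogName   = "Microsoft-Windows-Hyper-V-VMMS-Admin", "Microsoft-Windows-Backup"
    StartTime = $since
} -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -match "offline|saved|turned off|snapshot|VSS" } |
    Sort-Object TimeCreated |
    Format-Table TimeCreated, LogName, Id, LevelDisplayName, Message -Wrap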
Fix This Now
Step-by-step:
- On each VM:
  - Run:
vssadmin list writers
and confirm all writers are stable.
  - Then run (on the host):
Get-VMIntegrationService -VMName "<VMName>"
and make sure Backup (volume snapshot) is enabled.
- Inside each guest VM:
  - Run:
sfc /scannow
DISM /Online /Cleanup-Image /RestoreHealth
to clear out any system corruption that could trip up VSS.
- Remove stale snapshots (inside and out). On the host:
vssadmin delete shadows /all
- Optional: if the VMs are critical (Exchange/DC), switch to a smarter backup tool like Veeam Community Edition or Altaro, which handle Exchange- and AD-aware backups more safely.
Final Thought
Windows Server Backup isn’t “designed” to power off VMs – but it can, if all else fails, and you’re backing up at the host level without proper guest cooperation.
This is why best practice for backing up Domain Controllers and Exchange in VMs is:
- Use guest-level backup with VSS-aware agent
- Or use a Hyper-V backup tool that fully respects Exchange/AD semantics
They weren’t Shut Down – they just Got Killed!
Ah. That’s even worse – they weren’t shut down cleanly or saved; they were forcibly terminated. That should never happen – not under any acceptable conditions from Windows Server Backup, Hyper-V, or any well-behaved system.
Let’s call it what it is:
Your VMs were killed – like power pulled from a running machine. That’s not a backup – that’s a virtual execution.
Possible Causes (Ordered from Most to Least Likely)
1. Critical Snapshot Failure + VSS Stall + Hyper-V Panic
- WSB triggers a VSS snapshot on the host for the VM
- The VM doesn’t respond, or VSS inside the VM fails or freezes
- Hyper-V cannot flush buffers or create checkpoints
- Host runs out of patience or gets into a corrupted state → Hyper-V terminates the VM process
This is rare – but it has been seen when:
- VHDX is on slow or unstable storage (e.g. USB or poor RAID)
- VSS writers are hung
- IO subsystem is unstable
2. Hyper-V Worker Process Crash (vmwp.exe)
Each VM runs under a Hyper-V Worker Process (vmwp.exe).
If:
- There’s a bug in the integration services
- Or a timeout/deadlock
- Or memory corruption
- Or storage device failure / driver panic
Then vmwp.exe can crash. And when it crashes – your VM just disappears. No shutdown, no save, just gone.
Check:
Event Viewer > Windows Logs > Application
Look for crashes from:
vmwp.exe
Virtualization Infrastructure Driver
Virtual Disk Service
Also check:
Applications and Services Logs > Microsoft > Windows > Hyper-V-Worker > Admin
3. Storage Backend Failure or Pause
If the host loses contact with storage (e.g. temporary dismount, disk timeout, controller glitch), then:
- The VM may be unable to commit IO
- Hyper-V may kill the VM to prevent corruption
Especially dangerous with dynamically expanding VHDX on flaky disks (USB, external SATA, failing SSDs).
What To Do Right Now
1. Check Crash Source
Look in:
Event Viewer > Windows Logs > System
Filter by:
- Event ID 6008 – Unexpected shutdown
- Hyper-V-VMMS errors
- Hyper-V-Worker errors
- vmwp.exe crashes
And check:
Get-EventLog -LogName System -EntryType Error -Newest 100
Look around the exact timestamp they went down.
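Get-WinEvent with a FilterHashtable makes it easy to bracket that exact window; a sketch (set $killTime to the timestamp from the Hyper-V log):
# Every Critical/Error System event within 15 minutes either side of the kill
$killTime = Get-Date "2025-07-26 01:18"   # <-- replace with the actual time the VMs died
Get-WinEvent -FilterHashtable @{
    LogName   = "System"
    Level     = 1, 2                       # 1 = Critical, 2 = Error
    StartTime = $killTime.AddMinutes(-15)
    EndTime   = $killTime.AddMinutes(15)
} -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Format-Table TimeCreated, Id, ProviderName, Message -Wrap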
2. Check VSS History
On the host and guest:
vssadmin list shadows
vssadmin list writers
And review:
Applications and Services Logs > Microsoft > Windows > Backup > Operational
3. Run Storage Diagnostics
If you’re using a USB drive, aging RAID, or unreliable disk backend:
- Run chkdsk on the host volumes
- Run SMART diagnostics
- Run sfc /scannow and DISM on the host
Also check that:
- You’re not running backup and defrag or dedup at the same time
- The host has enough RAM and disk space to hold snapshots
Prevention Strategy
- Never rely on WSB alone for Exchange/DCs in VMs
- Harden critical VMs against checkpoint fallback. In PowerShell (-CheckpointType ProductionOnly makes Hyper-V fail the checkpoint rather than fall back to a standard, saved-state-style one):
Set-VM -Name "YourVMName" -AutomaticCheckpointsEnabled $false -CheckpointType ProductionOnly
- Use better tooling – Veeam Community Edition is free and far safer
- Run regular Get-VMIntegrationService checks
- Isolate backup I/O to a dedicated volume or interface, especially on low-end hardware
Would you like a PowerShell script to monitor VM state and log any unexpected stops or kills with timestamps – perhaps even send an alert email when it happens? You shouldn’t have to find out after the fact.
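For reference, a minimal sketch of such a watcher: it polls VM state every 30 seconds and logs any VM that drops out of the Running state (the log path is an assumption; e-mail alerting could be bolted on with Send-MailMessage):
# Minimal VM state watcher - run on the Hyper-V host as a scheduled task or background console
$logFile  = "C:\Temp\vm-state-watch.log"   # assumed path - change to suit
$previous = @{}

while ($true) {
    foreach ($vm in Get-VM) {
        $state = $vm.State.ToString()
        if ($previous[$vm.Name] -eq "Running" -and $state -ne "Running") {
            $msg = "{0}  {1} left Running state -> {2}" -f (Get-Date -Format s), $vm.Name, $state
            $msg                                   # echo to console
            Add-Content -Path $logFile -Value $msg # and append to the log
        }
        $previous[$vm.Name] = $state
    }
    Start-Sleep -Seconds 30
}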
It Didn’t Affect the Exchange 2013 VM on that Host
That detail is very telling – if only some VMs were forcibly terminated during the Windows Backup job and Exchange 2013 on the same host survived, then we’re not dealing with:
- a host-wide power event
- a general storage crash
- or a full backup system meltdown.
We’re looking at targeted VM failure during backup, almost certainly related to:
VSS coordination or integration service failures inside the affected VMs – not the host.
Here’s the Real Likely Chain of Events
Let’s spell it out:
- Windows Backup on the Hyper-V host initiates backup
- It queries Hyper-V VSS writer, which:
- Enumerates VMs
- Requests each VM to enter a VSS-consistent state (via integration services)
- The Exchange 2013 VM:
- Responds correctly
- Has stable VSS writers and integration tools
- Backup proceeds fine
- The other VMs (DC, Exchange 2019?):
- Hang or fail to respond to VSS
- Possibly have broken VSS writers, disk IO stalls, or missing integration
- Hyper-V says: “Can’t do online snapshot, try offline”
- For whatever reason, offline fallback fails too (e.g. host refuses to save state or save fails)
- Hyper-V kills the VM process to prevent a frozen state, or because WSB doesn’t wait
Exchange 2013 didn’t get affected because everything worked properly in that VM – and only the others failed VSS sync and were terminated to keep the backup flowing.
What Makes a VM Survive This?
Healthy Exchange 2013 VM Had… | Faulty VMs May Have Lacked… |
---|---|
Up-to-date Integration Services | Broken or missing Hyper-V components |
Stable VSS writers (vssadmin list writers OK) | VSS writer hangs, errors, or not registered |
No checkpoint/snapshot corruption | Snapshot merge backlog, broken AVHDX |
Sufficient disk space inside guest | Near-full volumes can freeze writers |
Less IO during snapshot | Exchange 2019 or DCs may spike during VSS freeze |
Try This: Compare Integration and VSS Health
Run this on the host for each VM:
Get-VMIntegrationService -VMName "YourVM"
And inside the guests:
vssadmin list writers
Compare the Exchange 2013 VM (which survived) with the ones that were killed. Look for:
- Writers stuck in “Waiting for completion” or “Retryable error”
- A missing Backup integration service
- Disabled or broken VSS-related services like COM+ Event System, Volume Shadow Copy, etc.
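One quick way to make that comparison for every VM at once, from the host (a sketch):
# List any integration service that is disabled or not reporting OK, grouped by VM
Get-VM | Get-VMIntegrationService |
    Where-Object { -not $_.Enabled -or ($_.PrimaryStatusDescription -and $_.PrimaryStatusDescription -ne "OK") } |
    Select-Object VMName, Name, Enabled, PrimaryStatusDescription |
    Sort-Object VMName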
What You Should Do Next
- Audit all VMs for:
  - Hyper-V integration service health
  - VSS writer stability
  - Event-log entries about backup coordination or timeouts
- Consider disabling checkpoint fallback on critical VMs. In PowerShell (same caveat as above: ProductionOnly stops Hyper-V falling back to a standard checkpoint):
Set-VM -Name "YourVMName" -AutomaticCheckpointsEnabled $false -CheckpointType ProductionOnly
- Exclude vulnerable VMs from the host-level backup. Instead, back them up from inside the guest using:
  - Windows Server Backup (guest-level)
  - Veeam Agent
  - Exchange-aware tools
- Switch to proper Hyper-V-aware backup software (Veeam, Altaro, etc.). These will log warnings, not kill VMs, and will skip misbehaving guests.
What you’ve just seen is rare, not documented well, and definitely not expected in a healthy environment. You’ve witnessed one of the ugliest failure modes of Windows Server Backup combined with Hyper-V:
When a backup goes wrong so catastrophically at the integration layer that the VM is simply killed to keep the host stable.
It’s not designed to do this lightly – but in a race between data consistency and system survival, Hyper-V may cut loose the non-cooperative VM.
What Makes This So Dangerous?
Because it’s silent, brutal, and unrecoverable unless you catch it:
- No graceful shutdown
- No warning inside the guest
- No rollback
- And often… no backup either – just a tombstone where your VM used to be
What This Confirms About Your Environment
- The problem is not your backup software alone – it’s the interaction between:
- WSB
- Hyper-V snapshot coordination
- Integration services inside the guest
- Possibly underlying disk latency or timeouts
- The Exchange 2013 VM survived, so we know:
- The host was fine
- The backup job mostly succeeded
- Only VMs with VSS or integration failures were at risk
What I Recommend from Here
Immediate
- Do a deep log audit around the time the VMs were killed. Look at:
  - Hyper-V-VMMS
  - Hyper-V-Worker
  - vmwp.exe crash logs
  - The System log for 6008 unexpected shutdowns
- Run VSS and integration service health checks. Inside all critical VMs:
vssadmin list writers
On the host:
Get-VMIntegrationService -VMName *
- Stop using WSB for these VMs until things are stable. It’s not safe if this could repeat.
Long-Term Hardening
- Switch to Veeam Community Edition – it’s free for 10 VMs, AD-aware, Exchange-aware, and doesn’t force shutdowns
- Use guest-level backup agents for DCs and Exchange if needed
- Schedule a regular VM health monitor that logs and alerts on:
- Unexpected terminations
- Broken integration services
- VSS writer failure
First Principles: What Should Happen?
When you run Windows Server Backup (WSB) on a Hyper-V host, here’s the ideal process for backing up VMs:
- WSB calls Hyper-V VSS Writer
- Hyper-V contacts each VM through Integration Services
- Inside each VM, the Volume Shadow Copy Service (VSS) prepares for snapshot:
- Flushes Exchange or AD logs
- Freezes IO momentarily
- Snapshot is taken
- IO resumes
- Backup captures the consistent VM state
Each VM does this independently – if one fails, the rest should continue.
What Actually Happened to You
Two VMs were “killed” during backup. That means the host forcibly terminated them, not cleanly, not gently.
How could this happen simultaneously?
Here’s How It Can Happen:
Shared Failure at the Integration/Storage Layer
If both VMs share a critical component, such as:
- The same VHDX disk location or controller
- A corrupt snapshot chain
- A faulty VSS coordination module (host VSS or integration services)
- Or a common IO stall (e.g., shared CSV, USB, RAID cache issue)
…then Hyper-V and VSS can stall or fail to quiesce both VMs at once.
When that happens, Windows Backup may:
- Try to offline both (e.g., save state)
- Encounter I/O timeout or failure
- Then Hyper-V kills the vmwp.exe processes for both – effectively pulling the power cord
This is a panic move – Hyper-V tries to prevent corruption by terminating hung VMs. But it’s brutal.
Real-World Precedents
There are documented cases (though rare) where:
- Two or more VMs on the same volume hit snapshot issues and both were dumped
- Systems with Exchange and DCs on the same controller were simultaneously killed due to VSS freeze failures
- WSB did not handle VM backup failures gracefully, choosing instead to let Hyper-V terminate the processes
Want to Know If They Were Sharing a Problem?
Here’s what to check:
1. Are the VMs on the same volume?
Run:
Get-VMHardDiskDrive -VMName "VM1", "VM2" | Select VMName, Path
If they share the same:
- VHDX path
- Controller type
- Physical volume
…it’s possible the failure was at the storage layer or snapshot level, not the guest OS.
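A small sketch that groups every VM’s virtual disks by the volume they sit on, so shared-storage suspects stand out at a glance:
# Group all virtual disks by their volume/root path
Get-VM | Get-VMHardDiskDrive |
    Select-Object VMName, Path, @{ Name = "Volume"; Expression = { [System.IO.Path]::GetPathRoot($_.Path) } } |
    Sort-Object Volume, VMName |
    Format-Table -GroupBy Volume VMName, Path -AutoSize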
2. Did They Share Snapshot Problems?
Check this:
Get-VMSnapshot -VMName "VM1", "VM2"
Do they have lingering or broken snapshots? Corrupt checkpoints can block VSS and I/O, leading to forced termination.
3. Were the VSS Writers Both Broken?
Inside both VMs, run:
vssadmin list writers
If both had stuck or failed writers at the same time, WSB may have attempted a “last resort” backup and the host couldn’t handle it.
If WSB + VSS + Hyper-V hit simultaneous coordination or I/O failure for multiple VMs, they can all go down together – because they share the same backup pipeline and the same I/O system underneath.
I think that DFSR may have been to blame by overloading the volume writer
Yes – now we’re getting to the heart of it.
That theory is very likely and actually fits the failure pattern perfectly:
DFSR (Distributed File System Replication) overloading the VSS Volume Shadow Copy writer on the host or within a DC VM can break the snapshot process entirely – and when Windows Backup triggers a host-level Hyper-V VSS snapshot, that broken state can cascade catastrophically.
What Happens When DFSR Hammers the Volume?
DFS Replication uses a lot of disk I/O – especially on:
- C:\Windows\SYSVOL\domain
- C:\System Volume Information
- Any staging or conflict folders
When WSB initiates a snapshot, VSS must freeze the volume. But if:
- DFSR is actively writing to the same volume
- Staging is overloaded or journaling is behind
- Or conflicts or retries are happening
Then the VSS volume writer stalls – and that leads to:
Domino Collapse
Step | What Happens |
---|---|
1. | WSB calls Hyper-V VSS writer to snapshot VMs |
2. | Hyper-V triggers VSS inside each VM |
3. | DC VM and/or other affected VMs hang due to DFSR overload |
4. | VSS freeze request times out or deadlocks |
5. | Snapshot cannot be taken; fallback fails |
6. | Hyper-V forcibly terminates the VM to avoid corruption |
And if Exchange is on a different volume, or not using DFSR, it survives.
This Explains Why:
- Only certain VMs (DCs, maybe Exchange 2019) were affected
- The Exchange 2013 VM was not
- There was no clean shutdown or save state
- The issue was intermittent and hard to reproduce
What to Do Next
1. Check DFSR Event Logs
Go to: Event Viewer > Applications and Services Logs > DFS Replication
Look for:
- Event ID 2212: journal wrap detected
- Event ID 4002: backlog or excessive staging
- Event ID 5002: replication retries
2. Check for DFSR Overload Symptoms
Run on affected DC:
dfsrdiag backlog /rgname:"Domain System Volume" /rfname:"SYSVOL Share" /sendingmember:<HealthyDC> /receivingmember:<BrokenDC>
And:
dfsrdiag pollad
Then check the size of the DFSR database/staging data on the volume (run elevated):
Get-ChildItem "C:\System Volume Information\DFSR" -Recurse | Measure-Object -Property Length -Sum
If it’s huge – you’ve got a bottleneck.
3. Throttle DFSR for Now
You can temporarily pause or throttle DFSR during backup windows by changing the connection schedule (treat this as a short-term measure – the SYSVOL replication group is normally left on its default schedule):
Set-DfsrConnectionSchedule -GroupName "Domain System Volume" -SourceComputerName "DC1" -DestinationComputerName "DC2" -ScheduleType Never
Re-enable it after the backup window with -ScheduleType UseGroupSchedule, or adjust the schedule’s bandwidth instead of pausing replication entirely.
4. Exclude DFSR Paths from Backup
If WSB is backing up SYSVOL as part of the VSS snapshot and DFSR is hammering it:
- Add backup exclusions for C:\Windows\SYSVOL
- Or switch to guest-level backups only
Want a DFSR Load Monitor?
I can write you a script that:
- Checks DFSR staging size
- Measures backlog between DCs
- Monitors replication errors
- Warns if you’re nearing overload before a backup window
It’s a simple way to keep VSS and DFSR from colliding again.
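As a starting point, here is a rough sketch of that kind of pre-backup check (the staging path, threshold, and partner DC name are assumptions – adjust them to your environment):
# Rough pre-backup DFSR health check: staging size, recent DFSR errors/warnings, and a backlog probe
$stagingPath    = "C:\Windows\SYSVOL\domain\DfsrPrivate\Staging"   # assumed - verify for your layout
$stagingLimitGB = 4                                                # warn above this
$partnerDc      = "DC1"                                            # a known-healthy partner DC

# 1. Staging folder size
$sizeGB = [math]::Round(((Get-ChildItem $stagingPath -Recurse -ErrorAction SilentlyContinue |
    Measure-Object -Property Length -Sum).Sum / 1GB), 2)
"Staging: $sizeGB GB (threshold $stagingLimitGB GB)"

# 2. DFSR warnings/errors in the last 24 hours
Get-WinEvent -FilterHashtable @{ LogName = "DFS Replication"; Level = 2, 3; StartTime = (Get-Date).AddDays(-1) } -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, LevelDisplayName | Format-Table -AutoSize

# 3. Backlog from the partner to this DC
dfsrdiag backlog /rgname:"Domain System Volume" /rfname:"SYSVOL Share" /sendingmember:$partnerDc /receivingmember:$env:COMPUTERNAME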