Site Recovery

Enterprise Feature

Site Recovery is available exclusively with an Enterprise license. The required feature flag is ceph_replication. Learn more about licensing.

Site Recovery provides disaster recovery (DR) capabilities for your Proxmox environments. It manages data replication between nodes or clusters, orchestrates recovery plans, and supports failover, failback, and emergency DR operations, so that critical workloads can be restored quickly after a site failure.

Overview

Site Recovery is built around two core concepts:

Replication Jobs -- Continuous or scheduled data replication from a source node or cluster to a target, so that a recent copy of your VMs is available for recovery.
Recovery Plans -- Predefined sequences of actions that describe how to restore a set of VMs on a target cluster in case of failure.

Together, these allow you to protect workloads, test your DR strategy regularly, and execute real failovers with minimal downtime.

Interface Tabs

The Site Recovery page is organized into four tabs: Dashboard, Protection, Recovery Plans, and Emergency.

Dashboard

The Dashboard tab provides a high-level view of your replication health:

Overall replication status -- healthy, degraded, or critical
Number of active replication jobs and their current states
Error count -- jobs in an error state are flagged immediately
Recovery plan status overview

Use this tab as a daily check-in to verify that your DR posture is healthy.

Protection

The Protection tab manages replication jobs. Each job defines what data is replicated, from where, and to where.

Creating a Replication Job

Click Create Job to open the creation dialog. You can configure:

Source connection -- the Proxmox cluster containing the VMs to protect
Target connection -- the destination cluster for replicated data
VMs to replicate -- select individual VMs from the source cluster
Schedule -- how often replication runs
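
If you prepare job definitions in a script before entering them in the dialog, a small model of these four fields can catch obvious mistakes early. The sketch below is illustrative only. The class name, the field names, and the use of a minute-based interval for the schedule are all assumptions, not ProxCenter's actual job schema.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationJobSpec:
    """Hypothetical model of the fields set in the Create Job dialog.

    Field names and the minute-based schedule are illustrative assumptions,
    not ProxCenter's real schema.
    """
    source_connection: str                            # cluster holding the VMs to protect
    target_connection: str                            # destination cluster for replicated data
    vm_ids: list[int] = field(default_factory=list)   # VMs selected on the source
    interval_minutes: int = 15                        # assumed schedule representation

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the spec looks sane."""
        problems = []
        if not self.source_connection or not self.target_connection:
            problems.append("source and target connections are required")
        if not self.vm_ids:
            problems.append("select at least one VM to replicate")
        if len(set(self.vm_ids)) != len(self.vm_ids):
            problems.append("each VM should be selected only once")
        if self.interval_minutes <= 0:
            problems.append("schedule interval must be positive")
        return problems
```

For example, ReplicationJobSpec("prod-cluster", "dr-cluster", [101, 102]).validate() returns an empty list, while a spec with no VMs reports that at least one VM must be selected.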

Managing Replication Jobs

For each job, the following actions are available:

Action	Description
Sync	Trigger an immediate replication sync
Pause	Temporarily suspend replication
Resume	Resume a paused replication job
Delete	Remove the replication job entirely
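
The four actions behave like transitions between job states. The sketch below models them as a small state machine. It is an illustration under assumptions: the state names are hypothetical, and the rule that a paused job must be resumed before it can be synced is assumed, not documented behavior.

```python
from enum import Enum


class JobState(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    DELETED = "deleted"


# Hypothetical transition table: which actions are allowed in which state.
# "sync" leaves an active job active; "delete" is terminal.
# Assumption: a paused job must be resumed before it can be synced.
ALLOWED = {
    JobState.ACTIVE: {"sync": JobState.ACTIVE, "pause": JobState.PAUSED, "delete": JobState.DELETED},
    JobState.PAUSED: {"resume": JobState.ACTIVE, "delete": JobState.DELETED},
    JobState.DELETED: {},
}


def apply_action(state: JobState, action: str) -> JobState:
    """Return the state after `action`, or raise ValueError if it is not allowed."""
    transitions = ALLOWED[state]
    if action not in transitions:
        raise ValueError(f"cannot {action!r} a job that is {state.value}")
    return transitions[action]
```

Because the table is explicit, an invalid action such as "resume" on an already active job fails loudly instead of being silently ignored.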

Selecting a job displays its execution logs in a detail panel, showing the history of sync operations with timestamps and results.

tip

Run a manual sync after making significant changes to a protected VM to ensure the latest state is replicated before relying on it for recovery.

Recovery Plans

Recovery Plans define the procedure for restoring services on a target cluster. The tab lists all existing plans and lets you create new ones.

Creating a Recovery Plan

Click Create Plan to define:

Plan name and description
Source and target clusters
Associated replication jobs -- which replication jobs feed into this plan
VM startup order and dependencies
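
Startup order with dependencies is a topological-sort problem. Each VM can start only after the VMs it depends on are running, and VMs with no dependency on each other can start together. The sketch below groups VMs into startup "waves" with Python's standard graphlib module. The VM names are hypothetical. The sketch shows the ordering idea only and does not describe how ProxCenter stores or executes plans internally.

```python
from graphlib import CycleError, TopologicalSorter


def startup_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group VMs into startup waves.

    depends_on maps each VM to the VMs that must be running before it starts.
    VMs in the same wave do not depend on each other and could start in parallel.
    Raises CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # detects circular dependencies up front
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With depends_on = {"db-01": set(), "app-01": {"db-01"}, "app-02": {"db-01"}, "web-01": {"app-01", "app-02"}}, the result is [["db-01"], ["app-01", "app-02"], ["web-01"]]: the database starts first, then both application servers, then the web front end.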

Recovery Plan Operations

Each recovery plan supports three operations:

Operation	Description
Test Failover	Executes the recovery plan in a network-isolated environment. Production workloads are not affected. Use this to validate your DR strategy regularly.
Failover	Activates the recovery plan for real. VMs are started on the target cluster using the most recent replicated data. Use this during an actual disaster.
Failback	After the primary site is restored, failback reverses the direction -- migrating workloads back from the DR site to the original production cluster.

While an operation is running, ProxCenter tracks its progress in real time, polling the execution status every 3 seconds and displaying step-by-step updates.
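
ProxCenter does this polling for you in the UI. If you track an execution from your own tooling, a loop like the one below follows the same pattern. This is a sketch under assumptions. fetch_status is a hypothetical stand-in for whatever call returns the execution's state. The "status" and "steps" fields and the terminal status values are also assumed, not a documented ProxCenter API.

```python
import time
from typing import Callable

# Assumed terminal status values; adjust to whatever your status source reports.
TERMINAL = {"completed", "failed", "cancelled"}


def wait_for_execution(
    fetch_status: Callable[[], dict],
    interval: float = 3.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll an execution until it reaches a terminal status.

    fetch_status is a caller-supplied (hypothetical) function returning a dict
    such as {"status": "running", "steps": [...]}. The 3-second default matches
    the interval ProxCenter uses in its UI.
    """
    deadline = time.monotonic() + timeout
    seen_steps = 0
    while True:
        execution = fetch_status()
        steps = execution.get("steps", [])
        for step in steps[seen_steps:]:  # report only steps not shown yet
            print(f"step: {step}")
        seen_steps = len(steps)
        if execution["status"] in TERMINAL:
            return execution
        if time.monotonic() >= deadline:
            raise TimeoutError("execution did not finish before the timeout")
        sleep(interval)
```

Injecting fetch_status and sleep keeps the loop independent of any particular client library and makes it easy to exercise with canned responses.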

warning

Failover is a disruptive operation. Ensure the source site is truly unavailable before initiating a production failover, as running the same VMs on both sites simultaneously can cause data corruption.

Test Cleanup

After running a test failover, use the Cleanup action to tear down the test environment and release resources on the target cluster. This ensures that test artifacts do not consume storage or interfere with future tests.

Execution History

Select a recovery plan to view its execution history -- a chronological list of all test, failover, and failback operations with their outcomes, timestamps, and any errors encountered.

Emergency

The Emergency tab is designed for critical situations where you need to act fast without going through the full recovery plan workflow.

Emergency DR Mode allows you to:

Start individual VMs on a target cluster directly from their most recent replication snapshot
Execute immediate failover of an entire recovery plan
Execute failback to restore services to the original site

This tab aggregates all replication jobs and recovery plans with quick-action buttons, giving operators a single view to manage a crisis.

warning

Emergency operations bypass the normal validation steps. Use them only when time is critical and you understand the implications of starting replicated VMs without a full plan execution.

Workflow Example

A typical Site Recovery workflow looks like this:

Set up replication: Create replication jobs for your critical VMs, pointing to a secondary Proxmox cluster
Create a recovery plan: Group the replication jobs into a recovery plan with the correct startup order
Test regularly: Run test failovers monthly to validate that recovery works as expected, then clean up
Respond to incidents: If the primary site fails, execute a failover from the Emergency tab or Recovery Plans tab
Restore normal operations: Once the primary site is back, perform a failback to return workloads to production
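
To keep the testing cadence from step 3 honest, you can check a plan's execution history for the most recent successful test failover. The sketch below is a hypothetical example. The record fields kind, outcome, and finished_at are assumptions about how you might export the history, and the 31-day window is one way to approximate "monthly".

```python
from datetime import datetime, timedelta


def dr_test_overdue(
    history: list[dict],
    now: datetime,
    max_age: timedelta = timedelta(days=31),
) -> bool:
    """Return True if no successful test failover finished within max_age of now.

    Each record is assumed (hypothetically) to look like
    {"kind": "test", "outcome": "success", "finished_at": datetime(...)}.
    """
    finished = [
        record["finished_at"]
        for record in history
        if record["kind"] == "test" and record["outcome"] == "success"
    ]
    if not finished:
        return True  # never tested successfully
    return now - max(finished) > max_age
```

Only successful test runs count. A real failover or a failed test does not reset the clock, because neither proves that the plan still works as a rehearsal.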

Permissions

Permission	Description
`vm.config`	Required to access Site Recovery and manage replication jobs and recovery plans

Users without the vm.config permission will not see the Site Recovery entry in the navigation sidebar.
