Failover in physical cluster replication (PCR) allows you to move application traffic from the active primary cluster to the passive standby cluster. When you complete the replication stream to initiate a failover, the job stops replicating data from the primary, sets the standby virtual cluster to a point in time (in the past or future) where all ingested data is consistent, and then makes the standby virtual cluster ready to accept traffic.
After a failover event, you may want to return your operations to the original primary cluster (or a new cluster). Failback in PCR does this by replicating new application traffic back onto the original primary cluster. When you initiate a failback, the job ensures the original primary is up to date with writes from the standby that happened after failover. The original primary cluster is then set as ready to accept application traffic once again.
This page describes:
- Failover from the primary cluster to the standby cluster.
- Failback:
  - From the original standby cluster (after it was promoted during failover) to the original primary cluster.
  - After the PCR stream used an existing cluster as the primary cluster.
- Job management after a failover or failback.
Failover and failback do not redirect traffic automatically. Once the failover or failback is complete, you must redirect application traffic to the promoted cluster: the standby cluster after a failover, or the original primary cluster after a failback.
Failover
The failover is a two-step process on the standby cluster:
- Initiate the failover.
- Complete the failover.
Before you begin
During PCR, jobs running on the primary cluster will replicate to the standby cluster. Before you fail over to the standby cluster, or fail back to the original primary cluster, consider how you will manage running (replicated) jobs between the clusters. Refer to Job management for instructions.
Step 1. Initiate the failover
To initiate a failover to the standby cluster, specify the point in time for its promotion. At failover, the standby cluster’s data will reflect the state of the primary at the specified moment. Refer to the following sections for steps:
- LATEST: The most recent replicated timestamp. This minimizes any data loss from the replication lag in asynchronous replication.
- Point-in-time:
  - Past: A past timestamp within the failover window of up to 4 hours in the past.
    Tip: Failing over to a past point in time is useful if you need to recover from a recent human error.
  - Future: A future timestamp for planning a failover.
Fail over to the most recent replicated time
To initiate a failover to the most recent replicated timestamp, specify LATEST. Due to replication lag, the most recent replicated time may be behind the current actual time. Replication lag is the time difference between the most recent replicated time and the actual time.
- To view the current replication timestamp, use:

    SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;

    id | name | source_tenant_name | source_cluster_uri                              | retained_time                 | replicated_time        | replication_lag | failover_time | status
    ---+------+--------------------+-------------------------------------------------+-------------------------------+------------------------+-----------------+---------------+------------
    3  | main | main               | postgresql://user@hostname or IP:26257?redacted | 2024-04-18 10:07:45.000001+00 | 2024-04-18 14:07:45+00 | 00:00:19.602682 | NULL          | replicating
    (1 row)

  Tip: You can view the Replication Lag graph in the standby cluster's DB Console.
- Run the following from the standby cluster's SQL shell to start the failover:

    ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO LATEST;

  The failover_time is the timestamp at which the replicated data is consistent. The cluster will revert any replicated data above this timestamp to ensure that the standby is consistent with the primary at that timestamp:

    failover_time
    --------------------------------
    1695922878030920020.0000000000
    (1 row)
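The failover_time value is hard to read at a glance. The following sketch converts it to a UTC timestamp. It assumes that the integer part of the value is nanoseconds since the Unix epoch, which agrees with the replicated_time shown next to the same value in the Step 2 sample output, and that a date command accepting -d "@seconds" (GNU or BusyBox) is available. The function name failover_time_to_utc is hypothetical.

```shell
# Sketch: convert a failover_time value to a UTC timestamp.
# Assumption: the integer part is nanoseconds since the Unix epoch.
# Assumption: date supports -d "@seconds" (GNU/BusyBox; not macOS date).
failover_time_to_utc() {
  nanos="${1%%.*}"                 # drop everything after the decimal point
  secs=$(( nanos / 1000000000 ))   # nanoseconds -> whole seconds
  date -u -d "@${secs}" '+%Y-%m-%d %H:%M:%S+00'
}

# Value taken from the sample output on this page.
failover_time_to_utc 1695922878030920020.0000000000
# prints: 2023-09-28 17:41:18+00
```

The sub-second part is discarded here, so treat the result as a rounded-down reading rather than the exact failover point.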
Fail over to a point in time
You can control the point in time that the PCR stream will fail over to.
- To select a specific time in the past, use:

    SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;

  The retained_time response provides the earliest time to which you can fail over. This is up to four hours in the past.

    id | name | source_tenant_name | source_cluster_uri                              | retained_time                 | replicated_time        | replication_lag | failover_time | status
    ---+------+--------------------+-------------------------------------------------+-------------------------------+------------------------+-----------------+---------------+------------
    3  | main | main               | postgresql://user@hostname or IP:26257?redacted | 2024-04-18 10:07:45.000001+00 | 2024-04-18 14:07:45+00 | 00:00:19.602682 | NULL          | replicating
    (1 row)
- Specify a timestamp:

    ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO SYSTEM TIME '-1h';

  Refer to Using different timestamp formats for more information.

  Similarly, to fail over to a specific time in the future:

    ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO SYSTEM TIME '+5h';

  A future failover will proceed once the replicated data has reached the specified time.
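The three targets described in this step (LATEST, a past offset, and a future offset) differ only in the final clause of the statement. The following is a minimal sketch of a helper that prints the statement and rejects past offsets beyond the 4-hour window. It only understands LATEST and whole-hour offsets such as -1h or +5h. The function name complete_replication_stmt is hypothetical, and the real lower bound is the retained_time reported by the cluster, which may be more recent than 4 hours ago.

```shell
# Sketch: print the failover statement for a virtual cluster.
# Usage: complete_replication_stmt <virtual-cluster> <LATEST|-Nh|+Nh>
complete_replication_stmt() {
  vc="$1"
  target="$2"
  if [ "$target" = "LATEST" ]; then
    echo "ALTER VIRTUAL CLUSTER ${vc} COMPLETE REPLICATION TO LATEST;"
    return 0
  fi
  # Split a relative offset such as -1h into its sign and hour count.
  case "$target" in
    -*h) sign="-"; hours="${target#-}" ;;
    +*h) sign="+"; hours="${target#+}" ;;
    *)   echo "error: unsupported target '${target}'" >&2; return 1 ;;
  esac
  hours="${hours%h}"
  case "$hours" in
    ''|*[!0-9]*) echo "error: offset must be whole hours, e.g. -1h" >&2; return 1 ;;
  esac
  # Past offsets must stay inside the failover window (assumed 4 hours here).
  if [ "$sign" = "-" ] && [ "$hours" -gt 4 ]; then
    echo "error: ${target} is outside the 4-hour failover window" >&2
    return 1
  fi
  echo "ALTER VIRTUAL CLUSTER ${vc} COMPLETE REPLICATION TO SYSTEM TIME '${target}';"
}

complete_replication_stmt main LATEST
complete_replication_stmt main -1h
complete_replication_stmt main +5h
```

The helper only prints text. Check the retained_time value before running a past-time failover, since that value, not the fixed 4-hour figure, is the actual limit.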
To monitor when the replication stream completes, continue to Step 2.
Step 2. Complete the failover
- The completion of the replication is asynchronous; to monitor its progress use:

    SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;

    id | name | source_tenant_name | source_cluster_uri                              | retained_time                 | replicated_time              | replication_lag | failover_time                  | status
    ---+------+--------------------+-------------------------------------------------+-------------------------------+------------------------------+-----------------+--------------------------------+-----------------------------
    3  | main | main               | postgresql://user@hostname or IP:26257?redacted | 2023-09-28 16:09:04.327473+00 | 2023-09-28 17:41:18.03092+00 | 00:00:19.602682 | 1695922878030920020.0000000000 | replication pending failover
    (1 row)

  Refer to Physical Cluster Replication Monitoring for the Responses and Data state of SHOW VIRTUAL CLUSTER ... WITH REPLICATION STATUS fields.
- Once complete, bring the standby's virtual cluster online with:

    ALTER VIRTUAL CLUSTER main START SERVICE SHARED;

    id | name   | data_state | service_mode
    ---+--------+------------+--------------
    1  | system | ready      | shared
    3  | main   | ready      | shared
    (2 rows)
- To make the standby's virtual cluster the default for connection strings, set the following cluster setting:

    SET CLUSTER SETTING server.controller.default_target_cluster='main';
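To script the status check from the first step in this list, one option is to pull the status column out of the table output. The following sketch reads table output on standard input and prints the status for a named virtual cluster. It assumes the pipe-separated layout shown on this page, with name as the second column and status as the last. The function name replication_status is hypothetical.

```shell
# Sketch: print the status column for one virtual cluster.
# Assumption: input is pipe-separated table output with the name in
# column 2 and the status in the last column, as in the sample above.
replication_status() {
  awk -F'|' -v name="$1" '
    NF > 1 {
      n = $2; gsub(/^ +| +$/, "", n)      # trim the name column
      if (n == name) {
        s = $NF; gsub(/^ +| +$/, "", s)   # trim the status column
        print s
      }
    }
  '
}

# Example with a shortened version of the sample output.
printf '%s\n' \
  ' id | name | source_tenant_name | status' \
  '----+------+--------------------+------------------------------' \
  '  3 | main | main               | replication pending failover' \
  '(1 row)' | replication_status main
```

In practice the input would come from the SQL shell rather than printf; the parsing is deliberately loose and depends on the column order staying as shown.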
At this point, the primary and standby clusters are entirely independent. You will need to use your own network load balancers, DNS servers, or other network configuration to direct application traffic to the standby (now primary). To manage replicated jobs on the promoted standby, refer to Job management.
To enable PCR again, from the new primary to the original primary (or a completely different cluster), refer to Fail back to the primary cluster.
Failback
After failing over to the standby cluster, you may want to return to your original configuration by failing back to the original primary-standby cluster setup. Depending on the configuration of the primary cluster in the original PCR stream, use one of the following workflows:
- From the original standby cluster (after it was promoted during failover) to the original primary cluster. If this failback is initiated within 24 hours of the failover, PCR replicates the net-new changes from the standby cluster to the primary cluster, rather than fully replacing the existing data in the primary cluster.
- After the PCR stream used an existing cluster as the primary cluster.
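The 24-hour condition in the first workflow is simple arithmetic on the failover time. The following sketch checks whether a failback started now, or at a given time, would still fall inside that window. It assumes the integer part of failover_time is nanoseconds since the Unix epoch. The function name failback_window and its output strings are hypothetical.

```shell
# Sketch: report whether a failback falls within 24 hours of the failover.
# Usage: failback_window <failover_time> [now_epoch_seconds]
# Assumption: the integer part of failover_time is nanoseconds since epoch.
failback_window() {
  nanos="${1%%.*}"
  failover_secs=$(( nanos / 1000000000 ))
  now_secs="${2:-$(date +%s)}"          # default to the current time
  elapsed=$(( now_secs - failover_secs ))
  if [ "$elapsed" -le 86400 ]; then     # 86400 seconds = 24 hours
    echo "within 24 hours"
  else
    echo "past 24 hours"
  fi
}

# One hour after the sample failover_time used later on this page.
failback_window 1714497890000000000.0000000000 1714501490
# prints: within 24 hours
```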
To move back to a different cluster that was not involved in the original PCR stream, set up a new PCR stream following the PCR setup guide.
Fail back to the original primary cluster
This section illustrates the steps to fail back to the original primary cluster from the promoted standby cluster that is currently serving traffic.
- Cluster A = original primary cluster
- Cluster B = original standby cluster
Cluster B is serving application traffic after the failover.
- To begin the failback to Cluster A, the virtual cluster must first stop accepting connections. Connect to the system virtual cluster on Cluster A:

    cockroach sql --url \
    "postgresql://{user}@{node IP or hostname cluster A}:26257?options=-ccluster=system&sslmode=verify-full" \
    --certs-dir "certs"
- From the system virtual cluster on Cluster A, ensure that service to the virtual cluster has stopped:

    ALTER VIRTUAL CLUSTER {cluster_a} STOP SERVICE;

- Open another terminal window and generate a connection string for Cluster B using cockroach encode-uri:

    cockroach encode-uri {replication user}:{password}@{cluster B node IP or hostname}:26257 --ca-cert certs/ca.crt --inline

  Copy the output ready for starting the PCR stream, which requires the connection string to Cluster B:

    postgresql://{replication user}:{password}@{cluster B node IP or hostname}:26257/defaultdb?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded_cert}-----END+CERTIFICATE-----%0A

  Tip: For details on connection strings, refer to the Connection reference.

- Connect to the system virtual cluster for Cluster B:

    cockroach sql --url \
    "postgresql://{user}@{cluster B node IP or hostname}:26257?options=-ccluster=system&sslmode=verify-full" \
    --certs-dir "certs"

- From the system virtual cluster on Cluster B, enable rangefeeds:

    SET CLUSTER SETTING kv.rangefeed.enabled = 'true';

- From the system virtual cluster on Cluster A, start the replication from Cluster B to Cluster A. Include the connection string for Cluster B:

    ALTER VIRTUAL CLUSTER {cluster_a} START REPLICATION OF {cluster_b} ON 'postgresql://{replication user}:{password}@{cluster B node IP or hostname}:26257/defaultdb?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded_cert}-----END+CERTIFICATE-----%0A';

  This will reset the virtual cluster on Cluster A back to the time at which the same virtual cluster on Cluster B diverged from it. Cluster A will check with Cluster B to confirm that its virtual cluster was replicated from Cluster A as part of the original PCR stream.

  Note: (Preview) If you want to start the PCR stream with a read-only virtual cluster on the standby after failing back to the original primary cluster, run the ALTER VIRTUAL CLUSTER statement in this step with the READ VIRTUAL CLUSTER option.
- Check the status of the virtual cluster on A:

    SHOW VIRTUAL CLUSTER {cluster_a};

    id | name   | data_state  | service_mode
    ---+--------+-------------+--------------
    1  | system | ready       | shared
    3  | {vc_a} | replicating | none
    4  | test   | replicating | none
    (3 rows)
- From Cluster A, start the failover:

    ALTER VIRTUAL CLUSTER {cluster_a} COMPLETE REPLICATION TO LATEST;

  After the failover has successfully completed, it returns a failover_time timestamp, representing the time at which the replicated data is consistent. Note that the cluster reverts any replicated data above the failover_time to ensure that the standby is consistent with the primary at that time:

    failover_time
    --------------------------------
    1714497890000000000.0000000000
    (1 row)

- From Cluster A, bring the virtual cluster online:

    ALTER VIRTUAL CLUSTER {cluster_a} START SERVICE SHARED;

- To make Cluster A's virtual cluster the default for connection strings, set the following cluster setting:

    SET CLUSTER SETTING server.controller.default_target_cluster='{cluster_a}';
At this point, Cluster A has caught up to Cluster B. The clusters are entirely independent. To enable PCR again from the primary to the standby, refer to Set Up Physical Cluster Replication.
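As a recap, the SQL statements from this section can be printed in order as a checklist. The following sketch only prints each statement with a label for the cluster it runs on; it does not connect to anything. The function name failback_plan and the [A]/[B] label format are hypothetical, and the arguments stand in for the {cluster_a}, {cluster_b}, and connection-string placeholders used above.

```shell
# Sketch: print the failback statements in the order given in this section.
# Usage: failback_plan <virtual cluster on A> <virtual cluster on B> <connection string for B>
# [A] = run from the system virtual cluster on Cluster A, [B] = on Cluster B.
failback_plan() {
  vc_a="$1"
  vc_b="$2"
  uri_b="$3"
  cat <<EOF
[A] ALTER VIRTUAL CLUSTER ${vc_a} STOP SERVICE;
[B] SET CLUSTER SETTING kv.rangefeed.enabled = 'true';
[A] ALTER VIRTUAL CLUSTER ${vc_a} START REPLICATION OF ${vc_b} ON '${uri_b}';
[A] SHOW VIRTUAL CLUSTER ${vc_a};
[A] ALTER VIRTUAL CLUSTER ${vc_a} COMPLETE REPLICATION TO LATEST;
[A] ALTER VIRTUAL CLUSTER ${vc_a} START SERVICE SHARED;
[A] SET CLUSTER SETTING server.controller.default_target_cluster='${vc_a}';
EOF
}

# Placeholder arguments; substitute your own names and the encode-uri output.
failback_plan '{cluster_a}' '{cluster_b}' '{connection string for Cluster B}'
```

Printing rather than executing keeps the operator in control of each step, which matters here because the status check between starting and completing replication is a manual decision point.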
Fail back after replicating from an existing primary cluster
You can replicate data from an existing CockroachDB cluster that does not have cluster virtualization enabled to a standby cluster with cluster virtualization enabled. For instructions on setting up PCR in this way, refer to Set up PCR from an existing cluster.
After a failover to the standby cluster, you may want to set up PCR from the original standby cluster, which is now the primary, to another cluster, which will become the standby. There are multiple ways to set up a new standby, each with its own considerations.
In the example, the clusters are named for reference:
- A = The original primary cluster, which started without virtualization.
- B = The original standby cluster, which started with virtualization.
- You run a PCR stream from cluster A to cluster B.
- You initiate a failover from cluster A to cluster B.
- You promote the main virtual cluster on cluster B and start serving application traffic from B (that acts as the primary).
- You need to create a standby cluster for cluster B to replicate changes to. You can do one of the following:
  - Create a new virtual cluster (main) on cluster A from the replication of cluster B. Cluster A is now virtualized. This will start an initial scan because the PCR stream will ignore the former workload tables in the system virtual cluster that were originally replicated to B. You can drop the tables that were in the system virtual cluster, because the new virtual cluster will now hold the workload replicating from cluster B.
  - Start an entirely new cluster C and create a main virtual cluster on it from the replication of cluster B. This will start an initial scan because cluster C is empty.
Job management
During PCR, jobs running on the primary cluster replicate to the standby cluster. Once you have completed a failover (or a failback), refer to the following sections for details on resuming jobs on the promoted cluster.
Backup schedules
Backup schedules pause after failover on the promoted standby cluster. Take the following steps to resume jobs:
- Verify that there are no other schedules running backups to the same collection of backups, i.e., the schedule that was running on the original primary cluster.
- Resume the backup schedule on the promoted cluster.
If your backup schedule was created on a cluster in v23.1 or earlier, it will not pause automatically on the promoted cluster after failover. In this case, you must pause the schedule manually on the promoted cluster and then take the outlined steps.
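On the promoted cluster, these steps map onto CockroachDB's schedule statements: SHOW SCHEDULES FOR BACKUP lists backup schedules, and PAUSE SCHEDULE or RESUME SCHEDULE act on a schedule ID. The following sketch prints the statement for a given action. The function name schedule_stmt and the ID 123456 are hypothetical, and this is an illustration rather than a complete procedure.

```shell
# Sketch: print the statement that pauses or resumes one schedule.
# Usage: schedule_stmt <pause|resume> <schedule-id>
# The schedule ID comes from SHOW SCHEDULES FOR BACKUP on the promoted cluster.
schedule_stmt() {
  action="$1"
  id="$2"
  case "$action" in
    pause)  verb="PAUSE" ;;
    resume) verb="RESUME" ;;
    *) echo "error: action must be pause or resume" >&2; return 1 ;;
  esac
  case "$id" in
    ''|*[!0-9]*) echo "error: schedule ID must be numeric" >&2; return 1 ;;
  esac
  echo "${verb} SCHEDULE ${id};"
}

echo "SHOW SCHEDULES FOR BACKUP;"   # step 1: inspect existing schedules
schedule_stmt resume 123456         # step 2: resume (123456 is a placeholder ID)
```

For schedules created in v23.1 or earlier, the same helper with pause covers the manual pause described above.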
Changefeeds
Currently running changefeeds will fail on the promoted cluster immediately after failover to avoid two clusters running the same changefeed to one sink. We recommend that you recreate changefeeds on the promoted cluster.
To avoid multiple clusters running the same schedule concurrently, changefeed schedules will pause after physical cluster replication has completed.
If your changefeed schedule was created on a cluster in v24.1 or earlier, it will not pause automatically on the promoted cluster after failover. In this case, you will need to manage pausing or canceling the schedule on the promoted standby cluster to avoid two clusters running the same changefeed to one sink.