To update multiple rows in a table, you can use a single UPDATE statement, with a WHERE clause that filters the rows you want to update.
To update a large number of rows (i.e., tens of thousands of rows or more), we recommend iteratively updating subsets of the rows that you want to update, until all of the rows have been updated. You can write a script to do this, or you can write a loop into your application.
This page provides guidance on writing batch-update loops with a pattern that executes SELECT and UPDATE statements at different levels of a nested loop.
Exercise caution when batch-updating rows from tables with foreign key constraints and explicit ON UPDATE foreign key actions. To preserve UPDATE performance on tables with foreign key actions, we recommend using smaller batch sizes, as additional rows updated due to ON UPDATE actions can make batch loops significantly slower.
Before you begin
Before reading this page, do the following:
- Create a CockroachDB Standard cluster or start a local cluster.
- Install a Driver or ORM Framework. - For the example on this page, we use the - psycopg2Python driver.
- Insert data that you now want to update. - For the example on this page, we load a cluster with the - movrdatabase and data from- cockroach workload.
Write a batch-update loop
- At the top level of a loop in your application, or in a script, execute a - SELECTquery that returns a large batch of primary key values for the rows that you want to update. When defining the- SELECTquery:- Use a WHEREclause to filter on columns that identify the rows that you want to update. This clause should also filter out the rows that have been updated by previous iterations of the nestedUPDATEloop:- For optimal performance, the first condition of the filter should evaluate the last primary key value returned by the last UPDATEquery that was executed. This narrows eachSELECTquery's scan to the fewest rows possible, and preserves the performance of the row updates over time.
- Another condition of the filter should evaluate column values persisted to the database that signal whether or not a row has been updated. This prevents rows from being updated more than once, in the event that the application or script crashes and needs to be restarted. If there is no way to distinguish between an updated row and a row that has not yet been updated, you might need to add a new column to the table (e.g., ALTER TABLE ... ADD COLUMN updated BOOL;).
 
- For optimal performance, the first condition of the filter should evaluate the last primary key value returned by the last 
- Add an AS OF SYSTEM TIMEclause to the end of the selection subquery, or run the selection query in a separate, read-only transaction withSET TRANSACTION AS OF SYSTEM TIME. This helps to reduce transaction contention.
- Use a LIMITclause to limit the number of rows queried to a subset of the rows that you want to update. To determine the optimalSELECTbatch size, try out different sizes (10,000 rows, 20,000 rows, etc.), and monitor the change in performance. Note that thisSELECTbatch size can be much larger than the batch size of rows that are updated in the subsequentUPDATEquery.
- To ensure that rows are efficiently scanned in the subsequent UPDATEquery, include anORDER BYclause on the primary key.
 
- Use a 
- Under the - SELECTquery, write a nested loop that executes- UPDATEqueries over the primary key values returned by the- SELECTquery, in batches smaller than the initial- SELECTbatch size. When defining the- UPDATEquery:- Use a WHEREclause that filters on a subset of the primary key values returned by the top-levelSELECTquery. To determine the optimalUPDATEbatch size, try out different sizes (1,000 rows, 2,000 rows, etc.), and monitor the change in performance.
- Make sure that the UPDATEquery updates a column that signals whether or not the row has been updated. This column might be different from the column whose values you want to update.
- Add a RETURNINGclause to the end of the query that returns the primary key values of the rows being updated. TheWHEREclause in the top-levelSELECTquery should filter out the primary key value of the last row that was updated, using the values returned by the lastUPDATEquery executed.
- Where possible, we recommend executing each UPDATEin a separate transaction.
 
- Use a 
Example
Suppose that over the past year, you've recorded hundreds of thousands of MovR rides in a cluster loaded with the movr database. And suppose that, for the last week of December, you applied a 10% discount to all ride charges billed to users, but you didn't update the rides table to reflect the discounts.
To get the rides table up-to-date, you can create a loop that updates the relevant rows of the rides table in batches, following the query guidance provided above.
In this case, you will also need to add a new column to the rides table that signals whether or not a row has been updated. Using this column, the top-level SELECT query can filter out rows that have already been updated, which will prevent rows from being updated more than once if the script crashes.
For example, you could create a column named discounted, of data type BOOL:
ALTER TABLE rides ADD COLUMN discounted BOOL DEFAULT false;
The Bulk-update Data statement performs a schema change. For more information about how online schema changes work in CockroachDB, see Online Schema Changes.
In Python, a batch-update script might look similar to the following:
#!/usr/bin/env python3
import psycopg2
import os
import time
def main():
    conn = psycopg2.connect(os.environ.get('DB_URI'))
    lastid = None
    while True:
        with conn:
            with conn.cursor() as cur:
                cur.execute("SET TRANSACTION AS OF SYSTEM TIME '-5s'")
                if lastid:
                    cur.execute("SELECT id FROM rides WHERE id > %s AND discounted != true AND extract('month', start_time) = 12 AND extract('day', start_time) > 23 ORDER BY id LIMIT 10000", (lastid,))
                else:
                    cur.execute("SELECT id FROM rides WHERE discounted != true AND extract('month', start_time) = 12 AND extract('day', start_time) > 23 ORDER BY id LIMIT 10000")
                pkvals = list(cur)
        if not pkvals:
            return
        while pkvals:
            batch = pkvals[:2000]
            pkvals = pkvals[2000:]
            with conn:
                with conn.cursor() as cur:
                    cur.execute("UPDATE rides SET discounted = true, revenue = revenue*.9 WHERE id = ANY %s RETURNING id", (batch,))
                    print(cur.statusmessage)
                    if not pkvals:
                        lastid = cur.fetchone()[0]
        del batch
        del pkvals
        time.sleep(5)
    conn.close()
if __name__ == '__main__':
    main()
At each iteration, the SELECT query returns the primary key values of up to 10,000 rows of matching historical data from 5 seconds in the past, in a read-only transaction. Then, a nested loop iterates over the returned primary key values in smaller batches of 2,000 rows. At each iteration of the nested UPDATE loop, a batch of rows is updated. After the nested UPDATE loop updates all of the rows from the initial selection query, a time delay ensures that the next selection query reads historical data from the table after the last iteration's UPDATE final update.
Note that the last iteration of the nested loop assigns the primary key value of the last row updated to the lastid variable. The next SELECT query uses this variable to decrease the number of rows scanned by the number of rows updated in the last iteration of the loop.