Transaction Isolation

The third of ACID properties “Isolation” is basically the question:

“Can an in-progress transaction see changes made by another transaction running in-parallel?”

Follow up to it would be “If yes, at what point of time can it see such changes?”

Let’s try to analyse some aspects of the above question(s) from the db system implementation perspective. 

Transactions: One at a time or Concurrent

First, we’re talking about more than 1 transaction running in-parallel to each other. If only a single transaction was allowed to execute at a time, then we don’t need to worry about Isolation at-all.. Like.. ever.. But then there’s no fun either right? I mean, what good would a computer system be if it could do only 1 thing at a time and pick up the next only when the first has finished?

Further, multiple concurrent transactions would still be fine if they were to execute on mutually exclusive data-sets only. But in practice, such a system would still be fairly limited in usability and performance. So we need to think about systems where multiple transactions are running in parallel, potentially on the same dataset.

Change visibility

Now that we’ve established to worry about only concurrent transactions working on the same dataset, Second thing we need to think about is whether we should allow a transaction to see changes made by another transaction? If yes, at what point of time during its execution? If not, why and what are the trade-offs to consider for each approach?

With the background of these different questions around change visibility, we can now discuss something called the “Read Phenomena”.

Read Phenomena

The term is used as a collective name for three different issues that can occur when there are parallel transactions reading and writing the same rows/ dataset. These are:

  • Dirty Read
  • Non-Repeatable Read
  • Phantom Read

While the above three are as defined under ANSI SQL standard, there is a fourth read phenomena also to consider, called “Lost Updates“

Let’s discuss each of the three “Read Phenomena” in more detail:

Dirty Read:

Reading an uncommitted change. Let’s take our example INVENTORY table from the Transaction page:

Example Inventory table

Initially, there are 10 qty each for the two items. Now we perform following transactions in parallel:

BEGIN Transaction1;  
SELECT ITEM_DESC, QTY FROM INVENTORY;    
/*
Will result in:      
iPhone, 10      Samsung Galaxy Note,  10    
*/





SELECT SUM(QTY) FROM INVENTORY;    
/*
Will return 18 where it should’ve returned 20 because we read a dirty value that has not been committed yet.
*/
COMMIT Transaction1;
BEGIN Transaction2;

  UPDATE INVENTORY SET QTY = QTY - 2 WHERE ITEM_ID = 1;

/* Will change the data for item_id = 1 as this:    1, iPhone, 8    */

/* Still Uncommitted */
Transaction flow to depict dirty read

So in case of dirty-read when there are two transactions running in-parallel above, we read the uncommitted changes from transaction-2 into transaction-1. Now, transaction-2 may even revert the changes by rollback, but since transaction-1 has already read the value, it would produce an incorrect result.

Non-repeatable Read

You do a read of certain data in a transaction and then you attempt to read the same data again within that same transaction, but by this time, the data has changed/ deleted. In essence, you’re just reading, but the results differ between different reads within the same transaction. 

Continuing with the same example INVENTORY table setup from above section on Dirty Read, let’s say this time we have following transactions:

BEGIN Transaction1;  
SELECT ITEM_DESC, QTY FROM INVENTORY;    

/*
Will result in:      
iPhone, 10      
Samsung Galaxy Note,  10   
 */








SELECT SUM(QTY) FROM INVENTORY;
/* Will return 18 where it should’ve returned 20 because we
read a committed value from another transaction, but it is something which makes our earlier result inconsistent */
COMMIT Transaction1;
BEGIN Transaction2;

UPDATE INVENTORY SET QTY = QTY - 2 WHERE ITEM_ID = 1;    
/*
Will change the data for item_id = 1 as this:    
1, iPhone, 8    
*/
COMMIT Transaction2;
Transaction flow to depict non-repeatable read

So the only difference is that we actually commit Transaction2 before the second select query is executed in transaction1. Now the result of the second select in tx1 is the same as that in the dirty-read scenario. But since we’re actually reading a committed value, it’s not termed “dirty” read. And because the results in two selects on the same data in the same transaction are different we say that its non-repeatable read problem.

Phantom Read

When we want to read a certain range of data twice in the same transaction but before the second read, new data gets inserted by another transaction and which satisfies our range criteria. Now this sounds similar to non-repeatable reads but it’s not quite. In non-repeatable reads, the existing data being read is changed/ updated while in Phantom Read we talk about new data becoming available.

Extending our example from above two cases with INVENTORY table, let’s say our initial table setup is same and then we perform following transactions:

BEGIN Transaction1;
SELECT ITEM_DESC, QTY FROM INVENTORY;
/*
Will result in:      
iPhone, 10      
Samsung Galaxy Note,  10   
 */








SELECT SUM(QTY) FROM INVENTORY;
/* Will return 25 where it should’ve returned 20 because
we read a “phantom” row which was not available during the first select query.*/
COMMIT Transaction1;
BEGIN Transaction2;

INSERT INTO INVENTORY VALUES (3, ‘One Plus’, 5);
/*
Adding new data row in the table
3, One Plus, 5    
*/
COMMIT Transaction2;

So here we insert another row, a third row, in our INVENTORY table via transaction2 and commit this change before the second select in transaction1 executes. This results in the second select of transaction1 giving a sum of 28 items whereas the first select could only return 20 items. So reading data during subsequent select within the same transaction which was non-existent earlier is called ‘Phantom Read’ phenomena.

Lost Updates

When an update made by one transaction gets overwritten by an update from another transaction. So this means if we were to read the updated data within the first transaction, that read would not be the expected result. Now this sounds very similar to the non-repeatable read, but this one is actually a “Write” problem while non-repeatable read is essentially a read problem. Let’s try to understand better by an example.

So the initial setup remains the same for the INVENTORY table as that in all above read-phenomena examples. And then we perform following transactions:

BEGIN Transaction1;
UPDATE INVENTORY SET QTY = QTY - 2 WHERE ITEM_ID = 1;
/*
Will result in iPhone quantity being updated to 8:      
iPhone, 8    
*/










SELECT SUM(QTY) FROM INVENTORY;
/* Will return 12 where it should’ve returned 8 only,
because we read a committed value from another transaction,
the update we made was lost to the other transaction. */
COMMIT Transaction1;
BEGIN Transaction2;


UPDATE INVENTORY SET QTY = QTY + 2 WHERE ITEM_ID = 1;
/*
Will change the data for item_id = 1 as this:    
1, iPhone, 12    
*/
COMMIT Transaction2;

So both updates in above transactions are operating on the same row. Since both would read the same initial value of 10 while working in parallel, the updated value would be different in each case. The first one expects the updated result to be 8 as a result of reducing 2 quantities, the other one actually increases the quantity by 2. So the update from the first one gets overwritten by the second one. Row level locks are the most popular way to prevent this read phenomena.

Isolation levels for inflight transactions

Database systems usually provide following isolation levels which can be used to prevent 1 or more of above read phenomena. One of these levels is normally set as default at database level (either by database system developer or by the database administrator) but could also be set per-transaction level if required.

  • Read Uncommitted: Basically means read everything from outside current transaction (including dirty-reads)
  • Read Committed: Every query would read only the committed changes from other transactions (means non-repeatable reads can occur). So restricts only uncommitted reads. It is the default isolation level in many commercially available DBMS out there. Oracle’s read-committed level also includes the repeatable read level.
  • Repeatable Read: Within a transaction, when a query reads a row and then another query attempts to read the same row, the data in that row would remain unchanged with repeatable read isolation level. Although such a guarantee is not made for a range of rows read by some query.
  • Snapshot: It’s like saying the data visible to queries within a transaction would be the snapshot of the database at the start of that transaction. Where snapshot would only include the committed changes into the database up to that point.
  • Serialisable: Transactions run as-if they are executed one after the other. Most secure in terms of read phenomena and also the least performant. It does include intricacies such as linearizability of transactions and things like that but for the purpose of the current post we keep it short.

The following table summarises the isolation levels and different read phenomena that can occur in each such level. Source (wikipedia)

Table to summarise isolation levels and read-phenomenon that occur in each
Table to summarise isolation levels and read-phenomenon that occur in each (source: wikipedia)

Snapshot and Serialisable are the same in terms of preventing read-phenomena, but it’s essentially the difference in underlying implementation and performance of the system.

Rate this post

Leave a Reply