tldr: Aries
Part of the tldr series, which summarizes papers I'm currently reading. Inspired by the morning paper.
*With Concurrency Control and Recovery, by Franklin and an example from http://db.csail.mit.edu/6.830/notes.php
ARIES is a protocol for DBMS recovery based on WAL. (For a review of the basics of WAL, you can take a look here.) This blog post will go over ARIES with an example and explanations along the way.
Example
This is the example we'll be following:
Assume the following is the log right before the crash:
tldr: ARIES - A Transaction Recovery Method [1992]
Mohan et. al*With Concurrency Control and Recovery, by Franklin and an example from http://db.csail.mit.edu/6.830/notes.php
ARIES is a protocol for DBMS recovery based on WAL. (For a review of the basics of WAL, you can take a look here.) This blog post will go over ARIES with an example and explanations along the way.
Example
This is the example we'll be following:
CP = checkpoint, Flush = flush to disk, W = write
SOT = start of transaction, EOT = end of transaction, UP = update
(In this example, each data item is on a different page. Treat them synonymously for simplicity.)
Data Structures
Transaction Table
The transaction table stores data about ongoing transactions. One of the pieces of data is lastLSN, which is the LSN of the most recent log record that each transaction wrote.
Log records that are part of the same transaction are linked backwards using a prevLSN field. Whenever a new log record is written for a transaction, the new record's prevLSN is set to the lastLSN entry from the transaction table, and the new lastLSN is set to the new record's LSN.
Example
Right before the example system crashed, the transaction table is:
This is because transaction 3 is the only ongoing transaction at the time of the crash.
Example
Right before the example system crashed, the transaction table is:
This is because transaction 3 is the only ongoing transaction at the time of the crash.
Dirty Page Table
The dirty page table stores data for each dirty page (e.g. a page with updates not yet updated on disk). One of the pieces of data is recoveryLSN, which is the LSN of the first log record that dirtied the page. In other words, recoveryLSN is the LSN of the earliest log record that might need to be REDOne.
Example
Right before the example system crashed, the dirty page table is:
Example
Right before the example system crashed, the dirty page table is:
The recovery LSNs are the LSNs of the earliest log record that dirtied each page after the flush.
Checkpoint Table
The checkpoint table stores the state of the transaction table and the dirty page table at the moment of the checkpoint.
Algorithm
Analysis Pass
In the first pass, ARIES processes the log from the most recent checkpoint onwards. This pass achieves 3 things:
- Figures out where in the log to start the REDO pass
- Figures out which pages were dirty when the crash happened
- Figure out which transactions hadn't committed when the crash happened
- These transactions will have to be UNDOne
Specifically, this pass reconstructs the transaction table and the dirty page table. It starts with the tables that were logged at the checkpoint. Then, it updates the tables based on the log records scanned:
- If the pass scans a log record for a transaction that isn't in the transaction table yet, the transaction is added to the transaction table
- If the pass scans a log record that commits or aborts a transaction, the transaction is removed from the transaction table
- If the pass scans a log record that updates a page not yet in the dirty page table, the LSN of that record is recorded as the recoveryLSN for that page
The dirty page table ends up being a conservative estimate of which pages are dirty, since some pages could have been flushed to disk.
The next pass of ARIES will start at the earliest recoveryLSN of the entries in the dirty page table, also called firstLSN.
Example
In the example system, the most recent checkpoint is at LSN 5, so that's where the first pass starts:
Example
In the example system, the most recent checkpoint is at LSN 5, so that's where the first pass starts:
At this point, the only ongoing transaction is transaction 1, which most recently updated B at LSN 3:
Transaction 1 also updated A at LSN 2. Both of the updates to A and B haven't been flushed to disk yet:
Finally, on disk, the pageLSN (the LSN of the log record for the latest update) is stored for each item.
If the pass scans a log record for a transaction that isn't in the transaction table yet, the transaction is added to the transaction table. For example, Transaction 3 (LSN 5) isn't in the transaction table yet:
If the pass scans a log record that updates a page not yet in the dirty page table, the LSN of that record is recorded as the recoveryLSN for that page. For example, LSN 6 updates C, which isn't yet in the page table, so it's appended:
By the time ARIES finishes the first pass, the dirty page table looks like:
Note that this dirty page table includes at least everything (and a lot more!) than the actual dirty page table above right before the crash.
Finally, note the only "loser" (uncommitted before crash) transaction remaining in the transaction table:
REDO Pass
In the second pass, ARIES uses a repeating history model for its REDO, which means it redoes updates for all transactions, even if those transactions will eventually be UNDOne. This pass restores the database to the state it was in right before the crash happened.
Specifically, ARIES starts scanning forward at the firstLSN (as defined earlier). It figures out if a log record should be REDOne:
- If the page is not in the dirty page table, the update doesn't need to be REDOne.
- If the page is in the dirty page table,
- If the recoveryLSN is greater than the LSN of the record being checked, the update doesn't need to be REDOne.
- Else if the pageLSN is greater than or equal to the LSN of the record being checked, the update doesn't need to be REDOne.
- If those two cases aren't true, the update must be REDOne.
To REDO an update, the logged action is re-applied and the pageLSN is set to the LSN of the redone log record. No logging is done.
Example
As seen from the last version of the dirty page table, the firstLSN is 2.
Example
As seen from the last version of the dirty page table, the firstLSN is 2.
UNDO pass
In the third pass, ARIES processes the log backwards from the end of the log, removing all effects of transactions that hadn't committed yet before the crash. UNDO is unconditional, which means that the pageLSN of the affected page is not checked because the undo is always applied.
Once each UNDO is applied, it is logged using a Compensation Log Record (CLR). One piece of data that the CLR contains is UndoNxtLSN, which is set to the value of the prevLSN of the log record being undone. This field is the LSN of the next log record to be undone. CLRs help ARIES decrease the amount of logging it has to do.