MVCC是目前大部分主流数据库实现并发控制的方法,本篇主要介绍一下数据库解决并发控制问题的发展历史,其中会涉及到
Two Phase Locking (2PL)
Basic Timestamp Ordering (Basic T/O)
Optimistic Currency Control (OCC)
Multi Version Two-phase Locking (MV2PL)
、Multi Version Timestamp Ordering (MVTO)
和Multi Version Optimistic Concurrency Control (MVOCC)
两个事务并发,什么情况下会冲突?
需要满足以下条件:
T想读X,U想写X,并且T开始早于U,理论上T应该读取修改前的X,但是按照上图的顺序,T会读取修改后的X,这是就发生了读写冲突,T应该回滚。
T想写X,U想读X,并且T开始早于U,理论上U应该读取修改后的X,但是按照上图的顺序,U读取了修改前的X,这就发生了写读冲突,T应该回滚。
U想写X,T想读X,并且U开始早于T,U写完后发生了回滚操作,理论上T应该读到修改前的X,但是按照上图的顺序,T读取了修改后的X,这就发生了脏读,T应该要等到U commit后才能进行读操作。
START(T) ... START(U) ... wU(X) ... wT(X)
T和U都想写X,并且T开始早于U,理论上应该是U的写操作会覆盖T的写操作,但是按照上图的顺序,T覆盖了U,这就发生了写写覆盖冲突,T应该要回滚。
Concurrency Control就是为了来解决并发事务之间的冲突。
A DBMS’ concurrency control protocol to allow transactions to access a database in a multi-programmed fashion while preserving the illusion that each of them is executing alone on a dedicated system. The goal is to have the effect of a group of transactions on the database’s state is equivalent to any serial execution of all transactions.
DBMS并发控制协议可以使得不同的事务可以并发的访问DBMS,同时让不同的事务感觉仿佛只有自己一个事务在运行。也就是说并发控制需要达到这样的目标:不同事务并发的执行结果,和这些事务串行的执行的结果是等价的。
实现并发控制有两种办法:
- Two-Phase Locking (Pessimistic): Assume transactions will conflict so they must acquire locks on database objects before they are allowed to access them.
- Timestamp Ordering (Optimistic): Assume that conflicts are rare so transactions do not need to first acquire locks on database objects and instead check for conflicts at commit time.
Two-Phase Locking是一种悲观锁模式,假设事务冲突比较严重,因此在访问数据前需要先获得锁。
Timestamp Ordering是一种乐观锁模式,假设事务冲突比较少,因此在访问数据前不需要获得锁,而是在commit的时候检查冲突。
2PL是指事务的执行可以分为两个阶段:生长阶段Growing Phase(加锁阶段)和衰退阶段Shrinking Phase(解锁阶段):
可以证明,若并发执行的所有事务均遵守两段锁协议,则对这些事务的任何并发调度策略都是可串行化的。
需要注意的是2PL可能会有死锁问题,需要有死锁检测和防范机制。
因为2PL对所有读写的数据都进行了加锁,因此不同事物之间的读写冲突都可以避免,也正是由于对所有读写的数据都进行了加锁,并发访问冲突比较多的时候,由于锁竞争,会导致事物执行效率大大降低。
Basic T/O使用Timestamp来决定事务的先后顺序。
- Every transaction is assigned a unique timestamp when they arrive in the system.
- The DBMS maintains separate timestamps in each tuple’s header of the last transaction that read that tuple or wrote to it.
- Each transaction check for conflicts on each read/write by comparing their timestamp with the timestamp of the tuple they are accessing.
- The DBMS needs copy a tuple into the transaction’s private workspace when reading a tuple to ensure repeatable reads
Basic T/O协议主要特点:
Basic T/O需要对每条记录额外存储以下几个信息:
read流程:
If TS < W-ts(x) then
reject read request and abort corresponding transaction
else
execute transaction
Set R-ts(x) to max{R-ts(x), TS}
write流程:
If TS < R-ts(x) or TS < W-ts(x) then
reject write request and abort corresponding transaction
else
execute transaction
Set W-ts(x) to TS.
Basic T/O如何解决读写冲突:
TS < W-ts(x)
,读取事务失败回滚TS < R-ts(x)
,写入事务失败回滚TS < w-ts(x)
,后写入的事务失败回滚可以发现Basic T/O可以成功解决读写、写读和写写覆盖这些并发冲突问题,但是由于commit前数据的修改是直接在DBMS记录上做的修改,因此无法解决事务回滚导致的冲突,例如脏读问题。
OCC会把所有修改先缓存到本地private空间,然后在commit的时候检测冲突并merge,因此可以成功解决Basic T/O无法处理事务回滚引起冲突的问题。
Three Phases:
- Read Phase: Transaction’s copy tuples accessed to private work space to ensure repeatable reads,and keep track of read/write sets.
- Validation Phase: When the transaction invokes COMMIT,the DBMS checks if it conflicts with other transactions. Parallel validation means that each transaction must check the read/write set of other transactions that are trying to validate at the same time. Each transaction has to acquire locks for its write set records in some global order. Original OCC uses serial validation.
- Backward Validation: Check whether the committing transaction intersects its read/write sets with those of any transactions that have already committed.
- Forward Validation: Check whether the committing transaction intersects its read/write sets with any active transactions that have not yet committed.
- Write Phase: The DBMS propagates the changes in the transactions write set to the database and makes them visible to other transactions’ items. As each record is updated,the transaction releases the lock acquired during the Validation Phase
OCC事务提交分三个阶段:
先介绍一下Validation:
当事务commit的时候,DBMS会该事务和其他事务是否冲突,有两种检测方法:
Backword Validation会检查当前事务读写的数据集是否和已提交的事务冲突。
Check whether the committing txn intersects its read/write sets with those of any txns that have already committed.
OCC需要对每条记录额外存储以下几个信息:
read流程:
If private workspace does not contain(x)
copy x from DBMS to private workspace
return x from private workspace
write流程:
If private workspace does not contain(x)
copy x from DBMS to private workspace
update x in private workspace
commit流程:
TS = generate next timestamp
if(validate success)
execute transaction
apply modification from private workspace to DBMS
else
reject and abort corresponding transaction
回滚流程:
clear private workspace
OCC如何解决读写冲突:
OCC成功解决了所有读写冲突的问题,但是每次读取数据都需要把数据拷贝到private空间,导致读取效率比较低。
MVCC使用Multi Version
的方法,即对一个logical tuple
存储多个版本的physical tuple
,可以解决OCC读取时需要把数据拷贝到private空间的问题。
MVCC is currently the best approach for supporting transactions in mixed workloads. The DBMS maintains multiple physical versions of an object of a single logical object in the database. When a transaction writes to an object, the DBMS, creates a new version of that object. When a transaction reads an object, it reads the newest version that existed when a transaction started.
MVCC目前已经被广泛使用到DBMS中,DBMS保持一个逻辑数据的多个物理版本,当事务写入时创建一个新的物理版本,当事务读取时,返回最新的物理版本。
MVCC主要的优点:
Multi Version
不仅仅可以和OCC结合,也可以和2PC、Basic T/O结合,因此产生了:
MVTO是在Basic T/O的基础上加上Mulit Version的功能。
MVTO需要对每条记录额外存储以下几个信息:
MVTO的特点:
事务start流程:
Tstart = genreate next timestamp
read流程:
if (find Ax satisfy begin-ts(Ax) <= Tstart < end-ts(Ax))
if (txn-id(Ax) == 0)
execute transaction
set read-ts(Ax) to max{read-ts(Ax), Tstart}
else
reject read request and abort corresponding transaction
else
return A does not exist
write流程:
if (find Bx satisfy end-ts(Bx) == INF)
if (txn-id(Bx) == 0 and Tstart > read-ts(Bx))
execute transaction
txn-id(Bx) = Tstart
new Bx+1
txn-id(Bx+1) = Tstart
read-ts(Bx+1) = 0
set read-ts(Ax) to max{read-ts(Ax), Tstart}
add Bx+1 to WriteSet
else
reject write request and abort corresponding transaction
else
return B does not exist
commit流程
for Ix+1 in WriteSet
begin-ts(Ix+1) = Tstart
end-ts(Ix+1) = INF
end-ts(Ix) = Tstart
txn-id(Ix) = 0
txn-id(Ix+1) = 0
MVTO如何解决读写冲突:
MVOCC是在OCC的基础上加上Mulit Version的功能。
MVOCC需要对每条记录额外存储以下几个信息:
MVOCC事务提交分三个阶段:
begin-ts
和end-ts
,返回该数据最新的已commit的版本事务commit timestamp
,决定了事务的serialization orderbegin-ts
设置为事务commit timestamp
,end-ts
设置为INF
一个事务修改的数据只有commit后,才能被另外一个事务看到。如果一个事务读取了过时的数据,在该事务提交后执行验证阶段会检测出来,如果冲突则回滚。
事务start流程:
Tstart = generate next timestamp
read流程:
find Ax satisfy begin-ts(Ax) <= Tstart < end-ts(Ax)
if txn-id(Ax) == 0
execute transaction
add Ax to Read Set
return Ax
else
reject read request and abort corresponding transaction
write流程:
find Bx satisfy end-ts(Bx) == INF
if txn-id(Bx) == 0
set txn-id(Bx) = Tstart
create Bx+1
set txn-id(Bx+1) = Tstart
add Bx to Write Set
else
reject write request and abort corresponding transaction
commit流程:
Tcommit = generate next timestamp
for all Rx in Read Set
if Rx is updated in committed transaction
reject and abort corresponding transaction
for all Wx in Write Set
set begin-ts(Wx+1) = Tcommit
set end-ts(Wx+1) = INF
set end-ts(Wx) = Tcommit
set txn-id(Wx) = 0
set txt-id(Wx+1) = 0
abort流程:
for all Wx in Write Set
delete Wx+1
set txn-id(Wx) = 0
MVOCC如何解决读写冲突:
MVOCC是在2PC的基础上加上Mulit Version的功能。
MV2PL需要对每条记录额外存储以下几个信息:
事务start流程:
Tstart = generate next timestamp
read流程:
find Ax satisfy begin-ts(Ax) <= Tstart < end-ts(Ax)
if txn-id(Ax) == 0
read-cnt(Ax) += 1
execute transaction
return Ax
else
reject read request and abort corresponding transaction
write流程:
find Bx satisfy end-ts(Bx) == INF
if txn-id(Bx) == 0 && read-cnt(Bx) == 0
set txn-id(Bx) = Tstart
create Bx+1
set txn-id(Bx+1) = Tstart
add Bx to Write Set
else
reject write request and abort corresponding transaction
commit流程:
Tcommit = generate next timestamp
for all Wx in Write Set
set begin-ts(Wx+1) = Tcommit
set end-ts(Wx+1) = INF
set end-ts(Wx) = Tcommit
set txn-id(Wx) = 0
set txt-id(Wx+1) = 0
for all Rx in Read Set
set read-cnt(Rx) -= 1
abort流程:
for all Wx in Write Set
delete Wx+1
set txn-id(Wx) = 0
for all Rx in Read Set
set read-cnt(Rx) -= 1
MVTO如何解决读写冲突:
算法 | 序列化时间戳 | 读写冲突 | 写读冲突 | 脏读 | 写写覆盖 |
---|---|---|---|---|---|
2PL | start-ts | ok | ok | ok | ok |
Basic T/O | start-ts | 写成功读失败 | 读成功写失败 | 有冲突 | 先写成功后写失败 |
OCC | commit-ts | 没有冲突 | 写成功读失败 | 没有冲突 | 没有冲突 |
MV-TO | start-ts | 没有冲突 | 读成功写失败 | 没有冲突 | 先写成功后写失败 |
MV-OCC | commit-ts | 没有冲突 | 写成功读失败 | 没有冲突 | 先写成功后写失败 |
MV-2PL | commit-ts | 抢到锁的成功 | 抢到锁的成功 | 没有冲突 | 抢到锁的成功 |
数据库 | 并发控制协议 |
---|---|
Oracle | MV2PL |
Postgres | MV-2PL/MV-TO |
HYRISE | MV-OCC |
Hekaton | MV-OCC |
MemSQL | MV-OCC |
SAP HANA | MV-2PL |
NuoDB | MV-2PL |
HyPer | MV-OCC |