24.4 Question: Like database systems, workflow systems also require concurrency and recovery management. List three reasons why we cannot simply apply a relational database system using 2PL, physical undo logging, and 2PC.
24.4 Answer:
a) The tasks in a workflow have dependencies based on their status. For example, the start of a task may be conditional on the outcome (such as commit or abort) of some other task. The tasks therefore cannot all execute independently and concurrently, with 2PC used just for atomic commit.
b) Once a task completes, it must expose its updates so that other tasks running on the same processing entity do not have to wait long. 2PL is too strict a form of concurrency control and is not appropriate for workflows.
c) Workflows have their own consistency requirements, namely failure atomicity: an execution of a workflow must finish in an acceptable termination state. Because of this, and because of the early exposure of uncommitted updates, the recovery procedure is quite different: some form of logical logging and compensating transactions must be used. Also, to perform forward recovery of a failed workflow, the recovery routines must restore the state information of the scheduler and tasks, not just the updated data items. Thus simple WAL cannot be used.
24.6 Question: Consider a main-memory database system recovering from a system crash. Explain the relative merits of
· Loading the entire database back into main memory before resuming transaction processing
· Loading data as it is requested by transactions
24.6 Answer:
- Loading the entire database into memory in advance guarantees transactions that need high-speed or real-time data access that, once they start, they will not have to wait for disk accesses to fetch data. However, no transaction can run until the entire database has been loaded.
- The advantage of loading on demand is that transaction processing can start right away; however, transactions may see long and unpredictable delays for disk accesses until the entire database has been loaded into memory.
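The contrast between the two strategies can be sketched with a toy buffer manager; the page store, page count, and class names here are illustrative, not from any real DBMS:

```python
# Hypothetical "disk": a mapping from page ids to page contents.
DISK = {page_id: f"data-{page_id}" for page_id in range(1000)}

class OnDemandBuffer:
    """Load pages into memory only when a transaction first touches them."""
    def __init__(self):
        self.memory = {}

    def read(self, page_id):
        if page_id not in self.memory:           # miss: pay a disk access now
            self.memory[page_id] = DISK[page_id]
        return self.memory[page_id]

class PreloadedBuffer:
    """Load the entire database before any transaction is admitted."""
    def __init__(self):
        self.memory = dict(DISK)                 # all disk I/O happens up front

    def read(self, page_id):
        return self.memory[page_id]              # guaranteed hit, no I/O delay

buf = OnDemandBuffer()
buf.read(42)                                     # first access: disk fetch
assert 42 in buf.memory and 7 not in buf.memory  # only requested pages loaded
```

With `OnDemandBuffer`, the first transaction can run immediately but pays unpredictable miss penalties; with `PreloadedBuffer`, no transaction runs until the full copy loop finishes, after which every access is a memory hit.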
24.8 Question: Is a high-performance transaction system necessarily a real-time system? Why or why not?
24.8 Answer: A high-performance system is not necessarily a real-time system. In a high-performance system, the main aim is to execute each transaction as quickly as possible, through more resources and better utilization; average speed and response time are the quantities to optimize. In a real-time system, speed is not the central issue: each transaction has a deadline, and the critical issue is ensuring that it finishes within its deadline, or overruns it by as little as possible.
24.10 Question: Explain why it may be impractical to require serializability for long-duration transactions.
24.10 Answer: In the presence of long-duration transactions, trying to ensure serializability has several drawbacks:
a) With a waiting scheme for concurrency control, long-duration transactions force long waiting times: response time will be high and concurrency low, so throughput will suffer. The probability of deadlock also increases.
b) With a timestamp-based scheme, a lot of the work done by a long-running transaction is wasted if it has to abort.
c) Long-duration transactions are usually interactive in nature, and it is very difficult to enforce serializability in the presence of interactivity. Thus the serializability requirement is impractical, and some other notion of database consistency must be used to support long-duration transactions.
24.11 Question: Consider a multithreaded process that delivers messages from a durable queue of persistent messages. Different threads may run concurrently, attempting to deliver different messages. In case of delivery failure, the message must be restored to the queue. Model the actions that each thread carries out as a multilevel transaction, so that locks on the queue need not be held until the message is delivered.
24.11 Answer: Each thread can be modeled as a transaction T that takes a message from the queue and delivers it. We can write T as a multilevel transaction with subtransactions T1 and T2: T1 removes a message from the queue, and T2 delivers it. Each subtransaction releases its locks once it completes, allowing other transactions to access the queue. If T2 fails to deliver the message, T1 is undone by invoking a compensating transaction that restores the message to the queue.
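A minimal sketch of this multilevel structure, using a Python `queue.Queue` as a stand-in for the durable message queue (durability and logging are elided, and the `deliver` function and its failure condition are hypothetical):

```python
import queue
import threading

message_queue = queue.Queue()

def deliver(message):
    """Hypothetical delivery step; raises on failure."""
    if message.endswith("!"):
        raise RuntimeError("delivery failed")

def deliver_one():
    # Subtransaction T1: remove a message from the queue. The queue's
    # internal lock is released as soon as get() returns, so other
    # threads can access the queue while delivery is still in progress.
    message = message_queue.get()
    try:
        # Subtransaction T2: deliver the message (no queue lock held).
        deliver(message)
    except RuntimeError:
        # Compensating transaction for T1: restore the message.
        message_queue.put(message)

message_queue.put("hello")
message_queue.put("fail!")
threads = [threading.Thread(target=deliver_one) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# The failed message is back on the queue; the delivered one is gone.
```

Note that the compensation is logical (re-enqueue the message), not a physical undo: the message may land at a different queue position than it originally occupied.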
24.12 Question: Discuss the modifications that need to be made in each of the recovery schemes covered in Chapter on Recovery System, if we allow nested transactions. Also, explain any differences that result if we allow multilevel transactions.
24.12 Answer:
- The advanced recovery algorithm (Section 17.9):
The redo pass, which repeats history, is the same as before. We discuss below how the undo pass is handled.
Recovery with nested transactions:
Each subtransaction needs a unique TID, because a failed subtransaction may have to be independently rolled back and restarted. If a subtransaction fails, the recovery actions depend on whether the unfinished upper-level transaction should be aborted or continued. If it is to be aborted, all finished and unfinished subtransactions are undone by a backward scan of the log (this is possible because locks on modified data items are not released when a subtransaction finishes). If the nested transaction is to be continued, just the failed subtransaction is undone, and then the upper-level transaction continues.
In the case of a system failure, depending on the application, either the entire nested transaction may need to be aborted, or (e.g., in the case of long-duration transactions) incomplete subtransactions may be aborted and the nested transaction resumed. If the nested transaction must be aborted, the rollback can be done in the usual manner by the recovery algorithm, during the undo pass. If the nested transaction must be restarted, any incomplete subtransactions that need to be rolled back can be rolled back as above. To restart the nested transaction, state information about the transaction, such as locks held and execution state, must have been noted on the log, and must be restored during recovery. Mini-batch transactions (discussed in Section 21.2.7) are an example of nested transactions that must be restarted.
Recovery with multi-level transactions:
In addition to what is done in the previous case, we have to handle the problems caused by the exposure of updates performed by committed subtransactions of incomplete upper-level transactions. A committed subtransaction may have released the locks it held, so the compensating transaction has to reacquire them. This is straightforward in the case of a transaction failure, but more complicated in the case of a system failure.
The problem is that a lower-level subtransaction a of a higher-level transaction A may have released locks that have to be reacquired to compensate A during recovery. Unfortunately, some other lower-level subtransaction b of a higher-level transaction B may have started and acquired the locks released by a before A ended. Thus undo records for b may precede the operation-commit record for A. If b had not finished at the time of the system failure, it must first be rolled back and its locks released, to allow the compensating transaction of A to reacquire the locks.
This complicates the undo pass; it can no longer be done in one backward scan of the log.
- Recovery in a shadow paging scheme: -
In a shadow-paging-based scheme, the implementation becomes very complicated if the subtransactions are to execute concurrently. If they execute serially, the current page table is copied to the shadow page table at the end of every subtransaction. The general idea of recovery is then similar to the logging-based scheme, except that undoing and redoing become much easier, as in Section 17.5.
24.14 Question: Consider a multidatabase system in which it is guaranteed that at most one global transaction is active at any time, and every local site ensures local serializability.
a) Suggest ways in which the multidatabase system can ensure that there is at most one active global transaction at any time.
b) Show by example that it is possible for a non-serializable global schedule to result despite the assumptions.
24.14 Answer:
a) We can have a special data item at some site, on which a lock must be obtained before starting a global transaction and released only after the transaction completes. This ensures the single-active-global-transaction requirement. To reduce dependency on that particular site being up, we can generalize the solution by using an election scheme to choose one of the currently up sites to be the coordinator, and requiring that the lock be requested on the special data item residing at the currently elected coordinator.
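A minimal single-process sketch of the special-data-item idea, with a mutex at the coordinator playing the role of the lockable data item (the class and transaction bodies are illustrative, and election and distribution are elided):

```python
import threading

class Coordinator:
    def __init__(self):
        # The special data item: holding this lock = being the one
        # active global transaction.
        self._global_txn_lock = threading.Lock()

    def run_global_transaction(self, body):
        with self._global_txn_lock:   # at most one global txn active
            body()

coord = Coordinator()
log = []

def make_txn(name):
    def body():
        log.append(f"{name}-start")
        log.append(f"{name}-end")
    return body

threads = [threading.Thread(target=coord.run_global_transaction,
                            args=(make_txn(f"TG{i}"),)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Because the lock serializes the bodies, each transaction's
# start/end pair appears contiguously in the log.
assert log[0].rsplit("-", 1)[0] == log[1].rsplit("-", 1)[0]
```

In a real multidatabase, the "lock" would be an ordinary data item locked through the coordinator site's local concurrency control, so no new mechanism is needed at the local sites.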
b) The following schedule involves two sites and four transactions. T1 and T2 are local transactions, running at site 1 and site 2 respectively. TG1 and TG2 are global transactions running at both sites. X1, Y1 are data items at site 1, and X2, Y2 are at site 2.
T1          T2          TG1         TG2
write(Y1)
                        read(Y1)
                        write(X2)
            read(X2)
            write(Y2)
                                    read(Y2)
                                    write(X1)
read(X1)
In this schedule, TG2 starts only after TG1 finishes. Within each site there is local serializability: at site 1, TG2 → T1 → TG1 is a serialization order; at site 2, TG1 → T2 → TG2. Yet the global schedule is non-serializable, since the two local orders place TG1 and TG2 in opposite orders.
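The non-serializability can be checked mechanically by building the global precedence graph from the conflicting operations in the schedule and searching for a cycle; each edge below corresponds to one write-before-read conflict shown above:

```python
# Global precedence graph of the schedule (edge u -> v means u must
# precede v in any equivalent serial order).
edges = {
    "T1":  {"TG1"},   # T1 writes Y1 before TG1 reads it  (site 1)
    "TG1": {"T2"},    # TG1 writes X2 before T2 reads it  (site 2)
    "T2":  {"TG2"},   # T2 writes Y2 before TG2 reads it  (site 2)
    "TG2": {"T1"},    # TG2 writes X1 before T1 reads it  (site 1)
}

def has_cycle(graph):
    """Depth-first search; a back edge to a gray node means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for succ in graph.get(node, ()):
            if color[succ] == GRAY:
                return True                  # back edge: cycle found
            if color[succ] == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(visit(n) for n in graph if color[n] == WHITE)

assert has_cycle(edges)   # TG1 -> T2 -> TG2 -> T1 -> TG1
```

The cycle TG1 → T2 → TG2 → T1 → TG1 is exactly why no serial order exists even though each site's subgraph is acyclic.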
24.15 Question: Consider a multidatabase system in which every local site ensures local serializability, and all global transactions are read-only.
a) Show by example that non-serializable execution may result in such a system.
b) Show how you could use a ticket scheme to ensure global serializability.
24.15 Answer:
a) The same system as in the answer to Exercise 24.14 is assumed, except that now both the global transactions are read-only. Consider the schedule given below.
T1          T2          TG1         TG2
                        read(X1)
write(X1)
                                    read(X1)
                                    read(X2)
            write(X2)
                        read(X2)
Though each site's local schedule is serializable (at site 1, TG1 → T1 → TG2; at site 2, TG2 → T2 → TG1), these two local orders form a cycle, so the global schedule is not serializable.
b) Since local serializability is guaranteed, any cycle in the system-wide precedence graph must involve at least two different sites and two different global transactions. The ticket scheme ensures that whenever two global transactions access data at a site, they conflict on a data item (the ticket) at that site. The global transaction manager controls ticket access so that the global transactions execute with the same serialization order at all sites. This eliminates the possibility of their participating in a cycle in the system-wide precedence graph.
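A sketch of the ticket idea: before touching a site's data, a global transaction reads and increments that site's ticket, so any two global transactions necessarily conflict there, and issuing the ticket operations in the same order at every site fixes one global serialization order. The `Site` class and transaction names are illustrative:

```python
class Site:
    def __init__(self, name):
        self.name = name
        self.ticket = 0
        self.history = []     # order in which globals took the ticket

    def take_ticket(self, txn):
        value = self.ticket   # read-then-write: a guaranteed conflict
        self.ticket = value + 1
        self.history.append(txn)
        return value

sites = [Site("site1"), Site("site2")]

def run_global(txn, sites):
    # The global transaction manager issues the ticket operations in
    # the same order at every site, fixing one serialization order
    # before any real data is accessed.
    return {s.name: s.take_ticket(txn) for s in sites}

t1 = run_global("TG1", sites)
t2 = run_global("TG2", sites)
# TG1 got the smaller ticket at every site, so TG1 -> TG2 everywhere;
# no site can order the two global transactions the other way.
assert all(t1[s.name] < t2[s.name] for s in sites)
```

In the non-serializable schedule of part (a), TG1 and TG2 never conflicted at either site; with tickets, the conflict TG1 → TG2 (or TG2 → TG1) is forced identically at both sites, closing that loophole.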