• A distributed database is a database where data is stored across different physical locations. In the system, there are loosely coupled sites that share no physical components.

    • We may store the data by using fragmentation and replication. Each fragment is stored across sites and the system maintains replicas of each fragment.
    • a homogeneous distributed database is where database is stored identically.
      • All systems use identical DBMSs and OS’s
      • Data can be accessed and modified simultaneously on several databases.
      • From the user’s perspective, the database is no different to a single non-distributed database.
    • A heterogeneous distributed database is where different sites use different schemas or software.
      • Each site uses different DBMS software.
      • Coordination is done through generic connectivity, which is usually limited.
  • A Distributed Database Management System is a centralized software system that manages a distributed database in a manner as if it were stored in a single location

    • The DDBMS should facilitate distributed transactions through the following:
      • Each site has a local transaction manager which controls and coordinates the execution of transactions.
      • Each site has a transaction coordinator which is responsible for
        • Starting the execution of transactions that originate at its site
        • Distributing transactions at appropriate sites
        • Coordinating the termination of each transaction
  • Data Transparency refers to the degree in which a system user remains unaware of the details of how and where data are stored in the distributed system.

Fragmentation

  • Fragmentation - pertains to partitioning a relation into several fragments. These fragments may then be stored in different sites (i.e., a distributed system)

  • Horizontal Fragmentation - is a variation of fragmentation wherein each tuple in a relation is assigned to one of more fragments. It allows for:

    • Parallelization on fragments of a relation
    • The relation can be split so that tuples are closer to where they are frequently accessed.
  • Vertical Fragmentation - a subtype of fragmentation wherein the schema for a relation is split into smaller schemas such that

    • All schemas contain a common candidate key
    • Optionally, a tuple id attribute is added to each schema to serve as a candidate key.
    • This has the following consequences
      • Allows a tuple to be split so that each part of the tuple is closer to where it is frequently accessed. Allows a tuple to be split so that each part of the tuple is closer to where it is frequently accessed.
      • The tuple-id allows joins across vertical fragments.
      • Parallelization for relations

Replication

  • Replication - a technique in managing distributed databases using specialized software that looks for changes in the database.

    • Once changes have been identified, the replication process makes all the databases across all sites look the same.
    • This may be done on the relation level or on the database level. In either case, we say this group of data is replicated.
  • Full Replication - a subtype of replication wherein an entire relation is stored across all sites in a distributed database.

    • A fully-redundant database storage is one where the entire database is replicated across all sites in a distributed database

Transactions

Replication

  • A technique using specialized software that looks for changes in the database.
  • Once changes have been identified, the replication process makes all the databases across all sites look the same.
  • Advantages
    • More robust system
    • Better data availability
    • Easier parallelization on by distributing transactions across several nodes
    • Reduced data transfer, especially when data is stored close to where it is used.
  • Disadvantages
    • Increased cost for CRUD operations since we must update all sites
    • Increased complexity in terms of concurrency control which can affect data consistency.

Two Phase Commit

  • Two Phase Commit is an atomic commit protocol that coordinates all the processes that participate in a distributed transaction.

  • Assume:

    • One node is a designated coordinator. The remaining nodes in the distributed system are participants
    • There is stable storage at each node with a write-ahead log
    • No node crashes forever; data is never lost or corrupted in a crash.
    • Nodes can communicate to each other.
  • The protocol is initiated by the coordinator after the last step of the transaction is reached.

  • Participants then respond with an agreement or an abort message depending on if the transaction has been processed successfully.

Commit Request

  • Coordinator sends a query to commit message to all participants, and waits until it has received a reply from all participants.
  • Participants execute the transaction up to the point where it will be committed. They then write an entry to their undo and redo logs.
  • Participants reply with a vote (either Yes to Commit or No to Commit) depending on if the participant performed the actions successfully or if a failure happened.

Commit Phase

  • On a success

    • The coordinator sends the commit message to all participants.
    • Each participant completes the operation, and releases any resources held during the transaction.
    • Send an acknowledgement to the coordinator.
    • The coordinator completes the transaction when all acknowledgements are received
  • On a failure of any kind from the participants.

    • Send a rollback message to all participants.
    • All participants undo the transaction and release resources held during the transaction.
    • All participants send an acknowledgement.
    • The coordinator undoes the transaction when all acknowledgements are received.
  • On a failure of any kind from the coordinator.

    • Look at the transaction logs to determine the fate of the transactions.
    • Treat the transaction as a success if all participants sent a yes-to-commit, then handle the success.
    • Treat the transaction as a failure if at least one participants sent a no-to-commit, then handle the failure as normal.
  • Note that this introduces a blocking problem-all participants must wait for the coordinator to finish.

Links