Schemas

  • Data Domain - a usually named set of atomic values - each value is indivisible. We provide a description of the domain using data types or formats.

  • Data Definition Language - a computer language that is used to create and modify the structure of database objects, especially data schemas.

    • Typically includes CREATE, ALTER, and DROP applicable for tables.
  • Data Manipulation Language - computer programming language used for operating on data within a database.

  • A schema is a description of a particular collection of data using a given data model .

  • Database View - a subset of a database based on a query that runs on one or more tables .

  • A relational database schema is defined as a set of Relation schemas and a set of integrity constraints where

  • A relational database instance is a set of relation instances which satisfy the relational database schema.

    • We may think of this as a snapshot of the state of a particular database at any given time.

Relations

  • A relation is a set of tuples . Where and is the data domain.

  • Informally, we may think of relations as some logical structure with columns and rows such that:

    • The columns represent attributes
    • The allowed values for each attribute represent the data domain
    • The rows represent a tuple
  • A Relation Schema is a description of a relation. More formally it is notated as

    where is the name of the relation, and are the set of attributes of the relations.

Basic Properties

  • Each relation name in a relational database must be distinct
  • The attribute names must be distinct from any attribute names of the same relation.
  • The order of the attributes specified in is not significant.
  • Each tuple in a relation is distinct. No duplicate tuples exist.
  • The order of tuples is insignificant.
  • Each cell in the relation table must contain exactly one value (Atomicity).
  • Each of refers to an attribute of the relation.

Keys

  • A key is a minimal superkey

  • A superkey is a proper subset of attributes for which no tuples may have the same combination of values.

  • A candidate key is any key. It is any set of attributes chosen from the relation.

  • A primary key is a key chosen to act as the means to which to identify tuples in a relation. We prefer this to be as small as possible.

  • A foreign key of relation is a set of its attributes intending to be used (by each tuple in ) for identifying or referring to a tuple in some relation .

    is the referencing relation. is the referenced relation.

    • The set of attributes of forming the foreign key should correspond to some super key of (typically its primary key )

Multiplicity

  • The Degree refers to the number of attributes in a relation (i.e., number of columns)
  • The Cardinality denotes how many tuples of the referenced relation can correspond to a tuple in the referencing relation .
  • The Participation denotes whether or not some tuple of the referencing relation should correspond to a tuple in the referenced relation .
  • The Multiplicity of entity relationships is a pair which specifies the participation and the cardinality given two relations and related by a foreign key

Relational Algebra

  • The relational algebra is an approach that defines fundamental operations that manipulate and operate on tuples in a relation.

  • They are defined as follows.

    Let be relations be a set of attributes in the relation

    • Select - denoted . We choose a subset of tuples in that satisfy the predicate. The predicate acts as a filter based on some requirements.
    • Projection - denoted . We generate relations that contain only the specified attributes allowing for rearranging their orderings and manipulate their values.
    • Union - denoted . It is the relation that contains all the tuples that appear in at least one of the relations.
    • Intersection - denoted . It is the relation that contains all the tuples that appear in both of the relations.
    • Differnce - denoted . It is the relation that contains all the tuples that appear in but not in
    • Product - denoted . It is the relation that contains all possible combinations of tuples from the input relations.
    • Join - denoted . It is the relation that contains all tuples that are a combination of two tuples (one from each of R and S) with a common value for one or more attributes.

Constraints

  • Data Integrity - maintenance and assurance of accuracy and consistency of data within the database.

  • An integrity constraint specifies permissible values inside the database. It must hold for all valid relation instances and states.

    Motivation: They serve to guard against accidental damage to the database such that authorized changes should not result in a loss of data consistency.

    • Inherent or Implicit Constraints - Constraints inherent in the definitions and assumptions of a particular data model hold in every database having that data model as its underpinning.
    • Schema-based or Explicit Constraints - expressed by using the facilities provided by the model via its DML.
      • Domain Constraint - the value of each attribute must be an atomic value from its domain.
      • Key Constraint - no two tuples may have the same combination of values in their attributes.
        • A relation is a set of tuples, and each tuple’s “identity” is given by the values of its attributes. Hence, it makes no sense for two tuples in a relation to be identical (because then the two tuples are actually one and the same tuple).
      • Entity Integrity Constraint - in a tuple, none of the values of the attributes forming the relation’s *primary key may have the (non)-value null. *
      • Referential Integrity Constraint - for every tuple in relation , the tuple to which it refers to must actually be in .
        • When a foreign key exists, the value must match a candidate key value of some tuple in the referenced relation.
        • In some cases we may permit the foreign keys have NULL values . but it must does so in all its attributes. Such a value denotes that the tuple in does not refer to any tuple in .
      • Semantic Integrity Constraint- a business specific rule that limits the permissible values within a relation. It ensures that the values must be in a specified domain
    • Application-based or Semantic Constraint - Constraints that are beyond the expressive power of the model and must be specified and enforced by the application itself. These are specifically related to the business rules.

Dependencies

  • Given a relation with attributes . is said to functionally determine if and only if each value in is associated with precisely one value of . That is, the values of can be determined simply by looking at the corresponding values of .

  • A transitive dependency exists when is dependent on , and is dependent on implies that is dependent on .

Normalization

  • Normalization is a database design technique that is designed to reduce data redundancy and improve data integrity by following a normal form.
  • Denormalization is the opposite process. This is relevant in query optimization to minimize the number of joins we need to perform

First Normal Form

  • First Normal Form is a property of a relation wherein

    • No attribute domain has relations as elements (i.e., no table columns may have tables or repeating groups as values).
    • Each set of related data must be its own table identified with their own primary key
  • Enforcement: Do not use multiple fields in a single table to store similar data.

  • Enforcement: Use foreign keys to declare relationships between relations.

Second Normal Form

  • Second Normal Form is a property of a relation wherein

    • It is in first normal form
    • It does not have any candidate key that is functionally dependent on any proper subset of any candidate key of the relation
    • It does not depend on anything other than the primary key .
  • Enforcement: Create separate tables for sets of values that apply to multiple records.

  • Enforcement: Relate tables with a foreign key

Third Normal Form

  • Third Normal Form is a property of a relation wherein

    • It is in second normal form
    • All attributes in the relation solely depend on the primary key
    • No attributes have transitive dependency with another key
  • Enforcement: We can enforce as follows

    • Eliminate attributes in the relation that are not dependent on the key.
    • If the contents of a group of attributes may apply to more than a single record in the table, then separate them into a new table.
    • Compromise: If it is not feasible to enforce this, consider enforcing it only on data that frequently changes.