Schemas
-
Data Domain - a usually named set of atomic values - each value is indivisible. We provide a description of the domain using data types or formats.
-
Data Definition Language - a computer language that is used to create and modify the structure of database objects, especially data schemas.
- Typically includes CREATE, ALTER, and DROP applicable for tables.
-
Data Manipulation Language - computer programming language used for operating on data within a database.
-
A schema is a description of a particular collection of data using a given data model .
-
Database View - a subset of a database based on a query that runs on one or more tables .
-
A relational database schema
is defined as a set of Relation schemas and a set of integrity constraints where -
A relational database instance is a set of relation instances which satisfy the relational database schema.
- We may think of this as a snapshot of the state of a particular database at any given time.
Relations
-
A relation is a set of tuples
. Where and is the data domain. -
Informally, we may think of relations as some logical structure with columns and rows such that:
- The columns represent attributes
- The allowed values for each attribute represent the data domain
- The rows represent a tuple
-
A Relation Schema is a description of a relation. More formally it is notated as
where
is the name of the relation, and are the set of attributes of the relations.
Basic Properties
- Each relation name in a relational database must be distinct
- The attribute names must be distinct from any attribute names of the same relation.
- The order of the attributes specified in
is not significant. - Each tuple in a relation is distinct. No duplicate tuples exist.
- The order of tuples is insignificant.
- Each cell in the relation table must contain exactly one value (Atomicity).
- Each of
refers to an attribute of the relation.
Keys
-
A key is a minimal superkey
-
A superkey is a proper subset of attributes for which no tuples may have the same combination of values.
-
A candidate key is any key. It is any set of attributes chosen from the relation.
-
A primary key is a key chosen to act as the means to which to identify tuples in a relation. We prefer this to be as small as possible.
-
A foreign key of relation
is a set of its attributes intending to be used (by each tuple in ) for identifying or referring to a tuple in some relation . is the referencing relation. is the referenced relation. - The set of attributes of
forming the foreign key should correspond to some super key of (typically its primary key )
- The set of attributes of
Multiplicity
- The Degree refers to the number of attributes in a relation (i.e., number of columns)
- The Cardinality denotes how many tuples of the referenced relation
can correspond to a tuple in the referencing relation . - The Participation denotes whether or not some tuple of the referencing relation
should correspond to a tuple in the referenced relation . - The Multiplicity of entity relationships is a pair which specifies the participation and the cardinality given two relations
and related by a foreign key
Relational Algebra
-
The relational algebra is an approach that defines fundamental operations that manipulate and operate on tuples in a relation.
-
They are defined as follows.
Let
be relations be a set of attributes in the relation - Select - denoted
. We choose a subset of tuples in that satisfy the predicate. The predicate acts as a filter based on some requirements. - Projection - denoted
. We generate relations that contain only the specified attributes allowing for rearranging their orderings and manipulate their values. - Union - denoted
. It is the relation that contains all the tuples that appear in at least one of the relations. - Intersection - denoted
. It is the relation that contains all the tuples that appear in both of the relations. - Differnce - denoted
. It is the relation that contains all the tuples that appear in but not in - Product - denoted
. It is the relation that contains all possible combinations of tuples from the input relations. - Join - denoted
. It is the relation that contains all tuples that are a combination of two tuples (one from each of R and S) with a common value for one or more attributes.
- Select - denoted
Constraints
-
Data Integrity - maintenance and assurance of accuracy and consistency of data within the database.
-
An integrity constraint specifies permissible values inside the database. It must hold for all valid relation instances and states.
Motivation: They serve to guard against accidental damage to the database such that authorized changes should not result in a loss of data consistency.
- Inherent or Implicit Constraints - Constraints inherent in the definitions and assumptions of a particular data model hold in every database having that data model as its underpinning.
- Schema-based or Explicit Constraints - expressed by using the facilities provided by the model via its DML.
- Domain Constraint - the value of each attribute must be an atomic value from its domain.
- Key Constraint - no two tuples may have the same combination of values in their attributes.
- A relation is a set of tuples, and each tuple’s “identity” is given by the values of its attributes. Hence, it makes no sense for two tuples in a relation to be identical (because then the two tuples are actually one and the same tuple).
- Entity Integrity Constraint - in a tuple, none of the values of the attributes forming the relation’s *primary key may have the (non)-value null. *
- Referential Integrity Constraint - for every tuple in relation
, the tuple to which it refers to must actually be in . - When a foreign key exists, the value must match a candidate key value of some tuple in the referenced relation.
- In some cases we may permit the foreign keys have NULL values . but it must does so in all its attributes. Such a value denotes that the tuple in
does not refer to any tuple in .
- Semantic Integrity Constraint- a business specific rule that limits the permissible values within a relation. It ensures that the values must be in a specified domain
- Application-based or Semantic Constraint - Constraints that are beyond the expressive power of the model and must be specified and enforced by the application itself. These are specifically related to the business rules.
Dependencies
-
Given a relation
with attributes . is said to functionally determine if and only if each value in is associated with precisely one value of . That is, the values of can be determined simply by looking at the corresponding values of . -
A transitive dependency exists when
is dependent on , and is dependent on implies that is dependent on .
Normalization
- Normalization is a database design technique that is designed to reduce data redundancy and improve data integrity by following a normal form.
- Denormalization is the opposite process. This is relevant in query optimization to minimize the number of joins we need to perform
First Normal Form
-
First Normal Form is a property of a relation wherein
- No attribute domain has relations as elements (i.e., no table columns may have tables or repeating groups as values).
- Each set of related data must be its own table identified with their own primary key
-
Enforcement: Do not use multiple fields in a single table to store similar data.
-
Enforcement: Use foreign keys to declare relationships between relations.
Second Normal Form
-
Second Normal Form is a property of a relation wherein
- It is in first normal form
- It does not have any candidate key that is functionally dependent on any proper subset of any candidate key of the relation
- It does not depend on anything other than the primary key .
-
Enforcement: Create separate tables for sets of values that apply to multiple records.
-
Enforcement: Relate tables with a foreign key
Third Normal Form
-
Third Normal Form is a property of a relation wherein
- It is in second normal form
- All attributes in the relation solely depend on the primary key
- No attributes have transitive dependency with another key
-
Enforcement: We can enforce as follows
- Eliminate attributes in the relation that are not dependent on the key.
- If the contents of a group of attributes may apply to more than a single record in the table, then separate them into a new table.
- Compromise: If it is not feasible to enforce this, consider enforcing it only on data that frequently changes.