- Set based data structures are usually represented with a list of items. Generally, these data structures support the following operations which either query or modify the list.
SEARCH(S, k)
- given a setand key value , return a pointer to an element in such that x.key == k
orNULL
if no such element exists.INSERT(S, x)
- augment the setwith the element pointed to by . DELETE(S, x)
- given a pointerto an element in , remove from . MINIMUM(S)
- assuming a totally ordered set, return a pointer to the element in with the smallest key. MAXIMUM(S)
- assuming a totally ordered set, return a pointer to the element in with the largest key. SUCCESSOR(S, x)
- assuming a totally ordered set, return a pointer to the next larger element in or NIL
ifPREDECESSOR(S, x)
- assuming a totally ordered set, return a pointer to the next smaller element in or NIL
if
Stack, Queue, and Deque
-
In a stack, we follow a Last In First Out policy. The first element deleted is the last element to be inserted.
- We refer to insertion as push and deletion as pop.
- A stack underflows if we try to pop an empty stack.
- A stack overflows if we exceed the maximum capacity of the stack
(if any exist)
-
In a queue, we follow a First In First Out policy. The first element deleted is the first element inserted.
- We refer to insertion as enqueue and deletion as dequeue
- When we enqueue an element, it becomes the tail of the queue.
- When we dequeue the queue, we remove the head of the queue and replace it with its successor.
-
A deque is a double ended queue where we allow insertion and deletion on both ends of the list.
Linked List
- A linked list is a data structure where objects are arranged in a linear order where an item points (via pointer) to the next item on the list.
- A doubly linked list extends the linked list to allow items to point to the previous item as well.
- A circular list extends a linked list by linking the tail to the head.
- We can add a sentinel — a dummy object that simplifies boundary conditions (i.e., when dealing with the predecessor of the head and the successor of the tail).
- By doing this, we create a “circular list”.
Flattened Lists
- A flattened list is an umbrella term for a data structure that is usually represented with pointers, but is instead represented as an ordered list of items.
- We use this structure if pointers and objects are unavailable to us.
- Sometimes, this creates the effect of simulating how the computer operates on memory — including allocation, deletion, and garbage collection
Specialized Lists
-
A suffix array is a sorted array of all the suffixes of a string. They can be seen as a space efficient alternative to Suffix Trees
Let
be an -string and the substring ranging from to inclusive. The suffix array
of is the array of integers providing the starting positions of suffixes of in lexicographic order. That is contains the starting position of the -th smallest suffix of . - A suffix array can be constructed using DFS on a suffix tree assuming edges are visited in lexicographic order.
- A suffix array requires
space, whereas the original text only requires where is the size of the alphabet.
Hashed Data Structure
Disjoint Set Operations
-
A disjoint set data structure maintains a collection
of disjoint dynamic sets. Each set is identified by a representative, some member of the set. -
It supports the following operations
MAKE_SET(x)
- create a new set whose only member is, where is not in some other dataset UNION(x,y)
- unites the dynamic sets that containand into a new set that is the union of these two sets. We then destroy the input sets. FIND_SET(x)
- returns a pointer to the representative of the unique set containing.
-
One way to represent disjoint sets is through a linked list. Each set corresponds to one linked list with a
head
andtail
. All elements in the set are connected to the set’shead
.MAKE_SET
andFIND_SET
taketime. FIND_SET
can be done by outputting the head of the set.- For
UNION
we can consider using the weighted union heuristic where we append the shorter list into the longer list. - (CLRS 21.1) Using the linked list representation of disjoint sets, a sequence of
MAKE_SET
,UNION
andFIND_SET
operationsof which are MAKE_SET
operations, takestime. - This follows because each object’s pointer is updated at most
times over all UNION
operations
- This follows because each object’s pointer is updated at most
-
Another possible representation is through Rooted trees where each node contains one member and each tree represents one set. The root of each tree contains its representative and is its own parent
- A
UNION
operation causes the root of one tree to point to the root of another. FIND_SET
traverses the tree until we reach the root of the tree.- We can use the following heuristics:
- The union by rank heuristic involves making the root of the tree with fewer nodes point to the tree with more nodes. Here, we maintain a rank — an upper bound on the height of the node.
- This gives a runtime of
with MAKE_SET
andtotal operations. - This introduces a
LINK
operation that performs the union by rank heuristic
- This gives a runtime of
- The path compression heuristic involves using each node on a path point directly to the root.
- Here, we make two passes whenever we do
FIND_SET
. The first is to find the node, and the second is to update each node to point to the root directly - For a sequence of
MAKE_SET
andFIND_SET
operations, this gives a worst case running time of
- Here, we make two passes whenever we do
- (CLRS 21.14) A sequence of
MAKE_SET
,UNION
, andFIND_SET
operationsof which are MAKE_SET
can be performed on a disjoint forest with union by rank and path compression in worst-casewhere is the Ackermann function . - (CLRS 21.11) - the amortized cost of each
MAKE_SET
operation is - (CLRS 21.12) - the amortized cost of each
LINK
operation is - (CLRS 21.13) - the amortized cost of each
FIND_SET
operation is
- (CLRS 21.11) - the amortized cost of each
- The union by rank heuristic involves making the root of the tree with fewer nodes point to the tree with more nodes. Here, we maintain a rank — an upper bound on the height of the node.
- A