k-means clustering
Procedure
- Pick k starting points from a list of sample points.
- Each data point is labelled based on which of the starting points it is closest to.
- Calculate the mean location of each cluster (i.e., its centroid) and perform another pass of re-clustering.
- If the clusters in this pass are the same as those in the previous pass, we can stop (see the sketch below).
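A minimal NumPy sketch of this loop, assuming 2-D data; the function name and defaults are illustrative, not a reference implementation:

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: assign points to the nearest centroid, recompute means, repeat."""
    rng = np.random.default_rng(seed)
    # Pick k starting points from the sample points themselves.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Label each point by the centroid it is closest to.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop when this pass produces the same clusters as the previous pass.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each cluster's mean location (its centroid).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Example usage on random 2-D data.
data = np.random.default_rng(1).normal(size=(150, 2))
labels, centroids = kmeans(data, k=3)
```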
 
 
Considerations
- Vary the sample points chosen to start the algorithm, since different initializations can produce different clusterings. Ideally, we want low total variance from points to their centroid.
 
Choosing k
- Vary k depending on domain knowledge or the quality of the resulting clustering.
- Use the k that is at the "elbow" of the variance-vs-k plot.
  - This is subjective and sometimes not obvious in the plot.
- Use the Silhouette method to measure how similar points tend to be to points in their own cluster compared to other clusters (see the sketch below).
  - Pick the k which maximizes the silhouette score.
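A short sketch of silhouette-based selection of k, assuming scikit-learn is available; the candidate range and toy data are illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data; in practice X is your own feature matrix.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Score each candidate k; n_init=10 reruns the algorithm with different
# starting points and keeps the lowest-variance result.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```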
DBSCAN
- Involves creating clusters starting from select core points: points which have at least minPts other points within distance ε of them (for tunable ε and minPts).
- The cluster is extended by adding all core points that are reachable (within distance ε) from core points already in the cluster.
- Finally, all non-core points within distance ε of a core point in the cluster are added to the cluster.
- This process is repeated from the remaining core points until every point is either assigned to a cluster or left as noise (see the sketch below).
- For more information on this, see the Wikipedia page.
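A minimal scikit-learn sketch; the dataset and the ε / minPts values (eps / min_samples) are illustrative:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: a shape k-means handles poorly but DBSCAN separates well.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps is the neighborhood radius ε; min_samples plays the role of minPts.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Points labelled -1 were not reachable from any core point and are treated as noise.
print(set(labels))
```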
 
Hierarchical Clustering
- A technique that seeks to build a hierarchy of clusters from a specified distance metric.
- Agglomerative Clustering is bottom-up: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
- Divisive Clustering is top-down: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
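A small agglomerative (bottom-up) example using SciPy, assuming it is available; the data and the distance cut-off of 2.0 are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Five 1-D observations; each starts in its own cluster.
X = np.array([[1.0], [1.2], [5.0], [5.3], [9.0]])

# Repeatedly merge the two closest clusters, recording the full hierarchy.
Z = linkage(X, method="average")

# Cut the hierarchy so that no cluster is merged above distance 2.0.
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)  # three clusters: {1.0, 1.2}, {5.0, 5.3}, {9.0}
```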
 
Calculating Distance between Clusters
For each of these, let A and B be the two clusters being compared.
- Complete Linkage Clustering: clusters based on the maximum distance between two elements of A and B.
- Single Linkage Clustering: clusters based on the minimum distance between two elements of A and B.
- Unweighted Average Linkage Clustering: clusters based on the average distance between all pairs of elements from A and B (see the sketch below).
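A sketch comparing linkage criteria with scikit-learn's AgglomerativeClustering; the blob data and n_clusters=3 are illustrative:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# The linkage argument selects how the distance between two clusters A and B
# is computed: maximum ("complete"), minimum ("single"), or mean ("average").
for linkage in ("complete", "single", "average"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    print(linkage, labels[:10])
```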
Gaussian Mixture Models
- Assume all datapoints with a certain label are distributed based on a Gaussian distribution parameterized by a label-specific mean μ, covariance Σ, and label probability π.
- The goal, then, is to maximize the probability of observing the data given the label-specific Gaussian distributions.
- This is done by tweaking each of the μ, Σ, and π, typically via the expectation-maximization (EM) algorithm (see the sketch below).
 - See more here
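A minimal scikit-learn sketch; the two-Gaussian toy data and n_components=2 are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data drawn from two different Gaussians.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[3.0, 3.0], scale=1.0, size=(100, 2)),
])

# EM fits each component's mean, covariance, and mixing weight (label probability).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(gmm.means_)    # label-specific means
print(gmm.weights_)  # label probabilities
labels = gmm.predict(X)  # most likely label for each datapoint
```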