Expectation Maximization

Expectation Maximization (EM) is a density estimation technique. Oracle Data Mining implements EM as a distribution-based clustering algorithm that uses probability density estimation.

In density estimation, the goal is to construct a density function that captures how a given population is distributed. The density estimate is based on observed data that represents a sample of the population.
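The idea can be illustrated with the simplest parametric case: a single Gaussian fitted to a one-dimensional sample. The following is a minimal sketch in Python, not Oracle Data Mining code; the function names `fit_gaussian` and `gaussian_density` are hypothetical and chosen only for illustration.

```python
import math


def fit_gaussian(sample):
    """Return the maximum-likelihood mean and variance of a 1-D sample."""
    n = len(sample)
    mean = sum(sample) / n
    # The maximum-likelihood variance divides by n, not n - 1.
    var = sum((x - mean) ** 2 for x in sample) / n
    return mean, var


def gaussian_density(x, mean, var):
    """Evaluate the fitted Gaussian density function at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up sample of the population
    mean, var = fit_gaussian(observed)
    # The estimated density is highest near the sample mean and falls off away from it.
    print(gaussian_density(mean, mean, var), gaussian_density(5.0, mean, var))
```

The fitted function assigns a density to any value, including values that were not observed, which is what lets a density estimate describe the population rather than only the sample.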


Note:

Expectation Maximization requires Oracle Database 12c.

Dense areas of the estimated distribution are interpreted as components or clusters. Density-based clustering is conceptually different from distance-based clustering (such as k-Means), where the emphasis is on minimizing intracluster distances and maximizing intercluster distances.

The shape of the probability density function used in EM effectively predetermines the shape of the identified clusters. For example, Gaussian density functions can identify single-peak, symmetric clusters. Each such cluster is modeled by a single component. Clusters of a more complex shape need to be modeled by multiple components. By default, the EM algorithm assigns model components to high-level clusters.
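To make the role of Gaussian components concrete, the following is a minimal sketch of textbook EM for a one-dimensional Gaussian mixture, written in Python with only the standard library. It is not Oracle's implementation and does not reproduce the grouping of components into high-level clusters; the function names, the quantile-based initialization, the fixed iteration count, and the variance floor `min_var` are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, k, iterations=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture to data with plain EM.

    Returns (weights, means, variances), one entry per component.
    """
    ordered = sorted(data)
    n = len(ordered)
    # Assumed initialization: place the means at evenly spaced quantiles.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(ordered) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in ordered) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([p / total for p in joint])

        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component owns no points; keep its old parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, min_var)

    return weights, means, variances


if __name__ == "__main__":
    random.seed(0)
    # Synthetic sample with two well-separated single-peak groups.
    sample = ([random.gauss(0.0, 1.0) for _ in range(200)]
              + [random.gauss(10.0, 1.0) for _ in range(200)])
    print(fit_gmm_em(sample, k=2))
```

On data like the synthetic sample above, each single-peak group can be captured by one component. A skewed or multi-peak group would need a larger `k`, with several components jointly describing one cluster.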