SEMINAR ABSTRACT
Algorithms and Inference for Mixture Models with Application to Protein Sequence Analysis
Youyi Fong, Department of Biostatistics, University of Washington
Mixture-model-based clustering is a commonly used statistical
tool. By combining bottom-up hierarchical clustering with partitional clustering,
I propose new algorithms that are the only viable solutions to some challenging
clustering problems, which tend to involve high-dimensional observations.
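The abstract does not specify the new algorithms, so the following is only a generic sketch of the underlying idea of seeding a partitional method with a bottom-up solution. It assumes 1-D points, centroid-linkage agglomeration, and k-means (Lloyd) refinement. The names `agglomerate`, `refine`, and `hybrid_cluster` are hypothetical, and the author's algorithms, which operate on mixtures of profile hidden Markov models, are not reproduced here.

```python
# Hypothetical illustration of "hierarchical seeding + partitional refinement";
# not the author's algorithm.
from statistics import fmean


def agglomerate(points, k):
    """Bottom-up step: start from singletons and repeatedly merge the two
    clusters whose means are closest (centroid linkage) until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = abs(fmean(clusters[i]) - fmean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


def refine(points, centers, max_iter=100):
    """Partitional step: Lloyd (k-means) reassignment starting from the
    given centers, stopping once the labels no longer change."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each center; keep the old one if its cluster emptied.
        centers = [
            fmean([p for p, lab in zip(points, labels) if lab == c])
            if c in labels else centers[c]
            for c in range(len(centers))
        ]
    return labels, centers


def hybrid_cluster(points, k):
    """Seed the partitional step with the agglomerative solution."""
    seeds = [fmean(c) for c in agglomerate(points, k)]
    return refine(points, seeds)


labels, centers = hybrid_cluster([1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 10.0, 10.3], 3)
print(labels, centers)
```

Seeding this way replaces random starts with a deterministic, data-driven initialization, while the partitional step can still move points that an early greedy merge placed poorly.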
Insights into why the new algorithms perform so well can be obtained by drawing
on the stochastic local search literature. The clustering problem that
motivates my study of these algorithms is the modeling of a protein family as a
mixture of profile hidden Markov models, with the goal of identifying functional
subgroups and thereby improving genome annotation. Using the asymptotically consistent Bayesian
Information Criterion (BIC) to select the number of mixture components turns
out to over-penalize in datasets of practical size. By contrast, Bayes factors
computed under substantive priors, though not under the default prior, have
satisfactory finite-sample performance. Differences between the default and the
substantive priors shed light on the roles of priors in estimating the mixture
order. This is joint work with Drs. Jon Wakefield and Ken Rice at the University
of Washington.
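To make the role of the Bayesian Information Criterion (BIC) in choosing the mixture order concrete, here is a minimal sketch assuming a one-dimensional Gaussian mixture fitted by EM, which is far simpler than the profile HMM mixtures in the abstract. Each candidate k is scored by BIC = -2 log L + p log n, where L is the maximized likelihood, p the number of free parameters, and n the sample size; the p log n term is the penalty the abstract reports as too heavy at practical sample sizes. The function names, the deterministic initialization, and the variance floor are illustrative assumptions, and this toy example does not attempt to reproduce that finding.

```python
# Illustrative sketch: 1-D Gaussian mixture fitted by EM and scored by BIC.
# Not the profile HMM mixtures of the talk; all names are hypothetical.
import math


def normal_logpdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def fit_gmm_1d(xs, k, max_iter=500, tol=1e-8, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM; return the maximized
    log-likelihood. Deterministic start: means at spread-out order
    statistics, a common variance, and equal weights."""
    n = len(xs)
    xs_sorted = sorted(xs)
    mus = [xs_sorted[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, var_floor)
    variances = [var0] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities and log-likelihood via log-sum-exp.
        resp, loglik = [], 0.0
        for x in xs:
            logs = [math.log(w) + normal_logpdf(x, m, v)
                    for w, m, v in zip(weights, mus, variances)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(lp - top) for lp in logs))
            loglik += lse
            resp.append([math.exp(lp - lse) for lp in logs])
        if loglik - prev < tol:
            break
        prev = loglik
        # M-step: closed-form updates; the variance floor avoids collapse.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor)
    return loglik


def bic(loglik, n_params, n):
    """BIC = -2 log L + p log n (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n)


def select_order(xs, max_k):
    """Score k = 1..max_k; a 1-D mixture has 3k - 1 free parameters
    (k means, k variances, k - 1 free weights)."""
    scores = {k: bic(fit_gmm_1d(xs, k), 3 * k - 1, len(xs))
              for k in range(1, max_k + 1)}
    return min(scores, key=scores.get), scores


# Two well-separated groups of 20 evenly spaced points each.
data = [0.1 * i for i in range(-10, 10)] + [10 + 0.1 * i for i in range(-10, 10)]
best_k, scores = select_order(data, max_k=2)
print(best_k, scores)
```

On the two well-separated groups in `data`, the drop in -2 log L from k = 1 to k = 2 far exceeds the extra 3 log n penalty, so k = 2 is selected.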