ABSTRACT
Dr. Chun-houh Chen, Assistant Research Fellow, Institute of Statistical Science, Academia Sinica, Taiwan

Conventional data visualization tools for high-dimensional data sets
usually adopt a dimension reduction technique such as principal component
analysis to project data structure from a higher dimensional space down to
a lower dimensional configuration. This dimension reduction procedure is
also an information reduction process. Dimension reduction is necessary
since conventional tools always use a scatter-plot type of display to
represent the metric relationship between observations geometrically. GAP
is a dimension-free visualization environment for multivariate data
structure. Given a multivariate data set, GAP first computes the proximity
matrices for variables as well as for subjects. Proper seriations
(permutations) are then sought to rearrange these two matrices so that they
satisfy certain properties. The doubly sorted raw data matrix, together with
the two sorted proximity matrices, is then projected through appropriate
color spectra to create matrix maps. These three maps should be cross-examined to
identify three major pieces of information contained in any multivariate
data set: 1. the linkage amongst n subject points in the p-dimensional
space (subject-clusters); 2. the linkage between p variable vectors in
the n-dimensional space (variable-groups); and 3. the interaction linkage
between the sets of subjects and variables.
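
The steps described above (proximity matrices, seriation, doubly sorted data matrix) can be illustrated with a minimal sketch. This is not GAP's own implementation: the abstract does not say which proximity measure or seriation criterion GAP uses, so the sketch assumes Euclidean distance as the proximity and, as a stand-in seriation, orders items along the first principal coordinate of the distance matrix (classical multidimensional scaling). The function names `euclidean_proximity`, `seriate`, and `gap_sketch` and the toy data are hypothetical.

```python
import numpy as np


def euclidean_proximity(rows):
    """Pairwise Euclidean distances between the rows of a 2-D array."""
    diff = rows[:, None, :] - rows[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def seriate(dist):
    """Stand-in seriation (an assumption, not GAP's criterion):
    order items along the first principal coordinate of `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    _, vecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    return np.argsort(vecs[:, -1])      # sort by the leading eigenvector


def gap_sketch(data):
    """Return the seriation orders and the three sorted matrices."""
    subj_prox = euclidean_proximity(data)      # n x n, between subjects
    var_prox = euclidean_proximity(data.T)     # p x p, between variables
    row_order = seriate(subj_prox)
    col_order = seriate(var_prox)
    return {
        "row_order": row_order,
        "col_order": col_order,
        "sorted_data": data[np.ix_(row_order, col_order)],
        "sorted_subject_proximity": subj_prox[np.ix_(row_order, row_order)],
        "sorted_variable_proximity": var_prox[np.ix_(col_order, col_order)],
    }


# Toy data: 10 subjects, 4 variables, two interleaved subject clusters.
rng = np.random.default_rng(0)
labels = np.array([0, 1] * 5)
data = rng.normal(scale=0.1, size=(10, 4)) + labels[:, None] * 5.0

maps = gap_sketch(data)
sorted_labels = labels[maps["row_order"]]   # cluster members become adjacent
```

Each of the three sorted arrays could then be rendered as a color-coded image (for example with matplotlib's `imshow`) to obtain matrix maps of the kind described. In the toy data the two subject clusters start out interleaved and end up as contiguous blocks after seriation, which is the pattern the sorted subject proximity map is meant to expose.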
Several modules have been added to GAP. A dynamic clustering
procedure using GAP (DynaGAP) searches systematically
for clustering patterns among both subjects and variables. When the data profile
is observed more than once, a longitudinal version of GAP (LongGAP) with
parallel linkage and overlapping linkage is designed to study the three
linkages over time. CateGAP (Categorical GAP) visualizes
the information structure of categorical data sets.
CanoGAP (Canonical GAP) is suited to comparing the similarity and
difference structures of two sets of variables measured on the same set of
subjects.
GAP was originally developed for analyzing data sets from the
Taiwan multidimensional psychopathological group research program (MPGRP).
It has since become a powerful information visualization environment
for assisting general-purpose multivariate analyses.