Mohamed Mehdi Limam, Universite Paris Dauphine, France
and
Edwin Diday, Universite Paris Dauphine, France
and
Suzanne Winsberg, IRCAM, Paris, France
Abstract
Our aim is to describe a class, C, from a given population by partitioning
it; each class of the partition is described by a conjunction of characteristic
properties, and the class C is described by a disjunction of these
conjunctions. We employ a stepwise top-down binary tree method. At each step
we choose the best variable and its optimal split so as to optimize
simultaneously a discrimination criterion, furnished by a given prior
partition of the population, and a homogeneity criterion.
Therefore the classes we obtain are homogeneous with respect to the variables
describing them and, of course, are discriminated from each other with respect
to these same variables; in addition, they are discriminated from each other
with respect to the prior partition. Not only does this approach combine
supervised and unsupervised learning, it also deals with a data table in
which each cell contains an interval, that is, with symbolic data (see Bock
and Diday, 2002). The algorithm may be extended or reduced to deal with other
types of data, for example histogram-type data (see Vrac et al., 2003) and
classical data. We also introduce a new stopping rule. We illustrate the
method on both simulated and real data.
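The abstract leaves both criteria unspecified, so the following Python sketch only illustrates the general scheme under our own assumptions: each interval is summarized by its center and half-length; the binary question at a node asks whether an interval's center is at most a cut value c; discrimination is the relative Gini reduction with respect to the prior partition; homogeneity is the relative reduction in within-node inertia; and the two are mixed by a hypothetical weight alpha. The stopping rule (maximum depth, minimum node size) is a placeholder, not the new stopping rule mentioned above, and describing C by the leaves whose majority prior class is C is likewise an assumption. All function names are hypothetical.

```python
from collections import Counter

# Hypothetical interval-valued table: each object is a list of (low, high)
# pairs, one per variable; each interval is summarized by center and half-length.

def center(iv):
    """Center (midpoint) of an interval (low, high)."""
    lo, hi = iv
    return (lo + hi) / 2

def gini(labels):
    """Gini impurity of the prior-partition labels in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def inertia(rows):
    """Within-node inertia: squared deviations of each interval's
    (center, half-length) from the node mean, summed over variables."""
    if not rows:
        return 0.0
    total = 0.0
    for j in range(len(rows[0])):
        pts = [(center(r[j]), (r[j][1] - r[j][0]) / 2) for r in rows]
        mc = sum(c for c, _ in pts) / len(pts)
        mh = sum(h for _, h in pts) / len(pts)
        total += sum((c - mc) ** 2 + (h - mh) ** 2 for c, h in pts)
    return total

def split_score(X, y, left, right, alpha):
    """alpha * discrimination + (1 - alpha) * homogeneity; both terms are
    relative reductions (Gini for the prior partition, inertia for X)."""
    n = len(y)
    g0, i0 = gini(y), inertia(X)
    g_kids = sum(len(s) / n * gini([y[i] for i in s]) for s in (left, right))
    i_kids = sum(inertia([X[i] for i in s]) for s in (left, right))
    disc = (g0 - g_kids) / g0 if g0 > 0 else 0.0
    homog = (i0 - i_kids) / i0 if i0 > 0 else 0.0
    return alpha * disc + (1 - alpha) * homog

def best_split(X, y, alpha=0.5, min_size=1):
    """Best (variable j, cut c, left, right) for the binary question
    'center(Y_j) <= c', trying cuts between consecutive distinct centers."""
    best, best_score = None, float("-inf")
    for j in range(len(X[0])):
        cs = sorted({center(r[j]) for r in X})
        for a, b in zip(cs, cs[1:]):
            c = (a + b) / 2
            left = [i for i, r in enumerate(X) if center(r[j]) <= c]
            right = [i for i, r in enumerate(X) if center(r[j]) > c]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = split_score(X, y, left, right, alpha)
            if score > best_score:
                best, best_score = (j, c, left, right), score
    return best

def build_tree(X, y, alpha=0.5, min_size=1, max_depth=2):
    """Grow the tree top-down; return leaves as (conjunction of questions,
    majority prior class). Placeholder stopping rule: maximum depth or
    minimum node size, not the authors' new stopping rule."""
    leaves = []

    def grow(idx, path):
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        split = None
        if len(path) < max_depth and len(idx) >= 2 * min_size:
            split = best_split(Xs, ys, alpha, min_size)
        if split is None:
            leaves.append((path, Counter(ys).most_common(1)[0][0]))
            return
        j, c, left, right = split
        grow([idx[i] for i in left], path + [f"center(Y{j}) <= {c:g}"])
        grow([idx[i] for i in right], path + [f"center(Y{j}) > {c:g}"])

    grow(list(range(len(X))), [])
    return leaves

def describe(leaves, target):
    """Class `target` as a disjunction (OR) of its leaves' conjunctions (AND)."""
    return " OR ".join(
        "(" + " AND ".join(path) + ")" for path, label in leaves if label == target
    )

if __name__ == "__main__":
    X = [[(1, 2), (10, 12)], [(1.5, 2.5), (11, 13)], [(2, 3), (30, 32)],
         [(8, 9), (10, 11)], [(8.5, 9.5), (31, 33)], [(9, 10), (30, 31)]]
    y = ["C", "C", "C", "D", "D", "D"]
    print(best_split(X, y, alpha=0.5)[:2])  # (0, 5.5): follows the prior partition
    print(best_split(X, y, alpha=0.0)[:2])  # (1, 21.25): homogeneity only
    print(describe(build_tree(X, y, max_depth=1), "C"))  # (center(Y0) <= 5.5)
```

Setting alpha to 1 or 0 reduces the sketch to a purely supervised (discrimination-only) or purely unsupervised (homogeneity-only) split. On the toy table above, the balanced weight and the homogeneity-only weight already select different variables, which is the tension a combined criterion has to arbitrate.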
References
Bock, H.H. and Diday, E. (Eds.) (2002): "Analysis of Symbolic Data".
Springer, Heidelberg.
Vrac, M., Diday, E., Winsberg, S., and Limam, M.M. (2003): "Symbolic Class
Description". In: "Data Analysis, Classification and Related Methods;
Proceedings of the 8th Conference of the IFCS". Springer, Heidelberg.