
Sponsor: National Science Foundation (NSF), Award No. CCF 0514679
PI: Xiaohua Hu Amount: $102,300 Duration: July 15, 2005 - June 30, 2008
Data mining (also known as Knowledge Discovery in Databases, KDD) is the process of
extracting previously unknown and potentially useful information or patterns from
large data sets. KDD is usually a multiphase process involving steps
such as data preparation, data preprocessing, feature selection, rule
induction, knowledge evaluation, and deployment. Many novel data mining and
learning algorithms have been developed at a vigorous pace, but often on rather ad
hoc and vague concepts. These algorithms are, in most cases, the individual
creations of different researchers, with little shared methodological or foundational
framework. In other words, the great majority of work in data mining focuses on
algorithm development while neglecting the study of fundamental theoretical
issues concerning data, inter-data relationships, and the quality of the implicit
information hidden in the data or in data redundancies. Thus, it is not easy to
fully understand and evaluate how the individual phases influence one another or
the impact of each phase on the whole knowledge discovery process. For further
development and breakthroughs in data mining and learning algorithms, a deep
examination of their foundations is necessary. The central goal of the proposed
research is to develop a unified rough-set-based data mining framework to
explore various fundamental issues of data mining and learning algorithms. It
aims to present the analytical capabilities of the methodology of rough sets in
the context of data mining methodologies, techniques and applications. It will
provide a unified framework to help better understand the whole KDD process.
Intellectual merit: Rough set theory is particularly suited to reasoning about imprecise or
incomplete data and discovering relationships in the data. The simplicity and
mathematical clarity of rough set theory make it attractive for both
theoreticians and application-oriented researchers. The main advantage of rough
set theory is that it does not require any preliminary or additional
information about the data, such as probability in statistics, basic
probability assignment in Dempster-Shafer theory, or the value of membership in
fuzzy set theory. Rough set theory constitutes a sound basis for KDD and can be
used in different phases of the KDD process. In particular, the formal
techniques of rough set theory lead to many novel and promising methods
and algorithms for discovering, analyzing, and characterizing functional and
partial functional dependencies among attributes, as well as for feature selection, feature
extraction, data reduction, decision rule generation, and pattern extraction
(templates, association rules), all of which are fundamental issues of the
KDD process. Rough set theory represents an innovative approach and can lead
to the development of new learning algorithms that enable novel uses of, and
breakthroughs in, data mining techniques.
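
As an illustration only, and not part of the proposed framework, the following minimal Python sketch shows two standard rough set notions mentioned above: the lower and upper approximations of a concept, and the degree to which a decision attribute depends on a set of condition attributes (a partial functional dependency). The toy table, the attribute names ("headache", "temp", "flu"), and the function names are assumptions made up for this example.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of the index set target."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:   # block lies entirely inside the concept
            lower |= block
        if block & target:    # block overlaps the concept
            upper |= block
    return lower, upper


def dependency_degree(rows, cond_attrs, dec_attr):
    """Fraction of rows whose decision is determined by cond_attrs."""
    positive = set()
    for dec_block in partition(rows, [dec_attr]):
        lower, _ = approximations(rows, cond_attrs, dec_block)
        positive |= lower
    return len(positive) / len(rows)


# Hypothetical decision table: rows 0 and 1 agree on the condition
# attributes but disagree on the decision, so they are inconsistent.
rows = [
    {"headache": "yes", "temp": "high", "flu": "yes"},
    {"headache": "yes", "temp": "high", "flu": "no"},
    {"headache": "no", "temp": "high", "flu": "yes"},
    {"headache": "no", "temp": "normal", "flu": "no"},
]
flu = {i for i, r in enumerate(rows) if r["flu"] == "yes"}
lower, upper = approximations(rows, ["headache", "temp"], flu)
gamma = dependency_degree(rows, ["headache", "temp"], "flu")
print(lower, upper, gamma)
```

In this toy table only row 2 certainly belongs to the concept "flu = yes" (the lower approximation), while rows 0, 1, and 2 possibly belong to it (the upper approximation). The dependency degree of 0.5 indicates that the condition attributes determine the decision for half of the rows, without any probability or membership values being supplied in advance.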