1.10 Major Issues in Data Mining
Mining methodology and user interaction issues
    These reflect the kinds of knowledge mined, the ability to mine
knowledge at multiple granularities, the use of domain knowledge, ad
hoc mining and knowledge visualization.
    Mining different kinds of knowledge in databases: Since different
users are interested in different kinds of knowledge, data mining should
cover a wide spectrum of data analysis and knowledge discovery tasks,
including data characterization, discrimination, association and
correlation analysis, classification, prediction, clustering and outlier
analysis. These tasks may use the same database in different ways and
require the development of numerous data mining techniques.
    Interactive mining of knowledge at multiple levels of abstraction:
Since it is difficult to know exactly what can be discovered within a
database, the data mining process should be interactive. For databases
containing a huge amount of data, appropriate sampling techniques can
be applied to facilitate interactive data exploration. Interactive mining
allows users to focus the search for patterns, refining data mining
requests based on returned results. In this way, users can interact with
the data mining system to view data and discovered patterns at multiple
granularities and from different angles.
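
As an illustration of the sampling idea, the sketch below draws a fixed-size
uniform sample from a large collection of rows in a single pass (reservoir
sampling). The text does not name a particular sampling technique; reservoir
sampling is only one possible choice, and the function name `sample_rows` and
its parameters are hypothetical.

```python
import random


def sample_rows(rows, k, seed=0):
    """Return k rows chosen uniformly at random from an iterable of rows.

    Uses reservoir sampling, so the rows are read once and only k of them
    are held in memory. `seed` is fixed here only to make runs repeatable.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint includes both end points
            if j < k:
                reservoir[j] = row
    return reservoir


# Explore a small sample instead of the full table.
sample = sample_rows(range(1_000_000), k=1000)
print(len(sample), "rows sampled")
```

A user could run an exploratory query on the sample first and repeat it on
the full data only once the request has been refined.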
    Incorporation of background knowledge: Background knowledge
or information regarding the domain under study may be used to guide
the discovery process and allow discovered patterns to be expressed in
concise terms and at different levels of abstraction. Domain knowledge
related to databases, such as integrity constraints, can help focus and
speed up a data mining process, or judge the interestingness of discovered
patterns.
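
One common form of background knowledge is a concept hierarchy that maps
low-level values to more general concepts. The sketch below assumes such a
hierarchy is available as a plain dictionary (the `CITY_TO_COUNTRY` mapping
and the sales records are made-up illustration data, not taken from the
text) and uses it to summarize the same records at a higher level of
abstraction.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {
    "Chennai": "India",
    "Mumbai": "India",
    "Lyon": "France",
    "Paris": "France",
}


def roll_up(records, hierarchy):
    """Count records per higher-level concept.

    Values missing from the hierarchy are kept at their original level.
    """
    return Counter(hierarchy.get(value, value) for value in records)


sales_by_city = ["Chennai", "Mumbai", "Paris", "Chennai", "Lyon"]
print(roll_up(sales_by_city, CITY_TO_COUNTRY))
```

The city-level records are reported as two country-level counts, which is a
more concise pattern than listing each city separately.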
    Data mining query languages and ad hoc data mining: Relational
query languages (such as SQL) allow users to pose ad hoc queries for
data retrieval. In a similar way, high-level data mining query languages
need to be developed to allow users to describe ad hoc data mining tasks.
This includes specifying the relevant sets of data for analysis, the
domain knowledge, the kinds of knowledge to be mined and the conditions
and constraints to be enforced on the discovered patterns. Such a
language should be integrated with a database or data warehouse query
language and optimized for efficient and flexible data mining.
    Presentation and visualization of data mining results: Discovered
knowledge should be expressed in high-level languages, visual
representations, or other expressive forms so that the knowledge can be
easily understood and directly usable by humans. This is especially
crucial if the data mining system is to be interactive. This requires the
system to adopt expressive knowledge representation techniques, such
as trees, tables, rules, graphs, charts, crosstabs, matrices or curves.
    Handling noisy or incomplete data: The data stored in a database
may reflect noise, exceptional cases, or incomplete data objects. When
mining data regularities, these objects may confuse the process, causing
the knowledge model constructed to overfit the data. As a result, the
accuracy of the discovered patterns can be poor. Data cleaning methods
and data analysis methods that can handle noise are required, as well as
outlier mining methods for the discovery and analysis of exceptional
cases.
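
A minimal sketch of these two steps follows, assuming a single numeric
attribute in which missing values are stored as `None`. Filling gaps with
the median and flagging outliers with the 1.5 x IQR rule are illustrative
choices, not methods prescribed by the text, and `clean_and_flag` is a
hypothetical name.

```python
import statistics


def clean_and_flag(values):
    """Fill missing values with the median and flag outliers.

    Returns (filled, outliers). A value is flagged as an outlier when it
    lies more than 1.5 * IQR below the first or above the third quartile.
    Assumes at least two non-missing values are present.
    """
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]

    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in filled if v < low or v > high]
    return filled, outliers


ages = [10, 12, None, 11, 13, 12, 300, 11]
filled, outliers = clean_and_flag(ages)
print(filled)
print(outliers)
```

The flagged values are reported rather than deleted, since an exceptional
case may itself be the pattern of interest.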
    Pattern evaluation - the interestingness problem: A data mining
system can uncover thousands of patterns. Many of the patterns
discovered may be uninteresting to the given user, either because they
represent common knowledge or lack novelty. Several challenges remain
regarding the development of techniques to assess the interestingness
of discovered patterns. The use of interestingness measures or
user-specified constraints to guide the discovery process and reduce the
search space is another active area of research.
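
Support and confidence are two widely used objective interestingness
measures for association rules; they are shown here as an example and are
not named in the passage above. The sketch assumes transactions are given
as Python sets, and the thresholds in `is_interesting` stand in for
user-specified constraints. All function names and data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule never applies, so treat it as uninteresting
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


def is_interesting(transactions, antecedent, consequent,
                   min_support=0.3, min_confidence=0.6):
    """Keep a rule only if it meets both user-specified thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
print(is_interesting(baskets, {"bread"}, {"milk"}))
```

Raising the thresholds prunes more candidate rules, which is one way such
constraints reduce the search space.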
Performance issues
    These include efficiency, scalability and parallelization of data
mining algorithms.
    Efficiency and scalability of data mining algorithms: To effectively
extract information from a huge amount of data in databases, data mining
algorithms must be efficient and scalable. In other words, the running
time of a data mining algorithm must be predictable and acceptable in
large databases. From a database perspective on knowledge discovery,
efficiency and scalability are key issues in the implementation of data
mining systems. Many of the issues discussed above under mining
methodology and user interaction must also consider efficiency and
scalability.
       Parallel, distributed, and incremental mining algorithms: The huge
size of many databases, the wide distribution of data and the
computational complexity of some data mining methods are factors
motivating the development of parallel and distributed data mining
algorithms. Such algorithms divide the data into partitions, which are
processed in parallel. The results from the partitions are then merged.
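
The partition, process and merge pattern can be sketched as follows for a
simple task, counting how many transactions contain each item. This is an
illustration under assumptions: the function names are hypothetical, and a
thread pool is used only to keep the example self-contained. For CPU-bound
mining on real data, a process pool or a cluster framework would take its
place.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Count, for one partition, how many transactions contain each item."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts


def parallel_item_counts(transactions, n_partitions=4):
    """Split the data, count each partition concurrently, merge the results."""
    # Partition step: round-robin split into n_partitions slices.
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]

    # Processing step: each partition is handled independently.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_items, partitions))

    # Merge step: partial counts are summed into a global result.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


baskets = [["bread", "milk"], ["bread"], ["milk", "butter"], ["bread", "butter"]] * 250
print(parallel_item_counts(baskets))
```

Counting merges cleanly because the partial results simply add up; mining
tasks whose results do not combine this easily need a more careful merge.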
       Moreover, the high cost of some data mining processes promotes
the need for incremental data mining algorithms. Such algorithms
perform knowledge modification incrementally to amend and strengthen
what was previously discovered.
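
A minimal sketch of the incremental idea, assuming the knowledge being
maintained is the set of frequent items: when a new batch of transactions
arrives, only the stored counts are updated and the earlier data is never
rescanned. The class `IncrementalItemCounts` and its methods are
hypothetical names introduced for illustration.

```python
from collections import Counter


class IncrementalItemCounts:
    """Maintain item counts across batches without rescanning old data."""

    def __init__(self):
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, batch):
        """Fold a new batch of transactions into the stored counts."""
        for transaction in batch:
            self.counts.update(set(transaction))
            self.n_transactions += 1

    def frequent_items(self, min_support):
        """Items whose support over all data seen so far meets min_support."""
        if self.n_transactions == 0:
            return set()
        return {item for item, count in self.counts.items()
                if count / self.n_transactions >= min_support}


model = IncrementalItemCounts()
model.update([["bread", "milk"], ["bread"]])
print(model.frequent_items(min_support=0.6))

# A later batch revises the earlier result.
model.update([["milk"], ["milk", "butter"]])
print(model.frequent_items(min_support=0.6))
```

After the second batch the set of frequent items changes, showing how new
data can amend what was found earlier.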
Issues relating to the diversity of database types
    Handling of relational and complex types of data: Since relational
databases and data warehouses are widely used, the development of
efficient and effective data mining systems for such data is important.
However, other databases may contain complex data objects, hypertext
and multimedia data, spatial data, temporal data, or transaction data. It
is unrealistic to expect one system to mine all kinds of data, given the
diversity of data types and different goals of data mining. Specific data
mining systems should be constructed for mining specific kinds of data.
Therefore, one may expect to have different data mining systems for
different kinds of data.
    Mining information from heterogeneous databases and global
information systems: Local and wide-area computer networks (such as
the Internet) connect many sources of data, forming huge, distributed
and heterogeneous databases. The discovery of knowledge from different
sources of structured, semi-structured, or unstructured data with diverse
data semantics poses great challenges to data mining. Data mining may
help to disclose high-level data regularities in multiple heterogeneous
databases. Such regularities are unlikely to be discovered by simple
query systems, and finding them may improve information exchange and
interoperability in heterogeneous databases. Web mining, which uncovers
interesting knowledge about Web contents, Web structures, Web usage and
Web dynamics, has become a very challenging area in data mining.