A Study On Importance of Image Mining and Its Challenges

The document discusses image mining and its challenges. Image mining involves extracting patterns and knowledge from large collections of images using techniques from fields like computer vision, image processing, and machine learning. It differs from image processing by focusing on discovering general patterns across image sets rather than analyzing specific images. Key challenges of image mining include creating databases to store the huge and growing volumes of images produced daily, and developing more precise and effective algorithms for mining images. Content-based image retrieval is also discussed as an important application that allows searching image databases based on visual image features.

Uploaded by Shilpa K
Copyright © All Rights Reserved

A study on importance of Image mining and its challenges

Shilpa K¹, Dr. V. Ganesh Babu²


¹ Asst. Professor, Dept. of Computer Science, Government Science College, N.T. Road,
Bengaluru, India
shilpaktr@gmail.com
Mobile no.:9900777900
² Asst. Professor, Dept. of Computer Science, Government College for Women,
Maddur, Mandya (Dist.), India
vgbzone@gmail.com
Mobile no.: 994531369

Abstract

Image mining is an interdisciplinary field based on machine vision, image processing, image retrieval, data mining, machine learning, databases and artificial intelligence. With advances in digital technology, the volume of data is increasing day by day. The main challenge is to create a separate database for images, which is more complex and needs more attention. In almost every field, such as medicine, geographical systems, robotics, health sciences and engineering, people require separate databases for storing their images, so image mining plays a vital role in research and development. The main idea is the extraction and discovery of new information or knowledge from the images present in a database; this extraction is known as image mining. It is an advanced field of data mining, and it differs from general data mining in that it focuses only on images and the extraction of information from them. Relationships between image sets and other patterns are mined according to the user's requirements. Various algorithms have been developed and used for mining images, but more work is needed for the results to be more specific, precise, accurate and effective. This paper focuses on the image mining process and its applications, and identifies the challenges and the future of research in image mining.

Keywords— Data Mining, Image Mining, Knowledge Discovery, Machine Learning, Artificial Intelligence, Rule
Mining, Datasets.

1. Introduction

Image mining is an extension of data mining to the image domain. It draws on expertise from computer vision, image processing, image acquisition, image retrieval, data mining, machine learning, databases and AI (artificial intelligence). Huge volumes of images are captured or generated every day, especially in medicine, e.g. MRI scans, CT scans, ultrasound scans, mammograms and X-ray images. Things are even more challenging in meteorology, where images received from satellites need to be stored and processed for weather forecasting. This has brought significant change to databases, where the challenges are how to store these images, how to index and query them, and what queries to state in order to retrieve images for knowledge gathering. Until recently, the meaning of data was restricted to alphanumeric values, but the concept has now expanded from alphanumeric data to images.

Image mining focuses on extracting patterns from large collections of images, whereas image processing focuses on certain characteristics of a specific image. High volumes of images, such as satellite images, medical images and digital photos, are produced and stored in the cloud very rapidly, and a lot of useful information can be gained by analysing them. The most fundamental challenge in image mining is determining how the pixels of a raw image, or of a series of images, can be processed to detect objects and the relationships among them.

The main objective of image analysis is to obtain all significant patterns from images without knowing the details of their content; that is, important patterns can be extracted from a series of input images without basic prior knowledge of what the images contain.

2. Content-Based Image Retrieval (CBIR)

Image mining can be done manually, by cutting and fragmenting data to achieve a specific pattern, or it can be performed by programs that analyze the data automatically. Color, texture and the shapes present in an image are the primary descriptors in a content-based image retrieval system. These primary descriptors are used to identify and retrieve similar images from an image database; retrieving images from such a data set manually is very difficult because the database is very large.

CBIR is also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR); it uses machine vision to retrieve digital images from large image databases. Earlier methods of image retrieval, such as manual indexing, are very time-consuming and inefficient. In these methods, each indexed image is stored in the database and linked to a keyword or a number related to classified descriptions. These older methods did not use the content of the images themselves.

In CBIR, the characteristics of every image stored in the database are extracted and compared with the features of the query image. The method combines knowledge from different fields such as pattern recognition, object matching, machine learning, microwave filtering and so on. CBIR is intended to retrieve images by discovering their visual properties, without relying on any descriptive text about them.

CBIR aims to find the database images that are most similar to the query image. It also focuses on developing techniques that operate on digital image libraries using features extracted automatically from the query image. These features can be classified as low-level or high-level features. CBIR retrieves images from the database based on attributes such as color, texture, edges and shape. In contrast, a text-based image retrieval (TBIR) system indexes and retrieves images based on descriptions such as size, type, date and time of capture, the owner of the image, keywords or other explanatory text about the image.

A general CBIR system is shown in Figure 1. In such a system, visual concepts are extracted from the database images and described as multi-dimensional feature vectors, which are stored in a feature database. To retrieve an image, the user provides a sample image as the query. The system converts the query image into its internal feature vector representation. The similarity between the feature vector of the query image and those of the database images is then calculated, and the most similar images are retrieved with the help of an index.
[Figure 1 diagram: the user submits a picture query, with optional relevance feedback; visual concepts of the query and of the database images are described as feature vectors; the vectors are compared by similarity; indexing and retrieval over the feature database produce the output retrieval results.]

Figure 1. An example system architecture for Content-Based Image Retrieval (CBIR).
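The compare-and-retrieve step of such a system can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture of Figure 1 itself: each image is assumed to have already been reduced to a fixed-length feature vector, similarity is taken to be Euclidean distance, and the database is scanned linearly rather than indexed. The names `euclidean`, `retrieve` and the sample vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_vec, database, k=3):
    """Return the ids of the k database images closest to the query vector.

    database maps an image id to its pre-computed feature vector; a smaller
    distance is treated as greater similarity. This is a linear scan, with no index.
    """
    ranked = sorted(database, key=lambda image_id: euclidean(query_vec, database[image_id]))
    return ranked[:k]

# Hypothetical 3-bin colour features (e.g. share of red, green and blue).
db = {
    "sunset.jpg": [0.70, 0.20, 0.10],
    "forest.jpg": [0.10, 0.80, 0.10],
    "sea.jpg":    [0.10, 0.20, 0.70],
    "dusk.jpg":   [0.60, 0.30, 0.10],
}
print(retrieve([0.68, 0.22, 0.10], db, k=2))  # ['sunset.jpg', 'dusk.jpg']
```

A practical CBIR system would typically replace the linear scan with an index structure so that the similarity search stays fast as the database grows.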

3. Image Mining

Image mining deals with the extraction of implicit knowledge, image data relationships or other patterns not explicitly stored in images. Unlike other image processing techniques, image mining (IM) does not aim at detecting a specific pattern in images. Rather, it focuses on identifying image patterns and deriving knowledge from the images within an image set based on low-level (pixel) information. As a research field, it has developed into an interdisciplinary area combining the knowledge and tools of data mining, databases, computer vision, image processing, image retrieval, statistics, recognition, machine learning, artificial intelligence, etc. The image mining process consists of several components, including:

• image analysis, covering image pre-processing, object recognition and feature extraction;
• image classification;
• image indexing;
• image retrieval;
• data management.

Figure 1 below gives an overview of the image mining process.

[Figure 1 diagram: Database → Image pre-processing → Transformation and feature extraction → Mining → Interpretation and evaluation → Knowledge]

Figure 1: Image mining process.
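The stages of the image mining process can be chained as in the toy sketch below. Every stage is a deliberately simple stand-in chosen for illustration, not the method of this paper: pre-processing only rescales grey levels, feature extraction computes the mean and variance of intensity, and the "mining" step groups images by a hypothetical brightness threshold of 0.5. All function names and the sample images are hypothetical.

```python
from statistics import mean, pvariance

def preprocess(image):
    """Pre-processing stand-in: rescale 8-bit grey levels to the range [0, 1]."""
    return [[px / 255 for px in row] for row in image]

def extract_features(image):
    """Transformation and feature extraction: (mean, variance) of pixel intensity."""
    pixels = [px for row in image for px in row]
    return (mean(pixels), pvariance(pixels))

def mine(features, threshold=0.5):
    """Toy mining step: group images by whether their mean intensity passes a threshold."""
    groups = {"bright": [], "dark": []}
    for name, (avg, _variance) in features.items():
        groups["bright" if avg >= threshold else "dark"].append(name)
    return groups

def run_pipeline(database):
    """Database -> pre-processing -> feature extraction -> mining -> patterns."""
    features = {name: extract_features(preprocess(img)) for name, img in database.items()}
    return mine(features)

# Hypothetical 2 x 2 greyscale images.
images = {
    "a": [[250, 240], [230, 255]],
    "b": [[10, 20], [30, 0]],
    "c": [[200, 210], [190, 220]],
}
patterns = run_pipeline(images)
# Interpretation and evaluation: here, simply report the size of each group.
print({label: len(names) for label, names in patterns.items()})  # {'bright': 2, 'dark': 1}
```

In a real system each stage would be replaced by the techniques described in the following subsections, such as filtering, colour histograms, clustering or classification.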

3.1 Image pre-processing: The first step in image mining is image pre-processing, the lowest level of abstraction, in which the image is processed to remove distortion and to enhance its resolution. For a satellite image, for example, this step is used to extract the spatial locations of fire spots (Figure 2), which are represented by latitude and longitude. Many pre-processing algorithms can be applied at this stage to reduce the impact of noise, such as average, median and Wiener filtering.
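Two of the noise-reduction filters named above, average (mean) and median filtering, can be sketched in plain Python for a greyscale image stored as a list of rows. This is an illustrative sketch assuming a 3×3 window that is clipped at the image border; Wiener filtering, which requires a noise model, is not shown. The helper `neighbourhood` is a hypothetical name.

```python
from statistics import median

def neighbourhood(image, r, c, size=3):
    """Pixels in a size x size window centred on (r, c), clipped at the image border."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    return [image[i][j]
            for i in range(max(0, r - half), min(rows, r + half + 1))
            for j in range(max(0, c - half), min(cols, c + half + 1))]

def median_filter(image, size=3):
    """Replace each pixel by the median of its window (robust to isolated spikes)."""
    return [[median(neighbourhood(image, r, c, size)) for c in range(len(image[0]))]
            for r in range(len(image))]

def mean_filter(image, size=3):
    """Replace each pixel by the average of its window (smooths, but spreads spikes)."""
    out = []
    for r in range(len(image)):
        row = []
        for c in range(len(image[0])):
            window = neighbourhood(image, r, c, size)
            row.append(sum(window) / len(window))
        out.append(row)
    return out

noisy = [[10, 10, 10],
         [10, 255, 10],   # a single "salt" noise pixel
         [10, 10, 10]]
print(median_filter(noisy)[1][1])          # 10: the spike is removed
print(round(mean_filter(noisy)[1][1], 2))  # 37.22: the spike is only spread out
```

The example shows why median filtering is usually preferred for impulse ("salt-and-pepper") noise, while averaging suits small, evenly distributed noise.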

Figure 2: Natural-colour image of the Thomas Fire in Ventura County, California, taken by NASA's Aqua satellite. Credit: NASA Goddard LANCE/EOSDIS MODIS Rapid Response Team

3.2. Classification

Classification is a supervised method of grouping data. In supervised methods, a set of labelled images, called the learning set, is provided. Classification is usually a two-phase process, consisting of a learning phase and a test phase. In the first phase, a model of each class is learned from the labelled images; in the second phase, the learned model is used to classify new images. The most popular classification methods are decision trees, Bayesian classifiers, SVM-based classification, classification rules, neural networks and fuzzy logic techniques.
One of the most important classification methods is the decision tree. A decision tree recursively divides the decision space into smaller regions based on the whole training sample. In this way, it breaks a complex decision down into a series of simpler decisions, each leading to a more uniform result, which naturally reflects the strategy used in human decision-making.
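The two phases and the recursive splitting described above can be illustrated with a small decision tree. This is a minimal CART-style sketch, assuming numeric image features and Gini impurity as the split criterion; it is not the specific algorithm of any cited work. The feature values (mean brightness, green fraction) and class names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Find the (score, feature, threshold) that minimises weighted Gini impurity."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for x, y in zip(rows, labels) if x[f] <= t]
            right = [y for x, y in zip(rows, labels) if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Learning phase: recursively divide the feature space into purer regions."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0 or depth == max_depth:
        return majority  # leaf: a class label
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(x, y) for x, y in zip(rows, labels) if x[f] <= t]
    right = [(x, y) for x, y in zip(rows, labels) if x[f] > t]
    return (f, t,
            build_tree([x for x, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([x for x, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, x):
    """Test phase: follow the splits until a leaf (class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

# Learning phase on a hypothetical labelled set: (mean brightness, green fraction).
train_X = [(0.90, 0.10), (0.80, 0.20), (0.30, 0.70), (0.20, 0.80), (0.85, 0.15), (0.25, 0.75)]
train_y = ["desert", "desert", "forest", "forest", "desert", "forest"]
tree = build_tree(train_X, train_y)
# Test phase on unseen images.
print([predict(tree, x) for x in [(0.70, 0.30), (0.10, 0.90)]])  # ['desert', 'forest']
```

Internal nodes are stored as `(feature, threshold, left, right)` tuples and leaves as plain labels, which keeps the sketch short; a production tree would also prune and handle ties more carefully.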

3.3. Color Processing

One method of color image processing is the color histogram. A color histogram may be computed for the whole image or for individual regions; as an image feature, it represents the color distribution. An RGB color image is an M × N × 3 array of color pixels, where each color pixel is a triple specifying the amounts of red, green and blue at that spatial location. A color image can therefore be considered a stack of three grayscale images which, when fed to the red, green and blue inputs of a color display, combine to form the color image. The average of each color component in the image can be calculated as in Formula 1.

$$\bar{R} = \frac{1}{P}\sum_{p=1}^{P} R(p), \qquad \bar{G} = \frac{1}{P}\sum_{p=1}^{P} G(p), \qquad \bar{B} = \frac{1}{P}\sum_{p=1}^{P} B(p)$$

Formula 1: Average red, green and blue components

where P is the total number of image pixels (P = M × N), and R(p), G(p) and B(p) are the red, green and blue component values of pixel p.
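The per-channel averages of Formula 1 can be computed directly. This sketch reads the formula as the mean of each colour component over all P pixels, and assumes the M × N × 3 image is stored as nested lists of (R, G, B) tuples; the function name and sample image are hypothetical.

```python
def average_rgb(image):
    """Mean red, green and blue component over all P pixels of an RGB image.

    image: list of rows; each row is a list of (R, G, B) tuples,
    i.e. an M x N x 3 array stored as nested lists.
    """
    pixels = [px for row in image for px in row]
    P = len(pixels)  # P = M * N
    return tuple(sum(px[ch] for px in pixels) / P for ch in range(3))

img = [[(200, 0, 0), (100, 50, 0)],
       [(0, 0, 40), (100, 30, 80)]]
print(average_rgb(img))  # (100.0, 20.0, 30.0)
```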

3.4. Clustering

Clustering is an unsupervised learning method: an automated process in which samples are divided into groups, called clusters, whose members are similar to one another. A cluster is therefore a collection of objects that are similar to each other and dissimilar to the objects in other clusters. Various similarity criteria can be used; for example, if distance is the criterion, objects that are close together are considered to form a cluster. This type of clustering is called distance-based clustering.

Clustering divides a heterogeneous population into a number of homogeneous subsets, or clusters. What distinguishes clustering from categorization is that clustering does not rely on predetermined categories. In model-based categorization, each data item is allocated to a predetermined category; these categories (such as gender, skin color, etc.) have been determined through the findings of previous studies. In clustering there is no predetermined set of categories: data are grouped on the basis of similarity, and the title of each group is determined by the user. For example, clusters of symptoms may indicate different diseases, and clusters of customer features may indicate different market segments. Clustering is often used as a prelude to other data mining analyses or modelling.
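Distance-based clustering is commonly illustrated with k-means, which the text does not name; the sketch below is therefore one example technique, not the paper's method. It assumes images have already been reduced to numeric feature vectors (the 2-D values are hypothetical), uses a fixed random seed for repeatability, and requires Python 3.8 or later for `math.dist`.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: group feature vectors into k clusters by Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D image features, e.g. (mean brightness, texture contrast).
points = [(0.10, 0.20), (0.15, 0.22), (0.12, 0.18),
          (0.90, 0.80), (0.85, 0.82), (0.88, 0.79)]
centroids, clusters = kmeans(points, k=2)
for c in clusters:
    print(sorted(c))
```

As the text notes, the clusters carry no predetermined names; a user would inspect the members of each group and decide what it represents.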

3.5 Feature Extraction: Feature extraction is the process of compressing the information extracted from known objects into attributes. Both global and local descriptors are used to represent an image. With global descriptors, segmentation errors are minimal and the descriptors are much easier to compute. Local descriptors, on the other hand, give more precise information and can discover distinct patterns. The features are normally represented numerically, providing a complex mathematical representation of an image in which objects are described by shape, texture, colour, etc. Martinez explains that the descriptors can be divided into the following description tools:

a. Basic elements: the tools used by the relevant descriptors are grid layout, time series, 2D/3D multiple view, spatial 2D/3D coordinates and temporal interpolation.
b. Colour: colour histogram (Figure 4), a graphical representation of the distribution of colours within an image. It characterises the colour distribution of an image effectively and is easy to compute. As a descriptor, the histogram is invariant to rotation and translation of the image. Martinez describes Colour Space, Colour Quantization, Scalable Colour, Dominant Colour, Colour Layout, Colour Structure and Group-of-Frames/Group-of-Pictures Colour as colour descriptors.
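A colour histogram of the kind described in item b can be computed by quantising each RGB channel and counting pixels per colour bin. This is a sketch: the default of 4 levels per channel and the normalisation to unit sum are assumptions, not parameters taken from Martinez or from the descriptors listed above. Because only pixel counts matter, rearranging the pixels, as a rotation or translation does (ignoring border effects), leaves the histogram unchanged.

```python
from collections import Counter

def colour_histogram(pixels, bins_per_channel=4):
    """Quantised, normalised RGB colour histogram with bins_per_channel**3 bins."""
    step = 256 // bins_per_channel

    def level(value):
        # Map a 0-255 component to a quantisation level in [0, bins_per_channel).
        return min(value // step, bins_per_channel - 1)

    counts = Counter()
    for r, g, b in pixels:
        counts[(level(r) * bins_per_channel + level(g)) * bins_per_channel + level(b)] += 1
    total = sum(counts.values())
    return [counts[i] / total for i in range(bins_per_channel ** 3)]

pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 255, 0)]
hist = colour_histogram(pixels, bins_per_channel=2)
print(hist)  # [0.0, 0.25, 0.25, 0.0, 0.5, 0.0, 0.0, 0.0]
# Rearranging the pixels leaves the histogram unchanged.
print(colour_histogram(list(reversed(pixels)), bins_per_channel=2) == hist)  # True
```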

3.6 Selecting Properties

To select properties (features), we can use measures based on entropy, gain ratio, Gini index, chi-square, etc. To discretize the properties, we can apply ChiMerge cut points, discretization based on MDLP (minimum description length principle) or LVQ. If a decision tree is used for categorization, these discretization methods create one or several intervals while the tree is being built. The resulting tree can be binary or n-ary, which leads to more accurate and compact trees. To evaluate the trees, we can use n-fold cross-validation or the train-and-test method.
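The entropy-based and Gini measures named above can be computed as follows for a discrete feature. This is a sketch using the standard textbook definitions; the tiny labelled data set (a hypothetical texture feature for tumour versus normal images) is invented purely for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the distribution of values in a list."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(feature, labels):
    """Information gain normalised by the split information (entropy of the feature itself)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

labels  = ["tumour", "tumour", "normal", "normal"]
texture = ["rough", "rough", "smooth", "smooth"]  # perfectly separates the classes
shade   = ["dark", "light", "dark", "light"]      # carries no class information
print(information_gain(texture, labels), information_gain(shade, labels))
print(gain_ratio(texture, labels), gini_index(labels))
```

A feature with higher information gain or gain ratio would be preferred; gain ratio penalises features that split the data into many small groups.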

Feature selection reduces the dimension of the problem and, as a result, improves prediction and decreases computation time. This is achieved by deleting irrelevant, redundant and noisy features, so we always try to select a subset of features. Usually these features are selected via search methods, and different search methods have been developed for this purpose. Popular algorithms include sequential forward selection, sequential backward selection, genetic algorithms, particle swarm optimization and branch and bound feature optimization.
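Sequential forward selection, the first search strategy listed above, can be sketched as a greedy loop. The scoring function is an assumption supplied by the caller; in practice it would typically be the n-fold cross-validated accuracy of a classifier, while the example below uses a hypothetical toy score so that the result is easy to follow.

```python
def sequential_forward_selection(candidates, score, k):
    """Greedy sequential forward selection of up to k features.

    candidates: list of feature names.
    score: caller-supplied function mapping a tuple of feature names to a number
           (higher is better), e.g. cross-validated classifier accuracy.
    """
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        # Try adding each remaining feature and keep the one giving the best score.
        best = max(remaining, key=lambda f: score(tuple(selected) + (f,)))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical usefulness of each feature; a noisy feature lowers the score.
usefulness = {"colour": 0.5, "texture": 0.3, "shape": 0.1, "noise": -0.2}

def toy_score(subset):
    """Toy additive score standing in for a real evaluation such as accuracy."""
    return sum(usefulness[f] for f in subset)

print(sequential_forward_selection(list(usefulness), toy_score, k=2))  # ['colour', 'texture']
```

Sequential backward selection works the same way in reverse, starting from all features and greedily removing the one whose removal hurts the score least.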

3.7. Histogram Equalization

Histogram equalization is a method used for contrast adjustment in image processing. Through this adjustment, intensities are better distributed over the histogram, which allows areas of lower local contrast to gain higher contrast. Histogram equalization accomplishes this by effectively spreading out the most frequent intensity values. The method is very useful for images whose background and foreground are both bright or both dark, such as radiology images. Another histogram method in image processing is the intensity histogram, from which features such as the mean, variance, skewness, elongation, entropy and energy are computed.
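Both techniques in this section can be sketched for an 8-bit greyscale image stored as a list of rows. This is an illustrative sketch, not a reference implementation: equalisation uses the common cumulative-distribution mapping, and the statistical features are computed from the normalised grey-level histogram. Elongation is omitted because its definition varies between sources.

```python
import math

def equalize(image, levels=256):
    """Histogram equalisation of a greyscale image (list of rows of ints in [0, levels))."""
    pixels = [px for row in image for px in row]
    n = len(pixels)
    hist = [0] * levels
    for px in pixels:
        hist[px] += 1
    cdf, running = [], 0          # cumulative distribution of grey levels
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == n:              # single grey level: nothing to spread out
        return [row[:] for row in image]
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [[lut[px] for px in row] for row in image]

def histogram_features(image, levels=256):
    """Mean, variance, skewness, entropy and energy of the normalised grey-level histogram."""
    pixels = [px for row in image for px in row]
    n = len(pixels)
    counts = [0] * levels
    for px in pixels:
        counts[px] += 1
    p = [count / n for count in counts]
    mean = sum(i * pi for i, pi in enumerate(p))
    variance = sum((i - mean) ** 2 * pi for i, pi in enumerate(p))
    skewness = (sum((i - mean) ** 3 * pi for i, pi in enumerate(p)) / variance ** 1.5
                if variance > 0 else 0.0)
    entropy = -sum(pi * math.log2(pi) for pi in p if pi > 0)
    energy = sum(pi ** 2 for pi in p)
    return {"mean": mean, "variance": variance, "skewness": skewness,
            "entropy": entropy, "energy": energy}

low_contrast = [[50, 51], [52, 53]]
print(equalize(low_contrast))  # [[0, 85], [170, 255]]
print(histogram_features([[0, 0], [255, 255]]))
```

The example shows the four closely spaced grey levels being stretched across the full 0 to 255 range, which is the contrast gain the text describes.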

Figure 3: Simple histogram of flowers. Photo courtesy: Sanjaya A.; histogram courtesy: PineTools.

Figure 4: Colour histogram of flowers. Photo courtesy: Sanjaya A.; colour histogram courtesy: Lunapic.

4. Challenges:
Automation of image analysis and subsequent knowledge acquisition based on computer-driven image processing has tremendous potential. The research is still at an early stage and many areas need further exploration. Many issues must be solved before computers can analyse images efficiently and derive knowledge from them:

• Developing a representation of images that encodes the relevant information hidden in an image, instead of the low-level pixel representation, is crucial.

• Another challenge in image mining is the classification of the obtained patterns. Automatically deriving appropriate decision criteria for clustering image representations remains difficult.

• Implementing a suitable indexing method and developing standards for the procedures of indexing and retrieving knowledge from images is also a concern.

• A query language for both visual patterns and text information needs to be developed.

• Analysing and retrieving knowledge from images stored online will be a major challenge for image mining.

5. Conclusion:

Valuable information from sources such as satellite, space, medical and digital images is produced daily in such volume and size that it has become impossible for humans to analyse it manually to extract information or useful patterns for decision-making. Image mining is a new and promising area for knowledge extraction from images; however, it is still at an early stage, and more studies are needed to improve techniques such as image processing, feature extraction, image segmentation and object identification. In this paper, we presented the unique features of image mining, described the general process of image analysis and discussed the main image mining techniques. Furthermore, we introduced image mining as one of the newest research directions in image databases, and we accounted for different methods and techniques for image mining proposed by researchers.

6. References
1. Fayyad U, Shapiro G, Smyth P. Knowledge Discovery and Data Mining [Online]. 2011 [cited 2011 Aug 8]. Available from: http://www.aaai.org/
2. Tan J. Medical Informatics: Concepts, Methodologies, Tools, and Applications. Hershey: IGI Global; 2008.
3. G. Eason, B. Noble, and I. N. Sneddon, "On certain integrals of Lipschitz-Hankel type involving products of Bessel functions," Phil. Trans. Roy. Soc. London, vol. A247, pp. 529–551, April 1955.
4. LaTour KM, Eichenwald S. Health Information Management: Concepts, Principles, and Practice. Chicago: AHIMA; 2002. p. 478-80.
5. Han J, Kamber M, Pei J. Data Mining: Concepts and Techniques. Philadelphia: Elsevier; 2011.
6. I. S. Jacobs and C. P. Bean, "Fine particles, thin films and exchange anisotropy," in Magnetism, vol. III, G. T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271–350.
7. Chen H, Fuller SS, Friedman C, Hersh W. Medical Informatics: Knowledge Management and Data Mining in Biomedicine. New York: Springer; 2005.
8. Maimon OZ, Rokach L. Data Mining and Knowledge Discovery Handbook. New York: Springer Science & Business; 2010. p. 1.
9. Chakrabarti S, Cox E. Data Mining: Know It All. Amsterdam: Morgan Kaufmann; 2009. p. 7.
10. J. Zhang, W. Hsu, M. Lee, "Image Mining: Issues, Frameworks and Techniques," in Proc. of the Second International Workshop on Multimedia Data Mining, San Francisco, USA, August 2001.
11. C. Ordonez, E. Omiecinski, "Image Mining: A New Approach for Data Mining," Technical Report GIT-CC-98-12, Georgia Institute of Technology, College of Computing, 1998.
12. Ji Zhang, Wynne Hsu, Mong Li Lee, "An Information-Driven Framework for Image Mining," Computer Science, School of Computing, National University of Singapore, IEEE, August 2001.
13. Ramadass Sudhir, "A Survey on Image Mining Techniques: Theory and Applications," Computer Engineering and Intelligent Systems, Vol. 2, No. 6, 2011.
14. Monika Sahu, Madhu P. Shrivastava, M. A. Rizvi, "Image Mining: A New Approach for Data Mining Based on Texture," IEEE, 2012.
15. A. Kannan, V. Mohan, N. Anbazhagan, "Image Clustering and Retrieval Using Image Mining Techniques," IEEE, 2010.
16. Nishchol Mishra, Sanjay Silakari, "Image Mining in the Context of Content Based Image Retrieval: A Perspective," IJCSI, Vol. 9, Issue 4, No. 3, July 2012.
17. Sanjay T. Gandhe, K. T. Talele, Avinash G. Keskar, "Image Mining Using Wavelet Transform," Springer-Verlag Berlin Heidelberg, 2007.
18. Tomas Berlage, "Analyzing and mining image database," Drug Discovery Today: BIOSILICO, Vol. 10, No. 11, June 2005.
19. Petra Perner, "Image mining: issues, framework, a generic tool and its application to medical-image diagnosis," Elsevier, 2002.
20. Aswini Kumar Mohanty, Manas Ranjan Senapati, Saroj Kumar Lenka, "A novel image mining technique for classification of mammograms using hybrid feature selection," Springer, 23 February 2012.
21. Chidansh Amitkumar Bhatt, Mohan S. Kankanhalli, "Multimedia data mining: state of the art and challenges," Springer Science+Business Media, LLC, 2010.
22. A. Hema, E. Annasaro, "A survey in need of image mining techniques," International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, Issue 2, February 2013.
