Exploiting Multimodal Synergy for Large Scale and Diverse Image Retrieval in Digital Archives
Sponsor: National Science Foundation

This project is a three-year research and education program (2006 - 2009)
focused on developing a revolutionary approach to large-scale and diverse
image retrieval in digital archives. Almost all digital archives today contain
not only traditional structured data but, increasingly, multimedia data; with
the rapid development of technology, multimedia data has become more and more
dominant in digital archives. Among the multimedia modalities typically present
in digital archives, imagery is probably the most popular after text.
Consequently, image retrieval has become an important research area in the
literature, and it is the focus of this work toward effective and efficient
Multimedia Information Retrieval (MIR) technologies for digital archives.
For this reason, image retrieval has been studied for over a decade under the
name Content-Based Image Retrieval (CBIR) and has become a major focus of MIR
research. Current research in image retrieval exhibits two notorious
bottlenecks. (1) The semantic gap: the majority of existing methods in the
literature retrieve images using low-level image features, and it is well
established that such features alone are usually insufficient for finding
similar images because of the gap between the image features and the semantic
concepts carried in the image; semantic concepts have proven very difficult to
represent and use directly in image retrieval. (2) Scalability: the existing
methods in the literature are demonstrated only on very clean data sets (e.g.,
the Corel data) and very small data sets (typically below 10,000 images), for
three reasons: (a) most of the proposed methods are not scalable by nature
(e.g., they rely on linear search); (b) many existing methods are also
sensitive to the diversity of image content and quality, which leads to
experiments being reported on very clean data such as the Corel collection;
and (c) the image retrieval community does not yet have a standard benchmark
collection comparable to those in the text retrieval community, so each
research group typically uses data sets that it collected itself or shares with
other groups, and these are typically small in scale. Note that scalability
here refers both to the diversity of image content and quality and to the size
of the image databases. This observation is supported by recent research in the
area, which has noted that the data sets used in most recent automatic image
annotation and/or image retrieval systems fail to capture the difficulties
inherent in many real image databases.
On the other hand, it is well observed that imagery data often does not exist
in isolation; instead, there is typically rich collateral information
co-existing with image data in many applications. Examples include the Web,
many domain-specific archived image databases (in which images carry
annotations), and even consumer photo collections. To reduce the semantic gap,
multimodal approaches to image retrieval have recently been proposed in the
literature to explicitly exploit the redundancy between the images and their
collateral information. In addition to improved retrieval accuracy, multimodal
approaches offer multiple query modalities -- users may query image databases
by image, by a collateral information modality (e.g., text), or by any
combination of these.
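As a rough illustration of the multiple query modalities described above, the
following Python sketch ranks a toy collection by image features, by text
features, or by a weighted combination of both. The feature vectors, the
database layout, the `weight` parameter, and the function names are all
hypothetical and chosen only for illustration; this is a simple score-fusion
baseline under those assumptions, not the approach developed in this project.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(database, image_query=None, text_query=None, weight=0.5):
    """Return item ids sorted by similarity to an image query, a text query, or both.

    database: dict mapping id -> (image_vector, text_vector)  (hypothetical layout)
    weight:   share given to the image score when both queries are present
    """
    if image_query is None and text_query is None:
        raise ValueError("provide an image query, a text query, or both")
    scores = {}
    for item_id, (img_vec, txt_vec) in database.items():
        if text_query is None:
            score = cosine(image_query, img_vec)
        elif image_query is None:
            score = cosine(text_query, txt_vec)
        else:
            # Weighted sum of the two per-modality similarities.
            score = (weight * cosine(image_query, img_vec)
                     + (1 - weight) * cosine(text_query, txt_vec))
        scores[item_id] = score
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (assumed): made-up 3-d image features and 2-d text features.
DATABASE = {
    "beach": ([0.9, 0.1, 0.0], [1.0, 0.0]),
    "forest": ([0.1, 0.9, 0.0], [0.0, 1.0]),
    "sunset": ([0.8, 0.0, 0.2], [0.5, 0.5]),
}

by_image = rank(DATABASE, image_query=[1.0, 0.0, 0.0])
by_text = rank(DATABASE, text_query=[0.0, 1.0])
by_both = rank(DATABASE, image_query=[1.0, 0.0, 0.0], text_query=[0.0, 1.0])
```

On this toy data the combined query ranks "sunset" first, although neither
single-modality query does. The loop scans every item, which is the kind of
linear search that does not scale to large collections.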
This project focuses on developing a novel multimodal approach to image
retrieval that explicitly exploits the synergy among the multimodal data to
address the two bottlenecks simultaneously. Ultimately, this project aims to
revolutionize research in image retrieval and to develop and advance proven,
working technologies that enable large-scale and diverse image retrieval in
digital archives.
Specifically, as an integrated research and education program, this project
pursues three objectives synergistically: (1) to develop a revolutionized
theory and the related methodology for a multimodal approach to large-scale
and diverse image retrieval that addresses the semantic gap and scalability
issues simultaneously; (2) to extensively evaluate the theory and the
methodology using truly large-scale and diverse multimodal data; and (3) to
develop and evaluate innovative community outreach activities through the
existing research collaborations in this project to further promote knowledge
dissemination.
The intellectual merit of this project includes a revolutionized understanding
of image retrieval in the multimodal context as well as the expected
breakthrough in effective and efficient image retrieval, which is expected to
advance the CBIR and MIR literature and to have a profound impact on related
areas including pattern recognition, data mining, and computer vision.
The broader impact of this project is twofold. Educationally, the development,
implementation, and evaluation of the innovative community outreach activities
in this project will promote timely and effective dissemination of knowledge
related to multimodal image retrieval and further enrich the pedagogical
literature; the knowledge disseminated to the collaborating organizations,
especially the non-profit organizations, will further advance and enhance their
research and their services to society as a whole. Technologically, the
expected breakthrough in image retrieval should usher in a new era of
technological advances in a wide range of applications, notably Web search
engines, digital libraries, and K-12 learning tools.

NSF Project Manager: Dr. Maria Zemankova

Project Personnel:
PI: Prof. Zhongfei (Mark) Zhang
PhD student:
Master's student:

Publications:
- Zhongfei (Mark) Zhang and Ruofei Zhang, Multimedia Data Mining -- A Systematic
Introduction to Concepts and Theory, Taylor & Francis Group/CRC Press, 2008,
ISBN: 9781584889663
- Ruofei Zhang and Zhongfei (Mark) Zhang, Solving Small and Asymmetric Sampling
Problem in the Context of Image Retrieval, in Artificial Intelligence for
Maximizing Content Based Image Retrieval, Edited by Zongmin Ma, Idea Group
Inc., 2008
- Jian Yao, Zhongfei (Mark) Zhang, Sameer Antani, Rodney Long, and George Thoma,
Automatic Medical Image Annotation and Retrieval, Neurocomputing, Elsevier
Science Press, Volume 71/10-12, 2008, pp 2012-2022
- Xiao-Bing Xue, Zhi-Hua Zhou, and Zhongfei (Mark) Zhang, Improving Web Search
Using Image Snippets, ACM Transactions on Internet Technology, ACM Press, in
press, 2008
- Zhongfei (Mark) Zhang, Haroon Khan, and Mark A. Robertson, A Holistic,
In-Compression Approach to Video Segmentation for Independent Motion Detection,
EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Co.,
Article ID 738158, 9 pages, doi:10.1155/2008/738158, Volume 2008, 2008
- Zhongfei (Mark) Zhang, Florent Masseglia, Ramesh Jain, and Alberto Del Bimbo,
Editorial: Introduction to the Special Issue on Multimedia Data Mining, IEEE
Transactions on Multimedia, IEEE Computer Society Press, Volume 10, Number 2,
2008, pp 165 -- 166
- Tianbing Xu, Zhongfei Zhang, Philip S. Yu, and Bo Long, Evolutionary Clustering
by Hierarchical Dirichlet Process with Hidden Markov State, Proc. IEEE
International Conference on Data Mining, Pisa, Italy, December, 2008, (9.6%
acceptance rate)
[pdf]
- Tianbing Xu, Zhongfei Zhang, Philip S. Yu, and Bo Long, Dirichlet Process Based
Evolutionary Clustering, Proc. IEEE International Conference on Data Mining,
Pisa, Italy, December, 2008, (9.6% acceptance rate)
[pdf]
- Xi Li, Zhongfei Zhang, Yanguo Wang, and Weiming Hu, Multiclass Spectral
Clustering Based on Discriminant Analysis, Proc. International Conference on
Pattern Recognition, Tampa, FL, USA, December, 2008, (20.0% acceptance rate)
- Xi Li, Weiming Hu, Zhongfei Zhang, and Xiaoqin Zhang, Robust Visual Tracking
Based on An Effective Appearance Model, Proc. European Conference on Computer
Vision, Marseille, France, October, 2008, (23.3% acceptance rate)
- Xi Li, Weiming Hu, Zhongfei Zhang, and Xiaoqin Zhang, Robust Foreground
Segmentation Based on Two Effective Background Models, Proc. ACM International
Conference on Multimedia Information Retrieval, Vancouver, Canada, October,
2008, (20.0% acceptance rate)
[pdf]
- Xi Li, Weiming Hu, Zhongfei Zhang, Xiaoqin Zhang, and Guan Luo,
Trajectory-Based Video Retrieval Using Dirichlet Process Mixture Models, Proc.
British Machine Vision Conference, Leeds, UK, September, 2008, (12.5%
acceptance rate)
- Bo Long, Zhongfei (Mark) Zhang, and Tianbing Xu, Clustering on Complex Graphs,
Proc. 23rd Conference on Artificial Intelligence (AAAI 2008), Chicago, IL, USA,
July, 2008, (24% acceptance rate)
[pdf]
- Xi Li, Weiming Hu, Zhongfei Zhang, Xiaoqin Zhang, Mingliang Zhu, Jian Cheng,
and Guan Luo, Visual tracking via incremental log-Euclidean Riemannian subspace
learning, Proc. IEEE Conference on Computer Vision and Pattern Recognition,
Anchorage, Alaska, USA, June 2008, (27.9% acceptance rate)
[pdf]
- Ming Li, Zhongfei (Mark) Zhang, and Zhi-Hua Zhou, Mining Bulletin Board Systems
Using Community Generation, Proc. Pacific and Asia Knowledge Discovery and Data
Mining Conference, Osaka, Japan, May 2008, (11.9% acceptance rate)
- Bo Long, Philip S. Yu, and Zhongfei (Mark) Zhang, A general model for multiple
view unsupervised learning, Proc. the SIAM International Conference on Data
Mining, Atlanta, GA, 2008, (14% acceptance rate)
- Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos,
Semi-supervised learning based on semiparametric regularization, Proc. the SIAM
International Conference on Data Mining, Atlanta, GA, 2008, (14% acceptance
rate)
[pdf]
- Xi Li, Weiming Hu, Zhongfei (Mark) Zhang, Xiaoqin Zhang, and Guan
Luo, Robust Visual Tracking Based on Incremental Tensor Subspace Learning,
Proc. the IEEE International Conference on Computer Vision, Rio de Janeiro,
Brazil, October, 2007
[pdf]
- Bo Long, Xiaoyun Wu, Zhongfei (Mark) Zhang, and Philip S. Yu,
Community Learning by Graph Approximation, Proc. the IEEE International
Conference on Data Mining, Omaha, NE, USA, October, 2007
[pdf]
- Xi Li, Weiming Hu, and Zhongfei (Mark) Zhang, Corner Detection of Contour
Images Using Spectral Clustering, Proc. the 14th IEEE International Conference
on Image Processing, San Antonio, TX, USA, September, 2007
- Zhongfei Zhang, Zhen Guo, Christos Faloutsos, Eric P. Xing, and Jia-Yu (Tim)
Pan, On the scalability and adaptability for multimodal image retrieval and
image annotation, Proc. International Workshop on Visual and Multimedia Digital
Libraries, Modena, Palazzo Ducale, Italy, September, 2007
- Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, A Probabilistic Framework for
Relational Clustering, Proc. the 13th ACM International Conference on Knowledge
Discovery and Data Mining, San Jose, CA, USA, August, 2007
- Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, Enhanced
Max Margin Learning on Multimodal Data Mining in a Multimedia Database, Proc.
the 13th ACM International Conference on Knowledge Discovery and Data Mining,
San Jose, CA, USA, August, 2007
- Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, Graph Partitioning Based on
Link Distribution, Proc. the 22nd Annual Conference on Artificial Intelligence
(AAAI-07), Vancouver, British Columbia, Canada, July, 2007
- Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, A Max
Margin Framework on Image Annotation and Multimodal Image Retrieval, Proc. the
IEEE Annual International Conference on Multimedia and Expo, Beijing, China,
July, 2007
- Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, Relational Clustering by
Symmetric Convex Coding, Proc. the 24th Annual International Conference on
Machine Learning, Oregon State University, OR, USA, June, 2007
[pdf]
- Zhongfei (Mark) Zhang, Florent
Masseglia, Ramesh Jain, and Alberto Del Bimbo, KDD/MDM 2006: The 7th KDD
Multimedia Data Mining Workshop Report, ACM KDD Explorations, accepted, 2006
- Ruofei Zhang and Zhongfei (Mark) Zhang, Effective Image Retrieval Based on
Hidden Concept Discovery in Image Database, IEEE Transactions on Image
Processing, Volume 16, Number 2, February, 2006, pp 562 -- 572
- Arif Ghafoor, Zhongfei (Mark) Zhang, Michael S.
Lew, and Zhi-Hua Zhou, Guest Editors' Introduction to Machine Learning
Approaches to Multimedia Information Retrieval, ACM Multimedia Systems Journal,
Springer, 2006
- Zhongfei (Mark) Zhang, Querying Non-Uniform Image Databases for
Biometrics-Related Identification Applications, Sensor Review, Emerald
Publishers, Volume 26, Number 2, April, 2006, pp 122-126
- Ruofei Zhang and Zhongfei (Mark) Zhang, Empirical Bayesian Learning in the
Relevance Feedback of Image Retrieval, Image and Vision Computing, Elsevier
Science, Volume 24, Issue 3, March, 2006, pp 211-223
- Ruofei Zhang, Zhongfei
(Mark) Zhang, Mingjing Li, Wei-Ying Ma, and Hong-Jiang Zhang, A Probabilistic
Semantic Model for Image Annotation and Multi-Modal Image Retrieval, ACM
Multimedia Systems Journal, the special issue of Using Machine Learning
Approaches to Multimedia Information Retrieval, Springer, 2006
- Jian Yao and Zhongfei (Mark) Zhang, Hierarchical Shadow Detection for Color
Aerial Images, Computer Vision and Image Understanding, Elsevier Science,
Volume 102, Issue 1, April, 2006, pp 60-69
- Bo Long, Xiaoyun Wu, Zhongfei (Mark) Zhang, and Philip S. Yu, Unsupervised
Learning on K-partite Graphs, Proc. ACM International Conference on Knowledge
Discovery and Data Mining, ACM Press, Philadelphia, PA, USA, August, 2006
[pdf]
- Bo Long, Zhongfei (Mark) Zhang, Xiaoyun Wu, and Philip S. Yu, Spectral
Clustering for Multi-Type Relational Data, Proc. International Conference on
Machine Learning, ACM Press, Pittsburgh, PA, USA, June, 2006
[pdf]
- Xiao-Bing Xue, Zhi-Hua Zhou, and Zhongfei (Mark) Zhang, Improve Web Search
Using Image Snippets, Proc. the 21st National Conference on Artificial
Intelligence, AAAI Press, Boston, MA, USA, July, 2006
- Jian Yao, Zhongfei (Mark) Zhang, Sameer Antani, Rodney Long, and George Thoma,
Automatic Medical Image Annotation and Retrieval Using SEMI-SECC, Proc. IEEE
International Conference on Multimedia and Expo, IEEE Computer Society Press,
Toronto, Canada, July, 2006
- Jian Yao, Sameer Antani, Rodney Long, George Thoma, and Zhongfei (Mark) Zhang,
Automatic Medical Image Annotation and Retrieval Using SECC, Proc. IEEE
International Symposium on Computer Based Medical Systems, IEEE Computer
Society Press, Salt Lake City, Utah, USA, June, 2006
[pdf]

Code Release:
· EMML code (based on the paper: Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing,
and Christos Faloutsos, Enhanced Max Margin Learning on Multimodal Data Mining
in a Multimedia Database, Proc. the 13th ACM International Conference on
Knowledge Discovery and Data Mining, San Jose, CA, USA, August, 2007)

Partners:

This material is based upon work supported by the National Science Foundation
under Award No. 0535162.
Any opinions, findings, and conclusions or recommendations expressed in this
material are those of the author(s) and do not necessarily reflect the views of
the National Science Foundation.