Exploiting Multimodal Synergy for Large Scale and Diverse Image Retrieval in Digital Archives
Sponsor: National Science Foundation
This project is a three-year research and education program (2006 - 2009) that develops a revolutionary approach to large scale and diverse image retrieval in digital archives. Almost all digital archives today contain not only traditional structured data but also multimedia data, and with the rapid development of technology, multimedia data increasingly dominates these archives. Among the modalities of multimedia data, imagery is probably the most popular after text. Consequently, image retrieval has become an important research area in the literature, and it is the focus of this project in developing effective and efficient Multimedia Information Retrieval (MIR) technologies for digital archives.
For this reason, image retrieval has been studied for over a decade as an emerging area called Content Based Image Retrieval (CBIR), and has become a major focus of attention in MIR research. Current research in image retrieval exhibits two notorious bottlenecks: (1) the semantic gap -- the majority of the existing methods in the literature use low-level image features to retrieve images, and it is well established that image features alone are usually insufficient to find similar images because of the gap between the image features and the semantic concepts carried in the image, concepts that are very difficult to represent and use directly in image retrieval; and (2) scalability -- the existing methods in the literature are demonstrated only on very clean data sets (e.g., the Corel data) and very small data sets (typically below 10,000 images), for three reasons: (a) most of the proposed methods are not scalable in nature (e.g., their search complexity is linear in the size of the database); (b) in addition, many existing methods are sensitive to the diversity of image content and quality, so experiments are reported on very clean data such as the Corel collection; and (c) the image retrieval community does not yet have a standard benchmark collection similar to those in the text retrieval community; consequently, each research group typically uses data sets that it either collected on its own or shares with other research groups, and that are typically small in scale. Note that the scalability issue here refers both to scale in the diversity of image content and quality and to scale in the size of the image databases. This observation is supported by recent research in this area, which has noted that the data sets used in most recent automatic image annotation and/or image retrieval systems fail to capture the difficulties inherent in many real image databases.
On the other hand, imagery data often does not exist in isolation; instead, rich collateral information typically co-exists with image data in many applications. Examples include the Web, many domain-archived image databases (in which images carry annotations), and even consumer photo collections. In order to reduce the semantic gap, multimodal approaches to image retrieval have recently been proposed in the literature to explicitly exploit the redundancy between the images and their collateral information. In addition to improved retrieval accuracy, multimodal approaches offer the added benefit of multiple query modalities -- users may query image databases by image, by a collateral information modality (e.g., text), or by any combination of them.
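As a rough illustration of what querying by image, by text, or by both can look like, the sketch below ranks archive items by a weighted sum of per-modality cosine similarities. It is not the method developed in this project; the feature vectors, the `multimodal_search` function, the item layout (`id`, `image`, `text`), and the `weight` parameter are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def multimodal_search(archive, image_query=None, text_query=None, weight=0.5):
    """Return item ids ranked by similarity to the query (best match first).

    Hypothetical layout: each archive item is a dict with an "id", an "image"
    feature vector, and a "text" feature vector. `weight` is the share given
    to the image modality when both query modalities are supplied.
    """
    if image_query is None and text_query is None:
        raise ValueError("provide an image query, a text query, or both")

    scored = []
    for item in archive:  # linear scan over the whole archive
        if image_query is not None and text_query is not None:
            score = (weight * cosine(image_query, item["image"])
                     + (1.0 - weight) * cosine(text_query, item["text"]))
        elif image_query is not None:
            score = cosine(image_query, item["image"])
        else:
            score = cosine(text_query, item["text"])
        scored.append((score, item["id"]))

    # Sort by score only, highest first; ties keep their archive order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored]
```

Note that this sketch scans every item for every query, which is exactly the kind of linear search that the scalability bottleneck described earlier refers to; it is meant only to make the idea of multiple query modalities concrete.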
This project focuses on developing a novel multimodal approach to image retrieval that explicitly exploits the synergy between the multimodal data to address the two bottlenecks simultaneously. Ultimately, this project aims at revolutionizing the research in image retrieval and at developing and advancing proven, working technologies that allow large scale and diverse image retrieval in digital archives.
Specifically, as an integrated research and education program, this project focuses on three objectives to be achieved synergistically: (1) to develop a revolutionary theory and the related methodology for a multimodal approach to large scale and diverse image retrieval that addresses the semantic gap and the scalability issues simultaneously; (2) to extensively evaluate the theory and the methodology using truly large scale and diverse multimodal data; and (3) to develop and evaluate innovative community outreach activities, through the existing research collaboration partnerships in this project, to further promote knowledge dissemination.
The intellectual merit of this project includes a new understanding of image retrieval in the multimodal context as well as the expected breakthrough in effective and efficient image retrieval, which is expected to advance the literature of CBIR as well as MIR and to have a profound impact on related areas including pattern recognition, data mining, and computer vision.
The broader impact of this project is twofold. Educationally, the development, implementation, and evaluation of the innovative community outreach activities in this project shall promote timely and effective dissemination of knowledge related to multimodal image retrieval and further enrich the pedagogical literature; the knowledge disseminated to the collaborating organizations, especially the non-profit organizations, shall further advance and enhance their research and their services to society as a whole. Technologically, the expected breakthrough in image retrieval is expected to enable advances in a wide range of applications, notably including Web search engines, digital libraries, and K-12 learning tools.
NSF Project Manager: Dr. Maria Zemankova
Project Personnel:
PI: Prof. Zhongfei (Mark) Zhang
PhD students:
Bo Long
Zhen Guo
Tianbing Xu
Publications:
Zhen Guo, Zhongfei Zhang, Eric P. Xing, and Christos Faloutsos, Semi-supervised Learning Based on Semiparametric Regularization, 2008 SIAM International Conference on Data Mining, Atlanta, Georgia, 2008
Xi Li, Weiming Hu, Zhongfei (Mark) Zhang, Xiaoqin Zhang, and Quan Luo, Robust Visual Tracking Based on Incremental Tensor Subspace Learning, Proc. the IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, October, 2007
Bo Long, Xiaoyun Wu, Zhongfei (Mark) Zhang, and Philip S. Yu, Community Learning by Graph Approximation, Proc. the IEEE International Conference on Data Mining, Omaha, NE, USA, October, 2007
Xi Li, Weiming Hu, and Zhongfei (Mark) Zhang, Corner Detection of Contour Images Using Spectral Clustering, Proc. the 14th IEEE International Conference on Image Processing, San Antonio, TX, USA, September, 2007
Zhongfei Zhang, Zhen Guo, Christos Faloutsos, Eric P. Xing, and Jia-Yu (Tim) Pan, On the Scalability and Adaptability for Multimodal Image Retrieval and Image Annotation, Proc. International Workshop on Visual and Multimedia Digital Libraries, Palazzo Ducale, Modena, Italy, September, 2007
Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, A Probabilistic Framework for Relational Clustering, Proc. the 13th ACM International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, August, 2007
Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, Enhanced Max Margin Learning on Multimodal Data Mining in a Multimedia Database, Proc. the 13th ACM International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, August, 2007
Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, Graph Partitioning Based on Link Distribution, Proc. the 22nd Annual Conference on Artificial Intelligence (AAAI-07), Vancouver, British Columbia, Canada, July, 2007
Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, A Max Margin Framework on Image Annotation and Multimodal Image Retrieval, Proc. the IEEE Annual International Conference on Multimedia and Expo, Beijing, China, July, 2007
Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, Relational Clustering by Symmetric Convex Coding, Proc. the 24th Annual International Conference on Machine Learning, Oregon State University, OR, USA, June, 2007
Zhongfei (Mark) Zhang, Florent Masseglia, Ramesh Jain, and Alberto Del Bimbo, KDD/MDM 2006: The 7th KDD Multimedia Data Mining Workshop Report, ACM KDD Explorations, accepted, 2006
Ruofei Zhang and Zhongfei (Mark) Zhang, Effective Image Retrieval Based on Hidden Concept Discovery in Image Database, IEEE Transactions on Image Processing, Volume 16, Number 2, February, 2007, pp 562-572
Arif Ghafoor, Zhongfei (Mark) Zhang, Michael S. Lew, and Zhi-Hua Zhou, Guest Editors' Introduction to Machine Learning Approaches to Multimedia Information Retrieval, ACM Multimedia Systems Journal, Springer, 2006
Zhongfei (Mark) Zhang, Querying Non-Uniform Image Databases for Biometrics-Related Identification Applications, Sensor Review, Emerald Publishers, Volume 26, Number 2, April, 2006, pp 122-126
Ruofei Zhang and Zhongfei (Mark) Zhang, Empirical Bayesian Learning in the Relevance Feedback of Image Retrieval, Image and Vision Computing, Elsevier Science, Volume 24, Issue 3, March, 2006, pp 211-223
Ruofei Zhang, Zhongfei (Mark) Zhang, Mingjing Li, Wei-Ying Ma, and Hong-Jiang Zhang, A Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieval, ACM Multimedia Systems Journal, the special issue of Using Machine Learning Approaches to Multimedia Information Retrieval, Springer, 2006
Jian Yao and Zhongfei (Mark) Zhang, Hierarchical Shadow Detection for Color Aerial Images, Computer Vision and Image Understanding, Elsevier Science, Volume 102, Issue 1, April, 2006, pp 60-69
Bo Long, Xiaoyun Wu, Zhongfei (Mark) Zhang, and Philip S. Yu, Unsupervised Learning on K-partite Graphs, Proc. ACM International Conference on Knowledge Discovery and Data Mining, ACM Press, Philadelphia, PA, USA, August, 2006
Bo Long, Zhongfei (Mark) Zhang, Xiaoyun Wu, and Philip S. Yu, Spectral Clustering for Multi-Type Relational Data, Proc. International Conference on Machine Learning, ACM Press, Pittsburgh, PA, USA, June, 2006
Xiao-Bing Xue, Zhi-Hua Zhou, and Zhongfei (Mark) Zhang, Improve Web Search Using Image Snippets, Proc. the 21st National Conference on Artificial Intelligence, AAAI Press, Boston, MA, USA, July, 2006
Jian Yao, Zhongfei (Mark) Zhang, Sameer Antani, Rodney Long, and George Thoma, Automatic Medical Image Annotation and Retrieval Using SEMI-SECC, Proc. IEEE International Conference on Multimedia and Expo, IEEE Computer Society Press, Toronto, Canada, July, 2006
Jian Yao, Sameer Antani, Rodney Long, George Thoma, and Zhongfei (Mark) Zhang, Automatic Medical Image Annotation and Retrieval Using SECC, Proc. IEEE International Symposium on Computer Based Medical Systems, IEEE Computer Society Press, Salt Lake City, Utah, USA, June, 2006
Code Release:
EMML code (based on the paper: Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, Enhanced Max Margin Learning on Multimodal Data Mining in a Multimedia Database, Proc. the 13th ACM International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, August, 2007)
Partners:
This material is based upon work supported by the National Science Foundation under Award No. 0535162.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.