Relational Data Community Discovery and Learning
Sponsor: National Science Foundation
This project is a three-year integrated research and education program. It undertakes in-depth research on a series of fundamental, open, and important issues in relational data community discovery and learning, building on our existing strength in state-of-the-art research on this topic.
The intellectual merit of this project lies in a substantially improved understanding of unsupervised clustering and learning for general relational data, together with the expected breakthroughs in community discovery and learning methodologies. These results are expected to advance the literature of data mining and machine learning and to have a profound impact on related areas.
The broader impacts of this project are twofold. Educationally, the development, implementation, and evaluation of the innovative community outreach activities proposed in this project shall promote timely and effective dissemination of knowledge on relational data mining and machine learning and shall enrich the pedagogical literature. The knowledge disseminated to the collaborating parties, especially the collaborating high school, shall further enhance high school education services and syllabi and provide a model for high schools' research and service to society. Technologically, the expected breakthrough in developing a novel theory of relational data community discovery and learning is expected to enable advances in a wide range of applications and, in particular, shall benefit the collaborating organizations in advancing their domain expertise in applications related to social network mining in general and Web data mining in particular.
The world is full of data, and these data are highly interrelated through different types of data objects such as people, organizations, and events. Many applications aim to discover the hidden structures formed by relationships among different types of data objects, in addition to "clusters" of data objects of the same type. For example, in financial services, it is often necessary to identify potentially fraudulent activities hidden among the normal transactions that involve people and financial institutions; in commercial sales, it is often necessary to link customer purchase patterns to potential sales promotion strategies in order to identify what kinds of customers are related to what kinds of commercial products through what kinds of service providers; in the Web search industry, it is highly desirable to identify what kinds of users use what kinds of Web pages and are highly influenced by what kinds of advertisements related to what kinds of commercial industries.
At the same time, we all too often do not have the luxury of training data with ground truth for knowledge discovery. Consequently, unsupervised relational data learning is needed in all these situations.
In this research, we focus on the most general scenario of relational data: the data objects may have attributes, homogeneous relations (among data objects of the same type), and/or heterogeneous relations (between data objects of different types). All practical situations are special cases of this general scenario, so the unified theory and the related methodologies we aim to develop in this research shall be applicable to any real-world relational data knowledge discovery problem. This generality makes the proposed work fundamentally new and distinct from the existing literature and may lead to revolutionary technology development. Accordingly, we define a relational data community in a broad sense that includes not only the local clusters of data objects of the same type but also, more importantly, the global, hidden structures that incorporate relationships among different types of data objects.
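As an illustration of the general scenario described above, the following is a minimal sketch of a container holding the three ingredients: per-object attributes, homogeneous relations, and heterogeneous relations. The class name `RelationalData`, its methods, and the "user"/"page" example are hypothetical and chosen only for illustration; they are not taken from the project's code or publications.

```python
from dataclasses import dataclass, field


@dataclass
class RelationalData:
    """Illustrative container for general relational data (hypothetical names)."""
    # objects[type] -> set of object identifiers of that type
    objects: dict = field(default_factory=dict)
    # attributes[type][obj] -> attribute (feature) vector
    attributes: dict = field(default_factory=dict)
    # homogeneous[type][(a, b)] -> weight of a relation within one type
    homogeneous: dict = field(default_factory=dict)
    # heterogeneous[(type_a, type_b)][(a, b)] -> weight of a relation across types
    heterogeneous: dict = field(default_factory=dict)

    def add_object(self, obj_type, obj_id, attrs=None):
        """Register an object and, optionally, its attribute vector."""
        self.objects.setdefault(obj_type, set()).add(obj_id)
        if attrs is not None:
            self.attributes.setdefault(obj_type, {})[obj_id] = list(attrs)

    def relate(self, type_a, a, type_b, b, weight=1.0):
        """Record a relation; it is homogeneous when both types are equal."""
        for t, o in ((type_a, a), (type_b, b)):
            if o not in self.objects.get(t, set()):
                raise KeyError(f"unknown {t} object: {o!r}")
        if type_a == type_b:
            self.homogeneous.setdefault(type_a, {})[(a, b)] = weight
        else:
            self.heterogeneous.setdefault((type_a, type_b), {})[(a, b)] = weight

    def components(self):
        """Report which of the three ingredients are present in this data set."""
        return {
            "attributes": any(self.attributes.values()),
            "homogeneous": any(self.homogeneous.values()),
            "heterogeneous": any(self.heterogeneous.values()),
        }


# Hypothetical example: two users and one Web page.
data = RelationalData()
data.add_object("user", "u1", attrs=[0.2, 0.8])
data.add_object("user", "u2")
data.add_object("page", "p1")
data.relate("user", "u1", "user", "u2")               # homogeneous relation
data.relate("user", "u1", "page", "p1", weight=3.0)   # heterogeneous relation
```

Under this sketch, a data set with only attributes, only same-type links, or only cross-type links is simply an instance in which some of the dictionaries stay empty, which is one way to read the statement that practical situations are special cases of the general scenario.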
Relational data community discovery and learning is a fairly new area in which many challenging and fundamentally new issues remain completely open. At the same time, solutions to these issues may lead to revolutionary technology development with significant societal impact.
The work in this project is radically new because it builds on innovative preliminary research and addresses a set of fundamentally new problems with fundamentally new solutions. These solutions aim not only to deepen understanding of the literature but, more importantly, to generate revolutionary technology development with significant societal impact. Specifically, this project focuses on three objectives to be achieved synergistically: (1) to address a series of challenging, fundamentally new, and important issues in relational data community discovery and learning, leading to a unified, fundamentally new theory on this topic; (2) to extensively evaluate the theory and methodologies to be developed, in collaboration with domain experts in the Web search industry, as a specific application to social network mining; and (3) to develop and evaluate innovative community outreach and education activities through the existing partnership with a local high school to further promote dissemination of the knowledge from this research.
NSF Project Manager: Dr. Maria Zemankova
Project Personnel:
PI: Prof. Zhongfei (Mark) Zhang
PhD student:
Partners:
Publications:
Zhen Guo Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, Relational Data Clustering: Models, Algorithms, and Applications, Taylor & Francis/CRC Press, 2009, ISBN: 9781420072617
Zhongfei Zhang and Haroon Khan, A Holistic, In-Compression Approach to Mining Independent Motion Segments for Massive Surveillance Video Collections, in Video Search and Mining, Edited by Dan Schonfeld, Caifeng Shan, Dacheng Tao, and Liang Wan, Springer, 2009
Zhongfei (Mark) Zhang and Ruofei Zhang, Multimedia Data Mining, in Data Mining and Knowledge Discovery Handbook, 2nd Ed., Edited by Oded Maimon and Lior Rokach, Springer, 2009
Zhongfei (Mark) Zhang, Bo Long, Zhen Guo, Tianbing Xu, and Philip S. Yu, Machine Learning Approaches to Link-Based Clustering, in Link Mining: Models, Algorithms and Applications, Edited by Philip S. Yu, Christos Faloutsos, and Jiawei Han, Springer, 2009
Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, A Max Margin Framework on Image Annotation and Multimodal Image Retrieval, in Multimedia, Edited by Vedran Kordic, IN-TECH, 2009
Zhongfei (Mark) Zhang, Zhen Guo, Christos Faloutsos, Eric P. Xing, and Jia-Yu Pan, On the scalability and adaptability for multimodal image retrieval and image annotation, in Machine Learning Techniques for Adaptive Multimedia Retrieval: Technologies Applications and Perspectives, Edited by Roger Wei, Idea Group Inc., 2010
Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, A General Framework for Relation Graph Clustering, Knowledge and Information Systems Journal, Elsevier Science Press, Accepted, 2009
Zhen Guo, Zhongfei (Mark) Zhang, Shenghuo Zhu, Yun Chi, and Yihong Gong, Knowledge Discovery from Citation Networks, Proc. IEEE International Conference on Data Mining, Miami, FL, USA, December, 2009
Zhen Guo, Shenghuo Zhu, Yun Chi, Zhongfei (Mark) Zhang, and Yihong Gong, A latent topic model for linked documents, Proc. ACM International Conference SIGIR, Boston, MA, USA, July, 2009
Xi Li, Weiming Hu, Zhongfei (Mark) Zhang, and Yang Liu, Spectral Graph Partitioning Based on A Random Walk Diffusion Similarity Measure, Proc. Asian Conference on Computer Vision, Xi'an, China, September, 2009
Code Release:
Zhen Guo, Zhongfei (Mark) Zhang, Shenghuo Zhu, Yun Chi, and Yihong Gong, Knowledge Discovery from Citation Networks, Proc. IEEE International Conference on Data Mining, Miami, FL, USA, December, 2009 [code]
Data Release:
Zhen Guo, Zhongfei (Mark) Zhang, Shenghuo Zhu, Yun Chi, and Yihong Gong, Knowledge Discovery from Citation Networks, Proc. IEEE International Conference on Data Mining, Miami, FL, USA, December, 2009 [data]
This material is based upon the work supported by the National Science Foundation under Award No. 0812114.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.