Research Statement
1. Indexing & Retrieval of Large Chinese Calligraphic Character Databases
The large body of Chinese calligraphic scripts in existence is a valuable
part of Chinese cultural heritage. As shown in Fig. 1, although these
scripts are available through public libraries or the Internet, they can
hardly be retrieved by optical character recognition (OCR), which performs
well only on machine-printed characters against a clean background. No
effective technique supports the retrieval of Chinese calligraphic
characters written in different styles, for the following reasons:
(1) Complexity: A character is composed of 12.71 strokes on average [Zhang
1992; Wu et al, 1992]. The size of a stroke varies with the total number of
strokes composing the character, and the size of a stroke segment depends
on the total number of segments composing the stroke.
(2) Deformation: The same writer in different moods can produce different
styles of the same character. Sometimes a character is deformed
deliberately for a better artistic effect.
(3) Degradation: Many ancient calligraphic works have been degraded by
natural changes.
Moreover, no efficient techniques have so far been proposed to index and
retrieve large Chinese calligraphic character databases. In essence,
indexing Chinese calligraphic characters is a high-dimensional data
indexing problem, on which considerable research has been done [Bohm et al,
2001].
Fig. 1. An example of a Chinese calligraphic character script
Unfortunately, the existing indexing methods for high-dimensional data
cannot be directly applied to Chinese calligraphic characters due to their
unique characteristics:
● To depict each Chinese calligraphic character, a set of contour points is
extracted from its image [Cadal 2005]. Because Chinese characters are
complex, the number of contour points extracted from each character image
is very large (in general, above 150 dimensions). As a consequence, if each
point occupies an entry in the character’s vector representation, the
dimensionality of the vector is very high, and conventional
multi-dimensional indexing techniques such as the R-tree [Guttman 1984] and
the k-d-tree [Bentley 1975] cannot be used because of the “curse of
dimensionality”.
● The number of contour points differs from character to character,
depending on the complexity of the shape. Thus the dimensionalities of the
representative vectors of different characters may vary. Many existing
high-dimensional indexing schemes (e.g., R-tree [Guttman 1984], VA-file
[Weber et al. 1998], etc.) are not suitable for indexing Chinese
calligraphic characters because they cannot handle high-dimensional search
with dynamic dimensionalities well.
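The second characteristic can be illustrated with a small sketch. It is not the contour extraction of [Cadal 2005]. It is a simplified stand-in that treats an ink pixel as a contour point when one of its four neighbours is background or lies outside the image. The toy glyphs, the function names `contour_points` and `to_vector`, and the pixel encoding are all assumptions made for illustration. The point is only that two characters of different complexity yield vectors of different dimensionality.

```python
from typing import List, Tuple

Grid = List[str]  # hypothetical encoding: '#' is ink, '.' is background


def contour_points(glyph: Grid) -> List[Tuple[int, int]]:
    """Return (row, col) of ink pixels that touch background or the border."""
    height, width = len(glyph), len(glyph[0])
    points = []
    for r in range(height):
        for c in range(width):
            if glyph[r][c] != "#":
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            # An out-of-image neighbour counts as background.
            on_contour = any(
                not (0 <= nr < height and 0 <= nc < width)
                or glyph[nr][nc] != "#"
                for nr, nc in neighbours
            )
            if on_contour:
                points.append((r, c))
    return points


def to_vector(points: List[Tuple[int, int]]) -> List[int]:
    """Flatten contour points so that each coordinate is one vector entry."""
    return [coord for point in points for coord in point]


# Two toy glyphs of different complexity (one stroke vs. two strokes).
ONE_STROKE = [
    ".....",
    "#####",
    ".....",
]
TWO_STROKES = [
    "..#..",
    "#####",
    "..#..",
]

print(len(to_vector(contour_points(ONE_STROKE))))   # 10
print(len(to_vector(contour_points(TWO_STROKES))))  # 12
```

Even on these tiny images the two vectors have different lengths, which is the situation that fixed-dimensionality index structures do not accommodate directly.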
In this project, we propose a novel high-dimensional indexing scheme for
effective and efficient retrieval of Chinese calligraphic characters (see
[Zhuang et al. TALIP 07] and [Zhuang et al. CIKM’06]).
Here are three demos: [Demo 1], [Demo 2], [Demo 3]
2. Database Support for Efficient Large-Scale Cross-Media Retrieval
Content-based multimedia retrieval and indexing have been studied
extensively for many years. Existing approaches, however, focus mainly on
single-modality retrieval, such as content-based image retrieval [1] and
content-based video retrieval [2]. Yet media objects of different
modalities (e.g., image, audio and video) in webpages are, to some extent,
latently correlated in their semantics. Cross-media retrieval [5], a new
multimedia retrieval method, has received increasing attention in the
research community. It returns media objects of multiple modalities in
response to a query media object of a single modality. For example, when a
user submits a “tiger” image, the retrieval system may return
“tiger”-related audio clips and video clips by using a so-called Cross
Reference Graph [5]. Compared with traditional single-modality retrieval
[1,2], cross-media retrieval tries to break through the restriction of
media modality in multimedia retrieval. Up to now, research on cross-media
retrieval and indexing has barely begun. Its fundamental challenges are
twofold:
● How to model the correlation of media objects of different modalities?
For example, it is hard to measure the correlation between an image and an
audio clip that carry the same semantic information, since they are media
objects of two different modalities.
● To facilitate cross-media retrieval across different modalities (e.g.,
image, audio and video) effectively and efficiently, how to generate and
index a cross reference graph via link analysis among the webpages
downloaded from the Internet?
To the best of
our knowledge, Octopus [5] is the first prototype system to support
multi-modality retrieval. No index, however, is considered in this system,
so the retrieval performance of Octopus is unsatisfactory when the number
of media objects becomes large. To speed up large-scale cross-media
retrieval, multi-feature indexing methods have received much attention. In
[3], Jagadish proposed a multi-feature indexing method. Shen et al. [4]
also proposed a multi-scale indexing (MSI) structure for web image
retrieval, which nicely combines two modalities of image information,
namely semantic and visual features. However, the usability of MSI is very
limited, and it cannot support a unified indexing expression for
multi-modality media objects. In this project, based on a cross reference
graph [5], we propose a novel integrated indexing scheme called CIndex (see
[Zhuang et al. JOS 2008]), which is specifically designed for indexing
cross-media retrieval over large multi-modality media databases. With the
aid of CIndex, a cross-media retrieval of a query example in
high-dimensional space is transformed into a range query in a
single-dimensional space. Moreover, our technique permits the immediate
generation of a few results while additional results are searched for.
Here is a demo: [Demo 4]
(a) By image
(b) By audio
(c) By video
Fig. 2. An example of cross-media retrieval
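The idea of turning a high-dimensional query into a range query over a single dimension can be sketched generically. The code below is not CIndex, whose design is given in [Zhuang et al. JOS 2008]. It is a minimal illustration of one common reduction, assumed here for the sake of example: each object is keyed by its Euclidean distance to a reference point, the keys are kept sorted, and a similarity query with radius r becomes a one-dimensional range scan followed by an exact check. The class name `OneDimIndex`, the choice of reference point and the sample data are all hypothetical.

```python
import bisect
import math
from typing import List, Sequence

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class OneDimIndex:
    """Keys each object by its distance to a single reference point."""

    def __init__(self, objects: List[Vector], reference: Vector) -> None:
        self.objects = objects
        self.reference = reference
        keyed = sorted((dist(obj, reference), i) for i, obj in enumerate(objects))
        self.keys = [key for key, _ in keyed]  # sorted one-dimensional keys
        self.ids = [i for _, i in keyed]       # object ids in key order

    def range_query(self, query: Vector, radius: float) -> List[int]:
        """Return ids of objects within `radius` of `query`."""
        q_key = dist(query, self.reference)
        # By the triangle inequality, every object within `radius` of the
        # query has a key in [q_key - radius, q_key + radius].
        lo = bisect.bisect_left(self.keys, q_key - radius)
        hi = bisect.bisect_right(self.keys, q_key + radius)
        # The key range may contain false positives, so verify each candidate.
        return sorted(
            i for i in self.ids[lo:hi]
            if dist(self.objects[i], query) <= radius
        )


# Hypothetical feature vectors; the reference point is chosen arbitrarily.
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
index = OneDimIndex(features, reference=(0.0, 0.0))
print(index.range_query((1.0, 1.0), radius=1.0))  # [1]
print(index.range_query((1.0, 1.0), radius=1.5))  # [0, 1]
```

The one-dimensional scan only narrows the candidate set. The final distance check is what guarantees correct answers, and in a disk-based setting the sorted keys would typically live in a B+-tree rather than a Python list.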