Context-Sensitive Kernel Functions for Support Vector Machine Learning with Sequences of Symbolic Data: A Distance Function Viewpoint

Monday, 8 May, 2006 - 17:00
Campus: Brussels Humanities, Sciences & Engineering campus
Faculty: Science and Bio-engineering Sciences
Room: D.2.01
Bram Vanschoenwinkel
PhD defence

In classification problems, machine learning algorithms often make use
of the assumption that similar inputs lead to similar outputs. In this
setting, two questions naturally arise: what does it mean for two input
instances to be similar and how can this be used in a learning
algorithm? In support vector machine learning, similarity between
instances in the input space X is implicitly expressed by a kernel
function K: X x X -> R that calculates inner products in a
high-dimensional Hilbert space F. In the support vector machine
literature this Hilbert space F is called the feature space, and
separation of the instances is done in that feature space F. More
precisely, separation involves making use of a maximum-margin hyperplane
which maximizes the distance between the hyperplane and the closest
instances. In this light the support vector machine can be considered
a distance-based machine learning technique, as the distance between the
instances plays a crucial role in the classification process.
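
To make the role of the kernel concrete, consider the following minimal
sketch in Python (an illustration, not taken from the dissertation): the
quadratic kernel K(x, z) = (x . z)^2 returns exactly the inner product
that an explicit degree-2 feature map phi would produce, without ever
constructing phi.

import numpy as np

def quadratic_kernel(x, z):
    # K(x, z) = (x . z)^2, evaluated directly in the input space.
    return np.dot(x, z) ** 2

def phi(x):
    # Explicit degree-2 feature map: phi(x)_{ij} = x_i * x_j,
    # so that <phi(x), phi(z)> = (x . z)^2.
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

# Both evaluations agree (20.25): the kernel computes an inner product
# in the feature space without building the feature vectors explicitly.
print(quadratic_kernel(x, z))
print(np.dot(phi(x), phi(z)))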

This dissertation describes an approach to support vector machine
learning for instances that are formed by sliding a window over a
sequence of strings. The instances formed in this way are called
contexts, and they consist of a target string and a context of a number
of strings before and after the target string. Applications that rely on
instances that are described by such contexts are called
context-dependent classification problems. An example of this is
part-of-speech tagging. In part-of-speech tagging the goal is to assign
each word in a sentence to a syntactic category (e.g. noun, verb,
cardinal, etc.). Assigning a word to a syntactic category can be done
by looking at the word itself and at a context of a number of words
before and after that word.
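
As a rough illustration of how such contexts can be formed (the window
sizes and padding symbol below are arbitrary choices, not those of the
dissertation), a sliding window over a sentence might look as follows:

def make_contexts(tokens, left=2, right=2, pad="_"):
    # For every target token, collect the `left` tokens before it and
    # the `right` tokens after it, padding at the sentence boundaries.
    padded = [pad] * left + list(tokens) + [pad] * right
    return [tuple(padded[i - left : i + right + 1])
            for i in range(left, left + len(tokens))]

sentence = ["The", "cat", "sat", "on", "the", "mat"]
for context in make_contexts(sentence):
    print(context)
# ('_', '_', 'The', 'cat', 'sat') is the context of the target 'The',
# ('_', 'The', 'cat', 'sat', 'on') the context of 'cat', and so on.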

In this setting, however, two key difficulties arise. First of all,
numerical instances can be represented by real vectors; such data is
said to be vectorial, and in this case the concept of an inner product
is easy to define. For discrete structures like contexts, which are
non-vectorial, this concept is far less obvious. Secondly, for symbolic
data similarity is often determined by means of special-purpose
similarity measures and distance functions, depending to a great extent
on the application domain under consideration.

To cope with non-vectorial instances in the support vector machine
setting, two approaches exist: i) the pre-processing approach and ii)
the direct approach. In the pre-processing approach the symbolic
instances are first transformed into real vectors, which are then used
as arguments for one of the standard kernel functions. In the direct
approach, on the other hand, special kernel functions that work directly
on the symbolic data are used to implicitly map the symbolic data to
real vectors in the feature space F.

Although the pre-processing approach has been used successfully for a
wide range of applications, it has the disadvantage that for some
applications it becomes hard or even impossible to make use of the
special-purpose similarity measures one would like to use in the kernel
function K. For contexts, until now, only the pre-processing approach
has been described in the literature. More precisely, contexts are often
represented by means of an orthonormal vector encoding. In this setting
every distinct string occurring in the data is represented by a unique
unit vector, and contexts are formed by concatenating these unit
vectors.
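
A minimal sketch of such an orthonormal (one-hot) encoding, with an
illustrative vocabulary built from the data itself, could look as
follows:

import numpy as np

def orthonormal_encoding(contexts):
    # Every distinct string is mapped to a unique unit vector; a context
    # is encoded as the concatenation of the unit vectors of its strings.
    vocab = sorted({s for context in contexts for s in context})
    index = {s: i for i, s in enumerate(vocab)}
    dim = len(vocab)

    def encode(context):
        vec = np.zeros(len(context) * dim)
        for pos, s in enumerate(context):
            vec[pos * dim + index[s]] = 1.0
        return vec

    return [encode(context) for context in contexts]

encoded = orthonormal_encoding([("the", "cat", "sat"), ("cat", "sat", "on")])
print(encoded[0].shape)  # (12,): 3 positions times a vocabulary of 4 strings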

In this dissertation, however, it will be shown that for contexts the
direct approach should be preferred over the use of an orthonormal
vector encoding, as such an encoding does not allow the use of the
special-purpose distance functions and similarity measures that are
used in context-dependent classification problems.

In this light, this dissertation will describe an approach to work with
and reason about kernels for contexts from the viewpoint of the distance
functions that are used in context-dependent applications. We start by
introducing some basic notation that makes it possible to work with
and reason about contexts and their orthonormal encoding. Next,
different methods for measuring similarity between contexts will be
described, followed by the theory needed to understand how these methods
can be used in the support vector machine framework. In the end we
introduce a number of kernel functions that are defined completely in
terms of the similarity between the contexts themselves.
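
To give a flavour of a kernel that is defined directly on contexts (a
simplified sketch only; the kernel functions in the dissertation are
more elaborate), one can count the positions at which two contexts
agree. For orthonormally encoded contexts this equals the ordinary dot
product, but no explicit vectors are ever built:

def overlap_kernel(context_a, context_b):
    # Count the positions at which the two contexts contain the same string.
    assert len(context_a) == len(context_b)
    return sum(1.0 for a, b in zip(context_a, context_b) if a == b)

print(overlap_kernel(("the", "cat", "sat"), ("a", "cat", "sat")))  # 2.0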

In addition, because the similarity measures these kernels are based on
take into account the amount of information that is present at different
positions in the contexts, it becomes possible to use such information
directly in the kernels themselves. In that sense the kernel functions
are said to be context-sensitive.
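
As an illustration of this idea (the weighting scheme and weights below
are purely hypothetical, not those of the dissertation), a
context-sensitive variant of the overlap kernel above could weight each
position by how informative it is, for instance giving the target
position more influence than the outer positions:

def weighted_overlap_kernel(context_a, context_b, weights):
    # Matches at different positions contribute according to a per-position
    # weight, e.g. derived from the amount of information a position carries.
    assert len(context_a) == len(context_b) == len(weights)
    return sum(w for a, b, w in zip(context_a, context_b, weights) if a == b)

# Illustrative weights: the middle (target) position matters most.
weights = [0.5, 2.0, 0.5]
print(weighted_overlap_kernel(("the", "cat", "sat"),
                              ("a", "cat", "sat"), weights))  # 2.5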

Finally, the proposed kernels are applied to different context-dependent
classification problems in the application domains of both natural
language learning and bioinformatics. From the results it will become
clear that the context-sensitive kernels outperform their counterparts
that are not context-sensitive, both in terms of accuracy and in terms
of model complexity.