Clustering in Image Space for Place Recognition and Visual Annotations for Human-robot Interaction


IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 31, No. 5, pp. 669-682, 2001


Abstract:

The classical approach to vision-guided navigation for autonomous robots relies on 3D geometrical descriptions of the scene; these are known as model-based approaches. However, such approaches do not ease the user's task, because they require the user to supply geometrically precise models of the 3D environment. In this paper, we propose the use of ``annotations'' posted on some type of blackboard or ``descriptive'' map to facilitate this user-robot interaction. We show that, using this technique, user commands can be as simple as ``go to label 5''.

To build such a mechanism, new approaches for vision-guided mobile robot navigation have to be found. We show that this can be achieved by means of mixture models within an appearance-based paradigm. Mixture models are more useful in practice than other pattern recognition methods such as PCA (Principal Component Analysis) or FDA (Fisher Discriminant Analysis - also known as Linear Discriminant Analysis, LDA), because they can represent non-linear sub-spaces.

However, because mixture models are usually learned with the EM (Expectation-Maximization) algorithm, which is a gradient ascent technique, the system cannot always converge to the desired solution, owing to the local maxima problem. To resolve this, a genetic version of the EM algorithm is used. We then show the capabilities of this latter approach on a navigation task that uses the above-described ``annotations''.
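
As an illustration of the local maxima problem, the following is a minimal sketch of EM for a one-dimensional Gaussian mixture, run from several random initializations. It is not the genetic EM algorithm of the paper: the multi-start wrapper is a simpler stand-in that only shows why the final log-likelihood depends on where EM starts. The names em_fit and multi_start_em, the variance floor min_var, and the one-dimensional setting are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_fit(data, init_means, n_iter=50, min_var=1e-2):
    """Plain EM for a 1D Gaussian mixture (hypothetical helper).

    Returns (weights, means, variances, history), where history holds the
    log-likelihood after each iteration.
    """
    k = len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    history = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            # Floor the variance (an assumption of this sketch) so that a
            # component cannot collapse onto a single point.
            variances[j] = max(var, min_var)
        history.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, history


def multi_start_em(data, k, n_starts=5, seed=0):
    """Run EM from several random starts and keep the best final fit."""
    rng = random.Random(seed)
    runs = [em_fit(data, rng.sample(data, k)) for _ in range(n_starts)]
    best = max(runs, key=lambda run: run[3][-1])
    return best, runs
```

Within a single run the log-likelihood does not decrease from one iteration to the next, but different starts can end at different final values, which is the behaviour that motivates a global search strategy such as the genetic variant.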