Estimation of Spatio-Temporal Object Properties for Manipulation Tasks from Observation of Humans

A robot should be able to interact with humans in their daily lives. An essential requirement is the ability to manipulate objects in complex human environments. A fixed, pre-programmed approach is not appropriate here, since it is too limited and the programming is time-consuming. To learn how objects can be manipulated, the robot has to be able to observe human demonstrations (see Figure 1) and to estimate the relevant spatio-temporal properties of the manipulated objects.

Figure 1. Observation of a human demonstration.

The extracted knowledge has to be stored in an appropriate manner. Our system consists of two components: the Atlas and the Working Memory. The Atlas is the long-term memory, which provides the "experience" of the system. It contains general knowledge as well as a-priori knowledge of object properties that cannot be observed with a purely camera-based observation system. The Working Memory is the counterpart of the Atlas: it is the short-term memory, which stores the information about the object in the current scene. The general knowledge in the Atlas can be mapped into the current scene (see Figure 2), taking its context into account. Note that the system does not rely on pre-defined rules: it is able to learn from its own actions and from the observation of others.

Figure 2. Atlas and Working Memory.
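The division of labor between the two memories can be illustrated with a minimal sketch. It is not the authors' implementation: the class layout, the method names (`learn`, `observe`, `map_from_atlas`) and the example property `mass_kg` are assumptions chosen only to show how long-term, a-priori knowledge could be mapped onto an object in the current scene.

```python
from dataclasses import dataclass, field


@dataclass
class Atlas:
    """Long-term memory: a-priori properties per object class (hypothetical layout)."""
    priors: dict = field(default_factory=dict)

    def learn(self, object_class, **properties):
        # Accumulate knowledge, e.g. properties a camera cannot observe.
        self.priors.setdefault(object_class, {}).update(properties)


@dataclass
class WorkingMemory:
    """Short-term memory: objects in the current scene (hypothetical layout)."""
    scene: dict = field(default_factory=dict)

    def observe(self, object_id, object_class, **observed):
        # Store what was observed for one object in the current scene.
        self.scene[object_id] = {"class": object_class, **observed}

    def map_from_atlas(self, atlas):
        # Fill in Atlas knowledge; observed values take precedence over priors.
        for entry in self.scene.values():
            for key, value in atlas.priors.get(entry["class"], {}).items():
                entry.setdefault(key, value)


atlas = Atlas()
atlas.learn("cup", mass_kg=0.3)          # illustrative a-priori property
memory = WorkingMemory()
memory.observe("obj1", "cup", position=(0.1, 0.2, 0.5))
memory.map_from_atlas(atlas)
```

In this sketch the Working Memory entry for `obj1` keeps its observed position and additionally receives the mass stored in the Atlas.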

In complex environments, it is useful to focus attention on the areas that are currently relevant. In a manipulation task, the relevant areas for the robot are the object candidates that can be manipulated. Consequently, these object candidates have to be segmented from the rest of the scene (see Figure 3).

Figure 3. Left: Original color image. Middle: Disparity image of the color image on the left. Right: Remaining object in the disparity image.
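The text does not state how the object candidates are segmented. As one possible illustration only, the sketch below assumes that a background disparity image is available, keeps the pixels whose disparity exceeds the background by a threshold, and groups them into 4-connected regions. The function name `segment_candidates` and the parameters `min_diff` and `min_pixels` are hypothetical.

```python
from collections import deque


def segment_candidates(disparity, background, min_diff=2.0, min_pixels=3):
    """Return connected pixel regions that stand out from the background.

    disparity, background: 2D lists of equal size (rows x columns).
    min_diff, min_pixels: assumed thresholds, not taken from the source.
    """
    rows, cols = len(disparity), len(disparity[0])
    # Pixels clearly closer to the camera than the background model.
    mask = [[disparity[r][c] - background[r][c] > min_diff for c in range(cols)]
            for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    candidates = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the 4-connected neighbourhood.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_pixels:  # drop small, noisy regions
                candidates.append(region)
    return candidates
```

Each returned region is a list of `(row, column)` pixel coordinates that could serve as one object candidate.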

As soon as the human grasps an object candidate, it is selected as the foreground of the scene. The foreground is relevant to the manipulation task and is therefore observed (see Figure 4). The remaining structure in the scene is the background, which is mainly of interest for obstacle avoidance.

Figure 4. Left: Contact detection and initialization of the tracking. Right: Example of features during the tracking.
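The selection of the foreground at the moment of contact could look roughly like the following sketch. It assumes that the hand and each object candidate are available as sets of 3D points and that contact is declared when the smallest hand-to-candidate distance falls below a threshold. The function name `select_foreground` and the value of `contact_dist` are assumptions, not details given in the text.

```python
import math


def select_foreground(hand_points, candidates, contact_dist=0.02):
    """Return the id of the candidate touched by the hand, or None.

    hand_points: list of 3D points (x, y, z) on the hand.
    candidates: dict mapping a candidate id to its list of 3D points.
    contact_dist: assumed contact threshold in the units of the points.
    """
    best_id, best_dist = None, contact_dist
    for cand_id, points in candidates.items():
        # Smallest distance between any hand point and any candidate point.
        d = min(math.dist(h, p) for h in hand_points for p in points)
        if d <= best_dist:
            best_id, best_dist = cand_id, d
    return best_id
```

The candidate returned here would become the foreground and initialize the tracking; all other structure would remain background.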

Note that the detection of the foreground is done in 3D, whereas the tracking of the manipulated object is done in 2D, because 2D tracking is less computationally intensive. The observed manipulation is recorded as a 6-DoF trajectory, which provides the necessary information for the estimation of spatio-temporal object properties. For example, it can be determined whether rotation occurs during the manipulation. Another property that can be extracted is the development of the speed over the course of the manipulation (see Figure 5).

Figure 5. Left: Object trajectory showing the rotated and translated object, represented by its normal (yellow). Right: Object trajectory showing the development of the speed (green = no movement, yellow = slow movement, red = movement, blue = fast movement).
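As a rough illustration of how such properties might be derived from a recorded trajectory, the sketch below assumes that each sample consists of a timestamp, a 3D position and a unit-length object normal. Speed is estimated by finite differences and mapped to the four color classes of Figure 5; rotation is detected from the angle between successive normals. All thresholds and function names are hypothetical, and the actual estimation method may differ.

```python
import math


def speeds(times, positions):
    """Finite-difference speed between consecutive trajectory samples."""
    return [math.dist(positions[i], positions[i - 1]) / (times[i] - times[i - 1])
            for i in range(1, len(times))]


def classify_speed(v, still=0.01, slow=0.1, fast=0.5):
    """Map a speed to the color classes of Figure 5 (thresholds are assumed)."""
    if v < still:
        return "green"   # no movement
    if v < slow:
        return "yellow"  # slow movement
    if v < fast:
        return "red"     # movement
    return "blue"        # fast movement


def rotation_angles(normals):
    """Angle in radians between successive unit normals."""
    angles = []
    for a, b in zip(normals, normals[1:]):
        dot = sum(x * y for x, y in zip(a, b))
        angles.append(math.acos(max(-1.0, min(1.0, dot))))  # clamp for safety
    return angles


def has_rotation(normals, min_angle=math.radians(5.0)):
    """True if the accumulated change of the normal exceeds an assumed threshold."""
    return sum(rotation_angles(normals)) > min_angle
```

Summing the per-step angles is a deliberately simple choice; with noisy normals, a smoothed or net rotation measure would likely be more robust.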


  • Susanne Petsch and Darius Burschka. Estimation of Spatio-Temporal Object Properties for Manipulation Tasks from Observation of Humans. In IEEE International Conference on Robotics and Automation, pages 192-198, Anchorage, USA, 2010.