
Tuesday, August 10, 2010

Eye control for PTZ cameras in video surveillance

Bartosz Kunka, a PhD student at the Gdańsk University of Technology, has employed a remote gaze-tracking system called Cyber-Eye to control PTZ cameras in video surveillance and video-conference systems. The movie was prepared for the presentation of the system at the Research Challenge at SIGGRAPH 2010 in Los Angeles.

Monday, September 14, 2009

GaZIR: Gaze-based Zooming Interface for Image Retrieval (Kozma L., Klami A., Kaski S., 2009)

From the Helsinki Institute for Information Technology, Finland, comes a research prototype called GaZIR for gaze-based image retrieval, built by László Kozma, Arto Klami and Samuel Kaski. The GaZIR prototype uses a light-weight logistic regression model to predict relevance from eye movement data (such as viewing time, revisit counts, fixation length etc.), all occurring on-line in real time. The system is built around the PicSOM (paper) retrieval engine, which is based on tree-structured self-organizing maps (TS-SOMs). When provided with a set of reference images, the PicSOM engine goes online to download a set of similar images (based on color, texture or shape).
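The general shape of such a relevance predictor is easy to sketch. The following is my own minimal illustration, not the authors' model: the feature names and all weights are made-up assumptions, chosen only to show how implicit gaze cues could be mapped to a relevance probability and a ranking.

```python
import math

# Hypothetical feature vector per image: [total viewing time (s),
# number of revisits, mean fixation duration (s)] -- the kinds of
# implicit cues GaZIR extracts. Weights and bias are invented for
# illustration; the real model is trained in a separate phase.
WEIGHTS = [1.2, 0.8, 2.0]
BIAS = -2.5

def relevance_probability(features):
    """Logistic regression: P(relevant) = sigmoid(w . x + b)."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))

def rank_images(gaze_features):
    """Order image ids by predicted relevance, most relevant first."""
    scored = {img: relevance_probability(f) for img, f in gaze_features.items()}
    return sorted(scored, key=scored.get, reverse=True)

# An image that was dwelled on and revisited should outrank one
# that was only glanced past.
features = {
    "img_a": [2.5, 3, 0.4],   # long viewing, several revisits
    "img_b": [0.3, 0, 0.1],   # brief glance
}
ranking = rank_images(features)
```

In GaZIR this ranking would then drive which images are brought out on the inner rings as the user zooms in.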

Abstract
"We introduce GaZIR, a gaze-based interface for browsing and searching for images. The system computes on-line predictions of relevance of images based on implicit feedback, and when the user zooms in, the images predicted to be the most relevant are brought out. The key novelty is that the relevance feedback is inferred from implicit cues obtained in real-time from the gaze pattern, using an estimator learned during a separate training phase. The natural zooming interface can be connected to any content-based information retrieval engine operating on user feedback. We show with experiments on one engine that there is sufficient amount of information in the gaze patterns to make the estimated relevance feedback a viable choice to complement or even replace explicit feedback by pointing-and-clicking."


Fig1. "Screenshot of the GaZIR interface. Relevance feedback gathered from outer rings influences the images retrieved for the inner rings, and the user can zoom in to reveal more rings."

Fig2. "Precision-recall and ROC curves for userindependent relevance prediction model. The predictions (solid line) are clearly above the baseline of random ranking (dash-dotted line), showing that relevance of images can be predicted from eye movements. The retrieval accuracy is also above the baseline provided by a naive model making a binary relevance judgement based on whether the image was viewed or not (dashed line), demonstrating the gain from more advanced gaze modeling."

Fig 3. "Retrieval performance in real user experiments. The bars indicate the proportion of relevant images shown during the search in six different search tasks for three different feedback methods. Explicit denotes the standard point-and-click feedback, predicted means implicit feedback inferred from gaze, and random is the baseline of providing random feedback. In all cases both actual feedback types outperform the baseline, but the relative performance of explicit and implicit feedback depends on the search task."
  • László Kozma, Arto Klami, and Samuel Kaski: GaZIR: Gaze-based Zooming Interface for Image Retrieval. To appear in Proceedings of 11th Conference on Multimodal Interfaces and The Sixth Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI), Boston, MA, USA, November 2-6, 2009. (abstract, pdf)

Thursday, September 18, 2008

The Inspection of Very Large Images by Eye-gaze Control

Nicholas Adams, Mark Witkowski and Robert Spence from the Department of Electrical and Electronic Engineering at the Imperial College London got the HCI 08 Award for International Excellence for work related to gaze interaction.

"The researchers presented novel methods for navigating and inspecting extremely large images solely or primarily using eye gaze control. The need to inspect large images occurs in, for example, mapping, medicine, astronomy and surveillance, and this project considered the inspection of very large aerial images, held in Google Earth. Comparative search and navigation tasks suggest that, while gaze methods are effective for image navigation, they lag behind more conventional methods, so interaction designers might consider combining these techniques for greatest effect." (BCS Interaction)

Abstract

The increasing availability and accuracy of eye gaze detection equipment has encouraged its use for both investigation and control. In this paper we present novel methods for navigating and inspecting extremely large images solely or primarily using eye gaze control. We investigate the relative advantages and comparative properties of four related methods: Stare-to-Zoom (STZ), in which control of the image position and resolution level is determined solely by the user's gaze position on the screen; Head-to-Zoom (HTZ) and Dual-to-Zoom (DTZ), in which gaze control is augmented by head or mouse actions; and Mouse-to-Zoom (MTZ), using conventional mouse input as an experimental control.

The need to inspect large images occurs in many disciplines, such as mapping, medicine, astronomy and surveillance. Here we consider the inspection of very large aerial images, of which Google Earth is both an example and the one employed in our study. We perform comparative search and navigation tasks with each of the methods described, and record user opinions using the Swedish User-Viewer Presence Questionnaire. We conclude that, while gaze methods are effective for image navigation, they, as yet, lag behind more conventional methods and interaction designers may well consider combining these techniques for greatest effect.

This paper is the short version of Nicholas Adams's Master's thesis, which I stumbled upon before creating this blog. An early version appeared as a short paper at COGAIN06.

Tuesday, April 15, 2008

Gaze Interaction Demo (Powerwall@Konstanz Uni.)

During the last few years quite a few wall-sized displays have been used for novel interaction methods. Often these have been combined with multi-touch, such as Jeff Han's FTIR technology. This is the first demonstration I have seen where eye tracking is used for a similar purpose. A German Ph.D. candidate, Jo Bieg, is working on this at the HCI department at the University of Konstanz. The Powerwall is 5.20 x 2.15 m and has a resolution of 4640 x 1920 pixels.



The demonstration can be viewed at a better quality (10Mb)

Also make sure to check out the 360-degree Globorama display demonstration. It does not use eye tracking for interaction but a laser pointer. Nevertheless, it is a really cool immersive experience, especially the Google Earth zoom-in to 360-degree panoramas.

Wednesday, March 12, 2008

Eye Gaze Interaction with Expanding Targets (Miniotas, Špakov, MacKenzie, 2004)

Continuing on the topic of expanding areas, this paper presents an approach where the expansion of the target area is invisible. The authors introduce their algorithm, called "Grab-and-hold", which aims at stabilizing the gaze data, and perform a two-part experiment to evaluate it.

Abstract
"Recent evidence on the performance benefits of expanding targets during manual pointing raises a provocative question: Can a similar effect be expected for eye gaze interaction? We present two experiments to examine the benefits of target expansion during an eye-controlled selection task. The second experiment also tested the efficiency of a “grab-and-hold algorithm” to counteract inherent eye jitter. Results confirm the benefits of target expansion both in pointing speed and accuracy. Additionally, the grab-and-hold algorithm affords a dramatic 57% reduction in error rates overall. The reduction is as much as 68% for targets subtending 0.35 degrees of visual angle. However, there is a cost which surfaces as a slight increase in movement time (10%). These findings indicate that target expansion coupled with additional measures to accommodate eye jitter has the potential to make eye gaze a more suitable input modality." (Paper available here)

Their "Grab-and-hold" algorithm applies more intelligent processing to the gaze data. "Upon appearance of the target, there is a settle-down period of 200 ms during which the gaze is expected to land in the target area and stay there. Then, the algorithm filters the gaze points until the first sample inside the expanded target area is logged. When this occurs, the target is highlighted and the selection timer triggered. The selection timer counts down a specified dwell time (DT) interval."
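The quoted steps translate into a small state machine. Here is a minimal sketch of that logic as I read it from the paper's description; the sampling interval, dwell time and the point-in-target predicate are my own assumptions, and the real algorithm may handle edge cases differently.

```python
SETTLE_MS = 200       # settle-down period after the target appears
DWELL_MS = 500        # dwell time (DT) counted down before selection

def grab_and_hold(samples, in_target, sample_interval_ms=20):
    """samples: gaze points in temporal order; in_target: predicate for
    the (invisibly) expanded target area. Returns the time (ms) at which
    selection fires, or None if the dwell never completes."""
    timer_start = None
    for i, point in enumerate(samples):
        t = i * sample_interval_ms
        if t < SETTLE_MS:
            continue                      # ignore samples while gaze settles
        if timer_start is None:
            if in_target(point):
                timer_start = t           # "grab": highlight, trigger timer
        elif t - timer_start >= DWELL_MS:
            return t                      # "hold" completed: select
    return None
```

Note that once the timer is triggered the sketch never resets it, which is how I understand the "hold" part: brief jitter outside the target no longer cancels the selection.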

While reading this paper I came to think about an important question concerning the filtering of gaze data. Collecting the samples used for the algorithm's processing introduces a delay in the interaction. For example, if I were to sample 50 gaze positions and then average these to reduce jitter, it would result in a one-second delay on a system that captures 50 images per second (50 Hz). As seen in other papers as well, there is a speed-accuracy trade-off to make. What is more important, a lower error rate or a more responsive system?
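The trade-off is easy to make concrete. A sketch of a plain moving-average smoother, with the latency it implies (window size divided by sampling rate); the class and function names are my own:

```python
from collections import deque

class GazeSmoother:
    """Moving average over the last window_size gaze samples."""
    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)

    def add(self, x, y):
        """Add a raw sample and return the smoothed position."""
        self.window.append((x, y))
        n = len(self.window)
        return (sum(p[0] for p in self.window) / n,
                sum(p[1] for p in self.window) / n)

def smoothing_delay_seconds(window_size, sampling_rate_hz):
    """Worst-case lag introduced by averaging over the window."""
    return window_size / sampling_rate_hz

# 50 samples averaged on a 50 Hz tracker -> one full second of lag,
# which is exactly the responsiveness cost discussed above.
```

Smaller windows (or weighted filters that favor recent samples) trade some jitter reduction for a snappier pointer.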

Friday, March 7, 2008

Inspiration: All Eyes on the Monitor (Mollenbach et al, 2008)

Going further with the Zooming User Interface (ZUI) is the prototype described in "All Eyes on the Monitor: Gaze Based Interaction in Zoomable, Multi-Scaled Information-Space" (E. Mollenbach, T. Stefansson, J. P. Hansen), developed at Loughborough University in the U.K. and the ITU in Copenhagen, Denmark. It employs a gaze-based pan/zoom interaction style which suits gaze interaction well, since it resolves the inaccuracy problem (target sizes increase when zooming in on them). Additionally, the results indicate that for certain tasks gaze-based interaction is faster than traditional mouse operation.



ABSTRACT
The experiment described in this paper, shows a test environment constructed with two information spaces; one large with 2000 nodes ordered in semi-structured groups in which participants performed search and browse tasks; the other was smaller and designed for precision zooming, where subjects performed target selection simulation tasks. For both tasks, modes of gaze- and mouse-controlled navigation were compared. The results of the browse and search tasks showed that the performances of the most efficient mouse and gaze implementations were indistinguishable. However, in the target selection simulation tasks the most efficient gaze control proved to be about 16% faster than the most efficient mouse-control. The results indicate that gaze-controlled pan/zoom navigation is a viable alternative to mouse control in inspection and target exploration of large, multi-scale environments. However, supplementing mouse control with gaze navigation also holds interesting potential for interface and interaction design. Download paper (pdf)

The paper was presented at the annual International Conference for Intelligent Interfaces (IUI) that was held in Maspalomas, Gran Canaria between 13-16th January 2008.

Monday, March 3, 2008

Zooming and Expanding Interfaces / Custom components

The papers reviewed on zooming interaction styles inspired me to develop a set of zoom-based interface components. The interaction style suits gaze well, since it overcomes the inaccuracy and jitter of eye movements. My intention is that the interface components should be completely standalone, customizable and straightforward to use: ideally included in new projects by importing one file and writing one line of code.

The first component is a dwell-based menu button that, on fixation, will a) provide a dwell-time indicator by animating a small glow effect surrounding the button image and b) after 200 ms expand an ellipse that houses the menu options. This produces a two-step dwell activation while making use of the display area in a much more dynamic way. The animation is put in place to keep the user's fixation on the button for the duration of the dwell time. The items in the menu are displayed when the ellipse has reached its full size.

This gives the user feedback in the parafoveal region while, at the same time, the stopped glow of the button icon indicates a completed dwell (a bit hard to describe in words, perhaps easier to understand from the images below). The parafoveal region of our visual field is located just outside the foveal region (where full-resolution vision takes place). The foveal area is about the size of a thumbnail at arm's length; items in the parafoveal region can still be seen, but at reduced resolution/sharpness. We do see them but have to make a short saccade to bring them into full resolution. In other words, the menu items pop out at a distance that attracts a short saccade, which is easily discriminated by the eye tracker. (Just4fun: test your visual field)

Before the button has received focus


Upon fixation the button image displays an animated glow effect indicating the dwell process. The image above illustrates how the menu items pop out on the ellipse surface at the end of the dwell. Note that the ellipse grows in size over a 300 ms period; the exact timing is configurable by passing a parameter in the XAML design page.
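The two-step activation can be summarized as a simple time-based state function. The sketch below is my own illustration (the actual component is built in WPF); the 200 ms dwell and 300 ms growth mirror the timings described above, while the state names and API are invented:

```python
DWELL_MS = 200      # glow animation runs during this period
GROW_MS = 300       # ellipse expansion period (configurable in XAML)

def button_state(fixation_ms):
    """Map time spent fixating the button to a UI state."""
    if fixation_ms <= 0:
        return "idle"
    if fixation_ms < DWELL_MS:
        return "glowing"              # dwell indicator animating
    if fixation_ms < DWELL_MS + GROW_MS:
        return "expanding"            # ellipse growing outward
    return "menu_open"                # items pop out in the parafoveal view
```

Looking away before the dwell completes would simply return the button to the idle state, so no selection is ever made by accident during a brief glance.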

The second prototype I have been working on is also inspired by the usage of expanding surfaces. It is a gaze-driven photo gallery where thumbnail-sized image previews become enlarged upon glancing at them. The enlarged view displays an icon which can be fixated to make the photo appear in full size.

Displaying all the images in the user's "My Pictures" folder.


Second step, glancing at the photos. Dynamically resized. Optionally further enlarged.

Upon glancing at the thumbnails they become enlarged, which activates the icon at the bottom of each photo. This enables the user to make a second fixation on the icon to bring the photo into a large view. This view has two icons to navigate back and forth (previous/next photo). By fixating outside the photo the view returns to the overview.

Saturday, February 23, 2008

Inspiration: EyeWindows (Fono et al, 2005)

Continuing on the zooming style of interaction that has become common within the field of gaze interaction is "EyeWindows: Evaluation of Eye-Controlled Zooming Windows for Focus Selection" (Fono & Vertegaal, 2005). Their paper describes two prototypes: a media browser with dynamic (elastic) allocation of screen real estate, and a second prototype used to dynamically size desktop windows upon gaze fixation. Overall, great examples presented in a clear, well-structured paper, with an interesting evaluation of selection techniques.



Abstract
In this paper, we present an attentive windowing technique that uses eye tracking, rather than manual pointing, for focus window selection. We evaluated the performance of 4 focus selection techniques: eye tracking with key activation, eye tracking with automatic activation, mouse and hotkeys in a typing task with many open windows. We also evaluated a zooming windowing technique designed specifically for eye-based control, comparing its performance to that of a standard tiled windowing environment. Results indicated that eye tracking with automatic activation was, on average, about twice as fast as mouse and hotkeys. Eye tracking with key activation was about 72% faster than manual conditions, and preferred by most participants. We believe eye input performed well because it allows manual input to be provided in parallel to focus selection tasks. Results also suggested that zooming windows outperform static tiled windows by about 30%. Furthermore, this performance gain scaled with the number of windows used. We conclude that eye-controlled zooming windows with key activation provides an efficient and effective alternative to current focus window selection techniques. Download paper (pdf).

David Fono, Roel Vertegaal and Conner Dickie are researchers at the Human Media Lab at Queen's University in Kingston, Canada.

Friday, February 22, 2008

Inspiration: Fisheye Lens (Ashmore et al. 2005)

In the paper "Efficient Eye Pointing with a Fisheye Lens" (Ashmore et al., 2005) a fisheye magnification lens is slaved to the foveal region of the user's gaze. This is another usage of the zooming style of interaction, but compared to the ZoomNavigator (Skovsgaard, 2008) and EyePoint (Kumar & Winograd, 2007) it is a continuous effect that magnifies whatever the user's gaze lands upon. In other words, it is not meant to be a solution for dealing with the low accuracy of eye trackers in typical desktop (Windows) interaction. This makes it suitable for visual inspection tasks such as quality control, medical X-ray examination, satellite imagery etc. On the downside, the nature of the lens distorts the image, which breaks the original spatial relationship between items on the display (as demonstrated by the images below).

Abstract
"This paper evaluates refinements to existing eye pointing techniques involving a fisheye lens. We use a fisheye lens and a video-based eye tracker to locally magnify the display at the point of the user’s gaze. Our gaze-contingent fisheye facilitates eye pointing and selection of magnified (expanded) targets. Two novel interaction techniques are evaluated for managing the fisheye, both dependent on real-time analysis of the user’s eye movements. Unlike previous attempts at gaze-contingent fisheye control, our key innovation is to hide the fisheye during visual search, and morph the fisheye into view as soon as the user completes a saccadic eye movement and has begun fixating a target. This style of interaction allows the user to maintain an overview of the desktop during search while selectively zooming in on the foveal region of interest during selection. Comparison of these interaction styles with ones where the fisheye is continuously slaved to the user’s gaze (omnipresent) or is not used to affect target expansion (nonexistent) shows performance benefits in terms of speed and accuracy" Download paper (pdf)
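The core of any such lens is a radial distortion centred on the gaze point. The sketch below uses a generic Sarkar/Brown-style graphical fisheye, not the authors' exact implementation; the lens radius R and distortion factor d are illustrative assumptions. Points inside the lens are pushed outward, magnifying the region near the centre by roughly (d + 1) while staying continuous at the rim:

```python
import math

def fisheye(point, gaze, R=100.0, d=3.0):
    """Radial fisheye: distort `point` around the `gaze` position.
    Inside the lens radius R, r' = R(d+1)r / (dr + R), which equals r
    at the rim (r = R) and magnifies by ~(d+1) near the centre."""
    px, py = point
    gx, gy = gaze
    r = math.hypot(px - gx, py - gy)
    if r >= R or r == 0:
        return point                      # outside the lens: undistorted
    r_new = R * (d + 1) * r / (d * r + R)
    scale = r_new / r
    return (gx + (px - gx) * scale, gy + (py - gy) * scale)
```

Applying this per-pixel (inverse-mapped, in practice) gives the magnified bubble around the gaze; the paper's contribution is *when* to show it, morphing the lens in only after a saccade ends and fixation begins.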

The fisheye lens has been implemented commercially in the products of Idelix Software Inc., which has a set of demonstrations available.

Wednesday, February 20, 2008

Inspiration: GUIDe Project (Kumar&Winograd, 2007)

In the previous post I introduced the ZoomNavigator (Skovsgaard, 2008), which is similar to the EyePoint system (Kumar & Winograd, 2007) developed within the GUIDe project (Gaze-Enhanced User Interface Design), an initiative by the Human-Computer Interaction group at Stanford University. The system relies on both an eye tracker and a keyboard, which excludes some users with disabilities (see video below). The aim of GUIDe is to make human-computer interaction as a whole "smarter" (as in intuitive, faster and less cumbersome). This differs from the COGAIN initiative, which mainly aims at giving people with disabilities a higher quality of life.

Abstract
"The GUIDe (Gaze-enhanced User Interface Design) project in the HCI Group at Stanford University explores how gaze information can be effectively used as an augmented input in addition to keyboard and mouse. We present three practical applications of gaze as an augmented input for pointing and selection, application switching, and scrolling. Our gaze-based interaction techniques do not overload the visual channel and present a natural, universally-accessible and general purpose use of gaze information to facilitate interaction with everyday computing devices." Download paper (pdf)

Demonstration video
"The following video shows a quick 5 minute overview of our work on a practical solution for pointing and selection using gaze and keyboard. Please note, our objective is not to replace the mouse as you may have seen in several articles on the Web. Our objective is to provide an effective interaction technique that makes it possible for eye-gaze to be used as a viable alternative (like the trackpad, trackball, trackpoint or other pointing techniques) for everyday pointing and selection tasks, such as surfing the web, depending on the users' abilities, tasks and preferences."



The use of "focus points" is a good design decision, as it provides the user with a fixation point which is much smaller than the actual target. This produces a clear and steady fixation which is easily discriminated by the eye tracker. The idea of displaying something that will "lure" the user's fixation into remaining still is something I intend to explore in my own project.

As mentioned, the GUIDe project has developed several applications besides EyePoint, such as EyeExposé (application switching), gaze-based password entry and automatic text scrolling.
More information can be found in the GUIDe Publications

Make sure to get a copy of Manu Kumar's Ph.D. thesis, "Gaze-enhanced User Interface Design", which is a pleasure to read. Additionally, Manu has founded GazeWorks, a company which aims at making the technology accessible to the general public at a lower cost.

Inspiration: ZoomNavigator (Skovsgaard, 2008)

Following up on the StarGazer text entry interface presented in my previous post, another approach to using zooming interfaces is employed in the ZoomNavigator (Skovsgaard, 2008). It addresses the well-known issues of using gaze as input on traditional desktop systems, namely inaccuracy and jitter. It is an interesting solution which relies on dwell-time execution, in contrast to the EyePoint system (Kumar & Winograd, 2007) described in the next post.

Abstract
The goal of this research is to estimate the maximum amount of noise of a pointing device that still makes interaction with a Windows interface possible. This work proposes zoom as an alternative activation method to the more well-known interaction methods (dwell and two-step-dwell activation). We present a magnifier called ZoomNavigator that uses the zoom principle to interact with an interface. Selection by zooming was tested with white noise in a range of 0 to 160 pixels in radius on an eye tracker and a standard mouse. The mouse was found to be more accurate than the eye tracker. The zoom principle applied allowed successful interaction with the smallest targets found in the Windows environment even with noise up to about 80 pixels in radius. The work suggests that the zoom interaction gives the user a possibility to make corrective movement during activation time eliminating the waiting time found in all types of dwell activations. Furthermore zooming can be a promising way to compensate for inaccuracies on low-resolution eye trackers or for instance if people have problems controlling the mouse due to hand tremors.


The sequence of images are screenshots from ZoomNavigator showing
a zoom towards a Windows file called ZoomNavigator.exe.

The principles of ZoomNavigator are shown in the figure above. Zooming is used to focus on the attended object and eventually make a selection (an unambiguous action). ZoomNavigator allows actions similar to those found with a conventional mouse (Skovsgaard, 2008). The system is described in a conference paper titled "Estimating acceptable noise-levels on gaze and mouse selection by zooming". Download paper (pdf)

Two-step zoom
The two-step zoom activation is demonstrated in the video below by IT University of Copenhagen (ITU) research director prof. John Paulin Hansen. Notice how the error rate is reduced by the zooming style of interaction, making it suitable for applications that need detailed discrimination. It might be slower, but error rates drop significantly.



"Dwell is the traditional way of making selections by gaze. In the video we compare dwell to magnification and zoom. While the hit-rate is 10 % with dwell on a 12 x 12 pixels target, it is 100 % for both magnification and zoom. Magnification is a two-step process though, while zoom only takes one selection. In the experiment, the initiation of a selection is done by pressing the spacebar. Normally, the gaze tracking system will do this automatically when the gaze remains within a limited area for more than approx. 100 ms"

For more information see the publications of the ITU.

Inspiration: StarGazer (Skovsgaard et al, 2008)

A major area of research for the COGAIN network is enabling communication for the disabled. The Innovative Communications group at the IT University of Copenhagen continuously works on making gaze-based interaction technology more accessible, especially in the field of assistive technology.

The ability to enter text into the system is crucial for communication; without hands or speech this is somewhat problematic. The StarGazer software aims at solving this by introducing a novel 3D approach to text entry. In December I had the opportunity to visit ITU and try StarGazer (among other things) myself; it is astonishingly easy to use. Within just a minute I was typing with my eyes. Rather than describing what it looks like, see the video below.
The associated paper is to be presented at the ETRA08 conference in March.



This introduces an important solution to the problem of eye tracker inaccuracy, namely zooming interfaces. Fixating on a specific region of the screen will display an enlarged version of that area, where objects can be more easily discriminated and selected.

The eyes are incredibly fast but, from the perspective of eye trackers, not really precise. This is due to the physiological properties of our visual system, specifically the foveal region of the eye. This retinal area produces the sharp, detailed region of our visual field, which in practice covers about the size of a thumbnail at arm's length. To bring another area into focus a saccade takes place, which moves the pupil and thus our gaze; this is what the eye tracker registers. Hence the accuracy of most eye trackers is in the 0.5-1 degree range (in theory, that is).
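It is worth working out what 0.5-1 degree means in screen terms. A back-of-envelope conversion, with an assumed viewing distance of 60 cm and a 96 DPI display (roughly 37.8 px/cm); both numbers are illustrative, not from any of the papers above:

```python
import math

def visual_angle_to_pixels(degrees, distance_cm=60.0, px_per_cm=37.8):
    """On-screen extent of a given visual angle at a given distance.
    Uses the chord approximation: size = 2 * D * tan(angle / 2)."""
    size_cm = 2 * distance_cm * math.tan(math.radians(degrees) / 2)
    return size_cm * px_per_cm

# Under these assumptions, 1 degree spans roughly 40 px, so a tracker
# accurate to 0.5-1 degree cannot reliably hit standard-sized widgets,
# which is exactly why zooming interfaces help.
```

This is the quantitative motivation behind all the zooming approaches in these posts: targets smaller than the tracker's error circle must be enlarged before selection.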

A feasible solution to deal with this limitation in accuracy is to use the display space dynamically and zoom into the areas of interest upon glancing. The zooming interaction style solves some of the issues with inaccuracy and jitter of the eye trackers but in addition it has to be carefully balanced so that it still provides a quick and responsive interface.

However, to me the novelty in StarGazer is the notion of traveling through a 3D space; the sensation of movement really catches one's attention and streamlines the interaction. Since text entry is linear, character by character, flying through space from character to character is a suitable interaction style. Since the interaction is nowhere near the speed of two-handed keyboard entry, the employment of linguistic probability algorithms such as those found in cellphones would be very beneficial (i.e. type two or three letters and the most likely words display in a list). Overall, I find the spatial arrangement of gaze interfaces to be a somewhat unexplored area. Our eyes are made to navigate in a three-dimensional world, while traditional desktop interfaces mainly contain a flat 2D view. This is something I intend to investigate further.

Tuesday, February 19, 2008

Inspiration: GazeSpace (Laqua et al. 2007)

Parallel to working on the prototypes I continuously search for and review papers and theses on gaze interaction methods and techniques, hardware and software development etc. I will post references to some of these on this blog. A great deal of research and theory on interaction and cognition lies behind the field of gaze interaction.

The paper below was presented last year at a conference held by the British Computer Society specialist group on Human-Computer Interaction. What caught my attention is the focus on providing custom content spaces (canvases), good feedback and a dynamic dwell time, something I intend to incorporate into my own gaze GUI components. Additionally, the idea of expanding the content canvas upon a gaze fixation is really nice and something I will attempt to do in .NET/WPF (initial work displays a set of photos that become enlarged upon fixation).

GazeSpace: Eye Gaze Controlled Content Spaces (Laqua et al. 2007)

Abstract
In this paper, we introduce GazeSpace, a novel system utilizing eye gaze to browse content spaces. While most existing eye gaze systems are designed for medical contexts, GazeSpace is aimed at able-bodied audiences. As this target group has much higher expectations for quality of interaction and general usability, GazeSpace integrates a contextual user interface, and rich continuous feedback to the user. To cope with real-world information tasks, GazeSpace incorporates novel algorithms using a more dynamic gaze-interest threshold instead of static dwell-times. We have conducted an experiment to evaluate user satisfaction and results show that GazeSpace is easy to use and a “fun experience”. Download paper (PDF)
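The abstract does not spell out the dynamic threshold algorithm, but the general idea of replacing a fixed dwell time with an accumulating "gaze interest" can be sketched as follows. This is my own illustration of the concept, not GazeSpace's published algorithm; the grow/decay rates and threshold are invented:

```python
GROW = 1.0        # interest gained per sample while fixating the item
DECAY = 0.5       # interest lost per sample while gaze is elsewhere
THRESHOLD = 10.0  # accumulated interest needed to trigger selection

def interest_trace(on_item_samples):
    """on_item_samples: one boolean per gaze sample (True = gaze on the
    item). Returns the sample index at which the item would be selected,
    or None if interest never reaches the threshold."""
    interest = 0.0
    for i, on_item in enumerate(on_item_samples):
        interest = interest + GROW if on_item else max(0.0, interest - DECAY)
        if interest >= THRESHOLD:
            return i
    return None
```

Because interest decays rather than resets, a brief glance away (for example at the feedback indicator) does not throw away the progress already accumulated, which is what makes this feel less rigid than a static dwell time.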





About the author
Sven Laqua is a PhD student and Teaching Fellow in the Human Centred Systems Group, part of the Dept. of Computer Science at University College London. Sven has a personal homepage, a university profile and a blog (rather empty at the moment).