
Monday, November 1, 2010

ScanMatch: A novel method for comparing fixation sequences (Cristino et al, 2010)

Using algorithms designed to compare DNA sequences for eye movement comparison. Radical, and with MATLAB source code, fantastic! The method appears to be noise tolerant and to outperform the traditional Levenshtein distance.

Abstract
We present a novel approach to comparing saccadic eye movement sequences based on the Needleman–Wunsch algorithm used in bioinformatics to compare DNA sequences. In the proposed method, the saccade sequence is spatially and temporally binned and then recoded to create a sequence of letters that retains fixation location, time, and order information. The comparison of two letter sequences is made by maximizing the similarity score computed from a substitution matrix that provides the score for all letter pair substitutions and a penalty gap. The substitution matrix provides a meaningful link between each location coded by the individual letters. This link could be distance but could also encode any useful dimension, including perceptual or semantic space. We show, by using synthetic and behavioral data, the benefits of this method over existing methods. The ScanMatch toolbox for MATLAB is freely available online (www.scanmatch.co.uk).
  • Filipe Cristino, Sebastiaan Mathôt, Jan Theeuwes, and Iain D. Gilchrist
    ScanMatch: A novel method for comparing fixation sequences
    Behav Res Methods 2010 42:692-700; doi:10.3758/BRM.42.3.692
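
To make the idea concrete, here is a minimal Python sketch of the core Needleman-Wunsch step on letter-coded fixation sequences. It is not the ScanMatch toolbox (which is MATLAB): the grid, the distance-based substitution scores and the gap penalty below are illustrative assumptions, and the real method also repeats letters to encode fixation duration (temporal binning).

```
# Minimal Needleman-Wunsch alignment of two fixation-letter sequences.
# Sketch only; substitution scores and gap penalty are assumptions.
import numpy as np

def substitution_matrix(bin_centers, max_score=1.0):
    """Score letter pairs by the spatial proximity of their grid bins."""
    centers = np.asarray(bin_centers, dtype=float)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    return max_score - dist / dist.max()  # 1 for same bin, 0 for farthest pair

def needleman_wunsch(seq_a, seq_b, sub, gap=-0.5):
    """Global alignment score of two letter (bin index) sequences."""
    n, m = len(seq_a), len(seq_b)
    score = np.zeros((n + 1, m + 1))
    score[:, 0] = gap * np.arange(n + 1)   # leading gaps in seq_b
    score[0, :] = gap * np.arange(m + 1)   # leading gaps in seq_a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i, j] = max(
                score[i - 1, j - 1] + sub[seq_a[i - 1], seq_b[j - 1]],  # substitution
                score[i - 1, j] + gap,
                score[i, j - 1] + gap)
    return score[n, m] / max(n, m)         # normalize by sequence length

# Fixation sequences coded as indices into a 2x2 spatial grid.
centers = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
sub = substitution_matrix(centers)
print(needleman_wunsch([0, 1, 3, 3], [0, 1, 2, 3], sub))
```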





An improved algorithm for automatic detection of saccades in eye movement data and for calculating saccade parameters (Behrens et al, 2010)

Abstract
"This analysis of time series of eye movements is a saccade-detection algorithm that is based on an earlier algorithm. It achieves substantial improvements by using an adaptive-threshold model instead of fixed thresholds and using the eye-movement acceleration signal. This has four advantages: (1) Adaptive thresholds are calculated automatically from the preceding acceleration data for detecting the beginning of a saccade, and thresholds are modified during the saccade. (2) The monotonicity of the position signal during the saccade, together with the acceleration with respect to the thresholds, is used to reliably determine the end of the saccade. (3) This allows differentiation between saccades following the main-sequence and non-main-sequence saccades. (4) Artifacts of various kinds can be detected and eliminated. The algorithm is demonstrated by applying it to human eye movement data (obtained by EOG) recorded during driving a car. A second demonstration of the algorithm detects microsleep episodes in eye movement data."

  • F. Behrens, M. MacKeben, and W. Schröder-Preikschat
    An improved algorithm for automatic detection of saccades in eye movement data and for calculating saccade parameters. Behav Res Methods 2010 42:701-708; doi:10.3758/BRM.42.3.701





Friday, March 19, 2010

In the Eye of the Beholder: A Survey of Models for Eyes and Gaze (Hansen & Ji, 2010)

The following paper by Dan Witzner Hansen from ITU Copenhagen and Qiang Ji of Rensselaer Polytechnic Institute surveys and summarizes most existing methods of eye tracking, explaining how they operate and what the pros and cons of each method are. It is one of the most comprehensive publications I've seen on the topic and a delight to read. It was featured in the March edition of IEEE Transactions on Pattern Analysis and Machine Intelligence.

Abstract
"Despite active research and significant progress in the last 30 years, eye detection and tracking remains challenging due to the individuality of eyes, occlusion, variability in scale, location, and light conditions. Data on eye location and details of eye movements have numerous applications and are essential in face detection, biometric identification, and particular human-computer interaction tasks. This paper reviews current progress and state of the art in video-based eye detection and tracking in order to identify promising techniques as well as issues to be further addressed. We present a detailed review of recent eye models and techniques for eye detection and tracking. We also survey methods for gaze estimation and compare them based on their geometric properties and reported accuracies. This review shows that, despite their apparent simplicity, the development of a general eye detection technique involves addressing many challenges, requires further theoretical developments, and is consequently of interest to many other domains problems in computer vision and beyond."

Comparison of gaze estimation methods with respective prerequisites and reported accuracies


Eye Detection models

  • Dan Witzner Hansen, Qiang Ji, "In the Eye of the Beholder: A Survey of Models for Eyes and Gaze," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 3, pp. 478-500, Mar. 2010, doi:10.1109/TPAMI.2009.30. Download as PDF.

Tuesday, November 24, 2009

Remote tracker and 6DOF using a webcam

The following video clips demonstrate a Master's thesis project from the AGH University of Science and Technology in Cracow, Poland. The method developed provides 6-degrees-of-freedom head tracking and 2D eye tracking using a simple, low-resolution 640x480 webcam. Under the hood it is based on Lucas-Kanade optical flow and POSIT. A great start, as the head tracking seems relatively stable. Imagine it with IR illumination, a camera with slightly higher resolution, and a narrow-angle lens. And, of course, pupil and glint tracking algorithms for calibrated gaze estimation.
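
For the curious, a rough sketch of such a pipeline in Python with OpenCV is shown below: feature points are tracked with pyramidal Lucas-Kanade optical flow, and a 6-DOF head pose is recovered from the 2D-3D correspondences. The thesis project used POSIT; solvePnP serves as a modern OpenCV stand-in here, and the 3D face model points and camera intrinsics are invented for illustration.

```
import cv2
import numpy as np

# Crude 3D face model in cm (nose tip, chin, eye corners, mouth corners);
# invented for illustration only.
MODEL_3D = np.array([(0.0, 0.0, 0.0), (0.0, -6.3, -1.2),
                     (-4.3, 3.2, -2.6), (4.3, 3.2, -2.6),
                     (-2.8, -2.8, -2.4), (2.8, -2.8, -2.4)], dtype=np.float32)

# Example intrinsics for a 640x480 webcam; the focal length is a guess.
K = np.array([[640, 0, 320], [0, 640, 240], [0, 0, 1]], dtype=np.float32)

def track_and_pose(prev_gray, gray, prev_pts):
    """One frame step: LK point tracking, then 6-DOF pose from 2D-3D pairs.

    prev_pts: float32 array of shape (N, 1, 2), e.g. from
    cv2.goodFeaturesToTrack, in the same order as MODEL_3D.
    """
    pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None)
    if not status.all():
        return None, pts  # lost a point; a real tracker would re-detect here
    ok, rvec, tvec = cv2.solvePnP(MODEL_3D, pts, K, None)
    return ((rvec, tvec) if ok else None), pts
```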


Monday, September 14, 2009

GaZIR: Gaze-based Zooming Interface for Image Retrieval (Kozma L., Klami A., Kaski S., 2009)

From the Helsinki Institute for Information Technology, Finland, comes a research prototype called GaZIR for gaze-based image retrieval, built by Laszlo Kozma, Arto Klami and Samuel Kaski. The GaZIR prototype uses a light-weight logistic regression model to predict relevance from eye movement data (such as viewing time, revisit counts, and fixation length), all occurring on-line in real time. The system is built around the PicSOM (paper) retrieval engine, which is based on tree-structured self-organizing maps (TS-SOMs). When provided with a set of reference images, the PicSOM engine goes online to download a set of similar images (based on color, texture or shape).
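
A toy version of such a relevance predictor is easy to sketch. The snippet below fits a logistic regression on a few made-up per-image gaze features; the feature set and data are illustrative only, and GaZIR's actual features and training procedure are described in the paper.

```
# Toy gaze-based relevance predictor (illustrative data and features).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: total viewing time (s), revisit count, mean fixation length (s).
X_train = np.array([[1.9, 3, 0.41], [0.2, 0, 0.12],
                    [1.1, 2, 0.30], [0.3, 1, 0.15]])
y_train = np.array([1, 0, 1, 0])   # 1 = image was relevant

model = LogisticRegression().fit(X_train, y_train)

# On-line use: probability that a newly viewed image is relevant.
gaze_features = np.array([[0.8, 1, 0.25]])
print(model.predict_proba(gaze_features)[0, 1])
```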

Abstract
"We introduce GaZIR, a gaze-based interface for browsing and searching for images. The system computes on-line predictions of relevance of images based on implicit feedback, and when the user zooms in, the images predicted to be the most relevant are brought out. The key novelty is that the relevance feedback is inferred from implicit cues obtained in real-time from the gaze pattern, using an estimator learned during a separate training phase. The natural zooming interface can be connected to any content-based information retrieval engine operating on user feedback. We show with experiments on one engine that there is sufficient amount of information in the gaze patterns to make the estimated relevance feedback a viable choice to complement or even replace explicit feedback by pointing-and-clicking."


Fig 1. "Screenshot of the GaZIR interface. Relevance feedback gathered from outer rings influences the images retrieved for the inner rings, and the user can zoom in to reveal more rings."

Fig 2. "Precision-recall and ROC curves for user-independent relevance prediction model. The predictions (solid line) are clearly above the baseline of random ranking (dash-dotted line), showing that relevance of images can be predicted from eye movements. The retrieval accuracy is also above the baseline provided by a naive model making a binary relevance judgement based on whether the image was viewed or not (dashed line), demonstrating the gain from more advanced gaze modeling."

Fig 3. "Retrieval performance in real user experiments. The bars indicate the proportion of relevant images shown during the search in six different search tasks for three different feedback methods. Explicit denotes the standard point-and-click feedback, predicted means implicit feedback inferred from gaze, and random is the baseline of providing random feedback. In all cases both actual feedback types outperform the baseline, but the relative performance of explicit and implicit feedback depends on the search task."
  • László Kozma, Arto Klami, and Samuel Kaski: GaZIR: Gaze-based Zooming Interface for Image Retrieval. To appear in Proceedings of 11th Conference on Multimodal Interfaces and The Sixth Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI), Boston, MA, USA, November 2-6, 2009. (abstract, pdf)

Friday, September 11, 2009

An Adaptive Algorithm for Fixation, Saccade, and Glissade Detection in Eye-Tracking Data (Nyström M. & Holmqvist K, 2009)

From Marcus Nyström and Kenneth Holmqvist at the Lund University Humanities Lab (HumLab) in Sweden comes an interesting paper on a novel algorithm that is capable of detecting glissades (aka dynamic overshoot) in eye tracker data. These are wobbling eye movements often found at the end of saccades, and they have previously been considered errors in saccadic programming with limited value. Whatever their function is, the phenomenon does exist and should be accounted for. The paper reports finding glissades following half of all saccades during reading and scene viewing, with an average duration of 24 ms. This work is important as it extends the default categorization of eye movements, e.g. fixation, saccade, smooth pursuit, and blink. The algorithm is based on velocity saccade detection and is driven by the data while containing a limited number of subjective settings. It contains a number of improvements, such as thresholds for peak and saccade onset/offset detection, adaptive threshold adjustment based on local noise levels, physical constraints on eye movements to exclude noise and jitter, and new recommendations for minimum allowed fixation and saccade durations. Also important to note is that the data was obtained using a high-speed 1250 Hz SMI system; how the algorithm performs on a typical remote tracker running at 50-250 Hz has yet to be determined.
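
The flavor of the data-driven thresholding can be sketched as follows: the peak-velocity threshold is iterated until it settles on the local noise level, and velocity peaks above it become saccade candidates. This is a simplification under assumed constants; onset/offset refinement, the glissade classification itself, and the physical-plausibility checks are omitted.

```
# Sketch of an iteratively adapted peak-velocity threshold.
import numpy as np

def adaptive_peak_threshold(velocity, init=100.0, tol=1.0):
    """Iterate: threshold = mean + 6*std of the sub-threshold samples."""
    pt = init
    while True:
        below = velocity[velocity < pt]
        new_pt = below.mean() + 6.0 * below.std()
        if abs(new_pt - pt) < tol:   # converged on the local noise level
            return new_pt
        pt = new_pt

def candidate_saccades(velocity):
    """Indices where velocity first rises above the adapted threshold."""
    pt = adaptive_peak_threshold(velocity)
    above = velocity > pt
    return pt, np.flatnonzero(above[1:] & ~above[:-1]) + 1
```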

Thursday, August 20, 2009

A geometric approach to remote eye tracking (Villanueva et al, 2009)

Came across this paper today; it's good news and a great achievement, especially since consumer products for recording high-definition video over a plain USB port have begun to appear. For example, the upcoming Microsoft LifeCam Cinema HD provides 1,280 x 720 at 30 frames per second. It is to be released on September 9th at a reasonable US$80. Hopefully it will allow a simple modification to remove the infrared-blocking filter. Things are looking better and better for low-cost eye tracking. Keep up the excellent work; it will make a huge difference for all of us.

Abstract
"This paper presents a principled analysis of various combinations of image features to determine their suitability for remote eye tracking. It begins by reviewing the basic theory underlying the connection between eye image and gaze direction. Then a set of approaches is proposed based on different combinations of well-known features and their behaviour is valuated, taking into account various additional criteria such as free head movement, and minimum hardware and calibration requirements. The paper proposes a final method based on multiple glints and the pupil centre; the method is evaluated experimentally. Future trends in eye tracking technology are also discussed."


The algorithms were implemented in C++ running on a Windows PC equipped with a Pentium 4 processor at 3 GHz and 1 GB of RAM. The camera of choice delivers 15 frames per second at 1280 x 1024. The optimal distance from the screen is 60 cm, which is rather typical for remote eye trackers. This provides a track-box volume of 20 x 20 x 20 cm. Within this area the algorithms produce an average accuracy of 1.57 degrees. A 1-degree accuracy may be achieved if the head is in the same position as it was during calibration. Moving the head parallel to the monitor plane increases the error by 0.2-0.4 degrees, while moving closer or further away introduces a larger error of 1-1.5 degrees (mainly due to camera focus range). Note that no temporal filtering was used in the reporting. All in all, these results are not far from what typical remote systems produce.


The limitation of 15 fps stems from the frame rate of the camera; the software itself is able to process more than 50 images per second on the specified machine. It is left to our imagination what frame rates may be achieved with a fast Intel Core i7 processor with four cores.
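
As background, the sketch below shows the classic pupil-centre/corneal-reflection baseline that remote trackers of this kind build on: the pupil-minus-glint vector is mapped to screen coordinates by a polynomial fitted during calibration. This is not the paper's multi-glint geometric model; the quadratic form and the calibration data are assumptions for illustration.

```
# Polynomial pupil-glint vector to screen mapping (illustrative baseline).
import numpy as np

def fit_gaze_mapping(pg_vectors, screen_points):
    """Least-squares fit of a quadratic map from pupil-glint vector to screen."""
    x, y = pg_vectors[:, 0], pg_vectors[:, 1]
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    coeffs, *_ = np.linalg.lstsq(A, screen_points, rcond=None)
    return coeffs

def estimate_gaze(coeffs, pg):
    x, y = pg
    return np.array([1.0, x, y, x * y, x**2, y**2]) @ coeffs

# Nine-point calibration with made-up pupil-glint vectors (pixels).
pg_cal = np.array([(u, v) for u in (-10, 0, 10) for v in (-8, 0, 8)], float)
screen_cal = np.array([(u, v) for u in (0, 640, 1280)
                       for v in (0, 512, 1024)], float)
coeffs = fit_gaze_mapping(pg_cal, screen_cal)
print(estimate_gaze(coeffs, (4.0, -2.0)))   # estimated screen position
```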


  • A. Villanueva, G. Daunys, D. Hansen, M. Böhme, R. Cabeza, A. Meyer, and E. Barth, "A geometric approach to remote eye tracking," Universal Access in the Information Society. [Online]. Available: http://dx.doi.org/10.1007/s10209-009-0149-0

Tuesday, April 1, 2008

Eye Tracking, Algorithms and Mathematical Modelling (Leimberg & Vester-Christensen, 2005)

Extensive and technical publication on eye tracking algorithms by D. Leimberg and M. Vester-Christensen at the Department of Informatics and Mathematical Modelling at the Technical University of Denmark. The thesis covers a range of approaches and discusses their implementation, and it is rich with illustrations and examples of the results of the various methods.

Abstract
"This thesis presents a complete system for eye tracking avoiding restrictions on head movements. A learning-based deformable model - the Active Appearance Model (AAM) - is utilized for detection and tracking of the face. Several methods are proposed, described and tested for eye tracking, leading to determination of gaze. The AAM is used for a segmentation of the eye region, as well as providing an estimate of the pose of the head.

Among several, we propose a deformable template based eye tracker, combining high speed and accuracy, independently of the resolution. We compare with a state of the art active contour approach, showing that our method is more accurate. We conclude, that eye tracking using standard consumer cameras is feasible providing an accuracy within the measurable range." Download paper as pdf


Following up are two papers (also found at the end of the thesis):
  • M. Vester-Christensen, D. Leimberg, B. K. Ersbøll, L. K. Hansen, Deformable Models for Eye Tracking, Den 14. Danske Konference i Mønstergenkendelse og Billedanalyse, 2005 [full] [bibtex] [pdf]

  • D. Leimberg, M. Vester-Christensen, B. K. Ersbøll, L. K. Hansen, Heuristics for speeding up gaze estimation, Proc. Svenska Symposium i Bildanalys, SSBA 2005, Malmø, Sweden, SSBA, 2005 [full] [bibtex] [pdf]

Wednesday, March 12, 2008

Eye Gaze Interaction with Expanding Targets (Miniotas, Špakov, MacKenzie, 2004)

Continuing on the topic of expanding areas, this paper presents an approach where the expansion of the target area is invisible. The authors introduce an algorithm called "grab-and-hold", which aims at stabilizing the gaze data, and perform a two-part experiment to evaluate it.

Abstract
"Recent evidence on the performance benefits of expanding targets during manual pointing raises a provocative question: Can a similar effect be expected for eye gaze interaction? We present two experiments to examine the benefits of target expansion during an eye-controlled selection task. The second experiment also tested the efficiency of a “grab-and-hold algorithm” to counteract inherent eye jitter. Results confirm the benefits of target expansion both in pointing speed and accuracy. Additionally, the grab-and-hold algorithm affords a dramatic 57% reduction in error rates overall. The reduction is as much as 68% for targets subtending 0.35 degrees of visual angle. However, there is a cost which surfaces as a slight increase in movement time (10%). These findings indicate that target expansion coupled with additional measures to accommodate eye jitter has the potential to make eye gaze a more suitable input modality." (Paper available here)

Their "Grab-and-hold" algorithm that puts some more intelligent processing of the gaze data. "Upon appearance of the target, there is a settle-down period of 200 ms during which the gaze is expected to land in the target area and stay there. Then, the algorithm filters the gaze points until the first sample inside the expanded target area is logged. When this occurs, the target is highlighted and the selection timer triggered. The selection timer counts down a specified dwell time (DT) interval. "

While reading this paper I came to think about an important question concerning the filtering of gaze data. The time it takes to collect the samples used by such an algorithm introduces a delay in the interaction. For example, if I were to sample 50 gaze positions and then average them to reduce jitter, it would result in a one-second delay on a system that captures 50 images per second (50 Hz). As seen in other papers as well, there is a speed-accuracy trade-off to make. What is more important, a lower error rate or a more responsive system?
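
A quick numerical check of that delay argument: the snippet below applies a 50-sample moving average to a simulated gaze jump on a 50 Hz signal and measures how long the filtered output takes to catch up (roughly 0.9 s to reach 90% of the new position with a one-second window).

```
# Latency of a moving-average gaze filter after a step-like gaze shift.
import numpy as np

fs, window = 50, 50                        # 50 Hz tracker, 50-sample average
gaze = np.concatenate([np.zeros(fs), np.full(2 * fs, 10.0)])  # jump at t=1 s
smoothed = np.convolve(gaze, np.ones(window) / window, mode="full")[:len(gaze)]

# Time until the filtered signal reaches 90% of the new position.
jump = fs
settle = np.argmax(smoothed[jump:] >= 9.0) / fs
print(f"{settle:.2f} s after the jump")    # ~0.9 s
```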