CSCE 624

Tuesday, December 14, 2010

Reading #13. Ink Features for Diagram Recognition (Plimmer)

Summary

In this paper, the author presents a study to determine which ink features are statistically best at distinguishing shapes from text. About 1500 strokes were collected and labeled, and the selected features were computed for each. This information was then fed into the R statistical package, which produced a binary decision tree containing the most significant features.

Discussion

This paper provided valuable insight into which features are most useful in distinguishing shape strokes from text strokes. However, a binary decision tree can misclassify a stroke when even a single feature is not drawn in the same manner as in the training data. Once the statistically important features were determined, a linear classifier or some other method may have been preferable.

Reading #12. Constellation Models for Sketch Recognition. (Sharon)

Summary

In this paper, the author describes the system they created which applies a constellation or 'pictorial structure' model to the recognition of strokes in sketches of particular classes or objects. They used a constellation model which was composed of mandatory features and optional features. A maximum likelihood search was then conducted to find the most plausible labeling for all the strokes appearing in the image based on pairwise interactions of the various components of the object.

Discussion

The system presented in this paper uses an idea similar to the one proposed in LADDER to recognize the various components of an object. Defining components by their position relative to other components works in most cases, although these relationships may not always stay the same. For example, a character may be in motion or lying down, so its arms may not always be directly to the sides of the torso. One addition that may help with this problem is to find the orientation of the convex envelope of the object and attempt labeling along this new axis.

Reading #11. LADDER, a sketching language for user interface developers. (Hammond)

Summary

In this paper, the author presents LADDER, the first sketch description language that can be used to describe how shapes and shape groups are drawn, edited, and displayed. In addition, a customizable multi-domain recognition system is built as a proof of concept for the LADDER description language. LADDER allows a creator to define shapes by specifying relationships between various types of primitives. It also allows a developer to define action strokes and how those strokes will modify a shape based on the domain.

Discussion

LADDER is a very useful tool for developing any type of sketch recognition system. Because it defines domain-specific shapes based on the relationships of primitive shapes, it may not be as effective in recognizing more detailed domain shapes. However, most current applications of sketch recognition involve only simple shapes which can mostly be described using this language.

Saturday, December 11, 2010

Reading #10. Graphical Input Through Machine Recognition of Sketches (Herot)

Summary

In this paper, the author introduces a series of smaller programs which, when intertwined, form the basis of a graphical input recognition system named HUNCH. Some basic issues in sketch recognition are discussed, including how to find lines and corners from raw data and methods for dealing with latching and over-tracing. Underlying problems of the bottom-up approach to recognition are also discussed, and a brief exploration is given of how a top-down approach might function.

Discussion

In a modern sketch recognition system, many of the smaller programs implemented here may simply be seen as functions of a complete recognition program. However, since computer resources were limited at the time, breaking the recognition process up into smaller tasks provides great insight into the basic steps to be taken for recognition. It is also interesting that, in order to locate corners, the author utilizes speed and "bentness". "Bentness" later proved to be a very effective means of locating corners, as seen in the ShortStraw method discussed in class, although it may not have been as effective at the time of this writing, since they most likely lacked the computer resources to re-sample a sketch.

Reading #9. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paulson)

Summary

In this paper, the author presents PaleoSketch, a low-level recognizer which performs with almost 99% accuracy when recognizing 8 primitive shapes. PaleoSketch uses many previously employed pre-processing techniques for recognition but also introduces two new important pre-processing features. The normalized distance between direction extremes (NDDE) and the direction change ratio (DCR) are both very helpful in distinguishing curves from corners. An overview of how each primitive shape is recognized with the inclusion of these features is also presented.
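
To make the two features concrete, here is a minimal Python sketch of how they might be computed from a stroke given as a list of (x, y) points. It follows the definitions as I understand them: NDDE is the stroke length between the points of highest and lowest direction value divided by the total stroke length, and DCR is the maximum change in direction divided by the average change in direction. The function names, the angle unwrapping, and the value returned for a perfectly straight stroke are my own assumptions, and the sketch skips any trimming of the stroke's tails, so it should not be read as PaleoSketch's actual implementation.

```python
import math


def cumulative_lengths(points):
    """Arc length from the first point up to each point of the stroke."""
    total = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        total.append(total[-1] + math.hypot(x1 - x0, y1 - y0))
    return total


def directions(points):
    """Direction (radians) of each segment, unwrapped so it varies smoothly."""
    dirs = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0)
        if dirs:
            # Keep consecutive values within pi of each other (assumed convention).
            while angle - dirs[-1] > math.pi:
                angle -= 2 * math.pi
            while angle - dirs[-1] < -math.pi:
                angle += 2 * math.pi
        dirs.append(angle)
    return dirs


def ndde(points):
    """Normalized distance between direction extremes."""
    dirs = directions(points)
    cum = cumulative_lengths(points)
    i_max = max(range(len(dirs)), key=dirs.__getitem__)
    i_min = min(range(len(dirs)), key=dirs.__getitem__)
    return abs(cum[i_max] - cum[i_min]) / cum[-1]


def dcr(points):
    """Direction change ratio: max direction change over mean direction change."""
    dirs = directions(points)
    changes = [abs(b - a) for a, b in zip(dirs, dirs[1:])]
    mean = sum(changes) / len(changes)
    # A perfectly straight stroke has no direction change; 0.0 is an assumed convention.
    return max(changes) / mean if mean > 0 else 0.0
```

On a smoothly sampled arc the direction extremes sit near the two ends, so NDDE comes out close to 1 and DCR stays close to 1. On an L-shaped polyline the direction jumps once at the corner, which pulls NDDE down and pushes DCR up sharply.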

Discussion

This paper introduces two important pre-processing features that prove very useful in primitive shape recognition: the NDDE and the DCR. With these features, the rate at which PaleoSketch returned the correct interpretation as its top interpretation increased by over 30%. These features will most likely be included in any future low-level recognizers developed.

Thursday, December 9, 2010

Reading #8. A Lightweight Multistroke Recognizer for User Interface Prototypes (Anthony)

Summary

In this paper, the author presents the $N Recognizer. This system is built on top of the $1 Recognizer but contains various improvements. $N is capable of recognizing gestures comprised of multiple strokes, recognizing 1D gestures such as lines, and providing bounded rotation invariance to allow for recognition of more symbols. $N also employs two optimization techniques, one based on the starting angle of a gesture and one on the number of strokes in a gesture, the second being optional. Even though $N is slightly more complex than $1, these optimization techniques help it run faster than $1, since both systems are based on template matching.

Discussion

The optimization techniques in $N reduce the number of templates compared, which greatly reduces the run time of the recognition algorithm. Such techniques could be implemented in other template-based recognition systems to reduce the amount of processing required. In order to recognize multi-stroke gestures, $N simply creates multiple templates by connecting the strokes in every possible order. For gestures with three or more strokes, this can greatly increase the number of templates being compared. An alternative method might be to have a single template consisting of only the strokes drawn on the interface and use all possible starting angles when attempting to match to a gesture.
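
The growth in template count is easy to see with a small Python sketch. It joins the strokes of a multistroke into single-stroke templates in every order and, as I understand the $N paper, also in either drawing direction per stroke, which gives n! x 2^n templates for n strokes. The function name and the include_directions flag are hypothetical names of mine, and this is only an illustration of the combinatorics, not the $N implementation.

```python
from itertools import permutations, product


def unistroke_orderings(strokes, include_directions=True):
    """Yield each single-stroke template made by joining the strokes.

    strokes is a list of strokes, each a list of (x, y) points.
    """
    n = len(strokes)
    if include_directions:
        # Every stroke may be traversed forwards or backwards.
        flip_options = list(product((False, True), repeat=n))
    else:
        flip_options = [(False,) * n]
    for order in permutations(strokes):
        for flips in flip_options:
            joined = []
            for stroke, flip in zip(order, flips):
                joined.extend(reversed(stroke) if flip else stroke)
            yield joined


if __name__ == "__main__":
    for n in range(1, 5):
        strokes = [[(float(i), 0.0), (float(i), 1.0)] for i in range(n)]
        print(n, "strokes ->", sum(1 for _ in unistroke_orderings(strokes)), "templates")
```

Two strokes give 8 templates and three give 48, which is why pruning by stroke count or starting angle matters once gestures have more than a couple of strokes.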

Reading #7. Sketch Based Interfaces: Early Processing for Sketch Understanding (Sezgin)

Summary

In this paper, the author describes an early processing system to be used as the basis for higher-level sketch understanding. The system consists of three processing steps: approximation, beautification, and basic recognition. The system allowed users to enter free-hand strokes in a sketch-based input system without the need to switch between drawing modes. Previous works required that a user somehow notify the system whether a line, curve, or arc was being drawn.

Discussion

This paper introduces an interesting method of detecting corners in sketched inputs. Instead of relying only on curvature data as some previous works did, this system employed a combination of curvature and speed data to determine the most likely locations of corners. This new method helped the authors successfully distinguish between different stroke components and implement the rest of their system. Recognizing the basic geometric components of a sketch is a valuable addition for almost all areas of study in which one might use a sketch-based interface.
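
The idea of combining the two signals can be illustrated with a short Python sketch. It flags interior points where the pen is both slower than usual and turning more than usual. The function name, the use of the mean turn angle as the curvature threshold, and the 0.9 factor on mean speed are illustrative assumptions of mine; the paper's full procedure for merging curvature and speed candidates is not reproduced here.

```python
import math


def corner_candidates(points, speed_factor=0.9):
    """Return indices of interior points that are both slow and sharply turning.

    points is a list of (x, y, t) samples with strictly increasing timestamps t.
    """
    n = len(points)
    if n < 3:
        return []
    speeds, turns = {}, {}
    for i in range(1, n - 1):
        (x0, y0, t0), (x1, y1, _), (x2, y2, t2) = points[i - 1], points[i], points[i + 1]
        # Pen speed over the two segments around point i.
        dist = math.hypot(x1 - x0, y1 - y0) + math.hypot(x2 - x1, y2 - y1)
        speeds[i] = dist / (t2 - t0)
        # Turn angle between the incoming and outgoing segments, in [0, pi].
        turn = abs(math.atan2(y2 - y1, x2 - x1) - math.atan2(y1 - y0, x1 - x0))
        turns[i] = min(turn, 2 * math.pi - turn)
    mean_speed = sum(speeds.values()) / len(speeds)
    mean_turn = sum(turns.values()) / len(turns)
    return [
        i
        for i in range(1, n - 1)
        if speeds[i] < speed_factor * mean_speed and turns[i] > mean_turn
    ]
```

Requiring both conditions is the simplest possible combination: slow points on a straight segment and fast, gently curving points are both rejected, which is the kind of false positive that either signal alone tends to produce.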