Tuesday, December 14, 2010

Reading #18: Spatial Recognition and Grouping of Text and Graphics (Shilman)

Summary
In this paper, the author introduces a method for grouping text and graphics basely solely on spatial recognition techniques. The system groups various strokes together which are located close to each other spatially and then searches through different combinations of the strokes by sending each combination to a recognizer to see if a primitive shape can be recognized. Various techniques are used to optimize the search since it can grow rather large as the number of strokes increases. 

Discussion
Using spatial information to group strokes appears to provide the basis for a fairly high recognition rate although the search is not very efficient. In domains where shapes with fewer strokes are more common, it may help to group strokes to a threshold number which is iteratively increased if a match is not found instead of placing a concrete upper bound on the number of strokes. In addition to using spatial information to group strokes, utilizing end point information from the strokes may be another method to reduce the searchable state space. 

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink (Bishop)

Summary
In this paper, the author presents a new approach to distinguishing shape from graphics in sketches. The system uses features of the gaps between the strokes, both temporally and spatially, as well as features of the strokes themselves. The results show that the use of temporal context of the strokes helps in classification of the strokes, although the use of spatial context does not prove as useful.

Discussion
Using temporal data of the start and end of the strokes is a good idea since strokes drawn in groups will usually be all of the same category. Spatial data may not be as useful for classification, but rather it may prove useful in separating groups of strokes which may be drawn quickly after each other yet are sufficiently separated in space and belong to different categories. 

Reading #16: An Efficient Graph-Based Symbol Recognizer (Lee)

Summary
In this paper, the author introduces a new method for graph-based symbol recognition. Each stroke is first recognized as a basic primitive shape and then feature values are computed for each stroke relative to each other stroke in the symbol. The four features used are relative length of a stroke, number of intersections, angle of intersection, and relative position of intersection. 6 error metrics are then weighted and summed to determine the match percentage of an unknown symbol to a template.

Discussion
The author uses four different method to match individual strokes from an unknown symbol to a template symbol. Three of these methods are based on search algorithms and the fourth is a sorting method based on locations of the primitives. While the fourth method was much faster, it often provided less accurate graph matching. Since the fourth method was orders of magnitude faster, it could perhaps be combined with one of the search methods and may help to reduce the run time for the search method if run after the sort method.

Reading #14. Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams (Bhat)

Summary
In this paper, the author describes a new method in which to classify shape strokes and text strokes. Instead of using multiple features as past classifiers have done, the new method relies on the single feature of stroke entropy. Strokes are first grouped based on temporal and spatial information and then stroke entropy is calculated for each group. If the group stroke entropy is above a certain threshold, it is classified as text or classified as a shape if the entropy is below a certain threshold. Anything in between the thresholds is classified as unknown. 

Discussion
The use of entropy to classify text and shape objects appears to be a very useful feature which provided favorable recognition rates. Text recognition rates were quite high, and while shape recognition rates were higher than previous studies, perhaps extra features could be checked when a group of strokes is unable to be classified to determine which category of classification should be favored. 

Reading #13. Ink Features for Diagram Recognition (Plimmer)

Summary
In this paper, the author introduces a study done to determine the statistically best ink features for recognition of shapes versus text. About 1500 strokes of data were collected, labeled and the selected features were computed for each. This information was then fed into the R Statistical Package which provided a binary decision tree containing the most significant features.

Discussion
This paper provided valuable insight into which features are most useful in distinguishing shape stroke from text strokes. However, a binary decision tree can lead to misclassification when only a single a feature is not drawn in the same manner as the training data. Once the statistically important features were determined, the use of a linear classifier or some other method may have been preferable. 

Reading #12. Constellation Models for Sketch Recognition. (Sharon)

Summary
In this paper, the author describes the system they created which applies a constellation or 'pictorial structure' model to the recognition of strokes in sketches of particular classes or objects. They used a constellation model which was composed of mandatory features and optional features. A maximum likelihood search was then conducted to find the most plausible labeling for all the strokes appearing in the image based on pairwise interactions of the various components of the object.

Discussion
The system presented in this paper uses a similar idea to that proposed in LADDER in order to recognize various components of an object. Defining components based on their position compared to other components works for most cases although these relationships may not always remain the same. For example, a character may be in motion or laying down and so their arms may not always be directly to the sides of the torso. One addition that may help with this problem is to find the orientation of the convex envelope of the object and attempt labeling along this new axis.

Reading #11. LADDER, a sketching language for user interface developers. (Hammond)

Summary
In this paper, the author presents LADDER, the first sketch description language that can be used to describe how shapes and shape groups are drawn, edited, and displayed. In addition, a customizable multi-domain recognition system is built as a proof of concept for the LADDER description language. LADDER allows a creator to define shapes by specifying relationship between various types of primitives. It also allows a developer to define action strokes and how those strokes will modify a shape based on the domain. 

Discussion
LADDER is a very useful tool for developing any type of sketch recognition system. Because it defines domain-specific shapes based on the relationships of primitive shapes, it may not be as effective in recognizing more detailed domain shapes. However, most current applications of sketch recognition involve only simple shapes which can mostly be described using this language. 

Saturday, December 11, 2010

Reading #10. Graphical Input Through Machine Recognition of Sketches (Herot)

Summary
In this paper, the introduces a series of smaller programs which when intertwined form the basis of a graphical input recognition system named HUNCH. Some basic issues in sketch recognition are discussed including how to find lines and corners from raw data and how to methods for dealing with latching and over-tracing. Underlying problems of the bottom-up approach for recognition are also discussed and brief exploration is given into how a top-to-bottom approach might function. 

Discussion
In a modern sketch recognition system, many of the smaller programs implemented here may simply be seen as function of a complete recognition program. However, since at the time, computer resources were limited, the process of breaking up the recognition process into smaller tasks provides great insight into the basic steps to be taken for recognition. It is also interesting that in order to locate corners, the author utilizes speed and "bentness". "Bentness" later proved to be a very effective method of locating corners as seen in the Short-Straw method discussed in class. Although it may not have been as effective at the time of this writing since they most likely lacked the computer resources to re-sample a sketch.

Reading #9. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paulson)

Summary
In this paper, the author presents PaleoSketch, a low-level recognizer which performs with almost 99% accuracy when recognizing 8 primitive shapes. PaleoSketch uses many previously employed pre-processing techniques for recognition but also introduces two new important pre-processing features. The normalized distance between direction extremes (NDDE) and the direction change ratio (DCR) are both very helpful in distinguishing curves from corners. An overview of how each primitive shape is recognized with the inclusion of these features is also presented.

Discussion
This paper two important pre-processing features that prove very useful in primitive shape recognition, the NDDE and the DCR. Using these features, the accuracy of PaleoSketch returning the correct interpretation as the top interpretation increased by over 30%. These features will most likely be included in any future low-level recognizers developed.  

Thursday, December 9, 2010

Reading #8. A Lightweight Multistroke Recognizer for User Interface Prototypes (Anthony)

Summary
In this paper, the author presents the $N Recognizer. This system is built on top of the $1 Recognizer but contains various improvements. $N is capable of recognizing gestures comprised of multiple strokes, recognizing 1D gestures such as lines, and providing bounded rotation invariance to allow for recognition of more symbols. $N also employs two optimization techniques based on the starting angle of a gesture and the number of strokes in a gesture, the second technique being optional. Even though $N is slightly more complex than $1, these optimization techniques help to run faster than $1 since both system are based on template matching.

Discussion
The use of optimization techniques in $N to reduce the number of templates compared greatly helps to reduce the run time of the recognition algorithm. Such techniques could be implemented in other template based recognition systems to reduce the amount of processing required. In order to recognize multi-stroke gestures, $N simply creates multiple templates by connecting the strokes in every possible order. For gestures with three or more strokes this can often greatly increase the number of templates being compared. An alternative method might be to have a single template consisting of only the strokes drawn on the interface and use all possible starting angles when attempting to match to a gesture. 

Reading #7. Sketch Based Interfaces: Early Processing for Sketch Understanding (Sezgin)

Summary
In this paper, the author describes an early processing system to be used as the basis for higher level sketch understanding. The system consists of three processing steps: approximation, beautification, and basic recognition. The system described allowed users to enter free-hand strokes in a sketch based input system without the need to switch between drawing modes. Previous works required that a user somehow notify the system whether a line, curve, or arc was being drawn. 

Discussion
This paper introduces an interesting method of detecting corners in sketched inputs. Instead of relying only on curvature data as some previous works, this system employed a combination of curvature and speed data to determine the most likely locations of corners. This new method helped the authors successfully distinguish between different stroke components and implement the rest of their system. Recognizing the basic geometric components of a sketch is a valuable addition for almost all areas of study in which one might use a sketch based interface.