Tuesday, December 14, 2010

Reading #18: Spatial Recognition and Grouping of Text and Graphics (Shilman)

Summary
In this paper, the author introduces a method for grouping text and graphics basely solely on spatial recognition techniques. The system groups various strokes together which are located close to each other spatially and then searches through different combinations of the strokes by sending each combination to a recognizer to see if a primitive shape can be recognized. Various techniques are used to optimize the search since it can grow rather large as the number of strokes increases. 

Discussion
Using spatial information to group strokes appears to provide the basis for a fairly high recognition rate although the search is not very efficient. In domains where shapes with fewer strokes are more common, it may help to group strokes to a threshold number which is iteratively increased if a match is not found instead of placing a concrete upper bound on the number of strokes. In addition to using spatial information to group strokes, utilizing end point information from the strokes may be another method to reduce the searchable state space. 

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink (Bishop)

Summary
In this paper, the author presents a new approach to distinguishing shape from graphics in sketches. The system uses features of the gaps between the strokes, both temporally and spatially, as well as features of the strokes themselves. The results show that the use of temporal context of the strokes helps in classification of the strokes, although the use of spatial context does not prove as useful.

Discussion
Using temporal data of the start and end of the strokes is a good idea since strokes drawn in groups will usually be all of the same category. Spatial data may not be as useful for classification, but rather it may prove useful in separating groups of strokes which may be drawn quickly after each other yet are sufficiently separated in space and belong to different categories. 

Reading #16: An Efficient Graph-Based Symbol Recognizer (Lee)

Summary
In this paper, the author introduces a new method for graph-based symbol recognition. Each stroke is first recognized as a basic primitive shape and then feature values are computed for each stroke relative to each other stroke in the symbol. The four features used are relative length of a stroke, number of intersections, angle of intersection, and relative position of intersection. 6 error metrics are then weighted and summed to determine the match percentage of an unknown symbol to a template.

Discussion
The author uses four different method to match individual strokes from an unknown symbol to a template symbol. Three of these methods are based on search algorithms and the fourth is a sorting method based on locations of the primitives. While the fourth method was much faster, it often provided less accurate graph matching. Since the fourth method was orders of magnitude faster, it could perhaps be combined with one of the search methods and may help to reduce the run time for the search method if run after the sort method.

Reading #14. Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams (Bhat)

Summary
In this paper, the author describes a new method in which to classify shape strokes and text strokes. Instead of using multiple features as past classifiers have done, the new method relies on the single feature of stroke entropy. Strokes are first grouped based on temporal and spatial information and then stroke entropy is calculated for each group. If the group stroke entropy is above a certain threshold, it is classified as text or classified as a shape if the entropy is below a certain threshold. Anything in between the thresholds is classified as unknown. 

Discussion
The use of entropy to classify text and shape objects appears to be a very useful feature which provided favorable recognition rates. Text recognition rates were quite high, and while shape recognition rates were higher than previous studies, perhaps extra features could be checked when a group of strokes is unable to be classified to determine which category of classification should be favored. 

Reading #13. Ink Features for Diagram Recognition (Plimmer)

Summary
In this paper, the author introduces a study done to determine the statistically best ink features for recognition of shapes versus text. About 1500 strokes of data were collected, labeled and the selected features were computed for each. This information was then fed into the R Statistical Package which provided a binary decision tree containing the most significant features.

Discussion
This paper provided valuable insight into which features are most useful in distinguishing shape stroke from text strokes. However, a binary decision tree can lead to misclassification when only a single a feature is not drawn in the same manner as the training data. Once the statistically important features were determined, the use of a linear classifier or some other method may have been preferable. 

Reading #12. Constellation Models for Sketch Recognition. (Sharon)

Summary
In this paper, the author describes the system they created which applies a constellation or 'pictorial structure' model to the recognition of strokes in sketches of particular classes or objects. They used a constellation model which was composed of mandatory features and optional features. A maximum likelihood search was then conducted to find the most plausible labeling for all the strokes appearing in the image based on pairwise interactions of the various components of the object.

Discussion
The system presented in this paper uses a similar idea to that proposed in LADDER in order to recognize various components of an object. Defining components based on their position compared to other components works for most cases although these relationships may not always remain the same. For example, a character may be in motion or laying down and so their arms may not always be directly to the sides of the torso. One addition that may help with this problem is to find the orientation of the convex envelope of the object and attempt labeling along this new axis.

Reading #11. LADDER, a sketching language for user interface developers. (Hammond)

Summary
In this paper, the author presents LADDER, the first sketch description language that can be used to describe how shapes and shape groups are drawn, edited, and displayed. In addition, a customizable multi-domain recognition system is built as a proof of concept for the LADDER description language. LADDER allows a creator to define shapes by specifying relationship between various types of primitives. It also allows a developer to define action strokes and how those strokes will modify a shape based on the domain. 

Discussion
LADDER is a very useful tool for developing any type of sketch recognition system. Because it defines domain-specific shapes based on the relationships of primitive shapes, it may not be as effective in recognizing more detailed domain shapes. However, most current applications of sketch recognition involve only simple shapes which can mostly be described using this language.