CSCE 624: 2010

Tuesday, December 14, 2010

Reading #18: Spatial Recognition and Grouping of Text and Graphics (Shilman)

Summary

In this paper, the author introduces a method for grouping text and graphics basely solely on spatial recognition techniques. The system groups various strokes together which are located close to each other spatially and then searches through different combinations of the strokes by sending each combination to a recognizer to see if a primitive shape can be recognized. Various techniques are used to optimize the search since it can grow rather large as the number of strokes increases.

Discussion

Using spatial information to group strokes appears to provide the basis for a fairly high recognition rate although the search is not very efficient. In domains where shapes with fewer strokes are more common, it may help to group strokes to a threshold number which is iteratively increased if a match is not found instead of placing a concrete upper bound on the number of strokes. In addition to using spatial information to group strokes, utilizing end point information from the strokes may be another method to reduce the searchable state space.

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink (Bishop)

Summary

In this paper, the author presents a new approach to distinguishing shape from graphics in sketches. The system uses features of the gaps between the strokes, both temporally and spatially, as well as features of the strokes themselves. The results show that the use of temporal context of the strokes helps in classification of the strokes, although the use of spatial context does not prove as useful.

Discussion

Using temporal data of the start and end of the strokes is a good idea since strokes drawn in groups will usually be all of the same category. Spatial data may not be as useful for classification, but rather it may prove useful in separating groups of strokes which may be drawn quickly after each other yet are sufficiently separated in space and belong to different categories.

Reading #16: An Efficient Graph-Based Symbol Recognizer (Lee)

Summary
In this paper, the author introduces a new method for graph-based symbol recognition. Each stroke is first recognized as a basic primitive shape and then feature values are computed for each stroke relative to each other stroke in the symbol. The four features used are relative length of a stroke, number of intersections, angle of intersection, and relative position of intersection. 6 error metrics are then weighted and summed to determine the match percentage of an unknown symbol to a template.

Discussion
The author uses four different method to match individual strokes from an unknown symbol to a template symbol. Three of these methods are based on search algorithms and the fourth is a sorting method based on locations of the primitives. While the fourth method was much faster, it often provided less accurate graph matching. Since the fourth method was orders of magnitude faster, it could perhaps be combined with one of the search methods and may help to reduce the run time for the search method if run after the sort method.

Reading #14. Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams (Bhat)

Summary

In this paper, the author describes a new method in which to classify shape strokes and text strokes. Instead of using multiple features as past classifiers have done, the new method relies on the single feature of stroke entropy. Strokes are first grouped based on temporal and spatial information and then stroke entropy is calculated for each group. If the group stroke entropy is above a certain threshold, it is classified as text or classified as a shape if the entropy is below a certain threshold. Anything in between the thresholds is classified as unknown.

Discussion

The use of entropy to classify text and shape objects appears to be a very useful feature which provided favorable recognition rates. Text recognition rates were quite high, and while shape recognition rates were higher than previous studies, perhaps extra features could be checked when a group of strokes is unable to be classified to determine which category of classification should be favored.

Reading #13. Ink Features for Diagram Recognition (Plimmer)

Summary

In this paper, the author introduces a study done to determine the statistically best ink features for recognition of shapes versus text. About 1500 strokes of data were collected, labeled and the selected features were computed for each. This information was then fed into the R Statistical Package which provided a binary decision tree containing the most significant features.

Discussion

This paper provided valuable insight into which features are most useful in distinguishing shape stroke from text strokes. However, a binary decision tree can lead to misclassification when only a single a feature is not drawn in the same manner as the training data. Once the statistically important features were determined, the use of a linear classifier or some other method may have been preferable.

Reading #12. Constellation Models for Sketch Recognition. (Sharon)

Summary

In this paper, the author describes the system they created which applies a constellation or 'pictorial structure' model to the recognition of strokes in sketches of particular classes or objects. They used a constellation model which was composed of mandatory features and optional features. A maximum likelihood search was then conducted to find the most plausible labeling for all the strokes appearing in the image based on pairwise interactions of the various components of the object.

Discussion

The system presented in this paper uses a similar idea to that proposed in LADDER in order to recognize various components of an object. Defining components based on their position compared to other components works for most cases although these relationships may not always remain the same. For example, a character may be in motion or laying down and so their arms may not always be directly to the sides of the torso. One addition that may help with this problem is to find the orientation of the convex envelope of the object and attempt labeling along this new axis.

Reading #11. LADDER, a sketching language for user interface developers. (Hammond)

Summary

In this paper, the author presents LADDER, the first sketch description language that can be used to describe how shapes and shape groups are drawn, edited, and displayed. In addition, a customizable multi-domain recognition system is built as a proof of concept for the LADDER description language. LADDER allows a creator to define shapes by specifying relationship between various types of primitives. It also allows a developer to define action strokes and how those strokes will modify a shape based on the domain.

Discussion

LADDER is a very useful tool for developing any type of sketch recognition system. Because it defines domain-specific shapes based on the relationships of primitive shapes, it may not be as effective in recognizing more detailed domain shapes. However, most current applications of sketch recognition involve only simple shapes which can mostly be described using this language.

Saturday, December 11, 2010

Reading #10. Graphical Input Through Machine Recognition of Sketches (Herot)

Summary

In this paper, the introduces a series of smaller programs which when intertwined form the basis of a graphical input recognition system named HUNCH. Some basic issues in sketch recognition are discussed including how to find lines and corners from raw data and how to methods for dealing with latching and over-tracing. Underlying problems of the bottom-up approach for recognition are also discussed and brief exploration is given into how a top-to-bottom approach might function.

Discussion

In a modern sketch recognition system, many of the smaller programs implemented here may simply be seen as function of a complete recognition program. However, since at the time, computer resources were limited, the process of breaking up the recognition process into smaller tasks provides great insight into the basic steps to be taken for recognition. It is also interesting that in order to locate corners, the author utilizes speed and "bentness". "Bentness" later proved to be a very effective method of locating corners as seen in the Short-Straw method discussed in class. Although it may not have been as effective at the time of this writing since they most likely lacked the computer resources to re-sample a sketch.

Reading #9. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paulson)

Summary

In this paper, the author presents PaleoSketch, a low-level recognizer which performs with almost 99% accuracy when recognizing 8 primitive shapes. PaleoSketch uses many previously employed pre-processing techniques for recognition but also introduces two new important pre-processing features. The normalized distance between direction extremes (NDDE) and the direction change ratio (DCR) are both very helpful in distinguishing curves from corners. An overview of how each primitive shape is recognized with the inclusion of these features is also presented.

Discussion

This paper two important pre-processing features that prove very useful in primitive shape recognition, the NDDE and the DCR. Using these features, the accuracy of PaleoSketch returning the correct interpretation as the top interpretation increased by over 30%. These features will most likely be included in any future low-level recognizers developed.

Thursday, December 9, 2010

Reading #8. A Lightweight Multistroke Recognizer for User Interface Prototypes (Anthony)

Summary

In this paper, the author presents the $N Recognizer. This system is built on top of the $1 Recognizer but contains various improvements. $N is capable of recognizing gestures comprised of multiple strokes, recognizing 1D gestures such as lines, and providing bounded rotation invariance to allow for recognition of more symbols. $N also employs two optimization techniques based on the starting angle of a gesture and the number of strokes in a gesture, the second technique being optional. Even though $N is slightly more complex than $1, these optimization techniques help to run faster than $1 since both system are based on template matching.

Discussion

The use of optimization techniques in $N to reduce the number of templates compared greatly helps to reduce the run time of the recognition algorithm. Such techniques could be implemented in other template based recognition systems to reduce the amount of processing required. In order to recognize multi-stroke gestures, $N simply creates multiple templates by connecting the strokes in every possible order. For gestures with three or more strokes this can often greatly increase the number of templates being compared. An alternative method might be to have a single template consisting of only the strokes drawn on the interface and use all possible starting angles when attempting to match to a gesture.

Reading #7. Sketch Based Interfaces: Early Processing for Sketch Understanding (Sezgin)

Summary
In this paper, the author describes an early processing system to be used as the basis for higher level sketch understanding. The system consists of three processing steps: approximation, beautification, and basic recognition. The system described allowed users to enter free-hand strokes in a sketch based input system without the need to switch between drawing modes. Previous works required that a user somehow notify the system whether a line, curve, or arc was being drawn.

Discussion

This paper introduces an interesting method of detecting corners in sketched inputs. Instead of relying only on curvature data as some previous works, this system employed a combination of curvature and speed data to determine the most likely locations of corners. This new method helped the authors successfully distinguish between different stroke components and implement the rest of their system. Recognizing the basic geometric components of a sketch is a valuable addition for almost all areas of study in which one might use a sketch based interface.

Monday, November 8, 2010

Reading #6: Protractor: A Fast and Accurate Gesture Recognizer (Li)

Summary

In this paper, Li introduces his gesture recognizer named Protractor. It works similar to the $1 Recognizer presented by Wobbrock but is faster and can be made to be orientation-sensitive. Protractor first resamples to N points and then rotates based on the indicative angle or to one of eight orientations if it is specified to be orientation-sensitive. It then calculates the optimal angular distance against each template and returns the top choice of the N-best list of results.

Discussion

Protractor uses the same concepts presented in the $1 Recognizer but also makes some key improvements. Protractor is able to recognize gesture which are orientation-sensitive by only rotating to certain angles and not scaling to avoid distorting the input. It also uses a closed-form solution to find the rotation that leads to the minimum angular distance instead of using the iterative approach utilized in $1. This helps Protractor to produce a higher accuracy in shorter amounts of time.

Reading #5: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (Wobbrock)

Summary

In this paper, Wobbrock introduces his "$1 recognizer". This recognizer is easy to implement compared to other methods of gesture recognizing, such as Rubine's method. The simple algorithm consists of resampling the point path, rotating the gesture based on "indicitave angle", scaling and translating to a reference square, and then finding the optimal angle for the best score. The score is computed as a function of the distance between the template points and the points of the gesture.

Discussion

This paper introduces a simple gesture recognizer that requires less processing to implement than many recognizer before it. The $1 recognizer provided accuracy rates of over 99% for a selection of gestures. However, since it changes the orientation and scale of each gesture that is input, it would not perform well with gestures such as arrows or shapes that may be similar such as squares and rectangles.

Thursday, November 4, 2010

Reading #4: Sketchpad: A Man-Made Graphical Communication System (Sutherland)

Summary

In this paper, the author describes the Sketchpad system. Since the hardware of the time was limited, the author uses a light pen and a combination of push buttons, on and off switches, and turning knobs to input and manipulate data in the system. The system is only able to use lines, circles, and points but the data structure for each was well defined along with the types of constraints that may be used. Several examples of Sketchpad projects were given including bridge drawings, artistic drawings, and electrical circuit diagrams.

Discussion

This paper introduces many ideas that are still used in sketch input systems today such the data structures and constraint systems presented. The idea of copying instances and being able to have sub-pictures within sub-pictures allows for more efficient sketches to be created. The constraint system introduced is very useful for defining components composed of subcomponents.

Wednesday, September 15, 2010

Reading #3: “Those Look Similar!” Issues in Automating Gesture Design Advice (Long)

Summary

In this paper, the author describes their gesture design tool named quill. This system performs user initiated and automatic analysis on different gesture classes to determine which ones may be recognized as similar by either the computer system or a human. The quill system uses the algorithm developed by Rubine to classify gesture classes based on ten to fifteen examples for each gesture class.The paper also discusses various challenges that arise when designing a system to give feedback to designers of gestures. The issues of timing of analysis and feedback, how much feedback, and what kind of feedback should be given are all discussed. The paper finishes by discussing some shortcomings of recognizing human-perceived similarities and how these can be improved in the future.

Discussion

This paper discusses some important issues for any developer creating a system that is meant to provide feedback to its users. The timing and presentation of the feedback given in quill is provided in a way as to minimize disrupting a gesture designer during the creative process. Providing feedback about the machine similarity of gesture classes can be valuable to a new gesture designer who is unfamiliar with the feature set provided by Rubine and the way in which a computer classifies gestures.

Monday, September 13, 2010

Reading #2 Specifying Gestures by Example (Rubine)

Summary
This paper first describes some features of a gesture based drawing program which uses GRANDMA (Gesture Recognizers Automated in a Novel Direct Manipulation) developed by Dean Rubine. The system uses training examples of various gestures to recognizer and classify the input from the user. Several training examples of the "delete" gesture are shown to illustrate the variety possible in a single gesture. Next the features used for gesture recognition are described. These features are used in combination with a linear classifier to determine which class a gesture belongs to. A few example gesture sets are also given along with their respective recognition rates to demonstrate the high accuracy provided by GRANDMA.

Discussion
In 1991, Rubine's GRANDMA system provided a solid framework for gesture recognition based on example gestures. This method proved relatively simple compared to implementing the hand-coded recognizers available at the time. Having a system that can be trained using example gestures allows for easier custom implementations in various types of programs. The feature set chosen by Rubine combined with the linear classifier is sufficient to repeatedly distinguish most common gestures as well as numbers and letters as shown in Figure 9 of the paper.

Wednesday, September 8, 2010

Reading #1 Gesture Recognition (Hammond)

Summary
This article provides a good introduction to the field of gesture recognition and details how it differs from sketch recognition in that a gesture must be drawn in the same manner every time in order to be recognized. An explanation of the 13 features developed by Dean Rubine for gesture recognition is then given. These features form the core of many gesture recognition systems used today such as Christopher Long's Quill system which is explained next. Long's system adds 11 features but disregards the 2 features dealing with time from Rubine's original system. Wobbrock's $1 gesture recognizer is then explained to show a different method of gesture recognition. Instead of relying on features, Wobbrock's method computes distances between sample points to determine which class a gesture belongs to.

Discussion
This reading introduces the reader to a few of the fundamental gesture recognition techniques. For someone who is new to the field, it provides insight into how gesture recognition can be accomplished and gives a feel for what features of a gesture are important when attempting recognition. This is done by exploring some of the very important papers in the field of gesture recognition.

Wednesday, September 1, 2010

Some General Info

1. Photo of yourself.

See profile pic.
2. E-mail address (e.g., yourname at domain.com).

jag427 at gmail dot com

3. Graduate standing (e.g., 3rd year Phd) (e.g., 3rd Year PhD, 2nd Year Masters, 1st Year PhD w/ Masters).

2nd Year M.E.

4. Why are you taking this class?

Curiosity about the field.

5. What experience do you bring to this class?

Not much but plan to leave with plenty.

6. What do you expect to be doing in 10 years?

Working in industry or running a business, preferably the latter

7. What do you think will be the next biggest technological advancement in computer science?

Quantum computing. I don't know much about it but it should be interesting.

8. What was your favorite course when you were an undergraduate (computer science or otherwise)?

Embedded Systems courses

9. What is your favorite movie and why?

Gladiator. It has nothing to do with CS but I could watch that movie all day

10. If you could travel back in time, who would you like to meet and why?

Leonardo da Vinci. I'd give him all my books and course work then see what kind of future I come back to.

11, Give some interesting fact about yourself.

I've never been out of country but I really want to explore the rest of this world.