Tuesday, December 14, 2010

Reading #30: Tahuti: A Geometrical Sketch Recognition System for UML Class Diagrams (Hammond)

Comments

Summary

Sketch recognition is important because it empowers the user with computer editing tools while keeping a natural interface, as with pen and paper. A clear application of this neat combination is Tahuti. In this work Hammond and Davis present a tool for drawing and editing UML diagrams using a pen input device. The work compares this implementation for drawing UML diagrams with a popular UML design application (Rational Rose) and with a popular drawing application (Paint). The program offers two main modes: one where the strokes are beautified after recognition, and one where the strokes are preserved as drawn by the user. The results show that users prefer the interpreted mode of Tahuti over Paint and Rational Rose.
The software relies on several recognition techniques that are explained in the paper, such as corner finding, filtering, text detection (not recognition yet, but planned as future work), grouping, and others. The interface uses strokes both for drawing and for editing commands such as delete and move. Depending on the view mode, the strokes are edited accordingly to maintain consistency of the diagram.

Discussion

For us computer scientists who have to deal with UML diagrams, this is a magnificent application. I associate sketch recognition with diagram drawing since it is a domain where drawing with a marker on a blackboard feels much more natural. I usually draw UML first on a piece of paper or a blackboard and then “beautify it” by redoing it in Rose, Poseidon or any other UML tool. With this application, however, that may no longer be necessary, since hand drawings can easily be transformed into nice UML diagrams. To make this a reality, though, we still have ground to cover, not only in the UI and the recognizers but also in the available hardware: digital blackboards and notebooks are available now, but they are usually either unaffordable or imprecise. Still, I think this is a big step in the right direction.

Reading #29: Scratch Input: Creating Large, Inexpensive, Unpowered and Mobile Finger Input Surfaces (Harrison)

Comments

Chris

Summary

This paper explores an alternative to the classical pen approach to sketch recognition. Instead of using a tablet, this system uses the sound of sketching to recognize gestures. This very novel approach enables users who do not have a tablet or digital pen at hand to interact with a computer using the benefits of sketch recognition. The system uses a microphone coupled to a stethoscope as its input sensor. Based on features of the resulting sound wave, like frequency and amplitude profiles, a particular signature or sound profile can be associated with different gestures. Due to the lack of spatial information in sound, it is very difficult to distinguish shapes that are spatially different but have very similar sound profiles (e.g. M and W). The paper shows several applications of scratch input (walls, mobile devices, tables and even fabric). On a set of simple gestures that are dissimilar enough, the achieved accuracies are around 90%, which is very impressive for an input source that is so limited.
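To get a feel for this kind of sound-only processing, here is a toy sketch of the idea (my own illustration, not Harrison's actual pipeline): it counts bursts of scratching energy in a crude amplitude envelope, which is enough to tell a one-scratch gesture from a two-scratch one. The window size and threshold are made-up values.

```python
def envelope(samples, window=64):
    # crude loudness envelope: mean absolute amplitude per window
    return [sum(abs(s) for s in samples[i:i + window]) / window
            for i in range(0, len(samples) - window + 1, window)]

def count_scratches(samples, threshold=0.1):
    # count rising crossings of the loudness threshold; each burst of
    # energy above the noise floor is treated as one scratch
    bursts, above = 0, False
    for e in envelope(samples):
        if e > threshold and not above:
            bursts += 1
        above = e > threshold
    return bursts
```

A real system would also look at frequency content, but even this amplitude-only count hints at why spatially different gestures with similar sound profiles are indistinguishable.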

Discussion

This paper opens a new door in sketch recognition; in the sketch recognition class, several projects were inspired by this research work. Sound input presents many limitations for detecting complex shapes because the available features are very limited. However, it is good enough to recognize simple gestures, and it provides an inexpensive and portable way of having an extra input source. I like the idea some of the students in this class had of using several microphones as input: this limits portability but improves accuracy, so, for instance, a blackboard with a grid of microphones behind it could be used as an input device to control features of a classroom.

Thursday, December 9, 2010

Reading #28: iCanDraw? – Using Sketch Recognition and Corrective Feedback to Assist a User in Drawing Human Faces (Dixon)

Déjà vu… Oh yeah, it was already posted several days ago. Here it is.

Reading #27: K-sketch: A 'Kinetic' Sketch Pad for Novice Animators (Davis)

Comments

Summary

Animators in the world today have many tools to work with, and the internet has shown that there are very interesting ideas from people of all types of backgrounds, not necessarily in animation. For these novice users, as well as for experienced users who need a fast prototype of an idea, a tool that enables a quick transition between the creative stage and a simple animation is highly desirable. Usually the animation process begins with a storyboard. K-Sketch essentially enables the user to create a sketched storyboard that can be animated immediately, allowing the user to better visualize the idea.
The paper discusses the implementation of K-Sketch as well as its user interface. Studies took place to determine the optimal set of operations in the UI. The resulting tool was tested and compared to MS PowerPoint as a reference for simple animation. The users preferred K-Sketch, as it offered a more natural way of animating objects.

Discussion

In fact, animation tools today still have a relatively steep learning curve, and they don’t feel as natural as storyboards drawn with pen and paper. In my case, I used to spend time in middle school doing sketches in the top corner of notebooks and flipping the pages rapidly to “animate” them. Of course the task was tedious, but hey, what else is there to do in middle school? I guess kids now can instead draw a simple sketch on their tablets and enjoy the magic of animation using tools like K-Sketch in a few very simple steps. I very much like this paper’s line of investigation: creating magic with a pen is made easier.

Reading #26: Picturephone: A Game for Sketch Data Capture (Johnson)

Comments

Summary

This is a more detailed and particular description of the Picturephone game described in reading #24. The Picturephone game consists of four consecutive steps:

  • Player A is given a text description, and must make a drawing that captures that description as accurately as possible.
  • Player B receives the drawing and describes it in words.
  • Player C is given Player B’s description and draws it.
  • An unrelated player Player D is asked to judge how closely Player A and C’s drawings match, which assigns a score to players A, B, and C.

This makes it possible to collect information on both word-to-sketch and sketch-to-word interpretation. The implementation allows researchers to use the collected data for their own purposes, such as training recognizers.

Discussion

Whooop! Short paper! However short, it gets to the point and accurately describes the application. The Picturephone game is slower than Pictionary-like games, which can make it boring and not as fun. However, it may produce more reliable labels, as it rewards the accuracy of the pictures and the descriptions. It serves as a nice complement for users who enjoy this type of slow-paced game.

Reading #25: A Descriptor for Large Scale Image Retrieval Based on Sketched Feature Lines (Eitz)

Comments

Summary

The Internet changed the way we use and retrieve information; now the problem is that there is so much information that efficient queries are necessary. Search engines like Google have done an amazing job at searching text. However, searching for images is not always easy, particularly when the images in the database do not contain metadata. The paper proposes a way of searching images based on sketches of the desired image. The problem here is that the processing of each image has to be simple enough to allow the query to return results in feasible time. The contribution is a descriptor based on structure tensors that allows an easy and efficient implementation, obtaining query results quickly over a large database (less than 4 seconds for a database of 1.5 million images).
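As a rough illustration of the general idea (a much simpler stand-in for the paper's structure-tensor descriptor, with a made-up bin count), one can summarize local edge orientations in a normalized histogram and compare a sketch against database images by histogram distance:

```python
import math

def orientation_histogram(gradients, n_bins=4):
    # gradients: (gx, gy) pairs for one image cell; build a normalized
    # histogram of edge orientations, weighted by gradient magnitude
    hist = [0.0] * n_bins
    for gx, gy in gradients:
        theta = math.atan2(gy, gx) % math.pi   # orientation, sign-invariant
        b = min(int(theta / (math.pi / n_bins)), n_bins - 1)
        hist[b] += math.hypot(gx, gy)
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def descriptor_distance(d1, d2):
    # squared Euclidean distance between two (concatenated) histograms
    return sum((a - b) ** 2 for a, b in zip(d1, d2))
```

The point of such a descriptor is that it is precomputed once per database image, so answering a query is only a cheap distance scan.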

Discussion

This application shows a really interesting way of doing queries. The results are very impressive in terms of time and accuracy. The resulting images are not always as expected (drawing a building may return a railroad), but this is not necessarily a bad thing, as it actually gives the user the feedback that they drew a building that looks like a railroad. I don’t think sketch queries will replace text queries for images, as in most cases it is easier to search for “tree” than to draw a tree. However, there are multiple cases where this can be a very nice option, as when a very abstract picture is searched for, or the user is not really sure how to describe the image they clearly have in mind. This also allows result images that do not have adequate metadata, and enables multilanguage search (if I search for “tree”, an image with the metadata “árbol” will most likely be discarded even though they mean the same thing).

Reading #24: Games for Sketch Data Collection (Johnson)

Comments

Summary

We have seen in many of the previous posts that an important part of sketch recognition is data collection. It serves to analyze features, train recognizers and test algorithms. However, data collection is sometimes tedious, and it is difficult to get external users to help with this task. In this paper they present an alternative where games are employed to collect such data. They present two particular games used to gather the data: Picturephone and Stellasketch. The first emulates the broken-telephone game: one player depicts in a drawing what was said in words, and then a second player puts into words what the first depicted. Then other players try to sketch what the second player wrote, so the sketches can be compared. In Stellasketch, a Pictionary-like game is implemented where one player draws and all the others try to guess what the picture is. The games easily produced over 400 labels and 105 sketches. The future directions section also presents interesting ideas, including a nice application of the collected data: a sketch-based CAPTCHA where users draw a common object.

Discussion

Games are fun, so having a good game for data collection is a very interesting idea. However, this raises a problem with the reliability of the data. Usually serious is not fun, so in games like these it would be easy to find “funny labels” that make you laugh but are not accurate. They even show this in their screenshots (e.g. “What is that man doing to that horse?”). However, this can be overcome with enough samples to be able to detect false labels.

Reading #23: InkSeine: In Situ Search for Active Note Taking (Hinckley)

Comments

Summary

One of the most popular applications of sketch recognition today is note taking, thanks to popular devices and applications such as the Palm Pilot and OneNote. In this paper they present a very nice application where the user can dynamically take notes and embed dynamic content within the notes without having to change the workspace context or the tool being used. This enables a continuous workflow without interruption. The key part of the application is the ink search, where the user may write an ink note and, using gestures, indicate that the selected strokes are text to be searched in a particular context, which is also selected via gestures. Thus, without having to leave the canvas, the user can create rich content over the ink notes.
The paper explains the use-case scenarios of the system in detail and gives a good impression of its usage. They do not, however, go deep into the recognition techniques employed for text recognition or gestures. After two iterations of their work, the authors found that a plausible and usable system can be implemented using the InkSeine technology.

Discussion

This tool provides an important improvement in note-taking applications for tablet PCs. Most of the ink note-taking applications found today either don’t provide many of the features and advantages achievable through recognition, or have a complex user interface that makes note taking unnatural. The learning curve of this application seems small enough for novice users, while still providing the ability to use embedded and rich content. I really like the idea and the UI; however, I would have liked to see some numbers on accuracy in the results, since the whole experience can become very frustrating if the ink is not recognized correctly.

Reading #22: Plushie: An Interactive Design System for Plush Toys (Mori)

Comments

Summary

Plushie is a tool very similar to Teddy. However, Plushie is designed to use the 3D model as an intermediate step; the real output is a 2D pattern that allows the user to sew a stuffed animal that resembles the sketched model. The resulting technique has many advantages over the traditional process for creating 2D sewing patterns. The continuous feedback between the sketched 2D form and the 3D model allows the user to adjust the 2D pattern in simulation before spending time and money making physical prototypes.
The algorithms and techniques behind Plushie are very similar to those in the previous paper: a 3D mesh is created based on the user’s input strokes. However, this enhanced interface is designed for the particular application of sewing and displays the resulting cloth pattern in real time.

Discussion

This is a very interesting idea and application. It can be really useful and fun to design stuffed animals using this program. Moreover, the simulation gives several advantages, as users can adapt the model while seeing the resulting 3D shape and at the same time make sure they follow the appropriate cloth-pattern constraints (area of the cloth, number of parts…). I think the system successfully opened a door into a very different domain.

Reading #21: Teddy: A Sketching Interface for 3D Freeform Design (Igarashi)

Comments

Summary

In this paper they introduce the concept of free sketching in 2D to create 3D shapes in an easy way. Teddy is an application that uses pen input devices to allow users to sketch and interact in 2D space to create a 3D polygonal surface in a more creative manner. Unlike most 3D modeling tools, Teddy allows easy creation of freeform, sketchy 3D models, which makes it ideal for fast prototyping and new users. The project uses recognition both for sketching and for gesture commands.
A novel user interface converts basic strokes into 3D shapes that can be rotated and edited. The editing commands include extruding, smoothing and cutting. The final result is a 3D mesh that can serve as input to the many tools available for 3D rendering and processing. The implementation was made in Java and exposed to the public for user studies.

Discussion

The application is very interesting and novel, and it allows creative users to approach 3D computer modeling more comfortably. The paper gives good insight into what the application is capable of and has a detailed explanation of the user interface and the resulting output. However, I was expecting more implementation details in terms of gesture and sketch recognition, and it also lacks conclusions about its achievements. I think the ideas were good enough to expand more on sections 6-8 of the paper.

Wednesday, December 1, 2010

Reading #20: MathPad2: A System for the Creation and Exploration of Mathematical Sketches (LaViola)

Summary

MathPad2 is a very nice sketch application that attempts to enrich the experience of doing math on a tablet by animating the components drawn in the sketch. The paper focuses on the prototype application, which involves shape and gesture recognition that enables the user to interact with the equations and the pictures they model. One of the claimed contributions of MathPad2 is the novel gesture recognition used in the interface, which is said to be more general and to work fluently across several domains, such as math writing and diagram drawing.

Discussion

This is one of those applications that encourage us to keep working on sketch recognition. All of us who have dealt with an equation at some point in our lives can value the power of receiving live feedback. Furthermore, if that can be done in a completely natural interface that feels like the pen and paper we used in school, even better. The interface they show has simple yet very powerful ideas for managing gestures accurately and easily. I like the tap after the command, as it is natural and easy to use but avoids the annoying false positives usually found in gesture recognizers.

Reading #19: Diagram Structure Recognition by Bayesian Conditional Random Fields (Qi)

Comments

Summary

This is a top-down recognizer that relies heavily on context to determine the correct classification of each stroke. In this case a model of Bayesian Conditional Random Fields is used to classify the strokes: each stroke that is classified affects the classification of its neighbors. The paper provides a deep mathematical background for the model and compares it to others. The first step in recognition is to fragment the strokes in order to create the Bayesian CRF. Note that a fragment here is defined differently than in other papers: it is not the line formed by every two consecutive points in a stroke, but the set of points in the stroke that could be recognized as a straight line. This implies corner detection, as seen in previous posts. Then they can construct the BCRF and train it to make inferences on the network. The results show different classifications on variations of the CRF, showing that the BCRF behaves better in recognition. An improvement, Automatic Relevance Determination, makes recognition even better.

Discussion

A nice thing about this work is that it takes a concept from another field (computer vision) and applies it successfully to the domain of sketch recognition. It is not the first time we see this phenomenon. Since sketch recognition is such an open field at the moment, many works attempt, successfully or not, to convert the sketch recognition problem into a more familiar one (fuzzy logic, graph searches, HMMs…). In this case, Bayesian Conditional Random Fields show interesting results in this domain.

Reading #18: Spatial Recognition and Grouping of Text and Graphics (Shilman)

Comments

Summary

Once again, this paper focuses on separating text and shapes. In this case a general approach is taken, based mainly on spatial features of the ink. A graph is built that relates each stroke in the sketch to its neighboring strokes. Strokes that are grouped close together can then be identified by different recognizers. The novel approach here is that grouping and recognition are done in parallel, such that the recognizer can judge whether a grouping was good or not, and in the latter case another grouping can be tried. Once a sketch is represented as a graph, many of the usual algorithms in graph theory can be used; in this case an A* search is used to optimize the grouping of the strokes. The results show grouping accuracies of 90% and recognition with grouping of 85%.
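A toy version of the first step, building the stroke graph from spatial proximity, can be sketched as follows (my own simplification; the paper's neighborhood definition and the A* search over groupings are more involved, and the max_gap threshold here is arbitrary):

```python
import math
from itertools import combinations

def bbox(stroke):
    # axis-aligned bounding box of a stroke given as (x, y) points
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return min(xs), min(ys), max(xs), max(ys)

def bbox_gap(a, b):
    # gap between two bounding boxes (0 if they overlap)
    ax0, ay0, ax1, ay1 = bbox(a)
    bx0, by0, bx1, by1 = bbox(b)
    dx = max(bx0 - ax1, ax0 - bx1, 0)
    dy = max(by0 - ay1, ay0 - by1, 0)
    return math.hypot(dx, dy)

def neighborhood_graph(strokes, max_gap=20.0):
    # connect strokes whose bounding boxes lie within max_gap; connected
    # strokes become candidate groups for the recognizer to judge
    adj = {i: set() for i in range(len(strokes))}
    for i, j in combinations(range(len(strokes)), 2):
        if bbox_gap(strokes[i], strokes[j]) <= max_gap:
            adj[i].add(j)
            adj[j].add(i)
    return adj
```

The search then explores ways of merging neighboring strokes, scoring each grouping by how confidently a recognizer can label it.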

Discussion

A very nice feature of this recognizer is that it does not require hand-coded heuristics. This is very useful for a general recognizer that can be applied to many domains. However, as is usually the case, the generalization comes at the price of lower accuracy: other recognizers that are fine-tuned for particular domains show better accuracy. Still, this is a very good starting point, and the grouping idea can be exploited in similar recognizers.

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink (Bishop)

Comments

Summary

As in readings 13 & 14, this paper also addresses the problem of discerning between shape and text. The approach here is somewhat different from the previous posts, as this one uses not only features of the stroke but also characteristics of the sketch, like gaps between strokes. Starting from an independent stroke model, features are extracted as in the other works to allow classification using cross-entropy. In a later step, machine learning techniques are used to take into account other important properties of the stroke’s context to improve classification. In particular, a Hidden Markov Model is used to represent the sketch, and algorithms are run on it to find the optimal labeling for each stroke. The results shown are based on confusion matrices; they are not easily comparable to other recognizers but show internal differences among the uses of context (independent vs. uni-partite or bi-partite HMM).
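The dynamic-programming step for finding the optimal labeling can be pictured with a generic Viterbi decoder over a linear chain of strokes (a textbook version, not the paper's exact uni-/bi-partite model; the label set and scores are illustrative log-likelihoods):

```python
def viterbi(obs_scores, transition):
    # obs_scores: per-stroke dict of log-likelihoods for each label;
    # transition: log-probabilities transition[(prev_label, cur_label)]
    labels = ("text", "shape")
    best = {l: obs_scores[0][l] for l in labels}   # best score ending in l
    back = []                                      # backpointers per step
    for scores in obs_scores[1:]:
        new, ptr = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[p] + transition[(p, cur)])
            new[cur] = best[prev] + transition[(prev, cur)] + scores[cur]
            ptr[cur] = prev
        best = new
        back.append(ptr)
    # follow backpointers from the best final label
    last = max(labels, key=lambda l: best[l])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

With a slightly "sticky" transition model, an ambiguous stroke inherits the label of its confident neighbors, which is exactly the contextual effect the paper is after.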

Discussion

This paper presents another technique for classifying text vs. shape. Although the results are not very clear in terms of the recognizer’s accuracy in different domains, the concept of using context is very interesting. And even if their results cannot be trivially compared to others, they show improvements from using context. As a matter of fact, the intuition of how humans distinguish shape from text in an apparently natural way relies heavily on context. For instance, the shape O in this paragraph would be classified by any normal person as the letter O, but in another context that same O would clearly be classified as a wheel, depending only on context. (See fig below).

Reading #16: An Efficient Graph-Based Symbol Recognizer (Lee)

Comments

Summary

In this paper a recognizer is presented that bases its recognition on the topology of the sketched symbol and the relationships between its primitives. The work resembles LADDER in the sense that geometric primitives are extracted and recognition is based on the relationships between these primitives. However, one important difference is that LADDER proposes a language for describing shapes and symbols; instead, this recognizer represents the primitives and their relationships in an attributed relational graph, and then compares this graph with stored template graphs. The resulting accuracies are not particularly high compared with similar work at the time of publication, but some advantages are presented.
The first problem this recognizer addresses is the modeling of the sketch as an attributed relational graph (ARG). This is a crucial step for the recognizer, and it is not trivial because it inherits all the problems of primitive shape recognition (corner finding, noise reduction, arcs vs. polylines…). Fortunately, much work has been done in this area, so relatively accurate primitive finders can be used. Then some features, like similarity and error, are described for use in a graph matcher. Several graph-matching techniques can be used; four of them are presented and compared in this paper.

Discussion

Although the accuracy results are not the best among the recognizers presented on this blog, this approach presents several advantages over other recognizers that are worth a look. Compared to Kara and other template-based recognizers, for example, this one still works under non-uniform scaling; compared to LADDER, it presents the advantage of training by example, although at the expense of some drop in accuracy. For some domains this recognizer might be an interesting choice, as it is easy to train and robust for several types of symbols.

Thursday, November 18, 2010

Reading #15 An Image-Based, Trainable Symbol Recognizer for Hand-drawn Sketches (Kara)

Comments

Whenze

Summary

This text describes a symbol recognizer that can be easily trained from single examples of the symbols. The recognizer is based on template matching, like other recognizers discussed before on this blog. Its novel approach relies on two basic aspects: the combination of multiple template-matching recognizers to provide a more accurate final recognition result, and rotation invariance achieved by transforming the unknown symbol into polar coordinates, which greatly reduces the rotation-invariance processing time compared with other techniques. The accuracy reported is remarkably high, and the recognizer has a very good response time.
The recognizer begins with a pre-recognition step based on the transformation of the unknown shape into polar coordinates, where the stored templates that are too dissimilar to the presented shape can be pruned away. Thereafter, the remaining templates are compared using four different methods (Hausdorff distance, modified Hausdorff distance, Tanimoto similarity coefficient, and Yule coefficient); these classifiers return their most likely shapes, and a final module standardizes each output and combines them to form a final decision.
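The payoff of the polar representation is easy to see in a small sketch (my own simplified reconstruction, not Kara's exact method; the 64-step search grid is an arbitrary choice): rotating a symbol about its centroid leaves every point's radius unchanged and only shifts its angle, so finding the best alignment becomes a cheap one-dimensional search.

```python
import math

def to_polar(points):
    # translate to the centroid, then express each point as (r, theta)
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx))
            for x, y in points]

def best_rotation(unknown, template, steps=64):
    # rotation only shifts theta, so scan candidate angular shifts and
    # keep the one minimizing a polar mismatch (r fixed, theta wrapped)
    def cost(shift):
        err = 0.0
        for (r1, t1), (r2, t2) in zip(unknown, template):
            dt = (t1 + shift - t2 + math.pi) % (2 * math.pi) - math.pi
            err += (r1 - r2) ** 2 + (min(r1, r2) * dt) ** 2
        return err
    return min((k * 2 * math.pi / steps for k in range(steps)), key=cost)
```

Templates whose minimal mismatch stays large can then be pruned before the expensive Hausdorff-style comparisons are run.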

Discussion

This paper presents interesting results based on existing techniques. I think the major contribution is the pre-recognition based on the polar transformation of coordinates; this smart idea converts the problem of rotation into a more natural domain, making it easy to handle. Many advantages come from template matching, as explained in the paper, such as support for overtracing and dashed lines. However, shapes that are not always drawn with the same geometric proportions may be a problem (e.g. arrows); nevertheless, for a standard symbol domain this is usually not the case, so it is a very good option.

Wednesday, November 10, 2010

Reading #28 iCanDraw? – Using Sketch Recognition and Corrective Feedback to Assist a User in Drawing Human Faces (Dixon)

Comments on others

Chris

Summary

iCanDraw is the first application that uses sketch recognition to assist the user in learning how to draw. Although most of the algorithms and techniques in this paper are not new, there is a major contribution in opening a new field of application for sketch recognition: they show sketch recognition can be of great use in this kind of application. The results, going through two iterations of the application, reveal that such an application is feasible, and although many more studies have to be done to prove it is an effective teaching tool, the end-to-end system is now available to begin such studies. Another important result of the paper is the set of design principles, obtained from the user study, for this kind of free-sketch assisted-drawing application.
The user interface of the application is remarkably well executed. After a first iteration and a deep analysis of it, many mistakes and weaknesses were detected and corrected, so that the final version of the interface is very user oriented and can give a much more effective teaching experience. Each face template goes through a face recognizer to extract its most prominent features, and then some hand corrections are made to finally arrive at a template of the ideal face sketch. The recognition is then mostly template-matching oriented. Some gesture recognition is also used as part of the interface for actions such as erasing or undoing.

Discussion

The work presented opens a very interesting field of application for sketch recognition. In the sketch recognition class, a project about how to draw an eye is one of the possible descendants of this project. I think one of the major challenges in this field is to determine the appropriate amount and quality of the feedback given to the user. If the user is forced to draw too close to the template, the experience can be frustrating, but if it is too loose, the improvement in drawing might be poor. A solution might be having several difficulty levels in different lessons.

Wednesday, November 3, 2010

Reading #14. Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams (Bhat)

Comments

Jonathan

Summary

This paper also addresses the problem of discerning shape from text. Unlike the paper in the previous post, this recognizer does not attempt to use a lot of features; instead, it uses one single feature to split shape vs. text. Entropy proved to be a very distinctive feature between shape and text. Entropy is a measure of the uncertainty associated with a random variable; it is, in other words, the randomness of an object or system. Basically, this gives the intuition that text is far more random than simple shapes. In order to measure this randomness in a sketch, several steps were followed. First, the strokes were grouped on a time basis. Then, the sketch was resampled to leave every point in each stroke at the same fixed distance, and the angle at each joint was measured. With this angle, each joint was classified into one of seven possible labels, and with this classification the overall entropy of the shape could be calculated using the Shannon entropy formula.

Results show that this single feature is even better at differentiating shape vs. text than the combination of features used by Plimmer. It achieved an accuracy of 95.56% with 77.51% of the shapes classified (some were left unclassified).
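A minimal sketch of the entropy computation on an already-resampled stroke (my own reconstruction; the paper's exact angle binning may differ): a straight line yields zero entropy, while a stroke with varied turns scores higher, capturing the text-is-more-random intuition.

```python
import math
from collections import Counter

def joint_angles(points):
    # turning angle at each interior joint of a resampled polyline
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        angles.append(abs((a2 - a1 + math.pi) % (2 * math.pi) - math.pi))
    return angles

def stroke_entropy(points, n_bins=7):
    # bin each turning angle into one of n_bins labels, then take the
    # Shannon entropy of the label distribution
    labels = [min(int(a / (math.pi / n_bins)), n_bins - 1)
              for a in joint_angles(points)]
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())
```

A classifier then only needs a threshold on this one number to call a stroke group text or shape.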

Discussion

This paper found a single feature that is very important for classifying shape versus text. I think it is interesting that the paper analyzed the use of entropy by itself, to be able to prove the power of this feature. However, in a real classifier I would rely on more than this one feature, to be able to detect some of the cases not analyzed in this paper: for instance, musical notes, where entropy alone fails but, combined with other features like density, can discriminate accurately. Other techniques may also aid the more general process; for instance, wrong grouping, which relies on time only, can affect the whole classification. If other techniques, such as growing boxes, could detect and recover from a wrong grouping, this could become a more robust classifier.

Tuesday, November 2, 2010

Reading #13. Ink Features for Diagram Recognition (Plimmer)

Comments

Danielle

Summary

This paper addresses the issue of selecting the right set of features for sketch recognition. Since Rubine, feature-based sketch recognizers have become very popular, yet the set of features used is somewhat empirical in each case. This paper proposes a more formal method to select the most relevant features that will lead to accurate and fast recognition for a certain domain. In this case the feature selection is applied to the problem of differentiating shape versus text. For each of the sample shapes, 46 features are extracted and a statistical partitioning technique is employed to find the most relevant ones. The aim is to find the optimal position of a split for each feature such that there is a minimal number of misclassified strokes. After doing this for all features, the most selective and important features can be used to build a binary classification or decision tree, as shown below. The results were compared with two other classifiers, achieving overall better classification than both.
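The core split search can be pictured as a decision stump over a single feature (a minimal sketch of the idea, not Plimmer's actual partitioning code): try every threshold between consecutive feature values and keep the one that misclassifies the fewest strokes.

```python
def best_split(values, labels):
    # values: one feature value per stroke; labels: True = text, False = shape.
    # Returns (misclassified count, threshold, orientation), trying both
    # orientations: text above the split ("gt") or below it ("le").
    candidates = sorted(set(values))
    best = (len(values) + 1, None, None)
    for lo, hi in zip(candidates, candidates[1:]):
        thr = (lo + hi) / 2
        err_gt = sum((v > thr) != lab for v, lab in zip(values, labels))
        err_le = len(values) - err_gt   # the opposite orientation
        if err_gt < best[0]:
            best = (err_gt, thr, "gt")
        if err_le < best[0]:
            best = (err_le, thr, "le")
    return best
```

Building the full decision tree then just recurses: take the winning feature and threshold, split the strokes, and repeat on each side.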

Discussion

Perhaps the major contribution of this paper is that it formally analyzes a way of selecting features, and that it includes a complete feature set containing some new features and some of the most representative ones found in the literature. The decision tree structure is likely to misclassify strokes in certain cases, which makes it difficult to believe it will ever achieve perfect accuracy without the aid of geometric interpretation or other means of recognition. However, this method can be very fast, and for most practical purposes it provides a reliable classification.

Tuesday, October 19, 2010

Reading #12. Constellation Models for Sketch Recognition. (Sharon)

Comments on others

Danielle

Summary

In this paper a constellation model, or 'pictorial structure', is used to aid in recognition. This basically means that recognition is based not only on the features of individual shapes but also on the context around them. The distance and position of each recognized subshape relative to the others becomes important in labeling it as one thing or another. Some of the shapes are mandatory and some are optional: in the example of a constellation model of a face, the mouth, eyes and nose are mandatory and the ears are optional, leading to the model shown in the figure below. Individual and pairwise features are calculated to perform recognition, but the pairwise features are only calculated between mandatory parts to reduce time complexity. Also, mandatory labels are assigned first in order to provide better context for the usually larger number of optional parts.

Discussion

This paper is a very good example of the use of context in sketch recognition. In this case shape labeling is made not only based on the geometric features of individual shapes but also on how they are located relative to each other. I also find it interesting that it is highly inspired by computer vision, which allows both worlds to share techniques and algorithms.

Sunday, October 17, 2010

Reading #11 LADDER, a sketching language for user interface developers. (Hammond)

Comments on others

Jonathan

Summary

LADDER is a language that allows describing shapes in a high-level, almost natural language, and then from this description automatically generates sketch-based interfaces. These interfaces allow the user to draw shapes in a natural way, and these shapes will be recognized, beautified, and made editable as described in the LADDER language. The paper covers some related work, but it seems there is nothing really like LADDER to be formally compared against; as a complete system it is very innovative. Thus its contributions are very important to the field of sketch recognition, introducing the need for higher-level languages to quickly generate sketch-based interfaces and to reach a wider audience when it comes to the development of such interfaces.
LADDER does not intend to be a universal sketch recognizer builder; it is focused on diagram-like sketch interfaces with a fixed graphical grammar. A shape in LADDER is defined by its components, constraints, aliases, editing behaviors and display methods. Many features in LADDER allow easy description of these attributes and methods: hierarchical shape definition, abstract shapes and shape groups. This comes along with useful predefined shapes, constraints and display methods. Predefined shape beautification is also available, based on the specified constraints: using equation solving, an ideal shape can be extracted out of a rough sketch.


(define shape OpenArrow
  (description "An arrow with an open head")
  (components
    (Line shaft)
    (Line head1)
    (Line head2))
  (constraints
    (coincident shaft.p1 head1.p1)
    (coincident shaft.p1 head2.p1)
    (coincident head1.p1 head2.p1)
    (equal-length head1 head2)
    (acute-meet head1 shaft)
    (acute-meet shaft head2))
  (aliases
    (Point head shaft.p1)
    (Point tail shaft.p2))
  (editing
    ((trigger (click_hold_drag shaft))
     (action
       (translate this)
       (set-cursor DRAG)
       (show-handle MOVE tail head)))
    ((trigger (click_hold_drag head))
     (action
       (rubber-band this head tail)
       (show-handle MOVE head)
       (set-cursor DRAG)))
    ((trigger (click_hold_drag tail))
     (action
       (rubber-band this tail head)
       (show-handle MOVE tail)
       (set-cursor DRAG))))
  (display (original-strokes)))

Discussion

By the time of this paper LADDER was still an early prototype considering its potential. The idea of a language able to describe shapes, from which sketch recognizers for a particular domain can be automatically generated, is very powerful. The results and prototypes obtained with LADDER already show this power, and also show that LADDER itself is a very good implementation of the idea. One of the usual tradeoffs in this kind of high-level language is between ease of use and scope: as more complex domains are included, either the language falls short of expressiveness or it becomes more complex. However, since the scope of LADDER is limited, the language is easy and natural enough for most developers to use, and yet covers a very high range of its scope, if not all of it.

Reading #10. Graphical Input Through Machine Recognition of Sketches (Herot)

Comments on others

Chris

Summary

This is one of the early papers in free-hand sketching. It presents the HUNCH system, a set of software programs that process a sketch in order to advance towards a general sketch recognizer. The work does not focus on a particular domain but rather tries to explore techniques that may work for several domains. Amongst these techniques they explore corner finding, latching, use of context and overtracing. Many of these techniques are now in common use in sketch recognition. The paper emphasizes involving the user heavily in agreeing on the machine's interpretation, in contrast to having a fully automated machine that probably does not really reflect the user's intentions.

Discussion

This paper was published in 1976, yet it already covers many of the aspects of preprocessing in sketch recognition that are widely used today. For instance, the corner finding algorithm used by STRAIT is very similar to the one presented by Sezgin in 2006, as both use curvature and speed. Although the paper is rather general and does not go really deep into the specifics of solving the problems found in recognition, given its early publication date it most likely inspired many of the papers published more recently.

Thursday, September 30, 2010

Reading #9. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paulson)

Comments on others

Kim

Summary

PaleoSketch is a primitive shape recognizer for free sketching. Unlike feature-based gesture recognizers, it allows the user to draw freely. The intention of the recognizer is to match each single stroke to a basic primitive shape (line, polyline, circle, ellipse, arc, curve, spiral, helix). The main contributions are:
  • Extending the set of recognizable shapes, particularly by differentiating ellipses from circles and arcs from curves, and also by recognizing more complex shapes such as spirals and helixes.
  • Support for overtraced sketches.
  • The introduction of two new features (Normalized Distance between Direction Extremes, NDDE, and Direction Change Ratio, DCR).
  • A ranking algorithm to find the best fit for each shape.
The authors base the algorithm on a combination of previous methods, particularly in the pre-recognition stage, where some processing is applied to the stroke and several features are calculated. Then several fit tests are run for each primitive, and “interpretations” of the stroke are attached with their corresponding confidence. Finally, the best fit is found with the ranking algorithm. The results show almost perfect accuracy and real-time response. Some examples are shown of higher-level work built on top of Paleo, leaving room for promising future work.
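As a rough illustration of the best-fit idea: each fit test yields a pass/fail plus a confidence, and the ranking prefers simpler primitives. The hierarchy ordering and tie-breaking below are my assumptions for the sketch, not Paulson's exact ranking.

```python
# A hypothetical simplicity ordering, simplest first (an assumption,
# not the paper's actual ranking rules).
HIERARCHY = ['Line', 'Arc', 'Circle', 'Ellipse', 'Curve', 'Polyline']

def best_fit(interpretations):
    """interpretations: list of (shape_name, passed, confidence).
    Returns the simplest passing shape, breaking ties by confidence."""
    passed = [(HIERARCHY.index(s), -c, s)
              for s, ok, c in interpretations if ok]
    if not passed:
        return 'Complex'          # fall back when nothing fits
    return min(passed)[2]
```

So a stroke whose line test fails but whose arc and curve tests both pass would be labeled an arc, the simpler of the two.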

Discussion

Unlike previous papers, we can now discuss this work not only as article readers but as Paleo users. After assignment No. 1 we can talk about Paleo with more confidence. The response time is in fact fast enough for most if not all practical purposes, and the recognition accuracy works very, very well. Also, the version of Paleo we are using, which features an extended shape set (e.g. arrows), and the work we have done so far in the assignment, along with other work such as Mekanix, show that Paleo is definitely a very useful tool for higher-level recognition. One thing I would like to know more about is the origin of the threshold values (a.k.a. magic numbers). But most likely they are just as “magic” as the ones many of us are using in our homework. Another thing I noticed in this paper, and in general in most recognition papers, is the use of the term “recognition rate” as a quantitative way of qualifying the algorithm. I am aware that this is common practice, but even though Paleo is the best free-sketch recognizer I have seen so far, either I am very unlucky or Paleo does not really recognize 98.56% of sketches in general. I am guessing that the data was taken in a rather controlled environment. But as I said before, this is an issue nearly all recognition papers have. So I wonder if the recognition rate is really a meaningful number without a universally standardized dataset.

Wednesday, September 15, 2010

Reading #8. A Lightweight Multistroke Recognizer for User Interface Prototypes (Anthony)

Comments on others

Sam

Summary

The $N recognizer is a very nice extension of the $1 recognizer. Its most relevant contributions are the support of multistroke gestures and 1D gestures, and bounded rotation invariance. Its purpose also extends from the $1 recognizer: it does not intend to be a super powerful, ultra-accurate gesture recognizer; instead it is a proposal of a simple recognizer that can be implemented in any language with relatively few lines of code. Despite this the algorithm remains very fast and acceptably accurate, which makes it ideal for prototyping and for running on machines with low processing power.
$N relies heavily on the $1 algorithm, but with several tweaks. One of them is the automatic generation of templates to match multiple ways of drawing the same gesture. This is particularly important for multistroke templates, where the number of ways of drawing the same shape grows combinatorially with each extra stroke, which would make it very annoying for the “training user” to draw all the possible variations. Other tweaks are a speed optimization based on shape features, automatic discrimination and recognition of 1D shapes, and a parametric bounded rotation invariance. The system was tested in a high school environment and the results were quite satisfactory. Although for single-stroke gestures the accuracy was not as good as $1's, it remains above 90%.
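The template-generation tweak can be sketched like this: for n strokes there are n! orderings times 2^n direction choices, which is why enumerating them automatically matters. This is a simplified sketch of the idea; the real $N also resamples and filters the generated unistrokes.

```python
from itertools import permutations, product

def unistroke_permutations(strokes):
    """Enumerate every way of drawing a multistroke gesture as one
    connected path: all stroke orders x all stroke directions.
    Each stroke is a list of (x, y) points."""
    results = []
    for order in permutations(range(len(strokes))):
        for dirs in product((False, True), repeat=len(strokes)):
            path = []
            for idx, rev in zip(order, dirs):
                pts = strokes[idx]
                path.extend(reversed(pts) if rev else pts)
            results.append(path)
    return results
```

Even a three-stroke gesture already yields 3!·2³ = 48 templates, so asking the training user to draw them all by hand would be hopeless.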

Discussion

$N might have its limitations, and it may even be inferior to $1 at recognizing single strokes. However, I say it is totally worth it for all the extra features. As I see it, the $1 recognizer is good for a very particular set of gestures, but it really limits the developer with some of its restrictions. As I noted in the discussion of $1, the full rotation invariance can be annoying in some cases, the lack of 1D gestures is very limiting, and of course you are married to single-stroke gestures. $N overcomes all of these issues, keeps accuracy acceptable for most applications, and the speed is even better due to the optimizations. So far it is my favorite gesture recognizer for a simple command interface.

Reading #7. Sketch Based Interfaces: Early Processing for Sketch Understanding (Sezgin)

Comments on others

Oz

Summary

This paper addresses the problem of free sketch recognition. Unlike the gesture recognition papers posted here before, free sketching does not require the user to draw a shape in a particular way each time for the software to recognize it. This challenge defies previous approaches, where a shape had to be drawn the same way every time in order to be recognized accurately. For example, in a “V” shape, the Rubine features of starting and ending angles change dramatically depending on whether we start from the left or the right. This can be particularly annoying if the final application is a design tool for the early or creative stages of the process: if you were using Rubine or $1 for shape recognition in such an application and were suddenly inspired by a design that came to mind, you would have to break your flow in order to remember how to draw a triangle “properly”.
The scope of the paper is early sketch understanding, and the approach they take is to start by identifying very low-level geometric descriptions like lines and ovals. Their contribution was a system that processes a stroke in three phases: approximation, beautification and basic recognition. The first detects vertices in a shape in order to distinguish its low-level components, using several sources of information including curvature and speed. The second processes each of the detected components to make them look as the user intended (i.e., make lines straighter and curves smoother). Finally, basic object recognition is attempted to detect basic figures in the sketch such as ovals, rectangles and squares. This information could then serve application-specific needs (e.g. the detection of a truss or a spring). The evaluation shows a very good accuracy of 96%, compared to 63% in previous work.
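The speed-and-curvature idea behind the vertex detection phase can be sketched roughly as follows. The thresholds and the exact formulas here are illustrative assumptions, not Sezgin's actual values: a point becomes a candidate vertex when the pen slows well below its mean speed or when the direction change is large.

```python
import math

def vertex_candidates(points, times, speed_frac=0.4, curv_thresh=0.8):
    """Candidate vertices where pen speed drops below a fraction of
    the mean, or where direction change (curvature) is high.
    points: list of (x, y); times: matching timestamps."""
    n = len(points)
    speeds, curvs = [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        (x0, y0), (x2, y2) = points[i - 1], points[i + 1]
        dt = times[i + 1] - times[i - 1]
        speeds[i] = math.hypot(x2 - x0, y2 - y0) / dt
        # Direction change between the incoming and outgoing segments.
        a1 = math.atan2(points[i][1] - y0, points[i][0] - x0)
        a2 = math.atan2(y2 - points[i][1], x2 - points[i][0])
        curvs[i] = abs(math.atan2(math.sin(a2 - a1), math.cos(a2 - a1)))
    mean_speed = sum(speeds) / max(n - 2, 1)
    return sorted({0, n - 1} |
                  {i for i in range(1, n - 1)
                   if speeds[i] < speed_frac * mean_speed
                   or curvs[i] > curv_thresh})
```

On an L-shaped stroke drawn at uniform speed, only the corner (plus the endpoints) survives, via the curvature test; a stroke where the pen hesitates would add vertices via the speed test instead.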

Discussion

Free sketch recognition attempts to take a natural human sketch from an early stage of the design process and turn it into something understandable for the computer. I think a very important part of free sketch recognition is identifying the useful information in each particular domain. It is very greedy and perhaps useless to think of a general sketch recognizer at the high level. For instance, an arrow in a mechanical engineering sketch could be identified as a force and a rectangle as an object with mass, while in a UML diagram the arrow is an association and the rectangle is a class or interface. This paper addresses the low-level shapes, which I think is a good approach since only very basic shapes are really reusable across domains. Because of this, I think the major contribution is the vertex detection, since the beautification and recognition can be improved substantially once the particular domain of the application is known.

Sunday, September 12, 2010

Reading #6: Protractor: A Fast and Accurate Gesture Recognizer (Li)

Comments on others

Wenzhe Li

Summary

Protractor is a recent gesture recognizer that closely resembles the $1 recognizer, but addresses some of its limitations, making it superior. The most remarkable advances of Protractor over $1 are the way it handles rotation and scaling sensitivity, and how the overall running time scales as the number of examples grows. As in $1, Protractor begins by resampling the gesture and then rotating it, but not always to the indicative angle: to support rotation sensitivity it may instead align to one of eight base orientations. Also, the scaling to a square is omitted, since the distance calculation takes a different approach from $1's. This allows one-dimensional gestures and rotation-sensitive gestures to be better handled and recognized. In the last step, a closed-form solution for calculating the similarity between gestures is used. This improvement greatly changes how the processing time grows with the number of examples. The author shows that in both $1 and Protractor the accuracy improves as the number of examples grows. However, in $1 the processing time also grows substantially with the number of examples, while in Protractor it grows at a much lower rate, making the overall balance between processing time and accuracy much better in Protractor when several examples are provided.
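The closed-form trick itself is small. Assuming both gestures are resampled to the same number of points and normalized to unit length (my restatement of the preprocessing, not the paper's full pipeline), the cosine similarity maximized over all rotation angles has a direct formula instead of $1's iterative angle search:

```python
import math

def optimal_cosine_similarity(t, g):
    """Closed-form similarity between two preprocessed gestures,
    maximized over the rotation angle.
    t, g: equal-length lists of (x, y), assumed unit-normalized."""
    a = sum(tx * gx + ty * gy for (tx, ty), (gx, gy) in zip(t, g))
    b = sum(tx * gy - ty * gx for (tx, ty), (gx, gy) in zip(t, g))
    # max over theta of a*cos(theta) + b*sin(theta) is sqrt(a^2 + b^2)
    return math.hypot(a, b)
```

Because there is no per-template search loop, adding more example templates costs only one such dot-product pass each, which is why Protractor's time grows so slowly with the number of examples.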

Discussion

This is a nice tweak of the $1 recognizer. The author is very proud of the new closed-form solution, which provides a much better processing time with a large number of examples per gesture, and indeed it is a very nice job reducing the time complexity of the algorithm in the number of examples. However, I feel that in practice the accuracy gains from rotation sensitivity and the non-scaling technique are the major contribution of this work. The $1 was already fast and accurate with a fair number of examples (around 3), and in practice the user can barely tell the difference in response time in that case. On the other hand, Protractor really improves on some of $1's recognition problems (rotation sensitivity, recognition of narrow gestures), which, unlike the time, is completely obvious and relevant to the user.

Reading #5: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (Wobbrock)

Comments on others

liwenzhe

Summary

This paper proposes a very simple-to-implement algorithm for recognizing gestures that gives very accurate results in low processing time. They call it the $1 recognizer because of how cheap it is to implement. Despite some known limitations, the algorithm provides very accurate results, and in exchange asks only for very low processing power and memory. This makes it ideal for mobile devices with low system specifications or for web applications. Its simplicity moreover makes it easy to implement in any prototyping-oriented environment like Flash. The author presents the algorithm and compares quantitative results with other known recognition algorithms. The results show that this recognizer is more accurate than other relatively simple algorithms like Rubine, and that it is almost as good as sophisticated ones like DTW but with a much lower price to pay in implementation and processing time. The algorithm itself is divided into four steps: resampling, rotating, scaling-translating and optimizing the angle for the best score.
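The first step, resampling, can be sketched as follows. This is a minimal version of the idea, spacing a fixed number of points evenly along the stroke's arc length; the published pseudocode handles a few more edge cases.

```python
import math

def resample(points, n=64):
    """Step 1 of the $1 pipeline: resample a stroke to n points
    spaced evenly along its arc length."""
    d = [math.dist(points[i - 1], points[i]) for i in range(1, len(points))]
    interval = sum(d) / (n - 1)
    new_pts, acc = [points[0]], 0.0
    pts = list(points)
    i = 1
    while i < len(pts):
        step = math.dist(pts[i - 1], pts[i])
        if acc + step >= interval and step > 0:
            # Interpolate a new point at the exact interval distance.
            t = (interval - acc) / step
            qx = pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0])
            qy = pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1])
            new_pts.append((qx, qy))
            pts.insert(i, (qx, qy))  # continue measuring from q
            acc = 0.0
        else:
            acc += step
        i += 1
    while len(new_pts) < n:          # guard against rounding shortfall
        new_pts.append(pts[-1])
    return new_pts[:n]
```

The rotate, scale-translate and angle-optimization steps then operate on this fixed-length point list, which is what makes the point-to-point distance comparison between candidate and template meaningful.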

Discussion

This is a very different approach from Rubine's, which suggests there is not yet a universal recipe for solving gesture recognition. Both algorithms came from very bright ideas and solve the same problem quite successfully in fairly different ways. After playing a little while with a Javascript implementation of the $1 recognizer, I was very pleased with the recognition of predefined shapes when drawn correctly, and the response time is almost immediate. However, I find the direction independence can be a two-edged sword: it is very good in that it successfully recognizes basic shapes in different rotations, as the triangle showed. But it also leads to false positives, as the arrow case shows: it states with relatively high certainty that what a user may perceive as a “left arrow” is really a “v”, leaving very little margin compared to a gesture that indeed resembles a “v” (0.73-0.78). (Note that the left arrow is not exactly a 180º rotation of the right arrow, but it will most certainly be the way an average user draws it.) So you really want to carefully determine whether a particular application is suited for a rotation-insensitive algorithm.

Wednesday, September 8, 2010

Reading #4: Sketchpad: A Man-Machine Graphical Communication System (Sutherland)

Comments on others

Sam

Summary

In this paper Ivan Sutherland invented computer graphics.
Although this first line summarizes the unbelievable work of this paper, I will try to make a summary without reproducing the whole thing:
The main topic of this paper is the Sketchpad user interface: a graphical user interface that allows the user to draw and manipulate shapes on a screen with a light pen. The Sketchpad system runs on a TX-2 computer, which has several peripherals like switchboards and knobs, along with a 7″ display and the light pen that allows the user to interact with the program. Unlike today's graphics tablets it was not pressure sensitive, so the way of “lifting” the pen was to move it fast enough for the computer to lose track. Physical buttons let the computer know which shape the user intended to draw, for example a line or a circle.

More importantly, Sutherland introduced the ring structure, which allowed the computer to represent and store graphical components in a way that permits efficient operations on the shapes. The use of instances and subpictures was another of the great outcomes of this system: this object-oriented design allows building shapes upon shapes while consistently keeping the structure of all instances the same, independent of size and rotation. Another important result was the use of constraints over a shape, such that the user can input constraints and the computer will recalculate the shape to satisfy them if possible and display the result.

The author concludes by noting the advantages and disadvantages of using Sketchpad as a design tool. For some complex drawings such as circuits, it is only worthwhile if you can further obtain something else besides the drawing (e.g. a simulation); for repetitive patterns, however, Sketchpad proved very valuable. He also addresses various possibilities for continuing the work, such as 3D drawings.

Discussion

In many papers the important part revolves around the experiments done on existing software and techniques and the results obtained, with the conclusions as a very important part of the paper. However, I think that Sketchpad and the work behind it are the main characters of this paper, leaving the results and conclusions almost shallow compared with the real impact this work had on the computing world. One may read this paper without major shock, knowing that most drawing tools nowadays can do similar things, from PowerPoint to AutoCAD. But when you see the date of this paper you understand that all of those tools are a mere reproduction of what Sutherland invented here. The breakthroughs of this research cover a wide set of fields in computer science, such as object-oriented programming, human-computer interaction, and of course computer graphics. The paper is cited directly by more than 400 authors [CiteSeerX], and I think the work behind it may have inspired several thousand others. After watching the video of its demo, I think that even now, 50 years later, it is still an impressive system that is only beginning to have commercial matches like the iPad interface.

Sunday, September 5, 2010

Reading #3: “Those Look Similar!” Issues in Automating Gesture Design Advice (Long)

Comments on others

Jonathan Hall

Summary

Quill is a software program for creating pen gestures that gives the designer important feedback about the created gestures. Long describes the main functioning of the software, presents empirical results of using it, and provides insights into the challenges faced during development. Quill uses the Rubine algorithm to recognize pen gestures based on training by the user. Then, when a new gesture is entered, Quill warns the user if one of two things happens: the gesture is very hard for the computer to recognize, or the gesture has so much perceptual similarity with another that it might be difficult for the final user to remember. In both cases the user should change it for one that is not like previous ones or that has more unambiguous features (e.g., sharper corners for one shape, smoother for the other). The second kind of warning, however, is based on the human conception of similarity, so the software had to be preloaded with a model that judges perceptual similarity, built from a series of experiments by the author.
Long finally discusses the challenges faced during development: some in the user interface, some in implementation, and some in the metrics used to determine similarity. This last one appears to have the most room for improvement.

Discussion

In contrast to the Rubine paper, this article goes less deep into the algorithms and mathematics, as it is more a presentation of a software program than of the techniques behind it. I think the idea of giving advice to the user is very good, since unambiguous gestures will most probably lead to better recognition results. However, I think the reach of the advice might be too eager when trying to advise a human about perception. For the computer it is easy and accurate to give advice on what it can or cannot recognize, but the perception of gestures is a natural human skill that cannot easily be preloaded into software. A better approach would probably be to let the user rate this kind of feedback (feedback on the feedback) so the computer can continuously learn what it can judge as perceptually similar.

Saturday, September 4, 2010

Reading #2: Specifying Gestures by Example (Rubine)

Comments on others

Chris Aikens

Summary

In this paper Rubine introduces his toolkit GRANDMA, which serves as a tool to create gesture-based manipulation interfaces in a quick yet effective manner. These kinds of tools represent a major breakthrough in the world of gesture-based interfaces. One of the biggest barriers in their development was the difficulty of creating them; this problem is now reduced to the correct usage of the right tools to automatically create a gesture-based interface. As an example of its potential, Rubine uses GDP, a drawing program built upon the GRANDMA toolkit. In this program users can use gestures both to sketch shapes and to issue actions upon the shapes drawn (rotate, delete, copy…). Rubine explains how the gesture interface was added to GDP using GRANDMA in a relatively easy way.
The latter part of the paper briefly describes the heart of GRANDMA: its statistical single-stroke gesture recognition. Every stroke or gesture is represented in the computer as a collection of 2D points in space and time; after some preprocessing, features are calculated from the stroke data. A feature in this case is a single numeric value extracted from the gesture data; ideally a feature should be cheap to calculate (constant time per input point) and meaningful for recognizing the shapes. Rubine proposes a way to determine which class (shape) a stroke belongs to: starting from the gesture, a set of features, and a corresponding set of weights (importances) per class, linear classification is used to match strokes to shapes. Moreover, the weights do not have to be hand-tuned by the programmer; instead, they can be learned from a set of examples for each class. Rubine finally presents successful empirical results and discusses extensions of his work and future directions.
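The linear classification step can be sketched like this. The feature names, weights and classes below are invented for illustration; in GRANDMA the per-class weights and bias are trained from example gestures rather than written by hand.

```python
def classify(features, classes):
    """features: dict of feature name -> value for one stroke.
    classes: dict of class name -> (bias, weight dict).
    Returns the class whose linear score is highest."""
    def score(bias, weights):
        return bias + sum(weights[f] * v for f, v in features.items())
    return max(classes, key=lambda c: score(*classes[c]))
```

Each class contributes one weighted sum over the same feature vector, so classification is a handful of multiply-adds per class regardless of how many points the stroke had.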

Discussion

The work of Rubine has already shown its importance. The final part of the conclusions mentions some popular applications inspired by GRANDMA, among them Garnet for Palm Pilot interfaces and NeXT, probably used as a predecessor for Apple's gesture recognizers. These two already represent the most popular gesture-based products in the market.
I think a key point of its success is simplicity; complex solutions do not tend to succeed. This one was easy and reliable, and the fact that the recognition code is no longer hand-coded gives much more flexibility and maintainability to the system.

Thursday, September 2, 2010

Reading #1: Gesture Recognition (Hammond)

Comments on others

Danielle

Summary


This article serves as a brief yet meaningful introduction to gesture recognition systems. It mainly focuses on gestures such as those made by a pen on a 2D surface, i.e. a set of strokes, each stroke defined as the path from the moment the pen touches the surface until it is lifted. Gesture recognition has proven to be a useful technique in some cases; however, the author does not encourage its use as the main form of sketch recognition. The article covers three techniques for gesture recognition. A relatively simple yet very well known technique is Rubine's: based on a finite set of quantifiable features of a stroke, it is able to determine the shape drawn, taking into account that some training must occur prior to identification. An improvement of this method is proposed by Long, mostly by modifying the feature set; however, many think the improvement is not significant enough to pay the overhead of calculating a bigger feature set. Another very different approach is presented by Wobbrock, who proposes an algorithm that is simpler to implement and also appears to have better accuracy than Rubine's; its drawbacks are that the running time to detect a shape is much longer than Rubine's, and that it omits stroke information that can be important in some cases (e.g. the direction of a line). In general, this article introduced gesture recognition as a way of recognizing shapes on a 2D surface and presented three methods to achieve it. It emphasized that gesture recognition depends on the way a shape is drawn (the path of the pen), rather than on the shape itself.

Discussion

This reading presents a good, easy-to-read introduction to gesture recognition, which I think is an efficient and useful technique for many cases where the user can be trained or is able to provide enough training to the recognizer. Rubine proposed a method that is still widely used, which does not mean it is the best method one can imagine, but proves that it is adequate for most practical cases. The $1 method, on the other hand, comes with high accuracy and simplicity of implementation, but perhaps a high price to pay in running time. In many gesture recognition systems it is desirable that a shape be recognized not only correctly but also quickly. A nice example is a Palm device, where a long recognition time could really ruin the whole user experience, since every word would take too long to write. In general I found this first reading very handy, as it not only describes gesture recognition in a simple and complete way but also gives an important first glimpse of the sketch recognition world.

Tuesday, August 31, 2010

First assignment, first post.

Hello, so this is the answer to the questionnaire provided in Sketch Recognition class to inaugurate this blog:

As for the picture you can look at my general profile; my email is pacovides at gmail dot com. For the rest I can tell you that this is my first year in the Master's, and that I am taking this class because I see great potential in this relatively new field of computer science and I have always been interested in AI in general; sketch recognition is closely related, since it can be built upon AI methods and algorithms. About what I have to give to the class: besides my CS major I also have a major in Electronics Engineering, which I think can contribute to some topics in the class. I also come from a country in South America, in a class where there is no big representation of this part of the world, so I can give another point of view. And mostly I come with a lot of disposition towards learning and working in groups.
How do I see myself 10 years from now? 10 years is far away; I can barely plan what I will be doing tomorrow! But if I had to guess, I hope that 10 years from now I will be settling down somewhere in the world after a lot of travel, in a somewhat stable job, starting a family. I also hope that by then my hunger for knowledge has not disappeared, so I can keep learning from every day that goes by. The next advance in computer science is probably going to be related to the way final users interact with machines: today mobile devices such as PDAs or mobile phones provide more and more processing power and hardware capabilities, such as new sensors and actuators, and the way users interact with these machines is still at a very early stage of its potential. Integration is also playing an important role, such that one can access the same service from many different places or hardware clients in many different ways. My favorite course as an undergraduate was artificial intelligence, and my favorite movie is The Matrix (the first one, not the trilogy). Besides the awesome effects and the action, this movie, along with others like The Thirteenth Floor, thinks outside the box about what life can be and what our role is while living it. If I could travel back in time I would meet myself, not because I am an egomaniac, but because it would just be an interesting paradox to play with.
An interesting fact about myself is that I am always open to questions and new friends. Although my primary objective is this master's, I am also here to learn from different cultures and meet new people. I love to travel and I have been to many places looking to expand my vision of life. So if you haven't met anyone from Colombia yet, mail me and we can chat over a coffee or something; that is, of course, after doing the sketch recognition homework.