Thursday, December 9, 2010

Reading #24: Games for Sketch Data Collection (Johnson)

Comments

Summary

We have seen in many of the previous posts that an important part of sketch recognition is data collection: it serves to analyze features, train recognizers and test algorithms. However, data collection is sometimes tedious and it is difficult to get external users to help with this task. This paper presents an alternative where games are employed to collect such data. Two particular games are used to gather it: Picturephone and Stellasketch. The first emulates the broken-telephone game: one player sketches what is described in words, a second player puts that sketch back into words, and further players then try to sketch what the second player wrote so the sketches can be compared. Stellasketch implements a Pictionary-style game where one player draws and all the others try to guess what the picture is. The games easily produced over 400 labels and 105 sketches. The future directions section also presents interesting ideas, including a nice application of the collected data: a sketch-based CAPTCHA where users draw a common object.

Discussion

Games are fun, so the idea of using a good game for data collection is very appealing. However, this raises a problem with the reliability of the data. Being serious is usually not fun, so in games like these it would be easy to find "funny labels" that make you laugh but are not accurate. The authors even show this in their screenshots (e.g. "What is that man doing to that horse?"). This can be overcome by collecting enough samples to be able to detect false labels.
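The "enough samples" idea can be made concrete with simple majority voting. This is my own toy illustration, not something from the paper; the function name and thresholds are made up:

```python
from collections import Counter

def filter_labels(labels, min_votes=3, min_agreement=0.6):
    """Keep a sketch's label only if enough players agree on it.

    labels: list of label strings collected for one sketch.
    Returns the majority label, or None if agreement is too weak.
    """
    if len(labels) < min_votes:
        return None  # not enough samples to judge reliability
    label, count = Counter(l.lower().strip() for l in labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None

# A joke label is simply outvoted by the consistent answers.
votes = ["horse", "horse", "a man riding", "horse", "horse"]
print(filter_labels(votes))  # -> horse
```

With enough players per sketch, the accurate labels dominate and the funny ones fall below the agreement threshold.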

Reading #23: InkSeine: In Situ Search for Active Note Taking (Hinckley)

Comments

Summary

One of the most popular applications of sketch recognition today is note taking, thanks to popular devices and applications such as the Palm Pilot and OneNote. This paper presents a very nice application where the user can dynamically take notes and embed dynamic content within them without having to change the workspace context or the tool in use, enabling a continuous workflow without interruption. The key part of the application is the ink search: the user may draw an ink note and, using gestures, indicate that the selected strokes are text to be searched in a particular context, which is also selected via gestures. Thus, without having to leave the canvas, the user can create rich content over the ink notes.
The paper explains the use-case scenarios of the system in detail and gives a good impression of its usage. However, it does not go into depth on the recognition techniques employed for the text recognition or the gestures. After two iterations of their work, the authors found that a plausible and usable system can be implemented using the InkSeine technology.

Discussion

This tool provides an important improvement in note-taking applications for Tablet PCs. Most of the ink note-taking applications found today either don't provide many of the features and advantages achievable through recognition, or have a complex user interface that makes note taking unnatural. The learning curve of this application seems small enough to allow a novice user to use it, while still providing the ability to work with embedded and rich content. I really like the idea and the UI; however, I would have liked to see some numbers in the results in terms of accuracy, since the whole experience can turn very frustrating if the ink is not recognized correctly.

Reading #22: Plushie: An Interactive Design System for Plush Toys (Mori)

Comments

Summary

Plushie is a tool very similar to Teddy. However, Plushie uses the 3D model as an intermediate step: the real output is a 2D pattern that allows the end user to sew a stuffed animal that resembles the sketched model. The resulting technique has many advantages over the traditional framework for creating 2D sewing patterns. The continuous feedback between the sketched 2D form and the 3D model allows the user to adjust the 2D pattern in simulation before spending time and money on physical prototypes.
The algorithms and techniques behind Plushie are very similar to those in the previous paper: a 3D mesh is created based on the user's input strokes. However, this enhanced interface is designed for the particular application of sewing and displays the resulting cloth pattern in real time.

Discussion

This is a very interesting idea and application. It can be really useful and fun to design stuffed animals using this program. Moreover, the simulation gives several advantages, since the user can adapt the model while seeing the resulting 3D shape and at the same time make sure the appropriate constraints on the cloth pattern are followed (area of the cloth, number of parts…). I think the system successfully opened a door into a very different domain.
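The cloth-area constraint mentioned above can be checked with the shoelace formula. This is my own toy illustration, not Plushie's actual code; `within_budget` and its parameters are hypothetical:

```python
def polygon_area(poly):
    """Shoelace formula: area of one 2D pattern piece given its outline."""
    s = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def within_budget(pieces, max_area):
    """Check that the total cloth needed by all pattern pieces fits the budget."""
    return sum(polygon_area(p) for p in pieces) <= max_area

pieces = [[(0, 0), (4, 0), (4, 3), (0, 3)],   # 12 units of cloth
          [(0, 0), (2, 0), (1, 2)]]           # 2 units of cloth
print(within_budget(pieces, max_area=20))  # -> True
```

A real system would also need seam allowances and piece counts, but this captures the kind of constraint the simulation lets the user verify before sewing.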

Reading #21: Teddy: A Sketching Interface for 3D Freeform Design (Igarashi)

Comments

Summary

This paper introduces the concept of free sketching in 2D to create 3D shapes in an easy way. Teddy is an application that uses pen-input devices to let users sketch and interact in 2D space to create a 3D polygonal surface in a more creative manner. Unlike most 3D modeling tools, Teddy allows easy creation of freeform, sketchy 3D models, which makes it ideal for fast prototyping and for new users. The project uses recognition both for sketching and for gesture commands.
A novel user interface converts basic strokes into 3D shapes that can be rotated and edited. The editing commands include extruding, smoothing and cutting. The final result is a 3D mesh that can serve as input to the many tools available for 3D rendering and processing. The implementation was made in Java and released to the public for user studies.
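As a rough illustration of the stroke-to-shape idea, here is a toy "inflation" of a closed 2D outline: each interior point gets a height proportional to the square root of its distance from the outline, producing a rounded, balloon-like surface. This is my own sketch, not the paper's algorithm (Teddy actually inflates along the chordal axis of the polygon):

```python
import math

def point_in_polygon(px, py, poly):
    # Ray-casting test; poly is a list of (x, y) vertices.
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            xint = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < xint:
                inside = not inside
    return inside

def dist_to_outline(px, py, poly):
    # Distance from a point to the closest polygon edge.
    best = float("inf")
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        denom = dx * dx + dy * dy
        t = 0.0
        if denom > 0:
            t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / denom))
        best = min(best, math.hypot(px - (x1 + t * dx), py - (y1 + t * dy)))
    return best

def inflate(poly, step=1.0):
    # Height ~ sqrt(distance to outline) at each interior grid point,
    # giving a rounded cross-section like an inflated balloon.
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    heights = {}
    x = min(xs)
    while x <= max(xs):
        y = min(ys)
        while y <= max(ys):
            if point_in_polygon(x, y, poly):
                heights[(x, y)] = math.sqrt(dist_to_outline(x, y, poly))
            y += step
        x += step
    return heights

heights = inflate([(0, 0), (10, 0), (10, 10), (0, 10)])
print(round(heights[(5.0, 5.0)], 3))  # -> 2.236 (highest at the center)
```

The mesh a real system builds would connect these samples into triangles, but the height field already conveys why simple closed strokes yield plausible rounded 3D shapes.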

Discussion

The application is very interesting and novel, and it allows creative users to approach 3D computer modeling more comfortably. The paper gives good insight into what the application is capable of and has a detailed explanation of the user interface and the resulting output. However, I was expecting more implementation details in terms of gesture and sketch recognition, and the paper also lacks conclusions about its achievements. I think the ideas were good enough to expand more on sections 6-8 of the paper.

Wednesday, December 1, 2010

Reading #20: MathPad2: A System for the Creation and Exploration of Mathematical Sketches (LaViola)

Summary

MathPad2 is a very nice sketch application that attempts to enrich the experience of doing math on a tablet by animating the components drawn in the sketch. The paper focuses on the prototype application, which involves shape and gesture recognition that enables the user to interact with the equations and the pictures they model. One of the claimed contributions of MathPad2 is the novel gesture recognition used in the interface, which is said to be more general and to work fluently through several domains, such as math writing and diagram drawing.

Discussion

This is one of those applications that encourage continued work on sketch recognition. All of us who have dealt with an equation somewhere in our lives can appreciate the power of receiving live feedback. Furthermore, if that can be done in a completely natural interface that feels like the pen and paper we used in school, it is even better. The interface they show has simple yet very powerful ideas for managing gestures accurately and easily. I like the tap after the command, as it is natural and easy to use but avoids the annoying false positives usually found in gesture recognizers.
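The tap-to-confirm policy can be sketched as a tiny state machine. This is my own toy model, not MathPad2's implementation; the class name and timeout value are made up:

```python
class TapConfirmedGestures:
    """Toy confirm-by-tap policy: a recognized command only fires if the
    user taps shortly afterwards, which suppresses accidental (false
    positive) gesture activations."""

    def __init__(self, timeout=1.5):
        self.timeout = timeout
        self.pending = None          # (command, time it was recognized)

    def on_gesture(self, command, t):
        self.pending = (command, t)  # hold the command, do not run it yet

    def on_tap(self, t):
        if self.pending and t - self.pending[1] <= self.timeout:
            command, _ = self.pending
            self.pending = None
            return command           # confirmed in time: execute now
        self.pending = None          # stale or spurious tap: discard
        return None

ui = TapConfirmedGestures()
ui.on_gesture("graph", t=0.0)
print(ui.on_tap(t=0.5))  # -> graph (confirmed in time)
ui.on_gesture("erase", t=2.0)
print(ui.on_tap(t=5.0))  # -> None (tap came too late)
```

The point of the design is that a stray stroke that happens to look like a command gesture costs nothing unless the user explicitly confirms it.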

Reading #19: Diagram Structure Recognition by Bayesian Conditional Random Fields (Qi)

Comments

Summary

This is a top-down recognizer that relies heavily on context to determine the correct classification of each stroke. A model of Bayesian Conditional Random Fields (BCRFs) is used to classify the strokes, and each stroke's classification affects the classification of its neighbors. The paper provides a deep mathematical background for the model compared to other papers. The first step in recognition is to fragment the strokes in order to build the Bayesian CRF. Note that a fragment here is defined differently than in other papers: it is not the line formed by each pair of consecutive points in a stroke, but the set of points in the stroke that can be recognized as a straight line. This implies corner detection, as seen in previous posts. The BCRF can then be constructed and trained to make inferences on the network. The results show different classifications for variations of the CRF, with the BCRF performing better in recognition. An improvement based on Automatic Relevance Determination makes the recognition even better.
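The fragmentation step can be approximated with a simple turning-angle test, a crude stand-in for the corner detectors seen in earlier posts. This is my own sketch, not the paper's method; the threshold is arbitrary:

```python
import math

def fragment_stroke(points, angle_thresh=0.6):
    """Split a stroke into near-straight fragments by cutting at
    high-curvature points.

    points: list of (x, y); angle_thresh: turning angle in radians
    above which a point is treated as a corner.
    Returns a list of fragments, each a list of points.
    """
    corners = []
    for i in range(1, len(points) - 1):
        (ax, ay), (bx, by), (cx, cy) = points[i - 1], points[i], points[i + 1]
        v1 = (bx - ax, by - ay)
        v2 = (cx - bx, cy - by)
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 == 0 or n2 == 0:
            continue
        cosang = max(-1.0, min(1.0, (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)))
        if math.acos(cosang) > angle_thresh:
            corners.append(i)
    frags, start = [], 0
    for c in corners:
        frags.append(points[start:c + 1])  # the corner point ends the fragment
        start = c
    frags.append(points[start:])
    return frags

# An L-shaped stroke splits into its two straight sides.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(len(fragment_stroke(stroke)))  # -> 2
```

Each resulting fragment would then become one node of the CRF, with edges between spatially related fragments.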

Discussion

A nice thing about this work is that it takes a concept from another field (computer vision) and applies it successfully to the domain of sketch recognition. It is not the first time we see this phenomenon. Since sketch recognition is such an open field at the moment, many works attempt, successfully or not, to convert the sketch recognition problem into a more familiar one (fuzzy logic, graph searches, HMMs…). In this case, Bayesian Conditional Random Fields show interesting results in this domain.

Reading #18: Spatial Recognition and Grouping of Text and Graphics (Shilman)

Comments

Summary

Once again, this paper focuses on separating text and shapes. A general approach is taken, based mainly on spatial features of the ink. A graph is built that relates each stroke in the sketch to its neighboring strokes. Strokes that are grouped close together can then be identified by different recognizers. The novel approach here is that grouping and recognition are done in parallel, so that the recognizer can judge whether a grouping was good; if not, another grouping can be tried. Once a sketch is represented as a graph, many of the usual algorithms from graph theory can be used; in this case, an A* search is used to optimize the grouping of the strokes. The results show grouping accuracies of 90% and recognition with grouping of 85%.
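A toy version of the graph-building step (my own sketch, not the paper's implementation; it uses stroke centers and connected components instead of the paper's features and A* search over candidate groupings):

```python
import itertools
import math

def proximity_graph(strokes, max_dist=30.0):
    """Connect strokes whose centers lie near each other.
    strokes: list of point lists; returns a set of index pairs (i, j)."""
    def center(s):
        xs = [p[0] for p in s]
        ys = [p[1] for p in s]
        return (sum(xs) / len(xs), sum(ys) / len(ys))
    cs = [center(s) for s in strokes]
    edges = set()
    for i, j in itertools.combinations(range(len(strokes)), 2):
        if math.dist(cs[i], cs[j]) <= max_dist:
            edges.add((i, j))
    return edges

def connected_groups(n, edges):
    """Connected components of the proximity graph = candidate groups
    to hand to the recognizers (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

strokes = [[(0, 0), (5, 5)], [(10, 10), (15, 15)], [(200, 200), (210, 210)]]
print(connected_groups(len(strokes), proximity_graph(strokes)))
# -> strokes 0 and 1 grouped together, stroke 2 alone
```

In the paper, the search does not stop at one grouping: the recognizer scores each candidate group, and badly scoring groups trigger a re-grouping, which is what the parallel grouping-plus-recognition loop buys.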

Discussion

A very nice feature of this recognizer is that it does not require hand-coded heuristics. This is very useful for a general recognizer that can be applied to many domains. However, as is usually the case, the generalization comes at the price of lower accuracy; other recognizers that are fine-tuned for particular domains perform better. Still, this is a very good starting point, and the grouping idea can be exploited in similar recognizers.