Tuesday, December 14, 2010

Reading #30: Tahuti: A Geometrical Sketch Recognition System for UML Class Diagrams (Hammond)

Comments

Summary

Sketch recognition is important because it gives the user computer editing tools while keeping an interface as natural as pen and paper. A clear application of this neat combination is Tahuti. In this work Hammond and Davis present a tool for drawing and editing UML class diagrams with a pen input device. The work compares this implementation with a popular UML design application (Rational Rose) and with a popular drawing application (Paint). The program offers two main modes, one where the strokes are beautified after recognition and one where the strokes are preserved as drawn by the user. The results show that users prefer the interpreted view of Tahuti over Paint and Rational Rose.
The software relies on several recognition techniques that are explained in the paper, such as corner finding, filtering, text detection (not recognition yet, which is planned as future work), grouping and others. The interface uses strokes both for drawing and for editing commands such as delete and move. Depending on the view mode, the strokes are edited accordingly to keep the diagram consistent.

Discussion

For us computer scientists who have to deal with UML diagrams, this is a magnificent application. I associate sketch recognition with diagram drawing, since it is a domain where drawing with a marker on a blackboard feels much more natural. I usually draw UML first on a piece of paper or a blackboard and then “beautify” it by redoing it in Rose, Poseidon or any other UML tool. With this application, however, that may no longer be necessary, since hand drawings can be easily transformed into nice UML diagrams. To make this a reality we still have some way to go, not only in the UI and the recognizers but also in the available hardware: digital blackboards and notebooks are available now, but they are usually either not affordable or imprecise in their recognition. Still, I think this is a big step in the right direction.

Reading #29: Scratch Input Creating Large, Inexpensive, Unpowered and Mobile Finger Input Surfaces (Harrison)

Comments

Chris

Summary

This paper explores an alternative to the classical pen for sketch recognition. Instead of using a tablet, this system uses the sound of sketching to recognize gestures. This very novel approach lets users who do not have a tablet or digital pen at hand interact with a computer using the benefits of sketch recognition. The system uses a microphone attached to a stethoscope as its input sensor. Based on features of the resulting sound wave, such as frequency and amplitude profiles, a particular signature or sound profile can be associated with each gesture. Because sound carries no spatial information, it is very difficult to distinguish shapes that are spatially different but have very similar sound profiles (e.g. M and W). The paper shows several applications of scratch input (walls, mobile devices, tables and even fabric). For a set of simple gestures that are dissimilar enough, the achieved accuracies are around 90%, which is very impressive for such a limited input source.

Discussion

This paper opens a new door in sketch recognition. In the sketch recognition class several projects were inspired by this research. Sound input has many limitations for detecting complex shapes because the available features are so few; however, it is good enough to recognize simple gestures, and it provides an inexpensive and portable extra input source. I like the idea, proposed by some of the students in this class, of using several microphones as input. This limits portability but improves accuracy, so, for instance, a blackboard with a grid of microphones behind it could be used as an input device to control features of a classroom.

Thursday, December 9, 2010

Reading #28: iCanDraw? – Using Sketch Recognition and Corrective Feedback to Assist a User in Drawing Human Faces (Dixon)

Déjà vu… Oh yeah! It was already posted several days ago. Here it is.

Reading #27: K-sketch: A 'Kinetic' Sketch Pad for Novice Animators (Davis)

Comments

Summary

Animators today have many tools to work with, and the internet has shown that there are very interesting ideas from people of all kinds of backgrounds who do not necessarily have animation experience. For this novice user, as well as for the experienced user who needs a fast prototype of an idea, a tool that enables a quick transition from the creative stage to a simple animation is highly desirable. Usually the animation process begins with a storyboard. K-Sketch basically lets the user create a sketched storyboard that can be animated immediately, allowing the user to better visualize the idea.
The paper discusses the implementation of K-Sketch as well as the user interface. Studies were conducted to determine the optimal set of operations in the UI. The resulting tool was tested and compared to MS PowerPoint as a reference for simple animation. The users preferred K-Sketch, as it offered a more natural way of animating objects.

Discussion

In fact, animation tools today still have a relatively steep learning curve, and they don’t feel as natural as storyboards drawn with pencil and paper. In my case, I used to spend some time in middle school drawing sketches in the top corner of notebooks and then flipping the pages rapidly to “animate” them. Of course the task was tedious but hey, what else is there to do in middle school? I guess kids now can instead draw a simple sketch on their tablets and enjoy the magic of animation in a few very simple steps using tools like K-Sketch. I very much like the line of investigation of this paper: creating magic with a pen is made easier.

Reading #26: Picturephone: A Game for Sketch Data Capture (Johnson)

Comments

Summary

This is a more detailed and specific description of the Picturephone game described in reading #24. The Picturephone game consists of 4 consecutive steps:

  • Player A is given a text description, and must make a drawing that captures that description as accurately as possible.
  • Player B receives the drawing and describes it in words.
  • Player C is given Player B’s description and draws it.
  • An unrelated player, Player D, is asked to judge how closely Player A’s and Player C’s drawings match, which assigns a score to players A, B, and C.

This makes it possible to collect information on both word-to-sketch and sketch-to-word interpretation. The implementation allows researchers to use the collected data for their own purposes, such as training recognizers.
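To make the data flow of one round concrete, here is a minimal sketch of how a finished round could be turned into labeled training pairs. The `Round` record, the field names, the 0-to-1 score scale and the `min_score` cutoff are all my own assumptions for illustration; the paper does not specify this format.

```python
from dataclasses import dataclass


@dataclass
class Round:
    """One finished Picturephone round (hypothetical record layout)."""
    prompt: str         # text given to Player A
    drawing_a: str      # identifier of Player A's sketch
    description_b: str  # Player B's words for that sketch
    drawing_c: str      # identifier of Player C's sketch
    score_d: float      # Player D's similarity rating, assumed to lie in [0, 1]


def labeled_pairs(rnd, min_score=0.5):
    """Return (sketch, label) pairs, kept only if the judge rated the round highly enough."""
    if rnd.score_d < min_score:
        return []
    return [
        (rnd.drawing_a, rnd.prompt),         # words -> sketch
        (rnd.drawing_a, rnd.description_b),  # sketch -> words
        (rnd.drawing_c, rnd.description_b),  # words -> sketch again
    ]
```

Each round that passes the cutoff yields pairs in both directions, which is what makes the game useful for collecting labels.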

Discussion

Whooop! Short paper! Short as it is, it gets to the point and accurately describes the application. The Picturephone game is slower than Pictionary-like games, which can make it boring and not as fun. However, it may yield more reliable labels, since it rewards the accuracy of the pictures and the descriptions. It serves as a nice complement for the user who enjoys this type of slow-paced game.

Reading #25: A Descriptor for Large Scale Image Retrieval Based on Sketched Feature Lines (Eitz)

Comments

Summary

The internet changed the way we use and retrieve information; now the problem is that there is so much information that efficient queries are necessary. Search engines like Google have done an amazing job at searching text. However, searching for images is not always easy, particularly when the images in the database have no metadata. The paper proposes a way of searching for images based on sketches of the desired image. The problem here is that the processing of each image has to be simple enough for the query to return results in a reasonable time. The contribution is a descriptor based on structure tensors that is easy and efficient to implement and returns query results quickly over a large database (less than 4 seconds for a database of 1.5 million images).
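To give an idea of what a structure-tensor descriptor looks like, here is a rough sketch in plain Python. It assumes the image is a grayscale grid split into square cells, with one 2x2 tensor (summed gradient products) stored per cell, and it compares two descriptors by summing the Frobenius norms of the per-cell differences. The function names, the central-difference gradients and the lack of normalization are my own simplifications, not the paper's exact formulation.

```python
import math


def cell_tensors(img, cell):
    """Map (cell_row, cell_col) -> [sxx, sxy, syy] for an image given as a list of rows."""
    h, w = len(img), len(img[0])
    tensors = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference gradients (border pixels are skipped).
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            t = tensors.setdefault((y // cell, x // cell), [0.0, 0.0, 0.0])
            t[0] += gx * gx
            t[1] += gx * gy
            t[2] += gy * gy
    return tensors


def descriptor_distance(a, b):
    """Sum over cells of the Frobenius norm of the tensor difference."""
    total = 0.0
    for key in set(a) | set(b):
        ta = a.get(key, [0.0, 0.0, 0.0])
        tb = b.get(key, [0.0, 0.0, 0.0])
        dxx, dxy, dyy = ta[0] - tb[0], ta[1] - tb[1], ta[2] - tb[2]
        # The off-diagonal term appears twice in a symmetric 2x2 matrix.
        total += math.sqrt(dxx * dxx + 2.0 * dxy * dxy + dyy * dyy)
    return total
```

The descriptor only stores three numbers per cell, so comparing a sketch against a database image is cheap, which fits the speed argument of the paper.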

Discussion

This application shows a really interesting way of doing queries. The results are very impressive in terms of time and accuracy. The resulting images are not always as expected (draw a building, get back a railroad), but this is not necessarily a bad thing, as it actually gives the user the feedback that he drew a building that looks like a railroad. I don’t think sketch queries will replace text queries for images, as in usual cases it is easier to search for “tree” than to draw a tree. However, there are multiple cases where this can be a very nice option, such as when a very abstract picture is searched for or when the user is not really sure how to describe the image he clearly has in mind. Also, this brings into the result set images that lack adequate metadata, and it allows multilanguage search. (If I search for “tree”, an image with the metadata “árbol” will most likely be discarded even though they mean the same thing.)

Reading #24: Games for Sketch Data Collection (Johnson)

Comments

Summary

We have seen in many of the previous posts that data collection is an important part of sketch recognition. It serves to analyze features, train recognizers and test algorithms. However, data collection is sometimes tedious, and it is difficult to get external users to help with this task. In this paper the authors present an alternative where games are employed to collect such data. They present two particular games used to gather this data: Picturephone and Stellasketch. The first emulates the broken telephone game, where one user draws what is described in words and then a second puts into words what the first drew. Then another player tries to sketch what the second player wrote, so the sketches can be compared. Stellasketch implements a Pictionary-like game where one player draws and all the others try to guess what the picture is. The games easily produced over 400 labels and 105 sketches. The future directions section also presents interesting ideas, including a nice application of the data collection: a sketch-based captcha where users draw a common object.

Discussion

Games are fun, so having a good game for data collection is a very interesting idea. However, this poses a problem for the reliability of the data. Usually serious is not fun, so in games like these it would be easy to find “funny labels” that will make you laugh but will not be accurate. They even show this in their screenshots (e.g. “What is that man doing to that horse?”). However, this can be overcome with enough samples to detect false labels.

Reading #23: InkSeine: In Situ Search for Active Note Taking (Hinckley)

Comments

Summary

One of the most popular applications of sketch recognition today is note taking, thanks to popular devices and applications such as the Palm Pilot and OneNote. In this paper the authors present a very nice application where the user can take notes and embed dynamic content within them without having to change the workspace context or the tool being used. This enables a continuous workflow without interruption. The key part of the application is the ink search, where the user writes an ink note and uses gestures to indicate that the selected strokes are text to be searched for in a particular context, which is also selected via gestures. Thus, without having to leave the canvas, the user can create rich context around the ink notes.
The paper explains the use case scenarios of the system in detail and gives a good impression of how the system is used. However, it does not go deep into the recognition techniques employed for the text or the gestures. After two iterations of their work, the authors found that a plausible and usable system can be implemented using the InkSeine technology.

Discussion

This tool provides an important improvement in note-taking applications for tablet PCs. Most of the ink note-taking applications found today either lack many of the features and advantages achievable through recognition, or have a complex user interface that makes note taking feel unnatural. The learning curve of this application seems small enough for the novice user, while still giving him the ability to use embedded and rich content. I really like the idea and the UI; however, I would have liked to see some accuracy numbers in the results, since the whole experience can become very frustrating if the ink is not recognized correctly.

Reading #22: Plushie: An Interactive Design System for Plush Toys (Mori)

Comments

Summary

Plushie is a tool very similar to Teddy. However, Plushie is designed to use the 3D model as an intermediate step, where the real output is a 2D pattern that allows the final user to sew a stuffed animal that resembles the sketched model. The resulting technique has many advantages over the traditional workflow for creating 2D sewing patterns. The continuous feedback between the sketched 2D form and the 3D model allows the user to adjust the 2D pattern in simulation before spending time and money making physical prototypes.
The algorithms and techniques behind Plushie are very similar to those in the previous paper: a 3D mesh is created based on the input strokes of the user. However, this enhanced interface is designed for the particular application of sewing and displays the resulting cloth pattern in real time.

Discussion

This is a very interesting idea and application. It can be really useful and fun to design stuffed animals using this program. Moreover, the simulation gives several advantages, since the user can adapt the model while seeing the resulting 3D model, and at the same time make sure he follows the appropriate constraints on the cloth pattern (area of the cloth, number of parts…). I think the system successfully opened a door into a very different domain.

Reading #21: Teddy: A Sketching Interface for 3D Freeform Design (Igarashi)

Comments

Summary

In this paper the authors introduce the concept of free sketching in 2D to create 3D shapes in an easy way. Teddy is an application that uses pen input devices to let users sketch and interact in 2D space to create a 3D polygonal surface in a more creative manner. Unlike most 3D modeling tools, Teddy allows easy creation of freeform, sketchy models in 3D, which makes it ideal for fast prototyping and for new users. The project uses recognition both for sketching and for gesture commands.
A novel user interface converts basic strokes into 3D shapes that can be rotated and edited. The edit commands include extruding shapes, smoothing and cutting. The final result is a 3D mesh that can serve as input to the many tools available for 3D rendering and processing. The implementation was written in Java and made available to the public for user studies.

Discussion

The application is very interesting and novel, and it lets creative users approach 3D computer models more comfortably. The paper gives good insight into what the application is capable of doing and has a detailed explanation of the user interface and the resulting output. However, I was expecting more implementation details in terms of gesture and sketch recognition, and the paper also lacks conclusions about its achievements. I think the ideas were good enough to expand further on sections 6-8 of the paper.

Wednesday, December 1, 2010

Reading #20: MathPad2: A System for the Creation and Exploration of Mathematical Sketches (LaViola)

Summary

MathPad2 is a very nice sketch application that attempts to enrich the experience of doing math on a tablet by animating the components drawn in the sketch. The paper focuses on the prototype application, which involves shape and gesture recognition that lets users interact with the equations and the pictures that they model. One of the claimed contributions of MathPad2 is the novel gesture recognition used in the interface, which is said to be more general and to work fluently across several domains, such as math entry and diagram drawing.

Discussion

This is one of those applications that encourage us to keep working on sketch recognition. All of us who have dealt with an equation at some point in our lives can value the power of receiving live feedback. Furthermore, if that can be done in a completely natural interface that feels like the pen and paper that we used in school, even better. The interface that they show has simple yet very powerful ideas for managing gestures accurately and easily. I like the tap after the command, as it is natural and easy to use but avoids the annoying false positives usually found in gesture recognizers.

Reading #19: Diagram Structure Recognition by Bayesian Conditional Random Fields (Qi)

Comments

Summary

This is a top-down recognizer that relies heavily on context to determine the correct classification of each stroke. In this case a Bayesian Conditional Random Field (BCRF) model is used to classify the strokes. Each stroke that is classified affects the classification of its neighbors. The paper provides a deeper mathematical background for the model than other papers do. The first step in recognition is to fragment the strokes in order to create the Bayesian CRF. Note that a fragment here is defined differently than in other papers. It is not the line formed by each pair of consecutive points in the stroke, but the set of points in the stroke that could be recognized as a straight line. This implies corner detection, as seen in previous posts. Then they can construct the BCRF and train it to make inferences on the network. The results compare classifications across variations of the CRF and show that the BCRF performs better. Adding Automatic Relevance Determination improves the recognition further.
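The fragment definition can be illustrated with a small sketch. This is only a naive stand-in for the corner detection mentioned above: it splits a stroke wherever the direction changes by more than a threshold. The function name and the 30-degree default are my own choices, not taken from the paper.

```python
import math


def fragment_stroke(points, max_turn_deg=30.0):
    """Split a stroke (list of (x, y) points) into roughly straight fragments."""
    if len(points) < 3:
        return [list(points)]
    fragments = []
    current = [points[0], points[1]]
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        # Direction of the segment entering and leaving the middle point.
        a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
        turn = abs(math.degrees(a2 - a1))
        turn = min(turn, 360.0 - turn)  # handle wrap-around at +/-180 degrees
        if turn > max_turn_deg:
            # Corner at `cur`: close this fragment and start a new one there.
            fragments.append(current)
            current = [cur]
        current.append(nxt)
    fragments.append(current)
    return fragments
```

An L-shaped stroke comes out as two fragments that share the corner point, and each fragment would then become one node of the random field.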

Discussion

A nice thing about this work is that it takes a concept from another field (computer vision) and applies it successfully to the domain of sketch recognition. It is not the first time that we see this phenomenon. Since sketch recognition is such an open field at the moment, many works attempt, successfully or not, to convert the sketch recognition problem into a more familiar one (fuzzy logic, graph searches, HMMs…). In this case, the Bayesian Conditional Random Fields show interesting results in this domain.

Reading #18: Spatial Recognition and Grouping of Text and Graphics (Shilman)

Comments

Summary

Once again, this paper focuses on separating text and shapes. In this case a general approach is taken, based mainly on spatial features of the ink. A graph is built that relates each stroke in the sketch to its neighboring strokes. Then strokes that are grouped close together can be identified by different recognizers. The novelty here is that grouping and recognition are done in parallel, so that the recognizer can judge whether a grouping was good or not, and if it was not, another grouping can be tried. Once a sketch is represented as a graph, many of the usual algorithms from graph theory can be used. In this case an A* search is used to optimize the grouping of the strokes. The results show grouping accuracies of 90% and recognition-with-grouping accuracies of 85%.
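The neighborhood graph can be sketched in a few lines. Note that this is not the A* search from the paper: it only builds a proximity graph and takes its connected components as a first guess at the groups, which a recognizer would then accept or reject. The distance measure, the `radius` parameter and the function names are assumptions of mine.

```python
import math
from itertools import combinations


def stroke_distance(a, b):
    """Smallest distance between any point of stroke a and any point of stroke b."""
    return min(math.dist(p, q) for p in a for q in b)


def neighbor_graph(strokes, radius):
    """Adjacency sets linking strokes that come within `radius` of each other."""
    graph = {i: set() for i in range(len(strokes))}
    for i, j in combinations(range(len(strokes)), 2):
        if stroke_distance(strokes[i], strokes[j]) <= radius:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def candidate_groups(graph):
    """Connected components of the graph, each returned as a sorted list of stroke indices."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups
```

In the paper the search explores many alternative groupings scored by the recognizer; the components here are just the cheapest possible starting point.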

Discussion

A very nice feature of this recognizer is that it does not require hand-coded heuristics. This is very useful for a general recognizer that can be applied to many domains. However, as is usually the case, the generalization comes at the price of lower accuracy. Other recognizers that are fine-tuned for particular domains show better accuracy. Still, this is a very good starting point, and the grouping idea can be exploited in similar recognizers.

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink (Bishop)

Comments

Summary

As in readings 13 & 14, this paper also addresses the problem of discerning between shape and text. The approach here is somewhat different from the previous posts, as it uses not only features of the stroke but also characteristics of the sketch, like gaps between strokes. It starts from an independent stroke model where features are extracted, as in the other works, to allow classification using cross-entropy. In a later step, machine learning techniques are used to take into account other important properties of the context of the stroke to improve classification. In particular, a Hidden Markov Model is used to represent the sketch and to run algorithms that find the optimal labeling for each stroke. The results are presented as confusion matrices; they are not easily comparable to other recognizers, but they show the internal differences between the ways of using context (independent vs. uni-partite or bi-partite HMM).
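To show how an HMM lets context override a weak per-stroke decision, here is a generic Viterbi sketch over two labels. It is a plain first-order HMM over stroke order, not the uni-partite or bi-partite models from the paper, and every probability in the usage example below is invented for illustration.

```python
import math


def viterbi(emissions, transition, prior):
    """Most likely label sequence.

    emissions: one dict per stroke, in drawing order, mapping label -> P(features | label).
    transition: transition[a][b] = P(next label is b | current label is a).
    prior: label -> P(first stroke has that label).
    """
    states = list(prior)
    score = {s: math.log(prior[s]) + math.log(emissions[0][s]) for s in states}
    back = []
    for em in emissions[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best previous label to arrive at label s from.
            best = max(states, key=lambda p: score[p] + math.log(transition[p][s]))
            pointers[s] = best
            new_score[s] = score[best] + math.log(transition[best][s]) + math.log(em[s])
        back.append(pointers)
        score = new_score
    # Walk the back-pointers from the best final label.
    path = [max(states, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up numbers: the middle stroke looks slightly more like graphics on its own.
prior = {"text": 0.5, "graphics": 0.5}
transition = {
    "text": {"text": 0.9, "graphics": 0.1},
    "graphics": {"text": 0.1, "graphics": 0.9},
}
emissions = [
    {"text": 0.8, "graphics": 0.2},
    {"text": 0.45, "graphics": 0.55},
    {"text": 0.8, "graphics": 0.2},
]
labels = viterbi(emissions, transition, prior)
```

With these numbers an independent classifier would call the middle stroke graphics, while the sequence model labels all three strokes as text because its neighbors are confidently text.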

Discussion

This paper presents another technique for classifying text vs. shape. Although the results are not very clear in terms of the accuracy of the recognizer in different domains, the concept of using context is very interesting. And even if their results cannot be trivially compared to others, they show improvements from using context. As a matter of fact, the apparently natural way in which humans distinguish shape from text relies heavily on context. For instance, the shape O in this paragraph would be classified by any normal person as the letter O. But in another context that same O would clearly be classified as a wheel, depending only on context. (See fig below).

Reading #16: An Efficient Graph-Based Symbol Recognizer (Lee)

Comments

Summary

In this paper a recognizer is presented that bases its recognition on the topology of the sketched symbol and the relationships between its primitives. The work resembles LADDER in the sense that geometric primitives are extracted and recognition is based on the relationships between these primitives. However, one important difference is that LADDER proposes a language for describing shapes and symbols. Instead, this recognizer represents the primitives and their relationships in an attributed relational graph and then compares this graph with the stored template graphs. The resulting accuracies are not particularly high compared with similar work at the time of publication, but some advantages are presented.
The first problem this recognizer addresses is the modeling of the sketch as an attributed relational graph (ARG). This is a crucial step for the recognizer, and it is not trivial because it inherits all the problems of primitive shape recognition (corner finding, noise reduction, arcs vs. polylines…). Fortunately, much work has been done in this area, so relatively accurate primitive finders can be used. Then some features, like similarity and error, are described for use in a graph matcher. Several graph matching techniques can be used; 4 of them are presented and compared in this paper.
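As a rough picture of what an attributed relational graph holds, here is a small sketch. The attribute names (`type`, `rel_length`, `angle`) and the cost weights are my own illustrative choices, and the comparison assumes the node correspondence is already known, which is exactly the hard part that the paper's four graph matchers solve.

```python
from dataclasses import dataclass, field


@dataclass
class ARG:
    """Attributed relational graph: primitives as nodes, their relations as edges."""
    nodes: list                                # e.g. {"type": "line", "rel_length": 0.25}
    edges: dict = field(default_factory=dict)  # (i, j) with i < j -> {"angle": degrees}


def dissimilarity(a, b):
    """Naive cost assuming node i of `a` corresponds to node i of `b`."""
    if len(a.nodes) != len(b.nodes):
        return float("inf")
    cost = 0.0
    for na, nb in zip(a.nodes, b.nodes):
        cost += 0.0 if na["type"] == nb["type"] else 1.0
        cost += abs(na["rel_length"] - nb["rel_length"])
    for key in set(a.edges) | set(b.edges):
        if key in a.edges and key in b.edges:
            cost += abs(a.edges[key]["angle"] - b.edges[key]["angle"]) / 180.0
        else:
            cost += 1.0  # relation present in only one of the graphs
    return cost


def four_sided(lengths):
    """Helper: a closed four-line symbol with right angles between adjacent sides."""
    nodes = [{"type": "line", "rel_length": l} for l in lengths]
    edges = {k: {"angle": 90.0} for k in [(0, 1), (1, 2), (2, 3), (0, 3)]}
    return ARG(nodes, edges)


square = four_sided([0.25, 0.25, 0.25, 0.25])
rectangle = four_sided([0.4, 0.1, 0.4, 0.1])
```

A sketched symbol would be converted to such a graph and compared against each stored template, with the lowest cost winning.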

Discussion

Although the accuracy results are not the best among the recognizers covered in these posts, this approach presents several advantages over other recognizers that are worth looking at. Compared to Kara and other template-based recognizers, for example, this one still works under non-uniform scaling. Compared to LADDER, for instance, it offers the advantage of training by example, although at the expense of some drop in accuracy. For some domains this recognizer might be an interesting choice, as it is easy to train and is robust for several types of symbols.