Thursday, September 30, 2010

Reading #9. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paulson)

Comments on others

Kim

Summary

PaleoSketch is a primitive shape recognizer for free sketches. Unlike feature-based gesture recognizers, the user is allowed to draw freely. The intention of the recognizer is to match each single stroke to a basic primitive shape (line, polyline, circle, ellipse, arc, curve, spiral, helix). The main contributions are:
  • Extending the set of recognizable shapes, particularly by differentiating ellipses from circles and arcs from curves, and by recognizing more complex shapes such as spirals and helices.
  • Support for overtraced strokes.
  • The introduction of two new features: Normalized Distance between Direction Extremes (NDDE) and Direction Change Ratio (DCR); a rough sketch of how these can be computed appears after this summary.
  • A ranking algorithm to find the best fit for each stroke.
The authors base the algorithm on a combination of previous methods, particularly in the pre-recognition stage, where the stroke is preprocessed and several features are calculated. Then several fit tests are run for each primitive, and “interpretations” of the stroke are attached with their corresponding confidence. Finally the best fit is found with the ranking algorithm. The results show almost perfect accuracy and real-time response. Some examples are shown of higher-level work built on top of PaleoSketch, leaving room for promising future work.
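
Since NDDE and DCR are the paper's two new features, here is a rough Python sketch of how they can be computed from a stroke given as (x, y) points. The function names and edge-case handling are my own assumptions rather than PaleoSketch's code; the idea simply follows the definitions: NDDE is the arc length between the points with the highest and lowest direction values divided by the total stroke length, and DCR is the maximum direction change divided by the average direction change.

import math

def directions(points):
    # Direction (angle) of each segment of the stroke.
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def stroke_length(points, start=0, end=None):
    # Arc length of the stroke between two point indices.
    end = len(points) - 1 if end is None else end
    return sum(math.dist(points[i], points[i + 1]) for i in range(start, end))

def ndde(points):
    # Normalized Distance between Direction Extremes.
    dirs = directions(points)
    i_max, i_min = dirs.index(max(dirs)), dirs.index(min(dirs))
    lo, hi = sorted((i_max, i_min))
    total = stroke_length(points)
    return stroke_length(points, lo, hi) / total if total > 0 else 0.0

def dcr(points):
    # Direction Change Ratio: max direction change over average direction change.
    dirs = directions(points)
    changes = [abs(b - a) for a, b in zip(dirs, dirs[1:])]
    avg = sum(changes) / len(changes)
    return max(changes) / avg if avg > 0 else 0.0

Roughly speaking, smooth curved shapes should give an NDDE close to 1 and a low DCR, while polylines spike the DCR at their corners, which is how these two features help separate curved primitives from polylines.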

Discussion

Unlike previous papers, we can now discuss this work not only as article readers but as PaleoSketch users. After assignment No. 1 we can talk about Paleo with more confidence. The response time is in fact fast enough for most if not all practical purposes, and the recognition accuracy works very, very well. The version of Paleo that we are using, which features an extended shape set (e.g. arrows), the work we have done so far with the assignment, and other work such as Mechanix show that Paleo is definitely a very useful tool for higher-level recognition. One thing I would like to know more about is the origin of the threshold values (a.k.a. magic numbers), but most likely they are just as “magic” as the ones many of us are using in our homework. Another thing I noticed in this paper, and in general in most recognition papers, is the use of the term “recognition rate” as a quantitative way of evaluating the algorithm. I am aware that this is common practice, but even though Paleo is the best free-sketch recognizer I have seen so far, either I am very unlucky or Paleo does not really recognize 98.56% of sketches in general; I am guessing that the data was collected in a rather controlled environment. As I said before, this is an issue most recognition papers share, so I wonder if the recognition rate is really a meaningful number without a universally standardized dataset.

Wednesday, September 15, 2010

Reading #8. A Lightweight Multistroke Recognizer for User Interface Prototypes (Anthony)

Comments on others

Sam

Summary

The $N recognizer is a very nice extension of the $1 recognizer. Its most relevant contributions are support for multistroke gestures and 1D gestures and bounded rotation invariance. Its purpose also extends from the $1 recognizer: it does not intend to be a super powerful, ultra-accurate gesture recognizer; instead it proposes a simple recognizer that can be easily implemented in any language with relatively few lines of code. Despite this, the algorithm remains very fast and acceptably accurate, which makes it ideal for prototyping and for running on machines with low processing power.
$N relies heavily on the $1 algorithm, but with several tweaks. One of them is the automatic generation of templates to match multiple ways of drawing the same gesture. This is particularly important for multistroke templates, where the number of ways of drawing the same shape grows combinatorially with each extra stroke, which would make it very tedious for the “training user” to draw every possible variation. Other tweaks are a speed optimization based on shape features, automatic discrimination and recognition of 1D shapes, and a parametric bounded rotation invariance. The system was tested in a high school environment and the results were quite satisfactory: although for single-stroke gestures the accuracy was not as good as $1’s, it remains above 90%.
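
The combinatorial growth mentioned above is easy to see in a few lines. The sketch below is my own illustration (not the $N source): it enumerates every stroke order and every stroke direction of a multistroke example, which is essentially what $N does when it unrolls one training example into many unistroke templates.

from itertools import permutations, product

def unistroke_permutations(strokes):
    # strokes: list of strokes, each a list of (x, y) points.
    # Yield every unistroke obtained by picking an order for the strokes and a
    # direction (forward or reversed) for each one, then concatenating the points.
    n = len(strokes)
    for order in permutations(range(n)):                 # n! stroke orders
        for flips in product((False, True), repeat=n):   # 2^n direction choices
            unistroke = []
            for idx, flip in zip(order, flips):
                unistroke.extend(strokes[idx][::-1] if flip else strokes[idx])
            yield unistroke

# A two-stroke "X" already produces 2! * 2^2 = 8 unistroke templates.
x_gesture = [[(0, 0), (10, 10)], [(10, 0), (0, 10)]]
print(sum(1 for _ in unistroke_permutations(x_gesture)))  # -> 8

For a five-stroke gesture this is already 5! * 2^5 = 3840 variants, which is why asking a person to draw them all by hand would be hopeless.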

Discussion

$N may have its limitations, and it may even be inferior to $1 at recognizing single strokes. However, I say it is totally worth it for all the extra features. As I see it, the $1 recognizer is good for a very particular set of gestures, but it really limits the developer with some of its restrictions. As I noted in the discussion of $1, full rotation invariance can be annoying in some cases, the lack of 1D gestures is very limiting, and of course you are married to single-stroke gestures. $N overcomes all of these issues, keeps accuracy acceptable for most applications, and the speed is even better thanks to the optimizations. So far it is my favorite gesture recognizer for a simple command interface.

Reading #7. Sketch Based Interfaces: Early Processing for Sketch Understanding (Sezgin)

Comments on others

Oz

Summary

This paper addresses the problem of free sketch recognition. Unlike the gesture recognition papers posted here before, free sketching does not require the user to draw a shape in a particular way each time in order for the software to recognize it. This challenge defies previous approaches, where a shape had to be drawn the same way every time to be recognized accurately. For example, in a “V” shape, the Rubine features for the starting and ending angles change dramatically depending on whether we start from the left or the right. This can be particularly annoying if the final application is a design tool for the early or creative stages of the process: if you were using Rubine or $1 for shape recognition in such an application and were suddenly inspired by a design that came to mind, you would have to interrupt your flow to remember how to draw a triangle “properly”.
The scope of the paper is early sketch understanding, and the approach they take is to start by identifying very low-level geometric descriptions such as lines and ovals. Their contribution was a system that processes a stroke in three phases: approximation, beautification and basic recognition. The first detects vertices in order to distinguish the low-level components of a shape; to do this they use several sources of information, including curvature and speed. The second processes each detected component to make it look as the user intended (i.e. make lines straighter, curves smoother…). Finally, basic object recognition is attempted to detect basic figures in the sketch such as ovals, rectangles and squares. This information could be used for specific application needs (e.g. the detection of a truss or a spring). The evaluation shows a very good accuracy of 96%, compared to 63% for previous work.
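
As a rough illustration of the approximation phase, the toy detector below marks candidate vertices where curvature is high and pen speed is low. The thresholds and the simplistic curvature estimate are my own choices; Sezgin's actual system uses average-based filtering on each signal and then scores hybrid fits against the original points, but this captures why combining curvature and speed is more robust than using either signal alone.

import math

def candidate_vertices(points, times, curv_factor=1.0, speed_factor=0.9):
    # points: list of (x, y); times: list of timestamps, same length.
    # Flag interior points whose curvature is above the stroke average and
    # whose pen speed is below a fraction of the average speed.
    n = len(points)
    dirs = [math.atan2(points[i + 1][1] - points[i][1],
                       points[i + 1][0] - points[i][0]) for i in range(n - 1)]
    curvature = [0.0] + [abs(dirs[i] - dirs[i - 1]) for i in range(1, n - 1)] + [0.0]
    speed = [0.0] + [math.dist(points[i], points[i - 1]) /
                     max(times[i] - times[i - 1], 1e-6) for i in range(1, n)]
    avg_curv = sum(curvature) / n
    avg_speed = sum(speed) / n
    return [i for i in range(1, n - 1)
            if curvature[i] > curv_factor * avg_curv
            and speed[i] < speed_factor * avg_speed]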

Discussion

Free sketch recognition attempts to take a natural human sketch from an early stage of a design process and turn it into something understandable by the computer. I think a very important part of free sketch recognition is identifying the useful information in each particular domain. It is overly ambitious and perhaps useless to think of a general sketch recognizer at the high level. For instance, an arrow in a mechanical engineering sketch could be identified as a force and a rectangle as an object with mass, while in a UML diagram the arrow is an association and the rectangle is a class or interface. This paper addresses the low-level shapes, which I think is a good approach since only very basic shapes are really reusable across domains. Because of this, I think the major contribution is in the vertex detection, since the beautification and recognition could be improved substantially once the particular domain of the application is known.

Sunday, September 12, 2010

Reading #6: Protractor: A Fast and Accurate Gesture Recognizer (Li)

Comments on others

Wenzhe Li

Summary

Protractor is a recent gesture recognizer that closely resembles the $1 recognizer but addresses some of its limitations, making it superior. The most remarkable advances of Protractor over the $1 recognizer are the way it handles rotation and scaling sensitivity and the way the overall processing time scales as the number of examples grows. As in the $1 recognizer, Protractor begins by resampling the gesture and then rotating it, but not always to the indicative angle: when rotation sensitivity is desired it aligns the gesture to one of eight base orientations instead. The scaling to a square is also omitted, since the distance calculation takes a different approach from $1’s. These changes allow one-dimensional gestures and rotation-sensitive gestures to be handled and recognized better. In the last step, a closed-form solution is used to calculate the similarity between gestures. This improvement greatly changes how the processing time grows with the number of examples. The author shows that in both $1 and Protractor the accuracy gets better as the number of examples grows; however, in $1 the processing time also grows substantially with the number of examples, while in Protractor it grows at a much lower rate, making the overall balance between processing time and accuracy much better in Protractor when several examples are provided.
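
The closed-form step is worth spelling out, since it is the part that keeps the per-template cost so low. For two preprocessed gestures treated as equal-length vectors of (x, y) points, the rotation that maximizes their cosine similarity can be computed directly instead of searched for. The sketch below follows the published formulas for Protractor, but the variable names and the preprocessing assumptions (both vectors already resampled, translated and normalized to unit length) are mine.

import math

def optimal_cosine_distance(template, candidate):
    # template, candidate: equal-length lists of (x, y) points, already
    # resampled, translated and normalized to unit length as vectors.
    a = sum(xt * xc + yt * yc for (xt, yt), (xc, yc) in zip(template, candidate))
    b = sum(xt * yc - yt * xc for (xt, yt), (xc, yc) in zip(template, candidate))
    angle = math.atan2(b, a)                          # best rotation, no search
    similarity = a * math.cos(angle) + b * math.sin(angle)
    similarity = max(-1.0, min(1.0, similarity))      # guard numerical drift
    return math.acos(similarity)                      # smaller is a better match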

Discussion

This is a nice tweak of the $1 recognizer. The author is very proud of the new closed-form solution, which provides much better processing time with a large number of examples per gesture, and indeed I think it is a very nice job reducing how the running time scales with the number of examples. However, I feel that in practice the accuracy gains from rotation sensitivity and the non-scaling technique are the major contribution of this work. The $1 was already fast and accurate with a fair number of examples (around 3), and in practice the user can barely tell the difference in response time in that case. On the other hand, Protractor really improves on some of $1's recognition problems (rotation sensitivity, recognition of narrow gestures), which, unlike the timing, is completely obvious and relevant to the user.

Reading #5: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (Wobbrock)

Comments on others

liwenzhe

Summary

This paper proposes a very simple-to-implement algorithm for recognizing gestures that gives very accurate results with low processing time. They call it the $1 recognizer because of how cheap it is to implement. Despite some known limitations, the algorithm provides very accurate results and in exchange asks only for very little processing power and memory. This makes it ideal for mobile devices with low system specifications or for web applications. Its simplicity moreover makes it easy to implement in any prototyping-oriented environment like Flash. The author presents the algorithm and compares quantitative results with other known recognition algorithms. The results show that this recognizer is more accurate than other relatively simple algorithms like Rubine’s, and that it is almost as good as sophisticated ones like DTW but with a much lower price to pay in implementation effort and processing time. The algorithm itself is divided into four steps: resampling, rotating to the indicative angle, scaling and translating, and searching for the angle that gives the best score.
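
The fourth step, searching for the angle that gives the best score, contrasts nicely with the closed-form trick in the Protractor post above: $1 searches for that angle numerically with a golden-section search. Below is a minimal sketch of that search, assuming both gestures have already been resampled, rotated, scaled and translated; it follows the published pseudocode closely, but treat the details as my paraphrase rather than the reference implementation.

import math

PHI = 0.5 * (-1 + math.sqrt(5))  # golden ratio constant used by the search

def path_distance(a, b):
    # Average point-to-point distance between two equal-length point lists.
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def rotate_by(points, angle):
    # Rotate the points around their centroid by the given angle (radians).
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [((x - cx) * cos_a - (y - cy) * sin_a + cx,
             (x - cx) * sin_a + (y - cy) * cos_a + cy) for x, y in points]

def distance_at_best_angle(points, template,
                           lo=-math.radians(45), hi=math.radians(45),
                           tol=math.radians(2)):
    # Golden-section search for the rotation of `points` that minimizes the
    # path distance to `template`.
    x1 = PHI * lo + (1 - PHI) * hi
    f1 = path_distance(rotate_by(points, x1), template)
    x2 = (1 - PHI) * lo + PHI * hi
    f2 = path_distance(rotate_by(points, x2), template)
    while abs(hi - lo) > tol:
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = PHI * lo + (1 - PHI) * hi
            f1 = path_distance(rotate_by(points, x1), template)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * lo + PHI * hi
            f2 = path_distance(rotate_by(points, x2), template)
    return min(f1, f2)

The final score is then derived from the minimum path distance; Protractor's later contribution was precisely to replace this loop with a single closed-form evaluation.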

Discussion

This is a very different approach from Rubine’s, which suggests that there is not yet a universal recipe for gesture recognition. Both algorithms came from very bright ideas and solve the same problem quite successfully in fairly different ways. After playing for a little while with a JavaScript implementation of the $1 recognizer, I was very pleased with the recognition of predefined shapes when drawn correctly, and the response time is almost immediate. However, I find that the rotation invariance can be a double-edged sword: it is very good in that it successfully recognizes basic shapes at different rotations, as the triangle showed, but it can also lead to false positives, as the arrow case shows. It states with relatively high certainty that what a user may perceive as a “left arrow” is really a “v”, leaving very little margin compared to a gesture that actually resembles a “v” (0.73–0.78). (Note that the left arrow is not exactly a 180º rotation of the right arrow, but it will most certainly be the way an average user draws it.) So you really want to carefully determine whether a particular application is suited for a rotation-insensitive algorithm.

Wednesday, September 8, 2010

Reading #4: Sketchpad: A Man-Machine Graphical Communication System (Sutherland)

Comments on others

Sam

Summary

In this paper Ivan Sutherland invented computer graphics.
Although this first line summarizes the unbelievable work in this paper, I will try to make a summary without reproducing the whole thing:
The main topic of this paper is the Sketchpad user interface: a graphical user interface that allows the user to draw and manipulate shapes on a monitor with the use of a light pen. The Sketchpad system runs on a TX-2 computer which has several peripherals, such as switchboards and knobs, along with a 7“ display and a light pen that lets the user interact with the program. Unlike current graphics tablets it was not pressure sensitive, so the way of “lifting” the pen was to move it fast enough for the computer to lose track of it. Physical buttons told the computer which shape the user intended to draw, for example a line or a circle. More importantly, Sutherland introduced the ring structure, which allowed the computer to represent and store graphical components in a way that supports efficient operations on the shapes. The use of instances and subpictures was another of the great outcomes of this system: this object-oriented design allows building shapes upon shapes while keeping the structure of all instances consistent, independent of size and rotation. Another important result was the use of constraints on a shape, such that the user can input constraints and the computer will recalculate the shape to satisfy them if possible and display the result. The author concludes by noting the advantages and disadvantages of using Sketchpad as a design tool: for some complex drawings such as circuits it is only worthwhile to use Sketchpad if you can obtain something else besides the drawing (e.g. a simulation); however, for repetitive patterns Sketchpad proved very valuable. He also points out various possibilities for continuing the work on Sketchpad, such as 3D drawings.
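
To give a flavor of the constraint satisfaction, here is a toy relaxation loop in Python. It is entirely my own simplification (Sketchpad works on its ring structure and attempts a direct solution before falling back to relaxation), but it shows the basic idea of nudging points a little at a time until the constraints are approximately satisfied.

def relax(points, constraints, iterations=200, step=0.5):
    # Very rough relaxation loop: each constraint returns, for a given point
    # index, a small (dx, dy) correction that would reduce its error.
    pts = [list(p) for p in points]
    for _ in range(iterations):
        for constraint in constraints:
            for idx, (dx, dy) in constraint(pts):
                pts[idx][0] += step * dx
                pts[idx][1] += step * dy
    return pts

def make_horizontal(i, j):
    # Constraint: points i and j should share the same y (a horizontal line).
    def constraint(pts):
        error = pts[j][1] - pts[i][1]
        return [(i, (0.0, error / 2)), (j, (0.0, -error / 2))]
    return constraint

# Nudge an almost-horizontal segment until it is flat.
print(relax([(0.0, 0.0), (10.0, 1.0)], [make_horizontal(0, 1)]))
# -> roughly [[0.0, 0.5], [10.0, 0.5]]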

Discussion

In many papers the important part revolves around the experiments done on existing software and techniques and the results obtained from them, with the conclusions being a very important part of the paper. However, I think that Sketchpad and the work behind it are the main characters of this paper, leaving the results and conclusions looking almost shallow compared with the real impact this work had on the computing world. One may read this paper without major shock, knowing that most drawing tools nowadays can do similar things, from PowerPoint to AutoCAD. But when you see the date of this paper you understand that all of those tools are a mere reproduction of what Sutherland invented here. The breakthroughs of this research cover a wide set of fields in computer science, such as object-oriented programming, human-computer interaction and, of course, computer graphics. The paper is cited directly by more than 400 authors [CiteSeerX], and I think the work behind it may have inspired several thousand more. After watching the video of its demo, I think that even now, almost 50 years later, it is still an impressive system that is just beginning to have commercial matches like the iPad interface.

Sunday, September 5, 2010

Reading #3: “Those Look Similar!” Issues in Automating Gesture Design Advice (Long)

Comments on others

Jonathan Hall

Summary

Quill is a software program for creating pen gestures while giving the designer important feedback about the gestures being created. Long describes the main functioning of the software, presents empirical results from using it, and provides insights into the challenges faced during development. The quill tool uses the Rubine algorithm to recognize pen gestures based on training by the user. Then, when a new gesture is entered, quill warns the user if one of two things happens: the gesture is very hard for the computer to recognize, or the new gesture is perceptually very similar to an existing one and might be difficult for the end user to remember. In both cases the user should replace it with one that is less like previous gestures or that has more unambiguous features (e.g. sharper corners for one shape, smooth for the other). The second kind of warning, however, is based on human perception of similarity, so the software had to be preloaded with a model that judges perceptual similarity, built from a series of experiments run by the author.
Long finally discusses the challenges faced during development, some in the user interface, some in the implementation, and some in the metrics used to determine similarity. This last area appears to be the one with the most room for improvement.
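
As a very rough illustration of the similarity warnings, the snippet below flags two gesture classes as potentially confusable when their mean feature vectors are too close. This is entirely my own construction, not quill's code, and quill's actual metrics (especially the human-trained perceptual similarity model) are considerably more elaborate; the point is just the shape of the check.

import math

def mean_feature_vector(examples):
    # Average the feature vectors of a class's training examples.
    dims = len(examples[0])
    return [sum(e[d] for e in examples) / len(examples) for d in range(dims)]

def confusable_pairs(classes, threshold=1.0):
    # Return class-name pairs whose mean feature vectors are closer than
    # `threshold` in Euclidean distance; these would trigger an advisory.
    means = {name: mean_feature_vector(ex) for name, ex in classes.items()}
    names = sorted(means)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if math.dist(means[a], means[b]) < threshold]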

Discussion

In contrast to the Rubine paper, this article goes less deep into the algorithms and mathematics, as it is more a presentation of a software program than of the algorithms and techniques behind it. I think the idea of giving advice to the designer is very good, since unambiguous gestures will most probably lead to better recognition results. However, I think the reach of the advice might be too ambitious when it tries to advise a human about perception. It is easy and accurate for the computer to give advice about what it can or cannot recognize, but the perception of gestures is a natural human skill which cannot easily be preloaded into software. It would probably be a better approach if the user could rate this kind of feedback (feedback on the feedback) so the computer can continuously learn what it can judge as perceptually similar or not.

Saturday, September 4, 2010

Reading #2: Specifying Gestures by Example (Rubine)

Comments on others

Chris Aikens

Summary

In this paper Rubine introduces his toolkit GRANDMA, which serves as a tool to create gesture-based manipulation interfaces in a quick yet effective manner. These kinds of tools represent a major breakthrough in the world of gesture-based interfaces: one of the biggest barriers to developing such interfaces was the difficulty of creating them, and this problem is now reduced to the correct usage of the right tools. As an example of its potential, Rubine uses GDP, a drawing program built upon the GRANDMA toolkit. In this program users can use gestures both to sketch shapes and to issue actions upon the shapes drawn (rotate, delete, copy…). Rubine explains how the gesture interface was added to GDP using GRANDMA in a relatively easy way.
The latter part of the paper briefly describes the heart of GRANDMA: its statistical single-stroke gesture recognition. Every stroke or gesture is represented in the computer as a collection of 2D points in space and time; after some preprocessing, several features are calculated from the stroke data. A feature in this case is a single numeric value that can be extracted from the gesture data; ideally a feature should be cheap to calculate (constant time per input point) and meaningful for recognizing the shapes. Rubine proposes a way to determine which class (shape) a stroke belongs to: starting from the gesture, a set of features and an equivalent set of per-class weights (importances), linear classification is used to match strokes to shapes. Moreover, the weights do not have to be calculated by the programmer; instead they can be learned from a set of examples for each class. Rubine finally presents successful empirical results and discusses extensions of his work and future directions.
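
The classification step itself is compact enough to show. Here is a minimal sketch of the linear evaluation Rubine describes, assuming the per-class weights have already been trained (the training step, which estimates and inverts a common covariance matrix of the features, is omitted); the example weights and feature values are made up for illustration.

def classify(features, weights):
    # features: list of feature values [f1, ..., fF] extracted from the stroke.
    # weights:  dict mapping class name -> [w0, w1, ..., wF], where w0 is the
    #           constant term and the rest weight each feature.
    # Returns the class whose linear evaluation w0 + sum_i(w_i * f_i) is largest.
    def score(w):
        return w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
    return max(weights, key=lambda cls: score(weights[cls]))

# Hypothetical two-class example with two features (e.g. total length, initial angle).
weights = {"line": [0.0, 1.2, -0.3], "arc": [0.5, -0.4, 0.9]}
print(classify([2.0, 0.1], weights))  # -> "line"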

Discussion

The work of Rubine has already shown its importance. The final part of the conclusions shows some of the popular applications inspired by GRANDMA, amongst them Garnet for the Palm Pilot interfaces and NeXT, probably used as a predecessor of Apple's gesture recognizers. And these two already represent the most popular gesture-based products in the market.
I think that a key point of its success is simplicity; complex solutions do not tend to succeed. This one was easy and reliable, and the fact that the recognition code no longer has to be hand-coded gives the system much more flexibility and maintainability.

Thursday, September 2, 2010

Reading #1: Gesture Recognition (Hammond)

Comments on others

Danielle

Summary

This article serves as a brief yet meaningful introduction to gesture recognition systems. It mainly focuses on gestures such as the ones made by a pen on a 2D surface, i.e. a set of strokes, each stroke defined as the path from the moment the pen touches the surface until it is lifted from it. Gesture recognition is shown to be a useful technique in some cases; however, the author does not encourage its use as the main form of sketch recognition. The article covers three techniques for gesture recognition. A relatively simple yet widely recognized technique is Rubine's: based on a finite set of quantifiable features of a stroke, it is able to determine the shape drawn, provided that some training takes place before identification. An improvement of this method is proposed by Long, mostly by modifying the feature set; however, many think the improvement is not significant enough to pay the overhead of calculating a bigger feature set. Another, very different approach is presented by Wobbrock, who proposes a simpler algorithm to implement, which also appears to have better accuracy than Rubine's; some of its drawbacks, however, are that the running time to recognize a shape is much longer than Rubine's, and that it omits stroke information that can be important in some cases (e.g. the direction of a line). In general this article introduced gesture recognition as a way of recognizing shapes on a 2D surface and presented three methods to achieve it. It was emphasized that gesture recognition depends on the way a shape is drawn (the path of the pen), rather than focusing on the shape itself.
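
To make “a gesture is the path of the pen” concrete, here is a minimal sketch of how such a stroke is typically represented and two simple path-dependent features computed from it. The representation and feature names are my own illustration rather than the article's exact list; the second feature shows why path-based recognition depends on how a shape is drawn.

import math
from dataclasses import dataclass

@dataclass
class Sample:
    x: float
    y: float
    t: float  # timestamp in seconds

def total_path_length(stroke):
    # Sum of distances between consecutive samples (path, not displacement).
    return sum(math.dist((a.x, a.y), (b.x, b.y)) for a, b in zip(stroke, stroke[1:]))

def initial_angle(stroke):
    # Angle of the first segment; drawing the same "V" left-to-right or
    # right-to-left flips this value, which is why drawing direction matters.
    a, b = stroke[0], stroke[1]
    return math.atan2(b.y - a.y, b.x - a.x)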

Discussion

This reading presents a good, easy-to-read introduction to gesture recognition, which I think is an efficient and useful technique for the many cases where the user can be trained or is able to provide enough training to the recognizer. Rubine proposed a method that is still widely used, which does not mean it is the best method one can imagine, but proves it is adequate for most practical cases. The $1 method, on the other hand, brings high accuracy and simplicity of implementation, but perhaps at a high price: the running time. In many gesture recognition systems we want the shape to be recognized not only correctly but also quickly. A nice example of this is a Palm device, where a long recognition time could really ruin the whole user experience, since every word would take too long to write. In general I found this first reading very handy, as it not only describes gesture recognition in a simple and complete way but also gives an important first glimpse of the sketch recognition world.