A child’s success is governed by their ability to master basic reading and math skills early in life. Yet due to location, costs, class sizes and other issues, many children never have the opportunity to learn these fundamental skills. E-learning is trying to remove some of these barriers by offering quality, individualized lessons (often free or for a low cost) through the Internet.
While schools and parents are embracing e-learning and happily incorporating technology into children’s lives, most of the available options are computer based and rely on the keyboard and mouse, which is not necessarily good for young people working on basic reading and math skills.
A recent study by the University of British Columbia found that students who use mechanized writing have a harder time retaining new ideas. Children using computers oscillate between using their two hands and manipulating the mouse, while students handwriting their answers are less distracted because the pencil allows them to focus on a single point. Additionally, the researchers found that handwriting requires students to use visual, cognitive, and motor skills: look at the paper in front of them, remember the shape, and actually form the symbol, while the keyboard requires less of these skills.
With touch screens, smartphones, and tablets becoming more prevalent in young people’s lives, it is now possible to incorporate handwriting into educational apps. Knowing the advantages of handwriting, it is easy to guess that this is the next revolutionary step for e-learning.
myBlee is an example of an early pioneer of this technology. We have created over 50 teaching apps for 6-15 year olds that uniquely require users to write the answers on the iPad’s screen. Using handwriting recognition technology by Vision Objects®, myBlee found that simply by handwriting answers rather than typing them, children have more fun and feel better engaged in learning.
Technology can open amazing doors to learning, but at the same time it does not mean leaving behind the traditional ways. By incorporating handwriting recognition in an iPad app, the learning experience rapidly becomes that much more interesting for everyone.
Here are two short videos from The Verge with useful details about the most popular styluses for iPad. We hope they will help you decide which one you prefer!
It takes some thinking to realize why handwriting recognition is so easy for humans and so difficult for computers. As with other artificial intelligence tasks, a key notion is probably ambiguity.
One of the reasons why information retrieval is difficult is that words are ambiguous. Typing a query into your favorite search engine sometimes leads to completely unexpected results just because a word has several meanings. Similarly, a shape can correspond to several different letters, especially when you factor in that handwriting is a human activity and, as such, intrinsically irregular, misleading and error-prone.
Fortunately, there is a lot of contextual information that will help us, as humans, resolve the ambiguities without even noticing it.
To understand it better, let's take this little test!
How well can you recognize, without context?
Look at the following strokes out of context. Can you tell with absolute certainty which letter each one refers to?
Well, you can have a guess: e u l m s t e
Now if I give you more context, let's say the letter before, how much easier is it?
You definitely get more information and can validate some of your earlier guesses, such as the first letter e, with an implicit assumption that you are looking at an English sentence.
On the other hand, the last character now looks more like a question mark.
Given the whole sentence, an English-speaking human is immediately able to decode it without errors:
The whole context shows that our e was in fact an o and our m was the concatenation of an r and an n.
We have not compared recognition on the full sentence with recognition on words taken in isolation but, in some cases, there is a slight advantage in having a larger context. In the small experiment above, it is not really possible either to tell whether a native speaker would do much better than a second-language learner but, again, the former has more (correct) assumptions about how the language works and how people usually write it. This will help them recognize better on average. It is our task to gather up all these small bits of information that sometimes give a decisive advantage in producing the proper result.
Porting this to a handwriting recognition engine, "the recognizer"
Let us see how this translates across several configurations of a handwriting recognizer.
Running a simple character by character recognizer on the sentence above, we get the following result:
Hew mooh shewld tihegevernmeotraise faxes ?
The recognizer has no knowledge of the word and letter boundaries and has to infer everything from the signal. This, combined with the approximate shape of some of the letters, explains why we are so far off the mark.
What if we add some linguistic knowledge such as a flat list of the valid words in English?
Hew much should tike government raisefaxes?
We have highlighted in orange the words that are now properly recognized. It turns out that, for this example, a vocabulary list has a huge positive effect.
Now what? Well, we know that some words are much more frequent than others so if we take into account this information together with the vocabulary list, we do solve one of the two remaining errors:
Hew much should the government raise faxes?
"The" is much more frequent than "tike", so integrating frequency information is equivalent to telling the recognizer: "when you hesitate between the and tike, pick the, even if the letter shapes look slightly more like tike". It is purely mathematical.
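To make this concrete, here is a minimal sketch of the idea in Python. It is not the actual recognizer: the function name pick_word, the shape scores and the word frequencies are all made-up values chosen for illustration. The sketch simply adds the log of the shape score to the log of the word frequency and keeps the best candidate.

```python
import math

def pick_word(shape_scores, word_freq, unseen=1e-9):
    """Pick the candidate with the best combined shape and frequency score.

    shape_scores: candidate word -> how well the ink matches it (hypothetical values)
    word_freq:    word -> relative frequency in the language (hypothetical values)
    unseen:       assumed frequency for words missing from word_freq
    """
    def score(word):
        return math.log(shape_scores[word]) + math.log(word_freq.get(word, unseen))
    return max(shape_scores, key=score)

# The ink looks slightly more like "tike" than like "the"...
shape_scores = {"tike": 0.55, "the": 0.45}
# ...but "the" is vastly more frequent (illustrative numbers only).
word_freq = {"the": 5e-2, "tike": 1e-7}

best = pick_word(shape_scores, word_freq)
print(best)
```

If both words were given the same frequency, the shape score alone would decide and tike would win again.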
To finally get the first word right, even though its second letter looks much more like an e, we need more information than that. We saw that context helps at the character level; we should now take it to the word level and get the right answer:
How much should the government raise taxes?
"How" is already more frequent than "Hew", but the frequency ratio between "How much" and "Hew much" is larger still, by orders of magnitude. Look at the number of hits on a search engine if you have any doubt... A similar phenomenon is at work with "raise taxes" versus "raise faxes".
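Word-level context can be sketched the same way. The snippet below is only an illustration under stated assumptions: the candidate lists, the shape scores, the word-pair probabilities and the FLOOR value are invented, and the search is a plain Viterbi-style pass over word pairs, which is one common way to do this and not necessarily what a production recognizer does.

```python
import math

FLOOR = 1e-6  # assumed probability for word pairs we have never seen

def decode(lattice, bigram):
    """Return the most likely word sequence.

    lattice: one dict per word position, candidate word -> shape score
    bigram:  (previous word, word) -> probability of that pair
    """
    # best[word] = (log score of the best path ending in word, that path)
    best = {w: (math.log(p), [w]) for w, p in lattice[0].items()}
    for column in lattice[1:]:
        new_best = {}
        for word, shape_p in column.items():
            new_best[word] = max(
                (score + math.log(bigram.get((prev, word), FLOOR)) + math.log(shape_p),
                 path + [word])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]

# Hypothetical recognizer output: shapes favor "Hew" and "faxes".
lattice = [
    {"Hew": 0.6, "How": 0.4},
    {"much": 1.0},
    {"should": 1.0},
    {"the": 1.0},
    {"government": 1.0},
    {"raise": 1.0},
    {"faxes": 0.55, "taxes": 0.45},
]
# Illustrative word-pair probabilities, not measured from any corpus.
bigram = {
    ("How", "much"): 1e-2,
    ("Hew", "much"): 1e-5,
    ("much", "should"): 1e-3,
    ("should", "the"): 1e-2,
    ("the", "government"): 1e-2,
    ("government", "raise"): 1e-3,
    ("raise", "taxes"): 1e-2,
    ("raise", "faxes"): 1e-6,
}

sentence = " ".join(decode(lattice, bigram))
print(sentence)
```

Without any word-pair information (an empty bigram table), every pair falls back to FLOOR and the shapes win, which brings back Hew and faxes.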
Humans are incredibly good at recognizing shapes, so good that it is virtually impossible to determine all the prior language or context-specific knowledge it takes to decode handwriting in all kinds of situations. If you write Melk on a piece of paper on the fridge, members of your family will use highly semantic prior knowledge (a piece of paper on the fridge is a shopping list, and Milk is much more frequent than Melk in this context) to decode your (bad) handwriting. Computers are not there yet. However, explicitly modeling prior knowledge and integrating it into handwriting recognition helps us understand how much is involved and to what degree. Luckily enough, for many use cases, achieving very good recognition rates does not require teaching a computer what a shopping list is!
In
today’s fast-paced, highly competitive business environment, we all
know that improving the productivity of mobile workers is critical. This
is especially true in field service organizations. In order to increase
productivity and profitability, successful organizations are looking
for fast and efficient ways to streamline information workflows by
capturing and processing useful data at the point of entry. This is a
key area where digital writing can have significant business impact
using natural input such as handwriting.
Recently, Suite
Solutions, a provider of high-speed internet, cable TV, digital
satellite TV, and digital phone services to residential apartment and
condominium communities, was looking for a way to automate data capture
in the field.
Capturing
subscriber agreements and service forms during field visits had
traditionally been administered using pen and paper forms. This method
was highly inefficient and costly due to the need for printed forms.
Orders were sometimes lost and paperwork was incomplete. This led to
cycle time delays.
Suite Solutions needed a cost-effective solution that could digitize the data being captured and that technicians could quickly learn to use. Suite Solutions evaluated multiple device options and applications
and decided to equip their field technicians with the naturalForms data
capture solution and Samsung Galaxy® tablets.
Suite
Solutions’ paper subscriber agreement and service forms were converted
to digital versions for use on the Android™ tablets. The forms maintain
the look and feel of the paper forms but contain “smart form” functions
such as drop-down option lists, mandatory fields, and validation rules
that enable easy, accurate and complete data capture.
Technicians
now use digital writing to complete sales and service forms
electronically using natural input methods such as handwriting. The
information collected is verified for accuracy and validated at the
point of entry. Digital copies are available immediately and sent
wirelessly to the Suite Solutions home office and emailed to customers -
reducing cycle times by up to 97%.
Technicians
can even print a copy of the form directly from naturalForms if
requested by the customer. At properties where wireless and Wi-Fi
availability are sometimes unreliable, technicians can still collect
data when offline and automatically send the forms when connections are
restored.
With
digital writing, Suite Solutions has been able to streamline its data
capture and processing workflows, resulting in the following
improvements:
Improved Productivity, Reduced Cycle Time
Since employing digital writing to naturally collect data in the field, the
time to process subscriber agreements has been reduced from up to a
month with paper forms to same day.
Better Data Quality and Operational Control
With business rule validation built into the forms, technicians must collect
required data before they can submit forms for processing. Option lists
make collecting the required information efficient and ensure accuracy
for complex fields such as equipment types. Customer signatures are
designated as a mandatory field and contracts cannot be processed
without them.
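As a rough illustration of what such built-in rules can look like, here is a small Python sketch. It is not naturalForms code: the field names, the equipment list and the validate_form function are hypothetical and only mirror the rules described above (mandatory fields, an option list, a required signature).

```python
# Hypothetical option list for the equipment field.
EQUIPMENT_TYPES = {"modem", "router", "set-top box"}

# Hypothetical mandatory fields; the signature is one of them.
REQUIRED_FIELDS = ("customer_name", "equipment_type", "customer_signature")

def validate_form(form):
    """Return a list of problems; an empty list means the form can be submitted."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not form.get(field):
            errors.append(f"missing required field: {field}")
    equipment = form.get("equipment_type")
    if equipment and equipment not in EQUIPMENT_TYPES:
        errors.append(f"unknown equipment type: {equipment}")
    return errors

complete = {
    "customer_name": "A. Customer",
    "equipment_type": "modem",
    "customer_signature": "<digital ink>",
}
unsigned = {"customer_name": "A. Customer", "equipment_type": "modem"}

print(validate_form(complete))
print(validate_form(unsigned))
```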
Ease of Deployment and Rapid User Acceptance
Because the tablet forms have the appearance of the original paper forms,
technicians did not have to learn a new app or system. The familiar
forms essentially became the app interface. Technicians quickly embraced
this new technology and have found that they can enter information
faster than with paper forms by using the input method most comfortable
for them - stylus or finger handwriting, keypad, or an external
Bluetooth keyboard if preferred.
"We
have achieved our return on investment in three months from
administrative and operational cost savings and improved tech
productivity alone," said Steve Ranson, Operations Manager, Suite Solutions. “We
have also enhanced our customer experience. Clients are delighted when
we can immediately email the agreement they just signed and we are
delighted to get rid of paperwork.”
Any field service organization still collecting data on paper forms should consider the business case for digital writing.
When usability hindered the democratization of technology
In August 1993, John Sculley, then CEO of Apple, introduced the Newton MessagePad,
a personal assistant with a touchscreen, no keyboard, and a built-in cursive
handwriting recognition program. He presented it as a “revolution in the computing
history.” In numerous respects this product prefigured the smartphones and tablets
that are now behind the company’s success. Yet most of the time, people recall the
Newton as a flop. It is a controversial view, but one I can easily understand after watching videos
and reading articles on the subject. One of them especially caught my attention. It said that the Newton does
not “understand”, the Newton needs to “learn”, and first and foremost, users need a
guide to use it correctly (yes, we are really talking about Apple!). The result?
A bad user experience and a stark observation:
the technology could not adapt itself to human behavior, forcing people to make a
considerable effort to control their wild machine.
Today’s handwriting recognition is ready to take the leap
Good news, times have changed! Handwriting recognition
technologies that are now on the market no longer impose a learning curve. The
user can write naturally, in any language, freed from constraints that once prevented them from
making the most of the technology. From now on, you can grasp your tablet or
smartphone as readily as a sheet of paper. Text, geometric shapes, mathematical symbols, musical notes, drawings and more: digital writing gives the user back control
of their machine, thus restoring the essence of an HMI: usability.
The concept of “personal assistant” regains meaning
So are we ready to reconsider handwriting recognition as
a potential HMI? In any case, the criticisms heard at the time of the Newton
now have an odd echo. Handwriting recognition, now so easy to use, can truly transform
our machines into intuitive personal assistants. Besides, this is what
is happening with the success of Siri (voice recognition) even if this
technology is not perfect yet. This concept of “personal assistant” makes even
more sense in the voice recognition market driven by ‘Mobility’, but we’ll talk
about this further down in this article.
When mobility evolves faster than HMI
Anchored “Input” and “Output” notions
I do not claim to hold the truth, but I am convinced
that the PC concept was meant for sedentary use, imposing accessories that
people commonly use and that have proven their efficiency, despite the fact that
they are not natural: screens, keyboards and mice. I say “not natural”, because
they imply this specific learning curve I talked about earlier, far from our
historical communication modes: handwriting and voice. These accessories
introduced the Input/Output concept, but now the increasing need for mobile
devices is disrupting these “standards”…
The touchscreen revolution leads us to reconsider our standards
Imagine if you had to plug a keyboard and a mouse into
your tablet, your smartphone, your TV, or even your fridge. It sounds ridiculous,
right? This proves that the revolution has already occurred. The
democratization of touchscreens has sounded
the death knell for the HMI’s “Input”
and “Output” concept. From now on, the screen is playing the role of both “Input”
and “Output” media, becoming an HMI in itself. It pushes the boundaries of direct
object manipulation, using gestures that are more natural than the interactions
we got accustomed to with keyboard and mouse. But there is still a long way to
go!
The need for “content creation devices” and the “one dimension (linear)” frustrations
What people have to say to each other using machines
I think we all agree, our smartphones and tablets are
perfect tools for media consumption. As long as mobility remains complementary
to sedentary uses, mobile devices can play this role perfectly. In this
scenario we will be using these devices to read the news, check an agenda,
watch a video and so on. But the boundary between sedentary and mobile uses is
becoming fuzzy: our mobile devices are yearning to become content creation tools, “mobile
workstations.”
James Cannavino
(former IBM Chief Strategist) said, “It would be
foolish to deny the importance of effective communication between man and
machine, in both directions. My prediction is that
the real revolution of the coming decades will come from what people have to
say to each other using machines.”
The impossibility of “talking” about things in a linear (1D) world
The revolution is on the way, but it is stuck. With
touchscreens we were expecting to experience the freedom offered by paper,
writing down ideas as they come to our mind, in 2D. This is how we see and
think, but this is not something we can do with a virtual keyboard, which is
today’s most commonly used HMI. So, even if the touchscreen seems to be the
ideal interface for such use, as of today, it is impossible to fully benefit
from it.
Handwriting recognition, key to freedom?
This is what handwriting recognition is all about:
allowing us to produce and share this 2D content using a touchscreen. In an ideal
world, thanks to handwriting recognition, I will be able to create my
presentations, my flowcharts, my mind maps, my DIY sketches… anywhere, from
my tablet. Handwriting recognition is in fact the only relevant HMI that will unlock
the use of our connected devices in a mobile environment.
And what about voice recognition?
Actually, we might have dismissed voice recognition a
bit too quickly. As mentioned before, I have to confess that voice recognition works
effectively as a mobile HMI. You can ask your Smartphone to call grandma or send
her a text message for her birthday. It can also work with refrigerators,
vacuum cleaners, TVs, and so on, but it is all about the kind of content you
want to create and the use case…
About the content
Remember: we are looking for 2D. The voice belongs to
the linear (1D) world. If you doubt it, ask Siri to create this
“diagram” ;-) !
About the use case
As for the use case, I would personally feel
embarrassed if I had to shout at my phone to make myself understood in the
middle of a noisy place or to whisper during a meeting if I want to be
discreet…
The future is bright for handwriting recognition
Playing a major role in the evolution of HMI
As you’ve guessed, I am convinced that handwriting
recognition has to be a driving force for innovation in the mobile and
connected markets. It is complementary to voice recognition. By combining both
technologies we will finally move this “revolution in the computing
history” forward.
Is it a dream?
I arrived at Vision Objects a few months ago, and I immediately
noted the power and the potential of this technology. Then I quickly understood
that the world is going to change, and fast… In fact, the world was already changing.
So, the answer is no, it is not a dream. Market trends confirm what we were feeling:
Samsung floods the market with its HWR-enabled devices, Audi is also at it in the automobile realm, and even Google is showing
interest in this new environment.
What uses for tomorrow?
Yes, even the automobile industry… You would think
that use cases are unlimited. I believe it too. So, please feel free to share
your ideas with us by leaving a comment and tell us how you see the future with
handwriting HMIs. Alan Kay, one of the founders of Xerox PARC, said: “the best
way to predict the future is to invent it.” Invent it with us.
“The pen is mightier than the sword if the sword is
very short, and the pen is very sharp.” – Terry Pratchett
This past
Christmas, I chose a rather cool Meccano set that will make a spaceship for my
8-year-old nephew. Little did I know that he secretly would have preferred … a
fountain pen. Yes, his room is filled with a small but impressive collection of
pens in all shapes and sizes, showing action heroes, cartoon characters, groovy
colours and shapes. I empathise: I also am a lover of pens and the stationer’s
still holds a nostalgic fascination for me, with all its beautiful pens,
notebooks, diaries and so on. Maybe that is why I ended up working with
handwriting recognition technology, turning handwritten items into digital
text.
So it is a
little disappointing to occasionally hear, “Handwriting is as good as dead!”
Maybe the pen-haters are partly right. After all, don’t we techno-savvy urban
dwellers have keyboards and number pads for everything these days?
But let’s not put
aside handwriting so quickly. Writing is a long-standing cultural and social tradition,
like tool making, or food preparation, and unique to humans. No doubt our early
prehistoric ancestors first experimented with writing by using a stick or their
finger to draw marks on the ground, probably to make rudimentary maps or
instructions. Later, symbols representing numbers were also written, for
trading purposes, as fingers and toes were limited for expressing quantity! The
idea of making a symbol to express an idea was born and has continued to evolve
exponentially. Humans grew to like writing, even making it into an art form,
with calligraphy.
We also came to
like the writing instruments used to make handwriting. Prehistoric man used
fingers dipped in pigments or liquids, or sticks burnt to charcoal, then
brushes made with animal hair. From there, we discovered that various sharpened
implements could make finer strokes, making more elaborate symbols, and for
centuries, the feather quill was the pen of choice. Human ingenuity has worked over time to
create writing instruments of enormous variety. From quill pens, we invented
fountain pens, that could be filled with ink and carried around, then ballpoints
containing specially thickened ink. We have gone through chalks, pastels,
crayons, pencils, erasable pens, roller points, gel ink. And all of that to put
words to paper.
But was it worth
it? If the pen has been an invaluable tool to man for centuries, and even
turned into a work of art, what is its place in today’s digitized world? As
much as this would break the heart of many a die-hard geek, not everybody loves
a keyboard. Not everyone’s job involves a day at a computer, and despite
technology’s omnipresence, not everyone can type, or at least not really efficiently.
Because of this, some innovative companies have over the last 15
years worked on digital pen and paper technology, providing a bridge between
the comfortable art of writing on paper and practical modern computing. Users
of these generally like them because they feel comfortable writing by hand, or
they like the portability of a pen.
So, obviously, the
humble ballpoint is fit for the garbage bin! Right?
The answer is
not so simple. Writing is not actually a biological function, and yet over
time, deciphering meaning from the written word, reading and writing, has become
standard learned behaviour for the human brain. Recent neurological research
(such as that carried out by Jean-Luc Velay, a French researcher) focuses on
the connection between our hands and our brains when we write. During
handwriting, a certain area of the brain is activated. This is a part of the
brain involved in motor activities, indicating that the idea behind a word and
the actual movement of writing that word by hand are associated. Much of this
same area is activated when you read. So writing and reading are associated,
neurologically, with hand movement. The same does not occur when you type on a
keyboard: your hand performs the same keying motion regardless of the word you
are writing. These same studies have shown that in tests involving the learning
of a set of letters from a foreign language, the subjects that wrote them by
hand were much more successful than those using only a keyboard. This provides
an excellent reason to insist on the importance of handwriting in childhood
education (and maybe in any learning activity). Writing by hand also provides
small children with fine motor skills development during their brains’ vital
“formative years”. Even though schools
in industrialised nations are becoming increasingly digital, handwriting
remains a vital skill.
Let’s not forget
that although pen and paper usage has declined since mass computerization, it
has not disappeared. Sure, more and more information is created on a screen, but
not necessarily with a keyboard, virtual or otherwise. Perhaps this intimate biological
connection between brain and hand is why for some people, typing just doesn’t
have the same effect as writing. IT workers have trouble imagining it, but
there are large numbers of people (especially the elderly) for whom the
keyboard remains a source of anxiety. They have sausage fingers, or trembling
hands and find keyboards fiddly, bothersome or confusing. Or their daily
activities don’t require any typing and they have remained faithful to their
first writing instrument, the pen.
Of course, for
the last few years, touch screens, smartphones and tablets have entered almost
everyone’s lives. Some of these come equipped with a mini-keyboard, and all
have a virtual keyboard. However, many
users still like to work with a pseudo-pen (a stylus), for speed, or because
they dislike virtual keyboards, or because they like to add a personal touch to
an increasingly impersonal environment. For those same people who struggle with
a keyboard, the stylus on a touchscreen can be a great relief. They can write
by hand but they are not dependent on paper. Interestingly, mobile devices have
brought us back to a situation where we hold a writing surface in one hand and
enter data with the other. This has been a great boost for handwriting
recognition applications, and has also led to great improvements in inking and
digital writing technology.
In the end, why
choose? Perhaps the greatest blessing is the enormous variety of choice that we
now have, compared to past writers. We have pens of all sizes and shapes, we
can have pens that write on paper or pens that write straight onto a computer,
we can type a letter and attach a handwritten signature, we can write onto a
screen and see it as handwriting or convert it to text. We can even write maths
now and see our calculations as digital text (writing maths by hand is
infinitely simpler than typing them!). Sure, the keyboard is part of our lives,
but the pen’s simplicity will let it live to see another day. I would even say
that writing by hand still holds a special place in human culture. Even those
who don’t harbour a soft spot for a beautiful pen, perhaps still yearn for that
human touch that writing provides: in a world of emails, tweets, and texts, only
the hardest hearts would deny that a handwritten “I love you” goes straight
from the hand to the heart.
Las Vegas, January 7,
2013 - Ideally scheduled the day before CES’ opening, the CTS 2013 focused on “Deploy(ing)
a Dynamic Infotainment & HMI Platform to Deliver a Cutting Edge User
Experience for the Everyday Driver”.
The
latest studies from IMS Research forecast that 37.5 million cars will be
equipped with Touch Screens by 2019 and at the same time ABI Research estimates
that over 210 million cars will be connected by 2016. These two very strong
trends will completely transform your cars by taking them from a static 7-8
years life cycle to a dynamic “mobile consumer” life cycle where the OEMs might
become your new “App Store”.
But a car is meant to be driven and safely driven. So it’s
the right moment to ask yourself “How am I going to interact with all these new
features and services while driving safely?”
With expert speakers coming from several of the top OEMs
such as Ford, Mercedes-Benz, Porsche, Subaru, Toyota, Kia, Jaguar and Honda,
CTS tried to respond to critical technical and business questions. Here’s a
selection of tweets to give you some insight from the show:
@Telenav: one phone
destroying OEM, aftermarket, and PND. “Already 80% of all Nav platforms”!
@thilokos(Gartner)
Distracted driving is an opportunity not a threat; make your mobile connect
safely and you've got a winning solution. @thilokos (Gartner)
also says by 2016 one third of car R&D will be for telematics +
infotainment.
Green Hills: Is Android the future OS
for Vehicles? “Compelling but significant challenges”
Agero: Among the apps axioms:
multi-modal HMI for safe driving
Agero: Apps in car:
Driver Distraction & Dynamic HMI are key
It’s now obvious that the Consumer Electronics and the
Automotive markets are starting to merge. One proof is the presence
of an impressive Automotive section at CES. The challenges for the car OEMs are
numerous. A complete reengineering of the existing business processes, which are
focused on long life cycles, is required to cope with the consumer pace. But
even more crucial is imagining how to create a continuous & seamless user
experience from outside to inside the car while at the same time allowing a
non-distracting driver experience. The answer is in the HMI…
I had the pleasure of participating in the 2013 International
CES (Consumer Electronics Show). For those who have never heard of CES (are
you serious?), it’s the world’s largest consumer electronics tradeshow that takes place
in Las Vegas, attracting more than 150,000 visitors and more than 3,000
companies showcasing their latest innovative solutions over 1.9 million square
feet (176 516 m²) of exhibition space (that’s really huge!).
As part of the Vision Objects team, I was at the core of the
digital writing area (at the Upper level of the South Hall). This year, in addition to demonstrating our products, we welcomed 10 of our partners representing the automotive,
educational, enterprise and mobile markets and showcasing their solutions
powered by MyScript®.
To give you a brief overview of the atmosphere encountered
there, I have created this video.
I tried to highlight the main players that are, or could be,
linked to digital writing and some surprises!
So, switch on the sound, and enjoy your virtual CES
experience.
Most people have already heard about OCR (Optical Character
Recognition). The general understanding is that this technology allows one to
transform a picture containing text into editable text. However, other
approaches exist for text recognition that do not rely on the same type of
input, work differently, and have different capabilities and accuracy levels. In
this post, I will provide an overview of different strategies used to recognize
text, the way they work and how they differ from each other.
What can be recognized?
Before attempting to understand how it is possible to recognize text, it
is important to first identify what can be recognized. Although it is
possible to recognize a lot of different contents (shapes, mathematical
equations, graphs, music, etc.), I will only focus on text recognition,
that is to say the process of turning text into computer-readable characters.
What do we mean by "text"? Text can take different forms
as it can come from a machine or from a person. Machine writing corresponds to
what a computer can print or display. It is a sequence of characters, usually
distinct from one another, resulting from the juxtaposition of glyphs
from one or more fonts. The text you print from the web, the bills you receive
on bad days, all this is probably machine writing. Handwriting, by contrast,
is a human form of expression. From a text recognition perspective, we can
distinguish between handprint (where each character is separated from the
others) and cursive writing. To ease the recognition, you may have to write
handprint characters into different boxes (common on administrative forms, sometimes
with restrictions such as using only capital letters): these
characters are defined as isolated handwritten characters.
The following figure provides a few examples. Note the
intra-character cursiveness in the case of cursive Chinese (f); inter-character
cursiveness, where strokes are shared between several characters, sometimes
also occurs. Boxed text (b) corresponds here to isolated handprint
characters.
How to recognize text?
Describing in detail how text recognition works is beyond the
scope of this article. Despite the disparities between the different recognition
engines that are available on the market or are being developed in research
institutes, it is possible to present a few basic principles.
Most systems rely on "machine learning" techniques: they
are trained on sample data to be able to classify the characters they will
later face. While there exists a huge variety of techniques, artificial neural
networks are often good at taking into account the fact that fonts may differ
or that humans may not write all characters exactly the same way.
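As a rough illustration of what "trained on sample data" means, the sketch below stores a few labelled 3x3 bitmaps and classifies a new bitmap by finding the closest stored sample. This is a nearest-neighbour classifier, deliberately much simpler than the neural networks mentioned above; the class name, the bitmap size and the sample glyphs are all assumptions made for the example.

```python
# Toy nearest-neighbour character classifier (illustrative assumption,
# not a neural network and not how any particular engine works).

def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(x != y for x, y in zip(a, b))


class NearestNeighbourClassifier:
    def __init__(self):
        self.samples = []  # (bitmap, label) pairs seen during training

    def train(self, bitmap, label):
        self.samples.append((tuple(bitmap), label))

    def classify(self, bitmap):
        """Return the label of the stored sample closest to `bitmap`."""
        if not self.samples:
            raise ValueError("classifier has not been trained")
        closest = min(self.samples, key=lambda s: hamming(s[0], bitmap))
        return closest[1]


# Hypothetical 3x3 training glyphs, flattened row by row (1 = ink).
clf = NearestNeighbourClassifier()
clf.train((0, 1, 0, 0, 1, 0, 0, 1, 0), "I")
clf.train((1, 0, 0, 1, 0, 0, 1, 1, 1), "L")
clf.train((1, 1, 1, 0, 1, 0, 0, 1, 0), "T")

# An "L" with one missing pixel is still closest to the "L" sample.
noisy_l = (1, 0, 0, 1, 0, 0, 1, 1, 0)
print(clf.classify(noisy_l))
```

The point is only that the classifier tolerates small variations between the samples it was trained on and the characters it later faces.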
While it is possible to try to identify characters individually, a
frequently taken approach consists of breaking them into smaller segments that the engine will attempt to
recognize. These segments can be used to characterize different writing styles
and to handle cases like cursive handwriting where the separation between
characters is not always obvious.
To improve the recognition results, some form of post-processing or
real-time processing is commonly applied, involving spell checkers or linguistic
resources.
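To make the idea of lexicon-based post-processing concrete, here is a minimal sketch using Python's standard difflib module. The tiny lexicon, the function names and the similarity cutoff are assumptions chosen for illustration; a real engine would rely on much richer linguistic resources.

```python
import difflib

# Hypothetical mini-lexicon standing in for a real linguistic resource.
LEXICON = ["hello", "help", "world", "writing", "recognition"]


def correct_word(raw, lexicon=LEXICON, cutoff=0.6):
    """Return the closest lexicon entry, or `raw` if nothing is close enough."""
    if raw in lexicon:
        return raw
    matches = difflib.get_close_matches(raw, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


def correct_text(raw_text):
    """Apply word-level correction to a whitespace-separated string."""
    return " ".join(correct_word(word) for word in raw_text.split())


# The digit "1" is a plausible misrecognition of the letter "l".
print(correct_text("hel1o wor1d"))
```

Words with no sufficiently close lexicon entry are left untouched, so the correction step cannot make an unknown word worse.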
Offline or online systems
Text recognition engines are often classified into two main
families: offline or online systems. These categories reflect fundamental
technology choices and are closely linked to the type of input they can
process.
The offline approach consists of analyzing an image (typically
obtained from an optical scanner or a camera) to extract characters that are in turn converted to a computer-readable format. Such techniques require the
extraction of the actual text data from the image background, which is a
non-trivial process that can have a negative impact on their performance.
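The simplest possible form of separating text from the background is a global threshold on pixel brightness, sketched below. The function name and the midpoint threshold are assumptions made for the example; real offline systems generally need more robust, adaptive methods, which is part of why this step is non-trivial.

```python
def binarize(image, threshold=None):
    """
    Turn a grayscale image into an ink mask.

    `image` is a list of rows of brightness values (0 = black, 255 = white).
    Returns rows of 0/1 where 1 marks a pixel classified as ink.
    """
    pixels = [p for row in image for p in row]
    if threshold is None:
        # Crude assumption: ink and paper lie on either side of the midpoint.
        threshold = (min(pixels) + max(pixels)) / 2
    return [[1 if p < threshold else 0 for p in row] for row in image]


# A tiny hypothetical scan: light paper with two dark ink pixels.
page = [
    [250, 240, 30],
    [245, 20, 235],
]
print(binarize(page))  # [[0, 0, 1], [0, 1, 0]]
```

Uneven lighting, coloured paper or printed form lines quickly defeat a single global threshold like this one.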
Online recognition, by
contrast, does not rely on the analysis of a bitmap to convert writing to
computer-understandable text. Instead, it requires a digital signal, often
referred to as digital ink, that
corresponds to a succession of pen coordinates ordered in time.
Such information usually comes from digitizers, touch screens or digital
pens. An added bonus of online recognition is the ability to convert
handwriting in real-time while it is being written. Other data like pressure
and tilt (the angle made by a stylus with the vertical) can sometimes also be
used. Due to the way they work, online techniques are more suitable than
offline techniques for handwriting recognition.
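Digital ink can be pictured as a list of strokes, each stroke being a time-ordered list of pen positions. The sketch below is one possible representation; the class names, the event format and the optional pressure field are assumptions for illustration, not the format of any specific device or engine.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class InkPoint:
    x: float
    y: float
    t: float                # timestamp in seconds
    pressure: float = 1.0   # optional extra channel


@dataclass
class Stroke:
    points: List[InkPoint] = field(default_factory=list)

    def duration(self) -> float:
        """Time elapsed between the first and last point of the stroke."""
        if len(self.points) < 2:
            return 0.0
        return self.points[-1].t - self.points[0].t

    def length(self) -> float:
        """Total distance travelled by the pen along the stroke."""
        pairs = zip(self.points, self.points[1:])
        return sum(math.hypot(b.x - a.x, b.y - a.y) for a, b in pairs)


def strokes_from_events(events):
    """Group (kind, x, y, t) pen events into strokes.

    'down' opens a stroke, 'move' extends it and 'up' closes it.
    """
    strokes, current = [], None
    for kind, x, y, t in events:
        if kind == "down":
            current = Stroke([InkPoint(x, y, t)])
        elif current is not None and kind in ("move", "up"):
            current.points.append(InkPoint(x, y, t))
            if kind == "up":
                strokes.append(current)
                current = None
    return strokes


# Two hypothetical strokes captured from a touch screen.
events = [
    ("down", 0, 0, 0.0), ("move", 3, 4, 0.1), ("up", 3, 4, 0.2),
    ("down", 10, 0, 0.5), ("up", 10, 5, 0.6),
]
ink = strokes_from_events(events)
print(len(ink), ink[0].length())
```

Unlike a bitmap, this representation keeps the order and timing of the pen movements, which is the information online engines exploit.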
The following figure provides a few examples of electronic devices
that capture digital ink and can feed online recognition engines.
Different approaches
We have seen that it is possible to recognize different types of
writing, provided basic information as to how recognition works and explained
that two main approaches could be taken depending on the input. We will now
investigate three different classes of engines: OCR, ICR and HWR.
OCR stands for Optical Character Recognition. This covers offline systems that can
recognize typewritten text. They usually rely on neural networks and relatively
simple segmentation algorithms, sometimes with basic post-processing such as
spelling correction. Dedicated to machine writing, they have difficulty
recognizing handprint, and cursive writing is completely out of their scope.
ICR, standing for Intelligent Character Recognition, deals
with recognizing isolated handwritten characters and may be offline or online.
Mixed or cursive characters are hard for such systems to recognize, which is why they
often require boxed input to achieve satisfactory results. They are widely used
to automatically process bank checks, postal codes (to sort mail more effectively
than any human would) or to scan and interpret various administrative forms
(those where you have to write in small boxes).
People have often been exposed to offline OCR/ICR software that
comes bundled with scanner hardware.
Natural Handwriting Recognition (NHR or HWR) is a class of online or offline systems that
handle handprint and cursive handwriting, without requiring the user to comply
with any particular constraint (hence the adjective "natural"). Much more complex than ICR
systems, these engines usually involve a combination of character recognition,
segmentation and linguistic analysis techniques to determine what the user
actually wrote. This is an attempt to tackle a chicken-and-egg problem: good
segmentation is required for accurate character recognition, but good
recognition is needed to segment properly.
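One common way to sidestep this circular dependency is to cut the ink into many small segments and let the recognizer's own scores decide how they should be grouped into characters. The sketch below does this with dynamic programming. The function names, the toy score table and the log-probability-like scores are assumptions for illustration, not a description of any particular engine.

```python
def best_segmentation(n_segments, score_group, max_group=3):
    """
    Choose how to group consecutive segments into characters.

    `score_group(i, j)` returns (label, score) for segments i..j-1 read as
    a single character, or None if that grouping is implausible.
    Returns (total_score, labels) for the highest-scoring grouping.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (n_segments + 1)   # best[j]: best score for segments 0..j-1
    back = [None] * (n_segments + 1)      # back[j]: (start index, label) of last group
    best[0] = 0.0
    for j in range(1, n_segments + 1):
        for i in range(max(0, j - max_group), j):
            result = score_group(i, j)
            if best[i] == neg_inf or result is None:
                continue
            label, score = result
            if best[i] + score > best[j]:
                best[j] = best[i] + score
                back[j] = (i, label)
    if best[n_segments] == neg_inf:
        return neg_inf, []
    labels, j = [], n_segments
    while j > 0:
        i, label = back[j]
        labels.append(label)
        j = i
    return best[n_segments], labels[::-1]


# Hypothetical scores: the first two segments look like "c" then "l",
# but even more like a single "d".
TOY_SCORES = {
    (0, 1): ("c", -0.9),
    (1, 2): ("l", -0.8),
    (0, 2): ("d", -0.5),
    (2, 3): ("o", -0.1),
    (3, 4): ("g", -0.2),
}


def toy_score(i, j):
    return TOY_SCORES.get((i, j))


total, labels = best_segmentation(4, toy_score)
print("".join(labels))  # dog
```

Segmentation and recognition are resolved together here: no segment boundary is fixed before the character scores have been consulted.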
Accuracy
It is not easy to compare the accuracy of OCR, ICR or HWR systems,
since they often apply to different use cases: OCR focuses on typewritten text,
ICR on isolated characters and HWR on unconstrained handwriting.
Time information is a very important aspect of online recognition
systems. It usually makes them more accurate than offline systems, which also face
the hurdle of extracting text data from the image background.
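When two engines are run on the same samples, one widely used way to put a number on accuracy is the character error rate: the edit distance between the expected text and the recognized text, divided by the length of the expected text. The sketch below is a minimal version of that measure; the function names are assumptions, and it does not solve the comparability problem across different use cases.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_char != hyp_char),  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong character out of five.
print(character_error_rate("hello", "hel1o"))  # 0.2
```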
My personal vision
I’m convinced the future will focus on natural handwriting
recognition as it is the only technology allowing the user to write the same
way on paper as on or with a digital device. This is groundbreaking: we don’t
have to adapt ourselves to machines; it’s the other way round now! Ultimately, HWR is the only technology that can consistently and accurately understand human writing, whether handprint or cursive,
and can also take into account the cultural dimension of the language involved (cf. the
previous article).
This topic leads to the discussion of different human-machine interface (HMI) approaches,
something we’ll definitely cover at some point in this blog.
As native speakers of French, some of us tend to view the pronunciation
of a language and its writing as intrinsically associated. After all, an O is
an O and mimics the shape of the mouth when we pronounce the corresponding
sound. So the "naturalness" of this association must be somehow
universal. Looking at other languages as computational linguists shows
that nothing could be more wrong.
The fact that French, or English for that matter, is written with a
single alphabet wherever it is spoken is the exception rather than the rule. In
Europe, Serbian can be written in either Cyrillic or Latin script without any influence
on the pronunciation. Linguists use the term digraphia to indicate that a given language is commonly associated
with several writing systems. An extreme example of this is Azerbaijan where,
at the crossroads of Arabic, Russian and Roman influence, you may encounter
three scripts for the same language. The fact that Japanese words can be
written with several character sets (hiragana, katakana, kanji or romaji) is probably
better known. Asia also offers an example showing that the writing of a language,
before being shared by billions of people, is the result of a historical and,
in this case, political process. The language of the Han people, commonly referred
to as Chinese, is written with "traditional" characters in Taiwan, while
these characters have been modified by the Chinese government to become
simplified Chinese. In this case, it may be easier for two people to understand
each other speaking than writing, a very unusual situation for users of the Latin
alphabet. Also worth mentioning, Vietnamese is an Asian language written with
the Latin alphabet. This makes it quite unique from a natural language processing point of view since, despite the Latin alphabet, the identification of word boundaries is somewhat similar to what is done for other Asian languages, the syllable being the most natural semantic unit. History explains this: before Western influence, there were other writing systems in Vietnam.
Developing handwriting recognition technology that supports multiple languages is
therefore a complex and exciting task. It’s not just a matter of building
and managing a database of words; it also requires a deep understanding of
the historical and cultural dimensions intrinsically tied to each language.
Throughout the following year, we’ll highlight all the languages we
support, some of which are relatively unknown, and share with you the field
adventures experienced by our ‘writing sample collectors’ who travel the world
to collect digital ink directly at its source!