Authored by
Praveen Srivatsa
Director, Asthrasoft Consulting
Artificial Intelligence is being spearheaded by a set of common learning models - vision, speech, text, conversations and decisions. Beyond these, there are custom machine learning models that are specific to different domains, like fraud detection in financial services, sentiment analysis in social media or traffic prediction for driving.
The common learning models have multiple use cases that businesses can leverage out of the box. Over time, these models can be fine-tuned and extended to many more scenarios. An example of this is using face detection models for attendance, and then extending them to track unauthorized access for stronger security round the clock.
Vision
One of the most widely used AI applications is vision models. With webcams being part of most laptops, mobiles and surveillance devices, we have access to a lot of visual data. This allows vision models to be used in many practical, real-world scenarios.
The simplest application of vision models is detection. This can be object detection - identifying animals, cars or obstacles - or identifying people using face detection. Different applications need different levels of accuracy. For example, an intruder detection system only needs to identify whether a human or an animal has ventured into an unauthorized area. However, an attendance or authorization system at an airport or immigration checkpoint needs significantly higher detection accuracy, along with classification and identification of the person.
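The idea of matching detection accuracy to the application can be sketched as a simple per-application confidence threshold. The `Detection` class, application names and threshold values below are illustrative assumptions, not from any specific vision library:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # e.g. "person", "animal"
    confidence: float  # model score in [0, 1]

# Hypothetical thresholds: an intruder alarm tolerates false positives,
# while airport identification demands near-certainty.
THRESHOLDS = {
    "intruder_alarm": 0.60,
    "airport_identification": 0.99,
}

def accept(detection: Detection, application: str) -> bool:
    """Accept a detection only if it clears the application's bar."""
    return detection.confidence >= THRESHOLDS[application]
```

The same model output can thus be accepted by a low-stakes alarm but rejected by a high-stakes identification system.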
Vision models can also be used for other purposes, like identifying obstacles for moving robots or drones, detecting medical conditions that can be visually identified, umpiring actions in sports or analyzing images for copyright violations. These have additional challenges and need either higher identification accuracy or the ability to detect in real time for decision-making.
Speech
Audio models deal with sound, and the largest application of audio interpretation is human speech. Unless we are listening for a very specific sound, like an ambulance siren, most sounds are too generic for direct interpretation. Human speech, however, has many applications.
Services like Siri, Cortana and Alexa have brought speech into our daily lives. Audio commands are becoming common in many applications, and by embedding such intelligence in mobile phones, they are becoming ubiquitous. Many common tasks - switch on the lights, turn on the car, play music, read my messages - leverage speech recognition. Speech-to-text and text-to-speech technologies also allow speech to be used for dictation and reading.
Currently, speech recognition is limited by languages, dialects and accents. For it to become more mainstream, speech models will need to expand their understanding to a broader audience. The other limiting factor for speech is the need to filter out noise. With multiple people talking simultaneously in a public space, speech models will have to recognize and interpret which instructions to follow.
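A first, naive step toward filtering out noise is an energy-based voice-activity gate: frames whose loudness falls below a threshold are treated as background and dropped before reaching the recognizer. The frame format and threshold below are illustrative assumptions, and real systems use far more sophisticated techniques:

```python
import math

def rms(frame):
    """Root-mean-square energy of one audio frame (a list of samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def voiced_frames(frames, threshold=0.1):
    """Keep only frames loud enough to plausibly contain speech."""
    return [f for f in frames if rms(f) >= threshold]
```

This does not solve the harder problem the paragraph raises - several people speaking at once - but it shows where the filtering stage sits in the pipeline.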
Text
Text intelligence lies in understanding the written word. While it builds on vision (for scanned documents) or speech (via speech to text), text intelligence can interpret whole documents and map them into forms used by software applications. Examples of text intelligence models include services that can scan a paper invoice and create an invoice record in the system, parse a medical record and summarize the diagnosis, or interpret a legal document and smartly identify sections that are applicable to other cases.
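The invoice example can be sketched as extracting structured fields from OCR'd text. The field names and regular-expression patterns below are assumptions for illustration; a production system would use a trained extraction model rather than hand-written patterns:

```python
import re

def parse_invoice(text: str) -> dict:
    """Extract an invoice number and total amount from OCR'd invoice text."""
    number = re.search(r"Invoice\s*#?\s*(\w+)", text)
    total = re.search(r"Total[:\s]*\$?([\d,]+\.\d{2})", text)
    return {
        "invoice_number": number.group(1) if number else None,
        "total": float(total.group(1).replace(",", "")) if total else None,
    }
```

The output is a plain dictionary ready to be posted into an accounting system, which is the "map documents into forms used by software" step the paragraph describes.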
Text intelligence also includes interpreting handwriting or drawings and converting them into regular text or images in applications. Smart glasses can read signboards for people who are vision impaired and can also allow them to ‘read’ books or magazines.
Language
Language models typically extend text or speech to support different languages. These models bridge the divide between people who speak different languages. With over 7,000 spoken languages worldwide, these models allow us to tap into knowledge that spans language boundaries, including languages no longer in everyday use, like Latin and Sanskrit.
Language translation is most useful for straightforward tasks like understanding a formal speech or lecture, or reaching out to local communities for social causes. However, even a simple conversation service across languages can lead to hilarious or embarrassing mistakes, as each language has nuances that cannot easily be translated into other languages.
Conversations
Conversations are a more complex form of speech or language interpretation. When humans have a conversation, we speak across different contexts. “Getting a home run” is not the same as “running home”, and conversation models need to interpret these snippets in the context of a baseball game. The models also need to retain context when conversations are interrupted and resumed later.
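The context-carrying behavior can be sketched as a tiny dialog manager that remembers the last topic it saw, so a later fragment like "that was close" is still interpreted within it. The topic keywords are illustrative assumptions:

```python
class ConversationContext:
    # Hypothetical keyword-to-topic map; a real system would use a
    # trained intent/topic model instead.
    TOPIC_KEYWORDS = {"baseball": {"home run", "inning", "pitcher"}}

    def __init__(self):
        self.topic = None  # persists across interrupted and resumed turns

    def interpret(self, utterance: str) -> str:
        text = utterance.lower()
        # Update the remembered topic if the utterance signals a known one.
        for topic, keywords in self.TOPIC_KEYWORDS.items():
            if any(k in text for k in keywords):
                self.topic = topic
        return f"[{self.topic or 'general'}] {utterance}"
```

The point is that state lives in the manager, not in any single utterance - which is exactly what distinguishes a conversation from an isolated speech command.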
Conversations are easier when we use text, like the chatbots on the web. However, speech bots that answer calls at a call center can also use audio to respond to specific parts of the conversation. Assistants like Siri, Cortana and Alexa are gradually ‘growing up’ to manage conversations instead of just short speech instructions. But to do so, they will need a lot of data and context about the person they are conversing with.
Decisions
While most of the other services focus on interpretation, decision models help in taking an action. While a vision model can identify an obstacle, the decision model is responsible for weighing the various options and deciding to turn the vehicle onto a different path.
Decision services are more complex and typically involve multiple inputs and interpretations before they instruct an action. Decision services can take actions by themselves (moderate and filter graphic content), suggest decisions to humans (recommendations for stock trades) or connect to a physical device to take decisions (open the gate for an approaching authorized vehicle).
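The gate example can be sketched as a decision function that combines several model outputs - a recognized license plate, the recognizer's confidence, and an obstacle check - before instructing an action. The plate list, threshold and action names are illustrative assumptions:

```python
# Hypothetical allow-list of authorized plates.
AUTHORIZED_PLATES = {"KA01AB1234"}

def gate_decision(plate: str, plate_confidence: float,
                  obstacle_detected: bool) -> str:
    """Combine multiple model outputs into one gate action."""
    if obstacle_detected:
        return "hold"   # safety input overrides everything else
    if plate in AUTHORIZED_PLATES and plate_confidence >= 0.95:
        return "open"
    return "deny"
```

Note that no single input decides the outcome: the decision layer is where the interpretations from the other models are reconciled.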
Custom Machine Learning Models
The common learning models are available from multiple sources. But as business use cases evolve, these models will have to be fine-tuned for specific implementations. So while a generic speech API might become quite adept at online support, support for medical or legal terminology will be a specialized service that needs a set of custom-trained models.
Businesses can jump-start their AI implementations by leveraging the common learning models for various use cases within their organizations. Doing so brings in an AI-driven culture and allows teams to focus on unlocking the value of AI in their organizations.