Applied AI Learning Models

    Authored by Praveen Srivatsa, Director, Asthrasoft Consulting


    Artificial Intelligence is being spearheaded by a set of common learning models - vision, speech, text, conversations and decisions. Beyond these, there are custom machine learning models specific to different domains, such as fraud detection in financial services, sentiment analysis in social media or traffic prediction for driving.

    The common learning models have multiple use cases that businesses can leverage out of the box. Over time, these models can be fine-tuned and extended to many more scenarios. An example is using face detection models for attendance, then extending them to track unauthorized access for stronger round-the-clock security.

    Vision

    One of the most widely used AI applications is the use of vision models. With webcams being part of most laptops, mobiles and surveillance devices, we have access to a lot of visual data. This allows vision models to be used in various practical scenarios in the real world.

    The simplest application of vision models is detection. This can be object detection - identifying animals, cars, obstacles - or identifying people using face detection. Different applications need different levels of accuracy - for example, an intruder detection system only needs to identify whether a human or an animal has ventured into an unauthorized area. However, an attendance or authorization system at an airport or immigration checkpoint needs significantly higher detection accuracy, along with classification and identification of the person.
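    To make the accuracy trade-off concrete, here is a minimal sketch of how two use cases might apply different confidence thresholds to the same detection output. The `Detection` type, labels and threshold values are assumptions for illustration, not any particular vision API.

```python
# Illustrative sketch only: the Detection type and both thresholds are
# hypothetical, chosen to show how use cases differ in required accuracy.
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # e.g. "person", "animal", "car"
    confidence: float  # model confidence in [0, 1]


# A perimeter alarm can tolerate false positives, so a low threshold works;
# an identity check at immigration needs far higher confidence.
INTRUDER_THRESHOLD = 0.5
IDENTITY_THRESHOLD = 0.95


def triggers_intruder_alert(d: Detection) -> bool:
    """Low bar: any human or animal sighting raises the alarm."""
    return d.label in ("person", "animal") and d.confidence >= INTRUDER_THRESHOLD


def confirms_identity(d: Detection) -> bool:
    """High bar: only a very confident person match passes."""
    return d.label == "person" and d.confidence >= IDENTITY_THRESHOLD


sighting = Detection(label="person", confidence=0.6)
print(triggers_intruder_alert(sighting))  # True  - enough for an alarm
print(confirms_identity(sighting))        # False - not enough for identity
```

    The same detection feeds both checks; only the threshold - and therefore the tolerated error rate - changes with the application.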

    Vision models can also be used for other purposes, like identifying obstacles for moving robots or drones, detecting medical issues that can be visually identified, umpiring actions in sports or analyzing images for copyright. These have additional challenges and need either higher identification accuracy or the ability to detect in real time for decision making.

    Speech

    Audio models deal with sound, and the largest application of audio detection is the interpretation of human speech. Unless we are listening for a very specific sound, like an ambulance siren, most sounds are too generic for direct interpretation. Human speech, however, has many applications.

    Services like Siri, Cortana and Alexa have brought speech into our daily lives. Audio commands are becoming common in many applications, and by including such intelligence in mobile phones, they are becoming ubiquitous. Many common tasks - switch on the lights, turn on the car, play music, read my messages - leverage speech recognition. Speech-to-text and text-to-speech technologies also allow speech to be used for dictation and reading.
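    A sketch of the step that follows speech recognition: once audio has been transcribed to text, the transcript still has to be routed to an action. The command phrases and action names below are assumptions for illustration, not any real assistant's API.

```python
# Illustrative sketch only: a toy command router for transcribed speech.
# The command-to-action table is hypothetical.
def route_command(transcript: str) -> str:
    """Map a recognized speech transcript to a device action."""
    commands = {
        "switch on the lights": "lights_on",
        "turn on the car": "car_start",
        "play music": "music_play",
        "read my messages": "messages_read",
    }
    # Normalize case and surrounding whitespace before matching.
    return commands.get(transcript.lower().strip(), "unknown_command")


print(route_command("Play music"))      # music_play
print(route_command("order a pizza"))   # unknown_command
```

    Real assistants use intent classifiers rather than exact string matching, so "put on some music" and "play music" resolve to the same action; the table above only illustrates the routing idea.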

    Currently, speech recognition is limited by languages, dialects and accents. For it to become more mainstream, speech models will need to expand their understanding to a broader audience. The other limiting factor for speech is noise: with multiple people talking simultaneously in a public space, speech models have to recognize whose instructions to interpret and follow.

    Text

    Text intelligence lies in understanding the written word. While it is a variation of either vision or speech (using speech-to-text), text intelligence can interpret whole documents and map them into forms used by software applications. Examples of text intelligence models include services that can scan an invoice and create a matching record in the system, parse a medical record and summarize the diagnosis, or interpret a legal document and smartly identify sections applicable to other cases.
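    The invoice example above boils down to turning free-form text into a structured record. Here is a minimal sketch of that mapping using pattern matching; the field names and line formats are assumptions for illustration, and a real service would use trained extraction models rather than fixed patterns.

```python
# Illustrative sketch only: extract labelled fields from invoice-like text.
# The patterns and field names are hypothetical.
import re


def parse_invoice(text: str) -> dict:
    """Map free-form invoice text into a structured record."""
    fields = {}
    for pattern, key in [
        (r"Invoice\s*#?:\s*(\S+)", "invoice_number"),
        (r"Vendor:\s*(.+)", "vendor"),
        (r"Total:\s*\$?([\d.]+)", "total"),
    ]:
        match = re.search(pattern, text)
        if match:
            fields[key] = match.group(1).strip()
    return fields


sample = """Invoice #: INV-1042
Vendor: Acme Supplies
Total: $149.50"""
print(parse_invoice(sample))
# {'invoice_number': 'INV-1042', 'vendor': 'Acme Supplies', 'total': '149.50'}
```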

    Text intelligence also includes interpreting handwriting or drawings and converting them into regular text or images in applications. Smart glasses can read signboards for people who are vision impaired and can also allow them to 'read' books or magazines.

    Language

    Language models typically extend text or speech to support different languages. These models bridge the divide between people who speak different languages. With over 7,000 spoken languages worldwide, these models allow us to tap into knowledge that spans language boundaries, including languages no longer in everyday use, like Latin and Sanskrit.

    Language translation is most useful for simpler tasks, like understanding a formal speech or lecture, or reaching out to local communities for social causes. However, even a simple conversation service across languages can lead to hilarious or embarrassing mistakes, as each language has nuances that cannot easily be translated.

    Conversations

    Conversations are a more complex form of speech or language interpretation. When humans have a conversation, we speak across different contexts. "Getting a home run" is not the same as "running home", and conversation models need to interpret such snippets in the context of, say, a baseball game. The models also need to retain context as conversations are interrupted and resumed later.

    Conversations are easier when we use text, like the chatbots on the web. However, speech bots that answer at a call center can also use audio to respond to specific parts of the conversation. Assistants like Siri, Cortana and Alexa are gradually 'growing up' to manage conversations instead of just short speech instructions. But to do so, they will need a lot of data and context about the person they are conversing with.
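    What separates a conversation from a one-shot command is state carried between turns. This minimal sketch tracks a topic so a follow-up question can be resolved; the topics and replies are assumptions for illustration, far simpler than any production dialogue system.

```python
# Illustrative sketch only: a toy dialogue tracker that carries context
# across turns. Topics and canned replies are hypothetical.
class DialogueTracker:
    def __init__(self):
        self.topic = None  # context remembered between turns

    def handle(self, utterance: str) -> str:
        text = utterance.lower()
        if "baseball" in text or "home run" in text:
            self.topic = "baseball"
            return "Talking baseball now."
        if "score" in text:
            # "What's the score?" only makes sense given an earlier topic.
            if self.topic == "baseball":
                return "Checking the baseball score."
            return "Score of what? I need more context."
        return "Sorry, I did not follow that."


bot = DialogueTracker()
print(bot.handle("What's the score?"))           # no context yet
print(bot.handle("Did you see that home run?"))  # sets the topic
print(bot.handle("What's the score?"))           # resolved via context
```

    The same utterance gets two different answers depending on what came before - which is exactly the context problem the paragraph above describes.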

    Decisions

    While most of the other services focus on interpretation, decision models help in taking an action. While a vision model can identify an obstacle, the decision model is responsible for weighing the various options and deciding to turn the vehicle onto a different path.

    Decision services are more complex and typically involve multiple inputs and interpretations before they instruct an action. Decision services can take actions by themselves (moderate and filter graphic content), suggest decisions to humans (recommendations for stock trades) or connect to a physical device to act (open the gate for an approaching authorized vehicle).
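    The gate example can be sketched as a decision function combining several interpreted inputs - vehicle detected, plate read, plate authorized - into one action. All names, inputs and rules below are assumptions for illustration.

```python
# Illustrative sketch only: a decision service combining multiple
# interpreted inputs into a single gate action. Rules are hypothetical.
from typing import Optional


def gate_decision(vehicle_detected: bool,
                  plate_text: Optional[str],
                  authorized_plates: set) -> str:
    """Return the action a gate controller should take."""
    if not vehicle_detected:
        return "keep_closed"
    if plate_text is None:
        return "alert_operator"  # vehicle present but plate unreadable
    if plate_text in authorized_plates:
        return "open_gate"
    return "deny_entry"


allowed = {"KA01AB1234", "KA05CD5678"}
print(gate_decision(True, "KA01AB1234", allowed))  # open_gate
print(gate_decision(True, "XX00ZZ0000", allowed))  # deny_entry
print(gate_decision(True, None, allowed))          # alert_operator
```

    Note how the decision layer consumes the outputs of other models (a vision model detecting the vehicle, a text model reading the plate) rather than raw data - which is what makes decision services the most composite of the learning models described here.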

    Custom Machine Learning Models

    The common learning models are available from multiple sources. But as business use cases evolve, these models will have to be fine-tuned for specific implementations. So while a generic speech API might become quite adept at online support, support for medical or legal terminology will be a specific service requiring a set of custom-trained models.

    Businesses can jump-start their AI implementations by leveraging the common learning models for various use cases within their organizations. Doing so brings in an AI-driven culture and allows teams to focus on unlocking the value of AI in their organizations.

