By now, almost anyone reading this blog has experienced a conversational agent, or “chatbot,” in some form or another. They have become pervasive in support roles, automated phone systems, and some email processing systems, to name just a few. One of the earliest examples of a chatbot was the famous ELIZA system, born out of the MIT AI Laboratory in the 1960s (“Eliza,” 2021). The system was designed to be a parody of a Rogerian psychotherapist, and in some cases it fooled people into believing it truly understood them and felt empathy. While ELIZA certainly had some of the earliest components of text processing, it was far from actually understanding what it was reading or writing.
Fast forward 60 years, and we have more compelling chatbots such as Siri (Apple), Cortana (Microsoft), Alexa (Amazon), and the Google Assistant. Some will crack wise, others prompt for additional context for a request, and some, like Google Duplex, actually sound like a real person on the other end of a conversation. The funny thing is that, as impressive as these modern chatbots are, they still have no real concept of understanding.
What is the difference between NLP and NLU?
Natural language processing (NLP) is the field that involves the actual processing of textual data so that it can be used by computer systems. It is responsible for tokenizing and parsing text, often providing annotations for parts of speech (POS) and coreferences for pronouns and nonspecific objects. It tries to identify entities (is the “White House” a proper name, or simply a house that is white?). Some subfields involve data mining, knowledge base construction, and language modeling; there are many more highly specialized fields in NLP that we won’t be exploring here. Suffice it to say that it is a very rich and incredibly active field of research.
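To make that concrete, here is a minimal sketch of the kind of surface-level annotation an NLP library produces, using spaCy (the model name and the example sentence are just illustrative assumptions, not tied to any particular system):

```python
# A minimal sketch of surface-level NLP annotations using spaCy.
# Assumes the small English model has been installed first:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The White House announced a new policy on Tuesday.")

# Tokenization plus part-of-speech (POS) and dependency annotations
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities: is "White House" an organization or just a white house?
for ent in doc.ents:
    print(ent.text, ent.label_)
```

Notice that every annotation here is about the form of the text; nothing in the output says anything about what the sentence means to the people exchanging it.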
Natural language understanding (NLU), on the other hand, is a research area with heavy overlap with NLP, since it requires the ability to process textual data, but its own research focus is on understanding what the text contains at a conceptual level: intent, context, and semantics. The “understanding” part is where most chatbots break down, and it is crucial to creating truly useful natural language systems that can grasp the same concepts that we can.
In a nutshell, NLP is all about what is contained in the corpora, while NLU is about extracting the meaning.
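As a rough illustration of that gap, here is what one narrow slice of “meaning extraction,” intent detection, looks like in practice with a zero-shot classification pipeline from HuggingFace transformers (the candidate intent labels and the example utterance are purely illustrative assumptions):

```python
# A rough sketch of intent detection, one narrow slice of NLU, using a
# zero-shot classification pipeline from HuggingFace transformers.
# The candidate intent labels below are illustrative assumptions only.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "My package never arrived and I want my money back.",
    candidate_labels=["request refund", "track order", "cancel subscription"],
)

# The highest-scoring label is treated as the user's "intent"
print(result["labels"][0], result["scores"][0])
```

Even here the “understanding” is statistical: the model picks the label whose wording best matches the utterance, with no grasp of packages, money, or frustration.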
What about [insert SOTA DL language model here]?
We’ve all seen the impressive demos of the latest SOTA language models like GPT-x, BERT, ELMo, Codex, and many of the incredible models provided by the HuggingFace community (Hugging Face – The AI Community Building the Future, n.d.). While these models are great at processing text and generating seemingly relevant follow-on sentences in many simple cases, in the end they are still purely statistical models that identify the sequence of tokens most likely to follow the leading text. Think of them as probabilistic context-free grammars (PCFGs) flipped on their head so they can now generate text instead of simply consuming it, reduced to very fast linear algebra operations. If you give one of these large language models (LLMs) the leading text of “Shall I compare thee to a summer’s day?” it will churn out Shakespeare’s Sonnet 18 almost verbatim, or at worst some gibberish in a very Shakespearean style, which it has picked up from being trained on literary corpora (Crowe, 2020).
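To see just how mechanical that generation is, here is a minimal sketch using the HuggingFace transformers text-generation pipeline with the small GPT-2 checkpoint (an assumption, chosen only because it is small and freely available; the exact continuation will vary from run to run):

```python
# A minimal sketch of prompt continuation with a small causal language
# model. GPT-2 is an assumption here, chosen only because it is small
# and freely available; larger models behave the same way, just better.
from transformers import pipeline, set_seed

set_seed(42)  # make the sampled continuation repeatable
generator = pipeline("text-generation", model="gpt2")

prompt = "Shall I compare thee to a summer's day?"
result = generator(prompt, max_length=60, num_return_sequences=1)
print(result[0]["generated_text"])
```

Whatever comes back may read as Shakespearean, but the model has no notion of summer, of days, or of the person being addressed; it is only sampling likely next tokens.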
Granted, in some cases this is actually very useful if you have a constrained ontology and good training material. Some people have even explored using such models as “rubber ducks” so they can essentially talk with themselves, by training on large corpora of their own conversations with other people (Talking to Myself or How I Trained GPT2-1.5b for Rubber Ducking Using My Facebook Chat Data, 2020). Others have trained these models on very specific support transcripts to attempt to provide a first-line support option that scales (with oftentimes dubious success). While useful, these models will always be challenged to actually grok the meaning behind the conversations they were intended to handle.
Why is NLU so hard?
While NLP is able to perform amazing analysis on natural language corpora, what it misses is the critical common knowledge shared between conversationalists; this absence is commonly referred to as the “Missing Text Problem” (Machine Learning Won’t Solve Natural Language Understanding, 2021). For example, take this simple sentence:
The weather is nice today.
What exactly does that mean in the context of a conversation? If you’re in Atlanta in the summer, this could mean it’s not too humid, not raining, and not too hot. Even then, what constitutes “hot” to someone in Atlanta might be a “nice” temperature somewhere else in the world. And if you’re at the beach, “nice” would mean sunny and hot.
Through the evolution of language, the sentence has effectively been compressed in a lossy fashion, and the conversationalists are able to decompress it using shared knowledge of where they are, what time of year it is, and what the context is (the beach versus the city). Have you ever tried to discuss a topic you know well with someone not versed in it, and found that something always seems to be “lost in translation”? That’s the lack of common knowledge. Effective communication, even between humans, requires a shared experience and cultural understanding, which an LLM simply does not have.
To make matters worse, most LLM implementations attempt to further compress an already compressed piece of information, using autoencoders, embeddings, and other approaches to find relationships in the patterns of words in an effort to reduce the feature set. You have now approximated a lossy-compressed piece of information. It is no wonder that LLMs struggle with understanding when they have lost information through at least two steps: thoughts to words, then words to quantitative data.
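As a rough sketch of that second step, words to quantitative data, here is how an entire sentence gets squeezed into one fixed-length vector with the sentence-transformers library (the specific model name is an assumption; any sentence-embedding model illustrates the same point):

```python
# A rough sketch of the "words to quantitative data" step: an entire
# sentence is reduced to a single fixed-length vector. The model name
# is an assumption; any sentence-embedding model makes the same point.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentence = "The weather is nice today."
embedding = model.encode(sentence)

# 384 floating-point numbers now stand in for the sentence, and for all
# of the unstated context (location, season, beach vs. city) behind it.
print(embedding.shape)  # (384,)
```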
This lack of understanding not only hinders comprehension of provided textual data, but is also a major stumbling block for summarization and storytelling approaches, which are an active area of research (An Introduction to AI Story Generation, 2021). Storytelling, or content generation of any kind, requires strong background information not only to hold a story together cohesively but also to provide guidance for the actors in the storyline, and often the sources used to train storytelling ML systems are the same lossy-compressed inputs used for NLP consumption.
Conclusion
In the past, symbolic AI systems and manually curated knowledge bases drove NLP and NLU research, but they proved difficult to scale and still struggled with understanding. Modern systems leveraging ML and LLMs are able to bypass the knowledge base bottleneck, but they lose the ability to understand what they are consuming because they ignore linguistics and the psychology of the target users. Maybe it’s time for NLU research to begin bridging the gap between symbolic and statistical AI through hybrid approaches, so we can build systems that truly understand conversation and can intelligently engage people.
References
An introduction to AI story generation. (2021, August 21). The Gradient. https://thegradient.pub/an-introduction-to-ai-story-generation/
Crowe, L. (2020, October 5). Poetech: Shall I compare thee to GPT-3? Shakespeare v AI. Sayre Zine. https://sayre-zine.com/2020/10/05/gpt-3-ai-poetry-shakespeare-sonnet-18/
Eliza. (2021). In Wikipedia. https://en.wikipedia.org/w/index.php?title=ELIZA&oldid=1034743779
Hugging Face – The AI community building the future. (n.d.). Retrieved August 30, 2021, from https://huggingface.co/
Machine learning won’t solve natural language understanding. (2021, August 7). The Gradient. https://thegradient.pub/machine-learning-wont-solve-the-natural-language-understanding-challenge/
Talking to myself or how I trained GPT2-1.5b for rubber ducking using my Facebook chat data. (2020, January 23). Svilen Todorov. https://svilentodorov.xyz//blog/gpt-15b-chat-finetune/