Explore the evolution of AI and ML from traditional systems to generative models, and learn how understanding these advancements can empower organizations to effectively leverage AI technologies in solving real-world challenges.
Artificial Intelligence (AI) systems have evolved over a long period, from symbolic AI systems to machine learning (ML) and, further, deep learning. More recently, the phrase 'Generative AI' has garnered a great deal of attention with the success of OpenAI's ChatGPT, followed by Google's Gemini, Meta's Llama, and numerous other Large Language Models (LLMs), leading to a rush of Gen AI application development. Most of these applications have focused on LLMs that generate text.
The key behind the success of this technology was the introduction of the Transformer model by Google's researchers in their seminal paper, "Attention Is All You Need" [1]. The Transformer is a deep-learning architecture built around the attention mechanism, which determines the relative importance of each component in a sequence relative to the other components in that sequence.
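To see what the attention mechanism actually computes, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation inside the Transformer. The toy shapes and random input are illustrative assumptions; multi-head attention and learned projections are omitted.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output position is a weighted blend of all value vectors,
    with weights reflecting how relevant every other position is.

    Q, K, V: arrays of shape (seq_len, d_k) -- queries, keys, values.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise relevance scores
    # softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # blend values by relevance

# toy example: a "sequence" of 3 tokens with 4-dimensional representations
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention on x
print(out.shape)  # (3, 4)
```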
So where exactly does Generative AI lie in the development history of Artificial Intelligence? More importantly, how can we leverage it alongside the AI architectures already built? To answer that, we need a better understanding of the landscape of the AI field, so that we are not overwhelmed by the catchy phrases gaining popularity in the media.
AI definitions
AI can be defined from two different perspectives [2]. The first defines Artificial Intelligence in terms of the goals of the field; the second takes the perspective of an agent.
From the perspective of a goal, it can be laid out across two dimensions:
(i) Is the reasoning based on human performance or ideal rationality?
(ii) Does the system act like humans or act rationally?
This can be summarized in the table below.
Not all work in AI falls neatly into one of the four boxes: many systems have been built that span several of them. Some researchers focus their work within one box, hoping it will also be used in another context. For example, a large body of work on 'planning' focuses on reasoning, while other systems build on that work to act in different scenarios.
Artificial Intelligence systems can also be described from the view of an agent [3]:
“The main unifying theme is the idea of an intelligent agent. We define AI as the study of agents that receive percepts from the environment and perform actions. Each such agent implements a function that maps percept sequences to actions, and we cover different ways to represent these functions.”
The figure below provides an illustration of an Intelligent Agent.
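To make the agent abstraction concrete, here is a minimal sketch of "a function that maps percept sequences to actions". The thermostat domain, names, and setpoint below are hypothetical illustrations of ours, not taken from the references.

```python
from typing import Sequence

class ThermostatAgent:
    """Maps a history of temperature readings (percepts) to a heater
    command (action)."""

    def __init__(self, setpoint: float):
        self.setpoint = setpoint  # hypothetical target temperature

    def act(self, percepts: Sequence[float]) -> str:
        current = percepts[-1]  # act on the most recent percept
        return "heat_on" if current < self.setpoint else "heat_off"

agent = ThermostatAgent(setpoint=20.0)
print(agent.act([18.5, 19.0, 19.2]))  # -> heat_on
```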
These two perspectives play an important role in how AI has evolved and continues to evolve.
AI Landscape
AI has evolved into multiple sub-fields based on the approach taken to solve a problem, the kinds of inputs, and the expected outputs, as summarized in the table above. The landscape itself can be viewed as illustrated in the following figure:
While the figure is not a comprehensive representation of all AI sub-fields, it provides a guide to the different techniques and terminologies that have been trending. As the graphic shows, machine learning is a subset of Artificial Intelligence. AI techniques that do not involve machine learning are commonly referred to as Symbolic AI and comprise rule-based systems, expert systems, knowledge-based systems, and the like, which primarily rely on encoding human knowledge and logic into programmable systems.
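To give a flavor of the symbolic approach, here is a minimal, hypothetical rule-based sketch in which domain knowledge is hand-written as condition/conclusion rules rather than learned from data. The rules and facts are invented purely for illustration.

```python
# Hand-coded domain knowledge: each rule pairs a condition on the known
# facts with a conclusion to draw when that condition holds.
rules = [
    (lambda f: f["temperature"] > 38.0 and f["cough"], "suspect_flu"),
    (lambda f: f["temperature"] <= 38.0 and f["cough"], "suspect_cold"),
]

def infer(facts: dict) -> list[str]:
    """Fire every rule whose condition holds for the given facts."""
    return [conclusion for condition, conclusion in rules if condition(facts)]

print(infer({"temperature": 38.6, "cough": True}))  # ['suspect_flu']
```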
Machine learning allows systems to learn from data without being explicitly programmed with rules. Typically, machine learning requires large amounts of data to process and learn from. There are three main machine-learning categories:
(i) supervised learning, which learns from labeled examples;
(ii) unsupervised learning, which finds structure in unlabeled data; and
(iii) reinforcement learning, which learns by interacting with an environment and receiving rewards.
Deep learning falls under both the supervised and unsupervised learning categories and differs from traditional ML techniques in how the algorithm learns and how much data is required for learning. Neural networks form the backbone of deep learning: the 'deep' refers to the presence of multiple hidden layers of interconnected nodes between the input and output layers of a neural network.
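As a concrete (if toy) illustration of those hidden layers, the following sketch passes an input through a small fully connected network. The layer sizes and random weights are assumptions for illustration only; a real network learns its weights from data.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # simple elementwise nonlinearity

def forward(x, layers):
    """Pass an input through a stack of (weight, bias) layers."""
    for W, b in layers:
        x = relu(x @ W + b)  # each hidden layer: linear map + nonlinearity
    return x

# toy network: 4 inputs -> hidden layers of 8 and 6 units -> 2 outputs
rng = np.random.default_rng(1)
sizes = [4, 8, 6, 2]
layers = [(rng.normal(size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
print(forward(rng.normal(size=(1, 4)), layers).shape)  # (1, 2)
```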
The challenge with deep learning techniques is their complexity: they require large amounts of data (often dense data) to train the multiple hidden layers and to automatically determine the relevant features at scale.
ML models can also be classified based on the tasks or goals they achieve. We can classify models into:
(i) discriminative models, which learn the boundary between classes, i.e., they model P(y | x) directly; and
(ii) generative models, which learn how the data itself is distributed, i.e., they model P(x | y) or the joint P(x, y).
It is true that the most popular and well-known techniques of classical ML, such as decision trees, SVMs, and regressions, are discriminative models. However, it is important to note that both classical machine learning and deep learning implement discriminative models and generative models. Although generative models are more popular within deep learning, models such as Naive Bayes, Gaussian Mixture Models (GMMs), and Hidden Markov Models (HMMs) are examples of generative models in the classical machine learning space. Hence, in the figure above, we have labeled 'Generative AI-DL' to reflect generative models within the deep learning framework.
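To make the distinction tangible, the following sketch (assuming scikit-learn is installed) fits one model from each family to the same synthetic data: Gaussian Naive Bayes is generative, modeling how each class generates features, P(x | y), while logistic regression is discriminative, modeling the class boundary P(y | x) directly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# synthetic two-class dataset, purely for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

generative = GaussianNB().fit(X, y)          # models P(x | y) per class
discriminative = LogisticRegression().fit(X, y)  # models P(y | x)

print(generative.score(X, y), discriminative.score(X, y))
```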
Gen AI models popularly mentioned in the media are typically based on deep learning methods. One such example is Large Language Models (LLMs), which are Transformer models focused on NLP tasks. Gen AI, however, also comprises various kinds of models other than Transformers, such as diffusion models, generative adversarial networks (GANs), and variational autoencoders (VAEs).
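For instance, generating text with a pre-trained Transformer LLM can be as short as the sketch below, assuming the Hugging Face transformers library is installed. Here "gpt2" is used only as a small, freely available example model, not a recommendation.

```python
from transformers import pipeline

# load a small pre-trained Transformer LLM for text generation
generator = pipeline("text-generation", model="gpt2")
result = generator("Generative AI is", max_new_tokens=20)
print(result[0]["generated_text"])
```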
Summary
Artificial Intelligence has evolved as a field over roughly seventy years. Advances in data collection, storage, and computing power have enabled significant advances in machine learning and deep learning, both in conceptualizing new models and in implementing them at scale. The most recent advancement, the Transformer model based on the attention mechanism, combined with the availability of GPU power for large-scale computation, has given rise to a new class of Generative AI applications.
Models enable systems to solve problems. Which models one adopts depends largely on the problem one is trying to solve, the data available, the compute available, and the cost of building, implementing, and maintaining such a system in a production environment, which for large organizations is often a multi-cloud environment.
The Generative AI field is itself progressing with the adaptation of Transformer models for various applications, while many systems are also moving to agent-based frameworks in which sensors collect data continuously and the system is expected to act in real time to changing environments. However, not all problems require agent-based modeling or deep learning architectures (such as Transformer models), which are complex, require large amounts of data, and incur significant compute costs (today often metered per token). Hence, it is important for teams to understand the problem they are solving, the data, and the infrastructure available before venturing into any line of advanced AI systems.
In our future blogs, we will discuss how exactly we can build an architecture that leverages the different models, describe in detail Generative AI and Transformer models, and summarize agent-based AI systems.
References
[1] Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
[2] Bringsjord, S., & Govindarajulu, N. S. (2024). Artificial Intelligence. In E. N. Zalta & U. Nodelman (Eds.), The Stanford Encyclopedia of Philosophy (Fall 2024 Edition). URL.
[3] Russell, S., & Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall.