Introduction
In recent years, artificial intelligence (AI) and machine learning (ML) tools have become increasingly robust and integrated into our everyday lives. However, the terms used to describe these technologies can often be confusing or intimidating, even for those with technical backgrounds. That’s where this article comes in – starting with the broadest terms and working our way down, we’ll break down some core technical vocabulary related to AI and ML so you can understand the basics and better navigate the world of AI and ML.
Artificial Intelligence (AI)
A broad category that can be summed up as humanity’s efforts to allow computers to approximate human abilities such as vision or the ability to understand language. AI should not be mistaken for Artificial General Intelligence or “AGI”, which refers to a hypothetical system that could theoretically learn to accomplish any task a human can. Although the goal of many AI researchers is ultimately to produce an AGI, it’s still not possible with today’s technology and methods.
Machine Learning (ML)
Arthur Samuel said it best in 1959: “[Machine learning is a] Field of study that gives computer(s) the ability to learn without being explicitly programmed.” ML is a huge part of AI research and development, but because AI is such a broad category, ML is ultimately a subset of AI.
Deep Learning Network (DLN) or Neural Network (NN)
A big mathematical equation that takes an input, performs a bunch of operations, and provides an output. This occurs across many “layers” of individual operations that can be visualized as a web of connections, somewhat similar to how neurons in the brain are connected to each other via synapses. In recent years, however, comparisons to human physiology have been falling out of favor, which helps explain why terms like “deep learning” have been introduced to replace terms like “neural network”.
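To make the “equation with layers” idea concrete, here is a minimal sketch in Python (using NumPy, with made-up shapes and values purely for illustration) of an input flowing through two layers of operations:

```python
# A minimal sketch of a "big equation with layers": two layers of weights,
# the first followed by a simple non-linearity (ReLU).
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(1, 4))      # input: 1 example with 4 features
W1 = rng.normal(size=(4, 8))     # layer 1 weights
W2 = rng.normal(size=(8, 2))     # layer 2 weights

hidden = np.maximum(0, x @ W1)   # layer 1: matrix multiply + ReLU
output = hidden @ W2             # layer 2: matrix multiply

print(output.shape)              # (1, 2) -> the network's output
```

Real networks have many more layers and billions of values in those weight matrices, but the basic pattern of “multiply, transform, pass along” is the same.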
Natural Language Processing (NLP)
A field of study dedicated to allowing computers to understand human language. NLP doesn’t have to be ML, but nowadays, it’s safe to say that most NLP tools are based on some form of ML because it’s proven to be a practical method of building effective NLP tools. Speech recognition, text-to-speech, machine translation, and text generation are just a few examples of NLP tasks.
Transformer
A type of DLN with specific features that have proven to be quite powerful, particularly for NLP. One of these features, “self-attention”, allows it to track relationships between data points, such as the words in a sentence, so it can better grasp the context of what’s being said. This makes it much better at understanding language, because without context, it’s incredibly difficult to decipher the nuances of human communication.
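For the curious, here is a bare-bones sketch of the scaled dot-product attention at the heart of self-attention (NumPy, illustrative shapes only; real transformers use learned projections and many attention “heads”):

```python
# Each token's representation is updated using a weighted mix of every
# other token's representation -- that weighting is "attention".
import numpy as np

rng = np.random.default_rng(0)
tokens, dim = 5, 8                   # 5 tokens, 8-dimensional embeddings
X = rng.normal(size=(tokens, dim))   # token embeddings

Q, K, V = X, X, X                    # in real models: learned projections of X
scores = Q @ K.T / np.sqrt(dim)      # how strongly each token relates to each other token
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
attended = weights @ V               # context-aware token representations

print(attended.shape)                # (5, 8)
```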
Large Language Model (LLM)
A broad category of AI models trained on large datasets to learn the patterns and structures of human language. The LLM category includes GPT models, but not all LLMs are GPTs. Nowadays, the most common methods for training an LLM utilize deep learning methods, but deep learning is not necessarily a requirement of an LLM.
Generative Pre-trained Transformers (GPT)
A type of LLM. To define GPT, I think it helps to start from the end of the acronym and work backward:
“T” – a Transformer model…
“P” – which has been Pre-trained on large amounts of data…
“G” – that Generates new content, e.g. a response to a prompt.
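Putting those pieces together, here is a hedged sketch of what using a small GPT-style model can look like in practice, assuming the Hugging Face transformers library is installed and using the publicly available gpt2 model:

```python
# A minimal sketch, assuming the Hugging Face `transformers` library.
# "gpt2" is a small, publicly available Generative Pre-trained Transformer.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Machine learning is", max_new_tokens=20)
print(result[0]["generated_text"])   # the model continues ("generates") the prompt
```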
Model
The term model is somewhat vague and is often used differently depending on the context:
- “AI” Model – Researchers or developers will often use “model” to refer to the specific algorithms used in machine learning, such as deep learning, linear regression, logistic regression, decision trees, random forest, etc.
- “Foundation” or “Base” Model – This is what most people think of when referring to an AI “model”, and is essentially the end result of training. Examples of foundation models include DALL-E, Stable Diffusion, Llama, GPT-n, and many others.
- These terms can also differentiate between a model provided as-is, such as Stable Diffusion 1.5, and fine-tuned models derived from the base model, such as DreamShaper.
Training
The process where a model learns from the data it has been provided. This requires far more time and computing resources than running the end product, which is the model itself. Meta’s “Llama 2 70B” model, for example, took about 1.7 million GPU-hours to complete its pre-training. 100 top-of-the-line GPUs working together would take almost two years to complete that task, and even with 1,000 GPUs, it would still take about 10 weeks, which shows the kind of scale required to train these very large models.
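The arithmetic behind those figures is straightforward; a quick back-of-the-envelope check in Python:

```python
# Back-of-the-envelope check of the Llama 2 70B pre-training figures.
gpu_hours = 1_700_000                      # ~1.7 million GPU-hours of pre-training

for gpus in (100, 1000):
    hours = gpu_hours / gpus               # wall-clock hours if work splits evenly
    print(gpus, "GPUs ->", round(hours / 24), "days", f"(~{hours / 24 / 7:.0f} weeks)")
# 100 GPUs  -> ~708 days (almost two years)
# 1000 GPUs -> ~71 days  (~10 weeks)
```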
Pre-training
The initial phase of training where the model is exposed to a large amount of data to gain a general understanding of the relevant relationships within that data. In human terms, this could be compared to general education. Note that this is distinct from the term “Pre-trained” found within GPT, which simply means that a GPT model is “already trained” instead of referring to a specific phase of training.
Fine-tuning
A model is fine-tuned after pre-training on a more specific set of data, with the goal of improving its accuracy when performing a specific task. In contrast to the “general education” of pre-training, fine-tuning is more like vocational education.
Checkpoint
Technically, a checkpoint is an intermediate step during training, which allows for saving the current state of the training’s progress without having to complete the entire training process. This gives us more flexibility and fault tolerance during training, which is incredibly important due to the time and computational requirements of training. However, in some cases, the term “checkpoint” is used synonymously with “model”, such as within the stable-diffusion-webui (A1111) interface.
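In PyTorch, for example, saving and later restoring a checkpoint might look roughly like this (the file name and exactly what you choose to save are illustrative):

```python
# A minimal checkpointing sketch in PyTorch.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Save the current state of training to disk...
torch.save({
    "epoch": 5,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, "checkpoint.pt")

# ...and restore it later to resume training where you left off.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
```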
Parameters
Can be thought of as knobs used to affect and “dial in” the output of a given model. In an ideal scenario, the more parameters you use when training a model, the more nuanced and accurate its output will be. But just like knobs controlling a machine, they must be turned the right way to be effective! However, high parameter counts also increase the resources required for both training and running a model. It’s common for model names to include their parameter count, such as “Mistral 7B” featuring 7 billion parameters, or “Llama 2 70B” featuring 70 billion parameters.
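For a concrete sense of what counting parameters means, here is a small PyTorch sketch that tallies the learnable values in a toy model (the layer sizes are arbitrary):

```python
# Counting parameters in a tiny PyTorch model.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1000, 500),   # 1000*500 weights + 500 biases
    nn.ReLU(),
    nn.Linear(500, 10),     # 500*10 weights + 10 biases
)

total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")   # 505,510 -- LLMs have billions of these
```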
Inference
Once a model has been trained, inference is when the model makes predictions based on new data it has been provided. To the end user, this is when you are getting output from a model, such as getting a response from ChatGPT. Your prompt is the new data and the response you get back is the model’s prediction or “inference” of what it thinks you’re looking for, based on what it was trained to do.
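A minimal scikit-learn sketch makes the split between training and inference easy to see (the data here is random and purely illustrative):

```python
# Training vs. inference, illustrated with scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))              # training data
y_train = (X_train.sum(axis=1) > 0).astype(int)  # labels to learn from

model = LogisticRegression().fit(X_train, y_train)   # training

X_new = rng.normal(size=(1, 3))          # new, unseen data (your "prompt")
print(model.predict(X_new))              # inference: the model's prediction
```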
Prompt
What you submit to the AI when you are seeking an output. The most common example is text, often in the form of a question to an LLM, but it could be audio or visual data as well.
Context window
The amount of input, measured in tokens, that an LLM can take into account when generating a response. Generally, bigger context windows are more useful but require more resources to facilitate. Currently, most LLMs support a context window of 2048 or 4096 tokens, but a lot of effort and research is being directed toward efficient ways of increasing these limits for both current and future models.
Token
Basic unit of text that a model can process or generate. The “tokenization” process splits text, such as a user-submitted prompt, into smaller segments to be more easily understood and manipulated by a model. Multiple tokenization methods exist, so depending on the model, a token could be anything from just a single character to entire words. Byte-pair encoding (BPE) is a common method of tokenization, which results in tokens of about 2-3 characters apiece.
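As an illustration, here is roughly what tokenizing a sentence with GPT-2’s BPE tokenizer looks like, assuming the Hugging Face transformers library is installed:

```python
# Tokenization sketch using GPT-2's byte-pair-encoding tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Tokenization splits text into smaller pieces."
tokens = tokenizer.tokenize(text)
ids = tokenizer.encode(text)

print(tokens)     # sub-word pieces; common words stay whole, rarer ones get split
print(len(ids))   # the number of tokens the model actually processes
```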
Quantization
Lowering the numerical precision of a given model’s weights to reduce the amount of computational resources needed to load and run the model, usually in exchange for a small loss in output quality. For example, a model that requires ~80GB of memory to load at 16-bit precision may only take ~20GB of memory if it were quantized down to 4-bit.
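The memory savings follow directly from the bit width. A rough estimate in Python (using a hypothetical ~40-billion-parameter model to match the ~80GB figure above, and ignoring overhead such as activations):

```python
# Rough memory estimate for loading a model's weights at different precisions.
params = 40e9                            # illustrative: a ~40-billion-parameter model

for bits in (16, 8, 4):
    gigabytes = params * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{bits}-bit: ~{gigabytes:.0f} GB")
# 16-bit: ~80 GB, 8-bit: ~40 GB, 4-bit: ~20 GB
```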
Tensor
A tensor is a lot like a spreadsheet with cells that influence each other but with more dimensions than just rows(x) and columns(y). These “cells” follow certain rules, and by manipulating what’s in these “cells”, we can teach a model to make accurate predictions, like what word is likely to come next in a sentence based on the relationships between the preceding words. If you are familiar with vectors, it might help to think of a tensor as a kind of multi-dimensional vector.
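If code is easier to picture than spreadsheets, here is a small NumPy sketch showing how the number of dimensions grows (in ML practice, “tensor” is typically used loosely to mean a multi-dimensional array like these):

```python
# Tensors of increasing dimensionality, sketched with NumPy.
import numpy as np

scalar = np.array(3.0)              # 0 dimensions: a single number
vector = np.array([1.0, 2.0, 3.0])  # 1 dimension:  a row of numbers
matrix = np.ones((2, 3))            # 2 dimensions: rows (x) and columns (y)
tensor = np.ones((4, 2, 3))         # 3 dimensions: a "stack" of matrices

for t in (scalar, vector, matrix, tensor):
    print(t.ndim, t.shape)
```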
For a comprehensive yet accessible introduction to tensors, check out Dan Fleisch’s “What’s a Tensor?” video on YouTube.
Low-Rank Adaptation (LoRA)
A method for fine-tuning models without fundamentally changing the model underneath. Once a LoRA is trained, it can be applied to an existing model to modify its outputs. A common example would be a Stable Diffusion LoRA that has been trained on images of a particular style to get the base model to output images mimicking that style consistently.
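Conceptually, a LoRA learns two small matrices whose product nudges the original weights without touching them. A toy NumPy sketch of the idea (the dimensions, rank, and scaling factor are illustrative):

```python
# Toy illustration of the low-rank update at the heart of LoRA.
import numpy as np

rng = np.random.default_rng(0)
d, rank = 512, 8                 # original dimension vs. a much smaller "rank"

W = rng.normal(size=(d, d))      # frozen base-model weight matrix
A = rng.normal(size=(rank, d))   # small trained matrix A
B = rng.normal(size=(d, rank))   # small trained matrix B

W_adapted = W + 0.5 * (B @ A)    # base weights + scaled low-rank update

print(W.size, "base values vs.", A.size + B.size, "LoRA values")
# 262,144 vs. 8,192 -- the adapter is a tiny fraction of the original model
```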
Conclusion
We hope that this article has helped improve your understanding of the terminology used in the field of AI and machine learning. If you’ve encountered confusing AI jargon, commonly misunderstood terms, or helpful analogies that aid in understanding, feel free to share your insights in the comments section below!