AI & LLM Glossary
Simple explanations of key terms used in artificial intelligence and large language models
This glossary explains common AI and LLM terms in simple language to help you understand the technology better.
A
AI (Artificial Intelligence)
Technology that enables computers to perform tasks that typically require human intelligence, such as understanding language, recognizing patterns, solving problems, and making decisions.
AGI (Artificial General Intelligence)
A hypothetical form of AI with the ability to understand, learn, and apply knowledge across a wide range of tasks at a level equal to or surpassing humans. Unlike current AI systems that excel at specific tasks, AGI would have general-purpose intelligence.
Attention Mechanism
A technique used in neural networks that allows models to focus on specific parts of the input when generating each part of the output. In transformers, attention helps the model understand relationships between words regardless of their position in a sequence.
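To make the idea concrete, here is a minimal sketch of scaled dot-product attention in NumPy. The matrices Q, K, and V are made-up toy values, not weights from any real model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted mix of the rows of V,
    with weights based on how well a query matches each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity between every query and every key
    weights = softmax(scores, axis=-1)   # turn similarities into attention weights
    return weights @ V                   # blend the values using those weights

# Toy example: 3 tokens, each represented by a 4-dimensional vector
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V))
```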
B
BERT (Bidirectional Encoder Representations from Transformers)
A transformer-based model developed by Google that reads text in both directions (bidirectionally) to better understand context. BERT is primarily used for understanding text rather than generating it.
Bias
Systematic errors in AI output that reflect societal prejudices or statistical imbalances in training data. Bias can cause models to produce unfair or stereotyped responses for certain groups or topics.
C
Context Window
The maximum number of tokens (words or word pieces) an AI model can consider at once when processing text. A larger context window allows the model to "remember" more of the conversation or document it's working with.
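One practical consequence: when a conversation grows past the context window, older tokens have to be dropped or summarized. The sketch below shows a naive truncation strategy over a made-up token list; real systems often use more sophisticated approaches.

```python
def fit_to_context(tokens, max_tokens=8):
    """Keep only the most recent tokens if the list exceeds the context window."""
    if len(tokens) <= max_tokens:
        return tokens
    return tokens[-max_tokens:]  # drop the oldest tokens first

history = ["Hello", ",", " how", " are", " you", "?", " I", " am", " fine", "."]
print(fit_to_context(history))  # only the last 8 tokens survive
```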
Corpus
A large collection of texts used to train language models. The quality, diversity, and size of the corpus significantly affect what the model learns and how it performs.
D
Deep Learning
A subset of machine learning that uses neural networks with many layers (hence "deep") to analyze patterns in data. Deep learning powers most modern AI systems, including LLMs.
E
Embedding
A technique that converts words or tokens into numerical vectors (lists of numbers) that capture their meaning. Words with similar meanings have similar embeddings, enabling the model to understand relationships between concepts.
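A toy illustration of the idea: each token id is looked up in a table of vectors. Here the vectors are random placeholders with made-up sizes; a real model learns them during training and uses far more dimensions.

```python
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2}               # toy vocabulary: token -> id
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), 4))  # one 4-dimensional vector per token

def embed(tokens):
    """Look up the vector for each token id."""
    ids = [vocab[t] for t in tokens]
    return embedding_matrix[ids]

print(embed(["the", "cat", "sat"]))  # a (3, 4) array: one vector per token
```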
Epoch
One complete pass through the entire training dataset during the learning process. Models typically require multiple epochs to learn effectively.
F
Fine-tuning
The process of further training a pre-trained model on a specific dataset to adapt it for particular tasks or domains. Fine-tuning helps models perform better on specialized tasks without requiring full retraining.
Feed-Forward Network
A component of transformer models that processes each position's data independently after the attention mechanism has been applied. It helps the model make sense of the relationships identified by the attention mechanism.
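A minimal sketch of the two-layer feed-forward block found in many transformers, with made-up sizes and random weights. Real models learn these weights, and the hidden size is typically several times the model dimension.

```python
import numpy as np

def gelu(x):
    # A common activation in transformer feed-forward blocks (tanh approximation).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, W1, b1, W2, b2):
    """Expand each position to a larger hidden size, apply a nonlinearity,
    then project back down. Each position is processed independently."""
    return gelu(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
d_model, d_hidden = 4, 16                      # toy sizes
x  = rng.normal(size=(3, d_model))             # 3 token positions
W1 = rng.normal(size=(d_model, d_hidden)); b1 = np.zeros(d_hidden)
W2 = rng.normal(size=(d_hidden, d_model)); b2 = np.zeros(d_model)
print(feed_forward(x, W1, b1, W2, b2).shape)   # (3, 4): same shape as the input
```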
G
GPT (Generative Pre-trained Transformer)
A family of transformer-based language models developed by OpenAI, designed to generate human-like text. Each new version (GPT-3, GPT-4, etc.) has increased in size and capabilities.
Gradient Descent
An optimization algorithm used to train neural networks by gradually adjusting the model's parameters to reduce errors in its predictions.
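A tiny, self-contained example of the idea: fitting a single parameter to made-up data by nudging it downhill a little each step. Real training applies the same principle to billions of parameters, with gradients computed automatically.

```python
# Fit w so that w * x approximates y, by following the gradient of the squared error.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]   # made-up (x, y) pairs, roughly y = 2x
w = 0.0
learning_rate = 0.05

for epoch in range(100):                      # one epoch = one full pass over the data
    for x, y in data:
        error = w * x - y
        gradient = 2 * error * x              # derivative of (w*x - y)^2 with respect to w
        w -= learning_rate * gradient         # step opposite the gradient to reduce the error

print(round(w, 3))  # close to 2.0
```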
H
Hallucination
When an AI generates information that seems plausible but is factually incorrect or fabricated. Hallucinations occur because models predict plausible text patterns rather than retrieving verified facts.
I
Inference
The process of using a trained AI model to make predictions or generate outputs based on new inputs. In LLMs, inference is when the model generates text in response to prompts.
L
LLM (Large Language Model)
A type of AI model trained on vast amounts of text data to understand and generate human language. LLMs like GPT, Claude, and LLaMA use transformer architectures and contain billions of parameters.
Layer Normalization
A technique used in transformers to stabilize and standardize data as it passes through layers, helping the model train more efficiently.
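A minimal sketch of layer normalization over the feature dimension, using toy values. Real implementations learn the scale (gamma) and shift (beta) parameters during training.

```python
import numpy as np

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each position's features to zero mean and unit variance,
    then apply a scale (gamma) and shift (beta)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
print(layer_norm(x))  # both rows end up on the same scale
```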
M
Machine Learning
A subset of AI that enables systems to learn patterns from data and improve their performance without explicit programming. LLMs are a type of machine learning system.
Multimodal Model
An AI system that can process and generate multiple types of data, such as text, images, audio, or video. Examples include GPT-4V and Claude Opus, which can understand both text and images.
N
Neural Network
A computing system inspired by the structure of the human brain, consisting of interconnected nodes (neurons) organized in layers. Neural networks form the foundation of modern AI systems.
O
Overfitting
When a model learns the training data too well, including its noise and peculiarities, causing it to perform poorly on new, unseen data. It's like memorizing answers instead of understanding concepts.
P
Parameter
A variable within a neural network that the model adjusts during training. LLMs have billions of parameters (weights and biases) that determine how they process and generate text.
Positional Encoding
A technique used in transformers to give the model information about where words appear in a sequence, helping it understand the order and structure of language.
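As an illustration, here is a sketch of the sinusoidal positional encoding from the original transformer paper: each position gets a distinctive pattern of sine and cosine values, which is added to its token embedding. The sizes below are toy values.

```python
import numpy as np

def sinusoidal_positional_encoding(num_positions, d_model):
    """Build a (num_positions, d_model) table: even columns use sine, odd columns
    use cosine, at frequencies that decrease across the dimension."""
    positions = np.arange(num_positions)[:, None]      # (num_positions, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # even dimension indices
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(sinusoidal_positional_encoding(num_positions=4, d_model=8).round(2))
```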
Prompt
The input text provided to an AI model to generate a response. Effective prompts clearly communicate what you want and provide necessary context for the model to understand the task.
R
RLHF (Reinforcement Learning from Human Feedback)
A training method where human evaluators rate AI responses, and these ratings help the model learn which outputs are preferred. RLHF helps align models with human values and preferences.
S
Self-Attention
A key mechanism in transformers that allows the model to consider relationships between all words in a text when processing each individual word. This helps capture context and meaning more effectively.
Softmax
A mathematical function used in AI models to convert raw scores into probabilities that sum to 1. In LLMs, softmax is used when calculating the probability distribution for the next token in a sequence.
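A minimal NumPy version of the function, applied to some made-up raw scores (logits):

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that are positive and sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])      # made-up scores for three candidate tokens
probs = softmax(logits)
print(probs, probs.sum())               # roughly [0.66 0.24 0.10], sums to 1.0
```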
T
Temperature
A parameter that controls randomness in AI text generation. Higher temperature (e.g., 0.8) produces more creative and varied responses, while lower temperature (e.g., 0.2) yields more focused, deterministic outputs.
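A sketch of how temperature is typically applied: the raw scores are divided by the temperature before the softmax, which sharpens or flattens the resulting probability distribution. The logits below are made up.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])          # made-up scores for three candidate tokens

for temperature in (0.2, 0.8, 2.0):
    probs = softmax(logits / temperature)   # divide by temperature before softmax
    print(temperature, probs.round(2))
# Low temperature concentrates probability on the top token;
# high temperature spreads it out, making sampling more varied.
```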
Token
The basic unit of text that language models process. Tokens can be words, parts of words, or punctuation marks. For example, "tokenization" might be split into "token" and "ization" as separate tokens.
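A toy illustration of subword tokenization with a tiny made-up vocabulary, using greedy longest-match. Real tokenizers are learned from data and have vocabularies of tens of thousands of pieces.

```python
def tokenize(text, vocab):
    """Greedily match the longest vocabulary piece at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest piece first
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(text[i])          # unknown character becomes its own token
            i += 1
    return tokens

vocab = {"token", "ization", "is", " ", "fun"}
print(tokenize("tokenization is fun", vocab))
# ['token', 'ization', ' ', 'is', ' ', 'fun']
```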
Transformer
A neural network architecture introduced in 2017 that revolutionized natural language processing. Transformers use self-attention mechanisms to process text in parallel rather than sequentially, allowing for more efficient training on larger datasets.
Transfer Learning
A machine learning technique where a model trained on one task is repurposed for a related task. Most modern LLMs use transfer learning by pre-training on general text and then fine-tuning for specific applications.
V
Vector
A mathematical representation of data as lists of numbers. In LLMs, words and concepts are represented as vectors in a high-dimensional space, where similar concepts have vectors that are close to each other.
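A small sketch of measuring how close two vectors are using cosine similarity. The vectors below are hand-made stand-ins for learned representations.

```python
import numpy as np

def cosine_similarity(a, b):
    """Close to 1 means the vectors point in nearly the same direction;
    values near 0 mean the directions are unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made vectors standing in for learned representations of concepts.
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.85, 0.82, 0.15])
apple = np.array([0.10, 0.20, 0.90])

print(cosine_similarity(king, queen))  # close to 1: related concepts
print(cosine_similarity(king, apple))  # much lower: unrelated concepts
```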
Z
Zero-Shot Learning
The ability of AI models to perform tasks they weren't explicitly trained on. Modern LLMs can answer questions or follow instructions without specific examples, using their general knowledge of language and concepts.