Chapter 2: How Does Generative AI Work?

Chapter Overview

In Chapter 1, we discovered that generative AI is not just about classification or data analysis; rather, it creates original text and ideas based on patterns it has learned from existing information. We explored how these tools can streamline tasks like contract review and legal research, saving time and energy for legal professionals. We also discussed their limitations, such as the tendency to provide confident-sounding but erroneous answers, and the importance of using them responsibly.

In this chapter, we peel back the curtain on how these generative AI tools actually work. We will go step by step through the technical components, but in a way that remains accessible to a general audience (think: high school–level explanations). We will focus on the key ideas behind concepts like neural networks, large language models, the transformer architecture and its “attention” mechanism, training through gradient descent, embeddings, and reinforcement learning from human feedback.

By the end, you will be able to understand (1) how these systems process language, (2) what makes them both powerful and fallible, and (3) how to begin integrating them thoughtfully into your legal practice. We will also lay the groundwork for our next chapter, which focuses on specific AI tools, including ChatGPT and Claude, to better understand their practical use cases.


From Conceptual Understanding to Technical Foundations

We often hear people talk about AI as if it were magic: “It just knows how to write a motion or contract.” But as future legal professionals, it’s essential to develop AI literacy: the ability to look beyond the “black box” mystique and grasp the essentials of how AI systems work. This understanding, even at a high level, will help you:

Assess AI tools critically

Use AI responsibly

Communicate effectively with technical teams

Leverage AI’s strengths

Throughout this chapter, we will keep the explanations as simple as possible, sometimes using analogies and everyday language. For those wanting a deeper dive, look for the optional “Callouts and Key Terms” or “Practice Pointers” that give additional detail.


Defining Artificial Intelligence

Artificial Intelligence (AI) is a broad field focused on creating computer programs that can perform tasks that normally require some level of human intelligence. These tasks range from recognizing speech or images to writing entire legal documents. While AI can sometimes seem magical, it is ultimately about pattern recognition: software that detects structures in data and uses those structures to make predictions or decisions.

Machine Learning: A Subset of AI

Within AI, one of the most important and fast-growing areas is machine learning (ML). Rather than manually programming rules for every scenario (which is nearly impossible for complex tasks like natural language understanding), ML systems learn automatically from examples.

Types of Machine Learning

Supervised Learning: the system learns from examples that come paired with correct answers (labels), then predicts the answer for new examples.

Unsupervised Learning: the system looks for structure in unlabeled data on its own, such as grouping similar documents together.

Reinforcement Learning (RL): the system learns by trial and error, receiving rewards for good actions and penalties for bad ones (we return to this later in the chapter).

Deep Learning: Where Neural Networks Come In

Deep Learning (DL) is a special branch of machine learning that involves neural networks with multiple layers. Each layer captures increasingly complex patterns. Think of it as a multi-layered structure that can start by recognizing letters, then words, then sentences, and so forth. When you hear about breakthroughs in image recognition, speech-to-text, or language generation, there’s a good chance it’s powered by deep learning.

Reference Note

For an in-depth, technical view of deep learning and neural networks, you might look at Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016). They break down how these networks are constructed and trained to handle complex tasks.


General vs. Narrow AI: What’s the Difference?

When discussing AI, it’s helpful to distinguish between two visions:

Artificial General Intelligence (AGI) or Strong AI: a hypothetical system that could learn, reason, and adapt across virtually any task a human can. No such system exists today.

Narrow AI or Weak AI: a system built for a specific range of tasks, such as generating text or recognizing images. Every AI tool available today, including large language models, falls into this category.

Practice Pointer

Don’t be fooled by how “intelligent” a large language model seems. It’s still considered narrow AI. It can do many language-related tasks, but it doesn’t “understand” in the same way a human does, nor can it pivot to solve unrelated tasks like robotics (unless it is specifically designed or fine-tuned to do so).


What Is a Neural Network?

A neural network is a computational model inspired by the structure of the human brain (though it’s much simpler, so the analogy is limited). In our brains, billions of neurons pass electrical signals to each other to interpret the world and drive our actions. An artificial neural network mimics this idea with layers of artificial “neurons.”

Perceptron: The Simplest Building Block

The most basic form of an artificial neuron is called a perceptron. Think of a perceptron as a tiny decision-maker that receives inputs (numbers), multiplies them by some “importance factors” (called weights), sums them up, and passes them through a simple rule. If the total is above a certain threshold, the perceptron outputs 1 (like “Yes”); if not, it outputs 0 (like “No”).

Analogy: Brain Neuron

A single human neuron fires an electrical signal if it receives enough of the right inputs from other neurons. Similarly, a perceptron “fires” if the weighted inputs exceed a certain threshold.
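To make this concrete, here is a minimal sketch in Python of the perceptron rule described above. Everything in it is made up for illustration: the inputs, the weights, and the threshold are toy numbers, not values from any real system.

    # A tiny perceptron: weighted sum of inputs, compared against a threshold.
    def perceptron(inputs, weights, threshold):
        total = sum(x * w for x, w in zip(inputs, weights))  # multiply and sum
        return 1 if total > threshold else 0                 # "fire" only above the threshold

    # Hypothetical decision: should this clause be flagged for review?
    # Inputs: [mentions liability?, mentions a deadline?, unusually long sentence?]
    inputs = [1, 0, 1]
    weights = [0.6, 0.2, 0.3]   # "importance factors" a real network would learn
    print(perceptron(inputs, weights, threshold=0.5))   # prints 1, i.e., "Yes, flag it"

The interesting part is not this single decision but what happens when you connect many such units and let training choose the weights, which is what the next section describes.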

How Do These Networks Learn?

When you stack thousands or even millions of perceptrons into multiple layers, you get a deep neural network. These networks learn patterns from lots of data. During training, the network repeats a simple loop:

  1. The network receives an example (like a sentence or an image).

  2. It makes a prediction (which could be the next word in a sentence).

  3. It checks how close that prediction was to the correct label or outcome.

  4. It adjusts the weights and biases so next time, the prediction gets closer to correct.

Over many iterations, the network finds patterns that allow it to make increasingly accurate predictions.
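Here is a minimal Python sketch of that loop, assuming an absurdly simplified “network” with a single weight that must learn the made-up rule y = 3 × x. Real networks repeat the same idea with billions of weights and far richer data.

    # Toy training data: (input, correct answer) pairs following y = 3 * x.
    examples = [(1, 3), (2, 6), (3, 9), (4, 12)]
    weight = 0.0            # the model starts out knowing nothing
    learning_rate = 0.01    # how big each adjustment step is

    for epoch in range(200):                       # many passes over the data
        for x, target in examples:                 # step 1: receive an example
            prediction = weight * x                # step 2: make a prediction
            error = prediction - target            # step 3: compare to the correct answer
            weight -= learning_rate * error * x    # step 4: nudge the weight toward correct

    print(round(weight, 3))   # ends up very close to 3.0

No single adjustment is dramatic; thousands of small corrections add up to an accurate model.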


Large Language Models

Now, let’s zoom in on large language models (LLMs), the AI systems that power tools like ChatGPT and Claude. As the name suggests, they are big neural networks designed specifically for language tasks. Their size is often described in terms of parameters; think of each parameter as a dial that the training process fine-tunes to recognize linguistic patterns.

Scale: Why Are They Called “Large”?

The idea is that more parameters often give a model the ability to capture more nuanced patterns. If a model is “too small,” it might not learn the richness and variety of human language; as models grow larger, they can capture far more subtle context. This is one reason advanced LLMs can produce surprisingly coherent, human-like text.

Example and Scenario

Key Term Callout: “Parameter”

Each parameter in an LLM is like a tiny dial the model adjusts during training to reduce errors. Examples of parameters include the model's weights and biases. More parameters mean more dials, and typically, more capacity to represent complex language patterns.
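A rough back-of-the-envelope calculation shows why parameter counts balloon so quickly. The numbers below are invented for illustration; they describe a single fully connected layer, not any particular model.

    # One layer connecting 1,000 inputs to 1,000 outputs.
    inputs, outputs = 1_000, 1_000
    weights = inputs * outputs   # one weight for every input-output connection
    biases = outputs             # one bias per output neuron
    print(weights + biases)      # 1,001,000 parameters in this single layer

Stack dozens of much wider layers on top of each other and the total quickly reaches the billions you hear quoted for modern LLMs.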


The Transformer Architecture: “Attention Is All You Need”

A huge breakthrough in language processing came in 2017 with a paper titled “Attention Is All You Need,” which introduced the transformer architecture. Before transformers, AI models that handled text typically processed words in order, one after another, using recurrent neural networks (RNNs). Processing words sequentially seems logical, but the meaning of a word often depends on its relationships with other words elsewhere in the sentence. Transformers changed the game by allowing the model to look at all words in a sentence at the same time and figure out which ones are most important to each other.

What Is “Attention”?

“Attention,” in this context, means the ability of the model to weigh how relevant one word (or part of a sentence) is to another word or part of a sentence. The model doesn’t just read from left to right. Instead, it learns that, for example, in the sentence “The lawyer who was very tired argued the case,” the word “lawyer” is strongly connected to “argued,” while “tired” modifies “lawyer.” This helps the model keep track of context over long sentences.

Analogy: Spotlight on Stage

Picture multiple actors on stage delivering lines. “Attention” is like a movable spotlight that highlights the most relevant actor(s) at any moment, allowing you (or the AI) to focus on the key interactions.

Detailed Example:

“When Jane realized she had forgotten her bag, she rushed back to the store, where Mark had hidden it behind the counter so no one else would take it.”

In this single sentence, the correct interpretation of words like “she,” “her,” “Mark,” and “it” depends on the relationships among them:

  1. Who is “she”? It points back to “Jane.”

  2. Whose bag is it? The word “her” (in “her bag”) also refers to Jane.

  3. What is “it”? “It” refers to the same bag mentioned earlier.

  4. What did Mark hide, and why? Mark’s action of hiding the bag behind the counter so that “no one else would take it” clarifies both his role and the function of “it.”

A language model equipped with self-attention (and cross-attention in multi-sentence contexts) can examine these words and their references simultaneously. Rather than simply reading each word in a linear fashion, the model creates a “map” of relationships among tokens. This allows it to recognize that “her bag” and “it” both point to the same object, that “she” is the same person as “Jane,” and that “Mark” is a different individual performing a distinct action. These interrelationships seem like common sense to us, but they can easily trip up less sophisticated language models. By tracking these interdependencies, the model demonstrates a form of “contextual understanding,” which is pivotal for interpreting meaning accurately in complex sentences.
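To see the arithmetic behind “paying attention,” here is a minimal Python sketch using NumPy. The word vectors are tiny, made-up numbers chosen only to illustrate the idea; a real transformer learns vectors with thousands of dimensions and computes attention in every layer.

    import numpy as np

    words = ["the", "lawyer", "argued", "the", "case"]
    vectors = np.array([
        [0.1, 0.0, 0.1],   # the
        [0.9, 0.8, 0.1],   # lawyer
        [0.8, 0.9, 0.2],   # argued
        [0.1, 0.0, 0.1],   # the
        [0.7, 0.2, 0.9],   # case
    ])

    query = vectors[2]                                 # ask: what is relevant to "argued"?
    scores = vectors @ query                           # similarity of each word to "argued"
    weights = np.exp(scores) / np.exp(scores).sum()    # softmax: weights sum to 1

    for word, w in zip(words, weights):
        print(f"{word:8s} {w:.2f}")                    # "lawyer" gets far more weight than "the"

The content words (“lawyer,” “argued,” “case”) end up with most of the weight, while the filler word “the” gets very little, which is exactly the behavior the spotlight analogy describes.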

Why Transformers Changed the Game

  1. Parallel Processing: Instead of reading text sequentially, transformers analyze entire sentences (or paragraphs) in parallel, which is faster and more efficient.

  2. Better Long-Range Context: The model can connect words or phrases that are far apart in the text. In legal documents, context from the beginning of a paragraph can be critical at the end.

  3. Scalability: Transformers scale very well to large amounts of data and large network sizes (hence “large language models”).

This transformer-based approach is why ChatGPT and similar tools can produce well-structured, contextually relevant paragraphs. They’re essentially experts at “paying attention” to the right parts of a sentence.


Interpolation vs. Extrapolation

An important limitation of LLMs (and AI in general) is the difference between interpolation and extrapolation. Interpolation means filling in an answer that lies within the range of patterns the model has already seen; extrapolation means reaching beyond that range into genuinely new territory.

LLMs are generally good at interpolation because they are experts at spotting and replicating patterns they’ve seen. But they are not so good at true extrapolation: if you ask them something far outside their training data, they may give nonsensical or made-up answers. This is one reason LLMs tend to hallucinate; rather than drawing on relevant experience, they are, in a sense, making it up. This is a critical point in legal settings, where a unique or unprecedented scenario might arise and the model could fail to respond accurately.

Practice Pointer

Always remember that an LLM’s knowledge is bounded by what it has seen. If your legal scenario is highly novel or cutting-edge, rely more on human legal expertise and research rather than a model’s guesses.


Weights, Biases, and Parameters

Let’s circle back to some foundational concepts in neural networks:

Weights: numbers that determine how strongly each input influences an artificial neuron’s output.

Biases: numbers added to a neuron’s weighted sum, shifting how easily it activates.

Parameters: the collective term for all the weights and biases a model learns during training.

When someone says a model has “billions of parameters,” they mean billions of these numerical weights and biases. Training is the process of adjusting all these parameters so the model performs better on the task at hand.

Example

In a simplified sense, if we have an input word “contract,” the network might use a certain weight to link it strongly with “legal obligations” in the next layer. If that weight is too high or too low, the model might overemphasize or underemphasize certain words in its predictions.


What Is Gradient Descent?

Gradient descent is the method most commonly used to train neural networks. Think of it as a systematic way of tuning weights and biases to reduce errors.

Analogy: Climbing Down a Hill

Imagine standing on a foggy hillside trying to reach the lowest point in the valley. You can’t see far, so you test small steps in different directions. If one step moves you downward, you keep going that way. If you go up, you backtrack. Over time, you (hopefully) reach the bottom.

In the same way, a neural network adjusts its parameters in tiny increments, guided by how much these adjustments reduce (or increase) the overall error on training examples.

The key to gradient descent is having a “loss function,” which measures how far off the model’s predictions are from the desired result. Each training step tries to minimize this loss. The ideal would be for training to result in zero loss, meaning the model’s predictions perfectly match the target outputs. While this can happen (especially for very simple datasets or overly flexible models), achieving literally zero loss is relatively rare.

Another way to think of it is to flip the numbers around: the model wants to score 100% and get everything right. It's graded at every training step so it knows how far off it is from perfection. If it scores 90%, it tries to adjust its strategy to get 10% better. Currently, getting 90%+ for LLMs is, like it is for us humans, pretty good!
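Here is a minimal Python sketch of gradient descent on a made-up, bowl-shaped loss function with a single parameter. It is the “walking downhill” analogy in code, nothing more.

    # A toy loss function whose lowest point (zero loss) sits at p = 5.
    def loss(p):
        return (p - 5) ** 2

    def slope(p):
        return 2 * (p - 5)     # the gradient: which way is uphill, and how steep

    p = 0.0                    # initial guess, far from the bottom of the valley
    learning_rate = 0.1

    for step in range(50):
        p -= learning_rate * slope(p)   # take a small step downhill

    print(round(p, 4), round(loss(p), 8))   # p is now very close to 5, and the loss is near zero

Real training does exactly this, except the “hill” has billions of dimensions (one per parameter) and the slope is estimated from batches of training examples.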

Call Out: The Problem of Overfitting

For Humans: Think of overfitting like a student who memorizes every word in a textbook but never truly learns the underlying concepts. They might ace a practice test because it uses the exact same examples, but when given new questions, they struggle to apply their knowledge.

For AI: A model that is “overfitted” has learned the training data too well, picking up not just meaningful patterns but also noise and irrelevant details. As a result, it performs impressively on the examples it was trained on, yet falls short when it encounters new, unseen data.

Why It Matters: In the context of law (and any real-world application), an overfitted AI tool can give misleading or incorrect results when faced with novel scenarios. Balancing how much a model learns from training data without memorizing every quirk is key to building reliable and trustworthy AI systems.
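The following Python sketch (using NumPy) shows overfitting in miniature. The data, the noise level, and the choice of a degree-7 polynomial are all arbitrary; the point is only the contrast between memorizing the training points and capturing the underlying trend.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy training data: a simple underlying trend (y = 2x) plus a little noise.
    x_train = np.linspace(0, 1, 8)
    y_train = 2 * x_train + rng.normal(0, 0.1, size=x_train.size)

    overfit = np.polyfit(x_train, y_train, deg=7)   # flexible enough to hit every training point
    simple = np.polyfit(x_train, y_train, deg=1)    # a plain straight line

    # On the training data, the flexible model looks perfect...
    print(np.abs(np.polyval(overfit, x_train) - y_train).max())   # essentially zero

    # ...but on a new, unseen input the picture flips.
    x_new = 1.5   # the true value of the underlying trend here is 3.0
    print(np.polyval(overfit, x_new))   # typically lands far from 3.0
    print(np.polyval(simple, x_new))    # stays close to 3.0

The “student who memorized the textbook” aces the questions it has already seen and stumbles on the new one.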


Vectors and Embeddings

A fundamental idea in language models is representing words (and sometimes sentences or entire documents) as vectors: lists of numbers, where each number (or “dimension”) captures some aspect of meaning or an attribute.

A Simple Analogy

Suppose you want to describe a friend. You might list attributes such as how tall they are, how outgoing they are, what kind of humor they have, and what topics they love to talk about.

Each of these attributes is one "dimension" in a vector. If you know enough attributes (dimensions), you can uniquely describe your friend compared to everyone else.

Word Embeddings

In language models, words are also turned into these multi-dimensional vectors called embeddings. If two words frequently appear in similar contexts (like “contract” and “agreement”), their embeddings will be similar. This helps the model “understand” relationships between words in a numeric way.

Example
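Here is a minimal Python sketch of the idea, using three-dimensional vectors invented purely for illustration; real embeddings have hundreds or thousands of dimensions and are learned from data rather than written by hand.

    import numpy as np

    embeddings = {
        "contract":  np.array([0.90, 0.80, 0.10]),
        "agreement": np.array([0.85, 0.75, 0.20]),
        "banana":    np.array([0.10, 0.20, 0.90]),
    }

    def similarity(a, b):
        # Cosine similarity: close to 1 means "pointing the same way" (similar meaning).
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    print(similarity(embeddings["contract"], embeddings["agreement"]))  # high, about 0.99
    print(similarity(embeddings["contract"], embeddings["banana"]))     # much lower, about 0.30

Because “contract” and “agreement” point in nearly the same direction, the model treats them as related, which is the numeric sense in which it “understands” similar meaning.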

Practice Pointer

Embeddings also explain why models might get confused between words that show up in similar contexts. If “defendant” and “respondent” appear in similar environments, a model might occasionally mix them up.


Reinforcement Learning

We touched on reinforcement learning (RL) earlier. Instead of just training on fixed examples, RL has the model interact with an environment. It receives rewards for good actions and penalties for bad ones.
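As a minimal illustration of the reward-and-penalty idea, here is a Python sketch of an agent choosing between two made-up actions, where action “b” secretly pays off more often. Real RL systems, and RLHF in particular, are far more elaborate, but the core loop of act, receive feedback, and adjust is the same.

    import random

    payoff = {"a": 0.3, "b": 0.7}   # hypothetical chance each action earns a reward
    value = {"a": 0.0, "b": 0.0}    # the agent's running estimate of each action's worth
    learning_rate = 0.1

    for step in range(1000):
        # Usually pick whichever action currently looks best, but explore occasionally.
        if random.random() < 0.1:
            action = random.choice(["a", "b"])
        else:
            action = max(value, key=value.get)
        reward = 1.0 if random.random() < payoff[action] else 0.0   # reward or penalty
        value[action] += learning_rate * (reward - value[action])   # nudge the estimate

    print(value)   # the estimate for "b" ends up higher, so the agent learns to prefer "b"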

RL from Human Feedback (RLHF)

ChatGPT famously uses a version of RL called Reinforcement Learning from Human Feedback (RLHF). Humans rate the model’s responses, effectively telling it which answers are better or worse. The model uses these ratings to adjust its parameters. Over multiple rounds, it gets better at producing the kind of answers humans find helpful.

Why Did RLHF Make ChatGPT So Good?

Regular language models might produce correct but confusing answers, or answers that are correct in form but irrelevant in content. RLHF aligns the model with human preferences, so it tends to produce responses that are both accurate (most of the time) and helpful in tone.

Janelle Shane’s Quote

“The danger of AI is not that it’s too smart but that it’s not smart enough.” – from You Look Like a Thing and I Love You (2019)

This speaks to the fact that AI can seem brilliant in one moment and then make a glaringly obvious mistake the next. Reinforcement learning with human feedback partially helps, but it’s not a cure-all.


The Scaling Hypothesis

Compute + Data = Intelligence

The scaling hypothesis in AI states that as we increase the size of our models (more parameters), provide more data, and use more powerful computers, we will continue to see improvements in AI capabilities. This is somewhat analogous to Moore’s Law, which for decades accurately predicted exponential increases in computing power.

However, bigger and faster does not guarantee less bias or more accuracy. If the training data is flawed or incomplete, the model’s output will reflect those flaws.

Practice Pointer: Bigger Isn't Always Better

Don’t assume that a newer, bigger model is always the best choice for every legal use case. Sometimes a smaller, more specialized model that has been carefully fine-tuned on relevant legal data can outperform a huge model that lacks domain-specific training.


Garbage In–Garbage Out: The Importance of Quality Data

You’ve probably heard the expression “garbage in, garbage out” (GIGO). It highlights that AI models are only as good as the data they’re trained on. Poor or biased data can lead to poor or biased results.

Janelle Shane’s Example: Rulers and Sheep

In You Look Like a Thing and I Love You, Janelle Shane provides vivid anecdotes about how AI isn't always as smart as we think it is:

  1. Ruler on an X-ray: A machine learning system was supposed to detect cancer in X-rays. Surprisingly, it learned to spot the ruler often placed next to suspicious areas for measurement, confusing the presence of the ruler with the presence of cancer.

  2. Sheep in a Field: Another system learned to recognize green grass as a signal for “sheep,” because in most training pictures, sheep were standing on green grass. The AI concluded that wherever there was a field of green grass, there must be sheep, even if no actual sheep were visible.

These stories underscore that AI can latch onto the wrong patterns if the data isn’t carefully curated.

Implications for Lawyers

Example and Scenario

If a contract review AI was mostly trained on consumer contracts from the 1990s, it may not handle new data privacy clauses introduced by modern regulations like the GDPR or CCPA. This might lead to incomplete or incorrect drafting suggestions. The scenario may seem obvious or unlikely, but humans still tend to treat AI as a magical oracle, assuming that if it knows one thing well, it must know everything well.


Are We Running Out of Data?

One concern in AI research is that we might be approaching a point where publicly available, high-quality text is nearly all used up. Think of data like oil: there’s a finite supply, and once we’ve extracted it, it becomes harder to find new sources.

  1. Finite Online Text: Since LLMs train on huge swaths of the internet, at some point, they’ve seen most of the high-quality text available.

  2. Data Overlap: Many data sets repeat the same texts (e.g., Wikipedia is reused often).

  3. Synthetic Data: One possible solution is to have AI generate new training data. However, if it’s based on AI’s own output, you can end up in a feedback loop.

Callout: Synthetic Data

Synthetic data is artificially generated content used to expand or diversify a training set. For legal AI, we might create hypothetical case scenarios or fake but realistic contracts. However, synthetic data can introduce new biases or inaccuracies if not carefully validated.


A Hands-On Experiment

It’s easy to talk in abstract terms about “attention” and “vectors.” Let’s do a short exercise using the Transformer Explainer tool at https://poloclub.github.io/transformer-explainer/. This interactive site visualizes how a transformer-based model (like the ones used in LLMs) predicts the next word.

Step-by-Step Guide

  1. Open the site in your browser.

  2. Type a short sentence like “The lawyer presented the argument before the judge.”

  3. Observe the Attention Weights: The tool shows which words in the sentence have the strongest influence on predicting the next word.

  4. Experiment: Try variations like “The exhausted lawyer presented the argument…” and see how “exhausted” changes the attention patterns.

Example: Context Matters

You might notice that the word “exhausted” affects how the model weighs the context around “lawyer.” This reveals why a transformer can keep track of context in a more nuanced way than older models.

By experimenting, you’re seeing a real demonstration of how the model decides which words matter most. This capacity for “attention” is at the heart of why transformers are so good at generating text.
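If you want to go a step further on your own machine, the sketch below shows the same next-word idea programmatically. It assumes you have Python with the Hugging Face transformers library and PyTorch installed, and it uses the small, freely available gpt2 model purely for illustration (it is far less capable than the commercial tools discussed in this book).

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    text = "The lawyer presented the argument before the"
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits           # raw scores over the whole vocabulary

    probs = torch.softmax(logits[0, -1], dim=-1)  # probabilities for the next word
    top = torch.topk(probs, 5)                    # the five most likely continuations
    for p, idx in zip(top.values, top.indices):
        print(repr(tokenizer.decode(int(idx))), round(float(p), 3))

The output is a handful of plausible continuations with their probabilities; the exact list depends on the model, but it is the same prediction step the Transformer Explainer visualizes.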


Chapter Recap

We’ve covered a lot of ground in this chapter, moving from a basic notion of AI to the technical underpinnings of generative AI. Here are the key takeaways:

  1. AI systems learn patterns from data rather than following hand-coded rules, and every tool available today is narrow AI.

  2. Large language models are transformer-based neural networks whose “attention” mechanism lets them weigh context across an entire passage.

  3. Training adjusts billions of weights and biases through gradient descent, guided by a loss function, and can go wrong through overfitting.

  4. A model’s output is only as good as its training data, and interpolation comes far more naturally to it than extrapolation.

Practice Pointer

Before proceeding, reflect on the core question: How might these concepts affect the way you validate AI-generated legal documents? Keep in mind that while AI can save time, it’s crucial to know how these models reach their conclusions and where they might slip up.


Final Thoughts

Generative AI, especially large language models powered by the transformer architecture, represents a significant leap in how we create, analyze, and interpret text. For legal professionals, these tools hold the promise of faster, more efficient workflows, from document drafting to case law summarization. Yet they also come with caveats: they can generate errors or biased language, they may not handle entirely novel scenarios gracefully, and they remain reliant on the data they’re trained on.

Moving forward, keep these lessons in mind:

  1. AI is neither a magical oracle nor infallible; human oversight remains essential.

  2. Better data yields better outputs; quality and representativeness matter.

  3. Scaling AI continues to expand possibilities, but size alone doesn’t solve all problems.

  4. Transparency and ethical considerations are crucial for legal professionals who adopt these tools.


What's Next?

In Chapter 3, we’ll focus on the real-world tools that operationalize these concepts, including popular AI tools like ChatGPT and Claude, and other emerging platforms, diving into their strengths, weaknesses, and how they fit into the legal workflow. We’ll talk about what you can realistically expect these tools to do for you in a law office environment, how to integrate them responsibly, and what pitfalls to watch out for. We will also examine practical use cases, like drafting briefs, summarizing case law, and more, and explore the growing number of third-party AI tools tailored for legal tasks.


References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.

Shane, J. (2019). You Look Like a Thing and I Love You: How Artificial Intelligence Works and Why It’s Making the World a Weirder Place. Voracious / Little, Brown and Company.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). “Attention Is All You Need.” Advances in Neural Information Processing Systems 30.

Optional Deeper Dive (For the Inquisitive)

If you’re intrigued by any particular concept, consider exploring it in greater technical depth on your own time.

Understanding these deeper topics can help demystify the “secret sauce” behind AI, but for most legal applications, a high-level grasp of the basics is sufficient.