January 19, 2025 6:00 PM PST
These meeting notes cover the inner workings of Large Language Models (LLMs), including their architecture, training process, and the challenges faced in deploying them.
Presenter: Coach Cindy, Director of Data Science, Machine Learning and Frontend Engineering
Key Points
Generative AI Applications
- Building generative AI applications involves understanding the underlying technology of LLMs.
Neural Network Architecture
- LLMs are neural networks built from an input layer, hidden layers, and an output layer (see the sketch after this list).
- Models have evolved in size:
  - GPT-1: 117 million parameters
  - GPT-2: 1.5 billion parameters
  - GPT-3: 175 billion parameters
  - Claude: a smaller model with competitive performance
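A minimal NumPy sketch of the input → hidden → output structure described above; the layer sizes and random weights are purely illustrative and not taken from any model in the talk.

```python
import numpy as np

# Toy feed-forward pass: input layer -> hidden layer -> output layer.
# Sizes (8 inputs, 16 hidden units, 4 outputs) are arbitrary illustrations.
rng = np.random.default_rng(0)

x  = rng.normal(size=(1, 8))      # one input example with 8 features
W1 = rng.normal(size=(8, 16))     # weights connecting input layer to hidden layer
W2 = rng.normal(size=(16, 4))     # weights connecting hidden layer to output layer

hidden = np.maximum(0, x @ W1)    # ReLU activation makes the hidden layer non-linear
output = hidden @ W2              # raw output scores (logits)
print(output.shape)               # (1, 4)
```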
Characteristics of LLMs
- Powerful, expensive, and slow.
- Benchmarked on MMLU (Massive Multitask Language Understanding), which covers 57 subjects and compares models against a random-guessing baseline.
- On that benchmark, LLM performance is approaching expert level.
Technical Specifications
- Knowledge and problem-solving capabilities are significant but costly.
- Relies on GPUs for parallel computation, since operations over the parameters are largely independent and can run simultaneously.
- Data structures (see the sketch after this list):
  - 1D: Vector
  - 2D: Matrix
  - 3D and above: Tensor
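A short NumPy sketch of the three shapes listed above; the dimensions (768, 1024, 32) are hypothetical and chosen only to suggest typical roles.

```python
import numpy as np

vector = np.zeros(768)               # 1D: e.g. the embedding of a single token
matrix = np.zeros((1024, 768))       # 2D: e.g. the embeddings of every token in one sequence
tensor = np.zeros((32, 1024, 768))   # 3D and above: e.g. a batch of 32 such sequences

print(vector.ndim, matrix.ndim, tensor.ndim)   # 1 2 3
```

GPUs help here because operations over these arrays (element-wise math and matrix multiplications) can be computed in parallel.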
Model Training and Data Sources
- Training data includes (the colors refer to categories in the presented chart):
  - Green: web data (Common Crawl, Reddit, Stack Exchange, Wikipedia)
  - Blue: research papers and documents
  - Orange: cultural texts (e.g., the Bible, movies)
- Total training data size: 825 GB.
Tokenization Process
- Tokenization involves breaking text into words or segments.
- GPT-2 uses Byte Pair Encoding (BPE).
- The cost of building these models can be very high (e.g., 10.6 million for LLaMA).
- After tokenization, an embedding layer converts each token into a list of floating-point numbers (a vector), as sketched below.
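A sketch of both steps, assuming the Hugging Face transformers library is installed (it downloads the GPT-2 tokenizer files on first use); the embedding table below is random and purely illustrative, not GPT-2's real weights.

```python
import numpy as np
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")    # GPT-2's Byte Pair Encoding tokenizer

text = "Large Language Models are powerful."
token_ids = tokenizer.encode(text)                   # text -> integer token IDs
print(tokenizer.convert_ids_to_tokens(token_ids))    # the BPE pieces
print(token_ids)

# An embedding table then maps each token ID to a vector of floating-point numbers.
# GPT-2's vocabulary has 50,257 tokens; 768 dimensions matches its smallest variant.
embedding_table = np.random.default_rng(0).normal(size=(50257, 768))
token_vectors = embedding_table[token_ids]           # shape: (number_of_tokens, 768)
print(token_vectors.shape)
```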
Attention Mechanism
- Attention measures the relationship between pairs of tokens, incorporating positional information.
- Activation functions introduce non-linearity into the otherwise linear layer outputs.
- To generate text, the model samples the next token from the output probability distribution; the temperature setting controls how strongly it favors the highest-probability token (see the sketch below).
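A minimal NumPy sketch of scaled dot-product attention and of temperature scaling; it omits positional encodings, multiple heads, and learned projection weights, and all sizes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d = 5, 16                      # 5 tokens, 16-dimensional vectors (toy sizes)
Q = rng.normal(size=(seq_len, d))       # queries
K = rng.normal(size=(seq_len, d))       # keys
V = rng.normal(size=(seq_len, d))       # values

# Attention scores relate every pair of tokens; softmax turns them into weights
# that mix the value vectors for each position.
scores = Q @ K.T / np.sqrt(d)
weights = softmax(scores, axis=-1)      # each row sums to 1
attended = weights @ V                  # shape: (seq_len, d)

# Temperature rescales the next-token logits before sampling: a low temperature
# concentrates probability on the top token, a high one spreads it out.
logits = rng.normal(size=50257)         # pretend logits over a GPT-2-sized vocabulary
for temperature in (0.2, 1.0, 2.0):
    probs = softmax(logits / temperature)
    print(temperature, round(float(probs.max()), 4))   # probability of the most likely token
```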
Limitations of Pre-trained Models
- Pre-trained models have limitations, including:
  - Hallucinations
  - Domain shift
  - Task shift
- One mitigation is to be specific about the domain and the task in the prompt (see the example after this list).
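A hypothetical illustration of that advice: the second prompt names both the domain and the task explicitly, which narrows the space of likely completions.

```python
# Illustrative prompts only; neither comes from the talk.
vague_prompt = "Tell me about transformers."   # toys? power equipment? neural networks?

specific_prompt = (
    "You are a machine learning tutor.\n"                      # domain
    "Task: explain in three sentences how the attention "      # task
    "mechanism in transformer language models works."
)
```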
Resource Constraints
- Models may face resource constraints and lack access to private data.
Retrieval-Augmented Generation (RAG)
- RAG retrieves relevant documents (including private data) and adds them to the prompt; its effectiveness can be improved by changing the embedding model used for retrieval (see the sketch after this list).
- Fine-tuning, by contrast, trains the LLM further on high-quality data; the document database in RAG is often searched with a separate embedding model.
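A minimal sketch of the retrieval step under toy assumptions: the bag-of-words embed function below stands in for a real embedding model, and the documents and query are invented for illustration.

```python
import numpy as np

documents = [
    "Internal policy: expense reports are due on the 5th of each month.",
    "The attention mechanism relates pairs of tokens in a sequence.",
    "Fine-tuning continues training an LLM on high-quality, task-specific data.",
]
query = "When are expense reports due?"

# Toy bag-of-words "embedding"; a real system would use a trained embedding model.
vocab = sorted({w.lower().strip(".,:?") for text in documents + [query] for w in text.split()})

def embed(text):
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        word = word.strip(".,:?")
        if word in vocab:
            vec[vocab.index(word)] += 1.0
    return vec

doc_vectors = np.stack([embed(d) for d in documents])
q = embed(query)

# Cosine similarity ranks documents by how close they are to the query in embedding space.
sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q) + 1e-9)
best = documents[int(np.argmax(sims))]

# The retrieved text is added to the prompt so the model can answer from data
# it never saw during training.
prompt = f"Context:\n{best}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```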