January 19, 2025 6:00 PM PST
These meeting notes cover the inner workings of Large Language Models (LLMs), including their architecture, training process, and the challenges faced in deploying them.
Presenter: Coach Cindy, Director of Data Science, Machine Learning and Frontend Engineering
Key Points
Generative AI Applications
- Building generative AI applications involves understanding the underlying technology of LLMs.
Neural Network Architecture
- LLMs are neural networks built from an input layer, hidden layers, and an output layer (see the sketch after this list).
- Models have evolved in size:
  - GPT-1: 117 million parameters
  - GPT-2: 1.5 billion parameters
  - GPT-3: 175 billion parameters
  - Claude: a smaller model with competitive performance
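A minimal NumPy sketch of the input → hidden → output structure described above; the layer sizes and random weights are purely illustrative and not taken from any model in the talk.

```python
import numpy as np

# Toy feed-forward pass: input layer -> hidden layer -> output layer.
# Sizes (8 inputs, 16 hidden units, 4 outputs) are arbitrary illustrations.
rng = np.random.default_rng(0)

x  = rng.normal(size=(1, 8))      # one input example with 8 features
W1 = rng.normal(size=(8, 16))     # weights connecting input layer to hidden layer
W2 = rng.normal(size=(16, 4))     # weights connecting hidden layer to output layer

hidden = np.maximum(0, x @ W1)    # ReLU activation makes the hidden layer non-linear
output = hidden @ W2              # raw output scores (logits)
print(output.shape)               # (1, 4)
```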
Characteristics of LLMs
- Powerful, expensive, and slow.
- Benchmarked on MMLU (Massive Multitask Language Understanding), which covers 57 subjects and compares models against a random-guessing baseline.
- On that benchmark, LLM performance is approaching expert level.
Technical Specifications
- Knowledge and problem-solving capabilities are significant but costly.
- Relies on GPUs for parallel computation, since operations over the parameters are largely independent and can run simultaneously.
- Data structures (see the sketch after this list):
  - 1D: Vector
  - 2D: Matrix
  - 3D and above: Tensor
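A short NumPy sketch of the three shapes listed above; the dimensions (768, 1024, 32) are hypothetical and chosen only to suggest typical roles.

```python
import numpy as np

vector = np.zeros(768)               # 1D: e.g. the embedding of a single token
matrix = np.zeros((1024, 768))       # 2D: e.g. the embeddings of every token in one sequence
tensor = np.zeros((32, 1024, 768))   # 3D and above: e.g. a batch of 32 such sequences

print(vector.ndim, matrix.ndim, tensor.ndim)   # 1 2 3
```

GPUs help here because operations over these arrays (element-wise math and matrix multiplications) can be computed in parallel.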
Model Training and Data Sources
- Training data includes (the colors refer to categories in the presented chart):
  - Green: web data (Common Crawl, Reddit, Stack Exchange, Wikipedia)
  - Blue: research papers and documents
  - Orange: cultural texts (e.g., the Bible, movies)
- Total training data size: 825 GB.
Tokenization Process
- Tokenization involves breaking text into words or segments.
- GPT-2 uses Byte Pair Encoding (BPE).
- The cost of building these models can be very high (e.g., 10.6 million for LLaMA).
- After tokenization, an embedding layer converts each token into a list of floating-point numbers (a vector), as sketched below.
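A sketch of both steps, assuming the Hugging Face transformers library is installed (it downloads the GPT-2 tokenizer files on first use); the embedding table below is random and purely illustrative, not GPT-2's real weights.

```python
import numpy as np
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")    # GPT-2's Byte Pair Encoding tokenizer

text = "Large Language Models are powerful."
token_ids = tokenizer.encode(text)                   # text -> integer token IDs
print(tokenizer.convert_ids_to_tokens(token_ids))    # the BPE pieces
print(token_ids)

# An embedding table then maps each token ID to a vector of floating-point numbers.
# GPT-2's vocabulary has 50,257 tokens; 768 dimensions matches its smallest variant.
embedding_table = np.random.default_rng(0).normal(size=(50257, 768))
token_vectors = embedding_table[token_ids]           # shape: (number_of_tokens, 768)
print(token_vectors.shape)
```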
Attention Mechanism
- Attention measures the relationship between pairs of tokens, incorporating positional information.
- Activation functions introduce non-linearity into the otherwise linear layer outputs.
- To generate text, the model samples the next token from the output probability distribution; the temperature setting controls how strongly it favors the highest-probability token (see the sketch below).
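A minimal NumPy sketch of scaled dot-product attention and of temperature scaling; it omits positional encodings, multiple heads, and learned projection weights, and all sizes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d = 5, 16                      # 5 tokens, 16-dimensional vectors (toy sizes)
Q = rng.normal(size=(seq_len, d))       # queries
K = rng.normal(size=(seq_len, d))       # keys
V = rng.normal(size=(seq_len, d))       # values

# Attention scores relate every pair of tokens; softmax turns them into weights
# that mix the value vectors for each position.
scores = Q @ K.T / np.sqrt(d)
weights = softmax(scores, axis=-1)      # each row sums to 1
attended = weights @ V                  # shape: (seq_len, d)

# Temperature rescales the next-token logits before sampling: a low temperature
# concentrates probability on the top token, a high one spreads it out.
logits = rng.normal(size=50257)         # pretend logits over a GPT-2-sized vocabulary
for temperature in (0.2, 1.0, 2.0):
    probs = softmax(logits / temperature)
    print(temperature, round(float(probs.max()), 4))   # probability of the most likely token
```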
Limitations of Pre-trained Models
- Pre-trained models have limitations, including:
  - Hallucinations
  - Domain shift
  - Task shift
- One mitigation is to be specific about the domain and the task in the prompt (see the example after this list).
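A hypothetical illustration of that advice: the second prompt names both the domain and the task explicitly, which narrows the space of likely completions.

```python
# Illustrative prompts only; neither comes from the talk.
vague_prompt = "Tell me about transformers."   # toys? power equipment? neural networks?

specific_prompt = (
    "You are a machine learning tutor.\n"                      # domain
    "Task: explain in three sentences how the attention "      # task
    "mechanism in transformer language models works."
)
```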
Resource Constraints
- Models may face resource constraints and lack access to private data.
Retrieval-Augmented Generation (RAG)
- RAG retrieves relevant documents (including private data) and adds them to the prompt; its effectiveness can be improved by changing the embedding model used for retrieval (see the sketch after this list).
- Fine-tuning, by contrast, trains the LLM further on high-quality data; the document database in RAG is often searched with a separate embedding model.
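A minimal sketch of the retrieval step under toy assumptions: the bag-of-words embed function below stands in for a real embedding model, and the documents and query are invented for illustration.

```python
import numpy as np

documents = [
    "Internal policy: expense reports are due on the 5th of each month.",
    "The attention mechanism relates pairs of tokens in a sequence.",
    "Fine-tuning continues training an LLM on high-quality, task-specific data.",
]
query = "When are expense reports due?"

# Toy bag-of-words "embedding"; a real system would use a trained embedding model.
vocab = sorted({w.lower().strip(".,:?") for text in documents + [query] for w in text.split()})

def embed(text):
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        word = word.strip(".,:?")
        if word in vocab:
            vec[vocab.index(word)] += 1.0
    return vec

doc_vectors = np.stack([embed(d) for d in documents])
q = embed(query)

# Cosine similarity ranks documents by how close they are to the query in embedding space.
sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q) + 1e-9)
best = documents[int(np.argmax(sims))]

# The retrieved text is added to the prompt so the model can answer from data
# it never saw during training.
prompt = f"Context:\n{best}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```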