
Foundations of LLMs

[Universität Heidelberg, Germany]

- Overview

Large language models (LLMs) are foundation models: large-scale AI systems trained on vast, unlabeled datasets with self-supervised learning, which lets them learn complex language patterns, semantic relationships, and contextual understanding.

Based on the Transformer architecture, they are designed to be adapted, via fine-tuning or prompting, to a wide range of specialized tasks, such as text and code generation, translation, and question answering, and they serve as the core engine of generative AI (a minimal prompting sketch follows below).

Key foundational components, such as the self-attention mechanism and training on massive datasets, allow these models to achieve high performance in natural language understanding and generation.
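
The adaptation-by-prompting idea can be made concrete with a short, hedged sketch. It loads the small public GPT-2 checkpoint through the Hugging Face transformers library purely as a stand-in for a much larger LLM, and reuses the same pre-trained weights for a task specified only by the prompt; the model name, prompt wording, and generation settings are illustrative assumptions, not a prescribed setup.

    # Minimal sketch: adapting a pre-trained base model via prompting.
    # "gpt2" is used only as a small, public stand-in for a larger LLM.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Translate English to German: 'Good morning' ->"
    inputs = tokenizer(prompt, return_tensors="pt")

    # The same pre-trained weights serve many tasks; only the prompt changes.
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))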

1. Key Aspects of LLM Foundations:

  • Pre-training: LLMs are trained on massive datasets (web pages, books, code) to learn general language patterns.
  • Transformer Architecture: The core neural network structure, built around self-attention, lets the model process token sequences and capture context (sketched after this list).
  • Self-Supervised Learning: Models learn by predicting the next token, or a masked-out token, in a sequence, so the raw text supplies its own training signal (also sketched below).
  • Adaptation (Fine-Tuning/Alignment): The base models are refined for specific tasks or for safety (e.g., reducing bias) through techniques such as instruction tuning and reinforcement learning from human feedback.
  • Embeddings: Words and tokens are represented as high-dimensional vectors (embeddings) whose geometry captures semantic relationships (see the last sketch below).
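
As a companion to the Transformer bullet above, the following minimal NumPy sketch shows single-head scaled dot-product self-attention; the dimensions and random inputs are illustrative only.

    # Minimal sketch of scaled dot-product self-attention (single head).
    import numpy as np

    def self_attention(X, W_q, W_k, W_v):
        """X: (seq_len, d_model) token vectors; W_*: learned projections."""
        Q, K, V = X @ W_q, X @ W_k, X @ W_v             # queries, keys, values
        scores = Q @ K.T / np.sqrt(Q.shape[-1])         # pairwise token affinities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
        return weights @ V                              # each output mixes all tokens

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))                         # 4 tokens, 8-dim embeddings
    W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(X, W_q, W_k, W_v).shape)       # (4, 8): one context-aware vector per token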

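The self-supervised objective can likewise be shown in a few lines: unlabeled text supplies its own training targets, because every token is predicted from the tokens before it. Whitespace tokenization is used here purely for illustration; real LLMs use subword tokenizers.

    # Minimal sketch: turning raw text into (context -> next token) training pairs.
    text = "large language models learn patterns from unlabeled text"
    tokens = text.split()                      # toy whitespace "tokenizer"

    pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
    for context, target in pairs:
        print(f"{' '.join(context)!r:50} -> {target!r}")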

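Finally, the embeddings bullet can be illustrated with made-up 3-dimensional vectors (real models use hundreds or thousands of dimensions); cosine similarity is one common way to read semantic relatedness off the vector geometry.

    # Minimal sketch: word vectors whose geometry encodes relatedness.
    import numpy as np

    embeddings = {                              # toy, hand-picked vectors
        "king":  np.array([0.80, 0.65, 0.10]),
        "queen": np.array([0.78, 0.70, 0.12]),
        "apple": np.array([0.10, 0.20, 0.90]),
    }

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1: related
    print(cosine(embeddings["king"], embeddings["apple"]))  # noticeably smaller
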
2. Synonyms/Related Terms:

  • Foundation models
  • Base models
  • Pre-trained models
  • Generative pre-trained transformers (GPT)
  • Large-scale AI models [1, 2]


3. Usage Examples:

  • Content Generation: Creating articles, poems, or code.
  • Summarization: Condensing long documents into summaries.
  • Dialogue Systems: Engaging in natural conversation (e.g., chatbots).
  • Translation: Converting text from one language to another.
  • Information Extraction/Retrieval: Performing tasks like sentiment analysis and answering queries.
  • In-context Learning: Performing tasks such as simple arithmetic from a handful of examples supplied in the prompt, without any parameter updates (see the few-shot sketch below).
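
A rough sketch of the in-context learning item: the task is specified entirely inside the prompt as a few worked examples, and the model is expected to continue the pattern without any weight updates. The prompt format below is a common convention, not a fixed API.

    # Minimal sketch: building a few-shot prompt for in-context learning.
    examples = [("2 + 3", "5"), ("7 + 6", "13"), ("12 + 9", "21")]
    query = "8 + 5"

    prompt = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    prompt += f"\nQ: {query}\nA:"
    print(prompt)   # this string would be sent as-is to a pre-trained LLM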


4. Common Examples:

  • GPT series (e.g., GPT-4) by OpenAI.
  • BERT (Bidirectional Encoder Representations from Transformers) by Google.

[More to come ...]


