
AI Foundation Models

[Photo: Stanford University - Jaclyn Chen]

- Overview

AI foundation models, also known as base models, are machine learning (ML) models that are pre-trained on large, general datasets and can then be adapted to a wide variety of tasks. They are often described as the backbone of modern artificial intelligence (AI) because of their adaptability and generality.

Foundation models are characterized by two defining features:

  • Transfer learning: The model's ability to apply knowledge learned in one context to new tasks (see the sketch after this list)
  • Scale: Training on hardware, such as graphics processing units (GPUs), that allows the model to perform many computations in parallel 
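
As a minimal sketch of the transfer-learning idea, the example below freezes a stand-in "pretrained" backbone and trains only a small task-specific head on new data. It assumes PyTorch is available; the backbone, head, and synthetic dataset are placeholders, not a specific foundation model's API.

# Transfer learning in miniature (assumption: PyTorch is installed).
# A "pretrained" backbone is frozen and only a small new head is trained.
import torch
import torch.nn as nn

# Stand-in for a large pretrained backbone (in practice, a foundation model).
backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 64))
for p in backbone.parameters():
    p.requires_grad = False              # reuse what was learned in pre-training

head = nn.Linear(64, 3)                  # new head for a 3-class downstream task
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 128)                 # tiny synthetic downstream dataset
y = torch.randint(0, 3, (32,))

for _ in range(5):                       # a few adaptation steps
    loss = loss_fn(head(backbone(x)), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()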

Foundation models can be used as standalone systems or as a base for other applications, including:

  • Building applications
  • Text generation
  • Image generation
  • Audio generation
  • Generative scientific tasks, such as drug discovery
  • Healthcare, including analyzing large medical datasets to enhance research
  • More human-like chatbots 

Some examples of foundation models include:

  • GPT by OpenAI: A text-based model that can write essays, answer questions, and create poetry
  • IBM-NASA geospatial model: A model built jointly by IBM and NASA that can predict wildfires and floods, and can be further fine-tuned to track deforestation or predict crop yields 

However, foundation models can also be susceptible to inaccuracies and may produce fictitious responses, known as hallucinations. These issues can be caused by a lack of context in the prompt, biases in the training data, or low-quality training data. To mitigate them, the response can be grounded in real data, for example by retrieving supporting passages from a vector database, as sketched below. 
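
A rough illustration of that grounding idea, assuming NumPy is available: the snippet retrieves the most relevant stored passage for a query and folds it into the prompt. The embed() function is a crude placeholder; a real system would use a learned embedding model and a dedicated vector database.

# Grounding a prompt with retrieved context (assumptions: NumPy installed;
# embed() is a toy stand-in for a real embedding model).
import numpy as np

documents = [
    "Foundation models are pre-trained on large, general datasets.",
    "Vector databases store embeddings for fast similarity search.",
    "Hallucinations are fluent but factually incorrect model outputs.",
]

def embed(text):
    # Placeholder: character-frequency vector, normalized to unit length.
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1
    return vec / (np.linalg.norm(vec) + 1e-9)

doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query):
    scores = doc_vectors @ embed(query)   # cosine similarity of unit vectors
    return documents[int(np.argmax(scores))]

query = "What is a hallucination in an AI model?"
context = retrieve(query)
prompt = "Answer using only this context:\n" + context + "\n\nQuestion: " + query
print(prompt)   # this grounded prompt would then be sent to the foundation model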

 

- What's Unique about Foundation Models?

A unique feature of foundation models is their adaptability. These models can perform a variety of different tasks with high accuracy based only on input prompts, including natural language processing (NLP) tasks such as question answering, as well as image classification. 
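
The sketch below illustrates that prompt-driven adaptability: one model, several tasks, selected purely by the wording of the prompt. The generate() function is a hypothetical stand-in for a foundation model's inference call, not a real API.

# One model, many tasks, chosen by the prompt alone.
# generate() is a hypothetical placeholder for a real model's inference call.
def generate(prompt):
    return "<model output for: " + prompt.splitlines()[0] + ">"

tasks = {
    "question answering": "Q: What year was the transistor invented?\nA:",
    "summarization": "Summarize in one sentence: Foundation models are ...",
    "translation": "Translate to French: The weather is nice today.",
}

for name, prompt in tasks.items():
    print(name, "->", generate(prompt))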

The scale and general nature of foundation models set them apart from traditional ML models, which typically perform specific tasks such as analyzing the sentiment of text, classifying images, or predicting trends.

You can use foundation models as base models to develop more specialized downstream applications. These models are the culmination of more than a decade of work, and they continue to grow in size and complexity.

 

- Trillion-Parameter Models

Why the interest in trillion-parameter models? Many of today's use cases are already well known, and interest is growing because of the promise of increased capacity for:

  • Natural language processing tasks like translation, question answering, abstraction, and fluency.
  • Holding longer-term context and conversational ability.
  • Multimodal applications combining language, vision, and speech.
  • Creative applications like storytelling, poetry generation, and code generation.
  • Scientific applications, such as protein folding predictions and drug discovery.
  • Personalization, with the ability to develop a consistent personality and remember user context.

The benefits are huge, but training and deploying large models can be computationally and resource intensive. Computationally efficient, cost-effective, and energy-efficient systems designed to provide on-the-fly inference are critical for widespread deployment.
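
A back-of-envelope sketch of the deployment challenge, assuming 16-bit (2-byte) weights and ignoring activations, KV caches, and optimizer state, which add substantially more:

# Rough memory needed just to hold model weights (assumptions: FP16 weights,
# 2 bytes each; activations, KV cache, and optimizer state are ignored).
def weight_memory_gb(num_parameters, bytes_per_param=2):
    return num_parameters * bytes_per_param / 1e9

for params in (340e6, 175e9, 1e12):       # BERT-scale, GPT-3-scale, 1 trillion
    print(f"{int(params):,} params -> ~{weight_memory_gb(params):,.1f} GB")
# ~0.7 GB, ~350 GB, and ~2,000 GB respectively: a trillion-parameter model
# cannot fit on a single GPU and must be sharded across many accelerators.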

For example, BERT, one of the first bidirectional foundation models, was released in 2018. Its largest version was trained with about 340 million parameters on roughly 16 GB of text. Only two years later, OpenAI trained GPT-3 with 175 billion parameters on several hundred gigabytes of filtered text, and GPT-4, released in 2023, is widely reported to be larger still, although OpenAI has not disclosed its size. According to OpenAI, the computational power used by the largest AI training runs has doubled every 3.4 months since 2012, roughly a tenfold increase per year. 

Today’s foundation models, such as the large language models (LLMs) Claude 2 and Llama 2, and the text-to-image model Stable Diffusion from Stability AI, can perform a range of tasks out of the box spanning multiple domains, like writing blog posts, generating images, solving math problems, engaging in dialog, and answering questions based on a document.

 

- Tokens and Parameters

In AI and ML, the terms "token" and "parameter" are sometimes conflated, but they have different meanings and roles in model training.

Tokens represent the smallest units of data processed by the model, such as a word, subword, or character in natural language processing. 

Parameters, on the other hand, are internal variables that the model adjusts during training to improve its performance. Both tokens and parameters are key elements in model training, but they serve different purposes and significantly impact the model's accuracy and overall performance.
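
A short sketch of the distinction, assuming PyTorch is available: tokens are the units of input data, while parameters are the weights the model learns. The whitespace tokenizer below is a simplification of the subword tokenizers real models use, and the tiny model is only a placeholder.

# Tokens vs. parameters (assumptions: PyTorch installed; whitespace splitting
# stands in for a real subword tokenizer such as BPE).
import torch.nn as nn

text = "Foundation models are pre-trained on broad data."
tokens = text.lower().split()              # tokens: units of the input data
print(len(tokens), "tokens:", tokens)

model = nn.Sequential(nn.Embedding(10_000, 32), nn.Linear(32, 10_000))
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} parameters")        # parameters: weights learned in training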

 

[More to come ...]


