Meta LLaMA 2 Model Card

Model Overview

Model Name: LLaMA 2 (Large Language Model Meta AI 2)
Version: 2.0
Release Date: July 2023
Developer: Meta AI
Model Type: Autoregressive Language Model
Architecture: Transformer-based decoder-only architecture

Model Description

Purpose

LLaMA 2 is designed as a foundational large language model for research and commercial applications. It serves as a base model for fine-tuning on specific tasks including dialogue, code generation, and reasoning applications.

Target Users

  • AI researchers and developers
  • Commercial organizations building AI applications
  • Academic institutions conducting NLP research
  • Developers creating conversational AI systems

Model Variants

| Model Size   | Parameters | Context Length | Use Case                            |
|--------------|------------|----------------|-------------------------------------|
| LLaMA 2-7B   | 7 billion  | 4,096 tokens   | Lightweight applications, research  |
| LLaMA 2-13B  | 13 billion | 4,096 tokens   | Balanced performance and efficiency |
| LLaMA 2-70B  | 70 billion | 4,096 tokens   | High-performance applications       |
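
All three variants share the same tokenizer and architecture family, so switching between them is mainly a matter of pointing at a different checkpoint and having enough memory. A minimal loading sketch using the Hugging Face transformers library, assuming the gated meta-llama/Llama-2-7b-hf checkpoint has been approved for your account:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name on the Hugging Face Hub; access is gated behind
# the license acceptance described under "Licensing and Access" below.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # FP16 weights, see "System Requirements"
    device_map="auto",           # spreads layers across available GPUs (needs accelerate)
)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```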

Training Information

Training Data

  • Sources: Publicly available online sources, excluding Meta's products or services
  • Data Size: 2 trillion tokens
  • Languages: Primarily English, with some multilingual content
  • Cutoff Date: Pretraining data has a cutoff of September 2022; some fine-tuning data is more recent
  • Preprocessing: Deduplication, safety filtering, privacy scrubbing
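
Meta's exact preprocessing pipeline is not public. Purely as an illustration of what document-level deduplication can look like (the normalization and hashing scheme below are assumptions, not Meta's method):

```python
import hashlib

def dedupe_exact(documents):
    """Drop exact duplicate documents by hashing whitespace-normalized,
    lowercased text. Illustrative only; production pipelines typically add
    fuzzy deduplication (e.g. MinHash) on top of exact matching."""
    seen = set()
    unique = []
    for doc in documents:
        key = hashlib.sha256(" ".join(doc.split()).lower().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

corpus = ["Hello   world.", "hello world.", "Something else."]
print(dedupe_exact(corpus))  # the first two normalize to the same document
```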

Training Process

  • Architecture: Transformer decoder with RMSNorm normalization
  • Attention: Grouped-query attention for improved inference efficiency (used in the 70B variant; the 7B and 13B models use standard multi-head attention)
  • Training Infrastructure: Meta's Research Super Cluster and internal production clusters equipped with NVIDIA A100 GPUs
  • Training Duration: Several months using thousands of GPUs
  • Optimization: AdamW optimizer with cosine learning rate schedule
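
The pairing of AdamW with a cosine learning-rate schedule is standard in PyTorch; a minimal sketch follows, with the learning rate, schedule length, and dummy model standing in for the actual LLaMA 2 training setup rather than reproducing it:

```python
import torch

model = torch.nn.Linear(512, 512)  # stand-in for the actual transformer

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,            # placeholder peak learning rate
    betas=(0.9, 0.95),
    weight_decay=0.1,
)
# Cosine decay of the learning rate over the course of training.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=10_000, eta_min=3e-5
)

for step in range(10_000):
    loss = model(torch.randn(8, 512)).pow(2).mean()  # dummy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```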

Performance Metrics

Benchmark Results

| Benchmark | LLaMA 2-7B | LLaMA 2-13B | LLaMA 2-70B |
|-----------|------------|-------------|-------------|
| MMLU      | 45.3       | 54.8        | 68.9        |
| HellaSwag | 77.2       | 80.7        | 87.3        |
| MATH      | 4.1        | 6.7         | 13.5        |
| GSM8K     | 14.6       | 28.7        | 56.8        |
| HumanEval | 12.8       | 18.3        | 29.9        |
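
Multiple-choice benchmarks such as MMLU and HellaSwag are commonly scored by comparing the log-likelihood the model assigns to each answer choice. A rough sketch of that scoring, assuming the same gated checkpoint as in the loading example above (real evaluation harnesses handle tokenization boundaries and length normalization more carefully):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed gated checkpoint, as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def choice_logprob(question: str, choice: str) -> float:
    """Sum of log-probabilities the model assigns to the answer tokens.
    Note: assumes the prompt tokenization is a prefix of the full tokenization,
    which SentencePiece usually preserves at a space boundary."""
    prompt_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    ids = tokenizer(question + " " + choice, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)  # predictions for tokens 1..T-1
    targets = ids[0, 1:]
    per_token = logprobs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return per_token[prompt_len - 1:].sum().item()         # score only the answer tokens

question = "Question: What is 2 + 2?\nAnswer:"
choices = ["3", "4", "5"]
best = max(choices, key=lambda c: choice_logprob(question, c))
```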

Evaluation Methodology

  • Academic Benchmarks: Evaluated on standard NLP benchmarks including reading comprehension, mathematical reasoning, and code generation
  • Human Evaluation: Preference studies comparing outputs to other models
  • Safety Evaluation: Red-teaming exercises and automated safety assessments

Intended Use

Primary Applications

  • Research: Foundation model for academic and industrial research
  • Fine-tuning: Base model for task-specific applications
  • Dialogue Systems: Conversational AI and chatbot development (see the prompt-format sketch after this list)
  • Content Generation: Creative writing and content assistance
  • Code Generation: Programming assistance and code completion
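
For dialogue systems, the fine-tuned Llama 2-Chat variants expect prompts wrapped in [INST] blocks with an optional <<SYS>> system prompt. A minimal sketch of assembling a single-turn prompt in that commonly documented template (the helper function below is hypothetical, not part of any library):

```python
def build_chat_prompt(system_prompt: str, user_message: str) -> str:
    """Single-turn prompt in the Llama 2-Chat template:
    <s>[INST] <<SYS>> ... <</SYS>> user message [/INST]"""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_chat_prompt(
    "You are a helpful, honest assistant.",
    "Explain what a context window is in one sentence.",
)
# The assistant's reply is whatever the chat model generates after [/INST].
```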

Out-of-Scope Uses

  • Real-time critical decision making without human oversight
  • Applications requiring factual accuracy without verification
  • Use cases involving personal, private, or sensitive information
  • Deployment without appropriate safety measures and monitoring

Limitations and Risks

Known Limitations

  • Factual Accuracy: May generate plausible but incorrect information
  • Temporal Knowledge: Training data cutoff limits recent information
  • Mathematical Reasoning: Limited performance on complex mathematical problems
  • Code Generation: May produce syntactically correct but logically flawed code
  • Context Length: 4K token limit restricts long-form document processing
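
Because of the 4,096-token window, long documents must be split before processing. A minimal token-based chunking sketch, assuming a transformers-style tokenizer (the overlap size is an arbitrary illustrative choice, and room must still be left for any tokens to be generated):

```python
def chunk_by_tokens(text: str, tokenizer, max_tokens: int = 4096, overlap: int = 256):
    """Split text into windows of at most max_tokens tokens, with some overlap
    so content cut at a boundary appears in both neighboring chunks."""
    ids = tokenizer(text, add_special_tokens=False).input_ids
    chunks = []
    start = 0
    while start < len(ids):
        window = ids[start:start + max_tokens]
        chunks.append(tokenizer.decode(window))
        if start + max_tokens >= len(ids):
            break
        start += max_tokens - overlap
    return chunks
```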

Bias and Fairness Concerns

  • Demographic Bias: May exhibit biases present in training data
  • Cultural Bias: Primarily trained on English content with Western perspectives
  • Representation Gaps: Underrepresentation of certain communities and viewpoints
  • Stereotyping: Potential for reinforcing harmful stereotypes

Risk Mitigation

  • Safety Training: Reinforcement Learning from Human Feedback (RLHF)
  • Red Team Testing: Adversarial testing for harmful content generation
  • Content Filtering: Safety classifiers and content filters recommended as a layer around the model at deployment time (see the sketch after this list)
  • Usage Guidelines: Comprehensive responsible use guidelines provided
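
A deployment-time content filter typically wraps generation with checks on both the prompt and the output. A purely illustrative sketch, in which a keyword blocklist stands in for a real safety classifier (this is not a filter shipped with the model):

```python
BLOCKLIST = {"example_banned_term"}  # stand-in for a trained safety classifier

def is_flagged(text: str) -> bool:
    """Toy check; a real deployment would call a dedicated moderation model."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def guarded_generate(model, tokenizer, prompt: str,
                     refusal: str = "I can't help with that.") -> str:
    # Screen the prompt, generate, then screen the output before returning it.
    if is_flagged(prompt):
        return refusal
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    return refusal if is_flagged(text) else text
```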

Technical Specifications

Model Architecture

  • Type: Transformer decoder-only
  • Layers: Varies by model size (32 layers for 7B, 40 layers for 13B, 80 layers for 70B)
  • Attention Heads: Multi-head attention, with grouped-query attention in the 70B variant (see the sketch after this list)
  • Embedding Dimension: Model-specific (4096 for 7B, 5120 for 13B, 8192 for 70B)
  • Vocabulary Size: 32,000 tokens using SentencePiece tokenization
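
Grouped-query attention lets several query heads share a single key/value head, shrinking the key/value cache at inference time. A minimal sketch of the core computation (rotary position embeddings, caching, and other details are omitted; this is not Meta's implementation):

```python
import torch

def grouped_query_attention(q, k, v, n_heads, n_kv_heads):
    """q: (batch, seq, n_heads*head_dim); k, v: (batch, seq, n_kv_heads*head_dim).
    Each group of n_heads // n_kv_heads query heads shares one key/value head."""
    bsz, seqlen, _ = q.shape
    head_dim = q.shape[-1] // n_heads
    group = n_heads // n_kv_heads

    q = q.view(bsz, seqlen, n_heads, head_dim).transpose(1, 2)     # (b, H, s, d)
    k = k.view(bsz, seqlen, n_kv_heads, head_dim).transpose(1, 2)  # (b, Hkv, s, d)
    v = v.view(bsz, seqlen, n_kv_heads, head_dim).transpose(1, 2)

    # Replicate each K/V head so every query head has a matching K/V head;
    # in practice only the Hkv heads are cached, which is the memory saving.
    k = k.repeat_interleave(group, dim=1)                          # (b, H, s, d)
    v = v.repeat_interleave(group, dim=1)

    scores = q @ k.transpose(-2, -1) / head_dim**0.5               # (b, H, s, s)
    causal = torch.triu(
        torch.full((seqlen, seqlen), float("-inf"), device=q.device), diagonal=1
    )
    attn = torch.softmax(scores + causal, dim=-1)
    out = attn @ v                                                 # (b, H, s, d)
    return out.transpose(1, 2).reshape(bsz, seqlen, n_heads * head_dim)

# Example shapes: 8 query heads sharing 2 key/value heads.
b, s, H, Hkv, d = 2, 16, 8, 2, 64
out = grouped_query_attention(
    torch.randn(b, s, H * d), torch.randn(b, s, Hkv * d), torch.randn(b, s, Hkv * d),
    n_heads=H, n_kv_heads=Hkv,
)  # -> (2, 16, 512)
```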

System Requirements

| Model Size | Memory (FP16) | Memory (INT8) | Recommended GPU |
|------------|---------------|---------------|-----------------|
| 7B         | ~13 GB        | ~7 GB         | RTX 3090, A100  |
| 13B        | ~25 GB        | ~13 GB        | A100 40GB       |
| 70B        | ~140 GB       | ~70 GB        | Multiple A100s  |
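
These figures follow roughly from parameter count multiplied by bytes per parameter; a quick back-of-the-envelope check, counting weights only and ignoring activations, the KV cache, and framework overhead:

```python
def estimated_weight_memory_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Weight-only memory estimate in GiB; real usage is higher because of
    activations, the KV cache, and framework overhead."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

for size in (7, 13, 70):
    print(f"{size}B  FP16 ~{estimated_weight_memory_gb(size, 2):.0f} GB  "
          f"INT8 ~{estimated_weight_memory_gb(size, 1):.0f} GB")
```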

Licensing and Access

License

  • Type: Llama 2 Community License (a custom license from Meta)
  • Research Use: Permitted
  • Commercial Use: Permitted under the license; organizations with more than 700 million monthly active users must request a separate license from Meta
  • Distribution: Redistribution and modification permitted subject to the license terms and the Acceptable Use Policy

Access Requirements

  • Model download requires acceptance of the license terms (see the download sketch after this list)
  • Organizations above the license's monthly-active-user threshold must request a separate license from Meta
  • Compliance with Meta's Acceptable Use Policy required
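
Once the license terms have been accepted (for example on the Hugging Face Hub, where the checkpoints are gated), the weights can be fetched with an access token. A sketch using huggingface_hub, assuming the meta-llama/Llama-2-7b-hf repository and a token tied to an approved account:

```python
from huggingface_hub import snapshot_download

# Requires having accepted the license terms for the gated repository
# and a Hugging Face access token for the approved account.
local_dir = snapshot_download(
    repo_id="meta-llama/Llama-2-7b-hf",  # assumed repository id
    token="hf_...",                      # placeholder token
)
print("Weights downloaded to:", local_dir)
```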

Monitoring and Updates

Ongoing Monitoring

  • Output Quality: Regular assessment of generation quality
  • Safety Metrics: Monitoring for harmful or biased outputs
  • Performance Tracking: Benchmark performance over time
  • Usage Patterns: Analysis of deployment patterns and use cases

Model Updates

  • Version Control: Clear versioning for model updates
  • Documentation Updates: Regular updates to model card and documentation
  • Community Feedback: Incorporation of community feedback and research findings

Contact and Support

Developer: Meta AI
Documentation: Official LLaMA 2 Documentation
Research Paper: "Llama 2: Open Foundation and Fine-Tuned Chat Models"
Community: Hugging Face Model Hub and research community forums


This model card follows the guidelines established in "Model Cards for Model Reporting" (Mitchell et al., 2019) and incorporates responsible AI documentation practices.