Meta LLaMA 2 Model Card
Model Overview
Model Name: LLaMA 2 (Large Language Model Meta AI 2)
Version: 2.0
Release Date: July 2023
Developer: Meta AI
Model Type: Autoregressive Language Model
Architecture: Transformer-based decoder-only architecture
Model Description
Purpose
LLaMA 2 is designed as a foundational large language model for research and commercial applications. It serves as a base model for fine-tuning on specific tasks including dialogue, code generation, and reasoning applications.
Target Users
- AI researchers and developers
- Commercial organizations building AI applications
- Academic institutions conducting NLP research
- Developers creating conversational AI systems
Model Variants
| Model Size | Parameters | Context Length | Use Case |
|---|---|---|---|
| LLaMA 2-7B | 7 billion | 4,096 tokens | Lightweight applications, research |
| LLaMA 2-13B | 13 billion | 4,096 tokens | Balanced performance and efficiency |
| LLaMA 2-70B | 70 billion | 4,096 tokens | High-performance applications |
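All variants share the same tokenizer and decoder-only architecture, differing only in scale. The sketch below shows one common way to load a variant with the Hugging Face `transformers` library; the `meta-llama/Llama-2-7b-hf` checkpoint ID, prompt, and generation settings are illustrative, and access to the weights must first be granted under the license.

```python
# Minimal sketch: loading a LLaMA 2 variant with Hugging Face transformers.
# Assumes `transformers`, `accelerate`, and `torch` are installed and that
# access to the gated weights has been approved; swap the ID for 13B/70B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # base 7B model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16 keeps the 7B weights around 13 GB
    device_map="auto",          # places weights on available GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```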
Training Information
Training Data
- Sources: Publicly available online sources, excluding Meta's products or services
- Data Size: 2 trillion tokens
- Languages: Primarily English, with some multilingual content
- Cutoff Date: Pretraining data has a cutoff of September 2022; some fine-tuning data for the chat variants is more recent
- Preprocessing: Deduplication, safety filtering, privacy scrubbing
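The exact preprocessing pipeline is not public. Purely to illustrate the deduplication step, the sketch below drops exact duplicates by hashing normalized text; it is not Meta's implementation, and production pipelines typically also use fuzzy (near-duplicate) matching.

```python
# Illustrative exact-duplicate filter via content hashing (not Meta's pipeline).
import hashlib

def dedup_exact(documents):
    """Yield documents whose normalized text has not been seen before."""
    seen = set()
    for doc in documents:
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc

corpus = ["Hello world.", "hello world.  ", "A different document."]
print(list(dedup_exact(corpus)))  # the near-identical second entry is dropped
```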
Training Process
- Architecture: Transformer decoder with RMSNorm pre-normalization, SwiGLU activations, and rotary positional embeddings (RoPE)
- Attention: Grouped-query attention (GQA) in the 70B model for improved inference efficiency; the smaller variants use standard multi-head attention
- Training Infrastructure: Meta's Research Super Cluster and internal production clusters
- Training Duration: Approximately 3.3 million A100-80GB GPU-hours across the released model sizes
- Optimization: AdamW optimizer with cosine learning rate schedule
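As a concrete reading of the optimizer bullet, the sketch below configures AdamW with a cosine learning-rate schedule in PyTorch, using the hyperparameters reported in the LLaMA 2 paper (beta1 = 0.9, beta2 = 0.95, eps = 1e-5, weight decay 0.1, decay to 10% of the peak rate); the stand-in model, step count, and omitted warmup are simplifications, not the actual training code.

```python
# Sketch of the reported AdamW + cosine-decay setup (not Meta's training code).
import torch

model = torch.nn.Linear(4096, 4096)   # stand-in for the real transformer
peak_lr = 3e-4                        # reported peak LR for the 7B/13B models
total_steps = 500_000                 # illustrative value only

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=peak_lr,
    betas=(0.9, 0.95),
    eps=1e-5,
    weight_decay=0.1,
)
# Cosine decay down to 10% of the peak learning rate (warmup omitted here).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=0.1 * peak_lr
)

# One illustrative step (forward pass, loss.backward(), and gradient
# clipping at 1.0 would precede this in real training):
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```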
Performance Metrics
Benchmark Results
| Benchmark | LLaMA 2-7B | LLaMA 2-13B | LLaMA 2-70B |
|---|---|---|---|
| MMLU | 45.3 | 54.8 | 68.9 |
| HellaSwag | 77.2 | 80.7 | 87.3 |
| MATH | 4.1 | 6.7 | 13.5 |
| GSM8K | 14.6 | 28.7 | 56.8 |
| HumanEval | 12.8 | 18.3 | 29.9 |
Evaluation Methodology
- Academic Benchmarks: Evaluated on standard NLP benchmarks including reading comprehension, mathematical reasoning, and code generation
- Human Evaluation: Preference studies comparing outputs to other models
- Safety Evaluation: Red-teaming exercises and automated safety assessments
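Multiple-choice benchmarks such as HellaSwag are commonly scored by comparing the log-likelihood the model assigns to each candidate completion. The sketch below illustrates that general methodology with `transformers`; it is not the harness behind the numbers above, and the example context and choices are invented.

```python
# Generic log-likelihood scoring for multiple-choice evaluation (illustrative;
# not the exact harness used for the reported benchmark results).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def completion_logprob(context: str, completion: str) -> float:
    """Sum of log-probabilities assigned to `completion` given `context`.

    Assumes the completion starts with a space so the context tokens remain
    a stable prefix of the combined tokenization.
    """
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + completion, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[:, :-1].float(), dim=-1)  # predicts next token
    targets = full_ids[:, 1:]
    scores = logprobs[0, ctx_len - 1 :].gather(-1, targets[0, ctx_len - 1 :, None])
    return scores.sum().item()

context = "The chef put the cake in the oven because"
choices = [" it needed to bake.", " the oven was hungry."]
print(max(choices, key=lambda c: completion_logprob(context, c)))
```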
Intended Use
Primary Applications
- Research: Foundation model for academic and industrial research
- Fine-tuning: Base model for task-specific applications
- Dialogue Systems: Conversational AI and chatbot development (see the prompt-template sketch after this list)
- Content Generation: Creative writing and content assistance
- Code Generation: Programming assistance and code completion
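For the dialogue use case, note that the chat-tuned releases (e.g. `meta-llama/Llama-2-7b-chat-hf`) expect a specific prompt template built from `[INST]` and `<<SYS>>` tags, while the base models have no fixed template. A minimal sketch of that template, with an invented system prompt and user message:

```python
# Sketch of the Llama 2-Chat prompt template (single-turn). The tokenizer adds
# the leading <s> BOS token, so it is not included in the string here.
def build_chat_prompt(system_prompt: str, user_message: str) -> str:
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_chat_prompt(
    system_prompt="You are a helpful, respectful and honest assistant.",
    user_message="Summarize the benefits of unit testing in two sentences.",
)
print(prompt)
```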
Out-of-Scope Uses
- Real-time critical decision making without human oversight
- Applications requiring factual accuracy without verification
- Use cases involving personal, private, or sensitive information
- Deployment without appropriate safety measures and monitoring
Limitations and Risks
Known Limitations
- Factual Accuracy: May generate plausible but incorrect information
- Temporal Knowledge: Training data cutoff limits recent information
- Mathematical Reasoning: Limited performance on complex mathematical problems
- Code Generation: May produce syntactically correct but logically flawed code
- Context Length: The 4,096-token context window restricts long-form document processing (see the truncation sketch after this list)
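To stay within the context window noted above, long inputs are usually truncated or chunked at tokenization time. A minimal truncation sketch using the `transformers` tokenizer, with a placeholder document:

```python
# Sketch: keeping input within the 4,096-token context window via truncation.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

long_document = "..."  # placeholder for a long-form document
inputs = tokenizer(
    long_document,
    truncation=True,
    max_length=4096,   # model's maximum context length
    return_tensors="pt",
)
print(inputs.input_ids.shape)  # sequence length will be at most 4096
```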
Bias and Fairness Concerns
- Demographic Bias: May exhibit biases present in training data
- Cultural Bias: Primarily trained on English content with Western perspectives
- Representation Gaps: Underrepresentation of certain communities and viewpoints
- Stereotyping: Potential for reinforcing harmful stereotypes
Risk Mitigation
- Safety Training: Supervised fine-tuning and Reinforcement Learning from Human Feedback (RLHF), applied to the Llama 2-Chat variants rather than the base pretrained models
- Red Team Testing: Adversarial testing for harmful content generation
- Content Filtering: The models do not ship with separate built-in content classifiers; deployers are expected to add their own input and output safeguards (an illustrative guardrail sketch follows this list)
- Usage Guidelines: Meta's Responsible Use Guide accompanies the release
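A minimal, purely illustrative guardrail wrapper is sketched below; `passes_safety_check` is a hypothetical placeholder for whatever moderation classifier or policy engine a deployment actually uses, and nothing here is a Meta-provided component.

```python
# Hypothetical deployment-side guardrail (illustrative only).
def passes_safety_check(text: str) -> bool:
    # Placeholder: call a real moderation model or policy engine here.
    blocked_terms = ("example-blocked-term",)
    return not any(term in text.lower() for term in blocked_terms)

def guarded_generate(generate_fn, prompt: str) -> str:
    """Wrap a text-generation callable with input and output checks."""
    if not passes_safety_check(prompt):
        return "Request declined by input filter."
    response = generate_fn(prompt)
    if not passes_safety_check(response):
        return "Response withheld by output filter."
    return response
```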
Technical Specifications
Model Architecture
- Type: Transformer decoder-only
- Layers: Varies by model size (32 layers for 7B, 40 layers for 13B, 80 layers for 70B)
- Attention Heads: 32 heads (7B), 40 heads (13B), and 64 heads (70B); the 70B model uses grouped-query attention with 8 key-value heads
- Embedding Dimension: Model-specific (4096 for 7B, 5120 for 13B, 8192 for 70B)
- Vocabulary Size: 32,000 tokens using SentencePiece tokenization
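For reference, the RMSNorm normalization mentioned under Training Process can be written in a few lines of PyTorch; the sketch below illustrates the technique and is not Meta's reference implementation.

```python
# Minimal RMSNorm sketch: scale by the reciprocal root-mean-square of the last
# dimension, then apply a learned per-feature gain (no mean subtraction, no bias).
import torch
from torch import nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

hidden = torch.randn(2, 16, 4096)   # (batch, sequence, 7B embedding dimension)
print(RMSNorm(4096)(hidden).shape)  # torch.Size([2, 16, 4096])
```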
System Requirements
| Model Size | Memory (FP16) | Memory (INT8) | Recommended GPU |
|---|---|---|---|
| 7B | ~13 GB | ~7 GB | RTX 3090, A100 |
| 13B | ~25 GB | ~13 GB | A100 40GB |
| 70B | ~140 GB | ~70 GB | Multiple A100s |
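The weight-memory figures above follow from parameter count times bytes per parameter (activations, the KV cache, and framework overhead add to this in practice). A quick back-of-the-envelope check:

```python
# Rough weight-memory estimate behind the table above (weights only).
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    return n_params * bytes_per_param / 1024**3

for name, n in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(f"{name}: ~{weight_memory_gib(n, 2):.0f} GiB FP16, "
          f"~{weight_memory_gib(n, 1):.0f} GiB INT8")
```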
Licensing and Access
License
- Type: Llama 2 Community License (a custom license from Meta)
- Research Use: Permitted
- Commercial Use: Permitted under the same license; organizations with more than 700 million monthly active users at the release date must request a separate license from Meta
- Distribution: Redistribution and fine-tuned derivatives are permitted, subject to the license terms, attribution requirements, and Meta's Acceptable Use Policy
Access Requirements
- Model download requires acceptance of license terms
- Organizations above the 700 million monthly-active-user threshold must obtain a license directly from Meta
- Compliance with Meta's Acceptable Use Policy required
Monitoring and Updates
Recommended Monitoring
- Output Quality: Regular assessment of generation quality
- Safety Metrics: Monitoring for harmful or biased outputs
- Performance Tracking: Benchmark performance over time
- Usage Patterns: Analysis of deployment patterns and use cases
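As one possible starting point for the monitoring above, the sketch below wraps a generation call with lightweight JSONL logging; the record fields and the `generate_fn` callable are placeholders rather than part of any official tooling.

```python
# Illustrative logging wrapper for usage and latency monitoring.
import json
import time
import uuid

def logged_generate(generate_fn, prompt: str, log_path: str = "generations.jsonl") -> str:
    start = time.time()
    response = generate_fn(prompt)
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": start,
        "latency_s": round(time.time() - start, 3),
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return response
```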
Model Updates
- Version Control: Clear versioning for model updates
- Documentation Updates: Regular updates to model card and documentation
- Community Feedback: Incorporation of community feedback and research findings
Contact and Support
Developer: Meta AI
Documentation: Official LLaMA 2 Documentation
Research Paper: "Llama 2: Open Foundation and Fine-Tuned Chat Models"
Community: Hugging Face Model Hub and research community forums
This model card follows the guidelines established in "Model Cards for Model Reporting" (Mitchell et al., 2019) and incorporates responsible AI documentation practices.