Architecture Overview
UAL Adapter is built on a modular architecture that enables seamless transfer of LoRA adapters across different model families.
System Components
The system consists of five main components:
1. AIR Format
Architecture-Agnostic Intermediate Representation
The AIR format is the core innovation that enables cross-architecture transfer. It:
Maps model-specific parameter names to universal semantic roles
Stores LoRA weights in a portable format
Preserves metadata about training configuration
Enables dimension-aware reconstruction
Semantic Roles:
attention_query - Query projection in attention
attention_key - Key projection in attention
attention_value - Value projection in attention
attention_output - Output projection in attention
mlp_up - MLP up-projection
mlp_down - MLP down-projection
mlp_gate - MLP gate projection (for gated architectures)
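To make the idea concrete, here is a minimal sketch of how one adapter entry might be stored under a semantic role rather than a model-specific name. The class and field names (`AirEntry`, `AirAdapter`, `lora_a`, `lora_b`) are hypothetical illustrations, not the actual AIR schema.

```python
import json
from dataclasses import dataclass, field

# The semantic roles listed above.
SEMANTIC_ROLES = {
    "attention_query", "attention_key", "attention_value", "attention_output",
    "mlp_up", "mlp_down", "mlp_gate",
}


@dataclass
class AirEntry:
    """One LoRA pair keyed by (role, layer). Hypothetical structure."""
    role: str
    layer: int
    lora_a: list  # rank x in_features, as nested lists
    lora_b: list  # out_features x rank, as nested lists

    def __post_init__(self):
        if self.role not in SEMANTIC_ROLES:
            raise ValueError(f"unknown semantic role: {self.role}")

    @property
    def rank(self):
        return len(self.lora_a)

    @property
    def in_features(self):
        return len(self.lora_a[0])

    @property
    def out_features(self):
        return len(self.lora_b)


@dataclass
class AirAdapter:
    """A portable bundle of entries plus training metadata. Hypothetical structure."""
    entries: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def to_json(self):
        return json.dumps({
            "metadata": self.metadata,
            "entries": [
                {"role": e.role, "layer": e.layer,
                 "lora_a": e.lora_a, "lora_b": e.lora_b}
                for e in self.entries
            ],
        })
```

Because the dimensions are derived from the stored matrices, a loader can compare them against the target model before deciding whether projection is needed.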
2. Model Binders
Architecture-Specific Mappings
Binders provide the mapping between model-specific parameter names and universal semantic roles.
Supported architectures:
GPT-2 family (GPT-2, GPT-Neo)
LLaMA family (LLaMA, LLaMA-2, TinyLlama)
Pythia family (EleutherAI models)
Qwen family (Qwen, Qwen-2)
BERT family (BERT, RoBERTa)
T5 family (T5, Flan-T5)
Each binder defines:
Parameter name patterns with layer indexing
Dimension information (in_features, out_features)
Special handling for fused layers (e.g., GPT-2’s c_attn)
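As an illustration of what a binder could look like, the sketch below maps parameter names to semantic roles with regular expressions. The module names follow the Hugging Face Transformers naming for LLaMA and GPT-2; the tables and the `bind` function are hypothetical and not the library's actual binder API. A fused layer such as GPT-2's `c_attn` is represented here by returning three roles for a single parameter.

```python
import re

# Hypothetical binder tables: (pattern with layer index, semantic roles).
LLAMA_PATTERNS = [
    (re.compile(r"model\.layers\.(\d+)\.self_attn\.q_proj"), ("attention_query",)),
    (re.compile(r"model\.layers\.(\d+)\.self_attn\.k_proj"), ("attention_key",)),
    (re.compile(r"model\.layers\.(\d+)\.self_attn\.v_proj"), ("attention_value",)),
    (re.compile(r"model\.layers\.(\d+)\.self_attn\.o_proj"), ("attention_output",)),
    (re.compile(r"model\.layers\.(\d+)\.mlp\.up_proj"), ("mlp_up",)),
    (re.compile(r"model\.layers\.(\d+)\.mlp\.down_proj"), ("mlp_down",)),
    (re.compile(r"model\.layers\.(\d+)\.mlp\.gate_proj"), ("mlp_gate",)),
]

GPT2_PATTERNS = [
    # c_attn fuses the query, key, and value projections into one matrix.
    (re.compile(r"transformer\.h\.(\d+)\.attn\.c_attn"),
     ("attention_query", "attention_key", "attention_value")),
    (re.compile(r"transformer\.h\.(\d+)\.attn\.c_proj"), ("attention_output",)),
    (re.compile(r"transformer\.h\.(\d+)\.mlp\.c_fc"), ("mlp_up",)),
    (re.compile(r"transformer\.h\.(\d+)\.mlp\.c_proj"), ("mlp_down",)),
]


def bind(param_name, patterns):
    """Return (layer_index, roles) for a parameter name, or None if unmapped."""
    for pattern, roles in patterns:
        match = pattern.search(param_name)
        if match:
            return int(match.group(1)), roles
    return None


print(bind("model.layers.3.self_attn.q_proj.weight", LLAMA_PATTERNS))
print(bind("transformer.h.0.attn.c_attn.weight", GPT2_PATTERNS))
```

Adding a new architecture in this scheme would amount to writing one more pattern table.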
3. Dimension Projection
SVD-Based Dimension Adaptation
When transferring between models with different dimensions, UAL provides three projection methods:
SVD Projection (Recommended)
# Projects through singular value decomposition
# Preserves variance while adapting dimensions
projector.project(lora_weights, target_dim, method="svd")
Truncate Projection
# Simple truncation or zero-padding
# Fast but may lose information
projector.project(lora_weights, target_dim, method="truncate")
Interpolate Projection
# Bilinear interpolation for smooth adaptation
# Good for moderate dimension changes
projector.project(lora_weights, target_dim, method="interpolate")
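The two simpler methods can be sketched in pure Python. This is an assumption about what truncation, zero-padding, and interpolation mean along the feature axis of a LoRA matrix, not the library's actual implementation: the function names are hypothetical, and the interpolation shown is one-dimensional linear resampling per row rather than the bilinear variant named above. The SVD method is omitted because it needs a linear algebra library.

```python
def resize_features(matrix, target_dim):
    """Truncate or zero-pad every row of `matrix` to `target_dim` columns."""
    if target_dim <= 0:
        raise ValueError("target_dim must be positive")
    resized = []
    for row in matrix:
        if len(row) >= target_dim:
            resized.append(list(row[:target_dim]))  # drop trailing features
        else:
            resized.append(list(row) + [0.0] * (target_dim - len(row)))  # pad
    return resized


def interpolate_row(row, target_dim):
    """Linearly resample one row to `target_dim` evenly spaced points."""
    n = len(row)
    if target_dim <= 0:
        raise ValueError("target_dim must be positive")
    if n == 1 or target_dim == 1:
        return [float(row[0])] * target_dim
    out = []
    for j in range(target_dim):
        pos = j * (n - 1) / (target_dim - 1)  # position in source coordinates
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(row[lo] * (1 - frac) + row[hi] * frac)
    return out


def interpolate_features(matrix, target_dim):
    """Apply `interpolate_row` to every row of `matrix`."""
    return [interpolate_row(row, target_dim) for row in matrix]
```

Truncation discards whatever lies beyond the target width, while interpolation spreads the whole row over the new width, which is why the latter tends to suit moderate dimension changes.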
4. LoRA Dispatcher
Intelligent Multi-Domain Routing
The dispatcher enables multi-agent systems with domain-specific adapters:
Query → Embedding → Router → Domain Selection → Adapter Application
Router Training:
Uses sentence embeddings for semantic understanding
Trains multi-class logistic regression for classification
Provides confidence scores for each domain
Supports confidence thresholds for fallback behavior
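The routing step with a confidence threshold might look like the sketch below. It assumes a multi-class logistic regression whose weights and biases have already been trained; the `route` function, its parameters, and the `"base"` fallback label are hypothetical, and the weights in the example are hand-set for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(embedding, weights, biases, domains, threshold=0.5, fallback="base"):
    """Pick a domain for a query embedding, or fall back when unsure.

    weights holds one row per domain; biases holds one value per domain.
    Returns (selected_domain, confidence).
    """
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(domains)), key=probs.__getitem__)
    if probs[best] < threshold:
        return fallback, probs[best]
    return domains[best], probs[best]


# Hand-set weights for two domains over a 2-dimensional embedding.
domains = ["medical", "legal"]
weights = [[5.0, 0.0], [0.0, 5.0]]
biases = [0.0, 0.0]

print(route([1.0, 0.0], weights, biases, domains, threshold=0.6))
print(route([0.0, 0.0], weights, biases, domains, threshold=0.6))
```

The first query is routed to the medical adapter; the second has equal scores for both domains, so its confidence of 0.5 falls below the threshold and the fallback is returned.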
5. Training Pipeline
Efficient LoRA Training
The training component provides:
Automatic target module detection
Gradient accumulation support
Learning rate scheduling
Checkpointing and resuming
Comprehensive logging
Data Flow
Training Phase
Load base model and tokenizer
Create UniversalAdapter instance
Detect trainable modules automatically
Apply LoRA to target modules
Train on domain-specific data
Export to AIR format
Transfer Phase
Load AIR format adapter
Parse semantic roles and weights
Detect target model architecture
Map semantic roles to target parameters
Project dimensions if needed
Apply adapted LoRA weights
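The mapping and dimension-check steps of this phase can be sketched as a planning pass over the adapter entries. Everything here is a hypothetical illustration: the `plan_transfer` function, the dictionary layout of an entry, and the template table (which uses the Hugging Face Transformers module names for a LLaMA-style target) are assumptions, not the library's interface.

```python
# Hypothetical role -> parameter-name templates for a LLaMA-style target.
LLAMA_TEMPLATES = {
    "attention_query": "model.layers.{layer}.self_attn.q_proj",
    "attention_key": "model.layers.{layer}.self_attn.k_proj",
    "attention_value": "model.layers.{layer}.self_attn.v_proj",
    "attention_output": "model.layers.{layer}.self_attn.o_proj",
    "mlp_up": "model.layers.{layer}.mlp.up_proj",
    "mlp_down": "model.layers.{layer}.mlp.down_proj",
    "mlp_gate": "model.layers.{layer}.mlp.gate_proj",
}


def plan_transfer(entries, templates, target_dims):
    """Map each entry to a target parameter and flag dimension mismatches.

    entries: dicts with "role", "layer", "in_features", "out_features".
    target_dims: role -> (in_features, out_features) in the target model.
    Returns (plan, skipped); skipped holds entries with no target counterpart.
    """
    plan, skipped = [], []
    for entry in entries:
        role = entry["role"]
        if role not in templates or role not in target_dims:
            skipped.append(entry)
            continue
        source_shape = (entry["in_features"], entry["out_features"])
        plan.append({
            "target": templates[role].format(layer=entry["layer"]),
            "source_shape": source_shape,
            "target_shape": target_dims[role],
            "needs_projection": source_shape != target_dims[role],
        })
    return plan, skipped


entries = [
    {"role": "attention_query", "layer": 0, "in_features": 768, "out_features": 768},
    {"role": "mlp_up", "layer": 1, "in_features": 2048, "out_features": 5632},
]
target_dims = {"attention_query": (2048, 2048), "mlp_up": (2048, 5632)}
plan, skipped = plan_transfer(entries, LLAMA_TEMPLATES, target_dims)
for step in plan:
    print(step["target"], "project" if step["needs_projection"] else "copy")
```

Separating planning from application makes it possible to report skipped roles and required projections before any weights are touched.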
Inference Phase
Single Domain:
Import adapter to target model
Run inference with adapted model
Multi-Domain:
Register multiple domain adapters
Train dispatcher router
Route each query at inference time
Select the adapter dynamically
Generate with selected adapter
Design Principles
Architecture Agnostic
No hardcoded assumptions about model structure. Everything goes through semantic role mapping.
Dimension Adaptive
Handles dimension mismatches between source and target models through the projection methods described above.
Modular & Extensible
Easy to add new architectures by implementing binders.
Production Ready
Comprehensive error handling, logging, and type hints throughout.
Testable
High test coverage with unit and integration tests.
Performance Considerations
Memory Efficiency
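LoRA trains far fewer parameters than full fine-tuning; the exact ratio depends on the rank, the hidden size, and the number of adapted modules:
Full fine-tuning: 100M+ parameters
LoRA (rank=16): rank x (in_features + out_features) parameters per adapted matrix, e.g. 24,576 for one 768 x 768 projection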
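A short calculation makes the scale concrete. A LoRA pair for one weight matrix consists of A (rank x in_features) and B (out_features x rank), so it adds rank x (in_features + out_features) trainable parameters. The model shape used below (12 layers, four adapted 768 x 768 projections per layer, 32-bit floats) is an assumed example, not a measurement of any particular adapter.

```python
def lora_param_count(in_features, out_features, rank):
    """Trainable parameters in one LoRA pair: A is r x in, B is out x r."""
    return rank * (in_features + out_features)


def full_param_count(in_features, out_features):
    """Parameters in the dense weight matrix that the pair adapts."""
    return in_features * out_features


# Assumed example: 12 layers, 4 adapted 768 x 768 projections each, rank 16.
per_matrix = lora_param_count(768, 768, rank=16)
total = 12 * 4 * per_matrix
size_bytes = total * 4  # 4 bytes per float32 value

print(per_matrix)                  # 24576
print(full_param_count(768, 768))  # 589824
print(total)                       # 1179648
print(size_bytes)                  # 4718592
```

Under these assumptions the adapter holds about 1.2M parameters, or roughly 4.7 MB in float32.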
Compute Efficiency
Training can be 3-5x faster than full fine-tuning:
No weight gradients computed for the frozen base model
Only adapter parameters updated
Smaller optimizer state
Storage Efficiency
AIR files are compact:
Typical size: 1-10 MB per adapter
Compared with 500MB+ for full model checkpoints
Limitations
Transfer quality depends on architecture similarity
Very large dimension mismatches may reduce performance
Router accuracy depends on the quality of the training examples
Some architecture-specific optimizations may not transfer
Next Steps
Learn about AIR Format Specification in detail
Understand Dimension Projection methods
Explore LoRA Dispatcher for multi-domain use
Read Adding Custom Architectures to add new models