LLM Model Management

Upload and manage LLMs for your RAG pipeline
| Name | Description | Type | Status | Parameters | Quantization | Context Window | Last Updated |
|---|---|---|---|---|---|---|---|
| Llama 3.1 (8B) | Meta's Llama 3.1 8B model, optimized for local deployment | Local | Ready | 8B | GGUF (Q4_K_M) | 8,192 tokens | 2023-10-15 |
| Mistral (7B) | Mistral 7B model, good balance of performance and efficiency | Local | Ready | 7B | GGUF (Q5_K_M) | 8,192 tokens | 2023-09-20 |
| GPT-4o | OpenAI's GPT-4o model, accessed via API | API | Ready | Unknown | N/A | 128,000 tokens | 2023-10-22 |
| Custom BERT-based Model | Custom fine-tuned model for specific domain knowledge | Custom | Error | 330M | GGUF (Q4_0) | 4,096 tokens | 2023-10-18 |
Model Hosting
Configure how models are hosted and served

- Continuous batching improves throughput for multiple concurrent requests.
- Optimizes memory usage for handling multiple requests.
- Distributes the model across multiple GPUs (requires compatible hardware).
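The throughput gain from continuous batching comes from finished requests freeing their batch slot immediately, so queued requests join mid-flight instead of waiting for the whole batch to drain. A toy decode-step simulation (a hypothetical helper for illustration, not part of the serving stack) makes this concrete:

```python
from collections import deque

def continuous_batching_steps(request_lengths, max_batch):
    """Count decode steps to serve all requests when a finished request
    frees its batch slot immediately (continuous batching)."""
    queue = deque(request_lengths)  # tokens still to generate per queued request
    active = []                     # tokens remaining per in-flight request
    steps = 0
    while queue or active:
        # refill any free slots from the queue before the next decode step
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        steps += 1                  # one token decoded for every active request
        active = [r - 1 for r in active if r > 1]
    return steps

# five requests with a batch cap of 2: short requests hand their slot
# to queued ones as soon as they finish
print(continuous_batching_steps([5, 1, 1, 1, 5], max_batch=2))  # → 8
```

In this toy model the same workload takes 11 steps if each batch must fully drain before the next is admitted (batches [5,1], [1,1], [5]), versus 8 with continuous refill.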

Model Hub
Download pre-configured models from the hub
| Model | Description | Size | Context |
|---|---|---|---|
| Llama 3.1 (8B) | Meta's latest 8B parameter model | 4.2 GB (Q4_K_M) | 8K tokens |
| Mistral (7B) | Efficient 7B parameter model | 3.8 GB (Q4_K_M) | 8K tokens |
| Phi-3 (3.8B) | Microsoft's compact but powerful model | 2.1 GB (Q4_K_M) | 4K tokens |
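The listed download sizes follow roughly from parameter count times average bits per weight. A back-of-the-envelope sketch (assuming about 4.5 bits per weight for Q4_K_M, an approximation, and ignoring GGUF metadata overhead):

```python
def gguf_size_gib(n_params, bits_per_weight):
    """Approximate on-disk size of a quantized model in GiB:
    parameter count times average bits per weight, metadata ignored."""
    return n_params * bits_per_weight / 8 / 2**30

# 8B parameters at ~4.5 bits/weight lands near the 4.2 GB listed above
print(round(gguf_size_gib(8e9, 4.5), 1))  # → 4.2
```

The same arithmetic explains why the 3.8B Phi-3 download is roughly half the 8B Llama download at the same quantization level.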