LLM Model Management
Upload and manage LLMs for your RAG pipeline
| Name | Description | Type | Status | Parameters | Quantization | Context Window | Last Updated |
|---|---|---|---|---|---|---|---|
| Llama 3.1 (8B) | Meta's Llama 3.1 8B model, optimized for local deployment | Local | Ready | 8B | GGUF (Q4_K_M) | 8,192 tokens | 2023-10-15 |
| Mistral (7B) | Mistral 7B model, good balance of performance and efficiency | Local | Ready | 7B | GGUF (Q5_K_M) | 8,192 tokens | 2023-09-20 |
| GPT-4o | OpenAI's GPT-4o model, accessed via API | API | Ready | Unknown | N/A | 128,000 tokens | 2023-10-22 |
| Custom BERT-based Model | Custom fine-tuned model for specific domain knowledge | Custom | Error | 330M | GGUF (Q4_0) | 4,096 tokens | 2023-10-18 |
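The table above can be mirrored in code as a small in-memory registry, e.g. to filter out models that are not ready to serve. This is a hypothetical sketch: the `ModelEntry` class and its field names are illustrative assumptions, not part of the actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class ModelEntry:
    # Hypothetical record mirroring one row of the management table.
    name: str
    type: str            # "Local", "API", or "Custom"
    status: str          # "Ready" or "Error"
    parameters: str
    quantization: str
    context_window: int  # in tokens

REGISTRY = [
    ModelEntry("Llama 3.1 (8B)", "Local", "Ready", "8B", "GGUF (Q4_K_M)", 8192),
    ModelEntry("Mistral (7B)", "Local", "Ready", "7B", "GGUF (Q5_K_M)", 8192),
    ModelEntry("GPT-4o", "API", "Ready", "Unknown", "N/A", 128000),
    ModelEntry("Custom BERT-based Model", "Custom", "Error", "330M", "GGUF (Q4_0)", 4096),
]

def ready_models(registry):
    """Return only the entries that can currently serve requests."""
    return [m for m in registry if m.status == "Ready"]
```

A routing layer could then pick from `ready_models(REGISTRY)`, skipping the errored custom model automatically.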
Model Hosting
Configure how models are hosted and served
- Continuous batching: improves throughput for multiple concurrent requests
- Memory optimization: reduces memory usage when handling multiple requests
- Multi-GPU distribution: distributes the model across multiple GPUs (requires compatible hardware)
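The benefit of continuous batching can be illustrated with a toy scheduler: instead of waiting for an entire batch to finish, the server tops up in-flight slots from the queue as soon as a request completes. The code below is a simplified sketch; the queue, `max_batch` limit, and step loop are illustrative assumptions, not the actual serving implementation.

```python
from collections import deque

def run_continuous_batching(requests, max_batch=4):
    """Toy continuous-batching loop.

    Each request is (id, decode_steps_remaining). On every step the scheduler:
      1. refills free batch slots from the queue,
      2. advances every in-flight request by one decode step,
      3. retires finished requests immediately, freeing slots mid-batch.
    Returns the order in which requests complete.
    """
    queue = deque(requests)
    in_flight = {}   # request id -> steps remaining
    completed = []
    while queue or in_flight:
        # Top up the batch as soon as slots free up (the "continuous" part).
        while queue and len(in_flight) < max_batch:
            rid, steps = queue.popleft()
            in_flight[rid] = steps
        # One decode step for every in-flight request.
        for rid in list(in_flight):
            in_flight[rid] -= 1
            if in_flight[rid] == 0:
                del in_flight[rid]
                completed.append(rid)
    return completed
```

Note how short requests complete and release their slot without waiting for the longest request in the batch, which is where the throughput gain comes from.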
Model Hub
Download pre-configured models from the hub
- Llama 3.1 (8B): Meta's latest 8B parameter model. Size: 4.2 GB (Q4_K_M), Context: 8K tokens
- Mistral (7B): Efficient 7B parameter model. Size: 3.8 GB (Q4_K_M), Context: 8K tokens
- Phi-3 (3.8B): Microsoft's compact but powerful model. Size: 2.1 GB (Q4_K_M), Context: 4K tokens
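The download sizes listed follow roughly from the parameter count and the quantization's bits per weight (size ≈ params × bits-per-weight / 8). A back-of-the-envelope estimator is sketched below; the ~4.2 effective bits per weight used for a Q4_K_M mix is an approximation, and real GGUF files add tokenizer and metadata overhead, so treat results as ballpark figures.

```python
def estimated_gguf_size_gb(params_billion, bits_per_weight):
    """Rough quantized-file size: parameters * bits-per-weight / 8 bits-per-byte.

    Ignores embedding and metadata overhead, so the result is a ballpark only.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return round(bytes_total / 1e9, 1)

# Assuming ~4.2 effective bits per weight (an approximation for a Q4_K_M mix):
print(estimated_gguf_size_gb(8, 4.2))  # ballpark for an 8B model
```

This kind of estimate is handy for checking disk and VRAM budgets before queuing a download from the hub.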