#
Model Serving - Active Models
This is the list of AI models currently live and available on the Phoeniqs Model Service. All models are served on Phoeniqs infrastructure through an OpenAI-compatible API and are ready for production use.
#
Active Models
Input and Output: credits per million tokens. TPM: tokens per minute. Context window: maximum tokens per API call.
Token Throughput Disclaimer : The token-per-minute throughput figures provided are based on controlled testing conditions and are intended for benchmarking and comparison purposes only. Actual performance in production environments may vary significantly depending on workload characteristics, system configuration, model hosting provider, network conditions, and other operational factors. These results should not be interpreted as a guarantee of real-world performance.
Model Updates and Deprecation Disclaimer We reserve the right to modify, upgrade, or replace any AI models used in our services at any time. This may include deprecating older models and introducing newer versions as we deem necessary to maintain performance, security, and service quality. While we aim to provide notice when feasible, changes may occur without prior notification.
NOTE Pricing is subject to change at our discretion.
#
Active Models by Use Case
The groups below mirror the Active Models table: every live model in that table appears exactly once, grouped by primary workload.
Chat, assistants, and general text (instruction-following)
Multilingual dialogue, coding help, and broad assistant-style tasks: Apertus 70B, DeepSeek V3.2, GLM 4.5 Air 110B, GLM 4.6 (chat; deployed on demand), GPT OSS 120B, Granite 3.3 8B, Mistral 7B Instruct v0.3.Reasoning and chain-of-thought
Deliberate, step-heavy tasks: GLM 5.1, Qwen3 8B, QwQ 32B.Multimodal (images and text)
Image+text input and text output, or compact vision-language: Gemma 3 12B IT, Gemma 4 31B, Granite Vision 3.2 2B, Llama 4 Maverick, Llama 4 Scout 17B, Qwen3 VL 235B.RAG (retrieval-augmented generation)
Embeddings, reranking, and retrieval stacks: BGE M3 (embedding), BGE Reranker v2 M3, Granite Embedding 278M.OCR, layout, and document parsing
Optical compression, layout, and structured document workflows: DeepSeek OCR, MinerU 2.5.Speech
Transcription and speech-centric workflows: Whisper Large v3.
Agents, tool calling, and orchestration usually pick one primary LLM from the chat or reasoning groups, then add other pieces only when the workflow needs them: multimodal models for image inputs; OCR or MinerU to extract text or structure from documents; and RAG (embedding, reranking) when the agent must retrieve relevant passages from content you have already chunked and indexed.
#
Active Models by Risk
#
Using the models
Looking for ready-to-run examples? See the Model Service Guides:
- How to inference an AI model — what you need to make a call (Base URL, Model Name, API Key).
- Sample API calls — cURL examples for chat, embeddings, multimodal, OCR, and more.