Models Directory
Browse all 92 available models and their capabilities
Free Models (27)
These models are available to all users without any subscription or pay-as-you-go charges.
liquid/lfm-7b
liquid/lfm-3b
mistralai/ministral-3b
mistralai/ministral-8b
gryphe/mythomax-l2-13b
One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge
Context: 4096 tokens
Max output: 4096 tokens
amazon/nova-micro-v1
Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length...
Context: 128000 tokens
Max output: 5120 tokens
microsoft/phi-4
Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion...
Context: 16384 tokens
Max output: 16384 tokens
microsoft/wizardlm-2-7b
google/gemini-flash-1.5-8b
mistralai/mistral-7b-instruct
google/gemma-2-9b-it
meta-llama/llama-3.2-3b-instruct
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...
Context: 80000 tokens
Max output: N/A
meta-llama/llama-3.2-1b-instruct
Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate...
Context: 60000 tokens
Max output: N/A
meta-llama/llama-3.1-8b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to...
Context: 16384 tokens
Max output: 16384 tokens
qwen/qwen-2-7b-instruct
mistralai/mistral-7b-instruct-v0.3
meta-llama/llama-3-8b-instruct
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong...
Context: 8192 tokens
Max output: 16384 tokens
mistralai/mistral-nemo
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,...
Context: 131072 tokens
Max output: 16384 tokens
sao10k/l3-lunaris-8b
Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge....
Context: 8192 tokens
Max output: N/A
nousresearch/hermes-2-pro-llama-3-8b
Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced...
Context: 8192 tokens
Max output: 8192 tokens
openchat/openchat-7b
undi95/toppy-m-7b:nitro
amazon/nova-lite-v1
Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon, focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite...
Context: 300000 tokens
Max output: 5120 tokens
mistralai/pixtral-12b
Venice Uncensored 1.1
Flux Dev Uncensored (Image Generation)
Lustify SDXL (Image Generation)
Pro Models (44)
These models are available to Pro subscribers with unlimited usage included in the subscription.
thedrummer/unslopnemo-12b
UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.
Context: 32768 tokens
Max output: 32768 tokens
meta-llama/llama-3.1-70b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases. It has demonstrated strong...
Context: 131072 tokens
Max output: N/A
nousresearch/hermes-3-llama-3.1-70b
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...
Context: 131072 tokens
Max output: N/A
deepseek/deepseek-chat
DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations...
Context: 163840 tokens
Max output: 163840 tokens
microsoft/phi-3.5-mini-128k-instruct
ai21/jamba-1-5-mini
mistralai/codestral-mamba
openai/gpt-4o-mini
GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...
Context: 128000 tokens
Max output: 16384 tokens
anthropic/claude-3-haiku
Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance.
#multimodal
Context: 200000 tokens
Max output: 4096 tokens
cognitivecomputations/dolphin-mixtral-8x22b
google/gemma-2-27b-it
Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of...
Context: 8192 tokens
Max output: 2048 tokens
mistralai/mixtral-8x7b-instruct
Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion...
Context: 32768 tokens
Max output: 16384 tokens
mistralai/mistral-small-24b-instruct-2501
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed...
Context: 32768 tokens
Max output: 16384 tokens
gryphe/mythomist-7b
anthropic/claude-instant-1:beta
nvidia/llama-3.1-nemotron-70b-instruct
NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging Llama 3.1 70B architecture and Reinforcement Learning from Human Feedback (RLHF), it excels...
Context: 131072 tokens
Max output: 16384 tokens
deepseek/deepseek-chat-v3-0324
DeepSeek V3, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the original DeepSeek V3 and performs well...
Context: 163840 tokens
Max output: N/A
thedrummer/rocinante-12b
Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported:
- Expanded vocabulary with unique and expressive word choices
- Enhanced creativity for vivid narratives
- ...
Context: 32768 tokens
Max output: 32768 tokens
eva-unit-01/eva-qwen-2.5-14b
mistralai/mistral-tiny
mistralai/mistral-small
qwen/qwen-turbo
Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast speed and low cost, suitable for simple tasks.
Context: 1000000 tokens
Max output: 8192 tokens
qwen/qwen-plus
Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination.
Context: 131072 tokens
Max output: 32768 tokens
deepseek/deepseek-r1-distill-qwen-1.5b
deepseek/deepseek-r1-distill-qwen-32b
DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new...
Context: 32768 tokens
Max output: 32768 tokens
deepseek/deepseek-r1-distill-llama-70b
DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across...
Context: 131072 tokens
Max output: 16384 tokens
qwen/qvq-72b-preview
qwen/qwq-32b-preview
qwen/qwen-2.5-coder-32b-instruct
Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
- Significant improvements in code generation, code reasoning...
Context: 32768 tokens
Max output: N/A
mistralai/codestral-2501
meta-llama/llama-3.3-70b-instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...
Context: 131072 tokens
Max output: 131072 tokens
deepseek/deepseek-r1-distill-llama-3.1-70b
Venice Reasoning (QwQ-32B)
Venice Small (Qwen3-4B)
Venice Medium (Mistral-31-24B)
Venice Large 1.1 (Qwen3-235B)
Llama 3.2 3B
Llama 3.3 70B
Llama 3.1 405B
Dolphin 72B
Qwen 2.5 VL 72B
Qwen 2.5 Coder 32B
DeepSeek R1 671B
DeepSeek Coder V2 Lite
Pro Metered Models (21)
These premium models are available on a pay-as-you-go basis with per-token pricing.
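Per-token prices like those listed below make cost estimation simple arithmetic: multiply the prompt and completion token counts by their respective rates and sum. A minimal sketch, using the anthropic/claude-3.7-sonnet rates from this directory; the token counts are illustrative:

```python
# Estimate the pay-as-you-go cost of one request from per-token prices,
# as listed in this directory for metered models.

def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the total USD cost for a single request."""
    return input_tokens * input_price + output_tokens * output_price

# Example: 2,000 prompt tokens and 500 completion tokens at
# Claude 3.7 Sonnet rates ($0.000003 in, $0.000015 out per token)
cost = request_cost(2000, 500, 0.000003, 0.000015)
print(f"${cost:.4f}")  # → $0.0135
```

Note that output tokens are typically several times more expensive than input tokens, so long completions dominate the bill even for short prompts.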
anthropic/claude-3.7-sonnet
Input: $0.000003 per token
Output: $0.000015 per token
Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and...
Context: 200000 tokens
Max output: 128000 tokens
✓ Moderated
anthropic/claude-3.7-sonnet:thinking
Input: $0.000003 per token
Output: $0.000015 per token
Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and...
Context: 200000 tokens
Max output: 64000 tokens
✗ Unmoderated
deepseek/deepseek-r1
Input: $0.0000007 per token
Output: $0.0000025 per token
DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass....
Context: 64000 tokens
Max output: 16000 tokens
✗ Unmoderated
openai/gpt-4o-2024-11-20
Input: $0.0000025 per token
Output: $0.00001 per token
The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded...
Context: 128000 tokens
Max output: 16384 tokens
✓ Moderated
openai/o3-mini-high
Input: $0.0000011 per token
Output: $0.0000044 per token
OpenAI o3-mini-high is the same model as o3-mini with reasoning_effort set to high. o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and...
Context: 200000 tokens
Max output: 100000 tokens
✓ Moderated
allenai/llama-3.1-tulu-3-405b
aion-labs/aion-1.0
Input: $0.000004 per token
Output: $0.000008 per token
Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree...
Context: 131072 tokens
Max output: 32768 tokens
✗ Unmoderated
qwen/qwen-max
Input: $0.00000104 per token
Output: $0.00000416 per token
Qwen-Max, based on Qwen2.5, provides the best inference performance among Qwen models, especially for complex multi-step tasks. It's a large-scale MoE model that has been pretrained on over 20 trillion...
Context: 32768 tokens
Max output: 8192 tokens
✗ Unmoderated
openai/o1
Input: $0.000015 per token
Output: $0.00006 per token
The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason...
Context: 200000 tokens
Max output: 100000 tokens
✓ Moderated
x-ai/grok-2-1212
mistralai/mistral-large-2411
Input: $0.000002 per token
Output: $0.000006 per token
Mistral Large 2 2411 is an update of Mistral Large 2, released together with Pixtral Large 2411. It provides a significant upgrade on the previous Mistral Large 24.07, with notable...
Context: 131072 tokens
Max output: N/A
✗ Unmoderated
neversleep/llama-3.1-lumimaid-70b
x-ai/grok-beta
inflection/inflection-3-pi
Input: $0.0000025 per token
Output: $0.00001 per token
Inflection 3 Pi powers Inflection's Pi chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay. Pi...
Context: 8000 tokens
Max output: 1024 tokens
✗ Unmoderated
cohere/command-r-plus-08-2024
Input: $0.0000025 per token
Output: $0.00001 per token
command-r-plus-08-2024 is an update of Command R+ with roughly 50% higher throughput and 25% lower latency compared to the previous Command R+ version, while keeping the hardware footprint...
Context: 128000 tokens
Max output: 4000 tokens
✓ Moderated
ai21/jamba-1-5-large
01-ai/yi-large
neversleep/llama-3-lumimaid-70b
anthropic/claude-3-opus
anthropic/claude-3-sonnet
alpindale/goliath-120b
Input: $0.00000375 per token
Output: $0.0000075 per token
A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale. Credits to @chargoddard for developing the framework used to merge...
Context: 6144 tokens
Max output: 1024 tokens
✗ Unmoderated