KinkyGay

Models Directory

Browse all 92 available language models and their capabilities
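The model IDs listed below (e.g. `mistralai/ministral-8b`) are typically passed as the `model` field of an OpenAI-compatible chat-completions request. A minimal sketch of assembling such a request body — the payload shape assumes a typical OpenAI-style API, which this page does not itself document, and the prompt is illustrative:

```python
import json

def build_chat_request(model_id: str, prompt: str, max_tokens: int = 512) -> dict:
    """Assemble an OpenAI-style chat-completions body for a directory model ID."""
    return {
        "model": model_id,  # e.g. "gryphe/mythomax-l2-13b" from this directory
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request("gryphe/mythomax-l2-13b", "Write a short scene.")
print(json.dumps(body, indent=2))
```

The endpoint URL and authentication header are provider-specific and are not shown here.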

Free Models (27)

These models are available to all users without any subscription or pay-as-you-go charges.

liquid/lfm-7b

liquid/lfm-3b

mistralai/ministral-3b

mistralai/ministral-8b

gryphe/mythomax-l2-13b

One of the highest-performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge

Context: 4096 tokens

Max output: 4096 tokens
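When a model's maximum output equals its context length, as here, the prompt and the completion share one window: a long prompt leaves correspondingly fewer tokens for the reply. A minimal sketch of that budget arithmetic (the helper name is illustrative):

```python
def available_completion_tokens(prompt_tokens: int, context: int, max_output: int) -> int:
    """Tokens left for the completion once the prompt occupies part of the context."""
    return max(0, min(max_output, context - prompt_tokens))

# MythoMax L2 13B: 4096-token context shared between prompt and output.
print(available_completion_tokens(3000, context=4096, max_output=4096))  # 1096
```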

amazon/nova-micro-v1

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length...

Context: 128000 tokens

Max output: 5120 tokens

microsoft/phi-4

Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion...

Context: 16384 tokens

Max output: 16384 tokens

microsoft/wizardlm-2-7b

google/gemini-flash-1.5-8b

mistralai/mistral-7b-instruct

google/gemma-2-9b-it

meta-llama/llama-3.2-3b-instruct

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...

Context: 80000 tokens

Max output: N/A

meta-llama/llama-3.2-1b-instruct

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate...

Context: 60000 tokens

Max output: N/A

meta-llama/llama-3.1-8b-instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to...

Context: 16384 tokens

Max output: 16384 tokens

qwen/qwen-2-7b-instruct

mistralai/mistral-7b-instruct-v0.3

meta-llama/llama-3-8b-instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong...

Context: 8192 tokens

Max output: 16384 tokens

mistralai/mistral-nemo

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,...

Context: 131072 tokens

Max output: 16384 tokens

sao10k/l3-lunaris-8b

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge....

Context: 8192 tokens

Max output: N/A

nousresearch/hermes-2-pro-llama-3-8b

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced...

Context: 8192 tokens

Max output: 8192 tokens

openchat/openchat-7b

undi95/toppy-m-7b:nitro

amazon/nova-lite-v1

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon, focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite...

Context: 300000 tokens

Max output: 5120 tokens

mistralai/pixtral-12b

Venice Uncensored 1.1

Flux Dev Uncensored (Image Generation)

Lustify SDXL (Image Generation)

Pro Models (44)

These models are available to Pro subscribers with unlimited usage included in the subscription.

thedrummer/unslopnemo-12b

UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.

Context: 32768 tokens

Max output: 32768 tokens

meta-llama/llama-3.1-70b-instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases. It has demonstrated strong...

Context: 131072 tokens

Max output: N/A

nousresearch/hermes-3-llama-3.1-70b

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Context: 131072 tokens

Max output: N/A

deepseek/deepseek-chat

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations...

Context: 163840 tokens

Max output: 163840 tokens

microsoft/phi-3.5-mini-128k-instruct

ai21/jamba-1-5-mini

mistralai/codestral-mamba

openai/gpt-4o-mini

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...

Context: 128000 tokens

Max output: 16384 tokens

anthropic/claude-3-haiku

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance.

See the launch announcement and benchmark results here

#multimodal

Context: 200000 tokens

Max output: 4096 tokens

cognitivecomputations/dolphin-mixtral-8x22b

google/gemma-2-27b-it

Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of...

Context: 8192 tokens

Max output: 2048 tokens

mistralai/mixtral-8x7b-instruct

Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion...

Context: 32768 tokens

Max output: 16384 tokens

mistralai/mistral-small-24b-instruct-2501

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed...

Context: 32768 tokens

Max output: 16384 tokens

gryphe/mythomist-7b

anthropic/claude-instant-1:beta

nvidia/llama-3.1-nemotron-70b-instruct

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging Llama 3.1 70B architecture and Reinforcement Learning from Human Feedback (RLHF), it excels...

Context: 131072 tokens

Max output: 16384 tokens

deepseek/deepseek-chat-v3-0324

DeepSeek V3, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the original DeepSeek V3 model and performs really well...

Context: 163840 tokens

Max output: N/A

thedrummer/rocinante-12b

Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported:
- Expanded vocabulary with unique and expressive word choices
- Enhanced creativity for vivid narratives
- ...

Context: 32768 tokens

Max output: 32768 tokens

eva-unit-01/eva-qwen-2.5-14b

mistralai/mistral-tiny

mistralai/mistral-small

qwen/qwen-turbo

Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast speed and low cost, suitable for simple tasks.

Context: 1000000 tokens

Max output: 8192 tokens

qwen/qwen-plus

Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination.

Context: 131072 tokens

Max output: 32768 tokens

deepseek/deepseek-r1-distill-qwen-1.5b

deepseek/deepseek-r1-distill-qwen-32b

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new...

Context: 32768 tokens

Max output: 32768 tokens

deepseek/deepseek-r1-distill-llama-70b

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across...

Context: 131072 tokens

Max output: 16384 tokens

qwen/qvq-72b-preview

qwen/qwq-32b-preview

qwen/qwen-2.5-coder-32b-instruct

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements over CodeQwen1.5:
- Significant improvements in code generation, code reasoning...

Context: 32768 tokens

Max output: N/A

mistralai/codestral-2501

meta-llama/llama-3.3-70b-instruct

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...

Context: 131072 tokens

Max output: 131072 tokens

deepseek/deepseek-r1-distill-llama-3.1-70b

Venice Reasoning (QwQ-32B)

Venice Small (Qwen3-4B)

Venice Medium (Mistral-31-24B)

Venice Large 1.1 (Qwen3-235B)

Llama 3.2 3B

Llama 3.3 70B

Llama 3.1 405B

Dolphin 72B

Qwen 2.5 VL 72B

Qwen 2.5 Coder 32B

DeepSeek R1 671B

DeepSeek Coder V2 Lite

Pro Metered Models (21)

These premium models are available on a pay-as-you-go basis with per-token pricing.

anthropic/claude-3.7-sonnet

Input: $0.000003 per token

Output: $0.000015 per token

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and...

Context: 200000 tokens

Max output: 128000 tokens

✓ Moderated
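Pay-as-you-go cost follows directly from the per-token rates: multiply input and output token counts by their respective prices and sum. A quick sketch using the Claude 3.7 Sonnet rates listed above (the token counts are illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Total USD cost of one request at per-token pricing."""
    return input_tokens * in_price + output_tokens * out_price

# Claude 3.7 Sonnet: $0.000003 per input token, $0.000015 per output token.
cost = request_cost(10_000, 2_000, 3e-6, 15e-6)
print(f"${cost:.4f}")  # $0.0600
```

The same formula applies to every metered model below; only the two rates change.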

anthropic/claude-3.7-sonnet:thinking

Input: $0.000003 per token

Output: $0.000015 per token

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and...

Context: 200000 tokens

Max output: 64000 tokens

✗ Unmoderated

deepseek/deepseek-r1

Input: $0.0000007 per token

Output: $0.0000025 per token

DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass....

Context: 64000 tokens

Max output: 16000 tokens

✗ Unmoderated

openai/gpt-4o-2024-11-20

Input: $0.0000025 per token

Output: $0.00001 per token

The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded...

Context: 128000 tokens

Max output: 16384 tokens

✓ Moderated

openai/o3-mini-high

Input: $0.0000011 per token

Output: $0.0000044 per token

OpenAI o3-mini-high is the same model as o3-mini with reasoning_effort set to high. o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and...

Context: 200000 tokens

Max output: 100000 tokens

✓ Moderated

allenai/llama-3.1-tulu-3-405b

aion-labs/aion-1.0

Input: $0.000004 per token

Output: $0.000008 per token

Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree...

Context: 131072 tokens

Max output: 32768 tokens

✗ Unmoderated

qwen/qwen-max

Input: $0.00000104 per token

Output: $0.00000416 per token

Qwen-Max, based on Qwen2.5, provides the best inference performance among Qwen models, especially for complex multi-step tasks. It's a large-scale MoE model that has been pretrained on over 20 trillion...

Context: 32768 tokens

Max output: 8192 tokens

✗ Unmoderated

openai/o1

Input: $0.000015 per token

Output: $0.00006 per token

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason...

Context: 200000 tokens

Max output: 100000 tokens

✓ Moderated

x-ai/grok-2-1212

mistralai/mistral-large-2411

Input: $0.000002 per token

Output: $0.000006 per token

Mistral Large 2 2411 is an update of Mistral Large 2, released together with Pixtral Large 2411. It provides a significant upgrade on the previous Mistral Large 24.07, with notable...

Context: 131072 tokens

Max output: N/A

✗ Unmoderated

neversleep/llama-3.1-lumimaid-70b

x-ai/grok-beta

inflection/inflection-3-pi

Input: $0.0000025 per token

Output: $0.00001 per token

Inflection 3 Pi powers Inflection's Pi chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay. Pi...

Context: 8000 tokens

Max output: 1024 tokens

✗ Unmoderated

cohere/command-r-plus-08-2024

Input: $0.0000025 per token

Output: $0.00001 per token

command-r-plus-08-2024 is an update of Command R+ with roughly 50% higher throughput and 25% lower latency compared to the previous Command R+ version, while keeping the hardware footprint...

Context: 128000 tokens

Max output: 4000 tokens

✓ Moderated

ai21/jamba-1-5-large

01-ai/yi-large

neversleep/llama-3-lumimaid-70b

anthropic/claude-3-opus

anthropic/claude-3-sonnet

alpindale/goliath-120b

Input: $0.00000375 per token

Output: $0.0000075 per token

A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale. Credits to @chargoddard for developing the framework used to merge...

Context: 6144 tokens

Max output: 1024 tokens

✗ Unmoderated