Response shape
The endpoint returns an OpenAI-compatible list object.
Field reference
The response container type. This is usually list.
The currently available model records. Treat this as dynamic rather than a permanent catalog.
The model ID to pass as model in /v1/chat/completions. You can also use selectors such as default, fast, and pro where supported.
The access tier for the model.
turbo models are fast models available to anonymous and free-token users, subject to lower rate and token limits.
pro models are available to Pro subscribers and users with a topped-up balance. Pro subscription allowance is calculated dynamically across the billing period and can be checked in the dashboard.
Per-model pricing metadata used to calculate request cost for paid usage and paid allowance accounting.
Input-token price in the listed currency and unit.
Output-token price in the listed currency and unit.
The pricing currency, for example USD.
The pricing unit, for example 1M tokens.
Optional minimum cost applied to each request, even when the input and output token total would cost less.
Optional cache accounting floor. When present, cache-related billing treats each request as using at least this many cache tokens.
How pricing is calculated. token means usage is priced from input and output token counts.
Input and output types supported by the model. Models with image in modalities.input can accept image inputs for vision workflows. Output is usually text.
The maximum context the model can process in one request, including prompt input and generated output. Models may report this in tokens, chars, or both.
true means the model is only available through paid usage accounting, such as a Pro allowance or topped-up balance.
Whether the model supports streamed responses.
Whether the model supports JSON mode.
Whether the model supports reasoning-style behavior.
Whether the model supports tool and function calling.
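As an illustration of reading the list, the following minimal sketch picks out the models that accept image input. It assumes the usual OpenAI-style layout (a top-level data array whose records carry an id) and that modalities.input is a list of strings. The sample payload and its model IDs are hypothetical and are not real catalog entries.

```python
from typing import Any


def vision_model_ids(listing: dict[str, Any]) -> list[str]:
    """Return the IDs of models whose modalities.input includes "image"."""
    ids = []
    for record in listing.get("data", []):
        # Assumption: modalities.input is a list of strings such as "text" or "image".
        inputs = record.get("modalities", {}).get("input", [])
        if "image" in inputs:
            ids.append(record["id"])
    return ids


# Hypothetical payload for illustration only; real records carry more fields.
sample = {
    "object": "list",
    "data": [
        {"id": "example-text-model",
         "modalities": {"input": ["text"], "output": ["text"]}},
        {"id": "example-vision-model",
         "modalities": {"input": ["text", "image"], "output": ["text"]}},
    ],
}

print(vision_model_ids(sample))
```

Because the catalog is dynamic, a filter like this is probably best run against a freshly fetched list rather than a cached copy.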
Access and limits
| Access type | Models | Token availability | Rate limits |
|---|---|---|---|
| Anonymous | turbo models | 500,000 tokens per day | 1 request/second, 10/minute, 60/hour |
| Free token | turbo models | 1,000,000 tokens per day | 2 requests/second, 40/minute, 100/hour |
| Pro subscription | pro and turbo models | Dynamic Pro allowance for the billing period | Higher paid limits |
| Topped-up balance | pro and turbo models | Usage billed from balance | Higher paid limits |
You can see current Pro allowance and billing status in the LLM7.io dashboard.
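A client can stay under the published request limits by pacing itself. The sketch below is one possible approach, a sliding-window limiter configured with the anonymous limits from the table. The class name, method names, and the sliding-window strategy are assumptions for illustration; the service may count requests differently (for example with fixed windows), so treat the result as a conservative guide rather than a guarantee.

```python
import time
from collections import deque

# (max_requests, window_seconds) pairs taken from the anonymous row of the table.
ANONYMOUS_LIMITS = [(1, 1), (10, 60), (60, 3600)]


class SlidingWindowLimiter:
    """Hypothetical client-side limiter; not part of any official SDK."""

    def __init__(self, limits, clock=time.monotonic):
        self.limits = list(limits)
        self.clock = clock
        self.sent = deque()  # timestamps of requests already sent

    def wait_time(self) -> float:
        """Seconds to wait before the next request fits within every limit."""
        now = self.clock()
        longest = max(window for _, window in self.limits)
        # Drop timestamps that are older than the longest window.
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        delay = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # This request must leave the window before another one fits.
                blocking = recent[-max_requests]
                delay = max(delay, blocking + window - now)
        return delay

    def record(self) -> None:
        """Note that a request was sent at the current time."""
        self.sent.append(self.clock())
```

A typical loop would call time.sleep(limiter.wait_time()) before each request and limiter.record() right after sending it. For free-token access, the same class could be configured with [(2, 1), (40, 60), (100, 3600)].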
Estimating request cost
For token-priced models, calculate cost from the input and output token counts, each multiplied by its price per pricing unit.
If minimum_request_price_usd is present, the charged request cost is at least that value.
Read each model's currency and unit fields instead of assuming all models share the same pricing unit forever.
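The calculation can be sketched as follows. This is a minimal estimate under stated assumptions: the function and parameter names are hypothetical, the caller has already converted the pricing unit into a token count (1M tokens becomes 1,000,000), and the prices in the usage note are made-up numbers, not real rates. Cache accounting is not modeled.

```python
from decimal import Decimal


def estimate_request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price,           # price per pricing unit, e.g. "0.50"
    output_price,          # price per pricing unit, e.g. "1.50"
    unit_tokens: int = 1_000_000,   # assumption: unit "1M tokens"
    minimum_request_price=None,     # value of minimum_request_price_usd, if present
) -> Decimal:
    """Estimate the cost of one request for a token-priced model."""
    # Decimal avoids binary floating-point drift on small currency amounts.
    cost = (
        Decimal(input_tokens) * Decimal(str(input_price))
        + Decimal(output_tokens) * Decimal(str(output_price))
    ) / Decimal(unit_tokens)
    if minimum_request_price is not None:
        # The charged cost is at least the per-request minimum.
        cost = max(cost, Decimal(str(minimum_request_price)))
    return cost
```

With hypothetical prices of 0.50 (input) and 1.50 (output) per 1M tokens, a request with 1,200 input tokens and 300 output tokens comes to (1200 × 0.50 + 300 × 1.50) / 1,000,000 = 0.00105. If the model also listed a minimum of 0.002, the estimate would be raised to 0.002.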
