LLM Reference

LLM Reference is a tracker for the wild west of AI, used by tech leaders to quickly find and compare the newest models and providers before they ship.


About LLM Reference

LLM Reference is a decision-support directory for engineers and technology leaders who need to choose the right large language model (LLM) and provider in a fast-moving AI landscape. It tracks a large and growing catalog of models, providers, and research labs, with data refreshed weekly to capture new releases, verified price changes, and benchmark updates. The core value proposition is simple: stop hunting through scattered sources and start shipping with confidence.

Whether you are building a coding assistant, an agentic workflow, a writing tool, or a research pipeline, LLM Reference gives you a single, trustworthy place to compare models side-by-side, see which provider offers the cheapest pricing for frontier output, and browse curated Editors' Picks for specific tasks: coding, agents, writing, research, image generation, and video creation. The site is designed for fast triage: you can identify the right model for your job, find the most cost-effective provider, and get back to building.

A weekly Pulse feed highlights what changed, including new models, price cuts, and benchmark refreshes, so you stay informed without the noise. The site is built by the Data Advantage project and updated daily. It currently tracks 1,843 language models, 140 providers, and 247 labs, and the latest weekly Pulse reported 177 new models, 53 price cuts, and 368 benchmark refreshes. This is not just a directory; it is a live intelligence hub that cuts through the hype and gives you the hard data to make the right call.

Features of LLM Reference

Comprehensive Model Directory and Search

Search through over 1,800 language models from 140 providers and 247 research labs with a powerful, fast search engine. Filter by task type like coding, RAG, agents, long context, vision, classification, or JSON/tool use to zero in on exactly what you need. The directory is updated daily, so you always have the latest data on new releases and provider changes.

Editors' Picks and Curated Boards

Stop guessing which model is best for your specific job. LLM Reference features expert-curated editors' picks for six core categories: Coding, Agents, Writing, Research, Image generation, and Video creation. Each pick comes with a detailed rationale, benchmark scores, and eligibility lists, so you can see why a model like Claude Fable 5 dominates coding or why FLUX.2 Dev is the photoreal leader for images.

Pulse Feed for Real-Time Market Changes

The Pulse feed is your weekly snapshot of everything that moved in the LLM market. The current snapshot shows 177 new models, 53 verified price cuts, and 368 benchmark refreshes at a glance. This feature keeps you informed without the noise, highlighting exactly what changed this week so you can adjust your strategy immediately. No more doomscrolling through Twitter or Reddit for scraps of news.

Side-by-Side Model Comparison

Compare any two models head-to-head with a dedicated comparison tool. See performance metrics, pricing, and benchmark scores laid out in a clean, readable format. Whether you are debating Claude Fable 5 versus GPT-5.5 or trying to decide between Gemini 3 Pro and DeepSeek V4 Pro, this feature gives you the data you need to make a confident, informed decision fast.

Frontier Pricing Tracker

Find the cheapest provider for frontier output with a live tracker that shows the lowest price per million output tokens. Currently, Hunyuan HY3 Preview via Tencent Cloud TI Platform leads at $0.260 per 1M output tokens. This feature is a game-changer for teams optimizing cost without sacrificing quality, giving you a direct line to the best deals on the market.

Cheat Sheet for Most-Asked Comparisons

Get instant answers to the most common model matchups without digging through the full directory. The Cheat Sheet features pre-built comparisons like Claude Fable 5 vs Claude Opus 4.8, GPT-5.5 vs Gemini 3.1 Pro Preview, and more. This is a time-saver for anyone who needs a quick, reliable take on the biggest debates in the AI community.

Use Cases of LLM Reference

Selecting the Best Coding Model for Your Team

Engineering teams building coding assistants or agentic workflows need a model that excels at code generation, debugging, and tool use. LLM Reference helps you quickly identify top performers like Claude Fable 5, which posts an 80.3% SWE-bench Pro score and a 96% SWE-bench Verified score on Vals.ai. You can compare it against alternatives like Claude Opus 4.8 or GPT-5.5 to find the right fit for your stack, saving hours of research and trial-and-error.

Optimizing Cost for High-Volume API Calls

Startups and enterprises running high-volume inference need to minimize costs without sacrificing performance. The Frontier Pricing Tracker shows you exactly which provider offers the cheapest output for frontier models, currently Hunyuan HY3 Preview at $0.260 per 1M output tokens. You can cross-reference this with benchmark scores to ensure you are getting the best value for your specific use case, whether that is summarization, translation, or data analysis.
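A quick back-of-the-envelope calculation shows what a per-million-token price means for a real workload. The sketch below is a minimal illustration, not a feature of LLM Reference: it assumes a hypothetical volume of 2 billion output tokens per month, uses the $0.260 per 1M output tokens figure quoted above, and ignores any input-token charges. The function name `output_cost_usd` is illustrative only.

```python
# Minimal sketch: estimate spend on output tokens from a price quoted
# per 1M output tokens. The monthly volume below is hypothetical, and
# input-token charges are deliberately not included.

TOKENS_PER_PRICE_UNIT = 1_000_000  # prices are quoted per 1M output tokens


def output_cost_usd(output_tokens: int, price_per_million: float) -> float:
    """Return the USD cost of generating `output_tokens` at `price_per_million`."""
    if output_tokens < 0 or price_per_million < 0:
        raise ValueError("token count and price must be non-negative")
    return output_tokens / TOKENS_PER_PRICE_UNIT * price_per_million


if __name__ == "__main__":
    price = 0.260                   # USD per 1M output tokens, as quoted above
    monthly_tokens = 2_000_000_000  # hypothetical workload: 2B output tokens/month
    cost = output_cost_usd(monthly_tokens, price)
    print(f"Estimated monthly output cost: ${cost:,.2f}")
```

Swapping in another provider's output price lets you compare options on the same hypothetical workload before cross-checking benchmark scores.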

Building a Creative Production Pipeline

Creative teams working on image generation, video production, or voice synthesis need specialized models that deliver top-tier results. LLM Reference's Editors' Picks for Creatives cover Image (FLUX.2 Dev), Video (Veo 3.1), Voice TTS (ElevenLabs), Transcription (Whisper large-v3-turbo), and Music (Suno AI). This lets you build a complete pipeline with confidence, knowing each component is the editors' top pick for its task.

Research and Knowledge Work Acceleration

Knowledge workers and researchers need models that excel at complex reasoning, summarization, and document analysis. LLM Reference highlights top picks like Claude Fable 5 for research (GDPval-AA ELO 1932) and Claude Opus 4.7 for writing (Chatbot Arena 1503). You can quickly find the right model for your workflow, whether you are drafting reports, analyzing financial data, or synthesizing research papers, all while tracking the latest benchmark updates.

Frequently Asked Questions

How often is the data on LLM Reference updated?

The data is refreshed weekly with new models, verified price changes, and benchmark updates. The Pulse feed highlights exactly what changed each week, including new releases, price cuts, and benchmark refreshes; the latest weekly Pulse reported 177 new models, 53 price cuts, and 368 benchmark refreshes. The site itself is updated daily, so you always have the most current information.

How are the Editors' Picks determined?

Editors' Picks are curated by the LLM Reference team based on a combination of benchmark performance, real-world testing, and community feedback. Each pick comes with a detailed rationale, including specific benchmark scores and eligibility lists. For example, Claude Fable 5 is the top coding pick due to its 80.3% SWE-bench Pro score and a 96% SWE-bench Verified score on Vals.ai. The picks are updated regularly as new models and benchmarks emerge.

Can I compare models from different providers side-by-side?

Yes, the Compare tool allows you to put any two models head-to-head. You can see performance metrics, pricing, and benchmark scores in a clean, readable format. This is ideal for making informed decisions between models like Claude Fable 5 versus GPT-5.5 or Gemini 3 Pro versus DeepSeek V4 Pro. The tool is designed to give you all the data you need to make a confident choice fast.

Is there a way to see the cheapest pricing for frontier models?

Absolutely. The Frontier Pricing Tracker shows the lowest price per million output tokens for frontier models. Currently, Hunyuan HY3 Preview via Tencent Cloud TI Platform leads at $0.260 per 1M output tokens. This feature is updated weekly to reflect verified provider price cuts, so you can always find the best deal on top-tier performance without wasting time hunting through scattered pricing pages.

What types of tasks can I filter models by?

You can filter the model directory by seven task types: Coding, RAG, Agents, Long context, Vision, Classification, and JSON/Tool use, plus a Browse all option. This makes it easy to zero in on models that are optimized for your specific use case, whether you are building a coding assistant, a document analysis pipeline, or a multimodal creative tool. Each filter surfaces the most relevant models based on benchmark performance and community validation.

Pricing of LLM Reference

LLM Reference itself is a free-to-use directory and decision-support tool. There are no subscription plans or paywalls for accessing the model directory, Editors' Picks, Pulse feed, comparison tools, or any other core features. The platform is built by the Data Advantage project and is available to everyone at no cost. However, note that the pricing data shown for individual models (e.g., $0.260 per 1M output tokens) reflects the costs charged by the respective providers (like Tencent Cloud, Anthropic, or OpenAI) for API access. LLM Reference does not charge for its directory or analysis services.
