Dataset Library
Reasoning traces for distilling frontier models
Curated datasets built by querying Claude, GPT, Gemini and other frontier models with diverse coding, math, and reasoning prompts. Designed for training small open models that still think clearly.
What's included
Each dataset includes detailed reasoning traces, carefully filtered conversations, and metadata ready for fine-tuning. Listings are synced hourly from Hugging Face.
claude-4.5-opus-high-reasoning-250x
Distilled from Claude Opus 4.5
gpt-5.2-high-reasoning-250x
gemini-3-pro-preview-high-reasoning-250x
Distilled from Gemini 3 Pro
gemini-3-pro-preview-high-reasoning-1000x
Distilled from Gemini 3 Pro
Pony-Alpha-15k
claude-sonnet-4.5-high-reasoning-250x
Distilled from Claude Sonnet 4.5
Step-3.5-Flash-2600x
gpt-5.1-codex-max-1000x
Distilled from GPT-5.1
convo-v1
claude-haiku-4.5-high-reasoning-1700x
gemini-3-flash-preview
glm-4.7-2000x
MiniMax-M2.1-Code-SFT
gpt-5.1-high-reasoning-1000x
Distilled from GPT-5.1
deepseek-v3.2-speciale-OpenCodeReasoning-3k
Distilled from DeepSeek v3.2 Speciale
claude-haiku-4.5-1700x
deepseek-v3.2-speciale-1000x
Distilled from DeepSeek v3.2 Speciale
deepseek-v3.2-speciale-openr1-math-3k
Distilled from DeepSeek v3.2 Speciale
gemini-2.5-flash-11000x
Distilled from Gemini 2.5 Flash
MiniMax-M2.1-8800x
kimi-k2-thinking-1000x
Distilled from Kimi K2
sherlock-thinking-alpha-11000x
gpt-5-codex-250x
Distilled from GPT-5 Codex
gpt-5-codex-1000x
Distilled from GPT-5 Codex
Aurora-Alpha-15.5k
gemini-3-flash-preview-1000x
grok-code-fast-1-1000x
Distilled from Grok
glm-4.7-350x
minimax-m2.1-1000x
MiMo-V2-Flash-2300x
brainstorm-v3.1-grok-4-fast-200x
Distilled from Grok
Gemini-3-Flash-Preview-VIBE
mistral-small-creative-500x
Distilled from Mistral
gemini-2.5-flash-lite-2509-preview-1000x
Distilled from Gemini 2.5 Flash
glm-4.6-250x
Distilled from GLM 4.6
polaris-alpha-1000x
kimi-k2-thinking-250x
Distilled from Kimi K2