UC Berkeley AI Sandbox BETA: Available AI Models

This page lists AI models available through the UC Berkeley AI Sandbox BETA.

Note: You must be logged in to the UC Berkeley AI Sandbox BETA for the direct "Start Chat" and "Copy Link" buttons to work properly.

For any questions about these models or to report issues, please contact us at aiplatform-team@lists.berkeley.edu.

Capability icons: Thinking; Vision (receives images).

Amazon

  • amazon.nova-pro-v1:0

    A highly capable multimodal model that processes text and images, with a 300K token context window. Offers an excellent balance of accuracy, speed, and cost efficiency for a wide range of tasks. Supports over 200 languages (optimized for 15 major languages).

  • amazon.nova-micro-v1:0

    A text-only model optimized for lowest latency responses at very low cost. Features a 128K token context window. Particularly efficient for text-based workflows where speed is essential.

Anthropic

  • claude-3-5-haiku

    A fast and intelligent model that combines rapid response times with improved reasoning capabilities. Claude 3.5 Haiku matches the performance of Claude 3 Opus while being much more efficient. Perfect for interactive chatbots, customer service applications, and processing large volumes of data quickly. Excellent for code suggestions and educational platforms where speed matters.

  • claude-sonnet-4-5

    Most capable Anthropic model for long-horizon agents, coding, research, and computer use. Suited to production-grade multi-step agents and complex workflows. Context window: 200K tokens.

  • claude-haiku-4-5

    Near-frontier Haiku model offering fast, cost-effective performance for real-time agents, coding sub-agents, and high-volume experiences. Context window: 200K tokens.

  • claude-3-7-sonnet

    A highly capable model featuring strong coding capabilities and enhanced reasoning. Can produce detailed outputs up to 128K tokens long, making it ideal for advanced coding workflows, complex problem-solving, and tasks requiring comprehensive analysis. Excellent for detailed explanations, creative writing, and technical documentation.

  • claude-sonnet-4

    A highly advanced model optimized for coding and development tasks. Balances performance, responsiveness, and cost, making it excellent for production workloads like code reviews, bug fixes, and feature development. Ideal for software engineering tasks, technical writing, and complex reasoning across various domains.

DeepSeek

  • deepseek

    DeepSeek V3 is a powerful Mixture-of-Experts (MoE) model with 671B total parameters (37B active per token). Excels in reasoning tasks, front-end web development, and Chinese content creation. Notable for its enhanced Chinese writing style, multi-turn interactions, and translation quality. Features improved function calling accuracy and detailed report analysis capabilities. (A brief sketch of the expert-routing idea appears at the end of this section.)

  • deepseek.r1

    A reasoning-focused chat model designed for step-by-step problem solving in language, scientific reasoning, and coding tasks. It has 671B total parameters (37B active) and a 128K token context window. Particularly strong with Chinese writing, offering enhanced style and quality for medium-to-long-form content, improved multi-turn interactions, and optimized translation quality.
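
Both DeepSeek entries describe a Mixture-of-Experts (MoE) architecture: a router picks a small subset of expert sub-networks for each token, so only about 37B of the 671B parameters are used per forward pass. The toy sketch below (plain NumPy with made-up sizes, not DeepSeek's actual code or dimensions) illustrates the top-k routing idea:

    # Toy illustration of Mixture-of-Experts routing (not DeepSeek's real code).
    # Each token's hidden state is routed to only top_k of num_experts small
    # experts, so the "active" parameter count per token is far below the total.
    import numpy as np

    rng = np.random.default_rng(0)
    d_model, num_experts, top_k = 16, 8, 2

    # One tiny linear "expert" per slot; a real model uses full MLP blocks.
    experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(num_experts)]
    router = rng.standard_normal((d_model, num_experts)) * 0.1

    def moe_layer(x):
        # x: (d_model,) hidden state for a single token
        logits = x @ router                   # one routing score per expert
        chosen = np.argsort(logits)[-top_k:]  # indices of the top_k experts
        weights = np.exp(logits[chosen])
        weights /= weights.sum()              # softmax over the chosen experts only
        # Mix the outputs of just the selected experts; the rest stay idle.
        return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

    token = rng.standard_normal(d_model)
    print(moe_layer(token).shape)             # (16,) -- same shape as the input
    print(f"active experts per token: {top_k} of {num_experts}")

Per-token compute scales with top_k rather than num_experts, which is how a model with a very large total parameter count can keep inference cost closer to that of a much smaller dense model.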

Google

  • Gemini 2.5 Flash Lite

    Google's most cost-effective model optimized for high throughput and low latency. Features a 1 million token context window and supports text and image input. The fastest model in the 2.5 line while outperforming Gemini 2.0 Flash on most benchmarks. Perfect for real-time applications and high-volume tasks where speed and cost efficiency are priorities.

  • gemini-2.5-flash

    Google's best model for price-performance balance, offering well-rounded capabilities. Supports text and image input for versatile multimodal tasks. Ideal for tasks requiring a balance of cost efficiency, speed, and intelligent reasoning across various domains.

  • gemini-2.5-pro

    Google's most advanced reasoning model prior to Gemini 3 Pro (below), designed to solve complex problems with maximum response accuracy. Supports text and image input for multimodal understanding. Excels at difficult problems, analyzing large datasets, complex coding, and visual analysis. A strong choice when you need a high level of intelligence and reasoning capability.

  • gemini-3-flash-preview

    Gemini 3 Flash (Preview) is a multimodal Google model supporting advanced vision and reasoning. Designed for high-capability visual understanding and deep reasoning, it suits image-plus-text tasks and agentic workflows. Released in 2025, with a 1M token context window and strong performance on multimodal benchmarks.

  • gemini-3-pro-preview

    Google's most advanced reasoning Gemini model, capable of solving complex problems. 1M token context window. Knowledge cutoff: January 2025; released November 2025.

Meta

  • [Meta] Llama Maverick 17B Instruct

    A powerful and efficient model from Meta that understands both text and images. With 17 billion active parameters, it delivers top-tier performance for a wide range of tasks, from creative writing to analyzing images. It supports 12 languages, including English, Spanish, French, and German, making it a great choice for multilingual conversations and complex tasks.

  • llama3-2-1b

    A lightweight model from Meta, designed to run efficiently on devices like phones and laptops. While available here for you to explore, it is optimized for on-device speed rather than the complex reasoning of larger models, and it offers a good way to experience the kind of fast, responsive AI built for mobile use. Supports 8 languages, including English, Spanish, French, and German.

Microsoft

  • [Microsoft] Phi-4 Mini

    A lightweight and versatile model from Microsoft with a large 128K token context window. It excels at tasks requiring strong reasoning, like math and logic. Trained on a high-quality, multilingual dataset (with data up to June 2024), it's a great choice for a wide range of applications, especially where efficiency is important.

  • [Microsoft] Phi-4 Multimodal

    This powerful multimodal model from Microsoft processes both text and images. It offers strong language capabilities, a large 128K token context window, and is ideal for tasks requiring visual understanding alongside text. Trained on data up to June 2024, it supports numerous languages.

Mistral

  • mistral-7b

    A small but versatile model excellent for text summarization, classification, and code completion. Despite its smaller size, it handles complex tasks with remarkable efficiency. Works with text up to 8,000 tokens and excels at step-by-step reasoning, including math problems. Perfect for applications needing a balance of performance and efficiency. Available for unrestricted use through an open license.

  • mistral-large-2407

    Mistral's flagship model with 123B parameters and a 128K token context window. Excels at code generation, mathematical reasoning, and providing concise responses. Supports dozens of languages including French, German, Spanish, Chinese, Japanese, and 80+ coding languages. Features advanced function calling capabilities and reduced hallucination tendencies.

OpenAI

  • gpt-4.1-nano

    OpenAI's fastest and most cost-effective model with exceptional performance at a small size. Features a 1 million token context window and scores impressively on academic benchmarks. Supports text and image input, making it ideal for tasks like classification, autocompletion, and rapid processing where speed and efficiency are priorities. Delivers strong reasoning capabilities at the lowest cost point.

  • o3

    OpenAI's most powerful reasoning model that excels across coding, mathematics, science, and visual perception. Sets new benchmarks in complex problem-solving and makes 20% fewer major errors than previous models on difficult real-world tasks. Supports text and image input with state-of-the-art performance in multimodal reasoning. Ideal for complex queries requiring multi-faceted analysis, programming challenges, and visual tasks like analyzing charts and graphics.

  • o4-mini

    A cost-effective reasoning model from OpenAI designed for complex problem-solving tasks. Supports text and image input with strong performance in mathematics, coding, and logical reasoning. Offers a good balance of capability and efficiency, making it suitable for applications requiring analytical thinking at a lower cost than larger models.

  • gpt-3.5-turbo

    An earlier generation model from OpenAI that offers basic conversational AI capabilities at a lower cost. Features a 16k token context window and 4k max output tokens. While newer models significantly outperform it in reasoning, coding, and complex tasks, GPT-3.5 Turbo can be useful for simple applications, cost-sensitive projects, or as a baseline for comparing performance with more advanced models.

  • gpt-4o-mini

    A compact and efficient multimodal model from OpenAI with vision capabilities. Offers strong performance across various tasks while being more cost-effective than larger models. Ideal for applications requiring both text and image understanding at scale.

  • gpt-5-chat

    OpenAI's chat-optimized GPT-5 model with enhanced capabilities for conversational AI. Features a 128K token context window with up to 16K output tokens and a knowledge cutoff of September 30, 2024. Tuned specifically for chat use cases, offering strong reasoning and response quality. Ideal for complex conversations, detailed analysis, and applications that need a capable conversational model.

  • gpt-5-mini

    GPT-5 mini — a faster, cost-efficient version of GPT-5 for well-defined tasks and precise prompts. Reasoning: High. Speed: Fast. Supports text and image input, text output. Features a 400,000 token context window and up to 128,000 max output tokens. Knowledge cutoff: May 31, 2024.

  • gpt-5.1

    GPT-5.1 is OpenAI's flagship model for coding and agentic tasks with configurable reasoning effort. Reasoning: Higher. Speed: Fast. Supports text and image input, text output. Features a 400,000 token context window and up to 128,000 max output tokens. Knowledge cutoff: Sep 30, 2024.

  • gpt-5.2

    GPT-5.2 represents a major advancement in AI reasoning and reliability, featuring enhanced multi-step problem-solving capabilities and significantly reduced hallucinations compared to previous versions. This model excels at complex mathematical reasoning, scientific analysis, and professional coding tasks, with dramatic improvements on benchmarks like AIME (100% accuracy) and substantial gains in abstract reasoning. GPT-5.2 offers superior instruction following and multimodal understanding, making it ideal for both rapid daily tasks and expert-level technical work. 400K token context window.

  • gpt-oss-20b

    OpenAI's gpt-oss-20b, an open-weight model available under the Apache 2.0 license. Strong for coding, reasoning, and tool use; optimized for efficient deployment. Context window: 128K tokens.

  • gpt-oss-120b

    OpenAI's gpt-oss-120b, an open-weight model available under the Apache 2.0 license. Designed for coding, scientific analysis, and complex reasoning with strong tool use and long-context handling. Context window: 128K tokens.

Qwen

  • Qwen3 32B

    Balanced dense model offering strong reasoning and general-purpose performance with predictable deployment on standard infrastructure. Good for reasoning, coding, and research use cases. Note: in this deployment Qwen is currently limited to about 2,000 words (~3,500 tokens) per conversation.

  • qwen3-235b

    Frontier Mixture-of-Experts model delivering strong multilingual reasoning and long-context capability — ideal for enterprise-grade agentic tasks and research that needs extended context. Note: in this deployment Qwen is currently limited to about 2,000 words (~3,500 tokens) per conversation.

  • qwen3-coder-30b

    Coder-focused MoE model that balances strong coding and reasoning capability with practical deployment costs. Great for code generation, debugging, and developer workflows. Note: in this deployment Qwen is currently limited to about 2,000 words (~3,500 tokens) per conversation.

  • qwen3-coder-480b

    Flagship MoE coding model delivering frontier-level reasoning and state-of-the-art performance across software development and math-heavy tasks. Best for large-scale coding and agent workflows. Note: in this deployment Qwen is currently limited to about 2,000 words (~3,500 tokens) per conversation.

xAI

  • grok-3

    xAI's enterprise-focused model designed for business applications like finance, healthcare, and legal work. Excels at data extraction, coding, and text summarization with exceptional instruction-following capabilities. Features a large 131K token context window to handle extensive documents and supports over 25 languages including English, Spanish, French, German, Japanese, and Chinese.

  • grok-3-mini

    A lightweight reasoning model from xAI that shows its work by providing detailed thinking traces. Designed for complex problem-solving in coding, mathematics, and science with adjustable "thinking budgets" for different levels of analysis. Features the same 131K context window as Grok 3 and supports 25+ languages, making it ideal for tasks requiring transparent reasoning.

Eliza

  • Eliza

    Eliza is a re-creation of the classic 1966 ELIZA chatbot that uses pattern-matching and scripted transformations to reflect user input. It demonstrates early conversational NLP techniques and produces reflective, therapist-like replies that encourage users to continue talking. Eliza is intended for demonstrations and education: it does not respond with the same richness as modern LLMs and is not suitable as a source of advice. Visit the project repository for background and source code: Eliza_GPT_lambda on GitHub.
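
For a feel of the technique Eliza demonstrates, the toy sketch below shows ELIZA-style pattern matching: the input is matched against regular-expression rules, pronouns are "reflected" back at the user, and the captured text is slotted into a canned reply. (This is a generic illustration, not the code from the Eliza_GPT_lambda repository.)

    # Minimal ELIZA-style responder: regex rules plus pronoun reflection.
    # A toy sketch of the 1966 technique, not the Sandbox's actual Eliza code.
    import random
    import re

    REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
                   "you": "I", "your": "my"}

    RULES = [
        (r"i need (.*)",  ["Why do you need {0}?", "Would it really help you to get {0}?"]),
        (r"i am (.*)",    ["How long have you been {0}?", "Why do you think you are {0}?"]),
        (r"because (.*)", ["Is that the real reason?", "What other reasons come to mind?"]),
        (r"(.*)",         ["Please tell me more.", "How does that make you feel?"]),
    ]

    def reflect(fragment):
        # Swap first and second person so the reply points back at the user.
        return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

    def respond(text):
        # The catch-all (.*) rule guarantees some reply is always returned.
        for pattern, replies in RULES:
            match = re.match(pattern, text.strip(), re.IGNORECASE)
            if match:
                reply = random.choice(replies)
                return reply.format(*(reflect(g) for g in match.groups()))

    print(respond("I need a break from my thesis"))
    # e.g. "Why do you need a break from your thesis?"

Because every reply comes from a fixed rule table, Eliza's "understanding" is entirely superficial, which is exactly what makes it a useful contrast with the modern models listed above.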

Agent Models