UC Berkeley AI Sandbox BETA: Available AI Models

This page lists AI models available through the UC Berkeley AI Sandbox BETA.

Note: You must be logged in to the UC Berkeley AI Sandbox BETA for the direct "Start Chat" and "Copy Link" buttons to work properly.

For any questions about these models or to report issues, please contact us at aiplatform-team@lists.berkeley.edu.

Capability icons: Thinking; Vision (receives images).

Amazon

  • amazon.nova-pro-v1:0

    A highly capable multimodal model that processes text and images, with a 300K token context window. Offers an excellent balance of accuracy, speed, and cost efficiency for a wide range of tasks. Supports over 200 languages (optimized for 15 major languages).

  • amazon.nova-micro-v1:0

    A text-only model optimized for lowest latency responses at very low cost. Features a 128K token context window. Particularly efficient for text-based workflows where speed is essential.

Anthropic

  • claude-3-5-haiku

    A fast and intelligent model that combines rapid response times with improved reasoning capabilities. Claude 3.5 Haiku matches the performance of Claude 3 Opus while being much more efficient. Perfect for interactive chatbots, customer service applications, and processing large volumes of data quickly. Excellent for code suggestions and educational platforms where speed matters.

  • claude-sonnet-4-5

    Most capable Anthropic model for long-horizon agents, coding, research, and computer use. Suited to production-grade multi-step agents and complex workflows. Context window: 200K tokens.

  • claude-haiku-4-5

    Near-frontier Haiku model offering fast, cost-effective performance for real-time agents, coding sub-agents, and high-volume experiences. Context window: 200K tokens.

  • claude-3-7-sonnet

    A highly capable model featuring strong coding capabilities and enhanced reasoning. Can produce detailed outputs up to 128K tokens long, making it ideal for advanced coding workflows, complex problem-solving, and tasks requiring comprehensive analysis. Excellent for detailed explanations, creative writing, and technical documentation.

  • claude-sonnet-4

    A highly advanced model optimized for coding and development tasks. Balances performance, responsiveness, and cost, making it excellent for production workloads like code reviews, bug fixes, and feature development. Ideal for software engineering tasks, technical writing, and complex reasoning across various domains.

DeepSeek

  • deepseek

    DeepSeek V3 is a powerful Mixture-of-Experts (MoE) model with 671B total parameters (37B active per token). Excels in reasoning tasks, front-end web development, and Chinese content creation. Notable for its enhanced Chinese writing style, multi-turn interactions, and translation quality. Features improved function calling accuracy and detailed report analysis capabilities. (A brief sketch of the expert-routing idea appears at the end of this section.)

  • deepseek.r1

    A reasoning-focused chat model designed for step-by-step problem solving in language, scientific reasoning, and coding tasks. It has 671B total parameters (37B active) and a 128K token context window. Particularly strong with Chinese writing, offering enhanced style and quality for medium-to-long-form content, improved multi-turn interactions, and optimized translation quality.
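
Both DeepSeek entries describe a Mixture-of-Experts (MoE) architecture: a router picks a small subset of expert sub-networks for each token, so only about 37B of the 671B parameters are used per forward pass. The toy sketch below (plain NumPy with made-up sizes, not DeepSeek's actual code or dimensions) illustrates the top-k routing idea:

    # Toy illustration of Mixture-of-Experts routing (not DeepSeek's real code).
    # Each token's hidden state is routed to only top_k of num_experts small
    # experts, so the "active" parameter count per token is far below the total.
    import numpy as np

    rng = np.random.default_rng(0)
    d_model, num_experts, top_k = 16, 8, 2

    # One tiny linear "expert" per slot; a real model uses full MLP blocks.
    experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(num_experts)]
    router = rng.standard_normal((d_model, num_experts)) * 0.1

    def moe_layer(x):
        # x: (d_model,) hidden state for a single token
        logits = x @ router                   # one routing score per expert
        chosen = np.argsort(logits)[-top_k:]  # indices of the top_k experts
        weights = np.exp(logits[chosen])
        weights /= weights.sum()              # softmax over the chosen experts only
        # Mix the outputs of just the selected experts; the rest stay idle.
        return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

    token = rng.standard_normal(d_model)
    print(moe_layer(token).shape)             # (16,) -- same shape as the input
    print(f"active experts per token: {top_k} of {num_experts}")

Per-token compute scales with top_k rather than num_experts, which is how a model with a very large total parameter count can keep inference cost closer to that of a much smaller dense model.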

Google

  • Gemini 2.5 Flash Lite

    Google's most cost-effective model optimized for high throughput and low latency. Features a 1 million token context window and supports text and image input. The fastest model in the 2.5 line while outperforming Gemini 2.0 Flash on most benchmarks. Perfect for real-time applications and high-volume tasks where speed and cost efficiency are priorities.

  • gemini-2.5-flash

    Google's best model for price-performance balance, offering well-rounded capabilities. Supports text and image input for versatile multimodal tasks. Ideal for tasks requiring a balance of cost efficiency, speed, and intelligent reasoning across various domains.

  • gemini-2.5-pro

    Google's most advanced reasoning model prior to Gemini 3 Pro (below), designed to solve complex problems with maximum response accuracy. Supports text and image input for multimodal understanding. Excels at difficult problems, analyzing large datasets, complex coding, and visual analysis. A strong choice when you need a high level of intelligence and reasoning capability.

  • gemini-3-flash-preview

    Gemini 3 Flash (Preview) is a multimodal Google model supporting advanced vision and reasoning. Designed for high-capability visual understanding and deep reasoning, it suits image-plus-text tasks and agentic workflows. Released in 2025, with a 1M token context window and strong performance on multimodal benchmarks.

  • gemini-3-pro-preview

    Google's most advanced reasoning Gemini model, capable of solving complex problems. 1M token context window. Knowledge cutoff: January 2025; released November 2025.

Meta

  • [Meta] Llama Maverick 17B Instruct

    A powerful and efficient model from Meta that understands both text and images. With 17 billion active parameters, it delivers top-tier performance for a wide range of tasks, from creative writing to analyzing images. It supports 12 languages, including English, Spanish, French, and German, making it a great choice for multilingual conversations and complex tasks.

  • llama3-2-1b

    A lightweight model from Meta, designed to run efficiently on devices like phones and laptops. While available here for you to explore, it is optimized for on-device speed rather than the complex reasoning of larger models, and it offers a good way to experience the kind of fast, responsive AI built for mobile use. Supports 8 languages, including English, Spanish, French, and German.

Microsoft

  • [Microsoft] Phi-4 Mini

    A lightweight and versatile model from Microsoft with a large 128K token context window. It excels at tasks requiring strong reasoning, like math and logic. Trained on a high-quality, multilingual dataset (with data up to June 2024), it's a great choice for a wide range of applications, especially where efficiency is important.

  • [Microsoft] Phi-4 Multimodal

    This powerful multimodal model from Microsoft processes both text and images. It offers strong language capabilities, a large 128K token context window, and is ideal for tasks requiring visual understanding alongside text. Trained on data up to June 2024, it supports numerous languages.

Mistral

  • mistral-7b

    A small but versatile model excellent for text summarization, classification, and code completion. Despite its smaller size, it handles complex tasks with remarkable efficiency. Works with text up to 8,000 tokens and excels at step-by-step reasoning, including math problems. Perfect for applications needing a balance of performance and efficiency. Available for unrestricted use through an open license.

  • mistral-large-2407

    Mistral's flagship model with 123B parameters and a 128K token context window. Excels at code generation, mathematical reasoning, and providing concise responses. Supports dozens of languages including French, German, Spanish, Chinese, Japanese, and 80+ coding languages. Features advanced function calling capabilities and reduced hallucination tendencies.

OpenAI

  • gpt-4.1-nano

    OpenAI's fastest and most cost-effective model with exceptional performance at a small size. Features a 1 million token context window and scores impressively on academic benchmarks. Supports text and image input, making it ideal for tasks like classification, autocompletion, and rapid processing where speed and efficiency are priorities. Delivers strong reasoning capabilities at the lowest cost point.

  • o3

    OpenAI's most powerful reasoning model that excels across coding, mathematics, science, and visual perception. Sets new benchmarks in complex problem-solving and makes 20% fewer major errors than previous models on difficult real-world tasks. Supports text and image input with state-of-the-art performance in multimodal reasoning. Ideal for complex queries requiring multi-faceted analysis, programming challenges, and visual tasks like analyzing charts and graphics.

  • o4-mini

    A cost-effective reasoning model from OpenAI designed for complex problem-solving tasks. Supports text and image input with strong performance in mathematics, coding, and logical reasoning. Offers a good balance of capability and efficiency, making it suitable for applications requiring analytical thinking at a lower cost than larger models.

  • gpt-3.5-turbo

    An earlier generation model from OpenAI that offers basic conversational AI capabilities at a lower cost. Features a 16k token context window and 4k max output tokens. While newer models significantly outperform it in reasoning, coding, and complex tasks, GPT-3.5 Turbo can be useful for simple applications, cost-sensitive projects, or as a baseline for comparing performance with more advanced models.

  • gpt-4o-mini

    A compact and efficient multimodal model from OpenAI with vision capabilities. Offers strong performance across various tasks while being more cost-effective than larger models. Ideal for applications requiring both text and image understanding at scale.

  • gpt-5-chat

    OpenAI's chat-optimized GPT-5 model with enhanced capabilities for conversational AI. Features a 128K token context window with up to 16K output tokens and a knowledge cutoff of September 30, 2024. Tuned specifically for chat use cases, offering strong reasoning and response quality. Ideal for complex conversations, detailed analysis, and applications that need a capable conversational model.

  • gpt-5-mini

    GPT-5 mini — a faster, cost-efficient version of GPT-5 for well-defined tasks and precise prompts. Reasoning: High. Speed: Fast. Supports text and image input, text output. Features a 400,000 token context window and up to 128,000 max output tokens. Knowledge cutoff: May 31, 2024.

  • gpt-5.1

    GPT-5.1 is OpenAI's flagship model for coding and agentic tasks with configurable reasoning effort. Reasoning: Higher. Speed: Fast. Supports text and image input, text output. Features a 400,000 token context window and up to 128,000 max output tokens. Knowledge cutoff: Sep 30, 2024.

  • gpt-5.2

    GPT-5.2 represents a major advancement in AI reasoning and reliability, featuring enhanced multi-step problem-solving capabilities and significantly reduced hallucinations compared to previous versions. This model excels at complex mathematical reasoning, scientific analysis, and professional coding tasks, with dramatic improvements on benchmarks like AIME (100% accuracy) and substantial gains in abstract reasoning. GPT-5.2 offers superior instruction following and multimodal understanding, making it ideal for both rapid daily tasks and expert-level technical work. 400K token context window.

  • gpt-oss-20b

    OpenAI's gpt-oss-20b, an open-weight model available under the Apache 2.0 license. Strong for coding, reasoning, and tool use; optimized for efficient deployment. Context window: 128K tokens.

  • gpt-oss-120b

    OpenAI's gpt-oss-120b, an open-weight model available under the Apache 2.0 license. Designed for coding, scientific analysis, and complex reasoning with strong tool use and long-context handling. Context window: 128K tokens.

Qwen

  • Qwen3 32B

    Balanced dense model offering strong reasoning and general-purpose performance with predictable deployment on standard infrastructure. Good for reasoning, coding, and research use cases. Note: in this deployment Qwen is currently limited to about 2,000 words (~3,500 tokens) per conversation.

  • qwen3-235b

    Frontier Mixture-of-Experts model delivering strong multilingual reasoning and long-context capability — ideal for enterprise-grade agentic tasks and research that needs extended context. Note: in this deployment Qwen is currently limited to about 2,000 words (~3,500 tokens) per conversation.

  • qwen3-coder-30b

    Coder-focused MoE model that balances strong coding and reasoning capability with practical deployment costs. Great for code generation, debugging, and developer workflows. Note: in this deployment Qwen is currently limited to about 2,000 words (~3,500 tokens) per conversation.

  • qwen3-coder-480b

    Flagship MoE coding model delivering frontier-level reasoning and state-of-the-art performance across software development and math-heavy tasks. Best for large-scale coding and agent workflows. Note: in this deployment Qwen is currently limited to about 2,000 words (~3,500 tokens) per conversation.

xAI

  • grok-3

    xAI's enterprise-focused model designed for business applications like finance, healthcare, and legal work. Excels at data extraction, coding, and text summarization with exceptional instruction-following capabilities. Features a large 131K token context window to handle extensive documents and supports over 25 languages including English, Spanish, French, German, Japanese, and Chinese.

  • grok-3-mini

    A lightweight reasoning model from xAI that shows its work by providing detailed thinking traces. Designed for complex problem-solving in coding, mathematics, and science with adjustable "thinking budgets" for different levels of analysis. Features the same 131K context window as Grok 3 and supports 25+ languages, making it ideal for tasks requiring transparent reasoning.

Eliza

  • Eliza

    Eliza is a re-creation of the classic 1966 ELIZA chatbot that uses pattern-matching and scripted transformations to reflect user input. It demonstrates early conversational NLP techniques and produces reflective, therapist-like replies that encourage users to continue talking. Eliza is intended for demonstrations and education: it does not respond with the same richness as modern LLMs and is not suitable as a source of advice. Visit the project repository for background and source code: Eliza_GPT_lambda on GitHub.
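
For a feel of the technique Eliza demonstrates, the toy sketch below shows ELIZA-style pattern matching: the input is matched against regular-expression rules, pronouns are "reflected" back at the user, and the captured text is slotted into a canned reply. (This is a generic illustration, not the code from the Eliza_GPT_lambda repository.)

    # Minimal ELIZA-style responder: regex rules plus pronoun reflection.
    # A toy sketch of the 1966 technique, not the Sandbox's actual Eliza code.
    import random
    import re

    REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
                   "you": "I", "your": "my"}

    RULES = [
        (r"i need (.*)",  ["Why do you need {0}?", "Would it really help you to get {0}?"]),
        (r"i am (.*)",    ["How long have you been {0}?", "Why do you think you are {0}?"]),
        (r"because (.*)", ["Is that the real reason?", "What other reasons come to mind?"]),
        (r"(.*)",         ["Please tell me more.", "How does that make you feel?"]),
    ]

    def reflect(fragment):
        # Swap first and second person so the reply points back at the user.
        return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

    def respond(text):
        # The catch-all (.*) rule guarantees some reply is always returned.
        for pattern, replies in RULES:
            match = re.match(pattern, text.strip(), re.IGNORECASE)
            if match:
                reply = random.choice(replies)
                return reply.format(*(reflect(g) for g in match.groups()))

    print(respond("I need a break from my thesis"))
    # e.g. "Why do you need a break from your thesis?"

Because every reply comes from a fixed rule table, Eliza's "understanding" is entirely superficial, which is exactly what makes it a useful contrast with the modern models listed above.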

Agent Models