We have all seen websites dedicated to LLM testing. With Google Stax, you can test your own LLMs: create a new project, enter your prompt, and compare the results side by side. For OpenAI and other providers' models, you will need to enter your API credentials. Gemini, OpenAI, Claude, Mistral, Grok, and DeepSeek models are all supported.
For each model, you can enter your own system instructions, and you can upload your dataset in CSV format. For your evaluator, you can change the prompt instructions and pick which model it runs on. A sample prompt is included, but you can use AI to improve even that.
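If you would rather generate the dataset than type it by hand, a few lines of Python can produce the CSV. This is only a minimal sketch: the column names `prompt` and `expected_output` and the helper `build_dataset_csv` are assumptions made for illustration, not a documented Stax schema, so check what Stax expects before uploading.

```python
import csv
import io

# Hypothetical column names; Stax may expect different headers.
FIELDNAMES = ["prompt", "expected_output"]


def build_dataset_csv(rows):
    """Serialize a list of test-case dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative test cases; the second one contains quotes and commas,
# which the csv module escapes automatically.
test_cases = [
    {
        "prompt": "Summarize: The cat sat on the mat.",
        "expected_output": "A cat sat on a mat.",
    },
    {
        "prompt": 'Translate "hello, world" to French.',
        "expected_output": "bonjour, le monde",
    },
]

csv_text = build_dataset_csv(test_cases)
```

Writing `csv_text` to a file such as `stax_dataset.csv` gives you something to upload. Using the `csv` module instead of joining strings by hand keeps prompts with commas or quotes intact.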