Introducing Help Me Choose

AI peer review: when AIs cross-examine each other to help you synthesize diverse perspectives

Jimmy Lin, Chief Scientist and Professor, University of Waterloo


Today, we are proud to introduce “Help Me Choose”, a new Yupp product feature where AIs critique each other and debate among themselves to help users synthesize diverse perspectives and get the best answer out of their own dedicated “AI council”.

From the very beginning, Yupp has imagined a future where there will be thousands, if not millions, of AIs interacting with users on a daily basis. This is fast becoming a reality, as we are living in a world where new and powerful models are announced almost daily. We believe that users can greatly benefit from multiple AIs collaborating and competing with each other.

Obtaining Multiple AI Answers From Yupp

Yupp is a fun and easy way to discover, compare, and use the latest AIs – all while helping to shape the future of the field. We provide access to the latest AIs (800 and counting), all for free. Instead of just one AI responding to your prompts, you can interact with two (or more) AIs at the same time.

Why would you want that? Maybe you're curious about the latest AIs. Maybe you enjoy having multiple perspectives. In real life, you might consult different friends on different topics: Why should it be different with AIs? This one might be great for brainstorming, that one might excel at helping you answer complex questions, and a third AI might be the best at helping you reword delicate emails. You might even have a go-to AI that excels at explaining moral quandaries in language that even a five-year-old can understand.

Yupp helps with this. At every turn, we analyze your prompt and choose responses from two AIs to offer you diverse, high-quality perspectives (and of course, you can ask for even more AIs to chime in). Think of it as your own personal AI council of advisors!

However, this powerful capability creates its own challenge: having diverse perspectives means that you have more text to read and synthesize. If only Yupp could somehow help you choose the best response… perhaps using the AIs themselves?

AI Peer Review

This is exactly what “Help Me Choose” (HMC) does: We let the AIs critique each other and themselves, and also bring in a third AI to review both responses.

Perhaps it’s easier to illustrate with an example. Let’s consider the age-old question of Jordan vs. LeBron as the greatest of all time (GOAT). Perhaps it’s a debate you’ve engaged in yourself? Let’s ask Yupp and see what the AIs have to say:

Indeed, both Claude Sonnet 4.5 and Grok 4 present persuasive arguments… but how do you decide between them?

Let’s invoke “Help Me Choose”:

You’ll see that the feature provides additional feedback in a fixed structure. In particular:

On the top, Yupp invokes “review by a 3rd AI” to adjudicate, offering a neutral review of both responses.
On the bottom, Yupp provides “model cross-check”, where the same two AIs are invoked to critique both their own responses and the other’s. Here, we’re getting Claude Sonnet 4.5 and Grok 4 to be self-reflective and also to engage with each other. Indeed, they both poke at each other’s arguments and refine their own responses.
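
For readers who like to see the shape of things in code, here is a minimal sketch of the two-part structure described above. It is an illustration only, not Yupp's implementation: the `ask` callable, the prompt wording, the `help_me_choose` function, and the reviewer name "Reviewer X" are all hypothetical, and `fake_ask` is a stand-in so the example runs without calling any real model.

```python
from typing import Callable, Dict

# Hypothetical signature (an assumption): ask(model_name, prompt) -> reply text.
Ask = Callable[[str, str], str]


def help_me_choose(prompt: str, responses: Dict[str, str],
                   reviewer: str, ask: Ask) -> Dict[str, object]:
    """Collect a third-AI review plus a cross-check from each responding model."""
    if len(responses) != 2:
        raise ValueError("this sketch expects exactly two responses")
    if reviewer in responses:
        raise ValueError("the reviewer should be a third model")

    transcript = "\n\n".join(
        f"Response from {name}:\n{text}" for name, text in responses.items()
    )

    # Top panel: a third AI reviews both responses without picking a winner.
    review = ask(
        reviewer,
        f"User prompt: {prompt}\n\n{transcript}\n\n"
        "Describe where these responses agree and differ. Do not pick a winner.",
    )

    # Bottom panel: each original model critiques its own response and the other's.
    cross_check = {
        name: ask(
            name,
            f"User prompt: {prompt}\n\n{transcript}\n\n"
            f"You wrote the response labeled {name}. "
            "Critique both responses, including your own.",
        )
        for name in responses
    }
    return {"review": review, "cross_check": cross_check}


def fake_ask(model: str, prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return f"[{model}] saw {len(prompt)} characters"


result = help_me_choose(
    "Jordan or LeBron: who is the GOAT?",
    {"Claude Sonnet 4.5": "placeholder answer A", "Grok 4": "placeholder answer B"},
    reviewer="Reviewer X",  # hypothetical third model
    ask=fake_ask,
)
print(result["review"])
for name, critique in result["cross_check"].items():
    print(name, "->", critique)
```

Swapping `fake_ask` for a function that calls real models would yield three calls per prompt in this sketch: one review and two cross-checks.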

Let your own AI council help you choose

Here’s an analogy to illustrate our thinking behind “Help Me Choose”: Yupp has assembled a “council” of AI models for your prompt. Just like with a group of well-informed human experts, the council members are given an opportunity to critique each other, and themselves, in light of their initial responses.

And, just as a group of human experts discussing a question might have a leader, your council of AI models has a “council elder” – in this case, Yupp’s own customized AI model, which offers you helpful insights about the different models’ responses. It’s like a TL;DR: short, clear, and with a bit of sass!

It’s important to note that HMC does not suggest which response is better. Instead, it merely highlights the similarities and differences between the AI responses. HMC tells it like it is – you decide which response you like better. In this case, Jordan vs. LeBron comes down to you. At the end of the day, it’s about your preferences as a user; ultimately, you’re the arbiter of your own “taste”.

Behind the Scenes

“Help Me Choose” represents our playful take on what the field calls “LLM-as-a-judge”. Researchers have long applied AI models to “judge” the output of other AI models, and sometimes a model’s own output (using a technique known as self-reflective prompting). In my research group at the University of Waterloo, we’ve extended this idea to “nuggets”, or atomic facts that can be automatically extracted from and identified in answers from RAG (Retrieval-Augmented Generation) systems to assess answer quality. These automated metrics correlate well enough with manual judgments to serve as a proxy for them during system training, thereby accelerating progress.
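
To make the nugget idea concrete, here is a toy sketch that scores an answer by the fraction of nuggets it contains. It is not the method used in the research described above: the `nugget_recall` function is hypothetical, the nuggets and the answer are made up for illustration, and plain case-insensitive substring matching stands in for the automatic identification step, which in practice would be far more forgiving of paraphrase.

```python
from typing import Iterable


def nugget_recall(answer: str, nuggets: Iterable[str]) -> float:
    """Fraction of nuggets found in the answer via case-insensitive substring match."""
    nugget_list = list(nuggets)
    if not nugget_list:
        return 0.0  # nothing to check against
    text = answer.lower()
    found = sum(1 for nugget in nugget_list if nugget.lower() in text)
    return found / len(nugget_list)


# Illustrative data only (an assumption, not drawn from any real evaluation set).
nuggets = ["six championships", "four championships", "all-time leading scorer"]
answer = ("Jordan won six championships, while LeBron is the NBA's "
          "all-time leading scorer.")

score = nugget_recall(answer, nuggets)
print(f"{score:.2f}")
```

Here two of the three nuggets appear in the answer, so the score is two thirds. A score like this is only a proxy: it rewards coverage of known facts and says nothing about style, tone, or which answer a given user would prefer.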

Pushing these ideas even further, LLM-based evaluations can perhaps tie into verifiable reward signals that feed reinforcement learning (RL) algorithms. This would allow model providers to train LLMs using RL in domains where there isn’t a single correct answer and where answers can be complex, multi-faceted, and heterogeneous. These are all exciting research developments, but for many months we have been grappling with a different question: For the purposes of the Yupp consumer product, how do these technical advances benefit everyday users? It would not make sense to expose fine-grained annotations and automatic reward signals in a consumer product for everyday use, but users would no doubt still benefit from AI peer review. This initial release of HMC represents our current thinking, staying true to our approach to brand and product design.

“Help Me Choose” manifests another idea we’ve been playing with, dating back to a vision paper we shared in December 2024: there are a number of products that enable you to interact with multiple AIs, but all the ones we’ve seen facilitate only one-way communication between you and each AI. That’s incredibly limiting. With HMC, we pull together a multi-party dialogue involving you and multiple AIs: you talk to the AIs and the AIs talk to each other in a lively “AI council”. We believe these interactions result in rich exchanges and insightful dialogue not possible with one-on-one interactions. Yet another compelling reason to use Yupp.

The Future

Yupp imagines a future where you, the user, stand at the center of a community of AIs and other users, all custom tailored around your specific needs. “Help Me Choose” is merely our first experiment in realizing this vision. Stay tuned as we further refine this feature. In the meantime, we’d love to hear what you think!

We are just getting started, and we wish to engage the community in collaborations that will empower humanity to shape the future of AI. If you’re interested in working on these problems with us, drop us a note at research@yupp.ai and let’s talk!
