Comtoise Klokken

manifold.markets
https://manifold.markets › SG
Top OSWorld score in 2025? | Manifold
Resolved MKT. Background OSWorld is a benchmark for evaluating multimodal AI agents on real-world computer tasks in open-ended environments. It tests an AI's ability to navigate operating systems, …
manifold.markets
https://manifold.markets › Bayesian
Humanity’s Last Exam lists grok 4 at 45%+? | Manifold
Grok 4's score is now up at 25.4%, but I'd suggest waiting to see if they release Grok 4-heavy or Grok 4 (heavy or not) with reasoning capabilities before resolving. They released Grok 4, Grok 4-Heavy (and …
manifold.markets
https://manifold.markets › EricNeyman › highest-epochacknowledged-frontierm
Highest Epoch-acknowledged FrontierMath score at EOY2026?
While OpenAI has claimed that o3-mini achieved 32% on FrontierMath, I don't really believe them, plus they used an ungodly amount of compute. When judging how much progress has been made on …
manifold.markets
https://manifold.markets › Rice
Will 10+ AI models get released in March? | Manifold
Mar 8, 2026 · Resolved MKT. Based on prior similar markets for model releases, the following clarifications have been added: GPT-5.4-Codex is sufficient for GPT-5.4 (as it counted in Feb AI …
manifold.markets
https://manifold.markets › a-topthree-ai-lab-delays-a-frontier
A top-three AI lab delays a frontier model release six months for ...
A top-three AI lab delays a frontier model release six months for safety reasons?
manifold.markets
https://manifold.markets › MingCat › outcomes-of-trumps-strait-of-hormuz
Outcomes of Trump's Strait of Hormuz ultimatum | Manifold
Iran closes and restricts usage of the Strait of Hormuz and the U.S. either makes a deal or decides to pull back.
manifold.markets
https://manifold.markets › dog
Will any AI model score >80% on Epoch's Frontier Math Benchmark in …
Resolved NO. Background The FrontierMath benchmark, created by Epoch AI, is designed to test AI models' mathematical reasoning capabilities. As of December 2024, OpenAI's o3 reasoning model …
manifold.markets
https://manifold.markets › CDBiddulph › will-an-openai-model-design-an-impr
Will an OpenAI model design an improved version of an existing drug …
Jan 1, 2026 · Resolved 50%. Would-be strawberry man Riley Coyote commented on Sam Altman's post, asking "can I tell them about the thing?" Sam replied "which thing?" (possibly implying that Sam …
manifold.markets
https://manifold.markets › mjau › openai-releases-new-flagship-model
OpenAI releases a new flagship model by November 15, 2025?
Resolved NO. This market resolves to YES if OpenAI publicly announces and releases a new flagship large language model (successor to GPT-4o or equivalent) by November 15, 2025. Recent context: …
manifold.markets
https://manifold.markets › VerySeriousPoster › will...
Will Claude Opus be ranked in the top 20 on the Chatbot Arena ...
Hey @VerySeriousPoster -- this market just closed! The original Claude 3 Opus from March 2024 is nowhere near the top 20 on Chatbot Arena anymore. Current top spots are Claude Opus 4.6, Gemini …
manifold.markets
https://manifold.markets › Bayesian
Claude Sonnet 5 released this week? | Manifold
Resolved NO. If whether Sonnet 5 was released is ambiguous (various valid definitions yield a different resolution decision), uninvolved moderators will be asked to resolve the market based on …
