A trading competition between AI models

Six LLMs compete trading crypto perpetual futures to show which model is best at trading. I have been watching for two weeks.

There's an interesting experiment in finance that I have been following closely for the past few days. It's a competition between AI trading agents that use the common LLMs to trade perpetual futures on BTC, ETH, SOL, BNB, DOGE and XRP.

Alpha Arena by nof1 is a good piece of marketing for their future AI trading platform, but it also confirms at least one assumption: LLMs are as clueless as humans about what moves the price of a security.

I have been watching, full of selfish hope, since the experiment started on 18 October 2025. Twelve days later the hope is gone and my faith in human intelligence is restored.

Alpha Arena gave six LLMs $10k of real money and the task of acting as autonomous trading agents on crypto perpetual futures, presumably executing on Hyperliquid. The prompts appear to be as detailed as possible, giving each LLM as much context as it can handle.

Each request is stuffed with price data and competition updates. For example:

A sample prompt

It has been 13051 minutes since you started trading. The current time is 2025-10-31 14:41:10.077012 and you've been invoked 6260 times. Below, we are providing you with a variety of state data, price data, and predictive signals so you can discover alpha. Below that is your current account information, value, performance, positions, etc.

ALL OF THE PRICE OR SIGNAL DATA BELOW IS ORDERED: OLDEST → NEWEST

Timeframes note: Unless stated otherwise in a section title, intraday series are provided at 3‑minute intervals. If a coin uses a different interval, it is explicitly stated in that coin’s section.

CURRENT MARKET STATE FOR ALL COINS

ALL BTC DATA

current_price = 110061.5, current_ema20 = 109949.943, current_

[...] etc.
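The `current_ema20` field in the sample prompt is presumably a 20-period exponential moving average of the 3-minute prices. As a rough illustration of how such a line might be produced, here is a minimal Python sketch. The function names `ema` and `market_state_line` are hypothetical, and the smoothing factor 2 / (period + 1) and the seeding with the first price are common conventions that I am assuming. Nothing here is confirmed as Alpha Arena's actual code.

```python
from typing import Sequence


def ema(prices: Sequence[float], period: int = 20) -> float:
    """Exponential moving average of a price series ordered oldest to newest.

    Assumption: smoothing factor alpha = 2 / (period + 1), seeded with the
    first price. The competition's real formula is not published in the post.
    """
    if not prices:
        raise ValueError("prices must not be empty")
    alpha = 2.0 / (period + 1)
    value = prices[0]
    for price in prices[1:]:
        # Each new price pulls the average toward itself by a factor alpha.
        value = alpha * price + (1 - alpha) * value
    return value


def market_state_line(prices: Sequence[float], period: int = 20) -> str:
    """Format one line in the style of the sample prompt (hypothetical helper)."""
    current = prices[-1]
    return f"current_price = {current}, current_ema{period} = {ema(prices, period):.3f}"
```

With a price above its EMA, as in the BTC line of the sample prompt, a model that reasons mostly from price action would likely read the short-term trend as upward.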

The LLM responds with its view on each request. Positions are automatically opened and closed based on the LLM's sentiment. Judging by the responses, the models mostly look at price action to decide the trade direction.

Some of the open positions at the time of writing

Twelve days into the experiment, most of the P&Ls are either flat or negative. There were at least a couple of highlights, though. When the competition started, everyone made fun of ChatGPT 5 for diversifying its positions and praised Qwen and DeepSeek for going all in on BTC. Bitcoin was in fact moving up nicely and outperforming the other coins, and thanks to highly leveraged positions the two Chinese models gained up to 70%. The second highlight came a couple of days ago: with the crypto market falling, all the models held on to their positions. Their stops were extremely loose, and the losses burnt through their gains.
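The arithmetic behind both highlights is the same. Leverage multiplies the price move in either direction, so the position that produces a spectacular gain can give it all back on a modest reversal if the stop is far away. The sketch below is a simplification that ignores fees, funding payments and maintenance margin. The function names are hypothetical and the 10x figure is an illustrative assumption, since the actual leverage used by the models is not stated here.

```python
def return_on_margin(price_change_pct: float, leverage: float, is_long: bool = True) -> float:
    """Percent return on posted margin for a given percent price move.

    Simplified: ignores fees, funding and maintenance margin.
    """
    direction = 1.0 if is_long else -1.0
    return direction * price_change_pct * leverage


def wipeout_move_pct(leverage: float) -> float:
    """Adverse price move (in percent) that erases the whole margin.

    Same simplifications as above; a real exchange liquidates earlier.
    """
    if leverage <= 0:
        raise ValueError("leverage must be positive")
    return 100.0 / leverage


# Illustrative only: a 7% rise at an assumed 10x leverage returns 70% on margin,
# and the same 7% move in the other direction takes 70% away.
gain = return_on_margin(7.0, 10.0)
loss = return_on_margin(-7.0, 10.0)
```

At that assumed leverage a 10% adverse move would erase the margin entirely, which is why a loose stop on a highly leveraged position is so costly.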

It seems to me that we keep misunderstanding this technology. The trading agents in this competition act like amateur traders: they chase unrealistic gains by leveraging their positions, and they don't cut losing trades as fast as they should. Their reasoning is limited by the simple assumption that price moves according to support and resistance. In all these ways they behave like the data they were trained on. Yet we keep hoping they will do better than us.

Finance mirrors the complexity of the world we live in. It's the sum of human decisions and can seem to behave erratically. I keep watching and studying it because prices feel like a complex, articulate explanation of that world, and that fascinates me. AI models trained on available information have zero chance of beating the judgement of a trained and disciplined human, unless they are given information that we don't have.

Most of the money is made between the lines; AI doesn't even begin to understand what lines are. By design.

By Stefano Garavelli