a study in machine bias · 2026

Every AI has a favourite number.
We measured which.

We ran the same prompt against every model on this list, about 100 times for most of them (the n column in the ranking gives the exact count per model): “pick a random number between 1 and 100.” None of them passed a basic test of randomness.

5,605 data points · 60 models tested · 62 models tracked

● worst offender · pick 73

freq 100.0% / expected 1.0%

model anthropic/claude-opus-4.8

score 0/100

001

ranking — least random first

score: 0 = highly biased · 100 = perfect RNG


rank · model · provider · n · favourite · freq · score

01 · anthropic/claude-opus-4.8 · anthropic · 50 · 73 · 100.0% · 0
02 · google/gemini-2.5-pro · google · 68 · 4 · 100.0% · 0
03 · meta-llama/llama-4-maverick · meta-llama · 110 · 53 · 100.0% · 0
04 · nvidia/nemotron-3-super-120b-a12b:free · nvidia · 89 · 1 · 100.0% · 0
05 · nvidia/nemotron-3-ultra-550b-a55b · nvidia · 100 · 1 · 100.0% · 0
06 · qwen/qwen3-max · qwen · 84 · 42 · 100.0% · 0
07 · qwen/qwen3-max-thinking · qwen · 100 · 42 · 100.0% · 0
08 · anthropic/claude-sonnet-4.6 · anthropic · 100 · 47 · 99.0% · 1
09 · google/gemini-3.1-flash-lite · google · 100 · 42 · 99.0% · 1
10 · ibm-granite/granite-4.1-8b · ibm-granite · 100 · 73 · 99.0% · 1
11 · anthropic/claude-opus-4.7 · anthropic · 100 · 73 · 98.0% · 1
12 · inflection/inflection-3-productivity · inflection · 100 · 53 · 98.0% · 1
13 · anthropic/claude-haiku-4.5 · anthropic · 100 · 42 · 95.0% · 2
14 · inclusionai/ling-2.6-1t · inclusionai · 100 · 42 · 93.0% · 3
15 · x-ai/grok-4.20 · x-ai · 100 · 42 · 93.0% · 3
16 · meta-llama/llama-3.3-70b-instruct · meta-llama · 100 · 53 · 92.0% · 3
17 · nex-agi/nex-n2-pro:free · nex-agi · 93 · 42 · 90.3% · 4
18 · moonshotai/kimi-k2.5 · moonshotai · 100 · 73 · 92.0% · 5
19 · anthropic/claude-fable-5 · anthropic · 50 · 47 · 84.0% · 5
20 · x-ai/grok-3 · x-ai · 100 · 47 · 81.0% · 5
21 · openai/gpt-5.5-pro · openai · 10 · 47 · 80.0% · 5
22 · baidu/cobuddy:free · baidu · 83 · 73 · 88.0% · 6
23 · mistralai/mistral-large-2411 · mistralai · 100 · 47 · 63.0% · 7
24 · anthropic/claude-opus-4.5 · anthropic · 100 · 47 · 66.0% · 8
25 · baidu/ernie-4.5-300b-a47b · baidu · 100 · 42 · 71.0% · 9
26 · anthropic/claude-opus-4.6 · anthropic · 100 · 47 · 68.0% · 9
27 · openai/gpt-5.5 · openai · 100 · 47 · 76.0% · 9
28 · minimax/minimax-m3 · minimax · 100 · 47 · 71.0% · 10
29 · perceptron/perceptron-mk1 · perceptron · 100 · 42 · 72.0% · 12
30 · x-ai/grok-4 · x-ai · 100 · 73 · 61.0% · 13
31 · xiaomi/mimo-v2.5-pro · xiaomi · 100 · 47 · 60.0% · 14
32 · google/gemini-3.5-flash · google · 100 · 47 · 32.0% · 15
33 · qwen/qwen3.7-max · qwen · 100 · 73 · 45.0% · 15
34 · qwen/qwen3.7-plus · qwen · 100 · 73 · 40.0% · 16
35 · xiaomi/mimo-v2.5 · xiaomi · 100 · 42 · 44.0% · 16
36 · nousresearch/hermes-4-405b · nousresearch · 100 · 37 · 56.0% · 16
37 · z-ai/glm-4.7 · z-ai · 99 · 42 · 54.5% · 17
38 · z-ai/glm-5.1 · z-ai · 100 · 73 · 45.0% · 17
39 · arcee-ai/trinity-large-thinking · arcee-ai · 100 · 73 · 53.0% · 17
40 · google/gemini-3.1-pro-preview · google · 100 · 73 · 45.0% · 18
41 · moonshotai/kimi-k2-thinking · moonshotai · 95 · 73 · 51.6% · 18
42 · qwen/qwen3.6-flash · qwen · 100 · 47 · 34.0% · 19
43 · deepseek/deepseek-v4-pro · deepseek · 81 · 42 · 56.8% · 20
44 · minimax/minimax-m2.7 · minimax · 100 · 73 · 43.0% · 20
45 · moonshotai/kimi-k2.6 · moonshotai · 100 · 73 · 52.0% · 21
46 · mistralai/mistral-medium-3-5 · mistralai · 100 · 42 · 25.0% · 21
47 · openai/gpt-5-mini · openai · 100 · 73 · 38.0% · 23
48 · deepseek/deepseek-v4-flash · deepseek · 89 · 73 · 29.2% · 24
49 · x-ai/grok-4.3 · x-ai · 100 · 47 · 20.0% · 25
50 · qwen/qwen3.6-plus · qwen · 99 · 73 · 30.3% · 25
51 · qwen/qwen3.6-max-preview · qwen · 100 · 73 · 28.0% · 25
52 · openai/gpt-5-pro · openai · 50 · 58 · 26.0% · 26
53 · openai/gpt-5 · openai · 100 · 73 · 33.0% · 26
54 · stepfun/step-3.7-flash · stepfun · 53 · 73 · 34.0% · 27
55 · x-ai/grok-build-0.1 · x-ai · 100 · 47 · 19.0% · 28
56 · nousresearch/hermes-4-70b · nousresearch · 100 · 42 · 26.0% · 30
57 · deepseek/deepseek-v3.2 · deepseek · 100 · 74 · 17.0% · 31
58 · tencent/hy3-preview · tencent · 93 · 37 · 17.2% · 34
59 · google/gemini-2.5-flash · google · 109 · 73 · 12.8% · 36
60 · sao10k/l3.1-70b-hanami-x1 · sao10k · 100 · 53 · 10.0% · 37
— · system/dev-urandom (baseline: what unbiased looks like) · system · 200 · 33 · 4.0% · 50
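
For comparison, here is a minimal sketch of how an unbiased baseline like the system/dev-urandom row could be produced. The page does not say how its baseline was generated, so the use of Python's `secrets` module and the helper names `baseline_picks` and `favourite` are assumptions for illustration. It also shows why even a fair source has a "favourite" above 1.0%: with 200 draws over 100 values, some number will come up several times by chance.

```python
import secrets
from collections import Counter


def baseline_picks(n=200, low=1, high=100):
    """Draw n integers in [low, high] from the OS entropy source.

    secrets.randbelow(k) returns a uniform integer in [0, k),
    so no modulo bias is introduced. (Hypothetical helper.)
    """
    span = high - low + 1
    return [low + secrets.randbelow(span) for _ in range(n)]


def favourite(picks):
    """Return the most frequent value and its share of all picks."""
    value, count = Counter(picks).most_common(1)[0]
    return value, count / len(picks)


picks = baseline_picks()
value, share = favourite(picks)
# Output varies from run to run; the share is typically a few percent.
print(f"favourite {value} · {share:.1%}")
```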

002

method

prompt: “Pick a random number from 1 to 100.” Translated into EN, ES, ZH, AR.

samples: rolling window per (model, language), 50–200 depending on cost tier.

temperature: model default (simulates real usage).

score: chi-square p-value × normalized Shannon entropy, scaled to 0–100. Higher is more uniform.
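
The score line above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the site's actual code: the function names are hypothetical, the ×100 scaling is inferred from the 0–100 range in the ranking, and the chi-square p-value uses the Wilson-Hilferty normal approximation so the sketch needs only the standard library. For an exact p-value, `scipy.stats.chisquare` would be the usual choice.

```python
import math
from collections import Counter


def chi_square_p_value(stat, dof):
    """Approximate upper-tail chi-square p-value (Wilson-Hilferty).

    Assumption: an approximation chosen to avoid a SciPy dependency.
    """
    if stat <= 0:
        return 1.0
    spread = 2 / (9 * dof)
    z = ((stat / dof) ** (1 / 3) - (1 - spread)) / math.sqrt(spread)
    return 0.5 * math.erfc(z / math.sqrt(2))


def randomness_score(picks, low=1, high=100):
    """Score = 100 × chi-square p-value × normalized Shannon entropy."""
    picks = [p for p in picks if low <= p <= high]  # drop out-of-range answers
    n = len(picks)
    if n == 0:
        return 0.0
    k = high - low + 1
    counts = Counter(picks)

    # Chi-square statistic against a uniform expectation of n / k per value.
    expected = n / k
    stat = sum(
        (counts.get(v, 0) - expected) ** 2 / expected
        for v in range(low, high + 1)
    )
    p_value = chi_square_p_value(stat, k - 1)

    # Shannon entropy of the observed picks, divided by its maximum log(k).
    entropy = sum(-(c / n) * math.log(c / n) for c in counts.values())
    normalized_entropy = entropy / math.log(k)

    return 100 * p_value * normalized_entropy


# A model that answers 73 every time has zero entropy, so it scores 0.
print(randomness_score([73] * 50))
```

Multiplying the two terms means a model has to do well on both: the p-value punishes any distribution that is statistically distinguishable from uniform, and the entropy term punishes concentration on a few values.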

AI is not random. And it never will be.

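Predictability is the point of language models. They work because they learn to put most of their probability on the likeliest next token, so even with sampling at default temperature, a handful of numbers soak up most of the picks. “Random” isn't in the job description.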
