Asking Claude to translate Python to Go 9x
The current limits of generating deterministic code with AI as a compiler
Many developers think they know what a compiler does - I thought so too! That's the reason they trust compilers more than they should - I did! In the terms of my Theory of Control: they think they control compilers.
This is an illusion. Compilers today do things that are not in the source code. In this example from the Advent of Compiler Optimizations, we see source code on the left and machine code (ARM) on the right. As we can see, the tail-recursive function add_v4 on the left is replaced with a simple add instruction on the right - not what developers would expect.
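The post shows that example only as an image. A minimal sketch of that kind of function - written here in Go, with the name add_v4 taken from the article but the body being my assumption - looks like this:

```go
package main

import "fmt"

// add_v4 computes a+b by moving one unit at a time from b
// to a, recursing in tail position. An optimizing compiler
// can collapse the whole recursion into a single add
// instruction.
func add_v4(a, b int) int {
	if b == 0 {
		return a
	}
	return add_v4(a+1, b-1)
}

func main() {
	fmt.Println(add_v4(3, 7)) // prints 10
}
```

(The original example is presumably C; Go's own gc compiler does not perform this particular tail-call optimization, but the shape of the function is the same.)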
Compilers do not stop there; they might generate something completely different, for example replacing a nested for loop over array operations with SIMD vector instructions. But as someone pointed out on LinkedIn, compilers are limited in what they are allowed to do (thanks for the link!).
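To make that second optimization concrete, here is a sketch (my example, not from the linked post) of the kind of nested loop a vectorizing compiler may rewrite:

```go
package main

import "fmt"

// addMatrix adds two matrices element-wise with a plain
// nested loop. Vectorizing compilers (gcc/clang more
// aggressively than Go's gc) may rewrite the inner loop
// with SIMD instructions that process several elements
// at once - instructions that appear nowhere in the source.
func addMatrix(dst, a, b [][]float64) {
	for i := range dst {
		for j := range dst[i] {
			dst[i][j] = a[i][j] + b[i][j]
		}
	}
}

func main() {
	a := [][]float64{{1, 2}, {3, 4}}
	b := [][]float64{{10, 20}, {30, 40}}
	dst := [][]float64{{0, 0}, {0, 0}}
	addMatrix(dst, a, b)
	fmt.Println(dst) // [[11 22] [33 44]]
}
```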
What does this have to do with AI? Developers distrust AI because they can't control it, because they can't predict the output - or better, they can't predict the effect of its output.
Compilers are optimized to have predictable output. If the source code adds 3 + 7, the compiler should create machine code that also adds 3 + 7, not subtract 3 from 7, for example. Compilers in this way are deterministic (JIT compilers are stochastic in another way - one reason benchmarks often fail).
AIs are, for now, not optimized for predictable, repeatable, and deterministic output. Give the same input to an AI 9 times and you get 9 different results. I've tried that with generating a Minesweeper game 9x (actually 10x). The result was 9 different games with different UIs, look and feel, and even different algorithms and bugs.
We have seen AIs progress before. A year ago AIs struggled to render correct text in images; the easiest way back then to detect AI-generated images was garbled text. Nano Banana Pro has mostly solved this. We have seen people with six fingers on each hand. Today you rarely see AI images of people who have six fingers (or four!). There was a need to make AI work in certain scenarios, and AIs were adapted - a need to limit AI to "correct" output.
AIs as compilers?
AIs are very different from compilers in one way: they are not deterministic. AIs are similar to compilers in other ways, as they can translate text into code, or code into code. Compilers have a concept of correctness; AIs currently lack it - not only when generating code, but it's in code, where incorrect output does incorrect things, that the gap surfaces most and is easiest to see. It's even binary: the code works or it doesn't.
How deterministic can AIs be?
I wanted to see what works right now and where we stand. Can AIs be deterministic for small input variance? What if we only use code as input, just like a compiler does? The goal was to find a small example that works and establish a data point, not to prove that it works for all inputs.
For that I created a small Python script and let Claude Code (Sonnet 4.5) translate it 9 times to Go. The Python script is very simple and similar to the compiler example above.
Translating this 9x into Go source code with Claude Code (Sonnet 4.5) resulted in 9 Go versions, add1.go to add9.go.
Claude Code generated the same Go code 9x. A small, limited input generated deterministic output. I struggled with suppressing Markdown output - currently AIs love generating Markdown - but it worked with:
Translate this Python code to Go. Output ONLY the Go code, NEVER add explanations NEVER add markdown or wrap it with markdown like \`\`\`go\`\`\` ALWAYS generate go source only - check after generation.
Especially the "check after generation" part helps, as it does with Vibe Engineering in general - AIs would profit from being 2-pass compilers. A script runs the prompt 9 times (WSL2, Windows 11), Silicon generates the syntax-highlighted images, and ImageMagick then composes them into the 3x3 montage.
When we try the same with Gemini 2.5 Pro (I don't have 3.0 CLI access yet), it gets interesting. First, Gemini can't strip Markdown as reliably as Claude, so I often had to remove it myself (check the number of lines in each run).
Second, the generated code is the same for this simple example!
(Also, compared to Claude Code, the Gemini CLI is much slower - to the point of being unusable.)
Interestingly, both Claude and Gemini chose int as the type for parameters a and b - probably because of the one example given.
We can assume that we'd get different results with a Python example that does not add ints.
And the Go result reflects this (Claude Code Sonnet 4.5 again).
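For illustration, a hypothetical sketch of such a result (my reconstruction - the post shows the actual output only as an image): if the Python source adds floats instead of ints, the translated Go plausibly switches the parameter type to float64:

```go
package main

import "fmt"

// Hypothetical translation result: given a Python example
// that adds floats, the models would plausibly infer
// float64 instead of int for a and b.
func add(a, b float64) float64 {
	return a + b
}

func main() {
	fmt.Println(add(0.5, 0.25)) // prints 0.75
}
```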
What have we learned?
AIs can currently generate deterministic output from small inputs. And they break down for high-variance input, like the Minesweeper example above. Somewhere between these two data points - large-example-English and small-example-Python - runs the line where AIs start to struggle to keep determinism.
If we need more determinism in AI output, I'm sure we'll see the same play out as with the text and the fingers. Looking at my Theory of Control, though, I'm not sure that control is really what we want when moving ahead with AI.