update

The chat doesn't lie to you anymore

We added three new safety nets that catch the chat when it's about to make something up. If it slips, the engine refuses to let the response leave the building.

Yes, this is another chat update

We know, we know. We keep saying the chat is better. The thing is, every time we ship an improvement, real users (you) find a new way it can be wrong, and the cycle starts over. So instead of writing another "we tweaked the prompt" post, we want to tell you what's actually different this time.

What we changed under the hood

The chat is still an AI, and AI models are still bad at chess on their own. Our whole approach has always been to keep the model on a tight leash and let Stockfish do the actual chess thinking. The problem we kept running into was that the leash wasn't tight enough. Even with engine data sitting right in front of it, the model would sometimes wander off and write moves that don't exist, claim that a piece attacks a square it can't reach, or annotate a move with a checkmate symbol when the king has plenty of legal escapes.

So we built three new automatic checks that run on every single reply before it reaches you. If any of them catch a problem, the reply gets thrown back at the model with the specific error pointed out, and the model has to try again. You never see the broken version.
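A minimal sketch of that loop, written in TypeScript since chess.js is a JavaScript library. Everything named here (`replyWithChecks`, `Validator`, the `generate` callback, the three-attempt limit) is a hypothetical stand-in: the post does not describe the real pipeline, its prompts, or how many retries it allows. The model call is kept synchronous for brevity; a real one would be async.

```typescript
// Hypothetical: a validator returns an error message, or null if the reply passes.
type Validator = (reply: string) => string | null;
// Hypothetical model call: receives the prompt plus the errors from the previous attempt.
type Generate = (prompt: string, feedback: string[]) => string;

function replyWithChecks(
  prompt: string,
  generate: Generate,
  validators: Validator[],
  maxAttempts = 3, // assumed limit, not taken from the post
): string | null {
  let feedback: string[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const reply = generate(prompt, feedback);
    // Run every check and keep the specific problem each one reports.
    const errors = validators
      .map((check) => check(reply))
      .filter((e): e is string => e !== null);
    if (errors.length === 0) return reply; // only a reply that passes every check is returned
    feedback = errors; // the next attempt is told exactly what was wrong
  }
  return null; // no attempt passed; the caller decides what to show instead of a broken reply
}
```

Each new failure type becomes one more entry in `validators`; the loop itself does not change.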

1. Illegal moves

The most embarrassing failure was the chat suggesting moves that simply aren't possible in your position. A queen on d3 doesn't reach e5. A bishop on f1 doesn't fork two pieces on the seventh rank. The new check replays every move chain in the reply through chess.js. If any move would be refused at the board, the whole reply is rejected and the model is told exactly which move failed and which legal moves were available instead.
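A sketch of the replay step, assuming a hypothetical `Board` wrapper. A real one could sit on top of chess.js, but the method names below are illustrative, not chess.js's own. `parseLine` turns a written line such as `12. Nf3 Nc6 13. Bb5` into bare moves, and `replayLine` stops at the first move the board refuses, reporting it together with the legal moves that were available at that point.

```typescript
// Hypothetical board wrapper; method names are illustrative only.
interface Board {
  legalMoves(): string[];     // SAN of every legal move in the current position
  play(san: string): boolean; // apply the move and return true if legal; otherwise return false, board unchanged
}

interface ReplayError {
  move: string;    // the move the board refused
  ply: number;     // its index in the line, starting at 0
  legal: string[]; // what could have been played instead
}

// "12. Nf3 Nc6 13. Bb5" or "12...Nc6" -> bare SAN tokens with move numbers removed.
function parseLine(text: string): string[] {
  return text
    .split(/\s+/)
    .map((tok) => tok.replace(/^\d+\.+/, ""))
    .filter((tok) => tok.length > 0);
}

function replayLine(board: Board, line: string[]): ReplayError | null {
  for (let ply = 0; ply < line.length; ply++) {
    const legal = board.legalMoves(); // captured before trying, for the error message
    if (!board.play(line[ply])) {
      return { move: line[ply], ply, legal };
    }
  }
  return null; // every move in the line was legal
}
```

Capturing the legal moves before each attempt is what lets the error message tell the model what it could have played instead.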

2. False attack claims

The trickier failure was when the chat would propose a move that's perfectly legal but then describe what it does incorrectly. The example that pushed us to fix this was a reply that said "play Be2, it hits the queen on h4". The bishop move is legal. The bishop on e2 has no diagonal to h4. The model just made up the geometry. The new check looks for any sentence that mentions an attack, threat, or pressure on a square, replays the relevant move, and asks the engine which pieces actually attack that square afterward. If nothing does, the claim is false and the reply gets sent back.
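A rough sketch of the two halves of that check. `extractClaims` finds sentences that use attack language and pulls out the squares named after the attack word. The real check then asks the engine for every attacker of that square after the move; as a narrower stand-in that fits the Be2 example, `bishopAttacks` answers the geometric question for a single bishop. Both functions and the word list are hypothetical illustrations, not the shipped code.

```typescript
// Hypothetical word stems; a real check would likely need a broader vocabulary.
const ATTACK_WORDS = /\b(hit|attack|threat|pressure|target)/i;
const SQUARE = /\b[a-h][1-8]\b/g;

interface Claim { sentence: string; squares: string[] }

// Find sentences that claim an attack, and the squares named after the attack word.
function extractClaims(reply: string): Claim[] {
  return reply
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => {
      const at = sentence.search(ATTACK_WORDS);
      const squares: string[] = at < 0 ? [] : sentence.slice(at).match(SQUARE) ?? [];
      return { sentence, squares };
    })
    .filter((c) => c.squares.length > 0);
}

// Geometry only: does a bishop on `from` reach `to` along an unobstructed diagonal?
function bishopAttacks(from: string, to: string, occupied: Set<string>): boolean {
  const f0 = from.charCodeAt(0) - 97, r0 = Number(from[1]);
  const f1 = to.charCodeAt(0) - 97, r1 = Number(to[1]);
  const df = f1 - f0, dr = r1 - r0;
  if (df === 0 || Math.abs(df) !== Math.abs(dr)) return false; // not on a shared diagonal
  const sf = Math.sign(df), sr = Math.sign(dr);
  for (let i = 1; i < Math.abs(df); i++) {
    const sq = String.fromCharCode(97 + f0 + i * sf) + String(r0 + i * sr);
    if (occupied.has(sq)) return false; // a piece in between blocks the diagonal
  }
  return true;
}
```

Running the Be2 example through both: the claim names h4, and e2 to h4 is three files but only two ranks, so there is no diagonal and the claim would be sent back.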

3. Wrong check and checkmate annotations

The third pattern was the chat writing things like "26. Qh8#" or "Rxe7+" without verifying the king is actually in check or checkmate. Those plus and pound signs at the end of moves carry a lot of weight in chess writing. If the chat says a move is mate, you should be able to trust that it's mate. So now we do the obvious thing: replay the move and ask the engine. If the model wrote "#" but the engine says it's just a check (or nothing at all), the reply is rejected and the model has to either find a real mate or drop the symbol.
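A sketch of the annotation check. `checkAnnotation` compares the symbol the model wrote with what the position actually is after the move; `actual` would come from replaying the move and asking the engine, which is left out here. The names are hypothetical. It rejects only symbols that claim more than the position delivers ("#" on a plain check, "+" on a move that gives no check) and treats a missing symbol as harmless, which is an assumption: the post only discusses symbols that overclaim.

```typescript
type KingStatus = "none" | "check" | "mate";
const STRENGTH: Record<KingStatus, number> = { none: 0, check: 1, mate: 2 };

// What the suffix of a SAN move claims, ignoring trailing "!" / "?" commentary.
function claimedStatus(san: string): KingStatus {
  const bare = san.replace(/[!?]+$/, "");
  if (bare.endsWith("#")) return "mate";
  if (bare.endsWith("+")) return "check";
  return "none";
}

// `actual` is what the engine reports after the move is played (hypothetical input).
function checkAnnotation(san: string, actual: KingStatus): string | null {
  const claimed = claimedStatus(san);
  if (STRENGTH[claimed] <= STRENGTH[actual]) return null; // the symbol is backed by the position
  return `${san} is annotated as ${claimed}, but the engine reports ${actual}`;
}
```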

What this means for you

In practice, you should notice a few things. The chat is more honest about what it can and can't claim. When it does describe a tactic, the tactic is real. When it shows you a line, the moves in that line are legal. When it tells you something is checkmate, it actually is. We also fixed a long-standing annoyance where the chat panel didn't scroll properly when you sent a message, so now your question anchors to the top of the panel and the reply streams in below it.

Are we done?

Honestly, no. There are still ways the model can be wrong that none of these checks catch yet. If it tells you a line is "winning" without verification, we can't always catch that. If it suggests a move that's legal but strategically pointless, we can't catch that either. We have a plan for those, but we're going to wait until we see them happen in real conversations before adding more layers, because we'd rather not block legitimate replies that just happen to look suspicious.

The real change here isn't any one of these three checks. It's that we now have a clear pattern for every new failure type. Find it, write a deterministic check, ship it, move on. If you spot one of these in the wild, please tell us. The faster you tell us, the faster the next user stops seeing it.

Where are we now?

Caissablanca has seen a big increase in traffic and user activity! This is huge, and even better is the increase in feedback we are getting from our users. As a small team, we can't test every position, every analysis, and every puzzle, but with the help of our community and their feedback, we can improve quickly and keep delivering the best product possible. Thank you!

"I have suggestions!"

Great! You can contact us at sebastian@soleinnovations.com. Any feedback is welcome! We want to build what our users want, so let us know how we can make Caissablanca better for you!