AI Alignment Cannot Be Top-Down

Nov 3, 2025

Community Notes offers a better model — where citizens, not corporations, decide what “aligned” means.

3 Comments

What this makes clear is that alignment isn’t just a technical problem; it’s a semantic one. You can’t preserve semantic fidelity to a community’s values with a top-down, single-culture definition of “acceptable.” Meaning drifts the moment a small group tries to encode it for everyone else. Community Notes worked because it distributed interpretation: many perspectives correcting for each other, keeping intent and context aligned.

David Hoze

Apr 13

You're right that no small group can determine what alignment means for everyone, and Taiwan's citizen-deliberation model is genuinely interesting.

But deliberation only works if the participants have something to draw from beyond their individual preferences. Citizen assemblies that surface preferences without a framework for evaluating them produce the same problem at larger scale: whose preferences win? The tradition I work from has a 2,000-year-old answer: you need both broad deliberation - machloket l'shem shamayim, disagreement for the sake of heaven - AND a mechanism for binding decision. The Sanhedrin isn't top-down or bottom-up. It's bottom-up deliberation with ruling authority, and it has a safety valve: if the ruling is unanimous, it's automatically suspect, because unanimity signals systemic failure rather than truth. AI governance needs deliberation. It also needs the courage to decide.

Jan Romportl

Nov 21

I'm afraid you're counting your chickens before they hatch. There is currently simply NO technical way to RELIABLY align AI with anything, no matter what that "anything" looks like. We don't have any tools for mechanistic interpretability, and we don't know how to technically solve problems such as AI sandbagging, faking alignment, strategic deception, sleeper agents, and many others.

So imho we should first worry about this. And when we crack these problems, then I guess your Attentiveness approach could work and I'd actually be happy if it worked.

Alas, US attitudes toward developing these essential technical enablers for alignment are very poor, driven mostly by Effective Accelerationists and their proto-religious beliefs.

Taiwan, as the most important chipmaker, should put real pressure on the US, something along the lines of "no real technical alignment efforts -> no chips for the US". I know it's very hard because the other part of the logic is "no chips for the US -> no military protection for Taiwan".

Funnily enough, I now think that mainland China is actually giving more resources to cracking the hard problems of technical alignment. Not because of some deep humanistic sentiments, but because of simple pragmatic incentives: the Communist Party itself (as the main decision-making memeplex) wants to make sure its absolute power will last forever, so it is very sensitive to any new highly competent systems that can spin out of control (such as misaligned AGI). Meanwhile, the US incentives are driven by some strange transhumanist fight against Thiel's Antichrist.

I guess Taiwan could play some active role here...