If you bought a 9950X3D for Linux gaming, your kernel is probably scheduling games to the wrong half of the chip.

The 9950X3D has two CCDs. Only one of them has the cache that makes X3D parts worth buying. The other is a perfectly competent Zen 5 die that boosts a little higher and has a third of the L3. Games — latency-sensitive, cache-hungry, single-thread-bound on the hot loop — want the V-cache die. Always.

Linux, by default on CachyOS, hands them to the other one.

The cause is the amd_pstate-epp driver in active mode. It reads the CPU’s per-core CPPC ranking, sees that the non-V-cache cores rank higher for single-thread, and dutifully prefers them when scheduling latency-sensitive work. The kernel is doing exactly what AMD told it to do. AMD told it the wrong thing for gamers.

The fix is one sed line and a .desktop override. This post is what I learned getting there.


Why the 9950X3D Has a Split Personality

Two 8-core CCDs. One package. They are not the same.

CCD0 is the V-cache die: 8 Zen 5 cores, 96MB shared L3, slightly lower max boost clock because the stacked cache imposes a thermal ceiling.

CCD1 is the standard die: 8 Zen 5 cores, 32MB shared L3, slightly higher max boost clock because there’s nothing on top of it cooking.

For most software, CCD1 wins. Higher clocks, lower latency to the IO die, and 32MB is not a small cache. For games, CCD0 wins by a wide margin. Modern game engines are bottlenecked on a small number of hot threads chasing pointers through working sets that fit comfortably in 96MB and absolutely do not fit in 32MB. A cache miss costs hundreds of cycles. A few hundred extra MHz of boost does not buy back hundreds of cycles. The math is brutal and the math is not subtle.
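To put rough numbers on it (illustrative figures, not measurements from my chip):

```shell
# Illustrative figures only: ~300 MHz of extra boost over a ~5.4 GHz base,
# and ~90 ns for a DRAM round-trip on a last-level cache miss.
awk 'BEGIN {
  printf "extra cycles/sec from boost: %.1f%%\n", 300 / 5400 * 100
  printf "cycles burned per DRAM miss: %.0f\n",  90e-9 * 5.4e9
}'
```

A roughly 5% clock advantage has to amortize hundreds of cycles per miss; once the working set spills out of 32MB, it can't.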

One last piece of context before the diagnosis: each physical core on this chip exposes two logical CPUs through SMT. So when I write 0-7,16-23 later, that’s eight physical cores plus their eight SMT siblings. Sixteen logical CPUs, one CCD. That’s the whole V-cache half.
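You can read the core-to-sibling pairing straight out of sysfs (standard topology files; the exact numbering can differ across machines):

```shell
# Each physical core lists itself plus its SMT sibling.
for i in 0 1 2 3; do
  echo "cpu$i siblings: $(cat /sys/devices/system/cpu/cpu$i/topology/thread_siblings_list)"
done
```

On my chip each pair reads N,N+16, which is why the V-cache half spans 0-7 plus 16-23.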


The Discovery — Linux Picks the Wrong Half

Here’s how I actually found this. I’d been playing Cyberpunk 2077 on Linux for a while, then dual-booted into Windows to profile the same game on the same hardware. And I noticed something I didn’t expect: Windows was running hotter and pushing higher CPU utilization than Linux.

My first instinct was the dumb one — Windows is bloated, CachyOS is just more efficient, victory for the penguin. That’s the comforting story. It’s also wrong.

Windows wasn’t doing more work. Windows was doing better work. On dual-CCD X3D chips, Windows auto-affinitizes games to the V-cache CCD — a chipset-driver-plus-Game-Bar handshake recognizes when something is a game and parks the non-V-cache cores so the workload has nowhere to go but CCD0. Whatever the exact mechanism, the effect is what counts: the V-cache cores had more to do, so they ran hotter and showed higher utilization. That’s not inefficiency. That’s the chip getting used correctly.

Which meant Linux wasn’t being efficient. Linux was idle on the wrong cores while the game ran on the wrong CCD. Lower thermals, lower utilization, worse outcome for the workload that actually mattered.

Once I knew what to look for, the diagnosis was a one-liner. The fastest way to ask Linux which cores it thinks are the good ones is to read CPPC’s highest_perf value — that’s the number amd_pstate-epp uses for prefcore ranking.

for i in 0 4 8 12; do
  echo "cpu$i amd_pstate prefcore ranking: $(cat /sys/devices/system/cpu/cpu$i/acpi_cppc/highest_perf)"
done

On my chip, CCD1 cores reported around 236 and 231. CCD0 cores reported around 201 and 181. Higher number means the kernel thinks that core is a better candidate for single-thread work.

Read that again. The kernel believes CCD1 — the cache-starved half — is the better half for single-thread workloads. Because, strictly on Hz alone, it is. The CPPC ranking has no idea your single-thread workload is a game engine that would trade every megahertz on the chip for a smaller pile of cache misses.

amd_pstate-epp in active mode (CachyOS default) takes that ranking at face value and prefers the higher-ranked cores when placing performance-sensitive threads. Without intervention, your game’s main thread lands on CCD1 and stays there.
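You can confirm which driver and mode your own system is running. These sysfs paths exist on kernels built with amd_pstate; the values shown are what I'd expect on CachyOS defaults:

```shell
cat /sys/devices/system/cpu/amd_pstate/status            # expect: active
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver  # expect: amd-pstate-epp
```

If status reports passive or guided instead, the prefcore-driven placement described here doesn't apply the same way.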

You bought a 9950X3D and Linux is gaming on it like a 9950X.


Verify on Your Own Machine

Before fixing anything, confirm your chip enumerates the way mine does. Most do. Some don’t — firmware quirks can swap which CCD shows up as cpu0 versus cpu8, and you don’t want to pin to the wrong half.

The L3 size is the ground truth. The V-cache die has 96MB; the standard die has 32MB. Ask the kernel:

for i in 0 8; do
  echo "cpu$i L3 size: $(cat /sys/devices/system/cpu/cpu$i/cache/index3/size)"
done

Expected output: cpu0 reports 98304K (96MB) and cpu8 reports 32768K (32MB). That means CCD0 is the V-cache die — what we want. If those values are reversed, your chip enumerates the other way around, and the affinity mask later in this post needs flipping. Don’t skip this step. The sed line is identical for both layouts; the mask isn’t.

Now ask which logical CPUs share the V-cache L3:

cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list

Expected output: 0-7,16-23. Eight physical cores plus their SMT siblings. The kernel is telling you exactly which logical CPUs are sitting on the 96MB cache. That list is the affinity mask. Don’t second-guess it. Don’t compute it from lscpu output. The kernel knows.
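If you'd rather not hardcode the mask anywhere, a sketch that feeds the kernel's own answer straight to taskset (assuming cpu0 sits on the V-cache die, per the check above):

```shell
# Read the mask from the kernel, then pin a command to it.
MASK=$(cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list)
taskset -c "$MASK" echo "pinned to: $MASK"
```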


The Fix — Pin Steam Itself

There are two ways people typically try to solve this. One of them is wrong.

The wrong way is per-game launch options. Every game gets a taskset -c 0-7,16-23 %command% in its Steam launch options. It works for whichever games you remember to configure. It silently fails to cover everything else — Proton’s helper processes, shader compilation jobs, anti-cheat shims, and any non-Steam launcher Steam is fronting.

The right way is to pin Steam itself.

(Quick note for anyone who hasn’t used it: taskset pins a process to specific CPU cores. Linux child processes inherit CPU affinity from their parent unless they explicitly override it. So if you wrap Steam, every game Steam launches inherits the same affinity. One line at the top, the whole subprocess tree downstream.)
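You can watch the inheritance happen with a throwaway shell:

```shell
# Pin a parent shell to CPU 0, then ask a child process for its own affinity.
# The child prints: pid NNNN's current affinity list: 0
taskset -c 0 bash -c 'taskset -cp $$'
```

The child never called taskset on itself with a mask; it inherited the restriction from its parent. Steam's process tree works the same way.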

The clean way to do this is a local override of Steam’s .desktop file. The XDG specification says ~/.local/share/applications/ takes precedence over /usr/share/applications/, so a copy of the file in your home directory wins against the packaged one. This survives package updates (the system file gets rewritten, your override doesn’t), and it’s reversible — delete the file, you’re back to default.

Three commands:

mkdir -p ~/.local/share/applications
cp /usr/share/applications/steam.desktop ~/.local/share/applications/
sed -i 's|^Exec=/usr/bin/steam|Exec=taskset -c 0-7,16-23 /usr/bin/steam|' ~/.local/share/applications/steam.desktop

That’s the fix. Log out, log back in (or relaunch your menu), and the next time you click the Steam icon, you’re launching with affinity pinned to the V-cache CCD.

One thing worth mentioning: the sed only modifies the main Exec= line. Steam’s .desktop also has Desktop Actions — Big Picture, Library, Servers, and so on — each with their own Exec= entries. If you launch Steam via a right-click action from your app menu, those bypass the main line. If you use those, apply the same taskset prefix to the action lines too. Most people don’t, so this is a footnote for the people who do.
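A sketch for covering the action entries too. The exact Exec= values vary by distro, so inspect what's actually in the file before matching on it; the pattern below assumes action lines that begin with a bare "steam":

```shell
# See every launcher line in the override copy.
grep '^Exec=' ~/.local/share/applications/steam.desktop
# If the action entries use a bare "steam", prefix those as well.
sed -i 's|^Exec=steam |Exec=taskset -c 0-7,16-23 steam |' ~/.local/share/applications/steam.desktop
```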


Verify It Worked

Critical detail first: launch Steam from your application menu, not from a terminal. Terminal launches bypass .desktop files entirely. If you run steam & from a shell, you'll get default affinity and you'll think the fix didn't work. It worked. You're testing the wrong thing.

Once Steam is running, ask its PID what its affinity looks like:

taskset -cp $(pgrep -x steam)

Expected output: pid NNNN's current affinity list: 0-7,16-23. That’s the V-cache CCD. Steam itself is pinned.

Now launch a game. While it’s running, check its affinity the same way:

taskset -cp $(pgrep -f Cyberpunk2077.exe)

Same expected output: 0-7,16-23. Children inherit. That’s the entire mechanism. The kernel is no longer free to schedule your game’s hot threads onto CCD1, because the process isn’t allowed to run there.


Caveats

Three things to watch for.

Terminal launches bypass .desktop files. If you ever launch Steam from a shell — for debugging, for a custom alias, out of habit — none of this applies to that invocation. Two options: add a shell alias (alias steam='taskset -c 0-7,16-23 /usr/bin/steam') or just stop launching Steam from terminals. Pick one and be consistent. Mixing the two will make you doubt your own setup later.

Window manager autostart bypasses .desktop files too. If you’re on Hyprland with Steam in exec-once (or i3/Sway with exec, or any equivalent), that path goes around your local override. Wrap the autostart command with taskset directly in your config. Same mask.
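For example, in a Hyprland config (a sketch; adjust to whatever your autostart entry actually invokes):

```
# ~/.config/hypr/hyprland.conf
exec-once = taskset -c 0-7,16-23 steam
```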

Edge-case enumeration. If the L3-size verification step earlier showed cpu0 reporting 32768K instead of 98304K, your chip exposes the standard die first and the V-cache die second. The fix is the same shape — copy the .desktop, add a taskset prefix — but the mask becomes 8-15,24-31. The diagnostic earlier in the post is the source of truth; trust it over anything else.

That’s the full surface area. Three places affinity can leak. None of them are subtle once you know to look.


Optional — The Next Few Percent

There’s a smaller, separate win available on top of this one. CachyOS ships game-performance, a thin wrapper around powerprofilesctl that swaps the system to the performance power profile for the duration of whatever command it wraps. EPP swings from balanced to performance, the cores stop being shy about boosting, and you claw back a few percent on top of the affinity fix.

In Steam, per-game launch options:

game-performance %command%

Mid-game, you can confirm the swing landed:

cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference
powerprofilesctl list-holds

The first should report performance while the game is running. The second should show an active hold from game-performance. Once the game exits, both revert.

This is its own post. The gain is real but smaller than what we just did, and the mechanism is different enough that mixing them into one explanation muddies both. If you want it, it’s there. If you don’t, the affinity fix alone is the bigger lever by a wide margin.


Closing

There’s no benchmark table at the end of this post. I didn’t run a controlled Cyberpunk pass before and after. What I can tell you is what the system is doing now versus what it was doing before, and the shape of the change is unambiguous: the kernel is scheduling latency-sensitive single-thread workloads onto the cache that exists for them. CCD0 is the V-cache die. CCD0 is where the games run. The hardware finally gets used the way it was sold.

Three commands. One sed. The kernel does the rest.


Originally published on rmnr.net.