Arc B580 crashing Proxmox after Plex has been transcoding for some time

Server Version#: 1.43.1.10350
Kernel: 6.14.8-3-bpo12-pve
GPU: ARC B580

I recently upgraded to this version to test my Arc GPU for transcoding. It works like a charm, but I've noticed that the journal on the host fills up with:

```
Dec  9 04:44:10 plex kernel: [73797.506550] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
```

Plex is my only LXC/VM using the GPU, and these hard crashes only started after upgrading PMS. I believe the hard crash of my host may be related to the log growing too large, but I'm unsure since I'm still learning Linux. My /var/lib/syslog is 46GB, and I'm fairly sure that's due to these errors being spammed until the host eventually crashes.
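
To check whether that one repeated message really accounts for most of the log, something like the sketch below could tally the most frequent kernel messages and the bytes they occupy. It is only an illustration: `top_messages` and the prefix pattern are made up for this example, and the pattern assumes the `... kernel: [timestamp] message` layout seen in the excerpts in this post.

```python
import re
from collections import Counter

# Assumed layout: "<date> <host> kernel: [<seconds>] <message>".
# Stripping the prefix lets identical messages compare equal.
PREFIX = re.compile(r"^.*?kernel: \[\s*\d+\.\d+\]\s*")


def top_messages(lines, n=3):
    """Return (message, count, bytes) for the n most frequent messages."""
    counts = Counter()
    sizes = Counter()
    for line in lines:
        msg = PREFIX.sub("", line.rstrip("\n"))
        counts[msg] += 1
        # Bytes are measured on the full line, prefix included.
        sizes[msg] += len(line.encode("utf-8", "replace"))
    return [(msg, count, sizes[msg]) for msg, count in counts.most_common(n)]
```

Passing an open file works, for example `top_messages(open(path, errors="replace"))`, where `path` is wherever the host actually keeps its syslog. Reading a 46GB file this way is slow but streams line by line, so it should not need much memory beyond the set of distinct messages.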

Here is what I believe is relevant from that syslog, just before it starts spamming the above error:

```
Dec 9 04:44:03 plex kernel: [73790.080952] xe 0000:03:00.0: [drm] ERROR GT1: Force wake domain 5 failed to ack wake (-ETIMEDOUT) reg[0xd58] = 0x0
Dec 9 04:44:03 plex kernel: [73790.130966] xe 0000:03:00.0: [drm] ERROR GT1: Force wake domain 12 failed to ack wake (-ETIMEDOUT) reg[0xd74] = 0x0
Dec 9 04:44:03 plex kernel: [73790.131021] ------------[ cut here ]------------
Dec 9 04:44:03 plex kernel: [73790.131023] xe 0000:03:00.0: [drm] GT1: Forcewake domains 0x1020 failed to acknowledge awake request
Dec 9 04:44:03 plex kernel: [73790.131091] WARNING: CPU: 1 PID: 1799710 at drivers/gpu/drm/xe/xe_force_wake.c:211 xe_force_wake_get+0x2f7/0x320 [xe]
Dec 9 04:44:03 plex kernel: [73790.131194] Modules linked in: dm_snapshot cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs xt_tcpudp xt_mark nft_masq nft_nat nft_limit nft_reject_inet nf_rejec
t_ipv4 nf_reject_ipv6 nft_reject nft_ct wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nf_conntrack_netlink nfnetlink_acct udp_diag tcp_diag i
net_diag xt_conntrack xt_MASQUERADE xt_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat xfrm_user xfrm_algo overlay veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_fil
ter ip6_tables iptable_filter nf_tables 8021q garp mrp bonding tls sunrpc nfnetlink_log binfmt_misc nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling iwlmvm x86_pkg_te
mp_thermal intel_powerclamp coretemp kvm_intel mac80211 mei_gsc_proxy mei_gsc pmt_crashlog snd_hda_codec_hdmi xfs libarc4 snd_soc_avs
Dec 9 04:44:03 plex kernel: [73790.131232] snd_soc_hda_codec snd_hda_ext_core kvm xe i915 snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine irqbypass snd_hda_intel polyval_clmulni polyval_generic snd_intel_dspcfg ghash_clmulni
_intel snd_intel_sdw_acpi drm_gpuvm sha256_ssse3 snd_usb_audio btusb sha1_ssse3 snd_hda_codec gpu_sched snd_usbmidi_lib btrtl iwlwifi aesni_intel snd_ump drm_ttm_helper drm_buddy crypto_simd btintel snd_hda_core drm_exec snd_rawmid
i ttm cryptd btbcm drm_suballoc_helper snd_hwdep snd_seq_device cmdlinepart drm_display_helper snd_pcm mei_hdcp mei_pxp rapl btmtk spi_nor cec snd_timer intel_cstate asus_nb_wmi cfg80211 eeepc_wmi mei_me ee1004 mtd snd rc_core blue
tooth pcspkr wmi_bmof mxm_wmi nzxt_kraken3 soundcore mc mei i2c_algo_bit intel_pmc_core pmt_telemetry pmt_class intel_vsec acpi_pad acpi_tad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_ta
bles autofs4 btrfs blake2b_generic xor raid6_pq uas usb_storage hid_generic usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
Dec 9 04:44:03 plex kernel: [73790.131281] mfd_aaeon asus_wmi xhci_pci sparse_keymap nvme i2c_i801 platform_profile i2c_smbus spi_intel_pci intel_lpss_pci thunderbolt xhci_hcd nvme_core ahci spi_intel i2c_mux intel_lpss igc idma6
4 libahci nvme_auth video wmi pinctrl_tigerlake
Dec 9 04:44:03 plex kernel: [73790.131294] CPU: 1 UID: 0 PID: 1799710 Comm: kworker/u80:1 Tainted: P U O 6.14.8-3-bpo12-pve #1
Dec 9 04:44:03 plex kernel: [73790.131296] Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [O]=OOT_MODULE
Dec 9 04:44:03 plex kernel: [73790.131297] Hardware name: ASUS System Product Name/ROG MAXIMUS XIII HERO, BIOS 2302 11/13/2024
Dec 9 04:44:03 plex kernel: [73790.131298] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
Dec 9 04:44:03 plex kernel: [73790.131303] RIP: 0010:xe_force_wake_get+0x2f7/0x320 [xe]
Dec 9 04:44:03 plex kernel: [73790.131352] Code: 4c 8b 3f 44 89 55 c8 89 4d d0 e8 44 85 27 d7 8b 4d d0 41 89 d9 4d 89 e0 48 89 c6 4c 89 fa 48 c7 c7 38 26 22 c2 e8 f9 7c 80 d6 <0f> 0b 44 8b 55 c8 e9 dd fe ff ff 48 8b 75 c0 48 8b 7d
b8 44 89 55
Dec 9 04:44:03 plex kernel: [73790.131353] RSP: 0018:ffff9eb5b6afbce0 EFLAGS: 00010246
Dec 9 04:44:03 plex kernel: [73790.131355] RAX: 0000000000000000 RBX: 0000000000001020 RCX: 0000000000000000
Dec 9 04:44:03 plex kernel: [73790.131356] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Dec 9 04:44:03 plex kernel: [73790.131356] RBP: ffff9eb5b6afbd38 R08: 0000000000000000 R09: 0000000000000000
Dec 9 04:44:03 plex kernel: [73790.131357] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc2215bb2
Dec 9 04:44:03 plex kernel: [73790.131358] R13: ffff90cf96f90078 R14: 0000000000010000 R15: ffff90cf8297e530
Dec 9 04:44:03 plex kernel: [73790.131359] FS: 0000000000000000(0000) GS:ffff90dec9c80000(0000) knlGS:0000000000000000
Dec 9 04:44:03 plex kernel: [73790.131360] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 9 04:44:03 plex kernel: [73790.131361] CR2: 0000766533e5c000 CR3: 00000001947c7002 CR4: 00000000007726f0
Dec 9 04:44:03 plex kernel: [73790.131362] PKRU: 55555554
Dec 9 04:44:03 plex kernel: [73790.131362] Call Trace:
Dec 9 04:44:03 plex kernel: [73790.131363]
Dec 9 04:44:03 plex kernel: [73790.131367] guc_exec_queue_timedout_job+0x792/0xc40 [xe]
Dec 9 04:44:03 plex kernel: [73790.131423] ? queue_delayed_work_on+0x81/0x90
Dec 9 04:44:03 plex kernel: [73790.131427] ? wb_workfn+0x380/0x400
Dec 9 04:44:03 plex kernel: [73790.131430] drm_sched_job_timedout+0x70/0x110 [gpu_sched]
Dec 9 04:44:03 plex kernel: [73790.131433] process_one_work+0x178/0x3b0
Dec 9 04:44:03 plex kernel: [73790.131435] worker_thread+0x2b8/0x3e0
Dec 9 04:44:03 plex kernel: [73790.131437] ? __pfx_worker_thread+0x10/0x10
Dec 9 04:44:03 plex kernel: [73790.131439] kthread+0xfe/0x230
Dec 9 04:44:03 plex kernel: [73790.131441] ? __pfx_kthread+0x10/0x10
Dec 9 04:44:03 plex kernel: [73790.131443] ret_from_fork+0x44/0x70
Dec 9 04:44:03 plex kernel: [73790.131446] ? __pfx_kthread+0x10/0x10
Dec 9 04:44:03 plex kernel: [73790.131447] ret_from_fork_asm+0x1a/0x30
Dec 9 04:44:03 plex kernel: [73790.131451]
Dec 9 04:44:03 plex kernel: [73790.131451] ---[ end trace 0000000000000000 ]---
Dec 9 04:44:03 plex kernel: [73790.131453] xe 0000:03:00.0: [drm] GT1: failed to get forcewake for coredump capture
Dec 9 04:44:08 plex kernel: [73795.151016] xe 0000:03:00.0: [drm] GT1: Schedule disable failed to respond, guc_id=3
Dec 9 04:44:08 plex kernel: [73795.201042] xe 0000:03:00.0: [drm] ERROR GT1: Force wake domain 5 failed to ack wake (-ETIMEDOUT) reg[0xd58] = 0x0
Dec 9 04:44:08 plex kernel: [73795.251057] xe 0000:03:00.0: [drm] ERROR GT1: Force wake domain 12 failed to ack wake (-ETIMEDOUT) reg[0xd74] = 0x0
Dec 9 04:44:08 plex kernel: [73795.251095] ------------[ cut here ]------------
Dec 9 04:44:08 plex kernel: [73795.251096] xe 0000:03:00.0: [drm] GT1: Forcewake domains 0x1020 failed to acknowledge awake request
Dec 9 04:44:08 plex kernel: [73795.251173] WARNING: CPU: 15 PID: 1799710 at drivers/gpu/drm/xe/xe_force_wake.c:211 xe_force_wake_get+0x2f7/0x320 [xe]
Dec 9 04:44:08 plex kernel: [73795.251327] Modules linked in: dm_snapshot cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs xt_tcpudp xt_mark nft_masq nft_nat nft_limit nft_reject_inet nf_rejec
t_ipv4 nf_reject_ipv6 nft_reject nft_ct wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nf_conntrack_netlink nfnetlink_acct udp_diag tcp_diag i
net_diag xt_conntrack xt_MASQUERADE xt_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat xfrm_user xfrm_algo overlay veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_fil
ter ip6_tables iptable_filter nf_tables 8021q garp mrp bonding tls sunrpc nfnetlink_log binfmt_misc nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling iwlmvm x86_pkg_te
mp_thermal intel_powerclamp coretemp kvm_intel mac80211 mei_gsc_proxy mei_gsc pmt_crashlog snd_hda_codec_hdmi xfs libarc4 snd_soc_avs
Dec 9 04:44:08 plex kernel: [73795.251368] snd_soc_hda_codec snd_hda_ext_core kvm xe i915 snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine irqbypass snd_hda_intel polyval_clmulni polyval_generic snd_intel_dspcfg ghash_clmulni
_intel snd_intel_sdw_acpi drm_gpuvm sha256_ssse3 snd_usb_audio btusb sha1_ssse3 snd_hda_codec gpu_sched snd_usbmidi_lib btrtl iwlwifi aesni_intel snd_ump drm_ttm_helper drm_buddy crypto_simd btintel snd_hda_core drm_exec snd_rawmid
i ttm cryptd btbcm drm_suballoc_helper snd_hwdep snd_seq_device cmdlinepart drm_display_helper snd_pcm mei_hdcp mei_pxp rapl btmtk spi_nor cec snd_timer intel_cstate asus_nb_wmi cfg80211 eeepc_wmi mei_me ee1004 mtd snd rc_core blue
tooth pcspkr wmi_bmof mxm_wmi nzxt_kraken3 soundcore mc mei i2c_algo_bit intel_pmc_core pmt_telemetry pmt_class intel_vsec acpi_pad acpi_tad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_ta
bles autofs4 btrfs blake2b_generic xor raid6_pq uas usb_storage hid_generic usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
Dec 9 04:44:08 plex kernel: [73795.251420] mfd_aaeon asus_wmi xhci_pci sparse_keymap nvme i2c_i801 platform_profile i2c_smbus spi_intel_pci intel_lpss_pci thunderbolt xhci_hcd nvme_core ahci spi_intel i2c_mux intel_lpss igc idma6
4 libahci nvme_auth video wmi pinctrl_tigerlake
Dec 9 04:44:08 plex kernel: [73795.251434] CPU: 15 UID: 0 PID: 1799710 Comm: kworker/u80:1 Tainted: P U W O 6.14.8-3-bpo12-pve #1
Dec 9 04:44:08 plex kernel: [73795.251437] Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [W]=WARN, [O]=OOT_MODULE
Dec 9 04:44:08 plex kernel: [73795.251437] Hardware name: ASUS System Product Name/ROG MAXIMUS XIII HERO, BIOS 2302 11/13/2024
Dec 9 04:44:08 plex kernel: [73795.251439] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
Dec 9 04:44:08 plex kernel: [73795.251447] RIP: 0010:xe_force_wake_get+0x2f7/0x320 [xe]
Dec 9 04:44:08 plex kernel: [73795.251504] Code: 4c 8b 3f 44 89 55 c8 89 4d d0 e8 44 85 27 d7 8b 4d d0 41 89 d9 4d 89 e0 48 89 c6 4c 89 fa 48 c7 c7 38 26 22 c2 e8 f9 7c 80 d6 <0f> 0b 44 8b 55 c8 e9 dd fe ff ff 48 8b 75 c0 48 8b 7d
b8 44 89 55
Dec 9 04:44:08 plex kernel: [73795.251506] RSP: 0018:ffff9eb5b6afbc38 EFLAGS: 00010246
Dec 9 04:44:08 plex kernel: [73795.251507] RAX: 0000000000000000 RBX: 0000000000001020 RCX: 0000000000000000
Dec 9 04:44:08 plex kernel: [73795.251508] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Dec 9 04:44:08 plex kernel: [73795.251509] RBP: ffff9eb5b6afbc90 R08: 0000000000000000 R09: 0000000000000000
Dec 9 04:44:08 plex kernel: [73795.251509] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc2215bb2
Dec 9 04:44:08 plex kernel: [73795.251510] R13: ffff90cf96f90078 R14: 0000000000010000 R15: ffff90cf8297e530
Dec 9 04:44:08 plex kernel: [73795.251511] FS: 0000000000000000(0000) GS:ffff90deca380000(0000) knlGS:0000000000000000
Dec 9 04:44:08 plex kernel: [73795.251512] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 9 04:44:08 plex kernel: [73795.251513] CR2: 000078768f960510 CR3: 0000000156531005 CR4: 00000000007726f0
Dec 9 04:44:08 plex kernel: [73795.251514] PKRU: 55555554
Dec 9 04:44:08 plex kernel: [73795.251515] Call Trace:
Dec 9 04:44:08 plex kernel: [73795.251516]
Dec 9 04:44:08 plex kernel: [73795.251519] xe_devcoredump+0x279/0x390 [xe]
Dec 9 04:44:08 plex kernel: [73795.251568] guc_exec_queue_timedout_job+0x17d/0xc40 [xe]
Dec 9 04:44:08 plex kernel: [73795.251623] ? queue_delayed_work_on+0x81/0x90
Dec 9 04:44:08 plex kernel: [73795.251627] ? __pfx_autoremove_wake_function+0x10/0x10
Dec 9 04:44:08 plex kernel: [73795.251630] drm_sched_job_timedout+0x70/0x110 [gpu_sched]
Dec 9 04:44:08 plex kernel: [73795.251634] process_one_work+0x178/0x3b0
Dec 9 04:44:08 plex kernel: [73795.251635] worker_thread+0x2b8/0x3e0
Dec 9 04:44:08 plex kernel: [73795.251637] ? __pfx_worker_thread+0x10/0x10
Dec 9 04:44:08 plex kernel: [73795.251639] kthread+0xfe/0x230
Dec 9 04:44:08 plex kernel: [73795.251641] ? __pfx_kthread+0x10/0x10
Dec 9 04:44:08 plex kernel: [73795.251643] ret_from_fork+0x44/0x70
Dec 9 04:44:08 plex kernel: [73795.251645] ? __pfx_kthread+0x10/0x10
Dec 9 04:44:08 plex kernel: [73795.251647] ret_from_fork_asm+0x1a/0x30
Dec 9 04:44:08 plex kernel: [73795.251650]
Dec 9 04:44:08 plex kernel: [73795.251651] ---[ end trace 0000000000000000 ]---
Dec 9 04:44:08 plex kernel: [73795.353103] xe 0000:03:00.0: [drm] Xe device coredump has been created
Dec 9 04:44:08 plex kernel: [73795.353107] xe 0000:03:00.0: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Dec 9 04:44:08 plex kernel: [73795.353109] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:08 plex kernel: [73795.353226] xe 0000:03:00.0: [drm] GT1: reset queued
Dec 9 04:44:08 plex kernel: [73795.353233] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:08 plex kernel: [73795.353289] xe 0000:03:00.0: [drm] GT1: reset started
Dec 9 04:44:08 plex kernel: [73795.403151] xe 0000:03:00.0: [drm] ERROR GT1: Force wake domain 5 failed to ack wake (-ETIMEDOUT) reg[0xd58] = 0x0
Dec 9 04:44:08 plex kernel: [73795.453165] xe 0000:03:00.0: [drm] ERROR GT1: Force wake domain 12 failed to ack wake (-ETIMEDOUT) reg[0xd74] = 0x0
Dec 9 04:44:08 plex kernel: [73795.453239] ------------[ cut here ]------------
Dec 9 04:44:08 plex kernel: [73795.453240] xe 0000:03:00.0: [drm] GT1: Forcewake domains 0x1020 failed to acknowledge awake request
Dec 9 04:44:08 plex kernel: [73795.453282] WARNING: CPU: 18 PID: 1799714 at drivers/gpu/drm/xe/xe_force_wake.c:211 xe_force_wake_get+0x2f7/0x320 [xe]
Dec 9 04:44:08 plex kernel: [73795.453410] Modules linked in: dm_snapshot cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs xt_tcpudp xt_mark nft_masq nft_nat nft_limit nft_reject_inet nf_rejec
t_ipv4 nf_reject_ipv6 nft_reject nft_ct wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nf_conntrack_netlink nfnetlink_acct udp_diag tcp_diag i
net_diag xt_conntrack xt_MASQUERADE xt_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat xfrm_user xfrm_algo overlay veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_fil
ter ip6_tables iptable_filter nf_tables 8021q garp mrp bonding tls sunrpc nfnetlink_log binfmt_misc nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling iwlmvm x86_pkg_te
mp_thermal intel_powerclamp coretemp kvm_intel mac80211 mei_gsc_proxy mei_gsc pmt_crashlog snd_hda_codec_hdmi xfs libarc4 snd_soc_avs
Dec 9 04:44:08 plex kernel: [73795.453456] snd_soc_hda_codec snd_hda_ext_core kvm xe i915 snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine irqbypass snd_hda_intel polyval_clmulni polyval_generic snd_intel_dspcfg ghash_clmulni
_intel snd_intel_sdw_acpi drm_gpuvm sha256_ssse3 snd_usb_audio btusb sha1_ssse3 snd_hda_codec gpu_sched snd_usbmidi_lib btrtl iwlwifi aesni_intel snd_ump drm_ttm_helper drm_buddy crypto_simd btintel snd_hda_core drm_exec snd_rawmid
i ttm cryptd btbcm drm_suballoc_helper snd_hwdep snd_seq_device cmdlinepart drm_display_helper snd_pcm mei_hdcp mei_pxp rapl btmtk spi_nor cec snd_timer intel_cstate asus_nb_wmi cfg80211 eeepc_wmi mei_me ee1004 mtd snd rc_core blue
tooth pcspkr wmi_bmof mxm_wmi nzxt_kraken3 soundcore mc mei i2c_algo_bit intel_pmc_core pmt_telemetry pmt_class intel_vsec acpi_pad acpi_tad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_ta
bles autofs4 btrfs blake2b_generic xor raid6_pq uas usb_storage hid_generic usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
Dec 9 04:44:08 plex kernel: [73795.453511] mfd_aaeon asus_wmi xhci_pci sparse_keymap nvme i2c_i801 platform_profile i2c_smbus spi_intel_pci intel_lpss_pci thunderbolt xhci_hcd nvme_core ahci spi_intel i2c_mux intel_lpss igc idma6
4 libahci nvme_auth video wmi pinctrl_tigerlake
Dec 9 04:44:08 plex kernel: [73795.453524] CPU: 18 UID: 0 PID: 1799714 Comm: kworker/u80:11 Tainted: P U W O 6.14.8-3-bpo12-pve #1
Dec 9 04:44:08 plex kernel: [73795.453527] Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [W]=WARN, [O]=OOT_MODULE
Dec 9 04:44:08 plex kernel: [73795.453528] Hardware name: ASUS System Product Name/ROG MAXIMUS XIII HERO, BIOS 2302 11/13/2024
Dec 9 04:44:08 plex kernel: [73795.453529] Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe]
Dec 9 04:44:08 plex kernel: [73795.453589] RIP: 0010:xe_force_wake_get+0x2f7/0x320 [xe]
Dec 9 04:44:08 plex kernel: [73795.453642] Code: 4c 8b 3f 44 89 55 c8 89 4d d0 e8 44 85 27 d7 8b 4d d0 41 89 d9 4d 89 e0 48 89 c6 4c 89 fa 48 c7 c7 38 26 22 c2 e8 f9 7c 80 d6 <0f> 0b 44 8b 55 c8 e9 dd fe ff ff 48 8b 75 c0 48 8b 7d
b8 44 89 55
Dec 9 04:44:08 plex kernel: [73795.453643] RSP: 0018:ffff9eb5b6ab3d90 EFLAGS: 00010246
Dec 9 04:44:08 plex kernel: [73795.453644] RAX: 0000000000000000 RBX: 0000000000001020 RCX: 0000000000000000
Dec 9 04:44:08 plex kernel: [73795.453645] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Dec 9 04:44:08 plex kernel: [73795.453646] RBP: ffff9eb5b6ab3de8 R08: 0000000000000000 R09: 0000000000000000
Dec 9 04:44:08 plex kernel: [73795.453647] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc2215bb2
Dec 9 04:44:08 plex kernel: [73795.453647] R13: ffff90cf96f90078 R14: 0000000000010000 R15: ffff90cf8297e530
Dec 9 04:44:08 plex kernel: [73795.453648] FS: 0000000000000000(0000) GS:ffff90deca500000(0000) knlGS:0000000000000000
Dec 9 04:44:08 plex kernel: [73795.453649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 9 04:44:08 plex kernel: [73795.453650] CR2: 0000766613aab800 CR3: 000000010c981004 CR4: 00000000007726f0
Dec 9 04:44:08 plex kernel: [73795.453651] PKRU: 55555554
Dec 9 04:44:08 plex kernel: [73795.453652] Call Trace:
Dec 9 04:44:08 plex kernel: [73795.453653]
Dec 9 04:44:08 plex kernel: [73795.453656] xe_devcoredump_deferred_snap_work+0x74/0x150 [xe]
Dec 9 04:44:08 plex kernel: [73795.453708] ? __pfx_xe_devcoredump_free+0x10/0x10 [xe]
Dec 9 04:44:08 plex kernel: [73795.453759] process_one_work+0x178/0x3b0
Dec 9 04:44:08 plex kernel: [73795.453764] worker_thread+0x2b8/0x3e0
Dec 9 04:44:08 plex kernel: [73795.453766] ? __pfx_worker_thread+0x10/0x10
Dec 9 04:44:08 plex kernel: [73795.453769] kthread+0xfe/0x230
Dec 9 04:44:08 plex kernel: [73795.453773] ? __pfx_kthread+0x10/0x10
Dec 9 04:44:08 plex kernel: [73795.453775] ret_from_fork+0x44/0x70
Dec 9 04:44:08 plex kernel: [73795.453779] ? __pfx_kthread+0x10/0x10
Dec 9 04:44:08 plex kernel: [73795.453781] ret_from_fork_asm+0x1a/0x30
Dec 9 04:44:08 plex kernel: [73795.453785]
Dec 9 04:44:08 plex kernel: [73795.453786] ---[ end trace 0000000000000000 ]---
Dec 9 04:44:08 plex kernel: [73795.453788] xe 0000:03:00.0: [drm] GT1: failed to get forcewake for coredump capture
Dec 9 04:44:08 plex kernel: [73795.503178] xe 0000:03:00.0: [drm] ERROR GT1: Force wake domain 5 failed to ack wake (-ETIMEDOUT) reg[0xd58] = 0x0
Dec 9 04:44:08 plex kernel: [73795.553191] xe 0000:03:00.0: [drm] ERROR GT1: Force wake domain 12 failed to ack wake (-ETIMEDOUT) reg[0xd74] = 0x0
Dec 9 04:44:08 plex kernel: [73795.553230] ------------[ cut here ]------------
Dec 9 04:44:08 plex kernel: [73795.553231] xe 0000:03:00.0: [drm] GT1: Forcewake domains 0x1020 failed to acknowledge awake request
Dec 9 04:44:08 plex kernel: [73795.553296] WARNING: CPU: 5 PID: 1799710 at drivers/gpu/drm/xe/xe_force_wake.c:211 xe_force_wake_get+0x2f7/0x320 [xe]
Dec 9 04:44:08 plex kernel: [73795.553431] Modules linked in: dm_snapshot cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs xt_tcpudp xt_mark nft_masq nft_nat nft_limit nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nf_conntrack_netlink nfnetlink_acct udp_diag tcp_diag i
net_diag xt_conntrack xt_MASQUERADE xt_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat xfrm_user xfrm_algo overlay veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_fil
ter ip6_tables iptable_filter nf_tables 8021q garp mrp bonding tls sunrpc nfnetlink_log binfmt_misc nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling iwlmvm x86_pkg_te
mp_thermal intel_powerclamp coretemp kvm_intel mac80211 mei_gsc_proxy mei_gsc pmt_crashlog snd_hda_codec_hdmi xfs libarc4 snd_soc_avs
Dec 9 04:44:08 plex kernel: [73795.553473] snd_soc_hda_codec snd_hda_ext_core kvm xe i915 snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine irqbypass snd_hda_intel polyval_clmulni polyval_generic snd_intel_dspcfg ghash_clmulni
_intel snd_intel_sdw_acpi drm_gpuvm sha256_ssse3 snd_usb_audio btusb sha1_ssse3 snd_hda_codec gpu_sched snd_usbmidi_lib btrtl iwlwifi aesni_intel snd_ump drm_ttm_helper drm_buddy crypto_simd btintel snd_hda_core drm_exec snd_rawmid
i ttm cryptd btbcm drm_suballoc_helper snd_hwdep snd_seq_device cmdlinepart drm_display_helper snd_pcm mei_hdcp mei_pxp rapl btmtk spi_nor cec snd_timer intel_cstate asus_nb_wmi cfg80211 eeepc_wmi mei_me ee1004 mtd snd rc_core blue
tooth pcspkr wmi_bmof mxm_wmi nzxt_kraken3 soundcore mc mei i2c_algo_bit intel_pmc_core pmt_telemetry pmt_class intel_vsec acpi_pad acpi_tad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_ta
bles autofs4 btrfs blake2b_generic xor raid6_pq uas usb_storage hid_generic usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
Dec 9 04:44:08 plex kernel: [73795.553528] mfd_aaeon asus_wmi xhci_pci sparse_keymap nvme i2c_i801 platform_profile i2c_smbus spi_intel_pci intel_lpss_pci thunderbolt xhci_hcd nvme_core ahci spi_intel i2c_mux intel_lpss igc idma6
4 libahci nvme_auth video wmi pinctrl_tigerlake
Dec 9 04:44:08 plex kernel: [73795.553543] CPU: 5 UID: 0 PID: 1799710 Comm: kworker/u80:1 Tainted: P U W O 6.14.8-3-bpo12-pve #1
Dec 9 04:44:08 plex kernel: [73795.553545] Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [W]=WARN, [O]=OOT_MODULE
Dec 9 04:44:08 plex kernel: [73795.553546] Hardware name: ASUS System Product Name/ROG MAXIMUS XIII HERO, BIOS 2302 11/13/2024
Dec 9 04:44:08 plex kernel: [73795.553548] Workqueue: gt-ordered-wq gt_reset_worker [xe]
Dec 9 04:44:08 plex kernel: [73795.553636] RIP: 0010:xe_force_wake_get+0x2f7/0x320 [xe]
Dec 9 04:44:08 plex kernel: [73795.553704] Code: 4c 8b 3f 44 89 55 c8 89 4d d0 e8 44 85 27 d7 8b 4d d0 41 89 d9 4d 89 e0 48 89 c6 4c 89 fa 48 c7 c7 38 26 22 c2 e8 f9 7c 80 d6 <0f> 0b 44 8b 55 c8 e9 dd fe ff ff 48 8b 75 c0 48 8b 7d
b8 44 89 55
Dec 9 04:44:08 plex kernel: [73795.553706] RSP: 0018:ffff9eb5b6afbd98 EFLAGS: 00010246
Dec 9 04:44:08 plex kernel: [73795.553708] RAX: 0000000000000000 RBX: 0000000000001020 RCX: 0000000000000000
Dec 9 04:44:08 plex kernel: [73795.553710] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Dec 9 04:44:08 plex kernel: [73795.553711] RBP: ffff9eb5b6afbdf0 R08: 0000000000000000 R09: 0000000000000000
Dec 9 04:44:08 plex kernel: [73795.553712] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc2215bb2
Dec 9 04:44:08 plex kernel: [73795.553714] R13: ffff90cf96f90078 R14: 0000000000010000 R15: ffff90cf8297e530
Dec 9 04:44:08 plex kernel: [73795.553715] FS: 0000000000000000(0000) GS:ffff90dec9e80000(0000) knlGS:0000000000000000
Dec 9 04:44:08 plex kernel: [73795.553717] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 9 04:44:08 plex kernel: [73795.553718] CR2: 0000146924fab4b0 CR3: 0000000156531002 CR4: 00000000007726f0
Dec 9 04:44:08 plex kernel: [73795.553719] PKRU: 55555554
Dec 9 04:44:08 plex kernel: [73795.553721] Call Trace:
Dec 9 04:44:08 plex kernel: [73795.553722]
Dec 9 04:44:08 plex kernel: [73795.553726] gt_reset_worker+0x99/0x1e0 [xe]
Dec 9 04:44:08 plex kernel: [73795.553794] ? wake_up_process+0x15/0x30
Dec 9 04:44:08 plex kernel: [73795.553798] process_one_work+0x178/0x3b0
Dec 9 04:44:08 plex kernel: [73795.553801] worker_thread+0x2b8/0x3e0
Dec 9 04:44:08 plex kernel: [73795.553803] ? __pfx_worker_thread+0x10/0x10
Dec 9 04:44:08 plex kernel: [73795.553805] kthread+0xfe/0x230
Dec 9 04:44:08 plex kernel: [73795.553808] ? __pfx_kthread+0x10/0x10
Dec 9 04:44:08 plex kernel: [73795.553810] ret_from_fork+0x44/0x70
Dec 9 04:44:08 plex kernel: [73795.553814] ? __pfx_kthread+0x10/0x10
Dec 9 04:44:08 plex kernel: [73795.553816] ret_from_fork_asm+0x1a/0x30
Dec 9 04:44:08 plex kernel: [73795.553820]
Dec 9 04:44:08 plex kernel: [73795.553821] ---[ end trace 0000000000000000 ]---
Dec 9 04:44:08 plex kernel: [73795.563024] xe 0000:03:00.0: [drm] GT1: GuC PC start taking longer than normal [freq = 1500MHz (req = 1500MHz), perf_limit_reasons = 0x01050000]
Dec 9 04:44:10 plex kernel: [73797.506044] xe 0000:03:00.0: [drm] ERROR GT1: GuC PC Start failed: Dynamic GT frequency control and GT sleep states are now disabled.
Dec 9 04:44:10 plex kernel: [73797.506052] ------------[ cut here ]------------
Dec 9 04:44:10 plex kernel: [73797.506052] xe 0000:03:00.0: [drm] GT1: Failed to start GuC PC: -EIO
Dec 9 04:44:10 plex kernel: [73797.506084] WARNING: CPU: 13 PID: 1799710 at drivers/gpu/drm/xe/xe_guc.c:1493 xe_guc_start+0x98/0xa0 [xe]
Dec 9 04:44:10 plex kernel: [73797.506169] Modules linked in: dm_snapshot cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs xt_tcpudp xt_mark nft_masq nft_nat nft_limit nft_reject_inet nf_rejec
t_ipv4 nf_reject_ipv6 nft_reject nft_ct wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nf_conntrack_netlink nfnetlink_acct udp_diag tcp_diag i
net_diag xt_conntrack xt_MASQUERADE xt_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat xfrm_user xfrm_algo overlay veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_fil
ter ip6_tables iptable_filter nf_tables 8021q garp mrp bonding tls sunrpc nfnetlink_log binfmt_misc nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling iwlmvm x86_pkg_te
mp_thermal intel_powerclamp coretemp kvm_intel mac80211 mei_gsc_proxy mei_gsc pmt_crashlog snd_hda_codec_hdmi xfs libarc4 snd_soc_avs
Dec 9 04:44:10 plex kernel: [73797.506208] snd_soc_hda_codec snd_hda_ext_core kvm xe i915 snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine irqbypass snd_hda_intel polyval_clmulni polyval_generic snd_intel_dspcfg ghash_clmulni
_intel snd_intel_sdw_acpi drm_gpuvm sha256_ssse3 snd_usb_audio btusb sha1_ssse3 snd_hda_codec gpu_sched snd_usbmidi_lib btrtl iwlwifi aesni_intel snd_ump drm_ttm_helper drm_buddy crypto_simd btintel snd_hda_core drm_exec snd_rawmid
i ttm cryptd btbcm drm_suballoc_helper snd_hwdep snd_seq_device cmdlinepart drm_display_helper snd_pcm mei_hdcp mei_pxp rapl btmtk spi_nor cec snd_timer intel_cstate asus_nb_wmi cfg80211 eeepc_wmi mei_me ee1004 mtd snd rc_core blue
tooth pcspkr wmi_bmof mxm_wmi nzxt_kraken3 soundcore mc mei i2c_algo_bit intel_pmc_core pmt_telemetry pmt_class intel_vsec acpi_pad acpi_tad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_ta
bles autofs4 btrfs blake2b_generic xor raid6_pq uas usb_storage hid_generic usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
Dec 9 04:44:10 plex kernel: [73797.506257] mfd_aaeon asus_wmi xhci_pci sparse_keymap nvme i2c_i801 platform_profile i2c_smbus spi_intel_pci intel_lpss_pci thunderbolt xhci_hcd nvme_core ahci spi_intel i2c_mux intel_lpss igc idma6
4 libahci nvme_auth video wmi pinctrl_tigerlake
Dec 9 04:44:10 plex kernel: [73797.506270] CPU: 13 UID: 0 PID: 1799710 Comm: kworker/u80:1 Tainted: P U W O 6.14.8-3-bpo12-pve #1
Dec 9 04:44:10 plex kernel: [73797.506273] Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [W]=WARN, [O]=OOT_MODULE
Dec 9 04:44:10 plex kernel: [73797.506274] Hardware name: ASUS System Product Name/ROG MAXIMUS XIII HERO, BIOS 2302 11/13/2024
Dec 9 04:44:10 plex kernel: [73797.506275] Workqueue: gt-ordered-wq gt_reset_worker [xe]
Dec 9 04:44:10 plex kernel: [73797.506325] RIP: 0010:xe_guc_start+0x98/0xa0 [xe]
Dec 9 04:44:10 plex kernel: [73797.506376] Code: 08 4c 8b 6f 50 4d 85 ed 75 03 4c 8b 2f e8 e0 aa 26 d7 4d 89 e0 44 89 f1 4c 89 ea 48 89 c6 48 c7 c7 f8 3e 22 c2 e8 98 a2 7f d6 <0f> 0b eb 8a 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90
90 90 90 90
Dec 9 04:44:10 plex kernel: [73797.506378] RSP: 0018:ffff9eb5b6afbdc0 EFLAGS: 00010246
Dec 9 04:44:10 plex kernel: [73797.506379] RAX: 0000000000000000 RBX: ffff90cf96f90d60 RCX: 0000000000000000
Dec 9 04:44:10 plex kernel: [73797.506380] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Dec 9 04:44:10 plex kernel: [73797.506381] RBP: ffff9eb5b6afbde0 R08: 0000000000000000 R09: 0000000000000000
Dec 9 04:44:10 plex kernel: [73797.506381] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffffb
Dec 9 04:44:10 plex kernel: [73797.506382] R13: ffff90cf8297e530 R14: 0000000000000001 R15: ffff90cf96f90028
Dec 9 04:44:10 plex kernel: [73797.506383] FS: 0000000000000000(0000) GS:ffff90deca280000(0000) knlGS:0000000000000000
Dec 9 04:44:10 plex kernel: [73797.506384] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 9 04:44:10 plex kernel: [73797.506385] CR2: 00001487954da9d0 CR3: 00000001947c7005 CR4: 00000000007726f0
Dec 9 04:44:10 plex kernel: [73797.506386] PKRU: 55555554
Dec 9 04:44:10 plex kernel: [73797.506387] Call Trace:
Dec 9 04:44:10 plex kernel: [73797.506388]
Dec 9 04:44:10 plex kernel: [73797.506391] xe_uc_start+0x2c/0x40 [xe]
Dec 9 04:44:10 plex kernel: [73797.506457] gt_reset_worker+0xb2/0x1e0 [xe]
Dec 9 04:44:10 plex kernel: [73797.506506] ? wake_up_process+0x15/0x30
Dec 9 04:44:10 plex kernel: [73797.506510] process_one_work+0x178/0x3b0
Dec 9 04:44:10 plex kernel: [73797.506513] worker_thread+0x2b8/0x3e0
Dec 9 04:44:10 plex kernel: [73797.506515] ? __pfx_worker_thread+0x10/0x10
Dec 9 04:44:10 plex kernel: [73797.506516] kthread+0xfe/0x230
Dec 9 04:44:10 plex kernel: [73797.506518] ? __pfx_kthread+0x10/0x10
Dec 9 04:44:10 plex kernel: [73797.506520] ret_from_fork+0x44/0x70
Dec 9 04:44:10 plex kernel: [73797.506522] ? __pfx_kthread+0x10/0x10
Dec 9 04:44:10 plex kernel: [73797.506524] ret_from_fork_asm+0x1a/0x30
Dec 9 04:44:10 plex kernel: [73797.506527]
Dec 9 04:44:10 plex kernel: [73797.506528] ---[ end trace 0000000000000000 ]---
Dec 9 04:44:10 plex kernel: [73797.506539] xe 0000:03:00.0: [drm] ERROR GT1: reset failed (-ETIMEDOUT)
Dec 9 04:44:10 plex kernel: [73797.506542] xe 0000:03:00.0: [drm] ERROR CRITICAL: Xe has declared device 0000:03:00.0 as wedged.
Dec 9 04:44:10 plex kernel: [73797.506542] IOCTLs and executions are blocked. Only a rebind may clear the failure
Dec 9 04:44:10 plex kernel: [73797.506542] Please file a new bug report at Making sure you're not a bot!
Dec 9 04:44:10 plex kernel: [73797.506550] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.506610] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.506693] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.506780] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.506859] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.506914] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.506967] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.507021] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.507087] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.507170] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.507239] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.507294] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
Dec 9 04:44:10 plex kernel: [73797.507348] xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
```

Could someone with more Linux administration experience tell me whether I’m reading these logs correctly? My guess is that the GPU is in some sort of power-saving mode and the machine is failing to wake it up. It seems to retry a few times before eventually reporting that it failed. Is this assumption correct?
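
One way to sanity-check that reading is to reduce the excerpt to the first occurrence of each stage and look at the order. The sketch below does that. The stage labels, `summarize` and `domains_in_mask` are illustrative names rather than anything the xe driver provides, and the match strings are simply fragments copied from the messages above. It orders events only and says nothing about the root cause.

```python
import re

# Fragments copied from the log above, each mapped to a short label.
STAGES = [
    ("failed to ack wake", "forcewake timeout"),
    ("reset started", "GT reset started"),
    ("GuC PC Start failed", "GuC power control failed to start"),
    ("reset failed", "GT reset failed"),
    ("declared device", "device wedged"),
]

# Kernel timestamp in seconds since boot, e.g. "[73790.080952]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def summarize(lines):
    """Return (timestamp, label) pairs for the first hit of each stage."""
    first_seen = {}
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue
        for needle, label in STAGES:
            if needle in line and label not in first_seen:
                first_seen[label] = float(match.group(1))
    return sorted((ts, label) for label, ts in first_seen.items())


def domains_in_mask(mask):
    """List the bit positions that are set in a forcewake domain mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]
```

Run over the excerpt, this puts the forcewake timeouts first (73790.08), then the reset attempt, the GuC PC start failure, the failed reset and finally the wedged device (73797.51). `domains_in_mask(0x1020)` gives bits 5 and 12, which lines up with the "domain 5" and "domain 12" messages that precede each "Forcewake domains 0x1020" warning.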

I’ve disabled C-states in the BIOS. I’m not sure what else to check, to be honest. Any assistance would be appreciated so that I don’t have to force a shutdown with the power button when this happens (Proxmox gives an error of “unable to start journal service” when I try to reboot it via the GUI or console).
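
The wedged message in the log says "Only a rebind may clear the failure". On Linux, a PCI driver can generally be unbound and rebound through sysfs, so a rebind attempt might look like the untested sketch below. Everything here is an assumption to verify on the actual host: that the driver directory is named `xe`, that `0000:03:00.0` (taken from the log) is still the GPU's address, that it runs as root on the Proxmox host with the Plex container stopped, and that the card responds at all in this state. `rebind_gpu` is a hypothetical helper name.

```python
from pathlib import Path


def rebind_gpu(pci_address, driver="xe", sysfs_root="/sys"):
    """Unbind and rebind a PCI device from its driver via sysfs.

    sysfs_root is a parameter only so the function can be tried
    against a scratch directory before touching the real /sys.
    """
    driver_dir = Path(sysfs_root) / "bus" / "pci" / "drivers" / driver
    # Writing the PCI address to "unbind" detaches the device...
    (driver_dir / "unbind").write_text(pci_address)
    # ...and writing it to "bind" asks the driver to attach again.
    (driver_dir / "bind").write_text(pci_address)
```

A call would be `rebind_gpu("0000:03:00.0")`. If the hardware itself has stopped responding, the write may hang or fail, in which case a reboot is probably still the only way out.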

This is the xe_guc part of the kernel's xe driver failing.
Every driver of this class has a GuC and a HuC component.
In this case, the GuC part of the driver is failing.

You need to submit this to Proxmox tech support.

Reaching out to them now. Thank you so much for your help.