illwieckz

AMD Radeon PRO W7600 GPU reset on Linux 6.2.0-33-generic on Ubuntu 23.04

Sep 19th, 2023
$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 23.04
Release: 23.04
Codename: lunar

$ uname -a
Linux gollum 6.2.0-33-generic #33-Ubuntu SMP PREEMPT_DYNAMIC Tue Sep 5 14:49:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ printf '0300 0302 0380' \
| xargs -d' ' -I{} lspci -d ::{} -nn -mm -k -v
Slot: 83:00.0
Class: VGA compatible controller [0300]
Vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]
Device: Navi 33 [Radeon RX 7700S/7600S] [7480]
SVendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]
SDevice: Device [0e0d]
ProgIf: 00
Driver: amdgpu
Module: amdgpu
IOMMUGroup: 38

$ glxinfo -B
name of display: :1
display: :1 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: AMD (0x1002)
    Device: AMD Radeon PRO W7600 (gfx1102, LLVM 15.0.7, DRM 3.49, 6.2.0-33-generic) (0x7480)
    Version: 23.0.4
    Accelerated: yes
    Video memory: 8192MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 4821 MB, largest block: 4821 MB
    VBO free aux. memory - total: 128714 MB, largest block: 128714 MB
    Texture free memory - total: 4821 MB, largest block: 4821 MB
    Texture free aux. memory - total: 128714 MB, largest block: 128714 MB
    Renderbuffer free memory - total: 4821 MB, largest block: 4821 MB
    Renderbuffer free aux. memory - total: 128714 MB, largest block: 128714 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 8192 MB
    Total available memory: 136967 MB
    Currently available dedicated video memory: 4821 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon PRO W7600 (gfx1102, LLVM 15.0.7, DRM 3.49, 6.2.0-33-generic)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 23.0.4-0ubuntu1~23.04.1
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.0.4-0ubuntu1~23.04.1
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 23.0.4-0ubuntu1~23.04.1
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

$ dmesg
[ 1122.419148] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=612309, emitted seq=612311
[ 1122.419618] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 8127 thread Xorg:cs0 pid 8130
[ 1122.420062] amdgpu 0000:83:00.0: amdgpu: GPU reset begin!
[ 1123.433117] amdgpu 0000:83:00.0: amdgpu: IP block:gfx_v11_0 is hung!
[ 1123.433686] amdgpu 0000:83:00.0: amdgpu: soft reset failed, will fallback to full reset!
[ 1123.901853] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1123.902350] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 1124.016509] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1124.016913] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 1124.130354] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1124.130754] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 1124.244196] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1124.244597] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 1124.358133] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1124.358539] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 1124.472073] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1124.472476] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 1124.586048] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1124.586450] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 1124.699986] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1124.700388] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 1124.813921] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1124.814323] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 1125.080283] [drm:gfx_v11_0_cp_gfx_enable.isra.0 [amdgpu]] *ERROR* failed to halt cp gfx
[ 1125.127227] amdgpu 0000:83:00.0: amdgpu: MODE1 reset
[ 1125.127233] amdgpu 0000:83:00.0: amdgpu: GPU mode1 reset
[ 1125.127340] amdgpu 0000:83:00.0: amdgpu: GPU smu mode1 reset
[ 1125.641572] amdgpu 0000:83:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 1125.642114] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[ 1125.642198] [drm] VRAM is lost due to GPU reset!
[ 1125.642200] [drm] PSP is resuming...
[ 1125.698760] [drm] reserve 0x1300000 from 0x81fc000000 for PSP TMR
[ 1125.789615] amdgpu 0000:83:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 1125.797105] amdgpu 0000:83:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 1125.797108] amdgpu 0000:83:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 1125.797111] amdgpu 0000:83:00.0: amdgpu: SMU is resuming...
[ 1125.797118] amdgpu 0000:83:00.0: amdgpu: smu driver if version = 0x00000037, smu fw if version = 0x00000035, smu fw program = 0, smu fw version = 0x00523c00 (82.60.0)
[ 1125.797123] amdgpu 0000:83:00.0: amdgpu: SMU driver if version not matched
[ 1125.838562] amdgpu 0000:83:00.0: amdgpu: SMU is resumed successfully!
[ 1125.840459] [drm] DMUB hardware initialized: version=0x07000C00
[ 1125.845131] [drm] REG_WAIT timeout 1us * 1000 tries - dcn32_dsc_pg_control line:91
[ 1125.847906] [drm] REG_WAIT timeout 1us * 1000 tries - dcn32_dsc_pg_control line:99
[ 1125.850679] [drm] REG_WAIT timeout 1us * 1000 tries - dcn32_dsc_pg_control line:107
[ 1125.853450] [drm] REG_WAIT timeout 1us * 1000 tries - dcn32_dsc_pg_control line:115
[ 1125.860729] [drm] REG_WAIT timeout 1us * 1000 tries - dcn32_dsc_pg_control line:91
[ 1125.863468] [drm] REG_WAIT timeout 1us * 1000 tries - dcn32_dsc_pg_control line:99
[ 1125.866202] [drm] REG_WAIT timeout 1us * 1000 tries - dcn32_dsc_pg_control line:107
[ 1125.868933] [drm] REG_WAIT timeout 1us * 1000 tries - dcn32_dsc_pg_control line:115
[ 1126.036678] [drm] kiq ring mec 3 pipe 1 q 0
[ 1126.041447] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 1126.041793] amdgpu 0000:83:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 1126.042631] amdgpu 0000:83:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 1126.042634] amdgpu 0000:83:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 1126.042636] amdgpu 0000:83:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 1126.042638] amdgpu 0000:83:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 1126.042640] amdgpu 0000:83:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 1126.042641] amdgpu 0000:83:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 1126.042643] amdgpu 0000:83:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 1126.042645] amdgpu 0000:83:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 1126.042647] amdgpu 0000:83:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 1126.042648] amdgpu 0000:83:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 1126.042650] amdgpu 0000:83:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 1126.042652] amdgpu 0000:83:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 1
[ 1126.042654] amdgpu 0000:83:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 1
[ 1126.042655] amdgpu 0000:83:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
[ 1126.044961] amdgpu 0000:83:00.0: amdgpu: recover vram bo from shadow start
[ 1126.055608] amdgpu 0000:83:00.0: amdgpu: recover vram bo from shadow done
[ 1126.055660] [drm] Skip scheduling IBs!
[ 1126.055672] [drm] Skip scheduling IBs!
[ 1126.055676] [drm] Skip scheduling IBs!
[ 1126.055682] [drm] Skip scheduling IBs!
[ 1126.055689] [drm] Skip scheduling IBs!
[ 1126.055694] [drm] Skip scheduling IBs!
[ 1126.055716] [drm] Skip scheduling IBs!
[ 1126.055721] [drm] Skip scheduling IBs!
[ 1126.055725] [drm] Skip scheduling IBs!
[ 1126.055732] [drm] Skip scheduling IBs!
[ 1126.055736] [drm] Skip scheduling IBs!
[ 1126.055741] [drm] Skip scheduling IBs!
[ 1126.055745] [drm] Skip scheduling IBs!
[ 1126.055750] [drm] Skip scheduling IBs!
[ 1126.055756] [drm] Skip scheduling IBs!
[ 1126.055760] [drm] Skip scheduling IBs!
[ 1126.055765] [drm] Skip scheduling IBs!
[ 1126.055771] [drm] Skip scheduling IBs!
[ 1126.055778] [drm] Skip scheduling IBs!
[ 1126.055784] [drm] Skip scheduling IBs!
[ 1126.055789] [drm] Skip scheduling IBs!
[ 1126.055795] [drm] Skip scheduling IBs!
[ 1126.055801] [drm] Skip scheduling IBs!
[ 1126.055807] [drm] Skip scheduling IBs!
[ 1126.056426] [drm] ring gfx_32790.1.1 was added
[ 1126.056971] [drm] ring compute_32790.2.2 was added
[ 1126.057517] [drm] ring sdma_32790.3.3 was added
[ 1126.057530] [drm] ring gfx_32790.1.1 test pass
[ 1126.057553] [drm] ring gfx_32790.1.1 ib test pass
[ 1126.057560] [drm] ring compute_32790.2.2 test pass
[ 1126.057576] [drm] ring compute_32790.2.2 ib test pass
[ 1126.057648] [drm] ring sdma_32790.3.3 test pass
[ 1126.057672] [drm] ring sdma_32790.3.3 ib test pass
[ 1126.058468] amdgpu 0000:83:00.0: amdgpu: GPU reset(2) succeeded!
[ 1126.068921] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 1126.148606] rfkill: input handler enabled
[ 1128.710038] rfkill: input handler disabled
[ 1135.053864] ------------[ cut here ]------------
[ 1135.053867] WARNING: CPU: 11 PID: 45486 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:600 amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 1135.054341] Modules linked in: snd_seq_dummy snd_hrtimer vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nvme_fabrics bridge stp llc binfmt_misc ipmi_ssif nls_iso8859_1 snd_hda_codec_hdmi snd_usb_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core intel_rapl_common snd_usbmidi_lib snd_hwdep amd64_edac edac_mce_amd mc snd_pcm kvm_amd snd_seq_midi snd_seq_midi_event kvm snd_rawmidi intel_wmi_thunderbolt irqbypass wmi_bmof rapl snd_seq snd_seq_device acpi_ipmi snd_timer ipmi_si joydev snd input_leds ccp ipmi_devintf soundcore ipmi_msghandler k10temp mac_hid vhba(OE) ee1004 at24 hwmon_vid msr parport_pc ppdev lp parport efi_pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid uas usb_storage amdgpu iommu_v2 drm_buddy gpu_sched
[ 1135.054432] drm_ttm_helper ttm drm_display_helper ast cec crct10dif_pclmul crc32_pclmul drm_shmem_helper rc_core polyval_clmulni polyval_generic ghash_clmulni_intel drm_kms_helper sha512_ssse3 aesni_intel syscopyarea sysfillrect bcache ixgbe sysimgblt crypto_simd cryptd igb xfrm_algo drm nvme ahci dca libahci i2c_algo_bit mdio video nvme_core xhci_pci xhci_pci_renesas i2c_piix4 nvme_common wmi
[ 1135.054471] CPU: 11 PID: 45486 Comm: kworker/11:5 Tainted: G OE 6.2.0-33-generic #33-Ubuntu
[ 1135.054476] Hardware name: Default string Default string/Default string, BIOS WRX80PRO-F1 08/04/2022
[ 1135.054478] Workqueue: events drm_mode_rmfb_work_fn [drm]
[ 1135.054542] RIP: 0010:amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 1135.054997] Code: 31 f6 31 ff e9 f2 fa fb c2 44 89 e2 48 89 de 4c 89 f7 e8 94 fc ff ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff e9 d1 fa fb c2 <0f> 0b b8 ea ff ff ff eb c3 b8 fe ff ff ff eb bc 90 90 90 90 90 90
[ 1135.055001] RSP: 0018:ffff98686b10f878 EFLAGS: 00010046
[ 1135.055004] RAX: 0000000000000000 RBX: ffff891959f865b8 RCX: 0000000000000000
[ 1135.055006] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1135.055008] RBP: ffff98686b10f898 R08: 0000000000000000 R09: 0000000000000000
[ 1135.055010] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1135.055011] R13: 0000000000000001 R14: ffff891959f80000 R15: ffff891d9c9a1000
[ 1135.055013] FS: 0000000000000000(0000) GS:ffff89573d4c0000(0000) knlGS:0000000000000000
[ 1135.055016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1135.055018] CR2: 0000563036e359a0 CR3: 00000002723b6000 CR4: 0000000000350ee0
[ 1135.055021] Call Trace:
[ 1135.055024] <TASK>
[ 1135.055027] ? show_regs+0x6d/0x80
[ 1135.055033] ? __warn+0x89/0x160
[ 1135.055039] ? amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 1135.055486] ? report_bug+0x17e/0x1b0
[ 1135.055493] ? handle_bug+0x46/0x90
[ 1135.055499] ? exc_invalid_op+0x18/0x80
[ 1135.055503] ? asm_exc_invalid_op+0x1b/0x20
[ 1135.055509] ? amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 1135.055953] ? amdgpu_irq_put+0x55/0xb0 [amdgpu]
[ 1135.056395] dm_set_vblank+0x195/0x1c0 [amdgpu]
[ 1135.056966] dm_disable_vblank+0x10/0x20 [amdgpu]
[ 1135.057535] drm_vblank_disable_and_save+0xe2/0x120 [drm]
[ 1135.057593] drm_crtc_vblank_off+0xe0/0x290 [drm]
[ 1135.057647] manage_dm_interrupts+0xa9/0xd0 [amdgpu]
[ 1135.058214] amdgpu_dm_atomic_commit_tail+0x161/0x13e0 [amdgpu]
[ 1135.058779] ? __kmem_cache_alloc_node+0x19d/0x340
[ 1135.058785] ? dcn32_validate_bandwidth+0x91/0x3d0 [amdgpu]
[ 1135.059618] rfkill: input handler enabled
[ 1135.059467] ? dcn32_validate_bandwidth+0x1a4/0x3d0 [amdgpu]
[ 1135.060088] ? kfree+0x78/0x120
[ 1135.060093] ? dcn32_validate_bandwidth+0x1a4/0x3d0 [amdgpu]
[ 1135.060701] ? dc_validate_global_state.part.0+0x305/0x4c0 [amdgpu]
[ 1135.061314] ? drm_modeset_lock_all_ctx+0x1ad/0x1d0 [drm]
[ 1135.061375] ? __kmem_cache_alloc_node+0x19d/0x340
[ 1135.061381] ? drm_dp_mst_atomic_setup_commit+0x8a/0x1d0 [drm_display_helper]
[ 1135.061403] ? drm_dp_mst_atomic_setup_commit+0x8a/0x1d0 [drm_display_helper]
[ 1135.061423] ? wait_for_completion_timeout+0x119/0x150
[ 1135.061427] ? drm_dp_mst_atomic_setup_commit+0x8a/0x1d0 [drm_display_helper]
[ 1135.061448] commit_tail+0xc2/0x190 [drm_kms_helper]
[ 1135.061474] ? drm_atomic_helper_swap_state+0x246/0x380 [drm_kms_helper]
[ 1135.061498] drm_atomic_helper_commit+0x11d/0x150 [drm_kms_helper]
[ 1135.061525] drm_atomic_commit+0x99/0xd0 [drm]
[ 1135.061593] ? __pfx___drm_printfn_info+0x10/0x10 [drm]
[ 1135.061662] atomic_remove_fb+0x2fd/0x380 [drm]
[ 1135.061738] drm_framebuffer_remove+0x6b/0x1f0 [drm]
[ 1135.061818] drm_mode_rmfb_work_fn+0x6f/0xa0 [drm]
[ 1135.061887] process_one_work+0x225/0x430
[ 1135.061894] worker_thread+0x1f6/0x3e0
[ 1135.061899] ? __pfx_worker_thread+0x10/0x10
[ 1135.061904] kthread+0xe9/0x110
[ 1135.061909] ? __pfx_kthread+0x10/0x10
[ 1135.061915] ret_from_fork+0x2c/0x50
[ 1135.061923] </TASK>
[ 1135.061925] ---[ end trace 0000000000000000 ]---
[ 1137.199278] rfkill: input handler disabled