summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2023-01-25drm/amdgpu: declare firmware for new MES 11.0.4Li Ma1-0/+2
To support new mes ip block Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25drm/amdgpu: enable imu firmware for GC 11.0.4Li Ma1-0/+1
The GC 11.0.4 needs load IMU to power up GFX before loads GFX firmware. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25drm/amdgpu: remove unconditional trap enable on add gfx11 queuesJonathan Kim1-1/+0
Rebase of driver has incorrect unconditional trap enablement for GFX11 when adding mes queues. Reported-by: Graham Sider <graham.sider@amd.com> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Graham Sider <graham.sider@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-01-18drm/amdgpu: allow multipipe policy on ASICs with one MECLang Yu1-0/+3
Always enable multipipe policy on ASICs with GC VERSION > 9.0.0 instead of MEC number > 1. This will allow multipipe policy on ASICs with one MEC, e.g., gfx11 APUs. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-01-18drm/amdgpu: correct MEC number for gfx11 APUsLang Yu1-2/+9
There is only one MEC on these APUs. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-01-18drm/amdgpu: fix amdgpu_job_free_resources v2Christian König1-2/+8
It can be that neither fence were initialized when we run out of UVD streams for example. v2: fix typo breaking compile Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2324 Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-01-18drm/amdgpu: fix cleaning up reserved VMID on releaseChristian König1-0/+1
We need to reset this or otherwise run into list corruption later on. Fixes: e44a0fe630c5 ("drm/amdgpu: rework reserved VMID handling") Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Candice Li <candice.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-13Merge tag 'amd-drm-fixes-6.2-2023-01-11' of ↵Dave Airlie5-10/+13
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.2-2023-01-11: amdgpu: - SMU13 fan speed fix - SMU13 fix power cap handling - SMU13 BACO fix - Fix a possible segfault in bo validation error case - Delay removal of firmware framebuffer - Fix error when unloading amdkfd: - SVM fix when clearing vram - GC11 fix for multi-GPU Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230112033004.8184-1-alexander.deucher@amd.com
2023-01-13Merge tag 'drm-misc-fixes-2023-01-12' of ↵Dave Airlie2-18/+37
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Several fixes for amdgpu (all addressing issues with fences), yet another orientation quirk for a Lenovo device, a use-after-free fix for virtio, a regression fix in TTM and a performance regression in drm buddy. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20230112130954.pxt77g3a7rokha42@houat
2023-01-10drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPUEric Huang1-1/+1
The point bo->kfd_bo is NULL for queue's write pointer BO when creating queue on mGPU. To avoid using the pointer fixes the error. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10drm/amdgpu: fix pipeline sync v2Christian König1-16/+30
This fixes a potential memory leak of dma_fence objects in the CS code as well as glitches in firefox because of missing pipeline sync. v2: use the scheduler instead of the fence context Signed-off-by: Christian König <christian.koenig@amd.com> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2323 Tested-by: Michal Kubecek mkubecek@suse.cz Tested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230109130120.73389-1-christian.koenig@amd.com
2023-01-09drm/amdgpu: Fixed bug on error when unloading amdgpuYiPeng Chai1-1/+1
Fixed bug on error when unloading amdgpu. The error message is as follows: [ 377.706202] kernel BUG at drivers/gpu/drm/drm_buddy.c:278! [ 377.706215] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 377.706222] CPU: 4 PID: 8610 Comm: modprobe Tainted: G IOE 6.0.0-thomas #1 [ 377.706231] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021 [ 377.706238] RIP: 0010:drm_buddy_free_block+0x26/0x30 [drm_buddy] [ 377.706264] Code: 00 00 00 90 0f 1f 44 00 00 48 8b 0e 89 c8 25 00 0c 00 00 3d 00 04 00 00 75 10 48 8b 47 18 48 d3 e0 48 01 47 28 e9 fa fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 48 89 f5 53 [ 377.706282] RSP: 0018:ffffad2dc4683cb8 EFLAGS: 00010287 [ 377.706289] RAX: 0000000000000000 RBX: ffff8b1743bd5138 RCX: 0000000000000000 [ 377.706297] RDX: ffff8b1743bd5160 RSI: ffff8b1743bd5c78 RDI: ffff8b16d1b25f70 [ 377.706304] RBP: ffff8b1743bd59e0 R08: 0000000000000001 R09: 0000000000000001 [ 377.706311] R10: ffff8b16c8572400 R11: ffffad2dc4683cf0 R12: ffff8b16d1b25f70 [ 377.706318] R13: ffff8b16d1b25fd0 R14: ffff8b1743bd59c0 R15: ffff8b16d1b25f70 [ 377.706325] FS: 00007fec56c72c40(0000) GS:ffff8b1836500000(0000) knlGS:0000000000000000 [ 377.706334] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 377.706340] CR2: 00007f9b88c1ba50 CR3: 0000000110450004 CR4: 00000000003706e0 [ 377.706347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 377.706354] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 377.706361] Call Trace: [ 377.706365] <TASK> [ 377.706369] drm_buddy_free_list+0x2a/0x60 [drm_buddy] [ 377.706376] amdgpu_vram_mgr_fini+0xea/0x180 [amdgpu] [ 377.706572] amdgpu_ttm_fini+0x12e/0x1a0 [amdgpu] [ 377.706650] amdgpu_bo_fini+0x22/0x90 [amdgpu] [ 377.706727] gmc_v11_0_sw_fini+0x26/0x30 [amdgpu] [ 377.706821] amdgpu_device_fini_sw+0xa1/0x3c0 [amdgpu] [ 377.706897] amdgpu_driver_release_kms+0x12/0x30 [amdgpu] [ 377.706975] drm_dev_release+0x20/0x40 [drm] [ 377.707006] release_nodes+0x35/0xb0 [ 377.707014] devres_release_all+0x8b/0xc0 [ 377.707020] device_unbind_cleanup+0xe/0x70 [ 377.707027] device_release_driver_internal+0xee/0x160 [ 377.707033] driver_detach+0x44/0x90 [ 377.707039] bus_remove_driver+0x55/0xe0 [ 377.707045] pci_unregister_driver+0x3b/0x90 [ 377.707052] amdgpu_exit+0x11/0x6c [amdgpu] [ 377.707194] __x64_sys_delete_module+0x142/0x2b0 [ 377.707201] ? fpregs_assert_state_consistent+0x22/0x50 [ 377.707208] ? exit_to_user_mode_prepare+0x3e/0x190 [ 377.707215] do_syscall_64+0x38/0x90 [ 377.707221] entry_SYSCALL_64_after_hwframe+0x63/0xcd Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-01-09drm/amd: Delay removal of the firmware framebufferMario Limonciello2-6/+8
Removing the firmware framebuffer from the driver means that even if the driver doesn't support the IP blocks in a GPU it will no longer be functional after the driver fails to initialize. This change will ensure that unsupported IP blocks at least cause the driver to work with the EFI framebuffer. Cc: stable@vger.kernel.org Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09drm/amdgpu: Fix potential NULL dereferenceLuben Tuikov1-2/+3
Fix potential NULL dereference, in the case when "man", the resource manager might be NULL, when/if we print debug information. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Cc: Dan Carpenter <error27@gmail.com> Cc: kernel test robot <lkp@intel.com> Fixes: 7554886daa31ea ("drm/amdgpu: Fix size validation for non-exclusive domains (v4)") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-05drm/amdgpu: fix missing dma_fence_put in error pathChristian König1-1/+3
When the fence can't be added we need to drop the reference. Suggested-by: Bert Karwatzki <spasswolf@web.de> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230105111703.52695-2-christian.koenig@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
2023-01-05drm/amdgpu: fix another missing fence reference in the CS codeChristian König1-1/+4
drm_sched_job_add_dependency() consumes the references of the gang members. Only triggered by mesh shaders. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 1728baa7e4e6 ("drm/amdgpu: use scheduler dependencies for CS") Tested-by: Mike Lothian <mike@fireburn.co.uk> Tested-by: Bert Karwatzki <spasswolf@web.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230105111703.52695-1-christian.koenig@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
2023-01-04Revert "drm/amd/display: Enable Freesync Video Mode by default"Michel Dänzer2-0/+28
This reverts commit de05abe6b9d0fe08f65d744f7f75a4cba4df27ad. The bug referenced below was bisected to this commit. There has been no activity toward fixing it in 3 months, so let's revert for now. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-12-23Merge tag 'drm-next-2022-12-23' of git://anongit.freedesktop.org/drm/drmLinus Torvalds22-234/+355
Pull drm fixes from Dave Airlie: "Holiday fixes! Two batches from amd, and one group of i915 changes. amdgpu: - Spelling fix - BO pin fix - Properly handle polaris 10/11 overlap asics - GMC9 fix - SR-IOV suspend fix - DCN 3.1.4 fix - KFD userptr locking fix - SMU13.x fixes - GDS/GWS/OA handling fix - Reserved VMID handling fixes - FRU EEPROM fix - BO validation fixes - Avoid large variable on the stack - S0ix fixes - SMU 13.x fixes - VCN fix - Add missing fence reference amdkfd: - Fix init vm error handling - Fix double release of compute pasid i915 - Documentation fixes - OA-perf related fix - VLV/CHV HDMI/DP audio fix - Display DDI/Transcoder fix - Migrate fixes" * tag 'drm-next-2022-12-23' of git://anongit.freedesktop.org/drm/drm: (39 commits) drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency drm/amdgpu: enable VCN DPG for GC IP v11.0.4 drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0 drm/amd/pm: correct the fan speed retrieving in PWM for some SMU13 asics drm/amd/pm: bump SMU13.0.0 driver_if header to version 0x34 drm/amdgpu: skip MES for S0ix as well since it's part of GFX drm/amd/pm: avoid large variable on kernel stack drm/amdkfd: Fix double release compute pasid drm/amdkfd: Fix kfd_process_device_init_vm error handling drm/amd/pm: update SMU13.0.0 reported maximum shader clock drm/amd/pm: correct SMU13.0.0 pstate profiling clock settings drm/amd/pm: enable GPO dynamic control support for SMU13.0.7 drm/amd/pm: enable GPO dynamic control support for SMU13.0.0 drm/amdgpu: revert "generally allow over-commit during BO allocation" drm/amdgpu: Remove unnecessary domain argument drm/amdgpu: Fix size validation for non-exclusive domains (v4) drm/amdgpu: Check if fru_addr is not NULL (v2) drm/i915/ttm: consider CCS for backup objects drm/i915/migrate: fix corner case in CCS aux copying drm/amdgpu: rework reserved VMID handling ...
2022-12-21drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependencyChristian König1-0/+2
That function consumes the reference. Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Reported-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: aab9cf7b6954 ("drm/amdgpu: use scheduler dependencies for VM updates") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-21drm/amdgpu: enable VCN DPG for GC IP v11.0.4Saleemkhan Jamadar1-0/+1
Enable VCN Dynamic Power Gating control for GC IP v11.0.4. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0, 6.1
2022-12-20drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0Tim Huang1-1/+2
MES is part of gfxoff and MES suspend and resume are skipped for S0i3. But the mes_self_test call path is still in the amdgpu_device_ip_late_init. it's should also be skipped for s0ix as no hardware re-initialization happened. Besides, mes_self_test will free the BO that triggers a lot of warning messages while in the suspend state. [ 81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu] [ 81.679435] Call Trace: [ 81.679726] <TASK> [ 81.679981] amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu] [ 81.680857] amdgpu_mes_self_test+0x390/0x430 [amdgpu] [ 81.681665] mes_v11_0_late_init+0x37/0x50 [amdgpu] [ 81.682423] amdgpu_device_ip_late_init+0x53/0x280 [amdgpu] [ 81.683257] amdgpu_device_resume+0xae/0x2a0 [amdgpu] [ 81.684043] amdgpu_pmops_resume+0x37/0x70 [amdgpu] [ 81.684818] pci_pm_resume+0x5c/0xa0 [ 81.685247] ? pci_pm_thaw+0x90/0x90 [ 81.685658] dpm_run_callback+0x4e/0x160 [ 81.686110] device_resume+0xad/0x210 [ 81.686529] async_resume+0x1e/0x40 [ 81.686931] async_run_entry_fn+0x33/0x120 [ 81.687405] process_one_work+0x21d/0x3f0 [ 81.687869] worker_thread+0x4a/0x3c0 [ 81.688293] ? process_one_work+0x3f0/0x3f0 [ 81.688777] kthread+0xff/0x130 [ 81.689157] ? kthread_complete_and_exit+0x20/0x20 [ 81.689707] ret_from_fork+0x22/0x30 [ 81.690118] </TASK> [ 81.690380] ---[ end trace 0000000000000000 ]--- v2: make the comment clean and use adev->in_s0ix instead of adev->suspend Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0, 6.1
2022-12-20drm/amdgpu: skip MES for S0ix as well since it's part of GFXAlex Deucher1-2/+3
It's also part of gfxoff. Cc: stable@vger.kernel.org # 6.0, 6.1 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20drm/amdkfd: Fix double release compute pasidPhilip Yang2-12/+31
If kfd_process_device_init_vm returns failure after vm is converted to compute vm and vm->pasid set to compute pasid, KFD will not take pdd->drm_file reference. As a result, drm close file handler maybe called to release the compute pasid before KFD process destroy worker to release the same pasid and set vm->pasid to zero, this generates below WARNING backtrace and NULL pointer access. Add helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it at the last step of kfd_process_device_init_vm, to ensure vm pasid is the original pasid if acquiring vm failed or is the compute pasid with pdd->drm_file reference taken to avoid double release same pasid. amdgpu: Failed to create process VM object ida_free called for id=32770 which is not allocated. WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140 RIP: 0010:ida_free+0x96/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10 BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:ida_free+0x76/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: revert "generally allow over-commit during BO allocation"Christian König2-4/+18
This reverts commit f9d00a4a8dc8fff951c97b3213f90d6bc7a72175. This causes problem for KFD because when we overcommit we accidentially bind the BO to GTT for moving it into VRAM. We also need to make sure that this is done only as fallback after trying to evict first. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: Remove unnecessary domain argumentLuben Tuikov4-14/+6
Remove the "domain" argument to amdgpu_bo_create_kernel_at() since this function takes an "offset" argument which is the offset off of VRAM, and as such allocation always takes place in VRAM. Thus, the "domain" argument is unnecessary. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: Fix size validation for non-exclusive domains (v4)Luben Tuikov1-11/+8
Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the requested memory exists, else we get a kernel oops when dereferencing "man". v2: Make the patch standalone, i.e. not dependent on local patches. v3: Preserve old behaviour and just check that the manager pointer is not NULL. v4: Complain if GTT domain requested and it is uninitialized--most likely a bug. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: Check if fru_addr is not NULL (v2)Luben Tuikov1-2/+4
Always check if fru_addr is not NULL. This commit also fixes a "smatch" warning. v2: Add a Fixes tag. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Dan Carpenter <error27@gmail.com> Cc: kernel test robot <lkp@intel.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Fixes: afbe5d1e4bd7c7 ("drm/amdgpu: Bug-fix: Reading I2C FRU data on newer ASICs") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: rework reserved VMID handlingChristian König3-30/+24
Instead of reserving a VMID for a single process allow that many processes use the reserved ID. This allows for proper isolation between the processes. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: stop waiting for the VM during unreserveChristian König1-16/+0
This is completely pointless since the VMID always stays allocated until the VM is idle. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: cleanup SPM support a bitChristian König3-4/+7
This should probably not access job->vm and also emit the SPM switch under the conditional execute. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: fix GDS/GWS/OA switch handlingChristian König3-43/+54
Bas pointed out that this isn't working as expected and could cause crashes. Fix the handling by storing the marker that a switch is needed inside the job instead. Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: Add notifier lock for KFD userptrsFelix Kuehling6-91/+172
Add a per-process MMU notifier lock for processing notifiers from userptrs. Use that lock to properly synchronize page table updates with MMU notifiers. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Xiaogang Chen<Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: WARN when freeing kernel memory during suspendChristian König1-0/+2
When buffers are freed during suspend there is no guarantee that they can be re-allocated during resume. The PSP subsystem seems to be quite buggy regarding this, so add a WARN_ON() to point out those bugs. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexdeucher@amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: fixx NULL pointer deref in gmc_v9_0_get_vm_pteChristian König1-1/+3
We not only need to make sure that we have a BO, but also that the BO has some backing store. Fixes: d1a372af1c3d ("drm/amdgpu: Set MTYPE in PTE based on BO flags") Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-13Merge tag 'mm-stable-2022-12-13' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - More userfaultfs work from Peter Xu - Several convert-to-folios series from Sidhartha Kumar and Huang Ying - Some filemap cleanups from Vishal Moola - David Hildenbrand added the ability to selftest anon memory COW handling - Some cpuset simplifications from Liu Shixin - Addition of vmalloc tracing support by Uladzislau Rezki - Some pagecache folioifications and simplifications from Matthew Wilcox - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series should have been in the non-MM tree, my bad - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages - DAMON cleanups and tuneups from SeongJae Park - Tony Luck fixed the handling of COW faults against poisoned pages - Peter Xu utilized the PTE marker code for handling swapin errors - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand - zram support for multiple compression streams from Sergey Senozhatsky - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway - Mel Gorman altered the page allocator so that local IRQs can remnain enabled during per-cpu page allocations - Vishal Moola removed the try_to_release_page() wrapper - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache - David Hildenbrand did some cleanup and repair work on KSM COW breaking - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range() - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages() - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting - David Hildenbrand has fixed some of our MM selftests for 32-bit machines - Many singleton patches, as usual * tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits) mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio mm: mmu_gather: allow more than one batch of delayed rmaps mm: fix typo in struct pglist_data code comment kmsan: fix memcpy tests mm: add cond_resched() in swapin_walk_pmd_entry() mm: do not show fs mm pc for VM_LOCKONFAULT pages selftests/vm: ksm_functional_tests: fixes for 32bit selftests/vm: cow: fix compile warning on 32bit selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem mm,thp,rmap: fix races between updates of subpages_mapcount mm: memcg: fix swapcached stat accounting mm: add nodes= arg to memory.reclaim mm: disable top-tier fallback to reclaim on proactive reclaim selftests: cgroup: make sure reclaim target memcg is unprotected selftests: cgroup: refactor proactive reclaim code to reclaim_until() mm: memcg: fix stale protection of reclaim target memcg mm/mmap: properly unaccount memory on mas_preallocate() failure omfs: remove ->writepage jfs: remove ->writepage ...
2022-12-13drm/amdgpu: Add an extra evict_resource call during device_suspend.Shikang Fan1-0/+5
- evict_resource is taking too long causing sriov full access mode timeout. So, add an extra evict_resource in the beginning as an early evict. Signed-off-by: Shikang Fan <shikang.fan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-13Merge tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drmLinus Torvalds121-1372/+2870
Pull drm updates from Dave Airlie: "The biggest highlight is that the accel subsystem framework is merged. Hopefully for 6.3 we will be able to line up a driver to use it. In drivers land, i915 enables DG2 support by default now, and nouveau has a big stability refactoring and initial ampere support, AMD includes new hw IP support and should build on ARM again. There is also an ofdrm driver to take over offb on platforms it's used. Stuff outside my tree, the dma-buf patches hit a few places, the vc4 firmware changes also do, and i915 has some interactions with MEI for discrete GPUs. I think all of those should have been acked/reviewed by relevant parties. New driver: - ofdrm - replacement for offb fbdev: - add support for nomodeset fourcc: - add Vivante tiled modifier core: - atomic-helpers: CRTC primary plane test fixes, fb access hooks - connector: TV API consistency, cmdline parser improvements - send connector hotplug on cleanup - sort makefile objects tests: - sort kunit tests - improve DP-MST tests - add kunit helpers to create a device sched: - module param for scheduling policy - refcounting fix buddy: - add back random seed log ttm: - convert ttm_resource to size_t - optimize pool allocations edid: - HFVSDB parsing support fixes - logging/debug improvements - DSC quirks dma-buf: - Add unlocked vmap and attachment mapping - move drivers to common locking convention - locking improvements firmware: - new API for rPI firmware and vc4 xilinx: - zynqmp: displayport bridge support - dpsub fix bridge: - adv7533: Remove dynamic lane switching - it6505: Runtime PM support, sync improvements - ps8640: Handle AUX defer messages - tc358775: Drop soft-reset over I2C panel: - panel-edp: Add INX N116BGE-EA2 C2 and C4 support. - Jadard JD9365DA-H3 - NewVision NV3051D amdgpu: - DCN support on ARM - DCN 2.1 secure display - Sienna Cichlid mode2 reset fixes - new GC 11.x firmware versions - drop AMD specific DSC workarounds in favour of drm code - clang warning fixes - scheduler rework - SR-IOV fixes - GPUVM locking fixes - fix memory leak in CS IOCTL error path - flexible array updates - enable new GC/PSP/SMU/NBIO IP - GFX preemption support for gfx9 amdkfd: - cache size fixes - userptr fixes - enable cooperative launch on gfx 10.3 - enable GC 11.0.4 KFD support radeon: - replace kmap with kmap_local_page - ACPI ref count fix - HDA audio notifier support i915: - DG2 enabled by default - MTL enablement work - hotplug refactoring - VBT improvements - Display and watermark refactoring - ADL-P workaround - temp disable runtime_pm for discrete- - fix for A380 as a secondary GPU - Wa_18017747507 for DG2 - CS timestamp support fixes for gen5 and earlier - never purge busy TTM objects - use i915_sg_dma_sizes for all backends - demote GuC kernel contexts to normal priority - gvt: refactor for new MDEV interface - enable DC power states on eDP ports - fix gen 2/3 workarounds nouveau: - fix page fault handling - Ampere acceleration support - driver stability improvements - nva3 backlight support msm: - MSM_INFO_GET_FLAGS support - DPU: XR30 and P010 image formats - Qualcomm SM6115 support - DSI PHY support for QCM2290 - HDMI: refactored dev init path - remove exclusive-fence hack - fix speed-bin detection - enable clamp to idle on 7c3 - improved hangcheck detection vmwgfx: - fb and cursor refactoring - convert to generic hashtable - cursor improvements etnaviv: - hw workarounds - softpin MMU fixes ast: - atomic gamma LUT support - convert to SHMEM lcdif: - support YUV planes - Increase DMA burst size - FIFO threshold tuning meson: - fix return type of cvbs mode_valid mgag200: - fix PLL setup on some revisions sun4i: - A100 and D1 support udl: - modesetting improvements - hot unplug support vc4: - support PAL-M - fix regression preventing 4K @ 60Hz - fix NULL ptr deref v3d: - switch to drm managed resources renesas: - RZ/G2L DSI support - DU Kconfig cleanup mediatek: - fixup dpi and hdmi - MT8188 dpi support - MT8195 AFBC support tegra: - NVDEC hardware on Tegra234 SoC hdlcd: - switch to drm managed resources ingenic: - fix registration error path hisilicon: - convert to drm_mode_init maildp: - use managed resources mtk: - use drm_mode_init rockchip: - use drm_mode_copy" * tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drm: (1397 commits) drm/amdgpu: fix mmhub register base coding error drm/amdgpu: add tmz support for GC IP v11.0.4 drm/amdgpu: enable GFX Clock Gating control for GC IP v11.0.4 drm/amdgpu: enable GFX Power Gating for GC IP v11.0.4 drm/amdgpu: enable GFX IP v11.0.4 CG support drm/amdgpu: Make amdgpu_ring_mux functions as static drm/amdgpu: generally allow over-commit during BO allocation drm/amd/display: fix array index out of bound error in DCN32 DML drm/amd/display: 3.2.215 drm/amd/display: set optimized required for comp buf changes drm/amd/display: Add debug option to skip PSR CRTC disable drm/amd/display: correct DML calc error of UrgentLatency drm/amd/display: correct static_screen_event_mask drm/amd/display: Ensure commit_streams returns the DC return code drm/amd/display: read invalid ddc pin status cause engine busy drm/amd/display: Bypass DET swath fill check for max clocks drm/amd/display: Disable uclk pstate for subvp pipes drm/amd/display: Fix DCN2.1 default DSC clocks drm/amd/display: Enable dp_hdmi21_pcon support drm/amd/display: prevent seamless boot on displays that don't have the preferred dig ...
2022-12-09drm/amdgpu: handle polaris10/11 overlap asics (v2)Alex Deucher1-2/+11
Some special polaris 10 chips overlap with the polaris11 DID range. Handle this properly in the driver. v2: use local flags for other function calls. Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-12-09drm/amdgpu: make display pinning more flexible (v2)Alex Deucher1-1/+2
Only apply the static threshold for Stoney and Carrizo. This hardware has certain requirements that don't allow mixing of GTT and VRAM. Newer asics do not have these requirements so we should be able to be more flexible with where buffers end up. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2270 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2291 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2255 Acked-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-12-09Merge tag 'amd-drm-next-6.2-2022-12-07' of ↵Dave Airlie11-29/+33
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.2-2022-12-07: amdgpu: - DSC fixes for DCN 2.1 - HDMI PCON fixes - PSR fixes - DC DML fixes - Properly throttle on BO allocation - GFX 11.0.4 fixes - MMHUB fix - Make some functions static Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221207232439.5908-1-alexander.deucher@amd.com
2022-12-09Merge tag 'amd-drm-next-6.2-2022-12-02' of ↵Dave Airlie35-154/+1249
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.2-2022-12-02: amdgpu: - Fix CPU stalls when allocating large amounts of system memory - SR-IOV fixes - BACO fixes - Enable GC 11.0.4 - Enable PSP 13.0.11 - Enable SMU 13.0.11 - Enable NBIO 7.7.1 - Fix reported VCN capabilities for RDNA2 - Misc cleanups - PCI ref count fixes - DCN DPIA fixes - DCN 3.2.x fixes - Documentation updates - GC 11.x fixes - VCN RAS fixes - APU fix for passthrough - PSR fixes - GFX preemption support for gfx9 - SDMA fix for S0ix amdkfd: - Enable KFD support for GC 11.0.4 - Misc cleanups - Fix memory leak Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221202160659.5987-1-alexander.deucher@amd.com
2022-12-07drm/amdgpu/sdma_v4_0: turn off SDMA ring buffer in the s2idle suspendPrike Liang1-9/+15
In the SDMA s0ix save process requires to turn off SDMA ring buffer for avoiding the SDMA in-flight request, otherwise will suffer from SDMA page fault which causes by page request from in-flight SDMA ring accessing at SDMA restore phase. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2248 Cc: stable@vger.kernel.org # 6.0,5.15+ Fixes: f8f4e2a51834 ("drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-06drm/amdgpu: fix mmhub register base coding errorYang Wang5-5/+5
fix MMHUB register base coding error. Fixes: ec6837591f992 ("drm/amdgpu/gmc10: program the smallK fragment size") Signed-off-by: Yang Wang <KevinYang.Wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-12-06drm/amdgpu: add tmz support for GC IP v11.0.4Tim Huang1-0/+1
Add tmz support for GC 11.0.4. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-06drm/amdgpu: enable GFX Clock Gating control for GC IP v11.0.4Tim Huang1-0/+1
Enable GFX IP v11.0.4 CG gate/ungate control. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-06drm/amdgpu: enable GFX Power Gating for GC IP v11.0.4Tim Huang1-0/+2
Enable GFX Power Gating control for GC IP v11.0.4. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-06drm/amdgpu: enable GFX IP v11.0.4 CG supportTim Huang1-1/+17
Add CG support for GFX/MC/HDP/ATHUB/IH/BIF. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-06drm/amdgpu: Make amdgpu_ring_mux functions as staticJiadong Zhu1-5/+3
lkp robot reported missing-prototypes and unused-but-set-variable warnings on some functions of amdgpu_mcbp_mux.c. Make them static and remove the unused variable. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-06drm/amdgpu: generally allow over-commit during BO allocationChristian König2-18/+4
We already fallback to a dummy BO with no backing store when we allocate GDS,GWS and OA resources and to GTT when we allocate VRAM. Drop all those workarounds and generalize this for GTT as well. This fixes ENOMEM issues with runaway applications which try to allocate/free GTT in a loop and are otherwise only limited by the CPU speed. The CS will wait for the cleanup of freed up BOs to satisfy the various domain specific limits and so effectively throttle those buggy applications down to a sane allocation behavior again. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-02drm/amdgpu: expand on GPUVM documentationAlex Deucher1-10/+31
Expand the GPUVM documentation to better describe the hardware functionality and use cases it serves. v2: Fixed a couple of spelling mistakes. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20221201214153.8453-2-alexander.deucher@amd.com Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>