From 4ccb9272892c33ef1c19a783cfa87103b30c2784 Mon Sep 17 00:00:00 2001 From: bob picco Date: Tue, 16 Sep 2014 09:26:47 -0400 Subject: sparc64: sun4v TLB error power off events We've witnessed a few TLB events causing the machine to power off because of prom_halt. In one case it was some nfs related area during rmmod. Another was an mmapper of /dev/mem. A more recent one is an ITLB issue with a bad pagesize which could be a hardware bug. Bugs happen but we should attempt to not power off the machine and/or hang it when possible. This is a DTLB error from an mmapper of /dev/mem: [root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1 SUN4V-DTLB: TPC<0xfffff80100903e6c> SUN4V-DTLB: O7[fffff801081979d0] SUN4V-DTLB: O7<0xfffff801081979d0> SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2] . This is recent mainline for ITLB: [ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc> [ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8] [ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8> [ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4] . Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call prom_halt() and drop us to OF command prompt "ok". This isn't the case for LDOMs and the machine powers off. For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal one("1"). Otherwise, for %tl > 1, we proceed eventually to die_if_kernel(). The logic of this patch was partially inspired by David Miller's feedback. Power off of large sparc64 machines is painful. Plus die_if_kernel provides more context. A reset sequence isn't a brief period on large sparc64 but better than power-off/power-on sequence. Cc: sparclinux@vger.kernel.org Signed-off-by: Bob Picco Signed-off-by: David S. Miller --- arch/sparc/mm/fault_64.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch/sparc/mm') diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c index 587cd0565128..18fcd7167095 100644 --- a/arch/sparc/mm/fault_64.c +++ b/arch/sparc/mm/fault_64.c @@ -346,6 +346,9 @@ retry: down_read(&mm->mmap_sem); } + if (fault_code & FAULT_CODE_BAD_RA) + goto do_sigbus; + vma = find_vma(mm, address); if (!vma) goto bad_area; -- cgit v1.2.3