summaryrefslogtreecommitdiffstats
path: root/drivers/acpi/nfit
AgeCommit message (Collapse)AuthorFilesLines
2019-02-07acpi/nfit: Fix bus command validationDan Williams1-10/+12
Commit 11189c1089da "acpi/nfit: Fix command-supported detection" broke ND_CMD_CALL for bus-level commands. The "func = cmd" assumption is only valid for: ND_CMD_ARS_CAP ND_CMD_ARS_START ND_CMD_ARS_STATUS ND_CMD_CLEAR_ERROR The function number otherwise needs to be pulled from the command payload for: NFIT_CMD_TRANSLATE_SPA NFIT_CMD_ARS_INJECT_SET NFIT_CMD_ARS_INJECT_CLEAR NFIT_CMD_ARS_INJECT_GET Update cmd_to_func() for the bus case and call it in the common path. Fixes: 11189c1089da ("acpi/nfit: Fix command-supported detection") Cc: <stable@vger.kernel.org> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reported-by: Grzegorz Burzynski <grzegorz.burzynski@intel.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-02libnvdimm/dimm: Add a no-BLK quirk based on NVDIMM familyDan Williams1-0/+4
As Dexuan reports the NVDIMM_FAMILY_HYPERV platform is incompatible with the existing Linux namespace implementation because it uses NSLABEL_FLAG_LOCAL for x1-width PMEM interleave sets. Quirk it as an platform / DIMM that does not provide BLK-aperture access. Allow the libnvdimm core to assume no potential for aliasing. In case other implementations make the same mistake, provide a "noblk" module parameter to force-enable the quirk. Link: https://lkml.kernel.org/r/PU1P153MB0169977604493B82B662A01CBF920@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM Reported-by: Dexuan Cui <decui@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-29nfit: Add Hyper-V NVDIMM DSM command set to white listDexuan Cui2-4/+19
Add the Hyper-V _DSM command set to the white list of NVDIMM command sets. This command set is documented at http://www.uefi.org/RFIC_LIST (see "Virtual NVDIMM 0x1901"). Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-29nfit: acpi_nfit_ctl(): Check out_obj->type in the right placeDexuan Cui1-7/+7
In the case of ND_CMD_CALL, we should also check out_obj->type. The patch uses out_obj->type, which is a short alias to out_obj->package.type. Fixes: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism") Cc: <stable@vger.kernel.org> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-29nfit: Fix nfit_intel_shutdown_status() command submissionDan Williams1-17/+24
The implementation is broken in all the ways the unit test did not touch: 1/ The local definition of in_buf and in_obj violated C99 initializer expectations for zeroing. By only initializing 2 out of the three struct members the compiler was free to zero-initialize the remaining entry even though the aliased location in the union was initialized. 2/ The implementation made assumptions about the state of the 'smart' payload after command execution that are satisfied by acpi_nfit_ctl(), but not acpi_evaluate_dsm(). 3/ populate_shutdown_status() is skipped on Intel NVDIMMs due to the early return for skipping the common _LS{I,R,W} enabling. 4/ The input length should be zero. This breakage was missed due to the unit test implementation only testing the case where nfit_intel_shutdown_status() returns a valid payload. Much of this complexity would be saved if acpi_nfit_ctl() could be used, but that currently requires a 'struct nvdimm *' argument and one is not created until later in the init process. The health result is needed before the device is created because the payload gates whether the nmemX/nfit/dirty_shutdown property is visible in sysfs. Cc: <stable@vger.kernel.org> Fixes: 0ead11181fe0 ("acpi, nfit: Collect shutdown status") Reported-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-21acpi/nfit: Fix command-supported detectionDan Williams1-14/+40
The _DSM function number validation only happens to succeed when the generic Linux command number translation corresponds with a DSM-family-specific function number. This breaks NVDIMM-N implementations that correctly implement _LSR, _LSW, and _LSI, but do not happen to publish support for DSM function numbers 4, 5, and 6. Recall that the support for _LS{I,R,W} family of methods results in the DIMM being marked as supporting those command numbers at acpi_nfit_register_dimms() time. The DSM function mask is only used for ND_CMD_CALL support of non-NVDIMM_FAMILY_INTEL devices. Fixes: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command...") Cc: <stable@vger.kernel.org> Link: https://github.com/pmem/ndctl/issues/78 Reported-by: Sujith Pandel <sujith_pandel@dell.com> Tested-by: Sujith Pandel <sujith_pandel@dell.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-21acpi/nfit: Block function zero DSMsDan Williams1-0/+7
In preparation for using function number 0 as an error value, prevent it from being considered a valid function value by acpi_nfit_ctl(). Cc: <stable@vger.kernel.org> Cc: stuart hayes <stuart.w.hayes@gmail.com> Fixes: e02fb7264d8a ("nfit: add Microsoft NVDIMM DSM command set...") Reported-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-21libnvdimm/security: Require nvdimm_security_setup_events() to succeedDan Williams1-5/+0
The following warning: ACPI0012:00: security event setup failed: -19 ...is meant to capture exceptional failures of sysfs_get_dirent(), however it will also fail in the common case when security support is disabled. A few issues: 1/ A dev_warn() report for a common case is too chatty 2/ The setup of this notifier is generic, no need for it to be driven from the nfit driver, it can exist completely in the core. 3/ If it fails for any reason besides security support being disabled, that's fatal and should abort DIMM activation. Userspace may hang if it never gets overwrite notifications. 4/ The dirent needs to be released. Move the call to the core 'dimm' driver, make it conditional on security support being active, make it fatal for the exceptional case, add the missing sysfs_put() at device disable time. Fixes: 7d988097c546 ("...Add security DSM overwrite support") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-14acpi/nfit: Remove duplicate set nd_set in acpi_nfit_init_interleave_set()Wei Yang1-1/+0
We allocate nd_set in acpi_nfit_init_interleave_set() and assignn it to ndr_desc, while the assignment is done twice in this function. This patch removes the first assignment. No functional change. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-11acpi/nfit: Fix race accessing memdev in nfit_get_smbios_id()Tony Luck1-2/+4
Possible race accessing memdev structures after dropping the mutex. Dan Williams says this could race against another thread that is doing: # echo "ACPI0012:00" > /sys/bus/acpi/drivers/nfit/unbind Reported-by: Jane Chu <jane.chu@oracle.com> Fixes: 23222f8f8dce ("acpi, nfit: Add function to look up nvdimm...") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-08nfit: Mark some functions as __maybe_unusedNathan Chancellor1-4/+4
On arm64 little endian allyesconfig: drivers/acpi/nfit/intel.c:149:12: warning: unused function 'intel_security_unlock' [-Wunused-function] static int intel_security_unlock(struct nvdimm *nvdimm, ^ drivers/acpi/nfit/intel.c:230:12: warning: unused function 'intel_security_erase' [-Wunused-function] static int intel_security_erase(struct nvdimm *nvdimm, ^ drivers/acpi/nfit/intel.c:279:12: warning: unused function 'intel_security_query_overwrite' [-Wunused-function] static int intel_security_query_overwrite(struct nvdimm *nvdimm) ^ drivers/acpi/nfit/intel.c:316:12: warning: unused function 'intel_security_overwrite' [-Wunused-function] static int intel_security_overwrite(struct nvdimm *nvdimm, ^ 4 warnings generated. Mark these functions as __maybe_unused because they are only used when CONFIG_X86 is set. Fixes: 4c6926a23b76 ("acpi/nfit, libnvdimm: Add unlock of nvdimm support for Intel DIMMs") Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-08ACPI/nfit: delete the function to_acpi_nfit_descXiaochun Lee1-9/+3
The function to_acpi_nfit_desc and function to_acpi_desc do the same things,delete the function to_acpi_nfit_desc, and keep the inline function to_acpi_desc. Signed-off-by: Xiaochun Lee <lixc17@lenovo.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-08ACPI/nfit: delete the redundant header fileXiaochun Lee1-1/+0
The header file "intel.h" is repeated here, So delete one. Signed-off-by: Xiaochun Lee <lixc17@lenovo.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-06acpi/nfit, device-dax: Identify differentiated memory with a unique numa-nodeDan Williams1-2/+6
Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware Interface Table), is the first known instance of a memory range described by a unique "target" proximity domain. Where "initiator" and "target" proximity domains is an approach that the ACPI HMAT (Heterogeneous Memory Attributes Table) uses to described the unique performance properties of a memory range relative to a given initiator (e.g. CPU or DMA device). Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y char-device follows the traditional notion of 'numa-node' where the attribute conveys the closest online numa-node. That numa-node attribute is useful for cpu-binding and memory-binding processes *near* the device. However, when the memory range backing a 'pmem', or 'dax' device is onlined (memory hot-add) the memory-only-numa-node representing that address needs to be differentiated from the set of online nodes. In other words, the numa-node association of the device depends on whether you can bind processes *near* the cpu-numa-node in the offline device-case, or bind process *on* the memory-range directly after the backing address range is onlined. Allow for the case that platform firmware describes persistent memory with a unique proximity domain, i.e. when it is distinct from the proximity of DRAM and CPUs that are on the same socket. Plumb the Linux numa-node translation of that proximity through the libnvdimm region device to namespaces that are in device-dax mode. With this in place the proposed kmem driver [1] can optionally discover a unique numa-node number for the address range as it transitions the memory from an offline state managed by a device-driver to an online memory range managed by the core-mm. [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com Reported-by: Fan Du <fan.du@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-27Merge miscellaneous libnvdimm updates for 4.21Dan Williams1-2/+8
* Use common helpers, bitmap_zalloc() and kstrndup(), to replace open coded versions. * Clarify the comments around hotplug vs initial init case for the nfit driver. * Cleanup the libnvdimm init path.
2018-12-21acpi/nfit, libnvdimm/security: add Intel DSM 1.8 master passphrase supportDave Jiang2-17/+38
With Intel DSM 1.8 [1] two new security DSMs are introduced. Enable/update master passphrase and master secure erase. The master passphrase allows a secure erase to be performed without the user passphrase that is set on the NVDIMM. The commands of master_update and master_erase are added to the sysfs knob in order to initiate the DSMs. They are similar in opeartion mechanism compare to update and erase. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm/security: Add security DSM overwrite supportDave Jiang2-0/+95
Add support for the NVDIMM_FAMILY_INTEL "ovewrite" capability as described by the Intel DSM spec v1.7. This will allow triggering of overwrite on Intel NVDIMMs. The overwrite operation can take tens of minutes. When the overwrite DSM is issued successfully, the NVDIMMs will be unaccessible. The kernel will do backoff polling to detect when the overwrite process is completed. According to the DSM spec v1.7, the 128G NVDIMMs can take up to 15mins to perform overwrite and larger DIMMs will take longer. Given that overwrite puts the DIMM in an indeterminate state until it completes introduce the NDD_SECURITY_OVERWRITE flag to prevent other operations from executing when overwrite is happening. The NDD_WORK_PENDING flag is added to denote that there is a device reference on the nvdimm device for an async workqueue thread context. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm: Add support for issue secure erase DSM to Intel nvdimmDave Jiang1-0/+47
Add support to issue a secure erase DSM to the Intel nvdimm. The required passphrase is acquired from an encrypted key in the kernel user keyring. To trigger the action, "erase <keyid>" is written to the "security" sysfs attribute. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm: Add disable passphrase support to Intel nvdimm.Dave Jiang1-0/+41
Add support to disable passphrase (security) for the Intel nvdimm. The passphrase used for disabling is pulled from an encrypted-key in the kernel user keyring. The action is triggered by writing "disable <keyid>" to the sysfs attribute "security". Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-13acpi/nfit, libnvdimm: Add unlock of nvdimm support for Intel DIMMsDave Jiang1-0/+109
Add support to unlock the dimm via the kernel key management APIs. The passphrase is expected to be pulled from userspace through keyutils. The key management and sysfs attributes are libnvdimm generic. Encrypted keys are used to protect the nvdimm passphrase at rest. The master key can be a trusted-key sealed in a TPM, preferred, or an encrypted-key, more flexible, but more exposure to a potential attacker. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-13acpi/nfit, libnvdimm: Add freeze security support to Intel nvdimmDave Jiang1-0/+28
Add support for freeze security on Intel nvdimm. This locks out any changes to security for the DIMM until a hard reset of the DIMM is performed. This is triggered by writing "freeze" to the generic nvdimm/nmemX "security" sysfs attribute. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-13acpi/nfit, libnvdimm: Introduce nvdimm_security_opsDave Jiang4-1/+69
Some NVDIMMs, like the ones defined by the NVDIMM_FAMILY_INTEL command set, expose a security capability to lock the DIMMs at poweroff and require a passphrase to unlock them. The security model is derived from ATA security. In anticipation of other DIMMs implementing a similar scheme, and to abstract the core security implementation away from the device-specific details, introduce nvdimm_security_ops. Initially only a status retrieval operation, ->state(), is defined, along with the base infrastructure and definitions for future operations. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-13acpi/nfit, libnvdimm: Store dimm id as a member to struct nvdimmDave Jiang2-13/+21
The generated dimm id is needed for the sysfs attribute as well as being used as the identifier/description for the security key. Since it's constant and should never change, store it as a member of struct nvdimm. As nvdimm_create() continues to grow parameters relative to NFIT driver requirements, do not require other implementations to keep pace. Introduce __nvdimm_create() to carry the new parameters and keep nvdimm_create() with the long standing default api. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-10ACPI/nfit: Adjust annotation for why return 0 if fail to find NFIT at startOcean He1-1/+7
Add detailed explanation for why it's ok to return 0 if we fail to find an NFIT at startup. Refer to chapter 9.20.2 NVDIMM Root Device in ACPI 6.2 spec. Signed-off-by: Ocean He <hehy1@lenovo.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-05acpi/nfit: Fix user-initiated ARS to be "ARS-long" rather than "ARS-short"Dan Williams1-1/+1
A "short" ARS (address range scrub) instructs the platform firmware to return known errors. In contrast, a "long" ARS instructs platform firmware to arrange every data address on the DIMM to be read / checked for poisoned data. The conversion of the flags in commit d3abaf43bab8 "acpi, nfit: Fix Address Range Scrub completion tracking", changed the meaning of passing '0' to acpi_nfit_ars_rescan(). Previously '0' meant "not short", now '0' is ARS_REQ_SHORT. Pass ARS_REQ_LONG to restore the expected scrub-type behavior of user-initiated ARS sessions. Fixes: d3abaf43bab8 ("acpi, nfit: Fix Address Range Scrub completion tracking") Reported-by: Jacek Zloch <jacek.zloch@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-04acpi/nfit: Add support for Intel DSM 1.8 commandsDave Jiang4-5/+147
Add command definition for security commands defined in Intel DSM specification v1.8 [1]. This includes "get security state", "set passphrase", "unlock unit", "freeze lock", "secure erase", "overwrite", "overwrite query", "master passphrase enable/disable", and "master erase", . Since this adds several Intel definitions, move the relevant bits to their own header. These commands mutate physical data, but that manipulation is not cache coherent. The requirement to flush and invalidate caches makes these commands unsuitable to be called from userspace, so extra logic is added to detect and block these commands from being submitted via the ioctl command submission path. Lastly, the commands may contain sensitive key material that should not be dumped in a standard debug session. Update the nvdimm-command payload-dump facility to move security command payloads behind a default-off compile time switch. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-11-18Merge tag 'libnvdimm-fixes-4.20-rc3' of ↵Linus Torvalds1-14/+5
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A small batch of fixes for v4.20-rc3. The overflow continuation fix addresses something that has been broken for several releases. Arguably it could wait even longer, but it's a one line fix and this finishes the last of the known address range scrub bug reports. The revert addresses a lockdep regression. The unit tests are not critical to fix, but no reason to hold this fix back. Summary: - Address Range Scrub overflow continuation handling has been broken since it was initially merged. It was only recently that error injection and platform-BIOS support enabled this corner case to be exercised. - The recent attempt to provide more isolation for the kernel Address Range Scrub state machine from userapace initiated sessions triggers a lockdep report. Revert and try again at the next merge window. - Fix a kasan reported buffer overflow in libnvdimm unit test infrastrucutre (nfit_test)" * tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: Revert "acpi, nfit: Further restrict userspace ARS start requests" acpi, nfit: Fix ARS overflow continuation tools/testing/nvdimm: Fix the array size for dimm devices.
2018-11-10Revert "acpi, nfit: Further restrict userspace ARS start requests"Dan Williams1-12/+3
The following lockdep splat results from acquiring the init_mutex in acpi_nfit_clear_to_send(): WARNING: possible circular locking dependency detected lt-daxdev-error/7216 is trying to acquire lock: 00000000f694db15 (&acpi_desc->init_mutex){+.+.}, at: acpi_nfit_clear_to_send+0x27/0x80 [nfit] but task is already holding lock: 00000000182298f2 (&nvdimm_bus->reconfig_mutex){+.+.}, at: __nd_ioctl+0x457/0x610 [libnvdimm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&nvdimm_bus->reconfig_mutex){+.+.}: nvdimm_badblocks_populate+0x41/0x150 [libnvdimm] nd_region_notify+0x95/0xb0 [libnvdimm] nd_device_notify+0x40/0x50 [libnvdimm] ars_complete+0x7f/0xd0 [nfit] acpi_nfit_scrub+0xbb/0x410 [nfit] process_one_work+0x22b/0x5c0 worker_thread+0x3c/0x390 kthread+0x11e/0x140 ret_from_fork+0x3a/0x50 -> #0 (&acpi_desc->init_mutex){+.+.}: __mutex_lock+0x83/0x980 acpi_nfit_clear_to_send+0x27/0x80 [nfit] __nd_ioctl+0x474/0x610 [libnvdimm] nd_ioctl+0xa4/0xb0 [libnvdimm] do_vfs_ioctl+0xa5/0x6e0 ksys_ioctl+0x70/0x80 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe New infrastructure is needed to be able to perform this check without acquiring the lock. Fixes: 594861215c83 ("acpi, nfit: Further restrict userspace ARS start") Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-11-10acpi, nfit: Fix ARS overflow continuationDan Williams1-2/+2
When the platform BIOS is unable to report all the media error records it requires the OS to restart the scrub at a prescribed location. The driver detects the overflow condition, but then fails to report it to the ARS state machine after reaping the records. Propagate -ENOSPC correctly to continue the ARS operation. Cc: <stable@vger.kernel.org> Fixes: 1cf03c00e7c1 ("nfit: scrub and register regions in a workqueue") Reported-by: Jacek Zloch <jacek.zloch@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-11-06acpi/nfit, x86/mce: Validate a MCE's address before using itVishal Verma1-0/+4
The NFIT machine check handler uses the physical address from the mce structure, and compares it against information in the ACPI NFIT table to determine whether that location lies on an NVDIMM. The mce->addr field however may not always be valid, and this is indicated by the MCI_STATUS_ADDRV bit in the status field. Export mce_usable_address() which already performs validation for the address, and use it in the NFIT handler. Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error") Reported-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Arnd Bergmann <arnd@arndb.de> Cc: Dan Williams <dan.j.williams@intel.com> CC: Dave Jiang <dave.jiang@intel.com> CC: elliott@hpe.com CC: "H. Peter Anvin" <hpa@zytor.com> CC: Ingo Molnar <mingo@redhat.com> CC: Len Brown <lenb@kernel.org> CC: linux-acpi@vger.kernel.org CC: linux-edac <linux-edac@vger.kernel.org> CC: linux-nvdimm@lists.01.org CC: Qiuxu Zhuo <qiuxu.zhuo@intel.com> CC: "Rafael J. Wysocki" <rjw@rjwysocki.net> CC: Ross Zwisler <zwisler@kernel.org> CC: stable <stable@vger.kernel.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tony Luck <tony.luck@intel.com> CC: x86-ml <x86@kernel.org> CC: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20181026003729.8420-2-vishal.l.verma@intel.com
2018-11-06acpi/nfit, x86/mce: Handle only uncorrectable machine checksVishal Verma1-2/+2
The MCE handler for nfit devices is called for memory errors on a Non-Volatile DIMM and adds the error location to a 'badblocks' list. This list is used by the various NVDIMM drivers to avoid consuming known poison locations during IO. The MCE handler gets called for both corrected and uncorrectable errors. Until now, both kinds of errors have been added to the badblocks list. However, corrected memory errors indicate that the problem has already been fixed by hardware, and the resulting interrupt is merely a notification to Linux. As far as future accesses to that location are concerned, it is perfectly fine to use, and thus doesn't need to be included in the above badblocks list. Add a check in the nfit MCE handler to filter out corrected mce events, and only process uncorrectable errors. Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error") Reported-by: Omar Avelar <omar.avelar@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Arnd Bergmann <arnd@arndb.de> CC: Dan Williams <dan.j.williams@intel.com> CC: Dave Jiang <dave.jiang@intel.com> CC: elliott@hpe.com CC: "H. Peter Anvin" <hpa@zytor.com> CC: Ingo Molnar <mingo@redhat.com> CC: Len Brown <lenb@kernel.org> CC: linux-acpi@vger.kernel.org CC: linux-edac <linux-edac@vger.kernel.org> CC: linux-nvdimm@lists.01.org CC: Qiuxu Zhuo <qiuxu.zhuo@intel.com> CC: "Rafael J. Wysocki" <rjw@rjwysocki.net> CC: Ross Zwisler <zwisler@kernel.org> CC: stable <stable@vger.kernel.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tony Luck <tony.luck@intel.com> CC: x86-ml <x86@kernel.org> CC: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20181026003729.8420-1-vishal.l.verma@intel.com
2018-10-17acpi, nfit: Further restrict userspace ARS start requestsDan Williams1-5/+14
In addition to not allowing ARS start while the background thread is actively running, prevent ARS start while any scrub request is pending. This aligns the window for ARS start submission with the status of ARS reported via sysfs. Previously userspace could sneak its own ARS start requests in while sysfs reported -EBUSY. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-17acpi, nfit: Fix Address Range Scrub completion trackingDan Williams2-78/+101
The Address Range Scrub implementation tried to skip running scrubs against ranges that were already scrubbed by the BIOS. Unfortunately that support also resulted in early scrub completions as evidenced by this debug output from nfit_test: nd_region region9: ARS: range 1 short complete nd_region region3: ARS: range 1 short complete nd_region region4: ARS: range 2 ARS start (0) nd_region region4: ARS: range 2 short complete ...i.e. completions without any indications that the scrub was started. This state of affairs was hard to see in the code due to the proliferation of state bits and mistakenly trying to track done state per-range when the completion is a global property of the bus. So, kill the four ARS state bits (ARS_REQ, ARS_REQ_REDO, ARS_DONE, and ARS_SHORT), and replace them with just 2 request flags ARS_REQ_SHORT and ARS_REQ_LONG. The implementation will still complete and reap the results of BIOS initiated ARS, but it will not attempt to use that information to affect the completion status of scrubbing the ranges from a Linux perspective. Instead, try to synchronously run a short ARS per range at init time and schedule a long scrub in the background. If ARS is busy with an ARS request, schedule both a short and a long scrub for when ARS returns to idle. This logic also satisfies the intent of what ARS_REQ_REDO was trying to achieve. The new rule is that the REQ flag stays set until the next successful ars_start() for that range. With the new policy that the REQ flags are not cleared until the next start, the implementation no longer loses requests as can be seen from the following log: nd_region region3: ARS: range 1 ARS start short (0) nd_region region9: ARS: range 1 ARS start short (0) nd_region region3: ARS: range 1 complete nd_region region4: ARS: range 2 ARS start short (0) nd_region region9: ARS: range 1 complete nd_region region9: ARS: range 1 ARS start long (0) nd_region region4: ARS: range 2 complete nd_region region3: ARS: range 1 ARS start long (0) nd_region region9: ARS: range 1 complete nd_region region3: ARS: range 1 complete nd_region region4: ARS: range 2 ARS start long (0) nd_region region4: ARS: range 2 complete ...note that the nfit_test emulated driver provides 2 buses, that is why some of the range indices are duplicated. Notice that each range now successfully completes a short and long scrub. Cc: <stable@vger.kernel.org> Fixes: 14c73f997a5e ("nfit, address-range-scrub: introduce nfit_spa->ars_state") Fixes: cc3d3458d46f ("acpi/nfit: queue issuing of ars when an uc error...") Reported-by: Jacek Zloch <jacek.zloch@intel.com> Reported-by: Krzysztof Rusocki <krzysztof.rusocki@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-17tools/testing/nvdimm: Populate dirty shutdown dataDan Williams1-2/+5
Allow the unit tests to verify the retrieval of the dirty shutdown count via smart commands, and allow the driver-load-time retrieval of the smart health payload to be simulated by nfit_test. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-17acpi, nfit: Collect shutdown statusDan Williams3-1/+117
Some NVDIMMs, in addition to providing an indication of whether the previous shutdown was clean, also provide a running count of lifetime dirty-shutdown events for the device. In anticipation of this functionality appearing on more devices arrange for the nfit driver to retrieve / cache this data at DIMM discovery time, and export it via sysfs. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-16acpi, nfit: Introduce nfit_mem flagsDan Williams2-13/+22
In preparation for adding a flag to indicate whether a DIMM publishes a dirty-shutdown count, convert the existing flags to a bit field. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-08-25Merge tag 'libnvdimm-for-4.19_misc' of ↵Linus Torvalds2-4/+21
gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dave Jiang: "Collection of misc libnvdimm patches for 4.19 submission: - Adding support to read locked nvdimm capacity. - Change test code to make DSM failure code injection an override. - Add support for calculate maximum contiguous area for namespace. - Add support for queueing a short ARS when there is on going ARS for nvdimm. - Allow NULL to be passed in to ->direct_access() for kaddr and pfn params. - Improve smart injection support for nvdimm emulation testing. - Fix test code that supports for emulating controller temperature. - Fix hang on error before devm_memremap_pages() - Fix a bug that causes user memory corruption when data returned to user for ars_status. - Maintainer updates for Ross Zwisler emails and adding Jan Kara to fsdax" * tag 'libnvdimm-for-4.19_misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm: fix ars_status output length calculation device-dax: avoid hang on error before devm_memremap_pages() tools/testing/nvdimm: improve emulation of smart injection filesystem-dax: Do not request kaddr and pfn when not required md/dm-writecache: Don't request pointer dummy_addr when not required dax/super: Do not request a pointer kaddr when not required tools/testing/nvdimm: kaddr and pfn can be NULL to ->direct_access() s390, dcssblk: kaddr and pfn can be NULL to ->direct_access() libnvdimm, pmem: kaddr and pfn can be NULL to ->direct_access() acpi/nfit: queue issuing of ars when an uc error notification comes in libnvdimm: Export max available extent libnvdimm: Use max contiguous area for namespace size MAINTAINERS: Add Jan Kara for filesystem DAX MAINTAINERS: update Ross Zwisler's email address tools/testing/nvdimm: Fix support for emulating controller temperature tools/testing/nvdimm: Make DSM failure code injection an override acpi, nfit: Prefer _DSM over _LSR for namespace label reads libnvdimm: Introduce locked DIMM capacity support
2018-07-27acpi/nfit: queue issuing of ars when an uc error notification comes inDave Jiang2-3/+10
When the ACPI UC error notifier gets called and ARS_REQ bit is set with the passed in flag, we can receive -EBUSY if ARS_REQ bit is already set for the nfit_spa->ars_state. When that happens, the ARS request is dropped. That can potentially cause us to miss the unreported errors that the on going ARS request does not receive. Add an ARS_REQ_REDO state that will request short ARS upon ARS completion to grab any errors we missed. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
2018-07-14acpi, nfit: Prefer _DSM over _LSR for namespace label readsDan Williams1-1/+11
The _LSR method indicates locked status via error-code-3 returned in the _LSR payload. When any error is returned the payload of _LSR is truncated to a zero-length buffer. The _DSM path in comparison allows system software to retrieve the locked status *and* namespace label area contents. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-07-11nfit: fix unchecked dereference in acpi_nfit_ctlDave Jiang1-2/+4
Incremental patch to fix the unchecked dereference in acpi_nfit_ctl. Reported by Dan Carpenter: "acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value" from Jun 28, 2018, leads to the following Smatch complaint: drivers/acpi/nfit/core.c:578 acpi_nfit_ctl() warn: variable dereferenced before check 'cmd_rc' (see line 411) drivers/acpi/nfit/core.c 410 411 *cmd_rc = -EINVAL; ^^^^^^^^^^^^^^^^^^ Patch adds unchecked dereference. Fixes: c1985cefd844 ("acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value") Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2018-07-05acpi, nfit: Fix scrub idle detectionDan Williams2-11/+34
The notification of scrub completion happens within the scrub workqueue. That can clearly race someone running scrub_show() and work_busy() before the workqueue has a chance to flush the recently completed work. Add a flag to reliably indicate the idle vs busy state. Without this change applications using poll(2) to wait for scrub-completion may falsely wakeup and read ARS as being busy even though the thread is going idle and then hang indefinitely. Fixes: bc6ba8085842 ("nfit, address-range-scrub: rework and simplify ARS...") Cc: <stable@vger.kernel.org> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Reported-by: Lukasz Dorau <lukasz.dorau@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-06-30acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a valueDave Jiang1-0/+2
cmd_rc is passed in by reference to the acpi_nfit_ctl() function and the caller expects a value returned. However, when the package is pass through via the ND_CMD_CALL command, cmd_rc is not touched. Make sure cmd_rc is always set. Fixes: aef253382266 ("libnvdimm, nfit: centralize command status translation") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-06-12treewide: devm_kzalloc() -> devm_kcalloc()Kees Cook1-3/+4
The devm_kzalloc() function has a 2-factor argument form, devm_kcalloc(). This patch replaces cases of: devm_kzalloc(handle, a * b, gfp) with: devm_kcalloc(handle, a * b, gfp) as well as handling cases of: devm_kzalloc(handle, a * b * c, gfp) with: devm_kzalloc(handle, array3_size(a, b, c), gfp) as it's slightly less ugly than: devm_kcalloc(handle, array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: devm_kzalloc(handle, 4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. Some manual whitespace fixes were needed in this patch, as Coccinelle really liked to write "=devm_kcalloc..." instead of "= devm_kcalloc...". The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ expression HANDLE; type TYPE; expression THING, E; @@ ( devm_kzalloc(HANDLE, - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | devm_kzalloc(HANDLE, - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression HANDLE; expression COUNT; typedef u8; typedef __u8; @@ ( devm_kzalloc(HANDLE, - sizeof(u8) * (COUNT) + COUNT , ...) | devm_kzalloc(HANDLE, - sizeof(__u8) * (COUNT) + COUNT , ...) | devm_kzalloc(HANDLE, - sizeof(char) * (COUNT) + COUNT , ...) | devm_kzalloc(HANDLE, - sizeof(unsigned char) * (COUNT) + COUNT , ...) | devm_kzalloc(HANDLE, - sizeof(u8) * COUNT + COUNT , ...) | devm_kzalloc(HANDLE, - sizeof(__u8) * COUNT + COUNT , ...) | devm_kzalloc(HANDLE, - sizeof(char) * COUNT + COUNT , ...) | devm_kzalloc(HANDLE, - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ expression HANDLE; type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ expression HANDLE; identifier SIZE, COUNT; @@ - devm_kzalloc + devm_kcalloc (HANDLE, - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression HANDLE; expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( devm_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | devm_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | devm_kzalloc(HANDLE, - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | devm_kzalloc(HANDLE, - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | devm_kzalloc(HANDLE, - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | devm_kzalloc(HANDLE, - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | devm_kzalloc(HANDLE, - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | devm_kzalloc(HANDLE, - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression HANDLE; expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | devm_kzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | devm_kzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ expression HANDLE; identifier STRIDE, SIZE, COUNT; @@ ( devm_kzalloc(HANDLE, - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | devm_kzalloc(HANDLE, - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | devm_kzalloc(HANDLE, - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | devm_kzalloc(HANDLE, - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | devm_kzalloc(HANDLE, - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | devm_kzalloc(HANDLE, - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | devm_kzalloc(HANDLE, - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | devm_kzalloc(HANDLE, - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression HANDLE; expression E1, E2, E3; constant C1, C2, C3; @@ ( devm_kzalloc(HANDLE, C1 * C2 * C3, ...) | devm_kzalloc(HANDLE, - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | devm_kzalloc(HANDLE, - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | devm_kzalloc(HANDLE, - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | devm_kzalloc(HANDLE, - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression HANDLE; expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( devm_kzalloc(HANDLE, sizeof(THING) * C2, ...) | devm_kzalloc(HANDLE, sizeof(TYPE) * C2, ...) | devm_kzalloc(HANDLE, C1 * C2 * C3, ...) | devm_kzalloc(HANDLE, C1 * C2, ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - (E1) * E2 + E1, E2 , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - (E1) * (E2) + E1, E2 , ...) | - devm_kzalloc + devm_kcalloc (HANDLE, - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-03acpi, nfit: Remove ecc_unit_sizeDan Williams1-11/+0
The "Clear Error Unit" may be smaller than the ECC unit size on some devices. For example, poison may be tracked at 64-byte alignment even though the ECC unit is larger. Unless / until the ACPI specification provides a non-ambiguous way to communicate this property do not expose this to userspace. Software that had been using this property must already be prepared for the case where the property is not provided on older kernels, so it is safe to remove this attribute. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-10Merge tag 'libnvdimm-for-4.17' of ↵Linus Torvalds3-367/+339
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit d2c997c0f145 ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...
2018-04-09Merge branch 'for-4.17/libnvdimm' into libnvdimm-for-nextDan Williams3-367/+339
2018-04-07nfit, address-range-scrub: add module option to skip initial arsDan Williams1-0/+7
After attempting to quickly retrieve known errors the kernel proceeds to kick off a long running ARS. Add a module option to disable this behavior at initialization time, or at new region discovery time. Otherwise, ARS can be started manually regardless of the state of this setting. Co-developed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-07nfit, address-range-scrub: rework and simplify ARS state machineDan Williams2-268/+218
ARS is an operation that can take 10s to 100s of seconds to find media errors that should rarely be present. If the platform crashes due to media errors in persistent memory, the expectation is that the BIOS will report those known errors in a 'short' ARS request. A 'short' ARS request asks platform firmware to return an ARS payload with all known errors, but without issuing a 'long' scrub. At driver init a short request is issued to all PMEM ranges before registering regions. Then, in the background, a long ARS is scheduled for each region. The ARS implementation is simplified to centralize ARS completion work in the ars_complete() helper. The timeout is removed since there is no facility to cancel ARS, and this otherwise arranges for system init to never be blocked waiting for a 'long' ARS. The ars_state flags are used to coordinate ARS requests from driver init, ARS requests from userspace, and ARS requests in response to media error notifications. Given that there is no notification of ARS completion the implementation still needs to poll. It backs off exponentially to a maximum poll period of 30 minutes. Suggested-by: Toshi Kani <toshi.kani@hpe.com> Co-developed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-07nfit, address-range-scrub: determine one platform max_ars valueDan Williams2-39/+41
acpi_nfit_query_poison() is awkward in that it requires an nfit_spa argument in order to determine what max_ars value to use. Instead probe for the minimum max_ars across all scrub-capable ranges in the system and drop the nfit_spa argument. This enables a larger rework / simplification of the ARS state machine whereby the status can be retrieved once and then iterated over all address ranges to reap completions. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-05nfit, address-range-scrub: introduce nfit_spa->ars_stateDan Williams2-9/+18
In preparation for re-working the ARS implementation to better handle short vs long ARS runs, introduce nfit_spa->ars_state. For now this just replaces the nfit_spa->ars_required bit-field/flag, but going forward it will be used to track ARS completion and make short vs long ARS requests. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>