linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2016-08-05	Merge branches 'acpi-ec' and 'acpi-button'	Rafael J. Wysocki	1	-4/+37
	* acpi-ec: ACPI / EC: Work around method reentrancy limit in ACPICA for _Qxx * acpi-button: ACPI / button: remove pointer to old lid_sysfs on unbind
2016-08-04	ACPI / EC: Work around method reentrancy limit in ACPICA for _Qxx	Lv Zheng	1	-4/+37
	A regression is caused by the following commit: Commit: 02b771b64b73226052d6e731a0987db3b47281e9 Subject: ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations In this commit, using system workqueue causes that the maximum parallel executions of _Qxx can exceed 255. This violates the method reentrancy limit in ACPICA and generates the following error log: ACPI Error: Method reached maximum reentrancy limit (255) (20150818/dsmethod-341) This patch creates a seperate workqueue and limits the number of parallel _Qxx evaluations down to a configurable value (can be tuned against number of online CPUs). Since EC events are handled after driver probe, we can create the workqueue in acpi_ec_init(). Fixes: 02b771b64b73 (ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations) Link: https://bugzilla.kernel.org/show_bug.cgi?id=135691 Cc: 4.3+ <stable@vger.kernel.org> # 4.3+ Reported-and-tested-by: Helen Buus <ubuntu@hbuus.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-25	Merge branches 'acpi-ec', 'acpi-video', 'acpi-button' and 'acpi-thermal'	Rafael J. Wysocki	1	-53/+53
	* acpi-ec: ACPI / EC: Remove wrong ECDT correction quirks ACPI / EC: Cleanup boot EC code using acpi_ec_alloc() * acpi-video: ACPI / video: Dummy acpi_video_register should return error code ACPI / video: skip evaluating _DOD when it does not exist ACPI / video: Thinkpad X201 Tablet needs video_detect_force_video * acpi-button: ACPI / button: Add quirks for initial lid state notification ACPI / button: Refactor functions to eliminate redundant code ACPI / button: Remove initial lid state notification * acpi-thermal: ACPI / thermal: Remove create_workqueue()
2016-07-08	ACPI / EC: Fix code ordering issue in ec_remove_handlers()	Lv Zheng	1	-2/+13
	There is an order issue in ec_remove_handlers() that acpi_ec_stop() is called before removing the operation region handler. That is incorrect, because the operation region handler removal triggers _REG(DISCONNECT) which may result in new EC transactions to carry out. That existing issue has been triggered by the following commit: Commit: dcf15cbded656a12335bc4151f3f75f10080a375 Subject: ACPI / EC: Fix a boot EC regresion by restoring boot EC which changed the driver to call ec_remove_handlers() after invoking _REG(CONNECT), so the issue has become visible. Fixes: dcf15cbded65 (ACPI / EC: Fix a boot EC regresion by restoring boot EC) Link: https://bugzilla.kernel.org/show_bug.cgi?id=102421 Reported-and-tested-by: Wolfram Sang <wsa@the-dreams.de> Reported-by: Nicholas <nkudriavtsev@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-04	ACPI / EC: Remove wrong ECDT correction quirks	Lv Zheng	1	-16/+5
	Our Windows probe result shows that EC._REG is evaluated after evaluating all _INI/_STA control methods. With boot EC always switched in acpi_ec_dsdt_probe(), we can see that as long as there is no EC opregion accesses in the MLC (module level code, AML code out of any control methods) and in _INI/_STA, there is no need to make sure that ECDT must be correct. Bugs of 9399/12461 were reported against an order issue that BAT0/1._STA evaluations contain EC accesses while the ECDT setting is wrong. >From the acpidump output posted on bug 9399, we can see that it is actually a different issue. In this table, if EC._REG is not executed, EC accesses will be done in a platform specific manner. As we've already ensured not to execute EC._REG during the eary stage, we can remove the quirks for bug 9399. From the acpidump output posted on bug 12461, we can see that it still needs the quirk. In this table, EC._REG flags a named object whose default value is One, thus BAT1._STA surely should invoke EC accesses whatever we invoke EC._REG or not. We have to keep the quirk for it before we can root cause the issue. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-04	ACPI / EC: Cleanup boot EC code using acpi_ec_alloc()	Lv Zheng	1	-37/+48
	Failure handling of the boot EC code is not tidy. This patch cleans them up with acpi_ec_alloc(). This patch also changes acpi_ec_dsdt_probe(), always switches the boot EC from the ECDT one to the DSDT one in this function. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-07	ACPI / EC: Fix a boot EC regresion by restoring boot EC support for the DSDT EC	Lv Zheng	1	-7/+22
	According to the Windows probing result, during the table loading, the EC device described in the ECDT should be used. And the ECDT EC is also effective during the period the namespace objects are initialized (we can see a separate process executing _STA/_INI on Windows before executing other device specific control methods, for example, EC._REG). During the device enumration, the EC device described in the DSDT should be used. But there are differences between Linux and Windows around the device probing order. Thus in Linux, we should enable the DSDT EC as early as possible before enumerating devices in order not to trigger issues related to the device enumeration order differences. This patch thus converts acpi_boot_ec_enable() into acpi_ec_dsdt_probe() to fix the gap. This also fixes a user reported regression triggered after we switched the "table loading"/"ECDT support" to be ACPI spec 2.0 compliant. Fixes: 59f0aa9480cf (ACPI 2.0 / ECDT: Remove early namespace reference from EC) Link: https://bugzilla.kernel.org/show_bug.cgi?id=119261 Reported-and-tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09	ACPI 2.0 / ECDT: Remove early namespace reference from EC	Lv Zheng	1	-91/+54
	All operation region accesses are allowed by AML interpreter when AML is executed, so actually BIOSen are responsible to avoid the operation region accesses in AML before OSPM has prepared an operation region driver. This is done via _REG control method. So AML code normally sets a global named object REGC to 1 when _REG(3, 1) is evaluated. Then what is ECDT? Quoting from ACPI spec 6.0, 5.2.15 Embedded Controller Boot Resources Table (ECDT): "The presence of this table allows OSPM to provide Embedded Controller operation region space access before the namespace has been evaluated." Spec also suggests a compatible mean to indicate the early EC access availability: Device (EC) { Name (REGC, Ones) Method (_REG, 2) { If (LEqual (Arg0, 3)) { Store (Arg1, REGC) } } Method (ECAV) { If (LEqual (REGC, Ones)) { If (LGreaterEqual (_REV, 2)) { Return (One) } Else { Return (Zero) } } Else { Return (REGC) } } } In this way, it allows EC accesses to happen before EC._REG(3, 1) is invoked. But ECAV is not the only way practical BIOSen using to indicate the early EC access availibility, the known variations include: 1. Setting REGC to One in \_SB._INI when _REV >= 2. Since \_SB._INI is the first control method evaluated by OSPM during the enumeration, this allows EC accesses to happen for the entire enumeration process before the namespace EC is enumerated. 2. Initialize REGC to One by default, this even allows EC accesses to happen during the table loading. Linux is now broken around ECDT support during the long term bug fixing work because it has merged many wrong ECDT bug fixes (see details below). Linux currently uses namespace EC's settings instead of ECDT settings when ECDT is detected. This apparently will result in namespace walk and _CRS/_GPE/_REG evaluations. Such stuffs could only happen after namespace is ready, while ECDT is purposely to be used before namespace is ready. The wrong bug fixing story is: 1. Link 1: At Linux ACPI early stages, "no _Lxx/_Exx/_Qxx evaluation can happen before the namespace is ready" are not ensured by ACPICA core and Linux. This is currently ensured by deferred enabling of GPE and defered registering of EC query methods (acpi_ec_register_query_methods). 2. Link 2: Reporters reported buggy ECDTs, expecting quirks for the platform. Originally, the quirk is simple, only doing things with ECDT. Bug 9399 and 12461 are platforms (Asus L4R, Asus M6R, MSI MS-171F) reported to have wrong ECDT IO port addresses, the port addresses are reversed. Bug 11880 is a platform (Asus X50GL) reported to have 0 valued port addresses, we can see that all EC accesses are protected by ECAV on this platform, so actually no early EC accesses is required by this platform. 3. Link 3: But when the bug fixing developer was requested to provide a handy and non-quirk bug fix, he tried to use correct EC settings from namespace and broke the spec purpose. We can even see that the developer was suffered from many regrssions. One interesting one is 14086, where the actual root cause obviously should be: _REG is evaluated too early. But unfortunately, the bug is fixed in a totally wrong way. So everything goes wrong from these commits: Commit: c6cb0e878446c79f42e7833d7bb69ed6bfbb381f Subject: ACPI: EC: Don't trust ECDT tables from ASUS Commit: a5032bfdd9c80e0231a6324661e123818eb46ecd Subject: ACPI: EC: Always parse EC device This patch reverts Linux behavior to simple ECDT quirk support in order to stop early _CRS/_GPE/_REG evaluations. For Bug 9399, 12461, since it is reported that the platforms require early EC accesses, this patch restores the simple ECDT quirks for them. For Bug 11880, since it is not reported that the platform requires early EC accesses and its ACPI tables contain correct ECAV, we choose an ECDT enumeration failure for this platform. Link 1: https://bugzilla.kernel.org/show_bug.cgi?id=9916 http://bugzilla.kernel.org/show_bug.cgi?id=10100 https://lkml.org/lkml/2008/2/25/282 Link 2: https://bugzilla.kernel.org/show_bug.cgi?id=9399 https://bugzilla.kernel.org/show_bug.cgi?id=12461 https://bugzilla.kernel.org/show_bug.cgi?id=11880 Link 3: https://bugzilla.kernel.org/show_bug.cgi?id=11884 https://bugzilla.kernel.org/show_bug.cgi?id=14081 https://bugzilla.kernel.org/show_bug.cgi?id=14086 https://bugzilla.kernel.org/show_bug.cgi?id=14446 Link 4: https://bugzilla.kernel.org/show_bug.cgi?id=112911 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09	ACPI 2.0 / ECDT: Split EC_FLAGS_HANDLERS_INSTALLED	Lv Zheng	1	-41/+55
	This patch splits EC_FLAGS_HANDLERS_INSTALLED so that address space handler can be installed when it is not possible to install GPE handler during early stage. This patch also tunes address space handler installation, making it happening earlier than GPE handler installation for the same purpose. Since acpi_ec_start()/acpi_ec_stop() will be entered multiple times after applying this change, it is also required to protect acpi_enable_gpe()/ acpi_disable_gpe() invocations. Link: https://bugzilla.kernel.org/show_bug.cgi?id=112911 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-16	ACPI-EC: Drop unnecessary check made before calling acpi_ec_delete_query()	Markus Elfring	1	-1/+1
	The acpi_ec_delete_query() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-26	ACPI / EC: Fix a race issue in acpi_ec_guard_event()	Lv Zheng	1	-13/+29
	In acpi_ec_guard_event(), EC transaction state machine variables should be checked with the EC spinlock locked. The bug doesn't trigger any real issue now because this bug can only occur when the ec_event_clearing=event mode is applied while there is no user currently using this mode. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-26	ACPI / EC: Fix query handler related issues	Lv Zheng	1	-28/+45
	1. acpi_ec_remove_query_handlers() This patch refines the query handler removal logic implemented in acpi_ec_remove_query_handler(), making it to invoke new acpi_ec_remove_query_handlers() API, and ensuring all other removal code paths to invoke the new API to honor the reference count of the query handlers. 2. acpi_ec_get_query_handler_by_value() This patch also refines the query handler search logic originally implemented in acpi_ec_query(), collecting it into acpi_ec_get_query_handler_by_value(). And since schedule_work() can ensure the serilization of acpi_ec_event_handler(), we needn't put the mutex_lock() around schedule_work(). Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-26	ACPI / EC: Fix a memory leak issue in acpi_ec_query()	Lv Zheng	1	-0/+2
	When query handler is not found, "result" is actually stil 0, and "struct acpi_ec_query" is not NULL, so the deletion code of "struct acpi_ec_query" at the end of the function cannot be invoked. As a consequence, memory leak can be observed. The issue is introduced by this commit: Commit: 02b771b64b73226052d6e731a0987db3b47281e9 Subject: ACPI / EC: Fix an issue caused by the serialized _Qxx This patch fixes such memory leakage. Fixes: 02b771b64b73 (ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations) Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-01	Merge branches 'acpi-pci', 'acpi-soc', 'acpi-ec' and 'acpi-osl'	Rafael J. Wysocki	1	-22/+60
	* acpi-pci: ACPI, PCI: Penalize legacy IRQ used by ACPI SCI * acpi-soc: ACPI / LPSS: Ignore 10ms delay for Braswell * acpi-ec: ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations * acpi-osl: ACPI / osl: replace custom implementation of readq / writeq
2015-08-25	ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations	Lv Zheng	1	-22/+60
	It is proven that Windows evaluates _Qxx handlers in a parallel way. This patch follows this fact, splits _Qxx evaluations from the NOTIFY queue to form a separate queue, so that _Qxx evaluations can be queued up on different CPUs rather than being queued up on a CPU0 bound queue. Event handling related callbacks are also renamed and sorted in this patch. Link: https://bugzilla.kernel.org/show_bug.cgi?id=94411 Reported-and-tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-08	ACPI: Remove FSF mailing addresses	Jarkko Nikula	1	-4/+0
	There is no need to carry potentially outdated Free Software Foundation mailing address in file headers since the COPYING file includes it. Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-15	ACPI / EC: Fix a code coverity issue when QR_EC transactions are failed.	Lv Zheng	1	-1/+13
	When the QR_EC transaction fails, the EC_FLAGS_QUERY_PENDING flag prevents the event handling work queue from being scheduled again. Though there shouldn't be failed QR_EC transactions, and this gap was efficiently used for catching and learning the SCI_EVT clearing timing compliance issues, we need to fix this as we are not fully compatible with all platforms/Windows to handle SCI_EVT clearing timing correctly. Fixing this gives the EC driver the chances to recover from a state machine failure. So this patch fixes this issue. When nr_pending_queries drops to 0, it clears EC_FLAGS_QUERY_PENDING at the proper position for different modes in order to ensure that the SCI_EVT handling can proceed. In order to be clearer for future ec_event_clearing modes, all checks in this patch are written in the inclusive style, not the exclusive style. Cc: 3.16+ <stable@vger.kernel.org> # 3.16+ Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-15	ACPI / EC: Fix EC_FLAGS_QUERY_HANDSHAKE platforms using new event clearing ↵	Lv Zheng	1	-7/+9
	timing. It is reported that on several platforms, EC firmware will not respond non-expected QR_EC (see EC_FLAGS_QUERY_HANDSHAKE, only write QR_EC when SCI_EVT is set). Unfortunately, ACPI specification doesn't define when the SCI_EVT should be cleared by the firmware, thus the original implementation queued up second QR_EC right after writing QR_EC command and before reading the returned event value as at that time the SCI_EVT is ensured not cleared. This behavior is also based on the assumption that the firmware should be able to return 0x00 to indicate "no outstanding event". This behavior did fix issues on Samsung platforms where the spurious query value of 0x00 is supported and didn't break platforms in my test queue. But recently, specific Acer, Asus, Lenovo platforms keep on blaming this change. This patch changes the behavior to re-check the SCI_EVT a bit later and removes EC_FLAGS_QUERY_HANDSHAKE quirks, hoping this is the Windows compliant EC driver behavior. In order to be robust to the possible regressions, instead of removing the quirk directly, this patch keeps the quirk code, removes the quirk users and keeps old behavior for Samsung platforms. Cc: 3.16+ <stable@vger.kernel.org> # 3.16+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=94411 Link: https://bugzilla.kernel.org/show_bug.cgi?id=97381 Link: https://bugzilla.kernel.org/show_bug.cgi?id=98111 Reported-and-tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Reported-and-tested-by: Tigran Gabrielyan <tigrangab@gmail.com> Reported-and-tested-by: Adrien D <ghbdtn@openmailbox.org> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-15	ACPI / EC: Add event clearing variation support.	Lv Zheng	1	-5/+132
	We've been suffering from the uncertainty of the SCI_EVT clearing timing. This patch implements 3 of 4 possible modes to handle SCI_EVT clearing variations. The old behavior is kept in this patch. Status: QR_EC is re-checked as early as possible after checking previous SCI_EVT. This always leads to 2 QR_EC transactions per SCI_EVT indication and the target may implement event queue which returns 0x00 indicating "no outstanding event". This is proven to be a conflict against Windows behavior, but is still kept in this patch to make the EC driver robust to the possible regressions that may occur on Samsung platforms. Query: QR_EC is re-checked after the target has handled the QR_EC query request command pushed by the host. Event: QR_EC is re-checked after the target has noticed the query event response data pulled by the host. This timing is not determined by any IRQs, so we may need to use a guard period in this mode, which may explain the existence of the ec_guard() code used by the old EC driver where the re-check timing is implemented in the similar way as this mode. Method: QR_EC is re-checked as late as possible after completing the _Qxx evaluation. The target may implement SCI_EVT like a level triggered interrupt. It is proven on kernel bugzilla 94411 that, Windows will have all _Qxx evaluations parallelized. Thus unless required by further evidences, we needn't implement this mode as it is a conflict of the _Qxx parallelism requirement. Note that, according to the reports, there are platforms that cannot be handled using the "Status" mode without enabling the EC_FLAGS_QUERY_HANDSHAKE quirk. But they can be handled with the other modes according to the tests (kernel bugzilla 97381). The following log entry can be used to confirm the differences of the 3 modes as it should appear at the different positions for the 3 modes: Command(QR_EC) unblocked Status: appearing after EC_SC(W) = 0x84 Query: appearing after EC_DATA(R) = 0xXX where XX is the event number used to determine _QXX Event: appearing after first EC_SC(R) = 0xX0 SCI_EVT=x BURST=0 CMD=0 IBF=0 OBF=0 that is next to the following log entry: Command(QR_EC) completed by hardware Link: https://bugzilla.kernel.org/show_bug.cgi?id=94411 Link: https://bugzilla.kernel.org/show_bug.cgi?id=97381 Link: https://bugzilla.kernel.org/show_bug.cgi?id=98111 Reported-and-tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Reported-and-tested-by: Tigran Gabrielyan <tigrangab@gmail.com> Reported-and-tested-by: Adrien D <ghbdtn@openmailbox.org> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-15	ACPI / EC: Convert event handling work queue into loop style.	Lv Zheng	1	-9/+24
	During the period that a work queue is scheduled (queued up for run) but hasn't been run, second schedule_work() could fail. This may not lead to the loss of queries because QR_EC is always ensured to be submitted after the work queue has been in the running state. The event handling work queue can be changed into the loop style to allow us to control the code in a more flexible way: 1. Makes it possible to add event=0x00 termination condition in the loop. 2. Increases the thoughput of the QR_EC transactions as the 2nd+ QR_EC transactions may be handled in the same work item used for the 1st QR_EC transaction, thus the delay caused by the 2nd+ work item scheduling can be eliminated. Except the logging message changes and the throughput improvement, this patch is just a funcitonal no-op. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Tested-by: Tigran Gabrielyan <tigrangab@gmail.com> Tested-by: Adrien D <ghbdtn@openmailbox.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-15	ACPI / EC: Cleanup transaction state transition.	Lv Zheng	1	-8/+15
	This patch collects transaction state transition code into one function. We then could have a single function to maintain transaction transition related behaviors. No functional changes. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Tested-by: Tigran Gabrielyan <tigrangab@gmail.com> Tested-by: Adrien D <ghbdtn@openmailbox.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-16	ACPI / EC: Remove non-root-caused busy polling quirks.	Lv Zheng	1	-32/+3
	{ Update to correct 1 patch subject in the description } We have fixed a lot of race issues in the EC driver recently. The following commit introduces MSI udelay()/msleep() quirk to MSI laptops to make EC firmware working for bug 12011 without root causing any EC driver race issues: Commit: 5423a0cb3f74c16e90683f8ee1cec6c240a9556e Subject: ACPI: EC: Add delay for slow MSI controller Commit: 34ff4dbccccce54c83b1234d39b7ad9e548a75dd Subject: ACPI: EC: Separate delays for MSI hardware The following commit extends ECDT validation quirk to MSI laptops to make EC driver locating EC registers properly for bug 12461: Commit: a5032bfdd9c80e0231a6324661e123818eb46ecd Subject: ACPI: EC: Always parse EC device This is a different quirk than the MSI udelay()/msleep() quirk. This patch keeps validating ECDT for only "Micro-Star MS-171F" as reported. The following commit extends MSI udelay()/msleep() quirk to Quanta laptops to make EC firmware working for bug 20242, there is no requirement to validate ECDT for Quanta laptops: Commit: 534bc4e3d27096e2f3fc00c14a20efd597837a4f Mon Sep 17 00:00:00 2001 Subject: ACPI EC: enable MSI workaround for Quanta laptops The following commit extends MSI udelay()/msleep() quirk to Clevo laptops to make EC firmware working for bug 77431, there is no requirement to validate ECDT for Clevo laptops: Commit: 777cb382958851c88763253fe00a26529be4c0e9 Subject: ACPI / EC: Add msi quirk for Clevo W350etq All udelay()/msleep() quirks for MSI/Quanta/Clevo seem to be the wrong fixes generated without fixing the EC driver race issues. And even if it is not wrong, the guarding can be covered by the following commits in wait polling mode: Commit: 9e295ac14d6a59180beed0735e6a504c2ee87761 Subject: ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp. Commit: commit in the same series Subject: ACPI / EC: Fix and clean up register access guarding logics. The only case that is not covered is the inter-transaction guarding. And there is no evidence that we need the inter-transaction guarding upon reading the noted bug entries. So it is time to remove the quirks and let the users to try again. If there is a regression, the only thing we need to do is to restore the inter-transaction guarding for the reported platforms. Link: https://bugzilla.kernel.org/show_bug.cgi?id=12011 Link: https://bugzilla.kernel.org/show_bug.cgi?id=12461 Link: https://bugzilla.kernel.org/show_bug.cgi?id=20242 Link: https://bugzilla.kernel.org/show_bug.cgi?id=77431 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-16	ACPI / EC: Add module params for polling modes.	Lv Zheng	1	-16/+23
	We have 2 polling modes in the EC driver: 1. busy polling: originally used for the MSI quirks. udelay() is used to perform register access guarding. 2. wait polling: normal code path uses wait_event_timeout() and it can be woken up as soon as the transaction is completed in the interrupt mode. It also contains the register acces guarding logic in case the interrupt doesn't arrive and the EC driver is about to advance the transaction in task context (the polling mode). The wait polling is useful for interrupt mode to allow other tasks to use the CPU during the wait. But for the polling mode, the busy polling takes less time than the wait polling, because if no interrupt arrives, the wait polling has to wait the minimal HZ interval. We have a new use case for using the busy polling mode. Some GPIO drivers initialize PIN configuration which cause a GPIO multiplexed EC GPE to be disabled out of the GPE register's control. Busy polling mode is useful here as it takes less time than the wait polling. But the guarding logic prevents it from responding even faster. We should spinning around the EC status rather than spinning around the nop execution lasted a determined period. This patch introduces 2 module params for the polling mode switch and the guard time, so that users can use the busy polling mode without the guarding in case the guarding is not necessary. This is an example to use the 2 module params for this purpose: acpi.ec_busy_polling acpi.ec_polling_guard=0 We've tested the patch on a test platform. The platform suffers from such kind of the GPIO PIN issue. The GPIO driver resets all PIN configuration and after that, EC interrupt cannot arrive because of the multiplexing. Then the platform suffers from a long delay carried out by the wait_event_timeout() as all further EC transactions will run in the polling mode. We switched the EC driver to use the busy polling mechanism instead of the wait timeout polling mechanism and the delay is still high: [ 44.283005] calling PNP0C0B:00+ @ 1305, parent: platform [ 44.417548] call PNP0C0B:00+ returned 0 after 131323 usecs And this patch can significantly reduce the delay: [ 44.502625] calling PNP0C0B:00+ @ 1308, parent: platform [ 44.503760] call PNP0C0B:00+ returned 0 after 1103 usecs Tested-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-16	ACPI / EC: Fix and clean up register access guarding logics.	Lv Zheng	1	-28/+54
	In the polling mode, EC driver shouldn't access the EC registers too frequently. Though this statement is concluded from the non-root caused bugs (see links below), we've maintained the register access guarding logics in the current EC driver. The guarding logics can be found here and there, makes it hard to root cause real timing issues. This patch collects the guarding logics into one single function so that all hidden logics related to this can be seen clearly. The current guarding related code also has several issues: 1. Per-transaction timestamp prevents inter-transaction guarding from being implemented in the same place. We have an inter-transaction udelay() in acpi_ec_transaction_unblocked(), this logic can be merged into ec_poll() if we can use per-device timestamp. This patch completes such merge to form a new ec_guard() function and collects all guarding related hidden logics in it. One hidden logic is: there is no inter-transaction guarding performed for non MSI quirk (wait polling mode), this patch skips inter-transaction guarding before wait_event_timeout() for the wait polling mode to reveal the hidden logic. The other hidden logic is: there is msleep() inter-transaction guarding performed when the GPE storming is observed. As after merging this commit: Commit: e1d4d90fc0313d3d58cbd7912c90f8ef24df45ff Subject: ACPI / EC: Refine command storm prevention support EC_FLAGS_COMMAND_STORM is ensured to be cleared after invoking acpi_ec_transaction_unlocked(), the msleep() guard logic will never happen now. Since no one complains such change, this logic is likely added during the old times where the EC race issues are not fixed and the bugs are false root-caused to the timing issue. This patch simply removes the out-dated logic. We can restore it by stop skipping inter-transaction guarding for wait polling mode. Two different delay values are defined for msleep() and udelay() while they are merged in this patch to 550us. 2. time_after() causes additional delay in the polling mode (can only be observed in noirq suspend/resume processes where polling mode is always used) before advance_transaction() is invoked ("wait polling" log is added before wait_event_timeout()). We can see 2 wait_event_timeout() invocations. This is because time_after() ensures a ">" validation while we only need a ">=" validation here: [ 86.739909] ACPI: Waking up from system sleep state S3 [ 86.742857] ACPI : EC: 2: Increase command [ 86.742859] ACPI : EC: *** Command(RD_EC) started * [ 86.742861] ACPI : EC: ===== TASK (0) ===== [ 86.742871] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 86.742873] ACPI : EC: EC_SC(W) = 0x80 [ 86.742876] ACPI : EC: * Event started * [ 86.742880] ACPI : EC: ~~~~~ wait polling ~~~~~ [ 86.743972] ACPI : EC: ~~~~~ wait polling ~~~~~ [ 86.747966] ACPI : EC: ===== TASK (0) ===== [ 86.747977] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 86.747978] ACPI : EC: EC_DATA(W) = 0x06 [ 86.747981] ACPI : EC: ~~~~~ wait polling ~~~~~ [ 86.751971] ACPI : EC: ~~~~~ wait polling ~~~~~ [ 86.755969] ACPI : EC: ===== TASK (0) ===== [ 86.755991] ACPI : EC: EC_SC(R) = 0x21 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=1 [ 86.755993] ACPI : EC: EC_DATA(R) = 0x03 [ 86.755994] ACPI : EC: ~~~~~ wait polling ~~~~~ [ 86.755995] ACPI : EC: * Command(RD_EC) stopped * [ 86.755996] ACPI : EC: 1: Decrease command This patch corrects this by using time_before() instead in ec_guard(): [ 54.283146] ACPI: Waking up from system sleep state S3 [ 54.285414] ACPI : EC: 2: Increase command [ 54.285415] ACPI : EC: * Command(RD_EC) started * [ 54.285416] ACPI : EC: ~~~~~ wait polling ~~~~~ [ 54.285417] ACPI : EC: ===== TASK (0) ===== [ 54.285424] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 54.285425] ACPI : EC: EC_SC(W) = 0x80 [ 54.285427] ACPI : EC: * Event started * [ 54.285429] ACPI : EC: ~~~~~ wait polling ~~~~~ [ 54.287209] ACPI : EC: ===== TASK (0) ===== [ 54.287218] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 54.287219] ACPI : EC: EC_DATA(W) = 0x06 [ 54.287222] ACPI : EC: ~~~~~ wait polling ~~~~~ [ 54.291190] ACPI : EC: ===== TASK (0) ===== [ 54.291210] ACPI : EC: EC_SC(R) = 0x21 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=1 [ 54.291213] ACPI : EC: EC_DATA(R) = 0x03 [ 54.291214] ACPI : EC: ~~~~~ wait polling ~~~~~ [ 54.291215] ACPI : EC: * Command(RD_EC) stopped *** [ 54.291216] ACPI : EC: 1: Decrease command After cleaning up all guarding logics, we have one single function ec_guard() collecting all old, non-root-caused, hidden logics. Then we can easily tune the logics in one place to respond to the bug reports. Except the time_before() change, all other changes do not change the behavior of the EC driver. Link: https://bugzilla.kernel.org/show_bug.cgi?id=12011 Link: https://bugzilla.kernel.org/show_bug.cgi?id=20242 Link: https://bugzilla.kernel.org/show_bug.cgi?id=77431 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-16	ACPI / EC: Remove irqs_disabled() check.	Lv Zheng	1	-2/+1
	The following commit merges polling and interrupt modes for EC driver: Commit: 2a84cb9852f52c0cd1c48bca41a8792d44ad06cc Mon Sep 17 00:00:00 2001 Subject: ACPI: EC: Merge IRQ and POLL modes The irqs_disabled() check introduced in it tries to fall into busy polling mode when the context of ec_poll() cannot sleep. Actually ec_poll() is ensured to be invoked in the contexts that can sleep (from a sysfs /sys/kernel/debug/ec/ec0/io access, or from acpi_evaluate_object(), or from acpi_ec_gpe_poller()). Without the MSI quirk, we never saw the udelay() logic invoked. Thus this check is useless and can be removed. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-16	ACPI / EC: Remove storming threashold enlarging quirk.	Lv Zheng	1	-15/+0
	This patch removes the storming threashold enlarging quirk. After applying the following commit, we can notice that there is no no-op GPE handling invocation can be observed, thus it is unlikely that the no-op counts can exceed the storming threashold: Commit: ca37bfdfbc8d0a3ec73e4b97bb26dcfa51d515aa Subject: ACPI / EC: Fix several GPE handling issues by deploying ACPI_GPE_DISPATCH_RAW_HANDLER mode. Even when the storming happens, we have already limited its affection to the only transaction and no further transactions will be affected. This is done by this commit: Commit: e1d4d90fc0313d3d58cbd7912c90f8ef24df45ff Subject: ACPI / EC: Refine command storm prevention support So it's time to remove this quirk. Link: https://bugzilla.kernel.org/show_bug.cgi?id=45151 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-16	ACPI / EC: Update acpi_ec_is_gpe_raised() with new GPE status flag.	Lv Zheng	1	-1/+1
	This patch updates acpi_ec_is_gpe_raised() according to the following commit: Commit: 09af8e8290deaff821ced01ea83594ee4c21e8df Subject: ACPICA: Events: Add support to return both enable/status register values for GPE and fixed event. This is actually a no-op change as both the flags are defined to a same value. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-22	ACPI / EC: fix NULL pointer dereference in acpi_ec_remove_query_handler()	Chris Bainbridge	1	-1/+1
	Use list_for_each_entry_safe for iterating because handler may be freed in the loop. BUG: unable to handle kernel NULL pointer dereference at 000000000000002c IP: [<ffffffff814d69c8>] acpi_ec_put_query_handler+0x7/0x1a Call Trace: acpi_ec_remove_query_handler+0x87/0x97 acpi_smbus_hc_remove+0x2a/0x44 [sbshc] acpi_device_remove+0x7b/0x9a __device_release_driver+0x7e/0x110 driver_detach+0xb0/0xc0 bus_remove_driver+0x54/0xe0 driver_unregister+0x2b/0x60 acpi_bus_unregister_driver+0x10/0x12 acpi_smb_hc_driver_exit+0x10/0x12 [sbshc] SyS_delete_module+0x1b8/0x210 system_call_fastpath+0x12/0x6a Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-02	ACPI / EC: Call acpi_walk_dep_device_list() after installing EC opregion handler	Lan Tianyu	1	-0/+3
	On some machines(E,G Mircosoft surface 3), ACPI battery depends on the EC operation region and it has _DEP method which contains EC. Current code doesn't support such devices whose dep_unmet will be not be decreased after EC opregion handler being installed. This blocks battery device to be attached with its driver. This patch is to fix the issue. Link: https://bugzilla.kernel.org/show_bug.cgi?id=90161 Reported-and-tested-by: Lompik <lompik@voila.fr> Tested-by: Valentin Lab <valentin.lab_bugzilla.kernel.org@kalysto.org> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-03-10	ACPI / EC: Add GPE reference counting debugging messages.	Lv Zheng	1	-2/+10
	This patch enhances debugging with the GPE reference count messages added. This kind of log entries can be used by the platform validators to validate if there is an EC transaction broken because of firmware/driver bugs. No functional changes. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-03-10	ACPI / EC: Cleanup logging/debugging splitter support.	Lv Zheng	1	-34/+74
	This patch refines logging/debugging splitter support so that when DEBUG is disabled, splitters won't be visible in the kernel logs while they are still available for developers when DEBUG is enabled. This patch also refines the splitters to mark the following handling process boundaries: +++++: boundary of driver starting/stopping boundary of IRQ storming =====: boundary of transaction advancement *****: boundary of EC command boundary of EC query #####: boundary of EC _Qxx evaluation The following 2 log entries are originally logged using pr_info() in order to be used as the boot/suspend/resume log entries for the EC device, this patch also restores them to pr_info() logging level: ACPI : EC: EC started ACPI : EC: EC stopped In this patch, one log entry around "Polling quirk" is converted into ec_dbg_raw() which doesn't contain the boundary marker. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-17	ACPI / EC: Remove non-standard log emphasis	Scot Doyle	1	-2/+2
	Remove unusual pr_info() visual emphasis introduced in ad479e7f47ca "ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag". Signed-off-by: Scot Doyle <lkml14@scotdoyle.com> [ rjw: Change pr_info() to pr_debug() too in those places. ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-11	Revert "ACPI / EC: Add query flushing support"	Rafael J. Wysocki	1	-85/+16
	Revert commit f252cb09e1cb (ACPI / EC: Add query flushing support), because it breaks system suspend on Acer Aspire S5. The machine just hangs solid at the last stage of suspend (after taking non-boot CPUs offline). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-11	Revert "ACPI / EC: Add GPE reference counting debugging messages"	Rafael J. Wysocki	1	-22/+2
	Revert commit b5bca896ef3c (ACPI / EC: Add GPE reference counting debugging messages), because it depends on commit f252cb09e1cb (ACPI / EC: Add query flushing support) which breaks system suspend on Acer Aspire S5 and needs to be reverted. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-06	ACPI / EC: Add GPE reference counting debugging messages	Lv Zheng	1	-2/+22
	This patch enhances debugging with the GPE reference count messages added. No functional changes. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-06	ACPI / EC: Add query flushing support	Lv Zheng	1	-16/+85
	This patch implementes the QR_EC flushing support. Grace periods are implemented from the detection of an SCI_EVT to the submission/completion of the QR_EC transaction. During this period, all EC command transactions are allowed to be submitted. Note that query periods and event periods are intentionally distiguished to allow further improvements. 1. Query period: from the detection of an SCI_EVT to the sumission of the QR_EC command. This period is used for storming prevention, as currently QR_EC is deferred to a work queue rather than directly issued from the IRQ context even there is no other transactions pending, so malicous SCI_EVT GPE can act like "level triggered" to trigger a GPE storm. We need to be prepared for this. And in the future, we may change it to be a part of the advance_transaction() where we will try QR_EC submission in appropriate positions to avoid such GPE storming. 2. Event period: from the detection of an SCI_EVT to the completion of the QR_EC command. We may extend it to the completion of _Qxx evaluation. This is actually a grace period for event flushing, but we only flush queries due to the reason stated in known issue 1. That's also why we use EC_FLAGS_EVENT_xxx. During this period, QR_EC transactions need to pass the flushable submission check. In this patch, the following flags are implemented: 1. EC_FLAGS_EVENT_ENABLED: this is derived from the old EC_FLAGS_QUERY_PENDING flag which can block SCI_EVT handlings. With this flag, the logics implemented by the original flag are extended: 1. Old logic: unless both of the flags are set, the event poller will not be scheduled, and 2. New logic: as soon as both of the flags are set, the evet poller will be scheduled. 2. EC_FLAGS_EVENT_DETECTED: this is also derived from the old EC_FLAGS_QUERY_PENDING flag which can block SCI_EVT detection. It thus can be used to indicate the storming prevention period for query submission. acpi_ec_submit_request()/acpi_ec_complete_request() are invoked to implement this period so that acpi_set_gpe() can be invoked under the "reference count > 0" condition. 3. EC_FLAGS_EVENT_PENDING: this is newly added to indicate the grace period for event flushing (query flushing for now). acpi_ec_submit_request()/acpi_ec_complete_request() are invoked to implement this period so that the flushing process can wait until the event handling (query transaction for now) to be completed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=82611 Link: https://bugzilla.kernel.org/show_bug.cgi?id=77431 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Ortwin Glück <odi@odi.ch> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-06	ACPI / EC: Refine command storm prevention support	Lv Zheng	1	-19/+36
	This patch refines EC command storm prevention support. Current command storming code is wrong, when the storming condition is detected, it only flags the condition without doing anything for the current command but performing storming prevention for the follow-up commands. So: 1. The first command which suffers from the storming still suffers from storming. 2. The follow-up commands which may not suffer from the storming are unconditionally forced into the storming prevention mode. Ideally, we should only enable storm prevention immediately after detection for the current command so that the next command can try the power/performance efficient interrupt mode again. This patch improves the command storm prevention by disabling GPE right after the detection and re-enabling it right before completing the command transaction using the GPE storming prevention APIs. This thus deploys the following GPE handling model: 1. acpi_enable_gpe()/acpi_disable_gpe() for reference count changes: This set of APIs are used for EC usage reference counting. 2. acpi_set_gpe(ACPI_GPE_ENABLE)/acpi_set_gpe(ACPI_GPE_DISABLE): This set of APIs are used for preventing GPE storm. They must be invoked when the reference count > 0. Note that as the storming prevention should always happen when there is an outstanding request, or GPE enabling value will be messed up by the races. This patch also adds BUG_ON() to enforces this rule to prevent future bugs. The msleep(1) used after completing a transaction is useless now as this sounds like a guard time only useful for platforms that need the EC_FLAGS_MSI quirks while we have fixed GPE race issues using the previous raw handler mode enabling. It is kept to avoid regressions. A seperate patch which deletes EC_FLAGS_MSI quirks should take care of deleting it. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-06	ACPI / EC: Add command flushing support.	Lv Zheng	1	-3/+65
	This patch implements the EC command flushing support. During the grace period indicated by EC_FLAGS_STARTED and EC_FLAGS_STOPPED, all submitted EC command transactions can be completed and new submissions are prevented before suspending so that the EC hardware can be ensured to be in the idle state when the system is resumed. There is a good indicator for flush support: All acpi_ec_submit_request() is invoked after checking driver state with acpi_ec_started() except the first one. This means all code paths can be flushed as fast as possible by discarding the requests occurred after the flush operation. The reference increased for such kind of code path is wrapped by acpi_ec_submit_flushable_request(). Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Ortwin Glück <odi@odi.ch> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-06	ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag	Lv Zheng	1	-11/+54
	By using the 2 flags, we can indicate an inter-mediate state where the current transactions should be completed while the new transactions should be dropped. The comparison of the old flag and the new flags: Old New about to set BLOCKED STOPPED set / STARTED set BLOCKED set STOPPED clear / STARTED clear BLOCKED clear STOPPED clear / STARTED set A new period can be indicated by the 2 flags. The new period is between the point where we are about to set BLOCKED and the point when the BLOCKED is set. The new flags facilitate us with acpi_ec_started() check to allow the EC transaction to be submitted during the new period. This period thus can be used as a grace period for the EC transaction flushing. The only functional change after applying this patch is: 1. The GPE enabling/disabling is protected by the EC specific lock. We can do this because of recent ACPICA GPE API enhancement. This is reasonable as the GPE disabling/enabling state should only be determined by the EC driver's state machine which is protected by the EC spinlock. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Ortwin Glück <odi@odi.ch> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-05	ACPI / EC: Update revision due to raw handler mode.	Lv Zheng	1	-3/+3
	The bug fixes around GPE races have been done to the EC driver by the previous commits. This patch increases the revision to 3 to indicate the behavior differences between the old and the new drivers. The copyright/authorship notices are also updated. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-05	ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp.	Lv Zheng	1	-3/+14
	Timeout in the ec_poll() doesn't refer to the last register access time. It thus can win the competition against the acpi_ec_gpe_handler() if a transaction takes longer than 1ms but individual register accesses are less than 1ms. In some cases, it can make the following silicon bug easier to be triggered: GPE EN is not wired to the GPE trigger line, so when GPE STS is already set when 1 is written to GPE EN, no GPE can be triggered. This patch adds register access timestamp reference support for ec_poll() to reduce the number of ec_poll() invocations. Reported-by: Venkat Raghavulu <venkat.raghavulu@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-05	ACPI / EC: Fix several GPE handling issues by deploying ↵	Lv Zheng	1	-8/+73
	ACPI_GPE_DISPATCH_RAW_HANDLER mode. This patch switches EC driver into ACPI_GPE_DISPATCH_RAW_HANDLER mode where the GPE lock is not held for acpi_ec_gpe_handler() and the ACPICA internal GPE enabling/disabling/clearing operations are bypassed so that further improvements are possible with the GPE APIs. There are 2 strong reasons for deploying raw GPE handler mode in the EC driver: 1. Some hardware logics can control their interrupts via their own registers, so their interrupts can be disabled/enabled/acknowledged without using the super IRQ controller provided functions. While there is no mean (EC commands) for the EC driver to achieve this. 2. During suspending, the EC driver is still working for a while to complete the platform firmware provided functionailities using ec_poll() after all GPEs are disabled (see acpi_ec_block_transactions()), which means the EC driver will drive the EC GPE out of the GPE core's control. Without deploying the raw GPE handler mode, we can see many races between the EC driver and the GPE core due to the above restrictions: 1. There is a race condition due to ACPICA internal GPE disabling/clearing/enabling logics in acpi_ev_gpe_dispatch(): Orignally EC GPE is disabled (EN=0), cleared (STS=0) before invoking a GPE handler and re-enabled (EN=1) after invoking a GPE handler in acpi_ev_gpe_dispatch(). When re-enabling appears, GPE may be flagged (STS=1). ================================================================= (event pending A) ================================================================= acpi_ev_gpe_dispatch() ec_poll() EN=0 STS=0 acpi_ec_gpe_handler() *************************************************************** (event handling A) Lock(EC) advance_transaction() EC_SC read ================================================================= (event pending B) ================================================================= EC_SC handled Unlock(EC) ************************************************************* ************************************************************* (event handling B) Lock(EC) advance_transaction() EC_SC read ================================================================= (event pending C) ================================================================= EC_SC handled Unlock(EC) ************************************************************* EN=1 This race condition is the root cause of different issues on different silicon variations. A. Silicon variation A: On some platforms, GPE will be triggered due to "writing 1 to EN when STS=1". This is because both EN and STS lines are wired to the GPE trigger line. 1. Issue 1: We can see no-op acpi_ec_gpe_handler() invoked on such platforms. This is because: a. event pending B: An event can arrive after ACPICA's GPE clearing performed in acpi_ev_gpe_dispatch(), this event may fail to be detected by EC_SC read that is performed before its arrival; b. event handling B: The event can be handled in ec_poll() because EC lock is released after acpi_ec_gpe_handler() invocation; c. There is no code in ec_poll() to clear STS but the GPE can still be triggered by the EN=1 write performed in acpi_ev_finish_gpe(), this leads to a no-op EC GPE handler invocation; d. As no-op GPE handler invocations are counted by the EC driver to trigger the command storming conditions, the wrong no-op GPE handler invocations thus can easily trigger wrong command storming conditions. Note 1: If we removed GPE disabling/enabling code from acpi_ev_gpe_dispatch(), we could still see no-op GPE handlers triggered by the event arriving after the GPE clearing and before the GPE handling on both silicon variation A and B. This can only occur if the CPU is very slow (timing slice between STS=0 write and EC_SC read should be short enough before hardware sets another GPE indication). Thus this is very rare and is not what we need to fix. B. Silicon variation B: On other platforms, GPE may not be triggered due to "writing 1 to EN when STS=1". This is because only STS line is wired to the GPE trigger line. 2. Issue 2: We can see GPE loss on such platforms. This is because: a. event pending B vs. event handling A: An event can arrive after ACPICA's GPE handling performed in acpi_ev_gpe_dispatch(), or event pending C vs. event handling B: An event can arrive after Linux's GPE handling performed in ec_poll(), these events may fail to be detected by EC_SC read that is performed before their arrival; b. The GPE cannot be triggered by EN=1 write performed in acpi_ev_finish_gpe(); c. If no polling mechanism is implemented in the driver for the pending event (for example, SCI_EVT), this event is lost due to no GPE being triggered. Note 2: On most platforms, there might be another rule that GPE may not be triggered due to "writing 1 to STS when STS=1 and EN=1". Then on silicon variation B, an even worse case is if the issue 2 event loss happens, further events may never trigger GPE again on such platforms due to being blocked by the current STS=1. Unless someone clears STS, all events have to be polled. 2. There is a race condition due to lacking in GPE status checking in EC driver: Originally, GPE status is checked in ACPICA core but not checked in the GPE handler. Thus since the status checking and handling is not locked, it can be interrupted by another handling path. ================================================================= (event pending A) ================================================================= acpi_ev_gpe_detect() ec_poll() if (EN==1 && STS==1) ************************************************************* (event handling A) Lock(EC) advance_transaction() EC_SC read EC_SC handled Unlock(EC) ************************************************************* acpi_ev_gpe_dispatch() EN=0 STS=0 acpi_ec_gpe_handler() ************************************************************* (event handling B) Lock(EC) advance_transaction() EC_SC read Unlock(EC) ************************************************************* 3. Issue 3: We can see no-op acpi_ec_gpe_handler() invoked on both silicon variation A and B. This is because: a. event pending A: An event can arrive to trigger an EC GPE and ACPICA checks it and is about to invoke the EC GPE handler; b. event handling A: The event can be handled in ec_poll() because EC lock is not held after the GPE status checking; c. event handling B: Then when the EC GPE handler is invoked, it becomes a no-op GPE handler invocation. d. As no-op GPE handler invocations are counted by the EC driver to trigger the command storming conditions, the wrong no-op GPE handler invocations thus can easily trigger wrong command storming conditions. Note 3: This no-op GPE handler invocation is rare because the time between the IRQ arrival and the acpi_ec_gpe_handler() invocation is less than the timeout value waited in ec_poll(). So most of the no-op GPE handler invocations are caused by the reason described in issue 1. 3. There is a race condition due to ACPICA internal GPE clearing logic in acpi_enable_gpe(): During runtime, acpi_enable_gpe() can be invoked by the EC storming prevention code. When it is invoked, GPE may be flagged (STS=1). ================================================================= (event pending A) ================================================================= acpi_ev_gpe_dispatch() acpi_ec_transaction() EN=0 STS=0 acpi_ec_gpe_handler() ************************************************************* (event handling A) Lock(EC) advance_transaction() EC_SC read EC_SC handled Unlock(EC) ************************************************************* EN=1 ? Lock(EC) Unlock(EC) ================================================================= (event pending B) ================================================================= acpi_enable_gpe() STS=0 EN=1 4. Issue 4: We can see GPE loss on both silicon variation A and B platforms. This is because: a. event pending B: An event can arrive right before ACPICA's GPE clearing performed in acpi_enable_gpe(); b. If the GPE is cleared when GPE is disabled, then EN=1 write in acpi_enable_gpe() cannot trigger this GPE; c. If no polling mechanism is implemented in the driver for this event (for example, SCI_EVT), this event is lost due to no GPE being triggered. Note 4: Currently we don't have this issue, but after we switch the EC driver into ACPI_GPE_DISPATCH_RAW_HANDLER mode, we need to take care of handling this because the EN=1 write in acpi_ev_gpe_dispatch() will be abandoned. There might be more race issues for the current GPE handler usages. This is because the EC IRQ's enabling/disabling/checking/clearing/handling operations should be locked by a single lock that is under the EC driver's control to achieve the serialization. Which means we need to invoke GPE APIs with EC driver's lock held and all ACPICA internal GPE operations related to the GPE handler should be abandoned. Invoking GPE APIs inside of the EC driver lock and bypassing ACPICA internal GPE operations requires the ACPI_GPE_DISPATCH_RAW_HANDLER mode where the same lock used by the APIs are released prior than invoking the handlers. Otherwise, we can see dead locks due to circular locking dependencies (see Reference below). This patch then switches the EC driver into the ACPI_GPE_DISPATCH_RAW_HANDLER mode so that it can perform correct GPE operations using the GPE APIs: 1. Bypasses EN modifications performed in acpi_ev_gpe_dispatch() by using acpi_install_gpe_raw_handler() and invoking all GPE APIs with EC spin lock held. This can fix issue 1 as it makes a non frequent GPE enabling/disabling environment. 2. Bypasses STS clearing performed in acpi_enable_gpe() by replacing acpi_enable_gpe()/acpi_disable_gpe() with acpi_set_gpe(). This can fix issue 4. And this can also help to fix issue 1 as it makes a no sudden GPE clearing environment when GPE is frequently enabled/disabled. 3. Ensures STS acknowledged before handling by invoking acpi_clear_gpe() in advance_transaction(). This can finally fix issue 1 even in a frequent GPE enabling/disabling environment. And this can also finally fix issue 3 when issue 2 is fixed. Note 3: GPE clearing is edge triggered W1C, which means we can clear it right before handling it. Since all EC GPE indications are handled in advance_transaction() by previous commits, we can now move GPE clearing into it to implement the correct GPE clearing. Note 4: We can use acpi_set_gpe() which is not shared GPE safer instead of acpi_enable_gpe()/acpi_disable_gpe() because EC GPE is not shared by other hardware, which is mentioned in the ACPI specification 5.0, 12.6 Interrupt Model: "OSPM driver treats this as an edge event (the EC SCI cannot be shared)". So we can stop using shared GPE safer APIs acpi_enable_gpe()/acpi_disable_gpe() in the EC driver. Otherwise cleanups need to be made in acpi_ev_enable_gpe() to bypass the GPE clearing logic before keeping acpi_enable_gpe(). This patch also invokes advance_transaction() when GPE is re-enabled in the task context which: 1. Ensures EN=1 can trigger GPE by checking and handling EC status register right after EN=1 writes. This can fix issue 2. After applying this patch, without frequent GPE enablings considered: ================================================================= (event pending A) ================================================================= acpi_ec_gpe_handler() ec_poll() ************************************************************* (event handling A) Lock(EC) advance_transaction() if STS==1 STS=0 EC_SC read ================================================================= (event pending B) ================================================================= EC_SC handled Unlock(EC) ************************************************************* ************************************************************* (event handling B) Lock(EC) advance_transaction() if STS==1 STS=0 EC_SC read ================================================================= (event pending C) ================================================================= EC_SC handled Unlock(EC) ************************************************************* The event pending for issue 1 (event pending B) can arrive as a next GPE due to the previous IRQ context STS=0 write. And if it is handled by ec_poll() (event handling B), as it is also acknowledged by ec_poll(), the event pending for issue 2 (event pending C) can properly arrive as a next GPE after the task context STS=0 write. So no GPE will be lost and never triggered due to GPE clearing performed in the wrong position. And since all GPE handling is performed after a locked GPE status checking, we can hardly see no-op GPE handler invocations due to issue 1 and 3. We may still see no-op GPE handler invocations due to "Note 1", but as it is inevitable, it needn't be fixed. After applying this patch, with frequent GPE enablings considered: ================================================================= (event pending A) ================================================================= acpi_ec_gpe_handler() acpi_ec_transaction() ************************************************************* (event handling A) Lock(EC) advance_transaction() if STS==1 STS=0 EC_SC read ================================================================= (event pending B) ================================================================= EC_SC handled Unlock(EC) ************************************************************* ************************************************************* (event handling B) Lock(EC) EN=1 if STS==1 advance_transaction() if STS==1 STS=0 EC_SC read ================================================================= (event pending C) ================================================================= EC_SC handled Unlock(EC) *************************************************************** The event pending for issue 2 can be manually handled by advance_transaction(). And after the STS=0 write performed in the manual triggered advance_transaction(), GPE can always arrive. So no GPE will be lost due to frequent GPE disabling/enabling performed in the driver like issue 4. Note 5: It's ideally when EN=1 write occurred, an IRQ thread should be woken up to handle the GPE when the GPE was raised. But this requires the IRQ thread to contain the poller code for all EC GPE indications, while currently some of the indications are handled in the user tasks. It then is very hard for the code to determine whether a user task should be invoked or the poller work item should be scheduled. So we have to invoke advance_transaction() directly now and it leaves us such a restriction for the GPE re-enabling: it must be performed in the task context to avoid starving the GPEs. As a conclusion: we can see the EC GPE is always handled in serial after deploying the raw GPE handler mode: Lock(EC) if (STS==1) STS=0 EC_SC read EC_SC handled Unlock(EC) The EC driver specific lock is responsible to make the EC GPE handling processes serialized so that EC can handle its GPE from both IRQ and task contexts and the next IRQ can be ensured to arrive after this process. Note 6: We have many EC_FLAGS_MSI qurik users in the current driver. They all seem to be suffering from unexpected GPE triggering source lost. And they are false root caused to a timing issue. Since EC communication protocol has already flow control defined, timing shouldn't be the root cause, while this fix might be fixing the root cause of the old bugs. Link: https://lkml.org/lkml/2014/11/4/974 Link: https://lkml.org/lkml/2014/11/18/316 Link: https://www.spinics.net/lists/linux-acpi/msg54340.html Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23	ACPI / EC: Cleanup QR_EC related code	Lv Zheng	1	-50/+20
	The QR_EC related code pieces have redundants, this patch merges them into acpi_ec_query() which invokes acpi_ec_transaction() where EC mutex and the global lock are already held. After doing so, query handler traversal still need to be locked by EC mutex after invoking acpi_ec_transaction(). Note that EC event handling is sequential. We fetch one event from firmware event queue and process it until 0x00 or error returned. So we don't need to hold mutex for whole acpi_ec_clear() process to determine whether we should continue to drain. And for the same reason, we don't need to hold mutex for the whole procedure from the QR_EC transaction to the query handler traversal. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23	ACPI / EC: Fix issues related to the SCI_EVT handling	Lv Zheng	1	-34/+25
	This patch fixes 2 issues related to the draining behavior. But it doesn't implement the draining support, it only cleans up code so that further draining support is possible. The draining behavior is expected by some platforms (for example, Samsung) where SCI_EVT is set only once for a set of events and might be cleared for the very first QR_EC command issued after SCI_EVT is set. EC firmware on such platforms will return 0x00 to indicate "no outstanding event". Thus after seeing an SCI_EVT indication, EC driver need to fetch events until 0x00 returned (see acpi_ec_clear()). Issue 1 - acpi_ec_submit_query(): It's reported on Samsung laptops that SCI_EVT isn't checked when the transactions are advanced in ec_poll(), which leads to SCI_EVT triggering source lost: If no EC GPE IRQs are arrived after that, EC driver cannot detect this event and handle it. See comment 244/247 for kernel bugzilla 44161. This patch fixes this issue by moving SCI_EVT checks into advance_transaction(). So that SCI_EVT is checked each time we are going to handle the EC firmware indications. And this check will happen for both IRQ context and task context. Since after doing that, SCI_EVT is also checked after completing a transaction, ec_check_sci() and ec_check_sci_sync() can be removed. Issue 2 - acpi_ec_complete_query(): We expect to clear EC_FLAGS_QUERY_PENDING to allow queuing another draining QR_EC after writing a QR_EC command and before reading the event. After reading the event, SCI_EVT might be cleared by the firmware, thus it may not be possible to queue such a draining QR_EC at that time. But putting the EC_FLAGS_QUERY_PENDING clearing code after start_transaction() is wrong as there are chances that after start_transaction(), QR_EC can fail to be sent. If this happens, EC_FLAG_QUERY_PENDING will be cleared earlier. As a consequence, the draining QR_EC will also be queued earlier than expected. This patch also moves this code into advance_transaction() where QR_EC is just sent (ACPI_EC_COMMAND_POLL flagged) to fix this issue. Notes: 1. After introducing the 2 SCI_EVT related handlings into advance_transaction(), a next QR_EC can be queued right after writing the current QR_EC command and before reading the event. But this still hasn't implemented the draining behavior as the draining support requires: If a previous returned event value isn't 0x00, a draining QR_EC need to be issued even when SCI_EVT isn't set. 2. In this patch, acpi_os_execute() is also converted into a seperate work item to avoid invoking kmalloc() in the atomic context. We can do this because of the previous global lock fix. 3. Originally, EC_FLAGS_EVENT_PENDING is also used to avoid queuing up multiple work items (created by acpi_os_execute()), this can be covered by only using a single work item. But this patch still keeps this flag as there are different usages in the driver initialization steps relying on this flag. Link: https://bugzilla.kernel.org/show_bug.cgi?id=44161 Reported-by: Kieran Clancy <clancy.kieran@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23	ACPI / EC: Fix a code path that global lock is not held	Lv Zheng	1	-0/+10
	Currently QR_EC is queued up on CPU 0 to be safe with SMM because there is no global lock held for acpi_ec_gpe_query(). As we are about to move QR_EC to a non CPU 0 bound work queue to avoid invoking kmalloc() in advance_transaction(), we have to acquire global lock for the new QR_EC work item to avoid regressions. Known issue: 1. Global lock for acpi_ec_clear(). This is an existing issue that acpi_ec_clear() which invokes acpi_ec_sync_query() also suffers from the same issue. But this patch's target is only the code to invoke acpi_ec_sync_query() in a CPU 0 bound work queue item, and the acpi_ec_clear() can be automatically fixed by further patch that merges the redundant code, so it is left unchanged. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23	ACPI / EC: Fix returning values in acpi_ec_sync_query()	Lv Zheng	1	-6/+10
	The returning value of acpi_os_execute() is erroneously handled as errno. This patch corrects it by returning EBUSY to indicate the work queue item creation failure. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23	ACPI / EC: Add reference counting for query handlers	Lv Zheng	1	-10/+33
	This patch adds reference counting for query handlers in order to eliminate kmalloc()/kfree() usage. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Steffen Weber <steffen.weber@gmail.com> Tested-by: Ortwin Glück <odi@odi.ch> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23	ACPI / EC: Cleanup transaction wakeup code	Lv Zheng	1	-8/+9
	This patch moves transaction wakeup code into advance_transaction(). No functional changes. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-15	ACPI / EC: Fix unexpected ec_remove_handlers() invocations	Lv Zheng	1	-0/+2
	The ec_remove_handlers() is invoked without checking EC_FLAGS_HANDLERS_INSTALLED, this patch enhances this check to avoid issues that acpi_disable_gpe() is invoked unexpectedly to reduce the GPE runtime count. This may happen when the EC handler installation failed on some platforms. Reported-by: Venkat Raghavulu <venkat.raghavulu@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-29	ACPI / EC: Fix regression due to conflicting firmware behavior between ↵	Lv Zheng	1	-7/+18
	Samsung and Acer. It is reported that Samsung laptops that need to poll events are broken by the following commit: Commit 3afcf2ece453e1a8c2c6de19cdf06da3772a1b08 Subject: ACPI / EC: Add support to disallow QR_EC to be issued when SCI_EVT isn't set The behaviors of the 2 vendor firmwares are conflict: 1. Acer: OSPM shouldn't issue QR_EC unless SCI_EVT is set, firmware automatically sets SCI_EVT as long as there is event queued up. 2. Samsung: OSPM should issue QR_EC whatever SCI_EVT is set, firmware returns 0 when there is no event queued up. This patch is a quick fix to distinguish the behaviors to make Acer behavior only effective for Acer EC firmware so that the breakages on Samsung EC firmware can be avoided. Fixes: 3afcf2ece453 (ACPI / EC: Add support to disallow QR_EC to be issued ...) Link: https://bugzilla.kernel.org/show_bug.cgi?id=44161 Reported-and-tested-by: Ortwin Glück <odi@odi.ch> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Cc: 3.17+ <stable@vger.kernel.org> # 3.17+ [ rjw : Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>