linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message	Author	Files	Lines
2020-06-01	habanalabs: correctly cast u64 to void*	Oded Gabbay	1	-1/+1
	Use the u64_to_user_ptr(x) kernel macro to correctly cast a u64 to void*. Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200601065648.8775-2-oded.gabbay@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
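In the kernel, u64_to_user_ptr() type-checks its argument as a u64, casts it through uintptr_t and marks the result as a __user pointer. The core of that cast can be sketched in userspace as follows; `demo_u64_to_ptr` and `demo_ptr_to_u64` are hypothetical names, and the __user annotation and type check are omitted. Casting a 64-bit integer straight to void * can trigger a pointer-size warning on 32-bit targets, which the intermediate uintptr_t avoids.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace analogue of u64_to_user_ptr(): going through
 * uintptr_t avoids the "cast to pointer from integer of different size"
 * warning on 32-bit builds, where void * is narrower than 64 bits. */
static inline void *demo_u64_to_ptr(uint64_t x)
{
	return (void *)(uintptr_t)x;
}

/* The reverse direction, as used when a user pointer is stored in a
 * 64-bit field of an ioctl argument structure. */
static inline uint64_t demo_ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}
```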
2020-05-25	habanalabs: don't set default fence_ops->wait	Daniel Vetter	1	-1/+0
	It's the default. Also, so much for "we're not going to tell the graphics people how to review their code": dma_fence is a pretty core piece of GPU driver infrastructure. It is also very much uapi-relevant, including piles of corresponding userspace protocols and libraries for how to pass these around. It would be great if habanalabs did not use this (from a quick look, it's not needed at all), since open-sourcing the userspace and playing by the usual rules isn't on the table. If that's not possible (because it's actually using the uapi part of dma_fence to interact with GPU drivers), then we have exactly what everyone promised we'd want to avoid. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19	habanalabs: add signal/wait to CS IOCTL operations	Omer Shpigelman	1	-20/+321
	Add the following two operations to the CS IOCTL: Signal: The signal operation is basically a command submission that is created by the driver upon user request. It will be implemented using a dedicated PQE that will increment a specific SOB. There will be a new flag: HL_CS_FLAGS_SIGNAL. When the user sets this flag in the CS IOCTL structure, the driver will execute a dedicated code path that will prepare this special PQE and submit it. The user only needs to provide a queue index on which to put the signal. Wait: The wait operation is also a command submission that is created by the driver upon user request. It will be implemented using a dedicated PQE that will contain packets to "ARM a monitor", followed by a FENCE packet. There will be a new flag: HL_CS_FLAGS_WAIT. When the user sets this flag in the CS structure, the driver will execute a dedicated code path that will prepare this special PQE and submit it. The user needs to provide the following parameters: 1. queue ID 2. an array of signal_seq numbers and the number of signals to wait on (the length of signal_seq_arr). The IOCTL will return the CS sequence number of the wait it put on the queue ID. Currently, the code supports only signal_seq_nr==1, but this API definition will allow us to put a single PQE that waits on multiple signals. To correctly configure the monitor and fence, the driver will need to retrieve the specified signal CS object that contains the relevant SOB and its expected value. If the signal CS has already completed, there is no point in adding a wait operation. In this case, the driver will return to the user without putting anything on the PQ. The return code should indicate to the user that the signal was completed, as we won't return a CS sequence number for this wait. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
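A minimal sketch of the wait-preparation decision described in this commit, under stated assumptions: the structures, names and result codes are hypothetical, and the PQE is reduced to the three values the monitor and fence need (queue, SOB, target value).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a signal CS: the SOB it increments and the
 * value that SOB will hold once the signal has executed. */
struct demo_signal_cs {
	uint32_t sob_id;
	uint32_t expected_sob_val;
	bool completed;
};

/* Hypothetical reduction of the wait PQE: arm a monitor on a SOB for
 * a target value, then FENCE the queue until the monitor fires. */
struct demo_wait_pqe {
	uint32_t queue_id;
	uint32_t monitored_sob_id;
	uint32_t target_val;
};

enum demo_wait_result {
	DEMO_WAIT_SUBMITTED,   /* a wait PQE was prepared for the queue */
	DEMO_WAIT_SIGNAL_DONE, /* signal already completed, nothing queued */
};

/* Only signal_seq_nr == 1 is accepted, matching the current limit. */
static int demo_prepare_wait(uint32_t queue_id,
			     const struct demo_signal_cs *signals,
			     uint32_t signal_seq_nr,
			     struct demo_wait_pqe *out,
			     enum demo_wait_result *result)
{
	if (signal_seq_nr != 1)
		return -EINVAL;

	if (signals[0].completed) {
		/* No point in waiting: report completion instead of
		 * returning a new CS sequence number. */
		*result = DEMO_WAIT_SIGNAL_DONE;
		return 0;
	}

	out->queue_id = queue_id;
	out->monitored_sob_id = signals[0].sob_id;
	out->target_val = signals[0].expected_sob_val;
	*result = DEMO_WAIT_SUBMITTED;
	return 0;
}
```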
2020-05-19	habanalabs: handle the h/w sync object	Omer Shpigelman	1	-20/+31
	Define a structure representing the h/w sync object (SOB). A SOB can contain up to 2^15 values. Each signal CS will increment the SOB by 1, so after some time we will reach the maximum number the SOB can represent. When that happens, the driver needs to move to a different SOB for the signal operation. A SOB can be in one of 4 states: 1. Working state with value < 2^15. 2. We reached a value of 2^15, but the signal operations haven't completed yet OR there are pending waits on this signal. For the next submission, the driver will move to another SOB. 3. ALL the signal operations on the SOB have finished AND there are no more pending waits on the SOB AND we reached a value of 2^15 (this basically means the refcnt of the SOB is 0; see the explanation below). When that happens, the driver can clear the SOB by simply writing 0 to it with WREG32 and setting the refcnt back to 1. 4. The SOB is cleared and can be used the next time the driver needs to reuse an SOB. Per SOB, the driver will maintain a single refcnt that will be initialized to 1. When a signal or wait operation on this SOB is submitted to the PQ, the refcnt will be incremented. When a signal or wait operation on this SOB completes, the refcnt will be decremented. After the submission of the signal operation that increments the SOB to a value of 2^15, the refcnt is also decremented. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
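The SOB life cycle and refcount rules can be modeled in a few lines. This is a hypothetical, single-threaded sketch: `struct demo_sob` and its helpers are not driver code, `value` stands for the count the SOB will reach once all submitted signals execute, and the clear step stands in for the WREG32 of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum SOB value, as stated in the commit message (2^15). */
#define DEMO_SOB_MAX_VAL (1u << 15)

/* Hypothetical model of a h/w sync object and its driver refcount. */
struct demo_sob {
	uint32_t value;  /* stands in for the h/w SOB register */
	uint32_t refcnt; /* initialized to 1 */
};

static void demo_sob_init(struct demo_sob *sob)
{
	sob->value = 0;
	sob->refcnt = 1;
}

static void demo_sob_put(struct demo_sob *sob)
{
	if (--sob->refcnt == 0) {
		/* State 3 -> 4: clear the SOB and restore the initial
		 * reference so it can be reused later. */
		sob->value = 0;
		sob->refcnt = 1;
	}
}

/* A signal takes a reference and increments the SOB. Reaching the
 * maximum drops the initial reference, so the SOB is cleared once all
 * outstanding signals and waits on it have completed. */
static void demo_sob_submit_signal(struct demo_sob *sob)
{
	sob->refcnt++;
	sob->value++;
	if (sob->value == DEMO_SOB_MAX_VAL)
		demo_sob_put(sob);
}

/* A wait only takes a reference. */
static void demo_sob_submit_wait(struct demo_sob *sob)
{
	sob->refcnt++;
}

/* Called when a signal or wait on this SOB completes. */
static void demo_sob_complete(struct demo_sob *sob)
{
	demo_sob_put(sob);
}
```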
2020-05-19	uapi: habanalabs: add signal/wait operations	Omer Shpigelman	1	-1/+12
	This is a prerequisite to upstreaming GAUDI support. Signal/wait operations are done by the user to perform sync between two Primary Queues (PQs). The sync is done using the sync manager and it is usually resolved inside the device, but sometimes it can be resolved in the host, i.e. the user should be able to wait in the host until a signal has been completed. Signal and wait operations are defined by the driver because they need atomicity and serialization, which the driver already provides when submitting work to the different queues. To implement this feature, the driver "takes" a couple of h/w resources, and this is reflected by the defines added to the uapi file. The signal/wait operations are done via the existing CS IOCTL, and they use the same data structure. Some of the parameters have a different meaning, and for that we added unions to make the code more readable. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
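One way to picture these unions is a chunk structure whose fields carry different names depending on the CS type. The layout, field names and flag values below are illustrative assumptions for this sketch, not the actual habanalabs uapi definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones live in the uapi header. */
#define DEMO_CS_FLAGS_SIGNAL 0x2
#define DEMO_CS_FLAGS_WAIT   0x4

/* Hypothetical chunk layout: signal/wait submissions reuse the same
 * storage, and anonymous unions give each use a readable name. */
struct demo_cs_chunk {
	union {
		uint64_t cb_handle;      /* regular CS: command buffer handle */
		uint64_t signal_seq_arr; /* wait CS: user pointer to seq array */
	};
	uint32_t queue_index;
	union {
		uint32_t cb_size;            /* regular CS: CB size in bytes */
		uint32_t num_signal_seq_arr; /* wait CS: entries in the array */
	};
};

/* Number of signals a wait chunk refers to, or 0 for other CS types. */
static uint32_t demo_chunk_num_signals(const struct demo_cs_chunk *chunk,
				       uint32_t cs_flags)
{
	return (cs_flags & DEMO_CS_FLAGS_WAIT) ? chunk->num_signal_seq_arr : 0;
}
```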
2020-05-17	habanalabs: handle barriers in DMA QMAN streams	Oded Gabbay	1	-0/+1
	When we have a DMA QMAN with multiple streams, we need to know whether the command buffer contains at least one DMA packet in order to configure the barriers correctly when adding the 2xMSG_PROT at the end of the JOB. If there is no DMA packet, then there is no need to put an engine barrier. This is relevant only for GAUDI, as GOYA doesn't have streams, so the engine can't be busy with another stream. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
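The check itself amounts to scanning the parsed packets for a DMA opcode. A minimal sketch with hypothetical packet IDs (real QMAN packet encodings are ASIC-specific and are not reproduced here); on a multi-stream DMA QMAN, a true result would mean the closing MSG_PROT packets need an engine barrier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet opcodes, for illustration only. */
enum demo_packet_id {
	DEMO_PACKET_MSG_LONG,
	DEMO_PACKET_MSG_PROT,
	DEMO_PACKET_LIN_DMA,
	DEMO_PACKET_NOP,
};

/* Returns true if the parsed CB contains at least one DMA packet. */
static bool demo_cb_contains_dma_pkt(const enum demo_packet_id *pkts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pkts[i] == DEMO_PACKET_LIN_DMA)
			return true;
	return false;
}
```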
2020-03-24	habanalabs: Remove unused parse_cnt variable	Tomer Tayar	1	-2/+2
	The "parse_cnt" variable is incremented while validating the CS chunks, but it is actually not being used. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24	habanalabs: Avoid running restore chunks if no execute chunks	Tomer Tayar	1	-20/+21
	A CS with no chunks for the execute phase is invalid, so its context_switch/restore phase should not be run. Hence, move the check of the number of execute chunks to the beginning of hl_cs_ioctl(). Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
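A minimal sketch of the reordering, with a hypothetical argument structure and a counter standing in for the restore phase so the effect of validating first is observable. It is not the body of hl_cs_ioctl().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of the CS IOCTL arguments. */
struct demo_cs_args {
	uint32_t num_chunks_restore;
	uint32_t num_chunks_execute;
};

/* Counts how many times the restore phase ran. */
static int demo_restore_runs;

static int demo_cs_ioctl(const struct demo_cs_args *args)
{
	/* Validate first: a CS without execute chunks is invalid, so the
	 * restore phase must not run for it. */
	if (args->num_chunks_execute == 0)
		return -EINVAL;

	if (args->num_chunks_restore)
		demo_restore_runs++;

	/* ... the execute chunks would be submitted here ... */
	return 0;
}
```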
2020-03-24	habanalabs: use the user CB size as a default job size	Omer Shpigelman	1	-4/+2
	When no patched command buffer (CB) is created, use the user CB size as the job size. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-12-14	habanalabs: rate limit error msg on waiting for CS	Oded Gabbay	1	-2/+3
	If a user submits a CS, the submission fails, and the user doesn't check the return value but instead uses the error return value as a valid CS sequence number and asks to wait on it, the driver will print an error and return an error code for that wait. The real problem happens if the user then ignores the error of the wait and tries to wait again and again. This can lead to a flood of error messages from the driver and even a soft lockup event. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>
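For illustration, a simplified interval/burst rate limiter in the spirit of the kernel's ratelimit helpers (which a driver would typically use through dev_err_ratelimited()). The structure and function are hypothetical, and the current time is passed in explicitly so the sketch is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rate-limit state: at most `burst` messages are allowed
 * per `interval` time units. */
struct demo_ratelimit {
	uint64_t interval;
	uint32_t burst;
	uint64_t window_start;
	uint32_t printed;
	bool started;
};

/* Returns true if the caller may print a message at time `now`. */
static bool demo_ratelimit_ok(struct demo_ratelimit *rs, uint64_t now)
{
	if (!rs->started || now - rs->window_start >= rs->interval) {
		/* Start a new window. */
		rs->started = true;
		rs->window_start = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false; /* suppressed until the window expires */
}
```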
2019-11-21	habanalabs: don't print error when queues are full	Oded Gabbay	1	-3/+4
	If the queues are full and we return -EAGAIN to the user, there is no need to print an error, as that case isn't an error and the user is expected to re-submit the work. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
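On the user side, -EAGAIN simply means "resubmit". A hypothetical retry loop, with a stub standing in for the CS IOCTL (a real ioctl() call would instead return -1 and set errno to EAGAIN):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the CS IOCTL: reports -EAGAIN while the
 * queues are "full", here for the next `demo_busy_left` calls. */
static int demo_busy_left;

static int demo_submit_cs(void)
{
	if (demo_busy_left > 0) {
		demo_busy_left--;
		return -EAGAIN; /* queues full: not an error, try again */
	}
	return 0;
}

/* Resubmit on -EAGAIN, up to max_tries attempts; any other result is
 * returned to the caller unchanged. */
static int demo_submit_with_retry(int max_tries, int *tries_used)
{
	int rc = -EAGAIN;
	int i;

	for (i = 0; i < max_tries; i++) {
		rc = demo_submit_cs();
		if (rc != -EAGAIN)
			break;
		/* A real application might sleep or back off here. */
	}
	*tries_used = (i < max_tries) ? i + 1 : max_tries;
	return rc;
}
```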
2019-11-21	habanalabs: Add a new H/W queue type	Tomer Tayar	1	-38/+82
	This patch adds support for a new H/W queue type. This type of queue is for DMA and compute engine jobs, for which completion notifications are sent by H/W. A command buffer for this queue can be created either through the CB IOCTL, using the retrieved CB handle, or by preparing a buffer on the host or device SRAM/DRAM and using the device address of that buffer. The patch includes the handling of the 2 options, as well as the initialization of the H/W queue and its job scheduling. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-11-21	habanalabs: Mark queue as expecting CB handle or address	Tomer Tayar	1	-1/+3
	Jobs on some queues must be provided with a handle to a driver command buffer object, while jobs on other queues must be provided with an address to a command buffer. Currently the distinction is made based on the queue type, which is less flexible if the same queue type behaves differently on different types of ASICs. This patch adds a new queue property for this purpose, which is configured per queue type per ASIC type. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
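The per-ASIC property can be pictured as a lookup table indexed by ASIC and queue type. The enum values, table contents and the `requires_kernel_cb` field name are assumptions made for this sketch; the point is only that the same queue type can map to different behavior on different ASICs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue types and ASICs, for illustration only. */
enum demo_queue_type { DEMO_QUEUE_EXT, DEMO_QUEUE_INT, DEMO_QUEUE_HW, DEMO_QUEUE_TYPE_NUM };
enum demo_asic { DEMO_ASIC_A, DEMO_ASIC_B, DEMO_ASIC_NUM };

struct demo_queue_props {
	enum demo_queue_type type;
	/* true: jobs carry a handle to a driver CB object;
	 * false: jobs carry the address of a command buffer. */
	bool requires_kernel_cb;
};

/* Configured per queue type per ASIC; the values are made up. */
static const struct demo_queue_props
demo_props[DEMO_ASIC_NUM][DEMO_QUEUE_TYPE_NUM] = {
	[DEMO_ASIC_A] = {
		[DEMO_QUEUE_EXT] = { DEMO_QUEUE_EXT, true },
		[DEMO_QUEUE_INT] = { DEMO_QUEUE_INT, false },
		[DEMO_QUEUE_HW]  = { DEMO_QUEUE_HW,  false },
	},
	[DEMO_ASIC_B] = {
		[DEMO_QUEUE_EXT] = { DEMO_QUEUE_EXT, true },
		[DEMO_QUEUE_INT] = { DEMO_QUEUE_INT, false },
		[DEMO_QUEUE_HW]  = { DEMO_QUEUE_HW,  true },
	},
};

static bool demo_job_needs_cb_handle(enum demo_asic asic, enum demo_queue_type q)
{
	return demo_props[asic][q].requires_kernel_cb;
}
```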
2019-09-05	habanalabs: stop using the acronym KMD	Oded Gabbay	1	-2/+3
	We want to stop using the acronym KMD. Therefore, in all locations where KMD is written (except for register names we can't modify), replace it with other terms such as "Linux kernel driver" or "Host kernel driver", etc. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05	habanalabs: add uapi to retrieve device utilization	Oded Gabbay	1	-4/+16
	Users and sysadmins usually want to know the device utilization as a level 0 indication of whether they are using the device efficiently. Add a new opcode to the INFO IOCTL that will return the device utilization over the last period of 100-1000ms. The return value is 0-100, representing the total utilization rate as a percentage. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
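The commit does not say how the driver measures busy time, so this is only a hypothetical sketch of turning a busy interval within a sampling period into the 0-100 value the new opcode returns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: busy time over a sampling period as an integer
 * percentage, clamped to 0-100. For periods in the 100-1000ms range
 * (at most 1e9 ns), busy_ns * 100 cannot overflow 64 bits. */
static uint32_t demo_utilization_pct(uint64_t busy_ns, uint64_t period_ns)
{
	if (period_ns == 0)
		return 0;
	if (busy_ns >= period_ns)
		return 100;
	return (uint32_t)((busy_ns * 100) / period_ns);
}
```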
2019-09-05	habanalabs: add debug print when rejecting CS	Oded Gabbay	1	-0/+2
	When rejecting a CS because of too many in-flight CS, print a debug message about it, as it is useful to know when the user is debugging (it indicates back-pressure from the driver, as the device is not fast enough to consume the CS). Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
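One common way to bound in-flight work is a fixed, power-of-two array of pending slots indexed by sequence number, where a new CS is rejected while its slot is still occupied. This is a hypothetical sketch of that kind of back-pressure, not necessarily the driver's actual data structure; the names and the limit of 4 are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit; must be a power of two for the mask to work. */
#define DEMO_MAX_PENDING_CS 4

struct demo_ctx {
	uint64_t next_seq;
	/* pending[i] is true while the CS mapped to slot i is in flight. */
	bool pending[DEMO_MAX_PENDING_CS];
};

/* Allocate a sequence number, or return -EAGAIN when the slot it maps
 * to is still held by an uncompleted CS (too many CS in flight). */
static int demo_cs_alloc_seq(struct demo_ctx *ctx, uint64_t *seq)
{
	uint64_t slot = ctx->next_seq & (DEMO_MAX_PENDING_CS - 1);

	if (ctx->pending[slot])
		return -EAGAIN; /* a debug print would go here */

	ctx->pending[slot] = true;
	*seq = ctx->next_seq++;
	return 0;
}

static void demo_cs_complete(struct demo_ctx *ctx, uint64_t seq)
{
	ctx->pending[seq & (DEMO_MAX_PENDING_CS - 1)] = false;
}
```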
2019-07-29	habanalabs: fix host memory polling in BE architecture	Ben Segal	1	-1/+1
	This patch fixes a bug in the host memory polling macro. The bug is that the memory being polled can be written by the device, which always writes it in LE. However, if the host is running Linux in BE mode, we need to convert the value that was written by the device before comparing it to the required value that the caller has given to the macro. Signed-off-by: Ben Segal <bpsegal20@gmail.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
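A minimal sketch of the fix, assuming the polled word is a 32-bit value: decode it as little endian regardless of host byte order before comparing. The helpers are hypothetical stand-ins; in the kernel, le32_to_cpu() converts a value that has already been read, rather than taking a byte pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value from memory, independent of the
 * host byte order. */
static uint32_t demo_le32_to_cpu(const volatile uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* One polling step: the device always writes in LE, so convert before
 * comparing with the caller's expected value. */
static int demo_host_mem_matches(const volatile uint8_t *addr, uint32_t expected)
{
	return demo_le32_to_cpu(addr) == expected;
}
```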
2019-05-09	habanalabs: change polling functions to macros	Oded Gabbay	1	-6/+4
	This patch changes two polling functions to macros, in order to make their API the same as the standard readl_poll_timeout so we would be able to define the "condition for exit" when calling these macros. This will simplify the code as it will eliminate the need to check both for timeout and for the (cond) in the calling function. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
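The macro form lets the caller pass the exit condition as an expression over the value just read, which a plain function cannot easily accept. A hypothetical sketch of that shape, with an iteration bound in place of a real timeout (and no sleeping) so that it stays deterministic, plus a fake register that advances on every read:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical macro shaped like readl_poll_timeout(): evaluate
 * op(addr) into val until cond is true or max_tries reads were made.
 * ret is set to 0 on success or -ETIMEDOUT otherwise. */
#define DEMO_POLL(op, addr, val, cond, max_tries, ret)		\
	do {							\
		int demo_tries_left_ = (max_tries);		\
		(ret) = -ETIMEDOUT;				\
		for (;;) {					\
			(val) = op(addr);			\
			if (cond) {				\
				(ret) = 0;			\
				break;				\
			}					\
			if (--demo_tries_left_ <= 0)		\
				break;				\
		}						\
	} while (0)

/* Fake register: returns its current value, then advances it. */
static uint32_t demo_read_ticking(uint32_t *reg)
{
	return (*reg)++;
}
```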
2019-05-01	habanalabs: remove redundant member from parser struct	Dalit Ben Zoor	1	-1/+0
	The use_virt_addr member was used to tell whether to treat the addresses in the CB as virtual during parsing. We disabled it only when calling the parser from the driver memset device function, and since this call has been removed, it should always be enabled. Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-25	habanalabs: rename restore to ctx_switch when appropriate	Oded Gabbay	1	-8/+8
	This patch only does renaming of certain variables and structure members, and their accompanied comments. This is done to better reflect the actions these variables and members represent. There is no functional change in this patch. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-27	habanalabs: improve error messages	Oded Gabbay	1	-1/+2
	This patch improves two error messages to help the user to better understand what error occurred. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-05	habanalabs: ratelimit warnings at start of IOCTLs	Oded Gabbay	1	-1/+1
	At the start of some IOCTLs we check if the device is disabled or in reset. If it is, we return -EBUSY and print a message to kernel log. Because these IOCTLs can be called at very high frequency, use ratelimit to avoid spamming the kernel log. Also use the same type of message - dev_warn - in all the relevant IOCTLs. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-03	habanalabs: perform accounting for active CS	Oded Gabbay	1	-0/+6
	This patch adds accounting for active CS. Active means that the CS was submitted to the H/W queues and was not completed yet. This is necessary to support suspend operation. Because the device will be reset upon suspend, we can only suspend after all active CS have been completed. Hence, we need to perform accounting on their number. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
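Active-CS accounting reduces to a counter that is incremented on submission and decremented on completion. A hypothetical sketch with C11 atomics; the names are illustrative, and failing immediately with -EBUSY is a simplification of waiting for the count to drain.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical count of CS submitted to the H/W queues but not yet
 * completed (static storage, so it starts at zero). */
static atomic_int demo_active_cs;

static void demo_cs_submitted(void) { atomic_fetch_add(&demo_active_cs, 1); }
static void demo_cs_completed(void) { atomic_fetch_sub(&demo_active_cs, 1); }

/* Suspend resets the device, so it may only proceed when no CS is
 * active. */
static int demo_try_suspend(void)
{
	return atomic_load(&demo_active_cs) == 0 ? 0 : -EBUSY;
}
```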
2019-02-28	habanalabs: soft-reset device if context-switch fails	Oded Gabbay	1	-7/+9
	This patch fixes a bug in the driver where, if the TPC or MME remains non-IDLE even after all the command submissions are done (due to a user bug or a malicious user), future command submissions will fail in the context-switch stage and the driver will remain in "stuck" mode. The fix is to soft-reset the device in case the context-switch fails, because the device should be IDLE during context-switch. If it is not IDLE, then something is wrong and we should reset the compute engines. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-18	habanalabs: add debugfs support	Oded Gabbay	1	-0/+12
	This patch adds debugfs support to the driver. It allows user space to display information that is contained in the internal structures of the driver, such as: - active command submissions - active user virtual memory mappings - number of allocated command buffers. It also enables the user to perform reads and writes through Goya's PCI bars. Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-18	habanalabs: add command submission module	Oded Gabbay	1	-0/+766
	This patch adds the main flow for the user to submit work to the device. Each unit of work is described by a command submission object (CS). The CS contains 3 arrays of command buffers: one for execution, and two for context-switch (store and restore). For each CB, the user specifies on which queue to put that CB. In the case of an internal queue, the entry doesn't contain a pointer to the CB but the address in on-chip memory where the CB resides. The driver parses some of the CBs to enforce security restrictions. The user receives a sequence number that represents the CS object. The user can then query the driver regarding the status of the CS, using that sequence number. If the CS doesn't finish before the timeout expires, the driver will perform a soft-reset of the device. Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
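A minimal sketch of the sequence-number handshake between submission and status query, assuming (for simplicity) that CSs complete in submission order so a single watermark suffices. The structure, status codes and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status values returned for a CS query. */
enum demo_cs_status { DEMO_CS_BUSY, DEMO_CS_COMPLETED };

struct demo_cs_ctx {
	uint64_t next_seq;       /* next sequence number to hand out */
	uint64_t completed_upto; /* every seq < completed_upto is done */
};

/* Submission returns the sequence number that identifies the CS. */
static uint64_t demo_cs_submit(struct demo_cs_ctx *ctx)
{
	return ctx->next_seq++;
}

/* Completion advances the watermark (in-order completion assumed). */
static void demo_cs_done(struct demo_cs_ctx *ctx, uint64_t seq)
{
	if (seq + 1 > ctx->completed_upto)
		ctx->completed_upto = seq + 1;
}

static enum demo_cs_status demo_cs_query(const struct demo_cs_ctx *ctx,
					 uint64_t seq)
{
	return seq < ctx->completed_upto ? DEMO_CS_COMPLETED : DEMO_CS_BUSY;
}
```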