| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
drm/vmwgfx: Fix a deadlock in dma buf fence polling
Introduce a version of the fence ops that on release doesn't remove
the fence from the pending list, and thus doesn't require a lock to
fix poll->fence wait->fence unref deadlocks.
vmwgfx overwrites the wait callback to iterate over the list of all
fences and update their status, to do that it holds a lock to prevent
the list modifcations from other threads. The fence destroy callback
both deletes the fence and removes it from the list of pending
fences, for which it holds a lock.
dma buf polling cb unrefs a fence after it's been signaled: so the poll
calls the wait, which signals the fences, which are being destroyed.
The destruction tries to acquire the lock on the pending fences list
which it can never get because it's held by the wait from which it
was called.
Old bug, but not a lot of userspace apps were using dma-buf polling
interfaces. Fix those, in particular this fixes KDE stalls/deadlock. |
| In the Linux kernel, the following vulnerability has been resolved:
md: fix deadlock between mddev_suspend and flush bio
Deadlock occurs when mddev is being suspended while some flush bio is in
progress. It is a complex issue.
T1. the first flush is at the ending stage, it clears 'mddev->flush_bio'
and tries to submit data, but is blocked because mddev is suspended
by T4.
T2. the second flush sets 'mddev->flush_bio', and attempts to queue
md_submit_flush_data(), which is already running (T1) and won't
execute again if on the same CPU as T1.
T3. the third flush inc active_io and tries to flush, but is blocked because
'mddev->flush_bio' is not NULL (set by T2).
T4. mddev_suspend() is called and waits for active_io dec to 0 which is inc
by T3.
T1 T2 T3 T4
(flush 1) (flush 2) (third 3) (suspend)
md_submit_flush_data
mddev->flush_bio = NULL;
.
. md_flush_request
. mddev->flush_bio = bio
. queue submit_flushes
. .
. . md_handle_request
. . active_io + 1
. . md_flush_request
. . wait !mddev->flush_bio
. .
. . mddev_suspend
. . wait !active_io
. .
. submit_flushes
. queue_work md_submit_flush_data
. //md_submit_flush_data is already running (T1)
.
md_handle_request
wait resume
The root issue is non-atomic inc/dec of active_io during flush process.
active_io is dec before md_submit_flush_data is queued, and inc soon
after md_submit_flush_data() run.
md_flush_request
active_io + 1
submit_flushes
active_io - 1
md_submit_flush_data
md_handle_request
active_io + 1
make_request
active_io - 1
If active_io is dec after md_handle_request() instead of within
submit_flushes(), make_request() can be called directly intead of
md_handle_request() in md_submit_flush_data(), and active_io will
only inc and dec once in the whole flush process. Deadlock will be
fixed.
Additionally, the only difference between fixing the issue and before is
that there is no return error handling of make_request(). But after
previous patch cleaned md_write_start(), make_requst() only return error
in raid5_make_request() by dm-raid, see commit 41425f96d7aa ("dm-raid456,
md/raid456: fix a deadlock for dm-raid456 while io concurrent with
reshape)". Since dm always splits data and flush operation into two
separate io, io size of flush submitted by dm always is 0, make_request()
will not be called in md_submit_flush_data(). To prevent future
modifications from introducing issues, add WARN_ON to ensure
make_request() no error is returned in this context. |
| In the Linux kernel, the following vulnerability has been resolved:
Revert "ALSA: firewire-lib: operate for period elapse event in process context"
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
With RME Fireface 800, this lead to a regression since
Kernels 5.14.0, causing an AB/BA deadlock competition for the
substream lock with eventual system freeze under ALSA operation:
thread 0:
* (lock A) acquire substream lock by
snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) wait for tasklet to finish by calling
tasklet_unlock_spin_wait() in
tasklet_disable_in_atomic() in
ohci_flush_iso_completions() of ohci.c
thread 1:
* (lock B) enter tasklet
* (lock A) attempt to acquire substream lock,
waiting for it to be released:
snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed() in
update_pcm_pointers() in
process_ctx_payloads() in
process_rx_packets() of amdtp-stream.c
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
Restore the process context work queue to prevent deadlock
AB/BA deadlock competition for ALSA substream lock of
snd_pcm_stream_lock_irq() in snd_pcm_status64()
and snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed().
revert commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period
elapse event in process context")
Replace inline description to prevent future deadlock. |
| In the Linux kernel, the following vulnerability has been resolved:
usb: dwc3: core: remove lock of otg mode during gadget suspend/resume to avoid deadlock
When config CONFIG_USB_DWC3_DUAL_ROLE is selected, and trigger system
to enter suspend status with below command:
echo mem > /sys/power/state
There will be a deadlock issue occurring. Detailed invoking path as
below:
dwc3_suspend_common()
spin_lock_irqsave(&dwc->lock, flags); <-- 1st
dwc3_gadget_suspend(dwc);
dwc3_gadget_soft_disconnect(dwc);
spin_lock_irqsave(&dwc->lock, flags); <-- 2nd
This issue is exposed by commit c7ebd8149ee5 ("usb: dwc3: gadget: Fix
NULL pointer dereference in dwc3_gadget_suspend") that removes the code
of checking whether dwc->gadget_driver is NULL or not. It causes the
following code is executed and deadlock occurs when trying to get the
spinlock. In fact, the root cause is the commit 5265397f9442("usb: dwc3:
Remove DWC3 locking during gadget suspend/resume") that forgot to remove
the lock of otg mode. So, remove the redundant lock of otg mode during
gadget suspend/resume. |
| In the Linux kernel, the following vulnerability has been resolved:
io_uring: fix possible deadlock in io_register_iowq_max_workers()
The io_register_iowq_max_workers() function calls io_put_sq_data(),
which acquires the sqd->lock without releasing the uring_lock.
Similar to the commit 009ad9f0c6ee ("io_uring: drop ctx->uring_lock
before acquiring sqd->lock"), this can lead to a potential deadlock
situation.
To resolve this issue, the uring_lock is released before calling
io_put_sq_data(), and then it is re-acquired after the function call.
This change ensures that the locks are acquired in the correct
order, preventing the possibility of a deadlock. |
| In the Linux kernel, the following vulnerability has been resolved:
net: ks8851: Fix deadlock with the SPI chip variant
When SMP is enabled and spinlocks are actually functional then there is
a deadlock with the 'statelock' spinlock between ks8851_start_xmit_spi
and ks8851_irq:
watchdog: BUG: soft lockup - CPU#0 stuck for 27s!
call trace:
queued_spin_lock_slowpath+0x100/0x284
do_raw_spin_lock+0x34/0x44
ks8851_start_xmit_spi+0x30/0xb8
ks8851_start_xmit+0x14/0x20
netdev_start_xmit+0x40/0x6c
dev_hard_start_xmit+0x6c/0xbc
sch_direct_xmit+0xa4/0x22c
__qdisc_run+0x138/0x3fc
qdisc_run+0x24/0x3c
net_tx_action+0xf8/0x130
handle_softirqs+0x1ac/0x1f0
__do_softirq+0x14/0x20
____do_softirq+0x10/0x1c
call_on_irq_stack+0x3c/0x58
do_softirq_own_stack+0x1c/0x28
__irq_exit_rcu+0x54/0x9c
irq_exit_rcu+0x10/0x1c
el1_interrupt+0x38/0x50
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x64/0x68
__netif_schedule+0x6c/0x80
netif_tx_wake_queue+0x38/0x48
ks8851_irq+0xb8/0x2c8
irq_thread_fn+0x2c/0x74
irq_thread+0x10c/0x1b0
kthread+0xc8/0xd8
ret_from_fork+0x10/0x20
This issue has not been identified earlier because tests were done on
a device with SMP disabled and so spinlocks were actually NOPs.
Now use spin_(un)lock_bh for TX queue related locking to avoid execution
of softirq work synchronously that would lead to a deadlock. |
| In the Linux kernel, the following vulnerability has been resolved:
wifi: mt76: mt7921s: fix potential hung tasks during chip recovery
During chip recovery (e.g. chip reset), there is a possible situation that
kernel worker reset_work is holding the lock and waiting for kernel thread
stat_worker to be parked, while stat_worker is waiting for the release of
the same lock.
It causes a deadlock resulting in the dumping of hung tasks messages and
possible rebooting of the device.
This patch prevents the execution of stat_worker during the chip recovery. |
| In the Linux kernel, the following vulnerability has been resolved:
net: fec: remove .ndo_poll_controller to avoid deadlocks
There is a deadlock issue found in sungem driver, please refer to the
commit ac0a230f719b ("eth: sungem: remove .ndo_poll_controller to avoid
deadlocks"). The root cause of the issue is that netpoll is in atomic
context and disable_irq() is called by .ndo_poll_controller interface
of sungem driver, however, disable_irq() might sleep. After analyzing
the implementation of fec_poll_controller(), the fec driver should have
the same issue. Due to the fec driver uses NAPI for TX completions, the
.ndo_poll_controller is unnecessary to be implemented in the fec driver,
so fec_poll_controller() can be safely removed. |
| In the Linux kernel, the following vulnerability has been resolved:
tty: xilinx_uartps: split sysrq handling
lockdep detects the following circular locking dependency:
CPU 0 CPU 1
========================== ============================
cdns_uart_isr() printk()
uart_port_lock(port) console_lock()
cdns_uart_console_write()
if (!port->sysrq)
uart_port_lock(port)
uart_handle_break()
port->sysrq = ...
uart_handle_sysrq_char()
printk()
console_lock()
The fixed commit attempts to avoid this situation by only taking the
port lock in cdns_uart_console_write if port->sysrq unset. However, if
(as shown above) cdns_uart_console_write runs before port->sysrq is set,
then it will try to take the port lock anyway. This may result in a
deadlock.
Fix this by splitting sysrq handling into two parts. We use the prepare
helper under the port lock and defer handling until we release the lock. |
| In the Linux kernel, the following vulnerability has been resolved:
ipv6: Fix soft lockups in fib6_select_path under high next hop churn
Soft lockups have been observed on a cluster of Linux-based edge routers
located in a highly dynamic environment. Using the `bird` service, these
routers continuously update BGP-advertised routes due to frequently
changing nexthop destinations, while also managing significant IPv6
traffic. The lockups occur during the traversal of the multipath
circular linked-list in the `fib6_select_path` function, particularly
while iterating through the siblings in the list. The issue typically
arises when the nodes of the linked list are unexpectedly deleted
concurrently on a different core—indicated by their 'next' and
'previous' elements pointing back to the node itself and their reference
count dropping to zero. This results in an infinite loop, leading to a
soft lockup that triggers a system panic via the watchdog timer.
Apply RCU primitives in the problematic code sections to resolve the
issue. Where necessary, update the references to fib6_siblings to
annotate or use the RCU APIs.
Include a test script that reproduces the issue. The script
periodically updates the routing table while generating a heavy load
of outgoing IPv6 traffic through multiple iperf3 clients. It
consistently induces infinite soft lockups within a couple of minutes.
Kernel log:
0 [ffffbd13003e8d30] machine_kexec at ffffffff8ceaf3eb
1 [ffffbd13003e8d90] __crash_kexec at ffffffff8d0120e3
2 [ffffbd13003e8e58] panic at ffffffff8cef65d4
3 [ffffbd13003e8ed8] watchdog_timer_fn at ffffffff8d05cb03
4 [ffffbd13003e8f08] __hrtimer_run_queues at ffffffff8cfec62f
5 [ffffbd13003e8f70] hrtimer_interrupt at ffffffff8cfed756
6 [ffffbd13003e8fd0] __sysvec_apic_timer_interrupt at ffffffff8cea01af
7 [ffffbd13003e8ff0] sysvec_apic_timer_interrupt at ffffffff8df1b83d
-- <IRQ stack> --
8 [ffffbd13003d3708] asm_sysvec_apic_timer_interrupt at ffffffff8e000ecb
[exception RIP: fib6_select_path+299]
RIP: ffffffff8ddafe7b RSP: ffffbd13003d37b8 RFLAGS: 00000287
RAX: ffff975850b43600 RBX: ffff975850b40200 RCX: 0000000000000000
RDX: 000000003fffffff RSI: 0000000051d383e4 RDI: ffff975850b43618
RBP: ffffbd13003d3800 R8: 0000000000000000 R9: ffff975850b40200
R10: 0000000000000000 R11: 0000000000000000 R12: ffffbd13003d3830
R13: ffff975850b436a8 R14: ffff975850b43600 R15: 0000000000000007
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
9 [ffffbd13003d3808] ip6_pol_route at ffffffff8ddb030c
10 [ffffbd13003d3888] ip6_pol_route_input at ffffffff8ddb068c
11 [ffffbd13003d3898] fib6_rule_lookup at ffffffff8ddf02b5
12 [ffffbd13003d3928] ip6_route_input at ffffffff8ddb0f47
13 [ffffbd13003d3a18] ip6_rcv_finish_core.constprop.0 at ffffffff8dd950d0
14 [ffffbd13003d3a30] ip6_list_rcv_finish.constprop.0 at ffffffff8dd96274
15 [ffffbd13003d3a98] ip6_sublist_rcv at ffffffff8dd96474
16 [ffffbd13003d3af8] ipv6_list_rcv at ffffffff8dd96615
17 [ffffbd13003d3b60] __netif_receive_skb_list_core at ffffffff8dc16fec
18 [ffffbd13003d3be0] netif_receive_skb_list_internal at ffffffff8dc176b3
19 [ffffbd13003d3c50] napi_gro_receive at ffffffff8dc565b9
20 [ffffbd13003d3c80] ice_receive_skb at ffffffffc087e4f5 [ice]
21 [ffffbd13003d3c90] ice_clean_rx_irq at ffffffffc0881b80 [ice]
22 [ffffbd13003d3d20] ice_napi_poll at ffffffffc088232f [ice]
23 [ffffbd13003d3d80] __napi_poll at ffffffff8dc18000
24 [ffffbd13003d3db8] net_rx_action at ffffffff8dc18581
25 [ffffbd13003d3e40] __do_softirq at ffffffff8df352e9
26 [ffffbd13003d3eb0] run_ksoftirqd at ffffffff8ceffe47
27 [ffffbd13003d3ec0] smpboot_thread_fn at ffffffff8cf36a30
28 [ffffbd13003d3ee8] kthread at ffffffff8cf2b39f
29 [ffffbd13003d3f28] ret_from_fork at ffffffff8ce5fa64
30 [ffffbd13003d3f50] ret_from_fork_asm at ffffffff8ce03cbb |
| In the Linux kernel, the following vulnerability has been resolved:
i3c: Use i3cdev->desc->info instead of calling i3c_device_get_info() to avoid deadlock
A deadlock may happen since the i3c_master_register() acquires
&i3cbus->lock twice. See the log below.
Use i3cdev->desc->info instead of calling i3c_device_info() to
avoid acquiring the lock twice.
v2:
- Modified the title and commit message
============================================
WARNING: possible recursive locking detected
6.11.0-mainline
--------------------------------------------
init/1 is trying to acquire lock:
f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_bus_normaluse_lock
but task is already holding lock:
f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&i3cbus->lock);
lock(&i3cbus->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by init/1:
#0: fcffff809b6798f8 (&dev->mutex){....}-{3:3}, at: __driver_attach
#1: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register
stack backtrace:
CPU: 6 UID: 0 PID: 1 Comm: init
Call trace:
dump_backtrace+0xfc/0x17c
show_stack+0x18/0x28
dump_stack_lvl+0x40/0xc0
dump_stack+0x18/0x24
print_deadlock_bug+0x388/0x390
__lock_acquire+0x18bc/0x32ec
lock_acquire+0x134/0x2b0
down_read+0x50/0x19c
i3c_bus_normaluse_lock+0x14/0x24
i3c_device_get_info+0x24/0x58
i3c_device_uevent+0x34/0xa4
dev_uevent+0x310/0x384
kobject_uevent_env+0x244/0x414
kobject_uevent+0x14/0x20
device_add+0x278/0x460
device_register+0x20/0x34
i3c_master_register_new_i3c_devs+0x78/0x154
i3c_master_register+0x6a0/0x6d4
mtk_i3c_master_probe+0x3b8/0x4d8
platform_probe+0xa0/0xe0
really_probe+0x114/0x454
__driver_probe_device+0xa0/0x15c
driver_probe_device+0x3c/0x1ac
__driver_attach+0xc4/0x1f0
bus_for_each_dev+0x104/0x160
driver_attach+0x24/0x34
bus_add_driver+0x14c/0x294
driver_register+0x68/0x104
__platform_driver_register+0x20/0x30
init_module+0x20/0xfe4
do_one_initcall+0x184/0x464
do_init_module+0x58/0x1ec
load_module+0xefc/0x10c8
__arm64_sys_finit_module+0x238/0x33c
invoke_syscall+0x58/0x10c
el0_svc_common+0xa8/0xdc
do_el0_svc+0x1c/0x28
el0_svc+0x50/0xac
el0t_64_sync_handler+0x70/0xbc
el0t_64_sync+0x1a8/0x1ac |
| In the Linux kernel, the following vulnerability has been resolved:
RDMA/hns: Fix deadlock on SRQ async events.
xa_lock for SRQ table may be required in AEQ. Use xa_store_irq()/
xa_erase_irq() to avoid deadlock. |
| In the Linux kernel, the following vulnerability has been resolved:
soc: qcom: pdr: Fix the potential deadlock
When some client process A call pdr_add_lookup() to add the look up for
the service and does schedule locator work, later a process B got a new
server packet indicating locator is up and call pdr_locator_new_server()
which eventually sets pdr->locator_init_complete to true which process A
sees and takes list lock and queries domain list but it will timeout due
to deadlock as the response will queued to the same qmi->wq and it is
ordered workqueue and process B is not able to complete new server
request work due to deadlock on list lock.
Fix it by removing the unnecessary list iteration as the list iteration
is already being done inside locator work, so avoid it here and just
call schedule_work() here.
Process A Process B
process_scheduled_works()
pdr_add_lookup() qmi_data_ready_work()
process_scheduled_works() pdr_locator_new_server()
pdr->locator_init_complete=true;
pdr_locator_work()
mutex_lock(&pdr->list_lock);
pdr_locate_service() mutex_lock(&pdr->list_lock);
pdr_get_domain_list()
pr_err("PDR: %s get domain list
txn wait failed: %d\n",
req->service_name,
ret);
Timeout error log due to deadlock:
"
PDR: tms/servreg get domain list txn wait failed: -110
PDR: service lookup for msm/adsp/sensor_pd:tms/servreg failed: -110
"
Thanks to Bjorn and Johan for letting me know that this commit also fixes
an audio regression when using the in-kernel pd-mapper as that makes it
easier to hit this race. [1] |
| In the Linux kernel, the following vulnerability has been resolved:
net: switchdev: Convert blocking notification chain to a raw one
A blocking notification chain uses a read-write semaphore to protect the
integrity of the chain. The semaphore is acquired for writing when
adding / removing notifiers to / from the chain and acquired for reading
when traversing the chain and informing notifiers about an event.
In case of the blocking switchdev notification chain, recursive
notifications are possible which leads to the semaphore being acquired
twice for reading and to lockdep warnings being generated [1].
Specifically, this can happen when the bridge driver processes a
SWITCHDEV_BRPORT_UNOFFLOADED event which causes it to emit notifications
about deferred events when calling switchdev_deferred_process().
Fix this by converting the notification chain to a raw notification
chain in a similar fashion to the netdev notification chain. Protect
the chain using the RTNL mutex by acquiring it when modifying the chain.
Events are always informed under the RTNL mutex, but add an assertion in
call_switchdev_blocking_notifiers() to make sure this is not violated in
the future.
Maintain the "blocking" prefix as events are always emitted from process
context and listeners are allowed to block.
[1]:
WARNING: possible recursive locking detected
6.14.0-rc4-custom-g079270089484 #1 Not tainted
--------------------------------------------
ip/52731 is trying to acquire lock:
ffffffff850918d8 ((switchdev_blocking_notif_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x58/0xa0
but task is already holding lock:
ffffffff850918d8 ((switchdev_blocking_notif_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x58/0xa0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock((switchdev_blocking_notif_chain).rwsem);
lock((switchdev_blocking_notif_chain).rwsem);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by ip/52731:
#0: ffffffff84f795b0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x727/0x1dc0
#1: ffffffff8731f628 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x790/0x1dc0
#2: ffffffff850918d8 ((switchdev_blocking_notif_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x58/0xa0
stack backtrace:
...
? __pfx_down_read+0x10/0x10
? __pfx_mark_lock+0x10/0x10
? __pfx_switchdev_port_attr_set_deferred+0x10/0x10
blocking_notifier_call_chain+0x58/0xa0
switchdev_port_attr_notify.constprop.0+0xb3/0x1b0
? __pfx_switchdev_port_attr_notify.constprop.0+0x10/0x10
? mark_held_locks+0x94/0xe0
? switchdev_deferred_process+0x11a/0x340
switchdev_port_attr_set_deferred+0x27/0xd0
switchdev_deferred_process+0x164/0x340
br_switchdev_port_unoffload+0xc8/0x100 [bridge]
br_switchdev_blocking_event+0x29f/0x580 [bridge]
notifier_call_chain+0xa2/0x440
blocking_notifier_call_chain+0x6e/0xa0
switchdev_bridge_port_unoffload+0xde/0x1a0
... |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: avoid holding freeze_mutex during mmap operation
We use map->freeze_mutex to prevent races between map_freeze() and
memory mapping BPF map contents with writable permissions. The way we
naively do this means we'll hold freeze_mutex for entire duration of all
the mm and VMA manipulations, which is completely unnecessary. This can
potentially also lead to deadlocks, as reported by syzbot in [0].
So, instead, hold freeze_mutex only during writeability checks, bump
(proactively) "write active" count for the map, unlock the mutex and
proceed with mmap logic. And only if something went wrong during mmap
logic, then undo that "write active" counter increment.
[0] https://lore.kernel.org/bpf/678dcbc9.050a0220.303755.0066.GAE@google.com/ |
| An attacker may cause chunk-size mismatches that block file transfers and prevent subsequent transfers. |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix deadlock when cloning inline extents and using qgroups
There are a few exceptional cases where cloning an inline extent needs to
copy the inline extent data into a page of the destination inode.
When this happens, we end up starting a transaction while having a dirty
page for the destination inode and while having the range locked in the
destination's inode iotree too. Because when reserving metadata space
for a transaction we may need to flush existing delalloc in case there is
not enough free space, we have a mechanism in place to prevent a deadlock,
which was introduced in commit 3d45f221ce627d ("btrfs: fix deadlock when
cloning inline extent and low on free metadata space").
However when using qgroups, a transaction also reserves metadata qgroup
space, which can also result in flushing delalloc in case there is not
enough available space at the moment. When this happens we deadlock, since
flushing delalloc requires locking the file range in the inode's iotree
and the range was already locked at the very beginning of the clone
operation, before attempting to start the transaction.
When this issue happens, stack traces like the following are reported:
[72747.556262] task:kworker/u81:9 state:D stack: 0 pid: 225 ppid: 2 flags:0x00004000
[72747.556268] Workqueue: writeback wb_workfn (flush-btrfs-1142)
[72747.556271] Call Trace:
[72747.556273] __schedule+0x296/0x760
[72747.556277] schedule+0x3c/0xa0
[72747.556279] io_schedule+0x12/0x40
[72747.556284] __lock_page+0x13c/0x280
[72747.556287] ? generic_file_readonly_mmap+0x70/0x70
[72747.556325] extent_write_cache_pages+0x22a/0x440 [btrfs]
[72747.556331] ? __set_page_dirty_nobuffers+0xe7/0x160
[72747.556358] ? set_extent_buffer_dirty+0x5e/0x80 [btrfs]
[72747.556362] ? update_group_capacity+0x25/0x210
[72747.556366] ? cpumask_next_and+0x1a/0x20
[72747.556391] extent_writepages+0x44/0xa0 [btrfs]
[72747.556394] do_writepages+0x41/0xd0
[72747.556398] __writeback_single_inode+0x39/0x2a0
[72747.556403] writeback_sb_inodes+0x1ea/0x440
[72747.556407] __writeback_inodes_wb+0x5f/0xc0
[72747.556410] wb_writeback+0x235/0x2b0
[72747.556414] ? get_nr_inodes+0x35/0x50
[72747.556417] wb_workfn+0x354/0x490
[72747.556420] ? newidle_balance+0x2c5/0x3e0
[72747.556424] process_one_work+0x1aa/0x340
[72747.556426] worker_thread+0x30/0x390
[72747.556429] ? create_worker+0x1a0/0x1a0
[72747.556432] kthread+0x116/0x130
[72747.556435] ? kthread_park+0x80/0x80
[72747.556438] ret_from_fork+0x1f/0x30
[72747.566958] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
[72747.566961] Call Trace:
[72747.566964] __schedule+0x296/0x760
[72747.566968] ? finish_wait+0x80/0x80
[72747.566970] schedule+0x3c/0xa0
[72747.566995] wait_extent_bit.constprop.68+0x13b/0x1c0 [btrfs]
[72747.566999] ? finish_wait+0x80/0x80
[72747.567024] lock_extent_bits+0x37/0x90 [btrfs]
[72747.567047] btrfs_invalidatepage+0x299/0x2c0 [btrfs]
[72747.567051] ? find_get_pages_range_tag+0x2cd/0x380
[72747.567076] __extent_writepage+0x203/0x320 [btrfs]
[72747.567102] extent_write_cache_pages+0x2bb/0x440 [btrfs]
[72747.567106] ? update_load_avg+0x7e/0x5f0
[72747.567109] ? enqueue_entity+0xf4/0x6f0
[72747.567134] extent_writepages+0x44/0xa0 [btrfs]
[72747.567137] ? enqueue_task_fair+0x93/0x6f0
[72747.567140] do_writepages+0x41/0xd0
[72747.567144] __filemap_fdatawrite_range+0xc7/0x100
[72747.567167] btrfs_run_delalloc_work+0x17/0x40 [btrfs]
[72747.567195] btrfs_work_helper+0xc2/0x300 [btrfs]
[72747.567200] process_one_work+0x1aa/0x340
[72747.567202] worker_thread+0x30/0x390
[72747.567205] ? create_worker+0x1a0/0x1a0
[72747.567208] kthread+0x116/0x130
[72747.567211] ? kthread_park+0x80/0x80
[72747.567214] ret_from_fork+0x1f/0x30
[72747.569686] task:fsstress state:D stack:
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
Currently, zswap_cpu_comp_dead() calls crypto_free_acomp() while holding
the per-CPU acomp_ctx mutex. crypto_free_acomp() then holds scomp_lock
(through crypto_exit_scomp_ops_async()).
On the other hand, crypto_alloc_acomp_node() holds the scomp_lock (through
crypto_scomp_init_tfm()), and then allocates memory. If the allocation
results in reclaim, we may attempt to hold the per-CPU acomp_ctx mutex.
The above dependencies can cause an ABBA deadlock. For example in the
following scenario:
(1) Task A running on CPU #1:
crypto_alloc_acomp_node()
Holds scomp_lock
Enters reclaim
Reads per_cpu_ptr(pool->acomp_ctx, 1)
(2) Task A is descheduled
(3) CPU #1 goes offline
zswap_cpu_comp_dead(CPU #1)
Holds per_cpu_ptr(pool->acomp_ctx, 1))
Calls crypto_free_acomp()
Waits for scomp_lock
(4) Task A running on CPU #2:
Waits for per_cpu_ptr(pool->acomp_ctx, 1) // Read on CPU #1
DEADLOCK
Since there is no requirement to call crypto_free_acomp() with the per-CPU
acomp_ctx mutex held in zswap_cpu_comp_dead(), move it after the mutex is
unlocked. Also move the acomp_request_free() and kfree() calls for
consistency and to avoid any potential sublte locking dependencies in the
future.
With this, only setting acomp_ctx fields to NULL occurs with the mutex
held. This is similar to how zswap_cpu_comp_prepare() only initializes
acomp_ctx fields with the mutex held, after performing all allocations
before holding the mutex.
Opportunistically, move the NULL check on acomp_ctx so that it takes place
before the mutex dereference. |
| In the Linux kernel, the following vulnerability has been resolved:
workqueue: Do not warn when cancelling WQ_MEM_RECLAIM work from !WQ_MEM_RECLAIM worker
After commit
746ae46c1113 ("drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM")
amdgpu started seeing the following warning:
[ ] workqueue: WQ_MEM_RECLAIM sdma0:drm_sched_run_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM events:amdgpu_device_delay_enable_gfx_off [amdgpu]
...
[ ] Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
...
[ ] Call Trace:
[ ] <TASK>
...
[ ] ? check_flush_dependency+0xf5/0x110
...
[ ] cancel_delayed_work_sync+0x6e/0x80
[ ] amdgpu_gfx_off_ctrl+0xab/0x140 [amdgpu]
[ ] amdgpu_ring_alloc+0x40/0x50 [amdgpu]
[ ] amdgpu_ib_schedule+0xf4/0x810 [amdgpu]
[ ] ? drm_sched_run_job_work+0x22c/0x430 [gpu_sched]
[ ] amdgpu_job_run+0xaa/0x1f0 [amdgpu]
[ ] drm_sched_run_job_work+0x257/0x430 [gpu_sched]
[ ] process_one_work+0x217/0x720
...
[ ] </TASK>
The intent of the verifcation done in check_flush_depedency is to ensure
forward progress during memory reclaim, by flagging cases when either a
memory reclaim process, or a memory reclaim work item is flushed from a
context not marked as memory reclaim safe.
This is correct when flushing, but when called from the
cancel(_delayed)_work_sync() paths it is a false positive because work is
either already running, or will not be running at all. Therefore
cancelling it is safe and we can relax the warning criteria by letting the
helper know of the calling context.
References: 746ae46c1113 ("drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM") |
| In the Linux kernel, the following vulnerability has been resolved:
drm/xe: Only use reserved BCS instances for usm migrate exec queue
The GuC context scheduling queue is 2 entires deep, thus it is possible
for a migration job to be stuck behind a fault if migration exec queue
shares engines with user jobs. This can deadlock as the migrate exec
queue is required to service page faults. Avoid deadlock by only using
reserved BCS instances for usm migrate exec queue.
(cherry picked from commit 04f4a70a183a688a60fe3882d6e4236ea02cfc67) |