From 9d5fcd6249b8b67687e8e5e731a507592966a40e Mon Sep 17 00:00:00 2001 From: Taylor R Campbell Date: Mon, 22 Jan 2024 22:24:20 +0000 Subject: [PATCH] ld@sdmmc(4): Hack around deadlock in cache sync on detach. Yanking a card triggers the sdmmc discovery task, which runs in the sdmmc task thread, to detach any attached child devices. Detaching ld@sdmmc triggers a cache flush (via ldbegindetach -> disk_begindetach -> ld_lastclose -> ld_flush -> ioctl DIOCCACHESYNC), which is implemented by scheduling a task to do sdmmc_mem_flush_cache and then waiting for it to complete. The sdmmc_mem_cache_flush is done by an sdmmc task so it happens after all previously scheduled I/O operations -- that way the cache flush doesn't complete until the previously scheduled I/O operations are complete. However, when the cache flush task is issued from the discovery task, this doesn't work, because the cache flush task can't start until the discovery task has returned -- but the discovery task won't return until the cache flush task has completed. To work around this deadlock, which usually happens only when the device has been yanked anyway so further I/O would be lost anyway, just do the cache flush synchronously in DIOCCACHESYNC if we're running in the task thread. This isn't quite right -- implementation details of the task thread shouldn't bleed into ld@sdmmc, and running the cache sync _before_ any subsequently scheduled I/O tasks is asking for trouble -- but it should serve to avoid the deadlock in PR kern/57870 until we can fix a host of concurrency bugs in sdmmc by fixing the locking scheme and running discovery in a separate thread from tasks. XXX pullup-10 --- sys/dev/sdmmc/ld_sdmmc.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/sys/dev/sdmmc/ld_sdmmc.c b/sys/dev/sdmmc/ld_sdmmc.c index 0c310c3fe21b..9a527ad98db5 100644 --- a/sys/dev/sdmmc/ld_sdmmc.c +++ b/sys/dev/sdmmc/ld_sdmmc.c @@ -599,9 +599,24 @@ static int ld_sdmmc_cachesync(struct ld_softc *ld, bool poll) { struct ld_sdmmc_softc *sc = device_private(ld->sc_dv); + struct sdmmc_softc *sdmmc = device_private(device_parent(ld->sc_dv)); struct ld_sdmmc_task *task; int error = -1; + /* + * If we come here through the sdmmc discovery task, we can't + * wait for a new task because the new task can't even begin + * until the sdmmc discovery task has completed. + * + * XXX This is wrong, because there may already be queued I/O + * tasks ahead of us. Fixing this properly requires doing + * discovery in a separate thread. But this should avoid the + * deadlock of PR kern/57870 (https://gnats.NetBSD.org/57870) + * until we do split that up. + */ + if (curlwp == sdmmc->sc_tskq_lwp) + return sdmmc_mem_flush_cache(sc->sc_sf, poll); + mutex_enter(&sc->sc_lock); /* Acquire a free task, or fail with EBUSY. */