87 lines
3.3 KiB
Diff
87 lines
3.3 KiB
Diff
|
From: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
||
|
Date: Fri, 31 Aug 2018 11:31:06 +0300
|
||
|
Subject: [PATCH] mac80211: fix a race between restart and CSA flows
|
||
|
|
||
|
We hit a problem with iwlwifi that was caused by a bug in
|
||
|
mac80211. A bug in iwlwifi caused the firwmare to crash in
|
||
|
certain cases in channel switch. Because of that bug,
|
||
|
drv_pre_channel_switch would fail and trigger the restart
|
||
|
flow.
|
||
|
Now we had the hw restart worker which runs on the system's
|
||
|
workqueue and the csa_connection_drop_work worker that runs
|
||
|
on mac80211's workqueue that can run together. This is
|
||
|
obviously problematic since the restart work wants to
|
||
|
reconfigure the connection, while the csa_connection_drop_work
|
||
|
worker does the exact opposite: it tries to disconnect.
|
||
|
|
||
|
Fix this by cancelling the csa_connection_drop_work worker
|
||
|
in the restart worker.
|
||
|
|
||
|
Note that this can sound racy: we could have:
|
||
|
|
||
|
driver iface_work CSA_work restart_work
|
||
|
+++++++++++++++++++++++++++++++++++++++++++++
|
||
|
|
|
||
|
<--drv_cs ---|
|
||
|
<FW CRASH!>
|
||
|
-CS FAILED-->
|
||
|
| |
|
||
|
| cancel_work(CSA)
|
||
|
schedule |
|
||
|
CSA work |
|
||
|
| |
|
||
|
Race between those 2
|
||
|
|
||
|
But this is not possible because we flush the workqueue
|
||
|
in the restart worker before we cancel the CSA worker.
|
||
|
That would be bullet proof if we could guarantee that
|
||
|
we schedule the CSA worker only from the iface_work
|
||
|
which runs on the workqueue (and not on the system's
|
||
|
workqueue), but unfortunately we do have an instance
|
||
|
in which we schedule the CSA work outside the context
|
||
|
of the workqueue (ieee80211_chswitch_done).
|
||
|
|
||
|
Note also that we should probably cancel other workers
|
||
|
like beacon_connection_loss_work and possibly others
|
||
|
for different types of interfaces, at the very least,
|
||
|
IBSS should suffer from the exact same problem, but for
|
||
|
now, do the minimum to fix the actual bug that was actually
|
||
|
experienced and reproduced.
|
||
|
|
||
|
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
||
|
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
||
|
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
||
|
---
|
||
|
|
||
|
--- a/net/mac80211/main.c
|
||
|
+++ b/net/mac80211/main.c
|
||
|
@@ -255,8 +255,27 @@ static void ieee80211_restart_work(struc
|
||
|
|
||
|
flush_work(&local->radar_detected_work);
|
||
|
rtnl_lock();
|
||
|
- list_for_each_entry(sdata, &local->interfaces, list)
|
||
|
+ list_for_each_entry(sdata, &local->interfaces, list) {
|
||
|
+ /*
|
||
|
+ * XXX: there may be more work for other vif types and even
|
||
|
+ * for station mode: a good thing would be to run most of
|
||
|
+ * the iface type's dependent _stop (ieee80211_mg_stop,
|
||
|
+ * ieee80211_ibss_stop) etc...
|
||
|
+ * For now, fix only the specific bug that was seen: race
|
||
|
+ * between csa_connection_drop_work and us.
|
||
|
+ */
|
||
|
+ if (sdata->vif.type == NL80211_IFTYPE_STATION) {
|
||
|
+ /*
|
||
|
+ * This worker is scheduled from the iface worker that
|
||
|
+ * runs on mac80211's workqueue, so we can't be
|
||
|
+ * scheduling this worker after the cancel right here.
|
||
|
+ * The exception is ieee80211_chswitch_done.
|
||
|
+ * Then we can have a race...
|
||
|
+ */
|
||
|
+ cancel_work_sync(&sdata->u.mgd.csa_connection_drop_work);
|
||
|
+ }
|
||
|
flush_delayed_work(&sdata->dec_tailroom_needed_wk);
|
||
|
+ }
|
||
|
ieee80211_scan_cancel(local);
|
||
|
|
||
|
/* make sure any new ROC will consider local->in_reconfig */
|