openwrtv4/package/kernel/mac80211/patches/390-mac80211-fix-a-race-between-restart-and-CSA-flows.patch

From: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Date: Fri, 31 Aug 2018 11:31:06 +0300
Subject: [PATCH] mac80211: fix a race between restart and CSA flows

We hit a problem with iwlwifi that was caused by a bug in
mac80211. A bug in iwlwifi caused the firwmare to crash in
certain cases in channel switch. Because of that bug,
drv_pre_channel_switch would fail and trigger the restart
flow.
Now we had the hw restart worker which runs on the system's
workqueue and the csa_connection_drop_work worker that runs
on mac80211's workqueue that can run together. This is
obviously problematic since the restart work wants to
reconfigure the connection, while the csa_connection_drop_work
worker does the exact opposite: it tries to disconnect.

Fix this by cancelling the csa_connection_drop_work worker
in the restart worker.

Note that this can sound racy: we could have:

driver   iface_work   CSA_work   restart_work
+++++++++++++++++++++++++++++++++++++++++++++
              |
 <--drv_cs ---|
<FW CRASH!>
-CS FAILED-->
              |                       |
              |                 cancel_work(CSA)
           schedule                   |
           CSA work                   |
                         |            |
                        Race between those 2

But this is not possible because we flush the workqueue
in the restart worker before we cancel the CSA worker.
That would be bullet proof if we could guarantee that
we schedule the CSA worker only from the iface_work
which runs on the workqueue (and not on the system's
workqueue), but unfortunately we do have an instance
in which we schedule the CSA work outside the context
of the workqueue (ieee80211_chswitch_done).

Note also that we should probably cancel other workers
like beacon_connection_loss_work and possibly others
for different types of interfaces, at the very least,
IBSS should suffer from the exact same problem, but for
now, do the minimum to fix the actual bug that was actually
experienced and reproduced.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---

--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -255,8 +255,27 @@ static void ieee80211_restart_work(struc
 
 	flush_work(&local->radar_detected_work);
 	rtnl_lock();
-	list_for_each_entry(sdata, &local->interfaces, list)
+	list_for_each_entry(sdata, &local->interfaces, list) {
+		/*
+		 * XXX: there may be more work for other vif types and even
+		 * for station mode: a good thing would be to run most of
+		 * the iface type's dependent _stop (ieee80211_mg_stop,
+		 * ieee80211_ibss_stop) etc...
+		 * For now, fix only the specific bug that was seen: race
+		 * between csa_connection_drop_work and us.
+		 */
+		if (sdata->vif.type == NL80211_IFTYPE_STATION) {
+			/*
+			 * This worker is scheduled from the iface worker that
+			 * runs on mac80211's workqueue, so we can't be
+			 * scheduling this worker after the cancel right here.
+			 * The exception is ieee80211_chswitch_done.
+			 * Then we can have a race...
+			 */
+			cancel_work_sync(&sdata->u.mgd.csa_connection_drop_work);
+		}
 		flush_delayed_work(&sdata->dec_tailroom_needed_wk);
+	}
 	ieee80211_scan_cancel(local);
 
 	/* make sure any new ROC will consider local->in_reconfig */
mac80211: backport upstream fixes Backport most significant upstream fixes (excl. hwsim fixes) Refreshed all patches. Contains important fixes for CSA (Channel Switch Announcement) and A-MSDU frames. Signed-off-by: Koen Vandeputte <koen.vandeputte@ncentric.com> 2018-09-04 13:09:48 +00:00			`From: Emmanuel Grumbach <emmanuel.grumbach@intel.com>`
			`Date: Fri, 31 Aug 2018 11:31:06 +0300`
			`Subject: [PATCH] mac80211: fix a race between restart and CSA flows`

			`We hit a problem with iwlwifi that was caused by a bug in`
			`mac80211. A bug in iwlwifi caused the firwmare to crash in`
			`certain cases in channel switch. Because of that bug,`
			`drv_pre_channel_switch would fail and trigger the restart`
			`flow.`
			`Now we had the hw restart worker which runs on the system's`
			`workqueue and the csa_connection_drop_work worker that runs`
			`on mac80211's workqueue that can run together. This is`
			`obviously problematic since the restart work wants to`
			`reconfigure the connection, while the csa_connection_drop_work`
			`worker does the exact opposite: it tries to disconnect.`

			`Fix this by cancelling the csa_connection_drop_work worker`
			`in the restart worker.`

			`Note that this can sound racy: we could have:`

			`driver iface_work CSA_work restart_work`
			`+++++++++++++++++++++++++++++++++++++++++++++`
			`\|`
			`<--drv_cs ---\|`
			`<FW CRASH!>`
			`-CS FAILED-->`
			`\| \|`
			`\| cancel_work(CSA)`
			`schedule \|`
			`CSA work \|`
			`\| \|`
			`Race between those 2`

			`But this is not possible because we flush the workqueue`
			`in the restart worker before we cancel the CSA worker.`
			`That would be bullet proof if we could guarantee that`
			`we schedule the CSA worker only from the iface_work`
			`which runs on the workqueue (and not on the system's`
			`workqueue), but unfortunately we do have an instance`
			`in which we schedule the CSA work outside the context`
			`of the workqueue (ieee80211_chswitch_done).`

			`Note also that we should probably cancel other workers`
			`like beacon_connection_loss_work and possibly others`
			`for different types of interfaces, at the very least,`
			`IBSS should suffer from the exact same problem, but for`
			`now, do the minimum to fix the actual bug that was actually`
			`experienced and reproduced.`

			`Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>`
			`Signed-off-by: Luca Coelho <luciano.coelho@intel.com>`
			`Signed-off-by: Johannes Berg <johannes.berg@intel.com>`
			`---`

			`--- a/net/mac80211/main.c`
			`+++ b/net/mac80211/main.c`
			`@@ -255,8 +255,27 @@ static void ieee80211_restart_work(struc`

			`flush_work(&local->radar_detected_work);`
			`rtnl_lock();`
			`- list_for_each_entry(sdata, &local->interfaces, list)`
			`+ list_for_each_entry(sdata, &local->interfaces, list) {`
			`+ /*`
			`+ * XXX: there may be more work for other vif types and even`
			`+ * for station mode: a good thing would be to run most of`
			`+ * the iface type's dependent _stop (ieee80211_mg_stop,`
			`+ * ieee80211_ibss_stop) etc...`
			`+ * For now, fix only the specific bug that was seen: race`
			`+ * between csa_connection_drop_work and us.`
			`+ */`
			`+ if (sdata->vif.type == NL80211_IFTYPE_STATION) {`
			`+ /*`
			`+ * This worker is scheduled from the iface worker that`
			`+ * runs on mac80211's workqueue, so we can't be`
			`+ * scheduling this worker after the cancel right here.`
			`+ * The exception is ieee80211_chswitch_done.`
			`+ * Then we can have a race...`
			`+ */`
			`+ cancel_work_sync(&sdata->u.mgd.csa_connection_drop_work);`
			`+ }`
			`flush_delayed_work(&sdata->dec_tailroom_needed_wk);`
			`+ }`
			`ieee80211_scan_cancel(local);`

			`/* make sure any new ROC will consider local->in_reconfig */`