Vaibhav Gupta [Wed, 24 Jun 2020 17:51:17 +0000 (23:21 +0530)]
bnx2x: use generic power management
With legacy PM, drivers themselves were responsible for managing the
device's power states and taking care of register states.
After upgrading to the generic structure, PCI core will take care of
required tasks and drivers should do only device-specific operations.
The driver was also calling bnx2x_set_power_state() to set the power state
of the device by changing the device's registers' value. That is no longer
needed.
Compile-tested only.
Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 26 Jun 2020 02:29:51 +0000 (19:29 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Minor overlapping changes in xfrm_device.c, between the double
ESP trailing bug fix setting the XFRM_INIT flag and the changes
in net-next preparing for bonding encryption support.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 26 Jun 2020 01:27:40 +0000 (18:27 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Don't insert ESP trailer twice in IPSEC code, from Huy Nguyen.
2) The default crypto algorithm selection in Kconfig for IPSEC is out
of touch with modern reality, fix this up. From Eric Biggers.
3) bpftool is missing an entry for BPF_MAP_TYPE_RINGBUF, from Andrii
Nakryiko.
4) Missing init of ->frame_sz in xdp_convert_zc_to_xdp_frame(), from
Hangbin Liu.
5) Adjust packet alignment handling in ax88179_178a driver to match
what the hardware actually does. From Jeremy Kerr.
6) register_netdevice can leak in case one of the notifiers fails,
from Yang Yingliang.
7) Use after free in ip_tunnel_lookup(), from Taehee Yoo.
8) VLAN checks in sja1105 DSA driver need adjustments, from Vladimir
Oltean.
9) tg3 driver can sleep forever when we get enough EEH errors, fix from
David Christensen.
10) Missing {READ,WRITE}_ONCE() annotations in various Intel ethernet
drivers, from Ciara Loftus.
11) Fix scanning loop break condition in of_mdiobus_register(), from
Florian Fainelli.
12) MTU limit is incorrect in ibmveth driver, from Thomas Falcon.
13) Endianness fix in mlxsw, from Ido Schimmel.
14) Use after free in smsc95xx usbnet driver, from Tuomas Tynkkynen.
15) Missing bridge mrp configuration validation, from Horatiu Vultur.
16) Fix circular netns references in wireguard, from Jason A. Donenfeld.
17) PTP initialization on recovery is not done properly in qed driver,
from Alexander Lobakin.
18) Endian conversion of L4 ports in filters of cxgb4 driver is wrong,
from Rahul Lakkireddy.
19) Don't clear bound device TX queue of socket prematurely otherwise we
get problems with ktls hw offloading, from Tariq Toukan.
20) ipset can do atomics on unaligned memory, fix from Russell King.
21) Align ethernet addresses properly in bridging code, from Thomas
Martitz.
22) Don't advertise ipv4 addresses on SCTP sockets having ipv6only set,
from Marcelo Ricardo Leitner.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (149 commits)
rds: transport module should be auto loaded when transport is set
sch_cake: fix a few style nits
sch_cake: don't call diffserv parsing code when it is not needed
sch_cake: don't try to reallocate or unshare skb unconditionally
ethtool: fix error handling in linkstate_prepare_data()
wil6210: account for napi_gro_receive never returning GRO_DROP
hns: do not cast return value of napi_gro_receive to null
socionext: account for napi_gro_receive never returning GRO_DROP
wireguard: receive: account for napi_gro_receive never returning GRO_DROP
vxlan: fix last fdb index during dump of fdb with nhid
sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket
tc-testing: avoid action cookies with odd length.
bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
net: dsa: sja1105: fix tc-gate schedule with single element
net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules
net: dsa: sja1105: unconditionally free old gating config
net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top
net: macb: free resources on failure path of at91ether_open()
net: macb: call pm_runtime_put_sync on failure path
...
Kevin Darbyshire-Bryant [Thu, 25 Jun 2020 20:18:00 +0000 (22:18 +0200)]
sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling
Change tin mapping on diffserv3, 4 & 8 for LE PHB support, in essence
making LE a member of the Bulk tin.
Bulk has the least priority and minimum of 1/16th total bandwidth in the
face of higher priority traffic.
NB: Diffserv 3 & 4 swap tin 0 & 1 priorities from the default order as
found in diffserv8, in case anyone is wondering why it looks a bit odd.
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
[ reword commit message slightly ]
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rao Shoaib [Thu, 25 Jun 2020 20:46:00 +0000 (13:46 -0700)]
rds: transport module should be auto loaded when transport is set
This enhancement auto loads transport module when the transport
is set via SO_RDS_TRANSPORT socket option.
Reviewed-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 Jun 2020 23:24:05 +0000 (16:24 -0700)]
Merge branch 'sched-A-couple-of-fixes-for-sch_cake'
Toke Høiland-Jørgensen says:
====================
sched: A couple of fixes for sch_cake
This series contains a couple of fixes for diffserv handling in sch_cake that
provide a nice speedup (with a somewhat pedantic nit fix tacked on to the end).
Not quite sure about whether this should go to stable; it does provide a nice
speedup, but it's not strictly a fix in the "correctness" sense. I lean towards
including this in stable as well, since our most important consumer of that
(OpenWrt) is likely to backport the series anyway.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Toke Høiland-Jørgensen [Thu, 25 Jun 2020 20:12:09 +0000 (22:12 +0200)]
sch_cake: fix a few style nits
I spotted a few nits when comparing the in-tree version of sch_cake with
the out-of-tree one: A redundant error variable declaration shadowing an
outer declaration, and an indentation alignment issue. Fix both of these.
Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toke Høiland-Jørgensen [Thu, 25 Jun 2020 20:12:08 +0000 (22:12 +0200)]
sch_cake: don't call diffserv parsing code when it is not needed
As a further optimisation of the diffserv parsing codepath, we can skip it
entirely if CAKE is configured to neither use diffserv-based
classification, nor to zero out the diffserv bits.
Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Ponetayev [Thu, 25 Jun 2020 20:12:07 +0000 (22:12 +0200)]
sch_cake: don't try to reallocate or unshare skb unconditionally
cake_handle_diffserv() tries to linearize the mac and network header parts of
the skb and to make it writable unconditionally. In some cases this leads to a
full skb reallocation, which reduces throughput and increases CPU load. Some
measurements of IPv4 forward + NAPT on a MIPS router with a 580 MHz single-core
CPU were conducted. It appears that on kernel 4.9 skb_try_make_writable()
reallocates the skb if it was allocated in the ethernet driver via the
so-called 'build skb' method from page cache (this was first discovered
through a strange increase of the kmalloc-2048 slab).
Obtain DSCP value via read-only skb_header_pointer() call, and leave
linearization only for DSCP bleaching or ECN CE setting. And, as an
additional optimisation, skip diffserv parsing entirely if it is not needed
by the current configuration.
Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
[ fix a few style issues, reflow commit message ]
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 Jun 2020 23:22:11 +0000 (16:22 -0700)]
Merge branch 'net-phy-mscc-multiple-improvements'
Antoine Tenart says:
====================
net: phy: mscc: multiple improvements
This series contains various improvements to the MSCC PHY driver, fixing
sparse and smatch warnings, using functions provided by the PHY core,
and improving the driver consistency and maintenance.
I don't think any of those improvements and fixes is worth backporting
to stable trees.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Thu, 25 Jun 2020 15:42:11 +0000 (17:42 +0200)]
net: phy: mscc: improve vsc8514/8584_config_init consistency
All PHY read and write return values are checked for errors in
vsc8514_config_init and vsc8584_config_init, except for one. Fix this.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Thu, 25 Jun 2020 15:42:10 +0000 (17:42 +0200)]
net: phy: mscc: remove useless page configuration in the config init
In the middle of vsc8584_config_init and vsc8514_config_init, the page
is set to 'standard'. This is the default value, and the page isn't set
to another value before. Those page configurations can safely be
removed.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Thu, 25 Jun 2020 15:42:09 +0000 (17:42 +0200)]
net: phy: mscc: restore the base page in vsc8514/8584_config_init
In the vsc8584_config_init and vsc8514_config_init, the base page is set
to 'GPIO', configuration is done, and the page is never explicitly
restored to the standard page. No bug was triggered as it turns out
helpers called in those config_init functions do modify the base page,
and set it back to standard. But that is dangerous and any modification
to those functions would introduce bugs. This patch fixes this, to
improve maintenance, by restoring the base page to 'standard' once
'GPIO' accesses are completed.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Thu, 25 Jun 2020 15:42:08 +0000 (17:42 +0200)]
net: phy: mscc: do not access the MDIO bus lock directly
This patch improves the MSCC driver by using the provided
phy_lock_mdio_bus and phy_unlock_mdio_bus helpers instead of locking and
unlocking the MDIO bus lock directly. The patch is only cosmetic but
should improve maintenance and consistency.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Thu, 25 Jun 2020 15:42:07 +0000 (17:42 +0200)]
net: phy: mscc: ptp: fix a typo in a comment
This patch fixes a typo in a comment, s/Ths/This/. The patch is cosmetic
only.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Thu, 25 Jun 2020 15:42:06 +0000 (17:42 +0200)]
net: phy: mscc: ptp: fix a smatch error
The following error was reported by smatch:
vsc85xx_ts_read_csr() error: uninitialized symbol 'blk_hw'.
In practice this is very unlikely, as all the block identifiers given to
this function are handled and described in an enum. The smatch error is
fixed by doing what is already done in vsc85xx_ts_write_csr: using the
"PROCESSOR" block by default.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Thu, 25 Jun 2020 15:42:05 +0000 (17:42 +0200)]
net: phy: mscc: fix a possible double unlock
On vsc8584_ptp_init failure we jump to the 'err' label, which unlocks
the MDIO bus lock. But vsc8584_ptp_init isn't called with the MDIO bus
lock taken, which could result in a double unlock. Fix this.
Fixes: ab2bf9339357 ("net: phy: mscc: 1588 block initialization")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Thu, 25 Jun 2020 15:42:04 +0000 (17:42 +0200)]
net: phy: mscc: macsec: fix sparse warnings
This patch fixes the following sparse warnings when building MACsec
support in the MSCC PHY driver.
mscc_macsec.c:393:42: warning: cast from restricted sci_t
mscc_macsec.c:395:42: warning: restricted sci_t degrades to integer
mscc_macsec.c:402:42: warning: restricted __be16 degrades to integer
mscc_macsec.c:608:34: warning: cast from restricted sci_t
mscc_macsec.c:610:34: warning: restricted sci_t degrades to integer
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubecek [Wed, 24 Jun 2020 22:09:08 +0000 (00:09 +0200)]
ethtool: fix error handling in linkstate_prepare_data()
When getting SQI or maximum SQI value fails in linkstate_prepare_data(), we
must not return without calling ethnl_ops_complete(dev) as that could
result in imbalance between ethtool_ops ->begin() and ->complete() calls.
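The balance requirement described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the kernel code: FakeOps, prepare_data, get_sqi and get_sqi_max are hypothetical names, and negative integers stand in for kernel error codes. The point it shows is that every early error return still passes through the complete() call.

```python
import errno


class FakeOps:
    """Hypothetical stand-in for a device's begin()/complete() pair."""

    def __init__(self):
        self.depth = 0  # number of begin() calls not yet matched by complete()

    def begin(self):
        self.depth += 1

    def complete(self):
        self.depth -= 1


def prepare_data(ops, get_sqi, get_sqi_max, data):
    """Fill `data` with SQI values; return 0 or a negative error code.

    Both getters are hypothetical callables returning a non-negative
    value on success and a negative errno-style value on failure.
    """
    ops.begin()
    try:
        sqi = get_sqi()
        if sqi < 0:
            return sqi  # error path: finally still runs complete()
        sqi_max = get_sqi_max()
        if sqi_max < 0:
            return sqi_max  # error path: finally still runs complete()
        data["sqi"] = sqi
        data["sqi_max"] = sqi_max
        return 0
    finally:
        ops.complete()
```

In C the same effect is usually obtained with a single exit label that every error path jumps to; the try/finally here is merely the closest Python idiom.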
Fixes: 806602191592 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 25 Jun 2020 23:16:49 +0000 (16:16 -0700)]
Merge tag 'trace-v5.8-rc2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Four small fixes:
- Fix a ringbuffer bug for nested events having time go backwards
- Fix a config dependency for boot time tracing to depend on
synthetic events instead of histograms.
- Fix trigger format parsing to handle multiple spaces
- Fix bootconfig to handle failures in multiple events"
* tag 'trace-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/boottime: Fix kprobe multiple events
tracing: Fix event trigger to accept redundant spaces
tracing/boot: Fix config dependency for synthedic event
ring-buffer: Zero out time extend if it is nested and not absolute
David S. Miller [Thu, 25 Jun 2020 23:16:21 +0000 (16:16 -0700)]
Merge branch 'napi_gro_receive-caller-return-value-cleanups'
Jason A. Donenfeld says:
====================
napi_gro_receive caller return value cleanups
In 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in
napi_gro_receive()"), the GRO_NORMAL case stopped calling
netif_receive_skb_internal, checking its return value, and returning
GRO_DROP in case it failed. Instead, it calls into
netif_receive_skb_list_internal (after a bit of indirection), which
doesn't return any error. Therefore, napi_gro_receive will never return
GRO_DROP, making handling GRO_DROP dead code.
I emailed the author of 6570bc79c0df on netdev [1] to see if this change
was intentional, but the dlink.ru email address has been disconnected,
and looking a bit further myself, it seems somewhat infeasible to start
propagating return values backwards from the internal machinations of
netif_receive_skb_list_internal.
Taking a look at all the callers of napi_gro_receive, it appears that
three are checking the return value for the purpose of comparing it to
the now never-happening GRO_DROP, and one just casts it to (void), a
likely historical leftover. Every other of the 120 callers does not
bother checking the return value.
And it seems like these remaining 116 callers are doing the right thing:
after calling napi_gro_receive, the packet is now in the hands of the
upper layers of the networking stack, and the device driver itself has no
business now making decisions based on what the upper layers choose to
do. Incrementing stats counters on GRO_DROP seems like a mistake, made
by these three drivers, but not by the remaining 117.
It would seem, therefore, that after rectifying these four callers of
napi_gro_receive, I should go ahead and just remove the return
value from napi_gro_receive altogether. However, napi_gro_receive has
a function event tracer, and being able to introspect into the
networking stack to see how often napi_gro_receive is returning whatever
interesting GRO status (aside from _DROP) remains an interesting
data point worth keeping for debugging.
So, this series simply gets rid of the return value checking for the
four useless places where that check never evaluates to anything
meaningful.
[1] https://lore.kernel.org/netdev/20200624210606.GA1362687@zx2c4.com/
====================
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Wed, 24 Jun 2020 22:06:06 +0000 (16:06 -0600)]
wil6210: account for napi_gro_receive never returning GRO_DROP
The napi_gro_receive function no longer returns GRO_DROP ever, making
handling GRO_DROP dead code. This commit removes that dead code.
Further, it's not even clear that device drivers have any business in
taking action after passing off received packets; that's arguably out of
their hands. In this case, too, the non-gro path didn't bother checking
the return value. Plus, this had some clunky debugging functions that
duplicated code from elsewhere and was generally pretty messy. So, this
commit cleans that all up too.
Fixes: 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Wed, 24 Jun 2020 22:06:05 +0000 (16:06 -0600)]
hns: do not cast return value of napi_gro_receive to null
Basically no drivers care about the return value here, and there's no
__must_check that would make casting to void sensible, so remove it.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Wed, 24 Jun 2020 22:06:04 +0000 (16:06 -0600)]
socionext: account for napi_gro_receive never returning GRO_DROP
The napi_gro_receive function no longer returns GRO_DROP ever, making
handling GRO_DROP dead code. This commit removes that dead code.
Further, it's not even clear that device drivers have any business in
taking action after passing off received packets; that's arguably out of
their hands.
Fixes: 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Wed, 24 Jun 2020 22:06:03 +0000 (16:06 -0600)]
wireguard: receive: account for napi_gro_receive never returning GRO_DROP
The napi_gro_receive function no longer returns GRO_DROP ever, making
handling GRO_DROP dead code. This commit removes that dead code.
Further, it's not even clear that device drivers have any business in
taking action after passing off received packets; that's arguably out of
their hands.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Fixes: 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 24 Jun 2020 21:02:36 +0000 (14:02 -0700)]
vxlan: fix last fdb index during dump of fdb with nhid
This patch fixes last saved fdb index in fdb dump handler when
handling fdb's with nhid.
Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Wed, 24 Jun 2020 20:34:18 +0000 (17:34 -0300)]
sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket
If a socket is set ipv6only, it will still send IPv4 addresses in the
INIT and INIT_ACK packets. This potentially misleads the peer into using
them, which then would cause association termination.
The fix is to not add IPv4 addresses to ipv6only sockets.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Briana Oursler [Wed, 24 Jun 2020 19:29:14 +0000 (12:29 -0700)]
tc-testing: avoid action cookies with odd length.
Update odd length cookie hexstrings in csum.json, tunnel_key.json and
bpf.json to be even length to comply with check enforced in commit
0149dabf2a1b ("tc: m_actions: check cookie hexstring len") in iproute2.
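As a rough illustration of why odd-length cookies are rejected: a hexstring encodes one byte per two characters, so an odd number of characters leaves a dangling half byte. The Python sketch below is not the iproute2 check itself; cookie_is_valid is a hypothetical helper, and rejecting empty strings and non-hex characters is an assumption added for completeness.

```python
import string


def cookie_is_valid(hexstr):
    """Return True if hexstr could encode a whole number of bytes.

    Hypothetical helper: the even-length rule comes from the commit
    message; the non-empty and hex-digit rules are assumptions.
    """
    if not hexstr or len(hexstr) % 2 != 0:
        return False
    return all(ch in string.hexdigits for ch in hexstr)
```

For instance, a cookie such as "a1b" would fail this check while "a1b2" would pass.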
Signed-off-by: Briana Oursler <briana.oursler@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 Jun 2020 23:08:47 +0000 (16:08 -0700)]
Merge branch 'tcp_cubic-fix-spurious-HYSTART_DELAY-on-RTT-decrease'
Neal Cardwell says:
====================
tcp_cubic: fix spurious HYSTART_DELAY on RTT decrease
This series fixes a long-standing bug in the TCP CUBIC
HYSTART_DELAY mechanism recently reported by Mirja Kuehlewind. The
code can cause a spurious exit of slow start in some particular
cases: upon an RTT decrease that happens on the 9th or later ACK
in a round trip. This series fixes the original Hystart code and
also the recent BPF implementation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Wed, 24 Jun 2020 16:42:03 +0000 (12:42 -0400)]
bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
Apply the fix from:
"tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT"
to the BPF implementation of TCP CUBIC congestion control.
Repeating the commit description here for completeness:
Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where the
Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an
ACK when the minimum rtt of a connection goes down. From inspection it
is clear from the existing code that this could happen in an example
like the following:
o The first 8 RTT samples in a round trip are 150ms, resulting in a
curr_rtt of 150ms and a delay_min of 150ms.
o The 9th RTT sample is 100ms. The curr_rtt does not change after the
first 8 samples, so curr_rtt remains 150ms. But delay_min can be
lowered at any time, so delay_min falls to 100ms. The code executes
the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min
of 100ms, and the curr_rtt is declared far enough above delay_min to
force a (spurious) exit of Slow start.
The fix here is simple: allow every RTT sample in a round trip to
lower the curr_rtt.
Fixes: 6de4a9c430b5 ("bpf: tcp: Add bpf_cubic example")
Reported-by: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Wed, 24 Jun 2020 16:42:02 +0000 (12:42 -0400)]
tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where the
Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an
ACK when the minimum rtt of a connection goes down. From inspection it
is clear from the existing code that this could happen in an example
like the following:
o The first 8 RTT samples in a round trip are 150ms, resulting in a
curr_rtt of 150ms and a delay_min of 150ms.
o The 9th RTT sample is 100ms. The curr_rtt does not change after the
first 8 samples, so curr_rtt remains 150ms. But delay_min can be
lowered at any time, so delay_min falls to 100ms. The code executes
the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min
of 100ms, and the curr_rtt is declared far enough above delay_min to
force a (spurious) exit of Slow start.
The fix here is simple: allow every RTT sample in a round trip to
lower the curr_rtt.
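The scenario above can be reproduced with a small Python model. It is a simplified sketch rather than the kernel code: delay_check_exits is a hypothetical function, the 16ms threshold used in the usage note is an assumed value (the message only says "far enough above"), and only the delay comparison within a single round trip is modelled.

```python
MIN_SAMPLES = 8  # curr_rtt is frozen after this many samples in the old logic


def delay_check_exits(rtt_samples_ms, thresh_ms, fixed):
    """Return True if the modelled delay check would exit slow start.

    With fixed=False, curr_rtt only tracks the first MIN_SAMPLES samples.
    With fixed=True, every sample in the round may lower curr_rtt.
    thresh_ms is an assumed margin, not a value taken from the commit.
    """
    delay_min = float("inf")
    curr_rtt = float("inf")
    sample_cnt = 0
    for rtt in rtt_samples_ms:
        delay_min = min(delay_min, rtt)  # can be lowered at any time
        if fixed or sample_cnt < MIN_SAMPLES:
            curr_rtt = min(curr_rtt, rtt)
        if sample_cnt < MIN_SAMPLES:
            sample_cnt += 1
        elif curr_rtt > delay_min + thresh_ms:
            return True
    return False
```

Feeding it eight 150ms samples followed by one 100ms sample, with an assumed 16ms threshold, the old logic reports an exit (150 > 100 + 16) while the fixed logic does not (100 is not above 116).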
Fixes: ae27e98a5152 ("[TCP] CUBIC v2.3")
Reported-by: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 Jun 2020 23:06:56 +0000 (16:06 -0700)]
Merge branch 'Fixes-for-SJA1105-DSA-tc-gate-action'
Vladimir Oltean says:
====================
Fixes for SJA1105 DSA tc-gate action
This small series fixes 2 bugs in the tc-gate implementation:
1. The TAS state machine keeps getting rescheduled even after removing
tc-gate actions on all ports.
2. tc-gate actions with only one gate control list entry are installed
to hardware with an incorrect interval of zero, which makes the
switch erroneously drop those packets (since the configuration is
invalid).
To keep the code palatable, a forward-declaration was avoided by moving
some code around in patch 1/4. I hope that isn't too much of an issue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 24 Jun 2020 13:54:47 +0000 (16:54 +0300)]
net: dsa: sja1105: fix tc-gate schedule with single element
The sja1105_gating_cfg_time_to_interval function does this, as per the
comments:
/* The gate entries contain absolute times in their e->interval field. Convert
* that to proper intervals (i.e. "0, 5, 10, 15" to "5, 5, 5, 5").
*/
To perform that task, it iterates over gating_cfg->entries, at each step
updating the interval of the _previous_ entry. So one interval remains
to be updated at the end of the loop: the last one (since it isn't
"prev" for anyone else).
But there was an erroneous check, that the last element's interval
should not be updated if it's also the only element. I'm not quite sure
why that check was there, but it's clearly incorrect, as a tc-gate
schedule with a single element would get an e->interval of zero,
regardless of the duration requested by the user. The switch wouldn't
even consider this configuration as valid: it will just drop all traffic
that matches the rule.
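A short Python model may make the conversion and the single-element case easier to follow. It is a sketch under assumptions, not the driver code: times_to_intervals is a hypothetical name, entries are reduced to a plain list of absolute times, and deriving the last interval from a cycle_time argument is an assumption (the message does not say where the last duration comes from). The buggy flag models the erroneous "skip if it is the only element" check.

```python
def times_to_intervals(abs_times, cycle_time, buggy=False):
    """Convert absolute gate times to intervals, e.g. 0, 5, 10, 15 -> 5, 5, 5, 5.

    abs_times: ascending absolute times, one per gate entry.
    cycle_time: assumed total schedule length, used for the last entry.
    buggy: if True, leave a lone entry untouched, as the old check did.
    """
    intervals = list(abs_times)
    if not intervals:
        return intervals
    # Each step updates the interval of the previous entry.
    for i in range(1, len(intervals)):
        intervals[i - 1] = abs_times[i] - abs_times[i - 1]
    # The last entry is nobody's "prev", so it is handled after the loop.
    if buggy and len(intervals) == 1:
        return intervals  # the start time (zero here) is left as the interval
    intervals[-1] = cycle_time - abs_times[-1]
    return intervals
```

With a single entry starting at time 0 and an assumed cycle of 20, the buggy variant leaves an interval of 0, while the corrected one yields 20.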
Fixes: 834f8933d5dd ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 24 Jun 2020 13:54:46 +0000 (16:54 +0300)]
net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules
Currently, tas_data->enabled would remain true even after deleting all
tc-gate rules from the switch ports, which would cause the
sja1105_tas_state_machine to get unnecessarily scheduled.
Also, if there were any errors which would prevent the hardware from
enabling the gating schedule, the sja1105_tas_state_machine would
continuously detect and print that, spamming the kernel log, even if the
rules were subsequently deleted.
The rules themselves are _not_ active, because sja1105_init_scheduling
does enough of a job to not install the gating schedule in the static
config. But the virtual link rules themselves are still present.
So call the functions that remove the tc-gate configuration from
priv->tas_data.gating_cfg, so that tas_data->enabled can be set to
false, and sja1105_tas_state_machine will stop being scheduled.
Fixes: 834f8933d5dd ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 24 Jun 2020 13:54:45 +0000 (16:54 +0300)]
net: dsa: sja1105: unconditionally free old gating config
Currently sja1105_compose_gating_subschedule is not prepared to be
called for the case where we want to recompute the global tc-gate
configuration after we've deleted those actions on a port.
After deleting the tc-gate actions on the last port, max_cycle_time
would become zero, and that would incorrectly prevent
sja1105_free_gating_config from getting called.
So move the freeing function above the check for the need to apply a new
configuration.
Fixes: 834f8933d5dd ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 24 Jun 2020 13:54:44 +0000 (16:54 +0300)]
net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top
It turns out that sja1105_compose_gating_subschedule must also be called
from sja1105_vl_delete, to recalculate the overall tc-gate
configuration. Currently this is not possible without introducing a
forward declaration. So move the function to the top of the file, along
with its dependencies.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 Jun 2020 23:05:21 +0000 (16:05 -0700)]
Merge branch 'RGMII-Internal-delay-common-property'
Dan Murphy says:
====================
RGMII Internal delay common property
The RGMII internal delay is a common setting found in most RGMII capable PHY
devices. It was found that many vendor specific device tree properties exist
to perform the same function. This creates a common property to be used for PHYs
that have internal delays for the Rx and Tx paths.
If the internal delay is tunable then the caller needs to pass the internal
delay array and the return will be the index in the array that was found in
the firmware node.
If the internal delay is fixed then the caller only needs to indicate which
delay to return. There is no need for a fixed delay to add device properties
since the value is not configurable. Per the ethernet-controller.yaml the
interface type indicates that the PHY should provide the delay.
This series contains examples of both a configurable delay and a fixed delay.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Murphy [Wed, 24 Jun 2020 12:16:05 +0000 (07:16 -0500)]
net: phy: DP83822: Add setting the fixed internal delay
The DP83822 can be configured to use the RGMII interface. There are
independent fixed 3.5ns clock shifts (aka internal delays) for the TX and RX
paths. This allows either one to be set if the MII interface is RGMII and
the value is set in the firmware node.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Murphy [Wed, 24 Jun 2020 12:16:04 +0000 (07:16 -0500)]
net: dp83869: Add RGMII internal delay configuration
Add RGMII internal delay configuration for Rx and Tx.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Murphy [Wed, 24 Jun 2020 12:16:03 +0000 (07:16 -0500)]
dt-bindings: net: Add RGMII internal delay for DP83869
Add the internal delay values into the header and update the binding
with the internal delay properties.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Murphy [Wed, 24 Jun 2020 12:16:02 +0000 (07:16 -0500)]
net: phy: Add a helper to return the index for of the internal delay
Add a helper function that will return the index in the array for the
passed in internal delay value. The helper requires the array, size and
delay value.
The helper will then return the index of the exact match or, failing that,
the index of the closest smaller value.
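The lookup rule can be sketched in a few lines of Python. This is an illustration only, not the kernel helper: delay_index is a hypothetical name, the table is assumed to be sorted in ascending order, and raising ValueError when the requested delay is below every table entry is an assumption, since the message does not describe that case.

```python
from bisect import bisect_right


def delay_index(delays, delay):
    """Return the index of `delay` in `delays`, or of the closest smaller value.

    delays: supported delay values, assumed sorted in ascending order.
    delay: requested delay, in the same unit as the table.
    """
    if not delays or delay < delays[0]:
        # Assumed behaviour: no exact match and no smaller value exists.
        raise ValueError("no supported delay at or below the requested value")
    # bisect_right gives the insertion point after any exact match, so the
    # element just before it is the exact match or the closest smaller value.
    return bisect_right(delays, delay) - 1
```

With a hypothetical table of 250, 500 and 750, a request for 500 maps to index 1, and so does a request for 600, since 500 is the closest smaller entry.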
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Murphy [Wed, 24 Jun 2020 12:16:01 +0000 (07:16 -0500)]
dt-bindings: net: Add tx and rx internal delays
tx-internal-delays and rx-internal-delays are a common setting for RGMII
capable devices.
These properties are used when the phy-mode or phy-controller is set to
rgmii-id, rgmii-rxid or rgmii-txid. These modes indicate to the
controller that the PHY will add the internal delay for the connection.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 Jun 2020 23:03:39 +0000 (16:03 -0700)]
Merge branch 'dpaa2-eth-small-updates'
Ioana Ciornei says:
====================
dpaa2-eth: small updates
This patch set adds some updates to the dpaa2-eth driver: trimming of
the frame queue debugfs counters, cleanup of the remaining sparse
warnings and some other small fixes such as a recursive header include.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Wed, 24 Jun 2020 11:34:21 +0000 (14:34 +0300)]
dpaa2-eth: fix misspelled function parameters in dpni_[set/get]_taildrop
Two of the function parameters (qtype and index) were misspelled in the
associated descriptions of dpni_[set/get]_taildrop which led to sparse
warnings. Fix this by using the exact same names as present in the
function definition.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Wed, 24 Jun 2020 11:34:20 +0000 (14:34 +0300)]
dpaa2-eth: fix recursive header include
The dpaa2-eth.h header file includes dpaa2-eth-trace.h, which in turn
includes dpaa2-eth.h, leading to a recursion in the include path. Fix this
by removing the include of dpaa2-eth.h in the trace header.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Wed, 24 Jun 2020 11:34:19 +0000 (14:34 +0300)]
dpaa2-eth: fix condition for number of buffer acquire retries
We should keep retrying to acquire buffers through the software portals
as long as the function returns -EBUSY and the number of retries is
__below__ DPAA2_ETH_SWP_BUSY_RETRIES.
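The retry condition can be sketched in userspace as follows. This is an illustration under stated assumptions, not the driver code: SWP_BUSY_RETRIES stands in for DPAA2_ETH_SWP_BUSY_RETRIES with an arbitrary value, and acquire_with_retries() and busy_then_ok() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SWP_BUSY_RETRIES 10	/* stand-in value, chosen for illustration */

/* Hypothetical acquire callback: returns -EBUSY while the portal is busy. */
typedef int (*acquire_fn)(void *ctx);

/*
 * Retry only while the call reports -EBUSY and the retry budget is not
 * exhausted. Any other result, success or error, ends the loop at once.
 */
static int acquire_with_retries(acquire_fn acquire, void *ctx)
{
	int retries = 0;
	int err;

	assert(acquire != NULL);

	do {
		err = acquire(ctx);
	} while (err == -EBUSY && retries++ < SWP_BUSY_RETRIES);

	return err;
}

/* Example stub: reports -EBUSY until the counter in ctx reaches zero. */
static int busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EBUSY;
	}
	return 0;
}
```

In this sketch the callback is invoked at most SWP_BUSY_RETRIES + 1 times: one initial attempt plus the retries. Both halves of the condition are needed, since dropping the -EBUSY test would retry on hard errors and dropping the counter test would make the loop unbounded.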
Fixes: ef17bd7cc0c8 ("dpaa2-eth: Avoid unbounded while loops")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Wed, 24 Jun 2020 11:34:18 +0000 (14:34 +0300)]
dpaa2-eth: check the result of skb_to_sgvec()
Before passing the result of skb_to_sgvec() to dma_map_sg() check if any
error was returned.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Radulescu [Wed, 24 Jun 2020 11:34:17 +0000 (14:34 +0300)]
dpaa2-eth: trim debugfs FQ stats
With the addition of multiple traffic classes support, the number
of available frame queues grew significantly, overly inflating the
debugfs FQ statistics entry. Update it to only show the queues
which are actually in use (i.e. have a non-zero frame counter).
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Wed, 24 Jun 2020 11:30:04 +0000 (12:30 +0100)]
net: phylink: only restart AN if the link mode is using in-band AN
If we are not using in-band autonegotiation, there is no point passing
the request to restart autonegotiation on to the driver.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Wed, 24 Jun 2020 10:21:32 +0000 (11:21 +0100)]
net: dsa/ar9331: convert to mac_link_up()
Convert the ar9331 DSA driver to use the finalised link parameters in
mac_link_up() rather than the parameters in mac_config().
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Beznea [Wed, 24 Jun 2020 10:08:18 +0000 (13:08 +0300)]
net: macb: free resources on failure path of at91ether_open()
DMA buffers were not freed on the failure path of at91ether_open().
Along with the changes for freeing the DMA buffers, the enable/disable
interrupt instructions were moved to the at91ether_start()/at91ether_stop()
functions, and the operations in at91ether_stop() are now done in the
reverse order of those in at91ether_start().
Before this patch the operation order on the interface open path
was as follows:
1/ alloc DMA buffers
2/ enable tx, rx
3/ enable interrupts
and the order on interface close path was as follows:
1/ disable tx, rx
2/ disable interrupts
3/ free dma buffers.
Fixes: 7897b071ac3b ("net: macb: convert to phylink")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Beznea [Wed, 24 Jun 2020 10:08:17 +0000 (13:08 +0300)]
net: macb: call pm_runtime_put_sync on failure path
Call pm_runtime_put_sync() on failure path of at91ether_open.
Fixes: e6a41c23df0d ("net: macb: ensure interface is not suspended on at91rm9200")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 Jun 2020 21:04:51 +0000 (14:04 -0700)]
Merge tag 'mlx5-updates-2020-06-23' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 updates 2020-06-23
This series adds misc cleanup and updates to mlx5 driver.
1) Misc updates and cleanup
2) Use RCU instead of spinlock for vxlan table
v1->v2:
- Removed unnecessary Fixes Tags
v2->v3:
- Drop "macro undefine" patch, it has no value
v3->v4:
- Drop the Relaxed ordering patch.
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 25 Jun 2020 20:02:58 +0000 (13:02 -0700)]
Merge tag 'fsnotify_for_v5.8-rc3' of git://git./linux/kernel/git/jack/linux-fs
Pull fsnotify fixlet from Jan Kara:
"A performance improvement to reduce impact of fsnotify for inodes
where it isn't used"
* tag 'fsnotify_for_v5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fs: Do not check if there is a fsnotify watcher on pseudo inodes
Russell King [Wed, 24 Jun 2020 10:06:54 +0000 (11:06 +0100)]
net: phylink: add phylink_speed_(up|down) interface
Add an interface for the phy_speed_(up|down) functions when a driver
makes use of phylink. These pass the call through to phylib when we
have a normal PHY attached (i.o.w., not a PHY on a SFP module.)
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 Jun 2020 19:52:41 +0000 (12:52 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net, they are:
1) Unaligned atomic access in ipset, from Russell King.
2) Missing module description, from Rob Gill.
3) Patches to fix a module unload causing NULL pointer dereference in
xtables, from David Wilder. For the record, I am posting here his cover
letter explaining the problem:
A crash happened on ppc64le when running ltp network tests triggered by
"rmmod iptable_mangle".
See previous discussion in this thread:
https://lists.openwall.net/netdev/2020/06/03/161 .
In the crash I found in iptable_mangle_hook() that
state->net->ipv4.iptable_mangle=NULL causing a NULL pointer dereference.
net->ipv4.iptable_mangle is set to NULL in iptable_mangle_net_exit() and
called when the ip_mangle module is unloaded. A rmmod task was found running
in the crash dump. A 2nd crash showed the same problem when running
"rmmod iptable_filter" (net->ipv4.iptable_filter=NULL).
To fix this I added a .pre_exit hook in all iptable_foo.c. The pre_exit will
un-register the underlying hook and exit would do the table freeing. The
netns core does an unconditional synchronize_rcu after the pre_exit hooks,
ensuring no packets are in flight that have picked up the pointer before
completing the un-register.
These patches include changes for both iptables and ip6tables.
We tested this fix with ltp running iptables01.sh and iptables01.sh -6 in a
loop for 72 hours.
4) Add a selftest for conntrack helper assignment, from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Sat, 16 May 2020 00:11:29 +0000 (17:11 -0700)]
net/mlx5e: vxlan: Return bool instead of opaque ptr in port_lookup()
struct mlx5_vxlan_port is not exposed to the outside callers, it is
redundant to return a pointer to it from mlx5_vxlan_port_lookup(), to be
only used as a boolean, so just return a boolean.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Sat, 16 May 2020 00:09:05 +0000 (17:09 -0700)]
net/mlx5e: vxlan: Use RCU for vxlan table lookup
Remove the spinlock protecting the vxlan table and use RCU instead.
This will improve performance as it will eliminate contention on data
path cores.
Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Vlad Buslov [Wed, 10 Jun 2020 15:09:13 +0000 (18:09 +0300)]
net/mlx5e: Move TC-specific function definitions into MLX5_CLS_ACT
en_tc.h header file declares several TC-specific functions in
CONFIG_MLX5_ESWITCH block even though those functions are only compiled
when CONFIG_MLX5_CLS_ACT is set, which is a recent change. Move them to
the proper block.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Alaa Hleihel [Tue, 2 Jun 2020 09:09:21 +0000 (12:09 +0300)]
net/mlx5e: Move including net/arp.h from en_rep.c to rep/neigh.c
After the cited commit, the header net/arp.h is no longer used in en_rep.c.
So, move it to the new file rep/neigh.c that uses it now.
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Maxim Mikityanskiy [Thu, 11 Jun 2020 12:48:45 +0000 (15:48 +0300)]
net/mlx5e: Remove unused mlx5e_xsk_first_unused_channel
mlx5e_xsk_first_unused_channel is a leftover from old versions of the
first XSK commit, and it was never used. Remove it.
Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Denis Efremov [Fri, 5 Jun 2020 19:22:35 +0000 (22:22 +0300)]
net/mlx5: Use kfree(ft->g) in arfs_create_groups()
Use kfree() instead of kvfree() on ft->g in arfs_create_groups() because
the memory is allocated with kcalloc().
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Hu Haowen [Fri, 3 Apr 2020 04:26:59 +0000 (12:26 +0800)]
net/mlx5: FWTrace: Add missing space
Missing space at the end of a comment line, add it.
Signed-off-by: Hu Haowen <xianfengting221@163.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Thu, 28 May 2020 09:48:27 +0000 (04:48 -0500)]
net/mlx5: Avoid eswitch header inclusion in fs core layer
Flow steering core layer is independent of the eswitch layer.
Hence avoid fs_core dependency on eswitch.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Thomas Martitz [Thu, 25 Jun 2020 12:26:03 +0000 (14:26 +0200)]
net: bridge: enfore alignment for ethernet address
The eth_addr member is passed to ether_addr functions that require
2-byte alignment, therefore the member must be properly aligned
to avoid unaligned accesses.
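The effect of forcing the alignment can be sketched in userspace with two hypothetical struct layouts. The struct names and the leading flags byte are assumptions made for illustration, and C11 _Alignas stands in for the kernel's alignment attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a one-byte field pushes eth_addr to an odd offset. */
struct group_unaligned {
	uint8_t flags;
	uint8_t eth_addr[6];
};

/* Same fields, but eth_addr is forced onto a 2-byte boundary. */
struct group_aligned {
	uint8_t flags;
	_Alignas(2) uint8_t eth_addr[6];
};

/* Non-zero if the address can be accessed as 16-bit words. */
static int addr_is_u16_aligned(const uint8_t *addr)
{
	assert(addr != NULL);
	return ((uintptr_t)addr % _Alignof(uint16_t)) == 0;
}
```

In the first layout eth_addr sits at offset 1, so helpers that read the address as 16-bit words may perform unaligned accesses. In the second layout the member sits at offset 2 and the struct itself inherits the 2-byte alignment.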
The problem has been present since the initial merge of multicast to unicast:
commit 6db6f0eae6052b70885562e1733896647ec1d807 bridge: multicast to unicast
Fixes: 6db6f0eae605 ("bridge: multicast to unicast")
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Thomas Martitz <t.martitz@avm.de>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 25 Jun 2020 19:38:09 +0000 (12:38 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Several regression fixes from work that landed in the merge window,
particularly in the mlx5 driver:
- Various static checker and warning fixes
- General bug fixes in rvt, qedr, hns, mlx5 and hfi1
- Several regression fixes related to the ECE and QP changes in last
cycle
- Fixes for a few long standing crashers in CMA, uverbs ioctl, and
xrc"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits)
IB/hfi1: Add atomic triggered sleep/wakeup
IB/hfi1: Correct -EBUSY handling in tx code
IB/hfi1: Fix module use count flaw due to leftover module put calls
IB/hfi1: Restore kfree in dummy_netdev cleanup
IB/mad: Fix use after free when destroying MAD agent
RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata
RDMA/counter: Query a counter before release
RDMA/mad: Fix possible memory leak in ib_mad_post_receive_mads()
RDMA/mlx5: Fix integrity enabled QP creation
RDMA/mlx5: Remove ECE limitation from the RAW_PACKET QPs
RDMA/mlx5: Fix remote gid value in query QP
RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path
RDMA/core: Check that type_attrs is not NULL prior access
RDMA/hns: Fix an cmd queue issue when resetting
RDMA/hns: Fix a calltrace when registering MR from userspace
RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshake
RDMA/cma: Protect bind_list and listen_list while finding matching cm id
RDMA/qedr: Fix KASAN: use-after-free in ucma_event_handler+0x532
RDMA/efa: Set maximum pkeys device attribute
RDMA/rvt: Fix potential memory leak caused by rvt_alloc_rq
...
Vaibhav Gupta [Thu, 25 Jun 2020 12:10:43 +0000 (17:40 +0530)]
ptp_pch: use generic power management
With legacy PM, drivers themselves were responsible for managing the
device's power states and taking care of register states.
After upgrading to the generic structure, PCI core will take care of
required tasks and drivers should do only device-specific operations.
In the case of ptp_pch, after removing PCI helper functions, .suspend()
and .resume() became empty-body functions. Hence, define them NULL and
use dev_pm_ops.
Compile-tested only.
Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis Kirjanov [Thu, 25 Jun 2020 11:51:06 +0000 (14:51 +0300)]
tcp: don't ignore ECN CWR on pure ACK
There is a problem with the CWR flag set in an incoming ACK segment:
it leads to a situation where the ECE flag is latched forever.
The following packetdrill script shows what happens:
// Stack receives incoming segments with CE set
+0.1 <[ect0] . 11001:12001(1000) ack 1001 win 65535
+0.0 <[ce] . 12001:13001(1000) ack 1001 win 65535
+0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535
// Stack responds with ECN ECHO
+0.0 >[noecn] . 1001:1001(0) ack 12001
+0.0 >[noecn] E. 1001:1001(0) ack 13001
+0.0 >[noecn] E. 1001:1001(0) ack 14001
// Write a packet
+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0] PE. 1001:2001(1000) ack 14001
// Pure ACK received
+0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535
// Since CWR was sent, this packet should NOT have ECE set
+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0] P. 2001:3001(1000) ack 14001
// but Linux will still keep ECE latched here, with packetdrill
// flagging a missing ECE flag, expecting
// >[ect0] PE. 2001:3001(1000) ack 14001
// in the script
In the situation above we will continue to send ECN ECHO packets
and trigger the peer to reduce the congestion window. To avoid that
we can check CWR on pure ACKs received.
v3:
- Add a sequence check to avoid sending an ACK to an ACK
v2:
- Adjusted the comment
- move CWR check before checking for unacknowledged packets
Signed-off-by: Denis Kirjanov <denis.kirjanov@suse.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ard Biesheuvel [Thu, 25 Jun 2020 07:18:16 +0000 (09:18 +0200)]
net: phy: mscc: avoid skcipher API for single block AES encryption
The skcipher API dynamically instantiates the transformation object
on request that implements the requested algorithm optimally on the
given platform. This notion of optimality only matters for cases like
bulk network or disk encryption, where performance can be a bottleneck,
or in cases where the algorithm itself is not known at compile time.
In the mscc case, we are dealing with AES encryption of a single
block, and so neither concern applies, and we are better off using
the AES library interface, which is lightweight and safe for this
kind of use.
Note that the scatterlist API does not permit references to buffers
that are located on the stack, so the existing code is incorrect in
any case, but avoiding the skcipher and scatterlist APIs entirely is
the most straight-forward approach to fixing this.
Cc: Antoine Tenart <antoine.tenart@bootlin.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Fixes: 28c5107aa904e ("net: phy: mscc: macsec support")
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 25 Jun 2020 16:24:28 +0000 (09:24 -0700)]
Merge tag 's390-5.8-3' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Fix kernel crash on system call single stepping.
- Make sure early program check handler is executed with DAT on to
avoid an endless program check loop.
- Add __GFP_NOWARN flag to debug feature to avoid user triggerable
allocation failure messages.
* tag 's390-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/debug: avoid kernel warning on too large number of pages
s390/kasan: fix early pgm check handler execution
s390: fix system call single stepping
Linus Torvalds [Thu, 25 Jun 2020 16:15:24 +0000 (09:15 -0700)]
Merge tag 'sound-5.8-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes gathered in the last two weeks.
The major changes here are fixes for the recent DPCM regressions found
on i.MX and Qualcomm platforms and fixes for resource leaks in ASoC
DAI registrations.
Other than those are mostly device-specific fixes including the usual
USB- and HD-audio quirks, and a fix for syzkaller case and ID updates
for new Intel platforms"
* tag 'sound-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits)
ALSA: usb-audio: Fix OOB access of mixer element list
ALSA: usb-audio: add quirk for Samsung USBC Headset (AKG)
ALSA: usb-audio: Add registration quirk for Kingston HyperX Cloud Flight S
ASoC: rockchip: Fix a reference count leak.
ASoC: amd: closing specific instance.
ALSA: hda: Intel: add missing PCI IDs for ICL-H, TGL-H and EKL
ASoC: hdac_hda: fix memleak with regmap not freed on remove
ASoC: SOF: Intel: add PCI IDs for ICL-H and TGL-H
ASoC: SOF: Intel: add PCI ID for CometLake-S
ASoC: Intel: SOF: merge COMETLAKE_LP and COMETLAKE_H
ALSA: hda/realtek: Add mute LED and micmute LED support for HP systems
ALSA: usb-audio: Fix potential use-after-free of streams
ALSA: hda/realtek - Add quirk for MSI GE63 laptop
ASoC: fsl_ssi: Fix bclk calculation for mono channel
ASoC: SOF: Intel: hda: Clear RIRB status before reading WP
ASoC: rt1015: Update rt1015 default register value according to spec modification.
ASoC: qcom: common: set correct directions for dailinks
ASoc: q6afe: add support to get port direction
ASoC: soc-pcm: fix checks for multi-cpu FE dailinks
ASoC: rt5682: Let dai clks be registered whether mclk exists or not
...
Po Liu [Wed, 24 Jun 2020 09:36:31 +0000 (17:36 +0800)]
net: enetc add tc flower offload flow metering policing action
Flow metering in IEEE 802.1Qci is an optional function of the flow
filtering module. Flow metering uses a two-rate, two-bucket, three-color
marker to police frames. This patch only enables one rate and one
bucket, in color-blind mode. Flow metering instances are as
specified in the algorithm in MEF 10.3 and in Bandwidth Profile
Parameters. They are:
a) Flow meter instance identifier. An integer value identifying the flow
meter instance. The patch uses the police 'index' as this value.
b) Committed Information Rate (CIR), in bits per second. This patch uses
'rate_bytes_ps' to represent this value.
c) Committed Burst Size (CBS), in octets. This patch uses 'burst'
to represent this value.
d) Excess Information Rate (EIR), in bits per second.
e) Excess Burst Size per Bandwidth Profile Flow (EBS), in octets.
There are also some other parameters. This patch leaves EIR/EBS disabled
by default and uses color-blind mode.
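For orientation, a one-rate, one-bucket, color-blind meter can be sketched in software as a plain token bucket. This is a conceptual illustration only and says nothing about how the ENETC hardware implements it. The struct and function names are hypothetical, and the sketch ignores multiplication overflow for very long idle gaps and drops fractional refill bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

struct meter_sketch {
	uint64_t rate_bytes_ps;	/* committed rate, bytes per second */
	uint64_t burst;		/* committed burst size (bucket depth), bytes */
	uint64_t tokens;	/* bytes currently available */
	uint64_t last_ns;	/* time of the last refill */
};

static void meter_init(struct meter_sketch *m, uint64_t rate_bytes_ps,
		       uint64_t burst, uint64_t now_ns)
{
	assert(m != NULL);
	m->rate_bytes_ps = rate_bytes_ps;
	m->burst = burst;
	m->tokens = burst;	/* start with a full bucket */
	m->last_ns = now_ns;
}

/* Return true if a frame of `len` bytes conforms to the profile. */
static bool meter_conforms(struct meter_sketch *m, uint64_t now_ns,
			   uint32_t len)
{
	uint64_t refill = (now_ns - m->last_ns) * m->rate_bytes_ps /
			  NSEC_PER_SEC;

	if (refill > 0) {
		/* Add the refill, capping the bucket at the burst size. */
		if (refill > m->burst - m->tokens)
			m->tokens = m->burst;
		else
			m->tokens += refill;
		m->last_ns = now_ns;
	}

	if (len > m->tokens)
		return false;	/* out of profile: the policer would drop */

	m->tokens -= len;
	return true;
}
```

Color-blind here means the decision depends only on the bucket state and the frame length; no color carried by the frame is consulted.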
v1->v2 changes:
- Use div_u64() for the division instead of '/', which was reported as:
All errors (new ones prefixed by >>):
ld: drivers/net/ethernet/freescale/enetc/enetc_qos.o: in function `enetc_flowmeter_hw_set':
>> enetc_qos.c:(.text+0x66): undefined reference to `__udivdi3'
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Po Liu [Wed, 24 Jun 2020 09:36:30 +0000 (17:36 +0800)]
net: qos: police action add index for tc flower offloading
A hardware device may include more than one police entry. Specifying the
action's index makes it possible for several tc filters to share the same
police action when installing the filters.
Propagate this index to device drivers through the flow offload
intermediate representation, so that drivers could share a single
hardware policer between multiple filters.
v1->v2 changes:
- Update the commit message as suggested by Ido Schimmel <idosch@idosch.org>
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Po Liu [Wed, 24 Jun 2020 09:36:29 +0000 (17:36 +0800)]
net: enetc: add support max frame size for tc flower offload
Based on the tc flower offload police action, add the max frame size via
the parameter 'mtu'. A tc flower device driver working with the IEEE 802.1Qci
stream filter can implement the max frame size filtering. Add it to the
current hardware tc flower stream filter driver.
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Po Liu [Wed, 24 Jun 2020 09:36:28 +0000 (17:36 +0800)]
net: qos: add tc police offloading action with max frame size limit
Current police offloading supports 'burst' and 'rate_bytes_ps'. Some
hardware has the capability to limit the frame size. If the frame size is
larger than the setting, the frame is dropped. The police
action itself already accepts the 'mtu' parameter in the tc command, but it
is not extended to tc flower offloading. So extend 'mtu' to tc flower offloading.
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 Jun 2020 04:51:03 +0000 (21:51 -0700)]
Merge branch 'net-bcmgenet-use-hardware-padding-of-runt-frames'
Doug Berger says:
====================
net: bcmgenet: use hardware padding of runt frames
Now that scatter-gather and tx-checksumming are enabled by default
it revealed a packet corruption issue that can occur for very short
fragmented packets.
When padding these frames to the minimum length it is possible for
the non-linear (fragment) data to be added to the end of the linear
header in an SKB. Since the number of fragments is read before the
padding and used afterward without reloading, the fragment that
should have been consumed can be tacked on in place of part of the
padding.
The third commit in this set corrects this by removing the software
padding and allowing the hardware to add the pad bytes if necessary.
The first two commits resolve warnings observed by the kbuild test
robot and are included here for simplicity of application.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Thu, 25 Jun 2020 01:14:55 +0000 (18:14 -0700)]
net: bcmgenet: use hardware padding of runt frames
When commit 474ea9cafc45 ("net: bcmgenet: correctly pad short
packets") added the call to skb_padto() it should have been
located before the nr_frags parameter was read since that value
could be changed when padding packets with lengths between 55
and 59 bytes (inclusive).
The use of a stale nr_frags value can cause corruption of the
pad data when tx-scatter-gather is enabled. This corruption of
the pad can cause invalid checksum computation when hardware
offload of tx-checksum is also enabled.
Since the original reason for the padding was corrected by
commit 7dd399130efb ("net: bcmgenet: fix skb_len in
bcmgenet_xmit_single()") we can remove the software padding
altogether and make use of hardware padding of short frames as
long as the hardware also always appends the FCS value to the
frame.
Fixes: 474ea9cafc45 ("net: bcmgenet: correctly pad short packets")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Thu, 25 Jun 2020 01:14:54 +0000 (18:14 -0700)]
net: bcmgenet: use __be16 for htons(ETH_P_IP)
The 16-bit value that holds a short in network byte order should
be declared as a restricted big endian type to allow type checks
to succeed during assignment.
Fixes: 3e370952287c ("net: bcmgenet: add support for ethtool rxnfc flows")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Thu, 25 Jun 2020 01:14:53 +0000 (18:14 -0700)]
net: bcmgenet: re-remove bcmgenet_hfb_add_filter
This function was originally removed by Baoyou Xie in
commit e2072600a241 ("net: bcmgenet: remove unused function in
bcmgenet.c") to prevent a build warning.
Some of the functions removed by Baoyou Xie are now used for
WAKE_FILTER support so his commit was reverted, but this function
is still unused and the kbuild test robot dutifully reported the
warning.
This commit once again removes the remaining unused hfb functions.
Fixes: 14da1510fedc ("Revert "net: bcmgenet: remove unused function in bcmgenet.c"")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 25 Jun 2020 00:39:30 +0000 (17:39 -0700)]
Merge tag 'erofs-for-5.8-rc3-fixes' of git://git./linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
"Fix a regression which uses potential uninitialized high 32-bit value
unexpectedly recently observed with specific compiler options"
* tag 'erofs-for-5.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix partially uninitialized misuse in z_erofs_onlinepage_fixup
Florian Westphal [Mon, 22 Jun 2020 08:28:32 +0000 (10:28 +0200)]
selftests: netfilter: add test case for conntrack helper assignment
Check that 'nft ... ct helper set <foo>' works:
1. configure ftp helper via nft and assign it to
connections on port 2121
2. check with 'conntrack -L' that the next connection
has the ftp helper attached to it.
Also add a test for auto-assign (old behaviour).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David Wilder [Mon, 22 Jun 2020 17:10:14 +0000 (10:10 -0700)]
netfilter: ip6tables: Add a .pre_exit hook in all ip6table_foo.c.
Using new helpers ip6t_unregister_table_pre_exit() and
ip6t_unregister_table_exit().
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David Wilder [Mon, 22 Jun 2020 17:10:13 +0000 (10:10 -0700)]
netfilter: ip6tables: Split ip6t_unregister_table() into pre_exit and exit helpers.
The pre_exit will un-register the underlying hook and .exit will do
the table freeing. The netns core does an unconditional synchronize_rcu
after the pre_exit hooks, ensuring no packets are in flight that have
picked up the pointer before completing the un-register.
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David Wilder [Mon, 22 Jun 2020 17:10:12 +0000 (10:10 -0700)]
netfilter: iptables: Add a .pre_exit hook in all iptable_foo.c.
Using new helpers ipt_unregister_table_pre_exit() and
ipt_unregister_table_exit().
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David Wilder [Mon, 22 Jun 2020 17:10:11 +0000 (10:10 -0700)]
netfilter: iptables: Split ipt_unregister_table() into pre_exit and exit helpers.
The pre_exit will un-register the underlying hook and .exit will do the
table freeing. The netns core does an unconditional synchronize_rcu after
the pre_exit hooks, ensuring no packets are in flight that have picked up
the pointer before completing the un-register.
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Rob Gill [Sun, 21 Jun 2020 05:27:36 +0000 (05:27 +0000)]
netfilter: Add MODULE_DESCRIPTION entries to kernel modules
The user tool modinfo is used to get information on kernel modules, including a
description where it is available.
This patch adds a brief MODULE_DESCRIPTION to netfilter kernel modules
(descriptions taken from Kconfig file or code comments)
Signed-off-by: Rob Gill <rrobgill@protonmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Russell King [Wed, 10 Jun 2020 20:51:11 +0000 (21:51 +0100)]
netfilter: ipset: fix unaligned atomic access
When using ip_set with counters and comment, traffic causes the kernel
to panic on 32-bit ARM:
Alignment trap: not handling instruction e1b82f9f at [<bf01b0dc>]
Unhandled fault: alignment exception (0x221) at 0xea08133c
PC is at ip_set_match_extensions+0xe0/0x224 [ip_set]
The problem occurs when we try to update the 64-bit counters - the
faulting address above is not 64-bit aligned. The problem occurs
due to the way elements are allocated, for example:
set->dsize = ip_set_elem_len(set, tb, 0, 0);
map = ip_set_alloc(sizeof(*map) + elements * set->dsize);
If the element has a requirement for a member to be 64-bit aligned,
and set->dsize is not a multiple of 8, but is a multiple of four,
then every odd-numbered element will be misaligned - and hitting
an atomic64_add() on that element will cause the kernel to panic.
ip_set_elem_len() must return a size that is rounded to the maximum
alignment of any extension field stored in the element. This change
ensures that is the case.
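The arithmetic can be illustrated with a small userspace sketch. The element layout here (set data, then a 64-bit counter, then a 32-bit trailing extension) and all function names are assumptions chosen to reproduce the "multiple of four but not of eight" case. This is not the ipset code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round `len` up to a multiple of `align`; `align` must be a power of two. */
static size_t align_up(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/* Offset of the 64-bit counter inside one element. */
static size_t counter_offset(size_t payload)
{
	return align_up(payload, sizeof(uint64_t));
}

/* Element size without rounding: counter plus a 32-bit trailing field. */
static size_t elem_len_unrounded(size_t payload)
{
	return counter_offset(payload) + sizeof(uint64_t) + sizeof(uint32_t);
}

/* Element size rounded to the largest alignment any member requires. */
static size_t elem_len(size_t payload)
{
	return align_up(elem_len_unrounded(payload), sizeof(uint64_t));
}
```

With a 4-byte payload the unrounded size is 20 bytes, so in a packed array the counter of element 1 lands at offset 28, which is not 8-byte aligned. Rounding the size to 24 puts it at offset 32.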
Fixes: 95ad1f4a9358 ("netfilter: ipset: Fix extension alignment")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Colin Ian King [Wed, 24 Jun 2020 10:13:02 +0000 (11:13 +0100)]
qed: add missing error test for DBG_STATUS_NO_MATCHING_FRAMING_MODE
The error DBG_STATUS_NO_MATCHING_FRAMING_MODE was added to the enum
dbg_status, however there is a missing corresponding entry for
this in the array s_status_str. This causes an out-of-bounds read when
indexing into the last entry of s_status_str. Fix this by adding in
the missing entry.
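One way to catch this class of bug at build time is to tie the string table's size to the enum. The sketch below is not the qed code: the enum, the table and the DEMO_* names are hypothetical, and the last enumerator is assumed to be a count marker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; a real status enum has many more entries. */
enum demo_status {
	DEMO_STATUS_OK,
	DEMO_STATUS_BAD_ARG,
	DEMO_STATUS_NO_MATCHING_MODE,	/* newly added code */
	DEMO_STATUS_MAX			/* count marker, must stay last */
};

/* One string per status code, indexed by the enum value. */
static const char * const demo_status_str[] = {
	[DEMO_STATUS_OK]		= "Operation completed successfully",
	[DEMO_STATUS_BAD_ARG]		= "Invalid argument",
	[DEMO_STATUS_NO_MATCHING_MODE]	= "No matching mode",
};

/* Fails the build if a code is appended to the enum without a string. */
_Static_assert(sizeof(demo_status_str) / sizeof(demo_status_str[0]) ==
	       DEMO_STATUS_MAX, "demo_status_str is out of sync with enum");

/* Bounds-checked lookup, so an unknown code never reads past the table. */
static const char *demo_status_to_str(enum demo_status status)
{
	if ((size_t)status >= DEMO_STATUS_MAX)
		return "Invalid status";
	return demo_status_str[status];
}
```

The static assertion covers a code appended at the end of the enum, which is the case described here. It does not detect a missing entry in the middle of a table that uses designated initializers.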
Addresses-Coverity: ("Out-of-bounds read").
Fixes: 2d22bc8354b1 ("qed: FW 8.42.2.0 debug features")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 24 Jun 2020 21:52:49 +0000 (14:52 -0700)]
Merge branch 'net-phy-call-phy_disable_interrupts-in-phy_init_hw'
Jisheng Zhang says:
====================
net: phy: call phy_disable_interrupts() in phy_init_hw()
We face an issue with rtl8211f: a pin is shared between INTB and PMEB,
and the PHY Register Accessible Interrupt is enabled by default, so
the INTB/PMEB pin is always active in the polling mode case.
As Heiner pointed out "I was thinking about calling
phy_disable_interrupts() in phy_init_hw(), to have a defined init
state as we don't know in which state the PHY is if the PHY driver is
loaded. We shouldn't assume that it's the chip power-on defaults, BIOS
or boot loader could have changed this. Or in case of dual-boot
systems the other OS could leave the PHY in whatever state."
patch1 makes phy_disable_interrupts() non-static so that it could be used
in phy_init_hw() to have a defined init state.
patch2 calls phy_disable_interrupts() in phy_init_hw() to have a
defined init state.
Since v3:
- call phy_disable_interrupts() first so that interrupts are disabled
before config_init, thanks Florian
Since v2:
- Don't export phy_disable_interrupts() but just make it non-static
Since v1:
- EXPORT the correct symbol
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jisheng Zhang [Wed, 24 Jun 2020 07:59:23 +0000 (15:59 +0800)]
net: phy: call phy_disable_interrupts() in phy_init_hw()
Call phy_disable_interrupts() in phy_init_hw() to "have a defined init
state as we don't know in which state the PHY is if the PHY driver is
loaded. We shouldn't assume that it's the chip power-on defaults, BIOS
or boot loader could have changed this. Or in case of dual-boot
systems the other OS could leave the PHY in whatever state." as pointed
out by Heiner.
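The ordering can be modelled with a small userspace mock. It is a sketch only: struct demo_phy and the demo_* functions are hypothetical and much simpler than the kernel's struct phy_device and phylib. The order shown, interrupts disabled first and config_init afterwards, follows the v3 changelog of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of a PHY. */
struct demo_phy {
	bool irq_enabled;		/* state left by BIOS, boot loader or another OS */
	bool irq_enabled_at_config;	/* what the config step observed */
	bool configured;
};

/* Stands in for masking the PHY's interrupt sources. */
static void demo_disable_interrupts(struct demo_phy *phy)
{
	phy->irq_enabled = false;
}

/* Stands in for the driver's config_init callback. */
static void demo_config_init(struct demo_phy *phy)
{
	phy->irq_enabled_at_config = phy->irq_enabled;
	phy->configured = true;
}

/* Init path: force a known interrupt state, then run the driver config. */
static void demo_init_hw(struct demo_phy *phy)
{
	assert(phy != NULL);
	demo_disable_interrupts(phy);
	demo_config_init(phy);
}
```

Whatever state the PHY was left in, the config step always runs with interrupts disabled, which is the "defined init state" the message refers to.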
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jisheng Zhang [Wed, 24 Jun 2020 07:58:24 +0000 (15:58 +0800)]
net: phy: make phy_disable_interrupts() non-static
We face an issue with rtl8211f: a pin is shared between INTB and PMEB,
and the PHY Register Accessible Interrupt is enabled by default, so
the INTB/PMEB pin is always active in the polling mode case.
As Heiner pointed out "I was thinking about calling
phy_disable_interrupts() in phy_init_hw(), to have a defined init
state as we don't know in which state the PHY is if the PHY driver is
loaded. We shouldn't assume that it's the chip power-on defaults, BIOS
or boot loader could have changed this. Or in case of dual-boot
systems the other OS could leave the PHY in whatever state."
Make phy_disable_interrupts() non-static so that it could be used in
phy_init_hw() to have a defined init state.
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sascha Hauer [Wed, 24 Jun 2020 07:00:45 +0000 (09:00 +0200)]
net: ethernet: mvneta: Add back interface mode validation
When the write of the serdes configuration register was moved to
mvneta_config_interface(), the whole code block was removed from
mvneta_port_power_up() on the assumption that its only purpose was to
write the serdes configuration register. As mentioned by Russell King,
its purpose was also to check for valid interface modes early so that
later in the driver we do not have to care about unexpected interface
modes.
Add back the test to let the driver bail out early on unhandled
interface modes.
Fixes: b4748553f53f ("net: ethernet: mvneta: Fix Serdes configuration for SoCs without comphy")
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sascha Hauer [Wed, 24 Jun 2020 07:00:44 +0000 (09:00 +0200)]
net: ethernet: mvneta: Do not error out in non serdes modes
In mvneta_config_interface() the RGMII modes are caught by the default
case which is an error return. The RGMII modes are valid modes for the
driver, so instead of returning an error add a break statement to return
successfully.
This avoids this warning for non comphy SoCs which use RGMII, like
SolidRun Clearfog:
WARNING: CPU: 0 PID: 268 at drivers/net/ethernet/marvell/mvneta.c:3512 mvneta_start_dev+0x220/0x23c
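The change in control flow can be sketched as follows. This is not the mvneta code: enum demo_iface and demo_config_interface() are hypothetical, and the register write is reduced to a flag. Rejecting unhandled modes is assumed to happen in an earlier check, as the companion commit "Add back interface mode validation" describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of interface modes. */
enum demo_iface {
	DEMO_IFACE_SGMII,
	DEMO_IFACE_QSGMII,
	DEMO_IFACE_RGMII,
	DEMO_IFACE_RGMII_ID
};

/*
 * Program the serdes for modes that need it. Modes without serdes
 * configuration, such as RGMII, reach the default case and succeed
 * instead of returning an error.
 */
static int demo_config_interface(enum demo_iface mode, bool *serdes_written)
{
	assert(serdes_written != NULL);
	*serdes_written = false;

	switch (mode) {
	case DEMO_IFACE_SGMII:
	case DEMO_IFACE_QSGMII:
		*serdes_written = true;	/* stands in for the register write */
		break;
	default:
		break;			/* nothing to program, not an error */
	}

	return 0;
}
```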
Fixes: b4748553f53f ("net: ethernet: mvneta: Fix Serdes configuration for SoCs without comphy")
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Wed, 24 Jun 2020 07:18:21 +0000 (07:18 +0000)]
lan743x: Remove duplicated include from lan743x_main.c
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Mack [Sat, 20 Jun 2020 19:39:25 +0000 (21:39 +0200)]
dsa: Allow forwarding of redirected IGMP traffic
The driver for Marvell switches puts all ports in IGMP snooping mode,
which results in all IGMP/MLD frames that ingress on the ports being
forwarded to the CPU only.
The bridge code in the kernel can then interpret these frames and act
upon them, for instance by updating the mdb in the switch to reflect
multicast memberships of stations connected to the ports. However,
the IGMP/MLD frames must then also be forwarded to other ports of the
bridge so external IGMP queriers can track membership reports, and
external multicast clients can receive query reports from foreign IGMP
queriers.
Currently, this is impossible as the EDSA tagger sets offload_fwd_mark
on the skb when it unwraps the tagged frames, and that will make the
switchdev layer prevent the skb from egressing on any other port of
the same switch.
To fix that, look at the To_CPU code in the DSA header and make
forwarding of the frame possible for trapped IGMP packets.
Introduce some #defines for the frame types to make the code a bit more
comprehensible.
This was tested on a Marvell 88E6352 variant.
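The decision described above can be sketched without any header parsing. Everything here is hypothetical: the enums do not reproduce the values encoded in the DSA header, and struct demo_skb only models the single offload_fwd_mark field mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame classification, not the real header encoding. */
enum demo_frame_type {
	DEMO_FRAME_TO_CPU,
	DEMO_FRAME_FORWARD
};

enum demo_to_cpu_code {
	DEMO_CODE_IGMP_MLD_TRAP,
	DEMO_CODE_OTHER
};

/* Minimal stand-in for the one skb field that matters here. */
struct demo_skb {
	bool offload_fwd_mark;
};

/*
 * A set mark tells the software bridge that the switch already forwarded
 * the frame, so it must not egress on another port of the same switch.
 * Trapped IGMP/MLD frames reached only the CPU, so the mark is cleared
 * and the bridge can forward them.
 */
static void demo_set_fwd_mark(struct demo_skb *skb,
			      enum demo_frame_type type,
			      enum demo_to_cpu_code code)
{
	assert(skb != NULL);
	skb->offload_fwd_mark = !(type == DEMO_FRAME_TO_CPU &&
				  code == DEMO_CODE_IGMP_MLD_TRAP);
}
```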
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 24 Jun 2020 21:36:33 +0000 (14:36 -0700)]
Merge branch 'net-bridge-fdb-activity-tracking'
Nikolay Aleksandrov says:
====================
net: bridge: fdb activity tracking
This set adds extensions needed for proper and efficient EVPN
multi-homing mac sync. User-space (e.g. FRR) needs to be able to track
non-dynamic entry activity on a per-fdb basis, depending on whether a
tracked fdb is currently peer active or locally active, and needs to be
able to add a new peer active fdb (static + track + inactive) without
refreshing it to get real activity tracking. Patch 02 adds a new NDA
attribute - NDA_FDB_EXT_ATTRS
to avoid future pollution of NDA attributes by bridge or vxlan. New
bridge/vxlan specific fdb attributes are embedded in NDA_FDB_EXT_ATTRS,
which is used in patch 03 to pass the new NFEA_ACTIVITY_NOTIFY attribute
which controls if an fdb should be tracked and also reflects its current
state when dumping. It is treated as a bitfield, current valid bits are:
1 - mark an entry for activity tracking
2 - mark an entry as inactive to avoid multiple notifications and
reflect state properly
Patch 04 adds the ability to avoid refreshing an entry when changing it
via the NFEA_DONT_REFRESH flag. That allows user-space to mark a static
entry for tracking and keep its real activity unchanged.
The set has been extensively tested with FRR and those changes will
be upstreamed if/after it gets accepted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 23 Jun 2020 20:47:18 +0000 (23:47 +0300)]
net: bridge: add a flag to avoid refreshing fdb when changing/adding
When we modify or create a new fdb entry sometimes we want to avoid
refreshing its activity in order to track it properly. One example is
when a mac is received from EVPN multi-homing peer by FRR, which doesn't
want to change local activity accounting. It makes it static and sets a
flag to track its activity.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 23 Jun 2020 20:47:17 +0000 (23:47 +0300)]
net: bridge: add option to allow activity notifications for any fdb entries
This patch adds the ability to notify about activity of any entries
(static, permanent or ext_learn). EVPN multihoming peers need it to
properly and efficiently handle mac sync (peer active/locally active).
We add a new NFEA_ACTIVITY_NOTIFY attribute which is used to dump the
current activity state and to control if static entries should be monitored
at all. We use 2 bits - one to activate fdb entry tracking (disabled by
default) and the second to denote that an entry is inactive. We need
the second bit in order to avoid multiple notifications of inactivity.
Obviously this makes no difference for dynamic entries since at the time
of inactivity they get deleted, while the tracked non-dynamic entries get
the inactive bit set and get a notification.
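The two-bit scheme can be sketched as below. The DEMO_* names and the helper are hypothetical, not the kernel's definitions. The bit values follow the "1" and "2" listed in the series description, and the helper only models the "notify once on inactivity" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit names for the activity-notify bitfield. */
#define DEMO_NOTIFY_BIT			(1u << 0)	/* entry is tracked */
#define DEMO_NOTIFY_INACTIVE_BIT	(1u << 1)	/* inactivity already reported */

/*
 * Called when a non-dynamic entry is found idle. Returns true if a
 * notification should be sent, and sets the inactive bit so that later
 * calls for the same idle period do not notify again.
 */
static bool demo_mark_inactive(uint8_t *flags)
{
	assert(flags != NULL);

	if (!(*flags & DEMO_NOTIFY_BIT))
		return false;		/* tracking is off by default */
	if (*flags & DEMO_NOTIFY_INACTIVE_BIT)
		return false;		/* already reported */

	*flags |= DEMO_NOTIFY_INACTIVE_BIT;
	return true;
}
```

Without the second bit, every pass over an idle tracked entry would produce another notification.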
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 23 Jun 2020 20:47:16 +0000 (23:47 +0300)]
net: neighbor: add fdb extended attribute
Add an attribute to NDA which will contain all future fdb-specific
attributes in order to avoid polluting the NDA namespace with e.g.
bridge or vxlan specific attributes. The attribute is called
NDA_FDB_EXT_ATTRS and the structure would look like:
[NDA_FDB_EXT_ATTRS] = {
[NFEA_xxx]
}
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 23 Jun 2020 20:47:15 +0000 (23:47 +0300)]
net: bridge: fdb_add_entry takes ndm as argument
We can just pass ndm as an argument instead of its fields separately.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>