--- - branch: MAIN date: Sun Feb 27 19:21:54 UTC 2022 files: - new: '1.9' old: '1.8' path: src/common/lib/libc/arch/mips/atomic/atomic_cas.S pathrev: src/common/lib/libc/arch/mips/atomic/atomic_cas.S@1.9 type: modified - new: '1.5' old: '1.4' path: src/common/lib/libc/arch/mips/atomic/atomic_op_asm.h pathrev: src/common/lib/libc/arch/mips/atomic/atomic_op_asm.h@1.5 type: modified - new: '1.8' old: '1.7' path: src/common/lib/libc/arch/mips/atomic/atomic_swap.S pathrev: src/common/lib/libc/arch/mips/atomic/atomic_swap.S@1.8 type: modified - new: '1.66' old: '1.65' path: src/sys/arch/mips/include/asm.h pathrev: src/sys/arch/mips/include/asm.h@1.66 type: modified - new: '1.15' old: '1.14' path: src/sys/arch/mips/mips/lock_stubs_llsc.S pathrev: src/sys/arch/mips/mips/lock_stubs_llsc.S@1.15 type: modified id: 20220227T192154Z.3256515b04bca96d269b7852d7d83fb0067db750 log: | mips: Membar audit. This change should be safe because it doesn't remove or weaken any memory barriers, but does add, clarify, or strengthen barriers. Goals: - Make sure mutex_enter/exit and mutex_spin_enter/exit have acquire/release semantics. - New macros make maintenance easier and purpose clearer: . SYNC_ACQ is for load-before-load/store barrier, and BDSYNC_ACQ for a branch delay slot -- currently defined as plain sync for MP and nothing, or nop, for UP; thus it is no weaker than SYNC and BDSYNC as currently defined, which is syncw on Octeon, plain sync on non-Octeon MP, and nothing/nop on UP. It is not clear to me whether load-then-syncw or ll/sc-then-syncw or even bare load provides load-acquire semantics on Octeon -- if no, this will fix bugs; if yes (like it is on SPARC PSO), we can relax SYNC_ACQ to be syncw or nothing later. . SYNC_REL is for load/store-before-store barrier -- currently defined as plain sync for MP and nothing for UP. It is not clear to me whether syncw-then-store is enough for store-release on Octeon -- if no, we can leave this as is; if yes, we can relax SYNC_REL to be syncw on Octeon. . SYNC_PLUNGER is there to flush clogged Cavium store buffers, and BDSYNC_PLUNGER for a branch delay slot -- syncw on Octeon, nothing or nop on non-Octeon. => This is not necessary (or, as far as I'm aware, sufficient) for acquire semantics -- it serves only to flush store buffers where stores might otherwise linger for hundreds of thousands of cycles, which would, e.g., cause spin locks to be held for unreasonably long durations. Newerish revisions of the MIPS ISA also have finer-grained sync variants that could be plopped in here. Mechanism: Insert these barriers in the right places, replacing only those where the definition is currently equivalent, so this change is safe. - Replace #ifdef _MIPS_ARCH_OCTEONP / syncw / #endif at the end of atomic_cas_* by SYNC_PLUNGER, which is `sync 4' (a.k.a. syncw) if __OCTEON__ and empty otherwise. => From what I can tell, __OCTEON__ is defined in at least as many contexts as _MIPS_ARCH_OCTEONP -- i.e., there are some Octeons with no _MIPS_ARCH_OCTEONP, but I don't know if any of them are relevant to us or ever saw the light of day outside Cavium; we seem to buid with `-march=octeonp' so this is unlikely to make a difference. If it turns out that we do care, well, now there's a central place to make the distinction for sync instructions. - Replace post-ll/sc SYNC by SYNC_ACQ in _atomic_cas_*, which are internal kernel versions used in sys/arch/mips/include/lock.h where it assumes they have load-acquire semantics. Should move this to lock.h later, since we _don't_ define __HAVE_ATOMIC_AS_MEMBAR on MIPS and so the extra barrier might be costly. - Insert SYNC_REL before ll/sc, and replace post-ll/sc SYNC by SYNC_ACQ, in _ucas_*, which is used without any barriers in futex code and doesn't mention barriers in the man page so I have to assume it is required to be a release/acquire barrier. - Change BDSYNC to BDSYNC_ACQ in mutex_enter and mutex_spin_enter. This is necessary to provide load-acquire semantics -- unclear if it was provided already by syncw on Octeon, but it seems more likely that either (a) no sync or syncw is needed at all, or (b) syncw is not enough and sync is needed, since syncw is only a store-before-store ordering barrier. - Insert SYNC_REL before ll/sc in mutex_exit and mutex_spin_exit. This is currently redundant with the SYNC already there, but SYNC_REL more clearly identifies the necessary semantics in case we want to define it differently on different systems, and having a sync in the middle of an ll/sc is a bit weird and possibly not a good idea, so I intend to (carefully) remove the redundant SYNC in a later change. - Change BDSYNC to BDSYNC_PLUNGER at the end of mutex_exit. This has no semantic change right now -- it's syncw on Octeon, sync on non-Octeon MP, nop on UP -- but we can relax it later to nop on non-Cavium MP. - Leave LLSCSYNC in for now -- it is apparently there for a Cavium erratum, but I'm not sure what the erratum is, exactly, and I have no reference for it. I suspect these can be safely removed, but we might have to double up some other syncw instructions -- Linux uses it only in store-release sequences, not at the head of every ll/sc. module: src subject: 'CVS commit: src' unixtime: '1645989714' user: riastradh