whiterose

linux unikernel
Log | Files | Refs | README | LICENSE | git clone https://git.ne02ptzero.me/git/whiterose

commit 685f7e4f161425b137056abe35ba8ef7b669d83d
parent c7a2c49ea6c9eebbe44ff2c08b663b2905ee2c13
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri, 26 Oct 2018 14:36:21 -0700

Merge tag 'powerpc-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - A large series to rewrite our SLB miss handling, replacing a lot of
     fairly complicated asm with much fewer lines of C.

   - Following on from that, we now maintain a cache of SLB entries for
     each process and preload them on context switch. Leading to a 27%
     speedup for our context switch benchmark on Power9.

   - Improvements to our handling of SLB multi-hit errors. We now print
     more debug information when they occur, and try to continue running
     by flushing the SLB and reloading, rather than treating them as
     fatal.

   - Enable THP migration on 64-bit Book3S machines (eg. Power7/8/9).

   - Add support for physical memory up to 2PB in the linear mapping on
     64-bit Book3S. We only support up to 512TB as regular system
     memory, otherwise the percpu allocator runs out of vmalloc space.

   - Add stack protector support for 32 and 64-bit, with a per-task
     canary.

   - Add support for PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP.

   - Support recognising "big cores" on Power9, where two SMT4 cores are
     presented to us as a single SMT8 core.

   - A large series to cleanup some of our ioremap handling and PTE
     flags.

   - Add a driver for the PAPR SCM (storage class memory) interface,
     allowing guests to operate on SCM devices (acked by Dan).

   - Changes to our ftrace code to handle very large kernels, where we
     need to use a trampoline to get to ftrace_caller().

  And many other smaller enhancements and cleanups.

  Thanks to: Alan Modra, Alistair Popple, Aneesh Kumar K.V, Anton
  Blanchard, Aravinda Prasad, Bartlomiej Zolnierkiewicz, Benjamin
  Herrenschmidt, Breno Leitao, C├ędric Le Goater, Christophe Leroy,
  Christophe Lombard, Dan Carpenter, Daniel Axtens, Finn Thain, Gautham
  R. Shenoy, Gustavo Romero, Haren Myneni, Hari Bathini, Jia Hongtao,
  Joel Stanley, John Allen, Laurent Dufour, Madhavan Srinivasan, Mahesh
  Salgaonkar, Mark Hairgrove, Masahiro Yamada, Michael Bringmann,
  Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Nathan
  Fontenot, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Oliver
  O'Halloran, Paul Mackerras, Petr Vorel, Rashmica Gupta, Reza Arbab,
  Rob Herring, Sam Bobroff, Samuel Mendoza-Jonas, Scott Wood, Stan
  Johnson, Stephen Rothwell, Stewart Smith, Suraj Jitindar Singh, Tyrel
  Datwyler, Vaibhav Jain, Vasant Hegde, YueHaibing, zhong jiang"

* tag 'powerpc-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (221 commits)
  Revert "selftests/powerpc: Fix out-of-tree build errors"
  powerpc/msi: Fix compile error on mpc83xx
  powerpc: Fix stack protector crashes on CPU hotplug
  powerpc/traps: restore recoverability of machine_check interrupts
  powerpc/64/module: REL32 relocation range check
  powerpc/64s/radix: Fix radix__flush_tlb_collapsed_pmd double flushing pmd
  selftests/powerpc: Add a test of wild bctr
  powerpc/mm: Fix page table dump to work on Radix
  powerpc/mm/radix: Display if mappings are exec or not
  powerpc/mm/radix: Simplify split mapping logic
  powerpc/mm/radix: Remove the retry in the split mapping logic
  powerpc/mm/radix: Fix small page at boundary when splitting
  powerpc/mm/radix: Fix overuse of small pages in splitting logic
  powerpc/mm/radix: Fix off-by-one in split mapping logic
  powerpc/ftrace: Handle large kernel configs
  powerpc/mm: Fix WARN_ON with THP NUMA migration
  selftests/powerpc: Fix out-of-tree build errors
  powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selected
  powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64
  powerpc/time: isolate scaled cputime accounting in dedicated functions.
  ...

Diffstat:
MDocumentation/admin-guide/kernel-parameters.txt | 2+-
March/m68k/mac/misc.c | 75++++++++++-----------------------------------------------------------------
Aarch/powerpc/Kbuild | 16++++++++++++++++
March/powerpc/Kconfig | 19++++---------------
March/powerpc/Kconfig.debug | 6------
March/powerpc/Makefile | 91+++++++++++++++++++++++++++++++------------------------------------------------
March/powerpc/boot/.gitignore | 1+
March/powerpc/boot/Makefile | 11++++++++---
March/powerpc/boot/crt0.S | 4+++-
March/powerpc/boot/opal.c | 8--------
March/powerpc/boot/serial.c | 1+
March/powerpc/configs/g5_defconfig | 1+
March/powerpc/configs/maple_defconfig | 1+
March/powerpc/configs/powernv_defconfig | 4++++
March/powerpc/configs/ppc64_defconfig | 4++++
March/powerpc/configs/ps3_defconfig | 1+
March/powerpc/configs/pseries_defconfig | 1+
March/powerpc/configs/skiroot_defconfig | 154+++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------
March/powerpc/include/asm/accounting.h | 4++++
March/powerpc/include/asm/asm-prototypes.h | 3++-
March/powerpc/include/asm/book3s/32/pgtable.h | 152++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
March/powerpc/include/asm/book3s/64/hash-4k.h | 2+-
March/powerpc/include/asm/book3s/64/hash.h | 8++++++--
March/powerpc/include/asm/book3s/64/hugetlb.h | 3+++
March/powerpc/include/asm/book3s/64/mmu-hash.h | 95++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------
March/powerpc/include/asm/book3s/64/mmu.h | 4++--
March/powerpc/include/asm/book3s/64/pgtable-64k.h | 3+++
March/powerpc/include/asm/book3s/64/pgtable.h | 181+++++++++++++++++++++++++++++++++++++++++++++----------------------------------
March/powerpc/include/asm/cputhreads.h | 2++
March/powerpc/include/asm/cputime.h | 1-
March/powerpc/include/asm/drmem.h | 5+++++
March/powerpc/include/asm/eeh.h | 24+++++++++---------------
Aarch/powerpc/include/asm/error-injection.h | 13+++++++++++++
March/powerpc/include/asm/exception-64s.h | 17++++-------------
March/powerpc/include/asm/firmware.h | 5++++-
March/powerpc/include/asm/fixmap.h | 2+-
March/powerpc/include/asm/hvcall.h | 11++++++++++-
March/powerpc/include/asm/io.h | 33+++++++++++----------------------
March/powerpc/include/asm/kgdb.h | 5++++-
March/powerpc/include/asm/machdep.h | 3++-
March/powerpc/include/asm/mce.h | 3+++
March/powerpc/include/asm/mmu.h | 15+++++++++++++++
March/powerpc/include/asm/mmu_context.h | 2+-
March/powerpc/include/asm/mpic.h | 7+++++++
March/powerpc/include/asm/nohash/32/pgtable.h | 69++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
March/powerpc/include/asm/nohash/32/pte-40x.h | 43+++++++++++++++++++++++++++++++++++++++++++
March/powerpc/include/asm/nohash/32/pte-44x.h | 30++++++++++++++++++++++++++++++
March/powerpc/include/asm/nohash/32/pte-8xx.h | 87++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
March/powerpc/include/asm/nohash/32/pte-fsl-booke.h | 33+++++++++++++++++++++++++++++++++
March/powerpc/include/asm/nohash/64/pgtable.h | 41++++++++++++++++++++++++++++++++++++++---
March/powerpc/include/asm/nohash/pgtable.h | 100+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------
March/powerpc/include/asm/nohash/pte-book3e.h | 41+++++++++++++++++++++++++++++++++++++++++
March/powerpc/include/asm/opal-api.h | 1+
March/powerpc/include/asm/paca.h | 18++++++++++++++++--
March/powerpc/include/asm/pgtable.h | 29+++++++++++++++++++++++++++++
March/powerpc/include/asm/ppc-pci.h | 1+
March/powerpc/include/asm/processor.h | 7+++----
Darch/powerpc/include/asm/pte-common.h | 219-------------------------------------------------------------------------------
March/powerpc/include/asm/ptrace.h | 36++++++++++++++++++++++++++++++++++++
March/powerpc/include/asm/reg.h | 7++++++-
March/powerpc/include/asm/rtas.h | 15+++++++++++++++
March/powerpc/include/asm/slice.h | 1+
March/powerpc/include/asm/smp.h | 11+++++++++++
March/powerpc/include/asm/sparsemem.h | 11-----------
Aarch/powerpc/include/asm/stackprotector.h | 38++++++++++++++++++++++++++++++++++++++
March/powerpc/include/asm/thread_info.h | 17+++++++++++++++--
March/powerpc/include/asm/trace.h | 15+++++++++++++++
March/powerpc/include/asm/uaccess.h | 6+++---
March/powerpc/include/asm/user.h | 2+-
March/powerpc/include/uapi/asm/ptrace.h | 11++++++++++-
March/powerpc/include/uapi/asm/sigcontext.h | 6+++++-
March/powerpc/kernel/Makefile | 13++++++++-----
March/powerpc/kernel/asm-offsets.c | 19++++++++-----------
March/powerpc/kernel/btext.c | 2+-
March/powerpc/kernel/cacheinfo.c | 37+++++++++++++++++++++++++++++++++++--
March/powerpc/kernel/crash_dump.c | 2+-
March/powerpc/kernel/eeh.c | 42+++++++++++++++++++++---------------------
March/powerpc/kernel/eeh_dev.c | 2--
March/powerpc/kernel/eeh_driver.c | 237++++++++++++++++++++++++++++++++++++++-----------------------------------------
March/powerpc/kernel/eeh_pe.c | 160+++++++++++++++++++++++++++++++++++++++++++++----------------------------------
March/powerpc/kernel/entry_32.S | 4++--
March/powerpc/kernel/entry_64.S | 33+++++++++++++++++----------------
March/powerpc/kernel/exceptions-64s.S | 244+++++++++++++++++++++++++------------------------------------------------------
March/powerpc/kernel/fadump.c | 4++--
March/powerpc/kernel/head_8xx.S | 6+++---
March/powerpc/kernel/io-workarounds.c | 4++--
March/powerpc/kernel/iommu.c | 2+-
March/powerpc/kernel/isa-bridge.c | 6+++---
March/powerpc/kernel/kgdb.c | 43+++++++++++++++++++++++++++++++++++--------
March/powerpc/kernel/mce.c | 9+++++----
March/powerpc/kernel/mce_power.c | 9++++++++-
March/powerpc/kernel/module.c | 8++++++++
March/powerpc/kernel/module_64.c | 14++++++++------
March/powerpc/kernel/pci_32.c | 1-
March/powerpc/kernel/pci_64.c | 2+-
March/powerpc/kernel/process.c | 90++++++++++++++++++++++++++++++++++++++++++++++---------------------------------
March/powerpc/kernel/prom_init.c | 223++++++++++++++++++++++++-------------------------------------------------------
March/powerpc/kernel/prom_init_check.sh | 16++++++++++++++++
March/powerpc/kernel/ptrace.c | 68+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
March/powerpc/kernel/rtas.c | 13+++++++++++--
March/powerpc/kernel/rtasd.c | 25+++++++++----------------
March/powerpc/kernel/setup-common.c | 3+++
March/powerpc/kernel/setup_64.c | 18++++++++++++------
March/powerpc/kernel/smp.c | 245++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
March/powerpc/kernel/swsusp_asm64.S | 2+-
March/powerpc/kernel/time.c | 104+++++++++++++++++++++++++++++++++++++++++++++++--------------------------------
March/powerpc/kernel/tm.S | 75+++++++++++++++++++++++++++++++++++++++++++--------------------------------
March/powerpc/kernel/trace/Makefile | 4+---
March/powerpc/kernel/trace/ftrace.c | 261+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
March/powerpc/kernel/trace/ftrace_64.S | 12++++++++++++
March/powerpc/kernel/traps.c | 123+++++++++++++++++++++++++++++++++++++++----------------------------------------
March/powerpc/kernel/vdso32/datapage.S | 1+
March/powerpc/kernel/vdso32/gettimeofday.S | 1+
March/powerpc/kernel/vdso64/datapage.S | 1+
March/powerpc/kernel/vdso64/gettimeofday.S | 1+
March/powerpc/kernel/vmlinux.lds.S | 16+++++++++++++++-
March/powerpc/kvm/Makefile | 2--
March/powerpc/lib/Makefile | 4++--
March/powerpc/lib/code-patching.c | 3+--
Aarch/powerpc/lib/error-inject.c | 16++++++++++++++++
March/powerpc/lib/mem_64.S | 4++--
March/powerpc/mm/8xx_mmu.c | 5++---
March/powerpc/mm/Makefile | 13++++++++++---
March/powerpc/mm/dma-noncoherent.c | 2+-
Aarch/powerpc/mm/dump_linuxpagetables-8xx.c | 82+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Aarch/powerpc/mm/dump_linuxpagetables-book3s64.c | 120+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Aarch/powerpc/mm/dump_linuxpagetables-generic.c | 82+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
March/powerpc/mm/dump_linuxpagetables.c | 167++++++-------------------------------------------------------------------------
Aarch/powerpc/mm/dump_linuxpagetables.h | 19+++++++++++++++++++
March/powerpc/mm/hash_native_64.c | 4++--
March/powerpc/mm/hash_utils_64.c | 13+++++++------
March/powerpc/mm/hugepage-hash64.c | 6++++++
March/powerpc/mm/hugetlbpage-hash64.c | 4++++
March/powerpc/mm/hugetlbpage.c | 13++++++++++---
March/powerpc/mm/mem.c | 13+++++++------
March/powerpc/mm/mmu_context_book3s64.c | 9+++++++++
March/powerpc/mm/mmu_decl.h | 6+++++-
March/powerpc/mm/numa.c | 6++++++
March/powerpc/mm/pgtable-book3e.c | 9+++------
March/powerpc/mm/pgtable-book3s64.c | 11++++++++---
March/powerpc/mm/pgtable-hash64.c | 7+++----
March/powerpc/mm/pgtable-radix.c | 65++++++++++++++++++++++++++++-------------------------------------
March/powerpc/mm/pgtable.c | 32++++++++++++--------------------
March/powerpc/mm/pgtable_32.c | 70++++++++++++++++++++++++++++++++++++++++------------------------------
March/powerpc/mm/pgtable_64.c | 57+++++++++++++++++++++++++++++++--------------------------
March/powerpc/mm/ppc_mmu_32.c | 2+-
March/powerpc/mm/slb.c | 784++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------
Darch/powerpc/mm/slb_low.S | 335-------------------------------------------------------------------------------
March/powerpc/mm/slice.c | 38++++++++++++++++++++++++++++++--------
March/powerpc/mm/tlb-radix.c | 2+-
March/powerpc/mm/tlb_nohash.c | 3+++
March/powerpc/oprofile/Makefile | 1-
March/powerpc/perf/Makefile | 1-
March/powerpc/perf/imc-pmu.c | 2+-
March/powerpc/perf/power7-pmu.c | 1+
March/powerpc/platforms/40x/Kconfig | 9---------
March/powerpc/platforms/44x/Kconfig | 22----------------------
March/powerpc/platforms/44x/fsp2.c | 8++++----
March/powerpc/platforms/4xx/ocm.c | 7++-----
March/powerpc/platforms/82xx/Kconfig | 1-
March/powerpc/platforms/85xx/smp.c | 4++--
March/powerpc/platforms/8xx/machine_check.c | 4++--
March/powerpc/platforms/Kconfig | 21---------------------
March/powerpc/platforms/Kconfig.cputype | 5+----
March/powerpc/platforms/Makefile | 2--
March/powerpc/platforms/cell/Kconfig | 3---
March/powerpc/platforms/cell/spu_manage.c | 25++++++-------------------
March/powerpc/platforms/embedded6xx/wii.c | 2+-
March/powerpc/platforms/maple/Kconfig | 1-
March/powerpc/platforms/pasemi/Kconfig | 1-
March/powerpc/platforms/pasemi/dma_lib.c | 2+-
March/powerpc/platforms/powermac/Makefile | 3++-
March/powerpc/platforms/powermac/time.c | 126+++++++++++++------------------------------------------------------------------
March/powerpc/platforms/powernv/Kconfig | 6------
March/powerpc/platforms/powernv/eeh-powernv.c | 62+++++---------------------------------------------------------
March/powerpc/platforms/powernv/memtrace.c | 21++++++++++++++++-----
March/powerpc/platforms/powernv/npu-dma.c | 198++++++++++++++++++++++++++++++++++++++-----------------------------------------
March/powerpc/platforms/powernv/opal-powercap.c | 3++-
March/powerpc/platforms/powernv/opal-sensor-groups.c | 4++--
March/powerpc/platforms/powernv/opal-sysparam.c | 2+-
March/powerpc/platforms/powernv/opal.c | 2+-
March/powerpc/platforms/powernv/setup.c | 47+++++++++++++++++++++++++++++++++++++++++------
March/powerpc/platforms/ps3/Kconfig | 2--
March/powerpc/platforms/ps3/os-area.c | 2+-
March/powerpc/platforms/ps3/spu.c | 3+--
March/powerpc/platforms/pseries/Kconfig | 9+++++++--
March/powerpc/platforms/pseries/Makefile | 3++-
March/powerpc/platforms/pseries/dlpar.c | 41+++++++++++++----------------------------
March/powerpc/platforms/pseries/dtl.c | 4++--
March/powerpc/platforms/pseries/eeh_pseries.c | 66++++--------------------------------------------------------------
March/powerpc/platforms/pseries/event_sources.c | 40+++++++++++++---------------------------
March/powerpc/platforms/pseries/firmware.c | 2++
March/powerpc/platforms/pseries/hotplug-cpu.c | 28++++++++++++++--------------
March/powerpc/platforms/pseries/hotplug-memory.c | 116++++++++++++++++++++++++++++++-------------------------------------------------
March/powerpc/platforms/pseries/ibmebus.c | 2+-
March/powerpc/platforms/pseries/lpar.c | 295+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
March/powerpc/platforms/pseries/lparcfg.c | 5++---
March/powerpc/platforms/pseries/mobility.c | 23+++++++++++------------
March/powerpc/platforms/pseries/msi.c | 3++-
Aarch/powerpc/platforms/pseries/papr_scm.c | 345+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
March/powerpc/platforms/pseries/pci.c | 1+
Aarch/powerpc/platforms/pseries/pmem.c | 164+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
March/powerpc/platforms/pseries/pseries.h | 11+++++++++--
March/powerpc/platforms/pseries/ras.c | 308++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
March/powerpc/platforms/pseries/setup.c | 14++++++++++++++
March/powerpc/platforms/pseries/vio.c | 27+++++++++++++--------------
March/powerpc/sysdev/Kconfig | 5-----
March/powerpc/sysdev/Makefile | 3---
March/powerpc/sysdev/fsl_85xx_cache_sram.c | 8++++----
March/powerpc/sysdev/ipic.c | 2+-
March/powerpc/sysdev/xics/Makefile | 1-
March/powerpc/sysdev/xive/Kconfig | 3---
March/powerpc/sysdev/xive/Makefile | 1-
March/powerpc/sysdev/xive/common.c | 7++++---
March/powerpc/sysdev/xive/native.c | 11+----------
March/powerpc/xmon/Makefile | 5+++--
March/powerpc/xmon/xmon.c | 56++++++++++++++++++++++++++++++++++++--------------------
Mdrivers/block/z2ram.c | 3+--
Mdrivers/macintosh/adb-iop.c | 50+++++++++++++++++++++++++++++---------------------
Mdrivers/macintosh/adb.c | 8+++++---
Mdrivers/macintosh/adbhid.c | 53+++++++++++++++++++++--------------------------------
Mdrivers/macintosh/macio_asic.c | 8+++++---
Mdrivers/macintosh/macio_sysfs.c | 8+++++++-
Mdrivers/macintosh/via-cuda.c | 35+++++++++++++++++++++++++++++++++++
Mdrivers/macintosh/via-macii.c | 352++++++++++++++++++++++++++++++++++++++-----------------------------------------
Mdrivers/macintosh/via-pmu.c | 33+++++++++++++++++++++++++++++++++
Mdrivers/macintosh/windfarm_smu_controls.c | 4++--
Mdrivers/macintosh/windfarm_smu_sat.c | 25+++++++------------------
Mdrivers/misc/ocxl/config.c | 4+++-
Mdrivers/pci/hotplug/pnv_php.c | 2+-
Mdrivers/pcmcia/electra_cf.c | 2+-
Mdrivers/soc/fsl/qbman/qman_ccsr.c | 2+-
Mdrivers/video/fbdev/chipsfb.c | 3+--
Mdrivers/video/fbdev/controlfb.c | 5+----
Mdrivers/video/fbdev/platinumfb.c | 5+----
Mdrivers/video/fbdev/valkyriefb.c | 12++++++------
Minclude/linux/cuda.h | 4++++
Minclude/linux/pmu.h | 4++++
Mtools/testing/selftests/powerpc/Makefile | 3++-
Mtools/testing/selftests/powerpc/include/reg.h | 1+
Mtools/testing/selftests/powerpc/include/utils.h | 18++++++++++++++++++
Mtools/testing/selftests/powerpc/mm/.gitignore | 5+++--
Mtools/testing/selftests/powerpc/mm/Makefile | 4+++-
Atools/testing/selftests/powerpc/mm/wild_bctr.c | 155+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Mtools/testing/selftests/powerpc/primitives/load_unaligned_zeropad.c | 8--------
Mtools/testing/selftests/powerpc/ptrace/Makefile | 2+-
Atools/testing/selftests/powerpc/ptrace/ptrace-syscall.c | 228+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Atools/testing/selftests/powerpc/security/Makefile | 9+++++++++
Atools/testing/selftests/powerpc/security/rfi_flush.c | 132+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Mtools/testing/selftests/powerpc/tm/tm-tmspr.c | 27+++++++++++++++++----------
Mtools/testing/selftests/powerpc/tm/tm-unavailable.c | 9++++++---
Mtools/testing/selftests/powerpc/tm/tm.h | 9+++++++++
Mtools/testing/selftests/powerpc/utils.c | 152+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
253 files changed, 6306 insertions(+), 3401 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt @@ -2428,7 +2428,7 @@ seconds. Use this parameter to check at some other rate. 0 disables periodic checking. - memtest= [KNL,X86,ARM] Enable memtest + memtest= [KNL,X86,ARM,PPC] Enable memtest Format: <integer> default : 0 <disable> Specifies the number of memtest passes to be diff --git a/arch/m68k/mac/misc.c b/arch/m68k/mac/misc.c @@ -37,35 +37,6 @@ static void (*rom_reset)(void); #ifdef CONFIG_ADB_CUDA -static time64_t cuda_read_time(void) -{ - struct adb_request req; - time64_t time; - - if (cuda_request(&req, NULL, 2, CUDA_PACKET, CUDA_GET_TIME) < 0) - return 0; - while (!req.complete) - cuda_poll(); - - time = (u32)((req.reply[3] << 24) | (req.reply[4] << 16) | - (req.reply[5] << 8) | req.reply[6]); - - return time - RTC_OFFSET; -} - -static void cuda_write_time(time64_t time) -{ - struct adb_request req; - u32 data = lower_32_bits(time + RTC_OFFSET); - - if (cuda_request(&req, NULL, 6, CUDA_PACKET, CUDA_SET_TIME, - (data >> 24) & 0xFF, (data >> 16) & 0xFF, - (data >> 8) & 0xFF, data & 0xFF) < 0) - return; - while (!req.complete) - cuda_poll(); -} - static __u8 cuda_read_pram(int offset) { struct adb_request req; @@ -91,33 +62,6 @@ static void cuda_write_pram(int offset, __u8 data) #endif /* CONFIG_ADB_CUDA */ #ifdef CONFIG_ADB_PMU -static time64_t pmu_read_time(void) -{ - struct adb_request req; - time64_t time; - - if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0) - return 0; - pmu_wait_complete(&req); - - time = (u32)((req.reply[0] << 24) | (req.reply[1] << 16) | - (req.reply[2] << 8) | req.reply[3]); - - return time - RTC_OFFSET; -} - -static void pmu_write_time(time64_t time) -{ - struct adb_request req; - u32 data = lower_32_bits(time + RTC_OFFSET); - - if (pmu_request(&req, NULL, 5, PMU_SET_RTC, - (data >> 24) & 0xFF, (data >> 16) & 0xFF, - (data >> 8) & 0xFF, data & 0xFF) < 0) - return; - pmu_wait_complete(&req); -} - static __u8 pmu_read_pram(int offset) { struct adb_request req; @@ -295,13 +239,17 @@ static time64_t via_read_time(void) * is basically any machine with Mac II-style ADB. */ -static void via_write_time(time64_t time) +static void via_set_rtc_time(struct rtc_time *tm) { union { __u8 cdata[4]; __u32 idata; } data; __u8 temp; + time64_t time; + + time = mktime64(tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday, + tm->tm_hour, tm->tm_min, tm->tm_sec); /* Clear the write protect bit */ @@ -641,12 +589,12 @@ int mac_hwclk(int op, struct rtc_time *t) #ifdef CONFIG_ADB_CUDA case MAC_ADB_EGRET: case MAC_ADB_CUDA: - now = cuda_read_time(); + now = cuda_get_time(); break; #endif #ifdef CONFIG_ADB_PMU case MAC_ADB_PB2: - now = pmu_read_time(); + now = pmu_get_time(); break; #endif default: @@ -665,24 +613,21 @@ int mac_hwclk(int op, struct rtc_time *t) __func__, t->tm_year + 1900, t->tm_mon + 1, t->tm_mday, t->tm_hour, t->tm_min, t->tm_sec); - now = mktime64(t->tm_year + 1900, t->tm_mon + 1, t->tm_mday, - t->tm_hour, t->tm_min, t->tm_sec); - switch (macintosh_config->adb_type) { case MAC_ADB_IOP: case MAC_ADB_II: case MAC_ADB_PB1: - via_write_time(now); + via_set_rtc_time(t); break; #ifdef CONFIG_ADB_CUDA case MAC_ADB_EGRET: case MAC_ADB_CUDA: - cuda_write_time(now); + cuda_set_rtc_time(t); break; #endif #ifdef CONFIG_ADB_PMU case MAC_ADB_PB2: - pmu_write_time(now); + pmu_set_rtc_time(t); break; #endif default: diff --git a/arch/powerpc/Kbuild b/arch/powerpc/Kbuild @@ -0,0 +1,16 @@ +subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror + +obj-y += kernel/ +obj-y += mm/ +obj-y += lib/ +obj-y += sysdev/ +obj-y += platforms/ +obj-y += math-emu/ +obj-y += crypto/ +obj-y += net/ + +obj-$(CONFIG_XMON) += xmon/ +obj-$(CONFIG_KVM) += kvm/ + +obj-$(CONFIG_PERF_EVENTS) += perf/ +obj-$(CONFIG_KEXEC_FILE) += purgatory/ diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig @@ -137,7 +137,7 @@ config PPC select ARCH_HAS_PMEM_API if PPC64 select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_MEMBARRIER_CALLBACKS - select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE + select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE && PPC64 select ARCH_HAS_SG_CHAIN select ARCH_HAS_STRICT_KERNEL_RWX if ((PPC_BOOK3S_64 || PPC32) && !RELOCATABLE && !HIBERNATION) select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST @@ -180,6 +180,8 @@ config PPC select HAVE_ARCH_SECCOMP_FILTER select HAVE_ARCH_TRACEHOOK select HAVE_CBPF_JIT if !PPC64 + select HAVE_STACKPROTECTOR if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13) + select HAVE_STACKPROTECTOR if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2) select HAVE_CONTEXT_TRACKING if PPC64 select HAVE_DEBUG_KMEMLEAK select HAVE_DEBUG_STACKOVERFLOW @@ -188,6 +190,7 @@ config PPC select HAVE_EBPF_JIT if PPC64 select HAVE_EFFICIENT_UNALIGNED_ACCESS if !(CPU_LITTLE_ENDIAN && POWER7_CPU) select HAVE_FTRACE_MCOUNT_RECORD + select HAVE_FUNCTION_ERROR_INJECTION select HAVE_FUNCTION_GRAPH_TRACER select HAVE_FUNCTION_TRACER select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200 # plugin support on gcc <= 5.1 is buggy on PPC @@ -285,12 +288,10 @@ config ARCH_MAY_HAVE_PC_FDC config PPC_UDBG_16550 bool - default n config GENERIC_TBSYNC bool default y if PPC32 && SMP - default n config AUDIT_ARCH bool @@ -309,13 +310,11 @@ config EPAPR_BOOT bool help Used to allow a board to specify it wants an ePAPR compliant wrapper. - default n config DEFAULT_UIMAGE bool help Used to allow a board to specify it wants a uImage built by default - default n config ARCH_HIBERNATION_POSSIBLE bool @@ -329,11 +328,9 @@ config ARCH_SUSPEND_POSSIBLE config PPC_DCR_NATIVE bool - default n config PPC_DCR_MMIO bool - default n config PPC_DCR bool @@ -344,7 +341,6 @@ config PPC_OF_PLATFORM_PCI bool depends on PCI depends on PPC64 # not supported on 32 bits yet - default n config ARCH_SUPPORTS_DEBUG_PAGEALLOC depends on PPC32 || PPC_BOOK3S_64 @@ -447,14 +443,12 @@ config PPC_TRANSACTIONAL_MEM depends on SMP select ALTIVEC select VSX - default n ---help--- Support user-mode Transactional Memory on POWERPC. config LD_HEAD_STUB_CATCH bool "Reserve 256 bytes to cope with linker stubs in HEAD text" if EXPERT depends on PPC64 - default n help Very large kernels can cause linker branch stubs to be generated by code in head_64.S, which moves the head text sections out of their @@ -557,7 +551,6 @@ config RELOCATABLE config RELOCATABLE_TEST bool "Test relocatable kernel" depends on (PPC64 && RELOCATABLE) - default n help This runs the relocatable kernel at the address it was initially loaded at, which tends to be non-zero and therefore test the @@ -769,7 +762,6 @@ config PPC_SUBPAGE_PROT config PPC_COPRO_BASE bool - default n config SCHED_SMT bool "SMT (Hyperthreading) scheduler support" @@ -892,7 +884,6 @@ config PPC_INDIRECT_PCI bool depends on PCI default y if 40x || 44x - default n config EISA bool @@ -989,7 +980,6 @@ source "drivers/pcmcia/Kconfig" config HAS_RAPIDIO bool - default n config RAPIDIO tristate "RapidIO support" @@ -1012,7 +1002,6 @@ endmenu config NONSTATIC_KERNEL bool - default n menu "Advanced setup" depends on PPC32 diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug @@ -2,7 +2,6 @@ config PPC_DISABLE_WERROR bool "Don't build arch/powerpc code with -Werror" - default n help This option tells the compiler NOT to build the code under arch/powerpc with the -Werror flag (which means warnings @@ -56,7 +55,6 @@ config PPC_EMULATED_STATS config CODE_PATCHING_SELFTEST bool "Run self-tests of the code-patching code" depends on DEBUG_KERNEL - default n config JUMP_LABEL_FEATURE_CHECKS bool "Enable use of jump label for cpu/mmu_has_feature()" @@ -70,7 +68,6 @@ config JUMP_LABEL_FEATURE_CHECKS config JUMP_LABEL_FEATURE_CHECK_DEBUG bool "Do extra check on feature fixup calls" depends on DEBUG_KERNEL && JUMP_LABEL_FEATURE_CHECKS - default n help This tries to catch incorrect usage of cpu_has_feature() and mmu_has_feature() in the code. @@ -80,16 +77,13 @@ config JUMP_LABEL_FEATURE_CHECK_DEBUG config FTR_FIXUP_SELFTEST bool "Run self-tests of the feature-fixup code" depends on DEBUG_KERNEL - default n config MSI_BITMAP_SELFTEST bool "Run self-tests of the MSI bitmap code" depends on DEBUG_KERNEL - default n config PPC_IRQ_SOFT_MASK_DEBUG bool "Include extra checks for powerpc irq soft masking" - default n config XMON bool "Include xmon kernel debugger" diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile @@ -112,6 +112,13 @@ KBUILD_LDFLAGS += -m elf$(BITS)$(LDEMULATION) KBUILD_ARFLAGS += --target=elf$(BITS)-$(GNUTARGET) endif +cflags-$(CONFIG_STACKPROTECTOR) += -mstack-protector-guard=tls +ifdef CONFIG_PPC64 +cflags-$(CONFIG_STACKPROTECTOR) += -mstack-protector-guard-reg=r13 +else +cflags-$(CONFIG_STACKPROTECTOR) += -mstack-protector-guard-reg=r2 +endif + LDFLAGS_vmlinux-y := -Bstatic LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie LDFLAGS_vmlinux := $(LDFLAGS_vmlinux-y) @@ -160,8 +167,17 @@ else CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=powerpc64 endif +ifdef CONFIG_FUNCTION_TRACER +CC_FLAGS_FTRACE := -pg ifdef CONFIG_MPROFILE_KERNEL - CC_FLAGS_FTRACE := -pg -mprofile-kernel +CC_FLAGS_FTRACE += -mprofile-kernel +endif +# Work around gcc code-gen bugs with -pg / -fno-omit-frame-pointer in gcc <= 4.8 +# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44199 +# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52828 +ifneq ($(cc-name),clang) +CC_FLAGS_FTRACE += $(call cc-ifversion, -lt, 0409, -mno-sched-epilog) +endif endif CFLAGS-$(CONFIG_TARGET_CPU_BOOL) += $(call cc-option,-mcpu=$(CONFIG_TARGET_CPU)) @@ -229,16 +245,15 @@ ifdef CONFIG_6xx KBUILD_CFLAGS += -mcpu=powerpc endif -# Work around a gcc code-gen bug with -fno-omit-frame-pointer. -ifdef CONFIG_FUNCTION_TRACER -KBUILD_CFLAGS += -mno-sched-epilog -endif - cpu-as-$(CONFIG_4xx) += -Wa,-m405 cpu-as-$(CONFIG_ALTIVEC) += $(call as-option,-Wa$(comma)-maltivec) cpu-as-$(CONFIG_E200) += -Wa,-me200 cpu-as-$(CONFIG_E500) += -Wa,-me500 -cpu-as-$(CONFIG_PPC_BOOK3S_64) += -Wa,-mpower4 + +# When using '-many -mpower4' gas will first try and find a matching power4 +# mnemonic and failing that it will allow any valid mnemonic that GAS knows +# about. GCC will pass -many to GAS when assembling, clang does not. +cpu-as-$(CONFIG_PPC_BOOK3S_64) += -Wa,-mpower4 -Wa,-many cpu-as-$(CONFIG_PPC_E500MC) += $(call as-option,-Wa$(comma)-me500mc) KBUILD_AFLAGS += $(cpu-as-y) @@ -258,18 +273,8 @@ head-$(CONFIG_PPC_FPU) += arch/powerpc/kernel/fpu.o head-$(CONFIG_ALTIVEC) += arch/powerpc/kernel/vector.o head-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE) += arch/powerpc/kernel/prom_init.o -core-y += arch/powerpc/kernel/ \ - arch/powerpc/mm/ \ - arch/powerpc/lib/ \ - arch/powerpc/sysdev/ \ - arch/powerpc/platforms/ \ - arch/powerpc/math-emu/ \ - arch/powerpc/crypto/ \ - arch/powerpc/net/ -core-$(CONFIG_XMON) += arch/powerpc/xmon/ -core-$(CONFIG_KVM) += arch/powerpc/kvm/ -core-$(CONFIG_PERF_EVENTS) += arch/powerpc/perf/ -core-$(CONFIG_KEXEC_FILE) += arch/powerpc/purgatory/ +# See arch/powerpc/Kbuild for content of core part of the kernel +core-y += arch/powerpc/ drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/ @@ -397,40 +402,20 @@ archclean: archprepare: checkbin -# Use the file '.tmp_gas_check' for binutils tests, as gas won't output -# to stdout and these checks are run even on install targets. -TOUT := .tmp_gas_check +ifdef CONFIG_STACKPROTECTOR +prepare: stack_protector_prepare -# Check gcc and binutils versions: -# - gcc-3.4 and binutils-2.14 are a fatal combination -# - Require gcc 4.0 or above on 64-bit -# - gcc-4.2.0 has issues compiling modules on 64-bit +stack_protector_prepare: prepare0 +ifdef CONFIG_PPC64 + $(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk '{if ($$2 == "PACA_CANARY") print $$3;}' include/generated/asm-offsets.h)) +else + $(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk '{if ($$2 == "TASK_CANARY") print $$3;}' include/generated/asm-offsets.h)) +endif +endif + +# Check toolchain versions: +# - gcc-4.6 is the minimum kernel-wide version so nothing required. checkbin: - @if test "$(cc-name)" != "clang" \ - && test "$(cc-version)" = "0304" ; then \ - if ! /bin/echo mftb 5 | $(AS) -v -mppc -many -o $(TOUT) >/dev/null 2>&1 ; then \ - echo -n '*** ${VERSION}.${PATCHLEVEL} kernels no longer build '; \ - echo 'correctly with gcc-3.4 and your version of binutils.'; \ - echo '*** Please upgrade your binutils or downgrade your gcc'; \ - false; \ - fi ; \ - fi - @if test "$(cc-name)" != "clang" \ - && test "$(cc-version)" -lt "0400" \ - && test "x${CONFIG_PPC64}" = "xy" ; then \ - echo -n "Sorry, GCC v4.0 or above is required to build " ; \ - echo "the 64-bit powerpc kernel." ; \ - false ; \ - fi - @if test "$(cc-name)" != "clang" \ - && test "$(cc-fullversion)" = "040200" \ - && test "x${CONFIG_MODULES}${CONFIG_PPC64}" = "xyy" ; then \ - echo -n '*** GCC-4.2.0 cannot compile the 64-bit powerpc ' ; \ - echo 'kernel with modules enabled.' ; \ - echo -n '*** Please use a different GCC version or ' ; \ - echo 'disable kernel modules' ; \ - false ; \ - fi @if test "x${CONFIG_CPU_LITTLE_ENDIAN}" = "xy" \ && $(LD) --version | head -1 | grep ' 2\.24$$' >/dev/null ; then \ echo -n '*** binutils 2.24 miscompiles weak symbols ' ; \ @@ -438,7 +423,3 @@ checkbin: echo -n '*** Please use a different binutils version.' ; \ false ; \ fi - - -CLEAN_FILES += $(TOUT) - diff --git a/arch/powerpc/boot/.gitignore b/arch/powerpc/boot/.gitignore @@ -44,4 +44,5 @@ fdt_sw.c fdt_wip.c libfdt.h libfdt_internal.h +autoconf.h diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile @@ -32,8 +32,8 @@ else endif BOOTCFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \ - -fno-strict-aliasing -Os -msoft-float -pipe \ - -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \ + -fno-strict-aliasing -O2 -msoft-float -mno-altivec -mno-vsx \ + -pipe -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \ -D$(compress-y) ifdef CONFIG_PPC64_BOOT_WRAPPER @@ -197,9 +197,14 @@ $(obj)/empty.c: $(obj)/zImage.coff.lds $(obj)/zImage.ps3.lds : $(obj)/%: $(srctree)/$(src)/%.S $(Q)cp $< $@ +$(obj)/serial.c: $(obj)/autoconf.h + +$(obj)/autoconf.h: $(obj)/%: $(objtree)/include/generated/% + $(Q)cp $< $@ + clean-files := $(zlib-) $(zlibheader-) $(zliblinuxheader-) \ $(zlib-decomp-) $(libfdt) $(libfdtheader) \ - empty.c zImage.coff.lds zImage.ps3.lds zImage.lds + autoconf.h empty.c zImage.coff.lds zImage.ps3.lds zImage.lds quiet_cmd_bootcc = BOOTCC $@ cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $< diff --git a/arch/powerpc/boot/crt0.S b/arch/powerpc/boot/crt0.S @@ -47,8 +47,10 @@ p_end: .long _end p_pstack: .long _platform_stack_top #endif - .weak _zimage_start .globl _zimage_start + /* Clang appears to require the .weak directive to be after the symbol + * is defined. See https://bugs.llvm.org/show_bug.cgi?id=38921 */ + .weak _zimage_start _zimage_start: .globl _zimage_start_lib _zimage_start_lib: diff --git a/arch/powerpc/boot/opal.c b/arch/powerpc/boot/opal.c @@ -13,8 +13,6 @@ #include <libfdt.h> #include "../include/asm/opal-api.h" -#ifdef CONFIG_PPC64_BOOT_WRAPPER - /* Global OPAL struct used by opal-call.S */ struct opal { u64 base; @@ -101,9 +99,3 @@ int opal_console_init(void *devp, struct serial_console_data *scdp) return 0; } -#else -int opal_console_init(void *devp, struct serial_console_data *scdp) -{ - return -1; -} -#endif /* __powerpc64__ */ diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c @@ -18,6 +18,7 @@ #include "stdio.h" #include "io.h" #include "ops.h" +#include "autoconf.h" static int serial_open(void) { diff --git a/arch/powerpc/configs/g5_defconfig b/arch/powerpc/configs/g5_defconfig @@ -262,3 +262,4 @@ CONFIG_CRYPTO_SERPENT=m CONFIG_CRYPTO_TEA=m CONFIG_CRYPTO_TWOFISH=m # CONFIG_CRYPTO_HW is not set +CONFIG_PRINTK_TIME=y diff --git a/arch/powerpc/configs/maple_defconfig b/arch/powerpc/configs/maple_defconfig @@ -112,3 +112,4 @@ CONFIG_PPC_EARLY_DEBUG=y CONFIG_CRYPTO_ECB=m CONFIG_CRYPTO_PCBC=m # CONFIG_CRYPTO_HW is not set +CONFIG_PRINTK_TIME=y diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig @@ -44,6 +44,9 @@ CONFIG_PPC_MEMTRACE=y # CONFIG_PPC_PSERIES is not set # CONFIG_PPC_OF_BOOT_TRAMPOLINE is not set CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y +CONFIG_CPU_FREQ_GOV_POWERSAVE=y +CONFIG_CPU_FREQ_GOV_USERSPACE=y +CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y CONFIG_CPU_IDLE=y CONFIG_HZ_100=y CONFIG_BINFMT_MISC=m @@ -350,3 +353,4 @@ CONFIG_VIRTUALIZATION=y CONFIG_KVM_BOOK3S_64=m CONFIG_KVM_BOOK3S_64_HV=m CONFIG_VHOST_NET=m +CONFIG_PRINTK_TIME=y diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig @@ -40,6 +40,9 @@ CONFIG_PS3_LPM=m CONFIG_PPC_IBM_CELL_BLADE=y CONFIG_RTAS_FLASH=m CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y +CONFIG_CPU_FREQ_GOV_POWERSAVE=y +CONFIG_CPU_FREQ_GOV_USERSPACE=y +CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y CONFIG_CPU_FREQ_PMAC64=y CONFIG_HZ_100=y CONFIG_BINFMT_MISC=m @@ -365,3 +368,4 @@ CONFIG_VIRTUALIZATION=y CONFIG_KVM_BOOK3S_64=m CONFIG_KVM_BOOK3S_64_HV=m CONFIG_VHOST_NET=m +CONFIG_PRINTK_TIME=y diff --git a/arch/powerpc/configs/ps3_defconfig b/arch/powerpc/configs/ps3_defconfig @@ -171,3 +171,4 @@ CONFIG_CRYPTO_PCBC=m CONFIG_CRYPTO_MICHAEL_MIC=m CONFIG_CRYPTO_SALSA20=m CONFIG_CRYPTO_LZO=m +CONFIG_PRINTK_TIME=y diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig @@ -325,3 +325,4 @@ CONFIG_VIRTUALIZATION=y CONFIG_KVM_BOOK3S_64=m CONFIG_KVM_BOOK3S_64_HV=m CONFIG_VHOST_NET=m +CONFIG_PRINTK_TIME=y diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig @@ -3,20 +3,17 @@ CONFIG_ALTIVEC=y CONFIG_VSX=y CONFIG_NR_CPUS=2048 CONFIG_CPU_LITTLE_ENDIAN=y +CONFIG_KERNEL_XZ=y # CONFIG_SWAP is not set CONFIG_SYSVIPC=y CONFIG_POSIX_MQUEUE=y # CONFIG_CROSS_MEMORY_ATTACH is not set CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y -CONFIG_TASKSTATS=y -CONFIG_TASK_DELAY_ACCT=y -CONFIG_TASK_XACCT=y -CONFIG_TASK_IO_ACCOUNTING=y +# CONFIG_CPU_ISOLATION is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=20 -CONFIG_RELAY=y CONFIG_BLK_DEV_INITRD=y # CONFIG_RD_GZIP is not set # CONFIG_RD_BZIP2 is not set @@ -24,8 +21,14 @@ CONFIG_BLK_DEV_INITRD=y # CONFIG_RD_LZO is not set # CONFIG_RD_LZ4 is not set CONFIG_CC_OPTIMIZE_FOR_SIZE=y +CONFIG_EXPERT=y +# CONFIG_SGETMASK_SYSCALL is not set +# CONFIG_SYSFS_SYSCALL is not set +# CONFIG_SHMEM is not set +# CONFIG_AIO is not set CONFIG_PERF_EVENTS=y # CONFIG_COMPAT_BRK is not set +CONFIG_SLAB_FREELIST_HARDENED=y CONFIG_JUMP_LABEL=y CONFIG_STRICT_KERNEL_RWX=y CONFIG_MODULES=y @@ -35,7 +38,9 @@ CONFIG_MODULE_SIG_FORCE=y CONFIG_MODULE_SIG_SHA512=y CONFIG_PARTITION_ADVANCED=y # CONFIG_IOSCHED_DEADLINE is not set +# CONFIG_PPC_VAS is not set # CONFIG_PPC_PSERIES is not set +# CONFIG_PPC_OF_BOOT_TRAMPOLINE is not set CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y CONFIG_CPU_IDLE=y CONFIG_HZ_100=y @@ -48,8 +53,9 @@ CONFIG_NUMA=y CONFIG_PPC_64K_PAGES=y CONFIG_SCHED_SMT=y CONFIG_CMDLINE_BOOL=y -CONFIG_CMDLINE="console=tty0 console=hvc0 powersave=off" +CONFIG_CMDLINE="console=tty0 console=hvc0 ipr.fast_reboot=1 quiet" # CONFIG_SECCOMP is not set +# CONFIG_PPC_MEM_KEYS is not set CONFIG_NET=y CONFIG_PACKET=y CONFIG_UNIX=y @@ -60,7 +66,6 @@ CONFIG_SYN_COOKIES=y # CONFIG_INET_XFRM_MODE_TRANSPORT is not set # CONFIG_INET_XFRM_MODE_TUNNEL is not set # CONFIG_INET_XFRM_MODE_BEET is not set -# CONFIG_IPV6 is not set CONFIG_DNS_RESOLVER=y # CONFIG_WIRELESS is not set CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" @@ -73,8 +78,10 @@ CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=65536 CONFIG_VIRTIO_BLK=m CONFIG_BLK_DEV_NVME=m -CONFIG_EEPROM_AT24=y +CONFIG_NVME_MULTIPATH=y +CONFIG_EEPROM_AT24=m # CONFIG_CXL is not set +# CONFIG_OCXL is not set CONFIG_BLK_DEV_SD=m CONFIG_BLK_DEV_SR=m CONFIG_BLK_DEV_SR_VENDOR=y @@ -85,7 +92,6 @@ CONFIG_SCSI_FC_ATTRS=y CONFIG_SCSI_CXGB3_ISCSI=m CONFIG_SCSI_CXGB4_ISCSI=m CONFIG_SCSI_BNX2_ISCSI=m -CONFIG_BE2ISCSI=m CONFIG_SCSI_AACRAID=m CONFIG_MEGARAID_NEWGEN=y CONFIG_MEGARAID_MM=m @@ -102,7 +108,7 @@ CONFIG_SCSI_VIRTIO=m CONFIG_SCSI_DH=y CONFIG_SCSI_DH_ALUA=m CONFIG_ATA=y -CONFIG_SATA_AHCI=y +CONFIG_SATA_AHCI=m # CONFIG_ATA_SFF is not set CONFIG_MD=y CONFIG_BLK_DEV_MD=m @@ -119,25 +125,72 @@ CONFIG_DM_SNAPSHOT=m CONFIG_DM_MIRROR=m CONFIG_DM_ZERO=m CONFIG_DM_MULTIPATH=m +# CONFIG_NET_VENDOR_3COM is not set +# CONFIG_NET_VENDOR_ADAPTEC is not set +# CONFIG_NET_VENDOR_AGERE is not set +# CONFIG_NET_VENDOR_ALACRITECH is not set CONFIG_ACENIC=m CONFIG_ACENIC_OMIT_TIGON_I=y -CONFIG_TIGON3=y +# CONFIG_NET_VENDOR_AMAZON is not set +# CONFIG_NET_VENDOR_AMD is not set +# CONFIG_NET_VENDOR_AQUANTIA is not set +# CONFIG_NET_VENDOR_ARC is not set +# CONFIG_NET_VENDOR_ATHEROS is not set +CONFIG_TIGON3=m CONFIG_BNX2X=m -CONFIG_CHELSIO_T1=y +# CONFIG_NET_VENDOR_BROCADE is not set +# CONFIG_NET_CADENCE is not set +# CONFIG_NET_VENDOR_CAVIUM is not set +CONFIG_CHELSIO_T1=m +# CONFIG_NET_VENDOR_CISCO is not set +# CONFIG_NET_VENDOR_CORTINA is not set +# CONFIG_NET_VENDOR_DEC is not set +# CONFIG_NET_VENDOR_DLINK is not set CONFIG_BE2NET=m -CONFIG_S2IO=m -CONFIG_E100=m +# CONFIG_NET_VENDOR_EZCHIP is not set +# CONFIG_NET_VENDOR_HP is not set +# CONFIG_NET_VENDOR_HUAWEI is not set CONFIG_E1000=m -CONFIG_E1000E=m +CONFIG_IGB=m CONFIG_IXGB=m CONFIG_IXGBE=m +CONFIG_I40E=m +CONFIG_S2IO=m +# CONFIG_NET_VENDOR_MARVELL is not set CONFIG_MLX4_EN=m +# CONFIG_MLX4_CORE_GEN2 is not set CONFIG_MLX5_CORE=m -CONFIG_MLX5_CORE_EN=y +# CONFIG_NET_VENDOR_MICREL is not set CONFIG_MYRI10GE=m +# CONFIG_NET_VENDOR_NATSEMI is not set +# CONFIG_NET_VENDOR_NETRONOME is not set +# CONFIG_NET_VENDOR_NI is not set +# CONFIG_NET_VENDOR_NVIDIA is not set +# CONFIG_NET_VENDOR_OKI is not set +# CONFIG_NET_PACKET_ENGINE is not set CONFIG_QLGE=m CONFIG_NETXEN_NIC=m +# CONFIG_NET_VENDOR_QUALCOMM is not set +# CONFIG_NET_VENDOR_RDC is not set +# CONFIG_NET_VENDOR_REALTEK is not set +# CONFIG_NET_VENDOR_RENESAS is not set +# CONFIG_NET_VENDOR_ROCKER is not set +# CONFIG_NET_VENDOR_SAMSUNG is not set +# CONFIG_NET_VENDOR_SEEQ is not set CONFIG_SFC=m +# CONFIG_NET_VENDOR_SILAN is not set +# CONFIG_NET_VENDOR_SIS is not set +# CONFIG_NET_VENDOR_SMSC is not set +# CONFIG_NET_VENDOR_SOCIONEXT is not set +# CONFIG_NET_VENDOR_STMICRO is not set +# CONFIG_NET_VENDOR_SUN is not set +# CONFIG_NET_VENDOR_SYNOPSYS is not set +# CONFIG_NET_VENDOR_TEHUTI is not set +# CONFIG_NET_VENDOR_TI is not set +# CONFIG_NET_VENDOR_VIA is not set +# CONFIG_NET_VENDOR_WIZNET is not set +# CONFIG_NET_VENDOR_XILINX is not set +CONFIG_PHYLIB=y # CONFIG_USB_NET_DRIVERS is not set # CONFIG_WLAN is not set CONFIG_INPUT_EVDEV=y @@ -149,39 +202,51 @@ CONFIG_SERIAL_8250_CONSOLE=y CONFIG_IPMI_HANDLER=y CONFIG_IPMI_DEVICE_INTERFACE=y CONFIG_IPMI_POWERNV=y +CONFIG_IPMI_WATCHDOG=y CONFIG_HW_RANDOM=y +CONFIG_TCG_TPM=y CONFIG_TCG_TIS_I2C_NUVOTON=y +CONFIG_I2C=y # CONFIG_I2C_COMPAT is not set CONFIG_I2C_CHARDEV=y # CONFIG_I2C_HELPER_AUTO is not set -CONFIG_DRM=y -CONFIG_DRM_RADEON=y +CONFIG_I2C_ALGOBIT=y +CONFIG_I2C_OPAL=m +CONFIG_PPS=y +CONFIG_SENSORS_IBMPOWERNV=m +CONFIG_DRM=m CONFIG_DRM_AST=m +CONFIG_FB=y CONFIG_FIRMWARE_EDID=y -CONFIG_FB_MODE_HELPERS=y -CONFIG_FB_OF=y -CONFIG_FB_MATROX=y -CONFIG_FB_MATROX_MILLENIUM=y -CONFIG_FB_MATROX_MYSTIQUE=y -CONFIG_FB_MATROX_G=y -# CONFIG_LCD_CLASS_DEVICE is not set -# CONFIG_BACKLIGHT_GENERIC is not set # CONFIG_VGA_CONSOLE is not set +CONFIG_FRAMEBUFFER_CONSOLE=y CONFIG_LOGO=y # CONFIG_LOGO_LINUX_MONO is not set # CONFIG_LOGO_LINUX_VGA16 is not set +CONFIG_HID_GENERIC=m +CONFIG_HID_A4TECH=y +CONFIG_HID_BELKIN=y +CONFIG_HID_CHERRY=y +CONFIG_HID_CHICONY=y +CONFIG_HID_CYPRESS=y +CONFIG_HID_EZKEY=y +CONFIG_HID_ITE=y +CONFIG_HID_KENSINGTON=y +CONFIG_HID_LOGITECH=y +CONFIG_HID_MICROSOFT=y +CONFIG_HID_MONTEREY=y CONFIG_USB_HIDDEV=y -CONFIG_USB=y -CONFIG_USB_MON=y -CONFIG_USB_XHCI_HCD=y -CONFIG_USB_EHCI_HCD=y +CONFIG_USB=m +CONFIG_USB_XHCI_HCD=m +CONFIG_USB_EHCI_HCD=m # CONFIG_USB_EHCI_HCD_PPC_OF is not set -CONFIG_USB_OHCI_HCD=y -CONFIG_USB_STORAGE=y +CONFIG_USB_OHCI_HCD=m +CONFIG_USB_STORAGE=m CONFIG_RTC_CLASS=y +CONFIG_RTC_DRV_OPAL=m CONFIG_RTC_DRV_GENERIC=m CONFIG_VIRT_DRIVERS=y -CONFIG_VIRTIO_PCI=y +CONFIG_VIRTIO_PCI=m # CONFIG_IOMMU_SUPPORT is not set CONFIG_EXT4_FS=m CONFIG_EXT4_FS_POSIX_ACL=y @@ -195,10 +260,9 @@ CONFIG_UDF_FS=m CONFIG_MSDOS_FS=m CONFIG_VFAT_FS=m CONFIG_PROC_KCORE=y -CONFIG_TMPFS=y -CONFIG_TMPFS_POSIX_ACL=y # CONFIG_MISC_FILESYSTEMS is not set # CONFIG_NETWORK_FILESYSTEMS is not set +CONFIG_NLS=y CONFIG_NLS_DEFAULT="utf8" CONFIG_NLS_CODEPAGE_437=y CONFIG_NLS_ASCII=y @@ -207,26 +271,24 @@ CONFIG_NLS_UTF8=y CONFIG_CRC16=y CONFIG_CRC_ITU_T=y CONFIG_LIBCRC32C=y +# CONFIG_XZ_DEC_X86 is not set +# CONFIG_XZ_DEC_IA64 is not set +# CONFIG_XZ_DEC_ARM is not set +# CONFIG_XZ_DEC_ARMTHUMB is not set +# CONFIG_XZ_DEC_SPARC is not set CONFIG_PRINTK_TIME=y CONFIG_MAGIC_SYSRQ=y -CONFIG_DEBUG_KERNEL=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_SOFTLOCKUP_DETECTOR=y +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y CONFIG_HARDLOCKUP_DETECTOR=y CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y CONFIG_WQ_WATCHDOG=y -CONFIG_SCHEDSTATS=y +# CONFIG_SCHED_DEBUG is not set # CONFIG_FTRACE is not set +# CONFIG_RUNTIME_TESTING_MENU is not set CONFIG_XMON=y CONFIG_XMON_DEFAULT=y -CONFIG_SECURITY=y -CONFIG_IMA=y -CONFIG_EVM=y +CONFIG_ENCRYPTED_KEYS=y # CONFIG_CRYPTO_ECHAINIV is not set -CONFIG_CRYPTO_ECB=y -CONFIG_CRYPTO_CMAC=y -CONFIG_CRYPTO_MD4=y -CONFIG_CRYPTO_ARC4=y -CONFIG_CRYPTO_DES=y # CONFIG_CRYPTO_HW is not set diff --git a/arch/powerpc/include/asm/accounting.h b/arch/powerpc/include/asm/accounting.h @@ -15,8 +15,10 @@ struct cpu_accounting_data { /* Accumulated cputime values to flush on ticks*/ unsigned long utime; unsigned long stime; +#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME unsigned long utime_scaled; unsigned long stime_scaled; +#endif unsigned long gtime; unsigned long hardirq_time; unsigned long softirq_time; @@ -25,8 +27,10 @@ struct cpu_accounting_data { /* Internal counters */ unsigned long starttime; /* TB value snapshot */ unsigned long starttime_user; /* TB value on exit to usermode */ +#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME unsigned long startspurr; /* SPURR value snapshot */ unsigned long utime_sspurr; /* ->user_time when ->startspurr set */ +#endif }; #endif diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h @@ -63,7 +63,6 @@ void program_check_exception(struct pt_regs *regs); void alignment_exception(struct pt_regs *regs); void slb_miss_bad_addr(struct pt_regs *regs); void StackOverflow(struct pt_regs *regs); -void nonrecoverable_exception(struct pt_regs *regs); void kernel_fp_unavailable_exception(struct pt_regs *regs); void altivec_unavailable_exception(struct pt_regs *regs); void vsx_unavailable_exception(struct pt_regs *regs); @@ -78,6 +77,8 @@ void kernel_bad_stack(struct pt_regs *regs); void system_reset_exception(struct pt_regs *regs); void machine_check_exception(struct pt_regs *regs); void emulation_assist_interrupt(struct pt_regs *regs); +long do_slb_fault(struct pt_regs *regs, unsigned long ea); +void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err); /* signals, syscalls and interrupts */ long sys_swapcontext(struct ucontext __user *old_ctx, diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h @@ -8,7 +8,97 @@ #include <asm/book3s/32/hash.h> /* And here we include common definitions */ -#include <asm/pte-common.h> + +#define _PAGE_KERNEL_RO 0 +#define _PAGE_KERNEL_ROX 0 +#define _PAGE_KERNEL_RW (_PAGE_DIRTY | _PAGE_RW) +#define _PAGE_KERNEL_RWX (_PAGE_DIRTY | _PAGE_RW) + +#define _PAGE_HPTEFLAGS _PAGE_HASHPTE + +#ifndef __ASSEMBLY__ + +static inline bool pte_user(pte_t pte) +{ + return pte_val(pte) & _PAGE_USER; +} +#endif /* __ASSEMBLY__ */ + +/* + * Location of the PFN in the PTE. Most 32-bit platforms use the same + * as _PAGE_SHIFT here (ie, naturally aligned). + * Platform who don't just pre-define the value so we don't override it here. + */ +#define PTE_RPN_SHIFT (PAGE_SHIFT) + +/* + * The mask covered by the RPN must be a ULL on 32-bit platforms with + * 64-bit PTEs. + */ +#ifdef CONFIG_PTE_64BIT +#define PTE_RPN_MASK (~((1ULL << PTE_RPN_SHIFT) - 1)) +#else +#define PTE_RPN_MASK (~((1UL << PTE_RPN_SHIFT) - 1)) +#endif + +/* + * _PAGE_CHG_MASK masks of bits that are to be preserved across + * pgprot changes. + */ +#define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HASHPTE | _PAGE_DIRTY | \ + _PAGE_ACCESSED | _PAGE_SPECIAL) + +/* + * We define 2 sets of base prot bits, one for basic pages (ie, + * cacheable kernel and user pages) and one for non cacheable + * pages. We always set _PAGE_COHERENT when SMP is enabled or + * the processor might need it for DMA coherency. + */ +#define _PAGE_BASE_NC (_PAGE_PRESENT | _PAGE_ACCESSED) +#define _PAGE_BASE (_PAGE_BASE_NC | _PAGE_COHERENT) + +/* + * Permission masks used to generate the __P and __S table. + * + * Note:__pgprot is defined in arch/powerpc/include/asm/page.h + * + * Write permissions imply read permissions for now. + */ +#define PAGE_NONE __pgprot(_PAGE_BASE) +#define PAGE_SHARED __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW) +#define PAGE_SHARED_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW) +#define PAGE_COPY __pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER) + +/* Permission masks used for kernel mappings */ +#define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW) +#define PAGE_KERNEL_NC __pgprot(_PAGE_BASE_NC | _PAGE_KERNEL_RW | _PAGE_NO_CACHE) +#define PAGE_KERNEL_NCG __pgprot(_PAGE_BASE_NC | _PAGE_KERNEL_RW | \ + _PAGE_NO_CACHE | _PAGE_GUARDED) +#define PAGE_KERNEL_X __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX) +#define PAGE_KERNEL_RO __pgprot(_PAGE_BASE | _PAGE_KERNEL_RO) +#define PAGE_KERNEL_ROX __pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX) + +/* + * Protection used for kernel text. We want the debuggers to be able to + * set breakpoints anywhere, so don't write protect the kernel text + * on platforms where such control is possible. + */ +#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\ + defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE) +#define PAGE_KERNEL_TEXT PAGE_KERNEL_X +#else +#define PAGE_KERNEL_TEXT PAGE_KERNEL_ROX +#endif + +/* Make modules code happy. We don't set RO yet */ +#define PAGE_KERNEL_EXEC PAGE_KERNEL_X + +/* Advertise special mapping type for AGP */ +#define PAGE_AGP (PAGE_KERNEL_NC) +#define HAVE_PAGE_AGP #define PTE_INDEX_SIZE PTE_SHIFT #define PMD_INDEX_SIZE 0 @@ -219,7 +309,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - pte_update(ptep, (_PAGE_RW | _PAGE_HWWRITE), _PAGE_RO); + pte_update(ptep, _PAGE_RW, 0); } static inline void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) @@ -234,10 +324,9 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma, int psize) { unsigned long set = pte_val(entry) & - (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC); - unsigned long clr = ~pte_val(entry) & _PAGE_RO; + (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW); - pte_update(ptep, clr, set); + pte_update(ptep, 0, set); flush_tlb_page(vma, address); } @@ -292,7 +381,7 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma, #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 3 }) #define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 }) -int map_kernel_page(unsigned long va, phys_addr_t pa, int flags); +int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot); /* Generic accessors to PTE bits */ static inline int pte_write(pte_t pte) { return !!(pte_val(pte) & _PAGE_RW);} @@ -301,13 +390,28 @@ static inline int pte_dirty(pte_t pte) { return !!(pte_val(pte) & _PAGE_DIRTY); static inline int pte_young(pte_t pte) { return !!(pte_val(pte) & _PAGE_ACCESSED); } static inline int pte_special(pte_t pte) { return !!(pte_val(pte) & _PAGE_SPECIAL); } static inline int pte_none(pte_t pte) { return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; } -static inline pgprot_t pte_pgprot(pte_t pte) { return __pgprot(pte_val(pte) & PAGE_PROT_BITS); } +static inline bool pte_exec(pte_t pte) { return true; } static inline int pte_present(pte_t pte) { return pte_val(pte) & _PAGE_PRESENT; } +static inline bool pte_hw_valid(pte_t pte) +{ + return pte_val(pte) & _PAGE_PRESENT; +} + +static inline bool pte_hashpte(pte_t pte) +{ + return !!(pte_val(pte) & _PAGE_HASHPTE); +} + +static inline bool pte_ci(pte_t pte) +{ + return !!(pte_val(pte) & _PAGE_NO_CACHE); +} + /* * We only find page table entry in the last level * Hence no need for other accessors @@ -315,17 +419,14 @@ static inline int pte_present(pte_t pte) #define pte_access_permitted pte_access_permitted static inline bool pte_access_permitted(pte_t pte, bool write) { - unsigned long pteval = pte_val(pte); /* * A read-only access is controlled by _PAGE_USER bit. * We have _PAGE_READ set for WRITE and EXECUTE */ - unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_USER; - - if (write) - need_pte_bits |= _PAGE_WRITE; + if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte)) + return false; - if ((pteval & need_pte_bits) != need_pte_bits) + if (write && !pte_write(pte)) return false; return true; @@ -354,6 +455,11 @@ static inline pte_t pte_wrprotect(pte_t pte) return __pte(pte_val(pte) & ~_PAGE_RW); } +static inline pte_t pte_exprotect(pte_t pte) +{ + return pte; +} + static inline pte_t pte_mkclean(pte_t pte) { return __pte(pte_val(pte) & ~_PAGE_DIRTY); @@ -364,6 +470,16 @@ static inline pte_t pte_mkold(pte_t pte) return __pte(pte_val(pte) & ~_PAGE_ACCESSED); } +static inline pte_t pte_mkexec(pte_t pte) +{ + return pte; +} + +static inline pte_t pte_mkpte(pte_t pte) +{ + return pte; +} + static inline pte_t pte_mkwrite(pte_t pte) { return __pte(pte_val(pte) | _PAGE_RW); @@ -389,6 +505,16 @@ static inline pte_t pte_mkhuge(pte_t pte) return pte; } +static inline pte_t pte_mkprivileged(pte_t pte) +{ + return __pte(pte_val(pte) & ~_PAGE_USER); +} + +static inline pte_t pte_mkuser(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_USER); +} + static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot)); diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h @@ -66,7 +66,7 @@ static inline int hash__hugepd_ok(hugepd_t hpd) * if it is not a pte and have hugepd shift mask * set, then it is a hugepd directory pointer */ - if (!(hpdval & _PAGE_PTE) && + if (!(hpdval & _PAGE_PTE) && (hpdval & _PAGE_PRESENT) && ((hpdval & HUGEPD_SHIFT_MASK) != 0)) return true; return false; diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h @@ -18,6 +18,11 @@ #include <asm/book3s/64/hash-4k.h> #endif +/* Bits to set in a PMD/PUD/PGD entry valid bit*/ +#define HASH_PMD_VAL_BITS (0x8000000000000000UL) +#define HASH_PUD_VAL_BITS (0x8000000000000000UL) +#define HASH_PGD_VAL_BITS (0x8000000000000000UL) + /* * Size of EA range mapped by our pagetables. */ @@ -196,8 +201,7 @@ static inline void hpte_do_hugepage_flush(struct mm_struct *mm, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -extern int hash__map_kernel_page(unsigned long ea, unsigned long pa, - unsigned long flags); +int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot); extern int __meminit hash__vmemmap_create_mapping(unsigned long start, unsigned long page_size, unsigned long phys); diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h @@ -39,4 +39,7 @@ static inline bool gigantic_page_supported(void) } #endif +/* hugepd entry valid bit */ +#define HUGEPD_VAL_BITS (0x8000000000000000UL) + #endif diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -30,7 +30,7 @@ * SLB */ -#define SLB_NUM_BOLTED 3 +#define SLB_NUM_BOLTED 2 #define SLB_CACHE_ENTRIES 8 #define SLB_MIN_SIZE 32 @@ -499,6 +499,8 @@ int htab_remove_mapping(unsigned long vstart, unsigned long vend, extern void pseries_add_gpage(u64 addr, u64 page_size, unsigned long number_of_pages); extern void demote_segment_4k(struct mm_struct *mm, unsigned long addr); +extern void hash__setup_new_exec(void); + #ifdef CONFIG_PPC_PSERIES void hpte_init_pseries(void); #else @@ -507,11 +509,18 @@ static inline void hpte_init_pseries(void) { } extern void hpte_init_native(void); +struct slb_entry { + u64 esid; + u64 vsid; +}; + extern void slb_initialize(void); -extern void slb_flush_and_rebolt(void); +void slb_flush_and_restore_bolted(void); void slb_flush_all_realmode(void); void __slb_restore_bolted_realmode(void); void slb_restore_bolted_realmode(void); +void slb_save_contents(struct slb_entry *slb_ptr); +void slb_dump_contents(struct slb_entry *slb_ptr); extern void slb_vmalloc_update(void); extern void slb_set_size(u16 size); @@ -524,13 +533,9 @@ extern void slb_set_size(u16 size); * from mmu context id and effective segment id of the address. * * For user processes max context id is limited to MAX_USER_CONTEXT. - - * For kernel space, we use context ids 1-4 to map addresses as below: - * NOTE: each context only support 64TB now. - * 0x00001 - [ 0xc000000000000000 - 0xc0003fffffffffff ] - * 0x00002 - [ 0xd000000000000000 - 0xd0003fffffffffff ] - * 0x00003 - [ 0xe000000000000000 - 0xe0003fffffffffff ] - * 0x00004 - [ 0xf000000000000000 - 0xf0003fffffffffff ] + * more details in get_user_context + * + * For kernel space get_kernel_context * * The proto-VSIDs are then scrambled into real VSIDs with the * multiplicative hash: @@ -571,6 +576,21 @@ extern void slb_set_size(u16 size); #define ESID_BITS_1T_MASK ((1 << ESID_BITS_1T) - 1) /* + * Now certain config support MAX_PHYSMEM more than 512TB. Hence we will need + * to use more than one context for linear mapping the kernel. + * For vmalloc and memmap, we use just one context with 512TB. With 64 byte + * struct page size, we need ony 32 TB in memmap for 2PB (51 bits (MAX_PHYSMEM_BITS)). + */ +#if (MAX_PHYSMEM_BITS > MAX_EA_BITS_PER_CONTEXT) +#define MAX_KERNEL_CTX_CNT (1UL << (MAX_PHYSMEM_BITS - MAX_EA_BITS_PER_CONTEXT)) +#else +#define MAX_KERNEL_CTX_CNT 1 +#endif + +#define MAX_VMALLOC_CTX_CNT 1 +#define MAX_MEMMAP_CTX_CNT 1 + +/* * 256MB segment * The proto-VSID space has 2^(CONTEX_BITS + ESID_BITS) - 1 segments * available for user + kernel mapping. VSID 0 is reserved as invalid, contexts @@ -580,12 +600,13 @@ extern void slb_set_size(u16 size); * We also need to avoid the last segment of the last context, because that * would give a protovsid of 0x1fffffffff. That will result in a VSID 0 * because of the modulo operation in vsid scramble. + * + * We add one extra context to MIN_USER_CONTEXT so that we can map kernel + * context easily. The +1 is to map the unused 0xe region mapping. */ #define MAX_USER_CONTEXT ((ASM_CONST(1) << CONTEXT_BITS) - 2) -#define MIN_USER_CONTEXT (5) - -/* Would be nice to use KERNEL_REGION_ID here */ -#define KERNEL_REGION_CONTEXT_OFFSET (0xc - 1) +#define MIN_USER_CONTEXT (MAX_KERNEL_CTX_CNT + MAX_VMALLOC_CTX_CNT + \ + MAX_MEMMAP_CTX_CNT + 2) /* * For platforms that support on 65bit VA we limit the context bits @@ -746,6 +767,39 @@ static inline unsigned long get_vsid(unsigned long context, unsigned long ea, } /* + * For kernel space, we use context ids as below + * below. Range is 512TB per context. + * + * 0x00001 - [ 0xc000000000000000 - 0xc001ffffffffffff] + * 0x00002 - [ 0xc002000000000000 - 0xc003ffffffffffff] + * 0x00003 - [ 0xc004000000000000 - 0xc005ffffffffffff] + * 0x00004 - [ 0xc006000000000000 - 0xc007ffffffffffff] + + * 0x00005 - [ 0xd000000000000000 - 0xd001ffffffffffff ] + * 0x00006 - Not used - Can map 0xe000000000000000 range. + * 0x00007 - [ 0xf000000000000000 - 0xf001ffffffffffff ] + * + * So we can compute the context from the region (top nibble) by + * subtracting 11, or 0xc - 1. + */ +static inline unsigned long get_kernel_context(unsigned long ea) +{ + unsigned long region_id = REGION_ID(ea); + unsigned long ctx; + /* + * For linear mapping we do support multiple context + */ + if (region_id == KERNEL_REGION_ID) { + /* + * We already verified ea to be not beyond the addr limit. + */ + ctx = 1 + ((ea & ~REGION_MASK) >> MAX_EA_BITS_PER_CONTEXT); + } else + ctx = (region_id - 0xc) + MAX_KERNEL_CTX_CNT; + return ctx; +} + +/* * This is only valid for addresses >= PAGE_OFFSET */ static inline unsigned long get_kernel_vsid(unsigned long ea, int ssize) @@ -755,20 +809,7 @@ static inline unsigned long get_kernel_vsid(unsigned long ea, int ssize) if (!is_kernel_addr(ea)) return 0; - /* - * For kernel space, we use context ids 1-4 to map the address space as - * below: - * - * 0x00001 - [ 0xc000000000000000 - 0xc0003fffffffffff ] - * 0x00002 - [ 0xd000000000000000 - 0xd0003fffffffffff ] - * 0x00003 - [ 0xe000000000000000 - 0xe0003fffffffffff ] - * 0x00004 - [ 0xf000000000000000 - 0xf0003fffffffffff ] - * - * So we can compute the context from the region (top nibble) by - * subtracting 11, or 0xc - 1. - */ - context = (ea >> 60) - KERNEL_REGION_CONTEXT_OFFSET; - + context = get_kernel_context(ea); return get_vsid(context, ea, ssize); } diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -208,7 +208,7 @@ extern void radix_init_pseries(void); static inline void radix_init_pseries(void) { }; #endif -static inline int get_ea_context(mm_context_t *ctx, unsigned long ea) +static inline int get_user_context(mm_context_t *ctx, unsigned long ea) { int index = ea >> MAX_EA_BITS_PER_CONTEXT; @@ -223,7 +223,7 @@ static inline int get_ea_context(mm_context_t *ctx, unsigned long ea) static inline unsigned long get_user_vsid(mm_context_t *ctx, unsigned long ea, int ssize) { - unsigned long context = get_ea_context(ctx, ea); + unsigned long context = get_user_context(ctx, ea); return get_vsid(context, ea, ssize); } diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h @@ -10,6 +10,9 @@ * * Defined in such a way that we can optimize away code block at build time * if CONFIG_HUGETLB_PAGE=n. + * + * returns true for pmd migration entries, THP, devmap, hugetlb + * But compile time dependent on CONFIG_HUGETLB_PAGE */ static inline int pmd_huge(pmd_t pmd) { diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -14,10 +14,6 @@ */ #define _PAGE_BIT_SWAP_TYPE 0 -#define _PAGE_NA 0 -#define _PAGE_RO 0 -#define _PAGE_USER 0 - #define _PAGE_EXEC 0x00001 /* execute permission */ #define _PAGE_WRITE 0x00002 /* write access allowed */ #define _PAGE_READ 0x00004 /* read access allowed */ @@ -123,10 +119,6 @@ #define _PAGE_KERNEL_RWX (_PAGE_PRIVILEGED | _PAGE_DIRTY | \ _PAGE_RW | _PAGE_EXEC) /* - * No page size encoding in the linux PTE - */ -#define _PAGE_PSIZE 0 -/* * _PAGE_CHG_MASK masks of bits that are to be preserved across * pgprot changes */ @@ -137,19 +129,12 @@ #define H_PTE_PKEY (H_PTE_PKEY_BIT0 | H_PTE_PKEY_BIT1 | H_PTE_PKEY_BIT2 | \ H_PTE_PKEY_BIT3 | H_PTE_PKEY_BIT4) /* - * Mask of bits returned by pte_pgprot() - */ -#define PAGE_PROT_BITS (_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT | \ - H_PAGE_4K_PFN | _PAGE_PRIVILEGED | _PAGE_ACCESSED | \ - _PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY | _PAGE_EXEC | \ - _PAGE_SOFT_DIRTY | H_PTE_PKEY) -/* * We define 2 sets of base prot bits, one for basic pages (ie, * cacheable kernel and user pages) and one for non cacheable * pages. We always set _PAGE_COHERENT when SMP is enabled or * the processor might need it for DMA coherency. */ -#define _PAGE_BASE_NC (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE) +#define _PAGE_BASE_NC (_PAGE_PRESENT | _PAGE_ACCESSED) #define _PAGE_BASE (_PAGE_BASE_NC) /* Permission masks used to generate the __P and __S table, @@ -159,8 +144,6 @@ * Write permissions imply read permissions for now (we could make write-only * pages on BookE but we don't bother for now). Execute permission control is * possible on platforms that define _PAGE_EXEC - * - * Note due to the way vm flags are laid out, the bits are XWR */ #define PAGE_NONE __pgprot(_PAGE_BASE | _PAGE_PRIVILEGED) #define PAGE_SHARED __pgprot(_PAGE_BASE | _PAGE_RW) @@ -170,24 +153,6 @@ #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_READ) #define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC) -#define __P000 PAGE_NONE -#define __P001 PAGE_READONLY -#define __P010 PAGE_COPY -#define __P011 PAGE_COPY -#define __P100 PAGE_READONLY_X -#define __P101 PAGE_READONLY_X -#define __P110 PAGE_COPY_X -#define __P111 PAGE_COPY_X - -#define __S000 PAGE_NONE -#define __S001 PAGE_READONLY -#define __S010 PAGE_SHARED -#define __S011 PAGE_SHARED -#define __S100 PAGE_READONLY_X -#define __S101 PAGE_READONLY_X -#define __S110 PAGE_SHARED_X -#define __S111 PAGE_SHARED_X - /* Permission masks used for kernel mappings */ #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW) #define PAGE_KERNEL_NC __pgprot(_PAGE_BASE_NC | _PAGE_KERNEL_RW | \ @@ -519,7 +484,11 @@ static inline int pte_special(pte_t pte) return !!(pte_raw(pte) & cpu_to_be64(_PAGE_SPECIAL)); } -static inline pgprot_t pte_pgprot(pte_t pte) { return __pgprot(pte_val(pte) & PAGE_PROT_BITS); } +static inline bool pte_exec(pte_t pte) +{ + return !!(pte_raw(pte) & cpu_to_be64(_PAGE_EXEC)); +} + #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY static inline bool pte_soft_dirty(pte_t pte) @@ -529,12 +498,12 @@ static inline bool pte_soft_dirty(pte_t pte) static inline pte_t pte_mksoft_dirty(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_SOFT_DIRTY); + return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_SOFT_DIRTY)); } static inline pte_t pte_clear_soft_dirty(pte_t pte) { - return __pte(pte_val(pte) & ~_PAGE_SOFT_DIRTY); + return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_SOFT_DIRTY)); } #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */ @@ -555,7 +524,7 @@ static inline pte_t pte_mk_savedwrite(pte_t pte) */ VM_BUG_ON((pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX | _PAGE_PRIVILEGED)) != cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED)); - return __pte(pte_val(pte) & ~_PAGE_PRIVILEGED); + return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_PRIVILEGED)); } #define pte_clear_savedwrite pte_clear_savedwrite @@ -565,14 +534,14 @@ static inline pte_t pte_clear_savedwrite(pte_t pte) * Used by KSM subsystem to make a protnone pte readonly. */ VM_BUG_ON(!pte_protnone(pte)); - return __pte(pte_val(pte) | _PAGE_PRIVILEGED); + return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PRIVILEGED)); } #else #define pte_clear_savedwrite pte_clear_savedwrite static inline pte_t pte_clear_savedwrite(pte_t pte) { VM_WARN_ON(1); - return __pte(pte_val(pte) & ~_PAGE_WRITE); + return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_WRITE)); } #endif /* CONFIG_NUMA_BALANCING */ @@ -587,6 +556,11 @@ static inline int pte_present(pte_t pte) return !!(pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID)); } +static inline bool pte_hw_valid(pte_t pte) +{ + return !!(pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT)); +} + #ifdef CONFIG_PPC_MEM_KEYS extern bool arch_pte_access_permitted(u64 pte, bool write, bool execute); #else @@ -596,25 +570,22 @@ static inline bool arch_pte_access_permitted(u64 pte, bool write, bool execute) } #endif /* CONFIG_PPC_MEM_KEYS */ +static inline bool pte_user(pte_t pte) +{ + return !(pte_raw(pte) & cpu_to_be64(_PAGE_PRIVILEGED)); +} + #define pte_access_permitted pte_access_permitted static inline bool pte_access_permitted(pte_t pte, bool write) { - unsigned long pteval = pte_val(pte); - /* Also check for pte_user */ - unsigned long clear_pte_bits = _PAGE_PRIVILEGED; /* * _PAGE_READ is needed for any access and will be * cleared for PROT_NONE */ - unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_READ; - - if (write) - need_pte_bits |= _PAGE_WRITE; - - if ((pteval & need_pte_bits) != need_pte_bits) + if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte)) return false; - if ((pteval & clear_pte_bits) == clear_pte_bits) + if (write && !pte_write(pte)) return false; return arch_pte_access_permitted(pte_val(pte), write, 0); @@ -643,17 +614,32 @@ static inline pte_t pte_wrprotect(pte_t pte) { if (unlikely(pte_savedwrite(pte))) return pte_clear_savedwrite(pte); - return __pte(pte_val(pte) & ~_PAGE_WRITE); + return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_WRITE)); +} + +static inline pte_t pte_exprotect(pte_t pte) +{ + return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_EXEC)); } static inline pte_t pte_mkclean(pte_t pte) { - return __pte(pte_val(pte) & ~_PAGE_DIRTY); + return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_DIRTY)); } static inline pte_t pte_mkold(pte_t pte) { - return __pte(pte_val(pte) & ~_PAGE_ACCESSED); + return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_ACCESSED)); +} + +static inline pte_t pte_mkexec(pte_t pte) +{ + return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_EXEC)); +} + +static inline pte_t pte_mkpte(pte_t pte) +{ + return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE)); } static inline pte_t pte_mkwrite(pte_t pte) @@ -661,22 +647,22 @@ static inline pte_t pte_mkwrite(pte_t pte) /* * write implies read, hence set both */ - return __pte(pte_val(pte) | _PAGE_RW); + return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_RW)); } static inline pte_t pte_mkdirty(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_DIRTY | _PAGE_SOFT_DIRTY)); } static inline pte_t pte_mkyoung(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_ACCESSED); + return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_ACCESSED)); } static inline pte_t pte_mkspecial(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_SPECIAL); + return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_SPECIAL)); } static inline pte_t pte_mkhuge(pte_t pte) @@ -686,7 +672,17 @@ static inline pte_t pte_mkhuge(pte_t pte) static inline pte_t pte_mkdevmap(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_SPECIAL|_PAGE_DEVMAP); + return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_SPECIAL | _PAGE_DEVMAP)); +} + +static inline pte_t pte_mkprivileged(pte_t pte) +{ + return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PRIVILEGED)); +} + +static inline pte_t pte_mkuser(pte_t pte) +{ + return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_PRIVILEGED)); } /* @@ -705,12 +701,8 @@ static inline int pte_devmap(pte_t pte) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { /* FIXME!! check whether this need to be a conditional */ - return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot)); -} - -static inline bool pte_user(pte_t pte) -{ - return !(pte_raw(pte) & cpu_to_be64(_PAGE_PRIVILEGED)); + return __pte_raw((pte_raw(pte) & cpu_to_be64(_PAGE_CHG_MASK)) | + cpu_to_be64(pgprot_val(newprot))); } /* Encode and de-code a swap entry */ @@ -741,6 +733,8 @@ static inline bool pte_user(pte_t pte) */ #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) & ~_PAGE_PTE }) #define __swp_entry_to_pte(x) __pte((x).val | _PAGE_PTE) +#define __pmd_to_swp_entry(pmd) (__pte_to_swp_entry(pmd_pte(pmd))) +#define __swp_entry_to_pmd(x) (pte_pmd(__swp_entry_to_pte(x))) #ifdef CONFIG_MEM_SOFT_DIRTY #define _PAGE_SWP_SOFT_DIRTY (1UL << (SWP_TYPE_BITS + _PAGE_BIT_SWAP_TYPE)) @@ -751,7 +745,7 @@ static inline bool pte_user(pte_t pte) #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY static inline pte_t pte_swp_mksoft_dirty(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_SWP_SOFT_DIRTY); + return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_SWP_SOFT_DIRTY)); } static inline bool pte_swp_soft_dirty(pte_t pte) @@ -761,7 +755,7 @@ static inline bool pte_swp_soft_dirty(pte_t pte) static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) { - return __pte(pte_val(pte) & ~_PAGE_SWP_SOFT_DIRTY); + return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_SWP_SOFT_DIRTY)); } #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */ @@ -850,10 +844,10 @@ static inline pgprot_t pgprot_writecombine(pgprot_t prot) */ static inline bool pte_ci(pte_t pte) { - unsigned long pte_v = pte_val(pte); + __be64 pte_v = pte_raw(pte); - if (((pte_v & _PAGE_CACHE_CTL) == _PAGE_TOLERANT) || - ((pte_v & _PAGE_CACHE_CTL) == _PAGE_NON_IDEMPOTENT)) + if (((pte_v & cpu_to_be64(_PAGE_CACHE_CTL)) == cpu_to_be64(_PAGE_TOLERANT)) || + ((pte_v & cpu_to_be64(_PAGE_CACHE_CTL)) == cpu_to_be64(_PAGE_NON_IDEMPOTENT))) return true; return false; } @@ -875,8 +869,16 @@ static inline int pmd_none(pmd_t pmd) static inline int pmd_present(pmd_t pmd) { + /* + * A pmd is considerent present if _PAGE_PRESENT is set. + * We also need to consider the pmd present which is marked + * invalid during a split. Hence we look for _PAGE_INVALID + * if we find _PAGE_PRESENT cleared. + */ + if (pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID)) + return true; - return !pmd_none(pmd); + return false; } static inline int pmd_bad(pmd_t pmd) @@ -903,7 +905,7 @@ static inline int pud_none(pud_t pud) static inline int pud_present(pud_t pud) { - return !pud_none(pud); + return (pud_raw(pud) & cpu_to_be64(_PAGE_PRESENT)); } extern struct page *pud_page(pud_t pud); @@ -950,7 +952,7 @@ static inline int pgd_none(pgd_t pgd) static inline int pgd_present(pgd_t pgd) { - return !pgd_none(pgd); + return (pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT)); } static inline pte_t pgd_pte(pgd_t pgd) @@ -1020,17 +1022,16 @@ extern struct page *pgd_page(pgd_t pgd); #define pgd_ERROR(e) \ pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e)) -static inline int map_kernel_page(unsigned long ea, unsigned long pa, - unsigned long flags) +static inline int map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot) { if (radix_enabled()) { #if defined(CONFIG_PPC_RADIX_MMU) && defined(DEBUG_VM) unsigned long page_size = 1 << mmu_psize_defs[mmu_io_psize].shift; WARN((page_size != PAGE_SIZE), "I/O page size != PAGE_SIZE"); #endif - return radix__map_kernel_page(ea, pa, __pgprot(flags), PAGE_SIZE); + return radix__map_kernel_page(ea, pa, prot, PAGE_SIZE); } - return hash__map_kernel_page(ea, pa, flags); + return hash__map_kernel_page(ea, pa, prot); } static inline int __meminit vmemmap_create_mapping(unsigned long start, @@ -1082,6 +1083,12 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd) #define pmd_soft_dirty(pmd) pte_soft_dirty(pmd_pte(pmd)) #define pmd_mksoft_dirty(pmd) pte_pmd(pte_mksoft_dirty(pmd_pte(pmd))) #define pmd_clear_soft_dirty(pmd) pte_pmd(pte_clear_soft_dirty(pmd_pte(pmd))) + +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION +#define pmd_swp_mksoft_dirty(pmd) pte_pmd(pte_swp_mksoft_dirty(pmd_pte(pmd))) +#define pmd_swp_soft_dirty(pmd) pte_swp_soft_dirty(pmd_pte(pmd)) +#define pmd_swp_clear_soft_dirty(pmd) pte_pmd(pte_swp_clear_soft_dirty(pmd_pte(pmd))) +#endif #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */ #ifdef CONFIG_NUMA_BALANCING @@ -1127,6 +1134,10 @@ pmd_hugepage_update(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, return hash__pmd_hugepage_update(mm, addr, pmdp, clr, set); } +/* + * returns true for pmd migration entries, THP, devmap, hugetlb + * But compile time dependent on THP config + */ static inline int pmd_large(pmd_t pmd) { return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE)); @@ -1161,8 +1172,22 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, pmd_hugepage_update(mm, addr, pmdp, 0, _PAGE_PRIVILEGED); } +/* + * Only returns true for a THP. False for pmd migration entry. + * We also need to return true when we come across a pte that + * in between a thp split. While splitting THP, we mark the pmd + * invalid (pmdp_invalidate()) before we set it with pte page + * address. A pmd_trans_huge() check against a pmd entry during that time + * should return true. + * We should not call this on a hugetlb entry. We should check for HugeTLB + * entry using vma->vm_flags + * The page table walk rule is explained in Documentation/vm/transhuge.rst + */ static inline int pmd_trans_huge(pmd_t pmd) { + if (!pmd_present(pmd)) + return false; + if (radix_enabled()) return radix__pmd_trans_huge(pmd); return hash__pmd_trans_huge(pmd); diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h @@ -23,11 +23,13 @@ extern int threads_per_core; extern int threads_per_subcore; extern int threads_shift; +extern bool has_big_cores; extern cpumask_t threads_core_mask; #else #define threads_per_core 1 #define threads_per_subcore 1 #define threads_shift 0 +#define has_big_cores 0 #define threads_core_mask (*get_cpu_mask(0)) #endif diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h @@ -61,7 +61,6 @@ static inline void arch_vtime_task_switch(struct task_struct *prev) struct cpu_accounting_data *acct0 = get_accounting(prev); acct->starttime = acct0->starttime; - acct->startspurr = acct0->startspurr; } #endif diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h @@ -99,4 +99,9 @@ void __init walk_drmem_lmbs_early(unsigned long node, void (*func)(struct drmem_lmb *, const __be32 **)); #endif +static inline void invalidate_lmb_associativity_index(struct drmem_lmb *lmb) +{ + lmb->aa_index = 0xffffffff; +} + #endif /* _ASM_POWERPC_LMB_H */ diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h @@ -43,7 +43,6 @@ struct pci_dn; #define EEH_VALID_PE_ZERO 0x10 /* PE#0 is valid */ #define EEH_ENABLE_IO_FOR_LOG 0x20 /* Enable IO for log */ #define EEH_EARLY_DUMP_LOG 0x40 /* Dump log immediately */ -#define EEH_POSTPONED_PROBE 0x80 /* Powernv may postpone device probe */ /* * Delay for PE reset, all in ms @@ -99,13 +98,13 @@ struct eeh_pe { atomic_t pass_dev_cnt; /* Count of passed through devs */ struct eeh_pe *parent; /* Parent PE */ void *data; /* PE auxillary data */ - struct list_head child_list; /* Link PE to the child list */ - struct list_head edevs; /* Link list of EEH devices */ - struct list_head child; /* Child PEs */ + struct list_head child_list; /* List of PEs below this PE */ + struct list_head child; /* Memb. child_list/eeh_phb_pe */ + struct list_head edevs; /* List of eeh_dev in this PE */ }; #define eeh_pe_for_each_dev(pe, edev, tmp) \ - list_for_each_entry_safe(edev, tmp, &pe->edevs, list) + list_for_each_entry_safe(edev, tmp, &pe->edevs, entry) #define eeh_for_each_pe(root, pe) \ for (pe = root; pe; pe = eeh_pe_next(pe, root)) @@ -142,13 +141,12 @@ struct eeh_dev { int aer_cap; /* Saved AER capability */ int af_cap; /* Saved AF capability */ struct eeh_pe *pe; /* Associated PE */ - struct list_head list; /* Form link list in the PE */ - struct list_head rmv_list; /* Record the removed edevs */ + struct list_head entry; /* Membership in eeh_pe.edevs */ + struct list_head rmv_entry; /* Membership in rmv_list */ struct pci_dn *pdn; /* Associated PCI device node */ struct pci_dev *pdev; /* Associated PCI device */ bool in_error; /* Error flag for edev */ struct pci_dev *physfn; /* Associated SRIOV PF */ - struct pci_bus *bus; /* PCI bus for partial hotplug */ }; static inline struct pci_dn *eeh_dev_to_pdn(struct eeh_dev *edev) @@ -207,9 +205,8 @@ struct eeh_ops { void* (*probe)(struct pci_dn *pdn, void *data); int (*set_option)(struct eeh_pe *pe, int option); int (*get_pe_addr)(struct eeh_pe *pe); - int (*get_state)(struct eeh_pe *pe, int *state); + int (*get_state)(struct eeh_pe *pe, int *delay); int (*reset)(struct eeh_pe *pe, int option); - int (*wait_state)(struct eeh_pe *pe, int max_wait); int (*get_log)(struct eeh_pe *pe, int severity, char *drv_log, unsigned long len); int (*configure_bridge)(struct eeh_pe *pe); int (*err_inject)(struct eeh_pe *pe, int type, int func, @@ -243,11 +240,7 @@ static inline bool eeh_has_flag(int flag) static inline bool eeh_enabled(void) { - if (eeh_has_flag(EEH_FORCE_DISABLED) || - !eeh_has_flag(EEH_ENABLED)) - return false; - - return true; + return eeh_has_flag(EEH_ENABLED) && !eeh_has_flag(EEH_FORCE_DISABLED); } static inline void eeh_serialize_lock(unsigned long *flags) @@ -270,6 +263,7 @@ typedef void *(*eeh_edev_traverse_func)(struct eeh_dev *edev, void *flag); typedef void *(*eeh_pe_traverse_func)(struct eeh_pe *pe, void *flag); void eeh_set_pe_aux_size(int size); int eeh_phb_pe_create(struct pci_controller *phb); +int eeh_wait_state(struct eeh_pe *pe, int max_wait); struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb); struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root); struct eeh_pe *eeh_pe_get(struct pci_controller *phb, diff --git a/arch/powerpc/include/asm/error-injection.h b/arch/powerpc/include/asm/error-injection.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +#ifndef _ASM_ERROR_INJECTION_H +#define _ASM_ERROR_INJECTION_H + +#include <linux/compiler.h> +#include <linux/linkage.h> +#include <asm/ptrace.h> +#include <asm-generic/error-injection.h> + +void override_function_with_return(struct pt_regs *regs); + +#endif /* _ASM_ERROR_INJECTION_H */ diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h @@ -61,14 +61,6 @@ #define MAX_MCE_DEPTH 4 /* - * EX_LR is only used in EXSLB and where it does not overlap with EX_DAR - * EX_CCR similarly with DSISR, but being 4 byte registers there is a hole - * in the save area so it's not necessary to overlap them. Could be used - * for future savings though if another 4 byte register was to be saved. - */ -#define EX_LR EX_DAR - -/* * EX_R3 is only used by the bad_stack handler. bad_stack reloads and * saves DAR from SPRN_DAR, and EX_DAR is not used. So EX_R3 can overlap * with EX_DAR. @@ -236,11 +228,10 @@ * PPR save/restore macros used in exceptions_64s.S * Used for P7 or later processors */ -#define SAVE_PPR(area, ra, rb) \ +#define SAVE_PPR(area, ra) \ BEGIN_FTR_SECTION_NESTED(940) \ - ld ra,PACACURRENT(r13); \ - ld rb,area+EX_PPR(r13); /* Read PPR from paca */ \ - std rb,TASKTHREADPPR(ra); \ + ld ra,area+EX_PPR(r13); /* Read PPR from paca */ \ + std ra,_PPR(r1); \ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940) #define RESTORE_PPR_PACA(area, ra) \ @@ -508,7 +499,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) 3: EXCEPTION_PROLOG_COMMON_1(); \ beq 4f; /* if from kernel mode */ \ ACCOUNT_CPU_USER_ENTRY(r13, r9, r10); \ - SAVE_PPR(area, r9, r10); \ + SAVE_PPR(area, r9); \ 4: EXCEPTION_PROLOG_COMMON_2(area) \ EXCEPTION_PROLOG_COMMON_3(n) \ ACCOUNT_STOLEN_TIME diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h @@ -52,6 +52,8 @@ #define FW_FEATURE_PRRN ASM_CONST(0x0000000200000000) #define FW_FEATURE_DRMEM_V2 ASM_CONST(0x0000000400000000) #define FW_FEATURE_DRC_INFO ASM_CONST(0x0000000800000000) +#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0000001000000000) +#define FW_FEATURE_PAPR_SCM ASM_CONST(0x0000002000000000) #ifndef __ASSEMBLY__ @@ -69,7 +71,8 @@ enum { FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY | FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN | FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 | - FW_FEATURE_DRC_INFO, + FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE | + FW_FEATURE_PAPR_SCM, FW_FEATURE_PSERIES_ALWAYS = 0, FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL, FW_FEATURE_POWERNV_ALWAYS = 0, diff --git a/arch/powerpc/include/asm/fixmap.h b/arch/powerpc/include/asm/fixmap.h @@ -72,7 +72,7 @@ enum fixed_addresses { static inline void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags) { - map_kernel_page(fix_to_virt(idx), phys, pgprot_val(flags)); + map_kernel_page(fix_to_virt(idx), phys, flags); } #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h @@ -278,6 +278,7 @@ #define H_COP 0x304 #define H_GET_MPP_X 0x314 #define H_SET_MODE 0x31C +#define H_BLOCK_REMOVE 0x328 #define H_CLEAR_HPT 0x358 #define H_REQUEST_VMC 0x360 #define H_RESIZE_HPT_PREPARE 0x36C @@ -295,7 +296,15 @@ #define H_INT_ESB 0x3C8 #define H_INT_SYNC 0x3CC #define H_INT_RESET 0x3D0 -#define MAX_HCALL_OPCODE H_INT_RESET +#define H_SCM_READ_METADATA 0x3E4 +#define H_SCM_WRITE_METADATA 0x3E8 +#define H_SCM_BIND_MEM 0x3EC +#define H_SCM_UNBIND_MEM 0x3F0 +#define H_SCM_QUERY_BLOCK_MEM_BINDING 0x3F4 +#define H_SCM_QUERY_LOGICAL_MEM_BINDING 0x3F8 +#define H_SCM_MEM_QUERY 0x3FC +#define H_SCM_BLOCK_CLEAR 0x400 +#define MAX_HCALL_OPCODE H_SCM_BLOCK_CLEAR /* H_VIOCTL functions */ #define H_GET_VIOA_DUMP_SIZE 0x01 diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h @@ -3,6 +3,9 @@ #ifdef __KERNEL__ #define ARCH_HAS_IOREMAP_WC +#ifdef CONFIG_PPC32 +#define ARCH_HAS_IOREMAP_WT +#endif /* * This program is free software; you can redistribute it and/or @@ -108,25 +111,6 @@ extern bool isa_io_special; #define IO_SET_SYNC_FLAG() #endif -/* gcc 4.0 and older doesn't have 'Z' constraint */ -#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ == 0) -#define DEF_MMIO_IN_X(name, size, insn) \ -static inline u##size name(const volatile u##size __iomem *addr) \ -{ \ - u##size ret; \ - __asm__ __volatile__("sync;"#insn" %0,0,%1;twi 0,%0,0;isync" \ - : "=r" (ret) : "r" (addr), "m" (*addr) : "memory"); \ - return ret; \ -} - -#define DEF_MMIO_OUT_X(name, size, insn) \ -static inline void name(volatile u##size __iomem *addr, u##size val) \ -{ \ - __asm__ __volatile__("sync;"#insn" %1,0,%2" \ - : "=m" (*addr) : "r" (val), "r" (addr) : "memory"); \ - IO_SET_SYNC_FLAG(); \ -} -#else /* newer gcc */ #define DEF_MMIO_IN_X(name, size, insn) \ static inline u##size name(const volatile u##size __iomem *addr) \ { \ @@ -143,7 +127,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val) \ : "=Z" (*addr) : "r" (val) : "memory"); \ IO_SET_SYNC_FLAG(); \ } -#endif #define DEF_MMIO_IN_D(name, size, insn) \ static inline u##size name(const volatile u##size __iomem *addr) \ @@ -746,6 +729,10 @@ static inline void iosync(void) * * * ioremap_wc enables write combining * + * * ioremap_wt enables write through + * + * * ioremap_coherent maps coherent cached memory + * * * iounmap undoes such a mapping and can be hooked * * * __ioremap_at (and the pending __iounmap_at) are low level functions to @@ -767,6 +754,8 @@ extern void __iomem *ioremap(phys_addr_t address, unsigned long size); extern void __iomem *ioremap_prot(phys_addr_t address, unsigned long size, unsigned long flags); extern void __iomem *ioremap_wc(phys_addr_t address, unsigned long size); +void __iomem *ioremap_wt(phys_addr_t address, unsigned long size); +void __iomem *ioremap_coherent(phys_addr_t address, unsigned long size); #define ioremap_nocache(addr, size) ioremap((addr), (size)) #define ioremap_uc(addr, size) ioremap((addr), (size)) #define ioremap_cache(addr, size) \ @@ -777,12 +766,12 @@ extern void iounmap(volatile void __iomem *addr); extern void __iomem *__ioremap(phys_addr_t, unsigned long size, unsigned long flags); extern void __iomem *__ioremap_caller(phys_addr_t, unsigned long size, - unsigned long flags, void *caller); + pgprot_t prot, void *caller); extern void __iounmap(volatile void __iomem *addr); extern void __iomem * __ioremap_at(phys_addr_t pa, void *ea, - unsigned long size, unsigned long flags); + unsigned long size, pgprot_t prot); extern void __iounmap_at(void *ea, unsigned long size); /* diff --git a/arch/powerpc/include/asm/kgdb.h b/arch/powerpc/include/asm/kgdb.h @@ -26,9 +26,12 @@ #define BREAK_INSTR_SIZE 4 #define BUFMAX ((NUMREGBYTES * 2) + 512) #define OUTBUFMAX ((NUMREGBYTES * 2) + 512) + +#define BREAK_INSTR 0x7d821008 /* twge r2, r2 */ + static inline void arch_kgdb_breakpoint(void) { - asm(".long 0x7d821008"); /* twge r2, r2 */ + asm(stringify_in_c(.long BREAK_INSTR)); } #define CACHE_FLUSH_IS_SAFE 1 #define DBG_MAX_REG_NUM 70 diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h @@ -35,7 +35,7 @@ struct machdep_calls { char *name; #ifdef CONFIG_PPC64 void __iomem * (*ioremap)(phys_addr_t addr, unsigned long size, - unsigned long flags, void *caller); + pgprot_t prot, void *caller); void (*iounmap)(volatile void __iomem *token); #ifdef CONFIG_PM @@ -108,6 +108,7 @@ struct machdep_calls { /* Early exception handlers called in realmode */ int (*hmi_exception_early)(struct pt_regs *regs); + long (*machine_check_early)(struct pt_regs *regs); /* Called during machine check exception to retrive fixup address. */ bool (*mce_check_early_recovery)(struct pt_regs *regs); diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h @@ -210,4 +210,7 @@ extern void release_mce_event(void); extern void machine_check_queue_event(void); extern void machine_check_print_event_info(struct machine_check_event *evt, bool user_mode); +#ifdef CONFIG_PPC_BOOK3S_64 +void flush_and_reload_slb(void); +#endif /* CONFIG_PPC_BOOK3S_64 */ #endif /* __ASM_PPC64_MCE_H__ */ diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h @@ -309,6 +309,21 @@ static inline u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address) */ #define MMU_PAGE_COUNT 16 +/* + * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS + * if we increase SECTIONS_WIDTH we will not store node details in page->flags and + * page_to_nid does a page->section->node lookup + * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce + * memory requirements with large number of sections. + * 51 bits is the max physical real address on POWER9 + */ +#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \ + defined (CONFIG_PPC_64K_PAGES) +#define MAX_PHYSMEM_BITS 51 +#else +#define MAX_PHYSMEM_BITS 46 +#endif + #ifdef CONFIG_PPC_BOOK3S_64 #include <asm/book3s/64/mmu.h> #else /* CONFIG_PPC_BOOK3S_64 */ diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h @@ -82,7 +82,7 @@ static inline bool need_extra_context(struct mm_struct *mm, unsigned long ea) { int context_id; - context_id = get_ea_context(&mm->context, ea); + context_id = get_user_context(&mm->context, ea); if (!context_id) return true; return false; diff --git a/arch/powerpc/include/asm/mpic.h b/arch/powerpc/include/asm/mpic.h @@ -393,7 +393,14 @@ extern struct bus_type mpic_subsys; #define MPIC_REGSET_TSI108 MPIC_REGSET(1) /* Tsi108/109 PIC */ /* Get the version of primary MPIC */ +#ifdef CONFIG_MPIC extern u32 fsl_mpic_primary_get_version(void); +#else +static inline u32 fsl_mpic_primary_get_version(void) +{ + return 0; +} +#endif /* Allocate the controller structure and setup the linux irq descs * for the range if interrupts passed in. No HW initialization is diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h @@ -128,14 +128,65 @@ extern int icache_44x_need_flush; #include <asm/nohash/32/pte-8xx.h> #endif -/* And here we include common definitions */ -#include <asm/pte-common.h> +/* + * Location of the PFN in the PTE. Most 32-bit platforms use the same + * as _PAGE_SHIFT here (ie, naturally aligned). + * Platform who don't just pre-define the value so we don't override it here. + */ +#ifndef PTE_RPN_SHIFT +#define PTE_RPN_SHIFT (PAGE_SHIFT) +#endif + +/* + * The mask covered by the RPN must be a ULL on 32-bit platforms with + * 64-bit PTEs. + */ +#if defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT) +#define PTE_RPN_MASK (~((1ULL << PTE_RPN_SHIFT) - 1)) +#else +#define PTE_RPN_MASK (~((1UL << PTE_RPN_SHIFT) - 1)) +#endif + +/* + * _PAGE_CHG_MASK masks of bits that are to be preserved across + * pgprot changes. + */ +#define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_SPECIAL) #ifndef __ASSEMBLY__ #define pte_clear(mm, addr, ptep) \ do { pte_update(ptep, ~0, 0); } while (0) +#ifndef pte_mkwrite +static inline pte_t pte_mkwrite(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_RW); +} +#endif + +static inline pte_t pte_mkdirty(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_DIRTY); +} + +static inline pte_t pte_mkyoung(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_ACCESSED); +} + +#ifndef pte_wrprotect +static inline pte_t pte_wrprotect(pte_t pte) +{ + return __pte(pte_val(pte) & ~_PAGE_RW); +} +#endif + +static inline pte_t pte_mkexec(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_EXEC); +} + #define pmd_none(pmd) (!pmd_val(pmd)) #define pmd_bad(pmd) (pmd_val(pmd) & _PMD_BAD) #define pmd_present(pmd) (pmd_val(pmd) & _PMD_PRESENT_MASK) @@ -244,7 +295,10 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - pte_update(ptep, (_PAGE_RW | _PAGE_HWWRITE), _PAGE_RO); + unsigned long clr = ~pte_val(pte_wrprotect(__pte(~0))); + unsigned long set = pte_val(pte_wrprotect(__pte(0))); + + pte_update(ptep, clr, set); } static inline void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) @@ -258,9 +312,10 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address, int psize) { - unsigned long set = pte_val(entry) & - (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC); - unsigned long clr = ~pte_val(entry) & (_PAGE_RO | _PAGE_NA); + pte_t pte_set = pte_mkyoung(pte_mkdirty(pte_mkwrite(pte_mkexec(__pte(0))))); + pte_t pte_clr = pte_mkyoung(pte_mkdirty(pte_mkwrite(pte_mkexec(__pte(~0))))); + unsigned long set = pte_val(entry) & pte_val(pte_set); + unsigned long clr = ~pte_val(entry) & ~pte_val(pte_clr); pte_update(ptep, clr, set); @@ -323,7 +378,7 @@ static inline int pte_young(pte_t pte) #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 3 }) #define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 }) -int map_kernel_page(unsigned long va, phys_addr_t pa, int flags); +int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot); #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/include/asm/nohash/32/pte-40x.h b/arch/powerpc/include/asm/nohash/32/pte-40x.h @@ -50,13 +50,56 @@ #define _PAGE_EXEC 0x200 /* hardware: EX permission */ #define _PAGE_ACCESSED 0x400 /* software: R: page referenced */ +/* No page size encoding in the linux PTE */ +#define _PAGE_PSIZE 0 + +/* cache related flags non existing on 40x */ +#define _PAGE_COHERENT 0 + +#define _PAGE_KERNEL_RO 0 +#define _PAGE_KERNEL_ROX _PAGE_EXEC +#define _PAGE_KERNEL_RW (_PAGE_DIRTY | _PAGE_RW | _PAGE_HWWRITE) +#define _PAGE_KERNEL_RWX (_PAGE_DIRTY | _PAGE_RW | _PAGE_HWWRITE | _PAGE_EXEC) + #define _PMD_PRESENT 0x400 /* PMD points to page of PTEs */ +#define _PMD_PRESENT_MASK _PMD_PRESENT #define _PMD_BAD 0x802 #define _PMD_SIZE_4M 0x0c0 #define _PMD_SIZE_16M 0x0e0 +#define _PMD_USER 0 + +#define _PTE_NONE_MASK 0 /* Until my rework is finished, 40x still needs atomic PTE updates */ #define PTE_ATOMIC_UPDATES 1 +#define _PAGE_BASE_NC (_PAGE_PRESENT | _PAGE_ACCESSED) +#define _PAGE_BASE (_PAGE_BASE_NC) + +/* Permission masks used to generate the __P and __S table */ +#define PAGE_NONE __pgprot(_PAGE_BASE) +#define PAGE_SHARED __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW) +#define PAGE_SHARED_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_EXEC) +#define PAGE_COPY __pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) +#define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) + +#ifndef __ASSEMBLY__ +static inline pte_t pte_wrprotect(pte_t pte) +{ + return __pte(pte_val(pte) & ~(_PAGE_RW | _PAGE_HWWRITE)); +} + +#define pte_wrprotect pte_wrprotect + +static inline pte_t pte_mkclean(pte_t pte) +{ + return __pte(pte_val(pte) & ~(_PAGE_DIRTY | _PAGE_HWWRITE)); +} + +#define pte_mkclean pte_mkclean +#endif + #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_NOHASH_32_PTE_40x_H */ diff --git a/arch/powerpc/include/asm/nohash/32/pte-44x.h b/arch/powerpc/include/asm/nohash/32/pte-44x.h @@ -85,14 +85,44 @@ #define _PAGE_NO_CACHE 0x00000400 /* H: I bit */ #define _PAGE_WRITETHRU 0x00000800 /* H: W bit */ +/* No page size encoding in the linux PTE */ +#define _PAGE_PSIZE 0 + +#define _PAGE_KERNEL_RO 0 +#define _PAGE_KERNEL_ROX _PAGE_EXEC +#define _PAGE_KERNEL_RW (_PAGE_DIRTY | _PAGE_RW) +#define _PAGE_KERNEL_RWX (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC) + /* TODO: Add large page lowmem mapping support */ #define _PMD_PRESENT 0 #define _PMD_PRESENT_MASK (PAGE_MASK) #define _PMD_BAD (~PAGE_MASK) +#define _PMD_USER 0 /* ERPN in a PTE never gets cleared, ignore it */ #define _PTE_NONE_MASK 0xffffffff00000000ULL +/* + * We define 2 sets of base prot bits, one for basic pages (ie, + * cacheable kernel and user pages) and one for non cacheable + * pages. We always set _PAGE_COHERENT when SMP is enabled or + * the processor might need it for DMA coherency. + */ +#define _PAGE_BASE_NC (_PAGE_PRESENT | _PAGE_ACCESSED) +#if defined(CONFIG_SMP) +#define _PAGE_BASE (_PAGE_BASE_NC | _PAGE_COHERENT) +#else +#define _PAGE_BASE (_PAGE_BASE_NC) +#endif + +/* Permission masks used to generate the __P and __S table */ +#define PAGE_NONE __pgprot(_PAGE_BASE) +#define PAGE_SHARED __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW) +#define PAGE_SHARED_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_EXEC) +#define PAGE_COPY __pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) +#define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_NOHASH_32_PTE_44x_H */ diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h @@ -29,10 +29,10 @@ */ /* Definitions for 8xx embedded chips. */ -#define _PAGE_PRESENT 0x0001 /* Page is valid */ -#define _PAGE_NO_CACHE 0x0002 /* I: cache inhibit */ -#define _PAGE_PRIVILEGED 0x0004 /* No ASID (context) compare */ -#define _PAGE_HUGE 0x0008 /* SPS: Small Page Size (1 if 16k, 512k or 8M)*/ +#define _PAGE_PRESENT 0x0001 /* V: Page is valid */ +#define _PAGE_NO_CACHE 0x0002 /* CI: cache inhibit */ +#define _PAGE_SH 0x0004 /* SH: No ASID (context) compare */ +#define _PAGE_SPS 0x0008 /* SPS: Small Page Size (1 if 16k, 512k or 8M)*/ #define _PAGE_DIRTY 0x0100 /* C: page changed */ /* These 4 software bits must be masked out when the L2 entry is loaded @@ -46,18 +46,95 @@ #define _PAGE_NA 0x0200 /* Supervisor NA, User no access */ #define _PAGE_RO 0x0600 /* Supervisor RO, User no access */ +/* cache related flags non existing on 8xx */ +#define _PAGE_COHERENT 0 +#define _PAGE_WRITETHRU 0 + +#define _PAGE_KERNEL_RO (_PAGE_SH | _PAGE_RO) +#define _PAGE_KERNEL_ROX (_PAGE_SH | _PAGE_RO | _PAGE_EXEC) +#define _PAGE_KERNEL_RW (_PAGE_SH | _PAGE_DIRTY) +#define _PAGE_KERNEL_RWX (_PAGE_SH | _PAGE_DIRTY | _PAGE_EXEC) + #define _PMD_PRESENT 0x0001 +#define _PMD_PRESENT_MASK _PMD_PRESENT #define _PMD_BAD 0x0fd0 #define _PMD_PAGE_MASK 0x000c #define _PMD_PAGE_8M 0x000c #define _PMD_PAGE_512K 0x0004 #define _PMD_USER 0x0020 /* APG 1 */ +#define _PTE_NONE_MASK 0 + /* Until my rework is finished, 8xx still needs atomic PTE updates */ #define PTE_ATOMIC_UPDATES 1 #ifdef CONFIG_PPC_16K_PAGES -#define _PAGE_PSIZE _PAGE_HUGE +#define _PAGE_PSIZE _PAGE_SPS +#else +#define _PAGE_PSIZE 0 +#endif + +#define _PAGE_BASE_NC (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE) +#define _PAGE_BASE (_PAGE_BASE_NC) + +/* Permission masks used to generate the __P and __S table */ +#define PAGE_NONE __pgprot(_PAGE_BASE | _PAGE_NA) +#define PAGE_SHARED __pgprot(_PAGE_BASE) +#define PAGE_SHARED_X __pgprot(_PAGE_BASE | _PAGE_EXEC) +#define PAGE_COPY __pgprot(_PAGE_BASE | _PAGE_RO) +#define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_RO | _PAGE_EXEC) +#define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_RO) +#define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_RO | _PAGE_EXEC) + +#ifndef __ASSEMBLY__ +static inline pte_t pte_wrprotect(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_RO); +} + +#define pte_wrprotect pte_wrprotect + +static inline int pte_write(pte_t pte) +{ + return !(pte_val(pte) & _PAGE_RO); +} + +#define pte_write pte_write + +static inline pte_t pte_mkwrite(pte_t pte) +{ + return __pte(pte_val(pte) & ~_PAGE_RO); +} + +#define pte_mkwrite pte_mkwrite + +static inline bool pte_user(pte_t pte) +{ + return !(pte_val(pte) & _PAGE_SH); +} + +#define pte_user pte_user + +static inline pte_t pte_mkprivileged(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_SH); +} + +#define pte_mkprivileged pte_mkprivileged + +static inline pte_t pte_mkuser(pte_t pte) +{ + return __pte(pte_val(pte) & ~_PAGE_SH); +} + +#define pte_mkuser pte_mkuser + +static inline pte_t pte_mkhuge(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_SPS); +} + +#define pte_mkhuge pte_mkhuge #endif #endif /* __KERNEL__ */ diff --git a/arch/powerpc/include/asm/nohash/32/pte-fsl-booke.h b/arch/powerpc/include/asm/nohash/32/pte-fsl-booke.h @@ -31,11 +31,44 @@ #define _PAGE_WRITETHRU 0x00400 /* H: W bit */ #define _PAGE_SPECIAL 0x00800 /* S: Special page */ +#define _PAGE_KERNEL_RO 0 +#define _PAGE_KERNEL_ROX _PAGE_EXEC +#define _PAGE_KERNEL_RW (_PAGE_DIRTY | _PAGE_RW) +#define _PAGE_KERNEL_RWX (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC) + +/* No page size encoding in the linux PTE */ +#define _PAGE_PSIZE 0 + #define _PMD_PRESENT 0 #define _PMD_PRESENT_MASK (PAGE_MASK) #define _PMD_BAD (~PAGE_MASK) +#define _PMD_USER 0 + +#define _PTE_NONE_MASK 0 #define PTE_WIMGE_SHIFT (6) +/* + * We define 2 sets of base prot bits, one for basic pages (ie, + * cacheable kernel and user pages) and one for non cacheable + * pages. We always set _PAGE_COHERENT when SMP is enabled or + * the processor might need it for DMA coherency. + */ +#define _PAGE_BASE_NC (_PAGE_PRESENT | _PAGE_ACCESSED) +#if defined(CONFIG_SMP) || defined(CONFIG_PPC_E500MC) +#define _PAGE_BASE (_PAGE_BASE_NC | _PAGE_COHERENT) +#else +#define _PAGE_BASE (_PAGE_BASE_NC) +#endif + +/* Permission masks used to generate the __P and __S table */ +#define PAGE_NONE __pgprot(_PAGE_BASE) +#define PAGE_SHARED __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW) +#define PAGE_SHARED_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_EXEC) +#define PAGE_COPY __pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) +#define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) + #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_NOHASH_32_PTE_FSL_BOOKE_H */ diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h @@ -89,11 +89,47 @@ * Include the PTE bits definitions */ #include <asm/nohash/pte-book3e.h> -#include <asm/pte-common.h> + +#define _PAGE_SAO 0 + +#define PTE_RPN_MASK (~((1UL << PTE_RPN_SHIFT) - 1)) + +/* + * _PAGE_CHG_MASK masks of bits that are to be preserved across + * pgprot changes. + */ +#define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_SPECIAL) + +#define H_PAGE_4K_PFN 0 #ifndef __ASSEMBLY__ /* pte_clear moved to later in this file */ +static inline pte_t pte_mkwrite(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_RW); +} + +static inline pte_t pte_mkdirty(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_DIRTY); +} + +static inline pte_t pte_mkyoung(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_ACCESSED); +} + +static inline pte_t pte_wrprotect(pte_t pte) +{ + return __pte(pte_val(pte) & ~_PAGE_RW); +} + +static inline pte_t pte_mkexec(pte_t pte) +{ + return __pte(pte_val(pte) | _PAGE_EXEC); +} + #define PMD_BAD_BITS (PTE_TABLE_SIZE-1) #define PUD_BAD_BITS (PMD_TABLE_SIZE-1) @@ -327,8 +363,7 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma, #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) }) #define __swp_entry_to_pte(x) __pte((x).val) -extern int map_kernel_page(unsigned long ea, unsigned long pa, - unsigned long flags); +int map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot); extern int __meminit vmemmap_create_mapping(unsigned long start, unsigned long page_size, unsigned long phys); diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h @@ -8,18 +8,50 @@ #include <asm/nohash/32/pgtable.h> #endif +/* Permission masks used for kernel mappings */ +#define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW) +#define PAGE_KERNEL_NC __pgprot(_PAGE_BASE_NC | _PAGE_KERNEL_RW | _PAGE_NO_CACHE) +#define PAGE_KERNEL_NCG __pgprot(_PAGE_BASE_NC | _PAGE_KERNEL_RW | \ + _PAGE_NO_CACHE | _PAGE_GUARDED) +#define PAGE_KERNEL_X __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX) +#define PAGE_KERNEL_RO __pgprot(_PAGE_BASE | _PAGE_KERNEL_RO) +#define PAGE_KERNEL_ROX __pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX) + +/* + * Protection used for kernel text. We want the debuggers to be able to + * set breakpoints anywhere, so don't write protect the kernel text + * on platforms where such control is possible. + */ +#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\ + defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE) +#define PAGE_KERNEL_TEXT PAGE_KERNEL_X +#else +#define PAGE_KERNEL_TEXT PAGE_KERNEL_ROX +#endif + +/* Make modules code happy. We don't set RO yet */ +#define PAGE_KERNEL_EXEC PAGE_KERNEL_X + +/* Advertise special mapping type for AGP */ +#define PAGE_AGP (PAGE_KERNEL_NC) +#define HAVE_PAGE_AGP + #ifndef __ASSEMBLY__ /* Generic accessors to PTE bits */ +#ifndef pte_write static inline int pte_write(pte_t pte) { - return (pte_val(pte) & (_PAGE_RW | _PAGE_RO)) != _PAGE_RO; + return pte_val(pte) & _PAGE_RW; } +#endif static inline int pte_read(pte_t pte) { return 1; } static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; } static inline int pte_special(pte_t pte) { return pte_val(pte) & _PAGE_SPECIAL; } static inline int pte_none(pte_t pte) { return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; } -static inline pgprot_t pte_pgprot(pte_t pte) { return __pgprot(pte_val(pte) & PAGE_PROT_BITS); } +static inline bool pte_hashpte(pte_t pte) { return false; } +static inline bool pte_ci(pte_t pte) { return pte_val(pte) & _PAGE_NO_CACHE; } +static inline bool pte_exec(pte_t pte) { return pte_val(pte) & _PAGE_EXEC; } #ifdef CONFIG_NUMA_BALANCING /* @@ -29,8 +61,7 @@ static inline pgprot_t pte_pgprot(pte_t pte) { return __pgprot(pte_val(pte) & PA */ static inline int pte_protnone(pte_t pte) { - return (pte_val(pte) & - (_PAGE_PRESENT | _PAGE_USER)) == _PAGE_PRESENT; + return pte_present(pte) && !pte_user(pte); } static inline int pmd_protnone(pmd_t pmd) @@ -44,6 +75,23 @@ static inline int pte_present(pte_t pte) return pte_val(pte) & _PAGE_PRESENT; } +static inline bool pte_hw_valid(pte_t pte) +{ + return pte_val(pte) & _PAGE_PRESENT; +} + +/* + * Don't just check for any non zero bits in __PAGE_USER, since for book3e + * and PTE_64BIT, PAGE_KERNEL_X contains _PAGE_BAP_SR which is also in + * _PAGE_USER. Need to explicitly match _PAGE_BAP_UR bit in that case too. + */ +#ifndef pte_user +static inline bool pte_user(pte_t pte) +{ + return (pte_val(pte) & _PAGE_USER) == _PAGE_USER; +} +#endif + /* * We only find page table entry in the last level * Hence no need for other accessors @@ -77,53 +125,53 @@ static inline unsigned long pte_pfn(pte_t pte) { return pte_val(pte) >> PTE_RPN_SHIFT; } /* Generic modifiers for PTE bits */ -static inline pte_t pte_wrprotect(pte_t pte) +static inline pte_t pte_exprotect(pte_t pte) { - pte_basic_t ptev; - - ptev = pte_val(pte) & ~(_PAGE_RW | _PAGE_HWWRITE); - ptev |= _PAGE_RO; - return __pte(ptev); + return __pte(pte_val(pte) & ~_PAGE_EXEC); } +#ifndef pte_mkclean static inline pte_t pte_mkclean(pte_t pte) { - return __pte(pte_val(pte) & ~(_PAGE_DIRTY | _PAGE_HWWRITE)); + return __pte(pte_val(pte) & ~_PAGE_DIRTY); } +#endif static inline pte_t pte_mkold(pte_t pte) { return __pte(pte_val(pte) & ~_PAGE_ACCESSED); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkpte(pte_t pte) { - pte_basic_t ptev; - - ptev = pte_val(pte) & ~_PAGE_RO; - ptev |= _PAGE_RW; - return __pte(ptev); + return pte; } -static inline pte_t pte_mkdirty(pte_t pte) +static inline pte_t pte_mkspecial(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_DIRTY); + return __pte(pte_val(pte) | _PAGE_SPECIAL); } -static inline pte_t pte_mkyoung(pte_t pte) +#ifndef pte_mkhuge +static inline pte_t pte_mkhuge(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_ACCESSED); + return __pte(pte_val(pte)); } +#endif -static inline pte_t pte_mkspecial(pte_t pte) +#ifndef pte_mkprivileged +static inline pte_t pte_mkprivileged(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_SPECIAL); + return __pte(pte_val(pte) & ~_PAGE_USER); } +#endif -static inline pte_t pte_mkhuge(pte_t pte) +#ifndef pte_mkuser +static inline pte_t pte_mkuser(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_HUGE); + return __pte(pte_val(pte) | _PAGE_USER); } +#endif static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { @@ -197,6 +245,8 @@ extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long addre #if _PAGE_WRITETHRU != 0 #define pgprot_cached_wthru(prot) (__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) | \ _PAGE_COHERENT | _PAGE_WRITETHRU)) +#else +#define pgprot_cached_wthru(prot) pgprot_noncached(prot) #endif #define pgprot_cached_noncoherent(prot) \ diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h @@ -77,7 +77,48 @@ #define _PMD_PRESENT 0 #define _PMD_PRESENT_MASK (PAGE_MASK) #define _PMD_BAD (~PAGE_MASK) +#define _PMD_USER 0 +#else +#define _PTE_NONE_MASK 0 +#endif + +/* + * We define 2 sets of base prot bits, one for basic pages (ie, + * cacheable kernel and user pages) and one for non cacheable + * pages. We always set _PAGE_COHERENT when SMP is enabled or + * the processor might need it for DMA coherency. + */ +#define _PAGE_BASE_NC (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE) +#if defined(CONFIG_SMP) +#define _PAGE_BASE (_PAGE_BASE_NC | _PAGE_COHERENT) +#else +#define _PAGE_BASE (_PAGE_BASE_NC) #endif +/* Permission masks used to generate the __P and __S table */ +#define PAGE_NONE __pgprot(_PAGE_BASE) +#define PAGE_SHARED __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW) +#define PAGE_SHARED_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_EXEC) +#define PAGE_COPY __pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) +#define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) +#define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) + +#ifndef __ASSEMBLY__ +static inline pte_t pte_mkprivileged(pte_t pte) +{ + return __pte((pte_val(pte) & ~_PAGE_USER) | _PAGE_PRIVILEGED); +} + +#define pte_mkprivileged pte_mkprivileged + +static inline pte_t pte_mkuser(pte_t pte) +{ + return __pte((pte_val(pte) & ~_PAGE_PRIVILEGED) | _PAGE_USER); +} + +#define pte_mkuser pte_mkuser +#endif /* __ASSEMBLY__ */ + #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_NOHASH_PTE_BOOK3E_H */ diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h @@ -1050,6 +1050,7 @@ enum OpalSysCooling { enum { OPAL_REBOOT_NORMAL = 0, OPAL_REBOOT_PLATFORM_ERROR = 1, + OPAL_REBOOT_FULL_IPL = 2, }; /* Argument to OPAL_PCI_TCE_KILL */ diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h @@ -113,7 +113,13 @@ struct paca_struct { * on the linear mapping */ /* SLB related definitions */ u16 vmalloc_sllp; - u16 slb_cache_ptr; + u8 slb_cache_ptr; + u8 stab_rr; /* stab/slb round-robin counter */ +#ifdef CONFIG_DEBUG_VM + u8 in_kernel_slb_handler; +#endif + u32 slb_used_bitmap; /* Bitmaps for first 32 SLB entries. */ + u32 slb_kern_bitmap; u32 slb_cache[SLB_CACHE_ENTRIES]; #endif /* CONFIG_PPC_BOOK3S_64 */ @@ -160,7 +166,6 @@ struct paca_struct { */ struct task_struct *__current; /* Pointer to current */ u64 kstack; /* Saved Kernel stack addr */ - u64 stab_rr; /* stab/slb round-robin counter */ u64 saved_r1; /* r1 save for RTAS calls or PM or EE=0 */ u64 saved_msr; /* MSR saved here by enter_rtas */ u16 trap_save; /* Used when bad stack is encountered */ @@ -250,6 +255,15 @@ struct paca_struct { #ifdef CONFIG_PPC_PSERIES u8 *mce_data_buf; /* buffer to hold per cpu rtas errlog */ #endif /* CONFIG_PPC_PSERIES */ + +#ifdef CONFIG_PPC_BOOK3S_64 + /* Capture SLB related old contents in MCE handler. */ + struct slb_entry *mce_faulty_slbs; + u16 slb_save_cache_ptr; +#endif /* CONFIG_PPC_BOOK3S_64 */ +#ifdef CONFIG_STACKPROTECTOR + unsigned long canary; +#endif } ____cacheline_aligned; extern void copy_mm_to_paca(struct mm_struct *mm); diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h @@ -20,6 +20,25 @@ struct mm_struct; #include <asm/nohash/pgtable.h> #endif /* !CONFIG_PPC_BOOK3S */ +/* Note due to the way vm flags are laid out, the bits are XWR */ +#define __P000 PAGE_NONE +#define __P001 PAGE_READONLY +#define __P010 PAGE_COPY +#define __P011 PAGE_COPY +#define __P100 PAGE_READONLY_X +#define __P101 PAGE_READONLY_X +#define __P110 PAGE_COPY_X +#define __P111 PAGE_COPY_X + +#define __S000 PAGE_NONE +#define __S001 PAGE_READONLY +#define __S010 PAGE_SHARED +#define __S011 PAGE_SHARED +#define __S100 PAGE_READONLY_X +#define __S101 PAGE_READONLY_X +#define __S110 PAGE_SHARED_X +#define __S111 PAGE_SHARED_X + #ifndef __ASSEMBLY__ #include <asm/tlbflush.h> @@ -27,6 +46,16 @@ struct mm_struct; /* Keep these as a macros to avoid include dependency mess */ #define pte_page(x) pfn_to_page(pte_pfn(x)) #define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot)) +/* + * Select all bits except the pfn + */ +static inline pgprot_t pte_pgprot(pte_t pte) +{ + unsigned long pte_flags; + + pte_flags = pte_val(pte) & ~PTE_RPN_MASK; + return __pgprot(pte_flags); +} /* * ZERO_PAGE is a global shared page that is always zero: used diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h @@ -58,6 +58,7 @@ void eeh_save_bars(struct eeh_dev *edev); int rtas_write_config(struct pci_dn *, int where, int size, u32 val); int rtas_read_config(struct pci_dn *, int where, int size, u32 *val); void eeh_pe_state_mark(struct eeh_pe *pe, int state); +void eeh_pe_mark_isolated(struct eeh_pe *pe); void eeh_pe_state_clear(struct eeh_pe *pe, int state); void eeh_pe_state_mark_with_cfg(struct eeh_pe *pe, int state); void eeh_pe_dev_mode_mark(struct eeh_pe *pe, int mode); diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h @@ -32,9 +32,9 @@ /* Default SMT priority is set to 3. Use 11- 13bits to save priority. */ #define PPR_PRIORITY 3 #ifdef __ASSEMBLY__ -#define INIT_PPR (PPR_PRIORITY << 50) +#define DEFAULT_PPR (PPR_PRIORITY << 50) #else -#define INIT_PPR ((u64)PPR_PRIORITY << 50) +#define DEFAULT_PPR ((u64)PPR_PRIORITY << 50) #endif /* __ASSEMBLY__ */ #endif /* CONFIG_PPC64 */ @@ -273,6 +273,7 @@ struct thread_struct { #endif /* CONFIG_HAVE_HW_BREAKPOINT */ struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */ unsigned long trap_nr; /* last trap # on this thread */ + u8 load_slb; /* Ages out SLB preload cache entries */ u8 load_fp; #ifdef CONFIG_ALTIVEC u8 load_vec; @@ -341,7 +342,6 @@ struct thread_struct { * onwards. */ int dscr_inherit; - unsigned long ppr; /* used to save/restore SMT priority */ unsigned long tidr; #endif #ifdef CONFIG_PPC_BOOK3S_64 @@ -389,7 +389,6 @@ struct thread_struct { .regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \ .addr_limit = KERNEL_DS, \ .fpexc_mode = 0, \ - .ppr = INIT_PPR, \ .fscr = FSCR_TAR | FSCR_EBB \ } #endif diff --git a/arch/powerpc/include/asm/pte-common.h b/arch/powerpc/include/asm/pte-common.h @@ -1,219 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* Included from asm/pgtable-*.h only ! */ - -/* - * Some bits are only used on some cpu families... Make sure that all - * the undefined gets a sensible default - */ -#ifndef _PAGE_HASHPTE -#define _PAGE_HASHPTE 0 -#endif -#ifndef _PAGE_HWWRITE -#define _PAGE_HWWRITE 0 -#endif -#ifndef _PAGE_EXEC -#define _PAGE_EXEC 0 -#endif -#ifndef _PAGE_ENDIAN -#define _PAGE_ENDIAN 0 -#endif -#ifndef _PAGE_COHERENT -#define _PAGE_COHERENT 0 -#endif -#ifndef _PAGE_WRITETHRU -#define _PAGE_WRITETHRU 0 -#endif -#ifndef _PAGE_4K_PFN -#define _PAGE_4K_PFN 0 -#endif -#ifndef _PAGE_SAO -#define _PAGE_SAO 0 -#endif -#ifndef _PAGE_PSIZE -#define _PAGE_PSIZE 0 -#endif -/* _PAGE_RO and _PAGE_RW shall not be defined at the same time */ -#ifndef _PAGE_RO -#define _PAGE_RO 0 -#else -#define _PAGE_RW 0 -#endif - -#ifndef _PAGE_PTE -#define _PAGE_PTE 0 -#endif -/* At least one of _PAGE_PRIVILEGED or _PAGE_USER must be defined */ -#ifndef _PAGE_PRIVILEGED -#define _PAGE_PRIVILEGED 0 -#else -#ifndef _PAGE_USER -#define _PAGE_USER 0 -#endif -#endif -#ifndef _PAGE_NA -#define _PAGE_NA 0 -#endif -#ifndef _PAGE_HUGE -#define _PAGE_HUGE 0 -#endif - -#ifndef _PMD_PRESENT_MASK -#define _PMD_PRESENT_MASK _PMD_PRESENT -#endif -#ifndef _PMD_USER -#define _PMD_USER 0 -#endif -#ifndef _PAGE_KERNEL_RO -#define _PAGE_KERNEL_RO (_PAGE_PRIVILEGED | _PAGE_RO) -#endif -#ifndef _PAGE_KERNEL_ROX -#define _PAGE_KERNEL_ROX (_PAGE_PRIVILEGED | _PAGE_RO | _PAGE_EXEC) -#endif -#ifndef _PAGE_KERNEL_RW -#define _PAGE_KERNEL_RW (_PAGE_PRIVILEGED | _PAGE_DIRTY | _PAGE_RW | \ - _PAGE_HWWRITE) -#endif -#ifndef _PAGE_KERNEL_RWX -#define _PAGE_KERNEL_RWX (_PAGE_PRIVILEGED | _PAGE_DIRTY | _PAGE_RW | \ - _PAGE_HWWRITE | _PAGE_EXEC) -#endif -#ifndef _PAGE_HPTEFLAGS -#define _PAGE_HPTEFLAGS _PAGE_HASHPTE -#endif -#ifndef _PTE_NONE_MASK -#define _PTE_NONE_MASK _PAGE_HPTEFLAGS -#endif - -#ifndef __ASSEMBLY__ - -/* - * Don't just check for any non zero bits in __PAGE_USER, since for book3e - * and PTE_64BIT, PAGE_KERNEL_X contains _PAGE_BAP_SR which is also in - * _PAGE_USER. Need to explicitly match _PAGE_BAP_UR bit in that case too. - */ -static inline bool pte_user(pte_t pte) -{ - return (pte_val(pte) & (_PAGE_USER | _PAGE_PRIVILEGED)) == _PAGE_USER; -} -#endif /* __ASSEMBLY__ */ - -/* Location of the PFN in the PTE. Most 32-bit platforms use the same - * as _PAGE_SHIFT here (ie, naturally aligned). - * Platform who don't just pre-define the value so we don't override it here - */ -#ifndef PTE_RPN_SHIFT -#define PTE_RPN_SHIFT (PAGE_SHIFT) -#endif - -/* The mask covered by the RPN must be a ULL on 32-bit platforms with - * 64-bit PTEs - */ -#if defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT) -#define PTE_RPN_MASK (~((1ULL<<PTE_RPN_SHIFT)-1)) -#else -#define PTE_RPN_MASK (~((1UL<<PTE_RPN_SHIFT)-1)) -#endif - -/* _PAGE_CHG_MASK masks of bits that are to be preserved across - * pgprot changes - */ -#define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \ - _PAGE_ACCESSED | _PAGE_SPECIAL) - -/* Mask of bits returned by pte_pgprot() */ -#define PAGE_PROT_BITS (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | \ - _PAGE_WRITETHRU | _PAGE_ENDIAN | _PAGE_4K_PFN | \ - _PAGE_USER | _PAGE_ACCESSED | _PAGE_RO | _PAGE_NA | \ - _PAGE_PRIVILEGED | \ - _PAGE_RW | _PAGE_HWWRITE | _PAGE_DIRTY | _PAGE_EXEC) - -/* - * We define 2 sets of base prot bits, one for basic pages (ie, - * cacheable kernel and user pages) and one for non cacheable - * pages. We always set _PAGE_COHERENT when SMP is enabled or - * the processor might need it for DMA coherency. - */ -#define _PAGE_BASE_NC (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE) -#if defined(CONFIG_SMP) || defined(CONFIG_PPC_STD_MMU) || \ - defined(CONFIG_PPC_E500MC) -#define _PAGE_BASE (_PAGE_BASE_NC | _PAGE_COHERENT) -#else -#define _PAGE_BASE (_PAGE_BASE_NC) -#endif - -/* Permission masks used to generate the __P and __S table, - * - * Note:__pgprot is defined in arch/powerpc/include/asm/page.h - * - * Write permissions imply read permissions for now (we could make write-only - * pages on BookE but we don't bother for now). Execute permission control is - * possible on platforms that define _PAGE_EXEC - * - * Note due to the way vm flags are laid out, the bits are XWR - */ -#define PAGE_NONE __pgprot(_PAGE_BASE | _PAGE_NA) -#define PAGE_SHARED __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW) -#define PAGE_SHARED_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | \ - _PAGE_EXEC) -#define PAGE_COPY __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO) -#define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO | \ - _PAGE_EXEC) -#define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO) -#define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO | \ - _PAGE_EXEC) - -#define __P000 PAGE_NONE -#define __P001 PAGE_READONLY -#define __P010 PAGE_COPY -#define __P011 PAGE_COPY -#define __P100 PAGE_READONLY_X -#define __P101 PAGE_READONLY_X -#define __P110 PAGE_COPY_X -#define __P111 PAGE_COPY_X - -#define __S000 PAGE_NONE -#define __S001 PAGE_READONLY -#define __S010 PAGE_SHARED -#define __S011 PAGE_SHARED -#define __S100 PAGE_READONLY_X -#define __S101 PAGE_READONLY_X -#define __S110 PAGE_SHARED_X -#define __S111 PAGE_SHARED_X - -/* Permission masks used for kernel mappings */ -#define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW) -#define PAGE_KERNEL_NC __pgprot(_PAGE_BASE_NC | _PAGE_KERNEL_RW | \ - _PAGE_NO_CACHE) -#define PAGE_KERNEL_NCG __pgprot(_PAGE_BASE_NC | _PAGE_KERNEL_RW | \ - _PAGE_NO_CACHE | _PAGE_GUARDED) -#define PAGE_KERNEL_X __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX) -#define PAGE_KERNEL_RO __pgprot(_PAGE_BASE | _PAGE_KERNEL_RO) -#define PAGE_KERNEL_ROX __pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX) - -/* Protection used for kernel text. We want the debuggers to be able to - * set breakpoints anywhere, so don't write protect the kernel text - * on platforms where such control is possible. - */ -#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\ - defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE) -#define PAGE_KERNEL_TEXT PAGE_KERNEL_X -#else -#define PAGE_KERNEL_TEXT PAGE_KERNEL_ROX -#endif - -/* Make modules code happy. We don't set RO yet */ -#define PAGE_KERNEL_EXEC PAGE_KERNEL_X - -/* Advertise special mapping type for AGP */ -#define PAGE_AGP (PAGE_KERNEL_NC) -#define HAVE_PAGE_AGP - -#ifndef _PAGE_READ -/* if not defined, we should not find _PAGE_WRITE too */ -#define _PAGE_READ 0 -#define _PAGE_WRITE _PAGE_RW -#endif - -#ifndef H_PAGE_4K_PFN -#define H_PAGE_4K_PFN 0 -#endif diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h @@ -26,6 +26,37 @@ #include <uapi/asm/ptrace.h> #include <asm/asm-const.h> +#ifndef __ASSEMBLY__ +struct pt_regs +{ + union { + struct user_pt_regs user_regs; + struct { + unsigned long gpr[32]; + unsigned long nip; + unsigned long msr; + unsigned long orig_gpr3; + unsigned long ctr; + unsigned long link; + unsigned long xer; + unsigned long ccr; +#ifdef CONFIG_PPC64 + unsigned long softe; +#else + unsigned long mq; +#endif + unsigned long trap; + unsigned long dar; + unsigned long dsisr; + unsigned long result; + }; + }; + +#ifdef CONFIG_PPC64 + unsigned long ppr; +#endif +}; +#endif #ifdef __powerpc64__ @@ -102,6 +133,11 @@ static inline long regs_return_value(struct pt_regs *regs) return -regs->gpr[3]; } +static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc) +{ + regs->gpr[3] = rc; +} + #ifdef __powerpc64__ #define user_mode(regs) ((((regs)->msr) >> MSR_PR_LG) & 0x1) #else diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h @@ -118,11 +118,16 @@ #define MSR_TS_S __MASK(MSR_TS_S_LG) /* Transaction Suspended */ #define MSR_TS_T __MASK(MSR_TS_T_LG) /* Transaction Transactional */ #define MSR_TS_MASK (MSR_TS_T | MSR_TS_S) /* Transaction State bits */ -#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */ #define MSR_TM_RESV(x) (((x) & MSR_TS_MASK) == MSR_TS_MASK) /* Reserved */ #define MSR_TM_TRANSACTIONAL(x) (((x) & MSR_TS_MASK) == MSR_TS_T) #define MSR_TM_SUSPENDED(x) (((x) & MSR_TS_MASK) == MSR_TS_S) +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */ +#else +#define MSR_TM_ACTIVE(x) 0 +#endif + #if defined(CONFIG_PPC_BOOK3S_64) #define MSR_64BIT MSR_SF diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h @@ -125,6 +125,7 @@ struct rtas_suspend_me_data { #define RTAS_TYPE_INFO 0xE2 #define RTAS_TYPE_DEALLOC 0xE3 #define RTAS_TYPE_DUMP 0xE4 +#define RTAS_TYPE_HOTPLUG 0xE5 /* I don't add PowerMGM events right now, this is a different topic */ #define RTAS_TYPE_PMGM_POWER_SW_ON 0x60 #define RTAS_TYPE_PMGM_POWER_SW_OFF 0x61 @@ -185,11 +186,23 @@ static inline uint8_t rtas_error_disposition(const struct rtas_error_log *elog) return (elog->byte1 & 0x18) >> 3; } +static inline +void rtas_set_disposition_recovered(struct rtas_error_log *elog) +{ + elog->byte1 &= ~0x18; + elog->byte1 |= (RTAS_DISP_FULLY_RECOVERED << 3); +} + static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog) { return (elog->byte1 & 0x04) >> 2; } +static inline uint8_t rtas_error_initiator(const struct rtas_error_log *elog) +{ + return (elog->byte2 & 0xf0) >> 4; +} + #define rtas_error_type(x) ((x)->byte3) static inline @@ -275,6 +288,7 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log) #define PSERIES_ELOG_SECT_ID_CALL_HOME (('C' << 8) | 'H') #define PSERIES_ELOG_SECT_ID_USER_DEF (('U' << 8) | 'D') #define PSERIES_ELOG_SECT_ID_HOTPLUG (('H' << 8) | 'P') +#define PSERIES_ELOG_SECT_ID_MCE (('M' << 8) | 'C') /* Vendor specific Platform Event Log Format, Version 6, section header */ struct pseries_errorlog { @@ -316,6 +330,7 @@ struct pseries_hp_errorlog { #define PSERIES_HP_ELOG_RESOURCE_MEM 2 #define PSERIES_HP_ELOG_RESOURCE_SLOT 3 #define PSERIES_HP_ELOG_RESOURCE_PHB 4 +#define PSERIES_HP_ELOG_RESOURCE_PMEM 6 #define PSERIES_HP_ELOG_ACTION_ADD 1 #define PSERIES_HP_ELOG_ACTION_REMOVE 2 diff --git a/arch/powerpc/include/asm/slice.h b/arch/powerpc/include/asm/slice.h @@ -32,6 +32,7 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start, unsigned long len, unsigned int psize); void slice_init_new_context_exec(struct mm_struct *mm); +void slice_setup_new_exec(void); #endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h @@ -100,6 +100,7 @@ static inline void set_hard_smp_processor_id(int cpu, int phys) DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_map); DECLARE_PER_CPU(cpumask_var_t, cpu_l2_cache_map); DECLARE_PER_CPU(cpumask_var_t, cpu_core_map); +DECLARE_PER_CPU(cpumask_var_t, cpu_smallcore_map); static inline struct cpumask *cpu_sibling_mask(int cpu) { @@ -116,6 +117,11 @@ static inline struct cpumask *cpu_l2_cache_mask(int cpu) return per_cpu(cpu_l2_cache_map, cpu); } +static inline struct cpumask *cpu_smallcore_mask(int cpu) +{ + return per_cpu(cpu_smallcore_map, cpu); +} + extern int cpu_to_core_id(int cpu); /* Since OpenPIC has only 4 IPIs, we use slightly different message numbers. @@ -166,6 +172,11 @@ static inline const struct cpumask *cpu_sibling_mask(int cpu) return cpumask_of(cpu); } +static inline const struct cpumask *cpu_smallcore_mask(int cpu) +{ + return cpumask_of(cpu); +} + #endif /* CONFIG_SMP */ #ifdef CONFIG_PPC64 diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h @@ -9,17 +9,6 @@ * MAX_PHYSMEM_BITS 2^N: how much memory we can have in that space */ #define SECTION_SIZE_BITS 24 -/* - * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS - * if we increase SECTIONS_WIDTH we will not store node details in page->flags and - * page_to_nid does a page->section->node lookup - * Hence only increase for VMEMMAP. - */ -#ifdef CONFIG_SPARSEMEM_VMEMMAP -#define MAX_PHYSMEM_BITS 47 -#else -#define MAX_PHYSMEM_BITS 46 -#endif #endif /* CONFIG_SPARSEMEM */ diff --git a/arch/powerpc/include/asm/stackprotector.h b/arch/powerpc/include/asm/stackprotector.h @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * GCC stack protector support. + * + */ + +#ifndef _ASM_STACKPROTECTOR_H +#define _ASM_STACKPROTECTOR_H + +#include <linux/random.h> +#include <linux/version.h> +#include <asm/reg.h> +#include <asm/current.h> +#include <asm/paca.h> + +/* + * Initialize the stackprotector canary value. + * + * NOTE: this must only be called from functions that never return, + * and it must always be inlined. + */ +static __always_inline void boot_init_stack_canary(void) +{ + unsigned long canary; + + /* Try to get a semi random initial value. */ + canary = get_random_canary(); + canary ^= mftb(); + canary ^= LINUX_VERSION_CODE; + canary &= CANARY_MASK; + + current->stack_canary = canary; +#ifdef CONFIG_PPC64 + get_paca()->canary = canary; +#endif +} + +#endif /* _ASM_STACKPROTECTOR_H */ diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h @@ -29,6 +29,7 @@ #include <asm/page.h> #include <asm/accounting.h> +#define SLB_PRELOAD_NR 16U /* * low level task data. */ @@ -44,6 +45,10 @@ struct thread_info { #if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC32) struct cpu_accounting_data accounting; #endif + unsigned char slb_preload_nr; + unsigned char slb_preload_tail; + u32 slb_preload_esid[SLB_PRELOAD_NR]; + /* low level flags - has atomic operations done on it */ unsigned long flags ____cacheline_aligned_in_smp; }; @@ -72,6 +77,12 @@ static inline struct thread_info *current_thread_info(void) } extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); + +#ifdef CONFIG_PPC_BOOK3S_64 +void arch_setup_new_exec(void); +#define arch_setup_new_exec arch_setup_new_exec +#endif + #endif /* __ASSEMBLY__ */ /* @@ -81,7 +92,7 @@ extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src #define TIF_SIGPENDING 1 /* signal pending */ #define TIF_NEED_RESCHED 2 /* rescheduling necessary */ #define TIF_FSCHECK 3 /* Check FS is USER_DS on return */ -#define TIF_32BIT 4 /* 32 bit binary */ +#define TIF_SYSCALL_EMU 4 /* syscall emulation active */ #define TIF_RESTORE_TM 5 /* need to restore TM FP/VEC/VSX */ #define TIF_PATCH_PENDING 6 /* pending live patching update */ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ @@ -100,6 +111,7 @@ extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src #define TIF_ELF2ABI 18 /* function descriptors must die! */ #endif #define TIF_POLLING_NRFLAG 19 /* true if poll_idle() is polling TIF_NEED_RESCHED */ +#define TIF_32BIT 20 /* 32 bit binary */ /* as above, but as bit values */ #define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE) @@ -120,9 +132,10 @@ extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src #define _TIF_EMULATE_STACK_STORE (1<<TIF_EMULATE_STACK_STORE) #define _TIF_NOHZ (1<<TIF_NOHZ) #define _TIF_FSCHECK (1<<TIF_FSCHECK) +#define _TIF_SYSCALL_EMU (1<<TIF_SYSCALL_EMU) #define _TIF_SYSCALL_DOTRACE (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \ _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \ - _TIF_NOHZ) + _TIF_NOHZ | _TIF_SYSCALL_EMU) #define _TIF_USER_WORK_MASK (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \ _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ diff --git a/arch/powerpc/include/asm/trace.h b/arch/powerpc/include/asm/trace.h @@ -201,6 +201,21 @@ TRACE_EVENT(tlbie, __entry->r) ); +TRACE_EVENT(tlbia, + + TP_PROTO(unsigned long id), + TP_ARGS(id), + TP_STRUCT__entry( + __field(unsigned long, id) + ), + + TP_fast_assign( + __entry->id = id; + ), + + TP_printk("ctx.id=0x%lx", __entry->id) +); + #endif /* _TRACE_POWERPC_H */ #undef TRACE_INCLUDE_PATH diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h @@ -260,7 +260,7 @@ do { \ ({ \ long __gu_err; \ __long_type(*(ptr)) __gu_val; \ - const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ + __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ __chk_user_ptr(ptr); \ if (!is_kernel_addr((unsigned long)__gu_addr)) \ might_fault(); \ @@ -274,7 +274,7 @@ do { \ ({ \ long __gu_err = -EFAULT; \ __long_type(*(ptr)) __gu_val = 0; \ - const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ + __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ might_fault(); \ if (access_ok(VERIFY_READ, __gu_addr, (size))) { \ barrier_nospec(); \ @@ -288,7 +288,7 @@ do { \ ({ \ long __gu_err; \ __long_type(*(ptr)) __gu_val; \ - const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ + __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ __chk_user_ptr(ptr); \ barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ diff --git a/arch/powerpc/include/asm/user.h b/arch/powerpc/include/asm/user.h @@ -31,7 +31,7 @@ * to write an integer number of pages. */ struct user { - struct pt_regs regs; /* entire machine state */ + struct user_pt_regs regs; /* entire machine state */ size_t u_tsize; /* text size (pages) */ size_t u_dsize; /* data size (pages) */ size_t u_ssize; /* stack size (pages) */ diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h @@ -29,7 +29,12 @@ #ifndef __ASSEMBLY__ -struct pt_regs { +#ifdef __KERNEL__ +struct user_pt_regs +#else +struct pt_regs +#endif +{ unsigned long gpr[32]; unsigned long nip; unsigned long msr; @@ -160,6 +165,10 @@ struct pt_regs { #define PTRACE_GETVSRREGS 0x1b #define PTRACE_SETVSRREGS 0x1c +/* Syscall emulation defines */ +#define PTRACE_SYSEMU 0x1d +#define PTRACE_SYSEMU_SINGLESTEP 0x1e + /* * Get or set a debug register. The first 16 are DABR registers and the * second 16 are IABR registers. diff --git a/arch/powerpc/include/uapi/asm/sigcontext.h b/arch/powerpc/include/uapi/asm/sigcontext.h @@ -22,7 +22,11 @@ struct sigcontext { #endif unsigned long handler; unsigned long oldmask; - struct pt_regs __user *regs; +#ifdef __KERNEL__ + struct user_pt_regs __user *regs; +#else + struct pt_regs *regs; +#endif #ifdef __powerpc64__ elf_gregset_t gp_regs; elf_fpregset_t fp_regs; diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile @@ -5,7 +5,8 @@ CFLAGS_ptrace.o += -DUTS_MACHINE='"$(UTS_MACHINE)"' -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror +# Disable clang warning for using setjmp without setjmp.h header +CFLAGS_crash.o += $(call cc-disable-warning, builtin-requires-header) ifdef CONFIG_PPC64 CFLAGS_prom_init.o += $(NO_MINIMAL_TOC) @@ -20,12 +21,14 @@ CFLAGS_prom_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) CFLAGS_btext.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) CFLAGS_prom.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) +CFLAGS_prom_init.o += $(call cc-option, -fno-stack-protector) + ifdef CONFIG_FUNCTION_TRACER # Do not trace early boot code -CFLAGS_REMOVE_cputable.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) -CFLAGS_REMOVE_prom_init.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) -CFLAGS_REMOVE_btext.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) -CFLAGS_REMOVE_prom.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_cputable.o = $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_prom_init.o = $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_btext.o = $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_prom.o = $(CC_FLAGS_FTRACE) endif obj-y := cputable.o ptrace.o syscalls.o \ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c @@ -79,11 +79,16 @@ int main(void) { OFFSET(THREAD, task_struct, thread); OFFSET(MM, task_struct, mm); +#ifdef CONFIG_STACKPROTECTOR + OFFSET(TASK_CANARY, task_struct, stack_canary); +#ifdef CONFIG_PPC64 + OFFSET(PACA_CANARY, paca_struct, canary); +#endif +#endif OFFSET(MMCONTEXTID, mm_struct, context.id); #ifdef CONFIG_PPC64 DEFINE(SIGSEGV, SIGSEGV); DEFINE(NMI_MASK, NMI_MASK); - OFFSET(TASKTHREADPPR, task_struct, thread.ppr); #else OFFSET(THREAD_INFO, task_struct, stack); DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16)); @@ -173,7 +178,6 @@ int main(void) OFFSET(PACAKSAVE, paca_struct, kstack); OFFSET(PACACURRENT, paca_struct, __current); OFFSET(PACASAVEDMSR, paca_struct, saved_msr); - OFFSET(PACASTABRR, paca_struct, stab_rr); OFFSET(PACAR1, paca_struct, saved_r1); OFFSET(PACATOC, paca_struct, kernel_toc); OFFSET(PACAKBASE, paca_struct, kernelbase); @@ -212,6 +216,7 @@ int main(void) #ifdef CONFIG_PPC_BOOK3S_64 OFFSET(PACASLBCACHE, paca_struct, slb_cache); OFFSET(PACASLBCACHEPTR, paca_struct, slb_cache_ptr); + OFFSET(PACASTABRR, paca_struct, stab_rr); OFFSET(PACAVMALLOCSLLP, paca_struct, vmalloc_sllp); #ifdef CONFIG_PPC_MM_SLICES OFFSET(MMUPSIZESLLP, mmu_psize_def, sllp); @@ -274,11 +279,6 @@ int main(void) /* Interrupt register frame */ DEFINE(INT_FRAME_SIZE, STACK_INT_FRAME_SIZE); DEFINE(SWITCH_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs)); -#ifdef CONFIG_PPC64 - /* Create extra stack space for SRR0 and SRR1 when calling prom/rtas. */ - DEFINE(PROM_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) + 16); - DEFINE(RTAS_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) + 16); -#endif /* CONFIG_PPC64 */ STACK_PT_REGS_OFFSET(GPR0, gpr[0]); STACK_PT_REGS_OFFSET(GPR1, gpr[1]); STACK_PT_REGS_OFFSET(GPR2, gpr[2]); @@ -322,10 +322,7 @@ int main(void) STACK_PT_REGS_OFFSET(_ESR, dsisr); #else /* CONFIG_PPC64 */ STACK_PT_REGS_OFFSET(SOFTE, softe); - - /* These _only_ to be used with {PROM,RTAS}_FRAME_SIZE!!! */ - DEFINE(_SRR0, STACK_FRAME_OVERHEAD+sizeof(struct pt_regs)); - DEFINE(_SRR1, STACK_FRAME_OVERHEAD+sizeof(struct pt_regs)+8); + STACK_PT_REGS_OFFSET(_PPR, ppr); #endif /* CONFIG_PPC64 */ #if defined(CONFIG_PPC32) diff --git a/arch/powerpc/kernel/btext.c b/arch/powerpc/kernel/btext.c @@ -163,7 +163,7 @@ void btext_map(void) offset = ((unsigned long) dispDeviceBase) - base; size = dispDeviceRowBytes * dispDeviceRect[3] + offset + dispDeviceRect[0]; - vbase = __ioremap(base, size, pgprot_val(pgprot_noncached_wc(__pgprot(0)))); + vbase = ioremap_wc(base, size); if (!vbase) return; logicalDisplayBase = vbase + offset; diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c @@ -20,6 +20,8 @@ #include <linux/percpu.h> #include <linux/slab.h> #include <asm/prom.h> +#include <asm/cputhreads.h> +#include <asm/smp.h> #include "cacheinfo.h" @@ -627,17 +629,48 @@ static ssize_t level_show(struct kobject *k, struct kobj_attribute *attr, char * static struct kobj_attribute cache_level_attr = __ATTR(level, 0444, level_show, NULL); +static unsigned int index_dir_to_cpu(struct cache_index_dir *index) +{ + struct kobject *index_dir_kobj = &index->kobj; + struct kobject *cache_dir_kobj = index_dir_kobj->parent; + struct kobject *cpu_dev_kobj = cache_dir_kobj->parent; + struct device *dev = kobj_to_dev(cpu_dev_kobj); + + return dev->id; +} + +/* + * On big-core systems, each core has two groups of CPUs each of which + * has its own L1-cache. The thread-siblings which share l1-cache with + * @cpu can be obtained via cpu_smallcore_mask(). + */ +static const struct cpumask *get_big_core_shared_cpu_map(int cpu, struct cache *cache) +{ + if (cache->level == 1) + return cpu_smallcore_mask(cpu); + + return &cache->shared_cpu_map; +} + static ssize_t shared_cpu_map_show(struct kobject *k, struct kobj_attribute *attr, char *buf) { struct cache_index_dir *index; struct cache *cache; - int ret; + const struct cpumask *mask; + int ret, cpu; index = kobj_to_cache_index_dir(k); cache = index->cache; + if (has_big_cores) { + cpu = index_dir_to_cpu(index); + mask = get_big_core_shared_cpu_map(cpu, cache); + } else { + mask = &cache->shared_cpu_map; + } + ret = scnprintf(buf, PAGE_SIZE - 1, "%*pb\n", - cpumask_pr_args(&cache->shared_cpu_map)); + cpumask_pr_args(mask)); buf[ret++] = '\n'; buf[ret] = '\0'; return ret; diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c @@ -110,7 +110,7 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf, vaddr = __va(paddr); csize = copy_oldmem_vaddr(vaddr, buf, csize, offset, userbuf); } else { - vaddr = __ioremap(paddr, PAGE_SIZE, 0); + vaddr = ioremap_cache(paddr, PAGE_SIZE); csize = copy_oldmem_vaddr(vaddr, buf, csize, offset, userbuf); iounmap(vaddr); } diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c @@ -169,6 +169,11 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len) int n = 0, l = 0; char buffer[128]; + if (!pdn) { + pr_warn("EEH: Note: No error log for absent device.\n"); + return 0; + } + n += scnprintf(buf+n, len-n, "%04x:%02x:%02x.%01x\n", pdn->phb->global_number, pdn->busno, PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn)); @@ -399,7 +404,7 @@ static int eeh_phb_check_failure(struct eeh_pe *pe) } /* Isolate the PHB and send event */ - eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED); + eeh_pe_mark_isolated(phb_pe); eeh_serialize_unlock(flags); pr_err("EEH: PHB#%x failure detected, location: %s\n", @@ -558,7 +563,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev) * with other functions on this device, and functions under * bridges. */ - eeh_pe_state_mark(pe, EEH_PE_ISOLATED); + eeh_pe_mark_isolated(pe); eeh_serialize_unlock(flags); /* Most EEH events are due to device driver bugs. Having @@ -676,7 +681,7 @@ int eeh_pci_enable(struct eeh_pe *pe, int function) /* Check if the request is finished successfully */ if (active_flag) { - rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC); + rc = eeh_wait_state(pe, PCI_BUS_RESET_WAIT_MSEC); if (rc < 0) return rc; @@ -825,7 +830,8 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat eeh_pe_state_clear(pe, EEH_PE_ISOLATED); break; case pcie_hot_reset: - eeh_pe_state_mark_with_cfg(pe, EEH_PE_ISOLATED); + eeh_pe_mark_isolated(pe); + eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED); eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE); eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev); if (!(pe->type & EEH_PE_VF)) @@ -833,7 +839,8 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat eeh_ops->reset(pe, EEH_RESET_HOT); break; case pcie_warm_reset: - eeh_pe_state_mark_with_cfg(pe, EEH_PE_ISOLATED); + eeh_pe_mark_isolated(pe); + eeh_pe_state_clear(pe, EEH_PE_CFG_BLOCKED); eeh_ops->set_option(pe, EEH_OPT_FREEZE_PE); eeh_pe_dev_traverse(pe, eeh_disable_and_save_dev_state, dev); if (!(pe->type & EEH_PE_VF)) @@ -913,16 +920,15 @@ int eeh_pe_reset_full(struct eeh_pe *pe) break; /* Wait until the PE is in a functioning state */ - state = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC); - if (eeh_state_active(state)) - break; - + state = eeh_wait_state(pe, PCI_BUS_RESET_WAIT_MSEC); if (state < 0) { pr_warn("%s: Unrecoverable slot failure on PHB#%x-PE#%x", __func__, pe->phb->global_number, pe->addr); ret = -ENOTRECOVERABLE; break; } + if (eeh_state_active(state)) + break; /* Set error in case this is our last attempt */ ret = -EIO; @@ -1036,6 +1042,11 @@ void eeh_probe_devices(void) pdn = hose->pci_data; traverse_pci_dn(pdn, eeh_ops->probe, NULL); } + if (eeh_enabled()) + pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n"); + else + pr_info("EEH: No capable adapters found\n"); + } /** @@ -1079,18 +1090,7 @@ static int eeh_init(void) eeh_dev_phb_init_dynamic(hose); /* Initialize EEH event */ - ret = eeh_event_init(); - if (ret) - return ret; - - eeh_probe_devices(); - - if (eeh_enabled()) - pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n"); - else if (!eeh_has_flag(EEH_POSTPONED_PROBE)) - pr_info("EEH: No capable adapters found\n"); - - return ret; + return eeh_event_init(); } core_initcall_sync(eeh_init); diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c @@ -60,8 +60,6 @@ struct eeh_dev *eeh_dev_init(struct pci_dn *pdn) /* Associate EEH device with OF node */ pdn->edev = edev; edev->pdn = pdn; - INIT_LIST_HEAD(&edev->list); - INIT_LIST_HEAD(&edev->rmv_list); return edev; } diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c @@ -35,8 +35,8 @@ #include <asm/rtas.h> struct eeh_rmv_data { - struct list_head edev_list; - int removed; + struct list_head removed_vf_list; + int removed_dev_count; }; static int eeh_result_priority(enum pci_ers_result result) @@ -281,6 +281,10 @@ static void eeh_pe_report_edev(struct eeh_dev *edev, eeh_report_fn fn, struct pci_driver *driver; enum pci_ers_result new_result; + if (!edev->pdev) { + eeh_edev_info(edev, "no device"); + return; + } device_lock(&edev->pdev->dev); if (eeh_edev_actionable(edev)) { driver = eeh_pcid_get(edev->pdev); @@ -400,7 +404,7 @@ static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata) * EEH device is created. */ if (edev->pe && (edev->pe->state & EEH_PE_CFG_RESTRICTED)) { - if (list_is_last(&edev->list, &edev->pe->edevs)) + if (list_is_last(&edev->entry, &edev->pe->edevs)) eeh_pe_restore_bars(edev->pe); return NULL; @@ -465,10 +469,9 @@ static enum pci_ers_result eeh_report_failure(struct eeh_dev *edev, return rc; } -static void *eeh_add_virt_device(void *data, void *userdata) +static void *eeh_add_virt_device(struct eeh_dev *edev) { struct pci_driver *driver; - struct eeh_dev *edev = (struct eeh_dev *)data; struct pci_dev *dev = eeh_dev_to_pci_dev(edev); struct pci_dn *pdn = eeh_dev_to_pdn(edev); @@ -499,7 +502,6 @@ static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata) struct pci_driver *driver; struct pci_dev *dev = eeh_dev_to_pci_dev(edev); struct eeh_rmv_data *rmv_data = (struct eeh_rmv_data *)userdata; - int *removed = rmv_data ? &rmv_data->removed : NULL; /* * Actually, we should remove the PCI bridges as well. @@ -521,7 +523,7 @@ static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata) if (eeh_dev_removed(edev)) return NULL; - if (removed) { + if (rmv_data) { if (eeh_pe_passed(edev->pe)) return NULL; driver = eeh_pcid_get(dev); @@ -539,10 +541,9 @@ static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata) /* Remove it from PCI subsystem */ pr_debug("EEH: Removing %s without EEH sensitive driver\n", pci_name(dev)); - edev->bus = dev->bus; edev->mode |= EEH_DEV_DISCONNECTED; - if (removed) - (*removed)++; + if (rmv_data) + rmv_data->removed_dev_count++; if (edev->physfn) { #ifdef CONFIG_PCI_IOV @@ -558,7 +559,7 @@ static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata) pdn->pe_number = IODA_INVALID_PE; #endif if (rmv_data) - list_add(&edev->rmv_list, &rmv_data->edev_list); + list_add(&edev->rmv_entry, &rmv_data->removed_vf_list); } else { pci_lock_rescan_remove(); pci_stop_and_remove_bus_device(dev); @@ -727,7 +728,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, * the device up before the scripts have taken it down, * potentially weird things happen. */ - if (!driver_eeh_aware || rmv_data->removed) { + if (!driver_eeh_aware || rmv_data->removed_dev_count) { pr_info("EEH: Sleep 5s ahead of %s hotplug\n", (driver_eeh_aware ? "partial" : "complete")); ssleep(5); @@ -737,10 +738,10 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, * PE. We should disconnect it so the binding can be * rebuilt when adding PCI devices. */ - edev = list_first_entry(&pe->edevs, struct eeh_dev, list); + edev = list_first_entry(&pe->edevs, struct eeh_dev, entry); eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL); if (pe->type & EEH_PE_VF) { - eeh_add_virt_device(edev, NULL); + eeh_add_virt_device(edev); } else { if (!driver_eeh_aware) eeh_pe_state_clear(pe, EEH_PE_PRI_BUS); @@ -789,7 +790,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe) struct eeh_pe *tmp_pe; int rc = 0; enum pci_ers_result result = PCI_ERS_RESULT_NONE; - struct eeh_rmv_data rmv_data = {LIST_HEAD_INIT(rmv_data.edev_list), 0}; + struct eeh_rmv_data rmv_data = + {LIST_HEAD_INIT(rmv_data.removed_vf_list), 0}; bus = eeh_pe_bus_get(pe); if (!bus) { @@ -806,10 +808,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe) pr_err("EEH: PHB#%x-PE#%x has failed %d times in the last hour and has been permanently disabled.\n", pe->phb->global_number, pe->addr, pe->freeze_count); - goto hard_fail; + result = PCI_ERS_RESULT_DISCONNECT; } - pr_warn("EEH: This PCI device has failed %d times in the last hour and will be permanently disabled after %d failures.\n", - pe->freeze_count, eeh_max_freezes); /* Walk the various device drivers attached to this slot through * a reset sequence, giving each an opportunity to do what it needs @@ -821,31 +821,39 @@ void eeh_handle_normal_event(struct eeh_pe *pe) * the error. Override the result if necessary to have partially * hotplug for this case. */ - pr_info("EEH: Notify device drivers to shutdown\n"); - eeh_set_channel_state(pe, pci_channel_io_frozen); - eeh_set_irq_state(pe, false); - eeh_pe_report("error_detected(IO frozen)", pe, eeh_report_error, - &result); - if ((pe->type & EEH_PE_PHB) && - result != PCI_ERS_RESULT_NONE && - result != PCI_ERS_RESULT_NEED_RESET) - result = PCI_ERS_RESULT_NEED_RESET; + if (result != PCI_ERS_RESULT_DISCONNECT) { + pr_warn("EEH: This PCI device has failed %d times in the last hour and will be permanently disabled after %d failures.\n", + pe->freeze_count, eeh_max_freezes); + pr_info("EEH: Notify device drivers to shutdown\n"); + eeh_set_channel_state(pe, pci_channel_io_frozen); + eeh_set_irq_state(pe, false); + eeh_pe_report("error_detected(IO frozen)", pe, + eeh_report_error, &result); + if ((pe->type & EEH_PE_PHB) && + result != PCI_ERS_RESULT_NONE && + result != PCI_ERS_RESULT_NEED_RESET) + result = PCI_ERS_RESULT_NEED_RESET; + } /* Get the current PCI slot state. This can take a long time, * sometimes over 300 seconds for certain systems. */ - rc = eeh_ops->wait_state(pe, MAX_WAIT_FOR_RECOVERY*1000); - if (rc < 0 || rc == EEH_STATE_NOT_SUPPORT) { - pr_warn("EEH: Permanent failure\n"); - goto hard_fail; + if (result != PCI_ERS_RESULT_DISCONNECT) { + rc = eeh_wait_state(pe, MAX_WAIT_FOR_RECOVERY*1000); + if (rc < 0 || rc == EEH_STATE_NOT_SUPPORT) { + pr_warn("EEH: Permanent failure\n"); + result = PCI_ERS_RESULT_DISCONNECT; + } } /* Since rtas may enable MMIO when posting the error log, * don't post the error log until after all dev drivers * have been informed. */ - pr_info("EEH: Collect temporary log\n"); - eeh_slot_error_detail(pe, EEH_LOG_TEMP); + if (result != PCI_ERS_RESULT_DISCONNECT) { + pr_info("EEH: Collect temporary log\n"); + eeh_slot_error_detail(pe, EEH_LOG_TEMP); + } /* If all device drivers were EEH-unaware, then shut * down all of the device drivers, and hope they @@ -857,7 +865,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) if (rc) { pr_warn("%s: Unable to reset, err=%d\n", __func__, rc); - goto hard_fail; + result = PCI_ERS_RESULT_DISCONNECT; } } @@ -866,9 +874,9 @@ void eeh_handle_normal_event(struct eeh_pe *pe) pr_info("EEH: Enable I/O for affected devices\n"); rc = eeh_pci_enable(pe, EEH_OPT_THAW_MMIO); - if (rc < 0) - goto hard_fail; - if (rc) { + if (rc < 0) { + result = PCI_ERS_RESULT_DISCONNECT; + } else if (rc) { result = PCI_ERS_RESULT_NEED_RESET; } else { pr_info("EEH: Notify device drivers to resume I/O\n"); @@ -882,9 +890,9 @@ void eeh_handle_normal_event(struct eeh_pe *pe) pr_info("EEH: Enabled DMA for affected devices\n"); rc = eeh_pci_enable(pe, EEH_OPT_THAW_DMA); - if (rc < 0) - goto hard_fail; - if (rc) { + if (rc < 0) { + result = PCI_ERS_RESULT_DISCONNECT; + } else if (rc) { result = PCI_ERS_RESULT_NEED_RESET; } else { /* @@ -897,12 +905,6 @@ void eeh_handle_normal_event(struct eeh_pe *pe) } } - /* If any device has a hard failure, then shut off everything. */ - if (result == PCI_ERS_RESULT_DISCONNECT) { - pr_warn("EEH: Device driver gave up\n"); - goto hard_fail; - } - /* If any device called out for a reset, then reset the slot */ if (result == PCI_ERS_RESULT_NEED_RESET) { pr_info("EEH: Reset without hotplug activity\n"); @@ -910,88 +912,81 @@ void eeh_handle_normal_event(struct eeh_pe *pe) if (rc) { pr_warn("%s: Cannot reset, err=%d\n", __func__, rc); - goto hard_fail; + result = PCI_ERS_RESULT_DISCONNECT; + } else { + result = PCI_ERS_RESULT_NONE; + eeh_set_channel_state(pe, pci_channel_io_normal); + eeh_set_irq_state(pe, true); + eeh_pe_report("slot_reset", pe, eeh_report_reset, + &result); } - - pr_info("EEH: Notify device drivers " - "the completion of reset\n"); - result = PCI_ERS_RESULT_NONE; - eeh_set_channel_state(pe, pci_channel_io_normal); - eeh_set_irq_state(pe, true); - eeh_pe_report("slot_reset", pe, eeh_report_reset, &result); - } - - /* All devices should claim they have recovered by now. */ - if ((result != PCI_ERS_RESULT_RECOVERED) && - (result != PCI_ERS_RESULT_NONE)) { - pr_warn("EEH: Not recovered\n"); - goto hard_fail; - } - - /* - * For those hot removed VFs, we should add back them after PF get - * recovered properly. - */ - list_for_each_entry_safe(edev, tmp, &rmv_data.edev_list, rmv_list) { - eeh_add_virt_device(edev, NULL); - list_del(&edev->rmv_list); } - /* Tell all device drivers that they can resume operations */ - pr_info("EEH: Notify device driver to resume\n"); - eeh_set_channel_state(pe, pci_channel_io_normal); - eeh_set_irq_state(pe, true); - eeh_pe_report("resume", pe, eeh_report_resume, NULL); - eeh_for_each_pe(pe, tmp_pe) { - eeh_pe_for_each_dev(tmp_pe, edev, tmp) { - edev->mode &= ~EEH_DEV_NO_HANDLER; - edev->in_error = false; + if ((result == PCI_ERS_RESULT_RECOVERED) || + (result == PCI_ERS_RESULT_NONE)) { + /* + * For those hot removed VFs, we should add back them after PF + * get recovered properly. + */ + list_for_each_entry_safe(edev, tmp, &rmv_data.removed_vf_list, + rmv_entry) { + eeh_add_virt_device(edev); + list_del(&edev->rmv_entry); } - } - pr_info("EEH: Recovery successful.\n"); - goto final; + /* Tell all device drivers that they can resume operations */ + pr_info("EEH: Notify device driver to resume\n"); + eeh_set_channel_state(pe, pci_channel_io_normal); + eeh_set_irq_state(pe, true); + eeh_pe_report("resume", pe, eeh_report_resume, NULL); + eeh_for_each_pe(pe, tmp_pe) { + eeh_pe_for_each_dev(tmp_pe, edev, tmp) { + edev->mode &= ~EEH_DEV_NO_HANDLER; + edev->in_error = false; + } + } -hard_fail: - /* - * About 90% of all real-life EEH failures in the field - * are due to poorly seated PCI cards. Only 10% or so are - * due to actual, failed cards. - */ - pr_err("EEH: Unable to recover from failure from PHB#%x-PE#%x.\n" - "Please try reseating or replacing it\n", - pe->phb->global_number, pe->addr); + pr_info("EEH: Recovery successful.\n"); + } else { + /* + * About 90% of all real-life EEH failures in the field + * are due to poorly seated PCI cards. Only 10% or so are + * due to actual, failed cards. + */ + pr_err("EEH: Unable to recover from failure from PHB#%x-PE#%x.\n" + "Please try reseating or replacing it\n", + pe->phb->global_number, pe->addr); - eeh_slot_error_detail(pe, EEH_LOG_PERM); + eeh_slot_error_detail(pe, EEH_LOG_PERM); - /* Notify all devices that they're about to go down. */ - eeh_set_channel_state(pe, pci_channel_io_perm_failure); - eeh_set_irq_state(pe, false); - eeh_pe_report("error_detected(permanent failure)", pe, - eeh_report_failure, NULL); + /* Notify all devices that they're about to go down. */ + eeh_set_channel_state(pe, pci_channel_io_perm_failure); + eeh_set_irq_state(pe, false); + eeh_pe_report("error_detected(permanent failure)", pe, + eeh_report_failure, NULL); - /* Mark the PE to be removed permanently */ - eeh_pe_state_mark(pe, EEH_PE_REMOVED); + /* Mark the PE to be removed permanently */ + eeh_pe_state_mark(pe, EEH_PE_REMOVED); - /* - * Shut down the device drivers for good. We mark - * all removed devices correctly to avoid access - * the their PCI config any more. - */ - if (pe->type & EEH_PE_VF) { - eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL); - eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED); - } else { - eeh_pe_state_clear(pe, EEH_PE_PRI_BUS); - eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED); + /* + * Shut down the device drivers for good. We mark + * all removed devices correctly to avoid access + * the their PCI config any more. + */ + if (pe->type & EEH_PE_VF) { + eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL); + eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED); + } else { + eeh_pe_state_clear(pe, EEH_PE_PRI_BUS); + eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED); - pci_lock_rescan_remove(); - pci_hp_remove_devices(bus); - pci_unlock_rescan_remove(); - /* The passed PE should no longer be used */ - return; + pci_lock_rescan_remove(); + pci_hp_remove_devices(bus); + pci_unlock_rescan_remove(); + /* The passed PE should no longer be used */ + return; + } } -final: eeh_pe_state_clear(pe, EEH_PE_RECOVERING); } @@ -1026,7 +1021,7 @@ void eeh_handle_special_event(void) phb_pe = eeh_phb_pe_get(hose); if (!phb_pe) continue; - eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED); + eeh_pe_mark_isolated(phb_pe); } eeh_serialize_unlock(flags); @@ -1041,11 +1036,9 @@ void eeh_handle_special_event(void) /* Purge all events of the PHB */ eeh_remove_event(pe, true); - if (rc == EEH_NEXT_ERR_DEAD_PHB) - eeh_pe_state_mark(pe, EEH_PE_ISOLATED); - else - eeh_pe_state_mark(pe, - EEH_PE_ISOLATED | EEH_PE_RECOVERING); + if (rc != EEH_NEXT_ERR_DEAD_PHB) + eeh_pe_state_mark(pe, EEH_PE_RECOVERING); + eeh_pe_mark_isolated(pe); eeh_serialize_unlock(flags); diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c @@ -75,7 +75,6 @@ static struct eeh_pe *eeh_pe_alloc(struct pci_controller *phb, int type) pe->type = type; pe->phb = phb; INIT_LIST_HEAD(&pe->child_list); - INIT_LIST_HEAD(&pe->child); INIT_LIST_HEAD(&pe->edevs); pe->data = (void *)pe + ALIGN(sizeof(struct eeh_pe), @@ -110,6 +109,57 @@ int eeh_phb_pe_create(struct pci_controller *phb) } /** + * eeh_wait_state - Wait for PE state + * @pe: EEH PE + * @max_wait: maximal period in millisecond + * + * Wait for the state of associated PE. It might take some time + * to retrieve the PE's state. + */ +int eeh_wait_state(struct eeh_pe *pe, int max_wait) +{ + int ret; + int mwait; + + /* + * According to PAPR, the state of PE might be temporarily + * unavailable. Under the circumstance, we have to wait + * for indicated time determined by firmware. The maximal + * wait time is 5 minutes, which is acquired from the original + * EEH implementation. Also, the original implementation + * also defined the minimal wait time as 1 second. + */ +#define EEH_STATE_MIN_WAIT_TIME (1000) +#define EEH_STATE_MAX_WAIT_TIME (300 * 1000) + + while (1) { + ret = eeh_ops->get_state(pe, &mwait); + + if (ret != EEH_STATE_UNAVAILABLE) + return ret; + + if (max_wait <= 0) { + pr_warn("%s: Timeout when getting PE's state (%d)\n", + __func__, max_wait); + return EEH_STATE_NOT_SUPPORT; + } + + if (mwait < EEH_STATE_MIN_WAIT_TIME) { + pr_warn("%s: Firmware returned bad wait value %d\n", + __func__, mwait); + mwait = EEH_STATE_MIN_WAIT_TIME; + } else if (mwait > EEH_STATE_MAX_WAIT_TIME) { + pr_warn("%s: Firmware returned too long wait value %d\n", + __func__, mwait); + mwait = EEH_STATE_MAX_WAIT_TIME; + } + + msleep(min(mwait, max_wait)); + max_wait -= mwait; + } +} + +/** * eeh_phb_pe_get - Retrieve PHB PE based on the given PHB * @phb: PCI controller * @@ -360,7 +410,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev) edev->pe = pe; /* Put the edev to PE */ - list_add_tail(&edev->list, &pe->edevs); + list_add_tail(&edev->entry, &pe->edevs); pr_debug("EEH: Add %04x:%02x:%02x.%01x to Bus PE#%x\n", pdn->phb->global_number, pdn->busno, @@ -369,7 +419,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev) pe->addr); return 0; } else if (pe && (pe->type & EEH_PE_INVALID)) { - list_add_tail(&edev->list, &pe->edevs); + list_add_tail(&edev->entry, &pe->edevs); edev->pe = pe; /* * We're running to here because of PCI hotplug caused by @@ -379,7 +429,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev) while (parent) { if (!(parent->type & EEH_PE_INVALID)) break; - parent->type &= ~(EEH_PE_INVALID | EEH_PE_KEEP); + parent->type &= ~EEH_PE_INVALID; parent = parent->parent; } @@ -429,7 +479,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev) * link the EEH device accordingly. */ list_add_tail(&pe->child, &parent->child_list); - list_add_tail(&edev->list, &pe->edevs); + list_add_tail(&edev->entry, &pe->edevs); edev->pe = pe; pr_debug("EEH: Add %04x:%02x:%02x.%01x to " "Device PE#%x, Parent PE#%x\n", @@ -457,7 +507,8 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev) int cnt; struct pci_dn *pdn = eeh_dev_to_pdn(edev); - if (!edev->pe) { + pe = eeh_dev_to_pe(edev); + if (!pe) { pr_debug("%s: No PE found for device %04x:%02x:%02x.%01x\n", __func__, pdn->phb->global_number, pdn->busno, @@ -467,9 +518,8 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev) } /* Remove the EEH device */ - pe = eeh_dev_to_pe(edev); edev->pe = NULL; - list_del(&edev->list); + list_del(&edev->entry); /* * Check if the parent PE includes any EEH devices. @@ -541,56 +591,50 @@ void eeh_pe_update_time_stamp(struct eeh_pe *pe) } /** - * __eeh_pe_state_mark - Mark the state for the PE - * @data: EEH PE - * @flag: state + * eeh_pe_state_mark - Mark specified state for PE and its associated device + * @pe: EEH PE * - * The function is used to mark the indicated state for the given - * PE. Also, the associated PCI devices will be put into IO frozen - * state as well. + * EEH error affects the current PE and its child PEs. The function + * is used to mark appropriate state for the affected PEs and the + * associated devices. */ -static void *__eeh_pe_state_mark(struct eeh_pe *pe, void *flag) +void eeh_pe_state_mark(struct eeh_pe *root, int state) { - int state = *((int *)flag); - struct eeh_dev *edev, *tmp; - struct pci_dev *pdev; - - /* Keep the state of permanently removed PE intact */ - if (pe->state & EEH_PE_REMOVED) - return NULL; - - pe->state |= state; - - /* Offline PCI devices if applicable */ - if (!(state & EEH_PE_ISOLATED)) - return NULL; - - eeh_pe_for_each_dev(pe, edev, tmp) { - pdev = eeh_dev_to_pci_dev(edev); - if (pdev) - pdev->error_state = pci_channel_io_frozen; - } - - /* Block PCI config access if required */ - if (pe->state & EEH_PE_CFG_RESTRICTED) - pe->state |= EEH_PE_CFG_BLOCKED; + struct eeh_pe *pe; - return NULL; + eeh_for_each_pe(root, pe) + if (!(pe->state & EEH_PE_REMOVED)) + pe->state |= state; } +EXPORT_SYMBOL_GPL(eeh_pe_state_mark); /** - * eeh_pe_state_mark - Mark specified state for PE and its associated device + * eeh_pe_mark_isolated * @pe: EEH PE * - * EEH error affects the current PE and its child PEs. The function - * is used to mark appropriate state for the affected PEs and the - * associated devices. + * Record that a PE has been isolated by marking the PE and it's children as + * EEH_PE_ISOLATED (and EEH_PE_CFG_BLOCKED, if required) and their PCI devices + * as pci_channel_io_frozen. */ -void eeh_pe_state_mark(struct eeh_pe *pe, int state) +void eeh_pe_mark_isolated(struct eeh_pe *root) { - eeh_pe_traverse(pe, __eeh_pe_state_mark, &state); + struct eeh_pe *pe; + struct eeh_dev *edev; + struct pci_dev *pdev; + + eeh_pe_state_mark(root, EEH_PE_ISOLATED); + eeh_for_each_pe(root, pe) { + list_for_each_entry(edev, &pe->edevs, entry) { + pdev = eeh_dev_to_pci_dev(edev); + if (pdev) + pdev->error_state = pci_channel_io_frozen; + } + /* Block PCI config access if required */ + if (pe->state & EEH_PE_CFG_RESTRICTED) + pe->state |= EEH_PE_CFG_BLOCKED; + } } -EXPORT_SYMBOL_GPL(eeh_pe_state_mark); +EXPORT_SYMBOL_GPL(eeh_pe_mark_isolated); static void *__eeh_pe_dev_mode_mark(struct eeh_dev *edev, void *flag) { @@ -671,28 +715,6 @@ void eeh_pe_state_clear(struct eeh_pe *pe, int state) eeh_pe_traverse(pe, __eeh_pe_state_clear, &state); } -/** - * eeh_pe_state_mark_with_cfg - Mark PE state with unblocked config space - * @pe: PE - * @state: PE state to be set - * - * Set specified flag to PE and its child PEs. The PCI config space - * of some PEs is blocked automatically when EEH_PE_ISOLATED is set, - * which isn't needed in some situations. The function allows to set - * the specified flag to indicated PEs without blocking their PCI - * config space. - */ -void eeh_pe_state_mark_with_cfg(struct eeh_pe *pe, int state) -{ - eeh_pe_traverse(pe, __eeh_pe_state_mark, &state); - if (!(state & EEH_PE_ISOLATED)) - return; - - /* Clear EEH_PE_CFG_BLOCKED, which might be set just now */ - state = EEH_PE_CFG_BLOCKED; - eeh_pe_traverse(pe, __eeh_pe_state_clear, &state); -} - /* * Some PCI bridges (e.g. PLX bridges) have primary/secondary * buses assigned explicitly by firmware, and we probably have @@ -945,7 +967,7 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe) return pe->bus; /* Retrieve the parent PCI bus of first (top) PCI device */ - edev = list_first_entry_or_null(&pe->edevs, struct eeh_dev, list); + edev = list_first_entry_or_null(&pe->edevs, struct eeh_dev, entry); pdev = eeh_dev_to_pci_dev(edev); if (pdev) return pdev->bus; diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S @@ -794,7 +794,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_601) lis r10,MSR_KERNEL@h ori r10,r10,MSR_KERNEL@l bl transfer_to_handler_full - .long nonrecoverable_exception + .long unrecoverable_exception .long ret_from_except #endif @@ -1297,7 +1297,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_601) rlwinm r3,r3,0,0,30 stw r3,_TRAP(r1) 4: addi r3,r1,STACK_FRAME_OVERHEAD - bl nonrecoverable_exception + bl unrecoverable_exception /* shouldn't return */ b 4b diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S @@ -171,7 +171,7 @@ system_call: /* label this so stack traces look sane */ * based on caller's run-mode / personality. */ ld r11,SYS_CALL_TABLE@toc(2) - andi. r10,r10,_TIF_32BIT + andis. r10,r10,_TIF_32BIT@h beq 15f addi r11,r11,8 /* use 32-bit syscall entries */ clrldi r3,r3,32 @@ -386,10 +386,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) 4: /* Anything else left to do? */ BEGIN_FTR_SECTION - lis r3,INIT_PPR@highest /* Set thread.ppr = 3 */ - ld r10,PACACURRENT(r13) + lis r3,DEFAULT_PPR@highest /* Set default PPR */ sldi r3,r3,32 /* bits 11-13 are used for ppr */ - std r3,TASKTHREADPPR(r10) + std r3,_PPR(r1) END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP) @@ -624,6 +623,10 @@ _GLOBAL(_switch) addi r6,r4,-THREAD /* Convert THREAD to 'current' */ std r6,PACACURRENT(r13) /* Set new 'current' */ +#if defined(CONFIG_STACKPROTECTOR) + ld r6, TASK_CANARY(r6) + std r6, PACA_CANARY(r13) +#endif ld r8,KSP(r4) /* new stack pointer */ #ifdef CONFIG_PPC_BOOK3S_64 @@ -672,7 +675,9 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT) isync slbie r6 +BEGIN_FTR_SECTION slbie r6 /* Workaround POWER5 < DD2.1 issue */ +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) slbmte r7,r0 isync 2: @@ -936,12 +941,6 @@ fast_exception_return: andi. r0,r3,MSR_RI beq- .Lunrecov_restore - /* Load PPR from thread struct before we clear MSR:RI */ -BEGIN_FTR_SECTION - ld r2,PACACURRENT(r13) - ld r2,TASKTHREADPPR(r2) -END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) - /* * Clear RI before restoring r13. If we are returning to * userspace and we take an exception after restoring r13, @@ -962,7 +961,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) andi. r0,r3,MSR_PR beq 1f BEGIN_FTR_SECTION - mtspr SPRN_PPR,r2 /* Restore PPR */ + /* Restore PPR */ + ld r2,_PPR(r1) + mtspr SPRN_PPR,r2 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ACCOUNT_CPU_USER_EXIT(r13, r2, r4) REST_GPR(13, r1) @@ -1118,7 +1119,7 @@ _ASM_NOKPROBE_SYMBOL(fast_exception_return); _GLOBAL(enter_rtas) mflr r0 std r0,16(r1) - stdu r1,-RTAS_FRAME_SIZE(r1) /* Save SP and create stack space. */ + stdu r1,-SWITCH_FRAME_SIZE(r1) /* Save SP and create stack space. */ /* Because RTAS is running in 32b mode, it clobbers the high order half * of all registers that it saves. We therefore save those registers @@ -1250,7 +1251,7 @@ rtas_restore_regs: ld r8,_DSISR(r1) mtdsisr r8 - addi r1,r1,RTAS_FRAME_SIZE /* Unstack our frame */ + addi r1,r1,SWITCH_FRAME_SIZE /* Unstack our frame */ ld r0,16(r1) /* get return address */ mtlr r0 @@ -1261,7 +1262,7 @@ rtas_restore_regs: _GLOBAL(enter_prom) mflr r0 std r0,16(r1) - stdu r1,-PROM_FRAME_SIZE(r1) /* Save SP and create stack space */ + stdu r1,-SWITCH_FRAME_SIZE(r1) /* Save SP and create stack space */ /* Because PROM is running in 32b mode, it clobbers the high order half * of all registers that it saves. We therefore save those registers @@ -1318,8 +1319,8 @@ _GLOBAL(enter_prom) REST_10GPRS(22, r1) ld r4,_CCR(r1) mtcr r4 - - addi r1,r1,PROM_FRAME_SIZE + + addi r1,r1,SWITCH_FRAME_SIZE ld r0,16(r1) mtlr r0 blr diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S @@ -244,14 +244,13 @@ EXC_REAL_BEGIN(machine_check, 0x200, 0x100) SET_SCRATCH0(r13) /* save r13 */ EXCEPTION_PROLOG_0(PACA_EXMC) BEGIN_FTR_SECTION - b machine_check_powernv_early + b machine_check_common_early FTR_SECTION_ELSE b machine_check_pSeries_0 ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE) EXC_REAL_END(machine_check, 0x200, 0x100) EXC_VIRT_NONE(0x4200, 0x100) -TRAMP_REAL_BEGIN(machine_check_powernv_early) -BEGIN_FTR_SECTION +TRAMP_REAL_BEGIN(machine_check_common_early) EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200) /* * Register contents: @@ -305,7 +304,9 @@ BEGIN_FTR_SECTION /* Save r9 through r13 from EXMC save area to stack frame. */ EXCEPTION_PROLOG_COMMON_2(PACA_EXMC) mfmsr r11 /* get MSR value */ +BEGIN_FTR_SECTION ori r11,r11,MSR_ME /* turn on ME bit */ +END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) ori r11,r11,MSR_RI /* turn on RI bit */ LOAD_HANDLER(r12, machine_check_handle_early) 1: mtspr SPRN_SRR0,r12 @@ -324,13 +325,15 @@ BEGIN_FTR_SECTION andc r11,r11,r10 /* Turn off MSR_ME */ b 1b b . /* prevent speculative execution */ -END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) TRAMP_REAL_BEGIN(machine_check_pSeries) .globl machine_check_fwnmi machine_check_fwnmi: SET_SCRATCH0(r13) /* save r13 */ EXCEPTION_PROLOG_0(PACA_EXMC) +BEGIN_FTR_SECTION + b machine_check_common_early +END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE) machine_check_pSeries_0: EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200) /* @@ -440,6 +443,9 @@ EXC_COMMON_BEGIN(machine_check_handle_early) bl machine_check_early std r3,RESULT(r1) /* Save result */ ld r12,_MSR(r1) +BEGIN_FTR_SECTION + b 4f +END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE) #ifdef CONFIG_PPC_P7_NAP /* @@ -463,11 +469,12 @@ EXC_COMMON_BEGIN(machine_check_handle_early) */ rldicl. r11,r12,4,63 /* See if MC hit while in HV mode. */ beq 5f - andi. r11,r12,MSR_PR /* See if coming from user. */ +4: andi. r11,r12,MSR_PR /* See if coming from user. */ bne 9f /* continue in V mode if we are. */ 5: #ifdef CONFIG_KVM_BOOK3S_64_HANDLER +BEGIN_FTR_SECTION /* * We are coming from kernel context. Check if we are coming from * guest. if yes, then we can continue. We will fall through @@ -476,6 +483,7 @@ EXC_COMMON_BEGIN(machine_check_handle_early) lbz r11,HSTATE_IN_GUEST(r13) cmpwi r11,0 /* Check if coming from guest */ bne 9f /* continue if we are. */ +END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) #endif /* * At this point we are not sure about what context we come from. @@ -510,6 +518,7 @@ EXC_COMMON_BEGIN(machine_check_handle_early) cmpdi r3,0 /* see if we handled MCE successfully */ beq 1b /* if !handled then panic */ +BEGIN_FTR_SECTION /* * Return from MC interrupt. * Queue up the MCE event so that we can log it later, while @@ -518,10 +527,24 @@ EXC_COMMON_BEGIN(machine_check_handle_early) bl machine_check_queue_event MACHINE_CHECK_HANDLER_WINDUP RFI_TO_USER_OR_KERNEL +FTR_SECTION_ELSE + /* + * pSeries: Return from MC interrupt. Before that stay on emergency + * stack and call machine_check_exception to log the MCE event. + */ + LOAD_HANDLER(r10,mce_return) + mtspr SPRN_SRR0,r10 + ld r10,PACAKMSR(r13) + mtspr SPRN_SRR1,r10 + RFI_TO_KERNEL + b . +ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE) 9: /* Deliver the machine check to host kernel in V mode. */ MACHINE_CHECK_HANDLER_WINDUP - b machine_check_pSeries + SET_SCRATCH0(r13) /* save r13 */ + EXCEPTION_PROLOG_0(PACA_EXMC) + b machine_check_pSeries_0 EXC_COMMON_BEGIN(unrecover_mce) /* Invoke machine_check_exception to print MCE event and panic. */ @@ -535,6 +558,13 @@ EXC_COMMON_BEGIN(unrecover_mce) bl unrecoverable_exception b 1b +EXC_COMMON_BEGIN(mce_return) + /* Invoke machine_check_exception to print MCE event and return. */ + addi r3,r1,STACK_FRAME_OVERHEAD + bl machine_check_exception + MACHINE_CHECK_HANDLER_WINDUP + RFI_TO_KERNEL + b . EXC_REAL(data_access, 0x300, 0x80) EXC_VIRT(data_access, 0x4300, 0x80, 0x300) @@ -566,28 +596,36 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80) - SET_SCRATCH0(r13) - EXCEPTION_PROLOG_0(PACA_EXSLB) - EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x380) - mr r12,r3 /* save r3 */ - mfspr r3,SPRN_DAR - mfspr r11,SPRN_SRR1 - crset 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_common) +EXCEPTION_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, KVMTEST_PR, 0x380); EXC_REAL_END(data_access_slb, 0x380, 0x80) EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) - SET_SCRATCH0(r13) - EXCEPTION_PROLOG_0(PACA_EXSLB) - EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x380) - mr r12,r3 /* save r3 */ - mfspr r3,SPRN_DAR - mfspr r11,SPRN_SRR1 - crset 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_common) +EXCEPTION_RELON_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, NOTEST, 0x380); EXC_VIRT_END(data_access_slb, 0x4380, 0x80) + TRAMP_KVM_SKIP(PACA_EXSLB, 0x380) +EXC_COMMON_BEGIN(data_access_slb_common) + mfspr r10,SPRN_DAR + std r10,PACA_EXSLB+EX_DAR(r13) + EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB) + ld r4,PACA_EXSLB+EX_DAR(r13) + std r4,_DAR(r1) + addi r3,r1,STACK_FRAME_OVERHEAD + bl do_slb_fault + cmpdi r3,0 + bne- 1f + b fast_exception_return +1: /* Error case */ + std r3,RESULT(r1) + bl save_nvgprs + RECONCILE_IRQ_STATE(r10, r11) + ld r4,_DAR(r1) + ld r5,RESULT(r1) + addi r3,r1,STACK_FRAME_OVERHEAD + bl do_bad_slb_fault + b ret_from_except + EXC_REAL(instruction_access, 0x400, 0x80) EXC_VIRT(instruction_access, 0x4400, 0x80, 0x400) @@ -610,160 +648,34 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80) - SET_SCRATCH0(r13) - EXCEPTION_PROLOG_0(PACA_EXSLB) - EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480) - mr r12,r3 /* save r3 */ - mfspr r3,SPRN_SRR0 /* SRR0 is faulting address */ - mfspr r11,SPRN_SRR1 - crclr 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_common) +EXCEPTION_PROLOG(PACA_EXSLB, instruction_access_slb_common, EXC_STD, KVMTEST_PR, 0x480); EXC_REAL_END(instruction_access_slb, 0x480, 0x80) EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80) - SET_SCRATCH0(r13) - EXCEPTION_PROLOG_0(PACA_EXSLB) - EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x480) - mr r12,r3 /* save r3 */ - mfspr r3,SPRN_SRR0 /* SRR0 is faulting address */ - mfspr r11,SPRN_SRR1 - crclr 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_common) +EXCEPTION_RELON_PROLOG(PACA_EXSLB, instruction_access_slb_common, EXC_STD, NOTEST, 0x480); EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80) -TRAMP_KVM(PACA_EXSLB, 0x480) - - -/* - * This handler is used by the 0x380 and 0x480 SLB miss interrupts, as well as - * the virtual mode 0x4380 and 0x4480 interrupts if AIL is enabled. - */ -EXC_COMMON_BEGIN(slb_miss_common) - /* - * r13 points to the PACA, r9 contains the saved CR, - * r12 contains the saved r3, - * r11 contain the saved SRR1, SRR0 is still ready for return - * r3 has the faulting address - * r9 - r13 are saved in paca->exslb. - * cr6.eq is set for a D-SLB miss, clear for a I-SLB miss - * We assume we aren't going to take any exceptions during this - * procedure. - */ - mflr r10 - stw r9,PACA_EXSLB+EX_CCR(r13) /* save CR in exc. frame */ - std r10,PACA_EXSLB+EX_LR(r13) /* save LR */ - - andi. r9,r11,MSR_PR // Check for exception from userspace - cmpdi cr4,r9,MSR_PR // And save the result in CR4 for later - - /* - * Test MSR_RI before calling slb_allocate_realmode, because the - * MSR in r11 gets clobbered. However we still want to allocate - * SLB in case MSR_RI=0, to minimise the risk of getting stuck in - * recursive SLB faults. So use cr5 for this, which is preserved. - */ - andi. r11,r11,MSR_RI /* check for unrecoverable exception */ - cmpdi cr5,r11,MSR_RI - - crset 4*cr0+eq -#ifdef CONFIG_PPC_BOOK3S_64 -BEGIN_MMU_FTR_SECTION - bl slb_allocate -END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX) -#endif - - ld r10,PACA_EXSLB+EX_LR(r13) - lwz r9,PACA_EXSLB+EX_CCR(r13) /* get saved CR */ - mtlr r10 - - /* - * Large address, check whether we have to allocate new contexts. - */ - beq- 8f - bne- cr5,2f /* if unrecoverable exception, oops */ - - /* All done -- return from exception. */ - - bne cr4,1f /* returning to kernel */ - - mtcrf 0x80,r9 - mtcrf 0x08,r9 /* MSR[PR] indication is in cr4 */ - mtcrf 0x04,r9 /* MSR[RI] indication is in cr5 */ - mtcrf 0x02,r9 /* I/D indication is in cr6 */ - mtcrf 0x01,r9 /* slb_allocate uses cr0 and cr7 */ - - RESTORE_CTR(r9, PACA_EXSLB) - RESTORE_PPR_PACA(PACA_EXSLB, r9) - mr r3,r12 - ld r9,PACA_EXSLB+EX_R9(r13) - ld r10,PACA_EXSLB+EX_R10(r13) - ld r11,PACA_EXSLB+EX_R11(r13) - ld r12,PACA_EXSLB+EX_R12(r13) - ld r13,PACA_EXSLB+EX_R13(r13) - RFI_TO_USER - b . /* prevent speculative execution */ -1: - mtcrf 0x80,r9 - mtcrf 0x08,r9 /* MSR[PR] indication is in cr4 */ - mtcrf 0x04,r9 /* MSR[RI] indication is in cr5 */ - mtcrf 0x02,r9 /* I/D indication is in cr6 */ - mtcrf 0x01,r9 /* slb_allocate uses cr0 and cr7 */ - - RESTORE_CTR(r9, PACA_EXSLB) - RESTORE_PPR_PACA(PACA_EXSLB, r9) - mr r3,r12 - ld r9,PACA_EXSLB+EX_R9(r13) - ld r10,PACA_EXSLB+EX_R10(r13) - ld r11,PACA_EXSLB+EX_R11(r13) - ld r12,PACA_EXSLB+EX_R12(r13) - ld r13,PACA_EXSLB+EX_R13(r13) - RFI_TO_KERNEL - b . /* prevent speculative execution */ - - -2: std r3,PACA_EXSLB+EX_DAR(r13) - mr r3,r12 - mfspr r11,SPRN_SRR0 - mfspr r12,SPRN_SRR1 - LOAD_HANDLER(r10,unrecov_slb) - mtspr SPRN_SRR0,r10 - ld r10,PACAKMSR(r13) - mtspr SPRN_SRR1,r10 - RFI_TO_KERNEL - b . - -8: std r3,PACA_EXSLB+EX_DAR(r13) - mr r3,r12 - mfspr r11,SPRN_SRR0 - mfspr r12,SPRN_SRR1 - LOAD_HANDLER(r10, large_addr_slb) - mtspr SPRN_SRR0,r10 - ld r10,PACAKMSR(r13) - mtspr SPRN_SRR1,r10 - RFI_TO_KERNEL - b . +TRAMP_KVM(PACA_EXSLB, 0x480) -EXC_COMMON_BEGIN(unrecov_slb) - EXCEPTION_PROLOG_COMMON(0x4100, PACA_EXSLB) - RECONCILE_IRQ_STATE(r10, r11) +EXC_COMMON_BEGIN(instruction_access_slb_common) + EXCEPTION_PROLOG_COMMON(0x480, PACA_EXSLB) + ld r4,_NIP(r1) + addi r3,r1,STACK_FRAME_OVERHEAD + bl do_slb_fault + cmpdi r3,0 + bne- 1f + b fast_exception_return +1: /* Error case */ + std r3,RESULT(r1) bl save_nvgprs -1: addi r3,r1,STACK_FRAME_OVERHEAD - bl unrecoverable_exception - b 1b - -EXC_COMMON_BEGIN(large_addr_slb) - EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB) RECONCILE_IRQ_STATE(r10, r11) - ld r3, PACA_EXSLB+EX_DAR(r13) - std r3, _DAR(r1) - beq cr6, 2f - li r10, 0x481 /* fix trap number for I-SLB miss */ - std r10, _TRAP(r1) -2: bl save_nvgprs - addi r3, r1, STACK_FRAME_OVERHEAD - bl slb_miss_large_addr + ld r4,_NIP(r1) + ld r5,RESULT(r1) + addi r3,r1,STACK_FRAME_OVERHEAD + bl do_bad_slb_fault b ret_from_except + EXC_REAL_BEGIN(hardware_interrupt, 0x500, 0x100) .globl hardware_interrupt_hv; hardware_interrupt_hv: diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c @@ -1444,8 +1444,8 @@ static ssize_t fadump_register_store(struct kobject *kobj, break; case 1: if (fw_dump.dump_registered == 1) { - ret = -EEXIST; - goto unlock_out; + /* Un-register Firmware-assisted dump */ + fadump_unregister_dump(&fdm); } /* Register Firmware-assisted dump */ ret = register_fadump(); diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S @@ -642,7 +642,7 @@ DTLBMissIMMR: mtspr SPRN_MD_TWC, r10 mfspr r10, SPRN_IMMR /* Get current IMMR */ rlwinm r10, r10, 0, 0xfff80000 /* Get 512 kbytes boundary */ - ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \ + ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \ _PAGE_PRESENT | _PAGE_NO_CACHE mtspr SPRN_MD_RPN, r10 /* Update TLB entry */ @@ -660,7 +660,7 @@ DTLBMissLinear: li r11, MD_PS8MEG | MD_SVALID | M_APG2 mtspr SPRN_MD_TWC, r11 rlwinm r10, r10, 0, 0x0f800000 /* 8xx supports max 256Mb RAM */ - ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \ + ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \ _PAGE_PRESENT mtspr SPRN_MD_RPN, r10 /* Update TLB entry */ @@ -679,7 +679,7 @@ ITLBMissLinear: li r11, MI_PS8MEG | MI_SVALID | M_APG2 mtspr SPRN_MI_TWC, r11 rlwinm r10, r10, 0, 0x0f800000 /* 8xx supports max 256Mb RAM */ - ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \ + ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \ _PAGE_PRESENT mtspr SPRN_MI_RPN, r10 /* Update TLB entry */ diff --git a/arch/powerpc/kernel/io-workarounds.c b/arch/powerpc/kernel/io-workarounds.c @@ -153,10 +153,10 @@ static const struct ppc_pci_io iowa_pci_io = { #ifdef CONFIG_PPC_INDIRECT_MMIO static void __iomem *iowa_ioremap(phys_addr_t addr, unsigned long size, - unsigned long flags, void *caller) + pgprot_t prot, void *caller) { struct iowa_bus *bus; - void __iomem *res = __ioremap_caller(addr, size, flags, caller); + void __iomem *res = __ioremap_caller(addr, size, prot, caller); int busno; bus = iowa_pci_find(0, (unsigned long)addr); diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c @@ -785,9 +785,9 @@ dma_addr_t iommu_map_page(struct device *dev, struct iommu_table *tbl, vaddr = page_address(page) + offset; uaddr = (unsigned long)vaddr; - npages = iommu_num_pages(uaddr, size, IOMMU_PAGE_SIZE(tbl)); if (tbl) { + npages = iommu_num_pages(uaddr, size, IOMMU_PAGE_SIZE(tbl)); align = 0; if (tbl->it_page_shift < PAGE_SHIFT && size >= PAGE_SIZE && ((unsigned long)vaddr & ~PAGE_MASK) == 0) diff --git a/arch/powerpc/kernel/isa-bridge.c b/arch/powerpc/kernel/isa-bridge.c @@ -110,14 +110,14 @@ static void pci_process_ISA_OF_ranges(struct device_node *isa_node, size = 0x10000; __ioremap_at(phb_io_base_phys, (void *)ISA_IO_BASE, - size, pgprot_val(pgprot_noncached(__pgprot(0)))); + size, pgprot_noncached(PAGE_KERNEL)); return; inval_range: printk(KERN_ERR "no ISA IO ranges or unexpected isa range, " "mapping 64k\n"); __ioremap_at(phb_io_base_phys, (void *)ISA_IO_BASE, - 0x10000, pgprot_val(pgprot_noncached(__pgprot(0)))); + 0x10000, pgprot_noncached(PAGE_KERNEL)); } @@ -253,7 +253,7 @@ void __init isa_bridge_init_non_pci(struct device_node *np) */ isa_io_base = ISA_IO_BASE; __ioremap_at(pbase, (void *)ISA_IO_BASE, - size, pgprot_val(pgprot_noncached(__pgprot(0)))); + size, pgprot_noncached(PAGE_KERNEL)); pr_debug("ISA: Non-PCI bridge is %pOF\n", np); } diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c @@ -24,6 +24,7 @@ #include <asm/processor.h> #include <asm/machdep.h> #include <asm/debug.h> +#include <asm/code-patching.h> #include <linux/slab.h> /* @@ -144,7 +145,7 @@ static int kgdb_handle_breakpoint(struct pt_regs *regs) if (kgdb_handle_exception(1, SIGTRAP, 0, regs) != 0) return 0; - if (*(u32 *) (regs->nip) == *(u32 *) (&arch_kgdb_ops.gdb_bpt_instr)) + if (*(u32 *)regs->nip == BREAK_INSTR) regs->nip += BREAK_INSTR_SIZE; return 1; @@ -441,16 +442,42 @@ int kgdb_arch_handle_exception(int vector, int signo, int err_code, return -1; } +int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt) +{ + int err; + unsigned int instr; + unsigned int *addr = (unsigned int *)bpt->bpt_addr; + + err = probe_kernel_address(addr, instr); + if (err) + return err; + + err = patch_instruction(addr, BREAK_INSTR); + if (err) + return -EFAULT; + + *(unsigned int *)bpt->saved_instr = instr; + + return 0; +} + +int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt) +{ + int err; + unsigned int instr = *(unsigned int *)bpt->saved_instr; + unsigned int *addr = (unsigned int *)bpt->bpt_addr; + + err = patch_instruction(addr, instr); + if (err) + return -EFAULT; + + return 0; +} + /* * Global data */ -struct kgdb_arch arch_kgdb_ops = { -#ifdef __LITTLE_ENDIAN__ - .gdb_bpt_instr = {0x08, 0x10, 0x82, 0x7d}, -#else - .gdb_bpt_instr = {0x7d, 0x82, 0x10, 0x08}, -#endif -}; +struct kgdb_arch arch_kgdb_ops; static int kgdb_not_implemented(struct pt_regs *regs) { diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c @@ -488,10 +488,11 @@ long machine_check_early(struct pt_regs *regs) { long handled = 0; - __this_cpu_inc(irq_stat.mce_exceptions); - - if (cur_cpu_spec && cur_cpu_spec->machine_check_early) - handled = cur_cpu_spec->machine_check_early(regs); + /* + * See if platform is capable of handling machine check. + */ + if (ppc_md.machine_check_early) + handled = ppc_md.machine_check_early(regs); return handled; } diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c @@ -60,7 +60,7 @@ static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) /* flush SLBs and reload */ #ifdef CONFIG_PPC_BOOK3S_64 -static void flush_and_reload_slb(void) +void flush_and_reload_slb(void) { /* Invalidate all SLBs */ slb_flush_all_realmode(); @@ -89,6 +89,13 @@ static void flush_and_reload_slb(void) static void flush_erat(void) { +#ifdef CONFIG_PPC_BOOK3S_64 + if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) { + flush_and_reload_slb(); + return; + } +#endif + /* PPC_INVALIDATE_ERAT can only be used on ISA v3 and newer */ asm volatile(PPC_INVALIDATE_ERAT : : :"memory"); } diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c @@ -74,6 +74,14 @@ int module_finalize(const Elf_Ehdr *hdr, (void *)sect->sh_addr + sect->sh_size); #endif /* CONFIG_PPC64 */ +#ifdef PPC64_ELF_ABI_v1 + sect = find_section(hdr, sechdrs, ".opd"); + if (sect != NULL) { + me->arch.start_opd = sect->sh_addr; + me->arch.end_opd = sect->sh_addr + sect->sh_size; + } +#endif /* PPC64_ELF_ABI_v1 */ + #ifdef CONFIG_PPC_BARRIER_NOSPEC sect = find_section(hdr, sechdrs, "__spec_barrier_fixup"); if (sect != NULL) diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c @@ -360,11 +360,6 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr, else if (strcmp(secstrings+sechdrs[i].sh_name,"__versions")==0) dedotify_versions((void *)hdr + sechdrs[i].sh_offset, sechdrs[i].sh_size); - else if (!strcmp(secstrings + sechdrs[i].sh_name, ".opd")) { - me->arch.start_opd = sechdrs[i].sh_addr; - me->arch.end_opd = sechdrs[i].sh_addr + - sechdrs[i].sh_size; - } /* We don't handle .init for the moment: rename to _init */ while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init"))) @@ -685,7 +680,14 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, case R_PPC64_REL32: /* 32 bits relative (used by relative exception tables) */ - *(u32 *)location = value - (unsigned long)location; + /* Convert value to relative */ + value -= (unsigned long)location; + if (value + 0x80000000 > 0xffffffff) { + pr_err("%s: REL32 %li out of range!\n", + me->name, (long int)value); + return -ENOEXEC; + } + *(u32 *)location = value; break; case R_PPC64_TOCSAVE: diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c @@ -17,7 +17,6 @@ #include <linux/of.h> #include <linux/slab.h> #include <linux/export.h> -#include <linux/syscalls.h> #include <asm/processor.h> #include <asm/io.h> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c @@ -159,7 +159,7 @@ static int pcibios_map_phb_io_space(struct pci_controller *hose) /* Establish the mapping */ if (__ioremap_at(phys_page, area->addr, size_page, - pgprot_val(pgprot_noncached(__pgprot(0)))) == NULL) + pgprot_noncached(PAGE_KERNEL)) == NULL) return -ENOMEM; /* Fixup hose IO resource */ diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c @@ -43,6 +43,7 @@ #include <linux/uaccess.h> #include <linux/elf-randomize.h> #include <linux/pkeys.h> +#include <linux/seq_buf.h> #include <asm/pgtable.h> #include <asm/io.h> @@ -65,6 +66,7 @@ #include <asm/livepatch.h> #include <asm/cpu_has_feature.h> #include <asm/asm-prototypes.h> +#include <asm/stacktrace.h> #include <linux/kprobes.h> #include <linux/kdebug.h> @@ -102,24 +104,18 @@ static void check_if_tm_restore_required(struct task_struct *tsk) } } -static inline bool msr_tm_active(unsigned long msr) -{ - return MSR_TM_ACTIVE(msr); -} - static bool tm_active_with_fp(struct task_struct *tsk) { - return msr_tm_active(tsk->thread.regs->msr) && + return MSR_TM_ACTIVE(tsk->thread.regs->msr) && (tsk->thread.ckpt_regs.msr & MSR_FP); } static bool tm_active_with_altivec(struct task_struct *tsk) { - return msr_tm_active(tsk->thread.regs->msr) && + return MSR_TM_ACTIVE(tsk->thread.regs->msr) && (tsk->thread.ckpt_regs.msr & MSR_VEC); } #else -static inline bool msr_tm_active(unsigned long msr) { return false; } static inline void check_if_tm_restore_required(struct task_struct *tsk) { } static inline bool tm_active_with_fp(struct task_struct *tsk) { return false; } static inline bool tm_active_with_altivec(struct task_struct *tsk) { return false; } @@ -247,7 +243,8 @@ void enable_kernel_fp(void) * giveup as this would save to the 'live' structure not the * checkpointed structure. */ - if(!msr_tm_active(cpumsr) && msr_tm_active(current->thread.regs->msr)) + if (!MSR_TM_ACTIVE(cpumsr) && + MSR_TM_ACTIVE(current->thread.regs->msr)) return; __giveup_fpu(current); } @@ -311,7 +308,8 @@ void enable_kernel_altivec(void) * giveup as this would save to the 'live' structure not the * checkpointed structure. */ - if(!msr_tm_active(cpumsr) && msr_tm_active(current->thread.regs->msr)) + if (!MSR_TM_ACTIVE(cpumsr) && + MSR_TM_ACTIVE(current->thread.regs->msr)) return; __giveup_altivec(current); } @@ -397,7 +395,8 @@ void enable_kernel_vsx(void) * giveup as this would save to the 'live' structure not the * checkpointed structure. */ - if(!msr_tm_active(cpumsr) && msr_tm_active(current->thread.regs->msr)) + if (!MSR_TM_ACTIVE(cpumsr) && + MSR_TM_ACTIVE(current->thread.regs->msr)) return; __giveup_vsx(current); } @@ -530,7 +529,7 @@ void restore_math(struct pt_regs *regs) { unsigned long msr; - if (!msr_tm_active(regs->msr) && + if (!MSR_TM_ACTIVE(regs->msr) && !current->thread.load_fp && !loadvec(current->thread)) return; @@ -1252,17 +1251,16 @@ struct task_struct *__switch_to(struct task_struct *prev, return last; } -static int instructions_to_print = 16; +#define NR_INSN_TO_PRINT 16 static void show_instructions(struct pt_regs *regs) { int i; - unsigned long pc = regs->nip - (instructions_to_print * 3 / 4 * - sizeof(int)); + unsigned long pc = regs->nip - (NR_INSN_TO_PRINT * 3 / 4 * sizeof(int)); printk("Instruction dump:"); - for (i = 0; i < instructions_to_print; i++) { + for (i = 0; i < NR_INSN_TO_PRINT; i++) { int instr; if (!(i % 8)) @@ -1277,7 +1275,7 @@ static void show_instructions(struct pt_regs *regs) #endif if (!__kernel_text_address(pc) || - probe_kernel_address((unsigned int __user *)pc, instr)) { + probe_kernel_address((const void *)pc, instr)) { pr_cont("XXXXXXXX "); } else { if (regs->nip == pc) @@ -1295,43 +1293,43 @@ static void show_instructions(struct pt_regs *regs) void show_user_instructions(struct pt_regs *regs) { unsigned long pc; - int i; + int n = NR_INSN_TO_PRINT; + struct seq_buf s; + char buf[96]; /* enough for 8 times 9 + 2 chars */ - pc = regs->nip - (instructions_to_print * 3 / 4 * sizeof(int)); + pc = regs->nip - (NR_INSN_TO_PRINT * 3 / 4 * sizeof(int)); /* * Make sure the NIP points at userspace, not kernel text/data or * elsewhere. */ - if (!__access_ok(pc, instructions_to_print * sizeof(int), USER_DS)) { + if (!__access_ok(pc, NR_INSN_TO_PRINT * sizeof(int), USER_DS)) { pr_info("%s[%d]: Bad NIP, not dumping instructions.\n", current->comm, current->pid); return; } - pr_info("%s[%d]: code: ", current->comm, current->pid); + seq_buf_init(&s, buf, sizeof(buf)); - for (i = 0; i < instructions_to_print; i++) { - int instr; + while (n) { + int i; - if (!(i % 8) && (i > 0)) { - pr_cont("\n"); - pr_info("%s[%d]: code: ", current->comm, current->pid); - } + seq_buf_clear(&s); - if (probe_kernel_address((unsigned int __user *)pc, instr)) { - pr_cont("XXXXXXXX "); - } else { - if (regs->nip == pc) - pr_cont("<%08x> ", instr); - else - pr_cont("%08x ", instr); + for (i = 0; i < 8 && n; i++, n--, pc += sizeof(int)) { + int instr; + + if (probe_kernel_address((const void *)pc, instr)) { + seq_buf_printf(&s, "XXXXXXXX "); + continue; + } + seq_buf_printf(&s, regs->nip == pc ? "<%08x> " : "%08x ", instr); } - pc += sizeof(int); + if (!seq_buf_has_overflowed(&s)) + pr_info("%s[%d]: code: %s\n", current->comm, + current->pid, s.buffer); } - - pr_cont("\n"); } struct regbit { @@ -1485,6 +1483,15 @@ void flush_thread(void) #endif /* CONFIG_HAVE_HW_BREAKPOINT */ } +#ifdef CONFIG_PPC_BOOK3S_64 +void arch_setup_new_exec(void) +{ + if (radix_enabled()) + return; + hash__setup_new_exec(); +} +#endif + int set_thread_uses_vas(void) { #ifdef CONFIG_PPC_BOOK3S_64 @@ -1705,7 +1712,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, p->thread.dscr = mfspr(SPRN_DSCR); } if (cpu_has_feature(CPU_FTR_HAS_PPR)) - p->thread.ppr = INIT_PPR; + childregs->ppr = DEFAULT_PPR; p->thread.tidr = 0; #endif @@ -1713,6 +1720,8 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, return 0; } +void preload_new_slb_context(unsigned long start, unsigned long sp); + /* * Set up a thread for executing a new program */ @@ -1720,6 +1729,10 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp) { #ifdef CONFIG_PPC64 unsigned long load_addr = regs->gpr[2]; /* saved by ELF_PLAT_INIT */ + +#ifdef CONFIG_PPC_BOOK3S_64 + preload_new_slb_context(start, sp); +#endif #endif /* @@ -1810,6 +1823,7 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp) #ifdef CONFIG_VSX current->thread.used_vsr = 0; #endif + current->thread.load_slb = 0; current->thread.load_fp = 0; memset(&current->thread.fp_state, 0, sizeof(current->thread.fp_state)); current->thread.fp_save_area = NULL; diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c @@ -43,11 +43,13 @@ #include <asm/btext.h> #include <asm/sections.h> #include <asm/machdep.h> -#include <asm/opal.h> #include <asm/asm-prototypes.h> #include <linux/linux_logo.h> +/* All of prom_init bss lives here */ +#define __prombss __section(.bss.prominit) + /* * Eventually bump that one up */ @@ -87,7 +89,7 @@ #define OF_WORKAROUNDS 0 #else #define OF_WORKAROUNDS of_workarounds -int of_workarounds; +static int of_workarounds __prombss; #endif #define OF_WA_CLAIM 1 /* do phys/virt claim separately, then map */ @@ -148,29 +150,31 @@ extern void copy_and_flush(unsigned long dest, unsigned long src, unsigned long size, unsigned long offset); /* prom structure */ -static struct prom_t __initdata prom; +static struct prom_t __prombss prom; -static unsigned long prom_entry __initdata; +static unsigned long __prombss prom_entry; #define PROM_SCRATCH_SIZE 256 -static char __initdata of_stdout_device[256]; -static char __initdata prom_scratch[PROM_SCRATCH_SIZE]; +static char __prombss of_stdout_device[256]; +static char __prombss prom_scratch[PROM_SCRATCH_SIZE]; -static unsigned long __initdata dt_header_start; -static unsigned long __initdata dt_struct_start, dt_struct_end; -static unsigned long __initdata dt_string_start, dt_string_end; +static unsigned long __prombss dt_header_start; +static unsigned long __prombss dt_struct_start, dt_struct_end; +static unsigned long __prombss dt_string_start, dt_string_end; -static unsigned long __initdata prom_initrd_start, prom_initrd_end; +static unsigned long __prombss prom_initrd_start, prom_initrd_end; #ifdef CONFIG_PPC64 -static int __initdata prom_iommu_force_on; -static int __initdata prom_iommu_off; -static unsigned long __initdata prom_tce_alloc_start; -static unsigned long __initdata prom_tce_alloc_end; +static int __prombss prom_iommu_force_on; +static int __prombss prom_iommu_off; +static unsigned long __prombss prom_tce_alloc_start; +static unsigned long __prombss prom_tce_alloc_end; #endif -static bool prom_radix_disable __initdata = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT); +#ifdef CONFIG_PPC_PSERIES +static bool __prombss prom_radix_disable; +#endif struct platform_support { bool hash_mmu; @@ -188,26 +192,25 @@ struct platform_support { #define PLATFORM_LPAR 0x0001 #define PLATFORM_POWERMAC 0x0400 #define PLATFORM_GENERIC 0x0500 -#define PLATFORM_OPAL 0x0600 -static int __initdata of_platform; +static int __prombss of_platform; -static char __initdata prom_cmd_line[COMMAND_LINE_SIZE]; +static char __prombss prom_cmd_line[COMMAND_LINE_SIZE]; -static unsigned long __initdata prom_memory_limit; +static unsigned long __prombss prom_memory_limit; -static unsigned long __initdata alloc_top; -static unsigned long __initdata alloc_top_high; -static unsigned long __initdata alloc_bottom; -static unsigned long __initdata rmo_top; -static unsigned long __initdata ram_top; +static unsigned long __prombss alloc_top; +static unsigned long __prombss alloc_top_high; +static unsigned long __prombss alloc_bottom; +static unsigned long __prombss rmo_top; +static unsigned long __prombss ram_top; -static struct mem_map_entry __initdata mem_reserve_map[MEM_RESERVE_MAP_SIZE]; -static int __initdata mem_reserve_cnt; +static struct mem_map_entry __prombss mem_reserve_map[MEM_RESERVE_MAP_SIZE]; +static int __prombss mem_reserve_cnt; -static cell_t __initdata regbuf[1024]; +static cell_t __prombss regbuf[1024]; -static bool rtas_has_query_cpu_stopped; +static bool __prombss rtas_has_query_cpu_stopped; /* @@ -522,8 +525,8 @@ static void add_string(char **str, const char *q) static char *tohex(unsigned int x) { - static char digits[] = "0123456789abcdef"; - static char result[9]; + static const char digits[] __initconst = "0123456789abcdef"; + static char result[9] __prombss; int i; result[8] = 0; @@ -664,6 +667,8 @@ static void __init early_cmdline_parse(void) #endif } +#ifdef CONFIG_PPC_PSERIES + prom_radix_disable = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT); opt = strstr(prom_cmd_line, "disable_radix"); if (opt) { opt += 13; @@ -679,9 +684,10 @@ static void __init early_cmdline_parse(void) } if (prom_radix_disable) prom_debug("Radix disabled from cmdline\n"); +#endif /* CONFIG_PPC_PSERIES */ } -#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) +#ifdef CONFIG_PPC_PSERIES /* * The architecture vector has an array of PVR mask/value pairs, * followed by # option vectors - 1, followed by the option vectors. @@ -782,7 +788,7 @@ struct ibm_arch_vec { struct option_vector6 vec6; } __packed; -struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = { +static const struct ibm_arch_vec ibm_architecture_vec_template __initconst = { .pvrs = { { .mask = cpu_to_be32(0xfffe0000), /* POWER5/POWER5+ */ @@ -920,9 +926,11 @@ struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = { }, }; +static struct ibm_arch_vec __prombss ibm_architecture_vec ____cacheline_aligned; + /* Old method - ELF header with PT_NOTE sections only works on BE */ #ifdef __BIG_ENDIAN__ -static struct fake_elf { +static const struct fake_elf { Elf32_Ehdr elfhdr; Elf32_Phdr phdr[2]; struct chrpnote { @@ -955,7 +963,7 @@ static struct fake_elf { u32 ignore_me; } rpadesc; } rpanote; -} fake_elf = { +} fake_elf __initconst = { .elfhdr = { .e_ident = { 0x7f, 'E', 'L', 'F', ELFCLASS32, ELFDATA2MSB, EV_CURRENT }, @@ -1129,14 +1137,21 @@ static void __init prom_check_platform_support(void) }; int prop_len = prom_getproplen(prom.chosen, "ibm,arch-vec-5-platform-support"); + + /* First copy the architecture vec template */ + ibm_architecture_vec = ibm_architecture_vec_template; + if (prop_len > 1) { int i; - u8 vec[prop_len]; + u8 vec[8]; prom_debug("Found ibm,arch-vec-5-platform-support, len: %d\n", prop_len); + if (prop_len > sizeof(vec)) + prom_printf("WARNING: ibm,arch-vec-5-platform-support longer than expected (len: %d)\n", + prop_len); prom_getprop(prom.chosen, "ibm,arch-vec-5-platform-support", &vec, sizeof(vec)); - for (i = 0; i < prop_len; i += 2) { + for (i = 0; i < sizeof(vec); i += 2) { prom_debug("%d: index = 0x%x val = 0x%x\n", i / 2 , vec[i] , vec[i + 1]); @@ -1225,7 +1240,7 @@ static void __init prom_send_capabilities(void) } #endif /* __BIG_ENDIAN__ */ } -#endif /* #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) */ +#endif /* CONFIG_PPC_PSERIES */ /* * Memory allocation strategy... our layout is normally: @@ -1562,88 +1577,6 @@ static void __init prom_close_stdin(void) } } -#ifdef CONFIG_PPC_POWERNV - -#ifdef CONFIG_PPC_EARLY_DEBUG_OPAL -static u64 __initdata prom_opal_base; -static u64 __initdata prom_opal_entry; -#endif - -/* - * Allocate room for and instantiate OPAL - */ -static void __init prom_instantiate_opal(void) -{ - phandle opal_node; - ihandle opal_inst; - u64 base, entry; - u64 size = 0, align = 0x10000; - __be64 val64; - u32 rets[2]; - - prom_debug("prom_instantiate_opal: start...\n"); - - opal_node = call_prom("finddevice", 1, 1, ADDR("/ibm,opal")); - prom_debug("opal_node: %x\n", opal_node); - if (!PHANDLE_VALID(opal_node)) - return; - - val64 = 0; - prom_getprop(opal_node, "opal-runtime-size", &val64, sizeof(val64)); - size = be64_to_cpu(val64); - if (size == 0) - return; - val64 = 0; - prom_getprop(opal_node, "opal-runtime-alignment", &val64,sizeof(val64)); - align = be64_to_cpu(val64); - - base = alloc_down(size, align, 0); - if (base == 0) { - prom_printf("OPAL allocation failed !\n"); - return; - } - - opal_inst = call_prom("open", 1, 1, ADDR("/ibm,opal")); - if (!IHANDLE_VALID(opal_inst)) { - prom_printf("opening opal package failed (%x)\n", opal_inst); - return; - } - - prom_printf("instantiating opal at 0x%llx...", base); - - if (call_prom_ret("call-method", 4, 3, rets, - ADDR("load-opal-runtime"), - opal_inst, - base >> 32, base & 0xffffffff) != 0 - || (rets[0] == 0 && rets[1] == 0)) { - prom_printf(" failed\n"); - return; - } - entry = (((u64)rets[0]) << 32) | rets[1]; - - prom_printf(" done\n"); - - reserve_mem(base, size); - - prom_debug("opal base = 0x%llx\n", base); - prom_debug("opal align = 0x%llx\n", align); - prom_debug("opal entry = 0x%llx\n", entry); - prom_debug("opal size = 0x%llx\n", size); - - prom_setprop(opal_node, "/ibm,opal", "opal-base-address", - &base, sizeof(base)); - prom_setprop(opal_node, "/ibm,opal", "opal-entry-address", - &entry, sizeof(entry)); - -#ifdef CONFIG_PPC_EARLY_DEBUG_OPAL - prom_opal_base = base; - prom_opal_entry = entry; -#endif - prom_debug("prom_instantiate_opal: end...\n"); -} - -#endif /* CONFIG_PPC_POWERNV */ - /* * Allocate room for and instantiate RTAS */ @@ -2150,10 +2083,6 @@ static int __init prom_find_machine_type(void) } } #ifdef CONFIG_PPC64 - /* Try to detect OPAL */ - if (PHANDLE_VALID(call_prom("finddevice", 1, 1, ADDR("/ibm,opal")))) - return PLATFORM_OPAL; - /* Try to figure out if it's an IBM pSeries or any other * PAPR compliant platform. We assume it is if : * - /device_type is "chrp" (please, do NOT use that for future @@ -2202,7 +2131,7 @@ static void __init prom_check_displays(void) ihandle ih; int i; - static unsigned char default_colors[] = { + static const unsigned char default_colors[] __initconst = { 0x00, 0x00, 0x00, 0x00, 0x00, 0xaa, 0x00, 0xaa, 0x00, @@ -2398,7 +2327,7 @@ static void __init scan_dt_build_struct(phandle node, unsigned long *mem_start, char *namep, *prev_name, *sstart, *p, *ep, *lp, *path; unsigned long soff; unsigned char *valp; - static char pname[MAX_PROPERTY_NAME]; + static char pname[MAX_PROPERTY_NAME] __prombss; int l, room, has_phandle = 0; dt_push_token(OF_DT_BEGIN_NODE, mem_start, mem_end); @@ -2481,14 +2410,11 @@ static void __init scan_dt_build_struct(phandle node, unsigned long *mem_start, has_phandle = 1; } - /* Add a "linux,phandle" property if no "phandle" property already - * existed (can happen with OPAL) - */ + /* Add a "phandle" property if none already exist */ if (!has_phandle) { - soff = dt_find_string("linux,phandle"); + soff = dt_find_string("phandle"); if (soff == 0) - prom_printf("WARNING: Can't find string index for" - " <linux-phandle> node %s\n", path); + prom_printf("WARNING: Can't find string index for <phandle> node %s\n", path); else { dt_push_token(OF_DT_PROP, mem_start, mem_end); dt_push_token(4, mem_start, mem_end); @@ -2548,9 +2474,9 @@ static void __init flatten_device_tree(void) dt_string_start = mem_start; mem_start += 4; /* hole */ - /* Add "linux,phandle" in there, we'll need it */ + /* Add "phandle" in there, we'll need it */ namep = make_room(&mem_start, &mem_end, 16, 1); - strcpy(namep, "linux,phandle"); + strcpy(namep, "phandle"); mem_start = (unsigned long)namep + strlen(namep) + 1; /* Build string array */ @@ -3172,7 +3098,7 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, */ early_cmdline_parse(); -#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) +#ifdef CONFIG_PPC_PSERIES /* * On pSeries, inform the firmware about our capabilities */ @@ -3216,15 +3142,9 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, * On non-powermacs, try to instantiate RTAS. PowerMacs don't * have a usable RTAS implementation. */ - if (of_platform != PLATFORM_POWERMAC && - of_platform != PLATFORM_OPAL) + if (of_platform != PLATFORM_POWERMAC) prom_instantiate_rtas(); -#ifdef CONFIG_PPC_POWERNV - if (of_platform == PLATFORM_OPAL) - prom_instantiate_opal(); -#endif /* CONFIG_PPC_POWERNV */ - #ifdef CONFIG_PPC64 /* instantiate sml */ prom_instantiate_sml(); @@ -3237,8 +3157,7 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, * * (This must be done after instanciating RTAS) */ - if (of_platform != PLATFORM_POWERMAC && - of_platform != PLATFORM_OPAL) + if (of_platform != PLATFORM_POWERMAC) prom_hold_cpus(); /* @@ -3282,11 +3201,9 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, /* * in case stdin is USB and still active on IBM machines... * Unfortunately quiesce crashes on some powermacs if we have - * closed stdin already (in particular the powerbook 101). It - * appears that the OPAL version of OFW doesn't like it either. + * closed stdin already (in particular the powerbook 101). */ - if (of_platform != PLATFORM_POWERMAC && - of_platform != PLATFORM_OPAL) + if (of_platform != PLATFORM_POWERMAC) prom_close_stdin(); /* @@ -3304,10 +3221,8 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, hdr = dt_header_start; /* Don't print anything after quiesce under OPAL, it crashes OFW */ - if (of_platform != PLATFORM_OPAL) { - prom_printf("Booting Linux via __start() @ 0x%lx ...\n", kbase); - prom_debug("->dt_header_start=0x%lx\n", hdr); - } + prom_printf("Booting Linux via __start() @ 0x%lx ...\n", kbase); + prom_debug("->dt_header_start=0x%lx\n", hdr); #ifdef CONFIG_PPC32 reloc_got2(-offset); @@ -3315,13 +3230,7 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, unreloc_toc(); #endif -#ifdef CONFIG_PPC_EARLY_DEBUG_OPAL - /* OPAL early debug gets the OPAL base & entry in r8 and r9 */ - __start(hdr, kbase, 0, 0, 0, - prom_opal_base, prom_opal_entry); -#else __start(hdr, kbase, 0, 0, 0, 0, 0); -#endif return 0; } diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh @@ -28,6 +28,18 @@ OBJ="$2" ERROR=0 +function check_section() +{ + file=$1 + section=$2 + size=$(objdump -h -j $section $file 2>/dev/null | awk "\$2 == \"$section\" {print \$3}") + size=${size:-0} + if [ $size -ne 0 ]; then + ERROR=1 + echo "Error: Section $section not empty in prom_init.c" >&2 + fi +} + for UNDEF in $($NM -u $OBJ | awk '{print $2}') do # On 64-bit nm gives us the function descriptors, which have @@ -66,4 +78,8 @@ do fi done +check_section $OBJ .data +check_section $OBJ .bss +check_section $OBJ .init.data + exit $ERROR diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c @@ -297,7 +297,7 @@ int ptrace_get_reg(struct task_struct *task, int regno, unsigned long *data) } #endif - if (regno < (sizeof(struct pt_regs) / sizeof(unsigned long))) { + if (regno < (sizeof(struct user_pt_regs) / sizeof(unsigned long))) { *data = ((unsigned long *)task->thread.regs)[regno]; return 0; } @@ -360,10 +360,10 @@ static int gpr_get(struct task_struct *target, const struct user_regset *regset, ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &target->thread.regs->orig_gpr3, offsetof(struct pt_regs, orig_gpr3), - sizeof(struct pt_regs)); + sizeof(struct user_pt_regs)); if (!ret) ret = user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf, - sizeof(struct pt_regs), -1); + sizeof(struct user_pt_regs), -1); return ret; } @@ -853,10 +853,10 @@ static int tm_cgpr_get(struct task_struct *target, ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &target->thread.ckpt_regs.orig_gpr3, offsetof(struct pt_regs, orig_gpr3), - sizeof(struct pt_regs)); + sizeof(struct user_pt_regs)); if (!ret) ret = user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf, - sizeof(struct pt_regs), -1); + sizeof(struct user_pt_regs), -1); return ret; } @@ -1609,7 +1609,7 @@ static int ppr_get(struct task_struct *target, void *kbuf, void __user *ubuf) { return user_regset_copyout(&pos, &count, &kbuf, &ubuf, - &target->thread.ppr, 0, sizeof(u64)); + &target->thread.regs->ppr, 0, sizeof(u64)); } static int ppr_set(struct task_struct *target, @@ -1618,7 +1618,7 @@ static int ppr_set(struct task_struct *target, const void *kbuf, const void __user *ubuf) { return user_regset_copyin(&pos, &count, &kbuf, &ubuf, - &target->thread.ppr, 0, sizeof(u64)); + &target->thread.regs->ppr, 0, sizeof(u64)); } static int dscr_get(struct task_struct *target, @@ -2508,6 +2508,7 @@ void ptrace_disable(struct task_struct *child) { /* make sure the single step bit is not set. */ user_disable_single_step(child); + clear_tsk_thread_flag(child, TIF_SYSCALL_EMU); } #ifdef CONFIG_PPC_ADV_DEBUG_REGS @@ -3130,7 +3131,7 @@ long arch_ptrace(struct task_struct *child, long request, case PTRACE_GETREGS: /* Get all pt_regs from the child. */ return copy_regset_to_user(child, &user_ppc_native_view, REGSET_GPR, - 0, sizeof(struct pt_regs), + 0, sizeof(struct user_pt_regs), datavp); #ifdef CONFIG_PPC64 @@ -3139,7 +3140,7 @@ long arch_ptrace(struct task_struct *child, long request, case PTRACE_SETREGS: /* Set all gp regs in the child. */ return copy_regset_from_user(child, &user_ppc_native_view, REGSET_GPR, - 0, sizeof(struct pt_regs), + 0, sizeof(struct user_pt_regs), datavp); case PTRACE_GETFPREGS: /* Get the child FPU state (FPR0...31 + FPSCR) */ @@ -3264,6 +3265,16 @@ long do_syscall_trace_enter(struct pt_regs *regs) { user_exit(); + if (test_thread_flag(TIF_SYSCALL_EMU)) { + ptrace_report_syscall(regs); + /* + * Returning -1 will skip the syscall execution. We want to + * avoid clobbering any register also, thus, not 'gotoing' + * skip label. + */ + return -1; + } + /* * The tracer may decide to abort the syscall, if so tracehook * will return !0. Note that the tracer may also just change @@ -3324,3 +3335,42 @@ void do_syscall_trace_leave(struct pt_regs *regs) user_enter(); } + +void __init pt_regs_check(void) +{ + BUILD_BUG_ON(offsetof(struct pt_regs, gpr) != + offsetof(struct user_pt_regs, gpr)); + BUILD_BUG_ON(offsetof(struct pt_regs, nip) != + offsetof(struct user_pt_regs, nip)); + BUILD_BUG_ON(offsetof(struct pt_regs, msr) != + offsetof(struct user_pt_regs, msr)); + BUILD_BUG_ON(offsetof(struct pt_regs, msr) != + offsetof(struct user_pt_regs, msr)); + BUILD_BUG_ON(offsetof(struct pt_regs, orig_gpr3) != + offsetof(struct user_pt_regs, orig_gpr3)); + BUILD_BUG_ON(offsetof(struct pt_regs, ctr) != + offsetof(struct user_pt_regs, ctr)); + BUILD_BUG_ON(offsetof(struct pt_regs, link) != + offsetof(struct user_pt_regs, link)); + BUILD_BUG_ON(offsetof(struct pt_regs, xer) != + offsetof(struct user_pt_regs, xer)); + BUILD_BUG_ON(offsetof(struct pt_regs, ccr) != + offsetof(struct user_pt_regs, ccr)); +#ifdef __powerpc64__ + BUILD_BUG_ON(offsetof(struct pt_regs, softe) != + offsetof(struct user_pt_regs, softe)); +#else + BUILD_BUG_ON(offsetof(struct pt_regs, mq) != + offsetof(struct user_pt_regs, mq)); +#endif + BUILD_BUG_ON(offsetof(struct pt_regs, trap) != + offsetof(struct user_pt_regs, trap)); + BUILD_BUG_ON(offsetof(struct pt_regs, dar) != + offsetof(struct user_pt_regs, dar)); + BUILD_BUG_ON(offsetof(struct pt_regs, dsisr) != + offsetof(struct user_pt_regs, dsisr)); + BUILD_BUG_ON(offsetof(struct pt_regs, result) != + offsetof(struct user_pt_regs, result)); + + BUILD_BUG_ON(sizeof(struct user_pt_regs) > sizeof(struct pt_regs)); +} diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c @@ -981,7 +981,15 @@ int rtas_ibm_suspend_me(u64 handle) goto out; } - stop_topology_update(); + cpu_hotplug_disable(); + + /* Check if we raced with a CPU-Offline Operation */ + if (unlikely(!cpumask_equal(cpu_present_mask, cpu_online_mask))) { + pr_err("%s: Raced against a concurrent CPU-Offline\n", + __func__); + atomic_set(&data.error, -EBUSY); + goto out_hotplug_enable; + } /* Call function on all CPUs. One of us will make the * rtas call @@ -994,7 +1002,8 @@ int rtas_ibm_suspend_me(u64 handle) if (atomic_read(&data.error) != 0) printk(KERN_ERR "Error doing global join\n"); - start_topology_update(); +out_hotplug_enable: + cpu_hotplug_enable(); /* Take down CPUs not online prior to suspend */ cpuret = rtas_offline_cpus_mask(offline_mask); diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c @@ -91,6 +91,8 @@ static char *rtas_event_type(int type) return "Dump Notification Event"; case RTAS_TYPE_PRRN: return "Platform Resource Reassignment Event"; + case RTAS_TYPE_HOTPLUG: + return "Hotplug Event"; } return rtas_type[0]; @@ -150,8 +152,10 @@ static void printk_log_rtas(char *buf, int len) } else { struct rtas_error_log *errlog = (struct rtas_error_log *)buf; - printk(RTAS_DEBUG "event: %d, Type: %s, Severity: %d\n", - error_log_cnt, rtas_event_type(rtas_error_type(errlog)), + printk(RTAS_DEBUG "event: %d, Type: %s (%d), Severity: %d\n", + error_log_cnt, + rtas_event_type(rtas_error_type(errlog)), + rtas_error_type(errlog), rtas_error_severity(errlog)); } } @@ -274,27 +278,16 @@ void pSeries_log_error(char *buf, unsigned int err_type, int fatal) } #ifdef CONFIG_PPC_PSERIES -static s32 prrn_update_scope; - -static void prrn_work_fn(struct work_struct *work) +static void handle_prrn_event(s32 scope) { /* * For PRRN, we must pass the negative of the scope value in * the RTAS event. */ - pseries_devicetree_update(-prrn_update_scope); + pseries_devicetree_update(-scope); numa_update_cpu_topology(false); } -static DECLARE_WORK(prrn_work, prrn_work_fn); - -static void prrn_schedule_update(u32 scope) -{ - flush_work(&prrn_work); - prrn_update_scope = scope; - schedule_work(&prrn_work); -} - static void handle_rtas_event(const struct rtas_error_log *log) { if (rtas_error_type(log) != RTAS_TYPE_PRRN || !prrn_is_enabled()) @@ -303,7 +296,7 @@ static void handle_rtas_event(const struct rtas_error_log *log) /* For PRRN Events the extended log length is used to denote * the scope for calling rtas update-nodes. */ - prrn_schedule_update(rtas_error_extended_log_length(log)); + handle_prrn_event(rtas_error_extended_log_length(log)); } #else diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c @@ -33,6 +33,7 @@ #include <linux/serial_8250.h> #include <linux/percpu.h> #include <linux/memblock.h> +#include <linux/bootmem.h> #include <linux/of_platform.h> #include <linux/hugetlb.h> #include <asm/debugfs.h> @@ -966,6 +967,8 @@ void __init setup_arch(char **cmdline_p) initmem_init(); + early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT); + #ifdef CONFIG_DUMMY_CONSOLE conswitchp = &dummy_con; #endif diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c @@ -243,13 +243,19 @@ static void cpu_ready_for_interrupts(void) } /* - * Fixup HFSCR:TM based on CPU features. The bit is set by our - * early asm init because at that point we haven't updated our - * CPU features from firmware and device-tree. Here we have, - * so let's do it. + * Set HFSCR:TM based on CPU features: + * In the special case of TM no suspend (P9N DD2.1), Linux is + * told TM is off via the dt-ftrs but told to (partially) use + * it via OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED. So HFSCR[TM] + * will be off from dt-ftrs but we need to turn it on for the + * no suspend case. */ - if (cpu_has_feature(CPU_FTR_HVMODE) && !cpu_has_feature(CPU_FTR_TM_COMP)) - mtspr(SPRN_HFSCR, mfspr(SPRN_HFSCR) & ~HFSCR_TM); + if (cpu_has_feature(CPU_FTR_HVMODE)) { + if (cpu_has_feature(CPU_FTR_TM_COMP)) + mtspr(SPRN_HFSCR, mfspr(SPRN_HFSCR) | HFSCR_TM); + else + mtspr(SPRN_HFSCR, mfspr(SPRN_HFSCR) & ~HFSCR_TM); + } /* Set IR and DR in PACA MSR */ get_paca()->kernel_msr = MSR_KERNEL; diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c @@ -34,6 +34,8 @@ #include <linux/topology.h> #include <linux/profile.h> #include <linux/processor.h> +#include <linux/random.h> +#include <linux/stackprotector.h> #include <asm/ptrace.h> #include <linux/atomic.h> @@ -74,14 +76,32 @@ static DEFINE_PER_CPU(int, cpu_state) = { 0 }; #endif struct thread_info *secondary_ti; +bool has_big_cores; DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map); +DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map); DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map); DEFINE_PER_CPU(cpumask_var_t, cpu_core_map); EXPORT_PER_CPU_SYMBOL(cpu_sibling_map); EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map); EXPORT_PER_CPU_SYMBOL(cpu_core_map); +EXPORT_SYMBOL_GPL(has_big_cores); + +#define MAX_THREAD_LIST_SIZE 8 +#define THREAD_GROUP_SHARE_L1 1 +struct thread_groups { + unsigned int property; + unsigned int nr_groups; + unsigned int threads_per_group; + unsigned int thread_list[MAX_THREAD_LIST_SIZE]; +}; + +/* + * On big-cores system, cpu_l1_cache_map for each CPU corresponds to + * the set its siblings that share the L1-cache. + */ +DEFINE_PER_CPU(cpumask_var_t, cpu_l1_cache_map); /* SMP operations for this machine */ struct smp_ops_t *smp_ops; @@ -674,6 +694,185 @@ static void set_cpus_unrelated(int i, int j, } #endif +/* + * parse_thread_groups: Parses the "ibm,thread-groups" device tree + * property for the CPU device node @dn and stores + * the parsed output in the thread_groups + * structure @tg if the ibm,thread-groups[0] + * matches @property. + * + * @dn: The device node of the CPU device. + * @tg: Pointer to a thread group structure into which the parsed + * output of "ibm,thread-groups" is stored. + * @property: The property of the thread-group that the caller is + * interested in. + * + * ibm,thread-groups[0..N-1] array defines which group of threads in + * the CPU-device node can be grouped together based on the property. + * + * ibm,thread-groups[0] tells us the property based on which the + * threads are being grouped together. If this value is 1, it implies + * that the threads in the same group share L1, translation cache. + * + * ibm,thread-groups[1] tells us how many such thread groups exist. + * + * ibm,thread-groups[2] tells us the number of threads in each such + * group. + * + * ibm,thread-groups[3..N-1] is the list of threads identified by + * "ibm,ppc-interrupt-server#s" arranged as per their membership in + * the grouping. + * + * Example: If ibm,thread-groups = [1,2,4,5,6,7,8,9,10,11,12] it + * implies that there are 2 groups of 4 threads each, where each group + * of threads share L1, translation cache. + * + * The "ibm,ppc-interrupt-server#s" of the first group is {5,6,7,8} + * and the "ibm,ppc-interrupt-server#s" of the second group is {9, 10, + * 11, 12} structure + * + * Returns 0 on success, -EINVAL if the property does not exist, + * -ENODATA if property does not have a value, and -EOVERFLOW if the + * property data isn't large enough. + */ +static int parse_thread_groups(struct device_node *dn, + struct thread_groups *tg, + unsigned int property) +{ + int i; + u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE]; + u32 *thread_list; + size_t total_threads; + int ret; + + ret = of_property_read_u32_array(dn, "ibm,thread-groups", + thread_group_array, 3); + if (ret) + return ret; + + tg->property = thread_group_array[0]; + tg->nr_groups = thread_group_array[1]; + tg->threads_per_group = thread_group_array[2]; + if (tg->property != property || + tg->nr_groups < 1 || + tg->threads_per_group < 1) + return -ENODATA; + + total_threads = tg->nr_groups * tg->threads_per_group; + + ret = of_property_read_u32_array(dn, "ibm,thread-groups", + thread_group_array, + 3 + total_threads); + if (ret) + return ret; + + thread_list = &thread_group_array[3]; + + for (i = 0 ; i < total_threads; i++) + tg->thread_list[i] = thread_list[i]; + + return 0; +} + +/* + * get_cpu_thread_group_start : Searches the thread group in tg->thread_list + * that @cpu belongs to. + * + * @cpu : The logical CPU whose thread group is being searched. + * @tg : The thread-group structure of the CPU node which @cpu belongs + * to. + * + * Returns the index to tg->thread_list that points to the the start + * of the thread_group that @cpu belongs to. + * + * Returns -1 if cpu doesn't belong to any of the groups pointed to by + * tg->thread_list. + */ +static int get_cpu_thread_group_start(int cpu, struct thread_groups *tg) +{ + int hw_cpu_id = get_hard_smp_processor_id(cpu); + int i, j; + + for (i = 0; i < tg->nr_groups; i++) { + int group_start = i * tg->threads_per_group; + + for (j = 0; j < tg->threads_per_group; j++) { + int idx = group_start + j; + + if (tg->thread_list[idx] == hw_cpu_id) + return group_start; + } + } + + return -1; +} + +static int init_cpu_l1_cache_map(int cpu) + +{ + struct device_node *dn = of_get_cpu_node(cpu, NULL); + struct thread_groups tg = {.property = 0, + .nr_groups = 0, + .threads_per_group = 0}; + int first_thread = cpu_first_thread_sibling(cpu); + int i, cpu_group_start = -1, err = 0; + + if (!dn) + return -ENODATA; + + err = parse_thread_groups(dn, &tg, THREAD_GROUP_SHARE_L1); + if (err) + goto out; + + zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu), + GFP_KERNEL, + cpu_to_node(cpu)); + + cpu_group_start = get_cpu_thread_group_start(cpu, &tg); + + if (unlikely(cpu_group_start == -1)) { + WARN_ON_ONCE(1); + err = -ENODATA; + goto out; + } + + for (i = first_thread; i < first_thread + threads_per_core; i++) { + int i_group_start = get_cpu_thread_group_start(i, &tg); + + if (unlikely(i_group_start == -1)) { + WARN_ON_ONCE(1); + err = -ENODATA; + goto out; + } + + if (i_group_start == cpu_group_start) + cpumask_set_cpu(i, per_cpu(cpu_l1_cache_map, cpu)); + } + +out: + of_node_put(dn); + return err; +} + +static int init_big_cores(void) +{ + int cpu; + + for_each_possible_cpu(cpu) { + int err = init_cpu_l1_cache_map(cpu); + + if (err) + return err; + + zalloc_cpumask_var_node(&per_cpu(cpu_smallcore_map, cpu), + GFP_KERNEL, + cpu_to_node(cpu)); + } + + has_big_cores = true; + return 0; +} + void __init smp_prepare_cpus(unsigned int max_cpus) { unsigned int cpu; @@ -712,6 +911,12 @@ void __init smp_prepare_cpus(unsigned int max_cpus) cpumask_set_cpu(boot_cpuid, cpu_l2_cache_mask(boot_cpuid)); cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid)); + init_big_cores(); + if (has_big_cores) { + cpumask_set_cpu(boot_cpuid, + cpu_smallcore_mask(boot_cpuid)); + } + if (smp_ops && smp_ops->probe) smp_ops->probe(); } @@ -995,10 +1200,28 @@ static void remove_cpu_from_masks(int cpu) set_cpus_unrelated(cpu, i, cpu_core_mask); set_cpus_unrelated(cpu, i, cpu_l2_cache_mask); set_cpus_unrelated(cpu, i, cpu_sibling_mask); + if (has_big_cores) + set_cpus_unrelated(cpu, i, cpu_smallcore_mask); } } #endif +static inline void add_cpu_to_smallcore_masks(int cpu) +{ + struct cpumask *this_l1_cache_map = per_cpu(cpu_l1_cache_map, cpu); + int i, first_thread = cpu_first_thread_sibling(cpu); + + if (!has_big_cores) + return; + + cpumask_set_cpu(cpu, cpu_smallcore_mask(cpu)); + + for (i = first_thread; i < first_thread + threads_per_core; i++) { + if (cpu_online(i) && cpumask_test_cpu(i, this_l1_cache_map)) + set_cpus_related(i, cpu, cpu_smallcore_mask); + } +} + static void add_cpu_to_masks(int cpu) { int first_thread = cpu_first_thread_sibling(cpu); @@ -1015,6 +1238,7 @@ static void add_cpu_to_masks(int cpu) if (cpu_online(i)) set_cpus_related(i, cpu, cpu_sibling_mask); + add_cpu_to_smallcore_masks(cpu); /* * Copy the thread sibling mask into the cache sibling mask * and mark any CPUs that share an L2 with this CPU. @@ -1044,6 +1268,7 @@ static bool shared_caches; void start_secondary(void *unused) { unsigned int cpu = smp_processor_id(); + struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask; mmgrab(&init_mm); current->active_mm = &init_mm; @@ -1069,11 +1294,13 @@ void start_secondary(void *unused) /* Update topology CPU masks */ add_cpu_to_masks(cpu); + if (has_big_cores) + sibling_mask = cpu_smallcore_mask; /* * Check for any shared caches. Note that this must be done on a * per-core basis because one core in the pair might be disabled. */ - if (!cpumask_equal(cpu_l2_cache_mask(cpu), cpu_sibling_mask(cpu))) + if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu))) shared_caches = true; set_numa_node(numa_cpu_lookup_table[cpu]); @@ -1083,6 +1310,8 @@ void start_secondary(void *unused) notify_cpu_starting(cpu); set_cpu_online(cpu, true); + boot_init_stack_canary(); + local_irq_enable(); /* We can enable ftrace for secondary cpus now */ @@ -1140,6 +1369,13 @@ static const struct cpumask *shared_cache_mask(int cpu) return cpu_l2_cache_mask(cpu); } +#ifdef CONFIG_SCHED_SMT +static const struct cpumask *smallcore_smt_mask(int cpu) +{ + return cpu_smallcore_mask(cpu); +} +#endif + static struct sched_domain_topology_level power9_topology[] = { #ifdef CONFIG_SCHED_SMT { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) }, @@ -1167,6 +1403,13 @@ void __init smp_cpus_done(unsigned int max_cpus) shared_proc_topology_init(); dump_numa_cpu_topology(); +#ifdef CONFIG_SCHED_SMT + if (has_big_cores) { + pr_info("Using small cores at SMT level\n"); + power9_topology[0].mask = smallcore_smt_mask; + powerpc_topology[0].mask = smallcore_smt_mask; + } +#endif /* * If any CPU detects that it's sharing a cache with another CPU then * use the deeper topology that is aware of this sharing. diff --git a/arch/powerpc/kernel/swsusp_asm64.S b/arch/powerpc/kernel/swsusp_asm64.S @@ -262,7 +262,7 @@ END_FW_FTR_SECTION_IFCLR(FW_FEATURE_LPAR) addi r1,r1,-128 #ifdef CONFIG_PPC_BOOK3S_64 - bl slb_flush_and_rebolt + bl slb_flush_and_restore_bolted #endif bl do_after_copyback addi r1,r1,128 diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c @@ -111,6 +111,7 @@ struct clock_event_device decrementer_clockevent = { .rating = 200, .irq = 0, .set_next_event = decrementer_set_next_event, + .set_state_oneshot_stopped = decrementer_shutdown, .set_state_shutdown = decrementer_shutdown, .tick_resume = decrementer_shutdown, .features = CLOCK_EVT_FEAT_ONESHOT | @@ -175,7 +176,7 @@ static void calc_cputime_factors(void) * Read the SPURR on systems that have it, otherwise the PURR, * or if that doesn't exist return the timebase value passed in. */ -static unsigned long read_spurr(unsigned long tb) +static inline unsigned long read_spurr(unsigned long tb) { if (cpu_has_feature(CPU_FTR_SPURR)) return mfspr(SPRN_SPURR); @@ -281,26 +282,17 @@ static inline u64 calculate_stolen_time(u64 stop_tb) * Account time for a transition between system, hard irq * or soft irq state. */ -static unsigned long vtime_delta(struct task_struct *tsk, - unsigned long *stime_scaled, - unsigned long *steal_time) +static unsigned long vtime_delta_scaled(struct cpu_accounting_data *acct, + unsigned long now, unsigned long stime) { - unsigned long now, nowscaled, deltascaled; - unsigned long stime; + unsigned long stime_scaled = 0; +#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME + unsigned long nowscaled, deltascaled; unsigned long utime, utime_scaled; - struct cpu_accounting_data *acct = get_accounting(tsk); - WARN_ON_ONCE(!irqs_disabled()); - - now = mftb(); nowscaled = read_spurr(now); - stime = now - acct->starttime; - acct->starttime = now; deltascaled = nowscaled - acct->startspurr; acct->startspurr = nowscaled; - - *steal_time = calculate_stolen_time(now); - utime = acct->utime - acct->utime_sspurr; acct->utime_sspurr = acct->utime; @@ -314,17 +306,38 @@ static unsigned long vtime_delta(struct task_struct *tsk, * the user ticks get saved up in paca->user_time_scaled to be * used by account_process_tick. */ - *stime_scaled = stime; + stime_scaled = stime; utime_scaled = utime; if (deltascaled != stime + utime) { if (utime) { - *stime_scaled = deltascaled * stime / (stime + utime); - utime_scaled = deltascaled - *stime_scaled; + stime_scaled = deltascaled * stime / (stime + utime); + utime_scaled = deltascaled - stime_scaled; } else { - *stime_scaled = deltascaled; + stime_scaled = deltascaled; } } acct->utime_scaled += utime_scaled; +#endif + + return stime_scaled; +} + +static unsigned long vtime_delta(struct task_struct *tsk, + unsigned long *stime_scaled, + unsigned long *steal_time) +{ + unsigned long now, stime; + struct cpu_accounting_data *acct = get_accounting(tsk); + + WARN_ON_ONCE(!irqs_disabled()); + + now = mftb(); + stime = now - acct->starttime; + acct->starttime = now; + + *stime_scaled = vtime_delta_scaled(acct, now, stime); + + *steal_time = calculate_stolen_time(now); return stime; } @@ -341,7 +354,9 @@ void vtime_account_system(struct task_struct *tsk) if ((tsk->flags & PF_VCPU) && !irq_count()) { acct->gtime += stime; +#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME acct->utime_scaled += stime_scaled; +#endif } else { if (hardirq_count()) acct->hardirq_time += stime; @@ -350,7 +365,9 @@ void vtime_account_system(struct task_struct *tsk) else acct->stime += stime; +#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME acct->stime_scaled += stime_scaled; +#endif } } EXPORT_SYMBOL_GPL(vtime_account_system); @@ -364,6 +381,21 @@ void vtime_account_idle(struct task_struct *tsk) acct->idle_time += stime + steal_time; } +static void vtime_flush_scaled(struct task_struct *tsk, + struct cpu_accounting_data *acct) +{ +#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME + if (acct->utime_scaled) + tsk->utimescaled += cputime_to_nsecs(acct->utime_scaled); + if (acct->stime_scaled) + tsk->stimescaled += cputime_to_nsecs(acct->stime_scaled); + + acct->utime_scaled = 0; + acct->utime_sspurr = 0; + acct->stime_scaled = 0; +#endif +} + /* * Account the whole cputime accumulated in the paca * Must be called with interrupts disabled. @@ -378,14 +410,13 @@ void vtime_flush(struct task_struct *tsk) if (acct->utime) account_user_time(tsk, cputime_to_nsecs(acct->utime)); - if (acct->utime_scaled) - tsk->utimescaled += cputime_to_nsecs(acct->utime_scaled); - if (acct->gtime) account_guest_time(tsk, cputime_to_nsecs(acct->gtime)); - if (acct->steal_time) + if (IS_ENABLED(CONFIG_PPC_SPLPAR) && acct->steal_time) { account_steal_time(cputime_to_nsecs(acct->steal_time)); + acct->steal_time = 0; + } if (acct->idle_time) account_idle_time(cputime_to_nsecs(acct->idle_time)); @@ -393,8 +424,6 @@ void vtime_flush(struct task_struct *tsk) if (acct->stime) account_system_index_time(tsk, cputime_to_nsecs(acct->stime), CPUTIME_SYSTEM); - if (acct->stime_scaled) - tsk->stimescaled += cputime_to_nsecs(acct->stime_scaled); if (acct->hardirq_time) account_system_index_time(tsk, cputime_to_nsecs(acct->hardirq_time), @@ -403,14 +432,12 @@ void vtime_flush(struct task_struct *tsk) account_system_index_time(tsk, cputime_to_nsecs(acct->softirq_time), CPUTIME_SOFTIRQ); + vtime_flush_scaled(tsk, acct); + acct->utime = 0; - acct->utime_scaled = 0; - acct->utime_sspurr = 0; acct->gtime = 0; - acct->steal_time = 0; acct->idle_time = 0; acct->stime = 0; - acct->stime_scaled = 0; acct->hardirq_time = 0; acct->softirq_time = 0; } @@ -984,10 +1011,14 @@ static void register_decrementer_clockevent(int cpu) *dec = decrementer_clockevent; dec->cpumask = cpumask_of(cpu); + clockevents_config_and_register(dec, ppc_tb_freq, 2, decrementer_max); + printk_once(KERN_DEBUG "clockevent: %s mult[%x] shift[%d] cpu[%d]\n", dec->name, dec->mult, dec->shift, cpu); - clockevents_register_device(dec); + /* Set values for KVM, see kvm_emulate_dec() */ + decrementer_clockevent.mult = dec->mult; + decrementer_clockevent.shift = dec->shift; } static void enable_large_decrementer(void) @@ -1035,18 +1066,7 @@ static void __init set_decrementer_max(void) static void __init init_decrementer_clockevent(void) { - int cpu = smp_processor_id(); - - clockevents_calc_mult_shift(&decrementer_clockevent, ppc_tb_freq, 4); - - decrementer_clockevent.max_delta_ns = - clockevent_delta2ns(decrementer_max, &decrementer_clockevent); - decrementer_clockevent.max_delta_ticks = decrementer_max; - decrementer_clockevent.min_delta_ns = - clockevent_delta2ns(2, &decrementer_clockevent); - decrementer_clockevent.min_delta_ticks = 2; - - register_decrementer_clockevent(cpu); + register_decrementer_clockevent(smp_processor_id()); } void secondary_cpu_time_init(void) diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S @@ -92,13 +92,14 @@ _GLOBAL(tm_abort) blr EXPORT_SYMBOL_GPL(tm_abort); -/* void tm_reclaim(struct thread_struct *thread, +/* + * void tm_reclaim(struct thread_struct *thread, * uint8_t cause) * * - Performs a full reclaim. This destroys outstanding - * transactions and updates thread->regs.tm_ckpt_* with the - * original checkpointed state. Note that thread->regs is - * unchanged. + * transactions and updates thread.ckpt_regs, thread.ckfp_state and + * thread.ckvr_state with the original checkpointed state. Note that + * thread->regs is unchanged. * * Purpose is to both abort transactions of, and preserve the state of, * a transactions at a context switch. We preserve/restore both sets of process @@ -163,15 +164,16 @@ _GLOBAL(tm_reclaim) */ TRECLAIM(R4) /* Cause in r4 */ - /* ******************** GPRs ******************** */ - /* Stash the checkpointed r13 away in the scratch SPR and get the real - * paca + /* + * ******************** GPRs ******************** + * Stash the checkpointed r13 in the scratch SPR and get the real paca. */ SET_SCRATCH0(r13) GET_PACA(r13) - /* Stash the checkpointed r1 away in paca tm_scratch and get the real - * stack pointer back + /* + * Stash the checkpointed r1 away in paca->tm_scratch and get the real + * stack pointer back into r1. */ std r1, PACATMSCRATCH(r13) ld r1, PACAR1(r13) @@ -209,14 +211,15 @@ _GLOBAL(tm_reclaim) addi r7, r12, PT_CKPT_REGS /* Thread's ckpt_regs */ - /* Make r7 look like an exception frame so that we - * can use the neat GPRx(n) macros. r7 is NOT a pt_regs ptr! + /* + * Make r7 look like an exception frame so that we can use the neat + * GPRx(n) macros. r7 is NOT a pt_regs ptr! */ subi r7, r7, STACK_FRAME_OVERHEAD /* Sync the userland GPRs 2-12, 14-31 to thread->regs: */ SAVE_GPR(0, r7) /* user r0 */ - SAVE_GPR(2, r7) /* user r2 */ + SAVE_GPR(2, r7) /* user r2 */ SAVE_4GPRS(3, r7) /* user r3-r6 */ SAVE_GPR(8, r7) /* user r8 */ SAVE_GPR(9, r7) /* user r9 */ @@ -237,7 +240,8 @@ _GLOBAL(tm_reclaim) /* ******************** NIP ******************** */ mfspr r3, SPRN_TFHAR std r3, _NIP(r7) /* Returns to failhandler */ - /* The checkpointed NIP is ignored when rescheduling/rechkpting, + /* + * The checkpointed NIP is ignored when rescheduling/rechkpting, * but is used in signal return to 'wind back' to the abort handler. */ @@ -260,12 +264,13 @@ _GLOBAL(tm_reclaim) std r3, THREAD_TM_TAR(r12) std r4, THREAD_TM_DSCR(r12) - /* MSR and flags: We don't change CRs, and we don't need to alter - * MSR. + /* + * MSR and flags: We don't change CRs, and we don't need to alter MSR. */ - /* ******************** FPR/VR/VSRs ************ + /* + * ******************** FPR/VR/VSRs ************ * After reclaiming, capture the checkpointed FPRs/VRs. * * We enabled VEC/FP/VSX in the msr above, so we can execute these @@ -275,7 +280,7 @@ _GLOBAL(tm_reclaim) /* Altivec (VEC/VMX/VR)*/ addi r7, r3, THREAD_CKVRSTATE - SAVE_32VRS(0, r6, r7) /* r6 scratch, r7 transact vr state */ + SAVE_32VRS(0, r6, r7) /* r6 scratch, r7 ckvr_state */ mfvscr v0 li r6, VRSTATE_VSCR stvx v0, r7, r6 @@ -286,12 +291,13 @@ _GLOBAL(tm_reclaim) /* Floating Point (FP) */ addi r7, r3, THREAD_CKFPSTATE - SAVE_32FPRS_VSRS(0, R6, R7) /* r6 scratch, r7 transact fp state */ + SAVE_32FPRS_VSRS(0, R6, R7) /* r6 scratch, r7 ckfp_state */ mffs fr0 stfd fr0,FPSTATE_FPSCR(r7) - /* TM regs, incl TEXASR -- these live in thread_struct. Note they've + /* + * TM regs, incl TEXASR -- these live in thread_struct. Note they've * been updated by the treclaim, to explain to userland the failure * cause (aborted). */ @@ -327,7 +333,7 @@ _GLOBAL(tm_reclaim) blr - /* + /* * void __tm_recheckpoint(struct thread_struct *thread) * - Restore the checkpointed register state saved by tm_reclaim * when we switch_to a process. @@ -343,7 +349,8 @@ _GLOBAL(__tm_recheckpoint) std r2, STK_GOT(r1) stdu r1, -TM_FRAME_SIZE(r1) - /* We've a struct pt_regs at [r1+STACK_FRAME_OVERHEAD]. + /* + * We've a struct pt_regs at [r1+STACK_FRAME_OVERHEAD]. * This is used for backing up the NVGPRs: */ SAVE_NVGPRS(r1) @@ -352,8 +359,9 @@ _GLOBAL(__tm_recheckpoint) addi r7, r3, PT_CKPT_REGS /* Thread's ckpt_regs */ - /* Make r7 look like an exception frame so that we - * can use the neat GPRx(n) macros. r7 is now NOT a pt_regs ptr! + /* + * Make r7 look like an exception frame so that we can use the neat + * GPRx(n) macros. r7 is now NOT a pt_regs ptr! */ subi r7, r7, STACK_FRAME_OVERHEAD @@ -421,14 +429,15 @@ restore_gprs: REST_NVGPRS(r7) /* GPR14-31 */ - /* Load up PPR and DSCR here so we don't run with user values for long - */ + /* Load up PPR and DSCR here so we don't run with user values for long */ mtspr SPRN_DSCR, r5 mtspr SPRN_PPR, r6 - /* Do final sanity check on TEXASR to make sure FS is set. Do this + /* + * Do final sanity check on TEXASR to make sure FS is set. Do this * here before we load up the userspace r1 so any bugs we hit will get - * a call chain */ + * a call chain. + */ mfspr r5, SPRN_TEXASR srdi r5, r5, 16 li r6, (TEXASR_FS)@h @@ -436,8 +445,9 @@ restore_gprs: 1: tdeqi r6, 0 EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0 - /* Do final sanity check on MSR to make sure we are not transactional - * or suspended + /* + * Do final sanity check on MSR to make sure we are not transactional + * or suspended. */ mfmsr r6 li r5, (MSR_TS_MASK)@higher @@ -453,8 +463,8 @@ restore_gprs: REST_GPR(6, r7) /* - * Store r1 and r5 on the stack so that we can access them - * after we clear MSR RI. + * Store r1 and r5 on the stack so that we can access them after we + * clear MSR RI. */ REST_GPR(5, r7) @@ -484,7 +494,8 @@ restore_gprs: HMT_MEDIUM - /* Our transactional state has now changed. + /* + * Our transactional state has now changed. * * Now just get out of here. Transactional (current) state will be * updated once restore is called on the return path in the _switch-ed diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile @@ -3,11 +3,9 @@ # Makefile for the powerpc trace subsystem # -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror - ifdef CONFIG_FUNCTION_TRACER # do not trace tracer code -CFLAGS_REMOVE_ftrace.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE) endif obj32-$(CONFIG_FUNCTION_TRACER) += ftrace_32.o diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c @@ -30,6 +30,16 @@ #ifdef CONFIG_DYNAMIC_FTRACE + +/* + * We generally only have a single long_branch tramp and at most 2 or 3 plt + * tramps generated. But, we don't use the plt tramps currently. We also allot + * 2 tramps after .text and .init.text. So, we only end up with around 3 usable + * tramps in total. Set aside 8 just to be sure. + */ +#define NUM_FTRACE_TRAMPS 8 +static unsigned long ftrace_tramps[NUM_FTRACE_TRAMPS]; + static unsigned int ftrace_call_replace(unsigned long ip, unsigned long addr, int link) { @@ -85,13 +95,16 @@ static int test_24bit_addr(unsigned long ip, unsigned long addr) return create_branch((unsigned int *)ip, addr, 0); } -#ifdef CONFIG_MODULES - static int is_bl_op(unsigned int op) { return (op & 0xfc000003) == 0x48000001; } +static int is_b_op(unsigned int op) +{ + return (op & 0xfc000003) == 0x48000000; +} + static unsigned long find_bl_target(unsigned long ip, unsigned int op) { static int offset; @@ -104,6 +117,7 @@ static unsigned long find_bl_target(unsigned long ip, unsigned int op) return ip + (long)offset; } +#ifdef CONFIG_MODULES #ifdef CONFIG_PPC64 static int __ftrace_make_nop(struct module *mod, @@ -270,6 +284,146 @@ __ftrace_make_nop(struct module *mod, #endif /* PPC64 */ #endif /* CONFIG_MODULES */ +static unsigned long find_ftrace_tramp(unsigned long ip) +{ + int i; + + /* + * We have the compiler generated long_branch tramps at the end + * and we prefer those + */ + for (i = NUM_FTRACE_TRAMPS - 1; i >= 0; i--) + if (!ftrace_tramps[i]) + continue; + else if (create_branch((void *)ip, ftrace_tramps[i], 0)) + return ftrace_tramps[i]; + + return 0; +} + +static int add_ftrace_tramp(unsigned long tramp) +{ + int i; + + for (i = 0; i < NUM_FTRACE_TRAMPS; i++) + if (!ftrace_tramps[i]) { + ftrace_tramps[i] = tramp; + return 0; + } + + return -1; +} + +/* + * If this is a compiler generated long_branch trampoline (essentially, a + * trampoline that has a branch to _mcount()), we re-write the branch to + * instead go to ftrace_[regs_]caller() and note down the location of this + * trampoline. + */ +static int setup_mcount_compiler_tramp(unsigned long tramp) +{ + int i, op; + unsigned long ptr; + static unsigned long ftrace_plt_tramps[NUM_FTRACE_TRAMPS]; + + /* Is this a known long jump tramp? */ + for (i = 0; i < NUM_FTRACE_TRAMPS; i++) + if (!ftrace_tramps[i]) + break; + else if (ftrace_tramps[i] == tramp) + return 0; + + /* Is this a known plt tramp? */ + for (i = 0; i < NUM_FTRACE_TRAMPS; i++) + if (!ftrace_plt_tramps[i]) + break; + else if (ftrace_plt_tramps[i] == tramp) + return -1; + + /* New trampoline -- read where this goes */ + if (probe_kernel_read(&op, (void *)tramp, sizeof(int))) { + pr_debug("Fetching opcode failed.\n"); + return -1; + } + + /* Is this a 24 bit branch? */ + if (!is_b_op(op)) { + pr_debug("Trampoline is not a long branch tramp.\n"); + return -1; + } + + /* lets find where the pointer goes */ + ptr = find_bl_target(tramp, op); + + if (ptr != ppc_global_function_entry((void *)_mcount)) { + pr_debug("Trampoline target %p is not _mcount\n", (void *)ptr); + return -1; + } + + /* Let's re-write the tramp to go to ftrace_[regs_]caller */ +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS + ptr = ppc_global_function_entry((void *)ftrace_regs_caller); +#else + ptr = ppc_global_function_entry((void *)ftrace_caller); +#endif + if (!create_branch((void *)tramp, ptr, 0)) { + pr_debug("%ps is not reachable from existing mcount tramp\n", + (void *)ptr); + return -1; + } + + if (patch_branch((unsigned int *)tramp, ptr, 0)) { + pr_debug("REL24 out of range!\n"); + return -1; + } + + if (add_ftrace_tramp(tramp)) { + pr_debug("No tramp locations left\n"); + return -1; + } + + return 0; +} + +static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr) +{ + unsigned long tramp, ip = rec->ip; + unsigned int op; + + /* Read where this goes */ + if (probe_kernel_read(&op, (void *)ip, sizeof(int))) { + pr_err("Fetching opcode failed.\n"); + return -EFAULT; + } + + /* Make sure that that this is still a 24bit jump */ + if (!is_bl_op(op)) { + pr_err("Not expected bl: opcode is %x\n", op); + return -EINVAL; + } + + /* Let's find where the pointer goes */ + tramp = find_bl_target(ip, op); + + pr_devel("ip:%lx jumps to %lx", ip, tramp); + + if (setup_mcount_compiler_tramp(tramp)) { + /* Are other trampolines reachable? */ + if (!find_ftrace_tramp(ip)) { + pr_err("No ftrace trampolines reachable from %ps\n", + (void *)ip); + return -EINVAL; + } + } + + if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) { + pr_err("Patching NOP failed.\n"); + return -EPERM; + } + + return 0; +} + int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr) { @@ -286,7 +440,8 @@ int ftrace_make_nop(struct module *mod, old = ftrace_call_replace(ip, addr, 1); new = PPC_INST_NOP; return ftrace_modify_code(ip, old, new); - } + } else if (core_kernel_text(ip)) + return __ftrace_make_nop_kernel(rec, addr); #ifdef CONFIG_MODULES /* @@ -456,6 +611,53 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) #endif /* CONFIG_PPC64 */ #endif /* CONFIG_MODULES */ +static int __ftrace_make_call_kernel(struct dyn_ftrace *rec, unsigned long addr) +{ + unsigned int op; + void *ip = (void *)rec->ip; + unsigned long tramp, entry, ptr; + + /* Make sure we're being asked to patch branch to a known ftrace addr */ + entry = ppc_global_function_entry((void *)ftrace_caller); + ptr = ppc_global_function_entry((void *)addr); + + if (ptr != entry) { +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS + entry = ppc_global_function_entry((void *)ftrace_regs_caller); + if (ptr != entry) { +#endif + pr_err("Unknown ftrace addr to patch: %ps\n", (void *)ptr); + return -EINVAL; +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS + } +#endif + } + + /* Make sure we have a nop */ + if (probe_kernel_read(&op, ip, sizeof(op))) { + pr_err("Unable to read ftrace location %p\n", ip); + return -EFAULT; + } + + if (op != PPC_INST_NOP) { + pr_err("Unexpected call sequence at %p: %x\n", ip, op); + return -EINVAL; + } + + tramp = find_ftrace_tramp((unsigned long)ip); + if (!tramp) { + pr_err("No ftrace trampolines reachable from %ps\n", ip); + return -EINVAL; + } + + if (patch_branch(ip, tramp, BRANCH_SET_LINK)) { + pr_err("Error patching branch to ftrace tramp!\n"); + return -EINVAL; + } + + return 0; +} + int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) { unsigned long ip = rec->ip; @@ -471,7 +673,8 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) old = PPC_INST_NOP; new = ftrace_call_replace(ip, addr, 1); return ftrace_modify_code(ip, old, new); - } + } else if (core_kernel_text(ip)) + return __ftrace_make_call_kernel(rec, addr); #ifdef CONFIG_MODULES /* @@ -603,6 +806,12 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, old = ftrace_call_replace(ip, old_addr, 1); new = ftrace_call_replace(ip, addr, 1); return ftrace_modify_code(ip, old, new); + } else if (core_kernel_text(ip)) { + /* + * We always patch out of range locations to go to the regs + * variant, so there is nothing to do here + */ + return 0; } #ifdef CONFIG_MODULES @@ -654,10 +863,54 @@ void arch_ftrace_update_code(int command) ftrace_modify_all_code(command); } +#ifdef CONFIG_PPC64 +#define PACATOC offsetof(struct paca_struct, kernel_toc) + +#define PPC_LO(v) ((v) & 0xffff) +#define PPC_HI(v) (((v) >> 16) & 0xffff) +#define PPC_HA(v) PPC_HI ((v) + 0x8000) + +extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[]; + +int __init ftrace_dyn_arch_init(void) +{ + int i; + unsigned int *tramp[] = { ftrace_tramp_text, ftrace_tramp_init }; + u32 stub_insns[] = { + 0xe98d0000 | PACATOC, /* ld r12,PACATOC(r13) */ + 0x3d8c0000, /* addis r12,r12,<high> */ + 0x398c0000, /* addi r12,r12,<low> */ + 0x7d8903a6, /* mtctr r12 */ + 0x4e800420, /* bctr */ + }; +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS + unsigned long addr = ppc_global_function_entry((void *)ftrace_regs_caller); +#else + unsigned long addr = ppc_global_function_entry((void *)ftrace_caller); +#endif + long reladdr = addr - kernel_toc_addr(); + + if (reladdr > 0x7FFFFFFF || reladdr < -(0x80000000L)) { + pr_err("Address of %ps out of range of kernel_toc.\n", + (void *)addr); + return -1; + } + + for (i = 0; i < 2; i++) { + memcpy(tramp[i], stub_insns, sizeof(stub_insns)); + tramp[i][1] |= PPC_HA(reladdr); + tramp[i][2] |= PPC_LO(reladdr); + add_ftrace_tramp((unsigned long)tramp[i]); + } + + return 0; +} +#else int __init ftrace_dyn_arch_init(void) { return 0; } +#endif #endif /* CONFIG_DYNAMIC_FTRACE */ #ifdef CONFIG_FUNCTION_GRAPH_TRACER diff --git a/arch/powerpc/kernel/trace/ftrace_64.S b/arch/powerpc/kernel/trace/ftrace_64.S @@ -14,6 +14,18 @@ #include <asm/ppc-opcode.h> #include <asm/export.h> +.pushsection ".tramp.ftrace.text","aw",@progbits; +.globl ftrace_tramp_text +ftrace_tramp_text: + .space 64 +.popsection + +.pushsection ".tramp.ftrace.init","aw",@progbits; +.globl ftrace_tramp_init +ftrace_tramp_init: + .space 64 +.popsection + _GLOBAL(mcount) _GLOBAL(_mcount) EXPORT_SYMBOL(_mcount) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c @@ -247,8 +247,6 @@ static void oops_end(unsigned long flags, struct pt_regs *regs, mdelay(MSEC_PER_SEC); } - if (in_interrupt()) - panic("Fatal exception in interrupt"); if (panic_on_oops) panic("Fatal exception"); do_exit(signr); @@ -535,10 +533,10 @@ int machine_check_e500mc(struct pt_regs *regs) printk("Caused by (from MCSR=%lx): ", reason); if (reason & MCSR_MCP) - printk("Machine Check Signal\n"); + pr_cont("Machine Check Signal\n"); if (reason & MCSR_ICPERR) { - printk("Instruction Cache Parity Error\n"); + pr_cont("Instruction Cache Parity Error\n"); /* * This is recoverable by invalidating the i-cache. @@ -556,7 +554,7 @@ int machine_check_e500mc(struct pt_regs *regs) } if (reason & MCSR_DCPERR_MC) { - printk("Data Cache Parity Error\n"); + pr_cont("Data Cache Parity Error\n"); /* * In write shadow mode we auto-recover from the error, but it @@ -575,38 +573,38 @@ int machine_check_e500mc(struct pt_regs *regs) } if (reason & MCSR_L2MMU_MHIT) { - printk("Hit on multiple TLB entries\n"); + pr_cont("Hit on multiple TLB entries\n"); recoverable = 0; } if (reason & MCSR_NMI) - printk("Non-maskable interrupt\n"); + pr_cont("Non-maskable interrupt\n"); if (reason & MCSR_IF) { - printk("Instruction Fetch Error Report\n"); + pr_cont("Instruction Fetch Error Report\n"); recoverable = 0; } if (reason & MCSR_LD) { - printk("Load Error Report\n"); + pr_cont("Load Error Report\n"); recoverable = 0; } if (reason & MCSR_ST) { - printk("Store Error Report\n"); + pr_cont("Store Error Report\n"); recoverable = 0; } if (reason & MCSR_LDG) { - printk("Guarded Load Error Report\n"); + pr_cont("Guarded Load Error Report\n"); recoverable = 0; } if (reason & MCSR_TLBSYNC) - printk("Simultaneous tlbsync operations\n"); + pr_cont("Simultaneous tlbsync operations\n"); if (reason & MCSR_BSL2_ERR) { - printk("Level 2 Cache Error\n"); + pr_cont("Level 2 Cache Error\n"); recoverable = 0; } @@ -616,7 +614,7 @@ int machine_check_e500mc(struct pt_regs *regs) addr = mfspr(SPRN_MCAR); addr |= (u64)mfspr(SPRN_MCARU) << 32; - printk("Machine Check %s Address: %#llx\n", + pr_cont("Machine Check %s Address: %#llx\n", reason & MCSR_MEA ? "Effective" : "Physical", addr); } @@ -640,29 +638,29 @@ int machine_check_e500(struct pt_regs *regs) printk("Caused by (from MCSR=%lx): ", reason); if (reason & MCSR_MCP) - printk("Machine Check Signal\n"); + pr_cont("Machine Check Signal\n"); if (reason & MCSR_ICPERR) - printk("Instruction Cache Parity Error\n"); + pr_cont("Instruction Cache Parity Error\n"); if (reason & MCSR_DCP_PERR) - printk("Data Cache Push Parity Error\n"); + pr_cont("Data Cache Push Parity Error\n"); if (reason & MCSR_DCPERR) - printk("Data Cache Parity Error\n"); + pr_cont("Data Cache Parity Error\n"); if (reason & MCSR_BUS_IAERR) - printk("Bus - Instruction Address Error\n"); + pr_cont("Bus - Instruction Address Error\n"); if (reason & MCSR_BUS_RAERR) - printk("Bus - Read Address Error\n"); + pr_cont("Bus - Read Address Error\n"); if (reason & MCSR_BUS_WAERR) - printk("Bus - Write Address Error\n"); + pr_cont("Bus - Write Address Error\n"); if (reason & MCSR_BUS_IBERR) - printk("Bus - Instruction Data Error\n"); + pr_cont("Bus - Instruction Data Error\n"); if (reason & MCSR_BUS_RBERR) - printk("Bus - Read Data Bus Error\n"); + pr_cont("Bus - Read Data Bus Error\n"); if (reason & MCSR_BUS_WBERR) - printk("Bus - Write Data Bus Error\n"); + pr_cont("Bus - Write Data Bus Error\n"); if (reason & MCSR_BUS_IPERR) - printk("Bus - Instruction Parity Error\n"); + pr_cont("Bus - Instruction Parity Error\n"); if (reason & MCSR_BUS_RPERR) - printk("Bus - Read Parity Error\n"); + pr_cont("Bus - Read Parity Error\n"); return 0; } @@ -680,19 +678,19 @@ int machine_check_e200(struct pt_regs *regs) printk("Caused by (from MCSR=%lx): ", reason); if (reason & MCSR_MCP) - printk("Machine Check Signal\n"); + pr_cont("Machine Check Signal\n"); if (reason & MCSR_CP_PERR) - printk("Cache Push Parity Error\n"); + pr_cont("Cache Push Parity Error\n"); if (reason & MCSR_CPERR) - printk("Cache Parity Error\n"); + pr_cont("Cache Parity Error\n"); if (reason & MCSR_EXCP_ERR) - printk("ISI, ITLB, or Bus Error on first instruction fetch for an exception handler\n"); + pr_cont("ISI, ITLB, or Bus Error on first instruction fetch for an exception handler\n"); if (reason & MCSR_BUS_IRERR) - printk("Bus - Read Bus Error on instruction fetch\n"); + pr_cont("Bus - Read Bus Error on instruction fetch\n"); if (reason & MCSR_BUS_DRERR) - printk("Bus - Read Bus Error on data load\n"); + pr_cont("Bus - Read Bus Error on data load\n"); if (reason & MCSR_BUS_WRERR) - printk("Bus - Write Bus Error on buffered store or cache line push\n"); + pr_cont("Bus - Write Bus Error on buffered store or cache line push\n"); return 0; } @@ -705,30 +703,30 @@ int machine_check_generic(struct pt_regs *regs) printk("Caused by (from SRR1=%lx): ", reason); switch (reason & 0x601F0000) { case 0x80000: - printk("Machine check signal\n"); + pr_cont("Machine check signal\n"); break; case 0: /* for 601 */ case 0x40000: case 0x140000: /* 7450 MSS error and TEA */ - printk("Transfer error ack signal\n"); + pr_cont("Transfer error ack signal\n"); break; case 0x20000: - printk("Data parity error signal\n"); + pr_cont("Data parity error signal\n"); break; case 0x10000: - printk("Address parity error signal\n"); + pr_cont("Address parity error signal\n"); break; case 0x20000000: - printk("L1 Data Cache error\n"); + pr_cont("L1 Data Cache error\n"); break; case 0x40000000: - printk("L1 Instruction Cache error\n"); + pr_cont("L1 Instruction Cache error\n"); break; case 0x00100000: - printk("L2 data cache parity error\n"); + pr_cont("L2 data cache parity error\n"); break; default: - printk("Unknown values in msr\n"); + pr_cont("Unknown values in msr\n"); } return 0; } @@ -741,9 +739,7 @@ void machine_check_exception(struct pt_regs *regs) if (!nested) nmi_enter(); - /* 64s accounts the mce in machine_check_early when in HVMODE */ - if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) || !cpu_has_feature(CPU_FTR_HVMODE)) - __this_cpu_inc(irq_stat.mce_exceptions); + __this_cpu_inc(irq_stat.mce_exceptions); add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE); @@ -767,12 +763,17 @@ void machine_check_exception(struct pt_regs *regs) if (check_io_access(regs)) goto bail; - die("Machine check", regs, SIGBUS); - /* Must die if the interrupt is not recoverable */ if (!(regs->msr & MSR_RI)) nmi_panic(regs, "Unrecoverable Machine check"); + if (!nested) + nmi_exit(); + + die("Machine check", regs, SIGBUS); + + return; + bail: if (!nested) nmi_exit(); @@ -1433,7 +1434,7 @@ void program_check_exception(struct pt_regs *regs) goto bail; } else { printk(KERN_EMERG "Unexpected TM Bad Thing exception " - "at %lx (msr 0x%x)\n", regs->nip, reason); + "at %lx (msr 0x%lx)\n", regs->nip, regs->msr); die("Unrecoverable exception", regs, SIGABRT); } } @@ -1547,14 +1548,6 @@ void StackOverflow(struct pt_regs *regs) panic("kernel stack overflow"); } -void nonrecoverable_exception(struct pt_regs *regs) -{ - printk(KERN_ERR "Non-recoverable exception at PC=%lx MSR=%lx\n", - regs->nip, regs->msr); - debugger(regs); - die("nonrecoverable exception", regs, SIGKILL); -} - void kernel_fp_unavailable_exception(struct pt_regs *regs) { enum ctx_state prev_state = exception_enter(); @@ -1750,16 +1743,20 @@ void fp_unavailable_tm(struct pt_regs *regs) * checkpointed FP registers need to be loaded. */ tm_reclaim_current(TM_CAUSE_FAC_UNAV); - /* Reclaim didn't save out any FPRs to transact_fprs. */ + + /* + * Reclaim initially saved out bogus (lazy) FPRs to ckfp_state, and + * then it was overwrite by the thr->fp_state by tm_reclaim_thread(). + * + * At this point, ck{fp,vr}_state contains the exact values we want to + * recheckpoint. + */ /* Enable FP for the task: */ current->thread.load_fp = 1; - /* This loads and recheckpoints the FP registers from - * thread.fpr[]. They will remain in registers after the - * checkpoint so we don't need to reload them after. - * If VMX is in use, the VRs now hold checkpointed values, - * so we don't want to load the VRs from the thread_struct. + /* + * Recheckpoint all the checkpointed ckpt, ck{fp, vr}_state registers. */ tm_recheckpoint(&current->thread); } @@ -2086,8 +2083,8 @@ void SPEFloatingPointRoundException(struct pt_regs *regs) */ void unrecoverable_exception(struct pt_regs *regs) { - printk(KERN_EMERG "Unrecoverable exception %lx at %lx\n", - regs->trap, regs->nip); + pr_emerg("Unrecoverable exception %lx at %lx (msr=%lx)\n", + regs->trap, regs->nip, regs->msr); die("Unrecoverable exception", regs, SIGABRT); } NOKPROBE_SYMBOL(unrecoverable_exception); diff --git a/arch/powerpc/kernel/vdso32/datapage.S b/arch/powerpc/kernel/vdso32/datapage.S @@ -37,6 +37,7 @@ data_page_branch: mtlr r0 addi r3, r3, __kernel_datapage_offset-data_page_branch lwz r0,0(r3) + .cfi_restore lr add r3,r0,r3 blr .cfi_endproc diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S @@ -139,6 +139,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime) */ 99: li r0,__NR_clock_gettime + .cfi_restore lr sc blr .cfi_endproc diff --git a/arch/powerpc/kernel/vdso64/datapage.S b/arch/powerpc/kernel/vdso64/datapage.S @@ -37,6 +37,7 @@ data_page_branch: mtlr r0 addi r3, r3, __kernel_datapage_offset-data_page_branch lwz r0,0(r3) + .cfi_restore lr add r3,r0,r3 blr .cfi_endproc diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S b/arch/powerpc/kernel/vdso64/gettimeofday.S @@ -169,6 +169,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime) */ 99: li r0,__NR_clock_gettime + .cfi_restore lr sc blr .cfi_endproc diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S @@ -4,6 +4,9 @@ #else #define PROVIDE32(x) PROVIDE(x) #endif + +#define BSS_FIRST_SECTIONS *(.bss.prominit) + #include <asm/page.h> #include <asm-generic/vmlinux.lds.h> #include <asm/cache.h> @@ -99,6 +102,9 @@ SECTIONS #endif /* careful! __ftr_alt_* sections need to be close to .text */ *(.text.hot TEXT_MAIN .text.fixup .text.unlikely .fixup __ftr_alt_* .ref.text); +#ifdef CONFIG_PPC64 + *(.tramp.ftrace.text); +#endif SCHED_TEXT CPUIDLE_TEXT LOCK_TEXT @@ -181,7 +187,15 @@ SECTIONS */ . = ALIGN(STRICT_ALIGN_SIZE); __init_begin = .; - INIT_TEXT_SECTION(PAGE_SIZE) :kernel + . = ALIGN(PAGE_SIZE); + .init.text : AT(ADDR(.init.text) - LOAD_OFFSET) { + _sinittext = .; + INIT_TEXT + _einittext = .; +#ifdef CONFIG_PPC64 + *(.tramp.ftrace.init); +#endif + } :kernel /* .exit.text is discarded at runtime, not link time, * to deal with references from __bug_table diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile @@ -3,8 +3,6 @@ # Makefile for Kernel-based Virtual Machine module # -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror - ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm KVM := ../../../virt/kvm diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile @@ -3,8 +3,6 @@ # Makefile for ppc-specific library files.. # -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror - ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC) CFLAGS_REMOVE_code-patching.o = $(CC_FLAGS_FTRACE) @@ -14,6 +12,8 @@ obj-y += string.o alloc.o code-patching.o feature-fixups.o obj-$(CONFIG_PPC32) += div64.o copy_32.o crtsavres.o strlen_32.o +obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o + # See corresponding test in arch/powerpc/Makefile # 64-bit linker creates .sfpr on demand for final link (vmlinux), # so it is only needed for modules, and only for older linkers which diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c @@ -98,8 +98,7 @@ static int map_patch_area(void *addr, unsigned long text_poke_addr) else pfn = __pa_symbol(addr) >> PAGE_SHIFT; - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), - pgprot_val(PAGE_KERNEL)); + err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL); pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err); if (err) diff --git a/arch/powerpc/lib/error-inject.c b/arch/powerpc/lib/error-inject.c @@ -0,0 +1,16 @@ +// SPDX-License-Identifier: GPL-2.0+ + +#include <linux/error-injection.h> +#include <linux/kprobes.h> +#include <linux/uaccess.h> + +void override_function_with_return(struct pt_regs *regs) +{ + /* + * Emulate 'blr'. 'regs' represents the state on entry of a predefined + * function in the kernel/module, captured on a kprobe. We don't need + * to worry about 32-bit userspace on a 64-bit kernel. + */ + regs->nip = regs->link; +} +NOKPROBE_SYMBOL(override_function_with_return); diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S @@ -40,7 +40,7 @@ _GLOBAL(memset) .Lms: PPC_MTOCRF(1,r0) mr r6,r3 blt cr1,8f - beq+ 3f /* if already 8-byte aligned */ + beq 3f /* if already 8-byte aligned */ subf r5,r0,r5 bf 31,1f stb r4,0(r6) @@ -85,7 +85,7 @@ _GLOBAL(memset) addi r6,r6,8 8: cmpwi r5,0 PPC_MTOCRF(1,r5) - beqlr+ + beqlr bf 29,9f stw r4,0(r6) addi r6,r6,4 diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c @@ -67,7 +67,7 @@ void __init MMU_init_hw(void) /* PIN up to the 3 first 8Mb after IMMR in DTLB table */ #ifdef CONFIG_PIN_TLB_DATA unsigned long ctr = mfspr(SPRN_MD_CTR) & 0xfe000000; - unsigned long flags = 0xf0 | MD_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY; + unsigned long flags = 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY; #ifdef CONFIG_PIN_TLB_IMMR int i = 29; #else @@ -91,11 +91,10 @@ static void __init mmu_mapin_immr(void) { unsigned long p = PHYS_IMMR_BASE; unsigned long v = VIRT_IMMR_BASE; - unsigned long f = pgprot_val(PAGE_KERNEL_NCG); int offset; for (offset = 0; offset < IMMR_SIZE; offset += PAGE_SIZE) - map_kernel_page(v + offset, p + offset, f); + map_kernel_page(v + offset, p + offset, PAGE_KERNEL_NCG); } /* Address of instructions to patch */ diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile @@ -3,10 +3,10 @@ # Makefile for the linux ppc-specific parts of the memory manager. # -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror - ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC) +CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE) + obj-y := fault.o mem.o pgtable.o mmap.o \ init_$(BITS).o pgtable_$(BITS).o \ init-common.o mmu_context.o drmem.o @@ -15,7 +15,7 @@ obj-$(CONFIG_PPC_MMU_NOHASH) += mmu_context_nohash.o tlb_nohash.o \ obj-$(CONFIG_PPC_BOOK3E) += tlb_low_$(BITS)e.o hash64-$(CONFIG_PPC_NATIVE) := hash_native_64.o obj-$(CONFIG_PPC_BOOK3E_64) += pgtable-book3e.o -obj-$(CONFIG_PPC_BOOK3S_64) += pgtable-hash64.o hash_utils_64.o slb_low.o slb.o $(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o +obj-$(CONFIG_PPC_BOOK3S_64) += pgtable-hash64.o hash_utils_64.o slb.o $(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o obj-$(CONFIG_PPC_RADIX_MMU) += pgtable-radix.o tlb-radix.o obj-$(CONFIG_PPC_STD_MMU_32) += ppc_mmu_32.o hash_low_32.o mmu_context_hash32.o obj-$(CONFIG_PPC_STD_MMU) += tlb_hash$(BITS).o @@ -43,5 +43,12 @@ obj-$(CONFIG_HIGHMEM) += highmem.o obj-$(CONFIG_PPC_COPRO_BASE) += copro_fault.o obj-$(CONFIG_SPAPR_TCE_IOMMU) += mmu_context_iommu.o obj-$(CONFIG_PPC_PTDUMP) += dump_linuxpagetables.o +ifdef CONFIG_PPC_PTDUMP +obj-$(CONFIG_4xx) += dump_linuxpagetables-generic.o +obj-$(CONFIG_PPC_8xx) += dump_linuxpagetables-8xx.o +obj-$(CONFIG_PPC_BOOK3E_MMU) += dump_linuxpagetables-generic.o +obj-$(CONFIG_PPC_BOOK3S_32) += dump_linuxpagetables-generic.o +obj-$(CONFIG_PPC_BOOK3S_64) += dump_linuxpagetables-book3s64.o +endif obj-$(CONFIG_PPC_HTDUMP) += dump_hashpagetable.o obj-$(CONFIG_PPC_MEM_KEYS) += pkeys.o diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c @@ -228,7 +228,7 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t do { SetPageReserved(page); map_kernel_page(vaddr, page_to_phys(page), - pgprot_val(pgprot_noncached(PAGE_KERNEL))); + pgprot_noncached(PAGE_KERNEL)); page++; vaddr += PAGE_SIZE; } while (size -= PAGE_SIZE); diff --git a/arch/powerpc/mm/dump_linuxpagetables-8xx.c b/arch/powerpc/mm/dump_linuxpagetables-8xx.c @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * From split of dump_linuxpagetables.c + * Copyright 2016, Rashmica Gupta, IBM Corp. + * + */ +#include <linux/kernel.h> +#include <asm/pgtable.h> + +#include "dump_linuxpagetables.h" + +static const struct flag_info flag_array[] = { + { + .mask = _PAGE_SH, + .val = 0, + .set = "user", + .clear = " ", + }, { + .mask = _PAGE_RO | _PAGE_NA, + .val = 0, + .set = "rw", + }, { + .mask = _PAGE_RO | _PAGE_NA, + .val = _PAGE_RO, + .set = "r ", + }, { + .mask = _PAGE_RO | _PAGE_NA, + .val = _PAGE_NA, + .set = " ", + }, { + .mask = _PAGE_EXEC, + .val = _PAGE_EXEC, + .set = " X ", + .clear = " ", + }, { + .mask = _PAGE_PRESENT, + .val = _PAGE_PRESENT, + .set = "present", + .clear = " ", + }, { + .mask = _PAGE_GUARDED, + .val = _PAGE_GUARDED, + .set = "guarded", + .clear = " ", + }, { + .mask = _PAGE_DIRTY, + .val = _PAGE_DIRTY, + .set = "dirty", + .clear = " ", + }, { + .mask = _PAGE_ACCESSED, + .val = _PAGE_ACCESSED, + .set = "accessed", + .clear = " ", + }, { + .mask = _PAGE_NO_CACHE, + .val = _PAGE_NO_CACHE, + .set = "no cache", + .clear = " ", + }, { + .mask = _PAGE_SPECIAL, + .val = _PAGE_SPECIAL, + .set = "special", + } +}; + +struct pgtable_level pg_level[5] = { + { + }, { /* pgd */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, { /* pud */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, { /* pmd */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, { /* pte */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, +}; diff --git a/arch/powerpc/mm/dump_linuxpagetables-book3s64.c b/arch/powerpc/mm/dump_linuxpagetables-book3s64.c @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * From split of dump_linuxpagetables.c + * Copyright 2016, Rashmica Gupta, IBM Corp. + * + */ +#include <linux/kernel.h> +#include <asm/pgtable.h> + +#include "dump_linuxpagetables.h" + +static const struct flag_info flag_array[] = { + { + .mask = _PAGE_PRIVILEGED, + .val = 0, + .set = "user", + .clear = " ", + }, { + .mask = _PAGE_READ, + .val = _PAGE_READ, + .set = "r", + .clear = " ", + }, { + .mask = _PAGE_WRITE, + .val = _PAGE_WRITE, + .set = "w", + .clear = " ", + }, { + .mask = _PAGE_EXEC, + .val = _PAGE_EXEC, + .set = " X ", + .clear = " ", + }, { + .mask = _PAGE_PTE, + .val = _PAGE_PTE, + .set = "pte", + .clear = " ", + }, { + .mask = _PAGE_PRESENT, + .val = _PAGE_PRESENT, + .set = "valid", + .clear = " ", + }, { + .mask = _PAGE_PRESENT | _PAGE_INVALID, + .val = 0, + .set = " ", + .clear = "present", + }, { + .mask = H_PAGE_HASHPTE, + .val = H_PAGE_HASHPTE, + .set = "hpte", + .clear = " ", + }, { + .mask = _PAGE_DIRTY, + .val = _PAGE_DIRTY, + .set = "dirty", + .clear = " ", + }, { + .mask = _PAGE_ACCESSED, + .val = _PAGE_ACCESSED, + .set = "accessed", + .clear = " ", + }, { + .mask = _PAGE_NON_IDEMPOTENT, + .val = _PAGE_NON_IDEMPOTENT, + .set = "non-idempotent", + .clear = " ", + }, { + .mask = _PAGE_TOLERANT, + .val = _PAGE_TOLERANT, + .set = "tolerant", + .clear = " ", + }, { + .mask = H_PAGE_BUSY, + .val = H_PAGE_BUSY, + .set = "busy", + }, { +#ifdef CONFIG_PPC_64K_PAGES + .mask = H_PAGE_COMBO, + .val = H_PAGE_COMBO, + .set = "combo", + }, { + .mask = H_PAGE_4K_PFN, + .val = H_PAGE_4K_PFN, + .set = "4K_pfn", + }, { +#else /* CONFIG_PPC_64K_PAGES */ + .mask = H_PAGE_F_GIX, + .val = H_PAGE_F_GIX, + .set = "f_gix", + .is_val = true, + .shift = H_PAGE_F_GIX_SHIFT, + }, { + .mask = H_PAGE_F_SECOND, + .val = H_PAGE_F_SECOND, + .set = "f_second", + }, { +#endif /* CONFIG_PPC_64K_PAGES */ + .mask = _PAGE_SPECIAL, + .val = _PAGE_SPECIAL, + .set = "special", + } +}; + +struct pgtable_level pg_level[5] = { + { + }, { /* pgd */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, { /* pud */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, { /* pmd */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, { /* pte */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, +}; diff --git a/arch/powerpc/mm/dump_linuxpagetables-generic.c b/arch/powerpc/mm/dump_linuxpagetables-generic.c @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * From split of dump_linuxpagetables.c + * Copyright 2016, Rashmica Gupta, IBM Corp. + * + */ +#include <linux/kernel.h> +#include <asm/pgtable.h> + +#include "dump_linuxpagetables.h" + +static const struct flag_info flag_array[] = { + { + .mask = _PAGE_USER, + .val = _PAGE_USER, + .set = "user", + .clear = " ", + }, { + .mask = _PAGE_RW, + .val = _PAGE_RW, + .set = "rw", + .clear = "r ", + }, { +#ifndef CONFIG_PPC_BOOK3S_32 + .mask = _PAGE_EXEC, + .val = _PAGE_EXEC, + .set = " X ", + .clear = " ", + }, { +#endif + .mask = _PAGE_PRESENT, + .val = _PAGE_PRESENT, + .set = "present", + .clear = " ", + }, { + .mask = _PAGE_GUARDED, + .val = _PAGE_GUARDED, + .set = "guarded", + .clear = " ", + }, { + .mask = _PAGE_DIRTY, + .val = _PAGE_DIRTY, + .set = "dirty", + .clear = " ", + }, { + .mask = _PAGE_ACCESSED, + .val = _PAGE_ACCESSED, + .set = "accessed", + .clear = " ", + }, { + .mask = _PAGE_WRITETHRU, + .val = _PAGE_WRITETHRU, + .set = "write through", + .clear = " ", + }, { + .mask = _PAGE_NO_CACHE, + .val = _PAGE_NO_CACHE, + .set = "no cache", + .clear = " ", + }, { + .mask = _PAGE_SPECIAL, + .val = _PAGE_SPECIAL, + .set = "special", + } +}; + +struct pgtable_level pg_level[5] = { + { + }, { /* pgd */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, { /* pud */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, { /* pmd */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, { /* pte */ + .flag = flag_array, + .num = ARRAY_SIZE(flag_array), + }, +}; diff --git a/arch/powerpc/mm/dump_linuxpagetables.c b/arch/powerpc/mm/dump_linuxpagetables.c @@ -27,6 +27,8 @@ #include <asm/page.h> #include <asm/pgalloc.h> +#include "dump_linuxpagetables.h" + #ifdef CONFIG_PPC32 #define KERN_VIRT_START 0 #endif @@ -101,159 +103,6 @@ static struct addr_marker address_markers[] = { { -1, NULL }, }; -struct flag_info { - u64 mask; - u64 val; - const char *set; - const char *clear; - bool is_val; - int shift; -}; - -static const struct flag_info flag_array[] = { - { - .mask = _PAGE_USER | _PAGE_PRIVILEGED, - .val = _PAGE_USER, - .set = "user", - .clear = " ", - }, { - .mask = _PAGE_RW | _PAGE_RO | _PAGE_NA, - .val = _PAGE_RW, - .set = "rw", - }, { - .mask = _PAGE_RW | _PAGE_RO | _PAGE_NA, - .val = _PAGE_RO, - .set = "ro", - }, { -#if _PAGE_NA != 0 - .mask = _PAGE_RW | _PAGE_RO | _PAGE_NA, - .val = _PAGE_RO, - .set = "na", - }, { -#endif - .mask = _PAGE_EXEC, - .val = _PAGE_EXEC, - .set = " X ", - .clear = " ", - }, { - .mask = _PAGE_PTE, - .val = _PAGE_PTE, - .set = "pte", - .clear = " ", - }, { - .mask = _PAGE_PRESENT, - .val = _PAGE_PRESENT, - .set = "present", - .clear = " ", - }, { -#ifdef CONFIG_PPC_BOOK3S_64 - .mask = H_PAGE_HASHPTE, - .val = H_PAGE_HASHPTE, -#else - .mask = _PAGE_HASHPTE, - .val = _PAGE_HASHPTE, -#endif - .set = "hpte", - .clear = " ", - }, { -#ifndef CONFIG_PPC_BOOK3S_64 - .mask = _PAGE_GUARDED, - .val = _PAGE_GUARDED, - .set = "guarded", - .clear = " ", - }, { -#endif - .mask = _PAGE_DIRTY, - .val = _PAGE_DIRTY, - .set = "dirty", - .clear = " ", - }, { - .mask = _PAGE_ACCESSED, - .val = _PAGE_ACCESSED, - .set = "accessed", - .clear = " ", - }, { -#ifndef CONFIG_PPC_BOOK3S_64 - .mask = _PAGE_WRITETHRU, - .val = _PAGE_WRITETHRU, - .set = "write through", - .clear = " ", - }, { -#endif -#ifndef CONFIG_PPC_BOOK3S_64 - .mask = _PAGE_NO_CACHE, - .val = _PAGE_NO_CACHE, - .set = "no cache", - .clear = " ", - }, { -#else - .mask = _PAGE_NON_IDEMPOTENT, - .val = _PAGE_NON_IDEMPOTENT, - .set = "non-idempotent", - .clear = " ", - }, { - .mask = _PAGE_TOLERANT, - .val = _PAGE_TOLERANT, - .set = "tolerant", - .clear = " ", - }, { -#endif -#ifdef CONFIG_PPC_BOOK3S_64 - .mask = H_PAGE_BUSY, - .val = H_PAGE_BUSY, - .set = "busy", - }, { -#ifdef CONFIG_PPC_64K_PAGES - .mask = H_PAGE_COMBO, - .val = H_PAGE_COMBO, - .set = "combo", - }, { - .mask = H_PAGE_4K_PFN, - .val = H_PAGE_4K_PFN, - .set = "4K_pfn", - }, { -#else /* CONFIG_PPC_64K_PAGES */ - .mask = H_PAGE_F_GIX, - .val = H_PAGE_F_GIX, - .set = "f_gix", - .is_val = true, - .shift = H_PAGE_F_GIX_SHIFT, - }, { - .mask = H_PAGE_F_SECOND, - .val = H_PAGE_F_SECOND, - .set = "f_second", - }, { -#endif /* CONFIG_PPC_64K_PAGES */ -#endif - .mask = _PAGE_SPECIAL, - .val = _PAGE_SPECIAL, - .set = "special", - } -}; - -struct pgtable_level { - const struct flag_info *flag; - size_t num; - u64 mask; -}; - -static struct pgtable_level pg_level[] = { - { - }, { /* pgd */ - .flag = flag_array, - .num = ARRAY_SIZE(flag_array), - }, { /* pud */ - .flag = flag_array, - .num = ARRAY_SIZE(flag_array), - }, { /* pmd */ - .flag = flag_array, - .num = ARRAY_SIZE(flag_array), - }, { /* pte */ - .flag = flag_array, - .num = ARRAY_SIZE(flag_array), - }, -}; - static void dump_flag_info(struct pg_state *st, const struct flag_info *flag, u64 pte, int num) { @@ -418,12 +267,13 @@ static void walk_pagetables(struct pg_state *st) unsigned int i; unsigned long addr; + addr = st->start_address; + /* * Traverse the linux pagetable structure and dump pages that are in * the hash pagetable. */ - for (i = 0; i < PTRS_PER_PGD; i++, pgd++) { - addr = KERN_VIRT_START + i * PGDIR_SIZE; + for (i = 0; i < PTRS_PER_PGD; i++, pgd++, addr += PGDIR_SIZE) { if (!pgd_none(*pgd) && !pgd_huge(*pgd)) /* pgd exists */ walk_pud(st, pgd, addr); @@ -472,9 +322,14 @@ static int ptdump_show(struct seq_file *m, void *v) { struct pg_state st = { .seq = m, - .start_address = KERN_VIRT_START, .marker = address_markers, }; + + if (radix_enabled()) + st.start_address = PAGE_OFFSET; + else + st.start_address = KERN_VIRT_START; + /* Traverse kernel page tables */ walk_pagetables(&st); note_page(&st, 0, 0, 0); diff --git a/arch/powerpc/mm/dump_linuxpagetables.h b/arch/powerpc/mm/dump_linuxpagetables.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include <linux/types.h> + +struct flag_info { + u64 mask; + u64 val; + const char *set; + const char *clear; + bool is_val; + int shift; +}; + +struct pgtable_level { + const struct flag_info *flag; + size_t num; + u64 mask; +}; + +extern struct pgtable_level pg_level[5]; diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c @@ -115,6 +115,8 @@ static void tlbiel_all_isa300(unsigned int num_sets, unsigned int is) tlbiel_hash_set_isa300(0, is, 0, 2, 1); asm volatile("ptesync": : :"memory"); + + asm volatile(PPC_INVALIDATE_ERAT "; isync" : : :"memory"); } void hash__tlbiel_all(unsigned int action) @@ -140,8 +142,6 @@ void hash__tlbiel_all(unsigned int action) tlbiel_all_isa206(POWER7_TLB_SETS, is); else WARN(1, "%s called on pre-POWER7 CPU\n", __func__); - - asm volatile(PPC_INVALIDATE_ERAT "; isync" : : :"memory"); } static inline unsigned long ___tlbie(unsigned long vpn, int psize, diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c @@ -1001,9 +1001,9 @@ void __init hash__early_init_mmu(void) * 4k use hugepd format, so for hash set then to * zero */ - __pmd_val_bits = 0; - __pud_val_bits = 0; - __pgd_val_bits = 0; + __pmd_val_bits = HASH_PMD_VAL_BITS; + __pud_val_bits = HASH_PUD_VAL_BITS; + __pgd_val_bits = HASH_PGD_VAL_BITS; __kernel_virt_start = H_KERN_VIRT_START; __kernel_virt_size = H_KERN_VIRT_SIZE; @@ -1125,7 +1125,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr) if ((get_paca_psize(addr) != MMU_PAGE_4K) && (current->mm == mm)) { copy_mm_to_paca(mm); - slb_flush_and_rebolt(); + slb_flush_and_restore_bolted(); } } #endif /* CONFIG_PPC_64K_PAGES */ @@ -1197,7 +1197,7 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm, if (user_region) { if (psize != get_paca_psize(ea)) { copy_mm_to_paca(mm); - slb_flush_and_rebolt(); + slb_flush_and_restore_bolted(); } } else if (get_paca()->vmalloc_sllp != mmu_psize_defs[mmu_vmalloc_psize].sllp) { @@ -1482,7 +1482,7 @@ static bool should_hash_preload(struct mm_struct *mm, unsigned long ea) #endif void hash_preload(struct mm_struct *mm, unsigned long ea, - unsigned long access, unsigned long trap) + bool is_exec, unsigned long trap) { int hugepage_shift; unsigned long vsid; @@ -1490,6 +1490,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea, pte_t *ptep; unsigned long flags; int rc, ssize, update_flags = 0; + unsigned long access = _PAGE_PRESENT | _PAGE_READ | (is_exec ? _PAGE_EXEC : 0); BUG_ON(REGION_ID(ea) != USER_REGION_ID); diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c @@ -51,6 +51,12 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid, new_pmd |= _PAGE_DIRTY; } while (!pmd_xchg(pmdp, __pmd(old_pmd), __pmd(new_pmd))); + /* + * Make sure this is thp or devmap entry + */ + if (!(old_pmd & (H_PAGE_THP_HUGE | _PAGE_DEVMAP))) + return 0; + rflags = htab_convert_pte_flags(new_pmd); #if 0 diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c @@ -62,6 +62,10 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid, new_pte |= _PAGE_DIRTY; } while(!pte_xchg(ptep, __pte(old_pte), __pte(new_pte))); + /* Make sure this is a hugetlb entry */ + if (old_pte & (H_PAGE_THP_HUGE | _PAGE_DEVMAP)) + return 0; + rflags = htab_convert_pte_flags(new_pte); if (unlikely(mmu_psize == MMU_PAGE_16G)) offset = PTRS_PER_PUD; diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c @@ -19,6 +19,7 @@ #include <linux/moduleparam.h> #include <linux/swap.h> #include <linux/swapops.h> +#include <linux/kmemleak.h> #include <asm/pgtable.h> #include <asm/pgalloc.h> #include <asm/tlb.h> @@ -95,7 +96,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp, break; else { #ifdef CONFIG_PPC_BOOK3S_64 - *hpdp = __hugepd(__pa(new) | + *hpdp = __hugepd(__pa(new) | HUGEPD_VAL_BITS | (shift_to_mmu_psize(pshift) << 2)); #elif defined(CONFIG_PPC_8xx) *hpdp = __hugepd(__pa(new) | _PMD_USER | @@ -112,6 +113,8 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp, for (i = i - 1 ; i >= 0; i--, hpdp--) *hpdp = __hugepd(0); kmem_cache_free(cachep, new); + } else { + kmemleak_ignore(new); } spin_unlock(ptl); return 0; @@ -837,8 +840,12 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea, ret_pte = (pte_t *) pmdp; goto out; } - - if (pmd_huge(pmd)) { + /* + * pmd_large check below will handle the swap pmd pte + * we need to do both the check because they are config + * dependent. + */ + if (pmd_huge(pmd) || pmd_large(pmd)) { ret_pte = (pte_t *) pmdp; goto out; } else if (is_hugepd(__hugepd(pmd_val(pmd)))) diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c @@ -309,11 +309,11 @@ void __init paging_init(void) unsigned long end = __fix_to_virt(FIX_HOLE); for (; v < end; v += PAGE_SIZE) - map_kernel_page(v, 0, 0); /* XXX gross */ + map_kernel_page(v, 0, __pgprot(0)); /* XXX gross */ #endif #ifdef CONFIG_HIGHMEM - map_kernel_page(PKMAP_BASE, 0, 0); /* XXX gross */ + map_kernel_page(PKMAP_BASE, 0, __pgprot(0)); /* XXX gross */ pkmap_page_table = virt_to_kpte(PKMAP_BASE); kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN)); @@ -509,7 +509,8 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, * We don't need to worry about _PAGE_PRESENT here because we are * called with either mm->page_table_lock held or ptl lock held */ - unsigned long access, trap; + unsigned long trap; + bool is_exec; if (radix_enabled()) { prefetch((void *)address); @@ -531,16 +532,16 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, trap = current->thread.regs ? TRAP(current->thread.regs) : 0UL; switch (trap) { case 0x300: - access = 0UL; + is_exec = false; break; case 0x400: - access = _PAGE_EXEC; + is_exec = true; break; default: return; } - hash_preload(vma->vm_mm, address, access, trap); + hash_preload(vma->vm_mm, address, is_exec, trap); #endif /* CONFIG_PPC_STD_MMU */ #if (defined(CONFIG_PPC_BOOK3E_64) || defined(CONFIG_PPC_FSL_BOOK3E)) \ && defined(CONFIG_HUGETLB_PAGE) diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c @@ -53,6 +53,8 @@ int hash__alloc_context_id(void) } EXPORT_SYMBOL_GPL(hash__alloc_context_id); +void slb_setup_new_exec(void); + static int hash__init_new_context(struct mm_struct *mm) { int index; @@ -84,6 +86,13 @@ static int hash__init_new_context(struct mm_struct *mm) return index; } +void hash__setup_new_exec(void) +{ + slice_setup_new_exec(); + + slb_setup_new_exec(); +} + static int radix__init_new_context(struct mm_struct *mm) { unsigned long rts_field; diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h @@ -22,6 +22,7 @@ #include <asm/mmu.h> #ifdef CONFIG_PPC_MMU_NOHASH +#include <asm/trace.h> /* * On 40x and 8xx, we directly inline tlbia and tlbivax @@ -30,10 +31,12 @@ static inline void _tlbil_all(void) { asm volatile ("sync; tlbia; isync" : : : "memory"); + trace_tlbia(MMU_NO_CONTEXT); } static inline void _tlbil_pid(unsigned int pid) { asm volatile ("sync; tlbia; isync" : : : "memory"); + trace_tlbia(pid); } #define _tlbil_pid_noind(pid) _tlbil_pid(pid) @@ -55,6 +58,7 @@ static inline void _tlbil_va(unsigned long address, unsigned int pid, unsigned int tsize, unsigned int ind) { asm volatile ("tlbie %0; sync" : : "r" (address) : "memory"); + trace_tlbie(0, 0, address, pid, 0, 0, 0); } #elif defined(CONFIG_PPC_BOOK3E) extern void _tlbil_va(unsigned long address, unsigned int pid, @@ -82,7 +86,7 @@ static inline void _tlbivax_bcast(unsigned long address, unsigned int pid, #else /* CONFIG_PPC_MMU_NOHASH */ extern void hash_preload(struct mm_struct *mm, unsigned long ea, - unsigned long access, unsigned long trap); + bool is_exec, unsigned long trap); extern void _tlbie(unsigned long address); diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c @@ -1521,6 +1521,10 @@ int start_topology_update(void) } } + pr_info("Starting topology update%s%s\n", + (prrn_enabled ? " prrn_enabled" : ""), + (vphn_enabled ? " vphn_enabled" : "")); + return rc; } @@ -1542,6 +1546,8 @@ int stop_topology_update(void) rc = del_timer_sync(&topology_timer); } + pr_info("Stopping topology update\n"); + return rc; } diff --git a/arch/powerpc/mm/pgtable-book3e.c b/arch/powerpc/mm/pgtable-book3e.c @@ -42,7 +42,7 @@ int __meminit vmemmap_create_mapping(unsigned long start, * thus must have the low bits clear */ for (i = 0; i < page_size; i += PAGE_SIZE) - BUG_ON(map_kernel_page(start + i, phys, flags)); + BUG_ON(map_kernel_page(start + i, phys, __pgprot(flags))); return 0; } @@ -70,7 +70,7 @@ static __ref void *early_alloc_pgtable(unsigned long size) * map_kernel_page adds an entry to the ioremap page table * and adds an entry to the HPT, possibly bolting it */ -int map_kernel_page(unsigned long ea, unsigned long pa, unsigned long flags) +int map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot) { pgd_t *pgdp; pud_t *pudp; @@ -89,8 +89,6 @@ int map_kernel_page(unsigned long ea, unsigned long pa, unsigned long flags) ptep = pte_alloc_kernel(pmdp, ea); if (!ptep) return -ENOMEM; - set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, - __pgprot(flags))); } else { pgdp = pgd_offset_k(ea); #ifndef __PAGETABLE_PUD_FOLDED @@ -113,9 +111,8 @@ int map_kernel_page(unsigned long ea, unsigned long pa, unsigned long flags) pmd_populate_kernel(&init_mm, pmdp, ptep); } ptep = pte_offset_kernel(pmdp, ea); - set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, - __pgprot(flags))); } + set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot)); smp_wmb(); return 0; diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c @@ -69,9 +69,14 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pmd_t pmd) { #ifdef CONFIG_DEBUG_VM - WARN_ON(pte_present(pmd_pte(*pmdp)) && !pte_protnone(pmd_pte(*pmdp))); + /* + * Make sure hardware valid bit is not set. We don't do + * tlb flush for this update. + */ + + WARN_ON(pte_hw_valid(pmd_pte(*pmdp)) && !pte_protnone(pmd_pte(*pmdp))); assert_spin_locked(pmd_lockptr(mm, pmdp)); - WARN_ON(!(pmd_trans_huge(pmd) || pmd_devmap(pmd))); + WARN_ON(!(pmd_large(pmd) || pmd_devmap(pmd))); #endif trace_hugepage_set_pmd(addr, pmd_val(pmd)); return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd)); @@ -106,7 +111,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, { unsigned long old_pmd; - old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0); + old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID); flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE); /* * This ensures that generic code that rely on IRQ disabling diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c @@ -142,7 +142,7 @@ void hash__vmemmap_remove_mapping(unsigned long start, * map_kernel_page adds an entry to the ioremap page table * and adds an entry to the HPT, possibly bolting it */ -int hash__map_kernel_page(unsigned long ea, unsigned long pa, unsigned long flags) +int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot) { pgd_t *pgdp; pud_t *pudp; @@ -161,8 +161,7 @@ int hash__map_kernel_page(unsigned long ea, unsigned long pa, unsigned long flag ptep = pte_alloc_kernel(pmdp, ea); if (!ptep) return -ENOMEM; - set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, - __pgprot(flags))); + set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot)); } else { /* * If the mm subsystem is not fully up, we cannot create a @@ -170,7 +169,7 @@ int hash__map_kernel_page(unsigned long ea, unsigned long pa, unsigned long flag * entry in the hardware page table. * */ - if (htab_bolt_mapping(ea, ea + PAGE_SIZE, pa, flags, + if (htab_bolt_mapping(ea, ea + PAGE_SIZE, pa, pgprot_val(prot), mmu_io_psize, mmu_kernel_ssize)) { printk(KERN_ERR "Failed to do bolted mapping IO " "memory at %016lx !\n", pa); diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c @@ -241,9 +241,8 @@ void radix__mark_initmem_nx(void) } #endif /* CONFIG_STRICT_KERNEL_RWX */ -static inline void __meminit print_mapping(unsigned long start, - unsigned long end, - unsigned long size) +static inline void __meminit +print_mapping(unsigned long start, unsigned long end, unsigned long size, bool exec) { char buf[10]; @@ -252,7 +251,17 @@ static inline void __meminit print_mapping(unsigned long start, string_get_size(size, 1, STRING_UNITS_2, buf, sizeof(buf)); - pr_info("Mapped 0x%016lx-0x%016lx with %s pages\n", start, end, buf); + pr_info("Mapped 0x%016lx-0x%016lx with %s pages%s\n", start, end, buf, + exec ? " (exec)" : ""); +} + +static unsigned long next_boundary(unsigned long addr, unsigned long end) +{ +#ifdef CONFIG_STRICT_KERNEL_RWX + if (addr < __pa_symbol(__init_begin)) + return __pa_symbol(__init_begin); +#endif + return end; } static int __meminit create_physical_mapping(unsigned long start, @@ -260,13 +269,8 @@ static int __meminit create_physical_mapping(unsigned long start, int nid) { unsigned long vaddr, addr, mapping_size = 0; + bool prev_exec, exec = false; pgprot_t prot; - unsigned long max_mapping_size; -#ifdef CONFIG_STRICT_KERNEL_RWX - int split_text_mapping = 1; -#else - int split_text_mapping = 0; -#endif int psize; start = _ALIGN_UP(start, PAGE_SIZE); @@ -274,14 +278,12 @@ static int __meminit create_physical_mapping(unsigned long start, unsigned long gap, previous_size; int rc; - gap = end - addr; + gap = next_boundary(addr, end) - addr; previous_size = mapping_size; - max_mapping_size = PUD_SIZE; + prev_exec = exec; -retry: if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE && - mmu_psize_defs[MMU_PAGE_1G].shift && - PUD_SIZE <= max_mapping_size) { + mmu_psize_defs[MMU_PAGE_1G].shift) { mapping_size = PUD_SIZE; psize = MMU_PAGE_1G; } else if (IS_ALIGNED(addr, PMD_SIZE) && gap >= PMD_SIZE && @@ -293,32 +295,21 @@ retry: psize = mmu_virtual_psize; } - if (split_text_mapping && (mapping_size == PUD_SIZE) && - (addr <= __pa_symbol(__init_begin)) && - (addr + mapping_size) >= __pa_symbol(_stext)) { - max_mapping_size = PMD_SIZE; - goto retry; - } - - if (split_text_mapping && (mapping_size == PMD_SIZE) && - (addr <= __pa_symbol(__init_begin)) && - (addr + mapping_size) >= __pa_symbol(_stext)) { - mapping_size = PAGE_SIZE; - psize = mmu_virtual_psize; - } - - if (mapping_size != previous_size) { - print_mapping(start, addr, previous_size); - start = addr; - } - vaddr = (unsigned long)__va(addr); if (overlaps_kernel_text(vaddr, vaddr + mapping_size) || - overlaps_interrupt_vector_text(vaddr, vaddr + mapping_size)) + overlaps_interrupt_vector_text(vaddr, vaddr + mapping_size)) { prot = PAGE_KERNEL_X; - else + exec = true; + } else { prot = PAGE_KERNEL; + exec = false; + } + + if (mapping_size != previous_size || exec != prev_exec) { + print_mapping(start, addr, previous_size, prev_exec); + start = addr; + } rc = __map_kernel_page(vaddr, addr, prot, mapping_size, nid, start, end); if (rc) @@ -327,7 +318,7 @@ retry: update_page_count(psize, 1); } - print_mapping(start, addr, mapping_size); + print_mapping(start, addr, mapping_size, exec); return 0; } diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c @@ -44,20 +44,13 @@ static inline int is_exec_fault(void) static inline int pte_looks_normal(pte_t pte) { -#if defined(CONFIG_PPC_BOOK3S_64) - if ((pte_val(pte) & (_PAGE_PRESENT | _PAGE_SPECIAL)) == _PAGE_PRESENT) { + if (pte_present(pte) && !pte_special(pte)) { if (pte_ci(pte)) return 0; if (pte_user(pte)) return 1; } return 0; -#else - return (pte_val(pte) & - (_PAGE_PRESENT | _PAGE_SPECIAL | _PAGE_NO_CACHE | _PAGE_USER | - _PAGE_PRIVILEGED)) == - (_PAGE_PRESENT | _PAGE_USER); -#endif } static struct page *maybe_pte_to_page(pte_t pte) @@ -73,7 +66,7 @@ static struct page *maybe_pte_to_page(pte_t pte) return page; } -#if defined(CONFIG_PPC_STD_MMU) || _PAGE_EXEC == 0 +#ifdef CONFIG_PPC_BOOK3S /* Server-style MMU handles coherency when hashing if HW exec permission * is supposed per page (currently 64-bit only). If not, then, we always @@ -106,7 +99,7 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma, return pte; } -#else /* defined(CONFIG_PPC_STD_MMU) || _PAGE_EXEC == 0 */ +#else /* CONFIG_PPC_BOOK3S */ /* Embedded type MMU with HW exec support. This is a bit more complicated * as we don't have two bits to spare for _PAGE_EXEC and _PAGE_HWEXEC so @@ -117,7 +110,7 @@ static pte_t set_pte_filter(pte_t pte) struct page *pg; /* No exec permission in the first place, move on */ - if (!(pte_val(pte) & _PAGE_EXEC) || !pte_looks_normal(pte)) + if (!pte_exec(pte) || !pte_looks_normal(pte)) return pte; /* If you set _PAGE_EXEC on weird pages you're on your own */ @@ -137,7 +130,7 @@ static pte_t set_pte_filter(pte_t pte) } /* Else, we filter out _PAGE_EXEC */ - return __pte(pte_val(pte) & ~_PAGE_EXEC); + return pte_exprotect(pte); } static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma, @@ -150,7 +143,7 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma, * if necessary. Also if _PAGE_EXEC is already set, same deal, * we just bail out */ - if (dirty || (pte_val(pte) & _PAGE_EXEC) || !is_exec_fault()) + if (dirty || pte_exec(pte) || !is_exec_fault()) return pte; #ifdef CONFIG_DEBUG_VM @@ -176,10 +169,10 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma, set_bit(PG_arch_1, &pg->flags); bail: - return __pte(pte_val(pte) | _PAGE_EXEC); + return pte_mkexec(pte); } -#endif /* !(defined(CONFIG_PPC_STD_MMU) || _PAGE_EXEC == 0) */ +#endif /* CONFIG_PPC_BOOK3S */ /* * set_pte stores a linux PTE into the linux page table. @@ -188,14 +181,13 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) { /* - * When handling numa faults, we already have the pte marked - * _PAGE_PRESENT, but we can be sure that it is not in hpte. - * Hence we can use set_pte_at for them. + * Make sure hardware valid bit is not set. We don't do + * tlb flush for this update. */ - VM_WARN_ON(pte_present(*ptep) && !pte_protnone(*ptep)); + VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); /* Add the pte bit when trying to set a pte */ - pte = __pte(pte_val(pte) | _PAGE_PTE); + pte = pte_mkpte(pte); /* Note: mm->context.id might not yet have been assigned as * this context might not have been activated yet when this diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c @@ -76,56 +76,69 @@ pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address) void __iomem * ioremap(phys_addr_t addr, unsigned long size) { - return __ioremap_caller(addr, size, _PAGE_NO_CACHE | _PAGE_GUARDED, - __builtin_return_address(0)); + pgprot_t prot = pgprot_noncached(PAGE_KERNEL); + + return __ioremap_caller(addr, size, prot, __builtin_return_address(0)); } EXPORT_SYMBOL(ioremap); void __iomem * ioremap_wc(phys_addr_t addr, unsigned long size) { - return __ioremap_caller(addr, size, _PAGE_NO_CACHE, - __builtin_return_address(0)); + pgprot_t prot = pgprot_noncached_wc(PAGE_KERNEL); + + return __ioremap_caller(addr, size, prot, __builtin_return_address(0)); } EXPORT_SYMBOL(ioremap_wc); void __iomem * +ioremap_wt(phys_addr_t addr, unsigned long size) +{ + pgprot_t prot = pgprot_cached_wthru(PAGE_KERNEL); + + return __ioremap_caller(addr, size, prot, __builtin_return_address(0)); +} +EXPORT_SYMBOL(ioremap_wt); + +void __iomem * +ioremap_coherent(phys_addr_t addr, unsigned long size) +{ + pgprot_t prot = pgprot_cached(PAGE_KERNEL); + + return __ioremap_caller(addr, size, prot, __builtin_return_address(0)); +} +EXPORT_SYMBOL(ioremap_coherent); + +void __iomem * ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags) { + pte_t pte = __pte(flags); + /* writeable implies dirty for kernel addresses */ - if ((flags & (_PAGE_RW | _PAGE_RO)) != _PAGE_RO) - flags |= _PAGE_DIRTY | _PAGE_HWWRITE; + if (pte_write(pte)) + pte = pte_mkdirty(pte); /* we don't want to let _PAGE_USER and _PAGE_EXEC leak out */ - flags &= ~(_PAGE_USER | _PAGE_EXEC); - flags |= _PAGE_PRIVILEGED; + pte = pte_exprotect(pte); + pte = pte_mkprivileged(pte); - return __ioremap_caller(addr, size, flags, __builtin_return_address(0)); + return __ioremap_caller(addr, size, pte_pgprot(pte), __builtin_return_address(0)); } EXPORT_SYMBOL(ioremap_prot); void __iomem * __ioremap(phys_addr_t addr, unsigned long size, unsigned long flags) { - return __ioremap_caller(addr, size, flags, __builtin_return_address(0)); + return __ioremap_caller(addr, size, __pgprot(flags), __builtin_return_address(0)); } void __iomem * -__ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags, - void *caller) +__ioremap_caller(phys_addr_t addr, unsigned long size, pgprot_t prot, void *caller) { unsigned long v, i; phys_addr_t p; int err; - /* Make sure we have the base flags */ - if ((flags & _PAGE_PRESENT) == 0) - flags |= pgprot_val(PAGE_KERNEL); - - /* Non-cacheable page cannot be coherent */ - if (flags & _PAGE_NO_CACHE) - flags &= ~_PAGE_COHERENT; - /* * Choose an address to map it to. * Once the vmalloc system is running, we use it. @@ -183,7 +196,7 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags, err = 0; for (i = 0; i < size && err == 0; i += PAGE_SIZE) - err = map_kernel_page(v+i, p+i, flags); + err = map_kernel_page(v + i, p + i, prot); if (err) { if (slab_is_available()) vunmap((void *)v); @@ -209,7 +222,7 @@ void iounmap(volatile void __iomem *addr) } EXPORT_SYMBOL(iounmap); -int map_kernel_page(unsigned long va, phys_addr_t pa, int flags) +int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot) { pmd_t *pd; pte_t *pg; @@ -224,10 +237,8 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, int flags) /* The PTE should never be already set nor present in the * hash table */ - BUG_ON((pte_val(*pg) & (_PAGE_PRESENT | _PAGE_HASHPTE)) && - flags); - set_pte_at(&init_mm, va, pg, pfn_pte(pa >> PAGE_SHIFT, - __pgprot(flags))); + BUG_ON((pte_present(*pg) | pte_hashpte(*pg)) && pgprot_val(prot)); + set_pte_at(&init_mm, va, pg, pfn_pte(pa >> PAGE_SHIFT, prot)); } smp_wmb(); return err; @@ -238,7 +249,7 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, int flags) */ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top) { - unsigned long v, s, f; + unsigned long v, s; phys_addr_t p; int ktext; @@ -248,11 +259,10 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top) for (; s < top; s += PAGE_SIZE) { ktext = ((char *)v >= _stext && (char *)v < etext) || ((char *)v >= _sinittext && (char *)v < _einittext); - f = ktext ? pgprot_val(PAGE_KERNEL_TEXT) : pgprot_val(PAGE_KERNEL); - map_kernel_page(v, p, f); + map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL); #ifdef CONFIG_PPC_STD_MMU_32 if (ktext) - hash_preload(&init_mm, v, 0, 0x300); + hash_preload(&init_mm, v, false, 0x300); #endif v += PAGE_SIZE; p += PAGE_SIZE; diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c @@ -113,17 +113,12 @@ unsigned long ioremap_bot = IOREMAP_BASE; * __ioremap_at - Low level function to establish the page tables * for an IO mapping */ -void __iomem * __ioremap_at(phys_addr_t pa, void *ea, unsigned long size, - unsigned long flags) +void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_t prot) { unsigned long i; - /* Make sure we have the base flags */ - if ((flags & _PAGE_PRESENT) == 0) - flags |= pgprot_val(PAGE_KERNEL); - /* We don't support the 4K PFN hack with ioremap */ - if (flags & H_PAGE_4K_PFN) + if (pgprot_val(prot) & H_PAGE_4K_PFN) return NULL; WARN_ON(pa & ~PAGE_MASK); @@ -131,7 +126,7 @@ void __iomem * __ioremap_at(phys_addr_t pa, void *ea, unsigned long size, WARN_ON(size & ~PAGE_MASK); for (i = 0; i < size; i += PAGE_SIZE) - if (map_kernel_page((unsigned long)ea+i, pa+i, flags)) + if (map_kernel_page((unsigned long)ea + i, pa + i, prot)) return NULL; return (void __iomem *)ea; @@ -152,7 +147,7 @@ void __iounmap_at(void *ea, unsigned long size) } void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size, - unsigned long flags, void *caller) + pgprot_t prot, void *caller) { phys_addr_t paligned; void __iomem *ret; @@ -182,11 +177,11 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size, return NULL; area->phys_addr = paligned; - ret = __ioremap_at(paligned, area->addr, size, flags); + ret = __ioremap_at(paligned, area->addr, size, prot); if (!ret) vunmap(area->addr); } else { - ret = __ioremap_at(paligned, (void *)ioremap_bot, size, flags); + ret = __ioremap_at(paligned, (void *)ioremap_bot, size, prot); if (ret) ioremap_bot += size; } @@ -199,49 +194,59 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size, void __iomem * __ioremap(phys_addr_t addr, unsigned long size, unsigned long flags) { - return __ioremap_caller(addr, size, flags, __builtin_return_address(0)); + return __ioremap_caller(addr, size, __pgprot(flags), __builtin_return_address(0)); } void __iomem * ioremap(phys_addr_t addr, unsigned long size) { - unsigned long flags = pgprot_val(pgprot_noncached(__pgprot(0))); + pgprot_t prot = pgprot_noncached(PAGE_KERNEL); void *caller = __builtin_return_address(0); if (ppc_md.ioremap) - return ppc_md.ioremap(addr, size, flags, caller); - return __ioremap_caller(addr, size, flags, caller); + return ppc_md.ioremap(addr, size, prot, caller); + return __ioremap_caller(addr, size, prot, caller); } void __iomem * ioremap_wc(phys_addr_t addr, unsigned long size) { - unsigned long flags = pgprot_val(pgprot_noncached_wc(__pgprot(0))); + pgprot_t prot = pgprot_noncached_wc(PAGE_KERNEL); + void *caller = __builtin_return_address(0); + + if (ppc_md.ioremap) + return ppc_md.ioremap(addr, size, prot, caller); + return __ioremap_caller(addr, size, prot, caller); +} + +void __iomem *ioremap_coherent(phys_addr_t addr, unsigned long size) +{ + pgprot_t prot = pgprot_cached(PAGE_KERNEL); void *caller = __builtin_return_address(0); if (ppc_md.ioremap) - return ppc_md.ioremap(addr, size, flags, caller); - return __ioremap_caller(addr, size, flags, caller); + return ppc_md.ioremap(addr, size, prot, caller); + return __ioremap_caller(addr, size, prot, caller); } void __iomem * ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags) { + pte_t pte = __pte(flags); void *caller = __builtin_return_address(0); /* writeable implies dirty for kernel addresses */ - if (flags & _PAGE_WRITE) - flags |= _PAGE_DIRTY; + if (pte_write(pte)) + pte = pte_mkdirty(pte); /* we don't want to let _PAGE_EXEC leak out */ - flags &= ~_PAGE_EXEC; + pte = pte_exprotect(pte); /* * Force kernel mapping. */ - flags &= ~_PAGE_USER; - flags |= _PAGE_PRIVILEGED; + pte = pte_mkprivileged(pte); if (ppc_md.ioremap) - return ppc_md.ioremap(addr, size, flags, caller); - return __ioremap_caller(addr, size, flags, caller); + return ppc_md.ioremap(addr, size, pte_pgprot(pte), caller); + return __ioremap_caller(addr, size, pte_pgprot(pte), caller); } @@ -306,7 +311,7 @@ struct page *pud_page(pud_t pud) */ struct page *pmd_page(pmd_t pmd) { - if (pmd_trans_huge(pmd) || pmd_huge(pmd) || pmd_devmap(pmd)) + if (pmd_large(pmd) || pmd_huge(pmd) || pmd_devmap(pmd)) return pte_page(pmd_pte(pmd)); return virt_to_page(pmd_page_vaddr(pmd)); } diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c @@ -163,7 +163,7 @@ void __init setbat(int index, unsigned long virt, phys_addr_t phys, * Preload a translation in the hash table */ void hash_preload(struct mm_struct *mm, unsigned long ea, - unsigned long access, unsigned long trap) + bool is_exec, unsigned long trap) { pmd_t *pmd; diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c @@ -14,6 +14,7 @@ * 2 of the License, or (at your option) any later version. */ +#include <asm/asm-prototypes.h> #include <asm/pgtable.h> #include <asm/mmu.h> #include <asm/mmu_context.h> @@ -30,11 +31,10 @@ enum slb_index { LINEAR_INDEX = 0, /* Kernel linear map (0xc000000000000000) */ - VMALLOC_INDEX = 1, /* Kernel virtual map (0xd000000000000000) */ - KSTACK_INDEX = 2, /* Kernel stack map */ + KSTACK_INDEX = 1, /* Kernel stack map */ }; -extern void slb_allocate(unsigned long ea); +static long slb_allocate_user(struct mm_struct *mm, unsigned long ea); #define slb_esid_mask(ssize) \ (((ssize) == MMU_SEGSIZE_256M)? ESID_MASK: ESID_MASK_1T) @@ -45,13 +45,43 @@ static inline unsigned long mk_esid_data(unsigned long ea, int ssize, return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | index; } -static inline unsigned long mk_vsid_data(unsigned long ea, int ssize, +static inline unsigned long __mk_vsid_data(unsigned long vsid, int ssize, unsigned long flags) { - return (get_kernel_vsid(ea, ssize) << slb_vsid_shift(ssize)) | flags | + return (vsid << slb_vsid_shift(ssize)) | flags | ((unsigned long) ssize << SLB_VSID_SSIZE_SHIFT); } +static inline unsigned long mk_vsid_data(unsigned long ea, int ssize, + unsigned long flags) +{ + return __mk_vsid_data(get_kernel_vsid(ea, ssize), ssize, flags); +} + +static void assert_slb_exists(unsigned long ea) +{ +#ifdef CONFIG_DEBUG_VM + unsigned long tmp; + + WARN_ON_ONCE(mfmsr() & MSR_EE); + + asm volatile("slbfee. %0, %1" : "=r"(tmp) : "r"(ea) : "cr0"); + WARN_ON(tmp == 0); +#endif +} + +static void assert_slb_notexists(unsigned long ea) +{ +#ifdef CONFIG_DEBUG_VM + unsigned long tmp; + + WARN_ON_ONCE(mfmsr() & MSR_EE); + + asm volatile("slbfee. %0, %1" : "=r"(tmp) : "r"(ea) : "cr0"); + WARN_ON(tmp != 0); +#endif +} + static inline void slb_shadow_update(unsigned long ea, int ssize, unsigned long flags, enum slb_index index) @@ -84,6 +114,7 @@ static inline void create_shadowed_slbe(unsigned long ea, int ssize, */ slb_shadow_update(ea, ssize, flags, index); + assert_slb_notexists(ea); asm volatile("slbmte %0,%1" : : "r" (mk_vsid_data(ea, ssize, flags)), "r" (mk_esid_data(ea, ssize, index)) @@ -105,17 +136,20 @@ void __slb_restore_bolted_realmode(void) : "r" (be64_to_cpu(p->save_area[index].vsid)), "r" (be64_to_cpu(p->save_area[index].esid))); } + + assert_slb_exists(local_paca->kstack); } /* * Insert the bolted entries into an empty SLB. - * This is not the same as rebolt because the bolted segments are not - * changed, just loaded from the shadow area. */ void slb_restore_bolted_realmode(void) { __slb_restore_bolted_realmode(); get_paca()->slb_cache_ptr = 0; + + get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1; + get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap; } /* @@ -123,113 +157,262 @@ void slb_restore_bolted_realmode(void) */ void slb_flush_all_realmode(void) { - /* - * This flushes all SLB entries including 0, so it must be realmode. - */ asm volatile("slbmte %0,%0; slbia" : : "r" (0)); } -static void __slb_flush_and_rebolt(void) +/* + * This flushes non-bolted entries, it can be run in virtual mode. Must + * be called with interrupts disabled. + */ +void slb_flush_and_restore_bolted(void) { - /* If you change this make sure you change SLB_NUM_BOLTED - * and PR KVM appropriately too. */ - unsigned long linear_llp, vmalloc_llp, lflags, vflags; - unsigned long ksp_esid_data, ksp_vsid_data; + struct slb_shadow *p = get_slb_shadow(); - linear_llp = mmu_psize_defs[mmu_linear_psize].sllp; - vmalloc_llp = mmu_psize_defs[mmu_vmalloc_psize].sllp; - lflags = SLB_VSID_KERNEL | linear_llp; - vflags = SLB_VSID_KERNEL | vmalloc_llp; + BUILD_BUG_ON(SLB_NUM_BOLTED != 2); - ksp_esid_data = mk_esid_data(get_paca()->kstack, mmu_kernel_ssize, KSTACK_INDEX); - if ((ksp_esid_data & ~0xfffffffUL) <= PAGE_OFFSET) { - ksp_esid_data &= ~SLB_ESID_V; - ksp_vsid_data = 0; - slb_shadow_clear(KSTACK_INDEX); - } else { - /* Update stack entry; others don't change */ - slb_shadow_update(get_paca()->kstack, mmu_kernel_ssize, lflags, KSTACK_INDEX); - ksp_vsid_data = - be64_to_cpu(get_slb_shadow()->save_area[KSTACK_INDEX].vsid); - } + WARN_ON(!irqs_disabled()); + + /* + * We can't take a PMU exception in the following code, so hard + * disable interrupts. + */ + hard_irq_disable(); - /* We need to do this all in asm, so we're sure we don't touch - * the stack between the slbia and rebolting it. */ asm volatile("isync\n" "slbia\n" - /* Slot 1 - first VMALLOC segment */ - "slbmte %0,%1\n" - /* Slot 2 - kernel stack */ - "slbmte %2,%3\n" - "isync" - :: "r"(mk_vsid_data(VMALLOC_START, mmu_kernel_ssize, vflags)), - "r"(mk_esid_data(VMALLOC_START, mmu_kernel_ssize, VMALLOC_INDEX)), - "r"(ksp_vsid_data), - "r"(ksp_esid_data) + "slbmte %0, %1\n" + "isync\n" + :: "r" (be64_to_cpu(p->save_area[KSTACK_INDEX].vsid)), + "r" (be64_to_cpu(p->save_area[KSTACK_INDEX].esid)) : "memory"); + assert_slb_exists(get_paca()->kstack); + + get_paca()->slb_cache_ptr = 0; + + get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1; + get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap; } -void slb_flush_and_rebolt(void) +void slb_save_contents(struct slb_entry *slb_ptr) { + int i; + unsigned long e, v; - WARN_ON(!irqs_disabled()); + /* Save slb_cache_ptr value. */ + get_paca()->slb_save_cache_ptr = get_paca()->slb_cache_ptr; + + if (!slb_ptr) + return; + + for (i = 0; i < mmu_slb_size; i++) { + asm volatile("slbmfee %0,%1" : "=r" (e) : "r" (i)); + asm volatile("slbmfev %0,%1" : "=r" (v) : "r" (i)); + slb_ptr->esid = e; + slb_ptr->vsid = v; + slb_ptr++; + } +} + +void slb_dump_contents(struct slb_entry *slb_ptr) +{ + int i, n; + unsigned long e, v; + unsigned long llp; + + if (!slb_ptr) + return; + + pr_err("SLB contents of cpu 0x%x\n", smp_processor_id()); + pr_err("Last SLB entry inserted at slot %d\n", get_paca()->stab_rr); + + for (i = 0; i < mmu_slb_size; i++) { + e = slb_ptr->esid; + v = slb_ptr->vsid; + slb_ptr++; + + if (!e && !v) + continue; + + pr_err("%02d %016lx %016lx\n", i, e, v); + + if (!(e & SLB_ESID_V)) { + pr_err("\n"); + continue; + } + llp = v & SLB_VSID_LLP; + if (v & SLB_VSID_B_1T) { + pr_err(" 1T ESID=%9lx VSID=%13lx LLP:%3lx\n", + GET_ESID_1T(e), + (v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T, llp); + } else { + pr_err(" 256M ESID=%9lx VSID=%13lx LLP:%3lx\n", + GET_ESID(e), + (v & ~SLB_VSID_B) >> SLB_VSID_SHIFT, llp); + } + } + pr_err("----------------------------------\n"); + + /* Dump slb cache entires as well. */ + pr_err("SLB cache ptr value = %d\n", get_paca()->slb_save_cache_ptr); + pr_err("Valid SLB cache entries:\n"); + n = min_t(int, get_paca()->slb_save_cache_ptr, SLB_CACHE_ENTRIES); + for (i = 0; i < n; i++) + pr_err("%02d EA[0-35]=%9x\n", i, get_paca()->slb_cache[i]); + pr_err("Rest of SLB cache entries:\n"); + for (i = n; i < SLB_CACHE_ENTRIES; i++) + pr_err("%02d EA[0-35]=%9x\n", i, get_paca()->slb_cache[i]); +} +void slb_vmalloc_update(void) +{ /* - * We can't take a PMU exception in the following code, so hard - * disable interrupts. + * vmalloc is not bolted, so just have to flush non-bolted. */ - hard_irq_disable(); + slb_flush_and_restore_bolted(); +} - __slb_flush_and_rebolt(); - get_paca()->slb_cache_ptr = 0; +static bool preload_hit(struct thread_info *ti, unsigned long esid) +{ + unsigned char i; + + for (i = 0; i < ti->slb_preload_nr; i++) { + unsigned char idx; + + idx = (ti->slb_preload_tail + i) % SLB_PRELOAD_NR; + if (esid == ti->slb_preload_esid[idx]) + return true; + } + return false; } -void slb_vmalloc_update(void) +static bool preload_add(struct thread_info *ti, unsigned long ea) { - unsigned long vflags; + unsigned char idx; + unsigned long esid; + + if (mmu_has_feature(MMU_FTR_1T_SEGMENT)) { + /* EAs are stored >> 28 so 256MB segments don't need clearing */ + if (ea & ESID_MASK_1T) + ea &= ESID_MASK_1T; + } + + esid = ea >> SID_SHIFT; - vflags = SLB_VSID_KERNEL | mmu_psize_defs[mmu_vmalloc_psize].sllp; - slb_shadow_update(VMALLOC_START, mmu_kernel_ssize, vflags, VMALLOC_INDEX); - slb_flush_and_rebolt(); + if (preload_hit(ti, esid)) + return false; + + idx = (ti->slb_preload_tail + ti->slb_preload_nr) % SLB_PRELOAD_NR; + ti->slb_preload_esid[idx] = esid; + if (ti->slb_preload_nr == SLB_PRELOAD_NR) + ti->slb_preload_tail = (ti->slb_preload_tail + 1) % SLB_PRELOAD_NR; + else + ti->slb_preload_nr++; + + return true; } -/* Helper function to compare esids. There are four cases to handle. - * 1. The system is not 1T segment size capable. Use the GET_ESID compare. - * 2. The system is 1T capable, both addresses are < 1T, use the GET_ESID compare. - * 3. The system is 1T capable, only one of the two addresses is > 1T. This is not a match. - * 4. The system is 1T capable, both addresses are > 1T, use the GET_ESID_1T macro to compare. - */ -static inline int esids_match(unsigned long addr1, unsigned long addr2) +static void preload_age(struct thread_info *ti) { - int esid_1t_count; + if (!ti->slb_preload_nr) + return; + ti->slb_preload_nr--; + ti->slb_preload_tail = (ti->slb_preload_tail + 1) % SLB_PRELOAD_NR; +} - /* System is not 1T segment size capable. */ - if (!mmu_has_feature(MMU_FTR_1T_SEGMENT)) - return (GET_ESID(addr1) == GET_ESID(addr2)); +void slb_setup_new_exec(void) +{ + struct thread_info *ti = current_thread_info(); + struct mm_struct *mm = current->mm; + unsigned long exec = 0x10000000; - esid_1t_count = (((addr1 >> SID_SHIFT_1T) != 0) + - ((addr2 >> SID_SHIFT_1T) != 0)); + WARN_ON(irqs_disabled()); - /* both addresses are < 1T */ - if (esid_1t_count == 0) - return (GET_ESID(addr1) == GET_ESID(addr2)); + /* + * preload cache can only be used to determine whether a SLB + * entry exists if it does not start to overflow. + */ + if (ti->slb_preload_nr + 2 > SLB_PRELOAD_NR) + return; - /* One address < 1T, the other > 1T. Not a match */ - if (esid_1t_count == 1) - return 0; + hard_irq_disable(); - /* Both addresses are > 1T. */ - return (GET_ESID_1T(addr1) == GET_ESID_1T(addr2)); + /* + * We have no good place to clear the slb preload cache on exec, + * flush_thread is about the earliest arch hook but that happens + * after we switch to the mm and have aleady preloaded the SLBEs. + * + * For the most part that's probably okay to use entries from the + * previous exec, they will age out if unused. It may turn out to + * be an advantage to clear the cache before switching to it, + * however. + */ + + /* + * preload some userspace segments into the SLB. + * Almost all 32 and 64bit PowerPC executables are linked at + * 0x10000000 so it makes sense to preload this segment. + */ + if (!is_kernel_addr(exec)) { + if (preload_add(ti, exec)) + slb_allocate_user(mm, exec); + } + + /* Libraries and mmaps. */ + if (!is_kernel_addr(mm->mmap_base)) { + if (preload_add(ti, mm->mmap_base)) + slb_allocate_user(mm, mm->mmap_base); + } + + /* see switch_slb */ + asm volatile("isync" : : : "memory"); + + local_irq_enable(); } +void preload_new_slb_context(unsigned long start, unsigned long sp) +{ + struct thread_info *ti = current_thread_info(); + struct mm_struct *mm = current->mm; + unsigned long heap = mm->start_brk; + + WARN_ON(irqs_disabled()); + + /* see above */ + if (ti->slb_preload_nr + 3 > SLB_PRELOAD_NR) + return; + + hard_irq_disable(); + + /* Userspace entry address. */ + if (!is_kernel_addr(start)) { + if (preload_add(ti, start)) + slb_allocate_user(mm, start); + } + + /* Top of stack, grows down. */ + if (!is_kernel_addr(sp)) { + if (preload_add(ti, sp)) + slb_allocate_user(mm, sp); + } + + /* Bottom of heap, grows up. */ + if (heap && !is_kernel_addr(heap)) { + if (preload_add(ti, heap)) + slb_allocate_user(mm, heap); + } + + /* see switch_slb */ + asm volatile("isync" : : : "memory"); + + local_irq_enable(); +} + + /* Flush all user entries from the segment table of the current processor. */ void switch_slb(struct task_struct *tsk, struct mm_struct *mm) { - unsigned long offset; - unsigned long slbie_data = 0; - unsigned long pc = KSTK_EIP(tsk); - unsigned long stack = KSTK_ESP(tsk); - unsigned long exec_base; + struct thread_info *ti = task_thread_info(tsk); + unsigned char i; /* * We need interrupts hard-disabled here, not just soft-disabled, @@ -238,91 +421,107 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm) * which would update the slb_cache/slb_cache_ptr fields in the PACA. */ hard_irq_disable(); - offset = get_paca()->slb_cache_ptr; - if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) && - offset <= SLB_CACHE_ENTRIES) { - int i; - asm volatile("isync" : : : "memory"); - for (i = 0; i < offset; i++) { - slbie_data = (unsigned long)get_paca()->slb_cache[i] - << SID_SHIFT; /* EA */ - slbie_data |= user_segment_size(slbie_data) - << SLBIE_SSIZE_SHIFT; - slbie_data |= SLBIE_C; /* C set for user addresses */ - asm volatile("slbie %0" : : "r" (slbie_data)); - } - asm volatile("isync" : : : "memory"); + asm volatile("isync" : : : "memory"); + if (cpu_has_feature(CPU_FTR_ARCH_300)) { + /* + * SLBIA IH=3 invalidates all Class=1 SLBEs and their + * associated lookaside structures, which matches what + * switch_slb wants. So ARCH_300 does not use the slb + * cache. + */ + asm volatile(PPC_SLBIA(3)); } else { - __slb_flush_and_rebolt(); - } + unsigned long offset = get_paca()->slb_cache_ptr; + + if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) && + offset <= SLB_CACHE_ENTRIES) { + unsigned long slbie_data = 0; + + for (i = 0; i < offset; i++) { + unsigned long ea; + + ea = (unsigned long) + get_paca()->slb_cache[i] << SID_SHIFT; + /* + * Could assert_slb_exists here, but hypervisor + * or machine check could have come in and + * removed the entry at this point. + */ + + slbie_data = ea; + slbie_data |= user_segment_size(slbie_data) + << SLBIE_SSIZE_SHIFT; + slbie_data |= SLBIE_C; /* user slbs have C=1 */ + asm volatile("slbie %0" : : "r" (slbie_data)); + } + + /* Workaround POWER5 < DD2.1 issue */ + if (!cpu_has_feature(CPU_FTR_ARCH_207S) && offset == 1) + asm volatile("slbie %0" : : "r" (slbie_data)); + + } else { + struct slb_shadow *p = get_slb_shadow(); + unsigned long ksp_esid_data = + be64_to_cpu(p->save_area[KSTACK_INDEX].esid); + unsigned long ksp_vsid_data = + be64_to_cpu(p->save_area[KSTACK_INDEX].vsid); + + asm volatile(PPC_SLBIA(1) "\n" + "slbmte %0,%1\n" + "isync" + :: "r"(ksp_vsid_data), + "r"(ksp_esid_data)); + + get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1; + } - /* Workaround POWER5 < DD2.1 issue */ - if (offset == 1 || offset > SLB_CACHE_ENTRIES) - asm volatile("slbie %0" : : "r" (slbie_data)); + get_paca()->slb_cache_ptr = 0; + } + get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap; - get_paca()->slb_cache_ptr = 0; copy_mm_to_paca(mm); /* - * preload some userspace segments into the SLB. - * Almost all 32 and 64bit PowerPC executables are linked at - * 0x10000000 so it makes sense to preload this segment. + * We gradually age out SLBs after a number of context switches to + * reduce reload overhead of unused entries (like we do with FP/VEC + * reload). Each time we wrap 256 switches, take an entry out of the + * SLB preload cache. */ - exec_base = 0x10000000; - - if (is_kernel_addr(pc) || is_kernel_addr(stack) || - is_kernel_addr(exec_base)) - return; + tsk->thread.load_slb++; + if (!tsk->thread.load_slb) { + unsigned long pc = KSTK_EIP(tsk); - slb_allocate(pc); + preload_age(ti); + preload_add(ti, pc); + } - if (!esids_match(pc, stack)) - slb_allocate(stack); + for (i = 0; i < ti->slb_preload_nr; i++) { + unsigned char idx; + unsigned long ea; - if (!esids_match(pc, exec_base) && - !esids_match(stack, exec_base)) - slb_allocate(exec_base); -} + idx = (ti->slb_preload_tail + i) % SLB_PRELOAD_NR; + ea = (unsigned long)ti->slb_preload_esid[idx] << SID_SHIFT; -static inline void patch_slb_encoding(unsigned int *insn_addr, - unsigned int immed) -{ + slb_allocate_user(mm, ea); + } /* - * This function patches either an li or a cmpldi instruction with - * a new immediate value. This relies on the fact that both li - * (which is actually addi) and cmpldi both take a 16-bit immediate - * value, and it is situated in the same location in the instruction, - * ie. bits 16-31 (Big endian bit order) or the lower 16 bits. - * The signedness of the immediate operand differs between the two - * instructions however this code is only ever patching a small value, - * much less than 1 << 15, so we can get away with it. - * To patch the value we read the existing instruction, clear the - * immediate value, and or in our new value, then write the instruction - * back. + * Synchronize slbmte preloads with possible subsequent user memory + * address accesses by the kernel (user mode won't happen until + * rfid, which is safe). */ - unsigned int insn = (*insn_addr & 0xffff0000) | immed; - patch_instruction(insn_addr, insn); + asm volatile("isync" : : : "memory"); } -extern u32 slb_miss_kernel_load_linear[]; -extern u32 slb_miss_kernel_load_io[]; -extern u32 slb_compare_rr_to_size[]; -extern u32 slb_miss_kernel_load_vmemmap[]; - void slb_set_size(u16 size) { - if (mmu_slb_size == size) - return; - mmu_slb_size = size; - patch_slb_encoding(slb_compare_rr_to_size, mmu_slb_size); } void slb_initialize(void) { unsigned long linear_llp, vmalloc_llp, io_llp; - unsigned long lflags, vflags; + unsigned long lflags; static int slb_encoding_inited; #ifdef CONFIG_SPARSEMEM_VMEMMAP unsigned long vmemmap_llp; @@ -338,34 +537,24 @@ void slb_initialize(void) #endif if (!slb_encoding_inited) { slb_encoding_inited = 1; - patch_slb_encoding(slb_miss_kernel_load_linear, - SLB_VSID_KERNEL | linear_llp); - patch_slb_encoding(slb_miss_kernel_load_io, - SLB_VSID_KERNEL | io_llp); - patch_slb_encoding(slb_compare_rr_to_size, - mmu_slb_size); - pr_devel("SLB: linear LLP = %04lx\n", linear_llp); pr_devel("SLB: io LLP = %04lx\n", io_llp); - #ifdef CONFIG_SPARSEMEM_VMEMMAP - patch_slb_encoding(slb_miss_kernel_load_vmemmap, - SLB_VSID_KERNEL | vmemmap_llp); pr_devel("SLB: vmemmap LLP = %04lx\n", vmemmap_llp); #endif } - get_paca()->stab_rr = SLB_NUM_BOLTED; + get_paca()->stab_rr = SLB_NUM_BOLTED - 1; + get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1; + get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap; lflags = SLB_VSID_KERNEL | linear_llp; - vflags = SLB_VSID_KERNEL | vmalloc_llp; /* Invalidate the entire SLB (even entry 0) & all the ERATS */ asm volatile("isync":::"memory"); asm volatile("slbmte %0,%0"::"r" (0) : "memory"); asm volatile("isync; slbia; isync":::"memory"); create_shadowed_slbe(PAGE_OFFSET, mmu_kernel_ssize, lflags, LINEAR_INDEX); - create_shadowed_slbe(VMALLOC_START, mmu_kernel_ssize, vflags, VMALLOC_INDEX); /* For the boot cpu, we're running on the stack in init_thread_union, * which is in the first segment of the linear mapping, and also @@ -381,122 +570,259 @@ void slb_initialize(void) asm volatile("isync":::"memory"); } -static void insert_slb_entry(unsigned long vsid, unsigned long ea, - int bpsize, int ssize) +static void slb_cache_update(unsigned long esid_data) { - unsigned long flags, vsid_data, esid_data; - enum slb_index index; int slb_cache_index; - /* - * We are irq disabled, hence should be safe to access PACA. - */ - VM_WARN_ON(!irqs_disabled()); - - /* - * We can't take a PMU exception in the following code, so hard - * disable interrupts. - */ - hard_irq_disable(); - - index = get_paca()->stab_rr; - - /* - * simple round-robin replacement of slb starting at SLB_NUM_BOLTED. - */ - if (index < (mmu_slb_size - 1)) - index++; - else - index = SLB_NUM_BOLTED; - - get_paca()->stab_rr = index; - - flags = SLB_VSID_USER | mmu_psize_defs[bpsize].sllp; - vsid_data = (vsid << slb_vsid_shift(ssize)) | flags | - ((unsigned long) ssize << SLB_VSID_SSIZE_SHIFT); - esid_data = mk_esid_data(ea, ssize, index); - - /* - * No need for an isync before or after this slbmte. The exception - * we enter with and the rfid we exit with are context synchronizing. - * Also we only handle user segments here. - */ - asm volatile("slbmte %0, %1" : : "r" (vsid_data), "r" (esid_data) - : "memory"); + if (cpu_has_feature(CPU_FTR_ARCH_300)) + return; /* ISAv3.0B and later does not use slb_cache */ /* * Now update slb cache entries */ - slb_cache_index = get_paca()->slb_cache_ptr; + slb_cache_index = local_paca->slb_cache_ptr; if (slb_cache_index < SLB_CACHE_ENTRIES) { /* * We have space in slb cache for optimized switch_slb(). * Top 36 bits from esid_data as per ISA */ - get_paca()->slb_cache[slb_cache_index++] = esid_data >> 28; - get_paca()->slb_cache_ptr++; + local_paca->slb_cache[slb_cache_index++] = esid_data >> 28; + local_paca->slb_cache_ptr++; } else { /* * Our cache is full and the current cache content strictly * doesn't indicate the active SLB conents. Bump the ptr * so that switch_slb() will ignore the cache. */ - get_paca()->slb_cache_ptr = SLB_CACHE_ENTRIES + 1; + local_paca->slb_cache_ptr = SLB_CACHE_ENTRIES + 1; } } -static void handle_multi_context_slb_miss(int context_id, unsigned long ea) +static enum slb_index alloc_slb_index(bool kernel) { - struct mm_struct *mm = current->mm; - unsigned long vsid; - int bpsize; + enum slb_index index; /* - * We are always above 1TB, hence use high user segment size. + * The allocation bitmaps can become out of synch with the SLB + * when the _switch code does slbie when bolting a new stack + * segment and it must not be anywhere else in the SLB. This leaves + * a kernel allocated entry that is unused in the SLB. With very + * large systems or small segment sizes, the bitmaps could slowly + * fill with these entries. They will eventually be cleared out + * by the round robin allocator in that case, so it's probably not + * worth accounting for. */ - vsid = get_vsid(context_id, ea, mmu_highuser_ssize); - bpsize = get_slice_psize(mm, ea); - insert_slb_entry(vsid, ea, bpsize, mmu_highuser_ssize); + + /* + * SLBs beyond 32 entries are allocated with stab_rr only + * POWER7/8/9 have 32 SLB entries, this could be expanded if a + * future CPU has more. + */ + if (local_paca->slb_used_bitmap != U32_MAX) { + index = ffz(local_paca->slb_used_bitmap); + local_paca->slb_used_bitmap |= 1U << index; + if (kernel) + local_paca->slb_kern_bitmap |= 1U << index; + } else { + /* round-robin replacement of slb starting at SLB_NUM_BOLTED. */ + index = local_paca->stab_rr; + if (index < (mmu_slb_size - 1)) + index++; + else + index = SLB_NUM_BOLTED; + local_paca->stab_rr = index; + if (index < 32) { + if (kernel) + local_paca->slb_kern_bitmap |= 1U << index; + else + local_paca->slb_kern_bitmap &= ~(1U << index); + } + } + BUG_ON(index < SLB_NUM_BOLTED); + + return index; } -void slb_miss_large_addr(struct pt_regs *regs) +static long slb_insert_entry(unsigned long ea, unsigned long context, + unsigned long flags, int ssize, bool kernel) { - enum ctx_state prev_state = exception_enter(); - unsigned long ea = regs->dar; - int context; + unsigned long vsid; + unsigned long vsid_data, esid_data; + enum slb_index index; - if (REGION_ID(ea) != USER_REGION_ID) - goto slb_bad_addr; + vsid = get_vsid(context, ea, ssize); + if (!vsid) + return -EFAULT; /* - * Are we beyound what the page table layout supports ? + * There must not be a kernel SLB fault in alloc_slb_index or before + * slbmte here or the allocation bitmaps could get out of whack with + * the SLB. + * + * User SLB faults or preloads take this path which might get inlined + * into the caller, so add compiler barriers here to ensure unsafe + * memory accesses do not come between. */ - if ((ea & ~REGION_MASK) >= H_PGTABLE_RANGE) - goto slb_bad_addr; + barrier(); - /* Lower address should have been handled by asm code */ - if (ea < (1UL << MAX_EA_BITS_PER_CONTEXT)) - goto slb_bad_addr; + index = alloc_slb_index(kernel); + + vsid_data = __mk_vsid_data(vsid, ssize, flags); + esid_data = mk_esid_data(ea, ssize, index); + + /* + * No need for an isync before or after this slbmte. The exception + * we enter with and the rfid we exit with are context synchronizing. + * User preloads should add isync afterwards in case the kernel + * accesses user memory before it returns to userspace with rfid. + */ + assert_slb_notexists(ea); + asm volatile("slbmte %0, %1" : : "r" (vsid_data), "r" (esid_data)); + + barrier(); + + if (!kernel) + slb_cache_update(esid_data); + + return 0; +} + +static long slb_allocate_kernel(unsigned long ea, unsigned long id) +{ + unsigned long context; + unsigned long flags; + int ssize; + + if (id == KERNEL_REGION_ID) { + + /* We only support upto MAX_PHYSMEM_BITS */ + if ((ea & ~REGION_MASK) > (1UL << MAX_PHYSMEM_BITS)) + return -EFAULT; + + flags = SLB_VSID_KERNEL | mmu_psize_defs[mmu_linear_psize].sllp; + +#ifdef CONFIG_SPARSEMEM_VMEMMAP + } else if (id == VMEMMAP_REGION_ID) { + + if ((ea & ~REGION_MASK) >= (1ULL << MAX_EA_BITS_PER_CONTEXT)) + return -EFAULT; + + flags = SLB_VSID_KERNEL | mmu_psize_defs[mmu_vmemmap_psize].sllp; +#endif + } else if (id == VMALLOC_REGION_ID) { + + if ((ea & ~REGION_MASK) >= (1ULL << MAX_EA_BITS_PER_CONTEXT)) + return -EFAULT; + + if (ea < H_VMALLOC_END) + flags = get_paca()->vmalloc_sllp; + else + flags = SLB_VSID_KERNEL | mmu_psize_defs[mmu_io_psize].sllp; + } else { + return -EFAULT; + } + + ssize = MMU_SEGSIZE_1T; + if (!mmu_has_feature(MMU_FTR_1T_SEGMENT)) + ssize = MMU_SEGSIZE_256M; + + context = get_kernel_context(ea); + return slb_insert_entry(ea, context, flags, ssize, true); +} + +static long slb_allocate_user(struct mm_struct *mm, unsigned long ea) +{ + unsigned long context; + unsigned long flags; + int bpsize; + int ssize; /* * consider this as bad access if we take a SLB miss * on an address above addr limit. */ - if (ea >= current->mm->context.slb_addr_limit) - goto slb_bad_addr; + if (ea >= mm->context.slb_addr_limit) + return -EFAULT; - context = get_ea_context(&current->mm->context, ea); + context = get_user_context(&mm->context, ea); if (!context) - goto slb_bad_addr; + return -EFAULT; + + if (unlikely(ea >= H_PGTABLE_RANGE)) { + WARN_ON(1); + return -EFAULT; + } - handle_multi_context_slb_miss(context, ea); - exception_exit(prev_state); - return; + ssize = user_segment_size(ea); -slb_bad_addr: - if (user_mode(regs)) - _exception(SIGSEGV, regs, SEGV_BNDERR, ea); - else - bad_page_fault(regs, ea, SIGSEGV); - exception_exit(prev_state); + bpsize = get_slice_psize(mm, ea); + flags = SLB_VSID_USER | mmu_psize_defs[bpsize].sllp; + + return slb_insert_entry(ea, context, flags, ssize, false); +} + +long do_slb_fault(struct pt_regs *regs, unsigned long ea) +{ + unsigned long id = REGION_ID(ea); + + /* IRQs are not reconciled here, so can't check irqs_disabled */ + VM_WARN_ON(mfmsr() & MSR_EE); + + if (unlikely(!(regs->msr & MSR_RI))) + return -EINVAL; + + /* + * SLB kernel faults must be very careful not to touch anything + * that is not bolted. E.g., PACA and global variables are okay, + * mm->context stuff is not. + * + * SLB user faults can access all of kernel memory, but must be + * careful not to touch things like IRQ state because it is not + * "reconciled" here. The difficulty is that we must use + * fast_exception_return to return from kernel SLB faults without + * looking at possible non-bolted memory. We could test user vs + * kernel faults in the interrupt handler asm and do a full fault, + * reconcile, ret_from_except for user faults which would make them + * first class kernel code. But for performance it's probably nicer + * if they go via fast_exception_return too. + */ + if (id >= KERNEL_REGION_ID) { + long err; +#ifdef CONFIG_DEBUG_VM + /* Catch recursive kernel SLB faults. */ + BUG_ON(local_paca->in_kernel_slb_handler); + local_paca->in_kernel_slb_handler = 1; +#endif + err = slb_allocate_kernel(ea, id); +#ifdef CONFIG_DEBUG_VM + local_paca->in_kernel_slb_handler = 0; +#endif + return err; + } else { + struct mm_struct *mm = current->mm; + long err; + + if (unlikely(!mm)) + return -EFAULT; + + err = slb_allocate_user(mm, ea); + if (!err) + preload_add(current_thread_info(), ea); + + return err; + } +} + +void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err) +{ + if (err == -EFAULT) { + if (user_mode(regs)) + _exception(SIGSEGV, regs, SEGV_BNDERR, ea); + else + bad_page_fault(regs, ea, SIGSEGV); + } else if (err == -EINVAL) { + unrecoverable_exception(regs); + } else { + BUG(); + } } diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S @@ -1,335 +0,0 @@ -/* - * Low-level SLB routines - * - * Copyright (C) 2004 David Gibson <dwg@au.ibm.com>, IBM - * - * Based on earlier C version: - * Dave Engebretsen and Mike Corrigan {engebret|mikejc}@us.ibm.com - * Copyright (c) 2001 Dave Engebretsen - * Copyright (C) 2002 Anton Blanchard <anton@au.ibm.com>, IBM - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include <asm/processor.h> -#include <asm/ppc_asm.h> -#include <asm/asm-offsets.h> -#include <asm/cputable.h> -#include <asm/page.h> -#include <asm/mmu.h> -#include <asm/pgtable.h> -#include <asm/firmware.h> -#include <asm/feature-fixups.h> - -/* - * This macro generates asm code to compute the VSID scramble - * function. Used in slb_allocate() and do_stab_bolted. The function - * computed is: (protovsid*VSID_MULTIPLIER) % VSID_MODULUS - * - * rt = register containing the proto-VSID and into which the - * VSID will be stored - * rx = scratch register (clobbered) - * rf = flags - * - * - rt and rx must be different registers - * - The answer will end up in the low VSID_BITS bits of rt. The higher - * bits may contain other garbage, so you may need to mask the - * result. - */ -#define ASM_VSID_SCRAMBLE(rt, rx, rf, size) \ - lis rx,VSID_MULTIPLIER_##size@h; \ - ori rx,rx,VSID_MULTIPLIER_##size@l; \ - mulld rt,rt,rx; /* rt = rt * MULTIPLIER */ \ -/* \ - * powermac get slb fault before feature fixup, so make 65 bit part \ - * the default part of feature fixup \ - */ \ -BEGIN_MMU_FTR_SECTION \ - srdi rx,rt,VSID_BITS_65_##size; \ - clrldi rt,rt,(64-VSID_BITS_65_##size); \ - add rt,rt,rx; \ - addi rx,rt,1; \ - srdi rx,rx,VSID_BITS_65_##size; \ - add rt,rt,rx; \ - rldimi rf,rt,SLB_VSID_SHIFT_##size,(64 - (SLB_VSID_SHIFT_##size + VSID_BITS_65_##size)); \ -MMU_FTR_SECTION_ELSE \ - srdi rx,rt,VSID_BITS_##size; \ - clrldi rt,rt,(64-VSID_BITS_##size); \ - add rt,rt,rx; /* add high and low bits */ \ - addi rx,rt,1; \ - srdi rx,rx,VSID_BITS_##size; /* extract 2^VSID_BITS bit */ \ - add rt,rt,rx; \ - rldimi rf,rt,SLB_VSID_SHIFT_##size,(64 - (SLB_VSID_SHIFT_##size + VSID_BITS_##size)); \ -ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_68_BIT_VA) - - -/* void slb_allocate(unsigned long ea); - * - * Create an SLB entry for the given EA (user or kernel). - * r3 = faulting address, r13 = PACA - * r9, r10, r11 are clobbered by this function - * r3 is preserved. - * No other registers are examined or changed. - */ -_GLOBAL(slb_allocate) - /* - * Check if the address falls within the range of the first context, or - * if we may need to handle multi context. For the first context we - * allocate the slb entry via the fast path below. For large address we - * branch out to C-code and see if additional contexts have been - * allocated. - * The test here is: - * (ea & ~REGION_MASK) >= (1ull << MAX_EA_BITS_PER_CONTEXT) - */ - rldicr. r9,r3,4,(63 - MAX_EA_BITS_PER_CONTEXT - 4) - bne- 8f - - srdi r9,r3,60 /* get region */ - srdi r10,r3,SID_SHIFT /* get esid */ - cmpldi cr7,r9,0xc /* cmp PAGE_OFFSET for later use */ - - /* r3 = address, r10 = esid, cr7 = <> PAGE_OFFSET */ - blt cr7,0f /* user or kernel? */ - - /* Check if hitting the linear mapping or some other kernel space - */ - bne cr7,1f - - /* Linear mapping encoding bits, the "li" instruction below will - * be patched by the kernel at boot - */ -.globl slb_miss_kernel_load_linear -slb_miss_kernel_load_linear: - li r11,0 - /* - * context = (ea >> 60) - (0xc - 1) - * r9 = region id. - */ - subi r9,r9,KERNEL_REGION_CONTEXT_OFFSET - -BEGIN_FTR_SECTION - b .Lslb_finish_load -END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT) - b .Lslb_finish_load_1T - -1: -#ifdef CONFIG_SPARSEMEM_VMEMMAP - cmpldi cr0,r9,0xf - bne 1f -/* Check virtual memmap region. To be patched at kernel boot */ -.globl slb_miss_kernel_load_vmemmap -slb_miss_kernel_load_vmemmap: - li r11,0 - b 6f -1: -#endif /* CONFIG_SPARSEMEM_VMEMMAP */ - - /* - * r10 contains the ESID, which is the original faulting EA shifted - * right by 28 bits. We need to compare that with (H_VMALLOC_END >> 28) - * which is 0xd00038000. That can't be used as an immediate, even if we - * ignored the 0xd, so we have to load it into a register, and we only - * have one register free. So we must load all of (H_VMALLOC_END >> 28) - * into a register and compare ESID against that. - */ - lis r11,(H_VMALLOC_END >> 32)@h // r11 = 0xffffffffd0000000 - ori r11,r11,(H_VMALLOC_END >> 32)@l // r11 = 0xffffffffd0003800 - // Rotate left 4, then mask with 0xffffffff0 - rldic r11,r11,4,28 // r11 = 0xd00038000 - cmpld r10,r11 // if r10 >= r11 - bge 5f // goto io_mapping - - /* - * vmalloc mapping gets the encoding from the PACA as the mapping - * can be demoted from 64K -> 4K dynamically on some machines. - */ - lhz r11,PACAVMALLOCSLLP(r13) - b 6f -5: - /* IO mapping */ -.globl slb_miss_kernel_load_io -slb_miss_kernel_load_io: - li r11,0 -6: - /* - * context = (ea >> 60) - (0xc - 1) - * r9 = region id. - */ - subi r9,r9,KERNEL_REGION_CONTEXT_OFFSET - -BEGIN_FTR_SECTION - b .Lslb_finish_load -END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT) - b .Lslb_finish_load_1T - -0: /* - * For userspace addresses, make sure this is region 0. - */ - cmpdi r9, 0 - bne- 8f - /* - * user space make sure we are within the allowed limit - */ - ld r11,PACA_SLB_ADDR_LIMIT(r13) - cmpld r3,r11 - bge- 8f - - /* when using slices, we extract the psize off the slice bitmaps - * and then we need to get the sllp encoding off the mmu_psize_defs - * array. - * - * XXX This is a bit inefficient especially for the normal case, - * so we should try to implement a fast path for the standard page - * size using the old sllp value so we avoid the array. We cannot - * really do dynamic patching unfortunately as processes might flip - * between 4k and 64k standard page size - */ -#ifdef CONFIG_PPC_MM_SLICES - /* r10 have esid */ - cmpldi r10,16 - /* below SLICE_LOW_TOP */ - blt 5f - /* - * Handle hpsizes, - * r9 is get_paca()->context.high_slices_psize[index], r11 is mask_index - */ - srdi r11,r10,(SLICE_HIGH_SHIFT - SLICE_LOW_SHIFT + 1) /* index */ - addi r9,r11,PACAHIGHSLICEPSIZE - lbzx r9,r13,r9 /* r9 is hpsizes[r11] */ - /* r11 = (r10 >> (SLICE_HIGH_SHIFT - SLICE_LOW_SHIFT)) & 0x1 */ - rldicl r11,r10,(64 - (SLICE_HIGH_SHIFT - SLICE_LOW_SHIFT)),63 - b 6f - -5: - /* - * Handle lpsizes - * r9 is get_paca()->context.low_slices_psize[index], r11 is mask_index - */ - srdi r11,r10,1 /* index */ - addi r9,r11,PACALOWSLICESPSIZE - lbzx r9,r13,r9 /* r9 is lpsizes[r11] */ - rldicl r11,r10,0,63 /* r11 = r10 & 0x1 */ -6: - sldi r11,r11,2 /* index * 4 */ - /* Extract the psize and multiply to get an array offset */ - srd r9,r9,r11 - andi. r9,r9,0xf - mulli r9,r9,MMUPSIZEDEFSIZE - - /* Now get to the array and obtain the sllp - */ - ld r11,PACATOC(r13) - ld r11,mmu_psize_defs@got(r11) - add r11,r11,r9 - ld r11,MMUPSIZESLLP(r11) - ori r11,r11,SLB_VSID_USER -#else - /* paca context sllp already contains the SLB_VSID_USER bits */ - lhz r11,PACACONTEXTSLLP(r13) -#endif /* CONFIG_PPC_MM_SLICES */ - - ld r9,PACACONTEXTID(r13) -BEGIN_FTR_SECTION - cmpldi r10,0x1000 - bge .Lslb_finish_load_1T -END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT) - b .Lslb_finish_load - -8: /* invalid EA - return an error indication */ - crset 4*cr0+eq /* indicate failure */ - blr - -/* - * Finish loading of an SLB entry and return - * - * r3 = EA, r9 = context, r10 = ESID, r11 = flags, clobbers r9, cr7 = <> PAGE_OFFSET - */ -.Lslb_finish_load: - rldimi r10,r9,ESID_BITS,0 - ASM_VSID_SCRAMBLE(r10,r9,r11,256M) - /* r3 = EA, r11 = VSID data */ - /* - * Find a slot, round robin. Previously we tried to find a - * free slot first but that took too long. Unfortunately we - * dont have any LRU information to help us choose a slot. - */ - - mr r9,r3 - - /* slb_finish_load_1T continues here. r9=EA with non-ESID bits clear */ -7: ld r10,PACASTABRR(r13) - addi r10,r10,1 - /* This gets soft patched on boot. */ -.globl slb_compare_rr_to_size -slb_compare_rr_to_size: - cmpldi r10,0 - - blt+ 4f - li r10,SLB_NUM_BOLTED - -4: - std r10,PACASTABRR(r13) - -3: - rldimi r9,r10,0,36 /* r9 = EA[0:35] | entry */ - oris r10,r9,SLB_ESID_V@h /* r10 = r9 | SLB_ESID_V */ - - /* r9 = ESID data, r11 = VSID data */ - - /* - * No need for an isync before or after this slbmte. The exception - * we enter with and the rfid we exit with are context synchronizing. - */ - slbmte r11,r10 - - /* we're done for kernel addresses */ - crclr 4*cr0+eq /* set result to "success" */ - bgelr cr7 - - /* Update the slb cache */ - lhz r9,PACASLBCACHEPTR(r13) /* offset = paca->slb_cache_ptr */ - cmpldi r9,SLB_CACHE_ENTRIES - bge 1f - - /* still room in the slb cache */ - sldi r11,r9,2 /* r11 = offset * sizeof(u32) */ - srdi r10,r10,28 /* get the 36 bits of the ESID */ - add r11,r11,r13 /* r11 = (u32 *)paca + offset */ - stw r10,PACASLBCACHE(r11) /* paca->slb_cache[offset] = esid */ - addi r9,r9,1 /* offset++ */ - b 2f -1: /* offset >= SLB_CACHE_ENTRIES */ - li r9,SLB_CACHE_ENTRIES+1 -2: - sth r9,PACASLBCACHEPTR(r13) /* paca->slb_cache_ptr = offset */ - crclr 4*cr0+eq /* set result to "success" */ - blr - -/* - * Finish loading of a 1T SLB entry (for the kernel linear mapping) and return. - * - * r3 = EA, r9 = context, r10 = ESID(256MB), r11 = flags, clobbers r9 - */ -.Lslb_finish_load_1T: - srdi r10,r10,(SID_SHIFT_1T - SID_SHIFT) /* get 1T ESID */ - rldimi r10,r9,ESID_BITS_1T,0 - ASM_VSID_SCRAMBLE(r10,r9,r11,1T) - - li r10,MMU_SEGSIZE_1T - rldimi r11,r10,SLB_VSID_SSIZE_SHIFT,0 /* insert segment size */ - - /* r3 = EA, r11 = VSID data */ - clrrdi r9,r3,SID_SHIFT_1T /* clear out non-ESID bits */ - b 7b - - -_ASM_NOKPROBE_SYMBOL(slb_allocate) -_ASM_NOKPROBE_SYMBOL(slb_miss_kernel_load_linear) -_ASM_NOKPROBE_SYMBOL(slb_miss_kernel_load_io) -_ASM_NOKPROBE_SYMBOL(slb_compare_rr_to_size) -#ifdef CONFIG_SPARSEMEM_VMEMMAP -_ASM_NOKPROBE_SYMBOL(slb_miss_kernel_load_vmemmap) -#endif diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c @@ -31,6 +31,7 @@ #include <linux/spinlock.h> #include <linux/export.h> #include <linux/hugetlb.h> +#include <linux/sched/mm.h> #include <asm/mman.h> #include <asm/mmu.h> #include <asm/copro.h> @@ -61,6 +62,13 @@ static void slice_print_mask(const char *label, const struct slice_mask *mask) { #endif +static inline bool slice_addr_is_low(unsigned long addr) +{ + u64 tmp = (u64)addr; + + return tmp < SLICE_LOW_TOP; +} + static void slice_range_to_mask(unsigned long start, unsigned long len, struct slice_mask *ret) { @@ -70,7 +78,7 @@ static void slice_range_to_mask(unsigned long start, unsigned long len, if (SLICE_NUM_HIGH) bitmap_zero(ret->high_slices, SLICE_NUM_HIGH); - if (start < SLICE_LOW_TOP) { + if (slice_addr_is_low(start)) { unsigned long mend = min(end, (unsigned long)(SLICE_LOW_TOP - 1)); @@ -78,7 +86,7 @@ static void slice_range_to_mask(unsigned long start, unsigned long len, - (1u << GET_LOW_SLICE_INDEX(start)); } - if ((start + len) > SLICE_LOW_TOP) { + if (SLICE_NUM_HIGH && !slice_addr_is_low(end)) { unsigned long start_index = GET_HIGH_SLICE_INDEX(start); unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT)); unsigned long count = GET_HIGH_SLICE_INDEX(align_end) - start_index; @@ -133,7 +141,7 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret, if (!slice_low_has_vma(mm, i)) ret->low_slices |= 1u << i; - if (high_limit <= SLICE_LOW_TOP) + if (slice_addr_is_low(high_limit - 1)) return; for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) @@ -182,7 +190,7 @@ static bool slice_check_range_fits(struct mm_struct *mm, unsigned long end = start + len - 1; u64 low_slices = 0; - if (start < SLICE_LOW_TOP) { + if (slice_addr_is_low(start)) { unsigned long mend = min(end, (unsigned long)(SLICE_LOW_TOP - 1)); @@ -192,7 +200,7 @@ static bool slice_check_range_fits(struct mm_struct *mm, if ((low_slices & available->low_slices) != low_slices) return false; - if (SLICE_NUM_HIGH && ((start + len) > SLICE_LOW_TOP)) { + if (SLICE_NUM_HIGH && !slice_addr_is_low(end)) { unsigned long start_index = GET_HIGH_SLICE_INDEX(start); unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT)); unsigned long count = GET_HIGH_SLICE_INDEX(align_end) - start_index; @@ -219,7 +227,7 @@ static void slice_flush_segments(void *parm) copy_mm_to_paca(current->active_mm); local_irq_save(flags); - slb_flush_and_rebolt(); + slb_flush_and_restore_bolted(); local_irq_restore(flags); #endif } @@ -303,7 +311,7 @@ static bool slice_scan_available(unsigned long addr, int end, unsigned long *boundary_addr) { unsigned long slice; - if (addr < SLICE_LOW_TOP) { + if (slice_addr_is_low(addr)) { slice = GET_LOW_SLICE_INDEX(addr); *boundary_addr = (slice + end) << SLICE_LOW_SHIFT; return !!(available->low_slices & (1u << slice)); @@ -706,7 +714,7 @@ unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr) VM_BUG_ON(radix_enabled()); - if (addr < SLICE_LOW_TOP) { + if (slice_addr_is_low(addr)) { psizes = mm->context.low_slices_psize; index = GET_LOW_SLICE_INDEX(addr); } else { @@ -757,6 +765,20 @@ void slice_init_new_context_exec(struct mm_struct *mm) bitmap_fill(mask->high_slices, SLICE_NUM_HIGH); } +#ifdef CONFIG_PPC_BOOK3S_64 +void slice_setup_new_exec(void) +{ + struct mm_struct *mm = current->mm; + + slice_dbg("slice_setup_new_exec(mm=%p)\n", mm); + + if (!is_32bit_task()) + return; + + mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW; +} +#endif + void slice_set_range_psize(struct mm_struct *mm, unsigned long start, unsigned long len, unsigned int psize) { diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c @@ -366,6 +366,7 @@ static inline void _tlbiel_lpid_guest(unsigned long lpid, unsigned long ric) __tlbiel_lpid_guest(lpid, set, RIC_FLUSH_TLB); asm volatile("ptesync": : :"memory"); + asm volatile(PPC_INVALIDATE_ERAT : : :"memory"); } @@ -1016,7 +1017,6 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr) goto local; } _tlbie_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true); - goto local; } else { local: _tlbiel_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true); diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c @@ -503,6 +503,9 @@ static void setup_page_sizes(void) for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) { struct mmu_psize_def *def = &mmu_psize_defs[psize]; + if (!def->shift) + continue; + if (tlb1ps & (1U << (def->shift - 10))) { def->flags |= MMU_PAGE_SIZE_DIRECT; diff --git a/arch/powerpc/oprofile/Makefile b/arch/powerpc/oprofile/Makefile @@ -1,5 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC) diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile @@ -1,5 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror obj-$(CONFIG_PERF_EVENTS) += callchain.o perf_regs.o diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c @@ -1392,7 +1392,7 @@ int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_id if (ret) goto err_free_cpuhp_mem; - pr_info("%s performance monitor hardware support registered\n", + pr_debug("%s performance monitor hardware support registered\n", pmu_ptr->pmu.name); return 0; diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c @@ -238,6 +238,7 @@ static int power7_marked_instr_event(u64 event) case 6: if (psel == 0x64) return pmc >= 3; + break; case 8: return unit == 0xd; } diff --git a/arch/powerpc/platforms/40x/Kconfig b/arch/powerpc/platforms/40x/Kconfig @@ -2,7 +2,6 @@ config ACADIA bool "Acadia" depends on 40x - default n select PPC40x_SIMPLE select 405EZ help @@ -11,7 +10,6 @@ config ACADIA config EP405 bool "EP405/EP405PC" depends on 40x - default n select 405GP select PCI help @@ -20,7 +18,6 @@ config EP405 config HOTFOOT bool "Hotfoot" depends on 40x - default n select PPC40x_SIMPLE select PCI help @@ -29,7 +26,6 @@ config HOTFOOT config KILAUEA bool "Kilauea" depends on 40x - default n select 405EX select PPC40x_SIMPLE select PPC4xx_PCI_EXPRESS @@ -41,7 +37,6 @@ config KILAUEA config MAKALU bool "Makalu" depends on 40x - default n select 405EX select PCI select PPC4xx_PCI_EXPRESS @@ -62,7 +57,6 @@ config WALNUT config XILINX_VIRTEX_GENERIC_BOARD bool "Generic Xilinx Virtex board" depends on 40x - default n select XILINX_VIRTEX_II_PRO select XILINX_VIRTEX_4_FX select XILINX_INTC @@ -80,7 +74,6 @@ config XILINX_VIRTEX_GENERIC_BOARD config OBS600 bool "OpenBlockS 600" depends on 40x - default n select 405EX select PPC40x_SIMPLE help @@ -90,7 +83,6 @@ config OBS600 config PPC40x_SIMPLE bool "Simple PowerPC 40x board support" depends on 40x - default n help This option enables the simple PowerPC 40x platform support. @@ -156,7 +148,6 @@ config IBM405_ERR51 config APM8018X bool "APM8018X" depends on 40x - default n select PPC40x_SIMPLE help This option enables support for the AppliedMicro APM8018X evaluation diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig @@ -2,7 +2,6 @@ config PPC_47x bool "Support for 47x variant" depends on 44x - default n select MPIC help This option enables support for the 47x family of processors and is @@ -11,7 +10,6 @@ config PPC_47x config BAMBOO bool "Bamboo" depends on 44x - default n select PPC44x_SIMPLE select 440EP select PCI @@ -21,7 +19,6 @@ config BAMBOO config BLUESTONE bool "Bluestone" depends on 44x - default n select PPC44x_SIMPLE select APM821xx select PCI_MSI @@ -44,7 +41,6 @@ config EBONY config SAM440EP bool "Sam440ep" depends on 44x - default n select 440EP select PCI help @@ -53,7 +49,6 @@ config SAM440EP config SEQUOIA bool "Sequoia" depends on 44x - default n select PPC44x_SIMPLE select 440EPX help @@ -62,7 +57,6 @@ config SEQUOIA config TAISHAN bool "Taishan" depends on 44x - default n select PPC44x_SIMPLE select 440GX select PCI @@ -73,7 +67,6 @@ config TAISHAN config KATMAI bool "Katmai" depends on 44x - default n select PPC44x_SIMPLE select 440SPe select PCI @@ -86,7 +79,6 @@ config KATMAI config RAINIER bool "Rainier" depends on 44x - default n select PPC44x_SIMPLE select 440GRX select PCI @@ -96,7 +88,6 @@ config RAINIER config WARP bool "PIKA Warp" depends on 44x - default n select 440EP help This option enables support for the PIKA Warp(tm) Appliance. The Warp @@ -109,7 +100,6 @@ config WARP config ARCHES bool "Arches" depends on 44x - default n select PPC44x_SIMPLE select 460EX # Odd since it uses 460GT but the effects are the same select PCI @@ -120,7 +110,6 @@ config ARCHES config CANYONLANDS bool "Canyonlands" depends on 44x - default n select 460EX select PCI select PPC4xx_PCI_EXPRESS @@ -134,7 +123,6 @@ config CANYONLANDS config GLACIER bool "Glacier" depends on 44x - default n select PPC44x_SIMPLE select 460EX # Odd since it uses 460GT but the effects are the same select PCI @@ -147,7 +135,6 @@ config GLACIER config REDWOOD bool "Redwood" depends on 44x - default n select PPC44x_SIMPLE select 460SX select PCI @@ -160,7 +147,6 @@ config REDWOOD config EIGER bool "Eiger" depends on 44x - default n select PPC44x_SIMPLE select 460SX select PCI @@ -172,7 +158,6 @@ config EIGER config YOSEMITE bool "Yosemite" depends on 44x - default n select PPC44x_SIMPLE select 440EP select PCI @@ -182,7 +167,6 @@ config YOSEMITE config ISS4xx bool "ISS 4xx Simulator" depends on (44x || 40x) - default n select 405GP if 40x select 440GP if 44x && !PPC_47x select PPC_FPU @@ -193,7 +177,6 @@ config ISS4xx config CURRITUCK bool "IBM Currituck (476fpe) Support" depends on PPC_47x - default n select SWIOTLB select 476FPE select PPC4xx_PCI_EXPRESS @@ -203,7 +186,6 @@ config CURRITUCK config FSP2 bool "IBM FSP2 (476fpe) Support" depends on PPC_47x - default n select 476FPE select IBM_EMAC_EMAC4 if IBM_EMAC select IBM_EMAC_RGMII if IBM_EMAC @@ -215,7 +197,6 @@ config FSP2 config AKEBONO bool "IBM Akebono (476gtr) Support" depends on PPC_47x - default n select SWIOTLB select 476FPE select PPC4xx_PCI_EXPRESS @@ -241,7 +222,6 @@ config AKEBONO config ICON bool "Icon" depends on 44x - default n select PPC44x_SIMPLE select 440SPe select PCI @@ -252,7 +232,6 @@ config ICON config XILINX_VIRTEX440_GENERIC_BOARD bool "Generic Xilinx Virtex 5 FXT board support" depends on 44x - default n select XILINX_VIRTEX_5_FXT select XILINX_INTC help @@ -280,7 +259,6 @@ config XILINX_ML510 config PPC44x_SIMPLE bool "Simple PowerPC 44x board support" depends on 44x - default n help This option enables the simple PowerPC 44x platform support. diff --git a/arch/powerpc/platforms/44x/fsp2.c b/arch/powerpc/platforms/44x/fsp2.c @@ -210,15 +210,15 @@ static void node_irq_request(const char *compat, irq_handler_t errirq_handler) for_each_compatible_node(np, NULL, compat) { irq = irq_of_parse_and_map(np, 0); if (irq == NO_IRQ) { - pr_err("device tree node %s is missing a interrupt", - np->name); + pr_err("device tree node %pOFn is missing a interrupt", + np); return; } rc = request_irq(irq, errirq_handler, 0, np->name, np); if (rc) { - pr_err("fsp_of_probe: request_irq failed: np=%s rc=%d", - np->full_name, rc); + pr_err("fsp_of_probe: request_irq failed: np=%pOF rc=%d", + np, rc); return; } } diff --git a/arch/powerpc/platforms/4xx/ocm.c b/arch/powerpc/platforms/4xx/ocm.c @@ -113,7 +113,6 @@ static void __init ocm_init_node(int count, struct device_node *node) int len; struct resource rsrc; - int ioflags; ocm = ocm_get_node(count); @@ -179,9 +178,8 @@ static void __init ocm_init_node(int count, struct device_node *node) /* ioremap the non-cached region */ if (ocm->nc.memtotal) { - ioflags = _PAGE_NO_CACHE | _PAGE_GUARDED | _PAGE_EXEC; ocm->nc.virt = __ioremap(ocm->nc.phys, ocm->nc.memtotal, - ioflags); + _PAGE_EXEC | PAGE_KERNEL_NCG); if (!ocm->nc.virt) { printk(KERN_ERR @@ -195,9 +193,8 @@ static void __init ocm_init_node(int count, struct device_node *node) /* ioremap the cached region */ if (ocm->c.memtotal) { - ioflags = _PAGE_EXEC; ocm->c.virt = __ioremap(ocm->c.phys, ocm->c.memtotal, - ioflags); + _PAGE_EXEC | PAGE_KERNEL); if (!ocm->c.virt) { printk(KERN_ERR diff --git a/arch/powerpc/platforms/82xx/Kconfig b/arch/powerpc/platforms/82xx/Kconfig @@ -51,7 +51,6 @@ endif config PQ2ADS bool - default n config 8260 bool diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c @@ -216,8 +216,8 @@ static int smp_85xx_start_cpu(int cpu) /* Map the spin table */ if (ioremappable) - spin_table = ioremap_prot(*cpu_rel_addr, - sizeof(struct epapr_spin_table), _PAGE_COHERENT); + spin_table = ioremap_coherent(*cpu_rel_addr, + sizeof(struct epapr_spin_table)); else spin_table = phys_to_virt(*cpu_rel_addr); diff --git a/arch/powerpc/platforms/8xx/machine_check.c b/arch/powerpc/platforms/8xx/machine_check.c @@ -18,9 +18,9 @@ int machine_check_8xx(struct pt_regs *regs) pr_err("Machine check in kernel mode.\n"); pr_err("Caused by (from SRR1=%lx): ", reason); if (reason & 0x40000000) - pr_err("Fetch error at address %lx\n", regs->nip); + pr_cont("Fetch error at address %lx\n", regs->nip); else - pr_err("Data access error at address %lx\n", regs->dar); + pr_cont("Data access error at address %lx\n", regs->dar); #ifdef CONFIG_PCI /* the qspan pci read routines can cause machine checks -- Cort diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig @@ -23,7 +23,6 @@ source "arch/powerpc/platforms/amigaone/Kconfig" config KVM_GUEST bool "KVM Guest support" - default n select EPAPR_PARAVIRT ---help--- This option enables various optimizations for running under the KVM @@ -34,7 +33,6 @@ config KVM_GUEST config EPAPR_PARAVIRT bool "ePAPR para-virtualization support" - default n help Enables ePAPR para-virtualization support for guests. @@ -74,7 +72,6 @@ config PPC_DT_CPU_FTRS config UDBG_RTAS_CONSOLE bool "RTAS based debug console" depends on PPC_RTAS - default n config PPC_SMP_MUXED_IPI bool @@ -86,16 +83,13 @@ config PPC_SMP_MUXED_IPI config IPIC bool - default n config MPIC bool - default n config MPIC_TIMER bool "MPIC Global Timer" depends on MPIC && FSL_SOC - default n help The MPIC global timer is a hardware timer inside the Freescale PIC complying with OpenPIC standard. When the @@ -107,7 +101,6 @@ config MPIC_TIMER config FSL_MPIC_TIMER_WAKEUP tristate "Freescale MPIC global timer wakeup driver" depends on FSL_SOC && MPIC_TIMER && PM - default n help The driver provides a way to wake up the system by MPIC timer. @@ -115,43 +108,35 @@ config FSL_MPIC_TIMER_WAKEUP config PPC_EPAPR_HV_PIC bool - default n select EPAPR_PARAVIRT config MPIC_WEIRD bool - default n config MPIC_MSGR bool "MPIC message register support" depends on MPIC - default n help Enables support for the MPIC message registers. These registers are used for inter-processor communication. config PPC_I8259 bool - default n config U3_DART bool depends on PPC64 - default n config PPC_RTAS bool - default n config RTAS_ERROR_LOGGING bool depends on PPC_RTAS - default n config PPC_RTAS_DAEMON bool depends on PPC_RTAS - default n config RTAS_PROC bool "Proc interface to RTAS" @@ -164,11 +149,9 @@ config RTAS_FLASH config MMIO_NVRAM bool - default n config MPIC_U3_HT_IRQS bool - default n config MPIC_BROKEN_REGREAD bool @@ -187,15 +170,12 @@ config EEH config PPC_MPC106 bool - default n config PPC_970_NAP bool - default n config PPC_P7_NAP bool - default n config PPC_INDIRECT_PIO bool @@ -295,7 +275,6 @@ config CPM2 config FSL_ULI1575 bool - default n select GENERIC_ISA_DMA help Supports for the ULI1575 PCIe south bridge that exists on some diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype @@ -1,7 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 config PPC64 bool "64-bit kernel" - default n select ZLIB_DEFLATE help This option selects whether a 32-bit or a 64-bit kernel @@ -72,6 +71,7 @@ config PPC_BOOK3S_64 select PPC_HAVE_PMU_SUPPORT select SYS_SUPPORTS_HUGETLBFS select HAVE_ARCH_TRANSPARENT_HUGEPAGE + select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE select ARCH_SUPPORTS_NUMA_BALANCING select IRQ_WORK @@ -368,7 +368,6 @@ config PPC_MM_SLICES bool default y if PPC_BOOK3S_64 default y if PPC_8xx && HUGETLB_PAGE - default n config PPC_HAVE_PMU_SUPPORT bool @@ -382,7 +381,6 @@ config PPC_PERF_CTRS config FORCE_SMP # Allow platforms to force SMP=y by selecting this bool - default n select SMP config SMP @@ -423,7 +421,6 @@ config CHECK_CACHE_COHERENCY config PPC_DOORBELL bool - default n endmenu diff --git a/arch/powerpc/platforms/Makefile b/arch/powerpc/platforms/Makefile @@ -1,7 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror - obj-$(CONFIG_FSL_ULI1575) += fsl_uli1575.o obj-$(CONFIG_PPC_PMAC) += powermac/ diff --git a/arch/powerpc/platforms/cell/Kconfig b/arch/powerpc/platforms/cell/Kconfig @@ -1,7 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 config PPC_CELL bool - default n config PPC_CELL_COMMON bool @@ -22,7 +21,6 @@ config PPC_CELL_NATIVE select IBM_EMAC_RGMII if IBM_EMAC select IBM_EMAC_ZMII if IBM_EMAC #test only select IBM_EMAC_TAH if IBM_EMAC #test only - default n config PPC_IBM_CELL_BLADE bool "IBM Cell Blade" @@ -54,7 +52,6 @@ config SPU_FS config SPU_BASE bool - default n select PPC_COPRO_BASE config CBE_RAS diff --git a/arch/powerpc/platforms/cell/spu_manage.c b/arch/powerpc/platforms/cell/spu_manage.c @@ -180,35 +180,22 @@ out: static int __init spu_map_interrupts(struct spu *spu, struct device_node *np) { - struct of_phandle_args oirq; - int ret; int i; for (i=0; i < 3; i++) { - ret = of_irq_parse_one(np, i, &oirq); - if (ret) { - pr_debug("spu_new: failed to get irq %d\n", i); - goto err; - } - ret = -EINVAL; - pr_debug(" irq %d no 0x%x on %pOF\n", i, oirq.args[0], - oirq.np); - spu->irqs[i] = irq_create_of_mapping(&oirq); - if (!spu->irqs[i]) { - pr_debug("spu_new: failed to map it !\n"); + spu->irqs[i] = irq_of_parse_and_map(np, i); + if (!spu->irqs[i]) goto err; - } } return 0; err: - pr_debug("failed to map irq %x for spu %s\n", *oirq.args, - spu->name); + pr_debug("failed to map irq %x for spu %s\n", i, spu->name); for (; i >= 0; i--) { if (spu->irqs[i]) irq_dispose_mapping(spu->irqs[i]); } - return ret; + return -EINVAL; } static int spu_map_resource(struct spu *spu, int nr, @@ -295,8 +282,8 @@ static int __init of_enumerate_spus(int (*fn)(void *data)) for_each_node_by_type(node, "spe") { ret = fn(node); if (ret) { - printk(KERN_WARNING "%s: Error initializing %s\n", - __func__, node->name); + printk(KERN_WARNING "%s: Error initializing %pOFn\n", + __func__, node); of_node_put(node); break; } diff --git a/arch/powerpc/platforms/embedded6xx/wii.c b/arch/powerpc/platforms/embedded6xx/wii.c @@ -112,7 +112,7 @@ static void __iomem *wii_ioremap_hw_regs(char *name, char *compatible) } error = of_address_to_resource(np, 0, &res); if (error) { - pr_err("no valid reg found for %s\n", np->name); + pr_err("no valid reg found for %pOFn\n", np); goto out_put; } diff --git a/arch/powerpc/platforms/maple/Kconfig b/arch/powerpc/platforms/maple/Kconfig @@ -13,7 +13,6 @@ config PPC_MAPLE select PPC_RTAS select MMIO_NVRAM select ATA_NONSTANDARD if ATA - default n help This option enables support for the Maple 970FX Evaluation Board. For more information, refer to <http://www.970eval.com> diff --git a/arch/powerpc/platforms/pasemi/Kconfig b/arch/powerpc/platforms/pasemi/Kconfig @@ -2,7 +2,6 @@ config PPC_PASEMI depends on PPC64 && PPC_BOOK3S && CPU_BIG_ENDIAN bool "PA Semi SoC-based platforms" - default n select MPIC select PCI select PPC_UDBG_16550 diff --git a/arch/powerpc/platforms/pasemi/dma_lib.c b/arch/powerpc/platforms/pasemi/dma_lib.c @@ -576,7 +576,7 @@ int pasemi_dma_init(void) res.start = 0xfd800000; res.end = res.start + 0x1000; } - dma_status = __ioremap(res.start, resource_size(&res), 0); + dma_status = ioremap_cache(res.start, resource_size(&res)); pci_dev_put(iob_pdev); for (i = 0; i < MAX_TXCH; i++) diff --git a/arch/powerpc/platforms/powermac/Makefile b/arch/powerpc/platforms/powermac/Makefile @@ -1,9 +1,10 @@ # SPDX-License-Identifier: GPL-2.0 CFLAGS_bootx_init.o += -fPIC +CFLAGS_bootx_init.o += $(call cc-option, -fno-stack-protector) ifdef CONFIG_FUNCTION_TRACER # Do not trace early boot code -CFLAGS_REMOVE_bootx_init.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_bootx_init.o = $(CC_FLAGS_FTRACE) endif obj-y += pic.o setup.o time.o feature.o pci.o \ diff --git a/arch/powerpc/platforms/powermac/time.c b/arch/powerpc/platforms/powermac/time.c @@ -45,13 +45,6 @@ #endif /* - * Offset between Unix time (1970-based) and Mac time (1904-based). Cuda and PMU - * times wrap in 2040. If we need to handle later times, the read_time functions - * need to be changed to interpret wrapped times as post-2040. - */ -#define RTC_OFFSET 2082844800 - -/* * Calibrate the decrementer frequency with the VIA timer 1. */ #define VIA_TIMER_FREQ_6 4700000 /* time 1 frequency * 6 */ @@ -90,98 +83,6 @@ long __init pmac_time_init(void) return delta; } -#ifdef CONFIG_ADB_CUDA -static time64_t cuda_get_time(void) -{ - struct adb_request req; - time64_t now; - - if (cuda_request(&req, NULL, 2, CUDA_PACKET, CUDA_GET_TIME) < 0) - return 0; - while (!req.complete) - cuda_poll(); - if (req.reply_len != 7) - printk(KERN_ERR "cuda_get_time: got %d byte reply\n", - req.reply_len); - now = (u32)((req.reply[3] << 24) + (req.reply[4] << 16) + - (req.reply[5] << 8) + req.reply[6]); - /* it's either after year 2040, or the RTC has gone backwards */ - WARN_ON(now < RTC_OFFSET); - - return now - RTC_OFFSET; -} - -#define cuda_get_rtc_time(tm) rtc_time64_to_tm(cuda_get_time(), (tm)) - -static int cuda_set_rtc_time(struct rtc_time *tm) -{ - u32 nowtime; - struct adb_request req; - - nowtime = lower_32_bits(rtc_tm_to_time64(tm) + RTC_OFFSET); - if (cuda_request(&req, NULL, 6, CUDA_PACKET, CUDA_SET_TIME, - nowtime >> 24, nowtime >> 16, nowtime >> 8, - nowtime) < 0) - return -ENXIO; - while (!req.complete) - cuda_poll(); - if ((req.reply_len != 3) && (req.reply_len != 7)) - printk(KERN_ERR "cuda_set_rtc_time: got %d byte reply\n", - req.reply_len); - return 0; -} - -#else -#define cuda_get_time() 0 -#define cuda_get_rtc_time(tm) -#define cuda_set_rtc_time(tm) 0 -#endif - -#ifdef CONFIG_ADB_PMU -static time64_t pmu_get_time(void) -{ - struct adb_request req; - time64_t now; - - if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0) - return 0; - pmu_wait_complete(&req); - if (req.reply_len != 4) - printk(KERN_ERR "pmu_get_time: got %d byte reply from PMU\n", - req.reply_len); - now = (u32)((req.reply[0] << 24) + (req.reply[1] << 16) + - (req.reply[2] << 8) + req.reply[3]); - - /* it's either after year 2040, or the RTC has gone backwards */ - WARN_ON(now < RTC_OFFSET); - - return now - RTC_OFFSET; -} - -#define pmu_get_rtc_time(tm) rtc_time64_to_tm(pmu_get_time(), (tm)) - -static int pmu_set_rtc_time(struct rtc_time *tm) -{ - u32 nowtime; - struct adb_request req; - - nowtime = lower_32_bits(rtc_tm_to_time64(tm) + RTC_OFFSET); - if (pmu_request(&req, NULL, 5, PMU_SET_RTC, nowtime >> 24, - nowtime >> 16, nowtime >> 8, nowtime) < 0) - return -ENXIO; - pmu_wait_complete(&req); - if (req.reply_len != 0) - printk(KERN_ERR "pmu_set_rtc_time: %d byte reply from PMU\n", - req.reply_len); - return 0; -} - -#else -#define pmu_get_time() 0 -#define pmu_get_rtc_time(tm) -#define pmu_set_rtc_time(tm) 0 -#endif - #ifdef CONFIG_PMAC_SMU static time64_t smu_get_time(void) { @@ -191,11 +92,6 @@ static time64_t smu_get_time(void) return 0; return rtc_tm_to_time64(&tm); } - -#else -#define smu_get_time() 0 -#define smu_get_rtc_time(tm, spin) -#define smu_set_rtc_time(tm, spin) 0 #endif /* Can't be __init, it's called when suspending and resuming */ @@ -203,12 +99,18 @@ time64_t pmac_get_boot_time(void) { /* Get the time from the RTC, used only at boot time */ switch (sys_ctrler) { +#ifdef CONFIG_ADB_CUDA case SYS_CTRLER_CUDA: return cuda_get_time(); +#endif +#ifdef CONFIG_ADB_PMU case SYS_CTRLER_PMU: return pmu_get_time(); +#endif +#ifdef CONFIG_PMAC_SMU case SYS_CTRLER_SMU: return smu_get_time(); +#endif default: return 0; } @@ -218,15 +120,21 @@ void pmac_get_rtc_time(struct rtc_time *tm) { /* Get the time from the RTC, used only at boot time */ switch (sys_ctrler) { +#ifdef CONFIG_ADB_CUDA case SYS_CTRLER_CUDA: - cuda_get_rtc_time(tm); + rtc_time64_to_tm(cuda_get_time(), tm); break; +#endif +#ifdef CONFIG_ADB_PMU case SYS_CTRLER_PMU: - pmu_get_rtc_time(tm); + rtc_time64_to_tm(pmu_get_time(), tm); break; +#endif +#ifdef CONFIG_PMAC_SMU case SYS_CTRLER_SMU: smu_get_rtc_time(tm, 1); break; +#endif default: ; } @@ -235,12 +143,18 @@ void pmac_get_rtc_time(struct rtc_time *tm) int pmac_set_rtc_time(struct rtc_time *tm) { switch (sys_ctrler) { +#ifdef CONFIG_ADB_CUDA case SYS_CTRLER_CUDA: return cuda_set_rtc_time(tm); +#endif +#ifdef CONFIG_ADB_PMU case SYS_CTRLER_PMU: return pmu_set_rtc_time(tm); +#endif +#ifdef CONFIG_PMAC_SMU case SYS_CTRLER_SMU: return smu_set_rtc_time(tm, 1); +#endif default: return -ENODEV; } diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig @@ -15,11 +15,6 @@ config PPC_POWERNV select PPC_SCOM select ARCH_RANDOM select CPU_FREQ - select CPU_FREQ_GOV_PERFORMANCE - select CPU_FREQ_GOV_POWERSAVE - select CPU_FREQ_GOV_USERSPACE - select CPU_FREQ_GOV_ONDEMAND - select CPU_FREQ_GOV_CONSERVATIVE select PPC_DOORBELL select MMU_NOTIFIER select FORCE_SMP @@ -35,7 +30,6 @@ config OPAL_PRD config PPC_MEMTRACE bool "Enable removal of RAM from kernel mappings for tracing" depends on PPC_POWERNV && MEMORY_HOTREMOVE - default n help Enabling this option allows for the removal of memory (RAM) from the kernel mappings to be used for hardware tracing. diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c @@ -223,14 +223,6 @@ int pnv_eeh_post_init(void) eeh_probe_devices(); eeh_addr_cache_build(); - if (eeh_has_flag(EEH_POSTPONED_PROBE)) { - eeh_clear_flag(EEH_POSTPONED_PROBE); - if (eeh_enabled()) - pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n"); - else - pr_info("EEH: No capable adapters found\n"); - } - /* Register OPAL event notifier */ eeh_event_irq = opal_event_request(ilog2(OPAL_EVENT_PCI_ERROR)); if (eeh_event_irq < 0) { @@ -391,12 +383,6 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data) if ((pdn->class_code >> 8) == PCI_CLASS_BRIDGE_ISA) return NULL; - /* Skip if we haven't probed yet */ - if (phb->ioda.pe_rmap[config_addr] == IODA_INVALID_PE) { - eeh_add_flag(EEH_POSTPONED_PROBE); - return NULL; - } - /* Initialize eeh device */ edev->class_code = pdn->class_code; edev->mode &= 0xFFFFFF00; @@ -604,7 +590,7 @@ static int pnv_eeh_get_phb_state(struct eeh_pe *pe) EEH_STATE_MMIO_ENABLED | EEH_STATE_DMA_ENABLED); } else if (!(pe->state & EEH_PE_ISOLATED)) { - eeh_pe_state_mark(pe, EEH_PE_ISOLATED); + eeh_pe_mark_isolated(pe); pnv_eeh_get_phb_diag(pe); if (eeh_has_flag(EEH_EARLY_DUMP_LOG)) @@ -706,7 +692,7 @@ static int pnv_eeh_get_pe_state(struct eeh_pe *pe) if (phb->freeze_pe) phb->freeze_pe(phb, pe->addr); - eeh_pe_state_mark(pe, EEH_PE_ISOLATED); + eeh_pe_mark_isolated(pe); pnv_eeh_get_phb_diag(pe); if (eeh_has_flag(EEH_EARLY_DUMP_LOG)) @@ -1054,7 +1040,7 @@ static int pnv_eeh_reset_vf_pe(struct eeh_pe *pe, int option) int ret; /* The VF PE should have only one child device */ - edev = list_first_entry_or_null(&pe->edevs, struct eeh_dev, list); + edev = list_first_entry_or_null(&pe->edevs, struct eeh_dev, entry); pdn = eeh_dev_to_pdn(edev); if (!pdn) return -ENXIO; @@ -1148,43 +1134,6 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option) } /** - * pnv_eeh_wait_state - Wait for PE state - * @pe: EEH PE - * @max_wait: maximal period in millisecond - * - * Wait for the state of associated PE. It might take some time - * to retrieve the PE's state. - */ -static int pnv_eeh_wait_state(struct eeh_pe *pe, int max_wait) -{ - int ret; - int mwait; - - while (1) { - ret = pnv_eeh_get_state(pe, &mwait); - - /* - * If the PE's state is temporarily unavailable, - * we have to wait for the specified time. Otherwise, - * the PE's state will be returned immediately. - */ - if (ret != EEH_STATE_UNAVAILABLE) - return ret; - - if (max_wait <= 0) { - pr_warn("%s: Timeout getting PE#%x's state (%d)\n", - __func__, pe->addr, max_wait); - return EEH_STATE_NOT_SUPPORT; - } - - max_wait -= mwait; - msleep(mwait); - } - - return EEH_STATE_NOT_SUPPORT; -} - -/** * pnv_eeh_get_log - Retrieve error log * @pe: EEH PE * @severity: temporary or permanent error log @@ -1611,7 +1560,7 @@ static int pnv_eeh_next_error(struct eeh_pe **pe) if ((ret == EEH_NEXT_ERR_FROZEN_PE || ret == EEH_NEXT_ERR_FENCED_PHB) && !((*pe)->state & EEH_PE_ISOLATED)) { - eeh_pe_state_mark(*pe, EEH_PE_ISOLATED); + eeh_pe_mark_isolated(*pe); pnv_eeh_get_phb_diag(*pe); if (eeh_has_flag(EEH_EARLY_DUMP_LOG)) @@ -1640,7 +1589,7 @@ static int pnv_eeh_next_error(struct eeh_pe **pe) } /* We possibly migrate to another PE */ - eeh_pe_state_mark(*pe, EEH_PE_ISOLATED); + eeh_pe_mark_isolated(*pe); } /* @@ -1702,7 +1651,6 @@ static struct eeh_ops pnv_eeh_ops = { .get_pe_addr = pnv_eeh_get_pe_addr, .get_state = pnv_eeh_get_state, .reset = pnv_eeh_reset, - .wait_state = pnv_eeh_wait_state, .get_log = pnv_eeh_get_log, .configure_bridge = pnv_eeh_configure_bridge, .err_inject = pnv_eeh_err_inject, diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c @@ -90,17 +90,15 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages) walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE, change_memblock_state); - lock_device_hotplug(); - remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT); - unlock_device_hotplug(); return true; } static u64 memtrace_alloc_node(u32 nid, u64 size) { - u64 start_pfn, end_pfn, nr_pages; + u64 start_pfn, end_pfn, nr_pages, pfn; u64 base_pfn; + u64 bytes = memory_block_size_bytes(); if (!node_spanned_pages(nid)) return 0; @@ -113,8 +111,21 @@ static u64 memtrace_alloc_node(u32 nid, u64 size) end_pfn = round_down(end_pfn - nr_pages, nr_pages); for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -= nr_pages) { - if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true) + if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true) { + /* + * Remove memory in memory block size chunks so that + * iomem resources are always split to the same size and + * we never try to remove memory that spans two iomem + * resources. + */ + lock_device_hotplug(); + end_pfn = base_pfn + nr_pages; + for (pfn = base_pfn; pfn < end_pfn; pfn += bytes>> PAGE_SHIFT) { + remove_memory(nid, pfn << PAGE_SHIFT, bytes); + } + unlock_device_hotplug(); return base_pfn << PAGE_SHIFT; + } } return 0; diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c @@ -17,7 +17,7 @@ #include <linux/pci.h> #include <linux/memblock.h> #include <linux/iommu.h> -#include <linux/debugfs.h> +#include <linux/sizes.h> #include <asm/debugfs.h> #include <asm/tlb.h> @@ -42,14 +42,6 @@ static DEFINE_SPINLOCK(npu_context_lock); /* - * When an address shootdown range exceeds this threshold we invalidate the - * entire TLB on the GPU for the given PID rather than each specific address in - * the range. - */ -static uint64_t atsd_threshold = 2 * 1024 * 1024; -static struct dentry *atsd_threshold_dentry; - -/* * Other types of TCE cache invalidation are not functional in the * hardware. */ @@ -454,79 +446,73 @@ static void put_mmio_atsd_reg(struct npu *npu, int reg) } /* MMIO ATSD register offsets */ -#define XTS_ATSD_AVA 1 -#define XTS_ATSD_STAT 2 - -static void mmio_launch_invalidate(struct mmio_atsd_reg *mmio_atsd_reg, - unsigned long launch, unsigned long va) -{ - struct npu *npu = mmio_atsd_reg->npu; - int reg = mmio_atsd_reg->reg; - - __raw_writeq_be(va, npu->mmio_atsd_regs[reg] + XTS_ATSD_AVA); - eieio(); - __raw_writeq_be(launch, npu->mmio_atsd_regs[reg]); -} +#define XTS_ATSD_LAUNCH 0 +#define XTS_ATSD_AVA 1 +#define XTS_ATSD_STAT 2 -static void mmio_invalidate_pid(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS], - unsigned long pid, bool flush) +static unsigned long get_atsd_launch_val(unsigned long pid, unsigned long psize) { - int i; - unsigned long launch; - - for (i = 0; i <= max_npu2_index; i++) { - if (mmio_atsd_reg[i].reg < 0) - continue; + unsigned long launch = 0; - /* IS set to invalidate matching PID */ - launch = PPC_BIT(12); - - /* PRS set to process-scoped */ - launch |= PPC_BIT(13); + if (psize == MMU_PAGE_COUNT) { + /* IS set to invalidate entire matching PID */ + launch |= PPC_BIT(12); + } else { + /* AP set to invalidate region of psize */ + launch |= (u64)mmu_get_ap(psize) << PPC_BITLSHIFT(17); + } - /* AP */ - launch |= (u64) - mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17); + /* PRS set to process-scoped */ + launch |= PPC_BIT(13); - /* PID */ - launch |= pid << PPC_BITLSHIFT(38); + /* PID */ + launch |= pid << PPC_BITLSHIFT(38); - /* No flush */ - launch |= !flush << PPC_BITLSHIFT(39); + /* Leave "No flush" (bit 39) 0 so every ATSD performs a flush */ - /* Invalidating the entire process doesn't use a va */ - mmio_launch_invalidate(&mmio_atsd_reg[i], launch, 0); - } + return launch; } -static void mmio_invalidate_va(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS], - unsigned long va, unsigned long pid, bool flush) +static void mmio_atsd_regs_write(struct mmio_atsd_reg + mmio_atsd_reg[NV_MAX_NPUS], unsigned long offset, + unsigned long val) { - int i; - unsigned long launch; + struct npu *npu; + int i, reg; for (i = 0; i <= max_npu2_index; i++) { - if (mmio_atsd_reg[i].reg < 0) + reg = mmio_atsd_reg[i].reg; + if (reg < 0) continue; - /* IS set to invalidate target VA */ - launch = 0; + npu = mmio_atsd_reg[i].npu; + __raw_writeq_be(val, npu->mmio_atsd_regs[reg] + offset); + } +} + +static void mmio_invalidate_pid(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS], + unsigned long pid) +{ + unsigned long launch = get_atsd_launch_val(pid, MMU_PAGE_COUNT); - /* PRS set to process scoped */ - launch |= PPC_BIT(13); + /* Invalidating the entire process doesn't use a va */ + mmio_atsd_regs_write(mmio_atsd_reg, XTS_ATSD_LAUNCH, launch); +} - /* AP */ - launch |= (u64) - mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17); +static void mmio_invalidate_range(struct mmio_atsd_reg + mmio_atsd_reg[NV_MAX_NPUS], unsigned long pid, + unsigned long start, unsigned long psize) +{ + unsigned long launch = get_atsd_launch_val(pid, psize); - /* PID */ - launch |= pid << PPC_BITLSHIFT(38); + /* Write all VAs first */ + mmio_atsd_regs_write(mmio_atsd_reg, XTS_ATSD_AVA, start); - /* No flush */ - launch |= !flush << PPC_BITLSHIFT(39); + /* Issue one barrier for all address writes */ + eieio(); - mmio_launch_invalidate(&mmio_atsd_reg[i], launch, va); - } + /* Launch */ + mmio_atsd_regs_write(mmio_atsd_reg, XTS_ATSD_LAUNCH, launch); } #define mn_to_npu_context(x) container_of(x, struct npu_context, mn) @@ -612,14 +598,36 @@ static void release_atsd_reg(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS]) } /* - * Invalidate either a single address or an entire PID depending on - * the value of va. + * Invalidate a virtual address range */ -static void mmio_invalidate(struct npu_context *npu_context, int va, - unsigned long address, bool flush) +static void mmio_invalidate(struct npu_context *npu_context, + unsigned long start, unsigned long size) { struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS]; unsigned long pid = npu_context->mm->context.id; + unsigned long atsd_start = 0; + unsigned long end = start + size - 1; + int atsd_psize = MMU_PAGE_COUNT; + + /* + * Convert the input range into one of the supported sizes. If the range + * doesn't fit, use the next larger supported size. Invalidation latency + * is high, so over-invalidation is preferred to issuing multiple + * invalidates. + * + * A 4K page size isn't supported by NPU/GPU ATS, so that case is + * ignored. + */ + if (size == SZ_64K) { + atsd_start = start; + atsd_psize = MMU_PAGE_64K; + } else if (ALIGN_DOWN(start, SZ_2M) == ALIGN_DOWN(end, SZ_2M)) { + atsd_start = ALIGN_DOWN(start, SZ_2M); + atsd_psize = MMU_PAGE_2M; + } else if (ALIGN_DOWN(start, SZ_1G) == ALIGN_DOWN(end, SZ_1G)) { + atsd_start = ALIGN_DOWN(start, SZ_1G); + atsd_psize = MMU_PAGE_1G; + } if (npu_context->nmmu_flush) /* @@ -634,23 +642,25 @@ static void mmio_invalidate(struct npu_context *npu_context, int va, * an invalidate. */ acquire_atsd_reg(npu_context, mmio_atsd_reg); - if (va) - mmio_invalidate_va(mmio_atsd_reg, address, pid, flush); + + if (atsd_psize == MMU_PAGE_COUNT) + mmio_invalidate_pid(mmio_atsd_reg, pid); else - mmio_invalidate_pid(mmio_atsd_reg, pid, flush); + mmio_invalidate_range(mmio_atsd_reg, pid, atsd_start, + atsd_psize); mmio_invalidate_wait(mmio_atsd_reg); - if (flush) { - /* - * The GPU requires two flush ATSDs to ensure all entries have - * been flushed. We use PID 0 as it will never be used for a - * process on the GPU. - */ - mmio_invalidate_pid(mmio_atsd_reg, 0, true); - mmio_invalidate_wait(mmio_atsd_reg); - mmio_invalidate_pid(mmio_atsd_reg, 0, true); - mmio_invalidate_wait(mmio_atsd_reg); - } + + /* + * The GPU requires two flush ATSDs to ensure all entries have been + * flushed. We use PID 0 as it will never be used for a process on the + * GPU. + */ + mmio_invalidate_pid(mmio_atsd_reg, 0); + mmio_invalidate_wait(mmio_atsd_reg); + mmio_invalidate_pid(mmio_atsd_reg, 0); + mmio_invalidate_wait(mmio_atsd_reg); + release_atsd_reg(mmio_atsd_reg); } @@ -667,7 +677,7 @@ static void pnv_npu2_mn_release(struct mmu_notifier *mn, * There should be no more translation requests for this PID, but we * need to ensure any entries for it are removed from the TLB. */ - mmio_invalidate(npu_context, 0, 0, true); + mmio_invalidate(npu_context, 0, ~0UL); } static void pnv_npu2_mn_change_pte(struct mmu_notifier *mn, @@ -676,8 +686,7 @@ static void pnv_npu2_mn_change_pte(struct mmu_notifier *mn, pte_t pte) { struct npu_context *npu_context = mn_to_npu_context(mn); - - mmio_invalidate(npu_context, 1, address, true); + mmio_invalidate(npu_context, address, PAGE_SIZE); } static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn, @@ -685,21 +694,7 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn, unsigned long start, unsigned long end) { struct npu_context *npu_context = mn_to_npu_context(mn); - unsigned long address; - - if (end - start > atsd_threshold) { - /* - * Just invalidate the entire PID if the address range is too - * large. - */ - mmio_invalidate(npu_context, 0, 0, true); - } else { - for (address = start; address < end; address += PAGE_SIZE) - mmio_invalidate(npu_context, 1, address, false); - - /* Do the flush only on the final addess == end */ - mmio_invalidate(npu_context, 1, address, true); - } + mmio_invalidate(npu_context, start, end - start); } static const struct mmu_notifier_ops nv_nmmu_notifier_ops = { @@ -962,11 +957,6 @@ int pnv_npu2_init(struct pnv_phb *phb) static int npu_index; uint64_t rc = 0; - if (!atsd_threshold_dentry) { - atsd_threshold_dentry = debugfs_create_x64("atsd_threshold", - 0600, powerpc_debugfs_root, &atsd_threshold); - } - phb->npu.nmmu_flush = of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush"); for_each_child_of_node(phb->hose->dn, dn) { diff --git a/arch/powerpc/platforms/powernv/opal-powercap.c b/arch/powerpc/platforms/powernv/opal-powercap.c @@ -199,7 +199,7 @@ void __init opal_powercap_init(void) } j = 0; - pcaps[i].pg.name = node->name; + pcaps[i].pg.name = kasprintf(GFP_KERNEL, "%pOFn", node); if (has_min) { powercap_add_attr(min, "powercap-min", &pcaps[i].pattrs[j]); @@ -237,6 +237,7 @@ out_pcaps_pattrs: while (--i >= 0) { kfree(pcaps[i].pattrs); kfree(pcaps[i].pg.attrs); + kfree(pcaps[i].pg.name); } kobject_put(powercap_kobj); out_pcaps: diff --git a/arch/powerpc/platforms/powernv/opal-sensor-groups.c b/arch/powerpc/platforms/powernv/opal-sensor-groups.c @@ -214,9 +214,9 @@ void __init opal_sensor_groups_init(void) } if (!of_property_read_u32(node, "ibm,chip-id", &chipid)) - sprintf(sgs[i].name, "%s%d", node->name, chipid); + sprintf(sgs[i].name, "%pOFn%d", node, chipid); else - sprintf(sgs[i].name, "%s", node->name); + sprintf(sgs[i].name, "%pOFn", node); sgs[i].sg.name = sgs[i].name; if (add_attr_group(ops, len, &sgs[i], sgid)) { diff --git a/arch/powerpc/platforms/powernv/opal-sysparam.c b/arch/powerpc/platforms/powernv/opal-sysparam.c @@ -194,7 +194,7 @@ void __init opal_sys_param_init(void) count = of_property_count_strings(sysparam, "param-name"); if (count < 0) { pr_err("SYSPARAM: No string found of property param-name in " - "the node %s\n", sysparam->name); + "the node %pOFn\n", sysparam); goto out_param_buf; } diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c @@ -535,7 +535,7 @@ static int opal_recover_mce(struct pt_regs *regs, return recovered; } -void pnv_platform_error_reboot(struct pt_regs *regs, const char *msg) +void __noreturn pnv_platform_error_reboot(struct pt_regs *regs, const char *msg) { panic_flush_kmsg_start(); diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c @@ -219,17 +219,41 @@ static void pnv_prepare_going_down(void) static void __noreturn pnv_restart(char *cmd) { - long rc = OPAL_BUSY; + long rc; pnv_prepare_going_down(); - while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) { - rc = opal_cec_reboot(); - if (rc == OPAL_BUSY_EVENT) - opal_poll_events(NULL); + do { + if (!cmd) + rc = opal_cec_reboot(); + else if (strcmp(cmd, "full") == 0) + rc = opal_cec_reboot2(OPAL_REBOOT_FULL_IPL, NULL); else + rc = OPAL_UNSUPPORTED; + + if (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) { + /* Opal is busy wait for some time and retry */ + opal_poll_events(NULL); mdelay(10); - } + + } else if (cmd && rc) { + /* Unknown error while issuing reboot */ + if (rc == OPAL_UNSUPPORTED) + pr_err("Unsupported '%s' reboot.\n", cmd); + else + pr_err("Unable to issue '%s' reboot. Err=%ld\n", + cmd, rc); + pr_info("Forcing a cec-reboot\n"); + cmd = NULL; + rc = OPAL_BUSY; + + } else if (rc != OPAL_SUCCESS) { + /* Unknown error while issuing cec-reboot */ + pr_err("Unable to reboot. Err=%ld\n", rc); + } + + } while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT); + for (;;) opal_poll_events(NULL); } @@ -437,6 +461,16 @@ static unsigned long pnv_get_proc_freq(unsigned int cpu) return ret_freq; } +static long pnv_machine_check_early(struct pt_regs *regs) +{ + long handled = 0; + + if (cur_cpu_spec && cur_cpu_spec->machine_check_early) + handled = cur_cpu_spec->machine_check_early(regs); + + return handled; +} + define_machine(powernv) { .name = "PowerNV", .probe = pnv_probe, @@ -448,6 +482,7 @@ define_machine(powernv) { .machine_shutdown = pnv_shutdown, .power_save = NULL, .calibrate_decr = generic_calibrate_decr, + .machine_check_early = pnv_machine_check_early, #ifdef CONFIG_KEXEC_CORE .kexec_cpu_down = pnv_kexec_cpu_down, #endif diff --git a/arch/powerpc/platforms/ps3/Kconfig b/arch/powerpc/platforms/ps3/Kconfig @@ -49,7 +49,6 @@ config PS3_HTAB_SIZE config PS3_DYNAMIC_DMA depends on PPC_PS3 bool "PS3 Platform dynamic DMA page table management" - default n help This option will enable kernel support to take advantage of the per device dynamic DMA page table management provided by the Cell @@ -89,7 +88,6 @@ config PS3_SYS_MANAGER config PS3_REPOSITORY_WRITE bool "PS3 Repository write support" if PS3_ADVANCED depends on PPC_PS3 - default n help Enables support for writing to the PS3 System Repository. diff --git a/arch/powerpc/platforms/ps3/os-area.c b/arch/powerpc/platforms/ps3/os-area.c @@ -664,7 +664,7 @@ static int update_flash_db(void) db_set_64(db, &os_area_db_id_rtc_diff, saved_params.rtc_diff); count = os_area_flash_write(db, sizeof(struct os_area_db), pos); - if (count < sizeof(struct os_area_db)) { + if (count < 0 || count < sizeof(struct os_area_db)) { pr_debug("%s: os_area_flash_write failed %zd\n", __func__, count); error = count < 0 ? count : -EIO; diff --git a/arch/powerpc/platforms/ps3/spu.c b/arch/powerpc/platforms/ps3/spu.c @@ -215,8 +215,7 @@ static int __init setup_areas(struct spu *spu) goto fail_ioremap; } - spu->local_store = (__force void *)ioremap_prot(spu->local_store_phys, - LS_SIZE, pgprot_val(pgprot_noncached_wc(__pgprot(0)))); + spu->local_store = (__force void *)ioremap_wc(spu->local_store_phys, LS_SIZE); if (!spu->local_store) { pr_debug("%s:%d: ioremap local_store failed\n", diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig @@ -28,7 +28,6 @@ config PPC_PSERIES config PPC_SPLPAR depends on PPC_PSERIES bool "Support for shared-processor logical partitions" - default n help Enabling this option will make the kernel run more efficiently on logically-partitioned pSeries systems which use shared @@ -99,7 +98,6 @@ config PPC_SMLPAR bool "Support for shared-memory logical partitions" depends on PPC_PSERIES select LPARCFG - default n help Select this option to enable shared memory partition support. With this option a system running in an LPAR can be given more @@ -140,3 +138,10 @@ config IBMEBUS bool "Support for GX bus based adapters" help Bus device driver for GX bus based adapters. + +config PAPR_SCM + depends on PPC_PSERIES && MEMORY_HOTPLUG + select LIBNVDIMM + tristate "Support for the PAPR Storage Class Memory interface" + help + Enable access to hypervisor provided storage class memory. diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile @@ -13,7 +13,7 @@ obj-$(CONFIG_KEXEC_CORE) += kexec.o obj-$(CONFIG_PSERIES_ENERGY) += pseries_energy.o obj-$(CONFIG_HOTPLUG_CPU) += hotplug-cpu.o -obj-$(CONFIG_MEMORY_HOTPLUG) += hotplug-memory.o +obj-$(CONFIG_MEMORY_HOTPLUG) += hotplug-memory.o pmem.o obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o obj-$(CONFIG_HVCS) += hvcserver.o @@ -24,6 +24,7 @@ obj-$(CONFIG_IO_EVENT_IRQ) += io_event_irq.o obj-$(CONFIG_LPARCFG) += lparcfg.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_IBMEBUS) += ibmebus.o +obj-$(CONFIG_PAPR_SCM) += papr_scm.o ifdef CONFIG_PPC_PSERIES obj-$(CONFIG_SUSPEND) += suspend.o diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c @@ -32,8 +32,6 @@ static struct workqueue_struct *pseries_hp_wq; struct pseries_hp_work { struct work_struct work; struct pseries_hp_errorlog *errlog; - struct completion *hp_completion; - int *rc; }; struct cc_workarea { @@ -329,7 +327,7 @@ int dlpar_release_drc(u32 drc_index) return 0; } -static int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog) +int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog) { int rc; @@ -357,6 +355,10 @@ static int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog) case PSERIES_HP_ELOG_RESOURCE_CPU: rc = dlpar_cpu(hp_elog); break; + case PSERIES_HP_ELOG_RESOURCE_PMEM: + rc = dlpar_hp_pmem(hp_elog); + break; + default: pr_warn_ratelimited("Invalid resource (%d) specified\n", hp_elog->resource); @@ -371,20 +373,13 @@ static void pseries_hp_work_fn(struct work_struct *work) struct pseries_hp_work *hp_work = container_of(work, struct pseries_hp_work, work); - if (hp_work->rc) - *(hp_work->rc) = handle_dlpar_errorlog(hp_work->errlog); - else - handle_dlpar_errorlog(hp_work->errlog); - - if (hp_work->hp_completion) - complete(hp_work->hp_completion); + handle_dlpar_errorlog(hp_work->errlog); kfree(hp_work->errlog); kfree((void *)work); } -void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog, - struct completion *hotplug_done, int *rc) +void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog) { struct pseries_hp_work *work; struct pseries_hp_errorlog *hp_errlog_copy; @@ -397,13 +392,9 @@ void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog, if (work) { INIT_WORK((struct work_struct *)work, pseries_hp_work_fn); work->errlog = hp_errlog_copy; - work->hp_completion = hotplug_done; - work->rc = rc; queue_work(pseries_hp_wq, (struct work_struct *)work); } else { - *rc = -ENOMEM; kfree(hp_errlog_copy); - complete(hotplug_done); } } @@ -521,18 +512,15 @@ static int dlpar_parse_id_type(char **cmd, struct pseries_hp_errorlog *hp_elog) static ssize_t dlpar_store(struct class *class, struct class_attribute *attr, const char *buf, size_t count) { - struct pseries_hp_errorlog *hp_elog; - struct completion hotplug_done; + struct pseries_hp_errorlog hp_elog; char *argbuf; char *args; int rc; args = argbuf = kstrdup(buf, GFP_KERNEL); - hp_elog = kzalloc(sizeof(*hp_elog), GFP_KERNEL); - if (!hp_elog || !argbuf) { + if (!argbuf) { pr_info("Could not allocate resources for DLPAR operation\n"); kfree(argbuf); - kfree(hp_elog); return -ENOMEM; } @@ -540,25 +528,22 @@ static ssize_t dlpar_store(struct class *class, struct class_attribute *attr, * Parse out the request from the user, this will be in the form: * <resource> <action> <id_type> <id> */ - rc = dlpar_parse_resource(&args, hp_elog); + rc = dlpar_parse_resource(&args, &hp_elog); if (rc) goto dlpar_store_out; - rc = dlpar_parse_action(&args, hp_elog); + rc = dlpar_parse_action(&args, &hp_elog); if (rc) goto dlpar_store_out; - rc = dlpar_parse_id_type(&args, hp_elog); + rc = dlpar_parse_id_type(&args, &hp_elog); if (rc) goto dlpar_store_out; - init_completion(&hotplug_done); - queue_hotplug_event(hp_elog, &hotplug_done, &rc); - wait_for_completion(&hotplug_done); + rc = handle_dlpar_errorlog(&hp_elog); dlpar_store_out: kfree(argbuf); - kfree(hp_elog); if (rc) pr_err("Could not handle DLPAR request \"%s\"\n", buf); diff --git a/arch/powerpc/platforms/pseries/dtl.c b/arch/powerpc/platforms/pseries/dtl.c @@ -149,7 +149,7 @@ static int dtl_start(struct dtl *dtl) /* Register our dtl buffer with the hypervisor. The HV expects the * buffer size to be passed in the second word of the buffer */ - ((u32 *)dtl->buf)[1] = DISPATCH_LOG_BYTES; + ((u32 *)dtl->buf)[1] = cpu_to_be32(DISPATCH_LOG_BYTES); hwcpu = get_hard_smp_processor_id(dtl->cpu); addr = __pa(dtl->buf); @@ -184,7 +184,7 @@ static void dtl_stop(struct dtl *dtl) static u64 dtl_current_index(struct dtl *dtl) { - return lppaca_of(dtl->cpu).dtl_idx; + return be64_to_cpu(lppaca_of(dtl->cpu).dtl_idx); } #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */ diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c @@ -438,7 +438,7 @@ static int pseries_eeh_get_pe_addr(struct eeh_pe *pe) /** * pseries_eeh_get_state - Retrieve PE state * @pe: EEH PE - * @state: return value + * @delay: suggested time to wait if state is unavailable * * Retrieve the state of the specified PE. On RTAS compliant * pseries platform, there already has one dedicated RTAS function @@ -448,7 +448,7 @@ static int pseries_eeh_get_pe_addr(struct eeh_pe *pe) * RTAS calls for the purpose, we need to try the new one and back * to the old one if the new one couldn't work properly. */ -static int pseries_eeh_get_state(struct eeh_pe *pe, int *state) +static int pseries_eeh_get_state(struct eeh_pe *pe, int *delay) { int config_addr; int ret; @@ -499,7 +499,8 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state) break; case 5: if (rets[2]) { - if (state) *state = rets[2]; + if (delay) + *delay = rets[2]; result = EEH_STATE_UNAVAILABLE; } else { result = EEH_STATE_NOT_SUPPORT; @@ -554,64 +555,6 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option) } /** - * pseries_eeh_wait_state - Wait for PE state - * @pe: EEH PE - * @max_wait: maximal period in millisecond - * - * Wait for the state of associated PE. It might take some time - * to retrieve the PE's state. - */ -static int pseries_eeh_wait_state(struct eeh_pe *pe, int max_wait) -{ - int ret; - int mwait; - - /* - * According to PAPR, the state of PE might be temporarily - * unavailable. Under the circumstance, we have to wait - * for indicated time determined by firmware. The maximal - * wait time is 5 minutes, which is acquired from the original - * EEH implementation. Also, the original implementation - * also defined the minimal wait time as 1 second. - */ -#define EEH_STATE_MIN_WAIT_TIME (1000) -#define EEH_STATE_MAX_WAIT_TIME (300 * 1000) - - while (1) { - ret = pseries_eeh_get_state(pe, &mwait); - - /* - * If the PE's state is temporarily unavailable, - * we have to wait for the specified time. Otherwise, - * the PE's state will be returned immediately. - */ - if (ret != EEH_STATE_UNAVAILABLE) - return ret; - - if (max_wait <= 0) { - pr_warn("%s: Timeout when getting PE's state (%d)\n", - __func__, max_wait); - return EEH_STATE_NOT_SUPPORT; - } - - if (mwait <= 0) { - pr_warn("%s: Firmware returned bad wait value %d\n", - __func__, mwait); - mwait = EEH_STATE_MIN_WAIT_TIME; - } else if (mwait > EEH_STATE_MAX_WAIT_TIME) { - pr_warn("%s: Firmware returned too long wait value %d\n", - __func__, mwait); - mwait = EEH_STATE_MAX_WAIT_TIME; - } - - max_wait -= mwait; - msleep(mwait); - } - - return EEH_STATE_NOT_SUPPORT; -} - -/** * pseries_eeh_get_log - Retrieve error log * @pe: EEH PE * @severity: temporary or permanent error log @@ -849,7 +792,6 @@ static struct eeh_ops pseries_eeh_ops = { .get_pe_addr = pseries_eeh_get_pe_addr, .get_state = pseries_eeh_get_state, .reset = pseries_eeh_reset, - .wait_state = pseries_eeh_wait_state, .get_log = pseries_eeh_get_log, .configure_bridge = pseries_eeh_configure_bridge, .err_inject = NULL, diff --git a/arch/powerpc/platforms/pseries/event_sources.c b/arch/powerpc/platforms/pseries/event_sources.c @@ -16,7 +16,8 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -#include <asm/prom.h> +#include <linux/interrupt.h> +#include <linux/of_irq.h> #include "pseries.h" @@ -24,34 +25,19 @@ void request_event_sources_irqs(struct device_node *np, irq_handler_t handler, const char *name) { - int i, index, count = 0; - struct of_phandle_args oirq; - unsigned int virqs[16]; + int i, virq, rc; - /* First try to do a proper OF tree parsing */ - for (index = 0; of_irq_parse_one(np, index, &oirq) == 0; - index++) { - if (count > 15) - break; - virqs[count] = irq_create_of_mapping(&oirq); - if (!virqs[count]) { - pr_err("event-sources: Unable to allocate " - "interrupt number for %pOF\n", - np); - WARN_ON(1); - } else { - count++; - } - } + for (i = 0; i < 16; i++) { + virq = of_irq_get(np, i); + if (virq < 0) + return; + if (WARN(!virq, "event-sources: Unable to allocate " + "interrupt number for %pOF\n", np)) + continue; - /* Now request them */ - for (i = 0; i < count; i++) { - if (request_irq(virqs[i], handler, 0, name, NULL)) { - pr_err("event-sources: Unable to request interrupt " - "%d for %pOF\n", virqs[i], np); - WARN_ON(1); + rc = request_irq(virq, handler, 0, name, NULL); + if (WARN(rc, "event-sources: Unable to request interrupt %d for %pOF\n", + virq, np)) return; - } } } - diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c @@ -65,6 +65,8 @@ hypertas_fw_features_table[] = { {FW_FEATURE_SET_MODE, "hcall-set-mode"}, {FW_FEATURE_BEST_ENERGY, "hcall-best-energy-1*"}, {FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"}, + {FW_FEATURE_BLOCK_REMOVE, "hcall-block-remove"}, + {FW_FEATURE_PAPR_SCM, "hcall-scm"}, }; /* Build up the firmware features bitmask using the contents of diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c @@ -287,7 +287,7 @@ static int pseries_add_processor(struct device_node *np) if (cpumask_empty(tmp)) { printk(KERN_ERR "Unable to find space in cpu_present_mask for" - " processor %s with %d thread(s)\n", np->name, + " processor %pOFn with %d thread(s)\n", np, nthreads); goto out_unlock; } @@ -481,8 +481,8 @@ static ssize_t dlpar_cpu_add(u32 drc_index) if (rc) { saved_rc = rc; - pr_warn("Failed to attach node %s, rc: %d, drc index: %x\n", - dn->name, rc, drc_index); + pr_warn("Failed to attach node %pOFn, rc: %d, drc index: %x\n", + dn, rc, drc_index); rc = dlpar_release_drc(drc_index); if (!rc) @@ -494,8 +494,8 @@ static ssize_t dlpar_cpu_add(u32 drc_index) rc = dlpar_online_cpu(dn); if (rc) { saved_rc = rc; - pr_warn("Failed to online cpu %s, rc: %d, drc index: %x\n", - dn->name, rc, drc_index); + pr_warn("Failed to online cpu %pOFn, rc: %d, drc index: %x\n", + dn, rc, drc_index); rc = dlpar_detach_node(dn); if (!rc) @@ -504,7 +504,7 @@ static ssize_t dlpar_cpu_add(u32 drc_index) return saved_rc; } - pr_debug("Successfully added CPU %s, drc index: %x\n", dn->name, + pr_debug("Successfully added CPU %pOFn, drc index: %x\n", dn, drc_index); return rc; } @@ -570,19 +570,19 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index) { int rc; - pr_debug("Attempting to remove CPU %s, drc index: %x\n", - dn->name, drc_index); + pr_debug("Attempting to remove CPU %pOFn, drc index: %x\n", + dn, drc_index); rc = dlpar_offline_cpu(dn); if (rc) { - pr_warn("Failed to offline CPU %s, rc: %d\n", dn->name, rc); + pr_warn("Failed to offline CPU %pOFn, rc: %d\n", dn, rc); return -EINVAL; } rc = dlpar_release_drc(drc_index); if (rc) { - pr_warn("Failed to release drc (%x) for CPU %s, rc: %d\n", - drc_index, dn->name, rc); + pr_warn("Failed to release drc (%x) for CPU %pOFn, rc: %d\n", + drc_index, dn, rc); dlpar_online_cpu(dn); return rc; } @@ -591,7 +591,7 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index) if (rc) { int saved_rc = rc; - pr_warn("Failed to detach CPU %s, rc: %d", dn->name, rc); + pr_warn("Failed to detach CPU %pOFn, rc: %d", dn, rc); rc = dlpar_acquire_drc(drc_index); if (!rc) @@ -662,8 +662,8 @@ static int find_dlpar_cpus_to_remove(u32 *cpu_drcs, int cpus_to_remove) rc = of_property_read_u32(dn, "ibm,my-drc-index", &cpu_drcs[cpus_found - 1]); if (rc) { - pr_warn("Error occurred getting drc-index for %s\n", - dn->name); + pr_warn("Error occurred getting drc-index for %pOFn\n", + dn); of_node_put(dn); return -1; } diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -101,11 +101,12 @@ static struct property *dlpar_clone_property(struct property *prop, return new_prop; } -static u32 find_aa_index(struct device_node *dr_node, - struct property *ala_prop, const u32 *lmb_assoc) +static bool find_aa_index(struct device_node *dr_node, + struct property *ala_prop, + const u32 *lmb_assoc, u32 *aa_index) { - u32 *assoc_arrays; - u32 aa_index; + u32 *assoc_arrays, new_prop_size; + struct property *new_prop; int aa_arrays, aa_array_entries, aa_array_sz; int i, index; @@ -121,54 +122,48 @@ static u32 find_aa_index(struct device_node *dr_node, aa_array_entries = be32_to_cpu(assoc_arrays[1]); aa_array_sz = aa_array_entries * sizeof(u32); - aa_index = -1; for (i = 0; i < aa_arrays; i++) { index = (i * aa_array_entries) + 2; if (memcmp(&assoc_arrays[index], &lmb_assoc[1], aa_array_sz)) continue; - aa_index = i; - break; + *aa_index = i; + return true; } - if (aa_index == -1) { - struct property *new_prop; - u32 new_prop_size; - - new_prop_size = ala_prop->length + aa_array_sz; - new_prop = dlpar_clone_property(ala_prop, new_prop_size); - if (!new_prop) - return -1; - - assoc_arrays = new_prop->value; + new_prop_size = ala_prop->length + aa_array_sz; + new_prop = dlpar_clone_property(ala_prop, new_prop_size); + if (!new_prop) + return false; - /* increment the number of entries in the lookup array */ - assoc_arrays[0] = cpu_to_be32(aa_arrays + 1); + assoc_arrays = new_prop->value; - /* copy the new associativity into the lookup array */ - index = aa_arrays * aa_array_entries + 2; - memcpy(&assoc_arrays[index], &lmb_assoc[1], aa_array_sz); + /* increment the number of entries in the lookup array */ + assoc_arrays[0] = cpu_to_be32(aa_arrays + 1); - of_update_property(dr_node, new_prop); + /* copy the new associativity into the lookup array */ + index = aa_arrays * aa_array_entries + 2; + memcpy(&assoc_arrays[index], &lmb_assoc[1], aa_array_sz); - /* - * The associativity lookup array index for this lmb is - * number of entries - 1 since we added its associativity - * to the end of the lookup array. - */ - aa_index = be32_to_cpu(assoc_arrays[0]) - 1; - } + of_update_property(dr_node, new_prop); - return aa_index; + /* + * The associativity lookup array index for this lmb is + * number of entries - 1 since we added its associativity + * to the end of the lookup array. + */ + *aa_index = be32_to_cpu(assoc_arrays[0]) - 1; + return true; } -static u32 lookup_lmb_associativity_index(struct drmem_lmb *lmb) +static int update_lmb_associativity_index(struct drmem_lmb *lmb) { struct device_node *parent, *lmb_node, *dr_node; struct property *ala_prop; const u32 *lmb_assoc; u32 aa_index; + bool found; parent = of_find_node_by_path("/"); if (!parent) @@ -200,46 +195,17 @@ static u32 lookup_lmb_associativity_index(struct drmem_lmb *lmb) return -ENODEV; } - aa_index = find_aa_index(dr_node, ala_prop, lmb_assoc); + found = find_aa_index(dr_node, ala_prop, lmb_assoc, &aa_index); dlpar_free_cc_nodes(lmb_node); - return aa_index; -} -static int dlpar_add_device_tree_lmb(struct drmem_lmb *lmb) -{ - int rc, aa_index; - - lmb->flags |= DRCONF_MEM_ASSIGNED; - - aa_index = lookup_lmb_associativity_index(lmb); - if (aa_index < 0) { - pr_err("Couldn't find associativity index for drc index %x\n", - lmb->drc_index); - return aa_index; + if (!found) { + pr_err("Could not find LMB associativity\n"); + return -1; } lmb->aa_index = aa_index; - - rtas_hp_event = true; - rc = drmem_update_dt(); - rtas_hp_event = false; - - return rc; -} - -static int dlpar_remove_device_tree_lmb(struct drmem_lmb *lmb) -{ - int rc; - - lmb->flags &= ~DRCONF_MEM_ASSIGNED; - lmb->aa_index = 0xffffffff; - - rtas_hp_event = true; - rc = drmem_update_dt(); - rtas_hp_event = false; - - return rc; + return 0; } static struct memory_block *lmb_to_memblock(struct drmem_lmb *lmb) @@ -428,7 +394,9 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb) /* Update memory regions for memory remove */ memblock_remove(lmb->base_addr, block_sz); - dlpar_remove_device_tree_lmb(lmb); + invalidate_lmb_associativity_index(lmb); + lmb->flags &= ~DRCONF_MEM_ASSIGNED; + return 0; } @@ -688,10 +656,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb) if (lmb->flags & DRCONF_MEM_ASSIGNED) return -EINVAL; - rc = dlpar_add_device_tree_lmb(lmb); + rc = update_lmb_associativity_index(lmb); if (rc) { - pr_err("Couldn't update device tree for drc index %x\n", - lmb->drc_index); dlpar_release_drc(lmb->drc_index); return rc; } @@ -704,14 +670,14 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb) /* Add the memory */ rc = add_memory(nid, lmb->base_addr, block_sz); if (rc) { - dlpar_remove_device_tree_lmb(lmb); + invalidate_lmb_associativity_index(lmb); return rc; } rc = dlpar_online_lmb(lmb); if (rc) { remove_memory(nid, lmb->base_addr, block_sz); - dlpar_remove_device_tree_lmb(lmb); + invalidate_lmb_associativity_index(lmb); } else { lmb->flags |= DRCONF_MEM_ASSIGNED; } @@ -958,6 +924,12 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog) break; } + if (!rc) { + rtas_hp_event = true; + rc = drmem_update_dt(); + rtas_hp_event = false; + } + unlock_device_hotplug(); return rc; } diff --git a/arch/powerpc/platforms/pseries/ibmebus.c b/arch/powerpc/platforms/pseries/ibmebus.c @@ -404,7 +404,7 @@ static ssize_t name_show(struct device *dev, struct platform_device *ofdev; ofdev = to_platform_device(dev); - return sprintf(buf, "%s\n", ofdev->dev.of_node->name); + return sprintf(buf, "%pOFn\n", ofdev->dev.of_node); } static DEVICE_ATTR_RO(name); diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c @@ -48,6 +48,7 @@ #include <asm/kexec.h> #include <asm/fadump.h> #include <asm/asm-prototypes.h> +#include <asm/debugfs.h> #include "pseries.h" @@ -417,6 +418,79 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn, BUG_ON(lpar_rc != H_SUCCESS); } + +/* + * As defined in the PAPR's section 14.5.4.1.8 + * The control mask doesn't include the returned reference and change bit from + * the processed PTE. + */ +#define HBLKR_AVPN 0x0100000000000000UL +#define HBLKR_CTRL_MASK 0xf800000000000000UL +#define HBLKR_CTRL_SUCCESS 0x8000000000000000UL +#define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL +#define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL + +/** + * H_BLOCK_REMOVE caller. + * @idx should point to the latest @param entry set with a PTEX. + * If PTE cannot be processed because another CPUs has already locked that + * group, those entries are put back in @param starting at index 1. + * If entries has to be retried and @retry_busy is set to true, these entries + * are retried until success. If @retry_busy is set to false, the returned + * is the number of entries yet to process. + */ +static unsigned long call_block_remove(unsigned long idx, unsigned long *param, + bool retry_busy) +{ + unsigned long i, rc, new_idx; + unsigned long retbuf[PLPAR_HCALL9_BUFSIZE]; + + if (idx < 2) { + pr_warn("Unexpected empty call to H_BLOCK_REMOVE"); + return 0; + } +again: + new_idx = 0; + if (idx > PLPAR_HCALL9_BUFSIZE) { + pr_err("Too many PTEs (%lu) for H_BLOCK_REMOVE", idx); + idx = PLPAR_HCALL9_BUFSIZE; + } else if (idx < PLPAR_HCALL9_BUFSIZE) + param[idx] = HBR_END; + + rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf, + param[0], /* AVA */ + param[1], param[2], param[3], param[4], /* TS0-7 */ + param[5], param[6], param[7], param[8]); + if (rc == H_SUCCESS) + return 0; + + BUG_ON(rc != H_PARTIAL); + + /* Check that the unprocessed entries were 'not found' or 'busy' */ + for (i = 0; i < idx-1; i++) { + unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK; + + if (ctrl == HBLKR_CTRL_ERRBUSY) { + param[++new_idx] = param[i+1]; + continue; + } + + BUG_ON(ctrl != HBLKR_CTRL_SUCCESS + && ctrl != HBLKR_CTRL_ERRNOTFOUND); + } + + /* + * If there were entries found busy, retry these entries if requested, + * of if all the entries have to be retried. + */ + if (new_idx && (retry_busy || new_idx == (PLPAR_HCALL9_BUFSIZE-1))) { + idx = new_idx + 1; + goto again; + } + + return new_idx; +} + #ifdef CONFIG_TRANSPARENT_HUGEPAGE /* * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need @@ -424,17 +498,57 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn, */ #define PPC64_HUGE_HPTE_BATCH 12 -static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot, - unsigned long *vpn, int count, - int psize, int ssize) +static void hugepage_block_invalidate(unsigned long *slot, unsigned long *vpn, + int count, int psize, int ssize) { unsigned long param[PLPAR_HCALL9_BUFSIZE]; - int i = 0, pix = 0, rc; - unsigned long flags = 0; - int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE); + unsigned long shift, current_vpgb, vpgb; + int i, pix = 0; - if (lock_tlbie) - spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags); + shift = mmu_psize_defs[psize].shift; + + for (i = 0; i < count; i++) { + /* + * Shifting 3 bits more on the right to get a + * 8 pages aligned virtual addresse. + */ + vpgb = (vpn[i] >> (shift - VPN_SHIFT + 3)); + if (!pix || vpgb != current_vpgb) { + /* + * Need to start a new 8 pages block, flush + * the current one if needed. + */ + if (pix) + (void)call_block_remove(pix, param, true); + current_vpgb = vpgb; + param[0] = hpte_encode_avpn(vpn[i], psize, ssize); + pix = 1; + } + + param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot[i]; + if (pix == PLPAR_HCALL9_BUFSIZE) { + pix = call_block_remove(pix, param, false); + /* + * pix = 0 means that all the entries were + * removed, we can start a new block. + * Otherwise, this means that there are entries + * to retry, and pix points to latest one, so + * we should increment it and try to continue + * the same block. + */ + if (pix) + pix++; + } + } + if (pix) + (void)call_block_remove(pix, param, true); +} + +static void hugepage_bulk_invalidate(unsigned long *slot, unsigned long *vpn, + int count, int psize, int ssize) +{ + unsigned long param[PLPAR_HCALL9_BUFSIZE]; + int i = 0, pix = 0, rc; for (i = 0; i < count; i++) { @@ -462,6 +576,23 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot, param[6], param[7]); BUG_ON(rc != H_SUCCESS); } +} + +static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot, + unsigned long *vpn, + int count, int psize, + int ssize) +{ + unsigned long flags = 0; + int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE); + + if (lock_tlbie) + spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags); + + if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) + hugepage_block_invalidate(slot, vpn, count, psize, ssize); + else + hugepage_bulk_invalidate(slot, vpn, count, psize, ssize); if (lock_tlbie) spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags); @@ -546,6 +677,86 @@ static int pSeries_lpar_hpte_removebolted(unsigned long ea, return 0; } + +static inline unsigned long compute_slot(real_pte_t pte, + unsigned long vpn, + unsigned long index, + unsigned long shift, + int ssize) +{ + unsigned long slot, hash, hidx; + + hash = hpt_hash(vpn, shift, ssize); + hidx = __rpte_to_hidx(pte, index); + if (hidx & _PTEIDX_SECONDARY) + hash = ~hash; + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; + slot += hidx & _PTEIDX_GROUP_IX; + return slot; +} + +/** + * The hcall H_BLOCK_REMOVE implies that the virtual pages to processed are + * "all within the same naturally aligned 8 page virtual address block". + */ +static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch, + unsigned long *param) +{ + unsigned long vpn; + unsigned long i, pix = 0; + unsigned long index, shift, slot, current_vpgb, vpgb; + real_pte_t pte; + int psize, ssize; + + psize = batch->psize; + ssize = batch->ssize; + + for (i = 0; i < number; i++) { + vpn = batch->vpn[i]; + pte = batch->pte[i]; + pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) { + /* + * Shifting 3 bits more on the right to get a + * 8 pages aligned virtual addresse. + */ + vpgb = (vpn >> (shift - VPN_SHIFT + 3)); + if (!pix || vpgb != current_vpgb) { + /* + * Need to start a new 8 pages block, flush + * the current one if needed. + */ + if (pix) + (void)call_block_remove(pix, param, + true); + current_vpgb = vpgb; + param[0] = hpte_encode_avpn(vpn, psize, + ssize); + pix = 1; + } + + slot = compute_slot(pte, vpn, index, shift, ssize); + param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot; + + if (pix == PLPAR_HCALL9_BUFSIZE) { + pix = call_block_remove(pix, param, false); + /* + * pix = 0 means that all the entries were + * removed, we can start a new block. + * Otherwise, this means that there are entries + * to retry, and pix points to latest one, so + * we should increment it and try to continue + * the same block. + */ + if (pix) + pix++; + } + } pte_iterate_hashed_end(); + } + + if (pix) + (void)call_block_remove(pix, param, true); +} + /* * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie * lock. @@ -558,13 +769,18 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local) struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch); int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE); unsigned long param[PLPAR_HCALL9_BUFSIZE]; - unsigned long hash, index, shift, hidx, slot; + unsigned long index, shift, slot; real_pte_t pte; int psize, ssize; if (lock_tlbie) spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags); + if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) { + do_block_remove(number, batch, param); + goto out; + } + psize = batch->psize; ssize = batch->ssize; pix = 0; @@ -572,12 +788,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local) vpn = batch->vpn[i]; pte = batch->pte[i]; pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) { - hash = hpt_hash(vpn, shift, ssize); - hidx = __rpte_to_hidx(pte, index); - if (hidx & _PTEIDX_SECONDARY) - hash = ~hash; - slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; - slot += hidx & _PTEIDX_GROUP_IX; + slot = compute_slot(pte, vpn, index, shift, ssize); if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) { /* * lpar doesn't use the passed actual page size @@ -608,6 +819,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local) BUG_ON(rc != H_SUCCESS); } +out: if (lock_tlbie) spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags); } @@ -1028,3 +1240,56 @@ static int __init reserve_vrma_context_id(void) return 0; } machine_device_initcall(pseries, reserve_vrma_context_id); + +#ifdef CONFIG_DEBUG_FS +/* debugfs file interface for vpa data */ +static ssize_t vpa_file_read(struct file *filp, char __user *buf, size_t len, + loff_t *pos) +{ + int cpu = (long)filp->private_data; + struct lppaca *lppaca = &lppaca_of(cpu); + + return simple_read_from_buffer(buf, len, pos, lppaca, + sizeof(struct lppaca)); +} + +static const struct file_operations vpa_fops = { + .open = simple_open, + .read = vpa_file_read, + .llseek = default_llseek, +}; + +static int __init vpa_debugfs_init(void) +{ + char name[16]; + long i; + static struct dentry *vpa_dir; + + if (!firmware_has_feature(FW_FEATURE_SPLPAR)) + return 0; + + vpa_dir = debugfs_create_dir("vpa", powerpc_debugfs_root); + if (!vpa_dir) { + pr_warn("%s: can't create vpa root dir\n", __func__); + return -ENOMEM; + } + + /* set up the per-cpu vpa file*/ + for_each_possible_cpu(i) { + struct dentry *d; + + sprintf(name, "cpu-%ld", i); + + d = debugfs_create_file(name, 0400, vpa_dir, (void *)i, + &vpa_fops); + if (!d) { + pr_warn("%s: can't create per-cpu vpa file\n", + __func__); + return -ENOMEM; + } + } + + return 0; +} +machine_arch_initcall(pseries, vpa_debugfs_init); +#endif /* CONFIG_DEBUG_FS */ diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c @@ -585,8 +585,7 @@ static ssize_t update_mpp(u64 *entitlement, u8 *weight) static ssize_t lparcfg_write(struct file *file, const char __user * buf, size_t count, loff_t * off) { - int kbuf_sz = 64; - char kbuf[kbuf_sz]; + char kbuf[64]; char *tmp; u64 new_entitled, *new_entitled_ptr = &new_entitled; u8 new_weight, *new_weight_ptr = &new_weight; @@ -595,7 +594,7 @@ static ssize_t lparcfg_write(struct file *file, const char __user * buf, if (!firmware_has_feature(FW_FEATURE_SPLPAR)) return -EINVAL; - if (count > kbuf_sz) + if (count > sizeof(kbuf)) return -EINVAL; if (copy_from_user(kbuf, buf, count)) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c @@ -242,7 +242,7 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index) static void prrn_update_node(__be32 phandle) { - struct pseries_hp_errorlog *hp_elog; + struct pseries_hp_errorlog hp_elog; struct device_node *dn; /* @@ -255,18 +255,12 @@ static void prrn_update_node(__be32 phandle) return; } - hp_elog = kzalloc(sizeof(*hp_elog), GFP_KERNEL); - if(!hp_elog) - return; - - hp_elog->resource = PSERIES_HP_ELOG_RESOURCE_MEM; - hp_elog->action = PSERIES_HP_ELOG_ACTION_READD; - hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX; - hp_elog->_drc_u.drc_index = phandle; - - queue_hotplug_event(hp_elog, NULL, NULL); + hp_elog.resource = PSERIES_HP_ELOG_RESOURCE_MEM; + hp_elog.action = PSERIES_HP_ELOG_ACTION_READD; + hp_elog.id_type = PSERIES_HP_ELOG_ID_DRC_INDEX; + hp_elog._drc_u.drc_index = phandle; - kfree(hp_elog); + handle_dlpar_errorlog(&hp_elog); } int pseries_devicetree_update(s32 scope) @@ -366,6 +360,8 @@ static ssize_t migration_store(struct class *class, if (rc) return rc; + stop_topology_update(); + do { rc = rtas_ibm_suspend_me(streamid); if (rc == -EAGAIN) @@ -376,6 +372,9 @@ static ssize_t migration_store(struct class *class, return rc; post_mobility_fixup(); + + start_topology_update(); + return count; } diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c @@ -203,7 +203,8 @@ static struct device_node *find_pe_dn(struct pci_dev *dev, int *total) /* Get the top level device in the PE */ edev = pdn_to_eeh_dev(PCI_DN(dn)); if (edev->pe) - edev = list_first_entry(&edev->pe->edevs, struct eeh_dev, list); + edev = list_first_entry(&edev->pe->edevs, struct eeh_dev, + entry); dn = pci_device_to_OF_node(edev->pdev); if (!dn) return NULL; diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c @@ -0,0 +1,345 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define pr_fmt(fmt) "papr-scm: " fmt + +#include <linux/of.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/ioport.h> +#include <linux/slab.h> +#include <linux/ndctl.h> +#include <linux/sched.h> +#include <linux/libnvdimm.h> +#include <linux/platform_device.h> + +#include <asm/plpar_wrappers.h> + +#define BIND_ANY_ADDR (~0ul) + +#define PAPR_SCM_DIMM_CMD_MASK \ + ((1ul << ND_CMD_GET_CONFIG_SIZE) | \ + (1ul << ND_CMD_GET_CONFIG_DATA) | \ + (1ul << ND_CMD_SET_CONFIG_DATA)) + +struct papr_scm_priv { + struct platform_device *pdev; + struct device_node *dn; + uint32_t drc_index; + uint64_t blocks; + uint64_t block_size; + int metadata_size; + + uint64_t bound_addr; + + struct nvdimm_bus_descriptor bus_desc; + struct nvdimm_bus *bus; + struct nvdimm *nvdimm; + struct resource res; + struct nd_region *region; + struct nd_interleave_set nd_set; +}; + +static int drc_pmem_bind(struct papr_scm_priv *p) +{ + unsigned long ret[PLPAR_HCALL_BUFSIZE]; + uint64_t rc, token; + + /* + * When the hypervisor cannot map all the requested memory in a single + * hcall it returns H_BUSY and we call again with the token until + * we get H_SUCCESS. Aborting the retry loop before getting H_SUCCESS + * leave the system in an undefined state, so we wait. + */ + token = 0; + + do { + rc = plpar_hcall(H_SCM_BIND_MEM, ret, p->drc_index, 0, + p->blocks, BIND_ANY_ADDR, token); + token = be64_to_cpu(ret[0]); + cond_resched(); + } while (rc == H_BUSY); + + if (rc) { + dev_err(&p->pdev->dev, "bind err: %lld\n", rc); + return -ENXIO; + } + + p->bound_addr = be64_to_cpu(ret[1]); + + dev_dbg(&p->pdev->dev, "bound drc %x to %pR\n", p->drc_index, &p->res); + + return 0; +} + +static int drc_pmem_unbind(struct papr_scm_priv *p) +{ + unsigned long ret[PLPAR_HCALL_BUFSIZE]; + uint64_t rc, token; + + token = 0; + + /* NB: unbind has the same retry requirements mentioned above */ + do { + rc = plpar_hcall(H_SCM_UNBIND_MEM, ret, p->drc_index, + p->bound_addr, p->blocks, token); + token = be64_to_cpu(ret); + cond_resched(); + } while (rc == H_BUSY); + + if (rc) + dev_err(&p->pdev->dev, "unbind error: %lld\n", rc); + + return !!rc; +} + +static int papr_scm_meta_get(struct papr_scm_priv *p, + struct nd_cmd_get_config_data_hdr *hdr) +{ + unsigned long data[PLPAR_HCALL_BUFSIZE]; + int64_t ret; + + if (hdr->in_offset >= p->metadata_size || hdr->in_length != 1) + return -EINVAL; + + ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index, + hdr->in_offset, 1); + + if (ret == H_PARAMETER) /* bad DRC index */ + return -ENODEV; + if (ret) + return -EINVAL; /* other invalid parameter */ + + hdr->out_buf[0] = data[0] & 0xff; + + return 0; +} + +static int papr_scm_meta_set(struct papr_scm_priv *p, + struct nd_cmd_set_config_hdr *hdr) +{ + int64_t ret; + + if (hdr->in_offset >= p->metadata_size || hdr->in_length != 1) + return -EINVAL; + + ret = plpar_hcall_norets(H_SCM_WRITE_METADATA, + p->drc_index, hdr->in_offset, hdr->in_buf[0], 1); + + if (ret == H_PARAMETER) /* bad DRC index */ + return -ENODEV; + if (ret) + return -EINVAL; /* other invalid parameter */ + + return 0; +} + +int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, + unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc) +{ + struct nd_cmd_get_config_size *get_size_hdr; + struct papr_scm_priv *p; + + /* Only dimm-specific calls are supported atm */ + if (!nvdimm) + return -EINVAL; + + p = nvdimm_provider_data(nvdimm); + + switch (cmd) { + case ND_CMD_GET_CONFIG_SIZE: + get_size_hdr = buf; + + get_size_hdr->status = 0; + get_size_hdr->max_xfer = 1; + get_size_hdr->config_size = p->metadata_size; + *cmd_rc = 0; + break; + + case ND_CMD_GET_CONFIG_DATA: + *cmd_rc = papr_scm_meta_get(p, buf); + break; + + case ND_CMD_SET_CONFIG_DATA: + *cmd_rc = papr_scm_meta_set(p, buf); + break; + + default: + return -EINVAL; + } + + dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc); + + return 0; +} + +static const struct attribute_group *region_attr_groups[] = { + &nd_region_attribute_group, + &nd_device_attribute_group, + &nd_mapping_attribute_group, + &nd_numa_attribute_group, + NULL, +}; + +static const struct attribute_group *bus_attr_groups[] = { + &nvdimm_bus_attribute_group, + NULL, +}; + +static const struct attribute_group *papr_scm_dimm_groups[] = { + &nvdimm_attribute_group, + &nd_device_attribute_group, + NULL, +}; + +static int papr_scm_nvdimm_init(struct papr_scm_priv *p) +{ + struct device *dev = &p->pdev->dev; + struct nd_mapping_desc mapping; + struct nd_region_desc ndr_desc; + unsigned long dimm_flags; + + p->bus_desc.ndctl = papr_scm_ndctl; + p->bus_desc.module = THIS_MODULE; + p->bus_desc.of_node = p->pdev->dev.of_node; + p->bus_desc.attr_groups = bus_attr_groups; + p->bus_desc.provider_name = kstrdup(p->pdev->name, GFP_KERNEL); + + if (!p->bus_desc.provider_name) + return -ENOMEM; + + p->bus = nvdimm_bus_register(NULL, &p->bus_desc); + if (!p->bus) { + dev_err(dev, "Error creating nvdimm bus %pOF\n", p->dn); + return -ENXIO; + } + + dimm_flags = 0; + set_bit(NDD_ALIASING, &dimm_flags); + + p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_groups, + dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL); + if (!p->nvdimm) { + dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn); + goto err; + } + + /* now add the region */ + + memset(&mapping, 0, sizeof(mapping)); + mapping.nvdimm = p->nvdimm; + mapping.start = 0; + mapping.size = p->blocks * p->block_size; // XXX: potential overflow? + + memset(&ndr_desc, 0, sizeof(ndr_desc)); + ndr_desc.attr_groups = region_attr_groups; + ndr_desc.numa_node = dev_to_node(&p->pdev->dev); + ndr_desc.res = &p->res; + ndr_desc.of_node = p->dn; + ndr_desc.provider_data = p; + ndr_desc.mapping = &mapping; + ndr_desc.num_mappings = 1; + ndr_desc.nd_set = &p->nd_set; + set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags); + + p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc); + if (!p->region) { + dev_err(dev, "Error registering region %pR from %pOF\n", + ndr_desc.res, p->dn); + goto err; + } + + return 0; + +err: nvdimm_bus_unregister(p->bus); + kfree(p->bus_desc.provider_name); + return -ENXIO; +} + +static int papr_scm_probe(struct platform_device *pdev) +{ + uint32_t drc_index, metadata_size, unit_cap[2]; + struct device_node *dn = pdev->dev.of_node; + struct papr_scm_priv *p; + int rc; + + /* check we have all the required DT properties */ + if (of_property_read_u32(dn, "ibm,my-drc-index", &drc_index)) { + dev_err(&pdev->dev, "%pOF: missing drc-index!\n", dn); + return -ENODEV; + } + + if (of_property_read_u32_array(dn, "ibm,unit-capacity", unit_cap, 2)) { + dev_err(&pdev->dev, "%pOF: missing unit-capacity!\n", dn); + return -ENODEV; + } + + p = kzalloc(sizeof(*p), GFP_KERNEL); + if (!p) + return -ENOMEM; + + /* optional DT properties */ + of_property_read_u32(dn, "ibm,metadata-size", &metadata_size); + + p->dn = dn; + p->drc_index = drc_index; + p->block_size = unit_cap[0]; + p->blocks = unit_cap[1]; + + /* might be zero */ + p->metadata_size = metadata_size; + p->pdev = pdev; + + /* request the hypervisor to bind this region to somewhere in memory */ + rc = drc_pmem_bind(p); + if (rc) + goto err; + + /* setup the resource for the newly bound range */ + p->res.start = p->bound_addr; + p->res.end = p->bound_addr + p->blocks * p->block_size; + p->res.name = pdev->name; + p->res.flags = IORESOURCE_MEM; + + rc = papr_scm_nvdimm_init(p); + if (rc) + goto err2; + + platform_set_drvdata(pdev, p); + + return 0; + +err2: drc_pmem_unbind(p); +err: kfree(p); + return rc; +} + +static int papr_scm_remove(struct platform_device *pdev) +{ + struct papr_scm_priv *p = platform_get_drvdata(pdev); + + nvdimm_bus_unregister(p->bus); + drc_pmem_unbind(p); + kfree(p); + + return 0; +} + +static const struct of_device_id papr_scm_match[] = { + { .compatible = "ibm,pmemory" }, + { }, +}; + +static struct platform_driver papr_scm_driver = { + .probe = papr_scm_probe, + .remove = papr_scm_remove, + .driver = { + .name = "papr_scm", + .owner = THIS_MODULE, + .of_match_table = papr_scm_match, + }, +}; + +module_platform_driver(papr_scm_driver); +MODULE_DEVICE_TABLE(of, papr_scm_match); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("IBM Corporation"); diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c @@ -239,6 +239,7 @@ void __init pSeries_final_fixup(void) { pSeries_request_regions(); + eeh_probe_devices(); eeh_addr_cache_build(); #ifdef CONFIG_PCI_IOV diff --git a/arch/powerpc/platforms/pseries/pmem.c b/arch/powerpc/platforms/pseries/pmem.c @@ -0,0 +1,164 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Handles hot and cold plug of persistent memory regions on pseries. + */ + +#define pr_fmt(fmt) "pseries-pmem: " fmt + +#include <linux/kernel.h> +#include <linux/interrupt.h> +#include <linux/delay.h> +#include <linux/sched.h> /* for idle_task_exit */ +#include <linux/sched/hotplug.h> +#include <linux/cpu.h> +#include <linux/of.h> +#include <linux/of_platform.h> +#include <linux/slab.h> +#include <asm/prom.h> +#include <asm/rtas.h> +#include <asm/firmware.h> +#include <asm/machdep.h> +#include <asm/vdso_datapage.h> +#include <asm/plpar_wrappers.h> +#include <asm/topology.h> + +#include "pseries.h" +#include "offline_states.h" + +static struct device_node *pmem_node; + +static ssize_t pmem_drc_add_node(u32 drc_index) +{ + struct device_node *dn; + int rc; + + pr_debug("Attempting to add pmem node, drc index: %x\n", drc_index); + + rc = dlpar_acquire_drc(drc_index); + if (rc) { + pr_err("Failed to acquire DRC, rc: %d, drc index: %x\n", + rc, drc_index); + return -EINVAL; + } + + dn = dlpar_configure_connector(cpu_to_be32(drc_index), pmem_node); + if (!dn) { + pr_err("configure-connector failed for drc %x\n", drc_index); + dlpar_release_drc(drc_index); + return -EINVAL; + } + + /* NB: The of reconfig notifier creates platform device from the node */ + rc = dlpar_attach_node(dn, pmem_node); + if (rc) { + pr_err("Failed to attach node %s, rc: %d, drc index: %x\n", + dn->name, rc, drc_index); + + if (dlpar_release_drc(drc_index)) + dlpar_free_cc_nodes(dn); + + return rc; + } + + pr_info("Successfully added %pOF, drc index: %x\n", dn, drc_index); + + return 0; +} + +static ssize_t pmem_drc_remove_node(u32 drc_index) +{ + struct device_node *dn; + uint32_t index; + int rc; + + for_each_child_of_node(pmem_node, dn) { + if (of_property_read_u32(dn, "ibm,my-drc-index", &index)) + continue; + if (index == drc_index) + break; + } + + if (!dn) { + pr_err("Attempting to remove unused DRC index %x\n", drc_index); + return -ENODEV; + } + + pr_debug("Attempting to remove %pOF, drc index: %x\n", dn, drc_index); + + /* * NB: tears down the ibm,pmemory device as a side-effect */ + rc = dlpar_detach_node(dn); + if (rc) + return rc; + + rc = dlpar_release_drc(drc_index); + if (rc) { + pr_err("Failed to release drc (%x) for CPU %s, rc: %d\n", + drc_index, dn->name, rc); + dlpar_attach_node(dn, pmem_node); + return rc; + } + + pr_info("Successfully removed PMEM with drc index: %x\n", drc_index); + + return 0; +} + +int dlpar_hp_pmem(struct pseries_hp_errorlog *hp_elog) +{ + u32 count, drc_index; + int rc; + + /* slim chance, but we might get a hotplug event while booting */ + if (!pmem_node) + pmem_node = of_find_node_by_type(NULL, "ibm,persistent-memory"); + if (!pmem_node) { + pr_err("Hotplug event for a pmem device, but none exists\n"); + return -ENODEV; + } + + if (hp_elog->id_type != PSERIES_HP_ELOG_ID_DRC_INDEX) { + pr_err("Unsupported hotplug event type %d\n", + hp_elog->id_type); + return -EINVAL; + } + + count = hp_elog->_drc_u.drc_count; + drc_index = hp_elog->_drc_u.drc_index; + + lock_device_hotplug(); + + if (hp_elog->action == PSERIES_HP_ELOG_ACTION_ADD) { + rc = pmem_drc_add_node(drc_index); + } else if (hp_elog->action == PSERIES_HP_ELOG_ACTION_REMOVE) { + rc = pmem_drc_remove_node(drc_index); + } else { + pr_err("Unsupported hotplug action (%d)\n", hp_elog->action); + rc = -EINVAL; + } + + unlock_device_hotplug(); + return rc; +} + +const struct of_device_id drc_pmem_match[] = { + { .type = "ibm,persistent-memory", }, + {} +}; + +static int pseries_pmem_init(void) +{ + pmem_node = of_find_node_by_type(NULL, "ibm,persistent-memory"); + if (!pmem_node) + return 0; + + /* + * The generic OF bus probe/populate handles creating platform devices + * from the child (ibm,pmemory) nodes. The generic code registers an of + * reconfig notifier to handle the hot-add/remove cases too. + */ + of_platform_bus_probe(pmem_node, drc_pmem_match, NULL); + + return 0; +} +machine_arch_initcall(pseries, pseries_pmem_init); diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h @@ -24,6 +24,7 @@ struct pt_regs; extern int pSeries_system_reset_exception(struct pt_regs *regs); extern int pSeries_machine_check_exception(struct pt_regs *regs); +extern long pseries_machine_check_realmode(struct pt_regs *regs); #ifdef CONFIG_SMP extern void smp_init_pseries(void); @@ -59,15 +60,21 @@ extern int dlpar_detach_node(struct device_node *); extern int dlpar_acquire_drc(u32 drc_index); extern int dlpar_release_drc(u32 drc_index); -void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog, - struct completion *hotplug_done, int *rc); +void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog); +int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_errlog); + #ifdef CONFIG_MEMORY_HOTPLUG int dlpar_memory(struct pseries_hp_errorlog *hp_elog); +int dlpar_hp_pmem(struct pseries_hp_errorlog *hp_elog); #else static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog) { return -EOPNOTSUPP; } +static inline int dlpar_hp_pmem(struct pseries_hp_errorlog *hp_elog) +{ + return -EOPNOTSUPP; +} #endif #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c @@ -27,6 +27,7 @@ #include <asm/machdep.h> #include <asm/rtas.h> #include <asm/firmware.h> +#include <asm/mce.h> #include "pseries.h" @@ -50,6 +51,101 @@ static irqreturn_t ras_hotplug_interrupt(int irq, void *dev_id); static irqreturn_t ras_epow_interrupt(int irq, void *dev_id); static irqreturn_t ras_error_interrupt(int irq, void *dev_id); +/* RTAS pseries MCE errorlog section. */ +struct pseries_mc_errorlog { + __be32 fru_id; + __be32 proc_id; + u8 error_type; + /* + * sub_err_type (1 byte). Bit fields depends on error_type + * + * MSB0 + * | + * V + * 01234567 + * XXXXXXXX + * + * For error_type == MC_ERROR_TYPE_UE + * XXXXXXXX + * X 1: Permanent or Transient UE. + * X 1: Effective address provided. + * X 1: Logical address provided. + * XX 2: Reserved. + * XXX 3: Type of UE error. + * + * For error_type != MC_ERROR_TYPE_UE + * XXXXXXXX + * X 1: Effective address provided. + * XXXXX 5: Reserved. + * XX 2: Type of SLB/ERAT/TLB error. + */ + u8 sub_err_type; + u8 reserved_1[6]; + __be64 effective_address; + __be64 logical_address; +} __packed; + +/* RTAS pseries MCE error types */ +#define MC_ERROR_TYPE_UE 0x00 +#define MC_ERROR_TYPE_SLB 0x01 +#define MC_ERROR_TYPE_ERAT 0x02 +#define MC_ERROR_TYPE_TLB 0x04 +#define MC_ERROR_TYPE_D_CACHE 0x05 +#define MC_ERROR_TYPE_I_CACHE 0x07 + +/* RTAS pseries MCE error sub types */ +#define MC_ERROR_UE_INDETERMINATE 0 +#define MC_ERROR_UE_IFETCH 1 +#define MC_ERROR_UE_PAGE_TABLE_WALK_IFETCH 2 +#define MC_ERROR_UE_LOAD_STORE 3 +#define MC_ERROR_UE_PAGE_TABLE_WALK_LOAD_STORE 4 + +#define MC_ERROR_SLB_PARITY 0 +#define MC_ERROR_SLB_MULTIHIT 1 +#define MC_ERROR_SLB_INDETERMINATE 2 + +#define MC_ERROR_ERAT_PARITY 1 +#define MC_ERROR_ERAT_MULTIHIT 2 +#define MC_ERROR_ERAT_INDETERMINATE 3 + +#define MC_ERROR_TLB_PARITY 1 +#define MC_ERROR_TLB_MULTIHIT 2 +#define MC_ERROR_TLB_INDETERMINATE 3 + +static inline u8 rtas_mc_error_sub_type(const struct pseries_mc_errorlog *mlog) +{ + switch (mlog->error_type) { + case MC_ERROR_TYPE_UE: + return (mlog->sub_err_type & 0x07); + case MC_ERROR_TYPE_SLB: + case MC_ERROR_TYPE_ERAT: + case MC_ERROR_TYPE_TLB: + return (mlog->sub_err_type & 0x03); + default: + return 0; + } +} + +static +inline u64 rtas_mc_get_effective_addr(const struct pseries_mc_errorlog *mlog) +{ + __be64 addr = 0; + + switch (mlog->error_type) { + case MC_ERROR_TYPE_UE: + if (mlog->sub_err_type & 0x40) + addr = mlog->effective_address; + break; + case MC_ERROR_TYPE_SLB: + case MC_ERROR_TYPE_ERAT: + case MC_ERROR_TYPE_TLB: + if (mlog->sub_err_type & 0x80) + addr = mlog->effective_address; + default: + break; + } + return be64_to_cpu(addr); +} /* * Enable the hotplug interrupt late because processing them may touch other @@ -237,8 +333,9 @@ static irqreturn_t ras_hotplug_interrupt(int irq, void *dev_id) * hotplug events on the ras_log_buf to be handled by rtas_errd. */ if (hp_elog->resource == PSERIES_HP_ELOG_RESOURCE_MEM || - hp_elog->resource == PSERIES_HP_ELOG_RESOURCE_CPU) - queue_hotplug_event(hp_elog, NULL, NULL); + hp_elog->resource == PSERIES_HP_ELOG_RESOURCE_CPU || + hp_elog->resource == PSERIES_HP_ELOG_RESOURCE_PMEM) + queue_hotplug_event(hp_elog); else log_error(ras_log_buf, ERR_TYPE_RTAS_LOG, 0); @@ -427,6 +524,188 @@ int pSeries_system_reset_exception(struct pt_regs *regs) return 0; /* need to perform reset */ } +#define VAL_TO_STRING(ar, val) \ + (((val) < ARRAY_SIZE(ar)) ? ar[(val)] : "Unknown") + +static void pseries_print_mce_info(struct pt_regs *regs, + struct rtas_error_log *errp) +{ + const char *level, *sevstr; + struct pseries_errorlog *pseries_log; + struct pseries_mc_errorlog *mce_log; + u8 error_type, err_sub_type; + u64 addr; + u8 initiator = rtas_error_initiator(errp); + int disposition = rtas_error_disposition(errp); + + static const char * const initiators[] = { + "Unknown", + "CPU", + "PCI", + "ISA", + "Memory", + "Power Mgmt", + }; + static const char * const mc_err_types[] = { + "UE", + "SLB", + "ERAT", + "TLB", + "D-Cache", + "Unknown", + "I-Cache", + }; + static const char * const mc_ue_types[] = { + "Indeterminate", + "Instruction fetch", + "Page table walk ifetch", + "Load/Store", + "Page table walk Load/Store", + }; + + /* SLB sub errors valid values are 0x0, 0x1, 0x2 */ + static const char * const mc_slb_types[] = { + "Parity", + "Multihit", + "Indeterminate", + }; + + /* TLB and ERAT sub errors valid values are 0x1, 0x2, 0x3 */ + static const char * const mc_soft_types[] = { + "Unknown", + "Parity", + "Multihit", + "Indeterminate", + }; + + if (!rtas_error_extended(errp)) { + pr_err("Machine check interrupt: Missing extended error log\n"); + return; + } + + pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE); + if (pseries_log == NULL) + return; + + mce_log = (struct pseries_mc_errorlog *)pseries_log->data; + + error_type = mce_log->error_type; + err_sub_type = rtas_mc_error_sub_type(mce_log); + + switch (rtas_error_severity(errp)) { + case RTAS_SEVERITY_NO_ERROR: + level = KERN_INFO; + sevstr = "Harmless"; + break; + case RTAS_SEVERITY_WARNING: + level = KERN_WARNING; + sevstr = ""; + break; + case RTAS_SEVERITY_ERROR: + case RTAS_SEVERITY_ERROR_SYNC: + level = KERN_ERR; + sevstr = "Severe"; + break; + case RTAS_SEVERITY_FATAL: + default: + level = KERN_ERR; + sevstr = "Fatal"; + break; + } + +#ifdef CONFIG_PPC_BOOK3S_64 + /* Display faulty slb contents for SLB errors. */ + if (error_type == MC_ERROR_TYPE_SLB) + slb_dump_contents(local_paca->mce_faulty_slbs); +#endif + + printk("%s%s Machine check interrupt [%s]\n", level, sevstr, + disposition == RTAS_DISP_FULLY_RECOVERED ? + "Recovered" : "Not recovered"); + if (user_mode(regs)) { + printk("%s NIP: [%016lx] PID: %d Comm: %s\n", level, + regs->nip, current->pid, current->comm); + } else { + printk("%s NIP [%016lx]: %pS\n", level, regs->nip, + (void *)regs->nip); + } + printk("%s Initiator: %s\n", level, + VAL_TO_STRING(initiators, initiator)); + + switch (error_type) { + case MC_ERROR_TYPE_UE: + printk("%s Error type: %s [%s]\n", level, + VAL_TO_STRING(mc_err_types, error_type), + VAL_TO_STRING(mc_ue_types, err_sub_type)); + break; + case MC_ERROR_TYPE_SLB: + printk("%s Error type: %s [%s]\n", level, + VAL_TO_STRING(mc_err_types, error_type), + VAL_TO_STRING(mc_slb_types, err_sub_type)); + break; + case MC_ERROR_TYPE_ERAT: + case MC_ERROR_TYPE_TLB: + printk("%s Error type: %s [%s]\n", level, + VAL_TO_STRING(mc_err_types, error_type), + VAL_TO_STRING(mc_soft_types, err_sub_type)); + break; + default: + printk("%s Error type: %s\n", level, + VAL_TO_STRING(mc_err_types, error_type)); + break; + } + + addr = rtas_mc_get_effective_addr(mce_log); + if (addr) + printk("%s Effective address: %016llx\n", level, addr); +} + +static int mce_handle_error(struct rtas_error_log *errp) +{ + struct pseries_errorlog *pseries_log; + struct pseries_mc_errorlog *mce_log; + int disposition = rtas_error_disposition(errp); + u8 error_type; + + if (!rtas_error_extended(errp)) + goto out; + + pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE); + if (pseries_log == NULL) + goto out; + + mce_log = (struct pseries_mc_errorlog *)pseries_log->data; + error_type = mce_log->error_type; + +#ifdef CONFIG_PPC_BOOK3S_64 + if (disposition == RTAS_DISP_NOT_RECOVERED) { + switch (error_type) { + case MC_ERROR_TYPE_SLB: + case MC_ERROR_TYPE_ERAT: + /* + * Store the old slb content in paca before flushing. + * Print this when we go to virtual mode. + * There are chances that we may hit MCE again if there + * is a parity error on the SLB entry we trying to read + * for saving. Hence limit the slb saving to single + * level of recursion. + */ + if (local_paca->in_mce == 1) + slb_save_contents(local_paca->mce_faulty_slbs); + flush_and_reload_slb(); + disposition = RTAS_DISP_FULLY_RECOVERED; + rtas_set_disposition_recovered(errp); + break; + default: + break; + } + } +#endif + +out: + return disposition; +} + /* * Process MCE rtas errlog event. */ @@ -452,8 +731,11 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err) int recovered = 0; int disposition = rtas_error_disposition(err); + pseries_print_mce_info(regs, err); + if (!(regs->msr & MSR_RI)) { /* If MSR_RI isn't set, we cannot recover */ + pr_err("Machine check interrupt unrecoverable: MSR(RI=0)\n"); recovered = 0; } else if (disposition == RTAS_DISP_FULLY_RECOVERED) { @@ -503,11 +785,31 @@ int pSeries_machine_check_exception(struct pt_regs *regs) struct rtas_error_log *errp; if (fwnmi_active) { - errp = fwnmi_get_errinfo(regs); fwnmi_release_errinfo(); + errp = fwnmi_get_errlog(); if (errp && recover_mce(regs, errp)) return 1; } return 0; } + +long pseries_machine_check_realmode(struct pt_regs *regs) +{ + struct rtas_error_log *errp; + int disposition; + + if (fwnmi_active) { + errp = fwnmi_get_errinfo(regs); + /* + * Call to fwnmi_release_errinfo() in real mode causes kernel + * to panic. Hence we will call it as soon as we go into + * virtual mode. + */ + disposition = mce_handle_error(errp); + if (disposition == RTAS_DISP_FULLY_RECOVERED) + return 1; + } + + return 0; +} diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c @@ -107,6 +107,10 @@ static void __init fwnmi_init(void) u8 *mce_data_buf; unsigned int i; int nr_cpus = num_possible_cpus(); +#ifdef CONFIG_PPC_BOOK3S_64 + struct slb_entry *slb_ptr; + size_t size; +#endif int ibm_nmi_register = rtas_token("ibm,nmi-register"); if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE) @@ -132,6 +136,15 @@ static void __init fwnmi_init(void) paca_ptrs[i]->mce_data_buf = mce_data_buf + (RTAS_ERROR_LOG_MAX * i); } + +#ifdef CONFIG_PPC_BOOK3S_64 + /* Allocate per cpu slb area to save old slb contents during MCE */ + size = sizeof(struct slb_entry) * mmu_slb_size * nr_cpus; + slb_ptr = __va(memblock_alloc_base(size, sizeof(struct slb_entry), + ppc64_rma_size)); + for_each_possible_cpu(i) + paca_ptrs[i]->mce_faulty_slbs = slb_ptr + (mmu_slb_size * i); +#endif } static void pseries_8259_cascade(struct irq_desc *desc) @@ -1017,6 +1030,7 @@ define_machine(pseries) { .calibrate_decr = generic_calibrate_decr, .progress = rtas_progress, .system_reset_exception = pSeries_system_reset_exception, + .machine_check_early = pseries_machine_check_realmode, .machine_check_exception = pSeries_machine_check_exception, #ifdef CONFIG_KEXEC_CORE .machine_kexec = pSeries_machine_kexec, diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c @@ -1349,7 +1349,6 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node) struct device_node *parent_node; const __be32 *prop; enum vio_dev_family family; - const char *of_node_name = of_node->name ? of_node->name : "<unknown>"; /* * Determine if this node is a under the /vdevice node or under the @@ -1362,24 +1361,24 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node) else if (!strcmp(parent_node->type, "vdevice")) family = VDEVICE; else { - pr_warn("%s: parent(%pOF) of %s not recognized.\n", + pr_warn("%s: parent(%pOF) of %pOFn not recognized.\n", __func__, parent_node, - of_node_name); + of_node); of_node_put(parent_node); return NULL; } of_node_put(parent_node); } else { - pr_warn("%s: could not determine the parent of node %s.\n", - __func__, of_node_name); + pr_warn("%s: could not determine the parent of node %pOFn.\n", + __func__, of_node); return NULL; } if (family == PFO) { if (of_get_property(of_node, "interrupt-controller", NULL)) { - pr_debug("%s: Skipping the interrupt controller %s.\n", - __func__, of_node_name); + pr_debug("%s: Skipping the interrupt controller %pOFn.\n", + __func__, of_node); return NULL; } } @@ -1399,15 +1398,15 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node) if (of_node->type != NULL) viodev->type = of_node->type; else { - pr_warn("%s: node %s is missing the 'device_type' " - "property.\n", __func__, of_node_name); + pr_warn("%s: node %pOFn is missing the 'device_type' " + "property.\n", __func__, of_node); goto out; } prop = of_get_property(of_node, "reg", NULL); if (prop == NULL) { - pr_warn("%s: node %s missing 'reg'\n", - __func__, of_node_name); + pr_warn("%s: node %pOFn missing 'reg'\n", + __func__, of_node); goto out; } unit_address = of_read_number(prop, 1); @@ -1422,8 +1421,8 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node) if (prop != NULL) viodev->resource_id = of_read_number(prop, 1); - dev_set_name(&viodev->dev, "%s", of_node_name); - viodev->type = of_node_name; + dev_set_name(&viodev->dev, "%pOFn", of_node); + viodev->type = dev_name(&viodev->dev); viodev->irq = 0; } @@ -1694,7 +1693,7 @@ struct vio_dev *vio_find_node(struct device_node *vnode) snprintf(kobj_name, sizeof(kobj_name), "%x", (uint32_t)of_read_number(prop, 1)); } else if (!strcmp(dev_type, "ibm,platform-facilities")) - snprintf(kobj_name, sizeof(kobj_name), "%s", vnode->name); + snprintf(kobj_name, sizeof(kobj_name), "%pOFn", vnode); else return NULL; diff --git a/arch/powerpc/sysdev/Kconfig b/arch/powerpc/sysdev/Kconfig @@ -6,19 +6,16 @@ config PPC4xx_PCI_EXPRESS bool depends on PCI && 4xx - default n config PPC4xx_HSTA_MSI bool depends on PCI_MSI depends on PCI && 4xx - default n config PPC4xx_MSI bool depends on PCI_MSI depends on PCI && 4xx - default n config PPC_MSI_BITMAP bool @@ -37,11 +34,9 @@ config PPC_SCOM config SCOM_DEBUGFS bool "Expose SCOM controllers via debugfs" depends on PPC_SCOM && DEBUG_FS - default n config GE_FPGA bool - default n config FSL_CORENET_RCPM bool diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile @@ -1,5 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC) @@ -56,8 +55,6 @@ obj-$(CONFIG_PPC_SCOM) += scom.o obj-$(CONFIG_PPC_EARLY_DEBUG_MEMCONS) += udbg_memcons.o -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror - obj-$(CONFIG_PPC_XICS) += xics/ obj-$(CONFIG_PPC_XIVE) += xive/ diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c @@ -107,11 +107,11 @@ int __init instantiate_cache_sram(struct platform_device *dev, goto out_free; } - cache_sram->base_virt = ioremap_prot(cache_sram->base_phys, - cache_sram->size, _PAGE_COHERENT | PAGE_KERNEL); + cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys, + cache_sram->size); if (!cache_sram->base_virt) { - dev_err(&dev->dev, "%pOF: ioremap_prot failed\n", - dev->dev.of_node); + dev_err(&dev->dev, "%pOF: ioremap_coherent failed\n", + dev->dev.of_node); ret = -ENOMEM; goto out_release; } diff --git a/arch/powerpc/sysdev/ipic.c b/arch/powerpc/sysdev/ipic.c @@ -846,7 +846,7 @@ void ipic_disable_mcp(enum ipic_mcp_irq mcp_irq) u32 ipic_get_mcp_status(void) { - return ipic_read(primary_ipic->regs, IPIC_SERSR); + return primary_ipic ? ipic_read(primary_ipic->regs, IPIC_SERSR) : 0; } void ipic_clear_mcp_status(u32 mask) diff --git a/arch/powerpc/sysdev/xics/Makefile b/arch/powerpc/sysdev/xics/Makefile @@ -1,5 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror obj-y += xics-common.o obj-$(CONFIG_PPC_ICP_NATIVE) += icp-native.o diff --git a/arch/powerpc/sysdev/xive/Kconfig b/arch/powerpc/sysdev/xive/Kconfig @@ -1,17 +1,14 @@ # SPDX-License-Identifier: GPL-2.0 config PPC_XIVE bool - default n select PPC_SMP_MUXED_IPI select HARDIRQS_SW_RESEND config PPC_XIVE_NATIVE bool - default n select PPC_XIVE depends on PPC_POWERNV config PPC_XIVE_SPAPR bool - default n select PPC_XIVE diff --git a/arch/powerpc/sysdev/xive/Makefile b/arch/powerpc/sysdev/xive/Makefile @@ -1,4 +1,3 @@ -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror obj-y += common.o obj-$(CONFIG_PPC_XIVE_NATIVE) += native.o diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c @@ -1010,12 +1010,13 @@ static void xive_ipi_eoi(struct irq_data *d) { struct xive_cpu *xc = __this_cpu_read(xive_cpu); - DBG_VERBOSE("IPI eoi: irq=%d [0x%lx] (HW IRQ 0x%x) pending=%02x\n", - d->irq, irqd_to_hwirq(d), xc->hw_ipi, xc->pending_prio); - /* Handle possible race with unplug and drop stale IPIs */ if (!xc) return; + + DBG_VERBOSE("IPI eoi: irq=%d [0x%lx] (HW IRQ 0x%x) pending=%02x\n", + d->irq, irqd_to_hwirq(d), xc->hw_ipi, xc->pending_prio); + xive_do_source_eoi(xc->hw_ipi, &xc->ipi_data); xive_do_queue_eoi(xc); } diff --git a/arch/powerpc/sysdev/xive/native.c b/arch/powerpc/sysdev/xive/native.c @@ -238,20 +238,11 @@ static bool xive_native_match(struct device_node *node) #ifdef CONFIG_SMP static int xive_native_get_ipi(unsigned int cpu, struct xive_cpu *xc) { - struct device_node *np; - unsigned int chip_id; s64 irq; - /* Find the chip ID */ - np = of_get_cpu_node(cpu, NULL); - if (np) { - if (of_property_read_u32(np, "ibm,chip-id", &chip_id) < 0) - chip_id = 0; - } - /* Allocate an IPI and populate info about it */ for (;;) { - irq = opal_xive_allocate_irq(chip_id); + irq = opal_xive_allocate_irq(xc->chip_id); if (irq == OPAL_BUSY) { msleep(OPAL_BUSY_DELAY_MS); continue; diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile @@ -1,14 +1,15 @@ # SPDX-License-Identifier: GPL-2.0 # Makefile for xmon -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror +# Disable clang warning for using setjmp without setjmp.h header +subdir-ccflags-y := $(call cc-disable-warning, builtin-requires-header) GCOV_PROFILE := n UBSAN_SANITIZE := n # Disable ftrace for the entire directory ORIG_CFLAGS := $(KBUILD_CFLAGS) -KBUILD_CFLAGS = $(subst -mno-sched-epilog,,$(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))) +KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS)) ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c @@ -2378,25 +2378,33 @@ static void dump_one_paca(int cpu) DUMP(p, cpu_start, "%#-*x"); DUMP(p, kexec_state, "%#-*x"); #ifdef CONFIG_PPC_BOOK3S_64 - for (i = 0; i < SLB_NUM_BOLTED; i++) { - u64 esid, vsid; + if (!early_radix_enabled()) { + for (i = 0; i < SLB_NUM_BOLTED; i++) { + u64 esid, vsid; - if (!p->slb_shadow_ptr) - continue; + if (!p->slb_shadow_ptr) + continue; + + esid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].esid); + vsid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].vsid); - esid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].esid); - vsid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].vsid); + if (esid || vsid) { + printf(" %-*s[%d] = 0x%016llx 0x%016llx\n", + 22, "slb_shadow", i, esid, vsid); + } + } + DUMP(p, vmalloc_sllp, "%#-*x"); + DUMP(p, stab_rr, "%#-*x"); + DUMP(p, slb_used_bitmap, "%#-*x"); + DUMP(p, slb_kern_bitmap, "%#-*x"); - if (esid || vsid) { - printf(" %-*s[%d] = 0x%016llx 0x%016llx\n", - 22, "slb_shadow", i, esid, vsid); + if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) { + DUMP(p, slb_cache_ptr, "%#-*x"); + for (i = 0; i < SLB_CACHE_ENTRIES; i++) + printf(" %-*s[%d] = 0x%016x\n", + 22, "slb_cache", i, p->slb_cache[i]); } } - DUMP(p, vmalloc_sllp, "%#-*x"); - DUMP(p, slb_cache_ptr, "%#-*x"); - for (i = 0; i < SLB_CACHE_ENTRIES; i++) - printf(" %-*s[%d] = 0x%016x\n", - 22, "slb_cache", i, p->slb_cache[i]); DUMP(p, rfi_flush_fallback_area, "%-*px"); #endif @@ -2412,7 +2420,9 @@ static void dump_one_paca(int cpu) DUMP(p, __current, "%-*px"); DUMP(p, kstack, "%#-*llx"); printf(" %-*s = 0x%016llx\n", 25, "kstack_base", p->kstack & ~(THREAD_SIZE - 1)); - DUMP(p, stab_rr, "%#-*llx"); +#ifdef CONFIG_STACKPROTECTOR + DUMP(p, canary, "%#-*lx"); +#endif DUMP(p, saved_r1, "%#-*llx"); DUMP(p, trap_save, "%#-*x"); DUMP(p, irq_soft_mask, "%#-*x"); @@ -2444,11 +2454,15 @@ static void dump_one_paca(int cpu) DUMP(p, accounting.utime, "%#-*lx"); DUMP(p, accounting.stime, "%#-*lx"); +#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME DUMP(p, accounting.utime_scaled, "%#-*lx"); +#endif DUMP(p, accounting.starttime, "%#-*lx"); DUMP(p, accounting.starttime_user, "%#-*lx"); +#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME DUMP(p, accounting.startspurr, "%#-*lx"); DUMP(p, accounting.utime_sspurr, "%#-*lx"); +#endif DUMP(p, accounting.steal_time, "%#-*lx"); #undef DUMP @@ -2988,15 +3002,17 @@ static void show_task(struct task_struct *tsk) #ifdef CONFIG_PPC_BOOK3S_64 void format_pte(void *ptep, unsigned long pte) { + pte_t entry = __pte(pte); + printf("ptep @ 0x%016lx = 0x%016lx\n", (unsigned long)ptep, pte); printf("Maps physical address = 0x%016lx\n", pte & PTE_RPN_MASK); printf("Flags = %s%s%s%s%s\n", - (pte & _PAGE_ACCESSED) ? "Accessed " : "", - (pte & _PAGE_DIRTY) ? "Dirty " : "", - (pte & _PAGE_READ) ? "Read " : "", - (pte & _PAGE_WRITE) ? "Write " : "", - (pte & _PAGE_EXEC) ? "Exec " : ""); + pte_young(entry) ? "Accessed " : "", + pte_dirty(entry) ? "Dirty " : "", + pte_read(entry) ? "Read " : "", + pte_write(entry) ? "Write " : "", + pte_exec(entry) ? "Exec " : ""); } static void show_pte(unsigned long addr) diff --git a/drivers/block/z2ram.c b/drivers/block/z2ram.c @@ -191,8 +191,7 @@ static int z2_open(struct block_device *bdev, fmode_t mode) vfree(vmalloc (size)); } - vaddr = (unsigned long) __ioremap (paddr, size, - _PAGE_WRITETHRU); + vaddr = (unsigned long)ioremap_wt(paddr, size); #else vaddr = (unsigned long)z_remap_nocache_nonser(paddr, size); diff --git a/drivers/macintosh/adb-iop.c b/drivers/macintosh/adb-iop.c @@ -20,13 +20,13 @@ #include <linux/init.h> #include <linux/proc_fs.h> -#include <asm/macintosh.h> -#include <asm/macints.h> +#include <asm/macintosh.h> +#include <asm/macints.h> #include <asm/mac_iop.h> #include <asm/mac_oss.h> #include <asm/adb_iop.h> -#include <linux/adb.h> +#include <linux/adb.h> /*#define DEBUG_ADB_IOP*/ @@ -38,9 +38,9 @@ static unsigned char *reply_ptr; #endif static enum adb_iop_state { - idle, - sending, - awaiting_reply + idle, + sending, + awaiting_reply } adb_iop_state; static void adb_iop_start(void); @@ -66,7 +66,8 @@ static void adb_iop_end_req(struct adb_request *req, int state) { req->complete = 1; current_req = req->next; - if (req->done) (*req->done)(req); + if (req->done) + (*req->done)(req); adb_iop_state = state; } @@ -100,7 +101,7 @@ static void adb_iop_complete(struct iop_msg *msg) static void adb_iop_listen(struct iop_msg *msg) { - struct adb_iopmsg *amsg = (struct adb_iopmsg *) msg->message; + struct adb_iopmsg *amsg = (struct adb_iopmsg *)msg->message; struct adb_request *req; unsigned long flags; #ifdef DEBUG_ADB_IOP @@ -113,9 +114,9 @@ static void adb_iop_listen(struct iop_msg *msg) #ifdef DEBUG_ADB_IOP printk("adb_iop_listen %p: rcvd packet, %d bytes: %02X %02X", req, - (uint) amsg->count + 2, (uint) amsg->flags, (uint) amsg->cmd); + (uint)amsg->count + 2, (uint)amsg->flags, (uint)amsg->cmd); for (i = 0; i < amsg->count; i++) - printk(" %02X", (uint) amsg->data[i]); + printk(" %02X", (uint)amsg->data[i]); printk("\n"); #endif @@ -168,14 +169,15 @@ static void adb_iop_start(void) /* get the packet to send */ req = current_req; - if (!req) return; + if (!req) + return; local_irq_save(flags); #ifdef DEBUG_ADB_IOP printk("adb_iop_start %p: sending packet, %d bytes:", req, req->nbytes); - for (i = 0 ; i < req->nbytes ; i++) - printk(" %02X", (uint) req->data[i]); + for (i = 0; i < req->nbytes; i++) + printk(" %02X", (uint)req->data[i]); printk("\n"); #endif @@ -196,19 +198,20 @@ static void adb_iop_start(void) /* Now send it. The IOP manager will call adb_iop_complete */ /* when the packet has been sent. */ - iop_send_message(ADB_IOP, ADB_CHAN, req, - sizeof(amsg), (__u8 *) &amsg, adb_iop_complete); + iop_send_message(ADB_IOP, ADB_CHAN, req, sizeof(amsg), (__u8 *)&amsg, + adb_iop_complete); } int adb_iop_probe(void) { - if (!iop_ism_present) return -ENODEV; + if (!iop_ism_present) + return -ENODEV; return 0; } int adb_iop_init(void) { - printk("adb: IOP ISM driver v0.4 for Unified ADB.\n"); + pr_info("adb: IOP ISM driver v0.4 for Unified ADB\n"); iop_listen(ADB_IOP, ADB_CHAN, adb_iop_listen, "ADB"); return 0; } @@ -218,10 +221,12 @@ int adb_iop_send_request(struct adb_request *req, int sync) int err; err = adb_iop_write(req); - if (err) return err; + if (err) + return err; if (sync) { - while (!req->complete) adb_iop_poll(); + while (!req->complete) + adb_iop_poll(); } return 0; } @@ -251,7 +256,9 @@ static int adb_iop_write(struct adb_request *req) } local_irq_restore(flags); - if (adb_iop_state == idle) adb_iop_start(); + + if (adb_iop_state == idle) + adb_iop_start(); return 0; } @@ -263,7 +270,8 @@ int adb_iop_autopoll(int devs) void adb_iop_poll(void) { - if (adb_iop_state == idle) adb_iop_start(); + if (adb_iop_state == idle) + adb_iop_start(); iop_ism_irq_poll(ADB_IOP); } diff --git a/drivers/macintosh/adb.c b/drivers/macintosh/adb.c @@ -203,15 +203,15 @@ static int adb_scan_bus(void) } /* Now fill in the handler_id field of the adb_handler entries. */ - pr_debug("adb devices:\n"); for (i = 1; i < 16; i++) { if (adb_handler[i].original_address == 0) continue; adb_request(&req, NULL, ADBREQ_SYNC | ADBREQ_REPLY, 1, (i << 4) | 0xf); adb_handler[i].handler_id = req.reply[2]; - pr_debug(" [%d]: %d %x\n", i, adb_handler[i].original_address, - adb_handler[i].handler_id); + printk(KERN_DEBUG "adb device [%d]: %d 0x%X\n", i, + adb_handler[i].original_address, + adb_handler[i].handler_id); devmask |= 1 << i; } return devmask; @@ -579,6 +579,8 @@ adb_try_handler_change(int address, int new_id) mutex_lock(&adb_handler_mutex); ret = try_handler_change(address, new_id); mutex_unlock(&adb_handler_mutex); + if (ret) + pr_debug("adb handler change: [%d] 0x%X\n", address, new_id); return ret; } EXPORT_SYMBOL(adb_try_handler_change); diff --git a/drivers/macintosh/adbhid.c b/drivers/macintosh/adbhid.c @@ -757,6 +757,7 @@ adbhid_input_register(int id, int default_id, int original_handler_id, struct input_dev *input_dev; int err; int i; + char *keyboard_type; if (adbhid[id]) { pr_err("Trying to reregister ADB HID on ID %d\n", id); @@ -798,24 +799,23 @@ adbhid_input_register(int id, int default_id, int original_handler_id, memcpy(hid->keycode, adb_to_linux_keycodes, sizeof(adb_to_linux_keycodes)); - pr_info("Detected ADB keyboard, type "); switch (original_handler_id) { default: - pr_cont("<unknown>.\n"); + keyboard_type = "<unknown>"; input_dev->id.version = ADB_KEYBOARD_UNKNOWN; break; case 0x01: case 0x02: case 0x03: case 0x06: case 0x08: case 0x0C: case 0x10: case 0x18: case 0x1B: case 0x1C: case 0xC0: case 0xC3: case 0xC6: - pr_cont("ANSI.\n"); + keyboard_type = "ANSI"; input_dev->id.version = ADB_KEYBOARD_ANSI; break; case 0x04: case 0x05: case 0x07: case 0x09: case 0x0D: case 0x11: case 0x14: case 0x19: case 0x1D: case 0xC1: case 0xC4: case 0xC7: - pr_cont("ISO, swapping keys.\n"); + keyboard_type = "ISO, swapping keys"; input_dev->id.version = ADB_KEYBOARD_ISO; i = hid->keycode[10]; hid->keycode[10] = hid->keycode[50]; @@ -824,10 +824,11 @@ adbhid_input_register(int id, int default_id, int original_handler_id, case 0x12: case 0x15: case 0x16: case 0x17: case 0x1A: case 0x1E: case 0xC2: case 0xC5: case 0xC8: case 0xC9: - pr_cont("JIS.\n"); + keyboard_type = "JIS"; input_dev->id.version = ADB_KEYBOARD_JIS; break; } + pr_info("Detected ADB keyboard, type %s.\n", keyboard_type); for (i = 0; i < 128; i++) if (hid->keycode[i]) @@ -972,16 +973,13 @@ adbhid_probe(void) ->get it to send separate codes for left and right shift, control, option keys */ #if 0 /* handler 5 doesn't send separate codes for R modifiers */ - if (adb_try_handler_change(id, 5)) - printk("ADB keyboard at %d, handler set to 5\n", id); - else + if (!adb_try_handler_change(id, 5)) #endif - if (adb_try_handler_change(id, 3)) - printk("ADB keyboard at %d, handler set to 3\n", id); - else - printk("ADB keyboard at %d, handler 1\n", id); + adb_try_handler_change(id, 3); adb_get_infos(id, &default_id, &cur_handler_id); + printk(KERN_DEBUG "ADB keyboard at %d has handler 0x%X\n", + id, cur_handler_id); reg |= adbhid_input_reregister(id, default_id, org_handler_id, cur_handler_id, 0); } @@ -999,48 +997,44 @@ adbhid_probe(void) for (i = 0; i < mouse_ids.nids; i++) { int id = mouse_ids.id[i]; int mouse_kind; + char *desc = "standard"; adb_get_infos(id, &default_id, &org_handler_id); if (adb_try_handler_change(id, 4)) { - printk("ADB mouse at %d, handler set to 4", id); mouse_kind = ADBMOUSE_EXTENDED; } else if (adb_try_handler_change(id, 0x2F)) { - printk("ADB mouse at %d, handler set to 0x2F", id); mouse_kind = ADBMOUSE_MICROSPEED; } else if (adb_try_handler_change(id, 0x42)) { - printk("ADB mouse at %d, handler set to 0x42", id); mouse_kind = ADBMOUSE_TRACKBALLPRO; } else if (adb_try_handler_change(id, 0x66)) { - printk("ADB mouse at %d, handler set to 0x66", id); mouse_kind = ADBMOUSE_MICROSPEED; } else if (adb_try_handler_change(id, 0x5F)) { - printk("ADB mouse at %d, handler set to 0x5F", id); mouse_kind = ADBMOUSE_MICROSPEED; } else if (adb_try_handler_change(id, 3)) { - printk("ADB mouse at %d, handler set to 3", id); mouse_kind = ADBMOUSE_MS_A3; } else if (adb_try_handler_change(id, 2)) { - printk("ADB mouse at %d, handler set to 2", id); mouse_kind = ADBMOUSE_STANDARD_200; } else { - printk("ADB mouse at %d, handler 1", id); mouse_kind = ADBMOUSE_STANDARD_100; } if ((mouse_kind == ADBMOUSE_TRACKBALLPRO) || (mouse_kind == ADBMOUSE_MICROSPEED)) { + desc = "Microspeed/MacPoint or compatible"; init_microspeed(id); } else if (mouse_kind == ADBMOUSE_MS_A3) { + desc = "Mouse Systems A3 Mouse or compatible"; init_ms_a3(id); } else if (mouse_kind == ADBMOUSE_EXTENDED) { + desc = "extended"; /* * Register 1 is usually used for device * identification. Here, we try to identify @@ -1054,32 +1048,36 @@ adbhid_probe(void) (req.reply[1] == 0x9a) && ((req.reply[2] == 0x21) || (req.reply[2] == 0x20))) { mouse_kind = ADBMOUSE_TRACKBALL; + desc = "trackman/mouseman"; init_trackball(id); } else if ((req.reply_len >= 4) && (req.reply[1] == 0x74) && (req.reply[2] == 0x70) && (req.reply[3] == 0x61) && (req.reply[4] == 0x64)) { mouse_kind = ADBMOUSE_TRACKPAD; + desc = "trackpad"; init_trackpad(id); } else if ((req.reply_len >= 4) && (req.reply[1] == 0x4b) && (req.reply[2] == 0x4d) && (req.reply[3] == 0x4c) && (req.reply[4] == 0x31)) { mouse_kind = ADBMOUSE_TURBOMOUSE5; + desc = "TurboMouse 5"; init_turbomouse(id); } else if ((req.reply_len == 9) && (req.reply[1] == 0x4b) && (req.reply[2] == 0x4f) && (req.reply[3] == 0x49) && (req.reply[4] == 0x54)) { if (adb_try_handler_change(id, 0x42)) { - pr_cont("\nADB MacAlly 2-button mouse at %d, handler set to 0x42", id); mouse_kind = ADBMOUSE_MACALLY2; + desc = "MacAlly 2-button"; } } } - pr_cont("\n"); adb_get_infos(id, &default_id, &cur_handler_id); + printk(KERN_DEBUG "ADB mouse (%s) at %d has handler 0x%X\n", + desc, id, cur_handler_id); reg |= adbhid_input_reregister(id, default_id, org_handler_id, cur_handler_id, mouse_kind); } @@ -1092,12 +1090,10 @@ init_trackpad(int id) struct adb_request req; unsigned char r1_buffer[8]; - pr_cont(" (trackpad)"); - adb_request(&req, NULL, ADBREQ_SYNC | ADBREQ_REPLY, 1, ADB_READREG(id,1)); if (req.reply_len < 8) - pr_cont("bad length for reg. 1\n"); + pr_err("%s: bad length for reg. 1\n", __func__); else { memcpy(r1_buffer, &req.reply[1], 8); @@ -1145,8 +1141,6 @@ init_trackball(int id) { struct adb_request req; - pr_cont(" (trackman/mouseman)"); - adb_request(&req, NULL, ADBREQ_SYNC, 3, ADB_WRITEREG(id,1), 00,0x81); @@ -1177,8 +1171,6 @@ init_turbomouse(int id) { struct adb_request req; - pr_cont(" (TurboMouse 5)"); - adb_request(&req, NULL, ADBREQ_SYNC, 1, ADB_FLUSH(id)); adb_request(&req, NULL, ADBREQ_SYNC, 1, ADB_FLUSH(3)); @@ -1213,8 +1205,6 @@ init_microspeed(int id) { struct adb_request req; - pr_cont(" (Microspeed/MacPoint or compatible)"); - adb_request(&req, NULL, ADBREQ_SYNC, 1, ADB_FLUSH(id)); /* This will initialize mice using the Microspeed, MacPoint and @@ -1253,7 +1243,6 @@ init_ms_a3(int id) { struct adb_request req; - pr_cont(" (Mouse Systems A3 Mouse, or compatible)"); adb_request(&req, NULL, ADBREQ_SYNC, 3, ADB_WRITEREG(id, 0x2), 0x00, diff --git a/drivers/macintosh/macio_asic.c b/drivers/macintosh/macio_asic.c @@ -360,9 +360,10 @@ static struct macio_dev * macio_add_one_device(struct macio_chip *chip, struct macio_dev *in_bay, struct resource *parent_res) { + char name[MAX_NODE_NAME_SIZE + 1]; struct macio_dev *dev; const u32 *reg; - + if (np == NULL) return NULL; @@ -402,6 +403,7 @@ static struct macio_dev * macio_add_one_device(struct macio_chip *chip, #endif /* MacIO itself has a different reg, we use it's PCI base */ + snprintf(name, sizeof(name), "%pOFn", np); if (np == chip->of_node) { dev_set_name(&dev->ofdev.dev, "%1d.%08x:%.*s", chip->lbus.index, @@ -410,12 +412,12 @@ static struct macio_dev * macio_add_one_device(struct macio_chip *chip, #else 0, /* NuBus may want to do something better here */ #endif - MAX_NODE_NAME_SIZE, np->name); + MAX_NODE_NAME_SIZE, name); } else { reg = of_get_property(np, "reg", NULL); dev_set_name(&dev->ofdev.dev, "%1d.%08x:%.*s", chip->lbus.index, - reg ? *reg : 0, MAX_NODE_NAME_SIZE, np->name); + reg ? *reg : 0, MAX_NODE_NAME_SIZE, name); } /* Setup interrupts & resources */ diff --git a/drivers/macintosh/macio_sysfs.c b/drivers/macintosh/macio_sysfs.c @@ -58,7 +58,13 @@ static ssize_t devspec_show(struct device *dev, static DEVICE_ATTR_RO(modalias); static DEVICE_ATTR_RO(devspec); -macio_config_of_attr (name, "%s\n"); +static ssize_t name_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%pOFn\n", dev->of_node); +} +static DEVICE_ATTR_RO(name); + macio_config_of_attr (type, "%s\n"); static struct attribute *macio_dev_attrs[] = { diff --git a/drivers/macintosh/via-cuda.c b/drivers/macintosh/via-cuda.c @@ -766,3 +766,38 @@ cuda_input(unsigned char *buf, int nb) buf, nb, false); } } + +/* Offset between Unix time (1970-based) and Mac time (1904-based) */ +#define RTC_OFFSET 2082844800 + +time64_t cuda_get_time(void) +{ + struct adb_request req; + u32 now; + + if (cuda_request(&req, NULL, 2, CUDA_PACKET, CUDA_GET_TIME) < 0) + return 0; + while (!req.complete) + cuda_poll(); + if (req.reply_len != 7) + pr_err("%s: got %d byte reply\n", __func__, req.reply_len); + now = (req.reply[3] << 24) + (req.reply[4] << 16) + + (req.reply[5] << 8) + req.reply[6]; + return (time64_t)now - RTC_OFFSET; +} + +int cuda_set_rtc_time(struct rtc_time *tm) +{ + u32 now; + struct adb_request req; + + now = lower_32_bits(rtc_tm_to_time64(tm) + RTC_OFFSET); + if (cuda_request(&req, NULL, 6, CUDA_PACKET, CUDA_SET_TIME, + now >> 24, now >> 16, now >> 8, now) < 0) + return -ENXIO; + while (!req.complete) + cuda_poll(); + if ((req.reply_len != 3) && (req.reply_len != 7)) + pr_err("%s: got %d byte reply\n", __func__, req.reply_len); + return 0; +} diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c @@ -12,7 +12,7 @@ * * 1999-08-02 (jmt) - Initial rewrite for Unified ADB. * 2000-03-29 Tony Mantler <tonym@mac.linux-m68k.org> - * - Big overhaul, should actually work now. + * - Big overhaul, should actually work now. * 2006-12-31 Finn Thain - Another overhaul. * * Suggested reading: @@ -23,7 +23,7 @@ * Apple's "ADB Analyzer" bus sniffer is invaluable: * ftp://ftp.apple.com/developer/Tool_Chest/Devices_-_Hardware/Apple_Desktop_Bus/ */ - + #include <stdarg.h> #include <linux/types.h> #include <linux/errno.h> @@ -77,7 +77,7 @@ static volatile unsigned char *via; #define ST_ODD 0x20 /* ADB state: odd data byte */ #define ST_IDLE 0x30 /* ADB state: idle, nothing to send */ -static int macii_init_via(void); +static int macii_init_via(void); static void macii_start(void); static irqreturn_t macii_interrupt(int irq, void *arg); static void macii_queue_poll(void); @@ -120,31 +120,15 @@ static int srq_asserted; /* have to poll for the device that asserted it */ static int command_byte; /* the most recent command byte transmitted */ static int autopoll_devs; /* bits set are device addresses to be polled */ -/* Sanity check for request queue. Doesn't check for cycles. */ -static int request_is_queued(struct adb_request *req) { - struct adb_request *cur; - unsigned long flags; - local_irq_save(flags); - cur = current_req; - while (cur) { - if (cur == req) { - local_irq_restore(flags); - return 1; - } - cur = cur->next; - } - local_irq_restore(flags); - return 0; -} - /* Check for MacII style ADB */ static int macii_probe(void) { - if (macintosh_config->adb_type != MAC_ADB_II) return -ENODEV; + if (macintosh_config->adb_type != MAC_ADB_II) + return -ENODEV; via = via1; - printk("adb: Mac II ADB Driver v1.0 for Unified ADB\n"); + pr_info("adb: Mac II ADB Driver v1.0 for Unified ADB\n"); return 0; } @@ -153,15 +137,17 @@ int macii_init(void) { unsigned long flags; int err; - + local_irq_save(flags); - + err = macii_init_via(); - if (err) goto out; + if (err) + goto out; err = request_irq(IRQ_MAC_ADB, macii_interrupt, 0, "ADB", macii_interrupt); - if (err) goto out; + if (err) + goto out; macii_state = idle; out: @@ -169,7 +155,7 @@ out: return err; } -/* initialize the hardware */ +/* initialize the hardware */ static int macii_init_via(void) { unsigned char x; @@ -179,7 +165,7 @@ static int macii_init_via(void) /* Set up state: idle */ via[B] |= ST_IDLE; - last_status = via[B] & (ST_MASK|CTLR_IRQ); + last_status = via[B] & (ST_MASK | CTLR_IRQ); /* Shift register on input */ via[ACR] = (via[ACR] & ~SR_CTRL) | SR_EXT; @@ -205,7 +191,8 @@ static void macii_queue_poll(void) int next_device; static struct adb_request req; - if (!autopoll_devs) return; + if (!autopoll_devs) + return; device_mask = (1 << (((command_byte & 0xF0) >> 4) + 1)) - 1; if (autopoll_devs & ~device_mask) @@ -213,10 +200,7 @@ static void macii_queue_poll(void) else next_device = ffs(autopoll_devs) - 1; - BUG_ON(request_is_queued(&req)); - - adb_request(&req, NULL, ADBREQ_NOSEND, 1, - ADB_READREG(next_device, 0)); + adb_request(&req, NULL, ADBREQ_NOSEND, 1, ADB_READREG(next_device, 0)); req.sent = 0; req.complete = 0; @@ -235,45 +219,47 @@ static void macii_queue_poll(void) static int macii_send_request(struct adb_request *req, int sync) { int err; - unsigned long flags; - BUG_ON(request_is_queued(req)); - - local_irq_save(flags); err = macii_write(req); - local_irq_restore(flags); + if (err) + return err; - if (!err && sync) { - while (!req->complete) { + if (sync) + while (!req->complete) macii_poll(); - } - BUG_ON(request_is_queued(req)); - } - return err; + return 0; } /* Send an ADB request (append to request queue) */ static int macii_write(struct adb_request *req) { + unsigned long flags; + if (req->nbytes < 2 || req->data[0] != ADB_PACKET || req->nbytes > 15) { req->complete = 1; return -EINVAL; } - + req->next = NULL; req->sent = 0; req->complete = 0; req->reply_len = 0; + local_irq_save(flags); + if (current_req != NULL) { last_req->next = req; last_req = req; } else { current_req = req; last_req = req; - if (macii_state == idle) macii_start(); + if (macii_state == idle) + macii_start(); } + + local_irq_restore(flags); + return 0; } @@ -287,7 +273,8 @@ static int macii_autopoll(int devs) /* bit 1 == device 1, and so on. */ autopoll_devs = devs & 0xFFFE; - if (!autopoll_devs) return 0; + if (!autopoll_devs) + return 0; local_irq_save(flags); @@ -304,7 +291,8 @@ static int macii_autopoll(int devs) return err; } -static inline int need_autopoll(void) { +static inline int need_autopoll(void) +{ /* Was the last command Talk Reg 0 * and is the target on the autopoll list? */ @@ -317,21 +305,17 @@ static inline int need_autopoll(void) { /* Prod the chip without interrupts */ static void macii_poll(void) { - disable_irq(IRQ_MAC_ADB); macii_interrupt(0, NULL); - enable_irq(IRQ_MAC_ADB); } /* Reset the bus */ static int macii_reset_bus(void) { static struct adb_request req; - - if (request_is_queued(&req)) - return 0; /* Command = 0, Address = ignored */ - adb_request(&req, NULL, 0, 1, ADB_BUSRESET); + adb_request(&req, NULL, ADBREQ_NOSEND, 1, ADB_BUSRESET); + macii_send_request(&req, 1); /* Don't want any more requests during the Global Reset low time. */ udelay(3000); @@ -346,10 +330,6 @@ static void macii_start(void) req = current_req; - BUG_ON(req == NULL); - - BUG_ON(macii_state != idle); - /* Now send it. Be careful though, that first byte of the request * is actually ADB_PACKET; the real data begins at index 1! * And req->nbytes is the number of bytes of real data plus one. @@ -375,7 +355,7 @@ static void macii_start(void) * to be activity on the ADB bus. The chip will poll to achieve this. * * The basic ADB state machine was left unchanged from the original MacII code - * by Alan Cox, which was based on the CUDA driver for PowerMac. + * by Alan Cox, which was based on the CUDA driver for PowerMac. * The syntax of the ADB status lines is totally different on MacII, * though. MacII uses the states Command -> Even -> Odd -> Even ->...-> Idle * for sending and Idle -> Even -> Odd -> Even ->...-> Idle for receiving. @@ -387,164 +367,166 @@ static void macii_start(void) static irqreturn_t macii_interrupt(int irq, void *arg) { int x; - static int entered; struct adb_request *req; + unsigned long flags; + + local_irq_save(flags); if (!arg) { /* Clear the SR IRQ flag when polling. */ if (via[IFR] & SR_INT) via[IFR] = SR_INT; - else + else { + local_irq_restore(flags); return IRQ_NONE; + } } - BUG_ON(entered++); - last_status = status; - status = via[B] & (ST_MASK|CTLR_IRQ); + status = via[B] & (ST_MASK | CTLR_IRQ); switch (macii_state) { - case idle: - if (reading_reply) { - reply_ptr = current_req->reply; - } else { - BUG_ON(current_req != NULL); - reply_ptr = reply_buf; - } + case idle: + if (reading_reply) { + reply_ptr = current_req->reply; + } else { + WARN_ON(current_req); + reply_ptr = reply_buf; + } - x = via[SR]; + x = via[SR]; - if ((status & CTLR_IRQ) && (x == 0xFF)) { - /* Bus timeout without SRQ sequence: - * data is "FF" while CTLR_IRQ is "H" - */ - reply_len = 0; - srq_asserted = 0; - macii_state = read_done; - } else { - macii_state = reading; - *reply_ptr = x; - reply_len = 1; - } + if ((status & CTLR_IRQ) && (x == 0xFF)) { + /* Bus timeout without SRQ sequence: + * data is "FF" while CTLR_IRQ is "H" + */ + reply_len = 0; + srq_asserted = 0; + macii_state = read_done; + } else { + macii_state = reading; + *reply_ptr = x; + reply_len = 1; + } - /* set ADB state = even for first data byte */ - via[B] = (via[B] & ~ST_MASK) | ST_EVEN; - break; + /* set ADB state = even for first data byte */ + via[B] = (via[B] & ~ST_MASK) | ST_EVEN; + break; - case sending: - req = current_req; - if (data_index >= req->nbytes) { - req->sent = 1; - macii_state = idle; - - if (req->reply_expected) { - reading_reply = 1; - } else { - req->complete = 1; - current_req = req->next; - if (req->done) (*req->done)(req); - - if (current_req) - macii_start(); - else - if (need_autopoll()) - macii_autopoll(autopoll_devs); - } + case sending: + req = current_req; + if (data_index >= req->nbytes) { + req->sent = 1; + macii_state = idle; - if (macii_state == idle) { - /* reset to shift in */ - via[ACR] &= ~SR_OUT; - x = via[SR]; - /* set ADB state idle - might get SRQ */ - via[B] = (via[B] & ~ST_MASK) | ST_IDLE; - } + if (req->reply_expected) { + reading_reply = 1; } else { - via[SR] = req->data[data_index++]; - - if ( (via[B] & ST_MASK) == ST_CMD ) { - /* just sent the command byte, set to EVEN */ - via[B] = (via[B] & ~ST_MASK) | ST_EVEN; - } else { - /* invert state bits, toggle ODD/EVEN */ - via[B] ^= ST_MASK; - } - } - break; - - case reading: - x = via[SR]; - BUG_ON((status & ST_MASK) == ST_CMD || - (status & ST_MASK) == ST_IDLE); - - /* Bus timeout with SRQ sequence: - * data is "XX FF" while CTLR_IRQ is "L L" - * End of packet without SRQ sequence: - * data is "XX...YY 00" while CTLR_IRQ is "L...H L" - * End of packet SRQ sequence: - * data is "XX...YY 00" while CTLR_IRQ is "L...L L" - * (where XX is the first response byte and - * YY is the last byte of valid response data.) - */ + req->complete = 1; + current_req = req->next; + if (req->done) + (*req->done)(req); - srq_asserted = 0; - if (!(status & CTLR_IRQ)) { - if (x == 0xFF) { - if (!(last_status & CTLR_IRQ)) { - macii_state = read_done; - reply_len = 0; - srq_asserted = 1; - } - } else if (x == 0x00) { - macii_state = read_done; - if (!(last_status & CTLR_IRQ)) - srq_asserted = 1; - } + if (current_req) + macii_start(); + else if (need_autopoll()) + macii_autopoll(autopoll_devs); } - if (macii_state == reading) { - BUG_ON(reply_len > 15); - reply_ptr++; - *reply_ptr = x; - reply_len++; + if (macii_state == idle) { + /* reset to shift in */ + via[ACR] &= ~SR_OUT; + x = via[SR]; + /* set ADB state idle - might get SRQ */ + via[B] = (via[B] & ~ST_MASK) | ST_IDLE; } + } else { + via[SR] = req->data[data_index++]; - /* invert state bits, toggle ODD/EVEN */ - via[B] ^= ST_MASK; - break; + if ((via[B] & ST_MASK) == ST_CMD) { + /* just sent the command byte, set to EVEN */ + via[B] = (via[B] & ~ST_MASK) | ST_EVEN; + } else { + /* invert state bits, toggle ODD/EVEN */ + via[B] ^= ST_MASK; + } + } + break; - case read_done: - x = via[SR]; + case reading: + x = via[SR]; + WARN_ON((status & ST_MASK) == ST_CMD || + (status & ST_MASK) == ST_IDLE); + + /* Bus timeout with SRQ sequence: + * data is "XX FF" while CTLR_IRQ is "L L" + * End of packet without SRQ sequence: + * data is "XX...YY 00" while CTLR_IRQ is "L...H L" + * End of packet SRQ sequence: + * data is "XX...YY 00" while CTLR_IRQ is "L...L L" + * (where XX is the first response byte and + * YY is the last byte of valid response data.) + */ - if (reading_reply) { - reading_reply = 0; - req = current_req; - req->reply_len = reply_len; - req->complete = 1; - current_req = req->next; - if (req->done) (*req->done)(req); - } else if (reply_len && autopoll_devs) - adb_input(reply_buf, reply_len, 0); + srq_asserted = 0; + if (!(status & CTLR_IRQ)) { + if (x == 0xFF) { + if (!(last_status & CTLR_IRQ)) { + macii_state = read_done; + reply_len = 0; + srq_asserted = 1; + } + } else if (x == 0x00) { + macii_state = read_done; + if (!(last_status & CTLR_IRQ)) + srq_asserted = 1; + } + } - macii_state = idle; + if (macii_state == reading && + reply_len < ARRAY_SIZE(reply_buf)) { + reply_ptr++; + *reply_ptr = x; + reply_len++; + } - /* SRQ seen before, initiate poll now */ - if (srq_asserted) - macii_queue