- Programming language and platform layer
- OS layer
- CPU layer
- Putting Mechanical Sympathy into practice
- Where do I learn about this stuff?
Mechanical Sympathy was a term used by Formula 1 racing driver Jackie Stewart, which performance specialist Martin Thompson adopted for software engineering practices. It means developing an understanding of how the machines you work with operate, so that you can get the best out of them.
Applied to software development, this means developing awareness at multiple levels:
- Programming language and platform (e.g. the VM for VM-managed languages)
- Operating system
- CPU
The further down the layers you go (the closer to the metal), the more general and cross-platform the mechanical sympathy becomes. Once you know how to design data structures that are safe to share across CPU cores, you can design them on any OS or language platform by learning how to access those CPU features from the higher layers.
Specific guidance below will be directed at the Java and JVM ecosystems.
Things to be aware of are:
- allocation and its effect on performance
- memory layout and its effect on memory access patterns
- GC and its effects on the application as a whole. For example, a super-optimised critical-path thread with zero allocations and a great memory layout does little good if non-critical-path threads in the same process generate a whole host of garbage ("no need to optimise these 'cause they're outside the critical path, y'know"). That "non-critical" garbage may trigger stop-the-world (STW) GC pauses that completely swamp your super-optimised thread, because an STW pause halts every application thread in the process, not just the ones that allocated.
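To make the memory layout point concrete, here is a minimal sketch that sums the same `long[][]` in two traversal orders. The class name `TraversalOrder` and the helper `fill` are made up for illustration, and the timing in `main` is deliberately naive (no warm-up, no JMH), so treat any numbers it prints as a rough hint rather than a measurement. The assumption is that each row is a separate contiguous `long[]`, which is how Java lays out nested arrays, so walking along a row tends to be friendlier to caches and prefetchers than jumping between rows.

```java
public final class TraversalOrder {

    // Row-major: the inner loop walks one contiguous long[] from start to end.
    static long sumRowMajor(long[][] m) {
        long sum = 0;
        for (long[] row : m) {
            for (int c = 0; c < row.length; c++) {
                sum += row[c];
            }
        }
        return sum;
    }

    // Column-major: consecutive reads land in different row arrays, so each
    // access is likely to touch a different cache line.
    static long sumColumnMajor(long[][] m) {
        long sum = 0;
        int cols = (m.length == 0) ? 0 : m[0].length;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < m.length; r++) {
                sum += m[r][c];
            }
        }
        return sum;
    }

    // Hypothetical helper: builds a rows x cols matrix with cell = r * cols + c.
    static long[][] fill(int rows, int cols) {
        long[][] m = new long[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                m[r][c] = (long) r * cols + c;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        long[][] m = fill(1024, 1024);
        long t0 = System.nanoTime();
        long a = sumRowMajor(m);
        long t1 = System.nanoTime();
        long b = sumColumnMajor(m);
        long t2 = System.nanoTime();
        // Both sums are identical; only the access pattern differs.
        System.out.println("row-major    sum=" + a + " ns=" + (t1 - t0));
        System.out.println("column-major sum=" + b + " ns=" + (t2 - t1));
    }
}
```

Both methods do the same arithmetic, so any difference in run time comes from the access pattern alone. For trustworthy numbers you would want a proper benchmark harness.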
- OS scheduling and process preemption
- Context switches
- Processor affinity: CPU pinning and isolation
- User mode vs. kernel mode and transitioning costs
- Clocks and their accuracy vs. precision, and the difference between monotonic and wall clocks
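On the JVM, the clock distinction shows up as `System.nanoTime()` versus `System.currentTimeMillis()` or `Instant.now()`. The sketch below uses a made-up class `Clocks` with a hypothetical helper `elapsedNanos`. It assumes the usual guidance: use `nanoTime` for measuring intervals, because its values are only meaningful as differences and are not tied to wall-clock time, and use the wall clock only for timestamps, because it can be adjusted (for example by NTP) while you are measuring. Note that `nanoTime` reports in nanosecond units but does not promise nanosecond resolution.

```java
import java.time.Instant;

public final class Clocks {

    // Measures an interval with System.nanoTime(). Only the difference
    // between two readings is meaningful; a single reading is not a timestamp.
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long nanos = elapsedNanos(() -> {
            long x = 0;
            for (int i = 0; i < 1_000_000; i++) {
                x += i;
            }
            // Use the result so the loop is not trivially dead code.
            if (x < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        });
        System.out.println("elapsed (interval clock): " + nanos + " ns");

        // Wall clock: good for "when did this happen", not for "how long".
        System.out.println("timestamp (wall clock):   " + Instant.now());
    }
}
```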
- Develop an understanding of CPU architecture and design. Things like:
- instruction pipeline, microcode, micro-ops, and out-of-order execution
- prefetching - both instruction and data (memory)
- cores and sockets, NUMA CPU architectures
- cache levels (L1, L2, L3)
- instruction cache
- cache coherence protocols (MESI and variants)
- CPU memory models and memory barriers and fences
- Understand what False Sharing is, and why it occurs
- Compare and Swap (CAS), aka Compare and Exchange (CMPXCHG), and other atomic operations
- Lock free algorithms
- Lock free vs. wait free
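As a small illustration of CAS and of the lock-free versus wait-free distinction, the sketch below tracks a running maximum with `AtomicLong.compareAndSet`. The class `CasMax` and its `runDemo` helper are invented for this example. The retry loop is lock-free, because whenever one thread's CAS fails another thread's CAS has succeeded, so the system as a whole always makes progress. It is not wait-free, because an individual thread has no bound on how many times it may have to retry under contention.

```java
import java.util.concurrent.atomic.AtomicLong;

public final class CasMax {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Lock-free update: read, decide, then publish with CAS; retry on conflict.
    void record(long value) {
        while (true) {
            long current = max.get();
            if (value <= current) {
                return; // someone already recorded a value at least as large
            }
            if (max.compareAndSet(current, value)) {
                return; // our value was installed atomically
            }
            // CAS failed: another thread changed max in between, so loop again.
        }
    }

    long get() {
        return max.get();
    }

    // Hypothetical demo: each thread records its own disjoint range of values.
    static long runDemo(int threads, int perThread) {
        CasMax tracker = new CasMax();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    tracker.record(base + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return tracker.get();
    }

    public static void main(String[] args) {
        // Largest value recorded is threads * perThread - 1.
        System.out.println(runDemo(4, 100_000)); // prints 399999
    }
}
```

On x86, HotSpot typically compiles `compareAndSet` down to a `LOCK CMPXCHG` instruction, which is how the higher layer reaches the CPU feature described above.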
Mechanically sympathetic patterns include:
- Single Writer Principle
- Smart Batching
- Back pressure
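A rough sketch of how two of these patterns can look in plain Java, using only `java.util.concurrent`. The class `BatchingDemo` and its methods are invented for illustration and are not taken from any of the libraries mentioned in this article. Back pressure is modelled by a bounded queue whose `offer` refuses work when full, instead of buffering without limit. Smart batching is modelled by `drainTo`: the consumer takes whatever has accumulated and handles it as one batch, so batches grow naturally under load. The demo runs on a single thread to keep the output deterministic. In a real system the consumer would be its own thread and, following the Single Writer Principle, the only one mutating `batchSizes`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public final class BatchingDemo {
    private final BlockingQueue<String> queue;
    // Touched only by the consumer side (single writer).
    private final List<Integer> batchSizes = new ArrayList<>();

    BatchingDemo(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Back pressure: returns false when the queue is full, so the producer
    // has to slow down, retry later, or shed load.
    boolean submit(String message) {
        return queue.offer(message);
    }

    // Smart batching: drain everything currently queued and treat it as one
    // unit of work (a stand-in for a single write or flush).
    int drainOnce() {
        List<String> batch = new ArrayList<>();
        int n = queue.drainTo(batch);
        if (n > 0) {
            batchSizes.add(n);
        }
        return n;
    }

    List<Integer> batchSizes() {
        return batchSizes;
    }

    public static void main(String[] args) {
        BatchingDemo demo = new BatchingDemo(4);
        int accepted = 0;
        int rejected = 0;
        for (int i = 0; i < 6; i++) {
            if (demo.submit("msg-" + i)) {
                accepted++;
            } else {
                rejected++;
            }
        }
        System.out.println("accepted=" + accepted + " rejected=" + rejected);
        System.out.println("batch size=" + demo.drainOnce());
    }
}
```

With a capacity of 4 and 6 submissions before any drain, the first four are accepted, the last two are rejected, and the single drain handles a batch of four.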
- Martin Thompson is the guy who applied the term to software development. Find some of his thoughts on his blog, or check out his highly optimised code libraries.
- Gil Tene is the CTO of Azul Systems, known for its pauseless Zing JVM. He is the author of various Java utilities in the high-performance space, such as HdrHistogram and jHiccup. You should also watch his talk "How NOT to Measure Latency", in which he describes the Coordinated Omission problem in latency measurements.
- Read Nitsan Wakart's thoughts on his blog, at your peril. He is also the author of JCTools, a collection of low-latency Java Concurrency Tools.
- Aleksey Shipilëv puts the scientific method through its paces as he digs down into the lowest levels of the JVM and GC algorithms.
- The Intel CPU manuals list all the x86 CPU instructions and opcodes. Also check out the Wikipedia page on the topic.