There is a thing you see a lot in incident management, and in the customer response to incidents: the demand for a Root Cause Analysis (RCA) and the identification of a root cause.
This is misleading.
Most systems are distributed systems these days. They're complex, which makes for complex failures, and complex failures are interlocking failures.
Failure-Mode Analysis is a better term than RCA.
The Interstate 35W bridge in Minneapolis fell into the Mississippi River in August 2007. The failures there were many:
* A bad design (known)
* Lots of salt used on MN roads (known)
* Inspections missed a few things (unknown)
* Incorrect understanding about how cracks move in that particular structure (unknown)
All of it failed together; no single item on that list was the root cause.