My SIMD details are a little rusty (sorry Kayvon!). Why is not using all of our SIMD lanes a problem? Surely our SIMD could be doing more, but why is a SIMD without full utilization worse than just using a single processor?
Not using all SIMD lanes signifies a loss in efficiency. A machine with SIMD width N can do N math operations at once, with the constraint that this is possible only when all the operations are the same. A packet whose rays are all active fits this criterion, so the throughput of the machine will be ~N. If a packet has rays that are "inactive", then the application is giving the machine fewer than N similar things to do, and some of the machine's capability goes to waste.
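To put rough numbers on the point above, here is a minimal sketch of how lane utilization could be counted. It assumes a SIMD width of 8 and models a packet as a list of booleans (True means the ray in that lane is active). The names `simd_utilization` and `useful_ops_per_instruction` are made up for illustration and do not come from any real ray tracer.

```python
def simd_utilization(active_mask):
    """Fraction of lanes doing useful work for one SIMD instruction."""
    return sum(active_mask) / len(active_mask)


def useful_ops_per_instruction(packets):
    """Average number of useful ray operations per SIMD instruction.

    Each packet costs one instruction regardless of how many of its
    lanes are active, so inactive lanes lower this average.
    """
    useful = sum(sum(mask) for mask in packets)
    return useful / len(packets)


# Hypothetical packets for an assumed SIMD width of N = 8.
full_packet = [True] * 8
partial_packet = [True, True, False, True, False, False, False, False]

print(simd_utilization(full_packet))     # all 8 lanes active
print(simd_utilization(partial_packet))  # only 3 of 8 lanes active
print(useful_ops_per_instruction([full_packet, partial_packet]))
```

In this model the partially active packet still costs a full instruction but only advances 3 rays, which is the sense in which the remaining lanes are wasted.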