Is there a tradeoff between memory latency and arithmetic intensity when deciding what to precompute? I can imagine that if memory is expensive to access, it might be more efficient to compute some of these values on the fly.
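One rough way to frame the question is a roofline-style estimate. Computing a value costs some FLOPs, and loading a precomputed copy costs some bytes of memory traffic. Recomputing wins when the value's arithmetic intensity (FLOPs per byte it would otherwise load) is below the machine balance (peak FLOP/s divided by memory bandwidth). The sketch below is only that estimate. The `Machine` numbers are hypothetical. It models bandwidth rather than per-access latency, and it ignores caches, latency hiding, and overlap of compute with loads, all of which matter in practice.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    peak_flops: float     # peak arithmetic throughput, FLOP/s (hypothetical)
    mem_bandwidth: float  # sustained memory bandwidth, bytes/s (hypothetical)


def machine_balance(machine: Machine) -> float:
    """FLOPs the machine can execute in the time it takes to move one byte."""
    return machine.peak_flops / machine.mem_bandwidth


def recompute_is_cheaper(machine: Machine, flops_per_value: float,
                         bytes_per_value: float) -> bool:
    """Estimate whether computing a value beats loading a precomputed copy.

    Bandwidth-only model: compute time is flops / peak_flops and load time
    is bytes / bandwidth. Latency, caching and overlap are not modeled.
    """
    compute_time = flops_per_value / machine.peak_flops
    load_time = bytes_per_value / machine.mem_bandwidth
    return compute_time < load_time


# Hypothetical accelerator: 10 TFLOP/s, 1 TB/s, so a balance of 10 FLOP/byte.
gpu = Machine(peak_flops=10e12, mem_bandwidth=1e12)
print(machine_balance(gpu))               # 10.0
print(recompute_is_cheaper(gpu, 20, 4))   # True: 5 FLOP/byte < 10
print(recompute_is_cheaper(gpu, 100, 4))  # False: 25 FLOP/byte > 10
```

Under this model, a cheap function of a 4-byte value is worth recomputing on a compute-heavy machine, while an expensive one is worth storing. If accesses are scattered rather than streamed, latency makes each load costlier, which shifts the threshold further toward recomputation.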