I think I missed the explanation -- what's the purpose of cosine-weighted sampling? Is it just importance sampling based on the assumption that directions close to the horizon are less likely to contribute, or something else?

This paper talks about a method of combining ray tracing and rasterization, and it comes with some good example images: https://pdfs.semanticscholar.org/008d/6628e0787a95b802dae28546593078d4ab7a.pdf

That said, I don't know how much this is done in practice.

@cou I think you are right about your definition of theta, but I'm not sure what you mean by the hemisphere normal being defined by the middle expression?

@nrauen Sure, you can just generate a random point in some cube (or rectangular bounding box), and then check if the point is within the shape you're interested in. For example, for a sphere, you could check if it is < r away from the sphere's center. But as you move to higher and higher dimensions, your probability of success decreases.
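To make the dimension problem concrete, here's a small sketch (the helper name is my own, not from the slides) that estimates the acceptance rate of rejection sampling for the ball inscribed in a d-dimensional cube:

```python
import random

def rejection_sample_ball(dim, trials=100_000, rng=random.Random(0)):
    """Estimate the acceptance rate of rejection sampling: the fraction
    of uniform points in the cube [-1, 1]^dim that land inside the
    inscribed unit ball."""
    hits = 0
    for _ in range(trials):
        point = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        if sum(c * c for c in point) < 1.0:  # inside the ball?
            hits += 1
    return hits / trials
```

For dim = 2 the rate is about pi/4 ≈ 0.785, for dim = 3 about pi/6 ≈ 0.524, and by dim = 10 it has collapsed to roughly 0.0025, which is why rejection sampling gets expensive in high dimensions.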

In the discrete case, what we're trying to compute is:

$$ \sum_{x \in \Omega} f(x) $$

So if we sample a point x with probability p(x), then we need to account for that by dividing by the sampling probability, and then taking n samples we get:

$$ \frac{1}{n} \sum_{i=1}^{n} \frac{f(x_i)}{p(x_i)} $$

(Note that dividing by $p(x_i)$ already accounts for the size of $\Omega$: under uniform sampling, $p(x) = 1/|\Omega|$, so the $|\Omega|$ factor falls out automatically.)

The same logic should apply in the continuous case, so we're trying to compute:

$$ \int_{\Omega} f(x)\, dx $$

Again, we need to account for our sampling bias. The weight with which each point biases the sample is exactly its value in the probability density function, so we again divide by that. We then take the same approach to approximating as before, and end up with the same result.
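One way to see that no stray scalar survives: take the expectation of a single-sample estimate $f(X)/p(X)$ with $X$ drawn from the density $p$:

$$ \mathbb{E}\left[\frac{f(X)}{p(X)}\right] = \int_{\Omega} \frac{f(x)}{p(x)}\, p(x)\, dx = \int_{\Omega} f(x)\, dx $$

so averaging $n$ such samples gives an unbiased estimate of the integral itself.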

So @lykospirit, assuming this intuition is correct, that should work so long as the area of the red and blue rectangles are each 1 (so the integral of p(x) over the entire area is 1, so it's a valid PDF).
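As a sketch of the estimator in code (the piecewise pdf below is made up, just in the spirit of the red/blue rectangles, and all names are my own):

```python
import random

def mc_integrate(f, pdf, sampler, n=200_000, rng=random.Random(1)):
    """Monte Carlo estimate of the integral of f over [0, 1]:
    draw x from the density `pdf` via `sampler`, then average f(x)/pdf(x)."""
    total = 0.0
    for _ in range(n):
        x = sampler(rng)
        total += f(x) / pdf(x)
    return total / n

# A made-up piecewise-constant pdf on [0, 1]: density 1.6 on [0, 0.5)
# and 0.4 on [0.5, 1], which integrates to 1.6*0.5 + 0.4*0.5 = 1,
# so it is a valid pdf.
def pdf(x):
    return 1.6 if x < 0.5 else 0.4

def sampler(rng):
    # The left half carries total mass 0.8, so pick it with probability
    # 0.8, then sample uniformly within whichever half was chosen.
    if rng.random() < 0.8:
        return rng.uniform(0.0, 0.5)
    return rng.uniform(0.5, 1.0)

estimate = mc_integrate(lambda x: x * x, pdf, sampler)  # true value: 1/3
```

The estimate converges to 1/3 regardless of which (valid) pdf we use; only the variance changes.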

Is there ever a need to use both ray tracing and rasterization in the pipeline together, or is it best to stick with one? A good example would go a long way if you can find one.

Could someone explain what the different quantities $\epsilon$ measure? I know that $\epsilon$ is the resulting probability at the point, but I don't know why there are both $\epsilon_1$ and $\epsilon_2$.

Why are there three terms on the RHS of the equation? What does each of these three terms correspond to?

How do we deal with a light that is multiple point sources?

What is theta in this case? The angle between the hemisphere normal and a random vector? And is the hemisphere normal defined by the middle expression?

@BellaJ That's right! If we let the square have side length of 1, then it has area 1. The circle, inscribed in the square, has radius 0.5 and area $\pi r^2 = \pi (0.5)^2 = \frac{\pi}{4}$.

Then, we have

$$ \frac{\text{Area of the circle}}{\text{Area of the square}} = \frac{\pi / 4}{1} = \frac{\pi}{4} $$

$$ \implies \frac{\pi}{4} = \frac{\text{Coconuts in circle}}{\text{All coconuts}} \implies \pi \approx 4 \times \frac{\text{Coconuts in circle}}{\text{All coconuts}} $$

And, as we throw more and more coconuts, the ratio of coconuts in the circle to all coconuts thrown gets closer and closer to the true value of $\pi$, as desired.
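The coconut experiment is easy to sketch in code (the names here are my own):

```python
import random

def estimate_pi(n, rng=random.Random(42)):
    """Throw n random "coconuts" into the unit square and count how
    many land inside the inscribed circle of radius 0.5; pi is
    approximately 4 times the hit ratio."""
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            inside += 1
    return 4.0 * inside / n
```

The error shrinks like $1/\sqrt{n}$, so each extra digit of $\pi$ costs roughly 100x more coconuts.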

This might be a trivial question but where does the biasing term p(omega) go?

I was wondering about this too: the red/blue example makes sense with a discrete probability distribution, but how does this work with a continuous one?

Is the p(X_i) in the expression for the Monte Carlo integral a "discretized" version of the probability distribution? Like in the red/blue example, for any point (x,y) in [0,1]^2, we assign p(X_i)=8/9 if x<1/2, p(X_i)=1/9 if x>=1/2?

I think importance sampling also makes sense from a variance perspective:

If we consider x as taking values from -inf to inf, the mean amount of light across all possible values of x is essentially 0, hence it would make sense to sample at points with high variance, i.e. where f(x) deviates from zero.

While, in the context of the BRDF, the integral is over a bounded range of theta and phi, the same idea applies: for a single light source, most sample points will have little to no light coming from that light source, so the mean is close to zero.

Conversely, if there was uniformly a lot of light across the hemisphere, it would make sense to sample uniformly as well since the variance is equal throughout the hemisphere.

I think for the f(x) in the example above, for the values of x shown in the graph, the mean would be somewhere in the middle of the plotted y-axis, so technically both the left and right sides of the graph have similar variances and a uniform sampling might do better? Either way, p_1(x) is still a better choice than p_2(x), but I think it's not the best illustrative example for the other sampling examples on the slide?

Is albedo the "Value" of the color in this case?

The missing sign is "approximates"

I understand that dividing by p is necessary when averaging samples, but how can we prove that the result converges to the integral itself, rather than the integral times some scalar? A rigorous proof doesn't seem easy, since p(X) is a probability density rather than a probability.

How do we deal with the discrete case? For example, for a directional light, what is p(w) for each direction? If p(w) is 1 for the specific solid angle (and 0 elsewhere), then how could the integral of p(w) be 1? Moreover, what is p(w) if a certain light has two fixed directions? Is p(w) = 0.5 for each of those two directions?

The examples given show the quadrature points as evenly spaced along the x-axis. Is this always the case or are these just carefully chosen examples?

Is the coconut experiment evaluating pi based on how many times the coconut lands inside the circle vs. outside the circle?

Since this hemispherical light image showed up in last lecture's slides, it has been a mystery to me which part of the car the bright area in the middle of the globe corresponds to.

So, for our PathTracer, this circle distribution should be what our random sampler weighted by cosine should look from above, right?
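If it helps, here's one common way (Malley's method) to generate such cosine-weighted directions, assuming a hemisphere about the +z axis; this is a sketch, not necessarily what the assignment expects:

```python
import math
import random

def cosine_weighted_direction(rng=random.Random(7)):
    """Sample a unit direction on the hemisphere about +z with pdf
    proportional to cos(theta), via Malley's method: pick a point
    uniformly on the unit disk, then project it up onto the hemisphere."""
    r = math.sqrt(rng.random())           # sqrt warp gives uniform disk area
    phi = 2.0 * math.pi * rng.random()
    x, y = r * math.cos(phi), r * math.sin(phi)
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)
```

Viewed from above (along +z), these samples look exactly like the uniform disk distribution, which is why the projection works.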

I understand it is possible to do a more explicit version of a sphere, but is there an equivalent rejection sampling method for higher dimensions?

Is this a similar idea to polynomial interpolation (i.e. a set of n + 1 points uniquely defines a polynomial of degree at most n)?

I think the outer one could be any rectangle since we can generate uniform samples for rectangles easily. This square is just the smallest rectangle that encloses the circle.

I believe you can actually use any parameterization of the 2D region as long as you warp the random variables correctly. If you wanted, you could use the first random variable, A, to determine the X coord and the second variable, B, for the Y coord in the spiral. Then you would need a function f(x) that gives the area of the spiral to the left of x (divided by the total area); apply f^-1(A) to determine the X coord. Then make a function g_x(y) that returns the length along that vertical slice of the region up to y (divided by the slice's total length), and compute g_x^-1(B) to get the Y coord.
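Here's a rough numeric version of that inverse-CDF warping idea in one dimension, assuming we only have the density as a black box (all names are hypothetical):

```python
import bisect
import random

def make_numeric_inverse_cdf(density, lo, hi, bins=10_000):
    """Tabulate the CDF of a (possibly unnormalized) 1-D density on
    [lo, hi] and return a function mapping uniform u in [0, 1) to a
    sample distributed according to that density."""
    dx = (hi - lo) / bins
    cdf = [0.0]
    for i in range(bins):
        x_mid = lo + (i + 0.5) * dx          # midpoint rule per bin
        cdf.append(cdf[-1] + density(x_mid) * dx)
    total = cdf[-1]
    cdf = [c / total for c in cdf]           # normalize so cdf[-1] == 1

    def sample(u):
        i = min(bisect.bisect_right(cdf, u) - 1, bins - 1)
        c0, c1 = cdf[i], cdf[i + 1]
        t = (u - c0) / (c1 - c0) if c1 > c0 else 0.0  # interpolate in bin
        return lo + (i + t) * dx
    return sample

# Sanity check: density(x) = 2x on [0, 1] has CDF x^2, so the inverse
# should behave like sqrt(u).
sample = make_numeric_inverse_cdf(lambda x: 2.0 * x, 0.0, 1.0)
```

The same trick applied twice (once for the marginal in x, once for the conditional slice in y) gives the 2D warp described above, even when the shape has no nice closed form.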

How does this paradox make sense intuitively?

I was reading about Gaussian quadrature on Wikipedia. It says that n points suffice to exactly integrate polynomials of degree up to 2n - 1.

https://en.wikipedia.org/wiki/Gaussian_quadrature

Maybe we will have to approximate the probability distribution?

It seems that the shape of the outer one is not restricted to squares, but can also be rectangles?

This is a problem that was in my computational physics class that we had to solve!

@jzhanson Yeah, I think in class the prof mentioned that because the thickness varies along the spiral, the density of points in the thinner parts would be higher, but I'm not sure. Would it help if we could weight the random generation instead of deriving a totally accurate equation?

@xTheBHox: It's (probably) possible to do this with a space-filling curve of some kind. For example, you could adjust a Peano curve to fill the boundary of a circle, as in this image:

Of course, this appears to have the same issue as with the uniform probability case on the left of the slide -- the sampling distribution will (probably) not be uniform. The amount of recursive computation needed to accurately compute where on the fractal curve you end up is also probably not worth the savings of generating a second random number.

Are there any ways to sample the circle (intuitively) with only one RV?

@kc1 Thanks for this great stuff! I think the reason why we don't have an r term on the left-hand side is that we are integrating a probability density function. If we just ignore the context here and treat r and theta as two random variables without special meanings, this LHS is what the integral of a pdf looks like.

The one thing I'm not sure about: since the integral of the RHS equals 1, shouldn't the LHS and RHS be equal?

It's nice that, since we're guaranteed the cdf is always non-decreasing, there should always be an inverse. Though things may be harder if the inverse isn't in closed form.

@ ChrisZzh I agree that there's some explanation missing here. p(r,theta) is the probability density function. If we integrate this over the circle it should equal 1. Equating the LHS (Left-hand side) to RHS in line 2 is just saying exactly this. Note the RHS integrated over unit circle is equal to 1.

However, the integral p(r,theta) d(r) d(theta) on the LHS should usually have an extra r term, i.e. p(r,theta) r d(r) d(theta), since integrating over a circle we have dA = r dr dtheta. Some explanation seems to be missing/swept under the rug as to why this r term is being folded into p(r,theta). I think it is being done because we are trying to figure out how to draw random samples of the r and theta variables agnostic of geometry, so we integrate over the variables r and theta directly, i.e. over dr dtheta instead of dA = r d(r) d(theta).

But just my thoughts, would need insight from prof.
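For what it's worth, here is the derivation with the $r$ term written out explicitly. A uniform density over the unit disk (area $\pi$) with $dA = r\,dr\,d\theta$ gives

$$ p(r, \theta)\, dr\, d\theta = \frac{1}{\pi}\, dA = \frac{r}{\pi}\, dr\, d\theta \implies p(r, \theta) = \frac{r}{\pi} $$

The marginal density of $r$ is then $p(r) = \int_0^{2\pi} \frac{r}{\pi}\, d\theta = 2r$ with CDF $P(r) = r^2$, and the conditional in $\theta$ is uniform with CDF $\theta / 2\pi$; inverting both gives $r = \sqrt{\xi_1}$ and $\theta = 2\pi \xi_2$.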

What is the exact definition of the point of interest here?

Where does the second line "p(r,theta)d(r)d(theta)..." come from?

How exactly does this work? How is the special set of n points determined, and why does this work?

How would you "warp" random variables if you have a shape that can't be nicely represented in a mathematical equation? Are there some numerical methods you can use?

For anybody interested in robotics out there, Monte Carlo methods are used in what's called a particle filter. The scenario where a particle filter is used is when a robot has a map of a landscape and takes sensor measurements of the landscape, but it doesn't specifically know where in the map it is. The robot keeps predicting where it is and resampling based on how good its guesses are. What's cool is that this eventually converges to the robot's actual position.

Don't let me drone on about it though. Check out this better video! https://www.youtube.com/watch?v=aUkBa1zMKv4

I think for discrete functions a sum makes sense, since the CDF of a discrete random variable is just the sum of the probabilities for any x' less than x.
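In code, that discrete CDF inversion might look like this (the helper is hypothetical):

```python
import bisect

def make_discrete_sampler(weights):
    """Build a sampler for a discrete distribution by inverting its CDF:
    accumulate the weights into a running sum, then binary-search it."""
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    def sample(u):  # u uniform in [0, 1)
        return bisect.bisect_right(cdf, u)
    return sample

sample = make_discrete_sampler([1, 2, 1])  # probabilities 0.25, 0.5, 0.25
```

Feeding in a uniform random u then returns index 0, 1, or 2 with probabilities 0.25, 0.5, and 0.25 respectively.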

I didn't quite understand why choosing a length along our spiral, then choosing based on the width at that point on the spiral where to locate our point would not give uniformly random points on the spiral...would it be a similar reasoning to the circle where the thicker parts on the spiral would have less dense randomly generated points while the thinner parts of the spiral would have more dense randomly generated points?

If we did want to generate random points along the spiral, would we have to use some equation for the thickness of the spiral versus its length or its area to change our random variables like we did with the circle?

Do we weight each sample evenly (just add f(x) for each sample and divide by the number of samples), or do we do something more complicated like evaluate the trapezoid/rectangle method on the samples?

Do the constants here not matter? If we have some suitable way to terminate (like if current primitive count is less than a set maximum count for a leaf), then we only really care about SAH of one partition relative to another, right?

What should we do if the lighting is directional? In that case L(p,w_i) will be 0 almost everywhere except one certain solid angle. How can we do integration on that?

This only works for continuous functions right? Are there real world scenarios where we need to integrate discrete functions? Would we just use sum in that case ?

The integrand should be 3x^2 for it to integrate to x^3.

Besides being simpler and probably quicker, is there any other benefit of rasterization over ray tracing? Is there ever a situation where rasterization will actually produce something that ray tracing cannot?