author | sotech117 <michael_foiani@brown.edu> | 2024-03-11 15:40:47 -0400 |
---|---|---|
committer | sotech117 <michael_foiani@brown.edu> | 2024-03-11 15:40:47 -0400 |
commit | 6cbc309cb94551e75d914e105ff94e5a39878abf (patch) | |
tree | d5fc7ae437fd3ca766e317ac307361c08f4563e2 | |
parent | 7b4d951fa00ee0e94d1d1b65a2f2f06cb9850146 (diff) |
update readme & add energy graph to match chi^2 curve
-rw-r--r-- | README.md | 24 |
-rw-r--r-- | figs/rs-1dvels.png | bin | 28263 -> 32212 bytes |
-rw-r--r-- | figs/rs-3dvels.png | bin | 60791 -> 68968 bytes |
-rw-r--r-- | figs/rs-energies.png | bin | 0 -> 25811 bytes |
-rw-r--r-- | figs/rs-speeds.png | bin | 17602 -> 26668 bytes |
-rw-r--r-- | figs/rw-demo3.png | bin | 134331 -> 135108 bytes |
-rw-r--r-- | random-speed.jl | 25 |
7 files changed, 43 insertions, 6 deletions
@@ -1,2 +1,24 @@
 # ab-testing-special-topics
-The simulations used to give intuition and understanding into the reasoning behind why the normal and chi^2 curves are declarative for AB tests..
+The simulations used to give intuition and understanding into the reasoning behind why the normal and chi^2 curves are declarative for AB tests through an example of the random walk and random speed problems.
+
+## Random Walk
+Notice how the random walks frequency graph created a normal distribution, which became "more normal" with more points on the walk.
+This frequency plot represents the (un-normalized) pdf of finding a points, so we can quanitivaly define how probable it is to find a walk at a certain distance.
+
+Relating to AB testing, we can view the t-score as this particle doing a "random walk", with the p-value as the probability of finding a walk at that distance. By converting the average to a t-score, you have effectively normalized the walk, and can use the normal distribution to find the p-value.
+
+When you argue that "if the p-value is less than 0.05, then the null hypothesis is rejected", you are saying that "if the probability of finding a sample in this position is less than .05 and I found it (in your sampleA vs sampleB calculations), then it's highly unlinkely this path is a coicidence and the null hypothesis can be rejected."
+
+## Random Speeds
+Notice how the distribution of random speeds and enegeries frequency graph is not normal - it's mostly normal but skewed with a longer right tail. This is because speed has no direction (as it's the magnitude of velocity), so the distrubtion is no longer normal.
+
+While the velocity for each dimension in our system (x, y, z) has a normal distribution, the sum of the squares of them results in this non-symmetic, non-normal frequncy map, the chi distribution (not squared yet). Now chi^2 relates the energies of the system (speed^2), which is directly proportional the generalized chaos in the system (entropy).
+
+Chi^2 is used to test the null hypothesis of "no difference" between categorical variables in AB testing because it measures generalized, non-directional chaos among all dimensions of the system. If your distrubtions from the dimensions are similar, it should converge to be highly-chaotic & high-energy, as stated by the second law of thermodynamics. By contrast, if your underlying distrubtions create an immensely low-chaotic (i.e. low-energy state), then it's highly likely these underlying distrubtions are different.
+
+Relating to AB testing, when you argue that, for chi^2, "if the p-value is less than 0.05, then the null hypothesis is rejected", you are saying that "if the probability of finding these directions at energy level this low and I found it (in your sampleA vs sampleB calculations), then it's highly unlinkely this state is a coicidence (violates the second law of thermodynamics) and the null hypothesis can be rejected (i.e. these distributions are not the same)."
+
+In theory, for performing a hypothesis test with categorical variables, we take each dimension to be the difference between the normal distrubtions (of differences in observed-expected) of the samples. This encapsulates the difference between the distrubtions into a normal curve, which we combine into the chi^2 curve (visuals helps this explanation, see video).
+
+## Video of Special Topic Hour
+TODO: add video after the hours
\ No newline at end of file
diff --git a/figs/rs-1dvels.png b/figs/rs-1dvels.png
Binary files differ
index c4c06d4..00f7c1d 100644
--- a/figs/rs-1dvels.png
+++ b/figs/rs-1dvels.png
diff --git a/figs/rs-3dvels.png b/figs/rs-3dvels.png
Binary files differ
index 8dae989..f021ad5 100644
--- a/figs/rs-3dvels.png
+++ b/figs/rs-3dvels.png
diff --git a/figs/rs-energies.png b/figs/rs-energies.png
Binary files differ
new file mode 100644
index 0000000..a0705c0
--- /dev/null
+++ b/figs/rs-energies.png
diff --git a/figs/rs-speeds.png b/figs/rs-speeds.png
Binary files differ
index 6110329..0309747 100644
--- a/figs/rs-speeds.png
+++ b/figs/rs-speeds.png
diff --git a/figs/rw-demo3.png b/figs/rw-demo3.png
Binary files differ
index 09e8623..d807594 100644
--- a/figs/rw-demo3.png
+++ b/figs/rw-demo3.png
diff --git a/random-speed.jl b/random-speed.jl
index 0af9121..058fedd 100644
--- a/random-speed.jl
+++ b/random-speed.jl
@@ -1,13 +1,22 @@
 using Plots
 using Distributions
 
-num_velocities = 1000
+num_velocities = 100000
+num_dimensions = 3
 
 println("Starting Random Speed Simualtions...\n")
 
 function make_random_velocity()
     # pull 3 nums randomly from normal distribution
     N = Normal(0, 1)
+    if num_dimensions == 1
+        return (rand(N), 0, 0)
+    end
+
+    if num_dimensions == 2
+        return (rand(N), rand(N), 0)
+    end
+
     return (rand(N), rand(N), rand(N))
 end
 
@@ -56,14 +65,20 @@ p = Plots.scatter(
     title="Velocities")
 savefig(p, "figs/rs-3dvels.png")
 
-# plot their speeds
 speeds = [sqrt(v[1]^2 + v[2]^2 + v[3]^2) for v in velocities]
 p = histogram(
-    speeds, title="Randomly Generated Speeds (n=$num_velocities)",
-    legend=false, xlabel="Speed", ylabel="Frequency")
+    speeds, title="Randomly Generated Speeds (n=$num_velocities, d=$num_dimensions)",
+    legend=false, xlabel="Speed = \$ √(v_x^2 + v_y^2 + v_z^2) \$", ylabel="Frequency")
 savefig(p, "figs/rs-speeds.png")
+
+# plot their energy
+energies = [.5 * (v[1]^2 + v[2]^2 + v[3]^2) for v in velocities]
+p = histogram(
+    energies, title="Randomly Generate`d Energies (n=$num_velocities, d=$num_dimensions)",
+    legend=false, xlabel="Energy = \$ .5m(v_x^2 + v_y^2 + v_z^2) \$", ylabel="Frequency")
+savefig(p, "figs/rs-energies.png")
 
 # print the mean and standard deviation of the speed distribution
-println("\nSpeed->\tμ:$(mean(speeds)), σ:$(std(speeds)), n:$(length(speeds))")
+println("\tEnergies->\tμ:$(mean(energies)), σ:$(std(energies)), n:$(length(energies))")
 
 println("\nRandom Speed Simualtions Complete!")
\ No newline at end of file
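
A worked check on the `Energies->` line that the patched `random-speed.jl` prints may help when reading `figs/rs-energies.png`. This is a sketch under two assumptions taken from the script itself: each velocity component is an independent draw from `Normal(0, 1)`, and the mass is 1 (the code computes `.5 * (v[1]^2 + v[2]^2 + v[3]^2)` with no mass factor). The symbols $S$, $E$, and $d$ are notation introduced here, with $d$ standing for `num_dimensions`.

```latex
% Components: v_i ~ N(0,1), independent, i = 1..d (d = num_dimensions)
S = \sum_{i=1}^{d} v_i^2 \sim \chi^2_d ,
\qquad \text{speed} = \sqrt{S} \sim \chi_d ,
\qquad E = \tfrac{1}{2} S .

% Moments of chi^2_d: mean d, variance 2d, so for the energy
\mathbb{E}[E] = \tfrac{1}{2}\, d ,
\qquad \operatorname{Var}(E) = \tfrac{1}{4}\,(2d) = \tfrac{d}{2} ,
\qquad \sigma_E = \sqrt{d/2} .

% Default run, d = 3:
\mathbb{E}[E] = 1.5 , \qquad \sigma_E = \sqrt{1.5} \approx 1.2247 .
```

With `num_velocities = 100000`, the printed μ and σ should land close to these values. For `num_dimensions` set to 1 or 2, the zero-padded components contribute nothing to the sum, so the same formulas apply with $d = 1$ or $d = 2$.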