elena1234

Sampling from an entire and a biased population in Python

May 19th, 2022 (edited)
import numpy as np  # for sampling from the distributions
import matplotlib.pyplot as plt  # for basic plotting
import seaborn as sns  # for plotting the histograms

'''
Sampling from a Biased Population
In this tutorial we will go over some code that recreates the visualizations in the Interactive Sampling Distribution Demo.
This demo looks at a hypothetical problem that illustrates what happens when we sample from a biased population and
not the entire population we are interested in.
'''
# Recreate the simulations from the video
mean_uofm = 155  # mean weight of the first subgroup
sd_uofm = 5  # standard deviation of the first subgroup
mean_gym = 185  # mean weight of the gym subgroup
sd_gym = 5  # standard deviation of the gym subgroup
gymperc = .3  # fraction of the population that goes to the gym
totalPopSize = 40000

# Create the two subgroups
uofm_students = np.random.normal(
    mean_uofm, sd_uofm, int(totalPopSize * (1 - gymperc)))
students_at_gym = np.random.normal(
    mean_gym, sd_gym, int(totalPopSize * gymperc))

# Create the population from the subgroups
population = np.append(uofm_students, students_at_gym)

# Set up the figure for plotting
plt.figure(figsize=(10, 12))

# Plot the UofM students only
plt.subplot(3, 1, 1)
sns.histplot(uofm_students, kde=True, stat="density")
plt.title("UofM Students Only")
plt.xlim([140, 200])

# Plot the Gym Goers only
plt.subplot(3, 1, 2)
sns.histplot(students_at_gym, kde=True, stat="density")
plt.title("Gym Goers Only")
plt.xlim([140, 200])

# Plot both groups together, with a vertical line at the population mean
plt.subplot(3, 1, 3)
sns.histplot(population, kde=True, stat="density")
plt.title("Full Population of UofM Students")
plt.axvline(x=np.mean(population))
plt.xlim([140, 200])

plt.show()


'''
In practice:
What Happens if We Sample from the Entire Population?
We will sample randomly from all students at the University of Michigan.
'''
# Simulation parameters
numberSamps = 5000  # number of repeated samples
sampSize = 50  # number of students in each sample

# Get the sampling distribution of the mean from the entire population
mean_distribution = np.empty(numberSamps)
for i in range(numberSamps):
    # np.random.choice samples with replacement by default
    random_students = np.random.choice(population, sampSize)
    mean_distribution[i] = np.mean(random_students)

# Plot the population and the sampling distribution of the mean
plt.figure(figsize=(10, 8))

# Plotting the population again
plt.subplot(2, 1, 1)
sns.histplot(population, kde=True, stat="density")
plt.title("Full Population of UofM Students")
plt.axvline(x=np.mean(population))
plt.xlim([140, 200])

# Plotting the sampling distribution
plt.subplot(2, 1, 2)
sns.histplot(mean_distribution, kde=True, stat="density")
plt.title("Sampling Distribution of the Mean Weight of All UofM Students")
plt.axvline(x=np.mean(population))  # population mean
plt.axvline(x=np.mean(mean_distribution), color="black")  # mean of the sample means
plt.xlim([140, 200])

plt.show()
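
The opening docstring describes sampling from a biased population, but the script only draws samples from the entire population. The following is a minimal sketch of the biased case, assuming the biased sample is taken only from the gym subgroup. The seeded `default_rng` generator and the `sampling_distribution` helper are illustrative additions, not part of the original paste, and the subgroup sizes are computed with integer subtraction rather than `int(totalPopSize * (1 - gymperc))`.

```python
import numpy as np

# Assumption: a fixed seed, added only so the sketch is reproducible.
rng = np.random.default_rng(42)

# Same parameters as the script (names changed to snake_case for the sketch).
mean_uofm, sd_uofm = 155, 5
mean_gym, sd_gym = 185, 5
gymperc = 0.3
total_pop_size = 40000

# Build the two subgroups and the combined population.
n_gym = int(total_pop_size * gymperc)
uofm_students = rng.normal(mean_uofm, sd_uofm, total_pop_size - n_gym)
students_at_gym = rng.normal(mean_gym, sd_gym, n_gym)
population = np.append(uofm_students, students_at_gym)

number_samps = 5000
samp_size = 50


def sampling_distribution(group, n_samples, sample_size, rng):
    """Hypothetical helper: mean of each of n_samples samples
    drawn with replacement from group."""
    samples = rng.choice(group, size=(n_samples, sample_size))
    return samples.mean(axis=1)


# Biased: sample only from gym goers. Unbiased: sample from everyone.
biased_means = sampling_distribution(students_at_gym, number_samps, samp_size, rng)
unbiased_means = sampling_distribution(population, number_samps, samp_size, rng)

print(f"Population mean:              {population.mean():.1f}")
print(f"Mean of unbiased sample means: {unbiased_means.mean():.1f}")
print(f"Mean of biased sample means:   {biased_means.mean():.1f}")
```

With these parameters the expected population mean is 0.7 * 155 + 0.3 * 185 = 164, so the unbiased sample means should center near that value, while the gym-only sample means should center near 185. Taking more or larger samples narrows the biased distribution but does not move it toward the population mean.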