elena1234

summary statistics in Python

May 6th, 2022 (edited)
import pandas as pd
import numpy as np

da = pd.read_csv("nhanes_2015_2016.csv")
x = da.BMXWT.dropna()  # Extract all non-missing values of BMXWT (body weight, kg) into a variable called 'x'
print(x.mean())  # Pandas method
print(np.mean(x))  # NumPy function; gives the same result

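Why `dropna()` matters: pandas' `Series.mean()` skips missing values by default, but NumPy's `np.mean` on a raw array propagates `NaN`. Below is a minimal sketch with a small made-up Series standing in for `da.BMXWT` (the values are hypothetical, not NHANES data). It also shows that the mean is simply the sum of the non-missing values divided by their count.

```python
import numpy as np
import pandas as pd

# Hypothetical weights (kg) with one missing value, standing in for da.BMXWT
w = pd.Series([70.0, 82.5, np.nan, 95.0, 60.5])

pandas_mean = w.mean()                   # pandas skips NaN by default
numpy_raw_mean = np.mean(w.to_numpy())   # NumPy on the raw ndarray returns nan
numpy_clean_mean = np.mean(w.dropna())   # after dropna() both approaches agree

# The mean is the sum of the non-missing values divided by how many there are
clean = w.dropna()
manual_mean = clean.sum() / clean.count()

print(pandas_mean, numpy_raw_mean, numpy_clean_mean, manual_mean)
```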
print(x.median())
print(np.percentile(x, 50))  # 50th percentile, same as the median
print(np.percentile(x, 75))  # 75th percentile
print(x.quantile(0.75))  # Pandas method for quantiles; same value as np.percentile(x, 75)

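`np.percentile` and `Series.quantile` agree because both default to linear interpolation between neighboring sorted values. A minimal sketch of that rule follows. `linear_quantile` is a hypothetical helper written for illustration, and the sample values are made up.

```python
import numpy as np
import pandas as pd

def linear_quantile(values, q):
    """Quantile by linear interpolation between order statistics (hypothetical helper)."""
    v = np.sort(np.asarray(values, dtype=float))
    pos = q * (len(v) - 1)      # fractional index into the sorted data
    lo = int(np.floor(pos))
    hi = int(np.ceil(pos))
    frac = pos - lo
    return v[lo] + frac * (v[hi] - v[lo])

# Hypothetical sample standing in for x
s = pd.Series([60.5, 70.0, 82.5, 95.0, 101.0, 55.0])

manual_q75 = linear_quantile(s, 0.75)
numpy_q75 = np.percentile(s, 75)
pandas_q75 = s.quantile(0.75)
median_check = linear_quantile(s, 0.5)   # the 50th percentile is the median

print(manual_q75, numpy_q75, pandas_q75, median_check, s.median())
```

With six sorted values, the 75th percentile sits at index 0.75 × 5 = 3.75, i.e. three quarters of the way from the 4th to the 5th sorted value.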
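# Considering only the systolic condition (120 to 139 mm Hg), calculate the proportion of the NHANES sample who would be considered to have pre-hypertension.
# The mean of a boolean Series is the fraction of True values. Use the first reading (BPXSY1) for both bounds.
# Rows with a missing reading compare as False, so they stay in the denominator.
print(np.mean((da.BPXSY1 >= 120) & (da.BPXSY1 <= 139)))  # "&" means "and"
# Same idea for the diastolic condition (80 to 89 mm Hg)
print(np.mean((da.BPXDI1 >= 80) & (da.BPXDI1 <= 89)))
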
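Taking the mean of a boolean mask gives a proportion because True counts as 1 and False as 0. Comparisons against `NaN` return False, so a missing reading is counted as "not in range" rather than dropped. The sketch below uses hypothetical readings to show how the answer changes when the denominator is restricted to observed values. Which denominator is appropriate is an analysis choice, not something the code decides.

```python
import numpy as np
import pandas as pd

# Hypothetical systolic readings (mm Hg); one is missing
sbp = pd.Series([118.0, 125.0, 139.0, 140.0, np.nan, 132.0])

in_range = (sbp >= 120) & (sbp <= 139)       # NaN compares as False
share_all_rows = np.mean(in_range)           # True -> 1, False -> 0
share_counted = in_range.sum() / len(in_range)

# Restricting the denominator to rows that actually have a reading
observed = sbp.dropna()
share_observed = np.mean((observed >= 120) & (observed <= 139))

print(share_all_rows, share_counted, share_observed)
```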
# Or combine both conditions: pre-hypertension by either the systolic or the diastolic criterion
a = (da.BPXSY1 >= 120) & (da.BPXSY1 <= 139)
b = (da.BPXDI1 >= 80) & (da.BPXDI1 <= 89)
print(np.mean(a | b))  # "|" means "or"
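The proportion meeting either criterion follows inclusion-exclusion: P(a or b) = P(a) + P(b) − P(a and b). The parentheses around each comparison are required because `&` and `|` bind more tightly than `>=` and `<=`. Python's `and`/`or` keywords also cannot be used here, since they try to reduce a whole Series to a single True/False. Below is a minimal sketch on a hypothetical five-person DataFrame; the column names `sys` and `dia` are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical systolic / diastolic readings for five people (mm Hg)
bp = pd.DataFrame({
    "sys": [118, 125, 145, 110, 130],
    "dia": [85, 70, 92, 75, 82],
})

a = (bp["sys"] >= 120) & (bp["sys"] <= 139)  # systolic criterion
b = (bp["dia"] >= 80) & (bp["dia"] <= 89)    # diastolic criterion

p_or = np.mean(a | b)
# Inclusion-exclusion gives the same proportion
p_incl_excl = np.mean(a) + np.mean(b) - np.mean(a & b)

# The `or` keyword asks for the truth value of a whole Series, which raises ValueError
try:
    a or b
    keyword_or_failed = False
except ValueError:
    keyword_or_failed = True

print(p_or, p_incl_excl, keyword_or_failed)
```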