jules0707

feature_importance.py

Jan 10th, 2025
# %% [markdown]
# ## Title :
#
# Exercise: Feature Importance
#
# The goal of this exercise is to compare two feature importance methods: MDI (mean decrease in impurity) and permutation importance. For a discussion of the merits of each, see this <a href="https://scikit-learn.org/stable/modules/permutation_importance.html" target="_blank">link</a>.
#
# ## Description :
#
# <img src="./fig/fig2.png" style="width: 1000px;">
#
# ## Instructions:
#
# - Read the dataset `heart.csv` as a pandas dataframe, and take a quick look at the data.
# - Assign the predictor and response variables as per the instructions given in the scaffold.
# - Set a max_depth value.
# - Define a `DecisionTreeClassifier` and fit it on the entire data.
# - Define a `RandomForestClassifier` and fit it on the entire data.
# - Calculate the permutation importance for each of the two models. Remember that sklearn makes the MDI available automatically once the classifiers are fitted.
# - Use the routines provided to display the feature importances as bar plots. The plots will look similar to the one given above.
#
# ## Hints:
#
# <a href="https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html#" target="_blank">forest.feature_importances_</a>
# Calculate the impurity-based feature importance.
#
# <a href="https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html#sklearn.inspection.permutation_importance" target="_blank">sklearn.inspection.permutation_importance()</a>
# Calculate the permutation-based feature importance.
#
# <a href="https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html" target="_blank">sklearn.ensemble.RandomForestClassifier()</a>
# Returns a random forest classifier object.
#
# <a href="https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html" target="_blank">sklearn.tree.DecisionTreeClassifier()</a>
# Returns a decision tree classifier object.
#
#
# **NOTE** - sklearn computes the MDI automatically: once a RandomForestClassifier or DecisionTreeClassifier is fitted, it is available as `feature_importances_`.

# %%
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import helper
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier
from helper import plot_permute_importance, plot_feature_importance


# %%
# Read the dataset "heart.csv"
df = pd.read_csv("heart.csv")

# Take a quick look at the data
df.head()


# %%
# Assign the predictor and response variables.
# 'AHD' is the response and all the other columns are the predictors
X = df.drop('AHD', axis=1)
# One-hot encode the categorical predictors, dropping the first level of each
X_design = pd.get_dummies(X, drop_first=True)
y = df['AHD']


# %%
# Set the model parameters

# The random state is fixed for testing purposes
random_state = 44

# Choose a `max_depth` for your trees
max_depth = 3


# %% [markdown]
# ### SINGLE TREE

# %%
### edTest(test_decision_tree) ###

# Define a Decision Tree classifier with random_state as the above defined variable
# Set the maximum depth to be max_depth
tree = DecisionTreeClassifier(random_state=random_state, max_depth=max_depth)

# Fit the model on the entire data
tree.fit(X_design, y)

# Using Permutation Importance to get the importance of features for the Decision Tree
# with random_state as the above defined variable
tree_result = permutation_importance(tree, X_design, y, random_state=random_state)


# %% [markdown]
# ### RANDOM FOREST

# %%
### edTest(test_random_forest) ###

# Define a Random Forest classifier with random_state as the above defined variable
# Set the maximum depth to be max_depth and use 10 estimators
forest = RandomForestClassifier(random_state=random_state, n_estimators=10, max_depth=max_depth)

# Fit the model on the entire data
forest.fit(X_design, y)

# Use Permutation Importance to get the importance of features for the Random Forest model
# with random_state as the above defined variable
forest_result = permutation_importance(forest, X_design, y, random_state=random_state)


# %% [markdown]
# ### PLOTTING THE FEATURE RANKING

# %%
# Helper code to visualize the feature importance using 'MDI'
plot_feature_importance(tree, forest, X_design, y)

# Helper code to visualize the feature importance using 'permutation feature importance'
plot_permute_importance(tree_result, forest_result, X_design, y)


# %% [markdown]
# ⏸ A common criticism of the MDI method is that it assigns a lot of importance to noisy features (more here). Did you make such an observation in the plots above?

# %%
### edTest(test_chow1) ###
# Type your answer within the quotes given
answer1 = 'yes'


# %% [markdown]
# ⏸ After marking, change the max_depth for your classifiers to a very low value such as 3, and see whether the relative importance of the predictors changes.

# %%
### edTest(test_chow2) ###
# Type your answer within the quotes given
answer2 = 'yes'
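
The script relies on sklearn to compute the MDI, so the quantity itself never appears in the code. As a rough illustration only, the sketch below works out the weighted Gini impurity decrease for a single split by hand, which is the per-node quantity that impurity-based importances accumulate for the feature used at that node. The labels and the split are made up for the example and do not come from `heart.csv`, and the function names are hypothetical, not sklearn API.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right, n_total):
    """Weighted impurity decrease contributed by one split.

    (n_parent / n_total) * (gini(parent)
        - (n_left / n_parent) * gini(left)
        - (n_right / n_parent) * gini(right))
    """
    n_parent = len(parent)
    weighted_children = (len(left) * gini(left) + len(right) * gini(right)) / n_parent
    return (n_parent / n_total) * (gini(parent) - weighted_children)


# Toy labels (invented for illustration): 4 "Yes" and 4 "No" at the root.
parent = ["Yes"] * 4 + ["No"] * 4
left = ["Yes", "Yes", "Yes", "No"]   # samples sent left by a hypothetical split
right = ["Yes", "No", "No", "No"]    # samples sent right

decrease = impurity_decrease(parent, left, right, n_total=len(parent))
print(f"root Gini: {gini(parent):.3f}")                    # root Gini: 0.500
print(f"impurity decrease from the split: {decrease:.3f}") # 0.125
```

In a fitted tree, these decreases are summed per feature over all nodes that split on it and then normalized to sum to one; a forest averages the result over its trees.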
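
The calls to `permutation_importance` above return a result object without showing what is being measured. The following is a minimal standard-library sketch of the idea, not sklearn's implementation: shuffle one feature column at a time, re-score the model, and record the average drop in accuracy. The toy data, the fixed threshold "model", and the function names are all assumptions made for illustration.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(rows, labels)) / len(labels)


def permutation_importance_sketch(predict, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [int(row[0] > 0.5) for row in rows]


def predict(row):
    """Stand-in for a fitted model: thresholds feature 0 and ignores feature 1."""
    return int(row[0] > 0.5)


importances = permutation_importance_sketch(predict, rows, labels)
print("feature 0 (informative):", round(importances[0], 3))
print("feature 1 (noise):", importances[1])
```

Because the stand-in model never reads feature 1, shuffling that column cannot change any prediction and its importance is exactly zero, while shuffling feature 0 costs a large share of the accuracy. sklearn's `permutation_importance` likewise repeats the shuffle (`n_repeats`, 5 by default) and reports the mean and standard deviation of the score drops as `importances_mean` and `importances_std`.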