- # %% [markdown]
- # ## Title :
- # Exercise: Visualizing a Decision Tree
- #
- # ## Description :
- # The aim of this exercise is to visualize the decision tree that is created when performing Decision Tree Classification or Regression. The tree will look similar to the one given below.
- #
- # <img src="./fig1.png" style="background-color:white;width:1300px;" >
- #
- # ## Data Description:
- # We are trying to predict the winner of the 2016 Presidential election (Trump vs. Clinton) in each county in the US. To do this, we will consider several predictors, including `minority` (the percentage of residents who are minorities) and `bachelor` (the percentage of resident adults with a bachelor's degree or higher).
- #
- # ## Instructions:
- #
- # - Read the datafile `county_election_train.csv` into a Pandas data frame.
- # - Create the response variable based on the columns `trump` and `clinton`.
- # - Initialize a Decision Tree classifier of depth 3 and fit on the training data.
- # - Visualize the Decision Tree.
- #
- # ## Hints:
- #
- # <a href="https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html" target="_blank">sklearn.tree.DecisionTreeClassifier()</a> A decision tree classifier.
- #
- # <a href="https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier.fit" target="_blank">classifier.fit()</a> Build a decision tree classifier from the training set (X, y).
- #
- # <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html" target="_blank">plt.scatter()</a> A scatter plot of y vs. x with varying marker size and/or color.
- #
- # <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlabel.html" target="_blank">plt.xlabel()</a> Set the label for the x-axis.
- #
- # <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylabel.html" target="_blank">plt.ylabel()</a> Set the label for the y-axis.
- #
- # <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html" target="_blank">plt.legend()</a> Place a legend on the Axes.
- #
- # <a href="https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html" target="_blank">tree.plot_tree()</a> Plot a decision tree.
- #
- # **Note: This exercise is auto-graded and you can try multiple attempts.**
- # %%
- # Import necessary libraries
- import numpy as np
- import pandas as pd
- import sklearn as sk
- import seaborn as sns
- from sklearn import tree
- import matplotlib.pyplot as plt
- from sklearn.tree import DecisionTreeClassifier
- from sklearn.model_selection import cross_val_score
- pd.set_option('display.width', 100)
- pd.set_option('display.max_columns', 20)
- plt.rcParams["figure.figsize"] = (12,8)
- # %%
- # Read the datafile "county_election_train.csv" as a Pandas dataframe
- elect_train = pd.read_csv("../DATA/county_election_train.csv")
- # Read the datafile "county_election_test.csv" as a Pandas dataframe
- elect_test = pd.read_csv("../DATA/county_election_test.csv")
- # Take a quick look at the dataframe
- elect_train.head()
- # %%
- ### edTest(test_response) ###
- # Creating the response variable
- # Label each county in the train data '1' if "trump" is greater than "clinton", else '0'
- # (the labels are strings, not integers)
- y_train = np.where(elect_train['trump'] > elect_train['clinton'],'1','0')
- # Label each county in the test data the same way
- y_test = np.where(elect_test['trump'] > elect_test['clinton'],'1','0')
- # %%
- # Scatter plot with "minority" on the x-axis and "bachelor" on the y-axis
- # Colours: red for Trump ('1') and blue for Clinton ('0')
- vote_train = pd.DataFrame(y_train.reshape(-1,1),columns=['won'])
- # Your code here
- plt.scatter(elect_train['minority'],elect_train['bachelor'],c=['r' if vote_result == '1' else 'b' for vote_result in vote_train['won']])
- plt.xlabel('minority')
- plt.ylabel('bachelor')
- vote_train.head()
- vote_train.shape
- # %%
- # Initialize a Decision Tree classifier of depth 3 and use
- # Gini impurity as the splitting criterion
- dtree = DecisionTreeClassifier(max_depth=3,criterion='gini')
- # Fit the classifier on the train data
- # but only use the minority column as the predictor variable
- x = elect_train[['minority']]
- y = vote_train['won']
- dtree.fit(x,y)
- # %% [markdown]
- #
- # %%
- # Code to set the size of the plot
- plt.figure(figsize=(30,20))
- # Plot the Decision Tree trained above: filled=True colours each node by its
- # majority class, and proportion=True shows class shares instead of raw counts
- tree.plot_tree(decision_tree=dtree,filled=True,impurity=True,node_ids=True,proportion=True,rounded=True)
- plt.show()
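
As a complement to the plotted tree, the same split rules can be printed as plain text with `sklearn.tree.export_text`. The sketch below is only an illustration: it assumes a synthetic stand-in for the county data (the `demo` frame, its values, and the 50% labelling rule are hypothetical, not taken from `county_election_train.csv`) so that it runs without the data file.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the county data (hypothetical values, not the real CSV)
rng = np.random.default_rng(0)
demo = pd.DataFrame({"minority": rng.uniform(0, 100, size=200)})

# Hypothetical labelling rule for illustration only:
# '0' when the minority share is above 50, otherwise '1'
demo_y = np.where(demo["minority"] > 50, "0", "1")

# Same settings as the exercise: depth at most 3, Gini criterion, one predictor
demo_tree = DecisionTreeClassifier(max_depth=3, criterion="gini", random_state=0)
demo_tree.fit(demo[["minority"]], demo_y)

# export_text returns the fitted tree as an indented string of split rules
rules = export_text(demo_tree, feature_names=["minority"])
print(rules)
```

In the notebook itself, the equivalent call would be `export_text(dtree, feature_names=['minority'])`, which lists the thresholds that `tree.plot_tree` draws as boxes.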