My Brain Cells

Easiest (and best) learning materials for anyone with a curiosity for machine learning and artificial intelligence, Deep learning, Programming, and other fun life hacks.

One-Hot Encoding in Python

What is One Hot Encoding?

A one-hot encoding is a representation of categorical variables as binary vectors.

This first requires that the categorical values be mapped to integer values.

Then, each integer value is represented as a binary vector that is all zero values except the index of the integer, which is marked with a 1.

Why use One Hot Encoding?

Many machine learning algorithms cannot work with categorical data directly. The categories must be converted into numbers. This is required for both input and output variables that are categorical.

One Hot Encode with scikit-learn

step 1: Create the dataset

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29]})

#view DataFrame
print(df)

  team  points
0    A      25
1    A      12
2    B      15
3    B      14
4    B      19
5    B      23
6    C      25
7    C      29

Step 2: Perform one hot encoding

from sklearn.preprocessing import OneHotEncoder

#creating instance of one-hot-encoder
encoder = OneHotEncoder(handle_unknown='ignore')

#perform one-hot encoding on 'team' column 
encoder_df = pd.DataFrame(encoder.fit_transform(df[['team']]).toarray())

#merge one-hot encoded columns back with original DataFrame
final_df = df.join(encoder_df)

#view final df
print(final_df)

  team  points    0    1    2
0    A      25  1.0  0.0  0.0
1    A      12  1.0  0.0  0.0
2    B      15  0.0  1.0  0.0
3    B      14  0.0  1.0  0.0
4    B      19  0.0  1.0  0.0
5    B      23  0.0  1.0  0.0
6    C      25  0.0  0.0  1.0
7    C      29  0.0  0.0  1.0

Step 3: Drop the column and get the results

#drop 'team' column
final_df.drop('team', axis=1, inplace=True)

#view final df
print(final_df)

   points    0    1    2
0      25  1.0  0.0  0.0
1      12  1.0  0.0  0.0
2      15  0.0  1.0  0.0
3      14  0.0  1.0  0.0
4      19  0.0  1.0  0.0
5      23  0.0  1.0  0.0
6      25  0.0  0.0  1.0
7      29  0.0  0.0  1.0

Anthony

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top