
One-Hot Encoding in Python

3 years ago
Read Time: 1 minute
by Anthony

What is One-Hot Encoding?

One-hot encoding represents a categorical variable as binary vectors, one vector per value.

It first requires mapping each categorical value to an integer.

Each integer is then represented as a binary vector whose length equals the number of categories. Every element is 0 except the one at the integer's index, which is set to 1.
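The two steps above can be sketched by hand with NumPy. This is a minimal illustration, not how scikit-learn implements it internally. The labels are hypothetical and simply mirror the team column used later, and sorting the categories alphabetically is an assumption chosen to match the column order OneHotEncoder produces in Step 2.

```python
import numpy as np

# Hypothetical categorical values to encode
values = ['A', 'A', 'B', 'C']

# Step 1: map each distinct category to an integer index
categories = sorted(set(values))                         # ['A', 'B', 'C']
category_to_int = {c: i for i, c in enumerate(categories)}
integer_encoded = [category_to_int[v] for v in values]   # [0, 0, 1, 2]

# Step 2: represent each integer as a binary vector that is all zeros
# except for a 1 at the integer's index
one_hot = np.zeros((len(values), len(categories)), dtype=int)
one_hot[np.arange(len(values)), integer_encoded] = 1

print(one_hot)
# [[1 0 0]
#  [1 0 0]
#  [0 1 0]
#  [0 0 1]]
```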

Why use One-Hot Encoding?

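Many machine learning algorithms cannot work with categorical data directly, so the categories must first be converted into numbers. This applies to both input and output variables that are categorical. Using the integer codes on their own is often not enough, because integers imply an order and a magnitude (2 > 1 > 0) that nominal categories such as team names do not have, and a model may learn that false relationship. One-hot encoding avoids this by giving each category its own binary column.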
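To see why integer codes alone can mislead a model, the sketch below compares distances between three hypothetical categories under both encodings. Euclidean distance is used here only as a stand-in for how a distance- or magnitude-based model might "see" the numbers; it is an illustrative assumption, not a claim about any particular algorithm.

```python
import numpy as np

# Integer (label) encoding: A -> 0, B -> 1, C -> 2
int_codes = {'A': 0, 'B': 1, 'C': 2}

# One-hot encoding of the same three categories
one_hot = {'A': np.array([1, 0, 0]),
           'B': np.array([0, 1, 0]),
           'C': np.array([0, 0, 1])}

# With integer codes, C appears twice as far from A as B does
int_dist_ab = abs(int_codes['A'] - int_codes['B'])
int_dist_ac = abs(int_codes['A'] - int_codes['C'])

# With one-hot vectors, every pair of categories is equally far apart
oh_dist_ab = np.linalg.norm(one_hot['A'] - one_hot['B'])
oh_dist_ac = np.linalg.norm(one_hot['A'] - one_hot['C'])

print(int_dist_ab, int_dist_ac)  # 1 2
print(oh_dist_ab, oh_dist_ac)    # both equal sqrt(2)
```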

One-Hot Encode with scikit-learn

Step 1: Create the dataset

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29]})

#view DataFrame
print(df)

  team  points
0    A      25
1    A      12
2    B      15
3    B      14
4    B      19
5    B      23
6    C      25
7    C      29

Step 2: Perform one-hot encoding

from sklearn.preprocessing import OneHotEncoder

#creating instance of one-hot-encoder
encoder = OneHotEncoder(handle_unknown='ignore')

#perform one-hot encoding on 'team' column
#fit_transform returns a sparse matrix, so convert it to a dense array with toarray()
encoder_df = pd.DataFrame(encoder.fit_transform(df[['team']]).toarray())

#merge one-hot encoded columns back with original DataFrame
final_df = df.join(encoder_df)

#view final df
print(final_df)

  team  points    0    1    2
0    A      25  1.0  0.0  0.0
1    A      12  1.0  0.0  0.0
2    B      15  0.0  1.0  0.0
3    B      14  0.0  1.0  0.0
4    B      19  0.0  1.0  0.0
5    B      23  0.0  1.0  0.0
6    C      25  0.0  0.0  1.0
7    C      29  0.0  0.0  1.0

Step 3: Drop the original 'team' column and view the result

#drop 'team' column
final_df.drop('team', axis=1, inplace=True)

#view final df
print(final_df)

   points    0    1    2
0      25  1.0  0.0  0.0
1      12  1.0  0.0  0.0
2      15  0.0  1.0  0.0
3      14  0.0  1.0  0.0
4      19  0.0  1.0  0.0
5      23  0.0  1.0  0.0
6      25  0.0  0.0  1.0
7      29  0.0  0.0  1.0
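If you only need to encode a DataFrame once, pandas' get_dummies is a shorter alternative that replaces the 'team' column with one binary column per category in a single call. Passing dtype=float is a choice made here so the values match the 0.0/1.0 output above; recent pandas versions otherwise return booleans. Unlike a fitted OneHotEncoder, get_dummies does not remember the categories, so encoding new data separately can produce a different set of columns.

```python
import pandas as pd

# Same data as in Step 1
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29]})

# Replace 'team' with team_A, team_B, team_C indicator columns
dummies = pd.get_dummies(df, columns=['team'], dtype=float)

print(dummies)
```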
