My Brain Cells

The easiest (and best) learning materials for anyone curious about machine learning, artificial intelligence, deep learning, programming, and other fun life hacks.

Label Encoding

Label encoding is a popular technique for handling categorical variables. Each distinct label is assigned a unique integer; with scikit-learn's LabelEncoder, the integers follow the sorted (alphabetical) order of the labels, starting at 0.
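To make the alphabetical mapping concrete, here is a minimal sketch using scikit-learn's LabelEncoder. The country names are made-up sample values, not taken from the Salary.csv dataset used later.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample values, not from the article's dataset
countries = ["India", "US", "Japan", "US", "India"]

encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ holds the sorted unique labels; each label's index is its code
mapping = {label: code for code, label in enumerate(encoder.classes_.tolist())}
print(mapping)         # {'India': 0, 'Japan': 1, 'US': 2}
print(codes.tolist())  # [0, 2, 1, 2, 0]
```

Note that the codes depend only on sort order, not on the order in which the labels first appear.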

Let’s see how to implement label encoding in Python using the scikit-learn library and also understand the challenges with label encoding.

Let’s first import the required libraries and dataset:

# Importing the libraries
import pandas as pd
import numpy as np

# Reading the dataset
df = pd.read_csv("Salary.csv")

Output:

[Image: the loaded dataset]

Understanding the datatypes of features:

df.info()

Output:

[Image: column summary listing each feature with its data type]

As you can see here, the first column, Country, is a categorical feature because it has the object data type. The remaining columns are numerical features, as they are stored as int64.
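Rather than reading the data types off by eye, you can ask pandas for the non-numeric columns directly. The sketch below uses a small hypothetical DataFrame as a stand-in for Salary.csv; only the Country column name comes from this article, while the Age and Salary columns and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for Salary.csv; only 'Country' is named in the article
df = pd.DataFrame({
    "Country": ["India", "US", "Japan"],
    "Age": [44, 34, 46],
    "Salary": [72000, 65000, 98000],
})

# Non-numeric columns (typically object dtype) are the candidates for encoding
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
numeric_cols = df.select_dtypes(include="number").columns.tolist()

print(categorical_cols)  # ['Country']
print(numeric_cols)      # ['Age', 'Salary']
```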

Now, let us implement label encoding in Python:

# Import label encoder
from sklearn import preprocessing

# Create the encoder, which maps each distinct label to an integer
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'Country'
df['Country'] = label_encoder.fit_transform(df['Country'])
print(df.head())
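After encoding, it is often useful to check which integer stands for which label and to convert the codes back. The following is a minimal sketch on hypothetical data (only the Country column name comes from this article), using the fitted encoder's classes_ attribute and inverse_transform method.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data; only the 'Country' column name comes from the article
df = pd.DataFrame({"Country": ["India", "US", "Japan", "US"]})

label_encoder = LabelEncoder()
df["Country"] = label_encoder.fit_transform(df["Country"])

# Sorted original labels; a label's position in this list is its integer code
classes = label_encoder.classes_.tolist()

# Map the integer codes back to the original strings
decoded = label_encoder.inverse_transform(df["Country"]).tolist()

print(classes)  # ['India', 'Japan', 'US']
print(decoded)  # ['India', 'US', 'Japan', 'US']
```

Keeping the fitted encoder around matters: the same object has to be reused to encode new data consistently or to decode predictions.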

Anthony
