Label Encoding
Label Encoding is a popular encoding technique for handling categorical variables. In this technique, each distinct label is assigned a unique integer, starting from 0, in the alphabetical order of the labels.
Let’s see how to implement label encoding in Python using the scikit-learn library and also understand the challenges with label encoding.
Let’s first import the required libraries and dataset:
# Import the libraries
import pandas as pd
import numpy as np
# Read the dataset
df = pd.read_csv("Salary.csv")
Understanding the datatypes of features:
df.info()
The output of df.info() shows that the first column, Country, is a categorical feature, since it is stored with the object data type, while the remaining columns are numerical features stored as int64.
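If you would rather pick out the categorical columns in code than read them off the summary, the sketch below shows one way to do it. Note that Salary.csv is not reproduced here, so the small DataFrame, its values, and the Age and Salary column names are assumptions made purely for illustration; only the Country column comes from the walkthrough.

```python
import pandas as pd

# Hypothetical stand-in for Salary.csv: only 'Country' is named in the
# article; 'Age', 'Salary', and all values are made up for illustration.
sample = pd.DataFrame({
    "Country": ["India", "US", "Japan", "US", "Japan"],
    "Age": [44, 34, 46, 35, 23],
    "Salary": [72000, 65000, 98000, 45000, 34000],
})

# Keep every column that is NOT numeric; these are the candidates
# for label encoding.
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# The remaining columns are the numerical features.
numerical_cols = sample.select_dtypes(include="number").columns.tolist()

print("Categorical:", categorical_cols)
print("Numerical:", numerical_cols)
```

Excluding numeric types, instead of asking for object columns directly, is a design choice that keeps the check working even if pandas stores the strings under a different dtype.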
Now, let us implement label encoding in Python:
# Import label encoder
from sklearn import preprocessing
# Create a LabelEncoder object, which maps string labels to integers.
label_encoder = preprocessing.LabelEncoder()
# Encode the labels in the 'Country' column.
df['Country'] = label_encoder.fit_transform(df['Country'])
print(df.head())
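To see exactly which integer each label received, you can inspect the fitted encoder. The following is a minimal, self-contained sketch: since Salary.csv is not available here, it uses a small made-up DataFrame, and the country values, salary figures, and the Country_encoded column name are assumptions for illustration only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for Salary.csv; all values are made up.
sample = pd.DataFrame({
    "Country": ["India", "US", "Japan", "US", "Japan"],
    "Salary": [72000, 65000, 98000, 45000, 34000],
})

encoder = LabelEncoder()
# Write the codes to a new column so the original labels stay visible.
sample["Country_encoded"] = encoder.fit_transform(sample["Country"])

# classes_ holds the sorted unique labels; each label's position in
# that array is the integer it was assigned.
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)  # {'India': 0, 'Japan': 1, 'US': 2}

# inverse_transform converts the integer codes back to the labels.
restored = encoder.inverse_transform(sample["Country_encoded"])
print(list(restored))
```

The mapping confirms the alphabetical assignment described earlier. It also hints at the main challenge with this technique: the codes 0, 1, and 2 look ordered, so a model may read Japan as "greater than" India even though the countries have no natural ranking.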