Learn One Hot Encoding in 3 Minutes
One Hot Encoding is an important technique used during the machine learning process.
What is One Hot Encoding
To better understand One Hot Encoding, we first need a good understanding of data. Note that there are different types of data:
Continuous Data
Continuous data, often simply called numeric data, are values that can be quantified, measured, counted, or aggregated.
They are usually written in numeric digits (0–9), and mathematical operations can be performed on them.
Examples:
Age: 30, 45, 18, 65, etc.
Salary ($): 50,000, 75,000, 100,000
Weight (kg): 70.5, 65.2, 90.0
Categorical Data
Categorical data are values that can be classified under specific labels. They are either nominal (no inherent order) or ordinal (a meaningful order), and are typically used for grouping, ordering, or ranking.
Examples:
- Booleans: True or False, Yes or No
- Gender: Male or Female
- Income Level: Low, Medium, High
- Position: First, Second, Third
- Payment Type: Credit Card, Debit Card, Cash, PayPal
- Product Ratings: 1 star, 2 stars, 3 stars, 4 stars, 5 stars
One Hot Encoding
One Hot Encoding is a technique in Machine Learning for turning categorical data into numbers. It is built specifically for categorical values. Each category becomes its own binary column, and every row holds a 1 in the column of its category and a 0 in all the others, e.g. 001, 010, 100 for a variable with three categories. These binary features are also known as Dummy Variables.
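To make the idea concrete, here is a minimal hand-written sketch in plain Python. The sample values come from the Payment Type example above; the helper name one_hot and the choice to sort the categories alphabetically are assumptions made for illustration, not part of any library.

```python
# Hypothetical sample values, taken from the "Payment Type" example.
payments = ["Cash", "PayPal", "Credit Card", "Cash"]

# One column per distinct category, in alphabetical order.
categories = sorted(set(payments))


def one_hot(value, categories):
    """Return a list with a 1 at the position of `value` and 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]


encoded = [one_hot(payment, categories) for payment in payments]

print(categories)
for payment, row in zip(payments, encoded):
    print(payment, row)
```

Every row contains exactly one 1, which is where the name "one hot" comes from.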
When is it used
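One Hot Encoding is often applied to nominal variables in order to improve the performance of the algorithm. Nominal variables are variables that have no natural order, such as Payment Type. Coding them as plain integers (1, 2, 3) would suggest an order that does not exist, and separate binary columns avoid that.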
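It is often done before splitting the data into the training dataset and the testing dataset. A more cautious approach is to fit the encoder on the training dataset only and then apply it to the testing dataset, so that no information from the test data leaks into preprocessing.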
One Hot Encoding is used at the data preprocessing stage to prepare the data in a suitable format for the machine learning algorithm to learn from it effectively.
How to Use It
To use One Hot Encoding, import OneHotEncoder from the preprocessing module of scikit-learn (sklearn.preprocessing). It converts categorical data into numerical data, preparing the raw data so that it is suitable for machine learning models.
from sklearn.preprocessing import OneHotEncoder
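Once imported, the encoder could be used roughly as follows. This is a small sketch rather than a complete recipe: the train and test arrays are made-up data, and the handle_unknown="ignore" setting is one possible choice for dealing with categories that were not seen during fitting.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: a single categorical column ("Payment Type").
train = np.array([["Cash"], ["PayPal"], ["Credit Card"], ["Cash"]])
test = np.array([["PayPal"], ["Bitcoin"]])  # "Bitcoin" never appears in train

# handle_unknown="ignore" encodes unseen categories as all zeros
# instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")

# Learn the categories from the training data and encode it.
# The result is a sparse matrix by default, so convert it for printing.
train_encoded = encoder.fit_transform(train).toarray()

# Reuse the already fitted encoder on the test data.
test_encoded = encoder.transform(test).toarray()

print(encoder.categories_[0])  # the categories learned for the column
print(train_encoded)
print(test_encoded)
```

The encoder is fitted once and then reused, so the training and test data end up with the same columns in the same order.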
See the official scikit-learn documentation on OneHotEncoder for more details.