How to use Pandas pd.cut with an Age example

meet Shah
2 min read · Feb 6, 2021

The pd.cut function bins values into discrete intervals. If you are new to Python, the example below may help clear up some confusion. Let’s start by creating a simple DataFrame.

import pandas as pd

# Four known ages plus one missing value (the last row)
data = [[5], [16], [24], [25], [None]]
df = pd.DataFrame(data, columns=['Age'])
bins = [5, 16, 25, 101]
group = ['<16', '16-24', '>24']
# right=False makes each bin closed on the left and open on the right
df['age_group'] = pd.cut(df['Age'], bins=bins, labels=group, right=False).cat.add_categories('missing').fillna('missing')
df

As you can see, I want to group the ages into three categories: <16, 16–24 and >24. Note that 16–24 is inclusive on both ends, so 16 and 24 both belong to it. I’ve built the bins [5, 16, 25, 101] around these groups and, to make sure they behave as intended, I’ve passed right=False in the line that creates the new column ‘age_group’. Below you can see the output.

[Figure: output of the DataFrame using right=False]

right=False means that each bin excludes its right edge and includes its left edge, so my bins are [5,16), [16,25) and [25,101). Note that the default is right=True. Let’s see what output we get when we use right=True.

[Figure: output of the DataFrame using right=True]
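To see the bin edges directly, here is a minimal sketch that calls pd.cut without labels, so the result holds the intervals themselves rather than the group names. It reuses the article's ages and bins; the variable names left_closed, right_closed and comparison are only illustrative.

```python
import pandas as pd

ages = pd.Series([5, 16, 24, 25], name="Age")
bins = [5, 16, 25, 101]

# Without labels, pd.cut returns the intervals, which makes the
# open and closed edges of each bin visible.
left_closed = pd.cut(ages, bins=bins, right=False)  # [5, 16), [16, 25), [25, 101)
right_closed = pd.cut(ages, bins=bins, right=True)  # (5, 16], (16, 25], (25, 101]

# Side-by-side view: age 5 lands in [5, 16) with right=False,
# but falls outside (5, 16] with right=True and becomes NaN.
comparison = pd.DataFrame({
    "Age": ages,
    "right=False": left_closed,
    "right=True": right_closed,
})
print(comparison)
```

Ages that sit exactly on a bin edge (5, 16 and 25 here) are the ones that change bins between the two settings.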

Now each bin includes its right edge and excludes its left edge, so the bins are (5,16], (16,25] and (25,101]. After calling pd.cut(), I’ve chained .cat.add_categories('missing').fillna('missing'). pd.cut returns NaN for any value that falls outside every bin, and also for values that are already NaN. The chain first registers ‘missing’ as a category, because a categorical column only accepts values that are declared categories, and then fills those NaN entries with it. As you can see in the above output, age 5 is categorized as ‘missing’ because the bin (5,16] starts just above 5, so for whole-number ages it covers 6–16.
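The sketch below illustrates why the add_categories step comes before fillna. It assumes a reasonably recent pandas version, where filling a categorical with an undeclared value raises an error (TypeError in current releases, ValueError in some older ones). The names binned, fill_rejected and filled are illustrative only.

```python
import pandas as pd

ages = pd.Series([5, 16, 24, 25, None], name="Age")
bins = [5, 16, 25, 101]
group = ["<16", "16-24", ">24"]

# Default right=True: age 5 is outside (5, 16], so it becomes NaN,
# just like the age that was missing to begin with.
binned = pd.cut(ages, bins=bins, labels=group)
print(binned.isna().tolist())  # [True, False, False, False, True]

# Filling with a value that is not a declared category is rejected.
try:
    binned.fillna("missing")
    fill_rejected = False
except (TypeError, ValueError):
    fill_rejected = True
print(fill_rejected)

# Declaring the category first makes the fill valid.
filled = binned.cat.add_categories("missing").fillna("missing")
print(filled.tolist())  # ['missing', '<16', '16-24', '16-24', 'missing']
```

One consequence of this pattern is that out-of-range ages and genuinely missing ages end up with the same ‘missing’ label, so they can no longer be told apart in the new column.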

Thank you for reading my article. HAPPY CODING!!!
