Netflix Data Analysis
import numpy as np # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
netflix = pd.read_csv("netflix_titles.csv")
netflix.info()
Has Netflix increasingly focused on TV rather than movies in recent years?
First, we load the CSV file and display the first five rows of a few columns to see how the data is organized.
netflix[['type', 'title', 'listed_in', 'description']].head()
| | type | title | listed_in | description |
|---|---|---|---|---|
| 0 | Movie | Norm of the North: King Sized Adventure | Children & Family Movies, Comedies | Before planning an awesome wedding for his gra... |
| 1 | Movie | Jandino: Whatever it Takes | Stand-Up Comedy | Jandino Asporaat riffs on the challenges of ra... |
| 2 | TV Show | Transformers Prime | Kids' TV | With the help of three human allies, the Autob... |
| 3 | TV Show | Transformers: Robots in Disguise | Kids' TV | When a prison ship crash unleashes hundreds of... |
| 4 | Movie | #realityhigh | Comedies | When nerdy high schooler Dani finally attracts... |
Next, counting the values in the `type` column shows that Netflix has more movies than TV shows in its catalog.
netflix['type'].value_counts(ascending=True)
TV Show 1969 Movie 4265 Name: type, dtype: int64
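The same counts can also be read as proportions of the catalog. The following is a minimal sketch that reuses the two totals printed above; the names `type_counts` and `type_share` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Totals copied from the value_counts output above
type_counts = pd.Series({"TV Show": 1969, "Movie": 4265}, name="type")

# Fraction of the catalog held by each type, rounded to three decimals
type_share = (type_counts / type_counts.sum()).round(3)
print(type_share)
```

On the full DataFrame, `netflix['type'].value_counts(normalize=True)` should give the same proportions directly.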
Next, we check how many TV shows and movies were added in 2018, 2019 and 2020 by testing whether `date_added` contains each year:
netflix[(netflix['type']=='TV Show')].value_counts(netflix['date_added'].str.contains('2020'))
netflix[(netflix['type']=='TV Show')].value_counts(netflix['date_added'].str.contains('2019'))
netflix[(netflix['type']=='TV Show')].value_counts(netflix['date_added'].str.contains('2018'))
date_added False 1467 True 492 dtype: int64
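In this output, 492 TV shows have a matching `date_added`. Because a notebook cell displays only the value of its last expression, this count comes from the 2018 filter alone rather than from the three years combined.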
netflix[(netflix['type']=='Movie')].value_counts(netflix['date_added'].str.contains('2020'))
netflix[(netflix['type']=='Movie')].value_counts(netflix['date_added'].str.contains('2019'))
netflix[(netflix['type']=='Movie')].value_counts(netflix['date_added'].str.contains('2018'))
date_added False 2974 True 1290 dtype: int64
In this output, 1,290 movies have a matching `date_added`. Only the last expression in the cell is displayed, so this count is for the 2018 filter.
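To get a count for every year at once instead of one filter per line, one option is to extract the year from `date_added` and cross-tabulate it against `type`. The following is a minimal sketch on hypothetical toy rows: the `toy` frame, the `year_added` column and the sample dates are assumptions for illustration, not data from the CSV.

```python
import pandas as pd

# Hypothetical toy rows mimicking the 'type' and 'date_added' columns
toy = pd.DataFrame({
    "type": ["TV Show", "Movie", "Movie", "TV Show", "Movie", "TV Show"],
    "date_added": ["September 9, 2019", "January 1, 2018", "March 15, 2019",
                   "July 4, 2018", None, "December 31, 2018"],
})

# Pull the four-digit year out of the date string; missing dates become NaN
toy["year_added"] = toy["date_added"].str.extract(r"(\d{4})", expand=False)

# Drop rows without a date, then convert the year to an integer
dated = toy.dropna(subset=["year_added"]).astype({"year_added": int})

# One row per year, one column per type
per_year = pd.crosstab(dated["year_added"], dated["type"])
print(per_year)
```

Applied to the real `netflix` DataFrame, the same steps would give the per-year totals for both types in a single table, which avoids relying on whichever filter happens to be last in a cell.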
As we can see, Netflix added more movies than TV shows in the period analysed. However, many TV shows run for several seasons, so a lower title count does not mean that TV shows are less relevant. And if we compare these numbers with the TV shows added in 2015, 2016 and 2017, we see that Netflix sharply increased the number of TV shows added to its platform.
netflix[(netflix['type']=='TV Show')].value_counts(netflix['date_added'].str.contains('2017'))
netflix[(netflix['type']=='TV Show')].value_counts(netflix['date_added'].str.contains('2016'))
netflix[(netflix['type']=='TV Show')].value_counts(netflix['date_added'].str.contains('2015'))
date_added False 1927 True 32 dtype: int64
In this output, 32 TV shows have a matching `date_added`. This is the count for the last expression in the cell, the 2015 filter.
So the displayed counts are 32 TV shows for 2015 against 492 for 2018, roughly 15 times as many.
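The ratio can be checked with a few lines of arithmetic. This is a small sketch using only the counts printed in the outputs above (492 and 32 TV shows, 1,290 movies); the variable names are illustrative assumptions.

```python
# Counts copied from the outputs above
tv_later, tv_earlier, movies_later = 492, 32, 1290

# How many times larger the later TV show count is
growth = tv_later / tv_earlier

# TV shows as a fraction of the later TV show and movie counts combined
tv_share = tv_later / (tv_later + movies_later)

print(f"{growth:.1f}x, {tv_share:.1%}")  # prints "15.4x, 27.6%"
```

So even after the roughly 15-fold increase, TV shows make up a little over a quarter of the titles in the later counts.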
In conclusion, Netflix has greatly increased its focus on TV shows, but movies remain the main focus of the streaming service: the movie count shown above (1,290) is more than twice the TV show count (492). Therefore, Netflix has not shifted its focus to TV shows at the expense of movies.
What is the greatest number of seasons for a TV show on Netflix?
Many TV shows share the same number of seasons, but only two reach the maximum, as we can see below.
netflix[(netflix['type']=='TV Show')].value_counts(['duration'], ascending=True)
duration
14 Seasons 1
12 Seasons 2
13 Seasons 2
15 Seasons 2
10 Seasons 3
11 Seasons 3
9 Seasons 7
8 Seasons 16
7 Seasons 21
6 Seasons 22
5 Seasons 46
4 Seasons 61
3 Seasons 158
2 Seasons 304
1 Season 1321
dtype: int64
The code above uses `value_counts` to count how many TV shows have each number of seasons, ordered from the rarest to the most common.
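Because `duration` is stored as text, the table above is ordered by frequency rather than by season count. A minimal sketch of one way to order it numerically follows, assuming a hypothetical sample of `duration` strings (`durations`, `n_seasons` and `season_counts` are illustrative names, not from the notebook).

```python
import pandas as pd

# Hypothetical sample of 'duration' values for TV shows
durations = pd.Series([
    "1 Season", "15 Seasons", "2 Seasons", "1 Season",
    "10 Seasons", "2 Seasons", "1 Season",
])

# Extract the leading integer from strings such as "15 Seasons"
n_seasons = durations.str.extract(r"(\d+)", expand=False).astype(int)

# Count shows per season number, ordered by number of seasons
season_counts = n_seasons.value_counts().sort_index()
print(season_counts)
```

With the numeric index, the last row of `season_counts` is the largest season count present in the data.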
If we select only the TV shows with more than 13 seasons, we get the titles and details of the longest-running TV shows on Netflix.
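# Parse the season count as a number; comparing the raw strings is lexicographic ('2 Seasons' > '13 Seasons')
seasons = netflix['duration'].str.extract(r'(\d+)', expand=False).astype(float)
netflix_biggest_shows = netflix[(netflix['type'] == 'TV Show') & (seasons > 13)].assign(seasons=seasons).sort_values('seasons')
netflix_biggest_shows[['title', 'duration', 'listed_in', 'description']].head(3)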
| | title | duration | listed_in | description |
|---|---|---|---|---|
| 5787 | Supernatural | 14 Seasons | Classic & Cult TV, TV Action & Adventure, TV H... | Siblings Dean and Sam crisscross the country, ... |
| 5908 | Grey's Anatomy | 15 Seasons | Romantic TV Shows, TV Dramas | Intern (and eventual resident) Meredith Grey f... |
| 5974 | NCIS | 15 Seasons | Crime TV Shows, TV Dramas, TV Mysteries | Follow the quirky agents of the NCIS – the Nav... |
The code above shows only the longest-running TV shows in the dataset.
Therefore, Grey's Anatomy and NCIS are the longest-running TV shows in the Netflix catalog, with 15 seasons each.
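The maximum can also be found without choosing a threshold by hand. The sketch below assumes a small hypothetical frame (`shows`) built from the three titles in the table above plus one made-up row; it illustrates why the season count should be compared as a number and then keeps every show tied for the maximum.

```python
import pandas as pd

# Three rows taken from the table above, plus one hypothetical short show
shows = pd.DataFrame({
    "title": ["Supernatural", "Grey's Anatomy", "NCIS", "Hypothetical Short Show"],
    "duration": ["14 Seasons", "15 Seasons", "15 Seasons", "2 Seasons"],
})

# Raw string comparison is lexicographic, so "2 Seasons" ranks above "15 Seasons"
assert "2 Seasons" > "15 Seasons"

# Compare numerically instead: parse the leading integer
shows["seasons"] = shows["duration"].str.extract(r"(\d+)", expand=False).astype(int)

# Keep every show tied for the highest season count
longest = shows[shows["seasons"] == shows["seasons"].max()]
print(longest[["title", "seasons"]])
```

Filtering on `max()` rather than on `head(3)` means ties are kept automatically and shorter shows are never included by accident.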