.Series()
method. We will also give the Series a name by passing a string to the name=
keyword argument. This name will also provide pandas with the column name for these values when the Series is added to a DataFrame.

import pandas as pd
years = [2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014]
year_series = pd.Series(data=years, name='year')
year_series
0 2021
1 2020
2 2019
3 2018
4 2017
5 2016
6 2015
7 2014
Name: year, dtype: int64
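As a quick check of the name's role, the .to_frame() method promotes a Series to a one-column DataFrame, and the Series name becomes the column label. A minimal sketch:

```python
import pandas as pd

# Recreate the Series from above
year_series = pd.Series([2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014], name='year')

# The Series name carries over as the DataFrame column label
frame = year_series.to_frame()
print(frame.columns.tolist())  # ['year']
```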
.values
attribute gives us the list of years we originally passed to pandas, but we can see here that it has been converted into a numpy array.

print(type(year_series.values))
year_series.values
<class 'numpy.ndarray'>
array([2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014])
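As an aside, newer pandas releases recommend the .to_numpy() method over the .values attribute, since .values can return different container types for some extension dtypes; for a plain int64 Series the two agree. A small sketch:

```python
import pandas as pd

year_series = pd.Series([2021, 2020, 2019, 2018, 2015, 2014], name='year')

# .to_numpy() is the recommended accessor; for an int64 Series it
# returns the same data as .values
arr = year_series.to_numpy()
print(type(arr))                          # <class 'numpy.ndarray'>
print(bool((arr == year_series.values).all()))  # True
```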
.memory_usage()
method, which outputs the amount of space being used in bytes. For a refresher, every byte is 8 bits, so each of our int64 elements occupies 8 bytes. Since our Series contains 8 values, the values alone should take 8 × 8 = 64 bytes, so we might expect the .memory_usage()
method to give us an output of 64 bytes.

print(year_series.memory_usage(), 'bytes')
192 bytes
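The extra bytes come from the index, which .memory_usage() counts by default. Its index keyword argument lets us exclude it and isolate the values:

```python
import pandas as pd

year_series = pd.Series([2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014], name='year')

# Exclude the index to see only the values: 8 int64 values * 8 bytes each
print(year_series.memory_usage(index=False), 'bytes')  # 64 bytes
```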
.nbytes
attribute will give us the memory consumption of a numpy array. Let's combine it with the .values
and .index
attributes on the Series object to check the consumption of both.

print(
f'''
values consumption: {year_series.values.nbytes} bytes
index consumption: {year_series.index.nbytes} bytes
''')
values consumption: 64 bytes
index consumption: 128 bytes
year_series.index.dtype
dtype('int64')
pd.DataFrame()
method. To reduce the amount of memory used by the index, let's try creating our own index with the pandas .Index()
method. Then we can recreate our Series with our new custom index by passing it to the index=
keyword argument. For even more memory efficiency, we can alter the dtype used for the underlying data in our Series. Pandas decided to use 'int64', which can store values between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807. This is far more range than we need. Let's downgrade the data type of our values to 'uint16'. The 'u' is for unsigned, which works well for values that will never be negative, like a year, and the '16' is for 16 bits, which can store numbers ranging from 0 to 65,535. This is plenty for representing the years of Python and pandas library versions.

index = pd.Index(list(range(len(years))))
year_series2 = pd.Series(years, index=index, dtype='uint16', name='year')
year_series2
0 2021
1 2020
2 2019
3 2018
4 2017
5 2016
6 2015
7 2014
Name: year, dtype: uint16
print(
f'''
total consumption: {year_series2.memory_usage()} bytes
values consumption: {year_series2.values.nbytes} bytes
index consumption: {year_series2.index.nbytes} bytes
''')
total consumption: 80 bytes
values consumption: 16 bytes
index consumption: 64 bytes
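Rather than hand-picking 'uint16', we can also let pandas choose the smallest safe type for us: pd.to_numeric() accepts a downcast keyword argument that inspects the data and downcasts automatically. A sketch:

```python
import pandas as pd

years = [2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014]

# downcast='unsigned' picks the smallest unsigned integer type that can
# hold every value; the years fit in uint16 but not in uint8
auto = pd.to_numeric(pd.Series(years, name='year'), downcast='unsigned')
print(auto.dtype)  # uint16
```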
python_versions = [3.9, 3.9, 3.8, 3.7, 3.6, 3.6, 3, 3]
python_series = pd.Series(python_versions, index=index, dtype='float32', name='python_version')
python_series
0 3.9
1 3.9
2 3.8
3 3.7
4 3.6
5 3.6
6 3.0
7 3.0
Name: python_version, dtype: float32
print(
f'''
total consumption: {python_series.memory_usage()} bytes
values consumption: {python_series.values.nbytes} bytes
index consumption: {python_series.index.nbytes} bytes
''')
total consumption: 96 bytes
values consumption: 32 bytes
index consumption: 64 bytes
pandas_versions = ['1.2->1.4', '1.0->1.1', '0.24->0.25', '0.23', '0.20->0.22',
                  '0.18->0.19', '0.16->0.17', '0.13->0.15']
pandas_series = pd.Series(pandas_versions, index=index, name='pandas_version')
pandas_series
0 1.2->1.4
1 1.0->1.1
2 0.24->0.25
3 0.23
4 0.20->0.22
5 0.18->0.19
6 0.16->0.17
7 0.13->0.15
Name: pandas_version, dtype: object
print(
f'''
total consumption: {pandas_series.memory_usage()} bytes
values consumption: {pandas_series.values.nbytes} bytes
index consumption: {pandas_series.index.nbytes} bytes
''')
total consumption: 128 bytes
values consumption: 64 bytes
index consumption: 64 bytes
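One caveat with object-dtype Series like this one: by default, .memory_usage() counts only the 8-byte pointers to the Python string objects, not the strings themselves. Passing deep=True measures the actual string objects too (a sketch using a default index, so the exact byte counts will differ from the custom-index Series above):

```python
import pandas as pd

pandas_versions = ['1.2->1.4', '1.0->1.1', '0.24->0.25', '0.23',
                   '0.20->0.22', '0.18->0.19', '0.16->0.17', '0.13->0.15']
pandas_series = pd.Series(pandas_versions, name='pandas_version')

# Shallow count: the index plus one 8-byte pointer per string
print(pandas_series.memory_usage(), 'bytes (shallow)')
# Deep count: also includes the Python string objects themselves
print(pandas_series.memory_usage(deep=True), 'bytes (deep)')
```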
pd.concat()
method.

df = pd.concat([year_series2, python_series, pandas_series], axis=1)
df
.memory_usage()
method that the individual Series used.

print(df.memory_usage())
print()
print(f'total memory consumption: {df.memory_usage().sum()} bytes')
Index 64
year 16
python_version 32
pandas_version 64
dtype: int64
total memory consumption: 176 bytes
pd.DataFrame()
constructor method and checked the memory consumption.

data = {'year': years, 'python_versions': python_versions, 'pandas_versions': pandas_versions}
df2 = pd.DataFrame(data=data)
df2
print(df2.memory_usage())
print()
print(f'total memory consumption: {df2.memory_usage().sum()} bytes')
Index 128
year 64
python_versions 64
pandas_versions 64
dtype: int64
total memory consumption: 320 bytes
pd.DataFrame()
method with no customizing came in at 320 bytes. Compared with the 176 bytes of our customized DataFrame, that is roughly 80% more memory; put another way, our custom index and dtypes cut memory consumption by 45%!
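If a DataFrame has already been built with the default dtypes, the same savings are still available after the fact: the .astype() method accepts a column-to-dtype mapping. A sketch that recreates df2 and downcasts its numeric columns:

```python
import pandas as pd

years = [2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014]
python_versions = [3.9, 3.9, 3.8, 3.7, 3.6, 3.6, 3, 3]
pandas_versions = ['1.2->1.4', '1.0->1.1', '0.24->0.25', '0.23',
                   '0.20->0.22', '0.18->0.19', '0.16->0.17', '0.13->0.15']

df2 = pd.DataFrame({'year': years,
                    'python_versions': python_versions,
                    'pandas_versions': pandas_versions})

# Downcast the numeric columns from the default int64/float64
df2 = df2.astype({'year': 'uint16', 'python_versions': 'float32'})

# The year column now takes 2 bytes per value, the floats 4 bytes
print(df2['year'].nbytes, df2['python_versions'].nbytes)  # 16 32
```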