Exploratory Data Analysis of CO2 Emissions Data with Python
This EDA is a work in progress and is the groundwork for a visulisation in Power Bi.
ToDo:
- Remove Global Total Row From the Top 20 and 5 Datasets
- % Increase in Avaiation and Shipping
- Extract Figures for Aviation and Shipping
- % of Total Emissions by the Top 20 - 1970
- % of Total Emissions by the Top 5 - 1970
- % of Total Emissions by the Top 20 - 2023
- % of Total Emissions by the Top 5 - 2023
- Total CO2 Emissions for Each Year
- Grand Total Emissions of Co2
- Create a ToC
- Create A Visual/Dashboard
An EDA of CO2 emissons based on EDGAR(Emissions Database for Global Atmospheric Research) data.
# Import
# To read Excel files install openpyxl, 'pip install openpyxl'.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Excel file
file_path = 'data/EDGAR_2024_GHG_booklet_2024_fossilCO2only.xlsx'
excel_file = pd.ExcelFile(file_path)
# Get the sheet names
sheet_names = excel_file.sheet_names
print(sheet_names)
['info', 'citations and references', 'fossil_CO2_totals_by_country', 'fossil_CO2_by_sector_country_su', 'fossil_CO2_per_GDP_by_country', 'fossil_CO2_per_capita_by_countr']
# Load a specific 'fossil_CO2_totals_by_country' sheet into a DataFrame
# You can load the other sheets here for additional Analysis
data = pd.read_excel(file_path, sheet_name='fossil_CO2_totals_by_country')
# Inspect the Data
print(data.head())
print(data.info())
print(data.describe())
Substance EDGAR Country Code Country 1970 \
0 CO2 ABW Aruba 0.025214
1 CO2 AFG Afghanistan 1.733920
2 CO2 AGO Angola 8.933899
3 CO2 AIA Anguilla 0.002178
4 CO2 AIR International Aviation 169.900399
1971 1972 1973 1974 1975 1976 \
0 0.028828 0.039472 0.044289 0.043469 0.057396 0.056423
1 1.733710 1.693584 1.733905 2.190318 2.028967 1.892642
2 8.519513 10.366104 11.346996 11.806561 10.904653 7.291981
3 0.002178 0.002273 0.002118 0.002360 0.002594 0.002444
4 169.900399 179.759531 187.494406 180.478129 174.582471 174.907983
... 2014 2015 2016 2017 2018 \
0 ... 0.440689 0.462026 0.484889 0.466592 0.465881
1 ... 7.825741 8.346521 7.527594 8.066138 7.932005
2 ... 30.887264 33.097499 31.285803 27.942099 26.258887
3 ... 0.027917 0.028027 0.028363 0.029087 0.028247
4 ... 507.505761 536.213680 560.173839 589.919315 615.937542
2019 2020 2021 2022 2023
0 0.557917 0.452553 0.500635 0.502693 0.530026
1 7.249069 7.054133 7.930781 8.259915 8.707350
2 27.573216 20.710918 25.262832 27.353038 28.229928
3 0.027604 0.022804 0.022018 0.021861 0.022956
4 625.141435 298.655678 331.317425 411.474866 491.632308
[5 rows x 57 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 57 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Substance 212 non-null object
1 EDGAR Country Code 212 non-null object
2 Country 212 non-null object
3 1970 212 non-null float64
4 1971 212 non-null float64
5 1972 212 non-null float64
6 1973 212 non-null float64
7 1974 212 non-null float64
8 1975 212 non-null float64
9 1976 212 non-null float64
10 1977 212 non-null float64
11 1978 212 non-null float64
12 1979 212 non-null float64
13 1980 212 non-null float64
14 1981 212 non-null float64
15 1982 212 non-null float64
16 1983 212 non-null float64
17 1984 212 non-null float64
18 1985 212 non-null float64
19 1986 212 non-null float64
20 1987 212 non-null float64
21 1988 212 non-null float64
22 1989 212 non-null float64
23 1990 212 non-null float64
24 1991 212 non-null float64
25 1992 212 non-null float64
26 1993 212 non-null float64
27 1994 212 non-null float64
28 1995 212 non-null float64
29 1996 212 non-null float64
30 1997 212 non-null float64
31 1998 212 non-null float64
32 1999 212 non-null float64
33 2000 212 non-null float64
34 2001 212 non-null float64
35 2002 212 non-null float64
36 2003 212 non-null float64
37 2004 212 non-null float64
38 2005 212 non-null float64
39 2006 212 non-null float64
40 2007 212 non-null float64
41 2008 212 non-null float64
42 2009 212 non-null float64
43 2010 212 non-null float64
44 2011 212 non-null float64
45 2012 212 non-null float64
46 2013 212 non-null float64
47 2014 212 non-null float64
48 2015 212 non-null float64
49 2016 212 non-null float64
50 2017 212 non-null float64
51 2018 212 non-null float64
52 2019 212 non-null float64
53 2020 212 non-null float64
54 2021 212 non-null float64
55 2022 212 non-null float64
56 2023 212 non-null float64
dtypes: float64(54), object(3)
memory usage: 95.4+ KB
None
1970 1971 1972 1973 1974 \
count 212.000000 212.000000 212.000000 212.000000 212.000000
mean 165.187547 164.593604 172.811704 182.994107 182.107000
std 1156.059764 1149.393923 1207.690640 1277.712318 1270.065099
min 0.000781 0.000808 0.000830 0.000849 0.000866
25% 0.286447 0.286884 0.318343 0.343017 0.365587
50% 3.386873 3.597102 3.847670 4.354438 4.416961
75% 28.495989 28.027408 30.036502 32.365149 31.981332
max 15751.858044 15683.389817 16481.436077 17464.383654 17400.275859
1975 1976 1977 1978 1979 \
count 212.000000 212.000000 212.000000 212.000000 212.000000
mean 181.002229 191.552881 196.688969 203.219252 209.009110
std 1261.219562 1334.127097 1373.062121 1416.315701 1454.302602
min 0.000880 0.000893 0.000903 0.000913 0.000923
25% 0.385887 0.352417 0.394585 0.401558 0.452316
50% 4.937966 5.336292 4.912663 5.285625 5.108881
75% 34.740841 37.028400 38.140764 38.765811 40.364992
max 17328.558005 18317.005105 18871.708706 19491.220885 20031.856482
... 2014 2015 2016 2017 \
count ... 212.000000 212.000000 212.000000 212.000000
mean ... 357.931300 357.014848 358.217444 364.215338
std ... 2640.222845 2628.672453 2636.405088 2681.270458
min ... 0.001999 0.001999 0.001999 0.002013
25% ... 1.689968 1.598465 1.643029 1.802726
50% ... 9.933750 10.577303 10.712506 10.877226
75% ... 65.950974 67.545226 67.691992 68.810767
max ... 36427.769713 36300.466451 36423.660699 37047.472031
2018 2019 2020 2021 2022 \
count 212.000000 212.000000 212.000000 212.000000 212.000000
mean 372.634995 372.835024 353.536811 372.998986 373.821483
std 2753.141689 2762.651851 2636.710472 2779.917529 2786.844016
min 0.002029 0.002045 0.002059 0.002073 0.002089
25% 1.956477 1.974339 2.024798 2.149692 2.277833
50% 12.212169 12.533925 11.492360 12.185316 12.472898
75% 66.465855 69.363412 66.917612 65.511073 63.795406
max 37974.553886 38066.434146 36154.308044 38121.014981 38246.624061
2023
count 212.000000
mean 379.999726
std 2851.542226
min 0.002103
25% 2.378238
50% 12.050200
75% 64.702804
max 39023.937039
[8 rows x 54 columns]
# Check for missing values
print(data.isnull().sum())
Substance 2
EDGAR Country Code 2
Country 2
1970 2
1971 2
1972 2
1973 2
1974 2
1975 2
1976 2
1977 2
1978 2
1979 2
1980 2
1981 2
1982 2
1983 2
1984 2
1985 2
1986 2
1987 2
1988 2
1989 2
1990 2
1991 2
1992 2
1993 2
1994 2
1995 2
1996 2
1997 2
1998 2
1999 2
2000 2
2001 2
2002 2
2003 2
2004 2
2005 2
2006 2
2007 2
2008 2
2009 2
2010 2
2011 2
2012 2
2013 2
2014 2
2015 2
2016 2
2017 2
2018 2
2019 2
2020 2
2021 2
2022 2
2023 2
dtype: int64
# Check for rows where all columns are missing
# As we can see from above we have 2 missing values for each coloumn which could be that we have two rows with no data
missing_rows = data[data.isnull().all(axis=1)]
print(missing_rows)
Substance EDGAR Country Code Country 1970 1971 1972 1973 1974 1975 \
210 NaN NaN NaN NaN NaN NaN NaN NaN NaN
212 NaN NaN NaN NaN NaN NaN NaN NaN NaN
1976 ... 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
210 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
212 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
[2 rows x 57 columns]
# Let's drop thos emissing rows
# Drop rows where all columns are missing
data_cleaned = data.dropna(how='all')
# Display the first few rows of the cleaned dataset
print(data_cleaned.head())
# Check if the rows were successfully dropped
print(data_cleaned.isnull().sum())
Substance EDGAR Country Code Country 1970 \
0 CO2 ABW Aruba 0.025214
1 CO2 AFG Afghanistan 1.733920
2 CO2 AGO Angola 8.933899
3 CO2 AIA Anguilla 0.002178
4 CO2 AIR International Aviation 169.900399
1971 1972 1973 1974 1975 1976 \
0 0.028828 0.039472 0.044289 0.043469 0.057396 0.056423
1 1.733710 1.693584 1.733905 2.190318 2.028967 1.892642
2 8.519513 10.366104 11.346996 11.806561 10.904653 7.291981
3 0.002178 0.002273 0.002118 0.002360 0.002594 0.002444
4 169.900399 179.759531 187.494406 180.478129 174.582471 174.907983
... 2014 2015 2016 2017 2018 \
0 ... 0.440689 0.462026 0.484889 0.466592 0.465881
1 ... 7.825741 8.346521 7.527594 8.066138 7.932005
2 ... 30.887264 33.097499 31.285803 27.942099 26.258887
3 ... 0.027917 0.028027 0.028363 0.029087 0.028247
4 ... 507.505761 536.213680 560.173839 589.919315 615.937542
2019 2020 2021 2022 2023
0 0.557917 0.452553 0.500635 0.502693 0.530026
1 7.249069 7.054133 7.930781 8.259915 8.707350
2 27.573216 20.710918 25.262832 27.353038 28.229928
3 0.027604 0.022804 0.022018 0.021861 0.022956
4 625.141435 298.655678 331.317425 411.474866 491.632308
[5 rows x 57 columns]
Substance 0
EDGAR Country Code 0
Country 0
1970 0
1971 0
1972 0
1973 0
1974 0
1975 0
1976 0
1977 0
1978 0
1979 0
1980 0
1981 0
1982 0
1983 0
1984 0
1985 0
1986 0
1987 0
1988 0
1989 0
1990 0
1991 0
1992 0
1993 0
1994 0
1995 0
1996 0
1997 0
1998 0
1999 0
2000 0
2001 0
2002 0
2003 0
2004 0
2005 0
2006 0
2007 0
2008 0
2009 0
2010 0
2011 0
2012 0
2013 0
2014 0
2015 0
2016 0
2017 0
2018 0
2019 0
2020 0
2021 0
2022 0
2023 0
dtype: int64
# Display the data types of all columns
print(data.dtypes)
Substance object
EDGAR Country Code object
Country object
1970 float64
1971 float64
1972 float64
1973 float64
1974 float64
1975 float64
1976 float64
1977 float64
1978 float64
1979 float64
1980 float64
1981 float64
1982 float64
1983 float64
1984 float64
1985 float64
1986 float64
1987 float64
1988 float64
1989 float64
1990 float64
1991 float64
1992 float64
1993 float64
1994 float64
1995 float64
1996 float64
1997 float64
1998 float64
1999 float64
2000 float64
2001 float64
2002 float64
2003 float64
2004 float64
2005 float64
2006 float64
2007 float64
2008 float64
2009 float64
2010 float64
2011 float64
2012 float64
2013 float64
2014 float64
2015 float64
2016 float64
2017 float64
2018 float64
2019 float64
2020 float64
2021 float64
2022 float64
2023 float64
dtype: object
# Display the cleaned column names
print(data_cleaned.columns)
Index([ 'Substance', 'EDGAR Country Code', 'Country',
1970, 1971, 1972,
1973, 1974, 1975,
1976, 1977, 1978,
1979, 1980, 1981,
1982, 1983, 1984,
1985, 1986, 1987,
1988, 1989, 1990,
1991, 1992, 1993,
1994, 1995, 1996,
1997, 1998, 1999,
2000, 2001, 2002,
2003, 2004, 2005,
2006, 2007, 2008,
2009, 2010, 2011,
2012, 2013, 2014,
2015, 2016, 2017,
2018, 2019, 2020,
2021, 2022, 2023],
dtype='object')
# Convert all columns to string types
data_cleaned = data_cleaned.astype(str)
# Clean column names
#data.columns = data.columns.str.strip() # Remove leading and trailing spaces
#data.columns = data.columns.str.replace('\n', ' ') # Replace newline characters with spaces
# Display the cleaned column names
print(data.columns)
# Display the data types of all columns
print(data_cleaned.dtypes)
Index([ 'Substance', 'EDGAR Country Code', 'Country',
1970, 1971, 1972,
1973, 1974, 1975,
1976, 1977, 1978,
1979, 1980, 1981,
1982, 1983, 1984,
1985, 1986, 1987,
1988, 1989, 1990,
1991, 1992, 1993,
1994, 1995, 1996,
1997, 1998, 1999,
2000, 2001, 2002,
2003, 2004, 2005,
2006, 2007, 2008,
2009, 2010, 2011,
2012, 2013, 2014,
2015, 2016, 2017,
2018, 2019, 2020,
2021, 2022, 2023],
dtype='object')
Substance object
EDGAR Country Code object
Country object
1970 object
1971 object
1972 object
1973 object
1974 object
1975 object
1976 object
1977 object
1978 object
1979 object
1980 object
1981 object
1982 object
1983 object
1984 object
1985 object
1986 object
1987 object
1988 object
1989 object
1990 object
1991 object
1992 object
1993 object
1994 object
1995 object
1996 object
1997 object
1998 object
1999 object
2000 object
2001 object
2002 object
2003 object
2004 object
2005 object
2006 object
2007 object
2008 object
2009 object
2010 object
2011 object
2012 object
2013 object
2014 object
2015 object
2016 object
2017 object
2018 object
2019 object
2020 object
2021 object
2022 object
2023 object
dtype: object
year_columns = data_cleaned.filter(regex=r'^\d{4}$').columns
print(year_columns)
Index([1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981,
1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2018, 2019, 2020, 2021, 2022, 2023],
dtype='object')
# Convert Index to list
year_columns_list = year_columns.tolist()
print(year_columns_list)
# Access the column by its name
print(year_columns_list[year_columns_list.index(1970)])
[1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
1970
# Drop rows where all columns are missing
data_cleaned = data.dropna(how='all')
Calculate Total CO2 Emissions and Percentage Contribution
# Convert all columns to string types
# This is because
#data = data.astype(str)
# Use regex to filter columns that contain the year (assuming years are four-digit numbers)
year_columns = data.filter(regex=r'^\d{4}$').columns
# Convert year columns to numeric
data[year_columns] = data[year_columns].apply(pd.to_numeric, errors='coerce')
# Calculate total CO2 emissions for each country
data['Total_CO2'] = data[year_columns].sum(axis=1)
# Identify the top 20 most polluting countries
top_20_countries = data.nlargest(20, 'Total_CO2')
# Calculate percentage contribution for each year
for year in year_columns:
total_emissions_year = data[year].sum()
top_20_countries[f'{year}_Percentage'] = (top_20_countries[year] / total_emissions_year) * 100
# Display the top 20 countries with their total CO2 emissions and percentage contribution
#print(top_20_countries[['Country', 'Total_CO2'] + [f'{year}_Percentage' for year in year_columns]])
print(top_20_countries[['Country', 'Total_CO2']])
Country Total_CO2
213 GLOBAL TOTAL 1.424122e+06
36 China 2.779245e+05
198 United States 2.761683e+05
211 EU27 1.925624e+05
160 Russia 9.865735e+04
99 Japan 6.067946e+04
90 India 5.948510e+04
50 Germany 5.013919e+04
69 United Kingdom 2.899794e+04
33 Canada 2.683242e+04
165 International Shipping 2.600512e+04
196 Ukraine 2.460650e+04
66 France and Monaco 2.174306e+04
96 Italy, San Marino and the Holy See 2.143453e+04
106 South Korea 2.040593e+04
151 Poland 1.938792e+04
92 Iran 1.933123e+04
123 Mexico 1.877987e+04
207 South Africa 1.868865e+04
4 International Aviation 1.807156e+04
# Return the Global Total for 1970 and 2023
# Extract the row for Global Total
global_total_row = data_cleaned[data_cleaned['Country'] == 'GLOBAL TOTAL']
print(global_total_row)
# Return the Global Total for 1970
global_total_1970 = global_total_row[1970].values[0]
# Return the Global Total for 2023
global_total_2023 = global_total_row[2023].values[0]
print(f"Global Total Co2 Emmisions - 1970: {global_total_1970}")
print(f"Global Total Co2 Emmisions - 2023: {global_total_2023}")
# Calculate the grand total of all years combined
grand_total = global_total_row[year_columns].sum(axis=1).values[0]
print(f"Grand Total of Emissions 1970 -2023: {grand_total}")
Substance EDGAR Country Code Country 1970 1971 \
213 CO2 GLOBAL TOTAL GLOBAL TOTAL 15751.858044 15683.389817
1972 1973 1974 1975 1976 \
213 16481.436077 17464.383654 17400.275859 17328.558005 18317.005105
... 2014 2015 2016 2017 \
213 ... 36427.769713 36300.466451 36423.660699 37047.472031
2018 2019 2020 2021 2022 \
213 37974.553886 38066.434146 36154.308044 38121.014981 38246.624061
2023
213 39023.937039
[1 rows x 57 columns]
Global Total Co2 Emmisions - 1970: 15751.858044223
Global Total Co2 Emmisions - 2023: 39023.937038738
Grand Total of Emissions 1970 -2023: 1424121.724861018
# Extract Figures for Aviation - 'International Aviation'
# Extarct Aviation Row
avaiation_row = data_cleaned[data_cleaned['Country'] == 'International Aviation']
print(avaiation_row)
# Return the Aviation Total for 1970
avaiation_total_1970 = avaiation_row[1970].values[0]
# Return the Global Total for 2023
avaiation_total_2023 = avaiation_row[2023].values[0]
print(f"International Aviation Total Co2 Emmisions - 1970: {global_total_1970}")
print(f"International Aviation Total Co2 Emmisions - 2023: {global_total_2023}")
# Total Emissions for International Aviation - 1970 -2023
aviation_grand_total = avaiation_row[year_columns].sum(axis=1).values[0]
print(f"Total Emissions for International Aviation - 1970 -2023: {aviation_grand_total}")
Substance EDGAR Country Code Country 1970 \
4 CO2 AIR International Aviation 169.900399
1971 1972 1973 1974 1975 1976 \
4 169.900399 179.759531 187.494406 180.478129 174.582471 174.907983
... 2014 2015 2016 2017 2018 2019 \
4 ... 507.505761 536.21368 560.173839 589.919315 615.937542 625.141435
2020 2021 2022 2023
4 298.655678 331.317425 411.474866 491.632308
[1 rows x 57 columns]
International Aviation Total Co2 Emmisions - 1970: 15751.858044223
International Aviation Total Co2 Emmisions - 2023: 39023.937038738
Total Emissions for International Aviation - 1970 -2023: 18071.560975227614
# International Aviation as a % Global Total
# Calculate the percentage of global emissions for each year
aviation_percentage_emissions = {} # Initialize an empty dictionary to store the percentage emissions for each year
for year in year_columns: # Loop through each year in the year_columns
global_total = global_total_row[year].values[0] # Get the global total CO2 emissions for the current year
shipping_total = avaiation_row[year].values[0] # Get the CO2 emissions for International Shipping for the current year
aviation_percentage_emissions[year] = (shipping_total / global_total) * 100 # Calculate the percentage of global emissions and store it in the dictionary
# Convert the dictionary to a DataFrame for plotting
aviation_percentage_emissions_df = pd.DataFrame(list(aviation_percentage_emissions.items()), columns=['Year', 'Percentage'])
import matplotlib.pyplot as plt
# Create a bar chart
plt.figure(figsize=(14, 8))
plt.bar(aviation_percentage_emissions_df['Year'], aviation_percentage_emissions_df['Percentage'], color='skyblue')
plt.xlabel('Year')
plt.ylabel('Percentage of Global CO2 Emissions (%)')
plt.title('International Aviation CO2 Emissions as a Percentage of Global Emissions (1970-2023)')
plt.xticks(rotation=45, ha='right')
plt.show()
# Extract Figures for = Shipping - 'International Shipping'
# Extarct Shipping Row
shipping_row = data_cleaned[data_cleaned['Country'] == 'International Shipping']
print(avaiation_row)
# Return the Aviation Total for 1970
shipping_total_1970 = shipping_row[1970].values[0]
print(f"International Shipping Total Co2 Emmisions - 1970: {shipping_total_1970}")
# Return the Global Total for 2023
shipping_total_2023 = shipping_row[2023].values[0]
print(f"International Shipping Total Co2 Emmisions - 2023: {shipping_total_2023}")
# Total Emissions for International Shipping - 1970 -2023
shipping_grand_total = shipping_row[year_columns].sum(axis=1).values[0]
print(f"Total Emissions for International Shipping - 1970 -2023: {shipping_grand_total}")
Substance EDGAR Country Code Country 1970 \
4 CO2 AIR International Aviation 169.900399
1971 1972 1973 1974 1975 1976 \
4 169.900399 179.759531 187.494406 180.478129 174.582471 174.907983
... 2014 2015 2016 2017 2018 2019 \
4 ... 507.505761 536.21368 560.173839 589.919315 615.937542 625.141435
2020 2021 2022 2023
4 298.655678 331.317425 411.474866 491.632308
[1 rows x 57 columns]
International Shipping Total Co2 Emmisions - 1970: 353.84635222099
International Shipping Total Co2 Emmisions - 2023: 706.32042124359
Total Emissions for International Shipping - 1970 -2023: 26005.11654870811
# International Shipping as a % Global Total
# Calculate the percentage of global emissions for each year
percentage_emissions = {} # Initialize an empty dictionary to store the percentage emissions for each year
for year in year_columns: # Loop through each year in the year_columns
global_total = global_total_row[year].values[0] # Get the global total CO2 emissions for the current year
shipping_total = shipping_row[year].values[0] # Get the CO2 emissions for International Shipping for the current year
percentage_emissions[year] = (shipping_total / global_total) * 100 # Calculate the percentage of global emissions and store it in the dictionary
# Convert the dictionary to a DataFrame for plotting
percentage_emissions_df = pd.DataFrame(list(percentage_emissions.items()), columns=['Year', 'Percentage'])
import matplotlib.pyplot as plt
# Create a bar chart
plt.figure(figsize=(14, 8))
plt.bar(percentage_emissions_df['Year'], percentage_emissions_df['Percentage'], color='skyblue')
plt.xlabel('Year')
plt.ylabel('Percentage of Global CO2 Emissions (%)')
plt.title('International Shipping CO2 Emissions as a Percentage of Global Emissions (1970-2023)')
plt.xticks(rotation=45, ha='right')
plt.show()
################################# REMOVE GLOBAL TOTAL FROM TOP 20 and TOP 5 ##################################
% of Total Emissions by the Top 20 - 1970
# Calculate total CO2 emissions for each country in 1970
data['Total_1970'] = data[1970].astype(float)
# Identify the top 20 most polluting countries in 1970
top_20_1970 = data.nlargest(20, 'Total_1970')
# Calculate the percentage of total emissions by the top 20 countries in 1970
total_emissions_1970 = data[1970].astype(float).sum()
top_20_percentage_1970 = (top_20_1970['Total_1970'].sum() / total_emissions_1970) * 100
print(f"% of Total Emissions by the Top 20 - 1970: {top_20_percentage_1970:.2f}%")
% of Total Emissions by the Top 20 - 1970: 91.59%
% of Total Emissions by the Top 5 - 1970
# Identify the top 5 most polluting countries in 1970
top_5_1970 = data.nlargest(5, 'Total_1970')
# Calculate the percentage of total emissions by the top 5 countries in 1970
top_5_percentage_1970 = (top_5_1970['Total_1970'].sum() / total_emissions_1970) * 100
print(f"% of Total Emissions by the Top 5 - 1970: {top_5_percentage_1970:.2f}%")
% of Total Emissions by the Top 5 - 1970: 74.97%
% of Total Emissions by the Top 20 - 2023
# Calculate total CO2 emissions for each country in 2023
data['Total_2023'] = data[2023].astype(float)
# Identify the top 20 most polluting countries in 2023
top_20_2023 = data.nlargest(20, 'Total_2023')
# Calculate the percentage of total emissions by the top 20 countries in 2023
total_emissions_2023 = data[2023].astype(float).sum()
top_20_percentage_2023 = (top_20_2023['Total_2023'].sum() / total_emissions_2023) * 100
print(f"% of Total Emissions by the Top 20 - 2023: {top_20_percentage_2023:.2f}%")
% of Total Emissions by the Top 20 - 2023: 90.15%
% of Total Emissions by the Top 5 - 2023
# Identify the top 5 most polluting countries in 2023
top_5_2023 = data.nlargest(5, 'Total_2023')
# Calculate the percentage of total emissions by the top 5 countries in 2023
top_5_percentage_2023 = (top_5_2023['Total_2023'].sum() / total_emissions_2023) * 100
print(f"% of Total Emissions by the Top 5 - 2023: {top_5_percentage_2023:.2f}%")
print(top_5_2023)
% of Total Emissions by the Top 5 - 2023: 77.50%
Substance EDGAR Country Code Country 1970 1971 \
213 CO2 GLOBAL TOTAL GLOBAL TOTAL 15751.858044 15683.389817
36 CO2 CHN China 909.976242 913.419357
198 CO2 USA United States 4595.062878 4459.919046
90 CO2 IND India 213.934448 214.428120
211 CO2 EU27 EU27 3516.043820 3527.064466
1972 1973 1974 1975 1976 \
213 16481.436077 17464.383654 17400.275859 17328.558005 18317.005105
36 973.909875 1013.661370 1031.209355 1182.598584 1230.625731
198 4710.777160 4896.845457 4727.681218 4515.615986 4816.015256
90 222.963236 221.937314 237.640939 253.201702 270.693578
211 3673.209130 3865.983444 3806.132347 3715.356555 3975.200616
... 2017 2018 2019 2020 \
213 ... 37047.472031 37974.553886 38066.434146 36154.308044
36 ... 11037.127069 11572.416381 11850.592876 12022.432962
198 ... 4959.764867 5118.114833 4966.950968 4466.042820
90 ... 2433.783081 2573.119402 2542.035076 2318.947665
211 ... 3118.707522 3049.511170 2908.156852 2641.187916
2021 2022 2023 Total_CO2 Total_1970 \
213 38121.014981 38246.624061 39023.937039 1.424122e+06 15751.858044
36 12621.614750 12526.826281 13259.638954 2.779245e+05 909.976242
198 4755.183784 4786.630553 4682.039414 2.761683e+05 4595.062878
90 2548.483265 2740.820631 2955.181684 5.948510e+04 213.934448
211 2833.755059 2756.906302 2512.067780 1.925624e+05 3516.043820
Total_2023
213 39023.937039
36 13259.638954
198 4682.039414
90 2955.181684
211 2512.067780
[5 rows x 60 columns]