Data Exploration With SQL 2 (Covid-19 Death Dataset)

In this project I used the Covid-19 death dataset and explored it. Some of what I found was strange, and some of it was sad.

Again I’m using SQL to explore the dataset, because I want to improve my SQL querying skills with another project.


About The Dataset

The Our World in Data website hosts a large dataset on the coronavirus pandemic, and I used SQL to explore it.

Raw data on confirmed cases and deaths for all countries is sourced from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.

Here is the link to the dataset itself:

https://ourworldindata.org/covid-deaths

This dataset is updated constantly.

Data Exploration

First I created a database and imported the data into Microsoft SQL Server. Now I want to read it.

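So I select every row where continent is not null. In this dataset the rows with a null continent appear to be aggregates (whole continents, the world) rather than individual countries, so this filter keeps only country-level rows.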

Select *
From PortfolioProject..CovidDeaths
Where continent is not null 

Next I select the columns I want to look at, ordered by location and date.

Select Location, date, total_cases, new_cases, total_deaths, population
From PortfolioProject..CovidDeaths
Where continent is not null 
order by 1,2

Now I want to check out the total number of cases and deaths caused by Covid-19 in the United States.

Select Location, date, total_cases,total_deaths
From PortfolioProject..CovidDeaths
Where location like '%states%'
and continent is not null 
order by 1,2
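The pattern '%states%' matches any location whose name contains "states", so it may pick up more than one location. A quick check, assuming the same table and columns as in the query above, is to list what the filter actually matches:

```sql
-- List every distinct location matched by the LIKE pattern.
-- If more than one row comes back, the filter is broader than intended.
SELECT DISTINCT location
FROM PortfolioProject..CovidDeaths
WHERE location LIKE '%states%'
  AND continent IS NOT NULL
ORDER BY location;
```

If this returns more than one name, an exact match on the location (for example location = 'United States', assuming that is how the country is stored in your copy of the data) would be a stricter filter.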

After that, I wanted to estimate the probability of dying for someone who gets infected with Covid-19 in the United States. I calculated it as total deaths divided by total cases, times 100.

Select Location, date, total_cases,total_deaths, (total_deaths/total_cases)*100 as DeathPercentage
From PortfolioProject..CovidDeaths
Where location like '%states%'
and continent is not null 
order by 1,2

Result:

Around 1.5%

Since the dataset updates, you might get a different number.
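The percentage query divides one column by another, which can go wrong in SQL Server depending on how the columns were imported: a row with 0 cases raises a divide-by-zero error, and two integer columns would give an integer result. Here is a minimal sketch of a more defensive version, assuming the same table and column names as above (the column types in your import may differ):

```sql
-- Defensive version of the death-percentage query (a sketch, not the only way).
-- TRY_CAST returns NULL instead of raising an error when a value cannot be converted.
-- NULLIF turns 0 cases into NULL, so the division yields NULL rather than an error.
SELECT
    location,
    date,
    total_cases,
    total_deaths,
    TRY_CAST(total_deaths AS float)
        / NULLIF(TRY_CAST(total_cases AS float), 0) * 100 AS DeathPercentage
FROM PortfolioProject..CovidDeaths
WHERE location LIKE '%states%'
  AND continent IS NOT NULL
ORDER BY location, date;
```

Rows with no cases yet show NULL in DeathPercentage instead of stopping the query.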

Next, I was curious about the infection rate of each country.

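So I used the aggregate function MAX to get the highest case count for each country, called it H_InfectionCount, grouped the data by Location and Population, and ordered the result by the percentage of the population infected, descending.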

Select Location, Population, MAX(total_cases) as H_InfectionCount,  Max((total_cases/population))*100 as PercentPopulationInfected
From PortfolioProject..CovidDeaths
Group by Location, Population
order by PercentPopulationInfected desc

Result:

Country: Montenegro, Infection rate: around 17%

Then I wanted to check things based on continents.

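So I tried to find out which continent had the highest death count. Note that because each row belongs to a single location, MAX returns the largest single-location total within each continent, not the sum for the whole continent.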

Select continent, MAX(cast(Total_deaths as int)) as TotalDeathCount
From PortfolioProject..CovidDeaths
Where continent is not null 
Group by continent
order by TotalDeathCount desc

Result:

Continent: North America, Death Count: more than 570,000
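To get a total for the whole continent rather than the largest single-country figure, one option is to take each country's highest total first and then add those up per continent. This is only a sketch: it assumes total_deaths is a cumulative count, so a country's MAX is its latest total, and the names CountryDeaths and CountryDeathCount are made up for the example.

```sql
-- Step 1: highest cumulative death count per country (assumes total_deaths is cumulative).
-- Step 2: add the per-country figures up for each continent.
WITH CountryDeaths AS (
    SELECT
        continent,
        location,
        MAX(CAST(total_deaths AS int)) AS CountryDeathCount
    FROM PortfolioProject..CovidDeaths
    WHERE continent IS NOT NULL
    GROUP BY continent, location
)
SELECT
    continent,
    SUM(CountryDeathCount) AS TotalDeathCount
FROM CountryDeaths
GROUP BY continent
ORDER BY TotalDeathCount DESC;
```

The numbers from this version will be larger than the ones from the query above, because they cover every country in the continent.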

After the continents, I wanted to check out the global numbers.

So here is how I checked the Total Cases, Total Deaths, and the Death Percentage.

Select SUM(new_cases) as total_cases, SUM(cast(new_deaths as int)) as total_deaths, SUM(cast(new_deaths as int))/SUM(New_Cases)*100 as DeathPercentage
From PortfolioProject..CovidDeaths
where continent is not null 
order by 1,2

Result:

Total Cases: around 160 million, Total Deaths: around 4 million, Death Percentage: around 2%

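Finally, I wanted to check the percentage of the population that has received at least one Covid vaccine. The query below joins the deaths and vaccinations tables and builds a running total of new vaccinations for each location. The percentage line is commented out because a column alias cannot be referenced in the same SELECT list where it is defined.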


Select dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
, SUM(CONVERT(int,vac.new_vaccinations)) OVER (Partition by dea.Location Order by dea.location, dea.Date) as RollingPeopleVaccinated
--, (RollingPeopleVaccinated/population)*100
From PortfolioProject..CovidDeaths dea
Join PortfolioProject..CovidVaccinations vac
	On dea.location = vac.location
	and dea.date = vac.date
where dea.continent is not null 
order by 2,3
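One way to finish the commented-out calculation is to wrap the query in a common table expression (CTE), so the running total can be referenced by name in an outer SELECT. The sketch below assumes the same tables and columns as above. The names PopVsVac and PercentVaccinated are made up for the example, and bigint is used instead of int only as a precaution against overflow in large running totals.

```sql
-- The CTE computes the running total; the outer query turns it into a percentage.
WITH PopVsVac AS (
    SELECT
        dea.continent,
        dea.location,
        dea.date,
        dea.population,
        vac.new_vaccinations,
        SUM(CONVERT(bigint, vac.new_vaccinations))
            OVER (PARTITION BY dea.location ORDER BY dea.date) AS RollingPeopleVaccinated
    FROM PortfolioProject..CovidDeaths dea
    JOIN PortfolioProject..CovidVaccinations vac
        ON dea.location = vac.location
       AND dea.date = vac.date
    WHERE dea.continent IS NOT NULL
)
SELECT
    *,
    -- Multiplying by 100.0 avoids integer division; NULLIF guards against a 0 population.
    RollingPeopleVaccinated * 100.0 / NULLIF(population, 0) AS PercentVaccinated
FROM PopVsVac
ORDER BY location, date;
```

As far as I can tell, new_vaccinations counts doses rather than people, so this percentage may overstate the share of people with at least one dose and can go above 100.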
