In this project I explored Migration Flow Dataset using SQL and I loved because SQL is so easy to work with.
I will share with you the code I used in this project and the result I got using those code.
About The SQL Data Exploration Project
Since I love SQL I wanted to have some fun so I used MSSQL or Microsoft SQL server to dig in and check out this dataset.
The dataset is about 30 years of migration between different countries.
Here is the link to dataset:
https://ourworldindata.org/explorers/migration-flows
In this project I focused on immigrants who migrated to Canada and explored different things about them.
Before I start it’s necessary to know these things about the dataset.
Immigrants to Canada means:
The total number of people born in another country that now live in Canada. Negative numbers indicate that people migrated away from that country,
into Canada. This is a measure of migrant stocks – it is not the annual flow of migrants. The value for Canada is the total immigrant stock.
Emigrants from Canada means:
The total number of people born in Canada that now live in another country. This is a measure of emigrant stocks – it is not the annual flow of
emigrants. The value for Canada is the total emigrant stock living in another country.
Let’s Start The SQL Data Exploration Project
So in this project table name is Migration and there are two columns for each country, Immigrants to that country and emigrants from that country.
So As I mentioned I want to check out Immigrants to Canada in this project.
Data Cleaning with SQL
Before importing the data if possible, check out the name of columns and make sure they are nit separated by space.
Alright after importing the data to MSSQL, let’s see the table.
select *
from [portfolio project 1]..[migration]
Next, let’s find out if there are any null values in the Immigrants_to_Canada Column.
Select Immigrants_to_Canada
from [portfolio project 1]..[migration]
Where Immigrants_to_Canada is null;
Since there are some null values and they don’t affect the calculation but they will cause problems when running the aggregation functions, I decided to put 0 instead of nulls. So this code will change null values to zero.
update [portfolio project 1]..'migration]
set Immigrants_to_Canada = 0
where Immigrants_to_Canada is null
Net thing which is very important in data cleaning is to check out the data type of a column, so this is how I checked out the data type
SELECT
DATA_TYPE,
CHARACTER_MAXIMUM_LENGTH AS MAX_LENGTH,
CHARACTER_OCTET_LENGTH AS OCTET_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'portfolio project 1'
AND COLUMN_NAME = 'Immigrants_to_Canada';
I checked out the data set and found out that the data type is not numeric so I decided to check out the column and see which part of it is not numeric. I did it using below code.
SELECT Immigrants_to_Canada
FROM [portfolio project 1]..[migration]
WHERE ISNUMERIC(Immigrants_to_Canada) <> 1;
Next I wanted to convert the column data type to numeric so I did this.
select try_convert(numeric(38, 12),Immigrants_to_Canada)
from [portfolio project 1]..[migration]
Data Exploration with SQL
First Let’s see the number of all immigrants who migrated to Canada in this 30 years.
select SUM (Immigrants_to_Canada)
from [portfolio project 1]..[migration];
Result: 45,269,302
Wow so more than 45 million people migrated to Canada From 1990 to 2020.
Now Let’s check out the highest number of immigrants entered Canada in a year.
Since the number of people migrated from other countries are shown with a negative number and Canadians with a positive number, if I just simply sum the numbers, it will be deduct from each other, so I need to first remove the number of people from Canada who traveled back to Canada.
update [portfolio project 1]..[migration]
set Immigrants_to_Canada = 0
where Country = 'Canada';
After that I grouped the data by year and calculated the sum of Immigrants_to_Canada column
select Year, sum (Immigrants_to_Canada)
from [portfolio project 1]..[migration]
Group By Year;
Result:
year: 2020
8,460,763 people
The highest number of migrated to Canada in a year happened in 2020 which more that 8 million people entered this country.
Next I wanted to know the top 5 countries that immigrants to Canada are coming from, so this is how I got it.
select Country, SUM (Immigrants_to_Canada) As Nations_To_Canada
from [portfolio project 1]..[migration]
WHERE Immigrants_to_Canada is not null
Group By Country
order by Nations_To_Canada;
Result:
United Kingdom: 4,068,444
China: 3,332,405
India: 3,053,106
Philippines: 2,555,117
Caribbean: 2,283,159