COVID-19 Genome Project

About

The goal of this project is to analyze a dataset of COVID-19 genomes and determine what kinds of patterns these different genomes contain.
Dataset I will be using: NCBI Dataset

This dataset contains the coronavirus viral genome and protein sequences in CSV format. I will be analyzing this data to find common patterns and correlations that can provide more useful information about the genomes.

As a result, we will be showcasing how raw data can be turned into useful information that benefits further analysis.


First Step: Pre-Processing

First, we have to examine the data and figure out which columns are relevant for analysis.

unprocessed data

Result

After removing the columns that are irrelevant to what we are looking for, let's continue to the next step.

unprocessed data
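
The column-removal step above can be sketched roughly as follows. This is only an illustration using Python's standard csv module: the column names (Accession, Geo_Location, Length, Authors, Publications) and the sample rows are made up for the example and may not match the real NCBI export or the columns actually kept in this project.

```python
import csv
import io

# Hypothetical column names; the real NCBI export may use different ones.
RELEVANT_COLUMNS = ["Accession", "Geo_Location", "Length"]


def select_columns(rows, keep):
    """Return new rows that contain only the columns listed in `keep`."""
    return [{name: row[name] for name in keep} for row in rows]


# Tiny made-up sample standing in for the downloaded CSV file.
SAMPLE_CSV = """Accession,Geo_Location,Length,Authors,Publications
ACC0001,USA,29903,Placeholder A,0
ACC0002,China,29891,Placeholder B,1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
trimmed = select_columns(rows, RELEVANT_COLUMNS)
print(trimmed[0])
```

With a real file, the io.StringIO object would be replaced by an open file handle for the downloaded CSV.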

Second Step: Analysis

Now let's use the Data2Int tool to generate a report using pandas profiling!


As you can see, there is a lot of interesting information that can be discovered thanks to the report that the Data2Int tool generates. Very cool!
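
To give a feel for what such a report summarizes, here is a minimal sketch of per-column statistics (row count, missing values, distinct values, most frequent value). It is not how Data2Int or pandas profiling work internally; the function name summarize_column and the sample values are assumptions made for illustration.

```python
from collections import Counter


def summarize_column(values):
    """Summarize one column: row count, missing count, distinct count, top value."""
    # Treat empty strings and None as missing entries.
    present = [v for v in values if v not in ("", None)]
    counts = Counter(present)
    top = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "top": top[0],
        "top_freq": top[1],
    }


# Made-up values for a hypothetical Geo_Location column.
geo = ["USA", "China", "USA", "", "Italy", "USA"]
summary = summarize_column(geo)
print(summary)
```

A full profiling report adds much more on top of this (correlations, histograms, warnings), but the basic per-column counts are the starting point.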


Finally, let's use d3 to generate some cool visualizations and plot the data that we selected. Please see the source code attached below to see how this was implemented!
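
Before plotting, the selected data usually has to be put into a shape that d3 can load, for example a JSON array of records. The sketch below shows one possible way to do that in Python; it is not the project's attached source code, and the function name counts_for_d3, the Geo_Location column, and the sample rows are hypothetical.

```python
import json
from collections import Counter


def counts_for_d3(rows, column):
    """Count rows per value of `column`, as a list of {label, value} records."""
    counts = Counter(row[column] for row in rows)
    # most_common() orders the records from most to least frequent.
    return [{"label": label, "value": n} for label, n in counts.most_common()]


# Made-up rows standing in for the pre-processed dataset.
rows = [
    {"Accession": "ACC0001", "Geo_Location": "USA"},
    {"Accession": "ACC0002", "Geo_Location": "China"},
    {"Accession": "ACC0003", "Geo_Location": "USA"},
]

records = counts_for_d3(rows, "Geo_Location")
payload = json.dumps(records, indent=2)
print(payload)
```

Written to a file, this JSON could then be fetched on the page with d3.json and bound to the bars of a chart.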