About
The goal of this project is to analyze a dataset on the COVID-19 genome and perform data analysis to
determine what kinds of patterns these different genomes contain.
Dataset I will be using: NCBI Dataset
This dataset contains coronavirus viral genome and protein sequences in CSV format. I will be analyzing this data to find common patterns and correlations that can provide more useful information about the data.
As a result, we will be showcasing how we can turn raw data into useful information that can benefit further analysis.
First Step: Pre-Processing
First, we have to inspect the data and figure out which columns are relevant for analysis.
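As a rough illustration of this step, here is a minimal pandas sketch of column selection. The column names, the `RELEVANT_COLUMNS` list, the `select_relevant_columns` helper, and the sample rows are all hypothetical placeholders, not the actual NCBI export; swap in the real file and whichever columns matter for your analysis.

```python
import io

import pandas as pd

# Made-up sample standing in for the NCBI CSV export.
# The real column names and values will differ.
SAMPLE_CSV = """Accession,Species,Length,Geo_Location,Host,Collection_Date,Publications
SAMPLE_0001,Example coronavirus,29903,Country A,Homo sapiens,2020-01,
SAMPLE_0002,Example coronavirus,29903,Country A,Homo sapiens,2020-02,
SAMPLE_0003,Example coronavirus,29899,Country B,Homo sapiens,2020-02,
"""

# Hypothetical choice of columns to keep for the analysis.
RELEVANT_COLUMNS = ["Accession", "Length", "Geo_Location", "Host", "Collection_Date"]


def select_relevant_columns(df, columns):
    """Return a copy of df restricted to the requested columns that exist."""
    present = [c for c in columns if c in df.columns]
    missing = [c for c in columns if c not in df.columns]
    if missing:
        print(f"Skipping columns not found in the data: {missing}")
    return df[present].copy()


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Drop columns that are entirely empty, then keep only the chosen ones.
non_empty = raw.dropna(axis="columns", how="all")
cleaned = select_relevant_columns(non_empty, RELEVANT_COLUMNS)
print(cleaned.head())
```

For the real dataset, replace `io.StringIO(SAMPLE_CSV)` with the path to the downloaded CSV file.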
Result
After removing the unnecessary columns that are irrelevant to what we are looking for, let's continue to the next step.
Second Step: Analysis
Now let's use the Data2Int tool and generate a report using pandas profiling!
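The Data2Int interface itself is not shown here, so the sketch below calls pandas profiling directly instead. It assumes the package is installed under its current name `ydata-profiling` (or the older `pandas-profiling`). The sample rows and the `quick_summary` and `write_profile_report` helpers are hypothetical; `quick_summary` only approximates a small part of what the full report contains.

```python
import io

import pandas as pd

# Made-up sample rows; the real data comes from the pre-processed CSV.
SAMPLE_CSV = """Accession,Length,Geo_Location,Host
SAMPLE_0001,29903,Country A,Homo sapiens
SAMPLE_0002,29903,Country A,Homo sapiens
SAMPLE_0003,29899,Country B,
"""


def quick_summary(df):
    """Per-column dtype, missing-value count, and distinct-value count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "distinct": df.nunique(),
        }
    )


def write_profile_report(df, out_path="report.html"):
    """Write a full HTML profiling report (requires the profiling package)."""
    # Imported lazily so quick_summary works without the package installed.
    try:
        from ydata_profiling import ProfileReport
    except ImportError:
        from pandas_profiling import ProfileReport
    ProfileReport(df, title="COVID-19 genome data").to_file(out_path)


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = quick_summary(df)
print(summary)

# Uncomment to generate the HTML report once the package is installed:
# write_profile_report(df, "report.html")
```

The quick summary is handy as a sanity check before generating the full report, which can take a while on a large dataset.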
As you can see, there is a lot of interesting information that can be discovered thanks to the report generated by the Data2Int tool. Very cool!
COVID-19 Genome Project
Finally, let's use d3 to generate some cool visualizations and plot the data that we selected. Please see the source code attached below to see how this was implemented!