Welcome to Data 2 Intelligence

This website is intended as an oasis for data scientists and researchers. Over time it will provide tools that help researchers and laypersons better understand their data and find different ways to make sense of it.

To start, a team of programmers will work on the Data Patterns project. It will let researchers upload their data, get a quick description of it, and use various statistical and visualization tools to identify patterns, or the 'big picture', thus paving the way for a detailed analysis. The project overview is listed below.

Data Patterns

The purpose of this project is to develop a website that eases data science exploration, analysis and visualization for researchers who are not familiar with web programming and common statistical tools. The primary components of the JavaScript-based website involve uploading data in CSV format and automatically presenting common exploratory data descriptions. Based on the type of data uploaded, a suite of data analyses and visualizations will be made possible through the use of Python and D3.js libraries. The output can be in the form of PDF files or web pages.

Uploading CSV

The main page will allow researchers to upload their CSV files. The CSV will be stored in a database that will be purged after a day. The CSV file size will be limited to 10 MB.
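The upload rules above (10 MB cap, one-day retention) can be sketched in plain Python. This is a minimal illustration, not the project's implementation: `prepare_upload`, `is_expired`, and the document layout are hypothetical names chosen for the example. In a Flask and MongoDB deployment, the same rules could probably be delegated to Flask's `MAX_CONTENT_LENGTH` setting and a MongoDB TTL index instead.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MB limit stated in the text
RETENTION = timedelta(days=1)        # uploads are purged after a day


def prepare_upload(raw_bytes, now=None):
    """Validate uploaded CSV bytes and build a storable document.

    Hypothetical helper: the returned dict layout is an assumption.
    """
    if len(raw_bytes) > MAX_UPLOAD_BYTES:
        raise ValueError("CSV exceeds the 10 MB limit")
    now = now or datetime.now(timezone.utc)
    # utf-8-sig drops a byte-order mark if a spreadsheet tool added one
    reader = csv.DictReader(io.StringIO(raw_bytes.decode("utf-8-sig")))
    rows = list(reader)
    if not reader.fieldnames:
        raise ValueError("CSV has no header row")
    return {
        "headers": list(reader.fieldnames),
        "rows": rows,
        "uploaded_at": now,
        "expires_at": now + RETENTION,
    }


def is_expired(document, now=None):
    """Return True once the one-day retention window has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= document["expires_at"]
```

Storing an explicit `expires_at` value keeps the purge rule visible in the data itself, whichever mechanism ends up deleting the record.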

Exploratory Description Output

The output will include a basic description of the type of data in each column: categorical (nominal or ordinal) or quantitative (interval or ratio). These outputs will be stored for 24 hours and presented on the right panel of the website and/or as a PDF. The output will also flag missing data and outliers.
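A rough sketch of such a column description, using only the standard library, might look like the following. The function names, the list of tokens treated as missing, and the 1.5 x IQR outlier rule are all assumptions made for illustration. Telling nominal from ordinal, or interval from ratio, generally needs input from the researcher, so the sketch stops at categorical versus quantitative.

```python
import statistics

# Tokens treated as missing values (an assumption; adjust per dataset)
MISSING = {"", "na", "n/a", "nan", "null", "none"}


def _to_float(value):
    """Return value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def iqr_outliers(numbers):
    """Flag values beyond 1.5 * IQR from the quartiles (a common rule of thumb)."""
    if len(numbers) < 4:
        return []
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [x for x in numbers if x < low or x > high]


def describe_column(values):
    """Summarise one CSV column given as a list of raw strings."""
    present = [
        v.strip() for v in values
        if v is not None and v.strip().lower() not in MISSING
    ]
    numbers = [_to_float(v) for v in present]
    is_numeric = bool(present) and all(n is not None for n in numbers)
    return {
        "kind": "quantitative" if is_numeric else "categorical",
        "missing": len(values) - len(present),
        "outliers": iqr_outliers(numbers) if is_numeric else [],
    }
```

For example, `describe_column(["1", "2", "3", "4", "5", "100", "", "NA"])` reports a quantitative column with two missing entries and flags `100.0` as an outlier.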

Data Analysis

Based on the data in the CSV file, the data analysis step will attempt at least 5 statistical analyses for all possible combinations of columns in the CSV file. Some analytical methods will not be relevant to the type of data in a given combination; in that case the method is skipped and the loop continues to the next test.
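The loop described above could be organised roughly as follows. This sketch registers only three illustrative analyses rather than the five or more the project calls for, and every name in it (`ANALYSES`, `analyse_pairs`, the `kinds` mapping) is hypothetical. Each analysis declares the column kinds it applies to, and pairs that do not match are skipped.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def group_means(labels, numbers):
    """Mean of the numeric column within each category."""
    groups = defaultdict(list)
    for label, number in zip(labels, numbers):
        groups[label].append(number)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def crosstab(a, b):
    """Counts of each (category, category) combination."""
    return dict(Counter(zip(a, b)))


# (name, required kinds of the two columns, function) - illustrative only
ANALYSES = [
    ("pearson", ("quantitative", "quantitative"), pearson),
    ("group_means", ("categorical", "quantitative"), group_means),
    ("crosstab", ("categorical", "categorical"), crosstab),
]


def analyse_pairs(columns, kinds):
    """Run every applicable analysis on every pair of columns."""
    results = {}
    for a, b in combinations(columns, 2):
        for name, wanted, func in ANALYSES:
            # Try both column orders so mixed pairs match either way round
            for first, second in ((a, b), (b, a)):
                if (kinds[first], kinds[second]) == wanted:
                    try:
                        results[(name, first, second)] = func(
                            columns[first], columns[second]
                        )
                    except ValueError:
                        pass  # not meaningful for this data; move on
                    break
    return results
```

Keeping the analyses in a registry means a new test can be added by appending one entry, without touching the loop.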

Visualization

The main purpose of the website is to produce as many types of visualization as possible so that researchers get a big picture of the patterns in the data they uploaded. This big picture will give researchers insight into the relationships across all their data and thus allow them to drill down and choose certain aspects for deeper analysis.
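One way to decide which visualizations to generate is to map column kinds to candidate chart types and hand the resulting list to the front end as JSON. The sketch below is only an assumption about how that might work: the chart names, the `suggest_charts` helper, and the output shape are not part of the project description.

```python
import json
from itertools import combinations

# Candidate charts per column kind (illustrative choices, not a fixed list)
SINGLE = {"quantitative": "histogram", "categorical": "bar chart"}
PAIR = {
    ("quantitative", "quantitative"): "scatter plot",
    ("categorical", "quantitative"): "box plot",
    ("categorical", "categorical"): "grouped bar chart",
}


def suggest_charts(kinds):
    """Return one chart suggestion per column and per pair of columns.

    kinds maps a column name to "categorical" or "quantitative".
    """
    charts = [
        {"chart": SINGLE[kind], "columns": [name]}
        for name, kind in kinds.items()
    ]
    for a, b in combinations(kinds, 2):
        # Put the categorical column first so mixed pairs share one key
        ordered = sorted((a, b), key=lambda name: kinds[name])
        key = tuple(kinds[name] for name in ordered)
        charts.append({"chart": PAIR[key], "columns": ordered})
    return charts


def charts_as_json(kinds):
    """Serialise the suggestions so a JavaScript front end could read them."""
    return json.dumps(suggest_charts(kinds))
```

A D3.js or DC.js page could then loop over this list and draw each suggested chart.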

For this project we will be using the following:

  • Ubuntu 21.04 - for development because it is readily available and free
  • Python 3.8.5 - it is powerful and well supported
  • Flask - a simple Python web framework that is easy to learn and deploy
  • MongoDB - a database that can store unstructured data
  • D3.js - wonderful visualization
  • DC.js - interactive visualization
  • Bootstrap and CSS - simple and effective website display controls

Warm Up Project

    To start, each of you should implement, on your own localhost, the step-by-step project found here.

    Be careful with how you load the CSV file, and make sure it has its headers in the MongoDB database before proceeding further. Also take note of the comments at the bottom of that website to ensure the datetime settings match your own. To start, limit the number of rows to 10,000.
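As a hedged illustration of the loading step, the sketch below reads a CSV with its header row and caps the number of rows before inserting. Reading the 10,000 figure as a cap on rows is an assumption, and `load_csv_into` and `ListCollection` are hypothetical names. The `collection` argument only needs an `insert_many` method, which a pymongo collection provides; the stand-in class lets the code run without a database.

```python
import csv
import io
from itertools import islice

MAX_ROWS = 10_000  # assumed to be a cap on rows, per the note above


class ListCollection:
    """In-memory stand-in for a MongoDB collection (for local testing only)."""

    def __init__(self):
        self.documents = []

    def insert_many(self, documents):
        self.documents.extend(documents)


def load_csv_into(collection, csv_file, max_rows=MAX_ROWS):
    """Insert up to max_rows CSV rows, keyed by header name; return the count."""
    reader = csv.DictReader(csv_file)
    headers = reader.fieldnames
    if not headers or any(not name.strip() for name in headers):
        raise ValueError("CSV must start with a complete header row")
    # Each row becomes one document whose keys are the CSV headers
    documents = [dict(row) for row in islice(reader, max_rows)]
    if documents:  # skip the call entirely when there is nothing to insert
        collection.insert_many(documents)
    return len(documents)


if __name__ == "__main__":
    sample = io.StringIO("name,score\nalice,3\nbob,5\n")
    store = ListCollection()
    print(load_csv_into(store, sample), store.documents)
```

Because every stored document is keyed by header name, a quick look at one record in the database confirms whether the headers were loaded correctly.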

    Next week, we will review the project together. Most of the skills needed for the Data Patterns project are covered by this warm-up project.