Meeting Minutes


Welcome! Here are the weekly meeting notes for our group.

The following is the weekly meeting record, organized by month.


data2int.com is the official website name

Introductions from everyone

A skeletal model of the project was provided to us

Went into detail about the project's purpose and goals:

  • Purpose: for researchers and master's students to analyze and make sense of data; it should give them a direction of where to go with the data.
  • Goals: a cost-effective solution; easily upload and understand data; allow users to learn while exploring.

Kaggle - a lot of data science projects there transform the data into visual form

  • Upload CSV/JSON files to the backend.
  • Manipulate the data.
  • Allow people to build intelligence: statistical analysis, visual forms (graphs, charts), and predictive/artificial intelligence.
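
The upload-and-analyze pipeline above can be sketched in Python, the project's language. This is only a minimal illustration; the function name and sample data are hypothetical, and in the real application the CSV would arrive through a Flask upload endpoint rather than a string.

```python
import csv
import io
import statistics

def summarize_csv(csv_text, column):
    """Parse CSV text and return basic statistics for one numeric column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader if row[column] != ""]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

sample = "age,score\n21,88\n34,92\n29,75\n"
print(summarize_csv(sample, "score"))
```

The same summary dictionary could then be handed to D3.js on the frontend to drive a chart.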

Talked about all the software that will be used for the project:

  • Ubuntu 21.04 - for development, because it is readily available and free
  • Python 3.8.5 - powerful and well supported
  • Flask - a simple Python web framework that is easy to learn and deploy
  • MongoDB - a database that can store unstructured data
  • D3.js - wonderful visualizations
  • DC.js - interactive visualizations
  • crossfilter.js
  • prism.js
  • Bootstrap and CSS - simple and effective website display controls

We will email the meeting agenda to the client before each meeting begins.

Went over software that will be used for the application.

Team will discuss which cloud server we want to use.

Use developer 1's page as an example; that is how we are going to develop and update our code, using prism.js.

Learn to use sudo nano

Use 365 Data Science to learn statistics.

Data analysis will be run by Flask (the application server that will manage the data analysis).

JavaScript and the website will be run by Nginx (the web server that will manage website navigation).

Pick one specification from the business analysis and predictive analysis that we want to do; at the end, we will work on creating an app that we want using the tools we create.

Run systemctl restart gunicorn to update the website by restarting the web server.

High Level Requirements:

  • Accomplish data visualization to better understand data sets and the relationships between data.
  • Accomplish data visualization to assist researchers and help them recognize patterns in their data; we do this through D3.js.
  • Use Python because it already has many data analysis and data manipulation libraries.
  • Utilize Python to manipulate visualizations.
  • Types of data: interval, ordinal, relational, nominal.
  • Choose 10 D3.js applications to begin with.
  • Show all possible visualizations of their data and see what they like.
  • Data uploading step (first task): make sure it is secure and that uploaded data is only saved for 24 hours.
  • Analyze each row of data and figure out what can be done.
  • Present each visualization possible.
  • Leave the website as is: no added options; all the usage is on the main page.
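
The "analyze the data and figure out what can be done" and "present each visualization possible" requirements could be sketched as a type-inference step that maps each column to the chart types that apply. The function names and the chart mapping below are hypothetical placeholders, not the project's actual design.

```python
def infer_column_type(values):
    """Very rough type inference: interval if every value parses as a number,
    otherwise nominal. (Ordinal/relational detection would need extra metadata.)"""
    try:
        [float(v) for v in values]
        return "interval"
    except ValueError:
        return "nominal"

# Hypothetical mapping from inferred column type to D3.js chart types we could render.
APPLICABLE_CHARTS = {
    "interval": ["histogram", "line chart", "scatter plot"],
    "nominal": ["bar chart", "pie chart"],
}

def charts_for(values):
    """Return the visualizations that apply to this column; anything else is skipped."""
    return APPLICABLE_CHARTS[infer_column_type(values)]

print(charts_for(["12", "7.5", "3"]))   # numeric column
print(charts_for(["red", "blue"]))      # categorical column
```

A chart type that is not in the column's list is simply skipped, matching the "if the technique doesn't apply, try the next one" rule below.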

Professor wants a UML diagram outlining how everything on the website works together.

The page should spit out the different visualizations that apply to the dataset

  • If a technique doesn't apply to the dataset, skip it and try the next one
  • Data visualizations should all appear together on the same page

Timeline:

  • Find out how data is uploaded and make it secure; stick to 10 MB files
  • Business/predictive analysis - relating to the statistics topic, to teach end users how to use it

Website for statistics analysis

Vega-Lite - a website used for analysis based on the data that is selected

This is similar to a project he wants, done by Singapore PhD student Grace Tang.

Example project by a grad student on map visualizations using Python

We need hash coding for uploading the data.

Use the Apache Commons Codec libraries as a reference for how to hash our data.

Paul has worked with similar things, but not with Python.

Figure out how to hash data in the backend.
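
Since Apache Commons Codec is a Java library and was mentioned only as a reference, a Python backend would more likely use the standard library's hashlib. A minimal sketch, assuming SHA-256 is an acceptable choice (the algorithm and function name are assumptions, not a team decision):

```python
import hashlib

def hash_bytes(data: bytes) -> str:
    """Return a SHA-256 hex digest for uploaded file contents.
    (Whether we hash for integrity checks or deduplication is still to be decided.)"""
    return hashlib.sha256(data).hexdigest()

digest = hash_bytes(b"example upload")
print(digest)
```

The digest could be stored alongside the upload so the backend can detect corrupted or duplicate files.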

They will go through the UML with us so that we can verify the steps we take are correct.

Do not worry about crashing the website; he has a backup of the website and can bring it back up.

He WANTS at least one of us doing genetics analysis.

Upload from URL if possible.

All fields will be tagged with an attribute that shows what type of data they contain.

For missing data, the user will get a few options to automatically generate values.
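
Those "few options" for missing data could look something like this sketch. The strategies shown (mean fill, median fill, drop) are assumptions until the professor shows us the modules he has in mind.

```python
import statistics

def fill_missing(values, strategy="mean"):
    """Offer a few automatic options for missing (None) entries:
    fill with the mean, fill with the median, or drop them entirely."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        return present
    fill = statistics.mean(present) if strategy == "mean" else statistics.median(present)
    return [fill if v is None else v for v in values]

print(fill_missing([1.0, None, 3.0]))  # mean fill -> [1.0, 2.0, 3.0]
```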

Professor will show us some modules later as to what we should do for missing data.

Professor will tell us how to manage the outliers.

Outliers can skew the data.

Make sure our data is good. Make sure the data is valuable.
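
Until the professor tells us how to manage outliers, a simple z-score rule is one placeholder approach for spotting values that could skew the data. The threshold here is arbitrary, and this is not necessarily the method he will recommend.

```python
import statistics

def flag_outliers(values, z_thresh=3.0):
    """Flag values more than z_thresh sample standard deviations from the mean.
    (A basic z-score rule; an IQR rule would be another common choice.)"""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > z_thresh]

print(flag_outliers([10, 11, 9, 10, 12, 95], z_thresh=2.0))
```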

We have a month or two to decide what we want to do for our personal data projects. Ideally predictive data analysis, which has many levels of data, like a hierarchy.

Bayesian analysis material is on the Washington.edu website. It is an AI approach that replicates how humans think, and we could use that as a reference for our projects.

He wants one of us working in finance.

Professor wants one or two of us doing multiple regression for different issues. We can use survey data.

Structural equation modeling! To him, it is the future of stats. He believes it is very valuable for us.

Look at the folium library.

Map-based libraries.

There is not really a good database on property sale transactions in Canada.

Otherwise, Canada has pretty good data for other kinds of analysis.

He will show us Stats Canada next week.

Use the scp command to copy website directories.

Use Jupyter Notebook to work on the website.

Jupyter Notebook is one of the best development environments for Python development.

Reviewed Hamza's solution for uploading files

Reviewed Mark's research on his developer page

Reviewed Dante's solution for development environment

Client mentioned R language and R Studio

Reviewed GeoMap that client created

Client asked us to import census data onto the website and display it as a map

Make sure data is secure. And then we can upload it to MongoDB.

  • Learn more about MongoDB.
  • Learn how to merge tables.
  • Learn how to convert rows into columns, columns into rows, stuff like that.
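
The row/column conversion we need to learn can be prototyped in plain Python before we touch MongoDB or pandas. The helper names below are hypothetical, just to show the reshaping idea.

```python
def rows_to_columns(rows):
    """Convert a list of row dicts into a dict of column lists
    (a stand-in for the reshaping we still need to learn in MongoDB)."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

def columns_to_rows(columns):
    """Inverse operation: a dict of column lists back to a list of row dicts."""
    keys = list(columns)
    length = len(next(iter(columns.values()), []))
    return [{k: columns[k][i] for k in keys} for i in range(length)]

rows = [{"city": "Ottawa", "pop": 1_000_000}, {"city": "Regina", "pop": 230_000}]
cols = rows_to_columns(rows)
print(cols)
print(columns_to_rows(cols) == rows)  # the two helpers round-trip
```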

R Studio is something he will mention as a potential IDE solution.

Everyone is working on the same things at the same time.

Everyone can learn what we are doing in each sprint (e.g. MongoDB, uploading files, etc.)

Mark is doing the C# equivalent on his page.

Professor wants us to try with provincial data first, then move to full Canadian data.

Professor wants us all to research the different data types to upload to our website/MongoDB.

AWS Lambda / Scala are used for data streams.

Update the tabs at the top.

Taking 2 weeks off from development to finish documentation and presentation preparation

Went over Hamza's development on implementing JSON file uploading

Discussed the approach of uploading and verifying the file to the server before uploading to MongoDB
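
The verify-before-upload approach might look like this sketch: enforce the 10 MB limit discussed earlier and confirm the JSON parses on the server before any MongoDB insert. The function and constant names are placeholders, and the actual database insert is left out.

```python
import json

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # the 10 MB limit discussed in the meetings

def verify_json_upload(raw: bytes):
    """Verify an uploaded JSON file on the server before touching MongoDB:
    enforce the size limit and confirm the payload parses.
    Returns the parsed document, or raises ValueError on a bad upload."""
    if len(raw) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds 10 MB limit")
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc

doc = verify_json_upload(b'{"name": "census", "rows": 3}')
print(doc)
```

Only a document that passes both checks would then be handed to the MongoDB insert step.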

Need to look into lazy loading

Look into uploading HTML or plain text files to database

Things to bring up:

  • The documentation is all finished
  • Show him the project plan
  • We can fix up the top bar and make sure all the links work
  • We will also be documenting all the links and resources that he has shared with us; Bochi is going to work on this.
  • Kevin will talk about the UML
  • Presentation that he is invited to on Friday the 13th: a project presentation that basically covers everything. We will be presenting it to the chair as well.

Questions:

  • Have a consistent way of uploading JSON/XML files
  • Shared repository? He is OK with us using GitHub, but make sure to keep it private.

Notes:

  • Update the UML to show the full output. It is currently incomplete; it needs to be a full design of the website.
  • Links on top: show everything that we learn
  • He wants us to think about an outline of all the individual projects that we will be working on.
  • Include personal projects in our project plan as well as the presentation.

Reviewed Geo-Map Progression

  • The geo-map solution supports the largest possible census file
  • Loading time is slow because the data is not being pulled from the database

Reviewed File Upload Progression

  • In final stages, made significant progress.

Reviewed Top Navigation Tabs

  • Tabs have been updated to show detailed summaries of each of the technologies.


Discussions:

  • We have made significant progress in the Managing Data Phase (90% completed)
  • After careful consideration, we are not going to encrypt the data in the database. (no more hashing)
  • Since classes are resuming, the pace will slow down, but not to the point of the previous semester. We will always make sure to have steady progression each week and no delays.