Code Changes - Edited by KT

Progress Update

How to start an SSH session using a shell script

Note: this method is outdated, but we are keeping it here to document our progress.

For your convenience, you can download this .sh file here
Or you can copy and paste it into your .sh file

        
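            #!/bin/bash
            # Replace the {@...} placeholders and the folder path with your own values.
            USERNAME="{@username}"
            IP="{@host}"
            PORT="22"
            # -p sets the port; -t forces a terminal so the login shell stays interactive.
            ssh -p "${PORT}" -t "${USERNAME}@${IP}" "cd /your/working/folder/here ; bash --login"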

Why did we choose to convert all files to JSON?

I decided to convert every uploaded file to JSON and then load the result into MongoDB. Some changes have been made to the use case diagram, so here is a quick review:

Here is an example of how to convert an XML file to a JSON file:

First, I installed the xmltodict library

            pip3 install xmltodict

Second, I created a new Python file called xmlToJson.py
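
The contents of xmlToJson.py are not reproduced on this page, so here is only a minimal sketch of what such a script might look like. It assumes xmltodict's parse function together with the standard json module. The function name xml_to_json and the sample XML string are made up for illustration and are not taken from the project.

```python
import json

import xmltodict


def xml_to_json(xml_text: str) -> str:
    """Parse an XML string and return the equivalent JSON string."""
    # xmltodict.parse maps elements to nested dict keys;
    # XML attributes end up as keys prefixed with "@".
    data = xmltodict.parse(xml_text)
    return json.dumps(data, indent=4)


# Hypothetical sample document, kept in memory so the sketch runs on its own.
sample_xml = '<book id="1"><title>Example</title></book>'
print(xml_to_json(sample_xml))
```

To convert a real file, you could read its text with open(...).read(), pass it to xml_to_json, and write the returned string to a .json file.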

Third, let's look at our brand new converted file

Before:	After

Side notes:

Check out something I discovered while editing this HTML file.
In PyCharm, you can drag and drop an image from your project explorer onto the code editor itself.

Convert CSV to JSON using Pandas

First, install the pandas library

            pip3 install pandas

Second, I created a new Python file called csvToJson.py
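
The original csvToJson.py is not shown here, so the following is just a rough sketch of one way to do the conversion with pandas. It relies on pd.read_csv and DataFrame.to_json. The function name csv_to_json, the optional json_path parameter and the sample rows are all assumptions made for this example.

```python
import io

import pandas as pd


def csv_to_json(csv_source, json_path=None):
    """Read a CSV file (path or file-like object) and return it as a JSON string.

    If json_path is given, the JSON text is also written to that file.
    """
    frame = pd.read_csv(csv_source)
    # orient="records" produces one JSON object per CSV row.
    json_text = frame.to_json(orient="records")
    if json_path is not None:
        with open(json_path, "w", encoding="utf-8") as json_file:
            json_file.write(json_text)
    return json_text


# Hypothetical two-row CSV, held in memory so the sketch runs on its own.
sample_csv = io.StringIO("name,score\nA,1\nB,2\n")
print(csv_to_json(sample_csv))
```

I picked orient="records" because a list of row objects is usually the shape that is easiest to insert into a MongoDB collection later on.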

Third, let's look at our brand new converted file

Before:	After

Convert XLSX (Excel) to JSON using Pandas

First, install the pandas library

            pip3 install pandas

Second, I created a new Python file called xlsxToJson.py
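
As with the other converters, the real xlsxToJson.py is not included on this page. Below is a small sketch of how it could work, assuming pd.read_excel and DataFrame.to_json. Note that pandas needs an Excel engine such as openpyxl installed to read .xlsx files, which the install step above does not cover. The function names and the choice of the first worksheet are my assumptions.

```python
import pandas as pd


def frame_to_json(frame: pd.DataFrame) -> str:
    """Serialise a DataFrame as a JSON array with one object per row."""
    # date_format="iso" keeps Excel date cells readable instead of epoch numbers.
    return frame.to_json(orient="records", date_format="iso")


def xlsx_to_json(xlsx_path: str, json_path: str) -> None:
    """Read the first worksheet of an .xlsx file and write it out as JSON."""
    # sheet_name=0 selects the first worksheet (an assumption for this sketch).
    frame = pd.read_excel(xlsx_path, sheet_name=0)
    with open(json_path, "w", encoding="utf-8") as json_file:
        json_file.write(frame_to_json(frame))
```

A call such as xlsx_to_json("input.xlsx", "output.json") would then do the conversion; both file names are placeholders.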

Third, let's look at our brand new converted file

Before:	After

There is an extension library for Pandas that takes a Pandas DataFrame and generates a profile report quickly and easily.
I highly recommend checking out their GitHub repository, where the documentation is available under Pandas Profiling Reports.
I have created a script that generates a profile report from a .csv file.

This tab shows you how to fetch data from MongoDB

For this website to function properly, you need to upload clean data.
Here is an example of cleaning data from a raw dataset.
You can find this dataset from the Texas Department of Criminal Justice here
Information about the attributes in this dataset can be found in the table below:

Attribute's name	Attribute's type	Description
SID Number	Numeric (Integer)	Security Identifier reference number
TDCJ Number	Numeric (Integer)	Texas Department of Criminal Justice reference number
Name	String	Full name of the inmate
Current Facility	String	The current facility the inmate is staying in
Gender	Nominal(F: Female, M: Male)	Gender of the inmate
Race	Nominal(A: Asian, B: Black, H: Hispanic, I: Indigenous, U: Unspecified, W: White, O: Others)	Category of humankind that shares certain distinctive physical traits
Age	Numeric	The current age of the inmate
Projected Release	DateTime	Expected discharge in DateTime format
Maximum Sentence Date	DateTime	The maximum penalty in DateTime format
Parole Eligibility Date	DateTime	The start date of the parole if eligible
Case number	String	Unique identifier reference number
County	String	A specific region of a state
Offence code	Numeric	Offence code
TDCJ Offense	String	Texas Department of Criminal Justice Offense charges
Sentence Date	DateTime	Trial date
Offence Date	DateTime	DateTime of the offence occurring
Sentence (Years)	String	The penalty of the inmate in years
Last Parole Decision	String	Parole decision
Next Parole Review Date	DateTime	DateTime of parole review
Parole Review Status	Nominal(IN PAROLE REVIEW PROCESS, NOT IN REVIEW PROCESS, )	Parole status

Removed attributes and reasons for removal
- Reference numbers are irrelevant to this study because they add no meaningful value to our prediction. This dataset has three attributes that hold reference numbers: SID Number, TDCJ Number and Case number. These attributes are removed from the dataset for analysis purposes.
- The Name attribute is irrelevant because the full name does not have any effect on determining the amount of time an individual is sentenced.
- The Age attribute is relevant. However, it is replaced by a newly added column that groups ages into nominal categories (Age group); more details on this column can be found in part c of this section.
- The Current Facility attribute is relevant. However, it is transformed into Prison type; more details on this column can be found in part c of this section.
- The County attribute is relevant. However, it is replaced by Region; more details on this column can be found in part c of this section.
Relevant attributes, newly added attributes and reasons for inclusion
- The Age group attribute description is as follows:
> If age is not given, then NK (no instances)
> If age is less than 12, then Child
> If age is less than 18, then Teen
> If age is less than 50, then Young Adult
> If age is less than or equal to 65, then Senior Adult
> And if age is greater than 65, then Elderly
- The Prison type attribute is tricky to classify because there are a lot of inmate facilities. After some research, we found this dataset from the Texas Department of Criminal Justice that provides Facility name, Prison Gender and Prison Type. Then, we use the VLOOKUP function to find the Prison Gender and Type based on the Current Facility attribute, then use the CONCAT function to concatenate the string in both columns. We also use the ISNA function to identify not listed facilities as “Others”. Please refer to the screenshots below for a better understanding.
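
We did this lookup in Excel, but the same idea can be sketched in pandas, which may be handy if the cleaning ever needs to be scripted. In this sketch a left merge stands in for VLOOKUP, string concatenation for CONCAT, and fillna for the ISNA check. The column names come from the description above; the function name add_prison_type and the space used as a separator are assumptions.

```python
import pandas as pd


def add_prison_type(inmates: pd.DataFrame, facilities: pd.DataFrame) -> pd.DataFrame:
    """Attach a combined 'Prison type' column looked up from the facility table."""
    # A left merge plays the role of VLOOKUP: every inmate row is kept.
    merged = inmates.merge(
        facilities,
        how="left",
        left_on="Current Facility",
        right_on="Facility name",
    )
    # Joining the two looked-up columns plays the role of CONCAT
    # (the space separator is an assumption).
    combined = merged["Prison Gender"] + " " + merged["Prison Type"]
    # Facilities missing from the lookup table come back as NaN,
    # which we label "Others" (the ISNA step).
    merged["Prison type"] = combined.fillna("Others")
    return merged.drop(columns=["Facility name", "Prison Gender", "Prison Type"])
```

The function returns a new DataFrame and leaves both inputs untouched, so the raw data can still be compared against the cleaned result.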

- The Region attribute is very similar to the Prison type attribute. First, we transform the Region attribute in the TDCJ dataset to a numeric type (I for 1, II for 2, III for 3, IV for 4, V for 5, VI for 6, Private for 7 and NA for 8). Then, we combine the VLOOKUP, IF and ISNA functions to get the Region attribute for this dataset. Please refer to the screenshots below for a better understanding.
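
The numeric encoding of the regions can also be written as a small Python lookup. This is only an illustration of the mapping listed above, not the Excel formula we used. Treating any label that is not listed as 8 (the same code as NA) is my assumption about what the ISNA fallback should do.

```python
# Region label to numeric code, as listed in the text above.
REGION_CODES = {
    "I": 1,
    "II": 2,
    "III": 3,
    "IV": 4,
    "V": 5,
    "VI": 6,
    "Private": 7,
    "NA": 8,
}


def region_code(region_label) -> int:
    """Return the numeric code for a region label."""
    # Assumption: labels that are not listed fall back to 8, like "NA".
    return REGION_CODES.get(str(region_label).strip(), 8)
```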

- The Release_Early attribute is relevant because it is the boolean value of the difference between Projected Release and Maximum Sentence Date. Calculating the difference between the two dates was an issue at first because the input was stored as text instead of in a date format. To fix this, select the column → Text to Columns → Next → Next → choose the date format “MDY” → Finish. Please refer to the screenshot below for a better understanding.
- Similarly, the Able_to_parole_early is relevant because it is the boolean value of the difference between the Parole Eligibility Date and Maximum Sentence Date.
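
For comparison, here is how the two boolean columns might be computed in pandas instead of Excel. The sketch assumes the dates are text in month/day/four-digit-year form (matching the “MDY” setting) and that "early" means the date falls strictly before the Maximum Sentence Date. Both points, as well as the function name add_early_flags, are assumptions.

```python
import pandas as pd

DATE_COLUMNS = ["Projected Release", "Maximum Sentence Date", "Parole Eligibility Date"]


def add_early_flags(frame: pd.DataFrame) -> pd.DataFrame:
    """Parse the date columns and add Release_Early / Able_to_parole_early."""
    result = frame.copy()
    for column in DATE_COLUMNS:
        # errors="coerce" turns text that is not a valid date into NaT.
        result[column] = pd.to_datetime(result[column], format="%m/%d/%Y", errors="coerce")
    # Comparisons against NaT evaluate to False, so missing dates are never "early".
    result["Release_Early"] = result["Projected Release"] < result["Maximum Sentence Date"]
    result["Able_to_parole_early"] = (
        result["Parole Eligibility Date"] < result["Maximum Sentence Date"]
    )
    return result
```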

- The Age_when_committing_crime_in_group is relevant because we want to find out at what age they committed their crime. Although their age is important, it is also important to find out when they did such things. To calculate this attribute, we can use the DATEDIF function (by years) to find the difference between Offense Date and TODAY(). Then, we use their current Age to subtract that result to get the age when they committed their crime. Then, we apply the same category (from Age) to the current value.
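
The Age group rules and the age-at-offence calculation can be sketched in plain Python as follows. The thresholds are the ones listed for Age group above, and full_years_between mimics what DATEDIF does when counting whole years. The function names and the optional today parameter are made up for this example.

```python
from datetime import date


def age_group(age):
    """Map an age in years to the Age group categories described above."""
    if age is None:
        return "NK"
    if age < 12:
        return "Child"
    if age < 18:
        return "Teen"
    if age < 50:
        return "Young Adult"
    if age <= 65:
        return "Senior Adult"
    return "Elderly"


def full_years_between(start: date, end: date) -> int:
    """Count whole years from start to end, like DATEDIF with the "Y" unit."""
    years = end.year - start.year
    # Subtract one if the anniversary has not been reached yet in the end year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def age_group_at_offence(current_age: int, offence_date: date, today: date = None) -> str:
    """Estimate the age at the time of the offence and return its group."""
    today = today or date.today()
    age_then = current_age - full_years_between(offence_date, today)
    return age_group(age_then)
```

Passing today explicitly makes the result reproducible, which is useful because the spreadsheet version changes every time TODAY() is recalculated.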

- For the sentence, we classified anything less than 5 years as low, less than 10 as medium and more than 10 as high; everything else is life or capital life, depending on the description.
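
Here is a rough Python sketch of that classification. The text does not say where a sentence of exactly 10 years goes, so this sketch puts it in High. It also assumes that every non-numeric value in Sentence (Years) describes either a life or a capital life sentence. Both choices, and the function name sentence_level, are assumptions.

```python
def sentence_level(sentence) -> str:
    """Classify a Sentence (Years) value as Low, Medium, High, Life or Capital Life."""
    text = str(sentence).strip()
    try:
        years = float(text)
    except ValueError:
        # Non-numeric values: decide between the two life categories
        # from the description (assumption: no other text values occur).
        return "Capital Life" if "capital" in text.lower() else "Life"
    if years < 5:
        return "Low"
    if years < 10:
        return "Medium"
    # Assumption: exactly 10 years is treated as High.
    return "High"
```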

- For Parole Decision, we trimmed the values in the Last Parole Decision column down to Approved, None, Denied, or Blanks.

- For the offence code, we categorize records by the first 2 digits of the code. For example, codes that start with '9' are classified as Murder, '10' as Kidnapping, and so on.
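
The prefix lookup can be sketched like this. Because the code is stored as a number, a leading zero is lost, which is why Murder shows up as '9' rather than '09'. The sketch pads the code back to a fixed width before taking the first two digits. The width of 8 digits is an assumption, the mapping contains only the two categories named above, and "Other" is a placeholder for everything not listed here.

```python
# Only the two prefixes mentioned in the text; the full mapping is not shown here.
OFFENCE_CATEGORIES = {
    9: "Murder",
    10: "Kidnapping",
}


def offence_category(offence_code, width: int = 8) -> str:
    """Return the category for an offence code based on its first two digits."""
    # Restore the leading zero dropped by numeric storage (assumed width: 8 digits).
    digits = str(int(offence_code)).zfill(width)
    prefix = int(digits[:2])
    # "Other" is a placeholder for prefixes not listed in this sketch.
    return OFFENCE_CATEGORIES.get(prefix, "Other")
```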

- Let's take a look at the BEFORE and AFTER:
BEFORE

AFTER

Kevin's Personal Project

This project uses the dataset from Kaggle.

This dataset is about NFT (non-fungible token) sale history.
Did you know the most expensive NFT was sold for ~$532 million? (last updated Dec 1st, 2021). Check out this article here
In this small project, my goal is to understand the data, perform data cleansing and provide visualizations to the public!

Learn more