Accessing Data


It's great that you've found your data, but now you need to be able to use and analyze it. That starts with getting the data into the application, which involves a series of steps to send it from the client to the server:

  1. Create the webpage
  2. Import packages
  3. Set an upload endpoint
  4. Grab the filename
  5. Check the file extension
  6. Check for a virus
  7. Move the data...

Create the webpage

Build the webpage your user will interact with. You'll need a form whose encoding type is multipart/form-data. The form's submit button triggers the event that uploads the file to the server.
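
A minimal sketch of such a form, built as a Python string so a view function could return it. The /upload action URL, the field name file, and the helper name render_upload_form are placeholder assumptions, not names from the original project.

```python
from html import escape


def render_upload_form(action="/upload", field_name="file"):
    """Return an HTML upload form that posts multipart/form-data.

    Both default values are hypothetical placeholders.
    """
    return (
        f'<form action="{escape(action)}" method="post" '
        'enctype="multipart/form-data">\n'
        f'  <input type="file" name="{escape(field_name)}">\n'
        '  <button type="submit">Upload</button>\n'
        '</form>'
    )


if __name__ == "__main__":
    print(render_upload_form())
```

Without the enctype attribute the browser sends only the file name, not the file contents, which is why the form needs it.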

Import packages

To work with the server's file system and files, you'll need to import Python's os module (import os). You'll also need the request object from the Flask library (from flask import request) to access the uploaded file in your scripts.

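The os module is part of Python's standard library, so it needs no installation. Flask is a third-party package; if it isn't installed yet, install it for your version of Python from the command line using the following command:

    pip install flask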

Set an upload endpoint

You'll need to establish a directory on the server to which users will upload files. It's a good idea to save this as a constant, since in theory it should never change except between dev environments. Save the path relative to the project root; that way, migrating the app from one server to another won't break the path.
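
One way this could look, as a sketch: the folder name uploads and the helper resolve_upload_dir are hypothetical, and the project root is passed in rather than hard-coded so the same constant works on any server.

```python
from pathlib import Path

# Hypothetical folder name, stored relative to the project root.
UPLOAD_FOLDER = Path("uploads")


def resolve_upload_dir(project_root):
    """Join the relative upload folder onto a project root and create it."""
    upload_dir = Path(project_root) / UPLOAD_FOLDER
    upload_dir.mkdir(parents=True, exist_ok=True)
    return upload_dir


if __name__ == "__main__":
    # Example: treat the current working directory as the project root.
    print(resolve_upload_dir(Path.cwd()))
```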

Grab the file name and file extension

You'll need to create a function that receives the POST request that uploads the file to the server. As long as you have the path to the file, you can access the file name as a property and split the text to get the name and the extension.
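
A minimal sketch of the splitting step using only the standard library. The helper names are hypothetical; the set of allowed extensions mirrors the file types this site supports. In a Flask view, the name of an uploaded file is typically read from request.files[...].filename and then passed to a helper like this.

```python
import os

# File types the Data2Int site supports.
ALLOWED_EXTENSIONS = {"csv", "json", "xlsx", "xml"}


def split_filename(path):
    """Return (name, extension); the extension is lowercased with no dot."""
    filename = os.path.basename(path)
    name, extension = os.path.splitext(filename)
    return name, extension.lstrip(".").lower()


def is_allowed(path):
    """Check whether the file's extension is one of the supported types."""
    return split_filename(path)[1] in ALLOWED_EXTENSIONS
```

For example, split_filename("uploads/sales.report.CSV") returns ("sales.report", "csv"): only the last dot separates the extension, and a file with no dot yields an empty extension.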

Check for a virus

After the file is uploaded and saved to the server, it is scanned for viruses. While there are several ways to approach this, we used a virus scanning API. The result that comes back says whether or not the file has a virus. We stored the result of the scan as a bool and, if a virus was found, told the server to delete the file.
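
The scanning API itself is not named here, so the sketch below takes the scanner as a parameter. Both scan_and_clean and the scanner callable are hypothetical stand-ins: the callable is assumed to accept a file path and return True when a virus is found.

```python
import os


def scan_and_clean(path, scanner):
    """Scan a saved upload and delete it from the server if it is infected.

    `scanner` is a hypothetical callable wrapping the virus scanning API.
    Returns True when the file is clean and was kept, False otherwise.
    """
    has_virus = bool(scanner(path))
    if has_virus:
        os.remove(path)
    return not has_virus
```

Passing the scanner in keeps the delete logic testable without calling the real service.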

If the check returns false, bring the user to the error page.

Converting the file

To prepare the file for cleanup, we took a strategic approach. We double-check the file extension specifically to prepare the data for cleanup before it's entered into the database. Our original plan was to convert every supported file type to JSON and clean up the data with a single method rather than a different one for each file type; however, this proved harder than expected, so we stuck with the traditional per-format methods. The supported file types for the Data2Int site are CSV, JSON, XLSX, and XML.

JSON
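
A minimal sketch of reading a JSON upload with the standard library json module. The function name load_json_records is hypothetical, and the sketch assumes the file holds either one object or an array of objects.

```python
import json


def load_json_records(path):
    """Read a JSON file and return its contents as a list of records."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    # A single top-level object becomes a one-element list.
    if isinstance(data, dict):
        return [data]
    if isinstance(data, list):
        return data
    raise ValueError("expected a JSON object or array at the top level")
```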

CSV
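
A minimal sketch using the standard library csv module, assuming the first row of the file is a header. The function name load_csv_records is hypothetical. Every value comes back as a string, so numeric columns still need converting during cleanup.

```python
import csv


def load_csv_records(path):
    """Read a CSV file with a header row into a list of dicts, one per row."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```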

XML
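
A minimal sketch using the standard library xml.etree.ElementTree module. It assumes a flat layout in which each child of the root element is one record and each grandchild is one field; the function name load_xml_records is hypothetical. The standard library documentation warns that its XML parsers are not hardened against malicious input, so untrusted uploads may call for a hardened parser.

```python
import xml.etree.ElementTree as ET


def load_xml_records(path):
    """Treat each child of the root element as one record.

    Assumes a flat layout such as
    <rows><row><name>a</name><qty>1</qty></row></rows>;
    nested or attribute-based XML would need different handling.
    """
    root = ET.parse(path).getroot()
    return [{field.tag: field.text for field in record} for record in root]
```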

XLSX
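
In practice a spreadsheet library such as openpyxl or pandas is the usual way to read XLSX. The dependency-free sketch below only illustrates what the file contains: an XLSX file is a ZIP archive of XML parts. It makes several assumptions: the first sheet is stored at xl/worksheets/sheet1.xml, the first row is a header, and the table has no blank cells (empty cells are usually omitted from the XML, which would shift columns). All function names are hypothetical, and values are returned as strings.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet and shared-string parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def _shared_strings(archive):
    """Return the workbook's shared string table, or [] if it has none."""
    if "xl/sharedStrings.xml" not in archive.namelist():
        return []
    root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    return ["".join(t.text or "" for t in si.iterfind(".//m:t", NS))
            for si in root.iterfind("m:si", NS)]


def _cell_value(cell, strings):
    """Decode one <c> element into a string (None when it has no value)."""
    kind = cell.get("t")
    if kind == "inlineStr":
        return "".join(t.text or "" for t in cell.iterfind(".//m:t", NS))
    raw = cell.findtext("m:v", default=None, namespaces=NS)
    if raw is not None and kind == "s":
        return strings[int(raw)]  # index into the shared string table
    return raw


def load_xlsx_records(path, sheet_part="xl/worksheets/sheet1.xml"):
    """Read one worksheet into a list of dicts keyed by the header row."""
    with zipfile.ZipFile(path) as archive:
        strings = _shared_strings(archive)
        sheet = ET.fromstring(archive.read(sheet_part))
    rows = [[_cell_value(cell, strings) for cell in row.iterfind("m:c", NS)]
            for row in sheet.iterfind("m:sheetData/m:row", NS)]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, values)) for values in body]
```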

Moving the data

The last step is to upload the data to the database. This involves setting up variables such as the host, port, database name, and collection name. Just like the file upload directory, these are best saved as constants. The process for uploading to MongoDB is relatively simple: instantiate a MongoClient object using the host and the port number, then get the collection using the database and collection names. Finally, read the file into a data structure and insert it as documents (MongoDB's equivalent of rows of data).
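
A sketch of these steps, assuming the third-party pymongo driver. The host, database name, and collection name below are hypothetical placeholders; 27017 is MongoDB's default port. The records argument is assumed to be a list of dicts already read from the uploaded file.

```python
# Hypothetical connection settings, saved as constants.
MONGO_HOST = "localhost"
MONGO_PORT = 27017  # MongoDB's default port
DB_NAME = "data2int"
COLLECTION_NAME = "uploads"


def get_collection(host=MONGO_HOST, port=MONGO_PORT,
                   db_name=DB_NAME, collection_name=COLLECTION_NAME):
    """Connect to MongoDB and return a handle to one collection."""
    # Imported here so the rest of the module loads without pymongo installed.
    from pymongo import MongoClient
    client = MongoClient(host, port)
    return client[db_name][collection_name]


def insert_records(collection, records):
    """Insert a list of dicts as documents; return how many were written."""
    if not records:
        return 0  # insert_many rejects an empty list
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```

A typical call would be insert_records(get_collection(), records). Keeping the connection and the insert in separate functions lets the insert logic be exercised against a stand-in collection without a running database.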