Know where to start looking for data. This section walks through different ways of obtaining data and then shows how to find your own data step by step.

First, define your topic and make it specific enough to narrow your search, but stay flexible enough to adjust your needs to the sources that actually exist.

                            
                                #local
                                import pandas as pd
                                csv_file = "/Users/xxx.csv"
                                df_from_csv = pd.read_csv(csv_file)
                                df_from_csv.info()

                                #web
                                import requests
                                # Fetch the raw HTML of the listings page; parsing it is a separate step
                                r = requests.get('https://www.renthop.com/nyc/apartments-for-rent', timeout=10)
                                r.raise_for_status()  # stop early on 4xx/5xx responses
                                html = r.text

                                #database
                                import pandas as pd
                                import sqlite3 as sql
                                conn = sql.connect('/Users/xxx.db')
                                
                                # First pattern - turn query directly into dataframe:
                                df1 = pd.read_sql_query("SELECT * FROM invoice", conn)

                                # Second pattern - fetch row-level data with a cursor; column names come from cur.description
                                cur = conn.cursor()
                                results = cur.execute("SELECT * FROM invoice LIMIT 5").fetchall()
                                df2 = pd.DataFrame(results, columns=[col[0] for col in cur.description])
                                conn.close()

                                #mongo
                                import pymongo
                                client = pymongo.MongoClient("mongodb://localhost:27017/")
 
                                # Database Name
                                db = client["xxx"]
 
                                # Collection Name
                                col = db["xxx"]
 
                                x = col.find_one()
                                print(x)
                                
                                #APIs
                                import pandas as pd
                                import requests
                                response = requests.get("http://api.open-notify.org/astros.json", timeout=10)
                                print(response.status_code)
                                response.raise_for_status()  # stop early on 4xx/5xx responses
                                print(response.json())
                                res = pd.DataFrame(response.json()["people"])
                                res.head()

                            
                            

When searching for data, follow a structured process so that the data you find is accurate and relevant to your project. Use the guidance below to start your search.


Before searching for data, clearly define the type of data you need. Ask yourself the following questions to help identify potential data sources.


  • What are you measuring?

  • What is the unit of analysis for your topic? What exactly are you studying?


  • When did the events in your research topic take place?

  • Specify the date or time frame you are interested in. Keep in mind that data on recent events may not be available yet. Also, the fact that data on a subject exists today does not mean it existed in the past; likewise, data that existed in the past may no longer be collected today.


  • Where is your research topic focused?

  • Determine the geographic area your topic covers. It may follow political boundaries, such as countries, states, counties, cities, or regions, or statistical boundaries, such as census tracts or urban areas.


  • Who collects data about your subject?

  • The main data collectors are private companies, government agencies, non-governmental organizations, academic institutions, and polling organizations.


  • How is the data collected?

  • How often is the data collected: regularly, or only once during a given period? Is it gathered through surveys, interviews, or some other method? Was it collected ethically?
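Once you have a candidate dataset, the time-frame, geography, and frequency questions above can be checked directly against its rows. The sketch below is a minimal illustration, assuming the rows have already been parsed into dicts holding a `datetime.date` and a region name; `check_coverage`, the field names, and the sample rows are hypothetical and carry no real values.

```python
from datetime import date
from statistics import median


def check_coverage(records, date_key, region_key, start, end, regions):
    """Summarize whether a dataset spans a time frame and a set of regions.

    Hypothetical helper: `records` is any iterable of dicts, such as rows
    from csv.DictReader after the date field has been parsed into a date.
    """
    rows = list(records)  # allow generators; the rows are iterated more than once
    if not rows:
        raise ValueError("no records to check")

    # Distinct observation dates, so one row per region per date counts once
    dates = sorted({row[date_key] for row in rows})
    found_regions = {row[region_key] for row in rows}

    # Gaps between consecutive observation dates hint at collection frequency
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]

    return {
        "first": dates[0],
        "last": dates[-1],
        "covers_period": dates[0] <= start and dates[-1] >= end,
        "missing_regions": sorted(set(regions) - found_regions),
        "median_gap_days": median(gaps) if gaps else None,
    }


# Illustrative rows only: yearly observations for two countries, 2015-2019
rows = [
    {"date": date(year, 1, 1), "country": country}
    for year in range(2015, 2020)
    for country in ("Germany", "United States")
]
report = check_coverage(
    rows, "date", "country",
    start=date(2015, 1, 1), end=date(2020, 12, 31),
    regions=["Germany", "United States"],
)
print(report)
```

Here `covers_period` is False because the rows stop in 2019, and a median gap of about 365 days suggests the data is collected once a year.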


    After defining the boundaries of your topic, you can use them to identify search terms or keywords to begin the search. This makes the search more efficient, saves time, and produces more relevant results.


  • To get started, divide the topic into different sections and identify the main concepts.

  • For example, consider the question: how many cars were traded between Germany and the United States from 2015 to 2020? The main concepts are the core subject (cars), the main action (trade), the participants (Germany and the United States), and the time frame (2015 to 2020).


  • When searching for data, use these concepts as keywords. When choosing search terms, consider synonyms and word variations.

  • For example, you can try searching for "import" or "export" instead of "trade".


  • Search strategy #1: Start with a general data search.

  • This is a good strategy if you are not sure which variables exist or which data is relevant to your project. Select a general-purpose data portal, such as one from the list below.


  • Search strategy #2: Search by main research area.

  • Visit online data sources that specialize in your research area.


  • Search strategy #3: Targeted search.

  • This may be a good strategy if you are familiar with library databases or know who the main source of the data is. Search the library database, or go directly to the website of the relevant organization to find the data.


  • Search strategy #4: Turn to the literature.

  • Searching the existing literature can lead you to datasets: a related article often cites the dataset it uses. You can also browse data repositories to see whether other researchers have archived the data from their studies.
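The keyword step described earlier, splitting the car-trade question into concepts and expanding each concept with synonyms, can be sketched in Python. This is a minimal illustration: the synonym lists and the `build_queries` helper are hypothetical, and each resulting string is meant to be pasted into a portal's search box or API.

```python
from itertools import product

# Concepts from the car-trade example; the synonym lists are illustrative only
concepts = {
    "subject": ["cars", "automobiles", "motor vehicles"],
    "action": ["trade", "import", "export"],
    "participants": ["Germany United States"],
    "time frame": ["2015 2020"],
}


def build_queries(concepts):
    """Return one search string per combination of concept terms (hypothetical helper)."""
    return [" ".join(terms) for terms in product(*concepts.values())]


queries = build_queries(concepts)
print(len(queries))  # 3 subjects x 3 actions x 1 x 1 = 9 queries
for query in queries[:3]:
    print(query)
```

The number of queries grows multiplicatively with each synonym you add, so keep the lists short and drop combinations that return nothing useful.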


    10 Great Places to Find Free Datasets for Your Next Project


  • Google Dataset Search.

  • Kaggle.

  • Data.gov.

  • Datahub.io.

  • UCI Machine Learning Repository.

  • Earth Data.

  • CERN Open Data Portal.

  • Global Health Observatory Data Repository.

  • NYC Taxi Trip Data.

  • FBI Crime Data Explorer.

    In short, finding data is a step-by-step process: work out what data you need, clarify your research direction, settle on a concrete goal, and then search the Internet for data that meets that goal.
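Some of the portals listed above, such as Data.gov, can also be searched programmatically. The sketch below assumes the portal runs CKAN, the open-source catalog software that Data.gov's catalog has used, and exposes CKAN's `package_search` action; the endpoint URL and the helper names are assumptions, so check the portal's own API documentation before relying on them. It uses only the Python standard library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a CKAN-based catalog; verify the endpoint in the portal's API docs
CKAN_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=10, base_url=CKAN_SEARCH_URL):
    """Build a package_search URL with the free-text query `q` and a result limit `rows`."""
    return f"{base_url}?{urlencode({'q': query, 'rows': rows})}"


def parse_results(payload):
    """Extract (title, name) pairs from a CKAN package_search JSON response."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [(item.get("title"), item.get("name")) for item in payload["result"]["results"]]


def search_datasets(query, rows=10, base_url=CKAN_SEARCH_URL):
    """Run the search over HTTP and return (title, name) pairs (needs network access)."""
    with urlopen(build_search_url(query, rows, base_url), timeout=30) as response:
        return parse_results(json.load(response))
```

With network access, `search_datasets("vehicle imports", rows=5)` would return up to five `(title, name)` pairs. Keeping URL building and response parsing in separate functions makes both easy to test without a connection.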