Know where to start looking for data. Along the way, you will look at different methods of obtaining data and then find data of your own at your own pace.
First define your subject and make it specific enough to narrow your search, but stay flexible enough to adapt your needs to the sources that actually exist.
# Local file
import pandas as pd
csv_file = "/Users/xxx.csv"  # placeholder path
df_from_csv = pd.read_csv(csv_file)
df_from_csv.info()  # column names, dtypes, and non-null counts
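# Web page
import numpy as np
import pandas as pd
import requests
import matplotlib.pyplot as plt
# Jupyter/IPython only; remove this line in a plain Python script
%matplotlib inline
r = requests.get('https://www.renthop.com/nyc/apartments-for-rent')
r.raise_for_status()  # raise an error if the request failed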
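The request above only downloads the page; the HTML still has to be parsed before it becomes data. Here is a minimal sketch using only the standard library. It runs on a small made-up HTML snippet rather than the live page, and the markup and the class name `listing` are assumptions for illustration, not the real structure of the site.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag whose class list contains a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


# Hypothetical markup standing in for r.text; the real page will differ.
sample_html = """
<html><body>
  <a class="listing" href="/listings/1">Studio</a>
  <a class="listing featured" href="/listings/2">One bedroom</a>
  <a class="nav" href="/about">About</a>
</body></html>
"""

collector = LinkCollector("listing")
collector.feed(sample_html)
print(collector.links)
```

In practice you would pass `r.text` to `feed()` and adjust the tag and class to whatever the page actually uses.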
# SQLite database
import pandas as pd
import sqlite3 as sql
conn = sql.connect('/Users/xxx.db')
# First pattern: turn the query directly into a DataFrame (column names included)
df1 = pd.read_sql_query("SELECT * FROM invoice", conn)
# Second pattern: fetch row-level data through a cursor; the rows are plain tuples,
# so the resulting DataFrame has integer column labels instead of column names
cur = conn.cursor()
results = cur.execute("SELECT * FROM invoice LIMIT 5").fetchall()
df2 = pd.DataFrame(results)
conn.close()
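The second pattern loses the column names, but the cursor still knows them. Here is a small sketch of recovering them from `cursor.description`. It uses an in-memory database with a made-up `invoice` schema so that it runs without the original file; the column names and rows are assumptions.

```python
import sqlite3

# In-memory database with a hypothetical invoice table (the schema is made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [(1, "Alice", 9.99), (2, "Bob", 24.5)],
)

cur = conn.cursor()
cur.execute("SELECT * FROM invoice LIMIT 5")
rows = cur.fetchall()

# Each entry of cursor.description is a 7-tuple; its first item is the column name.
columns = [col[0] for col in cur.description]

# Pair every row with the column names to get one dict per record.
records = [dict(zip(columns, row)) for row in rows]
print(columns)
print(records)
conn.close()
```

With pandas, the same list can be passed along as `pd.DataFrame(rows, columns=columns)`.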
# MongoDB
import pymongo
client = pymongo.MongoClient("mongodb://localhost:27017/")
# Database name
db = client["xxx"]
# Collection name
col = db["xxx"]
# Fetch a single document (None if the collection is empty)
x = col.find_one()
print(x)
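# APIs
import pandas as pd
import requests
response = requests.get("http://api.open-notify.org/astros.json")
print(response.status_code)  # 200 means the request succeeded
print(response.json())
# The "people" key holds a list of records, one per person
res = pd.DataFrame(response.json()["people"])
res.head()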
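The call above assumes the request succeeded and that the `people` key is present. Here is a sketch of handling the payload a little more defensively. It runs offline on a placeholder payload that is assumed to mirror the shape of the response (`message`, `number`, and a `people` list with `name` and `craft`); the names are invented and `people_by_craft` is a hypothetical helper.

```python
import json
from collections import defaultdict


def people_by_craft(payload):
    """Group names by spacecraft from an astros.json-style payload."""
    if payload.get("message") != "success":
        raise ValueError("API did not report success")
    grouped = defaultdict(list)
    for person in payload.get("people", []):
        grouped[person["craft"]].append(person["name"])
    return dict(grouped)


# Placeholder payload with the assumed response shape; the names are invented.
sample_text = """
{"message": "success", "number": 3,
 "people": [{"name": "Person A", "craft": "ISS"},
            {"name": "Person B", "craft": "ISS"},
            {"name": "Person C", "craft": "Tiangong"}]}
"""

payload = json.loads(sample_text)
crafts = people_by_craft(payload)
print(crafts)
```

Against the live API, `payload` would simply be `response.json()`.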
When searching for data, follow specific steps to make sure the data you get is accurate and relevant to your project. Use the information below to start your search.
Before searching, clearly define the type of data you need. Ask yourself the following questions to help identify potential data sources.
What is the unit of analysis for your topic? What, or whom, are you studying?
Specify the date or time frame you are interested in. Keep in mind that data on recent events may not be available immediately. In addition, the fact that data on a subject exists today does not mean that it existed in the past. Likewise, just because data was collected in the past does not mean that the same data is still collected now.
Determine the geographic focus of your topic. Your topic can focus on political boundaries, such as countries, states, counties, cities, or regions. It may also focus on statistical boundaries, such as census tracts or urban areas.
Who collects this kind of data? The main data collectors are private companies, government agencies, non-governmental organizations, academic institutions, and polling organizations.
How often is the data collected: regularly, or only once in a given period? Is it collected through surveys or interviews? Was it collected ethically?
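The answers to these questions can be written down as a small structured specification, which makes it easier to screen candidate datasets consistently. The following is only a sketch: `DataNeed`, its fields, and the dataset metadata dictionaries are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataNeed:
    """What the project needs: unit of analysis, time frame, and geography."""
    unit_of_analysis: str
    start_year: int
    end_year: int
    geography: set

    def covers(self, dataset):
        """Return True if the dataset's metadata spans the needed years and places."""
        years_ok = (dataset["first_year"] <= self.start_year
                    and dataset["last_year"] >= self.end_year)
        places_ok = self.geography <= set(dataset["geography"])
        return years_ok and places_ok


need = DataNeed("car trade flows", 2015, 2020, {"Germany", "United States"})

# Hypothetical metadata for two candidate datasets.
candidates = [
    {"name": "dataset_a", "first_year": 2000, "last_year": 2022,
     "geography": ["Germany", "United States", "France"]},
    {"name": "dataset_b", "first_year": 2017, "last_year": 2021,
     "geography": ["Germany", "United States"]},
]

usable = [d["name"] for d in candidates if need.covers(d)]
print(usable)
```

Here `dataset_b` is rejected because its coverage starts after the first year needed, which is the kind of gap the time-frame question is meant to catch.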
After defining the boundaries of your topic, you can use them to identify search terms or keywords to begin the search process. This makes the search more efficient, saves time, and produces more relevant results.
For example: how many cars were traded between Germany and the United States from 2015 to 2020? For this question, the main concepts are: What is the core thing? (cars) What is the main action? (trade) Who are the participants? (Germany and the United States) What is the time frame? (2015 to 2020)
If the first keywords return few results, try synonyms or related terms. For example, you can search for the words "import" or "export" instead of "trade".
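Combining the concepts of a question with their alternative terms can be sketched in a few lines. The synonym lists below are illustrative assumptions, not an exhaustive vocabulary, and the query format is just space-separated keywords.

```python
from itertools import product

# Concepts from the example question, each with alternative search terms.
# The synonym lists are illustrative, not exhaustive.
concepts = {
    "thing": ["car", "automobile"],
    "action": ["trade", "import", "export"],
    "participants": ["Germany United States"],
    "period": ["2015-2020"],
}

# One query per combination of terms, in the order the concepts are listed.
queries = [" ".join(terms) for terms in product(*concepts.values())]

for q in queries:
    print(q)
```

Two terms for the thing and three for the action give six candidate queries to try in a data portal or search engine.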
If you are not sure what types of variables exist or what data is relevant to your project, browsing is a good strategy. Select a portal from a general list of data sources.
Visit the online data source and browse it for an area that suits your research.
If you are familiar with library databases, or already know who the main source of the data is, going straight to that source may be a good strategy. Visit the library database, or go to the website of the relevant organization, to find the data.
Searching the existing literature is another way to discover data sets. When you find a related article, it may point to the data set it uses. You can also browse data repositories to see whether researchers have archived the data from their studies.
10 Great Places to Find Free Datasets for Your Next Project
To sum up: start by working out what data you need, clarify your research direction, and settle on a goal. Then search the Internet for data that meets that goal and put it to use.