1. ORDER BY
ORDER BY는 query 구문에서 가장 마지막에 쓰이며 결과를 ORDER BY 뒤에 나오는 컬럼명으로 요약한다.
query = """
SELECT ID, Name, Animal
FROM `bigquery-public-data.pet_records.pets`
ORDER BY ID
"""
내림차순으로 데이터를 정렬하고 싶을 경우 ORDER BY 구문의 가장 마지막에 DESC를 붙인다.
query = """
SELECT ID, Name, Animal
FROM `bigquery-public-data.pet_records.pets`
ORDER BY Animal DESC
"""
2. Dates
BigQuery에는 DATE와 DATETIME 방식이 있다. DATE 형식의 경우 연, 월, 일의 형태로 존재하며 DATETIME은 시간이 추가되어 있다.
3. EXTRACT
EXTRCAT를 활용하면 DATE나 DATETIME 타입의 데이터에서 연도와 일자를 추출할 수 있다.
query = """
SELECT Name, EXTRACT(DAY from Date) AS Day
FROM `bigquery-public-data.pet_records.pets_with_date`
"""
4. Using case of ORDER BY
# Query to find out the number of accidents for each day of the week
query = """
SELECT COUNT(consequtive_number) AS num_accidents,
EXTRACT(DATOFWEEK FROM timestamp_of_crash) AS day_of_week
FROM `bigquery-public-data.nhtsa_traffic_fatalities.accident_2015`
GROUP BY day_of_week
ORDER BY num_accidents DESC
"""
# Set up the query
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed = 10**9)
query_job = client.query(query, job_config = safe_config)
# API request
accidents_by_day = query_job.to_dataframe()
5. Exercise : Order By
Step 1 : Government expenditure on education
To answer this question, consider only the rows in the dataset corresponding to indicator code SE.XPD.TOTL.GD.ZS, and write a query that returns the average value in the value column for each country in the dataset between the years 2010-2017 (including 2010 and 2017 in the average).
# Your code goes here
country_spend_pct_query = """
SELECT country_name, AVG(value) AS avg_ed_spending_pct
FROM `bigquery-public-data.world_bank_intl_education.international_education`
WHERE indicator_code = 'SE.XPD.TOTL.GD.ZS' AND year <= 2017 AND year >= 2010
GROUP BY country_name
ORDER BY avg_ed_spending_pct DESC
"""
# Set up the query (cancel the query if it would use too much of
# your quota, with the limit set to 1 GB)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
country_spend_pct_query_job = client.query(country_spend_pct_query, job_config=safe_config)
# API request - run the query, and return a pandas DataFrame
country_spending_results = country_spend_pct_query_job.to_dataframe()
# View top few rows of results
print(country_spending_results.head())
Step 2 : Identify interesting codes to explore
# Your code goes here
code_count_query = """SELECT indicator_code, indicator_name, COUNT(1) AS num_rows
FROM `bigquery-public-data.world_bank_intl_education.international_education`
WHERE year = 2016
GROUP BY indicator_code, indicator_name
HAVING num_rows >= 175
ORDER BY num_rows DESC
"""
# Set up the query
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
code_count_query_job = client.query(code_count_query, job_config=safe_config)
# API request - run the query, and return a pandas DataFrame
code_count_results = code_count_query_job.to_dataframe()
# View top few rows of results
print(code_count_results.head())
Source of the course : Kaggle Course _ Order By
Order By
Explore and run machine learning code with Kaggle Notebooks | Using data from US Traffic Fatality Records
www.kaggle.com
'Course > [Kaggle] Data Science' 카테고리의 다른 글
[SQL] Joining data (0) | 2022.02.24 |
---|---|
[SQL] AS & With (0) | 2022.02.24 |
[SQL] Group By, Having & Count (0) | 2022.02.23 |
[SQL] Select, From & Where (0) | 2022.02.23 |
[SQL] Getting Started with SQL and BigQuery (0) | 2022.02.23 |