1. AS
라이브러리를 축약해서 사용하기위해 as를 쓰는것과 동일하게 Bigquery도 컬럼명을 변경하기 위해서 AS를 쓴다.
query = """
SELECT Animal, COUNT(1) AS Number
FROM `bigquery-public-data.pets_records.pets`
GROUP BY Animal
"""
2. With ... AS
CTE(Common Table Expression)는 서브쿼리로 쓰이는 파생테이블과 비슷한 개념으로 사용되며 복잡한 쿼리문에서 코드의 가독성과 재사용성을 위해 사용된다. CTE안에 독립적으로 존재하는 query는 하나의 데이터 묶음을 형성하여 복잡한 데이터 셋을 생성하는데 효율적으로 코드를 작성하게 해준다.
# Query to select the number of transactions per date, sorted by date
query_with_CTE = """
WITH time AS
(
SELECT DATE(block_timestamp) AS trans_date
FROM `bigquery-public-data.crypto_bitcoin.transactions`
)
SELCECT COUNT(1) AS transactions, trans_date
FROM time
GROUP BY trans_date
ORDER BY trans_date
"""
# Set up the query
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
query_job = client.query(query_with_CTE, job_config=safe_config)
# API request - run the query, and convert the results to a pandas DataFrame
transactions_by_date = query_job.to_dataframe()
# Print the first five rows
transactions_by_date.head()
3. Exercise : As & With
Step 1 : Find the data
# Your code here to find the table name
tables = list(client.list_tables(dataset))
for table in tables :
print(table.table_id)
# Write the table name as a string below
table_name = 'taxi_trips'
Step 2 : Peek at the data
# Your code here
table_ref = dataset_ref.table("taxi_trips")
table = client.get_table(table_ref)
client.list_rows(table, max_results = 5).to_dataframe()
Step 3 : Determine when this data is from
# Your code goes here
rides_per_year_query = """SELECT EXTRACT(YEAR FROM trip_start_timestamp) AS year, COUNT(1) AS num_trips
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
GROUP BY year
"""
# Set up the query (cancel the query if it would use too much of
# your quota)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
rides_per_year_query_job = client.query(rides_per_year_query, job_config = safe_config) # Your code goes here
# API request - run the query, and return a pandas DataFrame
rides_per_year_result = rides_per_year_query_job.to_dataframe() # Your code goes here
# View results
print(rides_per_year_result)
Step 4 : Dive slightly deeper
# Your code goes here
rides_per_month_query = """SELECT EXTRACT(MONTH FROM trip_start_timestamp) AS month, COUNT(1) AS num_trips
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE EXTRACT(YEAR FROM trip_start_timestamp) = 2017
GROUP BY month
ORDER BY month
"""
# Set up the query
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
rides_per_month_query_job = client.query(rides_per_month_query, job_config = safe_config) # Your code goes here
# API request - run the query, and return a pandas DataFrame
rides_per_month_result = rides_per_month_query_job.to_dataframe()
Step 5 : Write the query
# Your code goes here
speeds_query = """
WITH RelevantRides AS
(
SELECT EXTRACT(HOUR FROM trip_start_timestamp) AS hour_of_day,
trip_miles, trip_seconds
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE trip_start_timestamp > '2017-01-01' AND
trip_start_timestamp < '2017-07-01' AND
trip_miles > 0 AND
trip_seconds > 0
)
SELECT hour_of_day, COUNT(1) AS num_trips,
3600 * SUM(trip_miles) / SUM(trip_seconds) AS avg_mph
FROM RelevantRides
GROUP BY hour_of_day
ORDER BY hour_of_day
"""
# Set up the query
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
speeds_query_job = client.query(speeds_query, job_config = safe_config) # Your code here
# API request - run the query, and return a pandas DataFrame
speeds_result = speeds_query_job.to_dataframe() # Your code here
# View results
print(speeds_result)
Source of the course : Kaggle Course _ As & With
'Course > [Kaggle] Data Science' 카테고리의 다른 글
[SQL] JOINs and UNIONs (0) | 2022.02.24 |
---|---|
[SQL] Joining data (0) | 2022.02.24 |
[SQL] Order By (0) | 2022.02.24 |
[SQL] Group By, Having & Count (0) | 2022.02.23 |
[SQL] Select, From & Where (0) | 2022.02.23 |