1. AS

라이브러리를 축약해서 사용하기위해 as를 쓰는것과 동일하게 Bigquery도 컬럼명을 변경하기 위해서 AS를 쓴다.

query = """
        SELECT Animal, COUNT(1) AS Number
        FROM `bigquery-public-data.pets_records.pets`
        GROUP BY Animal
        """

2. With ... AS

CTE(Common Table Expression)는 서브쿼리로 쓰이는 파생테이블과 비슷한 개념으로 사용되며 복잡한 쿼리문에서 코드의 가독성과 재사용성을 위해 사용된다. CTE안에 독립적으로 존재하는 query는 하나의 데이터 묶음을 형성하여 복잡한 데이터 셋을 생성하는데 효율적으로 코드를 작성하게 해준다.

# Query to select the number of transactions per date, sorted by date

query_with_CTE = """
                 WITH time AS
                 (
                     SELECT DATE(block_timestamp) AS trans_date
                     FROM `bigquery-public-data.crypto_bitcoin.transactions`
                 )
                 SELCECT COUNT(1) AS transactions, trans_date
                 FROM time
                 GROUP BY trans_date
                 ORDER BY trans_date
                 """
# Set up the query

safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
query_job = client.query(query_with_CTE, job_config=safe_config)

# API request - run the query, and convert the results to a pandas DataFrame
transactions_by_date = query_job.to_dataframe()

# Print the first five rows
transactions_by_date.head()

3. Exercise : As & With

Step 1 : Find the data

# Your code here to find the table name

tables = list(client.list_tables(dataset))

for table in tables : 
    print(table.table_id)

# Write the table name as a string below
table_name = 'taxi_trips'

Step 2 : Peek at the data

# Your code here

table_ref = dataset_ref.table("taxi_trips")

table = client.get_table(table_ref)

client.list_rows(table, max_results = 5).to_dataframe()

Step 3 : Determine when this data is from

# Your code goes here
rides_per_year_query = """SELECT EXTRACT(YEAR FROM trip_start_timestamp) AS year, COUNT(1) AS num_trips
                          FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
                          GROUP BY year
                       """

# Set up the query (cancel the query if it would use too much of 
# your quota)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
rides_per_year_query_job = client.query(rides_per_year_query, job_config = safe_config) # Your code goes here

# API request - run the query, and return a pandas DataFrame
rides_per_year_result = rides_per_year_query_job.to_dataframe() # Your code goes here

# View results
print(rides_per_year_result)

Step 4 : Dive slightly deeper

# Your code goes here
rides_per_month_query = """SELECT EXTRACT(MONTH FROM trip_start_timestamp) AS month, COUNT(1) AS num_trips
                           FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
                           WHERE EXTRACT(YEAR FROM trip_start_timestamp) = 2017
                           GROUP BY month
                           ORDER BY month
                        """ 

# Set up the query
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
rides_per_month_query_job = client.query(rides_per_month_query, job_config = safe_config) # Your code goes here

# API request - run the query, and return a pandas DataFrame
rides_per_month_result = rides_per_month_query_job.to_dataframe()

Step 5 : Write the query

# Your code goes here
speeds_query = """
               WITH RelevantRides AS
               (
                   SELECT EXTRACT(HOUR FROM trip_start_timestamp) AS hour_of_day, 
                          trip_miles, trip_seconds
                   FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
                   WHERE trip_start_timestamp > '2017-01-01' AND 
                         trip_start_timestamp < '2017-07-01' AND 
                         trip_miles > 0 AND 
                         trip_seconds > 0
               )
               SELECT hour_of_day, COUNT(1) AS num_trips,
                      3600 * SUM(trip_miles) / SUM(trip_seconds) AS avg_mph
               FROM RelevantRides
               GROUP BY hour_of_day
               ORDER BY hour_of_day
               """

# Set up the query
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
speeds_query_job = client.query(speeds_query, job_config = safe_config) # Your code here

# API request - run the query, and return a pandas DataFrame
speeds_result = speeds_query_job.to_dataframe() # Your code here

# View results
print(speeds_result)

Source of the course : Kaggle Course _ As & With

'Course > [Kaggle] Data Science' 카테고리의 다른 글

[SQL] JOINs and UNIONs (0)	2022.02.24
[SQL] Joining data (0)	2022.02.24
[SQL] Order By (0)	2022.02.24
[SQL] Group By, Having & Count (0)	2022.02.23
[SQL] Select, From & Where (0)	2022.02.23

[SQL] AS & With

1. AS

2. With ... AS

3. Exercise : As & With

'Course > [Kaggle] Data Science' 카테고리의 다른 글

티스토리툴바