1. Dtypes

To find the data type of a single column, use the dtype attribute.

reviews.price.dtype

# result will be dtype('float64')

To see the data type of every column at once, use the dtypes attribute of the DataFrame.

reviews.dtypes

The data type tells you how pandas stores the data internally. For example, 'float64' means the values are stored as 64-bit floating-point numbers.
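As a rough illustration of the two attributes, here is a minimal sketch. The small DataFrame is a made-up stand-in for reviews (its rows and values are assumptions, not the real dataset), so that the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame used in this post.
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 86],
    "price": [15.0, 14.0, 65.0],
})

# dtype of a single column (a Series)
price_dtype = reviews.price.dtype

# dtypes of all columns: a Series indexed by column name
all_dtypes = reviews.dtypes

print(price_dtype)
print(all_dtypes)
```

Note that dtype belongs to a single column (a Series), while dtypes belongs to the whole DataFrame.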

2. Missing data

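Entries with missing values are given the value NaN (Not a Number). For technical reasons, NaN is itself a floating-point value of the float64 dtype. Rows containing NaN can be selected with pd.isnull(), and its companion pd.notnull() selects the rows that are not missing.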

reviews[pd.isnull(reviews.country)]
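The selection above can be tried on a small example. This is only a sketch: the DataFrame below is a made-up stand-in for reviews, and the variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame; row 1 has missing values.
reviews = pd.DataFrame({
    "country": ["Italy", None, "US"],
    "points": [87, 87, 86],
    "price": [15.0, np.nan, 65.0],
})

# Rows where country is missing
missing_country = reviews[pd.isnull(reviews.country)]

# Rows where country is present
has_country = reviews[pd.notnull(reviews.country)]

# Because NaN is a float, integer data containing a gap is stored as float64
points_with_gap = pd.Series([87, None, 86])

print(missing_country)
print(points_with_gap.dtype)
```

pd.isnull() returns a boolean mask, so it can be combined with other conditions in the usual way.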

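To replace missing values, use fillna(), which fills the missing values of a given column with a value you choose. By default it returns a new Series and leaves the original unchanged.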

reviews.region_2.fillna('Unknown')
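A minimal sketch of filling missing values, assuming a tiny made-up region_2 column. It also shows one possible way to keep the result, since fillna() does not modify the original by default.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame.
reviews = pd.DataFrame({"region_2": ["Napa", np.nan, "Sonoma", np.nan]})

# fillna() returns a new Series; reviews itself is not modified
filled = reviews.region_2.fillna("Unknown")

# One way to keep the result: build a new DataFrame with the filled column
reviews_filled = reviews.assign(region_2=filled)

print(filled.tolist())
print(reviews.region_2.isnull().sum())
```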

To change a value that already exists (rather than a missing one) into another value, use replace().

reviews.taster_twitter_handle.replace('@kerinokeefe', '@kerino')
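The following sketch shows replace() on a small made-up column. The handles are sample values assumed for illustration. Only exact matches are changed, and missing entries stay missing.

```python
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame.
reviews = pd.DataFrame({
    "taster_twitter_handle": ["@kerinokeefe", "@vossroger", None, "@kerinokeefe"],
})

# replace() returns a new Series with every exact match swapped out
renamed = reviews.taster_twitter_handle.replace("@kerinokeefe", "@kerino")

print(renamed)
```

Because replace() matches whole values, a misspelled search value changes nothing and raises no error, so it is worth checking the result.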

Data Types and Missing Values

www.kaggle.com
