Dive into deep learning - 2.2 data preprocessing
2022-07-19 03:28:00 【Trehol】
1. Reading the dataset
os.makedirs(dir_name2, exist_ok=True): like os.mkdir, this function creates a new folder, but it is more convenient and does more.
os.makedirs creates folders recursively, so any missing intermediate folders in the path are created as well.
When the exist_ok parameter of os.makedirs is set to True, no error is raised if the folder already exists; the existing folder is simply left in place.
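To make the difference between os.mkdir and os.makedirs concrete, here is a minimal sketch. It works in a temporary folder so nothing is left behind in the project; the folder names "a", "b", "c" are placeholders chosen for illustration and are not part of the original post.

```python
import os
import tempfile

# Work inside a throwaway directory (illustrative only).
base = tempfile.mkdtemp()
nested = os.path.join(base, "a", "b", "c")

# os.mkdir creates a single level only, so it fails when the parents are missing.
try:
    os.mkdir(nested)
    mkdir_ok = True
except FileNotFoundError:
    mkdir_ok = False

# os.makedirs creates every missing level of the path.
os.makedirs(nested, exist_ok=True)
# With exist_ok=True, calling it again on an existing folder is harmless.
os.makedirs(nested, exist_ok=True)

# Without exist_ok=True, an existing folder raises FileExistsError.
try:
    os.makedirs(nested)
    raised = False
except FileExistsError:
    raised = True

print(mkdir_ok, os.path.isdir(nested), raised)
```

This is why the data-writing code can call os.makedirs unconditionally every time the script runs.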
os.path.join('..', 'data') builds the path to the data folder. The dataset is stored in the CSV (comma-separated values) file ../data/house_tiny.csv.
import os
os.makedirs(os.path.join('..', 'data'), exist_ok=True)
data_file = os.path.join('..', 'data', 'house_tiny.csv')
with open(data_file, 'w') as f:  # Open data_file for writing and bind the file object to f
    f.write('NumRooms,Alley,Price\n')  # Column names
    f.write('NA,Pave,127500\n')  # Each row represents a data sample
    f.write('2,NA,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000\n')
2. Handling missing values
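The one-hot encoding step in this section operates on a DataFrame named inputs, but the post does not show how it is prepared. A minimal sketch of one way to get there, assuming the first two columns are taken as inputs and numeric gaps are filled with the column mean (an assumption, though it is consistent with the 3.0 values in the printed result). The CSV text is inlined through io.StringIO only to keep the example self-contained.

```python
import io
import pandas as pd

# Same contents as house_tiny.csv, inlined for a self-contained example.
csv_text = (
    "NumRooms,Alley,Price\n"
    "NA,Pave,127500\n"
    "2,NA,106000\n"
    "4,NA,178100\n"
    "NA,NA,140000\n"
)
data = pd.read_csv(io.StringIO(csv_text))  # "NA" is parsed as a missing value

# Assumed split: first two columns are inputs, the last column is the target.
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]

# Fill numeric gaps with the column mean; numeric_only=True skips the
# text column Alley, which has no mean.
inputs = inputs.fillna(inputs.mean(numeric_only=True))
print(inputs)
```

After this step NumRooms has no gaps, while Alley still contains missing values that the one-hot encoding handles.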
dummy_na: bool, default False. When True, get_dummies adds an indicator column for missing values; when False, missing values are ignored. In pandas 2.0 and later the indicator columns print as True/False unless dtype=int is passed.
inputs = pd.get_dummies(inputs, dummy_na=True)
print(inputs)
   NumRooms  Alley_Pave  Alley_nan
0       3.0           1          0
1       2.0           0          1
2       4.0           0          1
3       3.0           0          1
3. Exercises
Create a raw dataset with more rows and columns.
Delete the column with the most missing values.
Convert the preprocessed dataset to tensor format.
Solving the exercises
Methods for counting missing values
null_all = data.isnull().sum()
# isnull() checks each element and returns a Boolean: True if the element is empty or NaN, otherwise False
# data.isnull().any() reports which columns contain missing values: True if a column has at least one, otherwise False
# data.isnull().sum() returns the number of missing values in each column
# dropna(thresh=2): thresh is the minimum number of non-missing values required; a row (axis=0) or column (axis=1) with fewer non-missing values than thresh is deleted
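The calls listed above can be tried on a small frame. This is a sketch on made-up data (the columns a, b, c are illustrative and not from the post), showing what each call returns and how thresh counts non-missing values.

```python
import numpy as np
import pandas as pd

# Illustrative data: column a has 1 gap, b has 2 gaps, c has none.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7, 8, 9],
})

has_missing = df.isnull().any()     # per column: does it contain any gap?
missing_counts = df.isnull().sum()  # per column: how many gaps?

# Keep only rows with at least 2 non-missing values (row 1 has just one).
rows_kept = df.dropna(thresh=2)
# Keep only columns with at least 2 non-missing values (column b has just one).
cols_kept = df.dropna(axis=1, thresh=2)

print(has_missing)
print(missing_counts)
print(rows_kept)
print(cols_kept)
```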
Parameters of the drop function:
axis=0 (the default) operates on rows; to operate on columns, change it to axis=1.
inplace=False (the default) leaves the original data unchanged and returns a new DataFrame with the deletion applied; to delete from the original data directly, set inplace=True.
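A short sketch of these two parameters on made-up data (the frame and its column names x and y are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

# axis=0: drop the row whose index label is 0.
without_row = df.drop(0, axis=0)
# axis=1: drop the column named "y".
without_col = df.drop("y", axis=1)

# With the default inplace=False, df itself is untouched.
unchanged_shape = df.shape

# With inplace=True the frame is modified directly and the call returns None.
df_inplace = df.copy()
result = df_inplace.drop("y", axis=1, inplace=True)

print(without_row)
print(without_col)
print(unchanged_shape, result, list(df_inplace.columns))
```

Because inplace=True returns None, writing df = df.drop(..., inplace=True) would replace the frame with None, which is a common mistake.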
import pandas as pd
import os
import torch
os.makedirs(os.path.join('F:/Pycharm/DIVE INTO DL', 'data2.2'), exist_ok=True)
data_file = os.path.join('F:/Pycharm/DIVE INTO DL', 'data2.2', 'house_tiny.csv')
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Price\n')
    f.write('NA,Pave,127500\n')  # Each row represents a data sample
    f.write('2,NA,106000\n')
    f.write('NA,NA,178100\n')
    f.write('NA,NA,140000\n')
    f.write('2,NA,106000\n')
    f.write('NA,NA,178100\n')
    f.write('NA,NA,140000\n')
    f.write('NA,NA,178100\n')
    f.write('NA,NA,140000\n')
data = pd.read_csv(data_file)
# Handle missing values: first delete the column with the most missing values
col_null = data.isna().sum(axis=0)  # Number of missing values per column
col_null_dict = col_null.to_dict()  # Convert to a dictionary
# col_max = col_null.max(axis=0)  # Largest per-column count of missing values
max_key = max(col_null_dict.keys(), key=col_null_dict.get)
# Passing col_null_dict itself has the same effect, because iterating a dict yields its keys
# key= sets the comparison criterion, here get(key), so max returns the key whose value is largest
print(col_null)
print('The key corresponding to the maximum value is: ' + max_key)
del data[max_key]
print('After deleting the column with the most missing values, the data is:')
print(data)
# data is a DataFrame rather than a numeric array, so torch.tensor(data) cannot be called on it directly
data_post = data.iloc[:, :2]  # Still a DataFrame at this point
data_tensor = torch.tensor(data_post.values)  # .values yields a NumPy array that torch.tensor accepts
print('After converting to tensor format:')
print(data_tensor)
Note that this only deletes the column with the most missing values from the original dataset; the missing values that remain are not handled.
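One possible way to also handle the remaining gaps is sketched below. It is an alternative to the dictionary approach in the code above, not the author's method: idxmax picks the column with the most missing values, and the leftover gaps are filled with the column mean (mean imputation is an assumption here; other strategies are equally valid). The sample rows are a shortened, illustrative variant of the dataset.

```python
import io
import pandas as pd

# Shortened illustrative data in the same format as house_tiny.csv.
csv_text = (
    "NumRooms,Alley,Price\n"
    "NA,Pave,127500\n"
    "2,NA,106000\n"
    "NA,NA,178100\n"
    "4,NA,140000\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# idxmax returns the label of the largest entry, i.e. the column
# with the most missing values.
worst_col = data.isna().sum().idxmax()
cleaned = data.drop(columns=worst_col)

# Every remaining column is numeric, so the gaps can be filled with the mean.
cleaned = cleaned.fillna(cleaned.mean())

# A plain float array with no gaps, ready to be handed to torch.tensor.
array = cleaned.to_numpy(dtype=float)
print(worst_col)
print(cleaned)
```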