Sqoop in Practice 02: Sqoop 1.4.7 full-database import, data filtering, and field type support, with example code (the query parameter and forced field type conversion)
2022-07-26 09:45:00 【Wind】
1. Environment
Same environment as the previous post:
# Irrelevant output omitted
# JDK
[root@tcloud ~]# java -version
java version "1.8.0_251"
# MySQL
[root@tcloud ~]# mysql -V
mysql Ver 14.14 Distrib 5.7.28
# Hadoop
[root@tcloud ~]# hadoop version
Hadoop 3.1.3
# Hive
[root@tcloud ~]# hive --version
Hive 3.1.2
# Sqoop
[root@tcloud ~]# sqoop version
Sqoop 1.4.7
2. import-all-tables
Sqoop 1 supports exporting an entire database to HDFS/Hive with the import-all-tables command, but two restrictions apply:
- Every table must have a primary key, or you must pass --autoreset-to-one-mapper, which means a table without a primary key is imported by a single map task, i.e. without parallelism (if you still want parallel execution, use --split-by to name the reference column the data is split on);
- You cannot use a non-default split column, nor impose any restriction via a WHERE clause. The official wording:
You must not intend to use non-default splitting column, nor impose any conditions via a WHERE clause.
The default import path is /user/<username>/<table-name>/, which holds the data files and a _SUCCESS marker. To import into a different directory you need the corresponding parameter (run sqoop help import-all-tables to see the full list).
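The fallback described in the first restriction can be sketched as follows. This is a hypothetical invocation (reusing the tcloud MySQL instance from the examples below; substitute your own password): tables that have a primary key are split across the requested mappers, while tables without one drop to a single mapper instead of failing.

```shell
sqoop import-all-tables \
  --connect jdbc:mysql://tcloud:3306/mysql \
  --username root \
  --password '<password>' \
  --warehouse-dir /sqoop_all \
  --autoreset-to-one-mapper \
  -m 4
```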
【Example 1️⃣】 Import the whole database into HDFS:
# Note: the import directory is specified with --warehouse-dir, not --target-dir
sqoop import-all-tables \
--connect jdbc:mysql://tcloud:3306/mysql \
--username root \
--password '<password>' \
--warehouse-dir /sqoop_all \
--fields-terminated-by '\t' \
--bindir /usr/local/sqoop/lib/ \
-m 1
# -m may be set greater than 1 only when every table has a primary key; otherwise it must stay 1
【Example 2️⃣】 Import the whole database into Hive:
# Note: the target Hive database is specified with --hive-database
sqoop import-all-tables \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://tcloud:3306/mysql \
--username root \
--password '<password>' \
--hive-database sqoop_test \
--hive-import \
--hive-overwrite \
--bindir /usr/local/sqoop/lib/ \
-m 1
3. query
Sqoop 1 supports defining a filtering SQL query through the --query parameter, so you can import any result set you need. Example:
# Comments are placed above the command: an inline comment after a trailing backslash breaks line continuation
# --hive-database  target Hive database (defaults to Hive's default database if omitted)
# --hive-table     target Hive table
# --split-by       column used to split the data
# --hive-import    import into Hive
sqoop import \
--connect jdbc:mysql://tcloud:3306/mysql \
--username root \
--password '<password>' \
--query 'select * from help_keyword where $CONDITIONS and help_keyword_id < 50' \
--delete-target-dir \
--target-dir /sqoop_hive \
--hive-database sqoop_test \
--hive-table filter_help_keyword \
--split-by help_keyword_id \
--hive-import \
--hive-overwrite \
--bindir /usr/local/sqoop/lib/ \
-m 1
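A shell caveat worth demonstrating: in POSIX shells, a comment must never follow a continuation backslash on multi-line commands like the one above, because `\ ` becomes an escaped space, the `#` ends the command, and the next line runs as a separate (usually invalid) command. A minimal runnable illustration:

```shell
#!/bin/sh
# With a clean trailing backslash the command continues on the next line:
joined=$(printf '%s-%s' \
  foo bar)
echo "joined=$joined"    # joined=foo-bar

# If a comment followed the backslash, the shell would treat '\ ' as an
# escaped literal space and '#' as the start of a comment, ending the
# command there instead of continuing it.
```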
Three points to note when filtering data with query:
- The target table must be named with --hive-table;
- If the parallelism -m is not 1 and --autoreset-to-one-mapper is not specified, --split-by must name the reference column;
- The WHERE clause of the SQL must contain $CONDITIONS. This is mandatory boilerplate that Sqoop dynamically replaces at runtime.
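To make the last point concrete, here is a small illustration in plain shell (not Sqoop code; the exact predicates are assumptions for illustration) of the kind of substitution Sqoop performs: with a single mapper the placeholder becomes an always-true predicate, and in the parallel case each mapper receives its own range over the --split-by column.

```shell
#!/bin/sh
# Hypothetical sketch of Sqoop's $CONDITIONS substitution (not Sqoop internals)
QUERY='select * from help_keyword where $CONDITIONS and help_keyword_id < 50'

# Single-mapper case: the placeholder is replaced by a tautology
echo "$QUERY" | sed 's/\$CONDITIONS/(1 = 1)/'

# Parallel case: each mapper gets its own range over the split column
echo "$QUERY" | sed 's/\$CONDITIONS/help_keyword_id >= 0 AND help_keyword_id < 25/'
```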
4. Field type support
By default Sqoop 1 supports most database field types, but a few special types are not supported. When an unsupported type is encountered, the program throws an exception of the form
Hive does not support the SQL type for column xxx
in which case the mapping can be forced with the following two parameters:
- --map-column-java : override the mapping from SQL types to Java types;
- --map-column-hive : override the mapping from SQL types to Hive types.
For example, to force the id field to the String type and the value field to the Integer type:
$ sqoop import ... --map-column-java id=String,value=Integer
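A fuller, hypothetical invocation combining both overrides (the table and column choice is illustrative, reusing help_keyword from the earlier examples; note that the Hive-side override uses Hive's own type names):

```shell
sqoop import \
  --connect jdbc:mysql://tcloud:3306/mysql \
  --username root \
  --password '<password>' \
  --table help_keyword \
  --map-column-java help_keyword_id=String \
  --map-column-hive help_keyword_id=STRING \
  --hive-database sqoop_test \
  --hive-import \
  -m 1
```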