8. ODS layer construction of data warehouse
2022-07-19 05:34:00 【Mmj666】
Data Warehouse ODS Layer Construction
In this project, building each layer of the data warehouse involves two parts: first, deciding which tables to create, and second, deciding how to load their data.
When building the ODS layer, the following points need to be clear:
1) The table structures of the ODS layer follow the structure of the data synchronized from the business system.
2) The ODS layer keeps all historical data, so a compression format with a high compression ratio should be chosen; here we use gzip.
3) The ODS table naming convention is: ods_<source table name>_<inc/full>, where inc/full marks an incremental or full table.
When we synchronize data, the user behavior logs arrive as JSON strings; the incremental tables are synchronized by Maxwell, also as JSON strings; and the full tables are synchronized by DataX as TSV files. The ODS table designs must take these formats into account.
1. Log table design
We have two options. The first is to create a table with a single field, where each row stores one JSON string; fields are then extracted on demand with the get_json_object() function. The second is to build a JSON table whose columns correspond one-to-one to the fields of the JSON string, which makes the data more convenient to use later. Our final choice is to build a JSON table.
1.1 Log table creation statement
The syntax for creating a JSON table can be found in the official Hive documentation. Combining that syntax with the JSON format of the behavior data, the final log table creation statement is as follows:
DROP TABLE IF EXISTS ods_log_inc;
CREATE EXTERNAL TABLE ods_log_inc
(
    `common`   STRUCT<ar:STRING, ba:STRING, ch:STRING, is_new:STRING, md:STRING, mid:STRING, os:STRING, uid:STRING, vc:STRING> COMMENT 'Common information',
    `page`     STRUCT<during_time:STRING, item:STRING, item_type:STRING, last_page_id:STRING, page_id:STRING, source_type:STRING> COMMENT 'Page information',
    `actions`  ARRAY<STRUCT<action_id:STRING, item:STRING, item_type:STRING, ts:BIGINT>> COMMENT 'Action information',
    `displays` ARRAY<STRUCT<display_type:STRING, item:STRING, item_type:STRING, `order`:STRING, pos_id:STRING>> COMMENT 'Display (exposure) information',
    `start`    STRUCT<entry:STRING, loading_time:BIGINT, open_ad_id:BIGINT, open_ad_ms:BIGINT, open_ad_skip_ms:BIGINT> COMMENT 'Startup information',
    `err`      STRUCT<error_code:BIGINT, msg:STRING> COMMENT 'Error information',
    `ts`       BIGINT COMMENT 'Timestamp'
) COMMENT 'Log table'
    PARTITIONED BY (`dt` STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
    LOCATION '/warehouse/gmall/ods/ods_log_inc/';
Note that the creation statement above does not specify a compression format, even though the data we transfer to HDFS is gzip-compressed. This is because Hive automatically recognizes gzip-compressed files.
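A quick local sketch of why this works: the files landing in HDFS keep their .gz suffix, and Hive's text input format decompresses them transparently, much as zcat does here (the file name and record are illustrative, not from the project):

```shell
# Write one log record the way the collector would, then gzip it.
echo '{"common":{"mid":"mid_001"},"ts":1651363200000}' > part-000000.json
gzip part-000000.json          # produces part-000000.json.gz
# Hive reads the .gz file as if it were plain text; zcat shows the same view.
zcat part-000000.json.gz
```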
As shown in the figure below, the table was created successfully:
1.2 Log table load statement
Next we write the log table load statement, which loads the data on HDFS into the partition for the corresponding date:
-- Data loading
load data inpath '/origin_data/gmall/log/topic_log/2022-05-01' into table ods_log_inc partition(dt='2022-05-01');
But this statement only loads the data for 2022-05-01, so we need a daily loading script:
(1) On hadoop102, create hdfs_to_ods_log.sh in the /home/hadoop/bin directory
[hadoop@hadoop102 bin]$ vim hdfs_to_ods_log.sh
(2) Write the following into the script:
#!/bin/bash
# Define a variable so the script is easy to modify
APP=gmall
# If a date argument is given, use it; otherwise default to the day before today
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi
echo ================== Log date is $do_date ==================
sql="
load data inpath '/origin_data/$APP/log/topic_log/$do_date' into table ${APP}.ods_log_inc partition(dt='$do_date');
"
hive -e "$sql"
(3) Make the script executable
[hadoop@hadoop102 bin]$ chmod +x hdfs_to_ods_log.sh
(4) Now we only need to run this script each day to complete the log data load. The figure below shows loading the data for 2022-05-01:
[hadoop@hadoop102 bin]$ hdfs_to_ods_log.sh 2022-05-01
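The script's date-defaulting logic can be checked in isolation; this is a minimal sketch of the same if/else (GNU date assumed, function name is ours for illustration):

```shell
# Mirror of the script's argument handling: use $1 if given, else yesterday.
pick_date() {
  if [ -n "$1" ]; then
    echo "$1"
  else
    date -d "-1 day" +%F   # GNU date: yesterday as YYYY-MM-DD
  fi
}
pick_date 2022-05-01       # → 2022-05-01
pick_date                  # → yesterday's date
```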


2. Business table design
The full tables should keep the same fields as the corresponding tables in the MySQL business system. In addition, we partition them and load each day's data into the corresponding partition.
For the incremental tables, let's first look at the format of the synchronized data:

As we can see, because the data is synchronized by Maxwell, it arrives as JSON strings, so for the incremental tables we also build JSON tables. The fields we keep are type, ts, data, and old.
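As an illustration (the record below is fabricated, not taken from the project), a Maxwell change event carries exactly these fields, which is why the tables keep them:

```shell
# A made-up Maxwell-style change record for illustration.
record='{"database":"gmall","table":"cart_info","type":"insert","ts":1651363200,"data":{"id":"1","sku_id":"10"},"old":null}'
# Crude field extraction for a quick look; in Hive the JsonSerDe does this properly.
echo "$record" | grep -o '"type":"[a-z]*"'   # → "type":"insert"
echo "$record" | grep -o '"ts":[0-9]*'       # → "ts":1651363200
```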
2.1 Activity information table (full) design
The creation statement for the activity information table is as follows:
DROP TABLE IF EXISTS ods_activity_info_full;
CREATE EXTERNAL TABLE ods_activity_info_full
(
`id` STRING COMMENT ' Activities id',
`activity_name` STRING COMMENT ' The name of the event ',
`activity_type` STRING COMMENT ' Type of activity ',
`activity_desc` STRING COMMENT ' Activity description ',
`start_time` STRING COMMENT ' Starting time ',
`end_time` STRING COMMENT ' End time ',
`create_time` STRING COMMENT ' Creation time '
) COMMENT ' Activity information table '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_activity_info_full/';
We set the storage format of null values in these Hive tables to the empty string ''. When DataX exports data from MySQL to HDFS, it writes MySQL NULL values as empty strings, so to make sure Hive recognizes those values as nulls, we define Hive's NULL as the empty string here.
(Hive's default text null format is '\N'; MySQL stores nulls as NULL; DataX writes them out as '').
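A local sketch of the mismatch (file names and values are illustrative): a row as DataX would write it has an empty middle field, whereas Hive's default convention expects '\N'; with NULL DEFINED AS '', the empty field is read as NULL:

```shell
# DataX writes a MySQL NULL as an empty string between the tab delimiters.
printf '1\t\t2022-05-01\n' > datax_row.tsv
# Hive's default text tables would expect \N for NULL instead.
printf '1\t\\N\t2022-05-01\n' > hive_default_row.tsv
# With NULL DEFINED AS '', the empty second column is interpreted as NULL.
awk -F'\t' '{ print ($2 == "" ? "NULL" : $2) }' datax_row.tsv   # → NULL
```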
2.2 Activity rule table (full) design
DROP TABLE IF EXISTS ods_activity_rule_full;
CREATE EXTERNAL TABLE ods_activity_rule_full
(
`id` STRING COMMENT ' Number ',
`activity_id` STRING COMMENT 'Activity id',
`activity_type` STRING COMMENT ' Type of activity ',
`condition_amount` DECIMAL(16, 2) COMMENT ' Full reduction amount ',
`condition_num` BIGINT COMMENT ' Full reduction of pieces ',
`benefit_amount` DECIMAL(16, 2) COMMENT ' Preferential amount ',
`benefit_discount` DECIMAL(16, 2) COMMENT ' Discount ',
`benefit_level` STRING COMMENT ' Discount level '
) COMMENT ' Activity rule table '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_activity_rule_full/';
2.3 First-level category table (full) design
DROP TABLE IF EXISTS ods_base_category1_full;
CREATE EXTERNAL TABLE ods_base_category1_full
(
`id` STRING COMMENT ' Number ',
`name` STRING COMMENT ' Category name '
) COMMENT ' List of first-class categories '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_base_category1_full/';
2.4 Second-level category table (full) design
DROP TABLE IF EXISTS ods_base_category2_full;
CREATE EXTERNAL TABLE ods_base_category2_full
(
`id` STRING COMMENT ' Number ',
`name` STRING COMMENT ' Secondary classification name ',
`category1_id` STRING COMMENT ' First level classification number '
) COMMENT ' Class II '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_base_category2_full/';
2.5 Third-level category table (full) design
DROP TABLE IF EXISTS ods_base_category3_full;
CREATE EXTERNAL TABLE ods_base_category3_full
(
`id` STRING COMMENT ' Number ',
`name` STRING COMMENT ' The name of the third level classification ',
`category2_id` STRING COMMENT ' Secondary classification number '
) COMMENT ' Class III category list '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_base_category3_full/';
2.6 Dictionary table (full) design
DROP TABLE IF EXISTS ods_base_dic_full;
CREATE EXTERNAL TABLE ods_base_dic_full
(
`dic_code` STRING COMMENT ' Number ',
`dic_name` STRING COMMENT ' Code name ',
`parent_code` STRING COMMENT ' Parent number ',
`create_time` STRING COMMENT ' Date of creation ',
`operate_time` STRING COMMENT ' modification date '
) COMMENT ' Coding dictionary table '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_base_dic_full/';
2.7 Province table (full) design
DROP TABLE IF EXISTS ods_base_province_full;
CREATE EXTERNAL TABLE ods_base_province_full
(
`id` STRING COMMENT ' Number ',
`name` STRING COMMENT ' Name of province ',
`region_id` STRING COMMENT ' region ID',
`area_code` STRING COMMENT ' Area code ',
`iso_code` STRING COMMENT ' Old edition ISO-3166-2 code , For visualization ',
`iso_3166_2` STRING COMMENT 'New ISO-3166-2 code, for visualization'
) COMMENT ' Province table '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_base_province_full/';
2.8 Region table (full) design
DROP TABLE IF EXISTS ods_base_region_full;
CREATE EXTERNAL TABLE ods_base_region_full
(
`id` STRING COMMENT ' Number ',
`region_name` STRING COMMENT ' Area name '
) COMMENT ' District Table '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_base_region_full/';
2.9 Brand table (full) design
DROP TABLE IF EXISTS ods_base_trademark_full;
CREATE EXTERNAL TABLE ods_base_trademark_full
(
`id` STRING COMMENT ' Number ',
`tm_name` STRING COMMENT ' The brand name ',
`logo_url` STRING COMMENT ' brand logo Picture path for '
) COMMENT ' Brand list '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_base_trademark_full/';
2.10 Shopping cart table (full) design
DROP TABLE IF EXISTS ods_cart_info_full;
CREATE EXTERNAL TABLE ods_cart_info_full
(
`id` STRING COMMENT ' Number ',
`user_id` STRING COMMENT ' user id',
`sku_id` STRING COMMENT 'sku_id',
`cart_price` DECIMAL(16, 2) COMMENT ' The price when you put it in the shopping cart ',
`sku_num` BIGINT COMMENT ' Number ',
`img_url` STRING COMMENT 'Product image URL',
`sku_name` STRING COMMENT 'sku name ( redundancy )',
`is_checked` STRING COMMENT ' Is it selected ',
`create_time` STRING COMMENT ' Creation time ',
`operate_time` STRING COMMENT ' Modification time ',
`is_ordered` STRING COMMENT ' Have you placed an order ',
`order_time` STRING COMMENT ' Order time ',
`source_type` STRING COMMENT ' Source type ',
`source_id` STRING COMMENT ' Source number '
) COMMENT 'Shopping cart table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_cart_info_full/';
2.11 Coupon information table (full) design
DROP TABLE IF EXISTS ods_coupon_info_full;
CREATE EXTERNAL TABLE ods_coupon_info_full
(
`id` STRING COMMENT ' Shopping voucher No ',
`coupon_name` STRING COMMENT ' Name of the shopping voucher ',
`coupon_type` STRING COMMENT ' Type of shopping voucher 1 cash coupon 2 coupon 3 Full discount 4 Full discount coupons ',
`condition_amount` DECIMAL(16, 2) COMMENT ' Full amount ',
`condition_num` BIGINT COMMENT ' Full number ',
`activity_id` STRING COMMENT ' Activity number ',
`benefit_amount` DECIMAL(16, 2) COMMENT ' Less amount ',
`benefit_discount` DECIMAL(16, 2) COMMENT ' discount ',
`create_time` STRING COMMENT ' Creation time ',
`range_type` STRING COMMENT ' Range type 1、 goods 2、 category 3、 brand ',
`limit_num` BIGINT COMMENT ' Maximum collection times ',
`taken_count` BIGINT COMMENT ' Number of times collected ',
`start_time` STRING COMMENT ' Start collection time ',
`end_time` STRING COMMENT ' End collection time ',
`operate_time` STRING COMMENT ' Modification time ',
`expire_time` STRING COMMENT ' Expiration time '
) COMMENT ' Coupon information form '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_coupon_info_full/';
2.12 SKU platform attribute table (full) design
DROP TABLE IF EXISTS ods_sku_attr_value_full;
CREATE EXTERNAL TABLE ods_sku_attr_value_full
(
`id` STRING COMMENT ' Number ',
`attr_id` STRING COMMENT ' Platform properties ID',
`value_id` STRING COMMENT ' Platform property value ID',
`sku_id` STRING COMMENT ' goods ID',
`attr_name` STRING COMMENT ' Platform attribute name ',
`value_name` STRING COMMENT ' Platform attribute value name '
) COMMENT 'sku Platform property sheet '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_sku_attr_value_full/';
2.13 SKU table (full) design
DROP TABLE IF EXISTS ods_sku_info_full;
CREATE EXTERNAL TABLE ods_sku_info_full
(
`id` STRING COMMENT 'skuId',
`spu_id` STRING COMMENT 'spuid',
`price` DECIMAL(16, 2) COMMENT ' Price ',
`sku_name` STRING COMMENT ' Name of commodity ',
`sku_desc` STRING COMMENT ' Commodity Description ',
`weight` DECIMAL(16, 2) COMMENT ' weight ',
`tm_id` STRING COMMENT ' brand id',
`category3_id` STRING COMMENT ' category id',
`sku_default_igm` STRING COMMENT ' Product image address ',
`is_sale` STRING COMMENT ' Are you selling ',
`create_time` STRING COMMENT ' Creation time '
) COMMENT 'SKU Commodity list '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_sku_info_full/';
2.14 SKU sale attribute value table (full) design
DROP TABLE IF EXISTS ods_sku_sale_attr_value_full;
CREATE EXTERNAL TABLE ods_sku_sale_attr_value_full
(
`id` STRING COMMENT ' Number ',
`sku_id` STRING COMMENT 'sku_id',
`spu_id` STRING COMMENT 'spu_id',
`sale_attr_value_id` STRING COMMENT ' Sales attribute value id',
`sale_attr_id` STRING COMMENT ' Sales attributes id',
`sale_attr_name` STRING COMMENT ' Sales attribute name ',
`sale_attr_value_name` STRING COMMENT ' Sales attribute value name '
) COMMENT 'SKU sale attribute value table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_sku_sale_attr_value_full/';
2.15 SPU table (full) design
DROP TABLE IF EXISTS ods_spu_info_full;
CREATE EXTERNAL TABLE ods_spu_info_full
(
`id` STRING COMMENT 'spu_id',
`spu_name` STRING COMMENT 'spu name ',
`description` STRING COMMENT ' Description information ',
`category3_id` STRING COMMENT ' category id',
`tm_id` STRING COMMENT ' brand id'
) COMMENT 'SPU Commodity list '
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
LOCATION '/warehouse/gmall/ods/ods_spu_info_full/';
2.16 Shopping cart table (incremental) design
DROP TABLE IF EXISTS ods_cart_info_inc;
CREATE EXTERNAL TABLE ods_cart_info_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,user_id :STRING,sku_id :STRING,cart_price :DECIMAL(16, 2),sku_num :BIGINT,img_url :STRING,sku_name
:STRING,is_checked :STRING,create_time :STRING,operate_time :STRING,is_ordered :STRING,order_time
:STRING,source_type :STRING,source_id :STRING> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' Shopping cart increment table '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_cart_info_inc/';
2.17 Comment table (incremental) design
DROP TABLE IF EXISTS ods_comment_info_inc;
CREATE EXTERNAL TABLE ods_comment_info_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,user_id :STRING,nick_name :STRING,head_img :STRING,sku_id :STRING,spu_id :STRING,order_id
:STRING,appraise :STRING,comment_txt :STRING,create_time :STRING,operate_time :STRING> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' Evaluation form '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_comment_info_inc/';
2.18 Coupon use table (incremental) design
DROP TABLE IF EXISTS ods_coupon_use_inc;
CREATE EXTERNAL TABLE ods_coupon_use_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,coupon_id :STRING,user_id :STRING,order_id :STRING,coupon_status :STRING,get_time :STRING,using_time
:STRING,used_time :STRING,expire_time :STRING> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' Coupon collection form '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_coupon_use_inc/';
2.19 Favorites table (incremental) design
DROP TABLE IF EXISTS ods_favor_info_inc;
CREATE EXTERNAL TABLE ods_favor_info_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,user_id :STRING,sku_id :STRING,spu_id :STRING,is_cancel :STRING,create_time :STRING,cancel_time
:STRING> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' Collection table '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_favor_info_inc/';
2.20 Order detail table (incremental) design
DROP TABLE IF EXISTS ods_order_detail_inc;
CREATE EXTERNAL TABLE ods_order_detail_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,order_id :STRING,sku_id :STRING,sku_name :STRING,img_url :STRING,order_price
:DECIMAL(16, 2),sku_num :BIGINT,create_time :STRING,source_type :STRING,source_id :STRING,split_total_amount
:DECIMAL(16, 2),split_activity_amount :DECIMAL(16, 2),split_coupon_amount
:DECIMAL(16, 2)> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' Order details '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_order_detail_inc/';
2.21 Order detail activity association table (incremental) design
DROP TABLE IF EXISTS ods_order_detail_activity_inc;
CREATE EXTERNAL TABLE ods_order_detail_activity_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,order_id :STRING,order_detail_id :STRING,activity_id :STRING,activity_rule_id :STRING,sku_id
:STRING,create_time :STRING> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' Order details activity association table '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_order_detail_activity_inc/';
2.22 Order detail coupon association table (incremental) design
DROP TABLE IF EXISTS ods_order_detail_coupon_inc;
CREATE EXTERNAL TABLE ods_order_detail_coupon_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,order_id :STRING,order_detail_id :STRING,coupon_id :STRING,coupon_use_id :STRING,sku_id
:STRING,create_time :STRING> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' Order details coupon association table '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_order_detail_coupon_inc/';
2.23 Order table (incremental) design
DROP TABLE IF EXISTS ods_order_info_inc;
CREATE EXTERNAL TABLE ods_order_info_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,consignee :STRING,consignee_tel :STRING,total_amount :DECIMAL(16, 2),order_status :STRING,user_id
:STRING,payment_way :STRING,delivery_address :STRING,order_comment :STRING,out_trade_no :STRING,trade_body
:STRING,create_time :STRING,operate_time :STRING,expire_time :STRING,process_status :STRING,tracking_no
:STRING,parent_order_id :STRING,img_url :STRING,province_id :STRING,activity_reduce_amount
:DECIMAL(16, 2),coupon_reduce_amount :DECIMAL(16, 2),original_total_amount :DECIMAL(16, 2),freight_fee
:DECIMAL(16, 2),freight_fee_reduce :DECIMAL(16, 2),refundable_time :DECIMAL(16, 2)> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' The order sheet '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_order_info_inc/';
2.24 Order refund table (incremental) design
DROP TABLE IF EXISTS ods_order_refund_info_inc;
CREATE EXTERNAL TABLE ods_order_refund_info_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,user_id :STRING,order_id :STRING,sku_id :STRING,refund_type :STRING,refund_num :BIGINT,refund_amount
:DECIMAL(16, 2),refund_reason_type :STRING,refund_reason_txt :STRING,refund_status :STRING,create_time
:STRING> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' Chargeback form '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_order_refund_info_inc/';
2.25 Order status log table (incremental) design
DROP TABLE IF EXISTS ods_order_status_log_inc;
CREATE EXTERNAL TABLE ods_order_status_log_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,order_id :STRING,order_status :STRING,operate_time :STRING> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT 'Order status log table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_order_status_log_inc/';
2.26 Payment table (incremental) design
DROP TABLE IF EXISTS ods_payment_info_inc;
CREATE EXTERNAL TABLE ods_payment_info_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,out_trade_no :STRING,order_id :STRING,user_id :STRING,payment_type :STRING,trade_no
:STRING,total_amount :DECIMAL(16, 2),subject :STRING,payment_status :STRING,create_time :STRING,callback_time
:STRING,callback_content :STRING> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' Payment form '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_payment_info_inc/';
2.27 Refund payment table (incremental) design
DROP TABLE IF EXISTS ods_refund_payment_inc;
CREATE EXTERNAL TABLE ods_refund_payment_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,out_trade_no :STRING,order_id :STRING,sku_id :STRING,payment_type :STRING,trade_no :STRING,total_amount
:DECIMAL(16, 2),subject :STRING,refund_status :STRING,create_time :STRING,callback_time :STRING,callback_content
:STRING> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' Refund form '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_refund_payment_inc/';
2.28 User table (incremental) design
DROP TABLE IF EXISTS ods_user_info_inc;
CREATE EXTERNAL TABLE ods_user_info_inc
(
`type` STRING COMMENT ' Change type ',
`ts` BIGINT COMMENT ' Change time ',
`data` STRUCT<id :STRING,login_name :STRING,nick_name :STRING,passwd :STRING,name :STRING,phone_num :STRING,email
:STRING,head_img :STRING,user_level :STRING,birthday :STRING,gender :STRING,create_time :STRING,operate_time
:STRING,status :STRING> COMMENT ' data ',
`old` MAP<STRING,STRING> COMMENT ' The old value '
) COMMENT ' User table '
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
LOCATION '/warehouse/gmall/ods/ods_user_info_inc/';
2.29 Table creation execution results

2.30 Data loading script design
Because the loading logic of the 28 tables above is identical, we write one script to load all 28 tables uniformly.
(1) On hadoop102, create hdfs_to_ods_db.sh in the /home/hadoop/bin directory
[hadoop@hadoop102 bin]$ vim hdfs_to_ods_db.sh
(2) The script content is as follows:
First the script checks its arguments. The first argument is the table to load: you can load a single table, or pass "all" to load every table. The second argument is the date; after the project goes live it can be omitted, in which case it defaults to the previous day.
#!/bin/bash
APP=gmall
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d '-1 day' +%F`
fi

load_data(){
    sql=""
    for i in $*; do
        # Determine whether the source path exists on HDFS
        hadoop fs -test -e /origin_data/$APP/db/${i:4}/$do_date
        # Only load the data when the path exists
        if [[ $? = 0 ]]; then
            sql=$sql"load data inpath '/origin_data/$APP/db/${i:4}/$do_date' OVERWRITE into table ${APP}.$i partition(dt='$do_date');"
        fi
    done
    hive -e "$sql"
}
case $1 in
"ods_activity_info_full")
load_data "ods_activity_info_full"
;;
"ods_activity_rule_full")
load_data "ods_activity_rule_full"
;;
"ods_base_category1_full")
load_data "ods_base_category1_full"
;;
"ods_base_category2_full")
load_data "ods_base_category2_full"
;;
"ods_base_category3_full")
load_data "ods_base_category3_full"
;;
"ods_base_dic_full")
load_data "ods_base_dic_full"
;;
"ods_base_province_full")
load_data "ods_base_province_full"
;;
"ods_base_region_full")
load_data "ods_base_region_full"
;;
"ods_base_trademark_full")
load_data "ods_base_trademark_full"
;;
"ods_cart_info_full")
load_data "ods_cart_info_full"
;;
"ods_coupon_info_full")
load_data "ods_coupon_info_full"
;;
"ods_sku_attr_value_full")
load_data "ods_sku_attr_value_full"
;;
"ods_sku_info_full")
load_data "ods_sku_info_full"
;;
"ods_sku_sale_attr_value_full")
load_data "ods_sku_sale_attr_value_full"
;;
"ods_spu_info_full")
load_data "ods_spu_info_full"
;;
"ods_cart_info_inc")
load_data "ods_cart_info_inc"
;;
"ods_comment_info_inc")
load_data "ods_comment_info_inc"
;;
"ods_coupon_use_inc")
load_data "ods_coupon_use_inc"
;;
"ods_favor_info_inc")
load_data "ods_favor_info_inc"
;;
"ods_order_detail_inc")
load_data "ods_order_detail_inc"
;;
"ods_order_detail_activity_inc")
load_data "ods_order_detail_activity_inc"
;;
"ods_order_detail_coupon_inc")
load_data "ods_order_detail_coupon_inc"
;;
"ods_order_info_inc")
load_data "ods_order_info_inc"
;;
"ods_order_refund_info_inc")
load_data "ods_order_refund_info_inc"
;;
"ods_order_status_log_inc")
load_data "ods_order_status_log_inc"
;;
"ods_payment_info_inc")
load_data "ods_payment_info_inc"
;;
"ods_refund_payment_inc")
load_data "ods_refund_payment_inc"
;;
"ods_user_info_inc")
load_data "ods_user_info_inc"
;;
"all")
load_data "ods_activity_info_full" "ods_activity_rule_full" "ods_base_category1_full" "ods_base_category2_full" "ods_base_category3_full" "ods_base_dic_full" "ods_base_province_full" "ods_base_region_full" "ods_base_trademark_full" "ods_cart_info_full" "ods_coupon_info_full" "ods_sku_attr_value_full" "ods_sku_info_full" "ods_sku_sale_attr_value_full" "ods_spu_info_full" "ods_cart_info_inc" "ods_comment_info_inc" "ods_coupon_use_inc" "ods_favor_info_inc" "ods_order_detail_inc" "ods_order_detail_activity_inc" "ods_order_detail_coupon_inc" "ods_order_info_inc" "ods_order_refund_info_inc" "ods_order_status_log_inc" "ods_payment_info_inc" "ods_refund_payment_inc" "ods_user_info_inc"
;;
esac
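Two idioms in the script are worth a closer look: `${i:4}` strips the leading `ods_` so the table name maps back to its HDFS source directory, and the `$?` check only queues a load when that directory exists. A local sketch, using a plain file in place of `hadoop fs` (the path is illustrative):

```shell
# ${i:4} drops the first four characters, i.e. the "ods_" prefix.
i=ods_activity_info_full
echo "${i:4}"                    # → activity_info_full
# The existence-check pattern, with a local path standing in for HDFS.
path=/tmp/origin_data_demo
touch "$path"
test -e "$path"                  # local stand-in for: hadoop fs -test -e ...
if [[ $? = 0 ]]; then
    echo "would load $path"
fi
```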
(3) Make the script executable
[hadoop@hadoop102 bin]$ chmod +x hdfs_to_ods_db.sh
(4) Use the script to load the data for 2022-05-01
[hadoop@hadoop102 bin]$ hdfs_to_ods_db.sh all 2022-05-01

Finally, let's check whether the data has been loaded into the tables:

As the figure above shows, the ODS layer data has been loaded successfully.
3. ODS layer summary
For the ODS layer, we simply run the hdfs_to_ods_log.sh and hdfs_to_ods_db.sh scripts every day to load the raw data on HDFS into the ODS layer.