Anti-money laundering
Dedect relationship between A and E by tracing through payments with similar amounts and identifying payment chains.
For example:
lag means lag(daystamp,-1) over (partitin by accname, Cntpty_Acct_Name order by daystamp )
| accname | Event_Dt | Tx_Amt | Cntpty_Acct_Name | daystamp | id | lag |
|---|---|---|---|---|---|---|
| a | 2020-01-01 | 20.0 | b | 7305 | 5068 | 7306.0 |
| a | 2020-01-02 | 300.0 | b | 7306 | 5069 | 7307.0 |
| a | 2020-01-03 | 180.0 | b | 7307 | 5070 | Infinity |
| b | 2020-01-03 | 40.0 | c | 7307 | 5071 | 7307.0 |
| b | 2020-01-03 | 500.0 | c | 7307 | 5072 | 7308.0 |
| b | 2020-01-04 | 10.0 | c | 7308 | 5073 | Infinity |
| b | 2020-01-03 | 150.0 | d | 7307 | 5074 | Infinity |
| c | 2020-01-04 | 50.0 | e | 7308 | 5075 | Infinity |
| d | 2020-01-04 | 150.0 | e | 7308 | 5076 | Infinity |
You can run spark-submit aml.py to get the payment chains seem like money laundering. The same 'batch_id' indicates these transactions belong to a complete chain. The field 'depth' indicates the length of the chain.
| id | batch_id | src | dst | amount_sum | depth | accname | Event_Dt | Tx_Amt | Cntpty_Acct_Name |
|---|---|---|---|---|---|---|---|---|---|
| 5068 | 0 | a | e | 200.0 | 4 | a | 2020-01-01 | 20.0 | b |
| 5070 | 0 | a | e | 200.0 | 4 | a | 2020-01-03 | 180.0 | b |
| 5071 | 0 | a | e | 200.0 | 4 | b | 2020-01-03 | 40.0 | c |
| 5073 | 0 | a | e | 200.0 | 4 | b | 2020-01-04 | 10.0 | c |
| 5074 | 0 | a | e | 200.0 | 4 | b | 2020-01-03 | 150.0 | d |
| 5075 | 0 | a | e | 200.0 | 4 | c | 2020-01-04 | 50.0 | e |
| 5076 | 0 | a | e | 200.0 | 4 | d | 2020-01-04 | 150.0 | e |
| 5068 | 1 | a | c | 500.0 | 3 | a | 2020-01-01 | 20.0 | b |
| 5069 | 1 | a | c | 500.0 | 3 | a | 2020-01-02 | 300.0 | b |
| 5070 | 1 | a | c | 500.0 | 3 | a | 2020-01-03 | 180.0 | b |
| 5072 | 1 | a | c | 500.0 | 3 | b | 2020-01-03 | 500.0 | c |
Enjoy it !