Breaking News!
We are excited to announce that the PGL team won two first places and one second place across the three tracks of OGB-LSC KDD CUP 2021. Leaderboards can be found here.
-  First place in MAG240M-LSC track: Code and Technical Report can be found here. 
-  First place in WikiKG90M-LSC track: Code and Technical Report can be found here. 
-  Second place in PCQM4M-LSC track: Code and Technical Report can be found here. 
Two papers using PGL have been accepted (2021.06.17):
- Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification, to appear in IJCAI2021.
- HGAMN: Heterogeneous Graph Attention Matching Network for Multilingual POI Retrieval at Baidu Maps, to appear in KDD2021.
PGL Distributed Graph Engine API released!
- Our Distributed Graph Engine API has been released, along with a tutorial that shows how to launch a graph engine and a demo that trains a model with it.
PGL v2.1 2021.02.02
- We now support the dygraph (dynamic graph) mode of PaddlePaddle 2.0 and have released PGL v2.1.
- You can find the stable static-graph version of PGL in the branch "static_stable".
PGL v1.2 2020.11.20
-  The PGL team proposed a new Unified Message Passing Model (UniMP) and achieved state-of-the-art results on three tasks on the OGB leaderboards. You can find the code here.
-  The PGL team proposed a two-stage recall and ranking model based on ERNIESage and won first place in the TextGraphs-2020 competition co-organized by COLING.
-  The PGL team developed an open course on Graph Neural Networks (GNN) that helps you get started with GNNs in seven days. Details can be found in course.
PGL v1.1 2020.4.29
-  You can find ERNIESage, a novel model for modeling text and graph structures, and its introduction here. 
-  PGL for Open Graph Benchmark examples can be found here. 
-  We added new graph-level operators such as GraphPooling and GraphNormalization for graph-level predictions.
-  We released the PGL-KE toolkit here, which includes classical knowledge graph embedding algorithms such as TransE, TransR, and RotatE.
Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on PaddlePaddle.
The newly released PGL supports heterogeneous graph learning under both the walk-based paradigm and the message-passing paradigm by providing MetaPath sampling and a Message Passing mechanism on heterogeneous graphs. Furthermore, it supports distributed graph storage and several distributed training algorithms, such as distributed DeepWalk and distributed GraphSAGE. Combined with the PaddlePaddle deep learning framework, PGL supports both graph representation learning models and graph neural networks, and therefore covers a wide range of graph-based applications.
One of the most important benefits of graph neural networks compared to other models is the ability to use node-to-node connectivity information, but coding the communication between nodes is cumbersome. PGL adopts a Message Passing Paradigm similar to DGL to make it easy to build a customized graph neural network. Users only need to write send and recv functions to implement a simple GCN. As shown in the following figure, in the first step the send function is defined on the edges of the graph, and the user can customize it to send a message from the source node to the target node. In the second step, the recv function aggregates the messages arriving from different sources.
To write a sum aggregator, users only need to write the following code.
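    import pgl
    import paddle
    import numpy as np

    # Build a small graph: 5 nodes, 3 directed edges, and a 100-d feature "h" per node.
    num_nodes = 5
    edges = [(0, 1), (1, 2), (3, 4)]
    feature = np.random.randn(num_nodes, 100).astype(np.float32)
    g = pgl.Graph(num_nodes=num_nodes,
        edges=edges,
        node_feat={
            "h": feature
        })
    # Convert the graph's numpy arrays to paddle tensors.
    g.tensor()

    # send: defined on edges; forwards the source node features as the message.
    def send_func(src_feat, dst_feat, edge_feat):
        return src_feat

    # recv: sums the "h" messages arriving at each target node.
    def recv_func(msg):
        return msg.reduce_sum(msg["h"])

    msg = g.send(send_func, src_feat=g.node_feat)
    ret = g.recv(recv_func, msg)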
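To make the two steps concrete, the following is a minimal plain-Python sketch of sum aggregation that does not use PGL at all. The helper names (`send`, `recv_sum`, `sum_aggregate`) and the toy 2-d features are illustrative assumptions, and returning zeros for nodes without incoming edges is a choice made for this sketch, not a statement about PGL's behavior.

```python
# Plain-Python illustration of send/recv sum aggregation (hypothetical helpers, no PGL).
num_nodes = 5
edges = [(0, 1), (1, 2), (3, 4)]  # (source, target) pairs
feature = [[float(i), float(i) * 10.0] for i in range(num_nodes)]  # toy 2-d features


def send(src_feat):
    """Message carried by an edge: here simply the source node's feature."""
    return src_feat


def recv_sum(messages, dim):
    """Sum all messages that arrived at one target node."""
    total = [0.0] * dim
    for message in messages:
        total = [t + x for t, x in zip(total, message)]
    return total


def sum_aggregate(num_nodes, edges, feature):
    dim = len(feature[0])
    inbox = {node: [] for node in range(num_nodes)}
    # Step 1: send a message along every edge.
    for src, dst in edges:
        inbox[dst].append(send(feature[src]))
    # Step 2: every node reduces its inbox (zeros if nothing arrived).
    return [recv_sum(inbox[node], dim) for node in range(num_nodes)]


aggregated = sum_aggregate(num_nodes, edges, feature)
print(aggregated)
```

Node 2 receives only the feature of node 1, and node 4 only the feature of node 3; a node with several incoming edges would receive the element-wise sum of all its sources.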
Highlight: Flexibility - Natively Support Heterogeneous Graph Learning
Graphs can conveniently represent relations between things in the real world, but both the things and the relations among them come in many categories. In a heterogeneous graph we therefore need to distinguish node types and edge types. PGL models heterogeneous graphs that contain multiple node types and multiple edge types, and can describe complex connections between different types.
Support meta path walk sampling on heterogeneous graph
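As a rough illustration of meta path guided sampling, the sketch below walks a toy user/product graph in plain Python. It does not use PGL's sampling API; the graph, the `metapath_walk` helper, and the choice to stop early at a dead end are all assumptions made for this example.

```python
import random

# Toy heterogeneous graph (hypothetical data): node -> type, plus undirected edges.
node_type = {"u1": "user", "u2": "user", "u3": "user", "p1": "product", "p2": "product"}
edges = [("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p2"), ("u1", "u3")]

# neighbors[node][neighbor_type] -> list of neighbors of that type
neighbors = {node: {} for node in node_type}
for a, b in edges:
    neighbors[a].setdefault(node_type[b], []).append(b)
    neighbors[b].setdefault(node_type[a], []).append(a)


def metapath_walk(start, metapath, rng):
    """Walk from `start`, picking at each step a random neighbor whose type
    matches the next entry of `metapath`. Stops early at a dead end."""
    if node_type[start] != metapath[0]:
        raise ValueError("start node type does not match the meta path")
    walk = [start]
    for wanted_type in metapath[1:]:
        candidates = neighbors[walk[-1]].get(wanted_type, [])
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


rng = random.Random(0)
walk = metapath_walk("u1", ["user", "product", "user"], rng)  # UPU
print(walk)
```

Walks produced this way can be fed to a word2vec-style model as "sentences", which is the idea behind metapath2vec.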
The left side of the figure above describes a shopping social network. Its nodes fall into two categories, users and goods, and its edges capture user-user, user-goods, and goods-goods relations. The right side of the figure shows a simple MetaPath sampling process: given a MetaPath such as UPU (user-product-user), the sampler returns walks that follow this node-type pattern. On this basis, PGL introduces word2vec and other methods to support metapath2vec and other heterogeneous graph representation learning algorithms.
Support Message Passing mechanism on heterogeneous graph
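The idea of aggregating each neighbor type separately and then merging the results can be sketched in plain Python as follows. This is not PGL code: the `hetero_aggregate` helper and the toy data are hypothetical, and merging by averaging the per-type sums is an assumption, since the merge operator is not specified here.

```python
from collections import defaultdict

# Hypothetical target node with five neighbors of two types.
node_type = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
feature = {
    "a1": [1.0, 0.0], "a2": [2.0, 0.0], "a3": [3.0, 0.0],
    "b1": [0.0, 1.0], "b2": [0.0, 3.0],
}


def hetero_aggregate(in_neighbors, node_type, feature, dim):
    """Sum neighbor features per node type, then merge the per-type sums.
    The merge used here (mean over types) is an assumption of this sketch."""
    per_type = defaultdict(lambda: [0.0] * dim)
    for nb in in_neighbors:
        t = node_type[nb]
        per_type[t] = [acc + x for acc, x in zip(per_type[t], feature[nb])]
    merged = [sum(vals) / len(per_type) for vals in zip(*per_type.values())]
    return dict(per_type), merged


per_type, merged = hetero_aggregate(["a1", "a2", "a3", "b1", "b2"], node_type, feature, dim=2)
print(per_type)  # one aggregated message per neighbor type
print(merged)    # final message used to update the target node
```

Other merge operators (concatenation, type-specific weights, attention) fit the same two-stage structure.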
Because a heterogeneous graph has different node types, message delivery also differs by type. As shown on the left of the figure above, the target node has five neighbors belonging to two different node types. As shown on the right, neighbors of different types are aggregated separately during message delivery, and the results are then merged into the final message that updates the target node. On this basis, PGL supports message-passing-based heterogeneous graph algorithms such as GATNE.
Large-Scale: Support distributed graph storage and distributed training algorithms
Most large-scale graph learning requires distributed graph storage and distributed training. As shown in the following figure, PGL provides a general solution for large-scale training. We adopt PaddleFleet as the distributed parameter server, which supports large-scale distributed embeddings, together with a lightweight distributed storage engine, so that large-scale distributed training algorithms can be set up easily on MPI clusters.
Model Zoo
The following graph learning models have been implemented in the framework. You can find more examples and the details here.
| Model | Description | 
|---|---|
| ERNIESage | ERNIE SAmple aggreGatE for Text and Graph | 
| GCN | Graph Convolutional Neural Networks | 
| GAT | Graph Attention Network | 
| GraphSage | Large-scale graph convolution network based on neighborhood sampling | 
| unSup-GraphSage | Unsupervised GraphSAGE | 
| LINE | Representation learning based on first-order and second-order neighbors | 
| DeepWalk | Representation learning by DFS random walk | 
| MetaPath2Vec | Representation learning based on metapath | 
| Node2Vec | Representation learning combining DFS and BFS | 
| Struct2Vec | Representation learning based on structural similarity | 
| SGC | Simplified graph convolution neural network | 
| GES | Graph representation learning method with node features | 
| DGI | Unsupervised representation learning based on graph convolution network | 
| GATNE | Representation Learning of Heterogeneous Graph based on MessagePassing | 
The above models fall into three categories: graph representation learning, graph neural networks, and heterogeneous graph learning. The heterogeneous graph learning models are themselves divided into graph representation learning and graph neural network approaches.
System requirements
PGL requires:
- paddlepaddle >= 2.2.0
- cython
PGL only supports Python 3.
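The requirements above can be checked with a short script. This is a sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper names are hypothetical, the version parser is deliberately rough (it ignores pre-release suffixes), and treating `paddlepaddle-gpu` as an acceptable alternative distribution name is an assumption.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Leading numeric components of a version string,
    e.g. '2.2.0.post1' -> (2, 2, 0). Pre-release suffixes are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(dist_name):
    """Version string of an installed distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements():
    # Assumption: the GPU build is published under the name 'paddlepaddle-gpu'.
    paddle = installed_version("paddlepaddle") or installed_version("paddlepaddle-gpu")
    return {
        "python3": sys.version_info.major == 3,
        "paddlepaddle>=2.2.0": paddle is not None and parse_version(paddle) >= (2, 2, 0),
        "cython": installed_version("cython") is not None,
    }


report = check_requirements()
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'missing or too old'}")
```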
Installation
You can install it via pip:

    pip install pgl
The Team
PGL is developed and maintained by the NLP and Paddle teams at Baidu.
E-mail: nlp-gnn[at]baidu.com
License
PGL uses Apache License 2.0.