A Python package developed for transportation spatio-temporal big data processing, analysis and visualization.




TransBigData is a Python package developed for transportation spatio-temporal big data processing, analysis and visualization. TransBigData provides fast and concise methods for processing common transportation spatio-temporal big data such as Taxi GPS data, bicycle sharing data and bus GPS data. TransBigData provides a variety of processing methods for each stage of transportation spatio-temporal big data analysis. The code with TransBigData is clean, efficient, flexible, and easy to use, allowing complex data tasks to be achieved with concise code.

For some specific types of data, TransBigData also provides targeted tools for specific needs, such as extracting the origins and destinations (OD) of taxi trips from taxi GPS data and identifying arrival and departure information from bus GPS data. The latest stable release can be installed via pip, and full documentation is available at https://transbigdata.readthedocs.io/en/latest/. Introduction slides can be found here and here (in Chinese).
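As a rough illustration of the OD-extraction idea, the sketch below pairs pick-ups and drop-offs by watching the OpenStatus column change. It is a hypothetical, simplified sketch in plain Python, not TransBigData's implementation: it assumes OpenStatus == 1 means a passenger is on board, that time values are zero-padded HH:MM:SS strings from a single day, and that records are (VehicleNum, time, lon, lat, OpenStatus) tuples. The function name extract_od and the toy records are illustrative only.

```python
from itertools import groupby
from operator import itemgetter


def extract_od(records):
    """Hypothetical sketch: pair pick-ups and drop-offs from OpenStatus changes.

    records: iterable of (VehicleNum, time, lon, lat, OpenStatus) tuples.
    Assumes OpenStatus == 1 means a passenger is on board and that time is a
    zero-padded "HH:MM:SS" string within one day.
    Returns (VehicleNum, o_time, o_lon, o_lat, d_time, d_lon, d_lat) tuples.
    """
    trips = []
    # Sort by vehicle, then by time; zero-padded time strings sort chronologically
    ordered = sorted(records, key=itemgetter(0, 1))
    for vehicle, group in groupby(ordered, key=itemgetter(0)):
        origin = None
        prev_status = None  # unknown before the first record, so no partial trips
        for _, t, lon, lat, status in group:
            if prev_status == 0 and status == 1:
                # Empty -> occupied: treat this point as the pick-up (origin)
                origin = (t, lon, lat)
            elif prev_status == 1 and status == 0 and origin is not None:
                # Occupied -> empty: treat this point as the drop-off (destination)
                trips.append((vehicle, *origin, t, lon, lat))
                origin = None
            prev_status = status
    return trips


# Toy records for one vehicle, deliberately out of order
records = [
    (1, "08:12:00", 114.06, 22.56, 0),
    (1, "08:00:00", 114.00, 22.50, 0),
    (1, "08:10:00", 114.05, 22.55, 1),
    (1, "08:01:00", 114.01, 22.51, 1),
]
print(extract_od(records))
# [(1, '08:01:00', 114.01, 22.51, '08:12:00', 114.06, 22.56)]
```

Taking the first occupied point as O and the first empty point as D is one design choice; using the last point before each status change is an equally plausible variant.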

Target Audience

The target audience of TransBigData includes:

  • Data science researchers and data engineers in the fields of transportation big data, smart transportation systems, and urban computing, particularly those who want to integrate innovative algorithms into intelligent transportation systems
  • Government, enterprises, or other entities who expect efficient and reliable management decision support through transportation spatio-temporal data analysis.

Technical Features

  • Provide a variety of processing methods for each stage of transportation spatio-temporal big data analysis.
  • The code with TransBigData is clean, efficient, flexible, and easy to use, allowing complex data tasks to be achieved with concise code.

Main Functions

Currently, TransBigData mainly provides the following methods:

  • Data Quality: Provides methods to quickly obtain general information about the dataset, including the data volume, the time period, and the sampling interval.
  • Data Preprocessing: Provides methods to clean multiple types of data errors.
  • Data Gridding: Provides methods to generate multiple types of geographic grids (rectangular grids, hexagonal grids) in the research area, and fast algorithms to map GPS data to the generated grids.
  • Data Aggregating: Provides methods to aggregate GPS data and OD data into geographic polygons.
  • Data Visualization: Built-in visualization capabilities use the keplergl package to interactively visualize data in Jupyter notebooks with simple code.
  • Trajectory Processing: Provides methods to process trajectory data, including generating trajectory linestrings from GPS points, trajectory densification, etc.
  • Basemap Loading: Provides methods to display a Mapbox basemap on matplotlib figures.
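As an illustration of the kind of summary described under Data Quality, the sketch below computes the record count, the number of vehicles, the time span and a sampling interval from (vehicle ID, timestamp in seconds) pairs. The function name summarize and the choice of the median per-vehicle gap as "the sampling interval" are assumptions for this sketch, not TransBigData's definitions.

```python
from statistics import median


def summarize(records):
    """Hypothetical sketch: basic quality summary of (vehicle_id, seconds) pairs."""
    # Group timestamps by vehicle
    times_by_vehicle = {}
    for vid, t in records:
        times_by_vehicle.setdefault(vid, []).append(t)
    # Gaps between consecutive records of the same vehicle
    gaps = []
    for ts in times_by_vehicle.values():
        ts.sort()
        gaps.extend(b - a for a, b in zip(ts, ts[1:]))
    all_times = [t for _, t in records]
    return {
        'records': len(records),
        'vehicles': len(times_by_vehicle),
        'start': min(all_times),
        'end': max(all_times),
        # Median gap is robust to occasional long signal losses
        'median_interval_s': median(gaps) if gaps else None,
    }


summary = summarize([(1, 0), (1, 20), (1, 40), (2, 10), (2, 40)])
print(summary)
# {'records': 5, 'vehicles': 2, 'start': 0, 'end': 40, 'median_interval_s': 20}
```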


It is recommended to use Python 3.7, 3.8, or 3.9.

Using PyPI

TransBigData can be installed with pip. Before installing TransBigData, make sure that geopandas is installed. If you already have geopandas installed, run the following command from the command prompt to install TransBigData:

pip install transbigdata

Using conda-forge

You can also install TransBigData from conda-forge, which automatically resolves the dependencies:

conda install -c conda-forge transbigdata

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome. A detailed overview on how to contribute can be found in the contributing guide on GitHub.


Example of data visualization

Visualize trajectories (with keplergl)


Visualize data distribution (with keplergl)


Visualize OD (with keplergl)


Example of taxi GPS data processing

The following example shows how to use the TransBigData to perform data gridding, data aggregating and data visualization for taxi GPS data.

Read the data

import transbigdata as tbd
import pandas as pd
#Read taxi gps data  
data = pd.read_csv('TaxiData-Sample.csv',header = None) 
data.columns = ['VehicleNum','time','lon','lat','OpenStatus','Speed'] 
        VehicleNum      time         lon        lat  OpenStatus  Speed
0            34745  20:27:43  113.806847  22.623249           1     27
1            34745  20:24:07  113.809898  22.627399           0      0
2            34745  20:24:27  113.809898  22.627399           0      0
3            34745  20:22:07  113.811348  22.628067           0      0
4            34745  20:10:06  113.819885  22.647800           0     54
...            ...       ...         ...        ...         ...    ...
544994       28265  21:35:13  114.321503  22.709499           0     18
544995       28265  09:08:02  114.322701  22.681700           0      0
544996       28265  09:14:31  114.336700  22.690100           0      0
544997       28265  21:19:12  114.352600  22.728399           0      0
544998       28265  19:08:06  114.137703  22.621700           0      0

544999 rows × 6 columns

Data pre-processing

Define the study area and use the tbd.clean_outofbounds method to remove the data outside the study area:

#Define the study area
bounds = [113.75, 22.4, 114.62, 22.86]
#Delete the data out of the study area
data = tbd.clean_outofbounds(data,bounds = bounds,col = ['lon','lat'])
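Conceptually, this step keeps the points whose coordinates fall inside bounds = [lon1, lat1, lon2, lat2]. The plain-Python sketch below illustrates that idea only; the helper in_bounds is hypothetical, and keeping points that lie exactly on the boundary is an assumption that may differ from tbd.clean_outofbounds.

```python
def in_bounds(lon, lat, bounds):
    """Hypothetical helper: True if (lon, lat) lies inside bounds (edges included)."""
    lon1, lat1, lon2, lat2 = bounds
    return lon1 <= lon <= lon2 and lat1 <= lat <= lat2


bounds = [113.75, 22.4, 114.62, 22.86]
points = [(113.806847, 22.623249), (114.70, 22.50), (114.00, 22.90)]
# Keep only the points inside the study area
kept = [p for p in points if in_bounds(*p, bounds)]
print(kept)  # [(113.806847, 22.623249)]
```

On a DataFrame, the same test can be written as a boolean mask built with pandas Series.between on the lon and lat columns.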

Data gridding

The most basic way to express the data distribution is in the form of geographic grids. TransBigData provides methods to generate multiple types of geographic grids (rectangular grids, hexagonal grids) in the research area. For rectangular gridding, you first need to determine the gridding parameters (which can be interpreted as defining a grid coordinate system):

#Obtain the gridding parameters
params = tbd.area_to_params(bounds,accuracy = 1000)

{'slon': 113.75, 'slat': 22.4, 'deltalon': 0.00974336289289822, 'deltalat': 0.008993210412845813, 'theta': 0, 'method': 'rect', 'gridsize': 1000}

The gridding parameters store the starting position (slon, slat), the cell size (deltalon, deltalat) and the rotation angle (theta) of the grid system.
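The printed deltalon and deltalat values can be reproduced approximately from accuracy = 1000 (metres) by assuming a spherical Earth and scaling longitude by the cosine of the study area's mid-latitude. The sketch below works under those assumptions; the radius of 6371004 m and the function name rect_params are assumptions for illustration, not the tbd.area_to_params source.

```python
import math

EARTH_RADIUS_M = 6371004  # assumed spherical Earth radius in metres


def rect_params(bounds, accuracy):
    """Hypothetical sketch: rectangular gridding parameters from bounds and cell size (m)."""
    lon1, lat1, lon2, lat2 = bounds
    # Degrees of latitude covered by `accuracy` metres along a meridian
    deltalat = accuracy * 360 / (2 * math.pi * EARTH_RADIUS_M)
    # A degree of longitude shrinks with cos(latitude); use the area's mid-latitude
    deltalon = deltalat / math.cos(math.radians((lat1 + lat2) / 2))
    return {'slon': lon1, 'slat': lat1, 'deltalon': deltalon,
            'deltalat': deltalat, 'theta': 0, 'method': 'rect',
            'gridsize': accuracy}


params = rect_params([113.75, 22.4, 114.62, 22.86], accuracy=1000)
print(round(params['deltalat'], 6), round(params['deltalon'], 6))  # 0.008993 0.009743
```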

The next step is to map the GPS data to their corresponding grids. For rectangular grids, tbd.GPS_to_grid generates the LONCOL and LATCOL columns; together, the two columns identify a grid cell:

#Map the GPS data to grids
data['LONCOL'],data['LATCOL'] = tbd.GPS_to_grid(data['lon'],data['lat'],params)
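For an unrotated rectangular grid, mapping a point to a cell reduces to dividing its offset from the grid origin by the cell size. The sketch below assumes that (slon, slat) is the centre of cell (0, 0) and that theta = 0; both that convention and the function name point_to_rect_cell are assumptions for illustration, not tbd.GPS_to_grid itself.

```python
import math


def point_to_rect_cell(lon, lat, params):
    """Hypothetical sketch: (LONCOL, LATCOL) of a point on an unrotated rectangular grid.

    Assumes (slon, slat) is the centre of cell (0, 0) and theta == 0.
    """
    # +0.5 shifts from "centre of cell 0" to "lower edge of cell 0" before flooring
    loncol = math.floor((lon - params['slon']) / params['deltalon'] + 0.5)
    latcol = math.floor((lat - params['slat']) / params['deltalat'] + 0.5)
    return loncol, latcol


params = {'slon': 113.75, 'slat': 22.4,
          'deltalon': 0.00974336289289822, 'deltalat': 0.008993210412845813}
print(point_to_rect_cell(113.806847, 22.623249, params))  # (6, 25)
```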

Count the number of records in each grid, generate the geometry of the grids and convert the result into a GeoDataFrame:

#Aggregate data into grids
grid_agg = data.groupby(['LONCOL','LATCOL'])['VehicleNum'].count().reset_index()
#Generate grid geometry
grid_agg['geometry'] = tbd.grid_to_polygon([grid_agg['LONCOL'],grid_agg['LATCOL']],params)
#Change the type into GeoDataFrame
import geopandas as gpd
grid_agg = gpd.GeoDataFrame(grid_agg)
#Plot the grids
grid_agg.plot(column = 'VehicleNum',cmap = 'autumn_r')
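The count and the cell geometry can also be sketched without pandas. Assuming cell (i, j) is centred at (slon + i*deltalon, slat + j*deltalat), each cell polygon is a rectangle extending half a cell on each side of its centre. The function cell_polygon, the toy cells and the toy parameters are hypothetical; a closed coordinate ring like this can be passed to shapely.geometry.Polygon if a geometry object is needed.

```python
from collections import Counter


def cell_polygon(loncol, latcol, params):
    """Hypothetical sketch: closed corner ring of rectangular cell (loncol, latcol)."""
    cx = params['slon'] + loncol * params['deltalon']
    cy = params['slat'] + latcol * params['deltalat']
    hx, hy = params['deltalon'] / 2, params['deltalat'] / 2
    # Counter-clockwise ring; the first corner is repeated to close it
    return [(cx - hx, cy - hy), (cx + hx, cy - hy), (cx + hx, cy + hy),
            (cx - hx, cy + hy), (cx - hx, cy - hy)]


# Toy (LONCOL, LATCOL) pairs, one per GPS record
cells = [(6, 25), (6, 25), (7, 25)]
# Records per cell, analogous to groupby(['LONCOL', 'LATCOL']).count()
counts = Counter(cells)
toy_params = {'slon': 0.0, 'slat': 0.0, 'deltalon': 1.0, 'deltalat': 2.0}
for (i, j), n in counts.items():
    print((i, j), n, cell_polygon(i, j, toy_params))
```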


Triangle and Hexagon grids & rotation angle

TransBigData also supports triangle and hexagon grids, as well as grids rotated by a given angle. To use them, alter the gridding parameters:

#set to the hexagon grids
params['method'] = 'hexa'
#or set as triangle grids: params['method'] = 'tri'
#set a rotation angle (degree)
params['theta'] = 5
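One way to think about the rotation angle: because deltalon and deltalat make each cell roughly square in metres, offsets measured in cell units can be rotated like ordinary planar coordinates. The sketch below, shown for the rectangular case, rotates a point's offset by -theta about the grid origin so that a rotated grid can then be indexed like an unrotated one. It is a hypothetical illustration of the concept (to_grid_frame is not a TransBigData function); the rotation centre and sign convention are assumptions.

```python
import math


def to_grid_frame(lon, lat, params):
    """Hypothetical sketch: point offset in cell units, rotated by -theta degrees.

    Assumes the grid is rotated counter-clockwise by params['theta'] about (slon, slat).
    """
    # Offsets from the grid origin, measured in cells (roughly isotropic in metres)
    u = (lon - params['slon']) / params['deltalon']
    v = (lat - params['slat']) / params['deltalat']
    t = math.radians(-params['theta'])
    return (u * math.cos(t) - v * math.sin(t),
            u * math.sin(t) + v * math.cos(t))


params = {'slon': 113.75, 'slat': 22.4, 'deltalon': 0.00974336289289822,
          'deltalat': 0.008993210412845813, 'theta': 5}
x, y = to_grid_frame(113.806847, 22.623249, params)
# Rectangular cell indices in the rotated grid, with cell (0, 0) centred on the origin
print(math.floor(x + 0.5), math.floor(y + 0.5))
```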

Then map the GPS data to the grids again:

#Triangle and Hexagon grids require three columns to store the grid ID
data['loncol_1'],data['loncol_2'],data['loncol_3'] = tbd.GPS_to_grid(data['lon'],data['lat'],params)
#Aggregate data into grids
grid_agg = data.groupby(['loncol_1','loncol_2','loncol_3'])['VehicleNum'].count().reset_index()
#Generate grid geometry
grid_agg['geometry'] = tbd.grid_to_polygon([grid_agg['loncol_1'],grid_agg['loncol_2'],grid_agg['loncol_3']],params)
#Change the type into GeoDataFrame
import geopandas as gpd
grid_agg = gpd.GeoDataFrame(grid_agg)
#Plot the grids
grid_agg.plot(column = 'VehicleNum',cmap = 'autumn_r')


Data Visualization (with basemap)

A geographical data visualization figure also needs a basemap, a colorbar, a compass and a scale bar. Use tbd.plot_map to load the basemap and tbd.plotscale to add the compass and scale bar to the matplotlib figure:

import matplotlib.pyplot as plt
fig = plt.figure(1, (8, 8), dpi=300)
ax = plt.subplot(111)
#Load basemap
tbd.plot_map(plt,bounds,zoom = 11,style = 4)
#Define colorbar
cax = plt.axes([0.05, 0.33, 0.02, 0.3])
plt.title('Data count')
#Plot the data
grid_agg.plot(column = 'VehicleNum',cmap = 'autumn_r',ax = ax,cax = cax,legend = True)
#Add scale
tbd.plotscale(ax,bounds = bounds,textsize = 10,compasssize = 1,accuracy = 2000,rect = [0.06,0.03],zorder = 10)


Gridding framework offered by TransBigData

Here is an overview of the gridding framework offered by TransBigData.


See this example for further details.

Citation information

Please cite TransBigData when using it in your research. Citation information can be found in CITATION.cff.

Introduction video (in Chinese, on bilibili)

  • 0.4.16 (Nov 16, 2022)

    Add activity.py to analyze human activity:

    • Entropy to calculate Entropy and Entropy rate
    • Confidence ellipse to calculate and plot confidence ellipse
    • Activity plot to plot Activity

    Update function tbd.mobile_plot_activity and rename it to tbd.plot_activity:

    • Add parameter fontsize to control fontsize of xticks and yticks
    • Add parameter yticks_gap to control yticks
    • Add parameter xticks_rotation and xticks_gap to control xticks
    • Use column group to control the color of the bars
  • 0.4.15 (Oct 31, 2022)

    • Rename the tbd.mobile_stay_duration method
    • Fix a bug in tbd.mobile_stay_move where some stays could not be correctly identified
    • Fix a bug in tbd.mobile_plot_activity: add the shuffle parameter and fix the norm function that controls the color display
  • 0.4.14 (Oct 6, 2022)

    • Update function clean_taxi_status: sort the VehicleNum and Time columns before cleaning the taxi status
    • Add error messages to the Amap getadmin function
    • Fix the error message for the bounds setting in grid.py
  • 0.4.13 (Sep 11, 2022)

    Improve plotmap:

    • Change how file paths are created in plotmap to fix an error where the local basemap could not be read in some system environments.
    • Expose the read_imgsavepath and read_mapboxtoken functions
  • 0.4.12 (Sep 8, 2022)

    • Improve the test coverage to 100%
    • Require geopandas version 0.10.2 to avoid some potential errors
    • Support Python 3.6 and 3.10
    • Add the timeout parameter in crawler.py
    • Use requests instead of urllib for data fetching
  • 0.4.11 (Jul 19, 2022)

  • 0.4.10 (Jul 8, 2022)

    Update the mobile phone data processing functions; see the example for detailed usage. Add functions:

    • transbigdata.mobile_stay_move: Identify stays and moves from trajectory data and gridding parameters.
    • transbigdata.mobile_stay_dutation: Identify the stay duration during night time and day time from stay point data.
    • transbigdata.mobile_identify_home: Identify the home location from mobile phone stay data. The rule is to take the location with the longest duration during night time.
    • transbigdata.mobile_identify_work: Identify the work location from mobile phone stay data. The rule is to take the location with the longest duration during day time on weekdays (the average duration should exceed minhour).
    • transbigdata.mobile_plot_activity: Plot the activity of an individual.
  • 0.4.9 (Jul 1, 2022)

    Update the metro model, adding functions:

    • transbigdata.metro_network: Create a metro network
    • transbigdata.get_shortest_path: Obtain the shortest path for a given OD
    • transbigdata.get_k_shortest_paths: Obtain the k shortest paths for a given OD
    • transbigdata.get_path_traveltime: Obtain the travel time of a path

    See the example for detailed usage: https://transbigdata.readthedocs.io/en/latest/gallery/Example%205-Modeling%20for%20subway%20network%20topology.html

  • 0.4.8 (May 21, 2022)

  • 0.4.7 (Apr 25, 2022)

    The tbd.plot_map function now offers OpenStreetMap as style 0, which does not need an access token. imgsavepath and access_token are no longer required.

  • 0.4.6 (Apr 25, 2022)

  • 0.4.5 (Apr 20, 2022)

  • v0.4.4 (Apr 15, 2022)

  • v0.4.1 (Mar 27, 2022)

    Add the triangle and hexagon gridding methods; the methods are vectorized and fast:

    • Triangle grids: GPS_to_grids_tri and gridid_to_polygon_tri
    • Hexagon grids: GPS_to_grids_hexa and gridid_to_polygon_hexa

    See the example for details.

  • v0.3.12 (Mar 25, 2022)

  • v0.3.11 (Mar 17, 2022)

  • v0.3.10 (Mar 16, 2022)

  • v0.3.9 (Mar 6, 2022)

    Add two functions for isochrone download:

    • get_isochrone_mapbox: Obtain the isochrone from the Mapbox isochrone service.
    • get_isochrone_amap: Obtain the isochrone from the Amap isochrone service.

    Grids now support rotation by a given angle:

    • Grid params now support a fifth parameter, theta, representing the rotation angle.
    • GPS_to_grids, grids_centre, rect_grids, gridid_to_polygon and regenerate_params are rewritten to support rotated grids.
  • v0.3.7 (Feb 23, 2022)

  • v0.3.6 (Feb 22, 2022)

    Add two functions:

    • transbigdata.regenerate_params(grid): Regenerate gridding params from grid.
    • transbigdata.grid_from_params(params,location): Generate grids from params and bounds or shape.

  • v0.3.5 (Feb 1, 2022)

    TransBigData v0.3.5 integrates plot_map and CoordinatesConverter and no longer depends on these two packages. This is also the first version on conda-forge.

  • 0.3.3 (Jan 28, 2022)

Qing Yu
Python, JavaScript, Spatio-temporal big data, Data visualization
