TableParser
Repo for "TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets" at SDU@AAAI-22.
1. Clone repositories
Download and install a git client, then clone this repository into the <git-home> directory (the home directory is referred to as git-home from here on):
git clone git@github.com:DS3Lab/TableParser.git
2. System components of TableParser
- Figure: System overview of the TableParser pipeline (PDF: the TableParser pipeline).
- Figure: Model overview of Mask RCNN in DocParser (PDF: Mask-RCNN).
- TableAnnotator: refer to this repo.
- Demo of annotating a table using TableAnnotator
- ExcelAnnotator: ./ExcelAnnotator
- TableParser pipelines: ./TableParser
- Data: Download from this Google Drive link.
- TableParser M1 (ModernTableParser) and M2 (HistoricalTableParser) can be downloaded from this Google Drive link and placed under ./TableParser/TableParser/detectron2/tools/docparser_outputs
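After cloning and downloading the models, it can help to confirm that the directories named above are where the pipeline expects them. The following is a minimal sketch, not part of the repository: the function name `missing_components` and the constant `EXPECTED_DIRS` are hypothetical, and it assumes the relative paths listed above are resolved against the root of the cloned repository.

```python
from pathlib import Path

# Relative paths taken from the component list in this README.
# Assumption: they are relative to the root of the cloned repository.
EXPECTED_DIRS = (
    "ExcelAnnotator",
    "TableParser",
    "TableParser/TableParser/detectron2/tools/docparser_outputs",
)


def missing_components(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the repository root as the first argument.
    repo_root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_components(repo_root)
    for rel in missing:
        print(f"missing: {rel}")
    if not missing:
        print("all expected directories found")
```

The check only looks at directory names. It does not verify that the M1 or M2 model files inside docparser_outputs are complete, since their file names are not listed here.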
3. References
To cite TableParser, refer to these items:
@inproceedings{rausch2021docparser,
  title={DocParser: Hierarchical Document Structure Parsing from Renderings},
  author={Rausch, Johannes and Martinez, Octavio and Bissig, Fabian and Zhang, Ce and Feuerriegel, Stefan},
  booktitle={35th AAAI Conference on Artificial Intelligence (AAAI-21)(virtual)},
  year={2021}
}
@inproceedings{rao2022tableparser,
  title={TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets},
  author={Rao, Susie Xi and Rausch, Johannes and Egger, Peter and Zhang, Ce},
  booktitle={Scientific Document Understanding Workshop (\tt SDU{@}AAAI-22)(virtual)},
  year={2022}
}