Excalibur: A web interface to extract tabular data from PDFs

Last update: Jan 04, 2023

Overview

Excalibur is a web interface to extract tabular data from PDFs, written in Python 3! It is powered by Camelot.

Note: Excalibur only works with text-based PDFs and not scanned documents. (As Tabula explains, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

Using Excalibur

Note: You need to install Ghostscript before moving forward.

After installing Excalibur with pip, you need to initialize the metadata database using:

$ excalibur initdb

And then start the webserver using:

$ excalibur webserver

That's it! Now you can go to http://localhost:5000 and start extracting tabular data from your PDFs.

1. Upload a PDF and enter the page numbers you want to extract tables from.
2. Go to each page and select the table by drawing a box around it. (You can skip this step, since Excalibur can detect tables automatically. Click on "Autodetect tables" to see what Excalibur sees.)
3. Choose a flavor (Lattice or Stream) from "Advanced":

   a. Lattice: For tables formed with lines.

   b. Stream: For tables formed with whitespace.

4. Click on "View and download data" to see the extracted tables.
5. Select your favorite format (CSV/Excel/JSON/HTML) and click on "Download"!
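The same choices can also be made in code with Camelot, which Excalibur uses under the hood. The sketch below is a hypothetical helper, not Excalibur's own code: it maps the UI choices (pages, flavor, drawn boxes) onto keyword arguments for `camelot.read_pdf`, assuming each box is given as `(x1, y1, x2, y2)` in PDF points with the top-left corner first and the bottom-right corner second, which is the convention Camelot documents for `table_areas`.

```python
# Hypothetical helpers (not part of Excalibur) mapping the web UI choices
# onto Camelot's read_pdf arguments.
VALID_FLAVORS = ("lattice", "stream")


def build_read_pdf_kwargs(pages, flavor="lattice", table_areas=None):
    """Return keyword arguments for camelot.read_pdf.

    pages: Camelot page string, e.g. "1", "1,3-5" or "1-end".
    table_areas: optional list of (x1, y1, x2, y2) boxes in PDF points,
    top-left corner first, bottom-right corner second.
    """
    flavor = flavor.lower()
    if flavor not in VALID_FLAVORS:
        raise ValueError(f"flavor must be one of {VALID_FLAVORS}, got {flavor!r}")
    kwargs = {"pages": pages, "flavor": flavor}
    if table_areas:
        # Camelot expects each area as an "x1,y1,x2,y2" string.
        kwargs["table_areas"] = [",".join(str(c) for c in box) for box in table_areas]
    return kwargs


def extract_tables(pdf_path, out_path, fmt="csv", **ui_choices):
    """Extract tables and export them; needs camelot-py and Ghostscript installed."""
    import camelot  # imported lazily so the helper above works without Camelot

    tables = camelot.read_pdf(pdf_path, **build_read_pdf_kwargs(**ui_choices))
    tables.export(out_path, f=fmt)
    return tables
```

For example, `extract_tables("report.pdf", "report.csv", pages="1", flavor="stream")` (file names hypothetical) should write one CSV file per detected table. Camelot's `export` also accepts `"excel"`, `"json"` and `"html"` for `f`, matching the download formats listed above.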

Note: You can also download executables for Windows and Linux from the releases page and run them directly!

Why Excalibur?

Extracting tables from PDFs is hard. A simple copy-and-paste from a PDF into an Excel spreadsheet doesn't preserve table structure. Excalibur makes PDF table extraction very easy by automatically detecting tables in PDFs and letting you save them as CSV and Excel files.
Excalibur uses Camelot under the hood, which gives you additional settings to tweak table extraction and get the best results. You can see how it performs better than other open-source tools and libraries in this comparison.
You can save table extraction settings (like table areas) for a PDF once, and apply them on new PDFs to extract tables with similar structures.
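Excalibur handles this reuse inside its web UI. Purely to illustrate the idea, the sketch below keeps a flavor plus per-page table areas as JSON so they can be applied to another PDF with a similar layout. The `settings_to_json`/`settings_from_json` helpers and the JSON layout are hypothetical and are not the format Excalibur actually stores.

```python
import json

# Hypothetical layout: {"flavor": "lattice", "areas": {"1": [[x1, y1, x2, y2], ...]}}


def settings_to_json(flavor, areas_by_page):
    """Serialize a flavor and a {page_number: [box, ...]} mapping to JSON text."""
    return json.dumps(
        {"flavor": flavor, "areas": {str(page): boxes for page, boxes in areas_by_page.items()}},
        indent=2,
    )


def settings_from_json(text):
    """Parse settings produced by settings_to_json."""
    data = json.loads(text)
    # JSON object keys are always strings, so convert page numbers back to int.
    data["areas"] = {int(page): boxes for page, boxes in data["areas"].items()}
    return data
```

Writing the text to a file (for instance with `pathlib.Path("settings.json").write_text(...)`) and reading it back later would let the same boxes be passed as table areas for a new PDF with the same structure.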
You get complete control over your data. All file storage and processing happens on your own local or remote machine.
Excalibur can be configured with MySQL and Celery for parallel and distributed workloads. By default, SQLite and multiprocessing are used for sequential workloads.

Installation

Using pip

After installing Ghostscript, which is one of the requirements for Camelot (see install instructions), you can simply use pip to install Excalibur:

$ pip install excalibur-py

From the source code

After installing Ghostscript, clone the repo using:

$ git clone https://www.github.com/camelot-dev/excalibur

and install Excalibur using pip:

$ cd excalibur
$ pip install .

Documentation

Fantastic documentation is available at http://excalibur-py.readthedocs.io/.

Development

The Contributor's Guide has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.

Source code

You can check the latest sources with:

$ git clone https://www.github.com/camelot-dev/excalibur

Setting up a development environment

You can install the development dependencies easily, using pip:

$ pip install excalibur-py[dev]

Testing (soon)

After installation, you can run tests using:

$ python setup.py test

Versioning

Excalibur uses Semantic Versioning. For the available versions, see the tags on this repository. For the changelog, you can check out HISTORY.md.

License

This project is licensed under the MIT License, see the LICENSE file for details.

Support the development

You can support our work on Excalibur with a one-time or monthly donation on OpenCollective. Organizations who use Excalibur can also sponsor the project for an acknowledgement on our official site and this README.

Special thanks to all the users and organizations that support Excalibur!

Comments

ImportError: cannot import name 'secure_filename' after `excalibur initdb`

I can't start the database after running the command: excalibur initdb

I get this error:

~$ excalibur initdb
Creating new Excalibur configuration file in: /home/localhost/excalibur/excalibur.cfg
Traceback (most recent call last):
  File "/home/localhost/.local/bin/excalibur", line 5, in <module>
    from excalibur.cli import cli
  File "/home/localhost/.local/lib/python3.6/site-packages/excalibur/cli.py", line 12, in <module>
    from .www.app import create_app
  File "/home/localhost/.local/lib/python3.6/site-packages/excalibur/www/app.py", line 7, in <module>
    from .views import views
  File "/home/localhost/.local/lib/python3.6/site-packages/excalibur/www/views.py", line 10, in <module>
    from werkzeug import secure_filename
ImportError: cannot import name 'secure_filename'

It seems that the library mentioned is already installed.

~$ pip3 install werkzeug
Requirement already satisfied: werkzeug in ./.local/lib/python3.6/site-packages (1.0.0)
WARNING: You are using pip version 19.2.3, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Any tip?

opened by belisards 9
AttributeError Nonetype for 'job_id'
Here's the print out of the problem. I'm getting a 500 internal server error. I'm in python 3.7. Camelot works fine for me (I can parse, read, export no problems). Just have a problem with running Excalibur.

Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [06/Nov/2018 21:30:22] "GET / HTTP/1.1" 302 -
[2018-11-06 21:30:22,663] ERROR in app: Exception on /files [GET]
Traceback (most recent call last):
  File "d:\python3.7\lib\site-packages\flask\app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "d:\python3.7\lib\site-packages\flask\app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "d:\python3.7\lib\site-packages\flask\app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "d:\python3.7\lib\site-packages\flask\_compat.py", line 35, in reraise
    raise value
  File "d:\python3.7\lib\site-packages\flask\app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "d:\python3.7\lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "d:\python3.7\lib\site-packages\excalibur\www\views.py", line 39, in files
    'job_id': job.job_id,
AttributeError: 'NoneType' object has no attribute 'job_id'

Here's the code that contains the 'job_id' line 39 from the www\views.py file

@views.route('/files', methods=['GET', 'POST'])
def files():
    if request.method == 'GET':
        files_response = []
        session = Session()
        for file in session.query(File).order_by(File.uploaded_at.desc()).all():
            job = session.query(Job).filter(Job.file_id == file.file_id).order_by(Job.started_at.desc()).first()
            files_response.append({
                'file_id': file.file_id,
                'job_id': job.job_id,
                'uploaded_at': file.uploaded_at.strftime('%Y-%m-%dT%H:%M:%S'),
                'filename': file.filename
            })

Any thoughts or am I making some stupid mistakes here?
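SQLAlchemy's `Query.first()` returns `None` when no row matches, so this traceback is consistent with a file that has no `Job` row yet. A defensive version of the dictionary-building step could look like the sketch below. `file_entry` is a hypothetical helper, not the upstream fix, and returning `None` for `job_id` is an assumption about what the front end can tolerate.

```python
def file_entry(file, job):
    """Build one item of the /files response, tolerating a missing Job."""
    return {
        "file_id": file.file_id,
        # job is None when the Job query's .first() found no row for this file.
        "job_id": job.job_id if job is not None else None,
        "uploaded_at": file.uploaded_at.strftime("%Y-%m-%dT%H:%M:%S"),
        "filename": file.filename,
    }
```

Inside the loop, `files_response.append(file_entry(file, job))` would then replace the inline dictionary.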
bug
opened by willardgtan 9
Processing PDF - error message

I am on Ubuntu 14.04 and installed excalibur-py using pip. While processing the following PDF (which is also used in the camelot-py docs, where it works well), the system returns the following message:

ERROR:root:'Table' object has no attribute '_bbox'
Traceback (most recent call last):
  File "/home/sandeep/anaconda3/lib/python3.6/site-packages/excalibur/tasks.py", line 96, in split
    x1, y1, x2, y2 = tables[0]._bbox
AttributeError: 'Table' object has no attribute '_bbox'

Refreshing does not change anything. If I click on Excalibur, I get this message back: "The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."

background_lines.pdf

pdf file used is attached

opened by sandeepraizada 7
Error on Windows: OSError: exception: access violation writing 0x0967BC48 while running python-Excalibur code
Camelot/Excalibur throws an OSError: access violation writing 0x0967BC48. OS: Windows 10; Python version: 3.7.

Below is the output:

Debug mode: off

Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [02/Jul/2019 19:03:54] "GET /files HTTP/1.1" 200 -
127.0.0.1 - - [02/Jul/2019 19:04:10] "POST /files HTTP/1.1" 200 -
127.0.0.1 - - [02/Jul/2019 19:04:10] "GET /workspaces/59f1c984-31fa-4ade-b944-770072f82827 HTTP/1.1" 200 -
127.0.0.1 - - [02/Jul/2019 19:04:19] "GET /workspaces/59f1c984-31fa-4ade-b944-770072f82827 HTTP/1.1" 200 -
127.0.0.1 - - [02/Jul/2019 19:04:20] "GET /static/favicon.ico HTTP/1.1" 200 -
ERROR:root:exception: access violation writing 0x0967BC48
Traceback (most recent call last):
  File "c:\users\comp7\appdata\local\programs\python\python37-32\lib\site-packages\excalibur\tasks.py", line 44, in split
    with Ghostscript(*gs_call, stdout=null) as gs:
  File "c:\users\comp7\appdata\local\programs\python\python37-32\lib\site-packages\camelot\ext\ghostscript\__init__.py", line 93, in Ghostscript
    stderr=kwargs.get('stderr', None))
  File "c:\users\comp7\appdata\local\programs\python\python37-32\lib\site-packages\camelot\ext\ghostscript\__init__.py", line 39, in __init__
    rc = gs.init_with_args(instance, args)
  File "c:\users\comp7\appdata\local\programs\python\python37-32\lib\site-packages\camelot\ext\ghostscript\_gsprint.py", line 167, in init_with_args
    rc = libgs.gsapi_init_with_args(instance, len(argv), c_argv)
OSError: exception: access violation writing 0x0967BC48
opened by swaraj20 6
GhostscriptNotFound: Please make sure that Ghostscript is installed and available on the PATH environment variable

OS: Windows 10

After downloading Excalibur, I had to download and install Ghostscript (this should be stated in the instructions).

After installing ghostscript, the PATH needs to be set. After setting PATH, the error still exists:

PATH set: C:\Program Files\gs\gs9.26\bin\gswin64c.exe (I've restarted)
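PATH entries are normally directories, not executables, so a directory such as C:\Program Files\gs\gs9.26\bin is what would usually be added. The sketch below uses hypothetical helpers, not Camelot's own detection logic (which may look for the Ghostscript DLL rather than the executable): it reports which Ghostscript executable, if any, is visible on PATH and flags PATH entries that point at a file.

```python
import os
import shutil

# Common Ghostscript executable names: 64/32-bit console builds on Windows, gs elsewhere.
CANDIDATES = ("gswin64c", "gswin32c", "gs")


def find_ghostscript(path=None):
    """Return the first Ghostscript executable found on PATH (or on `path`), else None."""
    for name in CANDIDATES:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None


def file_entries_on_path(path=None):
    """Return PATH entries that are files; programs are only looked up inside directories."""
    raw = os.environ.get("PATH", "") if path is None else path
    return [entry for entry in raw.split(os.pathsep) if entry and os.path.isfile(entry)]


if __name__ == "__main__":
    print("Ghostscript:", find_ghostscript() or "not found")
    for entry in file_entries_on_path():
        print("PATH entry is a file, not a directory:", entry)
```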

opened by majestique 5
werkzeug.utils not werkzeug

Excalibur would not start on my Linux Fedora 29 system. Here was the message:

.local/bin/excalibur
Traceback (most recent call last):
  File ".local/bin/excalibur", line 5, in <module>
    from excalibur.cli import cli
  File "/home/thorsten/.local/lib/python3.6/site-packages/excalibur/cli.py", line 12, in <module>
    from .www.app import create_app
  File "/home/thorsten/.local/lib/python3.6/site-packages/excalibur/www/app.py", line 7, in <module>
    from .views import views
  File "/home/thorsten/.local/lib/python3.6/site-packages/excalibur/www/views.py", line 10, in <module>
    from werkzeug import secure_filename
ImportError: cannot import name 'secure_filename'

After some googling I fixed it myself by editing line 10 in views.py to read

from werkzeug.utils import secure_filename
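Werkzeug 1.0 removed the old top-level shortcuts such as from werkzeug import secure_filename, which fits the reports above of the error appearing with Werkzeug 1.0.0 installed. The manual edit can be scripted. The sketch below is a hypothetical helper that locates the installed package with `importlib.util.find_spec` (which finds a top-level package without importing it, since the import is what fails) and rewrites the import in excalibur/www/views.py, assuming the package layout shown in the tracebacks.

```python
import importlib.util
from pathlib import Path

OLD_IMPORT = "from werkzeug import secure_filename"
NEW_IMPORT = "from werkzeug.utils import secure_filename"


def patch_source(text):
    """Replace the pre-1.0 Werkzeug import; already-patched text is left unchanged."""
    return text.replace(OLD_IMPORT, NEW_IMPORT)


def patch_installed_views():
    """Rewrite excalibur/www/views.py in place and return its path (hypothetical helper)."""
    spec = importlib.util.find_spec("excalibur")  # locates the package without importing it
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError("excalibur is not installed in this environment")
    views = Path(spec.origin).parent / "www" / "views.py"
    views.write_text(patch_source(views.read_text(encoding="utf-8")), encoding="utf-8")
    return views
```

Running `patch_installed_views()` once, from the same Python environment that runs Excalibur, should have the same effect as editing line 10 by hand.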

This seems like a very simple issue to fix.

After that, the program worked brilliantly. I found it on

https://hackernoon.com/an-open-source-science-tool-to-extract-tables-from-pdfs-into-excels-3ed3cc7f22e1

John Thorstensen

opened by jrthorstensen 4
Change the import from werkzeug
Python 3.6, Werkzeug 1.0, on Ubuntu 18.04 in WSL (Windows): I couldn't run Excalibur due to the error

from werkzeug import secure_filename ImportError: cannot import name 'secure_filename'
opened by sabas 3
Excalibur's data directory is created in HOME

I consider it bad form for Excalibur to create a user-visible folder in the home folder (/Users/akx/excalibur on my Mac, for instance).

It'd be better to use e.g. appdirs to figure out the "user data" directory, and create the Excalibur directory there.
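For illustration, here is a stdlib-only sketch of roughly the per-platform locations a library like appdirs resolves. It is a simplification (appdirs also handles roaming profiles, an app-author subfolder on Windows and other details), so the real library would be the better choice in Excalibur itself. The `user_data_dir` signature here is hypothetical; the extra parameters exist only to make it testable.

```python
import os
import sys
from pathlib import Path


def user_data_dir(appname, platform=None, env=None, home=None):
    """Approximate per-user data directory for appname."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        # Local (non-roaming) application data, usually C:\Users\<name>\AppData\Local.
        base = env.get("LOCALAPPDATA") or str(home / "AppData" / "Local")
        return Path(base) / appname
    if platform == "darwin":
        return home / "Library" / "Application Support" / appname
    # Linux and other Unix-likes follow the XDG Base Directory convention.
    base = env.get("XDG_DATA_HOME") or str(home / ".local" / "share")
    return Path(base) / appname
```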

opened by akx 3

ERROR:root:'charmap' codec can't encode character '\ued6f' in position 350: character maps to <undefined>

I am unable to share the pdf that is causing this issue. I would like to know what I can do to bypass this error.
Even if it means dropping the "offending char". Getting some of the data is better than getting none of the data. I'd be ecstatic if this is a PEBKAC issue, so please don't discount that.

I'm using the latest download of Excalibur and Python 3.7.3 (I think), and only the web UI. I don't think I could handle coding it without some hand-holding.

This is happening on several pages of a very large pdf (700+ pages). But not all of them. So the file can be parsed. Just not the important portion, which is most of the file.

I DID just realize that it is creating some of the output files (Excel, CSV and JSON), but not HTML. Since I only really need the CSV or Excel, I might be good. I will keep pushing on the remainder of the file (it's slow to handle 100 pages at a time).

127.0.0.1 - - [05/Aug/2019 16:55:09] "GET /jobs/fbfeb974-5f3d-4991-b26c-983560640de5 HTTP/1.1" 200 -
ERROR:root:'charmap' codec can't encode character '\ued6f' in position 350: character maps to <undefined>
Traceback (most recent call last):
  File "excalibur\executors\sequential_executor.py", line 12, in execute_command

  File "subprocess.py", line 336, in check_call
  File "subprocess.py", line 317, in call
  File "subprocess.py", line 769, in __init__
  File "subprocess.py", line 1172, in _execute_child
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "excalibur\tasks.py", line 161, in extract
  File "lib\site-packages\camelot\core.py", line 479, in export
  File "lib\site-packages\camelot\core.py", line 437, in _write_file
  File "lib\site-packages\camelot\core.py", line 394, in to_html
  File "C:\Python37\lib\encodings\cp1252.py", line 19, in encode
UnicodeEncodeError: 'charmap' codec can't encode character '\ued6f' in position 350: character maps to <undefined>
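The last frame shows Camelot's to_html writing through the Windows default cp1252 codec, which has no mapping for the private-use character '\ued6f'; that would also be consistent with the CSV, Excel and JSON outputs being written while the HTML one was not. Two general Python remedies are sketched below as hypothetical helpers, not as a patch to Camelot: write with UTF-8 explicitly, or, as asked above, replace or drop characters the code page cannot represent.

```python
def to_cp1252_safe(text, errors="replace"):
    """Return text that cp1252 can encode.

    errors="replace" turns each unencodable character into "?";
    errors="ignore" drops it instead.
    """
    return text.encode("cp1252", errors=errors).decode("cp1252")


def write_html_utf8(path, html):
    """Write HTML as UTF-8, which can represent any Unicode character."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
```

Separately, Python 3.7+ has a UTF-8 mode (enabled with the environment variable PYTHONUTF8=1) that makes open() default to UTF-8; whether the bundled Windows executable honors it is untested here.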

opened by randomstability 3

Failure when "all" or 1-end is selected

Excalibur struggles on large PDFs (20 pages or more) when I choose the "all" or "1-end" options. I get the following warnings:

UserWarning: No tables found on page-144 [lattice.py:399]
UserWarning: No tables found on page-144 [stream.py:447]
UserWarning: No tables found in table area 1 [stream.py:361]
UserWarning: No tables found in table area 1 [stream.py:361]
UserWarning: No tables found in table area 2 [stream.py:361]

However if I manually select the pages it works fine. Is there a way to solve this?
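One workaround, assuming the failures are tied to processing many pages in a single job, is to submit the document in smaller page ranges (another reporter above mentions working through a large PDF 100 pages at a time). The hypothetical helper below produces page strings in the 1,3-5 style Camelot accepts, which could be entered one at a time in the page-number field.

```python
def page_batches(n_pages, batch_size=100):
    """Split pages 1..n_pages into range strings such as "1-100", "101-200"."""
    if n_pages < 1 or batch_size < 1:
        raise ValueError("n_pages and batch_size must be positive")
    batches = []
    for start in range(1, n_pages + 1, batch_size):
        end = min(start + batch_size - 1, n_pages)
        # A one-page batch is written as a single number.
        batches.append(str(start) if start == end else f"{start}-{end}")
    return batches
```

For instance, `page_batches(250)` gives `["1-100", "101-200", "201-250"]`.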

opened by VAnthonyrajah 3
[Flavor] - Selecting flavor while extracting table which requires to process background.

I have a PDF with multiple tables: some cells have margins and a colored background, and some have no margin at all, only a difference in background color. With either the lattice or the stream flavor, the alignment of the extracted text is badly disturbed for the cells without margins. I also tried process_background = True, which does not solve the problem.

Is there any way to resolve the issue?

opened by Akhurana01 2

Fix for "No module named camelot.ext" error

When I followed the instructions for using Excalibur, I ran into the following issue:

Traceback (most recent call last):
  File "/Users/balakumaranpalanivel/.pyenv/versions/3.7.9/bin/excalibur", line 5, in <module>
    from excalibur.cli import cli
  File "/Users/balakumaranpalanivel/ReposPersonal/excaliburRoot/excalibur-fork/excalibur/cli.py", line 8, in <module>
    from .tasks import split, extract
  File "/Users/balakumaranpalanivel/ReposPersonal/excaliburRoot/excalibur-fork/excalibur/tasks.py", line 9, in <module>
    from camelot.ext.ghostscript import Ghostscript
ModuleNotFoundError: No module named 'camelot.ext'

There seem to be multiple ways to fix this online, but the root cause seems to be this commit, where the ext folder was removed from Camelot while Excalibur continues to use it.

This fix seems to be the most popular one based on Stack Overflow upvotes, and it makes sense to me. But please correct me if I am wrong.

P.S. I had a look at the contributing guidelines; I hope I did not miss anything 🤞
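Since the linked fix is not reproduced here, the sketch below only shows a generic pattern for surviving such a module move: try several import locations in order. The fallback named in the comment (the standalone ghostscript package on PyPI) is an assumption to verify against the installed versions, not necessarily the fix referred to above.

```python
import importlib


def import_first(*candidates):
    """Return the attribute from the first (module, attribute) pair that imports."""
    errors = []
    for module_name, attr in candidates:
        try:
            return getattr(importlib.import_module(module_name), attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Hypothetical use in excalibur/tasks.py; the second location is an assumption:
# Ghostscript = import_first(
#     ("camelot.ext.ghostscript", "Ghostscript"),  # older Camelot releases
#     ("ghostscript", "Ghostscript"),              # standalone `ghostscript` package
# )
```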

opened by balakumaranpalanivel 0

Bump decode-uri-component from 0.2.0 to 0.2.2 in /public
Bumps decode-uri-component from 0.2.0 to 0.2.2.

Release notes

Sourced from decode-uri-component's releases.

v0.2.2

Prevent overwriting previously decoded tokens 980e0bf

https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.1...v0.2.2

v0.2.1

Switch to GitHub workflows 76abc93

Fix issue where decode throws - fixes #6 746ca5d

Update license (#1) 486d7e2

Tidelift tasks a650457

Meta tweaks 66e1c28

https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.1

Commits

a0eea46 0.2.2

980e0bf Prevent overwriting previously decoded tokens

3c8a373 0.2.1

76abc93 Switch to GitHub workflows

746ca5d Fix issue where decode throws - fixes #6

486d7e2 Update license (#1)

a650457 Tidelift tasks

66e1c28 Meta tweaks

See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies javascript
opened by dependabot[bot] 0

Error during `excalibur initdb` on Windows 10


C:\Users\user\Documents\MLReportParser>excalibur initdb
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\Scripts\excalibur.exe\__main__.py", line 4, in <module>
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\excalibur\cli.py", line 7, in <module>
    from . import __version__, settings
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\excalibur\settings.py", line 6, in <module>
    from sqlalchemy import create_engine, exc
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlalchemy\__init__.py", line 12, in <module>
    from sqlalchemy.sql import (
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlalchemy\sql\__init__.py", line 7, in <module>
    from sqlalchemy.sql.expression import (
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlalchemy\sql\expression.py", line 32, in <module>
    from sqlalchemy import util, exc
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlalchemy\util\__init__.py", line 7, in <module>
    from .compat import callable, cmp, reduce, defaultdict, py25_dict, \
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlalchemy\util\compat.py", line 202, in <module>
    time_func = time.clock
AttributeError: module 'time' has no attribute 'clock'
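time.clock() was removed in Python 3.8, and the failing line sits in an old SQLAlchemy release's compatibility module, so upgrading SQLAlchemy or running Excalibur on Python 3.7 are the likely remedies. For code of your own that still calls time.clock(), a common replacement is time.perf_counter() for elapsed intervals (or time.process_time() for CPU time), as in this sketch; `elapsed` is a hypothetical helper.

```python
import time


def elapsed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds).

    time.perf_counter() replaces time.clock() for measuring intervals;
    only the difference between two calls is meaningful.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```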

opened by js333031 0

Bump engine.io and browser-sync in /public
Bumps engine.io to 6.2.1 and updates ancestor dependency browser-sync. These dependencies need to be updated together.

Updates engine.io from 3.2.0 to 6.2.1

Release notes

Sourced from engine.io's releases.

6.2.1

:warning: This release contains an important security fix :warning:

A malicious client could send a specially crafted HTTP request, triggering an uncaught exception and killing the Node.js process:

Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:209:20)
Emitted 'error' event on Socket instance at:
    at emitErrorNT (internal/streams/destroy.js:106:8)
    at emitErrorCloseNT (internal/streams/destroy.js:74:3)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}

Please upgrade as soon as possible.

Bug Fixes

catch errors when destroying invalid upgrades (#658) (425e833)

6.2.0

Features

add the "maxPayload" field in the handshake details (088dcb4)

So that clients in HTTP long-polling can decide how many packets they have to send to stay under the maxHttpBufferSize value.

This is a backward compatible change which should not mandate a new major revision of the protocol (we stay in v4), as we only add a field in the JSON-encoded handshake data:

0{"sid":"lv_VI97HAXpY6yYWAAAC","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000,"maxPayload":1000000}

Links

Diff: https://github.com/socketio/engine.io/compare/6.1.3...6.2.0

Client release: 6.2.0

ws version: ~8.2.3

6.1.3

Bug Fixes

typings: allow CorsOptionsDelegate as cors options (#641) (a463d26)

uws: properly handle chunked content (#642) (3367440)

... (truncated)

Changelog

Sourced from engine.io's changelog.

6.2.1 (2022-11-20)

:warning: This release contains an important security fix :warning:

A malicious client could send a specially crafted HTTP request, triggering an uncaught exception and killing the Node.js process:

Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:209:20)
Emitted 'error' event on Socket instance at:
    at emitErrorNT (internal/streams/destroy.js:106:8)
    at emitErrorCloseNT (internal/streams/destroy.js:74:3)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}

Please upgrade as soon as possible.

Bug Fixes

catch errors when destroying invalid upgrades (#658) (425e833)

3.6.0 (2022-06-06)

Bug Fixes

add extension in the package.json main entry (#608) (3ad0567)

do not reset the ping timer after upgrade (1f5d469), closes socketio/socket.io-client-swift#1309

Features

decrease the default value of maxHttpBufferSize (58e274c)

This change reduces the default value from 100 mb to a more sane 1 mb.

This helps protect the server against denial of service attacks by malicious clients sending huge amounts of data.

See also: https://github.com/advisories/GHSA-j4f2-536g-r55m

increase the default value of pingTimeout (f55a79a)

... (truncated)

Commits

24b847b chore(release): 6.2.1

425e833 fix: catch errors when destroying invalid upgrades (#658)

99adb00 chore(deps): bump xmlhttprequest-ssl and engine.io-client in /examples/latenc...

d196f6a chore(deps): bump minimatch from 3.0.4 to 3.1.2 (#660)

7c1270f chore(deps): bump nanoid from 3.1.25 to 3.3.1 (#659)

535a01d ci: add Node.js 18 in the test matrix

1b71a6f docs: remove "Vanilla JS" highlight from README (#656)

917d1d2 refactor: replace deprecated String.prototype.substr() (#646)

020801a chore: add changelog for version 3.6.0

ed1d6f9 test: make test script work on Windows (#643)

Additional commits viewable in compare view

Updates browser-sync from 2.24.7 to 2.27.10

Release notes

Sourced from browser-sync's releases.

2.27.9

What's Changed

fix(cli): Where's the command help? fixes #1929 by @shakyShane in BrowserSync/browser-sync#1945

A bug prevented the help output from displaying - it was introduced when the CLI parser yargs was updated, and is now fixed :)

Full Changelog: https://github.com/BrowserSync/browser-sync/compare/v2.27.8...v2.27.9

2.27.8

This release upgrades Socket.io (client+server) to the latest versions - solving the following issues, and silencing security warning :)

PR:

https://github.com/BrowserSync/browser-sync/commit/58ab4ab861d7c50b4349f25bdd4c7f8871d0ad32

Resolved Issues:

BrowserSync/browser-sync#1850

BrowserSync/browser-sync#1892

BrowserSync/browser-sync#1925

BrowserSync/browser-sync#1926

BrowserSync/browser-sync#1933

Thanks to @lachieh for the original PR, which helped me land this fix

added snippet: boolean option

This release adds a feature to address BrowserSync/browser-sync#1882

Sometimes you don't want Browsersync to auto-inject its connection snippet into your HTML - now you can disable it globally via either a CLI param or the new snippet option :)

browser-sync . --no-snippet

or in any Browsersync configuration

const config = { snippet: false, };

the original request was related to Eleventy usage, so here's how that would look

eleventyConfig.setBrowserSyncConfig({ snippet: false, });

... (truncated)

Commits

f6965a6 v2.27.10

e6c7bed Updated portscanner to 2.2.0 (#1960)

6a587ec fix readme's

91258ae Merge branch 'browser-sync-1946-esbuild'

f48d6b4 👋 app veyor

30c24dc Merge pull request #1947

9d24de5 drop webpack from UI

7a00341 build client with esbuild

c30868a v2.27.9

9b5fcdc fix(cli): Where's the command help? fixes #1929 (#1945)

Additional commits viewable in compare view


dependencies javascript
opened by dependabot[bot] 0
Bump minimatch and gulp in /public
Bumps minimatch to 3.0.4 and updates ancestor dependency gulp. These dependencies need to be updated together.

Updates minimatch from 0.2.14 to 3.0.4

Commits

e46989a v3.0.4

ddfacbd update brace-expansion

55ed736 update package scripts and deps

eed8949 v3.0.3

ecabc57 Do not throw on unfinished !( extglob patterns

81edb7c v3.0.2

6944abf Handle extremely long and terrible patterns more gracefully

8ac560e v3.0.1

4f3a8bc update tap

9cf2d88 Remove mentions of cache from readme

Additional commits viewable in compare view

Maintainer changes

This version was pushed to npm by isaacs, a new releaser for minimatch since your current version.

Updates gulp from 3.9.1 to 4.0.2

Release notes

Sourced from gulp's releases.

v4.0.2

Fix

Bind src/dest/symlink to the gulp instance to support esm exports (5667666) - Ref standard-things/esm#797

Docs

Add notes about esm support (4091bd3) - Closes #2278

Fix the Negative Globs section & examples (3c66d95) - Closes #2297

Remove next tag from recipes (1693a11) - Closes #2277

Add default task wrappers to Watching Files examples to make runnable (d916276) - Closes #2322

Fix syntax error in lastRun API docs (ea52a92) - Closes #2315

Fix typo in Explaining Globs (5d81f42) - Closes #2326

Build

Add node 12 to Travis & Azure (b4b5a68)

v4.0.1

Fix

Temporary workaround for facebook/Docusaurus#257 (9f4a2e9) - Closes facebook/Docusaurus#257

Docs

Fix error in ES2015 usage example (a4e8d48) - Closes #2099 #2100

Add temporary notice for 4.0.0 vs 3.9.1 documentation (126423a) - Closes #2121

Improve recipe for empty glob array (45830cf) - Closes #2122

Reword standard to default (b065a13)

Fix recipe typo (86acdea) - Closes #2156

Add front-matter to each file (d693e49) - Closes #2109

Rename "Getting Started" to "Quick Start" & update it (6a0fa00)

Add "Creating Tasks" documentation (21b6962)

Add "JavaScript and Gulpfiles" documentation (31adf07)

Add "Working with Files" documentation (50fafc6)

Add "Async Completion" documentation (ad8b568)

Add "Explaining Globs" documentation (f8cafa0)

Add "Using Plugins" documentation (233c3f9)

Add "Watching Files" documentation (f3f2d9f)

Add Table of Contents to "Getting Started" directory (a43caf2)

Improve & fix parts of Getting Started (84b0234)

Create and link-to a "docs missing" page for LINK_NEEDED references (2bd75d0)

Redirect users to new Getting Started guides (53e9727)

Temporarily reference [email protected] in Quick Start (2cecf1e)

Fixed a capitalization typo in a heading (3d051d8) - Closes #2242

Use h2 headers within Quick Start documentation (921312c) - Closes #2241

Fix for nested directories references (4c2b9a7)

Add some more cleanup for Docusaurus (6a8fd8f)

Temporarily point LINK_NEEDED references to documentation-missing.md (df7cdcb)

API documentation improvements based on feedback (0a68710)

... (truncated)

Changelog

Sourced from gulp's changelog.

gulp changelog

4.0.0

Task system changes

replaced 3.x task system (orchestrator) with new task system (bach)

removed gulp.reset

removed 3 argument syntax for gulp.task

gulp.task should only be used when you will call the task with the CLI

added gulp.series and gulp.parallel methods for composing tasks. Everything must use these now.

added single argument syntax for gulp.task which allows a named function to be used as the name of the task and task function.

added gulp.tree method for retrieving the task tree. Pass { deep: true } for an archy compatible node list.

added gulp.registry for setting custom registries.

CLI changes

split CLI out into a module if you want to save bandwidth/disk space. you can install the gulp CLI using either npm install gulp -g or npm install gulp-cli -g, where gulp-cli is the smaller one (no module code included)

add --tasks-json flag to CLI to dump the whole tree out for other tools to consume

added --verify flag to check the dependencies in package.json against the plugin blacklist.

vinyl/vinyl-fs changes

added gulp.symlink which functions exactly like gulp.dest, but symlinks instead.

added dirMode param to gulp.dest and gulp.symlink which allows better control over the mode of the destination folder that is created.

globs passed to gulp.src will be evaluated in order, which means this is possible: gulp.src(['*.js', '!b*.js', 'bad.js']) (exclude every JS file that starts with a b except bad.js)

performance for gulp.src has improved massively

gulp.src(['**/*', '!b.js']) will no longer eat CPU since negations happen during walking now

added since option to gulp.src which lets you only match files that have been modified since a certain date (for incremental builds)

fixed gulp.src not following symlinks

added overwrite option to gulp.dest which allows you to enable or disable overwriting of existing files
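Two of the gulp.src changes above affect which files get selected: globs are applied in order, and the since option skips files that have not changed. The sketch below models both rules in Python over flat file names and modification times. match_ordered and modified_since are hypothetical helpers built on fnmatch, not vinyl-fs code; real gulp also walks directories and handles ** patterns, which this ignores, and the strict "newer than" comparison for since is an assumption.

```python
from datetime import datetime
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional, Tuple


def match_ordered(paths: Iterable[str], globs: List[str]) -> List[str]:
    """Evaluate globs left to right: a plain glob adds matches, a '!' glob removes them."""
    selected = []
    for path in paths:
        included = False
        for pattern in globs:
            if pattern.startswith("!"):
                if fnmatchcase(path, pattern[1:]):
                    included = False  # negation drops a file matched by an earlier glob
            elif fnmatchcase(path, pattern):
                included = True  # a later positive glob can re-add an excluded file
        if included:
            selected.append(path)
    return selected


def modified_since(files: Iterable[Tuple[str, datetime]], since: Optional[datetime]) -> List[str]:
    """Keep only files modified after `since` (assumed strict); None keeps everything."""
    return [name for name, mtime in files if since is None or mtime > since]


names = ["a.js", "b.js", "bad.js", "big.js", "notes.txt"]
print(match_ordered(names, ["*.js", "!b*.js", "bad.js"]))  # ['a.js', 'bad.js']

stamps = [("a.js", datetime(2018, 1, 1)), ("bad.js", datetime(2018, 6, 1))]
print(modified_since(stamps, datetime(2018, 3, 1)))  # ['bad.js']
```

Because each file's fate is decided by the last glob that matches it, bad.js survives the !b*.js exclusion, matching the changelog's example.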

Commits

069350a Release: 4.0.2

b4b5a68 Build: Add node 12 to Travis & Azure

5667666 Fix: Bind src/dest/symlink to the gulp instance to support esm exports (ref s...

4091bd3 Docs: Add notes about esm support (closes #2278)

3c66d95 Docs: Fix the Negative Globs section & examples (closes #2297)

1693a11 Docs: Remove next tag from recipes (closes #2277)

d916276 Docs: Add default task wrappers to Watching Files examples to make runnable (...

ea52a92 Docs: Fix syntax error in lastRun API docs (closes #2315)

5d81f42 Docs: Fix typo in Explaining Globs (#2326)

ea3bba4 Release: 4.0.1

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually.

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies javascript
opened by dependabot[bot]
Bump socket.io-parser and browser-sync in /public
Bumps socket.io-parser to 4.2.1 and updates ancestor dependency browser-sync. These dependencies need to be updated together.
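The two updates in this PR differ in size under semantic versioning: socket.io-parser moves to a new major version (3.1.3 to 4.2.1), which signals possibly breaking changes, while browser-sync stays on major version 2 (2.24.7 to 2.27.10). The same distinction underlies Dependabot's "ignore this major version" and "ignore this minor version" commands. A small Python sketch of the classification follows; parse_version and bump_kind are hypothetical helpers, and they assume plain MAJOR.MINOR.PATCH strings without pre-release or build tags.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Classify an upgrade by the first version field that changed."""
    o, n = parse_version(old), parse_version(new)
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"  # also returned when the versions are identical


print(bump_kind("3.1.3", "4.2.1"))     # major (socket.io-parser)
print(bump_kind("2.24.7", "2.27.10"))  # minor (browser-sync)
```

Comparing integer fields rather than strings matters here: as strings, "27" vs "24" happens to sort correctly, but "2.9" vs "2.10" would not.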

Updates socket.io-parser from 3.1.3 to 4.2.1

Release notes

Sourced from socket.io-parser's releases.

4.2.1

Bug Fixes

check the format of the index of each attachment (b5d0cb7)

Links

Diff: https://github.com/socketio/socket.io-parser/compare/4.2.0...4.2.1
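The 4.2.1 fix validates attachment indices. In socket.io-parser's binary packet format, as we understand it, each binary attachment in the JSON payload is replaced by a placeholder object {"_placeholder": true, "num": N}, and the decoder swaps each placeholder back for the N-th received buffer. The Python sketch below shows that reconstruction step with an index check; it is not the library's JavaScript code, reconstruct is a hypothetical name, and the error message is illustrative.

```python
from typing import Any, List


def reconstruct(data: Any, buffers: List[bytes]) -> Any:
    """Replace {"_placeholder": True, "num": i} markers with buffers[i], validating i."""
    if isinstance(data, list):
        return [reconstruct(item, buffers) for item in data]
    if isinstance(data, dict):
        if data.get("_placeholder") is True:
            num = data.get("num")
            # bool is a subclass of int in Python, so reject it explicitly.
            valid = isinstance(num, int) and not isinstance(num, bool) and 0 <= num < len(buffers)
            if not valid:
                raise ValueError("illegal attachments")
            return buffers[num]
        return {key: reconstruct(value, buffers) for key, value in data.items()}
    return data


print(reconstruct({"file": {"_placeholder": True, "num": 0}}, [b"abc"]))  # {'file': b'abc'}
```

Without the check, a crafted packet could point num at a missing index or, in JavaScript, at a non-numeric property name; validating type and range before indexing closes that gap.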

4.2.0

Features

allow the usage of custom replacer and reviver (#112) (b08bc1a)

Links

Diff: https://github.com/socketio/socket.io-parser/compare/4.1.2...4.2.0
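Release 4.2.0 lets callers supply their own replacer and reviver, the hooks that JavaScript's JSON.stringify and JSON.parse accept for customizing encoding and decoding; as we read the feature, the parser hands them through to those calls. Python's json module has a close analogue in the default and object_hook arguments, sketched below with a date type that JSON lacks. One difference worth knowing: Python calls default only for objects json cannot serialize, whereas a JavaScript replacer sees every key/value pair. The "__date__" tag is a hypothetical convention.

```python
import json
from datetime import date
from typing import Any, Dict


def replacer(obj: Any) -> Any:
    """Called by json.dumps for objects it cannot serialize natively."""
    if isinstance(obj, date):
        return {"__date__": obj.isoformat()}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def reviver(obj: Dict[str, Any]) -> Any:
    """Called by json.loads for every decoded JSON object."""
    if set(obj) == {"__date__"}:
        return date.fromisoformat(obj["__date__"])
    return obj


payload = {"event": "release", "on": date(2022, 4, 17)}
encoded = json.dumps(payload, default=replacer)
decoded = json.loads(encoded, object_hook=reviver)
print(encoded)             # {"event": "release", "on": {"__date__": "2022-04-17"}}
print(decoded == payload)  # True
```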

4.1.2

Bug Fixes

allow objects with a null prototype in binary packets (#114) (7f6b262)

Links

Diff: https://github.com/socketio/socket.io-parser/compare/4.1.1...4.1.2

4.1.1

Links

Diff: https://github.com/socketio/socket.io-parser/compare/4.1.0...4.1.1

4.1.0

Features

provide an ESM build with and without debug (388c616)

Links

Diff: https://github.com/socketio/socket.io-parser/compare/4.0.4...4.1.0

4.0.5

Bug Fixes

check the format of the index of each attachment (b559f05)

Links

Diff: https://github.com/socketio/socket.io-parser/compare/4.0.4...4.0.5

... (truncated)

Changelog

Sourced from socket.io-parser's changelog.

4.2.1 (2022-06-27)

Bug Fixes

check the format of the index of each attachment (b5d0cb7)

4.2.0 (2022-04-17)

Features

allow the usage of custom replacer and reviver (#112) (b08bc1a)

4.1.2 (2022-02-17)

Bug Fixes

allow objects with a null prototype in binary packets (#114) (7f6b262)

4.1.1 (2021-10-14)

4.1.0 (2021-10-11)

Features

provide an ESM build with and without debug (388c616)

4.0.4 (2021-01-15)

Bug Fixes

allow integers as event names (1c220dd)

4.0.3 (2021-01-05)

4.0.2 (2020-11-25)

... (truncated)

Commits

5a2ccff chore(release): 4.2.1

b5d0cb7 fix: check the format of the index of each attachment

c7514b5 chore(release): 4.2.0

931f152 chore: add Node.js 16 in the test matrix

6c9cb27 chore: bump @socket.io/component-emitter to version 3.1.0

b08bc1a feat: allow the usage of custom replacer and reviver (#112)

aed252c chore(release): 4.1.2

89209fa chore: bump cached-path-relative from 1.0.2 to 1.1.0 (#113)

0a3b556 chore: bump path-parse from 1.0.6 to 1.0.7 (#108)

7f6b262 fix: allow objects with a null prototype in binary packets (#114)

Additional commits viewable in compare view

Updates browser-sync from 2.24.7 to 2.27.10

Release notes

Sourced from browser-sync's releases.

2.27.9

What's Changed

fix(cli): Where's the command help? fixes #1929 by @shakyShane in BrowserSync/browser-sync#1945

A bug prevented the help output from displaying; it was introduced when the CLI parser yargs was updated, and is now fixed :)

Full Changelog: https://github.com/BrowserSync/browser-sync/compare/v2.27.8...v2.27.9

2.27.8

This release upgrades Socket.io (client+server) to the latest versions, solving the following issues and silencing security warnings :)

PR:

https://github.com/BrowserSync/browser-sync/commit/58ab4ab861d7c50b4349f25bdd4c7f8871d0ad32

Resolved Issues:

BrowserSync/browser-sync#1850

BrowserSync/browser-sync#1892

BrowserSync/browser-sync#1925

BrowserSync/browser-sync#1926

BrowserSync/browser-sync#1933

Thanks to @lachieh for the original PR, which helped me land this fix

added snippet: boolean option

This release adds a feature to address BrowserSync/browser-sync#1882

Sometimes you don't want Browsersync to auto-inject its connection snippet into your HTML; now you can disable it globally via either a CLI param or the new snippet option :)

browser-sync . --no-snippet

or in any Browsersync configuration:

const config = { snippet: false, };

The original request was related to Eleventy usage, so here's how that would look:

eleventyConfig.setBrowserSyncConfig({ snippet: false, });
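To make the effect of the snippet option concrete, here is a Python sketch of what snippet injection amounts to: the dev server rewrites served HTML to include a client script tag, and snippet: false (or --no-snippet) leaves the page untouched. This is not Browsersync's code. The script tag is a placeholder (the real snippet differs), and inserting right after the opening body tag reflects Browsersync's default rule as we understand it. serve_html is a hypothetical name.

```python
import re

# Placeholder tag; the real Browsersync snippet differs.
SNIPPET = '<script async src="/browser-sync/browser-sync-client.js"></script>'


def serve_html(html: str, snippet: bool = True) -> str:
    """Return the page, injecting the client snippet after <body> unless snippet=False."""
    if not snippet:
        return html  # like `--no-snippet` / `{ snippet: false }`
    match = re.search(r"<body[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        return html  # no <body> tag to hook into
    return html[: match.end()] + SNIPPET + html[match.end():]


page = "<html><body><h1>Hi</h1></body></html>"
print(serve_html(page))
print(serve_html(page, snippet=False) == page)  # True
```

With injection disabled, a tool like Eleventy can serve its output unmodified, which is what the original request asked for.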

... (truncated)

Commits

f6965a6 v2.27.10

e6c7bed Updated portscanner to 2.2.0 (#1960)

6a587ec fix readme's

91258ae Merge branch 'browser-sync-1946-esbuild'

f48d6b4 👋 app veyor

30c24dc Merge pull request #1947

9d24de5 drop webpack from UI

7a00341 build client with esbuild

c30868a v2.27.9

9b5fcdc fix(cli): Where's the command help? fixes #1929 (#1945)

Additional commits viewable in compare view


dependencies javascript
opened by dependabot[bot]

Releases (v0.4.3)

v0.4.3 (Jul 17, 2020)
Please download the appropriate package for your operating system and run the executable. You can then go to http://localhost:5000 and you should be able to see the web interface!
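If you script the step of launching the executable and opening http://localhost:5000, a small readiness check helps, since the server may need a moment to start. The helper below is a hypothetical sketch using only the Python standard library; the default URL comes from the release notes, while the timeout and polling interval are arbitrary choices.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:5000", timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until the server answers or `timeout` seconds elapse."""
    # Bypass any proxy configured in the environment: we are polling a local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval):
                return True  # got a 2xx/3xx response
        except urllib.error.HTTPError as err:
            err.close()
            return True  # an error status still means the server is up
        except OSError:
            time.sleep(interval)  # connection refused or timed out; retry
    return False
```

For example, calling wait_for_server(timeout=60) right after starting the executable returns True once the web interface is reachable, and False if nothing answers within a minute.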

Improvements

GitHub Action for automated Linux, macOS and Windows builds. #107 by arky.

The command-line interface can now be run using python -m excalibur. #97.

Excalibur now follows the Black code style! #98.
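Running a package with python -m works when the package ships a __main__.py module: Python executes that file as __main__ when the package name is passed to -m. The sketch below demonstrates the mechanism with a hypothetical throwaway package called democli; it does not reproduce Excalibur's own entry module.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_as_module(package: str, main_source: str) -> str:
    """Build a throwaway package containing __main__.py and run it with `python -m`."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = Path(root) / package
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        # For a package, `python -m <package>` executes <package>/__main__.py as __main__.
        (pkg_dir / "__main__.py").write_text(main_source)
        env = dict(os.environ, PYTHONPATH=root)  # make the package importable
        result = subprocess.run(
            [sys.executable, "-m", package],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


# "democli" is a hypothetical package standing in for a real CLI.
print(run_as_module("democli", 'import sys\nprint("called with", sys.argv[1:])\n'), end="")  # called with []
```

One practical upshot of the -m form is that the package runs under whichever interpreter you invoke, which helps when several Python installations are present or the console script is not on PATH.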

Bugfixes

#95 Fix Werkzeug import error. #96.

#41 Fix failed function call w.r.t. upstream changes. #42.

#32 Changing port in excalibur.cfg has no effect. #34.

Source code (tar.gz)
Source code (zip)
excalibur-macos-latest-x64.zip (82.70 MB)
excalibur-ubuntu-latest-x64.zip (93.88 MB)
excalibur-windows-latest-ia32.zip (43.37 MB)
excalibur-windows-latest-x64.zip (56.60 MB)
v0.4.0 (Nov 25, 2018)
This release adds a lot of UI enhancements (based on user feedback) and updates to the docs and website. Table autodetection is now even more awesome 🔥 since Camelot v0.4.0 adds an improved Stream table detection algorithm. To learn more, check out #206.

You can download excalibur-win-v0.4.0.zip (for Windows) or excalibur-linux-v0.4.0.zip (for Linux).

After extracting the contents to a folder, you need to execute arthur.exe (for Windows) or arthur (for Linux).

Now go to http://localhost:5000 and you should be able to see the web interface!

Source code (tar.gz)
Source code (zip)
excalibur-linux-v0.4.0.zip (71.54 MB)
excalibur-win-v0.4.0.zip (55.13 MB)
v0.3.0 (Nov 12, 2018)
This release adds enhancements to the static website, a rule manager to view, upload and download table extraction rules and an option to load a saved extraction rule on the workspace.

You can download excalibur-win-v0.3.0.zip (for Windows) or excalibur-linux-v0.3.0.zip (for Linux).

After extracting the contents to a folder, you need to execute arthur.exe (for Windows) or arthur (for Linux).

Now go to http://localhost:5000 and you should be able to see the web interface!

Source code (tar.gz)
Source code (zip)
excalibur-linux-v0.3.0.zip (57.15 MB)
excalibur-win-v0.3.0.zip (53.88 MB)
v0.2.1 (Nov 6, 2018)
This release adds a static website, an option to auto-detect tables on the web interface, and support for MySQL and Celery.

You can download excalibur-win-v0.2.1.zip (for Windows) or excalibur-linux-v0.2.1.zip (for Linux).

After extracting the contents to a folder, you need to execute arthur.exe (for Windows) or arthur (for Linux).

Now go to http://localhost:5000 and you should be able to see the web interface!

Source code (tar.gz)
Source code (zip)
excalibur-linux-v0.2.1.zip (57.25 MB)
excalibur-win-v0.2.1.zip (53.88 MB)
v0.1.1 (Oct 22, 2018)
This release adds executables for Windows and Linux and updates documentation.

You can download excalibur-win-v0.1.1.zip (for Windows) or excalibur-linux-v0.1.1.zip (for Linux).

After extracting the contents to a folder, you need to execute arthur.exe (for Windows) or arthur (for Linux).

Now go to http://localhost:5000 and you should be able to see the web interface!

Source code (tar.gz)
Source code (zip)
excalibur-linux-v0.1.1.zip (69.20 MB)
excalibur-win-v0.1.1.zip (64.58 MB)

