Pure JavaScript OCR for more than 100 Languages 📖🎉🖥

Overview

Tesseract.js


Version 2 is now available and under development in the master branch. Read a story about v2: Why I refactor tesseract.js v2?
Check the support/1.x branch for version 1.


Tesseract.js is a JavaScript library that gets words in almost any language out of images. (Demo)

Image Recognition

Video Real-time Recognition

Tesseract.js wraps an Emscripten port of the Tesseract OCR engine. It works in the browser using webpack or plain script tags with a CDN, and on the server with Node.js. After you install it, using it is as simple as:

import Tesseract from 'tesseract.js';

Tesseract.recognize(
  'https://tesseract.projectnaptha.com/img/eng_bw.png',
  'eng',
  { logger: m => console.log(m) }
).then(({ data: { text } }) => {
  console.log(text);
});

Or, more imperatively:

import { createWorker } from 'tesseract.js';

const worker = createWorker({
  logger: m => console.log(m)
});

(async () => {
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  await worker.terminate();
})();

Check out the docs for a full explanation of the API.

Major changes in v2

  • Upgrade to Tesseract v4.1.1 (using Emscripten 1.39.10 upstream)
  • Support multiple languages at the same time, e.g. eng+chi_tra for English and Traditional Chinese
  • Supported image formats: png, jpg, bmp, pbm
  • Support WebAssembly (falls back to asm.js when the browser doesn't support it)
  • Support TypeScript
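
Only the formats listed above are supported, so an application may want to check a file before passing it to recognize(). One way to do that is to look at the file's leading "magic" bytes. The sketch below is not part of tesseract.js. detectImageFormat is a hypothetical helper that only knows the PNG, JPEG, BMP and PBM (plain P1 and raw P4) signatures, so treat it as a heuristic rather than a full decoder check.

```javascript
// Hypothetical helper (not part of tesseract.js): identify an image by its
// leading "magic" bytes so unsupported files can be rejected before OCR.
// Accepts a Uint8Array (a Node.js Buffer is a Uint8Array subclass).
function detectImageFormat(bytes) {
  const startsWith = (sig) =>
    bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b);

  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'png';
  if (startsWith([0xff, 0xd8, 0xff])) return 'jpg';
  if (startsWith([0x42, 0x4d])) return 'bmp'; // ASCII "BM"
  // Netpbm bitmaps: "P1" (plain text) or "P4" (raw binary)
  if (startsWith([0x50, 0x31]) || startsWith([0x50, 0x34])) return 'pbm';
  return null; // unknown or unsupported (e.g. GIF, PDF)
}

// A PDF starts with "%PDF", which matches none of the signatures above.
const pdfHeader = new TextEncoder().encode('%PDF-1.7');
console.log(detectImageFormat(pdfHeader)); // null
```

In Node.js the bytes can come from fs.readFileSync(path), because a Buffer is a Uint8Array. In the browser they can come from new Uint8Array(await file.arrayBuffer()).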

Installation

Tesseract.js works with a <script> tag via local copy or CDN, with webpack via npm and on Node.js with npm/yarn.

CDN

<!-- v2 -->
<script src='https://unpkg.com/tesseract.js@2/dist/tesseract.min.js'></script>

<!-- v1 -->
<script src='https://unpkg.com/tesseract.js@1/src/index.js'></script>

After including the script, the Tesseract variable will be globally available.

Node.js

Tesseract.js currently requires Node.js v6.8.0 or higher.

# For v2
npm install tesseract.js
yarn add tesseract.js

# For v1
npm install tesseract.js@1
yarn add tesseract.js@1
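
An application that wants to fail fast on an unsupported runtime could enforce the minimum version stated above with a start-up check like the one below. This is a sketch, not a tesseract.js feature. MIN_NODE_VERSION and compareVersions are hypothetical names. The code is written in ES5 so that an older Node.js can still parse it and reach the check.

```javascript
// Hypothetical start-up guard (not part of tesseract.js): stop early when the
// running Node.js is older than the minimum version stated above.
// Written in ES5 so that old Node.js versions can still parse it.
var MIN_NODE_VERSION = '6.8.0';

// Compare two "major.minor.patch" strings numerically (not as text),
// returning a negative number, zero, or a positive number.
// Assumes plain numeric release versions such as "16.15.1".
function compareVersions(a, b) {
  var pa = a.split('.').map(Number);
  var pb = b.split('.').map(Number);
  for (var i = 0; i < 3; i += 1) {
    var diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// process.versions.node has no leading "v", e.g. "16.15.1".
if (compareVersions(process.versions.node, MIN_NODE_VERSION) < 0) {
  throw new Error('tesseract.js needs Node.js >= ' + MIN_NODE_VERSION +
    ', found ' + process.versions.node);
}
```

Comparing the parts numerically matters: as plain strings, "6.10.0" would sort before "6.8.0".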

Documentation

Use tesseract.js the way you like!

Contributing

Development

To run a development copy of Tesseract.js do the following:

# First we clone the repository
git clone https://github.com/naptha/tesseract.js.git
cd tesseract.js

# Then we install the dependencies
npm install

# And finally we start the development server
npm start

The development server will be available at http://localhost:3000/examples/browser/demo.html in your favorite browser. It will automatically rebuild tesseract.dev.js and worker.dev.js when you change files in the src folder.

Online Setup with a single Click

You can use Gitpod (a free, online, VS Code-like IDE) for contributing. With a single click it launches a ready-to-code workspace with the build and start scripts already running. Within a few seconds it spins up the dev server, so you can start contributing straight away.

Building Static Files

To build the compiled static files just execute the following:

npm run build

This will output the files into the dist directory.

Contributors

Code Contributors

This project exists thanks to all the people who contribute. [Contribute].

Financial Contributors

Become a financial contributor and help us sustain our community. [Contribute]

Individuals

Organizations

Support this project with your organization. Your logo will show up here with a link to your website. [Contribute]

Comments
  • react-native support?

    Hey guys! I wonder if you have considered bringing support for frameworks like React Native through Node. I was working on a Tesseract wrapper for React Native, but your library looks much better (considering that the wrapper is currently only implemented on Android).

    So I tried to create a test using yours, but I'm getting an error (shared as a screenshot).

    opened by jonathanpalma 30
  • TypeError: TesseractWorker is not a constructor

    const Worker = new TesseractWorker(); // For analyzing images

    TypeError: TesseractWorker is not a constructor
        at Object.<anonymous> (/Users/hyder/Desktop/OCR-PDF/app.js:6:15)
        at Module._compile (internal/modules/cjs/loader.js:774:30)
        at Object.Module._extensions..js (internal/modules/cjs/loader.js:785:10)
        at Module.load (internal/modules/cjs/loader.js:641:32)
        at Function.Module._load (internal/modules/cjs/loader.js:556:12)
        at Function.Module.runMain (internal/modules/cjs/loader.js:837:10)
        at internal/main/run_main_module.js:17:11
    [nodemon] app crashed - waiting for file changes before starting...

    Please suggest a solution.

    opened by hussainhyder23 23
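
    The error appears to come from an API change. TesseractWorker seems to belong to an early v2 pre-release, while the v2 API shown at the top of this page creates workers with createWorker() instead. Below is a minimal sketch of the reporter's use case with that API. extractText is a hypothetical helper. createWorker is passed in as a parameter rather than imported; that is a choice made for this sketch so it can run against a stub.

    ```javascript
    // Sketch: OCR one image with the v2 worker API shown at the top of the page
    // (load -> loadLanguage -> initialize -> recognize -> terminate).
    // createWorker is injected so the function can run against a stub; in real
    // code it would come from require('tesseract.js') (v2).
    async function extractText(createWorker, image, lang = 'eng') {
      const worker = createWorker(); // synchronous in v2
      try {
        await worker.load();
        await worker.loadLanguage(lang);
        await worker.initialize(lang);
        const { data: { text } } = await worker.recognize(image);
        return text;
      } finally {
        // Release the worker even when recognition fails.
        await worker.terminate();
      }
    }

    // Usage with tesseract.js v2 installed:
    // const { createWorker } = require('tesseract.js');
    // extractText(createWorker, 'image.png').then(console.log);
    ```

    The try/finally makes sure the worker is terminated whether recognition succeeds or throws.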
  • Cannot read property 'arrayBuffer' of undefined (Electron & React)

    Describe the bug: I've spent a few hours tonight trying to get tesseract.js working with an application I've been building. The stack is Electron & React, and I can't seem to get it to work. I've pulled both the Electron and React example applications and they seem to work fine. In my application, however, I'm bundling React inside Electron, which I think might be causing this issue.

    At first, my application wasn't loading the languages with the default setup, so I moved to the offline setup of tesseract.js. To do this, I used Webpack with copy-webpack-plugin to copy the files from node_modules to my build folder. That works fine, so then I created the worker like so:

    const worker = createWorker({
      cacheMethod: 'none',
      langPath: `http://localhost:3000/static/vendor/lang-data/eng.traineddata`,
      workerPath: `http://localhost:3000/static/vendor/worker.min.js`,
      corePath: `http://localhost:3000/static/vendor/tesseract-core.wasm.js`,
      logger: (m) => console.log(m),
    });
    

    Note: if I remove http://localhost:3000/, I get Uncaught DOMException: Failed to execute 'importScripts' on 'WorkerGlobalScope': The URL '/static/vendor/worker.min.js' is invalid.

    After running the application with the steps below, I get the following error: Uncaught (in promise) TypeError: Cannot read property 'arrayBuffer' of undefined. I've spent a few hours trying to debug this, but to no avail. The langPath, workerPath, and corePath all seem correct, and I can access them directly in the browser.

    I'm kind of stumped at this point, any help would be appreciated.

    To Reproduce

    Steps to reproduce the behavior:

    1. Go to 'https://github.com/karlhadwen/notes' - pull the repo
    2. yarn install & yarn dev
    3. Click the [+] button on the bottom left (with console open)
    4. See error (Cannot read property 'arrayBuffer' of undefined)

    Expected behavior

    To read the data from the image at 'http://localhost:3000/note.png', which is the example image.

    App.js: https://github.com/karlhadwen/notes/blob/master/src/App.js electron.js: https://github.com/karlhadwen/notes/blob/master/public/electron.js .webpack.config.js: https://github.com/karlhadwen/notes/blob/master/.webpack.config.js

    Desktop (please complete the following information):

    • OS: OS X (10.15.4)
    • Electron & Chrome: neither works
    • Version: ^2.1.1

    Additional context Repo where this is happening: https://github.com/karlhadwen/notes/

    opened by karlhadwen 20
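
    The thread does not record a resolution, and nothing here confirms the cause of the 'arrayBuffer' error. One thing worth checking: the langPath in the report points at the eng.traineddata file itself. Assume tesseract.js v2 builds the language URL as langPath + '/' + lang + '.traineddata', and appends '.gz' when its gzip option is true (the default). Under that assumption, langPath should name the folder, and gzip should be false if the copied file is uncompressed. The hypothetical traineddataUrl helper below only reproduces that assumed rule so a path can be checked in the console. It does not call tesseract.js.

    ```javascript
    // Hypothetical helper that applies the URL rule assumed above:
    // langPath + '/' + lang + '.traineddata', plus '.gz' when gzip is true.
    // It does not call tesseract.js; it only shows which URL would be requested.
    function traineddataUrl(langPath, lang, gzip = true) {
      return `${langPath}/${lang}.traineddata${gzip ? '.gz' : ''}`;
    }

    // The reporter's langPath already names the file, so the name is doubled:
    console.log(traineddataUrl('http://localhost:3000/static/vendor/lang-data/eng.traineddata', 'eng'));
    // http://localhost:3000/static/vendor/lang-data/eng.traineddata/eng.traineddata.gz

    // Pointing langPath at the folder, with gzip off for an uncompressed file:
    console.log(traineddataUrl('http://localhost:3000/static/vendor/lang-data', 'eng', false));
    // http://localhost:3000/static/vendor/lang-data/eng.traineddata
    ```

    If the assumption holds for your version, the worker options would become langPath: 'http://localhost:3000/static/vendor/lang-data', plus gzip: false when the file is not gzipped.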
  • Please call SetImage before attempting recognition.

    Describe the bug

    I am trying to use tesseract.js in Node.js, and can't seem to get it to work.

    To Reproduce

    My code looks like this:

    const tesseract = require('tesseract.js')
    
    const extractImageText = async filename => {
      const worker = tesseract.createWorker()
      await worker.load()
      await worker.loadLanguage('eng')
      await worker.initialize('eng')
      const { data: { text } } = await worker.recognize(filename)
      await worker.terminate()
      return text
    }
    
    extractImageText('test.pdf').then(console.log)
    

    I get this error:

    Error in pixReadMemGif: function not present
    Error in pixReadMem: gif: no pix returned
    Error in pixGetSpp: pix not defined
    Error in pixGetDimensions: pix not defined
    Error in pixGetColormap: pix not defined
    Error in pixCopy: pixs not defined
    Error in pixGetDepth: pix not defined
    Error in pixGetWpl: pix not defined
    Error in pixGetYRes: pix not defined
    Error in pixClone: pixs not defined
    Please call SetImage before attempting recognition.
    PROJECT/node_modules/tesseract.js/src/createWorker.js:173
            throw Error(data);
            ^
    
    Error: RuntimeError: function signature mismatch
        at ChildProcess.<anonymous> (PROJECT/node_modules/tesseract.js/src/createWorker.js:173:15)
        at ChildProcess.emit (events.js:209:13)
        at emit (internal/child_process.js:876:12)
        at processTicksAndRejections (internal/process/task_queues.js:77:11)
    

    Desktop (please complete the following information):

    • OS: OSX Catalina 10.15.5 (19F101)
    • Node: v12.9.1
    • Version: 2.1.1
    opened by konsumer 18
  • Incorrect header check at Zlib._handle.onerror (zlib.js:363:17)

    I'm trying to process an image that is saved locally on my Node server. I'm getting the following error:

    2017-06-17T16:12:45.087797+00:00 app[web.1]: File write complete-- /app/sample.png
    2017-06-17T16:12:46.065537+00:00 app[web.1]: pre-main prep time: 61 ms
    2017-06-17T16:12:46.114192+00:00 app[web.1]: events.js:154
    2017-06-17T16:12:46.114195+00:00 app[web.1]: throw er; // Unhandled 'error' event
    2017-06-17T16:12:46.114196+00:00 app[web.1]: ^
    2017-06-17T16:12:46.114197+00:00 app[web.1]:
    2017-06-17T16:12:46.114198+00:00 app[web.1]: Error: incorrect header check
    2017-06-17T16:12:46.114201+00:00 app[web.1]: at Zlib._handle.onerror (zlib.js:363:17)

    Here is my code:

    Tesseract.recognize(completeFilePath)
      .then(function (data) {
        console.log('Job completed');
      })
      .catch(function (err) {
        console.log('catch\n', err);
      })
      .finally(function (e) {
        console.log('Finally');
        // cleanup temp file
      });
    
    opened by abhisheksett 17
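
    zlib raises "incorrect header check" when it is asked to gunzip bytes that are not gzip data, for example an HTML error page or an uncompressed file. The thread does not pin down the cause here. One way to narrow it down is to check whether a downloaded or cached language file really starts with the gzip magic bytes 0x1f 0x8b. This is a sketch, not a tesseract.js API; isGzip and maybeGunzip are hypothetical helpers built on Node's standard zlib module.

    ```javascript
    const zlib = require('zlib');

    // Gzip data always starts with the two magic bytes 0x1f 0x8b.
    function isGzip(buf) {
      return buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
    }

    // Hypothetical helper: gunzip only when the bytes really are gzip data,
    // otherwise hand them back unchanged.
    function maybeGunzip(buf) {
      return isGzip(buf) ? zlib.gunzipSync(buf) : buf;
    }

    // Calling zlib.gunzipSync on non-gzip bytes throws a zlib error instead.
    const raw = Buffer.from('not gzip');
    console.log(isGzip(raw)); // false
    console.log(maybeGunzip(zlib.gzipSync(raw)).toString()); // not gzip
    ```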
  • Tesseractjs recognize Error: abort(undefined). Build with -s ASSERTIONS=1 for more info

    I'm using tesseract.js in a vue-cli 3 project to recognize numbers, with the language packages downloaded locally. After some tests, I got the runtime error "abort(undefined). Build with -s ASSERTIONS=1 for more info". I don't know what happened, since after I restarted the project tesseract ran successfully.

    Steps to reproduce the behavior:

    1. Start the project.
    2. dev_environment (checked means it performed well):
    • [ ] recognize some images on mobile (accessed by IP address)
    • [x] recognize some images on PC (accessed by localhost)
    • [ ] recognize some images on PC (accessed by IP address)
    3. See error: abort(undefined). Build with -s ASSERTIONS=1 for more info
    4. All is well.

    Expected behavior: I want to know whether this is a bug in tesseract.js and what I can do to avoid the same situation.

    Desktop (please complete the following information):

    • Browser: Chrome
    • Version: 78.0.3904.97
    opened by hello-astar 15
  • Current CDN Example Not Working

    Hi. I'm trying to conduct a very simple test using just a single HTML file, including the tesseract.js script from the CDN source given in the documentation:

    <script src='https://cdn.rawgit.com/naptha/tesseract.js/1.0.10/dist/tesseract.js'></script>
    

    My HTML file is simple:

    <html>
        <head>
            <script src='https://cdn.rawgit.com/naptha/tesseract.js/1.0.10/dist/tesseract.js'></script>
            <title>Tesseract Test</title>
        </head>
        <body>
            <label for="fileInput">Choose File to OCR:</label>
            <input type="file" id="fileInput" name="fileInput"/>
            <br />
            <br />
            <div id="document-content">
            </div>
        </body>
        <script>
            document.addEventListener('DOMContentLoaded', function(){
                var fileInput = document.getElementById('fileInput');
                fileInput.addEventListener('change', handleInputChange);
            });
    
            function handleInputChange(event){
                var input = event.target;
                var file = input.files[0];
                console.log(file);
                Tesseract.recognize(file)
                    .progress(function(message){
                        console.log(message);
                    })
                    .then(function(result){
                        var contentArea = document.getElementById('document-content');
                        console.log(result);
                    })
                    .catch(function(err){
                        console.error(err);
                    });
            }
        </script>
    </html>
    

    But if I try to add an image, nothing happens in the console or anywhere else. This is also true if I clone the repository and instead load tesseract.js from the dist directory.

    I see that the main (non-GitHub) website for the project uses CDN version 1.0.7, so I tried that source instead. It came to life and started reporting progress, but then threw the following error:

    tesseract_example.html:27 Object {status: "loading tesseract core", progress: 0}
    tesseract_example.html:27 Object {status: "loading tesseract core", progress: 1}
    tesseract_example.html:27 Object {status: "initializing tesseract", progress: 0}
    index.js:10 pre-main prep time: 65 ms
    tesseract_example.html:27 Object {status: "initializing tesseract", progress: 1}
    worker.js:11953 Uncaught DOMException: Failed to execute 'postMessage' on 'DedicatedWorkerGlobalScope': An object could not be cloned.(…)
    
    (anonymous function)	@	worker.js:11953
    respond	@	worker.js:12185
    dispatchHandlers	@	worker.js:12205
    (anonymous function)	@	worker.js:11952
    
    

    Am I just doing this wrong somehow?

    (Using Chrome 54 in OSX 10.11.)

    opened by darth-cheney 15
  • Working with Tesseract.js with custom language and without internet connection

    Hey,

    I wonder whether it's possible to use tesseract.js in a mobile app with a custom traineddata file. Also, is it possible to use it offline, locally on the mobile device without an internet connection?

    Thanks.

    opened by caspx 14
  • Tesseract couldn't load any languages!

    Hey folks, I'm just trying out tesseract.js and seem to be missing something... I've installed it via npm and am trying to run what is basically the simple example in Node 7:

    const Tesseract = require('tesseract.js');
    const image = require('path').resolve(__dirname, 'test.jpeg')
    
    Tesseract.recognize(image)
    .then(data => console.log('then\n', data.text))
    .catch(err => console.log('catch\n', err))
    .finally(e => {
      console.log('finally\n');
      process.exit();
    });
    

    Running this file the first time generated this error:

    // progress { status: 'loading tesseract core' }
    // progress { status: 'loaded tesseract core' }
    // progress { status: 'initializing tesseract', progress: 0 }
    // pre-main prep time: 131 ms
    // progress { status: 'initializing tesseract', progress: 1 }
    // progress { status: 'downloading eng.traineddata.gz',
    //   loaded: 116,
    //   progress: 0.000012270517521770119 }
    // events.js:160
    //       throw er; // Unhandled 'error' event
    //       ^
    
    // Error: incorrect header check
    //     at Zlib._handle.onerror (zlib.js:356:17)
    
    // SECOND ERROR
    // AdaptedTemplates != NULL:Error:Assert failed:in file ../classify/adaptmatch.cpp, line 190
    

    Subsequent running of the file results in this error:

    pre-main prep time: 83 ms
    Failed loading language 'eng'
    Tesseract couldn't load any languages!
    AdaptedTemplates != NULL:Error:Assert failed:in file ../classify/adaptmatch.cpp, line 190
    
    /Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:4
    function f(a){throw a;}var h=void 0,i=!0,j=null,k=!1;function aa(){return function(){}}function ba(a){return function(){return a}}var n,Module;Module||(Module=eval("(function() { try { return TesseractCore || {} } catch(e) { return {} } })()"));var ca={},da;for(da in Module)Module.hasOwnProperty(da)&&(ca[da]=Module[da]);var ea=i,fa=!ea&&i;
                  ^
    abort() at Error
        at Na (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:32:26)
        at Object.ka [as abort] (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:507:108)
        at _abort (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:373:173)
        at $L (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:383:55709)
        at jpa (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:388:22274)
        at lT (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:387:80568)
        at mT (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:387:80700)
        at Array.BS (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:387:69011)
        at bP (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:383:110121)
        at jT (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:387:80280)
    If this abort() is unexpected, build with -s ASSERTIONS=1 which can give more information.
    

    What am I missing?

    opened by intersim 13
  • Failed loading language 'eng'

    Because I couldn't get the traineddata from the CDN, I downloaded the data into my repository and tried to load it, but that failed.

    I used langPath to load it from local storage, and I don't know why it fails.

    This is my JavaScript code:

    var Tesseract = require('tesseract.js');
    const path = require("path");
    var imagePath = path.join(__dirname, "jake.jpg");

    Tesseract.create({
      langPath: path.join(__dirname, "langs")
    }).recognize(imagePath, { lang: "eng" })
      .then((result) => console.log(result.text));
    

    My traineddata is in the 'langs' folder. Both the image and the langs folder are in the same folder as the JavaScript file above.

    opened by frontalnh 12
  • Worker loading language traineddata progress 0

    Describe the bug: Using basic example code, I'm unable to extract text from an image.

    Object { status: "loading tesseract core", progress: 0 }
    Object { status: "loading tesseract core", progress: 1 }
    Object { workerId: "Worker-0-ac418", status: "initializing tesseract", progress: 0 }
    Object { workerId: "Worker-0-ac418", status: "initialized tesseract", progress: 1 }
    Object { workerId: "Worker-0-ac418", status: "loading language traineddata", progress: 0 }
    

    After this point, nothing happens.

    To Reproduce

    <template>
        <div>
            <button v-on:click="recognize">recognize</button>
        </div>
    </template>
    
    <script>
    import { createWorker } from "tesseract.js";
    
    const worker = createWorker({
        logger: m => console.log(m)
    });
    
    export default {
        name: "ocr-reader",
    
        methods: {
            "recognize": async function() {
                await worker.load();
                await worker.loadLanguage("eng");
                await worker.initialize("eng");
                const {
                    data: { text }
                } = await worker.recognize("http://localhost:8000/WEZK8.png");
                console.log(text);
                await worker.terminate();
            }
        }
    };
    </script>
    
    

    This is the simplest possible Vue component.

    Expected behavior: I expect to see the recognized text in the console.

    Additional context: I'm testing on localhost. I checked that everything loads correctly; even the traineddata file downloads correctly with status 200.

    opened by IAndreaGiuseppe 11
  • Error in PixCreateNoInit: pixdata_malloc fail for data

    When processing an image of 3k x 4k pixels I'm receiving the following error:

    Error in pixCreateNoInit: pixdata_malloc fail for data
    node:internal/event_target:1011
       process.nextTick(()=> { throw err; });
    
    Error: RuntimeError: null function or function signature mismatch
       at Worker.<anonymous> (/usr/src/app/node_modules/tesseract.js/src/createWorker.js:173:15)
    ...
    

    This occurs with Tesseract.js 3.0.2 on Node.js 16.15.1 on Debian 11 inside Docker. This only appears to happen for large images. Do you know how to resolve this?

    opened by jasondalycan 3
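
    The thread does not record a fix. The failure is a memory allocation for the image data (pixdata_malloc) and only appears for large images, so one workaround to try is to downscale such images before recognition. That is an assumption, not something confirmed in the issue. fitWithinPixelBudget is a hypothetical helper that only computes the target size; the resize itself would be done with a canvas or an image library. The 4 MP budget in the example is an arbitrary placeholder, not a measured limit of tesseract.js.

    ```javascript
    // Hypothetical helper: compute a scaled-down size whose pixel count stays
    // within maxPixels while preserving the aspect ratio. Images that already
    // fit are returned unchanged (never upscaled).
    function fitWithinPixelBudget(width, height, maxPixels) {
      const pixels = width * height;
      if (pixels <= maxPixels) return { width, height, scale: 1 };
      const scale = Math.sqrt(maxPixels / pixels);
      return {
        // Flooring keeps width * height at or below the budget.
        width: Math.max(1, Math.floor(width * scale)),
        height: Math.max(1, Math.floor(height * scale)),
        scale,
      };
    }

    // A 3k x 4k image (12 MP) squeezed into a placeholder 4 MP budget:
    const size = fitWithinPixelBudget(3000, 4000, 4000000);
    console.log(size.width, size.height); // 1732 2309
    ```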
  • Exposing the character classifier

    I have a very niche use case where I want to recognize one-letter characters or symbols.

    As my understanding of OCR goes, once lines are isolated, characters are isolated and then the isolated characters are passed through a character classifier.

    Right now, passing single characters through normal Tesseract has two failure modes. It either mistakes a single isolated character for multiple different characters in the same image, or it detects lines in a way that breaks the character classifier.

    Example: one screenshot gives the correct result of "testing, testing,", while another gives the result "Bl", presumably because of the line detection or character isolation.

    Would it be possible to expose the character classification portion of Tesseract and skip the other portions?

    opened by Nomlax 1
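
    Nothing on this page shows tesseract.js exposing the classifier on its own. A related option in Tesseract itself is the page segmentation mode. Mode 10 treats the whole image as a single character, which should skip most of the line and word segmentation that seems to go wrong here. The sketch below assumes the worker.setParameters() call and the tessedit_pageseg_mode parameter. tesseract.js also exports PSM constants, where SINGLE_CHAR is expected to be '10'. recognizeSingleChar is a hypothetical helper that expects an already initialized worker.

    ```javascript
    // Tesseract page segmentation mode 10: "treat the image as a single character".
    const PSM_SINGLE_CHAR = '10';

    // Hypothetical helper: recognize one character with an initialized worker.
    async function recognizeSingleChar(worker, image) {
      // Switch segmentation mode before recognizing; the value is a string.
      await worker.setParameters({ tessedit_pageseg_mode: PSM_SINGLE_CHAR });
      const { data: { text } } = await worker.recognize(image);
      return text.trim(); // drop the trailing newline Tesseract usually adds
    }
    ```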
  • Bump express from 4.17.1 to 4.17.3

    Bumps express from 4.17.1 to 4.17.3.

    Release notes

    Sourced from express's releases.

    4.17.3

    4.17.2

    Changelog

    Sourced from express's changelog.

    4.17.3 / 2022-02-16

    4.17.2 / 2021-12-16

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Cannot initialize worker because of out of memory error on WASM

    Hi, I'm currently facing an issue on a mobile phone using 32-bit Chrome (64-bit works fine):

    • Device: Samsung Galaxy S20 FE
    • OS: Android 12
    • Browser Google Chrome: 108.0.5359.79 (Official Build) (32-bit) Revision: 5194e1e1073e30a8fc93c72c2aee4bc572f5b07a-refs/branch-heads/5359_61 OS: Android 12; SM-G780F Build/SP1A.210812.016 JavaScript: V8 10.8.168.21

    Library: Tesseract.js installed using npm ("tesseract.js": "^4.0.0")

    When using tesseract.js from a React app, it sometimes hangs on initialization. Changing line 40 in src/worker-script/index.js to:

    Core({
          TesseractProgress(percent) {
            latestJob.progress({
              workerId,
              jobId,
              status: 'recognizing text',
              progress: Math.max(0, (percent - 30) / 70),
            });
          },
        }).then((tessModule) => {
          TessModule = tessModule;
          res.progress({ workerId, status: 'initialized tesseract', progress: 1 });
          res.resolve({ loaded: true });
        }).catch((err) => {
          console.error(err);
          res.reject(err.toString());
        });
    

    reveals a memory problem: the promise rejects with the message:

    RuntimeError: Aborted(RangeError: WebAssembly.instantiate(): Out of memory: Cannot allocate Wasm memory for new instance). Build with -sASSERTIONS for more info.
    
    opened by fmonpelat 9
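
    This does not lift the memory limits of a 32-bit browser. An app can, however, probe for WebAssembly memory before creating a worker and fall back gracefully (for example, by showing a message) instead of hanging. Below is a minimal sketch. canAllocateWasmMemory is hypothetical. The WebAssembly.Memory constructor is standard and throws a RangeError when a request is invalid or cannot be met. A successful probe is only a snapshot, so a later instantiation can still fail. How much memory to test for is left to the caller; no figure for tesseract.js's actual requirement is assumed here.

    ```javascript
    // Hypothetical pre-flight check (not part of tesseract.js).
    // WebAssembly memory is sized in 64 KiB pages.
    const WASM_PAGE_BYTES = 64 * 1024;

    function canAllocateWasmMemory(bytes) {
      const pages = Math.ceil(bytes / WASM_PAGE_BYTES);
      try {
        // The probe is dropped right away and left to the garbage collector.
        const probe = new WebAssembly.Memory({ initial: pages });
        return probe.buffer.byteLength >= bytes;
      } catch (err) {
        if (err instanceof RangeError) return false; // too large or unavailable
        throw err; // anything else is unexpected, so surface it
      }
    }
    ```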
  • Cannot run worker on electron renderer process

    I'm using react-electron-boilerplate and installed tesseract.js with yarn add tesseract.js. Then I created a worker like the snippet below. It doesn't work in the renderer process, but when I move it to the main process, it works.

    • Expected: tesseract run on renderer process
    • Actually: tesseract run on main process only

    main.ts

    ipcMain.on('translate', async (event, args) => {
      const worker = createWorker();
      await worker.load();
      await worker.loadLanguage('eng');
      await worker.initialize('eng');
      const { data } = await worker.recognize(
        path.join(__dirname, '../../assets/icons/eng_bw.png')
      );
      console.log(data.text); // text lives on the recognition result's data
    });
    

    App.ts

    const start = async () => {
      const worker = createWorker();
      await worker.load();
      await worker.loadLanguage('eng');
      await worker.initialize('eng');
      const { data } = await worker.recognize('../../assets/icons/eng_bw.png');
      console.log(data);
    };

    start(); // won't work
    

    Bonus: I'm trying to implement real-time translation (video translation), and I need to do the OCR in the renderer process.

    Does anyone know a workaround?

    opened by vuggy17 1
  • Bump qs from 6.5.2 to 6.5.3

    Bumps qs from 6.5.2 to 6.5.3.

    Changelog

    Sourced from qs's changelog.

    6.5.3

    • [Fix] parse: ignore __proto__ keys (#428)
    • [Fix] utils.merge: avoid a crash with a null target and a truthy non-array source
    • [Fix] correctly parse nested arrays
    • [Fix] stringify: fix a crash with strictNullHandling and a custom filter/serializeDate (#279)
    • [Fix] utils: merge: fix crash when source is a truthy primitive & no options are provided
    • [Fix] when parseArrays is false, properly handle keys ending in []
    • [Fix] fix for an impossible situation: when the formatter is called with a non-string value
    • [Fix] utils.merge: avoid a crash with a null target and an array source
    • [Refactor] utils: reduce observable [[Get]]s
    • [Refactor] use cached Array.isArray
    • [Refactor] stringify: Avoid arr = arr.concat(...), push to the existing instance (#269)
    • [Refactor] parse: only need to reassign the var once
    • [Robustness] stringify: avoid relying on a global undefined (#427)
    • [readme] remove travis badge; add github actions/codecov badges; update URLs
    • [Docs] Clean up license text so it’s properly detected as BSD-3-Clause
    • [Docs] Clarify the need for "arrayLimit" option
    • [meta] fix README.md (#399)
    • [meta] add FUNDING.yml
    • [actions] backport actions from main
    • [Tests] always use String(x) over x.toString()
    • [Tests] remove nonexistent tape option
    • [Dev Deps] backport from main
    Commits
    • 298bfa5 v6.5.3
    • ed0f5dc [Fix] parse: ignore __proto__ keys (#428)
    • 691e739 [Robustness] stringify: avoid relying on a global undefined (#427)
    • 1072d57 [readme] remove travis badge; add github actions/codecov badges; update URLs
    • 12ac1c4 [meta] fix README.md (#399)
    • 0338716 [actions] backport actions from main
    • 5639c20 Clean up license text so it’s properly detected as BSD-3-Clause
    • 51b8a0b add FUNDING.yml
    • 45f6759 [Fix] fix for an impossible situation: when the formatter is called with a no...
    • f814a7f [Dev Deps] backport from main
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
Releases(v4.0.2)
  • v4.0.2(Dec 18, 2022)

    What's Changed

    • Fixed bug breaking compatibility with certain devices (#701)

    Full Changelog: https://github.com/naptha/tesseract.js/compare/v4.0.1...v4.0.2

    Source code(tar.gz)
    Source code(zip)
  • v4.0.1(Dec 10, 2022)

    What's Changed

    • Running recognize or detect with an invalid image argument now throws an error (#699)
    • Fixed bug with custom langdata paths (#697)

    New Contributors

    • @fmonpelat made their first contribution in https://github.com/naptha/tesseract.js/pull/697

    Full Changelog: https://github.com/naptha/tesseract.js/compare/v4.0.0...v4.0.1

    Source code(tar.gz)
    Source code(zip)
  • v4.0.0(Nov 25, 2022)

    Breaking Changes

    1. createWorker is now async
      1. In most code this means worker = Tesseract.createWorker() should be replaced with worker = await Tesseract.createWorker()
      2. Calling with invalid workerPath or corePath now produces error/rejected promise (#654)
    2. worker.load is no longer needed (createWorker now returns worker pre-loaded)
    3. getPDF function replaced by pdf recognize option (#488)
      1. This allows PDFs to be created when using a scheduler
      2. See browser and node examples for usage
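
    The breaking changes above can be illustrated with a minimal sketch of a one-shot OCR helper written against the v4 calling sequence. `recognizeOnce` is a hypothetical name, not part of tesseract.js. The worker factory is passed in as a parameter (in an application that would be `Tesseract.createWorker`) so the call order can be exercised without downloading language data. The sketch assumes the v4 worker still exposes `loadLanguage`, `initialize`, `recognize`, and `terminate`, as in the README example.

```javascript
// Hypothetical helper illustrating the v4 worker lifecycle.
// `createWorker` is injected; in an application this would be
// Tesseract.createWorker from tesseract.js v4 (an assumption of this sketch).
async function recognizeOnce(createWorker, image, lang = 'eng') {
  // v4: createWorker is async and returns an already-loaded worker,
  // so there is no `await worker.load()` step any more.
  const worker = await createWorker();
  try {
    await worker.loadLanguage(lang);
    await worker.initialize(lang);
    const { data: { text } } = await worker.recognize(image);
    return text;
  } finally {
    // Release the worker even if loading or recognition throws.
    await worker.terminate();
  }
}
```

    For example: `const text = await recognizeOnce(Tesseract.createWorker, 'https://tesseract.projectnaptha.com/img/eng_bw.png');`. Terminating inside `finally` is a design choice so that a failed job does not leave a worker running.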

    Major New Features

    1. Processed images created by Tesseract can be retrieved using imageColor, imageGrey, and imageBinary options (#588)
      1. See image-processing.html example for usage
    2. Image rotation options rotateAuto and rotateRadians have been added, which significantly improve accuracy on certain documents
      1. See Issue #648 for an example of how auto-rotation improves accuracy
      2. See image-processing.html example for usage of rotateAuto option
    3. Tesseract parameters (usually set using worker.setParameters) can now be set for single jobs using worker.recognize options (#665)
      1. For example, a single job can be set to recognize only numbers using worker.recognize(image, {tessedit_char_whitelist: "0123456789"})
      2. As these settings are reverted after the job, this allows for using different parameters for specific jobs when working with schedulers
    4. Initialization parameters (e.g. load_system_dawg, load_number_dawg, and load_punc_dawg) can now be set (#613)
      1. The third argument to worker.initialize now accepts either (1) an object with key/value pairs or (2) a string containing contents to write to a config file
      2. For example, both of these lines set load_number_dawg to 0:
        1. worker.initialize('eng', "0", {load_number_dawg: "0"});
        2. worker.initialize('eng', "0", "load_number_dawg 0");
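
    The two `initialize` calls above are described as equivalent: one passes key/value pairs, the other the text of a Tesseract config file, in which each line holds a parameter name followed by whitespace and its value. The sketch below makes that correspondence explicit by turning an object into config-file text. `toTesseractConfig` is a hypothetical helper, not part of tesseract.js, and the validation it performs is a conservative assumption about the file format.

```javascript
// Hypothetical helper (not part of tesseract.js): serialize parameters as
// Tesseract config-file text, one "name value" pair per line.
function toTesseractConfig(params) {
  return Object.entries(params)
    .map(([name, value]) => {
      const v = String(value);
      // The name ends at the first space or tab, so it must not contain
      // whitespace; a line break inside the value would start a new entry.
      if (name === '' || /\s/.test(name) || /[\r\n]/.test(v)) {
        throw new Error(`Cannot write parameter "${name}" to a config file`);
      }
      return `${name} ${v}`;
    })
    .join('\n');
}
```

    With this helper, `worker.initialize('eng', "0", toTesseractConfig({ load_number_dawg: "0" }))` should behave like the object form shown above.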

    Other Changes

    1. loadLanguage now resolves without error when language is loaded but writing to cache fails
      1. This allows for running in Firefox incognito mode using default settings (#609)
    2. detect returns null values when OS detection fails rather than throwing an error (#526)
    3. Memory leak causing crashes fixed (#678)
    4. Cache corruption should now be much less common (#666)
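
    Because `detect` now reports a failed OS detection through null fields instead of rejecting, callers need to check for those nulls themselves. The sketch below assumes the result's `data` object carries `script` and `orientation_degrees` fields (the field names are an assumption; check the API docs for your version). `summarizeDetection` is a hypothetical helper.

```javascript
// Hypothetical helper: turn the `data` object from worker.detect() into a
// summary, treating null fields (OS detection failed) as "unknown"
// instead of as an error.
// ASSUMPTION: the result exposes `script` and `orientation_degrees`.
function summarizeDetection(data) {
  if (data == null || data.script == null || data.orientation_degrees == null) {
    return { ok: false, script: 'unknown', rotation: 0 };
  }
  return { ok: true, script: data.script, rotation: data.orientation_degrees };
}
```

    Typical use would be `const { data } = await worker.detect(image); const summary = summarizeDetection(data);`, falling back to no rotation when `summary.ok` is false.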

    New Contributors

    • @reda-alaoui made their first contribution in https://github.com/naptha/tesseract.js/pull/570

    Full Changelog: https://github.com/naptha/tesseract.js/compare/v3.0.3...v4.0.0

    Source code(tar.gz)
    Source code(zip)
  • v3.0.3(Sep 20, 2022)

    What's Changed

    • Invalid language data now throws error at initialize step (#602)
    • Recognition progress logging fixed (#655)
    • Minor changes to types, documentation

    Full Changelog: https://github.com/naptha/tesseract.js/compare/v3.0.2...v3.0.3

    Source code(tar.gz)
    Source code(zip)
  • v3.0.2(Aug 20, 2022)

    What's Changed

    • Updated to tesseract.js-core v3.0.1 (uses Tesseract v5.1.0)
    • Added SIMD-enabled build, automatic detection of supported devices
    • Fix caching of bad langData responses by @andreialecu in https://github.com/naptha/tesseract.js/pull/585
    • Added benchmark code and assets per #628 by @Balearica in https://github.com/naptha/tesseract.js/pull/629
    • Replaced child_process with worker_threads per #630 by @Balearica in https://github.com/naptha/tesseract.js/pull/631
    • Updated to webpack 5 for compatibility with Node.js 18 by @Balearica in https://github.com/naptha/tesseract.js/pull/640
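
    The SIMD-enabled build relies on detecting whether the runtime supports WebAssembly SIMD. One common technique, sketched here as an assumption rather than as tesseract.js's exact code, is to ask the standard `WebAssembly.validate` function whether a tiny module containing SIMD instructions is valid. The byte sequence follows the approach of the `wasm-feature-detect` package; `SIMD_PROBE` and `supportsWasmSimd` are hypothetical names.

```javascript
// Minimal WebAssembly module with one function that builds a v128 value using
// i8x16.splat and i8x16.popcnt. It only validates if the engine supports
// fixed-width SIMD.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, // "\0asm" magic, version 1
  1, 5, 1, 96, 0, 1, 123,      // type section: one type, () -> v128
  3, 2, 1, 0,                  // function section: one function of type 0
  10, 10, 1, 8, 0,             // code section: one 8-byte body, no locals
  65, 0,                       // i32.const 0
  253, 15,                     // i8x16.splat
  253, 98,                     // i8x16.popcnt
  11,                          // end
]);

function supportsWasmSimd() {
  // Without WebAssembly at all, neither build can run.
  if (typeof WebAssembly !== 'object' || typeof WebAssembly.validate !== 'function') {
    return false;
  }
  // validate() returns false (rather than throwing) for modules it rejects.
  return WebAssembly.validate(SIMD_PROBE);
}
```

    A loader could then pick between a SIMD and a non-SIMD core build based on `supportsWasmSimd()`; the actual file names and selection logic in tesseract.js are not shown here.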

    New Contributors

    • @andreialecu made their first contribution in https://github.com/naptha/tesseract.js/pull/585
    • @SusanDoggie made their first contribution in https://github.com/naptha/tesseract.js/pull/621

    Full Changelog: https://github.com/naptha/tesseract.js/compare/v2.1.5...v3.0.2

    Source code(tar.gz)
    Source code(zip)
  • v2.1.5(Aug 2, 2021)

    • Add language constants (thanks to @stonefruit)
    • Add user job id to logger (thanks to @miguelm3)
    • Fix env selection bug in electron (thanks to @LoginovIlya)
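
    The `logger` option receives progress messages, which the README example simply prints. The sketch below assumes each message carries `status`, `progress` (a fraction from 0 to 1), and `jobId`, and prefers the user job id added in this release when present. The field name `userJobId` is an assumption, as is the hypothetical `makeProgressLogger` factory.

```javascript
// Hypothetical logger factory: reports each message and keeps the latest
// status/progress per job so concurrent jobs can be told apart.
// ASSUMPTION: messages look like { jobId, userJobId?, status, progress }.
function makeProgressLogger(report = console.log) {
  const latest = new Map();
  const logger = (m) => {
    // Prefer the caller-supplied job id, then the internal one.
    const id = m.userJobId ?? m.jobId ?? 'unknown';
    const pct = Math.round((m.progress ?? 0) * 100);
    latest.set(id, { status: m.status, pct });
    report(`[${id}] ${m.status} ${pct}%`);
  };
  return { logger, latest };
}
```

    Usage mirrors the README: `const { logger } = makeProgressLogger(); Tesseract.recognize(image, 'eng', { logger });`.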
    Source code(tar.gz)
    Source code(zip)
  • v2.1.4(Oct 14, 2020)

    • Fix Electron WebView (thanks to @CedricCouton)
    • Fix security vulnerabilities by upgrading packages
    • Migrate from Travis CI to GitHub Actions
    • Add CodeQL scanning
    Source code(tar.gz)
    Source code(zip)
  • v2.1.3(Sep 15, 2020)

  • v2.1.2(Sep 10, 2020)

  • v2.1.1(Mar 25, 2020)

  • v2.1.0(Mar 20, 2020)

    Major Changes

    • Upgrade to emscripten 1.39.10 upstream
    • Upgrade to tesseract v4.1.1
    • Add FS functions: writeText, readText, removeFile, FS

    Minor Changes

    • Fix errors in TypeScript definition file (src/index.d.ts)
    • Verify user_defined_dpi in Worker.setParameters
    • Update Gitpod configuration
    • Fix security issues with npm audit fix --force
    Source code(tar.gz)
    Source code(zip)
  • v2.0.2(Jan 2, 2020)

  • v2.0.1(Dec 23, 2019)

    Major Changes:

    • Add tesseract.js logo
    • Update Worker.recognize() API: only one rectangle can now be passed in options
    • Add Electron support, example repo
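
    Since `recognize()` takes a single rectangle per call after this change, reading several regions of one image means one call per region. A minimal sketch, assuming the option shape `{ rectangle: { left, top, width, height } }` and an already initialized `worker`; `recognizeRegions` is a hypothetical helper.

```javascript
// Hypothetical helper: OCR several regions of one image, passing one
// rectangle per recognize() call. Each call is awaited before the next,
// so regions are processed one at a time on this worker.
async function recognizeRegions(worker, image, rectangles) {
  const texts = [];
  for (const rectangle of rectangles) {
    const { data: { text } } = await worker.recognize(image, { rectangle });
    texts.push(text);
  }
  return texts;
}
```

    Processing regions in parallel would need more than one worker, for example managed by a scheduler.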

    Minor Changes:

    Source code(tar.gz)
    Source code(zip)
  • v2.0.0(Dec 19, 2019)

  • v2.0.0-beta.2(Oct 28, 2019)

  • v2.0.0-beta.1(Dec 18, 2019)

    Breaking Changes:

    • Refactor core APIs
      • Rewrite APIs to be more imperative
      • Add Scheduler
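
    The idea behind the Scheduler is to spread queued jobs across a pool of workers. The toy sketch below is not tesseract.js's implementation; it only mirrors the shape of the real API (`Tesseract.createScheduler()`, `scheduler.addWorker(worker)`, `scheduler.addJob('recognize', image)`) to show how idle workers pull jobs from a shared queue. `ToyScheduler` is hypothetical, and its "workers" are any objects with async methods named after the job action.

```javascript
// Toy scheduler (illustration only): jobs wait in a FIFO queue and each idle
// worker takes the next job as soon as it finishes its current one.
class ToyScheduler {
  constructor() {
    this.idle = [];  // workers not currently running a job
    this.queue = []; // pending { action, args, resolve, reject }
  }

  addWorker(worker) {
    this.idle.push(worker);
    this.dispatch();
  }

  addJob(action, ...args) {
    return new Promise((resolve, reject) => {
      this.queue.push({ action, args, resolve, reject });
      this.dispatch();
    });
  }

  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.shift();
      const job = this.queue.shift();
      Promise.resolve()
        .then(() => worker[job.action](...job.args))
        .then(job.resolve, job.reject)
        .finally(() => {
          // Return the worker to the pool and start any waiting job.
          this.idle.push(worker);
          this.dispatch();
        });
    }
  }
}
```

    With two workers and three jobs, the first two jobs start immediately and the third runs on whichever worker frees up first.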

    Minor Changes:

    • Update index.d.ts to support TypeScript
    Source code(tar.gz)
    Source code(zip)
  • v2.0.0-alpha.16(Dec 18, 2019)

    Minor Changes:

    • Add workerBlobURL option to allow loading worker script without Blob
    • Remove node-fetch
    • Add isBrowser to resolve the "DOMException undefined" issue in Node.js
    • Upgrade to tesseract.js-core v2.0.0-beta.12
    Source code(tar.gz)
    Source code(zip)
  • v2.0.0-alpha.15(Aug 25, 2019)

  • v2.0.0-alpha.13(Jul 24, 2019)

  • v2.0.0-alpha.12(Jul 16, 2019)

    Thanks to @nisarhassan12, @iscoolapp, @tmcw and @monkeywithacupcake for their contributions!

    Breaking Changes:

    Major update:

    • You can now recognize only part of an image (see example)
    • detect() and recognize() are now thenable (see example)

    New features:

    • Add Open Collective
    • Add Gitpod support
    Source code(tar.gz)
    Source code(zip)
  • v2.0.0-alpha.11(Jul 2, 2019)

    Thanks to @HoldYourWaffle, @tmcw and @antonrifco for PRs. Major changes:

    • Default OCR Engine Mode changed to LSTM_ONLY

    Minor changes:

    • Remove LFS support
    • Enable new image formats
      • Buffer (Node.js only)
      • Base64 string
    • Fix process issue (for angular.js)
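
    For the "Base64 string" input, one option is a data URL. The sketch below assumes tesseract.js recognizes base64 input in data-URL form (`data:image/png;base64,...`); check the docs for your version. `toImageDataURL` is a hypothetical Node.js helper, and under Node.js the `Buffer` itself could also be passed directly.

```javascript
// Hypothetical helper (Node.js): wrap raw image bytes in a base64 data URL.
// ASSUMPTION: base64 image input is accepted in data-URL form.
function toImageDataURL(bytes, mimeType = 'image/png') {
  if (!/^image\/[a-z0-9.+-]+$/i.test(mimeType)) {
    throw new Error(`Not an image MIME type: ${mimeType}`);
  }
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}
```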
    Source code(tar.gz)
    Source code(zip)
  • v2.0.0-alpha.9(Jun 2, 2019)

  • v2.0.0-alpha.10(Jun 2, 2019)

  • v2.0.0-alpha.8(May 22, 2019)

    • Ignore DOMException in loadLang for now (a weird issue in Google Chrome)
    • Update documentation on how to train your own data
    • Minor restructuring of the tests folder
    • Fix lint error
    • Add babel-loader to transpile code to pure es5 (for browser compatibility)
    Source code(tar.gz)
    Source code(zip)
  • v2.0.0-alpha.7(May 18, 2019)

  • v2.0.0-alpha.6(May 17, 2019)

  • v1.0.19(May 14, 2019)

  • v2.0.0-alpha.3(May 14, 2019)

  • v2.0.0-alpha.2(May 14, 2019)

  • v2.0.0-alpha.1(May 14, 2019)

Owner
Project Naptha
highlight, copy, search, edit and translate text in any image