One-click translation of text in all kinds of images

Last update: Dec 28, 2022

Related tags

Computer Vision manga-image-translator

Overview

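One-click translation of text in all kinds of images

Designed for the large number of images in group chats and on image boards that are unlikely ever to be translated by anyone, so that a Japanese beginner like me can roughly understand them
Mainly supports Japanese, but can also recognize Chinese and lowercase English
Supports simple whitening-out of the original text and typesetting of the translation
This project is v2 of 求闻转译志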

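Usage

Clone this repo
Download ocr.ckpt and detect.ckpt and place them in the root directory of this repo
Apply for the Baidu Translate API and save your appid and secret key in key.py
Run python translate_demo.py --image <path to image file>; the results are saved in the result folder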
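Before running the demo, it may help to confirm that both checkpoints are where the script expects them. The sketch below assumes only what the steps above state (ocr.ckpt and detect.ckpt in the repository root); the helper name `missing_checkpoints` is hypothetical and not part of the project.

```python
from pathlib import Path

# Checkpoint files named in the usage steps above.
REQUIRED_CHECKPOINTS = ("ocr.ckpt", "detect.ckpt")


def missing_checkpoints(repo_root):
    """Return the required checkpoint names that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints(".")
    if missing:
        print("Missing checkpoint files:", ", ".join(missing))
    else:
        print("All checkpoints found")
```

Run it from the repository root; an empty result means both files are in place.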

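This is only a preliminary version; we need your help to improve it

So far this project is only a simple demo and still has many rough edges. We need your help to improve it!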

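Next steps

Improve this project

Image editing currently just whites out the text; an image inpainting model is being trained!
[Important, help wanted] The current text rendering engine is barely passable and falls well short of Adobe's rendering engine. We need your help to improve text rendering!
I tried to extract text color inside the OCR model, but every attempt failed. For now I make do with a DPGMM to extract text color, but the results are poor. I will keep trying to improve text color extraction; if you have good suggestions, please open an issue.
Text detection currently does not handle English and Korean well. Once the image inpainting model is trained, I will train a new version of the text detection model.
The text rendering region is determined by the detected text rather than by speech bubbles. This handles images without speech bubbles, but English typesetting does not work well; I have not found a good solution yet.
Ryota et al. proposed collecting paired manga as training data to train a model that translates with the image content as context. In the future we could encode a large number of images with a VQVAE and feed the result into the NMT encoder to assist translation, instead of extracting tags frame by frame; this would cover a wider range of images. It requires us to collect a large amount of paired translated manga/image data and to train a VQVAE model.
求闻转译志 was designed for video. In the future this project should be optimized to handle video, extracting text color to generate ASS subtitles and further assisting Touhou video subtitle groups. It could even edit the video content to remove the subtitles embedded in it.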
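For the subtitle idea in the last item, an extracted RGB color would have to be written in ASS notation, which orders the channels as alpha, blue, green, red. A minimal sketch, assuming the color arrives as an (r, g, b) tuple of 0-255 integers; the function name `rgb_to_ass` is hypothetical and not part of the project.

```python
def rgb_to_ass(rgb, alpha=0):
    """Format an (r, g, b) color as an ASS style color string (&HAABBGGRR).

    In ASS, alpha 0 means fully opaque and 255 fully transparent.
    """
    r, g, b = rgb
    for channel in (r, g, b, alpha):
        if not 0 <= channel <= 255:
            raise ValueError("color channels must be in the range 0-255")
    # ASS stores the channels in reverse order compared with RGB.
    return "&H{:02X}{:02X}{:02X}{:02X}".format(alpha, b, g, r)


print(rgb_to_ass((64, 42, 43)))  # &H002B2A40
```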

Sample results

Original image	Translated image

Citation

@inproceedings{baek2019character,
  title={Character region awareness for text detection},
  author={Baek, Youngmin and Lee, Bado and Han, Dongyoon and Yun, Sangdoo and Lee, Hwalsuk},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9365--9374},
  year={2019}
}
@article{hinami2020towards,
  title={Towards Fully Automated Manga Translation},
  author={Hinami, Ryota and Ishiwatari, Shonosuke and Yasuda, Kazuhiko and Matsui, Yusuke},
  journal={arXiv preprint arXiv:2012.14271},
  year={2020}
}
@article{oord2017neural,
  title={Neural discrete representation learning},
  author={Oord, Aaron van den and Vinyals, Oriol and Kavukcuoglu, Koray},
  journal={arXiv preprint arXiv:1711.00937},
  year={2017}
}

Comments

Translation of big images

For images taller than 4000 px I get the error error-too-large, but images this tall are really necessary for web comics. One example is 720 × 6781. I see the restriction in the code. Why was it added? What is the real limit? Could it be changed to, for example, width*height < 4000*4000?

opened by purpleraven 20
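The two kinds of limit discussed in this question can be compared with a short sketch. The 4000 px figure comes from the report; the function names and the area-based rule are illustrative assumptions, not the project's actual check.

```python
MAX_SIDE = 4000  # per-side limit mentioned in the report


def within_side_limit(width, height, max_side=MAX_SIDE):
    """Accept the image only if neither side exceeds max_side."""
    return width <= max_side and height <= max_side


def within_area_limit(width, height, max_side=MAX_SIDE):
    """Proposed alternative: bound the total pixel count instead."""
    return width * height < max_side * max_side


# The 720 x 6781 example from the report:
print(within_side_limit(720, 6781))  # False
print(within_area_limit(720, 6781))  # True
```

An area bound lets tall, narrow strips through while still capping the total number of pixels to process.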

Translation is failed when using docker

I deployed with docker compose, then opened the web page and tested with an image, but I got this error message:

manga_image_translator_gpu  | New `submit` task d54abe601f1c164299865da1999e479cd043999676634b0fe47968e7e4f2f06b-M-google-KOR-default-auto
manga_image_translator_gpu  |  -- Processing task d54abe601f1c164299865da1999e479cd043999676634b0fe47968e7e4f2f06b-M-google-KOR-default-auto
manga_image_translator_gpu  |  -- Detection resolution 1536
manga_image_translator_gpu  |  -- Detector using default
manga_image_translator_gpu  |  -- Render text direction is auto
manga_image_translator_gpu  |  -- Preparing translator
manga_image_translator_gpu  |  -- Preparing upscaling
manga_image_translator_gpu  |  -- Running upscaling
manga_image_translator_gpu  | Task state d54abe601f1c164299865da1999e479cd043999676634b0fe47968e7e4f2f06b-M-google-KOR-default-auto to upscaling
manga_image_translator_gpu  | /app/models/waifu2x-linux/waifu2x-ncnn-vulkan: error while loading shared libraries: libvulkan.so.1: cannot open shared object file: No such file or directory
manga_image_translator_gpu  | Traceback (most recent call last):
manga_image_translator_gpu  |   File "/app/translate_demo.py", line 283, in infer_safe
manga_image_translator_gpu  |     return await infer(
manga_image_translator_gpu  |   File "/app/translate_demo.py", line 147, in infer
manga_image_translator_gpu  |     img_upscaled_pil = (await dispatch_upscaling('waifu2x', [image], ratio, args.use_cuda))[0]
manga_image_translator_gpu  |   File "/app/upscaling/__init__.py", line 35, in dispatch
manga_image_translator_gpu  |     return await upscaler.upscale(image_batch, upscale_ratio)
manga_image_translator_gpu  |   File "/app/upscaling/common.py", line 17, in upscale
manga_image_translator_gpu  |     return await self._upscale(image_batch, upscale_ratio)
manga_image_translator_gpu  |   File "/app/upscaling/common.py", line 26, in _upscale
manga_image_translator_gpu  |     return await self.forward(*args, **kwargs)
manga_image_translator_gpu  |   File "/app/utils.py", line 338, in forward
manga_image_translator_gpu  |     return await self._forward(*args, **kwargs)
manga_image_translator_gpu  |   File "/app/upscaling/waifu2x.py", line 67, in _forward
manga_image_translator_gpu  |     self._run_waifu2x_executable(in_dir, out_dir, upscale_ratio, 0)
manga_image_translator_gpu  |   File "/app/upscaling/waifu2x.py", line 92, in _run_waifu2x_executable
manga_image_translator_gpu  |     subprocess.check_call(cmds)
manga_image_translator_gpu  |   File "/opt/conda/lib/python3.9/subprocess.py", line 373, in check_call
manga_image_translator_gpu  |     raise CalledProcessError(retcode, cmd)
manga_image_translator_gpu  | subprocess.CalledProcessError: Command '['/app/models/waifu2x-linux/waifu2x-ncnn-vulkan', '-i', '/tmp/tmp4ou8jfmp', '-o', '/tmp/tmpok5pw82w', '-m', '/app/models/waifu2x-linux/models-cunet', '-s', '4', '-n', '0']' returned non-zero exit status 127.
manga_image_translator_gpu  | Task state d54abe601f1c164299865da1999e479cd043999676634b0fe47968e7e4f2f06b-M-google-KOR-default-auto to error

What is wrong?

opened by kkbpower 15
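Exit status 127 together with the `libvulkan.so.1` message in the log indicates that the dynamic loader could not find the Vulkan library inside the container. A minimal sketch for checking this from Python on Linux before upscaling is attempted, using only the standard library; the helper name `vulkan_loader_available` is hypothetical.

```python
import ctypes.util


def vulkan_loader_available():
    """Return True if the system can locate a Vulkan loader library."""
    # find_library returns a library name or path, or None when nothing is found.
    return ctypes.util.find_library("vulkan") is not None


if __name__ == "__main__":
    if vulkan_loader_available():
        print("Vulkan loader found")
    else:
        print("Vulkan loader not found; waifu2x-ncnn-vulkan will likely fail to start")
```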

Upload Failed & Access is denied error

I cloned the repo and installed all the required modules, but I get an upload failed error in web mode:

F:\Offline\Manga[MTL] Machine Translation tool\manga-image-translator-main>python translate_demo.py --mode web --use-inpainting --verbose --translator=google --target-lang=ENG
Namespace(mode='web', image='', image_dst='', size=1536, use_inpainting=True, use_cuda=False, force_horizontal=False, inpainting_size=2048, unclip_ratio=2.3, box_threshold=0.7, text_threshold=0.5, text_mag_ratio=1, translator='google', target_lang='ENG', use_ctd=False, verbose=True)
 -- Loading models
 -- Running in web service mode
 -- Waiting for translation tasks
fail to initialize deepl : auth_key must not be empty
switch to google translator
Serving up app on 127.0.0.1:5003

Also, when I try to run batch translation, I get an access denied error like this:

F:\Offline\Manga[MTL] Machine Translation tool\manga-image-translator-main>python translate_demo.py --image <F:\Offline\Manga\Sousaku Kanojo\lime message> --use-inpainting --verbose --translator=google --target-lang=ENG
Access is denied.

Does anyone know the solution, or is something wrong with my command?

opened by bregassa 12
Error: (-215:Assertion failed) in function 'warpPerspective'

I am getting the error in the title on some image files. Here is an example.

I tried the online demo in case it was a problem local to my machine, and it also produced an error.
bug

opened by crotron 11
Can I use a wider area during text rendering?

Because reading text vertically is awkward, I modified text_render.py to always render text horizontally. But since the original text area is too narrow, the result is still hard to read. So I would like to know whether it is possible to modify the code to use a wider area during text rendering, and how.

Sorry for my poor English and thanks in advance.

opened by ppkppk8520 10
Add Offline translation Support
What

This PR adds support for an offline translation mode. (It also adds a local docker stack alongside a few other QOL improvements, and aims to partially remedy the issues found in #41.)

How

This PR relies on the https://ai.facebook.com/research/no-language-left-behind/ model for its reasonable performance in per-sentence translation and its broad support for all languages currently mapped in the tool (in every mapped combination). It supports two modes:

"offline" which uses the facebook/nllb-200-distilled-600M model

"offline_big" which uses the facebook/nllb-200-distilled-1.3B model

It should be noted that "NLLB-200 is a research model and is not released for production deployment." In internal testing, however, the translation quality seems acceptable enough to use it as the base for a basic offline translation mode that can cover any and all language combinations.

More specific models like the one used in Sugoi Translator, http://www.kecl.ntt.co.jp/icl/lirg/jparacrawl/ or https://github.com/argosopentech/argos-translate may offer higher quality translation options but would require a more complex implementation / be significantly less broad in scope.

This has been tested in both CPU (3950x) and GPU (RTX3080) modes and demonstrated reasonable performance in each. With the big model loaded GPU ram consumption does not exceed 10GB. (Hovering around 9.5)

The model is automatically downloaded at runtime when "offline" is set for the translation model and then cached.
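The relationship between the two modes and their checkpoints can be summarized in a few lines. The model identifiers are the ones named above; the dictionary and function names are hypothetical and do not reflect how the PR organizes its code.

```python
# Model identifiers as listed in the PR description.
OFFLINE_MODELS = {
    "offline": "facebook/nllb-200-distilled-600M",
    "offline_big": "facebook/nllb-200-distilled-1.3B",
}


def resolve_offline_model(mode):
    """Return the NLLB model identifier for an offline translator mode."""
    try:
        return OFFLINE_MODELS[mode]
    except KeyError:
        raise ValueError(f"unknown offline mode: {mode!r}") from None
```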

Why

So in theory this could be run standalone without any dependencies on external services. :thinking:
opened by ryanolee 9
Add docker support
What

This PR aims to add docker support to the manga image translator project. It also adds a few minor bug fixes / quality of life improvements to the CLI.

More specifically:

Docker image for the CLI

Examples for how to use the CLI with docker

CICD pipeline for when docker is revised

Options to rebind the port of the web server

Options to change the host of the server (0.0.0.0) for docker

Options to log output from the running web server

Why

Currently the setup process requires a very specific application environment. Dockerising the tool should make it way way more portable for other people to use. (In theory running a single command to get a working environment setup)

How

By adding a build pipeline for docker alongside documentation on its use.

How to test

See documentation changes in PR

Caveats

Currently the docker file is hosted on my docker hub. If this does look like it could be good to merge might be worth you @zyddnys setting up a docker hub quickly and hooking up the release pipeline. (Should be done now)

The documentation will need updating to remove references to my name (Should be done now)

Unfortunately I do not speak mandarin so cannot update the CN readme.md :smiling_face_with_tear:

The docker container is fairly big (~5GB)

Upon new releases the docker file might need updating to point to new models by changing the RELEASE build arg

In either case thanks for looking at this and great job with the tool! Any comment would be much appreciated :eyes:
opened by ryanolee 9

Error in offline mode on Windows 11

Hi, sorry for my bad English. I use Windows 11 and Python 3.10. When using offline mode, I get an error:

C:\Users\GIGABYTE>cd /d E:\manga-image-translator
E:\manga-image-translator>python translate_demo.py --verbose --translator=offline --target-lang=ENG --image C:\Users\GIGABYTE\Desktop\eat\download\009\3.png
Namespace(mode='demo', image='C:\\Users\\GIGABYTE\\Desktop\\eat\\download\\009\\3.png', image_dst='', target_lang='ENG', verbose=True, host='127.0.0.1', port=5003, log_web=False, ocr='48px_ctc', inpainter='lama_mpe', translator='offline', use_cuda=False, use_cuda_limited=False, detection_size=1536, inpainting_size=2048, unclip_ratio=2.3, box_threshold=0.7, text_threshold=0.5, text_mag_ratio=1, font_size_offset=0, force_horizontal=False, force_vertical=False, upscale_ratio=None, use_ctd=False, manga2eng=False, eng_font='fonts/comic shanns 2.ttf')
 -- Preload Checks
 -- Loading models
 -- Running in single image demo mode
 -- Detection resolution 1536
 -- Detector using default
 -- Render text direction is auto
 -- Preparing translator
 -- Preparing upscaling
 -- Running upscaling
[0 NVIDIA GeForce GTX 1660 SUPER]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[0 NVIDIA GeForce GTX 1660 SUPER]  bugsbn1=0  bugbilz=51  bugcopc=0  bugihfa=0
[0 NVIDIA GeForce GTX 1660 SUPER]  fp16-p/s/a=1/1/1  int8-p/s/a=1/1/1
[0 NVIDIA GeForce GTX 1660 SUPER]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
 -- Running text detection
Detection resolution: 1280x1536
 -- Running OCR
------------------------------
世: 0.9999960661089133
子: 0.9999487426325072
！: 0.9994326474291275
------------------------------
そ: 0.9999395645427589
ろ: 0.9999988079084972
そ: 0.9999650728993075
ろ: 0.9999986886995842
------------------------------
私: 1.0
も: 0.9999935627400018
我: 0.9999998807907247
慢: 1.0
の: 0.9999998807907247
限: 0.9999998807907247
界: 0.9999996423722521
だ: 0.9999955892755635
------------------------------
国: 0.9996509578637115
境: 0.9999815229018083
の: 1.0
な: 0.9998477929052435
い: 0.9999911785905904
楽: 0.9999907017622996
し: 0.9998875982730323
み: 0.9999920130413283
------------------------------
絶: 1.0
対: 0.9999957084838798
に: 0.9999997615814777
許: 0.9999974966112362
さ: 0.9999606624830781
れ: 0.9999986886995842
な: 0.9999624504845601
い: 0.9999949932351057
------------------------------
今: 0.9999971389852362
年: 1.0
の: 1.0
内: 0.9996265376028781
に: 0.9999996423722521
決: 0.9999963045256735
断: 0.9999983310727032
せ: 0.9999849798550975
よ: 0.9999939203633587
------------------------------
跡: 1.0
継: 1.0
ぎ: 0.9999963045256735
を: 0.9999986886995842
絶: 0.9999997615814777
や: 0.999977350747961
す: 0.9999538681349789
な: 0.9985389629265963
ど: 0.999985933501902
0.9997924528221414 世子！ fg: (1, 0, 1) bg: (0, 1, 1)
0.9999755332023961 そろそろ fg: (0, 0, 1) bg: (0, 0, 0)
0.9999985545922271 私も我慢の限界だ fg: (0, 0, 3) bg: (0, 0, 3)
0.9999177141633738 国境のない楽しみ fg: (64, 42, 43) bg: (62, 40, 43)
0.999988720072921 絶対に許されない fg: (0, 0, 0) bg: (0, 0, 0)
0.999955199323479 今年の内に決断せよ fg: (0, 0, 0) bg: (0, 0, 0)
0.999827770426052 跡継ぎを絶やすなど fg: (0, 0, 1) bg: (0, 0, 1)
 -- spliting {0, 4, 6}
to split [0, 4, 6]
edge_weights [185.02432272541898, 175.025712396779]
std: 4.999305164319992, mean: 180.025017561099
 -- spliting {1, 2, 5}
to split [1, 2, 5]
edge_weights [180.01111076819674, 172.14238292762187]
std: 3.9343639202874385, mean: 176.0767468479093
 -- spliting {3}
to split [3]
region_indices [{0, 4, 6}, {1, 2, 5}, {3}]
 -- Generating text mask
100%|████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00,  7.97it/s]
 -- Running inpainting
Inpainting resolution: 1560x2048
 -- Translating
 -- Selected translator: SugoiTranslator
2022-12-19 04:56:34 | INFO | fairseq.file_utils | loading archive file E:\manga-image-translator\models\translators\jparacrawl/
2022-12-19 04:56:34 | INFO | fairseq.file_utils | loading archive file E:\manga-image-translator\models\translators\jparacrawl/
Traceback (most recent call last):
  File "E:\manga-image-translator\translate_demo.py", line 394, in <module>
    loop.run_until_complete(main(args.mode))
  File "C:\Python\Python310\lib\asyncio\base_events.py", line 646, in run_until_complete
    return future.result()
  File "E:\manga-image-translator\translate_demo.py", line 323, in main
    await infer(Image.open(args.image), mode)
  File "E:\manga-image-translator\translate_demo.py", line 227, in infer
    translated_sentences = await dispatch_translation(translator, src_lang, tgt_lang, queries, use_cuda = args.use_cuda and not args.use_cuda_limited)
  File "E:\manga-image-translator\translators\__init__.py", line 91, in dispatch
    result = await asyncio.create_task(translator.translate(src_lang, tgt_lang, queries))
  File "E:\manga-image-translator\translators\selective.py", line 71, in translate
    await self._real_translator.load(*self._cached_load_params)
  File "E:\manga-image-translator\translators\common.py", line 62, in load
    return await super().load(device, *self.parse_language_codes(from_lang, to_lang))
  File "E:\manga-image-translator\utils.py", line 330, in load
    await self._load(*args, **kwargs, device=device)
  File "E:\manga-image-translator\translators\sugoi.py", line 61, in _load
    self.model = TransformerModel.from_pretrained(
  File "C:\Python\Python310\lib\site-packages\fairseq\models\fairseq_model.py", line 267, in from_pretrained
    x = hub_utils.from_pretrained(
  File "C:\Python\Python310\lib\site-packages\fairseq\hub_utils.py", line 73, in from_pretrained
    models, args, task = checkpoint_utils.load_model_ensemble_and_task(
  File "C:\Python\Python310\lib\site-packages\fairseq\checkpoint_utils.py", line 423, in load_model_ensemble_and_task
    raise IOError("Model file not found: {}".format(filename))
OSError: Model file not found: E:\manga-image-translator\models\translators\jparacrawl/base.pretrain.ja-en.pt

My guess is that the cause is the "/" in "/base.pretrain.ja-en.pt": the code was written on Linux, while Windows uses "\", and that triggers the error.

bug

opened by kucingkembar 8
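One way to rule out separator problems is to build the path with `pathlib` and check that the file exists before loading. This is a sketch only: the directory and file name are taken from the error message, `PureWindowsPath` is used so that the example prints the same result on any OS, and the thread does not confirm whether the separator or a missing download is the real cause.

```python
from pathlib import Path, PureWindowsPath

MODEL_DIR = r"E:\manga-image-translator\models\translators\jparacrawl"
MODEL_FILE = "base.pretrain.ja-en.pt"

# Joining with "/" on a path object inserts the separator of that path flavor.
model_path = PureWindowsPath(MODEL_DIR) / MODEL_FILE
print(model_path)  # E:\manga-image-translator\models\translators\jparacrawl\base.pretrain.ja-en.pt


def model_file_exists(directory, filename):
    """Return True if filename is present in directory on the current machine."""
    return (Path(directory) / filename).is_file()
```

If `model_file_exists` returns False for the real directory, the file has not been downloaded or unpacked there, whatever the separator.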

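Translator is switched automatically

Hello, I ran the command-line tool on Windows to translate a whole folder and noticed a problem in this part of the output:

Inpainting resolution: 1360x1920
 -- Translating
oh no.
fail to initialize deepl : auth_key must not be empty
switch to google translator
 -- Rendering translated text

It seems the translator I selected was taken to be deepl, but the command I actually ran was:

python translate_demo.py --verbose --mode batch --use-inpainting --use-cuda --translator=baidu --target-lang=CHS --image D:/translate/1234

Clearly I selected Baidu Translate, yet it switched to the Google translator. How can I solve this? Thanks!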

opened by biandefeng0315 8
AttributeError: 'TextBlock' object has no attribute 'aabb'

I tried modifying the code for an exe that processes photos, but ran into this problem:

File "C:/ipython/image/aaaaaaaaaaaa.py", line 284, in loop.run_until_complete(main())
File "C:\Users\19378.conda\envs\pytorch\lib\asyncio\base_events.py", line 616, in run_until_complete return future.result()
File "C:/ipython/image/aaaaaaaaaaaa.py", line 278, in main await infer(img, mode, '', alpha_ch = alpha_ch)
File "C:/ipython/image/aaaaaaaaaaaa.py", line 137, in infer await dispatch_ocr(img, text_regions, True, model_name="32px", verbose=True)
File "C:\ipython\image\ocr_init_.py", line 276, in dispatch return run_ocr_32px(img, cuda, list(generate_text_direction(textlines)), batch_size, verbose=verbose)
File "C:\ipython\image\ocr_init_.py", line 254, in generate_text_direction if quadrilateral_can_merge_region(ubox, vbox):
File "C:\ipython\image\utils.py", line 1685, in quadrilateral_can_merge_region b1 = a.aabb
File "C:\ipython\image\textblockdetector\textblock.py", line 185, in getattribute return object.getattribute(self, name)
AttributeError: 'TextBlock' object has no attribute 'aabb'

aabb is defined in the Quadrilateral class, but the call goes to textblockdetector\textblock.py, which causes the error. How should I solve this?

opened by shkds 7
A simplistic and naive wordbreak implementation
Note: I'm not a Python developer, so this might contain horrible code or not use obvious libs/methods

I tried adding some more logic to the word/line break handling, so that it doesn't break off single letters/spaces and uses "-" at the break. [before and after images]

This is still far from perfect, but an improvement. The issues I noticed:

it adds line breaks to places like I'd/don't, making it I-'d/don-'t

it adds linebreaks where the words would have more than enough space like in the bottom left example, pull could as well be in the same/next line, shifting everything down a bit.
opened by tr7zw 7
cv2.error: OpenCV(4.7.0) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\imgwarp.cpp:1724: error: (-215:Assertion failed) dst.cols < SHRT_MAX && dst.rows < SHRT_MAX && src.cols < SHRT_MAX && src.rows < SHRT_MAX in function 'cv::remap'

cv2.error: OpenCV(4.7.0) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\imgwarp.cpp:1724: error: (-215:Assertion failed) dst.cols < SHRT_MAX && dst.rows < SHRT_MAX && src.cols < SHRT_MAX && src.rows < SHRT_MAX in function 'cv::remap'


opened by My12123 0
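The assertion in this report fails when either the source or the destination image has a side of `SHRT_MAX` (32767) pixels or more. A minimal pre-check, assuming image shapes are given as (rows, cols) tuples; the function name is hypothetical.

```python
SHRT_MAX = 32767  # largest value of a signed 16-bit integer


def fits_remap_limits(src_shape, dst_shape):
    """Return True if both (rows, cols) shapes satisfy the cv::remap size assertion."""
    return all(
        rows < SHRT_MAX and cols < SHRT_MAX
        for rows, cols in (src_shape, dst_shape)
    )


print(fits_remap_limits((6781, 720), (6781, 720)))    # True
print(fits_remap_limits((40000, 720), (40000, 720)))  # False
```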
Error logging in web mode
Currently, in web mode, it is very difficult to find the logs for images that failed during batch processing. It would be much easier with an error log for each image stored in the same folder as the image.

Right now a working directory is created for every task, something like this:

result/89d8b9b34198b23753309232f60fc118feab2c9cde671bdcb6467f801c4f6589--deepl-RUS-default-auto/

The logs for processing one image could be stored in the same folder in log.txt

That way I could create bug reports very quickly =)
enhancement
opened by purpleraven 0
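The request above could be met with the standard `logging` module. A minimal sketch, assuming each task already has its own working directory as described; the function name `make_task_logger` is hypothetical and not part of the project.

```python
import logging
import os


def make_task_logger(task_dir, name="task"):
    """Create a logger that writes to log.txt inside task_dir."""
    os.makedirs(task_dir, exist_ok=True)
    logger = logging.getLogger(f"{name}:{task_dir}")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(os.path.join(task_dir, "log.txt"), encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Calling `logger.exception("image processing failed")` inside an `except` block writes the full traceback to that task's log.txt.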

Error arithm.cpp:647 during rendering with custom font

192 83 113 74
font_size: 47
['ХАХА,', 'КАКОЙ', 'ИНТЕРЕСН-', 'ЫЙ', 'ПЕРСОНАЖ.'] [157, 214, 361, 80, 362]
HAH?
ХАХ?
520 1160 68 33
font_size: 79
['ХАХ?'] [237]
WHY DID-YOU SUDDENLY TURN INTO. A TOTAL DICK?
ПОЧЕМУ ТЫ ВДРУГ ПРЕВРАТИЛСЯ В. В ПОЛНОГО ПРИДУРКА?
608 1193 271 39
font_size: 47
['ПОЧЕМУ ТЫ ВДРУГ ПРЕВРАТИЛ-', 'СЯ В. В ПОЛНОГО ПРИДУРКА?'] [944, 877]
个腾讯动漫 ac.aa.caM
个腾讯动漫 ac.aa.caM
804 1238 110 36
font_size: 47
['个腾讯动漫', 'ac.aa.caM'] [235, 332]
HEY KID! YOUR ELDER'S ASKING YOU A QUESTION.
ЭЙ, МАЛЫШ! ТВОЙ СТАРЕЙШИНА ЗАДАЕТ ТЕБЕ ВОПРОС.
400 425 159 71
font_size: 79
['ЭЙ, МАЛЫШ! ТВОЙ', 'СТАРЕЙШИНА', 'ЗАДАЕТ ТЕБЕ', 'ВОПРОС.'] [879, 659, 712, 462]
Traceback (most recent call last):
  File "/app/translate_demo.py", line 282, in infer_safe
    return await infer(
  File "/app/translate_demo.py", line 263, in infer
    output = await dispatch_rendering(np.copy(img_inpainted), args.text_mag_ratio, translated_sentences, textlines, text_regions, render_text_direction_overwrite, args.target_lang, args.font_size_offset)
  File "/app/text_rendering/__init__.py", line 65, in dispatch
    img_canvas = render(img_canvas, font_size, text_mag_ratio, trans_text, region, majority_dir, fg, bg, False, font_size_offset)
  File "/app/text_rendering/__init__.py", line 119, in render
    temp_box = text_render.put_text_horizontal(
  File "/app/text_rendering/text_render.py", line 497, in put_text_horizontal
    offset_x = put_char_horizontal(font_size, rot, t, pen_line, canvas_text, canvas_border, border_size=bgsize)
  File "/app/text_rendering/text_render.py", line 471, in put_char_horizontal
    canvas_border[pen_border[1]:pen_border[1]+bitmap_b.rows, pen_border[0]:pen_border[0]+bitmap_b.width] = cv2.add(canvas_border[pen_border[1]:pen_border[1]+bitmap_b.rows, pen_border[0]:pen_border[0]+bitmap_b.width], bitmap_border)
cv2.error: OpenCV(4.6.0) /io/opencv/modules/core/src/arithm.cpp:647: error: (-209:Sizes of input arguments do not match) The operation is neither 'array op array' (where arrays have the same size and the same number of channels), nor 'array op scalar', nor 'scalar op array' in function 'arithm_op'

This happens when translating these images. Any ideas? 0011 0006 0002

bug

opened by purpleraven 5
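The "sizes of input arguments do not match" error is what `cv2.add` reports when its two array arguments have different shapes. One situation that can produce this, offered here as an assumption rather than a confirmed diagnosis, is a glyph bitmap extending past the canvas edge, so that the canvas slice comes out smaller than the bitmap. The sketch computes the overlapping rectangle so that both sides can be cropped to the same size; all names are hypothetical.

```python
def clip_to_canvas(x, y, width, height, canvas_width, canvas_height):
    """Clip a width x height box placed at (x, y) to the canvas bounds.

    Returns (x0, y0, x1, y1) in canvas coordinates, or None if the box lies
    entirely outside the canvas.
    """
    x0, y0 = max(x, 0), max(y, 0)
    x1 = min(x + width, canvas_width)
    y1 = min(y + height, canvas_height)
    if x0 >= x1 or y0 >= y1:
        return None
    return x0, y0, x1, y1


# A 50 x 40 glyph placed near the right edge of a 100 x 100 canvas:
print(clip_to_canvas(80, 10, 50, 40, 100, 100))  # (80, 10, 100, 50)
```

With the clipped rectangle, the canvas slice `[y0:y1, x0:x1]` and the bitmap slice `[y0 - y:y1 - y, x0 - x:x1 - x]` have the same shape.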

Releases(beta-0.3)

beta-0.3(Apr 23, 2022)

Please download ocr.ckpt, ocr-ctc.ckpt, detect.ckpt, comictextdetector.pt, comictextdetector.pt.onnx and inpainting_lama_mpe.ckpt
Source code(tar.gz)
Source code(zip)
comictextdetector.pt(76.24 MB)
comictextdetector.pt.onnx(90.28 MB)
detect.ckpt(294.09 MB)
inpainting.ckpt(21.72 MB)
inpainting_lama_mpe.ckpt(103.55 MB)
jparacrawl-base-models.zip(523.18 MB)
jparacrawl-big-models.zip(1542.79 MB)
ocr-ctc.ckpt(161.24 MB)
ocr.ckpt(163.34 MB)
beta-0.2.1(Jul 27, 2021)

Please download ocr.ckpt, detect.ckpt, comictextdetector.pt, comictextdetector.pt.onnx and inpainting.ckpt
Source code(tar.gz)
Source code(zip)
comictextdetector.pt(76.24 MB)
comictextdetector.pt.onnx(90.28 MB)
detect.ckpt(294.09 MB)
inpainting.ckpt(21.72 MB)
ocr.ckpt(163.34 MB)
alpha-v3.0.0(May 21, 2021)

Please download ocr.ckpt, detect.ckpt and inpainting.ckpt
Source code(tar.gz)
Source code(zip)
detect.ckpt(294.09 MB)
inpainting.ckpt(21.72 MB)
ocr.ckpt(391.00 MB)
alpha-v2.2.1(May 6, 2021)

Please download ocr.ckpt and detect.ckpt here and download inpainting.ckpt from https://github.com/zyddnys/manga-image-translator/releases/tag/alpha-v2.2
Source code(tar.gz)
Source code(zip)
detect.ckpt(359.65 MB)
ocr.ckpt(391.00 MB)
alpha-v2.2(Mar 4, 2021)

Added an image inpainting model; the OCR model now supports text color extraction
Please download detect.ckpt, ocr.ckpt and inpainting.ckpt
Source code(tar.gz)
Source code(zip)
detect.ckpt(213.04 MB)
inpainting.ckpt(38.84 MB)
ocr.ckpt(378.52 MB)

Owner

GitHub Repository

Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.

SynthText Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Ved

1.8k Dec 28, 2022

An unofficial implementation of the paper "AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss".

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss This is an unofficial implementation of AutoVC based on the official one. The reposi

27 Jun 16, 2022

This is a real life mario project using python and mediapipe

real-life-mario This is a real life mario project using python and mediapipe How to run to run this just run - realMario.py file requirements This req

42 Dec 22, 2022

A machine learning software for extracting information from scholarly documents

GROBID GROBID documentation Visit the GROBID documentation for more detailed information. Summary GROBID (or Grobid, but not GroBid nor GroBiD) means

1.9k Jan 08, 2023

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection (TIP 2019)

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection Introduction The code and trained models of: TextField: Learning A Deep

101 Dec 12, 2022

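Zero-shot learning evaluation benchmark, Chinese version

ZeroCLUE Zero-shot learning evaluation benchmark, Chinese version. Zero-shot learning is one of the AI recognition methods. Simply put, it means recognizing categories of data that have never been seen before: the trained classifier can not only recognize the categories already present in the training set but also distinguish data from unseen categories. This is a very useful capability that gives computers the ability to transfer knowledge without needing any training data, which fits well with the current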

27 Dec 10, 2022

nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex.

faceprocessor nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex. Tech faceprocessor uses a number of open source projec

3 Sep 06, 2021

A simple QR-Code Reader in Python

A simple QR-Code Reader written in Python, that copies the content of a QR-Code directly into the copy clipboard.

1 Oct 28, 2021

Automatically remove the mosaics in images and videos, or add mosaics to them.

1.4k Dec 30, 2022

Natural language detection

Detect the language of text. What’s so cool about franc? franc can support more languages(†) than any other library franc is packaged with support for

3.8k Jan 02, 2023

✌️Using this you can control your PC/Laptop volume by Hand Gestures created with Python.

Hand Gesture Volume Controller ✋ Hand recognition 👆 Finger recognition 🔊 you can decrease and increase volume Demo Code Firstly I have created a Mod

19 Nov 17, 2022

document image degradation

ocrodeg The ocrodeg package is a small Python library implementing document image degradation for data augmentation for handwriting recognition and OC

134 Nov 18, 2022

OCR system for Arabic language that converts images of typed text to machine-encoded text.

Arabic OCR OCR system for Arabic language that converts images of typed text to machine-encoded text. The system currently supports only letters (29 l

144 Jan 05, 2023

Some bits of javascript to transcribe scanned pages using PageXML

nashi (nasḫī) Some bits of javascript to transcribe scanned pages using PageXML. Both ltr and rtl languages are supported. Try it! But wait, there's m

15 Nov 09, 2022

Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd.

Head Detector Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd. The head_detection mod

76 Dec 06, 2022

PAGE XML format collection for document image page content and more

PAGE-XML PAGE XML format collection for document image page content and more For an introduction, please see the following publication: http://www.pri

46 Nov 14, 2022

This is the open source implementation of the ICLR2022 paper "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis"

StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image

840 Dec 26, 2022
