BERN2: an advanced neural biomedical namedentity recognition and normalization tool

Overview

BERN2

We present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by employing a multi-task NER model and neural network-based NEN models to achieve much faster and more accurate inference. This repository provides a way to host your own BERN2 server. See our paper for more details.

***** Try BERN2 at http://bern2.korea.ac.kr *****

Installing BERN2

You first need to install BERN2 and its dependencies.

# Install torch with conda (please check your CUDA version)
conda create -n bern2 python=3.7
conda activate bern2
conda install pytorch==1.9.0 cudatoolkit=10.2 -c pytorch
conda install faiss-gpu libfaiss-avx2 -c conda-forge

# Check if cuda is available
python -c "import torch;print(torch.cuda.is_available())"

# Install BERN2
git clone [email protected]:dmis-lab/BERN2.git
cd BERN2
pip install -r requirements.txt

(Optional) If you want to use mongodb as a caching database, you need to install and run it.

# https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/#install-mongodb-community-edition-using-deb-packages
sudo systemctl start mongod
sudo systemctl status mongod

Then, you need to download resources (e.g., external modules or dictionaries) for running BERN2. Note that you will need 70GB of free disk space.

wget http://nlp.dmis.korea.edu/projects/bern2/resources.tar.gz
tar -zxvf resources.tar.gz
rm -rf resources.tar.gz
# install CRF
cd resources/GNormPlusJava/CRF
./configure --prefix="$HOME"
make
make install
cd ../../..

Running BERN2

The following command runs BERN2.

export CUDA_VISIBLE_DEVICES=0
cd scripts
bash run_bern2.sh

(Optional) To restart BERN2, you need to run the following commands.

export CUDA_VISIBLE_DEVICES=0
cd scripts
bash stop_bern2.sh
bash start_bern2.sh

Annotations

Click here to download the annotations (NER and normalization) for 25.7+ millions of PubMed articles (From pubmed21n0001 to pubmed21n1057 (2021.01.12)) (Compressed, 18 GB).

The data provided by BERN2 is post-processed and may differ from the most current/accurate data available from U.S. National Library of Medicine (NLM).

Citation

@article{sung2022bern2,
    title={BERN2: an advanced neural biomedical namedentity recognition and normalization tool}, 
    author={Sung, Mujeen and Jeong, Minbyul and Choi, Yonghwa and Kim, Donghyeon and Lee, Jinhyuk and Kang, Jaewoo},
    year={2022},
    eprint={2201.02080},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Contact Information

For help or issues using BERN2, please submit a GitHub issue. Please contact Mujeen Sung (mujeensung (at) korea.ac.kr), or Minbyul Jeong (minbyuljeong (at) korea.ac.kr) for communication related to BERN2.

Comments
  • FileNotFoundError: [Errno 2] for local server POST

    FileNotFoundError: [Errno 2] for local server POST

    Hello. I installed BERN2 in my VM (Ubuntu 18.04 x64, 1 GPU). It succesfully installed, however, if I run example post script on READAME.md, then the below error occured. (Server console log)

    [23/Mar/2022 06:02:40.831728] [550f7242b14dc039a8a5ee6aa9233f3472dd7b4a33484f5955175a4d] GNormPlus 0.0001380443572998047 sec
    Traceback (most recent call last):
      File "/home/vessl/BERN2/bern2/bern2.py", line 106, in annotate_text
        output = self.tag_entities(text, base_name)
      File "/home/vessl/BERN2/bern2/bern2.py", line 358, in tag_entities
        async_result = loop.run_until_complete(self.async_ner(arguments_for_coroutines))
      File "/home/vessl/miniconda3/envs/bern2/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
        return future.result()
      File "/home/vessl/BERN2/bern2/bern2.py", line 491, in async_ner
        result = await asyncio.gather(*coroutines)
      File "/home/vessl/BERN2/bern2/bern2.py", line 533, in _ner_wrap
        with open(output_mtner, 'r', encoding='utf-8') as f:
    FileNotFoundError: [Errno 2] No such file or directory: './multi_ner/output/550f7242b14dc039a8a5ee6aa9233f3472dd7b4a33484f5955175a4d.PubTator.json'
    
    [2022-03-23 06:02:41,425] ERROR in app: Exception on /plain [POST]
    Traceback (most recent call last):
      File "/home/vessl/miniconda3/envs/bern2/lib/python3.7/site-packages/flask/app.py", line 2073, in wsgi_app
        response = self.full_dispatch_request()
      File "/home/vessl/miniconda3/envs/bern2/lib/python3.7/site-packages/flask/app.py", line 1518, in full_dispatch_request
        rv = self.handle_user_exception(e)
      File "/home/vessl/miniconda3/envs/bern2/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
        rv = self.dispatch_request()
      File "/home/vessl/miniconda3/envs/bern2/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
        return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
      File "/home/vessl/BERN2/app/__init__.py", line 85, in plain_api
        return Response(json.dumps({"error_message": result_dict["error_message"]}), status=404, content_type='application/json')
    NameError: name 'Response' is not defined
    127.0.0.1 - - [23/Mar/2022 06:02:41] "POST /plain HTTP/1.1" 500 -
    

    nohup_multi_ner.out log

    MTNER init_t 28.547 sec.
    1it [00:00, 185.89it/s]
    Prediction: 100%|██████████| 1/1 [00:00<00:00,  2.28it/s]
    Traceback (most recent call last):
      File "multi_ner/ner_server.py", line 91, in <module>
        run_server(mt_ner, args)
      File "multi_ner/ner_server.py", line 63, in run_server
        mtner_recognize(model, dict_path, base_name, args)
      File "multi_ner/ner_server.py", line 47, in mtner_recognize
        with open(output_mt_ner, 'w', encoding='utf-8') as f:
    PermissionError: [Errno 13] Permission denied: 'multi_ner/output/550f7242b14dc039a8a5ee6aa9233f3472dd7b4a33484f5955175a4d.PubTator.json'
    

    Thank you.

    opened by starmpcc 23
  • Files are duplicated with same PMID

    Files are duplicated with same PMID

    There are major issues in the data. Many files were just duplicates of one PMID's labelling. Is this intentional? I list the files of problem, and the pmid that it is filled up with in a table, following up with an example image of the issue. Look forward to your feedback.

    | file # of problem 21n0XXX | file is filled up with only |
    |---|---| | 713, 714, 715, 716, 717, 718, 719 | pmid: 22137675 | | 740, 741, 742 | pmid: 23065850 | | 748, 749 | pmid: 23334882 | | 758, 759 | pmid: 23630654 | | 766, 767 | pmid: 23886943 | | 779, 780, 781, 782 | pmid: 24335088 | | 788, 789, 790, 791 | pmid: 24595986 | | 813, 812, 811, 810 | pmid: 25329600 |

    example image (21n0713.json) MicrosoftTeams-image

    opened by ksj20 9
  • Bern2 local deployment can't query a PMID

    Bern2 local deployment can't query a PMID

    Hello! I was testing the local deployment of Bern2 and it seems to work fine when submitting plain text. With PMID I get the following error:

    Traceback (most recent call last): File "", line 1, in File "", line 2, in query_pmid File "/Users/franciscos/opt/anaconda3/lib/python3.8/site-packages/requests/models.py", line 910, in json return complexjson.loads(self.text, **kwargs) File "/Users/franciscos/opt/anaconda3/lib/python3.8/json/init.py", line 357, in loads return _default_decoder.decode(s) File "/Users/franciscos/opt/anaconda3/lib/python3.8/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/Users/franciscos/opt/anaconda3/lib/python3.8/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

    Any idea how I can fix this? Thanks in advance and thanks for building such a nice tool

    opened by FCoroado 8
  • Annotation download ERR_EMPTY_RESPONSE

    Annotation download ERR_EMPTY_RESPONSE

    Hi,

    First of all, thanks for the great work!

    I tried to download the pre-processed PubMed annotations, but got a server error: ERR_EMPTY_RESPONSE. Is there any other way to download the data?

    opened by mbiemans 8
  • I am running into same issue as @yuetieqi-meow

    I am running into same issue as @yuetieqi-meow

        I am running into same issue as @yuetieqi-meow 
    

    No issue encountered in installation, and when I ran the test, in the nohup_bern2.out: `[01/Oct/2022 17:54:12.841469] id: 9f85ebe7f122e750ca113ce20ef88f448960721e701cd1cc3e1650e2 [Errno 111] Connection refused [01/Oct/2022 17:54:12.899379] [9f85ebe7f122e750ca113ce20ef88f448960721e701cd1cc3e1650e2] GNormPlus 0.0003619194030761719 sec [01/Oct/2022 17:54:13.614587] [9f85ebe7f122e750ca113ce20ef88f448960721e701cd1cc3e1650e2] tmVar 2.0 0.7597899436950684 sec [01/Oct/2022 17:54:13.801885] [9f85ebe7f122e750ca113ce20ef88f448960721e701cd1cc3e1650e2] Multi-task NER 0.9021279811859131 sec, #entities: 2 Traceback (most recent call last): File "/home/kun/anaconda3/envs/bern2/lib/python3.7/shutil.py", line 566, in move os.rename(src, real_dst) FileNotFoundError: [Errno 2] No such file or directory: './resources/GNormPlusJava/output/9f85ebe7f122e750ca113ce20ef88f448960721e701cd1cc3e1650e2.PubTator' -> './resources/tmVarJava/input/9f85ebe7f122e750ca113ce20ef88f448960721e701cd1cc3e1650e2.PubTator.PubTator.Gene'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "/media/kun/Storage/BERN2/bern2/bern2.py", line 107, in annotate_text output = self.tag_entities(text, base_name) File "/media/kun/Storage/BERN2/bern2/bern2.py", line 376, in tag_entities shutil.move(output_gnormplus, input_tmvar_gene) File "/home/kun/anaconda3/envs/bern2/lib/python3.7/shutil.py", line 580, in move copy_function(src, real_dst) File "/home/kun/anaconda3/envs/bern2/lib/python3.7/shutil.py", line 266, in copy2 copyfile(src, dst, follow_symlinks=follow_symlinks) File "/home/kun/anaconda3/envs/bern2/lib/python3.7/shutil.py", line 120, in copyfile with open(src, 'rb') as fsrc: FileNotFoundError: [Errno 2] No such file or directory: './resources/GNormPlusJava/output/9f85ebe7f122e750ca113ce20ef88f448960721e701cd1cc3e1650e2.PubTator'`

    No other issues found on other log files: nohup_disease_normalize.out: Sieve loading .. 2628 ms, Ready nohup_gene_normalize.out: Ready (port 18888) nohup_gnormplus.out: Starting GNormPlus Service at 127.0.1.1:18895 Loading Gene Dictionary : Processing ... Loading Gene Dictionary : Processing Time:9.951sec Ready nohup_multi_ner.out: `MTNER init_t 13.013 sec.

    0it [00:00, ?it/s] 1it [00:00, 34.71it/s]

    Prediction: 0%| | 0/1 [00:00<?, ?it/s] Prediction: 100%|██████████| 1/1 [00:00<00:00, 1.37it/s] Prediction: 100%|██████████| 1/1 [00:00<00:00, 1.37it/s]nohup_tmvar.out:Starting tmVar 2.0 Service at 127.0.1.1:18896 Reading POS tagger model from lib/taggers/english-left3words-distsim.tagger ... done [1.5 sec]. Loading tmVar : Processing Time:1.739sec Ready input/9f85ebe7f122e750ca113ce20ef88f448960721e701cd1cc3e1650e2.PubTator - (PubTator format) : Processing Time:0.521sec ner success = 9f85ebe7f122e750ca113ce20ef88f448960721e701cd1cc3e1650e2.PubTator`

    I noticed when I ran the test case, the nohup_bern2.out nohup_multi_ner.out and nohup_tmvar.out logs are updated, but nohup_gnormplus.out is not (the other two normalize.out log files are not, and I assume because they are not executed because NER encounters error). This sounds like GNormPlus not executed at all? Any ideas on this?

    Originally posted by @kunlu-ou in https://github.com/dmis-lab/BERN2/issues/24#issuecomment-1264506002

    opened by kunlu-ou 7
  • Troubles running locally, FileNotFoundError

    Troubles running locally, FileNotFoundError

    I'm trying to run locally and following the steps described in README, I get this

    [21/Jan/2022 11:56:35.780036] id: 7bb79cf620bd0ba345504784f5aaa6ad48394bcd1cf10b7e29964e05
    [21/Jan/2022 11:56:36.632464] [7bb79cf620bd0ba345504784f5aaa6ad48394bcd1cf10b7e29964e05] tmVar 2.0 0.8203573226928711 sec
    [21/Jan/2022 11:56:38.605409] [7bb79cf620bd0ba345504784f5aaa6ad48394bcd1cf10b7e29964e05] GNormPlus 2.742703914642334 sec
    Traceback (most recent call last):
      File "/Users/dima/code/BERN2/bern2/bern2.py", line 105, in annotate_text
        output = self.tag_entities(text, base_name)
      File "/Users/dima/code/BERN2/bern2/bern2.py", line 290, in tag_entities
        async_result = loop.run_until_complete(self.async_ner(arguments_for_coroutines))
      File "/Users/dima/opt/anaconda3/envs/bern2/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
        return future.result()
      File "/Users/dima/code/BERN2/bern2/bern2.py", line 417, in async_ner
        result = await asyncio.gather(*coroutines)
      File "/Users/dima/code/BERN2/bern2/bern2.py", line 459, in _ner_wrap
        with open(output_mtner, 'r', encoding='utf-8') as f:
    FileNotFoundError: [Errno 2] No such file or directory: './multi_ner/output/7bb79cf620bd0ba345504784f5aaa6ad48394bcd1cf10b7e29964e05.PubTator.json'
    

    The only thing I've changed is removing transferring models to cuda, since I don't have GPU on my current machine.

    opened by dmytrobabych 7
  • Use preferred bioregistry prefixes for normalized entity identifiers

    Use preferred bioregistry prefixes for normalized entity identifiers

    Great to see that BERN2 normalizes entities to compact identifiers in resource:identifier format. I noticed that there is an opportunity to standardize the prefixes used with Bioregistry:

    • NCBITaxon prefix as per http://bioregistry.io/registry/ncbitaxon
    • NCBIGene prefix as per http://bioregistry.io/registry/ncbigene

    FYI I didn't check all the entity types BERN2 is capable of tagging for whether they use the preferred prefix.

    @cthoyt might also be helpful here.

    opened by dhimmel 7
  • FileNotFoundError: [Errno 2]

    FileNotFoundError: [Errno 2]

    Hello, I posted my error a few days ago in this and I retried a few times these days. I found it's really hard for me to fix it because all logs except nohup_bern2.out goes well. Here is my lastest nohup_bern2.out:

    [08/May/2022 19:28:51.930673] id: 1c3d05e08ffc1ca51c537406a8e2342fd5c546912049d73bd31e0427
    [Errno 111] Connection refused
    [08/May/2022 19:28:52.126999] [1c3d05e08ffc1ca51c537406a8e2342fd5c546912049d73bd31e0427] GNormPlus 0.0005352497100830078 sec
    [08/May/2022 19:29:07.017467] [1c3d05e08ffc1ca51c537406a8e2342fd5c546912049d73bd31e0427] Multi-task NER 14.890285730361938 sec, #entities: 2
    [08/May/2022 19:29:26.444897] [1c3d05e08ffc1ca51c537406a8e2342fd5c546912049d73bd31e0427] tmVar 2.0 34.42515420913696 sec
    Traceback (most recent call last):
      File "/home/scholar1/anaconda3/envs/bern2/lib/python3.7/shutil.py", line 566, in move
        os.rename(src, real_dst)
    FileNotFoundError: [Errno 2] No such file or directory: './resources/GNormPlusJava/output/1c3d05e08ffc1ca51c537406a8e2342fd5c546912049d73bd31e0427.PubTator' -> './resources/tmVarJava/input/1c3d05e08ffc1ca51c537406a8e2342fd5c546912049d73bd31e0427.PubTator.PubTator.Gene'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/scholar1/bern2/BERN2/bern2/bern2.py", line 106, in annotate_text
        output = self.tag_entities(text, base_name)
      File "/home/scholar1/bern2/BERN2/bern2/bern2.py", line 371, in tag_entities
        shutil.move(output_gnormplus, input_tmvar_gene)
      File "/home/scholar1/anaconda3/envs/bern2/lib/python3.7/shutil.py", line 580, in move
        copy_function(src, real_dst)
      File "/home/scholar1/anaconda3/envs/bern2/lib/python3.7/shutil.py", line 266, in copy2
        copyfile(src, dst, follow_symlinks=follow_symlinks)
      File "/home/scholar1/anaconda3/envs/bern2/lib/python3.7/shutil.py", line 120, in copyfile
        with open(src, 'rb') as fsrc:
    FileNotFoundError: [Errno 2] No such file or directory: './resources/GNormPlusJava/output/1c3d05e08ffc1ca51c537406a8e2342fd5c546912049d73bd31e0427.PubTator'
    
    127.0.0.1 - - [08/May/2022 19:29:27] "POST /plain HTTP/1.1" 404 -
    

    and all other 5 logs just similar as I posted here

    Thank you!

    opened by yuetieqi-meow 6
  • Errors while extracting resource-file

    Errors while extracting resource-file

    Hi:

    Thank you very much for sharing your excellent work. While trying to extract the resource file, I ran into this error. Any suggestions as to the cause(s)?

    tar -zxvf resources_v1.1.b.tar.gz . .

    resources/tmVarJava/Database/var2rs_Xm.db resources/tmVarJava/Database/gene2rs.db tar: Skipping to next header tar: A lone zero block at 22871798 tar: Exiting with failure status due to previous errors .

    Regards,

    Bancherd

    opened by Bancherd-DeLong 5
  • Syntax Error in server.py Causes BERN2 Local Installation to Not Work

    Syntax Error in server.py Causes BERN2 Local Installation to Not Work

    Hi @minstar & @mjeensung , I have attempted to install BERN2 locally, but I get the following error when I launch it:

    Traceback (most recent call last):
      File "server.py", line 1, in <module>
        from app import create_app
      File "/Users/[myuser]/BERN2/app/__init__.py", line 129
        return render_template('result_text.html', result_items=res_items, latency = f'{latency*1000:9.2f}', result_str=json.dumps(result_dict, sort_keys=True, indent=4))
    

    I have also seen that this error appears to keep server.py from running as it is not running when I run bash stop_bern2.sh. I get the following output

    No ner_server.py found to stop.
    Stopped GNormPlusServer.main.jar
    Stopped tmVar2Server.main.jar
    Stopped disease_normalizer_21.jar
    Stopped gnormplus-normalization_21.jar
    No server.py found to stop.
    

    Finally, I still tried to submit the plain text and PMID examples listed under the local instillation instructions and I simply get no output even after waiting for several minutes. My thought process leads me to believe that the error in server.py is causing BERN2 to simply terminate and hence no outputs from the tutorial text and PMIDS.

    Thanks for your help! Let me know if I can help with any additional files or insight.

    Andrew

    opened by compbiolover 5
  • How to improve named entity normalization for human proteins?

    How to improve named entity normalization for human proteins?

    Very excited to see BERN2! Really nice work so far.

    I'm looking to map certain mentions of proteins to standard identifiers. Here's a list of these proteins, where each protein is also followed by a direction of activity:

    3 beta hydroxysteroid dehydrogenase 5 stimulator; AF4/FMR2 protein 2 inhibitor; Adenylate cyclase 2 stimulator; Alpha gamma adaptin binding protein p34 stimulator; BR serine threonine protein kinase 1 stimulator; Complement Factor B stimulator; DNA gyrase B inhibitor; Ectonucleotide pyrophosphatase-PDE-3 stimulator; Falcipain 1 stimulator; Homeobox protein Nkx 2.4 stimulator; ISLR protein inhibitor; Integrin alpha-IIb/beta-4 antagonist; Inter alpha trypsin inhibitor H5 stimulator; Interleukin receptor 17B antagonist; Isopropylmalate dehydrogenase stimulator; Methylthioadenosine nucleosidase stimulator; Patched domain containing protein 2 inhibitor; Protein FAM161A stimulator; Protocadherin gamma A1 inhibitor; Ring finger protein 4 stimulator; SMAD-9 inhibitor; Small ubiquitin related modifier 1 inhibitor; Sodium-dicarboxylate cotransporter-1 inhibitor; Sorting nexin 9 inhibitor; Sugar phosphate exchanger 2 stimulator; Transcription factor p65 stimulator; Tumor necrosis factor 14 ligand inhibitor; Ubiquitin-conjugating enzyme E21 stimulator; Unspecified ion channel inhibitor; Zinc finger BED domain protein 6 inhibitor

    Using the nice web interface, I get:

    image

    So overall BERN2 does a good job recognizing the protein mentions. However, we actually already know what the protein text is, and are more interested in normalization. Most of the gene/protein mentions receive "ID: CUI-less". Any advice on how to improve the performance of named entity normalization for human proteins?

    I see that the website notes that normalization is done by https://github.com/dmis-lab/BioSyn, so feel free to migrate this issue to that repo if it's best there.

    opened by dhimmel 5
  • Unable to get response

    Unable to get response

    I get the following error:

    [Errno 2] No such file or directory : './resources/GNormPlusJava/output/2ec5cc999ceefc8f5a83a423fe6db771ab7868adf7fb74598c36ccd.PubTator' -> ./resources/tmVarJava/input/2ec5cc999ceefc8f5a83a423fe6db771ab7868adf7fb74598c36ccd.PubTator.PubTator.Gene

    does anyone can help me fix this?

    I'm not able to get response when I call the rest api. Could someone please help?

    opened by Wickkey 0
  • Unable to run it on Ubuntu 22. Anaconda already runs on port 8888. Response 403.

    Unable to run it on Ubuntu 22. Anaconda already runs on port 8888. Response 403.

    Can you please publish a video for the local installation. Trying on Ubuntu with already exiting environment of Jupyter. Nothing is running on 8888. 403 error.

    opened by zeusstuxnet 0
  • Unable to install bern2 dependencies on Mac OS

    Unable to install bern2 dependencies on Mac OS

    Hi,

    I am trying to install bern2 dependencies on mac Os Montery. When I run the below commands , I am getting package not found error for cudatoolkit and faiss-gpu. And these version are not available for osx. I confirmed it through conda search and also by searching on anaconda.org search packages.

    Let me know if I can run this on osx ?

    Dependencies : conda install pytorch==1.9.0 cudatoolkit=10.2 -c pytorch conda install faiss-gpu libfaiss-avx2 -c conda-forge

    Thanks

    opened by naveenjack 3
Releases(v1.1.0)
Owner
DMIS Laboratory - Korea University
Data Mining & Information Systems Laboratory @ Korea University
DMIS Laboratory - Korea University
This repository contains the code for running the character-level Sandwich Transformers from our ACL 2020 paper on Improving Transformer Models by Reordering their Sublayers.

Improving Transformer Models by Reordering their Sublayers This repository contains the code for running the character-level Sandwich Transformers fro

Ofir Press 53 Sep 26, 2022
Twitter-Sentiment-Analysis - Analysis of twitter posts' positive and negative score.

Twitter-Sentiment-Analysis The hands-on project is in Python 3 Programming class offered by University of Michigan via Coursera. The task is to build

Eszter Pai 1 Jan 03, 2022
Chinese Pre-Trained Language Models (CPM-LM) Version-I

CPM-Generate 为了促进中文自然语言处理研究的发展,本项目提供了 CPM-LM (2.6B) 模型的文本生成代码,可用于文本生成的本地测试,并以此为基础进一步研究零次学习/少次学习等场景。[项目首页] [模型下载] [技术报告] 若您想使用CPM-1进行推理,我们建议使用高效推理工具BMI

Tsinghua AI 1.4k Jan 03, 2023
Experiments in converting wikidata to ftm

FollowTheMoney / Wikidata mappings This repo will contain tools for converting Wikidata entities into FtM schema. Prefixes: https://www.mediawiki.org/

Friedrich Lindenberg 2 Nov 12, 2021
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

ICTNLP 29 Oct 16, 2022
Extract Keywords from sentence or Replace keywords in sentences.

FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Install

Vikash Singh 5.3k Jan 01, 2023
Python generation script for BitBirds

BitBirds generation script Intro This is published under MIT license, which means you can do whatever you want with it - entirely at your own risk. Pl

286 Dec 06, 2022
The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

VAENAR-TTS This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis". Sa

THUHCSI 138 Oct 28, 2022
Control the classic General Instrument SP0256-AL2 speech chip and AY-3-8910 sound generator with a Raspberry Pi and this Python library.

GI-Pi Control the classic General Instrument SP0256-AL2 speech chip and AY-3-8910 sound generator with a Raspberry Pi and this Python library. The SP0

Nick Bild 8 Dec 15, 2021
🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy

floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy floret is an extended version of fastText that can produce word repr

Explosion 222 Dec 16, 2022
Code for the Python code smells video on the ArjanCodes channel.

7 Python code smells This repository contains the code for the Python code smells video on the ArjanCodes channel (watch the video here). The example

55 Dec 29, 2022
RIDE automatically creates the package and boilerplate OOP Python node scripts as per your needs

RIDE: ROS IDE RIDE automatically creates the package and boilerplate OOP Python code for nodes as per your needs (RIDE is not an IDE, but even ROS isn

Jash Mota 20 Jul 14, 2022
运小筹公众号是致力于分享运筹优化(LP、MIP、NLP、随机规划、鲁棒优化)、凸优化、强化学习等研究领域的内容以及涉及到的算法的代码实现。

OlittleRer 运小筹公众号是致力于分享运筹优化(LP、MIP、NLP、随机规划、鲁棒优化)、凸优化、强化学习等研究领域的内容以及涉及到的算法的代码实现。编程语言和工具包括Java、Python、Matlab、CPLEX、Gurobi、SCIP 等。 关注我们: 运筹小公众号 有问题可以直接在

运小筹 151 Dec 30, 2022
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Hugging Face 15k Jan 02, 2023
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

Rasa 15.3k Dec 30, 2022
SDL: Synthetic Document Layout dataset

SDL is the project that synthesizes document images. It facilitates multiple-level labeling on document images and can generate in multiple languages.

Sơn Nguyễn 0 Oct 07, 2021
原神抽卡记录数据集-Genshin Impact gacha data

提要 持续收集原神抽卡记录中 可以使用抽卡记录导出工具导出抽卡记录的json,将json文件发送至[email protected],我会在清除个人信息后

117 Dec 27, 2022
TweebankNLP - Pre-trained Tweet NLP Pipeline (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Models + Tweebank-NER

TweebankNLP This repo contains the new Tweebank-NER dataset and off-the-shelf Twitter-Stanza pipeline for state-of-the-art Tweet NLP, as described in

Laboratory for Social Machines 84 Dec 20, 2022
Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification"

PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art

THUNLP 118 Dec 30, 2022
🏆 • 5050 most frequent words in 109 languages

🏆 Most Common Words Multilingual 5000 most frequent words in 109 languages. Uses wordfrequency.info as a source. 🔗 License source code license data

14 Nov 24, 2022