PyABSA - Open & Efficient Framework for Aspect-based Sentiment Analysis

Overview

Aspect Term Extraction (ATE) & Aspect Polarity Classification (APC)

Fast, low-memory, and enhanced implementation of Local Context Focus (LCF)

Built from LC-ABSA / LCF-ABSA / LCF-BERT and LCF-ATEPC.

PyTorch implementations (CPU & CUDA supported).

If you would like to support the PyABSA project, please star this repository as your contribution.

1. Package Overview

pyabsa: package root (including all interfaces)
pyabsa.functional: recommended interface entry
pyabsa.functional.checkpoint: checkpoint manager entry and inference model entry
pyabsa.functional.dataset: datasets entry
pyabsa.functional.config: predefined config manager
pyabsa.functional.trainer: training module; every trainer returns an inference model

2. Read the Important Tips

2.1 Use your custom dataset

PyABSA uses FindFile to locate the target file, which means you can specify a dataset or checkpoint by keyword instead of by absolute path.

  • First, refer to ABSADatasets to prepare your dataset in an acceptable format.
  • You can open a PR to contribute your dataset and use it like ABSADatasets.your_dataset, or reference it by absolute or relative path, or by dataset directory name:
dataset = './laptop' # relative path
dataset = 'ABSOLUTE_PATH/laptop/' # absolute path
dataset = 'laptop' # dataset directory name; keyword case doesn't matter
dataset = 'lapto' # partial keyword: matches any directory whose path contains 'lapto' (likewise for 'aptop')

checkpoint = 'lcfs' # checkpoints are specified in the same way
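
The keyword lookup described above can be pictured with a small, self-contained sketch. It is not PyABSA's FindFile implementation: the helper name `find_dirs_by_keyword` and the temporary directory layout are hypothetical, and the matching rule (case-insensitive substring of the relative path) is an assumption based on the comments above.

```python
import os
import tempfile


def find_dirs_by_keyword(root, keyword):
    """Return directories under `root` whose relative path contains `keyword`.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    keyword = keyword.lower()
    matches = []
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            full_path = os.path.join(dirpath, name)
            # Match against the path relative to `root` so the root's own
            # name cannot produce accidental hits.
            relative = os.path.relpath(full_path, root)
            if keyword in relative.lower():
                matches.append(full_path)
    return sorted(matches)


# Build a throwaway directory tree and search it with a partial keyword.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "datasets", "Laptop14"))
    os.makedirs(os.path.join(root, "datasets", "Restaurant14"))
    found = find_dirs_by_keyword(root, "lapto")
    found_names = [os.path.basename(path) for path in found]

print(found_names)
```

A short keyword can match more than one directory, so a more specific keyword or an explicit path is the safer choice when several datasets share a prefix.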

2.2 Auto-select a free CUDA device for training & inference

PyABSA uses AutoCUDA to assign a CUDA device automatically, but you can still set a preferred device.

auto_device=True  # to auto assign a cuda device for training / inference
auto_device=False  # to use cpu
auto_device='cuda:1'  # to specify a preferred device
auto_device='cpu'  # to specify a preferred device
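
As a rough illustration of how the four settings above could be resolved to a device string, here is a minimal sketch. It is not the AutoCUDA implementation: `resolve_device`, its parameters, and the rule of picking the GPU with the most free memory are all assumptions made for this example, and the fallback to CPU when CUDA is unavailable is a design choice of the sketch.

```python
def resolve_device(auto_device, cuda_available, free_memory_by_gpu=None):
    """Map an auto_device-style setting to a device string (hypothetical helper).

    auto_device: True, False, or an explicit string such as 'cuda:1' or 'cpu'.
    cuda_available: whether any CUDA device can be used.
    free_memory_by_gpu: optional {gpu_index: free_bytes} used to pick a GPU.
    """
    if isinstance(auto_device, str):
        # Honor an explicit preference, but fall back to the CPU when a CUDA
        # device was requested and none is available.
        if auto_device.startswith("cuda") and not cuda_available:
            return "cpu"
        return auto_device
    if not auto_device or not cuda_available:
        return "cpu"
    if free_memory_by_gpu:
        # Assumption: "free" means the GPU with the most unused memory.
        best_gpu = max(free_memory_by_gpu, key=free_memory_by_gpu.get)
        return f"cuda:{best_gpu}"
    return "cuda:0"


print(resolve_device(True, True, {0: 2_000_000_000, 1: 8_000_000_000}))
print(resolve_device(False, True))
print(resolve_device("cuda:1", False))
```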

2.3 More flexible labels

PyABSA supports automatic label fixing, which means you can set the labels to any token (except -999), e.g., sentiment labels = {-9, 2, negative, positive}.

  • Check that the version and datasets of a checkpoint are compatible with your current PyABSA. The PyABSA version is also shown in the training args printed while loading a checkpoint.
  • You can train a model using multiple datasets that share the same sentiment labels, and you can even define and contribute a combination of datasets.
  • Other features are waiting to be discovered.
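
One way to picture label fixing is as a mapping from arbitrary label tokens to contiguous class indices. The sketch below is an illustration only, not PyABSA's code: `build_label_maps` is a hypothetical helper, and treating -999 as a reserved value to skip is an assumption drawn from the "(except -999)" remark above.

```python
def build_label_maps(raw_labels, reserved_label=-999):
    """Map arbitrary label tokens to contiguous indices (hypothetical helper)."""
    # Compare labels as strings so that 2 and "2" count as the same token.
    unique = sorted({str(label) for label in raw_labels if str(label) != str(reserved_label)})
    label_to_index = {label: index for index, label in enumerate(unique)}
    index_to_label = {index: label for label, index in label_to_index.items()}
    return label_to_index, index_to_label


raw = [-9, 2, "negative", "positive", 2, -999, "negative"]
label_to_index, index_to_label = build_label_maps(raw)
print(label_to_index)
```

Keeping the reverse map (`index_to_label`) makes it possible to report predictions in the original label vocabulary.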

3. Quick Start

  • Create a new Python environment and install pyabsa
  • Find a target demo script (ATEPC, APC, Text Classification) to start from
  • Format your dataset according to ABSADatasets, or use a public dataset from ABSADatasets
  • Initialize your config to specify the model, dataset, and hyperparameters
  • Train your model and get a checkpoint
  • Share your checkpoint and dataset
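
The steps above might look roughly like the following sketch for the 1.x `pyabsa.functional` interface. `Trainer`, `ATEPCModelList.LCFS_ATEPC`, `ABSADatasetList.Restaurant14`, the config fields, and `load_trained_model()` all appear in user reports for that interface; `ATEPCConfigManager` and `get_atepc_config_english()` are assumptions, as is importing everything from `pyabsa.functional`. Check the demo scripts of your installed version before relying on any of these names.

```python
def train_atepc():
    """Train an ATEPC model and return the trained aspect extractor (sketch)."""
    # Imported lazily so that this file can be loaded without PyABSA installed.
    from pyabsa.functional import (
        ABSADatasetList,
        ATEPCConfigManager,  # assumed name, not confirmed by this README
        ATEPCModelList,
        Trainer,
    )

    config = ATEPCConfigManager.get_atepc_config_english()  # assumed helper
    config.model = ATEPCModelList.LCFS_ATEPC
    config.evaluate_begin = 5  # value taken from a user report, not a recommendation
    config.num_epoch = 6
    config.log_step = 100

    trainer = Trainer(
        config=config,
        dataset=ABSADatasetList.Restaurant14,
        checkpoint_save_mode=1,
        auto_device=True,
    )
    return trainer.load_trained_model()
```

Calling `train_atepc()` starts a full training run and downloads the dataset on first use, so it is best run from a dedicated script.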

4. Installation

To avoid installing a test version, please do not install a version that has no corresponding release note.

4.1 Install via pip

To use PyABSA, install the latest version from pip or from source:

pip install -U pyabsa

4.2 Install from source

git clone https://github.com/yangheng95/PyABSA --depth=1
cd PyABSA 
python setup.py install

5. Learning to Use Checkpoint

5.1 How to get available checkpoints from Google Drive

Before loading, PyABSA checks for the latest available checkpoints and downloads the latest one from Google Drive. To view the available checkpoints, use the following code, then load a checkpoint by name:

from pyabsa import available_checkpoints
checkpoint_map = available_checkpoints()  # show the checkpoints available for the current PyABSA version

If you cannot access Google Drive, you can download our pretrained checkpoints from here (extraction code: ABSA), then unzip and load the checkpoint manually.

5.2 How to use our pretrained checkpoints on your dataset
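
A minimal sketch of running a pretrained checkpoint on your own texts, assuming the 1.x `pyabsa.functional` interface. `ATEPCCheckpointManager.get_aspect_extractor` and the `extract_aspect` keyword arguments are taken from a user report on this page; passing a plain list of strings as `inference_source` is an assumption, and `extract_aspects` is a hypothetical wrapper.

```python
def extract_aspects(texts, checkpoint="multilingual"):
    """Extract aspect terms and their sentiment from `texts` (sketch)."""
    # Imported lazily so that this file can be loaded without PyABSA installed.
    from pyabsa.functional import ATEPCCheckpointManager

    # The checkpoint is located by keyword, as described in section 2.1.
    aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(checkpoint=checkpoint)
    return aspect_extractor.extract_aspect(
        inference_source=texts,  # assumption: a list of raw sentences is accepted
        save_result=False,
        print_result=True,
        pred_sentiment=True,  # also predict the sentiment of each extracted aspect
    )
```

The first call downloads the checkpoint, so it needs network access or a manually unzipped checkpoint directory.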

5.3 How to share checkpoints (e.g., checkpoints trained on your custom dataset) with community

6. Datasets

More datasets are available at ABSADatasets.

  1. Twitter
  2. Laptop14
  3. Restaurant14
  4. Restaurant15
  5. Restaurant16
  6. Phone
  7. Car
  8. Camera
  9. Notebook
  10. MAMS
  11. TShirt
  12. Television
  13. MOOC
  14. Shampoo
  15. Multilingual (The sum of all datasets.)

You don't have to download the datasets yourself; they are downloaded automatically.

7. Model Support

In addition to the following models, we provide a template model involving the LCF vector; you can develop your own model based on the LCF-APC model template or the LCF-ATEPC model template.

7.1 ATEPC

  1. LCF-ATEPC
  2. LCF-ATEPC-LARGE (Dual BERT)
  3. FAST-LCF-ATEPC
  4. LCFS-ATEPC
  5. LCFS-ATEPC-LARGE (Dual BERT)
  6. FAST-LCFS-ATEPC
  7. BERT-BASE

7.2 APC

BERT-based APC models

  1. SLIDE-LCF-BERT (Faster & Performs Better than LCF/LCFS-BERT)
  2. SLIDE-LCFS-BERT (Faster & Performs Better than LCF/LCFS-BERT)
  3. LCF-BERT (Reimplemented & Enhanced)
  4. LCFS-BERT (Reimplemented & Enhanced)
  5. FAST-LCF-BERT (Faster with a slight performance loss)
  6. FAST-LCFS-BERT (Faster with a slight performance loss)
  7. LCF-DUAL-BERT (Dual BERT)
  8. LCFS-DUAL-BERT (Dual BERT)
  9. BERT-BASE
  10. BERT-SPC
  11. LCA-Net
  12. DLCF-DCA-BERT *

BERT-based APC baseline models

  1. AOA_BERT
  2. ASGCN_BERT
  3. ATAE_LSTM_BERT
  4. Cabasc_BERT
  5. IAN_BERT
  6. LSTM_BERT
  7. MemNet_BERT
  8. MGAN_BERT
  9. RAM_BERT
  10. TD_LSTM_BERT
  11. TC_LSTM_BERT
  12. TNet_LF_BERT

GloVe-based APC baseline models

  1. AOA
  2. ASGCN
  3. ATAE-LSTM
  4. Cabasc
  5. IAN
  6. LSTM
  7. MemNet
  8. MGAN
  9. RAM
  10. TD-LSTM
  11. TC-LSTM
  12. TNet_LF

Contribution

We hope you can help us improve this project; your contributions are welcome. You can contribute in many ways, including:

  • Share your custom dataset in PyABSA and ABSADatasets
  • Integrate your models into PyABSA (you can share your models whether or not they are based on PyABSA; if you are interested, we will help you)
  • Report bugs you find while using PyABSA or reviewing the code (PyABSA is an individual project driven by enthusiasm, so your help is needed)
  • Give us advice about feature design or refactoring (you can suggest improvements to any feature)
  • Correct or rewrite error messages and code comments (the comments were not written by native English speakers, so you can help us improve the documentation)
  • Create an example script for a particular situation (such as specifying a SpaCy model, a pretrained BERT type, or some hyperparameters)
  • Star this repository to keep it active

Notice

LCF is a simple and adaptive mechanism proposed for ABSA. Many models based on LCF have been proposed and have achieved SOTA performance, so developing your models on top of LCF can significantly improve them. If you are looking for the original proposal of local context focus, please refer to the introduction of LCF. If you are looking for the original code of the LCF-related papers, please refer to LC-ABSA / LCF-ABSA or LCF-ATEPC.

Acknowledgement

This work builds on LC-ABSA / LCF-ABSA and LCF-ATEPC, and on other impressive works such as PyTorch-ABSA and LCFS-BERT.

License

MIT

Contributors

Thanks go to these wonderful people (emoji key):

  • XuMayi 💻
  • YangHeng 📆
  • brtgpy 🔣
  • Ryan 💻

This project follows the all-contributors specification. Contributions of any kind are welcome!

Comments
  • IndexError: list index out of range | ATEPC English training on Tshirt dataset

    Out-of-range error while training ATEPC model - english on T-shirt dataset.


    ...
    config.model = ATEPCModelList.LCFS_ATEPC
    config.evaluate_begin = 5
    config.num_epoch = 6
    config.log_step = 100
    tshirt = ABSADatasetList.TShirt

    aspect_extractor = Trainer(config=config,
                               dataset=tshirt,
                               checkpoint_save_mode=1,
                               auto_device=True
                               )

    Traceback:

    TShirt dataset is not found locally, search at https://github.com/yangheng95/ABSADatasets
    Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight']

    • This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
    • This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

    Using bos_token, but it is not set yet.
    Using eos_token, but it is not set yet.
    59%|█████▊ | 1098/1870 [00:10<00:07, 100.65it/s, convert examples to features]

    IndexError Traceback (most recent call last) in ()
          3 # from_checkpoint=checkpoint_path,
          4 checkpoint_save_mode=1,
    ----> 5 auto_device=True
          6 )

    7 frames

    /usr/local/lib/python3.7/dist-packages/pyabsa/functional/trainer/trainer.py in __init__(self, config, dataset, from_checkpoint, checkpoint_save_mode, auto_device)
         92 config.model_path_to_save = None
         93
    ---> 94 self.train()
         95
         96 def train(self):

    /usr/local/lib/python3.7/dist-packages/pyabsa/functional/trainer/trainer.py in train(self)
        103 self.config.seed = s
        104 if self.checkpoint_save_mode:
    --> 105 model_path.append(self.train_func(self.config, self.from_checkpoint, self.logger))
        106 else:
        107 # always return the last trained model if dont save trained model

    /usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/training/atepc_trainer.py in train4atepc(opt, from_checkpoint_path, logger)
        352 while not trainer:
        353 try:
    --> 354 trainer = Instructor(opt, logger)
        355 if from_checkpoint_path:
        356 model_path = find_files(from_checkpoint_path, '.model')

    /usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/training/atepc_trainer.py in __init__(self, opt, logger)
         70 len(self.train_examples) / self.opt.batch_size / self.opt.gradient_accumulation_steps) * self.opt.num_epoch
         71 train_features = convert_examples_to_features(self.train_examples, self.label_list, self.opt.max_seq_len,
    ---> 72 self.tokenizer, self.opt)
         73 all_spc_input_ids = torch.tensor([f.input_ids_spc for f in train_features], dtype=torch.long)
         74 all_input_mask = torch.tensor([f.input_mask for f in train_features], dtype=torch.long)

    /usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/dataset_utils/data_utils_for_training.py in convert_examples_to_features(examples, label_list, max_seq_len, tokenizer, opt)
        188 text_right = ''
        189 aspect = ''
    --> 190 prepared_inputs = prepare_input_for_atepc(opt, tokenizer, text_left, text_right, aspect)
        191 lcf_cdm_vec = prepared_inputs['lcf_cdm_vec']
        192 lcf_cdw_vec = prepared_inputs['lcf_cdw_vec']

    /usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/dataset_utils/atepc_utils.py in prepare_input_for_atepc(opt, tokenizer, text_left, text_right, aspect)
         60
         61 if 'lcfs' in opt.model_name or opt.use_syntax_based_SRD:
    ---> 62 syntactical_dist, _ = get_syntax_distance(text_raw, aspect, tokenizer, opt)
         63 else:
         64 syntactical_dist = None

    /usr/local/lib/python3.7/dist-packages/pyabsa/core/apc/dataset_utils/apc_utils.py in get_syntax_distance(text_raw, aspect, tokenizer, opt)
        240 # the following two functions are both designed to calculate syntax-based distances
        241 if opt.srd_alignment:
    --> 242 syntactical_dist = syntax_distance_alignment(raw_tokens, dist, opt.max_seq_len, tokenizer)
        243 else:
        244 syntactical_dist = pad_syntax_based_srd(raw_tokens, dist, tokenizer, opt)[1]

    /usr/local/lib/python3.7/dist-packages/pyabsa/core/apc/dataset_utils/apc_utils.py in syntax_distance_alignment(tokens, dist, max_seq_len, tokenizer)
         38 if bert_tokens != text:
         39 while text or bert_tokens:
    ---> 40 if text[0] == ' ' or text[0] == '\xa0': # bad case handle
         41 text = text[1:]
         42 dep_dist = dep_dist[1:]

    IndexError: list index out of range
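
The last frame of the traceback shows a loop guarded by `while text or bert_tokens:` that then indexes `text[0]`. The sketch below reproduces that failure mode in isolation: the two functions are simplified stand-ins written for this example, not the PyABSA implementation, and the guarded variant is only one possible mitigation whose suitability for the real alignment code is not established.

```python
def skip_blank_tokens(text, bert_tokens):
    """Simplified loop with the same shape as the one in the traceback."""
    while text or bert_tokens:  # still true when `text` is already empty
        if text[0] == ' ' or text[0] == '\xa0':
            text = text[1:]
        else:
            break
    return text


def skip_blank_tokens_guarded(text, bert_tokens):
    """Same loop, but `text` is only indexed while it is non-empty."""
    while text and bert_tokens:
        if text[0] == ' ' or text[0] == '\xa0':
            text = text[1:]
        else:
            break
    return text


try:
    skip_blank_tokens([' '], ['hello'])
    raised = False
except IndexError:
    raised = True

print(raised)
print(skip_blank_tokens_guarded([' '], ['hello']))
```

With `or`, the loop keeps running after `text` has been consumed as long as `bert_tokens` is non-empty, which is when `text[0]` fails. A guard avoids the crash but may hide a genuine misalignment between the two token lists, so the offending sample is still worth inspecting.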

    bug 
    opened by hitz02 52
  • If a review contains numbers or emojis, no entities are generated

    I am applying the PyABSA package to Amazon mobile phone reviews, and it does not generate attributes when the review contains numbers or emojis.

    For example: iPhone 12. Best phone 😍 Genuine product thanks a lot amazon I purchase this divice 20 jan 2022 almost work fine. Best one

    For the review above and similar ones, it does not generate entities with sentiment. I would really appreciate it if this issue could be resolved.

    opened by ImSanjayChintha 17
  • [Question] Why are all sentiments predicted as Positive?

    Question: Hi, you have done great work on this project.

    I used this project to train on a custom dataset with around 2000 examples. The label counts are slightly imbalanced. In the end, I trained a model for 100 epochs and achieved an apc_acc of around 90. But the predicted result is always Positive for every aspect.

    Do you have any advice? Thanks very much.

    opened by brightgems 17
  • Question about inference

    Hi, thanks for the nice work. Recently I tried to use the multilingual pretrained model for inference. I found that if the model predicts two consecutive words both as B-ASP, an 'empty separator' error is raised during inference. Is there any advice for avoiding this situation? Thanks again!
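
For context, `empty separator` is the message Python attaches to the `ValueError` raised when `str.split` is called with an empty string as the separator. Whether that is the exact cause inside PyABSA is an assumption; the sketch below only demonstrates the error and one defensive pattern, and `split_on` is a hypothetical helper.

```python
def split_on(text, separator):
    """Split `text` on `separator`, returning [text] when the separator is empty."""
    if not separator:
        return [text]
    return text.split(separator)


try:
    "battery life".split("")
    message = None
except ValueError as err:
    message = str(err)

print(message)
print(split_on("battery life", ""))
```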

    bug 
    opened by leohsuofnthu 16
  • Question about the version of the package used by the framework

    Hello, excuse me.

    1. Would it be convenient for you to write a document listing the versions of each package used by the framework?
    2. One more question: will the packages used by the framework be kept up to date? For example, if torch is upgraded to 1.11.0, will the framework be updated promptly?
    opened by yaoysyao 15
  • Some texts yield no results when analyzed with ATEPC

    Hello, sorry to bother you, and thank you for your hard work maintaining this project. I ran into the following problem while using it. Version: 1.16.5. The text is as follows: Let me begin by saying that there are two kinds of people, those who will give the Tokyo Hotel 5 stars and rave about it to everyone they know, or... people who can't get past the broken phone, blood stains, beeping fire alarms, peg-legged receptionist, lack of water pressure, cracked walls, strange smells, questionable elevator, televisions left to die after the digital conversion, and the possibility that the air conditioner may fall out the window at any moment. That being said, I whole-heartedly give the Tokyo Hotel 5 stars. This is not a place to quietly slip in and out of with nothing to show but a faint memory of the imitation Thomas Kinkade painting bolted to the wall above your bed. And, there is no continental breakfast or coffee in the lobby. There are a few vending machines, but I'm pretty sure they wont take change minted after 1970. Here your senses will be assaulted, and after you leave you will have enough memories to compete with a 1,000 mile road-trip. I beg anyone who is even mildly considering staying here to give it a chance. The location is prime. We were able to walk down Michigan Ave and the river-walk in the middle of the night, all without straying too far from the hotel. There is a grocery store a block away and parking (which may cost more that your hotel room) across the street. Besides, this place is cheap. Super-cheap for downtown Chicago. The closest price we found in the area was four times as expensive. But, be sure to grab some cash. They don't accept credit cards. Some rules though: - Say hello to Clifton Jackson, the homeless guy by Jewel-Osco. - Buy him a drink, some chicken and look him up on Facebook. - Stay on the 17 floor. All the way at the top. - Go out the fire escape (be sure to prop the door open or you'll have a looong walk down) - Be very very careful. - Explore. (Yes, that ladder will hold your weight) - Be very very careful. - Don't be alarmed by any weird noises you hear. - Spend the night on the roof. 17 stories up, in the heart of Chicago. - Write your own Yelp review. I want to see that others are getting the Tokyo Hotel Experience. - Check out is at noon. Be sure to drink lots of water. - Spend the next day hung over. And... Please be careful on the roof.

    Pretrained model used: fast_lcf_atepc_Multilingual_cdw_apcacc_88.96_apcf1_81.58_atef1_81.92

    Result obtained: 'aspect': [], 'position': [], 'sentiment': [], 'probs': [], 'confidence': []

    As the result shows, the fine-grained sentiment of the text could not be analyzed. Is this caused by the text or by the model? Regarding the pretrained models, I saw on Hugging Face that you have updated some checkpoints; can those models be loaded and used directly?

    opened by yaoysyao 12
  • [Question] atepc prediction result is array, but its length is not equal with inputs

    Environment: pyabsa v1.1.22

    Question: the ATEPC prediction result is an array, but its length does not equal the number of inputs. For example:

    inputs: examples = ['我就想问,这个真的用清水可以清洗的干净的吗?洗完之后油的吹不太干……难不成我昨晚发膜还要拿洗发水再洗一遍?那请问意义何在了……实在是很尴尬']*20

    outputs [{'sentence': '我 就 想 问 , 这 个 真 的 用 清 水 可 以 清 洗 的 干 净 的 吗 ? 洗 完 之 后 油 的 吹 不 太 干 & hellip ; & hellip ; 难 不 成 我 昨 晚 发 膜 还 要 拿 洗 发 水 再 洗 一 遍 ? 那 请 问 意 义 何 在 了 & hellip ; & hellip ; 实 在 是 很 尴 尬', 'IOB': ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'I-ASP', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', '[SEP]', 'O', 'O', 'O', 'O', 'O', 'O'], 'tokens': ['我', '就', '想', '问', ',', '这', '个', '真', '的', '用', '清', '水', '可', '以', '清', '洗', '的', '干', '净', '的', '吗', '?', '洗', '完', '之', '后', '油', '的', '吹', '不', '太', '干', '&', 'hellip', ';', '&', 'hellip', ';', '难', '不', '成', '我', '昨', '晚', '发', '膜', '还', '要', '拿', '洗', '发', '水', '再', '洗', '一', '遍', '?', '那', '请', '问', '意', '义', '何', '在', '了', '&', 'hellip', ';', '&', 'hellip', ';', '实', '在', '是', '很', '尴', '尬'], 'aspect': ['完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干', '完 之 后 油 的 吹 不 太 干'], 'position': [[23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 
24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31], [23, 24, 25, 26, 27, 28, 29, 30, 31]], 'sentiment': ['Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative']}]

    opened by brightgems 12
  • Using the deploy demo, the sentiment prediction is always Positive

    checkpoint = 'model_garden/V0.8.8.0/Chinese/ATEPC/fast_lcf_atepc_Chinese_cdw_apcacc_96.69_apcf1_96.25_atef1_92.26'
    checkpoint = 'model_garden/V0.8.8.0/Chinese/ATEPC/fast_lcf_atepc_Multilingual_cdw_apcacc_79.61_apcf1_76.24_atef1_63.29.zip'

    I tried both of these models, and the sentiment is always Positive, even for strongly negative expressions.

    opened by jkkl 11
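  • Question about low APC metrics when using ATEPC on a custom dataset

    I annotated my own dataset. Running the APC task alone (with your APC model) gives normal metrics: APC acc is 91 and F1 is 91. But when I run the multi-task setup with your ATEPC, the ATE metrics are normal while the APC metrics are very low, with an F1 of 37.37 (max: 37.37). Why is that? For multi-task use, can I only run aspect term extraction (ATE) first and then sentiment polarity classification?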

    opened by zhujinqiu 11
  • IndexError: list index out of range

    Hi, yangheng! The project used to work fine on my computer, but after installing the latest version of pyabsa, an IndexError occurs as below:

    yelp = "C:/Users/Li Wei/integrated_datasets/apc_datasets/SemEval/yelprestaurant"
    aspect_extractor = Trainer(config=config,
                               dataset=yelp,
                               checkpoint_save_mode=1,
                               auto_device=True
                               ).load_trained_model()

    The IndexError raised by the code above is:

    IndexError Traceback (most recent call last) in
          1 yelp = "C:/Users/Li Wei/integrated_datasets/apc_datasets/SemEval/yelprestaurant"
    ----> 2 aspect_extractor = Trainer(config=config,
          3 dataset=yelp,
          4 checkpoint_save_mode=1,
          5 auto_device=True

    D:\Anaconda\lib\site-packages\pyabsa\functional\trainer\trainer.py in __init__(self, config, dataset, from_checkpoint, checkpoint_save_mode, auto_device)
         71
         72 """
    ---> 73 config.ABSADatasetsVersion = query_local_version()
         74 if isinstance(config, APCConfigManager):
         75 self.train_func = train4apc

    D:\Anaconda\lib\site-packages\pyabsa\utils\file_utils.py in query_local_version()
        293 def query_local_version():
        294 fin = open(find_cwd_file(['__init__.py', 'integrated_datasets']))
    --> 295 local_version = fin.read().split('\'')[-2]
        296 fin.close()
        297 return local_version

    IndexError: list index out of range
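
The failing line takes the second-to-last piece of the file content after splitting it, which raises `IndexError` whenever the split yields fewer than two pieces. The sketch below shows that behavior and a more defensive alternative. Several things here are assumptions: that the separator is a single quote (the scraped line is garbled), that the file contains a line of the form `__version__ = '...'`, and both helper names, which are hypothetical.

```python
import re


def parse_version_unsafe(init_text):
    """Mimic the split-based parsing; fails when no single quotes are present."""
    return init_text.split("'")[-2]


def parse_version(init_text, default=None):
    """Find a __version__ assignment with either quote style, or return `default`."""
    match = re.search(r"__version__\s*=\s*['\"]([^'\"]+)['\"]", init_text)
    return match.group(1) if match else default


print(parse_version_unsafe("__version__ = '2022.01.01'\n"))
print(parse_version('__version__ = "2022.01.01"\n'))
print(parse_version("", default="unknown"))

try:
    parse_version_unsafe("")  # an empty or unexpected file triggers the error
    unsafe_failed = False
except IndexError:
    unsafe_failed = True
```

Returning a default instead of raising lets the caller decide whether a missing version is fatal; it does not explain why the local file lacked a version line in the first place.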

    opened by WeiLi9811 11
  • Torch not compiled with CUDA enabled

    I ran "https://github.com/yangheng95/PyABSA/blob/release/examples/aspect_term_extraction/extract_aspects_chinese.py" on a CPU device with "auto_device=False", but received the error message "Torch not compiled with CUDA enabled". I have checked the "AspectExtractor" class and the "LCF_ATEPC" model class, but found no mistake.

    opened by zhihao-chen 11
  • Question on ATEPC performance metrics and loss.

    Hi author @yangheng95 ,

    I'm using the FAST-LCF-ATEPC model on my custom dataset and I have 4 questions on the ATEPC performance metrics and loss:

    1. What's the difference between these 2 Metric Visualizer (MV) tables? Is the validation set used to calculate these metrics?

    2. As I understand from atepc_trainer.py, there are 3 types of losses: loss_ate, loss_apc, and the combined loss, which uses the formula loss = loss_ate + ate_loss_weight * loss_apc. Could you explain in simple terms how each loss is calculated from the expected output and the actual output?

    3. Following on from question 2, I want to check whether the model overfits my dataset, and to do that I need to plot the training loss and the validation loss. Does the 'losses' list refer to the training loss? (see below) https://github.com/yangheng95/PyABSA/blob/964d7862da13ef8cc38cb56fe0e65086b343a9cd/pyabsa/core/atepc/training/atepc_trainer.py#L204

    4. How can I retrieve the validation loss for ATE and APC separately so that I can plot them in a graph?
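
The combined loss quoted in question 2 and the overfitting check described in question 3 can be written out as a small sketch. The formula is the one quoted from atepc_trainer.py; everything else (the function names, the example numbers, and the rule "validation loss rises while training loss falls") is an assumption made for illustration and does not describe how PyABSA records its losses.

```python
def combined_loss(loss_ate, loss_apc, ate_loss_weight):
    """loss = loss_ate + ate_loss_weight * loss_apc, as quoted in the question."""
    return loss_ate + ate_loss_weight * loss_apc


def first_overfitting_epoch(train_losses, val_losses):
    """Return the first epoch where validation loss rises while training loss falls."""
    for epoch in range(1, len(train_losses)):
        train_improved = train_losses[epoch] < train_losses[epoch - 1]
        val_worsened = val_losses[epoch] > val_losses[epoch - 1]
        if train_improved and val_worsened:
            return epoch
    return None


# Illustrative per-epoch mean losses; these are made-up numbers, not results.
train_losses = [1.0, 0.8, 0.6, 0.5]
val_losses = [1.1, 0.9, 0.95, 1.0]

print(combined_loss(loss_ate=0.5, loss_apc=0.25, ate_loss_weight=2.0))
print(first_overfitting_epoch(train_losses, val_losses))
```

Plotting ATE and APC separately would require computing `loss_ate` and `loss_apc` on the validation set each epoch and storing them in two lists instead of only the combined value.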

    Kind regards, kerolzeeq

    opened by kerolzeeq 4
  • Performance measures test data FAST_LCF checkpoint model

    Dear @yangheng95,

    Thanks for making and maintaining this repo, it's great!

    I'm having trouble getting the accuracy and F1 scores for the Restaurant Test data Gold. (Ideally I want to make a confusion matrix.) What is the easiest way to get F1 scores for APC & ATE after running a checkpoint model on test data? Does the model store these metrics somewhere?

    Alternatively, how do you compare your predictions to the TRUE test data (Restaurant Test data Gold annotated)? I can easily transform the models' predictions ('atepc_inference.result_json') to a pandas dataframe. But it is very hard to transform the test data stored in integrated datasets (from ABSAdatasets) (it is in IOB format) to that exact same format (pandas dataframe) in order to test performance. Do you have a script for that, or a certain function? I was not able to find it.

    Btw: I used the multilingual checkpoint model (FAST-LCF-ATEPC) on the Restaurant14 Test data Gold (But, ultimately I want to use this model on Dutch data. That is why I want to know how to test performance).

    Thanks a lot,

    Karsten

    Code:

    import pyabsa as pyabsa
    
    from pyabsa import available_checkpoints
    # The results of available_checkpoints() depend on the PyABSA version
    checkpoint_map = available_checkpoints()  # show available checkpoints of PyABSA of current version 
    
    from pyabsa.functional import ABSADatasetList
    from pyabsa.functional import ATEPCCheckpointManager
    inference_source = ABSADatasetList.Restaurant14
    aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(checkpoint='multilingual')
    atepc_result = aspect_extractor.extract_aspect(inference_source=inference_source,
                                                   save_result=True,
                                                   print_result=True,  # print the result
                                                   pred_sentiment=True,  # Predict the sentiment of extracted aspect terms
                                                   )
    
    import pandas as pd
    df_restaurant_EN_test_pred = pd.read_json('atepc_inference.result_EN.json')
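
One way to compare predictions with gold annotations in IOB format is to convert both tag sequences into aspect spans and score exact span matches. The sketch below assumes the tags `B-ASP`, `I-ASP`, and `O`, which appear in outputs quoted on this page; the helper names are hypothetical, and exact-match span F1 is not necessarily the metric PyABSA reports.

```python
def iob_to_spans(tags):
    """Convert IOB tags into (start, end) token spans, with `end` exclusive."""
    spans = []
    start = None
    for index, tag in enumerate(tags):
        if tag == "B-ASP":
            if start is not None:
                spans.append((start, index))  # close the previous aspect
            start = index
        elif tag == "I-ASP":
            if start is None:
                start = index  # tolerate an I-ASP without a preceding B-ASP
        else:
            if start is not None:
                spans.append((start, index))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def span_f1(gold_tags, pred_tags):
    """Exact-match F1 between gold and predicted aspect spans of one sentence."""
    gold = set(iob_to_spans(gold_tags))
    pred = set(iob_to_spans(pred_tags))
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["O", "B-ASP", "I-ASP", "O", "B-ASP"]
pred = ["O", "B-ASP", "I-ASP", "O", "O"]
print(iob_to_spans(gold))
print(span_f1(gold, pred))
```

For a corpus-level score, the span sets would be accumulated over all sentences (keyed by sentence index) before computing precision and recall, rather than averaging per-sentence F1 values.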
    
    opened by KarstenLasse 3
  • update ATEPC for ATE and ACD

    Hello, can LCF-ATEPC be updated to do aspect term extraction and aspect category detection on the SemEval dataset (instead of aspect polarity classification), replacing the sentiment polarities (positive, negative, neutral) with aspect categories (food, service, ...)? Thanks in advance.

    opened by Astudnew 1
Releases: v2.0.11
Owner: YangHeng (PhD, University of Exeter)