Implementation of the TF-IDF algorithm to find document similarity using cosine similarity

Last update: Aug 25, 2022

Overview

NLP learning

Trying to learn NLP to use in my projects!

Table of Contents

About The Project
- Built With
Getting Started
- Requirements
- Run
Usage
License
Contact

About The Project

There are many ways and algorithms for machines to understand language, but first we have to convert words to vectors, because we need to do calculations on them.

Here are some NLP topics that I have learned so far:

Using classic AI algorithms like Naive Bayes
Using TF-IDF to convert words to vectors
Using word2vec to convert words to vectors

Of course, the list above is not complete, but we will expand it in the future.
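
To illustrate the TF-IDF item in the list above, here is a minimal sketch of turning documents into vectors. It is not taken from this repository's main.py: the function names (`tokenize`, `tfidf_vectors`), the whitespace tokenizer, and the weighting (tf = count / document length, idf = log(N / document frequency)) are assumptions chosen for illustration, and other TF-IDF variants exist. It is written in plain Python so it runs without dependencies, whereas the project itself relies on NumPy.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    Assumed weighting: tf = count / document length,
    idf = log(N / document frequency).
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


docs = ["the cat sat", "the dog sat", "the bird flew"]
vocab, vectors = tfidf_vectors(docs)
```

With this weighting, a term that occurs in every document (here "the") gets a weight of zero, while rarer terms such as "cat" get a positive weight.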

Built With

The project is built with Python 3 and NumPy.

Getting Started

To get a local copy up and running, follow these simple steps.

Requirements

We use NumPy for its array and math functions.

numpy
```
pip install numpy
```

Run

$ python3 main.py

Usage

With the TF-IDF algorithm implemented, you can measure the similarity between different documents, which makes it usable in chatbots and search engines.
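
As a rough sketch of how such a similarity search might look, the example below ranks documents against a query by cosine similarity. The function names (`cosine_similarity`, `rank_documents`) and the vectors are hypothetical and not taken from this repository. The vectors stand in for TF-IDF vectors over a three-term vocabulary, and returning 0.0 for an all-zero vector is a convention assumed here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, document_vectors):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(document_vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical TF-IDF vectors over a 3-term vocabulary.
document_vectors = [
    [0.5, 0.0, 0.1],
    [0.0, 0.4, 0.0],
    [0.3, 0.3, 0.0],
]
query_vector = [1.0, 0.0, 0.0]

ranking = rank_documents(query_vector, document_vectors)
best_index, best_score = ranking[0]
```

A search engine or chatbot could return the document at `best_index` as the closest match. Because cosine similarity depends only on the angle between vectors, a long and a short document about the same terms can still score as similar.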

For more examples, please refer to the Documentation.

License

Distributed under the MIT License. See LICENSE.md for more information.

Contact

Faraz Farangizadeh - [email protected]

Project Link: https://github.com/farazff/NLP-Learning
