Dados Públicos de CNPJ disponibilizados pela Receita Federal do Brasil

Overview

Dados Públicos CNPJ

  • Fonte oficial da Receita Federal do Brasil, aqui.
  • Layout dos arquivos, aqui.

A Receita Federal do Brasil disponibiliza bases com os dados públicos do cadastro nacional de pessoas jurídicas (CNPJ).

De forma geral, nelas constam as mesmas informações que conseguimos ver no cartão do CNPJ, quando fazemos uma consulta individual, acrescidas de outros dados de Simples Nacional, sócios e etc. Análises muito ricas podem sair desses dados, desde econômicas, mercadológicas até investigações.

Nesse repositório consta um processo de ETL para i) baixar os arquivos; ii) descompactar; iii) ler, tratar e iv) inserir num banco de dados relacional PostgreSQL.


Infraestrutura necessária:

  • Python 3.8 - libraries:

    • wget
    • pandas
    • ftplib
    • datetime
    • gzip
    • urllib
    • bs4
    • re
    • os
    • zipfile
    • sqlalchemy
    • psycopg2
    • time
  • Banco de dados:


How to use:

  1. Com o Postgre instalado, inicie a instância do servidor (pode ser local) e crie o banco de dados conforme o arquivo banco_de_dados.sql.

  2. Conforme o seu ambiente, substitua as variáveis abaixo no arquivo ETL_coletar_dados_e_gravar_BD.py:

    • output_files: diretório de destino para o donwload dos arquivos
    • user: usuário do banco de dados criado pelo arquivo banco_de_dados.sql
    • passw: senha do usuário do BD
    • host: host da conexão com o BD
    • port: porta da conexão com o BD
    • database: nome da base de dados na instância (Dados_RFB - conforme arquivo banco_de_dados.sql)
  3. Executar o arquivo ETL_coletar_dados_e_gravar_BD.py e aguardar a finalização do processo.

    • Os arquivos são grandes: dependendo da infraestrutura isso deve levar muitas horas para conclusão.
    • Arquivos de 08/05/2021: 4,68 GB compactados e 17,1 GB descompactados.

Tabelas geradas:

  • Para maiores informações, consulte o layout.

    • empresa: dados cadastrais da empresa em nível de matriz
    • estabelecimento: dados analíticos da empresa por unidade / estabelecimento (telefones, endereço, filial, etc)
    • socios: dados cadastrais dos sócios das empresas
    • simples: dados de MEI e Simples Nacional
    • cnae: código e descrição dos CNAEs
    • quals: tabela de qualificação das pessoas físicas - sócios, responsável e representante legal.
    • natju: tabela de naturezas jurídicas - código e descrição.
    • moti: tabela de motivos da situação cadastral - código e descrição.
    • pais: tabela de países - código e descrição.
    • munic: tabela de municípios - código e descrição.
  • Pelo volume de dados, as tabelas empresa, estabelecimento, socios e simples possuem índices para a coluna cnpj_basico, que é a principal chave de ligação entre elas.

Modelo de Entidade Relacionamento:

alt text

Owner
Aphonso Henrique do Amaral Rafael
Economist, accountant and data & analytics enthusiastic. Data science and statistics permanently student.
Aphonso Henrique do Amaral Rafael
CryptoApp - Python code to pull wallet balances from a variety of different chains through nothing other than your public key.

CryptoApp - Python code to pull wallet balances from a variety of different chains through nothing other than your public key.

Zach Frank 4 Dec 13, 2022
Multi-purpose bot made with discord.py

PizzaHat Discord Bot A multi-purpose bot for your server! ℹ️ • Info PizzaHat is a multi-purpose bot, made to satisfy your needs, as well as your serve

DTS 28 Dec 16, 2022
Discord.py(disnake) selfbot

Zzee selfbot Discord.py selfbot Version: 1.0 ATTENTION! we are not responsible for your discord account! this program violates the ToS discord rules!

1 Jan 10, 2022
Music bot for Discord

Treble Music bot for Discord Youtube is after music bots on Discord. So we are here to fill the void. Introducing Treble, the next generation of Disco

Aja Khanal 0 Sep 16, 2022
Ethone-Selfbot - Open Source Discord Self-Bot, written in discord.py

Ethone SB Table of contents Newest open-source Discord SelfBot with useful commands and easy documentation on how to add your own and change the exist

Ethone 3 Jan 08, 2022
BleachBit system cleaner for Windows and Linux

BleachBit BleachBit cleans files to free disk space and to maintain privacy. Running from source To run BleachBit without installation, unpack the tar

1.9k Jan 06, 2023
Python library for the DeepL language translation API.

The DeepL API is a language translation API that allows other computer programs to send texts and documents to DeepL's servers and receive high-quality translations. This opens a whole universe of op

DeepL 535 Jan 04, 2023
Bitstamp API wrapper for Python

NOTICE: THIS REPOSITORY IS NO LONGER ACTIVELY MAINTAINED It is highly unlikely that I will respond to PRs and questions about usage. This library was

Jack Preston 53 Mar 09, 2022
Discord bot template.py

discord_bot_template.py A minimal and open-source discord.py boilerplate for kick-starting bot projects. I spend a lot of time developing bots for dif

Tarran Prior 1 Feb 24, 2022
New discord token grabber, password and general information

New discord token grabber, password and general information

Monstered 6 Nov 09, 2022
Discord Rich Presence implementation for Plex.

Perplex Perplex is a Discord Rich Presence implementation for Plex. Features Modern and beautiful Rich Presence for both movies and TV shows The Movie

Ethan 52 Dec 19, 2022
A Rich renderable for viewing Multiple Sequence Alignments in the terminal.

rich-msa A simple module to render colorful Multiple Sequence Alignment with rich in the terminal. 🔧 Installing Install the rich-msa package directly

Martin Larralde 64 Dec 04, 2022
Telegram Bot Repo Capable of fetching the following Info via Anilist API inspired from AniFluid and Nepgear

Telegram Bot Repo Capable of fetching the following Info via Anilist API inspired from AniFluid and Nepgear Anime Airing Manga Character Scheduled Top

Rikka-Chan 2 Apr 01, 2022
Fully Automated Omegle Chatbot

omegle-bot tutorial features fast runs in background can run multiple instances at once Requirement Run this command in cmd, terminal or PowerShell (i

6 Aug 07, 2021
A anti-repostbot script for reddit, runs u/ThisIsARepostBotBot

ThisIsARepostBotBot So you found a repost bot, now what? This is a bot to reply to all posts of a repost bot with a message urging users to report and

3 May 23, 2022
An example of using discordpy 2.0.0a to create a bot that supports slash commands

DpySlashBotExample An example of using discordpy 2.0.0a to create a bot that supports slash commands. This is not a fully complete bot, just an exampl

7 Oct 17, 2022
AWS Lambda - Parsing Cloudwatch Data and sending the response via email.

AWS Lambda - Parsing Cloudwatch Data and sending the response via email. Author: Evan Erickson Language: Python Backend: AWS / Serverless / AWS Lambda

Evan Scott Erickson 1 Nov 14, 2021
A program used to create accounts in bulk, still a work in progress as of now.

Discord Account Creator This project is still a work in progress. It will be published upon its full completion. About This project is still under dev

patched 8 Sep 15, 2022
A modular bot running on python3 with anime theme and have a lot features

STINKY ROBOT Emiko Robot is a modular bot running on python3 with anime theme and have a lot features. Easiest Way To Deploy On Heroku This Bot is Cre

Riyan.rz 3 Jan 21, 2022
A demo without 🚀 science, just simple UTXO spending logic.

Stuck TX Demo Docker container that runs 4 dogecoind to demonstrate "the stuck tx problem". Scenario A wallet sends out 3 transactions to a recipient

Patrick Lodder 2 Nov 16, 2021