Dados Públicos de CNPJ disponibilizados pela Receita Federal do Brasil

Overview

Dados Públicos CNPJ

  • Fonte oficial da Receita Federal do Brasil, aqui.
  • Layout dos arquivos, aqui.

A Receita Federal do Brasil disponibiliza bases com os dados públicos do cadastro nacional de pessoas jurídicas (CNPJ).

De forma geral, nelas constam as mesmas informações que conseguimos ver no cartão do CNPJ, quando fazemos uma consulta individual, acrescidas de outros dados de Simples Nacional, sócios e etc. Análises muito ricas podem sair desses dados, desde econômicas, mercadológicas até investigações.

Nesse repositório consta um processo de ETL para i) baixar os arquivos; ii) descompactar; iii) ler, tratar e iv) inserir num banco de dados relacional PostgreSQL.


Infraestrutura necessária:

  • Python 3.8 - libraries:

    • wget
    • pandas
    • ftplib
    • datetime
    • gzip
    • urllib
    • bs4
    • re
    • os
    • zipfile
    • sqlalchemy
    • psycopg2
    • time
  • Banco de dados:


How to use:

  1. Com o Postgre instalado, inicie a instância do servidor (pode ser local) e crie o banco de dados conforme o arquivo banco_de_dados.sql.

  2. Conforme o seu ambiente, substitua as variáveis abaixo no arquivo ETL_coletar_dados_e_gravar_BD.py:

    • output_files: diretório de destino para o donwload dos arquivos
    • user: usuário do banco de dados criado pelo arquivo banco_de_dados.sql
    • passw: senha do usuário do BD
    • host: host da conexão com o BD
    • port: porta da conexão com o BD
    • database: nome da base de dados na instância (Dados_RFB - conforme arquivo banco_de_dados.sql)
  3. Executar o arquivo ETL_coletar_dados_e_gravar_BD.py e aguardar a finalização do processo.

    • Os arquivos são grandes: dependendo da infraestrutura isso deve levar muitas horas para conclusão.
    • Arquivos de 08/05/2021: 4,68 GB compactados e 17,1 GB descompactados.

Tabelas geradas:

  • Para maiores informações, consulte o layout.

    • empresa: dados cadastrais da empresa em nível de matriz
    • estabelecimento: dados analíticos da empresa por unidade / estabelecimento (telefones, endereço, filial, etc)
    • socios: dados cadastrais dos sócios das empresas
    • simples: dados de MEI e Simples Nacional
    • cnae: código e descrição dos CNAEs
    • quals: tabela de qualificação das pessoas físicas - sócios, responsável e representante legal.
    • natju: tabela de naturezas jurídicas - código e descrição.
    • moti: tabela de motivos da situação cadastral - código e descrição.
    • pais: tabela de países - código e descrição.
    • munic: tabela de municípios - código e descrição.
  • Pelo volume de dados, as tabelas empresa, estabelecimento, socios e simples possuem índices para a coluna cnpj_basico, que é a principal chave de ligação entre elas.

Modelo de Entidade Relacionamento:

alt text

Owner
Aphonso Henrique do Amaral Rafael
Economist, accountant and data & analytics enthusiastic. Data science and statistics permanently student.
Aphonso Henrique do Amaral Rafael
𝐀 𝐦𝐨𝐝𝐮𝐥𝐚𝐫 𝐓𝐞𝐥𝐞𝐠𝐫𝐚𝐦 𝐆𝐫𝐨𝐮𝐩 𝐦𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐛𝐨𝐭 𝐰𝐢𝐭𝐡 𝐮𝐥𝐭𝐢𝐦𝐚𝐭𝐞 𝐟𝐞𝐚𝐭𝐮𝐫𝐞𝐬 !!

𝐇𝐨𝐰 𝐓𝐨 𝐃𝐞𝐩𝐥𝐨𝐲 For easiest way to deploy this Bot click on the below button 𝐌𝐚𝐝𝐞 𝐁𝐲 𝐒𝐮𝐩𝐩𝐨𝐫𝐭 𝐆𝐫𝐨𝐮𝐩 𝐒𝐨𝐮𝐫𝐜𝐞𝐬 𝐆𝐞𝐧𝐞?

Mukesh Solanki 4 Oct 18, 2021
汪汪Bot是一个Telegram Bot,用于帮助你管理一台服务器上的Docker

WangWangBot 汪汪Bot是一个Telegram Bot,用于帮助你管理一台服务器上的Docker运行的Bot。这是使用视频: 部署说明 安装运行 准备 local.env 普通版本 BOT_TOKEN=你的BOT_TOKEN ADMINS=使用,分隔的管理员ID列表 如果你使用doppl

老房东的代码练习册 56 Aug 23, 2022
An advanced Filter Bot with nearly unlimitted filters!

Unlimited Filter Bot ㅤㅤㅤㅤㅤㅤㅤ ㅤㅤㅤㅤㅤㅤㅤ An advanced Filter Bot with nearly unlimitted filters! Features Nearly unlimited filters Supports all type of fil

1 Nov 20, 2021
A Python app to serve Conveyor room requests and run approvals through Slack

✨ CONVEYOR FOR SLACK ✨ This is a friendly little Python app that will allow you to integrate your instance of Conveyor with your Slack workspace. In o

Vivienne 4 Sep 27, 2021
Student-Management-System-in-Python - Student Management System in Python

Student-Management-System-in-Python Student Management System in Python

G.Niruthian 3 Jan 01, 2022
Tools to download and aggregate feeds of vaccination clinic location information in the United States.

vaccine-feed-ingest Pipeline for ingesting nationwide feeds of vaccine facilities. Contributing How to Configure your environment (instructions on the

Call the Shots 26 Aug 05, 2022
Python wrapper for CoWin API's

Cowin Tracker Python API wrapper for CoWin, India's digital platform launched by the government to help citizens register themselves for the vaccinati

Saiprasad Balasubramanian 43 Jun 11, 2022
A Twitch bot to provide information from the WebNowPlayingCompanion extension

WebNowPlayingTwitch A Twitch bot made using TwitchIO which displays information obtained from the WebNowPlaying Compaion browser extension. Image is o

NiceAesth 1 Mar 21, 2022
This Wrapper is a Discum Copy With Addons, original one is made by Merubokkusu

Remaded Discum Its not Official Discum Wrapper ! This Wrapper is a Discum Copy With Addons, original one is made by Merubokkusu Authors @merubokkusu (

discum-remaded 8 Aug 09, 2022
Azure Neural Speech Service TTS

Written in Python using the Azure Speech SDK. App.py provides an easy way to create an Text-To-Speech request to Azure Speech and download the wav file. Azure Neural Voices Text-To-Speech enables flu

Rodney 4 Dec 14, 2022
Example notebooks for working with SageMaker Studio Lab. Sign up for an account at the link below!

SageMaker Studio Lab Sample Notebooks Available today in public preview. If you are looking for a no-cost compute environment to run Jupyter notebooks

Amazon Web Services 304 Jan 01, 2023
A simple message content sniping Discord bot which you can run yourself! Sniping API pulled from isobot and Arch bot

Discord Snipe Bot This is a bot made with the same message content sniping API from isobot and Arch bot. It's default prefix is -, however you can als

notsniped 5 Aug 11, 2022
A discord nuking tool made by python, this also has nuke accounts, inbuilt Selfbot, Massreport, Token Grabber, Nitro Sniper and ALOT more!

Disclaimer: Rage Multi Tool was made for Educational Purposes This project was created only for good purposes and personal use. By using Rage, you agr

†† 50 Jul 19, 2022
A Advanced Auto Filter Bot Which Can Be Used In Many Groups With Multiple Channel Support....

Adv Auto Filter Bot This Just A Simple Hand Auto Filter Bot For Searching Files From Channel... Just Sent Any Text I Will Search In All Connected Chat

Albert Einstein 33 Oct 21, 2022
Kyura-Userbot: a modular Telegram userbot that runs in Python3 with a sqlalchemy database

Kyura-Userbot Telegram Kyura-Userbot adalah userbot Telegram modular yang berjal

Kyura 17 Oct 29, 2022
A simple and easy to use musicbot in python and it uses lavalink.

Lavalink-MusicBot A simple and easy to use musicbot in python and it uses lavalink. ✨ Features plays music in your discord server well thats it i gues

Afnan 1 Nov 29, 2021
this is a telegram bot repository, that can stream video on telegram group video chat.

VIDEO STREAM BOT telegram bot project for streaming video on telegram video chat, powered by tgcalls and pyrogram 🛠 Commands: /vstream (reply to vide

levina 319 Aug 15, 2022
Enumerate Microsoft 365 Groups in a tenant with their metadata

Enumerate Microsoft 365 Groups in a tenant with their metadata Description The all_groups.py script allows to enumerate all Microsoft 365 Groups in a

Clément Notin 46 Dec 26, 2022
A Telegram Bot To Stream Videos in Telegram Voice Chat.

Video Stream X Bot Telegram bot project for streaming video on telegram video chat, powered by tgcalls and pyrogram Deploy to Heroku 👨‍🔧 The easy wa

Mⷨoͦns͛ᴛⷮeͤrͬ Zeͤrͬoͦ 13 Dec 05, 2022