Dados Públicos de CNPJ disponibilizados pela Receita Federal do Brasil

Overview

Dados Públicos CNPJ

  • Fonte oficial da Receita Federal do Brasil, aqui.
  • Layout dos arquivos, aqui.

A Receita Federal do Brasil disponibiliza bases com os dados públicos do cadastro nacional de pessoas jurídicas (CNPJ).

De forma geral, nelas constam as mesmas informações que conseguimos ver no cartão do CNPJ, quando fazemos uma consulta individual, acrescidas de outros dados de Simples Nacional, sócios e etc. Análises muito ricas podem sair desses dados, desde econômicas, mercadológicas até investigações.

Nesse repositório consta um processo de ETL para i) baixar os arquivos; ii) descompactar; iii) ler, tratar e iv) inserir num banco de dados relacional PostgreSQL.


Infraestrutura necessária:

  • Python 3.8 - libraries:

    • wget
    • pandas
    • ftplib
    • datetime
    • gzip
    • urllib
    • bs4
    • re
    • os
    • zipfile
    • sqlalchemy
    • psycopg2
    • time
  • Banco de dados:


How to use:

  1. Com o Postgre instalado, inicie a instância do servidor (pode ser local) e crie o banco de dados conforme o arquivo banco_de_dados.sql.

  2. Conforme o seu ambiente, substitua as variáveis abaixo no arquivo ETL_coletar_dados_e_gravar_BD.py:

    • output_files: diretório de destino para o donwload dos arquivos
    • user: usuário do banco de dados criado pelo arquivo banco_de_dados.sql
    • passw: senha do usuário do BD
    • host: host da conexão com o BD
    • port: porta da conexão com o BD
    • database: nome da base de dados na instância (Dados_RFB - conforme arquivo banco_de_dados.sql)
  3. Executar o arquivo ETL_coletar_dados_e_gravar_BD.py e aguardar a finalização do processo.

    • Os arquivos são grandes: dependendo da infraestrutura isso deve levar muitas horas para conclusão.
    • Arquivos de 08/05/2021: 4,68 GB compactados e 17,1 GB descompactados.

Tabelas geradas:

  • Para maiores informações, consulte o layout.

    • empresa: dados cadastrais da empresa em nível de matriz
    • estabelecimento: dados analíticos da empresa por unidade / estabelecimento (telefones, endereço, filial, etc)
    • socios: dados cadastrais dos sócios das empresas
    • simples: dados de MEI e Simples Nacional
    • cnae: código e descrição dos CNAEs
    • quals: tabela de qualificação das pessoas físicas - sócios, responsável e representante legal.
    • natju: tabela de naturezas jurídicas - código e descrição.
    • moti: tabela de motivos da situação cadastral - código e descrição.
    • pais: tabela de países - código e descrição.
    • munic: tabela de municípios - código e descrição.
  • Pelo volume de dados, as tabelas empresa, estabelecimento, socios e simples possuem índices para a coluna cnpj_basico, que é a principal chave de ligação entre elas.

Modelo de Entidade Relacionamento:

alt text

Owner
Aphonso Henrique do Amaral Rafael
Economist, accountant and data & analytics enthusiastic. Data science and statistics permanently student.
Aphonso Henrique do Amaral Rafael
Skyscanner Python SDK

Skyscanner Python SDK Important As of May 1st, 2020, the project is deprecated and no longer maintained. The latest update in v1.1.5 includes changing

Skyscanner 118 Sep 23, 2022
This bot is created by AJTimePyro and It accepts direct downloading url & then return file as telegram file.

URL Uploader Bot This is the source code of URL Uploader Bot. And the developer of this bot is AJTimePyro, His Telegram Channel & Group. You can use t

Abhijeet 23 Nov 13, 2022
Twitter FakeNFT With Python

This project is a server that fetches your Twitter profile picture and applies the hexagonal transparency mask displayed on the profiles of users who have an NFT profile picture.

Mathis HAMMEL 29 Apr 23, 2022
A Telegram bot for combining emojis.

combimoji combimoji is a Telegram bot for combining emojis. How can I use it? You can find combimoji at @combimoji_bot, however it is not up (as of No

Yarema Mishchenko 2 Dec 02, 2021
A Telegram Userbot to play Audio and Video songs / files in Telegram Voice Chats.

VC UserBot A Telegram Userbot to play Audio and Video songs / files in Telegram Voice Chats. It's made with PyTgCalls and Pyrogram Requirements Python

조던 1 Nov 29, 2021
A mood based crypto tracking application.

Crypto Bud - API A mood based crypto tracking application. The main repository is private. I am creating the API before I connect everything to the ma

Krishnasis Mandal 1 Oct 23, 2021
Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis.

GoGettr GoGettr is an API client for GETTR, a "non-bias [sic] social network." (We will not reward their domain with a hyperlink.) GoGettr is built an

Stanford Internet Observatory 72 Dec 14, 2022
This is a script to export logs from AWS CloudTrail to a local file.

cloudtrail-export-logs This is a script to export logs from AWS CloudTrail to a local file. Getting Started Prerequisites python 3 boto3 pip Installin

Claick Assunção de Oliveira 2 Jan 02, 2022
ShadowClone allows you to distribute your long running tasks dynamically across thousands of serverless functions and gives you the results within seconds where it would have taken hours to complete

ShadowClone allows you to distribute your long running tasks dynamically across thousands of serverless functions and gives you the results within seconds where it would have taken hours to complete

240 Jan 06, 2023
Telegram bot to stream videos in telegram voicechat for both groups and channels. Supports live strams, YouTube videos and telegram media.

Telegram bot to stream videos in telegram voicechat for both groups and channels. Supports live strams, YouTube videos and telegram media.

SUBIN 449 Dec 27, 2022
Easy to use phishing tool with 63 website templates. Author is not responsible for any misuse.

PyPhisher [+] Created By KasRoudra [+] Description : Ultimate phishing tool in python. Includes popular websites like facebook, twitter, instagram, gi

KasRoudra 1.1k Jan 01, 2023
Ubuntu env build; Nginx build; DB build;

Deploy 介绍 Deploy related scripts bitnami Dependencies Ubuntu openssl envsubst docker v18.06.3 docker-compose init base env upload https://gitlab-runn

Colin(liuji) 10 Dec 01, 2021
This project checks the weather in the next 12 hours and sends an SMS to your phone number if it's going to rain to remind you to take your umbrella.

RainAlert-Request-Twilio This project checks the weather in the next 12 hours and sends an SMS to your phone number if it's going to rain to remind yo

9 Apr 15, 2022
Cogs for Red-DiscordBot

matcha-cogs Cogs for Red-DiscordBot. Installation [p]repo add matcha-cogs

MatchaTeaLeaf 2 Aug 27, 2022
Huan Xu 1.6k Jan 04, 2023
Telegram PHub Bot using ARQ Api and Pyrogram. This Bot can Download and Send PHub HQ videos in Telegram using ARQ API.

Tg_PHub_Bot Telegram PHub Bot using ARQ Api and Pyrogram. This Bot can Download and Send PHub HQ videos in Telegram using ARQ API. OS Support All linu

TheProgrammerCat 13 Oct 21, 2022
A Discord bot themed around the Swedish heavy metal band Sabaton! (Python)

A Discord bot themed around the Swedish heavy metal band Sabaton! (Python)

Evan Lundberg 1 Nov 29, 2021
Python SDK for accessing the Hanko Authentication API

Hanko Authentication SDK for Python This package is maintained by Hanko. Contents Introduction Documentation Installation Usage Prerequisites Create a

Hanko.io 3 Mar 08, 2022
ELiza music is a telegram music bot project, allow you to play music on voice chat group telegram.

❤️ 𝗘𝗹𝗶𝘇𝗮 𝗠𝘂𝘀𝗶𝗰 ❤️ Unmaintained. The new repo of @MrsElizaRobot is private. (It is no longer based on this source code. The completely rewrit

Team Eliza 2 Dec 08, 2022
Utility for converting IP Fabric webhooks into a Teams format

IP Fabric Webhook Integration for Microsoft Teams and/or Slack Setup IP Fabric Setup Go to Settings Webhooks Add webhook Provide a name URL will b

Community Fabric 1 Jan 26, 2022