Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
GCP Scripts and API Client Toolss

GCP Scripts and API Client Toolss Script Authentication The scripts and CLI assume GCP Application Default Credentials are set. Credentials can be set

3 Feb 21, 2022
This is a menu driven Railway Reservation Project which is mainly based on the python-mysql connectivity.

Online-Railway-Reservation-System This is a menu driven Railway Reservation Project which is mainly based on the python-mysql connectivity. The projec

Ananya Gupta 1 Jan 09, 2022
Fly DCS without a joystick

Intro Usage Delete all mouse view axis Install DCSEasyControlExports to your "Saved Games/DCS/" Path python DCSEasyControl/main.py Set DCS to F12 view

XuHao 36 Dec 27, 2022
Tool that adds githuh profile views to ur acc

Tool that adds githuh profile views to ur acc

Lamp 2 Nov 28, 2021
UFDR2DIR - A script to convert a Cellebrite UFDR to the original file structure

UFDR2DIR A script to convert a Cellebrite UFDR to it's original file and directo

DFIRScience 25 Oct 24, 2022
Monitor the New World login queue and notify when it is about to finish

nwwatch - Monitor the New World queue and notify when it is about to finish Getting Started install python 3.7+ navigate to the directory where you un

14 Jan 10, 2022
This repo contains scripts that add functionality to xbar.

xbar-custom-plugins This repo contains scripts that add functionality to xbar. Usage You have to add scripts to xbar plugin folder. If you don't find

osman uygar 1 Jan 10, 2022
Jarvis Python BOT acts like Google-assistance

Jarvis-Python-BOT Jarvis Python BOT acts like Google-assistance Setup Add Mail ID (Gmail) in the file at line no 82.

Ishan Jogalekar 1 Jan 08, 2022
A refresher for PowerBI Desktop documents

PowerBI_Refresher-NPP Informació Per executar el programa s'ha de tenir instalat el python versio 3 o mes. Requeriments a requirements.txt. El fitxer

Nil Pujol 1 May 02, 2022
Hexa is an advanced browser.It can carry out all the functions present in a browser.

Hexa is an advanced browser.It can carry out all the functions present in a browser.It is coded in the language Python using the modules PyQt5 and sys mainly.It is gonna get developed more in the fut

1 Dec 10, 2021
[draft] tools for schnetpack

schnetkit some tooling for schnetpack EXPERIMENTAL/IN DEVELOPMENT DO NOT USE This is an early draft of some infrastructure built around schnetpack. In

Marcel 1 Nov 08, 2021
Meaningful and minimalist release notes for developers

Managing manual release notes is hard. Therefore, everyone tends to generate release notes from commit messages. But, you won't get a meaningful release note at the end.

codezri 31 Dec 30, 2022
C++ Environment InitiatorVisual Studio Code C / C++ Environment Initiator

Visual Studio Code C / C++ Environment Initiator Latest Version : v 1.0.1(2021/11/08) .exe link here About : Visual Studio Code에서 C/C++환경을 MinGW GCC/G

Junho Yoon 2 Dec 19, 2021
The Python Fuzzer that the world deserves 🐍

pip3 install frelatage Current release : 0.0.2 The Python Fuzzer that the world deserves Installation | How it works | Features | Use Frelatage | Conf

Rog3r 219 Dec 21, 2022
Convert-Decimal-to-Binary-Octal-and-Hexadecimal

Convert-Decimal-to-Binary-Octal-and-Hexadecimal We have a number in a decimal number, and we have to convert it into a binary, octal, and hexadecimal

Maanyu M 2 Oct 08, 2021
Python Control Systems Library

The Python Control Systems Library is a Python module that implements basic operations for analysis and design of feedback control systems.

Control Systems Library for Python 1.3k Jan 06, 2023
Dockernized ZeroTierOne controller with zero-ui web interface.

docker-zerotier-controller Dockernized ZeroTierOne controller with zero-ui web interface. 中文讨论 Customize ZeroTierOne's controller planets Modify patch

sbilly 209 Jan 04, 2023
High-level bindings to the Valhalla framework.

Valhalla for Python This spin-off project simply offers improved Python bindings to the fantastic Valhalla project. Installation pip install valhalla

GIS • OPS 20 Dec 13, 2022
use Notepad++ for real-time sync after python appending new log text

FTP远程log同步工具 使用Notepad++配合来获取实时更新的log文档效果 适用于FTP协议的log远程同步工具,配合MT管理器开启FTP服务器使用,通过Notepad++监听文本变化,更便捷的使用电脑查看方法注入打印后的信息 功能 过滤器 对每行要打印的文本使用回调函数筛选,支持链式调用

Liuhaixv 1 Oct 17, 2021
Improving the Transferability of Adversarial Examples with Resized-Diverse-Inputs, Diversity-Ensemble and Region Fitting

Improving the Transferability of Adversarial Examples with Resized-Diverse-Inputs, Diversity-Ensemble and Region Fitting

Junhua Zou 7 Oct 20, 2022