Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
A simple, fantasy and fast note taking program.

notes A simple, fantasy and fast note taking program Installation This program supposed to run in linux and may have some bugs on windows or any other

Ali Hosseinverdi 1 Apr 06, 2022
A small Python library which gives you the IEEE-754 representation of a floating point number.

ieee754 ieee754 is small Python library which gives you the IEEE-754 representation of a floating point number. You can specify a precision given in t

Bora Canbula 5 Dec 20, 2022
디텍션 유틸 모음

Object detection utils 유틸모음 설명 링크 convert convert 관련코드 https://github.com/AI-infinyx/ob_utils/tree/main/convert crawl 구글, 네이버, 빙 등 크롤링 관련 https://gith

codetest 41 Jan 22, 2021
This repo contains scripts that add functionality to xbar.

xbar-custom-plugins This repo contains scripts that add functionality to xbar. Usage You have to add scripts to xbar plugin folder. If you don't find

osman uygar 1 Jan 10, 2022
BDD base project: Python + Behave

BDD base project: Python + Behave Basic example of using Python with Behave (BDD). This Gherkin example includes: Basic Scenario Scenario Outline Tagg

eccanto 1 Dec 08, 2021
Mmr image postbot - Бот для создания изображений с новыми релизами в сообщество ВК MMR Aggregator

Mmr image postbot - Бот для создания изображений с новыми релизами в сообщество ВК MMR Aggregator

Max 3 Jan 07, 2022
Checks for Vaccine Availability at your district and notifies you using E-mail, subscribe to our website.

Vaccine Availability Notifier Project Description Checks for Vaccine Availability at your district and notifies you using E-mail every 10 mins. Kindly

Farhan Hai Khan 19 Jun 03, 2021
Moleey Panel with python 3

Painel-Moleey pkg upgrade && pkg update pkg install python3 pip install pyfiglet pip install colored pip install requests pip install phonenumbers pkg

Moleey. 1 Oct 17, 2021
A Python version of Canvacord

A copy of canvacord made in python! Installation Run any of these commands in terminal: Mac / Linux pip install canvacord Windows python -m pip insta

10 Mar 28, 2022
A Python wrapper around Bacting

pybacting Python wrapper around bacting. Usage Based on the example from the bacting page, you can do: from pybacting import cdk print(cdk.fromSMILES

Charles Tapley Hoyt 5 Jan 03, 2022
Import Apex legends mprt files exported from Legion

Apex-mprt-importer-for-Blender Import Apex legends mprt files exported from Legion. REQUIRES CAST IMPORTER Usage: Use a VPK extracter to extract the m

15 Dec 18, 2022
Always fill your package requirements without the user having to do anything! Simple and easy!

WSL Should now work always-fill-reqs-python3 Always fill your package requirements without the user having to do anything! Simple and easy! Supported

Hashm 7 Jan 19, 2022
💘 Write any Python with 9 Characters: e,x,c,h,r,(,+,1,)

💘 PyFuck exchr(+1) PyFuck is a strange playful code. It uses only nine different characters to write Python3 code. Inspired by aemkei/jsfuck Example

Satoki 10 Dec 25, 2022
A free micro-blog written in Python and powered by Heroku. *Merge requests are appreciated!*

Background Hobo is an ultra-lightweight blog engine written in Python. It has two dependencies, fully integrated into the codebase with no additional

Andrew Nelder 48 Jan 28, 2021
Beacon Object File (BOF) to obtain a usable TGT for the current user.

Beacon Object File (BOF) to obtain a usable TGT for the current user.

Connor McGarr 109 Dec 25, 2022
Py4J enables Python programs to dynamically access arbitrary Java objects

Py4J Py4J enables Python programs running in a Python interpreter to dynamically access Java objects in a Java Virtual Machine. Methods are called as

Barthelemy Dagenais 1k Jan 02, 2023
This is an API to get user details for competitive coding platforms - Codeforces, Codechef, SPOJ, Interviewbit. More Platform will be Added Soon.

Competitive-Programming-Score-API An API to get user details for competitive coding platforms - Codeforces, Codechef, SPOJ, Interviewbit Platforms Ava

Aaditya Prakash 3 Jan 17, 2022
A simple single-color identicon generator

Identicons What are identicons? Setup: git clone https://github.com/vjdad4m/identicons.git cd identicons pip3 install -r requirements.txt chmod +x

Adam Vajda 1 Oct 31, 2021
SWS Filters App - SWS Filters App With Python

SWS Filters App Fun 😅 ... Fun 😅 Click On photo and see 😂 😂 😂 Your Video rec

Sagar Jangid 3 Jul 07, 2022
Buffer overflow example for python

Buffer overflow example for python

Mehmet 1 Jan 04, 2022