Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
LinuxHelper - A collection of utilities for non-technical Linux users accessible via a GUI

Linux Helper A collection of utilities for non-technical Linux users accessible via a GUI This app is still in very early development, expect bugs and

Seth 7 Oct 03, 2022
A tool to nowcast quarterly data with monthly indicators: US consumption example

MIDAS_Nowcaster A tool to nowcast quarterly data with monthly indicators: US consumption example Pulls data directly from FRED from a list of codes -

Gene Kindberg-Hanlon 3 Oct 06, 2022
tgEasy | Easy for a Brighter Shine | Monkey Patcher Addon for Pyrogram

tgEasy | Easy for a Brighter Shine | Monkey Patcher Addon for Pyrogram

Jayant Hegde Kageri 35 Nov 12, 2022
OWASP Foundation Web Respository

WWWGrep OWASP Foundation Web Respository Author: Mark Deen & Aditi Mohan Introduction WWWGrep is a rapid search “grepping” mechanism that examines HTM

OWASP 34 Jun 15, 2022
使用clash核心,对服务器进行Netflix解锁批量测试。

注意事项 测速及解锁测试仅供参考,不代表实际使用情况,由于网络情况变化、Netflix封锁及ip更换,测速具有时效性 本项目使用 Python 编写,使用前请完成环境安装 首次运行前请安装pip及相关依赖,也可使用 pip install -r requirements.txt 命令自行安装 Net

11 Dec 07, 2022
A package with multiple bias correction methods for climatic variables, including the QM, DQM, QDM, UQM, and SDM methods

A package with multiple bias correction methods for climatic variables, including the QM, DQM, QDM, UQM, and SDM methods

Sebastián A. Aedo Quililongo 9 Nov 18, 2022
STAC in Jupyter Notebooks

stac-nb STAC in Jupyter Notebooks Install pip install stac-nb Usage To use stac-nb in a project, start Jupyter Lab (jupyter lab), create a new noteboo

Darren Wiens 32 Oct 04, 2022
A parallel branch-and-bound engine for Python.

pybnb A parallel branch-and-bound engine for Python. This software is copyright (c) by Gabriel A. Hackebeil (gabe.hacke

Gabriel Hackebeil 52 Nov 12, 2022
Convex Optimisation MVA course - Assignment

Convex Optimisation MVA course - Assignment This repository contains the coding files of the third assignment in the MVA Convex Optimisation course. U

1 Nov 27, 2021
A few of my adventures with Devito.

Devito-playbox A few of my adventures with Devito. This repository contains a few notebooks and scripts that will lead me in the road of learning this

Átila Saraiva Quintela Soares 1 Feb 08, 2022
A small program to vote for Councilors at 42 Heilbronn.

This Docker container is build to run on server an provide an easy to use interface for every student to vote for their councillors. To run docker on

Kevin Hirsig 2 Jan 17, 2022
Repo to store back end infrastructure for Message in a Bottle

Message in a Bottle Backend API RESTful API for Message in a Bottle frontend application consumption. About The Project • Tools Used • Local Set Up •

4 Dec 05, 2021
Minimalistic Gridworld Environment (MiniGrid)

Minimalistic Gridworld Environment (MiniGrid) There are other gridworld Gym environments out there, but this one is designed to be particularly simple

Maxime Chevalier-Boisvert 1.7k Jan 03, 2023
A basic notes app to store your notes.

Notes Webapp A basic notes webapp to keep your notes.You can add, edit and delete notes after signing up. To add a note type your note in the text box

2 Oct 23, 2021
Addon to give a keybind to automatically enable contact shadows on all lights in a scene

3-2-1 Contact(Shadow) An easy way to let you enable contact shadows on all your lights, because Blender doesn't enable it by default, and doesn't give

TDV Alinsa 3 Feb 02, 2022
Python programming language Test

Exercise You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirem

Monirul Islam Khan 1 Dec 13, 2021
This project recreates the R-based RCy3 Cytoscape Automation library as a Python package.

Python library for calling Cytoscape Automation via CyREST

Cytoscape Consortium 40 Dec 22, 2022
Hitchhikers-guide - The Hitchhiker's Guide to Data Science for Social Good

Welcome to the Hitchhiker's Guide to Data Science for Social Good. What is the Data Science for Social Good Fellowship? The Data Science for Social Go

Data Science for Social Good 907 Jan 01, 2023
Web3 Solidity Connector

With this project, you can compile your sol files and create new transactions including creating contract and calling the state changer functions. You can integrate integrate your sol files with Pyth

Fethi Tekyaygil 3 Oct 09, 2022
Script to change official Kali repository to mirrors

Script to change official Kali repository to mirrors. This helps increase packages update and downloading for some user.

Vineet Bhavsar 2 Nov 29, 2021