Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
Fuzz introspector for python

Fuzz introspector High-level goals: Show fuzzing-relevant data about each function in a given project Show reachability of fuzzer(s) Integrate seamles

14 Mar 25, 2022
Project Guide for ASAM OpenX standards

ASAM Project Guide Important This guide is a work in progress and subject to change! Hosted version available at: ASAM Project Guide (Link) Includes:

ASAM e.V. 2 Mar 17, 2022
Library for Memory Trace Statistics in Python

Memory Search Library for Memory Trace Statistics in Python The library uses tracemalloc as a core module, which is why it is only available for Pytho

Memory Search 1 Dec 20, 2021
A very terrible python-based programming language that uses folders instead of text files

PYFolders by Lewis L. Foster PYFolders is a very terrible python-based programming language that uses folders instead of regular text files. In this r

Lewis L. Foster 5 Jan 08, 2022
A Python package to request and process seismic waveform data from Hi-net.

HinetPy is a Python package to simplify tedious data request, download and format conversion tasks related to NIED Hi-net. NIED Hi-net | Source Code |

Dongdong Tian 65 Dec 09, 2022
A program that makes all 47 textures of Optifine CTM only using 2 textures

A program that makes all 47 textures of Optifine CTM only using 2 textures

1 Jan 22, 2022
An OBS script to fuze files together

OBS TEXT FUZE Fuze text files and inject the output into a text source. The Index file directory should be a list of file directorys for the text file

SuperZooper3 1 Dec 27, 2021
Irrigation Component V4 providing support for a custom card

Irrigation Component V4 This release sees the delivery of a custom card https://github.com/petergridge/irrigation_card to render the program options s

12 Oct 28, 2022
Pulse sequence builder and compiler for q1asm

q1pulse Pulse sequence builder and compiler for q1asm. q1pulse is a simple library to compile pulse sequence to q1asm, the assembly language of Qblox

Sander de Snoo 3 Dec 14, 2022
Custom component to calculate estimated power consumption of lights and other appliances

Custom component to calculate estimated power consumption of lights and other appliances. Provides easy configuration to get virtual power consumption sensors in Home Assistant for all your devices w

Bram Gerritsen 552 Dec 28, 2022
Tutorial on Tempo, Beat and Downbeat estimation

Tempo, Beat and Downbeat Estimation By Matthew E. P. Davies, Sebastian Böck and Magdalena Fuentes Resources and Jupyter Book for the ISMIR 2021 tutori

49 Nov 06, 2022
Automatically unpin old messages so you can always pin more!

PinRotate Automatically unpin old messages so you can always pin more! Installation You will need to install poetry to run this bot locally for develo

3 Sep 18, 2022
Birthday program - A program that lookups a birthday txt file and compares to the current date to check for birthdays

Birthday Program This is a program that lookups a birthday txt file and compares

Daquiver 4 Feb 02, 2022
Uma versão em Python/Ursina do aplicativo Real Drum (android).

Real Drum Descrição Esta é uma versão alternativa feita em Python com a engine Ursina do aplicatio Real Drum (presente no Google Play Store). Como exe

hayukimori 5 Aug 20, 2022
SimilarWeb for Team ACT v.0.0.1

SimilarWeb for Team ACT v.0.0.1 This module has been built to provide a better environment specifically for Similarweb in Team ACT. This module itself

Sunkyeong Lee 0 Dec 29, 2021
⚙️ Compile, Read and update your .conf file in python

⚙️ Compile, Read and update your .conf file in python

Reece Harris 2 Aug 15, 2022
Repository for DNN training, theory to practice, part of the Large Scale Machine Learning class at Mines Paritech

DNN Training, from theory to practice This repository is complementary to the deep learning training lesson given to les Mines ParisTech on the 11th o

Alexandre Défossez 6 Nov 14, 2022
Python framework to build apps with the GASP metaphor

Gaspium Python framework to build apps with the GASP metaphor This project is part of the Pyrustic Open Ecosystem. Installation | Documentation | Late

5 Jan 01, 2023
A guy with a lot of useful things to do when doing AtCoder in Python

atcoder_python_env Python で AtCoder をやるときに便利な諸々を用意したやつ コンテスト用フォルダの作成 セットアップ 自動テス

2 Dec 28, 2021
Model synchronization from dbt to Metabase.

dbt-metabase Model synchronization from dbt to Metabase. If dbt is your source of truth for database schemas and you use Metabase as your analytics to

Mike Gouline 270 Jan 08, 2023