Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
[Cython] Vs [Python] Which one is Faster ?

[Cython] Vs [Python] ? Attractive Contrast :) Mission : Which one is Faster ? Comparing of Execution runtime for [Selection_sort] with Time Complexity

baqer marani 1 Dec 05, 2021
Reproduction repository for the MDX 2021 Hybrid Demucs model

Submission This is the submission for MDX 2021 Track A, for Track B go to the track_b branch. Submission Summary Submission ID: 151378 Submitter: defo

Alexandre Défossez 62 Dec 18, 2022
Validate UC alumni identifier numbers with Python 3.

UC number validator Validate UC alumni identifier numbers with Python 3. Getting started Install the library with: pip install -U ucnumber Usage from

Open Source eUC 1 Jul 07, 2021
A community-driven python bot that aims to be as simple as possible to serve humans with their everyday tasks

JARVIS on Messenger Just A Rather Very Intelligent System, now on Messenger! Messenger is now used by 1.2 billion people every month. With the launch

Swapnil Agarwal 1.3k Jan 07, 2023
This Open-Source project is great for sensor capture and storage solutions.

Phase 1 This project helps developers in the creation of extended realities that communicate with Arduino and require the security of blockchain stora

Wolfberry, LLC 10 Dec 28, 2022
Cairo-math-64x61 - Fixed point 64.61 math library for Cairo / Starknet

Cairo Math 64x61 A fixed point 64.61 math library for Cairo & Starknet Signed 64

Influence 63 Dec 05, 2022
(Pre-)compromise operations for MITRE CALDERA

(Pre-)compromise operations for CALDERA Extend your CALDERA operations over the entire adversary killchain. In contrast to MITRE's access plugin, cald

Diederik Bakker 3 Aug 22, 2022
This is a small Panel applet for the Budgie Desktop to display the battery charge of a connected Bluetooth device.

BudgieBluetoothBattery This is a small Panel applet for the Budgie Desktop to display the battery charge of a connected Bluetooth device. It uses the

Konstantin Köhring 7 Dec 05, 2022
Workshop OOP - Workshop OOP - Discover object-oriented programming

About: This is an open-source bot, the code is open for anyone to see, fork and

Francis Clairicia-Rose-Claire-Joséphine 5 May 02, 2022
A collection of resources on neural rendering.

awesome neural rendering A collection of resources on neural rendering. Contributing If you think I have missed out on something (or) have any suggest

1.8k Dec 30, 2022
Python Multilingual Ucrel Semantic Analysis System

PymUSAS Python Multilingual Ucrel Semantic Analysis System, it currently is a rule based token level semantic tagger which can be added to any spaCy p

UCREL 13 Nov 18, 2022
Checking-For-Fibonacci-Syquence-In-Python - Checking For Fibonacci Syquence In Python

Checking-For-Fibonacci-Syquence-In-Python The Fibonacci sequence is a set of num

John Michael Oliba 1 Feb 14, 2022
Cross-platform .NET Core pre-commit hooks

dotnet-core-pre-commit Cross-platform .NET Core pre-commit hooks How to use Add this to your .pre-commit-config.yaml - repo: https://github.com/juan

Juan Odicio 5 Jul 20, 2021
a really simple bot that send you memes from reddit to whatsapp

a really simple bot that send you memes from reddit to whatsapp want to use use it? install the dependencies with pip3 install -r requirements.txt the

pai 10 Nov 28, 2021
A Desktop application for the signalum python library

Signalum Desktop A Desktop application on the Signalum Python Library/CLI Tool. The Signalum Desktop application is an attempt to develop a single too

BISOHNS 35 Feb 15, 2021
Rock 💎 Paper 📝 Scissors ✂️ Lizard 🦎 Spock 🖖

Rock 💎 Paper 📝 Scissors ✂️ Lizard 🦎 Spock 🖖 If you’ve seen The Big Bang Theory, you’ve heard of a game called “Rock, Paper, Scissors, Lizard, Spoc

AmirHossein Mohammadi 16 Jun 19, 2022
The mock Pokemon Environment I built in 2019 to study Reinforcement Learning + Pokemon

ghetto-pokemon-rl-environment ##NOT MAINTAINED! Fork and maintain yourself. Environment I made back in 2019 to use Pokemon to practice reinforcement l

2 Dec 09, 2021
InverterApi - This project has been designed to take monitoring data from Voltronic, Axpert, Mppsolar PIP, Voltacon, Effekta

InverterApi - This project has been designed to take monitoring data from Voltronic, Axpert, Mppsolar PIP, Voltacon, Effekta

Josep Escobar 2 Sep 03, 2022
A python based app to improve your presentation workflow

Presentation Remote A remote made for making presentations easier by enabling all the members to have access to change the slide and control the flow

Parnav 1 Oct 28, 2021
Originally used during Marketplace.tf's open period, this program was used to get the profit of items bought with keys and sold for dollars.

Originally used during Marketplace.tf's open period, this program was used to get the profit of items bought with keys and sold for dollars. Practically useless for me now, but can be used as an exam

BoggoTV 1 Dec 11, 2021