Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
A simple program to recolour simple png icon-like pictures with just one colour + transparent or white background. Resulting images all have transparent background and a new colour.

A simple program to recolour simple png icon-like pictures with just one colour + transparent or white background. Resulting images all have transparent background and a new colour.

Anna Tůmová 0 Jan 30, 2022
DeDRM tools for ebooks

DeDRM_tools DeDRM tools for ebooks This is a fork of Apprentice Harper's version of the DeDRM tools. I've added some of the PRs that still haven't bee

2 Jan 10, 2022
Simple Wayland HotKey Daemon

swhkd Simple Wayland HotKey Daemon This project is still very new and I'm making new decisions everyday as to where I should drive this project. I'm u

Aakash Sen Sharma 407 Dec 30, 2022
Your one and only Discord Bot that helps you concentrate!

Your one and only Discord Bot thats helps you concentrate! Consider leaving a ⭐ if you found the project helpful. concy-bot A bot which constructively

IEEE VIT Student Chapter 22 Sep 27, 2022
Battle-Ship - Python-console battle ship

Battle-Ship this SHOULD work in lenux(if i spelled it wrong spam issues till I fix it) the thing that maby wont work is where it clears the screen the

pl608 2 Jan 06, 2022
Collection of tools to be more productive in your work environment and to avoid certain repetitive tasks. 💛💙💚

Collection of tools to be more productive in your work environment and to avoid certain repetitive tasks. 💛💙💚

Raja Rakotonirina 2 Jan 10, 2022
Python implementation of an automatic parallel parking system in a virtual environment, including path planning, path tracking, and parallel parking

Automatic Parallel Parking: Path Planning, Path Tracking & Control This repository contains a python implementation of an automatic parallel parking s

134 Jan 09, 2023
This is a modified variation of abhiTronix's vidgear. In this variation, it is possible to write the output file anywhere regardless the permissions.

Info In order to download this package: Windows 10: Press Windows+S, Type PowerShell (cmd in older versions) and hit enter, Type pip install vidgear_n

Ege Akman 3 Jan 30, 2022
Removes all archived super productivity tasks. Just run the python script.

delete-archived-sp-tasks.py Removes all archived super productivity tasks. Just run the python script. This is helpful to do a cleanup every 3-6 month

Ben Herbst 1 Jan 09, 2022
An interactive tool with which to explore the possible imaging performance of candidate ngEHT architectures.

ngEHTexplorer An interactive tool with which to explore the possible imaging performance of candidate ngEHT architectures. Welcome! ngEHTexplorer is a

Avery Broderick 7 Jan 28, 2022
本仓库整理了腾讯视频、爱奇艺、优酷、哔哩哔哩等视频网站中,能够观看的「豆瓣电影 Top250 榜单」影片。

Where is top 250 movie ? 本仓库整理了腾讯视频、爱奇艺、优酷、哔哩哔哩等视频网站中,能够观看的「豆瓣电影 Top250 榜单」影片,点击 Badge 可跳转至相应的电影首页。

MayanDev 123 Dec 22, 2022
This is a spamming selfbot that has custom spammed message and @everyone spam.

This is a spamming selfbot that has custom spammed message and @everyone spam.

astro1212 1 Jul 31, 2022
The git for the Python Story Utility Package library.

PSUP, The Python Story Utility Package Module. PSUP helps making stories or games with options, diverging paths, different endings and so on. You can

Enoki 6 Nov 27, 2022
A tool for removing PUPs using signatures

Unwanted program removal tool A tool for removing PUPs using signatures What is the unwanted program removal tool? The unwanted program removal tool i

4 Sep 20, 2022
Automatically remove user join messages when the user leaves the server.

CleanLeave Automatically remove user join messages when the user leaves the server. Installation You will need to install poetry to run this bot local

11 Sep 19, 2022
Goal: Enable awesome tooling for Bazel users of the C language family.

Hedron's Compile Commands Extractor for Bazel — User Interface What is this project trying to do for me? First, provide Bazel users cross-platform aut

Hedron Vision 290 Dec 26, 2022
System Information Utility With Python

System-Information-Utility This is a simple utility, for the terminal, which allows you to find out information about your PC. It's very easy to run t

2 Apr 15, 2022
Lookup for interesting stuff in SMB shares

SMBSR - what is that? Well, SMBSR is a python script which given a CIDR/IP/IP_file/HOSTNAME(s) enumerates all the SMB services listening (445) among t

Vincenzo 112 Dec 15, 2022
This is the community maintained fork of ungleich's cdist (after f061fb1).

cdist This is the community maintained fork of ungleich's cdist (after f061fb1). Work is split between three repositories: cdist - implementation of t

cdist community edition 0 Aug 02, 2022
A python package for batch import of resume attachments to be parsed in HrFlow.

HrFlow Importer Description A python package for batch import of resume attachments to be parsed in HrFlow. hrflow-importer is an open-source project

HrFlow.ai (ex: Riminder.net) 3 Nov 15, 2022