Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
JPMC Virtual Experience

This repository contains the submitted patch files along with raw files of the various tasks assigned by JPMorgan Chase & Co. through its Software Engineering Virtual Experience Program on Forage (fo

Vardhini K 1 Dec 05, 2021
Awesome & interesting talks about programming

Programming Talks I watch a lot of talks that I love to share with my friends, fellows and coworkers. As I consider all GitHubbers my friends (oh yeah

Veit Heller 7k Dec 26, 2022
BridgeWalk is a partially-observed reinforcement learning environment with dynamics of varying stochasticity.

BridgeWalk is a partially-observed reinforcement learning environment with dynamics of varying stochasticity. The player needs to walk along a bridge to reach a goal location. When the player walks o

Danijar Hafner 6 Jun 13, 2022
Nuclei - Burp Extension allows to run nuclei scanner directly from burp and transforms json results into the issues

Nuclei - Burp Extension Simple extension that allows to run nuclei scanner directly from burp and transforms json results into the issues. Installatio

106 Dec 22, 2022
Qt-creator-boost-debugging-helper - Qt Creator Debugging Helper for Boost Library

Go to Tools Options Debugger Locals & Expressions. Paste the script path t

Dmitry Bravikov 2 Apr 22, 2022
This repository contains Python games that I've worked on. You'll learn how to create python games with AI. I try to focus on creating board games without GUI in Jupyter-notebook.

92_Python_Games 🎮 Introduction 👋 This repository contains Python games that I've worked on. You'll learn how to create python games with AI. I try t

Milaan Parmar / Милан пармар / _米兰 帕尔马 166 Jan 01, 2023
MeerKAT radio telescope simulation package. Built to simulate multibeam antenna data.

MeerKATgen MeerKAT radio telescope simulation package. Designed with performance in mind and utilizes Just in time compile (JIT) and XLA backed vectro

Peter Ma 6 Jan 23, 2022
A Modern Fetch Tool for Linux!

Ufetch A Modern Fetch Tool for Linux! Programming Language: Python IDE: Visual Studio Code Developed by Avishek Dutta If you get any kind of problem,

Avishek Dutta 7 Dec 12, 2021
navigation_commander is a ROS package to command the robot to navigate autonomously to each table for food delivery inside a hotel.

navigation_commander navigation_commander is a ROS package to command the robot to navigate autonomously to each table for food delivery inside a hote

ALEENA LENTIN 9 Nov 08, 2021
Decoupled Smoothing in Probabilistic Soft Logic

Decoupled Smoothing in Probabilistic Soft Logic Experiments for "Decoupled Smoothing in Probabilistic Soft Logic". Probabilistic Soft Logic Probabilis

Kushal Shingote 1 Feb 08, 2022
Shell Trality API for local development.

Trality Simulator Intro This package is a work in progress. It allows local development of Trality bots in an IDE such as VS Code. The package provide

CrypTrality 1 Nov 17, 2021
Open-source data observability for modern data teams

Use cases Monitor your data warehouse in minutes: Data anomalies monitoring as dbt tests Data lineage made simple, reliable, and automated dbt operati

889 Jan 01, 2023
Automatically deletes Capital One Eno virtual cards for when you've made a couple too many.

eno-delete Automatically deletes Capital One Eno virtual cards for when you've made a couple too many. Warning: Program will delete ALL virtual cards

h3x 3 Sep 29, 2022
TrainingBike - Code, models and schematics I've used to interface my stationary training bike with PC.

TrainingBike Code, models and schematics I've used to interface my stationary training bike with PC. You can find more information about the project i

1 Jan 01, 2022
AdventOfCode 2021 solutions from the Devcord server

adventofcode-21 Ein Sammel-Repository für Advent of Code 2021-Lösungen der deutschen DevCord-Community. A repository collecting Advent of Code 2021 so

Devcord 12 Aug 26, 2022
Modelling and Implementation of Cable Driven Parallel Manipulator System with Tension Control

Cable Driven Parallel Robots (CDPR) is also known as Cable-Suspended Robots are the emerging and flexible end effector manipulation system. Cable-driven parallel robots (CDPRs) are categorized as a t

Siddharth U 0 Jul 19, 2022
This directory gathers the tools developed by the Data Sourcing Working Group

BigScience Data Sourcing Code This directory gathers the tools developed by the Data Sourcing Working Group First Sourcing Sprint: October 2021 The co

BigScience Workshop 27 Nov 04, 2022
Create standalone, installable R Shiny apps using Electron

Create standalone, installable R Shiny apps using Electron

Chase Clark 5 Dec 24, 2021
About A python based Apple Quicktime protocol,you can record audio and video from real iOS devices

介绍 本应用程序使用 python 实现,可以通过 USB 连接 iOS 设备进行屏幕共享 高帧率(30〜60fps) 高画质 低延迟(200ms) 非侵入性 支持多设备并行 Mac OSX 安装 python =3.7 brew install libusb pkg-config 如需使用 g

YueC 124 Nov 30, 2022
For radiometrically calibrating and PSF deconvolving IRIS data

irispreppy For radiometrically calibrating and PSF deconvolving IRIS data. I dislike how I need to own proprietary software (IDL) just to simply prepa

Aaron W. Peat 4 Nov 01, 2022