Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
Iss-tracker - ISS tracking script in python using NASA's API

ISS Tracker Tracking International Space Station using NASA's API and plotting i

Partho 9 Nov 29, 2022
SimplePyBLE - Python bindings for SimpleBLE

The ultimate fully-fledged cross-platform Python BLE library, designed for simplicity and ease of use.

Open Bluetooth Toolbox 27 Aug 28, 2022
This is a menu driven Railway Reservation Project which is mainly based on the python-mysql connectivity.

Online-Railway-Reservation-System This is a menu driven Railway Reservation Project which is mainly based on the python-mysql connectivity. The projec

Ananya Gupta 1 Jan 09, 2022
Online HackerRank problem solving challenges

LinkedListHackerRank Online HackerRank problem solving challenges This challenge is part of a tutorial track by MyCodeSchool You are given the pointer

Sefineh Tesfa 1 Nov 21, 2021
Oblique Strategies for Python

Oblique Strategies for Python

Łukasz Langa 3 Feb 17, 2022
PythonCalculator - A simple Calculator made in python using tkinter for GUI

PythonCalculator A simple Calculator made in python using tkinter for GUI. For P

ʀᴇxɪɴᴀᴢᴏʀ 1 Jan 01, 2022
The Python Achievements Framework!

Pychievements: The Python Achievements Framework! Pychievements is a framework for creating and tracking achievements within a Python application. It

Brian 114 Jul 21, 2022
aaencode for python,把python代码转换为颜文字

py-aaencode aaencode for python,把python代码转换为颜文字 compile.py: 将python编译成颜文字,编译结果有随机性,可以选择BPE词表压缩代码 compile_min.py: 最小化的编译器 compiled_min.txt: 编译得到的最小的com

11 Dec 30, 2021
A continuation Of Project Glow By @glowstik-yt

Project Glow Greetings, I see you have stumbled upon project glow. Project glow is an open source bot worked on by many people to create a good and sa

1 Nov 17, 2021
Declarative and extensible library for configuration & code separation

ClassyConf ClassyConf is the configuration architecture solution for perfectionists with deadlines. It provides a declarative way to define settings f

83 Dec 07, 2022
a simple functional programming language compiler written in python

Functional Programming Language A compiler for my small functional language. Written in python with SLY lexer/parser generator library. Requirements p

Ashkan Laei 3 Nov 05, 2021
RELATE is an Environment for Learning And TEaching

RELATE Relate is an Environment for Learning And TEaching RELATE is a web-based courseware package. It is set apart by the following features: Focus o

Andreas Klöckner 311 Dec 25, 2022
A tool for light-duty persistent memoization of API calls

JSON Memoize What is this? json_memoize is a straightforward tool for light-duty persistent memoization, created with API calls in mind. It stores the

1 Dec 11, 2021
Companion Web site for Fluent Python, Second Edition

Fluent Python, the site Source code and content for fluentpython.com. The site complements Fluent Python, Second Edition with extra content that did n

Fluent Python 49 Dec 08, 2022
A 3D Slicer Extension to view data from the flywheel heirarchy

flywheel-connect A 3D Slicer Extension to view, select, and download images from a Flywheel instance to 3D Slicer and storing Slicer outputs back to F

4 Nov 05, 2022
Statistics Calculator module for all types of Stats calculations.

Statistics-Calculator This Calculator user the formulas and methods to find the statistical values listed. Statistics Calculator module for all types

2 May 29, 2022
Python Freecell Solver

freecell Python Freecell Solver Very early version right now. You can pick a board by changing the file path in freecell.py If you want to play a game

Ben Kaufman 1 Nov 26, 2021
Simulation-Based Inference Benchmark

This repository contains a simulation-based inference benchmark framework, sbibm, which we describe in the associated manuscript "Benchmarking Simulation-based Inference".

SBI Benchmark 58 Oct 13, 2022
PyScaffold is a project generator for bootstrapping high quality Python packages

PyScaffold is a project generator for bootstrapping high quality Python packages, ready to be shared on PyPI and installable via pip. It is easy to use and encourages the adoption of the best tools a

PyScaffold 1.7k Jan 03, 2023
A conda-smithy repository for boost-histogram.

The official Boost.Histogram Python bindings. Provides fast, efficient histogramming with a variety of different storages combined with dozens of composable axes. Part of the Scikit-HEP family.

conda-forge 0 Dec 17, 2021