Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: CSV-like data submitted by crawlers
  • OUTPUT: clean data saved into MongoDB collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured; the API should parse it correctly.
  4. A repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. If there is any error in the submission or processing, the API should return a proper error message with a proper HTTP response status.
  6. For each university, enrich the data with a URL and a text description of it using the DuckDuckGo API, e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 MongoDB tables/collections:
     • table 1 - the yearly data table
     • table 2 - the universities info
  8. Table 1 contains data from every year; table 2 contains only the latest data.
  9. The data processing and transformations should be covered by tests.
  10. The solution should be in the Python programming language; however, you may use any 3rd party library you like.
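The update rules for the two tables can be sketched with plain dictionaries standing in for the MongoDB collections. This is only one possible reading of the requirements: the function name `apply_submission` and the field names `university` and `score` are hypothetical, since the layout of the data files is not shown here. With pymongo, each assignment below would roughly correspond to a call to `update_one(filter, {"$set": ...}, upsert=True)`.

```python
from typing import Any

Row = dict[str, Any]


def apply_submission(
    yearly: dict[tuple[int, str], Row],
    latest: dict[str, Row],
    year: int,
    rows: list[Row],
) -> None:
    """Apply one cleaned yearly submission to both in-memory tables.

    yearly is keyed by (year, university), so resubmitting a year
    overwrites that year's rows instead of duplicating them.
    latest is keyed by university and keeps only the newest year seen.
    """
    for row in rows:
        name = row["university"]  # hypothetical field name
        document = {**row, "year": year}

        # Table 1: insert or update the row for this (year, university).
        yearly[(year, name)] = document

        # Table 2: replace only if this submission is not older than
        # what is already stored for the university.
        current = latest.get(name)
        if current is None or current["year"] <= year:
            latest[name] = document


# Usage: a late 2018 submission must not overwrite the 2019 "latest" row.
yearly_table: dict[tuple[int, str], Row] = {}
latest_table: dict[str, Row] = {}
apply_submission(yearly_table, latest_table, 2019, [{"university": "A", "score": 1}])
apply_submission(yearly_table, latest_table, 2018, [{"university": "A", "score": 2}])
apply_submission(yearly_table, latest_table, 2019, [{"university": "A", "score": 3}])
```

Keeping this logic independent of the database driver makes it straightforward to cover with unit tests, which the requirements ask for.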

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have listed any DevOps skills on your resume, please create a Dockerfile and use Docker to deploy the web app to a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (in descending order of importance):

  1. Your code should perform the required functionality.
  2. Your code should be well covered by tests.
  3. Your code should be modular, readable and maintainable by other engineers.
  4. Your code should be robust and handle failures such as a missing field or a disconnection from the DB or an external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
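As an illustration of the enrichment step combined with the robustness criterion, here is a minimal sketch that queries the DuckDuckGo endpoint from the exercise and degrades gracefully when the external server is unreachable. It uses only the standard library. The function names are hypothetical, and the sketch assumes the JSON response carries `AbstractURL` and `AbstractText` fields, which should be verified against a live response before relying on them.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.duckduckgo.com/"
EMPTY_INFO = {"url": None, "description": None}


def build_query_url(name: str) -> str:
    """Build the request URL for a university name."""
    return API_URL + "?" + urllib.parse.urlencode({"q": name, "format": "json"})


def extract_info(payload: dict) -> dict:
    """Pick the URL and description out of a decoded response.

    Assumes the fields are named AbstractURL and AbstractText;
    empty or missing values become None.
    """
    return {
        "url": payload.get("AbstractURL") or None,
        "description": payload.get("AbstractText") or None,
    }


def enrich(name: str, timeout: float = 5.0) -> dict:
    """Fetch enrichment data, returning empty info on any failure."""
    try:
        with urllib.request.urlopen(build_query_url(name), timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers URLError and timeouts; ValueError covers bad JSON.
        return dict(EMPTY_INFO)
    if not isinstance(payload, dict):
        return dict(EMPTY_INFO)
    return extract_info(payload)
```

Separating `build_query_url` and `extract_info` from the network call means the transformation logic can be tested without network access, while `enrich` alone carries the failure handling.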
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops