Materials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"

Overview

Amazon Brands and Exclusives

This repository contains code to reproduce the findings featured in our stories "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up" from our series Amazon's Advantage.

Our methodology is described in "How We Analyzed Amazon’s Treatment of Its Brands in Search Results".

Data that we collected and analyzed is in the data folder.
To use the full input dataset (which is not hosted here), please refer to the Download Data section.

Jupyter notebooks used for data preprocessing and analysis are available in the notebooks folder.
Descriptions for each notebook are outlined in the Notebooks section below.

Installation

Python

Make sure you have Python 3.6+ installed. We used Miniconda to create a Python 3.8 virtual environment.

Then install the Python packages:
pip install -r requirements.txt

Notebooks

These notebooks are intended to be run sequentially, but they do not depend on one another. For a quick overview of the methodology, you only need the notebooks marked with an asterisk (*).

0-data-preprocessing.ipynb

This notebook parses Amazon search results and Amazon product pages, and produces the intermediary datasets (data/output/datasets/) used in ranking analysis and random forest classifiers.

1-data-analysis-search-results.ipynb *

The bulk of the ranking analysis and the statistics cited in our data analysis.

2-random-forest-analysis.ipynb *

Feature engineering for the training set, hyperparameter tuning, and the ablation study on a random forest model. The most predictive feature is verified using three separate methods.

3-survey-results.ipynb

Visualizing the survey results from our national panel of 1,000 adults.

4-limiations-product-page-changes.ipynb

Analysis of how often the Buy Box's default shipper and seller change between Amazon and a third party.
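As a rough illustration of what counting these changes can look like, here is a minimal sketch. It is not the notebook's code: the function name `count_amazon_flips`, the seller label "Amazon.com", and the assumption that observations for one product arrive as a chronologically ordered list of seller names are all hypothetical.

```python
def count_amazon_flips(sellers, amazon_name="Amazon.com"):
    """Count how often the Buy Box moves between Amazon and a third party.

    `sellers` is a chronologically ordered list of seller names observed
    for a single product (hypothetical input format). A change from one
    third-party seller to another is not counted as a flip.
    """
    # True where Amazon holds the Buy Box, False where a third party does.
    is_amazon = [seller == amazon_name for seller in sellers]
    # Compare each observation with the one that follows it.
    return sum(1 for prev, cur in zip(is_amazon, is_amazon[1:]) if prev != cur)
```

For example, the sequence Amazon, Amazon, third party, Amazon contains two flips under this definition.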

utils.py

Contains convenience functions used in the notebooks.

parsers.py

Contains parsers for search results and product pages.

Data

This directory is where inputs, intermediaries, and outputs are saved.

data
├── output
│   ├── figures
│   ├── tables
│   └── datasets
│       ├── amazon_private_label.csv.xz
│       ├── products.csv.xz
│       ├── searches.csv.xz
│       ├── training_set.csv.gz
│       ├── pairwise_training_set.csv.gz
│       └── trademarks
└── input
    ├── combined_queries_with_source.csv
    ├── best_sellers
    ├── generic_search_terms
    ├── search-private-label
    ├── search-selenium
    ├── search-selenium-our-brands-filter_
    ├── selenium-products
    ├── seller_central
    └── spotcheck

data/output/ contains tables, figures, and datasets used in our methodology.

data/output/datasets/amazon_private_label.csv.xz is our dataset of Amazon brands, exclusives, and proprietary electronics (N=137,428 products). We use each product's unique ID (called an ASIN) to identify Amazon's own products in our methodology.
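The ASIN lookup described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code used in the notebooks: the column name `asin` and the helper names `load_asins` and `is_amazon_product` are hypothetical, so check the file's header before relying on them.

```python
import csv
import lzma


def load_asins(path, column="asin"):
    """Return the set of ASINs in an xz-compressed CSV file.

    The column name "asin" is an assumption; adjust it to match the
    actual header of the dataset.
    """
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as f:
        return {row[column] for row in csv.DictReader(f) if row.get(column)}


def is_amazon_product(asin, amazon_asins):
    """Check whether a product's ASIN appears in the Amazon product set."""
    return asin in amazon_asins
```

A typical use would be to call `load_asins("data/output/datasets/amazon_private_label.csv.xz")` once and then test each search result's ASIN against the returned set, which keeps every lookup constant-time.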

data/output/datasets/trademarks contains a dataset of trademarked brands registered by Amazon. The data was collected from USPTO.gov and Amazon. We included an additional README with the exact steps we took to build this dataset in the directory.

data/output/datasets/searches.csv.xz contains parsed search result pages from top and generic searches (N=187,534 product positions). You can filter this dataset by search_term to get each of these subsets, using the search terms listed in data/input/combined_queries_with_source.csv.
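One way to do that filtering is sketched below. Only the `search_term` column is named in this README; the `source` column, its values (such as "top" or "generic"), and the helper names are assumptions made for illustration.

```python
import csv
import lzma


def terms_for_source(queries_path, source,
                     term_col="search_term", source_col="source"):
    """Collect the search terms that belong to one query source.

    `source_col` and the values it holds are assumptions about the
    layout of combined_queries_with_source.csv.
    """
    with open(queries_path, newline="", encoding="utf-8") as f:
        return {row[term_col] for row in csv.DictReader(f)
                if row[source_col] == source}


def filter_searches(searches_path, terms, term_col="search_term"):
    """Return rows of an xz-compressed searches CSV whose term is in `terms`."""
    with lzma.open(searches_path, "rt", encoding="utf-8", newline="") as f:
        return [row for row in csv.DictReader(f) if row[term_col] in terms]
```

Both helpers read the files row by row, so the sketch works without pandas. With pandas installed, `pandas.read_csv` followed by an `isin` filter on `search_term` would be the more idiomatic route.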

data/output/datasets/products.csv.xz contains parsed product pages from the searches above (N=157,405 product pages).

data/output/datasets/training_set.csv.gz contains the metadata used to train and evaluate the random forest. Feature engineering is conducted in notebooks/2-random-forest-analysis.ipynb, which produces pairwise_training_set.csv.gz.

Every file in data/input except combined_queries_with_source.csv is stored in AWS S3; those files are not hosted in this repository.

Download Data

You can find the raw inputs in data/input in s3://markup-public-data/amazon-brands/.

If you trust us, you can download the HTML and JSON files in data/input using this script: sh data/download_input_data.sh

Note that this is not necessary to run the notebooks and see the full results.

data/input/search-selenium/ (12 GB uncompressed)

First page of search results collected in January 2021. Download the HTML files search-selenium.tar.xz (238 MB compressed) here.

data/input/selenium-products/ (220 GB uncompressed)

Product pages collected in February 2021. Download the HTML files selenium-products.tar.xz (9 GB compressed) here.

data/input/search-selenium-our-brands-filter_/ (35 GB uncompressed)

Search results filtered by "our brands". Contains every page of search results. Download search-selenium-our-brands-filter_.tar.xz (403 MB compressed) here.

data/input/search-private-label/ (25 GB uncompressed)

API responses for search results filtered down to products Amazon identifies as "our brands". Contains paginated API results. Download the JSON files search-private-label.tar.xz (402 MB compressed) here.

data/input/seller_central/ (105 MB)

Seller Central data for Q4 2020. Download the CSV file All_Q4_2020.csv.xz (105 MB compressed) here.

data/input/best_sellers/ (4 GB)

Amazon's best sellers under the category "Amazon Devices & Accessories". Download the HTML files best_sellers.tar.xz (60 MB compressed) here.

data/input/spotcheck/ (4 GB)

A sub-sample of product pages for spot-checking Buy Box changes. Download the HTML files spotcheck.tar.xz (159 MB compressed) here.
