Materials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"

Overview

Amazon Brands and Exclusives

This repository contains code to reproduce the findings featured in our story "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up" from our series Amazon's Advantage.

Our methodology is described in "How We Analyzed Amazon’s Treatment of Its Brands in Search Results".

Data that we collected and analyzed is in the data folder.
To use the full input dataset (which is not hosted here), please refer to Download data.

Jupyter notebooks used for data preprocessing and analysis are available in the notebooks folder.
Descriptions for each notebook are outlined in the Notebooks section below.

Installation

Python

Make sure you have Python 3.6+ installed. We used Miniconda to create a Python 3.8 virtual environment.

Then install the Python packages:
pip install -r requirements.txt

Notebooks

These notebooks are intended to be run sequentially, but they are not dependent on one another. If you want a quick overview of the methodology, you only need to concern yourself with the notebooks with an asterisk(*).

0-data-preprocessing.ipynb

This notebook parses Amazon search results and Amazon product pages, and produces the intermediary datasets (data/output/datasets/) used in ranking analysis and random forest classifiers.

1-data-analysis-search-results.ipynb *

Bulk of the ranking analysis and stats in the data analysis.

2-random-forest-analysis.ipynb *

Feature engineering training set, finding optimal hyperparameters, and performing the ablation study on a random forest model. The most predictive feature is verified using three separate methods.

3-survey-results.ipynb

Visualizing the survey results from our national panel of 1,000 adults.

4-limiations-product-page-changes.ipynb

Analysis of how often the Buy Box's default shipper and seller change between Amazon and a third party.

utils.py

Contains convenient functions used in the notebooks.

parsers.py

Contains parsers for search results and product pages.

Data

This directory is where inputs, intermediaries, and outputs are saved.

data
├── output
│   ├── figures
│   ├── tables
│   └── datasets
│       ├── amazon_private_label.csv.xz
│       ├── products.csv.xz
│       ├── searches.csv.xz
│       ├── training_set.csv.gz
│       ├── pairwise_training_set.csv.gz
│       └── trademarks
└── input
    ├── combined_queries_with_source.csv
    ├── best_sellers
    ├── generic_search_terms
    ├── search-private-label
    ├── search-selenium
    ├── search-selenium-our-brands-filter_
    ├── selenium-products
    ├── seller_central
    └── spotcheck

data/output/ contains tables, figures, and datasets used in our methodology.

data/output/datasets/amazon_private_label.csv.xz is our dataset of Amazon brands, exclusives, and proprietary electronics (N=137,428 products). We use each product's unique ID (called an ASIN) to identify Amazon's own products in our methodology.

data/output/datasets/trademarks contains a dataset of trademarked brands registered by Amazon. The data was collected from USPTO.gov and Amazon. We included an additional README with the exact steps we took to build this dataset in the directory.

data/output/datasets/searches.csv.xz parsed search result pages from top and generic searches (N=187,534 product positions). You can filter this by search_term for each of these subsets from data/input/combined_queries_with_source.csv.

data/output/datasets/products.csv.xz parsed product pages from the searches above (N=157,405 product pages).

data/output/training_set.csv.gz metadata used to train and evaluate the random forest. Additionally, feature engineering is conducted in notebooks/2-random-forest-analysis.ipynb, which produces pairwise_training_set.csv.gz.

Every file in data/input except combined_queries_with_source.csv is stored in AWS s3. Those are not hosted in this repository.

Download Data

You can find the raw inputs in data/input in s3://markup-public-data/amazon-brands/.

If you trust us, you can download the HTML and JSON files in data/input using this script: sh data/download_input_data.sh

Note this is not necessary to run notebooks and see full results.

data/input/search-selenium/ (12 GB uncompressed)

First page of search results collected in January 2021. Download the HTML files search-selenium.tar.xz (238 MB compressed) here.

data/input/selenium-products/ (220 GB uncompressed)

Product pages collected in February 2021. Download the HTML files selenium-products.tar.xz (9 GB compressed) here.

data/input/search-selenium-our-brands-filter_/ (35 GB uncompressed)

Search results filtered by "our brands". Contains every page of search results. Download search-selenium-our-brands-filter_.tar.xz (403 MB compressed) here.

data/input/search-private-label/ (25 GB uncompressed)

API responses for search results filtered down to products Amazon identifies as "our brands". Contains paginated API results. Download the JSON files search-private-label.tar.xz (402 MB uncompressed) here.

data/input/seller_central/ (105 MB)

Seller central data for Q4 2020. Download the CSV file All_Q4_2020.csv.xz (105 MB compressioned) here.

data/input/best_sellers/ (4 GB)

Amazon's best sellers under the category "Amazon Devices & Accessories". Download the HTML files best_sellers.tar.xz (60MB compressed) here.

data/input/spotcheck/ (4 GB)

A sub-sample of product pages for spot-checking Buy Box changes. Download the HTML files spotcheck.tar.xz (159 MB compressed) here.

Simple-nft-tutorial - A simple tutorial on making nft/memecoins on algorand

nft/memecoin Tutorial on Algorand Let's make a simple NFT/memecoin on the Algora

2 Feb 05, 2022
Fast and multi-threaded script to automatically claim targeted username including 14 day bypass

Instagram Username Auto Claimer Fast and multi-threaded script to automatically claim targeted username. Click here to report bugs. Usage Download ZIP

265 Dec 28, 2022
❤️ Hi There Im EzilaX ❤️ A next gen powerful telegram group manager bot 😱 for manage your groups and have fun with other cool modules Made By Sadew Jayasekara 🔥

❤️ EzilaX v1 ❤️ Unmaintained. The new repo of @EzilaXBot is Public. (It is no longer based on this source code. The completely rewritten bot available

Sadew Jayasekara 18 Nov 24, 2021
Python API for Photoshop.

Python API for Photoshop. The example above was created with Photoshop Python API.

Hal 372 Jan 02, 2023
Telegram Group Manager Bot Written In Python Using Pyrogram.

──「𝐂𝐡𝐢𝐤𝐚 𝐅𝐮𝐣𝐢𝐰𝐚𝐫𝐚」── Telegram Group Manager Bot Written In Python Using Pyrogram. Deploy To Heroku NOTE: I'm making this note to whoever

Wahyusaputra 3 Feb 12, 2022
Enables you to execute scripts and perform API requests in MikroTik router

HomeAssistant component: MikroTik API The mikrotik_api platform enables you to execute scripts and perform API requests in MikroTik router To enable M

Pavel S 6 Aug 12, 2022
Assistant made in python to control your spotify via voice

Spotify-Assistant Assistant made in python to control your spotify via voice Overview 🚀 PLAY, PAUSE, NEXT, PREVIOUS, VOLUME COMMANDS 📝 Toast notific

Mauri 6 Jan 18, 2022
Telegram bot for searching videos in your PDisk account by @AbirHasan2005

PDisk-Videos-Search A Telegram bot for searching videos in your PDisk account by @AbirHasan2005. Configs API_ID - Get from @TeleORG_Bot API_HASH - Get

Abir Hasan 39 Oct 21, 2022
微信支付接口V3版python库

wechatpayv3 介绍 微信支付接口V3版python库。 适用对象 wechatpayv3支持微信支付直连商户,接口说明详见 官网。 特性 平台证书自动更新,无需开发者关注平台证书有效性,无需手动下载更新; 支持本地缓存平台证书,初始化时指定平台证书保存目录即可。 适配进度 微信支付V3版A

chen gang 258 Jan 06, 2023
A slack bot that notifies you when a restaurant is available for orders

Slack Wolt Notifier A Slack bot that notifies you when a Wolt restaurant or venue is available for orders. How does it work? Slack supports bots that

Gil Matok 8 Oct 24, 2022
StringSessionGenerator - A Telegram bot to generate pyrogram and telethon string session

⭐️ String Session Generator ⭐️ Genrate String Session Using this bot. Made by TeamUltronX 🔥 String Session Demo Bot: Environment Variables Mandatory

TheUltronX 1 Dec 31, 2021
Discord bot written in python

Discord bot created by dpshark#3004 for fun List of features/commands: [keyword] responses tools !add [respons] Adds new response to [keyword] !remove

Daniel K.Gunleiksrud 3 Dec 28, 2021
Streaming Finance Data with AWS Lambda

A data pipeline consisting of an AWS lambda function reading data from yfinance API, an AWS Kinesis stream to receive & store data in S3 buckets and AWS Glue crawler & Athena to run SQL queries.

Aarif Munwar Jahan 4 Aug 30, 2022
A liblary whre you can find helpful functions for your discord bot

DBotUtils A liblary whre you can find helpful functions for your discord bot Easy setup Setup is easily and flexible. Change anytime. After setup just

Kondek286 1 Nov 02, 2021
A VCVideoPlayer Bot for Telegram made with 💞 By @ProErrorXD

VC Video Player How To Host ✨ Heroku Deploy ✨ The easiest way to deploy this Bot is via Heroku. Credit 🔥 |🇮🇳 Louis |🇮🇳 Sammy |🇮🇳 Blaze Marsha

丂ムᄊᄊƳ 95 May 17, 2022
DaProfiler vous permet d'automatiser vos recherches sur des particuliers basés en France uniquement et d'afficher vos résultats sous forme d'arbre.

A but educatif seulement. DaProfiler DaProfiler vous permet de créer un profil sur votre target basé en France uniquement. La particularité de ce prog

Dalunacrobate 73 Dec 21, 2022
A Script to automate fowarding all new messages from one/many channel(s) to another channel(s), without the forwarded tag.

Channel Auto Message Forward A script to automate fowarding all new messages from one/many channel(s) to another channel(s), without the forwarded tag

16 Oct 21, 2022
A python based all-in-one tool for Google Drive

gdrive-tools A python based all-in-one tool for Google Drive Uses For Gdrive-Tools ✓ generate SA ✓ Add the SA and Add them to TD automatically ✓ Gener

XcodersHub 32 Feb 09, 2022
Telegram bot untuk mencari jawaban dibrainly, support inline juga

Brainly-Telebot Bot Untuk Mencari Jawaban Dibrainly Jika ingin clone. Boleh kok Dibuat dengan python menggunakan MTproto Library. Yaitu Pyrogram Bot y

... 7 Mar 17, 2022
NiceHash Python Library and Command Line Rest API

NiceHash Python Library and Command Line Rest API Requirements / Modules pip install requests Required data and where to get it Following data is nee

Ashlin Darius Govindasamy 2 Jan 02, 2022