Zotero2Readwise - A Python Library to retrieve annotations and notes from Zotero and upload them to your Readwise

Overview

Zotero ➡️ Readwise

zotero2readwise is a Python library that retrieves all Zotero annotations† and notes. It then automatically uploads them to your Readwise§.

This is particularly useful for the new Zotero PDF Reader, which stores all highlights in the Zotero database. The new Zotero is also available as an iOS app (currently in beta). In the new Zotero, annotations are NOT saved in the PDF file unless you export the highlights.

If you annotate your files outside the new Zotero PDF reader, this library may not work with your PDF annotations, as those are not retrievable from the Zotero API.

This library is for you if you annotate (highlight + note) using Zotero's PDF reader (including Zotero for iOS).

👉 Updating an existing Zotero annotation or note and re-running this library will update the corresponding Readwise highlight without creating a duplicate!

† Annotations made in the new Zotero PDF reader and note editor.

§ Readwise is a paid service/software that integrates your highlights from almost everywhere (Pocket, Instapaper, Twitter, Medium, Apple Books, and many more). It even has an amazing OCR for importing highlights from a physical book or article directly into Readwise, and it lets you export all your highlights to Obsidian, Notion, Roam, Markdown, etc. Moreover, it offers automated Spaced Repetition and Active Recall. You can use the link here to get an extra free month (Disclaimer: I will get a free month too!)


Installation

You can install the library by running

pip install zotero2readwise

Note: If you do not have pip installed on your system, you can follow the instructions here.

Usage

Since we have to retrieve the notes from the Zotero API and then upload them to Readwise, the minimum requirements are:

  • Readwise access token [Required]: You can get your access token from https://readwise.io/access_token
  • Zotero API key [Required]: Create a new Zotero Key from your Zotero settings
  • Zotero personal or group ID [Required]:
    • Your personal library ID (aka userID) can be found here, next to "Your userID for use in API calls is XXXXXX".
    • If you're using a group library, you can find the library ID as follows:
      1. Go to https://www.zotero.org/groups/
      2. Click on the group you are interested in.
      3. The library ID is in the URL, which has the format https://www.zotero.org/groups/<group_id>/group_name. The number between /groups/ and /group_name is the library ID.
  • Zotero library type [Optional]: "user" (default) if using a personal library and "group" if using a group library.

Note that if you want to retrieve annotations and notes from a group, you should provide the group ID (zotero_library_id=<group_id>) and set the library type to group (zotero_library_type="group").
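
The group ID can also be pulled out of a group URL programmatically. The sketch below is only an illustration and is not part of zotero2readwise: the helper name extract_group_id and the example URL are hypothetical, and it assumes the URL follows the https://www.zotero.org/groups/<group_id>/group_name format described above.

```python
import re
from typing import Optional

# Matches the numeric ID in URLs shaped like
# https://www.zotero.org/groups/<group_id>/group_name
_GROUP_URL = re.compile(r"^https?://(?:www\.)?zotero\.org/groups/(\d+)(?:/|$)")


def extract_group_id(url: str) -> Optional[str]:
    """Return the group library ID from a Zotero group URL, or None if absent.

    Hypothetical helper for illustration; not provided by zotero2readwise.
    """
    match = _GROUP_URL.match(url.strip())
    return match.group(1) if match else None


# Example URL is made up; substitute the URL of your own group.
group_id = extract_group_id("https://www.zotero.org/groups/1234567/my_reading_group")
print(group_id)  # prints 1234567
```

The returned string is what you would pass as zotero_library_id, together with zotero_library_type="group".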

Approach 1 (Recommended)

from zotero2readwise.zt2rw import Zotero2Readwise

zt_rw = Zotero2Readwise(
    readwise_token="your_readwise_access_token",  # Visit https://readwise.io/access_token
    zotero_key="your_zotero_key",  # Visit https://www.zotero.org/settings/keys
    zotero_library_id="your_zotero_id", # Visit https://www.zotero.org/settings/keys
    zotero_library_type="user", # "user" (default) or "group"
    include_annotations=True, # Include Zotero annotations -> Default: True
    include_notes=False, # Include Zotero notes -> Default: False
)
zt_rw.run()
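
If you would rather not hard-code credentials in a script, one option is to read them from environment variables. The following is a minimal sketch under stated assumptions: the helper load_settings and the environment variable names (READWISE_TOKEN, ZOTERO_KEY, ZOTERO_LIBRARY_ID, ZOTERO_LIBRARY_TYPE) are hypothetical choices for this example, while the keyword argument names are the ones shown in the snippet above.

```python
import os
from typing import Dict, Mapping, Union

Settings = Dict[str, Union[str, bool]]

# Hypothetical environment variable names; pick any names you like.
REQUIRED_VARS = ("READWISE_TOKEN", "ZOTERO_KEY", "ZOTERO_LIBRARY_ID")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build the Zotero2Readwise keyword arguments from environment variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing required environment variables: {', '.join(missing)}")
    return {
        "readwise_token": env["READWISE_TOKEN"],
        "zotero_key": env["ZOTERO_KEY"],
        "zotero_library_id": env["ZOTERO_LIBRARY_ID"],
        # Falls back to the documented default, "user".
        "zotero_library_type": env.get("ZOTERO_LIBRARY_TYPE", "user"),
        "include_annotations": True,
        "include_notes": False,
    }


# Usage, with the library installed and the variables exported:
#   from zotero2readwise.zt2rw import Zotero2Readwise
#   Zotero2Readwise(**load_settings()).run()
```

Keeping the tokens out of the source file makes it easier to run the same script from a scheduled job without committing secrets.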

To check that everything was uploaded, you can call save_failed_items_to_json() on the readwise attribute of the class object to save any highlights that failed to upload to Readwise. If one or more items failed, the filename (item title) and the corresponding Zotero item key will be saved to a txt file.

zt_rw.readwise.save_failed_items_to_json("failed_readwise_highlights.json")
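
After a run, you may want to inspect that file. The sketch below only loads the JSON and reports how many top-level entries it holds. It assumes the file contains a JSON array or object; the exact structure of the entries is not documented here, so the code does not rely on any field names. The helper load_failed_items is hypothetical and not part of the library.

```python
import json
from pathlib import Path
from typing import Any


def load_failed_items(path: str) -> Any:
    """Return the parsed JSON content of the file, or None if it does not exist."""
    file = Path(path)
    if not file.exists():
        return None
    with file.open(encoding="utf-8") as handle:
        return json.load(handle)


# Same filename as passed to save_failed_items_to_json() above.
failed = load_failed_items("failed_readwise_highlights.json")
if failed is None:
    print("No failed-items file found.")
else:
    # Assumes a JSON array or object at the top level.
    print(f"{len(failed)} top-level entries in the failed-items file.")
```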

Approach 2

You can use the run.py script. Run python run.py -h to get more information about all options. You can simply run the script as follows:

python run.py <readwise_token> <zotero_key> <zotero_id> 
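
If you want to trigger the script from another Python program (for example, from a scheduled job), a thin wrapper around subprocess is one way to do it. This is a sketch, not part of the project: the function names build_run_command and sync are hypothetical, only the three positional arguments shown above are used, and it assumes run.py is in the current working directory.

```python
import subprocess
import sys
from typing import List


def build_run_command(readwise_token: str, zotero_key: str, zotero_id: str) -> List[str]:
    """Assemble the argument list for: python run.py <readwise_token> <zotero_key> <zotero_id>."""
    # sys.executable reuses the interpreter that is running this wrapper.
    return [sys.executable, "run.py", readwise_token, zotero_key, zotero_id]


def sync(readwise_token: str, zotero_key: str, zotero_id: str) -> int:
    """Run the script (assumed to be in the working directory) and return its exit code."""
    command = build_run_command(readwise_token, zotero_key, zotero_id)
    return subprocess.run(command, check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems if a token contains special characters.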

Request a new feature or report a bug

Feel free to request a new feature or report a bug by opening a GitHub issue here.

📫 How to reach me:

Personal Website LinkedIn Medium Twitter

Buy Me A Coffee

Comments
  • Update README.md

    Link to Zotero Settings changed from https://www.zotero.org/settings/key to https://www.zotero.org/settings/keys

    I also added /new to directly link to generating a new key, maybe you could explain which settings are needed for a new key (read/write).

    opened by floriankilian 1
  • Fix invalid link to Zotero Settings page

    Thank you so much for your great work! While setting up my forked repo, I noticed a broken link, so I fixed it.

    "https://www.zotero.org/settings/key" to "https://www.zotero.org/settings/keys"

    opened by nobuyukioishi 0
  • Send case law and other types of documents to Readwise?

    Would it be possible to send other types of documents other than books and articles to Readwise? For example I annotate a lot of case law and laws and reports. I’m fine if these are categorised as articles if this means they are also sent to Readwise.

    But if it’s possible to categorise them correctly and get them into Readwise that’d be wonderful! Is this a possibility?

    opened by ABeehive 0
  • Partial push of highlights to Readwise

    Currently, the library fetches all Zotero highlights/notes and pushes them to Readwise each time.

    For efficiency, and to avoid potential duplicate highlights caused by changes in the library or in Readwise's de-duplication algorithm, the library should be able to push only the latest highlights.

    This is related to the issue #31.

    opened by e-alizadeh 0
  • After the last release all my articles from Zotero got duplicated

    As in the title. I have set a cron job for 3 am every day, and on 20 Oct (after the new release on 19 Oct) all the articles got pushed to Readwise a second time. I tried to simply remove them, but they got pushed again. Can we somehow fix this issue?

    opened by piojanu 6
  • Z2R.zt2rw Approach 2 (through python terminal) stopped working in July 2022

    It goes through the normal sequence in python, exactly as before but highlights just no longer appear in Readwise. I recreated my zotero key and readwise token and still not appearing. No errors, zotero seems to be collating and pushing the data to Readwise.

    Anyone else experiencing this? Did Readwise change something on their end?

    opened by bcmorrison3 6
  • Only sync pre-specified color(s)?

    It would be convenient to have a highlight color which signifies "I want this to be synced to Readwise". In my case, most of my highlights aren't review-worthy in a generalized context outside of whatever research I'm doing. Some small amount are.

    I imagine most people don't use all of the available options anyway.

    opened by deklanw 0
Releases(v0.2.6)
Owner
Essi Alizadeh
Engineer & Data Scientist in Permanent Beta: Learning, Improving, Evolving ...
fast python port of arc90's readability tool, updated to match latest readability.js!

python-readability Given a html document, it pulls out the main body text and cleans it up. This is a python port of a ruby port of arc90's readabilit

Yuri Baburov 2.2k Dec 28, 2022
Every web site provides APIs.

Toapi Overview Toapi give you the ability to make every web site provides APIs. Version v2.0.0, Completely rewrote. More elegant. More pythonic v1.0.0

Jiuli Gao 3.3k Jan 05, 2023
Module for automatic summarization of text documents and HTML pages.

Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim

Mišo Belica 3k Jan 03, 2023
Brownant is a web data extracting framework.

Brownant Brownant is a lightweight web data extracting framework. Who uses it? At the moment, dongxi.douban.com (a.k.a. Douban Dongxi) uses Brownant i

Douban Inc. 157 Jan 06, 2022
Fast and robust date extraction from web pages, with Python or on the command-line

Find original and updated publication dates of any web page. From the command-line or within Python, all the steps needed from web page download to HTML parsing, scraping, and text analysis are inclu

Adrien Barbaresi 60 Dec 14, 2022
Pythonic HTML Parsing for Humans™

Requests-HTML: HTML Parsing for Humans™ This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When us

Python Software Foundation 12.9k Jan 01, 2023
RSS feed generator website with user friendly interface

RSS feed generator website with user friendly interface

Alexandr Nesterenko 331 Jan 02, 2023
Zotero2Readwise - A Python Library to retrieve annotations and notes from Zotero and upload them to your Readwise

Zotero ➡️ Readwise zotero2readwise is a Python library that retrieves all Zotero

Essi Alizadeh 49 Dec 20, 2022
Web-Extractor - Simple Tool To Extract IP-Adress From Website

IP-Adress Extractor Simple Tool To Extract IP-Adress From Website Socials: Langu

ميخائيل 7 Jan 16, 2022
News, full-text, and article metadata extraction in Python 3. Advanced docs:

Newspaper3k: Article scraping & curation Inspired by requests for its simplicity and powered by lxml for its speed: "Newspaper is an amazing python li

Lucas Ou-Yang 12.3k Jan 01, 2023
a small library for extracting rich content from urls

A small library for extracting rich content from urls. what does it do? micawber supplies a few methods for retrieving rich metadata about a variety o

Charles Leifer 588 Dec 27, 2022
Export your data from Xiami

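Xiami Exporter Exports your personal data from Xiami Music. Features: export songs as JSON (favorite songs, favorite albums, playlists); export favorite artists as JSON; export favorite albums as JSON; export playlists as JSON (personal and favorited); organize the exported data into an SQLite database (favorite songs, favorite artists, favorite albums, playlists); download exported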

Xiao Meng 59 Nov 13, 2021
Open clone of OpenAI's unreleased WebText dataset scraper.

Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.

Joshua C Peterson 471 Dec 30, 2022
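Collect RSS feeds with GitHub Actions to build an ad-free, high-quality front-page headlines treasure page

Github Actions Rss (garss, 嘎RSS! 69 RSS sources collected so far, generated at: 2021-02-26 11:23:45) An information cocoon is the phenomenon in which the information people follow is habitually steered by their own interests, so that their lives become confined in a silkworm-like "cocoon".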

zhaoolee 721 Jan 02, 2023
Web Content Retrieval for Humans™

Lassie Lassie is a Python library for retrieving basic content from websites. Usage import lassie lassie.fetch('http://www.youtube.com/watch?v

Mike Helmick 571 Dec 29, 2022
Convert HTML to Markdown-formatted text.

html2text html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to

Alireza Savand 1.3k Dec 31, 2022
Combine XPath, CSS Selectors and JSONPath for Web data extracting.

Data Extractor Combine XPath, CSS Selectors and JSONPath for Web data extracting. Quickstarts Installation Install the stable version from PYPI. pip i

林玮 (Jade Lin) 27 Oct 22, 2022