A command line tool (and Python library) for archiving Twitter JSON

Last update: Dec 28, 2022


Overview

twarc

Collect data at the command line from the Twitter API (v1.1 and v2).

Read the documentation
Ask questions in Slack or Matrix

Contributing

Documentation

The documentation is managed at ReadTheDocs. If you would like to improve the documentation you can edit the Markdown files in docs or add new ones. Then send a pull request and we can add it.

To view your documentation locally you should be able to:

pip install -r requirements-mkdocs.txt
mkdocs serve
open http://127.0.0.1:8000/

If you prefer you can create a page on the wiki to workshop the documentation, and then when/if you think it's ready to be merged with the documentation create an issue. Please feel free to create whatever documentation is useful in the wiki area.

Code

If you are interested in adding functionality to twarc or fixing something that's broken, here are the steps to set up your development environment:

git clone https://github.com/docnow/twarc
cd twarc
pip install -r requirements.txt

Create a .env file that includes the Twitter App keys to use during testing:

BEARER_TOKEN=CHANGEME
CONSUMER_KEY=CHANGEME
CONSUMER_SECRET=CHANGEME
ACCESS_TOKEN=CHANGEME
ACCESS_TOKEN_SECRET=CHANGEME
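
Before running the tests it can help to confirm that the file is complete. The following is a minimal sketch using only the standard library; the helper names `read_env` and `missing_keys` are illustrative and not part of twarc, and it assumes plain KEY=VALUE lines without quoting.

```python
from pathlib import Path

# The five variable names from the .env example above.
REQUIRED = [
    "BEARER_TOKEN",
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
]


def read_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, placeholder="CHANGEME"):
    """Return required names that are absent, empty, or still the placeholder."""
    return [k for k in REQUIRED if values.get(k, "") in ("", placeholder)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("missing or unset:", missing_keys(read_env(text)))
```

An empty list means every key has been changed from its CHANGEME placeholder.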

Now run the tests:

python setup.py test

Add your code and some new tests, and send a pull request!

Comments

twarc2 search without configure on Windows throws JSON parse error

I ran the request below: twarc2 search '#ENDSARS-is:retweet' --start-time 2017-12-01 --end-time 2020-11-30 --flatten --archive C:\Users\USER\Desktop\MyTwarcResults.json

and I got this error message below:

Traceback (most recent call last):
  File "C:\Users\USER\PycharmProjects\workspace\venv\Scripts\twarc2-script.py", line 33, in <module>
    sys.exit(load_entry_point('twarc==2.0.6', 'console_scripts', 'twarc2')())
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\twarc\decorators.py", line 172, in __call__
    result = e.response.json()
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\requests\models.py", line 900, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

What exactly could be the cause of this error, and how can I get help?

opened by osemele 62
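
The traceback in the post above ends in `e.response.json()`, which raises `JSONDecodeError` when the body of an HTTP error response is not JSON (for example an empty body or a plain-text message). As a rough sketch of how a caller could handle that case without crashing, using only the standard library: the function name `describe_error_body` is hypothetical and this is not twarc's own error handling.

```python
import json


def describe_error_body(status_code, body_text):
    """Return a readable message for an HTTP error, even if the body is not JSON."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # An empty or plain-text body lands here instead of crashing the caller.
        snippet = body_text.strip()[:80] or "<empty body>"
        return f"HTTP {status_code}: non-JSON response: {snippet}"
    return f"HTTP {status_code}: {json.dumps(payload, sort_keys=True)}"


print(describe_error_body(401, ""))
print(describe_error_body(400, '{"title": "Invalid Request"}'))
```

Printing the status code and the raw body usually says more about the real problem (bad credentials, a malformed query) than the JSON parser's message does.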

unable to parse 400 error as json: Bad request

Hello everyone, I am new to programming and I have been working with twarc for two days in order to download tweets for my research. Note that I have access to Academic Product track v2.

I’m doing:

pip install --upgrade twarc
twarc2 configure (I paste the Bearer Token)

Your keys have been written to …some folder…config etc etc and I get the message: Happy twarcing!
Everything seemed to be fine..!!

When I run: twarc2 search --archive --start-time 2020-04-01 --end-time 2021-03-31 "(#coronahysterie OR #coronaluege OR #wirwerdenalledasein) lang:de" results.jsonl

It crashes with an error: Unable to parse 400 error as JSON: Bad Request

I also tried: twarc2 search climatestrike > tweets.jsonl , but I get the same error. I changed hashtags and dates but the problem remains.

Can someone help me?? Thank you in advance! Sofia

opened by sofiavlachou28 38

Dehydrated file empty

I collected tweets using twarc and then when I enter the dehydrate command the .txt file is empty; there is no output. The result is the same with all the files I tried.
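
For reference, dehydrating means reducing each collected tweet to its ID. The sketch below shows the idea, assuming the input is line-oriented JSON with a v1.1-style `id_str` field; the function name `dehydrate_lines` is illustrative and this is not twarc's implementation. Running something like it over a few lines of the collected file is one way to check whether the input has the expected shape.

```python
import json


def dehydrate_lines(lines):
    """Yield the tweet ID from each non-blank line of tweet JSON."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        if not isinstance(tweet, dict) or "id_str" not in tweet:
            raise ValueError(f"line {number} is not a tweet object with id_str")
        yield tweet["id_str"]


sample = [
    '{"id_str": "1082335132818771968", "full_text": "hello"}',
    "",
    '{"id_str": "1092545345622589440"}',
]
print(list(dehydrate_lines(sample)))
```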

opened by SCrockfo 33

Labs endpoints

Twitter is developing new API endpoints with all new JSON payloads as part of their Labs environment:

https://developer.twitter.com/en/account/labs

I think it might make sense to slowly develop a branch of twarc that uses these endpoints instead of the standard ones that are available now? It's not clear yet when (or if) the current API will be turned off. Once that is clearer I think this issue may take on some urgency.

opened by edsu 25

Error: cannot import name 'Twarc'

Hi! I'm pretty new to python and twitter; I'm trying to run the sample code to use twarc as a library to collect tweets, but I keep getting ImportError: cannot import name 'Twarc' whenever I run the code. Any ideas what I'm doing wrong?

Thanks!

opened by jte229 24
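
One common cause of this kind of ImportError is a local file named twarc.py sitting next to the script and shadowing the installed package. That is an assumption here, since the post does not show the file layout. A small sketch for checking which file Python would actually import (the helper name `locate_module` is illustrative):

```python
import importlib.util


def locate_module(name):
    """Return the file path Python would import for `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# If this prints a path inside your own project rather than site-packages,
# a local file is shadowing the installed package.
print(locate_module("twarc"))
```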

Archive & Hydrate failures getting OpenSSL.SSL.SysCallError: (104, 'ECONNRESET') by Twitter

For a few weeks now, Archive.py and Twarc --Hydrate have very often failed unnoticed when launched in the background with &. No traceback is written, even when output is forwarded to files. But when I launch both interactively I get this:

Traceback (most recent call last):
  File "/usr/local/bin/twarc.py", line 4, in <module>
    __import__('pkg_resources').run_script('twarc==0.3.0', 'twarc.py')
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 729, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1649, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.0-py2.7.egg/EGG-INFO/scripts/twarc.py", line 335, in <module>
  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.0-py2.7.egg/EGG-INFO/scripts/twarc.py", line 109, in main
  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.0-py2.7.egg/EGG-INFO/scripts/twarc.py", line 298, in hydrate
  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.0-py2.7.egg/EGG-INFO/scripts/twarc.py", line 172, in new_f
  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.0-py2.7.egg/EGG-INFO/scripts/twarc.py", line 323, in post

  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 507, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 370, in send
    timeout=timeout
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 544, in urlopen
    body=body, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 372, in _make_request
    httplib_response = conn.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1034, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.7/socket.py", line 447, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/contrib/pyopenssl.py", line 188, in recv
    data = self.connection.recv(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyOpenSSL-0.14-py2.7.egg/OpenSSL/SSL.py", line 995, in recv
    self._raise_ssl_error(self._ssl, result)
  File "/usr/local/lib/python2.7/dist-packages/pyOpenSSL-0.14-py2.7.egg/OpenSSL/SSL.py", line 862, in _raise_ssl_error
    raise SysCallError(errno, errorcode[errno])
OpenSSL.SSL.SysCallError: (104, 'ECONNRESET')

The systems used have no I/O or network issues; all are fully updated Debian and OS X machines, and Twarc is the current v0.3.0. How can I help in finding a solution?

opened by remagio 24
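
Until the cause of resets like the one above is known, a long-running job can be made to tolerate them by retrying with a growing delay and failing loudly once the attempts run out. The following is a minimal sketch, not twarc's behaviour: `with_retries` is a hypothetical helper, and by default it catches only Python's built-in `ConnectionResetError`. The pyOpenSSL `SysCallError` in the traceback is a different class and would have to be added to `retry_on`.

```python
import time


def with_retries(action, retry_on=(ConnectionResetError,), attempts=5,
                 base_delay=1.0, sleep=time.sleep):
    """Call `action()` and retry on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error instead of hiding it
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in for a network call that fails twice.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionResetError(104, "ECONNRESET")
    return "ok"


delays = []
print(with_retries(flaky, sleep=delays.append), delays)
```

Passing `sleep` as a parameter keeps the demonstration instant; in real use the default `time.sleep` applies.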

conversations module "sleep" inconsistencies

Hi there, I feel like I'm running into a similar issue as described here: https://twittercommunity.com/t/inconsistent-rate-limit-academic-research-full-archive-search/162928/18?u=igorbrigadir

and here: https://github.com/DocNow/twarc/pull/578.

I too am fetching all tweets related to a conversation id using the twarc2 command: twarc2 conversations --archive input_conversation_ids.txt output_conversation_tweets.jsonl

But, I'm finding that it is doing far fewer than the 300 requests per 15 minutes that my academic Twitter API account has been allotted.

I'm using the latest version of twarc 2.8.2

Here is some of the log that I'm seeing. (Notice that at 10:14:24 it stops... and then doesn't restart until 11:23:27 for no clear reason.)

2022-01-01 10:04:32,148 INFO fetching conversation 290902007206772736
2022-01-01 10:04:32,148 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:290902007206772736', 'max_results': 100}}
2022-01-01 10:04:32,174 WARNING rate limit exceeded: sleeping 591.8250887393951 secs
2022-01-01 10:14:24,003 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:290902007206772736', 'max_results': 100}}
2022-01-01 10:14:24,117 INFO Retrieved an empty page of results.
2022-01-01 10:14:24,117 INFO No more results for search conversation_id:290902007206772736.
2022-01-01 11:23:27,550 INFO fetching conversation 290902009131966464
2022-01-01 11:23:27,551 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:290902009131966464', 'max_results': 100}}
2022-01-01 11:23:28,605 INFO Retrieved an empty page of results.
2022-01-01 11:23:28,605 INFO No more results for search conversation_id:290902009131966464.
2022-01-01 11:23:28,607 INFO fetching conversation 290902010599964672

I'm also getting a lot of these warnings about "overlong sleep interval"s:

2022-01-01 15:53:56,688 WARNING Detected overlong sleep interval - is your system clock accurate? An accurate system time is needed to calculate how long to sleep for, and data collection might be slowed.
2022-01-01 15:53:56,688 WARNING rate limit exceeded: sleeping 901 secs
2022-01-01 16:08:57,693 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291011973909467136', 'max_results': 100}}
2022-01-01 16:08:57,795 INFO Retrieved an empty page of results.
2022-01-01 16:08:57,795 INFO No more results for search conversation_id:291011973909467136.
2022-01-01 16:08:57,795 INFO fetching conversation 291018004416839680
2022-01-01 16:08:57,796 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291018004416839680', 'max_results': 100}}
2022-01-01 16:08:57,820 WARNING Detected overlong sleep interval - is your system clock accurate? An accurate system time is needed to calculate how long to sleep for, and data collection might be slowed.
2022-01-01 16:08:57,820 WARNING rate limit exceeded: sleeping 901 secs
2022-01-01 16:23:58,834 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291018004416839680', 'max_results': 100}}
2022-01-01 16:23:58,967 INFO Retrieved an empty page of results.
2022-01-01 16:23:58,967 INFO No more results for search conversation_id:291018004416839680.
2022-01-01 16:23:58,968 INFO fetching conversation 291062323647500288
2022-01-01 16:23:58,968 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291062323647500288', 'max_results': 100}}
2022-01-01 16:23:58,992 WARNING Detected overlong sleep interval - is your system clock accurate? An accurate system time is needed to calculate how long to sleep for, and data collection might be slowed.
2022-01-01 16:23:58,992 WARNING rate limit exceeded: sleeping 901 secs

Any help would be appreciated as at this rate, I won't put this dataset together within any reasonable amount of time.

opened by jonlee112 21
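
The arithmetic behind the numbers in the post above can be sketched as follows. An allowance of 300 requests per 15 minutes averages one request every 3 seconds, and a client that is told when the limit resets (as a Unix timestamp) waits for the difference between that time and its own clock. The function names are hypothetical and this is not twarc's implementation. The cap of one window plus a second is an assumption chosen to match the 901-second sleeps in the log.

```python
WINDOW_SECONDS = 15 * 60  # the 15-minute window mentioned in the post


def min_spacing(requests_per_window, window=WINDOW_SECONDS):
    """Average seconds between requests that exactly uses the allowance."""
    return window / requests_per_window


def sleep_seconds(reset_epoch, now_epoch, window=WINDOW_SECONDS):
    """Seconds to wait until the limit resets, capped at one window plus a second.

    A wait longer than the window suggests the local clock disagrees with the
    server's, so the cap keeps a skewed clock from stalling collection.
    """
    wait = reset_epoch - now_epoch + 1
    return max(0, min(wait, window + 1))


print(min_spacing(300))                      # 3.0
print(sleep_seconds(1_000_600, 1_000_000))   # 601
print(sleep_seconds(1_005_000, 1_000_000))   # capped at 901
```

If each conversation costs at least one request and every request is followed by a full 901-second sleep, throughput drops to roughly four conversations per hour, which is consistent with the slow progress described.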

deleted.py

What's the usage command for deleted.py? I've been using the command python utils/deleted.py election_data.txt > election_deleted.jsonl
where election_data is the dehydration output of tweet ids from an election dataset. I keep getting this error:

Traceback (most recent call last):
  File "utils/deleted.py", line 31, in <module>
    for t in missing(tweets):
  File "utils/deleted.py", line 16, in missing
    tweet_ids = [t['id_str'] for t in tweets]
  File "utils/deleted.py", line 16, in <listcomp>
    tweet_ids = [t['id_str'] for t in tweets]
TypeError: 'int' object is not subscriptable

opened by ameliameyer 21
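
The error message in the post above is what Python produces when a bare tweet ID is parsed as JSON and then indexed like a tweet object. That suggests, as an assumption from the traceback rather than from the script's documentation, that deleted.py expects full tweet JSON lines rather than dehydrated IDs. A small sketch for telling the two kinds of input line apart (the helper name `looks_like` is illustrative):

```python
import json


def looks_like(line):
    """Classify one input line as 'tweet', 'id', or 'other'."""
    try:
        value = json.loads(line)
    except json.JSONDecodeError:
        return "other"
    if isinstance(value, dict) and "id_str" in value:
        return "tweet"
    if isinstance(value, int) and not isinstance(value, bool):
        return "id"
    return "other"


id_line = "1082335132818771968"
tweet_line = '{"id_str": "1082335132818771968"}'

print(looks_like(id_line), looks_like(tweet_line))

# Indexing the parsed ID reproduces the reported error message.
try:
    json.loads(id_line)["id_str"]
except TypeError as err:
    print(err)
```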

replies --recursive doesn't seem to be working

When I run it for this tweet

https://twitter.com/hey_ciara/status/1082335132818771968

like twarc --mykeys=keys replies 1082335132818771968 --recursive

I get the following output, which has 8 tweets and they are not the "thread".

I understand there is no way to just get main user's tweets but this looks like it's not exhausting the search space. I want this id as the second one (1082335516232687617, 2nd tweet in the thread)

{"created_at": "Mon Jan 07 17:56:19 +0000 2019", "id": 1082335132818771968, "id_str": "1082335132818771968", "full_text": "Y\u2019all REALLY tryna travel in 2019?? I got you. \n\nWe\u2019re covering everything today - flights, accommodation, activities, EVERYTHING! \n\nHere\u2019s a thread on how I\u2019ve traveled to over 30 countries for the low:", "truncated": false, "display_text_range": [0, 203], "entities": {"hashtags": [], "symbols": [], "user_mentions": [], "urls": []}, "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>", "in_reply_to_status_id": null, "in_reply_to_status_id_str": null, "in_reply_to_user_id": null, "in_reply_to_user_id_str": null, "in_reply_to_screen_name": null, "user": {"id": 786360813187571712, "id_str": "786360813187571712", "name": "Ciara Johnson", "screen_name": "hey_ciara", "location": "[email protected]", "description": "solo travel queen | Quit my job to travel the world | travel blog: https://t.co/iaRZsxQmMy. IG: Hey_ciara | Feat: @forbes @essence @elle @cosmopolitan TAMU15", "url": "https://t.co/PoGu3SY7li", "entities": {"url": {"urls": [{"url": "https://t.co/PoGu3SY7li", "expanded_url": "http://www.instagram.com/hey_ciara", "display_url": "instagram.com/hey_ciara", "indices": [0, 23]}]}, "description": {"urls": [{"url": "https://t.co/iaRZsxQmMy", "expanded_url": "http://www.heyciara.com", "display_url": "heyciara.com", "indices": [67, 90]}]}}, "protected": false, "followers_count": 31330, "friends_count": 631, "listed_count": 170, "created_at": "Thu Oct 13 00:20:02 +0000 2016", "favourites_count": 5194, "utc_offset": null, "time_zone": null, "geo_enabled": true, "verified": false, "statuses_count": 4890, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "F5F8FA", "profile_background_image_url": null, "profile_background_image_url_https": null, "profile_background_tile": false, "profile_image_url": 
"http://pbs.twimg.com/profile_images/948180869738565634/8OcWwI-n_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/948180869738565634/8OcWwI-n_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/786360813187571712/1514898912", "profile_image_extensions_alt_text": null, "profile_banner_extensions_alt_text": null, "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": false, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 26468, "favorite_count": 97299, "favorited": false, "retweeted": false, "lang": "en"}
{"created_at": "Mon Feb 04 22:08:03 +0000 2019", "id": 1092545345622589440, "id_str": "1092545345622589440", "full_text": "@hey_ciara Hey I\u2019m trying to go to Colorado over spring break, what is the best website to find cheap flights? Tx-&gt;Co :)", "truncated": false, "display_text_range": [11, 123], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "hey_ciara", "name": "Ciara Johnson", "id": 786360813187571712, "id_str": "786360813187571712", "indices": [0, 10]}], "urls": []}, "metadata": {"iso_language_code": "en", "result_type": "recent"}, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "in_reply_to_status_id": 1082335132818771968, "in_reply_to_status_id_str": "1082335132818771968", "in_reply_to_user_id": 786360813187571712, "in_reply_to_user_id_str": "786360813187571712", "in_reply_to_screen_name": "hey_ciara", "user": {"id": 3068904452, "id_str": "3068904452", "name": "Deissy \u2728", "screen_name": "Cucuuuy29", "location": "TX \ud83d\ude1c", "description": "- Instagram: iintoxicatex - Snapchat: cucuuuy - Scorpio \u264f\ufe0f", "url": "https://t.co/UeJ4Vw3DNb", "entities": {"url": {"urls": [{"url": "https://t.co/UeJ4Vw3DNb", "expanded_url": "https://youtu.be/UkwsermH0EY", "display_url": "youtu.be/UkwsermH0EY", "indices": [0, 23]}]}, "description": {"urls": []}}, "protected": false, "followers_count": 130, "friends_count": 101, "listed_count": 0, "created_at": "Mon Mar 09 01:37:39 +0000 2015", "favourites_count": 8732, "utc_offset": null, "time_zone": null, "geo_enabled": false, "verified": false, "statuses_count": 2150, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "C0DEED", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": false, 
"profile_image_url": "http://pbs.twimg.com/profile_images/1090821215579705351/ci0x3d1U_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1090821215579705351/ci0x3d1U_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/3068904452/1529832326", "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": true, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 0, "favorited": false, "retweeted": false, "lang": "en"}
{"created_at": "Tue Feb 05 16:00:55 +0000 2019", "id": 1092815340210438147, "id_str": "1092815340210438147", "full_text": "@Cucuuuy29 Well, we're slightly biased but... we think you will like to give this a try: https://t.co/l8Eo3ARyhp", "truncated": false, "display_text_range": [11, 112], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "Cucuuuy29", "name": "Deissy \u2728", "id": 3068904452, "id_str": "3068904452", "indices": [0, 10]}], "urls": [{"url": "https://t.co/l8Eo3ARyhp", "expanded_url": "https://concorde.io", "display_url": "concorde.io", "indices": [89, 112]}]}, "metadata": {"iso_language_code": "en", "result_type": "recent"}, "source": "<a href=\"https://concorde.io\" rel=\"nofollow\">Concorde Helper</a>", "in_reply_to_status_id": 1092545345622589440, "in_reply_to_status_id_str": "1092545345622589440", "in_reply_to_user_id": 3068904452, "in_reply_to_user_id_str": "3068904452", "in_reply_to_screen_name": "Cucuuuy29", "user": {"id": 4249658592, "id_str": "4249658592", "name": "Concorde Cheaps", "screen_name": "concordecheaps", "location": "In the cloud.", "description": "Discover cheap flights with Concorde.", "url": "https://t.co/YZ6V5SLzKh", "entities": {"url": {"urls": [{"url": "https://t.co/YZ6V5SLzKh", "expanded_url": "https://concorde.io", "display_url": "concorde.io", "indices": [0, 23]}]}, "description": {"urls": []}}, "protected": false, "followers_count": 769, "friends_count": 4, "listed_count": 79, "created_at": "Sun Nov 15 21:37:40 +0000 2015", "favourites_count": 49, "utc_offset": null, "time_zone": null, "geo_enabled": false, "verified": false, "statuses_count": 22381, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "C0DEED", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": false, 
"profile_image_url": "http://pbs.twimg.com/profile_images/666008495133626368/vBf8kjXB_normal.png", "profile_image_url_https": "https://pbs.twimg.com/profile_images/666008495133626368/vBf8kjXB_normal.png", "profile_banner_url": "https://pbs.twimg.com/profile_banners/4249658592/1454193154", "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": false, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 1, "favorited": false, "retweeted": false, "possibly_sensitive": false, "lang": "en"}
{"created_at": "Tue Feb 05 16:29:37 +0000 2019", "id": 1092822563527495680, "id_str": "1092822563527495680", "full_text": "@concordecheaps I can\u2019t figure out your website \ud83d\ude2c but I don\u2019t see any flights from IAH to Colorado? Thank you though.", "truncated": false, "display_text_range": [16, 117], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "concordecheaps", "name": "Concorde Cheaps", "id": 4249658592, "id_str": "4249658592", "indices": [0, 15]}], "urls": []}, "metadata": {"iso_language_code": "en", "result_type": "recent"}, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "in_reply_to_status_id": 1092815340210438147, "in_reply_to_status_id_str": "1092815340210438147", "in_reply_to_user_id": 4249658592, "in_reply_to_user_id_str": "4249658592", "in_reply_to_screen_name": "concordecheaps", "user": {"id": 3068904452, "id_str": "3068904452", "name": "Deissy \u2728", "screen_name": "Cucuuuy29", "location": "TX \ud83d\ude1c", "description": "- Instagram: iintoxicatex - Snapchat: cucuuuy - Scorpio \u264f\ufe0f", "url": "https://t.co/UeJ4Vw3DNb", "entities": {"url": {"urls": [{"url": "https://t.co/UeJ4Vw3DNb", "expanded_url": "https://youtu.be/UkwsermH0EY", "display_url": "youtu.be/UkwsermH0EY", "indices": [0, 23]}]}, "description": {"urls": []}}, "protected": false, "followers_count": 130, "friends_count": 101, "listed_count": 0, "created_at": "Mon Mar 09 01:37:39 +0000 2015", "favourites_count": 8732, "utc_offset": null, "time_zone": null, "geo_enabled": false, "verified": false, "statuses_count": 2150, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "C0DEED", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": false, "profile_image_url": 
"http://pbs.twimg.com/profile_images/1090821215579705351/ci0x3d1U_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1090821215579705351/ci0x3d1U_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/3068904452/1529832326", "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": true, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 0, "favorited": false, "retweeted": false, "lang": "en"}
{"created_at": "Tue Feb 05 13:01:10 +0000 2019", "id": 1092770107196145664, "id_str": "1092770107196145664", "full_text": "@Cucuuuy29 Have you checked out Concorde? Take a look at our latest flight deals: https://t.co/l8Eo3ARyhp", "truncated": false, "display_text_range": [11, 105], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "Cucuuuy29", "name": "Deissy \u2728", "id": 3068904452, "id_str": "3068904452", "indices": [0, 10]}], "urls": [{"url": "https://t.co/l8Eo3ARyhp", "expanded_url": "https://concorde.io", "display_url": "concorde.io", "indices": [82, 105]}]}, "metadata": {"iso_language_code": "en", "result_type": "recent"}, "source": "<a href=\"https://concorde.io\" rel=\"nofollow\">Concorde Helper</a>", "in_reply_to_status_id": 1092545345622589440, "in_reply_to_status_id_str": "1092545345622589440", "in_reply_to_user_id": 3068904452, "in_reply_to_user_id_str": "3068904452", "in_reply_to_screen_name": "Cucuuuy29", "user": {"id": 4249658592, "id_str": "4249658592", "name": "Concorde Cheaps", "screen_name": "concordecheaps", "location": "In the cloud.", "description": "Discover cheap flights with Concorde.", "url": "https://t.co/YZ6V5SLzKh", "entities": {"url": {"urls": [{"url": "https://t.co/YZ6V5SLzKh", "expanded_url": "https://concorde.io", "display_url": "concorde.io", "indices": [0, 23]}]}, "description": {"urls": []}}, "protected": false, "followers_count": 769, "friends_count": 4, "listed_count": 79, "created_at": "Sun Nov 15 21:37:40 +0000 2015", "favourites_count": 49, "utc_offset": null, "time_zone": null, "geo_enabled": false, "verified": false, "statuses_count": 22381, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "C0DEED", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": false, 
"profile_image_url": "http://pbs.twimg.com/profile_images/666008495133626368/vBf8kjXB_normal.png", "profile_image_url_https": "https://pbs.twimg.com/profile_images/666008495133626368/vBf8kjXB_normal.png", "profile_banner_url": "https://pbs.twimg.com/profile_banners/4249658592/1454193154", "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": false, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 0, "favorited": false, "retweeted": false, "possibly_sensitive": false, "lang": "en"}
{"created_at": "Mon Feb 04 18:34:34 +0000 2019", "id": 1092491619243302918, "id_str": "1092491619243302918", "full_text": "@hey_ciara @coolkidmiri", "truncated": false, "display_text_range": [11, 23], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "hey_ciara", "name": "Ciara Johnson", "id": 786360813187571712, "id_str": "786360813187571712", "indices": [0, 10]}, {"screen_name": "coolkidmiri", "name": "miri \ud83c\uddf3\ud83c\uddec", "id": 1053462132262690816, "id_str": "1053462132262690816", "indices": [11, 23]}], "urls": []}, "metadata": {"iso_language_code": "und", "result_type": "recent"}, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "in_reply_to_status_id": 1082335132818771968, "in_reply_to_status_id_str": "1082335132818771968", "in_reply_to_user_id": 786360813187571712, "in_reply_to_user_id_str": "786360813187571712", "in_reply_to_screen_name": "hey_ciara", "user": {"id": 2680409402, "id_str": "2680409402", "name": "nicole", "screen_name": "enimsajn_", "location": "216", "description": "", "url": null, "entities": {"description": {"urls": []}}, "protected": false, "followers_count": 2168, "friends_count": 745, "listed_count": 7, "created_at": "Fri Jul 25 20:26:50 +0000 2014", "favourites_count": 35168, "utc_offset": null, "time_zone": null, "geo_enabled": true, "verified": false, "statuses_count": 35331, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "000000", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": true, "profile_image_url": "http://pbs.twimg.com/profile_images/1092265592822865920/prl8CRTC_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1092265592822865920/prl8CRTC_normal.jpg", "profile_banner_url": 
"https://pbs.twimg.com/profile_banners/2680409402/1549251387", "profile_link_color": "BDEDFF", "profile_sidebar_border_color": "000000", "profile_sidebar_fill_color": "000000", "profile_text_color": "000000", "profile_use_background_image": true, "has_extended_profile": false, "default_profile": false, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 0, "favorited": false, "retweeted": false, "lang": "und"}
{"created_at": "Sat Feb 02 12:02:21 +0000 2019", "id": 1091668140012974080, "id_str": "1091668140012974080", "full_text": "@hey_ciara following !", "truncated": false, "display_text_range": [11, 22], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "hey_ciara", "name": "Ciara Johnson", "id": 786360813187571712, "id_str": "786360813187571712", "indices": [0, 10]}], "urls": []}, "metadata": {"iso_language_code": "en", "result_type": "recent"}, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "in_reply_to_status_id": 1082335132818771968, "in_reply_to_status_id_str": "1082335132818771968", "in_reply_to_user_id": 786360813187571712, "in_reply_to_user_id_str": "786360813187571712", "in_reply_to_screen_name": "hey_ciara", "user": {"id": 28462348, "id_str": "28462348", "name": "Jh\u00e9ani", "screen_name": "iamjheani", "location": "Los Angeles, CA", "description": "pop/r&b princess IG: iamjheani", "url": null, "entities": {"description": {"urls": []}}, "protected": false, "followers_count": 2798, "friends_count": 245, "listed_count": 29, "created_at": "Fri Apr 03 00:46:41 +0000 2009", "favourites_count": 124782, "utc_offset": null, "time_zone": null, "geo_enabled": true, "verified": false, "statuses_count": 43157, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "0E0000", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme6/bg.gif", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme6/bg.gif", "profile_background_tile": true, "profile_image_url": "http://pbs.twimg.com/profile_images/1081151604814802944/CCYQBSw4_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1081151604814802944/CCYQBSw4_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/28462348/1547805519", "profile_link_color": "542403", 
"profile_sidebar_border_color": "000000", "profile_sidebar_fill_color": "010300", "profile_text_color": "4A4730", "profile_use_background_image": true, "has_extended_profile": false, "default_profile": false, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 0, "favorited": false, "retweeted": false, "lang": "en"}
{"created_at": "Tue Jan 29 01:23:20 +0000 2019", "id": 1090057773369315328, "id_str": "1090057773369315328", "full_text": "@hey_ciara @chanslifee", "truncated": false, "display_text_range": [11, 22], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "hey_ciara", "name": "Ciara Johnson", "id": 786360813187571712, "id_str": "786360813187571712", "indices": [0, 10]}, {"screen_name": "chanslifee", "name": "babygirlvampedupp", "id": 2872357053, "id_str": "2872357053", "indices": [11, 22]}], "urls": []}, "metadata": {"iso_language_code": "und", "result_type": "recent"}, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "in_reply_to_status_id": 1082335132818771968, "in_reply_to_status_id_str": "1082335132818771968", "in_reply_to_user_id": 786360813187571712, "in_reply_to_user_id_str": "786360813187571712", "in_reply_to_screen_name": "hey_ciara", "user": {"id": 992933584603369474, "id_str": "992933584603369474", "name": "Shay\ud83d\udc97\ud83d\ude80", "screen_name": "Shaunequa5", "location": "United States", "description": "just livin\u2019 life \ud83e\uddd8\ud83c\udffd\u200d\u2640\ufe0f|IG:shay.leeeeeee_", "url": null, "entities": {"description": {"urls": []}}, "protected": false, "followers_count": 90, "friends_count": 81, "listed_count": 0, "created_at": "Sun May 06 01:06:29 +0000 2018", "favourites_count": 8403, "utc_offset": null, "time_zone": null, "geo_enabled": false, "verified": false, "statuses_count": 1260, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "F5F8FA", "profile_background_image_url": null, "profile_background_image_url_https": null, "profile_background_tile": false, "profile_image_url": "http://pbs.twimg.com/profile_images/1091389797145276416/Bf-rGH0g_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1091389797145276416/Bf-rGH0g_normal.jpg", "profile_banner_url": 
"https://pbs.twimg.com/profile_banners/992933584603369474/1533486045", "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": true, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 1, "favorited": false, "retweeted": false, "lang": "und"}

opened by EralpB 20

Refactor command line and add manually set fields and expansions

Fixes #493, and will also fix #550.

This also refactors the command2.py click command line options.

Needs a good bit of testing to make sure all the old commands still work.

opened by igorbrigadir 19

Client forbidden

Hi, I have configured twarc2

Your keys have been written to /home/aborruso/.config/twarc/config


✨ ✨ ✨  Happy twarcing! ✨ ✨ ✨

But when I run twarc2 conversation 1406852944784412675 I get:

⚡ Client Forbidden

My log

2021-06-21 09:41:00,283 INFO using config /home/aborruso/.config/twarc/config
2021-06-21 09:41:00,283 INFO creating HTTP session headers for app auth.
2021-06-21 09:41:00,283 INFO getting ('https://api.twitter.com/2/tweets/search/recent',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'media.fields': 'duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'max_results': 100, 'query': 'conversation_id:1406852944784412675'}}

If I run twarc filter covid I get no error. I'm using twarc v2.2.0.

Thank you

opened by aborruso 18

CLI: Allow to differentiate between 404 and connection timeout

As I'm struggling to work around the API connection issues with CLI twarc1, I need some way to differentiate unsuccessful completion with "Read timed out" and "404 no users found from the list specified".

My app deals with accounts that last from a few hours to a few weeks, so "No users found" for users/show is a perfectly legitimate outcome that does not need to be retried or handled in any special way.

On the contrary, in timeout scenario I need to retry twarc invocation until it succeeds.

CLI twarc1 returns error code 1 both in case of 404 and connection timeout.

Is there a way either to make twarc treat 404 as a non-error scenario, or to differentiate between 404 and a connection timeout in the CLI utility?
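
One way a wrapper script could separate the two cases is sketched below. It uses standard-library exception types rather than anything from twarc's internals, and the names `classify_failure` and `run_with_retry`, along with the outcome labels, are hypothetical. A 404 is treated as a normal empty result, and only timeouts are retried.

```python
import socket
import time
import urllib.error


def classify_failure(exc):
    """Map an exception to 'not_found', 'timeout' or 'error' (hypothetical labels)."""
    if isinstance(exc, urllib.error.HTTPError) and exc.code == 404:
        return "not_found"  # legitimate outcome: the account no longer exists
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"    # transient: worth retrying
    return "error"


def run_with_retry(fetch, max_attempts=5, delay=1.0, sleep=time.sleep):
    """Call fetch(); return None on 404, retry on timeout, re-raise anything else."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            kind = classify_failure(exc)
            if kind == "not_found":
                return None
            if kind != "timeout" or attempt == max_attempts:
                raise
            sleep(delay)
```

A caller could then map `None` to exit code 0 and an exhausted retry budget to a distinct non-zero code of its choosing.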

opened by antibot4navalny 4

Add `include_ext_is_blue_verified=True`

Where we have this:

https://github.com/DocNow/twarc/blob/cdb03503d3bdb7bf724e9b0edf6009fa0b9acc1a/twarc/client.py#L120

We should also include this new one. The v1.1 API is no longer maintained, but it seems it is getting a few fields that may be useful. So for those that still have access to v1.1, this will be good to have.
enhancement

opened by igorbrigadir 0

Add ability to configure max retries for v1.1 server errors

Given the increasing instability of the v1.1 API, it's helpful to be able to tune the number of retries for server errors.

Adds max_server_error_retries param to the v1.1 client retaining old value of 30 as default value

Pass max_server_error_retries through the rate_limit decorator

Pass max_server_error_retries through uses of Twarc.get() in client.py

Given I haven't consulted with anyone on this change, please let me know what you think and I'm very happy to alter it!

Some questions I have for you all:

Do you think the client initialisation is an appropriate place to expose the parameter?

Should the parameter be exposed in the CLI as well as in the library?

I passed the parameter through all usages of Twarc.get() in client.py, but not in the usages in utils/deletes.py as those are a bit different as they initialise their own client. What are your thoughts on how they should handle the parameter?

Should I pass the parameter through Twarc.post() as well so they are also configurable? Should they possibly be a separate parameter for posts rather than gets as they're for different purposes so people may want different values?

For further context, I'm suggesting this change because we're seeing an increasing number of 500 errors returned from the v1.1 timeline endpoint, which is not surprising given everything that's going on at Twitter. For my usage of the endpoint, I'd much rather skip over that request earlier and move on to my next request - we are finding some requests do reach the 30 retries! I'd imagine that other people may want to retry for longer to increase their chances of getting the data they want.
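
To illustrate the idea of a tunable retry cap, here is a standalone sketch. It is not the code in this pull request: the `ServerError` exception and the `get_with_retries` helper are hypothetical, and only the parameter name `max_server_error_retries` and its default of 30 come from the description above.

```python
import time


class ServerError(Exception):
    """Hypothetical stand-in for an HTTP 5xx response."""


def get_with_retries(fetch, max_server_error_retries=30, delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying on ServerError up to max_server_error_retries times."""
    retries = 0
    while True:
        try:
            return fetch()
        except ServerError:
            retries += 1
            if retries > max_server_error_retries:
                raise  # give up so the caller can move on to its next request
            sleep(delay)
```

With a small cap such as `max_server_error_retries=2`, a persistently failing request is abandoned after three attempts instead of thirty-one.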
enhancement
opened by betsybookwyrm 3

Raise a meaningful error when trying to flatten non-tweets

I'm using Jupyter in VSCode. Several of the twarc functions cause the entire data to be printed to the console, which makes debugging a pain and generally clogs things up.

Example code that does this:

from twarc.client2 import Twarc2
from twarc.expansions import ensure_flattened

twarc = Twarc2(SOME LOGIN INFO)  # placeholder: credentials go here
listOfIds = [SOMEIDS]  # placeholder: tweet IDs
allLikes = []

for tid in listOfIds:
    search = twarc.liking_users(tid, max_results=100)
    for page in search:
        for profile in ensure_flattened(page):
            # Do something with the tweet
            allLikes.append({tid: profile['username']})

This code leads to every returned profile being printed in full to the console. Example of the output here.

https://imgur.com/a/X82V5K9
good first issue
opened by gdhpearson 2

implement home timeline reverse chrono

PR for the feature requested here: https://github.com/DocNow/twarc/issues/639

Unless I am missing something, the changes I made to the __init__ should allow users to access other v2-only endpoints as well: https://github.com/DocNow/twarc/issues/581

opened by ntorba 5
Option to use full archive search by default

I would suppose that users that have Academic Research access would typically use the full archive search. However, it is easy to forget the --archive flag, especially since it is not available for all commands.

It would be preferable to be able to set full archive search as the default option.

I understand that silently changing the endpoint is not a good idea. Instead, a warning could be displayed when recent search is used in a case where full archive search would be available.

opened by Iseratho 2

Releases(v2.13.0)

v2.13.0(Dec 26, 2022)
What's Changed

bump python version to >=3.6 by @igorbrigadir in https://github.com/DocNow/twarc/pull/660

Updates to tutorial doc by @boyd-nguyen in https://github.com/DocNow/twarc/pull/665

Twarc Tutorial by @SamHames in https://github.com/DocNow/twarc/pull/558

Add twarc demo video by @Quiet27 in https://github.com/DocNow/twarc/pull/677

Add missing variants field to media by @igorbrigadir in https://github.com/DocNow/twarc/pull/679

New Contributors

@boyd-nguyen made their first contribution in https://github.com/DocNow/twarc/pull/665

@Quiet27 made their first contribution in https://github.com/DocNow/twarc/pull/677

Full Changelog: https://github.com/DocNow/twarc/compare/v2.12.0...v2.13.0
Source code(tar.gz)
Source code(zip)
v2.12.0(Oct 1, 2022)
What's Changed

Add additional expansions and fields for the new tweet edit related API parameters by @SamHames in https://github.com/DocNow/twarc/pull/657

Full Changelog: https://github.com/DocNow/twarc/compare/v2.11.3...v2.12.0
v2.11.3(Sep 12, 2022)
What's Changed

Add __twarc metadata to list_lookup returned data by @SamHames in https://github.com/DocNow/twarc/pull/654

Full Changelog: https://github.com/DocNow/twarc/compare/v2.11.2...v2.11.3
v2.11.2(Aug 16, 2022)

This is a small bug fix release for an issue with progress bars being used incorrectly when reading data from stdin: #652.
v2.11.1(Jul 18, 2022)
What's Changed

Add sort_order parameter for search api by @mirkolenz in https://github.com/DocNow/twarc/pull/645

Append matching rules from stream when flattening by @igorbrigadir in https://github.com/DocNow/twarc/pull/646

Fix bug where --max-results could not be set with --no-context-annota… by @SamHames in https://github.com/DocNow/twarc/pull/648

New Contributors

@mirkolenz made their first contribution in https://github.com/DocNow/twarc/pull/645

Full Changelog: https://github.com/DocNow/twarc/compare/v2.10.4...v2.11.1
v2.11.0(Jun 30, 2022)
What's Changed

Add sort_order parameter for search api by @mirkolenz in https://github.com/DocNow/twarc/pull/645

Append matching rules from stream when flattening by @igorbrigadir in https://github.com/DocNow/twarc/pull/646

New Contributors

@mirkolenz made their first contribution in https://github.com/DocNow/twarc/pull/645

Full Changelog: https://github.com/DocNow/twarc/compare/v2.10.4...v2.11.0
v2.10.4(Apr 29, 2022)
This release contains two bug fixes:

A fix to the ensure_flattened function that handles valid API responses that contain errors, but no data #627

A fix to the v1.1 user_lookup function that raises a useful error when a string is passed, preventing the lookup of single character usernames. Thanks to @hauselin.
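
The second fix guards against a common Python pitfall: iterating over a string yields its characters, so a bare username would be looked up one letter at a time. A minimal sketch of such a guard follows. The helper name `ensure_id_list` and the choice of `ValueError` are assumptions for illustration, not twarc's actual implementation.

```python
def ensure_id_list(ids):
    """Reject a bare string so it is not iterated character by character."""
    if isinstance(ids, str):
        raise ValueError(
            "expected a list of user IDs or screen names, got a single string"
        )
    return list(ids)
```

Without the check, `list("jack")` would silently become four single-character lookups.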

v2.10.3(Apr 21, 2022)
This release fixes two issues:

reports a meaningful error when the timeline command is called for a user that doesn't exist

correctly handles counts when querying for a user that doesn't exist

v2.10.2(Apr 1, 2022)
What's Changed

Set daemon attribute instead of using setDaemon method that was deprecated in Python 3.10. by @tirkarthi in https://github.com/DocNow/twarc/pull/617

Fix method name typo introduced by refactor by @SamHames in https://github.com/DocNow/twarc/pull/621
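
For reference, the attribute form mentioned in the first item looks like this in plain Python. This is a generic illustration rather than twarc's code, and the `worker` function is made up.

```python
import threading


def worker():
    pass  # placeholder for background work


thread = threading.Thread(target=worker)
thread.daemon = True  # preferred; thread.setDaemon(True) is deprecated since Python 3.10
thread.start()
thread.join()
```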

New Contributors

@tirkarthi made their first contribution in https://github.com/DocNow/twarc/pull/617

Full Changelog: https://github.com/DocNow/twarc/compare/v2.10.1...v2.10.2
v2.10.1(Mar 25, 2022)

This fixes the issue with the searches command not handling an empty file correctly reported in #612.
v2.10.0(Mar 23, 2022)
This release adds support for:

all of the list related endpoints via the twarc2 lists subcommands and associated client methods.

the new quote tweet endpoint via the twarc2 quotes command and associated client methods.

v2.9.5(Mar 4, 2022)

This release adds a workaround for a bug in Twitter's Counts API endpoint which was resulting in the twarc2 counts command stopping prematurely. Thanks to @melaniewalsh and @SamHames for the detective work! See #602 for the story.
v2.9.4(Feb 24, 2022)

This release is functionally identical to v2.9.3, which contained a bugfix for an issue with the streaming API raising an exception and stopping early.

Due to a mistake in the release process v2.9.3 wasn't deployed to PyPI. Rather than edit history to re-release that version, this new release is being made instead.
v2.9.3(Feb 9, 2022)

This version fixes a bug in the twarc2 sample command, that would cause an exception to be raised when trying to log a non-existent tweet ID.
v2.9.2(Jan 31, 2022)
This release includes new functionality to provide a User-Agent HTTP header with all Twitter API requests. For example:

twarc/2.9.2 (Darwin x86_64) CPython/3.10.1
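
A string of that shape can be assembled from Python's standard `platform` module. The sketch below is a guess at how this might be done, not twarc's actual code; `build_user_agent` is a hypothetical name and the version is passed in by hand.

```python
import platform


def build_user_agent(version="2.9.2"):
    """Return a string shaped like 'twarc/2.9.2 (Darwin x86_64) CPython/3.10.1'."""
    return "twarc/{} ({} {}) {}/{}".format(
        version,
        platform.system(),                 # e.g. 'Darwin'
        platform.machine(),                # e.g. 'x86_64'
        platform.python_implementation(),  # e.g. 'CPython'
        platform.python_version(),         # e.g. '3.10.1'
    )
```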
v2.9.1(Jan 31, 2022)

Updated version number.
v2.9.0(Jan 27, 2022)
What's Changed

More badges in the readme by @igorbrigadir in https://github.com/DocNow/twarc/pull/577

Support likes and retweets endpoints. by @SamHames in https://github.com/DocNow/twarc/pull/588

Full Changelog: https://github.com/DocNow/twarc/compare/v2.8.3...v2.9.0
v2.8.3(Jan 6, 2022)

Fixes an issue where twarc was not correctly handling the 1 request/s rate limit for the search/all endpoint. Also includes better handling and error messages of situations when that rate limit is hit.
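
The general technique for respecting a one-request-per-second limit is to enforce a minimum interval between calls. The class below is an illustrative sketch under that assumption, not twarc's implementation; `MinIntervalThrottle` and its injectable `clock` and `sleep` arguments are hypothetical.

```python
import time


class MinIntervalThrottle:
    """Block so that consecutive calls to wait() are at least `interval` seconds apart."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` immediately before each request keeps the caller under the limit without sleeping longer than necessary.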
v2.8.2(Dec 4, 2021)

This release includes some improvements to how the mandatory one second sleep between requests to the search API is handled by some of the twarc2 commands. See #575 for details.
v2.8.1(Nov 15, 2021)

v2.8.1 includes a small update to the twarc search --help message that links to Twitter's Building Queries for Search Tweets to help users figure out what's possible.

https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query

v2.8.0(Oct 23, 2021)

v2.8.0 adds some new controls for shaping the data that is returned from the Twitter API. The default behavior is for twarc to retrieve the fullest representation of a tweet by requesting all tweet, user, media, place and poll fields as well as all available expansions. This is generally good practice with twarc because it means that downstream processing of the collected data can rely on having all this data at its disposal. However, there may be cases where you want to customize the data that comes back. This is not recommended practice but it could be useful in some contexts.

The following options allow you to fine tune the types of data that are requested when using the following sub-commands: search, searches, tweet, sample, hydrate, users, mentions, timeline, timelines, conversation, conversations, and stream. The options include:

  --expansions TEXT               Comma separated list of expansions to
                                  retrieve. Default is all available.
  --tweet-fields TEXT             Comma separated list of tweet fields to
                                  retrieve. Default is all available.
  --user-fields TEXT              Comma separated list of user fields to
                                  retrieve. Default is all available.
  --media-fields TEXT             Comma separated list of media fields to
                                  retrieve. Default is all available.
  --place-fields TEXT             Comma separated list of place fields to
                                  retrieve. Default is all available.
  --poll-fields TEXT              Comma separated list of poll fields to
                                  retrieve. Default is all available.

These correspond to the API Fields and Expansions.
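
As a rough illustration of that correspondence, the helper below joins the chosen fields into comma-separated query parameters. It is hypothetical and not part of twarc; the parameter names (`expansions`, `tweet.fields`, `user.fields`, `media.fields`, `place.fields`, `poll.fields`) are those of the Twitter API v2. Unlike twarc, which requests everything by default, this sketch simply omits any group that is left unset.

```python
def build_field_params(expansions=None, tweet_fields=None, user_fields=None,
                       media_fields=None, place_fields=None, poll_fields=None):
    """Return a dict of v2 query parameters, omitting any group left unset."""
    groups = {
        "expansions": expansions,
        "tweet.fields": tweet_fields,
        "user.fields": user_fields,
        "media.fields": media_fields,
        "place.fields": place_fields,
        "poll.fields": poll_fields,
    }
    # Each value is sent to the API as a single comma-separated string.
    return {name: ",".join(values) for name, values in groups.items() if values}
```

For example, `build_field_params(tweet_fields=["id", "text"])` yields a single `tweet.fields` entry with the value `id,text`.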

There is also --minimal-fields which requests just a minimal subset of data, and --no-context-annotations that does not include context-annotations, which allows more tweets to be fetched at one time (500 instead of 100). This also applies to the sub-commands: search, searches, tweet, sample, hydrate, users, mentions, timeline, timelines, conversation, conversations, stream.

  --minimal-fields                By default twarc gets all available data.
                                  This option requests the minimal retrievable
                                  amount of data - only IDs and object
                                  references are retrieved. Setting this makes
                                  --max-results 500 the default. NOTE: This
                                  argument is mutually exclusive with
                                  arguments: [--counts-only, --poll-fields,
                                  --media-fields, --expansions, --no-context-
                                  annotations, --place-fields, --user-fields,
                                  --tweet-fields].


v2.7.3(Oct 10, 2021)

A bugfix release to apply black formatting rules.
v2.7.2(Oct 10, 2021)

A bugfix release to apply black formatting rules.
v2.7.1(Oct 10, 2021)
Adds the start-time/since-id parameters from the timeline CLI command to the timelines CLI command.

Ensure that sample command only writes JSON on stdout.


v2.7.0(Oct 4, 2021)

v2.7.0 adds a new places command to search for places and their identifiers, which can be used in search and stream queries. Although the command still relies on the v1.1 API, the 1.1/geo/search.json endpoint makes these place identifiers available when searching by name, geo coordinates, or IP address.

Usage: twarc2 places [OPTIONS] VALUE [OUTFILE]

  Search for places by place name, geo coordinates or ip address.

Options:
  --type [name|geo|ip]            How to search for places (defaults to name)
  --granularity [neighborhood|city|admin|country]
                                  What type of places to search for (defaults
                                  to neighborhood)
  --max-results INTEGER           Maximum results to return
  --json                          Output raw JSON response
  --help                          Show this message and exit.

There is a corresponding twarc.client2.Twarc2.geo() method which you can use to do the lookup yourself from Python.


v2.6.0(Sep 27, 2021)
Adds the searches CLI command for running multiple searches from an input file

Makes progress reporting more accurate for commands that consume files one line at a time (users, conversations, hydrate etc)


v2.5.0(Sep 22, 2021)

This release includes new functionality for working with Twitter's new Batch Compliance API, which allows you to upload large datasets of Tweet or user IDs and retrieve their compliance status, so you can determine what data requires action to bring your datasets into compliance.

Usage: twarc2 compliance-job [OPTIONS] COMMAND [ARGS]...

  Create, retrieve and list batch compliance jobs for Tweets and Users.

Options:
  --help  Show this message and exit.

Commands:
  create    Create a new compliance job and upload tweet IDs.
  download  Download the compliance job with the specified ID.
  get       Returns status and download information about the job ID.
  list      Returns a list of compliance jobs by job type and status.


v2.4.3(Aug 18, 2021)

v2.4.2(Aug 17, 2021)
This release ensures that the timeline, timelines, conversation and conversations commands default to a --start-time of 2006-03-21 (the first day of tweets) when instructed to use the /tweets/search/all endpoint behind the scenes. For example, when doing:

twarc2 timeline --use-search jack

or:

twarc2 conversation --archive 21

Previously it was defaulting to the last 30 days (which is an unfortunate default set by the /tweets/search/all endpoint). Many thanks to Darren Halpin and @SamHames for identifying and fixing the issue!
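
The defaulting rule described in this release can be sketched as a small function. This is an illustration of the behavior only, not twarc's code; `resolve_start_time` and `FIRST_TWEET_DAY` are hypothetical names, and the date is the one given above.

```python
from datetime import datetime, timezone

# The first day of tweets, as stated in the release note.
FIRST_TWEET_DAY = datetime(2006, 3, 21, tzinfo=timezone.utc)


def resolve_start_time(start_time=None, full_archive=False):
    """Default to 2006-03-21 for full-archive requests when no start time is given."""
    if start_time is None and full_archive:
        return FIRST_TWEET_DAY
    return start_time
```

An explicitly supplied start time is always passed through unchanged, so the default only applies when the user gives none.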
v2.4.1(Aug 11, 2021)

This release includes support for requesting the new alt_text field for media from Twitter's v2 API:

https://twittercommunity.com/t/media-alt-text-field-now-available-in-twitter-api-v2/157939

Owner

Documenting the Now

PipeCat - A command line Youtube music player written in python.

🎈 `st` is a CLI to quickly kick-off your new Streamlit project

Standalone script written in Python 3 for generating Reverse Shell one liner snippets and handles the communication between target and client using custom Netcat binaries

OneDriveExplorer - A command line and GUI based application for reconstructing the folder structure of OneDrive from the UserCid.dat file

A VIM-inspired filemanager for the console

Synchronization tool for external devices which does not support time stamps, e.g. over MTP.

Doing set operations on files considered as sets of lines

Tablicate - Python library for easy table creation and output to terminal

A command line tool made in Python for the popular rhythm game

A 3D engine powered by ASCII art

A beginner reverse shell tool in Python.

A simple command for converting and processing data from your clipboard.

🌍 Harness the power of whatsmydns from the command-line.

A command line tool to query source code from your current Python env

py-image-dedup is a tool to sort out or remove duplicates within a photo library

A terminal client for connecting to hack.chat servers

moviepy-cli: Command line interface for MoviePy.

A very simple and lightweight ToDo app using python that can be used from the command line

A command line tool to hide and reveal information inside images (works for both PNGs and JPGs)

Python3 library for multimedia functions at the command terminal