Pelican plugin that adds site search capability

Overview

Search: A Plugin for Pelican

Build Status PyPI Version

This plugin generates an index for searching content on a Pelican-powered site.

Why would you want this?

Static sites are, well, staticโ€ฆ and thus usually donโ€™t have an application server component that could be used to power site search functionality. Rather than give up control (and privacy) to third-party search engine corporations, this plugin adds elegant and self-hosted site search capability to your site. Last but not least, searches are really fast. ๐Ÿš€

Installation

This plugin uses Stork to generate a search index. Follow the Stork installation instructions to install this required command-line tool and ensure it is available within /usr/local/bin/ or another $PATH-accessible location of your choosing. For example, Stork can be installed on macOS via:

export STORKVERSION="v1.2.1"
wget -O /usr/local/bin/stork https://files.stork-search.net/releases/$STORKVERSION/stork-macos-10-15
chmod +x /usr/local/bin/stork

Confirm that Stork is properly installed via:

stork --help

Once Stork has been successfully installed and tested, this plugin can be installed via:

python -m pip install pelican-search

Settings

This pluginโ€™s behavior can be customized via Pelican settings. Those settings, and their default values, follow below.

SEARCH_MODE = "output"

In addition to plain-text files, Stork can recognize and index HTML and Markdown-formatted content. The default behavior of this plugin is to index generated HTML files, since Stork is good at extracting content from tags, scripts, and styles. But that mode may require a slight theme modification that isnโ€™t necessary when indexing Markdown source (see SEARCH_HTML_SELECTOR setting below). That said, indexing Markdown means that markup information may not be removed from the indexed content and will thus be visible in the search preview results. With that caveat aside, if you want to index Markdown source content files instead of the generated HTML output, you can use: SEARCH_MODE = "source"

SEARCH_HTML_SELECTOR = "main"

By default, Stork looks for

[โ€ฆ]
tags to determine where your main content is located. If such tags do not already exist in your themeโ€™s template files, you can either (1) add
tags or (2) change the HTML selector that Stork should look for.

To use the default main selector, in each of your themeโ€™s relevant template files, wrap the content you want to index with

tags. For example:

article.html:

<main>
{{ article.content }}
main>

page.html:

<main>
{{ page.content }}
main>

For more information, refer to Storkโ€™s documentation on HTML tag selection.

Static Assets

There are two options for serving the necessary JavaScript, WebAssembly, and CSS static assets:

  1. Use a content delivery network (CDN) to serve Storkโ€™s static assets
  2. Self-host the Stork static assets

The first option is easier to set up. The second option is provided for folks who prefer to self-host everything. After you have decided which option you prefer, follow the relevant sectionโ€™s instructions below.

Static Assets โ€” Option 1: Use CDN

CSS

Add the Stork CSS before the closing tag in your themeโ€™s base template file, such as base.html:

">
<link rel="stylesheet" href="https://files.stork-search.net/basic.css" />

If your theme supports dark mode, you may want to also add Storkโ€™s dark CSS file:

">
<link rel="stylesheet" media="screen and (prefers-color-scheme: dark)" href="https://files.stork-search.net/dark.css">

JavaScript

Add the following script tags to your themeโ€™s base template, just before your closing tag, which will load the most recent Stork module along with the matching WASM binary:

">
<script src="https://files.stork-search.net/releases/v1.2.1/stork.js">script>
<script>
    stork.register("sitesearch", "{{ SITEURL }}/search-index.st")
script>

Static Assets โ€” Option 2: Self-Host

Download the Stork JavaScript, WebAssembly, and CSS files and put them in your themeโ€™s respective static asset directories:

export STORKVERSION="v1.2.1"
cd $YOUR-THEME-DIR
mkdir -p static/{js,css}
wget -O static/js/stork.js https://files.stork-search.net/releases/$STORKVERSION/stork.js
wget -O static/js/stork.wasm https://files.stork-search.net/releases/$STORKVERSION/stork.wasm
wget -O static/css/stork.css https://files.stork-search.net/basic.css
wget -O static/css/stork-dark.css https://files.stork-search.net/dark.css

CSS

Add the Stork CSS before the closing tag in your themeโ€™s base template file, such as base.html:

">
<link rel="stylesheet" href="{{ SITEURL }}/{{ THEME_STATIC_DIR }}/css/stork.css">

If your theme supports dark mode, you may want to also add Storkโ€™s dark CSS file:

">
<link rel="stylesheet" media="screen and (prefers-color-scheme: dark)" href="{{ SITEURL }}/{{ THEME_STATIC_DIR }}/css/stork-dark.css">

JavaScript & WebAssembly

Add the following script tags to your themeโ€™s base template file, such as base.html, just before the closing tag:

">
<script src="{{ SITEURL }}/{{ THEME_STATIC_DIR }}/js/stork.js">script>
<script>
    stork.initialize("{{ SITEURL }}/{{ THEME_STATIC_DIR }}/js/stork.wasm")
    stork.downloadIndex("sitesearch", "{{ SITEURL }}/search-index.st")
    stork.attach("sitesearch")
script>

Search Input Form

Decide in which place(s) on your site you want to put your search field, such as your index.html template file. Then add the search field to the template:

">
Search: <input data-stork="sitesearch" />
<div data-stork="sitesearch-output">div>

For more information regarding this topic, see the Stork search interface documentation.

Deployment

Ensure your production web server serves the WebAssembly file with the application/wasm MIME type. For folks using older versions of Nginx, that might look like the following:

โ€ฆ
http {
    โ€ฆ
    include             mime.types;
    # Types not included in older Nginx versions:
    types {
        application/wasm                                 wasm;
    }
    โ€ฆ
}

For other self-hosting considerations, see the Stork self-hosting documentation.

Contributing

Contributions are welcome and much appreciated. Every little bit helps. You can contribute by improving the documentation, adding missing features, and fixing bugs. You can also help out by reviewing and commenting on existing issues.

To start contributing to this plugin, review the Contributing to Pelican documentation, beginning with the Contributing Code section.

Comments
  • unexpected keyword argument 'capture_output'

    unexpected keyword argument 'capture_output'

    Installed the module according the installation documentation.

    But get the following error msg: CRITICAL TypeError: init() got an unexpected keyword argument 'capture_output'

    Stork --help, results in the expected output.

    I've added the plugin to the configuration, as well as the settings:

    PLUGINS = [ .....
      'post_stats', 'related_posts', 'search', 'seo', 'simple_footnotes', 'share_post', 'sitemap',
      ....
    ]
    
    SEARCH_MODE = "output"
    SEARCH_HTML_SELECTOR = "main"
    

    Can this be investigated?

    opened by radoeka 8
  • expected newline, found an identifier

    expected newline, found an identifier

    • [X] I have read the Filing Issues and subsequent โ€œHow to Get Helpโ€ sections of the documentation.
    • [X] I have searched the issues (including closed ones) and believe that this is not a duplicate.

    • OS version and name: FreeBSD 13.0-RELEASE-p4
    • Python version:3.8.12
    • Pelican version: 4.7.1
    • Version of this plugin: 1.0.0

    Steps up to this point:

    • I installed Stork as needed for this plugin via Rustup. stork --version reports 1.3.0.
    • I set up a venv with Pelican and generated some posts using the default theme. It all generated okay.
    • I installed pelican-search via pip in the venv
    • I set edited the theme to add the CDN code and add the search box.
    • I added SEARCH_MODE = "output" and SEARCH_HTML_SELECTOR = "main" to the pelicanconf.py file.
    • I called pelican to regenerate the site.

    After that I got this error.

    CRITICAL Exception: Search plugin         __init__.py:550
                        reported Error: expected                        
                        newline, found an identifier at                 
                        line 438 column 11   
    

    Help! I do not understand Python at all, so I don't have any insight into the issue beyond what I have here. :( I am happy to provide additional information if requested (and given a little instruction on how).

    bug 
    opened by oiseaumanifesto 6
  • How to translate stork?

    How to translate stork?

    • [x] I have searched the issues (including closed ones) and believe that this is not a duplicate.
    • [x] I have searched the documentation and believe that my question is not covered.

    Issue

    How to translate search? My site is in portuguese and the search in english:

    image question 
    opened by paulocoutinhox 3
  • Wrong URL when inside an article

    Wrong URL when inside an article

    • [x] I have read the Filing Issues and subsequent โ€œHow to Get Helpโ€ sections of the documentation.
    • [x] I have searched the issues (including closed ones) and believe that this is not a duplicate.

    Issue

    Hi,

    When enable search, the stork url is not going to correct place when im inside article:

    image

    Example on screenshot above: http://localhost:8000/2022/06/28/2022/06/28/apocalipse-de-jesus-cristo-fase-3-o-trono-e-os-seres-viventes.html

    duplicated: 2022/06/28/2022/06/28
    

    If im on home, it is working.

    Thanks.

    bug 
    opened by paulocoutinhox 2
  • demo website ?

    demo website ?

    • [x] I have searched the issues (including closed ones) and believe that this is not a duplicate.
    • [x] I have searched the documentation and believe that my question is not covered.
    • [ ] I am willing to lend a hand to help implement this feature. (well, ATM I don't have time, but in a few week why not)

    Feature Request

    Hello ! It could be cool to have a demo site for this plugin ! :D

    enhancement 
    opened by ebanDev 2
  • search.toml

    search.toml

    I have installed the plugin, using Flex theme. I cannot see any search.toml, or instructions on how to generate it.

    • [x] I have searched the issues (including closed ones) and believe that this is not a duplicate.
    • [x] I have searched the documentation and believe that my question is not covered.

    Issue

    Trying to generate my theme, with the search enabled, and I get errors regarding no search.toml file - it is similar to https://github.com/pelican-plugins/search/issues/3 and it appears all I need is a search.toml file. However, I can't see any definitive / suggested configurations for pelican.

    If you could point me in the correct direction that would be superb!

    Thanks

    question 
    opened by Bobspadger 0
  • Escape double quotes in page titles in stork TOML file

    Escape double quotes in page titles in stork TOML file

    Pull Request Checklist

    Resolves: https://github.com/pelican-plugins/search/issues/3#issuecomment-1186287268

    • [x] Conformed to code style guidelines by running appropriate linting tools
    • [x] Updated documentation for changed code

    Description

    If a page title contains a double quote ("), the double quote is rendered into the stork toml file verbatim, creating a syntax error and causing stork to fail.

    This PR escapes double quotes in titles (AFAICT this is the only place where they should appear) with \", fixing the syntax errors.

    opened by s3lph 0
  • Layout issues due to Stork progress bar

    Layout issues due to Stork progress bar

    • [X ] I have searched the issues (including closed ones) and believe that this is not a duplicate.
    • [ X] I have searched the documentation and believe that my question is not covered.

    I have a couple layout issues with the Stork progress bar.

    First it appears in the middle of the article (seemingly where the results box would end).

    Second, more bothersome, blank space appears on the right side of my page, which impacts usability, especially on mobiles when swiping.

    Narrowed down the issue (deleting elements in DevTools until I found the culprit) to this div:

    <div class="stork-progress" style="width: 100%; opacity: 0;"></div>
    

    which is not mine - it seemingly gets inserted by the plugin at build time, and I have not managed to override the inline style via external CSS.

    When width: 0% my layout issue disappears (because the progress bar does not have any space, ie is hidden).

    Where/how can I change width to 0%, or deactivate the progress bar entirely (ie not have this div inserted)? Perhaps better for a future version to define those inline styles in the external CSS?

    question 
    opened by ndeville 0
  • hint to include plugins in pelicanconf.py

    hint to include plugins in pelicanconf.py

    Please include a hint to include plugins=['search'] in the README.MD

    • [x] I have searched the issues (including closed ones) and believe that this is not a duplicate.

    Issue

    opened by kika21 0
  • fix(windows): convert back-slashes to forward-slashes for Windows

    fix(windows): convert back-slashes to forward-slashes for Windows

    1. Summary

    Pelican Stork search doesnโ€™t work correctly on my Windows if OUTPUT_PATH setting is custom.

    After fixing I can successfully use Pelican Stork search:

    404 page search demo

    2. MCVE files

    You can see this MCVE configuration on the KiraPelicanPluginsSitemapStork branch of my demo repository for testing Pelican.

    All files except those listed below are the result of running the command pelican-quickstart.

    1. pelicanconf.py

      """MCVE."""
      
      # [INFO] Default settings
      AUTHOR = 'Sasha Chernykh'
      SITENAME = 'SashaPelicanDebugging'
      SITEURL = 'https://kristinita.netlify.app'
      
      PATH = 'content'
      
      TIMEZONE = 'Europe/Moscow'
      
      DEFAULT_LANG = 'en'
      
      ARTICLE_PATHS = [
          'Articles'
      ]
      
      MARKDOWN = {
          'output_format': 'html5',
      }
      
      
      # [INFO] Settings for this issue
      PLUGINS = [
          'search'
      ]
      
      SEARCH_HTML_SELECTOR = 'body'
      
      OUTPUT_PATH = 'output/'
      
      
    2. content/Articles/KiraArticle.md:

      Slug: KiraArticle
      Title: KiraArticle
      Date: 2020-09-24 18:57:33
      
      Kira Goddess!
      
      
    3. .circleci/config.yml:

      version: 2.1
      
      jobs:
        build:
          machine:
            image: ubuntu-2204:current
          steps:
          - checkout
          - run: pyenv global 3.10.5
          - run: pip install pelican markdown
          - run: pip install pelican-search
          # [INFO] Non-interactive Rust installation on Ubuntu
          # https://stackoverflow.com/a/57251636/5951529
          - run: curl https://sh.rustup.rs -sSf | sh -s -- -y
          - run: cargo install stork-search --locked
          - run: stork --version
          - run: pelican content -s pelicanconf.py --fatal warnings --debug
          - run: ls output
          - run: cat output/search.toml
      
      

    3. Behavior before change

    If custom OUTPUT_PATH on Windows, Pelican Stork search generate invalid path slashes for the value of base_directory setting of search.toml file:

    [input]
    base_directory = "D:\SashaDemoRepositories\SashaPelicanDebugging\output"
    html_selector = "body"
    
    [[input.files]]
    path = "KiraArticle.html"
    url = "/KiraArticle.html"
    title = "KiraArticle"
    

    Incorrect TOML

    If I run:

    pelican content -s pelicanconf.py --fatal warnings --ignore-cache --debug
    

    I get an error:

    Exception: Search plugin reported Error: Couldn't read the configuration file: Cannot parse config as TOML. Stork recieved error: `invalid escape character in string: `S` at line 2 column 22`
    

    Full output:

    D:\SashaDemoRepositories\SashaPelicanDebugging>pelican content -s pelicanconf.py --fatal warnings --ignore-cache --debug
    [11:23:23] DEBUG    Pelican version: 4.8.0                                                                                                                                                                                                                        __init__.py:531
               DEBUG    Python version: 3.10.6                                                                                                                                                                                                                        __init__.py:532
               DEBUG    Adding current directory to system path                                                                                                                                                                                                        __init__.py:66
               DEBUG    Finding namespace plugins                                                                                                                                                                                                                        _utils.py:81
               DEBUG    Namespace plugins found:                                                                                                                                                                                                                         _utils.py:84
                        pelican.plugins.search
                        pelican.plugins.sitemap
               DEBUG    Loading plugin `search`                                                                                                                                                                                                                          _utils.py:90
               DEBUG    Registering plugin `pelican.plugins.search`                                                                                                                                                                                                    __init__.py:73
               DEBUG    Found generator: ArticlesGenerator (internal)                                                                                                                                                                                                 __init__.py:209
               DEBUG    Found generator: PagesGenerator (internal)                                                                                                                                                                                                    __init__.py:209
               DEBUG    Found generator: SearchSettingsGenerator (pelican.plugins.search.search)                                                                                                                                                                      __init__.py:209
               DEBUG    Found generator: StaticGenerator (internal)                                                                                                                                                                                                   __init__.py:209
               DEBUG    Template list: ['!simple/archives.html', '!simple/article.html', '!simple/author.html', '!simple/authors.html', '!simple/base.html', '!simple/categories.html', '!simple/category.html', '!simple/gosquared.html', '!simple/index.html',     generators.py:70
                        '!simple/page.html', '!simple/pagination.html', '!simple/period_archives.html', '!simple/tag.html', '!simple/tags.html', '!simple/translations.html', '!theme/analytics.html', '!theme/archives.html', '!theme/article.html',
                        '!theme/article_infos.html', '!theme/author.html', '!theme/authors.html', '!theme/base.html', '!theme/categories.html', '!theme/category.html', '!theme/comments.html', '!theme/disqus_script.html', '!theme/github.html',
                        '!theme/index.html', '!theme/page.html', '!theme/period_archives.html', '!theme/tag.html', '!theme/taglist.html', '!theme/tags.html', '!theme/translations.html', '!theme/twitter.html', 'analytics.html', 'archives.html', 'article.html',
                        'article_infos.html', 'author.html', 'authors.html', 'base.html', 'categories.html', 'category.html', 'comments.html', 'disqus_script.html', 'github.html', 'gosquared.html', 'index.html', 'page.html', 'pagination.html',
                        'period_archives.html', 'tag.html', 'taglist.html', 'tags.html', 'translations.html', 'twitter.html']
               DEBUG    Read file Articles/KiraArticle.md -> Article                                                                                                                                                                                                   readers.py:547
               DEBUG    Signal article_generator_preread.send(ArticlesGenerator)                                                                                                                                                                                       readers.py:560
               DEBUG    Successfully imported extension module "markdown.extensions.meta".                                                                                                                                                                                core.py:163
               DEBUG    Successfully loaded extension "markdown.extensions.meta.MetaExtension".                                                                                                                                                                           core.py:126
               DEBUG    Signal article_generator_context.send(ArticlesGenerator, <metadata>)                                                                                                                                                                           readers.py:627 [11:23:24] DEBUG    Read file images/.keep -> Static                                                                                                                                                                                                               readers.py:547
               DEBUG    Signal static_generator_preread.send(StaticGenerator)                                                                                                                                                                                          readers.py:560
               DEBUG    Signal static_generator_context.send(StaticGenerator, <metadata>)                                                                                                                                                                              readers.py:627
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/feeds/all.atom.xml                                                                                                                                                               writers.py:163
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/feeds/articles.atom.xml                                                                                                                                                          writers.py:163
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/feeds/sasha-chernykh.atom.xml                                                                                                                                                    writers.py:163
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/feeds/sasha-chernykh.rss.xml                                                                                                                                                     writers.py:163
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/feeds/all-en.atom.xml                                                                                                                                                            writers.py:163
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/KiraArticle.html                                                                                                                                                                 writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/index.html                                                                                                                                                                       writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/tags.html                                                                                                                                                                        writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/categories.html                                                                                                                                                                  writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/authors.html                                                                                                                                                                     writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/archives.html                                                                                                                                                                    writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/category/articles.html                                                                                                                                                           writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/author/sasha-chernykh.html                                                                                                                                                       writers.py:212
               CRITICAL Exception: Search plugin reported Error: Couldn't read the configuration file: Cannot parse config as TOML. Stork recieved error: `invalid escape character in string: `S` at line 2 column 22`                                               __init__.py:566
    
    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Traceback (most recent call last) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚ C:\Python310\lib\site-packages\pelican\plugins\search\search.py:38 in build_search_index         โ”‚
    โ”‚                                                                                                  โ”‚
    โ”‚    35 โ”‚   โ”‚   if not which("stork"):                                                             โ”‚
    โ”‚    36 โ”‚   โ”‚   โ”‚   raise Exception("Stork must be installed and available on $PATH.")             โ”‚
    โ”‚    37 โ”‚   โ”‚   try:                                                                               โ”‚
    โ”‚ >  38 โ”‚   โ”‚   โ”‚   output = subprocess.run(                                                       โ”‚
    โ”‚    39 โ”‚   โ”‚   โ”‚   โ”‚   [                                                                          โ”‚
    โ”‚    40 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   "stork",                                                               โ”‚
    โ”‚    41 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   "build",                                                               โ”‚
    โ”‚                                                                                                  โ”‚
    โ”‚ C:\Python310\lib\subprocess.py:524 in run                                                        โ”‚
    โ”‚                                                                                                  โ”‚
    โ”‚    521 โ”‚   โ”‚   โ”‚   raise                                                                         โ”‚
    โ”‚    522 โ”‚   โ”‚   retcode = process.poll()                                                          โ”‚
    โ”‚    523 โ”‚   โ”‚   if check and retcode:                                                             โ”‚
    โ”‚ >  524 โ”‚   โ”‚   โ”‚   raise CalledProcessError(retcode, process.args,                               โ”‚
    โ”‚    525 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚    output=stdout, stderr=stderr)                        โ”‚
    โ”‚    526 โ”‚   return CompletedProcess(process.args, retcode, stdout, stderr)                        โ”‚
    โ”‚    527                                                                                           โ”‚
    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
    CalledProcessError: Command '['stork', 'build', '--input', 'D:\\SashaDemoRepositories\\SashaPelicanDebugging\\output\\search.toml', '--output', 'D:\\SashaDemoRepositories\\SashaPelicanDebugging\\output/search-index.st']' returned non-zero exit status 1.
    
    During handling of the above exception, another exception occurred:
    
    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Traceback (most recent call last) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚ C:\Python310\lib\site-packages\pelican\__init__.py:562 in main                                   โ”‚
    โ”‚                                                                                                  โ”‚
    โ”‚   559 โ”‚   โ”‚   โ”‚   watcher = FileSystemWatcher(args.settings, Readers, settings)                  โ”‚
    โ”‚   560 โ”‚   โ”‚   โ”‚   watcher.check()                                                                โ”‚
    โ”‚   561 โ”‚   โ”‚   โ”‚   with console.status("Generatingโ€ฆ"):                                          โ”‚
    โ”‚ > 562 โ”‚   โ”‚   โ”‚   โ”‚   pelican.run()                                                              โ”‚
    โ”‚   563 โ”‚   except KeyboardInterrupt:                                                              โ”‚
    โ”‚   564 โ”‚   โ”‚   logger.warning('Keyboard interrupt received. Exiting.')                            โ”‚
    โ”‚   565 โ”‚   except Exception as e:                                                                 โ”‚
    โ”‚                                                                                                  โ”‚
    โ”‚ C:\Python310\lib\site-packages\pelican\__init__.py:127 in run                                    โ”‚
    โ”‚                                                                                                  โ”‚
    โ”‚   124 โ”‚   โ”‚                                                                                      โ”‚
    โ”‚   125 โ”‚   โ”‚   for p in generators:                                                               โ”‚
    โ”‚   126 โ”‚   โ”‚   โ”‚   if hasattr(p, 'generate_output'):                                              โ”‚
    โ”‚ > 127 โ”‚   โ”‚   โ”‚   โ”‚   p.generate_output(writer)                                                  โ”‚
    โ”‚   128 โ”‚   โ”‚                                                                                      โ”‚
    โ”‚   129 โ”‚   โ”‚   signals.finalized.send(self)                                                       โ”‚
    โ”‚   130                                                                                            โ”‚
    โ”‚                                                                                                  โ”‚
    โ”‚ C:\Python310\lib\site-packages\pelican\plugins\search\search.py:113 in generate_output           โ”‚
    โ”‚                                                                                                  โ”‚
    โ”‚   110 โ”‚   โ”‚   โ”‚   fd.write(search_settings)                                                      โ”‚
    โ”‚   111 โ”‚   โ”‚                                                                                      โ”‚
    โ”‚   112 โ”‚   โ”‚   # Build the search index                                                           โ”‚
    โ”‚ > 113 โ”‚   โ”‚   build_log = self.build_search_index(search_settings_path)                          โ”‚
    โ”‚   114 โ”‚   โ”‚   build_log = "".join(["Search plugin reported ", build_log])                        โ”‚
    โ”‚   115 โ”‚   โ”‚   logger.error(build_log) if "error" in build_log else logger.debug(build_log)       โ”‚
    โ”‚   116                                                                                            โ”‚
    โ”‚                                                                                                  โ”‚
    โ”‚ C:\Python310\lib\site-packages\pelican\plugins\search\search.py:52 in build_search_index         โ”‚
    โ”‚                                                                                                  โ”‚
    โ”‚    49 โ”‚   โ”‚   โ”‚   โ”‚   check=True,                                                                โ”‚
    โ”‚    50 โ”‚   โ”‚   โ”‚   )                                                                              โ”‚
    โ”‚    51 โ”‚   โ”‚   except subprocess.CalledProcessError as e:                                         โ”‚
    โ”‚ >  52 โ”‚   โ”‚   โ”‚   raise Exception("".join(["Search plugin reported ", e.stdout, e.stderr]))      โ”‚
    โ”‚    53 โ”‚   โ”‚                                                                                      โ”‚
    โ”‚    54 โ”‚   โ”‚   return output.stdout                                                               โ”‚
    โ”‚    55                                                                                            โ”‚
    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
    Exception: Search plugin reported Error: Couldn't read the configuration file: Cannot parse config as TOML. Stork recieved error: `invalid escape character in string: `S` at line 2 column 22`
    

    4. Change

    I applied os.sep to convert back-slashes to forward-slashes. I change the line of search.py:

    - self.output_path = output_path
    + self.output_path = output_path.replace(os.sep, '/')
    

    5. Behavior after change

    search.toml on Windows after my changes:

    - base_directory = "D:\SashaDemoRepositories\SashaPelicanDebugging\output"
    + base_directory = "D:/SashaDemoRepositories/SashaPelicanDebugging/output"
    

    This is the correct path for Windows. No errors in output.

    D:\SashaDemoRepositories\SashaPelicanDebugging>pelican content -s pelicanconf.py --fatal warnings --ignore-cache --debug
    [11:30:32] DEBUG    Pelican version: 4.8.0                                                                                                                                                                                                                        __init__.py:531
               DEBUG    Python version: 3.10.6                                                                                                                                                                                                                        __init__.py:532
               DEBUG    Adding current directory to system path                                                                                                                                                                                                        __init__.py:66
               DEBUG    Finding namespace plugins                                                                                                                                                                                                                        _utils.py:81
               DEBUG    Namespace plugins found:                                                                                                                                                                                                                         _utils.py:84
                        pelican.plugins.search
                        pelican.plugins.sitemap
               DEBUG    Loading plugin `search`                                                                                                                                                                                                                          _utils.py:90
               DEBUG    Registering plugin `pelican.plugins.search`                                                                                                                                                                                                    __init__.py:73
               DEBUG    Found generator: ArticlesGenerator (internal)                                                                                                                                                                                                 __init__.py:209
               DEBUG    Found generator: PagesGenerator (internal)                                                                                                                                                                                                    __init__.py:209
               DEBUG    Found generator: SearchSettingsGenerator (pelican.plugins.search.search)                                                                                                                                                                      __init__.py:209
               DEBUG    Found generator: StaticGenerator (internal)                                                                                                                                                                                                   __init__.py:209
               DEBUG    Template list: ['!simple/archives.html', '!simple/article.html', '!simple/author.html', '!simple/authors.html', '!simple/base.html', '!simple/categories.html', '!simple/category.html', '!simple/gosquared.html', '!simple/index.html',     generators.py:70
                        '!simple/page.html', '!simple/pagination.html', '!simple/period_archives.html', '!simple/tag.html', '!simple/tags.html', '!simple/translations.html', '!theme/analytics.html', '!theme/archives.html', '!theme/article.html',
                        '!theme/article_infos.html', '!theme/author.html', '!theme/authors.html', '!theme/base.html', '!theme/categories.html', '!theme/category.html', '!theme/comments.html', '!theme/disqus_script.html', '!theme/github.html',
                        '!theme/index.html', '!theme/page.html', '!theme/period_archives.html', '!theme/tag.html', '!theme/taglist.html', '!theme/tags.html', '!theme/translations.html', '!theme/twitter.html', 'analytics.html', 'archives.html', 'article.html',
                        'article_infos.html', 'author.html', 'authors.html', 'base.html', 'categories.html', 'category.html', 'comments.html', 'disqus_script.html', 'github.html', 'gosquared.html', 'index.html', 'page.html', 'pagination.html',
                        'period_archives.html', 'tag.html', 'taglist.html', 'tags.html', 'translations.html', 'twitter.html']
               DEBUG    Read file Articles/KiraArticle.md -> Article                                                                                                                                                                                                   readers.py:547
               DEBUG    Signal article_generator_preread.send(ArticlesGenerator)                                                                                                                                                                                       readers.py:560
               DEBUG    Successfully imported extension module "markdown.extensions.meta".                                                                                                                                                                                core.py:163
               DEBUG    Successfully loaded extension "markdown.extensions.meta.MetaExtension".                                                                                                                                                                           core.py:126
               DEBUG    Signal article_generator_context.send(ArticlesGenerator, <metadata>)                                                                                                                                                                           readers.py:627
               DEBUG    Read file images/.keep -> Static                                                                                                                                                                                                               readers.py:547
               DEBUG    Signal static_generator_preread.send(StaticGenerator)                                                                                                                                                                                          readers.py:560
               DEBUG    Signal static_generator_context.send(StaticGenerator, <metadata>)                                                                                                                                                                              readers.py:627
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/feeds/all.atom.xml                                                                                                                                                               writers.py:163
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/feeds/articles.atom.xml                                                                                                                                                          writers.py:163
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/feeds/sasha-chernykh.atom.xml                                                                                                                                                    writers.py:163
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/feeds/sasha-chernykh.rss.xml                                                                                                                                                     writers.py:163
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/feeds/all-en.atom.xml                                                                                                                                                            writers.py:163
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/KiraArticle.html                                                                                                                                                                 writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/index.html                                                                                                                                                                       writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/tags.html                                                                                                                                                                        writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/categories.html                                                                                                                                                                  writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/authors.html                                                                                                                                                                     writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/archives.html                                                                                                                                                                    writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/category/articles.html                                                                                                                                                           writers.py:212
               INFO     Writing D:/SashaDemoRepositories/SashaPelicanDebugging/output/author/sasha-chernykh.html                                                                                                                                                       writers.py:212
               DEBUG    Search plugin reported                                                                                                                                                                                                                          search.py:118
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\css\fonts.css to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\css\fonts.css                                                                                utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\css\main.css to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\css\main.css                                                                                  utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\css\pygment.css to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\css\pygment.css                                                                            utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\css\reset.css to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\css\reset.css                                                                                utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\css\typogrify.css to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\css\typogrify.css                                                                        utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\css\wide.css to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\css\wide.css                                                                                  utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\fonts\font.css to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\fonts\font.css                                                                              utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\fonts\Yanone_Kaffeesatz_400.eot to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\fonts\Yanone_Kaffeesatz_400.eot                                            utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\fonts\Yanone_Kaffeesatz_400.svg to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\fonts\Yanone_Kaffeesatz_400.svg                                            utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\fonts\Yanone_Kaffeesatz_400.ttf to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\fonts\Yanone_Kaffeesatz_400.ttf                                            utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\fonts\Yanone_Kaffeesatz_400.woff to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\fonts\Yanone_Kaffeesatz_400.woff                                          utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\fonts\Yanone_Kaffeesatz_400.woff2 to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\fonts\Yanone_Kaffeesatz_400.woff2                                        utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\aboutme.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\aboutme.png                                                          utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\bitbucket.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\bitbucket.png                                                      utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\delicious.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\delicious.png                                                      utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\facebook.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\facebook.png                                                        utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\github.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\github.png                                                            utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\gitorious.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\gitorious.png                                                      utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\gittip.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\gittip.png                                                            utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\google-groups.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\google-groups.png                                              utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\google-plus.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\google-plus.png                                                  utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\hackernews.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\hackernews.png                                                    utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\lastfm.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\lastfm.png                                                            utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\linkedin.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\linkedin.png                                                        utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\reddit.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\reddit.png                                                            utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\rss.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\rss.png                                                                  utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\slideshare.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\slideshare.png                                                    utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\speakerdeck.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\speakerdeck.png                                                  utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\stackoverflow.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\stackoverflow.png                                              utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\twitter.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\twitter.png                                                          utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\vimeo.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\vimeo.png                                                              utils.py:332
               INFO     Copying C:\Python310\lib\site-packages\pelican\themes\notmyidea\static\images\icons\youtube.png to D:\SashaDemoRepositories\SashaPelicanDebugging\output\theme\images\icons\youtube.png                                                          utils.py:332
               INFO     Copying D:\SashaDemoRepositories\SashaPelicanDebugging\content\images\.keep to D:\SashaDemoRepositories\SashaPelicanDebugging\output\images\.keep                                                                                                utils.py:302
               INFO     Copying D:\SashaDemoRepositories\SashaPelicanDebugging\content\images\.keep to images/.keep                                                                                                                                                 generators.py:906 Done: Processed 1 article, 0 drafts, 0 hidden articles, 0 pages, 0 hidden pages and 0 draft pages in 0.56 seconds
    

    6. UNIX possible consequences

    My change shouldnโ€™t affect *nix operating systems. I check it on Circle CI.

    1. Circle CI build with configuration from the item 2.3 of this issue, generated search.toml:

      [input]
      base_directory = "/home/circleci/project/output"
      html_selector = "body"
      
      [[input.files]]
      path = "KiraArticle.html"
      url = "/KiraArticle.html"
      title = "KiraArticle"
      
    2. I change in my config.yml:

      - - run: pip install pelican-search
      + - run: pip install git+https://github.com/Kristinita/[email protected]
      

    Circle CI build, the same search.toml.

    I didnโ€™t see anything changed in Ubuntu build after my change.

    7. Reproducing problem

    I canโ€™t reproduce my problem on free remote CI services. Unfortunately, installing Stork on Windows isnโ€™t quick. To install Stork, a Windows user must install Rust and compile Stork on his own machine. I can to compile Stork on my machine, but I was getting bugs on Circle CI and AppVeyor CI.

    If you know how to compile Stork for Windows on free remote CI services, please, tell me. See also my issue on the Stork issue tracker.

    8. Environment

    1. Operating system:

      1. Local โ€” Microsoft Windows [Version 10.0.19041.1415]
      2. Circle CI โ€” Ubuntu 22.04 LTS (Jammy Jellyfish)
    2. Python โ€” 3.10.5, 3.10.6

    3. Pelican โ€” 4.8.0

    4. Stork โ€” 1.5.0

    5. pelican-search โ€” 1.0.1

    Thanks.

    opened by Kristinita 0
This tool can be used to extract information from any website

WEB-INFO- This tool can be used to extract information from any website Install Termux and run the command --- $ apt-get update $ apt-get upgrade $ pk

1 Oct 24, 2021
A scalable frontier for web crawlers

Frontera Overview Frontera is a web crawling framework consisting of crawl frontier, and distribution/scaling primitives, allowing to build a large sc

Scrapinghub 1.2k Jan 02, 2023
A list of Python Bots used to extract data from several websites

A list of Python Bots used to extract data from several websites. Data extraction is for products on e-commerce (ecommerce) websites. Data fetched i

Sahil Ladhani 1 Jan 14, 2022
่‡ชๅŠจๅฎŒๆˆๆฏๆ—ฅไฝ“ๆธฉไธŠๆŠฅ๏ผˆGithub Actions๏ผ‰

ไฝ“ๆธฉไธŠๆŠฅๅŠฉๆ‰‹ ็ฎ€ไป‹ ๆฏๅคฉ 10:30 GMT+8 ่‡ชๅŠจๅฎŒๆˆไฝ“ๆธฉไธŠๆŠฅ๏ผŒๅฆ‚ๆƒณไฟฎๆ”นๅฎšๆ—ถ่ฟ่กŒ็š„ๆ—ถ้—ด๏ผŒๅฏไฟฎๆ”น .github/workflows/SduHealthReport.yml ไธญ schedule ๅฑžๆ€งใ€‚ ๅฆ‚ๆžœๅฝ“ๆ—ฅๆœ‰ๅผ‚ๅธธ๏ผŒ่ฏทๆ‰‹ๅŠจๅœจๅฐ็จ‹ๅบ็ซฏ/PC ็ซฏๅกซๅ†™๏ผ

Teng Zhang 23 Sep 15, 2022
Nekopoi scraper using python3

Features Scrap from url Todo [+] Search by genre [+] Search by query [+] Scrap from homepage Example # Hentai Scraper from nekopoi import Hent

MhankBarBar 9 Apr 06, 2022
Web Scraping COVID 19 Meta Portal with Python

Web-Scraping-COVID-19-Meta-Portal-with-Python - Requests API and Beautiful Soup to scrape real-time COVID statistics from worldometer website and perform data cleaning and visual analysis in Jupyter

Aarif Munwar Jahan 1 Jan 04, 2022
A web service for scanning media hosted by a Matrix media repository

Matrix Content Scanner A web service for scanning media hosted by a Matrix media repository Installation TODO Development In a virtual environment wit

Brendan Abolivier 5 Dec 01, 2022
Examine.com supplement research scraper!

ExamineScraper Examine.com supplement research scraper! Why I want to be able to search pages for a specific term. For example, I want to be able to s

Tyler 15 Dec 06, 2022
Simple tool to scrape and download cross country ski timings and results from live.skidor.com

LiveSkidorDownload Simple tool to scrape and download cross country ski timings

0 Jan 07, 2022
An Automated udemy coupons scraper which scrapes coupons and autopost the result in blogspot post

Autoscraper-n-blogger An Automated udemy coupons scraper which scrapes coupons and autopost the result in blogspot post and notifies via Telegram bot

GOKUL A.P 13 Dec 21, 2022
A Python Oriented tool to Scrap WhatsApp Group Link using Google Dork it Scraps Whatsapp Group Links From Google Results And Gives Working Links.

WaGpScraper A Python Oriented tool to Scrap WhatsApp Group Link using Google Dork it Scraps Whatsapp Group Links From Google Results And Gives Working

Muhammed Rizad 27 Dec 18, 2022
ๆฒณๅ—ๅทฅไธšๅคงๅญฆ ๅฎŒ็พŽๆ กๅ›ญ ่‡ชๅŠจๆ กๅค–ๆ‰“ๅก

HAUT-checkin ๆฒณๅ—ๅทฅไธšๅคงๅญฆ่‡ชๅŠจๆ กๅค–ๆ‰“ๅก ็”ฑไบŽgithub actionsๅญ˜ๅœจๆ˜Žๆ˜พๅปถ่ฟŸ๏ผŒๅปบ่ฎฎ็›ดๆŽฅไฝฟ็”จ่…พ่ฎฏไบ‘ๅ‡ฝๆ•ฐ ็‰น็‚น ๅคšไบบๆ‰“ๅก ไฝฟ็”จ็ฎ€ๅ•๏ผŒไป…้œ€่ดฆๅทๅฏ†็ ไปฅๅŠ็”จไบŽๅพฎไฟกๆŽจ้€็š„uid ่‡ชๅŠจ่Žทๅ–ไธŠไธ€ๆฌกๆ‰“ๅกไฟกๆฏ็”จไบŽๆ‰“ๅก ๅ‘ๆ‰€ๆœ‰ๆˆๅ‘˜ๅพฎไฟกๅ•็‹ฌๆŽจ้€ๆ‰“ๅก็Šถๆ€ ๅฎŒ็พŽๆ กๅ›ญๆœๅŠกๅ™จ็นๅฟ™ๆ—ถ้€ ๆˆๆ‰“ๅกๅคฑ่ดฅไผš่‡ชๅŠจ้‡ๆ–ฐๆ‰“ๅก

36 Oct 27, 2022
Shopee Scraper - A web scraper in python that extract sales, price, avaliable stock, location and more of a given seller in Brazil

Shopee Scraper A web scraper in python that extract sales, price, avaliable stock, location and more of a given seller in Brazil. The project was crea

Paulo DaRosa 5 Nov 29, 2022
Meme-videos - Scrapes memes and turn them into a video compilations

Meme Videos Scrapes memes from reddit using praw and request and then converts t

Partho 12 Oct 28, 2022
๐Ÿฅซ The simple, fast, and modern web scraping library

About gazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installed with zero dependencies. I

Max Humber 692 Dec 22, 2022
Consulta de CPF e CNPJ na Receita Federal com Web-Scraping

Repositรณrio contendo scripts Python que realizam a consulta de CPF e CNPJ diretamente no site da Receita Federal.

Josuรฉ Campos 5 Nov 29, 2021
Web Scraping images using Selenium and Python

Web Scraping images using Selenium and Python A propos de ce document This is a markdown document about Web scraping images and videos using Selenium

Nafaa BOUGRAINE 3 Jul 01, 2022
A crawler of doubamovie

่ฑ†็“ฃ็”ตๅฝฑ A crawler of doubamovie ไธ€ไธชๅฐๅฐ็š„ๅ…ฅ้—จ็บงscrapyๆก†ๆžถ็š„ๅบ”็”จ๏ผŒ้€‰ๅ–่ฑ†็“ฃ็”ตๅฝฑๅฏนๆŽ’่กŒๆฆœๅ‰1000็š„็”ตๅฝฑๆ•ฐๆฎ่ฟ›่กŒ็ˆฌๅ–ใ€‚ spider.py start_requestsๆ–นๆณ•ไธบscrapy็š„ๆ–นๆณ•๏ผŒๆˆ‘ไปฌๅฏนๅฎƒ่ฟ›่กŒ้‡ๅ†™ใ€‚ def start_requests(self):

Cats without dried fish 1 Oct 05, 2021
FilmMikirAPI - A simple rest-api which is used for scrapping on the Kincir website using the Python and Flask package

FilmMikirAPI - A simple rest-api which is used for scrapping on the Kincir website using the Python and Flask package

UserGhost411 1 Nov 17, 2022
A web Scraper for CSrankings.com that scrapes University and Faculty list for a particular country

A look into what we're building Demo.mp4 Prerequisites Python 3 Node v16+ Steps to run Create a virtual environment. Activate the virtual environment.

2 Jun 06, 2022