List of content farm sites like g.penzai.com.

Overview

内容农场网站清单

Google 中文搜索结果包含了相当一部分的内容农场式条目,比如「小 X 知识网」「小 X 百科网」。此种链接常会 302 重定向其主站,页面内容为自动生成,大量堆叠关键字,揉杂一些爬取到的内容,完全不具可读性和参考价值。

尤为过分的是,该类网站可能有成千上万个分身域名被 Google 收录,严重影响搜索体验。详见 2021 年 10 初的社区反馈:

  1. Github: 如何屏蔽“小搭百科网”?
  2. V2EX: 请问在 google 搜索时,频繁遇到小 X 知识网等内容农场式结果,怎么办?
  3. V2EX: google 搜中文太毒了吧,是不是已经放弃中文搜索了
  4. HOSTLOC: 这采集站群太强了吧
  5. HOSTLOC: 小*知识网站群是哪位大佬的杰作

使用正则匹配标题的方式不能完全屏蔽,所以为方便广大网友过滤搜索结果,特整理此清单。

由于此次事件主角「小搭百科网」在造成影响后主动关站,所以接下来也将关注、收录其他的类似内容农场站。

使用方式

uBlacklist

安装 uBlacklist

Chrome Web Store / Firefox Add-ons / App Store (for macOS and iOS)

后进入 Option 菜单,点击 Add a subscription,输入如下内容:

  • Name: content-farm-list
  • URL: https://raw.githubusercontent.com/wdmpa/content-farm-list/main/uBlacklist.txt

  • Name: content-farm-list
  • URL: https://wdmpa.org/content-farm-list/uBlacklist.txt

单击 'Add' 按钮。

Google Hit Hider

http://www.jeffersonscher.com/gm/google-hit-hider/

Install

Grease Fork / OpenUserJS.org

Manage lists

http://www.jeffersonscher.com/gm/google-hit-hider/manage-lists.php

订阅说明

文件 说明
uBlacklist.txt uBlacklist 规则集合
Surge.txt Surge 规则集合
uBlacklist/spam/g.penzai.com.txt uBlacklist 专用小搭百科网域名集合
Surge/spam/g.penzai.com.txt Surge 专用小搭百科网域名集合
uBlacklist/machine-translated/stackoverflow.txt uBlacklist 专用机翻 StackOverflow 域名集合
Surge/machine-translated/stackoverflow.txt Surge 专用机翻 StackOverflow 域名集合

设置搜索引擎

因与清单中域名匹配的结果会被移除,所以搜索引擎的结果页剩余条目太少,不便浏览,建议登录后设置搜索结果显示为每页面 100 条。

我们能做什么?

一、发 PR 添加域名

  1. 从本地插件 uBlacklist 中导出域名列表
  2. 在搜索引擎中尝试长尾关键词,以发现更多目前权重尚低的农场域名

按结构在 domains 目录中添加新的分类集合文件。参考文件中已有内容的格式,在任意位置添加即可。(Fork 本仓库后编辑再 Push,或在页面中编辑均可。)

文件 说明
domains/spam/g.penzai.com.txt 小搭百科网域名集合
domains/machine-translated/stackoverflow.txt 机翻 StackOverflow 域名集合

提交后,脚本会自动更新订阅文件中的内容。

二、举报

向其使用的云服务提供商举报其滥用行为。

Owner
WDMPA
World Developer Mood Protection Association
WDMPA
This repository contains the PyTorch implementation of the paper STaCK: Sentence Ordering with Temporal Commonsense Knowledge appearing at EMNLP 2021.

STaCK: Sentence Ordering with Temporal Commonsense Knowledge This repository contains the pytorch implementation of the paper STaCK: Sentence Ordering

Deep Cognition and Language Research (DeCLaRe) Lab 23 Dec 16, 2022
Tensor-based approaches for fMRI classification

tensor-fmri Using tensor-based approaches to classify fMRI data from StarPLUS. Citation If you use any code in this repository, please cite the follow

4 Sep 07, 2022
An official source code for "Augmentation-Free Self-Supervised Learning on Graphs"

Augmentation-Free Self-Supervised Learning on Graphs An official source code for Augmentation-Free Self-Supervised Learning on Graphs paper, accepted

Namkyeong Lee 59 Dec 01, 2022
Simulation of moving particles under microscopic imaging

Simulation of moving particles under microscopic imaging Install scipy numpy scikit-image tiffile Run python simulation.py Read result https://imagej

Zehao Wang 2 Dec 14, 2021
Implementation of CSRL from the AAAI2022 paper: Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning

CSRL Implementation of CSRL from the AAAI2022 paper: Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning Python: 3

4 Apr 14, 2022
Official implementation of TMANet.

Temporal Memory Attention for Video Semantic Segmentation, arxiv Introduction We propose a Temporal Memory Attention Network (TMANet) to adaptively in

wanghao 94 Dec 02, 2022
All public open-source implementations of convnets benchmarks

convnet-benchmarks Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below. Machine: 6-cor

Soumith Chintala 2.7k Dec 30, 2022
[CVPR 2022 Oral] Rethinking Minimal Sufficient Representation in Contrastive Learning

Rethinking Minimal Sufficient Representation in Contrastive Learning PyTorch implementation of Rethinking Minimal Sufficient Representation in Contras

36 Nov 23, 2022
OpenDelta - An Open-Source Framework for Paramter Efficient Tuning.

OpenDelta is a toolkit for parameter efficient methods (we dub it as delta tuning), by which users could flexibly assign (or add) a small amount parameters to update while keeping the most paramters

THUNLP 386 Dec 26, 2022
The fastest way to visualize GradCAM with your Keras models.

VizGradCAM VizGradCam is the fastest way to visualize GradCAM in Keras models. GradCAM helps with providing visual explainability of trained models an

58 Nov 19, 2022
This repository contains the code for: RerrFact model for SciVer shared task

RerrFact This repository contains the code for: RerrFact model for SciVer shared task. Setup for Inference 1. Download SciFact database Download the S

Ashish Rana 1 May 22, 2022
Dataset VSD4K includes 6 popular categories: game, sport, dance, vlog, interview and city.

CaFM-pytorch ICCV ACCEPT Introduction of dataset VSD4K Our dataset VSD4K includes 6 popular categories: game, sport, dance, vlog, interview and city.

96 Jul 05, 2022
Generalized Matrix Means for Semi-Supervised Learning with Multilayer Graphs

Generalized Matrix Means for Semi-Supervised Learning with Multilayer Graphs MATLAB implementation of the paper: P. Mercado, F. Tudisco, and M. Hein,

Pedro Mercado 6 May 26, 2022
NAS-HPO-Bench-II is the first benchmark dataset for joint optimization of CNN and training HPs.

NAS-HPO-Bench-II API Overview NAS-HPO-Bench-II is the first benchmark dataset for joint optimization of CNN and training HPs. It helps a fair and low-

yoichi hirose 8 Nov 21, 2022
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)

OCTIS : Optimizing and Comparing Topic Models is Simple! OCTIS (Optimizing and Comparing Topic models Is Simple) aims at training, analyzing and compa

MIND 478 Jan 01, 2023
Food recognition model using convolutional neural network & computer vision

Food recognition model using convolutional neural network & computer vision. The goal is to match or beat the DeepFood Research Paper

Hemanth Chandran 1 Jan 13, 2022
Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation

Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation The reference code of Improving Factual Completeness and C

46 Dec 15, 2022
Machine Translation Implement By Bi-GRU And Transformer

Seq2Seq Translation Implement By Bidirectional GRU And Transformer In Pytorch Before You Run The Code You should download the data through the link be

He Wang 2 Oct 27, 2021
DeconvNet : Learning Deconvolution Network for Semantic Segmentation

DeconvNet: Learning Deconvolution Network for Semantic Segmentation Created by Hyeonwoo Noh, Seunghoon Hong and Bohyung Han at POSTECH Acknowledgement

Hyeonwoo Noh 325 Oct 20, 2022
A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

4.9k Jan 03, 2023