List of content farm sites like g.penzai.com.

Overview

Content Farm Site List

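Google's Chinese-language search results contain a considerable share of content-farm entries, such as 「小 X 知识网」 ("Xiao X Knowledge Net") and 「小 X 百科网」 ("Xiao X Encyclopedia Net"). These links typically 302-redirect to the farm's main site. The pages are auto-generated, stuffed with keywords, and mixed with scraped content, so they have no readability or reference value.

Worse still, a single site of this kind may have thousands of clone domains indexed by Google, which seriously degrades the search experience. See the community feedback from early October 2021: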

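  1. GitHub: How do I block "小搭百科网"?
  2. V2EX: I keep running into content-farm results such as 小 X 知识网 when searching Google. What can I do?
  3. V2EX: Google's Chinese search results are terrible. Has Google given up on Chinese search?
  4. HOSTLOC: This scraper site network is really something
  5. HOSTLOC: Which big shot is behind the 小*知识网 site network?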

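Matching result titles with regular expressions cannot block these sites completely, so this list was compiled to help users filter their search results.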
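A minimal Python sketch of why a domain list catches what a title regex can miss. The title pattern, the sample domains, and both function names are hypothetical and serve only as illustration; real entries come from the list files in this repository.

```python
import re
from urllib.parse import urlsplit

# Hypothetical title pattern of the kind a user might write by hand.
TITLE_PATTERN = re.compile(r"小.知识网|小.百科网")

# Hypothetical blocklist entries (placeholders, not real farm domains).
BLOCKED_DOMAINS = {"farm.example", "clone.example"}


def blocked_by_title(title: str) -> bool:
    """Return True if the result title matches the hand-written pattern."""
    return TITLE_PATTERN.search(title) is not None


def blocked_by_domain(url: str) -> bool:
    """Return True if the URL's host, or any parent domain, is listed."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Check the full host first, then each parent domain in turn.
    return any(".".join(parts[i:]) in BLOCKED_DOMAINS for i in range(len(parts)))
```

A farm page whose title has been changed slips past `blocked_by_title`, while `blocked_by_domain` still catches it as long as its domain is on the list.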

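Because 小搭百科网, the site at the center of this incident, shut itself down after the uproar, the list will go on to track and include other similar content farms.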

Usage

uBlacklist

Install uBlacklist

Chrome Web Store / Firefox Add-ons / App Store (for macOS and iOS)

Then open the Options page, click Add a subscription, and enter either of the following entries:

  • Name: content-farm-list
  • URL: https://raw.githubusercontent.com/wdmpa/content-farm-list/main/uBlacklist.txt

  • Name: content-farm-list
  • URL: https://wdmpa.org/content-farm-list/uBlacklist.txt

Click the Add button.

Google Hit Hider

http://www.jeffersonscher.com/gm/google-hit-hider/

Install

Greasy Fork / OpenUserJS.org

Manage lists

http://www.jeffersonscher.com/gm/google-hit-hider/manage-lists.php

Subscription files

File                                               Description
uBlacklist.txt                                     Combined uBlacklist rule set
Surge.txt                                          Combined Surge rule set
uBlacklist/spam/g.penzai.com.txt                   小搭百科网 domains, for uBlacklist
Surge/spam/g.penzai.com.txt                        小搭百科网 domains, for Surge
uBlacklist/machine-translated/stackoverflow.txt    Machine-translated StackOverflow domains, for uBlacklist
Surge/machine-translated/stackoverflow.txt         Machine-translated StackOverflow domains, for Surge

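Search engine settings

Because results that match a domain in the list are removed, a results page may be left with too few entries to browse comfortably. It is recommended to sign in and set the search engine to show 100 results per page.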

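What can we do?

I. Submit a PR to add domains

  1. Export the domain list from your local uBlacklist extension
  2. Try long-tail keywords in a search engine to uncover farm domains that do not yet rank highly

Add new category files under the domains directory, following the existing structure. Match the format of the entries already in a file; new entries can go anywhere in it. (Either fork this repository, edit, and push, or edit directly on the web page.)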

File                                            Description
domains/spam/g.penzai.com.txt                   小搭百科网 domains
domains/machine-translated/stackoverflow.txt    Machine-translated StackOverflow domains
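Before adding exported entries to a domains file, it may help to reduce them to bare, de-duplicated hostnames. The sketch below assumes the export consists of uBlacklist match-pattern rules such as `*://*.farm.example/*`, and that the domains files hold one hostname per line; check the existing files for the actual format. The function name and sample domains are hypothetical.

```python
import re

# Matches rules like "*://*.farm.example/*" or "https://farm.example/*"
# and captures the hostname without any leading "*." wildcard.
MATCH_PATTERN = re.compile(r"^(?:\*|https?)://(?:\*\.)?([a-z0-9.-]+)/")


def rules_to_domains(rules_text: str) -> list[str]:
    """Extract sorted, unique hostnames from exported match-pattern rules."""
    domains = set()
    for line in rules_text.splitlines():
        rule = line.strip().lower()
        # Skip blank lines and comments.
        if not rule or rule.startswith("#"):
            continue
        match = MATCH_PATTERN.match(rule)
        # Regex rules and other non-pattern lines are ignored.
        if match:
            domains.add(match.group(1))
    return sorted(domains)
```

Lines that are not plain match patterns (for example regular-expression rules) are skipped rather than guessed at, so they need to be reviewed by hand.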

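After submission, a script automatically updates the subscription files.

II. Report

Report the abuse to the cloud service provider that the farm uses.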
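The repository's own update script is not shown here. As a rough sketch of what such a conversion could look like, the Python below turns a list of domains into uBlacklist match patterns and Surge `DOMAIN-SUFFIX` rules. Both function names are hypothetical, and the exact line formats used by this repository's subscription files are an assumption.

```python
from typing import Iterable


def to_ublacklist(domains: Iterable[str]) -> str:
    """Render domains as match patterns covering the domain and its subdomains."""
    unique = sorted(set(domains))
    return "".join(f"*://*.{domain}/*\n" for domain in unique)


def to_surge(domains: Iterable[str]) -> str:
    """Render domains as Surge DOMAIN-SUFFIX rules (assumed rule type)."""
    unique = sorted(set(domains))
    return "".join(f"DOMAIN-SUFFIX,{domain}\n" for domain in unique)
```

Sorting and de-duplicating before rendering keeps the generated files stable between runs, which makes diffs in pull requests easier to read.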

Owner
WDMPA
World Developer Mood Protection Association