Word clouds and word-frequency charts, including frequency counts for a chosen set of keywords
2022-07-18 21:24:00 【Listen to my call, rookie evolution】
# 1. Read the text and segment it into words with jieba.cut()
import jieba
import random

# The later scripts read this same file as GBK, so the encoding is made explicit here
with open('1.txt', 'r', encoding='gbk') as f:
    report = f.read()
words = jieba.cut(report)
# 2. Keep the tokens that are at least 2 characters long and appear in the keyword list
# (keywords are rendered here in English; use the terms as they occur in your text)
keywords = ['autonomous', 'Rule of virtue', 'The rule of law', 'Three treatments', 'data', 'Rural revitalization', 'a farmer', 'rural', 'Agriculture', 'Governance', 'rural', 'urban and rural', 'Numbers', 'Governance system']
report_words = []
for word in words:
    if len(word) >= 2 and word in keywords:
        report_words.append(word)
# Additionally append these three keywords once for every occurrence in the raw text,
# which increases their weight in the cloud
for keyword in ('Rural revitalization', 'Governance', 'Governance system'):
    report_words.extend([keyword] * report.count(keyword))
random.shuffle(report_words)
# print(report_words)
# 3. Optionally print the most frequent words
# from collections import Counter
# result = Counter(report_words).most_common(50)  # the 50 most frequent words
# print(result)
# 4. Draw the word cloud
from wordcloud import WordCloud

content = ' '.join(report_words)  # join the list into one space-separated string
wc = WordCloud(
    background_color='pink',
    font_path=r"C:\Windows\Fonts\msyh.ttc"  # Microsoft YaHei, a font with CJK glyphs
).generate(content)
image_produce = wc.to_image()
# image_produce.show()
# wc.to_file('Clouds of words.png')  # export as a PNG image (relative path)
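The loops above weight a few keywords by appending each one to the list once per occurrence and then shuffling. A possible alternative, sketched below, is to build a keyword-to-count mapping directly. `keyword_frequencies` is a hypothetical helper name, not part of jieba or wordcloud, and the sample text is made up. Note that `str.count` counts non-overlapping substring matches, so a short keyword is also counted inside any longer keyword that contains it.

```python
def keyword_frequencies(text, keywords):
    """Count non-overlapping occurrences of each keyword in text.

    Hypothetical helper for illustration; keywords that never occur
    are left out so that every remaining weight is positive.
    """
    freqs = {}
    for kw in keywords:
        n = text.count(kw)
        if n > 0:
            freqs[kw] = n
    return freqs


# Made-up sample text standing in for the contents of 1.txt.
sample = "governance system, rural governance, governance system, data"
freqs = keyword_frequencies(sample, ["governance", "governance system", "data", "law"])
print(freqs)
```

A dictionary like this can be passed to `WordCloud(...).generate_from_frequencies(freqs)`, which sizes each word by its count without the need to repeat it in a joined string.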
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt

with open("1.txt", "r", encoding='gbk') as f:  # open the file
    texts = f.read()  # read the whole file
print(texts)
# Remove words that should not appear in the cloud
texts = texts.replace('the people', '').replace('socialist', '').replace('social', '').replace('Country', '')
# Word cloud: keep only tokens longer than one character
content = []
words = jieba.lcut(texts)
for word in words:
    if len(word) > 1:
        content.append(word)
contents = " ".join(content)
wordcloud = WordCloud(
    background_color='pink',
    font_path=r"C:\Windows\Fonts\msyh.ttc"
).generate(contents)
image_produce = wordcloud.to_image()
# wordcloud.to_file("new_wordcloud.jpg")
image_produce.show()  # open the image in the default viewer
plt.imshow(wordcloud)
plt.axis("off")
plt.show()  # display the figure when run as a script
# The 20 keywords with the highest TF-IDF weight
import jieba.analyse

content_str = contents
# topK is the number of keywords to return
print(" ".join(jieba.analyse.extract_tags(content_str, topK=20, withWeight=False)))
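Removing unwanted words with chained `replace` calls before segmentation works, but it also splices together the characters on either side of each removed word, which can create tokens that never appeared in the text. A minimal sketch of filtering after segmentation instead follows. `filter_tokens` and the sample token list are hypothetical; in the script above the tokens would come from `jieba.lcut(texts)`.

```python
def filter_tokens(tokens, stop_words, min_len=2):
    """Keep tokens at least min_len characters long that are not stop words."""
    stop = set(stop_words)  # a set gives fast membership tests
    return [t for t in tokens if len(t) >= min_len and t not in stop]


# Made-up tokens standing in for the output of jieba.lcut(texts).
tokens = ["rural", "a", "social", "governance", "country", "data", "rural"]
kept = filter_tokens(tokens, stop_words=["social", "country"])
print(" ".join(kept))
```

The joined string can then be handed to `WordCloud(...).generate(...)` exactly as before. `WordCloud` also accepts a `stopwords` set of its own, which may be enough when no other processing of the token list is needed.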
import jieba

# 2.txt is the text file whose word frequencies are counted
with open('2.txt', 'r', encoding='gbk') as f:
    content = f.read()
# content = content.replace('the people', '').replace('socialist', '').replace('social', '').replace('Country', '').replace('China', '').replace('In our country', '')
# words = jieba.lcut(content)
# counts = {}
# for word in words:
#     if len(word) == 1:  # skip single-character tokens
#         continue
#     counts[word] = counts.get(word, 0) + 1  # dict.get supplies 0 for unseen words
# hist = list(counts.items())  # list of (word, count) pairs
# hist.sort(key=lambda x: x[1], reverse=True)
# words = []
# counts = []
# for word, count in hist[:20]:  # the 20 most frequent words (fewer if the text is short)
#     words.append(word)
#     counts.append(count)
# print(counts)
# print(words)
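The commented-out counting code above builds a dictionary by hand and then sorts its items. The standard library's `collections.Counter` covers both steps; the sketch below assumes the text has already been segmented, and `top_words` and the sample list are illustrative names only.

```python
from collections import Counter


def top_words(tokens, n=20, min_len=2):
    """Return the n most frequent tokens of at least min_len characters."""
    counts = Counter(t for t in tokens if len(t) >= min_len)
    # most_common returns at most n pairs, so short inputs are safe.
    return counts.most_common(n)


# Made-up tokens standing in for the output of jieba.lcut(content).
tokens = ["data", "rural", "data", "x", "governance", "data", "rural"]
for word, count in top_words(tokens, n=2):
    print(word, count)
```

Because `most_common` never returns more pairs than exist, this avoids the index error that a fixed `range(20)` loop would raise on a text with fewer than 20 distinct words.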
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Keywords to count (rendered here in English; use the terms as they occur in your text)
words = ['autonomous', 'Rule of virtue', 'The rule of law', 'Three treatments', 'data', 'Rural revitalization', 'a farmer', 'rural', 'Agriculture', 'Governance', 'rural', 'urban and rural', 'Numbers', 'Governance system']
counts = []
for word in words:
    counts.append(content.count(word))  # non-overlapping occurrences in the raw text
print(counts)
x_data = words
y_data = counts
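# This string holds the keywords themselves, not their counts,
# so the cloud does not reflect the frequencies computed above
contents = " ".join(words)
wordcloud = WordCloud(
    background_color='pink',
    font_path=r"C:\Windows\Fonts\msyh.ttc"
).generate(contents)
image_produce = wordcloud.to_image()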
# Optional bar chart of the counts
# import matplotlib.pyplot as plt
# plt.rcParams["font.sans-serif"] = ['SimHei']  # a font that can render Chinese labels
# plt.rcParams["axes.unicode_minus"] = False  # keep minus signs readable with that font
# for i in range(len(x_data)):
#     plt.bar(x_data[i], y_data[i])
# plt.title("Word frequency display")
# plt.xlabel("Word frequency")
# plt.ylabel("Number")
# plt.show()
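The bar chart above draws the keywords in list order. One way to make it easier to read, sketched here under the assumption that matplotlib is installed, is to sort the pairs by count first. `sorted_counts` and `plot_counts` are hypothetical helper names, and the sample words and numbers are made up.

```python
def sorted_counts(words, counts):
    """Pair each word with its count and sort by count, largest first."""
    pairs = zip(words, counts)
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def plot_counts(pairs, title="Word frequency"):
    """Draw one bar per (word, count) pair; needs matplotlib installed."""
    import matplotlib.pyplot as plt  # imported lazily so sorting works without it

    labels = [w for w, _ in pairs]
    values = [c for _, c in pairs]
    plt.bar(labels, values)
    plt.title(title)
    plt.xlabel("Keyword")
    plt.ylabel("Count")
    plt.xticks(rotation=45, ha="right")  # tilt labels so long keywords do not overlap
    plt.tight_layout()
    plt.show()


# Made-up data standing in for x_data and y_data.
pairs = sorted_counts(["data", "rural", "governance"], [3, 7, 5])
print(pairs)
```

Calling `plot_counts(sorted_counts(x_data, y_data))` would then draw the chart. For Chinese labels, the `font.sans-serif` setting from the commented-out code is still needed before plotting.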