Crawler Basics - session and cookies
2022-07-19 07:15:00 【W_ chuanqi】
Personal profile
About the author: Hello everyone, I am W_chuanqi, a programming enthusiast.
Personal home page: W_chaunqi
A line I would like to share: "If you are in the mire and your heart is in the mire too, then everything you see is mud; if you are in the mire but your heart longs for the Kunpeng, you can see ninety thousand miles of heaven and earth."

Chapter 1 Crawler Basics
1.4 Session and Cookie
When browsing websites, we often need to log in, and some pages can only be accessed after logging in. Once logged in, we can visit the site many times, but sometimes we have to log in again after a while. Other websites log us in automatically as soon as the browser is opened, and the login stays valid for a long time. What is going on here? The answer involves Session and Cookie, and this section explains how they work.
1. Static web pages and dynamic web pages
Before we begin, we need to understand the concepts of static and dynamic web pages. We reuse the example code from the section on the structure of a web page:
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>This is a Demo</title>
  </head>
  <body>
    <div id="container">
      <div class="wrapper">
        <h2 class="title">Hello World</h2>
        <p class="text">Hello, this is a paragraph.</p>
      </div>
    </div>
  </body>
</html>
This is the most basic HTML code. We save it as an .html file, put the file on a host with a fixed public IP address, and install a web server such as Apache or Nginx on that host. The host can then act as a server, and others can see the example page by visiting it. This is the simplest kind of website.
The content of this page is written in HTML: the text, images, and other content are all specified by the HTML code. A page like this is called a static web page. Static web pages load quickly and are simple to write, but they also have significant drawbacks, such as poor maintainability and the inability to display content flexibly based on the URL. For example, if we want to pass a name parameter in the URL of a static page and have it displayed in the page, that cannot be done.
This is why dynamic web pages appeared. A dynamic page can parse the parameters in the URL, query a database, and render different page content accordingly, which makes it very flexible. Almost all the websites we see today are dynamic. They are no longer simple HTML pages; they may be written in JSP, PHP, Python, or other languages, and they are far more powerful than static pages. In addition, dynamic websites can implement user login and registration.
Back to the question at the beginning: many pages can only be viewed after logging in. Logically, after we enter a user name and password and log in, we must have obtained something like a credential; with it we stay logged in and can visit pages that are visible only to logged-in users. What is this mysterious credential? It is the result of Session and Cookie working together. Let's take a closer look.
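To make the difference concrete, here is a minimal Python sketch of the idea of rendering content from a URL parameter. The function name render_page, the template, and the example.com URLs are made up for illustration; a real dynamic site would normally sit behind a web framework and a database.

```python
from html import escape
from urllib.parse import urlparse, parse_qs

# Hypothetical template; a real site would usually use a template engine
# and often a database lookup as well.
PAGE_TEMPLATE = '<h2 class="title">Hello, {name}</h2>'


def render_page(url: str) -> str:
    """Return HTML whose content depends on the `name` query parameter."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; fall back to "World".
    name = query.get("name", ["World"])[0]
    # Escape user input before placing it into HTML.
    return PAGE_TEMPLATE.format(name=escape(name))


print(render_page("https://example.com/hello?name=Alice"))
print(render_page("https://example.com/hello"))
```

The same code produces different pages for different URLs, which a static .html file cannot do.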
2. Stateless HTTP
Before looking at Session and Cookie, we also need to understand one characteristic of HTTP: it is stateless.
HTTP being stateless means that the protocol keeps no memory of earlier transactions; in other words, the server does not know what state the client is in. When the client sends a request, the server parses it and returns the corresponding response. The server handles each such exchange completely independently and does not record any change of state before or after it. As a result, if later processing needs earlier information, the client has to send that information again, so extra, repeated data must be transmitted to obtain subsequent responses. That is clearly not what we want. Having the client retransmit all its earlier requests just to preserve state would waste resources, and for pages that require login the problem is even trickier.
Two techniques for maintaining HTTP connection state emerged to solve this: Session and Cookie. The Session lives on the server side, that is, on the website's server, and stores the user's Session information. The Cookie lives on the client side, which can be thought of as the browser. Once the browser has a Cookie, it automatically attaches it the next time it visits the same website and sends it to the server. The server uses the Cookie to identify which user is making the request, determines whether that user is logged in, and returns the corresponding response.
Put another way, the Cookie holds the login credential. The client only needs to send it with the next request and does not have to enter the user name, password, and other information to log in again.
So when a crawler has to deal with pages that require login, we usually take the Cookie obtained after a successful login, put it directly into the request headers, and send the request, instead of simulating the login again.
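As a rough sketch of that approach, the following uses only Python's standard urllib.request module. The Cookie string and the URL are placeholders, not values from any real site; in practice you would copy the Cookie from the browser's developer tools after logging in.

```python
import urllib.request

# Hypothetical values: copy the real Cookie string from the browser's
# developer tools after logging in to the target site.
LOGGED_IN_COOKIE = "sessionid=abc123; username=demo"
PROTECTED_URL = "https://example.com/profile"


def build_request(url: str, cookie: str) -> urllib.request.Request:
    """Build a request that carries an existing login Cookie."""
    request = urllib.request.Request(url)
    request.add_header("Cookie", cookie)
    request.add_header("User-Agent", "Mozilla/5.0")
    return request


request = build_request(PROTECTED_URL, LOGGED_IN_COOKIE)
print(request.get_header("Cookie"))
# To actually send it: urllib.request.urlopen(request).read()
```

The request is only built here, not sent, so the sketch runs without network access.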
3. Session
Session, called 会话 in Chinese, originally refers to a series of actions or messages with a beginning and an end. For example, the whole process of making a phone call, from picking up the phone and dialing to hanging up, can be called a Session.
On the Web, a Session object stores the properties and configuration information needed for a particular user's Session. When the user jumps between the pages of an application, the variables stored in the Session object are not lost; they persist for the whole user Session. When a user requests a page from the application and does not yet have a Session, the web server automatically creates a Session object. When the Session expires or is abandoned, the server terminates it.
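The behaviour described above can be pictured with a toy in-memory store. This is only a sketch under simple assumptions: the class name SessionStore and its methods are invented for illustration, and real web frameworks usually keep Sessions in a database or cache rather than in a Python dict.

```python
import secrets
from typing import Dict, Optional, Tuple


class SessionStore:
    """Toy in-memory Session storage keyed by a random Session ID."""

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def get_or_create(self, session_id: Optional[str]) -> Tuple[str, dict]:
        """Return (session_id, data), creating a Session if none exists."""
        if session_id is None or session_id not in self._sessions:
            session_id = secrets.token_hex(16)
            self._sessions[session_id] = {}
        return session_id, self._sessions[session_id]

    def terminate(self, session_id: str) -> None:
        """Drop a Session, for example on logout or expiry."""
        self._sessions.pop(session_id, None)


store = SessionStore()
sid, data = store.get_or_create(None)      # first visit: no Session yet
data["username"] = "demo"                  # stored on the server side
_, same_data = store.get_or_create(sid)    # later page: same Session
print(same_data["username"])
```

After terminate is called, the old ID no longer matches anything, so the next request gets a fresh, empty Session.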
4. Cookie
A Cookie is data that a website stores on the user's local device in order to identify the user and track the Session.
Session maintenance
So how do Cookies help maintain state? When the client sends its first request, the server returns a response whose headers contain a Set-Cookie field, which is used to mark the user. The browser saves the Cookie, and the next time it requests the same website it puts the saved Cookie into the request headers and submits it to the server. The Cookie carries information such as the Session ID, so by checking the Cookie the server can find the corresponding Session and then use that Session to determine the user's state. If the Session is currently valid, the user is logged in, so the server returns the content that is visible only after login, and the browser parses and displays it.
Conversely, if the Cookie sent to the server is invalid or the Session has expired, the client can no longer access the page. It may receive an error response or be redirected to the login page.
Cookie and Session have to cooperate: one lives on the client side, the other on the server side, and together they implement login control.
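The round trip described above can be simulated without a real server, using Python's standard http.cookies module. The Cookie name sessionid, the SESSIONS table, and the response strings are assumptions made for this sketch; actual sites choose their own names and responses.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side Session table: Session ID -> user state.
SESSIONS = {"abc123": {"username": "demo", "logged_in": True}}


def make_set_cookie(session_id: str) -> str:
    """Server side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["path"] = "/"
    cookie["sessionid"]["httponly"] = True
    return cookie["sessionid"].OutputString()


def handle_request(cookie_header: str) -> str:
    """Server side: map the request's Cookie header to a response."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sessionid")
    session = SESSIONS.get(morsel.value) if morsel is not None else None
    if session and session["logged_in"]:
        return "200 OK: members-only page for " + session["username"]
    return "302 Found: redirect to /login"


set_cookie = make_set_cookie("abc123")
# Client side: keep only the name=value pair and send it back.
cookie_to_send = set_cookie.split(";")[0]
print(handle_request(cookie_to_send))
print(handle_request("sessionid=expired-or-forged"))
print(handle_request(""))
```

Only the first call carries a Session ID that the server recognises, so only it receives the logged-in page; the other two are sent to the login page.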
Attribute structure
Next, let's see what a Cookie contains. Taking jd.com as an example, open the Application tab in the browser's developer tools. The left-hand panel has a section called Storage, and its last item is Cookies. Click it to open the list, as shown in the figure below.

As you can see, the list contains many entries, each of which is one Cookie. A Cookie has the following attributes.
- Name: the name of the Cookie. Once a Cookie has been created, its name cannot be changed.
- Value: the value of the Cookie. If the value contains Unicode characters, they need to be encoded; if the value is binary data, it needs to be BASE64-encoded.
- Domain: the domain that can access the Cookie. For example, if Domain is set to .jd.com, every domain ending in jd.com can access the Cookie.
- Path: the path under which the Cookie can be used. If it is set to /path/, only pages whose path falls under /path/ can access the Cookie. If it is set to /, every page under the domain can access it.
- Max-Age: the time until the Cookie expires, in seconds. It is often used together with Expires, and the Cookie's validity period can be calculated from it. If Max-Age is positive, the Cookie expires after Max-Age seconds. If it is negative, the Cookie becomes invalid when the browser is closed, and the browser does not save it in any form. (This is the convention of server-side APIs such as Java's Cookie.setMaxAge; in the raw Set-Cookie header, a zero or negative Max-Age makes the Cookie expire immediately.)
- Size field: the size of the Cookie.
- HTTP field: the Cookie's httponly attribute. If it is true, the Cookie is carried only in the HTTP headers and cannot be accessed through document.cookie.
- Secure: whether the Cookie may be transmitted only over secure protocols. Secure protocols include HTTPS and SSL, which encrypt data before transmitting it over the network. The default value is false.
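To see several of these attributes outside the browser, the sketch below parses a Set-Cookie value with Python's standard http.cookies module. The header string is made up for illustration and does not come from jd.com. Size is not included because it is computed by the developer tools rather than sent in the header.

```python
from http.cookies import SimpleCookie

# Made-up Set-Cookie value; real sites set their own names and values.
SET_COOKIE = (
    "token=abc123; Domain=.example.com; Path=/; "
    "Max-Age=3600; Secure; HttpOnly"
)

cookie = SimpleCookie()
cookie.load(SET_COOKIE)
morsel = cookie["token"]

attributes = {
    "Name": morsel.key,
    "Value": morsel.value,
    "Domain": morsel["domain"],
    "Path": morsel["path"],
    "Max-Age": morsel["max-age"],   # kept as a string by the parser
    "Secure": bool(morsel["secure"]),
    "HttpOnly": bool(morsel["httponly"]),
}
for name, value in attributes.items():
    print(f"{name}: {value}")
```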
Session Cookies and persistent Cookies
On the surface, a session Cookie keeps the Cookie in browser memory, so it becomes invalid once the browser is closed, whereas a persistent Cookie is saved to the client's hard disk and can be used again next time, which is how a user's login state is kept for a long time.
Strictly speaking, there is no real distinction between session Cookies and persistent Cookies; the Max-Age or Expires field simply determines when the Cookie expires.
So websites that keep users logged in for a long time simply set a fairly long validity period for both the Cookie and the Session. The next time the client visits, it still carries the earlier Cookie, and the site can show the logged-in state directly.
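Since the only difference is whether an expiry is present, the distinction can be expressed as a small check. This is a simplified sketch: the function name cookie_kind is invented here, and it deliberately ignores edge cases such as a zero or negative Max-Age.

```python
from http.cookies import SimpleCookie


def cookie_kind(set_cookie: str) -> str:
    """Classify a Set-Cookie value as 'session' or 'persistent'."""
    cookie = SimpleCookie()
    cookie.load(set_cookie)
    morsel = next(iter(cookie.values()))
    # Unset attributes are empty strings in http.cookies.
    if morsel["max-age"] or morsel["expires"]:
        return "persistent"
    return "session"


print(cookie_kind("sessionid=abc123; Path=/"))
print(cookie_kind("sessionid=abc123; Path=/; Max-Age=2592000"))
```

The second Cookie asks the browser to keep it for 2592000 seconds (30 days), which is the kind of setting a "stay logged in" feature would rely on.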
5. Common misconceptions
When people talk about the Session mechanism, one misconception comes up often: as soon as the browser is closed, the Session disappears. Think of a membership card in everyday life: unless the customer asks the store to cancel the card, the store will not casually delete the customer's information. The same goes for a Session: unless the program tells the server to delete it, the server keeps it. For example, a program usually deletes the Session when we log out.
When we close the browser, however, the browser does not notify the server beforehand, so the server has no way of knowing that the browser has been closed. The misconception arises because most websites use a session Cookie to store the Session ID. The Cookie disappears when the browser is closed, so when the browser connects to the server again it can no longer find the original Session. If the Cookie set by the server is saved to the hard disk, or if the HTTP request headers are rewritten by some means so that the original Cookie is sent to the server, then the original Session ID can still be found when the browser is reopened, and the user remains logged in.
Precisely because closing the browser does not delete the Session, the server needs to set an expiration time for it. When the time since the client last used the Session exceeds that expiration time, the server can assume that the client has stopped its activity and delete the Session to save storage space.
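The expiry rule in the last paragraph might be sketched as follows. The 30-minute limit, the last_used table, and the function names are all assumptions for illustration; real servers choose their own timeout and storage.

```python
import time
from typing import Dict, List, Optional

SESSION_TIMEOUT = 30 * 60  # hypothetical idle limit, in seconds

# Session ID -> time of last use (seconds since the epoch).
last_used: Dict[str, float] = {}


def touch(session_id: str, now: Optional[float] = None) -> None:
    """Record that the client just used this Session."""
    last_used[session_id] = time.time() if now is None else now


def purge_expired(now: Optional[float] = None) -> List[str]:
    """Delete Sessions idle for longer than SESSION_TIMEOUT."""
    now = time.time() if now is None else now
    expired = [sid for sid, t in last_used.items() if now - t > SESSION_TIMEOUT]
    for sid in expired:
        del last_used[sid]
    return expired


# Fixed timestamps keep the example deterministic.
touch("active", now=1000.0)
touch("idle", now=0.0)
print(purge_expired(now=1900.0))
```

Here the Session named idle has gone unused for 1900 seconds, which exceeds the limit, so it is deleted, while active was used 900 seconds ago and is kept.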