[Python Self-Check Handbook] Using params for requests with parameters

posted on 2023-06-06 11:50

params lets us attach parameters to a request: which page do we want? Which keywords are we searching for? How many results do we want?
headers are the request headers. They tell the server which device and browser the request is sent from, and which page it came from.

Requests with parameters: fetching multiple links

Unlike fetching a single fixed URL, the parameters are collected in one place: params.
params is a dictionary that is passed along with the request.
By changing the values in params, we can crawl many different pages of information.
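A minimal sketch of this idea, using only the standard library: `urlencode` turns a dictionary into the `key=value&key=value` text that follows the `?`, which is similar to what requests does with `params`. The endpoint `https://example.com/search` and the keys `q` and `page` are assumptions made up for illustration, not a real API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = 'https://example.com/search'

def build_url(base_url, params):
    """Append the params dictionary to base_url as a query string."""
    return base_url + '?' + urlencode(params)

# Changing a single value in params yields a different link each time.
urls = [build_url(BASE_URL, {'q': 'python', 'page': str(page)})
        for page in range(1, 4)]
for link in urls:
    print(link)
```

Only the value of `page` changes between iterations; the address and the other parameters stay the same.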

url = 'https://www.douban.com/search?q=%E6%B5%B7%E8%BE%B9%E7%9A%84%E5%8D%A1%E5%A4%AB%E5%8D%A1'
url = 'https://y.qq.com/n/ryqq/search?searchid=1&remoteplace=txt.yqq.top&w=%E5%91%A8%E6%9D%B0%E4%BC%A6&t=song'
Both example URLs above contain a ?. A URL like this consists of two parts: everything before the ? is the address we request, and everything after the ? is the set of parameters attached to the request. Some sites use # as the separator instead. Note: a # separator can generally be replaced by ?, but a ? separator cannot necessarily be replaced by #.

The first half is the address we request; it tells the server where we want to go.
The second half is the parameters attached to the request; they tell the server what kind of data we want.
The parameters are structured much like a dictionary: each one has a key and a value joined by =, and the key-value pairs are joined to each other by &.
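The split described above can be checked with the standard library. This sketch takes the Douban URL from the example and separates it into the address and a dictionary of parameters; the variable names are chosen here for illustration.

```python
from urllib.parse import urlsplit, parse_qs

url = 'https://www.douban.com/search?q=%E6%B5%B7%E8%BE%B9%E7%9A%84%E5%8D%A1%E5%A4%AB%E5%8D%A1'

parts = urlsplit(url)
# Everything before "?": the address we request.
address = parts.scheme + '://' + parts.netloc + parts.path
# Everything after "?": key=value pairs joined by "&", percent-decoded.
# parse_qs maps each key to a list, because a key may appear more than once.
query = parse_qs(parts.query)

print(address)
print(query)
```

The percent-encoded text after `q=` decodes back to the Chinese search keyword, which is why the same search can be written in params as a plain string.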

Query String Parameters: in the browser's developer tools, open the Network panel, select an XHR request, and look under Payload for Query String Parameters.

Two methods are important for understanding parameters: "observing" and "comparing".

"Observing" means reading each parameter's key and value and trying to work out what it means.
"Comparing" means comparing two similar XHR requests: how do their parameters differ, and how does the content shown on the page differ as a result?

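The "comparing" step can also be done in code. The sketch below is one possible approach, not a standard tool: `diff_params` is a hypothetical helper, and the two URLs are invented stand-ins for request URLs copied from two consecutive pages.

```python
from urllib.parse import urlsplit, parse_qsl

def diff_params(url_a, url_b):
    """Return {key: (value_in_a, value_in_b)} for every parameter that differs."""
    params_a = dict(parse_qsl(urlsplit(url_a).query, keep_blank_values=True))
    params_b = dict(parse_qsl(urlsplit(url_b).query, keep_blank_values=True))
    changed = {}
    for key in sorted(set(params_a) | set(params_b)):
        if params_a.get(key) != params_b.get(key):
            changed[key] = (params_a.get(key), params_b.get(key))
    return changed

# Hypothetical request URLs for page 1 and page 2 of the same listing.
first = 'https://example.com/comments?topid=1&pagenum=0&pagesize=15'
second = 'https://example.com/comments?topid=1&pagenum=1&pagesize=15'
print(diff_params(first, second))
```

A parameter that changes when the page changes is a likely candidate for the page number.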
Request Headers

Every request carries Request Headers. They contain basic information about the request, such as: which device and browser sent it? Which page did it come from?

user-agent records information about your computer and your browser version.

origin and referer record where the request came from, that is, which page initiated it. The difference between them is that referer carries more information than origin: origin holds only the scheme and host, while referer typically holds the full address of the page.

Their main use for us is getting past "anti-crawler" measures: headers let a Python crawler present itself as a real browser, so the server is less likely to recognize it as a crawler. They can also state the source of the request, which in turn helps us get the information we want.
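To make the difference between origin and referer concrete, this sketch derives an origin from a referer by dropping everything after the host. `origin_of` is a hypothetical helper written for this illustration; the referer and user-agent values are the ones used in this article.

```python
from urllib.parse import urlsplit

def origin_of(page_url):
    """Reduce a full page URL to its origin: scheme and host (with port, if any)."""
    parts = urlsplit(page_url)
    return parts.scheme + '://' + parts.netloc

referer = 'https://y.qq.com/n/yqq/song/004Z8Ihr0JIu5s.html'
headers = {
    'origin': origin_of(referer),   # only scheme and host
    'referer': referer,             # full address of the page
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
}
print(headers['origin'])
```

A dictionary like this is what gets passed to `requests.get` through the `headers` argument.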

headers and params: annotated example

import requests
url = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
headers = {
    'origin':'https://y.qq.com',
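    # Request origin; not actually needed in this example, included only for demonstration
    'referer':'https://y.qq.com/n/yqq/song/004Z8Ihr0JIu5s.html',
    # Referring page; carries more information than "origin". Not actually needed in this example, included only for demonstration
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    # Identifies the device and browser the request is sent from
    }
# The headers above disguise the request as one sent by a browser
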

params = {
'ct':'24',
'qqmusic_ver': '1298',
'new_json':'1',
'remoteplace':'sizer.yqq.song_next',
'searchid':'59091538798969282',
't':'0',
'aggr':'1',
'cr':'1',
'catZhida':'1',
'lossless':'0',
'flag_qc':'0',
'p':'1',
'n':'20',
'w':'周杰伦',
'g_tk':'5381',
'loginUin':'0',
'hostUin':'0',
'format':'json',
'inCharset':'utf8',
'outCharset':'utf-8',
'notice':'0',
'platform':'yqq.json',
'needNewCode':'0'    
} # Wrap the parameters in a dictionary
res_music = requests.get(url,headers=headers,params=params) # Send the request, passing in the headers and the parameters

Paging through more comments in QQ Music

import requests
# Import the requests module
url = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'
# The part of the song-comments URL that comes before the parameters
for i in range(5):
    params = {
    'g_tk':'5381',
    'loginUin':'0', 
    'hostUin':'0',
    'format':'json',
    'inCharset':'utf8',
    'outCharset':'GB2312',
    'notice':'0',
    'platform':'yqq.json',
    'needNewCode':'0',
    'cid':'205360772',
    'reqtype':'2',
    'biztype':'1',
    'topid':'102065756',
    'cmd':'6',
    'needmusiccrit':'0',
    'pagenum':str(i),
    'pagesize':'15',
    'lasthotcommentid':'song_102065756_3202544866_44059185',
    'domain':'qq.com',
    'ct':'24',
    'cv':'10101010'   
    }
    # Wrap the parameters in a dictionary
    res_comments = requests.get(url,params=params)
    # Call get to send the request and download the response
    json_comments = res_comments.json()
    list_comments = json_comments['comment']['commentlist']
    for comment in list_comments:
        print(comment['rootcommentcontent'])
        print('-----------------------------------')
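Since only `pagenum` changes between pages, the loop above could keep the fixed parameters in one place and copy them for each page. This is a sketch of that design choice, not the author's method: `BASE_PARAMS` is shortened to three keys for illustration (the real request may need the full set shown above), and `page_params` is a hypothetical helper.

```python
# Parameters that stay the same on every page (shortened for illustration).
BASE_PARAMS = {
    'format': 'json',
    'topid': '102065756',
    'pagesize': '15',
}

def page_params(page):
    """Return a fresh params dict for the given zero-based page number."""
    params = dict(BASE_PARAMS)       # copy, so BASE_PARAMS is never modified
    params['pagenum'] = str(page)    # the only value that changes per page
    return params

all_params = [page_params(i) for i in range(5)]
print([p['pagenum'] for p in all_params])
```

Each dictionary in `all_params` could then be passed to `requests.get(url, params=...)` in turn.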

Fetching the first five pages of Jay Chou lyrics in QQ Music

import requests
import json
# Import the requests and json modules
url = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
headers = {
    'referer':'https://y.qq.com/portal/search.html',
    # Referring page
    'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
    # Identifies the device and browser the request is sent from
    }
for x in range(5):
    params = {
    'ct':'24',
    'qqmusic_ver': '1298',
    'new_json':'1',
    'remoteplace':'sizer.yqq.lyric_next',
    'searchid':'94267071827046963',
    'aggr':'1',
    'cr':'1',
    'catZhida':'1',
    'lossless':'0',
    'sem':'1',
    't':'7',
    'p':str(x+1),
    'n':'10',
    'w':'周杰伦',
    'g_tk':'1714057807',
    'loginUin':'0',
    'hostUin':'0',
    'format':'json',
    'inCharset':'utf8',
    'outCharset':'utf-8',
    'notice':'0',
    'platform':'yqq.json',
    'needNewCode':'0'  
    }
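    res = requests.get(url, headers=headers, params=params)  # Fetch the page using the headers defined above and assign the response to res
    jsonres = json.loads(res.text)  # Parse res.text as JSON
    list_lyric = jsonres['data']['lyric']['list']  # Walk down the nested dictionaries to the list of lyrics
    for lyric in list_lyric:  # list_lyric is a list; lyric is each element in it
        print(lyric['content'])  # Look up the lyrics under the key 'content'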

Fetching song information for any singer in QQ Music

import requests
url = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
singer = input('Who is your favourite singer? ')
for x in range(6):
    params = {
    'ct':'24',
    'qqmusic_ver': '1298',
    'new_json':'1',
    'remoteplace':'txt.yqq.song',
    'searchid':'70717568573156220',
    't':'0',
    'aggr':'1',
    'cr':'1',
    'catZhida':'1',
    'lossless':'0',
    'flag_qc':'0',
    'p':str(x+1),
    'n':'20',
    'w':singer,
    'g_tk':'714057807',
    'loginUin':'0',
    'hostUin':'0',
    'format':'json',
    'inCharset':'utf8',
    'outCharset':'utf-8',
    'notice':'0',
    'platform':'yqq.json',
    'needNewCode':'0'    
    }
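    # Wrap the parameters in a dictionary
    res_music = requests.get(url,params=params)
    # Call get to send the request and download the response
    json_music = res_music.json()
    # Use json() to convert the response body into lists/dictionaries
    list_music = json_music['data']['song']['list']
    # Walk down the nested dictionaries to the list of songs
    for music in list_music:
        # list_music is a list; music is each element in it
        print(music['name'])
        # Look up the song title under the key 'name'
        print('Album: '+music['album']['name'])
        # Look up the album name
        print('Duration: '+str(music['interval'])+' seconds')
        # Look up the playing time
        print('Play link: https://y.qq.com/n/yqq/song/'+music['mid']+'.html\n\n')
        # Build the play link from the song's mid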
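
Chained lookups such as `json_music['data']['song']['list']` raise a KeyError as soon as one key is missing, for example when the server returns an error payload. The sketch below shows one defensive variant using `dict.get`. `song_names` is a hypothetical helper, and `sample` is hand-written data that only mimics the nesting used above; it is not real API output.

```python
def song_names(json_music):
    """Return the song names in a search response, or [] if a key is missing."""
    songs = json_music.get('data', {}).get('song', {}).get('list', [])
    return [song.get('name', '') for song in songs]

# Hand-written sample mimicking the nesting used above; not real API output.
sample = {'data': {'song': {'list': [{'name': 'Song A'}, {'name': 'Song B'}]}}}
print(song_names(sample))
# A response without the expected keys gives an empty list instead of an error.
print(song_names({'code': 500}))
```

With a helper like this, the paging loop can skip a page that returned no songs instead of stopping with an exception.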