Posted on 2023-06-06 11:50
params lets us attach parameters to a request: which page do we want, which keywords are we searching for, and how much data should come back?
headers are the request headers. They tell the server which device and browser the request comes from, and which page it was sent from.
url = 'https://www.douban.com/search?q=%E6%B5%B7%E8%BE%B9%E7%9A%84%E5%8D%A1%E5%A4%AB%E5%8D%A1'
url = 'https://y.qq.com/n/ryqq/search?searchid=1&remoteplace=txt.yqq.top&w=%E5%91%A8%E6%9D%B0%E4%BC%A6&t=song'
Both example URLs above are split by a ?. A URL consists of two parts: everything before the ? (sometimes a #) is the address we request, and everything after it is the set of parameters attached to the request. Note: a URL that uses # as the separator can have it replaced by ?, but a URL that uses ? cannot necessarily have it replaced by #.
The first half is the address we request; it tells the server where we want to go.
The second half is the parameters attached to the request; they tell the server what kind of data we want.
The parameters are structured much like a dictionary, with keys and values: each key is joined to its value by =, and the key-value pairs are joined to each other by &.
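To make the split concrete, here is a minimal sketch that takes apart the second example URL above. It uses Python's standard urllib.parse module, which the original post does not use; the variable names are illustrative only.

```python
from urllib.parse import urlsplit, parse_qsl

url = ('https://y.qq.com/n/ryqq/search'
       '?searchid=1&remoteplace=txt.yqq.top'
       '&w=%E5%91%A8%E6%9D%B0%E4%BC%A6&t=song')

parts = urlsplit(url)

# Everything before the "?": the address we are requesting
address = f'{parts.scheme}://{parts.netloc}{parts.path}'

# Everything after the "?": key=value pairs joined by "&".
# parse_qsl also decodes percent-encoded values such as the search keyword.
query = dict(parse_qsl(parts.query))

print(address)
print(query)
```

The percent-encoded value of w decodes back to the search keyword 周杰伦, which is why the same parameter can later be written as plain text in a Python dictionary.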
Two methods matter most for understanding parameters: "observation" and "comparison".
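As a rough illustration of "comparison", the sketch below diffs the parameters of two captured URLs and reports only the ones that changed. The helper name changed_params and the two shortened URLs are hypothetical; in practice you would paste in the URLs you observed in the browser's developer tools.

```python
from urllib.parse import urlsplit, parse_qsl


def changed_params(url_a, url_b):
    """Return {key: (value_in_a, value_in_b)} for every parameter that differs."""
    params_a = dict(parse_qsl(urlsplit(url_a).query))
    params_b = dict(parse_qsl(urlsplit(url_b).query))
    keys = sorted(set(params_a) | set(params_b))
    return {
        key: (params_a.get(key), params_b.get(key))
        for key in keys
        if params_a.get(key) != params_b.get(key)
    }


# Shortened, hypothetical captures of the same search for page 1 and page 2
base = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
page_1 = base + '?p=1&n=20&w=%E5%91%A8%E6%9D%B0%E4%BC%A6&format=json'
page_2 = base + '?p=2&n=20&w=%E5%91%A8%E6%9D%B0%E4%BC%A6&format=json'

print(changed_params(page_1, page_2))  # {'p': ('1', '2')}
```

A parameter that changes when you turn the page (here p) is a good candidate for the page number; parameters that stay the same can usually be copied as they are.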
Every request carries a set of Request Headers. They contain basic information about the request, such as which device and browser sent it and which page it came from.
user-agent records your computer's information and browser version.
origin and referer both record which page the request originated from. The difference is that referer carries more information than origin.
The main use of headers is dealing with "anti-crawler" techniques: they disguise a Python crawler as a real browser so that the server does not recognize it. They also mark the source of the request, which helps us get the information we want.
import requests
url = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
headers = {
    'origin':'https://y.qq.com',
    # Origin of the request; not actually needed in this example, included only for demonstration
    'referer':'https://y.qq.com/n/yqq/song/004Z8Ihr0JIu5s.html',
    # Source page of the request; carries more information than "origin". Not actually needed in this example, included only for demonstration
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    # Identifies the device and browser the request is sent from
}
params = {
    'ct':'24',
    'qqmusic_ver': '1298',
    'new_json':'1',
    'remoteplace':'sizer.yqq.song_next',
    'searchid':'59091538798969282',
    't':'0',
    'aggr':'1',
    'cr':'1',
    'catZhida':'1',
    'lossless':'0',
    'flag_qc':'0',
    'p':'1',
    'n':'20',
    'w':'周杰伦',
    'g_tk':'5381',
    'loginUin':'0',
    'hostUin':'0',
    'format':'json',
    'inCharset':'utf8',
    'outCharset':'utf-8',
    'notice':'0',
    'platform':'yqq.json',
    'needNewCode':'0'
}
# Pack the parameters into a dictionary
res_music = requests.get(url,headers=headers,params=params)
# Send the request, passing in the request headers and the parameters
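To see what a request like this looks like before it is sent, the sketch below builds the final URL and headers without contacting the server. It swaps in the standard library (urllib.parse.urlencode and urllib.request.Request) in place of requests, on the assumption that requests encodes a simple params dictionary in roughly the same way; the shortened headers and params are for illustration only.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
headers = {
    'referer': 'https://y.qq.com/n/yqq/song/004Z8Ihr0JIu5s.html',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',  # shortened for the sketch
}
params = {'p': '1', 'n': '20', 'w': '周杰伦', 'format': 'json'}  # subset of the full dictionary

# Each pair becomes "key=value", pairs are joined by "&",
# and non-ASCII text is percent-encoded
full_url = url + '?' + urlencode(params)

# Creating a Request object does not contact the server
request = Request(full_url, headers=headers)

# urllib changes the capitalization of header names, so normalize to lower case
prepared_headers = {name.lower(): value for name, value in request.header_items()}

print(full_url)
print(prepared_headers)
```

The printed URL has the same shape as the example URLs at the top of this post: the address, a ?, then the key-value pairs.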
import requests
# Import the requests module
url = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'
# The part of the song-comment URL that comes before the parameters
for i in range(5):
    params = {
        'g_tk':'5381',
        'loginUin':'0',
        'hostUin':'0',
        'format':'json',
        'inCharset':'utf8',
        'outCharset':'GB2312',
        'notice':'0',
        'platform':'yqq.json',
        'needNewCode':'0',
        'cid':'205360772',
        'reqtype':'2',
        'biztype':'1',
        'topid':'102065756',
        'cmd':'6',
        'needmusiccrit':'0',
        'pagenum':str(i),
        'pagesize':'15',
        'lasthotcommentid':'song_102065756_3202544866_44059185',
        'domain':'qq.com',
        'ct':'24',
        'cv':'10101010'
    }
    # Pack the parameters into a dictionary; pagenum changes with each pass of the loop
    res_comments = requests.get(url,params=params)
    # Call the get method to download this page of comments
    json_comments = res_comments.json()
    list_comments = json_comments['comment']['commentlist']
    for comment in list_comments:
        print(comment['rootcommentcontent'])
        print('-----------------------------------')
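The loop above indexes straight into json_comments['comment']['commentlist'], which raises an error if a page comes back without that structure. As a hedged sketch, the hypothetical helper extract_comments below reads the same keys defensively. The sample payloads are made up to mimic the structure used above, and whether the real interface ever returns an empty or missing list is an assumption.

```python
def extract_comments(payload):
    """Return the comment texts from one parsed response, or [] if the shape differs."""
    comment_block = payload.get('comment') or {}
    comment_list = comment_block.get('commentlist') or []
    return [
        item['rootcommentcontent']
        for item in comment_list
        if 'rootcommentcontent' in item
    ]


# Made-up payloads that mimic the structure used in the loop above
sample_page = {'comment': {'commentlist': [
    {'rootcommentcontent': 'first comment'},
    {'rootcommentcontent': 'second comment'},
    {'other': 'entry without comment text'},
]}}
empty_page = {'comment': {'commentlist': None}}

print(extract_comments(sample_page))  # ['first comment', 'second comment']
print(extract_comments(empty_page))   # []
```

Inside the loop, this could replace the two indexing lines with a call such as extract_comments(res_comments.json()).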
import requests
import json
# Import the requests and json modules
url = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
headers = {
    'referer':'https://y.qq.com/portal/search.html',
    # Source page of the request
    'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
    # Identifies the device and browser the request is sent from
}
for x in range(5):
    params = {
        'ct':'24',
        'qqmusic_ver': '1298',
        'new_json':'1',
        'remoteplace':'sizer.yqq.lyric_next',
        'searchid':'94267071827046963',
        'aggr':'1',
        'cr':'1',
        'catZhida':'1',
        'lossless':'0',
        'sem':'1',
        't':'7',
        'p':str(x+1),
        'n':'10',
        'w':'周杰伦',
        'g_tk':'1714057807',
        'loginUin':'0',
        'hostUin':'0',
        'format':'json',
        'inCharset':'utf8',
        'outCharset':'utf-8',
        'notice':'0',
        'platform':'yqq.json',
        'needNewCode':'0'
    }
    res = requests.get(url, headers=headers, params=params)
    # Download the page and assign it to res (headers is now passed in; the original defined it but never used it)
    jsonres = json.loads(res.text)
    # Parse res.text with json
    list_lyric = jsonres['data']['lyric']['list']
    # Walk down the dictionary level by level to get the list of lyrics
    for lyric in list_lyric:
        # list_lyric is a list; lyric is one of its elements
        print(lyric['content'])
        # Look up the lyrics under the key "content"
import requests
url = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
singer = input('Which singer do you like? ')
for x in range(6):
    params = {
        'ct':'24',
        'qqmusic_ver': '1298',
        'new_json':'1',
        'remoteplace':'txt.yqq.song',
        'searchid':'70717568573156220',
        't':'0',
        'aggr':'1',
        'cr':'1',
        'catZhida':'1',
        'lossless':'0',
        'flag_qc':'0',
        'p':str(x+1),
        'n':'20',
        'w':singer,
        'g_tk':'714057807',
        'loginUin':'0',
        'hostUin':'0',
        'format':'json',
        'inCharset':'utf8',
        'outCharset':'utf-8',
        'notice':'0',
        'platform':'yqq.json',
        'needNewCode':'0'
    }
    # Pack the parameters into a dictionary
    res_music = requests.get(url,params=params)
    # Call the get method to download this page of results
    json_music = res_music.json()
    # Use the json() method to convert the response object into lists/dictionaries
    list_music = json_music['data']['song']['list']
    # Walk down the dictionary level by level to get the list of songs
    for music in list_music:
        # list_music is a list; music is one of its elements
        print(music['name'])
        # Look up the song title under the key "name"
        print('Album: '+music['album']['name'])
        # Look up the album name
        print('Duration: '+str(music['interval'])+' seconds')
        # Look up the playing time
        print('Play link: https://y.qq.com/n/yqq/song/'+music['mid']+'.html\n\n')
        # Build the play link
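Across the pages of this search only p (the page number) and w (the keyword) change, with n setting the page size. One possible way to make that explicit is to keep the unchanged values in one place and build each page's dictionary from them. The names FIXED_PARAMS and build_search_params below are hypothetical; the values are copied from the dictionary above, and treating searchid and g_tk as reusable is an assumption.

```python
# Values that stay the same from page to page, copied from the dictionary above
FIXED_PARAMS = {
    'ct': '24', 'qqmusic_ver': '1298', 'new_json': '1',
    'remoteplace': 'txt.yqq.song', 'searchid': '70717568573156220',
    't': '0', 'aggr': '1', 'cr': '1', 'catZhida': '1', 'lossless': '0',
    'flag_qc': '0', 'g_tk': '714057807', 'loginUin': '0', 'hostUin': '0',
    'format': 'json', 'inCharset': 'utf8', 'outCharset': 'utf-8',
    'notice': '0', 'platform': 'yqq.json', 'needNewCode': '0',
}


def build_search_params(keyword, page, page_size=20):
    """Return a fresh parameter dictionary for one page of search results."""
    params = dict(FIXED_PARAMS)  # copy, so the shared values are never modified
    params.update({'p': str(page), 'n': str(page_size), 'w': keyword})
    return params


pages = [build_search_params('周杰伦', page) for page in range(1, 4)]
print([page_params['p'] for page_params in pages])  # ['1', '2', '3']
```

In the loop above, the call would then read requests.get(url, params=build_search_params(singer, x + 1)).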
Author:Abstract
link:http://www.pythonblackhole.com/blog/article/80553/79cc4fce83766a5825d9/
source:python black hole net
Please indicate the source for any form of reprinting. If any infringement is discovered, it will be held legally responsible.