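One more thing: what if the link being scraped is a PDF file? The link may point directly at the PDF, or it may just be a URL with query parameters that then redirects to a PDF download. If I open it the same way as a normal web page, a Save dialog pops up and the program hangs on that dialog instead of continuing. How can I solve this?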
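The post doesn't say which tool opens the page (it sounds like browser automation), so what follows is only a sketch of one common workaround, written with Python's standard library. The idea is to request the URL with a plain HTTP client first and inspect the `Content-Type` and `Content-Disposition` response headers. Only real HTML pages are then handed to the browser, and anything else is saved directly. `classify` and `fetch` are hypothetical helper names, and the output file name and User-Agent are placeholders.

```python
import shutil
import urllib.request

# Media types that are safe to hand to a browser as a normal page.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def classify(content_type, content_disposition=None):
    """Return 'pdf', 'download' or 'html' from two response headers."""
    # Drop parameters such as '; charset=utf-8' and normalise case.
    ctype = (content_type or "").split(";", 1)[0].strip().lower()
    disp = (content_disposition or "").lower()
    if ctype == "application/pdf" or ".pdf" in disp:
        return "pdf"
    # 'attachment' tells the browser to save the file (the Save dialog).
    # A missing or non-HTML Content-Type is also treated as a download here.
    if "attachment" in disp or ctype not in HTML_TYPES:
        return "download"
    return "html"


def fetch(url, save_as="download.bin", timeout=15):
    """GET the URL (HTTP redirects are followed), then return HTML text or save the file."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # urlopen returns once the status line and headers have arrived;
    # the body is only read further down.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        kind = classify(resp.headers.get("Content-Type"),
                        resp.headers.get("Content-Disposition"))
        if kind == "html":
            charset = resp.headers.get_content_charset() or "utf-8"
            return kind, resp.geturl(), resp.read().decode(charset, "replace")
        # Non-HTML (e.g. a PDF): stream the body straight to a file, no browser involved.
        with open(save_as, "wb") as f:
            shutil.copyfileobj(resp, f)
        return kind, resp.geturl(), save_as
```

`urlopen` follows HTTP 3xx redirects by default, so a parameterised URL that redirects to a PDF is classified by the final response, and `resp.geturl()` gives that final URL. Redirects done with JavaScript or a meta refresh are not followed by this sketch.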
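Or to put it another way: the HTTP status code and the request and response headers also matter a great deal for scraping. For one such link, the request looks like this:

Request URL: http://jyt.nmg.gov.cn/xxgk/wjtz/201801/P020180817615689450871.pdf
Request Method: GET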
Status Code: 200 OK
Remote Address: 222.74.200.122
Referrer Policy: no-referrer-when-downgrade
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7
Connection: keep-alive
Cookie: _gscu_1874576098=78487030x74i1i12; _gscbrs_1874576098=1; UM_distinctid=16f8528a3111b-09fe147a649dd4-6701b35-1fa400-16f8528a31248b; CNZZDATA1260571631=1674668539-1578486362-%7C1578486362
Host: jyt.nmg.gov.cn
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36
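The block above shows only the browser's side of the exchange. One way to also see the status code and the response headers, which is where `Content-Type` and `Content-Disposition` live, is to replay the same request outside the browser. What follows is a minimal Python sketch. It assumes you paste in only the `Name: value` request-header lines from DevTools. `parse_header_block` and `status_and_headers` are hypothetical names. Dropping `Host`, `Connection` and `Accept-Encoding` is a choice made here: urllib sets the first two itself and does not decompress gzip.

```python
import urllib.error
import urllib.request

# Headers urllib manages itself (Host, Connection) or cannot honour
# (it does not decompress gzip, so Accept-Encoding is dropped).
SKIP = {"host", "connection", "accept-encoding"}


def parse_header_block(text):
    """Turn 'Name: value' lines copied from DevTools into a dict."""
    headers = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not name or name.lower() in SKIP:
            continue
        headers[name] = value.strip()  # duplicate lines simply overwrite
    return headers


def status_and_headers(url, headers, timeout=15):
    """Send a GET with the given headers; return (status code, response headers)."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise, but still carry a status code and headers.
        return err.code, dict(err.headers.items())
```

For example, `status_and_headers("http://jyt.nmg.gov.cn/xxgk/wjtz/201801/P020180817615689450871.pdf", parse_header_block(copied_text))` returns the status code together with the response headers. You can then check `Content-Type` there before deciding whether to open the link in a browser. The `Cookie` line is kept as-is, but it belongs to one browser session and may expire.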