python: requests, a powerful tool for web crawling

Published: 2019-08-19 09:22:29

    requests is not a module that ships with Python; it is a third-party library and must be installed (e.g. with pip install requests) before use.

    How to use the requests library

    Enough talk; let's jump straight into some code and take a quick look at what it does:

    import requests

    # use a Session so cookies persist across requests;
    # don't shadow the module name with the session object
    session = requests.Session()
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'
    }
    url = "http://httpbin.org/get"
    response = session.get(url, headers=headers, timeout=None)
    print(response.text)                     # body decoded to text
    print(response.cookies)                  # cookies set by the server
    print(response.content)                  # raw bytes
    print(response.content.decode("utf-8"))  # bytes decoded by hand
    print(response.json())                   # body parsed as JSON
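    Query-string parameters do not have to be glued onto the URL by hand; requests encodes a params dict for you. A minimal sketch using requests.Request and prepare() so nothing is actually sent over the network (httpbin.org is kept from the example above):

```python
import requests

# build a GET request with query parameters but don't send it;
# prepare() shows exactly what URL requests would hit
req = requests.Request("GET", "http://httpbin.org/get",
                       params={"q": "python", "page": 1})
prepared = req.prepare()
print(prepared.url)  # → http://httpbin.org/get?q=python&page=1
```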

    A basic POST request:

    data = {
        "name":"zhaofan",
        "age":23
    }
    response = requests.post("http://httpbin.org/post",data=data)
    print(response.text)
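    data= sends a form-encoded body; requests can also send JSON via the json= keyword. A sketch that only prepares the requests (nothing is sent), so you can inspect the body and Content-Type each variant produces:

```python
import requests

# form-encoded body, as in the POST example above
form = requests.Request("POST", "http://httpbin.org/post",
                        data={"name": "zhaofan", "age": 23}).prepare()
print(form.body)                     # name=zhaofan&age=23
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded

# JSON body via the json= keyword instead of data=
jsn = requests.Request("POST", "http://httpbin.org/post",
                       json={"name": "zhaofan", "age": 23}).prepare()
print(jsn.headers["Content-Type"])   # application/json
```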

    Requesting a site whose certificate is invalid:

    import requests
    import urllib3

    # suppress the InsecureRequestWarning that verify=False triggers
    urllib3.disable_warnings()
    response = requests.get("https://www.12306.cn", verify=False)
    print(response.status_code)
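    verify does not have to be passed on every call: it can be set once on a Session, and it also accepts the path to a CA bundle instead of a boolean. A small sketch (the bundle path in the comment is a hypothetical placeholder):

```python
import requests

session = requests.Session()
session.verify = False  # disable certificate checks for every request on this session
print(session.verify)   # → False

# alternatively, point verify at a custom CA bundle (hypothetical path):
# session.verify = "/path/to/ca-bundle.pem"
```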
    

    Proxy settings:

    import requests
    
    proxies= {
        "http":"http://127.0.0.1:9999",
        "https":"http://127.0.0.1:8888"
    }
    response  = requests.get("https://www.baidu.com",proxies=proxies)
    print(response.text)
    
    If the proxy requires a username and password, just change the dictionary to:

    proxies = {
        "http":"http://user:password@127.0.0.1:9999"
    }

    If your proxy speaks SOCKS, you first need to pip install "requests[socks]", then:

    proxies = {
        "http":"socks5://127.0.0.1:9999",
        "https":"socks5://127.0.0.1:8888"
    }
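    Proxies can likewise be attached to a Session once instead of being passed to every call. A minimal sketch reusing the addresses from the snippet above (no request is actually sent):

```python
import requests

session = requests.Session()
session.proxies.update({
    "http": "http://127.0.0.1:9999",
    "https": "http://127.0.0.1:8888",
})
# every request made through this session now goes through the proxies
print(session.proxies["http"])  # → http://127.0.0.1:9999
```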
    

    Timeout settings

    The timeout parameter sets how long to wait for the server:

    # wait indefinitely; the request never times out
    timeout=None
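    In practice you rarely want timeout=None. timeout accepts a single number, or a (connect, read) tuple to bound the two phases separately; when a limit is exceeded, requests raises an exception. A sketch that forces a failure against a non-routable address (10.255.255.1 is chosen here purely as an illustration):

```python
import requests

try:
    # (connect timeout, read timeout) in seconds
    requests.get("http://10.255.255.1/", timeout=(0.5, 1.0))
    timed_out = False
except requests.exceptions.RequestException:
    # ConnectTimeout, ConnectionError, etc. all derive from RequestException
    timed_out = True

print(timed_out)
```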
    

    Exception handling:

    import requests
    
    from requests.exceptions import ReadTimeout,ConnectionError,RequestException
    
    try:
        response = requests.get("http://httpbin.org/get", timeout=0.1)
        print(response.status_code)
    except ReadTimeout:
        # the server accepted the connection but was too slow to respond
        print("timeout")
    except ConnectionError:
        # DNS failure, refused connection, and similar network problems
        print("connection error")
    except RequestException:
        # catch-all base class for any other requests error
        print("error")
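    Beyond the transport-level exceptions above, a response with an HTTP error status (4xx/5xx) does not raise by itself; response.raise_for_status() turns it into an HTTPError. A sketch using a hand-built Response object so no network is needed (constructing Response directly is purely illustrative; normally it comes back from requests.get):

```python
import requests
from requests.exceptions import HTTPError

# build a bare Response and give it an error status (illustrative only)
response = requests.models.Response()
response.status_code = 404

try:
    response.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
    raised = False
except HTTPError:
    raised = True

print(raised)  # → True
```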
    
