Scrapy Notes

Word count: 341 · 2019-04-19

#python

Install

pip install Scrapy

CLI

Demo

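# Total number of pages: the digits in the pager's "last" link text
last = response.css('div.pg .last').xpath('text()').re(r'\d+')[0]

# Inside a loop over the entry selectors (`animal`), fill the item's fields
item['name'] = animal.css('h3 a').xpath('@title').extract_first()
item['created_date'] = animal.css('h6').xpath('text()').extract_first()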

XPath

CSSSelector

Reference

Scrapy documentation - docs.scrapy.org

Tor

brew install tor
brew services start tor

pip install requests
# quote the extras so shells such as zsh don't treat the brackets as a glob
pip install 'requests[socks]'
pip install 'requests[security]'
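
import requests

url = 'https://ident.me'  # echoes back the IP address the request came from

# Direct request: prints your own public IP
print(requests.get(url, timeout=30).text)

# Route through Tor's SOCKS port (9050 by default);
# "socks5h" also resolves hostnames through Tor instead of locally
proxies = {
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050'
}

# Proxied request: prints the IP of a Tor exit node
print(requests.get(url, proxies=proxies, timeout=30).text)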
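When many requests should go through Tor, a `requests.Session` can hold the proxy settings once and reuse connections. This is a minimal sketch: the helper name `tor_session` is hypothetical, and port 9050 assumes Tor's default SocksPort. With plain `socks5://`, requests resolves hostnames locally, so DNS lookups bypass Tor; `socks5h://` hands resolution to the proxy.

```python
import requests

# Tor's default SOCKS port; the "h" makes DNS lookups go through Tor as well
TOR_PROXY = 'socks5h://127.0.0.1:9050'


def tor_session(proxy=TOR_PROXY):
    """Build a requests.Session that sends both HTTP and HTTPS traffic via Tor."""
    session = requests.Session()
    session.proxies.update({'http': proxy, 'https': proxy})
    return session


# Usage (needs Tor running and requests[socks] installed):
#   s = tor_session()
#   print(s.get('https://ident.me', timeout=30).text)
```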

Obtaining a new identity

pip install stem

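Enable Tor's control port by adding these lines to /usr/local/etc/tor/torrc (on Apple Silicon Macs the Homebrew prefix is /opt/homebrew instead of /usr/local):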

ControlPort 9051
CookieAuthentication 1

Restart Tor so the new settings take effect:

brew services restart tor
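
from stem import Signal
from stem.control import Controller

# Connect to the ControlPort configured above; with CookieAuthentication 1,
# authenticate() reads Tor's auth cookie file automatically
with Controller.from_port(port=9051) as c:
    c.authenticate()
    c.signal(Signal.NEWNYM)  # ask Tor for fresh circuits (usually a different exit IP)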