Implementing an HTTP client in 700 lines of Python
This article builds an HTTP client directly on top of TCP in Python. The client speaks HTTP/1.1 and can reuse TCP connections.
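I. Creating an HTTP request

HTTP runs on top of a TCP connection. A request message consists of a request line, zero or more header lines, an empty line, and an optional body; every line of the head ends with \r\n.

So all we need to do is open a TCP connection to the server, write a message in this format, and send it to the server. That is one HTTP request.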
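As a rough illustration of that layout, the sketch below assembles a request message by hand. The helper name `build_request` and the host `example.com` are assumptions made for this example only; they are not part of the client developed in this article.

```python
from typing import Dict, Optional


def build_request(method: str, path: str, headers: Dict[str, str],
                  body: Optional[bytes] = None) -> bytes:
    """Assemble a raw HTTP/1.1 request message (hypothetical helper)."""
    lines = [f'{method} {path} HTTP/1.1']                    # request line
    lines += [f'{name}: {value}' for name, value in headers.items()]
    # Lines are separated by CRLF; an empty line ends the header section.
    head = '\r\n'.join(lines).encode('latin-1') + b'\r\n\r\n'
    return head + (body or b'')


raw = build_request('GET', '/', {'Host': 'example.com'})
print(raw)  # b'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n'
```

Writing these bytes to a socket returned by `socket.create_connection` is, in principle, all it takes to issue the request; the rest of the article is about doing this robustly.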
1. The HTTPConnection class

Based on the analysis above, we first define an HTTPConnection class that manages the connection and the request content:

class HTTPConnection:
    default_port = 80
    _http_vsn = 11
    _http_vsn_str = 'HTTP/1.1'

    def __init__(self, host: str, port: Optional[int] = None) -> None:
        self.sock = None
        self._buffer = []
        self.host = host
        self.port = port if port is not None else self.default_port
        self._state = _CS_IDLE
        self._response = None
        self._method = None
        self.block_size = 8192

    def _output(self, s: Union[str, bytes]) -> None:
        # Append one line of the request head to the buffer
        if hasattr(s, 'encode'):
            s = s.encode('latin-1')
        self._buffer.append(s)

    def connect(self) -> None:
        self.sock = socket.create_connection((self.host, self.port))

With this HTTPConnection object, we only need to open the TCP connection, write the request data into the buffer in HTTP format, and finally send the buffered data.
2. Writing the request line

The request line is simple: it states the request method, the request path, and the HTTP version. The following method writes a request line:

def put_request(self, method: str, url: str) -> None:
    self._method = method
    url = url or '/'
    request = f'{method} {url} {self._http_vsn_str}'
    self._output(request)

3. Adding request headers
HTTP request headers resemble a Python dict: each line maps a field name to a value. HTTP does not require every valid header to be set, so we only set the ones we need. The following code adds a header:

def put_header(self, header: Union[bytes, str], value: Union[bytes, str, int]) -> None:
    if hasattr(header, 'encode'):
        header = header.encode('ascii')
    if hasattr(value, 'encode'):
        value = value.encode('latin-1')
    elif isinstance(value, int):
        value = str(value).encode('ascii')
    header = header + b': ' + value
    self._output(header)
In addition, the Host header field is mandatory in an HTTP/1.1 request; without it, a site may refuse to respond. So if the user did not set this field, we add it ourselves:

def _add_host(self, url: str) -> None:
    # Every HTTP/1.1 request message must contain a Host header field.
    # If the user did not supply one, this method generates it.
    netloc = ''
    if url.startswith('http'):
        netloc = urlsplit(url).netloc
    if netloc:
        try:
            netloc_enc = netloc.encode('ascii')
        except UnicodeEncodeError:
            netloc_enc = netloc.encode('idna')
        self.put_header('Host', netloc_enc)
    else:
        host = self.host
        port = self.port
        try:
            host_enc = host.encode('ascii')
        except UnicodeEncodeError:
            host_enc = host.encode('idna')
        # IPv6 addresses must be wrapped in brackets
        if host.find(':') >= 0:
            host_enc = b'[' + host_enc + b']'
        if port == self.default_port:
            self.put_header('Host', host_enc)
        else:
            host_enc = host_enc.decode('ascii')
            self.put_header('Host', f'{host_enc}:{port}')

4. Sending the request body
We accept two kinds of body data: a readable file object based on io.IOBase, or an object that yields data when iterated. Before transferring anything, we first decide whether the data will be sent in chunks:

def request(self, method: str, url: str, headers: dict = None,
            body: Union[io.IOBase, Iterable] = None,
            encode_chunked: bool = False) -> None:
    ...
    if 'content-length' not in header_names:
        if 'transfer-encoding' not in header_names:
            encode_chunked = False
            content_length = self._get_content_length(body, method)
            if content_length is None:
                if body is not None:
                    # Here body is usually a generator or a readable file, so send it in chunks
                    encode_chunked = True
                    self.put_header('Transfer-Encoding', 'chunked')
            else:
                self.put_header('Content-Length', str(content_length))
        else:
            # Transfer-Encoding was set by the user, so the encode_chunked argument decides
            pass
    else:
        # Whenever Content-Length is given, the body is never chunked
        encode_chunked = False
    ...

@staticmethod
def _get_content_length(body: Union[str, bytes, bytearray, Iterable, io.IOBase],
                        method: str) -> Optional[int]:
    if body is None:
        # PUT, POST and PATCH are expected to carry a body by default
        if method.upper() in _METHODS_EXPECTING_BODY:
            return 0
        else:
            return None
    if hasattr(body, 'read'):
        return None
    try:
        # For bytes or bytearray data, get the length through memoryview
        return memoryview(body).nbytes
    except TypeError:
        pass
    if isinstance(body, str):
        return len(body)
    return None
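To make the length detection more concrete, here is a small standalone sketch of the same idea. The function name `probe_length` is hypothetical and the sample bodies are made up for illustration; it only mirrors the logic of `_get_content_length` for a non-empty body.

```python
from typing import Optional


def probe_length(body) -> Optional[int]:
    """Return the body size in bytes if it is known up front, else None."""
    if hasattr(body, 'read'):            # file-like: size unknown without reading
        return None
    try:
        return memoryview(body).nbytes   # bytes, bytearray and other buffers
    except TypeError:
        pass
    if isinstance(body, str):            # sent as latin-1: one byte per character
        return len(body)
    return None                          # generators and other iterables


for candidate in (b'abc', bytearray(5), 'hello', iter([b'a', b'b'])):
    length = probe_length(candidate)
    framing = 'chunked' if length is None else f'Content-Length: {length}'
    print(type(candidate).__name__, '->', framing)
```

Anything whose size cannot be determined without consuming it falls through to `None`, which is what triggers chunked transfer in the `request` method.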
Once the chunking decision is made, the body can be sent. If body is a readable file, we call the _read_readable method to wrap it in a generator:

def _send_body(self, message_body: Union[str, bytes, bytearray, Iterable, io.IOBase],
               encode_chunked: bool) -> None:
    if hasattr(message_body, 'read'):
        chunks = self._read_readable(message_body)
    else:
        try:
            memoryview(message_body)
        except TypeError:
            try:
                chunks = iter(message_body)
            except TypeError:
                raise TypeError(
                    f'message_body should be a bytes-like object or an iterable, '
                    f'got {repr(type(message_body))}')
        else:
            # A bytes-like object is sent in a single iteration
            chunks = (message_body,)
    for chunk in chunks:
        if not chunk:
            continue
        if encode_chunked:
            chunk = f'{len(chunk):X}\r\n'.encode('ascii') + chunk + b'\r\n'
        self.send(chunk)
    if encode_chunked:
        self.send(b'0\r\n\r\n')

def _read_readable(self, readable: io.IOBase) -> Generator[bytes, None, None]:
    need_encode = False
    if isinstance(readable, io.TextIOBase):
        need_encode = True
    while True:
        data_block = readable.read(self.block_size)
        if not data_block:
            break
        if need_encode:
            data_block = data_block.encode('utf-8')
        yield data_block

II. Getting the response data
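An HTTP response message looks much like a request message: a status line, header lines, an empty line, and then the body.

So we only need to read what the server sends through the HTTPConnection's socket and parse it according to this format.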
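Before building the real class, the parsing steps can be sketched in a few lines. This is a simplified illustration under stated assumptions: `parse_response` is a hypothetical helper, the response is already fully in memory, a reason phrase is present, and the body is delimited by Content-Length. `io.BytesIO` stands in for the file object a socket would provide.

```python
import io
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Split a complete raw response into status code, headers and body."""
    fp = io.BytesIO(raw)                    # stands in for sock.makefile('rb')
    version, status, reason = fp.readline().decode('latin-1').split(None, 2)
    headers = {}
    while True:
        line = fp.readline()
        if line in (b'\r\n', b'\n', b''):   # empty line: end of the headers
            break
        name, _, value = line.decode('latin-1').partition(':')
        headers[name.lower()] = value.strip()
    body = fp.read(int(headers.get('content-length', 0)))
    return int(status), headers, body


raw = (b'HTTP/1.1 200 OK\r\n'
       b'Content-Type: text/plain\r\n'
       b'Content-Length: 5\r\n'
       b'\r\n'
       b'hello')
print(parse_response(raw))
```

The class developed below follows the same order (status line, headers, body) but adds error handling, chunked bodies, and incremental reads.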
1. The HTTPResponse class

We first define a simple HTTPResponse class. Its attributes are essentially the socket's file object plus some information about the request. Call its begin method to parse the status line and the response headers, then call read to read the response body:

class HTTPResponse:
    def __init__(self, sock: socket.socket, method: Optional[str] = None) -> None:
        self.fp = sock.makefile('rb')
        self._method = method
        self.headers = None
        self.version = _UNKNOWN
        self.status = _UNKNOWN
        self.reason = _UNKNOWN
        self.chunked = _UNKNOWN
        self.chunk_left = _UNKNOWN
        self.length = _UNKNOWN
        self.will_close = _UNKNOWN

    def begin(self) -> None:
        ...

    def read(self, amount: Optional[int] = None) -> bytes:
        ...

2. Parsing the status line
Parsing the status line is fairly simple: read the first line of the response and split it into three parts, the HTTP version, the status code, and the reason phrase:

def _read_status(self) -> Tuple[str, int, str]:
    line = str(self._read_line(), 'latin-1')
    if not line:
        raise RemoteDisconnected('Remote end closed connection without response')
    try:
        version, status, reason = line.split(None, 2)
    except ValueError:
        # The reason phrase is only for humans and usually mirrors the status, so it may be absent
        try:
            version, status = line.split(None, 1)
            reason = ''
        except ValueError:
            version, status, reason = '', '', ''
    if not version.startswith('HTTP/'):
        self.close()
        raise BadStatusLine(line)
    try:
        status = int(status)
        if status < 100 or status > 999:
            raise BadStatusLine(line)
    except ValueError:
        raise BadStatusLine(line)
    return version, status, reason.strip()
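The splitting rule is easy to try out in isolation. The following sketch uses a hypothetical function `split_status_line` and plain `ValueError` instead of the article's exception classes; the sample lines are invented for illustration.

```python
from typing import Tuple


def split_status_line(line: str) -> Tuple[str, int, str]:
    """Split a status line into (version, status, reason); reason may be empty."""
    parts = line.split(None, 2)
    if len(parts) < 2 or not parts[0].startswith('HTTP/'):
        raise ValueError(f'bad status line: {line!r}')
    version, status = parts[0], int(parts[1])   # int() raises ValueError as well
    if not 100 <= status <= 999:
        raise ValueError(f'bad status code: {status}')
    reason = parts[2].strip() if len(parts) == 3 else ''
    return version, status, reason


print(split_status_line('HTTP/1.1 404 Not Found\r\n'))  # ('HTTP/1.1', 404, 'Not Found')
print(split_status_line('HTTP/1.1 200\r\n'))            # ('HTTP/1.1', 200, '')
```

Limiting the split to two separators keeps a multi-word reason phrase such as "Not Found" in one piece.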
If the status code is 100, the client has to parse more than one status line. The reason is this: when the request data is large, some clients hold the data back and first send the headers with Expect: 100-continue. If the server is willing to accept the data it replies with status 100, and only then does the client send the data. So when we read a 100 status code, a final response usually follows; we skip the headers of the 100 response and go on to read the next status line. The code for this part is:

def begin(self) -> None:
    while True:
        version, status, reason = self._read_status()
        if status != HTTPStatus.CONTINUE:
            break
        # Skip the header lines that belong to the 100 response
        while True:
            skip = self._read_line().strip()
            if not skip:
                break
    self.status = status
    self.reason = reason
    if version in ('HTTP/1.0', 'HTTP/0.9'):
        self.version = 10
    elif version.startswith('HTTP/1.'):
        self.version = 11
    else:
        # HTTP/2 has not been studied yet, so it is not handled here
        raise UnknownProtocol(version)
    ...

3. Parsing the response headers
Parsing the response headers is even simpler than parsing the status line. Each header field occupies one line, so we just keep calling _read_line until the headers end.

def _parse_header(self) -> None:
    headers = {}
    while True:
        line = self._read_line()
        if len(headers) > _MAX_HEADERS:
            raise HTTPException('got more than %d headers' % _MAX_HEADERS)
        if line in _EMPTY_LINE:
            break
        line = line.decode('latin-1')
        i = line.find(':')
        if i == -1:
            raise BadHeaderLine(line)
        # Duplicate header names are assumed not to occur here
        key, value = line[:i].lower(), line[i + 1:].strip()
        headers[key] = value
    self.headers = headers

4. Receiving the response body
Before receiving the response body, we first have to determine how it is transferred and how long it is:

def _set_chunk(self) -> None:
    transfer_encoding = self.get_header('transfer-encoding')
    if transfer_encoding and transfer_encoding.lower() == 'chunked':
        self.chunked = True
        self.chunk_left = None
    else:
        self.chunked = False

def _set_length(self) -> None:
    # First we need to know whether the data is chunked
    if self.chunked == _UNKNOWN:
        self._set_chunk()
    # Status 1xx, 204 (no content) and 304 (use the cached copy) have no response body.
    # A response to a HEAD request must not have a body either.
    if (self.status == HTTPStatus.NO_CONTENT or
            self.status == HTTPStatus.NOT_MODIFIED or
            100 <= self.status < 200 or
            self._method == 'HEAD'):
        self.length = 0
        return
    length = self.get_header('content-length')
    if length and not self.chunked:
        try:
            self.length = int(length)
        except ValueError:
            self.length = None
        else:
            if self.length < 0:
                self.length = None
    else:
        self.length = None
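The same decision can be written as one pure function, which makes the rules easy to check against sample inputs. This is a sketch only: `body_framing` is a hypothetical name, and it takes the already lower-cased header dict as an argument instead of reading it from the response object.

```python
from typing import Dict, Optional, Tuple


def body_framing(status: int, method: str,
                 headers: Dict[str, str]) -> Tuple[bool, Optional[int]]:
    """Return (chunked, length); a length of None means the size is unknown."""
    chunked = headers.get('transfer-encoding', '').lower() == 'chunked'
    # 1xx, 204, 304 and replies to HEAD never carry a body
    if status in (204, 304) or 100 <= status < 200 or method == 'HEAD':
        return chunked, 0
    raw = headers.get('content-length')
    if raw is None or chunked:
        return chunked, None
    try:
        length = int(raw)
    except ValueError:
        return chunked, None
    return chunked, (length if length >= 0 else None)


print(body_framing(200, 'GET', {'content-length': '12'}))         # (False, 12)
print(body_framing(200, 'GET', {'transfer-encoding': 'chunked'}))  # (True, None)
print(body_framing(204, 'GET', {'content-length': '12'}))         # (False, 0)
```

When the result is `(False, None)`, the only way to find the end of the body is to read until the server closes the connection.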
Next, we implement a read method that reads a given amount of data from the body:

def read(self, amount: Optional[int] = None) -> bytes:
    if self.is_closed():
        return b''
    if self._method == 'HEAD':
        self.close()
        return b''
    if amount is None:
        return self._read_all()
    return self._read_amount(amount)
If no size is specified, all of the data is read by default:

def _read_all(self) -> bytes:
    if self.chunked:
        return self._read_all_chunk()
    if self.length is None:
        s = self.fp.read()
    else:
        try:
            s = self._read_bytes(self.length)
        except IncompleteRead:
            self.close()
            raise
    self.length = 0
    self.close()
    return s

def _read_all_chunk(self) -> bytes:
    assert self.chunked != _UNKNOWN
    value = []
    try:
        while True:
            chunk = self._read_chunk()
            if chunk is None:
                break
            value.append(chunk)
        return b''.join(value)
    except IncompleteRead:
        raise IncompleteRead(b''.join(value))

def _read_chunk(self) -> Optional[bytes]:
    try:
        chunk_size = self._read_chunk_size()
    except ValueError:
        raise IncompleteRead(b'')
    if chunk_size == 0:
        self._read_and_discard_trailer()
        self.close()
        return None
    chunk = self._read_bytes(chunk_size)
    # Every chunk ends with a \r\n; consume it here
    self._read_bytes(2)
    return chunk

def _read_chunk_size(self) -> int:
    line = self._read_line(error_message='chunk size')
    i = line.find(b';')
    if i >= 0:
        line = line[:i]
    try:
        return int(line, 16)
    except ValueError:
        self.close()
        raise

def _read_and_discard_trailer(self) -> None:
    # Extra information such as an MD5 digest or an expiry time may follow the last chunk;
    # it is normally announced with the Trailer field in the headers.
    # This method is called once the chunks are read, and simply discards that information.
    while True:
        line = self._read_line(error_message='chunk size')
        if line in _EMPTY_LINE:
            break
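To see the chunked wire format end to end, the sketch below decodes a small hand-written chunked body. The function name `decode_chunked` and the sample bytes are assumptions for illustration; error handling is omitted, and `io.BytesIO` again stands in for the socket's file object.

```python
import io


def decode_chunked(fp: io.BufferedIOBase) -> bytes:
    """Read a complete chunked body from fp and return the joined payload."""
    body = []
    while True:
        size_line = fp.readline()
        size = int(size_line.split(b';', 1)[0], 16)    # drop any chunk extension
        if size == 0:
            break
        body.append(fp.read(size))
        fp.read(2)                                     # CRLF that ends the chunk
    while fp.readline() not in (b'\r\n', b'\n', b''):  # skip optional trailer fields
        pass
    return b''.join(body)


wire = b'5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n'
print(decode_chunked(io.BytesIO(wire)))  # b'hello world'
```

Each chunk announces its own size in hexadecimal, which is why the reader never needs to know the total length in advance.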
Otherwise, only part of the data is read, and if the data happens to be chunked this gets more involved. In short, we allocate a bytearray of the requested size and keep reading chunks into it until it is full or the data runs out. chunk_left then records how much of the current chunk remains, ready for the next read.

def _read_amount(self, amount: int) -> bytes:
    if self.chunked:
        return self._read_amount_chunk(amount)
    if isinstance(self.length, int) and amount > self.length:
        amount = self.length
    container = bytearray(amount)
    n = self.fp.readinto(container)
    if not n and container:
        # No more bytes can be read, so the response can be closed
        self.close()
    elif self.length is not None:
        self.length -= n
        if not self.length:
            self.close()
    return memoryview(container)[:n].tobytes()

def _read_amount_chunk(self, amount: int) -> bytes:
    # Read `amount` bytes of chunked data; if less is available, read all of it
    assert self.chunked != _UNKNOWN
    total_bytes = 0
    container = bytearray(amount)
    mvb = memoryview(container)
    try:
        while True:
            # mvb is the still-empty part of the container.
            # We keep calling _full_readinto to fill it, so mvb shrinks while we count the bytes written.
            # The loop ends when the data runs out or the current chunk can fill mvb completely.
            chunk_left = self._get_chunk_left()
            if chunk_left is None:
                break
            if len(mvb) <= chunk_left:
                n = self._full_readinto(mvb)
                self.chunk_left = chunk_left - n
                total_bytes += n
                break
            temp_mvb = mvb[:chunk_left]
            n = self._full_readinto(temp_mvb)
            mvb = mvb[n:]
            total_bytes += n
            self.chunk_left = 0
    except IncompleteRead:
        raise IncompleteRead(bytes(container[:total_bytes]))
    return memoryview(container)[:total_bytes].tobytes()

def _full_readinto(self, container: memoryview) -> int:
    # Return the number of bytes read; raise if the container could not be filled
    amount = len(container)
    n = self.fp.readinto(container)
    if n < amount:
        raise IncompleteRead(bytes(container[:n]), amount - n)
    return n

def _get_chunk_left(self) -> Optional[int]:
    # If the current chunk is only partly read, simply return self.chunk_left.
    # Otherwise there are three cases:
    # 1) chunk_left is None: the body has not been read at all, so return the size of the first chunk
    # 2) chunk_left is 0: the current chunk is finished, so return the size of the next chunk
    # 3) the body is exhausted: return None and clean up
    chunk_left = self.chunk_left
    if not chunk_left:
        if chunk_left == 0:
            # Zero left means the previous chunk is finished; consume its trailing \r\n.
            # None means the chunks have not been touched yet.
            self._read_bytes(2)
        try:
            chunk_left = self._read_chunk_size()
        except ValueError:
            raise IncompleteRead(b'')
        if chunk_left == 0:
            self._read_and_discard_trailer()
            self.close()
            chunk_left = None
        self.chunk_left = chunk_left
    return chunk_left

III. Reusing the TCP connection
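HTTP communication is essentially sending and receiving HTTP requests and responses over a TCP connection. As long as the TCP connection stays open, we can keep using it for further HTTP requests, which avoids the cost of creating and tearing down TCP connections.

The server closes the connection on its own in the following cases:

- The HTTP version is lower than 1.1 and keep-alive is not set in the headers.
- The HTTP version is 1.1 or higher but the headers contain Connection: close.
- The data is not chunked and its length is not stated. In this case the server normally closes the connection after sending, so that the client knows the data is complete.

Based on the cases listed above, the following code decides whether the connection will be closed:

def _check_close(self) -> bool:
    conn = self.get_header('connection')
    if not self.chunked and self.length is None:
        return True
    if self.version == 11:
        if conn and 'close' in conn.lower():
            return True
        return False
    else:
        if self.headers.get('keep-alive'):
            return False
        if conn and 'keep-alive' in conn.lower():
            return False
        return True

2. Closing the HTTPResponse object correctly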
Because the TCP connection is reused, one HTTPConnection can produce several HTTPResponse objects. These objects live on the same TCP connection and share its read buffer. As a result, if the previous HTTPResponse object did not read all of its own data, the next response is affected.

On the other hand, we also need to close the file object associated with this TCP connection promptly so that it does not hold on to resources. We therefore define the following close method for an HTTPResponse object:

def close(self) -> None:
    if self.is_closed():
        return
    fp = self.fp
    self.fp = None
    fp.close()

def is_closed(self) -> bool:
    return self.fp is None

When the user calls the read method of an HTTPResponse object and the buffered data has been read completely, close is called automatically (see part II, section 4, "Receiving the response body", for the implementation). So before fetching the next response, we only need to call is_closed on this object to find out whether the read buffer has been drained and whether we can go on to receive another response.
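The effect of an unread body on the next response can be reproduced without a network. In this sketch an `io.BytesIO` plays the role of the shared read buffer, and `read_head` is a hypothetical helper that consumes a status line and its headers; the two responses are invented sample data.

```python
import io

stream = io.BytesIO(           # stands in for the socket's shared read buffer
    b'HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nfirst'
    b'HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nsecond'
)


def read_head(fp) -> bytes:
    """Consume the status line and the headers; return the status line."""
    status_line = fp.readline()
    while fp.readline() not in (b'\r\n', b'\n', b''):
        pass
    return status_line


read_head(stream)
stream.read(3)                      # only 3 of the 5 body bytes are consumed
leftover_line = read_head(stream)   # the "next response" now starts mid-body
print(leftover_line)                # b'stHTTP/1.1 200 OK\r\n'
```

The second "status line" begins with the two leftover body bytes, so it no longer starts with `HTTP/` and would be rejected as a bad status line. This is why a response must be fully read, and therefore closed, before the next one is parsed.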
3. The life cycle of an HTTP request

Without pipelining, HTTP requests must be carried out one after another and cannot overlap. For this reason we give the HTTPConnection object three states, IDLE, REQ_STARTED and REQ_SENT. A complete request passes through them in this order: IDLE -> REQ_STARTED (put_request) -> REQ_SENT (end_headers) -> IDLE (get_response).

Following this flow, we modify the corresponding methods of HTTPConnection:

def get_response(self) -> HTTPResponse:
    if self._response and self._response.is_closed():
        self._response = None
    if self._state != _CS_REQ_SENT or self._response:
        raise ResponseNotReady(self._state)
    response = HTTPResponse(self.sock, method=self._method)
    try:
        try:
            response.begin()
        except ConnectionError:
            self.close()
            raise
        assert response.will_close != _UNKNOWN
        self._state = _CS_IDLE
        if response.will_close:
            self.close()
        else:
            self._response = response
        return response
    except Exception:
        response.close()
        raise

def put_request(self, method: str, url: str) -> None:
    # Call this method to start a new request; it writes the request line into the buffer.
    # It may only be called while the connection is idle.
    # If a previous response still exists and is already finished, it is dropped automatically.
    if self._response and self._response.is_closed():
        self._response = None
    if self._state == _CS_IDLE:
        self._state = _CS_REQ_STARTED
    else:
        raise CannotSendRequest(self._state)
    ...

def put_header(self, header: Union[bytes, str], value: Union[bytes, str, int]) -> None:
    if self._state != _CS_REQ_STARTED:
        raise CannotSendHeader()
    ...

def end_headers(self, message_body=None, encode_chunked=False) -> None:
    if self._state == _CS_REQ_STARTED:
        self._state = _CS_REQ_SENT
    else:
        raise CannotSendHeader()
    ...

Note that if a second request has already reached the stage of fetching its response while the response to the previous request is still open, we must raise an error immediately. Otherwise we would read the unread remainder of the previous response, and parsing the new response would go wrong.
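The state transitions can be exercised on their own with a toy model. `RequestCycle` below is a hypothetical class written for this illustration; it uses `RuntimeError` in place of the article's exception classes and tracks only the state and whether the last response is still open.

```python
IDLE, REQ_STARTED, REQ_SENT = 'Idle', 'Request-started', 'Request-sent'


class RequestCycle:
    """Toy model of the connection states (hypothetical, no I/O)."""

    def __init__(self) -> None:
        self.state = IDLE
        self.response_open = False

    def put_request(self) -> None:
        if self.state != IDLE:
            raise RuntimeError(f'cannot start a request in state {self.state}')
        self.state = REQ_STARTED

    def end_headers(self) -> None:
        if self.state != REQ_STARTED:
            raise RuntimeError(f'cannot finish headers in state {self.state}')
        self.state = REQ_SENT

    def get_response(self) -> None:
        if self.state != REQ_SENT or self.response_open:
            raise RuntimeError('response not ready')
        self.state = IDLE
        self.response_open = True    # stays open until the body is fully read

    def finish_reading(self) -> None:
        self.response_open = False


cycle = RequestCycle()
cycle.put_request()
cycle.end_headers()
cycle.get_response()
cycle.put_request()                  # sending the next request is allowed
cycle.end_headers()
try:
    cycle.get_response()             # refused: the previous body is still unread
except RuntimeError as exc:
    print('refused:', exc)
cycle.finish_reading()
cycle.get_response()                 # accepted once the first response is closed
```

Note that the model, like the code above, lets the second request be sent while the first response is open; only fetching the second response is blocked.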
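In fact, HTTP/1.1 introduced support for pipelining: several HTTP requests are submitted at once and the client then waits for the responses, instead of sending each request only after the response to the previous one has arrived. In theory this reduces time lost to I/O and improves efficiency. However, it needs server support and makes the program more complex, so it is not implemented here.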
IV. Summary

1. Complete code

The complete code of HTTPConnection is as follows:

class HTTPConnection:
    default_port = 80
    _http_vsn = 11
    _http_vsn_str = 'HTTP/1.1'

    def __init__(self, host: str, port: Optional[int] = None) -> None:
        self.sock = None
        self._buffer = []
        self.host = host
        self.port = port if port is not None else self.default_port
        self._state = _CS_IDLE
        self._response = None
        self._method = None
        self.block_size = 8192

    def request(self, method: str, url: str, headers: dict = None,
                body: Union[io.IOBase, Iterable] = None,
                encode_chunked: bool = False) -> None:
        self.put_request(method, url)
        headers = headers or {}
        header_names = frozenset(k.lower() for k in headers.keys())
        if 'host' not in header_names:
            self._add_host(url)
        if 'content-length' not in header_names:
            if 'transfer-encoding' not in header_names:
                encode_chunked = False
                content_length = self._get_content_length(body, method)
                if content_length is None:
                    if body is not None:
                        encode_chunked = True
                        self.put_header('Transfer-Encoding', 'chunked')
                else:
                    self.put_header('Content-Length', str(content_length))
            else:
                # Transfer-Encoding was set by the user, so the encode_chunked argument decides
                pass
        else:
            # Whenever Content-Length is given, the body is never chunked
            encode_chunked = False
        for hdr, value in headers.items():
            self.put_header(hdr, value)
        if isinstance(body, str):
            body = _encode(body)
        self.end_headers(body, encode_chunked=encode_chunked)

    def send(self, data: bytes) -> None:
        if self.sock is None:
            self.connect()
        self.sock.sendall(data)

    def get_response(self) -> HTTPResponse:
        if self._response and self._response.is_closed():
            self._response = None
        if self._state != _CS_REQ_SENT or self._response:
            raise ResponseNotReady(self._state)
        response = HTTPResponse(self.sock, method=self._method)
        try:
            try:
                response.begin()
            except ConnectionError:
                self.close()
                raise
            assert response.will_close != _UNKNOWN
            self._state = _CS_IDLE
            if response.will_close:
                self.close()
            else:
                self._response = response
            return response
        except Exception:
            response.close()
            raise

    def connect(self) -> None:
        self.sock = socket.create_connection((self.host, self.port))

    def close(self) -> None:
        self._state = _CS_IDLE
        try:
            sock = self.sock
            if sock:
                self.sock = None
                sock.close()
        finally:
            response = self._response
            if response:
                self._response = None
                response.close()

    def put_request(self, method: str, url: str) -> None:
        # Call this method to start a new request; it writes the request line into the buffer.
        # It may only be called while the connection is idle.
        # If a previous response still exists and is already finished, it is dropped automatically.
        if self._response and self._response.is_closed():
            self._response = None
        if self._state == _CS_IDLE:
            self._state = _CS_REQ_STARTED
        else:
            raise CannotSendRequest(self._state)
        self._method = method
        url = url or '/'
        request = f'{method} {url} {self._http_vsn_str}'
        self._output(request)

    def put_header(self, header: Union[bytes, str], value: Union[bytes, str, int]) -> None:
        if self._state != _CS_REQ_STARTED:
            raise CannotSendHeader()
        if hasattr(header, 'encode'):
            header = header.encode('ascii')
        if hasattr(value, 'encode'):
            value = value.encode('latin-1')
        elif isinstance(value, int):
            value = str(value).encode('ascii')
        header = header + b': ' + value
        self._output(header)

    def end_headers(self, message_body=None, encode_chunked=False) -> None:
        if self._state == _CS_REQ_STARTED:
            self._state = _CS_REQ_SENT
        else:
            raise CannotSendHeader()
        self._send_output(message_body, encode_chunked=encode_chunked)

    def _add_host(self, url: str) -> None:
        # Every HTTP/1.1 request message must contain a Host header field.
        # If the user did not supply one, this method generates it.
        netloc = ''
        if url.startswith('http'):
            netloc = urlsplit(url).netloc
        if netloc:
            try:
                netloc_enc = netloc.encode('ascii')
            except UnicodeEncodeError:
                netloc_enc = netloc.encode('idna')
            self.put_header('Host', netloc_enc)
        else:
            host = self.host
            port = self.port
            try:
                host_enc = host.encode('ascii')
            except UnicodeEncodeError:
                host_enc = host.encode('idna')
            # IPv6 addresses must be wrapped in brackets
            if host.find(':') >= 0:
                host_enc = b'[' + host_enc + b']'
            if port == self.default_port:
                self.put_header('Host', host_enc)
            else:
                host_enc = host_enc.decode('ascii')
                self.put_header('Host', f'{host_enc}:{port}')

    def _output(self, s: Union[str, bytes]) -> None:
        # Append data to the buffer
        if hasattr(s, 'encode'):
            s = s.encode('latin-1')
        self._buffer.append(s)

    def _send_output(self, message_body=None, encode_chunked=False) -> None:
        # Send and clear the buffered head; then send the request body, if there is one
        self._buffer.extend((b'', b''))
        msg = b'\r\n'.join(self._buffer)
        self._buffer.clear()
        self.send(msg)
        if message_body is not None:
            self._send_body(message_body, encode_chunked)

    def _send_body(self, message_body: Union[bytes, str, bytearray, Iterable, io.IOBase],
                   encode_chunked: bool) -> None:
        if hasattr(message_body, 'read'):
            chunks = self._read_readable(message_body)
        else:
            try:
                memoryview(message_body)
            except TypeError:
                try:
                    chunks = iter(message_body)
                except TypeError:
                    raise TypeError(
                        f'message_body should be a bytes-like object or an iterable, '
                        f'got {repr(type(message_body))}')
            else:
                # A bytes-like object is sent in a single iteration
                chunks = (message_body,)
        for chunk in chunks:
            if not chunk:
                continue
            if encode_chunked:
                chunk = f'{len(chunk):X}\r\n'.encode('ascii') + chunk + b'\r\n'
            self.send(chunk)
        if encode_chunked:
            self.send(b'0\r\n\r\n')

    def _read_readable(self, readable: io.IOBase) -> Generator[bytes, None, None]:
        need_encode = False
        if isinstance(readable, io.TextIOBase):
            need_encode = True
        while True:
            data_block = readable.read(self.block_size)
            if not data_block:
                break
            if need_encode:
                data_block = data_block.encode('utf-8')
            yield data_block

    @staticmethod
    def _get_content_length(body: Union[str, bytes, bytearray, Iterable, io.IOBase],
                            method: str) -> Optional[int]:
        if body is None:
            # PUT, POST and PATCH are expected to carry a body by default
            if method.upper() in _METHODS_EXPECTING_BODY:
                return 0
            else:
                return None
        if hasattr(body, 'read'):
            return None
        try:
            # For bytes or bytearray data, get the length through memoryview
            return memoryview(body).nbytes
        except TypeError:
            pass
        if isinstance(body, str):
            return len(body)
        return None
The complete code of HTTPResponse is as follows:

class HTTPResponse:
    def __init__(self, sock: socket.socket, method: Optional[str] = None) -> None:
        self.fp = sock.makefile('rb')
        self._method = method
        self.headers = None
        self.version = _UNKNOWN
        self.status = _UNKNOWN
        self.reason = _UNKNOWN
        self.chunked = _UNKNOWN
        self.chunk_left = _UNKNOWN
        self.length = _UNKNOWN
        self.will_close = _UNKNOWN

    def begin(self) -> None:
        if self.headers is not None:
            return
        self._parse_status_line()
        self._parse_header()
        self._set_chunk()
        self._set_length()
        self.will_close = self._check_close()

    def _read_line(self, limit: int = _MAX_LINE + 1, error_message: str = '') -> bytes:
        # Note: this method does not strip the trailing \r\n from the line
        line = self.fp.readline(limit)
        if len(line) > _MAX_LINE:
            raise LineTooLong(error_message)
        return line

    def _read_bytes(self, amount: int) -> bytes:
        data = self.fp.read(amount)
        if len(data) < amount:
            raise IncompleteRead(data, amount - len(data))
        return data

    def _parse_status_line(self) -> None:
        while True:
            version, status, reason = self._read_status()
            if status != HTTPStatus.CONTINUE:
                break
            # Skip the header lines that belong to the 100 response
            while True:
                skip = self._read_line(error_message='header line').strip()
                if not skip:
                    break
        self.status = status
        self.reason = reason
        if version in ('HTTP/1.0', 'HTTP/0.9'):
            self.version = 10
        elif version.startswith('HTTP/1.'):
            self.version = 11
        else:
            raise UnknownProtocol(version)

    def _read_status(self) -> Tuple[str, int, str]:
        line = str(self._read_line(error_message='status line'), 'latin-1')
        if not line:
            raise RemoteDisconnected('Remote end closed connection without response')
        try:
            version, status, reason = line.split(None, 2)
        except ValueError:
            # The reason phrase is only for humans and mirrors the status, so it may be absent
            try:
                version, status = line.split(None, 1)
                reason = ''
            except ValueError:
                version, status, reason = '', '', ''
        if not version.startswith('HTTP/'):
            self.close()
            raise BadStatusLine(line)
        try:
            status = int(status)
            if status < 100 or status > 999:
                raise BadStatusLine(line)
        except ValueError:
            raise BadStatusLine(line)
        return version, status, reason.strip()

    def _parse_header(self) -> None:
        headers = {}
        while True:
            line = self._read_line(error_message='header line')
            if len(headers) > _MAX_HEADERS:
                raise HTTPException('got more than %d headers' % _MAX_HEADERS)
            if line in _EMPTY_LINE:
                break
            line = line.decode('latin-1')
            i = line.find(':')
            if i == -1:
                raise BadHeaderLine(line)
            # Duplicate header names are assumed not to occur here
            key, value = line[:i].lower(), line[i + 1:].strip()
            headers[key] = value
        self.headers = headers

    def _set_chunk(self) -> None:
        transfer_encoding = self.get_header('transfer-encoding')
        if transfer_encoding and transfer_encoding.lower() == 'chunked':
            self.chunked = True
            self.chunk_left = None
        else:
            self.chunked = False

    def _set_length(self) -> None:
        # First we need to know whether the data is chunked
        if self.chunked == _UNKNOWN:
            self._set_chunk()
        # Status 1xx, 204 (no content) and 304 (use the cached copy) have no response body.
        # A response to a HEAD request must not have a body either.
        assert isinstance(self.status, int)
        if (self.status == HTTPStatus.NO_CONTENT or
                self.status == HTTPStatus.NOT_MODIFIED or
                100 <= self.status < 200 or
                self._method == 'HEAD'):
            self.length = 0
            return
        length = self.get_header('content-length')
        if length and not self.chunked:
            try:
                self.length = int(length)
            except ValueError:
                self.length = None
            else:
                if self.length < 0:
                    self.length = None
        else:
            self.length = None

    def _check_close(self) -> bool:
        conn = self.get_header('connection')
        if not self.chunked and self.length is None:
            return True
        if self.version == 11:
            if conn and 'close' in conn.lower():
                return True
            return False
        else:
            if self.headers.get('keep-alive'):
                return False
            if conn and 'keep-alive' in conn.lower():
                return False
            return True

    def close(self) -> None:
        if self.is_closed():
            return
        fp = self.fp
        self.fp = None
        fp.close()

    def is_closed(self) -> bool:
        return self.fp is None

    def read(self, amount: Optional[int] = None) -> bytes:
        if self.is_closed():
            return b''
        if self._method == 'HEAD':
            self.close()
            return b''
        if amount is None:
            return self._read_all()
        return self._read_amount(amount)

    def _read_all(self) -> bytes:
        if self.chunked:
            return self._read_all_chunk()
        if self.length is None:
            s = self.fp.read()
        else:
            try:
                s = self._read_bytes(self.length)
            except IncompleteRead:
                self.close()
                raise
        self.length = 0
        self.close()
        return s

    def _read_all_chunk(self) -> bytes:
        assert self.chunked != _UNKNOWN
        value = []
        try:
            while True:
                chunk = self._read_chunk()
                if chunk is None:
                    break
                value.append(chunk)
            return b''.join(value)
        except IncompleteRead:
            raise IncompleteRead(b''.join(value))

    def _read_chunk(self) -> Optional[bytes]:
        try:
            chunk_size = self._read_chunk_size()
        except ValueError:
            raise IncompleteRead(b'')
        if chunk_size == 0:
            self._read_and_discard_trailer()
            self.close()
            return None
        chunk = self._read_bytes(chunk_size)
        # Every chunk ends with a \r\n; consume it here
        self._read_bytes(2)
        return chunk

    def _read_chunk_size(self) -> int:
        line = self._read_line(error_message='chunk size')
        i = line.find(b';')
        if i >= 0:
            line = line[:i]
        try:
            return int(line, 16)
        except ValueError:
            self.close()
            raise

    def _read_and_discard_trailer(self) -> None:
        # Extra information such as an MD5 digest or an expiry time may follow the last chunk;
        # it is normally announced with the Trailer field in the headers.
        # This method is called once the chunks are read, and simply discards that information.
        while True:
            line = self._read_line(error_message='chunk size')
            if line in _EMPTY_LINE:
                break

    def _read_amount(self, amount: int) -> bytes:
        if self.chunked:
            return self._read_amount_chunk(amount)
        if isinstance(self.length, int) and amount > self.length:
            amount = self.length
        container = bytearray(amount)
        n = self.fp.readinto(container)
        if not n and container:
            # No more bytes can be read, so the response can be closed
            self.close()
        elif self.length is not None:
            self.length -= n
            if not self.length:
                self.close()
        return memoryview(container)[:n].tobytes()

    def _read_amount_chunk(self, amount: int) -> bytes:
        # Read `amount` bytes of chunked data; if less is available, read all of it
        assert self.chunked != _UNKNOWN
        total_bytes = 0
        container = bytearray(amount)
        mvb = memoryview(container)
        try:
            while True:
                # mvb is the still-empty part of the container.
                # We keep calling _full_readinto to fill it, so mvb shrinks while we count the bytes written.
                # The loop ends when the data runs out or the current chunk can fill mvb completely.
                chunk_left = self._get_chunk_left()
                if chunk_left is None:
                    break
                if len(mvb) <= chunk_left:
                    n = self._full_readinto(mvb)
                    self.chunk_left = chunk_left - n
                    total_bytes += n
                    break
                temp_mvb = mvb[:chunk_left]
                n = self._full_readinto(temp_mvb)
                mvb = mvb[n:]
                total_bytes += n
                self.chunk_left = 0
        except IncompleteRead:
            raise IncompleteRead(bytes(container[:total_bytes]))
        return memoryview(container)[:total_bytes].tobytes()

    def _full_readinto(self, container: memoryview) -> int:
        # Return the number of bytes read; raise if the container could not be filled
        amount = len(container)
        n = self.fp.readinto(container)
        if n < amount:
            raise IncompleteRead(bytes(container[:n]), amount - n)
        return n

    def _get_chunk_left(self) -> Optional[int]:
        # If the current chunk is only partly read, simply return self.chunk_left.
        # Otherwise there are three cases:
        # 1) chunk_left is None: the body has not been read at all, so return the size of the first chunk
        # 2) chunk_left is 0: the current chunk is finished, so return the size of the next chunk
        # 3) the body is exhausted: return None and clean up
        chunk_left = self.chunk_left
        if not chunk_left:
            if chunk_left == 0:
                # Zero left means the previous chunk is finished; consume its trailing \r\n.
                # None means the chunks have not been touched yet.
                self._read_bytes(2)
            try:
                chunk_left = self._read_chunk_size()
            except ValueError:
                raise IncompleteRead(b'')
            if chunk_left == 0:
                self._read_and_discard_trailer()
                self.close()
                chunk_left = None
            self.chunk_left = chunk_left
        return chunk_left

    def get_header(self, name, default: Optional[str] = None) -> Optional[str]:
        if self.headers is None:
            raise ResponseNotReady()
        return self.headers.get(name, default)

    @property
    def info(self) -> str:
        return repr(self.headers)
These two classes should live in the same .py file, together with some auxiliary code:

import io
import socket
from typing import Generator, Iterable, Optional, Tuple, Union
from urllib.parse import urlsplit

_CS_IDLE = 'Idle'
_CS_REQ_STARTED = 'Request-started'
_CS_REQ_SENT = 'Request-sent'

_METHODS_EXPECTING_BODY = {'PATCH', 'POST', 'PUT'}
_UNKNOWN = 'UNKNOWN'
_MAX_LINE = 65536
_MAX_HEADERS = 100
_EMPTY_LINE = (b'\r\n', b'\n', b'')


class HTTPStatus:
    CONTINUE = 100
    SWITCHING_PROTOCOLS = 101
    PROCESSING = 102
    OK = 200
    CREATED = 201
    ACCEPTED = 202
    NON_AUTHORITATIVE_INFORMATION = 203
    NO_CONTENT = 204
    RESET_CONTENT = 205
    PARTIAL_CONTENT = 206
    MULTI_STATUS = 207
    ALREADY_REPORTED = 208
    IM_USED = 226
    MULTIPLE_CHOICES = 300
    MOVED_PERMANENTLY = 301
    FOUND = 302
    SEE_OTHER = 303
    NOT_MODIFIED = 304
    USE_PROXY = 305
    TEMPORARY_REDIRECT = 307
    PERMANENT_REDIRECT = 308
    BAD_REQUEST = 400
    UNAUTHORIZED = 401
    PAYMENT_REQUIRED = 402
    FORBIDDEN = 403
    NOT_FOUND = 404
    METHOD_NOT_ALLOWED = 405
    NOT_ACCEPTABLE = 406
    PROXY_AUTHENTICATION_REQUIRED = 407
    REQUEST_TIMEOUT = 408
    CONFLICT = 409
    GONE = 410
    LENGTH_REQUIRED = 411
    PRECONDITION_FAILED = 412
    REQUEST_ENTITY_TOO_LARGE = 413
    REQUEST_URI_TOO_LONG = 414
    UNSUPPORTED_MEDIA_TYPE = 415
    REQUESTED_RANGE_NOT_SATISFIABLE = 416
    EXPECTATION_FAILED = 417
    MISDIRECTED_REQUEST = 421
    UNPROCESSABLE_ENTITY = 422
    LOCKED = 423
    FAILED_DEPENDENCY = 424
    UPGRADE_REQUIRED = 426
    PRECONDITION_REQUIRED = 428
    TOO_MANY_REQUESTS = 429
    REQUEST_HEADER_FIELDS_TOO_LARGE = 431
    UNAVAILABLE_FOR_LEGAL_REASONS = 451
    INTERNAL_SERVER_ERROR = 500
    NOT_IMPLEMENTED = 501
    BAD_GATEWAY = 502
    SERVICE_UNAVAILABLE = 503
    GATEWAY_TIMEOUT = 504
    HTTP_VERSION_NOT_SUPPORTED = 505
    VARIANT_ALSO_NEGOTIATES = 506
    INSUFFICIENT_STORAGE = 507
    LOOP_DETECTED = 508
    NOT_EXTENDED = 510
    NETWORK_AUTHENTICATION_REQUIRED = 511


class HTTPResponse:
    ...


class HTTPConnection:
    ...


def _encode(data: str, encoding: str = 'latin-1', name: str = 'data') -> bytes:
    # Use this to encode the request body and other data whose encoding is not known in advance;
    # latin-1 is the default.
    # Its advantage is that a failed encoding raises a detailed, self-explanatory error.
    try:
        return data.encode(encoding)
    except UnicodeEncodeError as err:
        raise UnicodeEncodeError(
            err.encoding, err.object, err.start, err.end,
            "{} ({!r:.20}) is not valid {}. Use {}.encode('utf-8') "
            "if you want to send it encoded in UTF-8.".format(
                name.title(), data[err.start:err.end], encoding, name)
        ) from None


class HTTPException(Exception):
    pass


class ImproperConnectionState(HTTPException):
    pass


class CannotSendRequest(ImproperConnectionState):
    pass


class CannotSendHeader(ImproperConnectionState):
    pass


class CannotCloseStream(ImproperConnectionState):
    pass


class ResponseNotReady(ImproperConnectionState):
    pass


class LineTooLong(HTTPException):
    def __init__(self, line_type):
        HTTPException.__init__(self, 'got more than %d bytes when reading %s'
                               % (_MAX_LINE, line_type))


class BadStatusLine(HTTPException):
    def __init__(self, line):
        if not line:
            line = repr(line)
        self.args = line,
        self.line = line


class BadHeaderLine(HTTPException):
    def __init__(self, line):
        if not line:
            line = repr(line)
        self.args = line,
        self.line = line


class RemoteDisconnected(ConnectionResetError, BadStatusLine):
    def __init__(self, *args, **kwargs):
        BadStatusLine.__init__(self, '')
        ConnectionResetError.__init__(self, *args, **kwargs)


class UnknownProtocol(HTTPException):
    def __init__(self, version):
        self.args = version,
        self.version = version


class UnknownTransferEncoding(HTTPException):
    pass


class IncompleteRead(HTTPException):
    def __init__(self, partial, expected=None):
        self.args = partial,
        self.partial = partial
        self.expected = expected

    def __repr__(self):
        if self.expected is not None:
            e = f', {self.expected} more expected'
        else:
            e = ''
        return f'{self.__class__.__name__}({len(self.partial)} bytes read{e})'

    __str__ = object.__str__

2. Points to note
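Overall, the material in this article is not complicated; HTTP is the kind of topic that is easy to understand but has many scattered details. Here is a summary of the points worth remembering:

- Request and response messages have roughly the same structure: a start line (request line or status line), headers, and a body. The start line and each header field end with a single \r\n, and the head is separated from the body by two of them, that is, by an empty line.
- The start line is mandatory, and a request needs at least the Host header. For everyone's convenience, you should also set Accept-Encoding and Accept to constrain the content and format of the data the server returns.
- The body is optional, especially for methods other than the "3P" (PATCH, POST, PUT). If you do have a body, state its length in the headers with Content-Length; if you send it in chunks, say so with the Transfer-Encoding field.
- With chunked transfer, each chunk has the form: length in hexadecimal + \r\n + data + \r\n, and the body is terminated by 0\r\n followed by an empty line. Between that last-chunk line and the final empty line you may place trailer fields holding, for example, an MD5 digest of the data or an expiry time; in that case it is best to announce them with the Trailer field in the headers.
- After the life cycle of a request is complete, whether the TCP connection is closed depends on three things: the HTTP version of the response, the Connection and Keep-Alive fields in the response headers, and whether the length of the response body is known.
- Most important of all, the HTTP protocol is a convention, not a constraint. It is a bit like the suggested retail price on a bottle of mineral water: you can follow it or not, at your own risk.

3. Testing the result

First, we write a simple server with tornado that prints the client's address and port:

import tornado.web
import tornado.ioloop


class IndexHandler(tornado.web.RequestHandler):
    def get(self) -> None:
        print(f'new connection from {self.request.connection.context.address}')
        self.write('hello world')


app = tornado.web.Application([(r'/', IndexHandler)])
app.listen(8888)
tornado.ioloop.IOLoop.current().start()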
Then we test it with the client we just wrote:

from client import HTTPConnection


def fetch(conn: HTTPConnection, url: str = '') -> None:
    conn.request('GET', url)
    res = conn.get_response()
    print(res.read())


connection = HTTPConnection('127.0.0.1', 8888)
for i in range(10):
    fetch(connection)
The result is as follows:
[Figure: output of the test run]