HTTP Series 1: HTTP/0.9, HTTP/1.0, HTTP/1.1

After all this time, I still feel like I only half understand HTTP.

Here is how an expert on HF put it:

“Most people think they understand the internet, they are wrong. Can you write out a full HTTP request and response? Do you know every HTTP verb? Do you know the difference between HTTP 1.0 and HTTP 1.1 and HTTP 2.0 or HTTP 0.9 without having to research it? Do you know most of the HTTP response codes, not just general information but specific codes. Do you know how a CDE works?”

That one stung.

So I crawled out of bed in the middle of the night, determined to crack this tough nut once and for all!

Resources:

(Ruan Yifeng's blog) HTTP协议入门 (Introduction to the HTTP Protocol)

https://www.ruanyifeng.com/blog/2016/08/http.html

RFC2616

https://www.ietf.org/rfc/rfc2616.txt

RFC1945

https://datatracker.ietf.org/doc/html/rfc1945#section-1.1

MDN Web docs

https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Basics_of_HTTP/Evolution_of_HTTP

HTTP/0.9

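The earliest version of HTTP is 0.9, released in 1991. It runs on top of TCP/IP and is a very simple protocol for transferring raw data: there are no headers, no status codes, and no version number, and the response can only be an HTML document. It uses port 80 by default.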

Method

There is only one method: GET.

Request

An HTTP/0.9 request is a single line: the method followed by the path of the target resource, for example:

GET /B1nz.html

Response

The response contains only the requested resource itself, for example:

<HTML>
  <body><h1>B1nz</h1></body>
</HTML>
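
To make the "one line in, raw document out" exchange concrete, here is a minimal sketch of an HTTP/0.9-style handler. The function name `serve_http09`, the in-memory `docs` dictionary, and the wording of the error pages are my own assumptions for illustration. Since HTTP/0.9 has no status line, the only way to report a problem is to send back an HTML page describing it.

```python
def serve_http09(request_line: str, documents: dict[str, str]) -> str:
    """Answer an HTTP/0.9-style request: one line in, raw document out."""
    parts = request_line.strip().split()
    # A 0.9 request is exactly "GET <path>": no version token, no headers.
    if len(parts) != 2 or parts[0] != "GET":
        return "<HTML><body>Bad request</body></HTML>"
    # There is no status line, so a missing page is also reported as HTML.
    return documents.get(parts[1], "<HTML><body>Not found</body></HTML>")


# Hypothetical document store, keyed by path.
docs = {"/B1nz.html": "<HTML>\n  <body><h1>B1nz</h1></body>\n</HTML>"}
print(serve_http09("GET /B1nz.html\r\n", docs))
```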

HTTP/1.0

HTTP/1.0 extends HTTP/0.9 in the following ways:

New message format: header fields
Files of any type can be sent
New request methods
Status codes
Authorization
Caching
Content encoding

Method

Two new methods, HEAD and POST, were added alongside GET, for three in total:

1. "GET"
2. "HEAD"
3. "POST"

Let's look at how the RFC defines them:

HEAD:

The HEAD method is identical to GET except that the server must not
return any Entity-Body in the response. The metainformation contained
in the HTTP headers in response to a HEAD request should be identical
to the information sent in response to a GET request. This method can
be used for obtaining metainformation about the resource identified
by the Request-URI without transferring the Entity-Body itself. This
method is often used for testing hypertext links for validity,
accessibility, and recent modification.

There is no “conditional HEAD” request analogous to the conditional
GET. If an If-Modified-Since header field is included with a HEAD
request, it should be ignored.

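B1nz's notes:

A HEAD request is exactly the same as GET, except that the response carries no Entity-Body (message body).

So what comes back is:

Status-Line
General-Header
Response-Header
Entity-Header

HEAD requests can be used to:

Check whether a hyperlink is valid
Check whether a page has been modified recently
Obtain the metadata of the requested resource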
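
The "same as GET, minus the body" rule can be sketched in a few lines. `build_response` is a hypothetical helper of mine, and the two headers it emits are just an example set. The point is that the header section is byte-for-byte identical for both methods, and only GET appends the entity body.

```python
def build_response(method: str, body: bytes) -> bytes:
    """Build an HTTP/1.0 response; HEAD gets the same headers but no body."""
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    # The metainformation is identical; only the Entity-Body is omitted.
    return head if method == "HEAD" else head + body


page = b"<h1>B1nz</h1>"
get_response = build_response("GET", page)
head_response = build_response("HEAD", page)
print(head_response.decode("ascii"))
```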

POST: (our old friend is born)

The POST method is used to request that the destination server accept
the entity enclosed in the request as a new subordinate of the
resource identified by the Request-URI in the Request-Line. POST is
designed to allow a uniform method to cover the following functions:

o Annotation of existing resources;

o Posting a message to a bulletin board, newsgroup, mailing list,
or similar group of articles;

o Providing a block of data, such as the result of submitting a
form, to a data-handling process;

o Extending a database through an append operation.

The actual function performed by the POST method is determined by the
server and is usually dependent on the Request-URI. The posted entity
is subordinate to that URI in the same way that a file is subordinate
to a directory containing it, a news article is subordinate to a
newsgroup to which it is posted, or a record is subordinate to a
database.

A successful POST does not require that the entity be created as a
resource on the origin server or made accessible for future
reference. That is, the action performed by the POST method might not
result in a resource that can be identified by a URI. In this case,
either 200 (ok) or 204 (no content) is the appropriate response
status, depending on whether or not the response includes an entity
that describes the result.

If a resource has been created on the origin server, the response
should be 201 (created) and contain an entity (preferably of type
“text/html”) which describes the status of the request and refers to
the new resource.

A valid Content-Length is required on all HTTP/1.0 POST requests. An
HTTP/1.0 server should respond with a 400 (bad request) message if it
cannot determine the length of the request message’s content.

Applications must not cache responses to a POST request because the
application has no way of knowing that the server would return an
equivalent response on some future request.

B1nz's notes:

Not much more to add, but there is one new point:

Responses to GET may be cached, while responses to POST must not be cached.

Request

A request consists of:

Request line
Headers
Message body

Request line:

The HTTP protocol version is now appended:

GET /B1nz.html HTTP/1.0
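
Because the version token is the only visible difference from the HTTP/0.9 form, a server can tell the two apart by counting the parts of the request line. This is a rough sketch of that idea. `parse_request_line` is a name I made up, and a real server would validate the method and version much more strictly.

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split a request line into (method, uri, version)."""
    parts = line.strip().split()
    if len(parts) == 2:
        # No version token: treat it as an HTTP/0.9 simple request.
        method, uri = parts
        return method, uri, "HTTP/0.9"
    if len(parts) == 3:
        method, uri, version = parts
        return method, uri, version
    raise ValueError(f"malformed request line: {line!r}")


print(parse_request_line("GET /B1nz.html HTTP/1.0\r\n"))
print(parse_request_line("GET /B1nz.html\r\n"))
```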

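Request header fields (Request-Header) include:

1. Authorization

Credentials carried in a header field so the client can authenticate itself (the example below uses Basic authentication)

Authorization: Basic U2h1c2hlbmcwMDcldUZGMUFzczAwNw==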
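
The Basic scheme is only base64 over "user:password", which is easy to check by hand. In this sketch the helper names are mine, and the credentials are the well-known "Aladdin" / "open sesame" example pair rather than anything from this post. Note that base64 is an encoding, not encryption, so anyone who sees the header can read the password.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build an Authorization header line for the Basic scheme."""
    # Basic credentials are "user:password", base64-encoded (not encrypted).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"


def decode_basic(header_value: str) -> tuple[str, str]:
    """Recover (user, password) from a value such as 'Basic QWxh...'."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError(f"unsupported scheme: {scheme!r}")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


print(basic_auth_header("Aladdin", "open sesame"))
print(decode_basic("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="))
```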

2. From

From: webmaster@w3.org

3. If-Modified-Since

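Makes a GET conditional: if the resource has not been modified since the given time, the server replies with 304 (Not Modified) and sends no body

If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT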
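
Here is a small sketch of the decision a server makes for such a conditional GET. `conditional_get_status` is a hypothetical helper, and passing the resource's modification time in as a `datetime` is my simplification. An unparseable date falls back to a normal 200 response, which is how the RFC says an invalid If-Modified-Since should be treated.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(if_modified_since: str, last_modified: datetime) -> int:
    """Return 200 if the resource changed after the given date, else 304."""
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # An invalid date is handled like a normal, unconditional GET.
        return 200
    return 200 if last_modified > since else 304


header = "Sat, 29 Oct 1994 19:43:31 GMT"
changed_later = datetime(1994, 11, 15, 12, 45, 26, tzinfo=timezone.utc)
changed_earlier = datetime(1994, 10, 1, 0, 0, 0, tzinfo=timezone.utc)
print(conditional_get_status(header, changed_later))
print(conditional_get_status(header, changed_earlier))
```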

4. Referer

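The address of the resource from which the Request-URI was obtained, i.e. where the client was before it sent this request

Referer: http://www.w3.org/hypertext/DataSources/Overview.html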

5. User-Agent

Information about the client software

User-Agent: Mozilla/5.0 (X11; Linux x86_64)

Entity header fields (Entity-Header) include:

1. Allow

The set of HTTP methods supported by the resource

Allow: GET, POST, HEAD

2. Content-Encoding

The data can be compressed before it is sent; this field states which compression was applied

Content-Encoding: x-gzip

3. Content-Length

The size of the body, in bytes

Content-Length: 3495
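
"In bytes" matters as soon as the body contains non-ASCII text, because the length must be measured after encoding, not in characters. A minimal sketch, with `with_content_length` as a made-up helper and UTF-8 as an assumed default charset:

```python
def with_content_length(body_text: str, charset: str = "utf-8") -> bytes:
    """Prefix an encoded body with a matching Content-Length header."""
    body = body_text.encode(charset)
    # Count encoded bytes: "café" is 4 characters but 5 bytes in UTF-8.
    head = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return head + body


print(with_content_length("B1nz"))
print(with_content_length("café"))
```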

4. Content-Type

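The value is a MIME type

Common MIME types:

Plain text that is, in theory, human-readable:
text/plain
text/html
text/css
Some kind of image. Video is not included, but animated images (such as animated GIFs) also use the image type:
image/jpeg
image/png
image/svg+xml
Some kind of audio file:
audio/mp4
Some kind of video file:
video/mp4
Some kind of binary data:
application/javascript
application/pdf
application/zip
application/atom+xml

Parameters can be appended after the MIME type, separated by ";"

Example 1:
Content-Type: text/html; charset=utf-8
Note: you may find a charset parameter at the end of the text/javascript media type in some content, specifying the character set used for the code. This is not valid, and in most cases it will cause the script not to be loaded.
Example 2: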
Content-Type: multipart/form-data; boundary=---------------------------872165604191141565395500449
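
Both examples follow the same "type; name=value" shape, so a naive parser is short. This is only a sketch: `parse_content_type` is my own name, and splitting on ";" would break on quoted parameter values that themselves contain a semicolon, which a real parser has to handle.

```python
def parse_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into (media type, parameters)."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params: dict[str, str] = {}
    for item in raw_params:
        if not item:
            continue
        name, _, val = item.partition("=")
        # Parameter names are case-insensitive; values may be quoted.
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


print(parse_content_type("text/html; charset=utf-8"))
print(parse_content_type(
    "multipart/form-data; boundary=---------------------------872165604191141565395500449"
))
```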

5. Expires

The date and time after which the entity is considered stale; it must not be cached beyond this point:

Expires: Thu, 01 Dec 1994 16:00:00 GMT

6. Last-Modified

Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT

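7. extension-header

As I understand it, this mechanism allows additional Entity-Header fields to be defined without changing the protocol.

It does not seem to be used much. The RFC also notes that a recipient cannot be assumed to recognize such headers and should ignore the ones it does not recognize.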

Response

A response consists of:

Status line
Headers
Message body

Status line:

A status code and a reason phrase are now included

HTTP/1.0 200 OK

Status code classes:

o 1xx: Informational - Not used, but reserved for future use

o 2xx: Success - The action was successfully received, understood, and accepted.

o 3xx: Redirection - Further action must be taken in order to complete the request

o 4xx: Client Error - The request contains bad syntax or cannot be fulfilled

o 5xx: Server Error - The server failed to fulfill an apparently valid request
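
Since the class is just the first digit of the code, splitting a status line and classifying it takes only a few lines. A minimal sketch, with `parse_status_line` and `status_class` as names I made up:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split 'HTTP/1.0 200 OK' into (version, code, reason phrase)."""
    # The reason phrase may contain spaces, so split at most twice.
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    return STATUS_CLASSES[code // 100]


version, code, reason = parse_status_line("HTTP/1.0 302 Moved Temporarily\r\n")
print(version, code, reason, "->", status_class(code))
```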

Status codes and reason phrases:

"200"   ; OK
"201"   ; Created
"202"   ; Accepted
"204"   ; No Content
"301"   ; Moved Permanently
"302"   ; Moved Temporarily
"304"   ; Not Modified
"400"   ; Bad Request
"401"   ; Unauthorized
"403"   ; Forbidden
"404"   ; Not Found
"500"   ; Internal Server Error
"501"   ; Not Implemented
"502"   ; Bad Gateway
"503"   ; Service Unavailable

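In CTF web challenges, if you run into a 302 redirect,

you can use Burp Suite (bp) to see the content of the response that was sent before the redirect was followed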

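Response header fields (Response-Header) include:

1. Location

The location of the resource

Only one absolute URL is allowed

For a redirect (3xx response), it must give the new address to redirect to

Location: http://www.w3.org/hypertext/WWW/NewLocation.html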
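
A 302 response is still a full response: a status line, headers including Location, and possibly a body that a browser skips when it follows the redirect. The sketch below takes a raw response apart so that all three pieces are visible. `split_response` and the sample body are my own illustration, not anything a particular tool produces.

```python
def split_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw response into (status line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers: dict[str, str] = {}
    for line in header_lines:
        # Split at the first colon only, since URLs contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw = (
    b"HTTP/1.0 302 Moved Temporarily\r\n"
    b"Location: http://www.w3.org/hypertext/WWW/NewLocation.html\r\n"
    b"\r\n"
    b"<p>content a browser would skip</p>"
)
status_line, headers, body = split_response(raw)
print(status_line)
print(headers["location"])
print(body)
```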

2. Server

Server: Apache/2.4.1 (Unix)

3. WWW-Authenticate

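Specifies which authentication scheme to use (the HTTP authentication mentioned above); it is sent with 401 (Unauthorized) responses

WWW-Authenticate: Basic

The entity header fields were already covered above, so I won't repeat them here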

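Limitations of HTTP/1.0

HTTP/1.0 has two major problems:

Connections cannot be reused

Every request uses its own TCP connection, so each one pays for a three-way handshake and slow start again, which makes latency high.

Head-of-line blocking

The next request cannot be sent until the server has responded to the current one, which causes blocking.

Every source I found mentions head-of-line blocking as an HTTP/1.0 problem, but HTTP/1.0 does not support sending multiple requests over one connection at all, so where would the queue come from? I don't think the term fits here; it is a better fit for HTTP/1.1.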
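
To put a rough number on the connection-reuse problem, here is a back-of-the-envelope sketch. It rests on simplifying assumptions of mine: a TCP handshake costs one round trip before the request can go out, each request and response costs one more, requests are sent one after another, and slow start is ignored entirely.

```python
def total_round_trips(requests: int, reuse_connection: bool) -> int:
    """Estimate round trips for sequential requests (toy model)."""
    # Without reuse, every request pays for its own handshake.
    handshakes = 1 if reuse_connection else requests
    return handshakes + requests


print(total_round_trips(10, reuse_connection=False))
print(total_round_trips(10, reuse_connection=True))
```

Under this model, ten requests cost 20 round trips with one connection each, versus 11 if a single connection could be reused.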

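B1nz's ramblings:

This post was originally going to be called "HTTP in One Article", but it got far too long, so I decided to split it into three posts ^^

There isn't much analysis of HTTP/1.0 online, since it is rarely used, but HTTP/1.1 is really just an improvement on HTTP/1.0. Knowing HTTP/1.0 well makes learning HTTP/1.1 easier. ^^ No slacking.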