URL == Uniform Resource Locator
Augmented BNF for a URL is defined as:
URL = scheme ":" *( uchar | reserved ) [ "#" fragment ]
uchar = unreserved | escape
unreserved = ALPHA | DIGIT | safe | extra | national
escape = "%" HEX HEX
extra = "!" | "*" | "'" | "(" | ")" | ","
safe = "$" | "-" | "_" | "."
unsafe = CTL | SP | <"> | "#" | "%" | "<" | ">"
national = <any OCTET excluding ALPHA, DIGIT, reserved, extra, safe, and unsafe>
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+"
fragment = *( uchar | reserved )
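As a rough illustration of the escape rule in the grammar above, this Python sketch replaces every byte outside a conservative unreserved set with "%" HEX HEX and reverses the process. The names `escape`, `unescape` and `UNRESERVED` are made up for this example, and escaping the "national" characters is a simplifying assumption, not something the grammar requires.

```python
import re
import string

# ALPHA | DIGIT | safe | extra from the grammar. "national" is left out on
# purpose (assumption: escape anything we are not sure about).
UNRESERVED = set(string.ascii_letters + string.digits + "$-_." + "!*'(),")


def escape(text: str) -> str:
    """Percent-escape every byte that is not in UNRESERVED."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in UNRESERVED else "%{:02X}".format(byte))
    return "".join(out)


def unescape(text: str) -> str:
    """Turn each "%" HEX HEX triple back into the byte it stands for."""
    raw = re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        text.encode("ascii"),
    )
    return raw.decode("utf-8")
```

Note that this sketch also escapes reserved characters such as "/" and "?", so it suits a single path segment or query value, not a whole URL.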
The "http" scheme is used to locate network resources via HTTP
http_URL = "http:" "//" host [ ":" port ] [ abs_path ]
host = <A legal Internet host domain name or IP address (in dotted-decimal form), as defined by RFC 1123>
port = *DIGIT
abs_path = "/" rel_path
rel_path = [ path ] [ ";" params ] [ "?" query ]
path = fsegment *( "/" segment )
fsegment = 1*pchar
segment = *pchar
params = param *( ";" param )
param = *( pchar | "/" )
pchar = uchar | ":" | "@" | "&" | "=" | "+"
query = *( uchar | reserved )
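One way to see the pieces of an http_URL is to let Python's standard `urllib.parse.urlparse` do the splitting. This is only a sketch: the helper name `split_http_url` and the dictionary keys are invented here, and `urlparse` follows later URL specifications, so it will not match the grammar above in every corner case. Falling back to port 80 reflects the port assigned to HTTP.

```python
from urllib.parse import urlparse


def split_http_url(url: str) -> dict:
    """Split an http URL into host, port, path, params and query."""
    parts = urlparse(url)
    if parts.scheme != "http":
        raise ValueError("not an http URL: " + url)
    return {
        "host": parts.hostname,
        "port": parts.port or 80,   # no explicit port: use the HTTP default
        "path": parts.path or "/",  # abs_path is optional in the grammar
        "params": parts.params,     # text after ";" in the last path segment
        "query": parts.query,       # text after "?"
    }
```

For example, `split_http_url("http://www.sdsu.edu/graphics/ComputerSci.gif")` gives host `www.sdsu.edu`, port 80 and path `/graphics/ComputerSci.gif`.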
Syntax and semantics of URLs can be found in RFC 1738
Some Buzz Words
HTML is a language for describing structured documents
HTML does not describe page layout. (Why not?)
This language is used by web browsers to render the document and display it to a user.
Several versions of HTML:
Other developments:
HTML files are all ASCII (except for kanji and other alphabets)
Tags are used to mark up a document
A tag starts with "<" and ends with ">"
Some tags come in pairs: a start tag and an end tag. End tags have a "/" following the "<".
"<" or ">" are reserved for tags only. Tags themselves may not include these characters, either.
In general, all white space (including newlines) is reduced to the equivalent of one space.
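The white space rule can be imitated in one line of Python. This is a sketch of the general idea only; `collapse_whitespace` is a made-up name, and real browsers apply further rules (for example inside preformatted text) that are ignored here.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Reduce every run of spaces, tabs and newlines to a single space."""
    return re.sub(r"\s+", " ", text)
```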
Basic HTML document:
<HTML>
<HEAD>
<TITLE>Sample HTML Document</TITLE>
</HEAD>
<BODY>
This is a document
</BODY>
</HTML>
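To show how a program sees the tags in the document above, here is a small sketch built on Python's standard `html.parser.HTMLParser`. The class `TitleParser` and the helper `extract_title` are names invented for this example. The parser reports tag names in lower case, so `<TITLE>` arrives as `title`.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found between the TITLE start and end tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html: str) -> str:
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()
```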
Learn by example. "View Source" in any browser.
Many books on HTML
Online tutorials (start at http://www.yahoo.com/)
Hyperlinks
<A HREF="http://www.sdsu.edu/">SDSU homepage</A>
will render as a link labeled "SDSU homepage"
Images
<IMG SRC="http://www.sdsu.edu/graphics/ComputerSci.gif">
will render as the image, displayed inline at that point in the page
HyperText Transfer Protocol
Stateless, object-oriented protocol
The typing and negotiation of data representation allow systems to be built independently of the data being transferred.
Assigned port 80
Basic Client-Server Interaction
Client: Open connection
Server: Accept/Reject connection
Client: Send request
Server: Send response to request
Connection closed
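The five steps above can be sketched with Python's standard `socket` module. The function name `exchange` is made up here, and the sketch assumes the server signals the end of its response by closing the connection, as in the steps listed.

```python
import socket


def exchange(conn: socket.socket, request: bytes) -> bytes:
    """Send one request and read the response until the server closes."""
    with conn:                    # step 5: connection closed on exit
        conn.sendall(request)     # step 3: client sends the request
        chunks = []
        while True:               # step 4: server sends the response
            chunk = conn.recv(4096)
            if not chunk:         # empty read: the server closed its side
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Steps 1 and 2 happen when the socket is created, for example with `socket.create_connection(("www.eli.sdsu.edu", 80))`. That host is the one used in these notes and may no longer answer.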
HTTP-message = Simple-Request | Simple-Response | Full-Request | Full-Response
Full-Request = Request-Line *( General-Header | Request-Header | Entity-Header ) CRLF [ Entity-Body ]
Full-Response = Status-Line *( General-Header | Response-Header | Entity-Header ) CRLF [ Entity-Body ]
HTTP-header = field-name ":" [ field-value ] CRLF
Entity-Body = *OCTET
Request = Simple-Request | Full-Request
Simple-Request = "GET" SP Request-URI CRLF
Simple-Request Example
telnet: www.eli.sdsu.edu 80
telnet: GET /index.html<CRLF>
Server:
<!DOCTYPE HTML SYSTEM "html.dtd">
<HTML><HEAD><TITLE>
Roger Whitney
</TITLE></HEAD>
<BODY>
<CENTER><H2>
Roger Whitney<br>
Computer Science (etc...)
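The line typed into telnet above is the whole Simple-Request. A minimal sketch of building it in Python follows; `simple_request` is a hypothetical helper name, and rejecting white space in the URI is an assumption made to keep the request on one line.

```python
def simple_request(uri: str) -> bytes:
    """Build a Simple-Request: "GET" SP Request-URI CRLF."""
    if any(ch in uri for ch in " \t\r\n"):
        raise ValueError("white space in a URI must be escaped first")
    return b"GET " + uri.encode("ascii") + b"\r\n"
```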
Full-Request = Request-Line *( General-Header | Request-Header | Entity-Header ) CRLF [ Entity-Body ]
Request-Line = Method SP URI SP HTTP-Version CRLF
Example
telnet: www.eli.sdsu.edu 80
Server: accepts connection
telnet: GET /index.html HTTP/1.0<CRLF>
telnet: <CRLF>
Server:
HTTP/1.0 200 Ok
Server: Netscape-Commerce/1.12
Date: Tuesday, 04-Mar-97 07:58:45 GMT
Last-modified: Thursday, 27-Feb-97 00:19:07 GMT
Content-length: 3949
Content-type: text/html
<!DOCTYPE HTML SYSTEM "html.dtd">
<HTML><HEAD><TITLE>
Roger Whitney
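The two lines typed into telnet in this example (the Request-Line and the empty line that ends the headers) can be generated as follows. This is a sketch; `full_request` and its parameters are names chosen for illustration only.

```python
def full_request(method, uri, headers=None, body=b"", version="HTTP/1.0"):
    """Build a Full-Request: Request-Line, headers, empty line, optional body."""
    lines = ["{} {} {}".format(method, uri, version)]
    for name, value in (headers or {}).items():
        lines.append("{}: {}".format(name, value))
    # Each line ends in CRLF; one extra CRLF marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

`full_request("GET", "/index.html")` produces exactly the bytes sent in the telnet session above.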
Response = Simple-Response | Full-Response
Simple-Response = [ Entity-Body ]
Full-Response = Status-Line *( General-Header | Response-Header | Entity-Header ) CRLF [ Entity-Body ]
Simple response is sent only in response to simple request
Sample Full-Response:
HTTP/1.0 200 Ok
Server: Netscape-Commerce/1.12
Date: Tuesday, 04-Mar-97 07:58:45 GMT
Last-modified: Thursday, 27-Feb-97 00:19:07 GMT
Content-length: 3949
Content-type: text/html
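A client has to cut a Full-Response into its three parts before it can use it. The sketch below splits at the first empty line, which the grammar places between the headers and the Entity-Body. `split_response` is a made-up name, and decoding the head as Latin-1 is an assumption chosen so that no byte value can fail to decode.

```python
def split_response(raw: bytes):
    """Return (status_line, header_lines, body) for a Full-Response."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty line after the headers")
    lines = head.decode("latin-1").split("\r\n")
    return lines[0], lines[1:], body
```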
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
Status-Code = 3DIGIT
Reason-Phrase = token *( SP token )
Example
HTTP/1.0 200 Ok
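Because the three fields are separated by single spaces and only the Reason-Phrase may contain more spaces, splitting at the first two spaces is enough. A sketch, with `parse_status_line` as a hypothetical name:

```python
def parse_status_line(line: str):
    """Split a Status-Line into (version, code, reason)."""
    # Split at most twice so a multi-word Reason-Phrase stays together.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if len(code) != 3 or not code.isdigit():
        raise ValueError("Status-Code must be three digits: " + code)
    return version, int(code), reason
```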
Status codes
1xx: Not used, but reserved for future use
2xx: Success - The requested action was successfully received and understood
3xx: Redirection - Further action must be taken in order to complete the request
4xx: Client Error - The request contains bad syntax or is inherently impossible to fulfill
5xx: Server Error - The server could not fulfill the request
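Since the first digit alone decides the class, a client that does not know a particular code can still react sensibly. A small sketch, with the class names taken from the list above and `status_class` as an invented function name:

```python
STATUS_CLASSES = {
    1: "Reserved",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Map a three-digit Status-Code to its class by the first digit."""
    if not 100 <= code <= 599:
        raise ValueError("unknown status class for code {}".format(code))
    return STATUS_CLASSES[code // 100]
```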
HTTP-header = Field-name ":" [ Field-value ] CRLF
Field-name = 1*<any CHAR, excluding CTLs, SP, and ":">
Field-value = *( Field-content | comment | LWS )
Field-content = <the OCTETs making up the field-value and consisting of either *text or combinations of token, tspecials, and quoted-string>
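A header line is split at the first ":" only, because the value may itself contain colons (the Date header in the sample response does). A minimal sketch, assuming the header fits on one line; `parse_header` is a made-up name.

```python
def parse_header(line: str):
    """Split one header line into (field_name, field_value)."""
    name, colon, value = line.rstrip("\r\n").partition(":")
    if not colon or not name or any(ch.isspace() for ch in name):
        raise ValueError("not a valid header line: " + line)
    return name, value.strip()
```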
Method = "GET" | "HEAD" | "PUT" | "POST" | "DELETE" | "LINK" | "UNLINK" | extension-method
GET and HEAD must be supported by all HTTP/1.0 servers
Servers should return Status-Code "501 Not Implemented" if the method is unknown.
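On the server side the rule amounts to a membership test before any other processing. In this sketch the set `SUPPORTED_METHODS` and the function `check_method` are hypothetical; a real server would list whichever methods it actually implements.

```python
# Assumption: this server implements only the two mandatory methods.
SUPPORTED_METHODS = frozenset({"GET", "HEAD"})


def check_method(method: str):
    """Return None if the method can be handled, else the 501 status text."""
    if method in SUPPORTED_METHODS:
        return None
    return "501 Not Implemented"
```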
GET
Retrieves whatever item is identified by the URI
The URI can refer to a data-producing process, or a script
The produced data shall be returned as the Entity-Body
HEAD
Identical to GET except that the server must not return any Entity-Body in the response.
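The difference between GET and HEAD can be shown in a few lines: the same status line and headers are produced for both, and only GET appends the Entity-Body. This is a sketch with invented names (`build_response`, `content_type`), and it always answers "200 Ok" for simplicity.

```python
def build_response(method: str, body: bytes, content_type: str = "text/html") -> bytes:
    """Build a Full-Response; for HEAD the Entity-Body is left out."""
    head = (
        "HTTP/1.0 200 Ok\r\n"
        "Content-length: {}\r\n"
        "Content-type: {}\r\n"
        "\r\n"
    ).format(len(body), content_type).encode("ascii")
    return head if method == "HEAD" else head + body
```

A client can therefore use HEAD to read headers such as Content-length or Last-modified without transferring the item itself.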
POST
Request that the origin server accept the item enclosed in the request as a new subordinate of the resource identified by the URI.
Allows a uniform function to:
Why?
PUT
The enclosed item in the request is to be stored under the supplied URI
DELETE
Request that the server delete the resource identified by the given URI.
LINK
Establishes one or more Link relationships between the existing resource identified by the URI and other existing resources
UNLINK
Removes one or more Link relationships from the existing resource identified by the URI
General-Header = Connection | Date | Forwarded | Mandatory | Message-ID | MIME-Version
Connection = "Connection" ":" 1#connect-option
connect-option = token [ "=" word ]
Request = Simple-Request | Full-Request
Full-Request = Request-Line *( General-Header | Request-Header | Entity-Header ) CRLF [ Entity-Body ]
Request-Header = User-Agent | If-Modified-Since | Pragma | Authorization | Proxy-Authorization | Referer | From | Accept | Accept-Encoding | Accept-Language
Full-Response = Status-Line *( General-Header | Response-Header | Entity-Header ) CRLF [ Entity-Body ]
Response-Header = Server | WWW-Authenticate | Proxy-Authenticate | Retry-After
Unknown header fields should be considered Entity-Header fields.
Entity-Header = Allow | Content-Length | Content-Type | Content-Encoding | Content-Transfer-Encoding | Content-Language | Expires | Last-Modified | URI-header | Location | Version | Derived-From | Title | Link | extension-header
extension-header = HTTP-header
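The rule that unknown header fields count as Entity-Header fields makes classification a simple lookup with a default. The sketch below uses the field names from the grammars above, with the general header spelled `Date` as in the sample response. The function name `classify_header` is invented, and names are compared without regard to case because HTTP field names are case-insensitive.

```python
GENERAL = {"connection", "date", "forwarded", "mandatory",
           "message-id", "mime-version"}
REQUEST = {"user-agent", "if-modified-since", "pragma", "authorization",
           "proxy-authorization", "referer", "from", "accept",
           "accept-encoding", "accept-language"}
RESPONSE = {"server", "www-authenticate", "proxy-authenticate", "retry-after"}


def classify_header(name: str) -> str:
    """Return "general", "request", "response" or "entity" for a field name."""
    key = name.lower()
    if key in GENERAL:
        return "general"
    if key in REQUEST:
        return "request"
    if key in RESPONSE:
        return "response"
    # Known entity headers and all unknown fields end up here.
    return "entity"
```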