How to Include Scripts in HTML Documents

Introduction

There are three ways of including scripts in HTML documents:-

  1. As the string value provided for an event handling attribute such as onclick (intrinsic events).
  2. As the contents of a SCRIPT element (between opening and closing script tags) in certain permissible locations within HTML source code.
  3. As a separate script file imported into an HTML page by having the URL of the file assigned to the SRC attribute of a SCRIPT element, again at certain permissible locations within the HTML source.

SCRIPT Elements

The current HTML version is 4.01. The HTML 4.01 transitional DTD defines a script element with:-

<!ELEMENT SCRIPT - - %Script; -- script statements -->
<!ATTLIST SCRIPT
  charset  %Charset;     #IMPLIED  -- char encoding of linked resource --
  type     %ContentType; #REQUIRED -- content type of script language --
  language CDATA         #IMPLIED  -- predefined script language name --
  src      %URI;         #IMPLIED  -- URI for an external script --
  defer    (defer)       #IMPLIED  -- UA may defer execution of script --
  event    CDATA         #IMPLIED  -- reserved for possible future use --
  for      %URI;         #IMPLIED  -- reserved for possible future use --
  >

(The HTML 4.01 strict DTD omits the language attribute as it is deprecated in the HTML 4.01 standard.)

Script elements are defined with opening and closing script tags. The two dashes after the element name in the DTD mean that neither the opening nor the closing tag may be omitted, even when the element is importing a javascript file and has no contents.

Attributes

charset

The charset attribute can declare the character encoding of an external javascript file that is being imported using the src attribute. In all cases it is preferable that the server sending the file provide character encoding information in Content-Type HTTP headers (with the slight problem that there was no official content-type for use with scripts; see the type attribute below).

Javascript itself uses a very limited repertoire of characters but the content of string literals in non-Latin languages may necessitate an interest in character encodings with script files. That is not a problem that I have faced to date so I don't know how it should best be handled. I am yet to see a charset attribute used in a script tag.

type

The type attribute is required in HTML 4 but the HTML 4 specification is not very helpful on the subject. It says:-

type = content-type [CI]
This attribute specifies the scripting language of the element's contents and overrides the default scripting language. The scripting language is specified as a content type (e.g., "text/javascript"). Authors must supply a value for this attribute. There is no default value for this attribute.

(The [CI] means that the attribute's value is case insensitive.)

Pursuing the permissible values of content-type through the HTML specification leads to a list of currently recognised content types (MIME or Media types). Up until mid 2005 that list did not include anything related to ECMAScript or javascript. So although the attribute is required, and so must have a value, there was no standardised content-type for that value. However, the HTML 4 specification did give text/javascript as an example (even though it was not a recognised standard content type) so it was that value that has traditionally been used with the type attribute when including or importing ECMAScript/javascript into an HTML page.

The MIME types introduced in 2005 are application/ecmascript, application/javascript and text/javascript. The last of these, text/javascript, is the value that has traditionally been used, but it was officially deprecated and so should be phased out over time. However, at the point when these new MIME types were officially recognised no browsers existed that would recognise either application/ecmascript or application/javascript. This means that if either is actually used as the value of the type attribute the likelihood is that the script in question will never be executed.

So for the present, and probably many years to come, text/javascript is the only viable value for use with the type attribute when using javascript.

type="text/javascript"
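As an illustration only, the comparison that a checking tool might apply to a type value can be sketched in javascript. The name isJavascriptContentType is hypothetical, the list holds just the three content types named above, and case is ignored because the attribute value is case insensitive. It says nothing about which values a given browser will actually execute.

```javascript
// Hypothetical helper, not part of any specification or library.
// The list holds only the three content types named in the text.
var JAVASCRIPT_CONTENT_TYPES = [
    "text/javascript",
    "application/javascript",
    "application/ecmascript"
];

function isJavascriptContentType(value){
    // The type attribute is case insensitive, so compare in lower case.
    var normalised = String(value).toLowerCase();
    for(var i = 0; i < JAVASCRIPT_CONTENT_TYPES.length; i++){
        if(JAVASCRIPT_CONTENT_TYPES[i] === normalised){
            return true;
        }
    }
    return false;
}
```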

language

The language attribute is deprecated (and not allowed under the strict DTD) and it is unnecessary when the type attribute is required, as that attribute will determine the language used.

The language attribute can be more specific than the type attribute because it can also specify the language version. In almost all respects specifying a language version is not helpful and even potentially dangerous.

By default a web browser will execute a script using the latest version of the language that it supports. Generally all current (March 2004) browsers support all of the language features specified in ECMA 262 2nd edition (approximately JavaScript 1.3) and most fully support the 3rd edition. Restricting the language features used to those defined in ECMA 262 2nd edition (with additional care in some less used areas) should result in scripts that will happily execute on all current browsers without a need to specify a language version.

Netscape initially attempted to tie the DOM features of their browser to the language version, which would have allowed a specified language version to imply the DOM features supported. That idea was abandoned because other browsers produced by their competitors introduced scripting with near identical languages but significantly different DOMs. DOM support should be determined independently of language version using object/feature detecting.
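A minimal sketch of object/feature detection follows. The function name findElementById and the idea of passing the document object in as a parameter are assumptions made for illustration; the point is only that the presence of a DOM method is tested directly rather than inferred from a language version.

```javascript
// Hypothetical helper: returns the element with the given id, or null
// when the supplied document object does not provide getElementById.
function findElementById(doc, id){
    // Test for the feature itself; do not guess from a language version.
    if(doc && doc.getElementById){
        return doc.getElementById(id);
    }
    return null;
}
```

In a page the first argument would normally be the global document object.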

The potential danger with specifying a language version comes with specifying version 1.2. Version 1.2 was an aberration. It deviated significantly from earlier versions of the language in anticipation of changes to the ECMA specification, but those changes were never made. Netscape had to reverse the changes it had made to version 1.2 in version 1.3 in order to conform with what was eventually published as ECMA 262 2nd edition. The only browsers released for which version 1.2 was the default JavaScript version were Netscape 4.00 to 4.05 (and you won't find many of those left in the wild).

The problem is that if you specify version 1.2 in a language attribute you may actually get it, with all of its deviant characteristics, but at the same time most browsers will not exhibit those characteristics. It is always a bad idea to encourage the same code to be interpreted in two different ways, and certainly never without fully understanding how the language versions differ. The specific problem can be avoided by never specifying the language version as 1.2. The issue can be avoided by never providing the deprecated language attribute at all.

src

The SRC attribute specifies the URL of an external javascript file that is to be imported by the script element. If no file is being imported by the element (the script is the element's contents) then the src attribute is omitted.

defer

The defer attribute is specified as providing a "hint" to the browser as to whether it needs to process the script immediately (as it usually would), or whether it can carry on parsing the HTML following the script element and leave the javascript interpreter to process the script in its own time.

If a script uses the document.write method to insert content into the HTML being processed then the script element containing that script must not be deferred as the inserted HTML could end up at any point in the document (or even be inserted after the current document has closed, replacing it). If a script is deferred additional care must be taken before any part of it, such as a function it defines, is interacted with by other scripts (such as intrinsic events).

It is unusual for a script element to have a defer attribute, and many browsers will not recognise or act upon a defer attribute even if one is present.

The Standard Formulations

Leaving the defer and charset attributes aside, the normal formulation for a valid HTML 4 script element that imports a javascript file is:-

<script type="text/javascript"
        src="http://example.com/scriptFile.js"></script>

<!-- or using an example relative URL -->

<script type="text/javascript" src="../scripts/scriptFile.js"></script>

HTML is case insensitive so the tag name and attribute names can be in upper or lower (or mixed) case (current practice tends to prefer lower case).

The attribute values must be quoted because in both cases they include characters that are forbidden in unquoted attribute values (forbidden characters would be any character that is not: letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58)). The quote characters used may be double quotes (") or single quotes ('). Common practice prefers double quotes for HTML attributes.
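The character rule above can be expressed as a short javascript test. This is only a sketch: the name mustBeQuoted is hypothetical, and the regular expression encodes exactly the permitted characters listed in the preceding paragraph.

```javascript
// Hypothetical helper: true when a value contains any character other
// than letters, digits, hyphens, periods, underscores and colons
// (an empty value also reports true).
function mustBeQuoted(attributeValue){
    return !/^[A-Za-z0-9\-._:]+$/.test(attributeValue);
}
```

Both example values used above contain a forward slash, so the function reports that they need quoting, whereas a value such as defer does not.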

Traditionally javascript files are given a two-letter extension of .js. That extension is not required; any valid URL to a resource that returns content that can be interpreted as valid javascript source code will work. In addition, browsers do not appear to be interested in any Content Type headers sent with the javascript source, which is probably a good thing as officially recognised content types have only just (mid 2005) been introduced.

Script that is to be included in an HTML page is placed as the content of a script element, appearing between the opening and closing script tags:-

<script type="text/javascript">
function exampleFunctionDeclaration(n){
    return (n * 4);
}
</script>

The same case sensitivity and attribute value quoting considerations apply to this application of the script tags as applied to their use when importing external script files.

Permissible Contexts for Script Elements

Script elements may not appear in all contexts in an HTML document. They may be children of the HEAD element because the DTD defines the content of the HEAD element as including %head.misc; content, which includes SCRIPT in its definition. Script elements may also appear within the BODY element in any context that is specified as %flow;, %inline;, %special; or specifically SCRIPT by the HTML DTDs. This is because %flow; includes all elements defined as %inline;, which includes all elements defined as %special;, which includes SCRIPT in its definition (among others).

Reading the DTDs and looking out for these categories will indicate where script elements are allowed to appear. For example, the (HTML 4.01 transitional) DTD definition for the paragraph element reads:-

<!ELEMENT P - O (%inline;)*        -- paragraph -->

The content for the P element is %inline; and %inline; encompasses SCRIPT. Similarly:-

<!ELEMENT DD - O (%flow;)*         -- definition description -->

The DD element has %flow; defining its content so it is allowed SCRIPT as its content (or part of it). Whereas:-

<!ELEMENT DL - - (DT|DD)+          -- definition list -->

The DL element is only allowed DT and DD elements as its children. So a script element cannot appear as a child of a DL element in valid HTML 4.
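The chain of inclusion just described can be modelled, very roughly, as a lookup in javascript. Everything here is a simplified assumption for illustration: the tables hold only the entities and the three elements mentioned above (the real DTD groups have many more members), and the names ENTITY_MEMBERS, ELEMENT_CONTENT, includesScript and allowsScriptChild are hypothetical.

```javascript
// Simplified, hypothetical model. Only the relationships stated in the
// text are recorded; the real DTD groups contain many more members.
var ENTITY_MEMBERS = {
    "%flow;": ["%inline;"],
    "%inline;": ["%special;"],
    "%special;": ["SCRIPT"]
};

var ELEMENT_CONTENT = {
    "P": ["%inline;"],
    "DD": ["%flow;"],
    "DL": ["DT", "DD"]
};

// Expands entity references recursively and reports whether SCRIPT
// can be reached from the given list of names.
function includesScript(names){
    for(var i = 0; i < names.length; i++){
        var name = names[i];
        if(name === "SCRIPT"){
            return true;
        }
        if(ENTITY_MEMBERS.hasOwnProperty(name) &&
           includesScript(ENTITY_MEMBERS[name])){
            return true;
        }
    }
    return false;
}

// True when the modelled content of the element permits a SCRIPT child.
function allowsScriptChild(elementName){
    if(!ELEMENT_CONTENT.hasOwnProperty(elementName)){
        return false;
    }
    return includesScript(ELEMENT_CONTENT[elementName]);
}
```

With these tables P and DD report that a script child is allowed, while DL does not, because DT and DD are element names rather than entities that expand to SCRIPT.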

The DTD for the particular flavour of HTML being authored is the best guide as to where in a document script elements may appear, but note that the different versions of the DTD differ slightly in terms of the content defined for some elements.

The Content of Script Elements

Script elements that have source code as their contents and appear on an HTML page need some special consideration.

Hiding Scripts from Older Browsers

When scripting was first introduced the preceding generations of browsers had no concept of what a script element was, and would treat the content of the unrecognised script tags in the way unrecognised tags are normally handled in HTML. The content is treated as HTML source, which, for a script, meant including it as text in a page. The results did not look good and a mechanism was provided to hide the script element contents from browsers that did not know how to handle script elements.

Javascript has an end of line comment symbol consisting of two slashes (//). All characters between that symbol and the end of the line are treated as a comment and ignored. HTML also provides a means of commenting out parts of its source, an opening comment tag <!-- and a closing comment tag --> (strictly these are not opening and closing tags in HTML; it is the pairs of dashes that start and end a comment. The surrounding <! and > delimit a markup declaration, which is the only context in which a comment is recognised in HTML).

The trick to hiding javascript source code from browsers that did not recognise the script element, so it would not be shown on the page, was to allow script included in an HTML page to use an additional end of line comment symbol that corresponded with the <!-- opening comment tag used by HTML. The script author would then place this tag/comment symbol at the start of the script source code (on a line of its own, so as not to comment out any javascript code) and then use the normal javascript end of line comment symbol to comment out (from the javascript interpreter) an HTML end of comment tag.

<script type="text/javascript">
<!--
function exampleFunctionDeclaration(n){
    return (n * 4);
}
// -->
</script>

A browser incapable of recognising the script element would treat its content as HTML source and so it would interpret the script within the script element as effectively commented out, thus not displaying it on the page.

When scripting was introduced the practice was necessary and highly recommended, but that was some time ago and browsers and HTML versions have moved on two or three generations. We are now at a point where the oldest browsers in current use are already two generations into HTML versions that formalised script elements. They all know what a script element is and how its contents should be handled. Even browsers that cannot execute scripts know that they are supposed to ignore the content of script elements.

The practice of hiding scripts from "older" browsers has become an anachronism, no longer needed and no longer used by informed javascript authors. It is still often seen because it is recommended in out of date books and in out of date javascript tutorials on the web. And the readers of those books and tutorials continue to use and promote it, not realising that it no longer serves any real purpose.

The existence of this additional comment syntax in javascript included in HTML pages also led to HTML style comments being used extensively in on-page javascript. This was, and is, a very bad idea. Javascript has end of line and multi-line comment syntaxes and they should be used exclusively to comment javascript source code.

Closing Script Tags and "</" (end-tag open delimiter)

When a script is included on an HTML page the HTML parser needs to decide how much of the page's source text to pass on to the javascript interpreter and where it should start processing other HTML again. Officially an HTML parser is required to take the first occurrence of the character sequence "</" it finds after the opening script tag as marking the end of the script element. In practice browsers seem to be a lot more lax and only terminate the script section when they encounter the character sequence "</script>".

That seems reasonable (if lax) but it does not eliminate all problems. Suppose that a script includes HTML source in the form of a string literal, and that source includes a closing script tag, as might be the case when using document.write to write a new script element to the page:-

<script type="text/javascript">
document.write(
    '<script type="text/javascript" src="scriptFile.js"></script>');
</script>

That is an example simplified to the point of being futile but it should be obvious that if the HTML parser considers the first occurrence of "</script>" as terminating the script element the results will be undesirable.

The solution is to do something to make the character sequence within the javascript string of HTML different from the sequence that will be recognised as the closing script tag. This is often done by splitting the string and using a concatenation operation to let the script produce the same output:-

<script type="text/javascript">
document.write(
    '<script type="text/javascript" src="scriptFile.js"></scr'+'ipt>');
</script>

This conceals the closing script tag from the HTML parser but it is not a good idea because string concatenation is a surprisingly heavyweight operation. The same goal of disrupting the character sequence that the HTML parser will mistake for a closing tag can be achieved by using the javascript escape character to escape any character in the closing script tag:-

<script type="text/javascript">
document.write(
    '<script type="text/javascript" src="scriptFile.js"></script\>');
</script>

The HTML parser will now not find the character sequence "</script>" until it encounters the real closing script tag, but the internal representation of the string is not affected by the use of the escape character in the javascript source and no additional operations are needed.
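That the two spellings yield the same string value can be checked directly. The sketch below is meant to be run from an external file or a console rather than from within a script element, since the first literal contains the very sequence under discussion; the variable names are illustrative only.

```javascript
var plain = "</script>";
var concatenated = "</scr" + "ipt>";   // built at run time by concatenation
var escaped = "</script\>";            // "\>" is simply ">" in a string literal

// All three spellings produce the same nine-character string.
var allIdentical = (plain === concatenated) && (plain === escaped);
```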

However, as I said, it is the character sequence "</" that is officially to be taken as terminating a script element's contents. While no current browsers are known to be that strict it is entirely realistic that some browsers may exist (or be introduced) that take the HTML specification to heart and treat "</" as the end of the script content. HTML validators, meanwhile, already tend to take the HTML specification seriously and will report many mark-up errors as a result of getting the impression that a script element has terminated sooner than a browser would think it had.

The above use of the escape character may placate all known browsers but it will not address the requirements of the HTML specification. But they can both be addressed by escaping a different character, specifically the forward slash:-

<script type="text/javascript">
document.write(
    '<script type="text/javascript" src="scriptFile.js"><\/script>');
</script>

Of course now it is not just the closing script tag that needs to be escaped but all occurrences of closing tags appearing in string literals. All occurrences of "</" would need to be escaped to "<\/" to completely avoid HTML parser and validation problems. Alternatively the javascript source could be moved to an external file as then it is never examined by an HTML parser or considered in HTML validation.
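As a rough sketch of that blanket escaping, a small helper could rewrite every "</" in a piece of javascript source text before it is pasted into a page. The name escapeEndTagOpen is hypothetical, and the helper assumes that "</" only ever occurs inside string literals in the source it is given; it does no parsing of its own. The sample source is assembled by concatenation purely so that this listing itself contains no unescaped "</".

```javascript
// "\/" is simply "/" in a string literal, so the value is unchanged.
var escapedSlash = "<\/script>";
var sameString = (escapedSlash === "<" + "/script>");

// Hypothetical helper: rewrites every "</" in javascript source text
// as "<\/". Assumes "</" appears only inside string literals.
function escapeEndTagOpen(sourceText){
    return sourceText.split("</").join("<\\/");
}

var source = "document.write('<b>bold<" + "/b>');";
var safeSource = escapeEndTagOpen(source);
```

After the call safeSource no longer contains the sequence "</" anywhere, yet a javascript interpreter reading it would see the same string literal as before.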

External Javascript Files

Placing javascript source code in external files has several advantages. For those who are required to use a browser that is javascript incapable/disabled it can significantly reduce download time, as those browsers just will not bother getting the external file because they have no use for it, whereas scripts on an HTML page must be downloaded with the page if the HTML is to be used.

External javascript files can also be cached separately from HTML pages so they may need to be downloaded less often even for the users of javascript capable/enabled browsers.

They entirely remove the need to worry about script hiding (no longer needed anyway), escaping HTML closing tags in strings or any other factors relating to the parsing of mark-up languages.

Javascript imported by using the src attribute of a script element is used in place of the content for the script element that imported it. The position of that element in the page defines the "location" of the script in the document. If the file executes document.write then any content written will be inserted following the script element that imported the file, and any other elements on the page referenced by that script as it loads will need to have already been parsed by the HTML parser at that point or they will not be found in the DOM.

The Content of External Javascript Files

Javascript files imported using the src attribute of script elements must contain only javascript source code. They must not contain any HTML. It is a surprisingly common error for opening and closing script tags and/or the "hide from older browsers" HTML comment tags to be included in external script files. In that context they are javascript syntax errors and nothing else.

Script Elements that Import Javascript Files and Have Contents

Script elements may attempt to both import a file and contain script contents. The idea here is to provide some scripted action in the event that the external file cannot be loaded for some reason. Such a script element may look like:-

<script type="text/javascript" src="../scripts/scriptFile.js">
    var externalScriptLoaded = false;
</script> 

The browser should handle this formulation of the script element by attempting to load the external file, but in the event of that attempt failing the contents of the script element are executed instead. So, in the example above, if the external file is loaded and executed the contents of the element would not be executed. That external file would itself define the externalScriptLoaded global variable and assign it a value of boolean true. If the file did not load the contents would be executed, again creating the externalScriptLoaded variable, but this time assigning it a false value. Another script on the page can then read the externalScriptLoaded variable as a means of determining whether the external script loaded successfully.

The definition of failing to load an external script is centred around HTTP. If no connection to the server can be made, or an HTTP error response, such as 404, is returned, then the external script has failed to load and the browser can execute the contents of the element. However, many servers are set up in such a way that they do not actually return the expected HTTP error responses, but instead return an HTML page that is intended to inform the user of the error. This is fine for humans but from the point of view of the browser such a response is indistinguishable from a returned (but erroneous) javascript source file (This is in part because the browser disregards content-type headers sent with external javascript files so even if the HTML error reporting page is sent with a text/html content type the browser will still assume that it contains javascript source). The browser attempts to execute the returned HTML source as javascript and fails at the first inevitable syntax error. But erroring while executing what the browser thought was an external javascript file does not result in the execution of the code within the script element.
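The way an HTML error page fails when treated as javascript can be imitated outside a browser. The sketch below uses the standard Function constructor to compile some text without running it; the name failsToCompile and the sample error page are invented for illustration, and compiling text as a function body only approximates how a browser handles a script file.

```javascript
// Hypothetical helper: reports whether the text is rejected with a
// SyntaxError when compiled as javascript (it is never executed).
function failsToCompile(sourceText){
    try{
        new Function(sourceText);   // compile only
        return false;
    }catch(e){
        return e instanceof SyntaxError;
    }
}

// An invented stand-in for a server's "not found" page.
var errorPage = "<html><body><h1>Not Found</h1></body></html>";
var errorPageRejected = failsToCompile(errorPage);
var realScriptRejected = failsToCompile("var externalScriptLoaded = true;");
```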

In practice script elements are rarely used where an external file is imported and script contents are provided for the element. If a separate script wanted to verify that an externally imported script was available it would not need the mechanism demonstrated in the example above as javascript provides many ways of verifying the existence of javascript defined entities. So, for example, if the external script defined a function called functionName, the availability of that function could be verified as:-

if(typeof functionName == "function"){
    functionName();
}

- and if a function defined in an external file is available then that external file must have been successfully loaded.

Event Handling Attributes: Intrinsic Events

The final place where javascript can be included in an HTML document is as the value strings provided for event handling attributes.

The values of event handling attributes will almost certainly need to be quoted because it is nearly impossible to write a javascript statement that only uses the characters allowed in an unquoted attribute value. Quoting can get quite involved in attribute values: they need to be quoted in the HTML source, so whatever type of quote mark is used in the HTML cannot be used within the javascript code provided as the value, because the HTML parser would take it as ending the attribute value. Javascript string literals, for their part, allow either double quotes or single quotes as delimiters, and allow the type of quote not used as the delimiter to appear within the string literal unescaped.

So, given a desire to assign the string "don't do that" to an element's value property in an onclick event, because of the single quote appearing in the string itself the attribute value onclick='this.value = "don't do that";' will not work because the HTML parser will take the second single quote as ending the attribute value. It will not work to simply escape the single quote as onclick='this.value = "don\'t do that";' because the HTML parser doesn't know anything about javascript escapes and still sees the second single quote in the middle of the javascript string.

In this case escaping the single quote and reversing the quoting between the HTML and the javascript onclick="this.value = 'don\'t do that';" or using a javascript hex escape (which the HTML parser will not see as a quote) onclick='this.value = "don\x27t do that";' would solve the problem. But quotes in event handling attribute strings that define code that uses string literals often need to be thought about.
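Seen purely as javascript, the two working forms produce the same string value, which the following sketch confirms. The variable names are illustrative, and the expected value is built with String.fromCharCode(39) (the single quote) so that the comparison does not depend on either escape.

```javascript
// The literal used with double-quoted HTML attribute values.
var reversedQuoting = 'don\'t do that';
// The literal used with single-quoted HTML attribute values;
// \x27 is the hex escape for the single quote character.
var hexEscape = "don\x27t do that";

var expected = "don" + String.fromCharCode(39) + "t do that";
var bothMatch = (reversedQuoting === expected) && (hexEscape === expected);
```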

The Default Language for Intrinsic Events

All else being equal, web browsers seem to all default the scripting language used with intrinsic events to javascript (ECMAScript, in whichever implementation is provided) and there is no formal mechanism for associating a scripting language with individual event handling attributes (unlike script elements which must be provided with a type attribute).

The HTML specification calls for a page wide default scripting language to be set, and that is the only specified way to set the scripting language for intrinsic events.

To this end the HTML specification proposes the inclusion in the HEAD section of a page of a META tag:-

<meta http-equiv="Content-Script-Type" content="text/javascript">

This is supposed to assert the default type of script language on a page (possibly overridden by the (required) type attributes provided for individual script elements). As a result it is formally correct to include this tag in HTML 4.01 documents (or provide a corresponding HTTP header when the page is served).

However, there is no evidence that any current browsers pay any attention to this META element at all (or would have any interest in a corresponding HTTP header), but then there are not many browsers that can execute any scripting language but javascript. This entire proposed mechanism has also been subject to criticism, and many recommend disregarding it entirely in favour of relying on the tendency of browsers to default to interpreting intrinsic event code as javascript.

NOSCRIPT Elements

The general idea of a NOSCRIPT element is to provide a holder for HTML marked-up content that will only be displayed when scripting is not enabled/available on a web browser. At first sight this seems to be a useful idea, and a contribution towards providing clean degradation in circumstances where scripts cannot be executed, by showing content that would be a substitute for any content that would otherwise have been provided by a script.

However, SCRIPT and NOSCRIPT elements are not actually directly substitutable in HTML. That is, you cannot use a NOSCRIPT element in all of the contexts in which you can use a SCRIPT element and produce valid HTML as a result.

The HTML DTDs categorise SCRIPT and NOSCRIPT differently: SCRIPT is an %inline, %special or %head.misc element, so it may appear in the HEAD of a document (as a child of a HEAD element (%head.misc)), or in any context that allows %inline or %special content (descendants of the BODY element, but not in all contexts). The NOSCRIPT element is categorised as %block, and as a result it cannot appear in the HEAD at all, and may only appear in the body in a context that allows %block content (%flow or %block but not %inline). This means that the one cannot always stand as a direct substitute for the other in a valid HTML document.

HTML NOSCRIPT elements probably seemed like a good idea when they were first introduced. They were probably even viable at the time because so few browsers were able to execute javascript that a division between SCRIPT and NOSCRIPT could encompass all of the possibilities. The problem with them now is the diversity of javascript capable web browsers, with their differing object models and language implementations.

While it remains the case that any browser on which scripting is disabled or unavailable will use any NOSCRIPT elements provided in an HTML page, it is not the case that all javascript supporting and enabled browsers will be able to successfully execute any script specified within (or imported by) a SCRIPT element. The browser may lack the features needed by the script, or just not be sufficiently dynamic to present any content that the script intends to insert into the document.

Even browser features as seemingly universal as the document.write function are not universally supported (even on modern browsers), and anything even remotely dynamic is bound to fail somewhere. So instead of having to cope with two certain outcomes, successful execution and no script execution at all, it is actually necessary to cope with three possible outcomes, adding the possibility that scripting is supported by the browser but the features required by any individual script are not available. In that third case the script fails to provide what it was intended to provide, but the contents of the NOSCRIPT elements are not presented either.

This effectively renders NOSCRIPT elements next to useless when it comes to providing clean degradation. They leave an unbridgeable gap between browsers unwilling or unable to execute scripts at all and browsers that will fully support any given script. And whatever content seemed to make sense within those NOSCRIPT elements must also make sense in the context of a javascript capable browser that does not support the features required by a script.

Recognising a requirement for clean degradation in script design, and the inability of NOSCRIPT elements to contribute towards facilitating it, many recommend never using NOSCRIPT elements. Instead they provide content that works in place of active script support within the HTML, and then have their scripts remove, or transform by manipulation, that content only when the browser proves sufficiently supportive for the script to be viable. This technique allows the design to consider only two conditions: the browser fully supports the script and will execute it, or the browser does not support the script, so whatever was originally included in the HTML will be what the user is exposed to.
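A minimal sketch of that approach follows. The function name enhanceIfSupported, its parameters and the particular features tested are all assumptions made for illustration; a real script would test for whichever features it actually goes on to use.

```javascript
// Hypothetical sketch. fallbackId names an element already present in
// the HTML that holds content usable without scripting; enhance is the
// function that replaces or transforms that content.
function enhanceIfSupported(doc, fallbackId, enhance){
    // Verify every required feature before touching the page.
    if(!doc || !doc.getElementById || !doc.createElement){
        return false;   // the HTML content stays exactly as authored
    }
    var fallback = doc.getElementById(fallbackId);
    if(!fallback || !fallback.parentNode ||
       !fallback.parentNode.replaceChild){
        return false;
    }
    enhance(doc, fallback);
    return true;
}
```

When any test fails the function does nothing, so the user is left with the original HTML, which is the same outcome as on a browser with no scripting at all.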
