Cross-Site Scripting (XSS) is a security vulnerability that enables an attacker to place client-side scripts (usually JavaScript) into web pages. When other users load affected pages, the attacker's scripts run, enabling the attacker to steal cookies and session tokens, change the contents of the web page through DOM manipulation, or redirect the browser to another page. XSS vulnerabilities generally occur when an application takes user input and outputs it to a page without validating, encoding, or escaping it.
This article applies primarily to ASP.NET Core MVC with views, Razor Pages, and other apps that return HTML that may be vulnerable to XSS. Web APIs that return data in the form of HTML, XML, or JSON can trigger XSS attacks in their client apps if they don't properly sanitize user input, depending on how much trust the client app places in the API. For example, if an API accepts user-generated content and returns it in an HTML response, an attacker could inject malicious scripts into the content that execute when the response is rendered in the user's browser.
To prevent XSS attacks, web APIs should implement input validation and output encoding. Input validation ensures that user input meets expected criteria and doesn't include malicious code. Output encoding ensures that any data returned by the API is properly sanitized so that it can't be executed as code by the user's browser. For more information, see this GitHub issue.
Protecting your application against XSS
At a basic level, XSS works by tricking your application into inserting a <script> tag into your rendered page, or by inserting an On* event into an element. Developers should use the following prevention steps to avoid introducing XSS into their applications:
Never put untrusted data into your HTML input, unless you follow the rest of the steps below. Untrusted data is any data that may be controlled by an attacker, such as HTML form inputs, query strings, HTTP headers, or even data sourced from a database, as an attacker may be able to breach your database even if they can't breach your application.
Before putting untrusted data inside an HTML element, ensure it's HTML encoded. HTML encoding takes characters such as < and changes them into a safe form like &lt;.
Before putting untrusted data into an HTML attribute, ensure it's HTML encoded. HTML attribute encoding is a superset of HTML encoding and encodes additional characters such as " and '.
Before putting untrusted data into JavaScript, place the data in an HTML element whose contents you retrieve at runtime. If this isn't possible, then ensure the data is JavaScript encoded. JavaScript encoding takes dangerous characters for JavaScript and replaces them with their hex, for example, < would be encoded as \u003C.
Before putting untrusted data into a URL query string, ensure it's URL encoded.
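The three kinds of encoding named in the steps above can be compared side by side. The following is a minimal sketch, assuming a .NET 6 or later console project with top-level statements. It uses the default encoders from the System.Text.Encodings.Web namespace, which are covered later in this article. The variable names are chosen only for this illustration.

```csharp
using System;
using System.Text.Encodings.Web;

// Hypothetical untrusted value, for example taken from a form field.
string untrustedInput = "<\"123\">";

// HTML element or HTML attribute context.
string htmlEncoded = HtmlEncoder.Default.Encode(untrustedInput);

// JavaScript string context.
string jsEncoded = JavaScriptEncoder.Default.Encode(untrustedInput);

// URL query string value context.
string urlEncoded = UrlEncoder.Default.Encode(untrustedInput);

Console.WriteLine(htmlEncoded); // &lt;&quot;123&quot;&gt;
Console.WriteLine(jsEncoded);   // \u003C\u0022123\u0022\u003E
Console.WriteLine(urlEncoded);  // %3C%22123%22%3E
```

The same input produces three different safe forms, which is why the encoder has to match the context the value is written into.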
HTML Encoding using Razor
The Razor engine used in MVC automatically encodes all output sourced from variables, unless you work really hard to prevent it from doing so. It uses HTML attribute encoding rules whenever you use the @ directive. As HTML attribute encoding is a superset of HTML encoding, you don't have to concern yourself with whether you should use HTML encoding or HTML attribute encoding. You must ensure that you only use @ in an HTML context, not when attempting to insert untrusted input directly into JavaScript. Tag helpers will also encode input you use in tag parameters.
Take the following Razor view:
@{
    var untrustedInput = "<\"123\">";
}

@untrustedInput
This view outputs the contents of the untrustedInput variable. This variable includes some characters which are used in XSS attacks, namely <, " and >. Examining the source shows the rendered output encoded as:
&lt;&quot;123&quot;&gt;
Warning
ASP.NET Core MVC provides an HtmlString class which isn't automatically encoded upon output. This should never be used in combination with untrusted input as this will expose an XSS vulnerability.
JavaScript Encoding using Razor
There may be times you want to insert a value into JavaScript to process in your view. There are two ways to do this. The safest way to insert values is to place the value in a data attribute of a tag and retrieve it in your JavaScript. For example:
@{
    var untrustedInput = "<script>alert(1)</script>";
}

<div id="injectedData"
     data-untrustedinput="@untrustedInput" />
<div id="scriptedWrite" />
<div id="scriptedWrite-html5" />

<script>
    var injectedData = document.getElementById("injectedData");

    // All clients
    var clientSideUntrustedInputOldStyle =
        injectedData.getAttribute("data-untrustedinput");

    // HTML 5 clients only
    var clientSideUntrustedInputHtml5 =
        injectedData.dataset.untrustedinput;

    // Put the injected, untrusted data into the scriptedWrite div tag.
    // Do NOT use document.write() on dynamically generated data as it
    // can lead to XSS.
    document.getElementById("scriptedWrite").innerText += clientSideUntrustedInputOldStyle;

    // Or you can use createElement() to dynamically create document elements
    // This time we're using textContent to ensure the data is properly encoded.
    var x = document.createElement("div");
    x.textContent = clientSideUntrustedInputHtml5;
    document.body.appendChild(x);

    // You can also use createTextNode on an element to ensure data is properly encoded.
    var y = document.createElement("div");
    y.appendChild(document.createTextNode(clientSideUntrustedInputHtml5));
    document.body.appendChild(y);
</script>
The preceding markup generates the following HTML:
<div id="injectedData"
     data-untrustedinput="&lt;script&gt;alert(1)&lt;/script&gt;" />
<div id="scriptedWrite" />
<div id="scriptedWrite-html5" />

<script>
    var injectedData = document.getElementById("injectedData");

    // All clients
    var clientSideUntrustedInputOldStyle =
        injectedData.getAttribute("data-untrustedinput");

    // HTML 5 clients only
    var clientSideUntrustedInputHtml5 =
        injectedData.dataset.untrustedinput;

    // Put the injected, untrusted data into the scriptedWrite div tag.
    // Do NOT use document.write() on dynamically generated data as it can
    // lead to XSS.
    document.getElementById("scriptedWrite").innerText += clientSideUntrustedInputOldStyle;

    // Or you can use createElement() to dynamically create document elements
    // This time we're using textContent to ensure the data is properly encoded.
    var x = document.createElement("div");
    x.textContent = clientSideUntrustedInputHtml5;
    document.body.appendChild(x);

    // You can also use createTextNode on an element to ensure data is properly encoded.
    var y = document.createElement("div");
    y.appendChild(document.createTextNode(clientSideUntrustedInputHtml5));
    document.body.appendChild(y);
</script>
The preceding code generates the following output:
<script>alert(1)</script>
<script>alert(1)</script>
<script>alert(1)</script>
Warning
Do NOT concatenate untrusted input in JavaScript to create DOM elements or use document.write() on dynamically generated content.
Use one of the following approaches to prevent code from being exposed to DOM-based XSS:
- createElement() and assign property values with appropriate methods or properties such as node.textContent= or node.innerText=.
- document.createTextNode() and append it in the appropriate DOM location.
- element.setAttribute()
- element[attribute]=
Accessing encoders in code
The HTML, JavaScript and URL encoders are available to your code in two ways:
- Inject them via dependency injection.
- Use the default encoders contained in the System.Text.Encodings.Web namespace.
When you use the default encoders, any customizations applied to the character ranges treated as safe don't take effect. The default encoders use the safest encoding rules possible.
To use the configurable encoders via DI, your constructors should take an HtmlEncoder, JavaScriptEncoder, and UrlEncoder parameter as appropriate. For example:
using System.Text.Encodings.Web;
using Microsoft.AspNetCore.Mvc;

public class HomeController : Controller
{
    HtmlEncoder _htmlEncoder;
    JavaScriptEncoder _javaScriptEncoder;
    UrlEncoder _urlEncoder;

    // The encoders are supplied by dependency injection.
    public HomeController(HtmlEncoder htmlEncoder,
                          JavaScriptEncoder javascriptEncoder,
                          UrlEncoder urlEncoder)
    {
        _htmlEncoder = htmlEncoder;
        _javaScriptEncoder = javascriptEncoder;
        _urlEncoder = urlEncoder;
    }
}
Encoding URL Parameters
If you want to build a URL query string with untrusted input as a value, use the UrlEncoder to encode the value. For example:
var example = "\"Quoted Value with spaces and &\"";
var encodedValue = _urlEncoder.Encode(example);
After encoding, the encodedValue variable contains %22Quoted%20Value%20with%20spaces%20and%20%26%22. Spaces, quotes, punctuation, and other unsafe characters are percent encoded to their hexadecimal value. For example, a space character becomes %20.
Warning
Don't use untrusted input as part of a URL path. Always pass untrusted input as a query string value.
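As a rough sketch of the guidance above, the following console program keeps the URL path fixed and places the untrusted value only in the query string. It assumes a .NET 6 or later project with top-level statements and uses UrlEncoder.Default rather than an injected encoder. The BuildSearchUrl helper, the /search path, and the term parameter name are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.Encodings.Web;

// Hypothetical helper: appends one untrusted value to a fixed, trusted path.
string BuildSearchUrl(string untrustedTerm)
{
    // Only the query string value comes from untrusted input; the path is a constant.
    return "/search?term=" + UrlEncoder.Default.Encode(untrustedTerm);
}

string url = BuildSearchUrl("\"Quoted Value with spaces and &\"");

Console.WriteLine(url);
// /search?term=%22Quoted%20Value%20with%20spaces%20and%20%26%22
```

Because the ampersand in the value is encoded as %26, it can't be mistaken for a separator that starts a second query string parameter.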
Customizing the Encoders
By default encoders use a safe list limited to the Basic Latin Unicode range and encode all characters outside of that range as their character code equivalents. This behavior also affects Razor TagHelper and HtmlHelper rendering as it uses the encoders to output your strings.
The reasoning behind this is to protect against unknown or future browser bugs (previous browser bugs have tripped up parsing based on the processing of non-English characters). If your web site makes heavy use of non-Latin characters, such as Chinese, Cyrillic or others this is probably not the behavior you want.
The encoder safe lists can be customized to include Unicode ranges appropriate to the app during startup, in Program.cs.
For example, with the default configuration, consider a Razor HtmlHelper call similar to the following:
<p>This link text is in Chinese: @Html.ActionLink("汉语/漢語", "Index")</p>
The preceding markup is rendered with Chinese text encoded:
<p>This link text is in Chinese: <a href="/">&#x6C49;&#x8BED;/&#x6F22;&#x8A9E;</a></p>
To widen the characters treated as safe by the encoder, insert the following line into Program.cs:
builder.Services.AddSingleton<HtmlEncoder>(
    HtmlEncoder.Create(allowedRanges: new[] { UnicodeRanges.BasicLatin,
                                              UnicodeRanges.CjkUnifiedIdeographs }));
In apps that configure services in a Startup class rather than in Program.cs, insert the equivalent line into the ConfigureServices() method in Startup.cs:
services.AddSingleton<HtmlEncoder>(
    HtmlEncoder.Create(allowedRanges: new[] { UnicodeRanges.BasicLatin,
                                              UnicodeRanges.CjkUnifiedIdeographs }));
This example widens the safe list to include the Unicode range CjkUnifiedIdeographs. The rendered output now becomes:
<p>This link text is in Chinese: <a href="/">汉语/漢語</a></p>
Safe list ranges are specified as Unicode code charts, not languages. The Unicode standard has a list of code charts you can use to find the chart containing your characters. Each encoder, Html, JavaScript and Url, must be configured separately.
Note
Customization of the safe list only affects encoders sourced via DI. If you directly access an encoder via System.Text.Encodings.Web.*Encoder.Default, then the default, Basic Latin only safe list is used.
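The difference between a default encoder and one created with a wider safe list can be seen outside of DI as well. The following is a minimal sketch, assuming a .NET 6 or later console project with top-level statements. It calls HtmlEncoder.Create directly instead of registering the encoder as a service, and the variable names are chosen only for this illustration.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

string text = "汉语/漢語";

// Default encoder: only Basic Latin is on the safe list.
string withDefault = HtmlEncoder.Default.Encode(text);

// Custom encoder: Basic Latin plus CJK Unified Ideographs are treated as safe.
HtmlEncoder cjkEncoder = HtmlEncoder.Create(
    UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs);
string withCustom = cjkEncoder.Encode(text);

// HTML-sensitive characters are still encoded by the custom encoder.
string stillEncoded = cjkEncoder.Encode("<汉语>");

Console.WriteLine(withDefault);  // value: &#x6C49;&#x8BED;/&#x6F22;&#x8A9E;
Console.WriteLine(withCustom);   // value: 汉语/漢語
Console.WriteLine(stillEncoded); // value: &lt;汉语&gt;
```

Widening the safe list changes only which characters pass through unescaped. Characters such as < and > remain encoded, so the XSS protection is kept.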
Where should encoding take place?
The generally accepted practice is that encoding takes place at the point of output and encoded values should never be stored in a database. Encoding at the point of output allows you to change the use of data, for example, from HTML to a query string value. It also enables you to easily search your data without having to encode values before searching and allows you to take advantage of any changes or bug fixes made to encoders.
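The following sketch illustrates the practice described above: the value is kept in its raw form and encoded only when it's written to a particular output context. It assumes a .NET 6 or later console project with top-level statements and the default encoders. The storedComment value stands in for data read from a database and is hypothetical.

```csharp
using System;
using System.Text.Encodings.Web;

// The raw value, as it would be stored (hypothetical example data).
string storedComment = "Tom & Jerry <3";

// Encode only at the point of output, once per output context.
string forHtml = HtmlEncoder.Default.Encode(storedComment);
string forQuery = UrlEncoder.Default.Encode(storedComment);

// Searching works against the raw value without any encoding step.
bool found = storedComment.Contains("& Jerry");

Console.WriteLine(forHtml);  // Tom &amp; Jerry &lt;3
Console.WriteLine(forQuery); // Tom%20%26%20Jerry%20%3C3
Console.WriteLine(found);    // True
```

Had the HTML-encoded form been stored instead, the search for "& Jerry" would fail, and reusing the value in a query string would carry HTML entities into the URL.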
Validation as an XSS prevention technique
Validation can be a useful tool in limiting XSS attacks. For example, a numeric string containing only the characters 0-9 won't trigger an XSS attack. Validation becomes more complicated when accepting HTML in user input. Parsing HTML input is difficult, if not impossible. Markdown, coupled with a parser that strips embedded HTML, is a safer option for accepting rich input. Never rely on validation alone. Always encode untrusted input before output, no matter what validation or sanitization has been performed.
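The numeric-string case mentioned above can be sketched as follows. This is an illustration only, assuming a .NET 6 or later console project with top-level statements. The IsNumericString helper is hypothetical and isn't part of ASP.NET Core. It checks for the characters 0-9 explicitly rather than using char.IsDigit, which also accepts digits from other scripts.

```csharp
using System;
using System.Linq;
using System.Text.Encodings.Web;

// Hypothetical validator: accepts only non-empty strings made of the characters 0-9.
bool IsNumericString(string input) =>
    !string.IsNullOrEmpty(input) && input.All(c => c >= '0' && c <= '9');

string good = "12345";
string bad = "123<script>";

Console.WriteLine(IsNumericString(good)); // True
Console.WriteLine(IsNumericString(bad));  // False

// Validation doesn't replace encoding: encode before output regardless.
string output = HtmlEncoder.Default.Encode(good);
Console.WriteLine(output); // 12345
```

Encoding a value that has already passed validation leaves it unchanged here, so applying both steps costs little and keeps the output safe if the validation rule is later relaxed.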