Yesterday Google announced a new AJAX API for translation and language detection. It’s a JavaScript API that translates blocks of text within a web page and detects their language. But I need server-side language detection of a text, using PHP and the Google AJAX Language API.
Step 1 : Find Out the Internal Workings
You can test the language detection in the announcement post on The Official Google Blog. So I did, and I took my friend Firebug with me. This revealed the URL from which the JavaScript code requests the magic :
http://www.google.com/uds/GlangDetect?callback=google.language.callbacks.id102&context=22&q=this%20is%20a%20test%20of%20a%20language&key=internal&v=1.0
A quick surf to that URL shows the response :
google.language.callbacks.id102('22',{"language":"en","isReliable":true,"confidence":0.33399844}, 200, null, 200)
This is without any doubt JSONP : it invokes the JavaScript function “google.language.callbacks.id102”, passing 5 arguments to it.
Step 2 : Analyze
Request
Let’s analyze all parameters in the request URL :
- callback : The name of the function that is called when the response is ready. It can be left off.
- context : No idea. It can be left off.
- q : The text you want to detect the language for. Required (DUH!).
- key : An API key. It seems it can be left off.
- v : The API version. Required.
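As a sketch of how these parameters could be put together in PHP: the helper below sends only the two required ones, v and q. The function name build_detect_url is an assumption made for illustration. Note that http_build_query() encodes spaces as "+" rather than "%20"; both are valid in a query string.

```php
<?php
// Hypothetical helper: builds the detection URL from the required parameters only.
// callback, context and key are deliberately left off.
function build_detect_url($text, $version = '1.0')
{
    $base = 'http://www.google.com/uds/GlangDetect';
    // Pass '&' explicitly so the result does not depend on arg_separator.output.
    $query = http_build_query(array('v' => $version, 'q' => $text), '', '&');
    return $base . '?' . $query;
}

echo build_detect_url('this is a test of a language'), "\n";
// prints: http://www.google.com/uds/GlangDetect?v=1.0&q=this+is+a+test+of+a+language
```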
Response
Like I said, the response is a call to the JavaScript function “google.language.callbacks.id102”. It takes 5 parameters :
- 22 : Same value as the context parameter in the request URL.
- A JavaScript object with 3 properties :
  - language : The language code detected by Google, here the two-letter ISO 639-1 code “en”.
  - isReliable : Is the guess by Google reliable?
  - confidence : How confident is Google about the guess? I think it takes values from 0 to 1, 0 being least confident and 1 being most confident.
- 200 : No idea.
- null : No idea.
- 200 : Again, no idea.
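To make the structure of that response concrete, here is a minimal sketch that pulls the object out of the JSONP wrapper in PHP. The function name parse_jsonp_detection is an assumption made for illustration, and the regular expression relies on the response containing exactly one object literal, as in the sample above; it is not a general JSONP parser.

```php
<?php
// Hypothetical helper: extracts the detection object from a JSONP response.
// Assumes the response holds a single {...} object literal.
function parse_jsonp_detection($jsonp)
{
    // Grab everything from the first "{" to the last "}".
    if (!preg_match('/\{.*\}/s', $jsonp, $matches)) {
        return null;
    }
    // Decode into an associative array (json_decode() requires PHP >= 5.2.0).
    return json_decode($matches[0], true);
}

$sample = 'google.language.callbacks.id102(\'22\',{"language":"en","isReliable":true,"confidence":0.33399844}, 200, null, 200)';
$detection = parse_jsonp_detection($sample);
echo $detection['language'], "\n"; // prints: en
```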
Step 3 : Bend and Break
Now let’s see what we need to do to make this work in PHP.
Dump Unknown Request Parameters
I don’t use magic I don’t understand, so I dump the request parameters “context” and “key”. The result doesn’t seem to change when they are left off.
No JSONP
Since I’m going to use PHP to request the language detection, I won’t be needing the callback parameter. So I dump that one too. The result is nice :
{"responseData": {"language":"en","isReliable":true,"confidence":0.33399844}, "responseDetails": null, "responseStatus": 200}
The 5 response parameters passed to the callback function are replaced by one JavaScript object. At the same time, another useful response parameter becomes visible : responseStatus. I guess it indicates whether the request was successful, just like the HTTP status code 200 OK.
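A minimal sketch of reading that object in PHP, using the sample response above as a literal string so that nothing is requested over the network. It uses PHP’s built-in json_decode() (PHP >= 5.2.0); treating responseStatus 200 as success is the same guess as in the text, not something documented here.

```php
<?php
// The sample response from above, pasted as a string.
$json = '{"responseData": {"language":"en","isReliable":true,"confidence":0.33399844}, "responseDetails": null, "responseStatus": 200}';

// true => decode objects into associative arrays.
$response = json_decode($json, true);

// Assumption: a responseStatus of 200 means the detection succeeded.
if (is_array($response) && $response['responseStatus'] == 200) {
    $data = $response['responseData'];
    printf(
        "%s (reliable: %s, confidence: %.2f)\n",
        $data['language'],
        $data['isReliable'] ? 'yes' : 'no',
        $data['confidence']
    );
}
// prints: en (reliable: yes, confidence: 0.33)
```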
Step 4 : Mix it with PHP
The final step is to mix what I have learned into a simple PHP class :
<?php
class Google_Language_Detection
{
    const URL = 'http://www.google.com/uds/GlangDetect?v=1.0&q=';

    // Returns the responseData array (language, isReliable, confidence),
    // or false when the response status is not 200.
    public static function detect($text)
    {
        $url = self::URL . urlencode($text);
        $response = Zend_Json::decode(file_get_contents($url));
        if ($response['responseStatus'] == 200)
            return $response['responseData'];
        else
            return false;
    }
}
?>
Notice that I use Zend Framework to decode the JSON from the response into a native PHP associative array.
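The class above assumes the HTTP request always succeeds and the body is always valid JSON. A minimal sketch of a more defensive variant follows, using PHP’s built-in json_decode() (PHP >= 5.2.0) in place of Zend_Json. The function name detect_language_safe and the injectable $fetch parameter are assumptions made for illustration, not part of the class above; the stub only replays the sample response shown earlier so the example runs without network access.

```php
<?php
// Hypothetical defensive variant of the detection call.
// $fetch is any callable that takes a URL and returns the body, or false on failure.
function detect_language_safe($text, $fetch = 'file_get_contents')
{
    $url = 'http://www.google.com/uds/GlangDetect?v=1.0&q=' . urlencode($text);

    $body = call_user_func($fetch, $url);
    if ($body === false) {
        return false; // the request itself failed
    }

    $response = json_decode($body, true);
    if (!is_array($response)
        || !isset($response['responseStatus'], $response['responseData'])
        || $response['responseStatus'] != 200) {
        return false; // malformed body, missing data or a non-200 status
    }

    return $response['responseData'];
}

// Stub fetcher replaying the sample response, so no request is sent.
$stub = function ($url) {
    return '{"responseData": {"language":"en","isReliable":true,"confidence":0.33399844}, "responseDetails": null, "responseStatus": 200}';
};

$result = detect_language_safe('this is a test of a language', $stub);
echo $result['language'], "\n"; // prints: en
```

Calling detect_language_safe($text) without a second argument falls back to file_get_contents(), which mirrors the behaviour of the class.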
Thank you so much for posting this! I have been looking all over for a server-side way of detecting the language of a piece of text. I am going to try porting your idea to Ruby or Java.
Wow! Awesome stuff. I was wondering if you have any spare time to help guide me to get this working on my site?
Erik
Chess.com
chessdev[4t}gmail,com
Perfect! Thanks for posting this!
Thanks a lot, Lode. This still works like a charm.
I changed the code, so there is no need for Zend Framework:
function detect_language($text){
    /* NOTE: Google is currently using version 1.0.
       If this changes in the future, change the value of this
       variable. Otherwise the URL will not work anymore. */
    $version = '1.0';
    $url = 'http://www.google.com/uds/GlangDetect?v='.$version.
           '&q='.urlencode($text);
    /* Get the (multidimensional) associative array from the JSON
       NOTE: json_decode() requires PHP >= 5.2.0 */
    $response = json_decode(file_get_contents($url), true);
    if ($response['responseStatus'] == 200)
        return $response['responseData']['language'];
    else
        return false;
}
Have you tried this serverside recently? I just tried it and got a json message “Suspected Terms of Service Abuse”. If I really wanted to “abuse” this service I guess I could fake all the headers that the jsonp request would do… Would be nice if google provided a way to do this, they do it for their translation service: http://code.google.com/apis/ajaxlanguage/documentation/#fonje
Wonderful stuff
Hi Samuel,
Your function works well, except for a few languages that it doesn’t seem to support (ar, ru and a few more).
Do you have any idea why that is?
Thanks to both of you for this feature, it will help me a lot.