Yesterday Google announced a new AJAX API for translation and language detection. It’s a Javascript API to translate and detect the language of blocks of text within a webpage. But I need server side language detection of a text using PHP and the Google AJAX Language API.
Step 1 : Find Out Internal Working
You can test the language detection in the announcement post on The Official Google Blog. So I did, and I took my friend FireBug with me. This revealed the URL from which the Javascript code requests the magic :
1 | http://www.google.com/uds/GlangDetect?callback=google.language.callbacks.id102&context=22&q=this%20is%20a%20test%20of%20a%20language&key=internal&v=1.0 |
A quick surf to that URL shows the response :
1 | google.language.callbacks.id102('22',{"language":"en","isReliable":true,"confidence":0.33399844}, 200, null, 200) |
Which is without any doubt JSONP. It invokes a Javascript method “google.language.callbacks.id102″, passing 5 parameters to it.
Step 2 : Analyze
Request
Lets analyze all parameters in the request URL :
- callback : This is the name of the function that is called when the response is ready. It can be left-off.
- context : No idea. It can be left-off.
- q : The text you want to detect the language for. Is required (DUH!).
- key : An API key. It can be left-off as it seems.
- v : The API version. Is required.
Response
Like I said, the reponse is a call to the Javascript method “google.language.callbacks.id102″. It takes 5 parameters :
- 22 : Same value as the context parameter in the request URL.
- A Javascript object with 3 properties :
- language : The ISO2 language code of the language detected by Google.
- isReliable : Is the quess by Google reliable?
- confidence : How confident is Google about the guess? I think it takes values from 0 to 1, 0 being least reliable and 1 being most reliable.
- 200 : No idea.
- null : No idea.
- 200 : Again, no idea.
Step 3 : Bend and Break
Now lets see what we need to do to make this work in PHP.
Dump Unknown Request Parameters
Don’t use magic you don’t understand, so I dump the request parameters “context” and “key”. The result doesn’t seem to change when leaving these off.
No JSONP
Since I’m going to use PHP to request the language detection, I won’t be needing the callback parameter. So I dump that one too. The result is nice :
1 | {"responseData": {"language":"en","isReliable":true,"confidence":0.33399844}, "responseDetails": null, "responseStatus": 200} |
The 5 response parameters passed to the callback function are replaced by one Javascript object. At once, another useful response parameter becomes available : responseStatus. I guess it indicates if the request was successful. Just like the HTTP Code 200 OK.
Step 4 : Mix it with PHP
The final step is to mix what I have learned into a simple PHP class :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | <?php class Google_Language_Detection { const URL = 'http://www.google.com/uds/GlangDetect?v=1.0&q='; public static function detect($text) { $url = self::URL . urlencode($text); $response = Zend_Json::decode(file_get_contents($url)); if ($response['responseStatus'] == 200) return $response['responseData']; else return false; } } ?> |
Notice that I use Zend Framework to decode the JSON from the response into a native PHP associative array.
Thank you so much for posting this! I have been looking all over for a server-side way of detecting the language of a piece of text. I am going to try porting your idea to Ruby or Java.
Wow! Awesome stuff. I was wondering if you have any spare time to help guide me to get this working on my site?
Erik
Chess.com
chessdev[4t}gmail,com
Perfect! Thanks for posting this!
Thanks a lot, Lode. This still works like a charm.
I changed the code, so there is no need for Zend Framework:
function detect_language($text){
/* NOTE: Google is currently using version 1.0.
If this changes in the future, change the value of this
variable. Otherwise the URL will not work anymore. */
$version = '1.0';
$url = 'http://www.google.com/uds/GlangDetect?v='.$version.
'&q='.urlencode($text);
/* Get the (multidimensional) associative array from the JSON
NOTE: json_decode() requires PHP >= 5.2.0 */
$response = json_decode(file_get_contents($url), true);
if ($response['responseStatus'] == 200)
return $response['responseData']['language'];
else
return false;
}