Server Side Language Detection with Google Language API

Yesterday Google announced a new AJAX API for translation and language detection. It’s a Javascript API for translating blocks of text within a web page and detecting their language. But I need server-side language detection, so I want to use the Google AJAX Language API from PHP.

Step 1 : Find Out How It Works Internally

You can test the language detection in the announcement post on The Official Google Blog. So I did, and I took my friend Firebug with me. This revealed the URL from which the Javascript code requests the magic:

http://www.google.com/uds/GlangDetect?callback=google.language.callbacks.id102&context=22&q=this%20is%20a%20test%20of%20a%20language&key=internal&v=1.0

A quick surf to that URL shows the response:

google.language.callbacks.id102('22',{"language":"en","isReliable":true,"confidence":0.33399844}, 200, null, 200)

This is clearly JSONP: the response invokes the Javascript function “google.language.callbacks.id102” and passes it 5 parameters.
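To look at those 5 parameters from PHP, the JSONP wrapper has to be peeled off first. The sketch below is a hypothetical helper (parse_glang_jsonp is not part of any Google API) that assumes the exact shape of the captured response: a callback name, a single-quoted first argument, and remaining arguments that are valid JSON.

```php
<?php

/**
 * Split a GlangDetect JSONP response into its callback name and arguments.
 * Assumption: the first argument is a single-quoted string (the context
 * value, e.g. '22') and the remaining arguments are valid JSON, as in the
 * captured response. Returns false if the text does not have that shape.
 */
function parse_glang_jsonp($jsonp)
{
    // callbackName( 'context' , rest-of-arguments ) with an optional trailing ';'
    $pattern = '/^\s*([\w.$]+)\(\s*\'([^\']*)\'\s*,\s*(.*)\)\s*;?\s*$/s';
    if (!preg_match($pattern, $jsonp, $m)) {
        return false;
    }

    // The remaining arguments form a valid JSON array once wrapped in [ ].
    $rest = json_decode('[' . $m[3] . ']', true);
    if (!is_array($rest)) {
        return false;
    }

    return array(
        'callback'  => $m[1],
        'arguments' => array_merge(array($m[2]), $rest),
    );
}
```

Wrapping the tail in brackets lets json_decode() do the real parsing; only the single-quoted context value, which is not valid JSON, is handled by the regular expression.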

Step 2 : Analyze

Request

Let’s analyze all parameters in the request URL:

  • callback : The name of the function that is called when the response is ready. It can be left off.
  • context : No idea. It can be left off.
  • q : The text you want to detect the language of. Required (DUH!).
  • key : An API key. It seems it can be left off.
  • v : The API version. Required.
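Given that analysis, a request URL needs only v and q. The sketch below is a hypothetical helper (glang_detect_url is an illustrative name) that hard-codes version 1.0 from the captured URL and optionally adds callback for the JSONP form. It assumes PHP 5.4 or later for the PHP_QUERY_RFC3986 flag.

```php
<?php

/**
 * Build a GlangDetect request URL containing only the parameters that
 * appear to be required: v (API version) and q (the text).
 * Pass a $callback name only if you want the JSONP form back.
 */
function glang_detect_url($text, $callback = null)
{
    $params = array('v' => '1.0', 'q' => $text);
    if ($callback !== null) {
        $params['callback'] = $callback;
    }
    // PHP_QUERY_RFC3986 encodes spaces as %20, matching the captured URL.
    return 'http://www.google.com/uds/GlangDetect?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}
```

For example, glang_detect_url('this is a test') yields the captured URL minus the callback, context and key parameters.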

Response

Like I said, the response is a call to the Javascript function “google.language.callbacks.id102”. It takes 5 parameters:

  • 22 : Same value as the context parameter in the request URL.
  • A Javascript object with 3 properties:
    • language : The ISO language code (e.g. “en”) of the language detected by Google.
    • isReliable : Is Google’s guess reliable?
    • confidence : How confident is Google about the guess? I think it takes values from 0 to 1, 0 being least confident and 1 being most confident.
  • 200 : No idea.
  • null : No idea.
  • 200 : Again, no idea.
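The three properties of the detection object can be combined into a simple accept/reject decision. This is a sketch only: trusted_language is a hypothetical helper, the $minConfidence threshold is an illustrative knob the API does not define, and the 0 to 1 range of confidence is itself just a guess from observed values.

```php
<?php

/**
 * Decide whether to trust a detection result (an array with language,
 * isReliable and confidence). $minConfidence is a hypothetical threshold;
 * the default of 0.0 relies on isReliable alone.
 * Returns the language code, or null if the guess should not be trusted.
 */
function trusted_language(array $data, $minConfidence = 0.0)
{
    // All three properties must be present (isset() is true for false values).
    if (!isset($data['language'], $data['isReliable'], $data['confidence'])) {
        return null;
    }
    if ($data['isReliable'] !== true) {
        return null;
    }
    if ((float) $data['confidence'] < $minConfidence) {
        return null;
    }
    return $data['language'];
}
```

Note that the captured example is marked reliable with a confidence of only about 0.33, so any threshold you pick would need tuning against real results.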

Step 3 : Bend and Break

Now let’s see what we need to do to make this work in PHP.

Dump Unknown Request Parameters

Don’t use magic you don’t understand, so I dump the request parameters “context” and “key”. The result doesn’t seem to change when leaving these off.

No JSONP

Since I’m going to call the language detection from PHP, I won’t need the callback parameter, so I dump that one too. The result is nice:

{"responseData": {"language":"en","isReliable":true,"confidence":0.33399844}, "responseDetails": null, "responseStatus": 200}

The 5 parameters passed to the callback function are replaced by a single Javascript object. At the same time, another useful response field becomes available: responseStatus. I guess it indicates whether the request was successful, just like the HTTP status code 200 OK.
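Checking responseStatus can be separated from the HTTP request itself, which makes it easy to try against a saved response body. This sketch uses PHP’s built-in json_decode() (PHP 5.2.0 or later) instead of Zend_Json; glang_response_data is a hypothetical name, and treating any status other than 200 as failure is an assumption based on the guess above.

```php
<?php

/**
 * Decode a plain (non-JSONP) GlangDetect response body and return its
 * responseData array, or false if the body is not valid JSON or
 * responseStatus is not 200.
 */
function glang_response_data($body)
{
    // true => decode JSON objects into associative arrays.
    $response = json_decode($body, true);
    if (!is_array($response) || !isset($response['responseStatus'])) {
        return false;
    }
    // responseStatus appears to mirror HTTP status codes; 200 means success.
    if ((int) $response['responseStatus'] !== 200) {
        return false;
    }
    return isset($response['responseData']) ? $response['responseData'] : false;
}
```

Feeding it the response shown above returns an array whose 'language' entry is 'en'.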

Step 4 : Mix it with PHP

The final step is to mix what I have learned into a simple PHP class:

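<?php

// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Json.php';

class Google_Language_Detection {
    // Only the two required parameters: API version and the text.
    const URL = 'http://www.google.com/uds/GlangDetect?v=1.0&q=';

    // Returns the responseData array (language, isReliable, confidence)
    // on success, or false on failure.
    public static function detect($text) {
        $url = self::URL . urlencode($text);

        // file_get_contents() returns false if the request fails.
        $body = file_get_contents($url);
        if ($body === false) {
            return false;
        }

        // Zend_Json::decode() returns an associative array by default.
        $response = Zend_Json::decode($body);

        if (isset($response['responseStatus']) && $response['responseStatus'] == 200) {
            return $response['responseData'];
        }
        return false;
    }
}

?>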

Notice that I use Zend Framework to decode the JSON from the response into a native PHP associative array.

9 thoughts on “Server Side Language Detection with Google Language API”

  1. Thank you so much for posting this! I have been looking all over for a server-side way of detecting the language of a piece of text. I am going to try porting your idea to Ruby or Java.

  2. Wow! Awesome stuff. I was wondering if you have any spare time to help guide me to get this working on my site?

    Erik
    Chess.com
    chessdev[4t}gmail,com

  3. Thanks a lot, Lode. This still works like a charm.

    I changed the code, so there is no need for Zend Framework:
    function detect_language($text)
    {
        /* NOTE: Google is currently using version 1.0.
           If this changes in the future, change the value of this
           variable. Otherwise the URL will not work anymore. */
        $version = '1.0';
        $url = 'http://www.google.com/uds/GlangDetect?v=' . $version .
            '&q=' . urlencode($text);

        /* Get the (multidimensional) associative array from the JSON.
           NOTE: json_decode() requires PHP >= 5.2.0 */
        $response = json_decode(file_get_contents($url), true);

        if ($response['responseStatus'] == 200)
            return $response['responseData']['language'];
        else
            return false;
    }

  4. Hi Samuel,
    Your function works well, except for a few languages that it doesn’t support (ar, ru and a few more).
    Do you have any idea why that is?
    Thanks to both of you for this feature, it will help me a lot.
