Monday, February 13, 2012

Hive: Custom User Defined Function (UDF)

Apache Hive has many built-in functions for manipulating data. The custom Hive function given below converts a Unix timestamp value into a readable date value. A custom function like this adds functionality to the Hadoop MapReduce job that processes a query.
To write one, it is enough to extend the UDF class and provide an evaluate method.

package com.hive.example.util;

import java.util.Date;
import java.text.DateFormat;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

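public class UnixtimeToDate extends UDF {
    // Hive calls evaluate() once per row; returning null yields SQL NULL.
    public Text evaluate(Text text) {
        if (text == null) return null;
        try {
            long timestamp = Long.parseLong(text.toString().trim());
            return new Text(toDate(timestamp));
        } catch (NumberFormatException e) {
            // Non-numeric input: return NULL instead of failing the query.
            return null;
        }
    }

    // Converts seconds since the Unix epoch to a date string in the
    // JVM's default locale and time zone.
    private String toDate(long timestamp) {
        Date date = new Date(timestamp * 1000L); // Date expects milliseconds
        return DateFormat.getInstance().format(date);
    }
}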
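
The conversion logic can be tried out without a Hive installation. The following is only a minimal sketch under some assumptions: the class name UnixtimeParser is hypothetical, it takes a plain String instead of Hadoop's Text so that it runs with the JDK alone, and it uses a fixed pattern in UTC (not the default-locale format used by the UDF above) so that the output is reproducible.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the original UDF: String in, String out,
// so it needs neither Hive nor Hadoop on the classpath.
final class UnixtimeParser {
    // Assumption: fixed pattern and UTC, chosen to make the output reproducible.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Returns null for null, non-numeric, or out-of-range input,
    // mirroring how a Hive UDF would return NULL.
    static String toDate(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            long seconds = Long.parseLong(raw.trim());
            return FORMAT.format(Instant.ofEpochSecond(seconds));
        } catch (NumberFormatException | DateTimeException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toDate("879959583"));    // 1997-11-19 17:13:03
        System.out.println(toDate("not-a-number")); // null
    }
}
```

Returning null rather than throwing keeps a single bad row from failing the whole query, which is usually the preferable behaviour for a row-level function.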

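Compile the class (the Hive and Hadoop jars must be on the compile classpath), then pack the resulting class file into a jar. Note that jar expects the path of the class file, not the class name:
$ jar -cvf convert.jar com/hive/example/util/UnixtimeToDate.class

Verify the jar contents with the command: $ jar -tvf convert.jar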

Add this jar at the Hive prompt, then register the function:
hive> add jar convert.jar;
hive> create temporary function userdate as 'com.hive.example.util.UnixtimeToDate';

Example:
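The query without the function:
hive> select id, unixtime from table;
12     879959583

The same query using the function 'userdate':
hive> select id, userdate(unixtime) from table;
12     19/11/97 10:43 PM

The exact format and time shown depend on the default locale and time zone of the JVM that runs the query, because DateFormat.getInstance() uses those defaults.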
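
To see how much the result depends on the time zone, the same timestamp can be formatted for two zones. This is a rough sketch, not the UDF itself: the class name ZoneComparison and the pattern "dd/MM/yy h:mm a" are assumptions chosen to mimic the layout of the sample output. The output above is consistent with a JVM running at UTC+05:30, although that is an inference from the arithmetic rather than something stated here.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical demo class: formats one epoch-seconds value in a given zone.
final class ZoneComparison {
    // Assumption: a fixed pattern that imitates the sample output's layout.
    private static final DateTimeFormatter SHORT =
            DateTimeFormatter.ofPattern("dd/MM/yy h:mm a", Locale.US);

    static String format(long epochSeconds, String zoneId) {
        return SHORT.withZone(ZoneId.of(zoneId))
                    .format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        System.out.println(format(879959583L, "UTC"));          // 19/11/97 5:13 PM
        System.out.println(format(879959583L, "Asia/Kolkata")); // 19/11/97 10:43 PM
    }
}
```

If every node in the cluster must produce the same string, fixing the pattern and zone explicitly as in this sketch is safer than relying on JVM defaults.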

