tanghong123 edited this page Sep 14, 2010 · 3 revisions

Frequently Asked Questions

  1. What versions of Hadoop does hadoop-gpl-compression support?
    hadoop-gpl-compression supports Hadoop 0.20, 0.21, and 0.22 (the current trunk). Because the Hadoop API diverged between these releases, the code is split into two lines: branch-0.1 supports Hadoop 0.20, and the current trunk supports Hadoop 0.21 and 0.22.
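
The mapping above can be written down as a small shell helper. This is only a sketch: the function name branch_for_hadoop is made up for illustration, and the paths it prints are the ones named in this FAQ (trunk and branches/branch-0.1).

```shell
# Hypothetical helper: map a Hadoop version string to the
# hadoop-gpl-compression code line that supports it.
branch_for_hadoop() {
  case "$1" in
    0.20|0.20.*)             echo "branches/branch-0.1" ;;
    0.21|0.21.*|0.22|0.22.*) echo "trunk" ;;
    *) echo "unsupported Hadoop version: $1" >&2; return 1 ;;
  esac
}

branch_for_hadoop 0.20.2   # prints branches/branch-0.1
```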
  2. How do I build 32 bit and 64 bit binaries?
    1. First ensure you have checked out the correct code base (trunk for Hadoop 0.21/0.22, and branches/branch-0.1 for Hadoop 0.20).
    2. Next, compile twice, once for each architecture, with the appropriate variables set:
      export JAVA_HOME=/path/to/32bit/jdk
      export CFLAGS=-m32
      export CXXFLAGS=-m32
      ant compile-native
       
      export JAVA_HOME=/path/to/64bit/jdk
      export CFLAGS=-m64
      export CXXFLAGS=-m64
      ant compile-native tar
      
      Note that you must have both the 32-bit and the 64-bit liblzo2 installed. This is what it looks like on my RedHat build machine:
      % ls -l /usr/lib*/liblzo2*
      -rw-r--r--  1 root root 171056 Mar 20  2006 /usr/lib/liblzo2.a
      lrwxrwxrwx  1 root root     16 Feb 17  2007 /usr/lib/liblzo2.so -> liblzo2.so.2.0.0*
      lrwxrwxrwx  1 root root     16 Feb 17  2007 /usr/lib/liblzo2.so.2 -> liblzo2.so.2.0.0*
      -rwxr-xr-x  1 root root 129067 Mar 20  2006 /usr/lib/liblzo2.so.2.0.0*
      -rw-r--r--  1 root root 208494 Mar 20  2006 /usr/lib64/liblzo2.a
      lrwxrwxrwx  1 root root     16 Feb 17  2007 /usr/lib64/liblzo2.so -> liblzo2.so.2.0.0*
      lrwxrwxrwx  1 root root     16 Feb 17  2007 /usr/lib64/liblzo2.so.2 -> liblzo2.so.2.0.0*
      -rwxr-xr-x  1 root root 126572 Mar 20  2006 /usr/lib64/liblzo2.so.2.0.0*
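
The two build passes can also be wrapped in one function so that the variables for one architecture never leak into the other. This is a sketch, not part of the project: build_native and the DRY_RUN switch are hypothetical names introduced here, and the JDK paths are the same placeholders used in the commands above.

```shell
# Sketch: run the native build once per architecture.
# DRY_RUN is a hypothetical switch for this example; when it is 1 the
# command line is printed instead of executed.
build_native() {
  jdk_home=$1; bits=$2; shift 2
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "JAVA_HOME=$jdk_home CFLAGS=-m$bits CXXFLAGS=-m$bits ant $*"
  else
    # The subshell keeps the exported variables local to this one build.
    ( export JAVA_HOME="$jdk_home" CFLAGS="-m$bits" CXXFLAGS="-m$bits"
      ant "$@" )
  fi
}

# 32-bit first, then 64-bit plus the tarball, in the same order as above.
DRY_RUN=1 build_native /path/to/32bit/jdk 32 compile-native
DRY_RUN=1 build_native /path/to/64bit/jdk 64 compile-native tar
```

Drop DRY_RUN=1 to run the builds for real.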
      
  3. How do I configure Hadoop to use these classes?
    Generally, using these classes is no different from using classes from any third-party jar: (1) make sure the jar file is on the class path; (2) make sure the directories containing the dynamic libraries it depends on are listed in the system property java.library.path; (3) use the classes provided by the jar file. There are various ways to do this. The approach I took is to place the jar file and the native libraries where the hadoop script looks for them, so that the script takes care of (1) and (2):
    #Build the jar file and native library:
    cd /path/to/hadoop-gpl-compression
    ant compile-native tar
    #Copy the jar file
    cp build/hadoop-gpl-compression-0.1.0-dev/hadoop-gpl-compression-0.1.0-dev.jar /path/to/hadoop/dist/lib/
    #Copy the native library
    tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native . | tar -xBvf - -C /path/to/hadoop/dist/lib/native
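
After copying, a quick check that both pieces landed where expected can save a debugging session later. The sketch below makes two assumptions that are not stated in this FAQ: that the native library file name starts with libgplcompression, and that it sits somewhere under lib/native (typically in a per-platform subdirectory). The function name is made up for illustration.

```shell
# Sketch: verify the jar and the native library are in place under a
# Hadoop install directory (for example /path/to/hadoop/dist).
check_gpl_compression_install() {
  hadoop_home=$1
  status=0
  # (1) the jar must sit in lib/ so the hadoop script puts it on the class path
  if ! ls "$hadoop_home"/lib/hadoop-gpl-compression-*.jar >/dev/null 2>&1; then
    echo "missing: $hadoop_home/lib/hadoop-gpl-compression-*.jar"
    status=1
  fi
  # (2) the native library must be under lib/native/
  #     (the file name pattern libgplcompression* is an assumption)
  if [ -z "$(find "$hadoop_home/lib/native" -name 'libgplcompression*' 2>/dev/null | head -n 1)" ]; then
    echo "missing: libgplcompression* under $hadoop_home/lib/native"
    status=1
  fi
  return $status
}
```

Call it as check_gpl_compression_install /path/to/hadoop/dist; it prints nothing and returns 0 when both pieces are found.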
    
    To register the external codecs with the codec factory, add the following properties to hadoop-site.xml (or core-site.xml):
      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
      </property>
      <property>
        <name>io.compression.codec.lzo.class</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
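
Because io.compression.codecs is one long comma-separated string, it is easy to leave a stray space or an empty entry in it. The sketch below lists the entries one per line and flags anything suspicious, on the assumption that whitespace inside an entry may keep the class name from resolving. The function name check_codec_list is made up for illustration.

```shell
# Sketch: split a comma-separated codec list and flag entries that are
# empty or contain whitespace.
check_codec_list() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r cls; do
    case "$cls" in
      ""|*[[:space:]]*) echo "suspicious entry: '$cls'" ;;
      *)                echo "ok: $cls" ;;
    esac
  done
}

check_codec_list "com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec"
```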
    
    If you would like to use LZO to compress intermediate map output, set the following in hadoop-site.xml:
      <property>
        <name>mapred.compress.map.output</name>
        <value>true</value>
      </property>
      <property>
        <name>mapred.map.output.compression.codec</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
    
    Or if you are using Hadoop 0.21 or later, set the following in mapred-site.xml:
      <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
      </property>
      <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
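
The same two settings can be passed per job on the command line instead of being set cluster-wide. This only works if the job's driver parses generic options (for example by running through ToolRunner), which is an assumption about your job and not something this FAQ covers. The helper below is a sketch with a made-up name; it prints the -D flags using the old property names for Hadoop 0.20 and the new ones otherwise.

```shell
# Sketch: print -D flags that enable LZO map-output compression,
# choosing the property names by Hadoop version.
map_output_lzo_flags() {
  codec=com.hadoop.compression.lzo.LzoCodec
  case "$1" in
    0.20|0.20.*)
      echo "-Dmapred.compress.map.output=true -Dmapred.map.output.compression.codec=$codec" ;;
    *)
      # Hadoop 0.21 or later
      echo "-Dmapreduce.map.output.compress=true -Dmapreduce.map.output.compress.codec=$codec" ;;
  esac
}
```

A hypothetical invocation would look like hadoop jar my-job.jar MyJob $(map_output_lzo_flags 0.20.2) input output, where my-job.jar and MyJob stand in for your own job.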
    
  4. How do I build hadoop-gpl-compression on Mac OS X 10.5 (Leopard)? (Note: some instructions below are based on http://wiki.apache.org/hadoop/UsingLzoCompression.)
    • Download the latest Java update from apple.com and set the default Java version to 1.6.
    • Download the LZO2 library and headers. Because Java 1.6 on Leopard is 64-bit only, we need a 64-bit lzo library. There are two ways to get one: (1) a manual build; (2) MacPorts.
      • If you choose to build manually:
        1. Download lzo2 source from http://www.oberhumer.com/opensource/lzo/download/.
        2. Unpack source tarball, configure/build/install lzo2 with the following commands:
            tar -xzf lzo-2.03.tar.gz
            cd lzo-2.03
            env CFLAGS="-arch x86_64" ./configure --build=x86_64-darwin --enable-shared --disable-asm --prefix=/path/to/lzo64/
            make && make install
          
      • If you want to use MacPorts, do the following as root:
          port fetch lzo2 # if lzo2 is already installed, do "port uninstall lzo2"
          port edit lzo2 # the Portfile for lzo2 will be opened in your $EDITOR.
        Add the following block of text to the Portfile and save it:
          variant x86_64 description "Build the 64-bit." {
              configure.args-delete     --build=x86-apple-darwin ABI=standard
              configure.cflags-delete   -m32
              configure.cxxflags-delete -m32

              configure.args-append     --build=x86_64-apple-darwin ABI=64
              configure.cflags-append   -m64 -arch x86_64
              configure.cxxflags-append -m64 -arch x86_64
          }
        Then install the port with the new variant:
          port install lzo2 +x86_64
        
        Now the 64-bit lzo2 library will be installed under /opt/local/lib.
    • Finally, build the hadoop-gpl-compression library with the following:
        env JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/ \
        C_INCLUDE_PATH=/path/to/lzo64/include LIBRARY_PATH=/path/to/lzo64/lib \
        CFLAGS="-arch x86_64" ant clean compile-native test tar
      
      In the above, replace /path/to/lzo64 with /opt/local if you installed lzo2 through MacPorts. With a bit of luck, you should see BUILD SUCCESSFUL at the end. Congratulations, you can now use LZO compression in your Java programs on Mac OS X 10.5 (Leopard)!
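
If the build fails because lzo cannot be found, it may help to confirm that the prefix passed in C_INCLUDE_PATH and LIBRARY_PATH really contains the headers and the library. The sketch below checks for the lzo/lzo1x.h header and any liblzo2.* file; the function name check_lzo_prefix is made up for illustration.

```shell
# Sketch: confirm that an lzo2 install prefix (for example /path/to/lzo64
# or /opt/local) contains both the headers and the library.
check_lzo_prefix() {
  prefix=$1
  if [ ! -f "$prefix/include/lzo/lzo1x.h" ]; then
    echo "no lzo headers under $prefix/include"
    return 1
  fi
  if ! ls "$prefix"/lib/liblzo2.* >/dev/null 2>&1; then
    echo "no liblzo2 under $prefix/lib"
    return 1
  fi
  echo "lzo2 found under $prefix"
}
```

Run check_lzo_prefix /opt/local (or your own prefix) before starting the ant build.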