tanghong123 edited this page Sep 14, 2010
-
What versions of Hadoop does hadoop-gpl-compression support?
hadoop-gpl-compression supports Hadoop 0.20, 0.21, and 0.22 (the current trunk). Because the Hadoop API diverged between these versions, we created branch-0.1, which supports Hadoop 0.20; the current trunk supports Hadoop 0.21 and 0.22.
-
How do I build 32-bit and 64-bit binaries?
- First ensure you have checked out the correct code base (trunk for Hadoop 0.21/0.22, and branches/branch-0.1 for Hadoop 0.20).
- Next, compile twice, once for 32-bit and once for 64-bit, with the appropriate variables set for each pass:
    export JAVA_HOME=/path/to/32bit/jdk
    export CFLAGS=-m32
    export CXXFLAGS=-m32
    ant compile-native
    export JAVA_HOME=/path/to/64bit/jdk
    export CFLAGS=-m64
    export CXXFLAGS=-m64
    ant compile-native tar
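If you rebuild often, the checkout choice and the two build passes above can be wrapped in a few shell functions. This is a minimal sketch, not part of hadoop-gpl-compression: the function names (gpl_code_line, arch_flag, build_native) and the DRY_RUN switch are made up for illustration, and the JDK paths you pass in are placeholders for your own installs.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and DRY_RUN are illustrative, not part of the project.

# Print the code line to check out for a Hadoop version, per this FAQ.
gpl_code_line() {
  case "$1" in
    0.20|0.20.*) printf '%s\n' "branches/branch-0.1" ;;
    0.21|0.21.*|0.22|0.22.*) printf '%s\n' "trunk" ;;
    *) printf 'unsupported Hadoop version: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Map a word size to the gcc flag used for CFLAGS and CXXFLAGS.
arch_flag() {
  case "$1" in
    32) printf '%s\n' "-m32" ;;
    64) printf '%s\n' "-m64" ;;
    *) printf 'unsupported word size: %s\n' "$1" >&2; return 1 ;;
  esac
}

# build_native BITS JDK [ANT_TARGETS...]  (targets default to compile-native)
build_native() {
  if [ "$#" -lt 2 ]; then
    echo "usage: build_native BITS JDK [ANT_TARGETS...]" >&2
    return 2
  fi
  local bits=$1 jdk=$2 flag
  shift 2
  flag=$(arch_flag "$bits") || return 1
  [ "$#" -gt 0 ] || set -- compile-native
  # env scopes the variables to this single ant run, so the 32-bit
  # settings cannot leak into the 64-bit pass. DRY_RUN=1 only prints.
  ${DRY_RUN:+echo} env JAVA_HOME="$jdk" CFLAGS="$flag" CXXFLAGS="$flag" ant "$@"
}
```

For example, from the top of the checked-out code line, run build_native 32 /path/to/32bit/jdk and then build_native 64 /path/to/64bit/jdk compile-native tar; prefix either call with DRY_RUN=1 to see the command without running ant.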
Note that you must have both the 32-bit and the 64-bit liblzo2 installed. This is what it looks like on my RedHat build machine:
    % ls -l /usr/lib*/liblzo2*
    -rw-r--r-- 1 root root 171056 Mar 20 2006 /usr/lib/liblzo2.a
    lrwxrwxrwx 1 root root 16 Feb 17 2007 /usr/lib/liblzo2.so -> liblzo2.so.2.0.0*
    lrwxrwxrwx 1 root root 16 Feb 17 2007 /usr/lib/liblzo2.so.2 -> liblzo2.so.2.0.0*
    -rwxr-xr-x 1 root root 129067 Mar 20 2006 /usr/lib/liblzo2.so.2.0.0*
    -rw-r--r-- 1 root root 208494 Mar 20 2006 /usr/lib64/liblzo2.a
    lrwxrwxrwx 1 root root 16 Feb 17 2007 /usr/lib64/liblzo2.so -> liblzo2.so.2.0.0*
    lrwxrwxrwx 1 root root 16 Feb 17 2007 /usr/lib64/liblzo2.so.2 -> liblzo2.so.2.0.0*
    -rwxr-xr-x 1 root root 126572 Mar 20 2006 /usr/lib64/liblzo2.so.2.0.0*
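Before starting the two passes, a quick check for both libraries can save a failed 64-bit build. This sketch assumes the RedHat-style layout in the listing (32-bit in /usr/lib, 64-bit in /usr/lib64); distributions with other layouts, such as Debian multiarch directories, will need different paths. The function name check_lzo_libs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if a liblzo2 shared library exists in both
# usr/lib (32-bit) and usr/lib64 (64-bit) under ROOT (default: /).
check_lzo_libs() {
  local root=${1:-} status=0 dir f found
  for dir in usr/lib usr/lib64; do
    found=
    for f in "$root/$dir"/liblzo2.so*; do
      # An unmatched glob stays literal, so confirm the file really exists.
      [ -e "$f" ] && found=$f && break
    done
    if [ -n "$found" ]; then
      echo "found: $found"
    else
      echo "missing: $root/$dir/liblzo2.so*" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run check_lzo_libs with no argument to check the live system; a nonzero exit status means one of the two word sizes is missing.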
-
How do I configure Hadoop to use these classes?
Generally, using these classes is no different from using classes from any third-party jar: (1) make sure the jar file is on the class path; (2) make sure the directories containing the native libraries it depends on are listed in the Java system property java.library.path; (3) use the classes provided by the jar file. There are various ways to do this. The approach I took was to place the jar file and native libraries in the right places and let the hadoop script take care of (1) and (2):
    # Build the jar file and native library:
    cd /path/to/hadoop-gpl-compression
    ant compile-native tar
    # Copy the jar file
    cp build/hadoop-gpl-compression-0.1.0-dev/hadoop-gpl-compression-0.1.0-dev.jar /path/to/hadoop/dist/lib/
    # Copy the native library
    tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native . | tar -xBvf - -C /path/to/hadoop/dist/lib/native
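The two copy steps can be wrapped in a function that fails early when the build output is missing. This is a sketch: install_into_hadoop is a made-up name, the version string is the 0.1.0-dev one from the commands above (adjust it for other builds), and it drops the -B and -v tar flags for quieter, more portable output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the jar and native libraries into a Hadoop install.
# BUILD is the project's build/ directory; DIST is the Hadoop directory that
# contains lib/ (/path/to/hadoop/dist in the commands above).
install_into_hadoop() {
  local build=$1 dist=$2
  local name=hadoop-gpl-compression-0.1.0-dev
  local jar="$build/$name/$name.jar" native="$build/$name/lib/native"
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
  [ -d "$native" ] || { echo "native dir not found: $native" >&2; return 1; }
  mkdir -p "$dist/lib/native" || return 1
  cp "$jar" "$dist/lib/" || return 1
  # Stream the whole native tree so per-platform subdirectories are preserved.
  tar -cf - -C "$native" . | tar -xf - -C "$dist/lib/native"
}
```

For example, after ant compile-native tar, run install_into_hadoop build /path/to/hadoop/dist from the hadoop-gpl-compression directory.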
Next, add entries to the Hadoop configuration to register the external codecs with the codec factory. Add the following properties to hadoop-site.xml (or core-site.xml):
    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
    </property>
    <property>
      <name>io.compression.codec.lzo.class</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
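If you maintain several clusters, you may prefer to generate these two properties rather than edit them by hand. The function below is a hypothetical helper, not part of Hadoop or hadoop-gpl-compression: with no arguments it prints the properties with the codec list shown above, and with arguments it uses them as the codec list, joined with commas and no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the codec-registration properties.
lzo_codec_properties() {
  if [ "$#" -eq 0 ]; then
    set -- org.apache.hadoop.io.compress.GzipCodec \
      org.apache.hadoop.io.compress.DefaultCodec \
      com.hadoop.compression.lzo.LzoCodec \
      com.hadoop.compression.lzo.LzopCodec \
      org.apache.hadoop.io.compress.BZip2Codec
  fi
  local IFS=,
  local codecs="$*"   # "$*" joins the arguments with the first IFS character
  printf '<property>\n  <name>io.compression.codecs</name>\n  <value>%s</value>\n</property>\n' "$codecs"
  printf '<property>\n  <name>io.compression.codec.lzo.class</name>\n  <value>%s</value>\n</property>\n' \
    com.hadoop.compression.lzo.LzoCodec
}
```

Paste the output inside the configuration element of the site file.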
If you would like to use LZO to compress intermediate map output, set the following in hadoop-site.xml:
    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.map.output.compression.codec</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
Or if you are using Hadoop 0.21 or later, set the following in mapred-site.xml:
    <property>
      <name>mapreduce.map.output.compress</name>
      <value>true</value>
    </property>
    <property>
      <name>mapreduce.map.output.compress.codec</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
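The settings above apply to every job. As a sketch, the same keys can instead be passed per job as -D generic options; this only works if the job's driver runs through ToolRunner (or otherwise uses GenericOptionsParser). The helper name map_output_lzo_opts is hypothetical; it picks the key names by Hadoop version, following the 0.20 versus 0.21+ split described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print -D options enabling LZO map-output compression.
map_output_lzo_opts() {
  local codec=com.hadoop.compression.lzo.LzoCodec
  case "$1" in
    0.20|0.20.*)
      printf '%s\n' "-D mapred.compress.map.output=true -D mapred.map.output.compression.codec=$codec" ;;
    0.21|0.21.*|0.22|0.22.*)
      printf '%s\n' "-D mapreduce.map.output.compress=true -D mapreduce.map.output.compress.codec=$codec" ;;
    *)
      printf 'unsupported Hadoop version: %s\n' "$1" >&2
      return 1 ;;
  esac
}
```

For example: hadoop jar myjob.jar MyJob $(map_output_lzo_opts 0.20) input output, where myjob.jar, MyJob, input and output stand in for your own job. The substitution is left unquoted on purpose so that each option becomes a separate argument.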
-
How do I build hadoop-gpl-compression on Mac OS X 10.5 (Leopard)?
(Note: some instructions below are based on http://wiki.apache.org/hadoop/UsingLzoCompression.)
- Download the latest Java update from apple.com and set the default Java version to 1.6.
- Download the LZO2 library and headers. Because Java 1.6 on Mac OS X 10.5 is only available for 64-bit applications, we need a 64-bit LZO library. There are two ways to get one: (1) a manual build, or (2) MacPorts.
- If you choose to build it manually:
- Download lzo2 source from http://www.oberhumer.com/opensource/lzo/download/.
- Unpack the source tarball, then configure, build, and install lzo2 with the following commands:
    tar -xzf lzo-2.03.tar.gz
    cd lzo-2.03
    env CFLAGS="-arch x86_64" ./configure --build=x86_64-darwin --enable-shared --disable-asm --prefix=/path/to/lzo64/
    make && make install
- If you want to use MacPorts, do the following as root:
    port fetch lzo2   # if lzo2 is already installed, do "port uninstall lzo2"
    port edit lzo2    # the Portfile for lzo2 will be opened in your $EDITOR.
    ##Add the following block of text in the file and save the file.##
    variant x86_64 description "Build the 64-bit." {
        configure.args-delete --build=x86-apple-darwin ABI=standard
        configure.cflags-delete -m32
        configure.cxxflags-delete -m32
        configure.args-append --build=x86_64-apple-darwin ABI=64
        configure.cflags-append -m64 -arch x86_64
        configure.cxxflags-append -m64 -arch x86_64
    }
    ##END##
    port install lzo2 +x86_64
Now the 64-bit lzo2 library will be installed under /opt/local/lib.
- Finally, build the hadoop-gpl-compression library with the following command:
    env JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/ \
        C_INCLUDE_PATH=/path/to/lzo64/include LIBRARY_PATH=/path/to/lzo64/lib \
        CFLAGS="-arch x86_64" ant clean compile-native test tar
In the above, substitute /path/to/lzo64 with /opt/local if you installed lzo2 through MacPorts. With a bit of luck, you should see BUILD SUCCESSFUL at the end. Congratulations, you can now use LZO compression in your Java programs on Mac OS X 10.5 (Leopard)!