Harp-DAAL provides a class Initialize to facilitate the initialization stage of a Harp-DAAL application within the Hadoop environment. To use Initialize, import its package at the beginning of the application's launcher class:
import edu.iu.data_aux.*;
To create an Initialize instance in the launcher class:
// obtain the hadoop configuration
Configuration conf = this.getConf();
Initialize init = new Initialize(conf, args);
To load all the native libraries required by Harp-DAAL into the distributed cache of Harp, call loadDistributedLibs(). Make sure that the native libraries exist at the HDFS path /Hadoop/Libraries before adding this line:
init.loadDistributedLibs();
Harp-DAAL divides the command line arguments into two parts: 1) Harp-DAAL system-wide arguments and 2) application-wide arguments. The system-wide arguments must be supplied as the first six entries of the command line args array:
- num_mapper: the number of mappers (processes) in Harp-DAAL
- num_thread: the number of threads per mapper
- mem: the memory in MB allocated to each mapper
- iterations: the number of iterations for iterative applications (1 for non-iterative applications)
- inputDir: the HDFS directory path for input data
- workDir: the HDFS directory path for intermediate data and results
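As an illustration, a full command line would place the six system-wide arguments first, followed by the application-wide arguments. In this sketch the jar name, package path, HDFS paths, and argument values are placeholders, not taken from the Harp-DAAL distribution:

```shell
# placeholders throughout: jar, package, paths, and values are illustrative only
hadoop jar harp-daal-app.jar edu.iu.daal_ar.ARDaalLauncher \
  4 16 8192 1 /Harp/input /Harp/work \
  1000 0.1 0.6
```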
To load all the system-wide arguments:
init.loadSysArgs();
To load the application-wide arguments, users must define them either in the HarpDAALConstants class or in a user-defined Constants class accessible to the mapper class. Here, the function getSysArgNum() returns the number of system-wide arguments in the args[] array, i.e., the offset at which the application-wide arguments begin:
conf.setInt(HarpDAALConstants.FILE_DIM, Integer.parseInt(args[init.getSysArgNum()]));
conf.setDouble(Constants.MIN_SUPPORT, Double.parseDouble(args[init.getSysArgNum()+1]));
conf.setDouble(Constants.MIN_CONFIDENCE, Double.parseDouble(args[init.getSysArgNum()+2]));
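To see how the offset arithmetic works, here is a minimal, self-contained sketch that simulates the args[] layout with six system-wide entries followed by three application-wide entries. The value 6 stands in for what init.getSysArgNum() would return in this case, and all concrete values are made up for illustration:

```java
public class ArgLayoutDemo {
    public static void main(String[] args) {
        // simulated command line: six system-wide args, then three app-wide args
        String[] cli = {"4", "16", "8192", "10", "/Harp/input", "/Harp/work",
                        "1000", "0.1", "0.6"};
        int sysArgNum = 6; // assumption: what init.getSysArgNum() would return here

        // application-wide args are read starting at index sysArgNum
        int fileDim = Integer.parseInt(cli[sysArgNum]);
        double minSupport = Double.parseDouble(cli[sysArgNum + 1]);
        double minConfidence = Double.parseDouble(cli[sysArgNum + 2]);

        System.out.println(fileDim + " " + minSupport + " " + minConfidence);
    }
}
```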
To create a Harp-DAAL job, specify the job name, the launcher class, and the mapper class:
Job arbatchJob = init.createJob("jobname", launcherName.class, MapperName.class);
To launch the job and wait for its completion:
boolean jobSuccess = arbatchJob.waitForCompletion(true);
Finally, here is the complete run function of an Association Rule application launcher class:
@Override
public int run(String[] args) throws Exception {
    // get the configuration handle
    Configuration conf = this.getConf();
    // create the Initialize object
    Initialize init = new Initialize(conf, args);
    // load native libs into the distributed cache
    init.loadDistributedLibs();
    // load system-wide args
    init.loadSysArgs();
    // load application-wide args
    conf.setInt(HarpDAALConstants.FILE_DIM, Integer.parseInt(args[init.getSysArgNum()]));
    conf.setDouble(Constants.MIN_SUPPORT, Double.parseDouble(args[init.getSysArgNum()+1]));
    conf.setDouble(Constants.MIN_CONFIDENCE, Double.parseDouble(args[init.getSysArgNum()+2]));
    // launch the job
    System.out.println("Starting Job");
    Job arbatchJob = init.createJob("arbatchJob", ARDaalLauncher.class, ARDaalCollectiveMapper.class);
    // wait for the job to finish
    boolean jobSuccess = arbatchJob.waitForCompletion(true);
    System.out.println("End Job " + new SimpleDateFormat("HH:mm:ss.SSS").format(Calendar.getInstance().getTime()));
    if (!jobSuccess) {
        arbatchJob.killJob();
        System.out.println("ArBatchJob Job failed");
        return -1;
    }
    return 0;
}