Jul 9

Handling org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory already exists on HDFS

admin , 10:37 , Hadoop , Comments (0) , Trackbacks (0) , Views (475) , Original post
In Hadoop MapReduce, if the output directory already exists when a job is submitted, the job fails with org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory ... already exists:
[hadoop@itlife365 ~]$ hadoop jar wcountljs.jar com.itlife365.bigdata.hadoop.mr.wordcount.WordCountRunner
17/06/19 20:30:56 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/06/19 20:30:57 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://itlife365:9000/user/hadoop/output already exists
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
        at com.itlife365.bigdata.hadoop.mr.wordcount.WordCountRunner.main(WordCountRunner.java:65)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Method 1: delete the output directory in HDFS by hand before each run, e.g. `hdfs dfs -rm -r /user/hadoop/output`.
Method 2: add a check in the code that deletes the directory if it exists.
The HDFS file-system API lets the driver solve this itself: before submitting the job, test whether the output folder exists and, if so, delete it. The key code:
  // Check whether the output folder exists; delete it if it does
  //Path path = new Path(otherArgs[1]); // argument 1 is the output directory (argument 0 is the input directory)
  Path path = new Path("hdfs://itlife365:9000/user/hadoop/output");
  FileSystem fileSystem = path.getFileSystem(conf); // obtain the FileSystem that owns this path
  if (fileSystem.exists(path)) {
      fileSystem.delete(path, true); // true = recursive: delete even if output is non-empty
  }

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
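The same check-then-delete pattern can be sketched against the local filesystem with java.nio.file, as an analogy for readers without a Hadoop classpath at hand (the class name DeleteIfExists and the helper deleteRecursively are hypothetical; on the cluster you would use org.apache.hadoop.fs.FileSystem as shown above):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class DeleteIfExists {
    // Recursively delete a directory tree, mirroring fileSystem.delete(path, true)
    static void deleteRecursively(Path dir) throws IOException {
        if (Files.exists(dir)) { // same guard as fileSystem.exists(path)
            try (Stream<Path> walk = Files.walk(dir)) {
                // delete children before their parents (deepest paths first)
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("output");
        Files.createFile(out.resolve("part-r-00000")); // simulate stale job output
        deleteRecursively(out);                        // safe even though out is non-empty
        System.out.println(Files.exists(out));         // prints "false"
    }
}
```

As with fileSystem.delete(path, true), the delete succeeds whether or not the directory still holds files from a previous run, so the job can be resubmitted without manual cleanup.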

-- Repackage the jar and rerun: the hadoop FileAlreadyExistsException on the output directory is resolved.