
java - Setting the number of Hadoop mappers via input splits does not work

Reposted. Author: 行者123. Updated: 2023-12-02 21:09:38

I am trying to run a Hadoop job several times with different numbers of mappers and reducers. I have set these configuration properties:

  • mapreduce.input.fileinputformat.split.maxsize
  • mapreduce.input.fileinputformat.split.minsize
  • mapreduce.job.maps


My input totals 1160421275 bytes, and when I try to configure 4 mappers and 3 reducers with this code:
Configuration conf = new Configuration();
FileSystem hdfs = FileSystem.get(conf);
// getContentSummary(Path) returns a ContentSummary object, so the call needs
// its own closing parenthesis before .getLength()
long size = hdfs.getContentSummary(new Path("input/filea")).getLength();
size += hdfs.getContentSummary(new Path("input/fileb")).getLength();
conf.set("mapreduce.input.fileinputformat.split.maxsize", String.valueOf(size / 4));
conf.set("mapreduce.input.fileinputformat.split.minsize", String.valueOf(size / 4));
conf.setInt("mapreduce.job.maps", 4); // conf.set(String, int) does not exist; use setInt
....
job.setNumReduceTasks(3); // method name is setNumReduceTasks, not setNumReduceTask

size / 4 is 290105318. Running the job logs the following:
2016-11-19 12:30:36,426 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 1
2016-11-19 12:30:36,535 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 4
2016-11-19 12:30:36,572 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:7
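Why 7 splits instead of 4? FileInputFormat computes splits per file, not over the cumulative size, so setting max/min split size to a quarter of the *combined* size does not guarantee 4 splits. The per-file logic can be simulated standalone (no Hadoop dependency); the two file sizes below are hypothetical, since the question only gives their 1160421275-byte sum:

```java
// Standalone sketch of FileInputFormat's per-file split computation.
// The file sizes are hypothetical; only their sum matches the question.
public class SplitSimulation {
    static final double SPLIT_SLOP = 1.1; // FileInputFormat's 10% tolerance

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Each file is split independently: peel off full splits while more than
    // 110% of a split remains, then emit one final split for the remainder.
    static int splitsForFile(long length, long splitSize) {
        int splits = 0;
        long remaining = length;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) splits++;
        return splits;
    }

    public static void main(String[] args) {
        long total = 1160421275L;
        long splitSize = computeSplitSize(128L * 1024 * 1024,
                                          total / 4, total / 4); // 290105318

        long filea = 900_000_000L;       // hypothetical
        long fileb = total - filea;      // hypothetical

        int splits = splitsForFile(filea, splitSize)
                   + splitsForFile(fileb, splitSize);
        System.out.println("splits = " + splits); // prints splits = 5 here
    }
}
```

With these made-up sizes the two files already yield 5 splits rather than 4; a different distribution of the same total over more files (the second log line reports 4 input paths) readily produces the observed 7.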

The number of splits is 7, not 4, and the completed job's counters are:
File System Counters
    FILE: Number of bytes read=18855390277
    FILE: Number of bytes written=14653469965
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
Map-Reduce Framework
    Map input records=39184416
    Map output records=36751473
    Map output bytes=787022241
    Map output materialized bytes=860525313
    Input split bytes=1801
    Combine input records=0
    Combine output records=0
    Reduce input groups=25064998
    Reduce shuffle bytes=860525313
    Reduce input records=36751473
    Reduce output records=1953960
    Spilled Records=110254419
    Shuffled Maps =21
    Failed Shuffles=0
    Merged Map outputs=21
    GC time elapsed (ms)=1124
    CPU time spent (ms)=0
    Physical memory (bytes) snapshot=0
    Virtual memory (bytes) snapshot=0
    Total committed heap usage (bytes)=6126829568
Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
File Input Format Counters
    Bytes Read=0
File Output Format Counters
    Bytes Written=77643084

The counters show Shuffled Maps = 21, which is 7 map outputs shuffled to each of the 3 reducers (7 × 3 = 21), while I expected only 4 mappers. The reducers do produce the correct total of 3 output files, so is my mapper split-size setting wrong?

1 Answer

I believe you are using TextInputFormat.

  • If the input contains multiple files, each file produces at least one mapper. If an individual file's size (not the cumulative size) exceeds the split size (which you have adjusted via the min and max settings), that file generates additional mappers.
  • Try CombineTextInputFormat instead; it will get you close to what you want, though possibly still not exactly 4.
  • Examine the logic your InputFormat uses to decide how many mappers to spawn.
  • A similar question about setting the number of Hadoop mappers via input splits can be found on Stack Overflow: https://stackoverflow.com/questions/40689601/
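The CombineTextInputFormat suggestion above could be wired into the question's driver roughly as follows. This is a configuration fragment, not a complete driver: it assumes the same `job` and `size` variables as the question's code. CombineTextInputFormat packs blocks from multiple files into each split, so the mapper count tracks the total input size rather than the file count:

```java
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

// Pack blocks from all input files into combined splits
job.setInputFormatClass(CombineTextInputFormat.class);
// Cap each combined split at a quarter of the total input,
// so roughly 4 mappers are created
CombineTextInputFormat.setMaxInputSplitSize(job, size / 4);
```

As the answer notes, the result may still not be exactly 4 mappers, since splits cannot cross node/rack boundaries in all configurations and the last split absorbs the remainder.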
