
hadoop - Hadoop: input error in Mapper class

Reposted. Author: 行者123, updated: 2023-12-02 22:06:21

The contents of the input text file I am using are:

1 "Come 
1 "Defects,"
1 "I
1 "Information
1 "J"
2 "Plain
5 "Project
1 "Right
1 "Viator"
The number on the left and the word on the right are separated by a tab.
But when I run the mapper function below
public static class SortingMapper extends Mapper<Text, Text, Pair, NullWritable>
{
    private Text word = new Text();
    private IntWritable freq = new IntWritable();

    @Override
    public void map(Text key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] words = line.split("\t");

        freq = new IntWritable(Integer.parseInt(words[0]));
        word.set(words[1]);
        context.write(new Pair(word, freq), NullWritable.get());
    }
}
public static class FirstPartitioner extends Partitioner<Pair, NullWritable>
{
    @Override
    public int getPartition(Pair p, NullWritable n, int numPartitions)
    {
        String word = p.getFirst().toString();

        char first = word.charAt(0);
        char middle = 'n';

        if(middle < first)
        {
            return 0;
        }
        else
            return 1 % numPartitions; //why does % need???
    }
}

public static class KeyComparator extends WritableComparator
{

    protected KeyComparator()
    {
        super(Pair.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2)
    {
        Pair v1 = (Pair) w1;
        Pair v2 = (Pair) w2;

        /*
         * since we already count word in the first MR we only need to sort the list by frequency
         * so no need to compare Text again
        int cmp = Pair.compare(v1.getFirst(), v2.getFirst());
        if(cmp != 0) { return cmp; }
         */

        return -1 * v1.compareTo(v2);
        //possible error: it compares Text first and then compare IntWritable
    }
}

public static class GroupComparator extends WritableComparator
{
    protected GroupComparator()
    {
        super(Pair.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2)
    {
        Pair v1 = (Pair) w1;
        Pair v2 = (Pair) w2;
        return v1.getFirst().compareTo(v2.getFirst());
        //this compareTo is under binarycomparable
    }
}

public static class SortingReducer extends Reducer<Pair, NullWritable, Pair, NullWritable>
{
    @Override
    public void reduce(Pair p, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException
    {
        System.out.println("sortingReducer");
        context.write(p, NullWritable.get());
    }
}

public static void main(String[] args) throws Exception
{

    Configuration conf2 = new Configuration();
    //String[] otherArgs2 = new GenericOptionsParser(conf1, args).getRemainingArgs();

    ControlledJob cJob2 = new ControlledJob(conf2);
    //conf2.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", " ");
    cJob2.setJobName("Sorting");

    Job job2 = cJob2.getJob();

    job2.setJarByClass(Sorting.class);

    job2.setInputFormatClass(KeyValueTextInputFormat.class);

    job2.setMapperClass(SortingMapper.class);
    job2.setPartitionerClass(FirstPartitioner.class);
    job2.setSortComparatorClass(KeyComparator.class);
    job2.setGroupingComparatorClass(GroupComparator.class);
    job2.setReducerClass(SortingReducer.class);

    job2.setOutputKeyClass(Pair.class);
    job2.setOutputValueClass(NullWritable.class);

    job2.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job2, new Path("hdfs:///tmp/inter/part-r-00000.txt"));
    FileOutputFormat.setOutputPath(job2, new Path(args[0]));

    job2.waitForCompletion(true);

}
Then I get the following errors:
Error: java.lang.NumberFormatException: For input string: ""Come"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:481)
at java.lang.Integer.parseInt(Integer.java:527)
at Sorting$SortingMapper.map(Sorting.java:98)
at Sorting$SortingMapper.map(Sorting.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
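I guess something is wrong with String[] words, but I don't know how to fix it. I would appreciate any help resolving the error.
Also,
I noticed that I had used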
 job2.setInputFormatClass(KeyValueTextInputFormat.class); 


in the main function, which already splits each line into a key and a value at the tab separator, so I just changed
String line = value.toString();

String[] words = line.split("\t");

freq = new IntWritable(Integer.parseInt(words[0]));
word.set(words[1]);
into
String num = key.toString();
freq = new IntWritable(Integer.parseInt(num));
word = value;
context.write(new Pair(word, freq), NullWritable.get());

It ran successfully, but the output is strange.
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
....
My expected output is
5 "Project 
2 "Plain
1 "Come
1 "Defects,"
1 "I
1 "Information
1 "J"
1 "Right
1 "Viator"
Did my change make things worse?
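
For reference, the NumberFormatException can be reproduced without Hadoop. By default, KeyValueTextInputFormat splits each line at the first tab, so map receives the count as the key and the word as the value. Splitting the value on a tab again leaves the word itself at index 0, which Integer.parseInt cannot parse. The sketch below imitates that split in plain Java. The class name KeyValueSplitDemo and the helper splitAtFirstTab are hypothetical stand-ins for what the record reader does, not Hadoop APIs.

```java
public class KeyValueSplitDemo {

    // Hypothetical stand-in for the record reader: split at the FIRST tab only.
    // A line without a tab becomes the key, with an empty value.
    public static String[] splitAtFirstTab(String line) {
        int pos = line.indexOf('\t');
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = splitAtFirstTab("1\t\"Come");
        String key = kv[0];   // the count
        String value = kv[1]; // the word

        // Original mapper logic: value contains no tab, so words[0] is the word.
        String[] words = value.split("\t");
        try {
            Integer.parseInt(words[0]);
        } catch (NumberFormatException e) {
            System.out.println("parse failed: " + e.getMessage());
        }

        // Revised mapper logic: the count is already in the key.
        int freq = Integer.parseInt(key);
        System.out.println(freq + "\t" + value);
    }
}
```

Under these assumptions, reading the count from the key, as in the revised mapper, is the right fix for the exception. The strange output is a separate issue.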

1 Answer

You just need to override toString on your Pair object and return whatever you want the final output of each record to be.

Like this:

class Pair {

    ...

    @Override
    public String toString() {
        return freq + " " + word;
    }
}
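
Lines such as Sorting$Pair@5b5b072f are what the inherited Object.toString() produces (class name, "@", hex hash code), and TextOutputFormat writes a non-Text key by calling its toString(). The following self-contained sketch shows the difference. It uses plain String and int fields instead of Hadoop's Text and IntWritable, because the question's Pair class is not shown; PlainPair and PrintablePair are hypothetical names used only for illustration.

```java
public class PairToStringDemo {

    // No toString override: printing falls back to Object.toString().
    public static class PlainPair {
        final String word;
        final int freq;

        public PlainPair(String word, int freq) {
            this.word = word;
            this.freq = freq;
        }
    }

    // With an override: printing yields "<freq><TAB><word>".
    public static class PrintablePair {
        final String word;
        final int freq;

        public PrintablePair(String word, int freq) {
            this.word = word;
            this.freq = freq;
        }

        @Override
        public String toString() {
            return freq + "\t" + word;
        }
    }

    public static void main(String[] args) {
        // Prints something of the form ...PlainPair@<hex hash>
        System.out.println(new PlainPair("\"Project", 5));
        // Prints the frequency, a tab, then the word
        System.out.println(new PrintablePair("\"Project", 5));
    }
}
```

A tab is used as the separator here to mirror the input layout; the answer above uses a space, and either works.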

Regarding "hadoop - Hadoop: input error in Mapper class", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/24373447/
