# Erlang Exercise 19-3

3 messages
Open this post in threaded view
|

## Erlang Exercise 19-3

 Hello, everyone.I'm a little confused with an Exercise, and I cannot understand the question.Here is the question.Write a program to detect plagiarisms in text. To do this, use a two-pass algorithm. In pass 1, break the text into 40-character blocks and compute a checksum for each 40-character block. Store the checksum and filename in an ETS table. In pass 2, compute the checksum of each 40-character block in the data and compare with the checksums in the ETS table.Hint: You will need to compute a  "rolling checksum" to do this. For example, if C1 = B1 + B2 + ... + B40 and C2 = B2 + B3+ ... + B41, then C2 can be quickly computed by obversing that C2 = C1 + B41 - B1.What I've done is finding two files, say file1.txt and file2.txt. I want to check whether file2 plagiarizes file1. So I get first 40-characters of file1 and calculate the checksum of it. And then I move the block one character to the right and calculate the checksum. At last, I will get a list of checksum of file1, and I store them into an ETS table.On pass two, I do the same thing with file2 and get a list of checksum [C1, C2, ..., Cn]. For each element, I check whether it exists in ETS table or not. If exists, it means that there is a block with the same content in file 1. So it is plagiarism. Finally, I count the number of these blocks and output it.However, it seems that I haven't use the filename. So I am wondering that there is something wrong with my opinion.Thanks. _______________________________________________ erlang-questions mailing list [hidden email] http://erlang.org/mailman/listinfo/erlang-questions