tienhv · April 6, 2016 15:35
diff --git a/concat.txt b/concat.txt
 http://stackoverflow.com/questions/16873669/combine-multiple-text-files-and-remove-duplicates

 First off, you're not using the full power of cat. The loop can be replaced by just

 cat data/* > dnsFull
 assuming that dnsFull is initially empty: > truncates the file before writing, whereas appending in a loop keeps whatever was already in it.
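A quick way to convince yourself the two are equivalent is to run both on throwaway input. In this sketch the appending loop is an assumption about what the original script looked like. The data/ files and the dnsLoop name are made up for the demo:

```shell
#!/bin/sh
# Sketch: check that a single cat call builds the same file as an
# appending loop. The loop is an ASSUMED reconstruction of the original
# script; data/part1 and data/part2 are made-up sample input.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT   # clean up the scratch directory on exit
cd "$work"
mkdir data
printf 'a.example\nb.example\n' > data/part1
printf 'b.example\nc.example\n' > data/part2

# Assumed original: start empty, then append each file in turn.
: > dnsLoop
for f in data/*; do
    cat "$f" >> dnsLoop
done

# Replacement: the shell expands data/* in sorted order and cat reads
# the files in that order; > truncates dnsFull before writing.
cat data/* > dnsFull

cmp dnsLoop dnsFull && echo "identical"
```

Both versions use the same glob, so the files are concatenated in the same order.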

 Then there are all those temporary files, which force programs to wait on the hard disk (commonly the slowest part of a modern computer system). Use a pipeline instead:

 cat data/* | sort | uniq > dnsOut
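The sketch below runs an assumed temp-file version of the script next to the pipeline on made-up input and checks that both give the same result. The dnsSorted and dnsTemp names are hypothetical:

```shell
#!/bin/sh
# Sketch: temp-file chain (ASSUMED reconstruction of the original script)
# versus a pipeline. Input files are made-up sample data.
set -e
export LC_ALL=C              # byte-wise sorting, independent of locale
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir data
printf 'b.example\na.example\n' > data/part1
printf 'a.example\nc.example\n' > data/part2

# Temp-file version: each step writes to disk and the next reads it back.
cat data/* > dnsFull
sort dnsFull > dnsSorted
uniq dnsSorted > dnsTemp

# Pipeline version: data flows between processes, no intermediate files.
cat data/* | sort | uniq > dnsOut

cmp dnsTemp dnsOut && echo "identical"
```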
 This is still wasteful, since sort alone can do what you're using cat and uniq for: it accepts several input files at once, and its -u option drops duplicate lines. The whole script can be replaced by

 sort -u data/* > dnsOut
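Here is a sketch that checks sort -u against the pipeline on made-up input. LC_ALL=C is set so sorting is byte-wise and the result does not depend on the machine's locale. The viaPipeline name is hypothetical:

```shell
#!/bin/sh
# Sketch: sort -u versus cat | sort | uniq on made-up sample input.
set -e
export LC_ALL=C              # byte-wise comparison, independent of locale
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir data
printf 'b.example\na.example\nb.example\n' > data/part1
printf 'c.example\na.example\n' > data/part2

cat data/* | sort | uniq > viaPipeline   # three processes
sort -u data/* > dnsOut                  # one process: reads, sorts, dedupes

cmp viaPipeline dnsOut && echo "identical"
```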
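 If this is still not fast enough, note that sorting takes O(n lg n) time, while deduplication can be done in (expected) linear time with Awk, which remembers every line it has seen in an associative array (typically a hash table). Unlike sort -u, this keeps lines in order of first appearance rather than sorting them, and it holds every distinct line in memory: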

 awk '{if (!a[$0]++) print}' data/* > dnsOut
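Here is the awk command run on made-up sample files. The output keeps first-seen order and has the same set of lines as sort -u. The viaSort file name is hypothetical and used only for the comparison:

```shell
#!/bin/sh
# Sketch: one-pass dedup with an awk associative array. a[$0]++ returns
# the previous count for this line, so !a[$0]++ is true only the first
# time the line appears. Input files are made-up sample data.
set -e
export LC_ALL=C
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir data
printf 'b.example\na.example\nb.example\n' > data/part1
printf 'c.example\na.example\n' > data/part2

awk '{if (!a[$0]++) print}' data/* > dnsOut
cat dnsOut                   # b.example, a.example, c.example (first-seen order)

# Same set of lines as sort -u, just not sorted.
sort -u data/* > viaSort
sort dnsOut | cmp - viaSort && echo "same lines as sort -u"
```

If sorted output is needed anyway, sort -u is the simpler choice. The awk version pays off when order doesn't matter or first-seen order is preferred.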