Howdy all. In this post, I'll share some of my quick-and-dirty coding work that concurrently sorts very large files in considerably less time.
Goal: Sort an extremely large text file using Java.
How it works:
- Take the target text file
- Split it into, say, 'n' small chunks
- Feed each chunk to one of 'n' asynchronous threads for sorting
- Merge the sorted files into one (to be implemented)
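The steps above can be sketched roughly like this, using an ExecutorService to run the chunk sorters concurrently (the class and file names here are my own illustration, not the ones inside FileSorter.jar):

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

public class ChunkSortSketch {
    // Split the input file's lines into n chunks, sort each chunk on its own
    // thread, and write each sorted chunk to an intermediate file.
    public static List<Path> sortInChunks(Path input, int n) throws Exception {
        List<String> lines = Files.readAllLines(input);
        // Ceiling division, so the line count need not be a multiple of n.
        int chunkSize = (lines.size() + n - 1) / n;
        ExecutorService pool = Executors.newFixedThreadPool(n);
        List<Future<Path>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int from = i * chunkSize;
            int to = Math.min(from + chunkSize, lines.size());
            if (from >= to) break;
            List<String> chunk = new ArrayList<>(lines.subList(from, to));
            int id = i;
            futures.add(pool.submit(() -> {
                Collections.sort(chunk); // each chunk is sorted independently
                Path out = input.resolveSibling("chunk-" + id + ".sorted");
                Files.write(out, chunk);
                return out;
            }));
        }
        pool.shutdown();
        List<Path> outputs = new ArrayList<>();
        for (Future<Path> f : futures) outputs.add(f.get()); // wait for all sorters
        return outputs;
    }
}
```

Note that this sketch reads the whole file into memory first; the real code splits the file on disk instead, which is where the 3:1 time cost below comes from.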
Right now, the code has some limitations:
- The total number of lines in the target file must be a multiple of the total number of threads.
- Splitting the file takes longer than sorting it (roughly 3:1).
- The sorted intermediate files cannot yet be merged back into one big sorted file.
DOWNLOAD: You can download the jar file from HERE
> java -jar FileSorter.jar FileToBeSorted Concurrency
On successfully running the jar file with valid arguments, you will have #concurrency sorted chunk files. All you gotta do is merge 'em back into one.
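Since that merge step isn't in the jar yet, here's one way you could do it yourself: a standard k-way merge that keeps the smallest pending line from each sorted chunk in a PriorityQueue (names are my own; this is a sketch, not the eventual FileSorter implementation):

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class MergeSketch {
    // k-way merge: always emit the overall smallest line, then refill the
    // queue from whichever chunk reader that line came from.
    public static void mergeSorted(List<Path> chunks, Path out) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        // Queue entries pair a line with the index of its source reader.
        PriorityQueue<Map.Entry<String, Integer>> pq =
                new PriorityQueue<>(Map.Entry.comparingByKey());
        try (BufferedWriter w = Files.newBufferedWriter(out)) {
            for (int i = 0; i < chunks.size(); i++) {
                BufferedReader r = Files.newBufferedReader(chunks.get(i));
                readers.add(r);
                String first = r.readLine();
                if (first != null) pq.add(Map.entry(first, i));
            }
            while (!pq.isEmpty()) {
                Map.Entry<String, Integer> e = pq.poll();
                w.write(e.getKey());
                w.newLine();
                String next = readers.get(e.getValue()).readLine();
                if (next != null) pq.add(Map.entry(next, e.getValue()));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
    }
}
```

This streams the chunks, so it never holds more than one line per chunk in memory, no matter how big the files are.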
- We can avoid physically splitting the file before sorting (I'll do that in the next post).
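One possible way to skip the physical split (my own rough sketch, not a preview of the next post): have each worker read its own line range straight from the shared input file with Files.lines, so no intermediate chunk files are written before sorting.

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

public class NoSplitSortSketch {
    // Each worker sorts its own line range read directly from the input file.
    // (Each worker still re-scans the file up to its offset when skipping;
    // seeking by byte offset would be faster, but this keeps the sketch short.)
    public static List<List<String>> sortRanges(Path input, int n) throws Exception {
        long total;
        try (Stream<String> s = Files.lines(input)) { total = s.count(); }
        long chunk = (total + n - 1) / n; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(n);
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long from = i * chunk;
            if (from >= total) break;
            long count = Math.min(chunk, total - from);
            futures.add(pool.submit(() -> {
                try (Stream<String> s = Files.lines(input)) {
                    return s.skip(from).limit(count).sorted().collect(Collectors.toList());
                }
            }));
        }
        pool.shutdown();
        List<List<String>> sortedRanges = new ArrayList<>();
        for (Future<List<String>> f : futures) sortedRanges.add(f.get());
        return sortedRanges;
    }
}
```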
JAVA SOURCE CODE: https://github.com/shankarvalleru/JavaFileSorter
** You can use, modify, and do whatever you want with this code, as long as you understand what's going on in it ;)