Sorting Huge Files Using Java

Howdy all. In this post, I'll share some of my monkey coding: a small tool that sorts very large files concurrently, in considerably less time.

Goal: Sort an extremely large text file using Java.

How it works:

  1. Take the target text file
  2. Split it into 'n' small chunks
  3. Feed each chunk to one of 'n' asynchronous threads for sorting (see the sketch after this list)
  4. Merge the sorted chunks back into one file (TBI, i.e. not implemented yet)
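
This is not the actual code inside the jar, just a minimal sketch of steps 2 and 3. The chunk file names (chunk_0.txt, ...) and the totalLines parameter are placeholders of mine, not something the tool actually uses:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkSorter {

    // Step 2: split the input into n chunk files without loading the whole file into memory.
    static List<Path> split(Path input, int n, long totalLines) throws IOException {
        List<Path> chunks = new ArrayList<>();
        long linesPerChunk = totalLines / n;                      // assumes totalLines % n == 0 (see limitations)
        try (BufferedReader in = Files.newBufferedReader(input)) {
            for (int i = 0; i < n; i++) {
                Path chunk = Paths.get("chunk_" + i + ".txt");    // placeholder naming scheme
                try (BufferedWriter out = Files.newBufferedWriter(chunk)) {
                    for (long l = 0; l < linesPerChunk; l++) {
                        out.write(in.readLine());
                        out.newLine();
                    }
                }
                chunks.add(chunk);
            }
        }
        return chunks;
    }

    // Step 3: sort each chunk in its own thread and overwrite it with its sorted lines.
    static void sortChunks(List<Path> chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(chunks.size());
        List<Future<?>> done = new ArrayList<>();
        for (Path chunk : chunks) {
            done.add(pool.submit(() -> {
                List<String> lines = Files.readAllLines(chunk);
                Collections.sort(lines);
                Files.write(chunk, lines);
                return null;                                      // Callable, so checked IOExceptions are allowed
            }));
        }
        for (Future<?> f : done) f.get();                         // wait for every chunk to finish sorting
        pool.shutdown();
    }
}
```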

Limitations:

Right now, the code has a few limitations:

  • The total number of lines in the target file must be a multiple of the number of threads.
  • File splitting takes more time than the sorting itself (roughly 3:1).
  • The sorted intermediate files are not yet merged back into one big sorted file (a possible merge sketch follows below).
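
Since the merge step is where the jar stops today, here is one way it could be done. This is again just a sketch with made-up names, not part of the released code: a k-way merge that keeps one reader per sorted chunk and always writes out the smallest pending line.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ChunkMerger {

    // A pending line paired with the chunk reader it came from.
    private static class Head {
        final String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) { this.line = line; this.reader = reader; }
    }

    // K-way merge: repeatedly emit the smallest head line across all sorted chunks.
    static void merge(List<Path> sortedChunks, Path output) throws IOException {
        PriorityQueue<Head> heap = new PriorityQueue<>(Comparator.comparing((Head h) -> h.line));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            for (Path chunk : sortedChunks) {
                BufferedReader r = Files.newBufferedReader(chunk);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new Head(first, r));
            }
            while (!heap.isEmpty()) {
                Head smallest = heap.poll();
                out.write(smallest.line);
                out.newLine();
                String next = smallest.reader.readLine();         // refill from the same chunk
                if (next != null) heap.add(new Head(next, smallest.reader));
            }
        } finally {
            for (BufferedReader r : readers) r.close();           // readers aren't in the try-with-resources
        }
    }
}
```

Because every chunk is already sorted, only one line per chunk needs to sit in memory at any time, so the merge can handle files much bigger than the heap.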

DOWNLOAD: You can download the jar file from HERE

USAGE:

> java -jar FileSorter.jar FileToBeSorted Concurrency

On successfully running the jar with valid arguments (the file to be sorted, and the concurrency, i.e. the number of threads/chunks), you will end up with #concurrency sorted chunk files. All you gotta do is merge 'em back into one.

IMPROVEMENTS:

- We can avoid physically splitting the file before sorting (I'll do that in the next post; one possible direction is sketched below).
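
I haven't written that version yet, so take this as a rough sketch of one possible direction (and it assumes the whole file fits in memory): partition the line list in memory and hand each partition to a thread, instead of writing chunk files to disk.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InMemorySplitSorter {

    // Partition the lines in memory instead of writing chunk files to disk;
    // each partition is sorted in its own thread and returned for merging.
    static List<List<String>> sortPartitions(Path input, int n) throws Exception {
        List<String> lines = Files.readAllLines(input);           // assumes the file fits in memory
        int chunkSize = lines.size() / n;

        ExecutorService pool = Executors.newFixedThreadPool(n);
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int from = i * chunkSize;
            int to = (i == n - 1) ? lines.size() : from + chunkSize; // last partition absorbs any remainder
            List<String> part = new ArrayList<>(lines.subList(from, to));
            futures.add(pool.submit(() -> { Collections.sort(part); return part; }));
        }

        List<List<String>> sorted = new ArrayList<>();
        for (Future<List<String>> f : futures) sorted.add(f.get()); // wait for every partition
        pool.shutdown();
        return sorted;                                            // merge these just like the chunk files above
    }
}
```

As a bonus, letting the last partition absorb the remainder would also drop the multiple-of-n restriction.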

JAVA SOURCE CODE: https://github.com/shankarvalleru/JavaFileSorter

** You can use, modify, and do whatever you want with this code, as long as you understand what's going on in it ;)