[CSED332]TeamProject - Team Pink
https://large-drain-0aa.notion.site/CS332-Team-Project-bb4c00c8aa5f4f96a583a9539c78d1a3
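To start the master, type "sbt" in the terminal to open the sbt shell, then run "run workerNum", where workerNum is the number of workers (4 in the example below). When sbt asks which main class to run, select network.Master.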
sbt
sbt:distributedSorting> run 4
Multiple main classes detected. Select one to run: [1] network.Master [2] network.Worker
Enter number: 1
To start a worker, open the sbt shell and run: run "master IP address:master port" -I "input path" -O "output path"
sbt
sbt:distributedSorting> run 2.2.2.107:50051 -I /home/pink/64/input -O /home/pink/64/output
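The worker command line above consists of one positional argument (the master address as IP:port) followed by the -I and -O paths. As an illustration only, the following standalone Python sketch shows how such an argument list can be split into its parts. The function name parse_worker_args is hypothetical, and this is not the project's actual Scala parsing code.

```python
import argparse


def parse_worker_args(argv):
    """Split a worker argument list into (host, port, input path, output path)."""
    parser = argparse.ArgumentParser(prog="run")
    parser.add_argument("master")  # master address written as "ip:port"
    parser.add_argument("-I", dest="input_path", required=True)
    parser.add_argument("-O", dest="output_path", required=True)
    args = parser.parse_args(argv)

    # Split on the last colon so the port is always the final component.
    host, _, port = args.master.rpartition(":")
    if not host or not port.isdigit():
        parser.error("master must be given as ip:port")
    return host, int(port), args.input_path, args.output_path


if __name__ == "__main__":
    print(parse_worker_args(
        ["2.2.2.107:50051", "-I", "/home/pink/64/input", "-O", "/home/pink/64/output"]
    ))
```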
We generated the input files with gensort -a, which produces ASCII records. Without -a, gensort writes binary records, which are not appropriate here because the keys must be ASCII.
The test used 4 workers, each with 2 input files of 32MB, so the total input size is 256MB (32 * 2 * 4). The master ran on VM 7 and the workers on VMs 3, 4, 5 and 6.
1) Starting the master
2) When all workers are connected
The master prints the IP addresses of all workers.
3) When the master has finished all processing and terminates
The master prints the total size of the output files reported by the individual workers. The input was 256MB and the reported total is 256000000 bytes. Since every gensort record is 100 bytes long, equal total sizes mean that the output contains the same number of records as the input (2,560,000).
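The size comparison can be reproduced with a few lines of arithmetic. The sketch below is a standalone Python illustration, not part of the project. It assumes that 32MB means 32,000,000 bytes (consistent with the 256000000 bytes printed by the master) and uses the 100-byte gensort record length; the function names are hypothetical.

```python
RECORD_SIZE = 100  # bytes per gensort record (10-byte key + 90-byte payload)


def expected_total_bytes(workers, files_per_worker, file_size_bytes):
    """Total input size across all workers."""
    return workers * files_per_worker * file_size_bytes


def record_count(total_bytes):
    """Number of whole records in total_bytes; rejects partial records."""
    if total_bytes % RECORD_SIZE != 0:
        raise ValueError("size is not a multiple of the record size")
    return total_bytes // RECORD_SIZE


# Assumption: each 32MB input file is exactly 32,000,000 bytes.
input_bytes = expected_total_bytes(workers=4, files_per_worker=2, file_size_bytes=32_000_000)
output_bytes = 256_000_000  # total printed by the master

print(input_bytes == output_bytes)
print(record_count(input_bytes), record_count(output_bytes))
```

Matching totals show that no records were lost or duplicated in number. They do not by themselves show that the records are the same or that they are sorted, which is why the ordering checks below are still needed.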
range from : "{Y>:#%`
range to : Vu1s$HZwQ;
range from : Vu27GO]$g?
range to : k$&X\sS}`T
range from : k$'a~gf{Z.
range to : pW3|:I8,%
range from : pW9cf)7^Z@
range to : ~~}+tO+-g}
The ranges above show that the keys are in ascending ASCII order from worker to worker: each worker's range ends before the next worker's range begins. To check that the output is also sorted within each worker, we wrote a test.
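The worker-to-worker ordering can also be checked mechanically instead of by eye. The following standalone Python sketch compares the printed range boundaries as byte strings, which matches ASCII ordering. The keys are copied exactly as they appear in the output above, and the function name ranges_are_ordered is hypothetical.

```python
# (range from, range to) pairs as printed above, one pair per worker.
ranges = [
    (b'"{Y>:#%`', b"Vu1s$HZwQ;"),
    (b"Vu27GO]$g?", rb"k$&X\sS}`T"),
    (b"k$'a~gf{Z.", b"pW3|:I8,%"),
    (b"pW9cf)7^Z@", b"~~}+tO+-g}"),
]


def ranges_are_ordered(ranges):
    """True if each range is internally ordered and ends before the next one starts."""
    within = all(low <= high for low, high in ranges)
    across = all(prev[1] < nxt[0] for prev, nxt in zip(ranges, ranges[1:]))
    return within and across


print(ranges_are_ordered(ranges))
```

Python compares bytes objects lexicographically by byte value, so no custom comparison function is needed for ASCII keys.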
To check that a worker's output files are sorted, run the following commands on that worker.
sbt
test
The SortingTest that we wrote then reports whether the output on that worker is sorted correctly.
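For readers who want to see what such a check involves, here is a minimal standalone sketch in Python. It is not the project's SortingTest. It assumes gensort's record layout (100-byte records whose first 10 bytes are the key), and the function name is_sorted_file is hypothetical.

```python
RECORD_SIZE = 100  # gensort record length in bytes
KEY_SIZE = 10      # the leading bytes of each record form the key


def is_sorted_file(path):
    """Return True if the records in the file are in ascending key order."""
    previous_key = None
    with open(path, "rb") as f:
        while True:
            record = f.read(RECORD_SIZE)
            if not record:
                return True  # reached end of file without finding a violation
            if len(record) != RECORD_SIZE:
                raise ValueError("file size is not a multiple of the record size")
            key = record[:KEY_SIZE]
            if previous_key is not None and key < previous_key:
                return False
            previous_key = key
```

The function reads one record at a time, so memory use stays constant regardless of the file size. Checking a whole worker would additionally require that the last key of each output file does not exceed the first key of the next file.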
You can verify that sampling, partitioning, shuffling and subpartitioning are working correctly by inspecting the temp directory.
The details of each stage are documented in our week 8 progress notes on Notion.
https://large-drain-0aa.notion.site/8th-week-b8c4ab5e540e423c9fb2505674ad106d