- Make sure JDK 8 and Maven 3 are installed
git clone https://github.com/guiguan/table-join.git
cd table-join
mvn clean install
mvn clean install -DskipTests
java -jar target/*.jar
java -jar target/*.jar t1JsonPath t2JsonPath outputPath
- Values of `z` are unique in `t1`, but not in `t2`
- `t2` has about 3 times as many rows as `t1`
- Each row of the JSON input files is a small JSON object
- Precision of double is defined in `DoubleAdapter.precision`, which is set to 6, so `46.1999999` is considered equivalent to `46.2`, but `46.199999` is not. Double is used in this project instead of float because a float carries only about 7 significant decimal digits, so `Float.parseFloat` can keep at most about 5 decimal places for values of this size. Double gives more freedom in choosing the value of `DoubleAdapter.precision`.
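
The precision rule can be illustrated with a minimal sketch. It does not reproduce the project's `DoubleAdapter`; the class `PrecisionSketch`, its methods, and the half-up rounding mode are assumptions made for illustration. Only the precision of 6 and the sample values come from the text above.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class PrecisionSketch {
    // Assumed to mirror DoubleAdapter.precision, which the README sets to 6.
    static final int PRECISION = 6;

    // Round a decimal literal to PRECISION decimal places.
    // HALF_UP is an assumption; the project may round differently.
    public static BigDecimal normalize(String raw) {
        return new BigDecimal(raw).setScale(PRECISION, RoundingMode.HALF_UP);
    }

    // Two values are treated as equivalent when they agree after rounding.
    public static boolean equivalent(String a, String b) {
        return normalize(a).compareTo(normalize(b)) == 0;
    }

    // A float cannot tell these inputs apart at all, which is why
    // a 6-decimal precision needs double.
    public static boolean sameAsFloat(String a, String b) {
        return Float.parseFloat(a) == Float.parseFloat(b);
    }

    public static void main(String[] args) {
        System.out.println(equivalent("46.1999999", "46.2"));  // rounds to 46.200000
        System.out.println(equivalent("46.199999", "46.2"));   // stays 46.199999
        System.out.println(sameAsFloat("46.199999", "46.2"));  // float collapses both
    }
}
```

With a precision of 6, the first comparison holds and the second does not, while the float comparison cannot distinguish `46.199999` from `46.2` in the first place.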
- GSON is chosen for JSON parsing according to this benchmark for a large number of small JSON objects
- `t1` is processed line by line first, and a hash table is built on `t1.z`, which is then used to quickly check the join criterion for `t2` line by line.
- When processing each line of `t2`, `zz` is ignored, the sum aggregation is updated on the fly, and no other information from `t2` is stored.
- Finally, the aggregated results from the previous step are sorted by `SUM(t2.y)`.
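
The steps above amount to a hash join with streaming aggregation, which can be sketched roughly as follows. This is not the project's code: the class `HashJoinSketch`, the row types, the grouping column `x` on `t1`, and the ascending sort order are all assumptions, since this section does not give the schemas or the exact query. Rows are assumed to be already parsed from JSON.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashJoinSketch {
    // Hypothetical row shapes; the real schemas are not stated in this section.
    public static final class T1Row {
        final String x; final int z;
        public T1Row(String x, int z) { this.x = x; this.z = z; }
    }
    public static final class T2Row {
        final int z; final double y;
        public T2Row(int z, double y) { this.z = z; this.y = y; }
    }

    public static List<Map.Entry<String, Double>> joinAndSum(
            Iterable<T1Row> t1, Iterable<T2Row> t2) {
        // Build phase: z is unique in t1, so each z maps to one group key.
        Map<Integer, String> zToGroup = new HashMap<>();
        for (T1Row row : t1) {
            zToGroup.put(row.z, row.x);
        }

        // Probe phase: stream t2 and keep only the running sums.
        Map<String, Double> sums = new HashMap<>();
        for (T2Row row : t2) {
            String group = zToGroup.get(row.z);
            if (group != null) {            // row joins with some t1 row
                sums.merge(group, row.y, Double::sum);
            }
        }

        // Sort phase: order the groups by their sum (ascending is an assumption).
        List<Map.Entry<String, Double>> result = new ArrayList<>(sums.entrySet());
        result.sort(Map.Entry.comparingByValue());
        return result;
    }

    public static void main(String[] args) {
        List<T1Row> t1 = Arrays.asList(new T1Row("a", 1), new T1Row("b", 2));
        List<T2Row> t2 = Arrays.asList(
                new T2Row(1, 2.0), new T2Row(1, 3.0),
                new T2Row(2, 1.5), new T2Row(9, 100.0)); // z = 9 has no match
        System.out.println(joinAndSum(t1, t2));
    }
}
```

Building the hash table on `t1` rather than `t2` keeps memory proportional to the smaller input, which fits the note that `t2` has about 3 times as many rows as `t1`.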