7023
7024
7025
7026
7027
7028
7029
7030
7031
7032
7033
7034
7035
7036
7037
7038
7039
7040
7041
7042
7043
7044
7045
7046
7047
7048
7049
7050
7051
7052
7053
7054
7055
7056
7057
7058
7059
7060
7061
7062
7063
7064
7065
7066
7067
7068
7069
7070
7071
7072
7073
7074
7075
7076
7077
7078
7079
7080
7081
7082
7083
7084
7085
7086
7087
7088
7089
7090
7091
7092
7093
7094
7095
7096
7097
7098
7099
7100
7101
7102
7103
7104
7105
7106
7107
7108
7109
7110
7111
7112
7113
7114
7115
7116
7117
7118
7119
7120
7121
7122
7123
7124
7125
7126
7127
7128
7129
7130
7131
7132
7133
7134
7135
7136
7137
7138
7139
7140
7141
7142
7143
7144
7145
7146
7147
7148
7149
7150
7151
7152
7153
7154
7155
7156
7157
7158
7159
7160
7161
7162
7163
7164
7165
7166
7167
7168
7169
7170
7171
7172
7173
7174
7175
7176
7177
7178
7179
7180
7181
7182
7183
7184
7185
7186
7187
7188
7189
7190
7191
7192
7193
7194
7195
7196
7197
7198
7199
7200
7201
7202
7203
7204
7205
7206
7207
7208
7209
7210
7211
7212
7213
7214
7215
7216
7217
7218
7219
7220
7221
7222
7223
7224
7225
7226
7227
7228
7229
7230
7231
7232
7233
7234
7235
7236
7237
7238
7239
7240
7241
7242
7243
7244
7245
7246
7247
7248
7249
7250
7251
7252
7253
7254
7255
7256
7257
7258
7259
7260
7261
7262
7263
7264
7265
7266
7267
7268
7269
7270
7271
7272
7273
7274
7275
7276
7277
7278
7279
7280
7281
7282
7283
7284
7285
7286
7287
7288
7289
7290
7291
7292
7293
7294
7295
7296
7297
7298
7299
7300
7301
7302
7303
7304
7305
7306
7307
7308
7309
7310
7311
7312
7313
7314
7315
7316
7317
7318
7319
7320
7321
7322
7323
7324
7325
7326
7327
7328
7329
7330
7331
7332
7333
7334
7335
7336
7337
7338
7339
7340
7341
7342
7343
7344
7345
7346
7347
7348
7349
7350
7351
7352
7353
7354
7355
7356
7357
7358
7359
7360
7361
7362
7363
7364
7365
7366
7367
7368
7369
7370
7371
7372
7373
7374
7375
7376
7377
7378
7379
7380
7381
7382
7383
7384
7385
7386
7387
7388
7389
7390
7391
7392
7393
7394
7395
7396
7397
7398
7399
7400
7401
7402
7403
7404
7405
7406
7407
7408
7409
7410
7411
7412
7413
7414
7415
7416
7417
7418
7419
7420
7421
7422
7423
7424
7425
7426
7427
7428
7429
7430
7431
7432
7433
7434
7435
7436
7437
7438
7439
7440
7441
7442
7443
7444
7445
7446
7447
7448
7449
7450
7451
7452
7453
7454
7455
7456
7457
7458
7459
7460
7461
7462
7463
7464
7465
7466
7467
7468
7469
7470
7471
7472
7473
7474
7475
7476
7477
7478
7479
7480
7481
7482
7483
7484
7485
7486
7487
7488
7489
7490
7491
7492
7493
7494
7495
7496
7497
7498
7499
7500
7501
7502
7503
7504
7505
7506
7507
7508
7509
7510
7511
7512
7513
7514
7515
7516
7517
7518
7519
7520
7521
7522
7523
7524
7525
7526
7527
7528
7529
7530
7531
7532
7533
7534
7535
7536
7537
7538
7539
7540
7541
7542
7543
7544
7545
7546
7547
7548
7549
7550
7551
7552
7553
7554
7555
7556
7557
7558
7559
7560
7561
7562
7563
7564
7565
7566
7567
7568
7569
7570
7571
7572
7573
7574
7575
7576
7577
7578
7579
7580
7581
7582
7583
7584
7585
7586
7587
7588
7589
7590
7591
7592
7593
7594
7595
7596
7597
7598
7599
7600
7601
7602
7603
7604
7605
7606
7607
7608
7609
7610
7611
7612
7613
7614
7615
7616
7617
7618
7619
7620
7621
7622
7623
7624
7625
7626
7627
7628
7629
7630
7631
7632
7633
7634
7635
7636
7637
7638
7639
7640
7641
7642
7643
7644
7645
7646
7647
7648
7649
7650
7651
7652
7653
7654
7655
7656
7657
7658
7659
7660
7661
7662
7663
7664
7665
7666
7667
7668
7669
7670
7671
7672
7673
7674
7675
7676
7677
7678
7679
7680
7681
7682
7683
7684
7685
7686
7687
7688
7689
7690
7691
7692
7693
7694
7695
7696
7697
7698
7699
7700
7701
7702
7703
7704
7705
7706
7707
7708
7709
7710
7711
7712
7713
7714
7715
7716
7717
7718
7719
7720
7721
7722
7723
7724
7725
7726
7727
7728
7729
7730
7731
7732
7733
7734
7735
7736
7737
7738
7739
7740
7741
7742
7743
7744
7745
7746
7747
7748
7749
7750
7751
7752
7753
7754
7755
7756
7757
7758
7759
7760
7761
7762
7763
7764
7765
7766
7767
7768
7769
7770
7771
7772
7773
7774
7775
7776
7777
7778
7779
7780
7781
7782
7783
7784
7785
7786
7787
7788
7789
7790
7791
7792
7793
7794
7795
7796
7797
7798
7799
7800
7801
7802
7803
7804
7805
7806
7807
7808
7809
7810
7811
7812
7813
7814
7815
7816
7817
7818
7819
7820
7821
7822
7823
7824
7825
7826
7827
7828
7829
7830
7831
7832
7833
7834
7835
7836
7837
7838
7839
7840
7841
7842
7843
7844
7845
7846
7847
7848
7849
7850
7851
7852
7853
7854
7855
7856
7857
7858
7859
7860
7861
7862
7863
7864
7865
7866
7867
7868
7869
7870
7871
7872
7873
7874
7875
7876
7877
7878
7879
7880
7881
7882
7883
7884
7885
7886
7887
7888
7889
7890
7891
7892
7893
7894
7895
7896
7897
7898
7899
7900
7901
7902
7903
7904
7905
7906
7907
7908
7909
7910
7911
7912
7913
7914
7915
7916
7917
7918
7919
7920
7921
7922
7923
7924
7925
7926
7927
7928
7929
7930
7931
7932
7933
7934
7935
7936
7937
7938
7939
7940
7941
7942
7943
7944
7945
7946
7947
7948
7949
7950
7951
7952
7953
7954
7955
7956
7957
7958
7959
7960
7961
7962
7963
7964
7965
7966
7967
7968
7969
7970
7971
7972
7973
7974
7975
7976
7977
7978
7979
7980
7981
7982
7983
7984
7985
7986
7987
7988
7989
7990
7991
7992
7993
7994
7995
7996
7997
7998
7999
8000
8001
8002
8003
8004
8005
8006
8007
8008
8009
8010
8011
8012
8013
8014
8015
8016
8017
8018
8019
8020
8021
8022
8023
8024
8025
8026
8027
8028
8029
8030
8031
8032
8033
8034
8035
8036
8037
8038
8039
8040
8041
8042
8043
8044
8045
8046
8047
8048
8049
8050
8051
8052
8053
8054
8055
8056
8057
8058
8059
8060
8061
8062
8063
8064
8065
8066
8067
8068
8069
8070
8071
8072
8073
8074
8075
8076
8077
8078
8079
8080
8081
8082
8083
8084
8085
8086
8087
8088
8089
8090
8091
8092
8093
8094
8095
8096
8097
8098
8099
8100
8101
8102
8103
8104
8105
8106
8107
8108
8109
8110
8111
8112
8113
8114
8115
8116
8117
8118
8119
8120
8121
8122
8123
8124
8125
8126
8127
8128
8129
8130
8131
8132
8133
8134
8135
8136
8137
8138
8139
8140
8141
8142
8143
8144
8145
8146
8147
8148
8149
8150
8151
8152
8153
8154
8155
8156
8157
8158
8159
8160
8161
8162
8163
8164
8165
8166
8167
8168
8169
8170
8171
8172
8173
8174
8175
8176
8177
8178
8179
8180
8181
8182
8183
8184
8185
8186
8187
8188
8189
8190
8191
8192
8193
8194
8195
8196
8197
8198
8199
8200
8201
8202
8203
8204
8205
8206
8207
8208
8209
8210
8211
8212
8213
8214
8215
8216
8217
8218
8219
8220
8221
8222
8223
8224
8225
8226
8227
8228
8229
8230
8231
8232
8233
8234
8235
8236
8237
8238
8239
8240
8241
8242
8243
8244
8245
8246
8247
8248
8249
8250
8251
8252
8253
8254
8255
8256
8257
8258
8259
8260
8261
8262
8263
8264
8265
8266
8267
8268
8269
8270
8271
8272
8273
8274
8275
8276
8277
8278
8279
8280
8281
8282
8283
8284
8285
8286
8287
8288
8289
8290
8291
8292
8293
8294
8295
8296
8297
8298
8299
8300
8301
8302
8303
8304
8305
8306
8307
8308
8309
8310
8311
8312
8313
8314
8315
8316
8317
8318
8319
8320
8321
8322
8323
8324
8325
8326
8327
8328
8329
8330
8331
8332
8333
8334
8335
8336
8337
8338
8339
8340
8341
8342
8343
8344
8345
8346
8347
8348
8349
8350
8351
8352
8353
8354
8355
8356
8357
8358
8359
8360
8361
8362
8363
8364
8365
8366
8367
8368
8369
8370
8371
8372
8373
8374
8375
8376
8377
8378
8379
8380
8381
8382
8383
8384
8385
8386
8387
8388
8389
8390
8391
8392
8393
8394
8395
8396
8397
8398
8399
8400
8401
8402
8403
8404
8405
8406
8407
8408
8409
8410
8411
8412
8413
8414
8415
8416
8417
8418
8419
8420
8421
8422
8423
8424
8425
8426
8427
8428
8429
8430
8431
8432
8433
8434
8435
8436
8437
8438
8439
8440
8441
8442
8443
8444
8445
8446
8447
8448
8449
8450
8451
8452
8453
8454
8455
8456
8457
8458
8459
8460
8461
8462
8463
8464
8465
8466
8467
8468
8469
8470
8471
8472
8473
8474
8475
8476
8477
8478
8479
8480
8481
8482
8483
8484
8485
8486
8487
8488
8489
8490
8491
8492
8493
8494
8495
8496
8497
8498
8499
8500
8501
8502
8503
8504
8505
8506
8507
8508
8509
8510
8511
8512
8513
8514
8515
8516
8517
8518
8519
8520
8521
8522
8523
8524
8525
8526
8527
8528
8529
8530
8531
8532
8533
8534
8535
8536
8537
8538
8539
8540
8541
8542
8543
8544
8545
8546
8547
8548
8549
8550
8551
8552
8553
8554
8555
8556
8557
8558
8559
8560
8561
8562
8563
8564
8565
8566
8567
8568
8569
8570
8571
8572
8573
8574
8575
8576
8577
8578
8579
8580
8581
8582
8583
8584
8585
8586
8587
8588
8589
8590
8591
8592
8593
8594
8595
8596
8597
8598
8599
8600
8601
8602
8603
8604
8605
8606
8607
8608
8609
8610
8611
8612
8613
8614
8615
8616
8617
8618
8619
8620
8621
8622
8623
8624
8625
8626
8627
8628
8629
8630
8631
8632
8633
8634
8635
8636
8637
8638
8639
8640
8641
8642
8643
8644
8645
8646
8647
8648
8649
8650
8651
8652
8653
8654
8655
8656
8657
8658
8659
8660
8661
8662
8663
8664
8665
8666
8667
8668
8669
8670
8671
8672
8673
8674
8675
8676
8677
8678
8679
8680
8681
8682
8683
8684
8685
8686
8687
8688
8689
8690
8691
8692
8693
8694
8695
8696
8697
8698
8699
8700
8701
8702
8703
8704
8705
8706
8707
8708
8709
8710
8711
8712
8713
8714
8715
8716
8717
8718
8719
8720
8721
8722
8723
8724
8725
8726
8727
8728
8729
8730
8731
8732
8733
8734
8735
8736
8737
8738
8739
8740
8741
8742
8743
8744
8745
8746
8747
8748
8749
8750
8751
8752
8753
8754
8755
8756
8757
8758
8759
8760
8761
8762
8763
8764
8765
8766
8767
8768
8769
8770
8771
8772
8773
8774
8775
8776
8777
8778
8779
8780
8781
8782
8783
8784
8785
8786
8787
8788
8789
8790
8791
8792
8793
8794
8795
8796
8797
8798
8799
8800
8801
8802
8803
8804
8805
8806
8807
8808
8809
8810
8811
8812
8813
8814
8815
8816
8817
8818
8819
8820
8821
8822
8823
8824
8825
8826
8827
8828
8829
8830
8831
8832
8833
8834
8835
8836
8837
8838
8839
8840
8841
8842
8843
8844
8845
8846
8847
8848
8849
8850
8851
8852
8853
8854
8855
8856
8857
8858
8859
8860
8861
8862
8863
8864
8865
8866
8867
8868
8869
8870
8871
8872
8873
8874
8875
8876
8877
8878
8879
8880
8881
8882
8883
8884
8885
8886
8887
8888
8889
8890
8891
8892
8893
8894
8895
8896
8897
8898
8899
8900
8901
8902
8903
8904
8905
8906
8907
8908
8909
8910
8911
8912
8913
8914
8915
8916
8917
8918
8919
8920
8921
8922
8923
8924
8925
8926
8927
8928
8929
8930
8931
8932
8933
8934
8935
8936
8937
8938
8939
8940
8941
8942
8943
8944
8945
8946
8947
8948
8949
8950
8951
8952
8953
8954
8955
8956
8957
8958
8959
8960
8961
8962
8963
8964
8965
8966
8967
8968
8969
8970
8971
8972
8973
8974
8975
8976
8977
8978
8979
8980
8981
8982
8983
8984
8985
8986
8987
8988
8989
8990
8991
8992
8993
8994
8995
8996
8997
8998
8999
9000
9001
9002
9003
9004
9005
9006
9007
9008
9009
9010
9011
9012
9013
9014
9015
9016
9017
9018
9019
9020
9021
9022
9023
9024
9025
9026
9027
9028
9029
9030
9031
9032
9033
9034
9035
9036
9037
9038
9039
9040
9041
9042
9043
9044
9045
9046
9047
9048
9049
9050
9051
9052
9053
9054
9055
9056
9057
9058
9059
9060
9061
9062
9063
9064
9065
9066
9067
9068
9069
9070
9071
9072
9073
9074
9075
9076
9077
9078
9079
9080
9081
9082
9083
9084
9085
9086
9087
9088
9089
9090
9091
9092
9093
9094
9095
9096
9097
9098
9099
9100
9101
9102
9103
9104
9105
9106
9107
9108
9109
9110
9111
9112
9113
9114
9115
9116
9117
9118
9119
9120
9121
9122
9123
9124
9125
9126
9127
9128
9129
9130
9131
9132
9133
9134
9135
9136
9137
9138
9139
9140
9141
9142
9143
9144
9145
9146
9147
9148
9149
9150
9151
9152
9153
9154
9155
9156
9157
9158
9159
9160
9161
9162
9163
9164
9165
9166
9167
9168
9169
9170
9171
9172
9173
9174
9175
9176
9177
9178
9179
9180
9181
9182
9183
9184
9185
9186
9187
9188
9189
9190
9191
9192
9193
9194
9195
9196
9197
9198
9199
9200
9201
9202
9203
9204
9205
9206
9207
9208
9209
9210
9211
9212
9213
9214
9215
9216
9217
9218
9219
9220
9221
9222
9223
9224
9225
9226
9227
9228
9229
9230
9231
9232
9233
9234
9235
9236
9237
9238
9239
9240
9241
9242
9243
9244
9245
9246
9247
9248
9249
9250
9251
9252
9253
9254
9255
9256
9257
9258
9259
9260
9261
9262
9263
9264
9265
9266
9267
9268
9269
9270
9271
9272
9273
9274
9275
9276
9277
9278
9279
9280
9281
9282
9283
9284
9285
9286
9287
9288
9289
9290
9291
9292
9293
9294
9295
9296
9297
9298
9299
9300
9301
9302
9303
9304
9305
9306
9307
9308
9309
9310
9311
9312
9313
9314
9315
9316
9317
9318
9319
9320
9321
9322
9323
9324
9325
9326
9327
9328
9329
9330
9331
9332
9333
9334
9335
9336
9337
9338
9339
9340
9341
9342
9343
9344
9345
9346
9347
9348
9349
9350
9351
9352
9353
9354
9355
9356
9357
9358
9359
9360
9361
9362
9363
9364
9365
9366
9367
9368
9369
9370
9371
9372
9373
9374
9375
9376
9377
9378
9379
9380
9381
9382
9383
9384
9385
9386
9387
9388
9389
9390
9391
9392
9393
9394
9395
9396
9397
9398
9399
9400
9401
9402
9403
9404
9405
9406
9407
9408
9409
9410
9411
9412
9413
9414
9415
9416
9417
9418
9419
9420
9421
9422
9423
9424
9425
9426
9427
9428
9429
9430
9431
9432
9433
9434
9435
9436
9437
9438
9439
9440
9441
9442
9443
9444
9445
9446
9447
9448
9449
9450
9451
9452
9453
9454
9455
9456
9457
9458
9459
9460
9461
9462
9463
9464
9465
9466
9467
9468
9469
9470
9471
9472
9473
9474
9475
9476
9477
9478
9479
9480
9481
9482
9483
9484
9485
9486
9487
9488
9489
9490
9491
9492
9493
9494
9495
9496
9497
9498
9499
9500
9501
9502
9503
9504
9505
9506
9507
9508
9509
9510
9511
9512
9513
9514
9515
9516
9517
9518
9519
9520
9521
9522
9523
9524
9525
9526
9527
9528
9529
9530
9531
9532
9533
9534
9535
9536
9537
9538
9539
9540
9541
9542
9543
9544
9545
9546
9547
9548
9549
9550
9551
9552
9553
9554
9555
9556
9557
9558
9559
9560
9561
9562
9563
9564
9565
9566
9567
9568
9569
9570
9571
9572
9573
9574
9575
9576
9577
9578
9579
9580
9581
9582
9583
9584
9585
9586
9587
9588
9589
9590
9591
9592
9593
9594
9595
9596
9597
9598
9599
9600
9601
9602
9603
9604
9605
9606
9607
9608
9609
9610
9611
9612
9613
9614
9615
9616
9617
9618
9619
9620
9621
9622
9623
9624
9625
9626
9627
9628
9629
9630
9631
9632
9633
9634
9635
9636
9637
9638
9639
9640
9641
9642
9643
9644
9645
9646
9647
9648
9649
9650
9651
9652
9653
9654
9655
9656
9657
9658
9659
9660
9661
9662
9663
9664
9665
9666
9667
9668
9669
9670
9671
9672
9673
9674
9675
9676
9677
9678
9679
9680
9681
9682
9683
9684
9685
9686
9687
9688
9689
9690
9691
9692
9693
9694
9695
9696
9697
9698
9699
9700
9701
9702
9703
9704
9705
9706
9707
9708
9709
9710
9711
9712
9713
9714
9715
9716
9717
9718
9719
9720
9721
9722
9723
9724
9725
9726
9727
9728
9729
9730
9731
9732
9733
9734
9735
9736
9737
9738
9739
9740
9741
9742
9743
9744
9745
9746
9747
9748
9749
9750
9751
9752
9753
9754
9755
9756
9757
9758
9759
9760
9761
9762
9763
9764
9765
9766
9767
9768
9769
9770
9771
9772
9773
9774
9775
9776
9777
9778
9779
9780
9781
9782
9783
9784
9785
9786
9787
9788
9789
9790
9791
9792
9793
9794
9795
9796
9797
9798
9799
9800
9801
9802
9803
9804
9805
9806
9807
9808
9809
9810
9811
9812
9813
9814
9815
9816
9817
9818
9819
9820
9821
9822
9823
9824
9825
9826
9827
9828
9829
9830
9831
9832
9833
9834
9835
9836
9837
9838
9839
9840
9841
9842
9843
9844
9845
9846
9847
9848
9849
9850
9851
9852
9853
9854
9855
9856
9857
9858
9859
9860
9861
9862
9863
9864
9865
9866
9867
9868
9869
9870
9871
9872
9873
9874
9875
9876
9877
9878
9879
9880
9881
9882
9883
9884
9885
9886
9887
9888
9889
9890
9891
9892
9893
9894
9895
9896
9897
9898
9899
9900
9901
9902
9903
9904
9905
9906
9907
9908
9909
9910
9911
9912
9913
9914
9915
9916
9917
9918
9919
9920
9921
9922
9923
9924
9925
9926
9927
9928
9929
9930
9931
9932
9933
9934
9935
9936
9937
9938
9939
9940
9941
9942
9943
9944
9945
9946
9947
9948
9949
9950
9951
9952
9953
9954
9955
9956
9957
9958
9959
9960
9961
9962
9963
9964
9965
9966
9967
9968
9969
9970
9971
9972
9973
9974
9975
9976
9977
9978
9979
9980
9981
9982
9983
9984
9985
9986
9987
9988
9989
9990
9991
9992
9993
9994
9995
9996
9997
9998
9999
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
10010
10011
10012
10013
10014
10015
10016
10017
10018
10019
10020
10021
10022
10023
10024
10025
10026
10027
10028
10029
10030
10031
10032
10033
10034
10035
10036
10037
10038
10039
10040
10041
10042
10043
10044
10045
10046
10047
10048
10049
10050
10051
10052
10053
10054
10055
10056
10057
10058
10059
10060
10061
10062
10063
10064
10065
10066
10067
10068
10069
10070
10071
10072
10073
10074
10075
10076
10077
10078
10079
10080
10081
10082
10083
10084
10085
10086
10087
10088
10089
10090
10091
10092
10093
10094
10095
10096
10097
10098
10099
10100
10101
10102
10103
10104
10105
10106
10107
10108
10109
10110
10111
10112
10113
10114
10115
10116
10117
10118
10119
10120
10121
10122
10123
10124
10125
10126
10127
10128
10129
10130
10131
10132
10133
10134
10135
10136
10137
10138
10139
10140
10141
10142
10143
10144
10145
10146
10147
10148
10149
10150
10151
10152
10153
10154
10155
10156
10157
10158
10159
10160
10161
10162
10163
10164
10165
10166
10167
10168
10169
10170
10171
10172
10173
10174
10175
10176
10177
10178
10179
10180
10181
10182
10183
10184
10185
10186
10187
10188
10189
10190
10191
10192
10193
10194
10195
10196
10197
10198
10199
10200
10201
10202
10203
10204
10205
10206
10207
10208
10209
10210
10211
10212
10213
10214
10215
10216
10217
10218
10219
10220
10221
10222
10223
10224
10225
10226
10227
10228
10229
10230
10231
10232
10233
10234
10235
10236
10237
10238
10239
10240
10241
10242
10243
10244
10245
10246
10247
10248
10249
10250
10251
10252
10253
10254
10255
10256
10257
10258
10259
10260
10261
10262
10263
10264
10265
10266
10267
10268
10269
10270
10271
10272
10273
10274
10275
10276
10277
10278
10279
10280
10281
10282
10283
10284
10285
10286
10287
10288
10289
10290
10291
10292
10293
10294
10295
10296
10297
10298
10299
10300
10301
10302
10303
10304
10305
10306
10307
10308
10309
10310
10311
10312
10313
10314
10315
10316
10317
10318
10319
10320
10321
10322
10323
10324
10325
10326
10327
10328
10329
10330
10331
10332
10333
10334
10335
10336
10337
10338
10339
10340
10341
10342
10343
10344
10345
10346
10347
10348
10349
10350
10351
10352
10353
10354
10355
10356
10357
10358
10359
10360
10361
10362
10363
10364
10365
10366
10367
10368
10369
10370
10371
10372
10373
10374
10375
10376
10377
10378
10379
10380
10381
10382
10383
10384
10385
10386
10387
10388
10389
10390
10391
10392
10393
10394
10395
10396
10397
10398
10399
10400
10401
10402
10403
10404
10405
10406
10407
10408
10409
10410
10411
10412
10413
10414
10415
10416
10417
10418
10419
10420
10421
10422
10423
10424
10425
10426
10427
10428
10429
10430
10431
10432
10433
10434
10435
10436
10437
10438
10439
10440
10441
10442
10443
10444
10445
10446
10447
10448
10449
10450
10451
10452
10453
10454
10455
10456
10457
10458
10459
10460
10461
10462
10463
10464
10465
10466
10467
10468
10469
10470
10471
10472
10473
10474
10475
10476
10477
10478
10479
10480
10481
10482
10483
10484
10485
10486
10487
10488
10489
10490
10491
10492
10493
10494
10495
10496
10497
10498
10499
10500
10501
10502
10503
10504
10505
10506
10507
10508
10509
10510
10511
10512
10513
10514
10515
10516
10517
10518
10519
10520
10521
10522
10523
10524
10525
10526
10527
10528
10529
10530
10531
10532
10533
10534
10535
10536
10537
10538
10539
10540
10541
10542
10543
10544
10545
10546
10547
10548
10549
10550
10551
10552
10553
10554
10555
10556
10557
10558
10559
10560
10561
10562
10563
10564
10565
10566
10567
10568
10569
10570
10571
10572
10573
10574
10575
10576
10577
10578
10579
10580
10581
10582
10583
10584
10585
10586
10587
10588
10589
10590
10591
10592
10593
10594
10595
10596
10597
10598
10599
10600
10601
10602
10603
10604
10605
10606
10607
10608
10609
10610
10611
10612
10613
10614
10615
10616
10617
10618
10619
10620
10621
10622
10623
10624
10625
10626
10627
10628
10629
10630
10631
10632
10633
10634
10635
10636
10637
10638
10639
10640
10641
10642
10643
10644
10645
10646
10647
10648
10649
10650
10651
10652
10653
10654
10655
10656
10657
10658
10659
10660
10661
10662
10663
10664
10665
10666
10667
10668
10669
10670
10671
10672
10673
10674
10675
10676
10677
10678
10679
10680
10681
10682
10683
10684
10685
10686
10687
10688
10689
10690
10691
10692
10693
10694
10695
10696
10697
10698
10699
10700
10701
10702
10703
10704
10705
10706
10707
10708
10709
10710
10711
10712
10713
10714
10715
10716
10717
10718
10719
10720
10721
10722
10723
10724
10725
10726
10727
10728
10729
10730
10731
10732
10733
10734
10735
10736
10737
10738
10739
10740
10741
10742
10743
10744
10745
10746
10747
10748
10749
10750
10751
10752
10753
10754
10755
10756
10757
10758
10759
10760
10761
10762
10763
10764
10765
10766
10767
10768
10769
10770
10771
10772
10773
10774
10775
10776
10777
10778
10779
10780
10781
10782
10783
10784
10785
10786
10787
10788
10789
10790
10791
10792
10793
10794
10795
10796
10797
10798
10799
10800
10801
10802
10803
10804
10805
10806
10807
10808
10809
10810
10811
10812
10813
10814
10815
10816
10817
10818
10819
10820
10821
10822
10823
10824
10825
10826
10827
10828
10829
10830
10831
10832
10833
10834
10835
10836
10837
10838
10839
10840
10841
10842
10843
10844
10845
10846
10847
10848
10849
10850
10851
10852
10853
10854
10855
10856
10857
10858
10859
10860
10861
10862
10863
10864
10865
10866
10867
10868
10869
10870
10871
10872
10873
10874
10875
10876
10877
10878
10879
10880
10881
10882
10883
10884
10885
10886
10887
10888
10889
10890
10891
10892
10893
10894
10895
10896
10897
10898
10899
10900
10901
10902
10903
10904
10905
10906
10907
10908
10909
10910
10911
10912
10913
10914
10915
10916
10917
10918
10919
10920
10921
10922
10923
10924
10925
10926
10927
10928
10929
10930
10931
10932
10933
10934
10935
10936
10937
10938
10939
10940
10941
10942
10943
10944
10945
10946
10947
10948
10949
10950
10951
10952
10953
10954
10955
10956
10957
10958
10959
10960
10961
10962
10963
10964
10965
10966
10967
10968
10969
10970
10971
10972
10973
10974
10975
10976
10977
10978
10979
10980
10981
10982
10983
10984
10985
10986
10987
10988
10989
10990
10991
10992
10993
10994
10995
10996
10997
10998
10999
11000
11001
11002
11003
11004
11005
11006
11007
11008
11009
11010
11011
11012
11013
11014
11015
11016
11017
11018
11019
11020
11021
11022
11023
11024
11025
11026
11027
11028
11029
11030
11031
11032
11033
11034
11035
11036
11037
11038
11039
11040
11041
11042
11043
11044
11045
11046
11047
11048
11049
11050
11051
11052
11053
11054
11055
11056
11057
11058
11059
11060
11061
11062
11063
11064
11065
11066
11067
11068
11069
11070
11071
11072
11073
11074
11075
11076
11077
11078
11079
11080
11081
11082
11083
11084
11085
11086
11087
11088
11089
11090
11091
11092
11093
11094
11095
11096
11097
11098
11099
11100
11101
11102
11103
11104
11105
11106
11107
11108
11109
11110
11111
11112
11113
11114
11115
11116
11117
11118
11119
11120
11121
11122
11123
11124
11125
11126
11127
11128
11129
11130
11131
11132
11133
11134
11135
11136
11137
11138
11139
11140
11141
11142
11143
11144
11145
11146
11147
11148
11149
11150
11151
11152
11153
11154
11155
11156
11157
11158
11159
11160
11161
11162
11163
11164
11165
11166
11167
11168
11169
11170
11171
11172
11173
11174
11175
11176
11177
11178
11179
11180
11181
11182
11183
11184
11185
11186
11187
11188
11189
11190
11191
11192
11193
11194
11195
11196
11197
11198
11199
11200
11201
11202
11203
11204
11205
11206
11207
11208
11209
11210
11211
11212
11213
11214
11215
11216
11217
11218
11219
11220
11221
11222
11223
11224
11225
11226
11227
11228
11229
11230
11231
11232
11233
11234
11235
11236
11237
11238
11239
11240
11241
11242
11243
11244
11245
11246
11247
11248
11249
11250
11251
11252
11253
11254
11255
11256
11257
11258
11259
11260
11261
11262
11263
11264
11265
11266
11267
11268
11269
11270
11271
11272
11273
11274
11275
11276
11277
11278
11279
11280
11281
11282
11283
11284
11285
11286
11287
11288
11289
11290
11291
11292
11293
11294
11295
11296
11297
11298
11299
11300
11301
11302
11303
11304
11305
11306
11307
11308
11309
11310
11311
11312
11313
11314
11315
11316
11317
11318
11319
11320
11321
11322
11323
11324
11325
11326
11327
11328
11329
11330
11331
11332
11333
11334
11335
11336
11337
11338
11339
11340
11341
11342
11343
11344
11345
11346
11347
11348
11349
11350
11351
11352
11353
11354
11355
11356
11357
11358
11359
11360
11361
11362
11363
11364
11365
11366
11367
11368
11369
11370
11371
11372
11373
11374
11375
11376
11377
11378
11379
11380
11381
11382
11383
11384
11385
11386
11387
11388
11389
11390
11391
11392
11393
11394
11395
11396
11397
11398
11399
11400
11401
11402
11403
11404
11405
11406
11407
11408
11409
11410
11411
11412
11413
11414
11415
11416
11417
11418
11419
11420
11421
11422
11423
11424
11425
11426
11427
11428
11429
11430
11431
11432
11433
11434
11435
11436
11437
11438
11439
11440
11441
11442
11443
11444
11445
11446
11447
11448
11449
11450
11451
11452
11453
11454
11455
11456
11457
11458
11459
11460
11461
11462
11463
11464
11465
11466
11467
11468
11469
11470
11471
11472
11473
11474
11475
11476
11477
11478
11479
11480
11481
11482
11483
11484
11485
11486
11487
11488
11489
11490
11491
11492
11493
11494
11495
11496
11497
11498
11499
11500
11501
11502
11503
11504
11505
11506
11507
11508
11509
11510
11511
11512
11513
11514
11515
11516
11517
11518
11519
11520
11521
11522
11523
11524
11525
11526
11527
11528
11529
11530
11531
11532
11533
11534
11535
11536
11537
11538
11539
11540
11541
11542
11543
11544
11545
11546
11547
11548
11549
11550
11551
11552
11553
11554
11555
11556
11557
11558
11559
11560
11561
11562
11563
11564
11565
11566
11567
11568
11569
11570
11571
11572
11573
11574
11575
11576
11577
11578
11579
11580
11581
11582
11583
11584
11585
11586
11587
11588
11589
11590
11591
11592
11593
11594
11595
11596
11597
11598
11599
11600
11601
11602
11603
11604
11605
11606
11607
11608
11609
11610
11611
11612
11613
11614
11615
11616
11617
11618
11619
11620
11621
11622
11623
11624
11625
11626
11627
11628
11629
11630
11631
11632
11633
11634
11635
11636
11637
11638
11639
11640
11641
11642
11643
11644
11645
11646
11647
11648
11649
11650
11651
11652
11653
11654
11655
11656
11657
11658
11659
11660
11661
11662
11663
11664
11665
11666
11667
11668
11669
11670
11671
11672
11673
11674
11675
11676
11677
11678
11679
11680
11681
11682
11683
11684
11685
11686
11687
11688
11689
11690
11691
11692
11693
11694
11695
11696
11697
11698
11699
11700
11701
11702
11703
11704
11705
11706
11707
11708
11709
11710
11711
11712
11713
11714
11715
11716
11717
11718
11719
11720
11721
11722
11723
11724
11725
11726
11727
11728
11729
11730
11731
11732
11733
11734
11735
11736
11737
11738
11739
11740
11741
11742
11743
11744
11745
11746
11747
11748
11749
11750
11751
11752
11753
11754
11755
11756
11757
11758
11759
11760
11761
11762
11763
11764
11765
11766
11767
11768
11769
11770
11771
11772
11773
11774
11775
11776
11777
11778
11779
11780
11781
11782
11783
11784
11785
11786
11787
11788
11789
11790
11791
11792
11793
11794
11795
11796
11797
11798
11799
11800
11801
11802
11803
11804
11805
11806
11807
11808
11809
11810
11811
11812
11813
11814
11815
11816
11817
11818
11819
11820
11821
11822
11823
11824
11825
11826
11827
11828
11829
11830
11831
11832
11833
11834
11835
11836
11837
11838
11839
11840
11841
11842
11843
11844
11845
11846
11847
11848
11849
11850
11851
11852
11853
11854
11855
11856
11857
11858
11859
11860
11861
11862
11863
11864
11865
11866
11867
11868
11869
11870
11871
11872
11873
11874
11875
11876
11877
11878
11879
11880
11881
11882
11883
11884
11885
11886
11887
11888
11889
11890
11891
11892
11893
11894
11895
11896
11897
11898
11899
11900
11901
11902
11903
11904
11905
11906
11907
11908
11909
11910
11911
11912
11913
11914
11915
11916
11917
11918
11919
11920
11921
11922
11923
11924
11925
11926
11927
11928
11929
11930
11931
11932
11933
11934
11935
11936
11937
11938
11939
11940
11941
11942
11943
11944
11945
11946
11947
11948
11949
11950
11951
11952
11953
11954
11955
11956
11957
11958
11959
11960
11961
11962
11963
11964
11965
11966
11967
11968
11969
11970
11971
11972
11973
11974
11975
11976
11977
11978
11979
11980
11981
11982
11983
11984
11985
11986
11987
11988
11989
11990
11991
11992
11993
11994
11995
11996
11997
11998
11999
12000
12001
12002
12003
12004
12005
12006
12007
12008
12009
12010
12011
12012
12013
12014
12015
12016
12017
12018
12019
12020
12021
12022
12023
12024
12025
12026
12027
12028
12029
12030
12031
12032
12033
12034
12035
12036
12037
12038
12039
12040
12041
12042
12043
12044
12045
12046
12047
12048
12049
12050
12051
12052
12053
12054
12055
12056
12057
12058
12059
12060
12061
12062
12063
12064
12065
12066
12067
12068
12069
12070
12071
12072
12073
12074
12075
12076
12077
12078
12079
12080
12081
12082
12083
12084
12085
12086
12087
12088
12089
12090
12091
12092
12093
12094
12095
12096
12097
12098
12099
12100
12101
12102
12103
12104
12105
12106
12107
12108
12109
12110
12111
12112
12113
12114
12115
12116
12117
12118
12119
12120
12121
12122
12123
12124
12125
12126
12127
12128
12129
12130
12131
12132
12133
12134
12135
12136
12137
12138
12139
12140
12141
12142
12143
12144
12145
12146
12147
12148
12149
12150
12151
12152
12153
12154
12155
12156
12157
12158
12159
12160
12161
12162
12163
12164
12165
12166
12167
12168
12169
12170
12171
12172
12173
12174
12175
12176
12177
12178
12179
12180
12181
12182
12183
12184
12185
Lucene Change Log
For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions
======================= Lucene 8.3.1 =======================
Bug Fixes
---------------------
* LUCENE-9050: MultiTermIntervalsSource.visit() was not calling back to its
visitor. (Alan Woodward)
======================= Lucene 8.3.0 =======================
API Changes
* LUCENE-8909: IndexWriter#getFieldNames() returns the fields present in the index. After LUCENE-8316, this
method is no longer required, so IndexWriter#getFieldNames() is now deprecated. (Adrien Grand, Munendra S N)
* LUCENE-8755: SpatialPrefixTreeFactory now consumes the "version" parsed with Lucene's Version class. The quad
and packed quad prefix trees are sensitive to this. It's recommended to pass the version, just as you
should for analysis components for tokenized text, or else changes to the encoding in future versions
may be incompatible with older indexes. (Chongchen Chen, David Smiley)
* LUCENE-8956: QueryRescorer now only sorts the first topN hits instead of all
initial hits. (Paul Sanwald via Adrien Grand)
* LUCENE-8921: IndexSearcher.termStatistics() no longer takes a TermStates; it takes the docFreq and totalTermFreq.
Do not call it if docFreq <= 0. The previous implementation survives as deprecated and final, and is removed in 9.0.
(Bruno Roustant, David Smiley, Alan Woodward)
* LUCENE-8990: PointValues#estimateDocCount(visitor) estimates the number of documents that would be matched by
the given IntersectVisitor. The method is used to compute the cost() of ScorerSuppliers instead of
PointValues#estimatePointCount(visitor). (Ignacio Vera, Adrien Grand)
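To illustrate the idea behind the LUCENE-8956 entry above, here is a minimal sketch of sorting only the best topN entries of a larger hit list with a bounded min-heap. It uses only the JDK. The class and method names (`TopNSketch`, `topN`) are hypothetical, and this is one possible approach, not QueryRescorer's actual code.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

final class TopNSketch {
    /** Returns the topN largest scores in descending order without sorting the whole array. */
    static float[] topN(float[] scores, int topN) {
        int n = Math.min(topN, scores.length);
        if (n <= 0) {
            return new float[0];
        }
        // Min-heap holding the best n scores seen so far; its head is the weakest of them.
        PriorityQueue<Float> heap = new PriorityQueue<>(n);
        for (float score : scores) {
            if (heap.size() < n) {
                heap.add(score);
            } else if (score > heap.peek()) {
                heap.poll();
                heap.add(score);
            }
        }
        // Drain the heap from weakest to strongest, filling the result back to front.
        float[] result = new float[n];
        for (int i = n - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        float[] top = topN(new float[] {0.3f, 2.5f, 1.1f, 0.9f, 4.0f}, 3);
        System.out.println(Arrays.toString(top));
    }
}
```

With N hits and a small topN, this does roughly N log(topN) comparisons instead of N log(N) for a full sort.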
New Features
* LUCENE-8936: Add SpanishMinimalStemFilter (vinod kumar via Tomoko Uchida)
* LUCENE-8764 LUCENE-8945: Add "export all terms and doc freqs" feature to Luke with delimiters. (Leonardo Menezes, Amish Shah via Tomoko Uchida)
* LUCENE-8747: Composite Matches from multiple subqueries now allow access to
their submatches, and a new NamedMatches API allows marking of subqueries
and a simple way to find which subqueries have matched on a given document
(Alan Woodward, Jim Ferenczi)
* LUCENE-8769: Introduce Range Query For Multiple Connected Ranges (Atri Sharma)
* LUCENE-8960: Introduce LatLonDocValuesPointInPolygonQuery for LatLonDocValuesField (Ignacio Vera)
* LUCENE-8753: New UniformSplitPostingsFormat (name "UniformSplit") primarily benefiting in simplicity and
extensibility. New STUniformSplitPostingsFormat (name "SharedTermsUniformSplit") that shares a single internal
term dictionary across fields. (Bruno Roustant, Juan Rodriguez, David Smiley)
Improvements
* LUCENE-8874: Show SPI names instead of class names in Luke Analysis tab. (Tomoko Uchida)
* LUCENE-8894: Add APIs to find SPI names for Tokenizer/CharFilter/TokenFilter factory classes. (Tomoko Uchida)
* LUCENE-8914: Move the logic for discarding inner nodes in FloatPointNearestNeighbor to the IntersectVisitor
so we take advantage of the change introduced in LUCENE-7862. (Ignacio Vera)
* LUCENE-8955: Move the logic for discarding inner nodes in LatLonPoint NearestNeighbor to the IntersectVisitor
so we take advantage of the change introduced in LUCENE-7862. (Ignacio Vera)
* LUCENE-8918: PhraseQuery throws exceptions at construction time if it is passed
null arguments. (Alan Woodward)
* LUCENE-8916: GraphTokenStreamFiniteStrings preserves all Token attributes
through its finite strings TokenStreams (Alan Woodward)
* LUCENE-8933: Check kuromoji user dictionary beforehand to avoid unexpected runtime exceptions. (Tomoko Uchida)
* LUCENE-8906: Expose Lucene50PostingsFormat.IntBlockTermState as public so that other postings formats can re-use it.
(Bruno Roustant)
* LUCENE-8942: Remove redundant parameters and improve visibility strictness in
LRUQueryCache (Atri Sharma)
* SOLR-13663: Introduce <SpanPositionRange> into XML Query Parser (Alessandro Benedetti via Mikhail Khludnev)
* LUCENE-8952: Use a sort key instead of true distance in NearestNeighbor (Julie Tibshirani).
* LUCENE-8620: Tessellator labels each edge of the generated triangles according to whether it belongs to
the original polygon. This information is added to the triangle encoding. (Ignacio Vera)
* LUCENE-8964: Fix geojson shape parsing on string arrays in properties
(Alexander Reelsen)
* LUCENE-8976: Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor. (Ignacio Vera)
* LUCENE-8966: The Korean analyzer now splits tokens on boundaries between digits and alphabetic characters. (Jim Ferenczi)
* LUCENE-8984: Fix MoreLikeThis (MLT) bias for uncommon fields (Andy Hind via Anshum Gupta)
Optimizations
* LUCENE-8922: DisjunctionMaxQuery more efficiently leverages impacts to skip
non-competitive hits. (Adrien Grand)
* LUCENE-8935: BooleanQuery with no scoring clause can now terminate the query early when
the total hit count is not requested.
* LUCENE-8941: Matches on wildcard queries will defer building their full
disjunction until a MatchesIterator is pulled (Alan Woodward)
* LUCENE-8755: spatial-extras quad and packed quad prefix trees now index points faster.
(Chongchen Chen, David Smiley)
* LUCENE-8860: add additional leaf node level optimizations in LatLonShapeBoundingBoxQuery.
(Igor Motov via Ignacio Vera)
* LUCENE-8968: Improve performance of WITHIN and DISJOINT queries for Shape queries by
doing just one pass whenever possible. (Ignacio Vera)
* LUCENE-8939: Introduce shared count based early termination across multiple slices
(Atri Sharma)
* LUCENE-8980: Blocktree's seekExact now short-circuits to false if the term isn't in the min-max range of the segment.
Large performance gain for ID-like or time-like data when populated sequentially. (Guoqiang Jiang)
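The min-max short-circuit described in LUCENE-8980 above can be sketched roughly as follows. This is a simplified stand-in, assuming a sorted in-memory array of Strings in place of a segment's term dictionary. `MinMaxSeekSketch` and its `fullLookups` counter are hypothetical names for illustration and are not part of Lucene.

```java
import java.util.Arrays;

final class MinMaxSeekSketch {
    private final String[] sortedTerms; // stand-in for one segment's sorted terms
    int fullLookups = 0;                // lookups that could not be short-circuited

    MinMaxSeekSketch(String... terms) {
        sortedTerms = terms.clone();
        Arrays.sort(sortedTerms);
    }

    /** Returns true if the term exists, checking the segment's min-max range first. */
    boolean seekExact(String term) {
        if (sortedTerms.length == 0) {
            return false;
        }
        String min = sortedTerms[0];
        String max = sortedTerms[sortedTerms.length - 1];
        // Cheap range check: a term outside [min, max] cannot be in this segment.
        if (term.compareTo(min) < 0 || term.compareTo(max) > 0) {
            return false;
        }
        // Only terms inside the range pay for the real lookup.
        fullLookups++;
        return Arrays.binarySearch(sortedTerms, term) >= 0;
    }

    public static void main(String[] args) {
        MinMaxSeekSketch segment = new MinMaxSeekSketch("id0100", "id0101", "id0150");
        System.out.println(segment.seekExact("id0099")); // below the segment's minimum
        System.out.println(segment.seekExact("id0101")); // inside the range and present
    }
}
```

When IDs or timestamps are written sequentially, each segment covers a narrow range, so most lookups for a given term are rejected by the range check in all but one segment.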
Bug Fixes
* LUCENE-8755: spatial-extras quad and packed quad prefix trees could throw a
NullPointerException for certain cell edge coordinates (Chongchen Chen, David Smiley)
* LUCENE-9005: BooleanQuery.visit() would pull subVisitors from its parent visitor, rather
than from a visitor for its own specific query. This could cause problems when BQ was
nested under another BQ. Instead, we now pull a MUST subvisitor, pass it to any MUST
subclauses, and then pull SHOULD, MUST_NOT and FILTER visitors from it rather than from
the parent. (Alan Woodward)
Other
* LUCENE-8778 LUCENE-8911 LUCENE-8957: Define analyzer SPI names as static final fields and document the names in Javadocs.
(Tomoko Uchida, Uwe Schindler)
* LUCENE-8758: QuadPrefixTree: removed levelS and levelN fields which weren't used. (Amish Shah)
* LUCENE-8975: Code Cleanup: Use entryset for map iteration wherever possible.
* LUCENE-8993, LUCENE-8807: Changed all repository and download references in build files
to HTTPS. (Uwe Schindler)
* LUCENE-8998: Fix OverviewImplTest.testIsOptimized reproducible failure. (Tomoko Uchida)
* LUCENE-8999: LuceneTestCase.expectThrows now propagates assert/assumption failures up to the test
without wrapping in a new assertion failure unless the caller has explicitly expected them (hossman)
* LUCENE-8062: GlobalOrdinalsWithScoreQuery is no longer eligible for query caching. (Jim Ferenczi)
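For readers unfamiliar with the cleanup named in LUCENE-8975 above, the sketch below contrasts iterating a map through entrySet() with the older keySet()-plus-get() pattern. It uses only the JDK, and `EntrySetSketch` is a hypothetical class written for illustration, not code from the Lucene repository.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class EntrySetSketch {
    /** Preferred: entrySet() yields key and value together in one pass. */
    static long sumValues(Map<String, Integer> counts) {
        long sum = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    /** Older pattern: iterate the keys, then look each value up again. */
    static long sumValuesViaKeySet(Map<String, Integer> counts) {
        long sum = 0;
        for (String key : counts.keySet()) {
            sum += counts.get(key); // extra lookup per key
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("title", 1);
        counts.put("body", 2);
        counts.put("id", 3);
        System.out.println(sumValues(counts) == sumValuesViaKeySet(counts));
    }
}
```

Both methods return the same result; the entrySet() form simply avoids one lookup per key.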
======================= Lucene 8.2.0 =======================
API Changes
* LUCENE-8865: IndexSearcher now uses Executor instead of ExecutorService.
This change is fully backwards compatible since ExecutorService directly
extends Executor. (Simon Willnauer)
* LUCENE-8856: Intervals queries have moved from the sandbox to the queries
module. (Alan Woodward)
* LUCENE-8893: Intervals.wildcard() and Intervals.prefix() methods now take
BytesRef rather than String. (Alan Woodward)
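The backwards-compatibility claim in LUCENE-8865 above rests on plain JDK typing: any ExecutorService can be passed where an Executor is expected. The sketch below shows this with a hypothetical `runAll` helper standing in for an API that accepts an Executor; it does not call IndexSearcher itself.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ExecutorCompatSketch {
    /** Hypothetical API that only needs the narrower Executor interface. */
    static void runAll(Executor executor, Runnable... tasks) {
        for (Runnable task : tasks) {
            executor.execute(task);
        }
    }

    /** Runs three tasks through two kinds of executor and returns how many ran. */
    static int demo() {
        AtomicInteger counter = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Existing callers holding an ExecutorService need no changes.
            runAll(pool, counter::incrementAndGet, counter::incrementAndGet);
            // A plain Executor now works too, e.g. one running on the calling thread.
            runAll(Runnable::run, counter::incrementAndGet);
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // no-op if the pool has already terminated
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```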
New Features
* LUCENE-8632: New XYShape Field and Queries for indexing and searching general cartesian
geometries. (Nick Knize)
* LUCENE-8891: Snowball stemmer/analyzer for the Estonian language.
(Gert Morten Paimla via Tomoko Uchida)
* LUCENE-8815: Provide a DoubleValues implementation for retrieving the value of features without
requiring a separate numeric field. Note that as feature values are stored with only 8 bits of
mantissa the values returned may have a delta from the original values indexed.
(Colin Goodheart-Smithe via Adrien Grand)
* LUCENE-8803: Provide a FeatureSortfield to allow sorting search hits by descending value of a
feature. This is exposed via the factory method FeatureField#newFeatureSort.
(Colin Goodheart-Smithe via Adrien Grand)
* LUCENE-8784: The KoreanTokenizer now preserves punctuation if discardPunctuation is set
to false (defaults to true).
(Namgyu Kim via Jim Ferenczi)
* LUCENE-8812: Add new KoreanNumberFilter that can convert Hangul characters to numbers
and process decimal points. It is similar to the JapaneseNumberFilter.
(Namgyu Kim)
* LUCENE-8362: Add doc-value support to range fields. (Atri Sharma via Adrien Grand)
* LUCENE-8766: Add monitor subproject (previously Luwak monitoring library). This
allows a stream of documents to be matched against a set of registered queries
in an efficient manner, for use as a monitoring or classification tool.
(Alan Woodward)
* LUCENE-7714: Add a numeric range query in sandbox that takes advantage of index sorting.
(Julie Tibshirani via Jim Ferenczi)
* LUCENE-8859: The completion suggester's postings format now has an option to
load its internal FST off-heap. (Jim Ferenczi)
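To see why LUCENE-8815 warns about a delta between indexed and returned feature values, the sketch below truncates a float to its sign, exponent and top 8 mantissa bits, then decodes it again. The 15-bit shift is an assumption about how such a 16-bit encoding can be done; it is not quoted from FeatureField, and the class name is hypothetical.

```java
final class TruncatedFloat {
  // Keep the sign (1 bit), the exponent (8 bits) and the top 8 of the
  // 23 mantissa bits; the low 15 mantissa bits are dropped.
  static int encode(float value) {
    return Float.floatToIntBits(value) >>> 15;
  }

  // Restore a float whose dropped mantissa bits are all zero.
  static float decode(int encoded) {
    return Float.intBitsToFloat(encoded << 15);
  }

  public static void main(String[] args) {
    float original = 3.14159f;
    float roundTripped = decode(encode(original));
    System.out.println(original + " -> " + roundTripped);
  }
}
```

For positive values the decoded number is never larger than the original, and the relative error stays below 2^-8.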
Bug Fixes
* LUCENE-8831: Fixed LatLonShapeBoundingBoxQuery .hashCode methods. (Ignacio Vera)
* LUCENE-8775: Improve tessellator to better handle cases where a hole shares a vertex
with the polygon. (Ignacio Vera)
* LUCENE-8785: Ensure new threadstates are locked before retrieving the number of active threadstates.
Not doing so caused assertion errors and potentially broken field attributes in the IndexWriter when
IndexWriter#deleteAll was called while actively indexing. (Simon Willnauer)
* LUCENE-8804: Forbid calls to putAttribute on frozen FieldType instances.
(Vamshi Vijay Nakkirtha via Adrien Grand)
* LUCENE-8828: Removes the buggy 'disallow overlaps' boolean from Intervals.unordered(),
and replaces it with a new Intervals.unorderedNoOverlaps() method (Alan Woodward)
* LUCENE-8843: Don't ignore exceptions that are thrown when trying to open a
file in IOUtils#fsync. (Jason Tedor via Adrien Grand)
* LUCENE-8835: FileSwitchDirectory now respects the file extension when listing directory
contents to ensure we don't expose pending deletes if both directories point to the same
underlying filesystem directory. (Simon Willnauer)
* LUCENE-8853: FileSwitchDirectory now applies best effort to place tmp files in the same
directory as the target files. (Simon Willnauer)
* LUCENE-8892: Add missing closing parentheses in MultiBoolFunction's description() (Florian Diebold, Munendra S N)
Improvements
* LUCENE-7840: Non-scoring BooleanQuery now removes SHOULD clauses before building the scorer supplier
as opposed to eliminating them during scoring construction. (Atri Sharma via Jim Ferenczi)
* LUCENE-8770: BlockMaxConjunctionScorer now leverages two-phase iterators in order to avoid
executing the second phase when scorers don't intersect. (Adrien Grand, Jim Ferenczi)
* LUCENE-8781: FST lookup performance has been improved in many cases by
encoding Arcs using full-sized arrays with gaps. The new encoding is
enabled for postings in the default codec and for suggesters. (Mike Sokolov)
* LUCENE-8818: Fix smokeTestRelease.py encoding bug (janhoy)
* LUCENE-8845: Allow Intervals.prefix() and Intervals.wildcard() to specify
their maximum allowed expansions (Alan Woodward)
* LUCENE-8875: Introduce a Collector optimized for use cases when a large
number of hits are requested (Atri Sharma)
* LUCENE-8848 LUCENE-7757 LUCENE-8492: The UnifiedHighlighter now detects that parts of the query are not understood by
it, and thus it should not make optimizations that result in no highlights or slow highlighting. This generally works
best for WEIGHT_MATCHES mode. Consequently queries produced by ComplexPhraseQueryParser and the surround QueryParser
will now highlight correctly. (David Smiley)
* LUCENE-8793: Luke enhanced UI for CustomAnalyzer: show detailed analysis steps. (Jun Ohtani via Tomoko Uchida)
* LUCENE-8855: Add Accountable to some Query implementations (ab, Adrien Grand)
Optimizations
* LUCENE-8796: Use exponential search instead of binary search in
IntArrayDocIdSet#advance method (Luca Cavanna via Adrien Grand)
* LUCENE-8865: Use incoming thread for execution if IndexSearcher has an executor.
Now caller threads execute at least one search on an index even if there is
an executor provided to minimize thread context switching. (Simon Willnauer)
* LUCENE-8868: New storing strategy for BKD tree leaves with low cardinality.
It stores the distinct values once with the cardinality value reducing the
storage cost. (Ignacio Vera)
* LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored
on leaves. (Ignacio Vera)
* LUCENE-8896: Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[])
for several queries. (Ignacio Vera)
* LUCENE-8901: Load frequencies lazily only when needed in BlockDocsEnum and
BlockImpactsEverythingEnum (Mayya Sharipova).
* LUCENE-8888: Optimize distribution of points with data dimensions in
BKD tree leaves. (Ignacio Vera)
* LUCENE-8311: Phrase queries now leverage impacts. (Adrien Grand)
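A minimal sketch of the exponential (galloping) search idea named in LUCENE-8796: starting from the current position, double the step until the target is bracketed, then binary-search inside that bracket. The advance helper below is hypothetical and works on a plain sorted int[]; it is not the IntArrayDocIdSet code.

```java
final class ExponentialSearch {
  /**
   * Returns the index of the first element >= target in sorted[from, to),
   * or {@code to} if there is no such element.
   */
  static int advance(int[] sorted, int from, int to, int target) {
    // Gallop: double the offset until it leaves the range or reaches
    // an element that is >= target. A long avoids overflow on doubling.
    long bound = 1;
    while (from + bound < to && sorted[(int) (from + bound)] < target) {
      bound <<= 1;
    }
    // The answer lies between the last probed offset and the current one.
    int lo = (int) (from + bound / 2);
    int hi = (int) Math.min(from + bound, to);
    // Binary search for the first element >= target in [lo, hi).
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  public static void main(String[] args) {
    int[] docs = {1, 3, 5, 7, 9, 11, 13};
    System.out.println(advance(docs, 0, docs.length, 8));
  }
}
```

The cost depends on the distance to the target rather than on the size of the whole array, which suits advance calls that usually move only a short way forward.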
Test Framework
* LUCENE-8825: CheckHits now displays the shard index in case of mismatch
between top hits. (Atri Sharma via Adrien Grand)
Other
* LUCENE-8847: Code Cleanup: Remove StringBuilder.append with concatenated
strings. (Koen De Groote via Uwe Schindler)
* LUCENE-8861: Script to find open GitHub PRs that need attention (janhoy)
* LUCENE-8852: ReleaseWizard tool for release managers (janhoy)
* LUCENE-8838: Remove support for Steiner points on Tessellator. (Ignacio Vera)
* LUCENE-8879: Improve BKDRadixSelector tests. (Ignacio Vera)
* LUCENE-8886: Fix TestMutablePointsReaderUtils tests. (Ignacio Vera)
======================= Lucene 8.1.1 =======================
(No Changes)
======================= Lucene 8.1.0 =======================
API Changes
* LUCENE-3041: A query introspection API has been added. Queries should
implement a visit() method, taking a QueryVisitor, and either pass the
visitor down to any child queries, or call a visitX() or consumeX() method
on it. All locations in the code that called Weight.extractTerms()
have been changed to use this API, and the extractTerms() method has
been deprecated. (Alan Woodward, Simon Willnauer, David Smiley, Luca
Cavanna)
* LUCENE-8735: Directory.getPendingDeletions is now abstract to ensure
subclasses override it. FilterDirectory now delegates the call, ensuring
correct default behaviour for subclasses. (Henning Andersen)
New Features
* LUCENE-2562: The well-known graphical user interface for inspecting Lucene
indexes "Luke" was added as a Lucene module. It can be started from the
binary distribution by calling the shell scripts in the module folder
or from the source checkout by using `ant -f lucene/luke/build.xml run`.
Luke provides a Swing-based user interface and can be used to open
Lucene or Solr (or Elasticsearch) indexes, inspect documents, check index
commits and segments, or test (custom) analyzers. It also has maintenance
functions to check index structures and force merge indexes for archival.
Luke was originally developed by Andrzej Bialecki, later maintained by
Dmitry Kan and finally rewritten by Tomoko Uchida to use the ASF licensing
compatible Swing framework (as shipped with JDKs).
(Tomoko Uchida, Uwe Schindler)
Bug fixes
* LUCENE-8736: LatLonShapePolygonQuery returns incorrect WITHIN results
with shared boundaries. Point in Polygon now correctly includes boundary
points. Box and Polygon relations with triangles have also been improved to
correctly include boundary points. (Nick Knize)
* LUCENE-8712: Polygon2D does not detect crossings through segment edges.
(Ignacio Vera)
* LUCENE-8720: NameIntCacheLRU (in the facets module) had an int
overflow bug that disabled cleaning of the cache (Russell A Brown)
* LUCENE-8726: ValueSource.asDoubleValuesSource() could leak a reference to
IndexSearcher (Alan Woodward, Yury Pakhomov)
* LUCENE-8719: FixedShingleFilter can miss shingles at the end of a token stream if
there are multiple paths with different lengths. (Alan Woodward)
* LUCENE-8688: TieredMergePolicy#findForcedMerges now tries to create the
cheapest merges that allow the index to go down to `maxSegmentCount` segments
or less. (Armin Braun via Adrien Grand)
* LUCENE-8477: Interval disjunctions could miss valid hits if some of the
clauses of the disjunction are minimized away. We now rewrite intervals
if a source contains a disjunction and the internal gaps matter for
matching. This behaviour can be disabled if users are more interested
in speed rather than accuracy of matching. (Alan Woodward, Jim Ferenczi)
* LUCENE-8741: ValueSource.fromDoubleValuesSource() was casting to
Scorer instead of Scorable, leading to ClassCastExceptions (Markus Jelsma,
Alan Woodward)
* LUCENE-8754: Fix ConcurrentModificationException in SegmentInfo if
attributes are accessed in MergePolicy while the merge is running (Simon Willnauer)
* LUCENE-8765: Fixed validation of the number of added points in KD trees.
(Zhao Yang via Adrien Grand)
Improvements
* LUCENE-8673: Use radix partitioning when merging dimensional points instead
of sorting all dimensions beforehand. (Ignacio Vera, Adrien Grand)
* LUCENE-8687: Optimise radix partitioning for points on heap. (Ignacio Vera)
* LUCENE-8699: Change HeapPointWriter to use a single byte array instead of a list
of byte arrays. In addition a new interface PointValue is added to abstract out
the different formats between offline and on-heap writers. (Ignacio Vera)
* LUCENE-8703: Build point writers in the BKD tree only when they are needed.
(Ignacio Vera)
* LUCENE-8652: SynonymQuery can now deboost the document frequency of each term when
blending the score of the synonym. (Jim Ferenczi)
* LUCENE-8631: The Korean user dictionary now picks the longest-matching word and discards
the other matches. (Yeongsu Kim via Jim Ferenczi)
* LUCENE-8732: ConstantScoreQuery can now early terminate the query if the minimum score is
greater than the constant score and total hits are not requested. (Jim Ferenczi)
* LUCENE-8750: Implements setMissingValue() on sort fields produced from
DoubleValuesSource and LongValuesSource (Mike Sokolov via Alan Woodward)
* LUCENE-8701: ToParentBlockJoinQuery now creates a child scorer that disallows skipping over
non-competitive documents if the score of a parent depends on the score of multiple
children (avg, max, min). Additionally the score mode `none` that assigns a constant score to
each parent can terminate the collection of top scores early. (Jim Ferenczi)
* LUCENE-8751: Weight#matches now uses the ScorerSupplier to build scorers with a lead cost of 1
(single document). (Jim Ferenczi)
* LUCENE-8752: Japanese new era name '令和' (Reiwa) is added to the dictionary used in
JapaneseTokenizer so that the analyzer handles the era name correctly.
Reiwa is set to replace the Heisei Era on May 1, 2019. (Tomoko Uchida)
* LUCENE-8671: Introduced reader attributes, which allow per-IndexReader configuration
of codec internals. This makes it possible to configure per reader whether FSTs are on- or off-heap on a
per-field basis (Simon Willnauer)
* LUCENE-8787: spatial-extras DateRangePrefixTree used to only parse ISO-8601 timestamps with 0 or 3
digits of milliseconds precision but now parses other lengths (although > 3 not used).
(Thomas Lemmé via David Smiley)
Changes in Runtime Behavior
* LUCENE-8671: Load FST off-heap also for ID-like fields if reader is not opened
from an IndexWriter. (Simon Willnauer)
* LUCENE-8730: WordDelimiterGraphFilter always emits its original token first. This
brings its behaviour into line with the deprecated WordDelimiterFilter, so that
the only difference in output between the two is in the position length
attribute. (Alan Woodward, Jim Ferenczi)
* LUCENE-7386: Disjunctions nested in disjunctions are now flattened. This might
trigger changes in the produced scores due to changes to the order in which
scores of sub clauses are summed up. (Adrien Grand)
* LUCENE-8756: MoreLikeThisQuery now respects custom term frequencies
(TermFrequencyAttribute) at search time (Olli Kuonanoja)
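The score changes mentioned under LUCENE-7386 follow from floating-point addition not being associative. The sketch below uses deliberately extreme, made-up clause scores so that the effect is visible; real differences are normally far smaller, and the summation orders shown are illustrative rather than taken from the scorer code.

```java
final class SummationOrder {
  // Nested disjunction, a OR (b OR c): the inner clause is summed first.
  static float nested(float a, float b, float c) {
    return a + (b + c);
  }

  // Flattened disjunction, a OR b OR c: clauses are summed left to right.
  static float flattened(float a, float b, float c) {
    return (a + b) + c;
  }

  public static void main(String[] args) {
    float a = 1e8f, b = 3f, c = 3f; // hypothetical clause scores
    System.out.println("nested    = " + nested(a, b, c));
    System.out.println("flattened = " + flattened(a, b, c));
    System.out.println("equal     = " + (nested(a, b, c) == flattened(a, b, c)));
  }
}
```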
Other
* LUCENE-8680: Refactor EdgeTree#relateTriangle method. (Ignacio Vera)
* LUCENE-8685: Refactor LatLonShape tests. (Ignacio Vera)
* LUCENE-8713: Add Line2D tests. (Ignacio Vera)
* LUCENE-8729: Workaround: Disable accessibility doclints (Java 13+),
so compilation with recent JDK succeeds. (Uwe Schindler)
* LUCENE-8725: Make TermsQuery.SeekingTermSetTermsEnum a top level class and public (noble)
======================= Lucene 8.0.0 =======================
API Changes
* LUCENE-8662: Change TermsEnum.seekExact(BytesRef) to abstract and delegate seekExact(BytesRef)
in FilterLeafReader.FilterTermsEnum. (Jeffery Yuan via Tomás Fernández Löbbe, Simon Willnauer)
* LUCENE-8469: Deprecated StringHelper.compare has been removed. (Dawid Weiss)
* LUCENE-8039: Introduce a "delta distance" method set to GeoDistance. This
allows distance calculations, especially for paths, to take into account an
"excursion" to include the specified point.
* LUCENE-8007: Index statistics Terms.getSumDocFreq(), Terms.getDocCount() are
now required to be stored by codecs. Additionally, TermsEnum.totalTermFreq()
and Terms.getSumTotalTermFreq() are now required: if frequencies are not
stored they are equal to TermsEnum.docFreq() and Terms.getSumDocFreq(),
respectively, because all freq() values equal 1. (Adrien Grand, Robert Muir)
* LUCENE-8038: Deprecated PayloadScoreQuery constructors have been removed (Alan
Woodward)
* LUCENE-8014: Similarity.computeSlopFactor() and
Similarity.computePayloadFactor() have been removed (Alan Woodward)
* LUCENE-7996: Queries are now required to produce positive scores.
(Adrien Grand)
* LUCENE-8099: CustomScoreQuery, BoostedQuery and BoostingQuery have been
removed (Alan Woodward)
* LUCENE-8012: Explanation now takes Number rather than float (Alan Woodward,
Robert Muir)
* LUCENE-8116: SimScorer now only takes a frequency and a norm as per-document
scoring factors. (Adrien Grand)
* LUCENE-8113: TermContext has been renamed to TermStates, and can now be
constructed lazily if term statistics are not required (Alan Woodward)
* LUCENE-8242: Deprecated method IndexSearcher#createNormalizedWeight() has
been removed (Alan Woodward)
* LUCENE-8267: Memory codecs removed from the codebase (MemoryPostings,
MemoryDocValues). (Dawid Weiss)
* LUCENE-8144: Moved QueryCachingPolicy.ALWAYS_CACHE to the test framework.
(Nhat Nguyen via Adrien Grand)
* LUCENE-8356: StandardFilter and StandardFilterFactory have been removed
(Alan Woodward)
* LUCENE-8373: StandardAnalyzer.ENGLISH_STOP_WORD_SET has been removed
(Alan Woodward)
* LUCENE-8388: Unused PostingsEnum#attributes() method has been removed
(Alan Woodward)
* LUCENE-8405: TopDocs.maxScore is removed. IndexSearcher and TopFieldCollector
no longer have an option to compute the maximum score when sorting by field.
(Adrien Grand)
* LUCENE-8411: TopFieldCollector no longer takes a fillFields option, it now
always fills fields. (Adrien Grand)
* LUCENE-8412: TopFieldCollector no longer takes a trackDocScores option. Scores
need to be set on top hits via TopFieldCollector#populateScores instead.
(Adrien Grand)
* LUCENE-6228: A new Scorable abstract class has been added, containing only those
methods from Scorer that should be called from Collectors. LeafCollector.setScorer()
now takes a Scorable rather than a Scorer. (Alan Woodward, Adrien Grand)
* LUCENE-8475: Deprecated constants have been removed from RamUsageEstimator.
(Dimitrios Athanasiou)
* LUCENE-8483: Scorers may no longer take null as a Weight (Alan Woodward)
* LUCENE-8352: TokenStreamComponents is now final, and can take a Consumer<Reader>
in its constructor (Mark Harwood, Alan Woodward, Adrien Grand)
* LUCENE-8498: LowerCaseTokenizer has been removed, and CharTokenizer no longer
takes a normalizer function. (Alan Woodward)
* LUCENE-7875: Moved MultiFields static methods out of the class. getLiveDocs is now
in MultiBits which is now public. getMergedFieldInfos and getIndexedFields are now in
FieldInfos. getTerms is now in MultiTerms. getTermPositionsEnum and getTermDocsEnum
were collapsed and renamed to just getTermPostingsEnum and moved to MultiTerms.
(David Smiley)
* LUCENE-8513: MultiFields.getFields is now removed. Please avoid this class,
and Fields in general, when possible. (David Smiley)
* LUCENE-8497: MultiTermAwareComponent has been removed, and in its place
TokenFilterFactory and CharFilterFactory now expose type-safe normalize()
methods. This decouples normalization from tokenization entirely.
(Mayya Sharipova, Alan Woodward)
* LUCENE-8597: IntervalIterator now exposes a gaps() method that reports the
number of gaps between its component sub-intervals. This can be used in a
new filter available via Intervals.maxgaps(). (Alan Woodward)
* LUCENE-8609: Remove IndexWriter#numDocs() and IndexWriter#maxDoc() in favor
of IndexWriter#getDocStats(). (Simon Willnauer)
* LUCENE-8292: Make TermsEnum fully abstract. (Simon Willnauer)
Changes in Runtime Behavior
* LUCENE-8333: Switch MoreLikeThis.setMaxDocFreqPct to use maxDoc instead of
numDocs. (Robert Muir, Dawid Weiss).
* LUCENE-7837: Indices that were created before the previous major version
will now fail to open even if they have been merged with the previous major
version. (Adrien Grand)
* LUCENE-8020: Similarities are no longer passed terms that don't exist by
queries such as SpanOrQuery, so scoring formulas no longer require
divide-by-zero hacks. IndexSearcher.termStatistics/collectionStatistics return null
instead of returning bogus values for a non-existent term or field. (Robert Muir)
* LUCENE-7996: FunctionQuery and FunctionScoreQuery now return a score of 0
when the function produces a negative value. (Adrien Grand)
* LUCENE-8116: Similarities now score fields that omit norms as if the norm was
1. This might change score values on fields that omit norms. (Adrien Grand)
* LUCENE-8134: Index options are no longer automatically downgraded.
(Adrien Grand)
* LUCENE-8031: Length normalization correctly reflects omission of term frequencies.
(Robert Muir, Adrien Grand)
* LUCENE-7444: StandardAnalyzer no longer defaults to removing English stopwords
(Alan Woodward)
* LUCENE-8060: IndexSearcher's search and searchAfter methods now only compute
total hit counts accurately up to 1,000 in order to enable top-hits
optimizations such as block-max WAND (LUCENE-8135). (Adrien Grand)
* LUCENE-8505: IndexWriter#addIndices will now fail if the target index is sorted but
the candidate is not. (Jim Ferenczi)
* LUCENE-8535: Highlighter and FVH don't support ToParent and ToChildBlockJoinQuery out of the
box anymore. In order to highlight on Block-Join Queries a custom WeightedSpanTermExtractor / FieldQuery
should be used. (Simon Willnauer, Jim Ferenczi, Julie Tibshirani)
* LUCENE-8563: BM25 scores don't include the (k1+1) factor in their numerator
anymore. This doesn't affect ordering as this is a constant factor which is
the same for every document. (Luca Cavanna via Adrien Grand)
* LUCENE-8509: WordDelimiterGraphFilter will no longer set the offsets of internal
tokens by default, preventing a number of bugs when the filter is chained with
tokenfilters that change the length of their tokens (Alan Woodward)
* LUCENE-8633: IntervalQuery scores do not use term weighting any more, the score
is instead calculated as a function of the sloppy frequency of the matching
intervals. (Alan Woodward, Jim Ferenczi)
* LUCENE-8635: FSTs can now remain off-heap, accessed via
IndexInput, and the default codec's term dictionary
(BlockTreeTermsReader) will now leave the FST for the terms index
off-heap for non-primary-key fields using MMapDirectory, reducing
heap usage for such fields. (Ankit Jain)
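Why dropping the (k1+1) factor in LUCENE-8563 leaves ranking unchanged: the factor multiplies every document's score by the same constant. The sketch below uses the textbook BM25 term-frequency saturation formula with hypothetical parameter values (k1 = 1.2, b = 0.75); it is not Lucene's BM25Similarity code.

```java
final class Bm25Factor {
  static final double K1 = 1.2;
  static final double B = 0.75;

  // Length normalisation shared by both variants.
  private static double norm(double docLen, double avgLen) {
    return K1 * (1 - B + B * docLen / avgLen);
  }

  // Term-frequency component with the (k1 + 1) factor in the numerator.
  static double withFactor(double freq, double docLen, double avgLen) {
    return freq * (K1 + 1) / (freq + norm(docLen, avgLen));
  }

  // Same component without the constant factor.
  static double withoutFactor(double freq, double docLen, double avgLen) {
    return freq / (freq + norm(docLen, avgLen));
  }

  public static void main(String[] args) {
    double avgLen = 100;
    // Two hypothetical documents: (freq, docLen) = (3, 80) and (5, 400).
    double ratio1 = withFactor(3, 80, avgLen) / withoutFactor(3, 80, avgLen);
    double ratio2 = withFactor(5, 400, avgLen) / withoutFactor(5, 400, avgLen);
    // Both ratios equal k1 + 1 up to rounding, so relative order is preserved.
    System.out.println(ratio1 + " " + ratio2);
  }
}
```

Absolute score values do change, so anything that compares scores against a fixed threshold may need adjusting.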
New Features
* LUCENE-8340: LongPoint#newDistanceFeatureQuery may be used to boost scores based on
how close a value of a long field is to a configurable origin. This is
typically useful to boost by recency. (Adrien Grand)
* LUCENE-8482: LatLonPoint#newDistanceFeatureQuery may be used to boost scores
based on the haversine distance of a LatLonPoint field to a provided point. This is
typically useful to boost by distance. (Ignacio Vera)
* LUCENE-8216: Added a new BM25FQuery in sandbox to blend statistics across several fields
using the BM25F formula. (Adrien Grand, Jim Ferenczi)
* LUCENE-8564: GraphTokenFilter is an abstract class useful for token filters that need
to read-ahead in the token stream and take into account graph structures. This
also changes FixedShingleFilter to extend GraphTokenFilter (Alan Woodward)
* LUCENE-8612: Intervals.extend() treats an interval as if it covered a wider
span than it actually does, allowing users to force minimum gaps between
intervals in a phrase. (Alan Woodward)
* LUCENE-8629: New interval functions: Intervals.before(), Intervals.after(),
Intervals.within() and Intervals.overlapping(). (Alan Woodward)
* LUCENE-8622: Adds a minimum-should-match interval function that produces intervals
spanning a subset of a set of sources. (Alan Woodward)
* LUCENE-8645: Intervals.fixField() allows you to report intervals from one field
as if they came from another. (Alan Woodward)
* LUCENE-8646: New interval functions: Intervals.prefix() and Intervals.wildcard()
(Alan Woodward)
* LUCENE-8655: Add a getter in FunctionScoreQuery class in order to access to the
underlying DoubleValuesSource. (Gérald Quaire via Alan Woodward)
* LUCENE-8697: GraphTokenStreamFiniteStrings correctly handles side paths
containing gaps (Alan Woodward)
* LUCENE-8702: Simplify intervals returned from vararg Intervals factory methods
(Alan Woodward)
Improvements
* LUCENE-7997: Add BaseSimilarityTestCase to sanity check similarities.
SimilarityBase switches to 64-bit doubles internally to help avoid common numeric issues.
Add missing range checks for similarity parameters.
Improve BM25 and ClassicSimilarity's explanations. (Robert Muir)
* LUCENE-8011: Improved similarity explanations.
(Mayya Sharipova via Adrien Grand)
* LUCENE-4198: Codecs now have the ability to index score impacts.
(Adrien Grand)
* LUCENE-8135: Boolean queries now implement the block-max WAND algorithm in
order to speed up selection of top scored documents. (Adrien Grand)
* LUCENE-8279: CheckIndex now cross-checks terms with norms. (Adrien Grand)
* LUCENE-8660: TopDocsCollectors now return an accurate count (instead of a lower bound)
if the total hit count is equal to the provided threshold. (Adrien Grand, Jim Ferenczi)
Optimizations
* LUCENE-8040: Optimize IndexSearcher.collectionStatistics, avoiding MultiFields/MultiTerms
(David Smiley, Robert Muir)
* LUCENE-4100: Disjunctions now support faster collection of top hits when the
total hit count is not required. (Stefan Pohl, Adrien Grand, Robert Muir)
* LUCENE-7993: Phrase queries are now faster if total hit counts are not
required. (Adrien Grand)
* LUCENE-8109: Boolean queries propagate information about the minimum
competitive score in order to make collection faster if there are disjunctions
or phrase queries as sub queries, which know how to leverage this information
to run faster. (Adrien Grand)
* LUCENE-8439: Disjunction max queries can skip blocks to select the top documents
if the total hit count is not required. (Jim Ferenczi, Adrien Grand)
* LUCENE-8204: Boolean queries with a mix of required and optional clauses are
now faster if the total hit count is not required. (Jim Ferenczi, Adrien Grand)
* LUCENE-8448: Boolean queries now propagate the minimum score to their sub-scorers.
(Jim Ferenczi, Adrien Grand)
* LUCENE-8511: MultiFields.getIndexedFields is now optimized; does not call getMergedFieldInfos
(David Smiley)
* LUCENE-8507: TopFieldCollector can now update the minimum competitive score if the primary sort
is by relevancy and the total hit count is not required. (Jim Ferenczi)
* LUCENE-8464: ConstantScoreScorer now implements setMinCompetitiveScore in order
to early terminate the iterator if the minimum score is greater than the constant
score. (Christophe Bismuth via Jim Ferenczi)
* LUCENE-8607: MatchAllDocsQuery can shortcut when total hit count is not
required (Alan Woodward, Adrien Grand)
* LUCENE-8585: Index-time jump-tables for DocValues, for O(1) advance when retrieving doc values.
(Toke Eskildsen, Adrien Grand)
======================= Lucene 7.7.2 =======================
Bug fixes
* LUCENE-8726: ValueSource.asDoubleValuesSource() could leak a reference to
IndexSearcher (Alan Woodward, Yury Pakhomov)
* LUCENE-8735: FilterDirectory.getPendingDeletions now forwards to the delegate
even though the method is not abstract in the super class. This prevents issues
with our best effort in carrying on generations in the IndexWriter, since pending
deletions were swallowed by the FilterDirectory. (Henning Andersen, Simon Willnauer)
* LUCENE-8688: TieredMergePolicy#findForcedMerges now tries to create the
cheapest merges that allow the index to go down to `maxSegmentCount` segments
or less. (Armin Braun via Adrien Grand)
* LUCENE-8785: Ensure new threadstates are locked before retrieving the number of active threadstates.
Not doing so caused assertion errors and potentially broken field attributes in the IndexWriter when
IndexWriter#deleteAll was called while actively indexing. (Simon Willnauer)
* LUCENE-8720: NameIntCacheLRU (in the facets module) had an int
overflow bug that disabled cleaning of the cache (Russell A Brown)
* LUCENE-8809: Refresh and rollback concurrently can leave segment states unclosed (Nhat Nguyen)
======================= Lucene 7.7.1 =======================
(No Changes)
======================= Lucene 7.7.0 =======================
Changes in Runtime Behavior
* LUCENE-8527: StandardTokenizer and UAX29URLEmailTokenizer now support Unicode 9.0,
and provide Unicode UTS#51 v11.0 Emoji tokenization with the "<EMOJI>" token type.
Build
* LUCENE-8611: Update randomizedtesting to 2.7.2, JUnit to 4.12, add hamcrest-core
dependency. (Dawid Weiss)
* LUCENE-8537: ant test command fails under lucene/tools (Peter Somogyi)
Bug fixes
* LUCENE-8669: Fix LatLonShape WITHIN queries that fail with Multiple search Polygons
that share the dateline. (Nick Knize)
* LUCENE-8603: Fix the inversion of right ids for additional nouns in the Korean user dictionary.
(Yoo Jeongin via Jim Ferenczi)
* LUCENE-8624: int overflow in ByteBuffersDataOutput.size(). (Mulugeta Mammo,
Dawid Weiss)
* LUCENE-8625: int overflow in ByteBuffersDataInput.sliceBufferList. (Mulugeta Mammo,
Dawid Weiss)
* LUCENE-8639: Newly created threadstates while flushing / refreshing can cause duplicated
sequence IDs on IndexWriter. (Simon Willnauer)
* LUCENE-8649: LatLonShape's within and disjoint queries can return false positives with
indexed multi-shapes. (Ignacio Vera)
* LUCENE-8654: Polygon2D#relateTriangle returns the wrong answer if polygon is inside
the triangle. (Ignacio Vera)
* LUCENE-8650: ConcatenatingTokenStream did not correctly clear its state in reset(), and
was not propagating final position increments from its child streams correctly.
(Dan Meehl, Alan Woodward)
* LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused
by a big buffer (1024 chars). (Jim Ferenczi)
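LUCENE-8624 and LUCENE-8625 both concern int overflow when computing sizes. The sketch below shows the general class of bug with a hypothetical block count and block size: multiplying two ints overflows before the result is widened, while widening one operand first does not. It does not reproduce the ByteBuffersDataOutput or ByteBuffersDataInput code.

```java
final class SizeOverflow {
  // Buggy: the multiplication is done in 32-bit arithmetic and wraps around.
  static long sizeWithIntMath(int blocks, int blockSize) {
    return blocks * blockSize;
  }

  // Fixed: widen to long before multiplying.
  static long sizeWithLongMath(int blocks, int blockSize) {
    return (long) blocks * blockSize;
  }

  public static void main(String[] args) {
    int blocks = 3;
    int blockSize = 1 << 30; // 1 GiB per block, hypothetical
    System.out.println("int math:  " + sizeWithIntMath(blocks, blockSize));
    System.out.println("long math: " + sizeWithLongMath(blocks, blockSize));
  }
}
```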
New Features
* LUCENE-8026: ExitableDirectoryReader may now time out queries that run on
points such as range queries or geo queries.
(Christophe Bismuth via Adrien Grand)
* LUCENE-8508: IndexWriter can now set the created version via
IndexWriterConfig#setIndexCreatedVersionMajor. This is an expert feature.
(Adrien Grand)
* LUCENE-8601: Attributes set in the IndexableFieldType for each field during indexing will
now be recorded into the corresponding FieldInfo's attributes, accessible at search
time (Murali Krishna P)
Improvements
* LUCENE-8463: TopFieldCollector can now early-terminate queries when sorting by SortField.DOC.
(Christophe Bismuth via Jim Ferenczi)
* LUCENE-8562: Speed up merging segments of points with data dimensions by only sorting on the indexed
dimensions. (Ignacio Vera)
* LUCENE-8529: TopSuggestDocsCollector will now use the completion key to tiebreak completion
suggestion with identical scores. (Jim Ferenczi)
* LUCENE-8575: SegmentInfos#toString now includes attributes and diagnostics.
(Namgyu Kim via Adrien Grand)
* LUCENE-8548: The KoreanTokenizer no longer splits unknown words on combining diacritics and
detects script boundaries more accurately with Character#UnicodeScript#of.
(Christophe Bismuth, Jim Ferenczi)
* LUCENE-8581: Change LatLonShape encoding to use 4 bytes Per Dimension.
(Ignacio Vera, Nick Knize, Adrien Grand)
* LUCENE-8527: Upgrade JFlex dependency to 1.7.0; in StandardTokenizer and UAX29URLEmailTokenizer,
increase supported Unicode version from 6.3 to 9.0, and support Unicode UTS#51 v11.0 Emoji tokenization.
* LUCENE-8640: Date Range format validation (Lucky Sharma, David Smiley via Mikhail Khludnev)
Optimizations
* LUCENE-8552: FieldInfos.getMergedFieldInfos no longer does any merging if there is <= 1 segment.
(Christophe Bismuth via David Smiley)
* LUCENE-8590: BufferedUpdates now uses an optimized storage for buffering docvalues updates that
can save up to 80% of the heap used compared to the previous implementation and uses non-object
based datastructures. (Simon Willnauer, Mike McCandless, Shai Erera, Adrien Grand)
* LUCENE-8598: Moving to the default accepted overhead ratio for packed ints in DocValuesFieldUpdates
yields an up-to 4x performance improvement when applying doc values updates. (Simon Willnauer, Adrien Grand)
* LUCENE-8599: Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates.
(Simon Willnauer, Adrien Grand)
* LUCENE-8600: Doc-value updates get applied faster by sorting with quicksort,
which needs to perform fewer swaps than an in-place mergesort.
(Adrien Grand)
* LUCENE-8623: Decrease I/O pressure when merging high dimensional points. (Ignacio Vera)
Test Framework
* LUCENE-8604: TestRuleLimitSysouts now has an optional "hard limit" of bytes that can be written
to stderr and stdout (anything beyond the hard limit is ignored). The default hard limit is 2 GB of
logs per test class. (Dawid Weiss)
Other
* LUCENE-8573: BKDWriter now uses FutureArrays#mismatch to compute shared prefixes.
(Christoph Büscher via Adrien Grand)
* LUCENE-8605: Separate bounding box spatial logic from query logic on LatLonShapeBoundingBoxQuery.
(Ignacio Vera)
* LUCENE-8609: Deprecated IndexWriter#numDocs() and IndexWriter#maxDoc() in favor of IndexWriter#getDocStats()
that allows getting consistent numDocs and maxDoc stats that are not subject to concurrent changes.
(Simon Willnauer, Nhat Nguyen)
======================= Lucene 7.6.0 =======================
Build
* LUCENE-8504: Upgrade forbiddenapis to version 2.6. (Uwe Schindler)
* LUCENE-8493: Stop publishing insecure .sha1 files with releases (janhoy)
Bug fixes
* LUCENE-8479: QueryBuilder#analyzeGraphPhrase now throws TooManyClause exception
if the number of expanded paths reaches the BooleanQuery#maxClause limit. (Jim Ferenczi)
* LUCENE-8522: throw InvalidShapeException when constructing a polygon and
all points are coplanar. (Ignacio Vera)
* LUCENE-8531: QueryBuilder#analyzeGraphPhrase now creates one phrase query per finite string
in the graph if the slop is greater than 0. Span queries cannot be used in this case because
they don't handle slop the same way as phrase queries. (Steve Rowe, Uwe Schindler, Jim Ferenczi)
* LUCENE-8524: Add the Hangul Letter Araea (interpunct) as a separator in Nori's tokenizer.
This change also removes empty terms and trims surface forms in Nori's Korean dictionary. (Trey Jones, Jim Ferenczi)
* LUCENE-8550: Fix filtering of coplanar points when creating linked list on
polygon tessellator. (Ignacio Vera)
* LUCENE-8549: Polygon tessellator throws an error if some parts of the shape
could not be processed. (Ignacio Vera)
* LUCENE-8540: Better handling of min/max values for Geo3d encoding. (Ignacio Vera)
* LUCENE-8534: Fix incorrect computation for triangles intersecting polygon edges in
shape tessellation. (Ignacio Vera)
* LUCENE-8559: Fix bug where polygon edges were skipped when checking for intersections.
(Ignacio Vera)
* LUCENE-8556: Use latitude and longitude instead of encoding values to check if triangle is ear
when using morton optimisation. (Ignacio Vera)
* LUCENE-8586: Intervals.or() could get stuck in an infinite loop on certain indexes
(Alan Woodward)
* LUCENE-8595: Fix interleaved DV update and reset. An interleaved update and reset of a value
for the same doc in the same updates package loses the update if the reset comes before
the update, and loses the reset if the update comes first. (Simon Willnauer, Adrien Grand)
* LUCENE-8592: Fix index sorting corruption due to numeric overflow. The merge of sorted segments
can produce an invalid sort if the sort field is an Integer/Long that uses reverse order and contains
values equal to Integer/Long#MIN_VALUE. These values are always sorted first during a merge
(instead of last because of the reverse order) due to this bug. Indices affected by the bug can be
detected by running the CheckIndex command on a distribution that contains the fix (7.6+).
(Jim Ferenczi, Adrien Grand, Mike McCandless, Simon Willnauer)
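To illustrate the kind of overflow described under LUCENE-8592, here is a minimal plain-Java sketch. It assumes the reverse order was obtained by negating the values before comparing them; that is one common way to hit this problem and is not necessarily the exact code path that was fixed. The class and method names are hypothetical.

```java
// Hypothetical illustration, not Lucene code: two ways to compare longs in
// reverse (descending) order, only one of which handles Long.MIN_VALUE.
final class ReverseCompareSketch {

  // Overflow-prone: -Long.MIN_VALUE overflows and is still Long.MIN_VALUE,
  // so MIN_VALUE keeps comparing as the smallest value and sorts first.
  static int reverseByNegatingValues(long a, long b) {
    return Long.compare(-a, -b);
  }

  // Safe: swap the operands instead of negating the values.
  static int reverseBySwappingOperands(long a, long b) {
    return Long.compare(b, a);
  }

  public static void main(String[] args) {
    long min = Long.MIN_VALUE;
    // Negative result: MIN_VALUE is placed before 0, which is wrong for descending order.
    System.out.println(reverseByNegatingValues(min, 0L) < 0);   // prints true
    // Positive result: MIN_VALUE is placed after 0, as descending order requires.
    System.out.println(reverseBySwappingOperands(min, 0L) > 0); // prints true
  }
}
```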
New Features
* LUCENE-8496: Selective indexing - modify BKDReader/BKDWriter to allow users
to select fewer dimensions for creating the index than
the total number of dimensions used for field encoding, i.e., dimensions 0 to N
may be used to determine how to split the inner nodes, and dimensions N+1 to D
are ignored and stored as data dimensions at the leaves. (Nick Knize)
* LUCENE-8538: Add a Simple WKT Shape Parser for creating Lucene Geometries (Polygon, Line,
Rectangle) from WKT format. (Nick Knize)
* LUCENE-8462: Adds an Arabic snowball stemmer based on
https://github.com/snowballstem/snowball/blob/master/algorithms/arabic.sbl
(Ryadh Dahimene via Jim Ferenczi)
* LUCENE-8554: Add new LatLonShapeLineQuery that queries indexed LatLonShape fields
by arbitrary lines. (Nick Knize)
* LUCENE-8555: Add dateline crossing support to LatLonShapeBoundingBoxQuery. (Nick Knize)
Improvements
* LUCENE-8521: Change LatLonShape encoding to 7 dimensions instead of 6; where the
first 4 are index dimensions defining the bounding box of the Triangle and the
remaining 3 data dimensions define the vertices of the triangle. (Nick Knize)
* LUCENE-8557: LeafReader.getFieldInfos is now documented and tested to return
the same cached instance. MemoryIndex's impl now pre-creates the FieldInfos instead of
re-calculating a new instance each time. (Tim Underwood, David Smiley)
* LUCENE-8558: Replace O(N) lookup with O(1) lookup in PerFieldMergeState#FilterFieldInfos.
(Kranthi via Simon Willnauer)
Other
* LUCENE-8523: Correct typo in JapaneseNumberFilterFactory javadocs (Ankush Jhalani
via Alan Woodward)
* LUCENE-8533: Fix Javadocs of DataInput#readVInt(): Negative numbers are
supported, but should be avoided. (Vladimir Dolzhenko via Uwe Schindler)
======================= Lucene 7.5.1 =======================
Bug Fixes
* LUCENE-8454: Fix incorrect vertex indexing and other computation errors in
shape tessellation that would sometimes cause an infinite loop. (Nick Knize)
======================= Lucene 7.5.0 =======================
API Changes
* LUCENE-8467: RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated
(Dawid Weiss)
* LUCENE-8356: StandardFilter is deprecated (Alan Woodward)
* LUCENE-8373: ENGLISH_STOP_WORD_SET on StandardAnalyzer is deprecated. Instead
use EnglishAnalyzer.ENGLISH_STOP_WORD_SET. The default constructor for
StopAnalyzer is also deprecated, and a stop word set should be explicitly
passed to the constructor. (Alan Woodward)
* LUCENE-8378: Add DocIdSetIterator.range static method to return an iterator
matching a range of docids (Mike McCandless)
* LUCENE-8379: Add experimental TermQuery.getTermStates method (Mike McCandless)
* LUCENE-8407: Add experimental SpanTermQuery.getTermStates method (David Smiley)
* LUCENE-8390: MatchesIteratorSupplier replaced by IOSupplier (Alan Woodward,
David Smiley)
* LUCENE-8397: Add DirectoryTaxonomyWriter.getCache (Mike McCandless)
* LUCENE-8387: Add experimental IndexSearcher.getSlices API to see which slices
IndexSearcher is searching concurrently when it's created with an ExecutorService
(Mike McCandless)
* LUCENE-8263: TieredMergePolicy's reclaimDeletesWeight has been replaced with a
new deletesPctAllowed setting to control how aggressively deletes should be
reclaimed. (Erick Erickson, Adrien Grand)
* LUCENE-7314: Graduate LatLonPoint and query classes to core (Nick Knize)
* LUCENE-8428: The way that oal.util.PriorityQueue creates sentinel objects has
been changed from a protected method to a java.util.function.Supplier as a
constructor argument. (Adrien Grand)
* LUCENE-8437: CheckIndex.Status.cantOpenSegments and missingSegmentVersion
have been removed as they were not computed correctly. (Adrien Grand)
* LUCENE-8286: The UnifiedHighlighter has a new HighlightFlag.WEIGHT_MATCHES flag that
will tell this highlighter to use the new MatchesIterator API as the underlying
approach to navigate matching hits for a query. This mode will highlight more
accurately than any other highlighter, and can mark up phrases as one span instead of
word-by-word. The UH's public internal APIs changed a bit in the process.
(David Smiley)
* LUCENE-8471: IndexWriter.getFlushingBytes() returns how many bytes are currently
being flushed to disk. (Alan Woodward)
* LUCENE-8422: Static helper functions for Matches and MatchesIterator implementations
have been moved from Matches to MatchesUtils (Alan Woodward)
* LUCENE-8343: Suggesters now require Long (versus long, previously) from weight() method
while indexing, and provide double (versus long, previously) scores at lookup time
(Alessandro Benedetti)
* LUCENE-8459: SearcherTaxonomyManager now has a constructor taking already opened
IndexReaders, allowing the caller to pass a FilterDirectoryReader, for example.
(Mike McCandless)
Bug Fixes
* LUCENE-8445: Tighten condition when two planes are identical to prevent constructing
bogus tiles when building GeoPolygons. (Ignacio Vera)
* LUCENE-8444: Prevent building functionally identical plane bounds when constructing
DualCrossingEdgeIterator. (Ignacio Vera)
* LUCENE-8380: UTF8TaxonomyWriterCache inconsistency. (Ruslan Torobaev, Dawid Weiss)
* LUCENE-8164: IndexWriter silently accepts broken payload. This has been fixed
via LUCENE-8165 since we are now checking for offset+length going out of bounds.
(Robert Muir, Nhat Nguyen, Simon Willnauer)
* LUCENE-8370: Reproducing
TestLucene{54,70}DocValuesFormat.testSortedSetVariableLengthBigVsStoredFields()
failures (Erick Erickson)
* LUCENE-8376, LUCENE-8371: ConditionalTokenFilter.end() would not propagate correctly
if the last token in the stream was subsequently dropped; FixedShingleFilter did
not set position increment in end() (Alan Woodward)
* LUCENE-8395: WordDelimiterGraphFilter would incorrectly insert a hole into a
TokenStream if a token consisting entirely of delimiter characters was
encountered, but preserve_original was set. (Alan Woodward)
* LUCENE-8398: TieredMergePolicy.getMaxMergedSegmentMB has rounding error (Erick Erickson)
* LUCENE-8429: DaciukMihovAutomatonBuilder is no longer prone to stack
overflows by enforcing a maximum term length. (Adrien Grand)
* LUCENE-8441: IndexWriter now checks doc value type for index sort fields
and fails the document if they are not compatible. (Jim Ferenczi, Mike McCandless)
* LUCENE-8458: Adjust initialization condition of PendingSoftDeletes and ensure
it is initialized before accepting deletes (Simon Willnauer, Nhat Nguyen)
* LUCENE-8466: IndexWriter.deleteDocs(Query... query) incorrectly applies deletes on flush
if the index is sorted. (Adrien Grand, Jim Ferenczi, Vish Ramachandran)
* LUCENE-8502: Allow access to delegate in FilterCodecReader. FilterCodecReader didn't
allow access to its delegate like other filter readers. This adds a new #getDelegate method
to access the wrapped reader. (Simon Willnauer)
Changes in Runtime Behavior
* LUCENE-7976: TieredMergePolicy now respects maxSegmentSizeMB by default when executing
findForcedMerges and findForcedDeletesMerges (Erick Erickson)
* LUCENE-8263: TieredMergePolicy now reclaims deleted documents more
aggressively by default ensuring that no more than ~1/3 of the index size is
used by deleted documents. (Adrien Grand)
* LUCENE-8503: Call #getDelegate instead of direct member access during unwrap.
Filter*Reader instances access the member or the delegate directly instead of
calling getDelegate(). In order to track access of the delegate these methods
should call #getDelegate() (Simon Willnauer)
Improvements
* LUCENE-8468: A ByteBuffer based Directory implementation. (Dawid Weiss)
* LUCENE-8447: Add DISJOINT and WITHIN support to LatLonShape queries. (Nick Knize)
* LUCENE-8440: Add support for indexing and searching Line and Point shapes using LatLonShape encoding (Nick Knize)
* LUCENE-8435: Add new LatLonShapePolygonQuery for querying indexed LatLonShape fields by arbitrary polygons (Nick Knize)
* LUCENE-8367: Make per-dimension drill down optional for each facet dimension (Mike McCandless)
* LUCENE-8396: Add Points Based Shape Indexing and Search that decomposes shapes
into a triangular mesh and indexes individual triangles as a 6 dimension point (Nick Knize)
* LUCENE-8345, GitHub PR #392: Remove instantiation of redundant wrapper classes for primitives;
add wrapper class constructors to forbiddenapis. (Michael Braun via Uwe Schindler)
* LUCENE-8415: Clean up Directory contracts and JavaDoc comments. (Dawid Weiss)
* LUCENE-8414: Make segmentInfos private in IndexWriter (Simon Willnauer, Nhat Nguyen)
* LUCENE-8446: The UnifiedHighlighter's DefaultPassageFormatter now treats overlapping matches in
the passage as merged (as if one larger match). (David Smiley)
* LUCENE-8460: Better argument validation in StoredField. (Namgyu Kim)
* LUCENE-8432: TopFieldComparator stops comparing documents if the index is
sorted, even if hits still need to be visited to compute the hit count.
(Nikolay Khitrin)
* LUCENE-8422: IntervalQuery now returns useful Matches (Alan Woodward)
* LUCENE-7862: Store the real bounds of the leaf cells in the BKD index when the
number of dimensions is bigger than 1. It improves performance when there is
correlation between the dimensions, for example ranges. (Ignacio Vera, Adrien Grand)
Build
* LUCENE-5143: Stop publishing KEYS file with each version, use topmost lucene/KEYS file only.
The buildAndPushRelease.py script validates that RM's PGP key is in the KEYS file.
Remove unused 'copy-to-stage' and '-dist-keys' targets from ant build. (janhoy)
Other
* LUCENE-8485: Update randomizedtesting to version 2.6.4. (Dawid Weiss)
* LUCENE-8366: Upgrade to ICU 62.1. Emoji handling now uses Unicode 11's
Extended_Pictographic property. (Robert Muir)
* LUCENE-8408: original Highlighter: Remove obsolete static AttributeFactory instance
in TokenStreamFromTermVector. (Michael Braun, David Smiley)
* LUCENE-8420: Upgrade OpenNLP to 1.9.0 so OpenNLP tool can read the new model format which 1.8.x
cannot read. 1.9.0 can read the old format. (Koji Sekiguchi)
* LUCENE-8453: Add documentation to analysis factories of Korean (Nori) analyzer
module. (Tomoko Uchida via Uwe Schindler)
* LUCENE-8455: Upgrade ECJ compiler to 4.6.1 in lucene/common-build.xml (Erick Erickson)
* LUCENE-8456: Upgrade Apache Commons Compress to v1.18 (Steve Rowe)
* LUCENE-765: Improved org.apache.lucene.index javadocs. (Mike Sokolov)
* LUCENE-8476: Remove redundant nullity check and switch to optimized List.sort in the
Korean's user dictionary. (Namgyu Kim)
======================= Lucene 7.4.1 =======================
Bug Fixes
* LUCENE-8365: Fix ArrayIndexOutOfBoundsException in UnifiedHighlighter. This fixes
an "off by one" error in the UnifiedHighlighter's code that is only triggered when
two nested SpanNearQueries contain the same term. (Marc-Andre Morissette via Simon Willnauer)
* LUCENE-8381: Fix IndexWriter incorrectly interpreting hard-deletes as soft-deletes
while wrapping reader for merges. (Simon Willnauer, Nhat Nguyen)
* LUCENE-8384: Fix missing advance docValues generation while handling docValues
update in PendingSoftDeletes. (Simon Willnauer, Nhat Nguyen)
* LUCENE-8472: Always rewrite the soft-deletes merge retention query. (Adrien Grand, Nhat Nguyen)
======================= Lucene 7.4.0 =======================
Upgrading
* LUCENE-8344: If you are using the AnalyzingSuggester or FuzzySuggester subclass, and if you
explicitly use the preservePositionIncrements=false setting (not the default), then you ought
to rebuild your suggester index. If you don't, queries or indexed data with trailing position
gaps (e.g. stop words) may not work correctly. (David Smiley, Jim Ferenczi)
API Changes
* LUCENE-8242: IndexSearcher.createNormalizedWeight() has been deprecated.
Instead use IndexSearcher.createWeight(), rewriting the query first.
(Alan Woodward)
* LUCENE-8248: MergePolicyWrapper is renamed to FilterMergePolicy and now
also overrides getMaxCFSSegmentSizeMB (Mike Sokolov via Mike McCandless)
* LUCENE-8303: LiveDocsFormat is now only responsible for (de)serialization of
live docs. (Adrien Grand)
Changes in Runtime Behavior
* LUCENE-8309: Live docs are no longer backed by a FixedBitSet. (Adrien Grand)
* LUCENE-8330: Detach IndexWriter from MergePolicy. Instead of requiring
IndexWriter as a hard dependency, MergePolicy now expects a MergeContext, which
IndexWriter implements. (Simon Willnauer, Robert Muir, Dawid Weiss, Mike McCandless)
New Features
* LUCENE-8200: Allow doc-values to be updated atomically together
with a document. Doc-Values updates can now be used as a soft-delete
mechanism to allow keeping several versions of a document or already
deleted documents around for later reuse. See "IW.softUpdateDocument(...)"
for reference. (Simon Willnauer)
* LUCENE-8197: A new FeatureField makes it easy and efficient to integrate
static relevance signals into the final score. (Adrien Grand, Robert Muir)
* LUCENE-8202: Add a FixedShingleFilter (Alan Woodward, Adrien Grand, Jim
Ferenczi)
* LUCENE-8125: ICUTokenizer support for emoji/emoji sequence tokens. (Robert Muir)
* LUCENE-8196, LUCENE-8300: A new IntervalQuery in the sandbox allows efficient proximity
searches based on minimum-interval semantics. (Alan Woodward, Adrien Grand,
Jim Ferenczi, Simon Willnauer, Matt Weber)
* LUCENE-8233: Add support for soft deletes to IndexWriter delete accounting.
Soft deletes are accounted for inside the index writer and therefore also
by merge policies. A SoftDeletesRetentionMergePolicy is added that allows
soft-deleted documents to be selectively carried over across merges for
retention policies (Simon Willnauer, Mike McCandless, Robert Muir)
* LUCENE-8237: Add a SoftDeletesDirectoryReaderWrapper that respects
soft deletes if the reader is opened from a directory. (Simon Willnauer,
Mike McCandless, Uwe Schindler, Adrien Grand)
* LUCENE-8229, LUCENE-8270: Add a method Weight.matches(LeafReaderContext, doc)
that returns an iterator over matching positions for a given query and document.
This allows exact hit extraction and will enable implementation of accurate
highlighters. (Alan Woodward, Adrien Grand, David Smiley)
* LUCENE-8249: Implement Matches API for phrase queries (Alan Woodward, Adrien
Grand)
* LUCENE-8246: Allow customizing the number of deletes a merge claims. This
helps merge policies in the soft-delete case to correctly implement retention
policies without triggering unnecessary merges. (Simon Willnauer, Mike McCandless)
* LUCENE-8231: A new analysis module (nori) similar to Kuromoji
but to handle Korean using mecab-ko-dic and morphological analysis.
(Robert Muir, Jim Ferenczi)
* LUCENE-8265: WordDelimiter/GraphFilter now have an option to skip tokens
marked with KeywordAttribute (Mike Sokolov via Mike McCandless)
* LUCENE-8297: Add IW#tryUpdateDocValues(Reader, int, Fields...) IndexWriter can
update doc values for a specific term but this might affect all documents
containing the term. With tryUpdateDocValues users can update doc-values
fields for individual documents. This allows for instance to soft-delete
individual documents. (Simon Willnauer)
* LUCENE-8298: Allow DocValues updates to reset a value. Passing a DV field with a null
value to IW#updateDocValues or IW#tryUpdateDocValues will now remove the value from the
provided document. This allows undeleting a soft-deleted document unless it's been claimed
by a merge. (Simon Willnauer)
* LUCENE-8273: ConditionalTokenFilter allows analysis chains to skip particular token
filters based on the attributes of the current token. This generalises the keyword
token logic currently used for stemmers and WDF. It is integrated into
CustomAnalyzer by using the `when` and `whenTerm` builder methods, and a new
ProtectedTermFilter is added as an example. (Alan Woodward, Robert Muir,
David Smiley, Steve Rowe, Mike Sokolov)
* LUCENE-8310: Ensure IndexFileDeleter accounts for pending deletes. Today we fail
creating the IndexWriter when the directory has a pending delete. Yet, this
is mainly done to prevent writing still existing files more than once.
IndexFileDeleter already accounts for that for existing files which we can
now use to also take pending deletes into account which ensures that all file
generations per segment always go forward. (Simon Willnauer)
* LUCENE-7960: Add preserveOriginal option to the NGram and EdgeNGram filters.
(Ingomar Wesp, Shawn Heisey via Robert Muir)
* LUCENE-8335: Enforce soft-deletes field up-front. Soft deletes field must be marked
as such once it's introduced and can't be changed after the fact.
(Nhat Nguyen via Simon Willnauer)
* LUCENE-8332: New ConcatenateGraphFilter for concatenating all tokens into one (or more
in the event of a graph input). This is useful for fast analyzed exact-match lookup,
suggesters, and as a component of a named entity recognition system. This was excised
out of CompletionTokenStream in the NRT doc suggester. (David Smiley, Jim Ferenczi)
Bug Fixes
* LUCENE-8221: MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger
indexes.
* LUCENE-8266: Detect bogus tiles when creating a standard polygon and
throw a TileException. (Ignacio Vera)
* LUCENE-8234: Fixed bug in how spatial relationship is computed for
GeoStandardCircle when it covers the whole world. (Ignacio Vera)
* LUCENE-8236: Filter duplicated points when creating GeoPath shapes to
avoid creation of bogus planes. (Ignacio Vera)
* LUCENE-8243: IndexWriter.addIndexes(Directory[]) did not properly preserve
index file names for updated doc values fields (Simon Willnauer,
Michael McCandless, Nhat Nguyen)
* LUCENE-8275: Push up #checkPendingDeletes to Directory to ensure IW fails if
the directory has pending deletes files even if the directory is filtered or
a FileSwitchDirectory (Simon Willnauer, Robert Muir)
* LUCENE-8244: Do not leak open file descriptors in SearcherTaxonomyManager's
refresh on exception (Mike McCandless)
* LUCENE-8305: ComplexPhraseQuery.rewrite now handles an embedded MultiTermQuery
that rewrites to a MatchNoDocsQuery instead of throwing an exception.
(Bjarke Mortensen, Andy Tran via David Smiley)
* LUCENE-8287: Ensure that empty regex completion queries always return no results.
(Julie Tibshirani via Jim Ferenczi)
* LUCENE-8317: Prevent concurrent deletes from being applied during full flush.
Future deletes could potentially be exposed to flushes/commits/refreshes if the
amount of RAM used by deletes is greater than half of the IW RAM buffer. (Simon Willnauer)
* LUCENE-8320: Fix WindowsFS to correctly account for rename and hardlinks.
(Simon Willnauer, Nhat Nguyen)
* LUCENE-8328: Ensure ReadersAndUpdates consistently executes under lock.
(Nhat Nguyen via Simon Willnauer)
* LUCENE-8325: Fixed the smartcn tokenizer to not split UTF-16 surrogate pairs.
(chengpohi via Jim Ferenczi)
* LUCENE-8186: LowerCaseTokenizerFactory now lowercases text in multi-term
queries. (Tim Allison via Adrien Grand)
* LUCENE-8278: Some end-of-input no-scheme domain-only URL tokens are typed as
<ALPHANUM> rather than <URL>. (Junte Zhang, Steve Rowe)
* LUCENE-8355: Prevent IW from opening an already dropped segment while DV updates
are written. (Nhat Nguyen via Simon Willnauer)
* LUCENE-8344: TokenStreamToAutomaton (used by some suggesters) was not ignoring a trailing
position increment when the preservePositionIncrement setting is false.
(David Smiley, Jim Ferenczi)
* LUCENE-8357: FunctionScoreQuery.boostByQuery() and boostByValue() were
producing truncated Explanations (Markus Jelsma, Alan Woodward)
* LUCENE-8360: NGramTokenFilter and EdgeNGramTokenFilter did not correctly
set position increments in end() (Alan Woodward)
Other
* LUCENE-8301: Update randomizedtesting to 2.6.0. (Dawid Weiss)
* LUCENE-8299: Geo3D wrapper uses new polygon method factory that gives better
support for polygons with many points (>100). (Ignacio Vera)
* LUCENE-8261: InterpolatedProperties.interpolate and recursive property
references. (Steve Rowe, Dawid Weiss)
* LUCENE-8228: removed obsolete IndexDeletionPolicy clone() requirements from
the javadoc. (Dawid Weiss)
* LUCENE-8219: Use a realistic estimate of the number of nodes and links in
LevensteinAutomaton.java, to save reallocation of arrays.
(Christian Ziech)
* LUCENE-8214: Improve selection of testPoint for GeoComplexPolygon.
(Ignacio Vera)
* SOLR-10912: Add automatic patch validation. (Mano Kovacs, Steve Rowe)
* LUCENE-8122, LUCENE-8175: Upgrade analysis/icu to ICU 61.1.
(Robert Muir, Adrien Grand, Uwe Schindler)
* LUCENE-8291: Remove QueryTemplateManager utility class from XML queryparser.
This class is just a general XML transforming tool (using property files and
XSLT) and has nothing to do with query parsing. It can easily be implemented
using more sophisticated libraries or using XSL transformers from the JDK.
This change also removes the Lucene demo webapp to prevent XSS issues in
untested/unmaintained code. (Uwe Schindler)
Build
* LUCENE-7935: Publish .sha512 hash files with the release artifacts and stop
publishing .md5 hashes since the algorithm is broken (janhoy)
* LUCENE-8230: Upgrade forbiddenapis to version 2.5. (Uwe Schindler)
Documentation
* LUCENE-8238: Improve WordDelimiterFilter and WordDelimiterGraphFilter javadocs
(Mike Sokolov via Mike McCandless)
======================= Lucene 7.3.1 =======================
Bug fixes
* LUCENE-8254: LRUQueryCache could cause IndexReader to hang on close, when
shared with another reader with no CacheHelper (Alan Woodward, Simon Willnauer,
Adrien Grand)
======================= Lucene 7.3.0 =======================
API Changes
* LUCENE-8051: LevensteinDistance renamed to LevenshteinDistance.
(Pulak Ghosh via Adrien Grand)
* LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery and BoostingQuery.
Users should instead use FunctionScoreQuery, possibly combined with
a lucene expression (Alan Woodward)
* LUCENE-8104: Remove facets module compile-time dependency on queries
(Alan Woodward)
* LUCENE-8145: UnifiedHighlighter now uses a unitary OffsetsEnum rather
than a list of enums (Alan Woodward, David Smiley, Jim Ferenczi, Timothy
Rodriguez)
New Features
* LUCENE-2899: Add new module analysis/opennlp, with analysis components
to perform tokenization, part-of-speech tagging, lemmatization and phrase
chunking by invoking the corresponding OpenNLP tools. Named entity
recognition is also provided as a Solr update request processor.
(Lance Norskog, Grant Ingersoll, Joern Kottmann, Em, Kai Gülzau,
Rene Nederhand, Robert Muir, Steven Bower, Steve Rowe)
* LUCENE-8126: Add new spatial prefix tree (SPT) based on google S2 geometry.
It can only be used currently with Geo3D spatial context and it provides
improvements on indexing time for non-points shapes and on query performance.
(Ignacio Vera, David Smiley)
Improvements
* LUCENE-8081: Allow IndexWriter to opt out of flushing on indexing threads.
Index/update threads try to help out flushing pending document buffers to
disk. This change adds an expert setting to opt out of this behavior unless
flushing is falling behind. (Simon Willnauer)
* LUCENE-8086: spatial-extras Geo3dFactory: Use GeoExactCircle with
configurable precision for non-spherical planet models.
(Ignacio Vera via David Smiley)
* LUCENE-8093: TrimFilterFactory implements MultiTermAwareComponent (Alan Woodward)
* LUCENE-8094: TermInSetQuery.toString now returns "field:(A B C)" (Mike McCandless)
* LUCENE-8121: UnifiedHighlighter passage relevancy is improved for terms that are
position sensitive (e.g. part of a phrase) by having an accurate freq.
(David Smiley)
* LUCENE-8129: A Unicode set filter can now be specified when using ICUFoldingFilter.
(Ere Maijala)
* LUCENE-7966: Build Multi-Release JARs to enable usage of optimized intrinsic methods
from Java 9 for index bounds checking and array comparison/mismatch. This change
introduces Java 8 replacements for those Java 9 methods and patches the compiled
classes to use the optimized variants through the MR-JAR mechanism.
(Uwe Schindler, Robert Muir, Adrien Grand, Mike McCandless)
* LUCENE-8127: Speed up rewriteNoScoring when there are no MUST clauses.
(Michael Braun via Adrien Grand)
* LUCENE-8152: Improve consumption of doc-value iterators. (Horatiu Lazu via
Adrien Grand)
* LUCENE-8033: FieldInfos now always use a dense encoding. (Mayya Sharipova
via Adrien Grand)
* LUCENE-8190: Specialized cell interface to allow any spatial prefix tree to
benefit from the setting setPruneLeafyBranches on RecursivePrefixTreeStrategy.
(Ignacio Vera)
Bug Fixes
* LUCENE-8077: Fixed bug in how CheckIndex verifies doc-value iterators.
(Xiaoshan Sun via Adrien Grand)
* SOLR-11758: Fixed FloatDocValues.boolVal to correctly return true for all values != 0.0F
(Munendra S N via hossman)
* LUCENE-8121: The UnifiedHighlighter would highlight some terms within some nested
SpanNearQueries at positions where it should not have. It's fixed in the UH by
switching to the SpanCollector API. The original Highlighter still has this
problem (LUCENE-2287, LUCENE-5455, LUCENE-6796). Some public but internal parts of
the UH were refactored. (David Smiley, Steve Davids)
* LUCENE-8120: Fix LatLonBoundingBox's toString() method (Martijn van Groningen, Adrien Grand)
* LUCENE-8130: Fix NullPointerException from TermStates.toString() (Mike McCandless)
* LUCENE-8124: Fixed HyphenationCompoundWordTokenFilter to correctly handle
hyphenation patterns with indicator >= 7. (Holger Bruch via Adrien Grand)
* LUCENE-8163: BaseDirectoryTestCase could produce random filenames that fail
on Windows (Alan Woodward)
* LUCENE-8174: Fixed {Float,Double,Int,Long}Range.toString(). (Oliver Kaleske
via Adrien Grand)
* LUCENE-8182: Fixed BoostingQuery to apply the context boost instead of the parent query
boost (Jim Ferenczi)
* LUCENE-8188: Fixed bugs in OpenNLPOpsFactory that were causing InputStreams fetched from the
ResourceLoader to be leaked (hossman)
Other
* LUCENE-8111: IndexOrDocValuesQuery Javadoc references outdated method name.
(Kai Chan via Adrien Grand)
* LUCENE-8106: Add script (reproduceJenkinsFailures.py) to attempt to reproduce
failing tests from a Jenkins log. (Steve Rowe)
* LUCENE-8075: Removed unnecessary null check in IntersectTermsEnum.
(Pulak Ghosh via Adrien Grand)
* LUCENE-8156: Require users to not have ASM on the Ant classpath during build.
This is required by LUCENE-7966. (Adrien Grand, Uwe Schindler)
* LUCENE-8161: spatial-extras: the Spatial4j dependency has been updated from 0.6 to 0.7,
which is drop-in compatible (Lucene doesn't expressly use any of the few API differences).
Spatial4j 0.7 is compatible with JTS 1.15.0 and not any prior version. JTS 1.15.0 is
dual-licensed to include BSD; prior versions were LGPL. (David Smiley)
* LUCENE-8155: Add back support in smoke tester to run against later Java versions.
(Uwe Schindler)
* LUCENE-8169: Migrated build to use OpenClover 4.2.1 for checking code coverage.
(Uwe Schindler)
* LUCENE-8170: Improve OpenClover reports (separate test from production code);
enable coverage reports inside test-frameworks. (Uwe Schindler)
Build
* LUCENE-8168: Moved Groovy scripts in build files to separate files.
Update Groovy to 2.4.13. (Uwe Schindler)
* LUCENE-8176: HttpReplicatorTest awaits more than a minute for stopping Jetty threads
(Mikhail Khludnev)
======================= Lucene 7.2.1 =======================
Bug Fixes
* LUCENE-8117: Fix advanceExact on SortedNumericDocValues produced by Lucene54DocValues. (Jim Ferenczi)
======================= Lucene 7.2.0 =======================
API Changes
* LUCENE-8017, LUCENE-8042: Weight, DoubleValuesSource and related objects
now implement a SegmentCacheable interface, with a single method
isCacheable(LeafReaderContext) determining whether or not the object may
be cached against a LeafReader. (Alan Woodward, Robert Muir)
* LUCENE-8038: Payload factors for scoring in PayloadScoreQuery are now
calculated by a PayloadDecoder, instead of delegating to the Similarity.
(Alan Woodward)
* LUCENE-8014: Similarity.computeSlopFactor() and
Similarity.computePayloadFactor() have been deprecated. (Alan Woodward)
* LUCENE-6278: Scorer.freq() has been removed (Alan Woodward)
* LUCENE-7736: DoubleValuesSource and LongValuesSource now expose a
rewrite(IndexSearcher) function. (Alan Woodward)
* LUCENE-7998: DoubleValuesSource.fromQuery() allows you to use the scores
from a Query as a DoubleValuesSource. (Alan Woodward)
* LUCENE-8049: IndexWriter.getMergingSegments()'s return type was changed from
Collection to Set to more accurately reflect its nature. (David Smiley)
* LUCENE-8059: TopFieldDocCollector can now early terminate collection when
the sort order is compatible with the index order. As a consequence,
EarlyTerminatingSortingCollector is now deprecated. (Adrien Grand)
New Features
* LUCENE-8061: Add convenience factory methods to create BBoxes and XYZSolids
directly from bounds objects.
* LUCENE-7736: IndexReaderFunctions expose various IndexReader statistics as
DoubleValuesSources. (Alan Woodward)
* LUCENE-8068: Allow IndexWriter to write a single DWPT to disk. Adds a
flushNextBuffer method to IndexWriter that allows the caller to
synchronously move the next pending or the biggest non-pending index buffer to
disk. This enables flushing a selected buffer to disk without hijacking an
indexing thread. This is for instance useful if more than one IW (shards) must
be maintained in a single JVM / system. (Simon Willnauer)
Bug Fixes
* LUCENE-8076: Normalize Vincenty distance calculation for planet models that aren't normalized.
(Ignacio Vera)
* LUCENE-8057: Exact circle bounds computation was incorrect.
(Ignacio Vera)
* LUCENE-8056: Exact circle segment bounding suffered from precision errors.
(Karl Wright)
* LUCENE-8054: Fix the exact circle case where relationships fail when the
planet model has c <= ab, because the planes are constructed incorrectly.
(Ignacio Vera)
* LUCENE-7991: KNearestNeighborDocumentClassifier.knnSearch no longer applies
a previous boosted field's factor to subsequent unboosted fields.
(Christine Poerschke)
* LUCENE-7999: Switch from int to long to track the name for the next
segment to write, so that very long lived indices with very frequent
refreshes or commits, and high indexing thread counts, do not
overflow an int (Mykhailo Demianenko via Mike McCandless)
* LUCENE-8025: Use sumTotalTermFreq=sumDocFreq when scoring DOCS_ONLY fields
that omit term frequency information, as it is equivalent in that case.
Previously bogus numbers were used, and many similarities would
completely degrade. (Robert Muir, Adrien Grand)
* LUCENE-8045: ParallelLeafReader did not correctly report FieldInfo.dvGen
(Alan Woodward)
* LUCENE-8034: Use subtraction instead of addition to sidestep int
overflow in SpanNotQuery. (Hari Menon via Mike McCandless)
* LUCENE-8078: The query cache should not cache instances of
MatchNoDocsQuery. (Jon Harper via Adrien Grand)
* LUCENE-8048: Filesystems do not guarantee the order of directory updates
(Nikolay Martynov, Simon Willnauer, Erick Erickson)
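As a rough illustration of the kind of overflow addressed by LUCENE-8034 above: a
bounds check written with addition can wrap around for large int values, while the
algebraically equivalent check written with subtraction cannot, provided both
operands of the subtraction are non-negative. The class and method names below are
hypothetical and are not taken from SpanNotQuery.

```java
// Hypothetical illustration only; this is not the SpanNotQuery code.
final class OverflowSketch {
    // Overflow-prone: start + width can wrap around to a negative value.
    static boolean fitsUsingAddition(int start, int width, int limit) {
        return start + width <= limit;
    }

    // Same comparison rearranged. For width >= 0 and limit >= 0,
    // limit - width stays within the int range, so it cannot overflow.
    static boolean fitsUsingSubtraction(int start, int width, int limit) {
        return start <= limit - width;
    }

    public static void main(String[] args) {
        int start = Integer.MAX_VALUE - 1;
        int width = 10;
        int limit = 100;
        // The addition wraps around, so the first check wrongly succeeds.
        System.out.println(fitsUsingAddition(start, width, limit));
        System.out.println(fitsUsingSubtraction(start, width, limit));
    }
}
```

The rearranged form relies on the stated assumption that width and limit are
non-negative; Math.addExact is an alternative when an exception on overflow is
preferred.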
Optimizations
* LUCENE-8018: Smaller FieldInfos memory footprint by not retaining unnecessary
references to TreeMap entries. (Julian Vassev via Adrien Grand)
* LUCENE-7994: Use int/int scatter map to gather facet counts when the
number of hits is small relative to the number of unique facet labels
(Dawid Weiss, Robert Muir, Mike McCandless)
* LUCENE-8062: GlobalOrdinalsQuery is no longer eligible for caching. (Jim Ferenczi)
* LUCENE-8058: Large instances of TermInSetQuery are no longer eligible for
caching as they could break memory accounting of the query cache.
(Adrien Grand)
* LUCENE-8055: MemoryIndex.MemoryDocValuesIterator returns 2 documents
instead of 1. (Simon Willnauer)
* LUCENE-8043: Fix document accounting in IndexWriter to prevent writing too many
documents. Once this happens, Lucene refuses to open the index and throws a
CorruptIndexException. (Simon Willnauer, Yonik Seeley, Mike McCandless)
Tests
* LUCENE-8035: Run tests with JDK-specific options: --illegal-access=deny
on Java 9+. (Uwe Schindler)
Build
* LUCENE-6144: Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy
jars in ~/.ant/lib/. (Shawn Heisey, Steve Rowe)
======================= Lucene 7.1.0 =======================
Changes in Runtime Behavior
* Resolving of external entities in queryparser/xml/CoreParser is disallowed
by default. See SOLR-11477 for details.
New Features
* LUCENE-7970: Add a shape to Geo3D that consists of multiple planes that
approximate a true circle, rather than an ellipse, for non-spherical planet models.
(Karl Wright, Ignacio Vera)
* LUCENE-7955: Add support for the concept of "nearest distance" to Geo3D's
GeoPath abstraction, which is the distance along the path to the point that is
closest to the provided point. (Karl Wright)
* LUCENE-7906: Add spatial relationships between all currently-defined Geo shapes.
(Ignacio Vera)
* LUCENE-7955: Add support for zero-width paths. (Karl Wright)
* LUCENE-7936: Add serialization and deserialization support to Geo3D. (Karl Wright,
Ignacio Vera)
* LUCENE-7942: Distance computations now have the ability to accurately aggregate
distances, rather than just doing sums. (Karl Wright)
* LUCENE-7934: Add a planet model interface. (Karl Wright)
* LUCENE-7918: Revamp the API for composites so that it's generic and can be used
for many kinds of shapes. (Ignacio Vera)
* LUCENE-7621: Add CoveringQuery, a query whose required number of matching
clauses can be defined per document. (Adrien Grand)
* LUCENE-7927: Add LongValueFacetCounts, to compute facet counts for individual
numeric values (Mike McCandless)
* LUCENE-7940: Add BengaliAnalyzer. (Md. Abdulla-Al-Sun via Robert Muir)
* LUCENE-7392: Add point based LatLonBoundingBox as new RangeField Type.
(Nick Knize)
* LUCENE-7951: Spatial-extras has much better Geo3D support by implementing Spatial4j
abstractions: SpatialContextFactory, ShapeFactory, BinaryCodec, DistanceCalculator.
(Ignacio Vera, David Smiley)
* LUCENE-7973: Update dictionary version for Ukrainian analyzer to 3.9.0 (Andriy
Rysin via Dawid Weiss)
* LUCENE-7974: Add FloatPointNearestNeighbor, an N-dimensional FloatPoint
K-nearest-neighbor search implementation. (Steve Rowe)
* LUCENE-7975: Change the default taxonomy facets cache to a faster
byte[] (UTF-8) based cache. (Mike McCandless)
* LUCENE-7972: DirectoryTaxonomyReader, in Lucene's facet module, now
implements Accountable, so you can more easily track how much heap
it's using. (Mike McCandless)
* LUCENE-7982: A new NormsFieldExistsQuery matches documents that have
norms in a specified field (Colin Goodheart-Smithe via Mike McCandless)
Optimizations
* LUCENE-7905: Optimize how OrdinalMap (used by
SortedSetDocValuesFacetCounts and others) builds its map (Robert
Muir, Adrien Grand, Mike McCandless)
* LUCENE-7655: Speed up geo-distance queries in case of dense single-valued
fields when most documents match. (Maciej Zasada via Adrien Grand)
* LUCENE-7897: IndexOrDocValuesQuery now requires the range cost to be more
than 8x greater than the cost of the lead iterator in order to use doc values.
(Murali Krishna P via Adrien Grand)
* LUCENE-7925: Collapse duplicate SHOULD or MUST clauses by summing up their
boosts. (Adrien Grand)
* LUCENE-7939: MinShouldMatchSumScorer now leverages two-phase iteration in
order to be faster when used in conjunctions. (Adrien Grand)
* LUCENE-7827: AnalyzingInfixSuggester doesn't create "textgrams"
when minPrefixChar=0 (Mikhail Khludnev)
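The idea behind LUCENE-7925 above can be sketched in plain Java: if the same query
appears in several clauses of the same kind, each occurrence contributes its boost
times the same score, so the occurrences can be replaced by one clause whose boost
is the sum. The sketch below uses strings as stand-ins for queries and is not the
BooleanQuery rewrite code; all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; strings stand in for query objects.
final class CollapseClausesSketch {
    // Merges duplicate clauses, summing their boosts and keeping
    // the order in which each distinct clause first appeared.
    static Map<String, Float> collapse(String[] queries, float[] boosts) {
        Map<String, Float> merged = new LinkedHashMap<>();
        for (int i = 0; i < queries.length; i++) {
            merged.merge(queries[i], boosts[i], Float::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        String[] queries = {"title:lucene", "body:index", "title:lucene"};
        float[] boosts = {1f, 2f, 3f};
        System.out.println(collapse(queries, boosts));
    }
}
```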
Bug Fixes
* LUCENE-8066: It was still possible to construct a concave GeoExactCircle, so use
a sector approach to prevent that. (Ignacio Vera)
* LUCENE-7967: The GeoDegeneratePoint isWithin() method needed allowance for
numerical precision. (Karl Wright)
* LUCENE-7965: GeoBBoxFactory was constructing the wrong shape at the poles
if the longitude span was greater than 180 degrees. (Karl Wright)
* LUCENE-7916: Prevent ArrayIndexOutOfBoundsException if ICUTokenizer is used
with a different ICU JAR version than it is compiled against. Note that this is
not recommended: lucene-analyzers-icu contains binary data structures
specific to the ICU/Unicode versions it is built against. (Chris Koenig, Robert Muir)
* LUCENE-7891: Lucene's taxonomy facets now use a non-buggy LRU cache
by default. (Jan-Willem van den Broek via Mike McCandless)
* LUCENE-7959: Improve NativeFSLockFactory's exception message if it cannot create
write.lock for an empty index due to bad permissions/read-only filesystem/etc.
(Erick Erickson, Shawn Heisey, Robert Muir)
* LUCENE-7968: AnalyzingSuggester would sometimes order suggestions incorrectly
because it did not properly break ties on the surface forms when both the weights and
the analyzed forms were equal. (Robert Muir)
* LUCENE-7957: ConjunctionScorer.getChildren was failing to return all
child scorers (Adrien Grand, Mike McCandless)
* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
Build
* SOLR-11181: Switch order of maven artifact publishing procedure: deploy first
instead of locally installing first, to work around a double repository push of
*-sources.jar and *-javadoc.jar files. (Lynn Monson via Steve Rowe)
* LUCENE-6673: Maven build fails for target javadoc:jar.
(Ramkumar Aiyengar, Daniel Collins via Steve Rowe)
* LUCENE-7985: Upgrade forbiddenapis to 2.4.1. (Uwe Schindler)
Other
* LUCENE-7948, LUCENE-7937: Upgrade randomizedtesting to 2.5.3 (minor fixes
in test filtering for IDEs). (Mike Sokolov, Dawid Weiss)
* LUCENE-7933: LongBitSet now validates the numBits parameter (Won
Jonghoon, Mike McCandless)
* LUCENE-7978: Add some more documentation about setting up build
environment. (Anton R. Yuste via Uwe Schindler)
* LUCENE-7983: IndexWriter.IndexReaderWarmer is now a functional interface
instead of an abstract class with a single method (Dawid Weiss)
* LUCENE-5753: Update TLDs recognized by UAX29URLEmailTokenizer. (Steve Rowe)
======================= Lucene 7.0.1 =======================
Bug Fixes
* LUCENE-7957: ConjunctionScorer.getChildren was failing to return all
child scorers (Adrien Grand, Mike McCandless)
======================= Lucene 7.0.0 =======================
New Features
* LUCENE-7703: SegmentInfos now record the major Lucene version at index
creation time. (Adrien Grand)
* LUCENE-7756: LeafReader.getMetaData now exposes the index created version as
well as the oldest Lucene version that contributed to the segment.
(Adrien Grand)
* LUCENE-7854: The new TermFrequencyAttribute used during analysis
with a custom token stream allows indexing custom term frequencies
(Mike McCandless)
* LUCENE-7866: Add a new DelimitedTermFrequencyTokenFilter that allows
marking tokens with a custom term frequency (LUCENE-7854). It parses a numeric
value after a separator char ('|') at the end of each token and changes
the term frequency to this value. (Uwe Schindler, Robert Muir, Mike
McCandless)
* LUCENE-7868: Multiple threads can now resolve deletes and doc values
updates concurrently, giving sizable speedups in update-heavy
indexing use cases (Simon Willnauer, Mike McCandless)
* LUCENE-7823: Pure query based naive bayes classifier using BM25 scores (Tommaso Teofili)
* LUCENE-7838: Knn classifier based on fuzzified term queries (Tommaso Teofili)
* LUCENE-7855: Added advanced options of the Wikipedia tokenizer to its factory.
(Juan Pedro via Adrien Grand)
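The "term|frequency" convention described in LUCENE-7866 above can be sketched in
plain Java as follows. This is a simplified stand-in, not the
DelimitedTermFrequencyTokenFilter source: the class name is hypothetical, and the
handling of tokens without a delimiter (frequency 1) is an assumption made for the
sketch.

```java
// Hypothetical sketch of the "term|frequency" token convention; it does not
// use the Lucene analysis API.
final class DelimitedTermFreqSketch {
    final String term;
    final int termFrequency;

    private DelimitedTermFreqSketch(String term, int termFrequency) {
        this.term = term;
        this.termFrequency = termFrequency;
    }

    // Splits on the last occurrence of the delimiter and parses what follows
    // as the term frequency. Tokens without a delimiter keep frequency 1
    // (an assumption made for this sketch).
    static DelimitedTermFreqSketch parse(String token, char delimiter) {
        int idx = token.lastIndexOf(delimiter);
        if (idx < 0) {
            return new DelimitedTermFreqSketch(token, 1);
        }
        int freq = Integer.parseInt(token.substring(idx + 1));
        if (freq < 1) {
            throw new IllegalArgumentException("term frequency must be >= 1: " + token);
        }
        return new DelimitedTermFreqSketch(token.substring(0, idx), freq);
    }

    public static void main(String[] args) {
        DelimitedTermFreqSketch parsed = parse("lucene|3", '|');
        System.out.println(parsed.term + " -> " + parsed.termFrequency);
    }
}
```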
API Changes
* LUCENE-2605: Classic QueryParser no longer splits on whitespace by default.
Use setSplitOnWhitespace(true) to get the old behavior. (Steve Rowe)
* LUCENE-7369: Similarity.coord and BooleanQuery.disableCoord are removed.
(Adrien Grand)
* LUCENE-7368: Removed query normalization. (Adrien Grand)
* LUCENE-7355: AnalyzingQueryParser has been removed as its functionality has
been folded into the classic QueryParser. (Adrien Grand)
* LUCENE-7407: Doc values APIs have been switched from random access
to iterators, enabling future codec compression improvements. (Mike
McCandless)
* LUCENE-7475: Norms now support sparsity, so that you only pay for what is
actually used. (Adrien Grand)
* LUCENE-7494: Points now have a per-field API, like doc values. (Adrien Grand)
* LUCENE-7410: Cache keys and close listeners have been refactored in order
to be less trappy. See IndexReader.getReaderCacheHelper and
LeafReader.getCoreCacheHelper. (Adrien Grand)
* LUCENE-6819: Index-time boosts are not supported anymore. As a replacement,
index-time scoring factors should be indexed into a doc value field and
combined at query time using e.g. FunctionScoreQuery. (Adrien Grand)
* LUCENE-7734: FieldType's copy constructor was widened to accept any IndexableFieldType.
(David Smiley)
* LUCENE-7701: Grouping collectors have been refactored, such that groups are
now defined by a GroupSelector implementation. (Alan Woodward)
* LUCENE-7741: DoubleValuesSource now has an explain() method (Alan Woodward,
Adrien Grand)
* LUCENE-7815: Removed the PostingsHighlighter; you should use the UnifiedHighlighter (UH)
instead, which was derived from the PostingsHighlighter. WholeBreakIterator and
CustomSeparatorBreakIterator were moved to UH's package. (David Smiley)
* LUCENE-7850: Removed support for legacy numerics. (Adrien Grand)
* LUCENE-7500: Removed abstract LeafReader.fields(); instead terms(fieldName)
has been made abstract (it was formerly final). Also, MultiFields.getTerms
was optimized to work directly instead of being implemented on getFields.
(David Smiley)
* LUCENE-7872: TopDocs.totalHits is now a long. (Adrien Grand, hossman)
* LUCENE-7868: IndexWriterConfig.setMaxBufferedDeleteTerms is
removed. (Simon Willnauer, Mike McCandless)
* LUCENE-7877: PrefixAwareTokenStream is replaced with ConcatenatingTokenStream
(Alan Woodward, Uwe Schindler, Adrien Grand)
* LUCENE-7867: The deprecated Token class is now only available in the test
framework (Alan Woodward, Adrien Grand)
* LUCENE-7723: DoubleValuesSource enforces implementation of equals() and
hashCode() (Alan Woodward)
* LUCENE-7737: The spatial-extras module no longer has a dependency on the
queries module. All uses of ValueSource are either replaced with core
DoubleValuesSource extensions, or with the new ShapeValuesSource and
ShapeValuesPredicate classes (Alan Woodward, David Smiley)
* LUCENE-7892: Doc-values query factory methods have been renamed so that their
name contains "slow" in order to clearly indicate that they would usually be a
bad choice. (Adrien Grand)
* LUCENE-7899: FieldValueQuery is renamed to DocValuesFieldExistsQuery
(Adrien Grand, Mike McCandless)
Bug Fixes
* LUCENE-7626: IndexWriter will no longer accept broken token offsets
(Mike McCandless)
* LUCENE-7859: Spatial-extras PackedQuadPrefixTree bug that only revealed itself
with the new pointsOnly optimizations in LUCENE-7845. (David Smiley)
* LUCENE-7871: Fix false positive match in BlockJoinSelector when children have no value by introducing
wrap methods accepting children as DISI, and extract ToParentDocValues. (Mikhail Khludnev)
* LUCENE-7914: Add a maximum recursion level in automaton recursive
functions (Operations.isFinite and Operations.topsortState) to prevent
large automata from overflowing the stack (Robert Muir, Adrien Grand, Jim Ferenczi)
* LUCENE-7864: IndexMergeTool is not using intermediate hard links (even
if possible). (Dawid Weiss)
* LUCENE-7956: Fixed potential stack overflow error in ICUNormalizer2CharFilter.
(Adrien Grand)
* LUCENE-7963: Remove useless getAttribute() in DefaultIndexingChain that
caused a performance drop, introduced by LUCENE-7626. (Daniel Mitterdorfer
via Uwe Schindler)
Improvements
* LUCENE-7489: Better storage of sparse doc-values fields with the default
codec. (Adrien Grand)
* LUCENE-7730: More accurate encoding of the length normalization factor
thanks to the removal of index-time boosts. (Adrien Grand)
* LUCENE-7901: Original Highlighter now eagerly throws an exception if you
provide components that are null. (Jason Gerlowski, David Smiley)
* LUCENE-7841: Normalize ґ to г in Ukrainian analyzer. (Andriy Rysin via Dawid Weiss)
Optimizations
* LUCENE-7416: BooleanQuery optimizes queries that have queries that occur both
in the sets of SHOULD and FILTER clauses, or both in MUST/FILTER and MUST_NOT
clauses. (Spyros Kapnissis via Adrien Grand, Uwe Schindler)
* LUCENE-7506: FastTaxonomyFacetCounts should use CPU in proportion to
the size of the intersected set of hits from the query and documents
that have a facet value, so sparse faceting works as expected
(Adrien Grand via Mike McCandless)
* LUCENE-7519: Add optimized APIs to compute browse-only top level
facets (Mike McCandless)
* LUCENE-7589: Numeric doc values now have the ability to encode blocks of
values using different numbers of bits per value if this proves to save
storage. (Adrien Grand)
* LUCENE-7845: Enhance spatial-extras RecursivePrefixTreeStrategy queries when the
query is a point (for 2D) or is a simple date interval (e.g. 1 month). When
the strategy is marked as pointsOnly, the result is a TermQuery. (David Smiley)
* LUCENE-7874: DisjunctionMaxQuery rewrites to a BooleanQuery when tiebreaker is set to 1. (Jim Ferenczi)
* LUCENE-7828: Speed up range queries on range fields by improving how we
compute the relation between the query and inner nodes of the BKD tree.
(Adrien Grand)
Other
* LUCENE-7923: Removed FST.Arc.node field (unused). (Dawid Weiss)
* LUCENE-7328: Remove LegacyNumericEncoding from GeoPointField. (Nick Knize)
* LUCENE-7360: Remove Explanation.toHtml() (Alan Woodward)
* LUCENE-7681: MemoryIndex uses new DocValues API (Alan Woodward)
* LUCENE-7753: Make fields static when possible.
(Daniel Jelinski via Adrien Grand)
* LUCENE-7540: Upgrade ICU to 59.1 (Mike McCandless, Jim Ferenczi)
* LUCENE-7852: Correct copyright year(s) in lucene/LICENSE.txt file.
(Christine Poerschke, Steve Rowe)
* LUCENE-7719: Generalized the UnifiedHighlighter's support for AutomatonQuery
for character & binary automata. Added AutomatonQuery.isBinary. (David Smiley)
* LUCENE-7873: Due to serious problems with context class loaders in several
frameworks (OSGI, Java 9 Jigsaw), the lookup of Codecs, PostingsFormats,
DocValuesFormats and all analysis factories was changed to only inspect the
current classloader that defined the interface class (lucene-core.jar).
See MIGRATE.txt for more information! (Uwe Schindler, Dawid Weiss)
* LUCENE-7883: Lucene no longer uses the context class loader when resolving
resources in CustomAnalyzer or ClassPathResourceLoader. Resources are only
resolved against Lucene's class loader by default. Please use another builder
method to change to a custom classloader. (Uwe Schindler)
* LUCENE-5822: Convert README to Markdown (Jason Gerlowski via Mike Drob)
* LUCENE-7773: Remove unused/deprecated token types from StandardTokenizer.
(Ahmet Arslan via Steve Rowe)
* LUCENE-7800: Remove code that potentially rethrows checked exceptions
from methods that don't declare them ("sneaky throw" hack). (Robert Muir,
Uwe Schindler, Dawid Weiss)
* LUCENE-7876: Avoid calls to LeafReader.fields() and MultiFields.getFields()
that are trivially replaced by LeafReader.terms() and MultiFields.getTerms()
(David Smiley)
======================= Lucene 6.6.5 =======================
(No Changes)
======================= Lucene 6.6.4 =======================
(No Changes)
======================= Lucene 6.6.3 =======================
Build
* LUCENE-6144: Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy
jars in ~/.ant/lib/. (Shawn Heisey, Steve Rowe)
======================= Lucene 6.6.2 =======================
Changes in Runtime Behavior
* Resolving of external entities in queryparser/xml/CoreParser is disallowed
by default. See SOLR-11477 for details.
Bug Fixes
* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
======================= Lucene 6.6.1 =======================
Bug Fixes
* LUCENE-7869: Changed MemoryIndex to sort 1d points. In case of 1d points, the PointInSetQuery.MergePointVisitor expects
that these points are visited in ascending order. The memory index did not do this, which could cause documents
with multiple points that should match to not match. (Martijn van Groningen)
* LUCENE-7878: Fix query builder to keep the SHOULD clause that wraps multi-word synonyms. (Jim Ferenczi)
======================= Lucene 6.6.0 =======================
New Features
* LUCENE-7811: Add a concurrent SortedSet facets implementation.
(Mike McCandless)
Bug Fixes
* LUCENE-7777: ByteBlockPool.readBytes sometimes throws
ArrayIndexOutOfBoundsException when byte blocks larger than 32 KB
were added (Mike McCandless)
* LUCENE-7797: The static FSDirectory.listAll(Path) method was always
returning an empty array. (Atkins Chang via Mike McCandless)
* LUCENE-7481: Fixed missing rewrite methods for SpanPayloadCheckQuery
and PayloadScoreQuery. (Erik Hatcher)
* LUCENE-7808: Fixed PayloadScoreQuery and SpanPayloadCheckQuery
.equals and .hashCode methods. (Erik Hatcher)
* LUCENE-7798: Add .equals and .hashCode to ToParentBlockJoinSortField
(Mikhail Khludnev)
* LUCENE-7814: DateRangePrefixTree (in spatial-extras) had edge-case bugs for
years >= 292,000,000. (David Smiley)
* LUCENE-5365, LUCENE-7818: Fix incorrect condition in queryparser's
QueryNodeOperation#logicalAnd(). (Olivier Binda, Amrit Sarkar,
AppChecker via Uwe Schindler)
* LUCENE-7821: The classic and flexible query parsers, as well as Solr's
"lucene"/standard query parser, should require " TO " in range queries,
and accept "TO" as endpoints in range queries. (hossman, Steve Rowe)
* LUCENE-7824: Fix graph query analysis for multi-word synonym rules with common terms (e.g. new york, new york city).
(Jim Ferenczi)
* LUCENE-7817: Pass cached query to onQueryCache instead of null.
(Christoph Kaser via Adrien Grand)
* LUCENE-7831: CodecUtil should not seek to negative offsets. (Adrien Grand)
* LUCENE-7833: ToParentBlockJoinQuery computed the min score instead of the max
score with ScoreMode.MAX. (Adrien Grand)
* LUCENE-7847: Fixed all-docs-match optimization of range queries on range
fields. (Adrien Grand)
* LUCENE-7810: Fix equals() and hashCode() methods of several join queries.
(Hossman, Adrien Grand, Martijn van Groningen)
Improvements
* LUCENE-7782: OfflineSorter now passes the total number of items it
will write to getWriter (Mike McCandless)
* LUCENE-7785: Move dictionary for Ukrainian analyzer to external dependency.
(Andriy Rysin via Steve Rowe, Dawid Weiss)
* LUCENE-7801: SortedSetDocValuesReaderState now implements
Accountable so you can see how much RAM it's using (Robert Muir,
Mike McCandless)
* LUCENE-7792: OfflineSorter can now run concurrently if you pass it
an optional ExecutorService (Dawid Weiss, Mike McCandless)
* LUCENE-7811: Sorted set facets now use sparse storage when
collecting hits, when appropriate. (Mike McCandless)
Optimizations
* LUCENE-7787: spatial-extras HeatmapFacetCounter will now short-circuit its
work when Bits.MatchNoBits is passed. (David Smiley)
Other
* LUCENE-7796: Make IOUtils.reThrow idiom declare Error return type so
callers may use it in a way that compiler knows subsequent code is
unreachable. reThrow is now deprecated in favor of IOUtils.rethrowAlways
with slightly different semantics (see javadoc). (Hossman, Robert Muir,
Dawid Weiss)
* LUCENE-7754: Inner classes should be static whenever possible.
(Daniel Jelinski via Adrien Grand)
* LUCENE-7751: Avoid boxing primitives only to call compareTo.
(Daniel Jelinski via Adrien Grand)
* LUCENE-7743: Never call new String(String).
(Daniel Jelinski via Adrien Grand)
* LUCENE-7761: Fixed comment in ReqExclScorer.
(Pablo Pita Leira via Adrien Grand)
======================= Lucene 6.5.1 =======================
Bug Fixes
* LUCENE-7755: Fixed join queries to not reference IndexReaders, as it could
cause leaks if they are cached. (Adrien Grand)
* LUCENE-7749: Made LRUQueryCache delegate the scoreSupplier method.
(Martin Amirault via Adrien Grand)
* LUCENE-7769: The UnifiedHighlighter wasn't highlighting portions of the query
wrapped in BoostQuery or SpanBoostQuery. (David Smiley, Dmitry Malinin)
Other
* LUCENE-7763: Remove outdated comment in IndexWriterConfig.setIndexSort javadocs.
(马可阳 via Christine Poerschke)
======================= Lucene 6.5.0 =======================
API Changes
* LUCENE-7740: Refactor Range Fields to remove Field suffix (e.g., DoubleRange),
move InetAddressRange and InetAddressPoint from sandbox to misc module, and
refactor all other range fields from sandbox to core. (Nick Knize)
* LUCENE-7624: TermsQuery has been renamed as TermInSetQuery and moved to core.
(Alan Woodward)
* LUCENE-7637: TermInSetQuery requires that all terms come from the same field.
(Adrien Grand)
* LUCENE-7644: FieldComparatorSource.newComparator() and
SortField.getComparator() no longer throw IOException (Alan Woodward)
* LUCENE-7643: Replaced doc-values queries in lucene/sandbox with factory
methods on the *DocValuesField classes. (Adrien Grand)
* LUCENE-7659: Added an IndexWriter#getFieldNames() method (experimental) to return
all field names as visible from the IndexWriter. This would be useful for
IndexWriter#updateDocValues() calls, to prevent calling with non-existent
docValues fields (Ishan Chattopadhyaya, Adrien Grand, Mike McCandless)
* LUCENE-6959: Removed ToParentBlockJoinCollector in favour of
ParentChildrenBlockJoinQuery, that can return the matching children documents per
parent document. This query should be executed for each matching parent document
after the main query has been executed. (Adrien Grand, Martijn van Groningen,
Mike McCandless)
* LUCENE-7628: Scorer.getChildren() now only returns Scorers that are
positioned on the current document, and can throw an IOException.
AssertingScorer checks that getChildren() is not called on an unpositioned
Scorer. (Alan Woodward, Adrien Grand)
* LUCENE-7702: Removed GraphQuery in favour of simple boolean query. (Matt Webber via Jim Ferenczi)
* LUCENE-7707: TopDocs.merge now takes a boolean option telling it
when to use the incoming shard index versus when to assign the shard
index itself, allowing users to merge shard responses incrementally
instead of once all shard responses are present. (Simon Willnauer,
Mike McCandless)
* LUCENE-7700: A cleanup of merge throughput control logic. Refactored all the
code previously scattered throughout the IndexWriter and
ConcurrentMergeScheduler into a more accessible set of public methods (see
MergePolicy.OneMergeProgress, MergeScheduler.wrapForMerge and
OneMerge.mergeInit). (Dawid Weiss, Mike McCandless)
* LUCENE-7734: FieldType's copy constructor was widened to accept any IndexableFieldType.
(David Smiley)
New Features
* LUCENE-7738: Add new InetAddressRange for indexing and querying InetAddress
ranges. (Nick Knize)
* LUCENE-7449: Add CROSSES relation support to RangeFieldQuery. (Nick Knize)
* LUCENE-7623: Add FunctionScoreQuery and FunctionMatchQuery (Alan Woodward,
Adrien Grand, David Smiley)
* LUCENE-7619: Add WordDelimiterGraphFilter, just like
WordDelimiterFilter except it produces correct token graphs so that
proximity queries at search time will produce correct results (Mike
McCandless)
* LUCENE-7656: Added the LatLonDocValuesField.new(Box/Distance)Query() factory
methods that are the equivalent of factory methods on LatLonPoint but operate
on doc values. These new methods should be wrapped in an IndexOrDocValuesQuery
for best performance. (Adrien Grand)
* LUCENE-7673: Added MultiValued[Int/Long/Float/Double]FieldSource that given a
SortedNumericSelector.Type can give a ValueSource view of a
SortedNumericDocValues field. (Tomás Fernández Löbbe)
* LUCENE-7465: Add SimplePatternTokenizer and
SimplePatternSplitTokenizer, using Lucene's regexp/automaton
implementation for analysis/tokenization (Clinton Gormley, Mike
McCandless)
* LUCENE-7688: Add OneMergeWrappingMergePolicy class.
(Keith Laban, Christine Poerschke)
* LUCENE-7686: The near-real-time document suggester can now
efficiently filter out duplicate suggestions (Uwe Schindler, Mike
McCandless)
* LUCENE-7712: SimpleQueryParser now supports default fuzziness
syntax, mapping foo~ to a FuzzyQuery with edit distance 2. (Lee
Hinman, David Pilato via Mike McCandless)
Bug Fixes
* LUCENE-7630: Fix (Edge)NGramTokenFilter to no longer drop payloads
and preserve all attributes. (Nathan Gass via Uwe Schindler)
* LUCENE-7679: MemoryIndex was ignoring omitNorms settings on passed-in
IndexableFields. (Alan Woodward)
* LUCENE-7692: PatternReplaceCharFilterFactory now implements MultiTermAware.
(Adrien Grand)
* LUCENE-7685: ToParentBlockJoinQuery and ToChildBlockJoinQuery now use the
rewritten child query in their equals and hashCode implementations.
(Adrien Grand)
* LUCENE-7698: CommonGramsQueryFilter was producing a disconnected
token graph, messing up phrase queries when it was used during query
parsing (Ere Maijala via Mike McCandless)
* LUCENE-7708: ShingleFilter without unigram was producing a disconnected
token graph, messing up queries when it was used during query
parsing (Jim Ferenczi)
Improvements
* LUCENE-7055: Added Weight#scorerSupplier, which allows estimating the cost
of a Scorer before actually building it, in order to optimize how the query
should be run, e.g. using points or doc values depending on costs of other
parts of the query. (Adrien Grand)
* LUCENE-7643: IndexOrDocValuesQuery allows executing range queries using
either points or doc values depending on which one is more efficient.
(Adrien Grand)
* LUCENE-7662: If index files are missing, throw CorruptIndexException instead
of the less descriptive FileNotFound or NoSuchFileException (Mike Drob via
Mike McCandless, Erick Erickson)
* LUCENE-7680: UsageTrackingQueryCachingPolicy never caches term filters anymore
since they are plenty fast. This also has the side-effect of leaving more
space in the history for costly filters. (Adrien Grand)
* LUCENE-7677: UsageTrackingQueryCachingPolicy now caches compound queries a bit
earlier than regular queries in order to improve cache efficiency.
(Adrien Grand)
* LUCENE-7710: BlockPackedReader throws CorruptIndexException and includes
IndexInput description instead of plain IOException (Mike Drob via
Mike McCandless)
* LUCENE-7695: ComplexPhraseQueryParser to support query time synonyms (Markus Jelsma
via Mikhail Khludnev)
* LUCENE-7747: QueryBuilder now iterates lazily over the possible paths when building a graph query
(Jim Ferenczi)
Optimizations
* LUCENE-7641: Optimized point range queries to compute documents that do not
match the range on single-valued fields when more than half the documents in
the index would match. (Adrien Grand)
* LUCENE-7656: Speed up LatLonPointDistanceQuery by computing distances even
less often. (Adrien Grand)
* LUCENE-7661: Speed up LatLonPointInPolygonQuery by pre-computing the
relation of the polygon with a grid. (Adrien Grand)
* LUCENE-7660: Speed up LatLonPointDistanceQuery by improving the detection of
whether BKD cells are entirely within the distance close to the dateline.
(Adrien Grand)
* LUCENE-7654: ToParentBlockJoinQuery now implements two-phase iteration and
computes scores lazily in order to be faster when used in conjunctions.
(Adrien Grand)
* LUCENE-7667: BKDReader now calls `IntersectVisitor.grow()` on larger
increments. (Adrien Grand)
* LUCENE-7638: Query parsers now analyze the token graph for articulation
points (or cut vertices) in order to create more efficient queries for
multi-token synonyms. (Jim Ferenczi)
* LUCENE-7699: Query parsers now use span queries to produce more efficient
phrase queries for multi-token synonyms. (Matt Webber via Jim Ferenczi)
* LUCENE-7742: Fix places where we were unboxing and then re-boxing
according to FindBugs (Daniel Jelinski via Mike McCandless)
* LUCENE-7739: Fix places where we unnecessarily boxed while parsing
a numeric value according to FindBugs (Daniel Jelinski via Mike
McCandless)
Build
* LUCENE-7653: Update randomizedtesting to version 2.5.0. (Dawid Weiss)
* LUCENE-7665: Remove grouping dependency from the join module.
(Martijn van Groningen)
* SOLR-10023: Add non-recursive 'test-nocompile' target: Only runs unit tests.
Jars are not downloaded; compilation is not updated; and Clover is not enabled.
(Steve Rowe)
* LUCENE-7694: Update forbiddenapis to version 2.3. (Uwe Schindler)
* LUCENE-7693: Replace "org.apache." logic in GetMavenDependenciesTask.
(Daniel Collins, Christine Poerschke)
* LUCENE-7726: Fix HTML entity bugs in Javadocs to be able to build with
Java 9. (Uwe Schindler, Hossman)
* LUCENE-7727: Replace end-of-life Markdown parser "Pegdown" by "Flexmark"
for compatibility with Java 9. (Uwe Schindler)
Other
* LUCENE-7666: Fix typos in lucene-join package info javadoc.
(Tom Saleeba via Christine Poerschke)
* LUCENE-7658: queryparser/xml CoreParser now implements SpanQueryBuilder interface.
(Daniel Collins, Christine Poerschke)
* LUCENE-7715: NearSpansUnordered simplifications.
(Paul Elschot via Adrien Grand)
======================= Lucene 6.4.2 =======================
Bug Fixes
* LUCENE-7676: Fixed FilterCodecReader to override more super-class methods.
Also added TestFilterCodecReader class. (Christine Poerschke)
* LUCENE-7717: The UnifiedHighlighter and PostingsHighlighter were not highlighting
prefix queries with multi-byte characters. TermRangeQuery is affected too.
(Dmitry Malinin, David Smiley)
======================= Lucene 6.4.1 =======================
Build
* LUCENE-7651: Fix Javadocs build for Java 8u121 by injecting "Google Code
Prettify" without adding Javascript to Javadocs's -bottom parameter.
Also update Prettify to latest version to fix Google Chrome issue.
(Uwe Schindler)
Bug Fixes
* LUCENE-7657: Fixed potential memory leak in the case that a (Span)TermQuery
with a TermContext is cached. (Adrien Grand)
* LUCENE-7647: Made stored fields reclaim native memory more aggressively when
configured with BEST_COMPRESSION. This could otherwise result in out-of-memory
issues. (Adrien Grand)
* LUCENE-7670: AnalyzingInfixSuggester should not immediately open an
IndexWriter over an already-built index. (Steve Rowe)
======================= Lucene 6.4.0 =======================
API Changes
* LUCENE-7533: Classic query parser no longer allows autoGeneratePhraseQueries
to be set to true when splitOnWhitespace is false (and vice-versa).
* LUCENE-7607: LeafFieldComparator.setScorer and SimpleFieldComparator.setScorer
are declared as throwing IOException (Alan Woodward)
* LUCENE-7617: Collector construction for two-pass grouping queries is
abstracted into a new Grouper class, which can be passed as a constructor
parameter to GroupingSearch. The abstract base classes for the different
grouping Collectors are renamed to remove the Abstract* prefix.
(Alan Woodward, Martijn van Groningen)
* LUCENE-7609: The expressions module now uses the DoubleValuesSource API, and
no longer depends on the queries module. Expression#getValueSource() is
replaced with Expression#getDoubleValuesSource(). (Alan Woodward, Adrien
Grand)
* LUCENE-7610: The facets module now uses the DoubleValuesSource API, and
methods that take ValueSource parameters are deprecated (Alan Woodward)
* LUCENE-7611: DocumentValueSourceDictionary now takes a LongValuesSource
as a parameter, and the ValueSource equivalent is deprecated (Alan Woodward)
New Features
* LUCENE-5867: Added BooleanSimilarity. (Robert Muir, Adrien Grand)
* LUCENE-7466: Added AxiomaticSimilarity. (Peilin Yang via Tommaso Teofili)
* LUCENE-7590: Added DocValuesStatsCollector to compute statistics on DocValues
fields. (Shai Erera)
* LUCENE-7587: The new FacetQuery and MultiFacetQuery helper classes
make it simpler to execute drill down when drill sideways counts are
not needed (Emmanuel Keller via Mike McCandless)
* LUCENE-6664: A new SynonymGraphFilter outputs a correct graph
structure for multi-token synonyms, separating out a
FlattenGraphFilter that is hardwired into the current
SynonymFilter. This finally makes it possible to implement
correct multi-token synonyms at search time. See
http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
for details. (Mike McCandless)
* LUCENE-5325: Added LongValuesSource and DoubleValuesSource, intended as
type-safe replacements for ValueSource in the queries module. These
expose per-segment LongValues or DoubleValues iterators. (Alan Woodward, Adrien Grand)
* LUCENE-7603: Graph token streams are now handled accurately by query
parsers, by enumerating all paths and creating the corresponding
query/ies as sub-clauses (Matt Weber via Mike McCandless)
* LUCENE-7588: DrillSideways can now run queries concurrently, and
supports an IndexSearcher using an executor service to run each query
concurrently across all segments in the index (Emmanuel Keller via
Mike McCandless)
* LUCENE-7627: Added .intersect methods to SortedDocValues and
SortedSetDocValues to allow filtering their TermsEnums with a
CompiledAutomaton (Alan Woodward, Mike McCandless)
Bug Fixes
* LUCENE-7547: JapaneseTokenizerFactory was failing to close the
dictionary file it opened (Markus via Mike McCandless)
* LUCENE-7562: CompletionFieldsConsumer sometimes throws
NullPointerException on ghost fields (Oliver Eilhard via Mike McCandless)
* LUCENE-7533: Classic query parser: disallow autoGeneratePhraseQueries=true
when splitOnWhitespace=false (and vice-versa). (Steve Rowe)
* LUCENE-7536: ASCIIFoldingFilterFactory used to return an illegal multi-term
component when preserveOriginal was set to true. (Adrien Grand)
* LUCENE-7576: Fix Terms.intersect in the default codec to detect when
the incoming automaton is a special case and throw a clearer
exception than NullPointerException (Tom Mortimer via Mike McCandless)
* LUCENE-6989: Fix Exception handling in MMapDirectory's unmap hack
support code to work with Java 9's new InaccessibleObjectException,
which does not extend ReflectiveOperationException.
(Uwe Schindler)
* LUCENE-7581: Lucene now prevents updating a doc values field that is used
in the index sort, since this would lead to corruption. (Jim
Ferenczi via Mike McCandless)
* LUCENE-7570: IndexWriter may deadlock if a commit is running while
there are too many merges running and one of the merges hits a
tragic exception (Joey Echeverria via Mike McCandless)
* LUCENE-7594: Fixed point range queries on floating-point types to recommend
using helpers for exclusive bounds that are consistent with Double.compare.
(Adrien Grand, Dawid Weiss)
* LUCENE-7606: Normalization with CustomAnalyzer would only apply the last
token filter. (Adrien Grand)
* LUCENE-7612: Removed an unused dependency from the suggester to the misc
module. (Alan Woodward)
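Illustration for LUCENE-7594 above: a minimal JDK-only sketch of why exclusive
bounds need helpers that agree with Double.compare. Double.compare orders -0.0
strictly before +0.0, whereas Math.nextUp(-0.0) skips +0.0 and returns
Double.MIN_VALUE. The class and method names below are hypothetical and are not
claimed to be Lucene's own code.

```java
final class ExclusiveBoundsSketch {
  private static final long NEGATIVE_ZERO_BITS = 0x8000_0000_0000_0000L;
  private static final long POSITIVE_ZERO_BITS = 0L;

  /** Smallest double that Double.compare orders strictly after d. */
  static double nextUp(double d) {
    if (Double.doubleToLongBits(d) == NEGATIVE_ZERO_BITS) {
      return +0d; // Math.nextUp(-0.0) would skip +0.0
    }
    return Math.nextUp(d);
  }

  /** Largest double that Double.compare orders strictly before d. */
  static double nextDown(double d) {
    if (Double.doubleToLongBits(d) == POSITIVE_ZERO_BITS) {
      return -0d; // Math.nextDown(+0.0) would skip -0.0
    }
    return Math.nextDown(d);
  }
}
```

Under these assumptions, an exclusive lower bound x becomes the inclusive bound
nextUp(x), and an exclusive upper bound y becomes the inclusive bound nextDown(y).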
Improvements
* LUCENE-7532: Add back lost codec file format documentation
(Shinichiro Abe via Mike McCandless)
* LUCENE-6824: TermAutomatonQuery now rewrites to TermQuery,
PhraseQuery or MultiPhraseQuery when the word automaton is simple
(Mike McCandless)
* LUCENE-7431: Allow a certain amount of overlap to be specified between the include
and exclude arguments of SpanNotQuery via negative pre and/or post arguments.
(Marc Morissette via David Smiley)
* LUCENE-7544: UnifiedHighlighter: add extension points for handling custom queries.
(Michael Braun, David Smiley)
* LUCENE-7538: Asking IndexWriter to store a too-massive text field
now throws IllegalArgumentException instead of a cryptic exception
that closes your IndexWriter (Steve Chen via Mike McCandless)
* LUCENE-7524: Added more detailed explanation of how IDF is computed in
ClassicSimilarity and BM25Similarity. (Adrien Grand)
* LUCENE-7564: AnalyzingInfixSuggester should close its IndexWriter by default
at the end of build(). (Steve Rowe)
* LUCENE-7526: Enhanced UnifiedHighlighter's passage relevancy for queries with
wildcards and sometimes just terms. Added shouldPreferPassageRelevancyOverSpeed()
which can be overridden to return false to eke out more speed in some cases.
(Timothy M. Rodriguez, David Smiley)
* LUCENE-7560: QueryBuilder.createFieldQuery is no longer final,
giving custom query parsers subclassing QueryBuilder more freedom to
control how text is analyzed and converted into a query (Matt Weber
via Mike McCandless)
* LUCENE-7537: Index time sorting now supports multi-valued sorts
using selectors (MIN, MAX, etc.) (Jim Ferenczi via Mike McCandless)
* LUCENE-7575: UnifiedHighlighter can now highlight fields with queries that don't
necessarily refer to that field (AKA requireFieldMatch==false). Disabled by default.
See UnifiedHighlighter's get/setFieldMatcher. (Jim Ferenczi via David Smiley)
* LUCENE-7592: If the segments file is truncated, we now throw
CorruptIndexException instead of the more confusing EOFException
(Mike Drob via Mike McCandless)
* LUCENE-6989: Make MMapDirectory's unmap hack work with Java 9 EA (b150+):
Unmapping uses new sun.misc.Unsafe#invokeCleaner(ByteBuffer).
Java 9 now needs the same permissions as Java 8;
RuntimePermission("accessClassInPackage.jdk.internal.ref")
is no longer needed. Support for older Java 9 builds was removed.
(Uwe Schindler)
* LUCENE-7401: Changed the way BKD trees pick the split dimension in order to
ensure all dimensions are indexed. (Adrien Grand)
* LUCENE-7614: Complex Phrase Query parser ignores double quotes around single token
prefix, wildcard, range queries (Mikhail Khludnev)
* LUCENE-7620: Added LengthGoalBreakIterator, a wrapper around another BreakIterator to skip breaks
that would create Passages that are too short. Only for use with the UnifiedHighlighter
(and probably PostingsHighlighter). (David Smiley)
Optimizations
* LUCENE-7568: Optimize merging when index sorting is used but the
index is already sorted (Jim Ferenczi via Mike McCandless)
* LUCENE-7563: The BKD in-memory index for dimensional points now uses
a compressed format, using substantially less RAM in some cases
(Adrien Grand, Mike McCandless)
* LUCENE-7583: BKD writing now buffers each leaf block in heap before
writing to disk, giving a small speedup in points-heavy use cases.
(Mike McCandless)
* LUCENE-7572: Doc values queries now cache their hash code. (Adrien Grand)
Other
* LUCENE-7546: Fixed references to benchmark wikipedia data and the Jenkins line-docs file
(David Smiley)
* LUCENE-7534: fix smokeTestRelease.py to run on Cygwin (Mikhail Khludnev)
* LUCENE-7559: UnifiedHighlighter: Make Passage and OffsetsEnum more exposed to allow
passage creation to be customized. (David Smiley)
* LUCENE-7599: Simplify TestRandomChains using Java's built-in Predicate and
Function interfaces. (Ahmet Arslan via Adrien Grand)
* LUCENE-7595: Improve RAMUsageTester in test-framework to estimate memory usage of
runtime classes and work with Java 9 EA (b148+). Disable static field heap usage
checker in LuceneTestCase. (Uwe Schindler, Dawid Weiss)
Build
* LUCENE-7387: fix defaultCodec in build.xml to account for the line ending (hossman)
* LUCENE-7543: Make changes-to-html target an offline operation, by moving the
Lucene and Solr DOAP RDF files into the Git source repository under
dev-tools/doap/ and then pulling release dates from those files, rather than
from JIRA. (Mano Kovacs, hossman, Steve Rowe)
* LUCENE-7596: Update Groovy to version 2.4.8 to allow building with Java 9
build 148+. Also update JGit version for working-copy checks. (Uwe Schindler)
======================= Lucene 6.3.0 =======================
API Changes
New Features
* LUCENE-7438: New "UnifiedHighlighter" derivative of the PostingsHighlighter that
can consume offsets from postings, term vectors, or analysis. It can highlight phrases
as accurately as the standard Highlighter. Light term vectors can be used with offsets
in postings for fast wildcard (MultiTermQuery) highlighting.
(David Smiley, Timothy Rodriguez)
* LUCENE-7490: SimpleQueryParser now parses '*' to MatchAllDocsQuery
(Lee Hinman via Mike McCandless)
Bug Fixes
* LUCENE-7507: Upgrade morfologik-stemming to version 2.1.1 (fixes security
manager issue with Polish dictionary lookup). (Dawid Weiss)
* LUCENE-7472: MultiFieldQueryParser.getFieldQuery() drops queries that are
neither BooleanQuery nor TermQuery. (Steve Rowe)
* LUCENE-7456: PerFieldPostings/DocValues was failing to delegate the
merge method (Julien MASSENET via Mike McCandless)
* LUCENE-7468: ASCIIFoldingFilter should not emit duplicated tokens when
preserve original is on. (David Causse via Adrien Grand)
* LUCENE-7484: FastVectorHighlighter failed to highlight SynonymQuery
(Jim Ferenczi via Mike McCandless)
* LUCENE-7476: JapaneseNumberFilter should not invoke incrementToken
on its input after it's exhausted (Andy Hind via Mike McCandless)
* LUCENE-7486: DisjunctionMaxQuery does not work correctly with queries that
return negative scores. (Ivan Provalov, Uwe Schindler, Adrien Grand)
* LUCENE-7491: Suddenly turning on dimensional points for some fields
that already exist in an index but didn't previously index
dimensional points could cause unexpected merge exceptions (Hans
Lund, Mike McCandless)
* LUCENE-6914: Fixed DecimalDigitFilter in case of supplementary code points.
(Hossman)
* LUCENE-7493: FacetCollector.search threw an unexpected exception if
you asked for zero hits but wanted facets (Mahesh via Mike McCandless)
* LUCENE-7505: AnalyzingInfixSuggester returned invalid results when
allTermsRequired is false and context filters are specified (Mike
McCandless)
* LUCENE-7429: AnalyzerWrapper can now modify the normalization chain too and
DelegatingAnalyzerWrapper does the right thing automatically. (Adrien Grand)
* LUCENE-7135: Lucene's check for 32 or 64 bit JVM now works around security
manager blocking access to some properties (Aaron Madlon-Kay via
Mike McCandless)
Improvements
* LUCENE-7439: FuzzyQuery now matches all terms within the specified
edit distance, even if they are short terms (Mike McCandless)
* LUCENE-7496: Better toString for SweetSpotSimilarity (janhoy)
* LUCENE-7520: Highlighter's WeightedSpanTermExtractor shouldn't attempt to expand a MultiTermQuery
when its field doesn't match the field the extraction is scoped to.
(Cao Manh Dat via David Smiley)
Optimizations
* LUCENE-7501: BKDReader should not store the split dimension explicitly in the
1D case. (Adrien Grand)
Other
* LUCENE-7513: Upgrade randomizedtesting to 2.4.0. (Dawid Weiss)
* LUCENE-7452: Block join query exception suggests how to find a doc, which
violates orthogonality requirement. (Mikhail Khludnev)
* LUCENE-7438: Renovate the Benchmark module's support for benchmarking highlighting. All
highlighters are supported via SearchTravRetHighlight. (David Smiley)
Build
* LUCENE-7292: Fix build to use "--release 8" instead of "-release 8" on
Java 9 (this changed with recent EA build b135). (Uwe Schindler)
======================= Lucene 6.2.1 =======================
API Changes
* LUCENE-7436: MinHashFilter's constructor, and some of its default
settings, should be public. (Doug Turnbull via Mike McCandless)
Bug Fixes
* LUCENE-7417: The standard Highlighter could throw an IllegalArgumentException when
trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one
term. (Thomas Kappler via David Smiley)
* LUCENE-7440: Document id skipping (PostingsEnum.advance) could throw an
ArrayIndexOutOfBoundsException on large index segments (>1.8B docs)
with large skips. (yonik)
* LUCENE-7442: MinHashFilter's ctor should validate its args.
(Cao Manh Dat via Steve Rowe)
* LUCENE-7318: Fix backwards compatibility issues around StandardAnalyzer
and its components, introduced with Lucene 6.2.0. The moved classes
were restored in their original packages: LowerCaseFilter and StopFilter,
as well as several utility classes. (Uwe Schindler, Mike McCandless)
======================= Lucene 6.2.0 =======================
API Changes
* ScoringWrapperSpans was removed since it had no purpose or effect as of Lucene 5.5.
New Features
* LUCENE-7388: Add point based IntRangeField, FloatRangeField, LongRangeField along with
supporting queries and tests (Nick Knize)
* LUCENE-7381: Add point based DoubleRangeField and RangeFieldQuery for
indexing and querying on Ranges up to 4 dimensions (Nick Knize)
* LUCENE-6968: LSH Filter (Tommaso Teofili, Andy Hind, Cao Manh Dat)
* LUCENE-7302: IndexWriter methods that change the index now return a
long "sequence number" indicating the effective equivalent
single-threaded execution order (Mike McCandless)
* LUCENE-7335: IndexWriter's commit data is now late binding,
recording key/values from a provided iterable based on when the
commit actually takes place (Mike McCandless)
* LUCENE-7287: UkrainianMorfologikAnalyzer is a new dictionary-based
analyzer for the Ukrainian language (Andriy Rysin via Mike
McCandless)
* LUCENE-7373: Directory.renameFile, which did both renaming and fsync
of the directory metadata, has been deprecated; use the new separate
methods Directory.rename and Directory.syncMetaData instead (Robert Muir,
Uwe Schindler, Mike McCandless)
* LUCENE-7355: Added Analyzer#normalize(), which only applies normalization to
an input string. (Adrien Grand)
* LUCENE-7380: Add Polygon.fromGeoJSON for more easily creating
Polygon instances from a standard GeoJSON string (Robert Muir, Mike
McCandless)
* LUCENE-7395: PerFieldSimilarityWrapper requires a default similarity
for calculating query norm and coordination factor in Lucene 6.x.
Lucene 7 will no longer have those factors. (Uwe Schindler, Sascha Markus)
* SOLR-9279: Queries module: new ComparisonBoolFunction base class
(Doug Turnbull via David Smiley)
Bug Fixes
* LUCENE-6662: Fixed potential resource leaks. (Rishabh Patel via Adrien Grand)
* LUCENE-7340: MemoryIndex.toString() could throw NPE; fixed. Renamed to toStringDebug().
(Daniel Collins, David Smiley)
* LUCENE-7382: Fix bug introduced by LUCENE-7355 that used the
wrong default AttributeFactory for new Tokenizers.
(Terry Smith, Uwe Schindler)
* LUCENE-7389: Fix FieldType.setDimensions(...) validation for the dimensionNumBytes
parameter. (Martijn van Groningen)
* LUCENE-7391: Fix performance regression in MemoryIndex's fields() introduced
in Lucene 6. (Steve Mason via David Smiley)
* LUCENE-7395, SOLR-9315: Fix PerFieldSimilarityWrapper to also delegate query
norm and coordination factor using a default similarity added as ctor param.
(Uwe Schindler, Sascha Markus)
* SOLR-9413: Fix analysis/kuromoji's CSVUtil.quoteEscape logic, add TestCSVUtil test.
(AppChecker, Christine Poerschke)
* LUCENE-7419: Fix performance bug with TokenStream.end(), where it would lookup
PositionIncrementAttribute every time. (Mike McCandless, Robert Muir)
Improvements
* LUCENE-7323: Compound file writing now verifies the incoming
sub-files' checksums and segment IDs, to catch hardware issues or
filesystem bugs earlier (Robert Muir, Mike McCandless)
* LUCENE-6766: Index time sorting has graduated from the misc module
to core, is much simpler to use, via
IndexWriter.setIndexSort, and now works with dimensional points.
(Adrien Grand, Mike McCandless)
* LUCENE-5931: Detect when an application tries to reopen an
IndexReader after (illegally) removing the old index and
reindexing (Vitaly Funstein, Robert Muir, Mike McCandless)
* LUCENE-6171: Lucene now passes the StandardOpenOption.CREATE_NEW
option when writing new files so the filesystem enforces our
write-once architecture, possibly catching externally caused
issues sooner (Robert Muir, Mike McCandless)
* LUCENE-7318: StandardAnalyzer has been moved from the analysis
module into core and is now the default analyzer in
IndexWriterConfig (Robert Muir, Mike McCandless)
* LUCENE-7345: RAMDirectory now enforces write-once files as well
(Robert Muir, Mike McCandless)
* LUCENE-7337: MatchNoDocsQuery now scores with 0 normalization factor
and empty boolean queries now rewrite to MatchNoDocsQuery instead of
vice versa (Jim Ferenczi via Mike McCandless)
* LUCENE-7359: Add equals() and hashCode() to Explanation (Alan Woodward)
* LUCENE-7353: ScandinavianFoldingFilterFactory and
ScandinavianNormalizationFilterFactory now implement MultiTermAwareComponent.
(Adrien Grand)
* LUCENE-2605: Add classic QueryParser option setSplitOnWhitespace() to
control whether to split on whitespace prior to text analysis. Default
behavior remains unchanged: split-on-whitespace=true. (Steve Rowe)
* LUCENE-7276: MatchNoDocsQuery now includes an optional reason for
why it was used (Jim Ferenczi via Mike McCandless)
* LUCENE-7355: AnalyzingQueryParser now only applies the subset of the analysis
chain that is about normalization for range/fuzzy/wildcard queries.
(Adrien Grand)
* LUCENE-7376: Add support for ToParentBlockJoinQuery to fast vector highlighter's
FieldQuery. (Martijn van Groningen)
* LUCENE-7385: Improve/fix assert messages in SpanScorer. (David Smiley)
* LUCENE-7393: Add ICUTokenizer option to parse Myanmar text as syllables instead of words,
because the ICU word-breaking algorithm has some issues. This allows for the previous
tokenization used before Lucene 5. (AM, Robert Muir)
* LUCENE-7409: Changed MMapDirectory's unmapping to work more safely, but still with
no guarantees. This uses a store-store barrier and yields the current thread
before unmapping to allow in-flight requests to finish. The new code no longer
uses WeakIdentityMap as it delegates all ByteBuffer reads through a new
ByteBufferGuard wrapper that is shared between all ByteBufferIndexInput clones.
(Robert Muir, Uwe Schindler)
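Illustration for LUCENE-6171 above: a minimal JDK-only sketch of how
StandardOpenOption.CREATE_NEW lets the filesystem enforce write-once files. A
second attempt to create the same path fails with FileAlreadyExistsException
instead of silently overwriting. The class, method and file names are
hypothetical; this is not Lucene's Directory code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class WriteOnceSketch {
  /** Creates the file and writes data; returns false if the file already existed. */
  static boolean createAndWrite(Path file, byte[] data) {
    try (OutputStream out = Files.newOutputStream(
        file, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
      out.write(data);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false; // the filesystem refused to create an existing file
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Writes the same path twice in a temp directory; returns {first, second} results. */
  static boolean[] demo() {
    try {
      Path dir = Files.createTempDirectory("write-once");
      Path file = dir.resolve("segment.bin");
      boolean first = createAndWrite(file, new byte[] {1});
      boolean second = createAndWrite(file, new byte[] {2});
      Files.delete(file);
      Files.delete(dir);
      return new boolean[] {first, second};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```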
Optimizations
* LUCENE-7330, LUCENE-7339: Speed up conjunction queries. (Adrien Grand)
* LUCENE-7356: SearchGroup tweaks. (Christine Poerschke)
* LUCENE-7351: Doc id compression for points. (Adrien Grand)
* LUCENE-7371: Point values are now better compressed using run-length
encoding. (Adrien Grand)
* LUCENE-7311: Cached term queries do not seek the terms dictionary anymore.
(Adrien Grand)
* LUCENE-7396, LUCENE-7399: Faster flush of points.
(Adrien Grand, Mike McCandless)
* LUCENE-7406: Automaton and PrefixQuery tweaks (fewer object (re)allocations).
(Christine Poerschke)
Other
* LUCENE-4787: Fixed some highlighting javadocs. (Michael Dodsworth via Adrien
Grand)
* LUCENE-7334: Update ASM dependency to 5.1. (Uwe Schindler)
* LUCENE-7346: Update forbiddenapis to version 2.2.
(Uwe Schindler)
* LUCENE-7360: Explanation.toHtml() is deprecated. (Alan Woodward)
* LUCENE-7372: Factor out an org.apache.lucene.search.FilterWeight class.
(Christine Poerschke, Adrien Grand, David Smiley)
* LUCENE-7384: Removed ScoringWrapperSpans, and tweaked SpanWeight.buildSimWeight() to
reuse the existing Similarity instead of creating a new one. (David Smiley)
======================= Lucene 6.1.0 =======================
New Features
* LUCENE-7099: Add LatLonDocValuesField.newDistanceSort to the sandbox.
(Robert Muir)
* LUCENE-7140: Add PlanetModel.bisection to spatial3d (Karl Wright via
Mike McCandless)
* LUCENE-7069: Add LatLonPoint.nearest, to find nearest N points to a
provided query point (Mike McCandless)
* LUCENE-7234: Added InetAddressPoint.nextDown/nextUp to easily generate range
queries with excluded bounds. (Adrien Grand)
* LUCENE-7300: The misc module now has a directory wrapper that uses hard-links if
applicable and supported when copying files from another FSDirectory in
Directory#copyFrom. (Simon Willnauer)
API Changes
* LUCENE-7184: Refactor LatLonPoint encoding methods to new GeoEncodingUtils
helper class in core geo package. Also refactors LatLonPointTests to
TestGeoEncodingUtils (Nick Knize)
* LUCENE-7163: refactor GeoRect, Polygon, and GeoUtils tests to geo
package in core (Nick Knize)
* LUCENE-7152: Refactor GeoUtils from lucene-spatial package to
core (Nick Knize)
* LUCENE-7141: Switch OfflineSorter's ByteSequencesReader to
BytesRefIterator (Mike McCandless)
* LUCENE-7150: Spatial3d gets useful APIs to create common shape
queries, matching LatLonPoint. (Karl Wright via Mike McCandless)
* LUCENE-7243: Removed the LeafReaderContext parameter from
QueryCachingPolicy#shouldCache. (Adrien Grand)
Optimizations
* LUCENE-7071: Reduce bytes copying in OfflineSorter, giving ~10%
speedup on merging 2D LatLonPoint values (Mike McCandless)
* LUCENE-7105, LUCENE-7215: Optimize LatLonPoint's newDistanceQuery.
(Robert Muir)
* LUCENE-7097: IntroSorter now recurses to 2 * log_2(count) quicksort
stack depth before switching to heapsort (Adrien Grand, Mike McCandless)
* LUCENE-7115: Speed up FieldCache.CacheEntry toString by setting initial
StringBuilder capacity (Gregory Chanan)
* LUCENE-7147: Improve disjoint check for geo distance query traversal
(Ryan Ernst, Robert Muir, Mike McCandless)
* LUCENE-7153: GeoPointField and LatLonPoint polygon queries now support
multiple polygons and holes, with memory usage independent of
polygon complexity. (Karl Wright, Mike McCandless, Robert Muir)
* LUCENE-7159: Speed up LatLonPoint polygon performance. (Robert Muir, Ryan Ernst)
* LUCENE-7211: Reduce memory & GC for spatial RPT Intersects when the number of
matching docs is small. (Jeff Wartes, David Smiley)
* LUCENE-7235: LRUQueryCache should not take a lock for segments that it will
not cache on anyway. (Adrien Grand)
* LUCENE-7238: Explicitly disable the query cache in MemoryIndex#createSearcher.
(Adrien Grand)
* LUCENE-7237: LRUQueryCache now prefers returning an uncached Scorer over
waiting on a lock. (Adrien Grand)
* LUCENE-7261, LUCENE-7262, LUCENE-7264, LUCENE-7258: Speed up DocIdSetBuilder
(which is used by TermsQuery, multi-term queries and several point queries).
(Adrien Grand, Jeff Wartes, David Smiley)
* LUCENE-7299: Speed up BytesRefHash.sort() using radix sort. (Adrien Grand)
* LUCENE-7306: Speed up points indexing and merging using radix sort.
(Adrien Grand)
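Illustration for LUCENE-7097 above: a minimal sketch of an introsort over an
int[] that allows 2 * log_2(count) levels of quicksort before switching to
heapsort. Rounding the logarithm down, the middle-element pivot and all names
are assumptions made for this example; it is not Lucene's IntroSorter.

```java
final class IntroSortSketch {
  /** Quicksort depth budget: 2 * floor(log2(count)), or 0 for fewer than 2 elements. */
  static int maxDepth(int count) {
    if (count < 2) {
      return 0;
    }
    return 2 * (31 - Integer.numberOfLeadingZeros(count));
  }

  static void sort(int[] a) {
    sort(a, maxDepth(a.length));
  }

  /** Sorts with an explicit depth budget; a budget of 0 means heapsort only. */
  static void sort(int[] a, int depth) {
    quicksort(a, 0, a.length, depth);
  }

  // Sorts the half-open range a[from, to).
  private static void quicksort(int[] a, int from, int to, int depth) {
    while (to - from > 1) {
      if (depth-- == 0) {
        heapsort(a, from, to); // budget exhausted: fall back to heapsort
        return;
      }
      int pivot = a[from + ((to - from) >>> 1)];
      int i = from;
      int j = to - 1;
      while (i <= j) {
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) {
          swap(a, i++, j--);
        }
      }
      quicksort(a, from, j + 1, depth); // recurse on the left part
      from = i;                         // continue with the right part
    }
  }

  private static void heapsort(int[] a, int from, int to) {
    int n = to - from;
    for (int i = n / 2 - 1; i >= 0; i--) {
      siftDown(a, from, i, n);
    }
    for (int end = n - 1; end > 0; end--) {
      swap(a, from, from + end);
      siftDown(a, from, 0, end);
    }
  }

  // Restores the max-heap property below root within the first n slots of the range.
  private static void siftDown(int[] a, int from, int root, int n) {
    while (true) {
      int child = 2 * root + 1;
      if (child >= n) return;
      if (child + 1 < n && a[from + child + 1] > a[from + child]) child++;
      if (a[from + root] >= a[from + child]) return;
      swap(a, from + root, from + child);
      root = child;
    }
  }

  private static void swap(int[] a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}
```

The depth budget bounds the worst case at O(n log n): a run of bad pivots can
consume only a logarithmic number of levels before heapsort takes over.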
Bug Fixes
* LUCENE-7127: Fix corner case bugs in GeoPointDistanceQuery. (Robert Muir)
* LUCENE-7166: Fix corner case bugs in LatLonPoint/GeoPointField bounding box
queries. (Robert Muir)
* LUCENE-7168: Switch to stable encode for geo3d, remove quantization
test leniency, remove dead code (Mike McCandless)
* LUCENE-7301: Multiple doc values updates to the same document within
one update batch could be applied in the wrong order resulting in
the wrong updated value (Ishan Chattopadhyaya, hossman, Mike McCandless)
* LUCENE-7312: Fix geo3d's x/y/z double to int encoding to ensure it always
rounds down (Karl Wright, Mike McCandless)
* LUCENE-7132: BooleanQuery sometimes assigned too-low scores in cases
where ranges of documents had only a single clause matching while
other ranges had more than one clause matching (Ahmet Arslan,
hossman, Mike McCandless)
* LUCENE-7286: Added support for highlighting SynonymQuery. (Adrien Grand)
* LUCENE-7291: Spatial heatmap faceting could mis-count when the heatmap crosses the
dateline and indexed non-point shapes are much bigger than the heatmap region.
(David Smiley)
* LUCENE-7333: Fix test bug where randomSimpleString() generated a filename
that is a reserved device name on Windows. (Uwe Schindler, Mike McCandless)
Other
* LUCENE-7295: TermAutomatonQuery.hashCode calculates Automaton.toDot().hash,
equivalence relationship replaced with object identity. (Dawid Weiss)
* LUCENE-7277: Make Query.hashCode and Query.equals abstract. (Paul Elschot,
Dawid Weiss)
* LUCENE-7174: Upgrade randomizedtesting to 2.3.4. (Uwe Schindler, Dawid Weiss)
* LUCENE-7205: Remove repeated nl.getLength() calls in
(Boolean|DisjunctionMax|FuzzyLikeThis)QueryBuilder. (Christine Poerschke)
* LUCENE-7210: Make TestCore*Parser's analyzer choice override-able
(Christine Poerschke, Daniel Collins)
* LUCENE-7263: Make queryparser/xml/CoreParser's SpanQueryBuilderFactory
accessible to deriving classes. (Daniel Collins via Christine Poerschke)
* SOLR-9109/SOLR-9121: Allow specification of a custom Ivy settings file via system
property "ivysettings.xml". (Misha Dmitriev, Christine Poerschke, Uwe Schindler, Steve Rowe)
* LUCENE-7206: Improve the ToParentBlockJoinQuery's explain by including the explain
of the best matching child doc. (Ilya Kasnacheev, Jeff Evans via Martijn van Groningen)
* LUCENE-7307: Add getters to the PointInSetQuery and PointRangeQuery queries.
(Martijn van Groningen, Adrien Grand)
Build
* LUCENE-7292: Use '-release' instead of '-source/-target' during
compilation on Java 9+ to ensure real cross-compilation.
(Uwe Schindler)
* LUCENE-7296: Update forbiddenapis to version 2.1.
(Uwe Schindler)
======================= Lucene 6.0.1 =======================
New Features
* LUCENE-7278: Spatial-extras DateRangePrefixTree's Calendar is now configurable, to
e.g. clear the Gregorian Change Date. Also, toString(cal) is now identical to
DateTimeFormatter.ISO_INSTANT. (David Smiley)
Bug Fixes
* LUCENE-7187: Block join queries' Weight#extractTerms(...) implementations
should delegate to the wrapped weight. (Martijn van Groningen)
* LUCENE-7209: Fixed explanations of FunctionScoreQuery. (Adrien Grand)
* LUCENE-7232: Fixed InetAddressPoint.newPrefixQuery, which was generating an
incorrect query when the prefix length was not a multiple of 8. (Adrien Grand)
* LUCENE-7279: JapaneseTokenizer throws ArrayIndexOutOfBoundsException
on some valid inputs (Mike McCandless)
* LUCENE-7188: remove incorrect sanity check in NRTCachingDirectory.listAll()
that led to IllegalStateException being thrown when nothing was wrong.
(David Smiley, yonik)
* LUCENE-7219: Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders
match the underlying queries' (lower|upper)Term optionality logic.
(Kaneshanathan Srivisagan, Christine Poerschke)
* LUCENE-7257: Fixed PointValues#size(IndexReader, String), docCount,
minPackedValue and maxPackedValue to skip leaves that do not have points
rather than raising an IllegalStateException. (Adrien Grand)
* LUCENE-7284: GapSpans needs to implement positionsCost(). (Daniel Bigham, Alan
Woodward)
* LUCENE-7231: WeightedSpanTermExtractor didn't deal correctly with single-term
phrase queries. (Eva Popenda, Alan Woodward)
* LUCENE-7293: Don't try to highlight GeoPoint queries (Britta Weber,
Nick Knize, Mike McCandless, Uwe Schindler)
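Illustration for LUCENE-7232 above: a minimal JDK-only sketch of turning an
address plus a prefix length into inclusive lower and upper bounds, handling
prefix lengths that are not a multiple of 8 by masking individual bits. The
class and method names are hypothetical, and the code works on raw address
bytes rather than on Lucene's InetAddressPoint.

```java
final class PrefixRangeSketch {
  /**
   * Returns {lower, upper}: the address with every bit after the first
   * prefixLength bits cleared (lower) or set (upper).
   */
  static byte[][] prefixRange(byte[] address, int prefixLength) {
    int totalBits = 8 * address.length;
    if (prefixLength < 0 || prefixLength > totalBits) {
      throw new IllegalArgumentException("illegal prefixLength: " + prefixLength);
    }
    byte[] lower = address.clone();
    byte[] upper = address.clone();
    for (int bit = prefixLength; bit < totalBits; bit++) {
      int mask = 1 << (7 - (bit & 7)); // bit position inside its byte, MSB first
      lower[bit >> 3] &= ~mask;
      upper[bit >> 3] |= mask;
    }
    return new byte[][] {lower, upper};
  }
}
```

For example, 192.168.5.77 with a prefix length of 20 yields the range
192.168.0.0 to 192.168.15.255: the top four bits of the third byte belong to
the prefix and the bottom four do not.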
Documentation
* LUCENE-7223: Improve XXXPoint javadocs to make it clear that you
should separately add StoredField if you want to retrieve these
field values at search time (Greg Huber, Robert Muir, Mike McCandless)
======================= Lucene 6.0.0 =======================
System Requirements
* LUCENE-5950: Move to Java 8 as minimum Java version.
(Ryan Ernst, Uwe Schindler)
* LUCENE-6069: Lucene Core now gets compiled with Java 8 "compact1" profile,
all other modules with "compact2". (Robert Muir, Uwe Schindler)
New Features
* LUCENE-6631: Lucene Document classification (Tommaso Teofili, Alessandro Benedetti)
* LUCENE-6747: FingerprintFilter is a TokenFilter that outputs a single
token which is a concatenation of the sorted and de-duplicated set of
input tokens. Useful for normalizing short text in clustering/linking
tasks. (Mark Harwood, Adrien Grand)
* LUCENE-5735: NumberRangePrefixTreeStrategy now includes interval/range faceting
for counting ranges that align with the underlying terms as defined by the
NumberRangePrefixTree (e.g. familiar date units like days). (David Smiley)
* LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average field
length computations, to avoid skew from documents that don't have the field.
(Ahmet Arslan via Robert Muir)
* LUCENE-6758: Use docCount+1 for DefaultSimilarity's IDF, so that queries
containing nonexistent fields won't screw up querynorm. (Terry Smith, Robert Muir)
* SOLR-7876: The QueryTimeout interface now has an isTimeoutEnabled method
that can return false to exit from ExitableDirectoryReader wrapping at
the point fields() is called. (yonik)
* LUCENE-6825: Add low-level support for block-KD trees (Mike McCandless)
* LUCENE-6852, LUCENE-6975: Add support for points (dimensionally
indexed values) to index, document and codec APIs, including a
simple text implementation. (Mike McCandless)
* LUCENE-6861: Create Lucene60Codec, supporting points.
(Mike McCandless)
* LUCENE-6879: Allow defining custom CharTokenizer instances without
subclassing using Java 8 lambdas or method references. (Uwe Schindler)
* LUCENE-6881: Cutover all BKD implementations to points
(Mike McCandless)
* LUCENE-6837: Add N-best output support to JapaneseTokenizer.
(Hiroharu Konno via Christian Moen)
* LUCENE-6962: Add per-dimension min/max to points
(Mike McCandless)
* LUCENE-6975: Add ExactPointQuery, to match a single N-dimensional
point (Robert Muir, Mike McCandless)
* LUCENE-6989: Add preliminary support for MMapDirectory unmapping in Java 9.
(Uwe Schindler, Chris Hegarty, Peter Levart)
* LUCENE-7040: Upgrade morfologik-stemming to version 2.1.0.
(Dawid Weiss)
* LUCENE-7048: Add XXXPoint.newSetQuery, to create a query that
efficiently matches all documents containing any of the specified
point values. This is the analog of TermsQuery, but for points
instead. (Adrien Grand, Robert Muir, Mike McCandless)
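Illustration for LUCENE-6747 above: a minimal JDK-only sketch of the
fingerprint idea, in which the input tokens are de-duplicated, sorted and
joined into one output token. The class name, method signature and separator
parameter are assumptions for this example; the real FingerprintFilter operates
on a TokenStream.

```java
import java.util.List;
import java.util.TreeSet;

final class FingerprintSketch {
  /** Joins the sorted, de-duplicated tokens into a single fingerprint string. */
  static String fingerprint(List<String> tokens, char separator) {
    // TreeSet both removes duplicates and keeps the tokens in sorted order.
    return String.join(String.valueOf(separator), new TreeSet<>(tokens));
  }
}
```

Two short texts that contain the same words in a different order, or with
repeats, map to the same fingerprint, which is what makes the token useful as a
clustering or linking key.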
API Changes
* LUCENE-7094: BBoxStrategy and PointVectorStrategy now support
PointValues (in addition to legacy numeric trie). Their APIs
were changed a little and also made more consistent. PointValues/Trie
is optional, DocValues is optional, stored value is optional.
(Nick Knize, David Smiley)
* LUCENE-6067: Accountable.getChildResources has a default
implementation returning the empty list. (Robert Muir)
* LUCENE-6583: FilteredQuery has been removed. Instead, you can construct a
BooleanQuery with one MUST clause for the query, and one FILTER clause for
the filter. (Adrien Grand)
* LUCENE-6651: AttributeImpl#reflectWith(AttributeReflector) was made
abstract and has no reflection-based default implementation anymore.
(Uwe Schindler)
* LUCENE-6706: PayloadTermQuery and PayloadNearQuery have been removed.
Instead, use PayloadScoreQuery to wrap any SpanQuery. (Alan Woodward)
* LUCENE-6829: OfflineSorter, and the classes that use it (suggesters,
hunspell) now do all temporary file IO via Directory instead of
directly through java's temp dir. Directory.createTempOutput
creates a uniquely named IndexOutput, and the new
IndexOutput.getName returns its name (Dawid Weiss, Robert Muir, Mike
McCandless)
* LUCENE-6917: Deprecate and rename NumericXXX classes to
LegacyNumericXXX in favor of points (Mike McCandless)
* LUCENE-6947: SortField.missingValue is now protected. You can read its
value using the new SortField.getMissingValue getter. (Adrien Grand)
* LUCENE-7028: Remove duplicate method in LegacyNumericUtils.
(Uwe Schindler)
* LUCENE-7052, LUCENE-7053: Remove custom comparators from BytesRef
class and solely use natural byte[] comparator throughout codebase.
This also simplifies API of BytesRefHash. It also replaces the natural
comparator in ArrayUtil by Java 8's Comparator#naturalOrder().
(Mike McCandless, Uwe Schindler, Robert Muir)
* LUCENE-7060: Update Spatial4j to 0.6. The package com.spatial4j.core
is now org.locationtech.spatial4j. (David Smiley)
* LUCENE-7058: Add getters to various Query implementations (Guillaume Smet via
Alan Woodward)
* LUCENE-7064: MultiPhraseQuery is now immutable and should be constructed
with MultiPhraseQuery.Builder. (Luc Vanlerberghe via Adrien Grand)
* LUCENE-7072: Geo3DPoint always uses WGS84 planet model.
(Robert Muir, Mike McCandless)
* LUCENE-7056: Geo3D classes are in different packages now. (David Smiley)
* LUCENE-6952: These classes are now abstract: FilterCodecReader, FilterLeafReader,
FilterCollector, FilterDirectory. And some Filter* classes in
lucene-test-framework too. (David Smiley)
* SOLR-8867: FunctionValues.getRangeScorer now takes a LeafReaderContext instead
of an IndexReader, and avoids matching documents without a value in the field
for numeric fields. (yonik)
Optimizations
* LUCENE-6891: Use prefix coding when writing points in
each leaf block in the default codec, to reduce the index
size (Mike McCandless)
* LUCENE-6901: Optimize points indexing: use faster
IntroSorter instead of InPlaceMergeSorter, and specialize 1D
merging to merge sort the already sorted segments instead of
re-indexing (Mike McCandless)
* LUCENE-6793: LegacyNumericRangeQuery.hashCode() is now less subject to hash
collisions. (J.B. Langston via Adrien Grand)
* LUCENE-7050: TermsQuery is now cached more aggressively by the default
query caching policy. (Adrien Grand)
* LUCENE-7066: PointRangeQuery got optimized for the case that all documents
have a value and all points from the segment match. (Adrien Grand)
Changes in Runtime Behavior
* LUCENE-6789: IndexSearcher's default Similarity is changed to BM25Similarity.
Use ClassicSimilarity to get the old vector space DefaultSimilarity. (Robert Muir)
* LUCENE-6886: Reserve the .tmp file name extension for temp files,
and codec components are no longer allowed to use this extension
(Robert Muir, Mike McCandless)
* LUCENE-6835: Directory.listAll now returns entries in sorted order,
to not leak platform-specific behavior, and "retrying file deletion"
is now the responsibility of Directory.deleteFile, not the caller.
(Robert Muir, Mike McCandless)
Tests
* LUCENE-7009: Add expectThrows utility to LuceneTestCase. This uses a lambda
expression to encapsulate a statement that is expected to throw an exception.
(Ryan Ernst)
Bug Fixes
* LUCENE-7065: Fix the explain for the global ordinals join query. Previously, the
explain would also indicate that non-matching documents would match.
On top of that, with score mode average, the explain would fail with an NPE.
(Martijn van Groningen)
* LUCENE-7101: OfflineSorter had O(N^2) merge cost, and used too many
temporary file descriptors, for large sorts (Mike McCandless)
* LUCENE-7111: DocValuesRangeQuery.newLongRange behaves incorrectly for
Long.MAX_VALUE and Long.MIN_VALUE (Ishan Chattopadhyaya via Steve Rowe)
* LUCENE-7139: Fix bugs in geo3d's Vincenty surface distance
implementation (Karl Wright via Mike McCandless)
* LUCENE-7112: WeightedSpanTermExtractor.extractUnknownQuery is only called
on queries that could not be extracted. (Adrien Grand)
* LUCENE-7126: Remove GeoPointDistanceRangeQuery. This query was implemented
with boolean NOT, and incorrect for multi-valued documents. (Robert Muir)
* LUCENE-7158: Consistently use earth's WGS84 mean radius wherever our
geo search implementations approximate the earth as a sphere (Karl
Wright via Mike McCandless)
Other
* LUCENE-7035: Upgrade icu4j to 56.1/unicode 8. (Robert Muir)
* LUCENE-7087: Let MemoryIndex#fromDocument(...) accept 'Iterable<? extends IndexableField>'
as document instead of 'Document'. (Martijn van Groningen)
* LUCENE-7091: Add doc values support to MemoryIndex
(Martijn van Groningen, David Smiley)
* LUCENE-7093: Add point values support to MemoryIndex
(Martijn van Groningen, Mike McCandless)
* LUCENE-7095: Add point values support to the numeric field query time join.
(Martijn van Groningen, Mike McCandless)
======================= Lucene 5.5.5 =======================
Changes in Runtime Behavior
* Resolving of external entities in queryparser/xml/CoreParser is disallowed
by default. See SOLR-11477 for details.
Bug Fixes
* LUCENE-7419: Fix performance bug with TokenStream.end(), where it would look up
PositionIncrementAttribute every time. (Mike McCandless, Robert Muir)
* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
======================= Lucene 5.5.4 =======================
Bug Fixes
* LUCENE-7417: The standard Highlighter could throw an IllegalArgumentException when
trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one
term. (Thomas Kappler via David Smiley)
* LUCENE-7657: Fixed potential memory leak in the case that a (Span)TermQuery
with a TermContext is cached. (Adrien Grand)
* LUCENE-7647: Made stored fields reclaim native memory more aggressively when
configured with BEST_COMPRESSION. This could otherwise result in out-of-memory
issues. (Adrien Grand)
* LUCENE-7562: CompletionFieldsConsumer sometimes throws
NullPointerException on ghost fields (Oliver Eilhard via Mike McCandless)
* LUCENE-7547: JapaneseTokenizerFactory was failing to close the
dictionary file it opened (Markus via Mike McCandless)
* LUCENE-6914: Fixed DecimalDigitFilter in case of supplementary code points.
(Hossman)
* LUCENE-7440: Document id skipping (PostingsEnum.advance) could throw an
ArrayIndexOutOfBoundsException on large index segments (>1.8B docs)
with large skips. (yonik)
* LUCENE-7570: IndexWriter may deadlock if a commit is running while
there are too many merges running and one of the merges hits a
tragic exception (Joey Echeverria via Mike McCandless)
Other
* LUCENE-6989: Backport MMapDirectory's unmapping code from Lucene 6.4 to use
MethodHandles. This allows it to work with Java 9 (EA build 150 and later).
(Uwe Schindler)
Build
* LUCENE-7543: Make changes-to-html target an offline operation, by moving the
Lucene and Solr DOAP RDF files into the Git source repository under
dev-tools/doap/ and then pulling release dates from those files, rather than
from JIRA. (Mano Kovacs, hossman, Steve Rowe)
* LUCENE-7596: Update Groovy to version 2.4.8 to allow building with Java 9
build 148+. Also update JGit version for working-copy checks. This does not
fix all issues with Java 9, but allows building the distribution.
(Uwe Schindler)
* LUCENE-7651: Backport (Lucene 6.4.1) fix for Java 8u121 to allow documentation
build to inject "Google Code Prettify" without adding Javascript to Javadoc's
-bottom parameter. Unfortunately, this fix disables Prettify if Javadocs are
built with Java 7, as there is no generic way in Java 7 to inject Javascript
without breaking Java 8 (and possibly paid Java 7 security updates). This
fix also updates Prettify to latest version to work around a Google Chrome
issue. (Uwe Schindler)
======================= Lucene 5.5.3 =======================
(No Changes)
======================= Lucene 5.5.2 =======================
Bug Fixes
* LUCENE-7065: Fix the explain for the global ordinals join query. Previously, the
explain would also indicate that non-matching documents would match.
On top of that, with score mode average, the explain would fail with an NPE.
(Martijn van Groningen)
* LUCENE-7111: DocValuesRangeQuery.newLongRange behaves incorrectly for
Long.MAX_VALUE and Long.MIN_VALUE (Ishan Chattopadhyaya via Steve Rowe)
* LUCENE-7139: Fix bugs in geo3d's Vincenty surface distance
implementation (Karl Wright via Mike McCandless)
* LUCENE-7187: Block join queries' Weight#extractTerms(...) implementations
should delegate to the wrapped weight. (Martijn van Groningen)
* LUCENE-7279: JapaneseTokenizer throws ArrayIndexOutOfBoundsException
on some valid inputs (Mike McCandless)
* LUCENE-7219: Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders
match the underlying queries' (lower|upper)Term optionality logic.
(Kaneshanathan Srivisagan, Christine Poerschke)
* LUCENE-7284: GapSpans needs to implement positionsCost(). (Daniel Bigham, Alan
Woodward)
* LUCENE-7231: WeightedSpanTermExtractor didn't deal correctly with single-term
phrase queries. (Eva Popenda, Alan Woodward)
* LUCENE-7301: Multiple doc values updates to the same document within
one update batch could be applied in the wrong order resulting in
the wrong updated value (Ishan Chattopadhyaya, hossman, Mike McCandless)
* LUCENE-7132: BooleanQuery sometimes assigned too-low scores in cases
where ranges of documents had only a single clause matching while
other ranges had more than one clause matching (Ahmet Arslan,
hossman, Mike McCandless)
* LUCENE-7291: Spatial heatmap faceting could mis-count when the heatmap crosses the
dateline and indexed non-point shapes are much bigger than the heatmap region.
(David Smiley)
======================= Lucene 5.5.1 =======================
Bug Fixes
* LUCENE-7112: WeightedSpanTermExtractor.extractUnknownQuery is only called
on queries that could not be extracted. (Adrien Grand)
* LUCENE-7188: remove incorrect sanity check in NRTCachingDirectory.listAll()
that led to IllegalStateException being thrown when nothing was wrong.
(David Smiley, yonik)
* LUCENE-7209: Fixed explanations of FunctionScoreQuery. (Adrien Grand)
======================= Lucene 5.5.0 =======================
New Features
* LUCENE-5868: JoinUtil.createJoinQuery(..,NumericType,..) query-time join
for LONG and INT fields with NUMERIC and SORTED_NUMERIC doc values.
(Alexey Zelin via Mikhail Khludnev)
* LUCENE-6939: Add exponential reciprocal scoring to
BlendedInfixSuggester, to even more strongly favor suggestions that
match closer to the beginning (Arcadius Ahouansou via Mike McCandless)
* LUCENE-6958: Improved CustomAnalyzer to take class references to factories
as alternative to their SPI name. This enables compile-time safety when
defining an analyzer's components. (Uwe Schindler, Shai Erera)
* LUCENE-6818, LUCENE-6986: Add DFISimilarity implementing the divergence
from independence model. (Ahmet Arslan via Robert Muir)
* SOLR-4619: Added removeAllAttributes() to AttributeSource, which removes
all previously added attributes.
* LUCENE-7010: Added MergePolicyWrapper to allow easy wrapping of other policies.
(Shai Erera)
API Changes
* LUCENE-6997: refactor sandboxed GeoPointField and query classes to lucene-spatial
module under new lucene.spatial.geopoint package (Nick Knize)
* LUCENE-6908: GeoUtils static relational methods have been refactored to new
GeoRelationUtils and now correctly handle large irregular rectangles, and
pole crossing distance queries. (Nick Knize)
* LUCENE-6900: Grouping sortWithinGroup variables used to allow null to mean
Sort.RELEVANCE. Null is no longer permitted. (David Smiley)
* LUCENE-6919: The Scorer class has been refactored to expose an iterator
instead of extending DocIdSetIterator. asTwoPhaseIterator() has been renamed
to twoPhaseIterator() for consistency. (Adrien Grand)
* LUCENE-6973: TeeSinkTokenFilter no longer accepts a SinkFilter (the latter
has been removed). If you wish to filter the sinks, you can wrap them with
any other TokenFilter (e.g. a FilteringTokenFilter). Also, you can no longer
add a SinkTokenStream to an existing TeeSinkTokenFilter. If you need to
share multiple streams with a single sink, chain them with multiple
TeeSinkTokenFilters.
DateRecognizerSinkFilter was renamed to DateRecognizerFilter and moved under
analysis/common. TokenTypeSinkFilter was removed (use TypeTokenFilter instead).
TokenRangeSinkFilter was removed. (Shai Erera, Uwe Schindler)
* LUCENE-6980: Default applyAllDeletes to true when opening
near-real-time readers (Mike McCandless)
* LUCENE-6981: SpanQuery.getTermContexts() helper methods are now public, and
SpanScorer has a public getSpans() method. (Alan Woodward)
* LUCENE-6932: IndexInput.seek implementations now throw EOFException
if you seek beyond the end of the file (Adrien Grand, Mike McCandless)
* LUCENE-6988: IndexableField.tokenStream() no longer throws IOException
(Alan Woodward)
* LUCENE-7028: Deprecate a duplicate method in NumericUtils.
(Uwe Schindler)
Optimizations
* LUCENE-6930: Decouple GeoPointField from NumericType by using a custom
and efficient GeoPointTokenStream and TermEnum designed for GeoPoint prefix
terms. (Nick Knize)
* LUCENE-6951: Improve GeoPointInPolygonQuery using point orientation based
line crossing algorithm, and adding result for multi-value docs when at least
1 point satisfies polygon criteria. (Nick Knize)
* LUCENE-6889: BooleanQuery.rewrite now performs some query optimization, in
particular to rewrite queries that look like: "+*:* #filter" to a
"ConstantScore(filter)". (Adrien Grand)
* LUCENE-6912: Grouping's Collectors now calculate a response to needsScores()
instead of always 'true'. (David Smiley)
* LUCENE-6815: DisjunctionScorer now advances two-phased iterators lazily,
stopping evaluation as soon as a single one matches. The other iterators
will be confirmed lazily when computing score() or freq(). (Adrien Grand)
* LUCENE-6926: MUST_NOT clauses now use the match cost API to run the slow bits
last whenever possible. (Adrien Grand)
* LUCENE-6944: BooleanWeight no longer creates sub-scorers if BS1 is not
applicable. (Adrien Grand)
* LUCENE-6940: MUST_NOT clauses execute faster, especially when they are sparse.
(Adrien Grand)
* LUCENE-6470: Improve efficiency of TermsQuery constructors. (Robert Muir)
Bug Fixes
* LUCENE-6976: BytesRefTermAttributeImpl.copyTo NPE'ed if BytesRef was null.
Added equals & hashCode, and a new test for these things. (David Smiley)
* LUCENE-6932: RAMDirectory's IndexInput was failing to throw
EOFException in some cases (Stéphane Campinas, Adrien Grand via Mike
McCandless)
* LUCENE-6896: Don't treat the smallest possible norm value as an infinitely
long document in SimilarityBase or BM25Similarity. Add more warnings to sims
that will not work well with extreme tf values. (Ahmet Arslan, Robert Muir)
* LUCENE-6984: SpanMultiTermQueryWrapper no longer modifies its wrapped query.
(Alan Woodward, Adrien Grand)
* LUCENE-6998: Fix a couple places to better detect truncated index files
as corruption. (Robert Muir, Mike McCandless)
* LUCENE-7002: Fixed MultiCollector to not throw a NPE if setScorer is called
after one of the sub collectors is done collecting. (John Wang, Adrien Grand)
* LUCENE-7027: Fixed NumericTermAttribute to not throw IllegalArgumentException
after NumericTokenStream was exhausted. (Uwe Schindler, Lee Hinman,
Mike McCandless)
* LUCENE-7018: Fix GeoPointTermQueryConstantScoreWrapper to add document on
first GeoPointField match. (Nick Knize)
* LUCENE-7019: Add two-phase iteration to GeoPointTermQueryConstantScoreWrapper.
(Robert Muir via Nick Knize)
* LUCENE-6989: Improve MMapDirectory's unmapping checks to catch more non-working
cases. The unmap-hack does not yet work with recent Java 9. Official support
will come with Lucene 6. (Uwe Schindler)
Other
* LUCENE-6924: Upgrade randomizedtesting to 2.3.2. (Dawid Weiss)
* LUCENE-6920: Improve custom function checks in expressions module
to use MethodHandles and work without extra security privileges.
(Uwe Schindler, Robert Muir)
* LUCENE-6921: Fix SPIClassIterator#isParentClassLoader to not
require extra permissions. (Uwe Schindler)
* LUCENE-6923: Fix RamUsageEstimator to access private fields inside
AccessController block for computing size. (Robert Muir)
* LUCENE-6907: make TestParser extendable, rename test/.../xml/
NumericRangeQueryQuery.xml to NumericRangeQuery.xml
(Christine Poerschke)
* LUCENE-6925: add ForceMergePolicy class in test-framework
(Christine Poerschke)
* LUCENE-6945: factor out TestCorePlus(Queries|Extensions)Parser from
TestParser, rename TestParser to TestCoreParser (Christine Poerschke)
* LUCENE-6949: fix (potential) resource leak in SynonymFilterFactory
(https://scan.coverity.com/projects/5620 CID 120656)
(Christine Poerschke, Coverity Scan (via Rishabh Patel))
* LUCENE-6961: Improve Exception handling in AnalysisFactories /
AnalysisSPILoader: Don't wrap exceptions occurring in factory's
ctor inside InvocationTargetException. (Uwe Schindler)
* LUCENE-6965: Expression's JavascriptCompiler now throws ParseException
with bad function names or bad arity instead of IllegalArgumentException.
(Tomás Fernández Löbbe, Uwe Schindler, Ryan Ernst)
* LUCENE-6964: String-based signatures in JavascriptCompiler replaced
with better compile-time-checked MethodType; generated class files
are no longer marked as synthetic. (Uwe Schindler)
* LUCENE-6978: Refactor several code places that look up locales
by string name to use BCP47 locale tag instead. LuceneTestCase
now also prints locales on failing tests this way.
Locale#forLanguageTag() and Locale#toString() were placed on list
of forbidden signatures. (Uwe Schindler, Robert Muir)
* LUCENE-6988: You can now add IndexableFields directly to a MemoryIndex,
and create a MemoryIndex from a Lucene Document. (Alan Woodward)
* LUCENE-7005: TieredMergePolicy tweaks (>= vs. >, @see get vs. set)
(Christine Poerschke)
* LUCENE-7006: increase BaseMergePolicyTestCase use (TestNoMergePolicy and
TestSortingMergePolicy now extend it, TestUpgradeIndexMergePolicy added)
(Christine Poerschke)
======================= Lucene 5.4.1 =======================
Bug Fixes
* LUCENE-6910: fix 'if ... > Integer.MAX_VALUE' check in
(Binary|Numeric)DocValuesFieldUpdates.merge
(https://scan.coverity.com/projects/5620 CID 119973 and CID 120081)
(Christine Poerschke, Coverity Scan (via Rishabh Patel))
* LUCENE-6946: SortField.equals now takes the missingValue parameter into
account. (Adrien Grand)
* LUCENE-6918: LRUQueryCache.onDocIdSetEviction is only called when at least
one DocIdSet is being evicted. (Adrien Grand)
* LUCENE-6929: Fix SpanNotQuery rewriting to not drop the pre/post parameters.
(Tim Allison via Adrien Grand)
* LUCENE-6950: Fix FieldInfos handling of UninvertingReader, e.g. do not
hide the true docvalues update generation or other properties.
(Ishan Chattopadhyaya via Robert Muir)
* LUCENE-6948: Fix ArrayIndexOutOfBoundsException in PagedBytes$Reader.fill
by removing an unnecessary long-to-int cast.
(Michael Lawley via Christine Poerschke)
* SOLR-7865: BlendedInfixSuggester was returning too many results
(Arcadius Ahouansou via Mike McCandless)
* LUCENE-6970: Fixed off-by-one error in Lucene54DocValuesProducer that could
potentially corrupt doc values. (Adrien Grand)
* LUCENE-2229: Fix Highlighter's SimpleSpanFragmenter when multiple adjacent
stop words following a span can unduly make the fragment way too long.
(Elmer Garduno, Lukhnos Liu via David Smiley)
======================= Lucene 5.4.0 =======================
New Features
* LUCENE-6875: New Serbian Filter. (Nikola Smolenski via Robert Muir,
Dawid Weiss)
* LUCENE-6720: New FunctionRangeQuery wrapper around ValueSourceScorer
(returned from ValueSource/FunctionValues.getRangeScorer()). (David Smiley)
* LUCENE-6724: Add utility APIs to GeoHashUtils to compute neighbor
geohash cells (Nick Knize via Mike McCandless).
* LUCENE-6737: Add DecimalDigitFilter which folds unicode digits to basic latin.
(Robert Muir)
* LUCENE-6699: Add integration of BKD tree and geo3d APIs to give
fast, very accurate query to find all indexed points within an
earth-surface shape (Karl Wright, Mike McCandless)
* LUCENE-6838: Added IndexSearcher#getQueryCache and #getQueryCachingPolicy.
(Adrien Grand)
* LUCENE-6844: PayloadScoreQuery can include or exclude underlying span scores
from its score calculations (Bill Bell, Alan Woodward)
* LUCENE-6778: Add GeoPointDistanceRangeQuery, to search for points
within a "ring" (beyond a minimum distance and below a maximum
distance) (Nick Knize via Mike McCandless)
* LUCENE-6874: Add a new UnicodeWhitespaceTokenizer to analysis/common
that uses Unicode character properties extracted from ICU4J to tokenize
text on whitespace. This tokenizer will split on non-breaking
space (NBSP), too. (David Smiley, Uwe Schindler, Steve Rowe)
API Changes
* LUCENE-6590: Query.setBoost(), Query.getBoost() and Query.clone() are gone.
In order to apply boosts, you now need to wrap queries in a BoostQuery.
(Adrien Grand)
* LUCENE-6716: SpanPayloadCheckQuery now takes a List<BytesRef> rather than
a Collection<byte[]>. (Alan Woodward)
* LUCENE-6489: The various span payload queries have been moved to the queries
submodule, and PayloadSpanUtil is now in sandbox. (Alan Woodward)
* LUCENE-6650: The spatial module no longer uses Filter in any way. All
spatial Filters now subclass Query. The spatial heatmap/facet API
now accepts a Bits parameter to filter counts. (David Smiley, Adrien Grand)
* LUCENE-6803: Deprecate sandbox Regexp Query. (Uwe Schindler)
* LUCENE-6301: org.apache.lucene.search.Filter is now deprecated. You should use
Query objects instead of Filters, and the BooleanClause.Occur.FILTER clause in
order to let Lucene know that a Query should be used for filtering but not
scoring.
* LUCENE-6939: SpanOrQuery.addClause is now deprecated, clauses should all be
provided at construction time. (Paul Elschot via Adrien Grand)
* LUCENE-6855: CachingWrapperQuery is deprecated and will be removed in 6.0.
(Adrien Grand)
* LUCENE-6870: DisjunctionMaxQuery#add is now deprecated, clauses should all be
provided at construction time. (Adrien Grand)
* LUCENE-6884: Analyzer.tokenStream() and Tokenizer.setReader() are no longer
declared as throwing IOException. (Alan Woodward)
* LUCENE-6849: Expose IndexWriter.flush() method, to move all
in-memory segments to disk without opening a near-real-time reader
nor calling fsync (Robert Muir, Simon Willnauer, Mike McCandless)
* LUCENE-6911: Add correct StandardQueryParser.getMultiFields() method,
deprecate no-op StandardQueryParser.getMultiFields(CharSequence[]) method.
(Christine Poerschke, Mikhail Khludnev, Coverity Scan (via Rishabh Patel))
Optimizations
* LUCENE-6708: TopFieldCollector does not compute the score several times on the
same document anymore. (Adrien Grand)
* LUCENE-6720: ValueSourceScorer, returned from
FunctionValues.getRangeScorer(), now uses TwoPhaseIterator. (David Smiley)
* LUCENE-6756: MatchAllDocsQuery now has a dedicated BulkScorer for better
performance when used as a top-level query. (Adrien Grand)
* LUCENE-6746: DisjunctionMaxQuery, BoostingQuery and BoostedQuery now create
sub weights through IndexSearcher so that they can be cached. (Adrien Grand)
* LUCENE-6754: Optimized IndexSearcher.count for the cases when it can use
index statistics instead of collecting all matches. (Adrien Grand)
* LUCENE-6773: Nested conjunctions now iterate over documents as if clauses
were all at the same level. (Adrien Grand)
* LUCENE-6777: Reuse BytesRef when visiting term ranges in
GeoPointTermsEnum to reduce GC pressure (Nick Knize via Mike
McCandless)
* LUCENE-6779: Reduce memory allocated by CompressingStoredFieldsWriter to write
strings larger than 64kb by an amount equal to string's utf8 size.
(Dawid Weiss, Robert Muir, shalin)
* LUCENE-6850: Optimize BooleanScorer for sparse clauses. (Adrien Grand)
* LUCENE-6840: Ordinal indexes for SORTED_SET/SORTED_NUMERIC fields and
addresses for BINARY fields are now stored on disk instead of in memory.
(Adrien Grand)
* LUCENE-6878: Speed up TopDocs.merge. (Daniel Jelinski via Adrien Grand)
* LUCENE-6885: StandardDirectoryReader (initialCapacity) tweaks
(Christine Poerschke)
* LUCENE-6863: Optimized storage requirements of doc values fields when less
than 1% of documents have a value. (Adrien Grand)
* LUCENE-6892: various lucene.index initialCapacity tweaks
(Christine Poerschke)
* LUCENE-6276: Added TwoPhaseIterator.matchCost() which allows confirming the
least costly TwoPhaseIterators first. (Paul Elschot via Adrien Grand)
* LUCENE-6898: In the default codec, the last stored field value will not
be fully read from disk if the supplied StoredFieldVisitor doesn't want it.
So put your largest text field value last to benefit. (David Smiley)
* LUCENE-6909: Remove unnecessary synchronized from
FacetsConfig.getDimConfig for better concurrency (Sanne Grinovero
via Mike McCandless)
* SOLR-7730: Speed up SlowCompositeReaderWrapper.getSortedSetDocValues() by
avoiding merging FieldInfos just to check doc value type.
(Paul Vasilyev, Yuriy Pakhomov, Mikhail Khludnev, yonik)
Bug Fixes
* LUCENE-6905: Unwrap center longitude for dateline crossing
GeoPointDistanceQuery. (Nick Knize)
* LUCENE-6817: ComplexPhraseQueryParser.ComplexPhraseQuery does not display
slop in toString(). (Ahmet Arslan via Dawid Weiss)
* LUCENE-6730: Hyper-parameter c is ignored in term frequency NormalizationH1.
(Ahmet Arslan via Robert Muir)
* LUCENE-6742: Lovins & Finnish implementation of SnowballFilter was
fixed to behave exactly as specified. A bug in the snowball compiler
caused differences in output of the filter in comparison to the original
test data. In addition, the performance of those filters was improved
significantly. (Uwe Schindler, Robert Muir)
* LUCENE-6783: Removed side effects from FuzzyLikeThisQuery.rewrite.
(Adrien Grand)
* LUCENE-6776: Fix geo3d math to handle randomly squashed planet
models (Karl Wright via Mike McCandless)
* LUCENE-6792: Fix TermsQuery.toString() to work with binary terms.
(Ruslan Muzhikov, Robert Muir)
* LUCENE-5503: When Highlighter's WeightedSpanTermExtractor converts a
PhraseQuery to an equivalent SpanQuery, it would sometimes use a slop that is
too low (no highlight) or determine inOrder wrong.
(Tim Allison via David Smiley)
* LUCENE-6790: Fix IndexWriter thread safety when one thread is
handling a tragic exception but another is still committing (Mike
McCandless)
* LUCENE-6810: Upgrade to Spatial4j 0.5 -- fixes some edge-case bugs in the
spatial module. See https://github.com/locationtech/spatial4j/blob/master/CHANGES.md
(David Smiley)
* LUCENE-6813: OfflineSorter no longer removes its output Path up
front, and instead opens it for write with the
StandardCopyOption.REPLACE_EXISTING to overwrite any prior file, so
that callers can safely use Files.createTempFile for the output.
This change also fixes OfflineSorter's default temp directory when
running tests to use mock filesystems so e.g. we detect file handle
leaks (Dawid Weiss, Robert Muir, Mike McCandless)
* LUCENE-6813: RangeTreeWriter was failing to close all file handles
it opened, leading to intermittent failures on Windows (Dawid Weiss,
Robert Muir, Mike McCandless)
* LUCENE-6826: Fix ClassCastException when merging a field that has no
terms because they were filtered out by e.g. a FilterCodecReader
(Trejkaz via Mike McCandless)
* LUCENE-6823: LocalReplicator should use System.nanoTime as its clock
source for checking for expiration (Ishan Chattopadhyaya via Mike
McCandless)
* LUCENE-6856: The Weight wrapper used by LRUQueryCache now delegates to the
original Weight's BulkScorer when applicable. (Adrien Grand)
* LUCENE-6858: Fix ContextSuggestField to correctly wrap token stream
when using CompletionAnalyzer. (Areek Zillur)
* LUCENE-6872: IndexWriter handles any VirtualMachineError, not just OOM,
as tragic. (Robert Muir)
* LUCENE-6814: PatternTokenizer no longer hangs onto heap sized to the
maximum input string it's ever seen, which can be a large memory
"leak" if you tokenize large strings with many threads across many
indices (Alex Chow via Mike McCandless)
* LUCENE-6888: Explain output of map() function now also prints default value (janhoy)
Other
* LUCENE-6899: Upgrade randomizedtesting to 2.3.1. (Dawid Weiss)
* LUCENE-6478: Test execution can hang with java.security.debug. (Dawid Weiss)
* LUCENE-6862: Upgrade of RandomizedRunner to version 2.2.0. (Dawid Weiss)
* LUCENE-6857: Validate StandardQueryParser with NOT operator
within parentheses. (Jigar Shah via Dawid Weiss)
* LUCENE-6827: Use explicit capacity ArrayList instead of a LinkedList
in MultiFieldQueryNodeProcessor. (Dawid Weiss).
* LUCENE-6812: Upgrade RandomizedTesting to 2.1.17. (Dawid Weiss)
* LUCENE-6174: Improve "ant eclipse" to select right JRE for building.
(Uwe Schindler, Dawid Weiss)
* LUCENE-6417, LUCENE-6830: Upgrade ANTLR used in expressions module
to version 4.5.1-1. (Jack Conradson, Uwe Schindler)
* LUCENE-6729: Upgrade ASM used in expressions module to version 5.0.4.
(Uwe Schindler)
* LUCENE-6738: remove IndexWriterConfig.[gs]etIndexingChain
(Christine Poerschke)
* LUCENE-6755: more tests of ToChildBlockJoinScorer.advance (hossman)
* LUCENE-6571: fix some private access level javadoc errors and warnings
(Cao Manh Dat, Christine Poerschke)
* LUCENE-6768: AbstractFirstPassGroupingCollector.groupSort private member
is not needed. (Christine Poerschke)
* LUCENE-6761: MatchAllDocsQuery's Scorers do not expose approximations
anymore. (Adrien Grand)
* LUCENE-6775, LUCENE-6833: Improved MorfologikFilterFactory to allow
loading of custom dictionaries from ResourceLoader. Upgraded
Morfologik to version 2.0.1. The 'dictionary' attribute has been
reverted back and now points at the dictionary resource to be
loaded instead of the default Polish dictionary.
(Uwe Schindler, Dawid Weiss)
* LUCENE-6797: Make GeoCircle an interface and use a factory to create
it, to eventually handle degenerate cases (Karl Wright via Mike
McCandless)
* LUCENE-6800: Use XYZSolidFactory to create XYZSolids (Karl Wright
via Mike McCandless)
* LUCENE-6798: Geo3d now models degenerate (too tiny) circles as a
single point (Karl Wright via Mike McCandless)
* LUCENE-6770: Add javadocs that FSDirectory canonicalizes the path.
(Uwe Schindler, Vladimir Kuzmin)
* LUCENE-6795: Fix various places where code used
AccessibleObject#setAccessible() without a privileged block. Code
without a hard requirement to do reflection was rewritten. This
makes Lucene and Solr ready for Java 9 Jigsaw's module system, where
reflection on Java's runtime classes is very restricted.
(Robert Muir, Uwe Schindler)
* LUCENE-6467: Simplify Query.equals. (Paul Elschot via Adrien Grand)
* LUCENE-6845: SpanScorer is now merged into Spans (Alan Woodward, David Smiley)
* LUCENE-6887: DefaultSimilarity is deprecated, use ClassicSimilarity for equivalent behavior,
or consider switching to BM25Similarity which will become the new default in Lucene 6.0 (hossman)
* LUCENE-6893: factor out CorePlusQueriesParser from CorePlusExtensionsParser
(Christine Poerschke)
* LUCENE-6902: Don't retry to fsync files / directories; fail
immediately. (Daniel Mitterdorfer, Uwe Schindler)
* LUCENE-6801: Clarify JavaDocs of PhraseQuery that it in fact supports terms
at the same position (as does MultiPhraseQuery), treated like a conjunction.
Added test. (David Smiley, Adrien Grand)
Build
* LUCENE-6732: Improve checker for invalid source patterns to also
detect javadoc-style license headers. Use Groovy to implement the
checks instead of plain Ant. (Uwe Schindler)
* LUCENE-6594: Update forbiddenapis to 2.0. (Uwe Schindler)
Tests
* LUCENE-6752: Add Math#random() to forbiddenapis. (Uwe Schindler,
Mikhail Khludnev, Andrei Beliakov)
Changes in Backwards Compatibility Policy
* LUCENE-6742: The Lovins & Finnish implementations of SnowballFilter
were fixed to now behave exactly like the original Snowball stemmer.
If you have indexed text using those stemmers you may need to reindex.
(Uwe Schindler, Robert Muir)
Changes in Runtime Behavior
* LUCENE-6772: MultiCollector now catches CollectionTerminatedException and
removes the collector that threw this exception from the list of sub
collectors to collect. (Adrien Grand)
* LUCENE-6784: IndexSearcher's query caching is enabled by default. Run
indexSearcher.setQueryCache(null) to disable. (Adrien Grand)
* LUCENE-6305: BooleanQuery.equals and hashcode do not depend on the order of
clauses anymore. (Adrien Grand)
======================= Lucene 5.3.2 =======================
Bug Fixes
* SOLR-7865: BlendedInfixSuggester was returning too many results
(Arcadius Ahouansou via Mike McCandless)
======================= Lucene 5.3.1 =======================
Bug Fixes
* LUCENE-6774: Remove classloader hack in MorfologikFilter. (Robert Muir,
Uwe Schindler)
* LUCENE-6748: UsageTrackingQueryCachingPolicy no longer caches trivial queries
like MatchAllDocsQuery. (Adrien Grand)
* LUCENE-6781: Fixed BoostingQuery to rewrite wrapped queries. (Adrien Grand)
Tests
* LUCENE-6760, SOLR-7958: Move TestUtil#randomWhitespace to the only
Solr test that is using it. The method is not useful for Lucene tests
(and easily breaks, e.g., in Java 9 caused by Unicode version updates).
(Uwe Schindler)
======================= Lucene 5.3.0 =======================
New Features
* LUCENE-6485: Add CustomSeparatorBreakIterator to postings
highlighter which splits on any character. For example, it
can be used with getMultiValueSeparator to render whole field
values. (Luca Cavanna via Robert Muir)
* LUCENE-6459: Add common suggest API that mirrors Lucene's
Query/IndexSearcher APIs for Document based suggester.
Adds PrefixCompletionQuery, RegexCompletionQuery,
FuzzyCompletionQuery and ContextQuery.
(Areek Zillur via Mike McCandless)
* LUCENE-6487: Spatial Geo3D API now has a WGS84 ellipsoid world model option.
(Karl Wright via David Smiley)
* LUCENE-6477: Add experimental BKD geospatial tree doc values format
and queries, for fast "bbox/polygon contains lat/lon points" (Mike
McCandless)
* LUCENE-6526: Asserting(Query|Weight|Scorer) now ensure scores are not computed
if they are not needed. (Adrien Grand)
* LUCENE-6481: Add GeoPointField, GeoPointInBBoxQuery,
GeoPointInPolygonQuery for simple "indexed lat/lon point in
bbox/shape" searching. (Nick Knize via Mike McCandless)
* LUCENE-5954: The segments_N commit point now stores the Lucene
version that wrote the commit as well as the Lucene version that
wrote the oldest segment in the index, for faster checking of "too
old" indices (Ryan Ernst, Robert Muir, Mike McCandless)
* LUCENE-6519: BKDPointInPolygonQuery is much faster by avoiding
the per-hit polygon check when a leaf cell is fully contained by the
polygon. (Nick Knize, Mike McCandless)
* LUCENE-6549: Add preload option to MMapDirectory. (Robert Muir)
* LUCENE-6504: Add Lucene53Codec, with norms implemented directly
via the Directory's RandomAccessInput api. (Robert Muir)
* LUCENE-6539: Add new DocValuesNumbersQuery, to match any document
containing one of the specified long values. This change also
moves the existing DocValuesTermsQuery and DocValuesRangeQuery
to Lucene's sandbox module, since in general these queries are
quite slow and are only fast in specific cases. (Adrien Grand,
Robert Muir, Mike McCandless)
* LUCENE-6577: Give earlier and better error message for invalid CRC.
(Robert Muir)
* LUCENE-6544: Geo3D: (1) Regularize path & polygon construction, (2) add
PlanetModel.surfaceDistance() (ellipsoidal calculation), (3) cache lat & lon
in GeoPoint, (4) add thread-safety where missing -- Geo3dShape. (Karl Wright,
David Smiley)
* LUCENE-6606: SegmentInfo.toString now confesses how the documents
were sorted, when SortingMergePolicy was used (Christine Poerschke
via Mike McCandless)
* LUCENE-6524: IndexWriter can now be initialized from an already open
near-real-time or non-NRT reader. (Boaz Leskes, Robert Muir, Mike
McCandless)
* LUCENE-6578: Geo3D can now compute the distance from a point to a shape, both
inner distance and to an outside edge. Multiple distance algorithms are
available. (Karl Wright, David Smiley)
* LUCENE-6632: Geo3D: Compute circle planes more accurately.
(Karl Wright via David Smiley)
* LUCENE-6653: Added general purpose BytesTermAttribute to basic token
attributes package that can be used for TokenStreams that solely produce
binary terms. (Uwe Schindler)
* LUCENE-6365: Add Operations.topoSort, to run topological sort of the
states in an Automaton (Markus Heiden via Mike McCandless)
* LUCENE-6365: Replace Operations.getFiniteStrings with a
more scalable iterator API (FiniteStringsIterator) (Markus Heiden
via Mike McCandless)
* LUCENE-6589: Add a new org.apache.lucene.search.join.CheckJoinIndex class
that can be used to validate that an index has an appropriate structure to
run join queries. (Adrien Grand)
* LUCENE-6659: Remove IndexWriter's unnecessary hard limit on max concurrency
(Robert Muir, Mike McCandless)
* LUCENE-6547: Add GeoPointDistanceQuery, matching all points within
the specified distance from the center point. Fix
GeoPointInBBoxQuery to handle dateline crossing.
* LUCENE-6694: Add LithuanianAnalyzer and LithuanianStemmer.
(Dainius Jocas via Robert Muir)
* LUCENE-6695: Added a new BlendedTermQuery to blend statistics across several
terms. (Simon Willnauer, Adrien Grand)
* LUCENE-6706: Added a new PayloadScoreQuery that generalises the behaviour of
PayloadTermQuery and PayloadNearQuery to all Span queries. (Alan Woodward)
* LUCENE-6697: Add experimental range tree doc values format and
queries, based on a 1D version of the spatial BKD tree, for a faster
and smaller alternative to postings-based numeric and binary term
filtering. Range trees can also handle values larger than 64 bits.
(Adrien Grand, Mike McCandless)
* LUCENE-6647: Add GeoHash string utility APIs (Nick Knize via Mike
McCandless).
* LUCENE-6710: GeoPointField now uses full 64 bits (up from 62) to encode
lat/lon (Nick Knize via Mike McCandless).
* LUCENE-6580: SpanNearQuery now allows defined-width gaps in its subqueries
(Alan Woodward, Adrien Grand).
* LUCENE-6712: Use doc values to post-filter GeoPointField hits that
fall in boundary cells, resulting in smaller index, faster searches
and less heap used for each query (Nick Knize via Mike McCandless).
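As background for the LUCENE-6365 entry on Operations.topoSort above, the
following is a rough sketch of a topological sort over automaton states. It
uses Kahn's algorithm on a plain int[][] transition table; the class name, the
method signature and the choice of algorithm are assumptions made for
illustration and do not reflect how Lucene implements it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; not Lucene's Operations.topoSort.
final class TopoSortSketch {
    /**
     * transitions[s] lists the destination states of state s.
     * Returns all states ordered so that every state precedes its destinations.
     */
    static int[] topoSort(int[][] transitions) {
        int n = transitions.length;
        int[] inDegree = new int[n];
        for (int[] dests : transitions) {
            for (int dest : dests) {
                inDegree[dest]++;
            }
        }
        // States with no incoming transitions can be emitted immediately.
        Deque<Integer> ready = new ArrayDeque<>();
        for (int state = 0; state < n; state++) {
            if (inDegree[state] == 0) {
                ready.add(state);
            }
        }
        int[] order = new int[n];
        int upto = 0;
        while (!ready.isEmpty()) {
            int state = ready.remove();
            order[upto++] = state;
            for (int dest : transitions[state]) {
                if (--inDegree[dest] == 0) {
                    ready.add(dest);
                }
            }
        }
        if (upto != n) {
            // Some states were never released, so the graph contains a cycle.
            throw new IllegalArgumentException("automaton has a cycle");
        }
        return order;
    }
}
```

A topological order only exists for acyclic (finite-language) automata, which
is why the sketch rejects cycles instead of returning a partial order.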
API Changes
* LUCENE-6508: Simplify Lock api, there is now just
Directory.obtainLock() which returns a Lock that can be
released (or fails with exception). Add lock verification
to IndexWriter. Improve exception messages when locking fails.
(Uwe Schindler, Mike McCandless, Robert Muir)
* LUCENE-6371, LUCENE-6490: Payload collection from Spans is moved to a more generic
SpanCollector framework. Spans no longer implements .hasPayload() and
.getPayload() methods, and instead exposes a collect() method that allows
the collection of arbitrary postings information. SpanPayloadCheckQuery and
SpanNearPayloadCheckQuery have moved from the .spans package to the .payloads
package. (Alan Woodward, David Smiley, Paul Elschot, Robert Muir)
* LUCENE-6529: Removed an optimization in UninvertingReader that was causing
incorrect results for Numeric fields using precisionStep
(hossman, Robert Muir)
* LUCENE-6551: Add missing ConcurrentMergeScheduler.getAutoIOThrottle
getter (Simon Willnauer, Mike McCandless)
* LUCENE-6552: Add MergePolicy.OneMerge.getMergeInfo and rename
setInfo to setMergeInfo (Simon Willnauer, Mike McCandless)
* LUCENE-6525: Deprecate IndexWriterConfig's writeLockTimeout.
(Robert Muir)
* LUCENE-6583: FilteredQuery is deprecated and will be removed in 6.0. It should
be replaced with a BooleanQuery which handles the query as a MUST clause and
the filter as a FILTER clause. (Adrien Grand)
* LUCENE-6553: The postings, spans and scorer APIs no longer take an acceptDocs
parameter. Live docs are now always checked on top of these APIs.
(Adrien Grand)
* LUCENE-6634: PKIndexSplitter now takes a Query instead of a Filter to decide
how to split an index. (Adrien Grand)
* LUCENE-6643: GroupingSearch from lucene/grouping was changed to take a Query
object to define groups instead of a Filter. (Adrien Grand)
* LUCENE-6554: ToParentBlockJoinFieldComparator was removed because of a bug
with missing values that could not be fixed. ToParentBlockJoinSortField now
works with string or numeric doc values selectors. Sorting on anything other
than a string or numeric field requires implementing a custom selector.
(Adrien Grand)
* LUCENE-6648: All lucene/facet APIs now take Query objects where they used to
take Filter objects. (Adrien Grand)
* LUCENE-6640: Suggesters now take a BitsProducer object instead of a Filter
object to reduce the scope of doc IDs that may be returned, emphasizing the
fact that these objects need to support random-access. (Adrien Grand)
* LUCENE-6646: Make EarlyTerminatingCollector take a Sort object directly
instead of a SortingMergePolicy. (Christine Poerschke via Adrien Grand)
* LUCENE-6649: BitDocIdSetFilter and BitDocIdSetCachingWrapperFilter are now
deprecated in favour of BitSetProducer and QueryBitSetProducer, which do not
extend oal.search.Filter. (Adrien Grand)
* LUCENE-6607: Factor out geo3d into its own spatial3d module. (Karl
Wright, Nick Knize, David Smiley, Mike McCandless)
* LUCENE-6531: PhraseQuery is now immutable and can be built using the
PhraseQuery.Builder class. (Adrien Grand)
* LUCENE-6570: BooleanQuery is now immutable and can be built using the
BooleanQuery.Builder class. (Adrien Grand)
* LUCENE-6702: NRTSuggester: Add a method to inject context values at index time
in ContextSuggestField. Simplify ContextQuery logic for extracting contexts and
add dedicated method to consider all context values at query time.
(Areek Zillur, Mike McCandless)
* LUCENE-6719: NumericUtils getMinInt, getMaxInt, getMinLong, getMaxLong now
return null if there are no terms for the specified field, previously these
methods returned primitive values and raised an undocumented NullPointerException
if there were no terms for the field. (hossman, Timothy Potter)
Bug fixes
* LUCENE-6500: ParallelCompositeReader did not always call
closed listeners. This was fixed by LUCENE-6501.
(Adrien Grand, Uwe Schindler)
* LUCENE-6520: Geo3D GeoPath.done() would throw an NPE if adjacent path
segments were co-linear. (Karl Wright via David Smiley)
* LUCENE-5805: QueryNodeImpl.removeFromParent was doing nothing in a
costly manner (Christoph Kaser, Cao Manh Dat via Mike McCandless)
* LUCENE-6533: SlowCompositeReaderWrapper no longer caches its live docs
instance since this can prevent future improvements like a
disk-backed live docs (Adrien Grand, Mike McCandless)
* LUCENE-6558: Highlighters now work with CustomScoreQuery (Cao Manh
Dat via Mike McCandless)
* LUCENE-6560: BKDPointInBBoxQuery now handles "dateline crossing"
correctly (Nick Knize, Mike McCandless)
* LUCENE-6564: Change PrintStreamInfoStream to use thread safe Java 8
ISO-8601 date formatting (in Lucene 5.x use Java 7 FileTime#toString
as workaround); fix output of tests to use same format. (Uwe Schindler,
Ramkumar Aiyengar)
* LUCENE-6593: Fixed ToChildBlockJoinQuery's scorer to not refuse to advance
to a document that belongs to the parent space. (Adrien Grand)
* LUCENE-6591: Never write a negative vLong (Robert Muir, Ryan Ernst,
Adrien Grand, Mike McCandless)
* LUCENE-6588: Fix how ToChildBlockJoinQuery deals with acceptDocs.
(Christoph Kaser via Adrien Grand)
* LUCENE-6597: Geo3D's GeoCircle now supports a world-globe diameter.
(Karl Wright via David Smiley)
* LUCENE-6608: Fix potential resource leak in BigramDictionary.
(Rishabh Patel via Uwe Schindler)
* LUCENE-6614: Improve partition detection in IOUtils#spins() so it
works with NVMe drives. (Uwe Schindler, Mike McCandless)
* LUCENE-6586: Fix typo in GermanStemmer, causing possible wrong value
for substCount. (Christoph Kaser via Mike McCandless)
* LUCENE-6658: Fix IndexUpgrader to also upgrade indexes without any
segments. (Trejkaz, Uwe Schindler)
* LUCENE-6677: QueryParserBase fails to enforce maxDeterminizedStates when
creating a WildcardQuery (David Causse via Mike McCandless)
* LUCENE-6680: Preserve two suggestions that have same key and weight but
different payloads (Arcadius Ahouansou via Mike McCandless)
* LUCENE-6681: SortingMergePolicy must override MergePolicy.size(...).
(Christine Poerschke via Adrien Grand)
* LUCENE-6682: StandardTokenizer performance bug: scanner buffer is
unnecessarily copied when maxTokenLength doesn't change. Also stop silently
maxing out buffer size (and effectively also max token length) at 1M chars,
but instead throw an exception from setMaxTokenLength() when the given
length is greater than 1M chars. (Piotr Idzikowski, Steve Rowe)
* LUCENE-6696: Fix FilterDirectoryReader.close() to never close the
underlying reader several times. (Adrien Grand)
* LUCENE-6334: FastVectorHighlighter failed to highlight phrases across
more than one value in a multi-valued field. (Chris Earle, Nik Everett
via Mike McCandless)
* LUCENE-6704: GeoPointDistanceQuery was visiting too many term ranges,
consuming too much heap for a large radius (Nick Knize via Mike McCandless)
* SOLR-5882: fix ScoreMode.Min at ToParentBlockJoinQuery (Mikhail Khludnev)
* LUCENE-6718: JoinUtil.createJoinQuery failed to rewrite queries before
creating a Weight. (Adrien Grand)
* LUCENE-6713: TooComplexToDeterminizeException claims to be serializable
but wasn't (Simon Willnauer, Mike McCandless)
* LUCENE-6723: Fix date parsing problems in Java 9 with date formats using
English weekday/month names. (Uwe Schindler)
* LUCENE-6618: Properly set MMapDirectory.UNMAP_SUPPORTED when it is not allowed
by security policy. (Robert Muir)
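To make the LUCENE-6591 entry above ("Never write a negative vLong") more
concrete, here is a small sketch of a variable-length long encoder that refuses
negative input. The class and method names are hypothetical, and the byte
layout (7 payload bits per byte, low-order group first, high bit as a
continuation flag) is an assumption for illustration rather than a statement
about Lucene's file format.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch; not Lucene's DataOutput.
final class VLongSketch {
    /** Encodes a non-negative long using 7 bits per byte, low-order bits first. */
    static byte[] encodeVLong(long value) {
        if (value < 0) {
            // A negative value would always need the maximum number of bytes,
            // so it is rejected up front instead of being written.
            throw new IllegalArgumentException("cannot write negative vLong (got: " + value + ")");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7FL) | 0x80L)); // continuation bit set
            value >>>= 7;
        }
        out.write((int) value); // final byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes bytes produced by encodeVLong. */
    static long decodeVLong(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7FL) << shift;
            shift += 7;
        }
        return value;
    }
}
```

Small values stay compact under this scheme (anything below 128 fits in one
byte), which is the usual motivation for variable-length encodings.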
Changes in Runtime Behavior
* LUCENE-6501: The subreader structure in ParallelCompositeReader
was flattened, because the current implementation had too many
hidden bugs regarding refcounting and close listeners.
If you create a new ParallelCompositeReader, it will just take
all leaves of the passed readers and form a flat structure of
ParallelLeafReaders instead of trying to assemble the original
structure of composite and leaf readers. (Adrien Grand,
Uwe Schindler)
* LUCENE-6537: NearSpansOrdered no longer tries to minimize its
Span matches. This means that the matching algorithm is entirely
lazy. All spans returned by the previous implementation are still
reported, but matching documents may now also return additional
spans that were previously discarded in preference to shorter
overlapping ones. (Alan Woodward, Adrien Grand, Paul Elschot)
* LUCENE-6538: Also include java.vm.version and java.runtime.version
in per-segment diagnostics (Robert Muir, Mike McCandless)
* LUCENE-6569: Optimize MultiFunction.anyExists and allExists to eliminate
excessive array creation in common 2 argument usage (Jacob Graves, hossman)
* LUCENE-2880: Span queries now score more consistently with regular queries.
(Robert Muir, Adrien Grand)
* LUCENE-6601: FilteredQuery now always rewrites to a BooleanQuery which handles
the query as a MUST clause and the filter as a FILTER clause.
LEAP_FROG_QUERY_FIRST_STRATEGY and LEAP_FROG_FILTER_FIRST_STRATEGY no longer
guarantee which iterator will be advanced first; that now depends on the
respective costs of the iterators. QUERY_FIRST_FILTER_STRATEGY and
RANDOM_ACCESS_FILTER_STRATEGY still consume the filter using its random-access
API, however the returned bits may be called on different documents compared
to before. (Adrien Grand)
* LUCENE-6542: FSDirectory's ctor now works with security policies or file systems
that restrict write access. (Trejkaz, hossman, Uwe Schindler)
* LUCENE-6651: The default implementation of AttributeImpl#reflectWith(AttributeReflector)
now uses AccessController#doPrivileged() to do the reflection. Please consider
implementing this method in all your custom attributes, because the method will be
made abstract in Lucene 6. (Uwe Schindler)
* LUCENE-6639: LRUQueryCache and CachingWrapperQuery now consider a query as
"used" when the first Scorer is pulled instead of when a Scorer is pulled on
the first segment on an index. (Terry Smith, Adrien Grand)
* LUCENE-6579: IndexWriter now sacrifices (closes) itself to protect the index
when an unexpected, tragic exception strikes while merging. (Robert
Muir, Mike McCandless)
* LUCENE-6691: SortingMergePolicy.isSorted now considers FilterLeafReader instances.
EarlyTerminatingSortingCollector.terminatedEarly accessor added.
TestEarlyTerminatingSortingCollector.testTerminatedEarly test added.
(Christine Poerschke)
* LUCENE-6609: Add getSortField impls to many subclasses of FieldCacheSource which return
the most direct SortField implementation. In many trivial sort by ValueSource usages, this
will result in less RAM, and more precise sorting of extreme values due to no longer
converting to double. (hossman)
Optimizations
* LUCENE-6548: Some optimizations for BlockTree's intersect with very
finite automata (Mike McCandless)
* LUCENE-6585: Flatten conjunctions and conjunction approximations into
parent conjunctions. For example a sloppy phrase query of "foo bar"~5
with a filter of "baz" will internally leapfrog foo,bar,baz as one
conjunction. (Ryan Ernst, Robert Muir, Adrien Grand)
* LUCENE-6325: Reduce RAM usage of FieldInfos, and speed up lookup by
number, by using an array instead of TreeMap except in very sparse
cases (Robert Muir, Mike McCandless)
* LUCENE-6617: Reduce heap usage for small FSTs (Mike McCandless)
* LUCENE-6616: IndexWriter now lists the files in the index directory
only once on init, and IndexFileDeleter no longer suppresses
FileNotFoundException and NoSuchFileException. This also improves
IndexFileDeleter to delete segments_N files last, so that in the
presence of a virus checker, the index is never left in a state
where an expired segments_N references non-existing files (Robert
Muir, Mike McCandless)
* LUCENE-6645: Optimized the way we merge postings lists in multi-term queries
and TermsQuery. This should especially help when there are lots of small
postings lists. (Adrien Grand, Mike McCandless)
* LUCENE-6668: Optimized storage for sorted set and sorted numeric doc values
in the case that there are few unique sets of values.
(Adrien Grand, Robert Muir)
* LUCENE-6690: Sped up MultiTermsEnum.next() on high-cardinality fields.
(Adrien Grand)
* LUCENE-6621: Removed two unused variables in analysis/stempel/src/java/org/
egothor/stemmer/Compile.java
(Rishabh Patel via Christine Poerschke)
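The LUCENE-6585 entry above describes leapfrogging foo, bar and baz as a single
flattened conjunction. The sketch below shows the general leapfrog idea on
sorted arrays of doc IDs. It is a simplified illustration with hypothetical
names, uses a linear scan where a real postings list would skip ahead, and
ignores positions and slop entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Lucene's conjunction scorer.
final class LeapfrogSketch {
    /** Returns the first index >= from whose doc ID is >= target, or docs.length if none. */
    static int advance(int[] docs, int from, int target) {
        int i = from;
        while (i < docs.length && docs[i] < target) {
            i++;
        }
        return i;
    }

    /** Intersects any number of sorted doc ID lists as one flat conjunction. */
    static List<Integer> intersect(int[]... postings) {
        List<Integer> hits = new ArrayList<>();
        if (postings.length == 0) {
            return hits;
        }
        int[] pos = new int[postings.length];
        outer:
        while (pos[0] < postings[0].length) {
            int candidate = postings[0][pos[0]];
            for (int i = 1; i < postings.length; i++) {
                pos[i] = advance(postings[i], pos[i], candidate);
                if (pos[i] >= postings[i].length) {
                    break outer; // one list is exhausted, so no more matches
                }
                int doc = postings[i][pos[i]];
                if (doc != candidate) {
                    // This list overshot the candidate: leap the lead list to it.
                    pos[0] = advance(postings[0], pos[0], doc);
                    continue outer;
                }
            }
            hits.add(candidate); // every list is positioned on the candidate
            pos[0]++;
        }
        return hits;
    }
}
```

For example, intersect(foo, bar, baz) handles all three lists in one loop,
whereas a nested form would first materialise or iterate the foo/bar
intersection and then intersect that with baz.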
Build
* LUCENE-6518: Don't report false thread leaks from IBM J9
ClassCache Reaper in test framework. (Dawid Weiss)
* LUCENE-6567: Simplify payload checking in SpanPayloadCheckQuery (Alan
Woodward)
* LUCENE-6568: Make rat invocation depend on ivy configuration being set up
(Ramkumar Aiyengar)
* LUCENE-6683: ivy-fail goal directs people to non-existent page
(Mike Drob via Steve Rowe)
* LUCENE-6693: Updated Groovy to 2.4.4, Pegdown to 1.5, Svnkit to 1.8.10.
Also fixed some PermGen errors while running full build caused by
these updates: Tasks are now installed from root's build.xml.
(Uwe Schindler)
* LUCENE-6741: Fix jflex files to regenerate the java files correctly.
(Uwe Schindler)
Test Framework
* LUCENE-6637: Fix FSTTester to not violate file permissions
on -Dtests.verbose=true. (Mesbah M. Alam, Uwe Schindler)
* LUCENE-6542: LuceneTestCase now has runWithRestrictedPermissions() to run
an action with reduced permissions. This can be used to simulate special
environments (e.g., read-only dirs). If tests are running without a security
manager, an assume cancels test execution automatically. (Uwe Schindler)
* LUCENE-6652: Removed lots of useless Byte(s)TermAttributes all over test
infrastructure. (Uwe Schindler)
* LUCENE-6563: Improve MockFileSystemTestCase.testURI to check if a path
can be encoded according to local filesystem requirements. Otherwise
stop test execution. (Christine Poerschke via Uwe Schindler)
Changes in Backwards Compatibility Policy
* LUCENE-6553: The iterator returned by the LeafReader.postings method now
always includes deleted docs, so you have to check for deleted documents on
top of the iterator. (Adrien Grand)
* LUCENE-6633: DuplicateFilter has been deprecated and will be removed in 6.0.
DiversifiedTopDocsCollector can be used instead with a maximum number of hits
per key equal to 1. (Adrien Grand)
* LUCENE-6653: The workflow for consuming the TermToBytesRefAttribute was changed:
getBytesRef() now does all work and is called on each token, fillBytesRef()
was removed. The implementation is free to reuse the internal BytesRef
or return a new one on each call. (Uwe Schindler)
* LUCENE-6682: StandardTokenizer.setMaxTokenLength() now throws an exception if
a length greater than 1M chars is given. Previously the effective max token
length (the scanner's buffer) was capped at 1M chars, but getMaxTokenLength()
incorrectly returned the previously requested length, even when it exceeded 1M.
(Piotr Idzikowski, Steve Rowe)
======================= Lucene 5.2.1 =======================
Bug Fixes
* LUCENE-6482: Fix class loading deadlock relating to Codec initialization,
default codec and SPI discovery. (Shikhar Bhushan, Uwe Schindler)
* LUCENE-6523: NRT readers now reflect a new commit even if there is
no change to the commit user data (Mike McCandless)
* LUCENE-6527: Queries now get a dummy Similarity when scores are not needed
in order to not load unnecessary information like norms. (Adrien Grand)
* LUCENE-6559: TimeLimitingCollector now also checks for timeout when a new
leaf reader is pulled, i.e. if we move from one segment to another even without
collecting a hit. (Simon Willnauer)
======================= Lucene 5.2.0 =======================
New Features
* LUCENE-6308, LUCENE-6385, LUCENE-6391: Span queries now share
document conjunction/intersection
code with boolean queries, and use two-phased iterators for
faster intersection by avoiding loading positions in certain cases.
(Paul Elschot, Terry Smith, Robert Muir via Mike McCandless)
* LUCENE-6393: Add two-phase support to SpanPositionCheckQuery
and its subclasses: SpanPositionRangeQuery, SpanPayloadCheckQuery,
SpanNearPayloadCheckQuery, SpanFirstQuery. (Paul Elschot, Robert Muir)
* LUCENE-6394: Add two-phase support to SpanNotQuery and refactor
FilterSpans to just have an accept(Spans candidate) method for
subclasses. (Robert Muir)
* LUCENE-6373: SpanOrQuery shares disjunction logic with boolean
queries, and supports two-phased iterators to avoid loading
positions when possible. (Paul Elschot via Robert Muir)
* LUCENE-6352, LUCENE-6472: Added a new query time join to the join module
that uses global ordinals, which is faster for subsequent joins between
reopens. (Martijn van Groningen, Adrien Grand)
* LUCENE-5879: Added experimental auto-prefix terms to BlockTree terms
dictionary, exposed as AutoPrefixPostingsFormat (Adrien Grand,
Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-5579: New CompositeSpatialStrategy combines speed of RPT with
accuracy of SDV. Includes optimized Intersect predicate to avoid many
geometry checks. Uses TwoPhaseIterator. (David Smiley)
* LUCENE-5989: Allow passing BytesRef to StringField to make it easier
to index arbitrary binary tokens, and change the experimental
StoredFieldVisitor.stringField API to take UTF-8 byte[] instead of
String (Mike McCandless)
* LUCENE-6389: Added ScoreMode.Min that aggregates the lowest child score
to the parent hit. (Martijn van Groningen, Adrien Grand)
* LUCENE-6423: New LimitTokenOffsetFilter that limits tokens to those before
a configured maximum start offset. (David Smiley)
* LUCENE-6422: New spatial PackedQuadPrefixTree, a generally more efficient
choice than QuadPrefixTree, especially for high precision shapes.
When used, you should typically disable RPT's pruneLeafyBranches option.
(Nick Knize, David Smiley)
* LUCENE-6451: Expressions now support bindings keys that look like
zero arg functions (Jack Conradson via Ryan Ernst)
* LUCENE-6083: Add SpanWithinQuery and SpanContainingQuery that return
spans inside of / containing other spans. (Paul Elschot via Robert Muir)
* LUCENE-6454: Added distinction between member variable and method in
expression helper VariableContext
(Jack Conradson via Ryan Ernst)
* LUCENE-6196: New Spatial "Geo3d" API with partial Spatial4j integration.
It is a set of shapes implemented using 3D planar geometry for calculating
spatial relations on the surface of a sphere. Shapes include Point, BBox,
Circle, Path (buffered line string), and Polygon.
(Karl Wright via David Smiley)
* LUCENE-6464: Add a new expert lookup method to
AnalyzingInfixSuggester to accept an arbitrary BooleanQuery to
express how contexts should be filtered. (Arcadius Ahouansou via
Mike McCandless)
Optimizations
* LUCENE-6379: IndexWriter.deleteDocuments(Query...) now detects if
one of the queries is MatchAllDocsQuery and just invokes the much
faster IndexWriter.deleteAll in that case (Robert Muir, Adrien
Grand, Mike McCandless)
* LUCENE-6388: Optimize SpanNearQuery when payloads are not present.
(Robert Muir)
* LUCENE-6421: Defer reading of positions in MultiPhraseQuery until
they are needed. (Robert Muir)
* LUCENE-6392: Highlighter: reduce memory of tokens in
TokenStreamFromTermVector, and add maxStartOffset limit. (David Smiley)
* LUCENE-6456: Queries that generate doc id sets that are too large for the
query cache are not cached instead of evicting everything. (Adrien Grand)
* LUCENE-6455: Require a minimum index size to enable query caching in order
not to cache e.g. on MemoryIndex. (Adrien Grand)
* LUCENE-6330: BooleanScorer (used for top-level disjunctions) does not decode
norms when not necessary anymore. (Adrien Grand)
* LUCENE-6350: TermsQuery is now compressed with PrefixCodedTerms.
(Robert Muir, Mike McCandless, Adrien Grand)
* LUCENE-6458: Multi-term queries matching few terms per segment now execute
like a disjunction. (Adrien Grand)
* LUCENE-6360: TermsQuery rewrites to a disjunction when there are 16 matching
terms or fewer. (Adrien Grand)
Bug Fixes
* LUCENE-329: Fix FuzzyQuery defaults to rank exact matches highest.
(Mark Harwood, Adrien Grand)
* LUCENE-6378: Fix all RuntimeExceptions to throw the underlying root cause.
(Varun Thacker, Adrien Grand, Mike McCandless)
* LUCENE-6415: TermsQuery.extractTerms is a no-op (used to throw an
UnsupportedOperationException). (Adrien Grand)
* LUCENE-6416: BooleanQuery.extractTerms now only extracts terms from scoring
clauses. (Adrien Grand)
* LUCENE-6409: Fixed integer overflow in LongBitSet.ensureCapacity.
(Luc Vanlerberghe via Adrien Grand)
* LUCENE-6424, LUCENE-6430: Fix many bugs with mockfs filesystems in the
test-framework: always consistently wrap Path, fix buggy behavior for
globs, implement equals/hashcode for filtered Paths, etc.
(Ryan Ernst, Simon Willnauer, Robert Muir)
* LUCENE-6426: Fix FieldType's copy constructor to also copy over the numeric
precision step. (Adrien Grand)
* LUCENE-6345: Null check terms/fields in Lucene queries (Lee
Hinman via Mike McCandless)
* LUCENE-6400: SolrSynonymParser should preserve original token instead
of replacing it with a synonym, when expand=true and there is no
explicit mapping (Ian Ribas, Robert Muir, Mike McCandless)
* LUCENE-6449: Don't throw NullPointerException if some segments are
missing the field being highlighted, in PostingsHighlighter (Roman
Khmelichek via Mike McCandless)
* LUCENE-6427: Added assertion about the presence of ghost bits in
(Fixed|Long)BitSet. (Luc Vanlerberghe via Adrien Grand)
* LUCENE-6468: Fixed NPE with empty Kuromoji user dictionary.
(Jun Ohtani via Christian Moen)
* LUCENE-6483: Ensure core closed listeners are called on the same cache key as
the reader which has been used to register the listener. (Adrien Grand)
* LUCENE-6486: DocumentDictionary iterator no longer skips
documents with no payloads and now returns an empty BytesRef instead
(Marius Grama via Michael McCandless)
* LUCENE-6505: NRT readers now reflect segments_N filename and commit
user data from previous commits (Mike McCandless)
* LUCENE-6507: Don't let NativeFSLock.close() release other locks
(Simon Willnauer, Robert Muir, Uwe Schindler, Mike McCandless)
API Changes
* LUCENE-6377: SearcherFactory#newSearcher now accepts the previous reader
to simplify warming logic during opening new searchers. (Simon Willnauer)
* LUCENE-6410: Removed unused "reuse" parameter to
Terms.iterator. (Robert Muir, Mike McCandless)
* LUCENE-6425: Replaced Query.extractTerms with Weight.extractTerms.
(Adrien Grand)
* LUCENE-6446: Simplified Explanation API. (Adrien Grand)
* LUCENE-6445: Two new methods in Highlighter's TokenSources; the existing
methods are now marked deprecated. (David Smiley)
* LUCENE-6484: Removed EliasFanoDocIdSet, which was unused.
(Paul Elschot via Adrien Grand)
* LUCENE-6466: Moved SpanQuery.getSpans() and .extractTerms() to SpanWeight
(Alan Woodward, Robert Muir)
* LUCENE-6497: Allow subclasses of FieldType to check frozen state
(Ryan Ernst)
Other
* LUCENE-6413: Test runner should report the number of suites completed/
remaining. (Dawid Weiss)
* LUCENE-5439: Add 'ant jacoco' build target. (Robert Muir)
* LUCENE-6315: Simplify the private iterator Lucene uses internally
when resolving deleted terms to matched docids. (Robert Muir, Adrien
Grand, Mike McCandless)
* LUCENE-6399: Benchmark module's QueryMaker.resetInputs should call setConfig
so queries can react to property changes in new rounds. (David Smiley)
* LUCENE-6382: Lucene now enforces that positions never exceed the
maximum value IndexWriter.MAX_POSITION. (Robert Muir, Mike McCandless)
* LUCENE-6372: Simplified and improved equals/hashcode of span queries.
(Paul Elschot via Adrien Grand)
Build
* LUCENE-6420: Update forbiddenapis to v1.8 (Uwe Schindler)
Test Framework
* LUCENE-6419: Added two-phase iteration assertions to AssertingQuery.
(Adrien Grand)
* LUCENE-6437: Randomly set CPU core count and spins, derived from
test's master seed, used by ConcurrentMergeScheduler to set dynamic
defaults, for better test randomization and to help tests reproduce
(Robert Muir, Mike McCandless)
======================= Lucene 5.1.0 =======================
New Features
* LUCENE-6066: Added DiversifiedTopDocsCollector to misc for collecting no more
than a given number of results under a choice of key. Introduces new remove
method to core's PriorityQueue. (Mark Harwood)
* LUCENE-6191: New spatial 2D heatmap faceting for PrefixTreeStrategy. (David Smiley)
* LUCENE-6227: Added BooleanClause.Occur.FILTER to filter documents without
participating in scoring (unlike MUST). (Adrien Grand)
* LUCENE-6294: Added oal.search.CollectorManager to allow for parallelization
of the document collection process on IndexSearcher. (Adrien Grand)
* LUCENE-6303: Added filter caching baked into IndexSearcher, disabled by
default. (Adrien Grand)
* LUCENE-6304: Added a new MatchNoDocsQuery that matches no documents.
(Lee Hinman via Adrien Grand)
* LUCENE-6341: Add a -fast option to CheckIndex. (Robert Muir)
* LUCENE-6355: IndexWriter's infoStream now also logs time to write FieldInfos
during merge (Lee Hinman via Mike McCandless)
* LUCENE-6339: Added Near-real time Document Suggester via custom postings format
(Areek Zillur, Mike McCandless, Simon Willnauer)
Bug Fixes
* LUCENE-6368: FST.save can truncate output (BufferedOutputStream may be closed
after the underlying stream). (Ippei Matsushima via Dawid Weiss)
* LUCENE-6249: StandardQueryParser doesn't support pure negative clauses.
(Dawid Weiss)
* LUCENE-6190: Spatial pointsOnly flag on PrefixTreeStrategy shouldn't switch all predicates to
Intersects. (David Smiley)
* LUCENE-6242: Ram usage estimation was incorrect for SparseFixedBitSet when
object alignment was different from 8. (Uwe Schindler, Adrien Grand)
* LUCENE-6293: Fixed TimSorter bug. (Adrien Grand)
* LUCENE-6001: DrillSideways hits NullPointerException for certain
BooleanQuery searches. (Dragan Jotannovic, jane chang via Mike
McCandless)
* LUCENE-6311: Fix NIOFSDirectory and SimpleFSDirectory so that the
toString method of IndexInputs confesses when they are from a compound
file. (Robert Muir, Mike McCandless)
* LUCENE-6381: Add defensive wait time limit in
DocumentsWriterStallControl to prevent hangs during indexing if we
miss a .notify/All somewhere (Mike McCandless)
* LUCENE-6386: Correct IndexWriter.forceMerge documentation to state
that up to 3X (X = current index size) spare disk space may be needed
to complete forceMerge(1). (Robert Muir, Shai Erera, Mike McCandless)
* LUCENE-6395: Seeking by term ordinal was failing to set the term's
bytes in MemoryIndex (Mike McCandless)
* LUCENE-6429: Removed the TermQuery(Term,int) constructor which could lead to
inconsistent term statistics. (Adrien Grand, Robert Muir)
Optimizations
* LUCENE-6183, LUCENE-5647: Avoid recompressing stored fields
and term vectors when merging segments without deletions.
Lucene50Codec's BEST_COMPRESSION mode uses a higher deflate
level for more compact storage. (Robert Muir)
* LUCENE-6184: Make BooleanScorer only score windows that contain
matches. (Adrien Grand)
* LUCENE-6161: Speed up resolving of deleted terms to docIDs by doing
a combined merge sort between deleted terms and segment terms
instead of a separate merge sort for each segment. In delete-heavy
use cases this can be a sizable speedup. (Mike McCandless)
* LUCENE-6201: BooleanScorer can now deal with values of minShouldMatch that
are greater than one and is used when queries produce dense result sets.
(Adrien Grand)
* LUCENE-6218: Don't decode frequencies or match all positions when scoring
is not needed. (Robert Muir)
* LUCENE-6233: Speed up CheckIndex when the index has term vectors
(Robert Muir, Mike McCandless)
* LUCENE-6198: Added the TwoPhaseIterator API, exposed on scorers which
is for now only used on phrase queries and conjunctions in order to check
positions lazily if the phrase query is in a conjunction with other queries.
(Robert Muir, Adrien Grand, David Smiley)
* LUCENE-6244, LUCENE-6251: All boolean queries but those that have a
minShouldMatch > 1 now either propagate or take advantage of the two-phase
iteration capabilities added in LUCENE-6198. (Adrien Grand, Robert Muir)
* LUCENE-6241: FSDirectory.listAll() doesn't filter out subdirectories anymore,
for faster performance. Subdirectories don't matter to Lucene. If you need to
filter out non-index files with some custom usage, you may want to look at
the IndexFileNames class. (Robert Muir)
* LUCENE-6262: ConstantScoreQuery does not wrap the inner weight anymore when
scores are not required. (Adrien Grand)
* LUCENE-6263: MultiCollector automatically caches scores when several
collectors need them. (Adrien Grand)
* LUCENE-6275: SloppyPhraseScorer now uses the same logic as ConjunctionScorer
in order to advance doc IDs, which takes advantage of the cost() API.
(Adrien Grand)
* LUCENE-6290: QueryWrapperFilter propagates approximations and FilteredQuery
rewrites to a BooleanQuery when the filter is a QueryWrapperFilter in order
to leverage approximations. (Adrien Grand)
* LUCENE-6318: Reduce RAM usage of FieldInfos when there are many fields.
(Mike McCandless, Robert Muir)
* LUCENE-6320: Speed up CheckIndex. (Robert Muir)
* LUCENE-4942: Optimized the encoding of PrefixTreeStrategy indexes for
non-point data: 33% smaller index, 68% faster indexing, and 44% faster
searching. YMMV (David Smiley)
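The two-phase idea behind LUCENE-6198 and LUCENE-6244 above can be sketched
without any Lucene classes: a cheap approximation proposes candidate doc IDs,
the other clauses of the conjunction narrow them down, and only the survivors
pay for the costly check (for a phrase query, reading positions). This is a
conceptual toy, not the TwoPhaseIterator API; every name in it is
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class TwoPhaseSketch {
    /**
     * approxDocs:       candidates from the cheap approximation
     * otherClause:      cheap test standing in for the rest of the conjunction
     * expensiveMatches: costly confirmation, run only on surviving candidates
     */
    static List<Integer> search(int[] approxDocs,
                                IntPredicate otherClause,
                                IntPredicate expensiveMatches) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : approxDocs) {
            if (!otherClause.test(doc)) {
                continue; // rejected cheaply, no costly check needed
            }
            if (expensiveMatches.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

The saving comes from the number of times expensiveMatches is skipped, so the
benefit grows with how selective the other clauses are.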
API Changes
* LUCENE-6204, LUCENE-6208: Simplify CompoundFormat: remove files()
and remove files parameter to write(). (Robert Muir)
* LUCENE-6217: Add IndexWriter.isOpen and getTragicException. (Simon
Willnauer, Mike McCandless)
* LUCENE-6218, LUCENE-6220: Add Collector.needsScores() and needsScores
parameter to Query.createWeight(). (Robert Muir, Adrien Grand)
* LUCENE-4524, LUCENE-6246, LUCENE-6256, LUCENE-6271: Merge DocsEnum and DocsAndPositionsEnum
into a single PostingsEnum iterator. TermsEnum.docs() and TermsEnum.docsAndPositions()
are replaced by TermsEnum.postings().
(Alan Woodward, Simon Willnauer, Robert Muir, Ryan Ernst)
* LUCENE-6222: Removed TermFilter, use a QueryWrapperFilter(TermQuery)
instead. This will be as efficient now that queries can opt out from
scoring. (Adrien Grand)
* LUCENE-6269: Removed BooleanFilter, use a QueryWrapperFilter(BooleanQuery)
instead. (Adrien Grand)
* LUCENE-6270: Replaced TermsFilter with TermsQuery, use a
QueryWrapperFilter(TermsQuery) instead. (Adrien Grand)
* LUCENE-6223: Move BooleanQuery.BooleanWeight to BooleanWeight.
(Robert Muir)
* LUCENE-1518: Make Filter extend Query and return 0 as score.
(Uwe Schindler, Adrien Grand)
* LUCENE-6245: Force Filter subclasses to implement toString API from Query.
(Ryan Ernst)
* LUCENE-6268: Replace FieldValueFilter and DocValuesRangeFilter with equivalent
queries that support approximations. (Adrien Grand)
* LUCENE-6289: Replace DocValuesRangeFilter with DocValuesRangeQuery which
supports approximations. (Adrien Grand)
* LUCENE-6266: Remove unnecessary Directory params from SegmentInfo.toString,
SegmentInfos.files/toString, and SegmentCommitInfo.toString. (Robert Muir)
* LUCENE-6272: Scorer extends DocIdSetIterator rather than DocsEnum (Alan
Woodward)
* LUCENE-6281: Removed support for slow collations from lucene/sandbox. Better
performance would be achieved through CollationKeyAnalyzer or
ICUCollationKeyAnalyzer. (Adrien Grand)
* LUCENE-6286: Removed IndexSearcher methods that take a Filter object.
A BooleanQuery with a filter clause must be used instead. (Adrien Grand)
* LUCENE-6300: PrefixFilter, TermRangeFilter and NumericRangeFilter have been
removed. Use PrefixQuery, TermRangeQuery and NumericRangeQuery instead.
(Adrien Grand)
* LUCENE-6303: Replaced FilterCache with QueryCache and CachingWrapperFilter
with CachingWrapperQuery. (Adrien Grand)
* LUCENE-6317: Deprecate DataOutput.writeStringSet and writeStringStringMap.
Use writeSetOfStrings/Maps instead. (Mike McCandless, Robert Muir)
* LUCENE-6307: Rename SegmentInfo.getDocCount -> .maxDoc,
SegmentInfos.totalDocCount -> .totalMaxDoc, MergeInfo.totalDocCount
-> .totalMaxDoc and MergePolicy.OneMerge.totalDocCount ->
.totalMaxDoc (Adrien Grand, Robert Muir, Mike McCandless)
* LUCENE-6367: PrefixQuery now subclasses AutomatonQuery, removing the
specialized PrefixTermsEnum. (Robert Muir, Mike McCandless)
Other
* LUCENE-6248: Remove unused odd constants from StandardSyntaxParser.jj
(Dawid Weiss)
* LUCENE-6193: Collapse identical catch branches in try-catch statements.
(shalin)
* LUCENE-6239: Removed RAMUsageEstimator's sun.misc.Unsafe calls.
(Robert Muir, Dawid Weiss, Uwe Schindler)
* LUCENE-6292: Seed StringHelper better. (Robert Muir)
* LUCENE-6333: Refactored queries to delegate their equals and hashcode
impls to the super class. (Lee Hinman via Adrien Grand)
* LUCENE-6343: DefaultSimilarity javadocs had the wrong float value to
demonstrate precision of encoded norms (András Péteri via Mike McCandless)
Changes in Runtime Behavior
* LUCENE-6255: PhraseQuery now ignores leading holes and requires that
positions are positive and added in order. (Adrien Grand)
* LUCENE-6298: SimpleQueryParser returns an empty query rather than
null, if e.g. the terms were all stopwords. (Lee Hinman via Robert Muir)
======================= Lucene 5.0.0 =======================
New Features
* LUCENE-5945: All file handling converted to NIO.2 apis. (Robert Muir)
* LUCENE-5946: SimpleFSDirectory now uses Files.newByteChannel, for
portability with custom FileSystemProviders. If you want the old
non-interruptible behavior of RandomAccessFile, use RAFDirectory
in the misc/ module. (Uwe Schindler, Robert Muir)
* SOLR-3359: Added analyzer attribute/property to SynonymFilterFactory.
(Ryo Onodera via Koji Sekiguchi)
* LUCENE-5648: Index and search date ranges, particularly multi-valued ones. It's
implemented in the spatial module as DateRangePrefixTree used with
NumberRangePrefixTreeStrategy. (David Smiley)
* LUCENE-5895: Lucene now stores a unique id per-segment and per-commit to aid
in accurate replication of index files (Robert Muir, Mike McCandless)
* LUCENE-5889: Add commit method to AnalyzingInfixSuggester, and allow just using .add
to build up the suggester. (Varun Thacker via Mike McCandless)
* LUCENE-5123: Add a "pull" option to the postings writing API, so
that a PostingsFormat now receives a Fields instance and it is
responsible for iterating through all fields, terms, documents and
positions. (Robert Muir, Mike McCandless)
* LUCENE-5268: Full cutover of all postings formats to the "pull"
FieldsConsumer API, removing PushFieldsConsumer. Added new
PushPostingsWriterBase for single-pass push of docs/positions to the
postings format. (Mike McCandless)
* LUCENE-5906: Use Files.delete everywhere instead of File.delete, so that
when things go wrong, you get a real exception message explaining why.
(Uwe Schindler, Robert Muir)
* LUCENE-5933: Added FilterSpans for easier wrapping of Spans instance. (Shai Erera)
* LUCENE-5925: Remove fallback logic from opening commits, instead use
Directory.renameFile so that in-progress commits are never visible.
(Robert Muir)
* LUCENE-5820: SuggestStopFilter should have a factory.
(Varun Thacker via Steve Rowe)
* LUCENE-5949: Add Accountable.getChildResources(). (Robert Muir)
* SOLR-5986: Added ExitableDirectoryReader that extends FilterDirectoryReader and enables
exiting requests that take too long to enumerate over terms. (Anshum Gupta, Steve Rowe,
Robert Muir)
* LUCENE-5911: Add MemoryIndex.freeze() to allow thread-safe searching over a
MemoryIndex. (Alan Woodward, David Smiley, Robert Muir)
* LUCENE-5969: Lucene 5.0 has a new index format with mismatched file detection,
improved exception handling, and indirect norms encoding for sparse fields.
(Mike McCandless, Ryan Ernst, Robert Muir)
* LUCENE-6053: Add Serbian analyzer. (Nikola Smolenski via Robert Muir, Mike McCandless)
* LUCENE-4400: Add support for new NYSIIS Apache commons phonetic
codec (Thomas Neidhart via Mike McCandless)
* LUCENE-6059: Add Daitch-Mokotoff Soundex phonetic Apache commons
phonetic codec, and upgrade to Apache commons codec 1.10. (Thomas
Neidhart via Mike McCandless)
* LUCENE-6058: With the upgrade to Apache commons codec 1.10, the
experimental BeiderMorseFilter has changed its behavior, so any
index using it will need to be rebuilt. (Thomas
Neidhart via Mike McCandless)
* LUCENE-6050: Accept MUST and MUST_NOT (in addition to SHOULD) for
each context passed to Analyzing/BlendedInfixSuggester (Arcadius
Ahouansou, jane chang via Mike McCandless)
* LUCENE-5929: Also extract terms to highlight from block join
queries. (Julie Tibshirani via Mike McCandless)
* LUCENE-6063: Allow overriding whether/how ConcurrentMergeScheduler
stalls incoming threads when merges are falling behind (Mike
McCandless)
* LUCENE-5833: DocumentDictionary now enumerates each value separately
in a multi-valued field (not just the first value), so you can build
suggesters from multi-valued fields. (Varun Thacker via Mike
McCandless)
* LUCENE-6077: Added a filter cache. (Adrien Grand, Robert Muir)
* LUCENE-6088: TermsFilter implements Accountable. (Adrien Grand)
* LUCENE-6034: The default highlighter when used with QueryScorer will highlight payload-sensitive
queries provided that term vectors with positions, offsets, and payloads are present. This is the
only highlighter that can highlight such queries accurately. (David Smiley)
* LUCENE-5914: Add an option to Lucene50Codec to support either BEST_SPEED
or BEST_COMPRESSION for stored fields. (Adrien Grand, Robert Muir)
* LUCENE-6119: Add auto-IO-throttling to ConcurrentMergeScheduler, to
rate limit IO writes for each merge depending on incoming merge
rate. (Mike McCandless)
* LUCENE-6155: Add payload support to MemoryIndex. The default highlighter's
QueryScorer and WeightedSpanTermExtractor now have setUsePayloads(bool).
(David Smiley)
* LUCENE-6166: Deletions (alone) can now trigger new merges. (Mike McCandless)
* LUCENE-6177: Add CustomAnalyzer, which allows configuring analyzers
like you do in Solr's index schema. This class has a builder API to configure
Tokenizers, TokenFilters, and CharFilters based on their SPI names
and parameters as documented by the corresponding factories.
(Uwe Schindler)
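To make the motivation for LUCENE-5906 above concrete, the following
standalone sketch contrasts the two JDK delete calls. File.delete only
returns a boolean, while Files.delete throws an exception that names the
cause. The class and method names are hypothetical and nothing here is
Lucene code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DeleteSketch {
    /** Legacy API: reports only success or failure, never the reason. */
    static boolean legacyDelete(Path path) {
        return path.toFile().delete();
    }

    /** NIO.2 API: returns "deleted" on success, else a description of the failure. */
    static String nioDelete(Path path) {
        try {
            Files.delete(path);
            return "deleted";
        } catch (NoSuchFileException e) {
            return "NoSuchFileException: " + e.getFile();
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }
}
```

For a path that does not exist, legacyDelete simply returns false, whereas
nioDelete reports a NoSuchFileException together with the offending path.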
Optimizations
* LUCENE-5960: Use a more efficient bitset, not a Set<Integer>, to
track visited states. (Markus Heiden via Mike McCandless)
* LUCENE-5959: Don't allocate excess memory when building automaton in
finish. (Markus Heiden via Mike McCandless)
* LUCENE-5963: Reduce memory allocations in
AnalyzingSuggester. (Markus Heiden via Mike McCandless)
* LUCENE-5938: MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE is now faster on
queries that match few documents by using a sparse bit set implementation.
(Adrien Grand)
* LUCENE-5969: Refactor merging to be more efficient, checksum calculation is
per-segment/per-producer, and norms and doc values merging no longer cause
RAM spikes for latent fields. (Mike McCandless, Robert Muir)
* LUCENE-5983: CachingWrapperFilter now uses a new DocIdSet implementation
called RoaringDocIdSet instead of WAH8DocIdSet. (Adrien Grand)
* LUCENE-6022: DocValuesDocIdSet checks live docs before doc values.
(Adrien Grand)
* LUCENE-6030: Add norms patched compression for a small number of common values
(Ryan Ernst)
* LUCENE-6040: Speed up EliasFanoDocIdSet through broadword bit selection.
(Paul Elschot)
* LUCENE-6033: CachingTokenFilter now uses ArrayList not LinkedList, and has new
isCached() method. (David Smiley)
* LUCENE-6031: TokenSources (in the default highlighter) converts term vectors into a
TokenStream much faster, in linear time (not N*log(N)), using less memory, and with reset()
implemented. Only one of offsets or positions is required of the term vector.
(David Smiley)
* LUCENE-6089, LUCENE-6090: Tune CompressionMode.HIGH_COMPRESSION for
better compression and less cpu usage. (Adrien Grand, Robert Muir)
* LUCENE-6034: QueryScorer, used by the default highlighter, needn't re-index the provided
TokenStream with MemoryIndex when it comes from TokenSources (term vectors) with offsets and
positions. (David Smiley)
* LUCENE-5951: ConcurrentMergeScheduler detects whether the index is on SSD or not
and does a better job defaulting its settings. This only works on Linux for now;
other OSes will continue to use the previous defaults (tuned for spinning disks).
(Robert Muir, Uwe Schindler, hossman, Mike McCandless)
* LUCENE-6131: Optimize SortingMergePolicy. (Robert Muir)
* LUCENE-6133: Improve default StoredFieldsWriter.merge() to be more efficient.
(Robert Muir)
* LUCENE-6145: Make EarlyTerminatingSortingCollector able to early-terminate
when the sort order is a prefix of the index-time order. (Adrien Grand)
* LUCENE-6178: Score boolean queries containing MUST_NOT clauses with BooleanScorer2,
to use skip list data and avoid unnecessary scoring. (Adrien Grand, Robert Muir)
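The change described in LUCENE-5960 above (a bitset instead of a
Set<Integer> for visited states) follows a common pattern when states are
dense small integers. The sketch below shows that pattern on a toy
transition table using java.util.BitSet. It is not the automaton code from
the issue, and the names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

public class VisitedStatesSketch {
    /**
     * Counts states reachable from start. transitions[s] lists the targets
     * of state s. One bit per state replaces a boxed Integer per visited
     * state in a hash set.
     */
    static int countReachable(int[][] transitions, int start) {
        BitSet visited = new BitSet(transitions.length);
        Deque<Integer> stack = new ArrayDeque<>();
        visited.set(start);
        stack.push(start);
        int count = 0;
        while (!stack.isEmpty()) {
            int state = stack.pop();
            count++;
            for (int target : transitions[state]) {
                if (!visited.get(target)) {
                    visited.set(target);
                    stack.push(target);
                }
            }
        }
        return count;
    }
}
```

The work stack still boxes integers here for brevity; an int[] stack would
remove that allocation as well.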
API Changes
* LUCENE-5900: Deprecated more constructors taking Version in *InfixSuggester and
ICUCollationKeyAnalyzer, and removed TEST_VERSION_CURRENT from the test framework.
(Ryan Ernst)
* LUCENE-4535: oal.util.FilterIterator is now an internal API.
(Adrien Grand)
* LUCENE-4924: DocIdSetIterator.docID() must now return -1 when the iterator is
not positioned. This change affects all classes that inherit from
DocIdSetIterator, including DocsEnum and DocsAndPositionsEnum. (Adrien Grand)
* LUCENE-5127: Reduce RAM usage of FixedGapTermsIndex. Remove
IndexWriterConfig.setTermIndexInterval, IndexWriterConfig.setReaderTermsIndexDivisor,
and termsIndexDivisor from StandardDirectoryReader. These options have been no-ops
with the default codec since Lucene 4.0. If you want to configure the interval for
this term index, pass it directly in your codec, where it can also be configured
per-field. (Robert Muir)
* LUCENE-5388: Remove Reader from Tokenizer's constructor and from
Analyzer's createComponents. TokenStreams now always get their input
via setReader.
(Benson Margulies via Robert Muir - pull request #16)
* LUCENE-5527: The Collector API has been refactored to use a dedicated Collector
per leaf. (Shikhar Bhushan, Adrien Grand)
* LUCENE-5702: The FieldComparator API has been refactored to a per-leaf API, just
like Collectors. (Adrien Grand)
* LUCENE-4246: IndexWriter.close now always closes, even if it throws
an exception. The new IndexWriterConfig.setCommitOnClose (default
true) determines whether close() should commit before closing.
* LUCENE-5608, LUCENE-5565: Refactor SpatialPrefixTree/Cell API. Doesn't use Strings
as tokens anymore, and now iterates cells on-demand during indexing instead of
building a collection. RPT now has more setters. (David Smiley)
* LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc)
to use the DocValues API instead of FieldCache. For FieldCache functionality,
use UninvertingReader in lucene/misc (or implement your own FilterReader).
UninvertingReader is more efficient: supports multi-valued numeric fields,
detects when a multi-valued field is single-valued, reuses caches
of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access
without insanity). "Insanity" is no longer possible unless you explicitly want it.
Rename FieldCache* and DocTermOrds* classes in the search package to DocValues*.
Move SortedSetSortField to core and add SortedSetFieldSource to queries/, which
takes the same selectors. Add helper methods to DocValues.java that are better
suited for search code (never return null, etc). (Mike McCandless, Robert Muir)
* LUCENE-5871: Remove Version from IndexWriterConfig. Use
IndexWriterConfig.setCommitOnClose to change the behavior of IndexWriter.close().
The default has been changed to match that of 4.x.
(Ryan Ernst, Mike McCandless)
* LUCENE-5965: CorruptIndexException requires a String or DataInput resource.
(Robert Muir)
* LUCENE-5972: IndexFormatTooOldException and IndexFormatTooNewException now
extend from IOException.
(Ryan Ernst, Robert Muir)
* LUCENE-5569: *AtomicReader/AtomicReaderContext have been renamed to *LeafReader/LeafReaderContext.
(Ryan Ernst)
* LUCENE-5938: Removed MultiTermQuery.ConstantScoreAutoRewrite as
MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE is usually better. (Adrien Grand)
* LUCENE-5924: Rename CheckIndex -fix option to -exorcise. This option does not
actually fix the index, it just drops data. (Robert Muir)
* LUCENE-5969: Add Codec.compoundFormat, which handles the encoding of compound
files. Add getMergeInstance() to codec producer APIs, which can be overridden
to return an instance optimized for merging instead of searching. Add
Terms.getStats() which can return additional codec-specific statistics about a field.
Change instance method SegmentInfos.read() to two static methods: SegmentInfos.readCommit()
and SegmentInfos.readLatestCommit().
(Mike McCandless, Robert Muir)
* LUCENE-5992: Remove FieldInfos from SegmentInfosWriter.write API. (Robert Muir, Mike McCandless)
* LUCENE-5998: Simplify Field/SegmentInfoFormat to read+write methods.
(Robert Muir)
* LUCENE-6000: Removed StandardTokenizerInterface. Tokenizers now use
their jflex impl directly.
(Ryan Ernst)
* LUCENE-6006: Removed FieldInfo.normType since it's redundant: it
will be DocValuesType.NUMERIC if the field is indexed and does not omit
norms, else null. (Robert Muir, Mike McCandless)
* LUCENE-6013: Removed indexed boolean from IndexableFieldType and
FieldInfo, since it's redundant with IndexOptions != null. (Robert
Muir, Mike McCandless)
* LUCENE-6021: FixedBitSet.nextSetBit now returns DocIdSetIterator.NO_MORE_DOCS
instead of -1 when there are no more bits which are set. (Adrien Grand)
* LUCENE-5953: Directory and LockFactory APIs were restructured: Locking is
now under the responsibility of the Directory implementation. LockFactory is
only used by subclasses of BaseDirectory to delegate locking to an impl
class. LockFactories are now singletons and are responsible for creating a Lock
instance based on a Directory implementation passed to the factory method.
See MIGRATE.txt for more details. (Uwe Schindler, Robert Muir)
* LUCENE-6062: Throw exception instead of silently doing nothing if you try to
sort/group/etc on a misconfigured field (e.g. no docvalues, no UninvertingReader, etc).
(Robert Muir)
* LUCENE-6068: LeafReader.fields() never returns null. (Robert Muir)
* LUCENE-6082: Remove abort() from codec apis. (Robert Muir)
* LUCENE-6084: IndexOutput's constructor now requires a String
resourceDescription so its toString is sane (Robert Muir, Mike
McCandless)
* LUCENE-6087: Allow passing custom DirectoryReader to SearcherManager
(Mike McCandless)
* LUCENE-6085: Undeprecate SegmentInfo attributes, but add safety so they
won't be trappy if codec tries to use them during docvalues updates.
(Robert Muir)
* LUCENE-6097: Remove dangerous / overly expert
IndexWriter.abortMerges and waitForMerges methods. (Robert Muir,
Mike McCandless)
* LUCENE-6099: Add FilterDirectory.unwrap and
FilterDirectoryReader.unwrap (Simon Willnauer, Mike McCandless)
* LUCENE-6121: CachingTokenFilter.reset() now propagates to its input if called before
incrementToken(). You must call reset() now on this filter instead of doing it a-priori on the
input(), which previously didn't work. (David Smiley, Robert Muir)
* LUCENE-6147: Make the core Accountables.namedAccountable function public
(Ryan Ernst)
* LUCENE-6150: Remove staleFiles set and onIndexOutputClosed() from FSDirectory.
(Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-6146: Replaced Directory.copy() with Directory.copyFrom().
(Robert Muir)
* LUCENE-6149: Infix suggesters' highlighting and allTermsRequired can
be set at the constructor for non-contextual lookup.
(Boon Low, Tomás Fernández Löbbe)
* LUCENE-6158, LUCENE-6165: IndexWriter.addIndexes(IndexReader...) changed to
addIndexes(CodecReader...) (Robert Muir)
* LUCENE-6179: Out-of-order scoring is not allowed anymore, so
Weight.scoresDocsOutOfOrder and LeafCollector.acceptsDocsOutOfOrder have been
removed and boolean queries now always score in order.
* LUCENE-6212: IndexWriter no longer accepts per-document Analyzer to
add/updateDocument. These methods were trappy as they made it
easy to accidentally index tokens that were not easily
searchable. (Mike McCandless)
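The sentinel change in LUCENE-6021 above affects any loop that previously
stopped on -1. The sketch below mimics the new contract on top of
java.util.BitSet (which itself still returns -1) so that it runs without
Lucene. NextSetBitSketch and its methods are hypothetical, and NO_MORE_DOCS
is assumed here to equal Integer.MAX_VALUE, as in DocIdSetIterator.

```java
import java.util.BitSet;

public class NextSetBitSketch {
    /** Stand-in for DocIdSetIterator.NO_MORE_DOCS (assumed Integer.MAX_VALUE). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Adapts BitSet's -1 convention to the NO_MORE_DOCS convention. */
    static int nextSetBit(BitSet bits, int from) {
        int next = bits.nextSetBit(from);
        return next == -1 ? NO_MORE_DOCS : next;
    }

    /** Iterates set bits from the given index, terminating on the sentinel. */
    static int countFrom(BitSet bits, int from) {
        int count = 0;
        for (int doc = nextSetBit(bits, from);
             doc != NO_MORE_DOCS;
             doc = nextSetBit(bits, doc + 1)) {
            count++;
        }
        return count;
    }
}
```

Code written against the old behavior would test for -1 in the loop
condition and therefore needs to compare against the sentinel instead.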
Bug Fixes
* LUCENE-5650: Enforce read-only access to any path outside the temporary
folder via security manager, and make test temp dirs absolute.
(Ryan Ernst, Dawid Weiss)
* LUCENE-5948: RateLimiter now fully inits itself on init. (Varun
Thacker via Mike McCandless)
* LUCENE-5981: CheckIndex obtains write.lock, since with some parameters it
may modify the index, and to prevent false corruption reports, as it does
not have the regular "spinlock" of DirectoryReader.open. It now implements
Closeable and you must close it to release the lock. (Mike McCandless, Robert Muir)
* LUCENE-6004: Don't highlight the LookupResult.key returned from
AnalyzingInfixSuggester (Christian Reuschling, jane chang via Mike McCandless)
* LUCENE-5980: Don't let document length overflow. (Robert Muir)
* LUCENE-5961: Fix the exists() method for FunctionValues returned by many ValueSources to
behave properly when wrapping other ValueSources which do not exist for the specified document
(hossman)
* LUCENE-6039: Add IndexOptions.NONE and DocValuesType.NONE instead of
using null to mean not index and no doc values, renamed
IndexOptions.DOCS_ONLY to DOCS, and pulled IndexOptions and
DocValues out of FieldInfo into their own classes in
org.apache.lucene.index (Simon Willnauer, Robert Muir, Mike
McCandless)
* LUCENE-6041: Remove sugar methods FieldInfo.isIndexed and
FieldInfo.hasDocValues. (Robert Muir, Mike McCandless)
* LUCENE-6044: Fix backcompat support for token filters with enablePositionIncrements=false.
Also fixed backcompat for TrimFilter with updateOffsets=true. These options
are supported with a match version before 4.4, and no longer valid at all with 5.0.
(Ryan Ernst)
* LUCENE-6042: CustomScoreQuery explain was incorrect in some cases,
such as when nested inside a boolean query. (Denis Lantsman via Robert Muir)
* LUCENE-6046: Add maxDeterminizedStates safety to determinize (which has
an exponential worst case) so that if it would create too many states, it
now throws an exception instead of exhausting CPU/RAM. (Nik
Everett via Mike McCandless)
* LUCENE-6054: Allow repeating the empty automaton (Nik Everett via
Mike McCandless)
* LUCENE-6049: Don't throw cryptic exception writing a segment when
the only docs in it had fields that hit non-aborting exceptions
during indexing but also had doc values. (Mike McCandless)
* LUCENE-6055: PayloadAttribute.clone() now does a deep clone of the underlying
bytes. (Shai Erera)
* LUCENE-6060: Remove dangerous IndexWriter.unlock method (Simon
Willnauer, Mike McCandless)
* LUCENE-6062: Pass correct fieldinfos to docvalues producer when the
segment has updates. (Mike McCandless, Shai Erera, Robert Muir)
* LUCENE-6075: Don't overflow int in SimpleRateLimiter (Boaz Leskes
via Mike McCandless)
* LUCENE-5987: IndexWriter will now forcefully close itself on
aborting exception (an exception that would otherwise cause silent
data loss). (Robert Muir, Mike McCandless)
* LUCENE-6094: Allow IW.rollback to stop ConcurrentMergeScheduler even
when it's stalling because there are too many merges. (Mike McCandless)
* LUCENE-6105: Don't cache FST root arcs if the number of root arcs is
small, or if the cache would be > 20% of the size of the FST.
(Robert Muir, Mike McCandless)
* LUCENE-6124: Fix double-close() problems in codec and store APIs.
(Robert Muir)
* LUCENE-6152: Fix double close problems in OutputStreamIndexOutput.
(Uwe Schindler)
* LUCENE-6139: Highlighter: TokenGroup start & end offset getters should have
been returning the offsets of just the matching tokens in the group when
there's a distinction. (David Smiley)
* LUCENE-6173: NumericTermAttribute and spatial/CellTokenStream do not clone
their BytesRef(Builder)s. Also equals/hashCode was missing. (Uwe Schindler)
* LUCENE-6205: Fixed intermittent concurrency issue that could cause
FileNotFoundException when writing doc values updates at the same
time that a merge kicks off. (Mike McCandless)
* LUCENE-6192: Fix int overflow corruption case in skip data for
high frequency terms in extremely large indices (Robert Muir, Mike
McCandless)
* LUCENE-6093: Don't throw NullPointerException from
BlendedInfixSuggester for lookups that do not end in a prefix
token. (jane chang via Mike McCandless)
* LUCENE-6214: Fixed IndexWriter deadlock when one thread is
committing while another opens a near-real-time reader and an
unrecoverable (tragic) exception is hit. (Simon Willnauer, Mike
McCandless)
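Several entries above (for example LUCENE-6075 and LUCENE-6192) concern int
overflow. The generic sketch below shows how such a bug typically arises in a
rate-limiter style calculation and how widening to long avoids it. It is not
the SimpleRateLimiter code; the class, method names, and formula are
assumptions made for illustration.

```java
public class OverflowSketch {
    /** Buggy: the multiplication is done in 32-bit int and wraps around. */
    static long pauseNanosBuggy(int bytes, int bytesPerSec) {
        return (bytes * 1_000_000_000) / bytesPerSec;
    }

    /** Fixed: the long literal forces 64-bit arithmetic before dividing. */
    static long pauseNanos(int bytes, int bytesPerSec) {
        return (bytes * 1_000_000_000L) / bytesPerSec;
    }
}
```

Writing 1 MB at 1 MB/sec should pause for one second (1,000,000,000 ns); the
fixed version returns that, while the buggy version returns a wrapped value.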
Documentation
* LUCENE-5392: Add/improve analysis package documentation to reflect
analysis API changes. (Benson Margulies via Robert Muir - pull request #17)
* LUCENE-6057: Improve Sort(SortField) docs (Martin Braun via Mike McCandless)
* LUCENE-6112: Fix compile error in FST package example code
(Tomoko Uchida via Koji Sekiguchi)
Tests
* LUCENE-5957: Add option for tests to not randomize codec
(Ryan Ernst)
* LUCENE-5974: Add check that backcompat indexes use default codecs
(Ryan Ernst)
* LUCENE-5971: Create addBackcompatIndexes.py script to build and add
backcompat test indexes for a given lucene version. Also renamed backcompat
index files to use Version.toString() in filename.
(Ryan Ernst)
* LUCENE-6002: Monster tests no longer fail. Most of them now have an 80 hour
timeout, effectively removing the timeout. The tests that operate near the 2
billion limit now use IndexWriter.MAX_DOCS instead of Integer.MAX_VALUE.
Some of the slow Monster tests now explicitly choose the default codec.
(Mike McCandless, Shawn Heisey)
* LUCENE-5968: Improve error message when 'ant beast' is run on top-level
modules. (Ramkumar Aiyengar, Uwe Schindler)
* LUCENE-6120: Fix MockDirectoryWrapper's close() handling.
(Mike McCandless, Robert Muir)
Build
* LUCENE-5909: Smoke tester now has better command line parsing and
optionally also runs on Java 8. (Ryan Ernst, Uwe Schindler)
* LUCENE-5902: Add bumpVersion.py script to manage version increase after release branch is cut.
* LUCENE-5962: Rename diffSources.py to createPatch.py and make it work with all text file types.
(Ryan Ernst)
* LUCENE-5995: Upgrade ICU to 54.1 (Robert Muir)
* LUCENE-6070: Upgrade forbidden-apis to 1.7 (Uwe Schindler)
Other
* LUCENE-5563: Removed sep layout, which has fallen behind on features and doesn't
perform as well as other options. (Robert Muir)
* LUCENE-4086: Removed support for Lucene 3.x indexes. See migration guide for
more information. (Robert Muir)
* LUCENE-5858: Moved Lucene 4 compatibility codecs to 'lucene-backward-codecs.jar'.
(Adrien Grand, Robert Muir)
* LUCENE-5915: Remove Pulsing postings format. (Robert Muir)
* LUCENE-6213: Add useful exception message when commit contains segments from legacy codecs.
(Ryan Ernst)
======================= Lucene 4.10.4 ======================
Bug fixes
* LUCENE-6019, LUCENE-6117: Remove -Dtests.assert to make IndexWriter
infoStream sane. (Robert Muir, Mike McCandless)
* LUCENE-6161: Resolving deletes was failing to reuse DocsEnum likely
causing substantial performance cost for use cases that frequently
delete old documents (Mike McCandless)
* LUCENE-6192: Fix int overflow corruption case in skip data for
high frequency terms in extremely large indices (Robert Muir, Mike
McCandless)
* LUCENE-6207: Fixed consumption of several terms enums on the same
sorted (set) doc values instance at the same time.
(Tom Shally, Robert Muir, Adrien Grand)
* LUCENE-6093: Don't throw NullPointerException from
BlendedInfixSuggester for lookups that do not end in a prefix
token. (jane chang via Mike McCandless)
* LUCENE-6279: Don't let an abusive leftover _N_upgraded.si in the
index directory cause index corruption on upgrade (Robert Muir, Mike
McCandless)
* LUCENE-6287: Fix concurrency bug in IndexWriter that could cause
index corruption (missing _N.si files) the first time 4.x kisses a
3.x index if merges are also running. (Simon Willnauer, Mike
McCandless)
* LUCENE-6205: Fixed intermittent concurrency issue that could cause
FileNotFoundException when writing doc values updates at the same
time that a merge kicks off. (Mike McCandless)
* LUCENE-6214: Fixed IndexWriter deadlock when one thread is
committing while another opens a near-real-time reader and an
unrecoverable (tragic) exception is hit. (Simon Willnauer, Mike
McCandless)
* LUCENE-6105: Don't cache FST root arcs if the number of root arcs is
small, or if the cache would be > 20% of the size of the FST.
(Robert Muir, Mike McCandless)
* LUCENE-6001: DrillSideways hits NullPointerException for certain
BooleanQuery searches. (Dragan Jotannovic, jane chang via Mike
McCandless)
* LUCENE-6306: Merging of doc values and norms now checks whether the
merge was aborted so IndexWriter.rollback can more promptly abort a
running merge. (Robert Muir, Mike McCandless)
API Changes
* LUCENE-6212: Deprecate IndexWriter APIs that accept per-document Analyzer.
These methods were trappy as they made it easy to accidentally index
tokens that were not easily searchable and will be removed in 5.0.0.
(Mike McCandless)
======================= Lucene 4.10.3 ======================
Bug fixes
* LUCENE-6046: Add maxDeterminizedStates safety to determinize (which has
an exponential worst case) so that if it would create too many states, it
now throws an exception instead of exhausting CPU/RAM. (Nik
Everett via Mike McCandless)
* LUCENE-6054: Allow repeating the empty automaton (Nik Everett via
Mike McCandless)
* LUCENE-6049: Don't throw cryptic exception writing a segment when
the only docs in it had fields that hit non-aborting exceptions
during indexing but also had doc values. (Mike McCandless)
* LUCENE-6060: Deprecate IndexWriter.unlock (Simon Willnauer, Mike
McCandless)
* LUCENE-3229: Overlapping ordered SpanNearQuery spans should not match.
(Ludovic Boutros, Paul Elschot, Greg Dearing, ehatcher)
* LUCENE-6004: Don't highlight the LookupResult.key returned from
AnalyzingInfixSuggester (Christian Reuschling, jane chang via Mike McCandless)
* LUCENE-6075: Don't overflow int in SimpleRateLimiter (Boaz Leskes
via Mike McCandless)
* LUCENE-5980: Don't let document length overflow. (Robert Muir)
* LUCENE-6042: CustomScoreQuery explain was incorrect in some cases,
such as when nested inside a boolean query. (Denis Lantsman via Robert Muir)
* LUCENE-5948: RateLimiter now fully inits itself on init. (Varun
Thacker via Mike McCandless)
* LUCENE-6055: PayloadAttribute.clone() now does a deep clone of the underlying
bytes. (Shai Erera)
* LUCENE-6094: Allow IW.rollback to stop ConcurrentMergeScheduler even
when it's stalling because there are too many merges. (Mike McCandless)
Documentation
* LUCENE-6057: Improve Sort(SortField) docs (Martin Braun via Mike McCandless)
======================= Lucene 4.10.2 ======================
Bug fixes
* LUCENE-5977: Fix tokenstream safety checks in IndexWriter to properly
work across multi-valued fields. Previously some cases across multi-valued
fields would happily create a corrupt index. (Dawid Weiss, Robert Muir)
* LUCENE-6019: Detect when DocValuesType illegally changes for the
same field name. Also added -Dtests.asserts=true|false so we can
run tests with and without assertions. (Simon Willnauer, Robert
Muir, Mike McCandless).
======================= Lucene 4.10.1 ======================
Bug fixes
* LUCENE-5934: Fix backwards compatibility for 4.0 indexes.
(Ian Lea, Uwe Schindler, Robert Muir, Ryan Ernst)
* LUCENE-5939: Regenerate old backcompat indexes to ensure they were built with
the exact release
(Ryan Ernst, Uwe Schindler)
* LUCENE-5952: Improve error messages when version cannot be parsed;
don't check for too old or too new major version (it's too low level
to enforce here); use simple string tokenizer. (Ryan Ernst, Uwe Schindler,
Robert Muir, Mike McCandless)
* LUCENE-5958: Don't let exceptions during checkpoint corrupt the index.
Refactor existing OOM handling too, so you don't need to handle OOM specially
for every IndexWriter method: instead such disasters will cause IW to close itself
defensively. (Robert Muir, Mike McCandless)
* LUCENE-5904: Fixed a corruption case that can happen when 1)
IndexWriter is uncleanly shut-down (OS crash, power loss, etc.), 2)
on startup, when a new IndexWriter is created, a virus checker is
holding some of the previously written but unused files open and
preventing deletion, 3) IndexWriter writes these files again during
the course of indexing, then the files can later be deleted, causing
corruption. This case was detected by adding evilness to
MockDirectoryWrapper to have it simulate a virus checker holding a
file open and preventing deletion (Robert Muir, Mike McCandless)
* LUCENE-5916: Static scope test components should be consistent between
tests (and test iterations). Fix for FaultyIndexInput in particular.
(Dawid Weiss)
* LUCENE-5975: Fix reading of 3.0-3.3 indexes, where bugs in these old
index formats would result in CorruptIndexException "did not read all
bytes from file" when reading the deleted docs file. (Patrick Mi, Robert Muir)
Tests
* LUCENE-5936: Add backcompat checks to verify what is tested matches known versions
(Ryan Ernst)
======================= Lucene 4.10.0 ======================
New Features
* LUCENE-5778: Support hunspell morphological description fields/aliases.
(Robert Muir)
* LUCENE-5801: Added (back) OrdinalMappingAtomicReader for merging search
indexes that contain category ordinals from separate taxonomy indexes.
(Nicola Buso via Shai Erera)
* LUCENE-4175, LUCENE-5714, LUCENE-5779: Index and search rectangles with spatial
BBoxSpatialStrategy using most predicates. Sort documents by relative overlap
of query areas or just by indexed shape area. (Ryan McKinley, David Smiley)
* LUCENE-5806: Extend expressions grammar to support array access in variables.
Added helper class VariableContext to parse complex variable into pieces.
(Ryan Ernst)
* LUCENE-5826: Support proper hunspell case handling, LANG, KEEPCASE, NEEDAFFIX,
and ONLYINCOMPOUND flags. (Robert Muir)
* LUCENE-5815: Add TermAutomatonQuery, a proximity query allowing you
to create an arbitrary automaton, using terms on the transitions,
expressing which sequence of sequential terms (including a special
"any" term) are allowed. This is a generalization of
MultiPhraseQuery and span queries, and enables "correct" (including
position) length search-time graph synonyms. (Mike McCandless)
* LUCENE-5819: Add OrdsLucene41 block tree terms dict and postings
format, to include term ordinals in the index so the optional
TermsEnum.ord() and TermsEnum.seekExact(long ord) APIs work. (Mike
McCandless)
* LUCENE-5835: TermValComparator can sort missing values last. (Adrien Grand)
* LUCENE-5825: Benchmark module can use custom postings format, e.g.:
codec.postingsFormat=Memory (Varun Shenoy, David Smiley)
* LUCENE-5842: When opening large files (where it's too expensive to compare
checksum against all the bytes), retrieve checksum to validate structure
of footer; this can detect some forms of corruption, such as truncation.
(Robert Muir)
* LUCENE-5739: Added DataInput.readZ(Int|Long) and DataOutput.writeZ(Int|Long)
to read and write small signed integers. (Adrien Grand)
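Note on LUCENE-5739: assuming the "Z" in readZInt/writeZInt refers to zig-zag encoding, the sketch below shows why that encoding suits small signed integers. It is an illustration only, not the DataInput/DataOutput implementation; ZigZag and its method names are hypothetical.

```java
final class ZigZag {
    /** Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... so small magnitudes stay small. */
    static long encode(long v) {
        return (v >> 63) ^ (v << 1);
    }

    /** Inverse of encode. */
    static long decode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    /** Bytes needed by a 7-bits-per-byte variable-length encoding of the raw 64-bit pattern. */
    static int variableLength(long bits) {
        int n = 1;
        while ((bits & ~0x7FL) != 0) {
            bits >>>= 7;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(encode(-1));                 // 1
        System.out.println(variableLength(-1L));        // 10 bytes for the raw bit pattern
        System.out.println(variableLength(encode(-1))); // 1 byte after zig-zag
    }
}
```

Without the zig-zag step, any negative number has its top bit set and therefore needs the maximum number of bytes in a variable-length encoding.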
API Changes
* LUCENE-5752: Simplified Automaton API to be immutable. (Mike McCandless)
* LUCENE-5793: Add equals/hashCode to FieldType. (Shay Banon, Robert Muir)
* LUCENE-5692: DisjointSpatialFilter is deprecated (used by RecursivePrefixTreeStrategy)
(David Smiley)
* LUCENE-5771: SpatialOperation's predicate names are now aliased to OGC standard names.
Thus you can use: Disjoint, Equals, Intersects, Overlaps, Within, Contains, Covers,
CoveredBy. The area requirement on the predicates was removed, and Overlaps' definition
was fixed. (David Smiley)
* LUCENE-5850: Made Version handling more robust and extensible. Deprecated
Constants.LUCENE_MAIN_VERSION, Constants.LUCENE_VERSION and current Version
constants of the form LUCENE_X_Y. Added version constants that include bugfix
number of form LUCENE_X_Y_Z. Changed Version.LUCENE_CURRENT to Version.LATEST.
CheckIndex now prints the Lucene version used to write each segment.
(Ryan Ernst, Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-5836: BytesRef has been split into BytesRef, whose intended usage is
to be just a reference to a section of a larger byte[] and BytesRefBuilder
which is a StringBuilder-like class for BytesRef instances. (Adrien Grand)
* LUCENE-5883: You can now change the MergePolicy instance on a live IndexWriter,
without first closing and reopening the writer. This allows you to, e.g., run a special
merge with UpgradeIndexMergePolicy without reopening the writer. Also, MergePolicy
no longer implements Closeable; if you need to release your custom MergePolicy's
resources, you need to implement close() and call it explicitly. (Shai Erera)
* LUCENE-5859: Deprecate Analyzer constructors taking Version. Use Analyzer.setVersion()
to set the version of an analyzer to replicate behavior from a specific release.
(Ryan Ernst, Robert Muir)
Optimizations
* LUCENE-5780: Make OrdinalMap more memory-efficient, especially in case the
first segment has all values. (Adrien Grand, Robert Muir)
* LUCENE-5782: OrdinalMap now sorts enums before being built in order to
improve compression. (Adrien Grand)
* LUCENE-5798: Optimize MultiDocsEnum reuse. (Robert Muir)
* LUCENE-5799: Optimize numeric docvalues merging. (Robert Muir)
* LUCENE-5797: Optimize norms merging (Adrien Grand, Robert Muir)
* LUCENE-5803: Add DelegatingAnalyzerWrapper, an optimized variant
of AnalyzerWrapper that doesn't allow wrapping components or readers.
This wrapper class is the base class of all analyzers that just delegate
to another analyzer, e.g. per field name: PerFieldAnalyzerWrapper and
Solr's schema support. (Shay Banon, Uwe Schindler, Robert Muir)
* LUCENE-5795: MoreLikeThisQuery now only collects the top N terms instead
of collecting all terms from the like text when building the query.
(Alex Ksikes, Simon Willnauer)
* LUCENE-5681: Fix RAMDirectory's IndexInput to not do double buffering
on slices (causes useless data copying, especially on random access slices).
This also improves slices of NRTCachingDirectory, because the cache
is based on RAMDirectory. BufferedIndexInput.wrap() was marked with a
warning in javadocs. It is almost always a better idea to implement
slicing on your own! (Uwe Schindler, Robert Muir)
* LUCENE-5834: Empty sorted set and numeric doc values are now singletons.
(Adrien Grand)
* LUCENE-5841: Improve performance of block tree terms dictionary when
assigning terms to blocks. (Mike McCandless)
* LUCENE-5856: Optimize Fixed/Open/LongBitSet to remove unnecessary AND.
(Robert Muir)
* LUCENE-5884: Optimize FST.ramBytesUsed. (Adrien Grand, Robert Muir,
Mike McCandless)
* LUCENE-5882: Add Lucene410DocValuesFormat, with faster term lookups
for SORTED/SORTED_SET fields. (Robert Muir)
* LUCENE-5887: Remove WeakIdentityMap caching in AttributeFactory,
AttributeSource, and VirtualMethod in favour of Java 7's ClassValue.
Always use MethodHandles to create AttributeImpl classes.
(Uwe Schindler)
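Note on LUCENE-5887: the combination of ClassValue and MethodHandles can be sketched with JDK classes alone. The example below caches one constructor handle per class and reuses it for every instantiation. ConstructorCache and its members are hypothetical names, and this is not the AttributeFactory code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.concurrent.atomic.AtomicInteger;

final class ConstructorCache {
    /** Counts how often the (comparatively slow) constructor lookup really runs. */
    static final AtomicInteger lookups = new AtomicInteger();

    // ClassValue keeps one lazily computed value per Class without pinning the class in memory.
    private static final ClassValue<MethodHandle> NO_ARG_CTOR = new ClassValue<MethodHandle>() {
        @Override
        protected MethodHandle computeValue(Class<?> type) {
            lookups.incrementAndGet();
            try {
                return MethodHandles.publicLookup()
                        .findConstructor(type, MethodType.methodType(void.class));
            } catch (NoSuchMethodException | IllegalAccessException e) {
                throw new IllegalArgumentException(
                        "no public no-arg constructor: " + type.getName(), e);
            }
        }
    };

    /** Creates a new instance through the cached handle. */
    static <T> T newInstance(Class<T> type) {
        try {
            return type.cast(NO_ARG_CTOR.get(type).invoke());
        } catch (Error | RuntimeException e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        StringBuilder a = newInstance(StringBuilder.class);
        StringBuilder b = newInstance(StringBuilder.class);
        System.out.println(a != b);        // two distinct instances
        System.out.println(lookups.get()); // the lookup itself ran once for StringBuilder
    }
}
```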
Bug Fixes
* LUCENE-5796: Fixes the Scorer.getChildren() method for two combinations
of BooleanQuery. (Terry Smith via Robert Muir)
* LUCENE-5790: Fix compareTo in MutableValueDouble and MutableValueBool; this caused
incorrect results when grouping on fields with missing values.
(海老澤 志信, hossman)
* LUCENE-5817: Fix hunspell zero-affix handling: previously only zero-strips worked
correctly. (Robert Muir)
* LUCENE-5818, LUCENE-5823: Fix hunspell overgeneration for short strings that also
match affixes, words are only stripped to a zero-length string if FULLSTRIP option
is specified in the dictionary. (Robert Muir)
* LUCENE-5824: Fix hunspell 'long' flag handling. (Robert Muir)
* LUCENE-5838: Fix hunspell when the .aff file has over 64k affixes. (Robert Muir)
* LUCENE-5869: Added restriction to positive values for maxExpansions in
FuzzyQuery. (Ryan Ernst)
* LUCENE-5672: IndexWriter.addIndexes() calls maybeMerge(), to ensure the index stays
healthy. If you don't want merging use NoMergePolicy instead. (Robert Muir)
* LUCENE-5908: Fix Lucene43NGramTokenizer to be final
Test Framework
* LUCENE-5786: Unflushed/truncated events file (hung testing subprocess).
(Dawid Weiss)
* LUCENE-5881: Add "beasting" of tests: repeats the whole "test" Ant target
N times with "ant beast -Dbeast.iters=N". (Uwe Schindler, Robert Muir,
Ryan Ernst, Dawid Weiss)
Build
* LUCENE-5770: Upgrade to JFlex 1.6, which has direct support for
supplementary code points - as a result, ICU4J is no longer used
to generate surrogate pairs to augment JFlex scanner specifications.
(Steve Rowe)
* SOLR-6358: Remove VcsDirectoryMappings from idea configuration
vcs.xml (Ramkumar Aiyengar via Steve Rowe)
======================= Lucene 4.9.1 ======================
Bug fixes
* LUCENE-5907: Fix corruption case when opening a pre-4.x index with
IndexWriter, then opening an NRT reader from that writer, then
calling commit from the writer, then closing the NRT reader. This
case would remove the wrong files from the index leading to a
corrupt index. (Mike McCandless)
* LUCENE-5919: Fix exception handling inside IndexWriter when
deleteFile throws an exception, to not over-decRef index files,
possibly deleting a file that's still in use in the index, leading
to corruption. (Mike McCandless)
* LUCENE-5922: DocValuesDocIdSet on 5.x and FieldCacheDocIdSet on 4.x
are not cacheable. (Adrien Grand)
* LUCENE-5843: Added IndexWriter.MAX_DOCS which is the maximum number
of documents allowed in a single index, and any operations that add
documents will now throw IllegalStateException if the max count
would be exceeded, instead of silently creating an unusable
index. (Mike McCandless)
* LUCENE-5844: ArrayUtil.grow/oversize now returns a maximum of
Integer.MAX_VALUE - 8 for the maximum array size. (Robert Muir,
Mike McCandless)
* LUCENE-5827: Make all Directory implementations correctly fail with
IllegalArgumentException if slices are out of bounds. (Uwe Schindler)
* LUCENE-5897, LUCENE-5400: JFlex-based tokenizers StandardTokenizer and
UAX29URLEmailTokenizer tokenize extremely slowly over long sequences of
text partially matching certain grammar rules. The scanner default
buffer size was reduced, and scanner buffer growth was disabled, resulting
in much, much faster tokenization for these text sequences.
(Chris Geeringh, Robert Muir, Steve Rowe)
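Note on LUCENE-5844: a growth helper that never asks for more than Integer.MAX_VALUE - 8 elements might look like the sketch below. The growth factor of one eighth is an assumption made for illustration, CappedGrowth is a hypothetical name, and this is not the ArrayUtil implementation. The margin of 8 follows the common explanation that some JVMs reserve a few header words per array.

```java
import java.util.Arrays;

final class CappedGrowth {
    static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    /** Returns a size >= minTargetSize with some headroom, never above MAX_ARRAY_LENGTH. */
    static int oversize(int minTargetSize) {
        if (minTargetSize < 0 || minTargetSize > MAX_ARRAY_LENGTH) {
            throw new IllegalArgumentException("invalid array size " + minTargetSize);
        }
        // Do the arithmetic in long so the headroom cannot overflow int.
        long extra = Math.max(3, minTargetSize >> 3);
        return (int) Math.min((long) minTargetSize + extra, MAX_ARRAY_LENGTH);
    }

    /** Grows the array only when it is too small for minSize elements. */
    static int[] grow(int[] array, int minSize) {
        return array.length >= minSize ? array : Arrays.copyOf(array, oversize(minSize));
    }

    public static void main(String[] args) {
        System.out.println(oversize(80));                                   // 90
        System.out.println(oversize(MAX_ARRAY_LENGTH) == MAX_ARRAY_LENGTH); // true
    }
}
```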
======================= Lucene 4.9.0 =======================
Changes in Runtime Behavior
* LUCENE-5611: Changing the term vector options for multiple field
instances by the same name in one document is no longer accepted;
IndexWriter will now throw IllegalArgumentException. (Robert Muir,
Mike McCandless)
* LUCENE-5646: Remove rare/undertested bulk merge algorithm in
CompressingStoredFieldsWriter. (Robert Muir, Adrien Grand)
New Features
* LUCENE-5610: Add Terms.getMin and Terms.getMax to get the lowest and
highest terms, and NumericUtils.get{Min/Max}{Int/Long} to get the
minimum and maximum numeric values from the provided Terms. (Robert Muir, Mike
McCandless)
* LUCENE-5675: Add IDVersionPostingsFormat, a postings format
optimized for primary-key (ID) fields that also record a version
(long) for each ID. (Robert Muir, Mike McCandless)
* LUCENE-5680: Add ability to atomically update a set of DocValues
fields. (Shai Erera)
* LUCENE-5717: Add support for multiterm queries nested inside
filtered and constant-score queries to postings highlighter.
(Luca Cavanna via Robert Muir)
* LUCENE-5731, LUCENE-5760: Add RandomAccessInput, a random access API for directory.
Add DirectReader/Writer, optimized for reading packed integers directly
from Directory. Add Lucene49Codec and Lucene49DocValuesFormat that make
use of these. (Robert Muir)
* LUCENE-5743: Add Lucene49NormsFormat, which can compress in some cases
such as very short fields. (Ryan Ernst, Adrien Grand, Robert Muir)
* LUCENE-5748: Add SORTED_NUMERIC docvalues type, which is efficient
for processing numeric fields with multiple values. (Robert Muir)
* LUCENE-5754: Allow "$" as part of variable and function names in
expressions module. (Uwe Schindler)
Changes in Backwards Compatibility Policy
* LUCENE-5634: Add reuse argument to IndexableField.tokenStream. This
can be used by custom fieldtypes, which don't use the Analyzer, but
implement their own TokenStream. (Uwe Schindler, Robert Muir)
* LUCENE-5640: AttributeSource.AttributeFactory was moved to a
top-level class: org.apache.lucene.util.AttributeFactory
(Uwe Schindler, Robert Muir)
* LUCENE-4371: Removed IndexInputSlicer and Directory.createSlicer() and replaced
with IndexInput.slice(). (Robert Muir)
* LUCENE-5727, LUCENE-5678: Remove IndexOutput.seek, IndexOutput.setLength().
(Robert Muir, Uwe Schindler)
API Changes
* LUCENE-5756: IndexWriter now implements Accountable and IW#ramSizeInBytes()
has been deprecated in favor of IW#ramBytesUsed() (Simon Willnauer)
* LUCENE-5725: MoreLikeThis#like now accepts multiple values per field.
The pre-existing method has been deprecated in favor of a variable-arguments
method for the like text. (Alex Ksikes via Simon Willnauer)
* LUCENE-5711: MergePolicy accepts an IndexWriter instance
on each method rather than holding state against a single
IndexWriter instance. (Simon Willnauer)
* LUCENE-5582: Deprecate IndexOutput.length (just use
IndexOutput.getFilePointer instead) and IndexOutput.setLength.
(Mike McCandless)
* LUCENE-5621: Deprecate IndexOutput.flush: this is not used by Lucene.
(Robert Muir)
* LUCENE-5611: Simplified Lucene's default indexing chain / APIs.
AttributeSource/TokenStream.getAttribute now returns null if the
attribute is not present (previously it threw
IllegalArgumentException). StoredFieldsWriter.startDocument no
longer receives the number of fields that will be added (Robert
Muir, Mike McCandless)
* LUCENE-5632: In preparation for coming Lucene versions, the Version
enum constants were renamed to make them more readable. The constant
for Lucene 4.9 is now "LUCENE_4_9". Version.parseLeniently() is still
able to parse the old strings ("LUCENE_49"). The old identifiers got
deprecated and will be removed in Lucene 5.0. (Uwe Schindler,
Robert Muir)
* LUCENE-5633: Change NoMergePolicy to a singleton with no distinction between
compound and non-compound types. (Shai Erera)
* LUCENE-5640: The Token class was deprecated. Since Lucene 2.9, TokenStreams
are using Attributes, Token is no longer used. (Uwe Schindler, Robert Muir)
* LUCENE-5679: Consolidated IndexWriter.deleteDocuments(Term) and
IndexWriter.deleteDocuments(Query) with their varargs counterparts.
(Shai Erera)
* LUCENE-5701: Core closed listeners are now available in the AtomicReader API,
they used to sit only in SegmentReader. (Adrien Grand, Robert Muir)
* LUCENE-5706: Removed the option to unset a DocValues field through DocValues
updates. (Shai Erera)
* LUCENE-5700: Added oal.util.Accountable that is now implemented by all
classes whose memory usage can be estimated. (Robert Muir, Adrien Grand)
* LUCENE-5708: Remove IndexWriterConfig.clone, so now IndexWriter
simply uses the IndexWriterConfig you pass it, and you must create a
new IndexWriterConfig for each IndexWriter. (Mike McCandless)
* LUCENE-5678: IndexOutput no longer allows seeking, so it is no longer required
to use RandomAccessFile to write Indexes. Lucene now uses standard FileOutputStream
wrapped with OutputStreamIndexOutput to write index data. BufferedIndexOutput was
removed, because buffering and checksumming is provided by FilterOutputStreams,
provided by the JDK. (Uwe Schindler, Mike McCandless)
* LUCENE-5703: BinaryDocValues API changed to work like TermsEnum and not allocate/
copy bytes on each access, you are responsible for cloning if you want to keep
data around. (Adrien Grand)
* LUCENE-5695: DocIdSet implements Accountable. (Adrien Grand)
* LUCENE-5757: Moved RamUsageEstimator's reflection-based processing to RamUsageTester
in the test-framework module. (Robert Muir)
* LUCENE-5761: Removed DiskDocValuesFormat, it was very inefficient and saved very little
RAM over the default codec. (Robert Muir)
* LUCENE-5775: Deprecate JaspellLookup. (Mike McCandless)
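Note on LUCENE-5632: lenient parsing that accepts both the old ("LUCENE_49") and the new ("LUCENE_4_9") spelling can be sketched as follows. LenientVersion is a hypothetical class written for this note. It does not reproduce Version.parseLeniently(), and the rule that a bare two-digit suffix means major.minor is an assumption.

```java
import java.util.Locale;

final class LenientVersion {
    final int major;
    final int minor;
    final int bugfix;

    private LenientVersion(int major, int minor, int bugfix) {
        this.major = major;
        this.minor = minor;
        this.bugfix = bugfix;
    }

    /** Accepts forms such as "LUCENE_49", "LUCENE_4_9", "4.9" and "4.10.3". */
    static LenientVersion parse(String text) {
        String s = text.trim().toUpperCase(Locale.ROOT);
        if (s.startsWith("LUCENE_")) {
            s = s.substring("LUCENE_".length());
        }
        String[] parts;
        if (s.matches("\\d\\d")) {
            // Old style: one digit each for major and minor.
            parts = new String[] {s.substring(0, 1), s.substring(1)};
        } else if (s.matches("\\d+([._]\\d+){1,2}")) {
            parts = s.split("[._]");
        } else {
            throw new IllegalArgumentException("cannot parse version: " + text);
        }
        try {
            int bugfix = parts.length == 3 ? Integer.parseInt(parts[2]) : 0;
            return new LenientVersion(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), bugfix);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("cannot parse version: " + text, e);
        }
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + bugfix;
    }

    public static void main(String[] args) {
        System.out.println(parse("LUCENE_49"));  // 4.9.0
        System.out.println(parse("LUCENE_4_9")); // 4.9.0
    }
}
```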
Optimizations
* LUCENE-5603: hunspell stemmer more efficiently strips prefixes
and suffixes. (Robert Muir)
* LUCENE-5599: HttpReplicator did not properly delegate bulk read() to wrapped
InputStream. (Christoph Kaser via Shai Erera)
* LUCENE-5591: pass an IOContext with estimated flush size when applying DV
updates. (Shai Erera)
* LUCENE-5634: IndexWriter reuses TokenStream instances for String and Numeric
fields by default. (Uwe Schindler, Shay Banon, Mike McCandless, Robert Muir)
* LUCENE-5638, LUCENE-5640: TokenStream uses a more performant AttributeFactory
by default, that packs the core attributes into one implementation
(PackedTokenAttributeImpl), for faster clearAttributes(), saveState(), and
restoreState(). In addition, AttributeFactory uses Java 7 MethodHandles for
instantiating Attribute implementations. (Uwe Schindler, Robert Muir)
* LUCENE-5609: Changed the default NumericField precisionStep from 4
to 8 (for int/float) and 16 (for long/double), for faster indexing
time and smaller indices. (Robert Muir, Uwe Schindler, Mike McCandless)
* LUCENE-5670: Add skip/FinalOutput to FST Outputs. (Christian
Ziech via Mike McCandless).
* LUCENE-4236: Optimize BooleanQuery's in-order scoring. This speeds up
some types of boolean queries. (Robert Muir)
* LUCENE-5694: Don't score() subscorers in DisjunctionSumScorer or
DisjunctionMaxScorer unless score() is called. (Robert Muir)
* LUCENE-5720: Optimize DirectPackedReader's decompression. (Robert Muir)
* LUCENE-5722: Optimize ByteBufferIndexInput#seek() by specializing
implementations. This improves random access as used by docvalues codecs
if used with MMapDirectory. (Robert Muir, Uwe Schindler)
* LUCENE-5730: FSDirectory.open returns MMapDirectory for 64-bit operating
systems, not just Linux and Windows. (Robert Muir)
* LUCENE-5703: BinaryDocValues producers don't allocate or copy bytes on
each access anymore. (Adrien Grand)
* LUCENE-5721: Monotonic compression doesn't use zig-zag encoding anymore.
(Robert Muir, Adrien Grand)
* LUCENE-5750: Speed up monotonic addressing for BINARY and SORTED_SET
docvalues. (Robert Muir)
* LUCENE-5751: Speed up MemoryDocValues. (Adrien Grand, Robert Muir)
* LUCENE-5767: OrdinalMap optimizations, that mostly help on low cardinalities.
(Martijn van Groningen, Adrien Grand)
* LUCENE-5769: SingletonSortedSetDocValues now supports random access ordinals.
(Robert Muir)
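Note on LUCENE-5609: the effect of a larger precisionStep on index size can be estimated with simple arithmetic, assuming each numeric value is indexed once per precision level, at shifts 0, step, 2*step, and so on below the value width. PrecisionStepMath is a hypothetical helper for that estimate, not a Lucene class.

```java
final class PrecisionStepMath {
    /** Number of terms indexed per value: one for every shift below valueBits. */
    static int termsPerValue(int valueBits, int precisionStep) {
        if (valueBits < 1 || precisionStep < 1) {
            throw new IllegalArgumentException("valueBits and precisionStep must be >= 1");
        }
        return (valueBits + precisionStep - 1) / precisionStep; // ceil(valueBits / precisionStep)
    }

    public static void main(String[] args) {
        System.out.println(termsPerValue(32, 4));  // 8 terms per int with the old default
        System.out.println(termsPerValue(32, 8));  // 4 terms per int with the new default
        System.out.println(termsPerValue(64, 4));  // 16 terms per long with the old default
        System.out.println(termsPerValue(64, 16)); // 4 terms per long with the new default
    }
}
```

Fewer terms per value means less to index and store; the trade-off is that a range query may have to visit more terms at search time.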
Bug fixes
* LUCENE-5738: Ensure NativeFSLock prevents opening the file channel for the
lock if the lock is already obtained by the JVM. Trying to obtain an already
obtained lock in the same JVM can unlock the file, which might allow other
processes to lock the file even without explicitly unlocking the FileLock. This behavior
is operating system dependent. (Simon Willnauer)
* LUCENE-5673: MMapDirectory: Work around a "bug" in the JDK that throws
a confusing OutOfMemoryError wrapped inside IOException if the FileChannel
mapping failed because of lack of virtual address space. The IOException is
rethrown with more useful information about the problem, omitting the
incorrect OutOfMemoryError. (Robert Muir, Uwe Schindler)
* LUCENE-5682: NPE in QueryRescorer when Scorer is null
(Joel Bernstein, Mike McCandless)
* LUCENE-5691: DocTermOrds lookupTerm(BytesRef) would return incorrect results
if the underlying TermsEnum supports ord() and the insertion point would
be at the end. (Robert Muir)
* LUCENE-5618, LUCENE-5636: SegmentReader referenced unneeded files following
doc-values updates. Now doc-values field updates are written in separate file
per field. (Shai Erera, Robert Muir)
* LUCENE-5684: Make best effort to detect invalid usage of Lucene,
when IndexReader is reopened after all files in its index were
removed and recreated by the application (the proper way to do
this is IndexWriter.deleteAll, or opening an IndexWriter with
OpenMode.CREATE) (Mike McCandless)
* LUCENE-5704: Fix compilation error with Java 8u20. (Uwe Schindler)
* LUCENE-5710: Include the inner exception as the cause and in the
exception message when an immense term is hit during indexing (Lee
Hinman via Mike McCandless)
* LUCENE-5724: CompoundFileWriter was failing to pass through the
IOContext in some cases, causing NRTCachingDirectory to cache
compound files when it shouldn't, then causing OOMEs. (Mike
McCandless)
* LUCENE-5747: Project-specific settings for the eclipse development
environment will prevent automatic code reformatting. (Shawn Heisey)
* LUCENE-5768, LUCENE-5777: Hunspell condition checks containing character classes
were buggy. (Clinton Gormley, Robert Muir)
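Note on LUCENE-5738: the java.nio.channels.FileLock documentation warns that on some systems closing any channel on a file releases all locks the JVM holds on it. One way to avoid opening a second channel is to keep a JVM-wide registry of held lock files and consult it first. The sketch below shows that pattern. JvmGuardedLock is a hypothetical class, not NativeFSLock, and it identifies files by normalized absolute path, so symbolic links to the same file are not detected.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class JvmGuardedLock implements AutoCloseable {
    // Lock files currently held somewhere in this JVM.
    private static final Set<Path> HELD = Collections.synchronizedSet(new HashSet<Path>());

    private final Path key;
    private final FileChannel channel;
    private final FileLock lock;

    private JvmGuardedLock(Path key, FileChannel channel, FileLock lock) {
        this.key = key;
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns the lock, or null if it is already held by this JVM or by another process. */
    static JvmGuardedLock tryObtain(Path file) {
        Path key = file.toAbsolutePath().normalize();
        if (!HELD.add(key)) {
            return null; // held in this JVM: do not even open a second channel
        }
        FileChannel channel = null;
        boolean success = false;
        try {
            channel = FileChannel.open(key, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return null; // another process holds the lock
            }
            success = true;
            return new JvmGuardedLock(key, channel, lock);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (!success) {
                HELD.remove(key);
                if (channel != null) {
                    try {
                        channel.close();
                    } catch (IOException ignored) {
                        // nothing useful to do; the lock was never obtained
                    }
                }
            }
        }
    }

    @Override
    public void close() {
        try {
            lock.release();
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            HELD.remove(key); // only now may this JVM try again
        }
    }
}
```

A second tryObtain on the same path returns null until the first holder calls close(), and it does so without touching the file.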
Test Framework
* LUCENE-5622: Fail tests if they print over the given limit of bytes to
System.out or System.err. (Robert Muir, Dawid Weiss)
* LUCENE-5619: Added backwards compatibility tests to ensure we can update existing
indexes with doc-values updates. (Shai Erera, Robert Muir)
Build
* LUCENE-5442: The Ant check-lib-versions target now runs Ivy resolution
transitively, then fails the build when it finds a version conflict: when a
transitive dependency's version is more recent than the direct dependency's
version specified in lucene/ivy-versions.properties. Exceptions are
specifiable in lucene/ivy-ignore-conflicts.properties.
(Steve Rowe)
* LUCENE-5715: Upgrade direct dependencies known to be older than transitive
dependencies: com.sun.jersey.version:1.8->1.9; com.sun.xml.bind:jaxb-impl:2.2.2->2.2.3-1;
commons-beanutils:commons-beanutils:1.7.0->1.8.3; commons-digester:commons-digester:2.0->2.1;
commons-io:commons-io:2.1->2.3; commons-logging:commons-logging:1.1.1->1.1.3;
io.netty:netty:3.6.2.Final->3.7.0.Final; javax.activation:activation:1.1->1.1.1;
javax.mail:mail:1.4.1->1.4.3; log4j:log4j:1.2.16->1.2.17; org.apache.avro:avro:1.7.4->1.7.5;
org.tukaani:xz:1.2->1.4; org.xerial.snappy:snappy-java:1.0.4.1->1.0.5 (Steve Rowe)
======================= Lucene 4.8.1 =======================
Bug fixes
* LUCENE-5639: Fix PositionLengthAttribute implementation in Token class.
(Uwe Schindler, Robert Muir)
* LUCENE-5635: IndexWriter didn't properly handle IOException on TokenStream.reset(),
which could leave the analyzer in an inconsistent state. (Robert Muir)
* LUCENE-5599: HttpReplicator did not properly delegate bulk read() to wrapped
InputStream. (Christoph Kaser via Shai Erera)
* LUCENE-5600: HttpClientBase did not properly consume a connection if a server
error occurred. (Christoph Kaser via Shai Erera)
* LUCENE-5628: Change getFiniteStrings to iterative not recursive
implementation, so that building suggesters on a long suggestion
doesn't risk overflowing the stack; previously it consumed one Java
stack frame per character in the expanded suggestion. If you are building
a suggester this is a nasty trap. (Robert Muir, Simon Willnauer,
Mike McCandless).
* LUCENE-5559: Add additional argument validation for CapitalizationFilter
and CodepointCountFilter. (Ahmet Arslan via Robert Muir)
* LUCENE-5641: SimpleRateLimiter would silently rate limit at 8 MB/sec
even if you asked for higher rates. (Mike McCandless)
* LUCENE-5644: IndexWriter clears which threads use which internal
thread states on flush, so that if an application reduces how many
threads it uses for indexing, that results in a reduction of how
many segments are flushed on a full-flush (e.g. to obtain a
near-real-time reader). (Simon Willnauer, Mike McCandless)
* LUCENE-5653: JoinUtil with ScoreMode.Avg on a multi-valued field
with more than 256 values would throw exception.
(Mikhail Khludnev via Robert Muir)
* LUCENE-5654: Fix various close() methods that could suppress
throwables such as OutOfMemoryError, instead returning scary messages
that look like index corruption. (Mike McCandless, Robert Muir)
* LUCENE-5656: Fix rare fd leak in SegmentReader when multiple docvalues
fields have been updated with IndexWriter.updateXXXDocValue and one
hits exception. (Shai Erera, Robert Muir)
* LUCENE-5660: AnalyzingSuggester.build will now throw IllegalArgumentException if
you give it a longer suggestion than it can handle (Robert Muir, Mike McCandless)
* LUCENE-5662: Add missing checks to Field to prevent IndexWriter.abort
if a stored value is null. (Robert Muir)
* LUCENE-5668: Fix off-by-one in TieredMergePolicy (Mike McCandless)
* LUCENE-5671: Upgrade ICU version to fix an ICU concurrency problem that
could cause exceptions when indexing. (feedly team, Robert Muir)
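Note on LUCENE-5628: the recursive-to-iterative change can be illustrated on a small acyclic automaton. The sketch below walks it depth-first with an explicit stack on the heap, so the depth of the walk is limited by memory rather than by the Java call stack. IterativeFiniteStrings and its representation (one sorted transition map per state, state 0 as the start state) are assumptions for this note, not the suggester code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class IterativeFiniteStrings {
    /** Creates the given number of states, each with an empty transition map. */
    static List<SortedMap<Character, Integer>> newStates(int count) {
        List<SortedMap<Character, Integer>> states = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            states.add(new TreeMap<Character, Integer>());
        }
        return states;
    }

    /** Lists every accepted string of an acyclic automaton without recursion. */
    static List<String> enumerate(List<SortedMap<Character, Integer>> transitions,
                                  boolean[] accept) {
        List<String> out = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        // One iterator per state on the current path; path.length() == stack.size() - 1.
        Deque<Iterator<Map.Entry<Character, Integer>>> stack = new ArrayDeque<>();
        if (accept[0]) {
            out.add("");
        }
        stack.push(transitions.get(0).entrySet().iterator());
        while (!stack.isEmpty()) {
            Iterator<Map.Entry<Character, Integer>> edges = stack.peek();
            if (!edges.hasNext()) {
                // Backtrack: drop the state and the character that led to it.
                stack.pop();
                if (path.length() > 0) {
                    path.setLength(path.length() - 1);
                }
                continue;
            }
            Map.Entry<Character, Integer> edge = edges.next();
            path.append(edge.getKey().charValue());
            int target = edge.getValue();
            if (accept[target]) {
                out.add(path.toString());
            }
            stack.push(transitions.get(target).entrySet().iterator());
        }
        return out;
    }
}
```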
======================= Lucene 4.8.0 =======================
System Requirements
* LUCENE-4747, LUCENE-5514: Move to Java 7 as minimum Java version.
(Robert Muir, Uwe Schindler)
Changes in Runtime Behavior
* LUCENE-5472: IndexWriter.addDocument will now throw an IllegalArgumentException
if a Term to be indexed exceeds IndexWriter.MAX_TERM_LENGTH. To recreate previous
behavior of silently ignoring these terms, use LengthFilter in your Analyzer.
(hossman, Mike McCandless, Varun Thacker)
New Features
* LUCENE-5356: Morfologik filter can accept custom dictionary resources.
(Michal Hlavac, Dawid Weiss)
* LUCENE-5454: Add SortedSetSortField to lucene/sandbox, to allow sorting
on multi-valued field. (Robert Muir)
* LUCENE-5478: CommonTermsQuery now allows creating custom term queries
similar to the query parser by overriding a newTermQuery method.
(Simon Willnauer)
* LUCENE-5477: AnalyzingInfixSuggester now supports near-real-time
additions and updates (to change weight or payload of an existing
suggestion). (Mike McCandless)
* LUCENE-5482: Improve default TurkishAnalyzer by adding apostrophe
handling suitable for Turkish. (Ahmet Arslan via Robert Muir)
* LUCENE-5479: FacetsConfig subclass can now customize the default
per-dim facets configuration. (Rob Audenaerde via Mike McCandless)
* LUCENE-5485: Add circumfix support to HunspellStemFilter. (Robert Muir)
* LUCENE-5224: Add iconv, oconv, and ignore support to HunspellStemFilter.
(Robert Muir)
* LUCENE-5493: SortingMergePolicy, and EarlyTerminatingSortingCollector
support arbitrary Sort specifications.
(Robert Muir, Mike McCandless, Adrien Grand)
* LUCENE-3758: Allow the ComplexPhraseQueryParser to search ordered or
unordered proximity queries. (Ahmet Arslan via Erick Erickson)
* LUCENE-5530: ComplexPhraseQueryParser throws ParseException for fielded queries.
(Erick Erickson via Tomas Fernandez Lobbe and Ahmet Arslan)
* LUCENE-5513: Add IndexWriter.updateBinaryDocValue which lets
you update the value of a BinaryDocValuesField without reindexing the
document(s). (Shai Erera)
* LUCENE-4072: Add ICUNormalizer2CharFilter, which lets you do unicode normalization
with offset correction before the tokenizer. (David Goldfarb, Ippei UKAI via Robert Muir)
* LUCENE-5476: Add RandomSamplingFacetsCollector for computing facets on a sampled
set of matching hits, in cases where there are millions of hits.
(Rob Audenaerde, Gilad Barkai, Shai Erera)
* LUCENE-4984: Add SegmentingTokenizerBase, abstract class for tokenizers
that want to do two-pass tokenization such as by sentence and then by word.
(Robert Muir)
* LUCENE-5489: Add Rescorer/QueryRescorer, to resort the hits from a
first pass search using scores from a more costly second pass
search. (Simon Willnauer, Robert Muir, Mike McCandless)
* LUCENE-5528: Add context to suggesters (InputIterator and Lookup
classes), and fix AnalyzingInfixSuggester to handle contexts.
Suggester contexts allow you to filter suggestions. (Areek Zillur,
Mike McCandless)
* LUCENE-5545: Add SortRescorer and Expression.getRescorer, to
resort the hits from a first pass search using a Sort or an
Expression. (Simon Willnauer, Robert Muir, Mike McCandless)
* LUCENE-5558: Add TruncateTokenFilter which truncates terms to
the specified length. (Ahmet Arslan via Robert Muir)
* LUCENE-2446: Added checksums to lucene index files. As of 4.8, the last 8
bytes of each file contain a zlib-crc32 checksum. Small metadata files are
verified on load. Larger files can be checked on demand via
AtomicReader.checkIntegrity. You can configure this to happen automatically
before merges by enabling IndexWriterConfig.setCheckIntegrityAtMerge.
(Robert Muir)
* LUCENE-5580: Checksums are automatically verified on the default stored
fields format when performing a bulk merge. (Adrien Grand)
* LUCENE-5602: Checksums are automatically verified on the default term
vectors format when performing a bulk merge. (Adrien Grand, Robert Muir)
* LUCENE-5583: Added DataInput.skipBytes. ChecksumIndexInput can now seek, but
only forward. (Adrien Grand, Mike McCandless, Simon Willnauer, Uwe Schindler)
* LUCENE-5588: Lucene now calls fsync() on the index directory, ensuring
that all file metadata is persisted on disk in case of power failure.
This does not work on all file systems and operating systems, but Linux
and MacOSX are known to work. On Windows, fsyncing a directory is not
possible with Java APIs. (Mike McCandless, Uwe Schindler)
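Note on LUCENE-2446: the principle of a trailing 8-byte CRC32 checksum can be shown with java.util.zip.CRC32. The sketch below is a simplified illustration only. ChecksumFooter is a hypothetical class, and the byte layout (payload followed by the checksum as one big-endian long) is an assumption that does not reproduce the actual index file format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class ChecksumFooter {
    /** Returns payload followed by its CRC32, stored as an 8-byte big-endian long. */
    static byte[] withFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 8)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** Recomputes the checksum over everything but the last 8 bytes and compares. */
    static boolean verify(byte[] file) {
        if (file.length < 8) {
            return false; // too short to even hold a footer
        }
        int bodyLength = file.length - 8;
        CRC32 crc = new CRC32();
        crc.update(file, 0, bodyLength);
        long stored = ByteBuffer.wrap(file, bodyLength, 8).getLong();
        return stored == crc.getValue();
    }

    public static void main(String[] args) {
        byte[] file = withFooter("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(file)); // true
        file[0] ^= 1;                     // flip one bit in the payload
        System.out.println(verify(file)); // false
    }
}
```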
API Changes
* LUCENE-5454: Add RandomAccessOrds, an optional extension of SortedSetDocValues
that supports random access to the ordinals in a document. (Robert Muir)
* LUCENE-5468: Move offline Sort (from suggest module) to OfflineSort. (Robert Muir)
* LUCENE-5493: SortingMergePolicy and EarlyTerminatingSortingCollector take
Sort instead of Sorter. BlockJoinSorter is removed, replaced with
BlockJoinComparatorSource, which can take a Sort for ordering of parents
and a separate Sort for ordering of children within a block.
(Robert Muir, Mike McCandless, Adrien Grand)
* LUCENE-5516: MergeScheduler#merge() now accepts a MergeTrigger as well as
a boolean that indicates if a new merge was found in the caller thread before
the scheduler was called. (Simon Willnauer)
* LUCENE-5487: Separated bulk scorer (new Weight.bulkScorer method) from
normal scoring (Weight.scorer) for those queries that can do bulk
scoring more efficiently, e.g. BooleanQuery in some cases. This
also simplified the Weight.scorer API by removing the two confusing
booleans. (Robert Muir, Uwe Schindler, Mike McCandless)
* LUCENE-5519: TopNSearcher now allows retrieving incomplete results if the max
size of the candidate queue is unknown. The queue can still be bounded in order
to apply pruning while retrieving the top N but will not throw an exception if
too many results are rejected to guarantee an absolutely correct top N result.
The TopNSearcher now returns a struct-like class that indicates if the result
is complete in the sense of the top N or not. Consumers of this API should assert
on the completeness if the bounded queue size is known ahead of time. (Simon Willnauer)
* LUCENE-4984: Deprecate ThaiWordFilter and smartcn SentenceTokenizer and WordTokenFilter.
These filters would not work correctly with CharFilters and could not be safely placed
at an arbitrary position in the analysis chain. Use ThaiTokenizer and HMMChineseTokenizer
instead. (Robert Muir)
* LUCENE-5543: Remove/deprecate Directory.fileExists (Mike McCandless)
* LUCENE-5573: Move docvalues constants and helper methods to o.a.l.index.DocValues.
(Dawid Weiss, Robert Muir)
* LUCENE-5604: Switched BytesRef.hashCode to MurmurHash3 (32 bit).
TermToBytesRefAttribute.fillBytesRef no longer returns the hash
code. BytesRefHash now uses MurmurHash3 for its hashing. (Robert
Muir, Mike McCandless)
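For readers unfamiliar with the hash named in LUCENE-5604, the following is a minimal sketch of the public MurmurHash3 (x86, 32-bit) algorithm over a byte range. The class name Murmur3Sketch and the method name hash32 are hypothetical and chosen for illustration only; this is not Lucene's source, and the seed Lucene uses is not stated in this entry.

```java
/** Illustrative MurmurHash3 (x86, 32-bit); a sketch, not Lucene's implementation. */
final class Murmur3Sketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    private Murmur3Sketch() {}

    /** Hashes data[offset, offset + len) with the given seed. */
    static int hash32(byte[] data, int offset, int len, int seed) {
        int h1 = seed;
        int roundedEnd = offset + (len & 0xfffffffc); // end of the last full 4-byte block

        // Body: mix each 4-byte block, read as a little-endian int.
        for (int i = offset; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | (data[i + 3] << 24);
            k1 *= C1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= C2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: mix the remaining 1 to 3 bytes, if any.
        int k1 = 0;
        switch (len & 0x03) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= C1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= C2;
                h1 ^= k1;
                break;
            default:
                break;
        }

        // Finalization: fold in the length and force avalanche of all bits.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }
}
```

A caller would pass the backing array, offset and length of the bytes to hash, for example hash32(bytes, 0, bytes.length, 0) with an arbitrary seed of 0.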
Optimizations
* LUCENE-5468: HunspellStemFilter uses 10 to 100x less RAM. It also loads
all known openoffice dictionaries without error, and supports an additional
longestOnly option for a less aggressive approach. (Robert Muir)
* LUCENE-4848: Use Java 7 NIO2-FileChannel instead of RandomAccessFile
for NIOFSDirectory and MMapDirectory. This allows deleting open files
on Windows if NIOFSDirectory is used; mmapped files are still locked.
(Michael Poindexter, Robert Muir, Uwe Schindler)
* LUCENE-5515: Improved TopDocs#merge to create a merged ScoreDoc
array whose length is at most the specified size, instead of at most
from + size as before. (Martijn van Groningen)
* LUCENE-5529: Spatial search of non-point indexed shapes should be a little
faster due to skipping intersection tests on redundant cells. (David Smiley)
Bug fixes
* LUCENE-5483: Fix inaccuracies in HunspellStemFilter. Multi-stage affix-stripping,
prefix-suffix dependencies, and COMPLEXPREFIXES now work correctly according
to the hunspell algorithm. Removed the recursionCap parameter, as it's no longer needed: rules for
recursive affix application are driven correctly by continuation classes in the affix file.
(Robert Muir)
* LUCENE-5497: HunspellStemFilter properly handles escaped terms and affixes without conditions.
(Robert Muir)
* LUCENE-5505: HunspellStemFilter ignores BOM markers in dictionaries and handles varying
types of whitespace in SET/FLAG commands. (Robert Muir)
* LUCENE-5507: Fix HunspellStemFilter loading of dictionaries with large amounts of aliases
etc before the encoding declaration. (Robert Muir)
* LUCENE-5111: Fix WordDelimiterFilter to return offsets in correct order. (Robert Muir)
* LUCENE-5555: Fix SortedInputIterator to correctly encode/decode contexts in presence of payload (Areek Zillur)
* LUCENE-5559: Add missing argument checks to tokenfilters taking
numeric arguments. (Ahmet Arslan via Robert Muir)
* LUCENE-5568: Benchmark module's "default.codec" option didn't work. (David Smiley)
* SOLR-5983: HTMLStripCharFilter is treating CDATA sections incorrectly.
(Dan Funk, Steve Rowe)
* LUCENE-5615: Validate per-segment delete counts at write time, to
help catch bugs that might otherwise cause corruption (Mike McCandless)
* LUCENE-5612: NativeFSLockFactory no longer deletes its lock file. This cannot be done
safely without the risk of deleting someone else's lock file. If you use NativeFSLockFactory,
you may see write.lock hanging around from time to time: it's harmless.
(Uwe Schindler, Mike McCandless, Robert Muir)
* LUCENE-5624: Ensure NativeFSLockFactory does not leak file handles if it is unable
to obtain the lock. (Uwe Schindler, Robert Muir)
* LUCENE-5626: Fix bug in SimpleFSLockFactory's obtain() that sometimes threw
IOException (ERROR_ACCESS_DENIED) on Windows if the lock file was created
concurrently. This error is now handled the same way as in NativeFSLockFactory,
by returning false. (Uwe Schindler, Robert Muir, Dawid Weiss)
* LUCENE-5630: Add missing META-INF entry for UpperCaseFilterFactory.
(Robert Muir)
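The behavior described in LUCENE-5626 (report a failed obtain by returning false rather than surfacing an access-denied error) can be sketched with plain java.nio. This is only an illustration under assumptions: SimpleLockSketch, obtain and release are hypothetical names, and the code is not SimpleFSLockFactory's implementation.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative lock-file helper; a sketch, not Lucene's SimpleFSLockFactory. */
final class SimpleLockSketch {
    private SimpleLockSketch() {}

    /**
     * Tries to take the lock by atomically creating the lock file.
     * Returns false if another process already holds it.
     */
    static boolean obtain(Path lockFile) throws IOException {
        try {
            Files.createFile(lockFile); // fails if the file already exists
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // somebody else holds the lock
        } catch (AccessDeniedException e) {
            // Assumption for this sketch: a concurrent create on Windows may
            // surface as "access denied"; treat it as "lock not obtained".
            // Note that this also hides genuine permission problems.
            return false;
        }
    }

    /** Releases the lock by deleting the lock file, if present. */
    static void release(Path lockFile) throws IOException {
        Files.deleteIfExists(lockFile);
    }
}
```

Callers typically retry obtain() with a timeout instead of treating a single false as fatal.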
Tests
* LUCENE-5630: Fix TestAllAnalyzersHaveFactories to correctly check for existence
of class and corresponding Map<String,String> ctor. (Uwe Schindler, Robert Muir)
Test Framework
* LUCENE-5592: Incorrectly reported uncloseable files. (Dawid Weiss)
* LUCENE-5577: Temporary folder and file management (and cleanup facilities)
(Mark Miller, Uwe Schindler, Dawid Weiss)
* LUCENE-5567: When a suite fails with zombie threads, the failure marker and count
are not propagated properly. (Dawid Weiss)
* LUCENE-5449: Rename _TestUtil and _TestHelper to remove the leading _.
* LUCENE-5501: Added random out-of-order collection testing (when the collector
supports it) to AssertingIndexSearcher. (Adrien Grand)
Build
* LUCENE-5463: RamUsageEstimator.(human)sizeOf(Object) is now a forbidden API.
(Adrien Grand, Robert Muir)
* LUCENE-5512: Remove redundant typing (use diamond operator) throughout
the codebase. (Furkan KAMACI via Robert Muir)
* LUCENE-5614: Enable building on Java 8 using Apache Ant 1.8.3 or 1.8.4
by adding a workaround for the Ant bug. (Uwe Schindler)
* LUCENE-5612: Add a new Ant target in lucene/core to test LockFactory
implementations: "ant test-lock-factory". (Uwe Schindler, Mike McCandless,
Robert Muir)
Documentation
* LUCENE-5534: Add javadocs to GreekStemmer methods.
(Stamatis Pitsios via Robert Muir)
======================= Lucene 4.7.2 =======================
Bug Fixes
* LUCENE-5574: Closing a near-real-time reader no longer attempts to
delete unreferenced files if the original writer has been closed;
this could cause index corruption in certain cases where index files
were directly changed (deleted, overwritten, etc.) in the index
directory outside of Lucene. (Simon Willnauer, Shai Erera, Robert
Muir, Mike McCandless)
* LUCENE-5570: Don't let FSDirectory.sync() create new zero-byte files, instead throw
exception if a file is missing. (Uwe Schindler, Mike McCandless, Robert Muir)
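The idea behind LUCENE-5570 (syncing must never create a file as a side effect) can be illustrated with java.nio. The sketch below is an assumption-laden illustration, not FSDirectory's code: FsyncSketch and fsync are hypothetical names. Opening the channel with WRITE but without CREATE makes a missing file fail with NoSuchFileException instead of silently producing a zero-byte file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative fsync helper; a sketch, not Lucene's FSDirectory.sync(). */
final class FsyncSketch {
    private FsyncSketch() {}

    /**
     * Forces the file's content and metadata to stable storage.
     * Throws java.nio.file.NoSuchFileException if the file does not exist,
     * because the channel is opened without StandardOpenOption.CREATE.
     */
    static void fsync(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.force(true);
        }
    }
}
```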
======================= Lucene 4.7.1 =======================
Changes in Runtime Behavior
* LUCENE-5532: AutomatonQuery.equals is no longer implemented as "accepts same language".
This was inconsistent with hashCode, and unnecessary for any subclasses in Lucene.
If you desire this in a custom subclass, minimize the automaton. (Robert Muir)
Bug Fixes
* LUCENE-5450: Fix getField() NPE issues with SpanOr/SpanNear when they have an
empty list of clauses. This can happen, for example, when a wildcard matches
no terms. (Tim Allison via Robert Muir)
* LUCENE-5473: Throw IllegalArgumentException, not
NullPointerException, if the synonym map is empty when creating
SynonymFilter (帅广应 via Mike McCandless)
* LUCENE-5432: EliasFanoDocIdSet: Fix number of index entry bits when the maximum
entry is a power of 2. (Paul Elschot via Adrien Grand)
* LUCENE-5466: query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier.
(Koji Sekiguchi)
* LUCENE-5502: Fixed TermsFilter.equals that could return true for different
filters. (Igor Motov via Adrien Grand)
* LUCENE-5522: FacetsConfig didn't add drill-down terms for association facet
fields labels. (Shai Erera)
* LUCENE-5520: ToChildBlockJoinQuery would hit
ArrayIndexOutOfBoundsException if a parent document had no children
(Sally Ang via Mike McCandless)
* LUCENE-5532: AutomatonQuery.hashCode was not thread-safe. (Robert Muir)
* LUCENE-5525: Implement MultiFacets.getAllDims, so you can do sparse
facets through DrillSideways, for example. (Jose Peleteiro, Mike
McCandless)
* LUCENE-5481: IndexWriter.forceMerge used to run a merge even if there was a
single segment in the index. (Adrien Grand, Mike McCandless)
* LUCENE-5538: Fix FastVectorHighlighter bug with index-time synonyms when the
query is more complex than a single phrase. (Robert Muir)
* LUCENE-5544: Exceptions during IndexWriter.rollback could leak file handles
and the write lock. (Robert Muir)
* LUCENE-4978: Spatial RecursivePrefixTree queries could result in false-negatives for
indexed shapes within 1/2 maxDistErr from the edge of the query shape. This meant
searching for a point by the same point as a query rarely worked. (David Smiley)
* LUCENE-5553: IndexReader#ReaderClosedListener is not always invoked when
IndexReader#close() is called or if refCount is 0. If an exception is
thrown during internal close or on any of the close listeners some or all
listeners might be missed. This can cause memory leaks if the core listeners
are used to clear caches. (Simon Willnauer)
Build
* LUCENE-5511: "ant precommit" / "ant check-svn-working-copy" now work again
with any working copy format (thanks to svnkit 1.8.4). (Uwe Schindler)
======================= Lucene 4.7.0 =======================
New Features
* LUCENE-5336: Add SimpleQueryParser: parser for human-entered queries.
(Jack Conradson via Robert Muir)
* LUCENE-5337: Add Payload support to FileDictionary (Suggest) and make it more
configurable (Areek Zillur via Erick Erickson)
* LUCENE-5329: suggest: DocumentDictionary and
DocumentExpressionDictionary are now lenient for dirty documents
(missing the term, weight or payload). (Areek Zillur via
Mike McCandless)
* LUCENE-5404: Add .getCount method to all suggesters (Lookup); persist count
metadata on .store(); Dictionary returns InputIterator; Dictionary.getWordIterator
renamed to .getEntryIterator. (Areek Zillur)
* SOLR-1871: The RangeMapFloatFunction accepts an arbitrary ValueSource
as target and default values. (Chris Harris, shalin)
* LUCENE-5371: Speed up Lucene range faceting from O(N) per hit to
O(log(N)) per hit using segment trees; this only really starts to
matter in practice if the number of ranges is over 10 or so. (Mike
McCandless)
* LUCENE-5379: Add Analyzer for Kurdish. (Robert Muir)
* LUCENE-5369: Added an UpperCaseFilter to make UPPERCASE tokens. (ryan)
* LUCENE-5345: Add a new BlendedInfixSuggester, which is like
AnalyzingInfixSuggester but boosts suggestions that matched tokens
with lower positions. (Remi Melisson via Mike McCandless)
* LUCENE-5399: When sorting by String (SortField.STRING), you can now
specify whether missing values should be sorted first (the default),
using SortField.setMissingValue(SortField.STRING_FIRST), or last,
using SortField.setMissingValue(SortField.STRING_LAST). (Rob Muir,
Mike McCandless)
* LUCENE-5099: QueryNode should have the ability to detach from its node
parent. Added QueryNode.removeFromParent() that allows nodes to be
detached from their parent node. (Adriano Crestani)
* LUCENE-5395 LUCENE-5451: Upgrade to Spatial4j 0.4.1: Parses WKT (including
ENVELOPE) with extension "BUFFER"; buffering a point results in a Circle.
JTS isn't needed for WKT any more but remains required for Polygons. New
Shapes: ShapeCollection and BufferedLineString. Various other improvements and
bug fixes too. More info:
https://github.com/spatial4j/spatial4j/blob/master/CHANGES.md (David Smiley)
* LUCENE-5415: Add multitermquery (wildcards,prefix,etc) to PostingsHighlighter.
(Mike McCandless, Robert Muir)
* LUCENE-3069: Add two memory resident dictionaries (FST terms dictionary and
FSTOrd terms dictionary) to improve primary key lookups. The PostingsBaseFormat
API is also changed so that term dictionaries get the ability to block
encode term metadata, and all dictionary implementations can now plug in any
PostingsBaseFormat. (Han Jiang, Mike McCandless)
* LUCENE-5353: ShingleFilter's filler token should be configurable.
(Ahmet Arslan, Simon Willnauer, Steve Rowe)
* LUCENE-5320: Add SearcherTaxonomyManager over search and taxonomy index
directories (i.e. not only NRT). (Shai Erera)
* LUCENE-5410: Add fuzzy and near support via '~' operator to SimpleQueryParser.
(Lee Hinman via Robert Muir)
* LUCENE-5426: Make SortedSetDocValuesReaderState abstract to allow
custom implementations for Lucene doc values faceting (John Wang via
Mike McCandless)
* LUCENE-5434: NRT support for file systems that do not have delete on last
close or cannot delete while referenced semantics.
(Mark Miller, Mike McCandless)
* LUCENE-5418: Drilling down or sideways on a Lucene facet range
(using Range.getFilter()) is now faster for costly filters (uses
random access, not iteration); range facet counts now accept a
fast-match filter to avoid computing the value for documents that
are out of bounds, e.g. using a bounding box filter with distance
range faceting. (Mike McCandless)
* LUCENE-5440: Add LongBitSet for managing more than 2.1B bits (otherwise use
FixedBitSet). (Shai Erera)
* LUCENE-5437: ASCIIFoldingFilter now has an option to preserve the original token
and emit it on the same position as the folded token only if the actual token was
folded. (Simon Willnauer, Nik Everett)
* LUCENE-5408: Add spatial SerializedDVStrategy that serializes a binary
representation of a shape into BinaryDocValues. It supports exact geometry
relationship calculations. (David Smiley)
* LUCENE-5457: Add SloppyMath.earthDiameter(double latitude) that returns an
approximate value of the diameter of the earth at the given latitude.
(Adrien Grand)
* LUCENE-5979: FilteredQuery uses the cost API to decide on whether to use
random-access or leap-frog to intersect the filter with the query.
(Adrien Grand)
Build
* LUCENE-5217,LUCENE-5420: Maven config: get dependencies from Ant+Ivy config;
disable transitive dependency resolution for all depended-on artifacts by
putting an exclusion for each transitive dependency in the
<dependencyManagement> section of the grandparent POM. (Steve Rowe)
* LUCENE-5322: Clean up / simplify Maven-related Ant targets.
(Steve Rowe)
* LUCENE-5347: Upgrade forbidden-apis checker to version 1.4.
(Uwe Schindler)
* LUCENE-4381: Upgrade analysis/icu to 52.1. (Robert Muir)
* LUCENE-5357: Upgrade StandardTokenizer and UAX29URLEmailTokenizer to
Unicode 6.3; update UAX29URLEmailTokenizer's recognized top level
domains in URLs and Emails from the IANA Root Zone Database.
(Steve Rowe)
* LUCENE-5360: Add support for developing in Netbeans IDE.
(Michal Hlavac, Uwe Schindler, Steve Rowe)
* SOLR-5590: Upgrade HttpClient/HttpComponents to 4.3.x.
(Karl Wright via Shawn Heisey)
* LUCENE-5385: "ant precommit" / "ant check-svn-working-copy" now work
for SVN 1.8 or GIT checkouts. The ANT target prints a warning instead
of failing. It also instructs the user how to run on SVN 1.8 working
copies. (Robert Muir, Uwe Schindler)
* LUCENE-5383: fix changes2html to link pull requests (Steve Rowe)
* LUCENE-5411: Upgrade to released JFlex 1.5.0; stop requiring
a locally built JFlex snapshot jar. (Steve Rowe)
* LUCENE-5465: Solr Contrib "map-reduce" breaks Manifest of all other
JAR files by adding a broken Main-Class attribute.
(Uwe Schindler, Steve Rowe)
Bug fixes
* LUCENE-5285: Improved highlighting of multi-valued fields with
FastVectorHighlighter. (Nik Everett via Adrien Grand)
* LUCENE-5391: UAX29URLEmailTokenizer should not tokenize no-scheme
domain-only URLs that are followed by an alphanumeric character.
(Chris Geeringh, Steve Rowe)
* LUCENE-5405: If an analysis component throws an exception, Lucene
logs the field name to the info stream to assist in
diagnosis. (Benson Margulies)
* SOLR-5661: PriorityQueue now refuses to allocate itself if the
incoming maxSize is too large (Raintung Li via Mike McCandless)
* LUCENE-5228: IndexWriter.addIndexes(Directory[]) now acquires a
write lock in each Directory, to ensure that no open IndexWriter is
changing the incoming indices. This also means that you cannot pass
the same Directory to multiple concurrent addIndexes calls (which is
unusual anyway). (Robert Muir, Mike McCandless)
* LUCENE-5415: SpanMultiTermQueryWrapper didn't handle its boost in
hashcode/equals/tostring/rewrite. (Robert Muir)
* LUCENE-5409: ToParentBlockJoinCollector.getTopGroups would fail to
return any groups when the joined query required more than one
rewrite step (Peng Cheng via Mike McCandless)
* LUCENE-5398: NormValueSource was incorrectly casting the long value
to byte, before calling Similarity.decodeNormValue. (Peng Cheng via
Mike McCandless)
* LUCENE-5436: ReferenceManager#acquire can result in an infinite loop if the
managed resource is abused outside of the ReferenceManager. Decrementing
the reference without a corresponding incRef() call can cause an infinite
loop. ReferenceManager now throws IllegalStateException if the currently managed
resource's ref count is 0. (Simon Willnauer)
* LUCENE-5443: Lucene45DocValuesProducer.ramBytesUsed() may throw
ConcurrentModificationException. (Shai Erera, Simon Willnauer)
* LUCENE-5444: MemoryIndex didn't respect the analyzer's offset gap and
offsets were corrupted if multiple fields with the same name were
added to the memory index. (Britta Weber, Simon Willnauer)
* LUCENE-5447: StandardTokenizer should break at consecutive chars matching
Word_Break = MidLetter, MidNum and/or MidNumLet (Steve Rowe)
* LUCENE-5462: RamUsageEstimator.sizeOf(Object) is not used anymore to
estimate memory usage of segments. This used to make
SegmentReader.ramBytesUsed very CPU-intensive. (Adrien Grand)
* LUCENE-5461: ControlledRealTimeReopenThread would sometimes wait too
long (up to targetMaxStaleSec) when a searcher is waiting for a
specific generation, when it should have waited for at most
targetMinStaleSec. (Hans Lund via Mike McCandless)
API Changes
* LUCENE-5339: The facet module was simplified/reworked to make the
APIs more approachable to new users. Note: when migrating to the new
API, you must pass the Document that is returned from FacetsConfig.build()
to IndexWriter.addDocument(). (Shai Erera, Gilad Barkai, Rob
Muir, Mike McCandless)
* LUCENE-5405: Make ShingleAnalyzerWrapper.getWrappedAnalyzer() public final (gsingers)
* LUCENE-5395: The SpatialArgsParser now only reads WKT, no more "lat, lon"
etc. but it's easy to override the parseShape method if you wish. (David
Smiley)
* LUCENE-5414: DocumentExpressionDictionary was renamed to
DocumentValueSourceDictionary and all dependencies to the lucene-expression
module were removed from lucene-suggest. DocumentValueSourceDictionary now
only accepts a ValueSource instead of a convenience ctor for an expression
string. (Simon Willnauer)
* LUCENE-3069: PostingsWriterBase and PostingsReaderBase are no longer
responsible for encoding/decoding a block of terms. Instead, they
should encode/decode each term to/from a long[] and byte[]. (Han
Jiang, Mike McCandless)
* LUCENE-5425: FacetsCollector and MatchingDocs use a general DocIdSet,
allowing for custom implementations to be used when faceting.
(John Wang, Lei Wang, Shai Erera)
Optimizations
* LUCENE-5372: Replace StringBuffer by StringBuilder, where possible.
(Joshua Hartman via Uwe Schindler, Dawid Weiss, Mike McCandless)
* LUCENE-5271: A slightly more accurate SloppyMath distance.
(Gilad Barkai via Ryan Ernst)
* LUCENE-5399: Deep paging using IndexSearcher.searchAfter when
sorting by fields is faster (Rob Muir, Mike McCandless)
Changes in Runtime Behavior
* LUCENE-5362: IndexReader and SegmentCoreReaders now throw
AlreadyClosedException if the refCount is incremented but
is less than 1. (Simon Willnauer)
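The rule in LUCENE-5362 (a reference count may only be incremented while it is still at least 1) can be sketched with an AtomicInteger. This is a minimal illustration under assumptions, not Lucene's IndexReader code: RefCountedSketch is a hypothetical class, and IllegalStateException stands in for AlreadyClosedException.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative reference counter; a sketch, not Lucene's IndexReader. */
final class RefCountedSketch {
    // A freshly created resource is owned by its creator, so it starts at 1.
    private final AtomicInteger refCount = new AtomicInteger(1);

    /** Increments the count only if the resource is still alive. */
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
            // Lost a race with another thread; re-read and retry.
        }
        return false; // count already dropped to 0: resource is closed
    }

    /** Like tryIncRef(), but fails loudly instead of reviving a closed resource. */
    void incRef() {
        if (!tryIncRef()) {
            // Stand-in for AlreadyClosedException in this sketch.
            throw new IllegalStateException("resource is already closed");
        }
    }

    /** Decrements the count; reaching 0 means the resource may be released. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("too many decRef calls: refCount is " + remaining);
        }
    }

    int getRefCount() {
        return refCount.get();
    }
}
```

The compare-and-set loop is what prevents a concurrent incRef() from bringing a count of 0 back to 1 after the resource has been released.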
Documentation
* LUCENE-5384: Add some tips for making tokenfilters and tokenizers
to the analysis package overview.
(Benson Margulies via Robert Muir - pull request #12)
* LUCENE-5389: Add more guidance in the analysis documentation
package overview.
(Benson Margulies via Robert Muir - pull request #14)
======================= Lucene 4.6.1 =======================
Bug fixes
* LUCENE-5373: Memory usage of
[Lucene40/Lucene42/Memory/Direct]DocValuesFormat was over-estimated.
(Shay Banon, Adrien Grand, Robert Muir)
* LUCENE-5361: Fixed handling of query boosts in FastVectorHighlighter.
(Nik Everett via Adrien Grand)
* LUCENE-5374: IndexWriter processes internal events after it
closed itself internally. This rare condition can happen if an
IndexWriter has internal changes that were not fully applied yet
like when index / flush requests happen concurrently to the close or
rollback call. (Simon Willnauer)
* LUCENE-5394: Fix TokenSources.getTokenStream to return payloads if
they were indexed with the term vectors. (Mike McCandless)
* LUCENE-5344: Flexible StandardQueryParser behaves differently than
ClassicQueryParser. (Adriano Crestani)
* LUCENE-5375: ToChildBlockJoinQuery works harder to detect mis-use,
when the parent query incorrectly returns child documents, and throw
a clear exception saying so. (Dr. Oleg Savrasov via Mike McCandless)
* LUCENE-5401: Field.StringTokenStream#end() calls super.end() now,
preventing wrong term positions for fields that use
StringTokenStream. (Michael Busch)
* LUCENE-5377: IndexWriter.addIndexes(Directory[]) would cause corruption
on Lucene 4.6 if any index segments were Lucene 4.0-4.5.
(Littlestar, Mike McCandless, Shai Erera, Robert Muir)
======================= Lucene 4.6.0 =======================
New Features
* LUCENE-4906: PostingsHighlighter can now render to custom Object,
for advanced use cases where String is too restrictive (Luca
Cavanna, Robert Muir, Mike McCandless)
* LUCENE-5133: Changed AnalyzingInfixSuggester.highlight to return
Object instead of String, to allow for advanced use cases where
String is too restrictive (Robert Muir, Shai Erera, Mike
McCandless)
* LUCENE-5207, LUCENE-5334: Added expressions module for customizing ranking
with script-like syntax.
(Jack Conradson, Ryan Ernst, Uwe Schindler via Robert Muir)
* LUCENE-5180: ShingleFilter now creates shingles with trailing holes,
for example if a StopFilter had removed the last token. (Mike
McCandless)
* LUCENE-5219: Add support to SynonymFilterFactory for custom
parsers. (Ryan Ernst via Robert Muir)
* LUCENE-5235: Tokenizers now throw an IllegalStateException if the
consumer does not call reset() before consuming the stream. Previous
versions threw NullPointerException or ArrayIndexOutOfBoundsException
on best effort which was not user-friendly.
(Uwe Schindler, Robert Muir)
* LUCENE-5240: Tokenizers now throw an IllegalStateException if the
consumer neglects to call close() on the previous stream before consuming
the next one. (Uwe Schindler, Robert Muir)
* LUCENE-5214: Add new FreeTextSuggester, to predict the next word
using a simple ngram language model. This is useful for the "long
tail" suggestions, when a primary suggester fails to find a
suggestion. (Mike McCandless)
* LUCENE-5251: New DocumentDictionary allows building suggesters via
contents of existing field, weight and optionally payload stored
fields in an index (Areek Zillur via Mike McCandless)
* LUCENE-5261: Add QueryBuilder, a simple API to build queries from
the analysis chain directly, or to make it easier to implement
query parsers. (Robert Muir, Uwe Schindler)
* LUCENE-5270: Add Terms.hasFreqs, to determine whether a given field
indexed per-doc term frequencies. (Mike McCandless)
* LUCENE-5269: Add CodepointCountFilter. (Robert Muir)
* LUCENE-5294: Suggest module: add DocumentExpressionDictionary to
compute each suggestion's weight using a javascript expression.
(Areek Zillur via Mike McCandless)
* LUCENE-5274: FastVectorHighlighter now supports highlighting against several
indexed fields. (Nik Everett via Adrien Grand)
* LUCENE-5304: SingletonSortedSetDocValues can now return the wrapped
SortedDocValues (Robert Muir, Adrien Grand)
* LUCENE-2844: The benchmark module can now test the spatial module. See
spatial.alg (David Smiley, Liviy Ambrose)
* LUCENE-5302: Make StemmerOverrideMap's methods public (Alan Woodward)
* LUCENE-5296: Add DirectDocValuesFormat, which holds all doc values
in heap as uncompressed java native arrays. (Mike McCandless)
* LUCENE-5189: Add IndexWriter.updateNumericDocValues, to update
numeric DocValues fields of documents, without re-indexing them.
(Shai Erera, Mike McCandless, Robert Muir)
* LUCENE-5298: Add SumValueSourceFacetRequest for aggregating facets by
a ValueSource, such as a NumericDocValuesField or an expression.
(Shai Erera)
* LUCENE-5323: Add .sizeInBytes method to all suggesters (Lookup).
(Areek Zillur via Mike McCandless)
* LUCENE-5312: Add BlockJoinSorter, a new Sorter implementation that makes sure
to never split up blocks of documents indexed with IndexWriter.addDocuments.
(Adrien Grand)
* LUCENE-5297: Allow range faceting on any ValueSource, not just
NumericDocValues fields. (Shai Erera)
Bug Fixes
* LUCENE-5272: OpenBitSet.ensureCapacity did not modify numBits, causing
false assertion errors in fastSet. (Shai Erera)
* LUCENE-5303: OrdinalsCache did not use coreCacheKey, resulting in
over caching across multiple threads. (Mike McCandless, Shai Erera)
* LUCENE-5307: Fix topScorer inconsistency in handling QueryWrapperFilter
inside ConstantScoreQuery, which now rewrites to a query removing the
obsolete QueryWrapperFilter. (Adrien Grand, Uwe Schindler)
* LUCENE-5330: IndexWriter didn't process all internal events on
#getReader(), #close() and #rollback() which causes files to be
deleted at a later point in time. This could cause short-term disk
pollution or OOM if in-memory directories are used. (Simon Willnauer)
* LUCENE-5342: Fixed bulk-merge issue in CompressingStoredFieldsFormat which
created corrupted segments when mixing chunk sizes.
Lucene41StoredFieldsFormat is not impacted. (Adrien Grand, Robert Muir)
API Changes
* LUCENE-5222: Add SortField.needsScores(). Previously it was not possible
for a custom Sort that makes use of the relevance score to work correctly
with IndexSearcher when an ExecutorService is specified.
(Ryan Ernst, Mike McCandless, Robert Muir)
* LUCENE-5275: Change AttributeSource.toString() to display the current
state of attributes. (Robert Muir)
* LUCENE-5277: Modify FixedBitSet copy constructor to take an additional
numBits parameter to allow growing/shrinking the copied bitset. You can
use FixedBitSet.clone() if you only need to clone the bitset. (Shai Erera)
* LUCENE-5260: Use TermFreqPayloadIterator for all suggesters; those
suggesters that can't support payloads will throw an exception if
hasPayloads() is true. (Areek Zillur via Mike McCandless)
* LUCENE-5280: Rename TermFreqPayloadIterator -> InputIterator, along
with associated suggest/spell classes. (Areek Zillur via Mike
McCandless)
* LUCENE-5157: Rename OrdinalMap methods to clarify API and internal structure.
(Boaz Leskes via Adrien Grand)
* LUCENE-5313: Move preservePositionIncrements from setter to ctor in
Analyzing/FuzzySuggester. (Areek Zillur via Mike McCandless)
* LUCENE-5321: Remove Facet42DocValuesFormat. Use DirectDocValuesFormat if you
want to load the category list into memory. (Shai Erera, Mike McCandless)
* LUCENE-5324: AnalyzerWrapper.getPositionIncrementGap and getOffsetGap can now
be overridden. (Adrien Grand)
Optimizations
* LUCENE-5225: The ToParentBlockJoinQuery only keeps track of the child
doc ids and child scores if the ToParentBlockJoinCollector is used.
(Martijn van Groningen)
* LUCENE-5236: EliasFanoDocIdSet now has an index and uses broadword bit
selection to speed-up advance(). (Paul Elschot via Adrien Grand)
* LUCENE-5266: Improved number of read calls and branches in DirectPackedReader. (Ryan Ernst)
* LUCENE-5300: Optimized SORTED_SET storage for fields which are single-valued.
(Adrien Grand)
Documentation
* LUCENE-5211: Better javadocs and error checking of 'format' option in
StopFilterFactory, as well as comments in all snowball formatted files
about specifying format option. (hossman)
Changes in backwards compatibility policy
* LUCENE-5235: Sub classes of Tokenizer have to call super.reset()
when implementing reset(). Otherwise the consumer will get an
IllegalStateException because the Reader is not correctly assigned.
It is important to never change the "input" field on Tokenizer
without using setReader(). The "input" field must not be used
outside reset(), incrementToken(), or end() - especially not in
the constructor. (Uwe Schindler, Robert Muir)
* LUCENE-5204: Directory doesn't have default implementations for
LockFactory-related methods, which have been moved to BaseDirectory. If you
had a custom Directory implementation that extended Directory, you need to
extend BaseDirectory instead. (Adrien Grand)
Build
* LUCENE-5283: Fail the build if ant test didn't execute any tests
(everything filtered out). (Dawid Weiss, Uwe Schindler)
* LUCENE-5249, LUCENE-5257: All Lucene/Solr modules should use the same
dependency versions. (Steve Rowe)
* LUCENE-5273: Binary artifacts in Lucene and Solr convenience binary
distributions accompanying a release, including on Maven Central,
should be identical across all distributions. (Steve Rowe, Uwe Schindler,
Shalin Shekhar Mangar)
* LUCENE-4753: Run forbidden-apis Ant task per module. This allows more
improvements and prevents OOMs after the number of class files
increased recently. (Uwe Schindler)
Tests
* LUCENE-5278: Fix MockTokenizer to work better with more regular expression
patterns. Previously it could only behave like CharTokenizer (where a character
is either a "word" character or not), but now it gives a general longest-match
behavior. (Nik Everett via Robert Muir)
======================= Lucene 4.5.1 =======================
Bug Fixes
* LUCENE-4998: Fixed a few places to pass IOContext.READONCE instead
of IOContext.READ (Shikhar Bhushan via Mike McCandless)
* LUCENE-5242: DirectoryTaxonomyWriter.replaceTaxonomy did not fully reset
its state, which could result in exceptions being thrown, as well as
incorrect ordinals returned from getParent. (Shai Erera)
* LUCENE-5254: Fixed bounded memory leak, where objects like live
docs bitset were not freed from a starting reader after reopening
to a new reader and closing the original one. (Shai Erera, Mike
McCandless)
* LUCENE-5262: Fixed file handle leaks when multiple attempts to open an
NRT reader hit exceptions. (Shai Erera)
* LUCENE-5263: Transient IOExceptions, e.g. due to disk full or file
descriptor exhaustion, hit at unlucky times inside IndexWriter could
lead to silently losing deletions. (Shai Erera, Mike McCandless)
* LUCENE-5264: CommonTermsQuery ignored minMustMatch if only high-frequency
terms were present in the query and the high-frequency operator was set
to SHOULD. (Simon Willnauer)
* LUCENE-5269: Fix bug in NGramTokenFilter where it would sometimes count
unicode characters incorrectly. (Mike McCandless, Robert Muir)
* LUCENE-5289: IndexWriter.hasUncommittedChanges was returning false
when there were buffered delete-by-Term. (Shalin Shekhar Mangar,
Mike McCandless)
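The miscounting fixed in LUCENE-5269 arises because a Java char is a UTF-16 code unit, so a supplementary character occupies two chars. The sketch below shows n-grams built by code point rather than by char. It is an illustration only: CodePointNGramSketch and ngrams are hypothetical names, and this is not NGramTokenFilter's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative code-point-aware n-gram builder; not Lucene's NGramTokenFilter. */
final class CodePointNGramSketch {
    private CodePointNGramSketch() {}

    /** Returns all n-grams of text, where n counts Unicode code points, not chars. */
    static List<String> ngrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> grams = new ArrayList<>();
        int codePoints = text.codePointCount(0, text.length());
        int start = 0; // char index where the current gram begins
        for (int i = 0; i + n <= codePoints; i++) {
            // Advance by n code points, which may span more than n chars.
            int end = text.offsetByCodePoints(start, n);
            grams.add(text.substring(start, end));
            // Move the window forward by exactly one code point.
            start = text.offsetByCodePoints(start, 1);
        }
        return grams;
    }
}
```

Slicing by char index instead (for example substring(i, i + n)) could split a surrogate pair and emit grams containing half a character.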
======================= Lucene 4.5.0 =======================
New features
* LUCENE-5084: Added new Elias-Fano encoder, decoder and DocIdSet
implementations. (Paul Elschot via Adrien Grand)
* LUCENE-5081: Added WAH8DocIdSet, an in-memory doc id set implementation based
on word-aligned hybrid encoding. (Adrien Grand)
* LUCENE-5098: New broadword utility methods in oal.util.BroadWord.
(Paul Elschot via Adrien Grand, Dawid Weiss)
* LUCENE-5030: FuzzySuggester now supports optional unicodeAware
(default is false). If true then edits are measured in Unicode code
points instead of UTF8 bytes. (Artem Lukanin via Mike McCandless)
* LUCENE-5118: SpatialStrategy.makeDistanceValueSource() now has an optional
multiplier for scaling degrees to another unit. (David Smiley)
* LUCENE-5091: SpanNotQuery can now be configured with pre and post slop to act
as a hypothetical SpanNotNearQuery. (Tim Allison via David Smiley)
* LUCENE-4985: FacetsAccumulator.create() is now able to create a
MultiFacetsAccumulator over a mixed set of facet requests. MultiFacetsAccumulator
allows wrapping multiple FacetsAccumulators, making it easy to mix
existing and custom ones. TaxonomyFacetsAccumulator supports any
FacetRequest which implements createFacetsAggregator and was indexed
using the taxonomy index. (Shai Erera)
* LUCENE-5153: AnalyzerWrapper.wrapReader allows wrapping the Reader given to
inputReader. (Shai Erera)
* LUCENE-5155: FacetRequest.getValueOf and .getFacetArraysSource replaced by
FacetsAggregator.createOrdinalValueResolver. This gives better options for
resolving an ordinal's value by FacetAggregators. (Shai Erera)
* LUCENE-5165: Add SuggestStopFilter, to be used with analyzing
suggesters, so that a stop word at the very end of the lookup query,
and without any trailing token characters, will be preserved. This
enables query "a" to suggest apple; see
http://blog.mikemccandless.com/2013/08/suggeststopfilter-carefully-removes.html
for details.
* LUCENE-5178: Added support for missing values to DocValues fields.
AtomicReader.getDocsWithField returns a Bits of documents with a value,
and FieldCache.getDocsWithField forwards to that for DocValues fields. Things like
SortField.setMissingValue, FunctionValues.exists, and FieldValueFilter now
work with DocValues fields. (Robert Muir)
* LUCENE-5124: Lucene 4.5 has a new Lucene45Codec with Lucene45DocValues,
supporting missing values and with most datastructures residing off-heap.
Added a "Memory" docvalues format that works entirely in heap, and a "Disk"
format that loads no datastructures into RAM. Both of these also support missing values.
Added DiskNormsFormat (in case you want norms entirely on disk). (Robert Muir)
* LUCENE-2750: Added PForDeltaDocIdSet, an in-memory doc id set implementation
based on the PFOR encoding. (Adrien Grand)
* LUCENE-5186: Added CachingWrapperFilter.getFilter in order to be able to get
the wrapped filter. (Trejkaz via Adrien Grand)
* LUCENE-5197: Added SegmentReader.ramBytesUsed to return approximate heap RAM
used by index datastructures. (Areek Zillur via Robert Muir)
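To illustrate why the unit of measurement in LUCENE-5030 matters, the following
minimal sketch compares an edit distance computed over Unicode code points with
one computed over UTF-8 bytes. It is not FuzzySuggester's implementation: the
class name EditUnits and its helpers are hypothetical, and a plain Levenshtein
distance is assumed here purely for illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; EditUnits is a hypothetical class, not part of Lucene.
class EditUnits {

    // Plain Levenshtein distance over two int sequences (two-row dynamic programming).
    static int levenshtein(int[] a, int[] b) {
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    // The string as a sequence of Unicode code points.
    static int[] codePoints(String s) {
        return s.codePoints().toArray();
    }

    // The string as a sequence of unsigned UTF-8 byte values.
    static int[] utf8Bytes(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF;
        }
        return out;
    }

    public static void main(String[] args) {
        String a = "cafe";
        String b = "caf\u00e9"; // the last letter is encoded as two bytes in UTF-8
        System.out.println("code points: " + levenshtein(codePoints(a), codePoints(b)));
        System.out.println("UTF-8 bytes: " + levenshtein(utf8Bytes(a), utf8Bytes(b)));
    }
}
```

A single accented letter counts as one edit in code points but as more than one
in bytes, which is the kind of difference the unicodeAware option addresses.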
Bug Fixes
* LUCENE-5116: IndexWriter.addIndexes(IndexReader...) should drop empty (or all
deleted) segments. (Robert Muir, Shai Erera)
* LUCENE-5132: Spatial RecursivePrefixTree Contains predicate will throw an NPE
when there's no indexed data and maybe in other circumstances too. (David Smiley)
* LUCENE-5146: AnalyzingSuggester's sort comparator read part of the input key as the
weight, so the sorter never sorted by weight first: the weight is only considered
when the inputs are equal, in which case the malformed weight is identical as well.
(Simon Willnauer)
* LUCENE-5151: Associations FacetsAggregators could enter an infinite loop when
some result documents were missing category associations. (Shai Erera)
* LUCENE-5152: Fix MemoryPostingsFormat to not modify borrowed BytesRef from FSTEnum
seek/lookup which can cause side effects if done on a cached FST root arc.
(Simon Willnauer)
* LUCENE-5160: Handle the case where reading from a file or FileChannel returns -1,
which could happen in rare cases where something happens to the file between the
time we start the read loop (where we check the length) and when we actually do
the read. (gsingers, yonik, Robert Muir, Uwe Schindler)
* LUCENE-5166: PostingsHighlighter would throw IOOBE if a term spanned the maxLength
boundary, made it into the top-N and went to the formatter.
(Manuel Amoabeng, Michael McCandless, Robert Muir)
* LUCENE-4583: Indexing core no longer enforces a limit on maximum
length binary doc values fields, but individual codecs (including
the default one) have their own limits (David Smiley, Robert Muir,
Mike McCandless)
* LUCENE-3849: TokenStreams now set the position increment in end(),
so we can handle trailing holes. If you have a custom TokenStream
implementing end() then be sure it calls super.end(). (Robert Muir,
Mike McCandless)
* LUCENE-5192: IndexWriter could allow adding same field name with different
DocValueTypes under some circumstances. (Shai Erera)
* LUCENE-5191: SimpleHTMLEncoder in Highlighter module broke Unicode
outside BMP because it encoded UTF-16 chars instead of codepoints.
The escaping of codepoints > 127 was removed (not needed for valid HTML)
and missing escaping for ' and / was added. (Uwe Schindler)
* LUCENE-5201: Fixed compression bug in LZ4.compressHC when the input is highly
compressible and the start offset of the array to compress is > 0.
(Adrien Grand)
* LUCENE-5221: SimilarityBase did not write norms the same way as DefaultSimilarity
if discountOverlaps == false and index-time boosts are present for the field.
(Yubin Kim via Robert Muir)
* LUCENE-5223: Fixed IndexUpgrader command line parsing: -verbose is not required
and -dir-impl option now works correctly. (hossman)
* LUCENE-5245: Fix MultiTermQuery's constant score rewrites to always
return a ConstantScoreQuery to make scoring consistent. Previously it
returned an empty unwrapped BooleanQuery, if no terms were available,
which has a different query norm. (Nik Everett, Uwe Schindler)
* LUCENE-5218: In some cases, trying to retrieve or merge a 0-length
binary doc value would hit an ArrayIndexOutOfBoundsException.
(Littlestar via Mike McCandless)
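The code point handling described in LUCENE-5191 can be sketched in plain Java
as follows. This is a hedged illustration, not SimpleHTMLEncoder's source: the
class name HtmlEscapeSketch and the exact entity strings chosen for ' and / are
assumptions. The text is walked by code point, only markup-significant ASCII
characters are escaped, and everything else (including characters outside the
BMP) is copied through unchanged.

```java
// Illustrative sketch only; HtmlEscapeSketch is a hypothetical class, not part of Lucene.
class HtmlEscapeSketch {

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Read a full code point so surrogate pairs are never split.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break; // entity form is an assumption
                case '/':  out.append("&#x2F;"); break; // entity form is an assumption
                default:
                    // Code points above 127 are valid in HTML and are left as-is.
                    out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<a href='x'>\uD83D\uDE00</a>"));
    }
}
```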
API Changes
* LUCENE-5094: Add ramBytesUsed() to MultiDocValues.OrdinalMap.
(Robert Muir)
* LUCENE-5114: Remove unused boolean useCache parameter from
TermsEnum.seekCeil and .seekExact (Mike McCandless)
* LUCENE-5128: IndexSearcher.searchAfter throws IllegalArgumentException if
searchAfter exceeds the number of documents in the reader.
(Crocket via Shai Erera)
* LUCENE-5129: CategoryAssociationsContainer no longer supports null
association values for categories. If you want to index categories without
associations, you should add them using FacetFields. (Shai Erera)
* LUCENE-4876: IndexWriter no longer clones the given IndexWriterConfig. If you
need to use the same config more than once, e.g. when sharing between multiple
writers, make sure to clone it before passing to each writer.
(Shai Erera, Mike McCandless)
* LUCENE-5144: StandardFacetsAccumulator renamed to OldFacetsAccumulator, and all
associated classes were moved under o.a.l.facet.old. The intention is to remove it
one day, once the features it covers (complements, partitions, sampling) have been
migrated to the new FacetsAggregator and FacetsAccumulator API. Also,
FacetRequest.createAggregator was replaced by OldFacetsAccumulator.createAggregator.
(Shai Erera)
* LUCENE-5149: CommonTermsQuery now allows setting the minimum number of terms that
should match for both its high- and low-frequency sub-queries. Previously this was only
supported on the low-frequency terms query. (Simon Willnauer)
* LUCENE-5156: CompressingTermVectors TermsEnum no longer supports ord().
(Robert Muir)
* LUCENE-5161, LUCENE-5164: Fix default chunk sizes in FSDirectory to not be
unnecessarily large (now 8192 bytes); also use chunking when writing to index
files. FSDirectory#setReadChunkSize() is now deprecated and will be removed
in Lucene 5.0. (Uwe Schindler, Robert Muir, gsingers)
* LUCENE-5170: Analyzer.ReuseStrategy instances are now stateless and can
be reused in other Analyzer instances, which was not possible before.
Lucene now ships with stateless singletons for per field and global reuse.
Legacy code can still instantiate the deprecated implementation classes,
but new code should use the constants. Implementors of custom strategies
have to take care of new method signatures. AnalyzerWrapper can now be
configured to use a custom strategy, too, ideally the one from the wrapped
Analyzer. Analyzer adds a getter to retrieve the strategy for this use-case.
(Uwe Schindler, Robert Muir, Shay Banon)
* LUCENE-5173: Lucene never writes segments with 0 documents anymore.
(Shai Erera, Uwe Schindler, Robert Muir)
* LUCENE-5178: SortedDocValues always returns -1 ord when a document is missing
a value for the field. Previously it only did this if the SortedDocValues
was produced by uninversion on the FieldCache. (Robert Muir)
* LUCENE-5183: Removed BinaryDocValues.MISSING. To determine whether a document
is missing a field, use getDocsWithField instead. (Robert Muir)
Changes in Runtime Behavior
* LUCENE-5178: DocValues codec consumer APIs (iterables) return null values
when the document has no value for the field. (Robert Muir)
* LUCENE-5200: The HighFreqTerms command-line tool returns the true top-N
by totalTermFreq when using the -t option; it uses the term statistics (faster)
and now always shows totalTermFreq in the output. (Robert Muir)
Optimizations
* LUCENE-5088: Added TermFilter to filter docs by a specific term.
(Martijn van Groningen)
* LUCENE-5119: DiskDV keeps the document-to-ordinal mapping on disk for
SortedDocValues. (Robert Muir)
* LUCENE-5145: New AppendingPackedLongBuffer, a new variant of the former
AppendingLongBuffer which assumes values are 0-based.
(Boaz Leskes via Adrien Grand)
* LUCENE-5145: All Appending*Buffer now support bulk get.
(Boaz Leskes via Adrien Grand)
* LUCENE-5140: Fixed a performance regression of span queries caused by
LUCENE-4946. (Alan Woodward, Adrien Grand)
* LUCENE-5150: Make WAH8DocIdSet able to invert its encoding in order to
compress dense sets efficiently as well. (Adrien Grand)
* LUCENE-5159: Prefix-code the sorted/sortedset value dictionaries in DiskDV.
(Robert Muir)
* LUCENE-5170: Fixed several wrapper analyzers to inherit the reuse strategy
of the wrapped Analyzer. (Uwe Schindler, Robert Muir, Shay Banon)
* LUCENE-5006: Simplified DocumentsWriter and DocumentsWriterPerThread
synchronization and concurrent interaction with IndexWriter. DWPT is now
only set up once and has no reset logic. All segment publishing and state
transition from DWPT into IndexWriter is now done via an Event-Queue
processed from within the IndexWriter in order to prevent situations
where DWPT or DW calling into IW causes deadlocks. (Simon Willnauer)
* LUCENE-5182: Terminate phrase searches early if max phrase window is
exceeded in FastVectorHighlighter to prevent very long-running phrase
extraction when phrase terms are high-frequency. (Simon Willnauer)
* LUCENE-5188: CompressingStoredFieldsFormat now slices chunks containing big
documents into fixed-size blocks so that requesting a single field does not
necessarily force decompression of the whole chunk. (Adrien Grand)
* LUCENE-5101: CachingWrapperFilter makes it easier to plug in a custom cacheable
DocIdSet implementation and uses WAH8DocIdSet by default, which should be
more memory efficient than FixedBitSet on average as well as faster on small
sets. (Robert Muir)
Documentation
* LUCENE-4894: Removed the facet userguide as it was outdated. It was partially
absorbed into the package documentation and class javadocs. (Shai Erera)
* LUCENE-5206: Clarify FuzzyQuery's unexpected behavior on short
terms. (Tim Allison via Mike McCandless)
Changes in backwards compatibility policy
* LUCENE-5141: CheckIndex.fixIndex(Status,Codec) is now
CheckIndex.fixIndex(Status). If you used to pass a codec to this method, just
remove it from the arguments. (Adrien Grand)
* LUCENE-5089, SOLR-5126: Update to Morfologik 1.7.1. MorfologikAnalyzer and MorfologikFilter
no longer support multiple "dictionaries" as there is only one dictionary available.
(Dawid Weiss)
* LUCENE-5170: Changed method signatures of Analyzer.ReuseStrategy to take
Analyzer. Closeable interface was removed because the class was changed to
be stateless. (Uwe Schindler, Robert Muir, Shay Banon)
* LUCENE-5187: SlowCompositeReaderWrapper constructor is now private,
SlowCompositeReaderWrapper.wrap should be used instead. (Adrien Grand)
* LUCENE-5101: CachingWrapperFilter doesn't always return FixedBitSet instances
anymore. Users of the join module can use
oal.search.join.FixedBitSetCachingWrapperFilter instead. (Adrien Grand)
Build
* SOLR-5159: Manifest includes non-parsed maven variables.
(Artem Karpenko via Steve Rowe)
* LUCENE-5193: Add jar-src as top-level target to generate all Lucene and Solr
*-src.jar. (Steve Rowe, Shai Erera)
======================= Lucene 4.4.0 =======================
Changes in backwards compatibility policy
* LUCENE-5085: MorfologikFilter will no longer stem words marked as keywords
(Dawid Weiss, Grzegorz Sobczyk)
* LUCENE-4955: NGramTokenFilter now emits all n-grams for the same token at the
same position and preserves the position length and the offsets of the
original token. (Simon Willnauer, Adrien Grand)
* LUCENE-4955: NGramTokenizer now emits n-grams in a different order
(a, ab, b, bc, c) instead of (a, b, c, ab, bc) and doesn't trim trailing
whitespaces. (Adrien Grand)
* LUCENE-5042: The n-gram and edge n-gram tokenizers and filters now correctly
handle supplementary characters, and the tokenizers have the ability to
pre-tokenize the input stream similarly to CharTokenizer. (Adrien Grand)
* LUCENE-4967: NRTManager is replaced by
ControlledRealTimeReopenThread, for controlling which requests must
see which indexing changes, so that it can work with any
ReferenceManager (Mike McCandless)
* LUCENE-4973: SnapshotDeletionPolicy no longer requires a unique
String id (Mike McCandless, Shai Erera)
* LUCENE-4946: The internal sorting API (SorterTemplate, now Sorter) has been
completely refactored to allow for a better implementation of TimSort.
(Adrien Grand, Uwe Schindler, Dawid Weiss)
* LUCENE-4963: Some TokenFilter options that generate broken TokenStreams have
been deprecated: updateOffsets=true on TrimFilter and
enablePositionIncrements=false on all classes that inherit from
FilteringTokenFilter: JapanesePartOfSpeechStopFilter, KeepWordFilter,
LengthFilter, StopFilter and TypeTokenFilter. (Adrien Grand)
* LUCENE-4963: In order not to take position increments into account in
suggesters, you now need to call setPreservePositionIncrements(false) instead
of configuring the token filters to not increment positions. (Adrien Grand)
* LUCENE-3907: EdgeNGramTokenizer now supports maxGramSize > 1024, doesn't trim
the input, sets position increment = 1 for all tokens and doesn't support
backward grams anymore. (Adrien Grand)
* LUCENE-3907: EdgeNGramTokenFilter does not support backward grams and does
not update offsets anymore. (Adrien Grand)
* LUCENE-4981: PositionFilter is now deprecated as it can corrupt token stream
graphs. Since its main use-case was to make query parsers generate boolean
queries instead of phrase queries, it is now advised to use
QueryParser.setAutoGeneratePhraseQueries(false) (for simple cases) or to
override QueryParser.newFieldQuery. (Adrien Grand, Steve Rowe)
* LUCENE-5018: CompoundWordTokenFilterBase and its children
DictionaryCompoundWordTokenFilter and HyphenationCompoundWordTokenFilter don't
update offsets anymore. (Adrien Grand)
* LUCENE-5015: SamplingAccumulator no longer corrects the counts of the sampled
categories. You should set TakmiSampleFixer on SamplingParams if required (but
notice that this means slower search). (Rob Audenaerde, Gilad Barkai, Shai Erera)
* LUCENE-4933: Replace ExactSimScorer/SloppySimScorer with just SimScorer. Previously
there were 2 implementations as a performance hack to support tableization of
sqrt(), but this caching is removed, as sqrt is implemented in hardware with modern
JVMs and it's faster not to cache. (Robert Muir)
* LUCENE-5038: MergePolicy now has a default implementation for useCompoundFile based
on segment size and noCFSRatio. The default implementation was pulled up from
TieredMergePolicy. (Simon Willnauer)
* LUCENE-5063: FieldCache.get(Bytes|Shorts), SortField.Type.(BYTE|SHORT) and
FieldCache.DEFAULT_(BYTE|SHORT|INT|LONG|FLOAT|DOUBLE)_PARSER are now
deprecated. These methods/types assume that data is stored as strings although
Lucene has much better support for numeric data through (Int|Long)Field,
NumericRangeQuery and FieldCache.get(Int|Long)s. (Adrien Grand)
* LUCENE-5078: TFIDFSimilarity lets you encode the norm value as any arbitrary long.
As a result, encode/decodeNormValue were made abstract with their signatures changed.
The default implementation was moved to DefaultSimilarity, which encodes the norm as
a single-byte value. (Shai Erera)
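The change of n-gram emission order described in LUCENE-4955 can be shown with a
small standalone sketch. It does not use the Lucene tokenizers: the class
NGramOrderSketch and its two methods are hypothetical, and for simplicity it
works on UTF-16 chars, so it assumes input without supplementary characters.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; NGramOrderSketch is a hypothetical class, not part of Lucene.
class NGramOrderSketch {

    // New order: all grams starting at position 0, then position 1, and so on.
    static List<String> byPosition(String s, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= s.length(); len++) {
                grams.add(s.substring(start, start + len));
            }
        }
        return grams;
    }

    // Old order: all grams of the shortest length first, then the next length.
    static List<String> byLength(String s, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= maxGram; len++) {
            for (int start = 0; start + len <= s.length(); start++) {
                grams.add(s.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(byPosition("abc", 1, 2)); // [a, ab, b, bc, c]
        System.out.println(byLength("abc", 1, 2));   // [a, b, c, ab, bc]
    }
}
```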
Bug Fixes
* LUCENE-4890: QueryTreeBuilder.getBuilder() only finds interfaces on the
most derived class. (Adriano Crestani)
* LUCENE-4997: Internal test framework's tests are sensitive to previous
test failures and tests.failfast. (Dawid Weiss, Shai Erera)
* LUCENE-4955: NGramTokenizer now supports inputs larger than 1024 chars.
(Adrien Grand)
* LUCENE-4959: Fix incorrect return value in
SimpleNaiveBayesClassifier.assignClass. (Alexey Kutin via Adrien Grand)
* LUCENE-4972: DirectoryTaxonomyWriter created empty commits even if no changes
were made. (Shai Erera, Michael McCandless)
* LUCENE-949: AnalyzingQueryParser can't work with leading wildcards.
(Tim Allison, Robert Muir, Steve Rowe)
* LUCENE-4980: Fix issues preventing mixing of RangeFacetRequest and
non-RangeFacetRequest when using DrillSideways. (Mike McCandless,
Shai Erera)
* LUCENE-4996: Ensure DocInverterPerField always includes field name
in exception messages. (Markus Jelsma via Robert Muir)
* LUCENE-4992: Fix constructor of CustomScoreQuery to take FunctionQuery
for scoringQueries. Instead use QueryValueSource to safely wrap arbitrary
queries and use them with CustomScoreQuery. (John Wang, Robert Muir)
* LUCENE-5016: SamplingAccumulator returned inconsistent label if asked to
aggregate a non-existing category. Also fixed a bug in RangeAccumulator if
some readers did not have the requested numeric DV field.
(Rob Audenaerde, Shai Erera)
* LUCENE-5028: Remove pointless and confusing doShare option in FST's
PositiveIntOutputs (Han Jiang via Mike McCandless)
* LUCENE-5032: Fix IndexOutOfBoundsExc in PostingsHighlighter when
multi-valued fields exceed maxLength (Tomás Fernández Löbbe
via Mike McCandless)
* LUCENE-4933: SweetSpotSimilarity didn't apply its tf function to some
queries (SloppyPhraseQuery, SpanQueries). (Robert Muir)
* LUCENE-5033: SlowFuzzyQuery was accepting too many terms (documents) when
provided minSimilarity is an int > 1 (Tim Allison via Mike McCandless)
* LUCENE-5045: DrillSideways.search did not work on an empty index. (Shai Erera)
* LUCENE-4995: CompressingStoredFieldsReader now only reuses an internal buffer
when there is no more than 32kb to decompress. This prevents
out-of-memory errors when working with large stored fields.
(Adrien Grand)
* LUCENE-5062: If the spatial data for a document consisted of multiple
overlapping or adjacent parts, then a CONTAINS predicate query might not match
when the sum of those shapes contains the query shape but none does individually.
A flag was added to use the original faster algorithm. (David Smiley)
* LUCENE-4971: Fixed NPE in AnalyzingSuggester when there are too many
graph expansions. (Alexey Kudinov via Mike McCandless)
* LUCENE-5080: Combined setMaxMergeCount and setMaxThreadCount into one
setter in ConcurrentMergeScheduler: setMaxMergesAndThreads. Previously these
setters would not work unless you invoked them very carefully.
(Robert Muir, Shai Erera)
* LUCENE-5068: QueryParserUtil.escape() does not escape forward slash.
(Matias Holte via Steve Rowe)
* LUCENE-5103: A join on a single-valued field with deleted docs scored too few
docs. (David Smiley)
* LUCENE-5090: Detect mismatched readers passed to
SortedSetDocValuesReaderState and SortedSetDocValuesAccumulator.
(Robert Muir, Mike McCandless)
* LUCENE-5120: AnalyzingSuggester modified its FST's cached root arc if payloads
are used and the entire output resided on the root arc on the first access. This
caused subsequent suggest calls to fail. (Simon Willnauer)
Optimizations
* LUCENE-4936: Improve numeric doc values compression in case all values share
a common divisor. In particular, this improves the compression ratio of dates
without time when they are encoded as milliseconds since Epoch. Also support
TABLE compressed numerics in the Disk codec. (Robert Muir, Adrien Grand)
* LUCENE-4951: DrillSideways uses the new Scorer.cost() method to make
better decisions about which scorer to use internally. (Mike McCandless)
* LUCENE-4976: PersistentSnapshotDeletionPolicy writes its state to a
single snapshots_N file, and no longer requires closing (Mike
McCandless, Shai Erera)
* LUCENE-5035: Compress addresses in FieldCacheImpl.SortedDocValuesImpl more
efficiently. (Adrien Grand, Robert Muir)
* LUCENE-4941: Sort "from" terms only once when using JoinUtil.
(Martijn van Groningen)
* LUCENE-5050: Close the stored fields and term vectors index files as soon as
the index has been loaded into memory to save file descriptors. (Adrien Grand)
* LUCENE-5086: RamUsageEstimator now uses official Java 7 API or a proprietary
Oracle Java 6 API to get Hotspot MX bean, preventing AWT classes from being
loaded on MacOSX. (Shay Banon, Dawid Weiss, Uwe Schindler)
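The common-divisor idea behind LUCENE-4936 can be sketched as follows. This is a
simplified illustration and not the codec's actual encoding: the class
GcdCompressionSketch is hypothetical, values are assumed to be non-negative, and
no minimum-value offset is applied. Dates without a time component, stored as
milliseconds since Epoch, are all multiples of 86,400,000, so storing the
quotients needs far fewer bits per value.

```java
// Illustrative sketch only; GcdCompressionSketch is a hypothetical class, not part of Lucene.
class GcdCompressionSketch {

    // Euclid's algorithm; assumes non-negative inputs.
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Greatest common divisor of all values (0 if every value is 0).
    static long commonDivisor(long[] values) {
        long g = 0;
        for (long v : values) {
            g = gcd(g, v);
        }
        return g;
    }

    // Store quotients instead of raw values when a useful divisor exists.
    static long[] encode(long[] values, long divisor) {
        long[] out = values.clone();
        if (divisor > 1) {
            for (int i = 0; i < out.length; i++) {
                out[i] /= divisor;
            }
        }
        return out;
    }

    // Multiply the quotients back to recover the original values.
    static long[] decode(long[] quotients, long divisor) {
        long[] out = quotients.clone();
        if (divisor > 1) {
            for (int i = 0; i < out.length; i++) {
                out[i] *= divisor;
            }
        }
        return out;
    }

    // Bits needed to represent a non-negative value (at least 1).
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    public static void main(String[] args) {
        long day = 86_400_000L; // milliseconds per day
        long[] dates = {0, day, 2 * day, 5 * day};
        long divisor = commonDivisor(dates);
        long[] packed = encode(dates, divisor);
        System.out.println("divisor=" + divisor
            + " bits before=" + bitsRequired(5 * day)
            + " bits after=" + bitsRequired(packed[packed.length - 1]));
    }
}
```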
New Features
* LUCENE-5085: MorfologikFilter will no longer stem words marked as keywords
(Dawid Weiss, Grzegorz Sobczyk)
* LUCENE-5064: Added PagedMutable (internal), a paged extension of
PackedInts.Mutable which allows for storing more than 2B values. (Adrien Grand)
* LUCENE-4766: Added a PatternCaptureGroupTokenFilter that uses Java regexes to
emit multiple tokens, one for each capture group in one or more patterns.
(Simon Willnauer, Clinton Gormley)
* LUCENE-4952: Expose control (protected method) in DrillSideways to
force all sub-scorers to be on the same document being collected.
This is necessary when using collectors like
ToParentBlockJoinCollector with DrillSideways. (Mike McCandless)
* SOLR-4761: Add SimpleMergedSegmentWarmer, which just initializes terms,
norms, docvalues, and so on. (Mark Miller, Mike McCandless, Robert Muir)
* LUCENE-4964: Allow arbitrary Query for per-dimension drill-down to
DrillDownQuery and DrillSideways, to support future dynamic faceting
methods (Mike McCandless)
* LUCENE-4966: Add CachingWrapperFilter.sizeInBytes() (Mike McCandless)
* LUCENE-4965: Add dynamic (no taxonomy index used) numeric range
faceting to Lucene's facet module (Mike McCandless, Shai Erera)
* LUCENE-4979: LiveFieldValues can work with any ReferenceManager, not
just ReferenceManager<IndexSearcher> (Mike McCandless).
* LUCENE-4975: Added a new Replicator module which can replicate index
revisions between server and client. (Shai Erera, Mike McCandless)
* LUCENE-5022: Added FacetResult.mergeHierarchies to merge multiple
FacetResult of the same dimension into a single one with the reconstructed
hierarchy. (Shai Erera)
* LUCENE-5026: Added PagedGrowableWriter, a new internal packed-ints structure
that grows the number of bits per value on demand, can store more than 2B
values and supports random write and read access. (Adrien Grand)
* LUCENE-5025: FST's Builder can now handle more than 2.1 billion
"tail nodes" while building a minimal FST. (Aaron Binns, Adrien
Grand, Mike McCandless)
* LUCENE-5063: FieldCache.DEFAULT.get(Ints|Longs) now uses bit-packing to save
memory. (Adrien Grand)
* LUCENE-5079: IndexWriter.hasUncommittedChanges() returns true if there are
changes that have not been committed. (yonik, Mike McCandless, Uwe Schindler)
* SOLR-4565: Extend NorwegianLightStemFilter and NorwegianMinimalStemFilter
to handle "nynorsk" (Erlend Garåsen, janhoy via Robert Muir)
* LUCENE-5087: Add getMultiValuedSeparator to PostingsHighlighter, for cases
where you want a different logical separator between field values. This can
be set to e.g. U+2029 PARAGRAPH SEPARATOR if you never want passes to span
values. (Mike McCandless, Robert Muir)
* LUCENE-5013: Added ScandinavianFoldingFilterFactory and
ScandinavianNormalizationFilterFactory (Karl Wettin via janhoy)
* LUCENE-4845: AnalyzingInfixSuggester finds suggestions based on
matches to any tokens in the suggestion, not just based on pure
prefix matching. (Mike McCandless, Robert Muir)
API Changes
* LUCENE-5077: Make it easier to use compressed norms. Lucene42NormsFormat takes
an overhead parameter, so you can easily pass a value other than
PackedInts.FASTEST from your own codec. (Robert Muir)
* LUCENE-5097: Analyzer now has an additional tokenStream(String fieldName,
String text) method, so wrapping by StringReader for common use is no
longer needed. This method uses an internal reusable reader, which was
previously only used by the Field class. (Uwe Schindler, Robert Muir)
* LUCENE-4542: HunspellStemFilter's maximum recursion level is now configurable.
(Piotr, Rafał Kuć via Adrien Grand)
Build
* LUCENE-4987: Upgrade randomized testing to version 2.0.10:
Test framework may fail internally due to overly aggressive J9 optimizations.
(Dawid Weiss, Shai Erera)
* LUCENE-5043: The eclipse target now uses the containing directory for the
project name. This also enforces UTF-8 encoding when files are copied with
filtering.
* LUCENE-5055: "rat-sources" target now also checks build.xml, ivy.xml,
forbidden-api signatures, and parts of resources folders. (Ryan Ernst,
Uwe Schindler)
* LUCENE-5072: Automatically patch javadocs generated by JDK versions
before 7u25 to work around the frame injection vulnerability (CVE-2013-1571,
VU#225657). (Uwe Schindler)
Tests
* LUCENE-4901: TestIndexWriterOnJRECrash should work on any
JRE vendor via Runtime.halt().
(Mike McCandless, Robert Muir, Uwe Schindler, Rodrigo Trujillo, Dawid Weiss)
Changes in runtime behavior
* LUCENE-5038: New segments written by IndexWriter are now wrapped into CFS
by default. DocumentsWriterPerThread doesn't consult MergePolicy anymore
to decide if a CFS must be written, instead IndexWriterConfig now has a
property to enable / disable CFS for newly created segments. (Simon Willnauer)
* LUCENE-5107: Properties files written by Lucene now use UTF-8 encoding;
Unicode is no longer escaped. Reading of legacy properties files with
\u escapes is still possible. (Uwe Schindler, Robert Muir)
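The behavior described in LUCENE-5107 matches what the standard
java.util.Properties API does when used with character streams. The sketch
below is not Lucene's code: the class Utf8PropertiesSketch and its helper names
are hypothetical. It shows that storing through a UTF-8 Writer keeps non-ASCII
characters unescaped, while loading through a Reader still understands legacy
\u escapes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Illustrative sketch only; Utf8PropertiesSketch is a hypothetical class, not part of Lucene.
class Utf8PropertiesSketch {

    // Serialize properties as UTF-8 text; store(Writer, ...) does not escape Unicode.
    static byte[] storeUtf8(Properties props) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            props.store(writer, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Parse UTF-8 text; load(Reader) also decodes legacy \\uXXXX escapes.
    static Properties loadUtf8(byte[] data) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("author", "Tom\u00e1s");
        System.out.print(new String(storeUtf8(props), StandardCharsets.UTF_8));

        // A legacy file that still uses a backslash-u escape for the same name.
        byte[] legacy = "author=Tom\\u00e1s\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadUtf8(legacy).getProperty("author"));
    }
}
```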
======================= Lucene 4.3.1 =======================
Bug Fixes
* SOLR-4813: Fix SynonymFilterFactory to allow init parameters for
tokenizer factory used when parsing synonyms file. (Shingo Sasaki, hossman)
* LUCENE-4935: CustomScoreQuery wrongly applied its query boost twice
(boost^2). (Robert Muir)
* LUCENE-4948: Fixed ArrayIndexOutOfBoundsException in PostingsHighlighter
if you had a 64-bit JVM without compressed OOPS: IBM J9, or Oracle with
large heap/explicitly disabled. (Mike McCandless, Uwe Schindler, Robert Muir)
* LUCENE-4953: Fixed ParallelCompositeReader to inform ReaderClosedListeners of
its synthetic subreaders. FieldCaches keyed on the atomic children will be purged
earlier and FC insanity prevented. In addition, ParallelCompositeReader's
toString() was changed to better reflect the reader structure.
(Mike McCandless, Uwe Schindler)
* LUCENE-4968: Fixed ToParentBlockJoinQuery/Collector: correctly handle parent
hits that had no child matches, don't throw IllegalArgumentEx when
the child query has no hits, more aggressively catch cases where childQuery
incorrectly matches parent documents (Mike McCandless)
* LUCENE-4970: Fix boost value of rewritten NGramPhraseQuery.
(Shingo Sasaki via Adrien Grand)
* LUCENE-4974: CommitIndexTask was broken if no params were set. (Shai Erera)
* LUCENE-4986: Fixed case where a newly opened near-real-time reader
fails to reflect a delete from IndexWriter.tryDeleteDocument (Reg,
Mike McCandless)
* LUCENE-4994: Fix PatternKeywordMarkerFilter to have public constructor.
(Uwe Schindler)
* LUCENE-4993: Fix BeiderMorseFilter to preserve custom attributes when
inserting tokens with position increment 0. (Uwe Schindler)
* LUCENE-4991: Fix handling of synonyms in classic QueryParser.getFieldQuery for
terms not separated by whitespace. PositionIncrementAttribute was ignored, so with
default AND synonyms wrongly became mandatory clauses, and with OR, the
coordination factor was wrong. (李威, Robert Muir)
* LUCENE-5002: IndexWriter#deleteAll() caused a deadlock in DWPT / DWSC if a
DWPT was flushing concurrently while deleteAll() aborted all DWPT. The IW
should never wait on DWPT via the flush control while holding on to the IW
Lock. (Simon Willnauer)
Optimizations
* LUCENE-4938: Don't use an unnecessarily large priority queue in IndexSearcher
methods that take top-N. (Uwe Schindler, Mike McCandless, Robert Muir)
======================= Lucene 4.3.0 =======================
Changes in backwards compatibility policy
* LUCENE-4810: EdgeNGramTokenFilter no longer increments position for
multiple ngrams derived from the same input token. (Walter Underwood
via Mike McCandless)
* LUCENE-4822: KeywordTokenFilter is now an abstract class. Subclasses
need to implement #isKeyword() in order to mark terms as keywords.
The existing functionality has been factored out into a new
SetKeywordTokenFilter class. (Simon Willnauer, Uwe Schindler)
* LUCENE-4642: Remove Tokenizer's and subclasses' ctors taking
AttributeSource. (Renaud Delbru, Uwe Schindler, Steve Rowe)
* LUCENE-4833: IndexWriterConfig used to use LogByteSizeMergePolicy when
calling setMergePolicy(null) although the default merge policy is
TieredMergePolicy. IndexWriterConfig setters now throw an exception when
passed null if null is not a valid value. (Adrien Grand)
* LUCENE-4849: Made ParallelTaxonomyArrays abstract with a concrete
implementation for DirectoryTaxonomyWriter/Reader. Also moved it under
o.a.l.facet.taxonomy. (Shai Erera)
* LUCENE-4876: IndexDeletionPolicy is now an abstract class instead of an
interface. IndexDeletionPolicy, MergeScheduler and InfoStream now implement
Cloneable. (Adrien Grand)
* LUCENE-4874: FilterAtomicReader and related classes (FilterTerms,
FilterDocsEnum, ...) don't forward anymore to the filtered instance when the
method has a default implementation through other abstract methods.
(Adrien Grand, Robert Muir)
* LUCENE-4642, LUCENE-4877: Implementors of TokenizerFactory, TokenFilterFactory,
and CharFilterFactory now need to provide at least one constructor taking
Map<String,String> to be able to be loaded by the SPI framework (e.g., from Solr).
In addition, TokenizerFactory needs to implement the abstract
create(AttributeFactory,Reader) method. (Renaud Delbru, Uwe Schindler,
Steve Rowe, Robert Muir)
API Changes
* LUCENE-4896: Made PassageFormatter abstract in PostingsHighlighter, made
members of DefaultPassageFormatter protected. (Luca Cavanna via Robert Muir)
* LUCENE-4844: removed TaxonomyReader.getParent(), you should use
TaxonomyReader.getParallelArrays().parents() instead. (Shai Erera)
* LUCENE-4742: Renamed spatial 'Node' to 'Cell', along with any method names
and variables using this terminology. (David Smiley)
New Features
* LUCENE-4815: DrillSideways now allows more than one FacetRequest per
dimension (Mike McCandless)
* LUCENE-3918: IndexSorter has been ported to 4.3 API and now supports
sorting documents by a numeric DocValues field, or reversing the order of
the documents in the index. Additionally, apps can implement their own
sort criteria. (Anat Hashavit, Shai Erera)
* LUCENE-4817: Added KeywordRepeatFilter, which emits each token twice,
once as a keyword and once as an ordinary token, allowing stemmers to emit
a stemmed version along with the un-stemmed version. (Simon Willnauer)
* LUCENE-4822: PatternKeywordTokenFilter can mark tokens as keywords based
on regular expressions. (Simon Willnauer, Uwe Schindler)
* LUCENE-4821: AnalyzingSuggester now uses the ending offset to
determine whether the last token was finished or not, so that a
query "i " will no longer suggest "Isla de Muerta" for example.
(Mike McCandless)
* LUCENE-4642: Add create(AttributeFactory) to TokenizerFactory and
subclasses with ctors taking AttributeFactory.
(Renaud Delbru, Uwe Schindler, Steve Rowe)
* LUCENE-4820: Add payloads to Analyzing/FuzzySuggester, to record an
arbitrary byte[] per suggestion (Mike McCandless)
* LUCENE-4816: Add WholeBreakIterator to PostingsHighlighter
for treating the entire content as a single Passage. (Robert
Muir, Mike McCandless)
* LUCENE-4827: Add additional ctor to PostingsHighlighter PassageScorer
to provide bm25 k1,b,avgdl parameters. (Robert Muir)
* LUCENE-4607: Add DocIdSetIterator.cost() and Spans.cost() for optimizing
scoring. (Simon Willnauer, Robert Muir)
* LUCENE-4795: Add SortedSetDocValuesFacetFields and
SortedSetDocValuesAccumulator, to compute topK facet counts from a
field's SortedSetDocValues. This method only supports flat
(dim/label) facets, is a bit (~25%) slower, has added cost
per-IndexReader-open to compute its ordinal map, but it requires no
taxonomy index and it tie-breaks facet labels in an understandable
(by Unicode sort order) way. (Robert Muir, Mike McCandless)
* LUCENE-4843: Add LimitTokenPositionFilter: don't emit tokens with
positions that exceed the configured limit. (Steve Rowe)
* LUCENE-4832: Add ToParentBlockJoinCollector.getTopGroupsWithAllChildDocs, to retrieve
all children in each group. (Aleksey Aleev via Mike McCandless)
* LUCENE-4846: PostingsHighlighter subclasses can override where the
String values come from (it still defaults to pulling from stored
fields). (Robert Muir, Mike McCandless)
* LUCENE-4853: Add PostingsHighlighter.highlightFields method that
takes int[] docIDs instead of TopDocs. (Robert Muir, Mike
McCandless)
* LUCENE-4856: If there are no matches for a given field, return the
first maxPassages sentences (Robert Muir, Mike McCandless)
* LUCENE-4859: IndexReader now exposes Terms statistics: getDocCount,
getSumDocFreq, getSumTotalTermFreq. (Shai Erera)
* LUCENE-4862: It is now possible to terminate collection of a single
IndexReader leaf by throwing a CollectionTerminatedException in
Collector.collect. (Adrien Grand, Shai Erera)
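The per-leaf termination described in LUCENE-4862 can be pictured with a minimal, self-contained sketch. It does not use the Lucene API: TerminatedException, SimpleCollector, FirstNPerLeaf and search are hypothetical stand-ins that only mimic the control flow, assuming the search loop catches the exception and moves on to the next leaf.

```java
import java.util.ArrayList;
import java.util.List;

class LeafTerminationSketch {
    // Hypothetical stand-in for Lucene's CollectionTerminatedException.
    static class TerminatedException extends RuntimeException {}

    // Hypothetical stand-in for a collector that receives doc IDs leaf by leaf.
    interface SimpleCollector {
        void setNextLeaf(int leafOrd);
        void collect(int doc);
    }

    // Keeps at most maxPerLeaf docs from each leaf, then asks to skip the rest of that leaf.
    static class FirstNPerLeaf implements SimpleCollector {
        final int maxPerLeaf;
        final List<String> hits = new ArrayList<>();
        int leaf;
        int seen;

        FirstNPerLeaf(int maxPerLeaf) {
            this.maxPerLeaf = maxPerLeaf;
        }

        @Override
        public void setNextLeaf(int leafOrd) {
            leaf = leafOrd;
            seen = 0;
        }

        @Override
        public void collect(int doc) {
            if (seen == maxPerLeaf) {
                throw new TerminatedException();
            }
            hits.add(leaf + ":" + doc);
            seen++;
        }
    }

    // Simplified search loop: the exception abandons only the current leaf.
    static void search(int[][] leaves, SimpleCollector collector) {
        for (int ord = 0; ord < leaves.length; ord++) {
            collector.setNextLeaf(ord);
            try {
                for (int doc : leaves[ord]) {
                    collector.collect(doc);
                }
            } catch (TerminatedException e) {
                // Skip the remaining docs of this leaf and continue with the next one.
            }
        }
    }

    public static void main(String[] args) {
        FirstNPerLeaf collector = new FirstNPerLeaf(2);
        search(new int[][] {{0, 1, 2}, {0}, {0, 1, 2, 3}}, collector);
        System.out.println(collector.hits);
    }
}
```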
* LUCENE-4752: New SortingMergePolicy (in lucene/misc) that sorts documents
before merging segments. (Adrien Grand, Shai Erera, David Smiley)
* LUCENE-4860: Customize scoring and formatting per-field in
PostingsHighlighter by subclassing and overriding the getFormatter
and/or getScorer methods. This also changes Passage.getMatchTerms()
to return BytesRef[] instead of Term[]. (Robert Muir, Mike
McCandless)
* LUCENE-4839: Added SorterTemplate.timSort, an O(n log n) stable sort algorithm
that performs well on partially sorted data. (Adrien Grand)
* LUCENE-4644: Added support for the "IsWithin" spatial predicate for
RecursivePrefixTreeStrategy. It's for matching non-point indexed shapes; if
you only have points (1/doc) then "Intersects" is equivalent and faster.
See the javadocs. (David Smiley)
* LUCENE-4861: Make BreakIterator per-field in PostingsHighlighter. This means
you can override getBreakIterator(String field) to use different mechanisms
for e.g. title vs. body fields. (Mike McCandless, Robert Muir)
* LUCENE-4645: Added support for the "Contains" spatial predicate for
RecursivePrefixTreeStrategy. (David Smiley)
* LUCENE-4898: DirectoryReader.openIfChanged now allows opening a reader
on an IndexCommit starting from a near-real-time reader (previously
this would throw IllegalArgumentException). (Mike McCandless)
* LUCENE-4905: Made the maxPassages parameter per-field in PostingsHighlighter.
(Robert Muir)
* LUCENE-4897: Added TaxonomyReader.getChildren for traversing a category's
children. (Shai Erera)
* LUCENE-4902: Added FilterDirectoryReader to allow easy filtering of a
DirectoryReader's subreaders. (Alan Woodward, Adrien Grand, Uwe Schindler)
* LUCENE-4858: Added EarlyTerminatingSortingCollector to be used in conjunction
with SortingMergePolicy, which allows queries on sorted indexes to terminate
early when the sort order matches the index order. (Adrien Grand, Shai Erera)
* LUCENE-4904: Added descending sort order to NumericDocValuesSorter. (Shai Erera)
* LUCENE-3786: Added SearcherTaxonomyManager, to manage access to both
IndexSearcher and DirectoryTaxonomyReader for near-real-time
faceting. (Shai Erera, Mike McCandless)
* LUCENE-4915: DrillSideways now allows drilling down on fields that
are not faceted. (Mike McCandless)
* LUCENE-4895: Added support for the "IsDisjointTo" spatial predicate for
RecursivePrefixTreeStrategy. (David Smiley)
* LUCENE-4774: Added FieldComparator that allows sorting parent documents based on
fields on the child / nested document level. (Martijn van Groningen)
Optimizations
* LUCENE-4839: SorterTemplate.merge can now be overridden in order to replace
the default implementation which merges in-place by a faster implementation
that could require fewer swaps at the expense of some extra memory.
ArrayUtil and CollectionUtil override it so that their mergeSort and timSort
methods are faster but only require up to 1% of extra memory. (Adrien Grand)
* LUCENE-4571: Speed up BooleanQuerys with minNrShouldMatch to use
skipping. (Stefan Pohl via Robert Muir)
* LUCENE-4863: StemmerOverrideFilter now uses a FST to represent its overrides
in memory. (Simon Willnauer)
* LUCENE-4889: UnicodeUtil.codePointCount implementation replaced with a
non-array-lookup version. (Dawid Weiss)
* LUCENE-4923: Speed up BooleanQuery's processing of in-order disjunctions.
(Robert Muir)
* LUCENE-4926: Speed up DisjunctionMaxQuery. (Robert Muir)
* LUCENE-4930: Reduce contention in older/buggy JVMs when using
AttributeSource#addAttribute() because java.lang.ref.ReferenceQueue#poll()
is implemented using synchronization. (Christian Ziech, Karl Wright,
Uwe Schindler)
Bug Fixes
* LUCENE-4868: SumScoreFacetsAggregator used an incorrect index into
the scores array. (Shai Erera)
* LUCENE-4882: FacetsAccumulator did not allow counting the ROOT category (i.e.
counting dimensions). (Shai Erera)
* LUCENE-4876: IndexWriterConfig.clone() now clones its MergeScheduler,
IndexDeletionPolicy and InfoStream in order to make an IndexWriterConfig and
its clone fully independent. (Adrien Grand)
* LUCENE-4893: Facet counts were multiplied as many times as
FacetsCollector.getFacetResults() was called. (Shai Erera)
* LUCENE-4888: Fixed SloppyPhraseScorer, MultiDocs(AndPositions)Enum and
MultiSpansWrapper which happened to sometimes call DocIdSetIterator.advance
with target<=current (in this case the behavior of advance is undefined).
(Adrien Grand)
* LUCENE-4899: FastVectorHighlighter failed with StringIndexOutOfBoundsException
if a single highlight phrase or term was greater than the fragCharSize producing
negative string offsets. (Simon Willnauer)
* LUCENE-4877: Throw exception for invalid arguments in analysis factories.
(Steve Rowe, Uwe Schindler, Robert Muir)
* LUCENE-4914: SpatialPrefixTree's Node/Cell.reset() forgot to reset the 'leaf'
flag. It affects SpatialRecursivePrefixTreeStrategy on non-point indexed
shapes, as of Lucene 4.2. (David Smiley)
* LUCENE-4913: FacetResultNode.ordinal was always 0 when all children
are returned. (Mike McCandless)
* LUCENE-4918: Highlighter closes the given IndexReader if QueryScorer
is used with an external IndexReader. (Simon Willnauer, Sirvan Yahyaei)
* LUCENE-4880: Fix MemoryIndex to consume empty terms from the tokenstream consistent
with IndexWriter. Previously it discarded them. (Timothy Allison via Robert Muir)
* LUCENE-4885: FacetsAccumulator did not set the correct value for
FacetResult.numValidDescendants. (Mike McCandless, Shai Erera)
* LUCENE-4925: Fixed IndexSearcher.search when the argument list contains a Sort
and one of the sort fields is the relevance score. Only IndexSearchers created
with an ExecutorService are concerned. (Adrien Grand)
* LUCENE-4738, LUCENE-2727, LUCENE-2812: Simplified
DirectoryReader.indexExists so that it's more robust to transient
IOExceptions (e.g. due to issues like file descriptor exhaustion),
but this will also cause it to err towards returning true for
example if the directory contains a corrupted index or an incomplete
initial commit. In addition, IndexWriter with OpenMode.CREATE will
now succeed even if the directory contains a corrupted index (Billow
Gao, Robert Muir, Mike McCandless)
* LUCENE-4928: Stored fields and term vectors could become super slow in case
of tiny documents (a few bytes). This is especially problematic when switching
codecs since bulk-merge strategies can't be applied and the same chunk of
documents can end up being decompressed thousands of times. A hard limit on
the number of documents per chunk has been added to fix this issue.
(Robert Muir, Adrien Grand)
* LUCENE-4934: Fix minor equals/hashcode problems in facet/DrillDownQuery,
BoostingQuery, MoreLikeThisQuery, FuzzyLikeThisQuery, and block join queries.
(Robert Muir, Uwe Schindler)
* LUCENE-4504: Fix broken sort comparator in ValueSource.getSortField,
used when sorting by a function query. (Tom Shally via Robert Muir)
* LUCENE-4937: Fix incorrect sorting of float/double values (+/-0, NaN).
(Robert Muir, Uwe Schindler)
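As a rough illustration of why +/-0 and NaN are easy to get wrong when sorting (a sketch only, not the comparator code changed by LUCENE-4937): the primitive relational operators treat -0.0f and 0.0f as equal and never order NaN, whereas Float.compare defines a total order. The class and method names below are hypothetical.

```java
import java.util.Arrays;

class FloatOrderSketch {
    // Comparison built from relational operators: NaN "ties" with every value
    // and -0.0f ties with 0.0f, so this is not a consistent total order.
    public static int naiveCompare(float a, float b) {
        return a < b ? -1 : (a > b ? 1 : 0);
    }

    // Float.compare orders -0.0f before 0.0f and places NaN after every other value.
    public static int totalOrderCompare(float a, float b) {
        return Float.compare(a, b);
    }

    // Returns a sorted copy using the total order.
    public static Float[] sortedWithTotalOrder(Float... values) {
        Float[] copy = values.clone();
        Arrays.sort(copy, FloatOrderSketch::totalOrderCompare);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(naiveCompare(Float.NaN, 1f));
        System.out.println(totalOrderCompare(Float.NaN, 1f));
        System.out.println(Arrays.toString(sortedWithTotalOrder(0.0f, Float.NaN, -0.0f, -1f)));
    }
}
```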
Documentation
* LUCENE-4841: Added example SimpleSortedSetFacetsExample to show how
to use the new SortedSetDocValues backed facet implementation.
(Shai Erera, Mike McCandless)
Build
* LUCENE-4879: Upgrade randomized testing to version 2.0.9:
Filter stack traces on console output. (Dawid Weiss, Robert Muir)
======================= Lucene 4.2.1 =======================
Bug Fixes
* LUCENE-4713: The SPI components used to load custom codecs or analysis
components were fixed to also scan the Lucene ClassLoader in addition
to the context ClassLoader, so Lucene is always able to find its own
codecs. The special case of a null context ClassLoader is now also
supported. (Christian Kohlschütter, Uwe Schindler)
* LUCENE-4819: seekExact(BytesRef, boolean) did not work correctly with
Sorted[Set]DocValuesTermsEnum. (Robert Muir)
* LUCENE-4826: PostingsHighlighter was not returning the top N best
scoring passages. (Robert Muir, Mike McCandless)
* LUCENE-4854: Fix DocTermOrds.getOrdTermsEnum() to not return negative
ord on initial next(). (Robert Muir)
* LUCENE-4836: Fix SimpleRateLimiter#pause to return the actual time spent
sleeping instead of the wakeup timestamp in nanoseconds. (Simon Willnauer)
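The distinction in LUCENE-4836 between a wakeup timestamp and the time actually spent sleeping can be sketched as follows. This is not the SimpleRateLimiter source; PauseSketch and pauseUntil are hypothetical names, and the deadline is assumed to be a System.nanoTime() value.

```java
class PauseSketch {
    // Sleeps until targetNS (a System.nanoTime() deadline) and returns the
    // nanoseconds spent waiting, rather than the timestamp at wakeup.
    public static long pauseUntil(long targetNS) {
        long startNS = System.nanoTime();
        long curNS = startNS;
        while (true) {
            long remainingNS = targetNS - curNS;
            if (remainingNS <= 0) {
                break;
            }
            try {
                Thread.sleep(remainingNS / 1_000_000, (int) (remainingNS % 1_000_000));
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report the time waited so far.
                Thread.currentThread().interrupt();
                return System.nanoTime() - startNS;
            }
            curNS = System.nanoTime();
        }
        return curNS - startNS;
    }

    public static void main(String[] args) {
        long waitedNS = pauseUntil(System.nanoTime() + 20_000_000L);
        System.out.println("waited " + waitedNS + " ns");
    }
}
```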
* LUCENE-4828: BooleanQuery no longer extracts terms from its MUST_NOT
clauses. (Mike McCandless)
* SOLR-4589: Fixed CPU spikes and poor performance in lazy field loading
of multivalued fields. (hossman)
* LUCENE-4870: Fix bug where an entire index might be deleted by the IndexWriter
due to false detection if an index exists in the directory when
OpenMode.CREATE_OR_APPEND is used. This might also affect applications that set
the open mode manually using DirectoryReader#indexExists. (Simon Willnauer)
* LUCENE-4878: Override getRegexpQuery in MultiFieldQueryParser to prevent
NullPointerException when regular expression syntax is used with
MultiFieldQueryParser. (Simon Willnauer, Adam Rauch)
Optimizations
* LUCENE-4819: Added Sorted[Set]DocValues.termsEnum(), and optimized the
default codec for improved enumeration performance. (Robert Muir)
* LUCENE-4854: Speed up TermsEnum of FieldCache.getDocTermOrds.
(Robert Muir)
* LUCENE-4857: Don't unnecessarily copy stem override map in
StemmerOverrideFilter. (Simon Willnauer)
======================= Lucene 4.2.0 =======================
Changes in backwards compatibility policy
* LUCENE-4602: FacetFields now stores facet ordinals in a DocValues field,
rather than a payload. This requires rebuilding existing indexes or doing a
one-time migration using FacetsPayloadMigratingReader. Since DocValues
support in-memory caching, CategoryListCache was removed too.
(Shai Erera, Michael McCandless)
* LUCENE-4697: FacetResultNode is now a concrete class with public members
(instead of getter methods). (Shai Erera)
* LUCENE-4600: FacetsCollector is now an abstract class with two
implementations: StandardFacetsCollector (the old version of
FacetsCollector) and CountingFacetsCollector. FacetsCollector.create()
returns the most optimized collector for the given parameters.
(Shai Erera, Michael McCandless)
* LUCENE-4700: OrdinalPolicy is now per CategoryListParams, and is no longer
an interface, but rather an enum with values NO_PARENTS and ALL_PARENTS.
PathPolicy was removed; you should extend FacetFields and DrillDownStream
to control which categories are added as drill-down terms. (Shai Erera)
* LUCENE-4547: DocValues improvements:
- Simplified codec API: codecs are now only responsible for encoding and
decoding docvalues, they do not need to do buffering or RAM accounting.
- Per-Field support: added PerFieldDocValuesFormat, which allows you to
use a different DocValuesFormat per field (like postings).
- Unified with FieldCache api: DocValues can be accessed via FieldCache API,
so it works automatically with grouping/join/sort/function queries, etc.
- Simplified types: There are only 3 types (NUMERIC, BINARY, SORTED), so it's
not necessary to specify for example that all of your binary values have
the same length. Instead it's easy for the Codec API to optimize encoding
based on any properties of the content.
(Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)
* LUCENE-4757: Cleanup and refactoring of FacetsAccumulator, FacetRequest,
FacetsAggregator and FacetResultsHandler API. If your application did
FacetsCollector.create(), you should not be affected, but if you wrote
an Aggregator, then you should migrate it to the per-segment
FacetsAggregator. You can still use StandardFacetsAccumulator, which works
with the old API (for now). (Shai Erera)
* LUCENE-4761: Facet packages reorganized. Should be easy to fix your import
statements, if you use an IDE such as Eclipse. (Shai Erera)
* LUCENE-4750: Convert DrillDown to DrillDownQuery, so you can initialize it
and add drill-down categories to it. (Michael McCandless, Shai Erera)
* LUCENE-4759: Remove FacetRequest.SortBy; result categories are always
sorted by value, while ties are broken by category ordinal. (Shai Erera)
* LUCENE-4772: Facet associations moved to new FacetsAggregator API. You
should override FacetsAccumulator and return the relevant aggregator,
for aggregating the association values. (Shai Erera)
* LUCENE-4748: A FacetRequest on a non-existent field now returns an
empty FacetResult instead of skipping it. (Shai Erera, Mike McCandless)
* LUCENE-4806: The default category delimiter character was changed
from U+F749 to U+001F, since the latter uses 1 byte vs 3 bytes for
the former. Existing facet indices must be reindexed. (Robert
Muir, Shai Erera, Mike McCandless)
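The byte counts quoted in LUCENE-4806 follow from UTF-8: code points below U+0080 take one byte, while those from U+0800 to U+FFFF take three. A small check, using only the JDK (the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

class DelimiterSizeSketch {
    // Number of bytes a single char occupies when encoded as UTF-8.
    public static int utf8Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length('\uF749')); // old default delimiter
        System.out.println(utf8Length('\u001F')); // new default delimiter
    }
}
```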
Optimizations
* LUCENE-4687: BloomFilterPostingsFormat now lazily initializes delegate
TermsEnum only if needed to do a seek or get a DocsEnum. (Simon Willnauer)
* LUCENE-4677, LUCENE-4682: unpacked FSTs now use vInt to encode the node target,
to reduce their size (Mike McCandless)
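For readers unfamiliar with vInt, the following sketch shows the general variable-length scheme: seven payload bits per byte, low-order bits first, with the high bit set on every byte except the last. It is an illustration of the encoding only, not the FST code touched by these issues; VIntSketch, encode and decode are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

class VIntSketch {
    // Encodes an int using 1 to 5 bytes; small values need fewer bytes.
    public static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
        return out.toByteArray();
    }

    // Decodes a value produced by encode.
    public static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode(127).length);
        System.out.println(encode(128).length);
        System.out.println(decode(encode(300)));
    }
}
```

Small node targets therefore cost a single byte instead of a fixed four, which is where the size reduction comes from.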
* LUCENE-4678: FST now uses a paged byte[] structure instead of a
single byte[] internally, to avoid large memory spikes during
building (James Dyer, Mike McCandless)
* LUCENE-3298: FST can now be larger than 2.1 GB / 2.1 B nodes.
(James Dyer, Mike McCandless)
* LUCENE-4690: Performance improvements and non-hashing versions
of NumericUtils.*ToPrefixCoded() (yonik)
* LUCENE-4715: CategoryListParams.getOrdinalPolicy can now return a
different OrdinalPolicy per dimension, to better tune how you index
facets. Also added OrdinalPolicy.ALL_BUT_DIMENSION.
(Shai Erera, Michael McCandless)
* LUCENE-4740: Don't track clones of MMapIndexInput if unmapping
is disabled. This reduces GC overhead. (Kristofer Karlsson, Uwe Schindler)
* LUCENE-4733: The default Lucene 4.2 codec now uses a more compact
TermVectorsFormat (Lucene42TermVectorsFormat) based on
CompressingTermVectorsFormat. (Adrien Grand)
* LUCENE-3729: The default Lucene 4.2 codec now uses a more compact
DocValuesFormat (Lucene42DocValuesFormat). Sorted values are stored in an
FST, Numerics and Ordinals use a number of strategies (delta-compression,
table-compression, etc), and memory addresses use MonotonicBlockPackedWriter.
(Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)
* LUCENE-4792: Reduction of the memory required to build the doc ID maps used
when merging segments. (Adrien Grand)
* LUCENE-4794: Spatial RecursivePrefixTreeStrategy's search filter: Skip calls
to termsEnum.seek() when the next term is known to follow the current cell.
(David Smiley)
New Features
* LUCENE-4686: New specialized DGapVInt8IntEncoder for facets (now the
default). (Shai Erera)
* LUCENE-4703: Add simple PrintTaxonomyStats tool to see summary
information about the facets taxonomy index. (Mike McCandless)
* LUCENE-4599: New oal.codecs.compressing.CompressingTermVectorsFormat which
compresses term vectors into chunks of documents similarly to
CompressingStoredFieldsFormat. (Adrien Grand)
* LUCENE-4695: Added LiveFieldValues utility class, for getting the
current (live, real-time) value for any indexed doc/field. The
class buffers recently indexed doc/field values until a new
near-real-time reader is opened that contains those changes.
(Robert Muir, Mike McCandless)
* LUCENE-4723: Add AnalyzerFactoryTask to benchmark, and enable analyzer
creation via the resulting factories using NewAnalyzerTask. (Steve Rowe)
* LUCENE-4728: Unknown and not explicitly mapped queries are now rewritten
against the highlighting IndexReader to obtain primitive queries before
discarding the query entirely. WeightedSpanTermExtractor now builds a
MemoryIndex only once even if multiple fields are highlighted.
(Simon Willnauer)
* LUCENE-4035: Added ICUCollationDocValuesField, more efficient
support for Locale-sensitive sort and range queries for
single-valued fields. (Robert Muir)
* LUCENE-4547: Added MonotonicBlockPacked(Reader/Writer), which provide
efficient random access to large amounts of monotonically increasing
positive values (e.g. file offsets). Each block stores the minimum value
and the average gap, and values are encoded as signed deviations from
the expected value. (Adrien Grand)
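The block layout described for MonotonicBlockPacked(Reader/Writer) can be sketched in a simplified form. This is not the Lucene implementation: it handles a single non-empty block, keeps the deviations in a plain long[] instead of bit-packing them, and all names (MonotonicBlockSketch, expected, get, maxAbsDeviation) are hypothetical.

```java
class MonotonicBlockSketch {
    final long min;           // first (smallest) value of the block
    final double averageGap;  // average difference between consecutive values
    final long[] deviations;  // signed difference between actual and expected value

    // Assumes values is non-empty and monotonically increasing.
    MonotonicBlockSketch(long[] values) {
        int n = values.length;
        min = values[0];
        averageGap = n == 1 ? 0d : (double) (values[n - 1] - values[0]) / (n - 1);
        deviations = new long[n];
        for (int i = 0; i < n; i++) {
            deviations[i] = values[i] - expected(i);
        }
    }

    // Value predicted for index i from the block minimum and the average gap.
    long expected(int i) {
        return min + (long) (averageGap * i);
    }

    // Random access: prediction plus stored deviation.
    long get(int i) {
        return expected(i) + deviations[i];
    }

    // Largest absolute deviation; a real encoder would size its packed ints from this.
    long maxAbsDeviation() {
        long max = 0;
        for (long d : deviations) {
            max = Math.max(max, Math.abs(d));
        }
        return max;
    }

    public static void main(String[] args) {
        long[] offsets = {1000, 1100, 1210, 1290, 1400};
        MonotonicBlockSketch block = new MonotonicBlockSketch(offsets);
        System.out.println(block.get(2) + " with max deviation " + block.maxAbsDeviation());
    }
}
```

Because the deviations are typically much smaller than the values themselves, they can be stored with far fewer bits per value.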
* LUCENE-4547: Added AppendingLongBuffer, an append-only buffer that packs
signed long values in memory and provides an efficient iterator API.
(Adrien Grand)
* LUCENE-4540: It is now possible for a codec to represent norms with
less than 8 bits per value. For performance reasons this is not done
by default, but you can customize your codec (e.g. pass PackedInts.DEFAULT
to Lucene42DocValuesConsumer) if you want to make this tradeoff.
(Adrien Grand, Robert Muir)
* LUCENE-4764: A new Facet42Codec and Facet42DocValuesFormat provide
faster but more RAM-consuming facet performance. (Shai Erera, Mike
McCandless)
* LUCENE-4769: Added OrdinalsCache and CachedOrdsCountingFacetsAggregator
which uses the cache to obtain a document's ordinals. This aggregator
is faster than others, however consumes much more RAM.
(Michael McCandless, Shai Erera)
* LUCENE-4778: Add a getter for the delegate in RateLimitedDirectoryWrapper.
(Mark Miller)
* LUCENE-4765: Add a multi-valued docvalues type (SORTED_SET). This is equivalent
to building a FieldCache.getDocTermOrds at index-time. (Robert Muir)
* LUCENE-4780: Add MonotonicAppendingLongBuffer: an append-only buffer for
monotonically increasing values. (Adrien Grand)
* LUCENE-4748: Added DrillSideways utility class for computing both
drill-down and drill-sideways counts for a DrillDownQuery. (Mike
McCandless)
API Changes
* LUCENE-4709: FacetResultNode no longer has a residue field. (Shai Erera)
* LUCENE-4716: DrillDown.query now takes Occur, allowing you to specify whether
categories should be OR'ed or AND'ed. (Shai Erera)
* LUCENE-4695: ReferenceManager.RefreshListener.afterRefresh now takes
a boolean indicating whether a new reference was in fact opened, and
a new beforeRefresh method notifies you when a refresh attempt is
starting. (Robert Muir, Mike McCandless)
* LUCENE-4794: Spatial RecursivePrefixTreeFilter replaced by
IntersectsPrefixTreeFilter and some extensible base classes. (David Smiley)
Bug Fixes
* LUCENE-4705: Pass on FilterStrategy in FilteredQuery if the filtered query is
rewritten. (Simon Willnauer)
* LUCENE-4712: MemoryIndex#normValues() throws NPE if field doesn't exist.
(Simon Willnauer, Ricky Pritchett)
* LUCENE-4550: Shapes wider than 180 degrees would use too much accuracy for the
PrefixTree based SpatialStrategy. For a pathological case of nearly 360
degrees and barely any height, it would generate so many indexed terms
(> 500k) that it could even cause an OutOfMemoryError. Fixed. (David Smiley)
* LUCENE-4704: Make join queries override hashcode and equals methods.
(Martijn van Groningen)
* LUCENE-4724: Fix bug in CategoryPath which allowed passing null or empty
string components. This is forbidden now (throws an exception). Note that if
you have a taxonomy index created with such strings, you should rebuild it.
(Michael McCandless, Shai Erera)
* LUCENE-4732: Fixed TermsEnum.seekCeil/seekExact on term vectors.
(Adrien Grand, Robert Muir)
* LUCENE-4739: Fixed bugs that prevented FSTs more than ~1.1GB from
being saved and loaded (Adrien Grand, Mike McCandless)
* LUCENE-4717: Fixed bug where Lucene40DocValuesFormat would sometimes write
an extra unused ordinal for sorted types. The bug is detected and corrected
on-the-fly for old indexes. (Robert Muir)
* LUCENE-4547: Fixed bug where Lucene40DocValuesFormat was unable to encode
segments that would exceed 2GB total data. This could happen in some surprising
cases, for example if you had an index with more than 260M documents and a
VAR_INT field. (Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)
* LUCENE-4775: Remove SegmentInfo.sizeInBytes() and make
MergePolicy.OneMerge.totalBytesSize thread safe (Josh Bronson via
Robert Muir, Mike McCandless)
* LUCENE-4770: If spatial's TermQueryPrefixTreeStrategy was used to search
indexed non-point shapes, then there was an edge case where a query should
find a shape but it didn't. The fix is the removal of an optimization that
simplifies some leaf cells into a parent. The index data for such a field is
now ~20% larger. This optimization is still done for the query shape, and for
indexed data for RecursivePrefixTreeStrategy. Furthermore, this optimization
is enhanced to roll up beyond the bottom cell level. (David Smiley,
Florian Schilling)
* LUCENE-4790: Fix FieldCacheImpl.getDocTermOrds to not bake deletes into the
cached datastructure. Otherwise this can cause inconsistencies with readers
at different points in time. (Robert Muir)
* LUCENE-4791: A conjunction of terms (ConjunctionTermScorer) scanned on
the lowest frequency term instead of skipping, leading to potentially
large performance impacts for many non-random or non-uniform
term distributions. (John Wang, yonik)
* LUCENE-4798: PostingsHighlighter's formatter sometimes didn't highlight
matched terms. (Robert Muir)
* LUCENE-4796, SOLR-4373: Fix concurrency issue in NamedSPILoader and
AnalysisSPILoader when doing reload (e.g. from Solr).
(Uwe Schindler, Hossman)
* LUCENE-4802: Don't compute norms for drill-down facet fields. (Mike McCandless)
* LUCENE-4804: PostingsHighlighter sometimes applied terms to the wrong passage,
if they started exactly on a passage boundary. (Robert Muir)
Documentation
* LUCENE-4718: Fixed documentation of oal.queryparser.classic.
(Hayden Muhl via Adrien Grand)
* LUCENE-4784, LUCENE-4785, LUCENE-4786: Fixed references to deprecated classes
SinkTokenizer, ValueSourceQuery and RangeQuery. (Hao Zhong via Adrien Grand)
Build
* LUCENE-4654: Test duration statistics from multiple test runs should be
reused. (Dawid Weiss)
* LUCENE-4636: Upgrade ivy to 2.3.0 (Shawn Heisey via Robert Muir)
* LUCENE-4570: Use the Policeman Forbidden API checker, released separately
from Lucene and downloaded via Ivy. (Uwe Schindler, Robert Muir)
* LUCENE-4758: 'ant jar', 'ant compile', and 'ant compile-test' should
recurse. (Steve Rowe)
======================= Lucene 4.1.0 =======================
Changes in backwards compatibility policy
* LUCENE-4514: Scorer's freq() method returns an integer value indicating
the number of times the scorer matches the current document. Previously
this was only sometimes the case, in some cases it returned a (meaningless)
floating point value. Scorer now extends DocsEnum so it has attributes().
(Robert Muir)
* LUCENE-4543: TFIDFSimilarity's index-time computeNorm is now final to
match the fact that its query-time norm usage requires a FIXED_8 encoding.
Override lengthNorm and/or encode/decodeNormValue to change the specifics,
like Lucene 3.x. (Robert Muir)
* LUCENE-3441: The facet module now supports NRT. As a result, the following
changes were made:
- DirectoryTaxonomyReader has a new constructor which takes a
DirectoryTaxonomyWriter. You should use that constructor in order to get
the NRT support (or the old one for non-NRT).
- TaxonomyReader.refresh() removed in exchange for TaxonomyReader.openIfChanged
static method. Similar to DirectoryReader, the method either returns null
if no changes were made to the taxonomy, or a new TR instance otherwise.
Instead of calling refresh(), you should write similar code to how you reopen
a regular DirectoryReader.
- TaxonomyReader.openIfChanged (previously refresh()) no longer throws
InconsistentTaxonomyException, and supports recreate. InconsistentTaxoEx
was removed.
- ChildrenArrays was pulled out of TaxonomyReader into a top-level class.
- TaxonomyReader was made an abstract class (instead of an interface), with
methods such as close() and reference counting management pulled from
DirectoryTaxonomyReader, and made final. The rest of the methods remained
abstract.
(Shai Erera, Gilad Barkai)
* LUCENE-4576: Remove CachingWrapperFilter(Filter, boolean). This recacheDeletes
option gave less than 1% speedup at the expense of cache churn (filters were
invalidated on reopen if even a single delete was posted against the segment).
(Robert Muir)
* LUCENE-4575: Replace IndexWriter's commit/prepareCommit versions that take
commitData with setCommitData(). That allows committing changes to IndexWriter
even if the commitData is the only thing that changes.
(Shai Erera, Michael McCandless)
* LUCENE-4565: TaxonomyReader.getParentArray and .getChildrenArrays consolidated
into one getParallelTaxonomyArrays(). You can obtain the 3 arrays that the
previous two methods returned by calling parents(), children() or siblings()
on the returned ParallelTaxonomyArrays. (Shai Erera)
* LUCENE-4585: Spatial PrefixTree based Strategies (either TermQuery or
RecursivePrefix based) MAY want to re-index if used for point data. If a
re-index is not done, then an indexed point is ~1/2 the smallest grid cell
larger and as such is slightly more likely to match a query shape.
(David Smiley)
* LUCENE-4604: DefaultOrdinalPolicy removed in favor of OrdinalPolicy.ALL_PARENTS.
Same for DefaultPathPolicy (now PathPolicy.ALL_CATEGORIES). In addition, you
can use OrdinalPolicy.NO_PARENTS to never write any parent category ordinal
to the fulltree posting payload (but note that you need a special
FacetsAccumulator - see javadocs). (Shai Erera)
* LUCENE-4594: Spatial PrefixTreeStrategy no longer indexes center points of
non-point shapes. If you want to call makeDistanceValueSource() based on
shape centers, you need to do this yourself in another spatial field.
(David Smiley)
* LUCENE-4615: Replace IntArrayAllocator and FloatArrayAllocator by ArraysPool.
FacetArrays no longer takes those allocators; if you need to reuse the arrays,
you should use ReusingFacetArrays. (Shai Erera, Gilad Barkai)
* LUCENE-4621: FacetIndexingParams is now a concrete class (instead of DefaultFIP).
Also, the entire IndexingParams chain is now immutable. If you need to override
a setting, you should extend the relevant class.
Additionally, FacetSearchParams is now immutable, and requires all FacetRequests
to be specified at initialization time. (Shai Erera)
* LUCENE-4647: CategoryDocumentBuilder and EnhancementsDocumentBuilder are replaced
by FacetFields and AssociationsFacetFields respectively. CategoryEnhancement and
AssociationEnhancement were removed in favor of a simplified CategoryAssociation
interface, with CategoryIntAssociation and CategoryFloatAssociation
implementations.
NOTE: indexes that contain category enhancements/associations are not supported
by the new code and should be recreated. (Shai Erera)
* LUCENE-4659: Massive cleanup to CategoryPath API. Additionally, CategoryPath is
now immutable, so you don't need to clone() it. (Shai Erera)
* LUCENE-4670: StoredFieldsWriter and TermVectorsWriter have new finish* callbacks
which are called after a doc/field/term has been completely added.
(Adrien Grand, Robert Muir)
* LUCENE-4620: IntEncoder/Decoder were changed to do bulk encoding/decoding. As a
result, a few other classes such as Aggregator and CategoryListIterator were
changed to handle bulk category ordinals. (Shai Erera)
* LUCENE-4683: CategoryListIterator and Aggregator are now per-segment. As such
their implementations no longer take a top-level IndexReader in the constructor
but rather implement a setNextReader. (Shai Erera)
New Features
* LUCENE-4226: New experimental StoredFieldsFormat that compresses chunks of
documents together in order to improve the compression ratio. (Adrien Grand)
* LUCENE-4426: New ValueSource implementations (in lucene/queries) for
DocValues fields. (Adrien Grand)
* LUCENE-4410: FilteredQuery now exposes a FilterStrategy that exposes
how filters are applied during query execution. (Simon Willnauer)
* LUCENE-4404: New ListOfOutputs (in lucene/misc) for FSTs wraps
another Outputs implementation, allowing you to store more than one
output for a single input. UpToTwoPositiveIntsOutputs was moved
from lucene/core to lucene/misc. (Mike McCandless)
* LUCENE-3842: New AnalyzingSuggester, for doing auto-suggest
using an analyzer. This can create powerful suggesters: if the analyzer
removes stop words then "ghost chr..." could suggest "The Ghost of
Christmas Past"; if SynonymFilter is used to map wifi and wireless
network to hotspot, then "wirele..." could suggest "wifi router";
token normalization like stemmers, accent removal, etc. would allow
the suggester to ignore such variations. (Robert Muir, Sudarshan
Gaikaiwari, Mike McCandless)
* LUCENE-4446: Lucene 4.1 has a new default index format (Lucene41Codec)
that incorporates the previously experimental "Block" postings format
for better search performance.
(Han Jiang, Adrien Grand, Robert Muir, Mike McCandless)
* LUCENE-3846: New FuzzySuggester, like AnalyzingSuggester except it
also finds completions allowing for fuzzy edits in the input string.
(Robert Muir, Simon Willnauer, Mike McCandless)
* LUCENE-4515: MemoryIndex now supports adding the same field multiple
times. (Simon Willnauer)
* LUCENE-4489: Added consumeAllTokens option to LimitTokenCountFilter
(hossman, Robert Muir)
* LUCENE-4566: Add NRT/SearcherManager.RefreshListener/addListener to
be notified whenever a new searcher was opened. (selckin via Shai
Erera, Mike McCandless)
* SOLR-4123: Add per-script customizability to ICUTokenizerFactory via
rule files in the ICU RuleBasedBreakIterator format.
(Shawn Heisey, Robert Muir, Steve Rowe)
* LUCENE-4590: Added WriteEnwikiLineDocTask - a benchmark task for writing
Wikipedia category pages and non-category pages into separate line files.
extractWikipedia.alg was changed to use this task, so now it creates two
files. (Doron Cohen)
* LUCENE-4290: Added PostingsHighlighter to the highlighter module. It uses
offsets from the postings lists to highlight documents. (Robert Muir)
* LUCENE-4628: Added CommonTermsQuery that executes high-frequency terms
in an optional sub-query to prevent slow queries due to "common" terms
like stopwords. (Simon Willnauer)
API Changes
* LUCENE-4399: Deprecated AppendingCodec. Lucene's term dictionaries
no longer seek when writing. (Adrien Grand, Robert Muir)
* LUCENE-4479: Rename TokenStream.getTokenStream(IndexReader, int, String)
to TokenStream.getTokenStreamWithOffsets, and return null on failure
rather than throwing IllegalArgumentException. (Alan Woodward)
* LUCENE-4472: MergePolicy now accepts a MergeTrigger that provides
information about what triggered the merge, e.g. a segment merge or
a full flush. (Simon Willnauer)
* LUCENE-4415: TermsFilter is now immutable. All terms need to be provided
as constructor argument. (Simon Willnauer)
* LUCENE-4520: ValueSource.getSortField no longer throws IOExceptions
(Alan Woodward)
* LUCENE-4537: RateLimiter is now separated from FSDirectory and exposed via
RateLimitedDirectoryWrapper. Any Directory can now be rate-limited.
(Simon Willnauer)
* LUCENE-4591: CompressingStoredFields{Writer,Reader} now accept a segment
suffix as a constructor parameter. (Renaud Delbru via Adrien Grand)
* LUCENE-4605: Added DocsEnum.FLAG_NONE which can be passed instead of 0 as
the flag to .docs() and .docsAndPositions(). (Shai Erera)
* LUCENE-4617: Remove FST.pack() method. Previously to make a packed FST,
you had to make a Builder with willPackFST=true (telling it you will later pack it),
create your fst with finish(), and then call pack() to get another FST.
Instead just pass true for doPackFST to Builder and finish() returns a packed FST.
(Robert Muir)
* LUCENE-4663: Deprecate IndexSearcher.document(int, Set). This was not intended
to be final, nor named document(). Use IndexSearcher.doc(int, Set) instead.
(Robert Muir)
* LUCENE-4684: Made DirectSpellChecker extendable.
(Martijn van Groningen)
Bug Fixes
* LUCENE-1822: BaseFragListBuilder hard-coded 6 char margin is too naive.
(Alex Vigdor, Arcadius Ahouansou, Koji Sekiguchi)
* LUCENE-4468: Fix rareish integer overflows in Lucene41 postings
format. (Robert Muir)
* LUCENE-4486: Add support for ConstantScoreQuery in Highlighter.
(Simon Willnauer)
* LUCENE-4485: When CheckIndex reports counts of terms, terms/docs pairs and
tokens, these counts now all exclude deleted documents. (Mike McCandless)
* LUCENE-4479: Highlighter works correctly for fields with term vector
positions, but no offsets. (Alan Woodward)
* SOLR-3906: JapaneseReadingFormFilter in romaji mode will return
romaji even for out-of-vocabulary kana cases (e.g. half-width forms).
(Robert Muir)
* LUCENE-4511: TermsFilter might return wrong results if a field is not
indexed or doesn't exist in the index. (Simon Willnauer)
* LUCENE-4521: IndexWriter.tryDeleteDocument could return true
(successfully deleting the document) but then on IndexWriter
close/commit fail to write the new deletions, if no other changes
happened in the IndexWriter instance. (Ivan Vasilev via Mike
McCandless)
* LUCENE-4513: Fixed that deleted nested docs are scored into the
parent doc when using ToParentBlockJoinQuery. (Martijn van Groningen)
* LUCENE-4534: Fixed WFSTCompletionLookup and Analyzing/FuzzySuggester
to allow 0 byte values in the lookup keys. (Mike McCandless)
* LUCENE-4532: DirectoryTaxonomyWriter used a timestamp to denote taxonomy
index re-creation, which could cause a bug in case machine clocks were
not synced. Instead, it now tracks an 'epoch' version, which is incremented
whenever the taxonomy is re-created, or replaced. (Shai Erera)
* LUCENE-4544: Fixed off-by-1 in ConcurrentMergeScheduler that would
allow 1+maxMergeCount merge threads to be created, instead of just
maxMergeCount (Radim Kolar, Mike McCandless)
* LUCENE-4567: Fixed NullPointerException in analyzing, fuzzy, and
WFST suggesters when no suggestions were added (selckin via Mike
McCandless)
* LUCENE-4568: Fixed integer overflow in
PagedBytes.PagedBytesData{In,Out}put.getPosition. (Adrien Grand)
* LUCENE-4581: GroupingSearch.setAllGroups(true) was failing to
actually compute allMatchingGroups (dizh@neusoft.com via Mike
McCandless)
* LUCENE-4009: Improve TermsFilter.toString (Tim Costermans via Chris
Male, Mike McCandless)
* LUCENE-4588: Benchmark's EnwikiContentSource was discarding last wiki
document and had leaking threads in 'forever' mode. (Doron Cohen)
* LUCENE-4585: Spatial RecursivePrefixTreeFilter had some bugs that only
occurred when shapes were indexed. In what appears to be rare circumstances,
documents with shapes near a query shape were erroneously considered a match.
In addition, it wasn't possible to index a shape representing the entire
globe.
* LUCENE-4595: EnwikiContentSource had a thread safety problem (NPE) in
'forever' mode (Doron Cohen)
* LUCENE-4587: fix WordBreakSpellChecker to not throw AIOOBE when presented
with 2-char codepoints, and to correctly break/combine terms containing
non-latin characters. (James Dyer, Andreas Hubold)
* LUCENE-4596: fix a concurrency bug in DirectoryTaxonomyWriter.
(Shai Erera)
* LUCENE-4594: Spatial PrefixTreeStrategy would index center-points in addition
to the shape to index if it was non-point, in the same field. But sometimes
the center-point isn't actually in the shape (consider a LineString), and for
highly precise shapes it could cause makeDistanceValueSource's cache to load
parts of the shape's boundary erroneously too. So center points aren't
indexed any more; you should use another spatial field. (David Smiley)
* LUCENE-4629: IndexWriter failed to delete documents if a document block was
indexed and the Iterator threw an exception. Documents were only rolled back
if the actual indexing process failed. (Simon Willnauer)
* LUCENE-4608: Handle large number of requested fragments better.
(Martijn van Groningen)
* LUCENE-4633: DirectoryTaxonomyWriter.replaceTaxonomy did not refresh its
internal reader, which could cause an existing category to be added twice.
(Shai Erera)
* LUCENE-4461: If you added the same FacetRequest more than once, you would get
inconsistent results. (Gilad Barkai via Shai Erera)
* LUCENE-4656: Fix regression in IndexWriter to work with empty TokenStreams
that have no TermToBytesRefAttribute (commonly provided by CharTermAttribute),
e.g., oal.analysis.miscellaneous.EmptyTokenStream.
(Uwe Schindler, Adrien Grand, Robert Muir)
* LUCENE-4660: ConcurrentMergeScheduler was taking too long to
un-pause incoming threads it had paused when too many merges were
queued up. (Mike McCandless)
* LUCENE-4662: Add missing elided articles and prepositions to FrenchAnalyzer's
DEFAULT_ARTICLES list passed to ElisionFilter. (David Leunen via Steve Rowe)
* LUCENE-4671: Fix CharsRef.subSequence method. (Tim Smith via Robert Muir)
* LUCENE-4465: Let ConstantScoreQuery's Scorer return its child scorer.
(selckin via Uwe Schindler)
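Note on the integer-overflow fixes listed above (for example LUCENE-4568):
the sketch below is an illustration only, not the PagedBytes source. The
class, method, and parameter names are hypothetical. It assumes a position
is derived from a block index and a block size, and shows the general
failure mode: multiplying two ints overflows before the result is widened
to long, while casting one operand to long first avoids it.

```java
// Hypothetical demo class; not part of Lucene.
final class PositionOverflowSketch {

  // Buggy variant: the multiplication and addition happen in 32-bit int
  // arithmetic, and only the (possibly wrapped) result is widened to long.
  static long positionOverflowing(int blockIndex, int blockSize, int offset) {
    return blockIndex * blockSize + offset;
  }

  // Fixed variant: casting one operand to long forces 64-bit arithmetic.
  static long positionFixed(int blockIndex, int blockSize, int offset) {
    return (long) blockIndex * blockSize + offset;
  }

  public static void main(String[] args) {
    int blockIndex = 70000;   // illustrative values chosen to exceed 2^31 - 1
    int blockSize = 1 << 15;
    int offset = 5;
    System.out.println("int arithmetic:  " + positionOverflowing(blockIndex, blockSize, offset));
    System.out.println("long arithmetic: " + positionFixed(blockIndex, blockSize, offset));
  }
}
```

With the values in main, the int variant wraps around to a negative number
while the long variant yields the intended position.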
Changes in Runtime Behavior
* LUCENE-4586: Change default ResultMode of FacetRequest to PER_NODE_IN_TREE.
This only affects requests with depth>1. If you execute such requests and
rely on the facet results being returned flat (i.e. no hierarchy), you should
set the ResultMode to GLOBAL_FLAT. (Shai Erera, Gilad Barkai)
* LUCENE-1822: Improves the text window selection by recalculating the starting margin
once all phrases in the fragment have been identified in FastVectorHighlighter. This
way if a single word is matched in a fragment, it will appear in the middle of the highlight,
instead of 6 characters from the beginning. This way one can also guarantee that
the entirety of short texts is represented in a fragment by specifying a large
enough fragCharSize.
Optimizations
* LUCENE-2221: oal.util.BitUtil was modified to use Long.bitCount and
Long.numberOfTrailingZeros (which are intrinsics since Java 6u18) instead of
pure java bit twiddling routines in order to improve performance on modern
JVMs/hardware. (Dawid Weiss, Adrien Grand)
* LUCENE-4509: Enable stored fields compression by default in the Lucene 4.1
default codec. (Adrien Grand)
* LUCENE-4536: PackedInts on-disk format is now byte-aligned (it used to be
long-aligned), saving up to 7 bytes per array of values.
(Adrien Grand, Mike McCandless)
* LUCENE-4512: Additional memory savings for CompressingStoredFieldsFormat.
(Adrien Grand, Robert Muir)
* LUCENE-4443: Lucene41PostingsFormat no longer writes unnecessary offsets
into the skipdata. (Robert Muir)
* LUCENE-4459: Improve WeakIdentityMap.keyIterator() to remove GCed keys
from backing map early instead of waiting for reap(). This makes test
failures in TestWeakIdentityMap disappear, too.
(Uwe Schindler, Mike McCandless, Robert Muir)
* LUCENE-4473: Lucene41PostingsFormat encodes offsets more efficiently
for low frequency terms (< 128 occurrences). (Robert Muir)
* LUCENE-4462: DocumentsWriter now flushes deletes, segment infos and builds
CFS files if necessary during segment flush and not during publishing. The latter
was a single threaded process while now all IO and CPU heavy computation is done
concurrently in DocumentsWriterPerThread. (Simon Willnauer)
* LUCENE-4496: Optimize Lucene41PostingsFormat when requesting a subset of
the postings data (via flags to TermsEnum.docs/docsAndPositions) to use
ForUtil.skipBlock. (Robert Muir)
* LUCENE-4497: Don't write PosVIntCount to the positions file in
Lucene41PostingsFormat, as it's always totalTermFreq % BLOCK_SIZE. (Robert Muir)
* LUCENE-4498: In Lucene41PostingsFormat, when a term appears in only one document,
instead of writing a file pointer to a VIntBlock containing the doc id, just
write the doc id. (Mike McCandless, Robert Muir)
* LUCENE-4515: MemoryIndex now uses Byte/IntBlockPool internally to hold terms and
posting lists. All index data is represented as consecutive byte/int arrays to
reduce GC cost and memory overhead. (Simon Willnauer)
* LUCENE-4538: DocValues now caches direct sources in a ThreadLocal exposed via SourceCache.
Users of this API can now simply obtain an instance via DocValues#getDirectSource per thread.
(Simon Willnauer)
* LUCENE-4580: DrillDown.query variants return a ConstantScoreQuery with boost set to 0.0f
so that document scores are not affected by running a drill-down query. (Shai Erera)
* LUCENE-4598: PayloadIterator no longer uses top-level IndexReader to iterate on the
posting's payload. (Shai Erera, Michael McCandless)
* LUCENE-4661: Drop default maxThreadCount to 1 and maxMergeCount to 2
in ConcurrentMergeScheduler, for faster merge performance on
spinning-magnet drives (Mike McCandless)
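Note on the arithmetic behind three of the entries above (LUCENE-2221,
LUCENE-4536, LUCENE-4497): the sketch below is an illustration, not Lucene
source. The class and method names are hypothetical, and BLOCK_SIZE = 128
is an assumption based on the block size quoted for LUCENE-3892. It compares
a hand-written population count with Long.bitCount, computes the padding
difference between long-aligned and byte-aligned packed data (ignoring any
header), and shows why a tail count equal to totalTermFreq % BLOCK_SIZE
does not need to be stored.

```java
// Hypothetical demo class; none of this is Lucene source code.
final class OptimizationSketches {

  // Assumed block size (see the lead-in); not read from Lucene.
  static final int BLOCK_SIZE = 128;

  // LUCENE-2221: a classic bit-twiddling population count, shown only for
  // comparison with the JDK method Long.bitCount.
  static int naivePopCount(long x) {
    x = x - ((x >>> 1) & 0x5555555555555555L);
    x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L);
    x = (x + (x >>> 4)) & 0x0F0F0F0F0F0F0F0FL;
    return (int) ((x * 0x0101010101010101L) >>> 56);
  }

  // LUCENE-4536: bytes needed when the packed bits are padded to whole longs.
  static long longAlignedBytes(long valueCount, int bitsPerValue) {
    long bits = valueCount * bitsPerValue;
    return ((bits + 63) / 64) * 8;
  }

  // LUCENE-4536: bytes needed when the packed bits are padded to whole bytes.
  static long byteAlignedBytes(long valueCount, int bitsPerValue) {
    long bits = valueCount * bitsPerValue;
    return (bits + 7) / 8;
  }

  // LUCENE-4497: positions left over after all full blocks; derivable from
  // totalTermFreq alone, so it need not be written separately.
  static int tailCount(long totalTermFreq) {
    return (int) (totalTermFreq % BLOCK_SIZE);
  }

  public static void main(String[] args) {
    long sample = 0xF0F0L;
    System.out.println("popcount agrees: " + (naivePopCount(sample) == Long.bitCount(sample)));
    System.out.println("bytes saved for one 1-bit value: "
        + (longAlignedBytes(1, 1) - byteAlignedBytes(1, 1)));
    System.out.println("tail count for totalTermFreq=300: " + tailCount(300));
  }
}
```

The saving from byte alignment is largest when only a few bits spill into
the last long, which is where the "up to 7 bytes" figure comes from.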
Documentation
* LUCENE-4483: Refer to BytesRef.deepCopyOf in Term's constructor that takes BytesRef.
(Paul Elschot via Robert Muir)
Build
* LUCENE-4650: Upgrade randomized testing to version 2.0.8: make the
test framework more robust under low memory conditions. (Dawid Weiss)
* LUCENE-4603: Upgrade randomized testing to version 2.0.5: print forked
JVM PIDs on heartbeat from hung tests (Dawid Weiss)
* Upgrade randomized testing to version 2.0.4: avoid hangs on shutdown
hooks hanging forever by calling Runtime.halt() in addition to
Runtime.exit() after a short delay to allow graceful shutdown (Dawid Weiss)
* LUCENE-4451: Memory leak per unique thread caused by
RandomizedContext.contexts static map. Upgrade randomized testing
to version 2.0.2 (Mike McCandless, Dawid Weiss)
* LUCENE-4589: Upgraded benchmark module's Nekohtml dependency to version
1.9.17, removing the workaround in Lucene's HTML parser for the
Turkish locale. (Uwe Schindler)
* LUCENE-4601: Fix ivy availability check to use typefound, so it works
if called from another build file. (Ryan Ernst via Robert Muir)
======================= Lucene 4.0.0 =======================
Changes in backwards compatibility policy
* LUCENE-4392: Class org.apache.lucene.util.SortedVIntList has been removed.
(Adrien Grand)
* LUCENE-4393: RollingCharBuffer has been moved to the o.a.l.analysis.util
package of lucene-analysis-common. (Adrien Grand)
New Features
* LUCENE-1888: Added the option to store payloads in the term
vectors (IndexableFieldType.storeTermVectorPayloads()). Note
that you must store term vector positions to store payloads.
(Robert Muir)
* LUCENE-3892: Add a new BlockPostingsFormat that bulk-encodes docs,
freqs and positions in large (size 128) packed-int blocks for faster
search performance. This was from Han Jiang's 2012 Google Summer of
Code project (Han Jiang, Adrien Grand, Robert Muir, Mike McCandless)
* LUCENE-4323: Added support for an absolute maximum CFS segment size
(in MiB) to LogMergePolicy and TieredMergePolicy.
(Alexey Lef via Uwe Schindler)
* LUCENE-4339: Allow deletes against 3.x segments for easier upgrading.
Lucene3x Codec is still otherwise read-only; you should not set it
as the default Codec on IndexWriter, because it cannot write new segments.
(Mike McCandless, Robert Muir)
* SOLR-3441: ElisionFilterFactory is now MultiTermAware
(Jack Krupansky via hossman)
API Changes
* LUCENE-4391, LUCENE-4440: All methods of Lucene40Codec but
getPostingsFormatForField are now final. To reuse functionality
of Lucene40, you should extend FilterCodec and delegate to Lucene40
instead of extending Lucene40Codec. (Adrien Grand, Shai Erera,
Robert Muir, Uwe Schindler)
* LUCENE-4299: Added Terms.hasPositions() and Terms.hasOffsets().
Previously you had no real way to know that a term vector field
had positions or offsets, since this can be configured on a
per-field-per-document basis. (Robert Muir)
* Removed DocsAndPositionsEnum.hasPayload() and simplified the
contract of getPayload(). It returns null if there is no payload,
otherwise returns the current payload. You can now call it multiple
times per position if you want. (Robert Muir)
* Removed FieldsEnum. Fields API instead implements Iterable<String>
and exposes Iterator, so you can iterate over field names with
for (String field : fields) instead. (Robert Muir)
* LUCENE-4152: added IndexReader.leaves(), which lets you enumerate
the leaf atomic reader contexts for all readers in the tree.
(Uwe Schindler, Robert Muir)
* LUCENE-4304: removed PayloadProcessorProvider. If you want to change
payloads (or other things) when merging indexes, it's recommended
to just use a FilterAtomicReader + IndexWriter.addIndexes. See the
OrdinalMappingAtomicReader and TaxonomyMergeUtils in the facets
module if you want an example of this.
(Mike McCandless, Uwe Schindler, Shai Erera, Robert Muir)
* LUCENE-4304: Make CompositeReader.getSequentialSubReaders()
protected. To get atomic leaves of any IndexReader use the new method
leaves() (LUCENE-4152), which lists AtomicReaderContexts including
the doc base of each leaf. (Uwe Schindler, Robert Muir)
* LUCENE-4307: Renamed IndexReader.getTopReaderContext to
IndexReader.getContext. (Robert Muir)
* LUCENE-4316: Deprecate Fields.getUniqueTermCount and remove it from
AtomicReader. If you really want the unique term count across all
fields, just sum up Terms.size() across those fields. This method
only exists so that this statistic can be accessed for Lucene 3.x
segments, which don't support Terms.size(). (Uwe Schindler, Robert Muir)
* LUCENE-4321: Change CharFilter to extend Reader directly, as FilterReader
overdelegates (read(), read(char[], int, int), skip, etc). This made it
hard to implement CharFilters that were correct. Instead only close() is
delegated by default: read(char[], int, int) and correct(int) are abstract
so that it's obvious which methods you should implement. The protected
inner Reader is 'input' like CharFilter in the 3.x series, instead of 'in'.
(Dawid Weiss, Uwe Schindler, Robert Muir)
* LUCENE-3309: The expert FieldSelector API, used to load only certain
fields in a stored document, has been replaced with the simpler
StoredFieldVisitor API. (Mike McCandless)
* LUCENE-4343: Made Tokenizer.setReader final. This is a setter that should
not be overridden by subclasses: per-stream initialization should happen
in reset(). (Robert Muir)
* LUCENE-4377: Remove IndexInput.copyBytes(IndexOutput, long).
Use DataOutput.copyBytes(DataInput, long) instead.
(Mike McCandless, Robert Muir)
* LUCENE-4355: Simplify AtomicReader's sugar methods such as termDocsEnum,
termPositionsEnum, docFreq, and totalTermFreq to only take Term as a
parameter. If you want to do expert things such as pass a different
Bits as liveDocs, then use the flex apis (fields(), terms(), etc) directly.
(Mike McCandless, Robert Muir)
* LUCENE-4425: clarify documentation of StoredFieldVisitor.binaryValue
and simplify the api to binaryField(FieldInfo, byte[]).
(Adrien Grand, Robert Muir)
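Note on the FieldsEnum removal above: the entry says field names can now be
iterated with for (String field : fields). The class below is a hypothetical
stand-in, not Lucene's Fields class; it only sketches the Iterable<String>
pattern that makes such a loop possible.

```java
// Hypothetical stand-in for a container of field names; not Lucene's Fields.
final class FieldNamesSketch implements Iterable<String> {

  private final String[] names;

  FieldNamesSketch(String... names) {
    this.names = names.clone(); // defensive copy of the caller's array
  }

  // Implementing Iterable<String> is what enables the enhanced for loop.
  @Override
  public java.util.Iterator<String> iterator() {
    return java.util.Arrays.asList(names).iterator();
  }

  // Joins the field names with commas, using the enhanced for loop.
  static String join(Iterable<String> fields) {
    StringBuilder sb = new StringBuilder();
    for (String field : fields) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(field);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(join(new FieldNamesSketch("title", "body")));
  }
}
```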
Bug Fixes
* LUCENE-4423: DocumentStoredFieldVisitor.binaryField ignored offset and
length. (Adrien Grand)
* LUCENE-4297: BooleanScorer2 would multiply the coord() factor
twice for conjunctions: for most users this is no problem, but
if you had a customized Similarity that returned something other
than 1 when overlap == maxOverlap (always the case for conjunctions),
then the score would be incorrect. (Pascal Chollet, Robert Muir)
* LUCENE-4298: MultiFields.getTermDocsEnum(IndexReader, Bits, String, BytesRef)
did not work at all, it would infinitely recurse.
(Alberto Paro via Robert Muir)
* LUCENE-4300: BooleanQuery's rewrite was not always safe: if you
had a custom Similarity where coord(1,1) != 1F, then the rewritten
query would be scored differently. (Robert Muir)
* Don't allow negatives in the positions file. If you have an index
from 2.4.0 or earlier with such negative positions, and you already
upgraded to 3.x, then to Lucene 4.0-ALPHA or -BETA, you should run
CheckIndex. If it fails, then you need to upgrade again to 4.0 (Robert Muir)
* LUCENE-4303: PhoneticFilterFactory and SnowballPorterFilterFactory load their
encoders / stemmers via the ResourceLoader now instead of Class.forName().
Solr users should no longer have to embed these in their war. (David Smiley)
* SOLR-3737: StempelPolishStemFilterFactory loaded its stemmer table incorrectly.
Also, ensure immutability and use only one instance of this table in RAM (lazy
loaded) since it's quite large. (sausarkar, Steven Rowe, Robert Muir)
* LUCENE-4310: MappingCharFilter was failing to match input strings
containing non-BMP Unicode characters. (Dawid Weiss, Robert Muir,
Mike McCandless)
* LUCENE-4224: Add in-order scorer to query time joining and the
out-of-order scorer throws an UOE. (Martijn van Groningen, Robert Muir)
* LUCENE-4333: Fixed NPE in TermGroupFacetCollector when faceting on mv fields.
(Jesse MacVicar, Martijn van Groningen)
* LUCENE-4218: Document.get(String) and Field.stringValue() again return
values for numeric fields, like Lucene 3.x and consistent with the documentation.
(Jamie, Uwe Schindler, Robert Muir)
* NRTCachingDirectory was always caching a newly flushed segment in
RAM, instead of checking the estimated size of the segment
to decide whether to cache it. (Mike McCandless)
* LUCENE-3720: fix memory-consumption issues with BeiderMorseFilter.
(Thomas Neidhart via Robert Muir)
* LUCENE-4401: Fix bug where DisjunctionSumScorer would sometimes call score()
on a subscorer that had already returned NO_MORE_DOCS. (Liu Chao, Robert Muir)
* LUCENE-4411: when sampling is enabled for a FacetRequest, its depth
parameter is reset to the default (1), even if set otherwise.
(Gilad Barkai via Shai Erera)
* LUCENE-4455: Fix bug in SegmentInfoPerCommit.sizeInBytes() that was
returning 2X the true size, inefficiently. Also fixed bug in
CheckIndex that would report no deletions when a segment has
deletions, and vice versa. (Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-4456: Fixed double-counting sizeInBytes for a segment
(affects how merge policies pick merges); fixed CheckIndex's
incorrect reporting of whether a segment has deletions; fixed case
where on abort Lucene could remove files it didn't create; fixed
many cases where IndexWriter could leave leftover files (on
exception in various places, on reuse of a segment name after crash
and recovery). (Uwe Schindler, Robert Muir, Mike McCandless)
Optimizations
* LUCENE-4322: Decrease lucene-core JAR size. The core JAR size had increased a
lot because of generated code introduced in LUCENE-4161 and LUCENE-3892.
(Adrien Grand)
* LUCENE-4317: Improve reuse of internal TokenStreams and StringReader
in oal.document.Field. (Uwe Schindler, Chris Male, Robert Muir)
* LUCENE-4327: Support out-of-order scoring in FilteredQuery for higher
performance. (Mike McCandless, Robert Muir)
* LUCENE-4364: Optimize MMapDirectory to not make a mapping per-cfs-slice,
instead one map per .cfs file. This reduces the total number of maps.
Additionally factor out a (package-private) generic
ByteBufferIndexInput from MMapDirectory. (Uwe Schindler, Robert Muir)
Build
* LUCENE-4406, LUCENE-4407: Upgrade to randomizedtesting 2.0.1.
Workaround for broken test output XMLs due to non-XML text unicode
chars in strings. Added printing of failed tests at the end of a
test run (Dawid Weiss)
* LUCENE-4252: Detect/Fail tests when they leak RAM in static fields
(Robert Muir, Dawid Weiss)
* LUCENE-4360: Support running the same test suite multiple times in
parallel (Dawid Weiss)
* LUCENE-3985: Upgrade to randomizedtesting 2.0.0. Added support for
thread leak detection. Added support for suite timeouts. (Dawid Weiss)
* LUCENE-4354: Corrected maven dependencies to be consistent with
the licenses/ folder and the binary release. Some had different
versions or additional unnecessary dependencies. (selckin via Robert Muir)
* LUCENE-4340: Move all non-default codec, postings format and terms
dictionary implementations to lucene/codecs. (Adrien Grand)
Documentation
* LUCENE-4302: Fix facet userguide to have HTML loose doctype like
all other javadocs. (Karl Nicholas via Uwe Schindler)
======================= Lucene 4.0.0-BETA =======================
New features
* LUCENE-4249: Changed the explanation of the PayloadTermWeight to use the
underlying PayloadFunction's explanation as the explanation
for the payload score. (Scott Smerchek via Robert Muir)
* LUCENE-4069: Added BloomFilteringPostingsFormat for use with low-frequency terms
such as primary keys (Mark Harwood, Mike McCandless)
* LUCENE-4201: Added JapaneseIterationMarkCharFilter to normalize Japanese
iteration marks. (Robert Muir, Christian Moen)
* LUCENE-3832: Added BasicAutomata.makeStringUnion method to efficiently
create automata from a fixed collection of UTF-8 encoded BytesRef
(Dawid Weiss, Robert Muir)
* LUCENE-4153: Added option to fast vector highlighting via BaseFragmentsBuilder to
respect field boundaries in the case of highlighting for multivalued fields.
(Martijn van Groningen)
* LUCENE-4227: Added DirectPostingsFormat, to hold all postings in
memory as uncompressed simple arrays. This uses a tremendous amount
of RAM but gives good search performance gains. (Mike McCandless)
* LUCENE-2510, LUCENE-4044: Migrated Solr's Tokenizer-, TokenFilter-, and
CharFilterFactories to the lucene-analysis module. The API is still
experimental. (Chris Male, Robert Muir, Uwe Schindler)
* LUCENE-4230: When pulling a DocsAndPositionsEnum you can now
specify whether or not you require payloads (in addition to
offsets); turning one or both off may allow some codec
implementations to optimize the enum implementation. (Robert Muir,
Mike McCandless)
* LUCENE-4203: Add IndexWriter.tryDeleteDocument(AtomicReader reader,
int docID), to attempt deletion by docID as long as the provided
reader is an NRT reader, and the segment has not yet been merged
away (Mike McCandless).
* LUCENE-4286: Added option to CJKBigramFilter to always also output
unigrams. This can be used for a unigram+bigram approach, or at
index-time only for better support of short queries.
(Tom Burton-West, Robert Muir)
API Changes
* LUCENE-4138: update of morfologik (Polish morphological analyzer) to 1.5.3.
The tag attribute class has been renamed to MorphosyntacticTagsAttribute and
has a different API (carries a list of tags instead of a compound tag). Upgrade
of embedded morfologik dictionaries to version 1.9. (Dawid Weiss)
* LUCENE-4178: set 'tokenized' to true on FieldType by default, so that if you
make a custom FieldType and set indexed = true, it's analyzed by the analyzer.
(Robert Muir)
* LUCENE-4220: Removed the buggy JavaCC-based HTML parser in the benchmark
module and replaced by NekoHTML. HTMLParser interface was cleaned up while
changing method signatures. (Uwe Schindler, Robert Muir)
* LUCENE-2191: Rename Tokenizer.reset(Reader) to Tokenizer.setReader(Reader).
The purpose of this method was always to set a new Reader on the Tokenizer,
reusing the object. But the name was often confused with TokenStream.reset().
(Robert Muir)
* LUCENE-4228: Refactored CharFilter to extend java.io.FilterReader. CharFilters
filter another reader and you override correct() for offset correction.
(Robert Muir)
* LUCENE-4240: Analyzer api now just takes fieldName for getOffsetGap. If the
field is not analyzed (e.g. StringField), then the analyzer is not invoked
at all. If you want to tweak things like positionIncrementGap and offsetGap,
analyze the field with KeywordTokenizer instead. (Grant Ingersoll, Robert Muir)
* LUCENE-4250: Pass fieldName to the PayloadFunction explain method, so it
parallels with docScore and the default implementation is correct.
(Robert Muir)
* LUCENE-3747: Support Unicode 6.1.0. (Steve Rowe)
* LUCENE-3884: Moved ElisionFilter out of org.apache.lucene.analysis.fr
package into org.apache.lucene.analysis.util. (Robert Muir)
* LUCENE-4230: When pulling a DocsAndPositionsEnum you now pass an int
flags instead of the previous boolean needOffsets. Currently
recognized flags are DocsAndPositionsEnum.FLAG_PAYLOADS and
DocsAndPositionsEnum.FLAG_OFFSETS (Robert Muir, Mike McCandless)
* LUCENE-4273: When pulling a DocsEnum, you can pass an int flags
instead of the previous boolean needsFreqs; consistent with the changes
for DocsAndPositionsEnum in LUCENE-4230. Currently the only flag
is DocsEnum.FLAG_FREQS. (Robert Muir, Mike McCandless)
* LUCENE-3616: TextField(String, Reader, Store) was reduced to TextField(String, Reader),
as the Store parameter didn't make sense: if you supplied Store.YES, you would only
receive an exception anyway. (Robert Muir)
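Note on the int flags introduced by LUCENE-4230 and LUCENE-4273 above: the
sketch below shows, in general terms, how an int bit mask can replace
separate boolean parameters. It is not Lucene code. The class name and the
constant values are illustrative assumptions; consult the
DocsAndPositionsEnum and DocsEnum javadocs for the real constants.

```java
// Hypothetical demo class; constant values are illustrative assumptions.
final class FlagsSketch {

  static final int FLAG_OFFSETS = 0x1;   // assumed value, for illustration
  static final int FLAG_PAYLOADS = 0x2;  // assumed value, for illustration

  // A flag is "requested" when its bit is set in the mask.
  static boolean wantsOffsets(int flags) {
    return (flags & FLAG_OFFSETS) != 0;
  }

  static boolean wantsPayloads(int flags) {
    return (flags & FLAG_PAYLOADS) != 0;
  }

  public static void main(String[] args) {
    int flags = FLAG_OFFSETS | FLAG_PAYLOADS; // request both with one argument
    System.out.println("offsets: " + wantsOffsets(flags));
    System.out.println("payloads: " + wantsPayloads(flags));
  }
}
```

A single int keeps the method signature stable if further flags are added
later, which a growing list of boolean parameters would not.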
Optimizations
* LUCENE-4171: Performance improvements to Packed64.
(Toke Eskildsen via Adrien Grand)
* LUCENE-4184: Performance improvements to the aligned packed bits impl.
(Toke Eskildsen, Adrien Grand)
* LUCENE-4235: Remove enforcing of Filter rewrite for NRQ queries.
(Uwe Schindler)
* LUCENE-4279: Regenerated snowball Stemmers from snowball r554,
making them substantially more lightweight. Behavior is unchanged.
(Robert Muir)
* LUCENE-4291: Reduced internal buffer size for Jflex-based tokenizers
such as StandardTokenizer from 32kb to 8kb.
(Raintung Li, Steven Rowe, Robert Muir)
Bug Fixes
* LUCENE-4109: BooleanQueries are not parsed correctly with the
flexible query parser. (Karsten Rauch via Robert Muir)
* LUCENE-4176: Fix AnalyzingQueryParser to analyze range endpoints as bytes,
so that it works correctly with Analyzers that produce binary non-UTF-8 terms
such as CollationAnalyzer. (Nattapong Sirilappanich via Robert Muir)
* LUCENE-4209: Fix FSTCompletionLookup to close its sorter, so that it won't
leave temp files behind in /tmp. Fix SortedTermFreqIteratorWrapper to not
leave temp files behind in /tmp on Windows. Fix Sort to not leave
temp files behind when /tmp is a separate volume. (Uwe Schindler, Robert Muir)
* LUCENE-4221: Fix overeager CheckIndex validation for term vector offsets.
(Robert Muir)
* LUCENE-4222: TieredMergePolicy.getFloorSegmentMB was returning the
size in bytes not MB (Chris Fuller via Mike McCandless)
* LUCENE-3505: Fix bug (Lucene 4.0alpha only) where boolean conjunctions
were sometimes scored incorrectly. Conjunctions of only termqueries where
at least one term omitted term frequencies (IndexOptions.DOCS_ONLY) would
be scored as if all terms omitted term frequencies. (Robert Muir)
* LUCENE-2686, LUCENE-3505: Fixed BooleanQuery scorers to return correct
freq(). Added support for scorer navigation API (Scorer.getChildren) to
all queries. Made Scorer.freq() abstract.
(Koji Sekiguchi, Mike McCandless, Robert Muir)
* LUCENE-4234: Exception when FacetsCollector is used with ScoreFacetRequest,
and the number of matching documents is too large. (Gilad Barkai via Shai Erera)
* LUCENE-4245: Make IndexWriter#close() and MergeScheduler#close()
non-interruptible. (Mark Miller, Uwe Schindler)
* LUCENE-4190: restrict allowed filenames that a codec may create to
the patterns recognized by IndexFileNames. This also fixes
IndexWriter to only delete files matching this pattern from an index
directory, to reduce risk when the wrong index path is accidentally
passed to IndexWriter (Robert Muir, Mike McCandless)
* LUCENE-4277: Fix IndexWriter deadlock during rollback if flushable DWPT
instance are already checked out and queued up but not yet flushed.
(Simon Willnauer)
* LUCENE-4282: Automaton FuzzyQuery didn't always deliver all results.
(Johannes Christen, Uwe Schindler, Robert Muir)
* LUCENE-4289: Fix minor idf inconsistencies/inefficiencies in highlighter.
(Robert Muir)
Changes in Runtime Behavior
* LUCENE-4109: Enable position increments in the flexible queryparser by default.
(Karsten Rauch via Robert Muir)
* LUCENE-3616: Field throws exception if you try to set a boost on an
unindexed field or one that omits norms. (Robert Muir)
Build
* LUCENE-4094: Support overriding file.encoding on forked test JVMs
(force via -Drandomized.file.encoding=XXX). (Dawid Weiss)
* LUCENE-4189: Test output should include timestamps (start/end for each
test/ suite). Added -Dtests.timestamps=[off by default]. (Dawid Weiss)
* LUCENE-4110: Report long periods of forked jvm inactivity (hung tests/ suites).
Added -Dtests.heartbeat=[seconds] with the default of 60 seconds.
(Dawid Weiss)
* LUCENE-4160: Added a property to quit the tests after a given
number of failures has occurred. This is useful in combination
with -Dtests.iters=N (you can start N iterations and wait for M
failures, in particular M = 1). -Dtests.maxfailures=M. Alternatively,
specify -Dtests.failfast=true to skip all tests after the first failure.
(Dawid Weiss)
* LUCENE-4115: JAR resolution/ cleanup should be done automatically for ant
clean/ eclipse/ resolve (Dawid Weiss)
* LUCENE-4199, LUCENE-4202, LUCENE-4206: Add a new target "check-forbidden-apis"
that parses all generated .class files for use of APIs that use default
charset, default locale, or default timezone and fail build if violations
found. This ensures that Lucene / Solr is independent of local configuration
options. (Uwe Schindler, Robert Muir, Dawid Weiss)
* LUCENE-4217: Add the possibility to run tests with Atlassian Clover
loaded from IVY. A development License solely for Apache code was added in
the tools/ folder, but is not included in releases. (Uwe Schindler)
Documentation
* LUCENE-4195: Added package documentation and examples for
org.apache.lucene.codecs (Alan Woodward via Robert Muir)
======================= Lucene 4.0.0-ALPHA =======================
More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
https://wiki.apache.org/lucene-java/Lucene4.0
For "contrib" changes prior to 4.0, please see:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_6_0/lucene/contrib/CHANGES.txt
Changes in backwards compatibility policy
* LUCENE-1458, LUCENE-2111, LUCENE-2354: Changes from flexible indexing:
- On upgrading to 4.0, if you do not fully reindex your documents,
Lucene will emulate the new flex API on top of the old index,
incurring some performance cost (up to ~10% slowdown, typically).
To prevent this slowdown, use oal.index.IndexUpgrader
to upgrade your indexes to latest file format (LUCENE-3082).
Mixed flex/pre-flex indexes are perfectly fine -- the two
emulation layers (flex API on pre-flex index, and pre-flex API on
flex index) will remap the access as required. So on upgrading to
4.0 you can start indexing new documents into an existing index.
To get optimal performance, use oal.index.IndexUpgrader
to upgrade your indexes to latest file format (LUCENE-3082).
- The postings APIs (TermEnum, TermDocsEnum, TermPositionsEnum)
have been removed in favor of the new flexible
indexing (flex) APIs (Fields, FieldsEnum, Terms, TermsEnum,
DocsEnum, DocsAndPositionsEnum). One big difference is that field
and terms are now enumerated separately: a TermsEnum provides a
BytesRef (wraps a byte[]) per term within a single field, not a
Term. Another is that when asking for a Docs/AndPositionsEnum, you
now specify the skipDocs explicitly (typically this will be the
deleted docs, but in general you can provide any Bits).
- The term vectors APIs (TermFreqVector, TermPositionVector,
TermVectorMapper) have been removed in favor of the above
flexible indexing APIs, presenting a single-document inverted
index of the document from the term vectors.
- MultiReader ctor now throws IOException
- Directory.copy/Directory.copyTo now copies all files (not just
index files), since what is and isn't an index file is now
dependent on the codecs used.
- UnicodeUtil now uses BytesRef for UTF-8 output, and some method
signatures have changed to CharSequence. These are internal APIs
and subject to change suddenly.
- Positional queries (PhraseQuery, *SpanQuery) will now throw an
exception if you use them on a field that omits positions during
indexing (previously they silently returned no results).
- FieldCache.{Byte,Short,Int,Long,Float,Double}Parser's API has
changed -- each parse method now takes a BytesRef instead of a
String. If you have an existing Parser, a simple way to fix it is
invoke BytesRef.utf8ToString, and pass that String to your
existing parser. This will work, but performance would be better
if you could fix your parser to instead operate directly on the
byte[] in the BytesRef.
- The internal (experimental) API of NumericUtils changed completely
from String to BytesRef. Client code should never use this class,
so the change would normally not affect you. If you used some of
the methods to inspect terms or create TermQueries out of
prefix encoded terms, change to use BytesRef. Please note:
Do not use TermQueries to search for single numeric terms.
The recommended way is to create a corresponding NumericRangeQuery
with upper and lower bound equal and included. TermQueries do not
score correctly, so the constant score mode of NRQ is the only
correct way to handle single value queries.
- NumericTokenStream now works directly on byte[] terms. If you
plug a TokenFilter on top of this stream, you will likely get
an IllegalArgumentException, because the NTS does not support
TermAttribute/CharTermAttribute. If you want to further filter
or attach Payloads to NTS, use the new NumericTermAttribute.
(Mike McCandless, Robert Muir, Uwe Schindler, Mark Miller, Michael Busch)
* LUCENE-2858, LUCENE-3733: IndexReader was refactored into abstract
AtomicReader, CompositeReader, and DirectoryReader. To open Directory-
based indexes use DirectoryReader.open(), the corresponding method in
IndexReader is now deprecated for easier migration. Only DirectoryReader
supports commits, versions, and reopening with openIfChanged(). Terms,
postings, docvalues, and norms can from now on only be retrieved using
AtomicReader; DirectoryReader and MultiReader extend CompositeReader,
only offering stored fields and access to the sub-readers (which may be
composite or atomic). SlowCompositeReaderWrapper (LUCENE-2597) can be
used to emulate atomic readers on top of composites.
Please review MIGRATE.txt for information on how to migrate old code.
(Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode codepoints,
not Unicode code units. For example, a Wildcard "?" represents any Unicode
character. Furthermore, the rest of the automaton package and RegexpQuery use
true Unicode codepoint representation. (Robert Muir, Mike McCandless)
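To illustrate the difference between code units and code points, here is a
minimal JDK-only sketch. It does not use any Lucene class, and the name
CodePointDemo is made up for this note. A supplementary character such as
U+1F600 takes two UTF-16 code units in a Java String but is one code point.

```java
// Illustration only: plain JDK, no Lucene classes involved.
final class CodePointDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the same string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, written as a surrogate pair.
        String s = "a\uD83D\uDE00";
        System.out.println("code units:  " + codeUnits(s));
        System.out.println("code points: " + codePoints(s));
    }
}
```

Under code point semantics, a pattern such as "a?" can match the whole string
above, because the surrogate pair counts as a single character.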
* LUCENE-2380: The String-based FieldCache methods (getStrings,
getStringIndex) have been replaced with BytesRef-based equivalents
(getTerms, getTermsIndex). Also, the sort values (returned in
FieldDoc.fields) when sorting by SortField.STRING or
SortField.STRING_VAL are now BytesRef instances. See MIGRATE.txt
for more details. (yonik, Mike McCandless)
* LUCENE-2480: Though not a change in backwards compatibility policy, pre-3.0
indexes are no longer supported. You should upgrade to 3.x first, then run
optimize(), or reindex. (Shai Erera, Earwin Burrfoot)
* LUCENE-2484: Removed deprecated TermAttribute. Use CharTermAttribute
and TermToBytesRefAttribute instead. (Uwe Schindler)
* LUCENE-2600: Remove IndexReader.isDeleted in favor of
AtomicReader.getDeletedDocs(). (Mike McCandless)
* LUCENE-2667: FuzzyQuery's defaults have changed for more performant
behavior: the minimum similarity is 2 edit distances from the word,
and the priority queue size is 50. To support this, FuzzyQuery now allows
specifying unscaled edit distances (foobar~2). If your application depends
upon the old defaults of 0.5 (scaled) minimum similarity and Integer.MAX_VALUE
priority queue size, you can use FuzzyQuery(Term, float, int, int) to specify
those explicitly.
* LUCENE-2674: MultiTermQuery.TermCollector.collect now accepts the
TermsEnum as well. (Robert Muir, Mike McCandless)
* LUCENE-588: WildcardQuery and QueryParser now allow escaping with
the '\' character. Previously this was impossible (you could not escape */?,
for example). If your code somehow depends on the old behavior, you will
need to change it (e.g. using "\\" to escape '\' itself).
(Sunil Kamath, Terry Yang via Robert Muir)
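As a rough sketch of what escaping user input for a wildcard pattern can look
like, the helper below prefixes '\', '*' and '?' with a backslash. The class
and method names are hypothetical and are not part of Lucene. Note that in
Java source a backslash in a string literal must itself be doubled.

```java
// Illustration only: escapeWildcards is a hypothetical helper, not a Lucene API.
final class WildcardEscapeDemo {
    // Prefix each '\', '*' and '?' with a backslash so it is read literally.
    static String escapeWildcards(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 4);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '*' || c == '?') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeWildcards("what?"));
        System.out.println(escapeWildcards("a\\b*"));
    }
}
```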
* LUCENE-2837: Collapsed Searcher, Searchable into IndexSearcher;
removed contrib/remote and MultiSearcher (Mike McCandless); absorbed
ParallelMultiSearcher into IndexSearcher as an optional
ExecutorService passed to its ctor. (Mike McCandless)
* LUCENE-2908, LUCENE-4037: Removed serialization code from lucene classes.
It is recommended that you serialize user search needs at a higher level
in your application.
(Robert Muir, Benson Margulies)
* LUCENE-2831: Changed Weight#scorer, Weight#explain & Filter#getDocIdSet to
operate on an AtomicReaderContext instead of directly on IndexReader to enable
searches to be aware of IndexSearcher's context. (Simon Willnauer)
* LUCENE-2839: Scorer#score(Collector,int,int) is now public because it is
called from other classes and part of public API. (Uwe Schindler)
* LUCENE-2865: Weight#scorer(AtomicReaderContext, boolean, boolean) now accepts
a ScorerContext struct instead of booleans. (Simon Willnauer)
* LUCENE-2882: Cut over SpanQuery#getSpans to AtomicReaderContext to enforce
per segment semantics on SpanQuery & Spans. (Simon Willnauer)
* LUCENE-2236: Similarity can now be configured on a per-field basis. See the
migration notes in MIGRATE.txt for more details. (Robert Muir, Doron Cohen)
* LUCENE-2315: AttributeSource's methods for accessing attributes are now final,
else it's easy to corrupt the internal states. (Uwe Schindler)
* LUCENE-2814: The IndexWriter.flush method no longer takes "boolean
flushDocStores" argument, as we now always flush doc stores (index
files holding stored fields and term vectors) while flushing a
segment. (Mike McCandless)
* LUCENE-2548: Field names (eg in Term, FieldInfo) are no longer
interned. (Mike McCandless)
* LUCENE-2883: The contents of o.a.l.search.function have been consolidated into
the queries module and can be found at o.a.l.queries.function. See
MIGRATE.txt for more information (Chris Male)
* LUCENE-2392, LUCENE-3299: Decoupled vector space scoring from
Query/Weight/Scorer. If you extended Similarity directly before, you should
extend TFIDFSimilarity instead. Similarity is now a lower-level API to
implement other scoring algorithms. See MIGRATE.txt for more details.
(David Nemeskey, Simon Willnauer, Mike McCandless, Robert Muir)
* LUCENE-3330: The expert visitor API in Scorer has been simplified and
extended to support arbitrary relationships. To navigate to a scorer's
children, call Scorer.getChildren(). (Robert Muir)
* LUCENE-2308: Field is now instantiated with an instance of IndexableFieldType,
of which there is a core implementation FieldType. Most properties
describing a Field have been moved to IndexableFieldType. See MIGRATE.txt
for more details. (Nikola Tankovic, Mike McCandless, Chris Male)
* LUCENE-3396: ReusableAnalyzerBase.TokenStreamComponents.reset(Reader) now
returns void instead of boolean. If a Component cannot be reset, it should
throw an Exception. (Chris Male)
* LUCENE-3396: ReusableAnalyzerBase has been renamed to Analyzer. All Analyzer
implementations must now use Analyzer.TokenStreamComponents, rather than
overriding .tokenStream() and .reusableTokenStream() (which are now final).
(Chris Male)
* LUCENE-3346: Analyzer.reusableTokenStream() has been renamed to tokenStream()
with the old tokenStream() method removed. Consequently it is now mandatory
for all Analyzers to support reusability. (Chris Male)
* LUCENE-3473: AtomicReader.getUniqueTermCount() no longer throws UOE when
it cannot be easily determined. Instead, it returns -1 to be consistent with
this behavior across other index statistics.
(Robert Muir)
* LUCENE-1536: The abstract FilteredDocIdSet.match() method is no longer
allowed to throw IOException. This change was required to make it conform
to the Bits interface. This method should never do I/O for performance reasons.
(Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley,
Jason Rutherglen, Paul Elschot)
* LUCENE-3559: The methods "docFreq" and "maxDoc" on IndexSearcher were removed,
as these are no longer used by the scoring system. See MIGRATE.txt for more
details. (Robert Muir)
* LUCENE-3533: Removed SpanFilters, they created large lists of objects and
did not scale. (Robert Muir)
* LUCENE-3606: IndexReader and subclasses were made read-only. It is no longer
possible to delete or undelete documents using IndexReader; you have to use
IndexWriter now. As deleting by internal Lucene docID is no longer possible,
this requires adding a unique identifier field to your index. Deleting/
relying upon Lucene docIDs is not recommended anyway, because they can
change. Consequently commit() was removed and DirectoryReader.open(),
openIfChanged() no longer take readOnly booleans or IndexDeletionPolicy
instances. Furthermore, IndexReader.setNorm() was removed. If you need
customized norm values, the recommended way to do this is by modifying
Similarity to use an external byte[] or one of the new DocValues
fields (LUCENE-3108). Alternatively, to dynamically change norms (boost
*and* length norm) at query time, wrap your AtomicReader using
FilterAtomicReader, overriding FilterAtomicReader.norms(). To persist the
changes on disk, copy the FilteredIndexReader to a new index using
IndexWriter.addIndexes(). (Uwe Schindler, Robert Muir)
* LUCENE-3640: Removed IndexSearcher.close(). Because IndexSearcher no longer
takes a Directory and no longer "manages" IndexReaders, it was a no-op.
(Robert Muir)
* LUCENE-3684: Add offsets into DocsAndPositionsEnum, and a new
FieldInfo.IndexOption: DOCS_AND_POSITIONS_AND_OFFSETS. (Robert
Muir, Mike McCandless)
* LUCENE-2858, LUCENE-3770: FilterIndexReader was renamed to
FilterAtomicReader and now extends AtomicReader. If you want to filter
composite readers like DirectoryReader or MultiReader, filter their
atomic leaves and build a new CompositeReader (e.g. MultiReader) around
them. (Uwe Schindler, Robert Muir)
* LUCENE-3736: ParallelReader was split into ParallelAtomicReader
and ParallelCompositeReader. Lucene 3.x's ParallelReader is now
ParallelAtomicReader; but the new composite variant has improved performance
as it works on the atomic subreaders. It requires that all parallel
composite readers have the same subreader structure. If you cannot provide this,
you can use SlowCompositeReaderWrapper to make all parallel readers atomic
and use ParallelAtomicReader. (Uwe Schindler, Mike McCandless, Robert Muir)
* LUCENE-2000: clone() now returns covariant types where possible. (ryan)
* LUCENE-3970: Rename Fields.getUniqueFieldCount -> .size() and
Terms.getUniqueTermCount -> .size(). (Iulius Curt via Mike McCandless)
* LUCENE-3514: IndexSearcher.setDefaultFieldSortScoring was removed
and replaced with per-search control via new expert search methods
that take two booleans indicating whether hit scores and max
score should be computed. (Mike McCandless)
* LUCENE-4055: You can't put foreign files into the index dir anymore.
* LUCENE-3866: CompositeReader.getSequentialSubReaders() now returns
unmodifiable List<? extends IndexReader>. ReaderUtil.Gather was
removed, as IndexReaderContext.leaves() is now the preferred way
to access sub-readers. (Uwe Schindler)
* LUCENE-4155: oal.util.ReaderUtil, TwoPhaseCommit, TwoPhaseCommitTool
classes were moved to oal.index package. oal.util.CodecUtil class was moved
to oal.codecs package. oal.util.DummyConcurrentLock was removed
(no longer used in Lucene 4.0). (Uwe Schindler)
Changes in Runtime Behavior
* LUCENE-2846: omitNorms now behaves like omitTermFrequencyAndPositions: if you
omitNorms(true) for field "a" for 1000 documents, but then add a document with
omitNorms(false) for field "a", all documents for field "a" will have no
norms. Previously, Lucene would fill the first 1000 documents with
"fake norms" from Similarity.getDefault(). (Robert Muir, Mike McCandless)
* LUCENE-2846: When some documents contain field "a", and others do not, the
documents that don't have the field get a norm byte value of 0. Previously,
Lucene would populate "fake norms" with Similarity.getDefault() for these
documents. (Robert Muir, Mike McCandless)
* LUCENE-2720: IndexWriter throws IndexFormatTooOldException on open, rather
than later when e.g. a merge starts.
(Shai Erera, Mike McCandless, Uwe Schindler)
* LUCENE-2881: FieldInfos is now tracked per segment. Before it was tracked
per IndexWriter session, which resulted in FieldInfos that had the FieldInfo
properties from all previous segments combined. Field numbers are now tracked
globally across IndexWriter sessions and persisted into a _X.fnx file on
successful commit. The corresponding file format changes are backwards-
compatible. (Michael Busch, Simon Willnauer)
* LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from
DocumentsWriterPerThread:
- IndexWriter now uses a DocumentsWriter per thread when indexing documents.
Each DocumentsWriterPerThread indexes documents in its own private segment,
and the in memory segments are no longer merged on flush. Instead, each
segment is separately flushed to disk and subsequently merged with normal
segment merging.
- DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a
FlushPolicy. When a DWPT is flushed, a fresh DWPT is swapped in so that
indexing may continue concurrently with flushing. The selected
DWPT flushes all its RAM resident documents to disk. Note: Segment flushes
don't flush all RAM resident documents but only the documents private to
the DWPT selected for flushing.
- Flushing is now controlled by FlushPolicy that is called for every add,
update or delete on IndexWriter. By default DWPTs are flushed either on
maxBufferedDocs per DWPT or the global active used memory. Once the active
memory exceeds ramBufferSizeMB only the largest DWPT is selected for
flushing and the memory used by this DWPT is subtracted from the active
memory and added to a flushing memory pool, which can lead to temporarily
higher memory usage due to ongoing indexing.
- IndexWriter now can utilize ramBufferSize > 2048 MB. Each DWPT can address
up to 2048 MB memory such that the ramBufferSize is now bounded by the max
number of DWPT available in the used DocumentsWriterPerThreadPool.
IndexWriter's net memory consumption can grow far beyond the 2048 MB limit if
the application can use all available DWPTs. To prevent a DWPT from
exhausting its address space IndexWriter will forcefully flush a DWPT if its
hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be controlled
via IndexWriterConfig and defaults to 1945 MB.
Since IndexWriter flushes DWPT concurrently not all memory is released
immediately. Applications should still use a ramBufferSize significantly
lower than the JVM's available heap memory since under high load multiple
flushing DWPT can consume substantial transient memory when IO performance
is slow relative to indexing rate.
- IndexWriter#commit now doesn't block concurrent indexing while flushing all
'currently' RAM resident documents to disk. Yet, flushes that occur while a
full flush is running are queued and will happen after all DWPT involved
in the full flush are done flushing. Applications that use multiple threads
during indexing and trigger a full flush (eg call commit() or open a new
NRT reader) can use significantly more transient memory.
- IndexWriter#addDocument and IndexWriter.updateDocument can block indexing
threads if the number of active + number of flushing DWPT exceed a
safety limit. By default this happens if 2 * max number available thread
states (DWPTPool) is exceeded. This safety limit prevents applications from
exhausting their available memory if flushing can't keep up with
concurrently indexing threads.
- IndexWriter only applies and flushes deletes if the maxBufferedDelTerms
limit is reached during indexing. No segment flushes will be triggered
due to this setting.
- IndexWriter#flush(boolean, boolean) doesn't synchronize on IndexWriter
anymore. A dedicated flushLock has been introduced to prevent multiple full
flushes happening concurrently.
- DocumentsWriter doesn't write shared doc stores anymore.
(Mike McCandless, Michael Busch, Simon Willnauer)
* LUCENE-3309: Stored fields no longer record whether they were
tokenized or not. In general you should not rely on stored fields
to record any "metadata" from indexing (tokenized, omitNorms,
IndexOptions, boost, etc.) (Mike McCandless)
* LUCENE-3309: Fast vector highlighter now inserts the
MultiValuedSeparator for NOT_ANALYZED fields (in addition to
ANALYZED fields). To ensure your offsets are correct you should
provide an analyzer that returns 1 from the offsetGap method.
(Mike McCandless)
* LUCENE-2621: Removed contrib/instantiated. (Robert Muir)
* LUCENE-1768: StandardQueryTreeBuilder no longer uses RangeQueryNodeBuilder
for RangeQueryNodes, since these two classes were removed;
TermRangeQueryNodeProcessor now creates TermRangeQueryNode,
instead of RangeQueryNode; the same applies for numeric nodes.
(Vinicius Barros via Uwe Schindler)
* LUCENE-3455: QueryParserBase.newFieldQuery() will throw a ParseException if
any of the calls to the Analyzer throw an IOException. QueryParserBase.analyzeRangePart()
will throw a RuntimeException if an IOException is thrown by the Analyzer.
* LUCENE-4127: IndexWriter will now throw IllegalArgumentException if
the first token of an indexed field has 0 positionIncrement
(previously it silently corrected it to 1, possibly masking bugs).
OffsetAttributeImpl will throw IllegalArgumentException if endOffset
is less than startOffset, or if offsets are negative.
(Robert Muir, Mike McCandless)
API Changes
* LUCENE-2302, LUCENE-1458, LUCENE-2111, LUCENE-2514: Terms are no longer
required to be character based. Lucene views a term as an arbitrary byte[]:
during analysis, character-based terms are converted to UTF8 byte[],
but analyzers are free to directly create terms as byte[]
(NumericField does this, for example). The term data is buffered as
byte[] during indexing, written as byte[] into the terms dictionary,
and iterated as byte[] (wrapped in a BytesRef) by IndexReader for
searching.
* LUCENE-1458, LUCENE-2111: AtomicReader now directly exposes its
deleted docs (getDeletedDocs), providing a new Bits interface to
directly query by doc ID.
* LUCENE-2691: IndexWriter.getReader() has been made package local and is now
exposed via open and reopen methods on DirectoryReader. The semantics of the
call is the same as it was prior to the API change.
(Grant Ingersoll, Mike McCandless)
* LUCENE-2566: QueryParser: Unary operators +,-,! will not be treated as
operators if they are followed by whitespace. (yonik)
* LUCENE-2831: Weight#scorer, Weight#explain, Filter#getDocIdSet,
Collector#setNextReader & FieldComparator#setNextReader now expect an
AtomicReaderContext instead of an IndexReader. (Simon Willnauer)
* LUCENE-2892: Add QueryParser.newFieldQuery (called by getFieldQuery by
default) which takes Analyzer as a parameter, for easier customization by
subclasses. (Robert Muir)
* LUCENE-2953: In addition to changes in 3.x, PriorityQueue#initialize(int)
function was moved into the ctor. (Uwe Schindler, Yonik Seeley)
* LUCENE-3219: SortField type properties have been moved to an enum
SortField.Type. To be consistent, CachedArrayCreator.getSortTypeID() has
been changed to CachedArrayCreator.getSortType(). (Chris Male)
* LUCENE-3225: Add TermsEnum.seekExact for faster seeking when you
don't need the ceiling term; renamed existing seek methods to either
seekCeil or seekExact; changed seekExact(ord) to return no value.
Fixed MemoryCodec and SimpleTextCodec to optimize the seekExact
case, and fixed places in Lucene to use seekExact when possible.
(Mike McCandless)
* LUCENE-1536: Filter.getDocIdSet() now takes an acceptDocs Bits interface (like
Scorer) limiting the documents that can appear in the returned DocIdSet.
Filters are now required to respect these acceptDocs, otherwise deleted documents
may get returned by searches. Most filters will pass these Bits down to DocsEnum,
but those, e.g. working on FieldCache, may need to use BitsFilteredDocIdSet.wrap()
to exclude them.
(Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley,
Jason Rutherglen, Paul Elschot)
* LUCENE-3722: Similarity methods and collection/term statistics now take
long instead of int (to enable distributed scoring of > 2B docs).
(Yonik Seeley, Andrzej Bialecki, Robert Muir)
* LUCENE-3761: Generalize SearcherManager into an abstract ReferenceManager.
SearcherManager remains a concrete class, but due to the refactoring, the
method maybeReopen has been deprecated in favor of maybeRefresh().
(Shai Erera, Mike McCandless, Simon Willnauer)
* LUCENE-3859: AtomicReader.hasNorms(field) is deprecated, instead you
can inspect the FieldInfo yourself to see if norms are present, which
also allows you to get the type. (Robert Muir)
* LUCENE-2606: Changed RegexCapabilities interface to fix thread
safety, serialization, and performance problems. If you have
written a custom RegexCapabilities it will need to be updated
to the new API. (Robert Muir, Uwe Schindler)
* LUCENE-2638: Make HighFreqTerms.TermStats public to make it more useful
for API use. (Andrzej Bialecki)
* LUCENE-2912: The field-specific hashmaps in SweetSpotSimilarity were removed.
Instead, use PerFieldSimilarityWrapper to return different SweetSpotSimilaritys
for different fields; this way all parameters (such as TF factors) can be
customized on a per-field basis. (Robert Muir)
* LUCENE-3308: DuplicateFilter keepMode and processingMode have been converted to
enums DuplicateFilter.KeepMode and DuplicateFilter.ProcessingMode respectively.
* LUCENE-3483: Move Function grouping collectors from Solr to grouping module.
(Martijn van Groningen)
* LUCENE-3606: FieldNormModifier was deprecated, because IndexReader's
setNorm() was deprecated. Furthermore, this class is broken, as it does
not take position overlaps into account while recalculating norms.
(Uwe Schindler, Robert Muir)
* LUCENE-3936: Renamed StringIndexDocValues to DocTermsIndexDocValues.
(Martijn van Groningen)
* LUCENE-1768: Deprecated Parametric(Range)QueryNode, RangeQueryNode(Builder),
ParametricRangeQueryNodeProcessor were removed. (Vinicius Barros via Uwe Schindler)
* LUCENE-3820: Deprecated constructors accepting pattern matching bounds. The input
is buffered and matched in one pass. (Dawid Weiss)
* LUCENE-2413: Deprecated PatternAnalyzer in common/miscellaneous, in favor
of the pattern package (CharFilter, Tokenizer, TokenFilter). (Robert Muir)
* LUCENE-2413: Removed the AnalyzerUtil in common/miscellaneous. (Robert Muir)
* LUCENE-1370: Added ShingleFilter option to output unigrams if no shingles
can be generated. (Chris Harris via Steven Rowe)
* LUCENE-2514, LUCENE-2551: JDK and ICU CollationKeyAnalyzers were changed to
use pure byte keys when Version >= 4.0. This cuts sort key size approximately
in half. (Robert Muir)
* LUCENE-3400: Removed DutchAnalyzer.setStemDictionary (Chris Male)
* LUCENE-3431: Removed QueryAutoStopWordAnalyzer.addStopWords* deprecated methods
since they prevented reuse. Stopwords are now generated at instantiation through
the Analyzer's constructors. (Chris Male)
* LUCENE-3434: Removed ShingleAnalyzerWrapper.set* and PerFieldAnalyzerWrapper.addAnalyzer
since they prevent reuse. Both Analyzers should be configured at instantiation.
(Chris Male)
* LUCENE-3765: Stopset ctors that previously took Set<?> or Map<?,String> now take
CharArraySet and CharArrayMap respectively. Previously the behavior was confusing,
and sometimes different depending on the type of set, and ultimately a CharArraySet
or CharArrayMap was always used anyway. (Robert Muir)
* LUCENE-3830: Switched to NormalizeCharMap.Builder to create
immutable instances of NormalizeCharMap. (Dawid Weiss, Mike
McCandless)
* LUCENE-4063: FrenchLightStemmer no longer deletes repeated digits.
(Tanguy Moal via Steve Rowe)
* LUCENE-4122: Replace Payload with BytesRef. (Andrzej Bialecki)
* LUCENE-4132: IndexWriter.getConfig() now returns a LiveIndexWriterConfig object
which can be used to change the IndexWriter's live settings. IndexWriterConfig
is used only for initializing the IndexWriter. (Shai Erera)
* LUCENE-3866: IndexReaderContext.leaves() is now the preferred way to access
atomic sub-readers of any kind of IndexReader (for AtomicReaders it returns
itself as only leaf with docBase=0). (Uwe Schindler)
New features
* LUCENE-2604: Added RegexpQuery support to QueryParser. Regular expressions
are directly supported by the standard queryparser via
fieldName:/expression/ OR /expression against default field/
Users who wish to search for literal "/" characters are advised to
backslash-escape or quote those characters as needed.
(Simon Willnauer, Robert Muir)
* LUCENE-1606, LUCENE-2089: Adds AutomatonQuery, a MultiTermQuery that
matches terms against a finite-state machine. Implement WildcardQuery
and FuzzyQuery with finite-state methods. Adds RegexpQuery.
(Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller)
* LUCENE-3662: Add support for levenshtein distance with transpositions
to LevenshteinAutomata, FuzzyTermsEnum, and DirectSpellChecker.
(Jean-Philippe Barrette-LaPierre, Robert Muir)
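For readers unfamiliar with the metric, the sketch below computes edit
distance with and without transpositions using textbook dynamic programming
(the "optimal string alignment" variant). It is an illustration only: Lucene
uses automata rather than this table, the class name is made up, and treating
this variant as equivalent to Lucene's definition is an assumption.

```java
// Illustration only: textbook dynamic programming, not Lucene's automaton code.
final class TranspositionDistanceDemo {
    // Edit distance over UTF-16 chars; when transpositions is true, swapping
    // two adjacent characters counts as one edit instead of two.
    static int distance(String a, String b, boolean transpositions) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete all of a's prefix
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert all of b's prefix
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucnee", false));
        System.out.println(distance("lucene", "lucnee", true));
    }
}
```

With transpositions enabled, a typo that swaps two neighbouring letters stays
within an edit distance of 1, so it can match under a tighter fuzzy threshold.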
* LUCENE-2321: Cutover to a more RAM efficient packed-ints based
representation for the in-memory terms dict index. (Mike
McCandless)
* LUCENE-2126: Add new classes for data (de)serialization: DataInput
and DataOutput. IndexInput and IndexOutput extend these new classes.
(Michael Busch)
* LUCENE-1458, LUCENE-2111: With flexible indexing it is now possible
for an application to create its own postings codec, to alter how
fields, terms, docs and positions are encoded into the index. The
standard codec is the default codec. IndexWriter accepts a Codec
class to obtain codecs for newly written segments.
* LUCENE-1458, LUCENE-2111: Some experimental codecs have been added
for flexible indexing, including pulsing codec (inlines
low-frequency terms directly into the terms dict, avoiding seeking
for some queries), sep codec (stores docs, freqs, positions, skip
data and payloads in 5 separate files instead of the 2 used by
standard codec), and int block (really a "base" for using
block-based compressors like PForDelta for storing postings data).
* LUCENE-1458, LUCENE-2111: The in-memory terms index used by standard
codec is more RAM efficient: terms data is stored as block byte
arrays and packed integers. Net RAM reduction for indexes that have
many unique terms should be substantial, and initial open time for
IndexReaders should be faster. These gains only apply for newly
written segments after upgrading.
* LUCENE-1458, LUCENE-2111: Terms data are now buffered directly as
byte[] during indexing, which uses half the RAM for ascii terms (and
also numeric fields). This can improve indexing throughput for
applications that have many unique terms, since it reduces how often
a new segment must be flushed given a fixed RAM buffer size.
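The "half the RAM" figure for ASCII terms follows from the encodings: a Java
char is two bytes, while UTF-8 uses one byte per ASCII character. A minimal
JDK-only sketch (the class name is made up, and it only counts payload bytes,
ignoring object headers and Lucene's own buffer bookkeeping):

```java
// Illustration only: compares raw payload sizes, not Lucene's actual buffers.
final class TermBytesDemo {
    // Bytes needed to hold the term as UTF-16 chars (2 bytes per char).
    static int utf16Bytes(String term) {
        return term.length() * 2;
    }

    // Bytes needed to hold the term encoded as UTF-8.
    static int utf8Bytes(String term) {
        return term.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String term = "lucene";
        System.out.println("UTF-16 bytes: " + utf16Bytes(term));
        System.out.println("UTF-8 bytes:  " + utf8Bytes(term));
    }
}
```

For non-ASCII text the saving is smaller, since UTF-8 needs two or more bytes
per character there.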
* LUCENE-2489: Added PerFieldCodecWrapper (in oal.index.codecs) which
lets you set the Codec per field (Mike McCandless)
* LUCENE-2373: Extend Codec to use SegmentInfosWriter and
SegmentInfosReader to allow customization of SegmentInfos data.
(Andrzej Bialecki)
* LUCENE-2504: FieldComparator.setNextReader now returns a
FieldComparator instance. You can "return this", to just reuse the
same instance, or you can return a comparator optimized to the new
segment. (yonik, Mike McCandless)
* LUCENE-2648: PackedInts.Iterator now supports advancing by more than a
single ordinal. (Simon Willnauer)
* LUCENE-2649: Objects in the FieldCache can optionally store Bits
that mark which docs have real values in the native[] (ryan)
* LUCENE-2664: Add SimpleText codec, which stores all terms/postings
data in a single text file for transparency (at the expense of poor
performance). (Sahin Buyrukbilen via Mike McCandless)
* LUCENE-2589: Add a VariableSizedIntIndexInput, which, when used w/
Sep*, makes it simple to take any variable sized int block coders
(like Simple9/16) and use them in a codec. (Mike McCandless)
* LUCENE-2597: Add oal.index.SlowCompositeReaderWrapper, to wrap a
composite reader (eg MultiReader or DirectoryReader), making it
pretend it's an atomic reader. This is a convenience class (you can
use MultiFields static methods directly, instead) if you need to use
the flex APIs directly on a composite reader. (Mike McCandless)
* LUCENE-2690: MultiTermQuery boolean rewrites per segment.
(Uwe Schindler, Robert Muir, Mike McCandless, Simon Willnauer)
* LUCENE-996: The QueryParser now accepts mixed inclusive and exclusive
bounds for range queries. Example: "{3 TO 5]"
QueryParser subclasses that overrode getRangeQuery will need to be changed
to use the new getRangeQuery method. (Andrew Schurman, Mark Miller, yonik)
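To make the bound semantics concrete: "{" and "}" exclude the endpoint, "["
and "]" include it. The sketch below models that with a hypothetical inRange
helper over integers. It is not the QueryParser code, and real term ranges
compare terms rather than numbers; integers are used only to keep it short.

```java
// Illustration only: inRange is a hypothetical helper, not a Lucene API.
final class MixedBoundsDemo {
    // True if value lies between lower and upper, honoring each bound's
    // inclusive/exclusive flag independently.
    static boolean inRange(int value, int lower, int upper,
                           boolean includeLower, boolean includeUpper) {
        boolean aboveLower = includeLower ? value >= lower : value > lower;
        boolean belowUpper = includeUpper ? value <= upper : value < upper;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        // Models "{3 TO 5]": lower bound exclusive, upper bound inclusive.
        for (int v = 3; v <= 5; v++) {
            System.out.println(v + " -> " + inRange(v, 3, 5, false, true));
        }
    }
}
```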
* LUCENE-2742: Add native per-field postings format support. Codec now lets you
register a postings format for each field, which is in turn recorded
into the index. Postings formats are maintained on a per-segment basis and can be
resolved without knowing the actual postings format used for writing the segment.
(Simon Willnauer)
* LUCENE-2741: Add support for multiple codecs that use the same file
extensions within the same segment. Codecs now use their per-segment codec
ID in the file names. (Simon Willnauer)
* LUCENE-2843: Added a new terms index impl,
VariableGapTermsIndexWriter/Reader, that accepts a pluggable
IndexTermSelector for picking which terms should be indexed in the
terms dict. This impl stores the indexed terms in an FST, which is
much more RAM efficient than FixedGapTermsIndex. (Mike McCandless)
* LUCENE-2862: Added TermsEnum.totalTermFreq() and
Terms.getSumTotalTermFreq(). (Mike McCandless, Robert Muir)
* LUCENE-3290: Added Terms.getSumDocFreq() (Mike McCandless, Robert Muir)
* LUCENE-3003: Added new expert class oal.index.DocTermsOrd,
refactored from Solr's UnInvertedField, for accessing term ords for
multi-valued fields, per document. This is similar to FieldCache in
that it inverts the index to compute the ords, but differs in that
it's able to handle multi-valued fields and does not hold the term
bytes in RAM. (Mike McCandless)
* LUCENE-3108, LUCENE-2935, LUCENE-2168, LUCENE-1231: Changes from
DocValues (ColumnStrideFields):
- IndexWriter now supports typesafe dense per-document values stored in
column-like storage. DocValues are stored on a per-document
basis where each document's field can hold exactly one value of a given
type. DocValues are provided via Fieldable and can be used in
conjunction with stored and indexed values.
- DocValues provides an entirely RAM resident document id to value
mapping per field as well as a DocIdSetIterator based disk-resident
sequential access API relying on filesystem-caches.
- Both APIs are exposed via IndexReader and the Codec / Flex API allowing
expert users to integrate customized DocValues reader and writer
implementations by extending existing Codecs.
- DocValues provides implementations for primitive datatypes like int,
long, float, double and arrays of byte. Byte based implementations further
provide storage variants like straight or dereferenced stored bytes, fixed
and variable length bytes as well as index time sorted based on
user-provided comparators.
(Mike McCandless, Simon Willnauer)
* LUCENE-3209: Added MemoryCodec, which stores all terms & postings in
RAM as an FST; this is good for primary-key fields if you frequently
need to lookup by that field or perform deletions against it, for
example in a near-real-time setting. (Mike McCandless)
* SOLR-2533: Added support for rewriting Sort and SortFields using an
IndexSearcher. SortFields can have SortField.REWRITEABLE type which
requires they are rewritten before they are used. (Chris Male)
* LUCENE-3203: FSDirectory can now limit the max allowed write rate
(MB/sec) of all running merges, to reduce impact ongoing merging has
on searching, NRT reopen time, etc. (Mike McCandless)
* LUCENE-2793: Directory#createOutput & Directory#openInput now accept an
IOContext instead of a buffer size to allow low level optimizations for
different usecases like merging, flushing and reading.
(Simon Willnauer, Mike McCandless, Varun Thacker)
* LUCENE-3354: FieldCache can cache DocTermOrds. (Martijn van Groningen)
* LUCENE-3376: ReusableAnalyzerBase has been moved from modules/analysis/common
into lucene/src/java/org/apache/lucene/analysis (Chris Male)
* LUCENE-3423: add Terms.getDocCount(), which returns the number of documents
that have at least one term for a field. (Yonik Seeley, Robert Muir)
* LUCENE-2959: Added a variety of different relevance ranking systems to Lucene.
- Added Okapi BM25, Language Models, Divergence from Randomness, and
Information-Based Models. The models are pluggable, support all of lucene's
features (boosts, slops, explanations, etc) and queries (spans, etc).
- All models default to the same index-time norm encoding as
DefaultSimilarity, so you can easily try these out/switch back and
forth/run experiments and comparisons without reindexing. Note: most of
the models do rely upon index statistics that are new in Lucene 4.0, so
for existing 3.x indexes it's a good idea to upgrade your index to the
new format with IndexUpgrader first.
- Added a new subclass SimilarityBase which provides a simplified API
for plugging in new ranking algorithms without dealing with all of the
nuances and implementation details of Lucene.
- For example, to use BM25 for all fields:
searcher.setSimilarity(new BM25Similarity());
If you instead want to apply different similarities (e.g. ones with
different parameter values or different algorithms entirely) to different
fields, implement PerFieldSimilarityWrapper with your per-field logic.
(David Mark Nemeskey via Robert Muir)
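For readers unfamiliar with BM25, the textbook formula can be sketched on its own. This is a stand-alone illustration, not the code of BM25Similarity: the class Bm25Sketch is a hypothetical name, lengths are plain token counts rather than encoded norms, and k1 and b are passed in explicitly (1.2 and 0.75 are commonly used values).

```java
/** Hypothetical stand-alone BM25 sketch; not Lucene's BM25Similarity. */
final class Bm25Sketch {
    /** Inverse document frequency: rarer terms get a higher weight. */
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * Score of one term in one document.
     * freq      = occurrences of the term in the document
     * docLen    = length of the document, avgDocLen = average length in the collection
     * k1        = term-frequency saturation, b = strength of length normalization
     */
    static double score(double freq, double docLen, double avgDocLen,
                        long docFreq, long docCount, double k1, double b) {
        double lengthNorm = 1.0 - b + b * docLen / avgDocLen;
        return idf(docFreq, docCount) * freq * (k1 + 1.0) / (freq + k1 * lengthNorm);
    }
}
```

Two properties are visible in the formula: a document longer than average scores lower for the same term frequency, and the term-frequency component saturates, so the score never exceeds idf * (k1 + 1).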
* LUCENE-3396: ReusableAnalyzerBase now provides a ReuseStrategy abstraction
which controls how TokenStreamComponents are reused per request. Two
implementations are provided - GlobalReuseStrategy which implements the
current behavior of sharing components between all fields, and
PerFieldReuseStrategy which shares per field. (Chris Male)
* LUCENE-2309: Added IndexableField.tokenStream(Analyzer) which is now
responsible for creating the TokenStreams for Fields when they are to
be indexed. (Chris Male)
* LUCENE-3433: Added random access for non RAM resident IndexDocValues. RAM
resident and disk resident IndexDocValues are now exposed via the Source
interface. ValuesEnum has been removed in favour of Source. (Simon Willnauer)
* LUCENE-1536: Filters can now be applied down-low, if their DocIdSet implements
a new bits() method, returning all documents in a random access way. If the
DocIdSet is not too sparse, it will be passed as acceptDocs down to the Scorer
as replacement for AtomicReader's live docs.
In addition, FilteredQuery now backs IndexSearcher's filtering search methods.
Using FilteredQuery you can chain Filters efficiently
[new FilteredQuery(new FilteredQuery(query, filter1), filter2)], which was not
possible with IndexSearcher's methods. FilteredQuery also allows overriding
the heuristics used to decide whether filtering is done by random access or
by a conjunction on the DocIdSet's iterator().
(Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley,
Jason Rutherglen, Paul Elschot)
* LUCENE-3638: Added sugar methods to IndexReader and IndexSearcher to
load only certain fields when loading a document. (Peter Chang via
Mike McCandless)
* LUCENE-3628: Norms are represented as DocValues. AtomicReader exposes
a #normValues(String) method to obtain norms per field. (Simon Willnauer)
* LUCENE-3687: Similarity#computeNorm(FieldInvertState, Norm) allows computing
norm values of arbitrary precision. Instead of returning a fixed single byte
value, custom similarities can now set an integer, float or byte value on the
given Norm object. (Simon Willnauer)
* LUCENE-2604, LUCENE-4103: Added RegexpQuery support to contrib/queryparser.
(Simon Willnauer, Robert Muir, Daniel Truemper)
* LUCENE-2373: Added a Codec implementation that works with append-only
filesystems (such as Hadoop DFS). SegmentInfos writing/reading
code is refactored to support append-only FS, and to allow for future
customization of per-segment information. (Andrzej Bialecki)
* LUCENE-2479: Added ability to provide a sort comparator for spelling suggestions along
with two implementations. The existing comparator (score, then frequency) is the default (Grant Ingersoll)
* LUCENE-2608: Added the ability to specify the accuracy at method time in the SpellChecker. The per class
method is also still available. (Grant Ingersoll)
* LUCENE-2507: Added DirectSpellChecker, which retrieves correction candidates directly
from the term dictionary using levenshtein automata. (Robert Muir)
* LUCENE-3527: Add LuceneLevenshteinDistance, which computes string distance in a way
compatible with DirectSpellChecker. This can be used to merge top-N results from more than one
SpellChecker. (James Dyer via Robert Muir)
* LUCENE-3496: Support grouping by DocValues. (Martijn van Groningen)
* LUCENE-2795: Generified DirectIOLinuxDirectory to work across any
unix supporting the O_DIRECT flag when opening a file (tested on
Linux and OS X but likely other Unixes will work), and improved it
so it can be used for indexing and searching. The directory uses
direct IO when doing large merges to avoid unnecessarily evicting
cached IO pages due to large merges. (Varun Thacker, Mike
McCandless)
* LUCENE-3827: DocsAndPositionsEnum from MemoryIndex implements
start/endOffset, if offsets are indexed. (Alan Woodward via Mike
McCandless)
* LUCENE-3802, LUCENE-3856: Support for grouped faceting. (Martijn van Groningen)
* LUCENE-3444: Added a second pass grouping collector that keeps track of distinct
values for a specified field for the top N group. (Martijn van Groningen)
* LUCENE-3778: Added a grouping utility class that makes it easier to use result
grouping for pure Lucene apps. (Martijn van Groningen)
* LUCENE-2341: A new analysis/ filter: Morfologik - a dictionary-driven lemmatizer
(accurate stemmer) for Polish (includes morphosyntactic annotations).
(Michał Dybizbański, Dawid Weiss)
* LUCENE-2413: Consolidated Lucene/Solr analysis components into analysis/common.
New features from Solr now available to Lucene users include:
- o.a.l.analysis.commongrams: Constructs n-grams for frequently occurring terms
and phrases.
- o.a.l.analysis.charfilter.HTMLStripCharFilter: CharFilter that strips HTML
constructs.
- o.a.l.analysis.miscellaneous.WordDelimiterFilter: TokenFilter that splits words
into subwords and performs optional transformations on subword groups.
- o.a.l.analysis.miscellaneous.RemoveDuplicatesTokenFilter: TokenFilter which
filters out Tokens at the same position and Term text as the previous token.
- o.a.l.analysis.miscellaneous.TrimFilter: Trims leading and trailing whitespace
from Tokens in the stream.
- o.a.l.analysis.miscellaneous.KeepWordFilter: A TokenFilter that only keeps tokens
with text contained in the required words (inverse of StopFilter).
- o.a.l.analysis.miscellaneous.HyphenatedWordsFilter: A TokenFilter that puts
hyphenated words broken into two lines back together.
- o.a.l.analysis.miscellaneous.CapitalizationFilter: A TokenFilter that applies
capitalization rules to tokens.
- o.a.l.analysis.pattern: Package for pattern-based analysis, containing a
CharFilter, Tokenizer, and TokenFilter for transforming text with regexes.
- o.a.l.analysis.synonym.SynonymFilter: A synonym filter that supports multi-word
synonyms.
- o.a.l.analysis.phonetic: Package for phonetic search, containing various
phonetic encoders such as Double Metaphone.
Some existing analysis components changed packages:
- o.a.l.analysis.KeywordAnalyzer -> o.a.l.analysis.core.KeywordAnalyzer
- o.a.l.analysis.KeywordTokenizer -> o.a.l.analysis.core.KeywordTokenizer
- o.a.l.analysis.LetterTokenizer -> o.a.l.analysis.core.LetterTokenizer
- o.a.l.analysis.LowerCaseFilter -> o.a.l.analysis.core.LowerCaseFilter
- o.a.l.analysis.LowerCaseTokenizer -> o.a.l.analysis.core.LowerCaseTokenizer
- o.a.l.analysis.SimpleAnalyzer -> o.a.l.analysis.core.SimpleAnalyzer
- o.a.l.analysis.StopAnalyzer -> o.a.l.analysis.core.StopAnalyzer
- o.a.l.analysis.StopFilter -> o.a.l.analysis.core.StopFilter
- o.a.l.analysis.WhitespaceAnalyzer -> o.a.l.analysis.core.WhitespaceAnalyzer
- o.a.l.analysis.WhitespaceTokenizer -> o.a.l.analysis.core.WhitespaceTokenizer
- o.a.l.analysis.PorterStemFilter -> o.a.l.analysis.en.PorterStemFilter
- o.a.l.analysis.ASCIIFoldingFilter -> o.a.l.analysis.miscellaneous.ASCIIFoldingFilter
- o.a.l.analysis.ISOLatin1AccentFilter -> o.a.l.analysis.miscellaneous.ISOLatin1AccentFilter
- o.a.l.analysis.KeywordMarkerFilter -> o.a.l.analysis.miscellaneous.KeywordMarkerFilter
- o.a.l.analysis.LengthFilter -> o.a.l.analysis.miscellaneous.LengthFilter
- o.a.l.analysis.PerFieldAnalyzerWrapper -> o.a.l.analysis.miscellaneous.PerFieldAnalyzerWrapper
- o.a.l.analysis.TeeSinkTokenFilter -> o.a.l.analysis.sinks.TeeSinkTokenFilter
- o.a.l.analysis.CharFilter -> o.a.l.analysis.charfilter.CharFilter
- o.a.l.analysis.BaseCharFilter -> o.a.l.analysis.charfilter.BaseCharFilter
- o.a.l.analysis.MappingCharFilter -> o.a.l.analysis.charfilter.MappingCharFilter
- o.a.l.analysis.NormalizeCharMap -> o.a.l.analysis.charfilter.NormalizeCharMap
- o.a.l.analysis.CharArraySet -> o.a.l.analysis.util.CharArraySet
- o.a.l.analysis.CharArrayMap -> o.a.l.analysis.util.CharArrayMap
- o.a.l.analysis.ReusableAnalyzerBase -> o.a.l.analysis.util.ReusableAnalyzerBase
- o.a.l.analysis.StopwordAnalyzerBase -> o.a.l.analysis.util.StopwordAnalyzerBase
- o.a.l.analysis.WordListLoader -> o.a.l.analysis.util.WordListLoader
- o.a.l.analysis.CharTokenizer -> o.a.l.analysis.util.CharTokenizer
- o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils
All analyzers in contrib/analyzers and contrib/icu were moved to the
analysis/ module. The 'smartcn' and 'stempel' components now depend on 'common'.
(Chris Male, Robert Muir)
* LUCENE-4004: Add DisjunctionMaxQuery support to the xml query parser.
(Benson Margulies via Robert Muir)
* LUCENE-4025: Add maybeRefreshBlocking to ReferenceManager, to let a caller
block until the refresh logic has been executed. (Shai Erera, Mike McCandless)
* LUCENE-4039: Add AddIndexesTask to benchmark, which uses IW.addIndexes.
(Shai Erera)
* LUCENE-3514: Added IndexSearcher.searchAfter when Sort is used,
returning results after a specified FieldDoc for deep
paging. (Mike McCandless)
* LUCENE-4043: Added scoring support via score mode for query time joining.
(Martijn van Groningen, Mike McCandless)
* LUCENE-3523: Added oal.search.spell.WordBreakSpellChecker, which
generates suggestions by combining two or more terms and/or
breaking terms into multiple words. See Javadocs for usage. (James Dyer)
* LUCENE-4019: Added improved parsing of Hunspell Dictionaries so that
rules missing the required number of parameters are either ignored or
cause a ParseException (depending on whether strict parsing is enabled).
(Luca Cavanna via Chris Male)
* LUCENE-3440: Add ordered fragments feature with IDF-weighted terms for FVH.
(Sebastian Lutze via Koji Sekiguchi)
* LUCENE-4082: Added explain to ToParentBlockJoinQuery.
(Christoph Kaser, Martijn van Groningen)
* LUCENE-4108: add replaceTaxonomy to DirectoryTaxonomyWriter, which replaces
the taxonomy in place with the given one. (Shai Erera)
* LUCENE-3030: new BlockTree terms dictionary (used by the default
Lucene40 postings format) uses less RAM (for the terms index) and
disk space (for all terms and metadata) and gives sizable
performance gains for terms dictionary intensive operations like
FuzzyQuery, direct spell checker and primary-key lookup (Mike
McCandless).
Optimizations
* LUCENE-2588: Don't store unnecessary suffixes when writing the terms
index, saving RAM in IndexReader; change default terms index
interval from 128 to 32, because the terms index now requires much
less RAM. (Robert Muir, Mike McCandless)
* LUCENE-2669: Optimize NumericRangeQuery.NumericRangeTermsEnum to
not seek backwards when a sub-range has no terms. It now only seeks
when the current term is less than the next sub-range's lower end.
(Uwe Schindler, Mike McCandless)
* LUCENE-2694: Optimize MultiTermQuery to be single pass for Term lookups.
MultiTermQuery now stores TermState per leaf reader during rewrite to re-
seek the term dictionary in TermQuery / TermWeight.
(Simon Willnauer, Mike McCandless, Robert Muir)
* LUCENE-3292: IndexWriter no longer shares the same SegmentReader
instance for merging and NRT readers, which enables directory impls
to separately tune IO flags for each. (Varun Thacker, Simon
Willnauer, Mike McCandless)
* LUCENE-3328: BooleanQuery now uses a specialized ConjunctionScorer if all
boolean clauses are required and instances of TermQuery.
(Simon Willnauer, Robert Muir)
* LUCENE-3643: FilteredQuery and IndexSearcher.search(Query, Filter,...)
now optimize the special case query instanceof MatchAllDocsQuery to
execute as ConstantScoreQuery. (Uwe Schindler)
* LUCENE-3509: Added fasterButMoreRam option for docvalues. This option controls whether the space for packed ints
should be rounded up for better performance. This option only applies for docvalues types bytes fixed sorted
and bytes var sorted. (Simon Willnauer, Martijn van Groningen)
* LUCENE-3795: Replace contrib/spatial with modules/spatial. This includes
a basic spatial strategy interface. (David Smiley, Chris Male, ryan)
* LUCENE-3932: Lucene3x codec loads terms index faster, by
pre-allocating the packed ints array based on the .tii file size
(Sean Bridges via Mike McCandless)
* LUCENE-3468: Replaced last() and remove() with pollLast() in
FirstPassGroupingCollector (Martijn van Groningen)
* LUCENE-3830: Changed MappingCharFilter/NormalizeCharMap to use an
FST under the hood, which requires less RAM. NormalizeCharMap no
longer accepts empty string match (it did previously, but ignored
it). (Dawid Weiss, Mike McCandless)
* LUCENE-4061: improve synchronization in DirectoryTaxonomyWriter.addCategory
and a few general improvements to DirectoryTaxonomyWriter.
(Shai Erera, Gilad Barkai)
* LUCENE-4062: Add new aligned packed bits impls for faster lookup
performance; add float acceptableOverheadRatio to getWriter and
getMutable API to give packed ints freedom to pick faster
implementations (Adrien Grand via Mike McCandless)
* LUCENE-2357: Reduce transient RAM usage when merging segments in
IndexWriter. (Adrien Grand)
* LUCENE-4098: Add bulk get/set methods to PackedInts (Adrien Grand
via Mike McCandless)
* LUCENE-4156: DirectoryTaxonomyWriter.getSize is no longer synchronized.
(Shai Erera, Sivan Yogev)
* LUCENE-4163: Improve concurrency of MMapIndexInput.clone() by using
the new WeakIdentityMap on top of a ConcurrentHashMap to manage
the cloned instances. WeakIdentityMap was extended to support
iterating over its keys. (Uwe Schindler)
Bug fixes
* LUCENE-2803: The FieldCache can miss values if an entry for a reader
with more document deletions is requested before a reader with fewer
deletions, provided they share some segments. (yonik)
* LUCENE-2645: Fix false assertion error when same token was added one
after another with 0 posIncr. (David Smiley, Kurosaka Teruhiko via Mike
McCandless)
* LUCENE-3348: Fix thread safety hazards in IndexWriter that could
rarely cause deletions to be incorrectly applied. (Yonik Seeley,
Simon Willnauer, Mike McCandless)
* LUCENE-3515: Fix terrible merge performance versus 3.x, especially
when the directory isn't MMapDirectory, due to failing to reuse
DocsAndPositionsEnum while merging (Marc Sturlese, Erick Erickson,
Robert Muir, Simon Willnauer, Mike McCandless)
* LUCENE-3589: BytesRef copy(short) didn't set length.
(Peter Chang via Robert Muir)
* LUCENE-3045: fixed QueryNodeImpl.containsTag(String key) that was
not lowercasing the key before checking for the tag (Adriano Crestani)
* LUCENE-3890: Fixed NPE for grouped faceting on multi-valued fields.
(Michael McCandless, Martijn van Groningen)
* LUCENE-2945: Fix hashCode/equals for surround query parser generated queries.
(Paul Elschot, Simon Rosenthal, gsingers via ehatcher)
* LUCENE-3971: MappingCharFilter could return invalid final token position.
(Dawid Weiss, Robert Muir)
* LUCENE-3820: PatternReplaceCharFilter could return invalid token positions.
(Dawid Weiss)
* LUCENE-3969: Throw IAE on bad arguments that could cause confusing errors in
CompoundWordTokenFilterBase, PatternTokenizer, PositionFilter,
SnowballFilter, PathHierarchyTokenizer, ReversePathHierarchyTokenizer,
WikipediaTokenizer, and KeywordTokenizer. ShingleFilter and
CommonGramsFilter now populate PositionLengthAttribute. Fixed
PathHierarchyTokenizer to reset() all state. Protect against AIOOBE in
ReversePathHierarchyTokenizer if skip is large. Fixed wrong final
offset calculation in PathHierarchyTokenizer.
(Mike McCandless, Uwe Schindler, Robert Muir)
* LUCENE-4060: Fix a synchronization bug in
DirectoryTaxonomyWriter.addTaxonomies(). Also, the method has been renamed to
addTaxonomy and now takes only one Directory and one OrdinalMap.
(Shai Erera, Gilad Barkai)
* LUCENE-3590: Fix AIOOBE in BytesRef/CharsRef copyBytes/copyChars when
offset is nonzero, fix off-by-one in CharsRef.subSequence, and fix
CharsRef's CharSequence methods to throw exceptions in boundary cases
to properly meet the specification. (Robert Muir)
* LUCENE-4084: Attempting to reuse a single IndexWriterConfig instance
across more than one IndexWriter resulted in a cryptic exception.
This is now fixed, but requires that certain members of
IndexWriterConfig (MergePolicy, FlushPolicy,
DocumentsWriterThreadPool) implement clone. (Robert Muir, Simon
Willnauer, Mike McCandless)
* LUCENE-4079: Fixed loading of Hunspell dictionaries that use aliasing (AF rules)
(Ludovic Boutros via Chris Male)
* LUCENE-4077: Expose the max score and per-group scores from
ToParentBlockJoinCollector (Christoph Kaser, Mike McCandless)
* LUCENE-4114: Fix int overflow bugs in BYTES_FIXED_STRAIGHT and
BYTES_FIXED_DEREF doc values implementations (Walt Elder via Mike McCandless).
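The general class of bug can be shown in isolation. The sketch below assumes, purely for illustration, that a fixed-length value is addressed at docID * bytesPerDoc; the class FixedBytesOffsetSketch and both methods are hypothetical and are not taken from the doc values code.

```java
/** Hypothetical illustration of int overflow when computing a byte offset. */
final class FixedBytesOffsetSketch {
    /** Overflow-prone: the product is computed in 32-bit arithmetic, then widened. */
    static long offsetWithIntMath(int docID, int bytesPerDoc) {
        return docID * bytesPerDoc;
    }

    /** Safe: widening one operand first makes the whole product 64-bit. */
    static long offsetWithLongMath(int docID, int bytesPerDoc) {
        return (long) docID * bytesPerDoc;
    }
}
```

With docID = 100,000,000 and 32 bytes per document, the true offset (3,200,000,000) exceeds Integer.MAX_VALUE, so the 32-bit product wraps around to a negative number while the 64-bit product stays correct.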
* LUCENE-4147: Fixed thread safety issues when rollback() and commit()
are called simultaneously. (Simon Willnauer, Mike McCandless)
* LUCENE-4165: Removed closing of the Reader used to read the affix file in
HunspellDictionary. Consumers are now responsible for closing all InputStreams
once the Dictionary has been instantiated. (Torsten Krah, Uwe Schindler, Chris Male)
Documentation
* LUCENE-3958: Javadocs corrections for IndexWriter.
(Iulius Curt via Robert Muir)
Build
* LUCENE-4047: Cleanup of LuceneTestCase: moved blocks of initialization/ cleanup
code into JUnit instance and class rules. (Dawid Weiss)
* LUCENE-4016: Require ANT 1.8.2+ for the build.
* LUCENE-3808: Refactoring of testing infrastructure to use randomizedtesting
package: http://labs.carrotsearch.com/randomizedtesting.html (Dawid Weiss)
* LUCENE-3964: Added target stage-maven-artifacts, which stages
Maven release artifacts to a Maven staging repository in preparation
for release. (Steve Rowe)
* LUCENE-2845: Moved contrib/benchmark to lucene/benchmark.
* LUCENE-2995: Moved contrib/spellchecker into lucene/suggest.
* LUCENE-3285: Moved contrib/queryparser into lucene/queryparser
* LUCENE-3285: Moved contrib/xml-query-parser's demo into lucene/demo
* LUCENE-3271: Moved contrib/queries BooleanFilter, BoostingQuery,
ChainedFilter, FilterClause and TermsFilter into lucene/queries
* LUCENE-3381: Moved contrib/queries regex.*, DuplicateFilter,
FuzzyLikeThisQuery and SlowCollated* into lucene/sandbox.
Removed contrib/queries.
* LUCENE-3286: Moved remainder of contrib/xml-query-parser to lucene/queryparser.
Classes now found at org.apache.lucene.queryparser.xml.*
* LUCENE-4059: Improve ANT task prepare-webpages (used by documentation
tasks) to correctly encode build file names as URIs for later processing by
XSL. (Greg Bowyer, Uwe Schindler)
======================= Lucene 3.6.2 =======================
Bug Fixes
* LUCENE-4234: Exception when FacetsCollector is used with ScoreFacetRequest,
and the number of matching documents is too large. (Gilad Barkai via Shai Erera)
* LUCENE-2686, LUCENE-3505, LUCENE-4401: Fix BooleanQuery scorers to
return correct freq().
(Koji Sekiguchi, Mike McCandless, Liu Chao, Robert Muir)
* LUCENE-2501: Fixed rare thread-safety issue that could cause
ArrayIndexOutOfBoundsException inside ByteBlockPool (Robert Muir,
Mike McCandless)
* LUCENE-4297: BooleanScorer2 would multiply the coord() factor
twice for conjunctions: for most users this is no problem, but
if you had a customized Similarity that returned something other
than 1 when overlap == maxOverlap (always the case for conjunctions),
then the score would be incorrect. (Pascal Chollet, Robert Muir)
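Why the double multiplication went unnoticed for most users can be sketched with a toy model. The names CoordSketch, Coord, appliedOnce and appliedTwice are hypothetical, and the "plain" coord used in the usage note (overlap / maxOverlap) is an assumption standing in for a default Similarity.

```java
/** Hypothetical toy model of applying a coord factor once versus twice. */
final class CoordSketch {
    /** Stand-in for a Similarity's coord(overlap, maxOverlap) hook. */
    interface Coord {
        float coord(int overlap, int maxOverlap);
    }

    /** Intended behavior: the summed clause score is scaled by coord once. */
    static float appliedOnce(float sum, Coord c, int overlap, int maxOverlap) {
        return sum * c.coord(overlap, maxOverlap);
    }

    /** Buggy behavior: the factor is multiplied in a second time. */
    static float appliedTwice(float sum, Coord c, int overlap, int maxOverlap) {
        float factor = c.coord(overlap, maxOverlap);
        return sum * factor * factor;
    }
}
```

For a conjunction, overlap always equals maxOverlap. With a coord of overlap / maxOverlap the factor is 1 and both methods agree; with a custom coord that returns, say, 2 for a full match, the buggy path doubles the score a second time.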
* LUCENE-4300: BooleanQuery's rewrite was not always safe: if you
had a custom Similarity where coord(1,1) != 1F, then the rewritten
query would be scored differently. (Robert Muir)
* LUCENE-4398: If you indexed many different field names in your
documents then, due to a bug in how IndexWriter measured its RAM
usage, it would flush each segment too early, eventually
reaching the point where it flushed after every doc. (Tim Smith via
Mike McCandless)
* LUCENE-4411: when sampling is enabled for a FacetRequest, its depth
parameter is reset to the default (1), even if set otherwise.
(Gilad Barkai via Shai Erera)
* LUCENE-4635: Fixed ArrayIndexOutOfBoundsException when in-memory
terms index requires more than 2.1 GB RAM (indices with billions of
terms). (Tom Burton-West via Mike McCandless)
Documentation
* LUCENE-4302: Fix facet userguide to have HTML loose doctype like
all other javadocs. (Karl Nicholas via Uwe Schindler)
======================= Lucene 3.6.1 =======================
More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
https://wiki.apache.org/lucene-java/Lucene3.6.1
Bug Fixes
* LUCENE-3969: Throw IAE on bad arguments that could cause confusing
errors in KeywordTokenizer.
(Uwe Schindler, Mike McCandless, Robert Muir)
* LUCENE-3971: MappingCharFilter could return invalid final token position.
(Dawid Weiss, Robert Muir)
* LUCENE-4023: DisjunctionMaxScorer now implements visitSubScorers().
(Uwe Schindler)
* LUCENE-2566: + - operators allow any amount of whitespace (yonik, janhoy)
* LUCENE-3590: Fix AIOOBE in BytesRef/CharsRef copyBytes/copyChars when
offset is nonzero, fix off-by-one in CharsRef.subSequence, and fix
CharsRef's CharSequence methods to throw exceptions in boundary cases
to properly meet the specification. (Robert Muir)
* LUCENE-4222: TieredMergePolicy.getFloorSegmentMB was returning the
size in bytes not MB (Chris Fuller via Mike McCandless)
API Changes
* LUCENE-4023: Changed the visibility of Scorer#visitSubScorers() to
public, otherwise it's impossible to implement Scorers outside
the Lucene package. (Uwe Schindler)
Optimizations
* LUCENE-4163: Improve concurrency of MMapIndexInput.clone() by using
the new WeakIdentityMap on top of a ConcurrentHashMap to manage
the cloned instances. WeakIdentityMap was extended to support
iterating over its keys. (Uwe Schindler)
Tests
* LUCENE-3873: add MockGraphTokenFilter, testing analyzers with
random graph tokens. (Mike McCandless)
* LUCENE-3968: factor out LookaheadTokenFilter from
MockGraphTokenFilter (Mike McCandless)
======================= Lucene 3.6.0 =======================
More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
https://wiki.apache.org/lucene-java/Lucene3.6
Changes in backwards compatibility policy
* LUCENE-3594: The protected inner class (never intended to be visible)
FieldCacheTermsFilter.FieldCacheTermsFilterDocIdSet was removed and
replaced by another internal implementation. (Uwe Schindler)
* LUCENE-3620: FilterIndexReader now overrides all methods of IndexReader that
it should (note that some are still not overridden, as they should be
overridden by sub-classes only). In the process, some methods of IndexReader
were made final. This is not expected to affect many apps, since these methods
already delegate to abstract methods, which you had to already override
anyway. (Shai Erera)
* LUCENE-3636: Added SearcherFactory, used by SearcherManager and NRTManager
to create new IndexSearchers. You can provide your own implementation to
warm new searchers, set an ExecutorService, set a custom Similarity, or
even return your own subclass of IndexSearcher. The SearcherWarmer and
ExecutorService parameters on these classes were removed, as they are
subsumed by SearcherFactory. (Shai Erera, Mike McCandless, Robert Muir)
* LUCENE-3644: The expert ReaderFinishedListener api suffered problems (propagated
down to subreaders, but was not called on SegmentReaders, unless they were
the owner of the reader core, and other ambiguities). The API is revised:
You can set ReaderClosedListeners on any IndexReader, and onClose is called
when that reader is closed. SegmentReader has CoreClosedListeners that you
can register to know when a shared reader core is closed.
(Uwe Schindler, Mike McCandless, Robert Muir)
* LUCENE-3652: The package org.apache.lucene.messages was moved to
contrib/queryparser. If you have used those classes in your code
just add the lucene-queryparser.jar file to your classpath.
(Uwe Schindler)
* LUCENE-3681: FST now stores labels for BYTE2 input type as 2 bytes
instead of vInt; this can make FSTs smaller and faster, but it is a
break in the binary format so if you had built and saved any FSTs
then you need to rebuild them. (Robert Muir, Mike McCandless)
* LUCENE-3679: The expert IndexReader.getFieldNames(FieldOption) API
has been removed and replaced with the experimental getFieldInfos
API. All IndexReader subclasses must implement getFieldInfos.
(Mike McCandless)
* LUCENE-3695: Move confusing add(X) methods out of FST.Builder into
FST.Util. (Robert Muir, Mike McCandless)
* LUCENE-3701: Added an additional argument to the expert FST.Builder
ctor to take FreezeTail, which you can use to (very-expertly) customize
the FST construction process. Pass null if you want the default
behavior. Added seekExact() to FSTEnum, and added FST.save/read
from a File. (Mike McCandless, Dawid Weiss, Robert Muir)
* LUCENE-3712: Removed unused and untested ReaderUtil#subReader methods.
(Uwe Schindler)
* LUCENE-3672: Deprecate Directory.fileModified,
IndexCommit.getTimestamp and .getVersion and
IndexReader.lastModified and getCurrentVersion (Andrzej Bialecki,
Robert Muir, Mike McCandless)
* LUCENE-3760: In IndexReader/DirectoryReader, deprecate static
methods getCurrentVersion and getCommitUserData, and non-static
method getCommitUserData (use getIndexCommit().getUserData()
instead). (Ryan McKinley, Robert Muir, Mike McCandless)
* LUCENE-3867: Deprecate instance creation of RamUsageEstimator, instead
the new static method sizeOf(Object) should be used. As the algorithm
is now using Hotspot(TM) internals (reference size, header sizes,
object alignment), the abstract o.a.l.util.MemoryModel class was
completely removed (without replacement). The new static methods
no longer support String intern-ness checking; interned strings
now count toward memory usage like any other Java object.
(Dawid Weiss, Uwe Schindler, Shai Erera)
* LUCENE-3738: All readXxx methods in BufferedIndexInput were made
final. Subclasses should only override protected readInternal /
seekInternal. (Uwe Schindler)
* LUCENE-2599: Deprecated the spatial contrib module, which was buggy and not
well maintained. Lucene 4 includes a new spatial module that replaces this.
(David Smiley, Ryan McKinley, Chris Male)
Changes in Runtime Behavior
* LUCENE-3796, SOLR-3241: Throw an exception if you try to set an index-time
boost on a field that omits norms. Because the index-time boost
is multiplied into the norm, previously your boost would be
silently discarded. (Tomás Fernández Löbbe, Hoss Man, Robert Muir)
* LUCENE-3848: Fix tokenstreams to not produce a stream with an initial
position increment of 0: which is out of bounds (overlapping with a
non-existent previous term). Consumers such as IndexWriter and QueryParser
still check for and silently correct this situation today, but at some point
in the future they may throw an exception. (Mike McCandless, Robert Muir)
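The reason an initial increment of 0 is out of bounds can be sketched with simple arithmetic. The sketch assumes the position counter conceptually starts at -1 before the first token, so that a first increment of 1 lands on position 0; the class PositionSketch is a hypothetical name, not a Lucene class.

```java
/** Hypothetical sketch: absolute token positions derived from position increments. */
final class PositionSketch {
    /** Accumulates increments, starting from -1 (the slot "before the first token"). */
    static int[] positions(int[] positionIncrements) {
        int[] result = new int[positionIncrements.length];
        int position = -1;
        for (int i = 0; i < positionIncrements.length; i++) {
            position += positionIncrements[i];
            result[i] = position;
        }
        return result;
    }
}
```

Increments {1, 1, 0, 2} give positions {0, 1, 1, 3}: the 0 in the middle stacks a token on the previous one, which is legitimate (for example, a synonym). Increments {0, 1} give {-1, 0}: the first token would overlap a term that does not exist.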
* LUCENE-3738: DataInput/DataOutput no longer allow negative vLongs. Negative
vInts are still supported (for index backwards compatibility), but
should not be used in new code. The read method for negative vLongs
was already broken since Lucene 3.1.
(Uwe Schindler, Mike McCandless, Robert Muir)
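A variable-length encoding of this kind can be sketched as follows. This is an independent illustration of the common scheme (7 payload bits per byte, high bit set when more bytes follow), not a copy of DataOutput or DataInput; the class VLongSketch is a hypothetical name, and rejecting negative input mirrors the rule described above.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of a 7-bits-per-byte variable-length long encoding. */
final class VLongSketch {
    /** Encodes a non-negative long, low-order 7-bit groups first. */
    static byte[] write(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not supported: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0L) {
            out.write((int) ((value & 0x7FL) | 0x80L)); // continuation bit set
            value >>>= 7;
        }
        out.write((int) value); // last byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by write(). */
    static long read(byte[] bytes) {
        long value = 0L;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7FL) << shift;
            if ((b & 0x80) == 0) {
                break; // no continuation bit: this was the last byte
            }
            shift += 7;
        }
        return value;
    }
}
```

Small values stay small (127 takes one byte, 300 takes two), and the largest positive long fits in nine bytes. A negative long has its sign bit set, so it would need a tenth byte, which is one reason an encoding like this is a poor fit for negative numbers.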
Security fixes
* LUCENE-3588: Try harder to prevent SIGSEGV on cloned MMapIndexInputs:
Previous versions of Lucene could SIGSEGV the JVM if you try to access
the clone of an IndexInput retrieved from MMapDirectory. This security fix
prevents this as best as it can by throwing AlreadyClosedException
also on clones. (Uwe Schindler, Robert Muir)
API Changes
* LUCENE-3606: IndexReader will be made read-only in Lucene 4.0, so all
methods that allow deleting or undeleting documents using IndexReader were
deprecated; you should use IndexWriter now. Consequently
IndexReader.commit() and all open(), openIfChanged(), clone() methods
taking readOnly booleans (or IndexDeletionPolicy instances) were
deprecated. IndexReader.setNorm() is superfluous and was deprecated.
If you have to change per-document boost use CustomScoreQuery.
If you want to dynamically change norms (boost *and* length norm) at
query time, wrap your IndexReader using FilterIndexReader, overriding
FilterIndexReader.norms(). To persist the changes on disk, copy the
FilterIndexReader to a new index using IndexWriter.addIndexes().
In Lucene 4.0, SimilarityProvider will allow you to customize scoring
using external norms, too. (Uwe Schindler, Robert Muir)
* LUCENE-3735: PayloadProcessorProvider was changed to return a
ReaderPayloadProcessor instead of DirPayloadProcessor. The selection
of the provider to return for the factory is now based on the IndexReader
to be merged. To mimic the old behaviour, just use IndexReader.directory()
for choosing the provider by Directory. (Uwe Schindler)
* LUCENE-3765: Deprecated StopFilter ctor that took ignoreCase, because
in some cases (if the set is a CharArraySet), the argument is ignored.
Deprecated StandardAnalyzer and ClassicAnalyzer ctors that take File,
please use the Reader ctor instead. (Robert Muir)
* LUCENE-3766: Deprecate no-arg ctors of Tokenizer. Tokenizers are
TokenStreams with Readers: tokenizers with null Readers will not be
supported in Lucene 4.0, just use a TokenStream.
(Mike McCandless, Robert Muir)
* LUCENE-3769: Simplified NRTManager by requiring applyDeletes to be
passed to ctor only; if an app needs to mix and match it's free to
create two NRTManagers (one always applying deletes and the other
never applying deletes). (MJB, Shai Erera, Mike McCandless)
* LUCENE-3761: Generalize SearcherManager into an abstract ReferenceManager.
SearcherManager remains a concrete class, but due to the refactoring, the
method maybeReopen has been deprecated in favor of maybeRefresh().
(Shai Erera, Mike McCandless, Simon Willnauer)
* LUCENE-3776: You now acquire/release the IndexSearcher directly from
NRTManager. (Mike McCandless)
New Features
* LUCENE-3593: Added a FieldValueFilter that accepts all documents that either
have at least one or no value at all in a specific field. (Simon Willnauer,
Uwe Schindler, Robert Muir)
* LUCENE-3586: CheckIndex and IndexUpgrader allow you to specify the
specific FSDirectory implementation to use (with the new -dir-impl
command-line option). (Luca Cavanna via Mike McCandless)
* LUCENE-3634: IndexReader's static main method was moved to a new
tool, CompoundFileExtractor, in contrib/misc. (Robert Muir, Mike
McCandless)
* LUCENE-995: The QueryParser now interprets * as an open end for range
queries. Literal asterisks may be represented by quoting or escaping
(i.e. \* or "*"). Custom QueryParser subclasses overriding getRangeQuery()
will be passed null for any open endpoint. (Ingo Renner, Adriano
Crestani, yonik, Mike McCandless)
* LUCENE-3121: Add sugar reverse lookup (given an output, find the
input mapping to it) for FSTs that have strictly monotonic long
outputs (such as an ord). (Mike McCandless)
* LUCENE-3671: Add TypeTokenFilter that filters tokens based on
their TypeAttribute. (Tommaso Teofili via Uwe Schindler)
* LUCENE-3690,LUCENE-3913: Added HTMLStripCharFilter, a CharFilter that strips
HTML markup. (Steve Rowe)
* LUCENE-3725: Added optional packing to FST building; this uses extra
RAM during building but results in a smaller FST. (Mike McCandless)
* LUCENE-3714: Add top N shortest cost paths search for FST.
(Robert Muir, Dawid Weiss, Mike McCandless)
* LUCENE-3789: Expose MTQ TermsEnum via RewriteMethod for non package private
access (Simon Willnauer)
* LUCENE-3881: Added UAX29URLEmailAnalyzer: a standard analyzer that recognizes
URLs and emails. (Steve Rowe)
Bug fixes
* LUCENE-3595: Fixed FieldCacheRangeFilter and FieldCacheTermsFilter
to correctly respect deletions on reopened SegmentReaders. Factored out
FieldCacheDocIdSet to be a top-level class. (Uwe Schindler, Simon Willnauer)
* LUCENE-3627: Don't let an errant 0-byte segments_N file corrupt the index.
(Ken McCracken via Mike McCandless)
* LUCENE-3630: The internal method MultiReader.doOpenIfChanged(boolean doClone)
was overriding IndexReader.doOpenIfChanged(boolean readOnly), so changing the
contract of the overridden method. This method was renamed and made private.
In ParallelReader the bug was not existent, but the implementation method
was also made private. (Uwe Schindler)
* LUCENE-3641: Fixed MultiReader to correctly propagate readerFinishedListeners
to clones/reopened readers. (Uwe Schindler)
* LUCENE-3642, SOLR-2891, LUCENE-3717: Fixed bugs in CharTokenizer, n-gram tokenizers/filters,
compound token filters, thai word filter, icutokenizer, pattern analyzer,
wikipediatokenizer, and smart chinese where they would create invalid offsets in
some situations, leading to problems in highlighting.
(Max Beutel, Edwin Steiner via Robert Muir)
* LUCENE-3639: TopDocs.merge was incorrectly setting TopDocs.maxScore to
Float.MIN_VALUE when it should be Float.NaN, when there were 0
hits. Improved age calculation in SearcherLifetimeManager, to have
double precision and to compute age to be how long ago the searcher
was replaced with a new searcher (Mike McCandless)
* LUCENE-3658: Corrected potential concurrency issues with
NRTCachingDir, fixed createOutput to overwrite any previous file,
and removed invalid asserts (Robert Muir, Mike McCandless)
* LUCENE-3605: don't sleep in a retry loop when trying to locate the
segments_N file (Robert Muir, Mike McCandless)
* LUCENE-3711: SentinelIntSet with a small initial size can go into
an infinite loop when expanded. This can affect grouping using
TermAllGroupsCollector or TermAllGroupHeadsCollector if instantiated with a
non default small size. (Martijn van Groningen, yonik)
* LUCENE-3727: When writing stored fields and term vectors, Lucene
checks file sizes to detect a bug in some Sun JREs (LUCENE-1282).
However, on some NFS filesystems File.length() could be stale,
resulting in false errors like "fdx size mismatch while indexing".
These checks now use getFilePointer instead to avoid this.
(Jamir Shaikh, Mike McCandless, Robert Muir)
* LUCENE-3816: Fixed problem in FilteredDocIdSet, if null was returned
from the delegate DocIdSet.iterator(), which is allowed to return
null by DocIdSet specification when no documents match.
(Shay Banon via Uwe Schindler)
* LUCENE-3821: SloppyPhraseScorer missed documents that ExactPhraseScorer finds.
When a phrase query had repeating terms (e.g. "yes no yes"), the
sloppy query missed documents that the exact query matched.
Fixed except for repeating multiterms (e.g. "yes no yes|no").
(Robert Muir, Doron Cohen)
* LUCENE-3841: Fix CloseableThreadLocal to also purge stale entries on
get(); this fixes certain cases where we were holding onto objects
for dead threads for too long (Matthew Bellew, Mike McCandless)
* LUCENE-3872: IndexWriter.close() now throws IllegalStateException if
you call it after calling prepareCommit() without calling commit()
first. (Tim Bogaert via Mike McCandless)
* LUCENE-3874: Throw IllegalArgumentException from IndexWriter (rather
than producing a corrupt index), if a positionIncrement would cause
integer overflow. This can happen, for example when using a buggy
TokenStream that forgets to call clearAttributes() in combination
with a StopFilter. (Robert Muir)
* LUCENE-3876: Fix bug where positions for a document exceeding
Integer.MAX_VALUE/2 would produce a corrupt index.
(Simon Willnauer, Mike McCandless, Robert Muir)
* LUCENE-3880: UAX29URLEmailTokenizer now recognizes emails when the mailto:
scheme is prepended. (Kai Gülzau, Steve Rowe)
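Illustration for LUCENE-3874 above: a minimal Java sketch of rejecting a
position increment that would overflow an int, rather than letting the
position wrap around silently. This is not IndexWriter's code; the class
PositionOverflowSketch and the method advance() are hypothetical names used
only for this example.

```java
// Hypothetical sketch, not the IndexWriter implementation.
class PositionOverflowSketch {
    /** Returns position + increment, or throws if the sum does not fit in an int. */
    static int advance(int position, int increment) {
        if (increment < 0) {
            throw new IllegalArgumentException(
                "position increment must be >= 0, got " + increment);
        }
        // Add in long arithmetic so an overflow is detectable instead of wrapping negative.
        long next = (long) position + increment;
        if (next > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "position overflow: " + position + " + " + increment);
        }
        return (int) next;
    }

    public static void main(String[] args) {
        System.out.println(advance(10, 5));
        try {
            advance(Integer.MAX_VALUE - 1, 2);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast here means the caller sees the buggy TokenStream at indexing
time instead of discovering a corrupt index later.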
Optimizations
* LUCENE-3653: Improve concurrency in VirtualMethod and AttributeSource by
using a WeakIdentityMap based on a ConcurrentHashMap. (Uwe Schindler,
Gerrit Jansen van Vuuren)
Documentation
* LUCENE-3597: Fixed incorrect grouping documentation. (Martijn van Groningen,
Robert Muir)
* LUCENE-3926: Improve documentation of RAMDirectory, because this
class is not intended to work with huge indexes. Everything beyond
several hundred megabytes will waste resources (GC cycles), because
it uses an internal buffer size of 1024 bytes, producing millions of
byte[1024] arrays. This class is optimized for small memory-resident
indexes. It also has bad concurrency on multithreaded environments.
It is recommended to materialize large indexes on disk and use
MMapDirectory, which is a high-performance directory implementation
working directly on the file system cache of the operating system,
so copying data to Java heap space is not useful. (Uwe Schindler,
Mike McCandless, Robert Muir)
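Illustration for LUCENE-3926 above: a small Java sketch of the arithmetic
behind the warning. It only counts how many 1024-byte arrays are needed to
hold an index of a given size; the class RamBufferCountSketch and the method
buffersNeeded() are hypothetical names, and nothing here touches RAMDirectory
itself.

```java
// Hypothetical sketch of the buffer-count arithmetic only.
class RamBufferCountSketch {
    // Buffer size stated in the LUCENE-3926 entry.
    static final int BUFFER_SIZE = 1024;

    /** Number of byte[BUFFER_SIZE] arrays needed to hold indexBytes bytes. */
    static long buffersNeeded(long indexBytes) {
        if (indexBytes < 0) {
            throw new IllegalArgumentException("indexBytes must be >= 0");
        }
        // Ceiling division: a partially filled last buffer still costs a whole array.
        return (indexBytes + BUFFER_SIZE - 1) / BUFFER_SIZE;
    }

    public static void main(String[] args) {
        long halfGiB = 512L << 20;
        long fourGiB = 4L << 30;
        System.out.println("512 MiB -> " + buffersNeeded(halfGiB) + " arrays");
        System.out.println("4 GiB   -> " + buffersNeeded(fourGiB) + " arrays");
    }
}
```

Each of those arrays is a separate heap object that the garbage collector
has to track, which is where the GC cost mentioned in the entry comes from.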
Build
* LUCENE-3857: exceptions from other threads in beforeclass/etc do not fail
the test (Dawid Weiss)
* LUCENE-3847: LuceneTestCase will now check for modifications of System
properties before and after each test (and suite). If changes are detected,
the test will fail. A rule can be used to reset system properties to
before-scope state (and this has been used to make Solr tests pass).
(Dawid Weiss, Uwe Schindler).
* LUCENE-3228: Stop downloading external javadoc package-list files:
- Added package-list files for Oracle Java javadocs and JUnit javadocs to
Lucene/Solr subversion.
- The Oracle Java javadocs package-list file is excluded from Lucene and
Solr source release packages.
- Regardless of network connectivity, javadocs built from a subversion
checkout contain links to Oracle & JUnit javadocs.
- Building javadocs from a source release package will download the Oracle
Java package-list file if it isn't already present.
- When the Oracle Java package-list file is not present and download fails,
the javadocs targets will not fail the build, though an error will appear
in the build log. In this case, the built javadocs will not contain links
to Oracle Java javadocs.
- Links from Solr javadocs to Lucene's javadocs are enabled. When building
a X.Y.Z-SNAPSHOT version, the links are to the most recently built nightly
Jenkins javadocs. When building a release version, links are to the
Lucene release javadocs for the same version.
(Steve Rowe, hossman)
* LUCENE-3753: Restructure the Lucene build system:
- Created a new Lucene-internal module named "core" by moving the java/
and test/ directories from lucene/src/ to lucene/core/src/.
- Eliminated lucene/src/ by moving all its directories up one level.
- Each internal module (core/, test-framework/, and tools/) now has its own
build.xml, from which it is possible to run module-specific targets.
lucene/build.xml delegates all build tasks (via
<ant dir="internal-module-dir"> calls) to these modules' build.xml files.
(Steve Rowe)
* LUCENE-3774: Optimized and streamlined license and notice file validation
by refactoring the build task into an ANT task and modifying build scripts
to perform top-level checks. (Dawid Weiss, Steve Rowe, Robert Muir)
* LUCENE-3762: Upgrade JUnit to 4.10, refactor state-machine of detecting
setUp/tearDown call chaining in LuceneTestCase. (Dawid Weiss, Robert Muir)
* LUCENE-3944: Make the 'generate-maven-artifacts' target use filtered POMs
placed under lucene/build/poms/, rather than in each module's base
directory. The 'clean' target now removes them.
(Steve Rowe, Robert Muir)
* LUCENE-3930: Changed build system to use Apache Ivy for retrieval of 3rd
party JAR files. Please review BUILD.txt for instructions.
(Robert Muir, Chris Male, Uwe Schindler, Steven Rowe, Hossman)
======================= Lucene 3.5.0 =======================
Changes in backwards compatibility policy
* LUCENE-3390: The first approach in Lucene 3.4.0 for missing values
support for sorting had a design problem that made the missing value
be populated directly into the FieldCache arrays during sorting,
leading to concurrency issues. To fix this behaviour, the method
signatures had to be changed:
- FieldCache.getUnValuedDocs() was renamed to FieldCache.getDocsWithField()
returning a Bits interface (backported from Lucene 4.0).
- FieldComparator.setMissingValue() was removed and added to
constructor
As this is expert API, most code will not be affected.
(Uwe Schindler, Doron Cohen, Mike McCandless)
* LUCENE-3541: Remove IndexInput's protected copyBuf. If you want to
keep a buffer in your IndexInput, do this yourself in your implementation,
and be sure to do the right thing on clone()! (Robert Muir)
* LUCENE-2822: TimeLimitingCollector now expects a counter clock instead of
relying on a private daemon thread. The global time limiting clock thread
has been exposed and is now lazily loaded and fully optional.
TimeLimitingCollector now supports setting clock baseline manually to include
prelude of a search. Previous versions set the baseline at construction time;
now the baseline is set once the first IndexReader is passed to the collector
unless set before. (Simon Willnauer)
Changes in runtime behavior
* LUCENE-3520: IndexReader.openIfChanged, when passed a near-real-time
reader, will now return null if there are no changes. The API has
always reserved the right to do this; it's just that in the past for
near-real-time readers it never did. (Mike McCandless)
Bug fixes
* LUCENE-3412: SloppyPhraseScorer was returning non-deterministic results
for queries with many repeats (Doron Cohen)
* LUCENE-3421: PayloadTermQuery's explain was wrong when includeSpanScore=false.
(Edward Drapkin via Robert Muir)
* LUCENE-3432: IndexWriter.expungeDeletes with TieredMergePolicy
should ignore the maxMergedSegmentMB setting (v.sevel via Mike
McCandless)
* LUCENE-3442: TermQuery.TermWeight.scorer() returns null for non-atomic
IndexReaders (optimization bug, introduced by LUCENE-2829), preventing
QueryWrapperFilter and similar classes from getting a top-level DocIdSet.
(Dan C., Uwe Schindler)
* LUCENE-3390: Corrected handling of missing values when two parallel searches
using different missing values for sorting: the missing value was populated
directly into the FieldCache arrays during sorting, leading to concurrency
issues. (Uwe Schindler, Doron Cohen, Mike McCandless)
* LUCENE-3439: Closing an NRT reader after the writer was closed was
incorrectly invoking the DeletionPolicy and (then possibly deleting
files) on the closed IndexWriter (Robert Muir, Mike McCandless)
* LUCENE-3215: SloppyPhraseScorer sometimes computed Infinite freq
(Robert Muir, Doron Cohen)
* LUCENE-3503: DisjunctionSumScorer would give slightly different scores
for a document depending on whether you used nextDoc() or advance().
(Mike McCandless, Robert Muir)
* LUCENE-3529: Properly support indexing an empty field with empty term text.
Previously, if you had assertions enabled you would receive an error during
flush, if you didn't, you would get an invalid index.
(Mike McCandless, Robert Muir)
* LUCENE-2633: PackedInts Packed32 and Packed64 did not support internal
structures larger than 256MB (Toke Eskildsen via Mike McCandless)
* LUCENE-3540: LUCENE-3255 dropped support for pre-1.9 indexes, but the
error message in IndexFormatTooOldException was incorrect. (Uwe Schindler,
Mike McCandless)
* LUCENE-3541: IndexInput's default copyBytes() implementation was not safe
across multiple threads, because all clones shared the same buffer.
(Robert Muir)
* LUCENE-3548: Fix CharsRef#append to extend length of the existing char[]
and preserve existing chars. (Simon Willnauer)
* LUCENE-3582: Normalize NaN values in NumericUtils.floatToSortableInt() /
NumericUtils.doubleToSortableLong(), so this is consistent with stored
fields. Also fix NumericRangeQuery to not falsely hit NaNs on half-open
ranges (one bound is null). Because of normalization, NumericRangeQuery
can now be used to hit NaN values by creating a query with
upper == lower == NaN (inclusive). (Dawid Weiss, Uwe Schindler)
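Illustration for LUCENE-3582 above: a minimal Java sketch of why NaN
normalization matters for a sortable int encoding. It is an assumption that
this mirrors the idea behind NumericUtils.floatToSortableInt(); the class
SortableFloatSketch and the method toSortableInt() are hypothetical names,
and the code is not copied from Lucene.

```java
// Hypothetical sketch of a NaN-normalizing sortable encoding.
class SortableFloatSketch {
    /** Maps a float to an int whose signed order matches the float order. */
    static int toSortableInt(float value) {
        // floatToIntBits collapses every NaN bit pattern to the canonical 0x7fc00000;
        // floatToRawIntBits would keep the NaN payload and yield many distinct values.
        int bits = Float.floatToIntBits(value);
        // Negative floats have the sign bit set; flipping the lower 31 bits
        // reverses their order so that more negative values sort first.
        if (bits < 0) {
            bits ^= 0x7fffffff;
        }
        return bits;
    }

    public static void main(String[] args) {
        float nanWithPayload = Float.intBitsToFloat(0x7fc00001);
        System.out.println(toSortableInt(Float.NaN) == toSortableInt(nanWithPayload));
        System.out.println(toSortableInt(-1f) < toSortableInt(1f));
    }
}
```

Because all NaNs share one encoded value in this sketch, an inclusive range
whose lower and upper bounds are both NaN selects exactly that value, which
is the behavior the entry describes.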
API Changes
* LUCENE-3454: Rename IndexWriter.optimize to forceMerge to discourage
use of this method since it is horribly costly and rarely justified
anymore. MergePolicy.findMergesForOptimize was renamed to
findForcedMerges. IndexReader.isOptimized was
deprecated. IndexCommit.isOptimized was replaced with
getSegmentCount. (Robert Muir, Mike McCandless)
* LUCENE-3205: Deprecated MultiTermQuery.getTotalNumberOfTerms() [and
related methods], as the numbers returned are not useful
for multi-segment indexes. They were only needed for tests of
NumericRangeQuery. (Mike McCandless, Uwe Schindler)
* LUCENE-3574: Deprecate outdated constants in org.apache.lucene.util.Constants
and add new ones for Java 6 and Java 7. (Uwe Schindler)
* LUCENE-3571: Deprecate IndexSearcher(Directory). Use the constructors
that take IndexReader instead. (Robert Muir)
* LUCENE-3577: Rename IndexWriter.expungeDeletes to forceMergeDeletes,
and revamped the javadocs, to discourage
use of this method since it is horribly costly and rarely
justified. MergePolicy.findMergesToExpungeDeletes was renamed to
findForcedDeletesMerges. (Robert Muir, Mike McCandless)
* LUCENE-3464: IndexReader.reopen has been renamed to
IndexReader.openIfChanged (a static method), and now returns null
(instead of the old reader) if there are no changes in the index, to
prevent the common pitfall of accidentally closing the old reader.
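Illustration for LUCENE-3464 above: a self-contained Java sketch of the
caller-side pattern that the null return value is meant to encourage. The
Reader class, the indexVersion counter, and the refresh() helper are
hypothetical stand-ins so the example runs without Lucene; only the contract
(null means "no changes, keep the old reader") is taken from the entry.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins; this is not the IndexReader implementation.
class OpenIfChangedSketch {
    /** Stand-in for the index: the version advances whenever the index changes. */
    static final AtomicLong indexVersion = new AtomicLong(1);

    /** Stand-in for a reader: remembers the index version it was opened at. */
    static final class Reader {
        final long version;
        boolean closed;

        Reader(long version) {
            this.version = version;
        }

        void close() {
            closed = true;
        }
    }

    /** Follows the described contract: null if nothing changed, else a new reader. */
    static Reader openIfChanged(Reader old) {
        long current = indexVersion.get();
        return current == old.version ? null : new Reader(current);
    }

    /** Caller-side pattern: close the old reader only when a new one was returned. */
    static Reader refresh(Reader reader) {
        Reader newer = openIfChanged(reader);
        if (newer == null) {
            return reader; // no changes: keep the old reader open
        }
        reader.close();
        return newer;
    }

    public static void main(String[] args) {
        Reader reader = new Reader(indexVersion.get());
        reader = refresh(reader);           // unchanged index: same reader comes back
        indexVersion.incrementAndGet();     // simulate a commit
        reader = refresh(reader);           // changed index: a new reader comes back
        System.out.println("reader version: " + reader.version);
    }
}
```

With the old reopen() contract the same object could come back, so code that
unconditionally closed the "old" reader would also close the one it was about
to use; the null check makes that mistake harder to write.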
New Features
* LUCENE-3448: Added FixedBitSet.and(other/DISI), andNot(other/DISI).
(Uwe Schindler)
* LUCENE-2215: Added IndexSearcher.searchAfter which returns results after a
specified ScoreDoc (e.g. last document on the previous page) to support deep
paging use cases. (Aaron McCurry, Grant Ingersoll, Robert Muir)
* LUCENE-1990: Adds internal packed ints implementation, to be used
for more efficient storage of int arrays when the values are
bounded, for example for storing the terms dict index (Toke
Eskildsen via Mike McCandless)
* LUCENE-3558: Moved SearcherManager, NRTManager & SearcherLifetimeManager into
core. All classes are contained in o.a.l.search. (Simon Willnauer)
Optimizations
* LUCENE-3426: Add NGramPhraseQuery which extends PhraseQuery and tries to
reduce the number of terms of the query when rewrite(), in order to improve
performance. (Robert Muir, Koji Sekiguchi)
* LUCENE-3494: Optimize FilteredQuery to remove a multiply in score()
(Uwe Schindler, Robert Muir)
* LUCENE-3534: Remove filter logic from IndexSearcher and delegate to
FilteredQuery's Scorer. This is a partial backport of a cleanup in
FilteredQuery/IndexSearcher added by LUCENE-1536 to Lucene 4.0.
(Uwe Schindler)
* LUCENE-2205: Very substantial (3-5X) RAM reduction required to hold
the terms index on opening an IndexReader (Aaron McCurry via Mike McCandless)
* LUCENE-3443: FieldCache can now set docsWithField, and create an
array, in a single pass. This results in faster init time for apps
that need both (such as sorting by a field with a missing value).
(Mike McCandless)
Test Cases
* LUCENE-3420: Disable the finalness checks in TokenStream and Analyzer
for implementing subclasses in different packages, where assertions are not
enabled. (Uwe Schindler)
* LUCENE-3506: tests relying on assertions being enabled were no-op because
they ignored AssertionError. With this fix, the entire test framework
(every test) fails if assertions are disabled, unless
-Dtests.asserts.gracious=true is specified. (Doron Cohen)
Build
* SOLR-2849: Fix dependencies in Maven POMs. (David Smiley via Steve Rowe)
* LUCENE-3561: Fix maven xxx-src.jar files that were missing resources.
(Uwe Schindler)
======================= Lucene 3.4.0 =======================
Bug fixes
* LUCENE-3251: Directory#copy failed to close target output if opening the
source stream failed. (Simon Willnauer)
* LUCENE-3255: If segments_N file is all zeros (due to file
corruption), don't read that to mean the index is empty. (Gregory
Tarr, Mark Harwood, Simon Willnauer, Mike McCandless)
* LUCENE-3254: Fixed minor bug in how deletes were written to disk,
causing the file to sometimes be larger than it needed to be. (Mike
McCandless)
* LUCENE-3224: Fixed a bug where CheckIndex would incorrectly report a
corrupt index if a term with docfreq >= 16 was indexed more than once
at the same position. (Robert Muir)
* LUCENE-3339: Fixed deadlock case when multiple threads use the new
block-add (IndexWriter.add/updateDocuments) methods. (Robert Muir,
Mike McCandless)
* LUCENE-3340: Fixed case where IndexWriter was not flushing at
exactly maxBufferedDeleteTerms (Mike McCandless)
* LUCENE-3358, LUCENE-3361: StandardTokenizer and UAX29URLEmailTokenizer
wrongly discarded combining marks attached to Han or Hiragana characters.
This is fixed if you supply Version >= 3.4. If you supply a previous
lucene version, you get the old buggy behavior for backwards compatibility.
(Trejkaz, Robert Muir)
* LUCENE-3368: IndexWriter commits segments without applying their buffered
deletes when flushing concurrently. (Simon Willnauer, Mike McCandless)
* LUCENE-3365: Create or Append mode determined before obtaining write lock
can cause IndexWriter to overwrite an existing index.
(Geoff Cooney via Simon Willnauer)
* LUCENE-3380: Fixed a bug where FileSwitchDirectory's listAll() would wrongly
throw NoSuchDirectoryException when all files written so far have been
written to one directory, but the other still has not yet been created on the
filesystem. (Robert Muir)
* LUCENE-3409: IndexWriter.deleteAll was failing to close pooled NRT
SegmentReaders, leading to unused files accumulating in the
Directory. (tal steier via Mike McCandless)
* LUCENE-3418: Lucene was failing to fsync index files on commit,
meaning an operating system or hardware crash, or power loss, could
easily corrupt the index. (Mark Miller, Robert Muir, Mike
McCandless)
New Features
* LUCENE-3290: Added FieldInvertState.numUniqueTerms
(Mike McCandless, Robert Muir)
* LUCENE-3280: Add FixedBitSet, like OpenBitSet but not elastic (it does
not grow on demand if you set/get/clear too-large indices). (Mike
McCandless)
* LUCENE-2048: Added the ability to omit positions but still index
term frequencies, you can now control what is indexed into
the postings via AbstractField.setIndexOptions:
DOCS_ONLY: only documents are indexed: term frequencies and positions are omitted
DOCS_AND_FREQS: only documents and term frequencies are indexed: positions are omitted
DOCS_AND_FREQS_AND_POSITIONS: full postings: documents, frequencies, and positions
AbstractField.setOmitTermFrequenciesAndPositions is deprecated,
you should use DOCS_ONLY instead. (Robert Muir)
* LUCENE-3097: Added a new grouping collector that can be used to retrieve all most relevant
documents per group. This can be useful in situations when one wants to compute grouping
based facets / statistics on the complete query result. (Martijn van Groningen)
* LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log
suppressed exceptions in the original exception, so stack trace
will contain them. (Uwe Schindler)
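Illustration for LUCENE-3334 above: a minimal Java 7+ sketch of attaching
later close() failures to the first exception as suppressed exceptions, so
that they appear in its stack trace. This is not the IOUtils.closeSafely()
implementation; the class CloseSuppressedSketch and the method closeAll()
are hypothetical names. Only Throwable.addSuppressed() is a real JDK call.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch, not the IOUtils implementation.
class CloseSuppressedSketch {
    /**
     * Closes every resource, even if an earlier close() fails. The first
     * failure is rethrown with all later failures attached as suppressed.
     */
    static void closeAll(Closeable... resources) throws IOException {
        IOException first = null;
        for (Closeable resource : resources) {
            try {
                if (resource != null) {
                    resource.close();
                }
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e); // available since Java 7
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) {
        Closeable failsA = () -> { throw new IOException("close A failed"); };
        Closeable failsB = () -> { throw new IOException("close B failed"); };
        try {
            closeAll(failsA, failsB);
        } catch (IOException e) {
            System.out.println("thrown: " + e.getMessage());
            for (Throwable s : e.getSuppressed()) {
                System.out.println("suppressed: " + s.getMessage());
            }
        }
    }
}
```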
Optimizations
* LUCENE-3201, LUCENE-3218: CompoundFileSystem code has been consolidated
into a Directory implementation. Reading is optimized for MMapDirectory,
NIOFSDirectory and SimpleFSDirectory to only map requested parts of the
CFS into an IndexInput. Writing to a CFS now tries to append to the CF
directly if possible and merges separately written files on the fly instead
of during close. (Simon Willnauer, Robert Muir)
* LUCENE-3289: When building an FST you can now tune how aggressively
the FST should try to share common suffixes. Typically you can
greatly reduce RAM required during building, and CPU consumed, at
the cost of a somewhat larger FST. (Mike McCandless)
Test Cases
* LUCENE-3327: Fix AIOOBE when TestFSTs is run with -Dtests.verbose=true
(James Dyer via Mike McCandless)
Build
* LUCENE-3406: Add ant target 'package-local-src-tgz' to Lucene and Solr
to package sources from the local working copy.
(Seung-Yeoul Yang via Steve Rowe)
======================= Lucene 3.3.0 =======================
Changes in backwards compatibility policy
* LUCENE-3140: IndexOutput.copyBytes now takes a DataInput (superclass
of IndexInput) as its first argument. (Robert Muir, Dawid Weiss,
Mike McCandless)
* LUCENE-3191: FieldComparator.value now returns an Object not
Comparable; FieldDoc.fields also changed from Comparable[] to
Object[] (Uwe Schindler, Mike McCandless)
* LUCENE-3208: Made deprecated methods Query.weight(Searcher) and
Searcher.createWeight() final to prevent override. If you have
overridden one of these methods, cut over to the non-deprecated
implementation. (Uwe Schindler, Robert Muir, Yonik Seeley)
* LUCENE-3238: Made MultiTermQuery.rewrite() final, to prevent
problems (such as not properly setting rewrite methods, or
not working correctly with things like SpanMultiTermQueryWrapper).
To rewrite to a simpler form, instead return a simpler enum
from getEnum(IndexReader). For example, to rewrite to a single term,
return a SingleTermEnum. (ludovic Boutros, Uwe Schindler, Robert Muir)
Changes in runtime behavior
* LUCENE-2834: the hash used to compute the lock file name when the
lock file is not stored in the index has changed. This means you
will see a different lucene-XXX-write.lock in your lock directory.
(Robert Muir, Uwe Schindler, Mike McCandless)
* LUCENE-3146: IndexReader.setNorm throws IllegalStateException if the field
does not store norms. (Shai Erera, Mike McCandless)
* LUCENE-3198: On Linux, if the JRE is 64 bit and supports unmapping,
FSDirectory.open now defaults to MMapDirectory instead of
NIOFSDirectory since MMapDirectory gives better performance. (Mike
McCandless)
* LUCENE-3200: MMapDirectory now uses chunk sizes that are powers of 2.
When setting the chunk size, it is rounded down to the next possible
value. The new default value for 64 bit platforms is 2^30 (1 GiB),
for 32 bit platforms it stays unchanged at 2^28 (256 MiB).
Internally, MMapDirectory now only uses one dedicated final IndexInput
implementation supporting multiple chunks, which makes Hotspot's life
easier. (Uwe Schindler, Robert Muir, Mike McCandless)
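Illustration for LUCENE-3200 above: a minimal Java sketch of rounding a
requested chunk size down to a power of 2. Whether MMapDirectory computes it
this way is an assumption; the class ChunkSizeSketch and the method
roundDownToPowerOfTwo() are hypothetical names. The two default constants
are the values stated in the entry.

```java
// Hypothetical sketch, not the MMapDirectory implementation.
class ChunkSizeSketch {
    static final int DEFAULT_64_BIT = 1 << 30; // 2^30 bytes = 1 GiB
    static final int DEFAULT_32_BIT = 1 << 28; // 2^28 bytes = 256 MiB

    /** Largest power of 2 that is less than or equal to requested. */
    static int roundDownToPowerOfTwo(int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException("chunk size must be > 0, got " + requested);
        }
        // highestOneBit keeps only the most significant set bit of the argument.
        return Integer.highestOneBit(requested);
    }

    public static void main(String[] args) {
        System.out.println(roundDownToPowerOfTwo(1000));
        System.out.println(roundDownToPowerOfTwo(DEFAULT_64_BIT));
    }
}
```

With a power-of-2 chunk size, the chunk index and the offset within a chunk
can be derived from a file position with a shift and a mask instead of a
division and a modulo.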
Bug fixes
* LUCENE-3147,LUCENE-3152: Fixed open file handles leaks in many places in the
code. Now MockDirectoryWrapper (in test-framework) tracks all open files,
including locks, and fails if the test fails to release all of them.
(Mike McCandless, Robert Muir, Shai Erera, Simon Willnauer)
* LUCENE-3102: CachingCollector.replay was failing to call setScorer
per-segment (Martijn van Groningen via Mike McCandless)
* LUCENE-3183: Fix rare corner case where seeking to empty term
(field="", term="") with terms index interval 1 could hit
ArrayIndexOutOfBoundsException (selckin, Robert Muir, Mike
McCandless)
* LUCENE-3208: IndexSearcher had its own private similarity field
and corresponding get/setter overriding Searcher's implementation. If you
set a different Similarity instance on IndexSearcher, methods implemented
in the superclass Searcher were not using it, leading to strange bugs.
(Uwe Schindler, Robert Muir)
* LUCENE-3197: Fix core merge policies to not over-merge during
background optimize when documents are still being deleted
concurrently with the optimize (Mike McCandless)
* LUCENE-3222: The RAM accounting for buffered delete terms was
failing to measure the space required to hold the term's field and
text character data. (Mike McCandless)
* LUCENE-3238: Fixed bug where using WildcardQuery("prefix*") inside
of a SpanMultiTermQueryWrapper rewrote incorrectly and returned
an error instead. (ludovic Boutros, Uwe Schindler, Robert Muir)
API Changes
* LUCENE-3208: Renamed protected IndexSearcher.createWeight() to expert
public method IndexSearcher.createNormalizedWeight() as this better describes
what this method does. The old method is still there for backwards
compatibility. Query.weight() was deprecated and simply delegates to
IndexSearcher. Both deprecated methods will be removed in Lucene 4.0.
(Uwe Schindler, Robert Muir, Yonik Seeley)
* LUCENE-3197: MergePolicy.findMergesForOptimize now takes
Map<SegmentInfo,Boolean> instead of Set<SegmentInfo> as the second
argument, so the merge policy knows which segments were originally
present vs produced by an optimizing merge (Mike McCandless)
Optimizations
* LUCENE-1736: DateTools.java general improvements.
(David Smiley via Steve Rowe)
New Features
* LUCENE-3140: Added experimental FST implementation to Lucene.
(Robert Muir, Dawid Weiss, Mike McCandless)
* LUCENE-3193: A new TwoPhaseCommitTool allows running a 2-phase commit
algorithm over objects that implement the new TwoPhaseCommit interface (such
as IndexWriter). (Shai Erera)
* LUCENE-3191: Added TopDocs.merge, to facilitate merging results from
different shards (Uwe Schindler, Mike McCandless)
* LUCENE-3179: Added OpenBitSet.prevSetBit (Paul Elschot via Mike McCandless)
* LUCENE-3210: Made TieredMergePolicy more aggressive in reclaiming
segments with deletions; added new methods
set/getReclaimDeletesWeight to control this. (Mike McCandless)
Build
* LUCENE-1344: Create OSGi bundle using dev-tools/maven.
(Nicolas Lalevée, Luca Stancapiano via ryan)
* LUCENE-3204: The maven-ant-tasks jar is now included in the source tree;
users of the generate-maven-artifacts target no longer have to manually
place this jar in the Ant classpath. NOTE: when Ant looks for the
maven-ant-tasks jar, it looks first in its pre-existing classpath, so
any copies it finds will be used instead of the copy included in the
Lucene/Solr source tree. For this reason, it is recommended to remove
any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under
~/.ant/lib/ or under the Ant installation's lib/ directory. (Steve Rowe)
======================= Lucene 3.2.0 =======================
Changes in backwards compatibility policy
* LUCENE-2953: PriorityQueue's internal heap was made private, as subclassing
with generics can lead to ClassCastException. For advanced use (e.g. in Solr)
a method getHeapArray() was added to retrieve the internal heap array as a
non-generic Object[]. (Uwe Schindler, Yonik Seeley)
* LUCENE-1076: IndexWriter.setInfoStream now throws IOException
(Mike McCandless, Shai Erera)
* LUCENE-3084: MergePolicy.OneMerge.segments was changed from
SegmentInfos to a List<SegmentInfo>. SegmentInfos itself was changed
to no longer extend Vector<SegmentInfo> (to update code that is using
Vector-API, use the new asList() and asSet() methods returning unmodifiable
collections; modifying SegmentInfos is now only possible through
the explicitly declared methods). IndexWriter.segString() now takes
Iterable<SegmentInfo> instead of List<SegmentInfo>. A simple recompile
should fix this. MergePolicy and SegmentInfos are internal/experimental
APIs not covered by the strict backwards compatibility policy.
(Uwe Schindler, Mike McCandless)
Changes in runtime behavior
* LUCENE-3065: When a NumericField is retrieved from a Document loaded
from IndexReader (or IndexSearcher), it will now come back as
NumericField not as a Field with a string-ified version of the
numeric value you had indexed. Note that this only applies for
newly-indexed Documents; older indices will still return Field
with the string-ified numeric value. If you call Document.get(),
the value still comes back as String, but Document.getFieldable()
returns NumericField instances. (Uwe Schindler, Ryan McKinley,
Mike McCandless)
* LUCENE-1076: Changed the default merge policy from
LogByteSizeMergePolicy to TieredMergePolicy, as of Version.LUCENE_32
(passed to IndexWriterConfig), which is able to merge non-contiguous
segments. This means docIDs no longer necessarily stay "in order"
during indexing. If this is a problem then you can use either of
the LogMergePolicy impls. (Mike McCandless)
New features
* LUCENE-3082: Added index upgrade tool oal.index.IndexUpgrader
that allows upgrading all segments to the most recent supported index
format without fully optimizing. (Uwe Schindler, Mike McCandless)
* LUCENE-1076: Added TieredMergePolicy which is able to merge non-contiguous
segments, which means docIDs no longer necessarily stay "in order".
(Mike McCandless, Shai Erera)
* LUCENE-3071: Adding ReversePathHierarchyTokenizer, added skip parameter to
PathHierarchyTokenizer (Olivier Favre via ryan)
* LUCENE-1421, LUCENE-3102: added CachingCollector which allows you to cache
document IDs and scores encountered during the search, and "replay" them to
another Collector. (Mike McCandless, Shai Erera)
* LUCENE-3112: Added experimental IndexWriter.add/updateDocuments,
enabling a block of documents to be indexed, atomically, with
guaranteed sequential docIDs. (Mike McCandless)
API Changes
* LUCENE-3061: IndexWriter's getNextMerge() and merge(OneMerge) are now public
(though @lucene.experimental), allowing for custom MergeScheduler
implementations. (Shai Erera)
* LUCENE-3065: Document.getField() was deprecated, as it throws
ClassCastException when loading lazy fields or NumericFields.
(Uwe Schindler, Ryan McKinley, Mike McCandless)
* LUCENE-2027: Directory.touchFile is deprecated and will be removed
in 4.0. (Mike McCandless)
Optimizations
* LUCENE-2990: ArrayUtil/CollectionUtil.*Sort() methods now exit early
on empty or one-element lists/arrays. (Uwe Schindler)
* LUCENE-2897: Apply deleted terms while flushing a segment. We still
buffer deleted terms to later apply to past segments. (Mike McCandless)
* LUCENE-3126: IndexWriter.addIndexes copies incoming segments into CFS if they
aren't already and MergePolicy allows that. (Shai Erera)
Bug fixes
* LUCENE-2996: addIndexes(IndexReader) did not flush before adding the new
indexes, causing existing deletions to be applied on the incoming indexes as
well. (Shai Erera, Mike McCandless)
* LUCENE-3024: Index with more than 2.1B terms was hitting AIOOBE when
seeking TermEnum (eg used by Solr's faceting) (Tom Burton-West, Mike
McCandless)
* LUCENE-3042: When a filter or consumer added Attributes to a TokenStream
chain after it was already (partly) consumed [or clearAttributes(),
captureState(), cloneAttributes(),... was called by the Tokenizer],
the Tokenizer calling clearAttributes() or capturing state after addition
may not do this on the newly added Attribute. This bug affected only
very special use cases of the TokenStream-API, most users would not
have recognized it. (Uwe Schindler, Robert Muir)
* LUCENE-3054: PhraseQuery can in some cases stack overflow in
SorterTemplate.quickSort(). This fix also adds an optimization to
PhraseQuery as a term with lower doc freq will also have fewer positions.
(Uwe Schindler, Robert Muir, Otis Gospodnetic)
* LUCENE-3068: sloppy phrase query failed to match valid documents when multiple
query terms had same position in the query. (Doron Cohen)
* LUCENE-3012: Lucene writes the header now for separate norm files (*.sNNN)
(Robert Muir)
Build
* LUCENE-3006: Building javadocs will fail on warnings by default.
Override with -Dfailonjavadocwarning=false (sarowe, gsingers)
* LUCENE-3128: "ant eclipse" creates a .project file for easier Eclipse
integration (unless one already exists). (Daniel Serodio via Shai Erera)
Test Cases
* LUCENE-3002: added 'tests.iter.min' to control 'tests.iter' by allowing
iteration to stop once at least 'tests.iter.min' iterations ran and a failure occurred.
(Shai Erera, Chris Hostetter)
======================= Lucene 3.1.0 =======================
Changes in backwards compatibility policy
* LUCENE-2719: Changed API of internal utility class
org.apache.lucene.util.SorterTemplate to support faster quickSort using
pivot values and also merge sort and insertion sort. If you have used
this class, you have to implement two more methods for handling pivots.
(Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-1923: Renamed SegmentInfo & SegmentInfos segString method to
toString. These are advanced APIs and subject to change suddenly.
(Tim Smith via Mike McCandless)
* LUCENE-2190: Removed deprecated customScore() and customExplain()
methods from experimental CustomScoreQuery. (Uwe Schindler)
* LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default.
This means that terms with a position increment gap of zero do not
affect the norms calculation by default. (Robert Muir)
* LUCENE-2320: MergePolicy.writer is now of type SetOnce, which allows setting
the IndexWriter for a MergePolicy exactly once. You can change references to
'writer' from <code>writer.doXYZ()</code> to <code>writer.get().doXYZ()</code>
(it is also advisable to add an <code>assert writer != null;</code> before you
access the wrapped IndexWriter.)
In addition, MergePolicy only exposes a default constructor, and the one that
took IndexWriter as argument has been removed from all MergePolicy extensions.
(Shai Erera via Mike McCandless)
* LUCENE-2328: SimpleFSDirectory.SimpleFSIndexInput is moved to
FSDirectory.FSIndexInput. Anyone extending this class will have to
fix their code on upgrading. (Earwin Burrfoot via Mike McCandless)
* LUCENE-2302: The new interface for term attributes, CharTermAttribute,
now implements CharSequence. This requires the toString() methods of
CharTermAttribute, deprecated TermAttribute, and Token to return only
the term text and no other attribute contents. LUCENE-2374 implements
an attribute reflection API to no longer rely on toString() for attribute
inspection. (Uwe Schindler, Robert Muir)
* LUCENE-2372, LUCENE-2389: StandardAnalyzer, KeywordAnalyzer,
PerFieldAnalyzerWrapper, WhitespaceTokenizer are now final. Also removed
the now obsolete and deprecated Analyzer.setOverridesTokenStreamMethod().
Analyzer and TokenStream base classes now have an assertion in their ctor
that checks that subclasses are final or at least have final implementations
of incrementToken(), tokenStream(), and reusableTokenStream().
(Uwe Schindler, Robert Muir)
* LUCENE-2316: Directory.fileLength contract was clarified - it returns the
actual file's length if the file exists, and throws FileNotFoundException
otherwise. Returning length=0 for a non-existent file is no longer allowed. If
you relied on that, make sure to catch the exception. (Shai Erera)
* LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
creation. Previously, if you passed an empty Directory and set OpenMode to
CREATE*, IndexWriter would make a first empty commit. If you need that
behavior you can call writer.commit()/close() immediately after you create it.
(Shai Erera, Mike McCandless)
* LUCENE-2733: Removed public constructors of utility classes with only static
methods to prevent instantiation. (Uwe Schindler)
* LUCENE-2602: The default (LogByteSizeMergePolicy) merge policy now
takes deletions into account by default. You can disable this by
calling setCalibrateSizeByDeletes(false) on the merge policy. (Mike
McCandless)
* LUCENE-2529, LUCENE-2668: Position increment gap and offset gap of empty
values in multi-valued fields have been changed for some cases in the index.
If you index empty fields and use positions/offsets information on those
fields, reindexing is recommended. (David Smiley, Koji Sekiguchi)
* LUCENE-2804: Directory.setLockFactory now declares throwing an IOException.
(Shai Erera, Robert Muir)
* LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
Searchable are collapsed into IndexSearcher; contrib/remote and
MultiSearcher have been removed. (Mike McCandless)
* LUCENE-2854: Deprecated SimilarityDelegator and
Similarity.lengthNorm; the latter is now final, forcing any custom
Similarity impls to cut over to the more general computeNorm (Robert
Muir, Mike McCandless)
* LUCENE-2869: Deprecated Query.getSimilarity: instead of using
"runtime" subclassing/delegation, subclass the Weight instead.
(Robert Muir)
* LUCENE-2674: A new idfExplain method was added to Similarity, that
accepts an incoming docFreq. If you subclass Similarity, make sure
you also override this method on upgrade. (Robert Muir, Mike
McCandless)
Changes in runtime behavior
* LUCENE-1923: Made IndexReader.toString() produce something
meaningful (Tim Smith via Mike McCandless)
* LUCENE-2179: CharArraySet.clear() is now functional.
(Robert Muir, Uwe Schindler)
* LUCENE-2455: IndexWriter.addIndexes no longer optimizes the target index
before it adds the new ones. Also, the existing segments are not merged and so
the index will not end up with a single segment (unless it was empty before).
In addition, addIndexesNoOptimize was renamed to addIndexes and no longer
invokes a merge on the incoming and target segments, but instead copies the
segments to the target index. You can call maybeMerge or optimize after this
method completes, if you need to.
In addition, Directory.copyTo* were removed in favor of copy which takes the
target Directory, source and target files as arguments, and copies the source
file to the target Directory under the target file name. (Shai Erera)
* LUCENE-2663: IndexWriter no longer forcefully clears any existing
locks when create=true. This was a holdover from when
SimpleFSLockFactory was the default locking implementation, and,
even then it was dangerous since it could mask bugs in IndexWriter's
usage, allowing applications to accidentally open two writers on the
same directory. (Mike McCandless)
* LUCENE-2701: maxMergeMBForOptimize and maxMergeDocs constraints set on
LogMergePolicy now affect optimize() as well (as opposed to only regular
merges). This means that you can run optimize() and too large segments won't
be merged. (Shai Erera)
* LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List,
guaranteeing the commits are sorted from oldest to latest. (Shai Erera)
* LUCENE-2785: TopScoreDocCollector, TopFieldCollector and
the IndexSearcher search methods that take an int nDocs will now
throw IllegalArgumentException if nDocs is 0. Instead, you should
use the newly added TotalHitCountCollector. (Mike McCandless)
* LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio
to determine whether the passed in segment should be compound.
(Shai Erera, Earwin Burrfoot)
* LUCENE-2805: IndexWriter now increments the index version on every change to
the index instead of for every commit. Committing or closing the IndexWriter
without any changes to the index will not cause any index version increment.
(Simon Willnauer, Mike McCandless)
* LUCENE-2650, LUCENE-2825: The behavior of FSDirectory.open has changed. On 64-bit
Windows and Solaris systems that support unmapping, FSDirectory.open returns
MMapDirectory. Additionally the behavior of MMapDirectory has been
changed to enable unmapping by default if supported by the JRE.
(Mike McCandless, Uwe Schindler, Robert Muir)
* LUCENE-2829: Improve the performance of "primary key" lookup use
case (running a TermQuery that matches one document) on a
multi-segment index. (Robert Muir, Mike McCandless)
* LUCENE-2010: Segments with 100% deleted documents are now removed on
IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)
* LUCENE-2960: Allow some changes to IndexWriterConfig to take effect
"live" (after an IW is instantiated), via
IndexWriter.getConfig().setXXX(...) (Shay Banon, Mike McCandless)
API Changes
* LUCENE-2076: Rename FSDirectory.getFile -> getDirectory. (George
Aroush via Mike McCandless)
* LUCENE-1260: Change norm encode (float->byte) and decode
(byte->float) to be instance methods not static methods. This way a
custom Similarity can alter how norms are encoded, though they must
still be encoded as a single byte (Johan Kindgren via Mike
McCandless)
* LUCENE-2103: NoLockFactory should have a private constructor;
until Lucene 4.0 the default one will be deprecated.
(Shai Erera via Uwe Schindler)
* LUCENE-2177: Deprecate the Field ctors that take byte[] and Store.
Since the removal of compressed fields, Store can only be YES, so
it's not necessary to specify. (Erik Hatcher via Mike McCandless)
* LUCENE-2200: Several final classes had non-overriding protected
members. These were converted to private and unused protected
constructors removed. (Steven Rowe via Robert Muir)
* LUCENE-2240: SimpleAnalyzer and WhitespaceAnalyzer now have
Version ctors. (Simon Willnauer via Uwe Schindler)
* LUCENE-2259: Add IndexWriter.deleteUnusedFiles, to attempt removing
unused files. This is only useful on Windows, which prevents
deletion of open files. IndexWriter will eventually remove these
files itself; this method just lets you do so when you know the
files are no longer open by IndexReaders. (luocanrao via Mike
McCandless)
* LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier
use by external code. In addition it offers a matchExtension method which
callers can use to query whether a certain file matches a certain extension.
(Shai Erera via Mike McCandless)
* LUCENE-124: Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery.
This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but
only scores terms by their boost values. For example, this can be used
with FuzzyQuery to ensure that exact matches are always scored higher,
because only the boost will be used in scoring. (Robert Muir)
* LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to
expose its folding logic. (Cédrik Lime via Robert Muir)
* LUCENE-2294: IndexWriter constructors have been deprecated in favor of a
single ctor which accepts IndexWriterConfig and a Directory. You can set all
the parameters related to IndexWriter on IndexWriterConfig. The different
setter/getter methods were deprecated as well. One should call
writer.getConfig().getXYZ() to query for a parameter XYZ.
Additionally, the setter/getter related to MergePolicy were deprecated as
well. One should interact with the MergePolicy directly.
(Shai Erera via Mike McCandless)
* LUCENE-2320: IndexWriter's MergePolicy configuration was moved to
IndexWriterConfig and the respective methods on IndexWriter were deprecated.
(Shai Erera via Mike McCandless)
* LUCENE-2328: Directory now keeps track itself of the files that are written
but not yet fsynced. The old Directory.sync(String file) method is deprecated
and replaced with Directory.sync(Collection<String> files). Take a look at
FSDirectory to see a sample of what such tracking might look like, if needed
in your custom Directories. (Earwin Burrfoot via Mike McCandless)
* LUCENE-2302: Deprecated TermAttribute and replaced by a new
CharTermAttribute. The change is backwards compatible, so
mixed new/old TokenStreams all work on the same char[] buffer
independent of which interface they use. CharTermAttribute
has shorter method names and implements CharSequence and
Appendable. This allows usage like Java's StringBuilder in
addition to direct char[] access. Also terms can directly be
used in places where CharSequence is allowed (e.g. regular
expressions).
(Uwe Schindler, Robert Muir)
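To make the CharSequence/Appendable point in LUCENE-2302 concrete, here is a minimal sketch of a char[]-backed term buffer. It is not CharTermAttribute itself: the class TermBufferSketch and its growth policy are assumptions for illustration. It only shows why implementing both interfaces allows StringBuilder-like appends and direct use with regular expressions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/** Hypothetical char[]-backed term buffer, for illustration only. */
final class TermBufferSketch implements CharSequence, Appendable {
  private char[] buf = new char[8];
  private int len;

  @Override public int length() {
    return len;
  }

  @Override public char charAt(int index) {
    if (index < 0 || index >= len) throw new IndexOutOfBoundsException();
    return buf[index];
  }

  @Override public CharSequence subSequence(int start, int end) {
    if (start < 0 || end > len || start > end) throw new IndexOutOfBoundsException();
    return new String(buf, start, end - start);
  }

  /** Returns only the term text, nothing else. */
  @Override public String toString() {
    return new String(buf, 0, len);
  }

  @Override public TermBufferSketch append(char c) {
    if (len == buf.length) buf = Arrays.copyOf(buf, len * 2); // grow on demand
    buf[len++] = c;
    return this;
  }

  @Override public TermBufferSketch append(CharSequence csq) {
    CharSequence s = (csq == null) ? "null" : csq; // Appendable contract
    return append(s, 0, s.length());
  }

  @Override public TermBufferSketch append(CharSequence csq, int start, int end) {
    CharSequence s = (csq == null) ? "null" : csq;
    for (int i = start; i < end; i++) append(s.charAt(i));
    return this;
  }

  public static void main(String[] args) {
    TermBufferSketch term = new TermBufferSketch();
    term.append("lucene").append('-').append("3.1");
    // The buffer is passed to the regex engine without converting to String.
    System.out.println(Pattern.matches("[a-z]+-\\d\\.\\d", term));
  }
}
```

Because toString() here returns just the term text, the sketch also reflects the toString() contract change noted for LUCENE-2302 in the backwards compatibility section.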
* LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
points too. If you use an IndexDeletionPolicy which holds onto index commits
(such as SnapshotDeletionPolicy), you can call this method to remove those
commit points when they are not needed anymore (instead of waiting for the
next commit). (Shai Erera)
* LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced
with equivalent ones that take a String (id) as argument. You can pass
whatever ID you want, as long as you use the same one when calling both.
(Shai Erera)
* LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to
set what IndexWriter passes for termsIndexDivisor to the readers it
opens internally when applying deletions or creating a near-real-time
reader. (Earwin Burrfoot via Mike McCandless)
* LUCENE-2167,LUCENE-2699,LUCENE-2763,LUCENE-2847: StandardTokenizer/Analyzer
in common/standard/ now implement the Word Break rules from the Unicode 6.0.0
Text Segmentation algorithm (UAX#29), covering the full range of Unicode code
points, including values from U+FFFF to U+10FFFF.
ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/
Analyzer implementation and behavior. Only the Unicode Basic Multilingual
Plane (code points from U+0000 to U+FFFF) is covered.
UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the
relevant RFCs, in addition to implementing the UAX#29 Word Break rules.
(Steven Rowe, Robert Muir, Uwe Schindler)
* LUCENE-2778: RAMDirectory now exposes newRAMFile(), which subclasses can
override to return a different RAMFile implementation. (Shai Erera)
* LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to
count the number of hits matching the query. (Mike McCandless)
* LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method
is only syntactic sugar for setNorm(int, String, byte), but using the global
Similarity.getDefault().encodeNormValue(). Use the byte-based method instead
to ensure that the norm is encoded with your Similarity.
(Robert Muir, Mike McCandless)
* LUCENE-2374: Added Attribute reflection API: It's now possible to inspect the
contents of AttributeImpl and AttributeSource using a well-defined API.
This is e.g. used by Solr's AnalysisRequestHandlers to display all attributes
in a structured way.
There are also some backwards incompatible changes in toString() output,
as LUCENE-2302 introduced the CharSequence interface to CharTermAttribute
leading to changed toString() return values. The new API allows getting a
string representation in a well-defined way using a new method
reflectAsString(). For backwards compatibility reasons, when toString()
was implemented by implementation subclasses, the default implementation of
AttributeImpl.reflectWith() uses toString()'s output instead to report the
Attribute's properties. Otherwise, reflectWith() uses Java's reflection
(like toString() did before) to get the attribute properties.
In addition, the mandatory equals() and hashCode() are no longer required
for AttributeImpls, but can still be provided (if needed).
(Uwe Schindler)
* LUCENE-2691: Deprecate IndexWriter.getReader in favor of
IndexReader.open(IndexWriter) (Grant Ingersoll, Mike McCandless)
* LUCENE-2876: Deprecated Scorer.getSimilarity(). If your Scorer uses a Similarity,
it should keep it itself. Fixed Scorers to pass their parent Weight, so that
Scorer.visitSubScorers (LUCENE-2590) will work correctly.
(Robert Muir, Doron Cohen)
* LUCENE-2900: When opening a near-real-time (NRT) reader
(IndexReader.re/open(IndexWriter)) you can now specify whether
deletes should be applied. Applying deletes can be costly, and some
expert use cases can handle seeing deleted documents returned. The
deletes remain buffered so that the next time you open an NRT reader
and pass true, all deletes will be applied. (Mike McCandless)
* LUCENE-1253: LengthFilter (and Solr's KeepWordTokenFilter) now
require up front specification of enablePositionIncrement. Together with
StopFilter they have a common base class (FilteringTokenFilter) that handles
the position increments automatically. Implementors only need to override an
accept() method that filters tokens. (Uwe Schindler, Robert Muir)
Bug fixes
* LUCENE-2249: ParallelMultiSearcher should shut down thread pool on
close. (Martin Traverso via Uwe Schindler)
* LUCENE-2273: FieldCacheImpl.getCacheEntries() used WeakHashMap
incorrectly and led to ConcurrentModificationException.
(Uwe Schindler, Robert Muir)
* LUCENE-2328: Index files fsync tracking moved from
IndexWriter/IndexReader to Directory, and it no longer leaks memory.
(Earwin Burrfoot via Mike McCandless)
* LUCENE-2074: Reduce buffer size of lexer back to default on reset.
(Ruben Laguna, Shai Erera via Uwe Schindler)
* LUCENE-2496: Don't throw NPE if IndexWriter is opened with CREATE on
a prior (corrupt) index missing its segments_N file. (Mike
McCandless)
* LUCENE-2458: QueryParser no longer automatically forms phrase queries,
assuming whitespace tokenization. Previously all CJK queries, for example,
would be turned into phrase queries. The old behavior is preserved with
the matchVersion parameter for previous versions. Additionally, you can
explicitly enable the old behavior with setAutoGeneratePhraseQueries(true)
(Robert Muir)
* LUCENE-2537: FSDirectory.copy() implementation was unsafe and could result in
OOM if a large file was copied. (Shai Erera)
* LUCENE-2580: MultiPhraseQuery throws AIOOBE if number of positions
exceeds number of terms at one position (Jayendra Patil via Mike McCandless)
* LUCENE-2617: Optional clauses of a BooleanQuery were not factored
into coord if the scorer for that segment returned null. This
can cause the same document to score differently depending on
what segment it resides in. (yonik)
* LUCENE-2272: Fix explain in PayloadNearQuery and also fix scoring issue
(Peter Keegan via Grant Ingersoll)
* LUCENE-2732: Fix charset problems in XML loading in
HyphenationCompoundWordTokenFilter. (Uwe Schindler)
* LUCENE-2802: NRT DirectoryReader returned incorrect values from
getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
to a mutable reference to the IndexWriter's SegmentInfos.
(Simon Willnauer, Earwin Burrfoot)
* LUCENE-2852: Fixed corner case in RAMInputStream that would hit a
false EOF after seeking to EOF then seeking back to same block you
were just in and then calling readBytes (Robert Muir, Mike McCandless)
* LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor includeDocStores when it
decides whether to return the cached computed size or not. (Shai Erera)
* LUCENE-2584: SegmentInfo.files() could hit ConcurrentModificationException if
called by multiple threads. (Alexander Kanarsky via Shai Erera)
* LUCENE-2809: Fixed IndexWriter.numDocs to take into account
applied but not yet flushed deletes. (Mike McCandless)
* LUCENE-2879: MultiPhraseQuery previously calculated its phrase IDF by summing
internally, it now calls Similarity.idfExplain(Collection, IndexSearcher).
(Robert Muir)
* LUCENE-2693: RAM used by IndexWriter was slightly incorrectly computed.
(Jason Rutherglen via Shai Erera)
* LUCENE-1846: DateTools now uses the US locale everywhere, so DateTools.round()
is safe also in strange locales. (Uwe Schindler)
* LUCENE-2891: IndexWriterConfig did not accept -1 in setReaderTermIndexDivisor,
which can be used to prevent loading the terms index into memory. (Shai Erera)
* LUCENE-2937: Encoding a float into a byte (e.g. encoding field norms during
indexing) had an underflow detection bug that caused floatToByte(f)==0 where
f was greater than 0, but slightly less than byteToFloat(1). This meant that
certain very small field norms (index_boost * length_norm) could have
been rounded down to 0 instead of being rounded up to the smallest
positive number. (yonik)
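The underflow case in LUCENE-2937 can be sketched with a small one-byte float codec. This is an illustration modeled on the description above, not a copy of Lucene's encoder: the class name, the shift of 21 bits, and the zero point constant are assumptions of this example. The comparison marked in the comment is the one where using "<" instead of "<=" would map some small positive floats to 0.

```java
/** Hypothetical one-byte float codec used to illustrate the underflow check. */
final class SmallFloatSketch {
  // Assumed zero point: shifted bit patterns at or below this value underflow.
  private static final int FZERO = (63 - 15) << 3;

  static byte floatToByte(float f) {
    int bits = Float.floatToRawIntBits(f);
    // Keep the sign, the exponent and the top two explicit mantissa bits.
    int small = bits >> 21;
    if (small <= FZERO) {
      // Zero and negative inputs encode as 0; tiny positive inputs round UP
      // to 1. With '<' instead of '<=', inputs whose shifted bits equal FZERO
      // would fall through and encode as 0.
      return (bits <= 0) ? (byte) 0 : (byte) 1;
    }
    if (small >= FZERO + 0x100) {
      return -1; // overflow: clamp to the largest encodable value (0xFF)
    }
    return (byte) (small - FZERO);
  }

  static float byteToFloat(byte b) {
    if (b == 0) return 0.0f;
    int bits = (b & 0xff) << 21;
    bits += (63 - 15) << 24;
    return Float.intBitsToFloat(bits);
  }

  public static void main(String[] args) {
    float smallest = byteToFloat((byte) 1);
    float tiny = smallest * 0.9f; // positive, slightly below byteToFloat(1)
    System.out.println(tiny + " encodes as " + floatToByte(tiny));
  }
}
```

In this sketch any positive input below byteToFloat(1) encodes as 1 rather than 0, which is the behavior the entry describes as the fix.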
* LUCENE-2936: PhraseQuery score explanations were not correctly
identifying matches vs non-matches. (hossman)
* LUCENE-2975: A hotspot bug corrupts IndexInput#readVInt()/readVLong() if
the underlying readByte() is inlined (which happens e.g. in MMapDirectory).
The loop was unrolled, which makes the hotspot bug disappear.
(Uwe Schindler, Robert Muir, Mike McCandless)
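For readers unfamiliar with the technique named in LUCENE-2975, the following sketch contrasts a looped and an unrolled decoder for a variable-length int (7 data bits per byte, high bit set when more bytes follow). The class VIntSketch and its byte[] source are assumptions for illustration; the sketch does not reproduce the JIT bug, it only shows that the two forms decode identically.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical VInt reader over a byte[]; for illustration only. */
final class VIntSketch {
  private final byte[] buf;
  private int pos;

  VIntSketch(byte[] buf) {
    this.buf = buf;
  }

  private byte readByte() {
    return buf[pos++];
  }

  /** Loop form, shown for comparison. */
  int readVIntLoop() {
    byte b = readByte();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }

  /** Unrolled form: a 32-bit value needs at most five bytes. */
  int readVIntUnrolled() {
    byte b = readByte();
    int i = b & 0x7F;
    if ((b & 0x80) == 0) return i;
    b = readByte();
    i |= (b & 0x7F) << 7;
    if ((b & 0x80) == 0) return i;
    b = readByte();
    i |= (b & 0x7F) << 14;
    if ((b & 0x80) == 0) return i;
    b = readByte();
    i |= (b & 0x7F) << 21;
    if ((b & 0x80) == 0) return i;
    b = readByte();
    return i | ((b & 0x7F) << 28);
  }

  /** Encodes a non-negative int in the same 7-bits-per-byte format. */
  static byte[] encode(int value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80); // low 7 bits, continuation bit set
      value >>>= 7;
    }
    out.write(value);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] encoded = encode(300);
    System.out.println(new VIntSketch(encoded).readVIntLoop());
    System.out.println(new VIntSketch(encoded).readVIntUnrolled());
  }
}
```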
New features
* LUCENE-2128: Parallelized fetching document frequencies during weight
creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler)
* LUCENE-2069: Added Unicode 4 support to CharArraySet. Due to the switch
to Java 5, supplementary characters are now lowercased correctly if the
set is created as case insensitive.
CharArraySet now requires a Version argument to preserve
backwards compatibility. If Version < 3.1 is passed to the constructor,
CharArraySet yields the old behavior. (Simon Willnauer)
* LUCENE-2069: Added Unicode 4 support to LowerCaseFilter. Due to the switch
to Java 5, supplementary characters are now lowercased correctly.
LowerCaseFilter now requires a Version argument to preserve
backwards compatibility. If Version < 3.1 is passed to the constructor,
LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir)
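The difference between the old and new lowercasing behavior in the two LUCENE-2069 entries can be sketched in plain Java. The class LowerCaseSketch is hypothetical and is not the filter's code; it assumes, as a simplification, that a code point and its lowercase form occupy the same number of UTF-16 code units.

```java
/** Contrasts char-based and code-point-based lowercasing of a char[] buffer. */
final class LowerCaseSketch {
  /** Old style: every UTF-16 code unit is lowercased on its own. */
  static String lowerByChar(String s) {
    char[] buf = s.toCharArray();
    for (int i = 0; i < buf.length; i++) {
      buf[i] = Character.toLowerCase(buf[i]); // surrogates stay unchanged
    }
    return new String(buf);
  }

  /** New style: surrogate pairs are decoded to a code point first. */
  static String lowerByCodePoint(String s) {
    char[] buf = s.toCharArray();
    for (int i = 0; i < buf.length; ) {
      int codePoint = Character.codePointAt(buf, i);
      // toChars writes the lowercase form in place and returns its length.
      i += Character.toChars(Character.toLowerCase(codePoint), buf, i);
    }
    return new String(buf);
  }

  public static void main(String[] args) {
    // U+10400 is a supplementary (non-BMP) uppercase letter.
    String input = new String(Character.toChars(0x10400)) + "ABC";
    System.out.println(Integer.toHexString(lowerByChar(input).codePointAt(0)));
    System.out.println(Integer.toHexString(lowerByCodePoint(input).codePointAt(0)));
  }
}
```

The char-based variant leaves the supplementary letter untouched because neither surrogate half has a lowercase mapping on its own; the code-point variant lowercases it.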
* LUCENE-2034: Added ReusableAnalyzerBase, an abstract subclass of Analyzer
that makes it easier to reuse TokenStreams correctly. This issue also added
StopwordAnalyzerBase, which improves consistency of all Analyzers that use
stopwords, and implemented many analyzers in contrib with it.
(Simon Willnauer via Robert Muir)
* LUCENE-2198, LUCENE-2901: Support protected words in stemming TokenFilters using a
new KeywordAttribute. (Simon Willnauer, Drew Farris via Uwe Schindler)
* LUCENE-2183, LUCENE-2240, LUCENE-2241: Added Unicode 4 support
to CharTokenizer and its subclasses. CharTokenizer now has new
int-API which is conditionally preferred to the old char-API depending
on the provided Version. Version < 3.1 will use the char-API.
(Simon Willnauer via Uwe Schindler)
* LUCENE-2247: Added a CharArrayMap<V> for performance improvements
in some stemmers and synonym filters. (Uwe Schindler)
* LUCENE-2320: Added SetOnce which wraps an object and allows it to be set
exactly once. (Shai Erera via Mike McCandless)
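A set-once holder of the kind described in LUCENE-2320 might look roughly like the following. This is a sketch, not Lucene's SetOnce source: the class name SetOnceSketch is hypothetical, and IllegalStateException is used here as a stand-in for whatever exception the real class throws on a second set.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical holder whose value can be assigned exactly once. */
final class SetOnceSketch<T> {
  private final AtomicBoolean alreadySet = new AtomicBoolean(false);
  private volatile T value;

  /** Stores the value; any later call fails, even from another thread. */
  void set(T newValue) {
    if (!alreadySet.compareAndSet(false, true)) {
      throw new IllegalStateException("The object cannot be set twice!");
    }
    value = newValue;
  }

  /** Returns the stored value, or null if set() was never called. */
  T get() {
    return value;
  }

  public static void main(String[] args) {
    SetOnceSketch<String> writer = new SetOnceSketch<String>();
    writer.set("the-writer");
    System.out.println(writer.get());
    try {
      writer.set("another-writer");
    } catch (IllegalStateException expected) {
      System.out.println("second set rejected");
    }
  }
}
```

Since get() returns null until the value is set, callers that go through such a wrapper may want a null check first, in line with the assert suggested for MergePolicy.writer in the backwards compatibility notes.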
* LUCENE-2314: Added AttributeSource.copyTo(AttributeSource), which
allows using cloneAttributes() together with this method as a replacement
for captureState()/restoreState(), if the state itself
needs to be inspected/modified. (Uwe Schindler)
* LUCENE-2293: Expose control over max number of threads that
IndexWriter will allow to run concurrently while indexing
documents (previously this was hardwired to 5), using
IndexWriterConfig.setMaxThreadStates. (Mike McCandless)
* LUCENE-2297: Enable turning on reader pooling inside IndexWriter
even when getReader (near-real-time reader) is not in use, through
IndexWriterConfig.enable/disableReaderPooling. (Mike McCandless)
* LUCENE-2331: Add NoMergePolicy which never returns any merges to execute. In
addition, add NoMergeScheduler which never executes any merges. These two are
convenient classes in case you want to disable segment merges by IndexWriter
without tweaking a particular MergePolicy's parameters, such as mergeFactor.
MergeScheduler's methods are now public. (Shai Erera via Mike McCandless)
* LUCENE-2339: Deprecate static method Directory.copy in favor of
Directory.copyTo, and use nio's FileChannel.transferTo when copying
files between FSDirectory instances. (Earwin Burrfoot via Mike
McCandless).
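The nio technique named in LUCENE-2339 can be sketched independently of Lucene. The class ChannelCopySketch and its helper roundTrip are hypothetical and use java.nio.file for brevity; they only show a file-to-file copy through FileChannel.transferTo, which lets the platform move bytes without an intermediate buffer in user code where supported.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical file copy built on FileChannel.transferTo. */
final class ChannelCopySketch {
  static void copy(Path source, Path target) {
    try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
         FileChannel out = FileChannel.open(target,
             StandardOpenOption.CREATE,
             StandardOpenOption.WRITE,
             StandardOpenOption.TRUNCATE_EXISTING)) {
      long size = in.size();
      long position = 0;
      while (position < size) {
        // transferTo may move fewer bytes than requested, so loop until done.
        position += in.transferTo(position, size - position, out);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Writes data to a temp file, copies it, and returns the copied bytes. */
  static byte[] roundTrip(byte[] data) {
    try {
      Path source = Files.createTempFile("copy-src", ".bin");
      Path target = Files.createTempFile("copy-dst", ".bin");
      try {
        Files.write(source, data);
        copy(source, target);
        return Files.readAllBytes(target);
      } finally {
        Files.deleteIfExists(source);
        Files.deleteIfExists(target);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] copied = roundTrip(new byte[] {1, 2, 3, 4, 5});
    System.out.println(copied.length + " bytes copied");
  }
}
```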
* LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the
matchVersion parameter is Version.LUCENE_31. (Uwe Schindler)
* LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
can be used to prevent commits from ever getting deleted from the index.
(Shai Erera)
* LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can
return a DirPayloadProcessor for a given Directory, which returns a
PayloadProcessor for a given Term. The PayloadProcessor will be used to
process the payloads of the segments as they are merged (e.g. if one wants to
rewrite payloads of external indexes as they are added, or of local ones).
(Shai Erera, Michael Busch, Mike McCandless)
* LUCENE-2440: Add support for custom ExecutorService in
ParallelMultiSearcher (Edward Drapkin via Mike McCandless)
* LUCENE-2295: Added a LimitTokenCountAnalyzer / LimitTokenCountFilter
to wrap any other Analyzer and provide the same functionality as
MaxFieldLength provided on IndexWriter. This patch also fixes a bug
in the offset calculation in CharTokenizer. (Uwe Schindler, Shai Erera)
* LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when
it's empty. (Ross Woolf via Mike McCandless)
* LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike
McCandless)
* LUCENE-2590: Added Scorer.visitSubScorers, and Scorer.freq. Along
with a custom Collector these experimental methods make it possible
to gather the hit-count per sub-clause and per document while a
search is running. (Simon Willnauer, Mike McCandless)
* LUCENE-2636: Added MultiCollector which allows running the search with several
Collectors. (Shai Erera)
* LUCENE-2754, LUCENE-2757: Added a wrapper around MultiTermQueries
to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>.
Using this wrapper it's easy to add fuzzy/wildcard to e.g. a SpanNearQuery.
(Robert Muir, Uwe Schindler)
* LUCENE-2838: ConstantScoreQuery now directly supports wrapping a Query
instance for stripping off scores. The use of a QueryWrapperFilter
is no longer needed and discouraged for that use case. Directly wrapping
Query improves performance, as out-of-order collection is now supported.
(Uwe Schindler)
* LUCENE-2864: Add getMaxTermFrequency (maximum within-document TF) to
FieldInvertState so that it can be used in Similarity.computeNorm.
(Robert Muir)
* LUCENE-2720: Segments now record the code version which created them.
(Shai Erera, Mike McCandless, Uwe Schindler)
* LUCENE-2474: Added expert ReaderFinishedListener API to
IndexReader, to allow apps that maintain external per-segment caches
to evict entries when a segment is finished. (Shay Banon, Yonik
Seeley, Mike McCandless)
* LUCENE-2911: The new StandardTokenizer, UAX29URLEmailTokenizer, and
the ICUTokenizer in contrib now all tag types with a consistent set
of token types (defined in StandardTokenizer). Tokens in the major
CJK types are explicitly marked to allow for custom downstream handling:
<IDEOGRAPHIC>, <HANGUL>, <KATAKANA>, and <HIRAGANA>.
(Robert Muir, Steven Rowe)
* LUCENE-2913: Add missing getters to Numeric* classes. (Uwe Schindler)
* LUCENE-1810: Added FieldSelectorResult.LATENT to not cache lazy loaded fields
(Tim Smith, Grant Ingersoll)
* LUCENE-2692: Added several new SpanQuery classes for positional checking
(match is in a range, payload is a specific value) (Grant Ingersoll)
Optimizations
* LUCENE-2494: Use CompletionService in ParallelMultiSearcher instead of
simple polling for results. (Edward Drapkin, Simon Willnauer)
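The switch from polling to a CompletionService in LUCENE-2494 can be illustrated with a generic sketch. Nothing here is ParallelMultiSearcher code: the class CompletionSketch, the pool size cap of 4, and the idea of summing per-searcher hit counts are assumptions for the example. The point is that take() hands back each task as soon as it finishes, so the caller never has to poll futures in submission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical fan-out search that merges results in completion order. */
final class CompletionSketch {
  static int totalHits(List<Callable<Integer>> searches) {
    if (searches.isEmpty()) return 0;
    ExecutorService pool = Executors.newFixedThreadPool(Math.min(4, searches.size()));
    try {
      CompletionService<Integer> finished = new ExecutorCompletionService<Integer>(pool);
      for (Callable<Integer> search : searches) {
        finished.submit(search);
      }
      int total = 0;
      for (int i = 0; i < searches.size(); i++) {
        total += finished.take().get(); // blocks until the next task is done
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a search task failed", e.getCause());
    } finally {
      pool.shutdown(); // release the threads when the work is finished
    }
  }

  public static void main(String[] args) {
    List<Callable<Integer>> searches = new ArrayList<Callable<Integer>>();
    searches.add(() -> 3);
    searches.add(() -> 5);
    searches.add(() -> 7);
    System.out.println(totalHits(searches));
  }
}
```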
* LUCENE-2075: Terms dict cache is now shared across threads instead
of being stored separately in thread local storage. Also fixed
terms dict so that the cache is used when seeking the thread local
term enum, which will be important for MultiTermQuery impls that do
lots of seeking (Mike McCandless, Uwe Schindler, Robert Muir, Yonik
Seeley)
* LUCENE-2136: If the multi reader (DirectoryReader or MultiReader)
only has a single sub-reader, delegate all enum requests to it.
This avoids the overhead of using a PQ unnecessarily. (Mike
McCandless)
* LUCENE-2137: Switch to AtomicInteger for some ref counting (Earwin
Burrfoot via Mike McCandless)
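As a rough illustration of AtomicInteger-based ref counting (LUCENE-2137), the sketch below keeps a count without synchronized blocks. The class RefCountSketch and its method names are hypothetical and do not mirror any particular Lucene class.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical ref-counted resource using lock-free counting. */
final class RefCountSketch {
  private final AtomicInteger refCount = new AtomicInteger(1); // creator holds one ref
  private volatile boolean released;

  /** Takes a reference unless the count already dropped to zero. */
  boolean tryIncRef() {
    int count;
    while ((count = refCount.get()) > 0) {
      if (refCount.compareAndSet(count, count + 1)) {
        return true;
      }
    }
    return false;
  }

  /** Drops a reference; the last one releases the resource. */
  void decRef() {
    int remaining = refCount.decrementAndGet();
    if (remaining == 0) {
      released = true; // a real resource would be closed here
    } else if (remaining < 0) {
      throw new IllegalStateException("too many decRef calls");
    }
  }

  int getRefCount() {
    return refCount.get();
  }

  boolean isReleased() {
    return released;
  }

  public static void main(String[] args) {
    RefCountSketch resource = new RefCountSketch();
    resource.tryIncRef();
    resource.decRef();
    resource.decRef();
    System.out.println("released: " + resource.isReleased());
  }
}
```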
* LUCENE-2123, LUCENE-2261: Move FuzzyQuery rewrite to separate RewriteMode
into MultiTermQuery. The number of fuzzy expansions can be specified with
the maxExpansions parameter to FuzzyQuery.
(Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-2164: ConcurrentMergeScheduler has more control over merge
threads. First, it gives smaller merges higher thread priority than
larger ones. Second, a new set/getMaxMergeCount setting will pause
the larger merges to allow smaller ones to finish. The defaults for
these settings are now dynamic, depending on the number of CPU cores as
reported by Runtime.getRuntime().availableProcessors() (Mike
McCandless)
* LUCENE-2169: Improved CharArraySet.copy(), if source set is
also a CharArraySet. (Simon Willnauer via Uwe Schindler)
* LUCENE-2084: Change IndexableBinaryStringTools to work on byte[] and char[]
directly, instead of Byte/CharBuffers, and modify CollationKeyFilter to
take advantage of this for faster performance.
(Steven Rowe, Uwe Schindler, Robert Muir)
* LUCENE-2188: Add a utility class for tracking deprecated overridden
methods in non-final subclasses.
(Uwe Schindler, Robert Muir)
* LUCENE-2195: Speedup CharArraySet if set is empty.
(Simon Willnauer via Robert Muir)
* LUCENE-2285: Code cleanup. (Shai Erera via Uwe Schindler)
* LUCENE-2303: Remove code duplication in Token class by subclassing
TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve
null-handling for TypeAttribute. (Uwe Schindler)
* LUCENE-2329: Switch TermsHash* from using a PostingList object per unique
term to parallel arrays, indexed by termID. This reduces garbage collection
overhead significantly, which results in great indexing performance wins
when the available JVM heap space is low. This will become even more
important when the DocumentsWriter RAM buffer is searchable in the future,
because then it will make sense to make the RAM buffers as large as
possible. (Mike McCandless, Michael Busch)
* LUCENE-2380: The terms field cache methods (getTerms,
getTermsIndex), which replace the older String equivalents
(getStrings, getStringIndex), consume quite a bit less RAM in most
cases. (Mike McCandless)
* LUCENE-2410: ~20% speedup on exact (slop=0) PhraseQuery matching.
(Mike McCandless)
* LUCENE-2531: Fix issue when sorting by a String field that was
causing too many fallbacks to compare-by-value (instead of by-ord).
(Mike McCandless)
* LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for
efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
streams. (Shai Erera)
* LUCENE-2719: Improved TermsHashPerField's sorting to use a better
quick sort algorithm that dereferences the pivot element not on
every compare call. Also replaced lots of sorting code in Lucene
by the improved SorterTemplate class.
(Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-2760: Optimize SpanFirstQuery and SpanPositionRangeQuery.
(Robert Muir)
* LUCENE-2770: Make SegmentMerger always work on atomic subreaders,
even when IndexWriter.addIndexes(IndexReader...) is used with
DirectoryReaders or other MultiReaders. This saves lots of memory
during merge of norms. (Uwe Schindler, Mike McCandless)
* LUCENE-2824: Optimize BufferedIndexInput to do less bounds checks.
(Robert Muir)
* LUCENE-2010: Segments with 100% deleted documents are now removed on
IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)
* LUCENE-1472: Removed synchronization from static DateTools methods
by using a ThreadLocal. Also converted DateTools.Resolution to a
Java 5 enum (this should not break backwards). (Uwe Schindler)
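The ThreadLocal approach mentioned in LUCENE-1472 can be sketched as follows. The class DayFormatSketch is hypothetical; the "yyyyMMdd" pattern, the GMT time zone, and the use of Locale.US are assumptions for this example rather than a copy of DateTools. Each thread gets its own SimpleDateFormat, which is not thread-safe, so the static method needs no synchronization.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical static date helper without synchronization. */
final class DayFormatSketch {
  // One formatter per thread, created lazily on first use in that thread.
  private static final ThreadLocal<SimpleDateFormat> DAY_FORMAT =
      new ThreadLocal<SimpleDateFormat>() {
        @Override protected SimpleDateFormat initialValue() {
          SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd", Locale.US);
          format.setTimeZone(TimeZone.getTimeZone("GMT"));
          return format;
        }
      };

  /** Formats a timestamp (milliseconds since the epoch) with day resolution. */
  static String dayString(long millis) {
    return DAY_FORMAT.get().format(new Date(millis));
  }

  public static void main(String[] args) {
    System.out.println(dayString(0L));
  }
}
```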
Build
* LUCENE-2124: Moved the JDK-based collation support from contrib/collation
into core, and moved the ICU-based collation support into contrib/icu.
(Robert Muir)
* LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards
branch is now included in the svn repository using "svn copy"
after release. (Uwe Schindler)
* LUCENE-2074: Regenerating StandardTokenizerImpl files now needs
JFlex 1.5 (currently only available on SVN). (Uwe Schindler)
* LUCENE-1709: Tests are now parallelized by default (except for benchmark). You
can force them to run sequentially by passing -Drunsequential=1 on the command
line. The number of threads that are spawned per CPU defaults to '1'. If you
wish to change that, you can run the tests with -DthreadsPerProcessor=[num].
(Robert Muir, Shai Erera, Peter Kofler)
* LUCENE-2516: Backwards tests are now compiled against released lucene-core.jar
from tarball of previous version. Backwards tests are now packaged together
with src distribution. (Uwe Schindler)
* LUCENE-2611: Added Ant target to install IntelliJ IDEA configuration:
"ant idea". See http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
(Steven Rowe)
* LUCENE-2657: Switch from using Maven POM templates to full POMs when
generating Maven artifacts (Steven Rowe)
* LUCENE-2609: Added jar-test-framework Ant target which packages Lucene's
tests' framework classes. (Drew Farris, Grant Ingersoll, Shai Erera,
Steven Rowe)
Test Cases
* LUCENE-2037: Allow JUnit4 tests in our environment (Erick Erickson
via Mike McCandless)
* LUCENE-1844: Speed up the unit tests (Mark Miller, Erick Erickson,
Mike McCandless)
* LUCENE-2065: Use Java 5 generics throughout our unit tests. (Kay
Kay via Mike McCandless)
* LUCENE-2155: Fix time and zone dependent localization test failures
in queryparser tests. (Uwe Schindler, Chris Male, Robert Muir)
* LUCENE-2170: Fix thread starvation problems. (Uwe Schindler)
* LUCENE-2248, LUCENE-2251, LUCENE-2285: Refactor tests to not use
Version.LUCENE_CURRENT, but instead use a global static value
from LuceneTestCase(J4), that contains the release version.
(Uwe Schindler, Simon Willnauer, Shai Erera)
* LUCENE-2313, LUCENE-2322: Add VERBOSE to LuceneTestCase(J4) to control
verbosity of tests. If VERBOSE==false (default) tests should not print
anything other than errors to System.(out|err). The setting can be
changed with -Dtests.verbose=true on test invocation.
(Shai Erera, Paul Elschot, Uwe Schindler)
* LUCENE-2318: Remove inconsistent system property code for retrieving
temp and data directories inside test cases. It is now centralized in
LuceneTestCase(J4). Also changed lots of tests to use
getClass().getResourceAsStream() to retrieve test data. Tests needing
access to "real" files from the test folder itself, can use
LuceneTestCase(J4).getDataFile(). (Uwe Schindler)
* LUCENE-2398, LUCENE-2611: Improve tests to work better from IDEs such
as Eclipse and IntelliJ.
(Paolo Castagna, Steven Rowe via Robert Muir)
* LUCENE-2804: add newFSDirectory to LuceneTestCase to create a FSDirectory at
random. (Shai Erera, Robert Muir)
Documentation
* LUCENE-2579: Fix oal.search's package.html description of abstract
methods. (Santiago M. Mola via Mike McCandless)
* LUCENE-2625: Add a note to IndexReader.termDocs() with additional verbiage
that the TermEnum must be seeked since it is unpositioned.
(Adriano Crestani via Robert Muir)
* LUCENE-2894: Use google-code-prettify for syntax highlighting in javadoc.
(Shinichiro Abe, Koji Sekiguchi)
================== Release 2.9.4 / 3.0.3 ====================
Changes in runtime behavior
* LUCENE-2689: NativeFSLockFactory no longer attempts to acquire a
test lock just before the real lock is acquired. (Surinder Pal
Singh Bindra via Mike McCandless)
* LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
handles against deleted files when compound-file was enabled (the
default) and readers are pooled. As a result of this the peak
worst-case free disk space required during optimize is now 3X the
index size, when compound file is enabled (else 2X). (Mike
McCandless)
* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
0.1), which means any time a merged segment is greater than 10% of
the index size, it will be left in non-compound format even if
compound format is on. This change was made to reduce peak
transient disk usage during optimize which increased due to
LUCENE-2762. (Mike McCandless)
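The noCFSRatio rule above can be sketched as a simple size comparison. This is an illustration only, not the LogMergePolicy code; the class name, method name, and byte-based size parameters are assumptions made for the example.

```java
// Illustration only: hypothetical names, not the LogMergePolicy implementation.
final class NoCfsRatioSketch {

    // Returns true if a merged segment would be written in compound format.
    // A segment larger than noCFSRatio * indexBytes stays non-compound even
    // when compound format is enabled.
    static boolean useCompoundFile(boolean compoundEnabled,
                                   long mergedSegmentBytes,
                                   long indexBytes,
                                   double noCFSRatio) {
        if (!compoundEnabled) {
            return false;
        }
        return mergedSegmentBytes <= noCFSRatio * indexBytes;
    }

    public static void main(String[] args) {
        // With the default ratio of 0.1, a segment holding 20% of the index
        // is left in non-compound format.
        System.out.println(useCompoundFile(true, 20L, 100L, 0.1));
        System.out.println(useCompoundFile(true, 5L, 100L, 0.1));
    }
}
```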
Bug fixes
* LUCENE-2142 (correct fix): FieldCacheImpl.getStringIndex no longer
throws an exception when term count exceeds doc count.
(Mike McCandless, Uwe Schindler)
* LUCENE-2513: when opening writable IndexReader on a not-current
commit, do not overwrite "future" commits. (Mike McCandless)
* LUCENE-2536: IndexWriter.rollback was failing to properly rollback
buffered deletions against segments that were flushed (Mark Harwood
via Mike McCandless)
* LUCENE-2541: Fixed NumericRangeQuery that returned incorrect results
with endpoints near Long.MIN_VALUE and Long.MAX_VALUE:
NumericUtils.splitRange() overflowed, if
- the range contained a LOWER bound
that was greater than (Long.MAX_VALUE - (1L << precisionStep))
- the range contained an UPPER bound
that was less than (Long.MIN_VALUE + (1L << precisionStep))
With standard precision steps around 4, this had no effect on
most queries, only those that met the above conditions.
Queries with large precision steps failed more easily. Queries with
precision step >=64 were not affected. Also 32 bit data types int
and float were not affected.
(Yonik Seeley, Uwe Schindler)
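The two overflow conditions listed for LUCENE-2541 can be reproduced with plain long arithmetic. The following is a minimal sketch, not the NumericUtils.splitRange() code; the class and method names are hypothetical, and it assumes a precisionStep between 1 and 62.

```java
// Illustration only: hypothetical names, not Lucene API.
// Assumes 1 <= precisionStep <= 62 so that (1L << precisionStep) is positive.
final class SplitRangeOverflowSketch {

    // True if lowerBound + (1L << precisionStep) would overflow a signed long.
    static boolean lowerBoundOverflows(long lowerBound, int precisionStep) {
        long step = 1L << precisionStep;
        return lowerBound > Long.MAX_VALUE - step;
    }

    // True if upperBound - (1L << precisionStep) would underflow a signed long.
    static boolean upperBoundUnderflows(long upperBound, int precisionStep) {
        long step = 1L << precisionStep;
        return upperBound < Long.MIN_VALUE + step;
    }

    public static void main(String[] args) {
        int precisionStep = 4;
        long lower = Long.MAX_VALUE - 3;
        System.out.println(lowerBoundOverflows(lower, precisionStep));
        // The unchecked addition wraps around to a negative value.
        System.out.println(lower + (1L << precisionStep));
    }
}
```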
* LUCENE-2593: Fixed certain rare cases where a disk full could lead
to a corrupted index (Robert Muir, Mike McCandless)
* LUCENE-2620: Fixed a bug in WildcardQuery where too many asterisks
would result in unbearably slow performance. (Nick Barkas via Robert Muir)
* LUCENE-2627: Fixed bug in MMapDirectory chunking when a file is an
exact multiple of the chunk size. (Robert Muir)
* LUCENE-2634: isCurrent on an NRT reader was failing to return false
if the writer had just committed (Nikolay Zamosenchuk via Mike McCandless)
* LUCENE-2650: Added extra safety to MMapIndexInput clones to prevent accessing
an unmapped buffer if the input is closed (Mike McCandless, Uwe Schindler, Robert Muir)
* LUCENE-2384: Reset zzBuffer in StandardTokenizerImpl when lexer is reset.
(Ruben Laguna via Uwe Schindler, sub-issue of LUCENE-2074)
* LUCENE-2658: Exceptions while processing term vectors enabled for multiple
fields could lead to invalid ArrayIndexOutOfBoundsExceptions.
(Robert Muir, Mike McCandless)
* LUCENE-2235: Implement missing PerFieldAnalyzerWrapper.getOffsetGap().
(Javier Godoy via Uwe Schindler)
* LUCENE-2328: Fixed memory leak in how IndexWriter/Reader tracked
already sync'd files. (Earwin Burrfoot via Mike McCandless)
* LUCENE-2549: Fix TimeLimitingCollector#TimeExceededException to record
the absolute docid. (Uwe Schindler)
* LUCENE-2533: fix FileSwitchDirectory.listAll to not return dups when
primary & secondary dirs share the same underlying directory.
(Michael McCandless)
* LUCENE-2365: IndexWriter.newestSegment (used normally for testing)
is fixed to return null if there are no segments. (Karthick
Sankarachary via Mike McCandless)
* LUCENE-2730: Fix two rare deadlock cases in IndexWriter (Mike McCandless)
* LUCENE-2744: CheckIndex was stating total number of fields,
not the number that have norms enabled, on the "test: field
norms..." output. (Mark Kristensson via Mike McCandless)
* LUCENE-2759: Fixed two near-real-time cases where doc store files
may be opened for read even though they are still open for write.
(Mike McCandless)
* LUCENE-2618: Fix rare thread safety issue whereby
IndexWriter.optimize could sometimes return even though the index
wasn't fully optimized (Mike McCandless)
* LUCENE-2767: Fix thread safety issue in addIndexes(IndexReader[])
that could potentially result in index corruption. (Mike
McCandless)
* LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
handles against deleted files when compound-file was enabled (the
default) and readers are pooled. As a result of this the peak
worst-case free disk space required during optimize is now 3X the
index size, when compound file is enabled (else 2X). (Mike
McCandless)
* LUCENE-2216: OpenBitSet.hashCode returned different hash codes for
sets that only differed by trailing zeros. (Dawid Weiss, yonik)
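One way to make a bit-set hash insensitive to trailing zero words is to fold the words from the highest index downward, so that all-zero words at the end leave the accumulator untouched. The sketch below illustrates that idea on a bare long[]; it is not presented as the OpenBitSet code, and the class and method names are hypothetical.

```java
// Illustration only: hypothetical names, operating on a raw long[] of bit words.
final class TrailingZeroHashSketch {

    static int hash(long[] words) {
        long h = 0;
        // Start at the highest word. While only zero words have been seen,
        // h stays 0 (0 ^ 0 == 0 and rotating 0 gives 0), so trailing zero
        // words do not change the result.
        for (int i = words.length - 1; i >= 0; i--) {
            h ^= words[i];
            h = (h << 1) | (h >>> 63); // rotate left by one bit
        }
        // Fold the 64-bit value into 32 bits.
        return (int) ((h >> 32) ^ h);
    }

    public static void main(String[] args) {
        long[] shortSet = {5L, 9L};
        long[] paddedSet = {5L, 9L, 0L, 0L};
        System.out.println(hash(shortSet) == hash(paddedSet));
    }
}
```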
* LUCENE-2782: Fix rare potential thread hazard with
IndexWriter.commit (Mike McCandless)
API Changes
* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
0.1), which means any time a merged segment is greater than 10% of
the index size, it will be left in non-compound format even if
compound format is on. This change was made to reduce peak
transient disk usage during optimize which increased due to
LUCENE-2762. (Mike McCandless)
Optimizations
* LUCENE-2556: Improve memory usage after cloning TermAttribute.
(Adriano Crestani via Uwe Schindler)
* LUCENE-2098: Improve the performance of BaseCharFilter, especially for
large documents. (Robin Wojciki, Koji Sekiguchi, Robert Muir)
New features
* LUCENE-2675 (2.9.4 only): Add support for Lucene 3.0 stored field files
also in 2.9. The file format did not change, only the version number was
upgraded to mark segments that have no compression. FieldsWriter still only
writes 2.9 segments as they could contain compressed fields. This cross-version
index format compatibility is provided here solely because Lucene 2.9 and 3.0
have the same bugfix level, features, and the same index format with this slight
compression difference. In general, Lucene does not support reading newer
indexes with older library versions. (Uwe Schindler)
Documentation
* LUCENE-2239: Documented limitations in NIOFSDirectory and MMapDirectory due to
Java NIO behavior when a Thread is interrupted while blocking on IO.
(Simon Willnauer, Robert Muir)
================== Release 2.9.3 / 3.0.2 ====================
Changes in backwards compatibility policy
* LUCENE-2135: Added FieldCache.purge(IndexReader) method to the
interface. Anyone implementing FieldCache externally will need to
fix their code to implement this, on upgrading. (Mike McCandless)
Changes in runtime behavior
* LUCENE-2421: NativeFSLockFactory does not throw LockReleaseFailedException if
it cannot delete the lock file, since obtaining the lock does not fail if the
file is there. (Shai Erera)
* LUCENE-2060 (2.9.3 only): Changed ConcurrentMergeScheduler's default for
maxNumThreads from 3 to 1, because in practice we get the most gains
from running a single merge in the background. More than one
concurrent merge causes a lot of thrashing (though it's possible on
SSD storage that there would be net gains). (Jason Rutherglen, Mike
McCandless)
Bug fixes
* LUCENE-2046 (2.9.3 only): IndexReader should not see the index as changed, after
IndexWriter.prepareCommit has been called but before
IndexWriter.commit is called. (Peter Keegan via Mike McCandless)
* LUCENE-2119: Don't throw NegativeArraySizeException if you pass
Integer.MAX_VALUE as nDocs to IndexSearcher search methods. (Paul
Taylor via Mike McCandless)
* LUCENE-2142: FieldCacheImpl.getStringIndex no longer throws an
exception when term count exceeds doc count. (Mike McCandless)
* LUCENE-2104: NativeFSLock.release() would silently fail if the lock is held by
another thread/process. (Shai Erera via Uwe Schindler)
* LUCENE-2283: Use shared memory pool for term vector and stored
fields buffers. This memory will be reclaimed if needed according to
the configured RAM Buffer Size for the IndexWriter. This also fixes
potentially excessive memory usage when many threads are indexing a
mix of small and large documents. (Tim Smith via Mike McCandless)
* LUCENE-2300: If IndexWriter is pooling reader (because NRT reader
has been obtained), and addIndexes* is run, do not pool the
readers from the external directory. This is harmless (NRT reader is
correct), but a waste of resources. (Mike McCandless)
* LUCENE-2422: Don't reuse byte[] in IndexInput/Output -- it gains
little performance, and ties up possibly large amounts of memory
for apps that index large docs. (Ross Woolf via Mike McCandless)
* LUCENE-2387: Don't hang onto Fieldables from the last doc indexed,
in IndexWriter, nor the Reader in Tokenizer after close is
called. (Ruben Laguna, Uwe Schindler, Mike McCandless)
* LUCENE-2417: IndexCommit did not implement hashCode() and equals()
consistently. Now they both take Directory and version into consideration. In
addition, all IndexCommit methods which threw
UnsupportedOperationException are now abstract. (Shai Erera)
* LUCENE-2467: Fixed memory leaks in IndexWriter when large documents
are indexed. (Mike McCandless)
* LUCENE-2473: Clicking on the "More Results" link in the luceneweb.war
demo resulted in ArrayIndexOutOfBoundsException.
(Sami Siren via Robert Muir)
* LUCENE-2476: If any exception is hit init'ing IW, release the write
lock (previously we only released on IOException). (Tamas Cservenak
via Mike McCandless)
* LUCENE-2478: Fix CachingWrapperFilter to not throw NPE when
Filter.getDocIdSet() returns null. (Uwe Schindler, Daniel Noll)
* LUCENE-2468: Allow specifying how new deletions should be handled in
CachingWrapperFilter and CachingSpanFilter. By default, new
deletions are ignored in CachingWrapperFilter, since typically this
filter is AND'd with a query that correctly takes new deletions into
account. This should be a performance gain (higher cache hit rate)
in apps that reopen readers, or use near-real-time reader
(IndexWriter.getReader()), but may introduce invalid search results
(allowing deleted docs to be returned) for certain cases, so a new
expert ctor was added to CachingWrapperFilter to enforce deletions
at a performance cost. CachingSpanFilter by default recaches if
there are new deletions (Shay Banon via Mike McCandless)
* LUCENE-2299: If you open an NRT reader while addIndexes* is running,
it may miss some segments (Earwin Burrfoot via Mike McCandless)
* LUCENE-2397: Don't throw NPE from SnapshotDeletionPolicy.snapshot if
there are no commits yet (Shai Erera)
* LUCENE-2424: Fix FieldDoc.toString to actually return its fields
(Stephen Green via Mike McCandless)
* LUCENE-2311: Always pass a "fully loaded" (terms index & doc stores)
SegmentsReader to IndexWriter's mergedSegmentWarmer (if set), so
that warming is free to do whatever it needs to. (Earwin Burrfoot
via Mike McCandless)
* LUCENE-3029: Fix corner case when MultiPhraseQuery is used with zero
position-increment tokens that would sometimes assign different
scores to identical docs. (Mike McCandless)
* LUCENE-2486: Fixed intermittent FileNotFoundException on doc store
files when a mergedSegmentWarmer is set on IndexWriter. (Mike
McCandless)
* LUCENE-2130: Fix performance issue when FuzzyQuery runs on a
multi-segment index (Michael McCandless)
API Changes
* LUCENE-2281: added doBeforeFlush to IndexWriter to allow extensions to perform
operations before flush starts. Also exposed doAfterFlush as protected instead
of package-private. (Shai Erera via Mike McCandless)
* LUCENE-2356: Add IndexWriter.set/getReaderTermsIndexDivisor, to set
what IndexWriter passes for termsIndexDivisor to the readers it
opens internally when applying deletions or creating a
near-real-time reader. (Earwin Burrfoot via Mike McCandless)
Optimizations
* LUCENE-2494 (3.0.2 only): Use CompletionService in ParallelMultiSearcher
instead of simple polling for results. (Edward Drapkin, Simon Willnauer)
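The general pattern behind LUCENE-2494 can be sketched with the JDK's ExecutorCompletionService, which hands back results in completion order so the caller does not have to poll each Future. This is a minimal illustration, not the ParallelMultiSearcher code; the class, its methods, and the use of plain integers as stand-ins for per-searcher hit counts are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: hypothetical names, not Lucene API.
final class CompletionServiceSketch {

    // Runs every task and sums the results, taking each one as it completes.
    static int sumHitCounts(List<Callable<Integer>> subSearches)
            throws InterruptedException, ExecutionException {
        if (subSearches.isEmpty()) {
            return 0;
        }
        ExecutorService pool = Executors.newFixedThreadPool(subSearches.size());
        try {
            CompletionService<Integer> completion =
                    new ExecutorCompletionService<>(pool);
            for (Callable<Integer> task : subSearches) {
                completion.submit(task);
            }
            int total = 0;
            for (int i = 0; i < subSearches.size(); i++) {
                // take() blocks until the next finished task is available.
                total += completion.take().get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Convenience wrapper: each value stands in for one sub-searcher's hit count.
    static int demo(int... hitCounts) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int hits : hitCounts) {
            tasks.add(() -> hits);
        }
        try {
            return sumHitCounts(tasks);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(3, 5, 7));
    }
}
```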
* LUCENE-2135: On IndexReader.close, forcefully evict any entries from
the FieldCache rather than waiting for the WeakHashMap to release
the reference (Mike McCandless)
* LUCENE-2161: Improve concurrency of IndexReader, especially in the
context of near real-time readers. (Mike McCandless)
* LUCENE-2360: Small speedup to recycling of reused per-doc RAM in
IndexWriter (Robert Muir, Mike McCandless)
Build
* LUCENE-2488 (2.9.3 only): Support build with JDK 1.4 and exclude Java 1.5
contrib modules on request (pass '-Dforce.jdk14.build=true') when
compiling/testing/packaging. This marks the benchmark contrib also
as Java 1.5, as it depends on fast-vector-highlighter. (Uwe Schindler)
================== Release 2.9.2 / 3.0.1 ====================
Changes in backwards compatibility policy
* LUCENE-2123 (3.0.1 only): Removed the protected inner class ScoreTerm
from FuzzyQuery. The change was needed because the comparator of this
class had to be changed in an incompatible way. The class was never
intended to be public. (Uwe Schindler, Mike McCandless)
Bug fixes
* LUCENE-2092: BooleanQuery was ignoring disableCoord in its hashCode
and equals methods, causing bad things to happen when caching
BooleanQueries. (Chris Hostetter, Mike McCandless)
* LUCENE-2095: Fixes: when two threads call IndexWriter.commit() at
the same time, it's possible for commit to return control back to
one of the threads before all changes are actually committed.
(Sanne Grinovero via Mike McCandless)
* LUCENE-2132 (3.0.1 only): Fix the demo result.jsp to use QueryParser
with a Version argument. (Brian Li via Robert Muir)
* LUCENE-2166: Don't incorrectly keep warning about the same immense
term, when IndexWriter.infoStream is on. (Mike McCandless)
* LUCENE-2158: At high indexing rates, NRT reader could temporarily
lose deletions. (Mike McCandless)
* LUCENE-2182: DEFAULT_ATTRIBUTE_FACTORY was failing to load
implementation class when interface was loaded by a different
class loader. (Uwe Schindler, reported on java-user by Ahmed El-dawy)
* LUCENE-2257: Increase max number of unique terms in one segment to
termIndexInterval (default 128) * ~2.1 billion = ~274 billion.
(Tom Burton-West via Mike McCandless)
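The figure quoted for LUCENE-2257 can be checked with a short calculation. The sketch assumes that "~2.1 billion" refers to Integer.MAX_VALUE; the class and method names are hypothetical.

```java
// Illustration only: assumes "~2.1 billion" means Integer.MAX_VALUE.
final class MaxUniqueTermsSketch {

    // Widen to long before multiplying so the product does not overflow int.
    static long maxUniqueTerms(int termIndexInterval) {
        return (long) termIndexInterval * Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // 128 * 2,147,483,647 = 274,877,906,816 (about 274 billion)
        System.out.println(maxUniqueTerms(128));
    }
}
```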
* LUCENE-2260: Fixed AttributeSource to not hold a strong
reference to the Attribute/AttributeImpl classes which prevents
unloading of custom attributes loaded by other classloaders
(e.g. in Solr plugins). (Uwe Schindler)
* LUCENE-1941: Fix Min/MaxPayloadFunction returning 0 when
only one payload is present. (Erik Hatcher, Mike McCandless
via Uwe Schindler)
* LUCENE-2270: Queries consisting of all zero-boost clauses
(for example, text:foo^0) sorted incorrectly and produced
invalid docids. (yonik)
API Changes
* LUCENE-1609 (3.0.1 only): Restore IndexReader.getTermInfosIndexDivisor
(it was accidentally removed in 3.0.0) (Mike McCandless)
* LUCENE-1972 (3.0.1 only): Restore SortField.getComparatorSource
(it was accidentally removed in 3.0.0) (John Wang via Uwe Schindler)
* LUCENE-2190: Added a new class CustomScoreProvider to function package
that can be subclassed to provide custom scoring to CustomScoreQuery.
The methods in CustomScoreQuery that did this before were deprecated
and replaced by a method getCustomScoreProvider(IndexReader) that
returns a custom score implementation using the above class. The change
is necessary with per-segment searching, as CustomScoreQuery is
a stateless class (like all other Queries) and does not know about
the currently searched segment. This API works similar to Filter's
getDocIdSet(IndexReader). (Paul chez Jamespot via Mike McCandless,
Uwe Schindler)
* LUCENE-2080: Deprecate Version.LUCENE_CURRENT, as using this constant
will cause backwards compatibility problems when upgrading Lucene. See
the Version javadocs for additional information.
(Robert Muir)
Optimizations
* LUCENE-2086: When resolving deleted terms, do so in term sort order
for better performance (Bogdan Ghidireac via Mike McCandless)
* LUCENE-2123 (partly, 3.0.1 only): Fixes a slowdown / memory issue
added by LUCENE-504. (Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-2258: Remove unneeded synchronization in FuzzyTermEnum.
(Uwe Schindler, Robert Muir)
Test Cases
* LUCENE-2114: Change TestFilteredSearch to test on multi-segment
index as well. (Simon Willnauer via Mike McCandless)
* LUCENE-2211: Improves BaseTokenStreamTestCase to use a fake attribute
that checks if clearAttributes() was called correctly.
(Uwe Schindler, Robert Muir)
* LUCENE-2207, LUCENE-2219: Improve BaseTokenStreamTestCase to check if
end() is implemented correctly. (Koji Sekiguchi, Robert Muir)
Documentation
* LUCENE-2114: Improve javadocs of Filter to call out that the
provided reader is per-segment (Simon Willnauer via Mike
McCandless)
======================= Release 3.0.0 =======================
Changes in backwards compatibility policy
* LUCENE-1979: Change return type of SnapshotDeletionPolicy#snapshot()
from IndexCommitPoint to IndexCommit. Code that uses this method
needs to be recompiled against Lucene 3.0 in order to work. The
previously deprecated IndexCommitPoint is also removed.
(Michael Busch)
* o.a.l.Lock.isLocked() is now allowed to throw an IOException.
(Mike McCandless)
* LUCENE-2030: CachingWrapperFilter and CachingSpanFilter now hide
the internal cache implementation for thread safety, before it was
declared protected. (Peter Lenahan, Uwe Schindler, Simon Willnauer)
* LUCENE-2053: If you call Thread.interrupt() on a thread inside
Lucene, Lucene will do its best to interrupt the thread. However,
instead of throwing InterruptedException (which is a checked
exception), you'll get an oal.util.ThreadInterruptedException (an
unchecked exception, subclassing RuntimeException). The interrupt
status on the thread is cleared when this exception is thrown.
(Mike McCandless)
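The behavior described for LUCENE-2053 follows a common pattern: catch the checked InterruptedException and rethrow it as an unchecked exception. The sketch below uses a hypothetical exception class as a stand-in for oal.util.ThreadInterruptedException and is not the Lucene code.

```java
// Illustration only: UncheckedInterruptedException is a hypothetical stand-in,
// not the Lucene exception class.
final class UncheckedInterruptSketch {

    static final class UncheckedInterruptedException extends RuntimeException {
        UncheckedInterruptedException(InterruptedException cause) {
            super(cause);
        }
    }

    // Sleeps without declaring a checked exception.
    static void sleepUnchecked(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            // Thread.sleep clears the interrupt status before throwing,
            // so the flag is already false at this point.
            throw new UncheckedInterruptedException(ie);
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate an interrupt request
        try {
            sleepUnchecked(1000);
        } catch (UncheckedInterruptedException e) {
            System.out.println("interrupted flag: "
                    + Thread.currentThread().isInterrupted());
        }
    }
}
```

A caller that needs the interrupt status preserved can call Thread.currentThread().interrupt() again in its own catch block.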
* LUCENE-2052: Some methods in Lucene core were changed to accept
Java 5 varargs. This is not a backwards compatibility problem as
long as you do not try to override such a method. We left common
overridden methods unchanged and added varargs to constructors,
static, or final methods (MultiSearcher,...). (Uwe Schindler)
* LUCENE-1558: IndexReader.open(Directory) now opens a readOnly=true
reader, and new IndexSearcher(Directory) does the same. Note that
this is a change in the default from 2.9, when these methods were
previously deprecated. (Mike McCandless)
* LUCENE-1753: Make not yet final TokenStreams final to enforce
decorator pattern. (Uwe Schindler)
Changes in runtime behavior
* LUCENE-1677: Remove the system property to set SegmentReader class
implementation. (Uwe Schindler)
* LUCENE-1960: As a consequence of the removal of Field.Store.COMPRESS,
support for this type of fields was removed. Lucene 3.0 is still able
to read indexes with compressed fields, but as soon as merges occur
or the index is optimized, all compressed fields are decompressed
and converted to Field.Store.YES. Because of this, indexes with
compressed fields can suddenly get larger. Also the first merge with
decompression cannot be done in raw mode, it is therefore slower.
This change has no effect for code that uses such old indexes,
they behave as before (fields are automatically decompressed
during read). Indexes converted to Lucene 3.0 format cannot be read
anymore with previous versions.
It is recommended to optimize your indexes after upgrading to convert
to the new format and decompress all fields.
If you want compressed fields, you can use CompressionTools, that
creates compressed byte[] to be added as binary stored field. This
cannot be done automatically, as you also have to decompress such
fields when reading. You have to reindex to do that.
(Michael Busch, Uwe Schindler)
* LUCENE-2060: Changed ConcurrentMergeScheduler's default for
maxNumThreads from 3 to 1, because in practice we get the most
gains from running a single merge in the background. More than one
concurrent merge causes a lot of thrashing (though it's possible on
SSD storage that there would be net gains). (Jason Rutherglen,
Mike McCandless)
API Changes
* LUCENE-1257, LUCENE-1984, LUCENE-1985, LUCENE-2057, LUCENE-1833, LUCENE-2012,
LUCENE-1998: Port to Java 1.5:
- Add generics to public and internal APIs (see below).
- Replace new Integer(int), new Double(double),... by static valueOf() calls.
- Replace for-loops with Iterator by foreach loops.
- Replace StringBuffer with StringBuilder.
- Replace o.a.l.util.Parameter by Java 5 enums (see below).
- Add @Override annotations.
(Uwe Schindler, Robert Muir, Karl Wettin, Paul Elschot, Kay Kay, Shai Erera,
DM Smith)
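The Java 5 idioms named in the list above look roughly as follows. This is a generic sketch with hypothetical names, not code taken from Lucene.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical names showing the Java 5 idioms.
final class Java5IdiomsSketch {

    // A Java 5 enum in place of a hand-written typesafe-enum parameter class.
    enum StoreMode { YES, NO }

    // Generic parameter, foreach loop, and StringBuilder.
    static String join(List<String> terms) {
        StringBuilder sb = new StringBuilder();   // instead of StringBuffer
        for (String term : terms) {               // instead of an Iterator loop
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(term);
        }
        return sb.toString();
    }

    // Static factory instead of new Integer(value).
    static Integer boxed(int value) {
        return Integer.valueOf(value);
    }

    @Override
    public String toString() {                    // @Override annotation
        return "Java5IdiomsSketch";
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("apache", "lucene")));
        System.out.println(boxed(7));
        System.out.println(StoreMode.valueOf("YES"));
    }
}
```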
* Generify Lucene API:
- TokenStream/AttributeSource: Now addAttribute()/getAttribute() return an
instance of the requested attribute interface and no cast is needed anymore
(LUCENE-1855).
- NumericRangeQuery, NumericRangeFilter, and FieldCacheRangeFilter
now have Integer, Long, Float, Double as type param (LUCENE-1857).
- Document.getFields() returns List<Fieldable>.
- Query.extractTerms(Set<Term>)
- CharArraySet and stop word sets in core/contrib
- PriorityQueue (LUCENE-1935)
- TopDocCollector
- DisjunctionMaxQuery (LUCENE-1984)
- MultiTermQueryWrapperFilter
- CloseableThreadLocal
- MapOfSets
- o.a.l.util.cache package
- lots of internal APIs of IndexWriter
(Uwe Schindler, Michael Busch, Kay Kay, Robert Muir, Adriano Crestani)
* LUCENE-1944, LUCENE-1856, LUCENE-1957, LUCENE-1960, LUCENE-1961,
LUCENE-1968, LUCENE-1970, LUCENE-1946, LUCENE-1971, LUCENE-1975,
LUCENE-1972, LUCENE-1978, LUCENE-944, LUCENE-1979, LUCENE-1973, LUCENE-2011:
Remove deprecated methods/constructors/classes:
- Remove all String/File directory paths in IndexReader /
IndexSearcher / IndexWriter.
- Remove FSDirectory.getDirectory()
- Make FSDirectory abstract.
- Remove Field.Store.COMPRESS (see above).
- Remove Filter.bits(IndexReader) method and make
Filter.getDocIdSet(IndexReader) abstract.
- Remove old DocIdSetIterator methods and make the new ones abstract.
- Remove some methods in PriorityQueue.
- Remove old TokenStream API and backwards compatibility layer.
- Remove RangeQuery, RangeFilter and ConstantScoreRangeQuery.
- Remove SpanQuery.getTerms().
- Remove ExtendedFieldCache, custom and auto caches, SortField.AUTO.
- Remove old-style custom sort.
- Remove legacy search setting in SortField.
- Remove Hits and all references from core and contrib.
- Remove HitCollector and its TopDocs support implementations.
- Remove term field and accessors in MultiTermQuery
(and fix Highlighter).
- Remove deprecated methods in BooleanQuery.
- Remove deprecated methods in Similarity.
- Remove BoostingTermQuery.
- Remove MultiValueSource.
- Remove Scorer.explain(int).
...and some other minor ones (Uwe Schindler, Michael Busch, Mark Miller)
* LUCENE-1925: Make IndexSearcher's subReaders and docStarts members
protected; add expert ctor to directly specify reader, subReaders
and docStarts. (John Wang, Tim Smith via Mike McCandless)
* LUCENE-1945: All public classes that have a close() method now
also implement java.io.Closeable (IndexReader, IndexWriter, Directory,...).
(Uwe Schindler)
* LUCENE-1998: Change all Parameter instances to Java 5 enums. This
is no backwards-break, only a change of the super class. Parameter
was deprecated and will be removed in a later version.
(DM Smith, Uwe Schindler)
Bug fixes
* LUCENE-1951: When the text provided to WildcardQuery has no wildcard
characters (ie matches a single term), don't lose the boost and
rewrite method settings. Also, rewrite to PrefixQuery if the
wildcard is of the form "foo*", for slightly faster performance. (Robert
Muir via Mike McCandless)
* LUCENE-2013: SpanRegexQuery does not work with QueryScorer.
(Benjamin Keil via Mark Miller)
* LUCENE-2088: addAttribute() should only accept interfaces that
extend Attribute. (Shai Erera, Uwe Schindler)
* LUCENE-2045: Fix silly FileNotFoundException hit if you enable
infoStream on IndexWriter and then add an empty document and commit
(Shai Erera via Mike McCandless)
* LUCENE-2046: IndexReader should not see the index as changed, after
IndexWriter.prepareCommit has been called but before
IndexWriter.commit is called. (Peter Keegan via Mike McCandless)
New features
* LUCENE-1933: Provide a convenience AttributeFactory that creates a
Token instance for all basic attributes. (Uwe Schindler)
* LUCENE-2041: Parallelize the rest of ParallelMultiSearcher. Lots of
code refactoring and Java 5 concurrent support in MultiSearcher.
(Joey Surls, Simon Willnauer via Uwe Schindler)
* LUCENE-2051: Add CharArraySet.copy() as a simple method to copy
any Set<?> to a CharArraySet that is optimized, if Set<?> is already
a CharArraySet. (Simon Willnauer)
Optimizations
* LUCENE-1183: Optimize Levenshtein Distance computation in
FuzzyQuery. (Cédrik Lime via Mike McCandless)
* LUCENE-2006: Optimization of FieldDocSortedHitQueue to always
use Comparable<?> interface. (Uwe Schindler, Mark Miller)
* LUCENE-2087: Remove recursion in NumericRangeTermEnum.
(Uwe Schindler)
Build
* LUCENE-486: Remove test->demo dependencies. (Michael Busch)
* LUCENE-2024: Raise build requirements to Java 1.5 and ANT 1.7.0
(Uwe Schindler, Mike McCandless)
======================= Release 2.9.1 =======================
Changes in backwards compatibility policy
* LUCENE-2002: Add required Version matchVersion argument when
constructing QueryParser or MultiFieldQueryParser and, default (as
of 2.9) enablePositionIncrements to true to match
StandardAnalyzer's 2.9 default (Uwe Schindler, Mike McCandless)
Bug fixes
* LUCENE-1974: Fixed nasty bug in BooleanQuery (when it used
BooleanScorer for scoring), whereby some matching documents fail to
be collected. (Fulin Tang via Mike McCandless)
* LUCENE-1124: Make sure FuzzyQuery always matches the precise term.
(stefatwork@gmail.com via Mike McCandless)
* LUCENE-1976: Fix IndexReader.isCurrent() to return the right thing
when the reader is a near real-time reader. (Jake Mannix via Mike
McCandless)
* LUCENE-1986: Fix NPE when scoring PayloadNearQuery (Peter Keegan,
Mark Miller via Mike McCandless)
* LUCENE-1992: Fix thread hazard if a merge is committing just as an
exception occurs during sync (Uwe Schindler, Mike McCandless)
* LUCENE-1995: Note in javadocs that IndexWriter.setRAMBufferSizeMB
cannot exceed 2048 MB, and throw IllegalArgumentException if it
does. (Aaron McKee, Yonik Seeley, Mike McCandless)
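The limit described for LUCENE-1995 amounts to an argument check. The sketch below is a hypothetical validator, not the IndexWriter code; the class name, method name, and message text are assumptions.

```java
// Illustration only: hypothetical validator, not the IndexWriter implementation.
final class RamBufferLimitSketch {

    static final double MAX_RAM_BUFFER_MB = 2048.0;

    // Returns the value unchanged if it is within the limit.
    static double checkRamBufferSizeMB(double mb) {
        if (mb > MAX_RAM_BUFFER_MB) {
            throw new IllegalArgumentException(
                    "ramBufferSizeMB " + mb + " exceeds " + MAX_RAM_BUFFER_MB);
        }
        return mb;
    }

    public static void main(String[] args) {
        System.out.println(checkRamBufferSizeMB(16.0));
        try {
            checkRamBufferSizeMB(4096.0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```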
* LUCENE-2004: Fix Constants.LUCENE_MAIN_VERSION to not be inlined
by client code. (Uwe Schindler)
* LUCENE-2016: Replace illegal U+FFFF character with the replacement
char (U+FFFD) during indexing, to prevent silent index corruption.
(Peter Keegan, Mike McCandless)
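The substitution described for LUCENE-2016 can be illustrated on a plain String. This sketch shows only the character replacement; where and how Lucene applies it during indexing is not shown, and the class and method names are hypothetical.

```java
// Illustration only: hypothetical helper, not the Lucene indexing code.
final class InvalidCharSketch {

    // Replaces every U+FFFF with the Unicode replacement character U+FFFD.
    static String sanitize(String text) {
        return text.replace('\uFFFF', '\uFFFD');
    }

    public static void main(String[] args) {
        String cleaned = sanitize("a\uFFFFb");
        // Print the code point of the middle character in hex.
        System.out.println(Integer.toHexString(cleaned.charAt(1)));
    }
}
```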
API Changes
* Un-deprecate search(Weight weight, Filter filter, int n) from
Searchable interface (deprecated by accident). (Uwe Schindler)
* Un-deprecate o.a.l.util.Version constants. (Mike McCandless)
* LUCENE-1987: Un-deprecate some ctors of Token, as they will not
be removed in 3.0 and are still useful. Also add some missing
o.a.l.util.Version constants for enabling invalid acronym
settings in StandardAnalyzer to be compatible with the coming
Lucene 3.0. (Uwe Schindler)
* LUCENE-1973: Un-deprecate IndexSearcher.setDefaultFieldSortScoring,
to allow controlling per-IndexSearcher whether scores are computed
when sorting by field. (Uwe Schindler, Mike McCandless)
* LUCENE-2043: Make IndexReader.commit(Map<String,String>) public.
(Mike McCandless)
Documentation
* LUCENE-1955: Fix Hits deprecation notice to point users in right
direction. (Mike McCandless, Mark Miller)
* Fix javadoc about score tracking done by search methods in Searcher
and IndexSearcher. (Mike McCandless)
* LUCENE-2008: Javadoc improvements for TokenStream/Tokenizer/Token
(Luke Nezda via Mike McCandless)
======================= Release 2.9.0 =======================
Changes in backwards compatibility policy
* LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no
longer computes a document score for each hit by default. If
document score tracking is still needed, you can call
IndexSearcher.setDefaultFieldSortScoring(true, true) to enable
both per-hit and maxScore tracking; however, this is deprecated
and will be removed in 3.0.
Alternatively, use Searchable.search(Weight, Filter, Collector)
and pass in a TopFieldCollector instance, using the following code
sample:
<code>
TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields,
true /* trackDocScores */,
true /* trackMaxScore */,
false /* docsInOrder */);
searcher.search(query, tfc);
TopDocs results = tfc.topDocs();
</code>
Note that your Sort object cannot use SortField.AUTO when you
directly instantiate TopFieldCollector.
Also, the method search(Weight, Filter, Collector) was added to
the Searchable interface and the Searcher abstract class to
replace the deprecated HitCollector versions. If you either
implement Searchable or extend Searcher, you should change your
code to implement this method. If you already extend
IndexSearcher, no further changes are needed to use Collector.
Finally, the values Float.NaN and Float.NEGATIVE_INFINITY are not
valid scores. Lucene uses these values internally in certain
places, so if you have hits with such scores, it will cause
problems. (Shai Erera via Mike McCandless)
* LUCENE-1687: All methods and parsers from the interface ExtendedFieldCache
have been moved into FieldCache. ExtendedFieldCache is now deprecated and
contains only a few declarations for binary backwards compatibility.
ExtendedFieldCache will be removed in version 3.0. Users of FieldCache and
ExtendedFieldCache will be able to plug in Lucene 2.9 without recompilation.
The auto cache (FieldCache.getAuto) is now deprecated. Due to the merge of
ExtendedFieldCache and FieldCache, FieldCache can now additionally return
long[] and double[] arrays in addition to int[] and float[] and StringIndex.
The interface changes are only notable for users implementing the interfaces,
which is unlikely, because there is no way to change
Lucene's FieldCache implementation. (Grant Ingersoll, Uwe Schindler)
* LUCENE-1630, LUCENE-1771: Weight, previously an interface, is now an abstract
class. Some of the method signatures have changed, but it should be fairly
easy to see what adjustments must be made to existing code to sync up
with the new API. You can find more detail in the API Changes section.
Going forward Searchable will be kept for convenience only and may
be changed between minor releases without any deprecation
process. It is not recommended that you implement it, but rather extend
Searcher.
(Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
* LUCENE-1422, LUCENE-1693: The new Attribute based TokenStream API (see below)
has some backwards breaks in rare cases. We did our best to make the
transition as easy as possible and you are not likely to run into any problems.
If your tokenizers still implement next(Token) or next(), the calls are
automatically wrapped. The indexer and query parser use the new API
(i.e., they call incrementToken()). All core TokenStreams are implemented using
the new API. You can mix old and new API style TokenFilters/TokenStream.
Problems only occur when you have done the following:
You have overridden next(Token) or next() in one of the non-abstract core
TokenStreams/-Filters. These classes should normally be final, but some
of them are not. In this case, next(Token)/next() would never be called.
To fail early with a hard compile/runtime error, the next(Token)/next()
methods in these TokenStreams/-Filters were made final in this release.
(Michael Busch, Uwe Schindler)
* LUCENE-1763: MergePolicy now requires an IndexWriter instance to
be passed upon instantiation. As a result, IndexWriter was removed
as a method argument from all MergePolicy methods. (Shai Erera via
Mike McCandless)
* LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
compat break and caused custom SpanQuery implementations to fail at runtime
in a variety of ways. This issue attempts to remedy things by causing
a compile time break on custom SpanQuery implementations and removing
the PayloadSpans class, with its functionality now moved to Spans. To
help in alleviating future back compat pain, Spans has been changed from
an interface to an abstract class.
(Hugh Cayless, Mark Miller)
* LUCENE-1808: Query.createWeight has been changed from protected to
public. This will be a back compat break if you have overridden this
method - but you are likely already affected by the LUCENE-1693 (make Weight
abstract rather than an interface) back compat break if you have overridden
Query.createWeight, so we have taken the opportunity to make this change.
(Tim Smith, Shai Erera via Mark Miller)
* LUCENE-1708: IndexReader.document() no longer checks if the document is
deleted. You can call IndexReader.isDeleted(n) prior to calling document(n).
(Shai Erera via Mike McCandless)
Changes in runtime behavior
* LUCENE-1424: QueryParser now by default uses constant score auto
rewriting when it generates a WildcardQuery and PrefixQuery (it
already does so for TermRangeQuery, as well). Call
setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
to revert to slower BooleanQuery rewriting method. (Mark Miller via Mike
McCandless)
* LUCENE-1575: As of 2.9, the core collectors as well as
IndexSearcher's search methods that return top N results, no
longer filter documents with scores <= 0.0. If you rely on this
functionality you can use PositiveScoresOnlyCollector like this:
<code>
TopDocsCollector tdc = TopScoreDocCollector.create(10, true);
Collector c = new PositiveScoresOnlyCollector(tdc);
searcher.search(query, c);
TopDocs hits = tdc.topDocs();
...
</code>
* LUCENE-1604: IndexReader.norms(String field) is now allowed to
return null if the field has no norms, as long as you've
previously called IndexReader.setDisableFakeNorms(true). This
setting now defaults to false (to preserve the fake norms back
compatible behavior) but in 3.0 will be hardwired to true. (Shon
Vella via Mike McCandless).
* LUCENE-1624: If you open IndexWriter with create=true and
autoCommit=false on an existing index, IndexWriter no longer
writes an empty commit when it's created. (Paul Taylor via Mike
McCandless)
* LUCENE-1593: When you call Sort() or Sort.setSort(String field,
boolean reverse), the resulting SortField array no longer ends
with SortField.FIELD_DOC (it was unnecessary as Lucene breaks ties
internally by docID). (Shai Erera via Michael McCandless)
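The tie-breaking that makes a trailing SortField.FIELD_DOC redundant (LUCENE-1593) can be sketched in a few lines. This is only an illustration: the Hit class and its fields are hypothetical stand-ins, not Lucene classes. It shows that ordering by the sort key and then by docID already gives a deterministic result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DocIdTieBreak {
    /** Hypothetical stand-in for a search hit: a sort key plus a doc ID. */
    static final class Hit {
        final String key;
        final int docId;
        Hit(String key, int docId) { this.key = key; this.docId = docId; }
    }

    /** Orders by key first; hits with equal keys fall back to ascending doc ID. */
    static final Comparator<Hit> BY_KEY_THEN_DOC =
        Comparator.comparing((Hit h) -> h.key).thenComparingInt(h -> h.docId);

    /** Returns the doc IDs of the given hits in sorted order. */
    static List<Integer> sortedDocIds(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_KEY_THEN_DOC);
        List<Integer> ids = new ArrayList<>();
        for (Hit h : copy) ids.add(h.docId);
        return ids;
    }
}
```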
* LUCENE-1542: When the first token(s) have 0 position increment,
IndexWriter used to incorrectly record the position as -1, if no
payload is present, or Integer.MAX_VALUE if a payload is present.
This causes positional queries to fail to match. The bug is now
fixed, but if your app relies on the buggy behavior then you must
call IndexWriter.setAllowMinus1Position(). That API is deprecated
so you must fix your application, and rebuild your index, to not
rely on this behavior by the 3.0 release of Lucene. (Jonathan
Mamou, Mark Miller via Mike McCandless)
* LUCENE-1715: Finalizers have been removed from the 4 core classes
that still had them, since they will cause GC to take longer, thus
tying up memory for longer, and at best they mask buggy app code.
DirectoryReader (returned from IndexReader.open) & IndexWriter
previously released the write lock during finalize.
SimpleFSDirectory.FSIndexInput closed the descriptor in its
finalizer, and NativeFSLock released the lock. It's possible
applications will be affected by this, but only if the application
is failing to close reader/writers. (Brian Groose via Mike
McCandless)
* LUCENE-1717: Fixed IndexWriter to account for RAM usage of
buffered deletions. (Mike McCandless)
* LUCENE-1727: Ensure that fields are stored & retrieved in the
exact order in which they were added to the document. This was
true in all Lucene releases before 2.3, but was broken in 2.3 and
2.4, and is now fixed in 2.9. (Mike McCandless)
* LUCENE-1678: The addition of Analyzer.reusableTokenStream
accidentally broke back compatibility of external analyzers that
subclassed core analyzers that implemented tokenStream but not
reusableTokenStream. This is now fixed, such that if
reusableTokenStream is invoked on such a subclass, that method
will forcefully fallback to tokenStream. (Mike McCandless)
* LUCENE-1801: Token.clear() and Token.clearNoTermBuffer() now also clear
startOffset, endOffset and type. This is not likely to affect any
Tokenizer chains, as Tokenizers normally always set these three values.
This change was made to conform to the new AttributeImpl.clear() and
AttributeSource.clearAttributes(), so that clearing works identically whether
Token serves as the single AttributeImpl for all attributes or the 6 separate
AttributeImpls are used. (Uwe Schindler, Michael Busch)
* LUCENE-1483: When searching over multiple segments, a new Scorer is now created
for each segment. Searching has been telescoped out a level and IndexSearcher now
operates much like MultiSearcher does. The Weight is created only once for the top
level Searcher, but each Scorer is passed a per-segment IndexReader. This will
result in doc ids in the Scorer being internal to the per-segment IndexReader. It
has always been outside of the API to count on a given IndexReader to contain every
doc id in the index - and if you have been ignoring MultiSearcher in your custom code
and counting on this fact, you will find your code no longer works correctly. If a
custom Scorer implementation uses any caches/filters that rely on being based on the
top level IndexReader, it will need to be updated to correctly use contextless
caches/filters; e.g., you can't count on the IndexReader to contain any given doc id or
all of the doc ids. (Mark Miller, Mike McCandless)
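The per-segment doc ids described in LUCENE-1483 can be mapped back to index-wide ids by adding the segment's starting offset. The sketch below is a self-contained illustration with hypothetical stand-in names (GlobalIdCollector, setNextSegment); in Lucene 2.9 itself a Collector receives this offset as the docBase argument of setNextReader.

```java
import java.util.ArrayList;
import java.util.List;

final class DocBaseSketch {
    /** Stand-in collector: remembers the current segment's starting doc ID. */
    static final class GlobalIdCollector {
        private int docBase;
        final List<Integer> globalIds = new ArrayList<>();

        /** Called once per segment, before any collect() call for that segment. */
        void setNextSegment(int docBase) { this.docBase = docBase; }

        /** Receives a segment-relative doc ID and rebases it to an index-wide ID. */
        void collect(int segmentDoc) { globalIds.add(docBase + segmentDoc); }
    }

    /** Walks segments of the given sizes, collecting the given per-segment hits. */
    static List<Integer> collectAll(int[] segmentSizes, int[][] hitsPerSegment) {
        GlobalIdCollector c = new GlobalIdCollector();
        int base = 0;
        for (int s = 0; s < segmentSizes.length; s++) {
            c.setNextSegment(base);
            for (int doc : hitsPerSegment[s]) c.collect(doc);
            base += segmentSizes[s]; // the next segment starts where this one ends
        }
        return c.globalIds;
    }
}
```

Any cache keyed on a segment-relative doc id has to be kept per segment, or rebased in this way, before it is compared with ids from another reader.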
* LUCENE-1846: DateTools now uses the US locale to format the numbers in its
date/time strings instead of the default locale. For most locales there will
be no change in the index format, as DateFormatSymbols is using ASCII digits.
The usage of the US locale is important to guarantee correct ordering of
generated terms. (Uwe Schindler)
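To see why a fixed locale matters for term ordering (LUCENE-1846), consider the following sketch. It does not use DateTools; the class name, the pattern and the GMT time zone are assumptions chosen for illustration. With Locale.US the formatter emits ASCII digits regardless of the JVM's default locale, so the resulting strings sort in chronological order.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class UsLocaleDateTerms {
    /** Formats a timestamp as a fixed-width, second-resolution string in GMT. */
    static String toTerm(long millis) {
        // Locale.US pins the digit symbols to ASCII, whatever the default locale is.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }
}
```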
* LUCENE-1860: MultiTermQuery now defaults to
CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method (previously it
was SCORING_BOOLEAN_QUERY_REWRITE). This means that PrefixQuery
and WildcardQuery will now produce constant score for all matching
docs, equal to the boost of the query. (Mike McCandless)
API Changes
* LUCENE-1419: Add expert API to set custom indexing chain. This API is
package-protected for now, so we don't have to officially support it.
Yet, it will give us the possibility to try out different consumers
in the chain. (Michael Busch)
* LUCENE-1427: DocIdSet.iterator() is now allowed to throw
IOException. (Paul Elschot, Mike McCandless)
* LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class called
AttributeSource instead of the Token class, which is now a utility class that
holds common Token attributes. All attributes that the Token class had have
been moved into separate classes: TermAttribute, OffsetAttribute,
PositionIncrementAttribute, PayloadAttribute, TypeAttribute and FlagsAttribute.
The new API is much more flexible; it allows combining the Attributes
arbitrarily and defining custom Attributes. The new API has the same
performance as the old next(Token) approach. For conformance with this new
API Tee-/SinkTokenizer was deprecated and replaced by a new TeeSinkTokenFilter.
(Michael Busch, Uwe Schindler; additional contributions and bug fixes by
Daniel Shane, Doron Cohen)
* LUCENE-1467: Add nextDoc() and next(int) methods to OpenBitSetIterator.
These methods can be used to avoid additional calls to doc().
(Michael Busch)
* LUCENE-1468: Deprecate Directory.list(), which sometimes (in
FSDirectory) filters out files that don't look like index files, in
favor of new Directory.listAll(), which does no filtering. Also,
listAll() will never return null; instead, it throws an IOException
(or subclass). Specifically, FSDirectory.listAll() will throw the
newly added NoSuchDirectoryException if the directory does not
exist. (Marcel Reutegger, Mike McCandless)
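The "never return null" contract of listAll() (LUCENE-1468) can be sketched with plain java.io. This is not Lucene's implementation: ListAllSketch is a hypothetical helper, and FileNotFoundException stands in for NoSuchDirectoryException.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

final class ListAllSketch {
    /** Returns every file name in dir, unfiltered; never returns null. */
    static String[] listAll(File dir) throws IOException {
        if (!dir.exists()) {
            throw new FileNotFoundException("directory '" + dir + "' does not exist");
        }
        String[] names = dir.list(); // java.io.File.list() signals failure with null
        if (names == null) {
            throw new IOException("cannot list '" + dir + "'");
        }
        return names;
    }
}
```

Callers can then iterate over the result without a null check and handle the missing-directory case in a catch block.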
* LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
you to record an opaque commitUserData (maps String -> String) into
the commit written by IndexReader. This matches IndexWriter's
commit methods. (Jason Rutherglen via Mike McCandless)
* LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
enable compressing & decompressing binary content, external to
Lucene's indexing. Deprecated Field.Store.COMPRESS.
* LUCENE-1561: Renamed Field.omitTf to Field.omitTermFreqAndPositions
(Otis Gospodnetic via Mike McCandless)
* LUCENE-1500: Added new InvalidTokenOffsetsException to Highlighter methods
to denote issues when offsets in TokenStream tokens exceed the length of the
provided text. (Mark Harwood)
* LUCENE-1575, LUCENE-1483: HitCollector is now deprecated in favor of
a new Collector abstract class. For easy migration, people can use
HitCollectorWrapper which translates (wraps) HitCollector into
Collector. Note that this class is also deprecated and will be
removed when HitCollector is removed. Also TimeLimitedCollector
is deprecated in favor of the new TimeLimitingCollector which
extends Collector. (Shai Erera, Mark Miller, Mike McCandless)
* LUCENE-1592: The method TermEnum.skipTo() was deprecated, because
it is used nowhere in core/contrib and there is only a very inefficient
default implementation available. If you want to position a TermEnum
to another Term, create a new one using IndexReader.terms(Term).
(Uwe Schindler)
* LUCENE-1621: MultiTermQuery.getTerm() has been deprecated as it does
not make sense for all subclasses of MultiTermQuery. Check individual
subclasses to see if they support getTerm(). (Mark Miller)
* LUCENE-1636: Make TokenFilter.input final so it's set only
once. (Wouter Heijke, Uwe Schindler via Mike McCandless).
* LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory
(but left an FSDirectory base class). Added an FSDirectory.open
static method to pick a good default FSDirectory implementation
given the OS. FSDirectories should now be instantiated using
FSDirectory.open or with public constructors rather than
FSDirectory.getDirectory(), which has been deprecated.
(Michael McCandless, Uwe Schindler, yonik)
* LUCENE-1665: Deprecate SortField.AUTO, to be removed in 3.0.
Instead, when sorting by field, the application should explicitly
state the type of the field. (Mike McCandless)
* LUCENE-1660: StopFilter, StandardAnalyzer, StopAnalyzer now
require up front specification of enablePositionIncrement (Mike
McCandless)
* LUCENE-1614: DocIdSetIterator's next() and skipTo() were deprecated in favor
of the new nextDoc() and advance(). The new methods return the doc Id they
landed on, saving an extra call to doc() in most cases.
For easy migration of the code, you can change the calls to next() to
nextDoc() != DocIdSetIterator.NO_MORE_DOCS and similarly for skipTo().
However it is advised that you take advantage of the returned doc ID and not
call doc() following those two.
Also, doc() was deprecated in favor of docID(). docID() should return -1 or
NO_MORE_DOCS if nextDoc/advance were not called yet, or NO_MORE_DOCS if the
iterator is exhausted. Otherwise it should return the current doc ID.
(Shai Erera via Mike McCandless)
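The iteration contract described in LUCENE-1614 can be sketched as follows. SortedArrayIterator is a hypothetical stand-in that mimics the nextDoc()/advance()/docID() methods over a sorted array; it is not a Lucene class. NO_MORE_DOCS is set to Integer.MAX_VALUE, as in DocIdSetIterator.

```java
final class DocIdIterationSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Stand-in iterator over a sorted array of doc IDs. */
    static final class SortedArrayIterator {
        private final int[] docs;
        private int idx = -1;
        private int current = -1; // -1 until nextDoc()/advance() is first called

        SortedArrayIterator(int[] sortedDocs) { this.docs = sortedDocs; }

        /** Returns the current doc ID without moving. */
        int docID() { return current; }

        /** Moves to the next doc and returns it, or NO_MORE_DOCS when exhausted. */
        int nextDoc() {
            idx++;
            current = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
            return current;
        }

        /** Moves forward to the first doc >= target and returns it. */
        int advance(int target) {
            int doc;
            do { doc = nextDoc(); } while (doc < target);
            return doc;
        }
    }

    /** The migrated loop: the returned doc ID replaces a separate doc() call. */
    static int count(SortedArrayIterator it) {
        int n = 0;
        while (it.nextDoc() != NO_MORE_DOCS) n++;
        return n;
    }
}
```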
* LUCENE-1672: All ctors/opens and other methods using String/File to
specify the directory in IndexReader, IndexWriter, and IndexSearcher
were deprecated. You should instantiate the Directory manually beforehand
and pass it to these classes (LUCENE-1451, LUCENE-1658).
(Uwe Schindler)
* LUCENE-1407: Move RemoteSearchable, RemoteCachingWrapperFilter out
of Lucene's core into new contrib/remote package. Searchable no
longer extends java.rmi.Remote (Simon Willnauer via Mike
McCandless)
* LUCENE-1677: The global property
org.apache.lucene.SegmentReader.class, and
ReadOnlySegmentReader.class are now deprecated, to be removed in
3.0. src/gcj/* has been removed. (Earwin Burrfoot via Mike
McCandless)
* LUCENE-1673: Deprecated NumberTools in favour of the new
NumericRangeQuery and its new indexing format for numeric or
date values. (Uwe Schindler)
* LUCENE-1630, LUCENE-1771: Weight is now an abstract class, and adds
a scorer(IndexReader, boolean /* scoreDocsInOrder */, boolean /*
topScorer */) method instead of scorer(IndexReader). IndexSearcher uses
this method to obtain a scorer matching the capabilities of the Collector
with respect to the ordering of docIDs. Some Scorers (like BooleanScorer) are much more
efficient if out-of-order document scoring is allowed by a Collector.
Collector must now implement acceptsDocsOutOfOrder. If you write a
Collector which does not care about doc ID order, it is recommended
that you return true. Weight has a scoresDocsOutOfOrder method, which by
default returns false. If you create a Weight which will score documents
out of order if requested, you should override that method to return true.
BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have been
deprecated as they are not needed anymore. BooleanQuery will now score docs
out of order when used with a Collector that can accept docs out of order.
Finally, Weight#explain now takes a sub-reader and sub-docID, rather than
a top level reader and docID.
(Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
* LUCENE-1466, LUCENE-1906: Added CharFilter and MappingCharFilter, which allow
chaining & mapping of characters before tokenizers run. CharStream (subclass of
Reader) is the base class for custom java.io.Readers that support offset
correction. Tokenizers got an additional method correctOffset() that is passed
down to the underlying CharStream if input is a subclass of CharStream/-Filter.
(Koji Sekiguchi via Mike McCandless, Uwe Schindler)
* LUCENE-1703: Add IndexWriter.waitForMerges. (Tim Smith via Mike
McCandless)
* LUCENE-1625: CheckIndex's programmatic API now returns separate
classes detailing the status of each component in the index, and
includes more detailed status than previously. (Tim Smith via
Mike McCandless)
* LUCENE-1713: Deprecated RangeQuery and RangeFilter and renamed to
TermRangeQuery and TermRangeFilter. TermRangeQuery is in constant
score auto rewrite mode by default. The new classes also have new
ctors taking field and term ranges as Strings (see also
LUCENE-1424). (Uwe Schindler)
* LUCENE-1609: The termInfosIndexDivisor must now be specified
up-front when opening the IndexReader. Attempts to call
IndexReader.setTermInfosIndexDivisor will hit an
UnsupportedOperationException. This was done to enable removal of
all synchronization in TermInfosReader, which previously could
cause threads to pile up in certain cases. (Dan Rosher via Mike
McCandless)
* LUCENE-1688: Deprecate static final String stop word array in
StopAnalyzer and replace it with an immutable implementation of
CharArraySet. (Simon Willnauer via Mark Miller)
* LUCENE-1742: SegmentInfos, SegmentInfo and SegmentReader have been
made public as expert, experimental APIs. These APIs may suddenly
change from release to release (Jason Rutherglen via Mike
McCandless).
* LUCENE-1754: QueryWeight.scorer() can return null if no documents
are going to be matched by the query. Similarly,
Filter.getDocIdSet() can return null if no documents are going to
be accepted by the Filter. Note that these methods may return null
but are not required to; they can instead return a Scorer that matches
no documents or a DocIdSet that rejects all documents. This is already the
behavior of some QueryWeight/Filter implementations, and is
documented here just for emphasis. (Shai Erera via Mike
McCandless)
* LUCENE-1705: Added IndexWriter.deleteAllDocuments. (Tim Smith via
Mike McCandless)
* LUCENE-1460: Changed TokenStreams/TokenFilters in contrib to
use the new TokenStream API. (Robert Muir, Michael Busch)
* LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
compat break and caused custom SpanQuery implementations to fail at runtime
in a variety of ways. This issue attempts to remedy things by causing
a compile time break on custom SpanQuery implementations and removing
the PayloadSpans class, with its functionality now moved to Spans. To
help in alleviating future back compat pain, Spans has been changed from
an interface to an abstract class.
(Hugh Cayless, Mark Miller)
* LUCENE-1808: Query.createWeight has been changed from protected to
public. (Tim Smith, Shai Erera via Mark Miller)
* LUCENE-1826: Add constructors that take AttributeSource and
AttributeFactory to all Tokenizer implementations.
(Michael Busch)
* LUCENE-1847: Similarity#idf for both a Term and Term Collection have
been deprecated. New versions that return an IDFExplanation have been
added. (Yasoja Seneviratne, Mike McCandless, Mark Miller)
* LUCENE-1877: Made NativeFSLockFactory the default for
the new FSDirectory API (open(), FSDirectory subclass ctors).
All FSDirectory system properties were deprecated and all lock
implementations use no lock prefix if the locks are stored inside
the index directory. Because the deprecated String/File ctors of
IndexWriter and IndexReader (LUCENE-1672) and FSDirectory.getDirectory()
still use the old SimpleFSLockFactory, while the new API uses
NativeFSLockFactory, we strongly recommend not mixing the deprecated
and new APIs. (Uwe Schindler, Mike McCandless)
* LUCENE-1911: Added a new method isCacheable() to DocIdSet. This method
should return true, if the underlying implementation does not use disk
I/O and is fast enough to be directly cached by CachingWrapperFilter.
OpenBitSet, SortedVIntList, and DocIdBitSet are such candidates.
The default implementation of the abstract DocIdSet class returns false.
In this case, CachingWrapperFilter copies the DocIdSetIterator into an
OpenBitSet for caching. (Uwe Schindler, Thomas Becker)
Bug fixes
* LUCENE-1415: MultiPhraseQuery has incorrect hashCode() and equals()
implementation - Leads to Solr Cache misses.
(Todd Feak, Mark Miller via yonik)
* LUCENE-1327: Fix TermSpans#skipTo() to behave as specified in javadocs
of Terms#skipTo(). (Michael Busch)
* LUCENE-1573: Do not ignore InterruptedException (caused by
Thread.interrupt()) nor enter deadlock/spin loop. Now, an interrupt
will cause a RuntimeException to be thrown. In 3.0 we will change
public APIs to throw InterruptedException. (Jeremy Volkman via
Mike McCandless)
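The interrupt handling described in LUCENE-1573 follows a common Java pattern, sketched below. InterruptSketch and pause() are hypothetical names; restoring the interrupt flag before rethrowing is an assumption about good practice, not something the entry states.

```java
final class InterruptSketch {
    /** Sleeps briefly; an interrupt is rethrown unchecked instead of being swallowed. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            // Restore the flag so callers further up can still observe the interrupt.
            Thread.currentThread().interrupt();
            throw new RuntimeException(ie);
        }
    }
}
```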
* LUCENE-1590: Fixed so that stored-only Field instances do not change the
value of omitNorms, omitTermFreqAndPositions in FieldInfo; when you
retrieve such fields they will now have omitNorms=true and
omitTermFreqAndPositions=false (though these values are unused).
(Uwe Schindler via Mike McCandless)
* LUCENE-1587: RangeQuery#equals() could consider a RangeQuery
without a collator equal to one with a collator.
(Mark Platvoet via Mark Miller)
* LUCENE-1600: Don't call String.intern unnecessarily in some cases
when loading documents from the index. (P Eger via Mike
McCandless)
* LUCENE-1611: Fix case where OutOfMemoryError in IndexWriter
could cause "infinite merging" to happen. (Christiaan Fluit via
Mike McCandless)
* LUCENE-1623: Properly handle back-compatibility of 2.3.x indexes that
contain field names with non-ascii characters. (Mike Streeton via
Mike McCandless)
* LUCENE-1593: MultiSearcher and ParallelMultiSearcher did not break ties (in
sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC was used vs.
when it wasn't). (Shai Erera via Michael McCandless)
* LUCENE-1647: Fix case where IndexReader.undeleteAll would cause
the segment's deletion count to be incorrect. (Mike McCandless)
* LUCENE-1542: When the first token(s) have 0 position increment,
IndexWriter used to incorrectly record the position as -1, if no
payload is present, or Integer.MAX_VALUE if a payload is present.
This causes positional queries to fail to match. The bug is now
fixed, but if your app relies on the buggy behavior then you must
call IndexWriter.setAllowMinus1Position(). That API is deprecated
so you must fix your application, and rebuild your index, to not
rely on this behavior by the 3.0 release of Lucene. (Jonathan
Mamou, Mark Miller via Mike McCandless)
* LUCENE-1658: Fixed MMapDirectory to correctly throw IOExceptions
on EOF, removed numeric overflow possibilities and added support
for a hack to unmap the buffers on closing IndexInput.
(Uwe Schindler)
* LUCENE-1681: Fix infinite loop caused by a call to DocValues methods
getMinValue, getMaxValue, getAverageValue. (Simon Willnauer via Mark Miller)
* LUCENE-1599: Add clone support for SpanQuerys. SpanRegexQuery counts
on this functionality and does not work correctly without it.
(Billow Gao, Mark Miller)
* LUCENE-1718: Fix termInfosIndexDivisor to carry over to reopened
readers (Mike McCandless)
* LUCENE-1583: SpanOrQuery skipTo() doesn't always move forwards as Spans
documentation indicates it should. (Moti Nisenson via Mark Miller)
* LUCENE-1566: Sun JVM Bug
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546 causes
invalid OutOfMemoryError when reading too many bytes at once from
a file on 32bit JVMs that have a large maximum heap size. This
fix adds set/getReadChunkSize to FSDirectory so that large reads
are broken into chunks, to work around this JVM bug. On 32bit
JVMs the default chunk size is 100 MB; on 64bit JVMs, which don't
show the bug, the default is Integer.MAX_VALUE. (Simon Willnauer
via Mike McCandless)
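The chunking idea behind LUCENE-1566 can be sketched with a plain InputStream. This is an illustration only, not FSDirectory's code: ChunkedReadSketch and its parameters are hypothetical, and chunkSize is assumed to be positive.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class ChunkedReadSketch {
    /**
     * Fills b[offset .. offset+len) using reads of at most chunkSize bytes each.
     * Returns the number of read calls that were needed.
     */
    static int readFully(InputStream in, byte[] b, int offset, int len, int chunkSize)
            throws IOException {
        int calls = 0;
        int done = 0;
        while (done < len) {
            int toRead = Math.min(chunkSize, len - done); // never ask for more than one chunk
            int n = in.read(b, offset + done, toRead);
            if (n < 0) throw new EOFException("read past EOF");
            done += n;
            calls++;
        }
        return calls;
    }
}
```

A smaller chunk size means more read calls but a smaller request per call, which is the trade-off the configurable chunk size exposes.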
* LUCENE-1448: Added TokenStream.end() to perform end-of-stream
operations (i.e., to return the end offset of the tokenization).
This is important when multiple fields with the same name are added
to a document, to ensure offsets recorded in term vectors for all
of the instances are correct.
(Mike McCandless, Mark Miller, Michael Busch)
* LUCENE-1805: CloseableThreadLocal did not allow a null Object in get(),
although it does allow it in set(Object). Fix get() to not assert the object
is not null. (Shai Erera via Mike McCandless)
* LUCENE-1801: Changed all Tokenizers or TokenStreams (in core/contrib)
that are the source of Tokens to always call
AttributeSource.clearAttributes() first. (Uwe Schindler)
* LUCENE-1819: MatchAllDocsQuery.toString(field) should produce output
that is parsable by the QueryParser. (John Wang, Mark Miller)
* LUCENE-1836: Fix localization bug in the new query parser and add
new LocalizedTestCase as base class for localization junit tests.
(Robert Muir, Uwe Schindler via Michael Busch)
* LUCENE-1847: PhraseQuery/TermQuery/SpanQuery use IndexReader specific stats
in their Weight#explain methods - these stats should be corpus wide.
(Yasoja Seneviratne, Mike McCandless, Mark Miller)
* LUCENE-1885: Fix the bug that NativeFSLock.isLocked() did not work,
if the lock was obtained by another NativeFSLock(Factory) instance.
Because of this IndexReader.isLocked() and IndexWriter.isLocked() did
not work correctly. (Uwe Schindler)
* LUCENE-1899: Fix O(N^2) CPU cost when setting docIDs in order in an
OpenBitSet, due to an inefficiency in how the underlying storage is
reallocated. (Nadav Har'El via Mike McCandless)
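The kind of reallocation cost described in LUCENE-1899 is usually avoided by over-allocating when the backing array grows. The sketch below is a generic illustration of that technique, not the actual patch; GrowthSketch and its 1.5x growth factor are assumptions.

```java
import java.util.Arrays;

final class GrowthSketch {
    private long[] bits = new long[1];
    int reallocations = 0;

    /** Sets bit i, growing the array by about 1.5x so that in-order sets copy O(N) words in total. */
    void set(int i) {
        int word = i >> 6;
        if (word >= bits.length) {
            // Growing to exactly word + 1 would copy the whole array on every new word.
            int newLen = Math.max(word + 1, bits.length + (bits.length >> 1) + 1);
            bits = Arrays.copyOf(bits, newLen);
            reallocations++;
        }
        bits[word] |= 1L << (i & 63);
    }

    /** Returns true if bit i has been set. */
    boolean get(int i) {
        int word = i >> 6;
        return word < bits.length && (bits[word] & (1L << (i & 63))) != 0;
    }
}
```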
* LUCENE-1918: Fixed cases where a ParallelReader would
generate exceptions on being passed to
IndexWriter.addIndexes(IndexReader[]). First case was when the
ParallelReader was empty. Second case was when the ParallelReader
used to contain documents with TermVectors, but all such documents
have been deleted. (Christian Kohlschütter via Mike McCandless)
New features
* LUCENE-1411: Added expert API to open an IndexWriter on a prior
commit, obtained from IndexReader.listCommits. This makes it
possible to rollback changes to an index even after you've closed
the IndexWriter that made the changes, assuming you are using an
IndexDeletionPolicy that keeps past commits around. This is useful
when building transactional support on top of Lucene. (Mike
McCandless)
* LUCENE-1382: Add an optional arbitrary Map (String -> String)
"commitUserData" to IndexWriter.commit(), which is stored in the
segments file and is then retrievable via
IndexReader.getCommitUserData instance and static methods.
(Shalin Shekhar Mangar via Mike McCandless)
* LUCENE-1420: Similarity now has a computeNorm method that allows
custom Similarity classes to override how norm is computed. It's
provided a FieldInvertState instance that contains details from
inverting the field. The default impl is boost *
lengthNorm(numTerms), to be backwards compatible. Also added
{set/get}DiscountOverlaps to DefaultSimilarity, to control whether
overlapping tokens (tokens with 0 position increment) should be
counted in lengthNorm. (Andrzej Bialecki via Mike McCandless)
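The default norm computation from LUCENE-1420 can be worked through numerically. The sketch below is self-contained and uses hypothetical names; it assumes the DefaultSimilarity length norm of 1/sqrt(numTerms) and models discountOverlaps by subtracting the overlapping tokens from the count. numTerms is assumed to be positive.

```java
final class ComputeNormSketch {
    /** DefaultSimilarity-style length norm: 1 / sqrt(numTerms). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** boost * lengthNorm(numTerms), optionally leaving overlapping tokens out of the count. */
    static float computeNorm(float boost, int length, int numOverlap, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * lengthNorm(numTerms);
    }
}
```

For a field with 20 tokens, 4 of which have a position increment of 0, discounting overlaps gives a norm based on 16 terms and therefore a slightly higher value than counting all 20.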
* LUCENE-1424: Moved constant score query rewrite capability into
MultiTermQuery, allowing TermRangeQuery, PrefixQuery and WildcardQuery
to switch between constant-score rewriting or BooleanQuery
expansion rewriting via a new setRewriteMethod method.
Deprecated ConstantScoreRangeQuery (Mark Miller via Mike
McCandless)
* LUCENE-1461: Added FieldCacheRangeFilter, a RangeFilter for
single-term fields that uses FieldCache to compute the filter. If
your documents all have a single term for a given field, and you
need to create many RangeFilters with varying lower/upper bounds,
then this is likely a much faster way to create the filters than
RangeFilter. FieldCacheRangeFilter allows ranges on all data types that
FieldCache supports (term ranges, byte, short, int, long, float, double).
However, it comes at the expense of added RAM consumption and slower
first-time usage due to populating the FieldCache. It also does not
support collation (Tim Sturge, Matt Ericson via Mike McCandless and
Uwe Schindler)
* LUCENE-1296: add protected method CachingWrapperFilter.docIdSetToCache
to allow subclasses to choose which DocIdSet implementation to use
(Paul Elschot via Mike McCandless)
* LUCENE-1390: Added ASCIIFoldingFilter, a Filter that converts
alphabetic, numeric, and symbolic Unicode characters which are not in
the first 127 ASCII characters (the "Basic Latin" Unicode block) into
their ASCII equivalents, if one exists. ISOLatin1AccentFilter, which
handles a subset of this filter, has been deprecated.
(Andi Vajda, Steven Rowe via Mark Miller)
* LUCENE-1478: Added new SortField constructor allowing you to
specify a custom FieldCache parser to generate numeric values from
terms for a field. (Uwe Schindler via Mike McCandless)
* LUCENE-1528: Add support for Ideographic Space to the queryparser.
(Luis Alves via Michael Busch)
* LUCENE-1487: Added FieldCacheTermsFilter, to filter by multiple
terms on single-valued fields. The filter loads the FieldCache
for the field the first time it's called, and subsequent usage of
that field, even with different Terms in the filter, is fast.
(Tim Sturge, Shalin Shekhar Mangar via Mike McCandless).
* LUCENE-1314: Add clone(), clone(boolean readOnly) and
reopen(boolean readOnly) to IndexReader. Cloning an IndexReader
gives you a new reader which you can make changes to (deletions,
norms) without affecting the original reader. Now, with clone or
reopen you can change the readOnly of the original reader. (Jason
Rutherglen, Mike McCandless)
* LUCENE-1506: Added FilteredDocIdSet, an abstract class which you
subclass to implement the "match" method to accept or reject each
docID. Unlike ChainedFilter (under contrib/misc),
FilteredDocIdSet never requires you to materialize the full
bitset. Instead, match() is called on demand per docID. (John
Wang via Mike McCandless)
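The on-demand matching that LUCENE-1506 describes can be sketched with standard Java iterators. This is not the FilteredDocIdSet API: LazyFilterSketch and filtered() are hypothetical names, and an IntPredicate plays the role of match(). No bitset is built; each doc ID is tested only when the consumer reaches it.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.function.IntPredicate;

final class LazyFilterSketch {
    /** Wraps an inner doc ID iterator; match is evaluated per doc ID, on demand. */
    static PrimitiveIterator.OfInt filtered(PrimitiveIterator.OfInt inner, IntPredicate match) {
        return new PrimitiveIterator.OfInt() {
            private int next;
            private boolean ready;

            @Override
            public boolean hasNext() {
                // Pull from the inner iterator only until one accepted doc is found.
                while (!ready && inner.hasNext()) {
                    int doc = inner.nextInt();
                    if (match.test(doc)) { next = doc; ready = true; }
                }
                return ready;
            }

            @Override
            public int nextInt() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return next;
            }
        };
    }
}
```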
* LUCENE-1398: Add ReverseStringFilter to contrib/analyzers, a filter
to reverse the characters in each token. (Koji Sekiguchi via yonik)
* LUCENE-1551: Add expert IndexReader.reopen(IndexCommit) to allow
efficiently opening a new reader on a specific commit, sharing
resources with the original reader. (Torin Danil via Mike
McCandless)
* LUCENE-1434: Added org.apache.lucene.util.IndexableBinaryStringTools,
to encode byte[] as String values that are valid terms, and
maintain sort order of the original byte[] when the bytes are
interpreted as unsigned. (Steven Rowe via Mike McCandless)
* LUCENE-1543: Allow MatchAllDocsQuery to optionally use norms from
a specific field to set the score for a document. (Karl Wettin
via Mike McCandless)
* LUCENE-1586: Add IndexReader.getUniqueTermCount(). (Mike
McCandless via Derek)
* LUCENE-1516: Added "near real-time search" to IndexWriter, via a
new expert getReader() method. This method returns a reader that
searches the full index, including any uncommitted changes in the
current IndexWriter session. This should result in a faster
turnaround than the normal approach of committing the changes and
then reopening a reader. (Jason Rutherglen via Mike McCandless)
* LUCENE-1603: Added new MultiTermQueryWrapperFilter, to wrap any
MultiTermQuery as a Filter. Also made some improvements to
MultiTermQuery: return DocIdSet.EMPTY_DOCIDSET if there are no
terms in the enum; track the total number of terms it visited
during rewrite (getTotalNumberOfTerms). FilteredTermEnum is also
more friendly to subclassing. (Uwe Schindler via Mike McCandless)
* LUCENE-1605: Added BitVector.subset(). (Jeremy Volkman via Mike
McCandless)
* LUCENE-1618: Added FileSwitchDirectory that enables files with
specified extensions to be stored in a primary directory and the
rest of the files to be stored in the secondary directory. For
example, this can be useful for the large doc-store (stored
fields, term vectors) files in FSDirectory and the rest of the
index files in a RAMDirectory. (Jason Rutherglen via Mike
McCandless)
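The routing rule behind FileSwitchDirectory (LUCENE-1618) amounts to a lookup on the file extension. The sketch below shows only that rule, with hypothetical names and string labels instead of real Directory instances.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class FileSwitchSketch {
    private final Set<String> primaryExtensions;

    FileSwitchSketch(String... primaryExtensions) {
        this.primaryExtensions = new HashSet<>(Arrays.asList(primaryExtensions));
    }

    /** Returns the text after the last '.', or "" when the name has no extension. */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    /** Picks "primary" for listed extensions and "secondary" for everything else. */
    String route(String fileName) {
        return primaryExtensions.contains(extension(fileName)) ? "primary" : "secondary";
    }
}
```

For the doc-store example in the entry, the primary set would hold the stored-field and term-vector extensions (for instance fdt, fdx, tvx, tvd, tvf), with all other files going to the secondary directory.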
* LUCENE-1494: Added FieldMaskingSpanQuery which can be used to
cross-correlate Spans from different fields.
(Paul Cowan and Chris Hostetter)
* LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take
deletions into account when considering merges. (Yasuhiro Matsuda
via Mike McCandless)
* LUCENE-1550: Added new n-gram based String distance measure for spell checking.
See the Javadocs for NGramDistance.java for a reference paper on why
this is helpful (Tom Morton via Grant Ingersoll)
* LUCENE-1470, LUCENE-1582, LUCENE-1602, LUCENE-1673, LUCENE-1701, LUCENE-1712:
Added NumericRangeQuery and NumericRangeFilter, a fast alternative to
RangeQuery/RangeFilter for numeric searches. They depend on a specific
structure of terms in the index that can be created by indexing
using the new NumericField or NumericTokenStream classes. NumericField
can only be used for indexing and optionally stores the values as
string representation in the doc store. Documents returned from
IndexReader/IndexSearcher will return only the String value using
the standard Fieldable interface. NumericFields can be sorted on
and loaded into the FieldCache. (Uwe Schindler, Yonik Seeley,
Mike McCandless)
* LUCENE-1405: Added support for Ant resource collections in contrib/ant
<index> task. (Przemyslaw Sztoch via Erik Hatcher)
* LUCENE-1699: Allow setting a TokenStream on Field/Fieldable for indexing
in conjunction with any other ways to specify stored field values,
currently binary or string values. (yonik)
* LUCENE-1701: Made the standard FieldCache.Parsers public and added
parsers for fields generated using NumericField/NumericTokenStream.
All standard parsers now also implement Serializable and enforce
their singleton status. (Uwe Schindler, Mike McCandless)
* LUCENE-1741: User configurable maximum chunk size in MMapDirectory.
On 32 bit platforms, the address space can be very fragmented, so
one big ByteBuffer for the whole file may not fit into address space.
(Eks Dev via Uwe Schindler)
* LUCENE-1644: Enable 4 rewrite modes for queries deriving from
MultiTermQuery (WildcardQuery, PrefixQuery, TermRangeQuery,
NumericRangeQuery): CONSTANT_SCORE_FILTER_REWRITE first creates a
filter and then assigns constant score (boost) to docs;
CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE creates a BooleanQuery but
uses a constant score (boost); SCORING_BOOLEAN_QUERY_REWRITE also
creates a BooleanQuery but keeps the BooleanQuery's scores;
CONSTANT_SCORE_AUTO_REWRITE tries to pick the most performant
constant-score rewrite method. (Mike McCandless)
* LUCENE-1448: Added TokenStream.end(), to perform end-of-stream
operations. This is currently used to fix offset problems when
multiple fields with the same name are added to a document.
(Mike McCandless, Mark Miller, Michael Busch)
* LUCENE-1776: Add an option to not collect payloads for an ordered
SpanNearQuery. Payloads were not lazily loaded in this case as
the javadocs implied. If you have payloads and want to use an ordered
SpanNearQuery that does not need to use the payloads, you can
disable loading them with a new constructor switch. (Mark Miller)
* LUCENE-1341: Added PayloadNearQuery to enable SpanNearQuery functionality
with payloads (Peter Keegan, Grant Ingersoll, Mark Miller)
* LUCENE-1790: Added PayloadTermQuery to enable scoring of payloads
based on the maximum payload seen for a document.
Slight refactoring of Similarity and other payload queries (Grant Ingersoll, Mark Miller)
* LUCENE-1749: Addition of FieldCacheSanityChecker utility, and
hooks to use it in all existing Lucene Tests. This class can
be used by any application to inspect the FieldCache and provide
diagnostic information about the possibility of inconsistent
FieldCache usage. Namely: FieldCache entries for the same field
with different datatypes or parsers; and FieldCache entries for
the same field in both a reader, and one of its (descendant) sub
readers.
(Chris Hostetter, Mark Miller)
* LUCENE-1789: Added utility class
oal.search.function.MultiValueSource to ease the transition to
segment based searching for any apps that directly call
oal.search.function.* APIs. This class wraps any other
ValueSource, but takes care when composite (multi-segment) readers are
passed to not double RAM usage in the FieldCache. (Chris
Hostetter, Mark Miller, Mike McCandless)
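The chunking described in LUCENE-1741 comes down to simple arithmetic, sketched here. This is a hypothetical illustration rather than MMapDirectory's code: the class and method names are invented, and the chunk size is assumed to be a power of two so that a file position splits into a chunk index and an in-chunk offset with a shift and a mask.

```java
/** Hypothetical helper: splits file positions across fixed-size mapped chunks. */
final class ChunkMathSketch {
    private final int chunkSizePower; // chunk size is 2^chunkSizePower bytes (assumption)
    private final long chunkSizeMask; // low bits that address a byte inside one chunk

    ChunkMathSketch(int chunkSizePower) {
        if (chunkSizePower < 1 || chunkSizePower > 30) {
            throw new IllegalArgumentException("chunkSizePower must be in [1, 30]");
        }
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1L;
    }

    /** Number of chunks needed to cover a file of the given length (rounds up). */
    int chunkCount(long fileLength) {
        long chunkSize = 1L << chunkSizePower;
        return (int) ((fileLength + chunkSize - 1) >>> chunkSizePower);
    }

    /** Index of the chunk that holds the byte at the given file position. */
    int chunkIndex(long position) {
        return (int) (position >>> chunkSizePower);
    }

    /** Offset of that byte inside its chunk. */
    int chunkOffset(long position) {
        return (int) (position & chunkSizeMask);
    }
}
```

A smaller chunk size means more, smaller mappings, which is the trade-off that helps on a fragmented 32 bit address space.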
Optimizations
* LUCENE-1427: Fixed QueryWrapperFilter to not waste time computing
scores of the query, since they are just discarded. Also, made it
more efficient (single pass) by not creating & populating an
intermediate OpenBitSet (Paul Elschot, Mike McCandless)
* LUCENE-1443: Performance improvement for OpenBitSetDISI.inPlaceAnd()
(Paul Elschot via yonik)
* LUCENE-1484: Remove synchronization of IndexReader.document() by
using CloseableThreadLocal internally. (Jason Rutherglen via Mike
McCandless).
* LUCENE-1124: Short circuit FuzzyQuery.rewrite when input token length
is small compared to minSimilarity. (Timo Nentwig, Mark Miller)
* LUCENE-1316: MatchAllDocsQuery now avoids the synchronized
IndexReader.isDeleted() call per document, by directly accessing
the underlying deleteDocs BitVector. This improves performance
with non-readOnly readers, especially in a multi-threaded
environment. (Todd Feak, Yonik Seeley, Jason Rutherglen via Mike
McCandless)
* LUCENE-1483: When searching over multiple segments we now visit
each sub-reader one at a time. This speeds up warming, since
FieldCache entries (if required) can be shared across reopens for
those segments that did not change, and also speeds up searches
that sort by relevance or by field values. (Mark Miller, Mike
McCandless)
* LUCENE-1575: The new Collector class decouples collect() from
score computation. Collector.setScorer is called to establish the
current Scorer in-use per segment. Collectors that require the
score should then call Scorer.score() per hit inside
collect(). (Shai Erera via Mike McCandless)
* LUCENE-1596: MultiTermDocs speedup when set with
MultiTermDocs.seek(MultiTermEnum) (yonik)
* LUCENE-1653: Avoid creating a Calendar in every call to
DateTools#dateToString, DateTools#timeToString and
DateTools#round. (Shai Erera via Mark Miller)
* LUCENE-1688: Deprecate static final String stop word array and
replace it with an immutable implementation of CharArraySet.
Removes conversions between Set and array.
(Simon Willnauer via Mark Miller)
* LUCENE-1754: BooleanQuery.queryWeight.scorer() will return null if
it won't match any documents (e.g. if there are no required and
optional scorers, or not enough optional scorers to satisfy
minShouldMatch). (Shai Erera via Mike McCandless)
* LUCENE-1607: To speed up string interning for commonly used
strings, the StringHelper.intern() interface was added with a
default implementation that uses a lockless cache.
(Earwin Burrfoot, yonik)
* LUCENE-1800: QueryParser should use reusable TokenStreams. (yonik)
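The decoupling described in LUCENE-1575 can be sketched with a pair of simplified types. ScorerSketch, CollectorSketch and the driver below are hypothetical stand-ins, not the Lucene classes; they only show that a collector which never calls score() never pays for score computation.

```java
/** Hypothetical stand-in for a scorer: computes the score of the current hit on demand. */
interface ScorerSketch {
    float score();
}

/** Hypothetical stand-in for a collector: receives hits, asks for scores only if it wants them. */
abstract class CollectorSketch {
    protected ScorerSketch scorer;

    void setScorer(ScorerSketch scorer) {
        this.scorer = scorer;
    }

    abstract void collect(int doc);
}

/** Counts hits and never asks for a score. */
final class CountingCollector extends CollectorSketch {
    int count;

    @Override
    void collect(int doc) {
        count++;
    }
}

/** Tracks the best score, so it calls score() once per hit. */
final class MaxScoreCollector extends CollectorSketch {
    float max = Float.NEGATIVE_INFINITY;

    @Override
    void collect(int doc) {
        max = Math.max(max, scorer.score());
    }
}

final class SegmentSearchSketch {
    /** Feeds every hit of every segment to the collector; returns how often score() ran. */
    static int search(float[][] scoresPerSegment, CollectorSketch collector) {
        int[] scoreCalls = {0};
        for (float[] segment : scoresPerSegment) {
            int[] current = {0};
            // A new scorer is installed once per segment, before any hit is collected.
            collector.setScorer(() -> {
                scoreCalls[0]++;
                return segment[current[0]];
            });
            for (int doc = 0; doc < segment.length; doc++) {
                current[0] = doc;
                collector.collect(doc);
            }
        }
        return scoreCalls[0];
    }
}
```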
Documentation
* LUCENE-1908: Scoring documentation improvements in Similarity javadocs.
(Mark Miller, Shai Erera, Ted Dunning, Jiri Kuhn, Marvin Humphrey, Doron Cohen)
* LUCENE-1872: NumericField javadoc improvements
(Michael McCandless, Uwe Schindler)
* LUCENE-1875: Make TokenStream.end javadoc less confusing.
(Uwe Schindler)
* LUCENE-1862: Rectified duplicate package level javadocs for
o.a.l.queryParser and o.a.l.analysis.cn.
(Chris Hostetter)
* LUCENE-1886: Improved hyperlinking in key Analysis javadocs
(Bernd Fondermann via Chris Hostetter)
* LUCENE-1884: massive javadoc and comment cleanup, primarily dealing with
typos.
(Robert Muir via Chris Hostetter)
* LUCENE-1898: Switch changes to use bullets rather than numbers and
update changes-to-html script to handle the new format.
(Steven Rowe, Mark Miller)
* LUCENE-1900: Improve Searchable Javadoc.
(Nadav Har'El, Doron Cohen, Marvin Humphrey, Mark Miller)
* LUCENE-1896: Improve Similarity#queryNorm javadocs.
(Jiri Kuhn, Mark Miller)
Build
* LUCENE-1440: Add new targets to build.xml that allow downloading
and executing the junit testcases from an older release for
backwards-compatibility testing. (Michael Busch)
* LUCENE-1446: Add compatibility tag to common-build.xml and run
backwards-compatibility tests in the nightly build. (Michael Busch)
* LUCENE-1529: Properly test "drop-in" replacement of jar with
backwards-compatibility tests. (Mike McCandless, Michael Busch)
* LUCENE-1851: Change 'javacc' and 'clean-javacc' targets to build
and clean contrib/surround files. (Luis Alves via Michael Busch)
* LUCENE-1854: tar task should use longfile="gnu" to avoid false file
name length warnings. (Mark Miller)
Test Cases
* LUCENE-1791: Enhancements to the QueryUtils and CheckHits utility
classes to wrap IndexReaders and Searchers in MultiReaders or
MultiSearcher when possible to help exercise more edge cases.
(Chris Hostetter, Mark Miller)
* LUCENE-1852: Fix localization test failures.
(Robert Muir via Michael Busch)
* LUCENE-1843: Refactored all tests that use assertAnalyzesTo() & others
in core and contrib to use a new BaseTokenStreamTestCase
base class. Also rewrote some tests to use these general analysis assert
functions instead of their own (e.g. TestMappingCharFilter).
The new base class also tests tokenization with the TokenStream.next()
backwards layer enabled (using Token/TokenWrapper as attribute
implementation) and disabled (default for Lucene 3.0)
(Uwe Schindler, Robert Muir)
* LUCENE-1836: Added a new LocalizedTestCase as base class for localization
junit tests. (Robert Muir, Uwe Schindler via Michael Busch)
======================= Release 2.4.1 =======================
API Changes
1. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
resources. (Christian Kohlschütter via Mike McCandless)
Bug fixes
1. LUCENE-1452: Fixed silent data-loss case whereby binary fields are
truncated to 0 bytes during merging if the segments being merged
are non-congruent (same field name maps to different field
numbers). This bug was introduced with LUCENE-1219. (Andrzej
Bialecki via Mike McCandless).
2. LUCENE-1429: Don't throw incorrect IllegalStateException from
IndexWriter.close() if you've hit an OOM when autoCommit is true.
(Mike McCandless)
3. LUCENE-1474: If IndexReader.flush() is called twice when there were
pending deletions, it could lead to later false AssertionError
during IndexReader.open. (Mike McCandless)
4. LUCENE-1430: Fix false AlreadyClosedException from IndexReader.open
(masking an actual IOException) that takes String or File path.
(Mike McCandless)
5. LUCENE-1442: Multiple-valued NOT_ANALYZED fields can double-count
token offsets. (Mike McCandless)
6. LUCENE-1453: Ensure IndexReader.reopen()/clone() does not result in
incorrectly closing the shared FSDirectory. This bug would only
happen if you use IndexReader.open() with a File or String argument.
The returned readers are wrapped by a FilterIndexReader that
correctly handles closing of directory after reopen()/clone().
(Mark Miller, Uwe Schindler, Mike McCandless)
7. LUCENE-1457: Fix possible overflow bugs during binary
searches. (Mark Miller via Mike McCandless)
8. LUCENE-1459: Fix CachingWrapperFilter to not throw exception if
both bits() and getDocIdSet() methods are called. (Matt Jones via
Mike McCandless)
9. LUCENE-1519: Fix int overflow bug during segment merging. (Deepak
via Mike McCandless)
10. LUCENE-1521: Fix int overflow bug when flushing segment.
(Shon Vella via Mike McCandless).
11. LUCENE-1544: Fix deadlock in IndexWriter.addIndexes(IndexReader[]).
(Mike McCandless via Doug Sale)
12. LUCENE-1547: Fix rare thread safety issue if two threads call
IndexWriter commit() at the same time. (Mike McCandless)
13. LUCENE-1465: NearSpansOrdered returns payloads from first possible match
rather than the correct, shortest match; Payloads could be returned even
if the max slop was exceeded; The wrong payload could be returned in
certain situations. (Jonathan Mamou, Greg Shackles, Mark Miller)
14. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
resources. (Christian Kohlschütter via Mike McCandless)
15. LUCENE-1552: Fix IndexWriter.addIndexes(IndexReader[]) to properly
rollback IndexWriter's internal state on hitting an
exception. (Scott Garland via Mike McCandless)
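The kind of overflow addressed by LUCENE-1457 usually takes the form sketched below. The entry does not show the patched code, so this is an assumption about the classic midpoint bug rather than a copy of the fix; the class and method names are hypothetical.

```java
/** Hypothetical illustration of the midpoint overflow in a binary search. */
final class MidpointSketch {
    /** Overflows to a negative value once low + high exceeds Integer.MAX_VALUE. */
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    /** The unsigned shift treats the wrapped sum as unsigned, so the result stays correct. */
    static int safeMid(int low, int high) {
        return (low + high) >>> 1;
    }

    /** Returns the index of key, or -(insertionPoint + 1) if key is absent. */
    static int binarySearch(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = safeMid(low, high);
            if (sorted[mid] < key) {
                low = mid + 1;
            } else if (sorted[mid] > key) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }
}
```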
======================= Release 2.4.0 =======================
Changes in backwards compatibility policy
1. LUCENE-1340: In a minor change to Lucene's backward compatibility
policy, we are now allowing the Fieldable interface to have
changes, within reason, and made on a case-by-case basis. If an
application implements its own Fieldable, please be aware of
this. Otherwise, no need to be concerned. This is in effect for
all 2.X releases, starting with 2.4. Also note, that in all
likelihood, Fieldable will be changed in 3.0.
Changes in runtime behavior
1. LUCENE-1151: Fix StandardAnalyzer to not mis-identify host names
(eg lucene.apache.org) as an ACRONYM. To get back to the pre-2.4
backwards compatible, but buggy, behavior, you can either call
StandardAnalyzer.setDefaultReplaceInvalidAcronym(false) (static
method), or, set system property
org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym
to "false" on JVM startup. All StandardAnalyzer instances created
after that will then show the pre-2.4 behavior. Alternatively,
you can call setReplaceInvalidAcronym(false) to change the
behavior per instance of StandardAnalyzer. This backwards
compatibility will be removed in 3.0 (hardwiring the value to
true). (Mike McCandless)
2. LUCENE-1044: IndexWriter with autoCommit=true now commits (such
that a reader can see the changes) far less often than it used to.
Previously, every flush was also a commit. You can always force a
commit by calling IndexWriter.commit(). Furthermore, in 3.0,
autoCommit will be hardwired to false (IndexWriter constructors
that take an autoCommit argument have been deprecated) (Mike
McCandless)
3. LUCENE-1335: IndexWriter.addIndexes(Directory[]) and
addIndexesNoOptimize no longer allow the same Directory instance
to be passed in more than once. Internally, IndexWriter uses
Directory and segment name to uniquely identify segments, so
adding the same Directory more than once was causing duplicates
which led to problems (Mike McCandless)
4. LUCENE-1396: Improve PhraseQuery.toString() so that gaps in the
positions are indicated with a ? and multiple terms at the same
position are joined with a |. (Andrzej Bialecki via Mike
McCandless)
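The notation introduced by LUCENE-1396 can be sketched as follows. This is a minimal, hypothetical formatter covering only the phrase body (no field name, quotes or slop); it assumes non-negative positions and is not the PhraseQuery source.

```java
/** Hypothetical formatter: "?" marks a position gap, "|" joins terms at the same position. */
final class PhraseFormatSketch {
    static String format(String[] terms, int[] positions) {
        if (terms.length != positions.length) {
            throw new IllegalArgumentException("terms and positions must have equal length");
        }
        int maxPosition = -1;
        for (int p : positions) {
            if (p < 0) {
                throw new IllegalArgumentException("positions must be non-negative");
            }
            maxPosition = Math.max(maxPosition, p);
        }
        // One slot per position; a slot that stays null is a gap.
        String[] slots = new String[maxPosition + 1];
        for (int i = 0; i < terms.length; i++) {
            int p = positions[i];
            slots[p] = (slots[p] == null) ? terms[i] : slots[p] + "|" + terms[i];
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < slots.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(slots[i] == null ? "?" : slots[i]);
        }
        return out.toString();
    }
}
```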
API Changes
1. LUCENE-1084: Changed all IndexWriter constructors to take an
explicit parameter for maximum field size. Deprecated all the
pre-existing constructors; these will be removed in release 3.0.
NOTE: these new constructors set autoCommit to false. (Steven
Rowe via Mike McCandless)
2. LUCENE-584: Changed Filter API to return a DocIdSet instead of a
java.util.BitSet. This allows using more efficient data structures
for Filters and makes them more flexible. This deprecates
Filter.bits(), so all filters that implement this outside
the Lucene code base will need to be adapted. See also the javadocs
of the Filter class. (Paul Elschot, Michael Busch)
3. LUCENE-1044: Added IndexWriter.commit() which flushes any buffered
adds/deletes and then commits a new segments file so readers will
see the changes. Deprecate IndexWriter.flush() in favor of
IndexWriter.commit(). (Mike McCandless)
4. LUCENE-325: Added IndexWriter.expungeDeletes methods, which
consult the MergePolicy to find merges necessary to merge away all
deletes from the index. This should be a somewhat lower cost
operation than optimize. (John Wang via Mike McCandless)
5. LUCENE-1233: Return empty array instead of null when no fields
match the specified name in these methods in Document:
getFieldables, getFields, getValues, getBinaryValues. (Stefan
Trcek via Mike McCandless)
6. LUCENE-1234: Make BoostingSpanScorer protected. (Andi Vajda via Grant Ingersoll)
7. LUCENE-510: The index now stores strings as true UTF-8 bytes
(previously it was Java's modified UTF-8). If any text, either
stored fields or a token, has illegal UTF-16 surrogate characters,
these characters are now silently replaced with the Unicode
replacement character U+FFFD. This is a change to the index file
format. (Marvin Humphrey via Mike McCandless)
8. LUCENE-852: Let the SpellChecker caller specify IndexWriter mergeFactor
and RAM buffer size. (Otis Gospodnetic)
9. LUCENE-1290: Deprecate org.apache.lucene.search.Hits, Hit and HitIterator
and remove all references to these classes from the core. Also update demos
and tutorials. (Michael Busch)
10. LUCENE-1288: Add getVersion() and getGeneration() to IndexCommit.
getVersion() returns the same value that IndexReader.getVersion()
returns when the reader is opened on the same commit. (Jason
Rutherglen via Mike McCandless)
11. LUCENE-1311: Added IndexReader.listCommits(Directory) static
method to list all commits in a Directory, plus IndexReader.open
methods that accept an IndexCommit and open the index as of that
commit. These methods are only useful if you implement a custom
DeletionPolicy that keeps more than the last commit around.
(Jason Rutherglen via Mike McCandless)
12. LUCENE-1325: Added IndexCommit.isOptimized(). (Shalin Shekhar
Mangar via Mike McCandless)
13. LUCENE-1324: Added TokenFilter.reset(). (Shai Erera via Mike
McCandless)
14. LUCENE-1340: Added Fieldable.omitTf() method to skip indexing term
frequency, positions and payloads. This saves index space, and
indexing/searching time. (Eks Dev via Mike McCandless)
15. LUCENE-1219: Add basic reuse API to Fieldable for binary fields:
getBinaryValue/Offset/Length(); currently only lazy fields reuse
the provided byte[] result to getBinaryValue. (Eks Dev via Mike
McCandless)
16. LUCENE-1334: Add new constructor for Term: Term(String fieldName)
which defaults term text to "". (DM Smith via Mike McCandless)
17. LUCENE-1333: Added Token.reinit(*) APIs to re-initialize (reuse) a
Token. Also added term() method to return a String, with a
performance penalty clearly documented. Also implemented
hashCode() and equals() in Token, and fixed all core and contrib
analyzers to use the re-use APIs. (DM Smith via Mike McCandless)
18. LUCENE-1329: Add optional readOnly boolean when opening an
IndexReader. A readOnly reader is not allowed to make changes
(deletions, norms) to the index; in exchange, the isDeleted
method, often a bottleneck when searching with many threads, is
not synchronized. The default for readOnly is still false, but in
3.0 the default will become true. (Jason Rutherglen via Mike
McCandless)
19. LUCENE-1367: Add IndexCommit.isDeleted(). (Shalin Shekhar Mangar
via Mike McCandless)
20. LUCENE-1061: Factored out all "new XXXQuery(...)" in
QueryParser.java into protected methods newXXXQuery(...) so that
subclasses can create their own subclasses of each Query type.
(John Wang via Mike McCandless)
21. LUCENE-753: Added new Directory implementation
org.apache.lucene.store.NIOFSDirectory, which uses java.nio's
FileChannel to do file reads. On most non-Windows platforms, with
many threads sharing a single searcher, this may yield sizable
improvement to query throughput when compared to FSDirectory,
which only allows a single thread to read from an open file at a
time. (Jason Rutherglen via Mike McCandless)
22. LUCENE-1371: Added convenience method TopDocs Searcher.search(Query query, int n).
(Mike McCandless)
23. LUCENE-1356: Allow easy extensions of TopDocCollector by turning
constructor and fields from package to protected. (Shai Erera
via Doron Cohen)
24. LUCENE-1375: Added convenience method IndexCommit.getTimestamp,
which is equivalent to
getDirectory().fileModified(getSegmentsFileName()). (Mike McCandless)
25. LUCENE-1366: Rename Field.Index options to be more accurate:
TOKENIZED becomes ANALYZED; UN_TOKENIZED becomes NOT_ANALYZED;
NO_NORMS becomes NOT_ANALYZED_NO_NORMS and a new ANALYZED_NO_NORMS
is added. (Mike McCandless)
26. LUCENE-1131: Added numDeletedDocs method to IndexReader (Otis Gospodnetic)
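The surrogate handling described in LUCENE-510 can be illustrated with plain JDK calls. The sketch below is an assumption about the observable behavior only (unpaired surrogates become U+FFFD, valid pairs encode as one 4-byte sequence); the class and method names are hypothetical and this is not the index writer's encoder. For contrast, Java's modified UTF-8 encodes each half of a surrogate pair separately, as two 3-byte sequences.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration: sanitize UTF-16 text, then encode it as standard UTF-8. */
final class Utf8Sketch {
    /** Replaces every unpaired surrogate with U+FFFD and keeps valid pairs intact. */
    static String replaceUnpairedSurrogates(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean validPair = Character.isHighSurrogate(c)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1));
            if (validPair) {
                out.append(c).append(text.charAt(i + 1));
                i++; // skip the low surrogate that was just copied
            } else if (Character.isSurrogate(c)) {
                out.append('\uFFFD');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static byte[] toUtf8(String text) {
        return replaceUnpairedSurrogates(text).getBytes(StandardCharsets.UTF_8);
    }
}
```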
Bug fixes
1. LUCENE-1134: Fixed BooleanQuery.rewrite to only optimize a single
clause query if minNumShouldMatch<=0. (Shai Erera via Michael Busch)
2. LUCENE-1169: Fixed bug in IndexSearcher.search(): searching with
a filter might miss some hits because scorer.skipTo() is called
without checking if the scorer is already at the right position.
scorer.skipTo(scorer.doc()) is not a NOOP, it behaves as
scorer.next(). (Eks Dev, Michael Busch)
3. LUCENE-1182: Added scorePayload to SimilarityDelegator (Andi Vajda via Grant Ingersoll)
4. LUCENE-1213: MultiFieldQueryParser was ignoring slop in case
of a single field phrase. (Trejkaz via Doron Cohen)
5. LUCENE-1228: IndexWriter.commit() was not updating the index version and as
result IndexReader.reopen() failed to sense index changes. (Doron Cohen)
6. LUCENE-1267: Added numDocs() and maxDoc() to IndexWriter;
deprecated docCount(). (Mike McCandless)
7. LUCENE-1274: Added new prepareCommit() method to IndexWriter,
which does phase 1 of a 2-phase commit (commit() does phase 2).
This is needed when you want to update an index as part of a
transaction involving external resources (eg a database). Also
deprecated abort(), renaming it to rollback(). (Mike McCandless)
8. LUCENE-1003: Stop RussianAnalyzer from removing numbers.
(TUSUR OpenTeam, Dmitry Lihachev via Otis Gospodnetic)
9. LUCENE-1152: SpellChecker fix around clearIndex and indexDictionary
methods, plus removal of IndexReader reference.
(Naveen Belkale via Otis Gospodnetic)
10. LUCENE-1046: Removed dead code in SpellChecker
(Daniel Naber via Otis Gospodnetic)
11. LUCENE-1189: Fixed the QueryParser to handle escaped characters within
quoted terms correctly. (Tomer Gabel via Michael Busch)
12. LUCENE-1299: Fixed NPE in SpellChecker when IndexReader is not null and field is null (Grant Ingersoll)
13. LUCENE-1303: Fixed BoostingTermQuery's explanation to be marked as a Match
depending only upon the non-payload score part, regardless of the effect of
the payload on the score. Prior to this, score of a query containing a BTQ
differed from its explanation. (Doron Cohen)
14. LUCENE-1310: Fixed SloppyPhraseScorer to work also for terms repeating more
than twice in the query. (Doron Cohen)
15. LUCENE-1351: ISOLatin1AccentFilter now cleans additional ligatures (Cedrik Lime via Grant Ingersoll)
16. LUCENE-1383: Workaround a nasty "leak" in Java's builtin
ThreadLocal, to prevent Lucene from causing unexpected
OutOfMemoryError in certain situations (notably J2EE
applications). (Chris Lu via Mike McCandless)
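The two-phase commit that LUCENE-1274 enables can be sketched with a small coordinator. Everything below is hypothetical: TwoPhaseResource is not a Lucene interface, LoggingResource is a test double, and the sketch assumes that phase 2 does not fail and that rollback() is harmless on a resource that never prepared.

```java
import java.util.List;

/** Hypothetical participant in a two-phase commit, e.g. an index writer or a database. */
interface TwoPhaseResource {
    void prepareCommit() throws Exception; // phase 1: the fallible work
    void commit();                         // phase 2: assumed not to fail in this sketch
    void rollback();                       // assumed safe even if nothing was prepared
}

/** Test double that records every call in a shared log. */
final class LoggingResource implements TwoPhaseResource {
    private final String name;
    private final boolean failOnPrepare;
    private final List<String> log;

    LoggingResource(String name, boolean failOnPrepare, List<String> log) {
        this.name = name;
        this.failOnPrepare = failOnPrepare;
        this.log = log;
    }

    @Override
    public void prepareCommit() throws Exception {
        log.add(name + ".prepare");
        if (failOnPrepare) {
            throw new Exception(name + " could not prepare");
        }
    }

    @Override
    public void commit() {
        log.add(name + ".commit");
    }

    @Override
    public void rollback() {
        log.add(name + ".rollback");
    }
}

final class TwoPhaseCoordinatorSketch {
    /** Commits all resources only if every one of them prepared successfully. */
    static boolean commitAll(List<? extends TwoPhaseResource> resources) {
        try {
            for (TwoPhaseResource r : resources) {
                r.prepareCommit();
            }
        } catch (Exception e) {
            for (TwoPhaseResource r : resources) {
                r.rollback();
            }
            return false;
        }
        for (TwoPhaseResource r : resources) {
            r.commit();
        }
        return true;
    }
}
```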
New features
1. LUCENE-1137: Added Token.set/getFlags() accessors for passing more information about a Token through the analysis
process. The flag is not indexed/stored and is thus only used by analysis.
2. LUCENE-1147: Add -segment option to CheckIndex tool so you can
check only a specific segment or segments in your index. (Mike
McCandless)
3. LUCENE-1045: Reopened this issue to add support for short and bytes.
4. LUCENE-584: Added new data structures to o.a.l.util, such as
OpenBitSet and SortedVIntList. These extend DocIdSet and can
directly be used for Filters with the new Filter API. Also changed
the core Filters to use OpenBitSet instead of java.util.BitSet.
(Paul Elschot, Michael Busch)
5. LUCENE-494: Added QueryAutoStopWordAnalyzer to allow for the automatic removal of frequently occurring terms from a query.
This Analyzer is not intended for use during indexing. (Mark Harwood via Grant Ingersoll)
6. LUCENE-1044: Change Lucene to properly "sync" files after
committing, to ensure on a machine or OS crash or power cut, even
with cached writes, the index remains consistent. Also added
explicit commit() method to IndexWriter to force a commit without
having to close. (Mike McCandless)
7. LUCENE-997: Add search timeout (partial) support.
A TimeLimitedCollector was added to allow limiting search time.
It is a partial solution since timeout is checked only when
collecting a hit, and therefore a search for rare words in a
huge index might not stop within the specified time.
(Sean Timm via Doron Cohen)
8. LUCENE-1184: Allow SnapshotDeletionPolicy to be re-used across
close/re-open of IndexWriter while still protecting an open
snapshot (Tim Brennan via Mike McCandless)
9. LUCENE-1194: Added IndexWriter.deleteDocuments(Query) to delete
documents matching the specified query. Also added static unlock
and isLocked methods (deprecating the ones in IndexReader). (Mike
McCandless)
10. LUCENE-1201: Add IndexReader.getIndexCommit() method. (Tim Brennan
via Mike McCandless)
11. LUCENE-550: Added InstantiatedIndex implementation. Experimental
Index store similar to MemoryIndex but allows for multiple documents
in memory. (Karl Wettin via Grant Ingersoll)
12. LUCENE-400: Added word based n-gram filter (in contrib/analyzers) called ShingleFilter and an Analyzer wrapper
that wraps another Analyzer's token stream with a ShingleFilter (Sebastian Kirsch, Steve Rowe via Grant Ingersoll)
13. LUCENE-1166: Decomposition tokenfilter for languages like German and Swedish (Thomas Peuss via Grant Ingersoll)
14. LUCENE-1187: ChainedFilter and BooleanFilter now work with new Filter API
and DocIdSetIterator-based filters. Backwards-compatibility with old
BitSet-based filters is ensured. (Paul Elschot via Michael Busch)
15. LUCENE-1295: Added new method to MoreLikeThis for retrieving interesting terms and made retrieveTerms(int) public. (Grant Ingersoll)
16. LUCENE-1298: MoreLikeThis can now accept a custom Similarity (Grant Ingersoll)
17. LUCENE-1297: Allow other string distance measures for the SpellChecker
(Thomas Morton via Otis Gospodnetic)
18. LUCENE-1001: Provide access to Payloads via Spans. All existing Span Query implementations in Lucene implement this. (Mark Miller, Grant Ingersoll)
19. LUCENE-1354: Provide programmatic access to CheckIndex (Grant Ingersoll, Mike McCandless)
20. LUCENE-1279: Add support for Collators to RangeFilter/Query and Query Parser. (Steve Rowe via Grant Ingersoll)
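The limitation noted for LUCENE-997 (the timeout is checked only when a hit is collected) follows directly from where the check sits, as this sketch shows. The class is a hypothetical stand-in, not TimeLimitedCollector; the clock is injected as a LongSupplier purely so the behavior can be exercised deterministically.

```java
import java.util.function.LongSupplier;

/** Hypothetical hit counter that gives up once the allowed time has passed. */
final class TimeLimitedCounterSketch {
    /** Thrown from collect() when the deadline has already passed. */
    static final class TimeExceeded extends RuntimeException {
        TimeExceeded(int collectedSoFar) {
            super("time limit exceeded after " + collectedSoFar + " hits");
        }
    }

    private final LongSupplier clockMillis;
    private final long deadlineMillis;
    private int collected;

    TimeLimitedCounterSketch(LongSupplier clockMillis, long timeAllowedMillis) {
        this.clockMillis = clockMillis;
        this.deadlineMillis = clockMillis.getAsLong() + timeAllowedMillis;
    }

    /** The clock is consulted only here, so a search that finds no hits never times out. */
    void collect(int doc) {
        if (clockMillis.getAsLong() > deadlineMillis) {
            throw new TimeExceeded(collected);
        }
        collected++;
    }

    int collected() {
        return collected;
    }
}
```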
Optimizations
1. LUCENE-705: When building a compound file, use
RandomAccessFile.setLength() to tell the OS/filesystem to
pre-allocate space for the file. This may improve fragmentation
in how the CFS file is stored, and allows us to detect an upcoming
disk full situation before actually filling up the disk. (Mike
McCandless)
2. LUCENE-1120: Speed up merging of term vectors by bulk-copying the
raw bytes for each contiguous range of non-deleted documents.
(Mike McCandless)
3. LUCENE-1185: Avoid checking if the TermBuffer 'scratch' in
SegmentTermEnum is null for every call of scanTo().
(Christian Kohlschuetter via Michael Busch)
4. LUCENE-1217: Internal to Field.java, use isBinary instead of
runtime type checking for possible speedup of binaryValue().
(Eks Dev via Mike McCandless)
5. LUCENE-1183: Optimized TRStringDistance class (in contrib/spell) that uses
less memory than the previous version. (Cédrik LIME via Otis Gospodnetic)
6. LUCENE-1195: Improve term lookup performance by adding a LRU cache to the
TermInfosReader. In performance experiments the speedup was about 25% on
average on mid-size indexes with ~500,000 documents for queries with 3
terms and about 7% on larger indexes with ~4.3M documents. (Michael Busch)
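The LRU caching idea behind LUCENE-1195 can be sketched with the JDK's LinkedHashMap in access order. This is a generic illustration with a hypothetical class name, not the cache used by TermInfosReader, and it is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache that evicts the least recently used entry. */
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true drops the least recently used entry.
        return size() > maxEntries;
    }
}
```

A lookup through get() counts as a use, so frequently requested keys survive while rarely requested ones are evicted first.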
Documentation
1. LUCENE-1236: Added some clarifying remarks to EdgeNGram*.java (Hiroaki Kawai via Grant Ingersoll)
2. LUCENE-1157 and LUCENE-1256: HTML changes log, created automatically
from CHANGES.txt. This HTML file is currently visible only via developers page.
(Steven Rowe via Doron Cohen)
3. LUCENE-1349: Fieldable can now be changed without breaking backward compatibility rules (within reason; see the note at
the top of this file and also in Fieldable.java). (Grant Ingersoll)
4. LUCENE-1873: Update documentation to reflect current Contrib area status.
(Steven Rowe, Mark Miller)
Build
1. LUCENE-1153: Added JUnit JAR to new lib directory. Updated build to rely on local JUnit instead of ANT/lib.
2. LUCENE-1202: Small fixes to the way Clover is used to work better
with contribs. Of particular note: a single clover db is used
regardless of whether tests are run globally or in the specific
contrib directories.
3. LUCENE-1353: Javacc target in contrib/miscellaneous for
generating the precedence query parser.
Test Cases
1. LUCENE-1238: Fixed intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded.
Within this fix, "greedy" flag was added to TimeLimitedCollector, to allow the wrapped
collector to collect also the last doc, after the allowed time passed. (Doron Cohen)
2. LUCENE-1348: relax TestTimeLimitedCollector to not fail due to
timeout exceeded (just because test machine is very busy).
======================= Release 2.3.2 =======================
Bug fixes
1. LUCENE-1191: On hitting OutOfMemoryError in any index-modifying
methods in IndexWriter, do not commit any further changes to the
index to prevent risk of possible corruption. (Mike McCandless)
2. LUCENE-1197: Fixed issue whereby IndexWriter would flush by RAM
too early when TermVectors were in use. (Mike McCandless)
3. LUCENE-1198: Don't corrupt index if an exception happens inside
DocumentsWriter.init (Mike McCandless)
4. LUCENE-1199: Added defensive check for null indexReader before
calling close in IndexModifier.close() (Mike McCandless)
5. LUCENE-1200: Fix rare deadlock case in addIndexes* when
ConcurrentMergeScheduler is in use (Mike McCandless)
6. LUCENE-1208: Fix deadlock case on hitting an exception while
processing a document that had triggered a flush (Mike McCandless)
7. LUCENE-1210: Fix deadlock case on hitting an exception while
starting a merge when using ConcurrentMergeScheduler (Mike McCandless)
8. LUCENE-1222: Fix IndexWriter.doAfterFlush to always be called on
flush (Mark Ferguson via Mike McCandless)
9. LUCENE-1226: Fixed IndexWriter.addIndexes(IndexReader[]) to commit
successfully created compound files. (Michael Busch)
10. LUCENE-1150: Re-expose StandardTokenizer's constants publicly;
this was accidentally lost with LUCENE-966. (Nicolas Lalevée via
Mike McCandless)
11. LUCENE-1262: Fixed bug in BufferedIndexInput.refill whereby on
hitting an exception in readInternal, the buffer is incorrectly
filled with stale bytes such that subsequent calls to readByte()
return incorrect results. (Trejkaz via Mike McCandless)
12. LUCENE-1270: Fixed intermittent case where IndexWriter.close()
would hang after IndexWriter.addIndexesNoOptimize had been
called. (Stu Hood via Mike McCandless)
Build
1. LUCENE-1230: Include *pom.xml* in source release files. (Michael Busch)
======================= Release 2.3.1 =======================
Bug fixes
1. LUCENE-1168: Fixed corruption cases when autoCommit=false and
documents have mixed term vectors (Suresh Guvvala via Mike
McCandless).
2. LUCENE-1171: Fixed some cases where OOM errors could cause
deadlock in IndexWriter (Mike McCandless).
3. LUCENE-1173: Fixed corruption case when autoCommit=false and bulk
merging of stored fields is used (Yonik via Mike McCandless).
4. LUCENE-1163: Fixed bug in CharArraySet.contains(char[] buffer, int
offset, int len) that was ignoring offset and thus giving the
wrong answer. (Thomas Peuss via Mike McCandless)
5. LUCENE-1177: Fix rare case where IndexWriter.optimize might do too
many merges at the end. (Mike McCandless)
6. LUCENE-1176: Fix corruption case when documents with no term
vector fields are added before documents with term vector fields.
(Mike McCandless)
7. LUCENE-1179: Fixed assert statement that was incorrectly
preventing Fields with empty-string field name from working.
(Sergey Kabashnyuk via Mike McCandless)
======================= Release 2.3.0 =======================
Changes in runtime behavior
1. LUCENE-994: Defaults for IndexWriter have been changed to maximize
out-of-the-box indexing speed. First, IndexWriter now flushes by
RAM usage (16 MB by default) instead of a fixed doc count (call
IndexWriter.setMaxBufferedDocs to get backwards compatible
behavior). Second, ConcurrentMergeScheduler is used to run merges
using background threads (call IndexWriter.setMergeScheduler(new
SerialMergeScheduler()) to get backwards compatible behavior).
Third, merges are chosen based on size in bytes of each segment
rather than document count of each segment (call
IndexWriter.setMergePolicy(new LogDocMergePolicy()) to get
backwards compatible behavior).
NOTE: users of ParallelReader must change back all of these
defaults in order to ensure the docIDs "align" across all parallel
indices.
(Mike McCandless)
2. LUCENE-1045: SortField.AUTO didn't work with long. When detecting
the field type for sorting automatically, numbers used to be
interpreted as int, then as float, if parsing the number as an int
failed. Now the detection checks for int, then for long,
then for float. (Daniel Naber)
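The detection order described in LUCENE-1045 can be sketched with the JDK's parse methods. The class, the enum and the tryLong switch are hypothetical, and falling back to STRING when nothing parses is an assumption of the sketch; it only shows why a value too large for an int used to be classified as a float.

```java
/** Hypothetical illustration of automatic sort-type detection. */
final class AutoSortTypeSketch {
    enum Kind { INT, LONG, FLOAT, STRING }

    /** With tryLong=false this mimics the old order (int, float); with true, the new one. */
    static Kind detect(String term, boolean tryLong) {
        if (isInt(term)) {
            return Kind.INT;
        }
        if (tryLong && isLong(term)) {
            return Kind.LONG;
        }
        if (isFloat(term)) {
            return Kind.FLOAT;
        }
        return Kind.STRING; // assumption: anything non-numeric sorts as a string
    }

    private static boolean isInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    private static boolean isLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    private static boolean isFloat(String s) {
        try {
            Float.parseFloat(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```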
API Changes
1. LUCENE-843: Added IndexWriter.setRAMBufferSizeMB(...) to have
IndexWriter flush whenever the buffered documents are using more
than the specified amount of RAM. Also added new APIs to Token
that allow one to set a char[] plus offset and length to specify a
token (to avoid creating a new String() for each Token). (Mike
McCandless)
2. LUCENE-963: Add setters to Field to allow for re-using a single
Field instance during indexing. This is a sizable performance
gain, especially for small documents. (Mike McCandless)
3. LUCENE-969: Add new APIs to Token, TokenStream and Analyzer to
permit re-using of Token and TokenStream instances during
indexing. Changed Token to use a char[] as the store for the
termText instead of String. This gives faster tokenization
performance (~10-15%). (Mike McCandless)
4. LUCENE-847: Factored MergePolicy, which determines which merges
should take place and when, as well as MergeScheduler, which
determines when the selected merges should actually run, out of
IndexWriter. The default merge policy is now
LogByteSizeMergePolicy (see LUCENE-845) and the default merge
scheduler is now ConcurrentMergeScheduler (see
LUCENE-870). (Steven Parkes via Mike McCandless)
5. LUCENE-1052: Add IndexReader.setTermInfosIndexDivisor(int) method
that allows you to reduce memory usage of the termInfos by further
sub-sampling (over the termIndexInterval that was used during
indexing) which terms are loaded into memory. (Chuck Williams,
Doug Cutting via Mike McCandless)
6. LUCENE-743: Add IndexReader.reopen() method that re-opens an
existing IndexReader (see New features -> 8.) (Michael Busch)
7. LUCENE-1062: Add setData(byte[] data),
setData(byte[] data, int offset, int length), getData(), getOffset()
and clone() methods to o.a.l.index.Payload. Also add the field name
as arg to Similarity.scorePayload(). (Michael Busch)
8. LUCENE-982: Add IndexWriter.optimize(int maxNumSegments) method to
"partially optimize" an index down to maxNumSegments segments.
(Mike McCandless)
9. LUCENE-1080: Changed Token.DEFAULT_TYPE to be public.
10. LUCENE-1064: Changed TopDocs constructor to be public.
(Shai Erera via Michael Busch)
11. LUCENE-1079: DocValues cleanup: constructor now has no params,
and getInnerArray() now throws UnsupportedOperationException (Doron Cohen)
12. LUCENE-1089: Added PriorityQueue.insertWithOverflow, which returns
the Object (if any) that was bumped from the queue to allow
re-use. (Shai Erera via Mike McCandless)
13. LUCENE-1101: Token reuse 'contract' (defined LUCENE-969)
modified so it is token producer's responsibility
to call Token.clear(). (Doron Cohen)
14. LUCENE-1118: Changed StandardAnalyzer to skip too-long (default >
255 characters) tokens. You can increase this limit by calling
StandardAnalyzer.setMaxTokenLength(...). (Michael McCandless)
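The idea behind insertWithOverflow (item 12 above) can be illustrated
with a small bounded queue. The sketch below is built on
java.util.PriorityQueue and is not Lucene's PriorityQueue; the class
name BoundedQueueSketch and its exact tie-breaking behavior are
assumptions for illustration only.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical bounded queue illustrating the insertWithOverflow idea;
// built on java.util.PriorityQueue, not on Lucene's own class.
class BoundedQueueSketch<T> {
    private final int maxSize;
    private final Comparator<T> order;
    private final PriorityQueue<T> heap; // least element sits at the head

    BoundedQueueSketch(int maxSize, Comparator<T> order) {
        this.maxSize = maxSize;
        this.order = order;
        this.heap = new PriorityQueue<>(Math.max(1, maxSize), order);
    }

    // Returns null if the element was added while there was still room.
    // Otherwise returns the object that did not make it: either the
    // evicted least element or the rejected argument itself. The caller
    // may re-use the returned object instead of allocating a new one.
    T insertWithOverflow(T element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        }
        T least = heap.peek();
        if (least != null && order.compare(element, least) > 0) {
            heap.poll();
            heap.add(element);
            return least;
        }
        return element;
    }

    int size() {
        return heap.size();
    }
}
```

Returning the bumped object lets a collector keep one spare instance
and recycle it on each insert rather than allocating per hit.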
Bug fixes
1. LUCENE-933: QueryParser fixed to not produce empty sub
BooleanQueries "()" even if the Analyzer produced no
tokens for input. (Doron Cohen)
2. LUCENE-955: Fixed SegmentTermPositions to work correctly with the
first term in the dictionary. (Michael Busch)
3. LUCENE-951: Fixed NullPointerException in MultiLevelSkipListReader
that was thrown after a call of TermPositions.seek().
(Rich Johnson via Michael Busch)
4. LUCENE-938: Fixed cases where an unhandled exception in
IndexWriter's methods could cause deletes to be lost.
(Steven Parkes via Mike McCandless)
5. LUCENE-962: Fixed case where an unhandled exception in
IndexWriter.addDocument or IndexWriter.updateDocument could cause
unreferenced files in the index to not be deleted
(Steven Parkes via Mike McCandless)
6. LUCENE-957: RAMDirectory fixed to properly handle directories
larger than Integer.MAX_VALUE. (Doron Cohen)
7. LUCENE-781: MultiReader fixed to not throw NPE if isCurrent(),
isOptimized() or getVersion() is called. Separated MultiReader
into two classes: MultiSegmentReader extends IndexReader, is
package-protected and is created automatically by IndexReader.open()
in case the index has multiple segments. The public MultiReader
now extends MultiSegmentReader and is intended to be used by users
who want to add their own subreaders. (Daniel Naber, Michael Busch)
8. LUCENE-970: FilterIndexReader now implements isOptimized(). Previously,
a call to isOptimized() would throw an NPE. (Michael Busch)
9. LUCENE-832: ParallelReader fixed to not throw NPE if isCurrent(),
isOptimized() or getVersion() is called. (Michael Busch)
10. LUCENE-948: Fix FNFE exception caused by stale NFS client
directory listing caches when writers on different machines are
sharing an index over NFS and using a custom deletion policy (Mike
McCandless)
11. LUCENE-978: Ensure TermInfosReader and FieldsReader
close any streams they had opened if an exception is hit in the
constructor. (Ning Li via Mike McCandless)
12. LUCENE-985: If an extremely long term is in a doc (> 16383 chars),
we now throw an IllegalArgumentException saying the term is too
long, instead of cryptic ArrayIndexOutOfBoundsException. (Karl
Wettin via Mike McCandless)
13. LUCENE-991: The explain() method of BoostingTermQuery had errors
when no payloads were present on a document. (Peter Keegan via
Grant Ingersoll)
14. LUCENE-992: Fixed IndexWriter.updateDocument to be atomic again
(this was broken by LUCENE-843). (Ning Li via Mike McCandless)
15. LUCENE-1008: Fixed corruption case when document with no term
vector fields is added after documents with term vector fields.
This bug was introduced with LUCENE-843. (Grant Ingersoll via
Mike McCandless)
16. LUCENE-1006: Fixed QueryParser to accept a "" field value (zero
length quoted string.) (yonik)
17. LUCENE-1010: Fixed corruption case when document with no term
vector fields is added after documents with term vector fields.
This case is hit during merge and would cause an EOFException.
This bug was introduced with LUCENE-984. (Andi Vajda via Mike
McCandless)
19. LUCENE-1009: Fix merge slowdown with LogByteSizeMergePolicy when
autoCommit=false and documents are using stored fields and/or term
vectors. (Mark Miller via Mike McCandless)
20. LUCENE-1011: Fixed corruption case when two or more machines,
sharing an index over NFS, can be writers in quick succession.
(Patrick Kimber via Mike McCandless)
21. LUCENE-1028: Fixed Weight serialization for a few queries:
DisjunctionMaxQuery, ValueSourceQuery, CustomScoreQuery.
Serialization check added for all queries.
(Kyle Maxwell via Doron Cohen)
22. LUCENE-1048: Fixed incorrect behavior in Lock.obtain(...) when the
timeout argument is very large (eg Long.MAX_VALUE). Also added
Lock.LOCK_OBTAIN_WAIT_FOREVER constant to never timeout. (Nikolay
Diakov via Mike McCandless)
23. LUCENE-1050: Throw LockReleaseFailedException in
Simple/NativeFSLockFactory if we fail to delete the lock file when
releasing the lock. (Nikolay Diakov via Mike McCandless)
24. LUCENE-1071: Fixed SegmentMerger to correctly set payload bit in
the merged segment. (Michael Busch)
25. LUCENE-1042: Remove throwing of IOException in
getTermFreqVector(int, String, TermVectorMapper) to be consistent
with other getTermFreqVector calls. Also removed the throwing of the
other IOException in that method to be consistent.
(Karl Wettin via Grant Ingersoll)
26. LUCENE-1096: Fixed Hits behavior when hits' docs are deleted
while iterating the hits. Deleting docs already retrieved
now works seamlessly. If docs not yet retrieved are deleted
(e.g. from another thread), and then, relying on the initial
Hits.length(), an application attempts to retrieve more hits
than actually exist, a ConcurrentModificationException
is thrown. (Doron Cohen)
27. LUCENE-1068: Changed StandardTokenizer to fix an issue with it marking
the type of some tokens incorrectly. This is done by adding a new flag named
replaceInvalidAcronym which defaults to false, the current, incorrect behavior. Setting
this flag to true fixes the problem. This flag is a temporary fix and is already
marked as being deprecated. 3.x will implement the correct approach. (Shai Erera via Grant Ingersoll)
LUCENE-1140: Fixed NPE caused by LUCENE-1068 (Alexei Dets via Grant Ingersoll)
28. LUCENE-749: ChainedFilter behavior fixed when logic of
first filter is ANDNOT. (Antonio Bruno via Doron Cohen)
29. LUCENE-508: Make sure SegmentTermEnum.prev() is accurate (= last
term) after next() returns false. (Steven Tamm via Mike
McCandless)
New features
1. LUCENE-906: Elision filter for French.
(Mathieu Lecarme via Otis Gospodnetic)
2. LUCENE-960: Added a SpanQueryFilter and related classes to allow for
not only filtering, but knowing where in a Document a Filter matches
(Grant Ingersoll)
3. LUCENE-868: Added new Term Vector access features. New callback
mechanism allows application to define how and where to read Term
Vectors from disk. This implementation contains several extensions
of the new abstract TermVectorMapper class. The new API should be
back-compatible. No changes in the actual storage of Term Vectors
have taken place.
3.1 LUCENE-1038: Added setDocumentNumber() method to TermVectorMapper
to provide information about what document is being accessed.
(Karl Wettin via Grant Ingersoll)
4. LUCENE-975: Added PositionBasedTermVectorMapper that allows for
position based lookup of term vector information.
See item #3 above (LUCENE-868).
5. LUCENE-1011: Added simple tools (all in org.apache.lucene.store)
to verify that locking is working properly. LockVerifyServer runs
a separate server to verify locks. LockStressTest runs a simple
tool that rapidly obtains and releases locks.
VerifyingLockFactory is a LockFactory that wraps any other
LockFactory and consults the LockVerifyServer whenever a lock is
obtained or released, throwing an exception if an illegal lock
obtain occurred. (Patrick Kimber via Mike McCandless)
6. LUCENE-1015: Added FieldCache extension (ExtendedFieldCache) to
support doubles and longs. Added support into SortField for sorting
on doubles and longs as well. (Grant Ingersoll)
7. LUCENE-1020: Created basic index checking & repair tool
(o.a.l.index.CheckIndex). When run without -fix it does a
detailed test of all segments in the index and reports summary
information and any errors it hit. With -fix it will remove
segments that had errors. (Mike McCandless)
8. LUCENE-743: Add IndexReader.reopen() method that re-opens an
existing IndexReader by only loading those portions of an index
that have changed since the reader was (re)opened. reopen() can
be significantly faster than open(), depending on the amount of
index changes. SegmentReader, MultiSegmentReader, MultiReader,
and ParallelReader implement reopen(). (Michael Busch)
9. LUCENE-1040: Added CharArraySet, useful for efficiently checking
set membership of text specified by char[]. (yonik)
10. LUCENE-1073: Created SnapshotDeletionPolicy to facilitate taking a
live backup of an index without pausing indexing. (Mike
McCandless)
11. LUCENE-1019: CustomScoreQuery enhanced to support multiple
ValueSource queries. (Kyle Maxwell via Doron Cohen)
12. LUCENE-1095: Added an option to StopFilter to increase
positionIncrement of the token succeeding a stopped token.
Disabled by default. Similar option added to QueryParser
to consider token positions when creating PhraseQuery
and MultiPhraseQuery. Disabled by default (so by default
the query parser ignores position increments).
(Doron Cohen)
13. LUCENE-1380: Added TokenFilter for setting position increment in special cases related to the ShingleFilter (Mck SembWever, Steve Rowe, Karl Wettin via Grant Ingersoll)
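The StopFilter option in item 12 above can be illustrated with a small
standalone sketch. It does not use Lucene classes; the class name
StopPositionSketch, the filter method, and the "term:increment" output
format are hypothetical and chosen only to show how a dropped stop word
can widen the position increment of the next surviving token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: when a stop word is dropped, the next
// surviving token can carry a larger position increment so the gap
// stays visible. Names here are illustrative, not Lucene's API.
class StopPositionSketch {
    // Returns "term:increment" strings for the tokens that survive.
    static List<String> filter(List<String> tokens, Set<String> stopWords,
                               boolean enablePositionIncrements) {
        List<String> out = new ArrayList<>();
        int skipped = 0; // stop words dropped since the last kept token
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                skipped++;
                continue;
            }
            int increment = enablePositionIncrements ? 1 + skipped : 1;
            out.add(token + ":" + increment);
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "a", "fox");
        Set<String> stops = new HashSet<>(Arrays.asList("the", "a"));
        System.out.println(filter(tokens, stops, false));
        System.out.println(filter(tokens, stops, true));
    }
}
```

With the option off, every surviving token has an increment of 1, so a
phrase query cannot tell that a word was removed between "quick" and
"fox"; with it on, the gap is preserved in the increment.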
Optimizations
1. LUCENE-937: CachingTokenFilter now uses an iterator to access the
Tokens that are cached in the LinkedList. This increases performance
significantly, especially when the number of Tokens is large.
(Mark Miller via Michael Busch)
2. LUCENE-843: Substantial optimizations to improve how IndexWriter
uses RAM for buffering documents and to speed up indexing (2X-8X
faster). A single shared hash table now records the in-memory
postings per unique term and is directly flushed into a single
segment. (Mike McCandless)
3. LUCENE-892: Fixed extra "buffer to buffer copy" that sometimes
takes place when using compound files. (Mike McCandless)
4. LUCENE-959: Remove synchronization in Document (yonik)
5. LUCENE-963: Add setters to Field to allow for re-using a single
Field instance during indexing. This is a sizable performance
gain, especially for small documents. (Mike McCandless)
6. LUCENE-939: Check explicitly for boundary conditions in FieldInfos
and don't rely on exceptions. (Michael Busch)
7. LUCENE-966: Very substantial speedups (~6X faster) for
StandardTokenizer (StandardAnalyzer) by using JFlex instead of
JavaCC to generate the tokenizer.
(Stanislaw Osinski via Mike McCandless)
8. LUCENE-969: Changed core tokenizers & filters to re-use Token and
TokenStream instances when possible to improve tokenization
performance (~10-15%). (Mike McCandless)
9. LUCENE-871: Speedup ISOLatin1AccentFilter (Ian Boston via Mike
McCandless)
10. LUCENE-986: Refactored SegmentInfos from IndexReader into the new
subclass DirectoryIndexReader. SegmentReader and MultiSegmentReader
now extend DirectoryIndexReader and are the only IndexReader
implementations that use SegmentInfos to access an index and
acquire a write lock for index modifications. (Michael Busch)
11. LUCENE-1007: Allow flushing in IndexWriter to be triggered by
either RAM usage or document count or both (whichever comes
first), by adding symbolic constant DISABLE_AUTO_FLUSH to disable
one of the flush triggers. (Ning Li via Mike McCandless)
12. LUCENE-1043: Speed up merging of stored fields by bulk-copying the
raw bytes for each contiguous range of non-deleted documents.
(Robert Engels via Mike McCandless)
13. LUCENE-693: Speed up nested conjunctions (~2x) that match many
documents, and a slight performance increase for top level
conjunctions. (yonik)
14. LUCENE-1098: Make inner class StandardAnalyzer.SavedStreams static
and final. (Nathan Beyer via Michael Busch)
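The "whichever comes first" flush rule from item 11 above can be
sketched as a small decision helper. Everything in this example is an
assumption for illustration: the class FlushTriggerSketch, the value
chosen for DISABLE_AUTO_FLUSH, and the rule that both triggers cannot
be disabled at once are not taken from Lucene's source.

```java
// Hypothetical sketch of the two flush triggers (RAM usage and document
// count). The sentinel value and all names are assumptions.
class FlushTriggerSketch {
    static final int DISABLE_AUTO_FLUSH = -1; // assumed sentinel value

    private final double ramBufferSizeMB;
    private final int maxBufferedDocs;

    FlushTriggerSketch(double ramBufferSizeMB, int maxBufferedDocs) {
        // assumption: at most one of the two triggers may be disabled
        if (ramBufferSizeMB == DISABLE_AUTO_FLUSH
                && maxBufferedDocs == DISABLE_AUTO_FLUSH) {
            throw new IllegalArgumentException(
                "at least one flush trigger must stay enabled");
        }
        this.ramBufferSizeMB = ramBufferSizeMB;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    // Flush as soon as either enabled trigger reaches its limit.
    boolean shouldFlush(double ramUsedMB, int bufferedDocs) {
        boolean byRam = ramBufferSizeMB != DISABLE_AUTO_FLUSH
                && ramUsedMB >= ramBufferSizeMB;
        boolean byDocs = maxBufferedDocs != DISABLE_AUTO_FLUSH
                && bufferedDocs >= maxBufferedDocs;
        return byRam || byDocs;
    }
}
```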
Documentation
1. LUCENE-1051: Generate separate javadocs for core, demo and contrib
classes, as well as a unified view. Also add an appropriate menu
structure to the website. (Michael Busch)
2. LUCENE-746: Fix error message in AnalyzingQueryParser.getPrefixQuery.
(Ronnie Kolehmainen via Michael Busch)
Build
1. LUCENE-908: Improvements and simplifications for how the MANIFEST
file and the META-INF dir are created. (Michael Busch)
2. LUCENE-935: Various improvements for the maven artifacts. Now the
artifacts also include the sources as .jar files. (Michael Busch)
3. Added apply-patch target to top-level build. Defaults to looking for
a patch in ${basedir}/../patches with name specified by -Dpatch.name.
Can also specify any location by -Dpatch.file property on the command
line. This should be helpful for easy application of patches, but it
is also a step towards integrating automatic patch application with
JIRA and Hudson, and is thus subject to change. (Grant Ingersoll)
4. LUCENE-935: Defined property "m2.repository.url" to allow setting
the url to a maven remote repository to deploy to. (Michael Busch)
5. LUCENE-1051: Include javadocs in the maven artifacts. (Michael Busch)
6. LUCENE-1055: Remove gdata-server from build files and its sources
from trunk. (Michael Busch)
7. LUCENE-935: Allow deploying maven artifacts to a remote m2 repository
via scp and ssh authentication. (Michael Busch)
8. LUCENE-1123: Allow overriding the specification version for
MANIFEST.MF (Michael Busch)
Test Cases
1. LUCENE-766: Test adding two fields with the same name but different
term vector setting. (Nicolas Lalevée via Doron Cohen)
======================= Release 2.2.0 =======================
Changes in runtime behavior
API Changes
1. LUCENE-793: created new exceptions and added them to throws clause
for many methods (all subclasses of IOException for backwards
compatibility): index.StaleReaderException,
index.CorruptIndexException, store.LockObtainFailedException.
This was done to better call out the possible root causes of an
IOException from these methods. (Mike McCandless)
2. LUCENE-811: make SegmentInfos class, plus a few methods from related
classes, package-private again (they were unnecessarily made public
as part of LUCENE-701). (Mike McCandless)
3. LUCENE-710: added optional autoCommit boolean to IndexWriter
constructors. When this is false, index changes are not committed
until the writer is closed. This gives explicit control over when
a reader will see the changes. Also added optional custom
deletion policy to explicitly control when prior commits are
removed from the index. This is intended to allow applications to
share an index over NFS by customizing when prior commits are
deleted. (Mike McCandless)
4. LUCENE-818: changed most public methods of IndexWriter,
IndexReader (and its subclasses), FieldsReader and RAMDirectory to
throw AlreadyClosedException if they are accessed after being
closed. (Mike McCandless)
5. LUCENE-834: Changed some access levels for certain Span classes to allow them
to be overridden. They have been marked expert only and not for public
consumption. (Grant Ingersoll)
6. LUCENE-796: Removed calls to super.* from various get*Query methods in
MultiFieldQueryParser, in order to allow sub-classes to override them.
(Steven Parkes via Otis Gospodnetic)
7. LUCENE-857: Removed caching from QueryFilter and deprecated QueryFilter
in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter
combination when caching is desired.
(Chris Hostetter, Otis Gospodnetic)
8. LUCENE-869: Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory
to enable extensibility of these classes. (Michael Busch)
9. LUCENE-580: Added the public method reset() to TokenStream. This method does
nothing by default, but may be overridden by subclasses to support consuming
the TokenStream more than once. (Michael Busch)
10. LUCENE-580: Added a new constructor to Field that takes a TokenStream as
argument, available as tokenStreamValue(). This is useful to avoid the need of
"dummy analyzers" for pre-analyzed fields. (Karl Wettin, Michael Busch)
11. LUCENE-730: Added the new methods to BooleanQuery setAllowDocsOutOfOrder() and
getAllowDocsOutOfOrder(). Deprecated the methods setUseScorer14() and
getUseScorer14(). The optimization patch LUCENE-730 (see Optimizations->3.)
improves performance for certain queries but results in scoring out of docid
order. This patch reverses this change, so now by default hit docs are scored
in docid order unless setAllowDocsOutOfOrder(true) is explicitly called.
This patch also enables the tests in QueryUtils again that check for docid
order. (Paul Elschot, Doron Cohen, Michael Busch)
12. LUCENE-888: Added Directory.openInput(File path, int bufferSize)
to optionally specify the size of the read buffer. Also added
BufferedIndexInput.setBufferSize(int) to change the buffer size.
(Mike McCandless)
13. LUCENE-923: Make SegmentTermPositionVector package-private. It does not need
to be public because it implements the public interface TermPositionVector.
(Michael Busch)
Bug fixes
1. LUCENE-804: Fixed build.xml to pack a fully compilable src dist. (Doron Cohen)
2. LUCENE-813: Leading wildcard fixed to work with trailing wildcard.
Query parser modified to create a prefix query only for the case
that there is a single trailing wildcard (and no additional wildcard
or '?' in the query text). (Doron Cohen)
3. LUCENE-812: Add no-argument constructors to NativeFSLockFactory
and SimpleFSLockFactory. This enables all 4 builtin LockFactory
implementations to be specified via the System property
org.apache.lucene.store.FSDirectoryLockFactoryClass. (Mike McCandless)
4. LUCENE-821: The new single-norm-file introduced by LUCENE-756
failed to reduce the number of open descriptors since it was still
opened once per field with norms. (yonik)
5. LUCENE-823: Make sure internal file handles are closed when
hitting an exception (eg disk full) while flushing deletes in
IndexWriter's mergeSegments, and also during
IndexWriter.addIndexes. (Mike McCandless)
6. LUCENE-825: If directory is removed after
FSDirectory.getDirectory() but before IndexReader.open you now get
a FileNotFoundException like Lucene pre-2.1 (before this fix you
got an NPE). (Mike McCandless)
7. LUCENE-800: Removed backslash from the TERM_CHAR list in the queryparser,
because the backslash is the escape character. Also changed the ESCAPED_CHAR
list to contain all possible characters, because every character that
follows a backslash should be considered as escaped. (Michael Busch)
8. LUCENE-372: QueryParser.parse() now ensures that the entire input string
is consumed. Now a ParseException is thrown if a query contains too many
closing parentheses. (Andreas Neumann via Michael Busch)
9. LUCENE-814: javacc build targets now fix line-end-style of generated files.
Now also deleting all javacc generated files before calling javacc.
(Steven Parkes, Doron Cohen)
10. LUCENE-829: close readers in contrib/benchmark. (Karl Wettin, Doron Cohen)
11. LUCENE-828: Minor fix for Term's equals().
(Paul Cowan via Otis Gospodnetic)
12. LUCENE-846: Fixed: if IndexWriter is opened with autoCommit=false,
and you call addIndexes, and hit an exception (eg disk full) then
when IndexWriter rolls back its internal state this could corrupt
the instance of IndexWriter (but, not the index itself) by
referencing already deleted segments. This bug was only present
in 2.2 (trunk), ie was never released. (Mike McCandless)
13. LUCENE-736: Sloppy phrase query with repeating terms matches wrong docs.
For example query "B C B"~2 matches the doc "A B C D E". (Doron Cohen)
14. LUCENE-789: Fixed: custom similarity is ignored when using MultiSearcher (problem reported
by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is being used.
Note that as before this fix, creating a multiSearcher from Searchers for whom custom similarity
was set has no effect - it is masked by the similarity of the MultiSearcher. This is as
designed, because MultiSearcher operates on Searchables (not Searchers). (Doron Cohen)
15. LUCENE-880: Fixed DocumentWriter to close the TokenStreams after it
has written the postings. Then the resources associated with the
TokenStreams can safely be released. (Michael Busch)
16. LUCENE-883: consecutive calls to Spellchecker.indexDictionary()
won't insert terms twice anymore. (Daniel Naber)
17. LUCENE-881: QueryParser.escape() now also escapes the characters
'|' and '&' which are part of the queryparser syntax. (Michael Busch)
18. LUCENE-886: Spellchecker clean up: exceptions are no longer printed
to STDERR and ignored; they are re-thrown. Some javadoc improvements.
(Daniel Naber)
19. LUCENE-698: FilteredQuery now takes the query boost into account for
scoring. (Michael Busch)
20. LUCENE-763: Spellchecker: LuceneDictionary used to skip first word in
enumeration. (Christian Mallwitz via Daniel Naber)
21. LUCENE-903: FilteredQuery explanation inaccuracy with boost.
Explanation tests now "deep" check the explanation details.
(Chris Hostetter, Doron Cohen)
22. LUCENE-912: DisjunctionMaxScorer first skipTo(target) call ignores the
skip target param and ends up at the first match.
(Sudaakeran B. via Chris Hostetter & Doron Cohen)
23. LUCENE-913: Two consecutive score() calls return different
scores for Boolean Queries. (Michael Busch, Doron Cohen)
24. LUCENE-1013: Fix IndexWriter.setMaxMergeDocs to work "out of the
box", again, by moving set/getMaxMergeDocs up from
LogDocMergePolicy into LogMergePolicy. This fixes the API
breakage (non backwards compatible change) caused by LUCENE-994.
(Yonik Seeley via Mike McCandless)
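The escaping behavior described in item 17 above (LUCENE-881) can be
sketched with a plain-Java helper. This is an illustration only: the
class QueryEscapeSketch and its list of special characters are
assumptions and are not copied from QueryParser.

```java
// Hypothetical sketch of query-syntax escaping. The character list is
// an assumption for illustration, not QueryParser's own list.
class QueryEscapeSketch {
    // Characters treated as query syntax in this sketch, including
    // the '|' and '&' characters mentioned in the entry.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every special character with a backslash.
    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a&b|c"));
    }
}
```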
New features
1. LUCENE-759: Added two n-gram-producing TokenFilters.
(Otis Gospodnetic)
2. LUCENE-822: Added FieldSelector capabilities to Searchable for use with
RemoteSearcher, and other Searchable implementations. (Mark Miller, Grant Ingersoll)
3. LUCENE-755: Added the ability to store arbitrary binary metadata in the posting list.
These metadata are called Payloads. For every position of a Token one Payload in the form
of a variable length byte array can be stored in the prox file.
Remark: The APIs introduced with this feature are in experimental state and thus
contain appropriate warnings in the javadocs.
(Michael Busch)
4. LUCENE-834: Added BoostingTermQuery which can boost scores based on the
values of a payload (see #3 above.) (Grant Ingersoll)
5. LUCENE-834: Similarity has a new method for scoring payloads called
scorePayload that can be overridden to take advantage of payload
storage (see #3 above)
6. LUCENE-834: Added isPayloadAvailable() onto TermPositions interface and
implemented it in the appropriate places (Grant Ingersoll)
7. LUCENE-853: Added RemoteCachingWrapperFilter to enable caching of Filters
on the remote side of the RMI connection.
(Matt Ericson via Otis Gospodnetic)
8. LUCENE-446: Added Solr's search.function for scores based on field
values, plus CustomScoreQuery for simple score (post) customization.
(Yonik Seeley, Doron Cohen)
9. LUCENE-1058: Added new TeeTokenFilter (like the UNIX 'tee' command)
and SinkTokenizer which can be used to share tokens between two or
more Fields such that the other Fields do not have to go through the
whole Analysis process over again. For instance, if you have two
Fields that share all the same analysis steps except one lowercases
tokens and the other does not, you can coordinate the operations
between the two using the TeeTokenFilter and the SinkTokenizer. See
TeeSinkTokenTest.java for examples.
(Grant Ingersoll, Michael Busch, Yonik Seeley)
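The tee/sink idea from item 9 above (LUCENE-1058) can be illustrated
without Lucene classes. In the sketch below, tokens are modeled as
plain strings and the stream as a java.util.Iterator; the class
TeeSinkSketch and its tee method are hypothetical stand-ins, not
TeeTokenFilter or SinkTokenizer themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical tee/sink illustration: tokens produced once are passed
// downstream and also recorded in a sink that a second field can
// consume without repeating the shared analysis steps.
class TeeSinkSketch {
    // Wraps a token source; every token handed out is also added to sink.
    static Iterator<String> tee(final Iterator<String> source,
                                final List<String> sink) {
        return new Iterator<String>() {
            public boolean hasNext() {
                return source.hasNext();
            }
            public String next() {
                String token = source.next();
                sink.add(token);
                return token;
            }
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        Iterator<String> teed =
            tee(Arrays.asList("Quick", "Fox").iterator(), sink);

        // Field 1: consume the shared analysis chain as-is.
        List<String> exact = new ArrayList<>();
        while (teed.hasNext()) {
            exact.add(teed.next());
        }

        // Field 2: re-use the recorded tokens and apply only the extra step.
        List<String> lowered = new ArrayList<>();
        for (String t : sink) {
            lowered.add(t.toLowerCase(Locale.ROOT));
        }

        System.out.println(exact);
        System.out.println(lowered);
    }
}
```

The sink only fills as the teed stream is consumed, so in this sketch
the first field has to be processed before the second one reads from
the sink.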
Optimizations
1. LUCENE-761: The proxStream is now cloned lazily in SegmentTermPositions
when nextPosition() is called for the first time. This allows using instances
of SegmentTermPositions instead of SegmentTermDocs without additional costs.
(Michael Busch)
2. LUCENE-431: RAMInputStream and RAMOutputStream extend IndexInput and
IndexOutput directly now. This avoids further buffering and thus avoids
unnecessary array copies. (Michael Busch)
3. LUCENE-730: Updated BooleanScorer2 to make use of BooleanScorer in some
cases and possibly improve scoring performance. Documents can now be
delivered out-of-order as they are scored (e.g. to HitCollector).
N.B. A bit of code had to be disabled in QueryUtils in order for
TestBoolean2 test to keep passing.
(Paul Elschot via Otis Gospodnetic)
4. LUCENE-882: Spellchecker doesn't store the ngrams anymore but only indexes
them to keep the spell index small. (Daniel Naber)
5. LUCENE-430: Delay allocation of the buffer after a clone of BufferedIndexInput.
Together with LUCENE-888 this will allow adjusting the buffer size
dynamically. (Paul Elschot, Michael Busch)
6. LUCENE-888: Increase buffer sizes inside CompoundFileWriter and
BufferedIndexOutput. Also increase buffer size in
BufferedIndexInput, but only when used during merging. Together,
these increases yield 10-18% overall performance gain vs the
previous 1K defaults. (Mike McCandless)
7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
up most queries that use skipTo(), especially on big indexes with large posting
lists. For average AND queries the speedup is about 20%, for queries that
contain very frequent and very unique terms the speedup can be over 80%.
(Michael Busch)
Documentation
1. LUCENE-791 and INFRA-1173: Infrastructure moved the Wiki to
http://wiki.apache.org/lucene-java/ Updated the links in the docs and
wherever else I found references. (Grant Ingersoll, Joe Schaefer)
2. LUCENE-807: Fixed the javadoc for ScoreDocComparator.compare() to be
consistent with java.util.Comparator.compare(): Any integer is allowed to
be returned instead of only -1/0/1.
(Paul Cowan via Michael Busch)
3. LUCENE-875: Solved javadoc warnings & errors under jdk1.4.
Solved javadoc errors under jdk5 (jars in path for gdata).
Made "javadocs" target depend on "build-contrib" for first downloading
contrib jars configured for dynamic download. (Note: when running
behind a firewall, a firewall prompt might pop up.) (Doron Cohen)
4. LUCENE-740: Added SNOWBALL-LICENSE.txt to the snowball package and a
remark about the license to NOTICE.TXT. (Steven Parkes via Michael Busch)
5. LUCENE-925: Added analysis package javadocs. (Grant Ingersoll and Doron Cohen)
6. LUCENE-926: Added document package javadocs. (Grant Ingersoll)
Build
1. LUCENE-802: Added LICENSE.TXT and NOTICE.TXT to Lucene jars.
(Steven Parkes via Michael Busch)
2. LUCENE-885: "ant test" now includes all contrib tests. The new
"ant test-core" target can be used to run only the Core (non
contrib) tests.
(Chris Hostetter)
3. LUCENE-900: "ant test" now enables Java assertions (in Lucene packages).
(Doron Cohen)
4. LUCENE-894: Add custom build file for binary distributions that includes
targets to build the demos. (Chris Hostetter, Michael Busch)
5. LUCENE-904: The "package" targets in build.xml now also generate .md5
checksum files. (Chris Hostetter, Michael Busch)
6. LUCENE-907: Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of
demo war, demo jar, and the contrib jars. (Michael Busch)
7. LUCENE-909: Demo targets for running the demo. (Doron Cohen)
8. LUCENE-908: Improves content of MANIFEST file and makes it customizable
for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball
jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt.
(Chris Hostetter, Michael Busch)
9. LUCENE-930: Various contrib building improvements to ensure contrib
dependencies are met, and test compilation errors fail the build.
(Steven Parkes, Chris Hostetter)
10. LUCENE-622: Add ant target and pom.xml files for building maven artifacts
of the Lucene core and the contrib modules.
(Sami Siren, Karl Wettin, Michael Busch)
======================= Release 2.1.0 =======================
Changes in runtime behavior
1. 's' and 't' have been removed from the list of default stopwords
in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's'
as a stopword meant that 's-class' led to the same results as 'class'.
Note that this problem still exists for 'a', e.g. in 'a-class' as
'a' continues to be a stopword.
(Daniel Naber)
2. LUCENE-478: Updated the list of Unicode code point ranges for CJK
(now split into CJ and K) in StandardAnalyzer. (John Wang and
Steven Rowe via Otis Gospodnetic)
3. Modified some CJK Unicode code point ranges in StandardTokenizer.jj,
and added a few more of them to increase CJK character coverage.
Also documented some of the ranges.
(Otis Gospodnetic)
4. LUCENE-489: Add support for leading wildcard characters (*, ?) to
QueryParser. Default is to disallow them, as before.
(Steven Parkes via Otis Gospodnetic)
5. LUCENE-703: QueryParser changed to default to use of ConstantScoreRangeQuery
for range queries. Added useOldRangeQuery property to QueryParser to allow
selection of old RangeQuery class if required.
(Mark Harwood)
6. LUCENE-543: WildcardQuery now performs a TermQuery if the provided term
does not contain a wildcard character (? or *), when previously a
StringIndexOutOfBoundsException was thrown.
(Michael Busch via Erik Hatcher)
7. LUCENE-726: Removed the use of deprecated doc.fields() method and
Enumeration.
(Michael Busch via Otis Gospodnetic)
8. LUCENE-436: Removed finalize() in TermInfosReader and SegmentReader,
and added a call to enumerators.remove() in TermInfosReader.close().
The finalize() overrides were added to help with a pre-1.4.2 JVM bug
that has since been fixed, plus we no longer support pre-1.4.2 JVMs.
(Otis Gospodnetic)
9. LUCENE-771: The default location of the write lock is now the
index directory, and is named simply "write.lock" (without a big
digest prefix). The system properties "org.apache.lucene.lockDir"
and "java.io.tmpdir" are no longer used as the global directory
for storing lock files, and the LOCK_DIR field of FSDirectory is
now deprecated. (Mike McCandless)
New features
1. LUCENE-503: New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers
(Samphan Raruenrom via Chris Hostetter)
2. LUCENE-545: New FieldSelector API and associated changes to
IndexReader and implementations. New Fieldable interface for use
with the lazy field loading mechanism. (Grant Ingersoll and Chuck
Williams via Grant Ingersoll)
3. LUCENE-676: Move Solr's PrefixFilter to Lucene core. (Yura
Smolsky, Yonik Seeley)
4. LUCENE-678: Added NativeFSLockFactory, which implements locking
using OS native locking (via java.nio.*). (Michael McCandless via
Yonik Seeley)
5. LUCENE-544: Added the ability to specify different boosts for
different fields when using MultiFieldQueryParser (Matt Ericson
via Otis Gospodnetic)
6. LUCENE-528: New IndexWriter.addIndexesNoOptimize() that doesn't
optimize the index when adding new segments, only performing
merges as needed. (Ning Li via Yonik Seeley)
7. LUCENE-573: QueryParser now allows backslash escaping in
quoted terms and phrases. (Michael Busch via Yonik Seeley)
8. LUCENE-716: QueryParser now allows specification of Unicode
characters in terms via a unicode escape of the form \uXXXX
(Michael Busch via Yonik Seeley)
9. LUCENE-709: Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes()
and IndexWriter.flushRamSegments(), allowing applications to
control the amount of memory used to buffer documents.
(Chuck Williams via Yonik Seeley)
10. LUCENE-723: QueryParser now parses *:* as MatchAllDocsQuery
(Yonik Seeley)
11. LUCENE-741: Command-line utility for modifying or removing norms
on fields in an existing index. This is mostly based on LUCENE-496
and lives in contrib/miscellaneous.
(Chris Hostetter, Otis Gospodnetic)
12. LUCENE-759: Added NGramTokenizer and EdgeNGramTokenizer classes and
their passing unit tests.
(Otis Gospodnetic)
13. LUCENE-565: Added methods to IndexWriter to more efficiently
handle updating documents (the "delete then add" use case). This
is intended to be an eventual replacement for the existing
IndexModifier. Added IndexWriter.flush() (renamed from
flushRamSegments()) to flush all pending updates (held in RAM), to
the Directory. (Ning Li via Mike McCandless)
14. LUCENE-762: Added in SIZE and SIZE_AND_BREAK FieldSelectorResult options
which allow one to retrieve the size of a field without retrieving the
actual field. (Chuck Williams via Grant Ingersoll)
15. LUCENE-799: Properly handle lazy, compressed fields.
(Mike Klaas via Grant Ingersoll)
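The escaping rules from LUCENE-573 and LUCENE-716 above can be illustrated with a small decoder. This is only a sketch under assumptions: EscapeSketch and unescape() are hypothetical names, and the code is not QueryParser's actual implementation. It treats \uXXXX as a single UTF-16 code unit and a backslash before any other character as a literal.

```java
final class EscapeSketch {

    /**
     * Decodes backslash escapes in a term: a backslash, the letter u and
     * four hex digits become the matching UTF-16 code unit; a backslash
     * before any other character yields that character literally.
     */
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
            } else if (i + 1 >= input.length()) {
                throw new IllegalArgumentException("Term cannot end with a backslash: " + input);
            } else if (input.charAt(i + 1) == 'u') {
                if (i + 6 > input.length()) {
                    throw new IllegalArgumentException("Truncated unicode escape: " + input);
                }
                int codeUnit = 0;
                for (int k = i + 2; k < i + 6; k++) {
                    int digit = Character.digit(input.charAt(k), 16);
                    if (digit < 0) {
                        throw new IllegalArgumentException("Bad hex digit in unicode escape: " + input);
                    }
                    codeUnit = codeUnit * 16 + digit;
                }
                out.append((char) codeUnit);
                i += 6; // skip the backslash, the u and four hex digits
            } else {
                out.append(input.charAt(i + 1)); // escaped literal, e.g. a colon
                i += 2;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("caf\\u00e9"));
        System.out.println(unescape("title\\:draft"));
    }
}
```

A real parser would also have to decide how escapes interact with quoting and analysis; the sketch covers only the character-level decoding step.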
API Changes
1. LUCENE-438: Remove "final" from Token, implement Cloneable, allow
changing of termText via setTermText(). (Yonik Seeley)
2. org.apache.lucene.analysis.nl.WordlistLoader has been deprecated
and is supposed to be replaced with the WordlistLoader class in
package org.apache.lucene.analysis (Daniel Naber)
3. LUCENE-609: Revert return type of Document.getField(s) to Field
for backward compatibility, added new Document.getFieldable(s)
for access to new lazy loaded fields. (Yonik Seeley)
4. LUCENE-608: Document.fields() has been deprecated and a new method
Document.getFields() has been added that returns a List instead of
an Enumeration (Daniel Naber)
5. LUCENE-605: New Explanation.isMatch() method and new ComplexExplanation
subclass allows explain methods to produce Explanations which model
"matching" independent of having a positive value.
(Chris Hostetter)
6. LUCENE-621: New static methods IndexWriter.setDefaultWriteLockTimeout
and IndexWriter.setDefaultCommitLockTimeout for overriding default
timeout values for all future instances of IndexWriter (as well
as for any other classes that may reference the static values,
ie: IndexReader).
(Michael McCandless via Chris Hostetter)
7. LUCENE-638: FSDirectory.list() now only returns the directory's
Lucene-related files. Thanks to this change one can now construct
a RAMDirectory from a file system directory that contains files
not related to Lucene.
(Simon Willnauer via Daniel Naber)
8. LUCENE-635: Decoupling locking implementation from Directory
implementation. Added set/getLockFactory to Directory and moved
all locking code into subclasses of abstract class LockFactory.
FSDirectory and RAMDirectory still default to their prior locking
implementations, but now you can mix & match, for example using
SingleInstanceLockFactory (ie, in-memory locking) with an
FSDirectory. Note that you must now call setDisableLocks before
instantiating an FSDirectory if you wish to disable locking
for that Directory.
(Michael McCandless, Jeff Patterson via Yonik Seeley)
9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected.
(Steven Parkes via Otis Gospodnetic)
10. LUCENE-701: Lockless commits: a commit lock is no longer required
when a writer commits and a reader opens the index. This includes
a change to the index file format (see docs/fileformats.html for
details). It also removes all APIs associated with the commit
lock & its timeout. Readers are now truly read-only and do not
block one another on startup. This is the first step to getting
Lucene to work correctly over NFS (second step is
LUCENE-710). (Mike McCandless)
11. LUCENE-722: DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ
in Similarity's MoreLikeThis class. The misspelling has been
replaced by the correct spelling.
(Andi Vajda via Daniel Naber)
12. LUCENE-738: Reduce the size of the file that keeps track of which
documents are deleted when the number of deleted documents is
small. This changes the index file format and cannot be
read by previous versions of Lucene. (Doron Cohen via Yonik Seeley)
13. LUCENE-756: Maintain all norms in a single .nrm file to reduce the
number of open files and file descriptors for the non-compound index
format. This changes the index file format, but maintains the
ability to read and update older indices. The first segment merge
on an older format index will create a single .nrm file for the new
segment. (Doron Cohen via Yonik Seeley)
14. LUCENE-732: DateTools support has been added to QueryParser, with
setters for both the default Resolution, and per-field Resolution.
For backwards compatibility, DateField is still used if no Resolutions
are specified. (Michael Busch via Chris Hostetter)
15. Added isOptimized() method to IndexReader.
(Otis Gospodnetic)
16. LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
take a boolean "create" argument. Instead you should use
IndexWriter's "create" argument to create a new index.
(Mike McCandless)
17. LUCENE-780: Add a static Directory.copy() method to copy files
from one Directory to another. (Jiri Kuhn via Mike McCandless)
18. LUCENE-773: Added Directory.clearLock(String name) to forcefully
remove an old lock. The default implementation is to ask the
lockFactory (if non null) to clear the lock. (Mike McCandless)
19. LUCENE-795: Directory.renameFile() has been deprecated as it is
not used anymore inside Lucene. (Daniel Naber)
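The decoupling described in LUCENE-635 above (a Directory that delegates locking to a pluggable factory) can be sketched as follows. All type names here (SketchLock, SketchLockFactory, InMemoryLockFactory, SketchDirectory) are hypothetical stand-ins chosen for illustration, not Lucene's classes, and the in-memory factory is only an assumption about how single-instance locking might be done.

```java
import java.util.HashSet;
import java.util.Set;

/** A named lock that can be obtained and released. */
interface SketchLock {
    boolean obtain();
    void release();
}

/** Creates locks by name; a directory delegates all locking to one of these. */
interface SketchLockFactory {
    SketchLock makeLock(String name);
}

/** In-memory factory: lock state lives in a set shared by all locks it makes. */
final class InMemoryLockFactory implements SketchLockFactory {
    private final Set<String> held = new HashSet<>();

    @Override
    public SketchLock makeLock(final String name) {
        return new SketchLock() {
            @Override
            public boolean obtain() {
                synchronized (held) {
                    return held.add(name); // false if the name is already held
                }
            }

            @Override
            public void release() {
                synchronized (held) {
                    held.remove(name);
                }
            }
        };
    }
}

/** A directory that knows nothing about how locks are implemented. */
final class SketchDirectory {
    private SketchLockFactory lockFactory;

    SketchDirectory(SketchLockFactory lockFactory) {
        this.lockFactory = lockFactory;
    }

    void setLockFactory(SketchLockFactory lockFactory) {
        this.lockFactory = lockFactory;
    }

    SketchLockFactory getLockFactory() {
        return lockFactory;
    }

    SketchLock makeLock(String name) {
        return lockFactory.makeLock(name);
    }

    public static void main(String[] args) {
        SketchDirectory dir = new SketchDirectory(new InMemoryLockFactory());
        SketchLock first = dir.makeLock("write.lock");
        SketchLock second = dir.makeLock("write.lock");
        System.out.println(first.obtain());
        System.out.println(second.obtain());
    }
}
```

Because the directory only sees the factory interface, swapping the locking strategy is a matter of passing a different factory; no directory code changes.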
Bug fixes
1. Fixed the web application demo (built with "ant war-demo") which
didn't work because it used a QueryParser method that had
been removed (Daniel Naber)
2. LUCENE-583: ISOLatin1AccentFilter fails to preserve positionIncrement
(Yonik Seeley)
3. LUCENE-575: SpellChecker min score is incorrectly changed by suggestSimilar
(Karl Wettin via Yonik Seeley)
4. LUCENE-587: Explanation.toHtml was producing malformed HTML
(Chris Hostetter)
5. Fix to allow MatchAllDocsQuery to be used with RemoteSearcher (Yonik Seeley)
6. LUCENE-601: RAMDirectory and RAMFile made Serializable
(Karl Wettin via Otis Gospodnetic)
7. LUCENE-557: Fixes to BooleanQuery and FilteredQuery so that the score
Explanations match up with the real scores.
(Chris Hostetter)
8. LUCENE-607: ParallelReader's TermEnum fails to advance properly to
new fields (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)
9. LUCENE-610,LUCENE-611: Simple syntax changes to allow compilation with ecj:
disambiguate inner class scorer's use of doc() in BooleanScorer2,
other test code changes. (DM Smith via Yonik Seeley)
10. LUCENE-451: All core query types now use ComplexExplanations so that
boosts of zero don't confuse the BooleanWeight explain method.
(Chris Hostetter)
11. LUCENE-593: Fixed LuceneDictionary's inner Iterator
(Kåre Fiedler Christiansen via Otis Gospodnetic)
12. LUCENE-641: fixed an off-by-one bug with IndexWriter.setMaxFieldLength()
(Daniel Naber)
13. LUCENE-659: Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap()
to the correct analyzer for the field. (Chuck Williams via Yonik Seeley)
14. LUCENE-650: Fixed NPE in Locale specific String Sort when Document
has no value.
(Oliver Hutchison via Chris Hostetter)
15. LUCENE-683: Fixed data corruption when reading lazy loaded fields.
(Yonik Seeley)
16. LUCENE-678: Fixed bug in NativeFSLockFactory which caused the same
lock to be shared between different directories.
(Michael McCandless via Yonik Seeley)
17. LUCENE-690: Fixed thread unsafe use of IndexInput by lazy loaded fields.
(Yonik Seeley)
18. LUCENE-696: Fix bug when scorer for DisjunctionMaxQuery has skipTo()
called on it before next(). (Yonik Seeley)
19. LUCENE-569: Fixed SpanNearQuery bug, for 'inOrder' queries it would fail
to recognize ordered spans if they overlapped with unordered spans.
(Paul Elschot via Chris Hostetter)
20. LUCENE-706: Updated fileformats.xml|html concerning the docdelta value
in the frequency file. (Johan Stuyts, Doron Cohen via Grant Ingersoll)
21. LUCENE-715: Fixed private constructor in IndexWriter.java to
properly release the acquired write lock if there is an
IOException after acquiring the write lock but before finishing
instantiation. (Matthew Bogosian via Mike McCandless)
22. LUCENE-651: Multiple different threads requesting the same
FieldCache entry (often for Sorting by a field) at the same
time caused multiple generations of that entry, which was
detrimental to performance and memory use.
(Oliver Hutchison via Otis Gospodnetic)
23. LUCENE-717: Fixed build.xml not to fail when there is no lib dir.
(Doron Cohen via Otis Gospodnetic)
24. LUCENE-728: Removed duplicate/old MoreLikeThis and SimilarityQueries
classes from contrib/similarity, as their new home is under
contrib/queries.
(Otis Gospodnetic)
25. LUCENE-669: Do not double-close the RandomAccessFile in
FSIndexInput/Output during finalize(). Besides sending an
IOException up to the GC, this may also be the cause of intermittent
"The handle is invalid" IOExceptions on Windows when trying to
close readers or writers. (Michael Busch via Mike McCandless)
26. LUCENE-702: Fix IndexWriter.addIndexes(*) to not corrupt the index
on any exceptions (eg disk full). The semantics of these methods
is now transactional: either all indices are merged or none are.
Also fixed IndexWriter.mergeSegments (called outside of
addIndexes(*) by addDocument, optimize, flushRamSegments) and
IndexReader.commit() (called by close) to clean up and keep the
instance state consistent with what's actually in the index (Mike
McCandless).
27. LUCENE-129: Change finalizers to do "try {...} finally
{super.finalize();}" to make sure we don't miss finalizers in
classes above us. (Esmond Pitt via Mike McCandless)
28. LUCENE-754: Fix a problem introduced by LUCENE-651, causing
IndexReaders to hang around forever, in addition to not
fixing the original FieldCache performance problem.
(Chris Hostetter, Yonik Seeley)
29. LUCENE-140: Fix IndexReader.deleteDocument(int docNum) to
correctly raise ArrayIndexOutOfBoundsException when docNum is too
large. Previously, if docNum was only slightly too large (within
the same multiple of 8, ie, up to 7 ints beyond maxDoc), no
exception would be raised and instead the index would become
silently corrupted. The corruption then only appears much later,
in mergeSegments, when the corrupted segment is merged with
segment(s) after it. (Mike McCandless)
30. LUCENE-768: Fix case where an Exception during deleteDocument,
undeleteAll or setNorm in IndexReader could leave the reader in a
state where close() fails to release the write lock.
(Mike McCandless)
31. Remove "tvp" from known index file extensions because it is
never used. (Nicolas Lalevée via Bernhard Messer)
32. LUCENE-767: Change how SegmentReader.maxDoc() is computed to not
rely on file length check and instead use the SegmentInfo's
docCount that's already stored explicitly in the index. This is a
defensive bug fix (ie, there is no known problem seen "in real
life" due to this, just a possible future problem). (Chuck
Williams via Mike McCandless)
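The failure mode in LUCENE-140 above comes from storing one bit per document in a byte array: the array is rounded up to a whole number of bytes, so an index slightly past maxDoc still lands inside the last byte. The sketch below is a hypothetical class (DeletedDocsSketch), not Lucene's own bit vector, and shows why an explicit range check is needed.

```java
final class DeletedDocsSketch {
    private final byte[] bits;
    private final int maxDoc;

    DeletedDocsSketch(int maxDoc) {
        this.maxDoc = maxDoc;
        this.bits = new byte[(maxDoc + 7) >> 3]; // rounded up to whole bytes
    }

    /** No range check: only fails once docNum falls outside the byte array. */
    void markDeletedUnchecked(int docNum) {
        bits[docNum >> 3] |= (byte) (1 << (docNum & 7));
    }

    /** Explicit range check against maxDoc, independent of the array size. */
    void markDeleted(int docNum) {
        if (docNum < 0 || docNum >= maxDoc) {
            throw new ArrayIndexOutOfBoundsException(docNum);
        }
        markDeletedUnchecked(docNum);
    }

    boolean isDeleted(int docNum) {
        return (bits[docNum >> 3] & (1 << (docNum & 7))) != 0;
    }

    public static void main(String[] args) {
        DeletedDocsSketch docs = new DeletedDocsSketch(10); // 10 docs, 2 bytes
        docs.markDeletedUnchecked(12); // past maxDoc, but silently accepted
        System.out.println(docs.isDeleted(12));
        try {
            docs.markDeleted(12);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With maxDoc = 10 the array holds 16 bits, so document numbers 10 through 15 slip through the unchecked path.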
Optimizations
1. LUCENE-586: TermDocs.skipTo() is now more efficient for
multi-segment indexes. This will improve the performance of many
types of queries against a non-optimized index. (Andrew Hudson
via Yonik Seeley)
2. LUCENE-623: RAMDirectory.close now nulls out its reference to all
internal "files", allowing them to be GCed even if references to the
RAMDirectory itself still exist. (Nadav Har'El via Chris Hostetter)
3. LUCENE-629: Compressed fields are no longer uncompressed and
recompressed during segment merges (e.g. during indexing or
optimizing), thus improving performance. (Michael Busch via Otis
Gospodnetic)
4. LUCENE-388: Improve indexing performance when maxBufferedDocs is
large by keeping a count of buffered documents rather than
counting after each document addition. (Doron Cohen, Paul Smith,
Yonik Seeley)
5. Modified TermScorer.explain to use TermDocs.skipTo() instead of
looping through docs. (Grant Ingersoll)
6. LUCENE-672: New indexing segment merge policy flushes all
buffered docs to their own segment and delays a merge until
mergeFactor segments of a certain level have been accumulated.
This increases indexing performance in the presence of deleted
docs or partially full segments as well as enabling future
optimizations.
NOTE: this also fixes an "under-merging" bug whereby it is
possible to get far too many segments in your index (which will
drastically slow down search, risks exhausting file descriptor
limit, etc.). This can happen when the number of buffered docs
at close, plus the number of docs in the last non-ram segment is
greater than mergeFactor. (Ning Li, Yonik Seeley)
7. Lazy loaded fields unnecessarily retained an extra copy of loaded
String data. (Yonik Seeley)
8. LUCENE-443: ConjunctionScorer performance increase. Speed up
any BooleanQuery with more than one mandatory clause.
(Abdul Chaudhry, Paul Elschot via Yonik Seeley)
9. LUCENE-365: DisjunctionSumScorer performance increase of
~30%. Speeds up queries with optional clauses. (Paul Elschot via
Yonik Seeley)
10. LUCENE-695: Optimized BufferedIndexInput.readBytes() for medium
size buffers, which will speed up merging and retrieving binary
and compressed fields. (Nadav Har'El via Yonik Seeley)
11. LUCENE-687: Lazy skipping on proximity file speeds up most
queries involving term positions, including phrase queries.
(Michael Busch via Yonik Seeley)
12. LUCENE-714: Replaced 2 cases of manual for-loop array copying
with calls to System.arraycopy instead, in DocumentWriter.java.
(Nicolas Lalevée via Mike McCandless)
13. LUCENE-729: Non-recursive skipTo and next implementation of
TermDocs for a MultiReader. The old implementation could
recurse up to the number of segments in the index. (Yonik Seeley)
14. LUCENE-739: Improve segment merging performance by reusing
the norm array across different fields and doing bulk writes
of norms of segments with no deleted docs.
(Michael Busch via Yonik Seeley)
15. LUCENE-745: Add BooleanQuery.clauses(), allowing direct access
to the List of clauses and replaced the internal synchronized Vector
with an unsynchronized List. (Yonik Seeley)
16. LUCENE-750: Remove finalizers from FSIndexOutput and move the
FSIndexInput finalizer to the actual file so all clones don't
register a new finalizer. (Yonik Seeley)
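The change described in LUCENE-714 above, replacing a manual copy loop with System.arraycopy, looks roughly like this. ArrayCopySketch and its method names are hypothetical and are not taken from DocumentWriter.java; the two methods are meant to be behaviorally equivalent for non-overlapping ranges.

```java
import java.util.Arrays;

final class ArrayCopySketch {

    /** Element-by-element copy, as in the code being replaced. */
    static void copyWithLoop(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        for (int i = 0; i < length; i++) {
            dst[dstPos + i] = src[srcPos + i];
        }
    }

    /** Same effect via one bulk call, which the JVM can copy in blocks. */
    static void copyBulk(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        System.arraycopy(src, srcPos, dst, dstPos, length);
    }

    public static void main(String[] args) {
        byte[] source = {1, 2, 3, 4, 5, 6};
        byte[] viaLoop = new byte[8];
        byte[] viaBulk = new byte[8];
        copyWithLoop(source, 1, viaLoop, 2, 4);
        copyBulk(source, 1, viaBulk, 2, 4);
        System.out.println(Arrays.equals(viaLoop, viaBulk));
    }
}
```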
Test Cases
1. Added TestTermScorer.java (Grant Ingersoll)
2. Added TestWindowsMMap.java (Benson Margulies via Mike McCandless)
3. LUCENE-744 Append the user.name property onto the temporary directory
that is created so it doesn't interfere with other users. (Grant Ingersoll)
Documentation
1. Added style sheet to xdocs named lucene.css and included in the
Anakia VSL descriptor. (Grant Ingersoll)
2. Added scoring.xml document into xdocs. Updated Similarity.java
scoring formula. (Grant Ingersoll and Steve Rowe. Updates from:
Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting).
Issue 664.
3. Added javadocs for FieldSelectorResult.java. (Grant Ingersoll)
4. Moved xdocs directory to src/site/src/documentation/content/xdocs per
Issue 707. Site now builds using Forrest, just like the other Lucene
siblings. See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite
for info on updating the website. (Grant Ingersoll with help from Steve Rowe,
Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)
5. Added in Developer and System Requirements sections under Resources (Grant Ingersoll)
6. LUCENE-713 Updated the Term Vector section of File Formats to include
documentation on how Offset and Position info are stored in the TVF file.
(Grant Ingersoll, Samir Abdou)
7. Added in link to Clover Test Code Coverage Reports under the Develop
section in Resources (Grant Ingersoll)
8. LUCENE-748: Added details for semantics of IndexWriter.close on
hitting an Exception. (Jed Wesley-Smith via Mike McCandless)
9. Added some text about what is contained in releases.
(Eric Haszlakiewicz via Grant Ingersoll)
10. LUCENE-758: Fix javadoc to clarify that RAMDirectory(Directory)
makes a full copy of the starting Directory. (Mike McCandless)
11. LUCENE-764: Fix javadocs to detail temporary space requirements
for IndexWriter's optimize(), addIndexes(*) and addDocument(...)
methods. (Mike McCandless)
Build
1. Added in clover test code coverage per http://issues.apache.org/jira/browse/LUCENE-721
To enable clover code coverage, you must have clover.jar in the ANT
classpath and specify -Drun.clover=true on the command line.
(Michael Busch and Grant Ingersoll)
2. Added a sysproperty in common-build.xml per Lucene 752 to map java.io.tmpdir to
${build.dir}/test just like the tempDir sysproperty.
3. LUCENE-757 Added new target named init-dist that does setup for
distribution of both binary and source distributions. Called by package
and package-*-src
======================= Release 2.0.0 =======================
API Changes
1. All deprecated methods and fields have been removed, except
DateField, which will still be supported for some time
so Lucene can read its date fields from old indexes
(Yonik Seeley & Grant Ingersoll)
2. DisjunctionSumScorer is no longer public.
(Paul Elschot via Otis Gospodnetic)
3. Creating a Field with both an empty name and an empty value
now throws an IllegalArgumentException
(Daniel Naber)
4. LUCENE-301: Added new IndexWriter({String,File,Directory},
Analyzer) constructors that do not take a boolean "create"
argument. These new constructors will create a new index if
necessary, else append to the existing one. (Dan Armbrust via
Mike McCandless)
New features
1. LUCENE-496: Command line tool for modifying the field norms of an
existing index; added to contrib/miscellaneous. (Chris Hostetter)
2. LUCENE-577: SweetSpotSimilarity added to contrib/miscellaneous.
(Chris Hostetter)
Bug fixes
1. LUCENE-330: Fix issue of FilteredQuery not working properly within
BooleanQuery. (Paul Elschot via Erik Hatcher)
2. LUCENE-515: Make ConstantScoreRangeQuery and ConstantScoreQuery work
with RemoteSearchable. (Philippe Laflamme via Yonik Seeley)
3. Added methods to get/set writeLockTimeout and commitLockTimeout in
IndexWriter. These could be set in Lucene 1.4 using a system property.
This feature had been removed without adding the corresponding
getter/setter methods. (Daniel Naber)
4. LUCENE-413: Fixed ArrayIndexOutOfBoundsException exceptions
when using SpanQueries. (Paul Elschot via Yonik Seeley)
5. Implemented FilterIndexReader.getVersion() and isCurrent()
(Yonik Seeley)
6. LUCENE-540: Fixed a bug with IndexWriter.addIndexes(Directory[])
that sometimes caused the index order of documents to change.
(Yonik Seeley)
7. LUCENE-526: Fixed a bug in FieldSortedHitQueue that caused
subsequent String sorts with different locales to sort identically.
(Paul Cowan via Yonik Seeley)
8. LUCENE-541: Add missing extractTerms() to DisjunctionMaxQuery
(Stefan Will via Yonik Seeley)
9. LUCENE-514: Added getTermArrays() and extractTerms() to
MultiPhraseQuery (Eric Jain & Yonik Seeley)
10. LUCENE-512: Fixed ClassCastException in ParallelReader.getTermFreqVectors
(frederic via Yonik)
11. LUCENE-352: Fixed bug in SpanNotQuery that manifested as
NullPointerException when "exclude" query was not a SpanTermQuery.
(Chris Hostetter)
12. LUCENE-572: Fixed bug in SpanNotQuery hashCode, was ignoring exclude clause
(Chris Hostetter)
13. LUCENE-561: Fixed some ParallelReader bugs. NullPointerException if the reader
didn't know about the field yet, reader didn't keep track if it had deletions,
and deleteDocument calls could circumvent synchronization on the subreaders.
(Chuck Williams via Yonik Seeley)
14. LUCENE-556: Added empty extractTerms() implementation to MatchAllDocsQuery and
ConstantScoreQuery in order to allow their use with a MultiSearcher.
(Yonik Seeley)
15. LUCENE-546: Removed 2GB file size limitations for RAMDirectory.
(Peter Royal, Michael Chan, Yonik Seeley)
16. LUCENE-485: Don't hold commit lock while removing obsolete index
files. (Luc Vanlerberghe via cutting)
1.9.1
Bug fixes
1. LUCENE-511: Fix a bug in the BufferedIndexOutput optimization
introduced in 1.9-final. (Shay Banon & Steven Tamm via cutting)
1.9 final
Note that this release is mostly but not 100% source compatible with
the previous release of Lucene (1.4.3). In other words, you should
make sure your application compiles with this version of Lucene before
you replace the old Lucene JAR with the new one. Many methods have
been deprecated in anticipation of release 2.0, so deprecation
warnings are to be expected when upgrading from 1.4.3 to 1.9.
Bug fixes
1. The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative
effects on indexing performance and has thus been reverted. The
argument for setMaxBufferedDocs(int) must now be at least 2; otherwise
an exception is thrown. (Daniel Naber)
Optimizations
1. Optimized BufferedIndexOutput.writeBytes() to use
System.arraycopy() in more cases, rather than copying byte-by-byte.
(Lukas Zapletal via Cutting)
1.9 RC1
Requirements
1. To compile and use Lucene you now need Java 1.4 or later.
Changes in runtime behavior
1. FuzzyQuery can no longer throw a TooManyClauses exception. If a
FuzzyQuery expands to more than BooleanQuery.maxClauseCount
terms only the BooleanQuery.maxClauseCount most similar terms
go into the rewritten query and thus the exception is avoided.
(Christoph)
2. Changed system property from "org.apache.lucene.lockdir" to
"org.apache.lucene.lockDir", so that its casing follows the existing
pattern used in other Lucene system properties. (Bernhard)
3. The terms of RangeQueries and FuzzyQueries are now converted to
lowercase by default (as has been the case for PrefixQueries
and WildcardQueries before). Use setLowercaseExpandedTerms(false)
to disable that behavior but note that this also affects
PrefixQueries and WildcardQueries. (Daniel Naber)
4. Document frequency that is computed when MultiSearcher is used is now
computed correctly and "globally" across subsearchers and indices, while
before it used to be computed locally to each index, which caused
ranking across multiple indices not to be equivalent.
(Chuck Williams, Wolf Siberski via Otis, bug #31841)
5. When opening an IndexWriter with create=true, Lucene now only deletes
its own files from the index directory (looking at the file name suffixes
to decide if a file belongs to Lucene). The old behavior was to delete
all files. (Daniel Naber and Bernhard Messer, bug #34695)
6. The version of an IndexReader, as returned by getCurrentVersion()
and getVersion() doesn't start at 0 anymore for new indexes. Instead, it
is now initialized by the system time in milliseconds.
(Bernhard Messer via Daniel Naber)
7. Several default values cannot be set via system properties anymore, as
this has been considered inappropriate for a library like Lucene. For
most properties there are set/get methods available in IndexWriter which
you should use instead. This affects the following properties:
See IndexWriter for getter/setter methods:
org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout,
org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs,
org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval,
org.apache.lucene.mergeFactor,
See BooleanQuery for getter/setter methods:
org.apache.lucene.maxClauseCount
See FSDirectory for getter/setter methods:
disableLuceneLocks
(Daniel Naber)
8. Fixed FieldCacheImpl to use user-provided IntParser and FloatParser,
instead of using Integer and Float classes for parsing.
(Yonik Seeley via Otis Gospodnetic)
9. Expert level search routines returning TopDocs and TopFieldDocs
no longer normalize scores. This also fixes bugs related to
MultiSearchers and score sorting/normalization.
(Luc Vanlerberghe via Yonik Seeley, LUCENE-469)
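Item 1 above describes keeping only the most similar terms when a FuzzyQuery expands past BooleanQuery.maxClauseCount. One common way to do that is a bounded min-heap, sketched below. TopTermsSketch, ScoredTerm and mostSimilar() are hypothetical names, and the heap approach is an assumption for illustration rather than a description of Lucene's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopTermsSketch {

    /** A candidate term with its similarity to the query term. */
    static final class ScoredTerm {
        final String text;
        final float similarity;

        ScoredTerm(String text, float similarity) {
            this.text = text;
            this.similarity = similarity;
        }
    }

    /** Returns at most maxClauseCount term texts, most similar first. */
    static List<String> mostSimilar(List<ScoredTerm> candidates, int maxClauseCount) {
        List<String> result = new ArrayList<>();
        if (maxClauseCount <= 0) {
            return result;
        }
        Comparator<ScoredTerm> bySimilarity =
                (a, b) -> Float.compare(a.similarity, b.similarity);
        // Min-heap: the least similar retained term sits at the head.
        PriorityQueue<ScoredTerm> queue = new PriorityQueue<>(maxClauseCount, bySimilarity);
        for (ScoredTerm term : candidates) {
            queue.add(term);
            if (queue.size() > maxClauseCount) {
                queue.poll(); // drop the least similar instead of failing
            }
        }
        List<ScoredTerm> kept = new ArrayList<>(queue);
        kept.sort(bySimilarity.reversed());
        for (ScoredTerm term : kept) {
            result.add(term.text);
        }
        return result;
    }

    public static void main(String[] args) {
        List<ScoredTerm> candidates = new ArrayList<>();
        candidates.add(new ScoredTerm("lucene", 1.0f));
        candidates.add(new ScoredTerm("lupine", 0.5f));
        candidates.add(new ScoredTerm("lucent", 0.8f));
        System.out.println(mostSimilar(candidates, 2));
    }
}
```

The heap never holds more than maxClauseCount + 1 entries, so memory stays bounded no matter how many terms the expansion enumerates.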
New features
1. Added support for stored compressed fields (patch #31149)
(Bernhard Messer via Christoph)
2. Added support for binary stored fields (patch #29370)
(Drew Farris and Bernhard Messer via Christoph)
3. Added support for position and offset information in term vectors
(patch #18927). (Grant Ingersoll & Christoph)
4. A new class DateTools has been added. It allows you to format dates
in a readable format adequate for indexing. Unlike the existing
DateField class, DateTools can cope with dates before 1970 and it
forces you to specify the desired date resolution (e.g. month, day,
second, ...) which can make RangeQuerys on those fields more efficient.
(Daniel Naber)
5. QueryParser now correctly works with Analyzers that can return more
than one token per position. For example, a query "+fast +car"
would be parsed as "+fast +(car automobile)" if the Analyzer
returns "car" and "automobile" at the same position whenever it
finds "car" (Patch #23307).
(Pierrick Brihaye, Daniel Naber)
6. Permit unbuffered Directory implementations (e.g., using mmap).
InputStream is replaced by the new classes IndexInput and
BufferedIndexInput. OutputStream is replaced by the new classes
IndexOutput and BufferedIndexOutput. InputStream and OutputStream
are now deprecated and FSDirectory is now subclassable. (cutting)
7. Add native Directory and TermDocs implementations that work under
GCJ. These require GCC 3.4.0 or later and have only been tested
on Linux. Use 'ant gcj' to build demo applications. (cutting)
8. Add MMapDirectory, which uses nio to mmap input files. This is
still somewhat slower than FSDirectory. However it uses less
memory per query term, since a new buffer is not allocated per
term, which may help applications which use, e.g., wildcard
queries. It may also someday be faster. (cutting & Paul Elschot)
9. Added javadocs-internal to build.xml - bug #30360
(Paul Elschot via Otis)
10. Added RangeFilter, a more generically useful filter than DateFilter.
(Chris M Hostetter via Erik)
11. Added NumberTools, a utility class for indexing numeric fields.
(adapted from code contributed by Matt Quail; committed by Erik)
12. Added public static IndexReader.main(String[] args) method.
IndexReader can now be used directly at command line level
to list and optionally extract the individual files from an existing
compound index file.
(adapted from code contributed by Garrett Rooney; committed by Bernhard)
13. Add IndexWriter.setTermIndexInterval() method. See javadocs.
(Doug Cutting)
14. Added LucenePackage, whose static get() method returns java.util.Package,
which lets the caller get the Lucene version information specified in
the Lucene Jar.
(Doug Cutting via Otis)
15. Added Hits.iterator() method and corresponding HitIterator and Hit objects.
This provides standard java.util.Iterator iteration over Hits.
Each call to the iterator's next() method returns a Hit object.
(Jeremy Rayner via Erik)
16. Add ParallelReader, an IndexReader that combines separate indexes
over different fields into a single virtual index. (Doug Cutting)
17. Add IntParser and FloatParser interfaces to FieldCache, so that
fields in arbitrary formats can be cached as ints and floats.
(Doug Cutting)
18. Added class org.apache.lucene.index.IndexModifier which combines
IndexWriter and IndexReader, so you can add and delete documents without
worrying about synchronization/locking issues.
(Daniel Naber)
19. Lucene can now be used inside an unsigned applet, as Lucene's access
to system properties will not cause a SecurityException anymore.
(Jon Schuster via Daniel Naber, bug #34359)
20. Added a new class MatchAllDocsQuery that matches all documents.
(John Wang via Daniel Naber, bug #34946)
21. Added ability to omit norms on a per field basis to decrease
index size and memory consumption when there are many indexed fields.
See Field.setOmitNorms()
(Yonik Seeley, LUCENE-448)
22. Added NullFragmenter to contrib/highlighter, which is useful for
highlighting entire documents or fields.
(Erik Hatcher)
23. Added regular expression queries, RegexQuery and SpanRegexQuery.
Note the same term enumeration caveats apply with these queries as
apply to WildcardQuery and other term expanding queries.
These two new queries are not currently supported via QueryParser.
(Erik Hatcher)
24. Added ConstantScoreQuery which wraps a filter and produces a score
equal to the query boost for every matching document.
(Yonik Seeley, LUCENE-383)
25. Added ConstantScoreRangeQuery which produces a constant score for
every document in the range. One advantage over a normal RangeQuery
is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum
number of terms the range can cover. Both endpoints may also be open.
(Yonik Seeley, LUCENE-383)
26. Added ability to specify a minimum number of optional clauses that
must match in a BooleanQuery. See BooleanQuery.setMinimumNumberShouldMatch().
(Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395)
27. Added DisjunctionMaxQuery which provides the maximum score across its clauses.
It's very useful for searching across multiple fields.
(Chuck Williams via Yonik Seeley, LUCENE-323)
28. New class ISOLatin1AccentFilter that replaces accented characters in the ISO
Latin 1 character set by their unaccented equivalent.
(Sven Duzont via Erik Hatcher)
29. New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token.
This is useful for data like zip codes, ids, and some product names.
(Erik Hatcher)
30. Copied LengthFilter from contrib area to core. Removes words that are too
long or too short from the stream.
(David Spencer via Otis and Daniel)
31. Added getPositionIncrementGap(String fieldName) to Analyzer. This allows
custom analyzers to put gaps between Field instances with the same field
name, preventing phrase or span queries crossing these boundaries. The
default implementation issues a gap of 0, allowing the default token
position increment of 1 to put the next field's first token into a
successive position.
(Erik Hatcher, with advice from Yonik)
32. StopFilter can now ignore case when checking for stop words.
(Grant Ingersoll via Yonik, LUCENE-248)
33. Add TopDocCollector and TopFieldDocCollector. These simplify the
implementation of hit collectors that collect only the
top-scoring or top-sorting hits.
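The idea behind the DateTools entry (item 4 above) is that a date formatted as fixed-width digits at a chosen resolution sorts lexicographically in chronological order, and a coarser resolution yields fewer distinct terms for a range query to enumerate. The sketch below uses only the JDK; DateResolutionSketch, its format() method, the patterns and the GMT time zone are assumptions for illustration and not a statement of DateTools' exact behavior.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DateResolutionSketch {

    /** Formats a date as fixed-width digits, truncated to the pattern's resolution. */
    static String format(Date date, String pattern) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern, Locale.ROOT);
        formatter.setTimeZone(TimeZone.getTimeZone("GMT")); // zone-independent terms
        return formatter.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        Date dayBefore = new Date(-86400000L); // a date before 1970

        // Day resolution: one distinct term per day.
        System.out.println(format(dayBefore, "yyyyMMdd"));
        System.out.println(format(epoch, "yyyyMMdd"));

        // Month resolution: far fewer distinct terms over the same span.
        System.out.println(format(epoch, "yyyyMM"));
    }
}
```

Because every term at a given resolution has the same width, plain string comparison agrees with chronological order, which is what a term-based range query relies on.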
API Changes
1. Several methods and fields have been deprecated. The API documentation
contains information about the recommended replacements. It is planned
that most of the deprecated methods and fields will be removed in
Lucene 2.0. (Daniel Naber)
2. The Russian and the German analyzers have been moved to contrib/analyzers.
Also, the WordlistLoader class has been moved one level up in the
hierarchy and is now org.apache.lucene.analysis.WordlistLoader
(Daniel Naber)
3. The API contained methods that were declared to throw an IOException
but never actually did so. These declarations have been removed. If
your code tries to catch these exceptions you might need to remove
those catch clauses to avoid compile errors. (Daniel Naber)
4. Add a serializable Parameter Class to standardize parameter enum
classes in BooleanClause and Field. (Christoph)
5. Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys.
This allows custom SpanQuery subclasses that rewrite (for term expansion, for
example) to nest within the built-in SpanQuery classes successfully.
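Item 4 above mentions a serializable Parameter class for enum-like constants. On Java 1.4, which has no enum keyword, the usual approach is the typesafe-enum pattern with readResolve(), so that a deserialized constant is the same instance as the original. The sketch below follows that pattern; ParameterSketch and OccurSketch are hypothetical names, and the details are assumptions rather than a copy of Lucene's class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class ParameterSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final Map<String, ParameterSketch> ALL = new HashMap<>();
    private final String name;

    protected ParameterSketch(String name) {
        this.name = name;
        synchronized (ALL) {
            if (ALL.put(key(), this) != null) {
                throw new IllegalArgumentException("Duplicate parameter: " + key());
            }
        }
    }

    /** Key combines the concrete class and the name, so subclasses do not clash. */
    private String key() {
        return getClass().getName() + " " + name;
    }

    @Override
    public String toString() {
        return name;
    }

    /** Swaps the freshly deserialized copy for the canonical constant. */
    protected Object readResolve() throws ObjectStreamException {
        synchronized (ALL) {
            ParameterSketch canonical = ALL.get(key());
            if (canonical == null) {
                throw new InvalidObjectException("Unknown parameter: " + name);
            }
            return canonical;
        }
    }
}

final class OccurSketch extends ParameterSketch {
    private static final long serialVersionUID = 1L;
    static final OccurSketch MUST = new OccurSketch("MUST");
    static final OccurSketch SHOULD = new OccurSketch("SHOULD");

    private OccurSketch(String name) {
        super(name);
    }

    /** Serializes and deserializes a value in memory. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(MUST) == MUST);
    }
}
```

Without readResolve(), the round trip would produce a second object that is a distinct instance from MUST, which breaks code that compares constants with ==.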
Bug fixes
1. The JSP demo page (src/jsp/results.jsp) now properly closes the
IndexSearcher it opens. (Daniel Naber)
2. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
prevented deletion of obsolete segments. (Christoph Goller)
3. Fix in FieldInfos to avoid the return of an extra blank field in
IndexReader.getFieldNames() (Patch #19058). (Mark Harwood via Bernhard)
4. Some combinations of BooleanQuery and MultiPhraseQuery (formerly
PhrasePrefixQuery) could provoke UnsupportedOperationException
(bug #33161). (Rhett Sutphin via Daniel Naber)
5. Fixed a small bug in skipTo() of ConjunctionScorer that caused a
NullPointerException if skipTo() was called without a prior call to
next(). (Christoph)
6. Disable Similarity.coord() in the scoring of most automatically
generated boolean queries. The coord() score factor is
appropriate when clauses are independently specified by a user,
but is usually not appropriate when clauses are generated
automatically, e.g., by a fuzzy, wildcard or range query. Matches
on such automatically generated queries are no longer penalized
for not matching all terms. (Doug Cutting, Patch #33472)
7. Getting a lock file with Lock.obtain(long) was supposed to wait for
a given number of milliseconds, but this didn't work.
(John Wang via Daniel Naber, Bug #33799)
8. Fix FSDirectory.createOutput() to always create new files.
Previously, existing files were overwritten, and an index could be
corrupted when the old version of a file was longer than the new.
Now any existing file is first removed. (Doug Cutting)
9. Fix BooleanQuery containing nested SpanTermQuery's, which previously
could return an incorrect number of hits.
(Reece Wilton via Erik Hatcher, Bug #35157)
10. Fix NullPointerException that could occur with a MultiPhraseQuery
inside a BooleanQuery.
(Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626)
11. Fixed SnowballFilter to pass through the position increment from
the original token.
(Yonik Seeley via Erik Hatcher, LUCENE-437)
12. Added Unicode range of Korean characters to StandardTokenizer,
grouping contiguous characters into a token rather than one token
per character. This change also changes the token type to "<CJ>"
for Chinese and Japanese character tokens (previously it was "<CJK>").
(Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461)
13. FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and
FieldInfo.storePositionWithTermVector and creates the Field with
correct TermVector parameter.
(Frank Steinmann via Bernhard, LUCENE-455)
14. Fixed WildcardQuery to prevent "cat" matching "ca??".
(Xiaozheng Ma via Bernhard, LUCENE-306)
15. Fixed a bug where MultiSearcher and ParallelMultiSearcher could
change the sort order when sorting by string for documents without
a value for the sort field.
(Luc Vanlerberghe via Yonik, LUCENE-453)
16. Fixed a sorting problem with MultiSearchers that can lead to
missing or duplicate docs due to equal docs sorting in an arbitrary order.
(Yonik Seeley, LUCENE-456)
17. A single hit using the expert level sorted search methods
resulted in the score not being normalized.
(Yonik Seeley, LUCENE-462)
18. Fixed inefficient memory usage when loading an index into RAMDirectory.
(Volodymyr Bychkoviak via Bernhard, LUCENE-475)
19. Corrected term offsets returned by ChineseTokenizer.
(Ray Tsang via Erik Hatcher, LUCENE-324)
20. Fixed MultiReader.undeleteAll() to correctly update numDocs.
(Robert Kirchgessner via Doug Cutting, LUCENE-479)
21. Race condition in IndexReader.getCurrentVersion() and isCurrent()
fixed by acquiring the commit lock.
(Luc Vanlerberghe via Yonik Seeley, LUCENE-481)
22. IndexWriter.setMaxBufferedDocs(1) didn't have the expected effect.
This has now been fixed. (Daniel Naber)
23. Fixed QueryParser when called with a date in local form like
"[1/16/2000 TO 1/18/2000]". This query did not include the documents
of 1/18/2000, i.e. the last day was not included. (Daniel Naber)
24. Removed sorting constraint that threw an exception if there were
not yet any values for the sort field (Yonik Seeley, LUCENE-374)
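The wildcard semantics behind bug fix 14 above can be illustrated with a small, self-contained matcher. This is only a sketch, not Lucene's WildcardQuery code: the class name WildcardSketch and its methods are hypothetical, and it assumes that '?' stands for exactly one character and '*' for zero or more characters.

```java
// Illustrative sketch only; this is not Lucene's WildcardQuery implementation.
final class WildcardSketch {

    /** Returns true if text matches pattern ('?' = exactly one char, '*' = zero or more). */
    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            // Pattern consumed: succeed only if the text is consumed as well.
            return ti == t.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Try every possible length for the run consumed by '*', including zero.
            for (int k = ti; k <= t.length(); k++) {
                if (matches(p, pi + 1, t, k)) {
                    return true;
                }
            }
            return false;
        }
        if (ti == t.length()) {
            // The pattern still requires a character, but the text is exhausted.
            // This is the case that keeps "ca??" from matching "cat".
            return false;
        }
        if (c == '?' || c == t.charAt(ti)) {
            return matches(p, pi + 1, t, ti + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("ca??", "cat"));  // false
        System.out.println(matches("ca??", "cats")); // true
        System.out.println(matches("ca*", "ca"));    // true
    }
}
```

The recursive form is chosen for readability; it is exponential in the worst case for patterns with many '*' characters and is not meant for production use.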
Optimizations
1. Reduced peak disk usage during indexing and optimization when the
compound file format is used.
(Bernhard, Dmitry, and Christoph)
2. Optimize the performance of certain uses of BooleanScorer,
TermScorer and IndexSearcher. In particular, a BooleanQuery
composed of TermQuery, with not all terms required, that returns a
TopDocs (e.g., through a Hits with no Sort specified) runs much
faster. (cutting)
3. Removed synchronization from reading of term vectors with an
IndexReader (Patch #30736). (Bernhard Messer via Christoph)
4. Optimize term-dictionary lookup to allocate far fewer terms when
scanning for the matching term. This speeds searches involving
low-frequency terms, where the cost of dictionary lookup can be
significant. (cutting)
5. Optimize fuzzy queries so the standard fuzzy queries with a prefix
of 0 now run 20-50% faster (Patch #31882).
(Jonathan Hager via Daniel Naber)
6. Added a new version of BooleanScorer (BooleanScorer2) that delivers
documents in increasing order and implements skipTo. For queries
with required or forbidden clauses it may be faster than the old
BooleanScorer; for BooleanQueries consisting only of optional
clauses it is probably slower. The new BooleanScorer is now the
default. (Patch 31785 by Paul Elschot via Christoph)
7. Use uncached access to norms when merging to reduce RAM usage.
(Bug #32847). (Doug Cutting)
8. Don't read term index when random-access is not required. This
reduces time to open IndexReaders and they use less memory when
random access is not required, e.g., when merging segments. The
term index is now read into memory lazily at the first
random-access. (Doug Cutting)
9. Optimize IndexWriter.addIndexes(Directory[]) when the number of
added indexes is larger than mergeFactor. Previously this could
result in quadratic performance. Now performance is n log(n).
(Doug Cutting)
10. Speed up the creation of TermEnum for indices with multiple
segments and deleted documents, and thus speed up PrefixQuery,
RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter,
and sorting the first time on a field.
(Yonik Seeley, LUCENE-454)
11. Optimized and generalized 32 bit floating point to byte
(custom 8 bit floating point) conversions. Increased the speed of
Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM.
(Yonik Seeley, LUCENE-467)
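Optimization 11 above refers to packing a 32 bit float into a custom 8 bit float. A minimal sketch of one such encoding follows. The layout used here (3 mantissa bits, 5 exponent bits, exponent zero point 15) and the names MiniFloatSketch, encode and decode are assumptions made for illustration, not a statement of what Similarity.encodeNorm() does internally.

```java
// Illustrative sketch of an 8 bit float; layout and names are assumptions.
final class MiniFloatSketch {

    // 63 is half of the 8 bit IEEE exponent bias (127 >> 1); 15 is the assumed zero point.
    private static final int FLOOR = (63 - 15) << 3;

    /** Encodes a float into one byte by keeping 5 exponent bits and 3 mantissa bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);      // drop the low 21 mantissa bits
        if (small <= FLOOR) {
            // Zero and negatives map to 0; tiny positive values map to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= FLOOR + 0x100) {
            return (byte) -1;              // overflow: clamp to the largest code (0xFF)
        }
        return (byte) (small - FLOOR);
    }

    /** Decodes a byte produced by encode() back into a float. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(1.0f)));
        System.out.println(decode(encode(0.3f)));
    }
}
```

Because the conversion works directly on the IEEE bit pattern with shifts and compares, it avoids floating point arithmetic entirely. Values that are not exactly representable are truncated toward zero.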
Infrastructure
1. Lucene's source code repository has converted from CVS to
Subversion. The new repository is at
http://svn.apache.org/repos/asf/lucene/java/trunk
2. Lucene's issue tracker has migrated from Bugzilla to JIRA.
Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE
The old issues are still available at
http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx
(use the bug number instead of xxxx)
1.4.3
1. The JSP demo page (src/jsp/results.jsp) now properly escapes error
messages which might contain user input (e.g. error messages about
query parsing). If you used that page as a starting point for your
own code please make sure your code also properly escapes HTML
characters from user input in order to avoid so-called cross site
scripting attacks. (Daniel Naber)
2. QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
API is supported again. (Christoph)
1.4.2
1. Fixed bug #31241: Sorting could lead to incorrect results (documents
missing, others duplicated) if the sort keys were not unique and there
were more than 100 matches. (Daniel Naber)
2. Memory leak in Sort code (bug #31240) eliminated.
(Rafal Krzewski via Christoph and Daniel)
3. FuzzyQuery now takes an additional parameter that specifies the
minimum similarity that is required for a term to match the query.
The QueryParser syntax for this is term~x, where x is a floating
point number >= 0 and < 1 (a bigger number means that a higher
similarity is required). Furthermore, a prefix can be specified
for FuzzyQuerys so that only those terms are considered similar that
start with this prefix. This can speed up FuzzyQuery greatly.
(Daniel Naber, Christoph Goller)
4. PhraseQuery and PhrasePrefixQuery now allow the explicit specification
of relative positions. (Christoph Goller)
5. QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
(patch #9110); some unused method parameters removed; The ability
to specify a minimum similarity for FuzzyQuery has been added.
(Christoph Goller)
6. IndexSearcher optimization: a new ScoreDoc is no longer allocated
for every non-zero-scoring hit. This makes 'OR' queries that
contain common terms substantially faster. (cutting)
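The minimum similarity and prefix parameters described in item 3 above can be sketched as follows. The normalization used here (one minus the edit distance divided by the length of the shorter string) and all names (FuzzySimilaritySketch, similarity, matches) are assumptions for illustration; the actual FuzzyQuery scoring may differ in detail.

```java
// Illustrative sketch only; names and the similarity formula are assumptions.
final class FuzzySimilaritySketch {

    /** Classic Levenshtein edit distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in the range (-inf, 1]; 1 means identical strings. */
    static float similarity(String query, String term) {
        int shorter = Math.min(query.length(), term.length());
        if (shorter == 0) {
            return query.length() == term.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) editDistance(query, term) / shorter;
    }

    /** A term matches if it shares the required prefix and is similar enough. */
    static boolean matches(String query, String term, float minSimilarity, int prefixLength) {
        if (query.length() < prefixLength || term.length() < prefixLength) {
            return false;
        }
        // Cheap prefix check first: terms failing it never reach the edit distance step.
        if (!term.regionMatches(0, query, 0, prefixLength)) {
            return false;
        }
        return similarity(query, term) > minSimilarity;
    }

    public static void main(String[] args) {
        System.out.println(matches("lucene", "lucine", 0.5f, 0));
        System.out.println(matches("lucene", "bucene", 0.5f, 1));
    }
}
```

The prefix check illustrates why a non-zero prefix can speed up fuzzy matching: most candidate terms are rejected before any edit distance is computed.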
1.4.1
1. Fixed a performance bug in hit sorting code, where values were not
correctly cached. (Aviran via cutting)
2. Fixed errors in file format documentation. (Daniel Naber)
1.4 final
1. Added "an" to the list of stop words in StopAnalyzer, to complement
the existing "a" there. Fix for bug 28960
(http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)
2. Added new class FieldCache to manage in-memory caches of field term
values. (Tim Jones)
3. Added overloaded getFieldQuery method to QueryParser which
accepts the slop factor specified for the phrase (or the default
phrase slop for the QueryParser instance). This allows overriding
methods to replace a PhraseQuery with a SpanNearQuery instead,
keeping the proper slop factor. (Erik Hatcher)
4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
UTF-8 and changed the build encoding to UTF-8, to make changed files
compile. (Otis Gospodnetic)
5. Removed synchronization from term lookup under IndexReader methods
termFreq(), termDocs() or termPositions() to improve
multi-threaded performance. (cutting)
6. Fix a bug where obsolete segment files were not deleted on Win32.
1.4 RC3
1. Fixed several search bugs introduced by the skipTo() changes in
release 1.4RC1. The index file format was changed a bit, so
collections must be re-indexed to take advantage of the skipTo()
optimizations. (Christoph Goller)
2. Added new Document methods, removeField() and removeFields().
(Christoph Goller)
3. Fixed inconsistencies with index closing. Indexes and directories
are now only closed automatically by Lucene when Lucene opened
them automatically. (Christoph Goller)
4. Added new class: FilteredQuery. (Tim Jones)
5. Added a new SortField type for custom comparators. (Tim Jones)
6. Lock obtain timed out message now displays the full path to the lock
file. (Daniel Naber via Erik)
7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)
8. Fixed so that FSDirectory's locks still work when the
java.io.tmpdir system property is null. (cutting)
9. Changed FilteredTermEnum's constructor to take no parameters,
as the parameters were ignored anyway (bug #28858)
1.4 RC2
1. GermanAnalyzer now throws an exception if the stopword file
cannot be found (bug #27987). It now uses LowerCaseFilter
(bug #18410) (Daniel Naber via Otis, Erik)
2. Fixed a few bugs in the file format documentation. (cutting)
1.4 RC1
1. Changed the format of the .tis file, so that:
- it has a format version number, which makes it easier to
back-compatibly change file formats in the future.
- the term count is now stored as a long. This was the one aspect
of Lucene's file formats which limited index size.
- a few internal index parameters are now stored in the index, so
that they can (in theory) now be changed from index to index,
although there is not yet an API to do so.
These changes are back compatible. The new code can read old
indexes. But old code will not be able to read new indexes. (cutting)
2. Added an optimized implementation of TermDocs.skipTo(). A skip
table is now stored for each term in the .frq file. This only
adds a percent or two to overall index size, but can substantially
speed up many searches. (cutting)
3. Restructured the Scorer API and all Scorer implementations to take
advantage of an optimized TermDocs.skipTo() implementation. In
particular, PhraseQuerys and conjunctive BooleanQuerys are
faster when one clause has substantially fewer matches than the
others. (A conjunctive BooleanQuery is a BooleanQuery where all
clauses are required.) (cutting)
4. Added new class ParallelMultiSearcher. Combined with
RemoteSearchable this makes it easy to implement distributed
search systems. (Jean-Francois Halleux via cutting)
5. Added support for hit sorting. Results may now be sorted by any
indexed field. For details see the javadoc for
Searcher#search(Query, Sort). (Tim Jones via Cutting)
6. Changed FSDirectory to auto-create a full directory tree that it
needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis)
7. Added a new span-based query API. This implements, among other
things, nested phrases. See javadocs for details. (Doug Cutting)
8. Added new method Query.getSimilarity(Searcher), and changed
scorers to use it. This permits one to subclass a Query class so
that it can specify its own Similarity implementation, perhaps
one that delegates through that of the Searcher. (Julien Nioche
via Cutting)
9. Added MultiReader, an IndexReader that combines multiple other
IndexReaders. (Cutting)
10. Added support for term vectors. See Field#isTermVectorStored().
(Grant Ingersoll, Cutting & Dmitry)
11. Fixed the old bug with escaping of special characters in query
strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
(Jean-Francois Halleux via Otis)
12. Added support for overriding default values for the following,
using system properties:
- default commit lock timeout
- default maxFieldLength
- default maxMergeDocs
- default mergeFactor
- default minMergeDocs
- default write lock timeout
(Otis)
13. Changed QueryParser.jj to allow '-' and '+' within tokens:
http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
(Morus Walter via Otis)
14. Changed so that the compound index format is used by default.
This makes indexing a bit slower, but vastly reduces the chances
of file handle problems. (Cutting)
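Items 2 and 3 above explain why skipTo() helps conjunctive queries when one clause has far fewer matches than the others. The idea can be sketched with two sorted lists of document ids. Everything here is hypothetical illustration (the SkipToSketch and Postings names, and binary search standing in for the on-disk skip table); it is not the Scorer or TermDocs API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not Lucene's Scorer or TermDocs API.
final class SkipToSketch {

    /** Minimal cursor over a sorted array of document ids. */
    static final class Postings {
        private final int[] docs;
        private int pos = -1;

        Postings(int... docs) {
            this.docs = docs;
        }

        /** Moves to the next document; returns false when exhausted. */
        boolean next() {
            return ++pos < docs.length;
        }

        /** Advances to the first doc >= target; binary search stands in for a skip table. */
        boolean skipTo(int target) {
            int lo = Math.max(pos, 0);
            int hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            pos = lo;
            return pos < docs.length;
        }

        int doc() {
            return docs[pos];
        }
    }

    /** Intersects two postings lists, letting the list that is behind jump ahead. */
    static List<Integer> intersect(Postings a, Postings b) {
        List<Integer> hits = new ArrayList<>();
        if (!a.next() || !b.next()) {
            return hits;
        }
        while (true) {
            if (a.doc() == b.doc()) {
                hits.add(a.doc());
                if (!a.next() || !b.next()) {
                    return hits;
                }
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) {
                    return hits;
                }
            } else {
                if (!b.skipTo(a.doc())) {
                    return hits;
                }
            }
        }
    }

    public static void main(String[] args) {
        Postings rare = new Postings(3, 1000);
        Postings common = new Postings(1, 2, 3, 4, 5, 1000, 2000);
        System.out.println(intersect(rare, common));
    }
}
```

With a rare clause driving the iteration, the common clause is visited only near the rare clause's documents instead of being scanned entry by entry.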
1.3 final
1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
throw ParseException instead. (Erik Hatcher)
2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)
3. Added a new method IndexReader.setNorm(), that permits one to
alter the boosting of fields after an index is created.
4. Distinguish between the final position and length when indexing a
field. The length is now defined as the total number of tokens,
instead of the final position, as it was previously. Length is
used for score normalization (Similarity.lengthNorm()) and for
controlling memory usage (IndexWriter.maxFieldLength). In both of
these cases, the total number of tokens is a better value to use
than the final token position. Position is used in phrase
searching (see PhraseQuery and Token.setPositionIncrement()).
5. Fix StandardTokenizer's handling of CJK characters (Chinese,
Japanese and Korean ideograms). Previously contiguous sequences
were combined in a single token, which is not very useful. Now
each ideogram generates a separate token, which is more useful.
1.3 RC3
1. Added minMergeDocs in IndexWriter. This can be raised to speed
indexing without altering the number of files, but only using more
memory. (Julien Nioche via Otis)
2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)
3. Fix bug #16952, in demo HTML parser, skip comments in
javascript. (Christoph Goller)
4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
output (Daniel Naber via Christoph Goller)
5. Fix bug #24301, in demo HTML parser, long titles no longer
hang things. (Christoph Goller)
6. Fix bug #23534, Replace use of file timestamp of segments file
with an index version number stored in the segments file. This
resolves problems when running on file systems with low-resolution
timestamps, e.g., HFS under MacOS X. (Christoph Goller)
7. Fix QueryParser so that TokenMgrError is not thrown, only
ParseException. (Erik Hatcher)
8. Fix some bugs introduced by change 11 of RC2. (Christoph Goller)
9. Fixed a problem compiling TestRussianStem. (Christoph Goller)
10. Cleaned up some build stuff. (Erik Hatcher)
1.3 RC2
1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
SegmentsReader. (Julien Nioche via otis)
2. Changed file locking to place lock files in
System.getProperty("java.io.tmpdir"), where all users are
permitted to write files. This way folks can open and correctly
lock indexes which are read-only to them.
3. IndexWriter: added a new method, addDocument(Document, Analyzer),
permitting one to easily use different analyzers for different
documents in the same index.
4. Minor enhancements to FuzzyTermEnum.
(Christoph Goller via Otis)
5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
and MultiIndexSearcher to use it.
(Christoph Goller via Otis)
6. Fixed a bug in IndexWriter that returned incorrect docCount().
(Christoph Goller via Otis)
7. Fixed SegmentsReader to eliminate the confusing and slightly different
behaviour of TermEnum when dealing with an enumeration of all terms,
versus an enumeration starting from a specific term.
This patch also fixes incorrect term document frequencies when the same term
is present in multiple segments.
(Christoph Goller via Otis)
8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)
9. Added support for the new "compound file" index format (Dmitry
Serebrennikov)
10. Added Locale setting to QueryParser, for use by date range parsing.
11. Changed IndexReader so that it can be subclassed by classes
outside of its package. Previously it had package-private
abstract methods. Also modified the index merging code so that it
can work on an arbitrary IndexReader implementation, and added a
new method, IndexWriter.addIndexes(IndexReader[]), to take
advantage of this. (cutting)
12. Added a limit to the number of clauses which may be added to a
BooleanQuery. The default limit is 1024 clauses. This should
stop most OutOfMemoryErrors caused by prefix, wildcard and fuzzy
queries which run amok. (cutting)
13. Add new method: IndexReader.undeleteAll(). This undeletes all
deleted documents which still remain in the index. (cutting)
1.3 RC1
1. Fixed PriorityQueue's clear() method.
Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
(Matthijs Bomhoff via otis)
2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
(Dale Anson via otis)
3. Added the ability to disable lock creation by using disableLuceneLocks
system property. This is useful for read-only media, such as CD-ROMs.
(otis)
4. Added id method to Hits to be able to access the index global id.
Required for sorting options.
(carlson)
5. Added support for new range query syntax to QueryParser.jj.
(briangoetz)
6. Added the ability to retrieve HTML documents' META tag values to
HTMLParser.jj.
(Mark Harwood via otis)
7. Modified QueryParser to make it possible to programmatically specify the
default Boolean operator (OR or AND).
(Péter Halácsy via otis)
8. Made many search methods and classes non-final, per requests.
This includes IndexWriter and IndexSearcher, among others.
(cutting)
9. Added class RemoteSearchable, providing support for remote
searching via RMI. The test class RemoteSearchableTest.java
provides an example of how this can be used. (cutting)
10. Added PhrasePrefixQuery (and supporting MultipleTermPositions). The
test class TestPhrasePrefixQuery provides the usage example.
(Anders Nielsen via otis)
11. Changed the German stemming algorithm to ignore case while
stripping. The new algorithm is faster and produces more equal
stems from nouns and verbs derived from the same word.
(gschwarz)
12. Added support for boosting the score of documents and fields via
the new methods Document.setBoost(float) and Field.setBoost(float).
Note: This changes the encoding of an indexed value. Indexes
should be re-created from scratch in order for search scores to
be correct. With the new code and an old index, searches will
yield very large scores for shorter fields, and very small scores
for longer fields. Once the index is re-created, scores will be
as before. (cutting)
13. Added new method Token.setPositionIncrement().
This permits, for the purpose of phrase searching, placing
multiple terms in a single position. This is useful with
stemmers that produce multiple possible stems for a word.
This also permits the introduction of gaps between terms, so that
terms which are adjacent in a token stream will not be matched by
an exact phrase query. This makes it possible, e.g., to build
an analyzer where phrases are not matched over stop words which
have been removed.
Finally, repeating a token with an increment of zero can also be
used to boost scores of matches on that token. (cutting)
14. Added new Filter class, QueryFilter. This constrains search
results to only match those which also match a provided query.
Results are cached, so that searches after the first on the same
index using this filter are very fast.
This could be used, for example, with a RangeQuery on a formatted
date field to implement date filtering. One could re-use a
single QueryFilter that matches, e.g., only documents modified
within the last week. The QueryFilter and RangeQuery would only
need to be reconstructed once per day. (cutting)
15. Added a new IndexWriter method, getAnalyzer(). This returns the
analyzer used when adding documents to this index. (cutting)
16. Fixed a bug with IndexReader.lastModified(). Before, document
deletion did not update this. Now it does. (cutting)
17. Added Russian Analyzer.
(Boris Okner via otis)
18. Added a public, extensible scoring API. For details, see the
javadoc for org.apache.lucene.search.Similarity.
19. Fixed return of Hits.id() from float to int. (Terry Steichen via Peter).
20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
(Peter Mularien via otis)
21. Added getFields(String) and getValues(String) methods.
Contributed by Rasik Pandey on 2002-10-09
(Rasik Pandey via otis)
22. Revised internal search APIs. Changes include:
a. Queries are no longer modified during a search. This makes
it possible, e.g., to reuse the same query instance with
multiple indexes from multiple threads.
b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
etc.) now work correctly with MultiSearcher, fixing bugs 12619
and 12667.
c. Boosting BooleanQuery's now works, and is supported by the
query parser (problem reported by Lee Mallabone). Thus a query
like "(+foo +bar)^2 +baz" is now supported and equivalent to
"(+foo^2 +bar^2) +baz".
d. New method: Query.rewrite(IndexReader). This permits a
query to re-write itself as an alternate, more primitive query.
Most of the term-expanding query classes (PrefixQuery,
WildcardQuery, etc.) are now implemented using this method.
e. New method: Searchable.explain(Query q, int doc). This
returns an Explanation instance that describes how a particular
document is scored against a query. An explanation can be
displayed as either plain text, with the toString() method, or
as HTML, with the toHtml() method. Note that computing an
explanation is as expensive as executing the query over the
entire index. This is intended to be used in developing
Similarity implementations, and, for good performance, should
not be displayed with every hit.
f. Scorer and Weight are public, not package protected. It is now
possible for someone to write a Scorer implementation that is
not in the org.apache.lucene.search package. This is still
fairly advanced programming, and I don't expect anyone to do
this anytime soon, but at least now it is possible.
g. Added public accessors to the primitive query classes
(TermQuery, PhraseQuery and BooleanQuery), permitting access to
their terms and clauses.
Caution: These are extensive changes and they have not yet been
tested extensively. Bug reports are appreciated.
(cutting)
23. Added convenience RAMDirectory constructors taking File and String
arguments, for easy FSDirectory to RAMDirectory conversion.
(otis)
24. Added code for manual renaming of files in FSDirectory, since it
has been reported that java.io.File's renameTo(File) method sometimes
fails on Windows JVMs.
(Matt Tucker via otis)
25. Refactored QueryParser to make it easier for people to extend it.
Added the ability to automatically lower-case Wildcard terms in
the QueryParser.
(Tatu Saloranta via otis)
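The effect of position increments described in item 13 above can be sketched without any Lucene classes. The names below (PositionIncrementSketch, positions, exactPhrase) are hypothetical, and the convention that the first token with an increment of 1 lands at position 0 is an assumption made for this example.

```java
// Illustrative sketch only; it models positions, not Lucene's Token or PhraseQuery.
final class PositionIncrementSketch {

    /** Converts per-token position increments into absolute positions. */
    static int[] positions(int[] increments) {
        int[] out = new int[increments.length];
        int position = -1; // so that a first increment of 1 yields position 0
        for (int i = 0; i < increments.length; i++) {
            position += increments[i];
            out[i] = position;
        }
        return out;
    }

    /** True if some occurrence of second sits exactly one position after first. */
    static boolean exactPhrase(String[] terms, int[] positions, String first, String second) {
        for (int i = 0; i < terms.length; i++) {
            if (!terms[i].equals(first)) {
                continue;
            }
            for (int j = 0; j < terms.length; j++) {
                if (terms[j].equals(second) && positions[j] == positions[i] + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "king of spain" with the stop word "of" removed.
        String[] terms = {"king", "spain"};
        // An increment of 2 on "spain" leaves a gap where "of" was.
        System.out.println(exactPhrase(terms, positions(new int[] {1, 2}), "king", "spain"));
        // Without the gap, the two terms look adjacent.
        System.out.println(exactPhrase(terms, positions(new int[] {1, 1}), "king", "spain"));
    }
}
```

An increment of 0 places a token at the same position as the previous one, which is how multiple stems of one word can share a position.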
1.2 RC6
1. Changed QueryParser.jj to treat "?" as a special character, which
allows it to be used as a wildcard term. Updated TestWildcard
unit test also. (Ralf Hettesheimer via carlson)
1.2 RC5
1. Renamed build.properties to default.properties and updated
the BUILD.txt document to describe how to override the
default.properties settings without having to edit the file. This
brings the build process closer to Scarab's build process.
(jon)
2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)
3. Updated "powered by" links. (otis)
4. Fixed instruction for setting up JavaCC - Bug #7017 (otis)
5. Added throwing exception if FSDirectory could not create directory
- Bug #6914 (Eugene Gluzberg via otis)
6. Update MultiSearcher, MultiFieldParse, Constants, DateFilter,
LowerCaseTokenizer javadoc (otis)
7. Added fix to avoid NullPointerException in results.jsp
(Mark Hayes via otis)
8. Changed Wildcard search to find 0 or more char instead of 1 or more
(Lee Mallabone, via otis)
9. Fixed error in offset issue in GermanStemFilter - Bug #7412
(Rodrigo Reyes, via otis)
10. Added unit tests for wildcard search and DateFilter (otis)
11. Allow co-existence of indexed and non-indexed fields with the same name
(cutting/casper, via otis)
12. Add escape character to query parser.
(briangoetz)
13. Applied a patch that ensures that searches that use DateFilter
don't throw an exception when no matches are found. (David Smiley, via
otis)
14. Fixed bugs in DateFilter and wildcardquery unit tests. (cutting, otis, carlson)
1.2 RC4
1. Updated contributions section of website.
Add XML Document #3 implementation to Document Section.
Also added Term Highlighting to Misc Section. (carlson)
2. Fixed NullPointerException for phrase searches containing
unindexed terms, introduced in 1.2RC3. (cutting)
3. Changed document deletion code to obtain the index write lock,
enforcing the fact that document addition and deletion cannot be
performed concurrently. (cutting)
4. Various documentation cleanups. (otis, acoliver)
5. Updated "powered by" links. (cutting, jon)
6. Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis)
7. Changed Term and Query to implement Serializable. (scottganyo)
8. Fixed to never delete indexes added with IndexWriter.addIndexes().
(cutting)
9. Upgraded to JUnit 3.7. (otis)
1.2 RC3
1. IndexWriter: fixed a bug where adding an optimized index to an
empty index failed. This was encountered using addIndexes to copy
a RAMDirectory index to an FSDirectory.
2. RAMDirectory: fixed a bug where RAMInputStream could not read
across more than a single buffer boundary.
3. Fix query parser so it accepts queries with unicode characters.
(briangoetz)
4. Fix query parser so that PrefixQuery is used in preference to
WildcardQuery when there's only an asterisk at the end of the
term. Previously PrefixQuery would never be used.
5. Fix tests so they compile; fix ant file so it compiles tests
properly. Added test cases for Analyzers and PriorityQueue.
6. Updated demos, added Getting Started documentation. (acoliver)
7. Added 'contributions' section to website & docs. (carlson)
8. Removed JavaCC from source distribution for copyright reasons.
Folks must now download this separately from metamata in order to
compile Lucene. (cutting)
9. Substantially improved the performance of DateFilter by adding the
ability to reuse TermDocs objects. (cutting)
10. Added IndexReader methods:
public static boolean indexExists(String directory);
public static boolean indexExists(File directory);
public static boolean indexExists(Directory directory);
public static boolean isLocked(Directory directory);
public static void unlock(Directory directory);
(cutting, otis)
11. Fixed bugs in GermanAnalyzer (gschwarz)
1.2 RC2
- added sources to distribution
- removed broken build scripts and libraries from distribution
- SegmentsReader: fixed potential race condition
- FSDirectory: fixed so that getDirectory(xxx,true) correctly
erases the directory contents, even when the directory
has already been accessed in this JVM.
- RangeQuery: Fix issue where an inclusive range query would
include the nearest term in the index above a non-existent
specified upper term.
- SegmentTermEnum: Fix NullPointerException in clone() method
when the Term is null.
- JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
since they rely on a feature added in JDK 1.2.
1.2 RC1
- first Apache release
- packages renamed from com.lucene to org.apache.lucene
- license switched from LGPL to Apache
- ant-only build -- no more makefiles
- addition of lock files--now fully thread & process safe
- addition of German stemmer
- MultiSearcher now supports low-level search API
- added RangeQuery, for term-range searching
- Analyzers can choose tokenizer based on field name
- misc bug fixes.
1.01b
. last Sourceforge release
. a few bug fixes
. new Query Parser
. new prefix query (search for "foo*" matches "food")
1.0
This release fixes a few serious bugs and also includes some
performance optimizations, a stemmer, and a few other minor
enhancements.
0.04
Lucene now includes a grammar-based tokenizer, StandardTokenizer.
The only tokenizer included in the previous release (LetterTokenizer)
identified terms consisting entirely of alphabetic characters. The
new tokenizer uses a regular-expression grammar to identify more
complex classes of terms, including numbers, acronyms, email
addresses, etc.
StandardTokenizer serves two purposes:
1. It is a much better, general purpose tokenizer for use by
applications as is.
The easiest way for applications to start using
StandardTokenizer is to use StandardAnalyzer.
2. It provides a good example of grammar-based tokenization.
If an application has special tokenization requirements, it can
implement a custom tokenizer by copying the directory containing
the new tokenizer into the application and modifying it
accordingly.
0.01
First open source release.
The code has been re-organized into a new package and directory
structure for this release. It builds OK, but has not been tested
beyond that since the re-organization.