# cudf 24.06.00 (5 Jun 2024)
## 🚨 Breaking Changes
- Deprecate `Groupby.collect` ([#15808](https://github.com/rapidsai/cudf/pull/15808)) [@galipremsagar](https://github.com/galipremsagar)
- Raise FileNotFoundError when a literal JSON string that looks like a json filename is passed ([#15806](https://github.com/rapidsai/cudf/pull/15806)) [@lithomas1](https://github.com/lithomas1)
- Support filtered I/O in `chunked_parquet_reader` and simplify the use of `parquet_reader_options` ([#15764](https://github.com/rapidsai/cudf/pull/15764)) [@mhaseeb123](https://github.com/mhaseeb123)
- Raise errors for unsupported operations on certain types ([#15712](https://github.com/rapidsai/cudf/pull/15712)) [@galipremsagar](https://github.com/galipremsagar)
- Support `DurationType` in cudf parquet reader via `arrow:schema` ([#15617](https://github.com/rapidsai/cudf/pull/15617)) [@mhaseeb123](https://github.com/mhaseeb123)
- Remove protobuf and use parsed ORC statistics from libcudf ([#15564](https://github.com/rapidsai/cudf/pull/15564)) [@bdice](https://github.com/bdice)
- Remove legacy JSON reader from Python ([#15538](https://github.com/rapidsai/cudf/pull/15538)) [@bdice](https://github.com/bdice)
- Removing all batching code from parquet writer ([#15528](https://github.com/rapidsai/cudf/pull/15528)) [@mhaseeb123](https://github.com/mhaseeb123)
- Convert libcudf resource parameters to rmm::device_async_resource_ref ([#15507](https://github.com/rapidsai/cudf/pull/15507)) [@harrism](https://github.com/harrism)
- Remove deprecated strings offsets_begin ([#15454](https://github.com/rapidsai/cudf/pull/15454)) [@davidwendt](https://github.com/davidwendt)
- Floating <--> fixed-point conversion must now be called explicitly ([#15438](https://github.com/rapidsai/cudf/pull/15438)) [@pmattione-nvidia](https://github.com/pmattione-nvidia)
- Bind `read_parquet_metadata` API to libcudf instead of pyarrow and extract `RowGroup` information ([#15398](https://github.com/rapidsai/cudf/pull/15398)) [@mhaseeb123](https://github.com/mhaseeb123)
- Remove deprecated hash() and spark_murmurhash3_x86_32() ([#15375](https://github.com/rapidsai/cudf/pull/15375)) [@davidwendt](https://github.com/davidwendt)
- Remove empty elements from exploded character-ngrams output ([#15371](https://github.com/rapidsai/cudf/pull/15371)) [@davidwendt](https://github.com/davidwendt)
- [FEA] Performance improvement for mixed left semi/anti join ([#15288](https://github.com/rapidsai/cudf/pull/15288)) [@tgujar](https://github.com/tgujar)
- Align date_range defaults with pandas, support tz ([#15139](https://github.com/rapidsai/cudf/pull/15139)) [@mroeschke](https://github.com/mroeschke)
## 🐛 Bug Fixes
- Revert "Fix docs for IO readers and strings_convert" ([#15872](https://github.com/rapidsai/cudf/pull/15872)) [@vyasr](https://github.com/vyasr)
- Remove problematic call of index setter to unblock dask-cuda CI ([#15844](https://github.com/rapidsai/cudf/pull/15844)) [@charlesbluca](https://github.com/charlesbluca)
- Use rapids_cpm_nvtx3 to get same nvtx3 target state as rmm ([#15840](https://github.com/rapidsai/cudf/pull/15840)) [@robertmaynard](https://github.com/robertmaynard)
- Return boolean from config_host_memory_resource instead of throwing ([#15815](https://github.com/rapidsai/cudf/pull/15815)) [@abellina](https://github.com/abellina)
- Add temporary dask-cudf workaround for categorical sorting ([#15801](https://github.com/rapidsai/cudf/pull/15801)) [@rjzamora](https://github.com/rjzamora)
- Fix row group alignment in ORC writer ([#15789](https://github.com/rapidsai/cudf/pull/15789)) [@vuule](https://github.com/vuule)
- Raise error when sorting by categorical column in dask-cudf ([#15788](https://github.com/rapidsai/cudf/pull/15788)) [@rjzamora](https://github.com/rjzamora)
- Upgrade `arrow` to 16.1 ([#15787](https://github.com/rapidsai/cudf/pull/15787)) [@galipremsagar](https://github.com/galipremsagar)
- Add support for `PandasArray` for `pandas<2.1.0` ([#15786](https://github.com/rapidsai/cudf/pull/15786)) [@galipremsagar](https://github.com/galipremsagar)
- Limit runtime dependency to `libarrow>=16.0.0,<16.1.0a0` ([#15782](https://github.com/rapidsai/cudf/pull/15782)) [@pentschev](https://github.com/pentschev)
- Fix cat.as_ordered not propagating correct size ([#15780](https://github.com/rapidsai/cudf/pull/15780)) [@mroeschke](https://github.com/mroeschke)
- Handle mixed-like homogeneous types in `isin` ([#15771](https://github.com/rapidsai/cudf/pull/15771)) [@galipremsagar](https://github.com/galipremsagar)
- Fix id_vars and value_vars not accepting string scalars in melt ([#15765](https://github.com/rapidsai/cudf/pull/15765)) [@mroeschke](https://github.com/mroeschke)
- Fix `DatetimeIndex.loc` for all types of ordering cases ([#15761](https://github.com/rapidsai/cudf/pull/15761)) [@galipremsagar](https://github.com/galipremsagar)
- Fix arrow versioning logic ([#15755](https://github.com/rapidsai/cudf/pull/15755)) [@vyasr](https://github.com/vyasr)
- Avoid running sanitizer on Java test designed to cause an error ([#15753](https://github.com/rapidsai/cudf/pull/15753)) [@jlowe](https://github.com/jlowe)
- Handle empty dataframe object with index present in setitem of `loc` ([#15752](https://github.com/rapidsai/cudf/pull/15752)) [@galipremsagar](https://github.com/galipremsagar)
- Eliminate circular reference in DataFrame/Series.iloc/loc ([#15749](https://github.com/rapidsai/cudf/pull/15749)) [@mroeschke](https://github.com/mroeschke)
- Cap the absolute row index per pass in parquet chunked reader. ([#15735](https://github.com/rapidsai/cudf/pull/15735)) [@nvdbaranec](https://github.com/nvdbaranec)
- Fix `Index.repeat` for `datetime64` types ([#15722](https://github.com/rapidsai/cudf/pull/15722)) [@galipremsagar](https://github.com/galipremsagar)
- Fix multibyte check for case convert for large strings ([#15721](https://github.com/rapidsai/cudf/pull/15721)) [@davidwendt](https://github.com/davidwendt)
- Fix `get_loc` to properly fetch results from an index that is in decreasing order ([#15719](https://github.com/rapidsai/cudf/pull/15719)) [@galipremsagar](https://github.com/galipremsagar)
- Return same type as the original index for `.loc` operations ([#15717](https://github.com/rapidsai/cudf/pull/15717)) [@galipremsagar](https://github.com/galipremsagar)
- Correct static builds + static arrow ([#15715](https://github.com/rapidsai/cudf/pull/15715)) [@robertmaynard](https://github.com/robertmaynard)
- Raise errors for unsupported operations on certain types ([#15712](https://github.com/rapidsai/cudf/pull/15712)) [@galipremsagar](https://github.com/galipremsagar)
- Fix ColumnAccessor caching of nrows if empty previously ([#15710](https://github.com/rapidsai/cudf/pull/15710)) [@mroeschke](https://github.com/mroeschke)
- Allow `None` when `nan_as_null=False` in column constructor ([#15709](https://github.com/rapidsai/cudf/pull/15709)) [@galipremsagar](https://github.com/galipremsagar)
- Refine `CudaTest.testCudaException` to handle the wrong type of CudaError being thrown under aarch64 ([#15706](https://github.com/rapidsai/cudf/pull/15706)) [@sperlingxx](https://github.com/sperlingxx)
- Fix maxima of categorical column ([#15701](https://github.com/rapidsai/cudf/pull/15701)) [@rjzamora](https://github.com/rjzamora)
- Add proxy for inplace operations in `cudf.pandas` ([#15695](https://github.com/rapidsai/cudf/pull/15695)) [@galipremsagar](https://github.com/galipremsagar)
- Make `nan_as_null` behavior consistent across all APIs ([#15692](https://github.com/rapidsai/cudf/pull/15692)) [@galipremsagar](https://github.com/galipremsagar)
- Fix CI s3 api command to fetch latest results ([#15687](https://github.com/rapidsai/cudf/pull/15687)) [@galipremsagar](https://github.com/galipremsagar)
- Add `NumpyExtensionArray` proxy type in `cudf.pandas` ([#15686](https://github.com/rapidsai/cudf/pull/15686)) [@galipremsagar](https://github.com/galipremsagar)
- Properly implement binaryops for proxy types ([#15684](https://github.com/rapidsai/cudf/pull/15684)) [@galipremsagar](https://github.com/galipremsagar)
- Fix copy assignment and the comparison operator of `rmm_host_allocator` ([#15677](https://github.com/rapidsai/cudf/pull/15677)) [@vuule](https://github.com/vuule)
- Fix multi-source reading in JSON byte range reader ([#15671](https://github.com/rapidsai/cudf/pull/15671)) [@shrshi](https://github.com/shrshi)
- Return `int64` when pandas compatible mode is turned on for `get_indexer` ([#15659](https://github.com/rapidsai/cudf/pull/15659)) [@galipremsagar](https://github.com/galipremsagar)
- Fix Index contains for error validations and float vs int comparisons ([#15657](https://github.com/rapidsai/cudf/pull/15657)) [@galipremsagar](https://github.com/galipremsagar)
- Preserve sub-second data for time scalars in column construction ([#15655](https://github.com/rapidsai/cudf/pull/15655)) [@galipremsagar](https://github.com/galipremsagar)
- Check row limit size in cudf::strings::join_strings ([#15643](https://github.com/rapidsai/cudf/pull/15643)) [@davidwendt](https://github.com/davidwendt)
- Enable sorting on column with nulls using query-planning ([#15639](https://github.com/rapidsai/cudf/pull/15639)) [@rjzamora](https://github.com/rjzamora)
- Fix operator precedence problem in Parquet reader ([#15638](https://github.com/rapidsai/cudf/pull/15638)) [@etseidl](https://github.com/etseidl)
- Fix decoding of dictionary encoded FIXED_LEN_BYTE_ARRAY data in Parquet reader ([#15601](https://github.com/rapidsai/cudf/pull/15601)) [@etseidl](https://github.com/etseidl)
- Fix debug warnings/errors in from_arrow_device_test.cpp ([#15596](https://github.com/rapidsai/cudf/pull/15596)) [@davidwendt](https://github.com/davidwendt)
- Add "collect" aggregation support to dask-cudf ([#15593](https://github.com/rapidsai/cudf/pull/15593)) [@rjzamora](https://github.com/rjzamora)
- Fix categorical-accessor support and testing in dask-cudf ([#15591](https://github.com/rapidsai/cudf/pull/15591)) [@rjzamora](https://github.com/rjzamora)
- Disable compute-sanitizer usage in CI tests with CUDA<11.6 ([#15584](https://github.com/rapidsai/cudf/pull/15584)) [@davidwendt](https://github.com/davidwendt)
- Preserve RangeIndex.step in to_arrow/from_arrow ([#15581](https://github.com/rapidsai/cudf/pull/15581)) [@mroeschke](https://github.com/mroeschke)
- Ignore new cupy warning ([#15574](https://github.com/rapidsai/cudf/pull/15574)) [@vyasr](https://github.com/vyasr)
- Add cuda-sanitizer-api dependency for test-cpp matrix 11.4 ([#15573](https://github.com/rapidsai/cudf/pull/15573)) [@davidwendt](https://github.com/davidwendt)
- Allow apply udf to reference global modules in cudf.pandas ([#15569](https://github.com/rapidsai/cudf/pull/15569)) [@mroeschke](https://github.com/mroeschke)
- Fix deprecation warnings for json legacy reader ([#15563](https://github.com/rapidsai/cudf/pull/15563)) [@davidwendt](https://github.com/davidwendt)
- Fix millisecond resampling in cudf Python ([#15560](https://github.com/rapidsai/cudf/pull/15560)) [@mroeschke](https://github.com/mroeschke)
- Rename JSON_READER_OPTION to JSON_READER_OPTION_NVBENCH. ([#15553](https://github.com/rapidsai/cudf/pull/15553)) [@bdice](https://github.com/bdice)
- Fix a JNI bug in JSON parsing fixup ([#15550](https://github.com/rapidsai/cudf/pull/15550)) [@revans2](https://github.com/revans2)
- Remove conda channel setup from wheel CI image script. ([#15539](https://github.com/rapidsai/cudf/pull/15539)) [@bdice](https://github.com/bdice)
- cudf.pandas: Series dt accessor is CombinedDatetimelikeProperties ([#15523](https://github.com/rapidsai/cudf/pull/15523)) [@wence-](https://github.com/wence-)
- Fix for some compiler warnings in parquet/page_decode.cuh ([#15518](https://github.com/rapidsai/cudf/pull/15518)) [@etseidl](https://github.com/etseidl)
- Fix exponent overflow in strings-to-double conversion ([#15517](https://github.com/rapidsai/cudf/pull/15517)) [@davidwendt](https://github.com/davidwendt)
- nanoarrow uses package override for proper pinned versions generation ([#15515](https://github.com/rapidsai/cudf/pull/15515)) [@robertmaynard](https://github.com/robertmaynard)
- Remove index name overrides in dask-cudf pyarrow table dispatch ([#15514](https://github.com/rapidsai/cudf/pull/15514)) [@charlesbluca](https://github.com/charlesbluca)
- Fix async synchronization issues in json_column.cu ([#15497](https://github.com/rapidsai/cudf/pull/15497)) [@karthikeyann](https://github.com/karthikeyann)
- Add new patch to hide more CCCL APIs ([#15493](https://github.com/rapidsai/cudf/pull/15493)) [@vyasr](https://github.com/vyasr)
- Make improvements in pandas-test reporting ([#15485](https://github.com/rapidsai/cudf/pull/15485)) [@galipremsagar](https://github.com/galipremsagar)
- Fixed page data truncation in parquet writer under certain conditions. ([#15474](https://github.com/rapidsai/cudf/pull/15474)) [@nvdbaranec](https://github.com/nvdbaranec)
- Only use data_type constructor with scale for decimal types ([#15472](https://github.com/rapidsai/cudf/pull/15472)) [@wence-](https://github.com/wence-)
- Avoid "p2p" shuffle as a default when `dask_cudf` is imported ([#15469](https://github.com/rapidsai/cudf/pull/15469)) [@rjzamora](https://github.com/rjzamora)
- Fix debug build errors from to_arrow_device_test.cpp ([#15463](https://github.com/rapidsai/cudf/pull/15463)) [@davidwendt](https://github.com/davidwendt)
- Fix base_normalator::integer_sizeof_fn integer dispatch ([#15457](https://github.com/rapidsai/cudf/pull/15457)) [@davidwendt](https://github.com/davidwendt)
- Allow consumers of static builds to find nanoarrow ([#15456](https://github.com/rapidsai/cudf/pull/15456)) [@robertmaynard](https://github.com/robertmaynard)
- Allow jit compilation when using a splayed CUDA toolkit ([#15451](https://github.com/rapidsai/cudf/pull/15451)) [@robertmaynard](https://github.com/robertmaynard)
- Handle case of scan aggregation in groupby-transform ([#15450](https://github.com/rapidsai/cudf/pull/15450)) [@wence-](https://github.com/wence-)
- Test static builds in CI and fix nanoarrow configure ([#15437](https://github.com/rapidsai/cudf/pull/15437)) [@vyasr](https://github.com/vyasr)
- Fix potential race in JSON parser when parsing JSON lines format and when recovering from invalid lines ([#15419](https://github.com/rapidsai/cudf/pull/15419)) [@elstehle](https://github.com/elstehle)
- Fix errors in chunked ORC writer when no tables were (successfully) written ([#15393](https://github.com/rapidsai/cudf/pull/15393)) [@vuule](https://github.com/vuule)
- Support implicit array conversion with query-planning enabled ([#15378](https://github.com/rapidsai/cudf/pull/15378)) [@rjzamora](https://github.com/rjzamora)
- Fix arrow-based round trip of empty dataframes ([#15373](https://github.com/rapidsai/cudf/pull/15373)) [@wence-](https://github.com/wence-)
- Remove empty elements from exploded character-ngrams output ([#15371](https://github.com/rapidsai/cudf/pull/15371)) [@davidwendt](https://github.com/davidwendt)
- Remove boundscheck=False setting in cython files ([#15362](https://github.com/rapidsai/cudf/pull/15362)) [@wence-](https://github.com/wence-)
- Patch dask-expr `var` logic in dask-cudf ([#15347](https://github.com/rapidsai/cudf/pull/15347)) [@rjzamora](https://github.com/rjzamora)
- Fix logical and syntactical errors in libcudf C++ examples ([#15346](https://github.com/rapidsai/cudf/pull/15346)) [@mhaseeb123](https://github.com/mhaseeb123)
- Disable dask-expr in docs builds. ([#15343](https://github.com/rapidsai/cudf/pull/15343)) [@bdice](https://github.com/bdice)
- Apply the cuFile error work around to data_sink as well ([#15335](https://github.com/rapidsai/cudf/pull/15335)) [@vuule](https://github.com/vuule)
- Fix parquet predicate filtering with column projection ([#15113](https://github.com/rapidsai/cudf/pull/15113)) [@karthikeyann](https://github.com/karthikeyann)
- Check column type equality, handling nested types correctly. ([#14531](https://github.com/rapidsai/cudf/pull/14531)) [@bdice](https://github.com/bdice)
## 📖 Documentation
- Fix docs for IO readers and strings_convert ([#15842](https://github.com/rapidsai/cudf/pull/15842)) [@bdice](https://github.com/bdice)
- Update cudf.pandas docs for GA ([#15744](https://github.com/rapidsai/cudf/pull/15744)) [@beckernick](https://github.com/beckernick)
- Add contributing warning about circular imports ([#15691](https://github.com/rapidsai/cudf/pull/15691)) [@er-eis](https://github.com/er-eis)
- Update libcudf developer guide for strings offsets column ([#15661](https://github.com/rapidsai/cudf/pull/15661)) [@davidwendt](https://github.com/davidwendt)
- Update developer guide with device_async_resource_ref guidelines ([#15562](https://github.com/rapidsai/cudf/pull/15562)) [@harrism](https://github.com/harrism)
- DOC: add pandas intersphinx mapping ([#15531](https://github.com/rapidsai/cudf/pull/15531)) [@raybellwaves](https://github.com/raybellwaves)
- rm-dup-doc in frame.py ([#15530](https://github.com/rapidsai/cudf/pull/15530)) [@raybellwaves](https://github.com/raybellwaves)
- Update CONTRIBUTING.md to use latest cuda env ([#15467](https://github.com/rapidsai/cudf/pull/15467)) [@raybellwaves](https://github.com/raybellwaves)
- Doc: interleave columns pandas compat ([#15383](https://github.com/rapidsai/cudf/pull/15383)) [@raybellwaves](https://github.com/raybellwaves)
- Simplified README Examples ([#15338](https://github.com/rapidsai/cudf/pull/15338)) [@wkaisertexas](https://github.com/wkaisertexas)
- Add debug tips section to libcudf developer guide ([#15329](https://github.com/rapidsai/cudf/pull/15329)) [@davidwendt](https://github.com/davidwendt)
- Fix and clarify notes on result ordering ([#13255](https://github.com/rapidsai/cudf/pull/13255)) [@shwina](https://github.com/shwina)
## 🚀 New Features
- Add JNI bindings for zstd compression of NVCOMP. ([#15729](https://github.com/rapidsai/cudf/pull/15729)) [@firestarman](https://github.com/firestarman)
- Fix spaces around CSV quoted strings ([#15727](https://github.com/rapidsai/cudf/pull/15727)) [@thabetx](https://github.com/thabetx)
- Add default pinned pool that falls back to new pinned allocations ([#15665](https://github.com/rapidsai/cudf/pull/15665)) [@vuule](https://github.com/vuule)
- Overhaul ops-codeowners coverage ([#15660](https://github.com/rapidsai/cudf/pull/15660)) [@raydouglass](https://github.com/raydouglass)
- Concatenate dictionary of objects along axis=1 ([#15623](https://github.com/rapidsai/cudf/pull/15623)) [@er-eis](https://github.com/er-eis)
- Construct `pylibcudf` columns from objects supporting `__cuda_array_interface__` ([#15615](https://github.com/rapidsai/cudf/pull/15615)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Expose some Parquet per-column configuration options via the python API ([#15613](https://github.com/rapidsai/cudf/pull/15613)) [@etseidl](https://github.com/etseidl)
- Migrate string `find` operations to `pylibcudf` ([#15604](https://github.com/rapidsai/cudf/pull/15604)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Round trip FIXED_LEN_BYTE_ARRAY data properly in Parquet writer ([#15600](https://github.com/rapidsai/cudf/pull/15600)) [@etseidl](https://github.com/etseidl)
- Reading multi-line JSON in string columns using runtime configurable delimiter ([#15556](https://github.com/rapidsai/cudf/pull/15556)) [@shrshi](https://github.com/shrshi)
- Remove public gtest dependency from libcudf conda package ([#15534](https://github.com/rapidsai/cudf/pull/15534)) [@robertmaynard](https://github.com/robertmaynard)
- Fea/move to latest nanoarrow ([#15526](https://github.com/rapidsai/cudf/pull/15526)) [@robertmaynard](https://github.com/robertmaynard)
- Migrate string `case` operations to `pylibcudf` ([#15489](https://github.com/rapidsai/cudf/pull/15489)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Add Parquet encoding statistics to column chunk metadata ([#15452](https://github.com/rapidsai/cudf/pull/15452)) [@etseidl](https://github.com/etseidl)
- Implement JNI for chunked ORC reader ([#15446](https://github.com/rapidsai/cudf/pull/15446)) [@ttnghia](https://github.com/ttnghia)
- Add some missing optional fields to the Parquet RowGroup metadata ([#15421](https://github.com/rapidsai/cudf/pull/15421)) [@etseidl](https://github.com/etseidl)
- Adding parquet transcoding example ([#15420](https://github.com/rapidsai/cudf/pull/15420)) [@mhaseeb123](https://github.com/mhaseeb123)
- Add fields to Parquet Statistics structure that were added in parquet-format 2.10 ([#15412](https://github.com/rapidsai/cudf/pull/15412)) [@etseidl](https://github.com/etseidl)
- Add option to Parquet writer to skip compressing individual columns ([#15411](https://github.com/rapidsai/cudf/pull/15411)) [@etseidl](https://github.com/etseidl)
- Add BYTE_STREAM_SPLIT support to Parquet ([#15311](https://github.com/rapidsai/cudf/pull/15311)) [@etseidl](https://github.com/etseidl)
- Introduce benchmark suite for JSON reader options ([#15124](https://github.com/rapidsai/cudf/pull/15124)) [@shrshi](https://github.com/shrshi)
- Implement ORC chunked reader ([#15094](https://github.com/rapidsai/cudf/pull/15094)) [@ttnghia](https://github.com/ttnghia)
- Extend cudf devcontainers to specify jitify2 kernel cache ([#15068](https://github.com/rapidsai/cudf/pull/15068)) [@robertmaynard](https://github.com/robertmaynard)
- Add `to_arrow_device` function to cudf interop using nanoarrow ([#15047](https://github.com/rapidsai/cudf/pull/15047)) [@zeroshade](https://github.com/zeroshade)
- Add JSON option to prune columns ([#14996](https://github.com/rapidsai/cudf/pull/14996)) [@karthikeyann](https://github.com/karthikeyann)
## 🛠️ Improvements
- Deprecate `Groupby.collect` ([#15808](https://github.com/rapidsai/cudf/pull/15808)) [@galipremsagar](https://github.com/galipremsagar)
- Raise FileNotFoundError when a literal JSON string that looks like a json filename is passed ([#15806](https://github.com/rapidsai/cudf/pull/15806)) [@lithomas1](https://github.com/lithomas1)
- Deprecate `divisions='quantile'` support in `set_index` ([#15804](https://github.com/rapidsai/cudf/pull/15804)) [@rjzamora](https://github.com/rjzamora)
- Improve performance of Series.to_numpy/to_cupy ([#15792](https://github.com/rapidsai/cudf/pull/15792)) [@mroeschke](https://github.com/mroeschke)
- Access `self.index` instead of `self._index` where possible ([#15781](https://github.com/rapidsai/cudf/pull/15781)) [@mroeschke](https://github.com/mroeschke)
- Support filtered I/O in `chunked_parquet_reader` and simplify the use of `parquet_reader_options` ([#15764](https://github.com/rapidsai/cudf/pull/15764)) [@mhaseeb123](https://github.com/mhaseeb123)
- Avoid index-to-column conversion in some DataFrame ops ([#15763](https://github.com/rapidsai/cudf/pull/15763)) [@mroeschke](https://github.com/mroeschke)
- Fix `chunked_parquet_reader` behavior when input has no more rows to read ([#15757](https://github.com/rapidsai/cudf/pull/15757)) [@mhaseeb123](https://github.com/mhaseeb123)
- [JNI] Expose java API for cudf::io::config_host_memory_resource ([#15745](https://github.com/rapidsai/cudf/pull/15745)) [@abellina](https://github.com/abellina)
- Migrate all cpp pxd files into pylibcudf ([#15740](https://github.com/rapidsai/cudf/pull/15740)) [@vyasr](https://github.com/vyasr)
- Validate and materialize iterators earlier in as_column ([#15739](https://github.com/rapidsai/cudf/pull/15739)) [@mroeschke](https://github.com/mroeschke)
- Push some as_column arrow logic to ColumnBase.from_arrow ([#15738](https://github.com/rapidsai/cudf/pull/15738)) [@mroeschke](https://github.com/mroeschke)
- Expose stream parameter in public reduction APIs ([#15737](https://github.com/rapidsai/cudf/pull/15737)) [@srinivasyadav18](https://github.com/srinivasyadav18)
- Remove unnecessary 'setuptools' host dependency, simplify dependencies.yaml ([#15736](https://github.com/rapidsai/cudf/pull/15736)) [@jameslamb](https://github.com/jameslamb)
- Defer to C++ equality and hashing for pylibcudf DataType and Aggregation objects ([#15732](https://github.com/rapidsai/cudf/pull/15732)) [@wence-](https://github.com/wence-)
- Implement null-aware NOT_EQUALS binop ([#15731](https://github.com/rapidsai/cudf/pull/15731)) [@wence-](https://github.com/wence-)
- Fix split-record result list column offset type ([#15707](https://github.com/rapidsai/cudf/pull/15707)) [@davidwendt](https://github.com/davidwendt)
- Upgrade `arrow` to `16` ([#15703](https://github.com/rapidsai/cudf/pull/15703)) [@galipremsagar](https://github.com/galipremsagar)
- Remove experimental namespace from make_strings_children ([#15702](https://github.com/rapidsai/cudf/pull/15702)) [@davidwendt](https://github.com/davidwendt)
- Rework get_json_object benchmark to use nvbench ([#15698](https://github.com/rapidsai/cudf/pull/15698)) [@davidwendt](https://github.com/davidwendt)
- Rework some python tests of Parquet delta encodings ([#15693](https://github.com/rapidsai/cudf/pull/15693)) [@etseidl](https://github.com/etseidl)
- Skeleton cudf polars package ([#15688](https://github.com/rapidsai/cudf/pull/15688)) [@wence-](https://github.com/wence-)
- Upgrade pre commit hooks ([#15685](https://github.com/rapidsai/cudf/pull/15685)) [@wence-](https://github.com/wence-)
- Allow `fillna` to validate for `CategoricalColumn.fillna` ([#15683](https://github.com/rapidsai/cudf/pull/15683)) [@galipremsagar](https://github.com/galipremsagar)
- Misc Column cleanups ([#15682](https://github.com/rapidsai/cudf/pull/15682)) [@mroeschke](https://github.com/mroeschke)
- Reducing runtime of JSON reader options benchmark ([#15681](https://github.com/rapidsai/cudf/pull/15681)) [@shrshi](https://github.com/shrshi)
- Add `Timestamp` and `Timedelta` proxy types ([#15680](https://github.com/rapidsai/cudf/pull/15680)) [@galipremsagar](https://github.com/galipremsagar)
- Remove host_parse_nested_json. ([#15674](https://github.com/rapidsai/cudf/pull/15674)) [@bdice](https://github.com/bdice)
- Reduce runtime for ParquetChunkedReaderInputLimitTest gtests ([#15672](https://github.com/rapidsai/cudf/pull/15672)) [@davidwendt](https://github.com/davidwendt)
- Add large-strings gtest for cudf::interleave_columns ([#15669](https://github.com/rapidsai/cudf/pull/15669)) [@davidwendt](https://github.com/davidwendt)
- Use experimental make_strings_children for multi-replace_re ([#15667](https://github.com/rapidsai/cudf/pull/15667)) [@davidwendt](https://github.com/davidwendt)
- Enabled `Holiday` types in `cudf.pandas` ([#15664](https://github.com/rapidsai/cudf/pull/15664)) [@galipremsagar](https://github.com/galipremsagar)
- Remove obsolete `XFAIL` markers for query-planning ([#15662](https://github.com/rapidsai/cudf/pull/15662)) [@rjzamora](https://github.com/rjzamora)
- Clean up join benchmarks ([#15644](https://github.com/rapidsai/cudf/pull/15644)) [@PointKernel](https://github.com/PointKernel)
- Enable warnings as errors in custreamz ([#15642](https://github.com/rapidsai/cudf/pull/15642)) [@mroeschke](https://github.com/mroeschke)
- Improve distinct join with set `retrieve` ([#15636](https://github.com/rapidsai/cudf/pull/15636)) [@PointKernel](https://github.com/PointKernel)
- Fix -Werror=type-limits. ([#15635](https://github.com/rapidsai/cudf/pull/15635)) [@bdice](https://github.com/bdice)
- Enable FutureWarnings/DeprecationWarnings as errors for dask_cudf ([#15634](https://github.com/rapidsai/cudf/pull/15634)) [@mroeschke](https://github.com/mroeschke)
- Remove NVBench SHA override. ([#15633](https://github.com/rapidsai/cudf/pull/15633)) [@alliepiper](https://github.com/alliepiper)
- Add support for large string columns to Parquet reader and writer ([#15632](https://github.com/rapidsai/cudf/pull/15632)) [@etseidl](https://github.com/etseidl)
- Large strings support in MD5 and SHA hashers ([#15631](https://github.com/rapidsai/cudf/pull/15631)) [@davidwendt](https://github.com/davidwendt)
- Fix make_offsets_child_column usage in cudf::strings::detail::shift ([#15630](https://github.com/rapidsai/cudf/pull/15630)) [@davidwendt](https://github.com/davidwendt)
- Use experimental make_strings_children for strings convert ([#15629](https://github.com/rapidsai/cudf/pull/15629)) [@davidwendt](https://github.com/davidwendt)
- Forward-merge branch-24.04 to branch-24.06 ([#15627](https://github.com/rapidsai/cudf/pull/15627)) [@bdice](https://github.com/bdice)
- Avoid accessing attributes via `_column` if not needed ([#15624](https://github.com/rapidsai/cudf/pull/15624)) [@mroeschke](https://github.com/mroeschke)
- Make ColumnBase.__cuda_array_interface__ opt out instead of opt in ([#15622](https://github.com/rapidsai/cudf/pull/15622)) [@mroeschke](https://github.com/mroeschke)
- Large strings support for cudf::gather ([#15621](https://github.com/rapidsai/cudf/pull/15621)) [@davidwendt](https://github.com/davidwendt)
- Remove jni-docker-build workflow ([#15619](https://github.com/rapidsai/cudf/pull/15619)) [@bdice](https://github.com/bdice)
- Support `DurationType` in cudf parquet reader via `arrow:schema` ([#15617](https://github.com/rapidsai/cudf/pull/15617)) [@mhaseeb123](https://github.com/mhaseeb123)
- Drop CentOS 7 support ([#15608](https://github.com/rapidsai/cudf/pull/15608)) [@NvTimLiu](https://github.com/NvTimLiu)
- Use experimental make_strings_children for json/csv writers ([#15599](https://github.com/rapidsai/cudf/pull/15599)) [@davidwendt](https://github.com/davidwendt)
- Use experimental make_strings_children for strings join/url_encode/slice ([#15598](https://github.com/rapidsai/cudf/pull/15598)) [@davidwendt](https://github.com/davidwendt)
- Use experimental make_strings_children in nvtext APIs ([#15595](https://github.com/rapidsai/cudf/pull/15595)) [@davidwendt](https://github.com/davidwendt)
- Migrate to `{{ stdlib("c") }}` ([#15594](https://github.com/rapidsai/cudf/pull/15594)) [@hcho3](https://github.com/hcho3)
- Deprecate `to/from_dask_dataframe` APIs in dask-cudf ([#15592](https://github.com/rapidsai/cudf/pull/15592)) [@rjzamora](https://github.com/rjzamora)
- Minor fixups for future NumPy 2 compatibility ([#15590](https://github.com/rapidsai/cudf/pull/15590)) [@seberg](https://github.com/seberg)
- Delay materializing RangeIndex in .reset_index ([#15588](https://github.com/rapidsai/cudf/pull/15588)) [@mroeschke](https://github.com/mroeschke)
- Use experimental make_strings_children for capitalize/case/pad functions ([#15587](https://github.com/rapidsai/cudf/pull/15587)) [@davidwendt](https://github.com/davidwendt)
- Use experimental make_strings_children for strings replace/filter/translate ([#15586](https://github.com/rapidsai/cudf/pull/15586)) [@davidwendt](https://github.com/davidwendt)
- Add multithreaded parquet reader benchmarks. ([#15585](https://github.com/rapidsai/cudf/pull/15585)) [@nvdbaranec](https://github.com/nvdbaranec)
- Don't materialize column during RangeIndex methods ([#15582](https://github.com/rapidsai/cudf/pull/15582)) [@mroeschke](https://github.com/mroeschke)
- Improve performance for cudf::strings::count_re ([#15578](https://github.com/rapidsai/cudf/pull/15578)) [@davidwendt](https://github.com/davidwendt)
- Replace RangeIndex._start/_stop/_step with _range ([#15576](https://github.com/rapidsai/cudf/pull/15576)) [@mroeschke](https://github.com/mroeschke)
- Add --rm and --name to devcontainer run args ([#15572](https://github.com/rapidsai/cudf/pull/15572)) [@trxcllnt](https://github.com/trxcllnt)
- Change the default dictionary policy in Parquet writer from `ALWAYS` to `ADAPTIVE` ([#15570](https://github.com/rapidsai/cudf/pull/15570)) [@mhaseeb123](https://github.com/mhaseeb123)
- Rename experimental JSON tests. ([#15568](https://github.com/rapidsai/cudf/pull/15568)) [@bdice](https://github.com/bdice)
- Refactor JNI native dependency loading to allow returning of library path ([#15566](https://github.com/rapidsai/cudf/pull/15566)) [@jlowe](https://github.com/jlowe)
- Remove protobuf and use parsed ORC statistics from libcudf ([#15564](https://github.com/rapidsai/cudf/pull/15564)) [@bdice](https://github.com/bdice)
- Deprecate legacy JSON reader options. ([#15558](https://github.com/rapidsai/cudf/pull/15558)) [@bdice](https://github.com/bdice)
- Use same .clang-format in cuDF JNI ([#15557](https://github.com/rapidsai/cudf/pull/15557)) [@bdice](https://github.com/bdice)
- Large strings support for cudf::fill ([#15555](https://github.com/rapidsai/cudf/pull/15555)) [@davidwendt](https://github.com/davidwendt)
- Upgrade upper bound pinning to `pandas-2.2.2` ([#15554](https://github.com/rapidsai/cudf/pull/15554)) [@galipremsagar](https://github.com/galipremsagar)
- Work around issues with cccl main ([#15552](https://github.com/rapidsai/cudf/pull/15552)) [@miscco](https://github.com/miscco)
- Enable pandas plotting unit tests for cudf.pandas ([#15547](https://github.com/rapidsai/cudf/pull/15547)) [@mroeschke](https://github.com/mroeschke)
- Move timezone conversion logic to `DatetimeColumn` ([#15545](https://github.com/rapidsai/cudf/pull/15545)) [@mroeschke](https://github.com/mroeschke)
- Large strings support for cudf::interleave_columns ([#15544](https://github.com/rapidsai/cudf/pull/15544)) [@davidwendt](https://github.com/davidwendt)
- [skip ci] Switch back to 24.06 branch for pandas tests ([#15543](https://github.com/rapidsai/cudf/pull/15543)) [@galipremsagar](https://github.com/galipremsagar)
- Remove checks dependency from static-configure test job. ([#15542](https://github.com/rapidsai/cudf/pull/15542)) [@bdice](https://github.com/bdice)
- Remove legacy JSON reader from Python ([#15538](https://github.com/rapidsai/cudf/pull/15538)) [@bdice](https://github.com/bdice)
- Enable more ignored pandas unit tests for cudf.pandas ([#15535](https://github.com/rapidsai/cudf/pull/15535)) [@mroeschke](https://github.com/mroeschke)
- Large strings support for cudf::clamp ([#15533](https://github.com/rapidsai/cudf/pull/15533)) [@davidwendt](https://github.com/davidwendt)
- Remove version hard-coding ([#15529](https://github.com/rapidsai/cudf/pull/15529)) [@galipremsagar](https://github.com/galipremsagar)
- Removing all batching code from parquet writer ([#15528](https://github.com/rapidsai/cudf/pull/15528)) [@mhaseeb123](https://github.com/mhaseeb123)
- Make some private class properties not settable ([#15527](https://github.com/rapidsai/cudf/pull/15527)) [@mroeschke](https://github.com/mroeschke)
- Large strings support in regex replace APIs ([#15524](https://github.com/rapidsai/cudf/pull/15524)) [@davidwendt](https://github.com/davidwendt)
- Skip pandas unit tests that crash pytest workers in `cudf.pandas` ([#15521](https://github.com/rapidsai/cudf/pull/15521)) [@mroeschke](https://github.com/mroeschke)
- Preserve column metadata during more DataFrame operations ([#15519](https://github.com/rapidsai/cudf/pull/15519)) [@mroeschke](https://github.com/mroeschke)
- Move pandas-tests to a dedicated workflow file and trigger it from branch.yaml ([#15516](https://github.com/rapidsai/cudf/pull/15516)) [@galipremsagar](https://github.com/galipremsagar)
- Large strings gtest fixture and utilities ([#15513](https://github.com/rapidsai/cudf/pull/15513)) [@davidwendt](https://github.com/davidwendt)
- Convert libcudf resource parameters to rmm::device_async_resource_ref ([#15507](https://github.com/rapidsai/cudf/pull/15507)) [@harrism](https://github.com/harrism)
- Relax protobuf lower bound to 3.20. ([#15506](https://github.com/rapidsai/cudf/pull/15506)) [@bdice](https://github.com/bdice)
- Clean up index methods ([#15496](https://github.com/rapidsai/cudf/pull/15496)) [@mroeschke](https://github.com/mroeschke)
- Update strings contains benchmarks to nvbench ([#15495](https://github.com/rapidsai/cudf/pull/15495)) [@davidwendt](https://github.com/davidwendt)
- Update NVBench fixture to use new hooks, fix pinned memory segfault. ([#15492](https://github.com/rapidsai/cudf/pull/15492)) [@alliepiper](https://github.com/alliepiper)
- Enable tests/scalar and tests/series in cudf.pandas tests ([#15486](https://github.com/rapidsai/cudf/pull/15486)) [@mroeschke](https://github.com/mroeschke)
- Clean up __cuda_array_interface__ handling in as_column ([#15477](https://github.com/rapidsai/cudf/pull/15477)) [@mroeschke](https://github.com/mroeschke)
- Prevent .ordered and .categories from being settable in CategoricalColumn and CategoricalDtype ([#15475](https://github.com/rapidsai/cudf/pull/15475)) [@mroeschke](https://github.com/mroeschke)
- Ignore pandas tests for cudf.pandas that need motoserver ([#15468](https://github.com/rapidsai/cudf/pull/15468)) [@mroeschke](https://github.com/mroeschke)
- Use cached_property for NumericColumn.nan_count instead of ._nan_count variable ([#15466](https://github.com/rapidsai/cudf/pull/15466)) [@mroeschke](https://github.com/mroeschke)
- Add to_arrow_device() functions that accept views ([#15465](https://github.com/rapidsai/cudf/pull/15465)) [@davidwendt](https://github.com/davidwendt)
- Add custom status check workflow ([#15464](https://github.com/rapidsai/cudf/pull/15464)) [@galipremsagar](https://github.com/galipremsagar)
- Disable pandas 2.x clipboard tests in cudf.pandas tests ([#15462](https://github.com/rapidsai/cudf/pull/15462)) [@mroeschke](https://github.com/mroeschke)
- Enable tests/strings/test_api.py and tests/io/pytables in cudf.pandas tests ([#15461](https://github.com/rapidsai/cudf/pull/15461)) [@mroeschke](https://github.com/mroeschke)
- Enable test_parsing in cudf.pandas tests ([#15460](https://github.com/rapidsai/cudf/pull/15460)) [@mroeschke](https://github.com/mroeschke)
- Add `from_arrow_device` function to cudf interop using nanoarrow ([#15458](https://github.com/rapidsai/cudf/pull/15458)) [@zeroshade](https://github.com/zeroshade)
- Remove deprecated strings offsets_begin ([#15454](https://github.com/rapidsai/cudf/pull/15454)) [@davidwendt](https://github.com/davidwendt)
- Enable tests/windows/ in cudf.pandas tests ([#15444](https://github.com/rapidsai/cudf/pull/15444)) [@mroeschke](https://github.com/mroeschke)
- Enable tests/interchange/test_impl.py in cudf.pandas tests ([#15443](https://github.com/rapidsai/cudf/pull/15443)) [@mroeschke](https://github.com/mroeschke)
- Enable tests/io/test_user_agent.py in cudf.pandas tests ([#15442](https://github.com/rapidsai/cudf/pull/15442)) [@mroeschke](https://github.com/mroeschke)
- Performance improvement in libcudf case conversion for long strings ([#15441](https://github.com/rapidsai/cudf/pull/15441)) [@davidwendt](https://github.com/davidwendt)
- Remove prior test skipping in run-pandas-tests with testing 2.2.1 ([#15440](https://github.com/rapidsai/cudf/pull/15440)) [@mroeschke](https://github.com/mroeschke)
- Support orc and text IO with dask-expr using legacy conversion ([#15439](https://github.com/rapidsai/cudf/pull/15439)) [@rjzamora](https://github.com/rjzamora)
- Floating <--> fixed-point conversion must now be called explicitly ([#15438](https://github.com/rapidsai/cudf/pull/15438)) [@pmattione-nvidia](https://github.com/pmattione-nvidia)
- Unify Copy-On-Write and Spilling ([#15436](https://github.com/rapidsai/cudf/pull/15436)) [@madsbk](https://github.com/madsbk)
- Enable ``dask_cudf`` json and s3 tests with query-planning on ([#15408](https://github.com/rapidsai/cudf/pull/15408)) [@rjzamora](https://github.com/rjzamora)
- Bump ruff and codespell pre-commit checks ([#15407](https://github.com/rapidsai/cudf/pull/15407)) [@mroeschke](https://github.com/mroeschke)
- Enable all tests for `arm` arch ([#15402](https://github.com/rapidsai/cudf/pull/15402)) [@galipremsagar](https://github.com/galipremsagar)
- Bind `read_parquet_metadata` API to libcudf instead of pyarrow and extract `RowGroup` information ([#15398](https://github.com/rapidsai/cudf/pull/15398)) [@mhaseeb123](https://github.com/mhaseeb123)
- Optimizing multi-source byte range reading in JSON reader ([#15396](https://github.com/rapidsai/cudf/pull/15396)) [@shrshi](https://github.com/shrshi)
- Add correct labels to pandas_function_request.md ([#15381](https://github.com/rapidsai/cudf/pull/15381)) [@raybellwaves](https://github.com/raybellwaves)
- Remove deprecated hash() and spark_murmurhash3_x86_32() ([#15375](https://github.com/rapidsai/cudf/pull/15375)) [@davidwendt](https://github.com/davidwendt)
- Large strings support in cudf::merge ([#15374](https://github.com/rapidsai/cudf/pull/15374)) [@davidwendt](https://github.com/davidwendt)
- Enable test-reporting for pandas pytests in CI ([#15369](https://github.com/rapidsai/cudf/pull/15369)) [@galipremsagar](https://github.com/galipremsagar)
- Use logical types in Parquet reader ([#15365](https://github.com/rapidsai/cudf/pull/15365)) [@etseidl](https://github.com/etseidl)
- Add experimental make_strings_children utility ([#15363](https://github.com/rapidsai/cudf/pull/15363)) [@davidwendt](https://github.com/davidwendt)
- Forward-merge branch-24.04 to branch-24.06 ([#15349](https://github.com/rapidsai/cudf/pull/15349)) [@bdice](https://github.com/bdice)
- Fix CMake files in libcudf C++ examples to use existing libcudf build if present ([#15348](https://github.com/rapidsai/cudf/pull/15348)) [@mhaseeb123](https://github.com/mhaseeb123)
- Use ruff pydocstyle over pydocstyle pre-commit hook ([#15345](https://github.com/rapidsai/cudf/pull/15345)) [@mroeschke](https://github.com/mroeschke)
- Refactor stream mode setup for gtests ([#15337](https://github.com/rapidsai/cudf/pull/15337)) [@davidwendt](https://github.com/davidwendt)
- Benchmark decimal <--> floating conversions. ([#15334](https://github.com/rapidsai/cudf/pull/15334)) [@pmattione-nvidia](https://github.com/pmattione-nvidia)
- Avoid duplicate dask-cudf testing ([#15333](https://github.com/rapidsai/cudf/pull/15333)) [@rjzamora](https://github.com/rjzamora)
- Skip decode steps in Parquet reader when nullable columns have no nulls ([#15332](https://github.com/rapidsai/cudf/pull/15332)) [@etseidl](https://github.com/etseidl)
- Update udf_cpp to use rapids_cpm_cccl. ([#15331](https://github.com/rapidsai/cudf/pull/15331)) [@bdice](https://github.com/bdice)
- Forward-merge branch-24.04 into branch-24.06 [skip ci] ([#15330](https://github.com/rapidsai/cudf/pull/15330)) [@rapids-bot[bot]](https://github.com/rapids-bot[bot])
- Allow ``numeric_only=True`` for simple groupby reductions ([#15326](https://github.com/rapidsai/cudf/pull/15326)) [@rjzamora](https://github.com/rjzamora)
- Drop CentOS 7 support. ([#15323](https://github.com/rapidsai/cudf/pull/15323)) [@bdice](https://github.com/bdice)
- Rework cudf::find_and_replace_all to use gather-based make_strings_column ([#15305](https://github.com/rapidsai/cudf/pull/15305)) [@davidwendt](https://github.com/davidwendt)
- First pass at adding testing for pylibcudf ([#15300](https://github.com/rapidsai/cudf/pull/15300)) [@vyasr](https://github.com/vyasr)
- [FEA] Performance improvement for mixed left semi/anti join ([#15288](https://github.com/rapidsai/cudf/pull/15288)) [@tgujar](https://github.com/tgujar)
- Rework cudf::replace_nulls to use strings::detail::copy_if_else ([#15286](https://github.com/rapidsai/cudf/pull/15286)) [@davidwendt](https://github.com/davidwendt)
- Clean up special casing in `as_column` for non-typed input ([#15276](https://github.com/rapidsai/cudf/pull/15276)) [@mroeschke](https://github.com/mroeschke)
- Large strings support in cudf::concatenate ([#15195](https://github.com/rapidsai/cudf/pull/15195)) [@davidwendt](https://github.com/davidwendt)
- Use less _is_categorical_dtype ([#15148](https://github.com/rapidsai/cudf/pull/15148)) [@mroeschke](https://github.com/mroeschke)
- Align date_range defaults with pandas, support tz ([#15139](https://github.com/rapidsai/cudf/pull/15139)) [@mroeschke](https://github.com/mroeschke)
- `ModuleAccelerator` performance: cache the result of checking if a caller is in the denylist ([#15056](https://github.com/rapidsai/cudf/pull/15056)) [@shwina](https://github.com/shwina)
- Use offsetalator in cudf::strings::replace functions ([#14824](https://github.com/rapidsai/cudf/pull/14824)) [@davidwendt](https://github.com/davidwendt)
- Cleanup some timedelta/datetime column logic ([#14715](https://github.com/rapidsai/cudf/pull/14715)) [@mroeschke](https://github.com/mroeschke)
- Refactor numpy array input in as_column ([#14651](https://github.com/rapidsai/cudf/pull/14651)) [@mroeschke](https://github.com/mroeschke)
- Refactor joins for conditional semis and antis ([#14646](https://github.com/rapidsai/cudf/pull/14646)) [@DanialJavady96](https://github.com/DanialJavady96)
- Eagerly populate the class dict for cudf.pandas proxy types ([#14534](https://github.com/rapidsai/cudf/pull/14534)) [@shwina](https://github.com/shwina)
- Some additional kernel thread index refactoring. ([#14107](https://github.com/rapidsai/cudf/pull/14107)) [@bdice](https://github.com/bdice)
# cuDF 24.04.00 (10 Apr 2024)
## 🚨 Breaking Changes
- Restructure pylibcudf/arrow interop facilities ([#15325](https://github.com/rapidsai/cudf/pull/15325)) [@vyasr](https://github.com/vyasr)
- Change exceptions thrown by copying APIs ([#15319](https://github.com/rapidsai/cudf/pull/15319)) [@vyasr](https://github.com/vyasr)
- Change strings_column_view::char_size to return int64 ([#15197](https://github.com/rapidsai/cudf/pull/15197)) [@davidwendt](https://github.com/davidwendt)
- Upgrade to `arrow-14.0.2` ([#15108](https://github.com/rapidsai/cudf/pull/15108)) [@galipremsagar](https://github.com/galipremsagar)
- Add support for `pandas-2.2` in `cudf` ([#15100](https://github.com/rapidsai/cudf/pull/15100)) [@galipremsagar](https://github.com/galipremsagar)
- Deprecate cudf::hashing::spark_murmurhash3_x86_32 ([#15074](https://github.com/rapidsai/cudf/pull/15074)) [@davidwendt](https://github.com/davidwendt)
- Align MultiIndex.get_indexer with pandas 2.2 change ([#15059](https://github.com/rapidsai/cudf/pull/15059)) [@mroeschke](https://github.com/mroeschke)
- Raise an error on import for unsupported GPUs. ([#15053](https://github.com/rapidsai/cudf/pull/15053)) [@bdice](https://github.com/bdice)
- Deprecate datelike isin casting strings to dates to match pandas 2.2 ([#15046](https://github.com/rapidsai/cudf/pull/15046)) [@mroeschke](https://github.com/mroeschke)
- Align concat Series name behavior in pandas 2.2 ([#15032](https://github.com/rapidsai/cudf/pull/15032)) [@mroeschke](https://github.com/mroeschke)
- Add `future_stack` to `DataFrame.stack` ([#15015](https://github.com/rapidsai/cudf/pull/15015)) [@galipremsagar](https://github.com/galipremsagar)
- Deprecate groupby fillna ([#15000](https://github.com/rapidsai/cudf/pull/15000)) [@mroeschke](https://github.com/mroeschke)
- Deprecate replace with categorical columns ([#14988](https://github.com/rapidsai/cudf/pull/14988)) [@mroeschke](https://github.com/mroeschke)
- Deprecate delim_whitespace in read_csv for pandas 2.2 ([#14986](https://github.com/rapidsai/cudf/pull/14986)) [@mroeschke](https://github.com/mroeschke)
- Deprecate parameters similar to pandas 2.2 ([#14984](https://github.com/rapidsai/cudf/pull/14984)) [@mroeschke](https://github.com/mroeschke)
- Add missing atomic operators, refactor atomic operators, move atomic operators to detail namespace. ([#14962](https://github.com/rapidsai/cudf/pull/14962)) [@bdice](https://github.com/bdice)
- Add `pandas-2.x` support in `cudf` ([#14916](https://github.com/rapidsai/cudf/pull/14916)) [@galipremsagar](https://github.com/galipremsagar)
- Use cuco::static_set in the hash-based groupby ([#14813](https://github.com/rapidsai/cudf/pull/14813)) [@PointKernel](https://github.com/PointKernel)
## 🐛 Bug Fixes
- Fix an issue with creating a series from scalar when `dtype='category'` ([#15476](https://github.com/rapidsai/cudf/pull/15476)) [@galipremsagar](https://github.com/galipremsagar)
- Update pre-commit-hooks to v0.0.3 ([#15355](https://github.com/rapidsai/cudf/pull/15355)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- [BUG][JNI] Trigger MemoryBuffer.onClosed after memory is freed ([#15351](https://github.com/rapidsai/cudf/pull/15351)) [@abellina](https://github.com/abellina)
- Fix an issue with multiple short list rowgroups using the Parquet chunked reader. ([#15342](https://github.com/rapidsai/cudf/pull/15342)) [@nvdbaranec](https://github.com/nvdbaranec)
- Avoid importing dask-expr if "query-planning" config is `False` ([#15340](https://github.com/rapidsai/cudf/pull/15340)) [@rjzamora](https://github.com/rjzamora)
- Fix gtests/ERROR_TEST errors when run in Debug ([#15317](https://github.com/rapidsai/cudf/pull/15317)) [@davidwendt](https://github.com/davidwendt)
- Fix OOB read in `inflate_kernel` ([#15309](https://github.com/rapidsai/cudf/pull/15309)) [@vuule](https://github.com/vuule)
- Work around a cuFile error when running CSV tests with memcheck ([#15293](https://github.com/rapidsai/cudf/pull/15293)) [@vuule](https://github.com/vuule)
- Fix Doxygen upload directory ([#15291](https://github.com/rapidsai/cudf/pull/15291)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Fix Doxygen check ([#15289](https://github.com/rapidsai/cudf/pull/15289)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Reintroduce PANDAS_GE_220 import ([#15287](https://github.com/rapidsai/cudf/pull/15287)) [@wence-](https://github.com/wence-)
- Fix mean computation for the geometric distribution in the data generator ([#15282](https://github.com/rapidsai/cudf/pull/15282)) [@vuule](https://github.com/vuule)
- Fix Parquet decimal64 stats ([#15281](https://github.com/rapidsai/cudf/pull/15281)) [@etseidl](https://github.com/etseidl)
- Make linking of nvtx3-cpp BUILD_LOCAL_INTERFACE ([#15271](https://github.com/rapidsai/cudf/pull/15271)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Work around compute-sanitizer memcheck bug ([#15259](https://github.com/rapidsai/cudf/pull/15259)) [@davidwendt](https://github.com/davidwendt)
- Cleanup `hostdevice_vector` and add more APIs ([#15252](https://github.com/rapidsai/cudf/pull/15252)) [@ttnghia](https://github.com/ttnghia)
- Fix number of rows in randomly generated lists columns ([#15248](https://github.com/rapidsai/cudf/pull/15248)) [@vuule](https://github.com/vuule)
- Fix wrong output for `collect_list`/`collect_set` of lists column ([#15243](https://github.com/rapidsai/cudf/pull/15243)) [@ttnghia](https://github.com/ttnghia)
- Fix testChunkedPackTwoPasses to copy from the bounce buffer ([#15220](https://github.com/rapidsai/cudf/pull/15220)) [@abellina](https://github.com/abellina)
- Fix accessing `.columns` by an external API ([#15212](https://github.com/rapidsai/cudf/pull/15212)) [@galipremsagar](https://github.com/galipremsagar)
- [JNI] Disable testChunkedPackTwoPasses for now ([#15210](https://github.com/rapidsai/cudf/pull/15210)) [@abellina](https://github.com/abellina)
- Update labeler and codeowner configs for CMake files ([#15208](https://github.com/rapidsai/cudf/pull/15208)) [@PointKernel](https://github.com/PointKernel)
- Avoid dict normalization in ``__dask_tokenize__`` ([#15187](https://github.com/rapidsai/cudf/pull/15187)) [@rjzamora](https://github.com/rjzamora)
- Fix memcheck error in distinct inner join ([#15164](https://github.com/rapidsai/cudf/pull/15164)) [@PointKernel](https://github.com/PointKernel)
- Remove unneeded script parameters in test_cpp_memcheck.sh ([#15158](https://github.com/rapidsai/cudf/pull/15158)) [@davidwendt](https://github.com/davidwendt)
- Fix `ListColumn.to_pandas()` to retain `list` type ([#15155](https://github.com/rapidsai/cudf/pull/15155)) [@galipremsagar](https://github.com/galipremsagar)
- Avoid factorization in MultiIndex.to_pandas ([#15150](https://github.com/rapidsai/cudf/pull/15150)) [@mroeschke](https://github.com/mroeschke)
- Fix GroupBy.get_group and GroupBy.indices ([#15143](https://github.com/rapidsai/cudf/pull/15143)) [@wence-](https://github.com/wence-)
- Remove `const` from `range_window_bounds::_extent`. ([#15138](https://github.com/rapidsai/cudf/pull/15138)) [@mythrocks](https://github.com/mythrocks)
- DataFrame.columns = ... retains RangeIndex & set dtype ([#15129](https://github.com/rapidsai/cudf/pull/15129)) [@mroeschke](https://github.com/mroeschke)
- Correctly handle output for `GroupBy.apply` when chunk results are reindexed series ([#15109](https://github.com/rapidsai/cudf/pull/15109)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Fix Series.groupby.shift with a MultiIndex ([#15098](https://github.com/rapidsai/cudf/pull/15098)) [@mroeschke](https://github.com/mroeschke)
- Fix reductions when DataFrame has MultiIndex columns ([#15097](https://github.com/rapidsai/cudf/pull/15097)) [@mroeschke](https://github.com/mroeschke)
- Fix deprecation warnings for deprecated hash() calls ([#15095](https://github.com/rapidsai/cudf/pull/15095)) [@davidwendt](https://github.com/davidwendt)
- Add support for arrow `large_string` in `cudf` ([#15093](https://github.com/rapidsai/cudf/pull/15093)) [@galipremsagar](https://github.com/galipremsagar)
- Fix `sort_values` pytest failure with pandas-2.x regression ([#15092](https://github.com/rapidsai/cudf/pull/15092)) [@galipremsagar](https://github.com/galipremsagar)
- Resolve path parsing issues in `get_json_object` ([#15082](https://github.com/rapidsai/cudf/pull/15082)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Fix bugs in handling of delta encodings ([#15075](https://github.com/rapidsai/cudf/pull/15075)) [@etseidl](https://github.com/etseidl)
- Fix `is_device_write_preferred` in `void_sink` and `user_sink_wrapper` ([#15064](https://github.com/rapidsai/cudf/pull/15064)) [@vuule](https://github.com/vuule)
- Eliminate duplicate allocation of nested string columns ([#15061](https://github.com/rapidsai/cudf/pull/15061)) [@vuule](https://github.com/vuule)
- Raise an error on import for unsupported GPUs. ([#15053](https://github.com/rapidsai/cudf/pull/15053)) [@bdice](https://github.com/bdice)
- Align concat Series name behavior in pandas 2.2 ([#15032](https://github.com/rapidsai/cudf/pull/15032)) [@mroeschke](https://github.com/mroeschke)
- Fix `Index.difference` to handle duplicate values when one of the inputs is empty ([#15016](https://github.com/rapidsai/cudf/pull/15016)) [@galipremsagar](https://github.com/galipremsagar)
- Add `future_stack` to `DataFrame.stack` ([#15015](https://github.com/rapidsai/cudf/pull/15015)) [@galipremsagar](https://github.com/galipremsagar)
- Fix handling of values=None in pylibcudf GroupBy.get_groups ([#14998](https://github.com/rapidsai/cudf/pull/14998)) [@shwina](https://github.com/shwina)
- Fix `DataFrame.sort_index` to respect `ignore_index` on all axis ([#14995](https://github.com/rapidsai/cudf/pull/14995)) [@galipremsagar](https://github.com/galipremsagar)
- Raise for pyarrow array that is tz-aware ([#14980](https://github.com/rapidsai/cudf/pull/14980)) [@mroeschke](https://github.com/mroeschke)
- Direct ``SeriesGroupBy.aggregate`` to ``SeriesGroupBy.agg`` ([#14971](https://github.com/rapidsai/cudf/pull/14971)) [@rjzamora](https://github.com/rjzamora)
- Respect IntervalDtype and CategoricalDtype objects passed by users ([#14961](https://github.com/rapidsai/cudf/pull/14961)) [@mroeschke](https://github.com/mroeschke)
- unset `CUDF_SPILL` after a pytest ([#14958](https://github.com/rapidsai/cudf/pull/14958)) [@galipremsagar](https://github.com/galipremsagar)
- Fix null literals being parsed as strings when mixed types as string is enabled in JSON reader ([#14939](https://github.com/rapidsai/cudf/pull/14939)) [@karthikeyann](https://github.com/karthikeyann)
- Fix chunked reads of Parquet delta encoded pages ([#14921](https://github.com/rapidsai/cudf/pull/14921)) [@etseidl](https://github.com/etseidl)
- Fix reading offset for data stream in ORC reader ([#14911](https://github.com/rapidsai/cudf/pull/14911)) [@ttnghia](https://github.com/ttnghia)
- Enable sanitizer check for a test case testORCReadAndWriteForDecimal128 ([#14897](https://github.com/rapidsai/cudf/pull/14897)) [@res-life](https://github.com/res-life)
- Fix dask token normalization ([#14829](https://github.com/rapidsai/cudf/pull/14829)) [@rjzamora](https://github.com/rjzamora)
- Fix 24.04 versions ([#14825](https://github.com/rapidsai/cudf/pull/14825)) [@raydouglass](https://github.com/raydouglass)
- Ensure slow private attrs are maybe proxies ([#14380](https://github.com/rapidsai/cudf/pull/14380)) [@mroeschke](https://github.com/mroeschke)
## 📖 Documentation
- Ignore DLManagedTensor in the docs build ([#15392](https://github.com/rapidsai/cudf/pull/15392)) [@davidwendt](https://github.com/davidwendt)
- Revert "Temporarily disable docs errors. (#15265)" ([#15269](https://github.com/rapidsai/cudf/pull/15269)) [@bdice](https://github.com/bdice)
- Temporarily disable docs errors. ([#15265](https://github.com/rapidsai/cudf/pull/15265)) [@bdice](https://github.com/bdice)
- Update `developer_guide.md` with new guidance on quoted internal includes ([#15238](https://github.com/rapidsai/cudf/pull/15238)) [@harrism](https://github.com/harrism)
- Fix broken link for developer guide ([#15025](https://github.com/rapidsai/cudf/pull/15025)) [@sanjana098](https://github.com/sanjana098)
- [DOC] Update typo in docs example of structs_column_wrapper ([#14949](https://github.com/rapidsai/cudf/pull/14949)) [@karthikeyann](https://github.com/karthikeyann)
- Update cudf.pandas FAQ. ([#14940](https://github.com/rapidsai/cudf/pull/14940)) [@bdice](https://github.com/bdice)
- Optimize doc builds ([#14856](https://github.com/rapidsai/cudf/pull/14856)) [@vyasr](https://github.com/vyasr)
- Add developer guideline to use east const. ([#14836](https://github.com/rapidsai/cudf/pull/14836)) [@bdice](https://github.com/bdice)
- Document how cuDF is pronounced ([#14753](https://github.com/rapidsai/cudf/pull/14753)) [@pentschev](https://github.com/pentschev)
- Notes convert to Pandas-compat ([#12641](https://github.com/rapidsai/cudf/pull/12641)) [@Touutae-lab](https://github.com/Touutae-lab)
## 🚀 New Features
- Address inconsistency in single quote normalization in JSON reader ([#15324](https://github.com/rapidsai/cudf/pull/15324)) [@shrshi](https://github.com/shrshi)
- Use JNI pinned pool resource with cuIO ([#15255](https://github.com/rapidsai/cudf/pull/15255)) [@abellina](https://github.com/abellina)
- Add DELTA_BYTE_ARRAY encoder for Parquet ([#15239](https://github.com/rapidsai/cudf/pull/15239)) [@etseidl](https://github.com/etseidl)
- Migrate filling operations to pylibcudf ([#15225](https://github.com/rapidsai/cudf/pull/15225)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- [JNI] rmm based pinned pool ([#15219](https://github.com/rapidsai/cudf/pull/15219)) [@abellina](https://github.com/abellina)
- Implement zero-copy host buffer source instead of using an arrow implementation ([#15189](https://github.com/rapidsai/cudf/pull/15189)) [@vuule](https://github.com/vuule)
- Enable creation of columns from scalar ([#15181](https://github.com/rapidsai/cudf/pull/15181)) [@vyasr](https://github.com/vyasr)
- Use NVTX from GitHub. ([#15178](https://github.com/rapidsai/cudf/pull/15178)) [@bdice](https://github.com/bdice)
- Implement `segmented_row_bit_count` for computing row sizes by segments of rows ([#15169](https://github.com/rapidsai/cudf/pull/15169)) [@ttnghia](https://github.com/ttnghia)
- Implement search using pylibcudf ([#15166](https://github.com/rapidsai/cudf/pull/15166)) [@vyasr](https://github.com/vyasr)
- Add distinct left join ([#15149](https://github.com/rapidsai/cudf/pull/15149)) [@PointKernel](https://github.com/PointKernel)
- Add cardinality control for groupby benchs with flat types ([#15134](https://github.com/rapidsai/cudf/pull/15134)) [@PointKernel](https://github.com/PointKernel)
- Add ability to request Parquet encodings on a per-column basis ([#15081](https://github.com/rapidsai/cudf/pull/15081)) [@etseidl](https://github.com/etseidl)
- Automate include grouping order in .clang-format ([#15063](https://github.com/rapidsai/cudf/pull/15063)) [@harrism](https://github.com/harrism)
- Requesting a clean build directory also clears Jitify cache ([#15052](https://github.com/rapidsai/cudf/pull/15052)) [@robertmaynard](https://github.com/robertmaynard)
- API for JSON unquoted whitespace normalization ([#15033](https://github.com/rapidsai/cudf/pull/15033)) [@shrshi](https://github.com/shrshi)
- Implement concatenate, lists.explode, merge, sorting, and stream compaction in pylibcudf ([#15011](https://github.com/rapidsai/cudf/pull/15011)) [@vyasr](https://github.com/vyasr)
- Implement replace in pylibcudf ([#15005](https://github.com/rapidsai/cudf/pull/15005)) [@vyasr](https://github.com/vyasr)
- Add distinct key inner join ([#14990](https://github.com/rapidsai/cudf/pull/14990)) [@PointKernel](https://github.com/PointKernel)
- Implement rolling in pylibcudf ([#14982](https://github.com/rapidsai/cudf/pull/14982)) [@vyasr](https://github.com/vyasr)
- Implement joins in pylibcudf ([#14972](https://github.com/rapidsai/cudf/pull/14972)) [@vyasr](https://github.com/vyasr)
- Implement scans and reductions in pylibcudf ([#14970](https://github.com/rapidsai/cudf/pull/14970)) [@vyasr](https://github.com/vyasr)
- Rewrite cudf internals using pylibcudf groupby ([#14946](https://github.com/rapidsai/cudf/pull/14946)) [@vyasr](https://github.com/vyasr)
- Implement groupby in pylibcudf ([#14945](https://github.com/rapidsai/cudf/pull/14945)) [@vyasr](https://github.com/vyasr)
- Support casting of Map type to string in JSON reader ([#14936](https://github.com/rapidsai/cudf/pull/14936)) [@karthikeyann](https://github.com/karthikeyann)
- POC for whitespace removal in input JSON data using FST ([#14931](https://github.com/rapidsai/cudf/pull/14931)) [@shrshi](https://github.com/shrshi)
- Support for LZ4 compression in ORC and Parquet ([#14906](https://github.com/rapidsai/cudf/pull/14906)) [@vuule](https://github.com/vuule)
- Remove supports_streams from cuDF custom memory resources. ([#14857](https://github.com/rapidsai/cudf/pull/14857)) [@harrism](https://github.com/harrism)
- Migrate unary operations to pylibcudf ([#14850](https://github.com/rapidsai/cudf/pull/14850)) [@vyasr](https://github.com/vyasr)
- Migrate binary operations to pylibcudf ([#14821](https://github.com/rapidsai/cudf/pull/14821)) [@vyasr](https://github.com/vyasr)
- Add row index and stripe size options to Python ORC chunked writer ([#14785](https://github.com/rapidsai/cudf/pull/14785)) [@vuule](https://github.com/vuule)
- Support CUDA 12.2 ([#14712](https://github.com/rapidsai/cudf/pull/14712)) [@jameslamb](https://github.com/jameslamb)
## 🛠️ Improvements
- Use `conda env create --yes` instead of `--force` ([#15403](https://github.com/rapidsai/cudf/pull/15403)) [@bdice](https://github.com/bdice)
- Restructure pylibcudf/arrow interop facilities ([#15325](https://github.com/rapidsai/cudf/pull/15325)) [@vyasr](https://github.com/vyasr)
- Change exceptions thrown by copying APIs ([#15319](https://github.com/rapidsai/cudf/pull/15319)) [@vyasr](https://github.com/vyasr)
- Enable branch testing for `cudf.pandas` ([#15316](https://github.com/rapidsai/cudf/pull/15316)) [@galipremsagar](https://github.com/galipremsagar)
- Replace black with ruff-format ([#15312](https://github.com/rapidsai/cudf/pull/15312)) [@mroeschke](https://github.com/mroeschke)
- Fix an NPE when reading empty JSON data by adding a new API for missing information ([#15307](https://github.com/rapidsai/cudf/pull/15307)) [@revans2](https://github.com/revans2)
- Address poor performance of Parquet string decoding ([#15304](https://github.com/rapidsai/cudf/pull/15304)) [@etseidl](https://github.com/etseidl)
- Update script input name ([#15301](https://github.com/rapidsai/cudf/pull/15301)) [@AyodeAwe](https://github.com/AyodeAwe)
- Make test_read_parquet_partitioned_filtered data deterministic ([#15296](https://github.com/rapidsai/cudf/pull/15296)) [@mroeschke](https://github.com/mroeschke)
- Add timeout for `cudf.pandas` pandas tests ([#15284](https://github.com/rapidsai/cudf/pull/15284)) [@galipremsagar](https://github.com/galipremsagar)
- Add upper bound to prevent usage of NumPy 2 ([#15283](https://github.com/rapidsai/cudf/pull/15283)) [@bdice](https://github.com/bdice)
- Fix cudf::test::to_host return of host_vector ([#15263](https://github.com/rapidsai/cudf/pull/15263)) [@davidwendt](https://github.com/davidwendt)
- Implement grouped product scan ([#15254](https://github.com/rapidsai/cudf/pull/15254)) [@wence-](https://github.com/wence-)
- Add CUDA 12.4 to supported PTX versions ([#15247](https://github.com/rapidsai/cudf/pull/15247)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Implement DataFrame|Series.squeeze ([#15244](https://github.com/rapidsai/cudf/pull/15244)) [@mroeschke](https://github.com/mroeschke)
- Roll back ipow changes due to register pressure. ([#15242](https://github.com/rapidsai/cudf/pull/15242)) [@pmattione-nvidia](https://github.com/pmattione-nvidia)
- Remove create_chars_child_column utility ([#15241](https://github.com/rapidsai/cudf/pull/15241)) [@davidwendt](https://github.com/davidwendt)
- Update dlpack to version 0.8 ([#15237](https://github.com/rapidsai/cudf/pull/15237)) [@dantegd](https://github.com/dantegd)
- Improve performance in JSON reader when `mixed_types_as_string` option is enabled ([#15236](https://github.com/rapidsai/cudf/pull/15236)) [@shrshi](https://github.com/shrshi)
- Remove row conversion code from libcudf ([#15234](https://github.com/rapidsai/cudf/pull/15234)) [@ttnghia](https://github.com/ttnghia)
- Use variable substitution for RAPIDS version in Doxyfile ([#15231](https://github.com/rapidsai/cudf/pull/15231)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Add ListColumns.to_pandas(arrow_type=) ([#15228](https://github.com/rapidsai/cudf/pull/15228)) [@mroeschke](https://github.com/mroeschke)
- Treat dask-cudf CI artifacts as pure wheels ([#15223](https://github.com/rapidsai/cudf/pull/15223)) [@bdice](https://github.com/bdice)
- Clean up usage of __CUDA_ARCH__ and other macros. ([#15218](https://github.com/rapidsai/cudf/pull/15218)) [@bdice](https://github.com/bdice)
- DOC: use constants in performance-comparisons.ipynb ([#15215](https://github.com/rapidsai/cudf/pull/15215)) [@raybellwaves](https://github.com/raybellwaves)
- Rewrite conversion in terms of column ([#15213](https://github.com/rapidsai/cudf/pull/15213)) [@vyasr](https://github.com/vyasr)
- Switch `pytest-xdist` algo to `worksteal` ([#15207](https://github.com/rapidsai/cudf/pull/15207)) [@galipremsagar](https://github.com/galipremsagar)
- Deprecate strings_column_view::offsets_begin() ([#15205](https://github.com/rapidsai/cudf/pull/15205)) [@davidwendt](https://github.com/davidwendt)
- Add `get_upstream_resource` method to `stream_checking_resource_adaptor` ([#15203](https://github.com/rapidsai/cudf/pull/15203)) [@miscco](https://github.com/miscco)
- Tune up row size estimation in the data generator ([#15202](https://github.com/rapidsai/cudf/pull/15202)) [@vuule](https://github.com/vuule)
- Fix `offset` value for generating test data in `parquet_chunked_reader_test.cu` ([#15200](https://github.com/rapidsai/cudf/pull/15200)) [@ttnghia](https://github.com/ttnghia)
- Change strings_column_view::char_size to return int64 ([#15197](https://github.com/rapidsai/cudf/pull/15197)) [@davidwendt](https://github.com/davidwendt)
- Fix includes for row_operators.cuh ([#15194](https://github.com/rapidsai/cudf/pull/15194)) [@davidwendt](https://github.com/davidwendt)
- Generalize GHA selectors for pure Python testing ([#15191](https://github.com/rapidsai/cudf/pull/15191)) [@bdice](https://github.com/bdice)
- Improvements for `__cuda_array_interface__` tests ([#15188](https://github.com/rapidsai/cudf/pull/15188)) [@bdice](https://github.com/bdice)
- Allow to_pandas to return pandas.ArrowDtype ([#15182](https://github.com/rapidsai/cudf/pull/15182)) [@mroeschke](https://github.com/mroeschke)
- Ignore `byte_range` in `read_json` when the size is not smaller than the input data ([#15180](https://github.com/rapidsai/cudf/pull/15180)) [@vuule](https://github.com/vuule)
- Expose new stable_sort and finish stream_compaction in pylibcudf ([#15175](https://github.com/rapidsai/cudf/pull/15175)) [@wence-](https://github.com/wence-)
- [ci] update matrix filters for dask-cudf builds ([#15174](https://github.com/rapidsai/cudf/pull/15174)) [@jameslamb](https://github.com/jameslamb)
- Change make_strings_children to return uvector ([#15171](https://github.com/rapidsai/cudf/pull/15171)) [@davidwendt](https://github.com/davidwendt)
- Don't override to_pandas for Datelike columns ([#15167](https://github.com/rapidsai/cudf/pull/15167)) [@mroeschke](https://github.com/mroeschke)
- Drop python-snappy from dependencies. ([#15161](https://github.com/rapidsai/cudf/pull/15161)) [@bdice](https://github.com/bdice)
- Add microkernels for fixed-width and fixed-width dictionary in Parquet decode ([#15159](https://github.com/rapidsai/cudf/pull/15159)) [@abellina](https://github.com/abellina)
- Make HostColumnVector.DataType accessor methods public ([#15157](https://github.com/rapidsai/cudf/pull/15157)) [@jbrennan333](https://github.com/jbrennan333)
- Java bindings for left outer distinct join ([#15154](https://github.com/rapidsai/cudf/pull/15154)) [@jlowe](https://github.com/jlowe)
- Forward-merge branch-24.02 to branch-24.04 ([#15153](https://github.com/rapidsai/cudf/pull/15153)) [@bdice](https://github.com/bdice)
- Enable pandas pytests for `cudf.pandas` ([#15147](https://github.com/rapidsai/cudf/pull/15147)) [@galipremsagar](https://github.com/galipremsagar)
- Add java option to keep quotes for JSON reads ([#15146](https://github.com/rapidsai/cudf/pull/15146)) [@revans2](https://github.com/revans2)
- Change cross-pandas-version testing in `cudf` ([#15145](https://github.com/rapidsai/cudf/pull/15145)) [@galipremsagar](https://github.com/galipremsagar)
- Use `hostdevice_vector` in `kernel_error` to avoid the pageable copy ([#15140](https://github.com/rapidsai/cudf/pull/15140)) [@vuule](https://github.com/vuule)
- Clean up Columns.astype & cudf.dtype ([#15125](https://github.com/rapidsai/cudf/pull/15125)) [@mroeschke](https://github.com/mroeschke)
- Simplify some to_pandas implementations ([#15123](https://github.com/rapidsai/cudf/pull/15123)) [@mroeschke](https://github.com/mroeschke)
- Java: Add leak tracking for Scalar instances ([#15121](https://github.com/rapidsai/cudf/pull/15121)) [@jlowe](https://github.com/jlowe)
- Remove calls to strings_column_view::offsets_begin() ([#15112](https://github.com/rapidsai/cudf/pull/15112)) [@davidwendt](https://github.com/davidwendt)
- Add support for Python 3.11, require NumPy 1.23+ ([#15111](https://github.com/rapidsai/cudf/pull/15111)) [@jameslamb](https://github.com/jameslamb)
- Compile-time ipow computation with array lookup ([#15110](https://github.com/rapidsai/cudf/pull/15110)) [@pmattione-nvidia](https://github.com/pmattione-nvidia)
- Upgrade to `arrow-14.0.2` ([#15108](https://github.com/rapidsai/cudf/pull/15108)) [@galipremsagar](https://github.com/galipremsagar)
- Dynamically set version in RAPIDS doc builds ([#15101](https://github.com/rapidsai/cudf/pull/15101)) [@jakirkham](https://github.com/jakirkham)
- Add support for `pandas-2.2` in `cudf` ([#15100](https://github.com/rapidsai/cudf/pull/15100)) [@galipremsagar](https://github.com/galipremsagar)
- Update devcontainers to CUDA Toolkit 12.2 ([#15099](https://github.com/rapidsai/cudf/pull/15099)) [@trxcllnt](https://github.com/trxcllnt)
- Fix `datetime` binop pytest failures in pandas-2.2 ([#15090](https://github.com/rapidsai/cudf/pull/15090)) [@galipremsagar](https://github.com/galipremsagar)
- Validate types in pylibcudf Column/Table constructors ([#15088](https://github.com/rapidsai/cudf/pull/15088)) [@wence-](https://github.com/wence-)
- xfail test_join_ordering_pandas_compat for pandas 2.2 ([#15080](https://github.com/rapidsai/cudf/pull/15080)) [@mroeschke](https://github.com/mroeschke)
- Add general purpose host memory allocator reference to cuIO with a demo of pooled-pinned allocation. ([#15079](https://github.com/rapidsai/cudf/pull/15079)) [@nvdbaranec](https://github.com/nvdbaranec)
- Adjust test_binops for pandas 2.2 ([#15078](https://github.com/rapidsai/cudf/pull/15078)) [@mroeschke](https://github.com/mroeschke)
- Remove offsets_begin() call from nvtext::generate_ngrams ([#15077](https://github.com/rapidsai/cudf/pull/15077)) [@davidwendt](https://github.com/davidwendt)
- Use offsetalator in cudf::detail::has_nonempty_null_rows ([#15076](https://github.com/rapidsai/cudf/pull/15076)) [@davidwendt](https://github.com/davidwendt)
- Deprecate cudf::hashing::spark_murmurhash3_x86_32 ([#15074](https://github.com/rapidsai/cudf/pull/15074)) [@davidwendt](https://github.com/davidwendt)
- Fix cudf::test::to_host to handle both offset types for strings columns ([#15073](https://github.com/rapidsai/cudf/pull/15073)) [@davidwendt](https://github.com/davidwendt)
- Add condition for test_groupby_nulls_basic in pandas 2.2 ([#15072](https://github.com/rapidsai/cudf/pull/15072)) [@mroeschke](https://github.com/mroeschke)
- xfail tests in test_udf_masked_ops due to pandas 2.2 bug ([#15071](https://github.com/rapidsai/cudf/pull/15071)) [@mroeschke](https://github.com/mroeschke)
- target branch-24.04 for GitHub Actions workflows ([#15069](https://github.com/rapidsai/cudf/pull/15069)) [@jameslamb](https://github.com/jameslamb)
- Implement stable version of `cudf::sort` ([#15066](https://github.com/rapidsai/cudf/pull/15066)) [@wence-](https://github.com/wence-)
- Fix ORC and JSON tests failures for pandas 2.2 ([#15062](https://github.com/rapidsai/cudf/pull/15062)) [@mroeschke](https://github.com/mroeschke)
- Adjust test_joining for pandas 2.2 ([#15060](https://github.com/rapidsai/cudf/pull/15060)) [@mroeschke](https://github.com/mroeschke)
- Align MultiIndex.get_indexer with pandas 2.2 change ([#15059](https://github.com/rapidsai/cudf/pull/15059)) [@mroeschke](https://github.com/mroeschke)
- Fix test_resample index dtype checking for pandas 2.2 ([#15058](https://github.com/rapidsai/cudf/pull/15058)) [@mroeschke](https://github.com/mroeschke)
- Split out strings/replace.cu and rework its gtests ([#15054](https://github.com/rapidsai/cudf/pull/15054)) [@davidwendt](https://github.com/davidwendt)
- Avoid incompatible value type setting in test_rolling for pandas 2.2 ([#15050](https://github.com/rapidsai/cudf/pull/15050)) [@mroeschke](https://github.com/mroeschke)
- Change chained replace inplace test to COW test for pandas 2.2 ([#15049](https://github.com/rapidsai/cudf/pull/15049)) [@mroeschke](https://github.com/mroeschke)
- Deprecate datelike isin casting strings to dates to match pandas 2.2 ([#15046](https://github.com/rapidsai/cudf/pull/15046)) [@mroeschke](https://github.com/mroeschke)
- Avoid chained indexing in test_indexing for pandas 2.2 ([#15045](https://github.com/rapidsai/cudf/pull/15045)) [@mroeschke](https://github.com/mroeschke)
- Avoid pandas 2.2 `DeprecationWarning` in test_hdf ([#15044](https://github.com/rapidsai/cudf/pull/15044)) [@mroeschke](https://github.com/mroeschke)
- Use appropriate make_offsets_child_column for building lists columns ([#15043](https://github.com/rapidsai/cudf/pull/15043)) [@davidwendt](https://github.com/davidwendt)
- Factor out position-offsets logic from strings split_helper utility ([#15040](https://github.com/rapidsai/cudf/pull/15040)) [@davidwendt](https://github.com/davidwendt)
- Forward-merge branch-24.02 to branch-24.04 ([#15039](https://github.com/rapidsai/cudf/pull/15039)) [@bdice](https://github.com/bdice)
- Clean up nvtx macros ([#15038](https://github.com/rapidsai/cudf/pull/15038)) [@PointKernel](https://github.com/PointKernel)
- Add xfailures for test_applymap for pandas 2.2 ([#15034](https://github.com/rapidsai/cudf/pull/15034)) [@mroeschke](https://github.com/mroeschke)
- Expose libcudf filter expression in read_parquet ([#15028](https://github.com/rapidsai/cudf/pull/15028)) [@wence-](https://github.com/wence-)
- Adjust tests in test_dataframe.py for pandas 2.2 ([#15023](https://github.com/rapidsai/cudf/pull/15023)) [@mroeschke](https://github.com/mroeschke)
- Adjust test_datetime_infer_format for pandas 2.2 ([#15021](https://github.com/rapidsai/cudf/pull/15021)) [@mroeschke](https://github.com/mroeschke)
- Performance optimizations for parquet sub-rowgroup reader. ([#15020](https://github.com/rapidsai/cudf/pull/15020)) [@nvdbaranec](https://github.com/nvdbaranec)
- JNI bindings for distinct_hash_join ([#15019](https://github.com/rapidsai/cudf/pull/15019)) [@jlowe](https://github.com/jlowe)
- Change copy_if_safe to call thrust instead of the overload function ([#15018](https://github.com/rapidsai/cudf/pull/15018)) [@davidwendt](https://github.com/davidwendt)
- Improve performance of copy_if_else for long strings ([#15017](https://github.com/rapidsai/cudf/pull/15017)) [@davidwendt](https://github.com/davidwendt)
- Fix is_string_dtype test for pandas 2.2 ([#15012](https://github.com/rapidsai/cudf/pull/15012)) [@mroeschke](https://github.com/mroeschke)
- Rework cudf::strings::detail::copy_range for offsetalator ([#15010](https://github.com/rapidsai/cudf/pull/15010)) [@davidwendt](https://github.com/davidwendt)
- Use offsetalator in cudf::get_json_object() ([#15009](https://github.com/rapidsai/cudf/pull/15009)) [@davidwendt](https://github.com/davidwendt)
- Align integral types in ORC to specs ([#15008](https://github.com/rapidsai/cudf/pull/15008)) [@vuule](https://github.com/vuule)
- Clean up detail sequence header inclusion ([#15007](https://github.com/rapidsai/cudf/pull/15007)) [@PointKernel](https://github.com/PointKernel)
- Add groupby.apply(include_groups=) to match pandas 2.2 deprecation ([#15006](https://github.com/rapidsai/cudf/pull/15006)) [@mroeschke](https://github.com/mroeschke)
- Use offsetalator in cudf::interleave_columns() ([#15004](https://github.com/rapidsai/cudf/pull/15004)) [@davidwendt](https://github.com/davidwendt)
- Use offsetalator in cudf::row_bit_count() ([#15003](https://github.com/rapidsai/cudf/pull/15003)) [@davidwendt](https://github.com/davidwendt)
- Use offsetalator in cudf::strings::wrap() ([#15002](https://github.com/rapidsai/cudf/pull/15002)) [@davidwendt](https://github.com/davidwendt)
- Use offsetalator in cudf::strings::reverse ([#15001](https://github.com/rapidsai/cudf/pull/15001)) [@davidwendt](https://github.com/davidwendt)
- Deprecate groupby fillna ([#15000](https://github.com/rapidsai/cudf/pull/15000)) [@mroeschke](https://github.com/mroeschke)
- Ensure to_* IO methods respect pandas 2.2 keyword only deprecation ([#14999](https://github.com/rapidsai/cudf/pull/14999)) [@mroeschke](https://github.com/mroeschke)
- Remove unneeded calls to create_chars_child_column utility ([#14997](https://github.com/rapidsai/cudf/pull/14997)) [@davidwendt](https://github.com/davidwendt)
- Add environment-agnostic scripts for running ctests and pytests ([#14992](https://github.com/rapidsai/cudf/pull/14992)) [@trxcllnt](https://github.com/trxcllnt)
- Filter all `DeprecationWarning`s by `ArrowTable.to_pandas()` ([#14989](https://github.com/rapidsai/cudf/pull/14989)) [@galipremsagar](https://github.com/galipremsagar)
- Deprecate replace with categorical columns ([#14988](https://github.com/rapidsai/cudf/pull/14988)) [@mroeschke](https://github.com/mroeschke)
- Deprecate delim_whitespace in read_csv for pandas 2.2 ([#14986](https://github.com/rapidsai/cudf/pull/14986)) [@mroeschke](https://github.com/mroeschke)
- Deprecate parameters similar to pandas 2.2 ([#14984](https://github.com/rapidsai/cudf/pull/14984)) [@mroeschke](https://github.com/mroeschke)
- Ensure that `ctest` is called with `--no-tests=error`. ([#14983](https://github.com/rapidsai/cudf/pull/14983)) [@bdice](https://github.com/bdice)
- Deprecate non-integer `periods` in `date_range` and `interval_range` ([#14976](https://github.com/rapidsai/cudf/pull/14976)) [@galipremsagar](https://github.com/galipremsagar)
- Update ops-bot.yaml ([#14974](https://github.com/rapidsai/cudf/pull/14974)) [@AyodeAwe](https://github.com/AyodeAwe)
- Use page statistics in Parquet reader ([#14973](https://github.com/rapidsai/cudf/pull/14973)) [@etseidl](https://github.com/etseidl)
- Use fused types for overloaded function signatures ([#14969](https://github.com/rapidsai/cudf/pull/14969)) [@vyasr](https://github.com/vyasr)
- Deprecate certain frequency strings ([#14967](https://github.com/rapidsai/cudf/pull/14967)) [@galipremsagar](https://github.com/galipremsagar)
- Update copyrights for 24.04. ([#14964](https://github.com/rapidsai/cudf/pull/14964)) [@bdice](https://github.com/bdice)
- Add missing atomic operators, refactor atomic operators, move atomic operators to detail namespace. ([#14962](https://github.com/rapidsai/cudf/pull/14962)) [@bdice](https://github.com/bdice)
- Introduce `GetJsonObjectOptions` in `getJSONObject` Java API ([#14956](https://github.com/rapidsai/cudf/pull/14956)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- JNI JSON read with DataSource and inferred schema, along with basic Java nested Schema JSON reads ([#14954](https://github.com/rapidsai/cudf/pull/14954)) [@revans2](https://github.com/revans2)
- Make codecov only informational (always pass). ([#14952](https://github.com/rapidsai/cudf/pull/14952)) [@bdice](https://github.com/bdice)
- Replace legacy cudf and dask_cudf imports as (d)gd ([#14944](https://github.com/rapidsai/cudf/pull/14944)) [@mroeschke](https://github.com/mroeschke)
- Replace _is_datetime64tz/interval_dtype with isinstance ([#14943](https://github.com/rapidsai/cudf/pull/14943)) [@mroeschke](https://github.com/mroeschke)
- Update tests for pandas 2. ([#14941](https://github.com/rapidsai/cudf/pull/14941)) [@bdice](https://github.com/bdice)
- Use more public pandas APIs ([#14929](https://github.com/rapidsai/cudf/pull/14929)) [@mroeschke](https://github.com/mroeschke)
- Replace local copyright check with pre-commit-hooks verify-copyright ([#14917](https://github.com/rapidsai/cudf/pull/14917)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Add `pandas-2.x` support in `cudf` ([#14916](https://github.com/rapidsai/cudf/pull/14916)) [@galipremsagar](https://github.com/galipremsagar)
- Use offsetalator in nvtext::byte_pair_encoding ([#14888](https://github.com/rapidsai/cudf/pull/14888)) [@davidwendt](https://github.com/davidwendt)
- De-DOS line-endings ([#14880](https://github.com/rapidsai/cudf/pull/14880)) [@wence-](https://github.com/wence-)
- Add detail `cuco_allocator` ([#14877](https://github.com/rapidsai/cudf/pull/14877)) [@PointKernel](https://github.com/PointKernel)
- Move all core types to using enum class in Cython ([#14876](https://github.com/rapidsai/cudf/pull/14876)) [@vyasr](https://github.com/vyasr)
- Read `cudf.__version__` in Sphinx build ([#14872](https://github.com/rapidsai/cudf/pull/14872)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Use int64 offset types for accessing code-points in nvtext::normalize ([#14868](https://github.com/rapidsai/cudf/pull/14868)) [@davidwendt](https://github.com/davidwendt)
- Read version from VERSION file in CMake ([#14867](https://github.com/rapidsai/cudf/pull/14867)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Update conda-cpp-post-build-checks to branch-24.04. ([#14854](https://github.com/rapidsai/cudf/pull/14854)) [@bdice](https://github.com/bdice)
- Update cudf for compatibility with the latest cuco ([#14849](https://github.com/rapidsai/cudf/pull/14849)) [@PointKernel](https://github.com/PointKernel)
- Remove deprecated strings functions ([#14848](https://github.com/rapidsai/cudf/pull/14848)) [@davidwendt](https://github.com/davidwendt)
- Fix CI workflows for pandas-tests and add test summary. ([#14847](https://github.com/rapidsai/cudf/pull/14847)) [@bdice](https://github.com/bdice)
- Use offsetalator in cudf::strings::copy_slice ([#14844](https://github.com/rapidsai/cudf/pull/14844)) [@davidwendt](https://github.com/davidwendt)
- Fix V2 Parquet page alignment for use with zStandard compression ([#14841](https://github.com/rapidsai/cudf/pull/14841)) [@etseidl](https://github.com/etseidl)
- Fix calls to deprecated strings factory API in examples. ([#14838](https://github.com/rapidsai/cudf/pull/14838)) [@bdice](https://github.com/bdice)
- Update pre-commit hooks ([#14837](https://github.com/rapidsai/cudf/pull/14837)) [@bdice](https://github.com/bdice)
- Use `rapids_cuda_set_runtime` to determine cuda runtime usage by target ([#14833](https://github.com/rapidsai/cudf/pull/14833)) [@vyasr](https://github.com/vyasr)
- Remove get_mem_info functions from custom memory resources ([#14832](https://github.com/rapidsai/cudf/pull/14832)) [@harrism](https://github.com/harrism)
- Fix debug build by splitting row_operator_tests_utilities.cu ([#14826](https://github.com/rapidsai/cudf/pull/14826)) [@davidwendt](https://github.com/davidwendt)
- Remove -DNVBench_ENABLE_CUPTI=OFF. ([#14820](https://github.com/rapidsai/cudf/pull/14820)) [@bdice](https://github.com/bdice)
- Use cuco::static_set in the hash-based groupby ([#14813](https://github.com/rapidsai/cudf/pull/14813)) [@PointKernel](https://github.com/PointKernel)
- Branch 24.04 merge branch 24.02 ([#14809](https://github.com/rapidsai/cudf/pull/14809)) [@vyasr](https://github.com/vyasr)
- Branch 24.04 merge branch 24.02 ([#14806](https://github.com/rapidsai/cudf/pull/14806)) [@vyasr](https://github.com/vyasr)
- Introduce basic "cudf" backend for Dask Expressions ([#14805](https://github.com/rapidsai/cudf/pull/14805)) [@rjzamora](https://github.com/rjzamora)
- Remove `build_struct|list_column` ([#14786](https://github.com/rapidsai/cudf/pull/14786)) [@mroeschke](https://github.com/mroeschke)
- Use offsetalator in nvtext tokenize functions ([#14783](https://github.com/rapidsai/cudf/pull/14783)) [@davidwendt](https://github.com/davidwendt)
- Reduce execution time of Python ORC tests ([#14776](https://github.com/rapidsai/cudf/pull/14776)) [@vuule](https://github.com/vuule)
- Use offsetalator in cudf::strings::split functions ([#14757](https://github.com/rapidsai/cudf/pull/14757)) [@davidwendt](https://github.com/davidwendt)
- Use offsetalator in cudf::strings::findall ([#14745](https://github.com/rapidsai/cudf/pull/14745)) [@davidwendt](https://github.com/davidwendt)
- Use offsetalator in cudf::strings::url_decode ([#14744](https://github.com/rapidsai/cudf/pull/14744)) [@davidwendt](https://github.com/davidwendt)
- Use get_offset_value utility in strings shift function ([#14743](https://github.com/rapidsai/cudf/pull/14743)) [@davidwendt](https://github.com/davidwendt)
- Use as_column instead of full ([#14698](https://github.com/rapidsai/cudf/pull/14698)) [@mroeschke](https://github.com/mroeschke)
- List all notable breaking changes ([#13535](https://github.com/rapidsai/cudf/pull/13535)) [@galipremsagar](https://github.com/galipremsagar)
# cuDF 24.02.00 (12 Feb 2024)
## 🚨 Breaking Changes
- Remove **kwargs from astype ([#14765](https://github.com/rapidsai/cudf/pull/14765)) [@mroeschke](https://github.com/mroeschke)
- Remove mimesis as a testing dependency ([#14723](https://github.com/rapidsai/cudf/pull/14723)) [@mroeschke](https://github.com/mroeschke)
- Update to Dask's `shuffle_method` kwarg ([#14708](https://github.com/rapidsai/cudf/pull/14708)) [@pentschev](https://github.com/pentschev)
- Drop Pascal GPU support. ([#14630](https://github.com/rapidsai/cudf/pull/14630)) [@bdice](https://github.com/bdice)
- Update to CCCL 2.2.0. ([#14576](https://github.com/rapidsai/cudf/pull/14576)) [@bdice](https://github.com/bdice)
- Expunge as_frame conversions in Column algorithms ([#14491](https://github.com/rapidsai/cudf/pull/14491)) [@wence-](https://github.com/wence-)
- Deprecate cudf::make_strings_column accepting typed offsets ([#14461](https://github.com/rapidsai/cudf/pull/14461)) [@davidwendt](https://github.com/davidwendt)
- Remove deprecated nvtext::load_merge_pairs_file ([#14460](https://github.com/rapidsai/cudf/pull/14460)) [@davidwendt](https://github.com/davidwendt)
- Include writer code and writerVersion in ORC files ([#14458](https://github.com/rapidsai/cudf/pull/14458)) [@vuule](https://github.com/vuule)
- Remove null mask for zero nulls in json readers ([#14451](https://github.com/rapidsai/cudf/pull/14451)) [@karthikeyann](https://github.com/karthikeyann)
- REF: Remove **kwargs from to_pandas, raise if nullable is not implemented ([#14438](https://github.com/rapidsai/cudf/pull/14438)) [@mroeschke](https://github.com/mroeschke)
- Consolidate 1D pandas object handling in as_column ([#14394](https://github.com/rapidsai/cudf/pull/14394)) [@mroeschke](https://github.com/mroeschke)
- Move chars column to parent data buffer in strings column ([#14202](https://github.com/rapidsai/cudf/pull/14202)) [@karthikeyann](https://github.com/karthikeyann)
- Switch to scikit-build-core ([#13531](https://github.com/rapidsai/cudf/pull/13531)) [@vyasr](https://github.com/vyasr)
## 🐛 Bug Fixes
- Exclude tests from builds ([#14981](https://github.com/rapidsai/cudf/pull/14981)) [@vyasr](https://github.com/vyasr)
- Fix the bounce buffer size in ORC writer ([#14947](https://github.com/rapidsai/cudf/pull/14947)) [@vuule](https://github.com/vuule)
- Revert sum/product aggregation to always produce `int64_t` type ([#14907](https://github.com/rapidsai/cudf/pull/14907)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Fixed an issue with output chunking computation stemming from input chunking. ([#14889](https://github.com/rapidsai/cudf/pull/14889)) [@nvdbaranec](https://github.com/nvdbaranec)
- Fix total_byte_size in Parquet row group metadata ([#14802](https://github.com/rapidsai/cudf/pull/14802)) [@etseidl](https://github.com/etseidl)
- Fix index difference to follow the pandas format ([#14789](https://github.com/rapidsai/cudf/pull/14789)) [@amiralimi](https://github.com/amiralimi)
- Fix shared-workflows repo name ([#14784](https://github.com/rapidsai/cudf/pull/14784)) [@raydouglass](https://github.com/raydouglass)
- Remove unparseable attributes from all nodes ([#14780](https://github.com/rapidsai/cudf/pull/14780)) [@vyasr](https://github.com/vyasr)
- Refactor and add validation to IntervalIndex.__init__ ([#14778](https://github.com/rapidsai/cudf/pull/14778)) [@mroeschke](https://github.com/mroeschke)
- Work around incompatibilities between V2 page header handling and zStandard compression in Parquet writer ([#14772](https://github.com/rapidsai/cudf/pull/14772)) [@etseidl](https://github.com/etseidl)
- Fix calls to deprecated strings factory API ([#14771](https://github.com/rapidsai/cudf/pull/14771)) [@davidwendt](https://github.com/davidwendt)
- Fix ptx file discovery in editable installs ([#14767](https://github.com/rapidsai/cudf/pull/14767)) [@vyasr](https://github.com/vyasr)
- Revise ``shuffle`` deprecation to align with dask/dask ([#14762](https://github.com/rapidsai/cudf/pull/14762)) [@rjzamora](https://github.com/rjzamora)
- Enable intermediate proxies to be picklable ([#14752](https://github.com/rapidsai/cudf/pull/14752)) [@shwina](https://github.com/shwina)
- Add CUDF_TEST_PROGRAM_MAIN macro to tests lacking it ([#14751](https://github.com/rapidsai/cudf/pull/14751)) [@etseidl](https://github.com/etseidl)
- Fix CMake args ([#14746](https://github.com/rapidsai/cudf/pull/14746)) [@vyasr](https://github.com/vyasr)
- Fix logic bug introduced in #14730 ([#14742](https://github.com/rapidsai/cudf/pull/14742)) [@wence-](https://github.com/wence-)
- [Java] Choose The Correct RoundingMode For Checking Decimal OutOfBounds ([#14731](https://github.com/rapidsai/cudf/pull/14731)) [@razajafri](https://github.com/razajafri)
- Fix ``Groupby.get_group`` ([#14728](https://github.com/rapidsai/cudf/pull/14728)) [@rjzamora](https://github.com/rjzamora)
- Ensure that all CUDA kernels in cudf have hidden visibility. ([#14726](https://github.com/rapidsai/cudf/pull/14726)) [@robertmaynard](https://github.com/robertmaynard)
- Split cuda versions for notebook testing ([#14722](https://github.com/rapidsai/cudf/pull/14722)) [@raydouglass](https://github.com/raydouglass)
- Fix to_numeric not preserving Series index and name ([#14718](https://github.com/rapidsai/cudf/pull/14718)) [@mroeschke](https://github.com/mroeschke)
- Update dask-cudf wheel name ([#14713](https://github.com/rapidsai/cudf/pull/14713)) [@raydouglass](https://github.com/raydouglass)
- Fix strings::contains matching end of string target ([#14711](https://github.com/rapidsai/cudf/pull/14711)) [@davidwendt](https://github.com/davidwendt)
- Update to Dask's `shuffle_method` kwarg ([#14708](https://github.com/rapidsai/cudf/pull/14708)) [@pentschev](https://github.com/pentschev)
- Write file-level statistics when writing ORC files with zero rows ([#14707](https://github.com/rapidsai/cudf/pull/14707)) [@vuule](https://github.com/vuule)
- Potential fix for performance regression in #14415 ([#14706](https://github.com/rapidsai/cudf/pull/14706)) [@etseidl](https://github.com/etseidl)
- Ensure DataFrame column types are preserved during serialization ([#14705](https://github.com/rapidsai/cudf/pull/14705)) [@mroeschke](https://github.com/mroeschke)
- Skip numba test that fails on ARM ([#14702](https://github.com/rapidsai/cudf/pull/14702)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Allow Z in datetime string parsing in non pandas compat mode ([#14701](https://github.com/rapidsai/cudf/pull/14701)) [@mroeschke](https://github.com/mroeschke)
- Fix nan_as_null not being respected when passing arrow object ([#14688](https://github.com/rapidsai/cudf/pull/14688)) [@mroeschke](https://github.com/mroeschke)
- Fix constructing Series/Index from arrow array and dtype ([#14686](https://github.com/rapidsai/cudf/pull/14686)) [@mroeschke](https://github.com/mroeschke)
- Fix Aggregation Type Promotion: Ensure Unsigned Input Types Result in Unsigned Output for Sum and Multiply ([#14679](https://github.com/rapidsai/cudf/pull/14679)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Add BaseOffset as a final proxy type to pass instancechecks for offsets against `BaseOffset` ([#14678](https://github.com/rapidsai/cudf/pull/14678)) [@shwina](https://github.com/shwina)
- Add row conversion code from spark-rapids-jni ([#14664](https://github.com/rapidsai/cudf/pull/14664)) [@ttnghia](https://github.com/ttnghia)
- Unconditionally export the CCCL path ([#14656](https://github.com/rapidsai/cudf/pull/14656)) [@vyasr](https://github.com/vyasr)
- Ensure libcudf searches for our patched version of CCCL first ([#14655](https://github.com/rapidsai/cudf/pull/14655)) [@robertmaynard](https://github.com/robertmaynard)
- Constrain CUDA in notebook testing to prevent CUDA 12.1 usage until we have pynvjitlink ([#14648](https://github.com/rapidsai/cudf/pull/14648)) [@vyasr](https://github.com/vyasr)
- Fix invalid memory access in Parquet reader ([#14637](https://github.com/rapidsai/cudf/pull/14637)) [@etseidl](https://github.com/etseidl)
- Use column_empty over as_column([]) ([#14632](https://github.com/rapidsai/cudf/pull/14632)) [@mroeschke](https://github.com/mroeschke)
- Add (implicit) handling for torch tensors in is_scalar ([#14623](https://github.com/rapidsai/cudf/pull/14623)) [@wence-](https://github.com/wence-)
- Fix astype/fillna not maintaining column subclass and types ([#14615](https://github.com/rapidsai/cudf/pull/14615)) [@mroeschke](https://github.com/mroeschke)
- Remove non-empty nulls in cudf::get_json_object ([#14609](https://github.com/rapidsai/cudf/pull/14609)) [@davidwendt](https://github.com/davidwendt)
- Remove `cuda::proclaim_return_type` from nested lambda ([#14607](https://github.com/rapidsai/cudf/pull/14607)) [@ttnghia](https://github.com/ttnghia)
- Fix DataFrame.reindex when column reindexing to MultiIndex/RangeIndex ([#14605](https://github.com/rapidsai/cudf/pull/14605)) [@mroeschke](https://github.com/mroeschke)
- Address potential race conditions in Parquet reader ([#14602](https://github.com/rapidsai/cudf/pull/14602)) [@etseidl](https://github.com/etseidl)
- Fix DataFrame.reindex removing column name ([#14601](https://github.com/rapidsai/cudf/pull/14601)) [@mroeschke](https://github.com/mroeschke)
- Remove unsanitized input test data from copy gtests ([#14600](https://github.com/rapidsai/cudf/pull/14600)) [@davidwendt](https://github.com/davidwendt)
- Fix race detected in Parquet writer ([#14598](https://github.com/rapidsai/cudf/pull/14598)) [@etseidl](https://github.com/etseidl)
- Correct invalid or missing return types ([#14587](https://github.com/rapidsai/cudf/pull/14587)) [@robertmaynard](https://github.com/robertmaynard)
- Fix unsanitized nulls from strings segmented-reduce ([#14586](https://github.com/rapidsai/cudf/pull/14586)) [@davidwendt](https://github.com/davidwendt)
- Upgrade to nvCOMP 3.0.5 ([#14581](https://github.com/rapidsai/cudf/pull/14581)) [@davidwendt](https://github.com/davidwendt)
- Fix unsanitized nulls produced by `cudf::clamp` APIs ([#14580](https://github.com/rapidsai/cudf/pull/14580)) [@davidwendt](https://github.com/davidwendt)
- Fix unsanitized nulls produced by libcudf dictionary decode ([#14578](https://github.com/rapidsai/cudf/pull/14578)) [@davidwendt](https://github.com/davidwendt)
- Fixes a symbol group lookup table issue ([#14561](https://github.com/rapidsai/cudf/pull/14561)) [@elstehle](https://github.com/elstehle)
- Drop llvm16 from cuda118-conda devcontainer image ([#14526](https://github.com/rapidsai/cudf/pull/14526)) [@charlesbluca](https://github.com/charlesbluca)
- REF: Make DataFrame.from_pandas process by column ([#14483](https://github.com/rapidsai/cudf/pull/14483)) [@mroeschke](https://github.com/mroeschke)
- Improve memory footprint of isin by using contains ([#14478](https://github.com/rapidsai/cudf/pull/14478)) [@wence-](https://github.com/wence-)
- Move creation of env.yaml outside the current directory ([#14476](https://github.com/rapidsai/cudf/pull/14476)) [@davidwendt](https://github.com/davidwendt)
- Enable `pd.Timestamp` objects to be picklable when `cudf.pandas` is active ([#14474](https://github.com/rapidsai/cudf/pull/14474)) [@shwina](https://github.com/shwina)
- Correct dtype of count aggregations on empty dataframes ([#14473](https://github.com/rapidsai/cudf/pull/14473)) [@wence-](https://github.com/wence-)
- Avoid DataFrame conversion in `MultiIndex.from_pandas` ([#14470](https://github.com/rapidsai/cudf/pull/14470)) [@mroeschke](https://github.com/mroeschke)
- JSON writer: avoid default stream use in `string_scalar` constructors ([#14444](https://github.com/rapidsai/cudf/pull/14444)) [@vuule](https://github.com/vuule)
- Fix default stream use in the CSV reader ([#14443](https://github.com/rapidsai/cudf/pull/14443)) [@vuule](https://github.com/vuule)
- Preserve DataFrame(columns=).columns dtype during empty-like construction ([#14381](https://github.com/rapidsai/cudf/pull/14381)) [@mroeschke](https://github.com/mroeschke)
- Defer PTX file load to runtime ([#13690](https://github.com/rapidsai/cudf/pull/13690)) [@brandon-b-miller](https://github.com/brandon-b-miller)
## 📖 Documentation
- Disable parallel build ([#14796](https://github.com/rapidsai/cudf/pull/14796)) [@vyasr](https://github.com/vyasr)
- Add pylibcudf to the docs ([#14791](https://github.com/rapidsai/cudf/pull/14791)) [@vyasr](https://github.com/vyasr)
- Describe unpickling expectations when cudf.pandas is enabled ([#14693](https://github.com/rapidsai/cudf/pull/14693)) [@shwina](https://github.com/shwina)
- Update CONTRIBUTING for pyproject-only builds ([#14653](https://github.com/rapidsai/cudf/pull/14653)) [@vyasr](https://github.com/vyasr)
- More doxygen fixes ([#14639](https://github.com/rapidsai/cudf/pull/14639)) [@vyasr](https://github.com/vyasr)
- Enable doxygen XML generation and fix issues ([#14477](https://github.com/rapidsai/cudf/pull/14477)) [@vyasr](https://github.com/vyasr)
- Some doxygen improvements ([#14469](https://github.com/rapidsai/cudf/pull/14469)) [@vyasr](https://github.com/vyasr)
- Remove warning in dask-cudf docs ([#14454](https://github.com/rapidsai/cudf/pull/14454)) [@wence-](https://github.com/wence-)
- Update README links with redirects. ([#14378](https://github.com/rapidsai/cudf/pull/14378)) [@bdice](https://github.com/bdice)
- Add pip install instructions to README ([#13677](https://github.com/rapidsai/cudf/pull/13677)) [@shwina](https://github.com/shwina)
## 🚀 New Features
- Add ci check for external kernels ([#14768](https://github.com/rapidsai/cudf/pull/14768)) [@robertmaynard](https://github.com/robertmaynard)
- JSON single quote normalization API ([#14729](https://github.com/rapidsai/cudf/pull/14729)) [@shrshi](https://github.com/shrshi)
- Write cuDF version in Parquet "created_by" metadata field ([#14721](https://github.com/rapidsai/cudf/pull/14721)) [@etseidl](https://github.com/etseidl)
- Implement remaining copying APIs in pylibcudf along with required helper functions ([#14640](https://github.com/rapidsai/cudf/pull/14640)) [@vyasr](https://github.com/vyasr)
- Don't constrain `numba<0.58` ([#14616](https://github.com/rapidsai/cudf/pull/14616)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Add DELTA_LENGTH_BYTE_ARRAY encoder and decoder for Parquet ([#14590](https://github.com/rapidsai/cudf/pull/14590)) [@etseidl](https://github.com/etseidl)
- JSON - Parse mixed types as string in JSON reader ([#14572](https://github.com/rapidsai/cudf/pull/14572)) [@karthikeyann](https://github.com/karthikeyann)
- JSON quote normalization ([#14545](https://github.com/rapidsai/cudf/pull/14545)) [@shrshi](https://github.com/shrshi)
- Make DefaultHostMemoryAllocator settable ([#14523](https://github.com/rapidsai/cudf/pull/14523)) [@gerashegalov](https://github.com/gerashegalov)
- Implement more copying APIs in pylibcudf ([#14508](https://github.com/rapidsai/cudf/pull/14508)) [@vyasr](https://github.com/vyasr)
- Include writer code and writerVersion in ORC files ([#14458](https://github.com/rapidsai/cudf/pull/14458)) [@vuule](https://github.com/vuule)
- Parquet sub-rowgroup reading. ([#14360](https://github.com/rapidsai/cudf/pull/14360)) [@nvdbaranec](https://github.com/nvdbaranec)
- Move chars column to parent data buffer in strings column ([#14202](https://github.com/rapidsai/cudf/pull/14202)) [@karthikeyann](https://github.com/karthikeyann)
- PARQUET-2261 Size Statistics ([#14000](https://github.com/rapidsai/cudf/pull/14000)) [@etseidl](https://github.com/etseidl)
- Improve GroupBy JIT error handling ([#13854](https://github.com/rapidsai/cudf/pull/13854)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Generate unified Python/C++ docs ([#13846](https://github.com/rapidsai/cudf/pull/13846)) [@vyasr](https://github.com/vyasr)
- Expand JIT groupby test suite ([#13813](https://github.com/rapidsai/cudf/pull/13813)) [@brandon-b-miller](https://github.com/brandon-b-miller)
## 🛠️ Improvements
- Pin `pytest<8` ([#14920](https://github.com/rapidsai/cudf/pull/14920)) [@galipremsagar](https://github.com/galipremsagar)
- Move cudf::char_utf8 definition from detail to public header ([#14779](https://github.com/rapidsai/cudf/pull/14779)) [@davidwendt](https://github.com/davidwendt)
- Clean up `TimedeltaIndex.__init__` constructor ([#14775](https://github.com/rapidsai/cudf/pull/14775)) [@mroeschke](https://github.com/mroeschke)
- Clean up `DatetimeIndex.__init__` constructor ([#14774](https://github.com/rapidsai/cudf/pull/14774)) [@mroeschke](https://github.com/mroeschke)
- Some `frame.py` typing, move seldom used methods in `frame.py` ([#14766](https://github.com/rapidsai/cudf/pull/14766)) [@mroeschke](https://github.com/mroeschke)
- Remove **kwargs from astype ([#14765](https://github.com/rapidsai/cudf/pull/14765)) [@mroeschke](https://github.com/mroeschke)
- Fix benchmarks compatibility with newer pytest-cases ([#14764](https://github.com/rapidsai/cudf/pull/14764)) [@jameslamb](https://github.com/jameslamb)
- Add `pynvjitlink` as a dependency ([#14763](https://github.com/rapidsai/cudf/pull/14763)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Resolve degenerate performance in `create_structs_data` ([#14761](https://github.com/rapidsai/cudf/pull/14761)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Simplify ColumnAccessor methods; avoid unnecessary validations ([#14758](https://github.com/rapidsai/cudf/pull/14758)) [@mroeschke](https://github.com/mroeschke)
- Pin pytest-cases<3.8.2 ([#14756](https://github.com/rapidsai/cudf/pull/14756)) [@mroeschke](https://github.com/mroeschke)
- Use _from_data instead of _from_columns for initializing Frame ([#14755](https://github.com/rapidsai/cudf/pull/14755)) [@mroeschke](https://github.com/mroeschke)
- Consolidate cudf object handling in as_column ([#14754](https://github.com/rapidsai/cudf/pull/14754)) [@mroeschke](https://github.com/mroeschke)
- Reduce execution time of Parquet C++ tests ([#14750](https://github.com/rapidsai/cudf/pull/14750)) [@vuule](https://github.com/vuule)
- Implement to_datetime(..., utc=True) ([#14749](https://github.com/rapidsai/cudf/pull/14749)) [@mroeschke](https://github.com/mroeschke)
- Remove usages of rapids-env-update ([#14748](https://github.com/rapidsai/cudf/pull/14748)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Provide explicit pool size and avoid RMM detail APIs ([#14741](https://github.com/rapidsai/cudf/pull/14741)) [@harrism](https://github.com/harrism)
- Implement `cudf.MultiIndex.from_arrays` ([#14740](https://github.com/rapidsai/cudf/pull/14740)) [@mroeschke](https://github.com/mroeschke)
- Remove unused/single use methods ([#14739](https://github.com/rapidsai/cudf/pull/14739)) [@mroeschke](https://github.com/mroeschke)
- Refactor CUDA versions in dependencies.yaml ([#14733](https://github.com/rapidsai/cudf/pull/14733)) [@jameslamb](https://github.com/jameslamb)
- Remove unneeded methods in Column ([#14730](https://github.com/rapidsai/cudf/pull/14730)) [@mroeschke](https://github.com/mroeschke)
- Clean up base column methods ([#14725](https://github.com/rapidsai/cudf/pull/14725)) [@mroeschke](https://github.com/mroeschke)
- Ensure column.fillna signatures are consistent ([#14724](https://github.com/rapidsai/cudf/pull/14724)) [@mroeschke](https://github.com/mroeschke)
- Remove mimesis as a testing dependency ([#14723](https://github.com/rapidsai/cudf/pull/14723)) [@mroeschke](https://github.com/mroeschke)
- Replace as_numerical with as_numerical_column/codes ([#14719](https://github.com/rapidsai/cudf/pull/14719)) [@mroeschke](https://github.com/mroeschke)
- Use offsetalator in gather_chars ([#14700](https://github.com/rapidsai/cudf/pull/14700)) [@davidwendt](https://github.com/davidwendt)
- Use make_strings_children for fill() specialization logic ([#14697](https://github.com/rapidsai/cudf/pull/14697)) [@davidwendt](https://github.com/davidwendt)
- Change `io::detail::orc` namespace into `io::orc::detail` ([#14696](https://github.com/rapidsai/cudf/pull/14696)) [@ttnghia](https://github.com/ttnghia)
- Fix call to deprecated factory function ([#14695](https://github.com/rapidsai/cudf/pull/14695)) [@davidwendt](https://github.com/davidwendt)
- Use as_column instead of arange for range like inputs ([#14689](https://github.com/rapidsai/cudf/pull/14689)) [@mroeschke](https://github.com/mroeschke)
- Reorganize ORC reader into multiple files and perform some small fixes to cuIO code ([#14665](https://github.com/rapidsai/cudf/pull/14665)) [@ttnghia](https://github.com/ttnghia)
- Split parquet test into multiple files ([#14663](https://github.com/rapidsai/cudf/pull/14663)) [@etseidl](https://github.com/etseidl)
- Custom error messages for IO with nonexistent files ([#14662](https://github.com/rapidsai/cudf/pull/14662)) [@vuule](https://github.com/vuule)
- Explicitly pass .dtype into is_foo_dtype functions ([#14657](https://github.com/rapidsai/cudf/pull/14657)) [@mroeschke](https://github.com/mroeschke)
- Basic validation in reader benchmarks ([#14647](https://github.com/rapidsai/cudf/pull/14647)) [@vuule](https://github.com/vuule)
- Update dependencies.yaml to support CUDA 12.*. ([#14644](https://github.com/rapidsai/cudf/pull/14644)) [@bdice](https://github.com/bdice)
- Consolidate memoryview handling in as_column ([#14643](https://github.com/rapidsai/cudf/pull/14643)) [@mroeschke](https://github.com/mroeschke)
- Convert `FieldType` to scoped enum ([#14642](https://github.com/rapidsai/cudf/pull/14642)) [@vuule](https://github.com/vuule)
- Use isinstance over is_foo_dtype ([#14641](https://github.com/rapidsai/cudf/pull/14641)) [@mroeschke](https://github.com/mroeschke)
- Use isinstance over is_foo_dtype internally ([#14638](https://github.com/rapidsai/cudf/pull/14638)) [@mroeschke](https://github.com/mroeschke)
- Remove unnecessary **kwargs in function signatures ([#14635](https://github.com/rapidsai/cudf/pull/14635)) [@mroeschke](https://github.com/mroeschke)
- Drop nvbench patch for nvml. ([#14631](https://github.com/rapidsai/cudf/pull/14631)) [@bdice](https://github.com/bdice)
- Drop Pascal GPU support. ([#14630](https://github.com/rapidsai/cudf/pull/14630)) [@bdice](https://github.com/bdice)
- Add cpp/doxygen/xml to .gitignore ([#14613](https://github.com/rapidsai/cudf/pull/14613)) [@davidwendt](https://github.com/davidwendt)
- Create strings-specific make_offsets_child_column for multiple offset types ([#14612](https://github.com/rapidsai/cudf/pull/14612)) [@davidwendt](https://github.com/davidwendt)
- Use the offsetalator in cudf::concatenate for strings ([#14611](https://github.com/rapidsai/cudf/pull/14611)) [@davidwendt](https://github.com/davidwendt)
- Make Parquet ColumnIndex null_counts optional ([#14596](https://github.com/rapidsai/cudf/pull/14596)) [@etseidl](https://github.com/etseidl)
- Support `freq` in DatetimeIndex ([#14593](https://github.com/rapidsai/cudf/pull/14593)) [@shwina](https://github.com/shwina)
- Remove legacy benchmarks for cuDF-python ([#14591](https://github.com/rapidsai/cudf/pull/14591)) [@osidekyle](https://github.com/osidekyle)
- Remove WORKSPACE env var from cudf_test temp_directory class ([#14588](https://github.com/rapidsai/cudf/pull/14588)) [@davidwendt](https://github.com/davidwendt)
- Use exceptions instead of return values to handle errors in `CompactProtocolReader` ([#14582](https://github.com/rapidsai/cudf/pull/14582)) [@vuule](https://github.com/vuule)
- Use cuda::proclaim_return_type on device lambdas. ([#14577](https://github.com/rapidsai/cudf/pull/14577)) [@bdice](https://github.com/bdice)
- Update to CCCL 2.2.0. ([#14576](https://github.com/rapidsai/cudf/pull/14576)) [@bdice](https://github.com/bdice)
- Update dependencies.yaml to new pip index ([#14575](https://github.com/rapidsai/cudf/pull/14575)) [@vyasr](https://github.com/vyasr)
- Simplify Python CMake ([#14565](https://github.com/rapidsai/cudf/pull/14565)) [@vyasr](https://github.com/vyasr)
- Java expose parquet pass_read_limit ([#14564](https://github.com/rapidsai/cudf/pull/14564)) [@revans2](https://github.com/revans2)
- Add column sanitization checks in `CUDF_TEST_EXPECT_COLUMN_*` macros ([#14559](https://github.com/rapidsai/cudf/pull/14559)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Use cudf_test temp_directory class for nvtext::subword_tokenize gbenchmark ([#14558](https://github.com/rapidsai/cudf/pull/14558)) [@davidwendt](https://github.com/davidwendt)
- Fix return type of prefix increment overloads ([#14544](https://github.com/rapidsai/cudf/pull/14544)) [@vuule](https://github.com/vuule)
- Make bpe_merge_pairs_impl member private ([#14543](https://github.com/rapidsai/cudf/pull/14543)) [@davidwendt](https://github.com/davidwendt)
- Small clean up in `io::statistics` ([#14542](https://github.com/rapidsai/cudf/pull/14542)) [@vuule](https://github.com/vuule)
- Change json gtest environment variable to compile-time definition ([#14541](https://github.com/rapidsai/cudf/pull/14541)) [@davidwendt](https://github.com/davidwendt)
- Remove extra total chars size calculation from cudf::concatenate ([#14540](https://github.com/rapidsai/cudf/pull/14540)) [@davidwendt](https://github.com/davidwendt)
- Refactor IndexedFrame.hash_values to use cudf::hashing functions, add xxhash64 to cudf Python. ([#14538](https://github.com/rapidsai/cudf/pull/14538)) [@bdice](https://github.com/bdice)
- Move non-templated inline function definitions from table_view.hpp to table_view.cpp ([#14535](https://github.com/rapidsai/cudf/pull/14535)) [@davidwendt](https://github.com/davidwendt)
- Add JNI for strings::code_points ([#14533](https://github.com/rapidsai/cudf/pull/14533)) [@thirtiseven](https://github.com/thirtiseven)
- Add a test for issue 12773 ([#14529](https://github.com/rapidsai/cudf/pull/14529)) [@vyasr](https://github.com/vyasr)
- Split libarrow build dependencies. ([#14506](https://github.com/rapidsai/cudf/pull/14506)) [@bdice](https://github.com/bdice)
- Implement `IndexedFrame.duplicated` with `distinct_indices` + `scatter` ([#14493](https://github.com/rapidsai/cudf/pull/14493)) [@wence-](https://github.com/wence-)
- Expunge as_frame conversions in Column algorithms ([#14491](https://github.com/rapidsai/cudf/pull/14491)) [@wence-](https://github.com/wence-)
- Remove unsanitized null from input strings column in rank_tests.cpp ([#14475](https://github.com/rapidsai/cudf/pull/14475)) [@davidwendt](https://github.com/davidwendt)
- Refactor Parquet kernel_error ([#14464](https://github.com/rapidsai/cudf/pull/14464)) [@etseidl](https://github.com/etseidl)
- Deprecate cudf::make_strings_column accepting typed offsets ([#14461](https://github.com/rapidsai/cudf/pull/14461)) [@davidwendt](https://github.com/davidwendt)
- Remove deprecated nvtext::load_merge_pairs_file ([#14460](https://github.com/rapidsai/cudf/pull/14460)) [@davidwendt](https://github.com/davidwendt)
- Introduce Comprehensive Pathological Unit Tests for Issue #14409 ([#14459](https://github.com/rapidsai/cudf/pull/14459)) [@aocsa](https://github.com/aocsa)
- Expose stream parameter in public nvtext APIs ([#14456](https://github.com/rapidsai/cudf/pull/14456)) [@davidwendt](https://github.com/davidwendt)
- Include encode type in the error message when unsupported Parquet encoding is detected ([#14453](https://github.com/rapidsai/cudf/pull/14453)) [@ZelboK](https://github.com/ZelboK)
- Remove null mask for zero nulls in json readers ([#14451](https://github.com/rapidsai/cudf/pull/14451)) [@karthikeyann](https://github.com/karthikeyann)
- Refactor cudf.Series.__init__ ([#14450](https://github.com/rapidsai/cudf/pull/14450)) [@mroeschke](https://github.com/mroeschke)
- Remove the use of `volatile` in Parquet ([#14448](https://github.com/rapidsai/cudf/pull/14448)) [@vuule](https://github.com/vuule)
- REF: Remove **kwargs from to_pandas, raise if nullable is not implemented ([#14438](https://github.com/rapidsai/cudf/pull/14438)) [@mroeschke](https://github.com/mroeschke)
- Testing stream pool implementation ([#14437](https://github.com/rapidsai/cudf/pull/14437)) [@shrshi](https://github.com/shrshi)
- Match pandas join ordering obligations in pandas-compatible mode ([#14428](https://github.com/rapidsai/cudf/pull/14428)) [@wence-](https://github.com/wence-)
- Forward-merge branch-23.12 to branch-24.02 ([#14426](https://github.com/rapidsai/cudf/pull/14426)) [@bdice](https://github.com/bdice)
- Use isinstance(..., cudf.IntervalDtype) instead of is_interval_dtype ([#14424](https://github.com/rapidsai/cudf/pull/14424)) [@mroeschke](https://github.com/mroeschke)
- Use isinstance(..., cudf.CategoricalDtype) instead of is_categorical_dtype ([#14423](https://github.com/rapidsai/cudf/pull/14423)) [@mroeschke](https://github.com/mroeschke)
- Forward-merge branch-23.12 to branch-24.02 ([#14422](https://github.com/rapidsai/cudf/pull/14422)) [@bdice](https://github.com/bdice)
- REF: Remove instances of pd.core ([#14421](https://github.com/rapidsai/cudf/pull/14421)) [@mroeschke](https://github.com/mroeschke)
- Expose streams in public filling APIs for label_bins ([#14401](https://github.com/rapidsai/cudf/pull/14401)) [@ZelboK](https://github.com/ZelboK)
- Consolidate 1D pandas object handling in as_column ([#14394](https://github.com/rapidsai/cudf/pull/14394)) [@mroeschke](https://github.com/mroeschke)
- Limit DELTA_BINARY_PACKED encoder to the same number of bits as the physical type being encoded ([#14392](https://github.com/rapidsai/cudf/pull/14392)) [@etseidl](https://github.com/etseidl)
- Add SHA-1 and SHA-2 hash functions. ([#14391](https://github.com/rapidsai/cudf/pull/14391)) [@bdice](https://github.com/bdice)
- Expose streams in Parquet reader and writer APIs ([#14359](https://github.com/rapidsai/cudf/pull/14359)) [@shrshi](https://github.com/shrshi)
- Update to fmt 10.1.1 and spdlog 1.12.0. ([#14355](https://github.com/rapidsai/cudf/pull/14355)) [@bdice](https://github.com/bdice)
- Replace default stream for scalars and column factories usages (because of defaulted arguments) ([#14354](https://github.com/rapidsai/cudf/pull/14354)) [@karthikeyann](https://github.com/karthikeyann)
- Expose streams in ORC reader and writer APIs ([#14350](https://github.com/rapidsai/cudf/pull/14350)) [@shrshi](https://github.com/shrshi)
- Convert compression and io to string axis type in IO benchmarks ([#14347](https://github.com/rapidsai/cudf/pull/14347)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Add cuDF devcontainers ([#14015](https://github.com/rapidsai/cudf/pull/14015)) [@trxcllnt](https://github.com/trxcllnt)
- Refactoring of Buffers (last step towards unifying COW and Spilling) ([#13801](https://github.com/rapidsai/cudf/pull/13801)) [@madsbk](https://github.com/madsbk)
- Switch to scikit-build-core ([#13531](https://github.com/rapidsai/cudf/pull/13531)) [@vyasr](https://github.com/vyasr)
- Simplify null count checking in column equality comparator ([#13312](https://github.com/rapidsai/cudf/pull/13312)) [@vyasr](https://github.com/vyasr)
# cuDF 23.12.00 (6 Dec 2023)
## 🚨 Breaking Changes
- Raise error in `reindex` when `index` is not unique ([#14400](https://github.com/rapidsai/cudf/pull/14400)) [@galipremsagar](https://github.com/galipremsagar)
- Expose stream parameter to get_json_object API ([#14297](https://github.com/rapidsai/cudf/pull/14297)) [@davidwendt](https://github.com/davidwendt)
- Refactor cudf_kafka to use skbuild ([#14292](https://github.com/rapidsai/cudf/pull/14292)) [@jdye64](https://github.com/jdye64)
- Expose stream parameter in public strings convert APIs ([#14255](https://github.com/rapidsai/cudf/pull/14255)) [@davidwendt](https://github.com/davidwendt)
- Upgrade to nvCOMP 3.0.4 ([#13815](https://github.com/rapidsai/cudf/pull/13815)) [@vuule](https://github.com/vuule)
## 🐛 Bug Fixes
- Update actions/labeler to v4 ([#14562](https://github.com/rapidsai/cudf/pull/14562)) [@raydouglass](https://github.com/raydouglass)
- Fix data corruption when skipping rows ([#14557](https://github.com/rapidsai/cudf/pull/14557)) [@etseidl](https://github.com/etseidl)
- Fix function name typo in `cudf.pandas` profiler ([#14514](https://github.com/rapidsai/cudf/pull/14514)) [@galipremsagar](https://github.com/galipremsagar)
- Fix intermediate type checking in expression parsing ([#14445](https://github.com/rapidsai/cudf/pull/14445)) [@vyasr](https://github.com/vyasr)
- Forward merge `branch-23.10` into `branch-23.12` ([#14435](https://github.com/rapidsai/cudf/pull/14435)) [@raydouglass](https://github.com/raydouglass)
- Remove needs: wheel-build-cudf. ([#14427](https://github.com/rapidsai/cudf/pull/14427)) [@bdice](https://github.com/bdice)
- Fix dask dependency in custreamz ([#14420](https://github.com/rapidsai/cudf/pull/14420)) [@vyasr](https://github.com/vyasr)
- Ensure nvbench initializes nvml context when built statically ([#14411](https://github.com/rapidsai/cudf/pull/14411)) [@robertmaynard](https://github.com/robertmaynard)
- Support java AST String literal with desired encoding ([#14402](https://github.com/rapidsai/cudf/pull/14402)) [@winningsix](https://github.com/winningsix)
- Raise error in `reindex` when `index` is not unique ([#14400](https://github.com/rapidsai/cudf/pull/14400)) [@galipremsagar](https://github.com/galipremsagar)
- Always build nvbench statically so we don't need to package it ([#14399](https://github.com/rapidsai/cudf/pull/14399)) [@robertmaynard](https://github.com/robertmaynard)
- Fix token-count logic in nvtext::tokenize_with_vocabulary ([#14393](https://github.com/rapidsai/cudf/pull/14393)) [@davidwendt](https://github.com/davidwendt)
- Fix as_column(pd.Timestamp/Timedelta, length=) not respecting length ([#14390](https://github.com/rapidsai/cudf/pull/14390)) [@mroeschke](https://github.com/mroeschke)
- cudf.pandas: cuDF subpath checking in module `__getattr__` ([#14388](https://github.com/rapidsai/cudf/pull/14388)) [@shwina](https://github.com/shwina)
- Fix and disable encoding for nanosecond statistics in ORC writer ([#14367](https://github.com/rapidsai/cudf/pull/14367)) [@vuule](https://github.com/vuule)
- Add the new manylinux builds to the build job ([#14351](https://github.com/rapidsai/cudf/pull/14351)) [@vyasr](https://github.com/vyasr)
- cudf jit parser now supports .pragma instructions with quotes ([#14348](https://github.com/rapidsai/cudf/pull/14348)) [@robertmaynard](https://github.com/robertmaynard)
- Fix overflow check in `cudf::merge` ([#14345](https://github.com/rapidsai/cudf/pull/14345)) [@divyegala](https://github.com/divyegala)
- Add cramjam ([#14344](https://github.com/rapidsai/cudf/pull/14344)) [@vyasr](https://github.com/vyasr)
- Enable `dask_cudf/io` pytests in CI ([#14338](https://github.com/rapidsai/cudf/pull/14338)) [@galipremsagar](https://github.com/galipremsagar)
- Temporarily avoid the current build of pydata-sphinx-theme ([#14332](https://github.com/rapidsai/cudf/pull/14332)) [@vyasr](https://github.com/vyasr)
- Fix host buffer access from device function in the Parquet reader ([#14328](https://github.com/rapidsai/cudf/pull/14328)) [@vuule](https://github.com/vuule)
- Run IO tests for Dask-cuDF ([#14327](https://github.com/rapidsai/cudf/pull/14327)) [@rjzamora](https://github.com/rjzamora)
- Fix logical type issues in the Parquet writer ([#14322](https://github.com/rapidsai/cudf/pull/14322)) [@vuule](https://github.com/vuule)
- Remove aws-sdk-pinning and revert to arrow 12.0.1 ([#14319](https://github.com/rapidsai/cudf/pull/14319)) [@vyasr](https://github.com/vyasr)
- Test is_valid before reading column data ([#14318](https://github.com/rapidsai/cudf/pull/14318)) [@etseidl](https://github.com/etseidl)
- Fix gtest validity setting for TextTokenizeTest.Vocabulary ([#14312](https://github.com/rapidsai/cudf/pull/14312)) [@davidwendt](https://github.com/davidwendt)
- Fixes stack context for json lines format that recovers from invalid JSON lines ([#14309](https://github.com/rapidsai/cudf/pull/14309)) [@elstehle](https://github.com/elstehle)
- Downgrade to Arrow 12.0.0 for aws-sdk-cpp and fix cudf_kafka builds for new CI containers ([#14296](https://github.com/rapidsai/cudf/pull/14296)) [@vyasr](https://github.com/vyasr)
- Fixing thread index overflow issue ([#14290](https://github.com/rapidsai/cudf/pull/14290)) [@hyperbolic2346](https://github.com/hyperbolic2346)
- Fix memset error in nvtext::edit_distance_matrix ([#14283](https://github.com/rapidsai/cudf/pull/14283)) [@davidwendt](https://github.com/davidwendt)
- Changes JSON reader's recovery option's behaviour to ignore all characters after a valid JSON record ([#14279](https://github.com/rapidsai/cudf/pull/14279)) [@elstehle](https://github.com/elstehle)
- Handle empty string correctly in Parquet statistics ([#14257](https://github.com/rapidsai/cudf/pull/14257)) [@etseidl](https://github.com/etseidl)
- Fixes behaviour for incomplete lines when `recover_with_nulls` is enabled ([#14252](https://github.com/rapidsai/cudf/pull/14252)) [@elstehle](https://github.com/elstehle)
- cudf::detail::pinned_allocator doesn't throw from `deallocate` ([#14251](https://github.com/rapidsai/cudf/pull/14251)) [@robertmaynard](https://github.com/robertmaynard)
- Fix strings replace for adjacent, identical multi-byte UTF-8 character targets ([#14235](https://github.com/rapidsai/cudf/pull/14235)) [@davidwendt](https://github.com/davidwendt)
- Fix the precision when converting a decimal128 column to an arrow array ([#14230](https://github.com/rapidsai/cudf/pull/14230)) [@jihoonson](https://github.com/jihoonson)
- Fixing parquet list of struct interpretation ([#13715](https://github.com/rapidsai/cudf/pull/13715)) [@hyperbolic2346](https://github.com/hyperbolic2346)
## 📖 Documentation
- Fix io reference in docs. ([#14452](https://github.com/rapidsai/cudf/pull/14452)) [@bdice](https://github.com/bdice)
- Update README ([#14374](https://github.com/rapidsai/cudf/pull/14374)) [@shwina](https://github.com/shwina)
- Example code for blog on new row comparators ([#13795](https://github.com/rapidsai/cudf/pull/13795)) [@divyegala](https://github.com/divyegala)
## 🚀 New Features
- Expose streams in public unary APIs ([#14342](https://github.com/rapidsai/cudf/pull/14342)) [@vyasr](https://github.com/vyasr)
- Add python tests for Parquet DELTA_BINARY_PACKED encoder ([#14316](https://github.com/rapidsai/cudf/pull/14316)) [@etseidl](https://github.com/etseidl)
- Update rapids-cmake functions to non-deprecated signatures ([#14265](https://github.com/rapidsai/cudf/pull/14265)) [@robertmaynard](https://github.com/robertmaynard)
- Expose streams in public null mask APIs ([#14263](https://github.com/rapidsai/cudf/pull/14263)) [@vyasr](https://github.com/vyasr)
- Expose streams in binaryop APIs ([#14187](https://github.com/rapidsai/cudf/pull/14187)) [@vyasr](https://github.com/vyasr)
- Add pylibcudf.Scalar that interoperates with Arrow scalars ([#14133](https://github.com/rapidsai/cudf/pull/14133)) [@vyasr](https://github.com/vyasr)
- Add decoder for DELTA_BYTE_ARRAY to Parquet reader ([#14101](https://github.com/rapidsai/cudf/pull/14101)) [@etseidl](https://github.com/etseidl)
- Add DELTA_BINARY_PACKED encoder for Parquet writer ([#14100](https://github.com/rapidsai/cudf/pull/14100)) [@etseidl](https://github.com/etseidl)
- Add BytePairEncoder class to cuDF ([#13891](https://github.com/rapidsai/cudf/pull/13891)) [@davidwendt](https://github.com/davidwendt)
- Upgrade to nvCOMP 3.0.4 ([#13815](https://github.com/rapidsai/cudf/pull/13815)) [@vuule](https://github.com/vuule)
- Use `pynvjitlink` for CUDA 12+ MVC ([#13650](https://github.com/rapidsai/cudf/pull/13650)) [@brandon-b-miller](https://github.com/brandon-b-miller)
## 🛠️ Improvements
- Build concurrency for nightly and merge triggers ([#14441](https://github.com/rapidsai/cudf/pull/14441)) [@bdice](https://github.com/bdice)
- Cleanup remaining usages of dask dependencies ([#14407](https://github.com/rapidsai/cudf/pull/14407)) [@galipremsagar](https://github.com/galipremsagar)
- Update to Arrow 14.0.1. ([#14387](https://github.com/rapidsai/cudf/pull/14387)) [@bdice](https://github.com/bdice)
- Remove Cython libcpp wrappers ([#14382](https://github.com/rapidsai/cudf/pull/14382)) [@vyasr](https://github.com/vyasr)
- Forward-merge branch-23.10 to branch-23.12 ([#14372](https://github.com/rapidsai/cudf/pull/14372)) [@bdice](https://github.com/bdice)
- Upgrade to arrow 14 ([#14371](https://github.com/rapidsai/cudf/pull/14371)) [@galipremsagar](https://github.com/galipremsagar)
- Fix a pytest typo in `test_kurt_skew_error` ([#14368](https://github.com/rapidsai/cudf/pull/14368)) [@galipremsagar](https://github.com/galipremsagar)
- Use new rapids-dask-dependency metapackage for managing dask versions ([#14364](https://github.com/rapidsai/cudf/pull/14364)) [@vyasr](https://github.com/vyasr)
- Change `nullable()` to `has_nulls()` in `cudf::detail::gather` ([#14363](https://github.com/rapidsai/cudf/pull/14363)) [@divyegala](https://github.com/divyegala)
- Split up scan_inclusive.cu to improve its compile time ([#14358](https://github.com/rapidsai/cudf/pull/14358)) [@davidwendt](https://github.com/davidwendt)
- Implement user_datasource_wrapper is_empty() and is_device_read_preferred(). ([#14357](https://github.com/rapidsai/cudf/pull/14357)) [@tpn](https://github.com/tpn)
- Added streams to CSV reader and writer API ([#14340](https://github.com/rapidsai/cudf/pull/14340)) [@shrshi](https://github.com/shrshi)
- Upgrade wheels to use arrow 13 ([#14339](https://github.com/rapidsai/cudf/pull/14339)) [@vyasr](https://github.com/vyasr)
- Rework nvtext::byte_pair_encoding API ([#14337](https://github.com/rapidsai/cudf/pull/14337)) [@davidwendt](https://github.com/davidwendt)
- Improve performance of nvtext::tokenize_with_vocabulary for long strings ([#14336](https://github.com/rapidsai/cudf/pull/14336)) [@davidwendt](https://github.com/davidwendt)
- Upgrade `arrow` to `13` ([#14330](https://github.com/rapidsai/cudf/pull/14330)) [@galipremsagar](https://github.com/galipremsagar)
- Expose stream parameter in public nvtext replace APIs ([#14329](https://github.com/rapidsai/cudf/pull/14329)) [@davidwendt](https://github.com/davidwendt)
- Drop `pyorc` dependency and use `pandas`/`pyarrow` instead ([#14323](https://github.com/rapidsai/cudf/pull/14323)) [@galipremsagar](https://github.com/galipremsagar)
- Avoid `pyarrow.fs` import for local storage ([#14321](https://github.com/rapidsai/cudf/pull/14321)) [@rjzamora](https://github.com/rjzamora)
- Unpin `dask` and `distributed` for `23.12` development ([#14320](https://github.com/rapidsai/cudf/pull/14320)) [@galipremsagar](https://github.com/galipremsagar)
- Expose stream parameter in public nvtext tokenize APIs ([#14317](https://github.com/rapidsai/cudf/pull/14317)) [@davidwendt](https://github.com/davidwendt)
- Added streams to JSON reader and writer API ([#14313](https://github.com/rapidsai/cudf/pull/14313)) [@shrshi](https://github.com/shrshi)
- Minor improvements in `source_info` ([#14308](https://github.com/rapidsai/cudf/pull/14308)) [@vuule](https://github.com/vuule)
- Forward-merge branch-23.10 to branch-23.12 ([#14307](https://github.com/rapidsai/cudf/pull/14307)) [@bdice](https://github.com/bdice)
- Add stream parameter to Set Operations (Public List APIs) ([#14305](https://github.com/rapidsai/cudf/pull/14305)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Expose stream parameter to get_json_object API ([#14297](https://github.com/rapidsai/cudf/pull/14297)) [@davidwendt](https://github.com/davidwendt)
- Sort dictionary data alphabetically in the ORC writer ([#14295](https://github.com/rapidsai/cudf/pull/14295)) [@vuule](https://github.com/vuule)
- Expose stream parameter in public strings filter APIs ([#14293](https://github.com/rapidsai/cudf/pull/14293)) [@davidwendt](https://github.com/davidwendt)
- Refactor cudf_kafka to use skbuild ([#14292](https://github.com/rapidsai/cudf/pull/14292)) [@jdye64](https://github.com/jdye64)
- Update `shared-action-workflows` references ([#14289](https://github.com/rapidsai/cudf/pull/14289)) [@AyodeAwe](https://github.com/AyodeAwe)
- Register `partd` encode dispatch in `dask_cudf` ([#14287](https://github.com/rapidsai/cudf/pull/14287)) [@rjzamora](https://github.com/rjzamora)
- Update versioning strategy ([#14285](https://github.com/rapidsai/cudf/pull/14285)) [@vyasr](https://github.com/vyasr)
- Move and rename byte-pair-encoding source files ([#14284](https://github.com/rapidsai/cudf/pull/14284)) [@davidwendt](https://github.com/davidwendt)
- Expose stream parameter in public strings combine APIs ([#14281](https://github.com/rapidsai/cudf/pull/14281)) [@davidwendt](https://github.com/davidwendt)
- Expose stream parameter in public strings contains APIs ([#14280](https://github.com/rapidsai/cudf/pull/14280)) [@davidwendt](https://github.com/davidwendt)
- Add stream parameter to List Sort and Filter APIs ([#14272](https://github.com/rapidsai/cudf/pull/14272)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Use branch-23.12 workflows. ([#14271](https://github.com/rapidsai/cudf/pull/14271)) [@bdice](https://github.com/bdice)
- Refactor LogicalType for Parquet ([#14264](https://github.com/rapidsai/cudf/pull/14264)) [@etseidl](https://github.com/etseidl)
- Centralize chunked reading code in the parquet reader to reader_impl_chunking.cu ([#14262](https://github.com/rapidsai/cudf/pull/14262)) [@nvdbaranec](https://github.com/nvdbaranec)
- Expose stream parameter in public strings replace APIs ([#14261](https://github.com/rapidsai/cudf/pull/14261)) [@davidwendt](https://github.com/davidwendt)
- Expose stream parameter in public strings APIs ([#14260](https://github.com/rapidsai/cudf/pull/14260)) [@davidwendt](https://github.com/davidwendt)
- Cleanup of namespaces in parquet code. ([#14259](https://github.com/rapidsai/cudf/pull/14259)) [@nvdbaranec](https://github.com/nvdbaranec)
- Make parquet schema index type consistent ([#14256](https://github.com/rapidsai/cudf/pull/14256)) [@hyperbolic2346](https://github.com/hyperbolic2346)
- Expose stream parameter in public strings convert APIs ([#14255](https://github.com/rapidsai/cudf/pull/14255)) [@davidwendt](https://github.com/davidwendt)
- Add in java bindings for DataSource ([#14254](https://github.com/rapidsai/cudf/pull/14254)) [@revans2](https://github.com/revans2)
- Reimplement `cudf::merge` for nested types without using comparators ([#14250](https://github.com/rapidsai/cudf/pull/14250)) [@divyegala](https://github.com/divyegala)
- Add stream parameter to List Manipulation and Operations APIs ([#14248](https://github.com/rapidsai/cudf/pull/14248)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Expose stream parameter in public strings split/partition APIs ([#14247](https://github.com/rapidsai/cudf/pull/14247)) [@davidwendt](https://github.com/davidwendt)
- Improve `contains_column` by invoking `contains_table` ([#14238](https://github.com/rapidsai/cudf/pull/14238)) [@PointKernel](https://github.com/PointKernel)
- Detect and report errors in Parquet header parsing ([#14237](https://github.com/rapidsai/cudf/pull/14237)) [@etseidl](https://github.com/etseidl)
- Normalizing offsets iterator ([#14234](https://github.com/rapidsai/cudf/pull/14234)) [@davidwendt](https://github.com/davidwendt)
- Forward merge `23.10` into `23.12` ([#14231](https://github.com/rapidsai/cudf/pull/14231)) [@galipremsagar](https://github.com/galipremsagar)
- Return error if BOOL8 column-type is used with integers-to-hex ([#14208](https://github.com/rapidsai/cudf/pull/14208)) [@davidwendt](https://github.com/davidwendt)
- Enable indexalator for device code ([#14206](https://github.com/rapidsai/cudf/pull/14206)) [@davidwendt](https://github.com/davidwendt)
- Marginally reduce memory footprint of joins ([#14197](https://github.com/rapidsai/cudf/pull/14197)) [@wence-](https://github.com/wence-)
- Add nvtx annotations to spilling-based data movement ([#14196](https://github.com/rapidsai/cudf/pull/14196)) [@wence-](https://github.com/wence-)
- Optimize ORC writer for decimal columns ([#14190](https://github.com/rapidsai/cudf/pull/14190)) [@vuule](https://github.com/vuule)
- Remove the use of volatile in ORC ([#14175](https://github.com/rapidsai/cudf/pull/14175)) [@vuule](https://github.com/vuule)
- Add `bytes_per_second` to distinct_count of stream_compaction nvbench. ([#14172](https://github.com/rapidsai/cudf/pull/14172)) [@Blonck](https://github.com/Blonck)
- Add `bytes_per_second` to transpose benchmark ([#14170](https://github.com/rapidsai/cudf/pull/14170)) [@Blonck](https://github.com/Blonck)
- cuDF: Build CUDA 12.0 ARM conda packages. ([#14112](https://github.com/rapidsai/cudf/pull/14112)) [@bdice](https://github.com/bdice)
- Add `bytes_per_second` to shift benchmark ([#13950](https://github.com/rapidsai/cudf/pull/13950)) [@Blonck](https://github.com/Blonck)
- Extract `debug_utilities.hpp/cu` from `column_utilities.hpp/cu` ([#13720](https://github.com/rapidsai/cudf/pull/13720)) [@ttnghia](https://github.com/ttnghia)
# cuDF 23.10.00 (11 Oct 2023)
## 🚨 Breaking Changes
- Expose stream parameter in public nvtext ngram APIs ([#14061](https://github.com/rapidsai/cudf/pull/14061)) [@davidwendt](https://github.com/davidwendt)
- Raise `MixedTypeError` when a column of mixed-dtype is being constructed ([#14050](https://github.com/rapidsai/cudf/pull/14050)) [@galipremsagar](https://github.com/galipremsagar)
- Raise `NotImplementedError` for `MultiIndex.to_series` ([#14049](https://github.com/rapidsai/cudf/pull/14049)) [@galipremsagar](https://github.com/galipremsagar)
- Create table_input_metadata from a table_metadata ([#13920](https://github.com/rapidsai/cudf/pull/13920)) [@etseidl](https://github.com/etseidl)
- Enable RLE boolean encoding for v2 Parquet files ([#13886](https://github.com/rapidsai/cudf/pull/13886)) [@etseidl](https://github.com/etseidl)
- Change `NA` to `NaT` for `datetime` and `timedelta` types ([#13868](https://github.com/rapidsai/cudf/pull/13868)) [@galipremsagar](https://github.com/galipremsagar)
- Fix `any`, `all` reduction behavior for `axis=None` and warn for other reductions ([#13831](https://github.com/rapidsai/cudf/pull/13831)) [@galipremsagar](https://github.com/galipremsagar)
- Add minhash support for MurmurHash3_x64_128 ([#13796](https://github.com/rapidsai/cudf/pull/13796)) [@davidwendt](https://github.com/davidwendt)
- Remove the libcudf cudf::offset_type type ([#13788](https://github.com/rapidsai/cudf/pull/13788)) [@davidwendt](https://github.com/davidwendt)
- Raise error when trying to join `datetime` and `timedelta` types with other types ([#13786](https://github.com/rapidsai/cudf/pull/13786)) [@galipremsagar](https://github.com/galipremsagar)
- Update to Cython 3.0.0 ([#13777](https://github.com/rapidsai/cudf/pull/13777)) [@vyasr](https://github.com/vyasr)
- Raise error on constructing an array from mixed type inputs ([#13768](https://github.com/rapidsai/cudf/pull/13768)) [@galipremsagar](https://github.com/galipremsagar)
- Enforce deprecations in `23.10` ([#13732](https://github.com/rapidsai/cudf/pull/13732)) [@galipremsagar](https://github.com/galipremsagar)
- Upgrade to arrow 12 ([#13728](https://github.com/rapidsai/cudf/pull/13728)) [@galipremsagar](https://github.com/galipremsagar)
- Remove Arrow dependency from the `datasource.hpp` public header ([#13698](https://github.com/rapidsai/cudf/pull/13698)) [@vuule](https://github.com/vuule)
## 🐛 Bug Fixes
- Fix inaccurate ceil/floor and inaccurate rescaling casts of fixed-point values. ([#14242](https://github.com/rapidsai/cudf/pull/14242)) [@bdice](https://github.com/bdice)
- Fix inaccuracy in decimal128 rounding. ([#14233](https://github.com/rapidsai/cudf/pull/14233)) [@bdice](https://github.com/bdice)
- Workaround for illegal instruction error in sm90 for warp intrinsics with mask ([#14201](https://github.com/rapidsai/cudf/pull/14201)) [@karthikeyann](https://github.com/karthikeyann)
- Fix pytorch related pytest ([#14198](https://github.com/rapidsai/cudf/pull/14198)) [@galipremsagar](https://github.com/galipremsagar)
- Pin to `aws-sdk-cpp<1.11` ([#14173](https://github.com/rapidsai/cudf/pull/14173)) [@pentschev](https://github.com/pentschev)
- Fix assert failure for range window functions ([#14168](https://github.com/rapidsai/cudf/pull/14168)) [@mythrocks](https://github.com/mythrocks)
- Fix Memcheck error found in JSON_TEST JsonReaderTest.ErrorStrings ([#14164](https://github.com/rapidsai/cudf/pull/14164)) [@karthikeyann](https://github.com/karthikeyann)
- Fix calls to copy_bitmask to pass stream parameter ([#14158](https://github.com/rapidsai/cudf/pull/14158)) [@davidwendt](https://github.com/davidwendt)
- Fix DataFrame from Series with different CategoricalIndexes ([#14157](https://github.com/rapidsai/cudf/pull/14157)) [@mroeschke](https://github.com/mroeschke)
- Pin to numpy<1.25 and numba<0.58 to avoid errors and deprecation warnings-as-errors. ([#14156](https://github.com/rapidsai/cudf/pull/14156)) [@bdice](https://github.com/bdice)
- Fix kernel launch error for cudf::io::orc::gpu::rowgroup_char_counts_kernel ([#14139](https://github.com/rapidsai/cudf/pull/14139)) [@davidwendt](https://github.com/davidwendt)
- Don't sort columns for DataFrame init from list of Series ([#14136](https://github.com/rapidsai/cudf/pull/14136)) [@mroeschke](https://github.com/mroeschke)
- Fix DataFrame.values with no columns but index ([#14134](https://github.com/rapidsai/cudf/pull/14134)) [@mroeschke](https://github.com/mroeschke)