10X single-cell data & HDF5Array performance

Earlier this year, 10X Genomics released a single-cell RNA-sequencing dataset of 1.3 million mouse brain cells. The blog post accompanying the release contained the provocative statement “We do not recommend loading the file into R, due to the file size and the lack of 64 bit integers support in R.” This is a bit of a non sequitur, and naturally there has been a push within the Bioconductor community to address such concerns and show how to work with datasets of this size efficiently. Here we look at some basic benchmarks of R and Bioconductor’s performance on this dataset.

