Cell Segmentation for Blood Cancer Diagnosis

Introducing the winning solution for SegPC-2021 Plasma Cell Segmentation competition

May 4, 2021 - By Álvaro García Faura

The Problem and its Importance

Multiple Myeloma (MM) is a type of blood cancer that affects plasma cells. According to the International Agency for Research on Cancer [1], there were 160,000 new cases of MM worldwide in 2018 and 106,000 deaths. One of the ways to diagnose MM is to perform a bone marrow biopsy and look for cancerous plasma cells. To assess the severity of the disease, the percentage of bone marrow occupied by these cells is computed.

In the diagnosis of this and other types of cancer, computer-assisted tools have been attracting increasing interest, since they could be a real game-changer in terms of diagnostic accuracy and speed. At XLAB, we wanted to take part in this breakthrough, so we decided to register for the SegPC-2021 challenge for plasma cell segmentation, organized as part of the 2021 edition of the IEEE International Symposium on Biomedical Imaging (ISBI). We saw it as the perfect opportunity not only to put our data science skills to work and learn along the way, but also to do so in a meaningful way.

The Challenge

Overall, the goal of the SegPC-2021 challenge [2-5] was the development of a method for instance segmentation of MM plasma cells. For those unfamiliar with machine learning challenges, let’s just say that different teams compete to provide the algorithm that produces the best results on data they have not seen before or for which they don’t know the ground-truth labels. When working with images, labels indicate what each pixel in the image corresponds to. For SegPC, this could be background, cell nucleus, or cell cytoplasm.

We have included below one of the microscopic images in the competition’s dataset along with the labels that were provided for that image. As you can see, cells are labeled individually and their nucleus and cytoplasm are also given separately. This is what makes it an instance segmentation problem. If we were talking about semantic segmentation, our goal would be to detect all the ‘nucleus’ pixels and all the ‘cytoplasm’ pixels in the image, but not to separate them into individual cell instances. All we can say is: challenge accepted!

Original image (top) and the same image with nucleus and cytoplasm instance labels overlaid in different colors (bottom).
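To make the difference between the two tasks concrete, here is a minimal sketch in plain Python. The label encoding (0 for background, 1 for cytoplasm, 2 for nucleus) and the `to_semantic` helper are assumptions made for illustration; they are not the format of the SegPC dataset. The sketch collapses two per-cell instance masks into a single semantic label map.

```python
# Hypothetical label encoding, chosen for this example only.
BACKGROUND, CYTOPLASM, NUCLEUS = 0, 1, 2


def to_semantic(instance_masks):
    """Collapse per-cell masks (one 2D grid per cell) into one label map."""
    height = len(instance_masks[0])
    width = len(instance_masks[0][0])
    semantic = [[BACKGROUND] * width for _ in range(height)]
    for mask in instance_masks:
        for r in range(height):
            for c in range(width):
                # Keep the highest class id seen at this pixel.
                semantic[r][c] = max(semantic[r][c], mask[r][c])
    return semantic


# Two neighbouring cells, each with its own mask (instance labels).
cell_a = [[1, 2, 0, 0],
          [1, 1, 0, 0]]
cell_b = [[0, 0, 1, 2],
          [0, 0, 1, 1]]

semantic = to_semantic([cell_a, cell_b])
for row in semantic:
    print(row)
```

In the resulting map, the cytoplasm pixels of the two cells touch and can no longer be told apart. Keeping one mask per cell, as the instance labels do, is what preserves that information.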

Our Solution

By developing and carrying out a structured experimentation phase, we managed to put the right effort in the right direction and finally came up with the approach with which we obtained the winning result. We combined state-of-the-art instance segmentation architectures such as SCNet [6] and convolutional backbones such as ResNeSt [7], among some others, making the necessary tweaks to make them perfectly suit the specific problem we wanted to solve.

Another important procedure that helped us boost the performance of our model is image augmentation. In total, the competition organizers provided us with around 500 labeled images containing a total of 2633 cell instances to train our models. Image augmentation is a process by which you create more valid training examples out of the ones you have and, if done correctly, it is very likely to help!

But you have to be careful. If you are trying to distinguish between cats and dogs, you probably don’t want to modify your original images so much that your dogs no longer look like any kind of animal. That would sabotage your own learning process. The same applies here, but in our case we found that even with very heavy augmentations (that is, modifying the images a lot) the results still improved, so we ended up producing up to 50 very different images from each of the original ones.

Original image (top) and 25 heavy augmentations produced from it (bottom).
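As a rough illustration of the idea, the sketch below generates 50 variants of a tiny image using only the Python standard library. The specific transforms (random flip, 90-degree rotations, and a brightness gain) and the `augment` helper are assumptions chosen for simplicity; this post does not list the augmentations we actually used. The key point it shows is that geometric changes must be applied to the image and its label mask together, while intensity changes touch the image only.

```python
import random


def hflip(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def rot90(grid):
    """Rotate a 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def augment(image, mask, rng):
    """Return one randomly transformed (image, mask) pair."""
    if rng.random() < 0.5:
        image, mask = hflip(image), hflip(mask)
    for _ in range(rng.randrange(4)):
        image, mask = rot90(image), rot90(mask)
    # Brightness change: applied to the image only, labels stay untouched.
    gain = rng.uniform(0.7, 1.3)
    image = [[min(255, round(p * gain)) for p in row] for row in image]
    return image, mask


rng = random.Random(0)        # fixed seed so the run is repeatable
image = [[10, 20], [30, 40]]  # toy grayscale image
mask = [[0, 1], [2, 0]]       # toy label mask for the same pixels
augmented = [augment(image, mask, rng) for _ in range(50)]
print(len(augmented))
```

In practice one would use a dedicated augmentation library and a much richer set of transforms, but the pairing of image and mask stays the same.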

Finally, even though our best model alone already produced impressive results, we wouldn’t have won the competition if that had been our final choice. Instead, we decided to get an extra performance boost by combining several of our best-performing models. The approach we used to combine their predictions is quite simple. It’s called majority voting, and the picture below illustrates it.

Illustration of the majority voting method used to combine predicted cell instances from different models.
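For readers who prefer code to pictures, here is a minimal sketch of pixel-wise majority voting for a single cell. It assumes that the predictions from the different models have already been matched to the same cell and that each prediction is given as a set of (row, column) pixels; both the matching step and the `majority_vote` helper are assumptions for this example, not a description of our exact pipeline.

```python
from collections import Counter


def majority_vote(masks):
    """Keep the pixels that more than half of the masks contain.

    Each mask is a set of (row, col) pixels predicted for the same cell.
    """
    votes = Counter(pixel for mask in masks for pixel in mask)
    needed = len(masks) // 2 + 1  # strict majority
    return {pixel for pixel, count in votes.items() if count >= needed}


# Predictions for one cell from three hypothetical models.
model_a = {(0, 0), (0, 1), (1, 0)}
model_b = {(0, 0), (0, 1), (1, 1)}
model_c = {(0, 1), (1, 0), (2, 2)}

combined = majority_vote([model_a, model_b, model_c])
print(sorted(combined))
```

Pixels that only one model predicted, such as (1, 1) and (2, 2), are dropped, while pixels backed by at least two of the three models are kept.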

Fierce Competition

A total of 41 competitors from all over the world participated in the first phase of the challenge. In this phase, for which only 298 labeled images were available, our XLAB Insights team took first place with an mIoU of 0.9360. This metric, the mean Intersection-over-Union (mIoU), takes values between 0 and 1 and measures how well your predictions match the ground-truth labels. For the final phase, 200 additional labeled images were released and 25 teams were shortlisted. In the end, we achieved an mIoU of 0.9368 on the final test set, with which we won the SegPC-2021 Multiple Myeloma Plasma Cell Segmentation competition.
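To show how the metric behaves, here is a small sketch that computes the IoU for individual masks and averages it. It assumes that every prediction has already been paired with its ground-truth mask and that masks are sets of (row, column) pixels; the exact matching and averaging protocol used by the challenge organizers is not described in this post, so the `iou` and `mean_iou` helpers are illustrative only.

```python
def iou(pred, truth):
    """Intersection-over-Union of two masks given as sets of pixels."""
    union = pred | truth
    if not union:
        return 1.0  # both masks empty: treat as a perfect match
    return len(pred & truth) / len(union)


def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) pairs."""
    return sum(iou(pred, truth) for pred, truth in pairs) / len(pairs)


pred_1 = {(0, 0), (0, 1), (1, 0)}
truth_1 = {(0, 0), (0, 1), (1, 1)}  # 2 shared pixels out of 4 in total
pred_2 = {(5, 5), (5, 6)}
truth_2 = {(5, 5), (5, 6)}          # perfect overlap

score = mean_iou([(pred_1, truth_1), (pred_2, truth_2)])
print(score)
```

The first pair scores 2 / 4 = 0.5 and the second scores 1.0, so the mean is 0.75.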

What’s next?

At XLAB, we have already started on our next steps in the medical domain, which include more work on cell segmentation, anomaly detection in medical images, and brain MRI segmentation. Stay tuned!

And if you’re already curious, you can see a bit of what we’ve already achieved by having a look at our recent CVPR workshop publication on unsupervised detection of cancerous regions in histology imagery [8] or at our previous blog post on personalized paediatric cancer treatments.

*Participation in the ISBI challenge was a part of the European Commission project iPC (grant agreement number 826121).*

References

[1] - Ferlay, Jacques, et al. “Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods.” International journal of cancer 144.8 (2019): 1941-1953.

[2] - Anubha Gupta, Ritu Gupta, Shiv Gehlot, and Shubham Goswami, “SegPC-2021: Segmentation of Multiple Myeloma Plasma Cells in Microscopic Images”, IEEE Dataport, doi: https://dx.doi.org/10.21227/7np1-2q42.

[3] - Anubha Gupta, Rahul Duggal, Shiv Gehlot, Ritu Gupta, Anvit Mangal, Lalit Kumar, Nisarg Thakkar, and Devprakash Satpathy, “GCTI-SN: Geometry-Inspired Chemical and Tissue Invariant Stain Normalization of Microscopic Medical Images,” Medical Image Analysis, vol. 65, Oct 2020. DOI: https://doi.org/10.1016/j.media.2020.101788. (2020 IF: 11.148)

[4] - Shiv Gehlot, Anubha Gupta and Ritu Gupta, “EDNFC-Net: Convolutional Neural Network with Nested Feature Concatenation for Nuclei-Instance Segmentation,” ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 1389-1393.

[5] - Anubha Gupta, Pramit Mallick, Ojaswa Sharma, Ritu Gupta, and Rahul Duggal, “PCSeg: Color model driven probabilistic multiphase level set based tool for plasma cell segmentation in multiple myeloma,” PLoS ONE 13(12): e0207908, Dec 2018. DOI: 10.1371/journal.pone.0207908

[6] - Vu, Thang, Haeyong Kang, and Chang D. Yoo. “SCNet: Training Inference Sample Consistency for Instance Segmentation.” arXiv preprint arXiv:2012.10150 (2020).

[7] - Zhang, Hang, et al. “Resnest: Split-attention networks.” arXiv preprint arXiv:2004.08955 (2020).

[8] - Stepec, Dejan, and Danijel Skocaj. “Unsupervised Detection of Cancerous Regions in Histology Imagery using Image-to-Image Translation.” arXiv preprint arXiv:2104.13786 (2021).