A systematic outlier was the spinal cord, in the bottom right corner of Fig. 3. According to the guidelines, the cranial border of the spinal cord should be the tip of the dens of C2, but the ROs differed on the location of the tip by 1–3 CT slices (3–6 mm difference). The caudal border of the spinal cord showed even more variation between ROs, from the bottom of the CT scan to the cranial border of T3 (30–93 mm difference). According to the guidelines, the caudal border of the spinal cord should reach the upper edge of T3, although for caudal tumours it should reach 5 cm below the planning target volume (PTV). To accommodate these different cases, the CNN was trained to delineate the spinal cord down to the most caudal slice of the planning CT scan. One of the ROs systematically corrected the caudal border of the automated spinal cord delineations, while the other did not. On axial planes, however, no differences in spinal cord delineations were observed, as illustrated in Fig. 2.
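The effect of a border disagreement on a volume overlap measure such as DSC can be illustrated with a toy example. The sketch below is not the evaluation code of this study: the volume size, the slice indices and the square cross-section are hypothetical values chosen only for illustration. Two "observers" draw identical axial contours, but one stops at a more cranial slice.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient of two binary masks (1.0 if both are empty)."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)


# Hypothetical toy volume with axes (slice, row, column);
# the slice index increases in the caudal direction.
shape = (40, 16, 16)

# Identical axial cross-section for both observers (assumed 4 x 4 voxels).
axial = np.zeros(shape[1:], dtype=bool)
axial[6:10, 6:10] = True

ro1 = np.zeros(shape, dtype=bool)
ro2 = np.zeros(shape, dtype=bool)
ro1[:40] = axial  # contoured down to the last slice of the scan
ro2[:20] = axial  # stopped at an assumed, more cranial border

dsc_full = dice(ro1, ro2)               # penalised by the border difference
dsc_shared = dice(ro1[:20], ro2[:20])   # restricted to slices both observers contoured

print(f"DSC over the whole scan:    {dsc_full:.3f}")
print(f"DSC over the shared slices: {dsc_shared:.3f}")
```

In this toy case the overlap on the shared slices is perfect, yet the DSC over the whole scan drops to about 0.67, purely because of the difference in caudal extent.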
IOV, quantified by DSC and ASSD, decreased significantly for most OARs when automated delineations were used instead of manual delineations. We are convinced that consensus guidelines are important to decrease therapeutic variability and to train a CNN, but that implementing a CNN to generate contours automatically decreases IOV even further. Of course, this is only the case if the CNN generates correct delineations: if multiple ROs receive the same automatically generated contours and these need little modification, the result is less IOV. If the CNN generated incorrect contours that needed more modification, the result would be more IOV, similar to manual delineation. ASSD decreased significantly for all OARs except both cochleae, which are small structures ( 1 ml) for which DSC and ASSD are very susceptible to small delineation differences. DSC did not improve significantly for brainstem and oral cavity, while ASSD did. This is because DSC is volume dependent, and for these large OARs only large delineation differences will impact DSC. Moreover, IOV between manual delineations was already small for oral cavity when measured with DSC (94% on average), and the improvement in DSC to 96% using the automated delineations was therefore not significant. The spinal cord unexpectedly showed one of the worst DSC and ASSD results compared to the other OARs. This is mainly due to the difference in caudal border chosen by one of the ROs, as explained above, and results in an underestimation of the benefit of automated delineation on IOV for this structure. However, this difference is not clinically relevant: even though it influences the DVH, it has no impact on plan creation, evaluation and acceptance, because the maximum dose (Dmax) and not the average dose (Dmean) to the spinal cord is taken into account. In a serial OAR like the spinal cord, loss of function in one part will cause the entire organ to stop functioning. A high dose to a small volume can cause serious toxicity, and the risk of damage is therefore dominated by Dmax. In a parallel OAR like the salivary glands, loss of function in one part of the OAR can be compensated by an unaffected part. There is therefore a threshold volume effect, and the risk of injury, in this case resulting in xerostomia, is dominated by the Dmean over the whole OAR. Mucosa, like that of the oral cavity, is neither serial nor parallel, but behaves clinically like a parallel OAR, as desquamation of a large area of mucosa is more problematic than that of a small area. The requirements for and importance of correct OAR delineation, and its impact on treatment planning, thus depend on the type of OAR.

Fig. 2. Illustration of intra- and inter-observer variability between manual delineations (Am–Cm) and corrected delineations (Ac–Cc), for both observers (RO1, RO2). Notice a decrease in IOV in Ac compared to Am, and in Bc compared to Bm, even with scatter artefacts. The decrease in IOV observed in Cc compared to Cm is due to a difference in delineation by RO2, independent of the network. Figs. D1 and D2 show the difference in cranial and caudal border selection by the two ROs for brainstem and spinal cord.
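The argument that contour length matters for Dmean but not for Dmax can be made concrete with a small numerical sketch. Everything in it is assumed for illustration: the dose values are hypothetical and not taken from any treatment plan, and `dmax_dmean` is a hypothetical helper, not a function of a planning system. A structure is contoured once over the whole scan and once only over its cranial half, while the high-dose region lies in the cranial half.

```python
import numpy as np


def dmax_dmean(dose: np.ndarray, mask: np.ndarray) -> tuple[float, float]:
    """Return (maximum dose, mean dose) over the voxels selected by the mask."""
    values = dose[mask.astype(bool)]
    return float(values.max()), float(values.mean())


# Hypothetical dose per slice in Gy: high in the cranial half, low caudally.
dose_per_slice = np.concatenate([np.full(20, 40.0), np.full(20, 2.0)])
dose_per_slice[5] = 45.0  # assumed hot spot inside the cranial half

# Broadcast to a toy volume with axes (slice, row, column).
dose = dose_per_slice[:, None, None] * np.ones((1, 4, 4))

long_contour = np.ones(dose.shape, dtype=bool)    # delineated down to the last slice
short_contour = np.zeros(dose.shape, dtype=bool)
short_contour[:20] = True                         # stops at a more cranial border

dmax_long, dmean_long = dmax_dmean(dose, long_contour)
dmax_short, dmean_short = dmax_dmean(dose, short_contour)

print(f"long contour:  Dmax = {dmax_long:.2f} Gy, Dmean = {dmean_long:.2f} Gy")
print(f"short contour: Dmax = {dmax_short:.2f} Gy, Dmean = {dmean_short:.2f} Gy")
```

Under these assumptions Dmax is identical for both contours, whereas Dmean roughly halves for the longer contour because low-dose voxels are added. This holds only as long as the hot spot lies within the part that both contours cover.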
Automated delineation of OARs on HNC planning CT scans using a CNN has been investigated previously [20,26,27]. Ibragimov et al. were the first to use a tri-planar CNN and concluded that their method performed better than or comparably to state-of-the-art algorithms and commercial software for spinal cord, mandible, larynx and pharynx, and worse for parotid and submandibular glands, with average DSCs ranging from 69% for pharynx to 90% for mandible. Zhu et al. used a 3D convolutional neural network including whole-volume image segmentations and reported DSCs ranging from approximately 81% for submandibular glands to 92% for mandible. The advantage of our model is the preservation of spatial context while still using patch-based approaches for processing 3D information in detail. Moreover, our results are similar to those of Nikolov et al., but difficult to compare since they use a slightly different measure, i.e. the surface Dice.
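Since the comparisons above rely on DSC and ASSD, a minimal sketch of both measures may help to show why DSC depends on structure volume while ASSD does not. It is written under stated assumptions and is not the implementation used in this study or in the cited works: the surface is taken as all mask voxels with at least one face neighbour outside the mask, ASSD is taken as the mean of the nearest-surface distances pooled over both directions, and the helper names (`dice`, `surface`, `assd`, `cube`) are hypothetical. The brute-force distance computation is suitable only for small toy volumes.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A and B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    return float(2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum()))


def surface(mask: np.ndarray) -> np.ndarray:
    """Mask voxels that have at least one face neighbour outside the mask."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    interior = m.copy()
    for axis in range(m.ndim):
        # A voxel is interior only if both neighbours along every axis are set.
        interior &= np.roll(m, 1, axis=axis) & np.roll(m, -1, axis=axis)
    surf = m & ~interior
    return surf[(slice(1, -1),) * mask.ndim]  # remove the padding again


def assd(a: np.ndarray, b: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """Average symmetric surface distance (assumed definition, see lead-in)."""
    pts_a = np.argwhere(surface(a)) * np.asarray(spacing)
    pts_b = np.argwhere(surface(b)) * np.asarray(spacing)
    if len(pts_a) == 0 or len(pts_b) == 0:
        raise ValueError("both masks must be non-empty")
    # Pairwise Euclidean distances between all surface points (brute force).
    d = np.sqrt(((pts_a[:, None, :] - pts_b[None, :, :]) ** 2).sum(axis=-1))
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(pts_a) + len(pts_b)))


def cube(start: int, size: int, shape=(24, 24, 24)) -> np.ndarray:
    """Toy structure: a cube of the given edge length, placed at (start, 6, 6)."""
    m = np.zeros(shape, dtype=bool)
    m[start:start + size, 6:6 + size, 6:6 + size] = True
    return m


# The same one-voxel shift applied to a small and to a large toy structure.
dsc_small = dice(cube(6, 4), cube(7, 4))
dsc_large = dice(cube(6, 12), cube(7, 12))
assd_small = assd(cube(6, 4), cube(7, 4))
assd_large = assd(cube(6, 12), cube(7, 12))

print(f"small structure: DSC = {dsc_small:.3f}, ASSD = {assd_small:.3f} voxels")
print(f"large structure: DSC = {dsc_large:.3f}, ASSD = {assd_large:.3f} voxels")
```

In this toy setting the same one-voxel shift lowers DSC to 0.75 for the small cube but only to about 0.92 for the large one, while ASSD stays below one voxel and is similar in both cases. This is consistent with the observation that, for large OARs, only large delineation differences affect DSC.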