WP1 New light event readout board prototype
1.1 WP presentation
This task is about converting into electrical signals the VUV photons (vacuum ultraviolet, 178 nm) generated during
scintillation within the liquid xenon (LXe). The UV photons will be collected by off-the-shelf photodetectors. To detect the low charges (a few hundred pC) deposited on the photodetector by photoelectrons, together with their timing, a self-triggered circuit is needed to collect and preprocess these data and ensure proper digital conversion. When a scintillation occurs, only a few photons reach the detector through the LXe. Since the light emission is isotropic, several detectors can output a signal. Hence, we have to discriminate correctly between low charges to select the detector receiving the most photons, i.e. the highest charge.
Typically, the photodetector output charge is converted into a time pulse whose duration is proportional
to the charge value; this technique is known as “Time Over Threshold” (TOT). Obtaining such a linear conversion is challenging for low charges, yet essential to improve image SNR [18]. The pulse duration is then digitized using an external clock. Delay-line architectures will be studied to determine the best compromise between complexity and accuracy. Finally, the external clock sets a time window during which the photodetector could randomly fire several times; we must ensure that the charge-to-time conversion can be completed within this time window. Also, considering the number of disintegrations expected, the data flow is tremendous, about 1×10^6 pulses/s.
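As a rough illustration of the TOT principle, the sketch below (Python) converts a charge into a pulse duration and counts it against an external clock. The pulse model, threshold, gain and clock frequency are illustrative assumptions, not the parameters of the actual circuit.

```python
# Minimal illustration of Time Over Threshold (TOT): the analogue pulse produced
# by a charge Q is compared against a fixed threshold, and the time the pulse
# spends above that threshold is counted in periods of an external clock.
# All numerical values below are illustrative placeholders, not circuit parameters.
import numpy as np

CLOCK_FREQ_HZ = 100e6          # assumed external clock (10 ns period)
THRESHOLD_V = 0.05             # assumed comparator threshold
GAIN_V_PER_PC = 0.01           # assumed shaper gain: peak voltage per pC
TAU_S = 200e-9                 # assumed shaping time constant

def shaped_pulse(q_pc, t):
    """Toy shaper output for a charge q_pc (pC): a simple exponential-like pulse."""
    return GAIN_V_PER_PC * q_pc * (t / TAU_S) * np.exp(1.0 - t / TAU_S)

def tot_counts(q_pc, t_max=5e-6):
    """Return the TOT expressed in external-clock counts for a given charge."""
    t = np.arange(0.0, t_max, 1.0 / CLOCK_FREQ_HZ)   # sample at clock edges
    above = shaped_pulse(q_pc, t) > THRESHOLD_V
    return int(above.sum())                           # clock periods above threshold

if __name__ == "__main__":
    for q in (50, 100, 200, 400):                     # charges in pC
        print(f"{q:4d} pC -> TOT = {tot_counts(q)} clock counts")
```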
To locate where the detection took place, each photodetector will have its digitized output data tagged
with the photodetector address, so that the Event Builder (WP2) receives light signals with the proper localisation information.
1.2 Results
Time Over Threshold is not suited when one needs to count a large number of photons, i.e. more than about 10, as shown in Fig. 1.2. As the number of PEs increases, the integration of the light pulses into a single analogue pulse by the shaper no longer yields a difference in the time-varying digital signal stemming from the threshold comparison. Thus, this method cannot count more than a dozen PEs at best. A better solution to correctly count more than 10 PEs is to generate several TOTs using several threshold voltages on the same analogue pulse and then add the TOTs up to obtain the corresponding number of PEs, Fig. 1.2.
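The multi-threshold idea can be sketched as follows (Python); the pulse model, thresholds and gains are purely illustrative assumptions, the point being only that the TOTs measured at several thresholds on the same analogue pulse are summed and then calibrated back to a PE count where a single threshold saturates.

```python
# Sketch of the multi Time-Over-Threshold (MTOT) idea: the same analogue pulse is
# compared against several thresholds, each comparator yields its own TOT, and the
# TOTs are summed; an offline calibration curve then maps the sum back to a number
# of photoelectrons (PEs). Pulse model, thresholds and gains are illustrative only.
import numpy as np

def shaped_pulse(n_pe, t, tau=200e-9, gain=0.02):
    """Toy analogue pulse for n_pe photoelectrons (peak amplitude ~ gain * n_pe)."""
    return gain * n_pe * (t / tau) * np.exp(1.0 - t / tau)

def tot_counts(t, v, threshold, clock_hz=100e6):
    """TOT of waveform v(t) above a threshold, in external-clock counts."""
    dt = t[1] - t[0]
    return int(round((v > threshold).sum() * dt * clock_hz))

def mtot_sum(n_pe, thresholds_pe=(1, 4, 16, 64), pe_to_volt=0.02):
    """Sum of the TOTs measured at the different thresholds for one pulse."""
    t = np.arange(0.0, 5e-6, 1e-9)
    v = shaped_pulse(n_pe, t)
    return sum(tot_counts(t, v, th * pe_to_volt) for th in thresholds_pe)

if __name__ == "__main__":
    # The summed TOT keeps growing well beyond ~10 PEs, where a single TOT saturates.
    for n in (5, 20, 80):
        print(n, "PEs -> summed TOT =", mtot_sum(n), "counts")
```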
A new readout circuit implementing multi time-over-threshold (MTOT) has been designed; Fig. 1.3 shows the block diagram of the board. The PCB format is fully compatible with the XEMIS2 prototype, hence testing the prototype in real conditions will be possible. Compared to the existing light readout circuit, this one has 6 different digitally tunable (12-bit) threshold voltages and increased amplifier and comparator bandwidths, resulting in better noise performance. Figure 1.4 shows a picture of the fabricated prototype, called XSREMTOT.
A calibration protocol was applied on a test bench to set the values of the four thresholds so as to reach a noise count rate of 1 kHz. The calibration set-up consists of a pulse generator, the XSREMTOT board, an FPGA, and a computer for data recording.
Figure 1.4a shows the MTOT method for two configurations of threshold levels, in blue for (1,4,16,64)
PEs and in green for (1,4,8,32) PEs. These resolutions are compared with that of the STOT method (cf. orange vs blue or green dots in Figure 1.4a). The theoretical ideal resolution, depicted by gray squares, represents a scenario where the only perturbation is the unavoidable variation in PMT gain, while all other potential disturbances are absent. This comparison highlights the impact of these disturbances on the reconstruction accuracy. Figure 1.4b directly compares the two methods, STOT and MTOT. The relative difference between these methods is calculated for both sets of thresholds. This graph clearly highlights the precision improvement provided by MTOT. With each new threshold triggered (represented by blue and green dashed lines), the resolution of NPEm improves dramatically, reaching up to 70% for the highest signals (>64 PEs) with the first set of thresholds within the studied range. It is important to note that the second set (magenta curve) offers better accuracy at the beginning of the studied interval, as its configuration is optimized for detecting smaller signals. The number of PEs detected in XEMIS2 depends on the activity of the radioactive source used. Accurate knowledge of the distribution of the vacuum UV photons across the XEMIS2 PMT network as a function of activity will enable maximizing efficiency by selecting the optimal threshold configuration for each injected dose.
Another significant improvement brought by WP1 is a gain of up to 30% in the precision of the estimated event time, represented by the leading-edge (LE) time.
Figure 1.5 shows the standard deviation (represented by β) of the T0 distribution. Figure 1.5a compares the reconstruction of the initial time T0 using the STOT method (yellow curve) and the MTOT method (blue curve for the set 1, 4, 16, and 64 PEs and green curve for the set 1, 4, 8, and 32 PEs) by plotting the parameter β as a function of NPEm. Figure 1.5b illustrates the relative difference between the two methods for both sets of thresholds, calculated as before. This visualization highlights the improvement in the precision of the reconstruction of the initial time T0 provided by the MTOT method. The walk correction provided by each threshold allows for an improvement of 20% to 30% in the
time reconstruction of the event within the studied range.
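One simple way to see why additional thresholds reduce the leading-edge time walk is the linear extrapolation sketched below (Python, illustrative only; this is not the correction actually implemented on the board): two crossing times on the same rising edge give a local slope from which the pulse start time can be estimated largely independently of amplitude.

```python
# Sketch of a leading-edge (LE) time-walk correction using two threshold-crossing
# times of the same rising edge: assuming an approximately linear leading edge,
# the crossing times t1 (at V1) and t2 (at V2 > V1) are extrapolated back to V = 0
# to estimate the pulse start time T0. The waveform model is an assumption.
def t0_from_two_thresholds(t1, v1, t2, v2):
    """Linear extrapolation of the leading edge back to zero amplitude."""
    slope = (v2 - v1) / (t2 - t1)          # local slope of the rising edge
    return t1 - v1 / slope                 # time at which the edge would cross 0

if __name__ == "__main__":
    # hypothetical crossing times (ns) of thresholds at 0.02 V and 0.08 V
    print(t0_from_two_thresholds(12.0, 0.02, 15.0, 0.08))   # -> 11.0 ns
```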
WP2 Event builder
2.1 WP presentation
The light and charge information received from the sensors (WP1) must be analyzed to determine the instances where the two signals are synchronous, i.e. both values are above a certain threshold within a small time range, indicating the occurrence of an actual phenomenon, called an ’event’. These events are, in turn, output to a computer (WP3). Therefore, the two data streams must first be sorted by their time of occurrence before events can be detected. In the previous version of the system (XEMIS-2), which was a smaller prototype, the sorting and event-building task was accomplished in software. As the new system under development (XEMIS-3) receives about 256 LVDS channels for PUs (charge information) and 64 LVDS channels for PMs (light information), processing these channels in real time to obtain the events dictates a faster, hardware-based processing, which defines the scope of WP2. Moreover, as the number of PM/PU channels could increase in the future, for XEMIS-3 or later versions, the hardware solution must be fast, efficient and flexible enough to be tuned to any speed requirement the overall system demands. FPGAs have therefore been selected as the framework to provide this performance, owing to their high speed, high flexibility and easy-to-use built-in LVDS receivers. The tasks included in WP2 are listed below:
- Task 1: Designing the overall system
- Task 2: Receiving & decoding of data from LVDS channels
- Task 3: Distribution of data between FPGAs (Ethernet interconnect)
- Task 4: Sorting Accelerator
- Task 4.1: Exploring the design-space and literature review
- Task 4.2: Designing algorithm & architecture
- Task 4.3: RTL implementation (VHDL)
- Task 5: Event-builder & Outputting
2.2 Results
2.2.1 Overall design
The architecture for implementing the scheme of Fig. 2.1 has been developed. According to this architecture, if the total number of PU channels is NPU and the number of FPGAs is NFPGA, a dataset of size n × NPU is formed by receiving n samples from each channel. This dataset is given to the first FPGA for sorting and outputting. The second dataset is, in turn, given to the second FPGA, and so on. Following the same pattern, the (NFPGA + 1)th dataset is sent to the first FPGA, which should have already finished processing the first dataset by that time. This means that each FPGA has a time budget of n × NPU × NFPGA cycles to receive, sort and output a dataset of size n × NPU. In this summary, only the PU receiving and sorting is covered; the PM channels are treated similarly, but separately, up until the ”event-builder” block, where the sorted PM and PU results are processed together to construct events. Regarding practical aspects, FPGA boards at reasonable prices have roughly 40-50 LVDS receivers. Allocating 32 of these to PU channels and the rest to PM channels means that about 8 FPGAs are needed to accommodate all the existing PU channels, which number around 256. Therefore, NPU = 256, NFPGA = 8 and n = 16 are viable choices, simplifying the above expressions so that each FPGA must be able to receive, sort and output a dataset of size 4K within 32K clock cycles. As n is proportional to the time window of the dataset, higher values of n result in more accuracy, but at increased cost for the sorter, which is the bottleneck of the design and whose size and throughput are highly dependent on the dataset size.
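The arithmetic behind these choices can be checked with a tiny sketch (values taken from the text; the assignment function is simply the round-robin scheme described above):

```python
# Sketch of the round-robin dataset distribution described above: with NPU channels,
# n samples per channel and NFPGA FPGAs, dataset k (of size n * NPU) is assigned to
# FPGA k mod NFPGA, giving each FPGA n * NPU * NFPGA cycles to receive, sort and
# output its dataset before its next one arrives. Numbers below follow the text.
N_PU = 256      # PU channels
N_FPGA = 8      # FPGAs
n = 16          # samples per channel per dataset

dataset_size = n * N_PU                  # 4096 samples  ("4K")
time_budget = n * N_PU * N_FPGA          # 32768 cycles  ("32K")

def fpga_for_dataset(k):
    """Index of the FPGA that processes dataset k (0-based, round-robin)."""
    return k % N_FPGA

if __name__ == "__main__":
    print(dataset_size, time_budget)                 # 4096 32768
    print([fpga_for_dataset(k) for k in range(10)])  # 0..7, then 0, 1 again
```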
2.2.2 Decoding/Inputting (Task 2)
The block-level diagram of Fig. 2.1 depicts the overall data flow of WP2, in which data is delivered to the ’Event-Builder’ after passing through the sorter. Since only the portion of the received packages that contains the time information (26 bits) is needed for sorting, the rest of the data is stored in RAM, to
be fetched back and used by the event builder once the sorted result is ready. This substantially reduces the data width and cost of the sorter, which is the cost/throughput bottleneck of the design. The hardware implementation of this part is currently ongoing.
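A plain software view of this key/payload split is sketched below (Python); the assumption that the 26-bit time field occupies the low bits of each word is ours, purely for illustration, and the RAM is modelled as a simple list.

```python
# Sketch of the key/payload split described above: only the 26-bit time field of each
# received package is sent to the sorter, while the full package is stored in RAM at a
# known index; after sorting the (time, index) pairs, payloads are fetched back in
# time order for the event builder. The field layout below is an assumption.
TIME_BITS = 26
TIME_MASK = (1 << TIME_BITS) - 1

def split_and_sort(packages):
    """packages: list of raw integer words; returns the full packages in time order."""
    ram = list(packages)                                                 # full packages kept in RAM
    keys = [(pkg & TIME_MASK, idx) for idx, pkg in enumerate(packages)]  # narrow 26-bit keys + index
    keys.sort()                                                          # the sorter only sees the keys
    return [ram[idx] for _, idx in keys]                                 # fetch payloads in sorted order

if __name__ == "__main__":
    raw = [0b1010 << 26 | 500, 0b0001 << 26 | 20, 0b1111 << 26 | 300]
    print([pkg & TIME_MASK for pkg in split_and_sort(raw)])   # [20, 300, 500]
```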
2.2.3 Sorting accelerator (Task 4)
As discussed, the sorting accelerator must be able to input, sort, and output a dataset of size 4K within 32K clock cycles (multi-streaming of inputs/outputs is allowed, accelerating the I/O phase). For this purpose, a bitonic sorter with flexible dataset size has been developed in SystemVerilog. The synthesis results reveal that even for a dataset size of 128, the combinational area consumption exceeds the total available area of a Virtex-7 FPGA. As there must be free space in the FPGAs for other functionality, e.g. Ethernet and the Event-Builder, and as the dataset size might need to be increased further (n > 32) for improved accuracy, these requirements call for a much more efficient sorting accelerator, which is currently the main area/throughput bottleneck of the design.
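For reference, a behavioural Python sketch of a bitonic sorting network is given below (this is not the SystemVerilog source, only the same compare-and-swap pattern in software). The O(N log² N) comparator count of such a network is what makes a fully combinational version prohibitive for large N on a single FPGA.

```python
# Software sketch of a bitonic sorting network: for a power-of-two dataset size, the
# nested loops below generate exactly the compare-and-swap pattern of the network.
def bitonic_sort(data):
    n = len(data)
    assert n and (n & (n - 1)) == 0, "bitonic sort assumes a power-of-two size"
    a = list(data)
    k = 2
    while k <= n:                      # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:                   # compare distance within the merge stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

if __name__ == "__main__":
    print(bitonic_sort([7, 3, 9, 1, 4, 8, 2, 6]))   # [1, 2, 3, 4, 6, 7, 8, 9]
```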
2.2.3.1 Merge-sort
Generally, for sorting large datasets with minimal cost, a divide-and-conquer strategy is practiced: the dataset is divided into smaller subsets, each subset is sorted separately, and the results are finally merged to construct the final result. The problem with this highly popular approach, referred to as ”merge-sort”, is that the hardware for merging at the final stage is quite costly. We have proposed an alternative solution for implementing merge-sort, which is demonstrated in Fig. 2.2. The scheme utilizes a ”pivot-finder”, inspired by the quick-sort algorithm. In quick-sort, ”pivots” are elements
of the dataset that divide it into equal, non-overlapping subsets (’non-overlapping’ meaning that the range of data in each subset is distinct from the others). The pivots extracted by the pivot-finder are used in the arranger to merge the two subsets. The use of the pivot-finder reduces the complex design of the merger to much simpler blocks (Arranger, 1-merger) that are substantially
less costly to implement in hardware.
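A possible software analogue of this pivot-assisted merge is sketched below (Python; this is our reading of the scheme of Fig. 2.2, the actual RTL blocks may differ, and the exact pivot computation stands in for the hardware pivot-finder).

```python
# Possible software analogue of the pivot-assisted merge: a pivot value that splits the
# combined data into two equal, non-overlapping halves lets the "arranger" route
# elements into a low part and a high part, after which two small, independent mergers
# (here: sorted(), for brevity) finish the job instead of one large merger.
def pivot_value(a, b):
    """Exact median-style pivot of the combined data; stands in for the pivot-finder."""
    combined = sorted(a + b)
    return combined[len(combined) // 2 - 1]

def pivot_assisted_merge(a, b):
    """Merge two sorted subsets a and b using a single pivot to split the work."""
    p = pivot_value(a, b)
    low  = [x for x in a if x <= p] + [x for x in b if x <= p]   # arranger: route by pivot
    high = [x for x in a if x > p] + [x for x in b if x > p]
    return sorted(low) + sorted(high)                            # two half-size merges

if __name__ == "__main__":
    print(pivot_assisted_merge([1, 4, 9, 12], [2, 3, 10, 15]))
    # [1, 2, 3, 4, 9, 10, 12, 15]
```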
2.2.3.2 Bucket-sort
The pivot-finder can also be utilized in the other popular sorting scheme for large datasets, ”bucket-sort”, shown in Fig. 2.3. From a hardware perspective, the literature currently uses an approximation of the pivot-finder (known as ”sample-sort”), which is susceptible to hazards due to its inaccuracy. The proposed pivot-finder finds the exact pivots in two passes over the dataset. The RTL development of the sorting accelerator is currently in progress.
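A plain software view of this bucket-sort flow is given below (Python). The pivot-finder is abstracted as a function returning exact equal-count splitters; the two-pass hardware algorithm itself is not reproduced here. With exact pivots every bucket receives the same number of elements, so no bucket can overflow, which is precisely the hazard of sample-sort.

```python
# Plain software view of the bucket-sort flow of Fig. 2.3, with the pivot-finder
# abstracted away. Exact pivots guarantee equally filled buckets, each of which can
# then be sorted independently by a small sorter.
def exact_pivots(data, n_buckets):
    """Exact equal-count splitters; stands in for the proposed two-pass pivot-finder."""
    s = sorted(data)
    step = len(s) // n_buckets
    return [s[i * step - 1] for i in range(1, n_buckets)]

def bucket_sort(data, n_buckets=4):
    pivots = exact_pivots(data, n_buckets)
    buckets = [[] for _ in range(n_buckets)]
    for x in data:                                   # scatter pass
        idx = sum(x > p for p in pivots)             # bucket whose range contains x
        buckets[idx].append(x)
    return [x for b in buckets for x in sorted(b)]   # each bucket sorted independently

if __name__ == "__main__":
    print(bucket_sort([9, 1, 14, 3, 7, 12, 5, 11], n_buckets=4))
    # [1, 3, 5, 7, 9, 11, 12, 14]
```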
2.2.4 Current progress
The current progress of WP2 in each task is summarized below:
Task 1: Designing the overall system – 90%
Task 2: Receiving & decoding of data from LVDS channels – 50%
Task 3: Distribution of data between FPGAs (Ethernet interconnect) – 0%
Task 4: Sorting Accelerator
Task 4.1: Exploring the design-space and literature review – 90%
Task 4.2: Designing algorithm & architecture – 80%
Task 4.3: RTL implementation (VHDL) – 10%
Task 5: Event-builder & Outputting – 25%
Task 6: Publications – 50% (Waiting for RTL results)
WP3 Acceleration of the reconstruction process for real-time 3-γ imaging
3.1 WP presentation
Artificial intelligence has already demonstrated its potential within the field of medical image processing for tasks such as segmentation, denoising, super-resolution or classification. In the last couple of years, there has also been increasing interest in the deployment of AI for raw data correction and the image reconstruction process. Potential advantages include real-time execution once the model is trained and the possibility to directly correct for the physics of the detection process within the reconstruction task. The disintegration events from emitters used in 3-gamma imaging are processed one by one in order to determine the position of the third gamma emission. Within this context, analytical methods are time-consuming and lead to inaccurate models. Artificial intelligence approaches, coupled with realistic Monte Carlo simulations (MCS), have the potential to solve this problem. Indeed, by using, for each photon, its interaction positions in the liquid xenon and the associated energy information, we will train a convolutional (deep learning, DL) neural network capable of predicting the direction of the third gamma from the measurements that will be provided by the Event Builder to be developed in WP2 above. The use of MCS will provide sufficient data for model learning. We have previously used similar approaches to predict the interaction position in monolithic PET detectors [21]. The geometry of a total-body clinical system based on XEMIS detector technology will be considered in these simulations, in addition to the geometry of XEMIS2, which will be used to experimentally validate the developed algorithm during the integration work to be carried out in WP4. In the second phase of this work, a direct DL-based image reconstruction algorithm will be developed that will produce 3D reconstructed images using the acquired raw datasets and the previously determined third-gamma location information. The performance of this algorithm, both in terms of precision and speed of execution, will be compared with traditional iterative reconstruction algorithms used in PET imaging and with a reconstruction algorithm for 3-gamma imaging recently developed by the LaTIM [13].
The work in this work package is closely related to the development of the Advanced Event Builder that will be provided in WP2. The output of the Event Builder will be used as the input of the reconstruction algorithm. The algorithm developed in this WP3 will be benchmarked for its performance using measured datasets within the context of the integrative WP4.
The Direct3γPET pipeline is a comprehensive solution designed to address the challenges of 3-γ PET imaging. The proposed solution is divided into three key parts: Sequence and Compton Cone Building, Histo-Image Building, and Final Image Reconstruction. Each part includes multiple subtasks, each contributing to the overall goal of accurate and efficient image reconstruction.
The full pipeline is shown in the figure below:
Part 1: Sequence and Compton Cone Building
- Event Detection:
- The process begins with detecting raw 3-γ events from the PET scanner. Each event represents a series of photon interactions that occur when gamma photons interact with the detector material.
- Photon Interaction Sequencing:
- Modified Interaction Network (MIN): To correctly sequence the photon interactions, a Modified Interaction Network (MIN) based on a graph neural network (GNN) is employed. This network is designed to handle the complex task of determining the order of interactions, particularly in cases where there are multiple interactions. The correct sequencing of these interactions is essential for the accurate construction of the Compton cone.
- Sequence Determination: The MIN processes the detected interactions, analyzing both the energy and spatial data to establish the correct order. This sequencing step is crucial because it directly impacts the accuracy of the subsequent Compton cone construction, thus the estimation of emission point.
- Emission point estimation:
- Cone Construction: Once the correct sequence of photon interactions is determined, the Compton cone is constructed. The angle of the cone is determined by the Compton scattering angle of the first interaction, and the axis of the cone is defined by the line connecting the first two interaction points.
- Emission Point Estimation: The emission point is estimated by finding the intersection between the Compton cone and the Line of Response (LOR). The LOR is defined by the two detected 511-keV photons that result from the positron annihilation. This intersection helps to narrow down the possible locations of the radioactive source, providing more precise information for image reconstruction.
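The geometry of this cone/LOR intersection can be sketched as below (Python). The Compton relation and the 1157 keV third-gamma energy follow the text; the brute-force scan along the LOR, the coordinate values and the choice of the closest point are illustrative simplifications, not the pipeline's actual solver.

```python
# Sketch of the Compton-cone / LOR intersection used for emission-point estimation.
# The cone apex is the first interaction p1, its axis is the back-projected scatter
# direction (from the second interaction p2 towards p1), and its half-angle comes from
#   cos(theta) = 1 - m_e c^2 * (1/E_scattered - 1/E_incident).
# The intersection is found here by a simple scan along the LOR (illustrative only).
import numpy as np

ME_C2_KEV = 511.0

def compton_cos_theta(e_incident_kev, e_deposited_kev):
    e_scattered = e_incident_kev - e_deposited_kev
    return 1.0 - ME_C2_KEV * (1.0 / e_scattered - 1.0 / e_incident_kev)

def emission_point(p1, p2, e_dep, lor_a, lor_b, e_incident=1157.0, steps=2000):
    """Point on the LOR [lor_a, lor_b] whose direction seen from the cone apex p1
    best matches the Compton cone defined by the first deposited energy e_dep."""
    p1, p2, lor_a, lor_b = map(np.asarray, (p1, p2, lor_a, lor_b))
    axis = (p1 - p2) / np.linalg.norm(p1 - p2)        # back-projected scatter direction
    target = compton_cos_theta(e_incident, e_dep)
    best, best_err = None, np.inf
    for t in np.linspace(0.0, 1.0, steps):            # candidate points on the LOR
        q = lor_a + t * (lor_b - lor_a)
        d = (q - p1) / np.linalg.norm(q - p1)
        err = abs(np.dot(d, axis) - target)           # distance from the cone surface
        if err < best_err:
            best, best_err = q, err
    return best

if __name__ == "__main__":
    # Hypothetical geometry (cm): third-gamma interactions p1, p2 and a LOR from a to b.
    print(emission_point(p1=[10, 0, 0], p2=[15, 5, 0], e_dep=300.0,
                         lor_a=[-20, -20, 0], lor_b=[20, 20, 0]))
```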
Part 2: Histo-Image Building
- Line of Response (LOR) Processing:
- Blurring Effects and Detector Response Error Propagator (DREP): The detected events are processed along the LOR to account for blurring effects caused by detector imperfections. The Detector Response Error Propagator (DREP) method is applied to estimate the uncertainty in the Compton angle. This uncertainty arises due to errors in both spatial and energy measurements. These errors are modeled using a non-symmetric Gaussian distribution, which more accurately reflects the nature of the blurring effects compared to traditional symmetric models.
- Uncertainty Management: The DREP method incorporates the uncertainties in both energy and spatial resolution, which are crucial for accurately estimating the Compton angle and, consequently, the emission point. By propagating these uncertainties along the LOR, the method provides a more realistic representation of the detected events, reducing potential errors in the reconstruction process.
- Histo-Image Generation:
- Non-Symmetric Gaussian Modeling: The DREP method generates a non-symmetric Gaussian function that models the distribution of potential emission points along the LOR. This approach allows for a more accurate estimation of the emission points by taking into account the directional uncertainties associated with the detected events.
- Attenuation Correction: Attenuation correction is applied to account for the absorption of both 511 keV and 1157 keV gamma rays as they pass through the scanned object. This correction is essential for accurately reconstructing the activity distribution, especially in regions of the body where attenuation effects are significant, such as dense tissues or large organs.
- Histo-Image Creation: The corrected data is used to generate a preliminary histo-image. This image represents the spatial distribution of the detected radioactivity within the scanned object. The histo-image serves as the foundational layer for the subsequent image reconstruction process, providing a detailed map of the detected events that will be refined in the final stages.
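The sketch below (Python) illustrates how a single event could be deposited into the histo-image along its LOR using a non-symmetric Gaussian weight. This is a simplified stand-in for the DREP propagation: the widths on either side of the estimated emission point are placeholder values, and the attenuation correction is omitted.

```python
# Sketch of one event's contribution to the histo-image: along the LOR, a non-symmetric
# Gaussian centred on the estimated emission point assigns a weight to each voxel the
# LOR crosses, with different widths on either side of the centre (illustrative only).
import numpy as np

def asymmetric_gaussian(s, s0, sigma_minus, sigma_plus):
    """Two-sided Gaussian in the normalised LOR coordinate s, centred at s0."""
    sigma = np.where(s < s0, sigma_minus, sigma_plus)
    return np.exp(-0.5 * ((s - s0) / sigma) ** 2)

def deposit_event(histo, lor_a, lor_b, s0, sigma_minus, sigma_plus, n_samples=200):
    """Accumulate one event's weighted footprint along the LOR into a 3D histo-image."""
    lor_a, lor_b = np.asarray(lor_a, float), np.asarray(lor_b, float)
    s = np.linspace(0.0, 1.0, n_samples)                     # normalised LOR coordinate
    w = asymmetric_gaussian(s, s0, sigma_minus, sigma_plus)
    for si, wi in zip(s, w):
        voxel = tuple(np.round(lor_a + si * (lor_b - lor_a)).astype(int))
        if all(0 <= v < d for v, d in zip(voxel, histo.shape)):
            histo[voxel] += wi                               # attenuation correction omitted
    return histo

if __name__ == "__main__":
    img = np.zeros((64, 64, 64))
    deposit_event(img, [5, 5, 32], [60, 58, 32], s0=0.4, sigma_minus=0.05, sigma_plus=0.10)
    print(img.sum())
```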
Part 3: Final Image Reconstruction
The final step of the Direct3γPET pipeline uses a generative model based on an encoder-decoder Convolutional Neural Network (CNN) to refine the preliminary histo-image and produce the final reconstructed image.
- Deblurring and Denoising:
- The generative model processes the initial histo-image to remove blurring and noise, which are common issues in 3-γ PET imaging. The encoder-decoder CNN is designed to keep important details in the image while reducing noise, resulting in a clearer and more accurate output.
- Adversarial Training:
- The CNN is trained using adversarial methods, where the generative model creates images, and a discriminator network tries to tell the difference between these generated images and real ones. This training method pushes the CNN to produce images that closely resemble the actual distribution of radioactive sources, leading to realistic reconstructions.
- Attention U-Net Architecture:
- The model incorporates an Attention U-Net architecture, which uses attention gates to focus on the most important areas of the input image. This helps the network to reduce irrelevant information and improve the accuracy of the final image by maintaining key structural features.
- Final Image Reconstruction:
- The generative model produces a fully reconstructed 3D image that accurately represents the scanned area. The final image is a balance between including detected events and achieving precise reconstruction, making it useful for clinical and research purposes.
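To make the attention mechanism concrete, a minimal PyTorch sketch of a generic Attention U-Net attention gate is given below (assuming PyTorch is available; channel sizes are placeholders and this is not the project's exact model). The gating signal and the skip connection are projected to a common channel count, summed, passed through ReLU and a 1×1 convolution with a sigmoid, and the resulting attention map rescales the skip features before they are used by the decoder.

```python
# Minimal sketch of an additive attention gate as used in Attention U-Net decoders.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, ch_x, ch_g, ch_int):
        super().__init__()
        self.theta_x = nn.Conv3d(ch_x, ch_int, kernel_size=1)   # project skip features
        self.phi_g = nn.Conv3d(ch_g, ch_int, kernel_size=1)     # project gating signal
        self.psi = nn.Conv3d(ch_int, 1, kernel_size=1)          # attention coefficients
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, g):
        a = self.act(self.theta_x(x) + self.phi_g(g))           # additive attention
        alpha = torch.sigmoid(self.psi(a))                      # values in (0, 1)
        return x * alpha                                        # rescaled skip features

if __name__ == "__main__":
    x = torch.randn(1, 32, 16, 16, 16)   # skip features from the encoder
    g = torch.randn(1, 32, 16, 16, 16)   # gating signal (assumed already upsampled)
    print(AttentionGate(32, 32, 16)(x, g).shape)   # torch.Size([1, 32, 16, 16, 16])
```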
3.2 Results
- Photon Interaction Sequence Determination:
- Performance of MIN: The Modified Interaction Network (MIN) was tested against other methods, such as the dϕ-criterion and fully connected neural networks (FCNN), for predicting photon interaction sequences. The results showed that the MIN provided accurate predictions, particularly for events involving more than two interactions. The MIN approach was able to effectively handle the complexities associated with multiple interactions, making it a reliable choice for this task.
- Accuracy in Sequence Determination: The accuracy of the MIN in determining the correct sequence of interactions was higher than that of the other tested methods. This was particularly important for reconstructing the Compton cone, as the correct sequencing directly impacts the accuracy of the emission point estimation.
- Image Reconstruction:
- Comparison of Reconstruction Methods: Different methods were tested for image reconstruction, including those that incorporated the interaction ordering algorithm. The results indicated that the method using this algorithm produced images with better structural preservation and noise reduction compared to other approaches. The use of adversarial training in the CNN further improved the quality of the final images by enhancing structural details and reducing noise.
- Impact of Noise and Blurring: The reconstruction methods were evaluated for their ability to handle noise and blurring in the histo-images. The CNN, particularly when trained adversarially, was effective in mitigating these issues, resulting in clearer and more accurate images. The attention mechanisms in the U-Net architecture also contributed to the refinement of the images, ensuring that the final output was both detailed and reliable.
- Overall Performance:
- Balancing Event Inclusion and Accuracy: The Direct3γPET pipeline was successful in balancing the inclusion of detected events with the accuracy of the reconstruction process. This balance is crucial for achieving high-quality 3D images that are suitable for clinical and research applications. The pipeline’s ability to accurately reconstruct the emission points, even in the presence of uncertainties and detector imperfections, demonstrated its potential as a powerful tool for 3-γ PET imaging.
The figure below shows the emission phantom (a), the obtained histo-image (b), and the final image reconstructed with the Direct3γPET pipeline.
References
[18] T. Orita et al., “The current mode Time-over-Threshold ASIC for a MPPC module in a TOF-PET system,” Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 912, pp. 303–308, 2018.
[13] D. Giovagnoli et al., “A Pseudo-TOF Image Reconstruction Approach for Three-Gamma Small Animal Imaging,” IEEE Transactions on Radiation and Plasma Medical Sciences, doi: 10.1109/TRPMS.2020.3046409.
[21] A. Iborra, D. Visvikis et al., “Ensemble of Neural Networks for 3D Position Estimation in Monolithic PET Detectors,” Physics in Medicine & Biology, vol. 64, no. 19, p. 195010, 2019.