Cairo is not a good fit, unless all you care about is drawing a frame once in a while, and you don’t mind doing so on the CPU.
Modern UI toolkits try to move all rendering to the GPU: it is the one dedicated component that lets you hand buffers over to the compositor for presentation on the screen, and it lets you power down the CPU in order to save battery.
The best cross-platform API for using the GPU is, currently, OpenGL; even in the future it’s entirely likely that OpenGL will keep working as a compatibility layer on top of Vulkan. For the time being, Vulkan does not offer us anything we need, though at some point in the future we may decide to spend the time writing a better Vulkan renderer and switch to it.
My feeling is that the author of Blend2D does not fully agree with you. At least drawing elements such as lines on the GPU does not seem to work well: projects like FastUIDraw, vkvg (jpbruyere/vkvg, a Vulkan 2D graphics library) and the Cairo GL backend have mostly failed. The Blend2D author also argued that CPU (SIMD) drawing consumes less power than GPU drawing, which is important for mobile devices. I was told that transferring data from GPU memory back to the CPU or RAM is a bottleneck. But that is only what I have heard; I have no experience of my own.
Well, they are entirely entitled to their opinion; but you have to realise that basically everyone has moved already, or is in the process of moving to GPU-based renderers.
The Cairo GL backend was a science experiment. The problem is that Cairo was never designed to work on GPUs. It was designed to work on the hardware that was prevalent in the Linux space in 2003-2005, which was heavily skewed towards CPUs and shared system memory buffers that had fast readback properties; it also worked on a very specific subset of integrated Intel GPUs, because those had dedicated 2D pipelines. Those GPUs do not exist any more.
Yes, and that’s why Cairo is generally a bad API to try to migrate to the GPU. Nevertheless, it’s perfectly possible to write a fairly good renderer using the GPU and OpenGL; you need to understand how GPUs work, and you need to avoid doing work on the CPU after you have pushed the data to the GPU. That is what GTK4 does. It’s also what web rendering engines do, as well as other toolkits on both desktop and mobile, where you want to offload as much as possible to the GPU so that the whole SoC package can be placed in a lower power state as quickly as possible, in order to preserve battery. SIMD types are perfectly fine for operating on bulk vector data; that’s why I wrote Graphene, which is also what GTK uses. But you either need a very fancy Intel CPU in order to have access to a good number of vector registers, or you want to do the vectorisation on the GPU, which has more, wider, more efficient, parallelised pipelines.
Moving big chunks of data between CPU and GPU isn’t fast. A while back I tried writing a filter for images that lived in CPU memory, passing them to the GPU for filtering and then back again to CPU space. The SIMD was fast, but the back-and-forth transfer was slow. It was much faster to slice the image into strips and filter the strips in separate threads on the CPU.
Maybe one day Intel will make use of their CPU-GPU shared memory facility and the bottleneck will go away.
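For anyone curious what the strip approach mentioned above might look like, here is a minimal sketch in Python. It is not the filter from that experiment: the names box_blur_rows and filter_in_strips are hypothetical, the kernel is a simple 3-tap horizontal box blur chosen only for illustration, and the image is assumed to be a list of rows of integer pixel values. Note also that on standard CPython builds the interpreter lock keeps pure-Python loops from running in parallel, so a real version would call a native kernel (C, or a library that releases the lock) for each strip; the sketch only shows how the work is partitioned and reassembled.

```python
from concurrent.futures import ThreadPoolExecutor


def box_blur_rows(image, y0, y1):
    """Apply a 3-tap horizontal box blur to rows y0..y1-1 and return the new rows."""
    out = []
    for y in range(y0, y1):
        row = image[y]
        width = len(row)
        new_row = []
        for x in range(width):
            # Clamp at the edges so border pixels reuse their nearest neighbour.
            left = row[max(x - 1, 0)]
            right = row[min(x + 1, width - 1)]
            new_row.append((left + row[x] + right) // 3)
        out.append(new_row)
    return out


def filter_in_strips(image, n_strips=4):
    """Split the image into horizontal strips and filter each strip in its own thread."""
    height = len(image)
    # Never create more strips than there are rows, and always at least one.
    n_strips = max(1, min(n_strips, height))
    bounds = [
        (i * height // n_strips, (i + 1) * height // n_strips)
        for i in range(n_strips)
    ]
    with ThreadPoolExecutor(max_workers=n_strips) as pool:
        # map() yields results in submission order, so strips come back top to bottom.
        strips = pool.map(lambda b: box_blur_rows(image, *b), bounds)
        return [row for strip in strips for row in strip]
```

Because the input image is only read and every strip writes its own output rows, the threads need no locking. The data also never leaves system memory, which is the point being made about avoiding the CPU to GPU round trip.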