Anbox with Wayland support - SurfaceFlinger as a Wayland client


#1

The idea behind this is to make Anbox headless and to turn SurfaceFlinger into a Wayland client so that Anbox supports Wayland. At the moment this works on Sailfish: it boots and renders, but there is no input yet. As an extra feature of sorts, this is running on an ARM device.

Let’s break it down:

basic idea:

  • make Anbox headless (per define or runtime flag)
  • SurfaceFlinger as a Wayland client.
    • advantage: fast
    • disadvantage: works only on devices which use libhybris (or rather have android-native EGL / GLES / gralloc blobs)

implementation details:

  • SurfaceFlinger as a Wayland client
  • we bind mount /dev/ion and GPU specific nodes into the container as well as the Wayland socket.
  • we use overlayfs to insert the gralloc / EGL / GLES blobs into the container: /system can stay read-only
  • for some devices we need a linker hack:
    • the linker hack maps undefined but needed symbols to abort()
      this is needed because some of our devices are not 7.1 compatible out of the box
      in many cases those symbols are never called, but mapping them to abort() is still a hack and
      should only be enabled when absolutely necessary (e.g. if the blobs can only be made to work this way)
  • SurfaceFlinger handles input events and sends them via uinput to InputFlinger.
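
To make the bind mount and overlayfs points above more concrete, here is a minimal sketch that renders them as LXC config lines and overlayfs mount options. It is only an illustration: the thread does not say how the mounts are actually set up, the paths in the usage note are hypothetical, and the helper names (`BindMount`, `lxc_mount_entry`, `overlay_options`) are made up for this example. The read-only overlay form with several lower directories and no upper directory may not be supported by older or vendor kernels.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BindMount:
    """A host path to expose inside the container (hypothetical helper type)."""
    source: str            # absolute path on the host
    is_dir: bool = False   # device nodes and sockets are bind mounted as files


def lxc_mount_entry(m: BindMount) -> str:
    """Render one lxc.mount.entry line; the target is relative to the rootfs.

    Assumption: the node appears at the same path inside the container.
    """
    target = m.source.lstrip("/")
    create = "dir" if m.is_dir else "file"
    return f"lxc.mount.entry = {m.source} {target} none bind,create={create} 0 0"


def overlay_options(lower_dirs: Sequence[str]) -> str:
    """Build options for a read-only overlay (no upperdir, no workdir).

    The leftmost directory is the top layer, so the blob directory has to
    come before the untouched /system image.
    """
    if len(lower_dirs) < 2:
        raise ValueError("a read-only overlay needs at least two lower dirs")
    return "lowerdir=" + ":".join(lower_dirs)
```

For example, `lxc_mount_entry(BindMount("/dev/ion"))` renders the line for the ion node, and `overlay_options(["/vendor-blobs", "/system"])` (both paths are placeholders) gives the option string for an overlay in which the blobs shadow files from /system while /system itself is never written to.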

current status:

  • SurfaceFlinger as a Wayland client with single window support is implemented
  • multi-window is pending
  • SurfaceFlinger handling input events and sending them to InputFlinger is implemented locally but:
    • SurfaceFlinger can create the uinput event node BUT InputFlinger cannot open
      it due to Permission denied even though the permissions are correct AND there
      is no race condition between ueventd and InputFlinger, so it seems something
      in LXC is preventing us from doing so.
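
One way to narrow down a failure like the one described above is to record exactly which error the open call returns, together with the ownership and mode the calling process sees. The following is a small diagnostic sketch, not part of the sfdroid or Anbox code; `probe_open` is a hypothetical helper, and it assumes a Python interpreter is available in the environment being tested (the same checks could be ported to C for use inside the Android container).

```python
import errno
import os
import stat


def probe_open(path: str) -> dict:
    """Try to open path read-only and report the outcome plus ownership and mode."""
    report = {
        "path": path,
        "ok": False,
        "error": None,
        "euid": os.geteuid(),   # effective ids of the calling process
        "egid": os.getegid(),
    }
    try:
        st = os.stat(path)
        report["owner"] = (st.st_uid, st.st_gid)
        report["mode"] = stat.filemode(st.st_mode)
    except OSError as exc:
        report["error"] = errno.errorcode.get(exc.errno, str(exc.errno))
        return report
    try:
        # O_NONBLOCK so that opening a device node does not block the probe.
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError as exc:
        report["error"] = errno.errorcode.get(exc.errno, str(exc.errno))
    else:
        os.close(fd)
        report["ok"] = True
    return report
```

Running this as the same user as the failing process and comparing `owner` and `mode` with `euid` and `egid` shows whether the ids seen inside the container really are the ones that were expected.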

#2

I am not sure I understand why we have to make SurfaceFlinger aware of Wayland at all. Anbox abstracts this entirely already and you can simply implement a different platform inside Anbox or use the existing SDL one and never touch the Android world.

Why do we have to bridge Wayland into the container? Is this only for performance reasons? My last tries on an Android device gave me fairly usable frame rates for Android applications, given that Anbox is not optimized at all.


#3

To summarize what I said on IRC:

  • Anbox can run as is on a Sailfish / Ubuntu Touch device as long as the SDL lib it uses has support for Wayland/Mir in a single-window mode
  • Anbox doesn’t require any GL-to-ES translator. This is optional and handing --gles-driver=host to the session manager will tell it to always use the host EGL/GLES libraries
  • The communication pipe we use with the container may cause a little performance loss but having a generic Android image running everything is worth this loss.
  • Adding other host dependencies/overlayfs/linker tricks into the container makes the whole system a lot more complex and hard to maintain.

What we should improve, instead of hacking around and trying to bridge Wayland or any native hardware drivers into the container, is the performance of the GL serialization stream. My last tries on an Android device gave me 25-30 fps, which is fairly usable for normal applications. That is not enough for games, but it’s a good start for a solution that is not optimized in any way.


#4

The whole point of this idea is to make Anbox run at near-native speeds even on slower devices; that should have been said from the start. It is not intended as a replacement for the current method and it would be entirely optional, say anbox session-manager --gles-driver=native. What is wrong with hacking to show something works? :) The qemu pipe which the anbox pipe is based on has been optimized a lot, are there any bottlenecks that we could work on? The reason why we made Android aware of wayland is because using SDL inside the container for this purpose is overhead (in the sense of work overhead). Of course a Mir backend or even X11 backend could be implemented in one way or another. There are mesa android drivers so this could be made to work even on non-libhybris based operating systems. The overall maintenance overhead for the anbox project is minimal: a -DHEADLESS build. Since we will be using overlayfs we can even share the same rootfs.
But I am aware that this adds more complexity. That being said I am not entirely sure whether the emugl pipe or this approach is more complex. Anbox does not run on most SailfishOS devices out of the box, due to kernel configurations and build dependencies (e.g. some boost components), but I am sure this will change.


#5

Please don’t get me wrong. I am not totally against this, just trying to get to the best decision for Anbox.

The whole point of this idea is to make Anbox run at near-native speeds even on slower devices; that should have been said from the start. It is not intended as a replacement for the current method and it would be entirely optional, say anbox session-manager --gles-driver=native.

Yes and no. There is quite a bit more to it than a simple switch, as the entire foundation currently relies on certain things.

The qemu pipe which the anbox pipe is based on has been optimized a lot, are there any bottlenecks that we could work on?

We can’t map things from the QEMU emulator 1:1 to Anbox. The pipe implementation in QEMU is different from what we do in Anbox. Right now we have a single local socket which is responsible for different communication channels. The code responsible for this was written for Anbox and is not copied from anywhere else. I am sure there is a lot of room for optimization. Before we can start to talk about this we need to get some numbers. Adding support for LTTng would be the next natural step here, to look into the pipeline and the graphics stack to see what is happening and then take further decisions.
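
As a very rough starting point for "getting some numbers", the raw throughput of a local stream socket can be measured in isolation. The sketch below is not Anbox code and does not measure the GL serialization itself; it only gives a baseline for how fast bytes move over a local socket pair on a given device, which real tracing (for example with LTTng, as suggested above) could then be compared against. The function name and parameters are made up for this example.

```python
import socket
import threading
import time


def measure_socket_throughput(total_bytes: int, chunk_size: int = 64 * 1024) -> dict:
    """Push total_bytes through a local socket pair and time the transfer."""
    tx, rx = socket.socketpair()   # connected pair of local stream sockets
    received = 0

    def drain() -> None:
        nonlocal received
        while True:
            data = rx.recv(chunk_size)
            if not data:           # sender closed its end
                break
            received += len(data)

    reader = threading.Thread(target=drain)
    reader.start()

    payload = bytes(chunk_size)    # zero-filled buffer reused for every write
    start = time.perf_counter()
    sent = 0
    while sent < total_bytes:
        n = min(chunk_size, total_bytes - sent)
        tx.sendall(payload[:n])
        sent += n
    tx.close()                     # signals end of stream to the reader
    reader.join()
    elapsed = time.perf_counter() - start
    rx.close()

    rate = received / elapsed / (1024 * 1024) if elapsed > 0 else float("inf")
    return {"bytes": received, "seconds": elapsed, "mib_per_s": rate}
```

Calling `measure_socket_throughput(256 * 1024 * 1024)` on the host and on the target device gives a first idea of whether the socket itself is anywhere near being the limiting factor.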

The reason why we made Android aware of wayland is because using SDL inside the container for this purpose is overhead (in the sense of work overhead).

I agree, using SDL inside the container doesn’t make sense. However, making Android aware of Wayland contradicts one of the goals of Anbox, which is to keep the container agnostic to the host.

That being said I am not entirely sure whether the emugl pipe or this approach is more complex.

Whatever we do here has consequences for more than Sailfish OS, so the decision needs to be made wisely. Let’s sit back, take a first step and get Anbox working as is. Then let’s see where the bottlenecks are and what we can do to optimize them. If we end up in a situation where things are not working well, we can still switch to a native approach.

I think long term the community gains a lot more from such a generic approach than from diverging and creating device-specific implementations which are hard to maintain, hard to oversee and complex to debug. If we all work with the same container, we all have to solve the same problems, etc.

Anbox does not run on most SailfishOS devices out of the box, due to kernel configurations and build dependencies (e.g. some boost components), but I am sure this will change.

True, but I wouldn’t count any packaging / missing dependency as a blocker or problem.


#6

Three more things I forgot to say:

  • There is no notion of supporting multiple Android versions with Anbox. There is already some initial work to move to the upcoming O release, which would make the maintenance of such an approach even harder and would mean that not all users of Anbox can move together to the next Android release. Maybe this relaxes a bit with the coming stabilized Android HAL API, but that is still far in the future.
  • There is interest from others using Anbox too, and all will share the same goal of optimizing its performance. This isn’t a problem you are facing alone.
  • Anbox already uses a fair amount of additional metadata today to enable multi-screen support. I am planning to implement something similar for stacked window manager environments, but bringing SurfaceFlinger into the game makes this more complex again.

#7

Btw., could the permission problem with the uinput node be because a user namespace is used and the unprivileged user/group inside the container doesn’t match the ones on the outside?
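
The user namespace hypothesis can be checked against the id mapping the kernel exposes in /proc/<pid>/uid_map and /proc/<pid>/gid_map, where each line reads "inside-start outside-start length". The sketch below parses that format and translates a host id into the id seen inside the namespace. It is only an illustration of the idea: the mapping values in the usage note are examples, and `parse_id_map` and `to_inside` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

# Ids without a mapping show up inside the namespace as the overflow id,
# which defaults to 65534 (see /proc/sys/kernel/overflowuid).
OVERFLOW_ID = 65534

IdMap = List[Tuple[int, int, int]]  # (inside_start, outside_start, length)


def parse_id_map(text: str) -> IdMap:
    """Parse the contents of a uid_map or gid_map file."""
    entries: IdMap = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = (int(f) for f in fields)
            entries.append((inside, outside, length))
    return entries


def to_inside(outside_id: int, id_map: IdMap) -> Optional[int]:
    """Translate a host id to the id seen inside the namespace, if mapped."""
    for inside_start, outside_start, length in id_map:
        if outside_start <= outside_id < outside_start + length:
            return inside_start + (outside_id - outside_start)
    return None
```

With an example mapping of "0 100000 65536", host uid 101000 becomes uid 1000 inside the container, while host uid 0 has no mapping at all (`to_inside` returns `None`), so a node owned by host root would appear inside as owned by `OVERFLOW_ID` rather than by the container's root.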


#8

@nh1402 @krnlyng what do you think?


#9

@morphis The overlayfs is mainly there to ease development for now, so that we don’t have to rebuild the whole Anbox squashfs image for every little change we make.

As for the code complexity, the idea was to cut it down and let SurfaceFlinger deal with the windowing issues directly instead of going through the SurfaceFlinger -> Anbox/SDL -> Wayland path. We also wanted to get rid of code atrocities like this one from the initial implementation of Wayland support: https://github.com/sfdroid/anbox/commit/ccdaa9820c1625c8a2e96ea56592d79bf3d7a96d#diff-86488119f76b7ff0bee3f659bca81a56R46 . We hoped not to have to worry about “optimizing the code so it runs the best” issues. As an added bonus, going via this path meant we didn’t have to worry about some missing dependency issues that came from emugl. Otherwise, I manually compiled/backported all the missing dependencies for Anbox ( https://build.merproject.org/project/show/home:saidinesh5:anbox ). I still have some LXC / 3.4 kernel issues getting this Anbox build running on my device, so @krnlyng can give us the details about the performance numbers in this build.

As for me, next weekend, I will try getting snapd working on sailfish, so we can test the performance of the upstream version of anbox directly.


#10

I am ok with you guys following this road. I think I said what I was thinking about :slight_smile: Let’s sit down and see how we can integrate these changes and make them work well. I really want things to stay together, so that Anbox isn’t used as a framework but stays the thing everybody just pulls into their product and it just works.

I see there are many interesting changes in the anbox tree linked above. Can you make sure you submit these as individual PRs as soon as possible so we don’t end up with a fork nobody cares about to merge back with upstream?

Also, if we land this in the Anbox tree we need to give it a better name than “sfdroid”, as otherwise we’re mixing two different things here. We should simply call this the “native approach”, but maybe we can find a better abbreviation for it. I am thinking about having this code path invoked by a --native-rendering switch for the session-manager.

Let’s make this a first-class citizen and let’s put the development into the Anbox upstream tree as soon as possible.


#11

I’ve started to abstract the platform implementation in Anbox a bit more. See https://github.com/anbox/anbox/pull/326