Fixing GCC Errors With Act And Docker

by RICHARD
Troubleshooting GCC Fatal Errors with act and Docker

Hey guys, ever run into a brick wall when trying to get your CI/CD pipelines humming? I recently wrestled with a particularly nasty issue involving act and some gcc fatal errors, and I figured I'd share my experience. It might save you some headaches down the road.

The Setup: act, CETL, and Docker

First off, let's set the stage. I was working on a project called CETL, which is part of the OpenCyphal ecosystem. CETL uses build containers defined in the docker_toolchains repository. I wanted to simulate my GitHub Actions workflows locally with act, a fantastic tool that runs those workflows on your own machine inside Docker containers.

I'm on an M3 Max laptop, and I can build the containers and run them manually without trouble. The problem shows up when I try to do the exact same thing with act. Same repository, same Docker Desktop instance, same everything, yet under act gcc crashes for no obvious reason.

Here's a snapshot of the problem. When act compiled the code with g++, the compiler process (cc1plus) was terminated by a kill signal, and the affected targets failed to build, as shown in the output below.

| FAILED: suites/unittest/CMakeFiles/test_pf17_variant_ctor_3__googletest_objlib.dir/Release/test_pf17_variant_ctor_3.cpp.o
| /usr/bin/g++ -DCETL_ENABLE_DEBUG_ASSERT=0 -DCETL_VERSION=\"0.0.0\" -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST -DCMAKE_INTDIR=\"Release\" -I/Users/thirtytwobits/workspace/github/thirtytwobits/CETL/cetlvast/include -I/Users/thirtytwobits/workspace/github/thirtytwobits/CETL/include -I/Users/thirtytwobits/workspace/github/thirtytwobits/CETL/cetlvast/build_external/o1heap/o1heap -isystem /Users/thirtytwobits/workspace/github/thirtytwobits/CETL/cetlvast/build_external/googletest/googletest/include -isystem /Users/thirtytwobits/workspace/github/thirtytwobits/CETL/cetlvast/build_external/googletest/googletest -isystem /Users/thirtytwobits/workspace/github/thirtytwobits/CETL/cetlvast/build_external/googletest/googlemock/include -isystem /Users/thirtytwobits/workspace/github/thirtytwobits/CETL/cetlvast/build_external/googletest/googlemock -O3 -DNDEBUG -std=c++17 -pedantic -Wall -Wextra -Werror -Wfloat-equal -Wconversion -Wunused-parameter -Wunused-variable -Wunused-value -Wcast-align -Wmissing-declarations -Wmissing-field-initializers -Wdouble-promotion -Wswitch-enum -Wtype-limits -Wno-error=array-bounds -O3 -fno-delete-null-pointer-checks -Wsign-conversion -Wsign-promo -Wold-style-cast -Wzero-as-null-pointer-constant -Wnon-virtual-dtor -Woverloaded-virtual -MD -MT suites/unittest/CMakeFiles/test_pf17_variant_ctor_3__googletest_objlib.dir/Release/test_pf17_variant_ctor_3.cpp.o -MF suites/unittest/CMakeFiles/test_pf17_variant_ctor_3__googletest_objlib.dir/Release/test_pf17_variant_ctor_3.cpp.o.d -o suites/unittest/CMakeFiles/test_pf17_variant_ctor_3__googletest_objlib.dir/Release/test_pf17_variant_ctor_3.cpp.o -c /Users/thirtytwobits/workspace/github/thirtytwobits/CETL/cetlvast/suites/unittest/test_pf17_variant_ctor_3.cpp
| g++: fatal error: Killed signal terminated program cc1plus
| compilation terminated.
[7/179] Building CXX object suites/unittest/CMakeFiles/test_pf17_variant_assignment_2__googletest_objlib.dir/Release/test_pf17_variant_assignment_2.cpp.o
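
With 179 build steps, it helps to know exactly which targets were killed rather than failing for an ordinary compile error. Here is a minimal sketch that pulls those targets out of a saved log. It assumes the log keeps the "FAILED: <target>" line followed by the "Killed signal" message, as in the output above; the function name and the build.log file name are my own, not part of act or CETL.

```shell
#!/bin/sh
# Print the targets whose compile was killed, given a saved build log.
# Assumes each failure starts with a "FAILED: <target>" line and that the
# "Killed signal terminated program" message follows before the next failure.
killed_targets() {
    awk '
        /FAILED: / {
            sub(/^.*FAILED: /, "")   # strip the log prefix, keep the target path
            target = $0
        }
        /Killed signal terminated program/ {
            if (target != "") print target
            target = ""
        }
    ' "$1"
}

# Usage (build.log is a hypothetical file name):
#   act -v -j verification-amd64 release 2>&1 | tee build.log
#   killed_targets build.log
```

If the same few heavy translation units show up every time, that points at resource pressure rather than a broken toolchain.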

The Culprit: Container Mismatch?

The core problem appeared to be that act wasn't quite using the containers I was specifying in my action.yml file. The action.yml file defines the container that should be used for the builds.

    runs-on: ubuntu-latest
    container: ghcr.io/opencyphal/toolshed:ts24.4.3

This tells GitHub Actions to run the job inside a specific container image, which holds the toolchains and dependencies my project needs. Since act wasn't behaving as expected, my first guess was that it wasn't actually using this image. If the build environment under act differed from the one I ran by hand, that could explain why gcc was choking, for example through missing dependencies or mismatched tool versions.

Digging Deeper: Investigation and Debugging

To get to the bottom of this, I needed to confirm whether act was correctly pulling and using the specified Docker image (ghcr.io/opencyphal/toolshed:ts24.4.3). Here are the steps I took:

  1. Verify Docker Setup: I made sure that Docker Desktop was running smoothly on my machine, and that I could pull and run the container image manually using docker run. This step confirmed that the image itself was valid and accessible.
  2. Verbose Mode in act: I ran act with the -v (verbose) flag. This provided detailed output, allowing me to see what Docker images act was attempting to use and any potential errors during the process. This proved very helpful in identifying exactly what act was doing.
  3. Check Environment Variables: I reviewed my actrc file (located in ~/Library/Application Support/act/actrc) and the environment variables. Misconfigurations here can mess up how act interacts with Docker. I double-checked that all my container definitions were correct.
  4. Inspect the Build Process: I studied the detailed build logs provided by act to pinpoint exactly where the gcc errors were occurring. This information was key to understanding the underlying issues.
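
The manual check from step 1 can be sketched roughly like this. The image name comes from the workflow snippet above; the two helper functions are hypothetical names of mine, and I'm assuming the image lets you run sh -c and has g++ on its PATH.

```shell
#!/bin/sh
# Image name taken from the workflow snippet above.
IMAGE="ghcr.io/opencyphal/toolshed:ts24.4.3"

# Print the platform of a locally available image, e.g. "linux/arm64".
image_platform() {
    docker image inspect --format '{{.Os}}/{{.Architecture}}' "$1"
}

# Print the machine type and compiler version as seen inside the container.
# Assumption: the image allows `sh -c` and ships g++.
toolchain_report() {
    docker run --rm "$1" sh -c 'uname -m && g++ --version | head -n 1'
}

# Usage:
#   docker pull "$IMAGE"
#   image_platform "$IMAGE"
#   toolchain_report "$IMAGE"
```

Comparing this output with what act -v reports for the same job is a quick way to spot an image or architecture mismatch.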

Potential Solutions and Workarounds

After a fair bit of head-scratching, I narrowed it down to a few areas worth checking. Here are the potential fixes.

  1. Ensure Container Architecture Compatibility: When using the --container-architecture flag, be sure to specify the correct architecture (e.g., linux/amd64, linux/arm64). In my case, this seemed to make a difference, as it forces act to use the right architecture. I tried this by running:

    act -v -j verification-arm64 release --container-architecture linux/amd64
    act -v -j verification-amd64 release --container-architecture linux/amd64
    act -v -j verification-arm64 release --container-architecture linux/arm64
    act -v -j verification-amd64 release --container-architecture linux/arm64
    

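    Running each job under both architectures shows which combinations actually work. On an Apple Silicon host like mine, linux/arm64 containers run natively, while linux/amd64 containers run under emulation.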

  2. Act Configuration: Double-check your actrc file and any environment variables. Make sure there aren't any conflicting settings that could be interfering with the container selection.

  3. Clean up: Sometimes, Docker can be a bit finicky. Try cleaning up your Docker environment by removing unused images and containers, so act starts from a clean slate. docker system prune -a does this, but note that -a removes every image not used by a container (not just dangling ones), so expect to re-pull images afterwards.

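  4. Check for Resource Limitations: Ensure that your Docker Desktop setup has sufficient resources allocated (CPU, memory). A "Killed signal terminated program cc1plus" message means the compiler process received SIGKILL, which most often comes from the kernel's out-of-memory killer, so memory is the first thing to check with heavy, highly parallel builds like this one.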

  5. Update act: Make sure you're running the latest version of act. Older versions might have bugs related to container handling.
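
For the resource check in item 4, here is a rough sketch of how you might pick a parallel job count that fits in the memory Docker actually has. The 2048 MiB per compile job is purely my guess for template-heavy C++ at -O3, not a measured number, and safe_jobs is a hypothetical helper.

```shell
#!/bin/sh
# Suggest a parallel job count from Docker's memory (bytes) and CPU count.
# Third argument: assumed memory per compile job in MiB (default 2048, a guess).
safe_jobs() {
    mem_bytes=$1
    cpus=$2
    per_job_mib=${3:-2048}
    jobs=$(( mem_bytes / 1024 / 1024 / per_job_mib ))
    # Never suggest more jobs than CPUs, and always allow at least one.
    if [ "$jobs" -gt "$cpus" ]; then jobs=$cpus; fi
    if [ "$jobs" -lt 1 ]; then jobs=1; fi
    echo "$jobs"
}

# Usage: read the limits from the Docker daemon, then size the build.
#   mem=$(docker info --format '{{.MemTotal}}')
#   cpus=$(docker info --format '{{.NCPU}}')
#   safe_jobs "$mem" "$cpus"
```

How you pass the result to the build depends on the project's entry point; raising the memory limit in Docker Desktop's resource settings is the other half of the fix.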

The Resolution (and What Worked for Me)

In the end, the key to resolving the issue was a combination of these steps. I made sure that:

  • The correct container architecture was being specified using the --container-architecture flag.
  • Docker Desktop had sufficient resources allocated.
  • I was using the latest version of act.

By focusing on these key areas, I was able to get act to play nicely with my build containers. The gcc errors disappeared, and my workflows ran successfully.
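
To avoid typing the architecture by hand, the host's native platform can be derived from uname -m, roughly as sketched below. docker_platform_for is a hypothetical helper of mine; whether native is the right choice for your workflow is an assumption you should verify, since some jobs may target amd64 specifically.

```shell
#!/bin/sh
# Map a `uname -m` machine name to a Docker platform string.
docker_platform_for() {
    case "$1" in
        arm64|aarch64) echo "linux/arm64" ;;
        x86_64|amd64)  echo "linux/amd64" ;;
        *)             echo "unknown machine type: $1" >&2; return 1 ;;
    esac
}

# Usage: run a job on the host's native architecture.
#   act -v -j verification-arm64 release \
#       --container-architecture "$(docker_platform_for "$(uname -m)")"
```

On an Apple Silicon Mac, uname -m reports arm64, so this selects linux/arm64 and skips emulation.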

Conclusion: Sharing the Pain (and the Fix!)

Hopefully, this breakdown helps you if you run into similar issues with act and Docker. The key takeaways are:

  • Verify Container Usage: Always double-check that act is using the correct container image and architecture.
  • Use Verbose Mode: The -v flag is your friend! It provides valuable insights into what's going on under the hood.
  • Resource Management: Ensure Docker has enough resources.

Debugging CI/CD pipelines can be a real pain, but with a systematic approach, you can track down these issues. Good luck, and happy coding!