From mboxrd@z Thu Jan 1 00:00:00 1970 Delivery-date: Wed, 10 Apr 2024 12:37:54 +0200 Received: from metis.whiteo.stw.pengutronix.de ([2a0a:edc0:2:b01:1d::104]) by lore.white.stw.pengutronix.de with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1ruVL0-005ujC-1f for lore@lore.pengutronix.de; Wed, 10 Apr 2024 12:37:54 +0200 Received: from localhost ([127.0.0.1] helo=metis.whiteo.stw.pengutronix.de) by metis.whiteo.stw.pengutronix.de with esmtp (Exim 4.92) (envelope-from ) id 1ruVL0-0006nc-01; Wed, 10 Apr 2024 12:37:54 +0200 Received: from mail.thorsis.com ([217.92.40.78]) by metis.whiteo.stw.pengutronix.de with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1ruVKs-0006i1-4D for distrokit@pengutronix.de; Wed, 10 Apr 2024 12:37:46 +0200 Received: from [127.0.0.1] (localhost [127.0.0.1]) by localhost (Mailerdaemon) with ESMTPSA id 9F010148CAB8; Wed, 10 Apr 2024 12:37:40 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=thorsis.com; s=dkim; t=1712745461; h=from:subject:date:message-id:to:mime-version:content-type: content-transfer-encoding:in-reply-to:references; bh=T869kmqL9SpKDD4ue2jM/escl+yezFvRhY+4ZROQgJ8=; b=wFD/e73o+AmfsadqUq9LgASMix1myemb7FqiQaQ4Ng/CngbubUU+RXZ0g+3ZR6ET/m/Fhx UbvjnmSZa7IPFVFxEOAkzWGLQQN/YtvD1DXps3tOYEJWbNs77aGMYIdUhEXiy0VxunLDot 0rZe1hYsXcrO1OBDHGZZGTaUSacrQTFuTuHbmRnrZwEsra2MdyZakrYk7atLF3inHfpFXk HOJRzUnyEfstRi0BHNe79mzdllk2/4uhBdYJfFVZSCmZEfbJQ4iK2seECzuA5rzwSBoSCP EeK3NzTgYJmE2tILUV7s78nWnmzDYs4IJsT3TNztqWUU91Y4myJwBrz9a5kJZA== Date: Wed, 10 Apr 2024 12:37:39 +0200 From: Alexander Dahl To: Lennart Poettering , systemd-devel@lists.freedesktop.org, distrokit@pengutronix.de Message-ID: <20240410-cavalier-droop-6034d4ff719e@thorsis.com> Mail-Followup-To: Lennart Poettering , systemd-devel@lists.freedesktop.org, distrokit@pengutronix.de References: <20240409-moonlight-viewpoint-29d2a170d8b7@thorsis.com> <20240409-encounter-aptitude-09d3f64a697c@thorsis.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20240409-encounter-aptitude-09d3f64a697c@thorsis.com> User-Agent: Mutt/2.2.12 (2023-09-09) X-Last-TLS-Session-Version: TLSv1.3 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on metis.whiteo.stw.pengutronix.de X-Spam-Level: X-Spam-Status: No, score=-2.1 required=4.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS, URIBL_SBL,URIBL_SBL_A autolearn=no autolearn_force=no version=3.4.2 Subject: Re: [DistroKit] [systemd-devel] How to debug systemd services failing to start with 11/SEGV? X-BeenThere: distrokit@pengutronix.de X-Mailman-Version: 2.1.29 Precedence: list List-Id: DistroKit Mailinglist List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "DistroKit" X-SA-Exim-Connect-IP: 127.0.0.1 X-SA-Exim-Mail-From: distrokit-bounces@pengutronix.de X-SA-Exim-Scanned: No (on metis.whiteo.stw.pengutronix.de); SAEximRunCond expanded to false Hello everone, I thought I knew how to let the kernel create coredumps … (see below). Am Tue, Apr 09, 2024 at 04:21:21PM +0200 schrieb Alexander Dahl: > Hello Lennart, > > thanks for your quick reply, see below. > > Am Tue, Apr 09, 2024 at 03:53:24PM +0200 schrieb Lennart Poettering: > > On Di, 09.04.24 14:42, Alexander Dahl (ada@thorsis.com) wrote: > > > > > Hello everyone, > > > > > > I'm currently trying to build a firmware for an embedded device and > > > running into trouble because systemd seems to crash. The BSP is > > > based on pengutronix DistroKit (master) built with ptxdist and the > > > target is the Microchip SAM9X60-Curiosity board, which is arm v5te > > > architecture (that board is not part of DistroKit, support for that is > > > in an upper layer of mine not public yet (?)). > > > > > > Everything is quite recent, building systemd version 255.2 currently. > > > On startup I get messages like this (this is the first one, later on > > > there are lot more, all with the same status): > > > > > > [ 11.175650] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV > > > [ 11.239679] systemd[1]: systemd-journald.service: Failed with result 'signal'. > > > [ 11.292640] systemd[1]: Failed to start systemd-journald.service. > > > [FAILED] Failed to start systemd-journald.service. > > > See 'systemctl status systemd-journald.service' for details. > > > > > > The system drops me on a shell later, where I can run the above > > > mentioned command, which gives: > > > > > > ~ # systemctl status systemd-journald.service > > > x systemd-journald.service - Journal Service > > > Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static) > > > Active: failed (Result: signal) since Tue 2024-04-09 11:44:52 UTC; 11min a> > > > TriggeredBy: x systemd-journald-dev-log.socket > > > * systemd-journald-audit.socket > > > x systemd-journald.socket > > > Docs: man:systemd-journald.service(8) > > > man:journald.conf(5) > > > Process: 197 ExecStart=/usr/lib/systemd/systemd-journald (code=killed, sign> > > > Main PID: 197 (code=killed, signal=SEGV) > > > FD Store: 0 (limit: 4224) > > > CPU: 330ms > > > > > > This does not help me much. Other services crashing: systemd-udevd > > > and systemd-timesyncd, also with status 11/SEGV which is segmentation > > > fault, right? > > > > Yes. > > > > > I had this board running with an older version of systemd, but I can > > > not remember which was the last good version. > > > > > > Could anyone give me a hint please how to debug this? > > > > "coredumpctl gdb" should get open the most recent backtrace for you. > > This gives: > > ~ # coredumpctl gdb > No journal files were found. > No match found. > > > The coredump should also show up in the logs with a backtrace. > > I only have serial console output. journald is crashing. With > `dmesg` I see systemd messages in kernel log, but no backtrace. > gdbserver is installed on target, no gdb currently. Trying to get a > coredump tomorrow. Well I tried for like three hours now, and I could not get a coredump from journald nor udevd. Tried with the systemd way of creating coredumps, which means /usr/lib/systemd/systemd-coredump is registered as kernel.core_pattern with settings from /usr/lib/sysctl.d/50-coredump.conf and I even made sure systemd-coredump.socket is active (started from the maintenance shell): ~ # systemctl status systemd-coredump.socket * systemd-coredump.socket - Process Core Dump Socket Loaded: loaded (/usr/lib/systemd/system/systemd-coredump.socket; static) Active: active (listening) since Tue 2024-04-09 11:46:21 UTC; 2min 44s ago Docs: man:systemd-coredump(8) Listen: /run/systemd/coredump (SequentialPacket) Accepted: 1; Connected: 0; CGroup: /system.slice/systemd-coredump.socket Started a long running process like this: ~ # /usr/lib/systemd/systemd-udevd -d Starting systemd-udevd version 255.2 Killed it like this: ~ # kill -11 269 Got this on serial console (kernel log): [ 329.282462] systemd[1]: Started systemd-coredump@1-271-0.service. [ 329.454739] systemd[1]: Listening on systemd-journald-dev-log.socket. [ 329.552429] systemd[1]: Starting systemd-journald.service... [ 329.953204] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV [ 329.966458] systemd[1]: systemd-journald.service: Failed with result 'signal'. [ 329.977845] systemd[1]: Failed to start systemd-journald.service. [ 330.002231] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1. [ 330.092289] systemd[1]: Starting systemd-journald.service... [ 330.513858] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV [ 330.533602] systemd[1]: systemd-journald.service: Failed with result 'signal'. [ 330.544728] systemd[1]: Failed to start systemd-journald.service. [ 330.570419] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 2. [ 330.652355] systemd[1]: Starting systemd-journald.service... [ 331.071540] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV [ 331.084789] systemd[1]: systemd-journald.service: Failed with result 'signal'. [ 331.104092] systemd[1]: Failed to start systemd-journald.service. [ 331.121208] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 3. [ 331.212357] systemd[1]: Starting systemd-journald.service... [ 331.648631] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV [ 331.662108] systemd[1]: systemd-journald.service: Failed with result 'signal'. [ 331.673588] systemd[1]: Failed to start systemd-journald.service. [ 331.702297] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 4. [ 331.782346] systemd[1]: Starting systemd-journald.service... [ 332.221658] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV [ 332.244381] systemd[1]: systemd-journald.service: Failed with result 'signal'. [ 332.255494] systemd[1]: Failed to start systemd-journald.service. [ 332.265870] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 5. [ 332.291243] systemd[1]: systemd-journald.service: Start request repeated too quickly. [ 332.299240] systemd[1]: systemd-journald.service: Failed with result 'signal'. [ 332.308392] systemd[1]: Failed to start systemd-journald.service. [ 332.315720] systemd[1]: systemd-journald.socket: Failed with result 'service-start-limit-hit'. [ 332.335020] systemd[1]: systemd-journald-dev-log.socket: Failed with result 'service-start-limit-hit'. [ 332.838500] systemd[1]: systemd-coredump@1-271-0.service: Deactivated successfully. [ 332.851322] systemd[1]: systemd-coredump@1-271-0.service: Consumed 1.004s CPU time. As you can see the systemd-coredump@1-271-0.service is started, and systemd tries to start systemd-journald.service again and again, each failing with segfault. However after that folder /var/lib/systemd/coredump/ is completely empty. I tried with a different (simple) core pipe handler which works all the time on a non systemd system: ~ # cat /usr/local/bin/pipecore.sh #!/bin/sh /usr/bin/xz -z -0 - > /mnt/data/tmp/cores/core-${1}-s${2}-${3}.xz ~ # echo '|/usr/local/bin/pipecore.sh %e %s %t' > /proc/sys/kernel/core_pattern For stuff not started with systemctl this successfully creates coredumps in my folder. Trying `systemctl start systemd-journald.service` after that: nothing. Something is preventing coredumps of systemd itself, and I have no idea what or why. Read https://systemd.io/COREDUMP/ but it did not help me in this case. Will try to go back to a working previous version now, and maybe bisect later if my frustration got back to a normal level. Greets Alex