ServiceControl (sometimes) doesn't start

Hello, we successfully installed ServiceControl on a clean VM (Windows Server 2019).
The transport we use is Azure Service Bus. I can see that the ServiceControl service is set to start automatically. However, it sometimes fails to start after the machine is rebooted. It usually takes 3 to 4 manual retries before it starts running again.

ServiceControl version: 5.0.1

Things I tried so far:

None of this helped, so I dug into the logs, and the only exception I can see is the following:

2024-01-03T10:21:13.6383317Z, 1, Operations, Server, Sparrow.Server.LowMemory.CheckPageFileOnHdd, Failed to determine if drive C is SSD or HDD
2024-01-03T10:21:13.6391558Z, 1, Operations, Server, Sparrow.Server.LowMemory.CheckPageFileOnHdd, Failed to determine if drive D is SSD or HDD
2024-01-03T10:21:13.6397102Z, 1, Operations, Server, Sparrow.Server.LowMemory.CheckPageFileOnHdd, Failed to determine if drive E is SSD or HDD
2024-01-03T10:21:14.9823783Z, 7, Operations, A, Raven.Server.Rachis.RachisConsensus, Took way too much time (00:00:00.3424981) to change the state to LeaderElect in term 45. (Election timeout:00:00:00.3000000)
2024-01-03T10:21:41.0436815Z, 2, Operations, Server, Raven.Server.Program, Server is about to shut down (interactive mode)
2024-01-03T10:21:43.0776266Z, 30, Operations, Server/TCP, Raven.Server.RavenServer, Failed to accept new tcp connection again, will wait 1 seconds before retrying, EXCEPTION: System.InvalidOperationException: Not listening. You must call the Start() method before calling this method.
   at System.Net.Sockets.TcpListener.AcceptSocketAsync(CancellationToken cancellationToken)
   at System.Net.Sockets.TcpListener.AcceptTcpClientAsync()
   at Raven.Server.RavenServer.AcceptTcpClientAsync(TcpListener listener) in C:\Builds\RavenDB-Stable-5.4\54092\src\Raven.Server\RavenServer.cs:line 2221
2024-01-03T10:21:43.0772459Z, 38, Operations, Server/TCP, Raven.Server.RavenServer, Failed to accept new tcp connection again, will wait 1 seconds before retrying, EXCEPTION: System.InvalidOperationException: Not listening. You must call the Start() method before calling this method.
   at System.Net.Sockets.TcpListener.AcceptSocketAsync(CancellationToken cancellationToken)
   at System.Net.Sockets.TcpListener.AcceptTcpClientAsync()
   at Raven.Server.RavenServer.AcceptTcpClientAsync(TcpListener listener) in C:\Builds\RavenDB-Stable-5.4\54092\src\Raven.Server\RavenServer.cs:line 2221
2024-01-03T10:21:43.0772520Z, 54, Operations, Server/TCP, Raven.Server.RavenServer, Failed to accept new tcp connection again, will wait 1 seconds before retrying, EXCEPTION: System.InvalidOperationException: Not listening. You must call the Start() method before calling this method.
   at System.Net.Sockets.TcpListener.AcceptSocketAsync(CancellationToken cancellationToken)
   at System.Net.Sockets.TcpListener.AcceptTcpClientAsync()
   at Raven.Server.RavenServer.AcceptTcpClientAsync(TcpListener listener) in C:\Builds\RavenDB-Stable-5.4\54092\src\Raven.Server\RavenServer.cs:line 2221
2024-01-03T10:21:43.0772457Z, 57, Operations, Server/TCP, Raven.Server.RavenServer, Failed to accept new tcp connection again, will wait 1 seconds before retrying, EXCEPTION: System.InvalidOperationException: Not listening. You must call the Start() method before calling this method.
   at System.Net.Sockets.TcpListener.AcceptSocketAsync(CancellationToken cancellationToken)
   at System.Net.Sockets.TcpListener.AcceptTcpClientAsync()
   at Raven.Server.RavenServer.AcceptTcpClientAsync(TcpListener listener) in C:\Builds\RavenDB-Stable-5.4\54092\src\Raven.Server\RavenServer.cs:line 2221
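For anyone triaging a similar log, here is a minimal Python sketch that groups entries by logger and message so that repeated errors stand out. It assumes the comma-separated layout visible in the excerpt above (timestamp, thread, level, source, logger, message); that column interpretation and the function names `parse_entry` and `summarize` are my own assumptions, not part of ServiceControl or RavenDB.

```python
from collections import Counter


def parse_entry(line):
    """Split one log line into its six comma-separated fields.

    Returns None for continuation lines such as stack frames, which do not
    start with a timestamp. The field layout is an assumption based on the
    excerpt: timestamp, thread, level, source, logger, message.
    """
    parts = line.split(", ", 5)
    if len(parts) < 6 or not parts[0][:4].isdigit():
        return None
    return tuple(parts)


def summarize(lines):
    """Count log entries per (logger, message), ignoring exception details."""
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue  # skip stack frames and blank lines
        logger, message = entry[4], entry[5]
        # Drop the exception text so identical failures group together.
        counts[(logger, message.split(", EXCEPTION:")[0])] += 1
    return counts
```

Calling `summarize` on the lines of a log file and printing `counts.most_common()` lists the most frequent entries first.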

My hope is that somebody here knows how to troubleshoot this, as my company is afraid that the service won’t start at all one day.

Thank you and best wishes for 2024!
Robert

@Jokelab - Happy New Year!
I see that you installed ServiceControl 5.0.1 on a clean VM. If possible, could you send us more information about the machine so that we can check that it meets the hardware recommendations in the documentation? It would also help to have your configuration details and the logs. You can send these files by opening a support ticket with us.

Thanks
Jayanthi

Robert, could you browse the Windows Event Logs for any error details that aren’t listed in the ServiceControl log files?

You could be affected by the following bug:

Please apply the mentioned workaround to recover from the storage corruption until we have a fix. We can assist you if you have the appropriate support agreement. As @Jayanthi_Sourirajan mentioned, the easiest way to do that is via our support system.

Thank you both for your swift responses!
I didn’t see any errors related to the RavenDB bug you refer to.

The VM runs on Azure with the following configuration: Windows Server 2019, Standard B4ms (4 vcpus, 16 GiB memory).
I experience the same issue on my local development machine (Windows 11, 64 GiB memory), so there must be something I’m doing wrong.

Eventually I reinstalled everything with the latest version (5.0.3) and still see the same error, so I created a support ticket with the requested data.

Best regards,
Robert

I would like to add the reproduction steps that indicate the difference in ServiceControl versions:

  • Spin up a new virtual machine in Azure, for example Windows Server 2022 Datacenter edition, Standard D2s v3 (2 vcpus, 8 GiB memory)
  • Install ServiceControl 5.0.x
    – Click on +New
    – Select Add ServiceControl and Audit instances…
    – Set name to ‘test’
    – Select Transport: Azure Service Bus and provide a valid connection string
    – Leave everything else at its default value and click Add.
  • Wait for everything to complete and notice that the ServiceControl instance is running
  • Reboot the VM
    – Notice that the ServiceControl instance is stopped
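
The check in the last step could be scripted rather than done by eye. This is a rough Python sketch, assuming an English-language Windows where `sc query` prints a `STATE` line such as `STATE : 4 RUNNING`. The helper names are hypothetical, and the Windows service name to pass in depends on how the instance was named during installation.

```python
import re
import subprocess

# Matches the STATE line of `sc query` output, e.g. "STATE : 4  RUNNING".
STATE_RE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_state(sc_output):
    """Return the state name (e.g. 'RUNNING', 'STOPPED') or None if absent."""
    match = STATE_RE.search(sc_output)
    return match.group(1) if match else None


def query_state(service_name):
    """Run `sc query <service_name>` and return the parsed state (Windows only)."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,
        text=True,
        check=False,  # sc exits non-zero for unknown services; handled as None
    )
    return parse_state(result.stdout)
```

Running `query_state` with the instance's service name shortly after each reboot would show whether the service came back as `RUNNING` or stayed `STOPPED`.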

If I do the exact same test with version 4.33.1, the ServiceControl instance still runs after the reboot.
I’m aware that the VM is not specced according to the hardware considerations, but I think it is important to note that the newer ServiceControl version has this behavior while the older one doesn’t.

To follow up: the issue seems to be fixed in 5.0.4.

That is correct: 5.0.4 is published but not yet announced. The announcement will follow in the coming hours.

Thanks for confirming that this isn’t happening with 5.0.4.

– Ramon