ServiceControl on Linux containers early access questions

@stesme-delen The “connection refused” message would come from far enough down the stack that the database container isn’t even seeing the connection attempt. Even though the other pod can connect, there must be something wrong with the configuration.

Could you open a non-critical support case and include the K8S configuration you’re using?

Hi David, I found the problem. The RavenDB container ran out of memory and restarted every time I ran servicecontrol --setup. I gave it 1Gi and now it runs. Thx!
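
For anyone who hits the same thing, here is a minimal sketch of what giving the RavenDB container 1Gi looks like in a Kubernetes pod spec. The container and image names are illustrative, not the exact manifests from this setup:

# Sketch only: memory request/limit for the RavenDB container inside a
# Deployment's pod template. Names are illustrative placeholders.
containers:
  - name: servicecontrol-ravendb
    image: particular/servicecontrol-ravendb:latest
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "1Gi"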

Today we’ve made major updates to the ServiceControl documentation in support of containers. The documentation previously leaned very heavily on ServiceControl Management, so in addition to documenting the container bits, we also had to do quite a lot of restructuring.

Here are some highlights in the new documentation:

And we’re not done…

  • Soon we’ll be releasing a new version of ServiceControl that allows for setting the instance/queue name in containers. (This was always based on the Windows Service Name before, which makes no sense for containers.)
  • We’re adding reverse proxy capabilities to our ServicePulse container, so that communication with ServiceControl and Monitoring goes through the same host as ServicePulse, giving you a single point of ingress/egress to secure with SSL and authentication.
  • We’re still working on samples/documentation for compose/k8s.

Stay tuned…

3 Likes

@ivanl With the release of ServiceControl 5.5.0, the InstanceName setting is now the consistent way to control the name of the instance/queue, via these environment variables (a compose sketch follows the list):

  • SERVICECONTROL_INSTANCENAME
  • SERVICECONTROL_AUDIT_INSTANCENAME
  • MONITORING_INSTANCENAME
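
For example, a minimal Docker Compose sketch; the instance-name values and the monitoring image name are placeholders/assumptions, not prescriptive:

# Sketch only: setting the instance/queue names via environment variables
# in a compose file. Values are placeholders.
services:
  servicecontrol:
    image: particular/servicecontrol:latest
    environment:
      - SERVICECONTROL_INSTANCENAME=Particular.ServiceControl
  servicecontrol-audit:
    image: particular/servicecontrol-audit:latest
    environment:
      - SERVICECONTROL_AUDIT_INSTANCENAME=Particular.ServiceControl.Audit
  monitoring:
    image: particular/servicecontrol-monitoring:latest   # image name assumed
    environment:
      - MONITORING_INSTANCENAME=Particular.Monitoring
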
1 Like

Thanks @bording I can confirm that these work well.

Separately, @DavidBoike (or anyone), I have been struggling to get RavenDB persistence working on Azure Container Apps. I have set up a volume mount to a file share on both a Standard General Purpose V2 account and a Premium FileStorage account type. RavenDB is able to create a System folder within the volume mount but fails to create the lock file within the System folder.

Here is a snippet from the stack trace (I customised the data path via an environment variable, but the same error is obtained without customisation of the data path).

System.InvalidOperationException: Cannot open database because RavenDB was unable create file lock on: '/data/RavenData/System/system.lock'. File system type: smb2
---> System.UnauthorizedAccessException: Access to the path '/data/RavenData/System/system.lock' is denied.
---> System.IO.IOException: Permission denied
  --- End of inner exception stack trace ---
  at System.IO.Strategies.FileStreamHelpers.CheckFileCall(Int64 result, String path, Boolean ignoreNotSupported)
  at System.IO.Strategies.FileStreamHelpers.Lock(SafeFileHandle handle, Boolean canWrite, Int64 position, Int64 length)
  at Raven.Server.Utils.FileLocker.TryAcquireWriteLock(Logger logger, Int32 numberOfRetries) in D:\Builds\RavenDB-Stable-5.4\54136\src\Raven.Server\Utils\FileLocker.cs:line 36

I then came across this existing issue:
Request for SMB Mount Support in Azure Container Apps for Volume Persistence · Issue #18330 · ravendb/ravendb (github.com)

There is also this comment on the issue:

You want to run in (temporary) containers, but also use them for persistence. That leads to potential issues with multiple instances running on the same files (which leads to complicated and unsafe locking).
In general, it is atypical to run persistence in this manner for containers. The usual manner is to host the database externally, with databases per client. That would also ensure high availability for your usage.

I am starting to think that running RavenDB in a container with Azure Container Apps is not currently supported, perhaps due to the way that Raven is attempting to acquire the writer lock. This may mean that the DB needs to be hosted outside of our Azure Container Apps Environment, which would then mean that the DBs for the Service Control error and audit instances would be on a network external to the container.

I know the above is not a Particular issue; rather, it looks like a RavenDB container + Azure Container Apps issue. I thought I’d mention it here anyway. I will post here if I find a solution to this.

Looks like my issue was lack of mount options. These mount options work for RavenDB persistence:

dir_mode=0777,file_mode=0777,uid=1001,gid=1001,mfsymlinks,nobrl
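
In case it helps anyone else on Azure Container Apps, here is a rough sketch of where mount options like these can live in a Container App definition. It assumes the mountOptions field is available on AzureFile volumes in the current schema, and the storage, volume, and path names are placeholders:

# Sketch only: an Azure Files volume with explicit mount options in a
# Container App template. Verify the mountOptions field against the
# current Container Apps YAML schema before relying on it.
template:
  containers:
    - name: servicecontrol-ravendb
      image: particular/servicecontrol-ravendb:latest
      volumeMounts:
        - volumeName: ravendata
          mountPath: /data
  volumes:
    - name: ravendata
      storageType: AzureFile
      storageName: ravendata   # storage defined at the environment level
      mountOptions: "dir_mode=0777,file_mode=0777,uid=1001,gid=1001,mfsymlinks,nobrl"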

Hi,

First time poster!
I’ve been attempting to get this set up on our AKS (Azure Kubernetes Service) environment, which is new to me.

I currently have an nginx ingress which exposes the ServicePulse web UI. In the ServicePulse configuration I am using the “service” name for both the servicecontrol and servicecontrol-monitoring instances. If I curl from the ServicePulse pod I can access both of these via their respective default ports, 33333 and 33633.

However, from the hosted ServicePulse application this doesn’t work. Is this something that the reverse proxy work going through the ServicePulse host would aim to enable?

Right now I’ve exposed both ServiceControl and ServiceControl Monitoring through my nginx ingress, which seems to work, but I’m not sure how this kind of setup would be secured, or whether it’s even the correct way to accomplish this.

Thanks for all the work so far
Regards
Darren

Hey @DarrenTiplady, that’s exactly what the ServicePulse reverse proxy is for. We’re close to shipping that, stay tuned!

2 Likes

@DarrenTiplady ServicePulse 1.41.0 is now out.

Here is the release announcement.

Check out the updated documentation for running ServicePulse in a container which talks about the reverse proxy feature.

Note there are some changes: the container now exposes port 9090 instead of 90, and the way you specify the URL for ServiceControl isn’t quite the same as it was.

1 Like

I’m having difficulty getting started with this, so I would appreciate some help. Eventually my aim is to run it on Azure Container Apps like @ivanl, but for now I’m just trying it locally on my Windows laptop.

  1. Minor point: the docs on particular/servicecontrol refer to
    particular/servicecontrol-ravendb:latest-x64 but that tag doesn’t exist, so I’m using latest.

  2. The requirement to run with --setup first is difficult. I’ve seen the thread that points to ramonsmits/ServiceControl-Docker, but I can’t get it to mount the volume, so I get an error saying the entrypoint script is not found. For example, with the script in an error folder below compose.yaml:

    docker run -v ./error:/error --entrypoint /error/entrypoint.sh particular/servicecontrol:latest
    
    docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/error/entrypoint.sh": stat /error/entrypoint.sh: no such file or directory: unknown.
    

    This is my current problem and I’d appreciate any advice. I don’t know Linux so I’m aware this could be a Linux thing I’m missing about what paths can be mounted.

  3. Thinking ahead to Azure deployment, if --setup is just creating queues, is it possible to document the queues that need creating? In Azure I’m already following advice not to use EnableInstallers() but to create queues ahead of time instead. If I could do that again from documentation it might avoid the need to run --setup.

Hey @rick,

Here are some answers to your questions:

  1. We missed that, thanks for pointing it out! Now that the documentation is updated, we’re going to do an update to the Docker Hub READMEs to fix a lot of this stuff.
  2. We feel your pain on this, and we’re actively working on how to make this easier.
    • Don’t look at Ramon’s docker compose anymore. The ServiceControl apps use a “chiseled” base image that does not have a shell, so having a .sh file to run the app with --setup and then normally will not work.
    • The next release will have a “setup + run” option so that you can have a single container, making this whole issue go away.
  3. You wouldn’t want to do that, because while --setup creates queues today, there are other things setup might do at some point in the future; any version-to-version “migration stuff” would live in there too. So skipping --setup would be a good way to ensure that at some point an upgrade will break on you.
1 Like

Thank you for that. I look forward to the setup + run version.

Other docs I’ve noticed:

  • there’s some raw markdown on Installing ServiceInsight
  • On that same page, I would like to see some guidance on which URLs to connect to. It seems to accept http://localhost:33633, http://localhost:44444/api and http://localhost:33333/api. This was just from trial and error - I only got the /api part from someone’s post, not the documentation.
  • Also on the same page, it would be useful to link to the advice about enabling auditing with endpointConfiguration.AuditProcessedMessagesTo() and saga auditing with endpointConfiguration.AuditSagaStateChanges().
  • On the ServiceControl container page it would be useful to note that you may need triple-quotes for the JSON values (I’m not sure yet if this is because I’m running Windows or because my prompt is PowerShell.) My command line includes -e RemoteInstances='[{"""api_uri""":"""http://host.docker.internal:44444/api"""}]'.
  • On Send Metrics data to ServiceControl • Metrics.ServiceControl • Particular Docs it says the default instance name is particular.monitoring and includes sample code with that name, but I found monitoring data only started appearing in ServicePulse when I used Particular.Monitoring, which is the name I can see in RabbitMQ when using all the recommended settings for containers.

These are the questions I have at the moment:

  1. It seems like the error instance is designed to look at one error queue. We have one error queue per endpoint. Does that mean we’d run multiple error instances of ServiceControl, or do we have our topology wrong and should switch to a combined error queue? Our system is still in development so we can change easily.
  2. I have ServicePulse displaying monitoring data on the ‘Monitoring’ page, aka ‘Endpoints overview’. However Configuration > Usage setup > Diagnostics still has a red x and the message “No throughput from Monitoring recorded in the last 30 days. Listening on queue ServiceControl.ThroughputData.” RabbitMQ shows that queue exists and has had some messages when I’ve generated activity in the system. What else might I be missing here?
  3. I put a deliberate exception in the last step of my saga for testing. ServicePulse shows that failed message, but the flow diagram doesn’t show any of the messages leading up to it. Should it, and if so what might I be missing?

Update
I’ve now seen the advice not to connect ServiceInsight direct to the error queue (although I can’t now find where that advice was…), so I’ve removed the connection to http://localhost:33333/api. I’ve also removed the connection to http://localhost:33633 as it appears monitoring data is only surfaced in ServicePulse.

Yeah @rick we haven’t gotten to the ServiceInsight docs yet. We might have to touch ServiceInsight soon. Stay tuned.

I don’t know why you’d need triple-doublequotes inside a single-quoted thing. That seems weird?

On error queues: yes, the error instance only looks at one error queue. You will only make things more difficult by having an error queue per endpoint. Messages sent to the error queue have a header so they know where to be returned to on a replay.

Everything else you’ve mentioned, I’m not really sure is within the purview of our containerization effort and would be better off handled by our non-critical support team.

By the way, we’ve just released ServiceControl 5.7.0 which includes a new --setup-and-run parameter that can be used to do the container initialization without needing a separate init container.
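
For example, here is a minimal compose sketch for the audit instance. It assumes the image’s default entrypoint is the ServiceControl executable, so the flag can be passed as the container command; hostnames and connection strings are placeholders:

# Sketch only: a single audit-instance service using --setup-and-run, so no
# separate init/setup container is needed. Hostnames are placeholders.
services:
  servicecontrol-audit:
    image: particular/servicecontrol-audit:latest
    command: ["--setup-and-run"]
    environment:
      - TransportType=RabbitMQ.QuorumConventionalRouting
      - ConnectionString=host=rabbitmq
      - RavenDB_ConnectionString=http://servicecontrol-ravendb:8080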

Here is the new documentation for each instance type … spoiler alert it’s all basically the same text:

1 Like

@DavidBoike Thank you for your help, and the new versions.

FYI on the quotes, if I run from a Windows PowerShell prompt with a normally-quoted string I get the error System.Text.Json.JsonException: 'a' is an invalid start of a property name. Expected a '"'. If I run from a cmd prompt I get a different error System.Text.Json.JsonReaderException: ''' is an invalid start of a value.

My next problem is trying to get the v5.7.0 images working from docker compose. Focusing on just the audit instance for now, this is my compose.yaml file:

services:
  rabbitmq:
    image: rabbitmq:3-management
    container_name: sv_rabbit
    ports:
      - "5672:5672"
      - "15672:15672"

  servicecontrol_ravendb:
    image: particular/servicecontrol-ravendb:latest
    container_name: sv_servicecontrol_ravendb
    ports:
      - "8080:8080"

  servicecontrol_audit:
    image: particular/servicecontrol-audit:latest
    container_name: servicecontrol_audit
    depends_on:
      - rabbitmq
      - servicecontrol_ravendb
    ports:
      - "44444:44444"
    environment:
      - TransportType=RabbitMQ.QuorumConventionalRouting
      - ConnectionString='host=host.docker.internal'
      - RavenDB_ConnectionString='http://host.docker.internal:8080'
    entrypoint: /app/ServiceControl.Audit --setup-and-run

The same arguments used from docker run work, but this throws the following exception:

servicecontrol_audit       | 2024-08-29 08:00:26.0805|1|Error|Program|Unhandled exception was caught.|System.ArgumentException: Format of the initialization string does not conform to specification starting at index 0.
servicecontrol_audit       |    at System.Data.Common.DbConnectionOptions.GetKeyValuePair(String connectionString, Int32 currentPosition, StringBuilder buffer, Boolean useOdbcRules, String& keyname, String& keyvalue)
servicecontrol_audit       |    at System.Data.Common.DbConnectionOptions.ParseInternal(Dictionary`2 parsetable, String connectionString, Boolean buildChain, Dictionary`2 synonyms, Boolean firstKey)
servicecontrol_audit       |    at System.Data.Common.DbConnectionOptions..ctor(String connectionString, Dictionary`2 synonyms, Boolean useOdbcRules)
servicecontrol_audit       |    at System.Data.Common.DbConnectionStringBuilder.set_ConnectionString(String value)
servicecontrol_audit       |    at NServiceBus.Transport.RabbitMQ.ConnectionConfiguration.ParseNServiceBusConnectionString(String connectionString, StringBuilder invalidOptionsMessage) in /_/src/NServiceBus.Transport.RabbitMQ/Configuration/ConnectionConfiguration.cs:line 123
servicecontrol_audit       |    at NServiceBus.Transport.RabbitMQ.ConnectionConfiguration.Create(String connectionString) in /_/src/NServiceBus.Transport.RabbitMQ/Configuration/ConnectionConfiguration.cs:line 57
servicecontrol_audit       |    at NServiceBus.RabbitMQTransport..ctor(RoutingTopology routingTopology, String connectionString, Boolean enableDelayedDelivery) in /_/src/NServiceBus.Transport.RabbitMQ/RabbitMQTransport.cs:line 60
servicecontrol_audit       |    at ServiceControl.Transports.RabbitMQ.RabbitMQConventionalRoutingTransportCustomization.CreateTransport(TransportSettings transportSettings, TransportTransactionMode preferredTransactionMode) in /_/src/ServiceControl.Transports.RabbitMQ/RabbitMQConventionalRoutingTransportCustomization.cs:line 25
servicecontrol_audit       |    at ServiceControl.Transports.TransportCustomization`1.ProvisionQueues(TransportSettings transportSettings, IEnumerable`1 additionalQueues) in /_/src/ServiceControl.Transports/TransportCustomization.cs:line 114
servicecontrol_audit       |    at ServiceControl.Audit.Infrastructure.Hosting.Commands.SetupCommand.Execute(HostArguments args, Settings settings) in /_/src/ServiceControl.Audit/Infrastructure/Hosting/Commands/SetupCommand.cs:line 34
servicecontrol_audit       |    at ServiceControl.Audit.Infrastructure.Hosting.Commands.CommandRunner.Execute(HostArguments args, Settings settings) in /_/src/ServiceControl.Audit/Infrastructure/Hosting/Commands/CommandRunner.cs:line 12
servicecontrol_audit       |    at Program.<Main>$(String[] args) in /_/src/ServiceControl.Audit/Program.cs:line 30
servicecontrol_audit       |    at Program.<Main>(String[] args)

Hi @DavidBoike ,

Thanks for the updates. I have now also implemented the new container setup-and-run, which works nicely.

Maybe it’s a misunderstanding on my part, but I don’t seem to be able to get ServicePulse working now with the new changes.

I have the following Kubernetes configuration for ServicePulse:

apiVersion: v1
kind: Service
metadata:
  name: servicepulse
  namespace: nservicebus
spec:
  ports:
    - port: 80
      targetPort: 9090
  selector:
    app: servicepulse

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: servicepulse
  namespace: nservicebus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: servicepulse
  template:
    metadata:
      labels:
        app: servicepulse
    spec:
      containers:
        - name: servicepulse
          image: particular/servicepulse:latest
          ports:
            - containerPort: 9090
          env:
            - name: SERVICECONTROL_URL
              value: "http://servicecontrol:33333"
            - name: MONITORING_URL
              value: "http://servicecontrol-monitoring:33633"
            - name: SHOW_PENDING_RETRY
              value: "true"
            - name: DEFAULT_ROUTE
              value: "/dashboard"

The services and pods for ServiceControl are named servicecontrol and servicecontrol-monitoring. When I look at ServicePulse and try to test the connection, it doesn’t work. I assume the way this would work is that the requests get proxied via ServicePulse and communicate with the ServiceControl and Monitoring containers on the local network.

Not sure what else I’ve done wrong in the configuration, any help is appreciated.

So this appears to have been a problem in my ingress, I believe: removing the first slash in both /api/ and /api-monitoring/ fixed it, in case anyone else is having the same issue…
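
For anyone else wiring this up, here is a minimal sketch of the single-entry-point shape the reverse proxy enables: expose only ServicePulse through the ingress and let it forward /api/ and /api-monitoring/ to ServiceControl and Monitoring internally. The host name and ingress class are placeholders, and the proxied paths should be verified against the ServicePulse container docs:

# Sketch only: one ingress route to the servicepulse Service defined above;
# ServiceControl and Monitoring stay internal to the cluster.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: servicepulse
  namespace: nservicebus
spec:
  ingressClassName: nginx
  rules:
    - host: pulse.example.com    # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: servicepulse
                port:
                  number: 80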

Hi all,

Sorry we’ve been a bit quiet lately. We were testing the ServiceControl images in Azure and ran into a problem that required us to engage Microsoft support. Long story short, Microsoft rolled out a Linux kernel change related to FIPS mode in Azure that caused all the base images in the Noble family (Ubuntu 24.04) to blow up if you used anything in the System.Security.Cryptography namespace. So for a while, ServiceControl 5.6.x and 5.7.x containers were sometimes failing in Azure, until today, when the Azure team completed a rollback of that FIPS change. Now all versions should run properly.

Check out our new platform container examples repository. In there we already have a Docker Compose example that’s great for spinning up the platform locally, and an Azure Container Apps example that is a proof of concept for deploying the platform to Azure with a ServicePulse endpoint secured by Microsoft Entra ID authentication…or in other words you log into Azure to access ServicePulse.

We’re still working on a Kubernetes example.

We haven’t linked to this repository from our docs yet, but we’re keeping it as a separate repository for a few reasons:

  1. We want to make it a lot easier to interact with and contribute to the repository without the ceremony of a sample in our documentation repo.
  2. Separate repos get a handy built-in “download as ZIP” link.

We hope these are useful. Let us know!

1 Like

A couple replies…

@DarrenTiplady glad you figured that issue out!

@rick I suspect it’s not reading the environment variables like you think it is and so it’s actually trying to parse a null/empty connection string. (Though I’m not sure why.) In any case I’d recommend you start with the new Docker Compose example I mentioned above.
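
One more thing worth checking against the compose file above (an observation, not something verified here): list-style environment entries in Compose are passed through literally, including the single quotes, so ConnectionString='host=host.docker.internal' produces a value that starts with a quote character, which would fit the parse error at index 0. The same quotes are stripped by the shell when using docker run, which would explain why that works. Dropping the quotes may be enough:

# Sketch only: list-style environment values in compose are literal, so the
# quotes themselves become part of the value. Unquoted values avoid that.
environment:
  - TransportType=RabbitMQ.QuorumConventionalRouting
  - ConnectionString=host=host.docker.internal
  - RavenDB_ConnectionString=http://host.docker.internal:8080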

Is there any plan to publish a Helm chart for Kubernetes?