The Two Types of Service Architects

Tomas Restrepo comments on my recent SSB and WCF posts:

Harry Pierson asks how well WCF supports long running tasks. He suggests that WCF does not support them very well, and says that’s one reason he likes SQL Server Service Broker so much. I’d say SSSB is a good match only as long as the long running tasks you’re going to be executing are purely database driven and can be executed completely within the database. Sure, this is an “expanded universe” with the CLR support in SQL Server 2005, but even so it makes me nervous at times 😄

You could also consider using a custom service with MSMQ or something like BizTalk Server for this if you had long running processes that were not completely tied to the DB (or a single DB for that matter).

Sam Gentile follows up:

In that same post, but I needed to call it out separate, Tomas rightfully says, “I’d say SSSB is a good match only as long as the long running tasks you’re going to be executing are purely database driven and can be executed completely within the database,” in response to Harry liking Service Broker so much. Talk about a narrow edge case. That’s why I never really got excited or cared about Service Broker. It’s a narrow solution to a special edge case when everything is database driven and can be executed totally inside the database. That’s the old Microsoft Data-Driven Architecture for sure. Me, I’d rather have a rich Domain-Driven architecture most of the time. Then if you have Oracle databases in your architecture too, where does it leave you? Nowhere.

As you might expect, I have a few comments, clarifications and corrections.

First, Tomas’ statement that Service Broker only supports service logic “executed completely within the database” is flat-out wrong. Service Broker can be used from any environment that can connect to SQL Server and execute DML statements. If you can call SELECT/INSERT/UPDATE/DELETE, then you can also call BEGIN DIALOG/SEND/RECEIVE/END CONVERSATION. This includes Windows apps and services, web apps and services, console apps and even Java apps. Of course, you can also access Service Broker from stored procedures if you wish, but you’re not limited to them as Tomas suggested.
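To make the point concrete, here is a minimal Python sketch, assuming a DB-API driver such as pyodbc that uses `?` placeholders. The Service Broker verbs travel over the same connection and cursor as any other T-SQL. The service, contract, message type and queue names are hypothetical, and the caller is assumed to commit the transaction (for example with `connection.commit()`).

```python
from typing import Any, Optional, Sequence

# Hypothetical object names; substitute the services, contract,
# message type and queue defined in your own database.
INITIATOR = "//Example/OrderClient"
TARGET = "//Example/OrderService"
CONTRACT = "//Example/OrderContract"
MSG_TYPE = "//Example/OrderRequest"
QUEUE = "OrderServiceQueue"


def send_message(cursor: Any, body: bytes) -> None:
    """Open a dialog and SEND one message in a single T-SQL batch."""
    cursor.execute(
        f"""
        DECLARE @h UNIQUEIDENTIFIER;
        BEGIN DIALOG CONVERSATION @h
            FROM SERVICE [{INITIATOR}]
            TO SERVICE '{TARGET}'
            ON CONTRACT [{CONTRACT}]
            WITH ENCRYPTION = OFF;
        SEND ON CONVERSATION @h MESSAGE TYPE [{MSG_TYPE}] (?);
        """,
        (body,),
    )


def receive_message(cursor: Any, timeout_ms: int = 5000) -> Optional[Sequence[Any]]:
    """Block until one message arrives; returns (handle, type, body) or None on timeout."""
    cursor.execute(
        f"""
        WAITFOR (
            RECEIVE TOP (1) conversation_handle, message_type_name, message_body
            FROM [{QUEUE}]
        ), TIMEOUT {int(timeout_ms)};
        """
    )
    return cursor.fetchone()


def end_conversation(cursor: Any, handle: Any) -> None:
    """End the dialog identified by handle (SQL Server 2005 compatible syntax)."""
    cursor.execute(
        "DECLARE @h UNIQUEIDENTIFIER; SET @h = ?; END CONVERSATION @h;",
        (handle,),
    )
```

Nothing here runs inside the database; any process that can open a connection can play either side of the conversation.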

Tomas’ misconception may come from a Service Broker feature called Activation, which dynamically scales message processing to match demand. For example, Service Broker can be configured to launch a new instance of a specified stored procedure if message processing isn’t keeping up with incoming message traffic on a given queue. This is called internal activation, and because it uses stored procedures it does execute within the database, as Tomas said. Service Broker also supports external activation, where it notifies an external application when activation is needed. You do have to build an application to host your service logic and handle these notifications, but that application doesn’t execute within the database. So while you could argue that it’s easier to execute your service logic within the database (no need to build a separate host app), it’s not required.
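The scaling behavior can be pictured with a deliberately simplified model. This is not SQL Server’s actual activation algorithm; it only assumes a fixed processing rate per reader and a cap on concurrent readers, which in Service Broker corresponds to the MAX_QUEUE_READERS setting in a queue’s activation clause.

```python
from dataclasses import dataclass


@dataclass
class ActivationModel:
    """Toy model of activation: add a reader while a backlog persists."""
    max_queue_readers: int         # mirrors the MAX_QUEUE_READERS cap
    msgs_per_reader_per_tick: int  # assumed constant processing rate
    readers: int = 0
    backlog: int = 0

    def tick(self, arriving: int) -> None:
        self.backlog += arriving
        # Messages are waiting and readers aren't keeping up, so activate
        # another reader (internal: a stored procedure instance; external:
        # a notification to an outside host application).
        if self.backlog > 0 and self.readers < self.max_queue_readers:
            self.readers += 1
        # Each running reader drains a fixed number of messages this tick.
        drained = min(self.backlog, self.readers * self.msgs_per_reader_per_tick)
        self.backlog -= drained
        # Once the queue is empty, readers find nothing to RECEIVE and exit
        # (simplified here: they all exit at once).
        if self.backlog == 0:
            self.readers = 0
```

Under a steady burst of 10 messages per tick with 3 messages per reader, this model ramps up one reader per tick until it hits the cap of 5 and the backlog is cleared, then drops back to zero readers.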

Given that you don’t have to host your service logic in the database, you’re also not limited to “a single DB” as Tomas suggests. In fact, you don’t have to put your Service Broker queues in the same database as your business data. So if you have Oracle in your environment, like the scenario Sam mentioned, you would host your service logic in an external application that processes messages from a queue in a SQL Server 2005 database while accessing and modifying business data in tables in the Oracle database. Using multiple databases does require distributed rather than local transactions, but if you’re using MSMQ as Tomas recommended, you’re already stuck with the DTC anyway.
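Why distributed transactions? The external host receives a message from the SQL Server queue and updates Oracle; both changes must commit together, or a crash in between could lose the message or process it twice. The DTC coordinates this with a two-phase commit, which the toy Python sketch below illustrates. It models the protocol only; it is not the DTC’s API, and the `Resource` class is hypothetical.

```python
from typing import List


class Resource:
    """Toy participant: think of the SQL Server queue or the Oracle tables."""

    def __init__(self, name: str, will_vote_yes: bool = True) -> None:
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: a real resource manager would make its work durable
        # here before voting yes.
        if self.will_vote_yes:
            self.state = "prepared"
        return self.will_vote_yes

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled back"


def two_phase_commit(participants: List[Resource]) -> bool:
    """Commit everywhere or nowhere; returns True if the transaction committed."""
    # Phase 1: ask every participant to prepare; any "no" vote aborts.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2a: everyone promised they can commit, so tell them to.
        for p in participants:
            p.commit()
        return True
    # Phase 2b: someone can't commit, so undo the work everywhere.
    for p in participants:
        p.rollback()
    return False
```

With a local transaction you get the same all-or-nothing guarantee for free, which is one argument for keeping the queue and the business data in the same database when you can.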

Finally, I didn’t get Tomas’ “purely database driven” or Sam’s “everything is database driven” comments at all. While there are exceptions, the vast majority of systems I’ve ever seen/built/designed have essentially been one or more stateless tiers sitting in front of a stateful database. If it’s a traditional three tier web app, there’s a stateless presentation tier, a stateless business logic tier and a stateless data access logic tier. For a web service, there’s no presentation tier, but there is the stateless SOAP processing tier typically provided by the web service stack. Does this mean the vast majority of web apps and services are “purely database driven” too? If so, then I guess it’s a good thing, right?

In the end, maybe there are two types of service architects – those that believe the majority of services will be atomic and those that believe the majority of services will be long running. For atomic services, Service Broker is overkill. But if it turns out that most services are long running, WCF’s lack of support for them is going to be a pretty big roadblock.

I’m obviously in the long running camp. I’m not sure, but I get the feeling this is the less popular camp, at least for now. We’ll have to wait and see, but what I do know is that whenever someone brings me what they think is an atomic business scenario, it doesn’t take much digging to reveal that the atomic scenario is actually a single step of a long running business scenario that also needs to be automated.

Here’s a question for Tomas, Sam and the rest of you: Which group do you self select into? Are most services going to be atomic or long running in the (pardon the pun) long run?

Essential Windows Workflow Foundation

On Don’s recommendation, I picked up Essential WF. In the foreword, Don writes “[S]omething big is about to happen.” I’m only part way thru chapter one, and this is already a must read. Go get it. Now.

In the preface, they define the term “Reactive Program”, which I’m adding to my personal lexicon.

“Windows Workflow Foundation (WF) is a general-purpose programming framework for creating reactive programs that act in response to stimulus from external entities. The basic characteristic of reactive programs is that they pause during their execution, for unknown amounts of time, awaiting input.”

That “unknown amounts of time” is the kicker. Here’s a paragraph from early in chapter one that expands on that:

“Real-world processes take a long time – days, weeks, or even months. It is wishful thinking to assume that the operating system process (or CLR application domain) in which the program begins execution will survive for the required duration.”

Gee, that sounds familiar, doesn’t it?
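Here is a toy illustration of what surviving a long pause means: the program’s position (a bookmark) and its data are serialized when it starts waiting, so a different process can resume it later. This is plain Python, not WF’s API; WF provides its own persistence machinery, and the workflow, step names and stimuli below are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ApprovalWorkflow:
    """A tiny reactive program: it runs until it must wait, then pauses."""
    order_id: str
    step: str = "start"              # the bookmark: where to resume
    approved: Optional[bool] = None

    def run(self, stimulus: Optional[str] = None) -> None:
        if self.step == "start":
            # Do the initial work, then pause awaiting an external decision.
            self.step = "awaiting_approval"
        elif self.step == "awaiting_approval" and stimulus in ("approve", "reject"):
            self.approved = stimulus == "approve"
            self.step = "done"
        # Any other stimulus is ignored; the program keeps waiting.

    def save(self) -> str:
        # Everything needed to resume; the hosting process may now exit.
        return json.dumps(asdict(self))

    @classmethod
    def load(cls, data: str) -> "ApprovalWorkflow":
        return cls(**json.loads(data))
```

Usage: call `run()` once, store the result of `save()`, and days later, in a brand new process, call `ApprovalWorkflow.load(saved).run("approve")`. The key property is that no in-memory object has to live for the duration of the wait.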

Is WCF “Straightforward” for Long Running Tasks?

My father sent me a link to this article on SOA scalability. He thought it was pretty good until he got to this paragraph:

Long-running tasks become more complex. You cannot assume that your client can maintain a consistent connection to your web service throughout the life of a task that takes 15 minutes, much less one hour or two days. In this case, you need to implement a solution that follows a full-duplex pattern (where your client is also a service and gets notified when the task is completed) or a polling scheme (where your client checks back later to get the results). Both of these solutions require stateful services. This full-duplex pattern becomes straightforward to implement using the Windows Communications Foundation (Indigo) included with .NET 3.0.

When I first saw duplex channels in WCF, I figured you could use them for long running tasks as well. Turns out that of the nine standard WCF bindings, only four support duplex contracts. Of those four, one is designed for peer-to-peer scenarios and one uses named pipes so it doesn’t work across the network, so they’re obviously not usable in the article’s scenario. NetTcp can only provide duplex contracts within the scope of a consistent connection, which the author has already ruled out as a solution. That leaves wsDualHttp, which is implemented much as the author describes, where both client and the service are listening on the network for messages. There’s even a standard binding element – Composite Duplex – which ties two one-way messaging channels into a duplex channel.

Alas, the wsDualHttp solution has a few flaws that render it – in my opinion at least – unusable for exactly these sorts of long running scenarios. On the client side, while you can specify the ClientBaseAddress, you can’t specify the entire ListenUri. Instead, wsDualHttp generates a random guid and tacks it on the end of your ClientBaseAddress, effectively creating a random url every time you run the client app. So if you shut down and restart your client app, you’re now listening on a different url than the one the service is going to send messages to and the connection is broken. Oops.
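A small Python model of the addressing problem follows. It does not call WCF; it only mimics the behavior described above, where a fresh GUID is appended to the client base address on every run, and contrasts it with a hypothetical address derived from an ID the client keeps between runs. The exact format of the suffix WCF generates is an assumption here.

```python
import uuid


def wsdual_style_listen_uri(client_base_address: str) -> str:
    # Mimics the described behavior: a new GUID on every run, so each
    # client process listens at a different address.
    return client_base_address.rstrip("/") + "/" + str(uuid.uuid4())


def stable_listen_uri(client_base_address: str, client_id: str) -> str:
    # Hypothetical alternative: the client persists client_id between
    # runs, so a restarted client listens where the service expects.
    return client_base_address.rstrip("/") + "/" + client_id
```

Two runs of the first function yield two different addresses, which is exactly why a service holding on to the first one can no longer reach a restarted client.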

The issues don’t end there. On the service side of a duplex contract, you get an object you can use to call back to the client via OperationContext.Current.GetCallbackChannel. This works fine, as long as you don’t have to shut down your service. There’s no way to persist the callback channel information to disk and later recreate it. So if you shut down and restart your service, there’s no way to reconnect with the client, even if they haven’t changed the url they’re listening on. Oops.
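For contrast, here is the kind of information a service would need to keep on disk to call a client back after a restart: a correlation ID and the client’s reply address. This is a hypothetical Python sketch, not a WCF feature; `GetCallbackChannel` returns a live proxy rather than a serializable record like this, which is exactly the gap described above.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class CallbackRecord:
    conversation_id: str  # correlates the callback with the original request
    reply_to: str         # the client's listen address


class CallbackStore:
    """Hypothetical store that survives a service restart (a JSON file here)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def save(self, records: Dict[str, CallbackRecord]) -> None:
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump({k: asdict(v) for k, v in records.items()}, f)

    def load(self) -> Dict[str, CallbackRecord]:
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as f:
            return {k: CallbackRecord(**v) for k, v in json.load(f).items()}


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "callbacks.json")
    CallbackStore(path).save(
        {"c1": CallbackRecord("c1", "http://client:8000/callback/c1")}
    )
    # The service restarts; a fresh store reads the same file back.
    print(CallbackStore(path).load()["c1"].reply_to)
```

Of course, a persisted reply address only helps if the client is still listening there after its own restart.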

So in other words, WCF can do long running services using the wsDualHttp binding, as long as you don’t restart the client or service during the conversation. Because that would never ever happen, right?

This is part of the reason why I’m sold on Service Broker. From where I sit, it looks like WCF can’t handle long running operations at all – at least, not with any of the built in transports and bindings. You may be able to build something custom that would work for long running services; I’m not a deep enough expert on WCF to know. From reading what Nicholas Allen has to say about CompositeDuplex, I’m fairly sure you could work around the client url issue if you built a custom binding element to set the ListenUriBaseAddress. But I have no idea how to deal with the service callback channel issue. It doesn’t appear that the necessary plumbing is there at all to persist and rehydrate the callback channel. If you can’t do that, I don’t see how you can reliably support long running services.

Custom Authentication with WCF is Top Shelf

I’ve spent the last three days heads down in WCF security and color me massively impressed. I just checked in a prototype that provides customized authentication for a business service. The idea that you could bang up a custom authentication service fairly easily blows my mind.

The cornerstone to this support in WCF is the standard WSFederationHttpBinding. While the binding name implies support for WS-Federation which in turn implies the use of infrastructure like Active Directory Federation Services, the binding also scales down to support simple federation scenarios with a single Security Token Service (aka STS) as defined by WS-Trust. WS-Trust appears similar to Kerberos. If you want to access a service using the federation binding, you first obtain a security token from the associated STS. Tokens contain SAML assertions, which can be standard – such as Name and Windows SID – or entirely custom, which opens up very interesting and flexible security scenarios.
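The token flow can be sketched in a few lines. In this Python sketch an HMAC over JSON claims stands in for a signed SAML assertion; real WS-Trust tokens use XML Signature, typically with X.509 certificates, and the shared key and claim names here are assumptions for illustration.

```python
import hashlib
import hmac
import json
from typing import Optional

# Assumption: a shared symmetric key stands in for the STS signing certificate.
STS_SIGNING_KEY = b"demo-signing-key"


def _sign(payload: str) -> str:
    return hmac.new(STS_SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(claims: dict) -> str:
    """STS side: serialize the claims (the assertions) and sign them."""
    payload = json.dumps(claims, sort_keys=True)
    return payload + "." + _sign(payload)


def verify_token(token: str) -> Optional[dict]:
    """Service side: trust the claims only if the STS signature checks out."""
    payload, _, sig = token.rpartition(".")
    if not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        return None
    return json.loads(payload)
```

The business service never sees the user’s original credentials; it only checks that a token came from the STS it trusts and then reads the claims, whether standard (a name) or custom (a role).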

If you want to support multiple authentication systems (Windows, certificates, CardSpace, Passport/Windows Live ID, etc.), STS is perfect because you can centralize the multiple authentication schemes at the STS, which then hands out a standard token the business service understands. Adding a new auth scheme can happen centrally at the STS rather than in each and every service. Support for multiple authentication schemes was the focus of our current prototype and it worked extremely well.
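A minimal Python sketch of the centralization idea, assuming hypothetical password and certificate checks: each scheme plugs into the STS, and every successful authentication yields the same token shape, so business services never see which scheme was used. The checks and user data below are placeholders, not real Windows, certificate or CardSpace validation.

```python
from typing import Callable, Dict, Optional

# An authenticator maps raw credentials to a user name, or None on failure.
Authenticator = Callable[[dict], Optional[str]]


def check_password(creds: dict) -> Optional[str]:
    users = {"alice": "s3cret"}  # hypothetical user store
    user = creds.get("user")
    return user if user in users and users[user] == creds.get("password") else None


def check_certificate(creds: dict) -> Optional[str]:
    trusted = {"AB:CD:EF": "bob"}  # hypothetical thumbprint -> user mapping
    return trusted.get(creds.get("thumbprint"))


class SecurityTokenService:
    def __init__(self) -> None:
        self.schemes: Dict[str, Authenticator] = {}

    def add_scheme(self, name: str, authenticator: Authenticator) -> None:
        # Adding a scheme touches only the STS, never the business services.
        self.schemes[name] = authenticator

    def issue(self, scheme: str, creds: dict) -> Optional[dict]:
        auth = self.schemes.get(scheme)
        user = auth(creds) if auth else None
        # Whatever the scheme, the service receives the same token shape.
        return {"name": user, "auth_scheme": scheme} if user else None
```

Supporting a new scheme means one more `add_scheme` call at the STS; the services downstream keep consuming the same token.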

WCF includes a federation sample which is where you should start if you’re interested in this stuff. That scenario includes a chain of two STSs. Accessing the secure bookstore service requires authenticating against the bookstore STS which in turn requires authenticating against a generic “HomeRealm” STS. Since there are two STSs, they factored the common STS code into a shared assembly. You can use that common code to build an STS of your own.

For our prototype, we made only minor changes to the common STS code from the sample. In fact, the only significant change we made was to support programmatic selection of the proof key encryption token. In the sample, both the issuer token and the proof key encryption token are hard coded (passed into the base class constructor). The issuer token is used to sign the custom security token so the target service knows it came from the STS. The encryption token is used to – you guessed it – encrypt the token so it can only be used by the target service. Hard-coding the encryption token means you can only use your STS with a single target service. We changed that so the encryption token can be chosen based on the incoming service token request.
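The change can be pictured as a lookup keyed on the incoming request. In WS-Trust the RST can carry an AppliesTo element naming the target service, which makes a natural key, but whether the prototype used exactly that field is an assumption, as are the class names and certificate subject strings in this Python sketch. The issuer certificate stays fixed, since every token is signed the same way, while the encryption certificate varies per target.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TokenRequest:
    applies_to: str  # the target service the client wants a token for


class EncryptionKeySelector:
    """Pick the proof key encryption certificate per target service."""

    def __init__(self, issuer_cert: str, encryption_certs: Dict[str, str]) -> None:
        self.issuer_cert = issuer_cert            # fixed: signs every token
        self.encryption_certs = encryption_certs  # varies: one per target

    def select(self, request: TokenRequest) -> str:
        try:
            return self.encryption_certs[request.applies_to]
        except KeyError:
            # Refuse rather than encrypt for a service we don't know.
            raise ValueError(f"no encryption certificate for {request.applies_to}") from None
```

Rejecting unknown targets outright keeps the STS from issuing tokens that no registered service could decrypt.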

Of course, it wasn’t all puppy dogs and ice cream. While I like the config system of WCF, anyone who calls it “easy” is full of it. I’ve spent most of the last three days looking at config files. Funny thing about config files is that they’re hard to debug. So most of my effort over the last few days has been in a cycle of run app / app throws exception / tweak config / repeat. Ugh.

Also, while the federation sample is comprehensive, I wonder why this functionality isn’t in the base WCF platform. For example, the sample includes implementations of RequestSecurityToken and RequestSecurityTokenResponse, the input and output messages of the STS. But I realized that WCF has to have its own implementations of RST and RSTR as well, since it has to send the RST to the STS and process the RSTR it gets in response. A little spelunking revealed the presence of an official WCF implementation of RST and RSTR, both marked internal. I normally fall on the pragmatic side of the internal/public debate, but this one makes little sense to me.

Otherwise, the prototype went smooth as silk and my project teammates were very impressed at how quickly this came together. Several of the project teams we’re working with have identified multiple authentication as the “killer” capability they’re looking to us to provide, so it’s good to know we’re making progress in the right direction.

FeedFlare Finally Fixed

I moved over to FeedBurner a while back. DasBlog has great support for FeedBurner – all you do is set your FeedBurner feed name in the DasBlog config and it handles the rest, including permanently redirecting your readers to the new feed.

However, until now I haven’t been able to make FeedFlares work. FeedFlares “build interactivity into each post” with little links like “Digg this”, “Email this” or “Add to del.icio.us”. Since FeedBurner is serving the XML feed, it’s no big deal for them to add those links into the RSS feed. But to get those same flares to work on the web site, you have to embed a little script at the end of each item. Scott shows how to do this with DasBlog, except that it didn’t work for me. I’ve tried off and on, but for some reason, the FeedBurner script file I was including was always empty.

Then I noticed the other day that my post WorkflowQueueNames had the flares on them. Hmm, why would that post work and none of the rest of mine? Turns out that it works because there are no spaces in the title. Unlike most of the rest of the DasBlog community, I’m using ‘+’ for spaces in my permalinks, instead of removing them. So I get http://devhawk.net/FeedFlare+Finally+Fixed.aspx as the permalink url instead of http://devhawk.net/FeedFlareFinallyFixed.aspx. In fact, that feature is in DasBlog because I pushed for it (a fact Scott reminded me of while I was troubleshooting this last night). And it was breaking the FeedFlares.

The solution is to URL encode the ‘+’, which is %2B, in the FeedFlare script link. I created a custom macro, since I already had a few custom macros powering this site anyway, and now I get the FeedFlares on all my blog entries. I’ll also go update the DasBlog source, but creating a custom macro was both easier and less risky than patching the tree and upgrading everything.
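One plausible mechanism for the breakage is that query-string parsers decode a literal ‘+’ as a space, so the permalink arrives mangled. This Python sketch shows that round trip and the %2B fix. It is not the DasBlog macro itself, and the `i` parameter name is a hypothetical stand-in for however the FeedFlare script URL actually carries the permalink.

```python
from urllib.parse import parse_qs, quote

PERMALINK = "http://devhawk.net/FeedFlare+Finally+Fixed.aspx"


def flare_query(permalink: str) -> str:
    # Percent-encode everything except ':' and '/', so '+' becomes %2B.
    # The parameter name "i" is hypothetical.
    return "i=" + quote(permalink, safe=":/")


def server_side_value(query: str) -> str:
    # How a typical form-style parser reads the value: '+' means space.
    return parse_qs(query)["i"][0]
```

Passed raw, `server_side_value("i=" + PERMALINK)` comes back as `http://devhawk.net/FeedFlare Finally Fixed.aspx`, a URL that doesn’t exist; passed through `flare_query`, the ‘+’ characters survive intact.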