Wednesday, May 28, 2008

Denodo Helps Mesh Enterprise Data

Every now and then you read something along the lines of “more words will be published in the next 30 minutes than in the whole of human history up to the battle of Lepanto”. Whether this is literally true or not (and who could know?), there is certainly more data sloshing around than ever. This is partly because there are so many more transactions, partly because each transaction generates so much more detail, and partly because so much previously hidden data is now exposed on the Internet. The frustration for marketers is that they know this data could be immensely useful, if only they could employ it effectively.

This is the fundamental reason I spend so much time on database technology (to manage all that data), analytics (to make sense of it) and marketing automation (to do something useful with it). Gaining value from data is the greatest new challenge facing marketers today—as opposed to old but still important challenges like managing your brand and understanding your customers. Since the answers are still being discovered, it’s worth lots of attention.

One subject I haven’t explored very often is mining data from the public Internet (as opposed to data captured on your own Web site—the “private” Internet, if you will). Marketers don’t seem terribly concerned with this, apart from specialized efforts to track comments in the press, blogs, social networks and similar forums. Technologists do find the subject quite fascinating, since it offers the intriguing challenge of converting unstructured data into something more useful, preferably using cool semantic technology. It doesn’t hurt that tons of funding is available from government agencies that want to spy on us (for our own protection, of course). The main marketing application of this work has been building business and consumer profiles with information from public Web sources. Zoominfo may be the best known vendor in this field, although there are many others.

But plenty of other interesting work has been going on. I recently spoke with Denodo, which specializes in what are called “enterprise data mashups”. This turns out to be a whole industry (maybe you already knew this—I admit that I need to get out more). See blog posts by Dion Hinchcliffe here and here for more than I’ll ever know about the topic. What seems to distinguish enterprise mashups from the more familiar widget-based Web mashups is that the enterprise versions let developers take data from sources they choose, rather than sources that have already been formatted for use.

Since Denodo is the only mashup software I’ve examined, I can’t compare it with its competitors. But I was quite impressed with what Denodo showed me. Basically their approach is to build specialized connectors, called “wrappers,” that (a) extract specified information from databases, Web sites and unstructured text sources, (b) put it into a queryable structure, and (c) publish it to other applications in whatever format is needed. Each of these is easier said than done.

Denodo showed me how it would build a wrapper to access competitive data exposed on a Web site—in this case, mobile phone rate plans. This was a matter of manually accessing the competitor’s site, entering the necessary parameter (a Zip code), and highlighting the result. Denodo recorded this process, read the source code of the underlying Web page, and developed appropriate code to repeat the steps automatically. This code was embedded in a template that handled the rest of the process (restructuring the data and exposing it). According to Denodo, the wrapper can automatically adjust itself if the target Web page changes: this is a major advantage, since the wrappers might otherwise break constantly. If the Web page changes more than Denodo can handle, it will alert the user.
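
I can’t show the code Denodo generates, of course, but the basic shape of such a wrapper is easy to sketch. Here is a rough Python illustration of a parameterized scraper; the URL, page structure and libraries (requests, BeautifulSoup) are my own stand-ins, not anything Denodo actually produces:

```python
# A toy "wrapper": fetch a competitor page for a given ZIP code and pull
# out the rate-plan rows. Purely illustrative; Denodo builds this logic
# automatically by recording a user's navigation.
import requests
from bs4 import BeautifulSoup

def rate_plan_wrapper(zip_code):
    # Hypothetical URL and page layout -- not a real endpoint.
    url = "https://www.example.com/plans"
    resp = requests.get(url, params={"zip": zip_code}, timeout=30)
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")
    plans = []
    for row in soup.select("table.plans tr")[1:]:   # skip the header row
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 3:
            plans.append({"plan": cells[0], "minutes": cells[1], "price": cells[2]})
    return plans   # a queryable structure: one dict per rate plan

if __name__ == "__main__":
    for plan in rate_plan_wrapper("10001"):
        print(plan)
```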

As I mentioned, Denodo will place the data it retrieves into a queryable format—essentially, an in-memory database table. It could also copy the data into a physical database if desired, although this is an exception. The data can be sorted and otherwise manipulated, and joined with data from other wrappers using normal database queries. Results can be posted back to the original sources or be presented to external systems in pretty much any format or interface: HTML, XML, CSV, ODBC, JDBC, HTTP, Web service, and the rest of the usual alphabet soup. Denodo can join data using inexact as well as exact matches, allowing it to overcome common differences in spelling and format.
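
To make the “join with inexact matches” idea concrete, here is a toy Python sketch that pairs rows from two imaginary wrappers despite spelling differences. The data is invented and difflib merely stands in for whatever matching logic Denodo really uses:

```python
# Join rows from two "wrappers" on company name, tolerating differences
# in spelling and format. difflib is a stand-in for Denodo's matching.
from difflib import SequenceMatcher

crm_rows = [{"company": "Acme Corp.", "owner": "J. Smith"},
            {"company": "Globex Inc", "owner": "P. Jones"}]
web_rows = [{"company": "ACME Corporation", "plan_price": "$39.99"},
            {"company": "Globex, Inc.", "plan_price": "$49.99"}]

def normalize(name):
    return "".join(ch for ch in name.lower() if ch.isalnum())

def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Inexact join: pair each CRM row with its best-matching web row,
# provided the match clears a similarity threshold.
for crm in crm_rows:
    best = max(web_rows, key=lambda w: similarity(crm["company"], w["company"]))
    if similarity(crm["company"], best["company"]) > 0.6:
        print(crm["company"], "->", best["company"], best["plan_price"])
```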

The technicians among you may find this terribly exciting, but to most marketers it is pure gobbledygook. What really matters to them is the applications Denodo makes possible. The company cites several major areas, including gathering business and competitive intelligence; merging customer data across systems; and integrating business processes with Web sites.

Some of these applications resemble the integration-enabled interaction management offered by eglue (click here for my post). The difference is Denodo’s greater ability to access external data sources, and what I believe is a significantly more sophisticated approach to data extraction. On the other hand, eglue offers richer features for presenting information to call center agents. It does appear that Denodo significantly lowers the barriers to many kinds of data integration, which should open up all sorts of new possibilities.

The price seems reasonable, given the productivity benefits that Denodo should provide: $27,000 to $150,000 per CPU based on the number of data sources and other application details. An initial application can usually be developed in about two weeks.

Denodo was founded in Spain in 1999. The company has recently expanded outside of Europe and now has nearly 100 customers worldwide.

Thursday, May 22, 2008

For Behavior Detection, Simple Triggers May Do the Trick

I was in the middle of writing last week’s post, on marketing systems that react to customers’ Web behavior, when I got a phone call from a friend at a marketing services agency who excitedly described his firm’s success with exactly such programs. Mostly this confirmed my belief that these programs are increasingly important. But it also prompted me to rethink the role of predictive modeling in these projects.

To back up just a bit, behavioral targeting is a hot topic right now in the world of Web marketing. It usually refers to systems that use customer behavior to predict which offers a visitor will find most attractive. By displaying the right offer for each person, rather than showing the same thing to everyone, average response rates can be increased significantly.

This type of behavioral targeting relies heavily on automated models that find correlations between a relatively small amount of data and subsequent choices. Vendors like Certona and [X+1] tell me they can usually make valuable distinctions among visitors after as few as a half-dozen clicks.
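
The vendors’ models are proprietary, but the general pattern (fit a model that maps a few early-session features to observed offer responses, then score new visitors) looks something like this sketch; the features and training data are invented:

```python
# Toy version of click-based targeting: predict whether a visitor will
# respond to an offer from a few early-session features. The features
# and training data are invented for illustration only.
from sklearn.linear_model import LogisticRegression

# Each row: [clicks so far, seconds on site, viewed pricing page (0/1)]
X = [[2, 30, 0], [5, 200, 1], [3, 45, 0], [6, 320, 1],
     [1, 10, 0], [7, 400, 1], [4, 150, 1], [2, 25, 0]]
y = [0, 1, 0, 1, 0, 1, 1, 0]   # 1 = responded to the offer shown

model = LogisticRegression().fit(X, y)

# Score a new visitor after a half-dozen clicks.
new_visitor = [[6, 280, 1]]
print("response probability:", model.predict_proba(new_visitor)[0][1])
```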

At the risk of stating the obvious, this works because the system is able to track the results of making different offers. But this simple condition is not always met. The type of behavior tracking I wrote about last week—seeing which pages a visitor selected, what information they downloaded, how long they spent in different areas of the site, how often they returned, and so on—often relates to large, considered purchases. The sales cycle for these extends over many interactions as the customer educates herself, gets others involved for their opinions and approvals, speaks with sales people, and moves slowly towards a decision. A single Web visit rarely results in an offer that is rejected or accepted on the spot. Without a set of outcomes—that is, a list of offers that were accepted or rejected—predictive modeling systems don’t have anything to predict.

If your goal is to find a way to do predictive modeling, there are a couple of ways around this. One is to tie together the string of interactions and link it to the customer’s ultimate purchase decision. This can be used to estimate the value of a lead in a lead scoring system. Another is to make intermediate offers during each interaction for “products” such as white papers and contact with a sales person. These could be made through display ads on the Web site or something more direct like an email or phone call. The result is to give the modeling system something to predict. You have to be careful, of course, to check the impact of these offers on the customer’s ultimate purchase behavior: a phone call or email might annoy people (not to mention reminding them that you are watching). Information such as comparisons with competitors may be popular but could lead customers to delay their decision or even purchase something else.
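
Here is a bare-bones sketch of the first approach: roll each prospect’s string of interactions up into one record and attach the ultimate purchase outcome, so a model or lead score has something to work with. The event names, weights and outcomes are all invented:

```python
# Tie each prospect's string of interactions to their ultimate outcome,
# producing training examples (or a crude lead score). Event names and
# weights are invented for illustration.
interactions = [
    {"prospect": "A", "event": "whitepaper_download"},
    {"prospect": "A", "event": "pricing_page_view"},
    {"prospect": "A", "event": "sales_call"},
    {"prospect": "B", "event": "whitepaper_download"},
]
outcomes = {"A": "purchased", "B": "no_purchase"}
weights = {"whitepaper_download": 5, "pricing_page_view": 10, "sales_call": 20}

# Roll the interaction string up to one row per prospect.
profiles = {}
for hit in interactions:
    p = profiles.setdefault(hit["prospect"], {"score": 0, "events": []})
    p["score"] += weights.get(hit["event"], 1)
    p["events"].append(hit["event"])

for prospect, profile in profiles.items():
    print(prospect, profile["score"], outcomes[prospect])
```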

Of course, predictive modeling is not an end in itself, unless you happen to sell predictive modeling software. The business issue is how to make the best use of the information about detailed Web (and other) behaviors. This information can signal something important about a customer even if it doesn’t include response to an explicit offer.

As I wrote last week, one approach to exploiting this information is to let salespeople review it and decide how to react. This is expensive but makes sense where a small number of customers to monitor has been identified in advance. Where manual review is not feasible, behavior detection software including SAS Interaction Management, Unica Affinium Detect, Fair Isaac OfferPoint, Harte-Hanks Allink Agent, Eventricity and ASA Customer Opportunity Advisor can scan huge volumes of information for significant patterns. These systems can then either react automatically or alert a sales person to take a closer look.

The behavior detection systems monitor complex patterns over multiple interactions. These are usually defined in advance through sophisticated manual and statistical analysis. But trigger events can also be as basic as an abandoned shopping cart or a search for pricing information. These can be identified intuitively, defined in simple rules and captured with standard technology. What’s important is not that sophisticated analytics can uncover subtle relationships, but that access to detailed data exposes behavior that was previously hidden. This is what my friend on the phone found so exciting—it was like finding gold nuggets lying on the ground: all you had to do was look.
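
To show just how basic these triggers can be, here is a sketch of the two examples above expressed as simple rules; the event names are invented:

```python
# Two basic triggers of the kind described above: an abandoned cart and
# a visit to the pricing page. The event structure is invented.
def fired_triggers(session_events):
    triggers = []
    events = {e["type"] for e in session_events}
    if "add_to_cart" in events and "checkout_complete" not in events:
        triggers.append("abandoned_cart")
    if "pricing_page_view" in events:
        triggers.append("pricing_interest")
    return triggers

session = [{"type": "product_view"}, {"type": "add_to_cart"},
           {"type": "pricing_page_view"}]
print(fired_triggers(session))   # ['abandoned_cart', 'pricing_interest']
```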

That said, even simple behavior-based triggers need some technical support. A good marketer can easily think of triggers to consider: in fact, a good marketer can easily think of many more triggers than it’s practical to exploit. So a testing process, and a system to support it, is needed to determine which triggers are actually worth deploying. This involves setting up the triggers, reacting when they fire, and measuring the short- and long-term results. The process can never be fully automated because the trigger definitions themselves will come from humans who perceive new opportunities. But it should be as automated as possible so the company can test new ideas as conditions change over time.
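
The measurement side need not be fancy either: hold out a control group for each trigger, react to the rest, and compare results. A sketch with invented numbers:

```python
# Measure whether reacting to a trigger is worth it: randomly hold out a
# control group and compare conversion rates. All data is invented.
import random

random.seed(1)
fired = [f"customer_{i}" for i in range(1000)]   # customers who fired the trigger

treated, control = [], []
for cust in fired:
    (treated if random.random() < 0.8 else control).append(cust)

# ... send the follow-up offer to `treated`, do nothing for `control` ...

def conversion_rate(group, converted):
    return len(converted & set(group)) / len(group)

# Pretend outcomes, observed some weeks later:
converted = set(random.sample(treated, 60)) | set(random.sample(control, 9))
print("treated:", round(conversion_rate(treated, converted), 3))
print("control:", round(conversion_rate(control, converted), 3))
```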

Fortunately, the technical requirements for this sort of testing and execution are largely the same as the requirements for other types of marketing execution. This means that any good customer management system should already meet them. (Another way to look at it: if your customer management system can’t support this, you probably need a new one anyway.)

So my point, for once, is not that some cool new technology can make you rich. It’s that you can do cool new things with your existing technology that can make you rich. All you have to do is look.

Thursday, May 15, 2008

Demand Generation Systems Shift Focus to Tracking Behavior

Over the past few months, I’ve had conversations with “demand generation” software vendors including Eloqua, Vtrenz and Manticore, and been on the receiving end of a drip marketing stream from yet another (Moonray Marketing, lately renamed OfficeAutoPilot).

What struck me was that each vendor stressed its ability to give a detailed view of prospects’ activities on the company Web site (pages visited, downloads requested, time spent, etc.). The (true) claim is that this information gives significant insight into the prospect’s state of mind: the exact issues that concern them, their current degree of interest, and which people at the prospect company are involved. Of course, the Web information is combined with conventional contact history such as emails sent and call notes to give a complete view of the customer’s situation.
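
Stripped of the vendor packaging, that “detailed view” is essentially a per-prospect roll-up of page-level events. A rough sketch of what such a summary might look like, with invented field names:

```python
# Roll raw page-level events into a per-prospect activity summary of the
# kind demand generation systems present to sales. Fields are invented.
from collections import defaultdict

events = [
    {"prospect": "jane@example.com", "page": "/pricing", "seconds": 120},
    {"prospect": "jane@example.com", "page": "/whitepaper.pdf", "seconds": 15},
    {"prospect": "bob@example.com",  "page": "/home", "seconds": 30},
]

profiles = defaultdict(lambda: {"page_views": 0, "total_seconds": 0, "downloads": 0})
for e in events:
    p = profiles[e["prospect"]]
    p["page_views"] += 1
    p["total_seconds"] += e["seconds"]
    if e["page"].endswith(".pdf"):
        p["downloads"] += 1

for prospect, summary in profiles.items():
    print(prospect, summary)
```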

Even though I’ve long known it was technically possible for companies to track my visits in such detail, I’ll admit I still find it a bit spooky. It just doesn’t seem quite sporting of them to record what I’m doing if I haven’t voluntarily identified myself by registering or logging in. But I suppose it’s not a real privacy violation. I also know that if this really bothered me, I could remove cookies on a regular basis and foil much of the tracking.

Lest I comfort myself that my personal behavior is more private, another conversation with the marketing software people at SAS reminded me that they use the excellent Web behavior tracking technology of UK-based Speed-Trap to similarly monitor consumer activities. (I originally wrote about the SAS offering, called Customer Experience Analytics, when it was launched in the UK in February 2007. It is now being offered elsewhere.) Like the demand generation systems, SAS and Speed-Trap can record anonymous visits and later connect them to personal profiles once the user is identified.
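
The trick of connecting anonymous visits to a profile later usually hinges on a persistent cookie ID. Here is a minimal sketch of the idea; the data structures are my own invention, not how SAS or Speed-Trap actually store things:

```python
# Once an anonymous visitor identifies herself (say, by registering), her
# earlier sessions under the same cookie ID are attached to her profile.
# Purely illustrative.
sessions = [
    {"cookie": "abc123", "pages": ["/home", "/pricing"]},                      # anonymous
    {"cookie": "abc123", "pages": ["/register"], "email": "jane@example.com"}, # identified
]

cookie_to_email = {}   # identity links learned from registrations
for s in sessions:
    if "email" in s:
        cookie_to_email[s["cookie"]] = s["email"]

profiles = {}          # email -> list of page trails
for s in sessions:
    email = cookie_to_email.get(s["cookie"])
    if email:
        profiles.setdefault(email, []).append(s["pages"])

print(profiles)   # Jane's profile now includes her earlier anonymous visit
```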

Detailed tracking of individual behavior is quite different from traditional Web analytics, which are concerned with mass statistics—which pages are viewed most often, which paths most customers follow, and which offers yield the highest response. Although the underlying technology is similar, the focus on individuals supports highly personalized marketing.

In fact, the ability of these systems to track individual behavior is what links their activity monitoring features to what I have previously considered the central feature of the demand generation systems: the ability to manage automated, multi-step contact streams. This is still a major selling point and vendors continue to make such streams more powerful and easier to use. But it no longer seems to be the focus of their presentations.

Perhaps contact streams are no longer a point of differentiation simply because so many products now have them in a reasonably mature form. But I suspect the shift reflects something more fundamental. I believe that marketers now recognize, perhaps only intuitively, that the amount of detailed, near-immediate information now available about individual customers substantially changes their business. Specifically, it makes possible more effective treatments than a small number of conventional contact streams can provide.

Conventional contact streams are relatively difficult to design, deploy and maintain. As a result, they are typically limited to a small number of key decisions. The greater volume of information now available implies a much larger number of possible decisions, so a new approach is needed.

This will still use decision rules to react as events occur. But the rules will make more subtle distinctions among events, based on the details of the events themselves and the context provided by surrounding events. This may eventually involve advanced analytics to uncover subtle relationships among events and behaviors, and to calculate the optimal response in each situation. However, those analytics are not yet in place. Until they are, human decision-makers will do a better job of integrating the relevant information and finding the best response. This is why the transformation has started with demand generation systems, which are used primarily in business-to-business situations where sales people personally manage individual customer relationships.
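
What might those more subtle distinctions look like in rule form? Something like this sketch, where the same event gets a different treatment depending on the surrounding events; everything here is invented for illustration:

```python
# A decision rule that looks beyond the single triggering event to the
# surrounding context: same event, different treatment. Invented events.
def choose_treatment(event, recent_events):
    recent_types = {e["type"] for e in recent_events}
    if event["type"] == "pricing_page_view":
        if "competitor_comparison_view" in recent_types:
            return "alert_sales_rep"          # late-stage comparison shopping
        if len(recent_events) <= 2:
            return "send_intro_whitepaper"    # early-stage browsing
    return "no_action"

history = [{"type": "home_view"}, {"type": "competitor_comparison_view"}]
print(choose_treatment({"type": "pricing_page_view"}, history))
```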

Over time, the focus of these systems will shift from simply capturing information and presenting it to humans, to reacting to that information automatically. The transition may be nearly imperceptible since it will employ technologies that already exist, such as recommendation engines and interaction management systems. These will take over an increasing portion of the treatment decisions as the quality of the decisions they can make gradually improves. Only when we compare today’s systems with those in place several years from now will we see how radically the situation has changed.

But the path is already clear. As increasing amounts of useful information become accessible, marketers will find tools to take advantage of it. Today, the volume is overwhelming, like oil gushing into the air from a newly drilled well. Eventually marketers will cap that well and use its stream of information invisibly but even more effectively—not wasting a single precious drop.

Thursday, May 08, 2008

Infobright Puts a Clever Twist on the Columnar Database

It took me some time to form a clear picture of analytical database vendor Infobright, despite an excellent white paper that seems to have since vanished from their Web site. [Note: Per Susan Davis' comment below, they have since reloaded it here.] Infobright’s product, named BrightHouse, confused me because it is a SQL-compatible, columnar database, which makes it sound similar to systems like Vertica and ParAccel (click here for my ParAccel entry).

But it turns out there is a critical difference: while those other products rely on massively parallel (MPP) hardware for scalability and performance, BrightHouse runs on conventional (SMP) servers. The system gains its performance edge by breaking each database column into chunks of 65,536 values, called “data packs”, and reading relatively few packs to resolve most queries.

The trick is that BrightHouse stores descriptive information about each data pack and can often use this information to avoid loading the pack itself. For example, the descriptive information holds minimum and maximum values of data within the pack, plus summary data such as totals. This means that a query involving a certain value range may determine that all or none of the records within a pack qualify. If all values are out of range, the pack can be ignored; if all values are in range, the summary data may suffice. Only when some but not all of the records within a pack are relevant must the pack itself be loaded from disk and decompressed. According to CEO Miriam Tuerk, this approach can reduce data transfers by up to 90%. The data is also highly compressed when loaded into the packs—by ratios as high as 50:1, although 10:1 is average. This reduces hardware costs and yields even faster disk reads. By contrast, data in MPP columnar systems often takes up as much storage space as the source files, or more.
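
Infobright’s knowledge grid is far more sophisticated than anything I could reproduce here, but the core pack-pruning idea can be sketched in a few lines of Python. The pack size and data are toy values (real packs hold 65,536 values):

```python
# Core idea of pack pruning: keep min/max/count per pack and answer a
# range query while reading only the packs that partially qualify.
# A simplified stand-in for Infobright's knowledge grid.
PACK_SIZE = 4   # real data packs hold 65,536 values

values = [3, 7, 8, 2, 15, 18, 19, 16, 40, 45, 41, 44]
packs = [values[i:i + PACK_SIZE] for i in range(0, len(values), PACK_SIZE)]
pack_stats = [{"min": min(p), "max": max(p), "count": len(p)} for p in packs]

def count_in_range(lo, hi):
    total, packs_read = 0, 0
    for stats, pack in zip(pack_stats, packs):
        if stats["max"] < lo or stats["min"] > hi:
            continue                      # irrelevant pack: skip it entirely
        if lo <= stats["min"] and stats["max"] <= hi:
            total += stats["count"]       # fully qualified: summary suffices
        else:
            packs_read += 1               # partially qualified: must read it
            total += sum(1 for v in pack if lo <= v <= hi)
    return total, packs_read

print(count_in_range(10, 20))   # (4, 0): answered without reading any pack
```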

This design is substantially more efficient than conventional columnar systems, which read every record in a given column to resolve queries involving that column. The small size of the BrightHouse data packs means that many packs will be entirely included in or excluded from queries even without their contents being sorted when the data is loaded. This lack of sorting, along with the lack of indexing or data hashing, yields load rates of up to 250 GB per hour. This is impressive for an SMP system, although MPP systems are faster.

You may wonder what happens to BrightHouse when queries require joins across tables. It turns out that even in these cases, the system can use its summary data to exclude many data packs. In addition, the system watches queries as they execute and builds a record of which data packs are related to other data packs. Subsequent queries can use this information to avoid opening data packs unnecessarily. The system thus gains a performance advantage without requiring a single, predefined join path between tables—something that is present in some other columnar systems, though not all of them. The net result of all this is great flexibility: users can load data from existing source systems without restructuring it, and still get excellent analytical performance.

BrightHouse uses the open source MySQL database interface, allowing it to connect with any data source that is accessible to MySQL. According to Tuerk, it is the only version of MySQL that scales beyond 500 GB. Its scalability is still limited, however, to 30 to 50 TB of source data, which would be a handful of terabytes once compressed. The system runs on any Red Hat Linux 5 server—for example, a 1 TB installation runs on a $22,000 Dell. A Windows version is planned for later this year. The software itself costs $30,000 per terabyte of source data (one-time license plus annual maintenance), which puts it towards the low end of other analytical systems.

Infobright was founded in 2005, although development of the BrightHouse engine began earlier. Several production systems were in place by 2007. The system was officially launched in early 2008 and now has about a dozen production customers.

Friday, May 02, 2008

Trust Me: Buyers Worry Too Much About Software Costs

I ranted a bit the other week about buyers who focus too much on software license fees and not enough on differences in productivity. The key to that argument is that software costs are a relatively small portion of companies’ total investment in a business intelligence system. This is self-evident to me, based on personal experience, and seems fairly widely accepted by others in the field. But it’s always nice to have hard numbers to cite as proof.

In search of that, I poked around a bit and found several references to a 2007 study from AMR Research. The study itself, Market Demand for Business Intelligence and Performance Management (BI/PM), 2007, will cost you $4,000. But statistics from it were quoted elsewhere and showed the following distribution of estimated spending for 2007:

31.4%...internal labor costs
24.7%...software costs
16.2%...integration costs
15.4%...hardware costs
12.3%...outsourced services

The most relevant tidbit is that software accounts for just one quarter of total costs, while labor and outside services combined account for over 40%. I’m not sure what counts as “integration” but suspect that is mostly labor as well, which would raise the total to nearly 60%. This confirms my feeling that people should focus on more than software costs.
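
For the record, the arithmetic behind those statements:

```python
# Quick check of the arithmetic above, using the AMR percentages.
labor, outsourced, integration, software = 31.4, 12.3, 16.2, 24.7
print(round(labor + outsourced, 1))                # 43.7 -> "over 40%"
print(round(labor + outsourced + integration, 1))  # 59.9 -> "nearly 60%"
print(software)                                    # 24.7 -> about one quarter
```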

A February 2008 study from Aberdeen Group, Managing the TCO of Business Intelligence (payment required), addresses the question of whether buyers do in fact focus primarily on software costs.

The short answer is yes. The long answer is it’s often hard to interpret Aberdeen findings. But when they asked buyers which direct cost criteria were ranked as critical at the time of purchase, the highest rating was indeed software license cost. Looking just at what Aberdeen calls “best-in-class” companies, the figures were:

42%...software license cost
37%...implementation consulting costs
15%...user training services offered
10%...additional hardware costs

Aberdeen also reported priority rankings for indirect costs and ongoing costs. Software fees don’t play much of a role in those areas, but, even so, people still didn’t focus on labor. Instead, the top priorities were “ease of use for end users” for indirect costs and “scalability of data volumes and users” for ongoing costs. Ease of development and ongoing support costs did finally show up as the second and third ranked items under ongoing costs, but that’s digging pretty deep.

Of course, you can’t really combine these two sets of figures. But if you could, you might argue that they show a clear skew in buyer priorities: 42% give highest priority to software costs even though these account for just 25% of expenses. Even though the numbers don’t really prove it, I’d say that’s very close to the truth.