Thursday 9 December 2010

expressor studio - new version that is free to download

For those of you who haven't heard, expressor software have released a new version of their semantic data integration software.

The new studio version is free to download and features a new GUI and a revamped semantic engine that will make the product much easier to use.

To download the beta version as a taster of what is to come, go to expressor-community.com/expressor-studio-download


edexe partners with Talend

We're pleased to announce our new partnership with Talend (www.talend.com).

Talend is the recognized market leader in open source data management & application integration. Talend revolutionized the world of data integration when it released the first version of Talend Open Studio in 2006.

Talend’s data management solution portfolio now includes data integration (operational data integration and ETL for Business Intelligence), data quality, and master data management (MDM). Through the acquisition of Sopera in 2010, Talend also became a key player in application integration.

Unlike proprietary, closed solutions, which can only be afforded by the largest and wealthiest organizations, Talend makes middleware solutions available to organizations of all sizes, for all integration needs.



Tuesday 15 June 2010

Interesting DI survey

I'm fascinated by this survey on LinkedIn:

What is the #1 obstacle to successful deployment of ETL/data integration software? - http://bit.ly/ayVt86

It's only a small sample so far; however, the overwhelming view is that non-product-related problems are the main cause of failed data integration deployments. It’s a view I’ve always held, and it seems that many others share my opinion.

Follow edexe on Twitter

edexe is now on Twitter - http://twitter.com/edexe_ltd

edexe's new YouTube channel

edexe now has a YouTube channel - http://www.youtube.com/user/edexeDM.

Our first video compares data prototyping with the traditional waterfall development methodology and shows how prototyping can help reduce the risk of data projects overrunning.

Monday 17 May 2010

Migration and Integration

Last week I attended the Data Migration Matters conference (http://datamigrationmatters.co.uk/) in London. What I learned was that while there are differences in approach between integration and migration, there are also common factors, two of which I will cover in this post.

The first is customer involvement. The data in an organisation is utilised by the business, defined by the business and ultimately informs business decisions, so any project, IT or otherwise, that changes data ultimately needs business buy-in and involvement.

The second is understanding the data. Many projects have failed because of incorrect assumptions and gaps in knowledge that only manifest themselves when a project is in full flight. It is imperative that the source data is mapped out and understood prior to coding.

In some ways these two requirements go hand in hand. To understand and make sense of the data, you need the business to add their experience of using it to the mix. To involve the business, you have to be able to deliver technical information in such a way that it becomes possible to interpret the data in a non-technical way.

This is where data profiling and quality tools come into their own. These tools analyse the source data and present the user with a high-level view of it, enabling the user to see patterns, statistics and relationships at both file and field level.

Profiling information is often a revelation for business users. Operationally the data works, so it is deemed fit for purpose; however, when profiled it is not uncommon to see genuine problems with the data, such as duplicate records, missing fields and records, and plain incorrect values. The ability to drill down to the actual data is also imperative, in order to show that the statistics marry up to real data on real systems within the organisation.
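As a rough illustration of the kind of output a profiling exercise produces, here is a minimal sketch in Python/pandas rather than a dedicated profiling tool; the "contacts.csv" file and the "email" column are hypothetical examples.

```python
# A minimal data-profiling sketch, assuming a hypothetical "contacts.csv".
# Real profiling tools do far more, but the idea is the same: summarise
# each field and surface duplicates so the business can see the data
# "warts and all".
import pandas as pd

df = pd.read_csv("contacts.csv", dtype=str)  # hypothetical source file

# Field-level statistics: how complete and how varied is each column?
profile = pd.DataFrame({
    "non_null": df.notna().sum(),
    "null": df.isna().sum(),
    "distinct": df.nunique(),
    "most_common": df.mode().iloc[0],
})
print(profile)

# File-level checks: exact duplicate rows and duplicated key values.
print("duplicate rows:", df.duplicated().sum())
if "email" in df.columns:  # hypothetical key field
    dupes = df[df.duplicated(subset="email", keep=False)]
    print(dupes.sort_values("email"))  # drill down to the actual records
```

Dedicated tools go much further (pattern analysis, cross-file relationships and so on), but even this level of summary is usually enough to start the "warts and all" conversation with the business.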

It is often at this point, when the illusion of “perfect data” evaporates, that the business buys into the project and begins to understand why the ownership of the data and the associated business rules fall squarely within their domain. It is surprising how showing people their own data “warts and all” can have a profound effect on their involvement in a project.

How often have we heard the phrase “if it isn’t broken, don’t fix it”? Many users believe their data isn’t broken, so it is perhaps hard for them to understand why IT makes such a fuss during a data migration.

The truth is that for many organisations their data is, to varying degrees, somewhere between broken and fixed, and it is only when it is utilised en masse, say for reporting or migration, that problems suddenly begin to appear.

Wednesday 28 April 2010

The reducing cost of data warehousing

This post in many ways relates to the mission for edexe: performance, price and productivity.

As a result of recent activities, I am stunned by how much the cost of data warehousing has fallen in the last couple of years.

MPP (massively parallel processing) technologies have fallen in price on both the ETL/DI front and the database front, and database automation tools are becoming more popular as a result.

Competition is hot with regard to MPP in the database/appliance market. Teradata and Netezza have been the dominant players for the last 5-6 years; however, a host of new appliances, cluster-based databases and columnar databases have hit the market in the last 2-3 years. The increased competition from the likes of Oracle Exadata, HP Neoview, Aster, Greenplum, Kognitio, Kickfire and Vertica is rapidly bringing the price of MPP database processing down to sub-£100k for entry-level databases.

This is bringing the MPP database well within the reach of mid-tier companies, enabling enterprise performance on a budget.

On the data integration front, expressor’s parallel processing engine delivers speeds that compare with, or even beat, the most established DI vendors such as Ab Initio, Informatica and DataStage, yet it is priced at sub-£50k for an entry-level DI product. Talend also delivers an MPP option with the MPx version of its Integration Suite.

Again, high-performance DI/ETL products are now available to the mid-tier company.

So I’ve discussed price and performance; what about productivity?

Well, the MPP database products deliver significant productivity benefits over traditional databases such as Oracle and SQL Server. The standard was set by Netezza, delivering MPP performance with minimal DBA activity; even in large organisations, Netezza DBAs may spend only one day a week maintaining the system. Within the MPP space this is pretty much standard, with self-organising databases being the norm.

Between DI tools there is little to choose in terms of productivity; however, compared with SQL or other hand-cranked coding approaches, the productivity gains are huge (anywhere from a 50-80% reduction in coding time).

Staying with productivity, one tool that has really impressed me is BIReady. It is a database automation solution that really does deliver on productivity on two main fronts. First, changes to the data model do not necessarily require changes to the data structures, since data is automatically organised in a model-independent normalised schema. Second, key assignments are managed within BIReady, so they need not be maintained by the ETL solution. This is a significant productivity gain: it reduces DBA activity (like the MPP databases), simplifies the ETL process and shortens development times by taking away the need for key management. What’s more, BIReady pricing also fits comfortably into mid-tier budgets.
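To give a feel for the key management work that the ETL layer would otherwise have to do, and which a tool that manages keys for you removes, here is a minimal, hypothetical sketch in Python; the "customer_number" business key and the lookup approach are illustrative assumptions, not BIReady's actual mechanism.

```python
# A minimal sketch of ETL-side surrogate key management, i.e. the work
# that goes away when the warehouse automation tool assigns keys itself.
# The "customer_number" business key is a hypothetical example.
surrogate_keys: dict[str, int] = {}   # business key -> surrogate key
next_key = 1

def assign_key(business_key: str) -> int:
    """Return the existing surrogate key for a business key, or mint a new one."""
    global next_key
    if business_key not in surrogate_keys:
        surrogate_keys[business_key] = next_key
        next_key += 1
    return surrogate_keys[business_key]

# During the load, every incoming record is stamped with its surrogate key.
for record in [{"customer_number": "C001"}, {"customer_number": "C002"},
               {"customer_number": "C001"}]:
    record["customer_sk"] = assign_key(record["customer_number"])
    print(record)
```

When the automation layer assigns and tracks these keys itself, the ETL mappings shrink and there is one less lookup structure for the developer and DBA to look after.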

So there we have it: price, performance and productivity. It is now possible to purchase a low-maintenance, high-performance, end-to-end MPP warehousing technology for sub-£300k. The nature of this beast also means that the effort to deliver and maintain the solution is reduced.

High-performance data warehousing is finally within reach of mid-market companies.

Wednesday 10 March 2010

Physician heal thyself

Wow, it's been a busy few weeks, hence the lack of a second blog, but at last here it is!

Starting a new business gives one the opportunity to build processes and procedures from the ground up: a chance to start with a clean sheet and bring prior experience to bear. That is not always easy when time is short, as this cautionary tale will illustrate.

One of our current projects is to tidy up our customer data. How easy that sounded when I started. So I set up a Salesforce account, exported Outlook contact lists into CSV files, grabbed a handful of old contact lists (MS Excel) and set about using Excel to standardise them. Since they were already in Excel, I thought it would be easier to use Excel than anything else. It should just be a matter of moving columns around. How hard could that be?

Wrong!!!!

In short, I ended up with endless printouts, lots of half-finished files, multiple backups, more duplication and an even bigger headache from staring at lists and lists of contact data for hours on end.

I also became very nervous about quality. What if I copied and pasted into the wrong field? What if I accidentally deleted a column and couldn’t get it back? This could prove very embarrassing should I mix up customer information.

It was then that the title of this post came to mind. Here I was, a long-term data integration professional with access to a data integration tool (expressor), messing about with MS Excel. Why, why, why?

The answer is simple. Like the titular physician, I ignored my own advice, all because of the allure of the "quick fix". I know from experience that hand-cranking in Excel doesn't work, but I was seduced by the apparent simplicity of the approach. More fool me.

So, after a day or so of Excel torture, I bit the bullet. First, I defined the target fields and created a master customer format in expressor. I then did the same for the source fields and, lo and behold, I was suddenly back in control. All I then needed was a simple transformation drawing to copy the source data to the target format.
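For anyone curious what that "master format plus mapping" approach looks like outside of expressor, here is a minimal sketch of the same idea in Python/pandas; the file names, column names and mappings are hypothetical examples, not the actual expressor artefacts.

```python
# A minimal sketch of the "master customer format" approach in
# Python/pandas (the actual work was done in expressor). File names,
# column names and mappings are hypothetical examples.
import pandas as pd

# The master format: one agreed set of target fields.
MASTER_COLUMNS = ["first_name", "last_name", "company", "email", "phone"]

# One column mapping per source list: source column -> master column.
OUTLOOK_MAP = {"First Name": "first_name", "Last Name": "last_name",
               "Company": "company", "E-mail Address": "email",
               "Business Phone": "phone"}

def standardise(path: str, mapping: dict) -> pd.DataFrame:
    """Read a source list and reshape it into the master format."""
    src = pd.read_csv(path, dtype=str)
    out = src.rename(columns=mapping)
    return out.reindex(columns=MASTER_COLUMNS)  # missing fields become blank

# Repeatable process: standardise each list, then combine and de-duplicate.
master = pd.concat([standardise("outlook_contacts.csv", OUTLOOK_MAP)],
                   ignore_index=True).drop_duplicates(subset="email")
master.to_csv("master_customers.csv", index=False)
```

The point is not the language but the shape of the solution: one agreed target format, one mapping per source, and a rerunnable process instead of manual copy and paste.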

We now have a master format for all our customer data, a master customer file and a repeatable process for rebuilding that file. It also means there is no need to alter the source files, deliberately or accidentally, so no more worries about losing data.

What is also great is that this is not a one-shot deal. Should we feel the need to enrich the master further, we can do so simply by extending the master format and copying in additional source data. Not an easy thing to do with Excel!

What's more, each list now takes only 15-20 minutes to standardise into the master format, which means that should we receive new lists, we can easily apply them to our master customer list.

Voila!

I suppose the moral here is to take your own advice and use the right tool for the job. When time is tight, it is even more important to choose a reliable and proven approach and to resist the temptation to use one-off hacks.

Oh, and I also learned beyond a shadow of a doubt that MS Excel is NOT a reliable data integration tool. I will not weaken again!

Wednesday 10 February 2010

A new dawn

Hello, and welcome to the shiny new edexe blog.

Edexe was recently formed to bring new and exciting data integration and management tools and services to a whole new audience (and some of the old audience too!).


As the new decade dawns, edexe are looking forward to a new dawn in data management. Performant, affordable and smart tools combined with deep experience and strong processes mean that we can genuinely help you to do more for less.


We'll be covering a variety of topics, including data integration and ETL, data warehousing, data appliances, database automation and migration, as a starter for 10.

We hope you enjoy our blog and look forward to hearing from you.

Also, why not check out our website at www.edexe.com to find out more?

Rick Barton - Owner and director