Thursday 9 December 2010
expressor studio - new version available as a free download
edexe partners with Talend
Talend’s data management solution portfolio now includes data integration (operational data integration and ETL for Business Intelligence), data quality, and master data management (MDM). Through its 2010 acquisition of Sopera, Talend has also become a key player in application integration.
Unlike proprietary, closed solutions, which only the largest and wealthiest organizations can afford, Talend makes middleware solutions available to organizations of all sizes, for all integration needs.
Tuesday 15 June 2010
Interesting DI survey
I'm fascinated by this survey on LinkedIn:
What is the #1 obstacle to successful deployment of ETL/data integration software? - http://bit.ly/ayVt86
It's only a small sample thus far, but the overwhelming view is that non-product-related problems are the main downfall of data integration deployments. It’s a view I’ve always held, and it seems many others share it.
edexe's new YouTube channel
Monday 17 May 2010
Migration and Integration
Last week I attended the Data Migration Matters conference (http://datamigrationmatters.co.uk/) in London. What I learned is that while integration and migration differ in approach, they share common factors, two of which I will cover in this post.
The first is customer involvement. The data in an organisation is used by the business, defined by the business and ultimately informs business decisions, so any project, IT or otherwise, that needs to change data ultimately needs business buy-in and involvement.
The second is understanding the data. Many projects have failed because of incorrect assumptions and gaps in knowledge that only manifest themselves when a project is in full flight. It is imperative that the source data is mapped out and understood prior to coding.
In some ways these two requirements go hand in hand. To understand and make sense of the data, you need the business to add their experience of using it to the mix. To involve the business, you have to be able to deliver technical information in such a way that it becomes possible to interpret the data in a non-technical way.
This is where data profiling and quality tools come into their own. These tools analyse the source data and present the user with a high-level view of it, revealing patterns, statistics and relationships at both file and field level.
Profiling information is often a revelation for business users. Operationally the data works, so it is deemed fit for purpose; yet when profiled it is not uncommon to see genuine problems, such as duplicate records, missing fields and records, and often just plain incorrect values. The ability to drill down to the actual data is also imperative, to show that the statistics marry up to real data on real systems within the organisation.
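To make this concrete, here is a minimal sketch in Python of the kind of field-level profiling and drill-down these tools automate. The file name customers.csv and the key column customer_id are hypothetical; a real profiling tool layers cross-field and cross-file relationship analysis on top of this.

```python
import csv
from collections import Counter

def profile(path):
    """Print a field-level profile of a CSV file: null rate, distinct
    count and the most common value 'shapes' for every column."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fields = reader.fieldnames or []
    print(f"{path}: {len(rows)} records")
    for field in fields:
        values = [row[field] or "" for row in rows]
        nulls = sum(1 for v in values if not v.strip())
        # Reduce each value to a pattern: letters -> A, digits -> 9
        shapes = Counter(
            "".join("A" if c.isalpha() else "9" if c.isdigit() else c
                    for c in v)
            for v in values
        )
        print(f"  {field}: {nulls / max(len(rows), 1):.1%} null, "
              f"{len(set(values))} distinct, "
              f"top shapes {shapes.most_common(3)}")

def show_duplicates(path, key_field):
    """Drill down from the statistic to the records themselves: print
    every row whose supposedly unique key occurs more than once."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    counts = Counter(row[key_field] for row in rows)
    for row in rows:
        if counts[row[key_field]] > 1:
            print(row)

profile("customers.csv")                         # hypothetical source file
show_duplicates("customers.csv", "customer_id")  # hypothetical key column
```

Even a toy report like this, shown alongside the actual offending rows, is usually enough to start the "is our data really fit for purpose?" conversation with the business.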
It is often at this point, when the illusion of “perfect data” evaporates, that the business buys into the project and begins to understand why the ownership of the data and the associated business rules fall squarely within their domain. It is surprising how showing people their own data “warts and all” can have a profound effect on their involvement in a project.
How often have we heard the phrase “if it isn’t broken, don’t fix it”? Many users feel their data isn’t broken, so it is perhaps hard for them to understand why IT makes such a fuss during a data migration.
The truth is that for many organisations the data is, to varying degrees, somewhere between broken and fixed, and it is only when it is used en masse, say for reporting or migration, that problems suddenly begin to appear.
Wednesday 28 April 2010
The falling cost of data warehousing
This post in many ways relates to edexe's mission: performance, price and productivity.
As a result of recent activities, I am stunned by how much the cost of data warehousing has fallen in the last couple of years.
MPP (massively parallel processing) technologies have fallen in price on both the ETL/DI front and the database front, and database automation tools are becoming more popular as a result.
Competition is hot in the MPP database/appliance market. Teradata and Netezza have been the dominant players for the last 5-6 years, but a host of new appliances, cluster-based and columnar databases have hit the market in the last 2-3 years. The increased competition from the likes of Oracle Exadata, HP Neoview, Aster, Greenplum, Kognitio, Kickfire and Vertica is rapidly bringing the price of MPP database processing down to sub-£100k for entry-level systems.
This brings the MPP database well within the reach of mid-tier companies, enabling enterprise performance on a budget.
On the data integration front, expressor’s parallel processing engine delivers speeds that compare with, or even beat, the most established DI products such as Ab Initio, Informatica and DataStage, yet remains priced sub-£50k for an entry-level DI product. Talend also delivers an MPP option with the MPx version of its integration suite.
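For readers wondering what a "parallel processing engine" actually does with your data: the core idea is data parallelism, i.e. partition the record stream and run the same transformation over every partition at once. Here is a toy sketch of the principle in Python; the transform itself is a made-up cleansing step, not anything from the products named above.

```python
from multiprocessing import Pool

def transform(record):
    """The per-record work: a made-up cleansing step."""
    return {**record, "name": record["name"].strip().title()}

def run_parallel(records, workers=4):
    """Partition the record stream across worker processes and apply
    the same transform to every partition concurrently."""
    with Pool(workers) as pool:
        return pool.map(transform, records, chunksize=10_000)

if __name__ == "__main__":
    data = [{"name": f"  customer {i}  "} for i in range(200_000)]
    print(run_parallel(data)[:2])
```

The commercial engines do the same thing across many CPUs or nodes, with the partitioning, buffering and recovery handled for you.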
Again, high-performance DI/ETL products are now available to the mid-tier company.
So I’ve discussed price and performance; what about productivity?
Well, the MPP database products deliver significant productivity benefits over traditional warehouse databases such as Oracle and SQL Server. The standard was set by Netezza, which delivers MPP performance with minimal DBA activity; even in large organisations, Netezza DBAs may spend only a day a week maintaining the system. Within the MPP space this is pretty much standard, with self-organising databases being the norm.
Between DI tools there is little to choose in terms of productivity; compared to SQL or other hand-cranked coding approaches, however, the gains are huge (anywhere from a 50-80% reduction in coding time).
Staying with productivity, one tool that has really impressed me is BIReady, a database automation solution that delivers on two main fronts. First, changes to the data model do not necessarily require changes to the data structures, since data is automatically organised in a model-independent normalised schema. Second, key assignments are managed within BIReady, so they need not be maintained by the ETL solution. This is a significant productivity gain: it reduces DBA activity (like the MPP databases), simplifies the ETL process and shortens development times by removing the need for key management. What’s more, BIReady’s pricing also fits comfortably into mid-tier budgets.
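I don't have visibility of BIReady's internals, but as a rough illustration of what automated key assignment takes off the ETL developer's plate, here is a toy surrogate-key manager in Python; every name in it is hypothetical.

```python
class KeyManager:
    """Hands out a stable surrogate key for each business (natural)
    key, so ETL jobs never manage key lookups themselves."""

    def __init__(self):
        self._keys = {}      # natural key -> surrogate key
        self._next_key = 1

    def surrogate_for(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next_key
            self._next_key += 1
        return self._keys[natural_key]

keys = KeyManager()
for customer_id in ["CUST-001", "CUST-002", "CUST-001"]:
    print(customer_id, "->", keys.surrogate_for(customer_id))
# CUST-001 -> 1, CUST-002 -> 2, and the repeated CUST-001 -> 1 again
```

Centralising this lookup (in practice, persisted in the warehouse rather than in memory) is what lets the ETL jobs stay simple: they hand over a natural key and get a consistent surrogate back, load after load.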
So there we have it: price, performance and productivity. It is now possible to purchase a low-maintenance, high-performance, end-to-end MPP warehousing stack for sub-£300k, and the nature of this beast means the effort to deliver and maintain the solution is reduced too.
High-performance data warehousing is finally within reach of mid-market companies.