
Feb 17

In any kind of data access scenario, transforming your extracted data and applying business logic to it is critical for structuring the data into the exact format you need.

In ETL terminology, this is the “T”.

When accessing data from Web-based applications, this capability is especially important, since the data is typically unstructured, and the more structure you can add, the more valuable the data becomes.

Most Web Scraping and Screen Scraping tools on the market today lack adequate transformation capabilities.

Here are some examples of how the Kapow Web Data Server delivers full transformation:

Below is a small extract of a blog list from PW Forum:

[Image: ETL4Web, an extract of a blog list from PW Forum]

On the first line, in blue, you see the timestamp “Today 15:15:13”. This timestamp denotes the date and time when the blog entry was posted, but to be useful for comparison with other blog entries across the Internet, it needs to be transformed into an absolute timestamp such as “2010-02-14 3:15:13 pm Pacific time”.
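As a rough illustration of this kind of transformation (not a description of how the Kapow Web Data Server implements it), here is a minimal Python sketch that turns a relative forum timestamp into an absolute, timezone-aware one. The function name `normalize_forum_timestamp`, the supported words “Today” and “Yesterday”, and the fixed UTC-8 offset are all assumptions made for illustration; a fixed offset ignores daylight saving time, which a fuller version would handle with the standard `zoneinfo` module.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: the forum shows Pacific Standard Time (UTC-8). A fixed offset
# ignores daylight saving time; zoneinfo.ZoneInfo("America/Los_Angeles") would not.
FORUM_TZ = timezone(timedelta(hours=-8), "PST")

# Hypothetical set of relative day words and how many days back each one means.
DAY_OFFSETS = {"Today": 0, "Yesterday": 1}


def normalize_forum_timestamp(raw: str, today: date) -> datetime:
    """Convert text like 'Today 15:15:13' into an absolute, timezone-aware datetime."""
    day_word, clock = raw.split()
    if day_word not in DAY_OFFSETS:
        raise ValueError(f"unrecognized relative day: {day_word!r}")
    day = today - timedelta(days=DAY_OFFSETS[day_word])
    return datetime.combine(day, time.fromisoformat(clock), tzinfo=FORUM_TZ)


# 'today' is the date on which the page was scraped.
stamp = normalize_forum_timestamp("Today 15:15:13", today=date(2010, 2, 14))
print(stamp.isoformat())  # 2010-02-14T15:15:13-08:00
```

Passing the scrape date in explicitly, rather than calling `date.today()` inside the function, keeps the transformation deterministic when pages are processed some time after they were fetched.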

Here’s another example, from eBay. When you bid on an item on eBay, the price of the item is shown in red or green depending on whether or not you are the high bidder. This is important information, “hidden” in the color, that you would like to capture along with the price. Once the data is transformed, you’ll know not only the price but also whether yours is the highest bid. It’s a simple step to define the business logic as “if price is ‘green’ then set status to ‘high bidder’, otherwise set status to ‘not high bidder’.”
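The color rule above can be sketched in a few lines of Python. This assumes the scraper has already captured the price text and the color as a plain name such as `"green"` (a real page might expose a CSS hex value instead); the `BidRecord` type and `transform_bid` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class BidRecord:
    price: Decimal
    status: str


def transform_bid(price_text: str, color: str) -> BidRecord:
    """Apply the business rule: a green price means you are the high bidder."""
    # Assumption: prices look like "US $1,234.50"; strip the currency text and separators.
    cleaned = price_text.replace("US", "").replace("$", "").replace(",", "").strip()
    status = "high bidder" if color.strip().lower() == "green" else "not high bidder"
    return BidRecord(price=Decimal(cleaned), status=status)


print(transform_bid("US $1,234.50", "green"))
# BidRecord(price=Decimal('1234.50'), status='high bidder')
```

Using `Decimal` rather than `float` avoids binary rounding surprises when prices are later summed or compared.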

The Kapow Web Data Server and its powerful visual programming IDE allow you to apply any business logic and data transformation you can think of, giving you the most powerful ETL for the Web product on the market today. And it’s all done visually, with no coding required.

Try it out next time you need Web data for your BI or analytics tools.

By: Stefan Andreasen, CTO and Founder

Jul 21

A few weeks ago, I had a great chat with Jamie Thomson from EMC about Web Data Services, and I noticed that Jamie recently wrote an interesting blog post titled “ETL for HTML”. ETL is a well-known term for anyone working with Data Integration or Data Warehousing. It stands for Extract, Transform and Load, and describes a one-way process of extracting data from a source, transforming the data into a new format and then loading the data into a destination. Traditional ETL vendors like Informatica are most effective at extracting and loading data from sources that can be accessed in traditional ways, through SQL, XML or program APIs. They fall short when a source exposes none of these interfaces, and this is where Web Data Services products like the Kapow Web Data Server come in as next-generation ETL tools. The Kapow Web Data Server allows users to extract and load data to and from any data source, including those that cannot be accessed in traditional ways; the only prerequisite is that users can access and see the data in a normal Web browser.
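To make the three ETL steps concrete, here is a minimal, standard-library-only Python sketch of “ETL for HTML”: it extracts rows from an HTML table, transforms price strings into integer cents, and loads the result into an in-memory SQLite table. The sample page, the `products` table and the function names are illustrative assumptions; this is a generic sketch, not how Kapow or Informatica work internally.

```python
import sqlite3
from decimal import Decimal
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Extract: collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # skip rows with no <td> cells, such as <th> header rows
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def extract(html):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def transform(rows):
    # Trim names and turn "$12.50" into integer cents so prices compare exactly.
    return [(name.strip(), int(Decimal(price.strip().lstrip("$")) * 100)) for name, price in rows]


def load(records, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price_cents INTEGER)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", records)
    conn.commit()


# Hypothetical sample page standing in for a page fetched from a web application.
PAGE = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>$12.50</td></tr>
  <tr><td>Gadget</td><td>$3.00</td></tr>
</table>
"""

conn = sqlite3.connect(":memory:")
load(transform(extract(PAGE)), conn)
print(conn.execute("SELECT name, price_cents FROM products ORDER BY rowid").fetchall())
# [('Widget', 1250), ('Gadget', 300)]
```

Keeping each step in its own function makes the one-way flow from source to destination explicit; swapping `extract` for code that drives a real browser session would leave `transform` and `load` unchanged.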

We live in a browser-centric world today, where “ETL for HTML” spans two extremes: Web 2.0 (e.g., web scraping, mashups) and Enterprise Data Management (e.g., data extraction, data collection, data mining, data conversion, data integration). “ETL for HTML” is the perfect universal term for working with all the data we see in our Web browsers. It gives us fast, automated access to data in applications like Salesforce or NetSuite, or in any of the millions of other web-based applications that exist inside our firewall, at our business partners, with the government, or out on the public web.

Jamie is spot-on with the term “ETL for HTML” as a way to describe how most of us will access web data.  Although ETL traditionally describes a one-way process of moving data from point A to point B, Web Data Services provides two-way access to data. This means we can leave the data where it resides best (like in your HR or ERP applications) and get full programmatic access by using a product like the Kapow Web Data Server to “wrap” the applications into standard service APIs like REST, SOAP or .NET.
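The “wrapping” idea can be sketched with Python’s standard WSGI interface. In this minimal example, `HRAppWrapper` is a hypothetical stand-in for whatever actually drives the HR application’s web pages (here it just holds one in-memory record), and the `/employees/<id>` route is an assumption. The point is the shape: a GET reads data from the application and a PUT writes changes back, so the data stays where it lives and access runs in both directions.

```python
import io
import json
import re


class HRAppWrapper:
    """Hypothetical stand-in for a browser-driven wrapper around a web HR application."""

    def __init__(self):
        # In a real wrapper, these records would be read from and written to the app's web UI.
        self._employees = {"42": {"name": "Ada", "title": "Engineer"}}

    def read(self, emp_id):
        return self._employees.get(emp_id)

    def update(self, emp_id, fields):
        record = self._employees.get(emp_id)
        if record is not None:
            record.update(fields)
        return record


backend = HRAppWrapper()
ROUTE = re.compile(r"^/employees/(\w+)$")


def _respond(start_response, status, payload):
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """Minimal WSGI app: GET reads a record, PUT writes changes back (two-way access)."""
    match = ROUTE.match(environ.get("PATH_INFO", ""))
    if not match:
        return _respond(start_response, "404 Not Found", {"error": "not found"})
    emp_id, method = match.group(1), environ["REQUEST_METHOD"]
    if method == "GET":
        record = backend.read(emp_id)
    elif method == "PUT":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        record = backend.update(emp_id, json.loads(environ["wsgi.input"].read(length) or b"{}"))
    else:
        return _respond(start_response, "405 Method Not Allowed", {"error": "method not allowed"})
    if record is None:
        return _respond(start_response, "404 Not Found", {"error": "no such employee"})
    return _respond(start_response, "200 OK", record)


def call(method, path, payload=None):
    """Invoke the WSGI app directly, without a server, to try out requests."""
    body = json.dumps(payload).encode("utf-8") if payload is not None else b""
    statuses = []
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)}
    result = b"".join(app(environ, lambda status, headers: statuses.append(status)))
    return statuses[0], json.loads(result)
```

To serve it over HTTP for local experiments, the standard library’s `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` is enough; `call("PUT", "/employees/42", {"title": "Manager"})` exercises the write path without starting a server.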

Why is this so important? For two reasons. First, with the data explosion around us, it becomes impractical to move and synchronize all data into one common data repository. Second, the data we need to perform our analysis and drive business decisions will change more and more rapidly: we will need new data sources daily, or at least weekly, to react to ever-changing business needs.

So what is a good replacement for the term “ETL for HTML”? I suggest something like “Access, Enrich and Serve Web data”. This is a superset of ETL that also covers the way we want to access data in the future.

What term do you think we should use?

By: Stefan Andreasen, CTO


The Kapow Katalyst Blog is a collection of insights, perspectives, and thought leadership around Browser-Based Application Integration.

Comments, Feedback, Contact Us:

blog at kapowsoftware.com
