Whilst work on the new gallery is ongoing and should soon be completed, I already have a couple of new ideas forming in my head, and I thought I'd share them to see if the blogosphere has any decent input!

As a precursor, both ideas involve using the Yahoo Content Analysis Term Extraction API in some way or another. Firstly, I came across this site, TagCloud.com, which takes any number of RSS feeds and produces a tag cloud from them by extracting keywords with the Yahoo API and then using them as tags for the various posts. This works fairly well, as you can see from the tag cloud produced for my site. However: one, their site isn't that reliable; two, the cloud isn't updated very often; and three, when you click on a tag it takes you to their site, to a page loaded with Google Ads.

Me being me, I don't like the fact it does that, so what I want to do is create my own tag cloud. The limitation of using my site's RSS feed is obvious: it only contains the last 15 posts. You can't use Blogger to do it either, as they don't let you tag posts as yet. Furthermore, you can't use the Blogger API to retrieve all of your blog posts to then tag, because it's rubbish. That leaves the problem of how I can programmatically extract keywords from all of my posts and organise them into a tag cloud on my site.

Luckily, I have a brain! Because my site is all XHTML Strict, every one of my blog pages is an XML document. That means if I just fiddle with my Blogger template a little so I can identify the div that holds the content of each blog post, I can use ASP.NET to parse all of my static blog post pages, pull out the relevant content, pass it through the Yahoo API to generate tags for each post, organise the tags into a tag cloud, and stick it on my site. Brilliant! I've then got full control: no adverts, no other sites required. Obviously I will need to set up a word exclusion list so I can block any tags the Yahoo API comes up with that aren't relevant.
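
To make the plan a bit more concrete, here is a rough sketch of the parsing and cloud-building steps. It is written in Python rather than ASP.NET purely for illustration, and it makes a few assumptions of my own: the post content sits in a div with the hypothetical id "post-body", the page parses cleanly as XML (no undefined entities such as &nbsp;), and the tags for each post have already come back from the Yahoo API as plain lists of strings. The actual API call is left out, and the font size range is just an example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# XHTML documents put every element in this namespace.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def extract_post_text(xhtml, div_id="post-body"):
    """Return the text of the div holding the post (the id is an assumption)."""
    root = ET.fromstring(xhtml)
    for div in root.iter(XHTML_NS + "div"):
        if div.get("id") == div_id:
            # Join all nested text and collapse runs of whitespace.
            return " ".join("".join(div.itertext()).split())
    return ""


def build_cloud(tags_per_post, excluded, min_size=10, max_size=30):
    """Map each tag to a font size based on how many posts carry it."""
    counts = Counter(
        tag
        for tags in tags_per_post
        for tag in {t.lower() for t in tags}  # count each tag once per post
        if tag not in excluded                # the word exclusion list
    )
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts match
    return {
        tag: min_size + (n - lo) * (max_size - min_size) // span
        for tag, n in counts.items()
    }


page = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div id="post-body"><p>Tag clouds are <em>fun</em>.</p></div>'
    "</body></html>"
)
text = extract_post_text(page)

# Pretend these lists came back from the keyword extraction step.
cloud = build_cloud(
    [["rss", "yahoo"], ["RSS", "blogger"], ["rss", "yahoo", "misc"]],
    excluded={"misc"},
)
```

The sizes are scaled linearly between the least and most used tags, which is only one of several ways to weight a cloud; a logarithmic scale may look better once a few tags dominate.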

So, any comments? Has anyone else already done this with a Blogger blog, and are there any potential pitfalls you can see in my method?

So that was the first idea. The second is to use the Yahoo API to train an RSS reader to only show me posts that I want to read. This is a trickier thing to develop. Essentially I will need a custom RSS reader that not only displays posts for me, but also passes each one through the Yahoo API to determine its main content. Over time I can tell it which posts I'm reading and which I'm not, and thus it can be trained to spot the keywords in posts I read and in ones I don't. Once I've got that nailed down, I can pass each feed I subscribe to through my intelligent filter and only get to read the posts I am interested in. Any posts that are filtered out I could still screen, and if need be add a wrongly rejected post's keywords to the filter. Likewise, if a post gets through that I didn't want, I can remove its keywords from the filter. Essentially it means that whilst an RSS feed is generally specialised, I can specialise it even more to my particular taste, with the filter training itself.
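
One very simple way the training could work, sketched in Python for illustration only: keep a weight per keyword, nudge it up when I read a post and down when I skip one, and show a post when its keywords add up to at least some threshold. The class name, the plus or minus one step and the zero threshold are all assumptions of mine, not a finished design, and the keywords are assumed to have come from the Yahoo API already.

```python
from collections import defaultdict


class KeywordFilter:
    """Toy keyword filter: a running weight per keyword (hypothetical design)."""

    def __init__(self, threshold=0.0):
        self.weights = defaultdict(float)
        self.threshold = threshold

    def train(self, keywords, was_read):
        # Reading a post rewards its keywords; skipping it penalises them.
        delta = 1.0 if was_read else -1.0
        for kw in keywords:
            self.weights[kw.lower()] += delta

    def score(self, keywords):
        # Unseen keywords count as neutral (weight 0).
        return sum(self.weights.get(kw.lower(), 0.0) for kw in keywords)

    def wants(self, keywords):
        return self.score(keywords) >= self.threshold


reader = KeywordFilter()
reader.train(["ASP.NET", "XHTML"], was_read=True)
reader.train(["football"], was_read=False)

show_xhtml_post = reader.wants(["xhtml", "css"])   # one liked keyword
show_football_post = reader.wants(["football"])    # one disliked keyword
show_unknown_post = reader.wants(["gardening"])    # nothing learned yet
```

With a threshold of zero, posts on topics the filter has never seen still get through, which seems the safer default whilst it is learning. Correcting a wrong decision is just another call to train with the opposite answer.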

So again, not a bad idea I think, but do you think it's useful? Have you seen something similar elsewhere? Let me know your thoughts!

Watch this space and see what I turn out next!!