unstruct.org My unstructured thoughts and rants


Free Money and Easy Government Grants – Be Careful!

Who wouldn't want “money for free” or “government grants” that just magically fall into your lap? When you hear this kind of thing, you might wonder, “Does money really grow on trees?” After reading the article “Free Money?”, I decided to do a little research of my own and share what I learned:

If you were to believe the infomercials on late-night TV or the ads in your local classifieds paper, you would probably assume that it's possible. Smooth-talking guys on TV invite you to their seminars, where they promise to show you how to get all this free money to pay your bills. They have even smoother-looking people telling you how they got a $40,000 government grant to run their small businesses. Others claim they received thousands of dollars that they never have to pay back and can use as they wish.


Google’s SearchMash

Google has started a new site called SearchMash, which seems to be a testbed for new UI ideas for search results. It’s AJAX-based and currently offers web page and image search.

Currently, images are shown to the right of the web page results, and the green URLs are clickable menus. When you click on “more web pages”, the list expands in place, giving you a longer scrollable page instead of loading a new one, which looks good and is useful.

It looks like Google will keep experimenting there, so it should be fun to go back every now and then. It’s nice to see more use of AJAX and JavaScript.


A9 drops unique features and switches to MSN search

Amazon’s search engine effort, A9, has quietly dropped its unique and highly publicized features. It no longer remembers all past search queries of logged-in users. This is odd, since A9 could have built a great personalization feature using this information. Perhaps Amazon is worried about privacy issues.

They have also removed the street-level images that they so painstakingly collected over several years. These were pictures of storefronts that were shown when you searched for an address in one of twenty cities. They must have spent a small fortune taking these pictures, so it’s surprising that they have removed them.

In addition, they have switched their search results from Google to MSN. Since Microsoft’s search engine is not yet up to Google’s quality, this is puzzling. They will likely lose a lot of their users because of this, at least in the short term. I suspect there are business strategy decisions behind all this, but for the moment, what remains of all the A9 hype is a pretty pedestrian aggregated-search site. It will be interesting to see where A9 is headed.


Web Search, Extraction & Machine Learning positions at Radar Networks [San Francisco]

Required Skills

  • Java development: Exceptional skills with Java, with 3–5 years of professional (or significant academic) coding experience in Java.
  • Search engines for the Web: Crawling, resource discovery, indexing, harvesting and extraction, using tools such as Lucene, Nutch, Hadoop etc. Experience in scaling search engines to handle massive amounts of data. Experience in using ontologies/taxonomies in search is desired; graph search and social network analysis are also of interest.
  • Experience with modern software engineering practices and paradigms: we want you to write beautiful code that is a pleasure to read and easy to maintain.

Optional Specialized Skills

  • Data Extraction and Harvesting: Harvesting knowledge from unstructured and structured datasets. Entity detection, topic detection, document segmentation and classification. Familiarity with products such as Inxight, GATE, UIMA, MinorThird, Mallet, WEKA, and/or other text mining technologies. Natural language processing skills are also a plus.
  • Machine Learning for Search and Classification: Machine learning algorithms to assist with search, classification, clustering, personalization, optimization and data extraction. Supervised and unsupervised learning, Bayesian learning, SVMs, HMMs, graph theory, and graph search and vector search algorithms.
  • Semantic Web: Experience with the Semantic Web, RDF, OWL, and reasoning over semantic data and ontologies.

Wikio, a new news aggregation and moderation site

It’s becoming clearer by the day that news is too important to be left to the media corporations. They show us what they would like us to learn and believe, which is often not what really happened. Modern technology should make it easier to give this power to the people. The idea already works quite well for blogs, e.g., Memeorandum.com.

News aggregation has been dominated by Google News. Now there is a new French site called Wikio.com that looks promising. It’s still in beta, but TechCrunch has had a look. Digg.com, of course, also works well, but it is mostly for technology topics.


3D photo collages created automatically from photos

Researchers from Microsoft and the University of Washington have created a very impressive way of organizing and indexing unstructured photos. The system extracts distinctive features from the images, and these features are then matched and aligned pairwise. Using all these pairwise alignments, the original position of each camera can be estimated. It’s impressive because the system does not need to know the geometry or location of any of the cameras, and any picture can be used, not just images taken with the same camera.
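To give a rough feel for the pairwise step, here is a minimal Python sketch of matching feature descriptors between every pair of photos. It assumes each photo has already been reduced to a list of numeric descriptor vectors, and it uses a nearest-neighbour ratio test, a common matching heuristic that is not necessarily what the researchers used. The function names, the 0.8 ratio and the `min_matches` threshold are hypothetical. Estimating the camera positions from these matches is a much larger step and is not shown.

```python
import math
from itertools import combinations


def match_features(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_a[i] best matches desc_b[j].

    Hypothetical ratio test: keep a match only if the nearest descriptor
    in desc_b is clearly closer than the second-nearest one.
    """
    matches = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, d in enumerate(desc_a):
        # Rank all descriptors in desc_b by Euclidean distance to d.
        ranked = sorted(range(len(desc_b)), key=lambda j: math.dist(d, desc_b[j]))
        best, second = ranked[0], ranked[1]
        if math.dist(d, desc_b[best]) < ratio * math.dist(d, desc_b[second]):
            matches.append((i, best))
    return matches


def pairwise_matches(images, min_matches=2):
    """Match every pair of images; keep only pairs with enough correspondences.

    `images` maps an image name to its list of descriptor vectors (assumed input).
    The result is a graph of overlapping photos that later alignment could use.
    """
    graph = {}
    for (name_a, desc_a), (name_b, desc_b) in combinations(images.items(), 2):
        m = match_features(desc_a, desc_b)
        if len(m) >= min_matches:
            graph[(name_a, name_b)] = m
    return graph


if __name__ == "__main__":
    # Toy 2-D "descriptors"; real systems use much longer feature vectors.
    images = {
        "img1": [(0, 0), (10, 10), (20, 0)],
        "img2": [(0.1, 0), (10, 10.2), (50, 50)],
        "img3": [(100, 100), (100, 100.5)],
    }
    print(pairwise_matches(images))
```

In this toy input, only img1 and img2 share enough confident matches to be linked; img3's two descriptors are almost equally distant from everything, so the ratio test rejects them.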


ResourceShelf – a wealth of information

For those of you who don’t know Gary Price and his ResourceShelf web site, this is a recommendation to go there! It is a wealth of information for anyone into research, information science and technology. It’s updated several times a day, all year round. This must be a daunting task, since Gary is also a much sought-after speaker at conferences and events. When does he find the time to update this virtual gold mine?

P.S. A similar resource site, perhaps with a bit more of a researcher focus, is FreePint.com. It also comes highly recommended.


Government Agencies spend top dollar on security-related information-analysis technologies

Federal Computer Week has a long story about how the US government is challenging the software industry to develop applications that predict terrorist actions. The emphasis is on pattern-recognition algorithms and the vendors that use them, such as ClearForest, Inxight, Convera, Kofax and Autonomy, to name just a few.

Among the interesting facts in the article are the sizes of some recent deals awarded by the US government: $5.2 million to Convera, $3 million to Inxight and $10 million to TranTech Inc. for a DOD contract. Clearly, there is serious money in this sector.


The other intelligent open source

Stephen Arnold has written a very good article in Web Active Magazine about the use of “open source intelligence”, or OSINT. He points out that the field has several problems when it comes to using adequate technology. In the article he quotes Robert Steele, the US-based Open Source pioneer and founder of OSS (a network that now includes leading players such as East View Cartographic and Infosphere).

“The emphasis in government has been to spend on very complex information technology for collection, then to spend almost nothing on information technology for processing,” explained Steele.

In our experience, this analysis by Robert Steele and Stephen Arnold is very accurate. In fact, the focus on gathering technologies only adds to the problem of too much information. We would suggest that Open Source intelligence believers also start focusing their attention on modern tools for analyzing unstructured information.