Our crawlers download millions of posts a day from millions of sources. Sometimes you may want to sift through only the news or blog posts that had some kind of social impact. To provide you with this capability, we are introducing a new score we call the “Performance Score”.
Many factors can affect the relevancy of streaming data. When the data you consume is ordered by the time it was crawled rather than by relevancy, getting the relevant posts is essential. I would like to share with you a few tips you can use to greatly increase the relevancy of the data you consume via the Webhose.io API
Are you looking to focus your data search specifically on consumer-generated reviews? Here are a couple of simple Webhose.io tricks that might help. Limit your query to specific sites: you can limit your search to specific “review sites” like amazon.com, bestbuy.com, newegg.com, cnet.com, engadget.com, pcmag.com, etc. Here is an example for how you should
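As a rough illustration of restricting a search to a handful of review sites, here is a minimal sketch that only builds the query string and request URL, without sending anything. The `site:` filter syntax, the `token`, `format` and `q` parameter names, and the placeholder endpoint are all assumptions made for illustration, as are the helper names `build_site_query` and `build_search_url`; check the API documentation for the exact forms.

```python
from urllib.parse import urlencode


def build_site_query(keywords, sites):
    """Combine a keyword expression with an OR-ed group of site filters.

    The "site:" prefix is an assumed filter syntax, not confirmed here.
    """
    if not sites:
        return keywords
    site_clause = " OR ".join(f"site:{site}" for site in sites)
    return f"{keywords} ({site_clause})"


def build_search_url(base_url, token, query):
    """URL-encode the query and attach it to a (hypothetical) search endpoint."""
    params = {"token": token, "format": "json", "q": query}
    return f"{base_url}?{urlencode(params)}"


# A few of the review sites named in the text above.
REVIEW_SITES = ["amazon.com", "bestbuy.com", "newegg.com"]

query = build_site_query('"wireless headphones"', REVIEW_SITES)
# Placeholder endpoint and token: substitute the real values for your account.
url = build_search_url("https://example.invalid/search", "YOUR_API_TOKEN", query)

print(query)
print(url)
```

Keeping the query construction separate from the URL encoding makes it easy to add or remove sites from the list without touching the request code.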
After bashing various crawling techniques, I would like to describe the technique we use here at webhose.io, a technology developed over the past 8 years. Our crawlers were developed with the following demands in mind: efficient on server resources, i.e., CPU and bandwidth; fast in fetching and extracting content; easily add new sites