The Blog

How to access, cite, and defend web datasets in academic research

Posted on November 24, 2016 by

We’re used to getting questions about accessing structured web data. But recently, we’ve been fielding a different kind of use case. Researchers and scientists have been asking about data citation conventions and how to defend research citing web datasets for peer review. As you might expect, we published our answers in the new Guide to Citing Web

Continue reading

Posted in Big Data | Leave a comment

Can Crawled Web Data Tell the Future?

Posted on November 14, 2016 by

Robert Tercek’s book Vaporized: Solid Strategies for Success in a Dematerialized World recently won getAbstract’s 2016 International Book of the Year award at the Frankfurt Book Fair. Based in Hollywood, Robert has spent his entire career creating interactive content and inspiring others to do the same. He was kind enough to share a few words

Continue reading

Posted in Big Data, Marketing | Leave a comment

Web Data Visualization of The Hillary Clinton Top 100 Network Graph

Posted on October 20, 2016 by

The web data business can get pretty tricky, especially when your job is to extract the broadest possible dataset from the planet’s biggest database. Last week, Webhose CEO Ran Geva ran a fun experiment to visualize Hillary Clinton’s web network. More precisely, who are the top 100 people most frequently mentioned in news articles and blog

Continue reading

Posted in API, Big Data, Technology | Leave a comment

Should you buy crawled web data or build your own solution?

Posted on October 10, 2016 by

In a technologically driven environment, the temptation to develop a proprietary web crawling solution is virtually irresistible. Our latest report examines the true cost of computing and software development resources required to deliver a data crawling and structuring solution at scale. Development & Maintenance: Development could mean coding a proprietary solution from scratch, or modifying an existing crawling

Continue reading

Posted in API, Big Data | Leave a comment

Top 10 Big Data Stories Leading the Conversation

Posted on September 26, 2016 by

In the right hands, crawled web data can tell an amazing story. We were interested in the top 10 news stories, sorted by social shares on Facebook and LinkedIn. So we set up a simple news API request. We were looking for the stories published over the past 30 days returned by an exact match query for the term “big data”. Here
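A request of this kind can be sketched roughly as follows. This is a minimal illustration, not the request used in the post: the endpoint URL, the parameter names (`token`, `format`, `q`, `ts`), and the `facebook_shares` / `linkedin_shares` fields are all assumptions made for the example, so check them against the actual API documentation before relying on them.

```python
import time
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the real one from the API docs.
API_URL = "https://api.example.com/search"


def build_request_url(term, days, token, now=None):
    """Build a query URL for an exact-match search over the past `days` days."""
    now = time.time() if now is None else now
    since_ms = int((now - days * 24 * 60 * 60) * 1000)  # cutoff in epoch milliseconds
    params = {
        "token": token,      # assumed parameter name
        "format": "json",    # assumed parameter name
        "q": f'"{term}"',    # double quotes ask for an exact phrase match
        "ts": since_ms,      # assumed parameter name for the time cutoff
    }
    return f"{API_URL}?{urlencode(params)}"


def top_stories(posts, n=10):
    """Return the n posts with the most combined Facebook and LinkedIn shares."""
    def shares(post):
        # Field names are hypothetical; missing counts are treated as zero.
        return post.get("facebook_shares", 0) + post.get("linkedin_shares", 0)

    return sorted(posts, key=shares, reverse=True)[:n]
```

Sorting is done on the client side here so the sketch does not depend on any particular server-side sort option.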

Continue reading

Posted in Big Data, Technology | Leave a comment

Guide to Structured Web Data Consumption: How to get instant access to news, blogs, and online discussions

Posted on September 1, 2016 by

Hundreds of entrepreneurs, researchers, and data scientists contact us daily with questions about accessing structured web data. We put together our answers in our new guide to Structured Web Data Consumption. The consumerization of web data: It’s easy to fall into the trap of building a proprietary crawling and data structuring solution tailored to

Continue reading

Posted in Big Data, Marketing | Leave a comment

5 Ways to Measure the Impact of Crawled Web Data on Your Business

Posted on July 27, 2016 by

The analysis you provide is only as good as the raw data you start with. Although data from the open web is often perceived as a commodity, not all crawled data is created equal. Whether you’re relying on a proprietary crawling technology, tapping into a vendor’s firehose, or implementing a combination of both strategies –

Continue reading

Posted in Big Data, Technology | Leave a comment

How to Keep Your Restaurant Sentiment Analysis Well-Fed

Posted on April 6, 2016 by

When the team from London-based data analysis service GetSentiment developed a bleeding-edge system to measure the emotional baggage found in free text, they were missing just one thing: relevant data. “We were looking for a data provider that would be able to give access to sufficiently large amounts of frequently updated mentions of brands,” recalls

Continue reading

Posted in Uncategorized | Leave a comment

Webhose.io helps Observify expand their coverage and add a new angle to their already rich offering.

Posted on March 17, 2016 by

We had the pleasure of speaking to Karl from Observify to understand a bit more about them, and also why and how they use Webhose.io. A bit about Observify: “Observify is a fast growing company on a mission to relieve their clients of their analytical headaches. We’re shaking up the social and web listening

Continue reading

Posted in Uncategorized | Leave a comment

How to Create a Custom RSS Feed for Content Monitoring

Posted on March 3, 2016 by

Imagine that you had the ability to track what’s being said, felt and published about a given topic, industry or brand. Whether you’re in marketing, sales, search engine optimization, management or just a curious person, there are some major benefits to staying on top of the latest discussions, trends, issues and developments happening in your

Continue reading

Posted in Uncategorized | Leave a comment

How Crawled Data Gave One News Outlet the Edge in the Israeli Election

Posted on February 18, 2016 by

In the spring of 2015, as Israel prepared for general elections, virtually all of the mainstream media analysts believed that change was in the air. Conventional wisdom at that time had it that the Israeli populace was ready to turn its back on Prime Minister Benjamin Netanyahu and the government led by his Likud Party

Continue reading

Posted in Uncategorized | Leave a comment

goPRit and Webhose.io

Posted on February 9, 2016 by

Startups, small businesses, and even enterprise-level organizations all need publicity not only to survive, but to thrive. Why? Because the truth is … without an audience, even the best product won’t win. Owners and founders know this, which is why they hire PR firms to help them get the word out and cultivate valuable relationships

Continue reading

Posted in Uncategorized | Leave a comment

The 15 Data Experts You Should be Following on Twitter

Posted on January 14, 2016 by

Twitter is a phenomenal place not only to connect with peers in the analytics industry but also to follow and learn from its leading authorities. Unfortunately, the Twitter marketplace is crowded, and trying to wade through and research exactly who’s who on your own is overwhelming. Even worse is making your Twitter decisions based on

Continue reading

Posted in Big Data, Technology | Leave a comment

Five Reasons a News Crawler Is Essential to Your Business

Posted on January 5, 2016 by

“Originality is the art of remembering something but forgetting where you heard it.” Case in point, I don’t remember where I heard that. Nonetheless, it’s absolutely true, especially when it comes to running an online business. Why? Because in today’s online marketplace, sales, brand management, and genuine engagement are all practices that shouldn’t begin with

Continue reading

Posted in API, Big Data | Leave a comment

Extracting Data from Forums: 3 Sources to Discover What Your Market Really Thinks

Posted on December 29, 2015 by

Robert Collier, the great ad man of the early 20th century, once summarized the secret of all effective marketing as entering “the conversation already taking place in the customer’s mind.” That’s powerful advice … and difficult. Why? Because most of the sources we normally turn to for market research are woefully incomplete. For example, surveys

Continue reading

Posted in Big Data | Leave a comment

How to Extract Data from a Website: 5 Steps to Transform Unstructured Data into Business Insights

Posted on December 8, 2015 by

Big data is big business. And for good reason. As Harvard Business Review recently reported, an exhaustive study of 330 North American companies led by the MIT Center for Digital Business in conjunction with McKinsey’s Business Technology Office revealed that the use of data in business decisions like product development, hiring and firing, as well

Continue reading

Posted in Big Data, Technology | Leave a comment

Social Media Analytics: Insights from Structured versus Unstructured Data

Posted on December 1, 2015 by

Let’s be honest … social media is a challenge. Not only is staying current, active, and “topped off” a chore, but crafting full-scale campaigns that contribute to your business’s and brand’s actual goals can be bewildering. At the same time, the market for social media continues to grow. According to recent data from eMarketer, “Social Network

Continue reading

Posted in API, Big Data | Leave a comment

Dead simple {for devs} python crawler (script) for extracting structured data from any website into CSV

Posted on August 16, 2015 by

In my previous post I wrote about a very basic web crawler that can randomly scour the web and mirror/download websites. Today I want to share with you a very simple script that can extract structured data from (almost) any website. Use the following script to extract specific information from any website (e.g. prices, IDs, titles,
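The general idea of pulling specific fields out of a page and writing them to CSV can be sketched as below. This is not the script from the post, only a minimal standard-library illustration. It assumes the fields of interest are marked by CSS class names, and the sample HTML, the class names (`title`, `price`), and the helper names are made up for the example.

```python
import csv
import io
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains a target."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.values = []
        self._depth = 0      # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1             # nested tag inside a match
        elif self.target_class in classes:
            self._depth = 1              # entering a matching element
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:             # leaving the matching element
            self.values.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_to_csv(html, columns):
    """`columns` maps CSV header -> CSS class name; returns CSV text."""
    extracted = []
    for css_class in columns.values():
        parser = ClassTextExtractor(css_class)
        parser.feed(html)
        extracted.append(parser.values)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns.keys())
    writer.writerows(zip(*extracted))    # one row per item, in page order
    return out.getvalue()


SAMPLE_HTML = """
<ul>
  <li><span class="title">Blue Widget</span> <span class="price">$9.99</span></li>
  <li><span class="title">Red Widget</span> <span class="price">$12.50</span></li>
</ul>
"""

print(extract_to_csv(SAMPLE_HTML, {"title": "title", "price": "price"}))
```

The sketch pairs values by position, so it only works when every item on the page carries every field; a real page usually needs per-item parsing and some handling of malformed markup.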

Continue reading

Posted in Technology | Leave a comment

Tiny basic multi-threaded web crawler in Python

Posted on August 12, 2015 by

If you need a simple web crawler that will scour the web for a while to download random sites’ content, this code is for you. Usage: $ python tinyDirtyIffyGoodEnoughWebCrawler.py http://cnn.com where http://cnn.com is your seed site. It could be any site that contains content and links to other sites. My colleagues described this piece of code I wrote
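For readers who want a feel for the technique, here is a minimal sketch of a multi-threaded crawler built on the standard library. It is not the code from the post: the function names, the regex-based link extraction, and the `fetch_page` parameter (which lets you swap in a fake downloader for testing) are choices made for this illustration.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin
from urllib.request import urlopen

# Crude link extraction; a real crawler would use an HTML parser.
LINK_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def fetch(url):
    """Download a page and return its body as text (empty string on failure)."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(seed, max_pages=20, workers=4, fetch_page=fetch):
    """Breadth-first crawl from `seed`, downloading each frontier in parallel."""
    seen = {seed}
    frontier = [seed]
    pages = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(pages) < max_pages:
            # Never fetch more pages than the remaining budget allows.
            batch = frontier[: max_pages - len(pages)]
            frontier = []
            for url, body in zip(batch, pool.map(fetch_page, batch)):
                pages[url] = body
                for href in LINK_RE.findall(body):
                    link = urljoin(url, href)  # resolve relative links
                    if link.startswith("http") and link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return pages
```

Calling `crawl("http://cnn.com")` would fetch up to 20 pages reachable from the seed. The threads only parallelize downloads; link extraction and bookkeeping stay on the main thread, which keeps the shared `seen` set free of locking.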

Continue reading

Posted in Technology | Leave a comment

How we quadrupled the performance of Elasticsearch

Posted on July 19, 2015 by

Well, that’s a misleading title. We actually quadrupled the performance of our brand monitoring alert system that uses Elasticsearch’s Percolator, but that would have been a much longer title. Some background: Buzzilla has two main products. The first is Webhose.io, which provides businesses worldwide access to structured data from the open web, and the second

Continue reading

Posted in Technology | Leave a comment

Webhose.io Tips & Tricks: Search for Reviews

Posted on December 10, 2014 by

Are you looking to focus your data search specifically on consumer-generated reviews? Here are a couple of simple Webhose.io tricks that might help. Limit your query to specific sites: You can limit your search to specific “review sites” like amazon.com, bestbuy.com, newegg.com, cnet.com, engadget.com, pcmag.com, etc. Here is an example of how you should
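As a rough illustration of restricting a search to review sites, the helper below assembles a query string from a keyword phrase and a list of domains. The `site:` filter and the `OR` grouping are assumptions about the query syntax made for this sketch, and `build_review_query` is a hypothetical helper, so verify the syntax against the Webhose.io documentation.

```python
def build_review_query(keywords, sites):
    """Combine an exact keyword phrase with an OR-list of site filters."""
    # Assumed syntax: `site:<domain>` terms joined by OR inside parentheses.
    site_filter = " OR ".join(f"site:{site}" for site in sites)
    return f'"{keywords}" ({site_filter})'


query = build_review_query("wireless headphones", ["amazon.com", "bestbuy.com", "newegg.com"])
print(query)
```

The resulting string would still need to be URL-encoded before being sent as a query parameter.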

Continue reading

Posted in Technology | Leave a comment