Digital search vs. human search: Exploring a premise and citing an example

by drm on April 13, 2009

Following my post about the potential of social media platforms like Twitter and Facebook to erode the hegemony of Google’s search, an interesting discussion developed in the comments. One focus was how semantic search, which takes an object-oriented approach to the elements on a page, fit in with what another commenter has called the “human-search” paradigm that I was pointing to as the basis of Twitter and Facebook’s potential.

Since that post ran, the dialogue has grown more active in a number of pockets around the blogosphere.  Joshua Porter of bokardo.com wrote a succinct post pointing to the possibility of a slow erosion of Google’s search position.  Porter points to attention as a critical currency in the shift in market dynamics.

Thus the real problem for Google is attention. People are increasingly giving their attention to Twitter, Facebook, and other social software, and thus (indirectly) giving it less to Google. Also notice that services have traditionally been happy to give Google their search traffic, but neither Twitter nor Facebook are doing that.

So while Google continues to increase its search market share, and folks look at that and say “Google is only getting better”, what they don’t necessarily see is how much the social sites are sucking up more attention. And eventually that attention will be so strong that Google will begin to suffer.

To solidify my thinking around this topic, I put together a couple of schematics that describe, at a very high level, the difference between digital search and human search.  (In fact, the real term should be human-assisted search.)

Digital search is highly focused on the elements contained in each page and the way that other pages relate to it.

[Schematic: digital search]

Human search enhances the component of digital search by observing the way that specific individuals and groups of related people interact with specific pieces of content.  That history enhances the understanding of the digital elements of the page and the way that other pages interact with it.

[Schematic: human search]
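To make the contrast between the two schematics concrete, here is a minimal sketch in Python. It is not taken from the presentation: the function names, the 50/50 blend weight, and the toy data are all assumptions chosen for illustration. `link_score` stands in for the digital signal (how other pages relate to a page), and `interaction_score` for the human signal (how people in the searcher's own network interact with it).

```python
from collections import Counter


def link_score(page, links):
    """Digital signal: share of all links that point at `page`.

    `links` is a list of (source_page, target_page) pairs.
    """
    inbound = Counter(target for _, target in links)
    total = sum(inbound.values())
    return inbound[page] / total if total else 0.0


def interaction_score(page, interactions, my_contacts):
    """Human signal: share of my contacts' interactions that involve `page`.

    `interactions` is a list of (person, page) pairs.
    """
    relevant = [p for person, p in interactions if person in my_contacts]
    return relevant.count(page) / len(relevant) if relevant else 0.0


def human_assisted_score(page, links, interactions, my_contacts, human_weight=0.5):
    """Blend the two signals; human_weight=0.5 is an arbitrary assumption."""
    digital = link_score(page, links)
    human = interaction_score(page, interactions, my_contacts)
    return (1 - human_weight) * digital + human_weight * human


# Toy data (hypothetical): pages "b" and "d" each have two inbound links.
links = [("a", "b"), ("c", "b"), ("b", "d"), ("a", "d")]
interactions = [("ann", "d"), ("bob", "d"), ("zed", "b")]
my_contacts = {"ann", "bob"}

for page in ("b", "d"):
    print(page, human_assisted_score(page, links, interactions, my_contacts))
```

In this toy data the two pages tie on links alone, and the interactions of the searcher's contacts break the tie in favor of page "d".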

How does it work, you ask?  I’ve attached the full presentation at the end of this post.  It addresses some of the specific issues that I can identify related to execution.  Remember, I’m not a technologist, so this schematic presents a conceptual overview of an architecture for a search approach, not a guideline.

But there are people getting at the idea in different ways right now.  Here’s one start-up that is going at the problem by focusing on one specific category: employment.

TalentSpring, a semantic-search startup that lets recruiters identify potential job candidates on social-networking sites and job boards by scanning for their qualifications, has raised $1.6 million. Investors include Second Avenue Partners. The company is testing its service, and expects to go live with it in May. TalentSpring argues that semantic search is more effective than searching for key words and correlated terms.

Facebook and Twitter have been incredibly quiet about their search strategies.  There’s no question that they are sitting on a new and substantial opportunity.

The full presentation:

[slideshare id=1277497&doc=humanvsdigitalsearch-090412083918-phpapp02]

  • Tom Cintorino

    If I am grasping the presentation correctly… The citations of a piece of work, and the works that cite it, have been transformed online into the core of Google’s original algorithm (digital search – links in and links out). Dan’s Human Search model extends citations to social interactions. Reasonable – technical challenges notwithstanding. In an effort to maintain credibility beyond what is measured through content interactions (behavioral approach), perhaps the profile of the individuals interacting with the content should be given weight. Of course, profiles would have to be validated. Although social networks represent the power of the masses, in order to trump hierarchy, credibility ratings of social interactions with content will have to emerge. Perhaps this is where AI approaches weigh in and create systems that learn and improve through each engagement. Very interesting discussion. Google commercialized the Internet. There will certainly be new and better approaches to interacting with the world’s content and its individuals.

  • http://www.nci.com drm

    Tom, you’ve summarized the relationship between the two types of search neatly. I think that profile attributes could be an aspect of an individual’s credibility, but I’m intrigued by the knowledge that the group has of the individual and how it informs their interaction with any content associated with the individual. A key premise is that people are experts on unexpected things, and that the expertise is a matter of context and experience. For instance, I’ve recently developed an expertise in the removal of environmentally unsafe soil: our oil tank sprang a leak and we had to go through the process of getting it removed, the soil cleaned and a new tank installed. This isn’t expertise that has any relevance to my profiles. I do have a lot of information on it though, and if one of my contacts happened to ask a question about this problem, I’d be able to share current, researched and quality information. A search protocol that captures the freshness of that information and the interaction with that information by a person with a specific query, who is in my group of contacts, would give that information more relevance than a digital search engine might. Imagine then that an element of a search is that it first looks within your group of connections, and their associated connections, for information relevant to the query that has been acted on and is deemed to be of good quality.

    My emphasis here is that the human-search paradigm doesn’t require AI or semantic search tools to be implemented. Both types of tools would enhance the search, but it is possible to create this kind of search process using the digital data associated with individuals and their interaction with each other and content.
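The "look within your connections first" step in the comment above can be sketched without any AI or semantic tooling, which is the point being made. The following Python sketch is only an illustration under stated assumptions: the graph, the `day` field (an integer timestamp used as a stand-in for freshness), the substring match, and all names are hypothetical.

```python
from collections import deque


def network_within(graph, start, max_hops=2):
    """Return {person: hops} for everyone within max_hops of start (breadth-first)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for contact in graph.get(person, ()):
            if contact not in hops:
                hops[contact] = hops[person] + 1
                queue.append(contact)
    del hops[start]
    return hops


def search_network_first(query, graph, me, shared_items, max_hops=2):
    """Items shared by nearby people that mention the query.

    Ordered by closeness in the network first, then by freshness.
    """
    nearby = network_within(graph, me, max_hops)
    hits = [
        item for item in shared_items
        if item["person"] in nearby and query.lower() in item["text"].lower()
    ]
    return sorted(hits, key=lambda item: (nearby[item["person"]], -item["day"]))


# Hypothetical contact graph and shared content.
graph = {
    "tom": ["dan"],
    "dan": ["tom", "oleg"],
    "oleg": ["dan", "kas"],
    "kas": ["oleg"],
}
shared_items = [
    {"person": "dan", "day": 100, "text": "Replacing a leaking oil tank and cleaning the soil"},
    {"person": "oleg", "day": 120, "text": "Oil tank removal checklist"},
    {"person": "kas", "day": 130, "text": "Oil tank regulations"},
]

for item in search_network_first("oil tank", graph, "tom", shared_items):
    print(item["person"], item["text"])
```

Here the searcher's direct contact ranks ahead of a contact-of-a-contact even though the latter's item is fresher, and anyone three hops away is left for the ordinary digital search to find.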

  • http://www.nstein.com OlegR

    Tom, glad you joined the conversation :) This is Oleg with Nstein.

    Dan, I agree with you that on a high level AI and semantic technologies are not needed to measure freshness and credibility of information, since, as you said in your example – people might have unexpected expertise in unexpected fields.

    The area where AI, semantic search and tools like text mining are absolutely needed is filtering out all the noise.
    Twitter/Facebook etc. are already filled with noise, commercial posts, junk posts, etc.
    Such content will interfere with your search algorithms and thus distort results.
    What is needed are tools that can remove all the junk from the search base, to keep the human input as clean and as pure as possible.

    Great presentation! I’m sharing this link with our semantic guys here at Nstein as well as my twitter network, so we will get some interesting input soon.

    Cheers-

    Oleg (twitter – @OlegR)

  • http://asserttrue.blogspot.com/ Kas Thomas

    I use the Twitterstream of my friends as a “smart” (but still quite noisy) RSS feed. My Twitter friends provide a lot of high-quality semantic filtering for free. But I agree with Oleg that noise is a huge problem that calls for the intervention of “smart” software. Categorization, ditto.

  • danielrmccarthy

    In regard to the noise, I'm putting a lot of weight on the consistency of the individual associated with the content.

    That's where this model breaks down: the breadth of content associated with individuals would take a long time to build.

    So, imagine three levels: Human-filtered content, determined by the individuals associated with specific content objects, ranked according to the interactions of the group; filtered social network content — the real-time search people talk about, enhanced with the techniques of semantic search we've discussed; and finally, the current paradigm of digital search. That creates a tiered search result that gives the user a broad choice of options.
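The three-level result described above can be sketched as a simple ordered merge. This is a rough Python illustration only: the three search functions are stubs returning made-up URLs, and the decision to drop a result from a lower tier when a higher tier already returned it is an assumption, not something specified in the comment.

```python
def tiered_search(query, tiers):
    """Run each (label, search_fn) in order and group results by tier.

    A result already returned by a higher tier is not repeated lower down.
    """
    seen = set()
    grouped = []
    for label, search_fn in tiers:
        fresh = []
        for url in search_fn(query):
            if url not in seen:
                seen.add(url)
                fresh.append(url)
        grouped.append((label, fresh))
    return grouped


# Stub back ends (hypothetical); real ones would query three different indexes.
def human_filtered(query):
    return ["x.example/guide"]


def filtered_social(query):
    return ["x.example/guide", "y.example/tweet"]


def digital(query):
    return ["z.example/page", "y.example/tweet"]


tiers = [
    ("human-filtered", human_filtered),
    ("filtered social", filtered_social),
    ("digital", digital),
]

for label, urls in tiered_search("oil tank removal", tiers):
    print(label, urls)
```

Keeping the tiers labeled rather than blending them into one list preserves the "broad choice of options" for the user.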

  • http://budurl.com/twithomeoleg OlegR

    That sounds very comprehensive. Not sure if the technical capability to run that structure on a global scale exists today (for example text mining is a pretty slow technology even with today's machine speeds).

    When are we launching the next “google-killer”? :)

    On a serious note, I see two types of businesses arising from our discussion.
    The first would be a user-ranking/content-filtering business.
    There have been some steps taken in this direction.
    What I mean by it is some kind of UGC crawler that would be able to crawl twitter/facebook/etc, aggregate multiple profiles of the same users (OpenID?), and unite their UGC clouds.
    Then this crawler would compare the expertise of these users (their meta-profiles), and assign them to an expertise taxonomy with a hierarchical structure (e.g. I have more expertise related to the concept of text mining than user X). When this expertise is somehow quantified, a UGC credibility matrix can be established.
    From there, all the junk UGC content will be automatically filtered out, since its providers are not ranked as credible by the crawler.

    So now we have a database-structured cloud of UGC content that is indexable and searchable, while still very relevant and topic-oriented.

    The next step is to use a robust search engine (Solr?) to search this UGC universe, and mash it up with machine search content (editorial/general interest).

    What do you think about this approach?
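As a rough answer in code form, here is a Python sketch of the pipeline Oleg outlines: merge accounts into one identity, build a person-by-topic credibility matrix, and drop content from providers who fall below a threshold. Oleg leaves open how expertise is "somehow quantified", so the measure used here (the share of reactions to a person's posts that were positive) is purely an assumption, as are the account names, the field names, and the 0.5 threshold. No real crawler or Solr integration is shown.

```python
from collections import defaultdict


def merge_profiles(posts, identity_of):
    """Group posts from several accounts under one person."""
    by_person = defaultdict(list)
    for post in posts:
        by_person[identity_of[post["account"]]].append(post)
    return by_person


def credibility_matrix(by_person):
    """Build person -> topic -> share of reactions that were positive."""
    matrix = {}
    for person, posts in by_person.items():
        totals = defaultdict(lambda: [0, 0])  # topic -> [positive, all reactions]
        for post in posts:
            totals[post["topic"]][0] += post["positive"]
            totals[post["topic"]][1] += post["reactions"]
        matrix[person] = {
            topic: (positive / count if count else 0.0)
            for topic, (positive, count) in totals.items()
        }
    return matrix


def filter_credible(posts, identity_of, matrix, threshold=0.5):
    """Keep posts whose author is credible enough on that post's topic."""
    return [
        post for post in posts
        if matrix[identity_of[post["account"]]].get(post["topic"], 0.0) >= threshold
    ]


# Hypothetical data: two accounts belong to the same person.
identity_of = {"@alice": "alice", "fb:alice": "alice", "@spam4u": "spammer"}
posts = [
    {"account": "@alice", "topic": "text mining", "positive": 8, "reactions": 10},
    {"account": "fb:alice", "topic": "text mining", "positive": 4, "reactions": 10},
    {"account": "@spam4u", "topic": "text mining", "positive": 1, "reactions": 20},
]

matrix = credibility_matrix(merge_profiles(posts, identity_of))
kept = filter_credible(posts, identity_of, matrix)
print(matrix)
print([post["account"] for post in kept])
```

Note that merging profiles matters in this toy data: the second account alone would fall below the threshold, but the combined identity clears it.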

  • danielrmccarthy

    The UGC crawler you describe does exactly what we're talking about. I wonder if you could accomplish it without going into the social networks. You could easily identify content that was accessed from the networks by crawling referrers. And, each of the referrers has a unique id associated with a user. That would get you the type of content organization around user identities that we're talking about, and you'd still be trolling the Google web, not the Social web.

    The mash-up feels like the easy part, although I know I'm over-simplifying.

  • alltoute

    Sorry to join this interesting discussion late. I agree and partially disagree with several things in here :-) so I will try to focus on the latest comments only.

    Speaking of what I call “social signatures”, I agree that a search engine on top of social network content works best when you take into account not only the pages, but also their readership, their general content consumption patterns, their profiles, their social graphs, etc. Nevertheless, I believe that one can reach much higher levels of accuracy and context by analyzing what the content is about. That's where tagging, text mining and semantic search come in (analysis of content aboutness). A combination of these 2 types of information (what content is about and what the user is interested in) is key to a killer application: a multidimensional, global content access system.

    A good example where the human factor can’t really help is at the moment of creation/ingestion of a new piece of content. How can the algorithm tell you that this piece of content is relevant to you, based on your interest or on the interest of your best social connections? How could it evaluate some sort of relevancy at this time using only your user profile? It knows what you like (based on your past connections/content) but it does not know what exactly the new piece of content is about.

    On the other hand, if you can text-mine content to obtain a semantic representation of it (a database-like structure for each piece of content), you could match that semantic map with the semantic maps of other documents and then leverage the similarities in content the same way as you would leverage similarities of semantic user profiles. For example, if you have already consumed content that is semantically similar to this new piece, then this new piece is highly likely to be relevant to you.
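The matching step described above can be illustrated with a deliberately crude stand-in. The Python sketch below uses bag-of-words term counts and cosine similarity in place of a real text-mined semantic map, so it captures shared vocabulary rather than meaning; the function names and sample texts are assumptions for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Very rough 'semantic map': lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def relevance_to_history(new_text, consumed_texts):
    """Score a brand-new item by its best match against what the user already read."""
    new_vector = term_vector(new_text)
    return max(
        (cosine(new_vector, term_vector(text)) for text in consumed_texts),
        default=0.0,
    )


# Hypothetical reading history for one user.
history = [
    "semantic search and text mining for content",
    "lawn mowers reviewed",
]

print(relevance_to_history("text mining improves semantic search", history))
print(relevance_to_history("stock market news today", history))
```

Because the score depends only on the content itself, it is available at the moment a new piece is ingested, before anyone in the network has interacted with it.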

    Semantic representation of a given user group's set of interests is also pretty interesting. In terms of search, content discovery and navigation, semantic annotations are really helpful to slow down the content stream, better handle noise and help create different view levels over and between content and people.

    Again, a combination of these 2 approaches (semantic content analysis and user profile analysis) works best to build a more complete collective intelligence system. In this discussion we talk about its use as a search engine, while it could be the framework to do more than just meet basic search needs.

  • danielrmccarthy

    This is pretty elegant.

    In terms of the ability to include fresh content in the search results in a relevant way, I think that the credibility of the user who distributes the content, and the interaction of his connections with it, will create more information about a new piece of content than the current search paradigm does.

    Everyone has talked about the noise factor. I wonder whether a paradigm that associates quality content distributors with quality content actually filters out a lot of the noise.

    Here I'm thinking about “distribution” and “content” in a narrower way than is typically considered in search. First, distribution is probably better thought of as re-distribution. The social web has created a method of sharing discrete pieces of content (web pages, pics, etc.). Links shared around twitter are a good example. So, people who re-distribute content that is frequently acted on, and which, when indexed, is deemed to have context consistency, etc., are given higher weight in their content associations. And with content, I'm actually dismissing the content that is flowing through twitter or along Facebook in a stream. That content is typically creating some reference to a content object or a topic and feels pretty perishable.
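One way to picture the re-distributor weighting described above is as an "acted-on rate" per person, which then carries over to whatever that person shares next. The Python sketch below is an assumption-laden illustration: the smoothing prior (so that a single lucky share does not make someone an authority), the field layout, and the names are all hypothetical, and the "context consistency" check mentioned above is not modeled.

```python
def redistributor_weight(shares, prior_views=5, prior_rate=0.1):
    """Smoothed fraction of views of a person's shared links that led to an action.

    `shares` is a list of (acted_on_count, seen_by_count) pairs, one per shared link.
    """
    acted = sum(acted_on for acted_on, _ in shares)
    seen = sum(seen_by for _, seen_by in shares)
    return (acted + prior_rate * prior_views) / (seen + prior_views)


def content_score(link, sharers, weights):
    """Score a link by the summed weight of the people who re-distributed it."""
    return sum(weights[person] for person in sharers.get(link, ()))


# Hypothetical sharing history.
sharing_history = {
    "ann": [(3, 10), (5, 10)],  # her links are acted on fairly often
    "bot": [(0, 100)],          # widely seen, never acted on
}
weights = {person: redistributor_weight(s) for person, s in sharing_history.items()}
sharers = {"guide": ["ann"], "spam": ["bot"]}

print(weights)
print(content_score("guide", sharers, weights), content_score("spam", sharers, weights))
```

Under these assumptions a high-volume account whose links nobody acts on ends up with almost no weight, which is the noise-filtering effect being wondered about.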

    On the issue of semantic representation, the simplest image is of the morphology of an organism: all the terms are shared, but the genetic make-up is infinitely variable.

    What exactly has hampered the creation of a common organizing principle for content? We've got the Dewey Decimal System in libraries, where all the content has to declare itself into some root category and further sub-categories.

    The Open Directory project was designed to do just that: create a common schematic for declaration and categorization of content. Meta-tags in Google are designed to accomplish the opposite, to allow the pages to intersect with the highest number of searches, even when marginally relevant.

  • newbill

    Really good post, Dan. Reading it, I couldn’t help but think of another blog that helped frame how I tackle the task of capturing home buyers’ attention with search. The Human Search Schematic is very interesting, but I think we can take it further when we temper it with “why and what” we are searching. Gord Hotchkiss dusted off some old psychological theories and applied them to how people search for things that have low brand value and high risk associations, like buying a home.

    In short, he took the concepts introduced by Herbert Simon, such as satisficing, bounded rationality and chunking, and applied them to the new wealth of information available through search engines.

    “We have never had more information available. At the click of a mouse, we can access huge amounts of information. There's simply no way we can process it all and come to rational decisions. And this brings us to another concept, that of bounded rationality. We're more rational about some decisions than others. It depends on a number of factors, including risk, emotional enjoyment and brand self identification. Think of it as a chart with three axes. One axis is risk. We put more rational thought into decisions that expose us to greater risk. In consumer decisions, risk usually equates with cost, but in B to B decisions, it could also include professional reputation (related to but not always directly tied to cost). We're going to put a lot more thought into the purchase of a car or house than that of a candy bar. Another axis is emotional enjoyment. This is a risk/reward mechanism to most decisions, and if the reward is one that is particularly appealing to us, we tend to be swayed more by emotion than rational decision. If we're planning a holiday, we may make some irrational decisions (or at least, they might appear that way to an outsider) based on a sense of rewarding ourselves. We'll treat ourselves to a few nights in a 5 star resort, when the 3 star resort would offer greater overall value. The final factor, and one that is usually buried somewhere in our subconscious, is how we use brands or products to define who we are.”

    Buying a home is probably the single biggest risk decision that a consumer will ever make, and typically the house we choose becomes our customized brand. If we subscribe to Simon’s theory, then the home buyer needs a medium that gives them all the information that they need to identify with their purchase. House style and look, community flavor, points of interest, pictures, virtual tours, peer advice, etc. (social media can aid the peer association, but that is another subject) all help the consumer emotionally connect with the “branding” of the home. It is a unique decision that allows the consumer to personalize the brand that fits best with their decision. The brand of the aggregator becomes secondary to the information that it provides. Basically, the more content they have access to in one place, the stronger their personal connection to the home and their emotional connection to the purchase.

    Take a look at http://www.outofmygord.com/archive/2007/11/09/S….

  • http://twitter.com/alltoute alltoute

    I agree, but the credibility of a user is not so different from the credibility of a web site for Google. It's more personal and thus closer to interests. It really depends on what kind of things you need to find and what kind of search you need to do: discovery, exploration, simple facts, website, opinions, etc. Social-based search has a lot of potential, but I'm pretty sure it can’t do well alone for all kinds of search. A given user may be credible and share similar interests with you, but he may also be really into lawn mowers, and you may not be :-) That's why I still believe that semantic content and interest representations are also part of the solution.
    About semantic representation, I think that it's an error to reduce the semantic web to linked data. I completely agree that ambiguity will always remain. Semantic technologies such as semantic search can also “reach semantics” using analytics techniques like text mining. Classifications are necessary and helpful in various situations, but they have limitations because they are rigid and subjective, and that's why context analysis remains a key differentiator. Once again, there is no magic solution. I'm a hybrid technologies believer, especially when we talk about text technologies.

  • http://budurl.com/twithomeoleg OlegR

    Dan,
    I absolutely agree with newbill's point.
    And this is what I meant when I mentioned social media features for ApartmentFinder.com.
    Adding some subtle but powerful social media features to each property's microsite will help visitors better connect with the property, or, as newbill puts it, with the “branding” of the home.

  • http://www.viralhousingfix.com danielrmccarthy

    Oleg,

    We're working on a couple of different ways to approach the implementation of social media on our sites. As usual, we want to make sure that we've got it well thought out and that it integrates with our overall approach. (How's that for plausible deniability?) I'm not comfortable talking about the details yet, but we're having fun trying to cook them up.

    Dan

  • http://www.viralhousingfix.com danielrmccarthy

    I wanted to re-read this comment before replying. Something about your thoughts sparked a rapid and simple series of connections that I haven't been able to really nail down. Meaning is all about context and connotation. Language is a remarkably flexible and elastic tool.

    Throughout our discussion about the potential for human-assisted search, I have imagined that tracking the exchange of content between two people would capture something of the context and intent in the communication, and that by capturing that nuance, the meaning of the content could be indexed with an understanding of relevance that transcended the information on the page.

    Your comment made me think about the powerful ways that search algorithms can catalog and categorize the incidence and usage of terms. The ability to generate relevance from that database grows with or without awareness of the quality and influence of the individual who used the terms.

    So, yes, merged technologies are the answer. And finding ways to align identity, semantic objects and traditional machine search would create a very rich search environment.

    One last thing: The lawn mowers. That's actually where I see the value of human-assisted search. My premise is that different people are expert about the most unlikely things, and that you won't find out unless you have some occasion to prick that interest. Let's say that I've developed confidence in the quality of information that a certain user distributes: it is good, well-researched and balanced content. Then, if I am searching for some information about lawn mowers, and content associated with that user pops up, I'm going to have a fairly high degree of confidence that the content is reliable. A click-thru follows. That's the human assist.
