Posted by RuthBurrReedy

Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you’re writing can check the boxes for both man and machine?

In today’s Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.

Video Transcription

Howdy, Moz fans. I’m Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO and approaching changes to our industry that thinks about SEO in the light of we are humans who are marketing to humans, but we are using a machine as the intermediary.

Those videos will be available online at some point. [Editor’s note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win.

The relationships between entities, words, and how people search

To understand how Google is currently approaching parsing content and understanding what content is about, Google is spending a lot of time and a lot of energy and a lot of money on things like neural matching and natural language processing, which seek to understand basically when people talk, what are they talking about?

This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don’t totally know what they want, and Google still wants them to get what they want because that’s how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.

The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you’ve ever seen a soap opera, you’ve noticed that they look kind of weird. Someone might be experiencing that, and not knowing what that’s called they can’t Google soap opera effect because they don’t know about it.

They might search something like, “Why does my TV look funny?” Neural matching helps Google understand that when somebody is searching “Why does my TV look funny?” one possible answer might be the soap opera effect. So they can serve up that result, and people are happy.

Understanding salience

As we’re thinking about natural language processing, a core component of natural language processing is understanding salience.

Salience, content, and entities

Salience is a one-word way to sum up to what extent is this piece of content about this specific entity? At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns, people, places, things, proper nouns, regular nouns.

Entities are things, people, etc., numbers, things like that. Google is really good at taking those out and saying, “Okay, here are all of the entities that are contained within this piece of content.” Salience attempts to understand how they’re related to each other, because what Google is really trying to understand when they’re crawling a page is: What is this page about, and is this a good example of a page about this topic?

Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It’s often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we’ve all experienced that.

You’re searching and you come to a page and you’re like, “This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn’t find what I needed. This wasn’t good information for me.” As marketers, we’re often on the other side of that, trying to get our clients to say what their product actually does on their website or say, “I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It’s a piece of content about your tool.” These are the kinds of battles that we fight as marketers.

Natural Language Processing (NLP) APIs

Fortunately, there are now a number of different APIs that you can use to understand natural language processing:

IBM has one: https :// watson/ works/ natural-language-understanding /

Google actually has a natural language processing API that’s right here on https :// natural-language /

Is it as sophisticated as what they’re using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?

So this natural language processing API, which you can try for free and it’s actually not that expensive for an API if you want to build a tool with it, will assign each entity that it can extract a salience score between 0 and 1, saying, “Okay, how sure are we that this piece of content is about this thing versus just containing it?”

So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it’s there, but they’re not sure how well it’s related.

A delicious example of how salience and entities work

The example I have here, and this is not taken from a real piece of content (these numbers are made up, it’s just an example), is if you had a chocolate chip cookie recipe, you would want chocolate cookies or chocolate chip cookies recipe, chocolate chip cookies, something like that to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.

You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it’s extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
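To make the cookie example concrete, here is a minimal Python sketch that ranks a list of extracted entities by salience. It does not call any API: the `Entity` class and `rank_by_salience` function are hypothetical names, the category labels are illustrative, and the scores are made up, just like the numbers in the example. A real tool would fill this list from an NLP API response.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str        # entity text as extracted from the content
    kind: str        # entity category label (illustrative values)
    salience: float  # score between 0 and 1


def rank_by_salience(entities):
    """Return the entities sorted from most to least salient."""
    return sorted(entities, key=lambda e: e.salience, reverse=True)


# Made-up scores for a hypothetical chocolate chip cookie recipe page.
entities = [
    Entity("butter", "CONSUMER_GOOD", 0.05),
    Entity("chocolate chip cookies", "WORK_OF_ART", 0.61),
    Entity("cookie", "OTHER", 0.12),
    Entity("sugar", "CONSUMER_GOOD", 0.04),
    Entity("350", "NUMBER", 0.02),
]

for entity in rank_by_salience(entities):
    print(f"{entity.salience:.2f}  {entity.kind:<14} {entity.name}")
```

What you would hope to see is the page’s intended topic at the top of that list, with the supporting ingredients and details trailing behind it.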

But I think that it’s really, really important for us as SEOs to understand that salience is the future of related keywords. We’re beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.

Instead what we need to understand is what are the entities that Google, using its vast body of knowledge, using things like Freebase, using large portions of the internet, where is Google seeing these entities co-occur at such a rate that they feel reasonably confident that a piece of content on one entity in order to be salient to that entity would include these other entities?

Using an expert is the best way to create content that’s salient to a topic

So chocolate chip cookie recipe, we’re now also making sure we’re adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. This is I think what we’re going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.

Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that’s about what it’s supposed to be about. I think what we’re going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.

Content marketers, I feel you on that. It sucks, and it’s no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs.

How can you use this API to improve your own SEO?

One of the things that I like to do with this kind of information (and this is something that I’ve done for years, just not in this context) is look at pages that rank for a topic, but they rank on page 2. Those are a prime optimization target in general.

What this often means is that Google understands that that keyword is a topic of the page, but it doesn’t necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it’s a good resource. In other words, the signal is there, but it’s weak.

What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they’re related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you’ll notice that while chocolate cookies is called a work of art, and I agree, cookie here is actually called other.

This is because cookie means more than one thing. There’s cookies, the baked good, but then there’s also cookies, the packet of data. Both of those are legitimate uses of the word “cookie.” Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that’s a good time to go in and do some disambiguation.

Make sure that the terms surrounding that term are clearly saying, “No, I mean the baked good, not the software piece of data.” That’s a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You’d be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.

A lot of times the API is like “I think this is what it’s about,” but it’s not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
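The two checks described here (a weak top entity, and an entity classified as “other”) could be sketched in Python like this. Everything in it is an assumption for illustration: `audit_entities` is a hypothetical helper, the 0.3 cutoff is arbitrary and not a documented threshold, and the tuples are made-up data standing in for an API response.

```python
def audit_entities(entities, min_primary_salience=0.3):
    """Return plain-language warnings for a page's extracted entities.

    `entities` is a list of (name, kind, salience) tuples. The default
    cutoff of 0.3 is an arbitrary assumption, not a documented threshold.
    """
    if not entities:
        return ["no entities extracted"]

    warnings = []

    # Check 1: is the most salient entity actually strong?
    top_name, _, top_salience = max(entities, key=lambda e: e[2])
    if top_salience < min_primary_salience:
        warnings.append(f"top entity '{top_name}' is weak ({top_salience:.2f})")

    # Check 2: anything classified as OTHER may need disambiguation.
    for name, kind, _ in entities:
        if kind == "OTHER":
            warnings.append(f"'{name}' is classified as OTHER; consider disambiguating")

    return warnings


# Made-up data for a page that ranks, but not well.
page = [
    ("chocolate cookies", "WORK_OF_ART", 0.14),
    ("cookie", "OTHER", 0.05),
    ("butter", "CONSUMER_GOOD", 0.03),
]

for warning in audit_entities(page):
    print("-", warning)
```

A report like this is only a prompt to go back and strengthen the writing; the fix is still clearer content, not chasing a number.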

Writing for humans and writing for machines, you can now do both at the same time. You no longer have to, and you really haven’t had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.

Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together.

Tips for writing for human and machine readability: Reduce semantic distances!

What I’ve done here is I did some research not on natural language processing, but on writing for human readability, that is advice from writers, from writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.

Short, simple sentences

Short, simple sentences. Write simply. Don’t use a lot of flowery language. Short sentences and try to keep it to one idea per sentence.

One idea per sentence

If you’re running on, if you’ve got a lot of different clauses, if you’re using a lot of pronouns and it’s becoming confusing what you’re talking about, that’s not great for readers.

It also makes it harder for machines to parse your content.
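One rough way to spot run-on sentences before publishing is a simple word count per sentence. The Python sketch below is only an illustration: `long_sentences` is a hypothetical helper, the 20-word limit is an arbitrary assumption, and splitting on end punctuation is a crude stand-in for real sentence detection.

```python
import re


def long_sentences(text, max_words=20):
    """Return (word_count, sentence) pairs for sentences over max_words.

    Sentences are split naively on ., ! or ? followed by whitespace.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count > max_words:
            flagged.append((count, sentence))
    return flagged


sample = (
    "Bake at 350 degrees. "
    "This sentence keeps going and going with clause after clause, which "
    "makes it hard for readers and for machines to work out what it is "
    "actually about."
)

for count, sentence in long_sentences(sample):
    print(f"{count} words: {sentence}")
```

A long sentence is not automatically a bad one, so treat anything flagged as a candidate for splitting into one idea per sentence rather than as an error.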

Connect questions to answers

Then closely connecting questions to answers. So don’t say, “What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood,” and 500 words later here’s the answer. Connect questions to answers.

What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.

If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you’ve now created content that is more readable because it’s shorter and easier to skim, but also easier for a robot to parse and understand.
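As a very rough illustration of “moving the words closer together,” the Python sketch below counts how many words sit between two terms in a sentence. This is an assumption-heavy proxy: `token_distance` is a hypothetical helper, it only handles single-word terms, and a raw word count is not how Google or any NLP API measures semantic distance.

```python
import re


def token_distance(sentence, first, second):
    """Count the words between the first mentions of two single-word terms.

    Returns None if either term does not appear in the sentence.
    """
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    if first not in tokens or second not in tokens:
        return None
    return abs(tokens.index(first) - tokens.index(second)) - 1


wordy = (
    "Cookies, which my grandmother made every single winter when I "
    "was a child, should go into an oven set to about 350 degrees."
)
tight = "Bake cookies at 350 degrees."

print(token_distance(wordy, "cookies", "350"))
print(token_distance(tight, "cookies", "350"))
```

The second sentence says the same thing about cookies and 350 degrees with far fewer words in between, which is the kind of tightening described above.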

Be specific first, then explain nuance

Going back to the example of “What is the best temperature to bake chocolate chip cookies at?” Now the real answer to what is the best temperature to bake chocolate cookies is it depends. Hello. Hi, I’m an SEO, and I just answered a question with it depends. It does depend.

That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If someone says, “Okay, Google, what is a good temperature to bake cookies at?” and Google says, “It depends,” that helps nobody even though it’s true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.

Then you can go into the details. So a better, just as correct answer to “What is the temperature to bake chocolate chip cookies?” is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crispy you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it’s a lot more specific.

It’s a lot more precise. It uses real numbers. It provides a real answer. I’ve shortened the distance between the question and the answer. I didn’t say it depends first. I said it depends at the end. That’s the kind of thing that you can do to improve readability and understanding for both humans and machines.

Get to the point (don’t bury the lede)

Get to the point. Don’t bury the lede. All of you journalists who try to become content marketers, and then everybody in content marketing said, “Oh, you need to wait till the end to get to your point or they won’t read the whole thing,” and you were like, “Don’t bury the lede,” you are correct. For those of you who aren’t familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.

Include all the information that somebody would really need to get from that piece of content. If they don’t read anything else, they read that one paragraph and they’ve gotten the gist. Then people who want to go deep can go deep. That’s how people actually like to consume content, and surprisingly it doesn’t mean they won’t read the content. It just means they don’t have to read it if they don’t have time, if they need a quick answer.

The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You’ll have a much better organized piece of content that’s easier to parse on all sides.

Avoid jargon and “marketing speak”

Avoid jargon. Avoid marketing speak. Not only is it terrible, it’s also very hard to understand. You see this a lot. I’m going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. Yes, but what does it do? It provides solutions to streamline the workflow and blah, blah. Okay, what does it do? This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it’s so important for users, for machines.

Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it’s used. That’s actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.

Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you’re also, once again, reducing the semantic distances between entities, making them easier to parse.

Organize your information to match the user journey

Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it.

Break out subtopics with headings

Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren’t doing it. So if you’re not going to do it for your users, do it for machines.

Format lists with bullets or numbers

You can also really impact skimmability for users by breaking out lists with bullets or numbers.

The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they’re the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you’re creating content that a robot can find, parse, understand, and extract, and that’s what you want.

So if you’re targeting featured snippets, you’re probably already doing a lot of these things, good job.

Grammar and spelling count!

The last thing, which I shouldn’t have to say, but I’m going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don’t count to all users, but they count to users. They also count to search engines.

Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific in things, like the “Quality Rater Guidelines,” that a well-written, well-structured, well-spelled, grammatically correct document, that these are signs of authoritativeness. I’m not saying that having a greatly spelled document is going to mean that you immediately rocket to the top of the results.

I am saying that if you’re not on that stuff, it’s probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use colloquial English. You don’t have to be perfect “AP Style Guide” all the time. But make sure that you are formatting things properly from a grammatical perspective as well as a technical perspective. What I love about all of this, this is just good writing.

This is good writing. It’s easy to understand. It’s easy to parse. It’s still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that’s about what we think it’s about.

Use these tools to understand how readable, parsable, and understandable your content is

So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I’m hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it’s about and what you think it’s about so you can create better stuff for users.

It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I’d love to hear in the comments if you’re using the natural language processing API now, if you’ve built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.

Have a great Friday.

Video transcription by
