Train Me Please

The Law of Contiguity in Dog Training: Why Timing Matters

26/4/2025


 

Training a dog can sometimes feel frustrating, especially when it seems like your dog simply isn't grasping what you are trying to teach. In many cases, the problem isn't the behaviour or even the learner — it's the timing. Understanding and applying the Law of Contiguity can make a remarkable difference in how quickly and effectively your dog learns.

The Law of Contiguity is a fundamental principle of learning which states that an association is formed between two events when they occur closely together in time. In the context of dog training, this means a dog is much more likely to associate a behaviour with a consequence if the consequence happens immediately after the behaviour.

Timing is absolutely critical. If you delay reinforcement — even by a few seconds — your dog may not understand what they did correctly. Instead, they might associate the reward with whatever they happen to be doing at the time the consequence occurs. This can cause confusion and slow learning, or worse, lead to associations you did not intend.

The same principle applies when attempting to discourage unwanted behaviour. Unfortunately, many well-intentioned dog guardians inadvertently violate the Law of Contiguity when using punishment or corrections. For instance, punishing a puppy for a toilet mistake hours after the event is not only ineffective but can also create unnecessary anxiety and fear. In such cases, because there is no close temporal proximity between the behaviour and the consequence, the puppy does not understand what they are being punished for. They may display submissive behaviours that look like 'guilt,' but these are simply reactions to the guardian’s current emotional state and body language — not an understanding of past actions. Importantly, even with impeccable timing, punishment procedures are better avoided when kinder and more effective ways are available.

Another example highlights the importance of timing when teaching new behaviours. Imagine teaching your dog to sit. If your dog sits and you deliver a reinforcer — like a treat or praise — within a second or two, they are likely to associate the sitting behaviour with the reward. However, if you wait 10, 30, or even 60 seconds before offering reinforcement, the opportunity for a clear association is lost. By then, your dog might be sniffing the floor, looking around, or engaging in a completely different behaviour, and the link between the sit and the reward disappears.
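To put rough numbers on this idea, here is a toy sketch in Python (my own illustration; the exponential shape and the two-second time constant are assumptions chosen for demonstration, not empirical values) of how the strength of a behaviour-reward association might fall away as reinforcement is delayed:

```python
import math

def association_strength(delay_s: float, tau_s: float = 2.0) -> float:
    """Toy delay-of-reinforcement gradient: the link between a behaviour
    and its consequence weakens exponentially as the delay grows.
    tau_s is an arbitrary time constant chosen purely for illustration."""
    return math.exp(-delay_s / tau_s)

for delay in (1, 2, 10, 30, 60):
    print(f"{delay:>2}s delay -> relative association {association_strength(delay):.4f}")
```

In this toy model the association is already noticeably weaker after a couple of seconds and effectively gone by 30 or 60 seconds, which matches the advice above to reinforce within a second or two.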

Tools like clickers in clicker training are designed to help bridge this gap by marking the correct behaviour at the precise moment it occurs, even if the reward follows slightly later. This way, the dog clearly understands which behaviour earned the reinforcement.


Timing also plays a crucial role when using negative punishment strategies. For instance, if a dog jumps on people, immediately withdrawing attention — such as turning away or leaving the room — can help the dog associate the jumping behaviour with the loss of social interaction. However, if attention is withdrawn only after a delay, the association becomes unclear, and learning is compromised. Importantly, while negative punishment can be an effective tool, using management strategies and teaching alternative appropriate behaviours are often more humane and effective long-term solutions.

Interestingly, the Law of Contiguity can be traced back to ancient philosophy. It was first proposed by Aristotle and has since been studied and validated by numerous philosophers and researchers. It applies not just to dogs, but to any being with a brain.

The take-home message is clear: the Law of Contiguity underscores the importance of immediate consequences in shaping behaviour. Whether reinforcing desired behaviours or discouraging unwanted ones, success depends on your ability to ensure that consequences closely follow the behaviours you want to address.

For a deeper dive into this important principle and practical tips on applying it in your training, check out the full video below.

The Clever Hans Effect: A Mathematical Behaviour Tale

10/4/2025


 

At the turn of the 20th century, a horse named Hans captivated the German public—and eventually, the world. Dubbed Clever Hans, this seemingly unremarkable horse was believed to possess astonishing intellectual abilities. Under the guidance of Wilhelm von Osten, a German mathematics instructor and amateur horse trainer, Hans was presented as an animal prodigy capable of solving arithmetic problems and even answering calendar-related questions.

According to von Osten, Hans could add, subtract, multiply, and divide. He could work with fractions, and his understanding extended beyond numbers into concepts related to dates and sequences. Whether questions were posed verbally or in writing, Hans would respond with a series of hoof taps—stopping, remarkably, at the correct number every time.

Von Osten toured with Hans across Germany, drawing large crowds wherever they went. Spectators were eager to witness what seemed like undeniable proof that animals, under the right circumstances, could possess intelligence rivalling that of humans. Hans’s performances were met with awe and curiosity, inspiring admiration as well as scepticism.

Enter Oskar Pfungst, a psychologist with a growing interest in animal behaviour and the scientific method. Sensing that there was more to Hans’s abilities than met the eye, Pfungst set out to investigate. Beginning in 1904, he conducted a series of careful, controlled experiments (published in 1907) designed to isolate the true source of the horse's apparent intelligence.

Pfungst’s experimental design was ahead of its time. He tested Hans with multiple questioners, sometimes allowing them to know the answers and other times deliberately keeping them in the dark. He varied the presentation of questions and observed the horse under different environmental conditions. Over time, a clear pattern emerged.

Hans only performed well when the questioner knew the answer. When the person posing the question was unaware of the correct response—or if visual cues were blocked—Hans could no longer produce the right number of taps. This discovery led to a breakthrough: Hans wasn’t doing maths at all. Instead, he was reading incredibly subtle signals from humans—tiny changes in posture, shifts in facial expression, even minute muscle movements.

Pfungst’s findings dismantled the myth of Hans the mathematician but revealed something arguably more fascinating: the horse’s incredible sensitivity to human body language. This became known as the Clever Hans Effect, a term that remains relevant in scientific and training communities to this day. It describes the way an animal (or human) can unconsciously respond to involuntary cues provided by another individual, especially during testing or training.

What made Pfungst’s contribution so valuable wasn’t just the outcome—it was the methodology. His meticulous attention to experimental controls set a precedent for behavioural science. By demonstrating how unintentional cues could skew results, he highlighted the critical importance of controlling for observer bias in any investigation involving live subjects.


The Clever Hans Effect has wide-ranging implications. In scientific research, particularly in the study of behaviour and cognition, it has prompted the widespread use of double-blind procedures. These ensure that neither the experimenter nor the subject knows which condition is being tested, helping prevent the kind of unconscious cueing that fooled the world in Hans’s case.

Even outside the lab, the effect plays a significant role. For example, when training drug-sniffing dogs, it’s essential that handlers are unaware of which containers contain contraband. If they know, they might—without realising—give away the location through a glance, a shift in stance, or a change in breathing. The dog, attuned to its human partner, may pick up on that signal and indicate a "find" based on human behaviour rather than scent detection.

This story also holds important lessons for those of us working closely with animals, whether as trainers, behaviour consultants, or curious observers. It reminds us to reflect carefully on what our animals are responding to and whether we might be shaping behaviour unintentionally. It challenges us to be more precise in our training, more thoughtful in our observations, and more humble in our assumptions.

Above all, the tale of Clever Hans is a powerful example of the scientific process in action. What began as a sensation built on anecdotal performance became, through careful investigation, a case study in critical thinking and experimental rigour. It urges us to meet extraordinary claims with healthy scepticism and to ask deeper questions about the mechanisms behind what we see.

So, the next time someone shares a story about an animal with seemingly supernatural abilities, take a moment to think of Hans. Let curiosity lead the way—but don’t forget the value of cautious inquiry and the importance of good experimental design.

To see the full story brought to life, check out the video on my YouTube channel, Train Me Please.



Reference
Bellows, A. (2007, February). Clever Hans the Math Horse. Damn Interesting. https://www.damninteresting.com/clever-hans-the-math-horse/

Photo reference
The “Clever Hans Phenomenon” revisited - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/clever-Hans-an-Orlov-trotter-horse-1895-1916-and-his-owner-and-teacher-Wilhelm-von_fig1_260376462 [accessed 25 Feb, 2024] 

Operant Conditioning in Dog Training: A Fresh Perspective

4/3/2025


 
Operant conditioning is a foundational concept in modern dog training, yet its depth and complexity are often under-appreciated. This article delves into its historical development, its relationship with classical conditioning, and its crucial role in behaviour modification.


The Origins of Operant Conditioning


While Ivan Pavlov was pioneering research into classical conditioning, psychologist Edward L. Thorndike was investigating how animals solve problems. He formulated the "Law of Effect," which states that behaviours followed by pleasant consequences are likely to be repeated, whereas those followed by unpleasant consequences are likely to diminish.


Building on Thorndike’s work, B.F. Skinner introduced the concept of operant conditioning in 1937. He argued that organisms learn responses by interacting with their environment, with behaviour being modified through its consequences. This laid the groundwork for applied behaviour analysis and modern training methodologies.


The Interplay Between Classical and Operant Conditioning


Although classical and operant conditioning are often discussed separately, they are deeply interconnected. In reality, distinguishing between the two is a simplification for analytical purposes. Both processes occur simultaneously: when we shape behaviour using operant conditioning, we also influence the learner’s emotional state through classical conditioning. This underscores the importance of considering both aspects in training.


Contingencies and the ABCs of Behaviour


For operant conditioning to occur, there must be a contingency—a causal relationship between behaviour and environmental consequences. The most fundamental form is a two-component contingency: for example, a dog sits and receives a treat, or a dog barks and another dog moves away.


In applied behaviour analysis, a more comprehensive model is the three-component contingency:


  • Antecedent: The cue or situation prompting the behaviour.
  • Behaviour: The dog’s response.
  • Consequence: The outcome that influences future behaviour.


For example, if a dog sees another dog too close (antecedent), barks (behaviour), and the other dog moves away (consequence), the barking behaviour is likely to be reinforced. Context plays a crucial role, as external factors can influence whether a behaviour occurs. For instance, a dog’s recall response may vary depending on environmental distractions.
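For readers who find it helpful to see the model written out explicitly, here is a minimal sketch (my own illustration, not a formal notation from applied behaviour analysis) of the three-component contingency as a small data record in Python:

```python
from dataclasses import dataclass

@dataclass
class Contingency:
    """One three-component (A-B-C) unit of learning."""
    antecedent: str   # the cue or situation prompting the behaviour
    behaviour: str    # the learner's response
    consequence: str  # the outcome that influences future behaviour

# The barking example from above, written as an A-B-C record:
barking = Contingency(
    antecedent="another dog appears too close",
    behaviour="barking",
    consequence="the other dog moves away (likely reinforcing)",
)
print(barking)
```

Thinking in these three slots makes it easier to spot which part of the picture you can change: the antecedent (management), the behaviour (teaching an alternative) or the consequence (reinforcement).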


The Nature of Operant Behaviour


Operant behaviour consists of voluntary actions that an organism performs in response to environmental stimuli. These behaviours are defined by their consequences, which can be classified into five categories:


  • Positive Reinforcement: Adding a stimulus to increase behaviour frequency (e.g., giving a treat for sitting).
  • Negative Reinforcement: Removing a stimulus to increase behaviour frequency (e.g., releasing pressure on a dog’s bottom when they sit).
  • Positive Punishment: Adding a stimulus to decrease behaviour frequency (e.g., yelling at a dog for jumping, if jumping decreases).
  • Negative Punishment: Removing a stimulus to decrease behaviour frequency (e.g., walking away when a puppy bites, if biting decreases).
  • Extinction: Eliminating reinforcement to reduce a behaviour (e.g., ignoring begging at the table to decrease the behaviour).


It is important to note that reinforcement and punishment are defined by their effects on behaviour, not by intent. Additionally, the terms "positive" and "negative" refer to the addition or removal of stimuli, not to their pleasantness.
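Because the five categories are defined purely by what was added or removed and what happened to the behaviour, they can be captured in a few lines of code. Here is a small sketch (my own illustration) that names the procedure from those two observations:

```python
from typing import Optional

def classify_consequence(stimulus_added: Optional[bool], behaviour_increases: bool) -> str:
    """Name the operant procedure from its observed effect.

    stimulus_added: True if a stimulus was added, False if one was removed,
    None if an expected reinforcer was simply withheld (extinction).
    behaviour_increases: whether the behaviour becomes more frequent over time.
    """
    if stimulus_added is None:
        return "extinction"
    if behaviour_increases:
        return "positive reinforcement" if stimulus_added else "negative reinforcement"
    return "positive punishment" if stimulus_added else "negative punishment"

# Giving a treat for sitting, and sitting becomes more frequent:
print(classify_consequence(True, True))    # -> positive reinforcement
# Walking away when a puppy bites, and biting decreases:
print(classify_consequence(False, False))  # -> negative punishment
```

Note that the function takes the observed effect on behaviour as an input: exactly as stated above, the labels follow from results, not from the trainer's intent.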


Understanding Negative Reinforcement


One of the most misunderstood concepts in operant conditioning is negative reinforcement. It involves increasing a behaviour by removing an aversive stimulus. However, for something to be removed, it must first be introduced—this often results in a combination of positive punishment and negative reinforcement.


A classic example involves training a horse to turn using a bit. When pressure is applied to the reins, the horse experiences discomfort. As soon as the horse turns, the pressure is released. The turning behaviour is negatively reinforced (because turning removes the pressure), while the behaviour of moving forward in a straight line is positively punished (because an aversive stimulus was added when the horse failed to turn).


The Implications of Extinction


Extinction occurs when a behaviour that was previously reinforced is no longer reinforced, leading to a decrease in its occurrence. However, this process is often accompanied by an extinction burst, where the behaviour temporarily intensifies before fading. Even after the behaviour has faded, spontaneous recovery can occur, with the behaviour briefly reappearing after a break from the situation. This unpredictability makes extinction a less reliable behaviour modification tool compared to reinforcement-based strategies.


The Risks of Positive Punishment


Scientific evidence strongly advises against using positive punishment as a primary training method. It carries a high risk of negative side effects, including fear, aggression, and damage to the trainer-dog relationship. Furthermore, it often fails to provide the learner with an alternative behaviour to perform, making reinforcement-based approaches more effective and ethical.


Final Thoughts


Operant conditioning is an essential framework for understanding and modifying behaviour. However, its application requires careful consideration of contingencies, reinforcement schedules, and the broader emotional impact on the learner. By leveraging reinforcement and minimising punishment, we can create ethical, effective training strategies that benefit both dogs and their trainers.


To explore these concepts further, watch our YouTube video on operant conditioning and download our digital handout for additional insights and practical applications.


Digital Handouts available:

Operant Conditioning Quadrants and Extinction https://www.buymeacoffee.com/trainmeplease/e/130745
Flowchart of Basic Operant Conditioning Procedures
https://www.buymeacoffee.com/trainmeplease/e/130754


References:
O'Heare, J. (2010) Changing Problem Behavior. BehaveTech Publishing.

Classical Conditioning in Dogs: The Science of Behavioural Associations

13/2/2025


 

Classical conditioning is one of the foundational principles of learning that governs behaviour across species, including dogs and humans. First identified by the Russian physiologist Ivan Pavlov in the late 19th and early 20th centuries, this process explains how associations between stimuli shape behavioural responses. Understanding classical conditioning is crucial for dog trainers, pet guardians, and behaviour professionals, particularly when addressing emotional responses such as fear and excitement in dogs.


Pavlov’s Discovery and the Basics of Classical Conditioning

Ivan Pavlov’s research initially focused on physiological processes, specifically salivation in dogs. During his experiments, he observed that his dogs would begin to salivate before the food was presented, merely in anticipation of the meal. This unexpected response led him to investigate the mechanisms behind it, eventually formulating the principles of classical conditioning.

Although popular explanations often refer to Pavlov using a bell in his experiments, he actually used a metronome. The bell has become a common example because it simplifies the explanation.

Pavlov identified several key components in this process:
  • Unconditioned Stimulus (US): A stimulus that naturally elicits a response (e.g., food).
  • Unconditioned Response (UR): The instinctive reaction to the unconditioned stimulus (e.g., salivation when food is present).
  • Neutral Stimulus (NS): A stimulus that initially has no effect on behaviour (e.g., the sound of a bell before conditioning).
  • Conditioned Stimulus (CS): The formerly neutral stimulus that, after repeated association with the unconditioned stimulus, elicits a response on its own (e.g., the bell after conditioning).
  • Conditioned Response (CR): The learned response to the conditioned stimulus (e.g., salivation at the sound of the bell, even in the absence of food).

This process highlights how an originally neutral stimulus can acquire meaning through association, leading to predictable behavioural outcomes.
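For those who enjoy a more formal view, this gradual transfer of meaning can be sketched with the Rescorla-Wagner model, a later mathematical formalisation of Pavlovian learning (my choice of illustration; it is not something Pavlov himself used). On each pairing, the associative strength V of the conditioned stimulus grows by a fraction of the remaining "surprise":

```python
def pavlovian_pairings(trials: int, learning_rate: float = 0.3, v_max: float = 1.0):
    """Rescorla-Wagner sketch: associative strength of the CS (the bell)
    after repeated pairings with the US (food). The learning rate and
    maximum strength are arbitrary values chosen for illustration."""
    v, history = 0.0, []
    for _ in range(trials):
        v += learning_rate * (v_max - v)  # each pairing closes part of the gap
        history.append(round(v, 3))
    return history

print(pavlovian_pairings(8))
# [0.3, 0.51, 0.657, 0.76, 0.832, 0.882, 0.918, 0.942]
```

The curve rises quickly at first and then levels off, which mirrors how a neutral stimulus gradually becomes a reliable conditioned stimulus over repeated pairings.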


The Role of Classical Conditioning in Dog Training

Classical conditioning is highly relevant in dog training, particularly in shaping emotional responses and addressing behavioural issues. Many of the associations dogs form with their environment stem from classical conditioning, influencing how they react to people, objects, and experiences.

Common Examples in Everyday Life
  1. Leash Excitement: If a dog consistently experiences a walk after seeing their guardian pick up the leash, the sight of the leash alone will begin to elicit excitement.
  2. Feeding Cues: The sound of a can opener may prompt a cat to run to their food bowl, anticipating their meal due to repeated associations.
  3. Fear of Nail Clippers or Syringes: If a dog experiences discomfort every time their nails are clipped or they receive an injection, they may begin to fear these objects even before anything happens, as the clippers or syringe become conditioned stimuli for discomfort.

These examples illustrate how classical conditioning operates in daily interactions with animals, often shaping their emotional states without deliberate training efforts.

Emotional Responses and Behaviour Modification

Classical conditioning plays a significant role in addressing behavioural concerns rooted in emotions, such as fear and anxiety. A fearful reaction to a raised hand, for example, may result from past punishment, leading the dog to associate a raised hand with negative experiences. Similarly, a dog who receives treats and praise when meeting new people may develop a positive emotional response toward social interactions.
Understanding and applying classical conditioning principles can help modify problematic behaviours by replacing negative associations with positive ones. This process, known as counterconditioning, is often used in behaviour modification strategies to reduce fear-based responses in dogs.


Classical vs. Operant Conditioning

It is important to distinguish between classical and operant conditioning. Classical conditioning involves associations between stimuli and involuntary responses, while operant conditioning deals with voluntary behaviours and their consequences. However, in real-world training, these two learning processes often occur simultaneously.
For example, when teaching a dog to sit, the act of sitting is reinforced (operant conditioning), but the emotions associated with training—whether positive or negative—are influenced by classical conditioning. A positive training experience leads to a dog feeling relaxed and eager to participate, while aversive methods may induce anxiety or fear, creating negative associations with training sessions.


Conclusion

Classical conditioning, also known as Pavlovian or respondent conditioning, is a powerful mechanism that shapes behaviour in dogs and humans alike. From basic reflexes to complex emotional responses, this scientific principle provides invaluable insight into learning and behaviour. By applying classical conditioning thoughtfully in training, pet guardians and trainers can create positive associations that enhance their dogs' well-being and foster better communication.
​
To explore this topic further and see how classical conditioning shapes behaviour, watch the full video below.

The Problem with Flooding in Dog Behaviour Modification

1/2/2025


 
Flooding as a dog behaviour modification procedure is almost always a terrible idea. Yet, it remains a popular approach in dog TV shows and online content. Unfortunately, people searching for information on dog training—particularly about reactivity and aggression—are frequently exposed to methods that rely on flooding.
This is both disappointing and unsurprising.


The Influence of Media and Popular Culture

In the early 2000s, the TV show Fear Factor captivated audiences by forcing participants to confront their deepest fears—whether that involved snakes, spiders, or extreme heights. The show had no real therapeutic value; in fact, the psychological impact of these forced exposures ranged from negligible to outright harmful. However, it made for commercially appealing content.
The same principle applies to many popular dog training shows. The promise of a dramatic transformation in just minutes makes for compelling viewing, but what is really happening?
In many cases, these so-called transformations fall under the umbrella of flooding—a procedure that is not only ineffective but also often misapplied.


What Is Flooding in Dog Training?

Flooding is a technique in which a dog is exposed to a feared stimulus at full intensity while being prevented from escaping or avoiding it. The idea is that the dog will eventually stop exhibiting fear responses.
For example:
  • A dog afraid of loud noises might be confined to a room while loud sounds play, with no option to leave.
  • A dog fearful of water might be forced into a pool and prevented from escaping.
  • A reactive dog may be placed close to other dogs while escape behaviours like barking and lunging are suppressed.
In theory, once the dog stops reacting, it has "overcome" its fear. In reality, this process often leads to significant behavioural and emotional harm.


The Problems with Flooding

More Ethical and Effective Alternatives Exist
Ethical and scientifically sound methods, such as desensitisation and counterconditioning, achieve long-lasting behaviour change without unnecessary distress.

Incorrect Application
Even in human clinical settings, flooding is rarely used, and when it is, it is carefully controlled. The aversive stimulus must be removed once the fear response ceases. In dog training, this condition is almost never met. Instead, dogs are often forced into overwhelming situations where their fear is intensified rather than reduced.

Risk of Learned Helplessness
Flooding can lead to a state known as learned helplessness, where the dog stops responding because it has learned that its actions have no impact on the environment. This can generalise beyond the specific fearful situation, leading to an overall suppression of behaviour.

Potential for Lasting Psychological and Physiological Damage
Studies indicate that exposure to high-stress situations can cause long-term emotional and physiological side effects. Since animals cannot verbally communicate their distress, it is difficult to assess when emotional harm has occurred, making flooding a highly risky approach.


Flooding in a Clinical Context

In human psychology, exposure therapy is sometimes used for PTSD, OCD, and anxiety disorders. However, this process is overseen by highly trained professionals in controlled environments. The same level of expertise is rarely present in dog training, where flooding is often applied by self-taught trainers with no formal education in behaviour science.
​
If you needed help overcoming a serious fear, would you trust someone without formal training? The same consideration should apply to dogs.


What’s Really Happening in Popular Dog Training Shows?

When we see a dog that appears "cured" after a short, intense exposure, it is often not because the fear has been resolved. More likely, the dog has entered a state of learned helplessness. To the untrained eye, the dog looks calm, but in reality, it has simply stopped trying to escape because it has given up.

A Better Alternative: Desensitisation and Counterconditioning


For almost every case, desensitisation and counterconditioning are far superior to flooding. These methods involve gradually exposing the dog to a feared stimulus at a low intensity while pairing it with positive experiences. This allows the dog to form new, positive associations and adapt without overwhelming fear.
​
In my years of experience training dogs, I have yet to encounter a case where flooding was the best or even a reasonable option. Not only that, but I frequently consult with or refer cases to professionals who specialise in behaviour modification, ensuring that dogs receive the best possible care.


Final Thoughts

While dramatic, flooding-based transformations may make for good TV, they do not make for ethical or effective dog training. If you are facing behavioural challenges with your dog, seeking guidance from a qualified professional with formal education should be your first step. Ethical, science-based approaches not only produce better results but also protect your dog’s well-being.

If you’d like to learn more about how to use desensitisation and counterconditioning effectively, I have a video on my YouTube channel covering these techniques in detail. I’ll link it below.

By choosing humane, science-based methods, we can ensure that dogs receive training that is both effective and compassionate.


Schedules of Reinforcement in Animal Training

1/2/2018


 
There are several options in terms of reinforcement schedules that can be used for behaviour modification. In this text I will provide a quick description of each of the simple schedules and a couple of examples for each (one human example and one animal training example). I will also offer some considerations for people deciding which schedule to use in a given situation.
 
Early in my career I was told that, in general, a good way to go about training animals would be to use a continuous schedule of reinforcement for teaching a new behaviour and to then maintain the behaviour using a “variable schedule of reinforcement”. This is a very broad statement and one that seems to make sense to someone being introduced to animal training. However, is this really the best way to go about training animals? And what do people mean when they mention “a variable schedule of reinforcement”? Let’s start by defining the most common types of simple schedules of reinforcement according to Paul Chance’s book Learning and Behavior (2003; figure 1).
Figure 1 – The most common types of simple reinforcement schedules
 
​
The simplest type of reinforcement schedule is a Continuous reinforcement schedule. In this case every correct behaviour that meets the established criteria is reinforced. For example, the dog gets a treat every time it sits when asked to do so; the salesman gets paid every time he sells a book.
 
Partial Schedules of reinforcement can be divided into Fixed Ratio, Variable Ratio, Fixed Interval and Variable Interval.
 
In a Fixed Ratio reinforcement schedule, the behaviour is reinforced after a set number of correct responses has occurred. For example, the dog gets a treat after sitting three times (FR 3); the salesman gets paid when four books are sold (FR 4).
 
In a Variable Ratio reinforcement schedule, the behaviour is reinforced when a variable number of correct responses has occurred. This number varies around a given average. For example, the dog gets a treat after sitting twice, after sitting four times and after sitting six times. The average in this example is four, so this would be a VR 4 schedule of reinforcement. Using our human example, if the salesman gets paid after selling five, fifteen and ten books he would be on a VR 10 schedule of reinforcement, given that ten is the average number around which his payments are offered.
 
In a Fixed Interval reinforcement schedule, the first correct response after a fixed amount of time has elapsed is reinforced. For example, if a dog is on an FI 8 schedule of reinforcement it will get a treat the first time it sits, but sitting will not produce treats for the next 8 seconds. After the 8-second period, the first sit will produce a treat again. The salesman will get paid after selling a book but will not be paid for books sold during the next 3 hours. After the 3-hour period, the first book he sells results in the salesman getting paid again (FI 3).
 
In a Variable Interval reinforcement schedule, the first correct response after a variable amount of time has elapsed is reinforced. The amount of time varies around a given average. For example, instead of always reinforcing the sit behaviour after 8 seconds, that behaviour could be reinforced after 4, 8 or 12 seconds. In this case the average is 8, so it would be a VI 8 schedule of reinforcement. The salesman could be paid when selling a book after 1, 3 or 5 hours, a VI 3 schedule of reinforcement.
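If it helps to see these definitions procedurally, here is a small simulator sketch (my own illustration; the way the variable targets are randomised below is an arbitrary choice that simply averages out near the nominal value):

```python
import random

def make_schedule(kind: str, value: float):
    """Return a function check(now) -> bool for one simple schedule.

    kind: 'CRF' (continuous), 'FR', 'VR', 'FI' or 'VI'.
    value: the ratio (number of responses) or the interval (seconds).
    """
    state = {"responses": 0, "last_time": float("-inf"), "target": value}

    def check(now: float) -> bool:
        if kind == "CRF":                      # every correct response pays
            return True
        if kind in ("FR", "VR"):               # ratio schedules count responses
            state["responses"] += 1
            if state["responses"] >= state["target"]:
                state["responses"] = 0
                if kind == "VR":               # next target varies around the mean
                    state["target"] = random.randint(1, 2 * int(value) - 1)
                return True
            return False
        # FI / VI: only the first response after the interval elapses pays
        if now - state["last_time"] >= state["target"]:
            state["last_time"] = now
            if kind == "VI":
                state["target"] = random.uniform(0.5, 1.5) * value
            return True
        return False

    return check

vr4 = make_schedule("VR", 4)
print([vr4(t) for t in range(12)])  # roughly one True per four responses
```

Running the VR example a few times shows the defining property of ratio schedules: reinforcement depends on counts of responses, while the interval schedules depend on the clock.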
 
The next question would be “How do the different schedules of reinforcement compare to each other?”. Kazdin (1994) argues that a continuous schedule of reinforcement or at the very least a “generous” schedule of reinforcement is ideal when teaching new behaviours. After a behaviour has been learned, the choice of which type of reinforcement schedule to use becomes somewhat more complex. Kazdin also mentions that behaviours maintained under a partial schedule of reinforcement are more resistant to extinction than behaviours maintained under a continuous schedule of reinforcement. The thinner the reinforcement schedule for a certain behaviour, the more resistant to extinction that behaviour is. In other words, the learner gives more responses for fewer reinforcers under partial schedules when compared to a continuous schedule of reinforcement.
 
As figure 2 shows, in general, a variable ratio schedule produces more responses for a similar or lower number of reinforcers than other partial schedules of reinforcement. In many situations it also seems to produce those responses faster and with little latency from the individual. This information, along with my own personal observations and communication with professionals in the field of animal training, makes me believe that when trainers use the broad term “Variable Schedule of Reinforcement” they usually mean a variable ratio schedule.
Figure 2 – Behaviour responses under the most common types of partial schedules of reinforcement (Chance, 2003; Kazdin, 1994; Schunk, 2012).
 

A variable ratio schedule might elicit the highest response rate, a constant pattern of responses with minimal pauses and the most resistance to extinction. A fixed ratio schedule has a slightly lower response rate, a steady pattern of responses and a resistance to extinction that depends on the ratio used. A fixed interval schedule produces a moderate response rate, a long pause in responding after reinforcement followed by gradual acceleration in responding, and a resistance to extinction that depends on the interval chosen (the longer the interval, the more resistance). A variable interval schedule has a similar response rate, a steady pattern of responses and is more resistant to extinction than a fixed interval schedule. These characteristics of partial schedules of reinforcement are summarised in table 1.
 
Table 1 – Characteristics of the most common types of partial schedules of reinforcement (Wood, Wood & Boyd, 2005).

With all these different types of schedules, each with its own characteristics, you might be wondering: “Do I need to master all of these principles to successfully train my pet at home?” The quick and simple answer is “No, you don’t”. For most animal training situations, a continuous schedule of reinforcement will be a simple, easy and effective tool that will yield the results you want.
 
Doing a training session with your dog in which you ask for behaviours on cue when the dog is in front of you (sit, down, stand, shake, play dead) could be very well maintained using a continuous schedule. A continuous schedule of reinforcement would be an efficient and easy approach and it would allow you to change the cue or stop a behaviour easily (faster extinction) if you change your mind about a given behaviour later. One could argue that a variable ratio schedule would possibly produce more responses with less reinforcement, and a higher resistance to extinction for these behaviours. One of the disadvantages of this option would be the possibility of a ratio strain (post-reinforcement pauses or decrease in responding).
 
Some specific situations might justify the maintenance of a behaviour using partial schedules of reinforcement. For example, when a dog has learned that lying down on a mat in the living room results in reinforcement, the dog’s carer could maintain this behaviour using a variable interval schedule of reinforcement, in which the dog only gets reinforced after varying amounts of time for lying on the mat. Martin and Friedman (2011) offer another example in which partial reinforcement schedules could be helpful. If a trainer wants to train a lion to make several trips to a public viewing window throughout the day, the behaviour should be trained using a continuous schedule to get a high rate of window passes in the early stages. The trainer should then use a variable ratio schedule of reinforcement to maintain the behaviour. They do advise, however, that this would require “careful planning to keep the reinforcement rate high enough for the lion to remain engaged in the training”.
 
The process of extinction of a reinforced behaviour means withholding the consequence that reinforces the behaviour, and it is usually followed by a decline in the presentation of that behaviour (Chance, 2003). Resistance to extinction can be an advantage or a disadvantage depending on which behaviour we are considering. For example, one could argue that a student paying attention to their teacher is a behaviour that should be resistant to extinction, and so a good candidate to be kept on a partial schedule of reinforcement. On the other hand, a dog that touches a bell to go outside could be kept on a continuous schedule of reinforcement. One advantage of this approach is that, if in the future the dog’s owner decides she no longer wants the dog to touch the bell, withholding reinforcement should make the behaviour fade relatively quickly.
 
While I do believe that for certain specific situations, partial schedules of reinforcement might be helpful, I would like to take a moment to caution against the use of a non-continuous pairing of bridge and backup reinforcer. Many animal trainers call this a “variable schedule of reinforcement” when in practical terms this usually ends up being a continuous reinforcement schedule that weakens the strength and reliability of the bridge. For more information on this topic check my blog post entitled “Blazing clickers – Click and always offer a treat?”.
 
When asked about continuous vs. ratio schedules, Bailey & Bailey (1998) have an interesting general recommendation: “If you do not need a ratio, do not use a ratio. Or, in other words, stick to continuous reinforcement unless there is a good reason to go to a ratio”. They also describe that they have trained and maintained numerous behaviours with a wide variety of animal species using exclusively a continuous schedule of reinforcement. They raise some possible complications when deciding to have a behaviour maintained on a ratio schedule. The example given is of a dog’s sit behaviour being maintained on a FR 2 schedule of reinforcement: “You tell the dog sit – the first response is a bit sloppy, the second one is ok. You click and treat. What have you reinforced? A sloppy response, chained to a good response.”
 
Karen Pryor (2006) also has an interesting view on this topic. She mentions that during the early stages of training a new behaviour you start by using a continuous schedule of reinforcement to get the first few responses. Then, when you decide to improve the behaviour and raise criteria, the animal is put on a variable ratio schedule, because not every response is going to result in reinforcement. This is an interesting point, because the trainer could look at this situation and still read it as a continuous schedule of reinforcement, when in reality the animal is producing responses that are not resulting in reinforcement. At this point in time, only our new “correct responses” will result in reinforcement. From the learner’s point of view the schedule has become variable at this stage. Pryor concludes that when the animal “is meeting the new criterion every time, the reinforcement becomes continuous again.”
 
Pryor (2006) suggests that the situations in which you should deliberately use a variable ratio schedule of reinforcement are: “in raising criteria”, when “building resistance to extinction during shaping” and “for extending duration and distance of a behaviour”. Regarding the situations in which we should not use it, she starts by saying that we should never use a variable ratio schedule purely as “a maintenance tool”. She adds that “behaviours that occur in just the same way with the same level of difficulty each time are better maintained by continuous reinforcement”. Pryor also advises against the use of a variable ratio schedule for maintaining chains, because “failing to reinforce the whole chain at the end of it would inevitably lead to pieces of the chain beginning to extinguish down the road.” Finally, she does not recommend using such a schedule of reinforcement for discrimination problems such as scent, match to sample tasks, or any other training that requires choice between two or more items.
 
In conclusion, there are a few possible schedules of reinforcement that can be effectively used to train and maintain trained behaviours for our pets. Each has its own set of characteristics, but for most training situations, a continuous schedule of reinforcement is a simple, efficient and powerful tool to effectively communicate with our pets. Some specific training situations might be good candidates for partial schedules of reinforcement. In those situations, you should remember to follow each bridge with a backup reinforcer, plan your training well and keep the reinforcement rate high enough for the animal to remain engaged. Have fun with your training!
 


Bailey, B., & Bailey, M. (1998). "Clickersolutions Training Articles - Ratios, Schedules - Why and When". Clickersolutions.com. Accessed 2 February 2018.

Chance, P. (2003). Learning and behavior (5th ed.). Belmont: Thomson Wadsworth.

Kazdin, A. (1994). Behavior modification in applied settings (5th ed.). Belmont: Brooks/Cole Publishing Company.

Martin, S., & Friedman, S.G. (2011, November). Blazing clickers. Paper presented at the Animal Behavior Management Alliance conference, Denver, CO.

Pryor, K. (2006). Reinforce Every Behavior? Clickertraining.com. Retrieved 2 February 2018, from https://clickertraining.com/node/670

Schunk, D. (2012). Chapter 3: Behaviorism. In Learning theories: An educational perspective (6th ed., pp. 71-116). MA: Pearson.

Wood, S., Wood, E., & Boyd, D. (2005). The world of psychology (5th ed., pp. 180-190). Boston: Allyn & Bacon. Retrieved from http://www.pearsonhighered.com/samplechapter/0205361374.pdf

Picture: www.morguefile.com

Jackpots in Animal Training

15/6/2017


 

In 2009, shortly after I started training animals on a more ongoing basis, one of the first concepts that I learned from fellow animal training colleagues was the concept of a jackpot. A quick Google search yields the following definition: “a large cash prize in a game or lottery, especially one that accumulates until it is won.” For animal training purposes the following definition is more commonly used: “giving a dog a really big reward, often a large number of treats, all at once. It is usually reserved for a breakthrough moment or a desired behaviour that the dog only occasionally performs” (Schwarz, 2016). This is the definition I was first exposed to, and the idea behind it is that by offering a large reward, the behaviour that preceded it is somehow more likely to be remembered and repeated in the future.
 
My first contact with this concept was in a context in which the animals were trained using a bridge or bridging stimulus (e.g. a click from a clicker) for both learning and maintaining known behaviours. Known behaviours do not necessarily need a bridge to be maintained, but that is a topic for another discussion. The theory goes that if Fido gets a click and three pieces of food for a perfect sit and a click and only one treat for a decent, but not perfect sit, he will be more likely to do perfect sits in the future.
 
We might be inclined to assume that an animal will be tuned in to the magnitude or quality of the reinforcer in a way that makes some variations of the same behaviour more likely to be repeated than others. However, is that really what happens? Does the animal actually remember the topography of that behaviour better because she got 5 or 10 food treats after the click instead of the standard one treat? In this text we will explore the function and the best use of jackpots in animal training, drawing on the views defended by animal training professionals.
 
Jackpots are commonly used as a special reward for excellent behaviours. They are an attempt by the trainer to capitalise on a behaviour (or a variation of the behaviour) that the trainer particularly likes. This seems to be based on the assumption that a particular special reward will increase the chances of similar responses in the future. For example, Kazdin (1994) mentions that “The greater the amount of the reinforcer delivered for a response, the more frequent the response will be.” However, research confirming this rationale regarding animal training, with a bridging stimulus, is hard to find. If you click and pay more than one treat, there are a few things happening that are helpful for your training program, but those things might differ from the traditional interpretation of jackpots. So, let’s start by having a look at some quotes by international references in the world of animal training and how they contrast with the common understanding of jackpots in animal training.
 
“A jackpot serves to charge up future performance but does little to communicate to the animal that his previous actions were special.” (Reid, 2012).
 
“If you click, and then deliver the treat afterwards, an especially large, numerous, or wonderful treat is no different from any other treat, in terms of its ability to reinforce behavior.” (Pryor, 2006).
 
“Click means treat is coming. If the treat is sometimes a kibble and sometimes chicken, sometimes small and sometimes huge, that's fine, it keeps your clicker nice and strong; but it doesn't tell the animal anything different about the behavior.” (Pryor, 2006).
 
“When it comes to training a new behavior, it's rare that a jackpot would work in having the dog repeat the jackpot earning behaviour.” (Fisher, 2009).
 
“Jackpots make the giver feel good, but they interrupt the flow of training and focus the dog on the food, rather than the task. (…) Overall, it's clarity of criteria and a consistently high rate of reinforcement that leads to a solid behavior.” (Alexander, 2006).
 
As you can see from these quotes, several animal training specialists suggest a different interpretation of what really happens when we bridge a behaviour and offer a bigger reward afterwards. Let’s explore their rationale and look into what really happens when we use such an approach.
 
Clicking and paying several treats can increase the value of the clicker. Given that there is some variability regarding what happens after the click, that stimulus (the click) remains nice and strong from the animal’s perspective (Pryor, 2006). When a large reward is offered in the beginning of a training session it can motivate the animal and increase interest in the task. It can make the animal increase its activity level and it can trigger subsequent variable behaviour (Fisher, 2009). So, as you can see, offering several rewards after the click can actually accomplish a few handy things. These are some of the things that happen when we use the traditional interpretation of jackpots in animal training. Now let’s have a look at a few things that do not necessarily happen.
 
Clicking and offering several treats does not provide the animal with any additional information about the behaviour that she just did. Offering more than one treat after the click is also unlikely to strengthen a behaviour over another; what ultimately accomplishes that goal is when you choose to use your clicker: clicked and rewarded behaviours are more likely to occur in the future when compared to behaviours that do not get a click and reward. For many practical situations in which we are training our pets, offering several treats after the click simply tells the animal that sometimes it gets more treats than usual (Pryor, 2006; Farricelli, 2014).
 
For her Master’s thesis, dog trainer Elizabeth Kershaw (2002) conducted a dog training experiment that tried to measure the effects of magnitude of reinforcement after the click when dogs are learning a new task. She had two groups of dogs learning to touch a cone with their nose and with their paw. One group progressed through criteria with one click and one treat all the time (constant group), whilst the other group progressed through criteria with one click and one treat most of the time and an occasional click followed by larger reinforcement amounts (jackpot group). Overall, significant differences in performance between the two groups could not be detected.
 
Kershaw (2002) also mentions that using a jackpot to reward a breakthrough when the dog is learning a new behaviour might be a better option when it marks the end of the session. Using a jackpot halfway through a session, when you intend to continue immediately, might be counter-productive: a longer period of eating can disrupt the flow of learning, so the dog may fail to associate the larger reward with the intended behaviour.
 
Fisher (2009) offers some interesting additional considerations about the traditional use of jackpots in dog training. She mentions that the longer it takes for the animal to eat the reward, the more the behaviour might be subject to memory decay (a disconnect between the reward and the behaviour that caused it). Instead of strengthening a behaviour, jackpotting can prompt the dog to follow it with a different behaviour. For speedy learning, a short time span between the reward and the next repetition might be ideal, and training will progress faster with a rapid rate of reinforcement (many repetitions, each resulting in quick-to-ingest treats).
 
So, what if we still want to incorporate jackpots in our animal training sessions? What are the properties of a real jackpot? A real jackpot should function as an event marker (no bridge required) and almost startle the animal; it should consist of an unusual primary reinforcer; and it has to make that behaviour more likely to happen again. A jackpot, when used correctly, should be an astonishingly big reinforcer, delivered contingently. The jackpot has to appear while the animal is doing the behaviour, not afterwards (Pryor, 2006). If the reward is offered after the behaviour, we enter the realm of the non-contingent reward.
 
A non-contingent reward is a reward that is offered after the behaviour has occurred, as opposed to while the behaviour is occurring. A non-contingent reward is not necessarily associated with any specific behaviour; it can be used to encourage the animal in a given situation and it can increase motivation (Pryor, 2006). Pryor offers the example of a slot machine jackpot, which is always delivered contingently (while you are playing), so that the act of playing is heavily rewarded. Compare this with a situation in which you play the slot machine, then go out for dinner, then go to a music concert and finally return to your hotel room to find a huge sum of money on your bed. This is the same amount of money as the slot machine jackpot, but this time it was not delivered contingently. Hard to say which behaviour would increase in this case... Perhaps returning to the hotel, but not necessarily playing the slot machine.
 
To conclude, when we click and offer a bigger reward (say 3, 5 or 10 treats) we might be keeping the animal’s motivation high or increasing it, which means that the following behaviours can be more enthusiastic. The bridge is also kept nice and strong, because there is some variability in what happens after we use it. This procedure is not a real jackpot though. A real jackpot should be totally unexpected and almost startle the animal, it should be a rare event, and you should not click the behaviour (clicking and offering lots of treats changes the connection between the bridge and the reinforcer, not the behaviour that you bridged).
 
Animal training is a fluid technology that is constantly being updated. 20 or 30 years ago we were probably looking at jackpots in animal training in a different way than we are today. A few years from now, today’s knowledge might get updated and refined. That is the beauty of the animal training world and I can’t wait to see what the next chapter brings us.
 
 
Alexander, M. (2006). Should You "Jackpot" Outstanding Responses? Clickertraining.com. Retrieved 1 May 2017, from https://clickertraining.com/node/632

Farricelli, A. (2014). Using Jackpots of Treats in Dog Training. hubpages. Retrieved 5 May 2017, from https://hubpages.com/animals/Using-Jackpots-of-Treats-in-Dog-Training

Fisher, G. (2009). The Thinking Dog: Crossover to Clicker Training (1st ed.). Wenatchee, Wash.: Dogwise Pub.

Kazdin, A. (1994). Behavior modification in applied settings (5th ed., p. 147). Pacific Grove, CA: Brooks/Cole Publishing Company.

Kershaw, E. (2002). An evaluation of the use of magnitude of reinforcement, i.e. “jackpot” rewards, during shaping in the training of pet dogs. (MSc). University of Southampton New College.

Pryor, K. (2006). Jackpots: Hitting it Big | Karen Pryor Clicker Training. Clickertraining.com. Retrieved 29 April 2017, from https://clickertraining.com/node/825

Reid, P. (2012). Dog Insight (1st ed.). Wenatchee, WA: Dogwise.

Schwarz, S. (2016). AgilityNerd Dog Agility Blog : Better Jackpot Rewards. Agilitynerd.com. Retrieved 4 May 2017, from http://agilitynerd.com/blog/dog/training/Jackpot.html

Picture: www.morguefile.com

Blazing clickers – Click and always offer a treat?

13/5/2016


 
In modern animal training it is common to use a clicker or any other type of bridging stimulus to allow for better communication during the training process. The clicker allows us to tell the animal when the correct behaviour has been performed. It also helps establish a contingency between the behaviour and its consequence, thus strengthening the behaviour. Research from Pavlov and Skinner suggests that, for the clicker to retain its full strength as an event marker, every click should be paired with a backup reinforcer (1:1 click-treat pairing; for a review on this topic see Fernandez 2001). Many trainers around the world do not pair every click with another well-established reinforcer (a non-1:1 pairing) and still achieve successful outcomes when training animals. Martin and Friedman (2011) call this approach “blazing clickers” and define it as “the unsystematic, rapid-fire clicking of each correct response in a series of correct responses, without following every click with a well-established, backup reinforcer, i.e., click, no treat”.

The following text is a discussion of my personal experience, the experts’ opinions, the controlled experiments and the literature that relate to the topic of click-treat pairing in animal training. By discussing it I hope to bring some clarity to the issue and help animal trainers make a more informed decision about how to use their bridging stimulus. Learning theory applies to all species, so the following content should be equally relevant for those of you training dogs, pet rats or even dolphins, if you are lucky enough to have the opportunity to do that for a living.

Terminology

To make this text easier to understand, I will refer to some of the terminology that Martin and Friedman (2011) used in their insightful paper “Blazing Clickers”, as indicated below:

“1. The word click refers to any conditioned reinforcer used in training to reinforce a behaviour with super contiguity. It is used synonymously with conditioned or secondary reinforcer, bridging stimulus, bridge, event marker and marker.
2. The word treat refers to any well-established reinforcer, conditioned or unconditioned, used to condition and maintain the reinforcing strength of the click. Treat is used synonymously with backup reinforcer (most often in animal training the backup reinforcer is food).
3. The term blazing clickers refers to the practice of repeatedly clicking without systematically delivering the backup reinforcer, also referred to as solo clicks.”
4. Many trainers mention a Variable Schedule of Reinforcement when they are technically referring to a Variable Ratio Schedule of Reinforcement. For simplicity, in this text we will maintain the general term "Variable Schedule of Reinforcement", but keep in mind that Partial Schedules of Reinforcement are divided into Fixed Ratio, Variable Ratio, Fixed Interval and Variable Interval.

Personal observations and experience

My first contact with animal training happened when I was training some dogs from family members and friends. I was reading animal training books, watching dog training DVDs and then practicing the learned skills with these dogs. I soon realised how powerful classical and operant conditioning can be and got absolutely hooked on the topic. Soon after that I started working in the marine animal training field and quickly learned that worldwide there were lots of parks, zoos and aquariums in which the trainers used a Blazing Clickers (BC) approach (a non-1:1 pairing of clicker and backup reinforcer). This was a surprise to me because I had never seen this approach being suggested in any animal training book or article that I had encountered. Yet, much to my surprise these animals were extremely well trained and capable of amazing behaviours. The field of marine animal training is full of incredible people that love to share their passion, techniques and successes with fellow colleagues and thus I started using a BC approach with considerable success and personal satisfaction.

A few years later I read the article “Blazing Clickers” (Martin and Friedman 2011) explaining the reasons why a 1:1 click-treat pairing is recommended in animal training and alerting trainers to the downsides of not pairing every click with an additional reinforcer. Whenever I see someone who is an international reference in a given field suggesting something different from what I have been doing, I tend to embrace that new approach as a better option. One could be tempted to ignore such advice because of previous success with a different approach. However, I personally find it very helpful to be open to new approaches and trial them, especially when they are put forward by someone who definitely knows more than I do about a certain topic. And thus, from that moment onwards I have always looked at a BC approach as a non-ideal technique for animal training. I kept that view for several years and yet, I still see many animals today that are trained with a BC approach.

According to my personal observations and experience, a BC approach is relatively common in the captive animal field. If you go to a public aquarium to see a sea lion or dolphin presentation, you are likely to see a BC approach. If, on the other hand, you go to a modern puppy class or sign up for a dog training course or certification, you are likely to encounter a 1:1 click-treat pairing approach. It is not clear why the captive animal community uses a different approach from the dog training field, but I would speculate that two of the main reasons are:
  1. A lot of the original work on schedules of reinforcement was done without the use of a bridging stimulus (e.g. a click or a whistle). From these studies we learned that a Variable Schedule of Reinforcement can make behaviour more resistant to extinction and make the animal more persistent about obtaining reinforcement (this does not necessarily mean that it is the best schedule for most animal training situations; see Bailey and Bailey 1998). When captive animal trainers started trying to implement these schedules, they assumed that the "variable" part of the equation applied only to the backup reinforcer and not to the bridging stimulus. This is a misinterpretation of what a real Variable Schedule of Reinforcement is: if you click every correct response but only follow it with a backup reinforcer occasionally, you are still technically using a Continuous Schedule of Reinforcement, but one that unfortunately weakens the power of your bridging stimulus. More on this later…
  2. Using a bridging stimulus is highly reinforcing for the trainer. It gives us a sense of accomplishment if we can get lots of correct responses from the animal and I would imagine that a session with 40 clicks is more reinforcing to the trainer than a session with 15 clicks. In practical terms a BC approach might actually be more reinforcing for the trainer than for the animal.

What do the experts say?

In their 2011 article, Martin and Friedman list five misconceptions that trainers commonly use to justify a BC approach. The following list presents these misconceptions together with the main reasons why, as Steve Martin and Susan Friedman argue, they are indeed misconceptions:
  1. The clicker is already a reinforcer (sometimes as strong as or even stronger than a primary reinforcer), so there is no need for an additional one:
    1. Even though some secondary reinforcers can be as strong as or even stronger than a primary reinforcer, they still depend on repeated pairing with other reinforcers to acquire and maintain their reinforcing ability;
    2. Primary reinforcers are automatically reinforcing, or pre-wired, while secondary reinforcers depend on pairing with additional reinforcers;
    3. Every time a click happens without a backup reinforcer, it loses some of its ability to work as a reinforcer;
    4. If the click fails to predict a treat, the animal may develop a tendency to scan the environment for other cues (such as the trainer’s hand moving towards the treat bag); the animal might then respond to this visual stimulus before or after hearing the click, effectively using it as the “official” bridge.
  2. BC makes training more interesting and unpredictable for the animal:
    1. Although variety is important, it should come from the variety and quantity of the reinforcers, the behaviours trained and the pace of the session, not from using a BC approach;
    2. Animals may become inattentive in a training session due to blazing behaviours (lots of behaviours asked for in quick succession, with clicks after each correct behaviour and then a big reward at the end); as an example, targeting is most helpful when the behaviour is held for some duration, rather than performed as a rapid succession of several quick targets.
  3. The behaviour will be stronger using a BC approach because it is a variable schedule of reinforcement, similar to a slot machine:
    1. An intermittent schedule of reinforcement can build persistence into fluent behaviour, but if the clicker is an effective conditioned reinforcer, withholding the treat does not change the fact that we are still using a continuous schedule of clicks. If the click is not an effective conditioned reinforcer (a meaningless noise), the animal has to try to find the behaviour-consequence contingency through other environmental cues;
    2. When persistence is required, it is better to teach the new behaviour with continuous reinforcement (click-treat) and then gradually stretch the reinforcement over time to the desired variable schedule (still pairing the click and treat, but varying the length of time or the number of repetitions the animal needs to perform to be reinforced).
  4. It reduces frustration-based aggression because the animal is not expecting a treat every time:
    1. Plan your sessions to have enough backup reinforcers, or end the session sooner;
    2. There is data showing that extinction trials (click-no treat) can create frustration-induced aggression.
  5. The clicker can tell the animal that he/she did something right but should keep doing it, i.e. the click can mean different things:
    1. A keep-going signal (KGS) is indeed helpful in animal training, but the click should not mean two different things;
    2. The click meaning both “keep going” and “food is coming” makes for very unclear communication;
    3. The KGS and the formal bridge (click) should be two different stimuli.

So, in sum, Martin and Friedman (2011) state that clickers, whistles and other event markers can be used to improve communication between trainer and trainee, but that this communication is only clear when the conditioned reinforcer is systematically paired with a well-established backup reinforcer. When the click is not reliably paired with other reinforcers, communication becomes less clear, motivation and performance can go down, and frustration and aggression can go up. Not pairing the click with another reinforcer makes the click lose meaning, and the animal tends to rely on other environmental cues (e.g. the hand going to the treat pouch) as a reliable predictor of an imminent reward. Finally, the article suggests that every time we deliver a solo click (no treat) the animal undergoes an extinction trial that weakens the meaning of the click.

Bob and Marian Bailey (1998) state categorically that, in their experience, a Continuous Schedule of Reinforcement is by far the recommended approach for teaching and maintaining behaviour in animals in the vast majority of situations. They do mention a couple of very specific exceptions in which a Ratio Schedule might be used instead, but keep in mind that they never suggest that when we use a Ratio Schedule we should still click every correct response and withhold the backup reinforcer. As mentioned above, if we were to click every correct response we would be turning our Ratio Schedule back into a Continuous Schedule of Reinforcement.

What do we know from controlled experiments?

While training an animal, every time we pair a conditioned stimulus (the click) with an unconditioned stimulus (a treat or other backup reinforcer), the animal undergoes a Pavlovian, classical or respondent conditioning trial. Once a conditioned response is acquired, it can be maintained as long as the conditioned stimulus (the click) continues to be followed by the unconditioned stimulus (food, water, etc.). If, however, the conditioned stimulus is repeatedly presented without the unconditioned stimulus (click and no food), the response becomes weaker and weaker. This process is called extinction (Chance 2003). For the purposes of evaluating animal training studies, the more resistance to extinction we see, the more we can conclude that the conditioned reinforcer (the click) acquired reinforcement value.
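
To make the extinction logic concrete, here is a minimal sketch using the classic Rescorla-Wagner learning rule. This is my own illustration rather than anything taken from the studies below, and the learning rate and trial counts are arbitrary assumptions.

```python
# A minimal, illustrative sketch (my own, not from the cited studies) of the
# Rescorla-Wagner rule: the click's associative strength grows with each
# click-treat pairing and decays with each solo click (extinction trial).
# The learning rate of 0.2 and the trial counts are arbitrary assumptions.

def rescorla_wagner(strength, treat_followed, learning_rate=0.2):
    """Update the click's associative strength after one trial."""
    lambda_ = 1.0 if treat_followed else 0.0  # 1.0 = treat present, 0.0 = absent
    return strength + learning_rate * (lambda_ - strength)

strength = 0.0  # the click starts out as a meaningless noise

# "Charging" the clicker: 15 click-treat pairings strengthen the click.
for _ in range(15):
    strength = rescorla_wagner(strength, treat_followed=True)
print(f"After 15 click-treat pairings: {strength:.2f}")  # about 0.96

# Blazing clickers: 15 solo clicks act as extinction trials and weaken it.
for _ in range(15):
    strength = rescorla_wagner(strength, treat_followed=False)
print(f"After 15 solo clicks: {strength:.2f}")  # about 0.03
```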

McCall and Burgin (2002) conducted an experiment with 48 horses trained to press a lever for food. In the initial stage of the experiment, upon pressing the lever, half of the horses received a food reward only, while the other half received a food reward preceded by an auditory buzzer as a secondary reinforcer (similar to a click-treat sequence). Interestingly, the use of an event marker before the primary reinforcement (buzzer-food) did not make these horses more resistant to the later extinction of the learned behaviour. In other words, when the bridging stimulus still marked correct behaviour but was no longer paired with primary reinforcement (buzzer-no treat), these horses stopped performing the learned behaviour just as quickly as the horses trained with a food reward only. A second part of the experiment showed that the horses could learn a new task (pushing a flap) with secondary reinforcement alone (buzzer-no treat), but their interest in the task was low. Reintroducing the primary reinforcement along with the buzzer renewed their interest in the task, so the authors concluded that secondary reinforcers are more efficient when paired with primary reinforcers at a high rate.

A similar experiment produced comparable results (Williams et al. 2004). In this case, 60 horses learned to touch a plastic cone for either primary reinforcement alone (food only) or secondary plus primary reinforcement (click-treat). The horses trained with the clicker did not show more resistance to extinction than those on primary reinforcement only: once the clicker stopped being paired with the food reward, the behaviour plummeted just as it did for the horses that were never conditioned to the sound of the clicker. Importantly, the authors note that the horses appeared frustrated when they were put on the extinction trials (click-no treat).

In an experiment by Smith and Davis (2008), dogs were trained to touch a traffic cone with their noses. Some dogs received a click and a treat, while others received only the treat when they touched the cone. When these dogs were later put on extinction trials, the ones conditioned to the clicker continued to perform the behaviour for longer, provided they kept hearing the clicker (but received no food) as feedback for correct responses. The authors concluded that the clicker may be able to maintain previously established behaviours when primary reinforcement cannot be delivered.

In another study, thirsty rats were conditioned to the sound of a buzzer before being offered water. With enough buzzer-water pairings, the rats were then able to learn to press a lever for the sound of the buzzer alone, even though lever pressing produced no water. The buzzer had become a conditioned reinforcer (Zimmerman 1957). Chance (2003) argues that the reinforcing power of the buzzer comes from being paired with another reinforcer, and that it will lose its strength if it is never again followed by that reinforcer. However, when water continues to follow the sound of the buzzer, the sound may retain its reinforcing quality.

Langbein et al. (2007) conducted a shape discrimination learning task with dwarf goats (Capra hircus). As in other similar studies, some goats were trained using primary reinforcement only (water), while others received secondary plus primary reinforcement (an acoustic tone followed by water). The study concluded that secondary and primary reinforcement have to be paired at a high rate during learning and that the time delay between secondary and primary reinforcement should be short.

One of the most elucidating studies comparing a 1:1 pairing to a non-1:1 pairing was conducted by Egger and Miller (1962). They conditioned rats under two different stimulus conditions (S1 and S2). In both conditions the stimulus was paired with food, but for the S1 rats every occurrence of the stimulus was followed by food (a 1:1 pairing), while for the S2 rats the stimulus was paired with food only occasionally (a non-1:1 pairing, similar to a BC approach). They then attempted to teach the rats to press a lever using the conditioned stimuli. For the S2 group (non-1:1 pairing) the stimulus did not become an effective reinforcer, while for the S1 group (1:1 pairing) it did.

Wennmacher (2007) trained dogs to perform a bow and a spin on cue and then compared their performance across two conditions: C+F (a 1:1 pairing of clicker and food) and C+C+F (clicking every correct response but offering food only for every other behaviour). In the C+F condition both dogs performed better in terms of frequency, accuracy and topography of the behaviour. In the C+C+F condition the dogs required more cues to perform the behaviour and showed increased noncompliance and other unwanted behaviours. The author also notes that in the C+C+F condition the dogs were less willing to come to the experimenter's location, whereas in the C+F condition they showed more enthusiastic body language.

The overall conclusions of these studies seem to be in agreement with the points brought up earlier. A 1:1 pairing of bridging stimulus and backup reinforcer is ideal for maintaining the strength of the bridging stimulus; in other words, the less often we follow a click with a treat, the less effective the click becomes. Additionally, a non-1:1 pairing appears to be sub-optimal and can even elicit frustration and other unwanted behaviours.

Additional considerations

Some trainers even reverse the sequence of events by using the bridging stimulus after the primary reinforcement has been offered (treat-click, or behaviour-click-treat-click) when working with animals that have low motivation to eat. I believe the reasoning behind this is that the click might be stronger than the primary reinforcer, so bridging the moment when the animal actually accepts the food should increase the likelihood of the animal eating better in the future. I have worked with several animals with low food drive, and I must confess that I do not see the above method as a helpful approach to the problem. One of the dogs I worked with recently was Oakley, a beautiful American Staffordshire Terrier. Like many other animals I have worked with, she was not very food driven and would turn her head away from the treat (after the click) in the first few sessions. We decided to keep pairing every click with a food reward, and after a few sessions Oakley was taking the treats with gusto and looking far more food motivated than at the start.

What happens when we ask the animal for several behaviours, click every correct response, but only follow the click with a backup reinforcer sometimes? As mentioned above, this is commonly but wrongly believed to be a Variable Schedule of Reinforcement (Martin and Friedman 2011). The click is an event marker, and when we click without delivering an additional reinforcer the event is technically an extinction trial, which weakens the strength of the click (Chance 2003). A real Variable Schedule of Reinforcement (if you really need to use one) would be one in which some correct responses get neither the click nor the backup reinforcer, while others get both. So, regardless of the schedule you choose, avoid solo clicks.
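
To illustrate the distinction, here is a minimal sketch contrasting the two procedures. The function names and probabilities are my own assumptions, not terminology from Martin and Friedman (2011).

```python
import random

random.seed(1)  # fixed seed so the example output is repeatable

def blazing_clickers(n_responses, treat_probability=0.5):
    """Click EVERY correct response, but follow only some clicks with food.

    The click itself is still on a continuous schedule; the unpaired
    clicks act as extinction trials that weaken the marker.
    """
    return [("click", "treat" if random.random() < treat_probability else "no treat")
            for _ in range(n_responses)]

def true_variable_schedule(n_responses, reinforcement_probability=0.5):
    """A real variable schedule: some correct responses earn nothing at all,
    but whenever reinforcement is delivered, click and treat come together."""
    events = []
    for _ in range(n_responses):
        if random.random() < reinforcement_probability:
            events.append(("click", "treat"))        # always paired
        else:
            events.append(("no click", "no treat"))  # no solo clicks
    return events

print("Blazing clickers:      ", blazing_clickers(6))
print("True variable schedule:", true_variable_schedule(6))
```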

In operant terms, extinction events can create emotional behaviour, particularly aggression. Rats conditioned to press a lever for food have been observed biting the lever or other rats when lever pressing no longer produces a food reward (Azrin et al. 1965, Rilling and Caplan 1973). This is a very important consideration, especially for those working in free contact with animals that are capable of inflicting serious damage.

If you are an enthusiastic student of animal training and behaviour, you will notice that the literature contains few, if any, references or recommendations for a BC approach to training an animal. There are many discussions about schedules of reinforcement and about which ones are better for different goals and circumstances (for a review and sound advice see Bailey and Bailey 1998), but the suggestion of using solo clicks is an extreme rarity. If you are comparing a Continuous to a Variable Schedule of Reinforcement, keep in mind that in both options click and treat should go together.

Conclusions

One of the main challenges in determining the efficiency of a BC approach is that it has only occasionally been examined in controlled studies: clicking every correct response but delivering a backup reinforcer only occasionally seems to be a human construct that is widely used in practice but rarely tested experimentally. With that said, what we do know from classical and operant conditioning theory and from practical studies strongly suggests that a 1:1 click-treat pairing is a more efficient tool for establishing clear communication.

If you are having issues with frustration-based aggression, or if you see the animal visually tracking additional cues (e.g. watching you reach for the treat pouch), I highly recommend that you trial the 1:1 click-treat pairing approach. A good test of the strength of your bridge is the following: ask the animal to perform a behaviour and then remain completely still while sounding your bridge. Try this with a few different behaviours. Did the animal continue to perform the behaviour as if nothing had happened? If so, your bridging stimulus is not strong or clear enough, and you should give this idea a chance.

If you currently do not have any problems in your training program you can keep the BC approach, but I would still advise against it. Here is a real-life comparison that should illustrate my view on this topic. You can watch a movie on a VHS tape and have a great time doing so. However, if you watch it on Blu-ray your experience is likely to be much better. Both options allow you to watch the movie and both options work, but on Blu-ray the picture and the sound are much better. Which option would you prefer?

So, in conclusion, should you adopt a BC approach to train your animal? After doing all this research I am left with the feeling that both approaches can work and yield good results, but I cannot help thinking that, according to the evidence we have, a 1:1 click-treat pairing approach has fewer, if any, disadvantages compared with a BC approach.


Azrin, N. H., Hake, D. F., Hutchinson, R. R. (1965). The opportunity for aggression as an operant reinforcer during aversive stimulation. Journal of the Experimental Analysis of Behavior. 8(3), 171-180.

Bailey, B., Bailey, M., (1998). "Clickersolutions Training Articles - Ratios, Schedules - Why And When". Clickersolutions.com. Accessed 24 April 2016.

Chance, P., (2003). Learning and behavior (5th ed.). Belmont, CA: Wadsworth.

Egger, M. D., Miller, N. E., (1962). Secondary reinforcement in rats as a function of information value and reliability of the stimulus. Journal of Experimental Psychology, 64(2), 97-104.

Fernandez, E.J., (2001). Click or Treat: A Trick or Two in the Zoo. American Animal Trainer Magazine, 2, 41-44. Shedd Aquarium.

Langbein, J., Siebert, K., Nuernberg, G., Manteuffel, G., (2007). The impact of acoustical secondary reinforcement during shape discrimination learning of dwarf goats (Capra hircus). Applied Animal Behaviour Science. 103(1-2), 35–44.

Martin, S., Friedman, S.G., (2011, November). Blazing clickers. Paper presented at the Animal Behavior Management Alliance conference, Denver, CO.

McCall, C.A., Burgin, S.E., (2002). Equine utilization of secondary reinforcement during response extinction and acquisition. Applied Animal Behaviour Science. 78, 253–262.

Rilling, M., Caplan, H. J., (1973). Extinction-induced aggression during errorless discrimination learning. Journal of the Experimental Analysis of Behavior. 20, 85-92.

Smith, S.M., Davis, E.S., (2008). Clicker increases resistance to extinction but does not decrease training time of a simple operant task in domestic dogs (Canis familiaris). Applied Animal Behaviour Science. 110(3-4), 318-329.

Wennmacher, P. L. (2007). Effects of Click + Continuous Food Vs. Click + Intermittent Food on the Maintenance of Dog Behavior (Master's Thesis). University of North Texas.

Williams, J.L., Friend, T.H., Nevill, C.H., Archer, G., (2004). The efficacy of a secondary reinforcer (clicker) during acquisition and extinction of an operant task in horses. Applied Animal Behaviour Science. 88, 331–341.

Zimmerman, D. W., (1957). Durable secondary reinforcement: Method and theory. Psychological Review. 64, 373-383.

Picture: www.morguefile.com

Edited: 16/12/2016

Why is timing so important in animal training?

25/10/2015

Some five or six years ago I wrote an article for a Portuguese website about the importance of timing in dog training. Today, I will revisit that article and re-word some of its content in light of my experiences and what I have learned since then. I still agree with most of what I wrote back then, but I also have slightly different views on some topics. This, then, is an introductory article on the topic of timing, directed at people who are new to animal training.
 
Timing is probably one of the attributes that contribute most to establishing an efficient avenue of communication between a human and a non-human animal. If we want to positively reinforce a behaviour, and increase its frequency of occurrence in the future, the reward should be offered at the exact moment the behaviour we want to "capture" occurs (or within one second of the behaviour). The moment at which we offer the reward carries very important information about what we are trying to teach.

Operant conditioning tells us that the common sequence of events is antecedent, behaviour, consequence (ABC). Thus, the cue (antecedent) should come before the behaviour, and when the behaviour occurs an appropriate consequence must follow. If we want a dog to learn to relieve herself on cue, that cue should ideally be given as the dog starts circling and sniffing the ground, not once she has already started or finished the behaviour. Similarly, a road sign that warns of a crossroad should appear before the crossing; if it appears at or after the crossing it will not be very useful.

Then there is the issue of good timing for the feedback we provide when the behaviour happens. If the timing is not right, we may reinforce a behaviour other than the intended one. Imagine asking the dog to sit, and when he does so we get distracted and look at something else for a few seconds. During that time the dog can stand up, look away, sniff the ground, etc. When you turn back and deliver the reward, what did you reinforce exactly? Was it the sit? Was it the stand? It is usually difficult to know, but you probably reinforced whatever behaviour the dog was doing when you offered the reward.

Now let’s discuss an imaginary scenario. Imagine that you go to a restaurant for lunch and choose fruit instead of chocolate mousse (your typical choice) for dessert. When you get home two hours later you find $100 in cash in your mailbox. The money was left there, without your knowledge, by a wealthy friend who wants to reinforce your choice of fruit over chocolate mousse, making it more likely that you will choose fruit in the future. It is highly unlikely that your friend will succeed in this initiative: many behaviours occurred between eating the fruit and finding the cash, so the probability of an association between the two events is greatly reduced. Now imagine instead that when you ordered the fruit it came with a $100 voucher on the side. In that scenario it is much more likely that you will order fruit the next time. It all comes down to timing.

We can now draw an analogy to an issue that most dog guardians face. Imagine that you get home and find that your puppy urinated on the floor during your absence. If you decide to reprimand her for the “mistake” when you get home, how would she know why she is being reprimanded? It will be very difficult for her to associate your reaction with the act of urinating on the floor half an hour earlier. I should note here that I would not recommend reprimanding a puppy for a house training mistake even if you catch her “in the act”. First, leaving a young puppy unattended in the house without supervision and/or management and “hoping for the best” is a recipe for disaster. Second, if you reprimand her “in the act” she may conclude that she cannot relieve herself in front of you and will start sneaking away when she needs to go.

Unfortunately (or maybe fortunately) we cannot tell an animal that we liked what he did two hours ago. We have to act in the present moment, and we can only influence what is happening then. When I have a conversation with a person, I can let her know that I liked what she did yesterday or earlier today, and also about the things I did not like. With non-human animals, a system that allows us to do this does not yet exist, as far as I am aware. We would then face the question of whether they are cognitively capable of comprehending information about the past, but that is a topic for another time…

I mentioned earlier that, in order to positively reinforce a behaviour and increase its frequency of occurrence in the future, the reward should be offered as the behaviour happens (or less than a second after it). The reality is that this is very difficult to achieve consistently. The solution involves the use of a conditioned or secondary reinforcer (e.g. a word, a whistle, the sound of a clicker). A primary reinforcer is something that the individual naturally needs, such as food and water. A conditioned reinforcer, on the other hand, acquires meaning when repeatedly paired with something that the individual already finds rewarding (typically a primary reinforcer, but it could also be a previously established secondary reinforcer). To accomplish this we use classical conditioning and condition the individual to a “marker” or “bridge”: we systematically present it just before the delivery of something that already has reinforcement value for the animal (e.g. click - food, click - food, etc.). This is often called “charging the marker” or “charging the clicker”. With enough repetition the clicker will start to elicit the same response that the food would, and later on we can use it to pinpoint the exact moment at which the animal earned a reward.

Now we are able to ask the dog to sit, click for the sit, and deliver the reward a few seconds later. Another major advantage of this system is that we can reinforce behaviour that occurs away from us: a dolphin jumps to touch a ball and the trainer blows the whistle the moment it happens. The delivery of the reward can then occur when the animal is back with the trainer, even if that takes a few seconds. This procedure eliminates confusion about which behaviour earned the reward.
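
For readers who like procedures spelled out step by step, here is a toy sketch of the two ideas just described: charging the marker, and using it to bridge the delay when the behaviour happens away from us. The helper functions are hypothetical stand-ins for the trainer's real actions, not an actual training API.

```python
import time

def sound_marker():
    print("click!")           # marks the exact moment of the behaviour

def deliver_reward():
    print("treat delivered")  # something the animal already values

def charge_the_marker(pairings=20):
    """Classical conditioning: the click is systematically presented just
    before something that already has reinforcement value (click - food)."""
    for _ in range(pairings):
        sound_marker()
        deliver_reward()

def mark_behaviour_at_distance(seconds_to_return=3.0):
    """Once charged, the marker bridges the gap between a behaviour that
    happens away from us and the later delivery of the reward."""
    sound_marker()                 # e.g. the dolphin touches the ball
    time.sleep(seconds_to_return)  # the animal swims back to the trainer
    deliver_reward()               # reward arrives; the association stays clear

charge_the_marker()
mark_behaviour_at_distance()
```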

Regarding the bridge, there is ongoing debate about whether it should tell the animal that the behaviour is over and that it can now come to the trainer to collect the reward, or whether the animal should hear the bridge and remain in position to receive the reward. An example: you ask your dog to sit; when he sits, you click and offer a food reward. Does he have to remain sitting to receive it? I tend to prefer using the bridge as a classic terminal signal that tells the animal he did well and the behaviour is over. I recognize, however, that many successful trainers use the other approach and that animals seem to learn equally well either way. Furthermore, for some behaviours offering the reward out of position seems to speed up the learning process, while in other situations offering the reward in position conveys important information to the animal.
 
In conclusion, timing is a very important skill to master if effective communication is to be achieved. In operant conditioning terminology, the widely accepted sequence antecedent, behaviour, consequence (ABC) describes the chain of events when behaviour is under operant control. Regarding the consequence, when training non-human animals there is no point in providing feedback for something that happened in the past: acting in the moment the behaviour happens makes the trainee’s learning experience easier and clearer. Given the importance of good timing, the use of a “marker” or “bridge” is recommended to fill the gap between the behaviour and the collection of the reward.


Picture: www.morguefile.com

Five things that dog trainers do differently

31/7/2015

1. Socialization

Dog trainers recognize how important it is to start socializing puppies early and how much easier it is to prevent problems from arising than to try to fix them later. Puppies go through a sensitive period of socialization between 4 and 14 weeks of age (the exact length of this period varies and is constantly debated). Within this period, the worst thing we can do is keep the puppy indoors at all times, with no contact with anyone outside the household. It is critical that the puppy goes out to interact with the world.

There are some health considerations during this period because the puppy’s immune system is still developing, so some caution is advised to minimize exposure to disease. Going through where we should or shouldn’t take our puppies early on is beyond the scope of this article, but it is something you can discuss with your veterinarian and dog training professional. If your veterinarian does not recognize the importance of early socialization and advises you to keep the puppy indoors until four months of age, I would recommend seeking a second opinion.

Socialization is critical and you will probably only have one chance to do it right. If you start too late you risk the puppy developing phobias and other detrimental behavioural issues. I also recommend treating socialization as an ongoing commitment, with special emphasis during the first year. If you let too much time go by without exposure to a certain stimulus (dogs, people, places), the dog may start to show some type of negative emotional response towards it.

Here is an analogy: imagine raising a child who, between the ages of 2 and 15, only lives inside the house and never goes out to school, to play with other children, to interact with other adults, or to visit different places. Certainly, this child would not develop healthy social habits and behaviour. A similar process happens with dogs, only faster. Dog trainers are aware of this, and they take their puppies out to interact with other puppies, friendly adult dogs and people of different ages and ethnicities, and to places that look, feel, smell and sound different.


2. Management

Dog trainers are very good at using management solutions to make their lives (and their dogs’ lives) easier. A dog in a new environment (especially a puppy) with too much freedom to roam the house and make his/her own choices is a recipe for disaster. Dog trainers are aware of this and take a proactive approach to minimize the number of mistakes the dog can make.

The use of dog crates, baby gates, exercise pens, leashes (when supervising the puppy) and other management tools makes it easier to control where the dog is and what the dog is doing. Many new pet owners simply bring a puppy home and hope for the best (they hope the puppy will know where to relieve himself/herself and which items are appropriate to chew). Dog trainers know that puppies will probably make choices we don’t like, so they use confinement to minimize issues during the initial phase. Thus, if the dog cannot be supervised, he/she goes into a confinement option. Dog trainers are also aware that harnesses and leashes are great tools inside the house, as long as they are used under supervision (leashes are not just for leash walking).

Dog trainers will balance the amount of confinement with the amount of physical activity, mental stimulation, socialization and training sessions. When house training and chewing of appropriate items are regularly successful, dog trainers start to progressively offer the dog more freedom until the use of confinement is considerably reduced.


3. Motivation

Dog trainers are fully aware that, generally, dogs will not do things to please us; they will mostly do things to please themselves. With that in mind, most dog trainers make access to high-value resources contingent upon the dog doing something they want. A great approach many trainers use to put motivation to work for them is to get rid of the food bowl and offer food in training sessions, in puzzle toys or through other environmental enrichment options. The dog’s wild cousins have to work hard to get food, and that approach seems to make sense to our domesticated companions as well.

Dog trainers also put other resources to work in their favour. Does the dog want to sniff a bush? Say hello to another dog or person? Go through a door? Does he/she want you to toss a tennis ball? Dog trainers will ask the dog to do something before granting access to these highly prized events.

Petting and affection might be valuable in the living room, but out in the real world they are probably not that high value for your dog. Dog trainers are aware of this and adapt to the situation they are in. In some cases a piece of kibble is valuable enough to keep your dog engaged with you, but in other scenarios you may need a piece of cheese or cooked chicken.


4. Occupational activities and exercise

Most dogs are pretty good at spending a big part of the day resting and sleeping, but they also need a regular supply of mental and physical stimulation. Dog trainers make sure that their dogs receive exercise and environmental enrichment on a regular basis.

Here are some tips and tricks that dog trainers use:

• playing fetch will get a dog tired faster than walking him/her on a leash;

• a long leash (8-15 m) attached to a harness is a magnificent tool if the environment is safe enough to use such a device;

• if you are going to have a very busy day, consider using the services of a dog walker or doggie day care;

• if the weather is terribly bad, there are still lots of stimulating activities that you can do indoors;

• leaving a stuffed food toy for your dog to chew will make it more likely that the dog will be content with being left alone;

• there are many “Kong recipes” out there that will make food toys more challenging and interesting;

• finding food throughout the house and/or yard is more fun than eating it from a bowl;

• preventing access to shoes, socks, rubbish, etc. is likely to make your life easier;

• toys that are available all the time lose value and become “furniture”.


5. Preparation for real life situations

Dog trainers realize that prevention and preparation go a long way towards avoiding fears and other behaviours that we do not want our beloved dogs to develop. They prepare the dog for such situations in a way that is easy for the dog to handle, before he/she is confronted with the real-life trigger.

Here is an example: many dogs are likely to develop a fear of thunderstorms or fireworks. One possible way to help prevent this is to play recorded sounds of those events at a low volume and then progressively increase the volume until it resembles the real sound the dog may encounter. This process is called desensitization. Dog trainers also like to pair desensitization with counter-conditioning. To include counter-conditioning in the previous example, you would pair the “frightening sound” with something the dog enjoys (e.g. food treats). The sequence would be: thunder sounds equal super yummy food rewards; no sound means food treats are no longer available and “life is boring”. With this approach we can potentially create a positive emotional response to the sound of thunderstorms or fireworks.
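
For those who like a step-by-step outline, here is a toy sketch of that desensitization and counter-conditioning sequence. The volume steps and the relaxation check are my own assumptions, and the functions merely stand in for the guardian's real actions.

```python
# A toy sketch (my own assumptions) of the desensitization plus
# counter-conditioning sequence described above.

def play_sound(volume):
    print(f"playing thunder recording at volume {volume}")

def give_treats():
    print("super yummy food rewards while the sound plays")

def stop_sound():
    print("sound off: treats stop, life is boring")

def dog_stays_relaxed(volume):
    # In real life this is a behavioural observation; here we simply
    # assume the dog copes with anything below volume 80.
    return volume < 80

def desensitize_and_countercondition(volumes=(10, 20, 40, 60, 80)):
    for volume in volumes:
        play_sound(volume)
        give_treats()  # sound predicts food (counter-conditioning)
        stop_sound()
        if not dog_stays_relaxed(volume):
            print("dog is uncomfortable: drop back to a quieter step")
            break  # go back and progress more slowly next session

desensitize_and_countercondition()
```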

A dog trainer will not wait for the dog to be exposed to these events in real life and simply hope for the best. Instead, a dog trainer will assume that a negative emotional response is likely to develop if things are left to chance, and will actively prepare the dog for the real-life situation before it happens. If enough preparation is not possible, the dog trainer will use management and counter-conditioning to minimize the negative experience as much as possible.

Picture: www.morguefile.com


Author

Jose Gomes is a dog behaviour consultant certified by the ABC of SA, currently applying the most up-to-date humane techniques to the training of dogs and other pets.

Disclaimer: All opinions expressed are my own and do not represent the opinions of any academic or professional organisation.
