The following text is a discussion of my personal experience, the experts’ opinion, the controlled experiments and the literary references that relate to the topic of click-treat pairing in animal training. By debating it I hope to bring some clarity to the issue and help animal trainers make a more informed decision when deciding on how to use their bridging stimulus. Learning theory applies to all species, so the following content should be equally relevant for those of you training dogs, pet rats or even dolphins, if you are lucky enough to have the opportunity to do that for a living.
To make this text easier to understand, I will refer to some of the terminology that Martin and Friedman (2011) used in their insightful paper “Blazing Clickers”, as indicated below:
“1. The word click refers to any conditioned reinforcer used in training to reinforce a behaviour with super contiguity. It is used synonymously with conditioned or secondary reinforcer, bridging stimulus, bridge, event marker and marker.
2. The word treat refers to any well-established reinforcer, conditioned or unconditioned, used to condition and maintain the reinforcing strength of the click. Treat is used synonymously with backup reinforcer (most often in animal training the backup reinforcer is food).
3. The term blazing clickers refers to the practice of repeatedly clicking without systematically delivering the backup reinforcer, also referred to as solo clicks.”
4. Many trainers mention a Variable Schedule of Reinforcement when they are technically referring to a Variable Ratio Schedule of Reinforcement. For simplicity, this text keeps the general term "Variable Schedule of Reinforcement", but keep in mind that Partial Schedules of Reinforcement are divided into Fixed Ratio, Variable Ratio, Fixed Interval and Variable Interval.
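For readers who find rules easier to follow in code, the four partial schedules named in point 4 can be sketched roughly as below. This is only an illustration under my own assumptions: the function names, the `rng` parameter and the way the "variable" amounts are drawn (uniformly around the mean) are hypothetical choices, not something taken from the cited literature. Each function returns a rule that answers one question for every correct response: is this response reinforced?

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th correct response."""
    count = 0

    def respond(_time):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after a varying number of responses that averages mean_n.

    Assumption: the required count is drawn uniformly from 1..(2*mean_n - 1).
    """
    count = 0
    needed = rng.randint(1, 2 * mean_n - 1)

    def respond(_time):
        nonlocal count, needed
        count += 1
        if count >= needed:
            count = 0
            needed = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return respond


def fixed_interval(seconds):
    """Reinforce the first correct response once `seconds` have passed
    since the last reinforcer (time is given in seconds)."""
    last = 0.0

    def respond(time):
        nonlocal last
        if time - last >= seconds:
            last = time
            return True
        return False

    return respond


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but the waiting time varies around mean_seconds.

    Assumption: the wait is drawn uniformly from 0..(2*mean_seconds).
    """
    last = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)

    def respond(time):
        nonlocal last, wait
        if time - last >= wait:
            last = time
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False

    return respond
```

For example, `fixed_ratio(3)` reinforces the third, sixth and ninth correct response, while `fixed_interval(10)` reinforces the first correct response made at least 10 seconds after the previous reinforcer, however many responses happen in between. Note that in all four rules "reinforced" means click and treat together.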
Personal observations and experience
My first contact with animal training came when I was training dogs belonging to family members and friends. I was reading animal training books, watching dog training DVDs and then practising the learned skills with these dogs. I soon realised how powerful classical and operant conditioning can be and got absolutely hooked on the topic. Soon after that I started working in the marine animal training field and quickly learned that worldwide there were lots of parks, zoos and aquariums in which the trainers used a Blazing Clickers (BC) approach (a non-1:1 pairing of clicker and backup reinforcer). This was a surprise to me because I had never seen this approach suggested in any animal training book or article that I had encountered. Yet these animals were extremely well trained and capable of amazing behaviours. The field of marine animal training is full of incredible people who love to share their passion, techniques and successes with colleagues, and thus I started using a BC approach with considerable success and personal satisfaction.
A few years later I read the article “Blazing Clickers” (Martin and Friedman 2011), which explains why a 1:1 click-treat pairing is recommended in animal training and alerts trainers to the downside of not pairing every click with an additional reinforcer. Whenever I see an international reference in a given field suggesting something different from what I have been doing, I tend to embrace that new approach as a better option. One could be tempted to ignore such advice because of previous success with a different approach. However, I personally find it very helpful to be open to new approaches and trial them, especially when they are put forward by someone who definitely knows more than I do about a certain topic. And thus, from that moment onwards I have always looked at a BC approach as a non-ideal technique for animal training. I have kept that view for several years and yet I still see many animals today that are trained with a BC approach.
According to my personal observations and experience, a BC approach is relatively common in the captive animal field. If you are going to a public aquarium to see a sea lion or dolphin presentation you are likely to see a BC approach. If, on the other hand, you go to a modern puppy class or if you sign up for a dog training course/certification you are likely to encounter a 1:1 click-treat pairing approach. It is not clear why the captive animal community uses a different approach than the dog training field but I would speculate that two of the main reasons would be:
- A lot of the original work about schedules of reinforcement was done without the use of a bridging stimulus (e.g. a click or a whistle). From these studies we learned that a Variable Schedule of Reinforcement can make behaviour more resilient to extinction and make the animal more persistent about getting reinforcement (this does not necessarily mean that it is the best schedule for most animal training situations; see Bailey and Bailey 1998). When captive animal trainers started to try to implement these schedules they ended up assuming that the “Variable” portion of the equation only applied to the backup reinforcer and not to the bridging stimulus. This is a mere misinterpretation of what a real Variable Schedule of Reinforcement is. If you click every correct response but only follow it with a backup reinforcer occasionally you are still technically using a Continuous Schedule of Reinforcement, but one that unfortunately weakens the power of your bridging stimulus. More on this later…
- Using a bridging stimulus is highly reinforcing for the trainer. It gives us a sense of accomplishment if we can get lots of correct responses from the animal and I would imagine that a session with 40 clicks is more reinforcing to the trainer than a session with 15 clicks. In practical terms a BC approach might actually be more reinforcing for the trainer than for the animal.
What do the experts say?
In their 2011 article, Martin and Friedman list five common misconceptions that trainers have and commonly use to justify a BC approach. The following list includes these common misconceptions and the main points for why they are pure misconceptions, as suggested by Steve Martin and Susan Friedman:
- The clicker is already a reinforcer (sometimes as strong as or even stronger than a primary reinforcer) so there is no need for an additional one:
  - Even though some secondary reinforcers can be as strong as or even stronger than a primary reinforcer, they still depend on repeated pairing with other reinforcers to acquire and maintain their reinforcing ability;
  - Primary reinforcers are automatically reinforcing or pre-wired, while secondary reinforcers depend on pairing with additional reinforcers;
  - Every time a click happens without a backup reinforcer, it loses some of its ability to work as a reinforcer;
  - If the click fails to predict a treat, the animal may develop a tendency to scan the environment for other cues (such as the trainer’s hand moving towards the treat bag); the animal might actually respond to this visual stimulus before or after hearing the click and thus use it as its “official” bridge.
- BC makes training more interesting and unpredictable for the animal:
  - Although variety is important, it should come from the variety and quantity of the reinforcers, the behaviours trained and the pace of the session, not from using a BC approach;
  - Animals may become inattentive in a training session due to blazing behaviours (lots of behaviours asked in quick succession with clicks after each correct behaviour and then a big reward at the end); as an example, targeting is most helpful when the behaviour is held for some duration of time, instead of a rapid succession of several quick targets.
- The behaviour will be stronger using a BC approach because it is a variable schedule of reinforcement similar to a slot machine:
  - An intermittent schedule of reinforcement builds persistence into fluent behaviour, but if the clicker is an effective conditioned reinforcer, withholding the treat does not change the fact that we are still using a continuous schedule of clicks. If the click is not an efficient conditioned reinforcer (meaningless noise) the animal has to try to find the behaviour-consequence contingency with other environmental cues;
  - When persistence is required it is better to teach the new behaviour with continuous reinforcement (click-treat) and then gradually stretch the reinforcement over time to the desired variable schedule (still pairing the click and treat, but varying the length of time or repetitions that the animal needs to perform the behaviour to be reinforced).
- It reduces frustration-based aggression because the animal is not expecting a treat every time:
  - Plan your sessions to have enough backup reinforcers or end the session sooner;
  - There is data showing that extinction trials (click-no treat) can create frustration-induced aggression.
- The clicker can tell the animal that he/she did something right but that he/she should keep doing it. The click can mean different things:
  - A keep going signal (KGS) is indeed helpful in animal training, but the click should not mean two different things;
  - The click meaning both “keep going” and “food is coming” makes for very unclear communication;
  - The KGS and the formal bridge (click) should be two different stimuli.
In sum, Martin and Friedman (2011) state that clickers, whistles and other event markers can be used to improve communication between trainer and trainee, but that this communication is only clear when the conditioned reinforcer is systematically paired with a well-established backup reinforcer. When the click is not reliably paired with other reinforcers, communication becomes less clear, motivation and performance can go down and frustration/aggression can go up. Not pairing a click with another reinforcer makes the click lose meaning, and the animal tends to rely on other environmental cues (e.g. the hand going to the treat pouch) as a reliable predictor of an imminent reward. Finally, the article suggests that every solo click (no treat) is an extinction trial that weakens the meaning of the click.
Bob and Marian Bailey (1998) categorically testify that, in their experience, a Continuous Schedule of Reinforcement is by far the recommended approach for teaching and maintaining behaviour in animals for the vast majority of situations. They do mention a couple of very specific exceptions in which a Ratio Schedule might be used instead, but keep in mind that they never suggest that when we use a Ratio Schedule we should still click every correct response and withhold the backup reinforcer. As mentioned above, if we were to click every correct response we would be transforming our Ratio Schedule into a Continuous Schedule of Reinforcement.
What do we know from controlled experiments?
While training an animal, every time we pair a conditioned stimulus (click) with an unconditioned stimulus (a treat or other backup reinforcer), the animal undergoes a Pavlovian (also called classical or respondent) conditioning trial. Once a conditioned response is acquired, it can be maintained if the conditioned stimulus (the click) keeps being followed by the unconditioned stimulus (food, water, etc.). If, however, the conditioned stimulus is repeatedly presented without the unconditioned stimulus (click and no food), the response becomes weaker and weaker. This process is called extinction (Chance 2003). For the purposes of evaluating animal training studies, the more resistance to extinction we see, the more we can conclude that the conditioned reinforcer (the click) has acquired reinforcement value.
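To make the idea of acquisition and extinction more concrete, here is a minimal sketch using the Rescorla-Wagner learning rule, a standard textbook model of classical conditioning. Please note that this model is not used in any of the sources cited in this text; I am bringing it in purely as an illustration. The function name `click_value_after`, the learning rate of 0.2 and the trial counts are arbitrary assumptions, and the numbers it produces are not experimental data.

```python
def click_value_after(trials, learning_rate=0.2, start=0.0):
    """Return the click's associative strength after each trial.

    `trials` is a sequence of booleans:
      True  = click followed by a treat (pairing trial)
      False = solo click, no treat (extinction trial)

    Rescorla-Wagner update: value += learning_rate * (target - value),
    where target is 1.0 when the treat arrives and 0.0 when it does not.
    """
    value = start
    history = []
    for treated in trials:
        target = 1.0 if treated else 0.0
        value += learning_rate * (target - value)
        history.append(value)
    return history


# 20 click-treat pairings: the click gains strength.
paired = click_value_after([True] * 20)

# 20 solo clicks afterwards: the strength decays towards zero.
extinction = click_value_after([False] * 20, start=paired[-1])

# A hypothetical BC pattern (one treat for every four clicks) after the
# same initial pairing: the strength settles well below the 1:1 level.
blazing = click_value_after([True, False, False, False] * 10, start=paired[-1])
```

Under these assumptions the click's strength climbs close to 1.0 with consistent pairing, falls close to 0.0 when the treat is withheld altogether, and hovers at a much lower level when only one click in four is followed by a treat. Real animals are of course more complicated than one equation, so treat this as a way of picturing the process rather than as a prediction.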
McCall and Burgin (2002) conducted an experiment with 48 horses in which they trained them to press a lever for food. In an initial stage of the experiment, upon pressing the lever, half of the horses received a food reward only and the other half received a food reward preceded by an auditory buzzer as secondary reinforcer (similar to a click-treat sequence). Interestingly, they did not find that using an event marker prior to the delivery of primary reinforcement (buzzer-food) made these horses more resistant to extinction of a learned behaviour later on. In other words, when the bridging stimulus was still marking correct behaviour but no longer paired with primary reinforcement (buzzer-no treat), the horses stopped performing the learned behaviour just as quickly as the horses that were trained using a food reward only. A second part of the experiment determined that the horses were able to learn a new task (push a flap) with the use of secondary reinforcement only (buzzer-no treat), but their interest in the task was low. Reintroducing the primary reinforcement along with the buzzer renewed their interest in the task, so the authors concluded that secondary reinforcers are more efficient when paired with primary reinforcers at a high rate.
A similar experiment produced similar results (Williams et al. 2004). In this case, 60 horses learned to touch a plastic cone for either primary reinforcement alone (food only) or secondary and primary reinforcement (click-treat). The horses trained with the clicker did not show more resistance to extinction than the ones on primary reinforcement only. This means that once the clicker stopped being paired with the food reward the behaviour plummeted, just as it did with the horses that were not conditioned to the sound of the clicker. Importantly, these authors note that when the horses were put in the extinction trial (click-no treat) they appeared frustrated.
Smith and Davis (2008) trained dogs to touch a traffic cone with their noses. Some dogs got a click and treat, while others only got the treat when they touched the traffic cone. In this case, when the dogs were put on extinction trials, the ones that were conditioned to the clicker continued to perform the behaviour for longer if they continued to hear the clicker (but received no food) as feedback for correct responses. The authors concluded that the clicker might be able to maintain previously established behaviours when primary reinforcement cannot be delivered.
In another study, thirsty rats were conditioned to the sound of a buzzer before being offered water. With enough buzzer-water pairings the rats were then able to learn to push a lever for the sound of the buzzer, even though that behaviour stopped producing water. The buzzer became a conditioned reinforcer (Zimmerman 1957). Chance (2003) argues that the reinforcing power of the buzzer comes from being paired with another reinforcer and that it will lose its strength if it is never followed by that reinforcer. However, when water follows the sound of the buzzer, the sound may retain its reinforcing quality.
Langbein et al. (2007) conducted a shape discrimination learning task with dwarf goats (Capra hircus). As in other similar studies, some goats were trained using primary reinforcement only (water) while others received secondary and primary reinforcement (acoustic tone + water). In this study, it was concluded that secondary and primary reinforcement have to be paired at a high rate during learning and that the time delay between secondary and primary reinforcement should be short.
One of the most elucidating studies when trying to compare a 1:1 pairing to a non 1:1 pairing was conducted by Egger and Miller (1962). They conditioned rats to two different stimuli conditions (S1 and S2). In both conditions, the stimulus was paired up with food but for S1 rats every occurrence of the stimulus was followed by food (a 1:1 pairing), while for the S2 rats the stimulus was paired with food only occasionally (a non 1:1 pairing, similar to a BC approach). They then taught the rats to lever press using the conditioned stimuli. For the S2 group (non 1:1 pairing) the stimulus did not become an effective reinforcer while for the S1 group (1:1 pairing) it did.
Wennmacher (2007) trained dogs to perform a bow and a spin on cue and then compared their performance across two conditions: C+F (1:1 pairing of clicker and food) and C+C+F (clicking every correct response but offering food only for every other behaviour). In the C+F condition both dogs performed better in terms of frequency, accuracy and topography of the behaviour. In the C+C+F condition the dogs required more cues to perform the behaviour, showed increased noncompliance and other unwanted behaviours. The author also notes that in the C+C+F condition the dogs were less willing to come to the experimenter's location. In the C+F condition they showed a more enthusiastic body language.
The overall conclusions of these studies seem to be in agreement with the points brought up earlier. A 1:1 pairing of bridging stimulus and backup reinforcer is ideal to keep the strength of the bridging stimulus. In other words, the less we follow a click with a treat the less efficient the click will be. Additionally, a non 1:1 pairing seems to be sub-optimal and can even elicit frustration and other unwanted behaviours.
Some trainers even reverse the sequence of events, using the bridging stimulus after the primary reinforcement has been offered (treat-click; or behaviour-click-treat-click) when working with animals that have low motivation to eat. I believe the reasoning behind it is that the click might be stronger than the primary reinforcer, so when the animal actually accepts the food, bridging that moment will increase the likelihood of the animal eating better in the future. I have worked with several animals with low food drive and I must confess that I do not see the above method as a helpful approach to tackle the problem. One of the dogs I worked with recently was Oakley, a beautiful American Staffordshire Terrier. Like many other animals that I have worked with, she was not very food driven and would turn her head away from the treat (after the click) in the first few sessions. I decided to keep pairing every click with a food reward, and after a few sessions Oakley was taking the treats with gusto and looking way more food motivated than in the first few sessions.
What happens when we ask the animal for several behaviours and we click every correct response but we only follow that up with a backup reinforcer sometimes? As mentioned above, this is commonly but wrongly believed to be a Variable Schedule of Reinforcement (Martin and Friedman 2011). In addition, the click is an event marker and when we click and do not use an additional reinforcer the event will technically be an extinction event, thus weakening the strength of the click (Chance 2003). A real Variable Schedule of Reinforcement (if you really need to use one) would be one in which some correct responses do not get the click and the backup reinforcer, while others get both the click and the additional reinforcer. So, regardless of the schedule you choose, avoid doing solo clicks.
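The difference between the two procedures described above can be made visible with a small bookkeeping sketch. Everything in it is hypothetical: the function names, the session length of 40 correct responses, the 25% treat probability and the mean ratio of 4 are values I picked for illustration only. The point is simply to count how many solo clicks (extinction events for the click) each procedure produces.

```python
import random


def blazing_clickers(n_correct, treat_probability, rng):
    """Click EVERY correct response; follow only some clicks with a treat."""
    events = []
    for _ in range(n_correct):
        treat = rng.random() < treat_probability
        events.append(("click", "treat" if treat else None))
    return events


def paired_variable_ratio(n_correct, mean_ratio, rng):
    """Reinforce only some correct responses, but always click AND treat.

    Assumption: the number of responses required for each reinforcer is
    drawn uniformly from 1..(2*mean_ratio - 1), so it averages mean_ratio.
    """
    events = []
    count = 0
    needed = rng.randint(1, 2 * mean_ratio - 1)
    for _ in range(n_correct):
        count += 1
        if count >= needed:
            events.append(("click", "treat"))
            count = 0
            needed = rng.randint(1, 2 * mean_ratio - 1)
        else:
            events.append((None, None))  # no click and no treat
    return events


def summarise(events):
    """Count clicks, treats and solo clicks (click without treat)."""
    clicks = sum(1 for click, _ in events if click)
    treats = sum(1 for _, treat in events if treat)
    solo = sum(1 for click, treat in events if click and not treat)
    return {"clicks": clicks, "treats": treats, "solo_clicks": solo}


rng = random.Random(1)  # fixed seed so the example is repeatable
bc_summary = summarise(blazing_clickers(40, 0.25, rng))
vr_summary = summarise(paired_variable_ratio(40, 4, rng))
```

In the BC session the animal hears 40 clicks, one for every correct response, so from the click's point of view the schedule is continuous, and every click without a treat is a solo click. In the paired variable session there are fewer clicks, every one of them is followed by a treat, and the number of solo clicks is always zero.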
In operant terms, extinction events can create emotional behaviour, particularly aggression. Rats conditioned to press a lever for food have been observed biting the lever or other rats if lever pressing no longer produces a food reward (Azrin et al. 1966, Rilling and Caplan 1973). This is a very important consideration, especially for those working in free contact with animals that are capable of inflicting serious damage.
If you are an enthusiastic student of animal training and behaviour, you will notice that the literature contains few, if any, references to or recommendations for a BC approach to train an animal. There are many discussions about schedules of reinforcement and about which ones are better for different goals and circumstances (for a review and sound advice see Bailey and Bailey 1998), but the suggestion of using solo clicks is an extreme rarity. If you are comparing a Continuous to a Variable Schedule of Reinforcement, keep in mind that in both options click and treat should go together.
One of the main challenges in determining a BC approach’s efficiency is that in controlled studies it has been studied only occasionally. Clicking every correct response but only using a backup reinforcer occasionally seems to be a human construct that is widely used but rarely studied in controlled settings. With that said, what we do know from Classical and Operant Conditioning Theory and practical studies strongly suggests that a 1:1 click-treat pairing is a more efficient tool to establish clear communication.
If you are having issues with frustration-based aggression or if you see visual tracking for additional cues (e.g. tracking when you reach for the treat pouch), I highly recommend that you trial the 1:1 click-treat pairing approach. A good test for the strength of your bridge is the following: ask the animal to perform a behaviour and then remain completely still while sounding your bridge. Try this with a few different behaviours. Did the animal continue to perform the behaviour as if nothing had happened? If so, your bridging stimulus is not strong/clear enough and you should give this idea a chance.
If you currently do not have any problems in your training program you can keep the BC approach, but I would still advise against it. Here is a real-life comparison that will probably illustrate my view on this topic. You can watch a movie on a VHS tape and have a great time doing so. However, if you watch it on Blu-ray your experience is likely to be much better. Both options allow you to watch the movie and both options work, but on Blu-ray the picture and the sound are much better. Which option would you prefer?
So, in conclusion, should you adopt a BC approach to train your animal? After doing all this research I am left with the feeling that both approaches can work and yield good results, but I cannot help but think that, according to the evidence that we have, a 1:1 click-treat pairing approach seems to have fewer, if any, disadvantages when compared with a BC approach.
References
Azrin, N. H., Hake, D. F., Hutchinson, R. R. (1965). The Opportunity for Aggression as an Operant Reinforcer during Aversive Aggression. Journal of the Experimental Analysis of Behavior. 8(3), 171–180.
Bailey, B., Bailey, M., (1998). "Clickersolutions Training Articles - Ratios, Schedules - Why And When". Clickersolutions.com. N.p., Accessed 24 April 2016.
Chance, P., (2003). Learning and behavior (5th ed.). Belmont, CA: Wadsworth.
Egger, M. D., Miller, N. E., (1962). Secondary reinforcement in rats as a function of information value and reliability of the stimulus. Journal of Experimental Psychology, 64(2), 97-104.
Fernandez, E.J., (2001). Click or Treat: A Trick or Two in the Zoo. American Animal Trainer Magazine, 2, 41-44. Shedd Aquarium.
Langbein, J., Siebert, K., Nuernberg, G., Manteuffel, G., (2007). The impact of acoustical secondary reinforcement during shape discrimination learning of dwarf goats (Capra hircus). Applied Animal Behaviour Science. 103(1-2), 35–44.
Martin, S., Friedman, S.G., (2011, November). Blazing clickers. Paper presented at Animal Behavior Management Alliance conference, Denver, CO.
McCall, C.A., Burgin, S.E., (2002). Equine utilization of secondary reinforcement during response extinction and acquisition. Applied Animal Behaviour Science. 78, 253–262.
Rilling, M., Caplan, H. J., (1973). Extinction-induced aggression during errorless discrimination learning. Journal of the Experimental Analysis of Behavior. 20, 85-92.
Smith, S.M., Davis, E.S., (2008). Clicker increases resistance to extinction but does not decrease training time of a simple operant task in domestic dogs (Canis familiaris). Applied Animal Behaviour Science. 110(3-4), 318-329.
Wennmacher, P. L. (2007). Effects of Click + Continuous Food Vs. Click + Intermittent Food on the Maintenance of Dog Behavior (Master's Thesis). University of North Texas.
Williams, J.L., Friend, T.H., Nevill, C.H., Archer, G., (2004). The efficacy of a secondary reinforcer (clicker) during acquisition and extinction of an operant task in horses. Applied Animal Behaviour Science. 88, 331–341.
Zimmerman, D. W., (1957). Durable secondary reinforcement: Method and theory. Psychological Review. 64, 373-383.