Train Me Please

Schedules of reinforcement in animal training

1/2/2018

There are several options in terms of reinforcement schedules that can be used for behaviour modification. In this text I will provide you with a quick description of each of the different simple schedules and a couple of examples for each (one human example and one animal training example). I will also offer a couple of considerations for people debating the idea of which schedule to use for a given situation.

Early in my career I was told that, in general, a good way to go about training animals would be to use a continuous schedule of reinforcement for teaching a new behaviour and to then maintain the behaviour using a “variable schedule of reinforcement”. This is a very broad statement and one that seems to make sense to someone being introduced to animal training. However, is this really the best way to go about training animals? And what do people mean when they mention “a variable schedule of reinforcement”? Let’s start by defining the most common types of simple schedules of reinforcement according to Paul Chance’s book Learning and behavior (2003; figure 1).

Figure 1 – The most common types of simple reinforcement schedules

The simplest type of reinforcement schedule is a Continuous reinforcement schedule. In this case every correct behaviour that meets the established criteria is reinforced. For example, the dog gets a treat every time it sits when asked to do so; the salesman gets paid every time he sells a book.

Partial Schedules of reinforcement can be divided into Fixed Ratio, Variable Ratio, Fixed Interval and Variable Interval.

In a Fixed Ratio reinforcement schedule, the behaviour is reinforced after a fixed number of correct responses has occurred. For example, the dog gets a treat after sitting three times (FR 3); the salesman gets paid when four books are sold (FR 4).

In a Variable Ratio reinforcement schedule, the behaviour is reinforced after a variable number of correct responses has occurred. This number varies around a given average. For example, the dog gets a treat after sitting twice, after sitting four times and after sitting six times. The average in this example is four, so this would be a VR 4 schedule of reinforcement. Using our human example, if the salesman gets paid after selling five, fifteen and ten books he would be on a VR 10 schedule of reinforcement, given that ten is the average number around which his payments are offered.
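The averaging behind the VR label can be checked with a few lines of Python. This is only an illustrative sketch: the function name `variable_ratio_label` is made up for this example, and it simply averages the number of responses required for each reinforcer.

```python
from statistics import mean


def variable_ratio_label(responses_per_reinforcer):
    """Return the 'VR n' label for the response counts between reinforcers.

    Hypothetical helper: n is the plain average of the counts.
    """
    return f"VR {mean(responses_per_reinforcer):g}"


# The dog is treated after 2, 4 and 6 sits; the salesman is paid
# after 5, 15 and 10 books.
print(variable_ratio_label([2, 4, 6]))     # VR 4
print(variable_ratio_label([5, 15, 10]))   # VR 10
```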

In a Fixed Interval reinforcement schedule, a response is reinforced only if it occurs after a fixed amount of time has passed since the last reinforcer; responses made before the interval has elapsed are not reinforced. For example, if a dog is on an FI 8 schedule of reinforcement it will get a treat the first time it sits, but sitting will not produce treats for the next 8 seconds. After the 8-second period, the first sit will produce a treat again. The salesman will get paid after selling a book but will then receive no payment for books sold during the next 3 hours. After the 3-hour period, the first book he sells results in the salesman getting paid again (FI 3).

In a Variable Interval reinforcement schedule, the behaviour is reinforced after a certain variable amount of time has elapsed. The amount of time can vary around a given average. For example, instead of always reinforcing the sit behaviour after 8 seconds, that behaviour could be reinforced after 4, 8 or 12 seconds. In this case the average is 8, so it would be a VI 8 schedule of reinforcement. The salesman could be paid when selling a book after 1, 3 or 5 hours, a VI 3 schedule of reinforcement.
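For readers who find rules easier to follow as code, the schedules described above can be sketched in Python. This is a simplified illustration, not a training tool. The function names are invented for this example, a continuous schedule is treated as a ratio of 1, and the variable schedules cycle through a fixed list of values instead of drawing them at random, so that the output matches the examples in the text.

```python
from itertools import cycle


def reinforce_by_ratio(n_responses, requirements):
    """Return one bool per correct response: True if it earns a reinforcer.

    requirements lists how many responses each successive reinforcer costs,
    e.g. [1] for continuous, [3] for FR 3, [2, 4, 6] for a VR 4.
    Assumption: the list is cycled in order rather than randomised.
    """
    needed = cycle(requirements)
    target = next(needed)
    count = 0
    outcome = []
    for _ in range(n_responses):
        count += 1
        if count == target:
            outcome.append(True)
            count = 0
            target = next(needed)
        else:
            outcome.append(False)
    return outcome


def reinforce_by_interval(response_times, intervals):
    """Return one bool per response time: True if that response is reinforced.

    The first response is reinforced; after each reinforcer, responses earn
    nothing until the current interval has elapsed. Use [8] for FI 8 or
    [4, 8, 12] for a VI 8 (again cycled in order, as an assumption).
    """
    waits = cycle(intervals)
    available_at = 0
    outcome = []
    for t in response_times:
        if t >= available_at:
            outcome.append(True)
            available_at = t + next(waits)
        else:
            outcome.append(False)
    return outcome


# Continuous: every sit is treated.
print(reinforce_by_ratio(3, [1]))          # [True, True, True]
# FR 3: every third sit is treated.
print(reinforce_by_ratio(6, [3]))          # [False, False, True, False, False, True]
# VR 4: treats after 2, 4 and 6 sits, i.e. 3 treats for 12 sits.
print(sum(reinforce_by_ratio(12, [2, 4, 6])))  # 3
# FI 8: sits at these seconds; only the first sit after each 8 s wait pays.
print(reinforce_by_interval([0, 3, 7, 9, 12, 18], [8]))
# [True, False, False, True, False, True]
```

Note that in the ratio schedules the number of responses is what matters, whereas in the interval schedules extra responses during the waiting period change nothing.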

The next question would be “How do the different schedules of reinforcement compare to each other?”. Kazdin (1994) argues that a continuous schedule of reinforcement or at the very least a “generous” schedule of reinforcement is ideal when teaching new behaviours. After a behaviour has been learned, the choice of which type of reinforcement schedule to use becomes somewhat more complex. Kazdin also mentions that behaviours maintained under a partial schedule of reinforcement are more resistant to extinction than behaviours maintained under a continuous schedule of reinforcement. The thinner the reinforcement schedule for a certain behaviour, the more resistant to extinction that behaviour is. In other words, the learner presents more responses for fewer reinforcers under partial schedules when compared to a continuous schedule of reinforcement.

Figure 2 shows that, in general, a variable ratio schedule produces more responses for a similar or lower number of reinforcers than other partial schedules of reinforcement. In many situations it also seems to produce those responses faster and with little latency from the individual. This information, along with my own personal observations and communication with professionals in the field of animal training, makes me believe that when trainers use the broad term “Variable Schedule of Reinforcement” they usually mean a variable ratio schedule.

Figure 2 – Behaviour responses under the most common types of partial schedules of reinforcement (Chance, 2003; Kazdin, 1994; Schunk, 2012).

A variable ratio schedule might elicit the highest response rate, a constant pattern of responses with minimal pauses and the most resistance to extinction. A fixed ratio schedule has a slightly lower response rate, a steady pattern of responses and a resistance to extinction that is dependent on the ratio used. A fixed interval schedule produces a moderate response rate, a long pause in responding after reinforcement followed by gradual acceleration in responding and a resistance to extinction that is dependent on the interval chosen (the longer the interval, the more resistance). A variable interval schedule has a response rate similar to that of a fixed interval schedule, a steady pattern of responses and is more resilient to extinction than a fixed interval schedule. These characteristics of partial schedules of reinforcement are summarised in table 1.

Table 1 – Characteristics of the most common types of partial schedules of reinforcement (Wood, Wood & Boyd, 2005).

With all these different types of schedules, each with different characteristics, you might be wondering: “Do I need to master all of these principles to successfully train my pet at home?” The quick and simple answer is “No, you don’t”. For most animal training situations, a continuous schedule of reinforcement will be a simple, easy and effective tool that will yield the results you want.

Doing a training session with your dog in which you ask for behaviours on cue when the dog is in front of you (sit, down, stand, shake, play dead) could be very well maintained using a continuous schedule. A continuous schedule of reinforcement would be an efficient and easy approach and it would allow you to change the cue or stop a behaviour easily (faster extinction) if you change your mind about a given behaviour later. One could argue that a variable ratio schedule would possibly produce more responses with less reinforcement, and a higher resistance to extinction for these behaviours. One of the disadvantages of this option would be the possibility of a ratio strain (post-reinforcement pauses or decrease in responding).

Some specific situations might justify the maintenance of a behaviour using partial schedules of reinforcement. For example, when a dog has learned that lying down on a mat in the living room results in reinforcement, the dog’s carer could maintain this behaviour using a variable interval schedule of reinforcement, in which the dog only gets reinforced after varying amounts of time for lying on the mat. Martin and Friedman (2011) offer another example in which partial reinforcement schedules could be helpful. If a trainer wants to train a lion to make several trips to a public viewing window throughout the day, the behaviour should be trained using a continuous schedule to get a high rate of window passes in the early stages. The trainer should then use a variable ratio schedule of reinforcement to maintain the behaviour. They do advise, however, that this would require “careful planning to keep the reinforcement rate high enough for the lion to remain engaged in the training”.

The process of extinction of a reinforced behaviour means withholding the consequence that reinforces the behaviour and it is usually followed by a decline in the presentation of that behaviour (Chance, 2003). Resistance to extinction can be an advantage or a disadvantage depending on which behaviour we are considering. For example, one could argue that a student paying attention to their teacher would be a behaviour that should be resistant to extinction, and so, a good option to be kept on a partial schedule of reinforcement. On the other hand, a dog that touches a bell to go outside could be kept on a continuous schedule of reinforcement. One of the advantages of this approach would be that, if in the future the dog’s owner decides that she no longer wants the dog to touch the bell, by not reinforcing it anymore, the behaviour could cease to happen relatively quickly.

While I do believe that for certain specific situations, partial schedules of reinforcement might be helpful, I would like to take a moment to caution against the use of a non-continuous pairing of bridge and backup reinforcer. Many animal trainers call this a “variable schedule of reinforcement” when in practical terms this usually ends up being a continuous reinforcement schedule that weakens the strength and reliability of the bridge. For more information on this topic check my blog post entitled “Blazing clickers – Click and always offer a treat?”.

When asked about continuous vs. ratio schedules, Bailey & Bailey (1998) have an interesting general recommendation: “If you do not need a ratio, do not use a ratio. Or, in other words, stick to continuous reinforcement unless there is a good reason to go to a ratio”. They also describe that they have trained and maintained numerous behaviours with a wide variety of animal species using exclusively a continuous schedule of reinforcement. They raise some possible complications when deciding to have a behaviour maintained on a ratio schedule. The example given is of a dog’s sit behaviour being maintained on an FR 2 schedule of reinforcement: “You tell the dog sit – the first response is a bit sloppy, the second one is ok. You click and treat. What have you reinforced? A sloppy response, chained to a good response.”

Karen Pryor (2006) also has an interesting view on this topic. She mentions that during the early stages of training a new behaviour you start by using a continuous schedule of reinforcement to get the first few responses. Then, when you decide to improve the behaviour and raise criteria, the animal is put on a variable ratio schedule, because not every response is going to result in reinforcement. This is an interesting point, because the trainer could look at this situation and still read it as a continuous schedule of reinforcement, when in reality the animal is producing responses that are not resulting in reinforcement. At this point in time, only our new “correct responses” will result in reinforcement. From the learner’s point of view the schedule has become variable at this stage. Pryor concludes that when the animal “is meeting the new criterion every time, the reinforcement becomes continuous again.”

Pryor (2006) suggests that the situations in which you should deliberately use a variable ratio schedule of reinforcement are: “in raising criteria”, when “building resistance to extinction during shaping” and “for extending duration and distance of a behaviour”. Regarding the situations in which we should not use it, she starts by saying that we should never use a variable ratio schedule purely as “a maintenance tool”. She adds that “behaviours that occur in just the same way with the same level of difficulty each time are better maintained by continuous reinforcement”. Pryor also advises against the use of a variable ratio schedule for maintaining chains, because “failing to reinforce the whole chain at the end of it would inevitably lead to pieces of the chain beginning to extinguish down the road.” Finally, she does not recommend using such a schedule of reinforcement for discrimination problems such as scent, match to sample tasks, or any other training that requires choice between two or more items.

In conclusion, there are a few possible schedules of reinforcement that can be effectively used to train and maintain trained behaviours for our pets. Each has its own set of characteristics, but for most training situations, a continuous schedule of reinforcement is a simple, efficient and powerful tool to effectively communicate with our pets. Some specific training situations might be good candidates for partial schedules of reinforcement. In those situations, you should remember to follow each bridge with a backup reinforcer, plan your training well and keep the reinforcement rate high enough for the animal to remain engaged. Have fun with your training!

Bailey, B., & Bailey, M. (1998). Clickersolutions Training Articles - Ratios, Schedules - Why And When. Clickersolutions.com. Accessed 2 February 2018.

Chance, P. (2003). Learning and behavior (5th ed.). Belmont: Thomson Wadsworth.

Kazdin, A. (1994). Behavior modification in applied settings (5th ed.). Belmont: Brooks/Cole Publishing Company.

Martin, S., & Friedman, S.G. (2011, November). Blazing clickers. Paper presented at Animal Behavior Management Alliance conference, Denver, CO.

Pryor, K. (2006). Reinforce Every Behavior?. Clickertraining.com. Retrieved 2 February 2018, from https://clickertraining.com/node/670

Schunk, D. (2012). Chapter 3: Behaviorism. In Learning theories: An educational perspective (6th ed., pp. 71-116). MA: Pearson.

Wood, S., Wood, E., & Boyd, D. (2005). The world of psychology (5th ed., pp. 180-190). Boston: Allyn & Bacon. Retrieved from http://www.pearsonhighered.com/samplechapter/0205361374.pdf

Picture: www.morguefile.com

Jackpots in Animal Training

15/6/2017

In 2009, shortly after I started training animals on a more regular basis, one of the first concepts that I learned from fellow animal training colleagues was the concept of a jackpot. A quick Google search yields the following definition: “a large cash prize in a game or lottery, especially one that accumulates until it is won.” For animal training purposes the following definition is more commonly used: “giving a dog a really big reward, often a large number of treats, all at once. It is usually reserved for a breakthrough moment or a desired behaviour that the dog only occasionally performs” (Schwarz, 2016). This is the first definition that I was exposed to, and the idea behind it is that by offering a large reward the behaviour that preceded it is somehow more likely to be remembered and repeated in the future.

My first contact with this concept was in a context in which the animals were trained using a bridge or bridging stimulus (e.g. a click from a clicker) for both learning and maintaining known behaviours. Known behaviours do not necessarily need a bridge to be maintained, but that is a topic for another discussion. The theory goes that if Fido gets a click and three pieces of food for a perfect sit and a click and only one treat for a decent, but not perfect sit, he will be more likely to do perfect sits in the future.

We might be inclined to assume that an animal will be tuned in to the magnitude or quality of the reinforcer in a way that makes some variations of the same behaviour more likely to be repeated than others. However, is that really what happens? Does the animal actually remember the topography of that behaviour better because she got 5 or 10 food treats after the click instead of the standard one treat? In this text we will explore the function and the best use of jackpots in animal training by drawing on the opinions of animal training professionals.

Jackpots are commonly used as a special reward for excellent behaviours. They are an attempt by the trainer to capitalise on a behaviour (or a variation of the behaviour) that the trainer particularly likes. This seems to be based on the assumption that a particular special reward will increase the chances of similar responses in the future. For example, Kazdin (1994) mentions that “The greater the amount of the reinforcer delivered for a response, the more frequent the response will be.” However, research confirming this rationale regarding animal training, with a bridging stimulus, is hard to find. If you click and pay more than one treat there are a few things happening that are helpful for your training program, but those things might differ from the traditional interpretation of jackpots. So, let’s start by having a look at some quotes by international references in the world of animal training and how they contrast with the common understanding of jackpots in animal training.

“A jackpot serves to charge up future performance but does little to communicate to the animal that his previous actions were special.” (Reid, 2012).

“If you click, and then deliver the treat afterwards, an especially large, numerous, or wonderful treat is no different from any other treat, in terms of its ability to reinforce behavior.” (Pryor, 2006).

“Click means treat is coming. If the treat is sometimes a kibble and sometimes chicken, sometimes small and sometimes huge, that's fine, it keeps your clicker nice and strong; but it doesn't tell the animal anything different about the behavior.” (Pryor, 2006).

“When it comes to training a new behavior, it's rare that a jackpot would work in having the dog repeat the jackpot earning behaviour.” (Fisher, 2009).

“Jackpots make the giver feel good, but they interrupt the flow of training and focus the dog on the food, rather than the task. (…) Overall, it's clarity of criteria and a consistently high rate of reinforcement that leads to a solid behavior.” (Alexander, 2006).

As you can see from these quotes, there are several animal training specialists suggesting a different interpretation for what really happens when we bridge a behaviour and offer a bigger reward afterwards. Let’s explore their rationale and look into what really happens when we use such an approach.

Clicking and paying several treats can increase the value of the clicker. Given that there is some variability regarding what happens after the click, that stimulus (the click) remains nice and strong from the animal’s perspective (Pryor, 2006). When a large reward is offered at the beginning of a training session it can motivate the animal and increase interest in the task. It can make the animal increase its activity level and it can trigger subsequent variable behaviour (Fisher, 2009). So, as you can see, offering several rewards after the click can actually accomplish a few handy things. These are some of the things that happen when we use the traditional interpretation of jackpots in animal training. Now let’s have a look at a few things that do not necessarily happen.

Clicking and offering several treats does not provide the animal with any additional information about the behaviour that she just did. Offering more than one treat after the click is also unlikely to strengthen a behaviour over another; what ultimately accomplishes that goal is when you choose to use your clicker: clicked and rewarded behaviours are more likely to occur in the future when compared to behaviours that do not get a click and reward. For many practical situations in which we are training our pets, offering several treats after the click simply tells the animal that sometimes it gets more treats than usual (Pryor, 2006; Farricelli, 2014).

For her master’s thesis, dog trainer Elizabeth Kershaw (2002) conducted a dog training experiment that tried to measure the effects of magnitude of reinforcement after the click when dogs are learning a new task. She had two groups of dogs learning to touch a cone with their nose and with their paw. One group progressed through criteria with one click and one treat all the time (constant group), whilst another group progressed through criteria with one click and one treat most of the time and an occasional click and delivery of larger reinforcement amounts (jackpot group). Overall, significant differences in performance between the two groups could not be detected.

Kershaw (2002) also mentions that using a jackpot to reward a breakthrough when the dog is learning a new behaviour might be a better option when it marks the end of the session. Using a jackpot halfway through a session, when you intend to continue immediately might be counter-productive. This can cause the dog to not be able to associate the larger reward received with the intended behaviour because a longer period of eating can disrupt the learning flow.

Fisher (2009) offers some interesting additional considerations about the traditional use of jackpots in dog training. She mentions that the longer it takes for the animal to eat the reward, the more the behaviour might be subject to memory decay (a disconnect between the reward and the behaviour that caused it). Instead of strengthening a behaviour, jackpotting can prompt the dog to follow it with a different behaviour. For speedy learning, a short time span between reward and the next repetition might be ideal and the training will progress faster with a rapid rate of reinforcement (many repetitions, each resulting in quick-to-ingest treats).

So, what if we still want to incorporate jackpots in our animal training sessions? What are the properties of a real jackpot? A real jackpot should function as an event marker (no bridge required) and almost startle the animal; it should consist of an unusual primary reinforcer and it has to make that behaviour more likely to happen again. A jackpot, when used correctly, should be an astonishingly big reinforcer, delivered contingently. The jackpot has to appear while the animal is doing the behaviour, not afterwards (Pryor, 2006). If the reward is offered after the behaviour we enter the realm of the non-contingent reward.

A non-contingent reward is a reward that is offered after the behaviour has occurred as opposed to while the behaviour is occurring. A non-contingent reward is not necessarily associated with any specific behaviour; it can be used to encourage the animal in a given situation and it can increase motivation (Pryor, 2006). Pryor offers the example of a slot machine jackpot, which is always delivered contingently (while you are playing), so that the act of playing is heavily rewarded. Compare this with a situation in which you play the slot machine, you then go out for dinner, then you go to a music concert and finally you return to your hotel bedroom to find a huge sum of money on your bed. This is the same amount of money as the slot machine jackpot, but this time it was not delivered contingently. Hard to say which behaviour would increase in this case... Perhaps going back to the hotel, but not necessarily the slot machine playing.

To conclude, when we click and offer a bigger reward (say 3, 5 or 10 treats) we might be keeping the animal’s motivation high or increasing it, which means that the following behaviours can be more enthusiastic. The bridge is also kept nice and strong, because there is some variability in what happens after we use it. This procedure is not a real jackpot though. A real jackpot should be totally unexpected and almost startle the animal, it should be a rare event, and you should not click the behaviour (clicking and offering lots of treats will change the connection between the bridge and the reinforcer, not the behaviour that you bridged).

Animal training is a fluid technology that is constantly being updated. 20 or 30 years ago we were probably looking at jackpots in animal training in a different way than we are today. A few years from now, today’s knowledge might get updated and refined. That is the beauty of the animal training world and I can’t wait to see what the next chapter brings us.

Alexander, M. (2006). Should You "Jackpot" Outstanding Responses? Clickertraining.com. Retrieved 1 May 2017, from https://clickertraining.com/node/632

Farricelli, A. (2014). Using Jackpots of Treats in Dog Training. hubpages. Retrieved 5 May 2017, from https://hubpages.com/animals/Using-Jackpots-of-Treats-in-Dog-Training

Fisher, G. (2009). The Thinking Dog: Crossover to Clicker Training (1st ed.). Wenatchee, Wash.: Dogwise Pub.

Kazdin, A. (1994). Behavior modification in applied settings (5th ed., p. 147). Pacific Grove, CA: Brooks/Cole Publishing Company.

Kershaw, E. (2002). An evaluation of the use of magnitude of reinforcement, i.e. “jackpot” rewards, during shaping in the training of pet dogs. (MSc). University of Southampton New College.

Pryor, K. (2006). Jackpots: Hitting it Big | Karen Pryor Clicker Training. Clickertraining.com. Retrieved 29 April 2017, from https://clickertraining.com/node/825

Reid, P. (2012). Dog Insight (1st ed.). Wenatchee, WA: Dogwise.

Schwarz, S. (2016). AgilityNerd Dog Agility Blog : Better Jackpot Rewards. Agilitynerd.com. Retrieved 4 May 2017, from http://agilitynerd.com/blog/dog/training/Jackpot.html

Picture: www.morguefile.com

Blazing clickers – Click and always offer a treat?

13/5/2016

In modern animal training it is common to use a clicker or any other type of bridging stimulus to allow for better communication during the training process. The clicker allows us to tell the animal when the correct behaviour has been done. It also helps establish contingency between the behaviour and its consequence, thus strengthening the behaviour. Research from Pavlov and Skinner suggests that, for the clicker to retain its full strength as an event marker, every click should be paired with a backup reinforcer (1:1 click-treat pairing; for a review on this topic see Fernandez 2001). Many trainers around the world do not pair every click with another well-established reinforcer (a non-1:1 pairing) and still achieve successful outcomes when training animals. Martin and Friedman (2011) call this approach “blazing clickers” and define it as “the unsystematic, rapid-fire clicking of each correct response in a series of correct responses, without following every click with a well-established, backup reinforcer, i.e., click, no treat”.

The following text is a discussion of my personal experience, the experts’ opinion, the controlled experiments and the literary references that relate to the topic of click-treat pairing in animal training. By debating it I hope to bring some clarity to the issue and help animal trainers make a more informed decision when deciding on how to use their bridging stimulus. Learning theory applies to all species, so the following content should be equally relevant for those of you training dogs, pet rats or even dolphins, if you are lucky enough to have the opportunity to do that for a living.

Terminology

To make this text easier to understand, I will refer to some of the terminology that Martin and Friedman (2011) used in their insightful paper “Blazing Clickers”, as indicated below:

“1. The word click refers to any conditioned reinforcer used in training to reinforce a behaviour with super contiguity. It is used synonymously with conditioned or secondary reinforcer, bridging stimulus, bridge, event marker and marker.
2. The word treat refers to any well-established reinforcer, conditioned or unconditioned, used to condition and maintain the reinforcing strength of the click. Treat is used synonymously with backup reinforcer (most often in animal training the backup reinforcer is food).
3. The term blazing clickers refers to the practice of repeatedly clicking without systematically delivering the backup reinforcer, also referred to as solo clicks.”
4. Many trainers mention a Variable Schedule of Reinforcement when they are technically referring to a Variable Ratio Schedule of Reinforcement. For simplicity, in this text we will maintain the general term "Variable Schedule of Reinforcement", but keep in mind that Partial Schedules of Reinforcement are divided into Fixed Ratio, Variable Ratio, Fixed Interval and Variable Interval.

Personal observations and experience

My first contact with animal training happened when I was training dogs belonging to family members and friends. I was reading animal training books, watching dog training DVDs and then practising the learned skills with these dogs. I soon realised how powerful classical and operant conditioning can be and got absolutely hooked on the topic. Soon after that I started working in the marine animal training field and quickly learned that worldwide there were lots of parks, zoos and aquariums in which the trainers used a Blazing Clickers (BC) approach (a non-1:1 pairing of clicker and backup reinforcer). This was a surprise to me because I had never seen this approach being suggested in any animal training book or article that I had encountered. Yet these animals were extremely well trained and capable of amazing behaviours. The field of marine animal training is full of incredible people that love to share their passion, techniques and successes with fellow colleagues and thus I started using a BC approach with considerable success and personal satisfaction.

A few years later I read the article “Blazing Clickers” (Martin and Friedman 2011) explaining the reasons why a 1:1 click-treat pairing is recommended in animal training and alerting trainers to the downside of not pairing every click with an additional reinforcer. Whenever I see someone who is an international reference in a given field suggesting something that is different from what I have been doing I tend to embrace that new approach as a better option. One could be tempted to ignore such advice because of previous success with a different approach. However, I personally find it very helpful to be open to new approaches and trial them, especially when they are put forward by someone that definitely knows more than I do about a certain topic. And thus, from that moment onwards I have always looked at a BC approach as a non-ideal technique for animal training. I kept that view for several years and yet, I still see many animals today that are trained with a BC approach.

According to my personal observations and experience, a BC approach is relatively common in the captive animal field. If you are going to a public aquarium to see a sea lion or dolphin presentation you are likely to see a BC approach. If, on the other hand, you go to a modern puppy class or if you sign up for a dog training course/certification you are likely to encounter a 1:1 click-treat pairing approach. It is not clear why the captive animal community uses a different approach than the dog training field but I would speculate that two of the main reasons would be:

1. A lot of the original work about schedules of reinforcement was done without the use of a bridging stimulus (e.g. a click or a whistle). From these studies we learned that a Variable Schedule of Reinforcement can make behaviour more resilient to extinction and make the animal more persistent about getting reinforcement (this does not necessarily mean that it is the best schedule for most animal training situations; see Bailey and Bailey 1998). When captive animal trainers started to try to implement these schedules they ended up assuming that the “Variable” portion of the equation only applied to the backup reinforcer and not to the bridging stimulus. This is a mere misinterpretation of what a real Variable Schedule of Reinforcement is. If you click every correct response but only follow it with a backup reinforcer occasionally you are still technically using a Continuous Schedule of Reinforcement, but one that unfortunately weakens the power of your bridging stimulus. More on this later…
2. Using a bridging stimulus is highly reinforcing for the trainer. It gives us a sense of accomplishment if we can get lots of correct responses from the animal and I would imagine that a session with 40 clicks is more reinforcing to the trainer than a session with 15 clicks. In practical terms a BC approach might actually be more reinforcing for the trainer than for the animal.

What do the experts say?

In their 2011 article, Martin and Friedman list five common misconceptions that trainers have and commonly use to justify a BC approach. The following list includes these common misconceptions and the main points for why they are pure misconceptions, as suggested by Steve Martin and Susan Friedman:

The clicker is already a reinforcer (sometimes as strong or even stronger than a primary reinforcer) so there is no need for an additional one:
1. Even though some secondary reinforcers can be as strong or even stronger than a primary reinforcer they still depend on repeated pairing with other reinforcers to acquire and maintain their reinforcing ability;
2. Primary reinforcers are automatically reinforcing or pre-wired, while secondary reinforcers depend on pairing with additional reinforcers;
3. Every time a click happens without a backup reinforcer, it just lost some of its ability to work as a reinforcer;
4. If the click fails to predict a treat, the animal may develop a tendency to scan the environment for other cues (such as the trainer’s hand moving towards the treat bag); the animal might actually respond to this visual stimulus before or after hearing the click and thus use it as its “official” bridge.
BC makes training more interesting and unpredictable for the animal:
1. Although variety is important, it should come from the variety and quantity of the reinforcers, the behaviours trained and the pace of the session; not from using a BC approach;
2. Animals may become inattentive in a training session due to blazing behaviours (lots of behaviours asked in quick succession with clicks after each correct behaviour and then a big reward at the end); as an example, targeting is most helpful when the behaviour is held for some duration of time, instead of a rapid succession of several quick targets.
The behaviour will be stronger using a BC approach because it is a variable schedule of reinforcement similar to a slot machine:
1. An intermittent schedule of reinforcement creates persistence into fluent behaviour, but if the clicker is an effective conditioned reinforcer, withholding the treat does not change the fact that we are still using a continuous schedule of clicks. If the click is not an efficient conditioned reinforcer (meaningless noise) the animal has to try to find the behaviour-consequence contingency with other environmental cues;
2. When persistence is required it is better to teach the new behaviour with continuous reinforcement (click-treat) and then gradually stretch the reinforcement over time to the desired variable schedule (still pairing the click and treat, but varying the length of time or number of repetitions that the animal needs to perform the behaviour to be reinforced).
It reduces frustration-based aggression because the animal is not expecting a treat every time:
1. Plan your sessions to have enough backup reinforcers or end the session sooner;
2. There are data showing that extinction trials (click-no treat) can create frustration-induced aggression.
The clicker can tell the animal that he/she did something right but that he/she should keep doing it. The click can mean different things:
1. A keep going signal (KGS) is indeed helpful in animal training, but the click should not mean two different things;
2. The click meaning both “keep going” and “food is coming” makes for very unclear communication;
3. The KGS and the formal bridge (click) should be two different stimuli.

In sum, Martin and Friedman (2011) state that clickers, whistles and other event markers can be used to improve communication between trainer and trainee, but that this communication is only clear when the conditioned reinforcer is systematically paired with a well-established backup reinforcer. When the click is not reliably paired with other reinforcers, communication becomes less clear, motivation and performance can go down and frustration/aggression can go up. Not pairing a click with another reinforcer makes the click lose meaning and the animal tends to rely on other environmental cues (e.g. the hand going to the treat pouch) as a reliable predictor of an imminent reward. Finally, the article suggests that every time we do a solo click (no treat) the animal undergoes an extinction trial that weakens the meaning of the click.

Bob and Marian Bailey (1998) categorically testify that, in their experience, a Continuous Schedule of Reinforcement is by far the recommended approach for teaching and maintaining behaviour in animals for the vast majority of situations. They do mention a couple of very specific exceptions in which a Ratio Schedule might be used instead, but keep in mind that they never suggest that when we use a Ratio Schedule we should still click every correct response and withhold the backup reinforcer. As mentioned above, if we were to click every correct response we would be transforming our Ratio Schedule into a Continuous Schedule of Reinforcement.

What do we know from controlled experiments?

While training an animal, every time we pair a conditioned stimulus (the click) with an unconditioned stimulus (a treat or other backup reinforcer), the animal undergoes a Pavlovian (also called classical or respondent) conditioning trial. Once a conditioned response is acquired it can be maintained as long as the conditioned stimulus (the click) is followed by the unconditioned stimulus (food, water, etc.). If, however, the conditioned stimulus is repeatedly presented without the unconditioned stimulus (click and no food) the response becomes weaker and weaker. This process is called extinction (Chance 2003). For the purposes of evaluating animal training studies, the more resistance to extinction we see, the more we can conclude that the conditioned reinforcer (the click) has acquired reinforcement value.

McCall and Burgin (2002) conducted an experiment with 48 horses in which they trained them to press a lever for food. In an initial stage of the experiment, upon pressing the lever, half of the horses received a food reward only and the other half received a food reward preceded by an auditory buzzer as secondary reinforcer (similar to a click-treat sequence). Interestingly, they did not find that the use of an event marker prior to the delivery of primary reinforcement (buzzer-food) made these horses more resistant to extinction of a learned behaviour later on. In other words, when the bridging stimulus was still marking correct behaviour but was no longer paired with primary reinforcement (buzzer-no treat) the horses stopped performing the learned behaviour just as quickly as the horses that were trained using a food reward only. A second part of the experiment determined that the horses were able to learn a new task (push a flap) with the use of secondary reinforcement only (buzzer-no treat), but their interest in the task was low. Reintroducing the primary reinforcement along with the buzzer renewed their interest in the task, so the authors concluded that secondary reinforcers are more efficient when paired with primary reinforcers at a high rate.

Another experiment produced similar results (Williams et al. 2004). In this case, 60 horses learned to touch a plastic cone for either primary reinforcement alone (food only) or secondary and primary reinforcement (click-treat). The horses trained with the clicker did not show more resistance to extinction than the ones trained with primary reinforcement only. This means that once the clicker stopped being paired with the food reward the behaviour plummeted just as it did with the horses that were not conditioned to the sound of the clicker. Importantly, these authors note that when the horses were put on extinction trials (click-no treat) they appeared frustrated.

Smith and Davis (2008) trained dogs to touch a traffic cone with their noses. Some dogs got a click and treat, while others only got the treat when they touched the traffic cone. In this case, when the dogs were put on extinction trials, the ones that were conditioned to the clicker continued to perform the behaviour for longer if they continued to hear the clicker (but received no food) as feedback for correct responses. The authors concluded that the clicker might be able to maintain previously established behaviours when primary reinforcement cannot be delivered.

In another study, thirsty rats were conditioned to the sound of a buzzer before being offered water. With enough buzzer-water pairings the rats were then able to learn to push a lever for the sound of the buzzer, even though that behaviour stopped producing water. The buzzer became a conditioned reinforcer (Zimmerman 1957). Chance (2003) argues that the reinforcing power of the buzzer comes from being paired with another reinforcer and that it will lose its strength if it is never followed by that reinforcer. However, when water follows the sound of the buzzer, the sound may retain its reinforcing quality.

Langbein et al. (2007) conducted a shape discrimination learning task with dwarf goats (Capra hircus). As in other similar studies, some goats were trained using primary reinforcement only (water) while others received secondary and primary reinforcement (acoustic tone + water). In this study, it was concluded that secondary and primary reinforcement have to be paired at a high rate during learning and that the time delay between secondary and primary reinforcement should be short.

One of the most illuminating studies comparing a 1:1 pairing to a non 1:1 pairing was conducted by Egger and Miller (1962). They conditioned rats under two different stimulus conditions (S1 and S2). In both conditions the stimulus was paired with food, but for the S1 rats every occurrence of the stimulus was followed by food (a 1:1 pairing), while for the S2 rats the stimulus was paired with food only occasionally (a non 1:1 pairing, similar to a BC approach). They then taught the rats to lever press using the conditioned stimuli. For the S2 group (non 1:1 pairing) the stimulus did not become an effective reinforcer, while for the S1 group (1:1 pairing) it did.

Wennmacher (2007) trained dogs to perform a bow and a spin on cue and then compared their performance across two conditions: C+F (1:1 pairing of clicker and food) and C+C+F (clicking every correct response but offering food only for every other behaviour). In the C+F condition both dogs performed better in terms of frequency, accuracy and topography of the behaviour. In the C+C+F condition the dogs required more cues to perform the behaviour, showed increased noncompliance and other unwanted behaviours. The author also notes that in the C+C+F condition the dogs were less willing to come to the experimenter's location. In the C+F condition they showed a more enthusiastic body language.

The overall conclusions of these studies seem to be in agreement with the points brought up earlier. A 1:1 pairing of bridging stimulus and backup reinforcer is ideal to keep the strength of the bridging stimulus. In other words, the less we follow a click with a treat the less efficient the click will be. Additionally, a non 1:1 pairing seems to be sub-optimal and can even elicit frustration and other unwanted behaviours.
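To make the idea of a click "losing strength" more concrete, here is a minimal sketch in Python. It is not taken from any of the studies above. It borrows a simple error-correction learning rule (the Rescorla-Wagner rule, which this article does not otherwise discuss), and the learning rate of 0.2 and the trial counts are arbitrary assumptions chosen only for illustration. The function name is hypothetical.

```python
def click_value_after(trials, start=0.0, learning_rate=0.2):
    """Return a toy 'strength' of the click after a sequence of clicks.

    Each element of `trials` is True when the click was followed by a
    backup reinforcer (click-treat) and False for a solo click.
    The update rule and the learning rate are illustrative assumptions.
    """
    value = start
    for treat_followed in trials:
        # The click's value moves towards 1 after click-treat
        # and towards 0 after a solo click (an extinction trial).
        target = 1.0 if treat_followed else 0.0
        value += learning_rate * (target - value)
    return value


# 1:1 pairing: 40 clicks, each followed by a treat.
one_to_one = click_value_after([True] * 40)

# Treat after every other click (similar to the C+C+F condition).
every_other = click_value_after([False, True] * 20)

# A well-conditioned click followed by 10 solo clicks in a row.
after_solo_clicks = click_value_after([False] * 10, start=one_to_one)

print(round(one_to_one, 2), round(every_other, 2), round(after_solo_clicks, 2))
```

In this toy model the click that is always followed by a treat ends up with the highest value, the click that is followed by a treat only half of the time settles at a clearly lower value, and a run of solo clicks erodes a previously strong click. Real animals are of course far more complex than one line of arithmetic, so this should be read as an illustration of the direction of the effect and not as a prediction.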

Additional considerations

Some trainers even reverse the sequence of events by using the bridging stimulus after the primary reinforcement has been offered (treat-click; or behaviour-click-treat-click) when working with animals that have low motivation to eat. I believe that the reasoning behind it is that the click might be stronger than the primary reinforcer, so when the animal actually accepts the food, bridging that moment will increase the likelihood of the animal eating better in the future. I have worked with several animals with low food drive and I must confess that I do not see this method as a helpful way to tackle the problem. One of the dogs I worked with recently was Oakley, a beautiful American Staffordshire Terrier. Like many other animals that I have worked with, she was not very food driven and would turn her head away from the treat (after the click) in the first few sessions. It was decided to keep pairing every click with a food reward and after a few sessions Oakley was taking the treats with gusto and looking far more food motivated than in the first few sessions.

What happens when we ask the animal for several behaviours and we click every correct response but we only follow that up with a backup reinforcer sometimes? As mentioned above, this is commonly but wrongly believed to be a Variable Schedule of Reinforcement (Martin and Friedman 2011). In addition, the click is an event marker and when we click and do not use an additional reinforcer the event will technically be an extinction event, thus weakening the strength of the click (Chance 2003). A real Variable Schedule of Reinforcement (if you really need to use one) would be one in which some correct responses do not get the click and the backup reinforcer, while others get both the click and the additional reinforcer. So, regardless of the schedule you choose, avoid doing solo clicks.
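The difference between a real variable schedule and a BC approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the variable ratio requirement is drawn uniformly around the chosen mean, and the BC session simply offers a treat on every fourth click.

```python
import random


def variable_ratio_session(n_responses, mean_ratio, seed=0):
    """Variable ratio schedule in which click and treat always go together.

    The number of correct responses required for reinforcement varies
    uniformly between 1 and 2 * mean_ratio - 1, so it averages mean_ratio.
    """
    rng = random.Random(seed)
    events = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for _ in range(n_responses):
        count += 1
        if count >= required:
            events.append(("click", "treat"))  # reinforced response
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
        else:
            events.append(())  # correct response, no click and no treat
    return events


def bc_session(n_responses, treat_every):
    """BC approach: every correct response is clicked, few get a treat."""
    events = []
    for i in range(1, n_responses + 1):
        if i % treat_every == 0:
            events.append(("click", "treat"))
        else:
            events.append(("click",))  # solo click
    return events


def solo_clicks(events):
    """Count clicks that were not followed by a backup reinforcer."""
    return sum(1 for event in events if event == ("click",))


vr = variable_ratio_session(60, mean_ratio=4)
bc = bc_session(60, treat_every=4)
print(solo_clicks(vr), solo_clicks(bc))
```

In the variable ratio session some correct responses get nothing at all, but every click is backed by a treat, so there are no solo clicks. In the BC session every response is still clicked (a continuous schedule of clicks) and three out of four clicks are solo clicks.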

In operant terms, extinction events can create emotional behaviour, particularly aggression. Rats conditioned to press a lever for food have been observed biting the lever or other rats if lever pressing no longer produces a food reward (Azrin et al. 1966, Rilling and Caplan 1973). This is a very important consideration, especially for those working in free contact with animals that are capable of inflicting serious damage.

If you are an enthusiastic student of animal training and behaviour you will realise that the literature contains few, if any, references to or recommendations for a BC approach to training an animal. There are many discussions about schedules of reinforcement and about which ones are better for different goals and circumstances (for a review and sound advice see Bailey and Bailey 1998), but the suggestion of using solo clicks is extremely rare. If you are comparing a Continuous to a Variable Schedule of Reinforcement keep in mind that in both options click and treat should go together.

Conclusions

One of the main challenges in determining a BC approach’s efficiency is that in controlled studies it has been studied only occasionally. Clicking every correct response but only using a backup reinforcer occasionally seems to be a human construct that is widely used but rarely studied in controlled settings. With that said, what we do know from Classical and Operant Conditioning Theory and practical studies strongly suggests that a 1:1 click-treat pairing is a more efficient tool to establish clear communication.

If you are having issues with frustration-based aggression or if you see visual tracking for additional cues (e.g. tracking when you reach for the treat pouch) I highly recommend that you trial the 1:1 click-treat pairing approach. A good test for the strength of your bridge is the following: ask the animal to perform a behaviour and then remain completely still while sounding your bridge. Try this with a few different behaviours. Did the animal continue to perform the behaviour as if nothing had happened? If so, your bridging stimulus is not strong/clear enough and you should give this idea a chance.

If you currently do not have any problems in your training program you can keep the BC approach, but I would still advise against it. Here is a real life comparison that will probably illustrate my view on this topic. You can watch a movie on a VHS tape and have a great time doing so. However, if you watch it on Blu-ray your experience is likely to be much better. Both options allow you to watch the movie and both options work, but on Blu-ray the picture and the sound are much better. Which option would you prefer?

So, in conclusion, should you adopt a BC approach to train your animal? After doing all this research I am left with the feeling that both approaches can work and yield good results, but I cannot help but think that, according to the evidence we have, a 1:1 click-treat pairing approach seems to have fewer, if any, disadvantages when compared with a BC approach.

Azrin, N. H., Hake, D. F., Hutchinson, R. R. (1965). The Opportunity for Aggression as an Operant Reinforcer during Aversive Aggression. Journal of the Experimental Analysis of Behavior. 8(3), 171–180.

Bailey, B., Bailey, M., (1998). "Clickersolutions Training Articles - Ratios, Schedules - Why And When". Clickersolutions.com. N.p., Accessed 24 April 2016.

Chance, P., (2003). Learning and behavior (5th ed.). Belmont, CA: Wadsworth.

Egger, M. D., Miller, N. E., (1962). Secondary reinforcement in rats as a function of information value and reliability of the stimulus. Journal of Experimental Psychology, 64(2), 97-104.

Fernandez, E.J., (2001). Click or Treat: A Trick or Two in the Zoo. American Animal Trainer Magazine, 2, 41-44. Shedd Aquarium.

Langbein, J., Siebert, K., Nuernberg, G., Manteuffel, G., (2007). The impact of acoustical secondary reinforcement during shape discrimination learning of dwarf goats (Capra hircus). Applied Animal Behaviour Science. 103(1-2), 35–44.

Martin, S., Friedman, S.G., (2011, November). Blazing clickers. Paper presented at Animal Behavior Management Alliance conference, Denver, CO.

McCall, C.A., Burgin, S.E., (2002). Equine utilization of secondary reinforcement during response extinction and acquisition. Applied Animal Behaviour Science. 78, 253–262.

Rilling, M., Caplan, H. J., (1973). Extinction-induced aggression during errorless discrimination learning. Journal of the Experimental Analysis of Behavior. 20, 85-92.

Smith, S.M., Davis, E.S., (2008) Clicker increases resistance to extinction but does not decrease training time of a simple operant task in domestic dogs (Canis familiaris). Applied Animal Behaviour Science. 110(3-4), 318-329.

Wennmacher, P. L. (2007). Effects of Click + Continuous Food Vs. Click + Intermittent Food on the Maintenance of Dog Behavior (Master's Thesis). University of North Texas.

Williams, J.L., Friend, T.H., Nevill, C.H., Archer, G., (2004). The efficacy of a secondary reinforcer (clicker) during acquisition and extinction of an operant task in horses. Applied Animal Behaviour Science. 88, 331–341.

Zimmerman, D. W., (1957). Durable secondary reinforcement: Method and theory. Psychological Review. 64, 373-383.

Picture: www.morguefile.com

Edited: 16/12/2016


Why is timing so important in animal training?

25/10/2015


Some five or six years ago I wrote an article for a Portuguese website about the importance of timing in dog training. Today, I will revisit that article and re-word some of its content according to my experiences and what I have learned since then. I still agree with most of what I wrote back then, but I also have slightly different views on some topics. Hence, this will be an introductory article on the topic of timing specifically directed to people that are new to animal training.

Timing is probably one of the attributes that contributes most to establishing an efficient channel of communication between a human and a non-human animal. If we want to positively reinforce a behaviour, and increase its frequency of occurrence in the future, the reward should be offered at the exact moment when the behaviour we want to "capture" occurs (or within one second after the occurrence of this behaviour). The moment in which we offer the reward contains very important information about what we are trying to teach.

Operant conditioning tells us that the common sequence of events is antecedent, behaviour, consequence (ABC). Thus, the cue (antecedent) should come before the behaviour and then, when the behaviour occurs, an appropriate consequence must follow. If we want the dog to learn to relieve herself on cue, that cue should ideally be offered when the dog starts circling and sniffing the ground. We should not wait until she has already started the behaviour and/or has finished it. Instead, the cue should happen before. Similarly, a road sign that warns of a crossroad should appear before the crossing. If it appears on or after the crossing it will not be very useful.

We then have the issue of using good timing for the feedback we provide when the behaviour happens. If the timing is not appropriate we may be reinforcing a behaviour that is not the intended one. Imagine that you ask the dog to sit and, when he does so, you get distracted and look at something else for a few seconds. During that time the dog can stand up, look away, sniff the ground, etc. When you turn back and offer the reward, what did you reinforce exactly? Was it the sit? Was it the stand? It is usually difficult to know, but you were probably reinforcing the behaviour that the dog was doing when you offered the reward.
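The distracted-trainer example can be written down as a tiny Python sketch. It rests on the simplifying assumption made above, namely that the reward reinforces whatever the dog is doing at the moment it is delivered. The timeline values and the function name are hypothetical and are only there for illustration.

```python
def behaviour_at(timeline, reward_time):
    """Return the behaviour in progress when the reward is delivered.

    `timeline` is a list of (start_time_in_seconds, behaviour) pairs
    sorted by start time. Each behaviour lasts until the next one starts.
    """
    current = None
    for start, behaviour in timeline:
        if start <= reward_time:
            current = behaviour
        else:
            break
    return current


# Hypothetical session: the dog sits, then stands up, then sniffs the ground.
timeline = [(0.0, "sit"), (2.5, "stand up"), (4.0, "sniff the ground")]

on_time = behaviour_at(timeline, reward_time=0.5)  # reward within a second
too_late = behaviour_at(timeline, reward_time=5.0)  # trainer was distracted

print(on_time, "|", too_late)
```

With good timing the reward lands on the sit. Five seconds later it lands on sniffing the ground, which is not the behaviour we were trying to teach.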

Now let’s discuss an imaginary scenario. Imagine that you go to a restaurant for lunch and you choose fruit instead of chocolate mousse (your typical choice) for dessert. When you get home two hours later you find $100 in cash in your mail box. The money was left there without your knowledge by a millionaire friend of yours who seeks to reinforce the fact that you chose fruit instead of chocolate mousse. The millionaire’s intention was to make it more likely that you would choose fruit in the future. It is highly unlikely that the millionaire will be successful in this initiative. Many behaviours have occurred between the time you ate the fruit and the time you found the cash, so the probability of you associating the two events is greatly reduced. Now imagine that what happened instead was that when you ordered the fruit it came with a $100 voucher on the side. In that scenario it is much more likely that you will order fruit the next time. It all has to do with timing.

We can now make an analogy to an issue that most dog guardians face. Imagine that you get home and your puppy has urinated on the floor during your absence. If you decide to reprimand her for the “mistake” when you get home, how would she know why she is being reprimanded? It will be very difficult for her to associate your reaction with the behaviour of urinating on the floor that happened half an hour ago. I should make a quick reference here to the fact that I would not recommend reprimanding a puppy for a house training mistake, even if you catch her “in the act”. First, leaving a young puppy in the house without supervision and/or management and “hoping for the best” is a recipe for disaster. Second, if you reprimand her “in the act” she may learn that she cannot relieve herself in front of you and start to sneak away when she needs to relieve herself.

Unfortunately (or maybe fortunately) we cannot tell an animal that we like what he did two hours ago. We have to act in the present moment and we can only influence what is happening then. When I have a conversation with a person, I can let her know that I like what she did yesterday and earlier today. I can also let her know about the things that I did not like. With non-human animals, a system that allows us to do this does not yet exist, as far as I am aware at least. We would then have the issue of whether or not they are cognitively capable of comprehending information referring to the past, but that is a topic for another time…

I mentioned earlier that in order to positively reinforce a behaviour, and increase its frequency of occurrence in the future, the reward should be offered when the behaviour happens (or in less than a second after the behaviour) in order to be effective. The reality is that it is very difficult to achieve this consistently. The solution to this problem involves the use of a conditioned or secondary reinforcer (e.g. a word, a whistle, the sound of a clicker). A primary reinforcer is something that the individual naturally needs such as food and water. A conditioned reinforcer, on the other hand, acquires meaning when repeatedly paired with something that the individual finds rewarding (typically a primary reinforcer, but it could also be a previously established secondary reinforcer). To accomplish this we make use of classical conditioning and we condition the individual to a “marker” or “bridge”. We do this by systematically presenting it before the delivery of something that already has reinforcement value for the animal (e.g. click – food, click – food, etc.). This is often called “charging the marker” or “charging the clicker”. With enough repetition the clicker will start to elicit the same response that the food would and later on we can use this to pinpoint moments in time in which he just earned a reward.

Now we are able to tell the dog to sit, click for the sit and give the reward after a few seconds. Another major advantage of this system is that we can reinforce behaviour that occurs away from us. A dolphin jumps to touch a ball and the trainer blows the whistle when it happens. The delivery of the reward can now occur when the animal is back with the trainer, even if that takes a few seconds to happen. This procedure eliminates confusion about which behaviour caused the delivery of the reward.

Regarding the bridge, there is a lot of ongoing discussion debating whether or not the bridge should tell the animal that the behaviour is over and that it can now come to the trainer to collect the reward. The alternative approach would be one in which the animal hears the bridge and should remain in position to receive the reward. An example of this is when you tell your dog to sit. When he sits, you click and offer a food reward. The question then is: does he have to remain sitting to receive it? I tend to prefer using the bridge as a classic terminal signal that tells the animal that he did well and that the behaviour is over. I recognize however that many successful trainers use the other approach and that animals seem to learn equally well. Furthermore, for some behaviours offering the reward out of position seems to speed up the learning process, while in other situations offering the reward in position seems to convey important information to the animal.

In conclusion, timing is a very important skill to master if effective communication is to be achieved. In operant conditioning terminology the sequence antecedent, behaviour, consequence (ABC) is widely accepted and it tells us the correct chain of events when something is under operant conditioning control. Regarding the consequence, when training non-human animals there is no point in providing feedback for something that has happened in the past. Acting in the moment when the behaviour happens makes the trainee’s learning experience easier and clearer. Considering the importance of good timing, the use of a “marker” or “bridge” is recommended to fill the gap between the behaviour and the collection of a reward.

Picture: www.morguefile.com


Five things that dog trainers do differently

31/7/2015


1. Socialization

Dog trainers recognize how important it is to start socializing puppies early on and how much easier it is to prevent problems from arising instead of trying to fix them later. Puppies go through a sensitive period of socialization between 4 and 14 weeks of age (the exact length of this period is variable and constantly being debated). Within this time period, the worst thing we can do is to keep the puppy indoors at all times, with no access to members outside of the household. It is critical that the puppy goes outside to interact with the world.

There are some health considerations during this period because the puppy’s immune system is still developing and thus some caution is advised to minimize exposure to diseases. Going through where we should or shouldn’t take our puppies early on is beyond the scope of this article, but this is something that you can discuss with your veterinarian and dog training professional. If your veterinarian does not recognize the importance of early socialization and advises you to keep the puppy indoors until the puppy is four months old, I would recommend that you seek a second opinion.

Socialization is critical and you will probably only have one chance to do it right. If you start too late you are already risking the puppy developing phobias and other detrimental behavioural issues. I also recommend that socialization be an ongoing commitment, with special emphasis during the first year. If you let too much time go by without exposure to a certain stimulus (dogs, people, places), the dog may start to show some type of negative emotional response towards those things.

Here is a real life example: imagine that you are raising a child and that between the ages of 2 and 15 that child only lives inside the house and never goes out to school, to play with other children, to interact with other adults, to visit different places, etc. Certainly, this child would not develop healthy social habits and behaviour. A similar process happens to dogs, but it happens faster. Dog trainers are aware of this and they take their puppies out to interact with other puppies, friendly adult dogs, people from different age groups and ethnicities, and to places that look, feel, smell and sound different.

2. Management

Dog trainers are very good at using management solutions to make their lives (and their dogs’ lives) easier. A dog in a new environment (especially if it is a puppy) with too much freedom to roam the house and make his/her own choices is a recipe for disaster. Dog trainers are aware of this and take a proactive approach to minimize the number of mistakes that the dog can make.

The use of dog crates, baby gates, exercise pens, leashes (when supervising the puppy) and other management tools makes it easier to control where the dog is and what he/she is doing. Many new pet owners simply bring a puppy home and hope for the best (they hope that the puppy will know where to relieve himself/herself and which chew items are appropriate). Dog trainers know that puppies will probably make choices that we don’t like and so they use confinement to minimize issues during the initial phase. Thus, if the dog cannot be supervised he/she goes into a confinement option. Dog trainers are also aware that harnesses and leashes are great tools to be used inside the house, as long as this is done under supervision (leashes are not just for leash walking).

Dog trainers will balance out the amount of confinement with the amount of physical activity, mental stimulation, socialization and training sessions. When house training and chewing on appropriate items are succeeding on a regular basis, dog trainers start to progressively offer the dog more freedom until the use of confinement is considerably reduced.

3. Motivation

Dog trainers are fully aware that, generally, dogs will not do things to please us. They will mostly do things to please themselves. With that in mind, most dog trainers make access to high value resources contingent upon the dog doing something that they want. A great approach that many trainers use to put motivation to work for them is to get rid of the food bowl and to offer food in training sessions, in puzzle toys or in other environmental enrichment options. The dog’s wild cousins have to work hard to get food and that approach seems to make sense to our domesticated companions as well.

Dog trainers also put other resources to work in their favour. Does the dog want to sniff a bush? Does he/she want to say hello to another dog or person? Go through a door? Does he/she want you to toss a tennis ball? Dog trainers will ask the dog to do something before they proceed with these highly prized events.

Petting and affection might be valuable in the living room, but out there in the real world they are probably not that high value for your dog. Dog trainers are aware of this and adapt accordingly to the situation they are in. In some cases a piece of kibble is high value enough for your dog to be engaged with you, but in other scenarios you may need a piece of cheese or cooked chicken.

4.       Occupational activities and exercise

Most dogs are pretty good at spending a big part of the day resting and sleeping, but they also need a regular supply of mental and physical stimulation. Dog trainers make sure that their dogs receive exercise and environmental enrichment on a regular basis.

Here are some tips and tricks that dog trainers use:

·         playing fetch will get a dog tired faster than walking him/her on a leash;

·         a long leash (8-15m) attached to a harness is a magnificent tool if the environment is safe enough to use such a device;

·         if you will have a very busy day consider using the services of a dog walker or doggie day care;

·         if the weather is terribly bad, there are still lots of stimulating activities that you can do indoors;

·         leaving a stuffed food toy for your dog to chew will make it more likely for the dog to be content with being left alone;

·         there are many “Kong recipes” out there that will make food toys more challenging and interesting;

·         finding food throughout the house and/or yard is more fun than eating it from a bowl;

·         preventing access to shoes, socks, rubbish, etc. is likely to make your life easier;

·         toys that are available all the time lose value and become “furniture”.

5.       Preparation for real life situations

Dog trainers realize that prevention and preparation will go a long way towards avoiding fears and other behaviours that we do not want our beloved dogs to show. They prepare for such a situation in a way that is easy for the dog to handle before he/she is confronted with the real life potential trigger.

Here is an example: many dogs are very likely to show fear towards thunderstorms or fireworks. One possible way to help prevent this is to play recorded sounds of those events at a low volume and then progressively increase the volume until it resembles the real sound that the dog may encounter. This process is called desensitization. Dog trainers also like to pair desensitization with counter-conditioning. To include counter-conditioning in the previous example you would pair the “frightening sound” with something that the dog enjoys (e.g. food treats). The sequence would be: thunder sounds equals super yummy food rewards; no sound equals food treats are no longer available and “life is boring”. With this approach we would possibly create a positive emotional response to the sound of thunderstorms or fireworks.

A dog trainer will not wait for his/her dog to be exposed to these events in real life and hope for the best. Instead, a dog trainer will assume that a negative emotional response is likely to develop if things are left to chance, and for that reason he/she will actively prepare the dog for a real life situation before it happens. If enough preparation is not possible, the dog trainer will use management and counter-conditioning to try to minimize the negative experience as much as possible.

Picture: www.morguefile.com

Dealing with less experienced animal professionals

5/7/2015

11 Comments

Here is a scenario that happened to me when I started sharing animal training information and that I’ve encountered several times online: a young animal trainer posts an article, shares a video or opinion on dog training and a more experienced trainer criticizes him/her, mentioning that it is too basic and that the person lacks experience. I consider criticism to be a bad idea and the wrong approach when interacting with another person (for more information on that topic please refer to my article “Gentle with animals, but harsh with people”), but let’s explore the situation from a different perspective.

To make this exercise easier, join me in an imaginary world where we can quantify knowledge (in this case, animal training knowledge). Imagine a scale from 0 to 10 in which 0 is a person that knows absolutely nothing about animal training and 10 is the person that knows everything there is to know on the topic. I can think of a few people that would be a 9 on this scale and many of them would be the influential trainers that have been showing the world that using a carrot is better than using a stick, backing it up with anecdotal and scientific data. In my opinion, the 10 is impossible to achieve because there is always stuff to be learned, but more on that later.

Going back to my opening sentence, imagine that the trainer posting the article or video is a level 3 trainer and gets the “correction” from a level 6 trainer. Probably, the level 6 trainer finds it very basic information and too irrelevant to be shared. Perhaps that specific community has many trainers that are at a higher level than the level 3 trainer. But here is the important piece of information: all the level 6 trainers were level 3 one day. Actually, there was a moment in which they were level 1 or 2 and knew less than this person does at the moment. More importantly, perhaps there are people in that group that are level 1 or 2 at the moment and that can learn from the content that the level 3 person shared.

Many fields related to the biological sciences have been evolving a lot and animal training is especially prone to new information. For many of us the evolution is constant and very fast paced. Let me give you a first-person example: I offer a lot of free animal training videos on my Youtube channel. Many times, I shoot the video and by the time I start editing it, it has become obsolete in my view. I would change several things if I were to do it again. When I look back at videos that I did a couple of years ago, my general thought is “what the heck was I thinking when I did that?” That is probably the case for many of my friends and fellow animal trainers. I do not believe, however, that this is a bad thing. It is evidence that we are trying to learn and evolve each and every day. It shows that a given trainer’s current approach is an improved one when compared with the one he/she was using in the past.

Why do we criticize the “already known” or “not ideal technique” then? Well, perhaps it is a hard wired behavior that evolved to make us survive, to get a competitive edge over others. It makes some intuitive sense that if we devalue someone else's work we might be valued instead. The problem is that we might discourage the level 3 trainer from becoming level 6 and who knows, perhaps one day becoming level 9. We do need force free trainers out there. We do need modern training techniques to be the mainstream information out there.

Some years ago I had a few younger colleagues starting to be given responsibilities that up until that point had belonged only to me or to some older, more experienced colleagues. When this happened, my first natural inclination was to find mistakes and criticize them. I had seen other people using this approach, so certainly that was the normal thing to do, right? Well, I was fortunate enough to follow the lead of some amazing people that would take on the success of younger colleagues and celebrate it as if it was their own success. More impressively, they would even try to learn from younger colleagues. Obviously, they would take most of their new knowledge from international references, but it was still remarkable that they would learn some bits from less experienced people.

Experience is important, but I believe that even more important than experience is a desire to hunt for new knowledge and ideas. I have met some very experienced animal trainers that ended up stagnating in their careers because they did not actively try to acquire new knowledge. The best animal trainers in the world (the level 9 ones) are always trying to learn more stuff and sometimes they turn to “less knowledgeable people”. Let’s embrace new information even if it comes from newcomers and let’s make the world a more force free place.

Picture: http://www.morguefile.com

Why I chose to be a Modern force free trainer

1/6/2015

36 Comments

In this blog post I will tell you the story of why I decided to be a Modern force free animal trainer. In late 2008, when I decided to try formal training with my grandparents’ dog, I was made aware of a variety of training techniques and approaches. A lot of it ended up being an overload of contradictory information with each side offering very strong arguments and reasons as to why their approach was better. Perhaps by sharing my ideas on this topic I will be able to help people decide on the best way to train their pets.

To make this text easier to understand I will divide animal trainers into three categories: Traditional, Balanced and Modern. If we look at the picture below we can see that at the very left we have Traditional trainers, in the middle we have Balanced trainers and at the right end we have Modern trainers. For each group I have added which operant conditioning quadrants they use, along with the main training motivation tool. This arrangement will result in an over-simplified version of the several training philosophies out there, so keep in mind that there are obviously people that will fall at different points of this spectrum.

Traditional trainers are trainers that correct incorrect behavior as the main tool to teach the animal what they consider an appropriate behavior. This was the popular way of training dogs and even wild captive animals some decades ago. In regards to dog training, a few things that can identify a trainer that falls into this category would be references to “pack leader”, “dominance”, “dogs are like wolves”, “status”, “hierarchy”, “be strong and firm”, “correct the dog” and “discipline”. Any form of physical correction (including leash ones) can be an indication of a trainer that belongs to this group or the next one.

Balanced trainers will typically provide rewards for correct behavior and aversive consequences for incorrect behavior. This group can be a little more difficult to spot because they do use rewards and can easily be confused with the Modern trainers group by the untrained eye. If, apart from rewards, there is any type of physical correction or intimidation when the animal makes a mistake, the trainer is likely to belong to this category. Some trainers in this category will teach behaviors without force and then “perfect” them with corrections when it is assumed that the animal has already learnt the behavior.

Modern trainers or force free trainers are the ones that do not use force or aversive measures to teach animals. They typically motivate the animal to do what they want and offer rewards for it. They can also remove pleasant things from the environment if the animal is showing an undesired behavior. These trainers manage the environment so that the animal can succeed. For example: instead of correcting a dog for grabbing and chewing socks, these trainers make sure that the dog has no access to socks at all and offer an appropriate chew item when the dog is likely to want to chew. In this group, trainers also understand that many times the animals’ behavior is the result of an underlying emotional state that needs to be addressed first.

I consider myself a Modern trainer and like any other human on the planet I am biased towards my own perception of this topic. At the moment, it is crystal clear to me why being a Modern trainer is better than being a Traditional or Balanced one, but back when I started that was not the case. There are many psychological, behavioral and biological reasons as to why a Modern approach is more appropriate (my last blog post refers to some of those reasons), but for the purposes of this text I will use a different set of arguments, which were the ones that made me choose this approach some years ago.

Argument 1: Going back to my grandparents’ dog, when I decided to start training him I followed a Youtube tutorial. According to this, you would train the dog offering a little bit of praise now and then and the dog would do the things you ask out of a desire to please the owner. Long story short, it was boring, uninteresting and neither I nor the dog enjoyed it. The dog would not look at me and I didn’t feel like it was working. I was a bit frustrated and did some additional research. Soon enough, I found a different video suggesting training dogs with food. So, the next day that is what I did. What a difference! The dog had his tail up, a smile on his face and was looking at me the entire time. I got a few behaviors started and everything happened with a lot of “gusto”. I was hooked on animal training from this moment onwards. I was also pretty sure that using rewards was a good idea.

Argument 2: I started to read a lot of dog training books and I also bought several dog training DVDs. At this moment I was getting a lot of information from the three different approaches described above. I saw a lot of Modern trainers pointing out that the then famous TV show “Dog Whisperer with Cesar Millan” was a terrible approach to training dogs. I remember wondering if his approach could be used in conjunction with a Modern one, depending on each specific case. Then, I stumbled upon a fact that is still to this day one of my main arguments for the Modern approach. I noticed that all the major international references in animal training with a strong academic background in relevant fields were Modern trainers. People with Masters, PhDs, Post-Docs and Peer Reviewed work in fields related to animal behavior that decided to become dog trainers were all Modern Trainers. I noticed that the Traditional and the Balanced group had a lot of trainers that were self-taught and with many years of experience. In the Modern approach group, trainers without a formal academic background usually have or are currently obtaining certifications and courses based on the latest scientific findings. So if all trainers who have extensive academic studies on this topic choose a Modern approach, instead of a Traditional or Balanced one, what does this tell you?

Let me give you a practical example to further substantiate my point of view. If one day I wake up with a very strange looking mole on my skin the first thing I would do would be to show it to a medical doctor. I would not go to a witchdoctor or anyone that is self-taught on this topic. I want someone with a formal education in the field of medicine to help me figure out what it is and what to do next. I will also not trust my friends’ opinions and diagnosis if they offer it to me.

Argument 3: Over the years I have seen a huge migration of trainers from a Traditional or Balanced approach to a Modern one. Some of these trainers have actually published some amazing material on this topic from which I learned a lot. Many of them have shared the reasons for that change in approach so that they can influence other trainers to make the same decision. However, I do not commonly see Modern approach trainers wanting to use a Traditional or Balanced approach. So, if we see a lot of people changing from the left to the right side of the spectrum, but not the other way around, what is this telling you?

Argument 4: I worked as a marine mammal trainer with several animals that easily weigh four or five times more than I do. With marine mammals the mainstream approach is a Modern one and I am very glad that this is the case. With these animals, if you use a Traditional or Balanced approach you will be at serious risk of getting injured. In that sense, dogs are much more forgiving than many other species. Does that make it right to use aversive methods on them? I don’t think so and I am sure that many of you reading this would agree with me. Sometimes during training questions may arise regarding the suitability of a particular approach. Some of those times I end up asking myself “Is this something that I could use on a fur seal or on a tiger?” If the answer is no, it probably means that there is a better way.

Argument 5: I have also realized that Modern trainers are extremely open about their approach. They have no problem saying that their approach relies on rewards, managing the environment and emotions, etc. Balanced and Traditional trainers are a little less comfortable openly saying that they have physical corrections in their trainers’ tool box. Most of the Balanced trainers that I have spoken to have no problem telling me that they offer food rewards when the dog is learning something new, but they hesitate to openly refer to how and when they use corrections. Here is an interesting thing I have found: on Youtube you will quickly notice that the vast majority of videos from Modern trainers are open to comments, while the Traditional and Balanced ones usually do not allow comments. What is this telling you?

In sum, a Modern approach seems to be the best option for both you and your dog, with long lasting positive effects and happy lives. When in doubt about which approach is best, take a Temple Grandin approach and put yourself in the animal’s shoes. Would you like to do that behavior to avoid punishment or would you prefer to do it because you will get something that you like? I believe that there will come a time in which a choke chain will be a historical piece kept in museums, which will serve to remind us that back in the day we used to train dogs with those things. I am not sure if that moment will come in 10 or 50 years, but everything seems to suggest that it is coming. The number of Modern trainers is growing at a pace that the two other approaches combined cannot match. Personally, I think that the future is certainly a Modern force free approach.

Gentle with animals, but harsh with people?

4/5/2015

17 Comments

Throughout my academic and professional endeavors, I have had the pleasure of working with many extremely knowledgeable people across a variety of topics that interest me. Whenever I have a tutor or someone in charge of my education and training I usually find people use one of two approaches. To make this easier I will group them into Group A and Group B instructors.

Group A instructors will typically rely a lot on getting you to observe them for a long period of time. They will then ask you to do something that you have seen them doing and will carefully observe you doing so. Some instructors will instead rely on a complete explanation of what you have to do to get the task done, whilst others will rely on a combination of the two. After that they will point out what you have done wrong so that you can correct it the next time you have to do the task. These instructors seem to have no problem telling you what you did wrong.

Group B instructors will let you observe, and then they will give you a simple version of the task and explain it in baby steps. Sometimes they will do the task with you at the same time so that you can easily succeed. They will then build up from there and add small components to the task until you are able to perform the entire task. These instructors do not correct your mistakes. They ignore them and focus instead on what you did well. If they really have to point out a mistake they are very careful about how they do it and mask it within things that you did well.

I encounter instructors from Group A much more often than from Group B. I was once given the opportunity to evaluate a younger colleague doing something that was new to him but that I was already comfortable doing. I used a full blown Group A approach to do it. I pointed out many mistakes that should be fixed and I told him every single one of them. Upon seeing his reaction I instantly started to feel bad about what I was doing. Over the next hours and even days I continued to feel bad about it. From that moment on I have always tried to use a Group B approach and to focus more on what goes well instead of on what goes wrong.

When I am being coached on a task, if I get an instructor from group A that is pointing out my mistakes I usually feel offended and on my next try at the task I will be afraid of doing it, merely trying to avoid mistakes. I am typically a happy guy and I like to be creative and funny about everything I do. If I am being coached by a Group A instructor all of that goes away and I will merely try to avoid mistakes. Sometimes I might even shut down and freeze. Overall, my relationship with this person starts to be a less pleasant one.

Let’s look at it from the operant conditioning point of view. If I am pointing out mistakes to a colleague I am using an aversive (assuming that he does not like it). If he is doing a task and trying to avoid making mistakes to get the job done some behaviors will be under negative reinforcement control. Isn’t this the stuff that we so passionately try to avoid when teaching animals? Sure, if I tell him what he did wrong there is a good chance of it being fixed the next time he does it, as we have a tendency to love quick fixes, but is that really the best approach?

Let’s look at it from the emotional point of view. What is a good relationship with another person? In my opinion, it is one in which the amount of pleasant interactions clearly outweighs the unpleasant ones. I like to use the bank account analogy in this case: if you want to have a healthy bank account you need to make way more deposits (pleasant interactions) than withdrawals (unpleasant interactions). If you make too many withdrawals you may go broke (seriously damage the relationship). If I am mostly pointing out mistakes I may be poisoning my relationship with that person.

Let’s look at it from the performance side of things. If I correct a person today, perhaps she will be doing the task correctly tomorrow, but she will also be trying to avoid mistakes. That makes it very likely that her performance will have a limit. If on the other hand I use praise and rewards, the person will actively hunt for additional praise and go beyond what is asked, sometimes discovering extremely helpful variations of the task at hand.

Here is a real life example: imagine that you are trying to learn how to cook properly and you just met Peter, a very experienced chef. You invite him over to your house to try your dishes for a few days, as a way to improve your skills. Peter arrives at your house and you serve him your food. He tries it and with a disapproving and disappointed look says “This has too much salt and the food is undercooked. I would have chosen a different combination of ingredients. Not an impressive dish to be honest.” How would you feel about Peter’s approach? Even though he is being honest and genuinely trying to help you, my guess is that you would feel demotivated, perhaps offended, and would not be looking forward to Peter’s next visit. You might even not want to try to prepare that dish ever again.

Now, imagine that instead of Peter, you met John. John is also a very experienced chef and when he comes and tries your dish he smiles approvingly and says “This is really good, especially considering how little experience you have. I like the detail that you put into the preparation of this dish and I am so glad you invited me over. Perhaps I could suggest that you use a tiny little bit less salt so that we can taste the flavor of these amazing ingredients even more.” John also notices that the food is slightly undercooked but he prefers not to mention that. He then says: “Why don’t I show up earlier next time and help you with the cooking? It will be fun!” How would you feel about this interaction? I would be extremely motivated and looking forward to the next cooking session. I would probably even take an extra step to learn as much as possible about that dish in particular so that I could make it better.

I have met many animal trainers that use approach B brilliantly on animals, but then use approach A with their colleagues. I wonder why that happens. Does it come more naturally to us to focus on mistakes? Does it have to do with the way we were raised in a culture that tends to focus a lot on what is done wrong instead of on what is done right? I wonder… I do know one thing: the ones that use approach B on animals and approach B on colleagues are the ones that I hold dearest in my heart. They are also the ones that I love to spend time with and to learn from.

Probably, most of us don’t even realize that we are using approach A as our default way of dealing with people. I am not suggesting that we do it because we want to be mean to other people. Perhaps we have a tendency to repeat the approach that was used on us and it ends up becoming a habit. Many animal trainers progressed from approach A to approach B when applied to animals and ended up becoming international references in the field of positive reinforcement training. I have met many people that started to implement the same changes when it relates to people, and even though there will be an adaptation period, the rewards will be worth the change.

How about each of you reading this text tries to use approach B for 48 hours with every human you encounter throughout the day and then reports back to me on what happened during those 48 hours? That way you can show how this approach will benefit not only your work, but also your relationships and friendships. I would bet that you are going to have a happier day. I look forward to hearing back from you!

Picture: http://www.morguefile.com

I want a dog but I cannot commit to it for fifteen years

28/4/2015

9 Comments

Dogs have short life spans, too short in fact. In my opinion, being responsible for a dog is one of the most rewarding things we can do in life. For most of us, the 11-15 years of life that most dogs have is not long enough and we wish we could have our furry friends with us for longer.

Shelters and animal care organizations are full of dogs in need of a home because many dog owners do not prepare in advance for the arrival of the dog, or due to a lack of research before making the decision of taking in a new dog, or as a result of a combination of factors. The best way to become responsible for a dog is to consider those factors and to be in a stable enough situation that allows you to make a long term commitment, thus preventing the shelter dog population from increasing. However, many of us live in a fast moving world and being able to make a 15 year commitment of caring for a dog might not be an option.

This could be the case for a variety of reasons, including but not limited to: you might be going overseas for a few years to fulfill some personal or professional goal, you might consider that becoming a parent in a few years’ time will be incompatible with owning a dog, you might want to relocate to a place where living with a dog is not possible, or you might have a health-related reason.

In this short text I will go over a few options for those of you who still want to experience the pleasure of sharing your life with a dog, but who for some reason find it more prudent not to make a long term commitment. Dogs are obviously not disposable objects so you have to make sure that if for some unforeseen reason plan A does not work, you will be able to find an adequate alternative plan B. Here are a few options with their respective pros and cons.

Raising a dog for an Institution that places them with people living with an impairment (blind, autistic, etc.)

Many Institutions place their puppies with families to raise them until they reach an age when they are brought back to the Institution for more serious training and testing, specifically directed at the dogs’ future job. For this purpose, you can apply to become a puppy raiser.

Pros:

·         You get to be responsible for a dog for 1-2 years;

·         You will be helping a very noble cause;

·         Labradors, Goldens and other working breeds are usually the option and these breeds are typically easy to train;

·         You get to bring your puppy with you inside stores, on the bus/tram and into other places where pet dogs are not allowed;

·         The institution that you partner with will provide a lot of support: training, vet care, information, etc.

Cons:

·         You will eventually have to say goodbye to the dog at a moment in time in which your bond will be very strong;

·         Labradors, Goldens and other working breeds might shed a lot;

·         You might have to stick to a schedule (e.g. you may have to be at a certain place every Saturday for training class);

·         If you have an extremely busy schedule it will be very difficult to offer the puppy the necessary socialization;

·         This option is not available everywhere (easier in big cities).

Adopting an older dog

Shelters and animal rescues are packed with dogs of all ages and backgrounds. Young adults and puppies are usually easier to place in new homes, but adopting an older dog could be a great option.

Pros:

·         You are helping a dog with lower chances of being adopted (puppies are more popular);

·         Older dogs are usually less time and attention demanding than puppies or young adults (older dogs sleep a lot);

·         If you are very busy, a low energy person, or not outgoing, an older dog could be a great match.

Cons:

·         If you adopt an 11-13 year old dog you will soon have to undergo the emotional roller coaster of witnessing his/her demise;

·         Older dogs are still highly trainable, but there are limitations (e.g. if you want a dog to do agility with, this is not a good option);

·         You might face increased veterinary bills.

Fostering

With so many dogs in need of a home, many shelters and animal rescue organizations are at maximum capacity. Many of these will happily place the dogs in temporary families until they find them a new permanent family.

Pros:

·         You will be helping the dog to find a loving home;

·         You will be helping the animal rescue organization stay alive by cutting some of their expenses;

·         If you do this often you may get to know the personality of several different dogs;

·         It is a joy to find these dogs a loving family.

Cons:

·         Sometimes, you might only get to keep the dog for an extremely short period of time;

·         It might be emotionally demanding to say goodbye to the dog;

·         Some of these dogs can have some behavioral baggage;

·         You might take in a dog that takes very long to get adopted and that will mean that you have to adapt your original plan.

Pet sitting

Pet sitting has become very popular over the past few decades, especially in developed countries. In some places this is mostly taken up as a hobby while in others it is a professionally competitive market. Whether you want to do it as a part-time occupation for family, friends and neighbors or as a professional endeavor, this is also a good option to share your life with dogs without the long-term commitment.

Pros:

·         You get to meet lots of different dogs;

·         You do not have any financial obligations towards the dog;

·         If you are doing it professionally you will get paid to do this;

·         You do not get too emotionally attached to any of the individual dogs that you pet sit.

Cons:

·         Sharing time with dogs will depend on the dog owners’ needs and not yours;

·         There might be scheduling constraints;

·         If you decide to do this professionally there will be a lot of responsibility on your hands.

I hope that I have presented some helpful points, especially for those of you considering sharing your lives with dogs on a non-permanent basis. I think that it is our responsibility to determine if we can keep a dog for 15 years or if a short or medium-term solution is more appropriate. This reflects my opinion today and I am sure that when I look at this text a few months from now I will want to change a few things, but that is the beauty of animal behavior: it is always changing and at a given point you can only click and reward what is happening at that moment in time.

Picture: http://www.morguefile.com
